IE seminar - Tutorials

Institute of Computer Science, Brandenburgische Technische Universität Cottbus-Senftenberg
Juan-Francisco Reyes
pacoreyes@protonmail.com

Overview

Information extraction (IE) is the task of automatically extracting structured information from natural language text. One common use case of IE is knowledge base (KB) population, an approach to building artificial intelligence systems. Knowledge bases allow people and computers to process interlinked representations of real-world concepts, entities, relationships, and events efficiently and unambiguously. This hands-on seminar investigates concepts and techniques for extracting specific information from political discourse text crawled from the WWW. During the seminar, participants will explore the fundamental NLP challenges of extracting knowledge from raw text, such as text classification, passage extraction, named-entity recognition, and relation extraction. Participants will gain experience with spaCy, BERT, and SetFit models, creating datasets, fine-tuning large language models (LLMs), and populating a knowledge graph using Neo4j AuraDB, a popular graph database in the cloud.

Project 1 - Text Classification

  1. Dataset 1: Annotation procedure

  2. BERT Model 1: Fine-Tuning Deep-Learning Model for Binary Text Classification

Project 2 - Passage Boundary Detection

  1. Dataset 2: Annotation procedure

  2. BERT Model 2: Fine-Tuning Deep-Learning Model for Passage Boundary Detection

Project 3 - Stance Classification

  1. Dataset 3: Annotation procedure

  2. SetFit Model 3: Fine-Tuning Deep-Learning Model for Stance Detection

Project 4 - Named-Entity Resolution + Knowledge Graph Population

  1. Assignment description

Resources

  1. BERT Hyperparameters: A Guide to Fine-Tune BERT models