IE seminar - Tutorials

Overview

Information extraction (IE) is the task of automatically extracting structured information from natural language text. One common use case of IE is knowledge base (KB) population, an approach to building artificial intelligence systems. Knowledge bases allow people and computers to process interlinked representations of real-world concepts, entities, relationships, and events efficiently and unambiguously. This hands-on seminar will investigate concepts and techniques for extracting specific information from political discourse texts crawled from the web. During the seminar, participants will explore the fundamental NLP challenges of extracting knowledge from raw text, such as text classification, passage extraction, named-entity recognition, relation extraction, and more. Participants will gain experience with spaCy, BERT, and SetFit models, creating datasets, fine-tuning large language models (LLMs), and populating a knowledge graph using Neo4j AuraDB, a popular cloud-hosted graph database.
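To make the last step of the pipeline more concrete, the sketch below shows one possible way extracted facts could be turned into graph updates. It assumes that earlier steps (for example named-entity recognition and relation extraction) have already produced (subject, relation, object) triples. The `Triple` class, the `to_cypher` helper, the `Person`/`Party` labels, the `MEMBER_OF` relation and the example fact are all hypothetical, for illustration only. The output is a parameterized Cypher `MERGE` statement of the kind a Neo4j database such as AuraDB accepts. This is not the seminar's reference implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One extracted fact (hypothetical format): (subject) -[relation]-> (object)."""
    subject: str
    subject_label: str
    relation: str
    obj: str
    obj_label: str


def _check_identifier(name: str) -> None:
    # Cypher cannot take node labels or relationship types as query
    # parameters, so they are inlined; reject anything that is not
    # letters, digits, or underscores to avoid query injection.
    if not name.replace("_", "").isalnum():
        raise ValueError(f"unsafe label or relation type: {name!r}")


def to_cypher(triple: Triple) -> tuple[str, dict]:
    """Build a parameterized Cypher MERGE statement for one triple.

    MERGE only creates a node or relationship if it does not exist yet,
    so running the population step twice does not duplicate entities.
    Entity names are passed as parameters rather than inlined.
    """
    for name in (triple.subject_label, triple.obj_label, triple.relation):
        _check_identifier(name)
    query = (
        f"MERGE (s:{triple.subject_label} {{name: $subject}}) "
        f"MERGE (o:{triple.obj_label} {{name: $object}}) "
        f"MERGE (s)-[:{triple.relation}]->(o)"
    )
    return query, {"subject": triple.subject, "object": triple.obj}


if __name__ == "__main__":
    # Illustrative fact of the kind found in political discourse text.
    fact = Triple("Angela Merkel", "Person", "MEMBER_OF", "CDU", "Party")
    query, params = to_cypher(fact)
    print(query)
    print(params)
```

Using `MERGE` rather than `CREATE` keeps the graph free of duplicate nodes when the same entity is mentioned in many documents. With the official `neo4j` Python driver, the returned query string and parameter dictionary could be passed to `session.run(query, params)`.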

Project 1 - Text Classification

Project 2 - Passage Boundary Detection

Project 3 - Stance Classification

Project 4 - Named-Entity Resolution + Knowledge Graph Population

Assignment description

Resources

BERT Hyperparameters: A Guide to Fine-Tune BERT Models