Natural Language Processing
It has not yet been decided whether this class will be offered in English this semester.
The module Natural Language Processing provides a broad overview of research on developing systems that process textual human language data (it does not cover recognition of audio or handwriting). Beginning with early foundations in information retrieval and full-text search, we move through approaches for clustering text and modeling topics, continue to neural networks and embeddings, and touch on the ideas behind the transformer and the GPT and ChatGPT models.
This is an algorithm-oriented computer science class, not an interdisciplinary module. The focus is on the algorithms, foundations, theory, and data structures, as well as how to implement and optimize them. We do not focus on applications or tools, as these evolve constantly and quickly become obsolete. Instead, this lecture aims to teach you lasting knowledge of the foundations and the skills to pick up the latest ideas and develop future methods yourself. Good programming skills are a prerequisite; we unfortunately see high drop-out rates among attendees who lack the necessary background.
Contents
Lecture contents include, but are not limited to:
- tokenization and data preprocessing
- bag-of-words and the vector space model (see the sketch after this list)
- information retrieval and full-text search
- text clustering
- topic modeling
- matrix factorization
- sequential models: Markov models, maximum entropy, and conditional random fields
- neural networks for text modeling
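To give a flavor of the bag-of-words and vector space model listed above, here is a minimal sketch (an illustration added to this description, not official course material): documents are tokenized, turned into term-frequency vectors, and compared by cosine similarity. All function names are chosen for this example only.

```python
# Illustrative sketch only: a tiny bag-of-words / vector space model.
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Very simple tokenization: lowercase and keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(text: str) -> Counter:
    """Map a document to a sparse term-frequency vector."""
    return Counter(tokenize(text))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

doc1 = bag_of_words("Text clustering groups similar documents.")
doc2 = bag_of_words("Clustering documents by similar text content.")
doc3 = bag_of_words("Neural networks learn word embeddings.")
print(cosine_similarity(doc1, doc2))  # shared terms, relatively high similarity
print(cosine_similarity(doc1, doc3))  # no shared terms, similarity 0.0
```

A real full-text search or clustering system would add term weighting and efficient sparse data structures on top of this representation, which is exactly the kind of foundation the lecture covers.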
Requirements:
- good programming skills in Python
- good knowledge and understanding of data structures and algorithms
- curiosity and motivation for self-directed learning