Thesis Topics
For summer term 2025, we do not have further capacity to supervise thesises. Please contact larger working groups.
Prerequisites
Thesises in our group usually need good knowledge in:
- mathematics and statistics, in particular in probability theory
- data mining and artificial intelligence
- strong programming skills in Java, Rust, Python
In a bachelor thesis you will usually:
- read primary literature on novel methods
- summarize foundations and reproduce the derivation of the new methods in your own words
- implement methods from the studied literature yourself
- experimentally compare the new method with reference methods
In a master thesis, we typically expect that you go beyond the state-of-the-art, for example by designing a novel extension of existing methods and comparing its properties in detail.
We typically do not offer literature research topics, and rarely offer purely theoretical topics!
We have a new thesis template available.
Open topics
There are currently few announced topics due to limited capacity. Individual topics are also possible.
- Gaussian Mixture Modeling in Rust
In this Bachelor thesis topic, the objective is to implement GMM in Rust, with emphasis on a modular and extensible code structure that supports different cluster models (elliptical, spherical) and MAP-regularization. This can be modeled following our existing Java implementation, but we expect to improve the run-time with a Rust implementation. As extension it is possible to also implement BETULA for accelerating GMM in Rust. Knowledge of and interest in Rust is necessary for this assignment! - Frequent Sequence Mining
In bachelor thesises in this area, a series of algorithms from Frequent Sequence Mining literature should be implemented and evaluated, based on the implementations of FIM algorithms in the open-source framework ELKI.- Agrawal, R. and Srikant, R. 1995. Mining sequential patterns. Proc. 11th int. Conf. On data engineering (1995), 3–14.
- Srikant, R. and Agrawal, R. 1996. Mining sequential patterns: Generalizations and performance improvements. Proc. 5th international conference on extending database technology (EDBT) (1996), 3–17.
- Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U. and Hsu, M. 2000. FreeSpan: Frequent pattern-projected sequential pattern mining. KDD (2000), 355–359.
- Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U. and Hsu, M. 2001. PrefixSpan: Mining sequential patterns by prefix-projected growth. ICDE (2001), 215–224.
- Zaki, M.J. 2001. SPADE: An efficient algorithm for mining frequent sequences. Machine Learning. 42, 1/2 (2001), 31–60. DOI:10.1023/A:1007652502315
- Ayres, J., Flannick, J., Gehrke, J. and Yiu, T. 2002. Sequential PAttern mining using a bitmap representation. KDD (2002), 429–435.
- Fournier-Viger, P., Gomariz, A., Campos, M. and Thomas, R. 2014. Fast vertical mining of sequential patterns using co-occurrence information. Proc. 18th pacific-asia conference on advances in knowledge discovery and data mining PAKDD (2014), 40–52.
- Python wrapper für ELKI
To make it easier to compare the implementations of algorithms from ELKI and Python, the goal of this bachelor thesis is to implement a wrapper to efficiently call the Java implementations from within Python. For this jpype shall be used to avoid unnecessary copies, serialization, and deserialization of the data. By using introspection, the python APIs should be automatically generated so that it is not necessary to modify the adapter for every new algorithm. This requires advanced knowledge of Java and Python (in particular an understanding of numpy arrays and of introspection). (bachelor thesis topic) - R wrapper für ELKI
To make it easier to access ELKI algorithms from R, a wrapper based on rJava is to be implemented and evaluated. By means of introspection, the wrapper is to be automatically generated so it does not need to be manually updated for each version. Configuration and syntax should follow R conventions. This requires advanced knowledge of R. (bachelor thesis topic) - Incremental Nearest Neighbor Search in Rust
In this bachelor thesis, efficient incremental nearest neighbor search is to be implemented in Rust, based on the existing Java-code in ELKI. To evaluate the performance, an algorithm based on incremental nearest neighbor search shall be implemented. The use of Rust is expected to further improve the performance over the Java version.- Schubert, E. (2022, September). Automatic indexing for similarity search in ELKI. In International Conference on Similarity Search and Applications (pp. 205-213).
- Schubert, E. (2024, October). Hierarchical Clustering Without Pairwise Distances by Incremental Similarity Search. In International Conference on Similarity Search and Applications (pp. 238-252).
Previous Topics:
As examples, some earlier topics:
Bachelor Informatik
- Clustering with von-Mises-Fischer Distribution
- Choosing the number of Gaussian clusters
- Clustering with k-nearest neighbor consistency
- Accelerating Spherical k-means in Rust
- Textbasierte Empfehlungssysteme
- Erkennung von beleidigender Sprache in Twitter unter Verwendung von Maschinellem Lernen
- Accelerated Spherical k-means Clustering
- Analyse Alternativer Algorithmen für k-Means-Clustering
- Analysis and Evaluation of Approximate Nearest Neighbor Search using Hierarchical Navigable Small World Graphs
- Analysis and Evaluation of AnyDBC Clustering with Alternative Heuristics
- Compact Circular Fingerprints for Chemical Structures using Bloom Filters
- Entwicklung eines Greedy-Algorithmus zur Generierung elektrischer Niederspannungsnetze auf Basis öffentlich verfügbarer Daten
- Erstellung und Auswertung von Song-Embeddings mittels Word2Vec
- k-Means-Clustering mit k-d-Bäumen
Master Informatik, Master Data Science
- Vantage-Point Tree to HNSW: Fast Index Construction for Approximate Nearest Neighbor Search in High-Dimensional Data
- SQL Code Embeddings - Integrating Syntactical Information into Transformer Models
- Entwicklung und Evaluation schrankenbasierter DBSCAN Varianten
- On-Device Robot Localization Based on Representation Learning and a Feature-Rich Industrial Floor
- Beschleunigung von DBSCAN mithilfe der Dreiecksungleichung
- Fast Latent Dirichlet Allocation with Background Topics
- Evaluation of Feature Selection from Small Molecule Fingerprints by Means of Machine Learning
- Abstractive Text Summarization using a Reinforced Variational Autoencoder
- Efficient Approximate k-Nearest-Neighbor-Search in Large Graph Databases
- Automated Flexibility Coordination in Smart Grids Based on Neural Networks
- Leave-One-Out Optimierung von Support Vector Machines zur Ausreißererkennung
Bachelor Wirtschaftsinformatik
- no examples yet
Industry topics
I generally require the industry partner first establishes contact before promising topics to students, as many industry topic ideas do not satisfy the exam regulations. In particular, a non-disclosure agreement must be checked by the legal department, and this may take several weeks. Furthermore, I have negative prior experience with poor mentoring by industry partners. Ideally, your supervisor there is an alumni, who has previously supervised thesises – please ask your industry supervisor to first establish contact, resolve any legal contracts necessary, and reach an agreement on the topic supervision.