Thesis Topics

The following thesis topics have been published (on JobTeaser):

  • A Wikibase Port with Support for RDF 1.2
    This thesis proposes a Wikibase port that supports RDF 1.2 by integrating a triplestore that natively provides RDF 1.2 features and querying capabilities. The goal is to adapt Wikibase’s storage and export pipeline so that RDF serialization and SPARQL access are grounded in an RDF 1.2–compliant backend rather than legacy assumptions. In addition to the integration work, the student will propose and justify a mapping from Wikibase’s data model (items, statements, qualifiers, references, and ranks) to RDF 1.2, ensuring faithful semantics and practical interoperability, and will validate the approach through benchmarks and/or representative use cases.
  • A Declarative Approach to Wikibase Population
    This thesis proposes a declarative approach to populating Wikibase, enabling users to describe what data should be created or updated in a Wikibase instance rather than prescribing how to perform each step. The goal is to design and implement a rule- or mapping-based method that translates high-level specifications into reliable edit operations (e.g., item creation, statement insertion, reference and qualifier management), with attention to idempotency, provenance, and error handling. The work will also include benchmarking and stress testing on realistic datasets, along with developing concrete use cases that demonstrate how the approach simplifies and scales Wikibase ingestion workflows in practice.
  • Declarative GenAI-Enabled Logical Iterators for RDF Generation in RML
    This thesis proposes extending the RDF Mapping Language (RML) with a declarative, GenAI-enabled extraction step that produces an iterable logical source for RDF generation. The goal is to define and prototype “GenAI logical iterators” with clear semantics for how inputs (prompts plus supporting files or entity lists) are transformed into record streams that RML can map into RDF, while ensuring reproducibility, provenance, and constraint-based validation of the generated triples. The work will be evaluated through concrete use cases, such as systematically tagging movie clips with controlled-vocabulary annotations and materializing the results as RDF, assessing quality, scalability, and cost.
  • A SHACL-DS Implementation in Python
    This thesis proposes a Python implementation of SHACL-DS, based on the ESWC-accepted paper “From RDF Graph Validation to RDF Dataset Validation with SHACL-DS” by Davan Chiem Dao and Christophe Debruyne. The goal is to build a practical validator that extends SHACL from single RDF graphs to full RDF datasets, making dataset-level constraint checking directly usable in real-world workflows. In addition to implementation, the work will include a benchmarking study to evaluate performance and scalability, and the identification, development, and validation of representative use cases that demonstrate where SHACL-DS provides clear benefits over graph-only validation.
  • A SHACL-DS Implementation in Java
    This thesis proposes a Java implementation of SHACL-DS, based on the ESWC-accepted paper “From RDF Graph Validation to RDF Dataset Validation with SHACL-DS” by Davan Chiem Dao and Christophe Debruyne. The goal is to build a practical validator that extends SHACL from single RDF graphs to full RDF datasets, making dataset-level constraint checking directly usable in real-world workflows. In addition to implementation, the work will include a benchmarking study to evaluate performance and scalability, and the identification, development, and validation of representative use cases that demonstrate where SHACL-DS provides clear benefits over graph-only validation.
  • Towards a Database of Syntactic Diagrams
    We are looking for an MSc student to contribute to the TraDiSyO project, a research initiative led by Prof. Mazziotta to understand the historical evolution and transmission of syntactic diagrams (DS). The student will work on: 1) Database design and implementation. This involves conceiving and building an accessible and extensible digital database for DS, capable of storing diverse metadata, including both the internal graphical characteristics and the external contextual information of the diagrams. The student will collaborate with project partners to determine the database’s modelling and implementation, aiming to create a robust tool for cross-referenced historical and heuristic research. 2) Model and tooling development. This involves developing models for structuring information about the diagrams’ internal and external features and creating tooling for the use and management of these models. The concepts behind these models must be kept separate from their views. This project offers an opportunity to apply knowledge engineering and database development skills within an interdisciplinary linguistic history context, laying the groundwork for significant future research. Ideally, students will have followed INFO9014 and/or INFO9016.
  • Evaluating the Overhead of Query Rewriting for Authorization of Linked Data Endpoints
    A research-oriented (applied), company-based thesis at redpencil.io, based in Brussels, Belgium (office located at Brussels Central Station).
  • Measuring the Impact of Semantic Graph Databases on a Linked Data Microservice Framework
    A research-oriented (applied), company-based thesis at redpencil.io, based in Brussels, Belgium (office located at Brussels Central Station).
  • A Tool to Support the Relational Database Design Process
    Current Entity-Relationship Diagramming (ERD) tools often blur the lines between conceptual, logical, and physical database design, leading to diagrams that are overloaded with detail and fail to communicate the concepts of the data model effectively. This conflation of design levels hinders communication between stakeholders, complicates model evolution, and can lead to inconsistencies between the intended design and the final database implementation. This thesis aims to develop a tool that supports the conceptual design of the ER model, respecting the principles and notation used in INFO0009, and its transformation into the logical model, the relational model, and any adequate visualizations thereof. Ideally, the student can also support (some) aspects of the physical design. The student will have an opportunity to validate their tool in INFO0009, provided that the implementation for the conceptual design is sufficiently advanced.
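
To give a flavour of the transformation step mentioned in the last topic above, here is a minimal sketch in Python of mapping a tiny conceptual model to relational DDL. Everything in it is an assumption made for illustration: the class names `Entity` and `OneToMany`, the function `to_relational`, the `TEXT` column type, and the naming of foreign-key columns are hypothetical and do not reflect the INFO0009 notation or any prescribed design for the tool. It only covers entity types and one-to-many relationship types; a real tool would have to handle much more (many-to-many relationships, weak entities, cardinality constraints, and so on).

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """A conceptual entity type: a name, its attributes, and its identifier."""
    name: str
    attributes: list[str]
    key: list[str]


@dataclass
class OneToMany:
    """A binary relationship type: each `many` instance relates to at most one `one` instance."""
    name: str
    one: Entity
    many: Entity


def to_relational(entities, relationships):
    """Map each entity type to a table and each one-to-many relationship
    type to a foreign key placed on the 'many' side (hypothetical naming)."""
    statements = []
    for e in entities:
        columns = [f"{a} TEXT" for a in e.attributes]
        constraints = [f"PRIMARY KEY ({', '.join(e.key)})"]
        for r in relationships:
            if r.many is e:
                # Foreign-key columns are named <referenced table>_<key attribute>.
                fk_cols = [f"{r.one.name.lower()}_{k}" for k in r.one.key]
                columns += [f"{c} TEXT" for c in fk_cols]
                constraints.append(
                    f"FOREIGN KEY ({', '.join(fk_cols)}) "
                    f"REFERENCES {r.one.name} ({', '.join(r.one.key)})"
                )
        body = ",\n  ".join(columns + constraints)
        statements.append(f"CREATE TABLE {e.name} (\n  {body}\n);")
    return statements


# Usage: a department employs many employees.
department = Entity("Department", ["code", "name"], ["code"])
employee = Entity("Employee", ["id", "name"], ["id"])
works_in = OneToMany("works_in", one=department, many=employee)

ddl = to_relational([department, employee], [works_in])
for statement in ddl:
    print(statement)
```

Keeping the conceptual model (`Entity`, `OneToMany`) apart from the generated DDL mirrors the separation of design levels that the topic asks for: the conceptual objects carry no types or foreign keys, which only appear during the transformation.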

  • Personal Topic on Knowledge Graphs and Declarative Data Integration
    Alternatively, students who have followed either INFO9014 (Knowledge Representation and Reasoning) or INFO9016 (Advanced Databases) are welcome to propose a topic on knowledge graph generation, knowledge graph management, knowledge graph organization, knowledge graph storage, or any other subject pertaining to data, databases, data modelling and data engineering. Students should come up with a concrete and well-defined thesis problem or topic. Students must be mindful that their proposals need to be discussed and approved. Students are encouraged to contact me for a meeting and are expected to come prepared.

If any of these topics spark your interest, feel free to reach out to me or my team for a short chat; we are happy to discuss the ideas, possible directions, and how they could align with your interests. We are also open to hearing your own thesis ideas and exploring overlaps together.