The following lectures and exercises are based on the material I have developed for the course on Open Information Systems at the Vrije Universiteit Brussel. I was the lecturer from 2013-2014 to 2015-2016 (three years), and in 2019-2020. Some of the material presented here has also been repurposed and refined for public seminars.
All work is licensed under a CC BY-NC-SA 4.0 International License, except where explicitly mentioned otherwise.
The first in a series of lectures on knowledge representation on the Web. We introduce the notion of "ontology" and its role in so-called "open information systems". We conclude by introducing ontologies for the Semantic Web.
In this lecture, we will cover both RDF and RDF(S). The former is a data model; the latter is a simple ontology language built on top of that model. Graduate students are assumed to be familiar with XML and XML Schema, but that assumption is not necessary for the lectures presented here.
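To give a flavor of the two, here is a minimal sketch in Turtle (the names in the `ex:` namespace are illustrative, not part of any standard): a plain RDF triple asserting a fact, and an RDF(S) statement adding lightweight schema information on top of the same data model.

```turtle
@prefix ex:   <http://example.org/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# An RDF triple: subject, predicate, object
ex:Alice rdf:type ex:Student .

# RDF(S) vocabulary describing the schema itself
ex:Student rdfs:subClassOf ex:Person .
```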
Remember, RDF is a data model. Ontologies require ontology languages (such as the Web Ontology Language, described later on), for which adequate tools exist. For RDF, however, most tooling "merely" performs syntactic checks rather than semantic checks.
Two fairly "fancy" editors with support for RDF (via plugins) are Visual Studio Code and Atom. The former might be more lightweight, easier to install, and proposes the installation of plugins upon opening or saving files of a particular type.
Turtle syntax highlighting in Visual Studio Code
RDF(S) is too weak if we want to adequately capture our Universe of Discourse in an ontology. The Web Ontology Language (OWL) provides a language that allows us to do just that. Description Logics (DL) provided the foundations of OWL. To understand OWL in the next lecture, we need to understand DLs. In this lecture, I introduce important concepts of DLs: vocabulary, grammar, and semantics; the Open World and Unique Name Assumptions; and reasoning tasks with the Tableau Algorithm.
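As a taste of the DL notation covered in this lecture, a TBox can define a concept in terms of others, and an ABox asserts facts about individuals. A small sketch (the concept and role names are illustrative):

$$
\begin{aligned}
\textit{Parent} &\equiv \textit{Person} \sqcap \exists\,\textit{hasChild}.\textit{Person} && \text{(TBox axiom)}\\
\textit{Person}(\textit{alice}),\quad &\textit{hasChild}(\textit{alice}, \textit{bob}) && \text{(ABox assertions)}
\end{aligned}
$$

Under the Open World Assumption, a reasoner may infer $\textit{Parent}(\textit{alice})$ only if it can establish $\textit{Person}(\textit{bob})$; the absence of that assertion does not mean it is false.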
RDF(S) is too weak if we want to adequately capture our Universe of Discourse in an ontology. The Web Ontology Language (OWL) provides a language that allows us to do just that. This lecture covers the syntax and capabilities of the Web Ontology Language. We then demonstrate some of OWL's capabilities in the Protégé ontology editor.
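As a small illustration of what OWL adds over RDF(S), here is a sketch in Turtle defining a class by an existential restriction (the `ex:` names are illustrative):

```turtle
@prefix ex:  <http://example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# ex:Parent is defined as exactly those things with at least one child that is a Person
ex:Parent a owl:Class ;
    owl:equivalentClass [
        a owl:Restriction ;
        owl:onProperty ex:hasChild ;
        owl:someValuesFrom ex:Person
    ] .
```

Such definitions go beyond RDF(S)'s subclass and domain/range vocabulary: a reasoner can use them to automatically classify individuals and detect inconsistencies.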
Now that we know how to represent UoDs in ontology languages and represent resources with RDF, we will explore how we can interrogate RDF with the SPARQL query language. We will focus on retrieving information. Updating RDF datasets with SPARQL is left as an exercise in the project.
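A minimal example of the kind of retrieval query covered in this lecture — a SELECT query matching a basic graph pattern (the `ex:` vocabulary is illustrative):

```sparql
PREFIX ex: <http://example.org/>

SELECT ?student ?name
WHERE {
  ?student a ex:Student ;
           ex:name ?name .
}
LIMIT 10
```

The WHERE clause is itself a graph with variables; every subgraph of the dataset that matches the pattern yields one row of bindings in the result.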
Now that we have covered some of the key technologies in the Semantic Web stack, we will cover one realization of the Semantic Web: Linked Data. Linked Data is both the name of an initiative and the name of a set of best practices and guidelines. These best practices and guidelines allow one to publish data on the Web in an interlinked and distributed manner, effectively creating a Web of Data.
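The interlinking at the heart of those best practices boils down to using dereferenceable HTTP URIs and asserting links to resources published elsewhere. A minimal sketch (the `ex:` URI is illustrative; the DBpedia URI is real):

```turtle
@prefix ex:  <http://example.org/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Linking a locally published resource to the same resource in another dataset
ex:Brussels owl:sameAs <http://dbpedia.org/resource/Brussels> .
```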
In this lecture, we cover two W3C Recommendations for generating RDF from relational data. The first is Direct Mapping, which generates RDF whose structure mirrors that of the relational database. The second, and the focus of this lecture, allows us to declare how data from relational databases should be transformed into RDF.
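To make the declarative approach concrete, here is a sketch of a mapping in the style of the second Recommendation, R2RML (the table and column names `STUDENT`, `ID`, `NAME` and the `ex:` vocabulary are assumptions for illustration):

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/> .

<#StudentMap> a rr:TriplesMap ;
    # Which relational data to map
    rr:logicalTable [ rr:tableName "STUDENT" ] ;
    # How to mint subject IRIs and type them
    rr:subjectMap [
        rr:template "http://example.org/student/{ID}" ;
        rr:class ex:Student
    ] ;
    # How to generate predicate-object pairs from columns
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rr:column "NAME" ]
    ] .
```

Unlike Direct Mapping, the engineer decides here which vocabulary the generated RDF uses, independently of the database schema.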
In this lecture, we cover the validation and verification of ontologies and knowledge bases, which are important aspects that need to be evaluated in the ontology engineering process. We will cover several dimensions that can be assessed and take a closer look at SHACL, a W3C Recommendation for declaring the constraints that RDF datasets should comply with.
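A small sketch of a SHACL shape, to show what such declared constraints look like (the `ex:` vocabulary is illustrative): it requires every instance of `ex:Student` to have at least one string-valued name.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .

ex:StudentShape a sh:NodeShape ;
    sh:targetClass ex:Student ;           # applies to all instances of ex:Student
    sh:property [
        sh:path ex:name ;
        sh:minCount 1 ;                   # the name must be present
        sh:datatype xsd:string            # and must be a string literal
    ] .
```

Note the contrast with OWL: a SHACL engine reports missing names as violations, whereas under the Open World Assumption an OWL reasoner would not.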
Ontology Matching is the process of identifying correspondences between entities in ontologies (and datasets) followed by the process of creating a mapping. Mappings capture how those entities are related (e.g., equivalence). Several types of mappings exist: mappings to integrate ontologies, mappings to transform RDF datasets, and so on. In this lecture, we provide an overview of ontology matching as a problem in ontology engineering.
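Equivalence mappings of the kind mentioned above can be expressed directly in OWL. A sketch, assuming two illustrative ontologies `uni:` and `hr:` that model the same domain with different names:

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix uni: <http://example.org/university#> .
@prefix hr:  <http://example.org/hr#> .

# Correspondences between classes and properties of the two ontologies
uni:Student    owl:equivalentClass    hr:Learner .
uni:enrolledIn owl:equivalentProperty hr:registeredFor .
```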
In this lecture, we cover a couple of standardized or well-established vocabularies for representing the provenance (i.e., the origin) and characteristics of datasets. As these vocabularies are themselves expressed in RDF, provenance and dataset descriptions can be stored and engaged with together with the data.
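As an example of such metadata expressed in RDF, here is a sketch combining the W3C's DCAT and PROV-O vocabularies (the `ex:` resources and the title are illustrative):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .

# A dataset description and its provenance, stored alongside the data itself
ex:dataset-v2 a dcat:Dataset ;
    dct:title "Student records (cleaned)" ;
    prov:wasDerivedFrom ex:dataset-v1 ;
    prov:wasAttributedTo ex:DataTeam .
```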