After analysing some of the tools available for interlinking data, we realised how much these tools differ. We therefore propose a framework within which they operate.
We present here a general framework encompassing the various approaches used to interlink resources on the web of data. This framework adapts to the different cases that can be encountered when two web data sets are interlinked. We will see how each of the studied interlinking tools finds its place in the framework.
Resources may be manually interlinked.
Resources can be trivially linked using a simple transformation of their URIs.
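As an illustration of linking by URI transformation, the following minimal sketch rewrites a namespace prefix and keeps only the rewritten URIs that actually exist in the target data set. The function name, the prefixes, and the example URIs are assumptions made for this illustration; they are not taken from any of the studied tools.

```python
def link_by_uri_rewrite(source_uris, target_uris, source_prefix, target_prefix):
    """Link resources whose URIs differ only by their namespace prefix.

    Hypothetical helper: each source URI is rewritten by swapping its prefix,
    and a link is kept only if the rewritten URI exists in the target set.
    """
    targets = set(target_uris)
    links = []
    for uri in source_uris:
        if not uri.startswith(source_prefix):
            continue  # the transformation does not apply to this URI
        candidate = target_prefix + uri[len(source_prefix):]
        if candidate in targets:
            links.append((uri, candidate))
    return links


if __name__ == "__main__":
    # Example URIs are invented for the illustration.
    source = ["http://a.example/id/42", "http://a.example/id/7"]
    target = ["http://b.example/resource/42"]
    for pair in link_by_uri_rewrite(
        source, target, "http://a.example/id/", "http://b.example/resource/"
    ):
        print(pair)
```

Checking the rewritten URI against the target set avoids producing links to resources that do not exist on the other side.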
Beyond URIs, it may be necessary to consider the ontologies in order to identify entities. In the first case, the two data sets to interlink are described by the same ontology. The role of the interlinking system is to analyse resources of the same type in order to detect the equivalent ones. To do this, the system compares resource properties using a similarity measure. Systems in this category take as input the properties to compare, the type of comparison algorithm to use for each property, and the method used to aggregate the similarity measures of the various properties into a single measure between two resources.
Data sets may be described by different ontologies. In order to know which types of entities have to be linked together, the system needs to know the correspondences between these types of entities. The system then works as if there were a single ontology.
Two approaches may be used to interlink the data sets. In the first approach, the alignment between the two ontologies is implicitly specified in the input of the interlinking system. We represent this case in the following figure by introducing the correspondences between ontology classes as an alignment. This alignment is described as implicit because it does not exist as such: it is embedded in the linking specification or in the data interlinking system.
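The three inputs described above (properties to compare, a comparison algorithm per property, and an aggregation method) can be sketched as follows. This is only an illustrative sketch: the weighted average, the use of Python's `difflib` as a string comparator, and the property names `name` and `born` are assumptions, not the method of any particular tool.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    """Approximate string similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact_match(a, b):
    """1.0 if the two values are identical, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


# Hypothetical specification: (property, comparison function, weight).
SPEC = [
    ("name", string_similarity, 2),
    ("born", exact_match, 1),
]


def resource_similarity(r1, r2, spec):
    """Aggregate per-property similarities with a weighted average.

    A property missing from either resource contributes 0 to the score.
    """
    total_weight = sum(weight for _, _, weight in spec)
    score = 0.0
    for prop, compare, weight in spec:
        v1, v2 = r1.get(prop), r2.get(prop)
        if v1 is not None and v2 is not None:
            score += weight * compare(v1, v2)
    return score / total_weight


if __name__ == "__main__":
    a = {"name": "Johann Sebastian Bach", "born": "1685"}
    b = {"name": "johann sebastian bach", "born": "1685"}
    print(resource_similarity(a, b, SPEC))
```

Two resources would then be considered equivalent when the aggregated score exceeds a chosen threshold; the threshold itself is a further parameter that this sketch leaves open.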
Consider two data sets, one described using FOAF, the other using VCard. The linking specification instructs the tool to compare entities of type foaf:Person with entities of type vcard:VC and, when comparing resources of these types, to compare the property foaf:givenname with vcard:fn and the property foaf:familyname with vcard:ln. This is an implicit alignment containing two correspondences.
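The FOAF/VCard case can be sketched as a linking specification in which the correspondences are written directly into the tool's input. The dictionary layout, the `rdf:type` key, the `interlink` function, and the exact-match comparison are assumptions for this illustration and do not reproduce the specification language of any studied tool.

```python
# Hypothetical linking specification: the alignment is implicit because the
# class and property correspondences exist only inside this specification.
LINK_SPEC = {
    "source_class": "foaf:Person",
    "target_class": "vcard:VC",
    "property_pairs": [
        ("foaf:givenname", "vcard:fn"),
        ("foaf:familyname", "vcard:ln"),
    ],
}


def normalise(value):
    """Lower-case a value and collapse whitespace before comparison."""
    return " ".join(str(value).lower().split())


def interlink(source, target, spec):
    """Return (source URI, target URI) pairs whose paired properties all match.

    `source` and `target` map a URI to a dict of property values.
    """
    links = []
    for s_uri, s in source.items():
        if s.get("rdf:type") != spec["source_class"]:
            continue
        for t_uri, t in target.items():
            if t.get("rdf:type") != spec["target_class"]:
                continue
            if all(
                p in s and q in t and normalise(s[p]) == normalise(t[q])
                for p, q in spec["property_pairs"]
            ):
                links.append((s_uri, t_uri))
    return links
```

A real tool would typically replace the exact comparison with a similarity measure and a threshold; the point here is only where the correspondences live.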
For example, OpenCyc represents the artist J.S. Bach using a different ontology than the one used to describe MusicBrainz. The properties ``firstname'' and ``lastname'' correspond to a property ``EnglishID'' in which both names are concatenated. The class MusicArtist in the Music Ontology corresponds to a class Classical Music Composer in OpenCyc. An alignment between classes and properties needs to be specified in order to find an equivalence between those two resources. This example is illustrated in the next figure:
Another approach takes advantage of an already existing explicit alignment between the two ontologies used by the data sets.
Each of the analysed tools fits into one of the categories of this framework, as shown in the following table:
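To contrast with the implicit case, the sketch below keeps the alignment as a separate object and derives the linking specification from it. The tuple representation of correspondences and the function `spec_from_alignment` are assumptions made for illustration; actual alignment formats carry more information (relation type, confidence) than is shown here.

```python
# Hypothetical explicit alignment, stored independently of any interlinking
# tool: each correspondence is (kind, source entity, target entity).
ALIGNMENT = [
    ("class", "foaf:Person", "vcard:VC"),
    ("property", "foaf:givenname", "vcard:fn"),
    ("property", "foaf:familyname", "vcard:ln"),
]


def spec_from_alignment(alignment, source_class):
    """Derive a linking specification for one source class from an alignment.

    Simplification: every property correspondence in the alignment is used,
    without checking that the property applies to the chosen class.
    """
    target_class = next(
        (t for kind, s, t in alignment if kind == "class" and s == source_class),
        None,
    )
    if target_class is None:
        raise ValueError(f"no class correspondence for {source_class}")
    property_pairs = [(s, t) for kind, s, t in alignment if kind == "property"]
    return {
        "source_class": source_class,
        "target_class": target_class,
        "property_pairs": property_pairs,
    }


if __name__ == "__main__":
    print(spec_from_alignment(ALIGNMENT, "foaf:Person"))
```

Because the alignment exists on its own, it can be produced by an ontology matcher and reused across several interlinking tasks.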
Category | System |
Manual link specification | |
URI correspondence | RKB-CRS |
Common ontology | LD-Mapper, ODD-Linker |
Different ontologies, implicit ontology alignment | RDF-AI, Silk |
Different ontologies, explicit ontology alignment | Knofuss |
This is the most general case, illustrated below, in which two web data sets are related using a method for comparing their resources.
We do not specify at this stage whether the method should be automatic or manual, nor whether the two data sets are described using a common ontology or by different ontologies. The result of the interlinking process is a set of owl:sameAs links between these resources.
The diagram above already shows how ontology matching and data interlinking could cooperate. Starting from these remarks, we can propose a way to make them collaborate.
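Whatever method is used, the output can be written down as plain RDF statements. The following minimal sketch serialises a list of links as N-Triples using the standard owl:sameAs URI; the function name and the example URIs are assumptions for this illustration.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def to_ntriples(links):
    """Serialise (source URI, target URI) pairs as owl:sameAs N-Triples lines."""
    return "\n".join(f"<{s}> <{OWL_SAME_AS}> <{t}> ." for s, t in links)


if __name__ == "__main__":
    # Example URIs are invented for the illustration.
    print(to_ntriples([("http://a.example/p1", "http://b.example/c1")]))
```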
François Scharffe and Jérôme Euzenat