Data interlinking through robust linkrule extraction
This page provides the linkrule extraction program and the datasets used in the experiments that evaluate it.
Linkrule extraction software
The linkrule extraction algorithm is implemented in Java. It takes as input two RDF files (in either RDF/XML or Turtle), each describing only instances of a particular class, and optionally a third RDF file containing a set of owl:sameAs links that can be used when comparing the objects of properties.
The output is a set of candidate linkrules with the following statistics: #links, discriminability, coverage, h-mean.
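As an illustration of how the last statistic could relate to the two before it, here is a minimal sketch that assumes h-mean denotes the harmonic mean of discriminability and coverage. This reading is an assumption, not something stated on this page, and the class and method names (HMeanSketch, harmonicMean) are hypothetical; the extractor's actual code may differ.

```java
public final class HMeanSketch {

    /**
     * Harmonic mean of two scores in [0, 1].
     * Assumption: this is how discriminability and coverage are combined into h-mean.
     * Returns 0 when both scores are 0 to avoid dividing by zero.
     */
    public static double harmonicMean(double discriminability, double coverage) {
        double sum = discriminability + coverage;
        if (sum == 0.0) {
            return 0.0;
        }
        return 2.0 * discriminability * coverage / sum;
    }

    public static void main(String[] args) {
        // Example values chosen for illustration only, not taken from any experiment.
        System.out.println(harmonicMean(0.5, 1.0));
    }
}
```

Under this assumption, a candidate linkrule scores high only when both discriminability and coverage are high, since the harmonic mean is pulled toward the smaller of the two values.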
Download linkrule extractor software
Syntax:
java -jar linkrule-extractor.jar dataset1.ttl dataset2.ttl [object-owlsame-links.ttl]
Datasets
The two datasets describing communes: communes_insee.ttl and communes_gn.ttl
The owl:sameAs links between arrondissements (objects of some properties describing communes in both datasets):
links_arrondissements_insee_gn.ttl
Reference owl:sameAs links between communes:
links_communes_insee_gn.ttl
Reproduce Experiments
The following command runs the robustness experiments. It varies, from 0 to 0.9, the probability of removing instances, removing triples, scrambling triples, and removing reference links. For each probability value, 10 runs are performed and the reported results are the averages over these runs. The output files are the following:
- cov_instancerem.dat, cov_triplerem.dat and cov_scrambling.dat give coverage as a function of the probability
- disc_instancerem.dat, disc_triplerem.dat and disc_scrambling.dat give discriminability as a function of the probability
- prec_robustness.dat and rec_robustness.dat give precision and recall, respectively, as a function of the probability
- table_prec_rec.txt contains the table of generated linkrules and their different quality values
Syntax:
java -mx6000m -cp linkrule-extractor.jar fr.inrialpes.exmo.linkrule.EcaiExperiments communes_insee.ttl communes_gn.ttl links_communes_insee_gn.ttl links_arrondissements_insee_gn.ttl
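The degradation-and-averaging protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of EcaiExperiments: the class RobustnessSketch and its methods are hypothetical, the step of 0.1 between probability values is assumed (the page only gives the range 0 to 0.9), and a list of integers stands in for the triples of a dataset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

public final class RobustnessSketch {

    /** Returns a copy of items in which each element was dropped independently with probability p. */
    public static <T> List<T> removeWithProbability(List<T> items, double p, Random rng) {
        List<T> kept = new ArrayList<>();
        for (T item : items) {
            // nextDouble() is uniform in [0, 1), so an item survives with probability 1 - p.
            if (rng.nextDouble() >= p) {
                kept.add(item);
            }
        }
        return kept;
    }

    /** Averages a quality measure over several independently degraded copies of the data. */
    public static <T> double averageOverRuns(List<T> items, double p, int runs,
                                             ToDoubleFunction<List<T>> measure, Random rng) {
        double total = 0.0;
        for (int r = 0; r < runs; r++) {
            total += measure.applyAsDouble(removeWithProbability(items, p, rng));
        }
        return total / runs;
    }

    public static void main(String[] args) {
        // Stand-in data: 1000 integers play the role of triples.
        List<Integer> triples = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            triples.add(i);
        }
        Random rng = new Random(42);
        // Probabilities 0.0, 0.1, ..., 0.9 (assumed step), 10 runs each as on this page.
        for (int step = 0; step <= 9; step++) {
            double p = step / 10.0;
            // Placeholder measure: fraction of triples that survived the removal.
            double avg = averageOverRuns(triples, p, 10, kept -> kept.size() / 1000.0, rng);
            System.out.printf("%.1f %.3f%n", p, avg);
        }
    }
}
```

In the actual experiments the measure would be coverage, discriminability, precision or recall computed on the extracted linkrules; the placeholder here only shows the shape of the loop that produces one averaged value per probability.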