Legal datasets integration: keep it simple, keep it real

Tracking #: 462-1639

Gioele Barabucci
Angelo Di Iorio
Francesco Poggi

Responsible editor: 
Guest editors Semantic Web 4 Legal Domain

Submission type: 
Full Paper
Abstract:
Governments and public institutions are increasingly publishing their documents and data in freely available knowledge-bases, often as Linked Data silos. One of the goals is to connect and integrate heterogeneous legal sources, by exploiting Semantic Web technologies. Very often such integration is partial and difficult, due to the heterogeneity (and some conceptual flaws) of the datasets. In this paper we present ALLOT, a lightweight legal ontology based on the Akoma Ntoso ontological model, and we focus on how ALLOT can be used to align different datasets and to query all of them as a whole. The minimalistic approach of ALLOT makes dataset designers and users work with simple but effective concepts and queries.

Solicited Reviews:
Review #1
Anonymous submitted on 26/May/2013
Major Revision
Review Comment:

The paper has certainly improved since the previous version.

However, some more improvements would be useful before publication.

At least:
- The selection of the top level classes should be better justified. In fact, they are rather general.
- Clarify whether other ontologies are intended to be able to reuse ALLOT, and whether it could be used in applications other than the one presented.
- Justify whether the three cases used (Italy, Brazil and UK) are relevant enough for the analysis.
- Conclusions seem poor. The future work should be made explicit, including what would be the next objectives of the work.

Review #2
By Rinke Hoekstra submitted on 23/Oct/2013
Review Comment:

This paper presents ALLOT, a legal ontology based on the Akoma Ntoso model that can function as a single point of entry for multiple legal Linked Data sets.

In my review of the previous version of this paper [1], I highlighted the following points:

1. Convince the reader of the need for the ontology: does it solve a real problem that is not solved by any existing work?
2. Reference to ontology-categorizations from the literature was missing.
3. Long term content preservation arguments (RDF vs XML) were off the mark.
4. Section on naming convention was superfluous.
5. Relation between section on ALLOT and the introductory sections was not clear.
(6. Discussion on contextuality and time-dependence is interesting, and could be broader.)

The introduction has been rewritten almost from scratch, but could use some fine-tuning with respect to language and structure (see below). The section on "legal and para-legal ontologies" has been shortened and partially rewritten. Unfortunately, it still does not really refer to earlier work on ontology categorizations. Also, the section does not explain why the focus on the relation to "external entities", i.e. the distinction between document-centric, content-centric and integration-centric ontologies, is a useful one, and what the relation with ALLOT is. If this is meant as a section on related work, then a better position would be at the end of the paper, with more focus on explaining the relation between ALLOT and the other ontologies (+ the need for the distinctions made here). (point 5 above)

The integration-centric ontology category described in section 2.3 is perhaps the category I find most problematic. It seems that the strategy of integrating existing ontologies in your new ontology is more *methodological* in nature than a reflection of a design decision on the types of things described by the ontology. Indeed, these efforts are typically a bit more modern than the other ones, since they are also more feasible with the growing number of eligible datasets and vocabularies available today.

Section 3 is still not really up to scratch. It needs to explain why the authors chose Akoma Ntoso and not the (competing) MetaLex standard. This is not argued in the preceding sections. Section 3.1 still argues that it is apparently better to have an informal description of the TLCs than a formal one: "clues about the intended meaning of a marker even in the unfortunate case the formal ontology is no longer available". This holds in the reverse direction as well: "clues about the meaning of a marker even in the unfortunate case the informal description is no longer available". Why not have both? (point 3 above) Section 3.2 needs more references to accepted practice in Linked Data land (e.g. Cool URIs etc.). With the current description it is wholly unclear to the reader what this naming convention adds on top of the resolvable URIs of Linked Data. Also, why the need for the URI scheme? In RDF, the type of any entity (global name) is preferably expressed explicitly using an rdf:type relation with its type. Indeed, it is considered good practice by many to also include type information in the URI (if only to ensure the minting of unique URIs), but this is neither mandatory nor necessary (URIs do not carry semantics).

The main goal of the ontology is to allow integration between various datasets. However, you do not discuss the drawbacks of using ALLOT as an entry point. Using only ALLOT means that there are no direct links between the existing ontologies. Wouldn't it be more worthwhile to define mappings between the various ontologies directly? That way you do not have to introduce yet another ontology. A way to solve this is to map not just to ALLOT, but also from ALLOT to other ontologies. That way, ALLOT can truly function as an interlingua between the other ontologies. The reason why this is important is that the other ontologies reflect existing, accepted standards, whereas ALLOT is a new kid on the block. Wouldn't you expect more interoperability and compatibility if you acknowledge the possible existence of implementations that already adhere to those existing standards? And following this line of thought, why does introducing yet another standard solve the interoperability problem? You are expecting developers to adopt a new standard in favour of older ones without giving concrete advantages in return. Compatibility is surely a nice-to-have, but it is certainly not critical to most applications.

More specifically on the mappings themselves, it is unclear to me what criteria you use for deciding which ontology/vocabulary to use in the implementation layer (e.g. LKIF, FOAF) and which you use in the integration layer (MetaLex, OCD etc.). The decision appears to be quite arbitrary. Also, when you discuss the problems of some of the SKOS-based mappings, you do not really explain *why* they are problems, and why your solution (using the SKOS mappings) is the best. What exactly is different between metalex:Event and allot:Event, and what caused this difference? Is it because of the inclusion of some of the ontologies in the implementation layer, or is there some other reason? Akoma Ntoso does not specify that they *have* to be different. Furthermore, to the best of my knowledge, the events in MetaLex and LKIF Core are equivalent and you seem to have been using LKIF Core for the temporal parts in ALLOT. As for the role example, this is also not clear to me. Roles in LKIF Core are most certainly constrained to a context (in whatever form). So why can you not use them directly? Secondly, the need for SKOS mappings is unclear to me. Looking at the examples you give, wouldn't using subclass relations be a better way to capture the broader and narrower relations you describe? Also, you seem to only map classes, not properties. Is there a specific reason for this? Finally, how do you plan to intermix the use of SKOS-based mappings and the SPARQL-based ones in practice?

In the discussion you briefly hint at the problems surrounding the representation of provenance of legal statements. Have you considered using the PROV-O standard for doing this?

Finally, looking back at the points I brought forth with respect to the previous version of this paper, I must conclude that the authors did not fully address these issues.

ad 1. The authors still do not bring forth a convincing story for the need of ALLOT. It would certainly help if they could describe an actual use case and application that depends on the bridging of the various ontologies. I do see the amount of effort invested in defining the mappings, but this effort is moot if there is no clear use for them. Being able to query across datasets is a nice feature, but when is it necessary? A related question is what makes this ontology a *legal* ontology (the TLCs are not specific to Law at all): how did the choice of the legal domain influence the design decisions you made with respect to the development of ALLOT and the mappings you defined? Would the mappings work in another domain as well?

ad 2. There is still hardly any reference to, or discussion of, the earlier literature on ontology categories. Nor is there a clear reason why these categories need to be described here, as there is little to no relation between the section and the actual discussion of ALLOT.

ad 3. The long term preservation argument is still there. If this argument is not that of the authors, but that of the Akoma Ntoso developers, then the authors should make this clear. Otherwise, it needs to be a whole lot more substantial, or just left out.

ad 4. The section on the naming convention and URI resolution still is not needed. It purely serves as an introduction to the Akoma Ntoso specifications, but this intro is not needed for the sections on ALLOT itself. Also a reference to and discussion of practices in Linked Data land is missing.

ad 5. Given the comments above, the links between the ALLOT section and the introductory sections are still lacking.

Concluding, I still think the work presented here is not mature enough to warrant publication as a journal paper. The scientific contribution is limited or at least unclear. There is little to no reference to similar work in ontology mapping applied to other domains, and design decisions are not sufficiently backed by discussion. Also, the application of the ontology is not explicit enough (why is it needed?), and the paper does not contain any evaluation of the quality of the solution or of the practical use of the ontology.

Smaller nitpicks:


* "Hansard" is only used in countries of the Commonwealth, consider a more generic term such as parliamentary transcripts.
* "Linked Data silo" is a misnomer as Linked Data is specifically designed to allow breaking the walls of data silos. It allows reuse of identifiers (resources) from other databases. The term 'data silo' is typically used to illustrate the problem Linked Data is to solve. Perhaps use a more neutral term (triple stores, datasets, repositories)
* "Linked Data silos [26] make it also possible to connect such legal data, by exploiting Semantic Web technologies. " -> Strange sentence: Linked Data = Semantic Web without a lot of schema-information & reasoning.
* "Much more meaningful information will be in fact available to the users." -> in fact be
* "Metalex and LKIF ontologies[13], for instance, can be successfully adopted for this task." -> *The* MetaLex and LKIF ontologies + [13] is not the best reference to the LKIF Core ontology.


* introduce what "TLCs" stands for before you use the abbreviation
* Why does compliance to Akoma Ntoso and the TLCs and formalization in OWL make ALLOT fully *interoperable* with Linked Data?
* "peculiar to certain field" -> "specific to a certain field" (peculiar is more like "awkward" than "specific")
* I do not feel that the construction and integration of legal ontologies is still a hot research topic, the discussion seems to have dissipated significantly over the past 5 years.


* "Some document-centric are domain-specific. " -> document-centric ontologies
* "CLO is strongly based on the rich axiomatization and reification provided by DOLCE, and is an extremely powerful logically-sound framework." This juxtaposition of CLO and LKIF Core is not entirely fair. While it is true that DOLCE and CLO have a more philosophical perspective, and DOLCE uses a more expressive language for its formalization, the LKIF Core ontology is just as "logically sound" as DOLCE and CLO, as it is expressed in OWL 2 DL.
* You mention LRI Core, but do not cite it anywhere.
* "Notice that most of the ontologies we mentioned falls exclusively in one category." -> I gather that you mean "do not fall exclusively"


* "ALLOT can also used to bridge KBs" -> be used
* The versions of ALLOT published at the URL are in OWL/XML format; this is not an RDF format, and therefore not really Linked Data.


* Please refer to a paper about LKIF Core as you do with the other ontologies


* On pages 10, 13 and in the conclusion you briefly discuss the problems surrounding the representation of context. This matters especially in light of the compatibility between ALLOT and LKIF Core, and the dramatic increase of triples in the knowledge base if you make the context of every role explicit. Have you given any thought to using named graphs for this issue?


* What syntax are you using? Is it Manchester syntax? Why not use Turtle, or a graphical representation? (Fig. 4, 5, etc.)


* "fuzzy" has a very precise meaning. Are you sure you want to use it here?


* Figure 6, why are the property values annotations and not facts?


* "a few" -> "few"


Review #3
By Miriam Fernandez submitted on 12/Nov/2013
Major Revision
Review Comment:

This paper presents ALLOT, a new ontology based on the Akoma Ntoso model used to map different ontological models and to uniformly query data from different knowledge bases.

After carefully reading the new version of the paper, as well as the previous reviews, it seems to me that the current version of the paper still does not address some of the key concerns previously raised by the reviewers:

First of all, the need for ALLOT is still not clear. While the authors present a thorough state-of-the-art review of ontologies in the legal domain (section 2), it is not clear how this work differs from the previous ones, i.e., what is the key contribution of the proposed ontology versus the existing ones? The authors claim that the ALLOT ontology can serve as a uniform data model, to which other ontologies can be mapped to integrate information. However, it is not clear why the existing ontologies cannot be used as a common data model, and what key need ALLOT is trying to address.

The paper does not present any evaluation. While section 6 presents a use-case scenario where the authors try to highlight the simplicity of querying data with ALLOT vs. using other ontologies, this scenario constitutes a very particular information need. A more complete evaluation would be desirable, in which the quality and performance of ALLOT versus other ontologies are tested using multiple information needs collected from stakeholders.

While the authors present a clear set of lessons learned, I am missing a discussion of issues such as:
- Synchronisation of data across the different knowledge bases
- The existence of incomplete information when using CONSTRUCT to populate the new ontology (the authors do not seem to use optional arguments at any stage of the population process)
- The fact that static values are used during CONSTRUCT (such as the legislature 54). Is this the only legislature in the Brazilian parliament dataset?

Minor details
The abbreviation TLCs is used before being introduced (page 2), same with CLO (page 4) and FRBR (page 11)
Some definitions have been -> some definitions HAVE been (page 9)
YAGO 2[24]is -> YAGO 2[24] is (missing space after ] ) (page 12)
The data contained in original dataset -> the data contained in THE original dataset (page 17)
Was that these the datasets -> was that these datasets (page 18)