TermIt: Managing Legal Thesauri

Tracking #: 3239-4453

Authors: 
Petr Křemen
Michal Med
Miroslav Blasko
Lama Saeeda
Martin Ledvinka
Alan Buzek

Responsible editor: 
Guest Editors Tools Systems 2022

Submission type: 
Tool/System Report
Abstract: 
Thesauri are simple enough to be understood by domain experts yet formal enough to boost use cases like semantic search. Still, linking the meanings of thesauri concepts to their definitions in source documents, interlinking concepts across thesauri, and keeping the set of concepts semantically consistent and ready for subsequent conceptual modeling require appropriate tools. We present TermIt, a web-based thesauri manager addressing these issues. We show a scenario in the urban planning domain, and compare TermIt to other tools in this scenario. Next, we evaluate TermIt features and usability and discuss its impact beyond the original scenario.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Harshvardhan J. Pandit submitted on 16/Oct/2022
Suggestion:
Minor Revision
Review Comment:

# summary

The article describes the TermIt tool for managing 'legal thesauri' using SKOS and UFO (the Unified Foundational Ontology), along with its application in a Czech building-domain use-case. The tool information has been provided clearly, and the linked resource is both available and functional. The overall impression is that this tool/system has been developed well and has achieved usefulness and impact in its use-cases. In conclusion, I recommend the paper be accepted following some minor changes to the presentation of information and the provision of additional details regarding implementation, impact, and evaluation, as stated below.

(1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided).

- The article clearly articulates the necessity of semantic disambiguation of concepts with examples from legal tasks in Sec.1 and Sec.2. IMHO, this only covers the problem and does not provide enough insight into how TermIt helps with it. Given the short length of a systems paper, I suggest removing the use-case in Sec.1 (viz. construction) and using Sec.2 to provide the scenario with an example as well as how a tool might address the issue OR how TermIt specifically addresses it using SKOS and UFO. (Note: my comment only concerns providing information on how a tool would help with the highlighted problems. The authors should best decide what flow/structure they want to present.)

- Having worked on legal documents and vocabularies for some time, I can understand the importance and impact. The Sec.7 information on impact only mentions use in the scenario from Sec.2 and two additional ones. IMO, the impact and use-case sections should provide more information on all three use-cases together and be more specific about how TermIt has changed practices and/or helped with the tasks. This will help give a holistic view of what/where the tool is being used, by whom, and what the actual impact is, beyond stating 'it is adopted'.

- The stated result that TermIt is actively used by a significant number of users is great and definitely shows impact. Some more details would be nice to drive the point home - e.g. links to the use-cases or projects, especially detailing the use of TermIt; or a public page about the webinars conducted; or public resources being created using TermIt (footnotes 14 and 15 relate to this, but it can be highlighted as being more significant than mere 'users'). In addition, some information on usage so far would help: whether any feedback was collected about using the tool, issues/improvements identified, plans for incorporating them, continuing development, etc. The last points relate to whether the tool is still planned to be developed or whether the work has 'concluded'.

(2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess

- The paper is well written, understandable, and information links are provided to relevant resources.

(A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data,

- Resources are well organised and documented (which is an important indication of 'mature' work).
- The long-term resource link provided is for the docker image, which is okay-ish, but it would be better to also highlight (or at least mention) the nice pages for installation and tutorials in the paper itself: https://kbss-cvut.github.io/termit-web/

(B) whether the provided resources appear to be complete for replication of experiments, and if not, why,

- I confirm that I accessed the resources and they were executable and functional.

(C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and

- GitHub is fine, IMO. However, the repo will always point to the latest commit, without a way to identify what version was "described" in this paper. Ways to do this include creating a release (with an explicit tag) or uploading the tool archive to an archival repository (e.g. Zenodo).

(D) whether the provided data artifacts are complete.

- Yes.

# other major comments

- A screenshot (or a figure with multiple screenshots) would be nice since this is a tool paper.
- Identifiers can be made more visually recognisable, e.g. Sec.4.3 can use brackets as (g2) or g2) or use a table.
- Sec.6.1 on tool comparison mentions discarding open-source tools that have not been updated for more than 2 years. This is bad practice IMHO. First, why/how did the authors decide on 2 years specifically? Second, 2 years is too low a bar to exclude something. SKOS is a standard which has been around for a while, which means there may be tools that have been developed for it and are relatively stable/mature, but were excluded from this list. Third, how many such (excluded) tools were there? Or was there no research on them at all - and only tools that were recently updated were looked at?
- In continuation, why were other similar practices and tools used for legal documents not considered? For example, on individual topics such as SKOS management or document annotation. The EU Publications Office does a lot of similar work on publications using metadata, where such duplicate concepts can be easily seen (e.g. search for concepts on IATE, https://iate.europa.eu/). They use VocBench. Similarly, there is SKOS Play, which only does SKOS-related documentation and is commonly known/used. It would be helpful to understand where exactly TermIt fits within this ecosystem - does it replace all these tools, or does it fit among them, i.e. can some other tool be used for a feature such as documentation? This also relates to the requirements (F1-F5) described in the paper.
- Sec.5 Related Work is too small to be effective. Also, Sec.6.1 provides more information on related tools. Perhaps the contents of Sec.5 can be integrated with Sec.6.1 to make the related work more comprehensive as well as continuous?

# minor things

- In the interest of saving space if needed to keep the 10-page limit: Fig.2 UFO types can be removed; the triple on pg. 4 line 24 can be made inline or preferably just expressed as plain text; Sec.6.2 user evaluation details can be shortened.
- "SKOS compliant" is an odd phrasing because SKOS is not a protocol or a specification, but a vocabulary description. If SKOS set out some requirements that TermIT implements, then the sentence can be understood as meaning these requirements are implemented correctly and the resulting output is 'compliant' with those requirements. Perhaps "SKOS compatible" or "SKOS adherent" may be more accurate?

Review #2
By Patricia Martin-Chozas submitted on 29/Nov/2022
Suggestion:
Major Revision
Review Comment:

This tool paper presents TermIt, an application to manage legal thesauri. In general, the sections of the paper are appropriate - the ones expected for a tool paper. However, the writing style can be greatly improved. For instance, the abstract is direct and brief, scarce in detail, and it presents strong subjective assumptions, such as that thesauri are simple. Why are thesauri simple? Also, in the abstract, the tool is presented in a very abstract manner; it should be made explicit what the tool does.

Similarly, the first sentence of the introduction is not appropriate for the beginning of a paper. “Consider two sentences” is an order to the reader, which, in my opinion, is not the best way to introduce a topic. Then, all of a sudden, the introduction shifts the focus from urban planning to legal acts, but it is not explained why. Why are legal acts relevant for this paper? Are the authors trying to solve problems related to legal documentation within a project? More context is required here.

After this, SKOS is introduced, but there is no connection between this paragraph and the previous one. Paragraphs present independent ideas and are not connected to one another. This should be improved, since it makes the paper difficult to understand.

P1-L42 presents another strong assumption: “thesauri are not enough and ontologies are needed”. The purposes of thesauri and ontologies are different and cannot be compared; they are complementary.

P1-L48 mentions a previous paper. The next sentence says “this paper focuses on the practical impact…” Does this sentence refer to the previous paper or the current one? It is ambiguous. If the tool was already presented, what are the differences between the two papers? They should be mentioned.

Section 2 presents the motivating scenario for the development of the tool, but it is not clear why the authors are studying documents such as norms and decrees. Again, is it a national/European project? What are the tasks that you are trying to solve?
Also, it would be better to add a legend to Figure 1, exemplifying the meaning of the arrows and boxes (such as this one), instead of describing it in the caption.

Section 3 is supposed to present the background of the paper. I would expect something like previous work by the authors on the topic, the beginning of the research, but it only contains references to the vocabularies used by the application: SKOS and the UFO ontology. This section also mentions XKOS, but why is this extension relevant to the paper? Why mention this extension and not SKOS-XL, for instance?

Section 4 presents the architecture of TermIt. It is surprising that several components, represented by ovals, are depicted in Figure 3, but only three of them (thesauri management, document and web annotation, and quality checking) are described throughout the section. Why is this? Also, since this is a tool report, I miss more detail both in the figure and in the section in general. For instance, which external services are reused and which were developed for the application? It would also be convenient to indicate the input and output data of each component. Another issue that should be tackled in this section is the requirements of the application. What are the (minimum) requirements that this tool should cover?

Section 5 presents the related work, which in my opinion is scarce. This section includes some tools for thesauri and other types of datasets in Semantic Web formats, such as VocBench, PoolParty or TopBraid. I think it should also include related work that presents experiments in thesauri management and document annotation in Semantic Web formats that may not yet have materialised in applications/tools.

In the evaluation I miss some graphics representing the results and examples of the questions in the form. Sections 7 and 8 seem fine, in my opinion.

Minor remarks:

P2-L6: “section” and “Section” need to be unified.
P3-L38: the code excerpt should be presented inside a Listing and referenced in text.
P4-L24: the code excerpt should be presented inside a Listing and referenced in text.

Taking all the comments into account, I think that the tool could be useful for legal documentation management, but its presentation and description in the paper could be greatly improved. For this reason, I suggest a major revision of the document.

Review #3
Anonymous submitted on 09/Jan/2023
Suggestion:
Major Revision
Review Comment:

The paper presents a tool to support the management of thesauri. The main use case that the work focuses on is urban planning. The work is somewhat original. However, its presentation in the paper should be improved significantly. I suggest the authors review the structure of similar work published in the journal. The paper lacks motivation for the work and a systematic review and analysis of existing work (i.e. its purpose, scope, and limitations). This would help better convey the significance of the authors' work. The tool's documentation on GitHub is sufficient.

Specific comments:
- The abstract is concise but too vague. It does not provide specific details about the tool's functionalities and why it should be used.
- The introduction should better motivate the work that has been done.
- One of the contributions mentions validation and quality checking using the UFO ontology. It should be clearer whether the work focuses on ontologies and/or thesauri. Further, I would not consider the validation of the work a contribution; its findings, on the other hand, could be.
- Section 3 should start with a brief introduction of what is presented. The introduction of SKOS should be moved to the introduction or the related work.
- Section 5 should present an overview of related work, its scope, purpose, and limitations. Currently, the existing tools are only listed. Why are these tools not able to do what TermIt does? What are their limitations with regard to the main use case?
- Section 6 should start with a brief presentation of the evaluation and its methodology. Why was it selected and how was it applied? The authors should think of a way to better present the evaluation results. This could be done with tables and graphs.
- The online documentation mentions that TermIt is based on an ontology. This should be better elaborated in the paper.

Other comments:
- On page 1, line 39, SKOS should be spelled out as it is mentioned for the first time in the text.
- On page 1, line 40, what other SKOS management tools exist?
- On page 1, line 45, SKOS-compliant should be elaborated.
- On page 2, lines 14-16, missing references to IPR, PBR and MPP.
- On page 3, line 47, XKOS should be spelled out.
- On page 5, line 47, there is a missing space in the brackets.
- References should be checked for consistency. References 8 and 13 have different formats.

Review #4
By Sabrina Kirrane submitted on 16/Jan/2023
Suggestion:
Major Revision
Review Comment:

This submission describes a tool for managing legal thesauri entitled TermIt. The paper starts by describing a scenario that motivates the need for legal thesauri management and presenting the necessary background information with respect to SKOS and UFO. Following on from this, a high-level overview of the TermIt architecture is provided and TermIt is compared to other tools that offer similar functionality. Finally, the effectiveness of the tool is assessed via a user experience evaluation.

This work tackles an important problem, the paper is well written and easy to follow, and the tool has been evaluated in a real-world context. That being said, the paper has several limitations in terms of quality, importance, and impact.

It is well known that legislation is vague and ambiguous and contains not only explicit references but also implicit references that are difficult to interpret without proper legal training. Even with proper legal training, it is not unusual for several lawyers to arrive at several different interpretations. Thus, I would be interested in knowing more about the suitability of the proposed approach for handling such cases. For instance, I would be particularly interested in knowing more about the coverage of the tool and whether the ‘TermIt’ team have succeeded in identifying all ambiguities in a particular piece of legislation or legal domain.

If I understand correctly, users can manually annotate documents, and additionally a tool called ‘Annotace’ can be used to automatically detect and annotate document concepts. Here I wonder what the effectiveness of both the manual and the automated annotations is. In the case of the former, have the annotations been validated by legal professionals? In the case of the latter, is there a gold standard that can be used to assess the precision and the recall of the concept identification and annotation?
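
To illustrate what I have in mind (a sketch of my own, not taken from the paper or from Annotace): given a gold standard of manually confirmed annotations, each represented as a (document, text span, concept IRI) triple, precision and recall of the automatic annotation could be computed roughly as follows; the sample data below is invented.

```python
# Rough sketch: precision/recall of automatic annotations against a gold
# standard. Annotations are (document_id, text_span, concept_iri) triples;
# the documents and IRIs below are made up for illustration.

def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Return (precision, recall) of predicted annotations w.r.t. the gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

gold = {
    ("doc-1", "building plot", "https://example.org/term/building-plot"),
    ("doc-1", "public space", "https://example.org/term/public-space"),
}
predicted = {
    ("doc-1", "building plot", "https://example.org/term/building-plot"),
    ("doc-1", "plot", "https://example.org/term/plot"),
}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.50
```

Even a small gold standard of this kind, with the matching criteria made explicit (exact span vs. overlap), would make the effectiveness of Annotace assessable.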

The evaluation section of the paper includes both a comparative analysis against similar tools and a user experience evaluation. I am a big fan of comparative evaluations in general; however, the comparison is at a superficial level, and thus it is not entirely clear to me which of the tools is more effective when it comes to legal thesauri management. Thus, I would be very interested in seeing a detailed comparison of the tools, for instance by comparing both the precision and recall and also the user experience.

Additionally, when it comes to the user experience evaluation, I would like to know more about the methodology used to guide it. The paper states that the “testers were recruited from the people working as domain experts for urban planning, aviation and medicine and also developers, none of which participated in the TermIt development”; however, considering the tool is positioned in terms of legal text in general (as opposed to planning in particular), I wonder whether the testers had a legal background, as legal practitioners are an important stakeholder group. Additionally, I would like to know how well the tool generalises beyond the planning scenario: for instance, who are the various stakeholder groups, and what pain points does it address?

The impact section highlights the fact that ‘TermIt’ has been used to systematise the terminology of the Digital Technical Maps across the Czech Republic and for managing the terminology of the eGovernment vocabulary. It is not clear from the text whether these are real, live systems or simply prototypes developed in the context of various research projects.

From a resources sustainability perspective, a link to a new GitHub repository is provided, which includes a README file with information on prerequisites, running the tool, and configuring the tool.