Review Comment:
This paper introduces TermitUp, a service aiming at the generation of domain-specific terminologies directly from corpora, semantically enriched with data from existing language resources in the Linguistic Linked Open Data (LLOD) cloud, and published in open and structured formats.
Overall, the paper is well organized and clear, with a comprehensive section on the background and relevant previous work, especially in what concerns the challenges regarding i) automating the generation of terminological resources, and ii) the underlying interlinking process. It also comprises a thorough description of TermitUp's requirements and architecture, while also illustrating its current and potential impact. The final sections address some of its limitations, along with future work aimed at tackling those issues. TermitUp is available on both GitHub and Zenodo, although the GitHub link has not been provided by the authors in footnote 43. I could reach it, nonetheless, via the Prêt-a-LLOD website.
This work, developed within the scope of an H2020-funded project (Prêt-a-LLOD), represents an ambitious and relevant endeavour within the current research landscape of terminology work and its connection to Linguistic Linked Data. On the one hand, it leverages the existing resources in the LLOD cloud, benefitting from the semantic enrichment potential that these datasets entail, and integrates a set of previously isolated technologies into a seemingly robust pipeline. On the other hand, by resorting to both SKOS and Ontolex modelling within a legal use case, and to its subsequent feeding of a SPARQL endpoint, TermitUp provides flexibility to the end-user while also addressing the requirements focusing on reusability and standardisation, as well as on open source and ease of access (#4 and #6, respectively).
As regards the system architecture, there is added value in the disambiguation features in Module 3, as well as in the term relation validation features in Module 4, by resorting to ConceptNet. It is also my understanding that the ongoing challenges involving SKOS and Ontolex modelling, described in Section 7 of the paper, and the subsequent proposal put forward by the authors in [https://www.w3.org/community/ontolex/wiki/Terminology](https://www.w3.org/community/ontolex/wiki/Terminology), constitute a relevant starting point for the discussion, within the community, on how to model terminological resources as Linked Data, and might help boost more fine-grained representation models.
One of the main challenges concerning the future development of this service, in my opinion, is how TermitUp will scale to other domains and languages, and how the potential issues arising from this will be handled. As regards future work, and given its inherent complexity, it will also be interesting to see how the additional module allowing the extraction of domain-specific relations will unfold. Furthermore, I can certainly see the advantage of publishing the resulting terminologies in Terminoteca RDF, at least in the short term.
In my opinion, however, the biggest challenge lies in the fact that some of the existing resources in the LLOD cloud either lack curation or, even worse, can become inactive as soon as their respective projects end, which would make the service more cumbersome or, ultimately, hinder it altogether. The authors refer to this briefly in the paper and seem to be aware of such risks. In fact, outlining and setting up effective quality control processes regarding the resources pertaining to the LLOD cloud represents a necessary discussion within the community that is currently ongoing and which is certainly beyond the scope of this paper.
In conclusion, this is a fairly comprehensive paper overall, following the guidelines underlying the "Tools and Systems Report" articles, and it entails both ambitious and promising research. Its development within the Prêt-a-LLOD project, which collaboratively integrates several stakeholders, clearly demonstrates a level-II impact. This work does, however, have the potential to go beyond its original Prêt-a-LLOD scope and impact other research groups within the community (level III), benefitting from different use cases where it could be put to the test. By tackling the aforementioned challenges brought about by other languages and domains, this service could, when stabilized, be accessible to (and used by) various researchers. In addition, TermitUp could, in my opinion, successfully integrate future educational materials on the topic of LLOD.
I would therefore just suggest some minor revisions:
**Content**:
- Although the paper describes how TermitUp has been successfully deployed in another H2020-funded project (Lynx), and that the service is currently being applied to other recently funded projects, it would be pertinent to have access to more concrete results, from both a quantitative and qualitative standpoint, on how TermitUp ultimately helped improve the projects' pipeline and/or outputs (namely in Lynx).
- On page 6, section 5.1., could you provide more concrete data - namely from the "preliminary study" you refer to - to support your claim that Freeling's performance was not satisfactory when compared to other POS taggers for Spanish?
- On page 9, end of section 5.4, the "noun-verb" pattern is repeated.
- On page 11, end of section 6: although the SmarTerp project appears to be at its onset, it would be relevant to provide more concrete input on which "extra information" would be supplied to interpreting professionals in this regard.
**References**:
- It might be relevant to include the direct reference to Meyer's paper on Knowledge-Rich Contexts (2001): [https://benjamins.com/catalog/nlp.2.15mey](https://benjamins.com/catalog/nlp.2.15mey)
- In your first mention of Ontolex (p. 9, section 5.5), it would be pertinent to include at least a reference, perhaps to the core model: [https://www.w3.org/2016/05/ontolex/](https://www.w3.org/2016/05/ontolex/) or to John P. McCrae, Paul Buitelaar, and Philipp Cimiano. 2017. The OntoLex-Lemon Model: Development and Applications. In Proceedings of eLex 2017, pages 587–597. INT, Trojína and Lexical Computing, Lexical Computing CZ s.r.o.
- Please include the GitHub link in footnote 43
- Footnote 44 was left blank as well → please renumber the footnotes accordingly
**Linguistic issues/typos:**
- Please replace "aroused" throughout the paper (e.g. with "arose" on p. 12, l. 3, left column)
- p. 2, l. 28, right column → eliminate "the" between "exposes" and "this"
- p. 4, l. 15, left column → "in the cloud" instead of "int the cloud"
- p. 4, l. 32, left column → "bilingual" and "English" instead of "biligual" and "Enlish"
- p. 5, l. 40, right column → "specificity" instead of "specifictiy"
- p. 5, l. 45, left column → "hierarchical" instead of "hierachical"
- p. 9, l. 16, Table 3 caption → "whose RDF version" instead of "which RDF version"
- p. 11, l. 18, left column → "requirement 6" instead of "requirement 7"
- p. 11, l. 43, right column → "or an individual" instead of "and even, or an individual"
- p. 12, l. 36, right column → "terms nor rich linguistic descriptions" instead of "terms and nor rich..."