Review Comment:
In this paper the authors describe the use of a set of Linked Data management tools in an editorial company. This set of tools (the Linked Data Stack) helps the company manage its entire publishing process.
Introduction section
This section presents a data integration scenario based on an editorial company example. This example is related to local VAT modifications which have to be managed globally. Every country has its own laws and law modifications about VAT and the company has to adapt to them.
Comment:
I agree that this is a nice scenario for applying Semantic Web technologies. However, are there other scenarios in the company? Are there others more relevant to such publishing companies?
Overview of the Linked Data Stack Section
In this section the authors present the Linked Data Stack set of tools, aka the LOD2 Stack. This stack can be installed as a Debian package, uses RDF as its data model and REST for accessing data. The authors point out that this work is an extension of [1] and [3]. They also point out that this paper is the application of [2] to this specific scenario.
Reading this section I got a bit lost: I do not understand the relation between [1], [2] and [3]. Can the authors clarify that relation?
Linked Data Lifecycle Section
Half of this section is Section 2 of [1]; a reference to it would be nice. There is not much more to say, since most of this section is based on [1] and [11].
Data-Flows at Global Publishers section
In this section the authors present briefly the company which provided the use case for this application report. The authors also describe briefly how the LOD2 Stack could help: company products are not connected with each other and linking these products could be the solution to provide a better customer service.
Comments:
I miss more detail about how the LOD2 Stack could help to "provide a better customer service". Is customer service the only use case? What exactly is meant by customer service?
Usage of the Linked Data Stack at WKG section
In this section the authors describe WKG's requirements for a solution to its data integration problems, such as enriching content and extending vocabularies. The major business requirements were:
- Processing and enriching mass content from partners
- Extension and consolidation of controlled vocabularies
- Managing content metadata addition depending on the sources
- Enabling vertical view of the content
Next the authors describe how they used the technologies in the LOD2 Stack: why they decided to use commercial software, how they extracted content from documents, how they performed quality assurance, how they used the vocabularies for representing metadata, etc.
Comments:
I have two comments in this section:
- How do all these processes relate to the major business requirements? The authors start by describing four requirements, but in this six-page section I do not see details about how the LOD2 Stack helps to meet them.
- I miss many details, like the following:
- Why did the authors decide to convert certain documents?
- Which vocabularies did they use or extend?
- What are these vocabularies about?
- How many controlled vocabularies are managed by PoolParty?
- How many triples are stored in Virtuoso globally?
- How many nodes/WKG departments are producing data?
- How distributed is the data?
- Do you use query federation?
- Link discovery: are precision and recall 100%? How well does the tool perform? How often is the tool executed?
- Is new data continuously added to the system? How often? How hard does the Linked Data Stack work?
Related work section
In this section the authors briefly describe some related work, describing the NY Times application and the BBC. However, since the authors talk in the paper about legal documents, I miss some related work about legal documents and the Semantic Web. Besides, isn’t there any company that already has integrated highly distributed data from several companies’ offices?
Overall comments:
In general I think this is a nicely motivated paper that tries to show how a set of Semantic Web technologies works in a real scenario. However, I think it is hard to understand how these technologies are used to fulfil the company requirements. There is a list of requirements, but I don’t see how that list is related to the paragraphs that follow in the same long section. Besides, I think that many interesting details are missing (details about the vocabularies, how URIs are generated, etc.). From my point of view a Semantic Web journal should publish papers containing these details, not only a generic description of the use of these technologies, especially in an application report.
I also saw some typos:
Page 2: After that, the vision of the vision
but requirs -> but requires
afterwarts -> afterwards
All this are -> all these are