Review Comment:
* Summary: The article describes an approach to convert vector geographic features, extracted from multiple historical maps, into contextualized spatio-temporal knowledge graphs (KGs). The resulting graphs can easily be queried (GeoSPARQL) to understand how different regions change over time. The approach and its evaluation focus on linear and polygonal geographic features.
* Overall Evaluation (ranging from 0 to 100):
[Criterion 1]
[Q]+ Quality: 92
[R]+ Importance/Relevance: 85
[I]+ Impact: 88
[N]+ Novelty: 80
[Criterion 2]
[W]+ Clarity, illustration, and readability: 95
[Criterion 3]
[S]+ Stability: 100
[U]+ Usefulness: 90
[Perspective]
[P]+ Impression score: 88
* Dimensions for research contributions (ranging from 0 to 100):
(1) Originality (QRN): 86
(2) Significance of the results (ISU): 93
(3) Quality of writing (QWP): 88
* Overall Impression (1,2,3): 89
* Suggested Decision: [Minor Revision]
* General comments:
The work is solid. The paper is mature and well written: easy to follow and understand.
The proposed approach is novel.
The authors can further improve specific parts of the manuscript by addressing the issues presented in this review.
* Major points for improvements:
{
pag#06 / L#38: "generate a global bounding box for s": how is it computed? Please explain this task in detail. Also, a possible improvement would be to compute not a "box" (rectangle) for the bounding area but another geometric object that gives a better approximation (an ellipse?). Your thoughts on this? Could this yield better results in line 2 of Algorithm 2, or is a "box" a limitation of the reverse-geocoding function?
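(For what it's worth, a minimal sketch of what I would expect such a bounding-box step to look like; the function name and input format are hypothetical, not the authors' code:)

```python
def global_bounding_box(features):
    """Axis-aligned bounding box enclosing all coordinates.

    `features` is assumed to be an iterable of coordinate lists
    [(x, y), ...]; hypothetical signature, not the authors' actual code.
    Returns (min_x, min_y, max_x, max_y).
    """
    xs = [x for feature in features for x, _ in feature]
    ys = [y for feature in features for _, y in feature]
    return (min(xs), min(ys), max(xs), max(ys))
```

An ellipse (or a convex hull) would approximate the covered area more tightly, but only helps if the reverse-geocoding function accepts such shapes.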
pag#08 / L#29: "record provenance data" --> it would be better to align the model with PROV-O
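For example, the provenance could be expressed with PROV-O terms roughly as follows (a sketch; the ex: IRIs are hypothetical):

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .

ex:feature42 a prov:Entity ;
    prov:wasDerivedFrom  ex:scannedMap1958 ;
    prov:wasGeneratedBy  ex:extractionRun ;
    prov:generatedAtTime "2020-03-14T09:55:56"^^xsd:dateTime .

ex:extractionRun a prov:Activity ;
    prov:wasAssociatedWith ex:extractionPipeline .

ex:extractionPipeline a prov:SoftwareAgent .
```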
pag#11 / L#33: "computation cost is not exponential in practice" --> it would be better to present a proper time complexity analysis of both algorithms.
pag#12 / L#22: "linearly dependent" --> "polynomial depending on..." | The time complexity also depends on tasks/lines 1-2 of Algorithm 2. It would be better to present a proper time-complexity analysis.
pag#12 / Table 6: Does the F1 score give the same weight to recall and precision ("P&R"), or is it "2P&R" or "P&2R"?
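For reference: the standard F1 is the balanced case (beta = 1) of the F-measure and weights precision and recall equally; the authors should state which variant they use:

```latex
F_\beta = (1+\beta^2)\,\frac{P \cdot R}{\beta^2 P + R},
\qquad
F_1 = \frac{2\,P\,R}{P + R}
```

With beta = 2 recall is weighted higher; with beta = 0.5, precision is.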
---
pag#13 / L#3: (Fig#15) "dcterms:date" appears twice (for the values 1962 and 2001). Does this mean that a "building-block" (feature part) can have multiple "dcterms:date" attributes? I guess this would be the case; is this an expected result of Section 2.6? This should be clarified; the fact that feature parts can have multiple "dcterms:date" values is not intuitive. If so, it would be better to include the cardinality of each relation in Fig#8 (semantic model).
Fig#17: The query seems wrong. In your main BGP you have "dcterms:date 1958^^xsd:gYear". Later on, you add "?f dcterms:date ?date", and in the HAVING clause you perform a "COUNT(DISTINCT ?date)". If all feature parts have only one "dcterms:date", then every selected "?f" will have "dcterms:date 1958^^xsd:gYear", so the COUNT will always be 1. Perhaps you meant "COUNT(DISTINCT ?f)"? Again, this is related to the previous comment (and the issue of multiple "dcterms:date" values). This should be clarified; it is not intuitive.
ADDENDUM: I found an example in the data files:
a geo:Feature ;
dcterms:created "2020-03-14T09:55:56.418490"^^xsd:dateTime ;
dcterms:date "1942-01-01T00:00:00"^^xsd:dateTime,
"1950-01-01T00:00:00"^^xsd:dateTime,
"1965-01-01T00:00:00"^^xsd:dateTime ;
geo:hasGeometry ;
geo:sfContains ;
geo:sfWithin ,
.
So, yes, the geographic features can have multiple dates. However, it would be better to clarify this in the manuscript (see the comments above).
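To make the Fig#17 comment concrete, I presume the query's intent is along these lines (a hypothetical reconstruction, prefixes omitted; not copied from the paper):

```sparql
SELECT ?f (COUNT(DISTINCT ?date) AS ?nDates)
WHERE {
  ?f a geo:Feature ;
     dcterms:date "1958"^^xsd:gYear ;
     dcterms:date ?date .
}
GROUP BY ?f
HAVING (COUNT(DISTINCT ?date) > 1)
```

Under a one-date-per-feature reading, ?nDates is always 1 and the HAVING filter never passes; the query only makes sense once features can carry several dcterms:date values (as the addendum above confirms), which is exactly what the manuscript should state explicitly.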
---
"unsupervised": the authors used the word "unsupervised" 7 times in different parts of the paper (abstract, sec#1, sec#1.2, sec#2.2, sec#4, sec#5). The main usage is as in "automatic unsupervised approach/pipeline". Some examples:
(a) (pag#12 / L#39): Why does the linking task have "unsupervised characteristics"? This is not clear at all.
(b) (pag#15 / L#22): "present an unsupervised pipeline"; why "unsupervised"?
Why are the authors using the word "unsupervised"? The presented pipeline is a set of non-learning algorithms and tasks. Please clarify this point or drop the word.
}
* About the data files and related software artifacts: (“Long-term stable URL for resources”)
(1) "data files are well organized and contain a README file which makes it easy to assess the data": [YES], although there is no specific description of the data files in the repo's README.
(2) "the provided resources appear to be complete for replication of experiments": [YES. COMPLETE]
(3) "the chosen repository is appropriate for long-term repository discoverability": [YES. GitHub]
(4) "the provided data artifacts are complete": [YES]
* Minor corrections:
pag#05: L#14 "Data" --> "Input"; L#17 "Result" --> "Output"
pag#06 / L#45: Capital R in section title.
pag#09 / L#08: What is the meaning of the symbols below the column labels in Fig#9? IDs vs. literals? Please, clarify or remove them.
pag#14 / Fig#18, Fig#19, Fig#20: ", marked in" --> "; marked in" or "are marked in"
pag#15 / L#11: "exposed as the data as" --> "exposed the data as"
[everywhere]: "web" --> *Web*, "semantic web" --> "Semantic Web"