Review Comment:
This paper describes a practical workflow for transforming SDMX collections into Linked Data. The authors focus on four relevant statistical datasets:
- OECD, whose mission is to promote policies that improve the economic and social well-being of people around the world.
- BFS Swiss Statistics, whose Federal Statistical Office web portal offers a wide range of statistical information, including population, health, economy, employment and education.
- FAO, which works on achieving food security for all, making sure people have regular access to enough high-quality food.
- ECB, whose main task is to maintain the euro's purchasing power and thus price stability in the euro area.
Nevertheless, the tool proposed in the paper can easily be applied to transform any other SDMX dataset into Linked Data.
On the one hand, statistical data are rich sources of knowledge that are currently underexploited, so any new approach is welcome, and this paper describes an effective workflow for transforming SDMX collections into Linked Data. On the other hand, the approach is technically sound. It describes a simple but effective solution based on well-known tools, guaranteeing robustness and making its integration into existing environments easy. Thus, the workflow is a contribution in itself, and each stage describes how it impacts the final dataset configuration.
With respect to the obtained datasets, these are clearly described in Sections 7 and 8. They reuse well-known vocabularies and provide interesting interlinking, both among themselves and with DBpedia, World-Bank, Transparency International and EUNIS. Apache Jena TDB is used to load the RDF and Apache Jena Fuseki is used to run the SPARQL endpoint. The datasets are also released as RDF dumps (referenced from the Data Hub).
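For reference, the sketch below shows how such a dump could be loaded into a local TDB store and queried with Jena's Java API. This is my own illustration, not code from the paper; the class name, the store directory "target/tdb", the dump file name "dump.nt" and the counting query are all assumptions.

```java
import org.apache.jena.query.*;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.tdb.TDBFactory;

public class LoadAndQuery {
    public static void main(String[] args) {
        // Create (or open) a TDB-backed dataset in a local directory.
        Dataset dataset = TDBFactory.createDataset("target/tdb");

        // Load the published RDF dump into the default graph.
        dataset.begin(ReadWrite.WRITE);
        RDFDataMgr.read(dataset.getDefaultModel(), "dump.nt");
        dataset.commit();
        dataset.end();

        // Count triples via SPARQL, as one would against the Fuseki endpoint.
        dataset.begin(ReadWrite.READ);
        String q = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }";
        try (QueryExecution qe = QueryExecutionFactory.create(q, dataset)) {
            ResultSetFormatter.out(qe.execSelect());
        } finally {
            dataset.end();
        }
    }
}
```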
Finally, it is important to me how scalability problems are addressed, because I think that 12 GB is an excessive amount of memory for processing these datasets (the largest one outputs fewer than 250 million triples). Do you have an alternative for processing larger datasets? You could perhaps partition the original dataset into fragments: is the tool flexible enough to support this? Please explain how scalability issues will be addressed so that large datasets can be effectively transformed.
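To make the partitioning suggestion concrete, here is a rough sketch (mine, not the authors') of splitting a large N-Triples dump into fixed-size fragments with Jena's streaming RIOT API, so that memory use stays flat regardless of dump size. The input file name "dump.nt", the fragment naming scheme and the fragment size of 10 million triples are arbitrary assumptions.

```java
import org.apache.jena.graph.Triple;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.system.StreamRDF;
import org.apache.jena.riot.system.StreamRDFBase;
import org.apache.jena.riot.system.StreamRDFLib;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class PartitionDump {
    static final long TRIPLES_PER_FRAGMENT = 10_000_000L;

    public static void main(String[] args) {
        StreamRDF splitter = new StreamRDFBase() {
            long count = 0;
            int fragment = 0;
            OutputStream out;
            StreamRDF writer;

            @Override
            public void triple(Triple t) {
                // Open a new fragment every TRIPLES_PER_FRAGMENT triples.
                if (count % TRIPLES_PER_FRAGMENT == 0) rotate();
                writer.triple(t);   // written straight through, never buffered
                count++;
            }

            @Override
            public void finish() {
                close();            // close the last fragment
            }

            private void rotate() {
                close();
                try {
                    out = new FileOutputStream("fragment-" + (fragment++) + ".nt");
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
                writer = StreamRDFLib.writer(out);
                writer.start();
            }

            private void close() {
                if (writer != null) {
                    writer.finish();
                    try { out.close(); } catch (IOException ignored) { }
                }
            }
        };

        // Stream the dump triple by triple; nothing is materialised in memory.
        RDFDataMgr.parse(splitter, "dump.nt");
    }
}
```

Each resulting fragment could then be transformed and loaded independently, which would also make the workflow easier to parallelise.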