CovKG: A Covid-19 Knowledge Graph for Enabling Multidimensional Analytics on Covid-19 Epidemiological Data considering Spatiotemporal, Environmental, Health, and Socioeconomic Aspects

Tracking #: 3696-4910

Authors: 
Rudra Pratrap Deb Nath
S.M. Shafkat Raihan
Tonmoy Chandro Das
Torben Bach Pedersen

Responsible editor: 
GQ Zhang

Submission type: 
Full Paper
Abstract: 
The Covid-19 pandemic is influenced by many environmental, health, and socioeconomic aspects such as air pollution, comorbidity, occupation, etc. Decision makers need better data on the mortality and morbidity of Covid-19 to efficiently withhold its spread. The majority of the data resources dedicated to Covid-19 focus on spatiotemporal aspects only. Furthermore, existing research often overlooks the integrated impact of combining multiple factors. In this study, we efficiently model and analyse Covid-19's epidemiological data from multiple dimensions, such as time, location, temperature, comorbidity, occupation, etc. Data warehousing technology is used to model and integrate data from disparate sources in a multidimensional format. Besides, to make the data interoperable and accessible, they are annotated, integrated, and published semantically using the Resource Description Framework (RDF) model in accordance with the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles. To facilitate Online Analytical Processing (OLAP) compatibility, we annotate the Covid-19 knowledge graph—referred to as CovKG—with multidimensional semantics using QB and QB4OLAP vocabularies. CovKG is analyzed through an interactive analytical interface to observe the Covid-19 confirmed cases and deaths from thirteen aspects. Finally, the performance and quality of CovKG are assessed against prominent data stores modeling Covid-19 data. The ETL workflow typically takes around 42 minutes to load CovKG, which is connected to 10,951 external resources, has a size of about 5.3 GB, and consists of around 44 million RDF triples. When evaluated using competency queries, CovKG can answer 100% of the questions, whereas other prominent data stores can only provide the best answers for 39% of them.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
Anonymous submitted on 09/Sep/2024
Suggestion:
Reject
Review Comment:

This paper reports an effort to generate a multidimensional and semantically annotated Covid-19 knowledge graph called CovKG. The contribution attempts to address the need to integrate disparate data sources in order to perform spatial-temporal, socioeconomic, health, and environmental analyses. CovKG supports OLAP operations and SPARQL queries at larger scale. The paper is generally well-structured and well written.

However, the paper does not seem to pass two of SWJ's standards in terms of (1) originality and (2) significance.

Originality: the paper focuses more on data engineering than on fundamental methodology, and it is reviewed as such. The issue is perhaps less about whether the contribution is meaningful and lasting than about the topic itself: Covid-19 is not well defined as a target, so CovKG requires a very comprehensive scope of data sources to be valuable across a wide range of query interests. At the same time, CovKG needs to compete with the wide array of existing, narrower-domain Covid-19 knowledge graphs. (A quick Google search for "Covid-19" AND "knowledge graph" returns over 10,000 entries.)

Significance of the results: the paper highlights CovKG's ability to confirm Covid stats and its reasonable ETL and query-answering performance, but it lacks motivation and a focus on advancing Covid-19 research in the claimed spatial-temporal, socioeconomic, health, and environmental aspects, which is where it could make a difference. It gives an overall impression of a lack of focus.

To sum up, the paper describes a solid data engineering effort using knowledge graph techniques to integrate multidimensional Covid-19 data through CovKG, but it attempts to address too many topics and does not demonstrate important advances in any of these worthy topical areas. Therefore, SWJ does not seem to be the best forum for such a paper. Repositioned, the work may be suitable for alternative publication venues.