Abstract:
The Covid-19 pandemic is influenced by many environmental, health, and socioeconomic factors, such as air pollution, comorbidity, and occupation. Decision makers need better data on Covid-19 mortality and morbidity to contain its spread efficiently. Most data resources dedicated to Covid-19 focus on spatiotemporal aspects only, and existing research often overlooks the combined impact of multiple factors. In this study, we efficiently model and analyze Covid-19 epidemiological data along multiple dimensions, such as time, location, temperature, comorbidity, and occupation. Data warehousing technology is used to model and integrate data from disparate sources in a multidimensional format. In addition, to make the data interoperable and accessible, they are annotated, integrated, and published semantically using the Resource Description Framework (RDF) model in accordance with the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles. To support Online Analytical Processing (OLAP), we annotate the Covid-19 knowledge graph, referred to as CovKG, with multidimensional semantics using the QB and QB4OLAP vocabularies. CovKG is analyzed through an interactive analytical interface to examine Covid-19 confirmed cases and deaths from thirteen aspects. Finally, the performance and quality of CovKG are assessed against prominent data stores modeling Covid-19 data. The extract-transform-load (ETL) workflow typically takes around 42 minutes to load CovKG, which is connected to 10,951 external resources, has a size of about 5.3 GB, and consists of around 44 million RDF triples. When evaluated using competency queries, CovKG can answer 100% of the questions, whereas the other prominent data stores answer at most 39% of them.