Digital Heritage: Semantic Challenges of Long-term Preservation

Christoph Schlieder
The major digital preservation initiatives are about as old as the idea of the Semantic Web, but the two research areas have had little effect on each other. This article identifies connections between the two research agendas. Three types of ageing processes affecting digital records are distinguished: media ageing, semantic ageing, and cultural ageing. It is argued that a period of 100 years constitutes an appropriate temporal frame of reference for addressing the problem of semantic ageing. Ongoing format migration currently constitutes the best option for temporal scaling at the semantic level. It can be formulated as an ontology matching problem. Research issues arising from this perspective are formulated that relate to the identification of long-term change patterns of ontologies and the long-term monitoring of ontology usage. Finally, challenges of cultural ageing are discussed.
Responsible editor: 
Krzysztof Janowicz

Review 1 by Pascal Hitzler:
The paper reports on "Digital Preservation" as a field of research in relation (or the lack thereof) to the Semantic Web. It argues that there are natural tight connections which would seem to make Semantic Web technologies applicable to Digital Preservation, and points out some links which have already been established.

While it becomes clear from the paper how Digital Preservation can benefit from applying Semantic Web methods, it would be nice to also include a discussion of how the Semantic Web as a field could benefit from this application area. Are there fundamental issues which should be addressed (or addressed in a different way) in the light of Digital Preservation?

There are some research challenges mentioned on page 3. For some of them, it is not immediately clear to me how these issues could be addressed considering the current state of the art. Perhaps this could be pointed out in a bit more detail.

minor remarks:

page 1 left: "what sort of digital legacy they will be able leave with today's technologies." (grammar?)

page 1 right: "perquisites" -> "prerequisites"

page 2 right, bottom: "reenactment of *a* user experience"

Review 2 by Krzysztof Janowicz:

Very interesting paper. I would be especially interested to read some more details about aspects of semantic/cultural aging beyond data formats. While I agree that parts of the problem of migration can be understood as an ontology matching problem, I think that it probably needs a better understanding of ontology evolution in the first place. So far, most ontology matching approaches try to integrate ontologies by adding GCI axioms to the source ontology. To do so, they use probabilistic frameworks, structural matching, syntactic (and sometimes semantic) similarity measures, and so forth.
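
To make the kind of matching the review describes concrete, here is a minimal sketch of label-based matching that proposes candidate subsumption (GCI-like) axioms between two ontology versions. All concept names, the similarity measure (syntactic only), and the 0.5 threshold are invented for illustration; real matchers combine structural and semantic evidence as well:

```python
from difflib import SequenceMatcher

# Toy concept labels from two hypothetical ontology versions
# (illustrative names only, not taken from the paper or reviews).
source = ["Document", "AudioRecording", "ImageFile"]
target = ["TextDocument", "SoundRecording", "RasterImage"]

def label_similarity(a, b):
    """Purely syntactic similarity of two concept labels, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match(source, target, threshold=0.5):
    """Propose candidate axioms 'S SubClassOf T' whenever the best
    target label for a source concept exceeds the threshold."""
    axioms = []
    for s in source:
        best = max(target, key=lambda t: label_similarity(s, t))
        if label_similarity(s, best) >= threshold:
            axioms.append((s, best))
    return axioms

for s, t in match(source, target):
    print(f"{s} SubClassOf {t}")
```

The sketch only covers the syntactic-similarity ingredient mentioned above; the probabilistic and structural components would replace `label_similarity` with evidence aggregated over the ontology graph.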

However, ontologies are always only approximations of the intended model, and OWL ontologies in particular tend to be very rough approximations. The reason why this 'works' is social agreement beyond the formal specifications. This is especially the case for base symbols and primitives, but it also holds for defined concepts. This (hidden) agreement, however, changes over time. While it may have an important impact on the interpretation of data, it cannot be handled by ontology matching but rather by studying conceptual shifts and ontology evolution. IMO, Raubal's paper on 'Representing Concepts in Time' or N. Noy's work on detecting conceptual changes in ontology evolution offer interesting insights. However, I am not sure whether this is a semantic aging or a cultural aging aspect, to use your terminology.

One example may be the concept of time. I would assume that it is used as a base symbol or primitive in most application/domain ontologies and especially in annotated primary sources such as text documents. Using the 100-year frame from your paper, some documents may be based on a notion of time (and space) before Einstein's work, while others rely on modern physics. Another (less abstract) example may be the term terrorist before and after 9/11.

In the paper, you propose to analyze time series to understand the changes; maybe one could use semantic similarity measures to quantify these changes and determine whether they will require new versions of the used ontologies. Unfortunately, this would still not capture some of the underlying social drifts. Maybe focusing on instance data could help here to observe the changing categorization patterns?
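
The instance-data idea could be sketched as follows: track, per ontology version, the set of instance features (or co-annotated terms) associated with one concept, and flag versions where a set-based similarity to the previous snapshot drops sharply. All snapshot data, the Jaccard measure, and the 0.5 threshold below are illustrative assumptions, not from the paper:

```python
# Hypothetical usage snapshots for one concept across ontology versions
# (toy data chosen to illustrate drift, not empirical results).
snapshots = {
    2001: {"violence", "politics", "hijacking"},
    2005: {"violence", "politics", "hijacking", "security"},
    2010: {"violence", "security", "surveillance", "radicalization"},
}

def jaccard(a, b):
    """Set-based similarity in [0, 1]; 1.0 means identical usage."""
    return len(a & b) / len(a | b) if a | b else 1.0

def drift_points(snapshots, threshold=0.5):
    """Return the years whose usage differs from the previous snapshot
    by more than the threshold, i.e. candidates for a new ontology version."""
    years = sorted(snapshots)
    return [curr for prev, curr in zip(years, years[1:])
            if jaccard(snapshots[prev], snapshots[curr]) < threshold]
```

On the toy data the 2005 snapshot stays close to 2001, while 2010 falls below the threshold and would be flagged, mirroring the 'terrorist before and after 9/11' example of a categorization pattern shifting over time.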