The Semantic Web Journal Celebrates the 25th Anniversary of the Web

[Update 06/19/14: We are currently getting a lot of traffic. Please be patient while the data and components are loading.]

All around the world, people are celebrating the 25th anniversary of the invention of the Web. Continuing the line of Linked Scientometrics work we started for our own SWJ portal as well as the DEKDIV system for the Learning Analytics and Knowledge community, the Semantic Web journal, the STKO lab, and IOS Press would like to participate in the Web@25 festivities by launching a Linked Data-powered digital installation.

Currently, the system consists of two major components, with a third module being added next month. As you may expect, we mostly focus on the spatiotemporal aspects and patterns of bibliographic data. So far, we only use data from the WWW conference series from 1994 to 2013. While we will add data for 2014 soon, we were not able to find data from 1995. The data is enriched by mining for additional keywords using a wikification approach and is then translated into Linked Data. We will add data from more conferences in the future, but the lack of readily accessible affiliation data and abstracts makes this a cumbersome task.

The first component is a timeline of the top 100 keywords per WWW conference year. The font size in the term cloud depicts the relative importance (frequency) of each keyword for that conference. While we merged some obvious cases (e.g., the singular and plural versions), we tried to keep other syntactic variations. For instance, it is interesting to see how the term World Wide Web is increasingly replaced by WWW or simply the Web.
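The ranking and font scaling described above can be sketched roughly as follows. This is only an illustration, not the code behind the installation: the merge table, the function names, the lowercasing step, and the linear scaling between a minimum and maximum point size are all assumptions.

```python
from collections import Counter

# Hypothetical merge table: only obvious singular/plural pairs are collapsed.
# Other variations (e.g., "www" vs. "world wide web") are kept apart on purpose.
MERGE = {"ontologies": "ontology", "search engines": "search engine"}


def top_keywords(keywords, n=100):
    """Return the n most frequent keywords of one conference year."""
    counts = Counter(MERGE.get(k.lower(), k.lower()) for k in keywords)
    return counts.most_common(n)


def font_sizes(ranked, min_pt=10, max_pt=40):
    """Scale font size linearly with frequency relative to the top keyword."""
    if not ranked:
        return {}
    top = ranked[0][1]
    return {kw: min_pt + (max_pt - min_pt) * count / top for kw, count in ranked}
```

Calling top_keywords on the keyword list of a single year and passing the result to font_sizes yields one term cloud; repeating this per year gives the timeline.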

To show a more interesting example, one can move the mouse over any keyword to see how it gained or lost importance over the full range of WWW conferences. As can be seen below, the term Semantic Web first entered the top 100 in 2002 and has been around ever since, peaking in 2004.

One may now argue that its importance has declined since 2004 and that the 2013 level is far below the initial 2002 mark. So let us put this hypothesis to the test. Looking at the other keywords, one will quickly note that terms such as semantic (first used in 1999), ontology (since 2002), RDF (since 1999), and OWL (since 2004) are present as well and are part of the same general trend in the WWW community. Even more interestingly, terms such as SPARQL and Linked Data have shown up since 2008/2009. In addition, the term Graph (since 2007) can relate to the Semantic Web as well as to the Social Web. Summing up, the Semantic Web has not lost importance; what we really see is an increasing diversification. This is not surprising and can be observed for many other research themes as well. Finally, it is important to keep in mind that we are currently only showing WWW data, i.e., ISWC and other conferences are not part of the current system.
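One way to check the diversification argument is to sum the yearly counts of a whole family of related terms instead of following a single keyword. The sketch below assumes a hypothetical structure that maps each year to its keyword counts; the function names and the toy numbers are invented for illustration and are not the WWW data.

```python
def family_trend(counts_by_year, family):
    """Sum the per-year counts of every term in a family of related keywords."""
    family = {t.lower() for t in family}
    return {
        year: sum(c for term, c in counts.items() if term.lower() in family)
        for year, counts in sorted(counts_by_year.items())
    }


def first_year(counts_by_year, term):
    """Return the first year in which a term occurs, or None if it never does."""
    years = [y for y, counts in counts_by_year.items() if counts.get(term, 0) > 0]
    return min(years) if years else None


# Toy counts, invented for illustration only; they are not the WWW data.
toy = {
    2004: {"semantic web": 9, "ontology": 4},
    2009: {"semantic web": 4, "ontology": 3, "sparql": 3, "linked data": 5},
}
```

In this toy input the single term semantic web drops between the two years, while the family as a whole does not, which is the pattern the paragraph above describes.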

Studying these effects in more detail, and especially exploring the relations between keywords, will require an additional module, namely a so-called Self-Organizing Map (essentially a specific type of artificial neural network). We will deploy the module later this summer.

Moving away from a purely temporal perspective, the next component supports the spatial and spatiotemporal analysis of the WWW bibliographic data. It uses a method called Kernel Density Estimation to compute a statistical surface (we use a Gaussian kernel here). The density in a given region is determined by the number of publications in this area, and the individual observations are derived from the authors' affiliations. The kernel density implementation we use is sensitive to cartographic scale, so we limited the zoom level. In the future, we will provide a more scale-robust version that is not based on pixels. One can access the module by double-clicking on a term in the timeline. This will map the specific term-year tuple. As shown in the figure below, it is possible to change the time range and also to add or remove certain keywords. The new range can be mapped, or the year-by-year change can be shown using an animation. It is worth mentioning that the list of mappable keywords includes all extracted keywords, not just the top 100 from the timeline.
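For readers unfamiliar with the method, a Gaussian kernel density surface over a pixel grid can be sketched as below. This is a minimal illustration and not the implementation used in the installation: the function name, the grid layout, and the bandwidth parameter are assumptions. It also hints at why a pixel-based estimate may be scale-sensitive, since a bandwidth given in pixels covers a different ground distance at every zoom level.

```python
import math


def gaussian_kde_grid(points, width, height, bandwidth):
    """Evaluate a 2D Gaussian kernel density estimate on a width x height grid.

    points: (x, y) observations in grid (pixel) coordinates, e.g., one per
    author affiliation. bandwidth: kernel standard deviation in pixels
    (an assumed tuning parameter).
    """
    grid = [[0.0] * width for _ in range(height)]
    if not points:
        return grid  # no observations, so the surface is flat zero
    # Normalization of a 2D Gaussian, averaged over all observations.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    for row in range(height):
        for col in range(width):
            total = 0.0
            for px, py in points:
                d2 = (col - px) ** 2 + (row - py) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            grid[row][col] = norm * total
    return grid
```

Each cell of the returned grid holds the estimated density at that pixel, so cells close to many affiliations receive higher values than cells far from any of them.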

To give a concrete example, the figure below shows the densities in the US and Europe for the term Semantic Web (without any further keywords or variations) in the time range 2002-2004 as well as five years later in 2007-2009.

So let us play around a little bit and map several related terms, namely Semantic Web, Linked Data, and Ontology, over the full time range 1994-2013. The resulting densities and thus the paper origins can be seen below.

One may now argue that the same distribution will be shown for almost any other term combination, as it reflects the general research landscape (e.g., less activity in Africa and South America). While this is certainly true, there are nonetheless very clear and interesting patterns. This is where the zoom levels come into play. The following figure focuses on Europe and can be used to compare the densities of semantics-related terms with those of terms typically associated with Web search. These differences aside, it is also not surprising that most publications related to Web search come from US-based teams.

We hope that you will enjoy the system and would like to congratulate the W3C and everybody else again. Let us hope the Web will stay the Web we want for the next 25 years.