SEOntology: Writing the Future of SEO

Tracking #: 3755-4969

Authors: 
Emilija Gjorgjevska
David Riccitelli
Andrea Volpini
Milos Jovanovik

Responsible editor: 
Philipp Cimiano

Submission type: 
Ontology Description
Abstract: 
The emergence of generative artificial intelligence (GenAI) presents both opportunities and challenges for the search engine optimization (SEO) industry, particularly concerning the maintenance of data quality and curation. The SEO industry has evolved significantly from traditional practices focused on keyword optimization to a more sophisticated, user-centric approach that emphasizes semantic data and entity understanding. Within this evolving landscape, we introduce SEOntology as a pioneering ontology designed to formalize and systematize SEO-related knowledge, effectively bridging the gap between human and machine understanding. SEOntology seeks to standardize the complex vocabulary and interrelationships of SEO concepts and enhance the transparency, consistency, and efficiency of SEO practices. This paper details the design and implementation of SEOntology and examines its potential to future-proof the SEO industry by enabling more dynamic content interactions, improving data quality, and supporting the responsible integration of AI technologies.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
Anonymous submitted on 13/Jan/2025
Suggestion:
Reject
Review Comment:

The paper describes an ontology prototype for representing and managing SEO-related information. Such an ontology can be very useful, both for managing information at a project- or client-level, and for the SEO experts' community as a whole.

Unfortunately, both the ontology and the paper suffer from substantial limitations, which make the submission unsuitable for publication in an academic journal. A more elaborate ontology could be very useful; unfortunately, the current approach is just an ad-hoc database schema expressed in OWL and deployed carelessly.

**Note:** There is some similarity with the article already published online at .

## Presentation and Paper

The paper would benefit from a more objective writing style.

## Usage and Description of Design Principles and Methodologies

The paper cites some established methods from the field, namely BFO and Competency Questions. These are unfortunately not really used during the design of the ontology. The functional and non-functional requirements seem quite arbitrary and remain generic; it is unclear how the ontology could be evaluated against them, and they are not used for a serious evaluation of the results.

The Competency Questions (p. 6) are not what I would expect: Historically, CQs were used to define the scope of an ontology by defining (or giving examples of) queries that should be answerable with the help of the conceptual elements of the final ontology. The authors would benefit from reading the original Uschold/Grüninger paper from 1996: Uschold, M., Grüninger, M., 1996. Ontologies: Principles, Methods, and Applications. Knowledge Engineering Review 11, 93–155, in particular section 6.4.
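To illustrate this query-oriented reading of a competency question, here is a minimal sketch that phrases a hypothetical CQ ("Which search queries does a given web page rank for?") as a lookup over a few made-up triples. The `hasQuery` property name comes from the ontology's namespace; that it links a page to a query is an assumption, the `ex:` identifiers are invented, and the plain-Python filter merely stands in for a SPARQL query.

```python
# Toy triple store: (subject, predicate, object). All data is illustrative.
SEOVOC = "https://w3id.org/seovoc/"

TRIPLES = [
    ("ex:page1", SEOVOC + "hasQuery", "ex:query-running-shoes"),
    ("ex:page1", SEOVOC + "hasQuery", "ex:query-trail-shoes"),
    ("ex:page2", SEOVOC + "hasQuery", "ex:query-running-shoes"),
]


def answer_cq(triples, page):
    """Hypothetical CQ: 'Which search queries does this web page rank for?'"""
    has_query = SEOVOC + "hasQuery"
    # Keep the objects of all hasQuery triples whose subject is the given page.
    return sorted(o for s, p, o in triples if s == page and p == has_query)


print(answer_cq(TRIPLES, "ex:page1"))
```

A CQ of this kind can be checked mechanically: if the ontology lacks the classes or properties needed to write the query, the question is out of scope or the ontology is incomplete.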

## Comparison with other ontologies on the same topic

The ontology itself is, to my knowledge, novel; there is no similar ontology that would need to be used for comparison. However, the alignment with schema.org and other relevant ontologies leaves room for improvement (see below for more details). There is relevant work on fundamental ontologies covering the core concepts of the WWW architecture, e.g. Halpin/Presutti: The identity of resources on the Web: An ontology for Web architecture, Applied Ontology, Volume 6, Issue 3, pp. 263 - 293, 2011.

## References to applications or use-case experiments

Section 5 describes application scenarios, but remains rather superficial.

## Quality and relevance of the described ontology

The scope of the ontology looks promising; unfortunately, the current design and deployment are so limited that the ontology is insufficient for broad usage.

### Conceptual Modeling Perspective

While the information to be captured by the ontology is useful and valuable in real-world SEO tasks and projects, the ontology elements and their relationships look rather ad-hoc and not very carefully designed. A few examples:

1. The core model is not well-aligned with the architecture of the WWW and respective terminology, namely resource, identifier, and representation. This will lead to avoidable inconsistencies (e.g. when dealing with canonical URIs or multiple syntactical forms of the same Web content). The authors should try to re-use the standard building blocks; doing so would make the ontology more lasting and useful.

For instance, the ontology defines (or locally redefines) these three classes:

-
-
-

Now, one could argue that in an SEO context, URI/URLs are indeed something different from the ideal of the Web architecture, as e.g. a popular, high-ranking target URI might be considered an asset in its own right. But if you want to go that route, it does not make sense to model it as an owl:FunctionalProperty and owl:InverseFunctionalProperty like so:

```
### https://w3id.org/seovoc/hasURL
:hasURL rdf:type owl:ObjectProperty ,
owl:FunctionalProperty ,
owl:InverseFunctionalProperty ;
rdfs:domain :WebPage ;
rdfs:range :URL ;
rdfs:comment "The hasURL property establishes a unique and reciprocal relationship between a WebPage and its corresponding URL. It asserts that each WebPage is identified by exactly one URL, and conversely, each URL uniquely identifies one WebPage. As both a functional and inverse functional property, hasURL ensures that this link is both unique and bidirectional, which is critical for accurately representing the identity and accessibility of web content" ;
rdfs:isDefinedBy .
```
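The consequence of this declaration can be sketched in a few lines. Under standard OWL semantics, a functional property with two values for one subject forces those values to denote the same individual, and an inverse functional property does the same for two subjects sharing one value. The sketch below is a plain-Python approximation of that inference, not a reasoner; the page identifier and URLs are invented for illustration.

```python
def inferred_same_as(assertions):
    """Pairs that would be inferred as owl:sameAs for a property that is
    declared both functional and inverse functional.

    `assertions` is a list of (subject, object) pairs for that property.
    """
    same = set()
    for s1, o1 in assertions:
        for s2, o2 in assertions:
            if s1 == s2 and o1 < o2:
                # Functional: one subject, two objects -> objects identified.
                same.add((o1, o2))
            if o1 == o2 and s1 < s2:
                # Inverse functional: one object, two subjects -> subjects identified.
                same.add((s1, s2))
    return same


# One page reachable under two URLs (e.g. with a tracking parameter).
has_url = [
    ("ex:productPage", "https://example.com/shoes"),
    ("ex:productPage", "https://example.com/shoes?utm_source=mail"),
]
print(inferred_same_as(has_url))
```

In this toy case the two URL individuals collapse into one, which is at odds with treating each URL as an asset in its own right.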

2. The ontology defines multiple properties for variants of the same characteristic that differ just by the unit of measurement or just by the reference to another value. That approach makes the ontology unnecessarily inflexible and inflates the number of properties.

Examples:
-
-

The ontology could be improved by getting inspiration from, or reusing, the model for values, units of measurement, and value references in schema.org, namely , , and .
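As a rough illustration of the alternative, the sketch below builds a JSON-LD node in which a single property carries a value together with its unit, assuming schema.org's `QuantitativeValue` type with its `value` and `unitText` properties is the kind of model meant here. The property `ex:loadTime`, the `ex` namespace, and the per-unit property names in the comment are hypothetical and do not come from the ontology.

```python
import json


def quantitative_value(value, unit_text):
    """Build a schema.org-style QuantitativeValue node as a JSON-LD dict."""
    return {"@type": "QuantitativeValue", "value": value, "unitText": unit_text}


# Flat modeling would need one property per unit, e.g. (hypothetical names):
#   ex:loadTimeSeconds 1.2 ; ex:loadTimeMilliseconds 1200 .
# Structured modeling: one hypothetical property whose value carries its unit.
structured = {
    "@context": {"@vocab": "https://schema.org/", "ex": "https://example.org/"},
    "@type": "WebPage",
    "ex:loadTime": quantitative_value(1200, "ms"),
}

print(json.dumps(structured, indent=2))
```

Adding a new unit then means adding data, not adding another property to the ontology.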

3. In general, the ontology does not support n-ary relationship types in a very convincing way, and is using a very flat model. If I understand it correctly, most data is attached to a Web page entity. Think of better ways of representing e.g.. See e.g. and .

4. and are practically useful, but it might be better to also support URIs as the range for both embeddings and embedding models and not just strings. Also, embeddings are typically understood as *vectors* of **numerical data.** There is work on sharing embeddings in RDF (e.g. ), but I think it will be better to distinguish an embedding from its representation and use standard Web architecture elements (resource, representation, mime types, ... ) for modeling them.

For instance, if an embedding is obtained from a truly RESTful API, then it may directly be a Web resource, and its representation could e.g. be in JSON or JSON-LD, like this example (taken from ):

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        -4.547132266452536e-05,
        -0.024047505110502243
      ]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
```
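The distinction between an embedding and its representation can be sketched as follows. The `Representation` class and the `embedding_vector` function are hypothetical names introduced for illustration; the JSON layout mirrors the example above, and the three-element vector is made up.

```python
import json
from dataclasses import dataclass


@dataclass
class Representation:
    """One serialization of a resource: a media type plus the raw body."""
    media_type: str
    body: str


def embedding_vector(rep):
    """Decode the numeric vector from a JSON representation of an embedding."""
    if rep.media_type != "application/json":
        raise ValueError(f"unsupported media type: {rep.media_type}")
    payload = json.loads(rep.body)
    # The vector is a list of numbers, not an opaque string.
    return [float(x) for x in payload["data"][0]["embedding"]]


body = json.dumps({
    "object": "list",
    "data": [{"object": "embedding", "index": 0, "embedding": [-0.5, 0.25, 0.0]}],
    "model": "text-embedding-3-small",
})
rep = Representation("application/json", body)
print(embedding_vector(rep))
```

The embedding itself would be identified by a URI, while the string stored or transferred is only one of possibly several representations of it.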

### Implementation and Deployment

The base URI for all elements is `https://w3id.org/seovoc/` using identifiers like (slash URIs), but they are not de-referenceable.

The whole deployment is also not well-suited for an ontology with slash-based URIs, as even if set up properly, the entire ontology would be returned as a representation.

```bash
curl -I https://w3id.org/seovoc/hasQuery
HTTP/1.1 307 Temporary Redirect
...
Location: https://raw.githubusercontent.com/wordlift/wl-ontology/main/SEOntology.owl
Content-Type: text/html; charset=iso-8859-1

curl -I https://raw.githubusercontent.com/wordlift/wl-ontology/main/SEOntology.owl
HTTP/2 404
content-type: text/plain; charset=utf-8
```
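For comparison, a deployment suited to slash URIs typically answers each term URI with a 303 redirect to a document about that term, chosen by content negotiation. The following is a minimal sketch of that routing logic only, assuming per-term documents exist at a hypothetical `https://example.org/seovoc-docs/` location; the `Accept` handling is a naive substring match that ignores quality values.

```python
BASE = "https://w3id.org/seovoc/"
DOCS = "https://example.org/seovoc-docs/"  # hypothetical hosting location


def redirect_for(term_uri, accept):
    """Return (status, location) for a term URI under the 303 pattern."""
    if not term_uri.startswith(BASE):
        return (404, None)
    local_name = term_uri[len(BASE):]
    if not local_name:
        # The namespace URI itself: point at the ontology overview.
        local_name = "index"
    if "text/turtle" in accept:
        # Machine clients asking for RDF get a per-term Turtle document.
        return (303, DOCS + local_name + ".ttl")
    # Everyone else gets human-readable documentation.
    return (303, DOCS + local_name + ".html")


print(redirect_for(BASE + "hasQuery", "text/turtle"))
print(redirect_for(BASE + "hasQuery", "text/html"))
```

With such a setup, a request for a single property returns a description of that property, not the entire ontology file.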

All in all, the ontology is not properly deployed, and there has been an [open issue on GitHub highlighting several key problems with the proper deployment according to the state of the art since Feb 21, 2024](https://github.com/seontology/seontology/issues/1), e.g.

- serves the HTML representation of a directory listing (see screenshot below).
- The anchor text **readme.md** points to , which returns a 307 redirect to .

```bash
curl -I https://w3id.org/seovoc/readme.md
HTTP/1.1 307 Temporary Redirect
Date: Mon, 13 Jan 2025 19:40:44 GMT
Server: Apache/2.4.29 (Ubuntu)
Access-Control-Allow-Origin: *
Location: https://raw.githubusercontent.com/wordlift/wl-ontology/main/SEOntology.owl
Content-Type: text/html; charset=iso-8859-1
```
- That URI returns a 404 status:
```bash
curl -I https://raw.githubusercontent.com/wordlift/wl-ontology/main/SEOntology.owl
HTTP/2 404
...
```

Image: HTML representation served at https://w3id.org/seovoc/

**Also, there is NO HTML representation being served for the ontology.**

Review #2
By Basil Ell submitted on 30/Jan/2025
Suggestion:
Reject
Review Comment:

The authors present SEOntology, an ontology to model data relevant in the context of search engine optimization.

The paper contains multiple claims that are not substantiated, the paper is not well-written, and the design of the ontology is questionable.

Major critique:

1. The authors claim that SEOntology is based on BFO. BFO terminology does not occur in the description of the ontology. It is not described how the ontology is built on / links to BFO.

2. The authors claim that "SEOntology aims to standardize broader SEO concepts such as topicality, backlinks, and search intent." There are no classes or properties to model search intent in SEOntology.

3. The list of functional and non-functional requirements is long, but it is not discussed which of these requirements are met and how they are met.

4. "SEOntology can encode the rules and best practices for internal linking into an AI agent." I do not understand what that is supposed to mean, but what I can see is that the ontology cannot be used to express rules and best practices. Along these lines: "SEOntology was used to model internal linking strategies". The ontology cannot be used to model strategies. "This language enables SEO professionals to encode their expertise into a shared knowledge base." Beyond these claims about what can be modeled using the ontology, there are many claims such as "SEOntology not only enhances SEO practices but also contributes to the broader goal of a more connected and intelligent web", "prevent the misuse of GenAI", "improving data quality, interoperability, and machine comprehension", "by adopting SEOntology, SEO professionals and businesses can keep up with technological advancements and actively contribute to the future of search engine optimization and digital marketing", "can play a significant role in promoting ethical AI practices ... to minimize bias and misinformation". I stop here; there are many of these unsubstantiated claims.

5. The authors list 18 competency questions, such as "CQ18: How does SEOntology promote transparency and ethical AI practices in SEO?" This is not an appropriate CQ for an ontology. The same holds for most of the other questions. Not only are they inappropriate, it is also never discussed how the developed ontology can address these questions. Design decisions such as "each WebPage is identified by exactly one URI" are unusual and the rationale is not explained well.

6. Most of the works listed in the bibliography are not peer-reviewed works.

I do not see that the current ontology, even when ignoring the unsubstantiated claims, is of much use or of particular relevance to the community (be it the Semantic Web community, the Web community, or the SEO community). I would advise the authors to develop a deeper understanding of the purpose of an ontology and of ontology engineering. I would also advise a more careful use of generative AI, out of respect for the reviewers and readers and because of the poor quality that results when generative AI is not used well. There is a lot of repetition, the paper lacks a clear structure, and some things sound good but do not really make sense. Claims are inconsistent and are not met, which could be a result of using GenAI, either when not clearly specifying what the text should express, or when not paying attention to whether what the generated text expresses actually makes sense.