Ontology supported semantic based image retrieval

Tracking #: 3559-4773

Authors: 
Akif Gasi
Mustafa Dağtekin
Tolga Ensari

Responsible editor: 
Cogan Shimizu

Submission type: 
Full Paper
Abstract: 
In this study, a two-stage approach for developing a Semantic Based Image Retrieval system supported by Ontology is proposed. In the first stage, objects are detected with the Object Detection process from the image and a predicate describing the relationship between the two objects is determined with the developed Bi-directional Recurrent Neural Network (Bi-RNN) model. In the second stage, relations defined as <subject-predicate-object> are converted into Ontologies and used to search for semantically similar images. In the measurement of Semantic Gap, as the main problem encountered in the Semantic-Based Image Retrieval approach, it is proposed to calculate the number of similar relationships between two images by using entropy. By using the number of relationships (X) found in the image used for query purposes and the total number of relationships (Y) of the image with similar relationships that was found as a result of the query, the Semantic Gap between two images was calculated with the Joint Entropy method. The proposed approach has the characteristics of a new method used in this field and gives more effective results compared to other similar methods that are used in Semantic Based Image Retrieval by using Ontologies.
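As a rough illustration of the quantities named in the abstract, the sketch below counts relationships shared by two images and computes an empirical joint entropy over paired counts. It is not the authors' implementation: the abstract does not state how X and Y are turned into a joint distribution, so the exact-match comparison of triples, the pairing of (X, Y) observations, and the names shared_relations and joint_entropy are all assumptions made for illustration.

```python
from collections import Counter
from math import log2
from typing import List, Set, Tuple

# Hypothetical representation of a <subject-predicate-object> relation.
Triple = Tuple[str, str, str]


def shared_relations(query: Set[Triple], candidate: Set[Triple]) -> int:
    """Count triples present in both images (exact match is an assumption)."""
    return len(query & candidate)


def joint_entropy(pairs: List[Tuple[int, int]]) -> float:
    """Empirical joint entropy H(X, Y) in bits over observed (x, y) pairs.

    Each pair is assumed to hold the relationship count of a query image (x)
    and of a retrieved image (y); the paper's own pairing may differ.
    """
    if not pairs:
        raise ValueError("need at least one (x, y) observation")
    counts = Counter(pairs)
    total = len(pairs)
    # H(X, Y) = -sum over (x, y) of p(x, y) * log2 p(x, y)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Made-up example triples, not taken from the Visual Genome dataset.
query = {("man", "riding", "horse"), ("man", "wearing", "hat")}
candidate = {("man", "riding", "horse"), ("horse", "on", "grass")}
print(shared_relations(query, candidate))
print(joint_entropy([(2, 2), (2, 3), (2, 2), (4, 1)]))
```

The entropy is zero when every observed pair is identical and grows as the pairs become more varied, which is the general property an entropy-based gap measure would rely on.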
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 20/Nov/2023
Suggestion:
Minor Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

Summary:

The paper proposes a novel two-stage Semantic-Based Image Retrieval (SBIR) system supported by Ontology, aiming to address the Semantic Gap in traditional Image Retrieval approaches. In the first stage, Object Detection and a Bi-directional Recurrent Neural Network (Bi-RNN) model are employed to determine relationships between objects. The second stage involves converting these relationships into Ontologies for more effective semantic similarity searches. The study introduces a method for measuring Semantic Gap using entropy.

Contributions:

Novel Ontology-Supported Approach: The paper introduces a unique two-stage SBIR approach that utilizes Ontologies for improved semantic representation and retrieval.

Effective Use of Visual Genome Dataset: The study employs the Visual Genome dataset for training the model and generating ontologies, contributing to the credibility of the research.

Innovative Semantic Gap Measurement: The use of entropy, specifically the Joint Entropy method, for measuring the Semantic Gap between images adds a novel quantitative dimension to the evaluation.

Strengths:
Originality:
The paper demonstrates a high level of originality by combining Object Detection, Bi-RNN models, and Ontology in a novel two-stage SBIR approach. The introduction of entropy for Semantic Gap measurement further contributes to the originality of the study.

Significance of Results:
The proposed approach, especially the second stage utilizing Ontologies, is shown to yield more effective results in semantic similarity searches compared to existing methods. The introduction of a quantitative measure for Semantic Gap provides a valuable contribution to the field.

Quality of Writing:
The writing is generally clear and concise, with well-organized sections. The technical details, such as the Word Embedding process, Bi-RNN model description, and Ontology creation, are presented in a detailed manner. However, there are instances where more clarification and details could enhance understanding.

Weaknesses:
The paper lacks a thorough discussion of the redesigned Bi-RNN model, including details on the redesign and its specific contributions.
While claiming superior results compared to Scene Graph structures, the absence of a direct comparative analysis weakens this assertion.
The section on Semantic Gap calculation using entropy is somewhat brief, and a more detailed explanation would improve clarity.

Review #2
By Md Kamruzzaman Sarker submitted on 31/Jan/2024
Suggestion:
Major Revision
Review Comment:

The authors propose an image retrieval model in which images are retrieved based on their semantic meaning rather than just visual or textual information.
The overall algorithm works in two phases. In the initial phase, the Object Detection process identifies objects within the image, while a Bi-directional Recurrent Neural Network (Bi-RNN) model determines the relationship predicate between these objects. The Semantic Gap between two images is then computed using the Joint Entropy method, utilizing the number of relationships (X) detected in the query image and the total number of similar relationships (Y) found in the resulting image. This approach has the potential benefit of removing false positives.

I found this approach interesting and worth investigating. If applied properly, it can help to remove many false-positive images from image search engines. However, more computational power, more pre-processing, and more model training may be needed, which may increase the cost. This research has potential, and this direction has recently been investigated along the lines the authors propose. So in terms of originality and significance, this paper is pretty good.

* Among the main issues I found, the writing is not good. There are several grammatical mistakes in the paper, and sentences are often incomplete.
- Example: "As the proposed solution for addressing the Semantic Gap problem is the Semantic-Based Image Retrieval (SBIR) approach." - This sentence should be rewritten.
- The introduction should be rewritten.
- The related work should be more focused, and the difference between the authors' work and previous work should be clearly stated.

* Creating the ontology: The image dataset used for this experiment already contains graph-structured information, which is easy to convert to an ontology. What happens when the image dataset does not have extensive textual information? Will this approach still provide high accuracy? It would be good to see experimental results on another dataset whose images are not extensively annotated.
* The limitations of this approach, including its potential computational cost, should be discussed.

The dataset and source code are publicly available, which is excellent.

Review #3
Anonymous submitted on 16/Feb/2024
Suggestion:
Major Revision
Review Comment:

This article addresses the topic of Semantic-Based Image Retrieval (SBIR). Objects are detected in images and used to train a Bi-RNN model and build an ontology. The resulting ontology is then used for SBIR. Finally, a number of similarities are calculated to address the problem of Semantic Gap using the entropy concept.

The article is technically good, but there are some concerns that need to be addressed before the paper can be considered for publication.

1. The end of the abstract should provide some statistical analysis of the results of the proposed method using suitable metrics such as accuracy, precision, recall, F1-measure, etc.

2. The related work section is not meant to define terminology, but to discuss other studies that have addressed SBIR and highlight their shortcomings compared to this study. For instance, according to the authors, [6] has done almost the same work as this study; the authors should therefore indicate how the work in [6] differs from this study. Furthermore, the number of papers discussed in the current Related Work section is too low; the authors must find and discuss more related studies. The authors may merge the current Related Work content with the introduction and rewrite the section to discuss previous studies that have addressed SBIR.

3. The notation Recall@X on page 6 is not a standard notation, and its meaning must be explained in the paper.

4. The evaluation of the proposed model performance is weak and does not provide any statistical analysis nor comparative results with related works to support its novelty or contribution.

5. The title of Table 1 is misleading: training a model does not result in any prediction. How, then, were the reported statistics obtained during the training of the model?

6. The variables of Eq. 4 must be explained below the equation.

7. The entire paper must be proofread by a native English-speaking person.
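
For context on point 3, one common reading of a "Recall@K"-style metric in retrieval work is the fraction of relevant items that appear among the top K ranked results. Whether this matches the paper's Recall@X is an assumption, since the manuscript's definition is not reproduced here; the function name recall_at_k and the sample identifiers are hypothetical.

```python
from typing import Iterable, Sequence


def recall_at_k(ranked: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Fraction of relevant items found in the first k ranked results."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("at least one relevant item is required")
    if k < 0:
        raise ValueError("k must be non-negative")
    top_k = set(ranked[:k])
    return len(top_k & relevant_set) / len(relevant_set)


# Hypothetical image identifiers, ordered by a retrieval system's score.
ranked_results = ["img_a", "img_b", "img_c", "img_d"]
relevant_images = {"img_b", "img_d", "img_e"}
print(recall_at_k(ranked_results, relevant_images, 2))
print(recall_at_k(ranked_results, relevant_images, 4))
```

Under this reading the value can only stay the same or increase as K grows, so results reported at several cutoffs are easy to sanity-check.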