A Survey on Visual Transfer Learning using Knowledge Graphs

Tracking #: 2730-3944

This paper is currently under review
Sebastian Monka
Lavdim Halilaj
Achim Rettinger

Responsible editor: Guest Editors DeepL4KGs 2021

Submission type: Survey Article
The information perceived via visual observations of real-world phenomena is unstructured and complex. Computer vision (CV) is the field of research that deals with the visual perception of the environment. Recent CV approaches utilize deep learning (DL) methods to learn and infer latent representations from observational image data. To achieve high accuracy, DL methods require a huge amount of labeled images organized in datasets. These datasets may be scarce and incomplete in some domains, leading to an increasing amount of research aimed at augmenting DL approaches with auxiliary information. In particular, language information, which is freely available in large amounts on the internet, is a focus of research and has shaped several deep transfer learning approaches in recent years. Language information depends heavily on the statistical correlations among the words collected within a particular corpus. However, the learned representation is unpredictable and cannot be adapted, making it difficult to use in specific domains. On the other hand, knowledge graphs (KGs) show great potential in formalizing and organizing large-scale unstructured information. These KGs, engineered by domain experts, can be easily adopted to perform various tasks in specific domains. Recently, methods have been developed that transform KGs into vector-based embeddings so that they can be used directly in combination with deep neural networks (DNNs). In this survey, we first describe different modeling structures of a KG, such as directed labeled graphs, hyper-relational graphs, and hypergraphs. Next, we explain the structure of a DNN, which consists of a prediction task and a visual feature extractor or a semantic feature extractor, respectively. Furthermore, we classify KG-embedding methods as semantic feature extractors and provide a brief list of these methods and their usage according to the respective modeling structure of a KG.
We also describe a number of joint training objectives suitable for operating on high-dimensional spaces. The respective definitions of tasks for transfer learning and transfer learning using knowledge graphs are presented. Next, we introduce four categories of how transfer learning can be supported by a knowledge graph: 1) Knowledge graph as a reviewer; 2) Knowledge graph as a trainee; 3) Knowledge graph as a trainer; and 4) Knowledge graph as a peer. We also provide an overview of generic KGs and a set of datasets and benchmarks containing images with or without additional information such as attributes or textual descriptions, with the intention of helping researchers find meaningful evaluation benchmarks. Last, we summarize related surveys in the field of transfer learning and deep learning using additional knowledge, and give an outlook on challenges and open issues for future research.