Survey on Semantic Table Interpretation

Tracking #: 1946-3159

Authors: 
Lahiru de Alwis
Achala Dissanayake
Manujith Pallewatte
Kalana Silva
Uthayasanker Thayasivam

Responsible editor: 
Jens Lehmann

Submission type: 
Survey Article
Abstract: 
The web contains a vast amount of tables which provide useful information across multiple domains. Interpreting these tables contribute to a wide range of Semantic Web applications. Aligning web tables against an ontology to understand their semantics is known as Semantic Table Interpretation (STI). This paper presents a survey on Semantic Table Interpretation(STI). Goal of this paper is to provide an overview of STI algorithms, data-sets used, and their evaluation strategies and critically evaluate prior approaches. In the effort of providing the overview we developed a generic framework to analyze STI algorithms. Using this framework we analyzed the existing algorithms and point out their strengths and weakness. Additionally this enables us to categorize the prior works and be able to point out the key attributes of each categories. Our analysis reveals that search based approaches are better in terms of accuracy and overall completeness, while other categories perform better only in annotating columns with high precision. Also, We present the evaluation methodology utilized in algorithms and discuss the limitations of it while providing suggestions for future improvements. In addition, we point out the design choices in building an STI and their associated trade-offs, which could be of value for the future STI algorithm developers and users.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Vasilis Efthymiou submitted on 16/Aug/2018
Suggestion:
Major Revision
Review Comment:

Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic:
This work is very well suited as an introductory text, both for someone wanting a general picture of this topic and for someone who wants to dive deeper into this field.

How comprehensive and how balanced is the presentation and coverage:
The presentation and coverage are quite good and balanced, but there is still work to be done with respect to clarity and evaluation.

Readability and clarity of the presentation:
The paper is well written, but there are parts that need to be further clarified and given more attention.

Importance of the covered material to the broader Semantic Web community:
This work is important for Semantic Web researchers that want to have a quick overview of the field.

DISCLAIMER: I am a co-author of a paper [7] cited in this work.

In a nutshell:
This paper discusses the latest works on semantic table interpretation, the process of relating (a set of) Web tables to a given target Knowledge Base.
This is a work that was missing from the literature and, when improved, it will be a very good introduction to researchers who want to dive into this field.
It is well written and covers an adequate number of recent works in the field. The topic is important for the Semantic Web community, as it is about making the
Semantic Web richer by incorporating knowledge given by raw, yet structured, text in the form of Web tables.
Therefore, I recommend accepting this work, but only after doing a lot of work to improve it in several aspects, as detailed below.

In more details:

Major comments:
- There is a recent work that should be mentioned and compared against: Ritze, Lehmberg, Oulabi, Bizer. Profiling the Potential of Web Tables for Augmenting Cross-domain KBs. WWW '16.

- Background: There should be at least a small discussion on the objective function Q. Definition 2.5 is too generic and the reader is left with this function as a black box. This is the core of the problem and it deserves further analysis.

- Figure 1 is supposed to represent a single ontology. However, it seems unrealistic for the same ontology to consider the same concept (China) to be both a Country and a City. Usually, this happens with two different concepts that may have the same name, so two vertices having the same name would be expected. Please clarify, as I find this Figure confusing and it is central in the paper.

- Intro of Section 4 (Table Annotation tasks): The first paragraph of this section is not clear at all. Furthermore, it took me 2-3 passes on this paragraph alone to understand how you define the difference between an annotation task and the problem of STI. After those passes, I think that the problem is here and in the first paragraph of the paper, where you define STI in a very vague way. Please clarify both parts. For example, you should at least state here how this set of table entities is selected in an annotation task.

- Section 6 (FactBase Lookup): FactBase Lookup is only one of the three methods that we presented in [7]. The others use embeddings and ontology matching, and the best one (called Ensemble in [7]) is a hybrid of the FactBase lookup and the embeddings method. You don't have to put this clarification in the paper, I just wanted to state this for completeness, in case it was not 100% clear. In the description of our work in your paper, I could not understand which of those methods you were referring to; I think you are describing more than one of those approaches as a single approach.

- Table 2: Relation Annotation, a feature of this table, is not defined or discussed anywhere. If my understanding is correct, FactBase lookup needs a check there, as it detects relations in Web tables and associates them with the corresponding ones from the target ontology. Perhaps T2K should be ticked also. However, without a definition for this column, I cannot be 100% sure what you mean and which works belong there. I think you should add a new subsection (4.5) and discuss this problem.

- Last paragraph of Section 6.3: "By observing the table, it is apparent that search based approaches more concise in overall capabilities in comparison with the alternate approaches."
This is a VERY important statement that people reading this work expect to read and be sure about. They want your expertise in the field to help them decide which way to go.
However, this is not at all justified. Nothing is "apparent" by observing the table, as you may think, for a reader who has just started reading on this field. You need to properly justify this statement with facts, even if it may seem apparent to you after spending so much time reading the bibliography.

- Evaluation: This is a weak part of the paper, in my opinion. It's good that you introduce the metrics used to evaluate those works and that you properly explain them, but could you not do more to include more works, even for some of the gold standards, if not all of them? I know that it is really hard to get results for all those tools, but I also know that many times you can get those results by asking their authors for help.

- Conclusions and Future Work: I think you should extend this very useful discussion that you have started (it's good but can be better) with more insights and key observations and some first ideas on the future work you suggest. I believe that there is no page limit at the moment, but perhaps I am wrong.

Minor comments and typos (in order of appearance):
- Abstract: "This paper presents a survey on Semantic Table Interpretation(STI). Goal of this paper is to provide an overview of STI algorithms, data-sets used, and their evaluation strategies and critically evaluate prior approaches" --> "This paper presents a survey on STI, aiming to provide an overview of STI algorithms, data-sets used, and their evaluation strategies and critically evaluate prior approaches".
- Abstract: "and point out their strengths and weakness" --> "(...) weaknesses"
- Abstract: "Also, We present" --> "Also, we present"
- The second paragraph of the intro is not at all clear; please re-write from scratch.
- Intro, par. 3: "Without the loss of generality" --> "Without loss of generality"
- Equation (1): I would expect M to be one of Q's parameters (also that is what I expected after reading Definition 2.5). If not, please justify.
- Definitions 2.1, 2.2 are not properly discussed. In a survey paper, I would expect an extensive discussion about other types of tables (non-relational, e.g., vertical) and which ones are easier to interpret and why. Also, what if there is no header row?
- Definition 2.3: "Ontology O is" --> "An Ontology O is"
- Definition 2.4: Missing a '.' at the end.
- After Definition 2.5: How is |O| defined? Is this equal to |C|?
- After Equation (3) (and everywhere): "according to the definition 2.6" --> "according to Definition 2.6" (lose "the" and use capital first letter when referring to a specific definition/equation/figure).
- Last paragraph of Section 2: "In literature these sub-problems are referred to as Annotation Tasks" add reference(s) to this literature
- Section 3 (occurring again elsewhere): "ideology" --> "approach"
- Section 3: "Some literature[6, 8, 14] strongly suggests the existence of such relationships." What is the meaning/purpose of this sentence?
- Last sentence of Section 3: "We call out for a future work that investigates the possible trade offs between these two choices." Could you state a few more words about that?
- Section 7.2: "Efthymiou et al.[7] contributed to the process by converting both T2D and Limaye data sets to JSON format." This is not an actual contribution, but I am glad that you found it useful. Instead, what I consider an actual contribution, as stated in our paper [7], is the new gold standard that we created from Wikipedia tables and offer publicly [7]. Adding it to your experiments (we also have the results for T2K) might enrich your evaluation results, but I don't suggest that it is necessary.
- References: try to use a consistent format in the references. Also, references 40-46 are never mentioned; please remove them, or say something about them, if you want to keep them.

Review #2
By Ziqi Zhang submitted on 19/Aug/2018
Suggestion:
Major Revision
Review Comment:

The paper reviews the state of the art on semantic table interpretation. This is an important task for the vision of the Semantic Web, and has gained increasing popularity in recent years. To my knowledge, this is also the first survey paper in this field, and therefore it would make a reasonable contribution to this area of research by summarising and comparing previous work to serve as guidance for future research. The paper is generally well written and I find it easy to follow. However, in my view, the paper still needs to address several major issues to be accepted.

First, some definitions involved in the task are rather peculiar and I think there are also some flaws. While I like the generic formalisation of the task, which, on the whole, generalises and defines the task quite well, the use and definition of terminology is not precise and is sometimes confusing. Details below.

Definition 2.1: It seems you are only considering horizontal tables. This should be made clear. Although you can argue that vertical tables could be identified and converted to horizontal in a pre-processing step.

Definition 2.2: The term 'entity' is very confusing because in the Semantic Web community it is typically used to refer to named entities. Also, in your Definition 2.3 you use 'entity' with a different meaning; later in the paper the term is also used to refer to named entities. I suggest 'table elements/components/entities' instead.

Definition 2.3: Here you used 'entity' again but with a different meaning from that in Def 2.1. Also, what is the point of the notations V, E, and L, considering that you never use them later?

Definition 2.4: The term 'concept' is confusing because in the Semantic Web community, it often refers to class in an ontology. Instead, you meant to use this notation to refer to the concatenation of the set of classes, instances (i.e., entities), properties, and relations. I suggest you stick to the classic terminologies and their definitions.

Definition 2.5: Q is defined but not used later.

Last paragraph in the left column on page 2: regarding the search space, although the generalisation is useful, you should highlight that in practice, the search space size can be much smaller than |A+1|^|C|. This is because you do not need to search every table entity against all candidates in O. For example, you only search a table column header against all classes/properties/relations in O; and you only search a table cell against all entities in O.
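As a small worked illustration of this point (my own notation, not taken from the paper): if the table entities C are split into header cells C_h and data cells C_d, and the candidate sets are restricted by element type, the search space factorises roughly as

    unrestricted:  (|A| + 1)^|C|
    restricted:    (|A_h| + 1)^|C_h| * (|A_d| + 1)^|C_d|

where A_h contains only the classes/properties/relations of O and A_d only its entities; since both are typically far smaller than the full candidate set A, the restricted space is drastically smaller.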

Second, I think more coverage should be given to the datasets/evaluation methods that each surveyed method used. How big are the datasets used by each work, how are they created (particularly, are there any ways to create datasets automatically?), do any works use similar/shared datasets, are any datasets available for re-use? In particular, I am curious about the supervised machine learning based methods. Considering that labelling tables is a very laborious task, what datasets did these methods use, how were they annotated, how big are they and can they be re-used? Sharing datasets in this research area is what's been missing so far and should be encouraged. You could improve this by adding more detailed discussion in Section 7, and also expand Table 2.

Further, it is worth mentioning that [8] published three datasets: an updated version of Limaye's dataset, one using IMDB and one using MusicBrainz. These are all annotated using Freebase, which has been shut down. But these annotations can be easily converted to Wikidata, because Freebase IDs can be easily mapped to Wikidata.
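A minimal sketch of such a conversion (my own illustration, not part of [8]; it assumes network access to the public Wikidata SPARQL endpoint and relies on the Wikidata property P646, "Freebase ID"):

    # Sketch: map a Freebase MID to a Wikidata item via the public SPARQL endpoint.
    # Assumes the Wikidata property P646 ("Freebase ID") and network access.
    import requests

    def freebase_to_wikidata(mid):
        query = 'SELECT ?item WHERE { ?item wdt:P646 "%s" } LIMIT 1' % mid
        resp = requests.get(
            "https://query.wikidata.org/sparql",
            params={"query": query, "format": "json"},
            headers={"User-Agent": "sti-survey-example/0.1"},
        )
        resp.raise_for_status()
        bindings = resp.json()["results"]["bindings"]
        return bindings[0]["item"]["value"] if bindings else None

    # e.g. freebase_to_wikidata("/m/02mjmr") should return the URI of the
    # corresponding Wikidata item, if a mapping exists for that MID.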

Third, on page 11, the authors suggest some protocols for STI evaluation. But this needs to be elaborated. For example, how do you measure PRF1 for 'table annotation'? Are you considering the 'one-class-per-table' setting where you determine if a class is correctly assigned to label the table? If not, how is evaluating 'table annotation' different from evaluating 'row annotation', 'column annotation' etc.? How exactly do you carry out the evaluations for each category?

Fourth, I consider a good survey article to not only be a good summary of the state of the art but also to highlight its limitations, existing challenges and future research directions. The paper currently has done a reasonable job on the first part, but it has been mostly descriptive and has hardly addressed the second part. What do you think are the limitations of these works? What are the remaining challenges in this research area? And how do you envisage them being addressed?

Finally, a minor point: the paper needs proof-reading as there are some grammar issues such as plural/singular disagreements and misspellings.

Review #3
Anonymous submitted on 29/Aug/2018
Suggestion:
Reject
Review Comment:

The article surveys methods for aligning/matching web tables with a reference ontology in order to support applications in understanding the table data.

The article only provides superficial descriptions of the different approaches, which partly contain factual errors. The framework that is used to categorize the different methods is very shallow. The article does not discuss which table and context features are used for matching. The statements made in the conclusion about the shortcomings of the current gold standards and evaluation methodologies are wrong.

I thus opt for rejecting the article and explain my concerns in more detail below.

1. Superficial descriptions of methods: The descriptions of the specific methods in Chapter 6 are rather short and do not use the “generic framework” in a systematic way. For instance, the description of the Karma system only consists of 4 rather vague sentences that do not refer to the framework, do not explain what is meant by “semantic model”, do not explain what “user input” is incorporated, and do not explain what is learned using CRFs. The discussion of the evaluation of the system is also way too vague (“promising results”). It remains unclear which data is used for the evaluation and how the evaluation results compare to the results of other systems. Further, the article complains that the authors of Karma do not discuss how much training data is required for achieving good results. This is funny as the article itself contains a complete chapter on supervised approaches (Chapter 6.2) which also does not discuss how much training data is used by the different approaches.
2. Factual errors: The description of the approaches in Chapter 6 contains factual errors. For instance, T2K match does not lack “literal column annotation” (Page 7). T2K match also does not use the “DBpedia query interface” (Page 7), but reads the complete DBpedia ontology into memory in order to speed up matching.
3. Shallow framework: The analysis framework introduced in Chapters 2 to 5 is rather shallow and does not properly explain the challenges that are associated with the specific tasks (Chapter 4) or the motivation behind the different execution approaches (Chapter 5).
4. Missing aspects: The paper does not discuss which table features, which table context features (page titles, surrounding text), and which external resources (like surface form dictionaries) are used by the different approaches. Knowing which features are used for matching is crucial for the interpretation of results. The table model in Chapter 2 also does not contain table context, while many systems exploit this context.
5. Superficial discussion and comparison of evaluation results: The comparison of the evaluation results is not deep enough for a survey paper and does not discuss any specifics of the gold standards used (e.g. T2K and Limaye; other standards?), which are necessary to understand and interpret the evaluation results. In general, the paper does not make any attempt to interpret the evaluation results in Table 3. Table 3 also covers the evaluation of only 3 of the 12 systems mentioned in Chapter 6. No attempt is made to compare the performance of the remaining 9 systems, which is not acceptable for a survey article.
6. Wrong and blank conclusions: Conclusions in Chapter 8 partly do not build on systematic argumentation in the paper (What is the problem with subject column detection? What is the problem with candidate space selection?). The claim that the “Gold standards used at present are not equipped to evaluate all listed criteria” is wrong and shows the authors' missing understanding of the evaluation methods. Gold standards like T2K contain the correspondences (annotations) for all table elements as well as table class and subject column annotations. By comparing these gold standard correspondences to the correspondences (annotations) returned by the system, all the measures proposed in Section 7 can be calculated without any need to change the gold standard or introduce new gold standards as claimed in Section 8. Also, many papers do report evaluation results for specific subtasks, so the proposal of the authors is already implemented (for overview tables about the results on specific tasks see Chapter 8.3.7 in https://ub-madoc.bib.uni-mannheim.de/43123/1/thesis.pdf).
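As a minimal sketch of this comparison (purely for illustration, assuming the gold standard and the system output for a given subtask are both represented as sets of (table element, ontology concept) pairs):

    # Sketch: precision/recall/F1 for any annotation subtask, computed by
    # comparing gold-standard correspondences with system correspondences.
    # Both inputs are assumed to be sets of (table_element, ontology_concept) pairs.
    def precision_recall_f1(gold, predicted):
        true_positives = len(gold & predicted)
        precision = true_positives / len(predicted) if predicted else 0.0
        recall = true_positives / len(gold) if gold else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    # e.g. for column annotation (hypothetical identifiers):
    # gold      = {("col:1", "dbo:Country"), ("col:2", "dbo:populationTotal")}
    # predicted = {("col:1", "dbo:Country"), ("col:2", "dbo:areaTotal")}
    # yields precision = recall = f1 = 0.5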

Review #4
By Sebastian Neumaier submitted on 05/Sep/2018
Suggestion:
Major Revision
Review Comment:

The submitted paper surveys existing semantic table interpretation algorithms, i.e., it tries to give an overview and to categorize the existing work in terms of annotation strategies, evaluations, used background knowledge bases etc.

Main Points:
- The motivation in the paper is basically “Interpreting these tables is humanly impossible due to its size.” Can you work on the motivation and use cases for semantic table interpretation? How do others motivate their work on the topic?
- In several cases it is not clear if references are missing, or how they are used. E.g., on page 5: “Sequential Approach is the most commonly used approach in STI algorithms [13, 24].” Listing only these two does not look like the “most common” approach. More examples below.
- Section 2 needs a thorough revision: You do not introduce sources of tables (web tables, e.g. from Wikipedia, spreadsheets, CSVs, e.g. from open data portals, etc.) and the definitions look partially incomplete (see detailed review below).
- Section 5 seems incomplete in terms of categorizing the four discussed approaches: For all of the approaches only one or two references are given. I wonder if this categorization is reasonable if you cannot group them.
- Section 6: While I like introducing the three classes (search based, supervised, and unsupervised learning based), the selection of the discussed papers looks a bit arbitrary; you do not argue for or define clear selection criteria. This also relates to a comment below on the missing discussion of the development of the approaches: for instance, if you argued and discussed why TableMiner+ (or any other) is prototypical, or an evolution and the state of the art among related ones, it would become clear why you pick it.
- The evaluation methodology in 7.3 (including precision recall and f1 definitions) looks out of place; it’s not necessary to include these definitions. I’d rather like to see clearer statements and analysis of which evaluation strategies are currently applied; which gold standard datasets are missing. In the conclusions you write “the gold standards are not equipped to evaluate all listed criteria” -> please elaborate; this is important for others working on the topic.
- The list of typos and grammar issues pointed out below is not complete! The paper needs a thorough proofreading.
- The work looks - at its current state - a bit premature and unfinished. Also, the presentation of (running) examples can be improved: e.g., you refer to Figure 1 as O1, which gets confusing.

Other aspects of a review on STI:
- Is there earlier related work, and how did the selected recent papers (that you pick) develop? E.g. which efforts are based on/emerged from each other.
- You could also cover complementary and preliminary steps for semantic table interpretation, such as schema extraction (e.g. [3]), and also discuss that (and how) the works are related to other domains e.g. data integration (e.g. [4])
- What is the target dataset/domain of the algorithms, i.e., which datasets do the authors aim to annotate? E.g., do they cover web tables (from Wikipedia? a common web crawl?), CSVs, spreadsheets, and what are the different aspects of these different tables? Which datasets are covered for evaluation?

Detailed review:
- “The web contains over 100 million tables in formats of web tables[1]”: Your reference ([1]) itself references another paper for this information, which is from 2008, i.e., already 10 years old. Reference the correct paper, and, ideally, give the year of the paper to put it in context.
- p.1 “A knowledge base is essentially a directed labeled graph”
- p.1 “Across literature, different terms such as knowledge graph [3, 4], taxonomy[5], catalog[6] and knowledge base[7, 8] are used to identify the knowledge bases.”
- p.1 “Without the loss of generality we can model all above instances as Ontologies.”
You should be careful with terms and definitions here. These might all be different things.
It’s not obvious why all these should be knowledge bases, and why they all can be modelled as ontologies. Several knowledge bases distinguish between ontologies and instance data.
- p.2 What kind of table do you discuss? Web tables? CSVs? Spreadsheets? What are your preconditions for a table?
- p.2 Def 2.2: “entity can be a row, a column, a header, a cell, or the table itself” Can you explain/give examples what you mean? When do you have an entity as a table row/column?
- p.2 Def 2.3: mapping L: What about the labels for the vertices? E.g. in your figure 1 the vertices have labels. Why don’t you use RDF as a representation of ontologies?
- p.2 Def 2.4: C is the set of all entities and relations? Typically, the concepts are the set of classes in an ontology.
- Fig. 1: the relation between City and Capital should rather be modelled as a subclass relation (e.g., look at DBpedia)
- p.3: “Modelling *a* table as a single class”
- p.3: “corresponds each column of the table to a different class”: They do not map *each* column to a class, but *some* to classes and properties. Also, the model name “table as a set of classes model” is misleading in that sense, and sometimes leads to awkwardly constructed sentences.
- section 3: I found the name “Table Modelling” a bit misleading in the beginning. What about “table interpretation”, or “table schema”?
- p.3: You talk about web tables assuming the “single class model”: Again, I miss a discussion of the corpus the algorithms work on. What other tables are there used in the literature? What model is applied there? Why?
- p.6 “executes them repeatedly executed” -> reformulate
- p.6 “For an instance” -> for instance
- p.6 “by using partial annotations of annotation task to improvement each other” -> reformulate
- p.6 “TableMiner presents *an* efficient approach”
- discussion of “Nguyen et al. [3]” -> work is not published (Rejected? https://2018.eswc-conferences.org/paper_80/ ), uses Wikidata not DBpedia (Table 2)
- “tables as HTML[6] or XML[6]” -> remove first reference
- p.8 “as input and *generates* a semantic ...”
- p.11: In the enumeration you propose the “Precision, Recall and F1” measures for all annotation types -> lots of repetition; reformulate, or somehow present it differently
- “Conclusion and Future Work”: You do not discuss your future work in this section (which is not necessary in the survey - but the header is misleading)

In general, I like the idea of a review paper on this topic: it could bring a common formulation of the problem, and, more importantly, inform about evaluations and gold standards. Such a survey could not only inform about the current lack of proper baselines, but also guide towards better evaluation strategies. Also, there exist many varying works already, which justifies such a survey.
However, the current paper needs a lot of work to become such a survey: the common formulation/definition (section 2) is not convincing (in my opinion), there are aspects not covered, and guidelines for evaluations are missing.

Other works I came across:
[1] Zhang, Meihui, and Kaushik Chakrabarti. "InfoGather+: semantic matching and annotation of numeric and time-varying attributes in web tables." Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. ACM, 2013.
[2] Deng, Dong, et al. "Scalable column concept determination for web tables using large knowledge bases." Proceedings of the VLDB Endowment 6.13 (2013): 1606-1617.
[3] Adelfio, Marco D., and Hanan Samet. "Schema extraction for tabular data on the web." Proceedings of the VLDB Endowment 6.6 (2013): 421-432.
[4] Cafarella, Michael J., Alon Halevy, and Nodira Khoussainova. "Data integration for the relational web." Proceedings of the VLDB Endowment 2.1 (2009): 1090-1101.
[5] Knap, Tomás. "Towards Odalic, a Semantic Table Interpretation Tool in the ADEQUATe Project." LD4IE@ISWC, 2017.
Also, have a look at the related works in the TableMiner+ paper, which is fairly exhaustive.