A Linked Data Wrapper for CrunchBase

Tracking #: 1493-2705

Michael Färber
Carsten Menne
Andreas Harth

Responsible editor: 
Jens Lehmann

Submission type: 
Dataset Description
CrunchBase is a database about startups and technology companies. The database can be searched, browsed, and edited via a website, but is also accessible via an entity-centric HTTP API in JSON format. We present a wrapper around the API that provides the data as Linked Data. The wrapper provides schema-level links to schema.org, Friend-of-a-Friend and Vocabulary-of-a-Friend, and entity-level links to DBpedia for organization entities. We describe how to harvest the RDF data to obtain a local copy of the data for further processing and querying that goes beyond the access facilities of the CrunchBase API. Further, we describe the cases in which the Linked Data API for CrunchBase and the crawled CrunchBase RDF data have been used in other works.

Solicited Reviews:
Review #1
By Konrad Höffner submitted on 24/Nov/2016
Minor Revision
Review Comment:

# Review of the revised version of "A Linked Data Wrapper for CrunchBase"

**Summary: The major issues have all been addressed. Still, the SPARQL endpoint is missing and I don't agree with the reasoning to not provide one. Maybe the editors can clarify whether a SPARQL endpoint is truly optional? Until then, I upgrade my recommendation from reject to minor revision. Version date and number should also be added, as requested in the call.**

## Quality and stability of the dataset

The URL is provided now but version date and number are still missing.
The licensing of the RDF dataset has been clarified.

I do not agree with the reasoning of why a SPARQL endpoint, which has been requested by all three reviewers, is still not provided.
1. "SPARQL endpoints suffer from availability issues": A non-existent SPARQL endpoint has 0% availability, so any endpoint you provide is better than none at all. I thank the authors for linking the very informative paper "Linked Dataset Description Papers at the Semantic Web Journal: A Critical Assessment" and it indeed highlights common technical issues, but I do not find the suggestion to remove the SPARQL endpoint altogether there, nor do I find this an acceptable solution to the problem.
2. "SPARQL endpoints are prone to complex queries which might affect the SPARQL endpoint performance or even might result in downtimes of the endpoint." This argument is related to 1., as it is also concerned with availability issues.

## Clarity and completeness of the descriptions.

### Innovation

> Nowack already provided an RDF wrapper for the CrunchBase API called Semantic CrunchBase in 2008, the service is no longer available.

A [blog entry](http://bnode.org/blog/2008/07/29/semantic-web-by-example-semantic-crunch...) for Semantic CrunchBase states “The initial RDF dataset is not using any known vocabs such as FOAF (or FOAFCorp). (We can INSERT mapping triples later, though.)”.

This is a major issue. According to the above statements, the existing approach could be extended with “mapping triples” to integrate vocabularies. A thorough motivation for why this existing approach was not extended, and a completely new approach was taken instead, is essential.

### Unproven Statements
The vague and unproven statements have been clarified and provided with sources. While those sources are mailing list posts and text from the CrunchBase web page instead of citable publications, I rate this as acceptable in this case because this is probably the only way to get this information.

### Factual Errors

#### 5 Star Ranking
The Linked Data vocabulary star rating has been added and correctly applied.

#### JSON to RDF
The missing reference for the papers that transform JSON to RDF has been added.

### Formal Criteria and Writing
The writing is clearer and less verbose now and the paper now adheres to the 10 page limit.

Review #2
By Marta Sabou submitted on 14/Dec/2016
Review Comment:

I thank the authors for responding to my comments - the new version of the paper has been improved considerably. I accept the authors' arguments for not providing a SPARQL endpoint for the data and recommend accepting the paper as is.

Review #3
Anonymous submitted on 01/Jan/2017
Review Comment:

I read the revision and responses to the reviewer's comments. I think the new revision adequately addressed my concerns. So, I have no further requests for the current stage of the article.