A Linked Data Wrapper for CrunchBase

Submitted by Michael Färber on 11/05/2016 - 02:29

Tracking #: 1493-2705

Authors:

Michael Färber

Carsten Menne

Andreas Harth

Responsible editor:

Jens Lehmann

Submission type:

Dataset Description

Abstract:

CrunchBase is a database about startups and technology companies. The database can be searched, browsed, and edited via a website, but is also accessible via an entity-centric HTTP API in JSON format. We present a wrapper around the API that provides the data as Linked Data. The wrapper provides schema-level links to schema.org, Friend-of-a-Friend and Vocabulary-of-a-Friend, and entity-level links to DBpedia for organization entities. We describe how to harvest the RDF data to obtain a local copy of the data for further processing and querying that goes beyond the access facilities of the CrunchBase API. Further, we describe the cases in which the Linked Data API for CrunchBase and the crawled CrunchBase RDF data have been used in other works.
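The wrapping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the namespace `http://example.org/crunchbase/`, the JSON field names (`permalink`, `name`, `dbpedia_uri`), and the use of `owl:sameAs` for the DBpedia link are all assumptions made for illustration. It turns one JSON entity into N-Triples lines using only the Python standard library.

```python
import json

# Hypothetical namespaces; the real wrapper's URIs are not given in this text.
BASE = "http://example.org/crunchbase/"
VOCAB = BASE + "vocab#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def literal(value):
    """Serialize a value as a quoted, escaped N-Triples string literal."""
    # json.dumps escapes quotes, backslashes, and control characters.
    return json.dumps(str(value), ensure_ascii=False)


def entity_to_ntriples(entity):
    """Convert one JSON organization entity (assumed shape) to N-Triples lines."""
    subject = f"<{BASE}organization/{entity['permalink']}>"
    triples = [
        f"{subject} <{RDF_TYPE}> <{VOCAB}Organization> .",
        f"{subject} <{RDFS_LABEL}> {literal(entity['name'])} .",
    ]
    # Entity-level link to DBpedia, assumed here to be expressed with owl:sameAs.
    if entity.get("dbpedia_uri"):
        triples.append(f"{subject} <{OWL_SAME_AS}> <{entity['dbpedia_uri']}> .")
    return triples


if __name__ == "__main__":
    sample = json.loads(
        '{"permalink": "acme", "name": "Acme Inc.", '
        '"dbpedia_uri": "http://dbpedia.org/resource/Acme"}'
    )
    for line in entity_to_ntriples(sample):
        print(line)
```

Harvesting a local copy would then amount to running such a conversion over every entity the API exposes and loading the resulting triples into a local store.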

Full PDF Version:

swj1493.pdf

Previous Version:

A Linked Data Wrapper for CrunchBase

Tags:

Reviewed

Decision/Status:

Solicited Reviews:


Review #1

By Konrad Höffner submitted on 24/Nov/2016

Suggestion:
Minor Revision

Review Comment:

# Review of the revised version of "A Linked Data Wrapper for CrunchBase"

**Summary: The major issues have all been addressed. Still, the SPARQL endpoint is missing and I don't agree with the reasoning for not providing one. Maybe the editors can clarify whether a SPARQL endpoint is truly optional? Until then, I upgrade my recommendation from reject to minor revision. Version date and number should also be added, as requested in the call.**

## Quality and stability of the dataset

The URL is provided now but version date and number are still missing.
The licensing of the RDF dataset has been clarified.

I do not agree with the reasoning for why a SPARQL endpoint, which has been requested by all three reviewers, is still not provided.
1. "SPARQL endpoints suffer from availability issues": A non-existent SPARQL endpoint has 0% availability, so any endpoint you provide is better than none at all. I thank the authors for linking the very informative paper "Linked Dataset Description Papers at the Semantic Web Journal: A Critical Assessment" and it indeed highlights common technical issues, but I do not find the suggestion to remove the SPARQL endpoint altogether there, nor do I find this an acceptable solution to the problem.
2. "SPARQL endpoints are prone to complex queries which might affect the SPARQL endpoint performance or even might result in downtimes of the endpoint." This argument is related to 1., as it is also concerned with availability issues.

## Clarity and completeness of the descriptions.

### Innovation

> Nowack already provided an RDF wrapper for the CrunchBase API called Semantic CrunchBase in 2008, the service is no longer available.

A [blog entry](http://bnode.org/blog/2008/07/29/semantic-web-by-example-semantic-crunch...) for Semantic CrunchBase states “The initial RDF dataset is not using any known vocabs such as FOAF (or FOAFCorp). (We can INSERT mapping triples later, though.)”.

This is a major issue. According to the above statements, the existing approach could be extended with “mapping triples” to integrate vocabularies. A thorough motivation for why this existing approach was not extended, and a completely new approach was taken instead, is essential.
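To make the point concrete, the following is a minimal sketch of what such mapping triples might look like. The local vocabulary namespace `http://example.org/scb/vocab#` and its term names are hypothetical, since the vocabulary of Semantic CrunchBase is not given here, and the SPARQL 1.1 Update `INSERT DATA` form is an assumption rather than the syntax the original service supported. The FOAF terms are real.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"
# Hypothetical namespace standing in for the existing dataset's own vocabulary.
LOCAL = "http://example.org/scb/vocab#"

# Each entry links a (hypothetical) local term to an established FOAF term.
MAPPINGS = [
    (LOCAL + "Person", RDFS + "subClassOf", FOAF + "Person"),
    (LOCAL + "Company", RDFS + "subClassOf", FOAF + "Organization"),
    (LOCAL + "homepage_url", RDFS + "subPropertyOf", FOAF + "homepage"),
]


def build_insert_data(mappings):
    """Build a SPARQL 1.1 Update request that inserts the mapping triples."""
    body = "\n".join(f"  <{s}> <{p}> <{o}> ." for s, p, o in mappings)
    return "INSERT DATA {\n" + body + "\n}"


if __name__ == "__main__":
    print(build_insert_data(MAPPINGS))
```

A handful of such statements would align an existing dataset with FOAF without regenerating the instance data, which is why the authors should explain what prevented this route.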

### Unproven Statements
The vague and unproven statements have been clarified and provided with sources. While those sources are mailing list posts and text from the CrunchBase web page instead of citable publications, I rate this as acceptable in this case because this is probably the only way to get this information.

### Factual Errors

#### 5 Star Ranking
The Linked Data vocabulary star rating has been added and correctly applied.

#### JSON to RDF
The missing reference for the papers that transform JSON to RDF has been added.

### Formal Criteria and Writing
The writing is now clearer and less verbose, and the paper adheres to the 10-page limit.

Review #2

By Marta Sabou submitted on 14/Dec/2016

Suggestion:
Accept

Review Comment:

I thank the authors for responding to my comments - the new version of the paper has been improved considerably. I accept the authors' arguments for not providing a SPARQL endpoint for the data and recommend accepting the paper as is.

Review #3

Anonymous submitted on 01/Jan/2017

Suggestion:
Accept

Review Comment:

I read the revision and the responses to the reviewers' comments. I think the new revision adequately addresses my concerns, so I have no further requests at the current stage of the article.
