A metadata schema for documenting material samples from multiple domains

Steve Richard

Cogan Shimizu

Ontology Description
The Internet of Samples (iSamples) project brings together material sample metadata from the System for Earth Sample Registration (SESAR), Open Context, the Genomic Observatories Meta-Database (GEOME), and the Smithsonian Institution National Museum of Natural History (NMNH), representing the geoscience, archaeology/anthropology, and biology disciplines. To create an index for sample discovery across these disparate domains, we reviewed the metadata schemas and example metadata from each project partner to develop the sample description scheme described in this document. We determined that a single sample type classification vocabulary could not account for the spectrum of samples without becoming very large and unwieldy. By factoring the categorization into material type, material sample object type, and sampled feature type, it has been possible to classify the approximately 6,000,000 samples in the combined corpus. High-level vocabularies were developed from random subsamples and unique-value summaries of related fields in the source sample metadata, and were tested by the project team through a card sorting exercise and by developing code for automated mapping. Our goal was that each vocabulary should have on the order of 20 values and some hierarchy; together the values should cover the full range of samples, although individual values may overlap. These vocabularies are documented here and are registered with the ARDC Research Vocabularies Australia (RVA) vocabulary service for use by the community. The metadata schema is implemented as a JSON Schema that is used to validate instance documents. To further test and evaluate the schema, mappings to the DataCite schema now used by IGSN (International Generic Sample Number), schema.org JSON-LD, the Biodiversity Information Standards (TDWG) Minimum Information about a Digital Specimen (MIDS), and the Distributed System of Scientific Collections (DiSSCo) Open Digital Specimen (openDS) schema are included as appendices.
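
The factored classification described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the vocabulary terms, and the `classification_errors` helper are hypothetical placeholders, not the registered iSamples vocabularies or the project's JSON schema. The sketch also assumes that each field holds a list of terms, on the grounds that vocabulary values may overlap.

```python
# Hypothetical controlled vocabularies, one per classification field.
# The terms are placeholders, NOT the registered iSamples vocabulary values.
VOCABULARIES = {
    "material_type": {"rock", "organic material", "water"},
    "sample_object_type": {"core", "whole organism", "artifact"},
    "sampled_feature_type": {
        "earth interior",
        "marine biome",
        "site of past human activity",
    },
}


def classification_errors(record):
    """Return a list of problems found in the three classification fields."""
    errors = []
    for field, allowed in VOCABULARIES.items():
        values = record.get(field)
        if not values:
            # The field is absent or holds an empty list.
            errors.append(f"{field}: missing")
            continue
        for value in values:
            if value not in allowed:
                errors.append(f"{field}: unknown term {value!r}")
    return errors


# A hypothetical sample record classified independently on each field.
sample = {
    "label": "example sediment core",
    "material_type": ["rock", "water"],
    "sample_object_type": ["core"],
    "sampled_feature_type": ["earth interior"],
}

# A record with one unknown term and one missing field.
bad_sample = {
    "label": "example potsherd",
    "material_type": ["ceramic"],
    "sample_object_type": ["artifact"],
}

print(classification_errors(sample))
print(classification_errors(bad_sample))
```

In this toy setup, nine terms spread over three fields can describe 27 distinct combinations, whereas a single flat vocabulary would need one term per combination. This is the scaling argument for factoring the categorization rather than enumerating every sample type.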
