Metadata is provided by
publishers to describe their resource to aid its discovery, exchange and
re-use. A metadata schema defines a set of elements, where each element refers
to one aspect of the resource description, for example the title or the author.
These elements are sometimes grouped into sets of elements useful for different
purposes: for example, discovery metadata contains the elements useful for
finding a resource of interest, while
technical or use metadata contains the elements necessary for using the
resource correctly. For each element, the schema defines the semantics, i.e.
the meaning intended to be conveyed, and the rules for the content, i.e. the
data type, the domain or range from which values can be used, and whether the element is
mandatory, optional or conditional. The schema will also define how the
elements are related to each other in terms of hierarchy and cardinality.
Some metadata standards inherently define an encoding by which the metadata
must be serialised; in other cases the method for encoding the metadata is
described in a guidance document or complementary standard.
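As an illustration of what a schema definition carries, the following is a minimal sketch in Python. The class `ElementDefinition`, the function `validate` and the three example elements are hypothetical and are not taken from any of the standards discussed here; the sketch only shows how semantics, data type, domain, obligation and cardinality could be recorded for each element and then checked against a record.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ElementDefinition:
    """One element of a hypothetical metadata schema."""
    name: str                           # element label, e.g. "title"
    definition: str                     # semantics: the meaning to be conveyed
    data_type: type                     # content rule: expected Python type
    obligation: str                     # "mandatory", "optional" or "conditional"
    max_occurs: Optional[int] = 1       # cardinality; None means unbounded
    domain: Optional[frozenset] = None  # allowed values, if restricted


def validate(record: Dict[str, list], schema: List[ElementDefinition]) -> List[str]:
    """Return a list of human-readable problems found in `record`.

    `record` maps an element name to the list of values given for it.
    """
    problems = []
    for element in schema:
        values = record.get(element.name, [])
        if element.obligation == "mandatory" and not values:
            problems.append(f"{element.name}: mandatory element is missing")
        if element.max_occurs is not None and len(values) > element.max_occurs:
            problems.append(f"{element.name}: more than {element.max_occurs} value(s)")
        for value in values:
            if not isinstance(value, element.data_type):
                problems.append(f"{element.name}: {value!r} has the wrong data type")
            elif element.domain is not None and value not in element.domain:
                problems.append(f"{element.name}: {value!r} is outside the allowed domain")
    return problems


# Illustrative schema only; names and definitions are assumptions.
SCHEMA = [
    ElementDefinition("title", "Name given to the resource", str, "mandatory"),
    ElementDefinition("author", "Party responsible for creating the resource",
                      str, "optional", max_occurs=None),
    ElementDefinition("resource_type", "Kind of resource described", str, "mandatory",
                      domain=frozenset({"dataset", "model", "software"})),
]
```

Calling `validate(record, SCHEMA)` on a record returns an empty list when every rule is satisfied. Conditional obligations are not evaluated in this sketch, because they depend on rules specific to each schema.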
By using these accepted
schemes, differing levels of interoperability can be achieved. At the simplest level,
the metadata can be presented using the agreed term labels and natural language
definitions so a human user can understand it and manually compare it to
metadata records in the same or alternative schemes.
To achieve additional
syntactic and schematic interoperability, the metadata can be encoded according
to the schema rules and an appropriate data format such that an application can
consume the metadata. Mappings (crosswalks or translations) between alternative
metadata schemes, or between the individual elements within a scheme, can be used so that
metadata can be aggregated from sources that use different schemas.
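A crosswalk of this kind can be sketched as a simple lookup table. The example below is a minimal Python sketch under stated assumptions: the element names on both sides and the function `translate` are hypothetical and do not reproduce any published mapping.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical crosswalk: source element name -> target element name.
# None records that the source element has no counterpart in the target scheme.
CROSSWALK_A_TO_B: Dict[str, Optional[str]] = {
    "title": "resource_title",
    "creator": "responsible_party",
    "date": "reference_date",
    "rights": None,
}


def translate(record: Dict[str, object],
              crosswalk: Dict[str, Optional[str]]) -> Tuple[Dict[str, object], List[str]]:
    """Translate `record` into the target scheme.

    Returns the translated record and the list of source elements that
    could not be mapped, so that information loss is visible to the caller.
    """
    translated: Dict[str, object] = {}
    unmapped: List[str] = []
    for element, value in record.items():
        target = crosswalk.get(element)
        if target is None:
            unmapped.append(element)
        else:
            translated[target] = value
    return translated, unmapped
```

Returning the unmapped elements alongside the translated record is a design choice for this sketch: crosswalks are rarely lossless, and an aggregator usually needs to know what was dropped.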
For even greater interoperability,
the data content of the elements within a metadata record instance can be encoded
using the W3C RDF (Resource Description Framework) standard such that a full
description of the term, its meaning and its relations to other terms can be easily
obtained as a web resource from a URI. The use of controlled vocabularies is
discussed further below.
There is a significant
number of standards for both discovery and technical metadata. There is also a
range of services by which metadata can be recorded and the data stored
alongside the metadata. As previously outlined by Hughes et al. (2013), NERC itself puts a
significant amount of effort into storing data and model results and making the
metadata available. For example, there are seven Data Centres and the Data
Catalogue Service (DCS) to search metadata for datasets stored in the NERC data
centres. A useful summary of metadata standards across a range of disciplines
is available on the RDA Metadata Directory REH20. The commonly used standards relevant to
this study are described below.
The Dublin Core Schema REH16 is a widely used and widely applicable metadata
standard. The Dublin Core Metadata Element Set is the original set of terms
published. It consists of fifteen optional metadata elements for cataloguing
web resources and physical resources. This set of elements has been endorsed as
ISO Standard 15836:2009, IETF RFC 5013 and NISO Standard Z39.85. The wider and
full set of Dublin Core metadata terms, including the 15 original ones, is
maintained by the Dublin Core Metadata Initiative (DCMI). This set includes
spatial and temporal extent, provenance and various ways of relating different
resources (source, isPartOf, isFormatOf etc.). DCMI publishes a number of
encoding guidelines as recommendations, for example in RDF, XML and HTML REH21.
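As an illustration of an XML encoding of the original element set, the following minimal Python sketch serialises a few elements in the Dublin Core elements namespace. The wrapper element name `metadata`, the function `dublin_core_xml` and the record values are assumptions made for the example; the DCMI encoding guidelines should be consulted for the recommended container structure.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Namespace of the fifteen-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_xml(fields: Dict[str, List[str]]) -> str:
    """Serialise `fields` (element name -> list of values) as XML.

    Every element is optional and repeatable, so each value simply
    becomes one child element.
    """
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Invented values, for illustration only.
record = dublin_core_xml({
    "title": ["Example flood model output"],
    "creator": ["A. Author", "B. Author"],
    "date": ["2017-03-01"],
})
print(record)
```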
The ISO standards 19115
and 19119 REH9,REH10
are internationally adopted schemas for describing geographic information and
services respectively. The first edition of ISO 19115 was published in 2003,
but it has since been split into parts: ISO 19115-1:2014 contains the
fundamentals of the standard; ISO 19115-2:2009 contains extensions for imagery
and gridded data; and ISO/TS 19115-3:2016 provides an XML schema implementation
for the fundamental concepts compatible with ISO/TS 19139:2007 (Geographic
Metadata XML, or GMD). The ISO 19115/19119 standards are defined in UML, but a
complementary ISO standard, 19139 REH11, defines how the metadata should be serialised and
encoded in XML. Conformance against the 19115/19119 standards is therefore most
easily checked by presenting the metadata in XML and validating it against
the 19139 XML schema. The standard is primarily aimed at geographic
datasets and services, but the code list of scopes that the metadata can
be used for includes non-geographic datasets, models (“information applies to a
copy or imitation of an existing or hypothetical object”) and software
(“information applies to a computer program or routine”), though there are no
additional elements that specifically relate to these resource types.
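To make the XML encoding described above more concrete, the following Python sketch reads the scope code and the title from a small fragment written in the ISO 19139 `gmd` and `gco` namespaces. The fragment is deliberately incomplete and would not validate against the 19139 schema (for example, the contact and date stamp elements and the `codeList` attribute are omitted); its values and the function `summarise` are invented for illustration. The Python standard library does not perform XML Schema validation, so this sketch only shows how the encoded elements are located, not how conformance is checked.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Namespaces used by the ISO 19139 XML encoding.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Incomplete, invented fragment: enough structure to show where the
# scope code and title sit, but not a schema-valid record.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:hierarchyLevel>
    <gmd:MD_ScopeCode codeListValue="model">model</gmd:MD_ScopeCode>
  </gmd:hierarchyLevel>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title>
            <gco:CharacterString>Example hazard model</gco:CharacterString>
          </gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""


def summarise(xml_text: str) -> Dict[str, Optional[str]]:
    """Extract the resource scope and title from an ISO 19139 style record."""
    root = ET.fromstring(xml_text)
    scope = root.find("gmd:hierarchyLevel/gmd:MD_ScopeCode", NS)
    title = root.find(".//gmd:CI_Citation/gmd:title/gco:CharacterString", NS)
    return {
        "scope": scope.get("codeListValue") if scope is not None else None,
        "title": title.text if title is not None else None,
    }


print(summarise(SAMPLE))
```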
The Research Data Alliance (RDA) has provided links to
mappings of the ISO 19115 standard from various community standards and
exchange formats REH13.
The set of elements in the ISO standard is large and is unlikely ever to be
used in full in one metadata record, but only a small number of elements are in the
core mandatory set. Discipline communities and national governments have therefore
created extensions and profiles of the ISO standard that enforce tighter
constraints on cardinality and mandatory elements while removing many optional
elements. The profiles used by INSPIRE,
UK GEMINI, ANZLIC REH26
and MEDIN REH27
are examples of these; further ones are listed on the RDA directory REH13. The profiles
used for INSPIRE, UK GEMINI and NERC metadata standards are discussed below.
INSPIRE is an “Infrastructure
for Spatial Information within Europe” and is an EC directive. The purpose is
to enable interoperability and sharing of data that are related to activities
or policies that have an impact on the environment. INSPIRE has published a
specification for discovery metadata based on the ISO 19115/19119
Application Profile (metadata for geographic information), which defines the
core metadata elements from that profile required for INSPIRE compliance REH3. Some mandatory ISO metadata elements are not
mandatory in the INSPIRE standard and therefore need to be included as additional
elements in order to make an INSPIRE record into a valid ISO record; these are
noted in the Implementing Rules Technical Guidance REH3. The INSPIRE metadata implementing
rules were revised in March 2017 to version 2.0.1; the encoding continues
to be based on ISO 19115:2003 REH12, not on the latest version, ISO 19115-1:2014. Resources that are in scope of the INSPIRE directive
are those that are datasets, series or services; are digital; have a geospatial
component; and have as their subject one or more of the defined environmental
themes. Process modelling code is not in scope of INSPIRE.
Within the UK, the UK
GEMINI metadata standard REH5,REH6
represents the UK implementation of
INSPIRE and as such also conforms to the ISO 19115/19119 standards. All
mandatory INSPIRE elements in version 2.1 of the implementing rules are
mandatory within UK GEMINI 2.2; revision
2.3 is in progress at the time of writing and will be conformant with the
INSPIRE implementing rules 2.0.1. The
GEMINI encoding guidelines for each version ensure that the metadata is encoded
in valid ISO 19139 XML which will also be valid against the relevant INSPIRE
The Data Catalog (DCAT) Vocabulary REH23 is a recommendation of the W3C. It is
an RDF vocabulary designed to facilitate interoperability and federated
searches across data catalogues published on the Web and also facilitate
digital preservation. It makes extensive use of terms from other vocabularies,
in particular Dublin Core, and defines only a minimal set of classes and
properties of its own. An application profile, DCAT-AP, was published by the ISA
programme of the EU for data portals in Europe REH30. DCAT-AP has itself been extended
(as GeoDCAT-AP) to
better handle geospatial datasets and services and to provide an RDF
representation for the ISO 19139 XML metadata elements required by
INSPIRE. A stylesheet transformation has
been made available for converting between ISO 19139 XML and GeoDCAT-AP RDF REH24. As stated by the
authors: “The GeoDCAT-AP specification does not replace the INSPIRE Metadata Regulation nor
Metadata technical guidelines based on ISO 19115 and ISO19119.
Its purpose is give owners of geospatial metadata the possibility to achieve
more by providing an additional RDF syntax binding” REH25.
Climate and Forecast (CF) Metadata Conventions
The Climate and
Forecast (CF) conventions REH31
define metadata that enable a file containing observation data to be
self-describing. They were originally
framed as a netCDF standard for model-generated climate forecast data but can
equally be used for other data formats and other observational datasets. The
metadata provides some basic discovery-level elements such as the title, author
and spatial and temporal extent, but is primarily designed to promote the
re-use and processing of datasets by providing a definition of all the
variables in the dataset in terms of the observed parameter, the measurement
units, and data quality information about each data value.
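The idea of a self-describing file can be sketched without a netCDF library by representing the file's attributes as plain Python dictionaries. In the sketch below, `Conventions`, `standard_name`, `long_name` and `units` are attribute names used by the CF conventions; the dataset contents, the function `check_variables` and the simplified rule it applies are assumptions for illustration and do not amount to a CF compliance check.

```python
from typing import Dict, List

# Stand-in for the attributes of a netCDF file: global attributes give
# discovery-level information, variable attributes describe each variable.
dataset = {
    "global": {"Conventions": "CF-1.6", "title": "Example gridded rainfall"},
    "variables": {
        "pr": {"standard_name": "precipitation_flux", "units": "kg m-2 s-1"},
        "tas": {"long_name": "near-surface air temperature"},
    },
}


def check_variables(variables: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each variable name to the descriptive attributes it lacks.

    Simplified rule (an assumption of this sketch): every variable should
    carry units and either a standard_name or a long_name.
    """
    missing: Dict[str, List[str]] = {}
    for name, attrs in variables.items():
        gaps = []
        if "units" not in attrs:
            gaps.append("units")
        if "standard_name" not in attrs and "long_name" not in attrs:
            gaps.append("standard_name or long_name")
        if gaps:
            missing[name] = gaps
    return missing


print(check_variables(dataset["variables"]))
```

Here the variable `tas` is reported because it has no `units` attribute, which is the kind of gap that prevents a dataset from being processed without reference to external documentation.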
As discussed above, using a standard metadata schema and a
standard encoding provides the syntactic and semantic interoperability for the
metadata record. To further enable the interoperability of the content of the
metadata, the vocabulary used in it should be taken as much as possible from
controlled vocabulary lists where the term is defined, and preferably from
vocabularies that are in common or agreed usage. Most metadata schemas provide such
vocabularies or code lists along with the metadata standard, and the content of
elements can be restricted according to that list.
One of the ways to make a vocabulary easily accessible to
both humans and applications is to represent each vocabulary term as a web
resource using the W3C Resource Description Framework (RDF) standard, and to serve the
vocabulary terms and definitions via a web API.
For example, NERC provides an API to a variety of vocabularies used
across NERC (Leadbetter 2012) REH28;
IUGS-CGI publishes a suite of internationally agreed geoscience vocabularies REH29; and BGS has started
the publication of key stratigraphy vocabularies REH27.
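The following Python sketch illustrates the principle of identifying vocabulary terms by URI. It assumes, purely for illustration, that terms are described with the W3C SKOS properties `skos:prefLabel` and `skos:definition`; the `example.org` URIs, the two terms and the helper names `Term`, `resolve` and `to_turtle` are hypothetical and do not correspond to any of the services cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    uri: str         # identifier stored in the metadata record
    pref_label: str  # human-readable label
    definition: str  # natural-language definition


# Hypothetical vocabulary, keyed by term URI.
VOCABULARY = {
    term.uri: term
    for term in [
        Term("http://example.org/vocab/hazard/flood", "flood",
             "Overflow of water onto normally dry land."),
        Term("http://example.org/vocab/hazard/landslide", "landslide",
             "Downslope movement of rock, debris or earth."),
    ]
}


def resolve(value: str) -> Term:
    """Return the term identified by `value`, rejecting free-text values."""
    try:
        return VOCABULARY[value]
    except KeyError:
        raise ValueError(f"{value!r} is not a term in the controlled vocabulary") from None


def to_turtle(term: Term) -> str:
    """Render a term as RDF in Turtle syntax (labels are assumed to need no escaping)."""
    return (
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
        f"<{term.uri}> a skos:Concept ;\n"
        f'    skos:prefLabel "{term.pref_label}"@en ;\n'
        f'    skos:definition "{term.definition}"@en .'
    )


print(to_turtle(resolve("http://example.org/vocab/hazard/flood")))
```

Storing the URI rather than the label in the metadata record means that an application can retrieve the label, the definition and any relations to other terms from the vocabulary service, and that two records using the same URI are known to mean the same thing.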
For the additional metadata elements required for this
hazard metadata schema, new code lists were required for parameters, software
code etc. For this first version the
vocabularies and definitions are only available to the internal database and
are not available as web resources; this has been identified as a potential
task for future work as discussed later.