ChemDCAT-AP: A DCAT-AP+ based Application Profile for Chemistry and Catalysis

ChemDCAT-AP is an application profile that extends DCAT-AP+ for chemistry and catalysis research data. It is written in LinkML (Moxon et al. 2025) and developed jointly by NFDI4Chem and NFDI4Cat. As the first domain-specific extension of DCAT-AP+, ChemDCAT-AP illustrates how the domain-agnostic provenance core of DCAT-AP+ can be adapted to a specific scientific domain while maintaining full interoperability with the European data standard DCAT-AP.

As in DCAT-AP+, the schema itself (chem_dcat_ap.yaml) is the single source of truth from which the SHACL shapes, the JSON Schema, the JSON-LD context, the Python/Pydantic data classes, and the HTML schema reference documentation are generated. Because every artifact is derived from the same file, they stay consistent with each other.
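The generation step can be sketched as a small Python helper that lists one LinkML generator call per artifact. The generator commands (gen-shacl, gen-json-schema, gen-jsonld-context, gen-pydantic, gen-doc) are part of the LinkML toolchain; the output file names, the output directory, and the helper functions are assumptions made for illustration, and the project's actual build setup may differ.

```python
import subprocess
from pathlib import Path
from typing import Optional

# LinkML generators that print their result to stdout.
# The output file names are hypothetical, chosen only for this sketch.
STDOUT_GENERATORS = {
    "gen-shacl": "chem_dcat_ap.shacl.ttl",
    "gen-json-schema": "chem_dcat_ap.schema.json",
    "gen-jsonld-context": "chem_dcat_ap.context.jsonld",
    "gen-pydantic": "chem_dcat_ap_pydantic.py",
}


def generation_plan(schema: str, out_dir: str) -> list[tuple[list[str], Optional[Path]]]:
    """Return (argv, output file) pairs; the output file is None for gen-doc."""
    out = Path(out_dir)
    plan: list[tuple[list[str], Optional[Path]]] = [
        ([cmd, schema], out / name) for cmd, name in STDOUT_GENERATORS.items()
    ]
    # gen-doc writes documentation pages into a directory instead of stdout.
    plan.append((["gen-doc", "-d", str(out / "docs"), schema], None))
    return plan


def run_plan(plan: list[tuple[list[str], Optional[Path]]]) -> None:
    """Execute the plan; requires the LinkML command-line tools to be installed."""
    for argv, target in plan:
        if target is None:
            subprocess.run(argv, check=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            with target.open("w", encoding="utf-8") as handle:
                subprocess.run(argv, stdout=handle, check=True)
```

Calling run_plan(generation_plan("chem_dcat_ap.yaml", "build")) would regenerate every artifact from the one schema file, which is what keeps them in sync.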

What ChemDCAT-AP adds

DCAT-AP+ provides the generic building blocks that allow a detailed, machine-readable description of how a dataset was created and what it is about. However, it requires instance data to be typed explicitly with the correct domain ontology or vocabulary terms.

ChemDCAT-AP eliminates this potential hurdle by baking domain knowledge directly into the schema. Every ChemDCAT-AP class extends a DCAT-AP+ class via is_a. Every slot inherits from a DCAT-AP+ slot the same way. All schema elements are mapped to established, relevant ontologies, such as the Chemical Entities of Biological Interest (ChEBI) ontology, the Chemical Information Ontology (CHEMINF), the Semanticscience Integrated Ontology (SIO), or the Named Reaction Ontology (RXNO). The extension rules of DCAT-AP+ are followed throughout.

Consequently, instance data gets automatically typed correctly, removing the need for external lookups. The result is a cognitively lighter modeling experience that yields less verbose instance data, stricter validation, and semantically more precise RDF output.
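To make the inheritance idea concrete, here is a toy Python model of a few classes, not the LinkML machinery itself. The class_uri values SIO:001378 and ENVO:01001405 and the parents of PolymerSample and Laboratory are taken from the "Planned extensions" section of this page; the direct parent of SubstanceSample is an assumption (the real schema may route it through MaterialSample), and the CLASSES table, lineage, and describe are hypothetical names.

```python
from typing import Optional

# Toy stand-in for a LinkML schema: class name -> (is_a parent, class_uri).
# URIs of the DCAT-AP+ base classes are left out because this page does not state them.
CLASSES: dict[str, tuple[Optional[str], Optional[str]]] = {
    "EvaluatedEntity": (None, None),   # DCAT-AP+ base class
    "Surrounding": (None, None),       # DCAT-AP+ base class
    "SubstanceSample": ("EvaluatedEntity", "SIO:001378"),  # direct parent ASSUMED
    "PolymerSample": ("SubstanceSample", "SIO:001378"),
    "Laboratory": ("Surrounding", "ENVO:01001405"),
}


def lineage(name: str) -> list[str]:
    """Follow is_a links from a class up to its DCAT-AP+ base class."""
    chain: list[str] = []
    current: Optional[str] = name
    while current is not None:
        chain.append(current)
        current = CLASSES[current][0]
    return chain


def describe(name: str) -> dict[str, Optional[str]]:
    """Report the ontology term a class is mapped to and the base it specializes."""
    return {"rdf_type": CLASSES[name][1], "dcat_ap_plus_base": lineage(name)[-1]}
```

With this table, describe("Laboratory") yields the ENVO term and the Surrounding base without any lookup by the data author: choosing the class already fixes the ontology term.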

| DCAT-AP+ base | ChemDCAT-AP specialization | What it enables |
|---|---|---|
| Dataset | SubstanceSampleCharacterizationDataset, ReactionMonitoringDataset | Constraining a dataset to be about a SubstanceSample or a ChemicalReaction, respectively, in a very general way (ChemDCAT-AP sub-profiles are expected to define their own, more specific ones) |
| DataGeneratingActivity | SubstanceSampleCharacterization, ReactionMonitoring | Describing the activities that generate chemistry and catalysis datasets in a very general way (ChemDCAT-AP sub-profiles are expected to define their own, more specific ones) |
| Entity | MaterialEntity, ChemicalEntity, Atom, ChemicalProduct, Reagent, StartingMaterial | Describing chemical substances by composition and role, with identifiers and physical properties |
| EvaluatedEntity | MaterialSample, SubstanceSample, PolymerSample | Describing what kind of chemical substance or material was evaluated |
| EvaluatedActivity | ChemicalReaction | Describing what kind of chemical reaction was evaluated, including its inputs, outputs, agents, and conditions |
| AgenticEntity | Catalyst, DissolvingSubstance, Reactor | Describing what influenced or enabled a chemical reaction without being consumed |
| has_qualitative_attribute | inchi, inchikey, smiles, molecular_formula, iupac_name | Providing common chemical identifiers via dedicated slots |
| has_quantitative_attribute | has_temperature, has_mass, has_concentration, has_yield, ... | Providing common physical and chemical quantities via dedicated slots |

Example: A chemical substance sample in ChemDCAT-AP

# A SubstanceSample
id: doi:10.14272/UGRXAOUDHZOHPF-UHFFFAOYSA-N.2
title: "CRS-50440"
composed_of:
  - id: doi:10.14272/UGRXAOUDHZOHPF-UHFFFAOYSA-N.2#EvaluatedCompound
    inchikey:
      - value: "UGRXAOUDHZOHPF-UHFFFAOYSA-N"
        title: "assigned InChIKey"
    smiles:
      - value: "CNCc1csc(n1)c1ccccc1"
        title: "assigned SMILES"
    molecular_formula:
      - value: "C11H12N2S"
        title: "assigned formula"
    iupac_name:
      - value: "N-methyl-1-(2-phenyl-1,3-thiazol-4-yl)methanamine"
        title: "assigned IUPAC name"
    has_molar_mass:
      - has_quantity_type: http://qudt.org/vocab/quantitykind/MolarMass
        unit: http://qudt.org/vocab/unit/GM-PER-MOL
        value: 204.072119
        title: "calculated molar mass"

This is valid ChemDCAT-AP instance data (see "A substance sample characterization dataset" for a full dcat:Dataset example). It can be validated and converted to RDF using the LinkML tooling.
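As a rough illustration of that last step, the following Python sketch assembles the two LinkML command lines one might run for the sample above. linkml-validate and linkml-convert are LinkML command-line tools; the data file name substance_sample.yaml, the output file name, and the helper functions are hypothetical, and the schema path assumes you run from the directory containing chem_dcat_ap.yaml.

```python
import shlex


def validate_cmd(schema: str, data: str, target_class: str) -> list[str]:
    """Command line that checks an instance file against a class of the schema."""
    return ["linkml-validate", "-s", schema, "-C", target_class, data]


def convert_cmd(schema: str, data: str, target_class: str, out: str) -> list[str]:
    """Command line that converts the instance file to RDF in Turtle syntax."""
    return ["linkml-convert", "-s", schema, "-C", target_class, "-t", "ttl", "-o", out, data]


if __name__ == "__main__":
    # File names below are placeholders for this sketch.
    schema, data = "chem_dcat_ap.yaml", "substance_sample.yaml"
    print(shlex.join(validate_cmd(schema, data, "SubstanceSample")))
    print(shlex.join(convert_cmd(schema, data, "SubstanceSample", "substance_sample.ttl")))
```

The printed lines can be pasted into a shell, or the argument lists can be passed to subprocess.run once the LinkML tools are installed.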

Architecture

ChemDCAT-AP is organized as four LinkML schema modules that build on one another through imports:

[Figure: architecture diagram of the ChemDCAT-AP schema modules]

Each module has a distinct responsibility:

  • Material Entities AP — handles physical matter and its properties.
  • Chemical Entities AP — adds chemical entities and substances, their structures and properties.
  • Chemical Reaction AP — models all things needed for describing chemical reactions.
  • ChemDCAT-AP — ties these schemas together and adds exemplary DCAT-AP+ specializations for Dataset and DataGeneratingActivity constrained to be about/evaluating chemical substances and reactions.

This layered approach ensures that each profile can be used independently and all profiles align with the DCAT-AP+ core patterns.
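The layering can be illustrated with a small dependency graph in Python. The exact import relationships are not spelled out on this page, so the linear chain below is an assumption made for the sake of the example, and IMPORTS, load_order, and requirements are hypothetical names.

```python
from graphlib import TopologicalSorter

# module -> modules it imports. ASSUMED linear layering; the real schemas may differ.
IMPORTS: dict[str, set[str]] = {
    "Material Entities AP": {"DCAT-AP+"},
    "Chemical Entities AP": {"Material Entities AP"},
    "Chemical Reaction AP": {"Chemical Entities AP"},
    "ChemDCAT-AP": {"Chemical Reaction AP"},
}


def load_order(imports: dict[str, set[str]]) -> list[str]:
    """Order modules so that every module comes after the ones it imports."""
    return list(TopologicalSorter(imports).static_order())


def requirements(module: str, imports: dict[str, set[str]]) -> set[str]:
    """All modules a given module needs, directly or transitively."""
    needed: set[str] = set()
    stack = list(imports.get(module, ()))
    while stack:
        dep = stack.pop()
        if dep not in needed:
            needed.add(dep)
            stack.extend(imports.get(dep, ()))
    return needed
```

Under this assumed graph, requirements("Chemical Entities AP", IMPORTS) contains only the Material Entities AP and DCAT-AP+, which is the sense in which a lower layer can be used without the layers above it.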

Documentation

For DCAT-AP+ concepts (provenance core, QuantitativeAttribute/QualitativeAttribute, ClassifierMixin), see the DCAT-AP+ documentation.

Planned extensions

Several schema elements are currently stubs intended for future specialization:

PolymerSample (chemical_entites_ap.yaml): Extends SubstanceSample with PolymerMixin. The mixin currently has no additional slots. Future versions will add polymer-specific properties (degree of polymerization, molecular weight distribution, branching, etc.) following the same pattern as ChemicalSubstanceMixin. Note: PolymerSample currently shares class_uri: SIO:001378 with its parent SubstanceSample; a more specific mapping is planned.

Laboratory (chem_dcat_ap.yaml): Extends Surrounding (from DCAT-AP+), mapped to ENVO:01001405. Currently has no additional slots beyond those inherited from Surrounding. It is intended as an extension point where downstream profiles that import ChemDCAT-AP can plug in their own sub-shapes for laboratory-specific metadata.

Device (DCAT-AP+ scope): The Device class from DCAT-AP+ is planned to be extended with a dedicated device profile aligned with the PIDInst schema of the RDA Persistent Identification of Instruments Working Group. This is outside ChemDCAT-AP's scope and would be a separate DCAT-AP+ extension that plugs into the existing Device class.

Plan (DCAT-AP+ scope, with ChemDCAT-AP extensions): The DCAT-AP+ Plan class (mapped to prov:Plan) is planned to be extended to allow structured description of procedures, methods, and experimental executions. The goal is compatibility with the OBO Foundry planned process pattern, which uses IAO:0000104 (plan specification) for its definition. IAO:0000104 is planned to be used either as an additional ontology mapping on Plan or as the class_uri of a dedicated subclass. This would align with the formal BFO-to-PROV-O mapping and enable structured protocol metadata alongside the existing provenance chain.

Source code

The LinkML schemas, test data, and documentation source are on GitHub: nfdi-de/chem-dcat-ap