Design Patterns & Decisions of DCAT-AP+

Why DCAT-AP+ exists

The gap in DCAT-AP

DCAT-AP provides a robust metadata foundation for describing datasets across European data portals. However, it was designed for discoverability, not for capturing the scientific context of how data was generated or what it is about. Concretely, DCAT-AP offers only three ways to describe a dataset's subject matter:

| DCAT-AP property | Limitation |
|---|---|
| `dcterms:description` | Free text, not machine-actionable |
| `dcat:keyword` | Free text, not machine-actionable |
| `dcat:theme` | Restricted to the EU Dataset Theme Vocabulary, which is too coarse for domain-specific discovery (e.g. no way to distinguish NMR from IR spectroscopy) |

For describing how a dataset was created, DCAT-AP provides prov:wasGeneratedBy linking to a prov:Activity, but this Activity node shape is classified as a mere "Supportive Entity". It has no specified properties and is therefore a structural dead end. You can say a dataset was generated by some activity, but not what kind of activity, not what was evaluated, observed or measured, nor which instruments were used in doing so. This has to be defined in DCAT-AP extensions.

DCAT-AP+ closes this gap in a domain-agnostic way using patterns that can be reused by any domain for further specialization.

Problems DCAT-AP+ addresses

The SEMIC blog post on Application Profile modelling identifies six problems with current DCAT-AP extension practices. DCAT-AP+ directly addresses three of them:

Problem #2: Inconsistent artefact generation. Published data specification artefacts (HTML docs, SHACL shapes, RDF vocabulary, JSON Schema) are often generated by different tools or edited manually, leading to inconsistencies between them. DCAT-AP+ uses LinkML as a single source of truth: one YAML schema generates the SHACL shapes, the JSON Schema, the Python/Pydantic data classes, and the HTML schema reference documentation. Because every artefact is derived from the same schema, they stay consistent with one another.

Problem #5: Disadvantages of generalisation. The recommended approach for semantic adaptation in DCAT-AP extensions is to create subclasses or subproperties. But this can hinder adoption: if every domain creates its own dcat:Dataset subclass, interoperability requires reasoners or explicit multi-typing. DCAT-AP+ avoids subclass proliferation through its ClassifierMixin, which uses rdf:type to classify instances with domain-specific ontology terms without introducing new OWL subclasses. This approach is explicitly endorsed by the DCAT-AP developers.

Problem #6: No entity profile mechanism. There is no established way to define multiple usage profiles for the same entity. DCAT-AP+ addresses this through its LinkML-based approach, in which classes function as SHACL node shapes, each with its own IRI, while referencing the same underlying ontology class.

Foundational principle: LinkML elements as SHACL shapes

This is the single most important concept for understanding and extending DCAT-AP+.

DCAT-AP+ is an application profile, a graph shape specification, not an ontology. Its LinkML classes correspond to SHACL node shapes and its slots to SHACL property shapes. The ontology terms they constrain are referenced via class_uri (on classes) and slot_uri (on properties), but the LinkML elements themselves are not those ontology terms.

This separation has a concrete consequence: multiple LinkML elements can reference the same ontology term. Each represents a different usage context, a different shape, for that term.

Accordingly, DCAT-AP+ often names its classes and slots differently from the OWL terms they reference. This serves three purposes: it signals that these are (different) shapes, not the ontology terms themselves; it follows LinkML naming conventions (e.g. snake_case for slots); and it allows more suitable labels where the OWL term label might be ambiguous or opaque for our usage context.

For classes: multiple node shapes, one ontology class

Consider these three DCAT-AP+ classes:

Entity:
  class_uri: prov:Entity          # ← all three share
  # generic: any entity used or produced by an activity

EvaluatedEntity:
  is_a: Entity
  class_uri: prov:Entity          # ← the same ontology class
  # narrower: the specific entity being evaluated

AnalysisSourceData:
  is_a: EvaluatedEntity
  class_uri: prov:Entity          # ← still prov:Entity
  # most specific: data produced by a prior activity, now being analysed

All three have class_uri: prov:Entity. In the generated SHACL, they become three distinct node shapes that all target prov:Entity, but each with different property constraints. This is exactly the "entity profile" mechanism that the SEMIC blog post on application profile modelling calls for: different profiles of the same entity, each with a unique shape identifier, without minting a new OWL class.
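The effect can be sketched in a few lines of Python. This is a toy model, not LinkML tooling: the `SHAPES` dictionary, the placeholder slot names, and both helper functions are assumptions made for illustration. Only the class names, the is_a links, and the class_uri values come from the excerpt above.

```python
# Toy model of the three shapes above. Names, is_a and class_uri are taken
# from the schema excerpt; the slot names are hypothetical placeholders.
SHAPES = {
    "Entity": {
        "is_a": None,
        "class_uri": "prov:Entity",
        "slots": ["generic_slot"],
    },
    "EvaluatedEntity": {
        "is_a": "Entity",
        "class_uri": "prov:Entity",
        "slots": ["evaluation_slot"],
    },
    "AnalysisSourceData": {
        "is_a": "EvaluatedEntity",
        "class_uri": "prov:Entity",
        "slots": ["analysis_slot"],
    },
}


def effective_slots(shape_name):
    """Collect the slots along the is_a chain, parents first."""
    shape = SHAPES[shape_name]
    inherited = effective_slots(shape["is_a"]) if shape["is_a"] else []
    return inherited + shape["slots"]


def target_class(shape_name):
    """The ontology class a shape constrains; is_a plays no role here."""
    return SHAPES[shape_name]["class_uri"]


for name in SHAPES:
    print(name, "->", target_class(name), effective_slots(name))
```

Each shape accumulates more constraints through is_a, while the ontology class it targets stays prov:Entity throughout.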

For slots: property shapes with replaceable predicates

The same logic applies to slots. DCAT-AP+ assigns slot_uri values to its slots. These are the RDF predicates used in the generated triples. When a domain-specific profile creates a sub-slot (via is_a), it inherits the parent's structural role but may assign a different slot_uri:

# In DCAT-AP+:
has_quantitative_attribute:
  slot_uri: dcterms:relation      # ← intentionally generic default
  range: QuantitativeAttribute

# In a domain-specific profile:
has_temperature:
  is_a: has_quantitative_attribute
  slot_uri: SIO:000008            # ← domain-specific predicate
  range: Temperature              # ← narrower range

The sub-slot plays the same role in the graph shape as the parent, but the RDF predicate changes. The DCAT-AP+ default predicates (dcterms:relation for attributes, dcterms:subject for aboutness) are intentionally semantics-thin. They were chosen as the lowest common denominator that avoids conflicting with domain-specific vocabularies, not to carry deep ontological commitments.
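A minimal Python sketch of this behaviour follows. It is a toy model rather than LinkML's own serialisation logic: the `SLOTS` dictionary, the helper names, and the IRI `ex:temp-1` are assumptions for illustration, while the two slot definitions mirror the excerpt above.

```python
# Toy model of the two slots above; not LinkML tooling.
SLOTS = {
    "has_quantitative_attribute": {
        "is_a": None,
        "slot_uri": "dcterms:relation",
        "range": "QuantitativeAttribute",
    },
    "has_temperature": {
        "is_a": "has_quantitative_attribute",
        "slot_uri": "SIO:000008",
        "range": "Temperature",
    },
}


def structural_roles(slot_name):
    """Return the slot followed by all of its is_a ancestors."""
    roles = []
    while slot_name is not None:
        roles.append(slot_name)
        slot_name = SLOTS[slot_name]["is_a"]
    return roles


def to_triple(subject, slot_name, obj):
    """Emit a triple whose predicate is the slot's own slot_uri."""
    return (subject, SLOTS[slot_name]["slot_uri"], obj)


# ex:temp-1 is a hypothetical attribute node.
print(to_triple("ex:sample-001", "has_temperature", "ex:temp-1"))
print(structural_roles("has_temperature"))
```

The sub-slot still counts as a has_quantitative_attribute in the shape hierarchy, but the triple it produces carries SIO:000008 rather than dcterms:relation.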

For a worked example of vocabulary replacement including the interoperability considerations, see the ChemDCAT-AP ontology alignment documentation.

What this means for extending DCAT-AP+

When you create a domain profile, you define new shapes, not new ontology terms. Your LinkML classes inherit slots from DCAT-AP+ classes via is_a and may:

Keep the parent's class_uri: if your class is a more constrained usage of the same ontology concept (a narrower shape, not a new type). This preserves discoverability: SPARQL queries for the parent type still find your instances.
Assign a different class_uri: if your class should map to a domain ontology term. Your instances will then be typed with that domain term, not the parent's PROV-O class. If your ontology is aligned with PROV-O (as BFO-based ontologies are, via the BFO → PROV-O mapping), a reasoner can still infer the PROV-O type. If it is not aligned, consider whether losing PROV-O discoverability is acceptable, or use rdf_type for explicit multi-typing (see Pattern 3: ClassifierMixin).

The same applies to slot_uri on sub-slots: keep the parent's predicate for interoperability, or replace it with a semantically richer domain predicate where the generic default is insufficient.

Don't confuse LinkML inheritance with OWL subclassing

is_a: EvaluatedEntity in LinkML means "this LinkML class inherits the slots of EvaluatedEntity." It does not generate an rdfs:subClassOf axiom. The ontological alignment is controlled solely by class_uri. Likewise, a sub-slot does not generate rdfs:subPropertyOf. It inherits the parent's structural constraints but not its RDF predicate: to keep the parent's predicate, the sub-slot must set the same slot_uri value explicitly.

Blank node duplication when projecting between vocabularies

If RDF for a knowledge graph is generated from the same instance data against both the DCAT-AP+ schema and a domain-specific schema, any inlined node without an id (i.e. a blank node) will be created twice, once per schema, with different predicates connecting it to its subject:

# From DCAT-AP+ schema
ex:sample-001 dcterms:relation _:b1 .
_:b1 prov:value 300.0 .

# From domain-specific schema (different slot_uri)
ex:sample-001 <domain:hasAttribute> _:b2 .
_:b2 prov:value 300.0 .

_:b1 and _:b2 are structurally identical but distinct blank nodes, so the same recorded value appears twice in the graph. Queries that count or aggregate values will then double-count it.

To avoid this: (a) ensure all inlined nodes carry IRIs, or (b) generate RDF from the domain-specific schema only and use SPARQL CONSTRUCT rules to add the DCAT-AP+ predicates to the existing nodes (see the ChemDCAT-AP ontology alignment documentation).
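The problem and both remedies can be illustrated with plain Python sets standing in for RDF graphs. This is only a sketch: `project`, `value_nodes`, and `add_generic_predicate` are hypothetical helpers, `ex:attr-1` is an invented IRI, and the last helper merely imitates what a SPARQL CONSTRUCT rule would do.

```python
import itertools

_bnode_counter = itertools.count(1)


def project(subject, predicate, value, node_iri=None):
    """Serialise one inlined attribute node under the given predicate.

    Without node_iri, a fresh blank-node label is minted on every call,
    as happens when each schema is serialised separately.
    """
    node = node_iri or f"_:b{next(_bnode_counter)}"
    return {(subject, predicate, node), (node, "prov:value", value)}


def value_nodes(graph):
    """Nodes that carry a prov:value triple."""
    return {s for (s, p, _) in graph if p == "prov:value"}


def add_generic_predicate(graph, domain_pred, generic_pred):
    """CONSTRUCT-like rule: mirror each domain triple with the generic predicate."""
    return graph | {(s, generic_pred, o) for (s, p, o) in graph if p == domain_pred}


# Problem: two projections with blank nodes record the value twice.
blank = project("ex:sample-001", "dcterms:relation", 300.0) | project(
    "ex:sample-001", "domain:hasAttribute", 300.0
)

# Remedy (a): the node carries an IRI, so both projections merge on it.
named = project("ex:sample-001", "dcterms:relation", 300.0, "ex:attr-1") | project(
    "ex:sample-001", "domain:hasAttribute", 300.0, "ex:attr-1"
)

# Remedy (b): project once, then add the generic predicate to existing nodes.
rule_based = add_generic_predicate(
    project("ex:sample-001", "domain:hasAttribute", 300.0),
    "domain:hasAttribute",
    "dcterms:relation",
)

print(len(value_nodes(blank)), len(value_nodes(named)), len(value_nodes(rule_based)))
```

Under these assumptions the script prints `2 1 1`: only the blank-node variant ends up with two value-bearing nodes for one recorded value.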

For more on how to extend DCAT-AP+ see our Rules for domain profiles.

Pattern 1: The provenance core (PROV-O alignment)

Motivation

DCAT-AP already references PROV-O: Dataset has a prov:wasGeneratedBy property pointing to Activity, which is the node shape for the prov:Activity class. Yet, this node shape is an empty shell: it has no properties, no specified inputs, outputs, or agents. DCAT-AP+ fills this empty shell by reusing the Starting Point Terms of PROV-O.

The choice of PROV-O (rather than alternative provenance or observation models) was deliberate:

DCAT-AP already commits to PROV-O — we extend, not replace. Adding a second provenance vocabulary would create redundancy and force users to navigate two parallel models.
PROV-O's starting point terms are intentionally generic, matching our goal of domain-agnosticism.
A formal mapping between PROV-O and the Basic Formal Ontology (BFO) has been published, which validates alignment with OBO Foundry ontologies used by consortia like NFDI4Chem and NFDI4Cat.

Why not SOSA, P-Plan, or ProvONE?

Several alternative models were considered during the design of DCAT-AP+:

The Sensor, Observation, Sample, and Actuator (SOSA) ontology models the observation pattern: a Sensor makes an Observation of an ObservedProperty on a FeatureOfInterest, following a Procedure. This is well-suited for sensor networks and hard-science measurements.
P-Plan extends PROV-O with explicit plan/step structures for describing experimental workflows: a Plan has Steps, each Step has input/output Variables.
ProvONE extends PROV-O for computational workflows, adding Program, Port, and Channel classes for data pipeline provenance.

All three are more expressive than PROV-O for their respective domains. However, they share a commitment to the hard-science observation or computational workflow paradigm that makes them too narrow for DCAT-AP+'s intended scope.

DCAT-AP+ must accommodate data-generating activities across all research domains, including:

Natural sciences: NMR spectroscopy, chemical synthesis, sensor measurements — where SOSA's observation pattern fits well
Humanities: literature analysis, corpus annotation, archival research — where there is no "sensor", no "observed property", and the activity does not follow a repeatable "procedure" in the SOSA sense
Social sciences: surveys, interviews, ethnographic fieldwork — where data generation involves human participants, not instruments
Computational science: simulations, data transformations, model training — where ProvONE's workflow model fits but SOSA's sensor model does not

A literature analysis that produces a dataset is a prov:Activity that prov:generated a dcat:Dataset. It is not a SOSA Observation of an ObservedProperty. A humanities researcher should not be forced into an observation-centric vocabulary to describe their data provenance.

PROV-O's generality is a feature in this context: its starting point terms (Activity, Entity, Agent, used, wasGeneratedBy, wasAssociatedWith) are abstract enough to describe any data-producing process without imposing domain-specific assumptions. Domain-specific profiles can then add precision where needed. For example, ChemDCAT-AP adds classes and slots for chemical substances and reactions.

Domain profiles can still reference SOSA or P-Plan

Nothing prevents a domain profile from adding exact_mappings or close_mappings to SOSA or P-Plan terms on its classes. For example, a sensor-network profile could map DataGeneratingActivity → sosa:Observation and Device → sosa:Sensor. The PROV-O base layer does not preclude this. It simply does not require it.

The Activity pattern

DCAT-AP+ extends the Activity class (aligned to prov:Activity) with slots for its inputs, outputs, and agents:

Activity:
  class_uri: prov:Activity
  mixins:
    - ClassifierMixin            # adds rdf_type + type slots
  slots:
    - id
    - title
    - description
    - had_input_entity           # prov:used → Entity
    - had_output_entity          # prov:generated → Entity
    - had_input_activity         # prov:wasInformedBy → Activity
    - carried_out_by             # prov:wasAssociatedWith → AgenticEntity
    - has_qualitative_attribute  # dcterms:relation → QualitativeAttribute
    - has_quantitative_attribute # dcterms:relation → QuantitativeAttribute
    - has_part                   # dcterms:hasPart → Activity
    - part_of                    # dcterms:isPartOf → Activity

Each slot is mapped via slot_uri to a PROV-O predicate and range from the prov:Activity base pattern, or to a basic DC Terms predicate. The mappings are shown as comments in the excerpt above.

AgenticEntity: why agents, not entities

DCAT-AP+ introduces AgenticEntity (aligned to prov:Agent) as the range of carried_out_by. In PROV-O, an agent is something that bears responsibility for an activity. It influences the activity taking place. This fits instruments and software: a spectrometer doesn't just exist during a measurement, it causes the measurement to happen.

AgenticEntity has two concrete subclasses:

Device:
  is_a: AgenticEntity
  class_uri: prov:Agent              # SHACL target = prov:Agent
  # exact_mappings: OBI:0000968, epos:Equipment, NCIT:C62103, SIO:000956, AFE:0000354
  # A physical instrument (spectrometer, reactor, sensor, ...)

Software:
  is_a: AgenticEntity
  class_uri: prov:SoftwareAgent      # SHACL target = prov:SoftwareAgent
  # exact_mappings: schema:SoftwareApplication
  # A software tool (analysis script, simulation code, ...)

Why not foaf:Agent?

DCAT-AP already uses foaf:Agent for people and organisations responsible for a dataset's publication. AgenticEntity (mapped to prov:Agent) serves a different purpose: it describes what was involved in the dataset's generation. prov:Agent is also part of the core PROV-O Activity pattern, so reusing it keeps that pattern intact. The shape is named AgenticEntity rather than Agent to avoid confusion between these two distinct shapes.

DataGeneratingActivity: the specialization for data production

Since DCAT-AP+ specifically concerns how datasets are generated, it introduces DataGeneratingActivity as a subclass of Activity:

DataGeneratingActivity:
  is_a: Activity
  class_uri: prov:Activity
  slots:
    - evaluated_entity    # what was measured/observed
    - evaluated_activity  # what process was studied
    - realized_plan       # what procedure was followed
    - occurred_in         # where it took place

The key new slots, evaluated_entity and evaluated_activity, are sub-slots of the inherited had_input_entity and had_input_activity:

evaluated_entity:
  is_a: had_input_entity       # ← LinkML slot inheritance
  slot_uri: prov:used          # ← same RDF predicate as parent
  range: EvaluatedEntity       # ← narrower range than parent (Entity)

This slot inheritance is how DCAT-AP+ implements the DCAT-AP extension guideline that "properties may be added, but must not duplicate existing ones": evaluated_entity is had_input_entity, just with a narrower range and a more specific semantic intent.

Dual linking: Dataset ↔ subject matter

DCAT-AP+ links the subject matter to both the Dataset (via is_about_entity and is_about_activity) and the DataGeneratingActivity (via evaluated_entity and evaluated_activity).

This intentional redundancy supports two query patterns:

Dataset-centric: "Find all datasets about substance X" → query is_about_entity
Process-centric: "Find all activities that evaluated substance X" → query evaluated_entity

Both is_about_entity and is_about_activity are mapped to dcterms:subject (with exact_mappings to IAO:0000136 — "is about" from the Information Artifact Ontology).
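The two query patterns can be sketched over a handful of triples. All IRIs below are hypothetical, and the helper functions only imitate what would normally be SPARQL queries. Note that prov:used is also the predicate of other input slots, so a real query would additionally rely on the shape or type of the object.

```python
# Hypothetical instance data as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:dataset-1", "prov:wasGeneratedBy", "ex:measurement-1"),
    ("ex:dataset-1", "dcterms:subject", "ex:substance-x"),  # is_about_entity
    ("ex:measurement-1", "prov:used", "ex:substance-x"),    # evaluated_entity
    ("ex:dataset-2", "prov:wasGeneratedBy", "ex:measurement-2"),
    ("ex:dataset-2", "dcterms:subject", "ex:substance-y"),
    ("ex:measurement-2", "prov:used", "ex:substance-y"),
]


def subjects(triples, predicate, obj):
    """All subjects linked to obj by predicate, in sorted order."""
    return sorted(s for (s, p, o) in triples if p == predicate and o == obj)


def datasets_about(triples, entity):
    """Dataset-centric: one hop via is_about_entity (dcterms:subject)."""
    return subjects(triples, "dcterms:subject", entity)


def activities_evaluating(triples, entity):
    """Process-centric: one hop via evaluated_entity (prov:used)."""
    return subjects(triples, "prov:used", entity)


def datasets_via_activity(triples, entity):
    """Two hops: the path a dataset query would need without the direct link."""
    activities = set(activities_evaluating(triples, entity))
    return sorted(
        s for (s, p, o) in triples if p == "prov:wasGeneratedBy" and o in activities
    )


print(datasets_about(TRIPLES, "ex:substance-x"))
print(activities_evaluating(TRIPLES, "ex:substance-x"))
print(datasets_via_activity(TRIPLES, "ex:substance-x"))
```

The direct link lets the dataset-centric query answer in one hop what would otherwise require traversing the generating activity.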

The DataAnalysis chain

Research data is often produced in multi-step pipelines: an instrument generates raw data, then software analyses that raw data to produce derived results. DCAT-AP+ models this with three additional classes: DataAnalysis, AnalysisDataset, and AnalysisSourceData.

Concrete example: An NMR spectrometer produces a raw FID (Free Induction Decay) dataset. Software then performs a Fourier transform and peak assignment on that raw data, producing a structural assignment dataset. The AnalysisSourceData node links the two, preserving the full provenance chain.

Note that all three — DataAnalysis, AnalysisDataset, and AnalysisSourceData — share class_uri values with their parents (prov:Activity, dcat:Dataset, prov:Entity respectively). They are different node shapes, not new ontology classes. See Foundational Principle.

Pattern 2: Generic attribute description

Motivation

Domain-specific metadata often involves quantitative or qualitative properties attached to entities, activities, or instruments, e.g. a temperature, a concentration, a solvent name, or a calibration standard. In plain DCAT-AP, the only option is to encode these as free text in dcterms:description. DCAT-AP+ provides a structured alternative.

QuantitativeAttribute

Aligned to qudt:Quantity, this class represents a quantifiable property with a numeric value, a quantity kind, and a unit:

QuantitativeAttribute:
  class_uri: qudt:Quantity
  mixins:
    - ClassifierMixin
  slots:
    - title
    - description
    - value                # prov:value, range: float, required
  attributes:
    has_quantity_type:      # qudt:hasQuantityKind → DefinedTerm, required
    unit:                  # qudt:unit → DefinedTerm, recommended

The has_quantity_type and unit attributes use LinkML enum bindings to constrain their values to QUDT's QuantityKind and Unit vocabularies respectively.

Experimental feature

LinkML's enum binding feature is declared in the schema but may not yet be fully supported by the linkml-runtime validation tooling. The bindings express the intent that values should come from QUDT vocabularies and will be enforced once the feature matures. In the meantime, validation of these constraints may require additional checks outside of linkml-validate.

Design rationale: why a single-node pattern

QuantitativeAttribute represents a recorded value. It is the number a researcher writes down or a software tool exports. It is not an ontological model of the physical property itself. This is a deliberate choice. More expressive models exist (notably in the OBO Foundry stack, which separates physical qualities from information entities about those qualities through multiple intermediate nodes). These models offer stronger reasoning support but require data producers to understand and instantiate multi-hop structures that are not intuitive outside the ontology engineering community.

DCAT-AP+'s single-node pattern, value + quantity kind + unit, matches QUDT's Quantity model, which is an established standard for quantity representation in engineering, industrial, and web-of-data contexts. It is immediately understandable, directly queryable (one hop from entity to value), and sufficient for the structured discovery that DCAT-AP+ enables over plain DCAT-AP's free-text descriptions.

The classification slots from the ClassifierMixin provide an extension point: domain-specific profiles can use rdf_type to classify an attribute more precisely (e.g. as a measured value vs. a specified parameter) without changing the structural pattern.

Alignment with richer measurement models

Domain profiles that need the full expressivity of ontological measurement models (e.g. for reasoning over quality–datum–value chains) can subclass QuantitativeAttribute to add the necessary structure. The base pattern is intentionally minimal to remain accessible across research domains, from chemistry to digital humanities, where data producers have widely varying familiarity with formal ontology.

Worked example: describing a measurement temperature

Suppose an NMR measurement was performed at 298.0 K. In DCAT-AP+ instance data:

# Inside a DataGeneratingActivity or EvaluatedEntity:
has_quantitative_attribute:
  - title: "sample temperature setting"
    rdf_type:
      id: NMR:1400262
      title: "sample temperature information"  
    value: 298.0
    has_quantity_type:
      id: http://qudt.org/vocab/quantitykind/Temperature
      title: "Temperature"
    unit:
      id: http://qudt.org/vocab/unit/K
      title: "Kelvin"

Compare this to what you would write in plain DCAT-AP:

# DCAT-AP: free text only
description: "Measurement performed at 298 K"

The DCAT-AP version is neither easily queryable nor validatable, and it is hardly interoperable. The DCAT-AP+ version enables SPARQL queries like "find all datasets where the measurement temperature was between 290 and 310 K".
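The logic of such a range query can be sketched in Python over instance data shaped like the example above. The catalogue contents and the `datasets_in_range` helper are hypothetical; in practice the query would be written in SPARQL against the generated RDF.

```python
KELVIN = "http://qudt.org/vocab/unit/K"
TEMPERATURE = "http://qudt.org/vocab/quantitykind/Temperature"

# Hypothetical catalogue: dataset IRI -> quantitative attributes recorded
# for its generating activity, in the structure of the worked example.
CATALOGUE = {
    "ex:dataset-1": [
        {"value": 298.0, "has_quantity_type": {"id": TEMPERATURE}, "unit": {"id": KELVIN}},
    ],
    "ex:dataset-2": [
        {"value": 350.0, "has_quantity_type": {"id": TEMPERATURE}, "unit": {"id": KELVIN}},
    ],
    # Described by free text only, so there is nothing structured to match.
    "ex:dataset-3": [],
}


def datasets_in_range(catalogue, quantity_kind, unit, low, high):
    """Datasets with at least one matching attribute whose value lies in [low, high]."""
    hits = []
    for dataset, attributes in catalogue.items():
        if any(
            attr["has_quantity_type"]["id"] == quantity_kind
            and attr.get("unit", {}).get("id") == unit  # unit is only recommended
            and low <= attr["value"] <= high
            for attr in attributes
        ):
            hits.append(dataset)
    return hits


print(datasets_in_range(CATALOGUE, TEMPERATURE, KELVIN, 290, 310))
```

The dataset that carries only a free-text description cannot be matched at all, which is the limitation the structured pattern removes.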

QualitativeAttribute

Aligned to prov:Entity, this class represents a recorded non-numeric characterization of an entity, activity, or agent. Like QuantitativeAttribute, it captures what a researcher noted down, not the inherent property itself. The value slot carries the string representation; the ClassifierMixin provides ontology-grounded classification or vocabulary-bound categorization of what that string describes:

QualitativeAttribute:
  class_uri: prov:Entity
  mixins:
    - ClassifierMixin
  slots:
    - title
    - description
    - value                # prov:value, range: string, required

Worked example: describing a spectrometer setting

An NMR spectrometer uses a specific pulse program. Recording this setting as a qualitative attribute makes it discoverable and classifiable via an ontology term:

# Inside a Device (e.g. an NMR spectrometer) or DataGeneratingActivity:
has_qualitative_attribute:
  - value: zgpg30
    title: used pulse program setting
    rdf_type:
      id: NMR:1400037
      title: NMR pulse sequence

The value slot represents a specific pulse program code of a Bruker NMR spectrometer; rdf_type provides the machine-actionable classification via the NMR ontology, and the title slots act as human-readable labels for this attribute.

Worked example: describing an assigned chemical identifier

When an NMR spectrum is analysed and a chemical structure is assigned, the resulting InChIKey is a qualitative attribute of the evaluated substance sample:

# Inside an EvaluatedEntity (e.g. a SubstanceSample in ChemDCAT-AP):
has_qualitative_attribute:
  - value: KVOIVNBYNQXCNY-BOCHJOTCSA-N
    title: assigned InChiKey
    rdf_type:
      id: CHEMINF:000059
      title: InChiKey

This pattern lets you record any non-numeric characterization on any DCAT-AP+ entity without modifying the schema. You only need the right ontology term for classification via rdf_type, or the right controlled vocabulary term for tagging via type.

When to use QualitativeAttribute vs. a domain profile sub-property

QualitativeAttribute is the generic fallback. If your domain profile (e.g. ChemDCAT-AP) defines a dedicated property like smiles or inchikey, use that instead. It is more explicit, easier to validate, and produces more concise instance data. Use QualitativeAttribute when no dedicated property exists.

Domain profiles may override slot_uri on attribute sub-slots

When a domain profile creates a sub-slot of has_quantitative_attribute or has_qualitative_attribute (e.g. ChemDCAT-AP's inchikey or has_temperature), it may assign a more specific slot_uri than the parent's dcterms:relation. For example, ChemDCAT-AP uses SIO:000008 (has attribute) for its chemistry-specific attribute sub-slots. This is a valid specialization. The sub-slot inherits the structural pattern but maps to a semantically richer predicate in the generated RDF. See slot_uri replacement in sub-slots for the interoperability implications.

Where attributes can be attached

Both has_quantitative_attribute and has_qualitative_attribute (mapped to dcterms:relation) are available on:

Entity (and all subclasses: EvaluatedEntity, AnalysisSourceData)
Activity (and all subclasses: DataGeneratingActivity, EvaluatedActivity, DataAnalysis)
AgenticEntity (and all subclasses: Device, Software)

This means you can describe properties of the thing being measured, the measurement process itself, or the instrument — all with the same pattern.

Pattern 3: Flexible classification (ClassifierMixin)

The mechanism

ClassifierMixin is an abstract mixin that injects two classification slots into every DCAT-AP+ core class:

ClassifierMixin:
  abstract: true
  mixin: true
  slots:
    - type       # dcterms:type → DefinedTerm
    - rdf_type   # rdf:type → DefinedTerm

Because it is a mixin, it does not generate its own node shape. Its slots are "mixed into" every class that declares mixins: [ClassifierMixin]: Activity, Entity, AgenticEntity, Plan, Surrounding, QualitativeAttribute, QuantitativeAttribute, and all their subclasses.

`rdf_type` vs. `type`: when to use which

These two slots serve fundamentally different purposes:

| | `rdf_type` | `type` |
|---|---|---|
| Mapped to | `rdf:type` | `dcterms:type` |
| Semantic commitment | Ontological assertion: the instance is an instance of the referenced class | Cataloging assertion: the instance is categorized as the referenced concept |
| Range expectation | An OWL/RDFS class from a formal ontology | A SKOS concept or term from a controlled vocabulary |
| Reasoner behaviour | OWL reasoners will infer class membership and apply class axioms | No inference; treated as a simple annotation |
| Use when | You want the full semantic weight of formal ontology typing (e.g., classifying a `DataGeneratingActivity` as `CHMO:0000595` so that reasoners know it is an NMR measurement) | You want lightweight tagging without committing to an ontology's full logical structure (e.g., tagging a dataset with a SKOS concept from a local taxonomy) |

Do: classify a measurement with an ontology class

# DataGeneratingActivity instance
rdf_type:
  id: CHMO:0000595                  # from the Chemical Methods Ontology
  title: "carbon-13 nuclear magnetic resonance spectroscopy"

This asserts that the activity instance is of type CHMO:0000595. A SPARQL query for ?x a CHMO:0000595 will find it. An OWL reasoner can infer that it is also a CHMO:0000293 (NMR spectroscopy) via the ontology's class hierarchy.

Do: tag with a SKOS concept

# DataGeneratingActivity instance
type:
  id: http://example.org/vocab/spectroscopy
  title: "Spectroscopy"
  from_CV: http://example.org/vocab/method-types

This says the activity is categorized as "Spectroscopy" in a local vocabulary. No ontological inference follows.

Don't: use `rdf_type` for loose tagging

If you don't intend the full ontological implications, use type. Misusing rdf_type with SKOS concepts can produce unintended reasoning results when the data is combined with ontology axioms.

Don't: use `type` when you need precision

If downstream consumers need to query by specific ontology classes (e.g. "find all ¹³C NMR measurements"), type with a vague label won't help. Use rdf_type with the precise ontology term.
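The difference between the two slots can be sketched as follows. This is a toy illustration, not a reasoner: the helpers and the IRI `ex:measurement-1` are hypothetical, and the single subclass edge is copied from the text above rather than loaded from CHMO.

```python
# Subclass edge as stated in the text above; not loaded from the ontology.
SUBCLASS_OF = {"CHMO:0000595": "CHMO:0000293"}


def to_triples(subject, instance):
    """Map the two classification slots to their respective predicates."""
    triples = set()
    if "rdf_type" in instance:
        triples.add((subject, "rdf:type", instance["rdf_type"]["id"]))
    if "type" in instance:
        triples.add((subject, "dcterms:type", instance["type"]["id"]))
    return triples


def infer_types(triples):
    """Add the rdf:type triples entailed by the subclass hierarchy."""
    inferred = set(triples)
    for s, p, o in triples:
        if p != "rdf:type":
            continue  # dcterms:type is an annotation; nothing is inferred
        parent = SUBCLASS_OF.get(o)
        while parent is not None:
            inferred.add((s, "rdf:type", parent))
            parent = SUBCLASS_OF.get(parent)
    return inferred


activity = {
    "rdf_type": {"id": "CHMO:0000595"},
    "type": {"id": "http://example.org/vocab/spectroscopy"},
}
graph = infer_types(to_triples("ex:measurement-1", activity))
for triple in sorted(graph):
    print(triple)
```

Only the rdf:type assertion gains an inferred superclass; the dcterms:type tag stays a single annotation triple.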

SEMIC endorsement

This approach, which allows instances to carry additional rdf:type assertions pointing to domain ontology classes, is explicitly discussed and endorsed in the SEMIC Application Profiles blog post, which presents asserting multiple rdf:type values as a pragmatic alternative to subclass proliferation.

Pattern 4: Contextual metadata (Plan and Surrounding)

DCAT-AP+ provides two sparsely specified classes for optional context:

Plan

Aligned to prov:Plan (aliases: Plan Specification, Method, Procedure). Represents the directive information that prescribes how an activity should be carried out — a protocol, a standard operating procedure, a measurement method.

Plan:
  class_uri: prov:Plan
  mixins:
    - ClassifierMixin      # → rdf_type can point to e.g. a specific SOP type
  slots:
    - title
    - description

Linked from DataGeneratingActivity via realized_plan (mapped to prov:used).

Plan is intentionally minimal in DCAT-AP+. Domain profiles can extend it with specific properties (e.g. protocol version, parameters). At the DCAT-AP+ level, title and description carry the human-readable information, while rdf_type or type provide machine-actionable classification.

Surrounding

Aligned to prov:Location. Represents the spatial context of an activity — a laboratory, a field site, a clean room, a computational cluster.

Surrounding:
  class_uri: prov:Location
  mixins:
    - ClassifierMixin
  slots:
    - title
    - description

Linked from DataGeneratingActivity via occurred_in (mapped to prov:atLocation).

Surrounding vs. DCAT-AP's dcterms:spatial

DCAT-AP's geographical_coverage (mapped to dcterms:spatial) describes where the dataset applies (e.g. geographic extent of a climate dataset). Surrounding describes where the data generation took place. These are different questions and can coexist on the same dataset.

Summary: The DCAT-AP+ UML overview

The following UML class diagram shows the complete DCAT-AP+ extension layer (red highlighting). The classes and slots represent the LinkML schema elements, corresponding to SHACL node and property shapes. For brevity, slots inside the UML classes are referenced via their sh:path value rather than their LinkML slot name.

[Figure: DCAT-AP+ UML class diagram]

Reading guide:

Dataset (top left) is the entry point. It must link to a DataGeneratingActivity via prov:wasGeneratedBy and should link to EvaluatedEntity / EvaluatedActivity via dcterms:subject.
DataGeneratingActivity (centre) links to its inputs (evaluated_entity, evaluated_activity), its agents (carried_out_by → AgenticEntity), an optional Plan, and an optional Surrounding.
QuantitativeAttribute and QualitativeAttribute (right) can be attached to entities, activities, and agents via dcterms:relation.
All green classes include rdf_type (rdf:type) and type (dcterms:type) from the ClassifierMixin.
