Series: Understanding OCI from the Ground Up (Part 5 of 5)
In Part 1 we built an OCI image. In Part 2 we pushed it with raw HTTP. In Part 3 we ran it with bare Linux primitives. In Part 4 we signed it with Notation and saw how the OCI 1.1 `subject` + Referrers mechanism works. Now we use the exact same plumbing to attach a Software Bill of Materials (SBOM) to that image — proving the design generalizes far beyond signatures.
What is an SBOM?
A Software Bill of Materials (SBOM) is a machine-readable inventory of everything inside a piece of software. For a container image, an SBOM tells you:
- Every OS package with name, version, license, and supplier (apt 2.4.14, libc6 2.35-0ubuntu3.4, ...)
- Every language-level dependency (npm modules, pip wheels, Go modules, Maven JARs)
- Every file delivered by each package, with its hash
- The relationships between them — which package contains which file, which package depends on which other package, which package is the root "thing" the SBOM is about
- Cross-ecosystem identifiers (PURLs, CPEs) so the SBOM can be cross-referenced with package registries, advisory feeds, and license databases
Think of an SBOM as a typed graph: nodes are packages and files, edges are typed relationships (CONTAINS, DEPENDS_ON, DESCRIBES), and every node carries enough metadata to be uniquely identified across the world.
Why SBOMs exist:
- Inventory: you can't manage what you can't see. An SBOM is the first honest answer to "what's actually in this image?"
- Reproducibility & provenance: two builds of the same Dockerfile a week apart can pull in different upstream versions. An SBOM captures the exact set that shipped.
- License compliance: the original driver behind SPDX (2010). Knowing every package's licenseDeclared is a legal or contractual requirement in many regulated industries.
- Vulnerability matching: a scanner can take an SBOM and look up each package's PURL/CPE in a vulnerability database to find known CVEs (we'll see this briefly later).
- Compliance mandates: US Executive Order 14028 led to SBOM requirements for software sold to US federal agencies, and the EU Cyber Resilience Act requires manufacturers of products with digital elements sold in the EU to produce one.
- Supply chain integrity: combined with the Notation signatures from Part 4, SBOMs let you verify what's inside an image alongside who built it.
Background — Identifying a Package
Before we look at any SBOM file, we need to answer one question: given a file on disk, how do you describe a package precisely enough that a tool on the other side of the world can recognize it?
The answer is two parallel naming systems: PURL and CPE. Most SBOM generators attach both to each package they find; Syft, for example, does so for every package in the output we'll generate below.
PURL — Package URL (the modern identifier)
A PURL (spec) is a single string that uniquely identifies a package across ecosystems. Format:
pkg:<type>/<namespace>/<name>@<version>?<qualifiers>
| Part | Meaning | Example |
|---|---|---|
| <type> | Package ecosystem | deb, rpm, apk, npm, pypi, golang, maven, cargo, oci |
| <namespace> | Distro / org / scope (optional) | ubuntu, debian, @angular (percent-encoded as %40angular inside the PURL), github.com/gorilla |
| <name> | Package name | apt, lodash, requests |
| <version> | Exact version string | 2.4.14, 4.17.21, v1.8.0 |
| <qualifiers> | Disambiguators (optional) | arch=arm64, distro=ubuntu-22.04, epoch=1 |
Examples you'll see in this post:
| PURL | What it means |
|---|---|
| pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04 | Debian package apt 2.4.14 from Ubuntu 22.04, arm64 |
| pkg:npm/lodash@4.17.21 | npm package lodash 4.17.21 |
| pkg:pypi/django@4.2.7 | PyPI package django 4.2.7 |
| pkg:golang/github.com/gorilla/mux@v1.8.0 | Go module gorilla/mux v1.8.0 |
| pkg:oci/ubuntu-curl@sha256:0124b538... | The container image itself, by digest |
PURLs are the modern community standard — used by OSV.dev, GitHub Advisory Database, Snyk, Trivy, Syft, and almost every new tool. Given a PURL, a scanner can look up known vulnerabilities in seconds.
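To make the format concrete, here is a minimal Python sketch that splits a PURL into the parts from the table above. It is a simplification, not a spec-complete parser: it percent-decodes each component but skips the per-type normalization rules the PURL spec defines. The function name parse_purl is our own.

```python
from urllib.parse import unquote


def parse_purl(purl: str) -> dict:
    """Split pkg:<type>/<namespace>/<name>@<version>?<qualifiers>#<subpath>.

    Simplified sketch: percent-decodes components, no per-type normalization.
    """
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL (missing 'pkg:' scheme): {purl!r}")
    rest = purl[len("pkg:"):]

    # Peel off the optional #subpath and ?qualifiers from the right.
    rest, _, subpath = rest.partition("#")
    rest, _, qualifier_str = rest.partition("?")
    qualifiers = {}
    for pair in filter(None, qualifier_str.split("&")):
        key, _, value = pair.partition("=")
        qualifiers[key.lower()] = unquote(value)

    # The version follows the last '@'. An '@' in a namespace (an npm scope)
    # is percent-encoded as %40, so it never matches here.
    path, at, version = rest.rpartition("@")
    if not at:
        path, version = rest, None

    type_, *segments = path.split("/")
    if not type_ or not segments:
        raise ValueError(f"PURL needs at least <type>/<name>: {purl!r}")
    namespace = "/".join(unquote(s) for s in segments[:-1]) or None
    return {
        "type": type_.lower(),
        "namespace": namespace,
        "name": unquote(segments[-1]),
        "version": unquote(version) if version else None,
        "qualifiers": qualifiers,
        "subpath": subpath or None,
    }


if __name__ == "__main__":
    print(parse_purl("pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04"))
```

Splitting on the last @ rather than the first is the one design choice that matters: it keeps scoped names and Go module paths intact.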
CPE — Common Platform Enumeration
A CPE (NIST spec) is the identifier scheme used by NIST's National Vulnerability Database. Format:
cpe:2.3:<part>:<vendor>:<product>:<version>:<update>:<edition>:<lang>:<sw_edition>:<target_sw>:<target_hw>:<other>
<part> is a (application), o (operating system), or h (hardware). Asterisks are wildcards. Example:
cpe:2.3:a:apt:apt:2.4.14:*:*:*:*:*:*:*
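Reading the fields in order: part a (application), vendor apt, product apt, version 2.4.14, and the remaining seven fields are * (any).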
CPEs predate PURLs by about a decade. They live on because NVD and many enterprise tools still use them.
Why two systems? History. CPE came out of US-government security-automation work in the 2000s; PURL came from the open-source community in the late 2010s. Modern SBOMs carry both so each downstream tool can pick the one it understands: PURL for the open-source ecosystem, CPE for NVD.
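A minimal Python sketch (function and field-list names are ours) that splits a CPE 2.3 formatted string into the named fields above. The only subtlety it handles is that a literal colon inside a field is backslash-escaped; it does not implement CPE matching, where * and - carry special meanings.

```python
CPE_FIELDS = [
    "part", "vendor", "product", "version", "update", "edition",
    "lang", "sw_edition", "target_sw", "target_hw", "other",
]


def parse_cpe23(cpe: str) -> dict:
    """Split a CPE 2.3 formatted string into its 11 named fields.

    Colons escaped with a backslash stay inside their field; escape
    sequences are kept as-is (not unescaped) to keep the sketch short.
    """
    parts, current, escaped = [], [], False
    for ch in cpe:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == ":":
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))

    if parts[:2] != ["cpe", "2.3"] or len(parts) != 2 + len(CPE_FIELDS):
        raise ValueError(f"expected 'cpe:2.3:' plus 11 fields: {cpe!r}")
    return dict(zip(CPE_FIELDS, parts[2:]))


if __name__ == "__main__":
    fields = parse_cpe23("cpe:2.3:a:apt:apt:2.4.14:*:*:*:*:*:*:*")
    print(fields["vendor"], fields["product"], fields["version"])
```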
Where these identifiers go
In an SPDX SBOM, each package's externalRefs array carries them:
"externalRefs": [
{ "referenceCategory": "SECURITY", "referenceType": "cpe23Type", "referenceLocator": "cpe:2.3:a:apt:apt:2.4.14:*:*:*:*:*:*:*" },
{ "referenceCategory": "PACKAGE-MANAGER", "referenceType": "purl", "referenceLocator": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04" }
]
In a CycloneDX SBOM, PURL is a first-class field on every component:
{ "name": "apt", "version": "2.4.14", "purl": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04" }
With those two identifiers in hand, an SBOM is portable knowledge about an image — anyone, anywhere, with any tool, can pick it up and reason about it.
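Because the two formats put the PURL in different places, a consumer that accepts both needs two small lookups. A Python sketch operating on already-parsed JSON (for example from json.load); the function names are hypothetical, and the CycloneDX helper only reads top-level components, not nested ones.

```python
def purls_from_spdx(doc: dict) -> list[str]:
    """SPDX: PURLs sit in each package's externalRefs with referenceType 'purl'."""
    return [
        ref["referenceLocator"]
        for pkg in doc.get("packages", [])
        for ref in pkg.get("externalRefs", [])
        if ref.get("referenceType") == "purl"
    ]


def cpes_from_spdx(doc: dict) -> list[str]:
    """SPDX: CPE 2.3 strings use referenceType 'cpe23Type'."""
    return [
        ref["referenceLocator"]
        for pkg in doc.get("packages", [])
        for ref in pkg.get("externalRefs", [])
        if ref.get("referenceType") == "cpe23Type"
    ]


def purls_from_cyclonedx(bom: dict) -> list[str]:
    """CycloneDX: purl is a plain (optional) field on each component."""
    return [c["purl"] for c in bom.get("components", []) if c.get("purl")]
```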
SBOM Formats — SPDX and CycloneDX
Two standards dominate. Both describe the same things and both are most often exchanged as JSON, but different communities chose different schemas:
| | SPDX | CycloneDX |
|---|---|---|
| Steward | Linux Foundation (ISO/IEC 5962:2021) | OWASP |
| Origin | License compliance (2010) | Application security (2017) |
| Identifier scheme | SPDXRef-* (internal) + PURL (external) | bom-ref + PURL |
| Top-level units | packages + files + relationships | components + dependencies |
| Vulnerabilities | Via separate VEX docs | Built into the BOM (vulnerabilities key) |
| Commonly produced by | syft (Anchore), Kubernetes release SBOMs | trivy (Aqua), cyclonedx-cli |
In practice, both formats describe the same image. Tools convert between them. We'll generate both.
How Syft Identifies Packages — Catalogers and Evidence
Before we run any commands, here's the mental model for what syft (or any SBOM generator) actually does inside.
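Syft does not guess from arbitrary bytes. It runs a fleet of small, specialised programs called catalogers, each of which knows how to recognise one specific kind of evidence on a filesystem: a package database, a lockfile, a manifest, or (for the binary-classifier) the byte signature of a well-known binary.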
What a cataloger is
A cataloger is a component inside Syft (other SBOM generators have equivalents) with one job:
Walk a filesystem. Recognise the metadata files of one packaging system. Parse them. Emit a list of structured Package records.
Each cataloger looks at a few specific path patterns and parses files in formats it knows. They run independently and their outputs are merged.
The catalogers Syft ships with
A short tour of catalogers relevant to container images:
| Cataloger | Looks for | Parses |
|---|---|---|
| dpkg-db | /var/lib/dpkg/status, /var/lib/dpkg/info/*.md5sums | Debian/Ubuntu OS packages |
| rpm-db | /var/lib/rpm/Packages (Berkeley DB or sqlite) | RPM packages on Red Hat / Fedora / SUSE |
| apk-db | /lib/apk/db/installed | Alpine packages |
| java-archive | *.jar, *.war, *.ear (and their META-INF/MANIFEST.MF, pom.properties) | Java libraries |
| python-package | *.dist-info/METADATA, *.egg-info/PKG-INFO, requirements.txt | Installed PyPI wheels and pip-style declarations |
| javascript-package | package.json, package-lock.json, yarn.lock | npm modules |
| go-module-binary | ELF binaries with embedded module info | Go modules statically compiled into a binary |
| go-mod-file | go.mod, go.sum | Declared Go dependencies |
| rust-cargo | Cargo.lock | Rust crates |
| ruby-gemspec | *.gemspec, Gemfile.lock | Ruby gems |
| php-composer | composer.lock, installed.json | PHP Composer packages |
| binary-classifier | Specific binaries (node, python3, httpd, nginx, ...) | Identifies a known binary by its byte signature and reads its embedded version |
For the full list, run syft cataloger list. As of Syft 1.44 there are 30+ catalogers covering every major ecosystem.
Evidence sources — where the data actually comes from
For our ubuntu:22.04 image, the dpkg-db cataloger is the only one that finds anything. Here is what it read for apt, as recorded in the package's sourceInfo field (the full package entry appears later in this post):
acquired package info from DPKG DB: /var/lib/dpkg/status, /usr/share/doc/apt/copyright, /var/lib/dpkg/info/apt.conffiles, /var/lib/dpkg/info/apt.md5sums, /var/lib/dpkg/info/apt.list, /var/lib/dpkg/info/apt.postinst, /var/lib/dpkg/info/apt.postrm, /var/lib/dpkg/info/apt.preinst, /var/lib/dpkg/info/apt.prerm, /var/lib/dpkg/info/apt.shlibs, /var/lib/dpkg/info/apt.triggers
Apart from the copyright file (which the package itself ships under /usr/share/doc), that list is exactly what dpkg maintains for every installed package. status gives name/version/architecture/dependencies; copyright gives the license text; *.md5sums lists the package's files with their MD5 hashes; *.list gives the full path of everything the package installed.
This is not magic. Syft's dpkg cataloger reads the same database that dpkg --status apt and dpkg --listfiles apt consult; it just parses the files directly instead of running dpkg. That's why Syft works on a static filesystem (a tarball, a pulled image, an OCI registry blob) without needing the package manager installed.
How Syft sees a container image
A container image is a stack of layer tarballs. Syft does this:
1. Pull / open the image (from a registry, daemon, tarball, or directory).
2. Build a layered filesystem view in memory (the "squashed" view, plus per-layer detail).
3. For each registered cataloger:
   - Use the cataloger's path-glob pattern to find candidate files.
   - Parse each candidate file.
   - Emit Package records.
4. Run a relationships pass:
   - Tie each package to the files it owns (from .md5sums / .list).
   - Tie packages to the layer they came from.
   - Tie everything to the image as the root.
5. Emit the final SBOM in the requested format (SPDX, CycloneDX, syft-json).
Steps 3 and 4 are why an SBOM is a graph, not just a list. The relationships are what make queries like "which files in layer 2 belong to which package?" possible.
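Step 4 is where the list turns into a graph. A hypothetical Python sketch of the image-to-package and package-to-file part, using the dpkg .md5sums format (an MD5 hash, two spaces, then a path relative to /). It uses plain names instead of SPDXIDs and leaves out layer attribution.

```python
def relationship_pass(image_id: str, md5sums_by_package: dict[str, str]) -> list[tuple[str, str, str]]:
    """Build (source, type, target) edges from dpkg md5sums data.

    md5sums_by_package maps a package name to the text of its
    /var/lib/dpkg/info/<name>.md5sums file.
    """
    edges = []
    for package, md5sums in sorted(md5sums_by_package.items()):
        edges.append((image_id, "CONTAINS", package))  # image -> package
        for line in md5sums.splitlines():
            if not line.strip():
                continue
            # "<md5>  <relative path>": two spaces separate hash and path.
            _md5, _, rel_path = line.partition("  ")
            edges.append((package, "CONTAINS", "/" + rel_path))  # package -> file
    return edges
```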
What syft — and any SBOM tool — cannot do reliably
Worth being honest about the limits, because the SBOM is only as good as the catalogers' coverage:
- Statically linked binaries with no embedded metadata (for example a stripped C or C++ binary) often show up as "unidentified files" or just a binary-classifier hit, and the version may be wrong or missing. Go binaries fare better: they normally keep their embedded module info even when built with -trimpath and stripped, which is exactly what go-module-binary reads.
- Code copied into the source tree (vendored without a manifest) is invisible. There is no metadata file to read.
- Custom-compiled libraries dropped into /usr/local/lib without a package-manager record are invisible to OS-package catalogers; they may still show up via the binary-classifier if Syft happens to recognise their signature.
- Application-level dependencies inside a built artifact (e.g. node_modules already bundled into a single dist.js) are usually invisible after bundling; they need to be cataloged from the lockfile or node_modules before bundling, not after.
This is why generating the SBOM at build time — when lockfiles and intermediate artifacts are still present — is the production best practice. Generating it from a finished image is still useful, just less complete.
Prerequisites — The Lab
We use the same network-of-containers pattern from Parts 2 and 3: a real OCI registry plus a lab container with our tools.
# Create network and start the registry
docker network create oci-net
docker run -d --name oci-registry --network oci-net -p 5000:5000 registry:2
docker run -d --name oci-lab --network oci-net ubuntu:22.04 sleep 7200
# Install base tools in the lab, and create the /work directory used by the syft step
docker exec oci-lab bash -c \
  'apt-get update -qq && apt-get install -y -qq curl jq skopeo ca-certificates > /dev/null 2>&1'
docker exec oci-lab mkdir -p /work
Install syft, oras, and trivy
docker exec oci-lab bash -c '
  # syft: generates SBOMs (SPDX, CycloneDX, syft-json)
  curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin
  # oras: pushes/pulls arbitrary OCI artifacts (the swiss-army knife for OCI 1.1)
  curl -sSLo /tmp/oras.tar.gz "https://github.com/oras-project/oras/releases/download/v1.2.0/oras_1.2.0_linux_arm64.tar.gz"
  tar -xzf /tmp/oras.tar.gz -C /tmp/ && mv /tmp/oras /usr/local/bin/
  # trivy: vulnerability scanner that also generates CycloneDX SBOMs
  curl -sSLo /tmp/trivy.tar.gz "https://github.com/aquasecurity/trivy/releases/download/v0.70.0/trivy_0.70.0_Linux-ARM64.tar.gz"
  tar -xzf /tmp/trivy.tar.gz -C /tmp/ && mv /tmp/trivy /usr/local/bin/
'
Note: on Intel/AMD machines, use amd64 instead of arm64 in the oras URL, and 64bit instead of ARM64 in the trivy URL (Trivy names its x86_64 archive Linux-64bit).
Verify:
$ syft version | head -3
Application: syft
Version: 1.44.0
BuildDate: 2026-05-01T17:11:01Z
$ oras version | head -3
Version: 1.2.0
Go version: go1.22.3
$ trivy --version | head -2
Version: 0.70.0
Push a target image
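We'll generate SBOMs for the same ubuntu-curl:v1 image name we used in Part 2. Here it is simply ubuntu:22.04 copied into our registry under that name, so every package in it comes from the base image:
docker exec oci-lab skopeo copy --dest-tls-verify=false \
  docker://ubuntu:22.04 \
  docker://oci-registry:5000/ubuntu-curl:v1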
Capture the manifest digest — the SBOM will reference it via the subject field:
docker exec oci-lab bash -c '
curl -sI http://oci-registry:5000/v2/ubuntu-curl/manifests/v1 \
-H "Accept: application/vnd.oci.image.manifest.v1+json" \
| grep -i docker-content-digest
'
Docker-Content-Digest: sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8
Step 1: Generate an SPDX SBOM with Syft
Syft can read images directly from a registry. Since our registry uses plain HTTP, we tell syft to allow that:
docker exec -w /work oci-lab bash -c '
export SYFT_REGISTRY_INSECURE_USE_HTTP=true
syft registry:oci-registry:5000/ubuntu-curl:v1 \
-o spdx-json=/work/sbom.spdx.json
'
Result: a 1.9 MB JSON file describing every package and file in the image.
$ ls -la /work/sbom.spdx.json
-rw-r--r-- 1 root root 1943149 May 9 11:34 /work/sbom.spdx.json
What's inside
Top-level structure of an SPDX 2.3 document. The header is a handful of scalar fields plus creationInfo; the payload is the four arrays at the bottom:
{
"spdxVersion": "SPDX-2.3",
"dataLicense": "CC0-1.0",
"SPDXID": "SPDXRef-DOCUMENT",
"name": "oci-registry:5000/ubuntu-curl",
"documentNamespace": "https://anchore.com/syft/image/oci-registry-5000/ubuntu-curl-da9454a6-742f-497e-a5db-16ae9aa0b48f",
"creationInfo": {
"licenseListVersion": "3.28",
"creators": [
"Organization: Anchore, Inc",
"Tool: syft-1.44.0"
],
"created": "2026-05-09T11:34:11Z"
},
"packages": [ /* 102 entries */ ],
"files": [ /* 2,290 entries */ ],
"relationships": [ /* 2,848 entries */ ],
"hasExtractedLicensingInfos": [ /* custom licenses */ ]
}
Header fields (scalars + the creationInfo object) describe the document itself — covered in detail later under Document-level fields.
The four payload arrays are where the actual SBOM data lives:
| Array | Count (our image) | What it holds |
|---|---|---|
| packages | 102 | Every package Syft identified: OS packages (deb), language-level deps (none in this image), and one entry for the image itself as the root |
| files | 2,290 | Every file Syft cataloged, each with name, multiple checksums, and an SPDXID. Present because filesAnalyzed: true on the packages. |
| relationships | 2,848 | Typed edges between SPDXIDs (DESCRIBES, CONTAINS, DEPENDS_ON, etc.). This is what makes the document a graph rather than a flat list. |
| hasExtractedLicensingInfos | varies | Full text of any non-standard license (LicenseRef-*) referenced from licenseDeclared / licenseConcluded. Empty if every package uses a standard SPDX License List ID. |
A few other arrays the spec defines that may or may not appear, depending on the producer and the input:
| Array | When it shows up |
|---|---|
| snippets | Source-code analysis tools (FOSSology, ScanCode). Almost never in container-image SBOMs. |
| annotations | Document-level reviewer/tool comments. Optional. |
| externalDocumentRefs | When this SBOM references packages defined in another SBOM (e.g. an app SBOM pointing at a base-image SBOM). Optional. |
So the honest summary is: SPDX 2.3 has four payload arrays you'll see in nearly every container-image SBOM (packages, files, relationships, hasExtractedLicensingInfos), plus three optional ones (snippets, annotations, externalDocumentRefs) that show up in specialised use cases.
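To check these counts against your own output, a short Python sketch can tally each array and the relationship types. It assumes the file from Step 1 is at sbom.spdx.json in the current directory unless you pass a path; the ARRAYS list is simply the seven arrays named above.

```python
import json
import sys
from collections import Counter

ARRAYS = ["packages", "files", "relationships", "hasExtractedLicensingInfos",
          "snippets", "annotations", "externalDocumentRefs"]


def summarize(doc: dict) -> dict:
    """Count entries in each SPDX 2.3 array (0 when the array is absent)."""
    counts = {name: len(doc.get(name, [])) for name in ARRAYS}
    counts["relationshipTypes"] = Counter(
        rel["relationshipType"] for rel in doc.get("relationships", []))
    return counts


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "sbom.spdx.json"
    try:
        with open(path, encoding="utf-8") as f:
            print(summarize(json.load(f)))
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
```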
A sample package
{
"name": "apt",
"SPDXID": "SPDXRef-Package-deb-apt-5be364a4af57b701",
"versionInfo": "2.4.14",
"supplier": "NOASSERTION",
"downloadLocation": "NOASSERTION",
"filesAnalyzed": true,
"packageVerificationCode": {
"packageVerificationCodeValue": "e75a97363fdfe68c12c4bb109d55771cae4f3a3c"
},
"sourceInfo": "acquired package info from DPKG DB: /var/lib/dpkg/status, /usr/share/doc/apt/copyright, /var/lib/dpkg/info/apt.conffiles, ...",
"licenseConcluded": "NOASSERTION",
"licenseDeclared": "GPL-2.0-only AND LicenseRef-GPLv2-",
"copyrightText": "NOASSERTION",
"externalRefs": [
{
"referenceCategory": "SECURITY",
"referenceType": "cpe23Type",
"referenceLocator": "cpe:2.3:a:apt:apt:2.4.14:*:*:*:*:*:*:*"
},
{
"referenceCategory": "PACKAGE-MANAGER",
"referenceType": "purl",
"referenceLocator": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04"
}
]
}
Every field is doing meaningful work. Walking the package entry top-to-bottom:
| Field | What it carries | Why it's there |
|---|---|---|
| name, versionInfo | Human-readable identity | Lets people read the SBOM |
| SPDXID | Document-internal ID (SPDXRef-Package-deb-apt-5be364a4af57b701) | Used as the source/target of relationships entries (see below). The hex suffix is a content hash so two builds emit stable IDs. |
| supplier | Person / Organization who supplies the package | Often NOASSERTION for OS packages where dpkg doesn't track this cleanly |
| downloadLocation | Where this version can be re-fetched | NOASSERTION if not known |
| filesAnalyzed | true if Syft enumerated the package's files | Determines whether packageVerificationCode is meaningful |
| packageVerificationCode | SHA-1 over the sorted, concatenated SHA-1s of the files belonging to the package | Tamper-evident: if any file changes, this value changes. Reproducible anywhere the same file contents are present. |
| sourceInfo | Free-text trace of which files Syft read to learn about this package | Provenance for the SBOM itself: you can audit Syft's evidence trail |
| licenseConcluded | License concluded by analysis | What an analyst concluded after reading. NOASSERTION means "no claim". The two fields exist precisely because they can disagree. |
| licenseDeclared | License declared by the upstream packager (here, from debian/copyright) | What the project says it is |
| copyrightText | Copyright notice text | License-compliance use case |
| externalRefs | Cross-ecosystem identifiers (PURL, CPE, etc.) | The portable handles other tools use |
externalRefs — not just PURL and CPE
The SPDX spec defines several referenceCategory values that you'll see in the wild:
| referenceCategory | referenceType examples | Used for |
|---|---|---|
| PACKAGE-MANAGER | purl, npm, maven-central | Cross-ecosystem package handle (PURL is the universal one) |
| SECURITY | cpe23Type, cpe22Type, advisory, fix, url | Identifiers for vulnerability matching, plus links to advisories |
| PERSISTENT-ID | swh (Software Heritage), gitoid | Long-term archival identifiers: the package's source code by content hash |
| OTHER | (anything) | Custom locator types tools have invented |
packageVerificationCode — the integrity check
The SPDX spec defines packageVerificationCode as follows: take every file that belongs to this package (minus any listed in packageVerificationCodeExcludedFiles), compute the SHA-1 of each, sort the lowercase hex strings, concatenate them, and SHA-1 the result. Because the value depends only on file contents, it is stable across machines, OSes, and time: any two copies of the same set of files produce the same value, and tampering with any owned file changes it. Together with licenseDeclared and externalRefs, this is what makes SPDX records independently verifiable, not just descriptive.
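The algorithm is small enough to write out. A Python sketch, assuming you already have each file's SHA-1 (for example from the files array, or computed with sha1_of_file); both helper names are ours, not from any SPDX library. You can't reproduce the apt value shown above from this post alone, since that needs every apt file hash.

```python
import hashlib


def sha1_of_file(path: str) -> str:
    """Lowercase hex SHA-1 of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_verification_code(file_sha1s: dict[str, str],
                              excluded: frozenset[str] = frozenset()) -> str:
    """SPDX 2.3 packageVerificationCode from a {path: sha1-hex} mapping.

    Sort the per-file SHA-1 hex strings, concatenate them with no
    separator, and SHA-1 the result. Paths in `excluded` are skipped,
    mirroring packageVerificationCodeExcludedFiles.
    """
    hashes = sorted(h.lower() for path, h in file_sha1s.items() if path not in excluded)
    return hashlib.sha1("".join(hashes).encode("ascii")).hexdigest()
```

Sorting is what makes the result independent of the order in which files were enumerated.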
The files array — file-level evidence
With filesAnalyzed: true, the SPDX document contains a files array. In our SBOM that's 2,290 entries. A typical entry:
{
"SPDXID": "SPDXRef-File-bin-bash-3a7f1c8b9e2d4f56",
"fileName": "/bin/bash",
"checksums": [
{ "algorithm": "SHA1", "checksumValue": "a8c1b..." },
{ "algorithm": "SHA256", "checksumValue": "3f617f3..." },
{ "algorithm": "MD5", "checksumValue": "44136fa..." }
],
"licenseConcluded": "NOASSERTION",
"copyrightText": "NOASSERTION"
}
Files get their own SPDXIDs because they are first-class nodes in the relationships graph.
The relationships array — where the graph lives
This is the part most people miss when they first read an SBOM. Our document has 2,848 relationship entries. Each is a typed edge between two SPDXIDs:
[
{
"spdxElementId": "SPDXRef-DOCUMENT",
"relationshipType": "DESCRIBES",
"relatedSpdxElement": "SPDXRef-Package-oci-ubuntu-curl-..."
},
{
"spdxElementId": "SPDXRef-Package-oci-ubuntu-curl-...",
"relationshipType": "CONTAINS",
"relatedSpdxElement": "SPDXRef-Package-deb-apt-5be364a4af57b701"
},
{
"spdxElementId": "SPDXRef-Package-deb-apt-5be364a4af57b701",
"relationshipType": "CONTAINS",
"relatedSpdxElement": "SPDXRef-File-usr-bin-apt-..."
},
{
"spdxElementId": "SPDXRef-Package-deb-apt-5be364a4af57b701",
"relationshipType": "DEPENDS_ON",
"relatedSpdxElement": "SPDXRef-Package-deb-libapt-pkg6.0-..."
}
]
The relationship types you'll see most often:
| Type | Meaning |
|---|---|
| DESCRIBES | The document describes the target. Typically used once, from SPDXRef-DOCUMENT to the root package (here, the image). |
| CONTAINS | The source contains the target as a subcomponent. Image CONTAINS packages; package CONTAINS files. |
| DEPENDS_ON | The source depends on the target. The spec doesn't limit this to runtime; RUNTIME_DEPENDENCY_OF is the runtime-specific type. |
| BUILD_DEPENDENCY_OF | The source is needed to build the target. |
| PATCH_FOR | The source is a patch for the target. |
| STATIC_LINK / DYNAMIC_LINK | The source links the target statically/dynamically. |
This is what makes SPDX a graph format and not just a list. Queries like "which files belong to package X?" or "what happens if I remove package Y?" are all just graph traversals over the relationships array.
- SPDXRef-DOCUMENT
  - DESCRIBES → SPDXRef-Package-oci-ubuntu-curl-... (the image, the root)
    - CONTAINS (×101, one edge per OS package) → SPDXRef-Package-deb-apt-...
      - CONTAINS (×N files) → SPDXRef-File-usr-bin-apt-..., SPDXRef-File-etc-apt-apt.conf.d-...
      - DEPENDS_ON → SPDXRef-Package-deb-libapt-pkg6.0-...
    - CONTAINS → SPDXRef-Package-deb-bash-...
    - CONTAINS → SPDXRef-Package-deb-libc6-...
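Both questions from the graph discussion above ("which files belong to package X?" and "what happens if I remove package Y?") take a few lines of Python once the relationships array is indexed. A sketch operating on the parsed JSON; the function names are hypothetical, and it assumes dependencies are expressed as DEPENDS_ON (a producer that emits DEPENDENCY_OF instead needs the direction flipped).

```python
from collections import defaultdict


def build_indexes(relationships):
    """Forward index (source, type) -> targets, reverse index (target, type) -> sources."""
    forward, reverse = defaultdict(list), defaultdict(list)
    for rel in relationships:
        src, kind, dst = rel["spdxElementId"], rel["relationshipType"], rel["relatedSpdxElement"]
        forward[(src, kind)].append(dst)
        reverse[(dst, kind)].append(src)
    return forward, reverse


def files_of(package_id, doc):
    """Which files belong to package X? fileName of every CONTAINS target that is a file."""
    forward, _ = build_indexes(doc.get("relationships", []))
    names = {f["SPDXID"]: f["fileName"] for f in doc.get("files", [])}
    return sorted(names[t] for t in forward[(package_id, "CONTAINS")] if t in names)


def dependents_of(package_id, doc):
    """What breaks if I remove Y? Everything that DEPENDS_ON Y, directly or transitively."""
    _, reverse = build_indexes(doc.get("relationships", []))
    seen, stack = set(), [package_id]
    while stack:
        for src in reverse[(stack.pop(), "DEPENDS_ON")]:
            if src not in seen:
                seen.add(src)
                stack.append(src)
    seen.discard(package_id)  # a dependency cycle could lead back to Y itself
    return seen
```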
Spec versions — SPDX 2.3 vs SPDX 3.0
SPDX has gone through several major revisions. The two that matter today:
| | SPDX 2.3 | SPDX 3.0 |
|---|---|---|
| Released | 2022 | 2024 |
| Status | Current de-facto standard | Released, slow adoption |
| Schema | Flat: top-level packages, files, relationships arrays | Element-graph: everything is an Element, relationships are first-class elements |
| Profile system | None | Modular profiles: Core, Software, Build, Security, AI, Dataset, Licensing, Lite |
| Tooling | Universal: every SBOM tool emits SPDX 2.3 | Growing: spdx-tools, syft (preview) |
Almost every SBOM you'll meet in the wild today is SPDX 2.3. The Syft output we generated above is SPDX 2.3. SPDX 3.0 is a clean break — fundamentally a different data model — and will probably take a few years to dominate. Knowing 2.3 deeply transfers most of the way to 3.0 once you learn the new element-graph vocabulary.
This post focuses on 2.3 because that's what's in production.
Serialization formats
The same SPDX 2.3 document can be serialized into five formats. The on-the-wire bytes differ; the data is identical.
| Format | File extension | Notes |
|---|---|---|
| JSON | .spdx.json | The dominant format. What syft -o spdx-json emits. JSON Schema available. |
| YAML | .spdx.yaml | Human-friendlier; less common |
| XML | .spdx.xml | Added alongside JSON and YAML in SPDX 2.2. Rare in practice. |
| Tag-Value | .spdx | The original SPDX format (key:value text). Still emitted by some tools. |
| RDF/XML | .spdx.rdf | Semantic-web format. Rare in practice. |
Tools like syft convert and the spdx-tools libraries move between them without loss. Use JSON for anything new: it's what every tool reads, and it's the usual payload when you oras attach an SBOM.
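To see that only the bytes differ, here is a partial Python sketch that renders one package object from the JSON document as SPDX tag-value lines. It covers only the fields in the sample package shown earlier and does not handle multi-line values, which tag-value wraps in special text delimiters; the function name is ours. Use syft convert or the spdx-tools libraries for real conversions.

```python
def package_to_tag_value(pkg: dict) -> str:
    """Render a subset of an SPDX 2.3 JSON package object as tag-value text."""
    lines = [f"PackageName: {pkg['name']}", f"SPDXID: {pkg['SPDXID']}"]
    if "versionInfo" in pkg:
        lines.append(f"PackageVersion: {pkg['versionInfo']}")
    lines += [
        f"PackageSupplier: {pkg.get('supplier', 'NOASSERTION')}",
        f"PackageDownloadLocation: {pkg.get('downloadLocation', 'NOASSERTION')}",
        # JSON booleans become lowercase true/false in tag-value.
        f"FilesAnalyzed: {str(pkg.get('filesAnalyzed', True)).lower()}",
    ]
    if "packageVerificationCode" in pkg:
        code = pkg["packageVerificationCode"]["packageVerificationCodeValue"]
        lines.append(f"PackageVerificationCode: {code}")
    lines += [
        f"PackageLicenseConcluded: {pkg.get('licenseConcluded', 'NOASSERTION')}",
        f"PackageLicenseDeclared: {pkg.get('licenseDeclared', 'NOASSERTION')}",
        f"PackageCopyrightText: {pkg.get('copyrightText', 'NOASSERTION')}",
    ]
    for ref in pkg.get("externalRefs", []):
        # Tag-value packs an external ref into one line: category, type, locator.
        lines.append(
            f"ExternalRef: {ref['referenceCategory']} {ref['referenceType']} {ref['referenceLocator']}")
    return "\n".join(lines)
```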
Document-level fields — what every SPDX document must have
The fields at the very top of an SPDX 2.3 JSON document are mandatory and have specific meanings. Going through ours:
{
"spdxVersion": "SPDX-2.3",
"dataLicense": "CC0-1.0",
"SPDXID": "SPDXRef-DOCUMENT",
"name": "oci-registry:5000/ubuntu-curl",
"documentNamespace": "https://anchore.com/syft/image/oci-registry-5000/ubuntu-curl-da9454a6-742f-497e-a5db-16ae9aa0b48f",
"creationInfo": { ... }
}
| Field | Why it must be exactly this |
|---|---|
| spdxVersion | The schema version. Parsers branch on this. Always has the SPDX- prefix. |
| dataLicense | The license of the SBOM document itself. SPDX 2.x mandates CC0-1.0 so that SBOM data is freely shareable, regardless of the license of the software it describes. |
| SPDXID | The document's own ID. Must be exactly SPDXRef-DOCUMENT. |
| name | A human label for the document. By convention, the name of the thing being described. |
| documentNamespace | A globally unique URI for this document. Two regenerations of the same SBOM should have different namespaces (note the UUID in ours). It's how external documents reference each other unambiguously; see the externalDocumentRefs field in the spec. |
| creationInfo | Required metadata about the generation event. |
Why the namespace matters: if Document A wants to reference a package defined in Document B, it points to <B's documentNamespace>#SPDXRef-Package-foo. The namespace is the anchor for cross-document references. Without it, SPDXIDs would only be unique within a single file.
creationInfo — the provenance of the SBOM itself
"creationInfo": {
"licenseListVersion": "3.28",
"creators": [
"Organization: Anchore, Inc",
"Tool: syft-1.44.0"
],
"created": "2026-05-09T11:34:11Z",
"comment": "..."
}
| Field | Meaning |
|---|---|
| created | UTC timestamp when the SBOM was generated. Required. |
| creators | Array of who/what created it. Each entry must start with Tool:, Organization:, or Person:. Syft lists both itself and its vendor (Anchore). |
| licenseListVersion | Which version of the SPDX License List the document's license expressions were validated against. Important because the License List grows over time (3.28 has IDs that 3.20 didn't). |
| comment | Optional free text. |
This is the SBOM's audit trail — by reading creationInfo you know what tool produced the document, when, and against which license vocabulary.
License expressions — the small DSL inside licenseConcluded and licenseDeclared
The single most underestimated piece of SPDX is its license expression syntax. Every licenseConcluded, licenseDeclared, and file-level license field uses it.
The simplest expression is a single ID from the SPDX License List, which holds several hundred licenses:
"licenseDeclared": "MIT"
"licenseDeclared": "Apache-2.0"
"licenseDeclared": "GPL-2.0-only"
"licenseDeclared": "GPL-2.0-or-later"
You can combine IDs with operators:
| Operator | Meaning | Example |
|---|---|---|
| AND | Conjunction: you must comply with both licenses | (MIT AND Apache-2.0) |
| OR | Disjunction: you may comply with either | (GPL-2.0-only OR Apache-2.0) |
| WITH | License + an exception | Apache-2.0 WITH LLVM-exception |
| + | This version or any later | LGPL-2.1+ (for GNU licenses, use the -or-later IDs such as LGPL-2.1-or-later instead) |
Compound expressions are common in real SBOMs:
"licenseDeclared": "(MIT AND Apache-2.0) OR GPL-3.0-or-later"
That reads: "the recipient may comply with the conjunction (MIT and Apache-2.0), OR with GPL-3.0-or-later, at their choice."
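Precedence is easy to misread by eye: WITH binds tightest, then AND, then OR, and parentheses override. A minimal Python sketch of a recursive-descent parser for that grammar, plus a helper of our own (not part of any SPDX tool) that expands an expression into the sets of licenses a recipient could choose to comply with. It does not check IDs against the License List.

```python
import re

# Parentheses, or runs of ID characters (covers LicenseRef-*, the '+' suffix,
# and DocumentRef-x:LicenseRef-y). Leading whitespace is skipped.
TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z0-9.+:\-]+)")
OPERATORS = {"AND", "OR", "WITH", "(", ")"}


def tokenize(expr):
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected character at {pos}: {expr[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(expr):
    """Parse into nested tuples. Precedence: WITH > AND > OR; parens override."""
    tokens, pos = tokenize(expr), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def or_expr():
        terms = [and_expr()]
        while peek() == "OR":
            take("OR")
            terms.append(and_expr())
        return terms[0] if len(terms) == 1 else ("OR", terms)

    def and_expr():
        terms = [atom()]
        while peek() == "AND":
            take("AND")
            terms.append(atom())
        return terms[0] if len(terms) == 1 else ("AND", terms)

    def atom():
        if peek() == "(":
            take("(")
            node = or_expr()
            take(")")
            return node
        license_id = take()
        if license_id in OPERATORS:
            raise ValueError(f"expected a license ID, got {license_id!r}")
        if peek() == "WITH":
            take("WITH")
            return ("WITH", license_id, take())
        return ("ID", license_id)

    tree = or_expr()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return tree


def alternatives(node):
    """Expand a parsed expression into the license sets a recipient may choose from."""
    kind = node[0]
    if kind == "ID":
        return [frozenset([node[1]])]
    if kind == "WITH":
        return [frozenset([f"{node[1]} WITH {node[2]}"])]
    if kind == "OR":
        return [alt for child in node[1] for alt in alternatives(child)]
    # AND: pick one alternative from every operand and satisfy them together.
    combos = [frozenset()]
    for child in node[1]:
        combos = [combo | alt for combo in combos for alt in alternatives(child)]
    return combos
```

For the example above, alternatives(parse(...)) yields two choices: the set {MIT, Apache-2.0}, or the set {GPL-3.0-or-later}.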
Three special tokens: NONE, NOASSERTION, LicenseRef-*
"licenseConcluded": "NONE"  ← there is no license; the file is in the public domain or unlicensed
"licenseConcluded": "NOASSERTION"  ← the analyst makes no claim about the license
"licenseDeclared": "LicenseRef-Vendor-EULA-2024"  ← a custom license defined elsewhere in the document
NONE and NOASSERTION are not the same thing. NONE is a positive claim ("no license applies"); NOASSERTION is a refusal to claim ("I don't know / I won't say"). Tools that auto-generate SBOMs default to NOASSERTION when the license is ambiguous — which, for OS packages, it usually is.
LicenseRef-* IDs let you reference a license that isn't on the SPDX License List. Their full text goes in the document's hasExtractedLicensingInfos array.
hasExtractedLicensingInfos — custom license texts
If your licenseDeclared includes a LicenseRef-Foo, you must also include a hasExtractedLicensingInfos entry with the actual license text:
"hasExtractedLicensingInfos": [
{
"licenseId": "LicenseRef-GPLv2-",
"name": "GPLv2 (Debian-modified header)",
"extractedText": " GNU GENERAL PUBLIC LICENSE\n Version 2, June 1991\n\n Copyright (C) 1989, 1991 Free Software Foundation, Inc. ...",
"comment": "Found in /usr/share/doc/apt/copyright"
}
]
This is what makes SPDX legally usable: even if a package ships under some bespoke vendor license, the SBOM carries the full text of that license alongside the reference. License-compliance auditors can review the SBOM as a self-contained legal artifact.
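That pairing rule is easy to enforce mechanically. A Python sketch (function name hypothetical) that lists every LicenseRef-* used in a package or file license field without a matching hasExtractedLicensingInfos entry. It checks only licenseDeclared and licenseConcluded, and ignores DocumentRef-...:LicenseRef-... references, whose text lives in another document.

```python
import re

# The negative lookbehind skips DocumentRef-x:LicenseRef-y (defined elsewhere).
LICENSE_REF = re.compile(r"(?<!:)LicenseRef-[A-Za-z0-9.\-]+")


def undefined_license_refs(doc: dict) -> list[str]:
    """LicenseRef-* IDs used in license fields but missing from hasExtractedLicensingInfos."""
    defined = {info["licenseId"] for info in doc.get("hasExtractedLicensingInfos", [])}
    used = set()
    for element in doc.get("packages", []) + doc.get("files", []):
        for field in ("licenseDeclared", "licenseConcluded"):
            used.update(LICENSE_REF.findall(element.get(field, "")))
    return sorted(used - defined)
```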
More package fields you'll meet
The package table earlier covered the most common fields. SPDX 2.3 defines a few more that show up regularly:
| Field | What it carries |
|---|---|
| originator | The entity that created the package (vs. supplier, who delivered it). For Debian's apt package: originator is the upstream apt project; supplier is Ubuntu. |
| primaryPackagePurpose | One of: APPLICATION, FRAMEWORK, LIBRARY, CONTAINER, OPERATING-SYSTEM, DEVICE, FIRMWARE, SOURCE, ARCHIVE, FILE, INSTALL, OTHER. Lets consumers filter (e.g. "show me all the FIRMWARE entries"). |
| releaseDate | When upstream released this version (ISO-8601 UTC). |
| builtDate | When the binary in this image was built. |
| validUntilDate | Vendor-declared end-of-support date. Useful for "are we shipping anything past EOL?" queries. |
| homepage | Project website URL. |
| attributionTexts | Required attribution notices (BSD-style "this product includes...") that must appear in derivative work documentation. |
| summary, description | Short and long human-readable descriptions of the package. |
| comment | Free text from the SBOM author. |
| annotations | See next subsection. |
annotations — reviewer or tool commentary
Both packages and the document itself can carry annotations: dated, attributed comments. This is how an analyst leaves a note on a finding without modifying any other field:
"annotations": [
{
"annotationDate": "2026-05-09T11:35:00Z",
"annotationType": "REVIEW",
"annotator": "Person: Sandeep Choudary",
"comment": "Verified license claim against /usr/share/doc/apt/copyright on 2026-05-09."
}
]
annotationType is one of REVIEW, OTHER. The lightweight design lets compliance workflows attach evidence to specific package entries.
Snippets — sub-file granularity
Sometimes a single source file mixes code under different licenses. SPDX has snippets for that:
"snippets": [
{
"SPDXID": "SPDXRef-Snippet-libfoo-bsd-fragment",
"snippetFromFile": "SPDXRef-File-src-libfoo-merged.c",
"ranges": [
{ "startPointer": { "offset": 1024, "reference": "SPDXRef-File-src-libfoo-merged.c" }, "endPointer": { "offset": 4096, "reference": "SPDXRef-File-src-libfoo-merged.c" } }
],
"licenseConcluded": "BSD-3-Clause",
"copyrightText": "Copyright (c) 2018 Original Author"
}
]
Container-image SBOMs almost never use snippets — they're a source-code-analysis feature. But if you read SBOMs from compliance tools like FOSSology you'll meet them.
Full relationship type list
The earlier table showed the 6 most common relationship types. SPDX 2.3 actually defines more than 40. They fall into rough categories:
- Composition: CONTAINS, CONTAINED_BY, DESCRIBES, DESCRIBED_BY, PACKAGE_OF, HAS_PREREQUISITE, PREREQUISITE_FOR
- Dependency: DEPENDS_ON, DEPENDENCY_OF, DEPENDENCY_MANIFEST_OF, DEV_DEPENDENCY_OF, OPTIONAL_DEPENDENCY_OF, BUILD_DEPENDENCY_OF, RUNTIME_DEPENDENCY_OF, TEST_DEPENDENCY_OF, PROVIDED_DEPENDENCY_OF
- Build & source: GENERATED_FROM, GENERATES, BUILD_TOOL_OF, DEV_TOOL_OF, TEST_TOOL_OF, OPTIONAL_COMPONENT_OF
- Linkage: STATIC_LINK, DYNAMIC_LINK
- Lifecycle: PATCH_FOR, PATCH_APPLIED, COPY_OF, ANCESTOR_OF, DESCENDANT_OF, VARIANT_OF
- Distribution: DISTRIBUTION_ARTIFACT, METAFILE_OF, DOCUMENTATION_OF, EXAMPLE_OF, TEST_CASE_OF, TEST_OF, DATA_FILE_OF, EXPANDED_FROM_ARCHIVE, FILE_ADDED, FILE_DELETED, FILE_MODIFIED
- Other: SPECIFICATION_FOR, REQUIREMENT_DESCRIPTION_FOR, OTHER, AMENDS
Note the inverse pairs (CONTAINS / CONTAINED_BY, DEPENDS_ON / DEPENDENCY_OF). SPDX lets you express the same edge from either direction; producers usually pick one direction and stay consistent.
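When you consume SBOMs from several producers, it helps to normalize those inverse pairs before comparing graphs. The sketch below (plain Python, a hypothetical helper rather than part of any SPDX tool) rewrites reverse-direction edges into their forward form and deduplicates, so "A CONTAINED_BY B" and "B CONTAINS A" collapse into one edge. Only the inverse pairs from the list above are mapped; every other type passes through unchanged.

```python
# Reverse form -> forward form, for the inverse pairs in the list above.
INVERSE = {
    "CONTAINED_BY": "CONTAINS",
    "DESCRIBED_BY": "DESCRIBES",
    "DEPENDENCY_OF": "DEPENDS_ON",
    "PREREQUISITE_FOR": "HAS_PREREQUISITE",
    "GENERATED_FROM": "GENERATES",
}


def canonical_edge(rel: dict) -> tuple:
    """Turn one SPDX relationship into a (source, type, target) tuple in forward form."""
    src = rel["spdxElementId"]
    kind = rel["relationshipType"]
    dst = rel["relatedSpdxElement"]
    if kind in INVERSE:
        # "A CONTAINED_BY B" states the same fact as "B CONTAINS A": swap the ends.
        return (dst, INVERSE[kind], src)
    return (src, kind, dst)


def canonical_edges(relationships: list) -> set:
    """Deduplicate edges that a producer wrote once in each direction."""
    return {canonical_edge(r) for r in relationships}
```

Usage: `canonical_edges(doc["relationships"])` on a parsed SPDX document gives a set you can diff directly against another producer's output.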
Inter-document references
A single SBOM can reference packages defined in another SBOM. This is how ecosystems share license analyses without duplicating data:
"externalDocumentRefs": [
{
"externalDocumentId": "DocumentRef-ubuntu-base",
"spdxDocument": "https://ubuntu.com/sboms/22.04-base/spdx-2.3.json",
"checksum": {
"algorithm": "SHA256",
"checksumValue": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
}
}
]
Then anywhere in this document you can refer to DocumentRef-ubuntu-base:SPDXRef-Package-libc6 and consumers know exactly which libc6 you mean.
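Resolving such a reference is mechanical: split on the first colon, find the matching externalDocumentRefs entry, and (after downloading the other document) check its bytes against the declared checksum. A minimal sketch, assuming the illustrative entry above; downloading is left out, and the helper names are hypothetical.

```python
import hashlib
from typing import Optional, Tuple

# externalDocumentRefs entry copied from the example above.
EXTERNAL_REFS = [
    {
        "externalDocumentId": "DocumentRef-ubuntu-base",
        "spdxDocument": "https://ubuntu.com/sboms/22.04-base/spdx-2.3.json",
        "checksum": {
            "algorithm": "SHA256",
            "checksumValue": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
        },
    }
]


def resolve_ref(ref: str, external_refs: list) -> Tuple[Optional[dict], str]:
    """Split 'DocumentRef-x:SPDXRef-y' into (external ref entry, element id).

    Local ids (no 'DocumentRef-' prefix) come back as (None, ref).
    """
    if not ref.startswith("DocumentRef-"):
        return None, ref
    doc_id, _, element_id = ref.partition(":")
    for ext in external_refs:
        if ext["externalDocumentId"] == doc_id:
            return ext, element_id
    raise KeyError(f"{doc_id} is not declared in externalDocumentRefs")


def checksum_matches(ext: dict, document_bytes: bytes) -> bool:
    """Compare downloaded bytes with the declared checksum (SHA256 or SHA1 only here)."""
    algo = ext["checksum"]["algorithm"].lower()  # "SHA256" -> "sha256"
    if algo not in ("sha256", "sha1"):
        raise ValueError(f"unsupported algorithm: {algo}")
    return hashlib.new(algo, document_bytes).hexdigest() == ext["checksum"]["checksumValue"]
```

Usage: `ext, element = resolve_ref("DocumentRef-ubuntu-base:SPDXRef-Package-libc6", EXTERNAL_REFS)` yields the external document's URI in `ext["spdxDocument"]` and the element ID to look up inside it.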
Validating an SPDX document
The official validation tools:
- spdx-tools (Java): the reference implementation. Runs schema validation, license-expression validation, and reference-integrity checks (every SPDXID referenced in relationships must exist).
- pyspdxtools (Python): the official Python implementation; same checks.
- JSON Schema at https://github.com/spdx/spdx-spec/blob/master/schemas/spdx-schema.json: plug it into any JSON Schema validator (ajv, jsonschema, IDE plugins).
- Online validator: https://tools.spdx.org/app/validate/
A well-formed SPDX document should pass all three: JSON Schema, the tooling reference checks, and license-expression validation against the License List.
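The reference-integrity check is simple enough to sketch by hand, which makes it clear what the tools are actually verifying. This covers only that one check (not schema or license-expression validation) and is no substitute for spdx-tools or pyspdxtools. It treats NONE and NOASSERTION as allowed values and skips cross-document DocumentRef- IDs.

```python
# Special values SPDX 2.3 allows in place of an element id.
SPECIAL = {"NONE", "NOASSERTION"}


def dangling_references(doc: dict) -> set:
    """Return SPDXIDs used in relationships but not defined anywhere in this document."""
    defined = {doc.get("SPDXID", "SPDXRef-DOCUMENT")}
    for section in ("packages", "files", "snippets"):
        defined.update(el["SPDXID"] for el in doc.get(section, []))

    missing = set()
    for rel in doc.get("relationships", []):
        for ref in (rel["spdxElementId"], rel["relatedSpdxElement"]):
            if ref in SPECIAL or ref.startswith("DocumentRef-"):
                continue  # special values and cross-document refs are checked elsewhere
            if ref not in defined:
                missing.add(ref)
    return missing
```

Usage: `dangling_references(json.load(open("/work/sbom.spdx.json")))` returns an empty set for a document whose relationships are internally consistent.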
Headline difference, in one sentence
SPDX is a graph of typed elements with rich licensing semantics — packages, files, snippets, and 40+ relationship types — designed first for license compliance and later extended to inventory and security. Everything else (CycloneDX, syft-json, etc.) is some compression of that idea.
Step 2: Generate a CycloneDX SBOM with Trivy
docker exec -w /work oci-lab bash -c '
TRIVY_INSECURE=true trivy image \
--format cyclonedx \
--output /work/sbom.cdx.json \
oci-registry:5000/ubuntu-curl:v1
'
2026-05-09T11:34:25Z INFO Detected OS family="ubuntu" version="22.04"
2026-05-09T11:34:25Z INFO Number of language-specific files num=0
$ wc -c /work/sbom.cdx.json
209683 /work/sbom.cdx.json
Trivy's CycloneDX output is much smaller than Syft's SPDX output (210 KB vs 1.9 MB) because it catalogs only packages, not individual files.
What's inside
{
"$schema": "http://cyclonedx.org/schema/bom-1.6.schema.json",
"bomFormat": "CycloneDX",
"specVersion": "1.6",
"serialNumber": "urn:uuid:cb37c624-b4ca-4281-aec9-8ded8176714f",
"version": 1,
"metadata": { ... }, // document-level info + the root component
"components": [ ... ], // 102 packages
"dependencies": [ ... ], // adjacency list of the dependency graph
"vulnerabilities": [ ... ] // optional, populated only when --scanners vuln is set
}
Top-level fields
| Field | Purpose |
|---|---|
| bomFormat | Always "CycloneDX"; lets a parser identify the format from a single key |
| specVersion | The CycloneDX spec version this document targets (1.4, 1.5, 1.6, ...) |
| serialNumber | A urn:uuid: unique to each document. Two regenerations get different serial numbers even when their content is otherwise equivalent. |
| version | Document revision counter. Bump it when you re-publish a corrected SBOM for the same image. |
| metadata | Document-level metadata: timestamp, tools (what generated it), and metadata.component (see below) |
| components | Flat list of every package, library, file, container, OS, framework, etc. found in the target |
| dependencies | Adjacency list: which bom-ref depends on which |
| vulnerabilities | Optional: the same vulnerability records you'd find in a Trivy report |
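Because bomFormat (CycloneDX) and spdxVersion (SPDX) act as format markers, a consumer can route documents without a schema library. The sketch below is hypothetical helper code: it sniffs the format and summarizes a CycloneDX document's top level. It does no validation.

```python
from collections import Counter


def sniff_sbom_format(doc: dict) -> str:
    """Identify an SBOM by its marker keys (sketch; no schema validation)."""
    if doc.get("bomFormat") == "CycloneDX":
        return "cyclonedx-" + str(doc.get("specVersion", "unknown"))
    spdx_version = doc.get("spdxVersion", "")
    if isinstance(spdx_version, str) and spdx_version.startswith("SPDX-"):
        return spdx_version.lower()  # e.g. "SPDX-2.3" -> "spdx-2.3"
    return "unknown"


def summarize_cyclonedx(doc: dict) -> dict:
    """Count components by type and report the root component and dependency entries."""
    types = Counter(comp.get("type", "unknown") for comp in doc.get("components", []))
    root = doc.get("metadata", {}).get("component", {})
    return {
        "format": sniff_sbom_format(doc),
        "root": root.get("name"),
        "root_type": root.get("type"),
        "component_types": dict(types),
        "dependency_entries": len(doc.get("dependencies", [])),
    }
```

Usage: `summarize_cyclonedx(json.load(open("/work/sbom.cdx.json")))`; the counts depend on the file, so none are claimed here.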
metadata.component — the root
This is the CycloneDX equivalent of SPDX's DESCRIBES relationship. It says "this BOM is about the following thing":
"metadata": {
"timestamp": "2026-05-09T11:34:25Z",
"tools": {
"components": [
{ "type": "application", "name": "trivy", "version": "0.70.0" }
]
},
"component": {
"bom-ref": "oci-registry:5000/ubuntu-curl@sha256:0124b538...",
"type": "container",
"name": "oci-registry:5000/ubuntu-curl",
"version": "sha256:0124b538...",
"purl": "pkg:oci/ubuntu-curl@sha256:0124b538...?repository_url=oci-registry%3A5000%2Fubuntu-curl"
}
}
The metadata.component.type here is container. Other types include application, framework, library, operating-system, device, firmware, and file. Trivy emits container, and Syft also emits container when serialising an image SBOM to CycloneDX.
The components array
A flat list. Order doesn't matter — the structure is in dependencies, not in nesting. A typical entry:
{
"bom-ref": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04",
"type": "library",
"name": "apt",
"version": "2.4.14",
"purl": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04",
"licenses": [
{ "license": { "name": "GPL-2.0-only" } }
],
"properties": [
{ "name": "aquasecurity:trivy:LayerDigest", "value": "sha256:6edbc812af48..." },
{ "name": "aquasecurity:trivy:PkgID", "value": "apt@2.4.14" },
{ "name": "aquasecurity:trivy:PkgType", "value": "ubuntu" }
]
}
Things to notice:
- bom-ref is the in-document identifier. CycloneDX lets you choose any string that is unique within the document; Trivy uses the package's PURL itself (Syft's refs are PURL-based too), which keeps refs stable across regenerations.
- type classifies what the component is. library is the catch-all for packages; application, framework, operating-system, firmware, file, and container are among the other types.
- purl is duplicated outside bom-ref so consumers that key on purl don't have to re-parse the ref, which is not guaranteed to be a PURL.
- properties is the format's open-ended escape hatch. Tools embed namespaced key-value pairs (aquasecurity:trivy:*, syft:*) for tool-specific metadata that doesn't fit the schema.
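Since PURLs are the join key between SBOMs, scanners, and registries, it is worth seeing how one splits apart. This is a minimal sketch of the pkg:type/namespace/name@version?qualifiers#subpath layout with no validation and no per-type normalization rules. For production use, the packageurl-python library (PackageURL.from_string) is the usual choice.

```python
from urllib.parse import unquote


def parse_purl(purl: str) -> dict:
    """Split pkg:type/namespace/name@version?qualifiers#subpath into its parts (sketch)."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a PURL: {purl!r}")
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    rest = rest.strip("/")

    version = None
    if "@" in rest:  # an '@' inside a namespace is percent-encoded, so the last one is the version
        rest, version = rest.rsplit("@", 1)
        version = unquote(version)

    parts = rest.split("/")
    if len(parts) < 2:
        raise ValueError(f"PURL needs at least type and name: {purl!r}")
    type_, *middle, name = parts

    qualifiers = {}
    for pair in filter(None, query.split("&")):
        key, _, value = pair.partition("=")
        qualifiers[key.lower()] = unquote(value)  # e.g. %3A -> ':', %2F -> '/'

    return {
        "type": type_.lower(),
        "namespace": "/".join(unquote(m) for m in middle) or None,
        "name": unquote(name),
        "version": version,
        "qualifiers": qualifiers,
        "subpath": subpath or None,
    }
```

Applied to the apt entry above, the sketch yields type deb, namespace ubuntu, name apt, version 2.4.14, and the arch and distro qualifiers as a dict.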
The dependencies array — the graph as adjacency list
This is the structural counterpart to SPDX's relationships. It lists, for each bom-ref, the bom-refs it depends on:
"dependencies": [
{
"ref": "oci-registry:5000/ubuntu-curl@sha256:0124b538...",
"dependsOn": [
"pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04",
"pkg:deb/ubuntu/bash@5.1-6ubuntu1.1?arch=arm64&distro=ubuntu-22.04",
"pkg:deb/ubuntu/libc6@2.35-0ubuntu3.4?arch=arm64&distro=ubuntu-22.04"
]
},
{
"ref": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04",
"dependsOn": [
"pkg:deb/ubuntu/libapt-pkg6.0@2.4.14?arch=arm64&distro=ubuntu-22.04",
"pkg:deb/ubuntu/libc6@2.35-0ubuntu3.4?arch=arm64&distro=ubuntu-22.04"
]
}
]
CycloneDX does not distinguish runtime vs build-time vs static-link dependencies in the dependencies array: every edge is a plain dependsOn entry. The closest per-component signal is the optional scope field (required, optional, excluded). For most container-image SBOMs that's fine, since everything is a runtime dependency once the image is built.
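An adjacency list is easy to query. The sketch below (hypothetical helpers, standard library only) answers the two questions people usually ask of it: "what does this component pull in, transitively?" and "who depends directly on this package?". For readability the test data shortens refs to bare names; in a real document they are full bom-refs like the PURLs above.

```python
from collections import deque


def build_graph(dependencies: list) -> dict:
    """CycloneDX dependencies array -> {ref: [refs it depends on]}."""
    return {entry["ref"]: list(entry.get("dependsOn", [])) for entry in dependencies}


def transitive_dependencies(graph: dict, root: str) -> set:
    """Every ref reachable from root (breadth-first; the seen set makes cycles safe)."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        ref = queue.popleft()
        if ref in seen:
            continue
        seen.add(ref)
        queue.extend(graph.get(ref, []))
    return seen


def dependents_of(graph: dict, target: str) -> set:
    """Direct reverse lookup: which refs list target in their dependsOn."""
    return {ref for ref, deps in graph.items() if target in deps}
```

The reverse lookup is what you want during an incident: given a vulnerable libc6, dependents_of tells you which packages (and the image root) sit directly above it.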
Same data, different shape — SPDX vs CycloneDX side by side
For the same apt package:
| Concern | SPDX | CycloneDX |
|---|---|---|
| Internal ID | SPDXRef-Package-deb-apt-5be364a4af57b701 | bom-ref: "pkg:deb/ubuntu/apt@2.4.14?..." |
| Cross-ecosystem ID | externalRefs array (PURL + CPE) | purl field (CPE optional, in cpe field) |
| Files owned | relationships[CONTAINS] from package to file SPDXIDs | Not represented (Trivy CycloneDX); file-level detail requires extra components of type: "file" |
| Integrity | packageVerificationCode (SHA-1 over file hashes) | hashes array per component (less common in OS-package SBOMs) |
| License | licenseConcluded + licenseDeclared (separate fields) | licenses array (single notion) |
| Graph edges | relationships array (typed: CONTAINS, DEPENDS_ON, STATIC_LINK, ...) | dependencies array (single edge type) |
| Tool metadata | creationInfo.creators | metadata.tools.components |
| Document root | SPDXRef-DOCUMENT + DESCRIBES relationship | metadata.component |
The two formats describe the same reality. SPDX is more granular (separate concluded vs declared license, file-level relationships, multiple typed edges); CycloneDX is more compact and easier to round-trip programmatically. Most modern tools speak both, and you can convert OS-level SBOMs between them with cyclonedx-cli convert or syft convert. The conversion is not lossless, though: typed SPDX edges such as STATIC_LINK or GENERATED_FROM have no CycloneDX edge type to land in.
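To see where information goes missing, here is a toy converter from SPDX relationships to a CycloneDX-style dependencies array. It is not how cyclonedx-cli or syft implement conversion: it keeps DEPENDS_ON (and the inverse DEPENDENCY_OF), counts every other edge type as dropped, and leaves refs as SPDXIDs where a real converter would map them to bom-refs.

```python
from collections import defaultdict


def spdx_relationships_to_cdx_dependencies(relationships: list):
    """Collapse SPDX dependency edges into a CycloneDX-style dependencies array.

    Returns (dependencies, dropped): edge types with no plain dependsOn
    equivalent in this sketch are counted in `dropped` instead of converted.
    """
    depends_on = defaultdict(set)
    dropped = 0
    for rel in relationships:
        src = rel["spdxElementId"]
        kind = rel["relationshipType"]
        dst = rel["relatedSpdxElement"]
        if kind == "DEPENDS_ON":
            depends_on[src].add(dst)
        elif kind == "DEPENDENCY_OF":  # inverse direction: dst depends on src
            depends_on[dst].add(src)
        else:
            dropped += 1  # CONTAINS, STATIC_LINK, GENERATED_FROM, ...
    dependencies = [
        {"ref": ref, "dependsOn": sorted(targets)}
        for ref, targets in sorted(depends_on.items())
    ]
    return dependencies, dropped
```

Even a converter that mapped STATIC_LINK onto dependsOn instead of dropping it would still lose the fact that the link was static; that label is exactly what the single CycloneDX edge type cannot carry.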
Aside: One Downstream Use — Vulnerability Scanning
The SBOM by itself is just an inventory. The most common downstream use is feeding the PURL/CPE list into a vulnerability database to discover CVEs. We won't dwell on it (this post is about SBOMs, not scanners), but here's the one-paragraph version of how the pipeline works:
SBOM (PURL/CPE per package) ──► lookup in vuln DB (NVD, OSV.dev, distro tracker)
──► match version against affected ranges
──► report CVEs + severity + fix version
For reference, running trivy image on our ubuntu-curl:v1 reports 35 CVEs (18 LOW, 17 MEDIUM, 0 HIGH/CRITICAL). A sample finding:
{
"VulnerabilityID": "CVE-2026-27456",
"PkgName": "bsdutils",
"InstalledVersion": "1:2.37.2-4ubuntu3.5",
"FixedVersion": null,
"Severity": "MEDIUM",
"Title": "util-linux: TOCTOU in the mount program when setting up loop devices"
}
Note the matching key: "PkgName": "bsdutils", "InstalledVersion": "1:2.37.2-4ubuntu3.5". That's the same name + versionInfo pair we saw in the SBOM. The scanner looked that package up in its vulnerability database and got back a CVE list. Here trivy image built the inventory itself, but the flow is identical when the SBOM file is the input: trivy sbom /work/sbom.cdx.json runs the same match against the SBOM alone. The SBOM is the input, the database is the lookup table, and the CVE report is the output. Once you have an SBOM, this scan can run anywhere; you don't need access to the original image.
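The join at the heart of that pipeline can be sketched in a few lines. This toy uses a hypothetical advisory table with one row lifted from the sample finding above, and it matches on exact (name, version) pairs. That is a deliberate simplification: real scanners compare the installed version against affected ranges using the ecosystem's own ordering (dpkg rules for Ubuntu).

```python
# Hypothetical advisory table keyed by (package name, installed version).
# Real scanners match version *ranges* with ecosystem-specific ordering,
# not exact string equality as done here.
ADVISORIES = {
    ("bsdutils", "1:2.37.2-4ubuntu3.5"): [
        {"id": "CVE-2026-27456", "severity": "MEDIUM", "fixed": None},
    ],
}


def match_packages(spdx_doc: dict, advisories: dict) -> list:
    """Join SBOM packages (name + versionInfo) against the advisory table."""
    findings = []
    for pkg in spdx_doc.get("packages", []):
        key = (pkg.get("name"), pkg.get("versionInfo"))
        for adv in advisories.get(key, []):
            findings.append({"pkg": key[0], "installed": key[1], **adv})
    return findings
```

Usage: `match_packages(json.load(open("/work/sbom.spdx.json")), ADVISORIES)` needs nothing but the SBOM file, which is the point of the paragraph above.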
Step 3: Attach the SBOM to the Image as an OCI Artifact
Now the interesting part: we attach the SBOM to the registry next to the image, using the OCI 1.1 subject + Referrers mechanism we explored in Part 4.
What oras attach does
docker exec -w /work oci-lab bash -c '
oras attach --plain-http \
--artifact-type application/spdx+json \
oci-registry:5000/ubuntu-curl:v1 \
sbom.spdx.json:application/spdx+json
'
Uploading 28164ea0c196 sbom.spdx.json
Uploaded  28164ea0c196 sbom.spdx.json
Attached to [registry] oci-registry:5000/ubuntu-curl@sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8
Digest: sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4
Three things happened:
- The SBOM JSON file was pushed as a blob (sha256:28164ea0..., 1.9 MB).
- An OCI manifest describing it was created (sha256:f50bb644...).
- The manifest's subject field points to our image manifest.
Let's also attach the CycloneDX one:
docker exec -w /work oci-lab bash -c '
oras attach --plain-http \
--artifact-type application/vnd.cyclonedx+json \
oci-registry:5000/ubuntu-curl:v1 \
sbom.cdx.json:application/vnd.cyclonedx+json
'
Uploading fb4cd9377fac sbom.cdx.json
Uploaded  fb4cd9377fac sbom.cdx.json
Attached to [registry] oci-registry:5000/ubuntu-curl@sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8
Digest: sha256:39519e85a6346ada4c89cfe66837694f669c07843ecfd81b36d5cc67fe809499
The SBOM Manifest — Look at What Was Just Pushed
docker exec oci-lab bash -c '
curl -s "http://oci-registry:5000/v2/ubuntu-curl/manifests/sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4" \
-H "Accept: application/vnd.oci.image.manifest.v1+json" | jq .
'
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"artifactType": "application/spdx+json",
"config": {
"mediaType": "application/vnd.oci.empty.v1+json",
"digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
"size": 2,
"data": "e30="
},
"layers": [
{
"mediaType": "application/spdx+json",
"digest": "sha256:28164ea0c19614bfc106fea2ae6107dce12a1e79dfb1a361047b3264915e5195",
"size": 1943149,
"annotations": {
"org.opencontainers.image.title": "sbom.spdx.json"
}
}
],
"subject": {
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8",
"size": 424
},
"annotations": {
"org.opencontainers.image.created": "2026-05-09T11:36:17Z"
}
}
This is the same shape as a notation signature manifest. The only differences:
| Field | Notation Signature | SBOM (this manifest) |
|---|---|---|
| artifactType | application/vnd.cncf.notary.signature | application/spdx+json |
| config.mediaType | application/vnd.cncf.notary.signature | application/vnd.oci.empty.v1+json |
| layers[0].mediaType | application/cose | application/spdx+json |
| subject | image manifest digest | image manifest digest (same!) |
The subject field works identically. The OCI registry doesn't care that one is a signature and the other is an SBOM — both are just manifests with a subject.
The empty config ({}, 2 bytes, mediaType application/vnd.oci.empty.v1+json) is the OCI-spec-blessed "I have no config" placeholder. Notice the data: "e30=" field — that's {} base64-encoded inlined directly into the manifest, so even fetching the config blob is optional.
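Everything in that manifest is derivable from the bytes involved, so it can be assembled by hand. The sketch below is hypothetical helper code, not oras's implementation. It builds the same shape: the empty config descriptor with its inlined data, one layer for the SBOM, and the subject. It omits the org.opencontainers.image.created annotation that oras adds, and json.dumps produces different bytes than oras pushed, so manifest_digest will not reproduce sha256:f50bb644...; registries hash the exact bytes they receive.

```python
import base64
import hashlib
import json

EMPTY_JSON = b"{}"


def descriptor(media_type: str, content: bytes, **extra) -> dict:
    """OCI content descriptor: media type, sha256 digest, and size of the bytes."""
    desc = {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(content).hexdigest(),
        "size": len(content),
    }
    desc.update(extra)
    return desc


def sbom_artifact_manifest(sbom: bytes, filename: str, artifact_type: str, subject: dict) -> dict:
    """Build an OCI 1.1 image manifest that attaches `sbom` to `subject`."""
    config = descriptor(
        "application/vnd.oci.empty.v1+json",
        EMPTY_JSON,
        data=base64.b64encode(EMPTY_JSON).decode("ascii"),  # "e30=", inlined
    )
    layer = descriptor(
        artifact_type,
        sbom,
        annotations={"org.opencontainers.image.title": filename},
    )
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "artifactType": artifact_type,
        "config": config,
        "layers": [layer],
        "subject": subject,
    }


def manifest_digest(manifest: dict) -> str:
    """Digest of this particular serialization (compact JSON, UTF-8)."""
    raw = json.dumps(manifest, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(raw).hexdigest()
```

The subject descriptor to pass in is the image manifest's own descriptor: its media type, its digest (sha256:0124b538...), and its size (424 bytes).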
Step 4: Discover Attached Artifacts via the Referrers Mechanism
Method 1 — oras discover (the convenient way)
docker exec oci-lab oras discover --plain-http oci-registry:5000/ubuntu-curl:v1 --format tree
oci-registry:5000/ubuntu-curl@sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8
├── application/spdx+json
│ └── sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4
└── application/vnd.cyclonedx+json
└── sha256:39519e85a6346ada4c89cfe66837694f669c07843ecfd81b36d5cc67fe809499
Both SBOMs are now discoverable, grouped by artifactType.
Method 2 — Raw HTTP via the Referrers tag fallback
registry:2 doesn't support the OCI 1.1 Referrers API natively, so oras (and notation, and trivy) all use the tag-based fallback: a tag named sha256-<hex> whose content is an OCI Image Index listing all referrers.
docker exec oci-lab bash -c ' curl -s http://oci-registry:5000/v2/ubuntu-curl/tags/list | jq . '
{
"name": "ubuntu-curl",
"tags": [
"sha256-0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8",
"v1"
]
}
Notice the sha256-... tag — that's the referrer index, named after our image's manifest digest with : replaced by -.
docker exec oci-lab bash -c '
curl -s "http://oci-registry:5000/v2/ubuntu-curl/manifests/sha256-0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8" \
-H "Accept: application/vnd.oci.image.index.v1+json" | jq .
'
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.index.v1+json",
"manifests": [
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4",
"size": 730,
"artifactType": "application/spdx+json",
"annotations": {
"org.opencontainers.image.created": "2026-05-09T11:36:17Z"
}
},
{
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"digest": "sha256:39519e85a6346ada4c89cfe66837694f669c07843ecfd81b36d5cc67fe809499",
"size": 746,
"artifactType": "application/vnd.cyclonedx+json",
"annotations": {
"org.opencontainers.image.created": "2026-05-09T11:36:17Z"
}
}
]
}
The artifactType field on each entry lets clients filter without fetching the manifests themselves: "Give me the SPDX one only" or "Give me everything signature-related". Notation signatures, SBOMs, vulnerability scans, and SLSA attestations all live side by side under the same parent image.
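Client-side filtering of that index is a one-liner per question. The sketch below (hypothetical helpers) groups referrers by artifactType the way the oras discover tree does, and picks out a single type. The index literal is the one fetched above with annotations dropped.

```python
from collections import defaultdict

# The referrer index fetched above (annotations omitted).
REFERRER_INDEX = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"mediaType": "application/vnd.oci.image.manifest.v1+json",
         "digest": "sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4",
         "size": 730, "artifactType": "application/spdx+json"},
        {"mediaType": "application/vnd.oci.image.manifest.v1+json",
         "digest": "sha256:39519e85a6346ada4c89cfe66837694f669c07843ecfd81b36d5cc67fe809499",
         "size": 746, "artifactType": "application/vnd.cyclonedx+json"},
    ],
}


def group_by_artifact_type(index: dict) -> dict:
    """artifactType -> [manifest digests], like the oras discover tree view."""
    groups = defaultdict(list)
    for desc in index.get("manifests", []):
        groups[desc.get("artifactType", "<none>")].append(desc["digest"])
    return dict(groups)


def referrers_of_type(index: dict, artifact_type: str) -> list:
    """Client-side equivalent of the ?artifactType=... filter."""
    return [d["digest"] for d in index.get("manifests", [])
            if d.get("artifactType") == artifact_type]
```

Asking this index for application/vnd.cncf.notary.signature returns an empty list, since this walkthrough attached only SBOMs.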
Method 3 — The Referrers API (when supported)
On a registry that supports OCI 1.1 natively (Zot, Harbor 2.9+, GHCR, ECR, ACR, Docker Hub):
GET /v2/ubuntu-curl/referrers/sha256:0124b538...
Accept: application/vnd.oci.image.index.v1+json
→ 200 OK
→ Body: <same Image Index as above, computed dynamically by the registry>
Optionally filter by artifact type:
GET /v2/ubuntu-curl/referrers/sha256:0124b538...?artifactType=application/spdx+json
→ 200 OK
→ OCI-Filters-Applied: artifactType
→ Body: <Image Index containing only SPDX referrers>
Clients try the Referrers API first and fall back to the tag schema if the registry answers 404.
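That discovery logic fits in one function. Below is a minimal sketch with urllib that tries GET /v2/<name>/referrers/<digest> and, on 404, fetches the sha256-<hex> tag instead. The function names are hypothetical, there is no authentication, and the fetcher is injectable so the control flow can be exercised without a registry. A registry:2 instance, which lacks the endpoint, is expected to answer the first request with 404.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional

INDEX_MEDIA_TYPE = "application/vnd.oci.image.index.v1+json"


def referrers_tag(digest: str) -> str:
    """Tag-schema name for a subject digest: 'sha256:<hex>' -> 'sha256-<hex>'."""
    algorithm, _, encoded = digest.partition(":")
    return f"{algorithm}-{encoded}"


def get_index(url: str) -> Optional[dict]:
    """GET an OCI image index; return None on 404, re-raise other HTTP errors."""
    request = urllib.request.Request(url, headers={"Accept": INDEX_MEDIA_TYPE})
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def list_referrers(registry: str, repo: str, digest: str,
                   fetch: Callable[[str], Optional[dict]] = get_index) -> list:
    """Try the Referrers API first, then fall back to the referrers tag."""
    index = fetch(f"{registry}/v2/{repo}/referrers/{digest}")
    if index is None:
        index = fetch(f"{registry}/v2/{repo}/manifests/{referrers_tag(digest)}")
    return index.get("manifests", []) if index else []
```

Usage against the lab registry: `list_referrers("http://oci-registry:5000", "ubuntu-curl", "sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8")`. An empty list means neither the API nor the tag exists, i.e. nothing has been attached yet.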
Step 5: Pull the SBOM Back
A consumer (CI pipeline, security scanner, admission controller) can pull the SBOM by content:
docker exec oci-lab bash -c '
mkdir -p /tmp/pulled-sboms && cd /tmp/pulled-sboms
oras pull --plain-http \
oci-registry:5000/ubuntu-curl@sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4
ls -la
'
Downloaded 28164ea0c196 sbom.spdx.json
Pulled [registry] oci-registry:5000/ubuntu-curl@sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4
Digest: sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4
-rw-r--r-- 1 root root 1943149 May 9 11:36 sbom.spdx.json
Verify the content matches what we pushed:
docker exec oci-lab sha256sum /work/sbom.spdx.json /tmp/pulled-sboms/sbom.spdx.json
28164ea0c19614bfc106fea2ae6107dce12a1e79dfb1a361047b3264915e5195  /work/sbom.spdx.json
28164ea0c19614bfc106fea2ae6107dce12a1e79dfb1a361047b3264915e5195  /tmp/pulled-sboms/sbom.spdx.json
Identical. Content-addressable storage at work.
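The same check can be run from code, which is essentially what a careful client does after every blob download: hash the bytes and compare them with the digest in the manifest's layer descriptor. A minimal sketch (hypothetical helper names) that streams the file so a multi-megabyte SBOM is never held in memory all at once:

```python
import hashlib


def oci_digest(data: bytes) -> str:
    """sha256 digest in OCI form: 'sha256:<hex>'."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks and return it in OCI form."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def verify_file(path: str, expected_digest: str) -> bool:
    """True if the file's bytes hash to the digest the layer descriptor claims."""
    return file_digest(path) == expected_digest
```

Usage: `verify_file("/tmp/pulled-sboms/sbom.spdx.json", "sha256:28164ea0c19614bfc106fea2ae6107dce12a1e79dfb1a361047b3264915e5195")` should return True for the pulled file above.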
The Big Picture — Object Inventory
BEFORE attaching SBOMs:
┌─────────────────────────────────────────────────────────────────┐
│ Tags: │
│ v1 ──► sha256:0124b538... (image manifest) │
│ │
│ Blobs: │
│ sha256:0124b538... = image manifest (424 B) │
│ sha256:8bdde1d7... = image config (2,069 B) │
│ sha256:6edbc812... = Ubuntu layer (27,606,543 B) │
└─────────────────────────────────────────────────────────────────┘
AFTER attaching SPDX + CycloneDX SBOMs:
┌─────────────────────────────────────────────────────────────────┐
│ Tags: │
│ v1 ──► sha256:0124b538... (image manifest, unchanged) │
│ sha256-0124b538... ──► referrer index │
│ │
│ Original blobs (untouched, image digest unchanged): │
│ sha256:0124b538... = image manifest (424 B) │
│ sha256:8bdde1d7... = image config (2,069 B) │
│ sha256:6edbc812... = Ubuntu layer (27,606,543 B) │
│ │
│ New blobs from SBOM attachment: │
│ sha256:f50bb644... = SPDX manifest (730 B) │
│ sha256:39519e85... = CycloneDX manifest (746 B) │
│ sha256:44136fa3... = empty config {} (2 B, shared) │
│ sha256:28164ea0... = SPDX SBOM blob (1,943,149 B) │
│ sha256:fb4cd937... = CycloneDX SBOM blob (209,683 B) │
│ │
│ Relationships: │
│ referrer index ──lists──► [SPDX manifest, CycloneDX manifest]│
│ SPDX manifest ──subject──► image manifest │
│ CycloneDX manifest ──subject──► image manifest │
└─────────────────────────────────────────────────────────────────┘
The image is untouched. Its manifest digest is exactly the same before and after. Anyone pinning ubuntu-curl@sha256:0124b538... gets bit-for-bit identical bytes. The SBOMs live alongside, discoverable but separate.
Why This Design Wins
The OCI 1.1 subject + Referrers approach has three properties that older "embed-it-in-the-image" approaches lack:
1. The signed image stays signed
If you embedded the SBOM as an extra layer in the image, the image manifest digest would change every time you regenerated the SBOM. That breaks digest pinning, breaks signatures, and forces re-signing on every SBOM update. With referrers, the image stays immutable while the set of metadata attached to it can grow and be superseded over time.
2. Anyone can attach anything, anytime
You don't need to modify the image to attach an SBOM. Your CI pipeline can build and push the image, then a separate stage (or a completely separate team/service) can run syft and attach the result. Vulnerability scans can be re-run weekly and re-attached without touching the image.
3. One mechanism for all metadata
Same plumbing for everything:
Image manifest: sha256:0124b538...
  ↑ subject (referrers)
  ├── application/vnd.cncf.notary.signature  ← Notation signature
  ├── application/spdx+json                  ← SBOM (SPDX)
  ├── application/vnd.cyclonedx+json         ← SBOM (CycloneDX)
  ├── application/vnd.in-toto+json           ← SLSA provenance attestation
  └── application/sarif+json                 ← Vulnerability scan results
The registry doesn't need plugins, special endpoints, or knowledge of these formats. It just stores manifests with subject fields and serves them via GET /v2/<name>/referrers/<digest> or the tag fallback.
Production Patterns
Pattern 1: Generate-and-attach in CI
# After docker push, before declaring success:
- name: Generate SBOM
  run: syft "$IMAGE_REF" -o spdx-json=sbom.spdx.json
- name: Attach SBOM
  run: |
    oras attach \
      --artifact-type application/spdx+json \
      "$IMAGE_REF" \
      sbom.spdx.json:application/spdx+json
Pattern 2: Admission-time verification
A Kubernetes admission controller (Kyverno, Ratify) can require both:
- A valid Notation signature (proves who built it)
- An attached SBOM (proves what's in it), plus a vulnerability report derived from it showing no CRITICAL CVEs
Both are discoverable through the same Referrers API call — Kubernetes admission gets the full provenance story in one place.
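Stripped to its core, the admission decision is a set difference over the referrers' artifact types. The sketch below is a hypothetical presence check, not Kyverno or Ratify policy syntax. Those tools also verify the signature cryptographically and evaluate the report's contents, which this deliberately skips.

```python
# Hypothetical policy: artifact types an image must have attached before admission.
REQUIRED_ARTIFACT_TYPES = frozenset({
    "application/vnd.cncf.notary.signature",  # who built it
    "application/spdx+json",                  # what's in it
})


def missing_attachments(referrers: list, required=REQUIRED_ARTIFACT_TYPES) -> set:
    """Return required artifact types with no matching referrer; empty set means admit."""
    present = {desc.get("artifactType") for desc in referrers}
    return set(required) - present
```

Fed with the manifests list from the referrer index, the image in this walkthrough would be rejected for lacking a signature, since only SBOMs were attached here.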
Pattern 3: SBOM diffing across versions
Pull SBOMs for myapp:v1 and myapp:v2, diff their package lists, and you have an automated changelog of dependencies. New packages → review for licensing. Removed packages → potential dead code. Version bumps → compare against vulnerability feeds.
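A first cut of that diff, assuming both SBOMs are SPDX JSON already pulled to disk: key packages by name, then report what was added, removed, or bumped. Keying by name is a simplification (the same name can appear in two ecosystems); a more careful version would key on the PURL without its version. The package versions in the test are hypothetical.

```python
def package_versions(spdx_doc: dict) -> dict:
    """name -> versionInfo for every package in an SPDX document."""
    return {p["name"]: p.get("versionInfo", "") for p in spdx_doc.get("packages", [])}


def diff_sboms(old: dict, new: dict) -> dict:
    """Added, removed, and version-changed packages between two SPDX documents."""
    before, after = package_versions(old), package_versions(new)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(
            (name, before[name], after[name])
            for name in before.keys() & after.keys()
            if before[name] != after[name]
        ),
    }
```

Each bucket maps onto one of the follow-ups above: "added" goes to license review, "removed" to dead-code cleanup, and "changed" to the vulnerability feed.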
Recap
In this part we:
| Step | Tool | Result |
|---|---|---|
| Generated SPDX SBOM | syft | 1.9 MB JSON, 102 packages, 2,290 files |
| Generated CycloneDX SBOM | trivy | 210 KB JSON, 102 components |
| Scanned for CVEs | trivy | 35 vulnerabilities (18 LOW, 17 MEDIUM) |
| Attached SPDX as OCI artifact | oras attach | New manifest with subject → image |
| Attached CycloneDX as OCI artifact | oras attach | Second referrer alongside SPDX |
| Discovered attached artifacts | oras discover + raw HTTP | Tree view + Image Index via referrer tag |
| Pulled the SBOM back | oras pull | Bit-identical to source (same sha256) |
The big takeaway
An SBOM is just an OCI artifact with artifactType: application/spdx+json (or application/vnd.cyclonedx+json). It uses the exact same plumbing as a notation signature: a manifest with a subject field, discoverable via the Referrers API or its tag-based fallback.
Once you understand subject + Referrers, you understand:
- Notation signatures (Part 4)
- SBOMs (this part)
- Vulnerability scan results
- SLSA build provenance
- Anything else the supply-chain world dreams up next
The OCI registry is no longer just a place to store images — it's a content-addressable graph of software, its provenance, and everything we know about it.
Cleanup
docker rm -f oci-registry oci-lab
docker network rm oci-net
Series Recap
Across the five parts we went from spec to running container to a fully signed-and-described supply-chain artifact, end to end, with no magic:
| Part | We did | Spec |
|---|---|---|
| 1 | Built an OCI image by hand — manifest, config, layers, content-addressable blobs | OCI Image Spec |
| 2 | Pushed and pulled with raw curl against an nginx "registry" | OCI Distribution Spec |
| 3 | Ran a container with chroot, unshare, mount, overlayfs — no runc needed | OCI Runtime Spec |
| 4 | Signed an image with Notation and dissected the COSE envelope + Referrers index | OCI 1.1 Referrers / Notary Project |
| 5 (this) | Attached SBOMs as OCI artifacts and discovered them via the same Referrers mechanism | OCI 1.1 Referrers / Reference Types |
You now have the mental model to read every CVE feed, every supply-chain SBOM, every container registry response, and recognize what it is and where it fits.
Every digest, byte count, package count, CVE ID, and command output in this post was captured from an actual run inside Docker Desktop for Mac (arm64) on May 9, 2026. Tools used: registry:2, syft 1.44.0, oras 1.2.0, trivy 0.70.0, skopeo, curl, jq.