Series: Understanding OCI from the Ground Up (Part 5 of 5)

In Part 1 we built an OCI image. In Part 2 we pushed it with raw HTTP. In Part 3 we ran it with bare Linux primitives. In Part 4 we signed it with Notation and saw how the OCI 1.1 `subject` + Referrers mechanism works. Now we use the exact same plumbing to attach a Software Bill of Materials (SBOM) to that image — proving the design generalizes far beyond signatures.

What is an SBOM?

A Software Bill of Materials (SBOM) is a machine-readable inventory of everything inside a piece of software. For a container image, an SBOM tells you:

  • Every OS package with name, version, license, and supplier (apt 2.4.14, libc6 2.35-0ubuntu3.4, ...)
  • Every language-level dependency (npm modules, pip wheels, Go modules, Maven JARs)
  • Every file delivered by each package, with its hash
  • The relationships between them — which package contains which file, which package depends on which other package, which package is the root "thing" the SBOM is about
  • Cross-ecosystem identifiers (PURLs, CPEs) so the SBOM can be cross-referenced with package registries, advisory feeds, and license databases

Think of an SBOM as a typed graph: nodes are packages and files, edges are typed relationships (CONTAINS, DEPENDS_ON, DESCRIBES), and every node carries enough metadata to be uniquely identified across the world.

Why SBOMs exist:

  • Inventory — You can't manage what you can't see. An SBOM is the first honest answer to "what's actually in this image?"
  • Reproducibility & provenance — Two builds of the same Dockerfile a week apart can pull in different upstream versions. An SBOM captures the exact set that shipped.
  • License compliance — The original driver behind SPDX (2010). Knowing every package's licenseDeclared is a legal requirement in many regulated industries.
  • Vulnerability matching — A scanner can take an SBOM and look up each package's PURL/CPE in a vulnerability database to find known CVEs (we'll see this briefly later).
  • Compliance mandates — US Executive Order 14028 and the EU Cyber Resilience Act require SBOMs for software shipped to government and regulated buyers.
  • Supply chain integrity — Combined with the Notation signatures from Part 4, SBOMs let you verify what's inside an image alongside who built it.

Background — Identifying a Package

Before we look at any SBOM file, we need to answer one question: given a file on disk, how do you describe a package precisely enough that a tool on the other side of the world can recognize it?

The answer is two parallel naming systems: PURL and CPE. SBOM generators such as Syft typically attach both to each package entry.

PURL — Package URL (the modern identifier)

A PURL (spec) is a single string that uniquely identifies a package across ecosystems. Format:

pkg:<type>/<namespace>/<name>@<version>?<qualifiers>

| Part | Meaning | Example |
|------|---------|---------|
| <type> | Package ecosystem | deb, rpm, apk, npm, pypi, golang, maven, cargo, oci |
| <namespace> | Distro / org / scope (optional) | ubuntu, debian, @angular, github.com/gorilla |
| <name> | Package name | apt, lodash, requests |
| <version> | Exact version string | 2.4.14, 4.17.21, v1.8.0 |
| <qualifiers> | Disambiguators (optional) | arch=arm64, distro=ubuntu-22.04, epoch=1 |

Examples you'll see in this post:

| PURL | What it means |
|------|---------------|
| pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04 | Debian package apt 2.4.14 from Ubuntu 22.04, arm64 |
| pkg:npm/lodash@4.17.21 | npm package lodash 4.17.21 |
| pkg:pypi/django@4.2.7 | PyPI package django 4.2.7 |
| pkg:golang/github.com/gorilla/mux@v1.8.0 | Go module gorilla/mux v1.8.0 |
| pkg:oci/ubuntu-curl@sha256:0124b538... | The container image itself, by digest |

PURLs are the modern community standard — used by OSV.dev, GitHub Advisory Database, Snyk, Trivy, Syft, and almost every new tool. Given a PURL, a scanner can look up known vulnerabilities in seconds.
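
To make the format concrete, here is a minimal Python sketch that splits a PURL into the parts from the table above. It is an illustration only: parse_purl is a hypothetical helper, it skips the type-specific normalization rules in the PURL spec, and it assumes any "@" inside a namespace is percent-encoded. Real tooling should rely on a maintained library such as packageurl-python.

```python
from urllib.parse import parse_qsl, unquote


def parse_purl(purl: str) -> dict:
    """Split a PURL into its components (simplified, not spec-complete)."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg":
        raise ValueError(f"not a PURL: {purl!r}")
    rest, _, subpath = rest.partition("#")      # optional #subpath
    rest, _, query = rest.partition("?")        # optional ?qualifiers
    path, sep, version = rest.rpartition("@")   # version follows the last '@'
    if not sep:                                 # no '@' means no version
        path, version = rest, ""
    ptype, _, remainder = path.strip("/").partition("/")
    namespace, _, name = remainder.rpartition("/")
    return {
        "type": ptype.lower(),
        "namespace": unquote(namespace),
        "name": unquote(name),
        "version": unquote(version),
        "qualifiers": dict(parse_qsl(query)),
        "subpath": subpath,
    }


apt = parse_purl("pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04")
print(apt["type"], apt["namespace"], apt["name"], apt["version"], apt["qualifiers"])
```

The namespace is everything between the type and the final path segment, which is why the Go example keeps github.com/gorilla together.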

CPE — Common Platform Enumeration

A CPE (NIST spec) is the identifier scheme used by NIST's National Vulnerability Database. Format:

cpe:2.3:<part>:<vendor>:<product>:<version>:<update>:<edition>:<lang>:<sw_edition>:<target_sw>:<target_hw>:<other>

<part> is a (application), o (operating system), or h (hardware). Asterisks are wildcards. Example:

cpe:2.3:a:apt:apt:2.4.14:*:*:*:*:*:*:*
        ↑  ↑       ↑
     part vendor   version
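
The same idea applies to CPE. The sketch below splits a CPE 2.3 formatted string into its eleven named attributes. parse_cpe23 is a hypothetical helper, and it only handles the simple case: it splits on colons that are not preceded by a backslash and does not implement the full escaping or matching rules from the NIST specification.

```python
import re

# Attribute names in the order they appear after the "cpe:2.3:" prefix.
CPE23_FIELDS = [
    "part", "vendor", "product", "version", "update", "edition",
    "language", "sw_edition", "target_sw", "target_hw", "other",
]


def parse_cpe23(cpe: str) -> dict:
    """Split a CPE 2.3 formatted string into named attributes (simplified)."""
    parts = re.split(r"(?<!\\):", cpe)  # ':' not preceded by a backslash
    if len(parts) != 13 or parts[0] != "cpe" or parts[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE23_FIELDS, parts[2:]))


apt_cpe = parse_cpe23("cpe:2.3:a:apt:apt:2.4.14:*:*:*:*:*:*:*")
print(apt_cpe["part"], apt_cpe["vendor"], apt_cpe["product"], apt_cpe["version"])
```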

CPEs predate PURLs by about a decade. They live on because NVD and many enterprise tools still use them. Modern SBOMs include both — PURL because that's what the open-source ecosystem uses, CPE because that's what NVD uses.

Why two systems? History. CPE came from US-government compliance work in the mid-2000s; PURL came from the open-source community in the late 2010s. SBOM generators emit both so downstream tools can pick whichever they understand.

Where these identifiers go

In an SPDX SBOM, each package's externalRefs array carries them:

"externalRefs": [
  { "referenceCategory": "SECURITY",         "referenceType": "cpe23Type", "referenceLocator": "cpe:2.3:a:apt:apt:2.4.14:*:*:*:*:*:*:*" },
  { "referenceCategory": "PACKAGE-MANAGER",  "referenceType": "purl",      "referenceLocator": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04" }
]

In a CycloneDX SBOM, PURL is a first-class field on every component:

{ "name": "apt", "version": "2.4.14", "purl": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04" }

With those two identifiers in hand, an SBOM is portable knowledge about an image — anyone, anywhere, with any tool, can pick it up and reason about it.
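
As a small illustration of that portability, the following Python sketch pulls PURLs out of both document shapes shown above. The two helper names and the inline sample documents are assumptions made for the example; only the field names (externalRefs, referenceType, referenceLocator, components, purl) come from the snippets above.

```python
def purls_from_spdx(doc: dict) -> list:
    """Collect PURLs from the externalRefs of every SPDX package."""
    return [
        ref["referenceLocator"]
        for pkg in doc.get("packages", [])
        for ref in pkg.get("externalRefs", [])
        if ref.get("referenceType") == "purl"
    ]


def purls_from_cyclonedx(doc: dict) -> list:
    """Collect PURLs from the top-level CycloneDX components."""
    return [c["purl"] for c in doc.get("components", []) if "purl" in c]


APT_PURL = "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04"

# Tiny hand-written documents, trimmed to the fields this example needs.
spdx_doc = {"packages": [{"name": "apt", "externalRefs": [
    {"referenceCategory": "SECURITY", "referenceType": "cpe23Type",
     "referenceLocator": "cpe:2.3:a:apt:apt:2.4.14:*:*:*:*:*:*:*"},
    {"referenceCategory": "PACKAGE-MANAGER", "referenceType": "purl",
     "referenceLocator": APT_PURL},
]}]}
cdx_doc = {"components": [{"name": "apt", "version": "2.4.14", "purl": APT_PURL}]}

print(purls_from_spdx(spdx_doc) == purls_from_cyclonedx(cdx_doc))
```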


SBOM Formats — SPDX and CycloneDX

Two standards dominate. Both have a JSON serialization, which is what we use in this post. Both describe the same things. Different communities chose different schemas:

| | SPDX | CycloneDX |
|---|------|-----------|
| Steward | Linux Foundation (ISO/IEC 5962:2021) | OWASP |
| Origin | License compliance (2010) | Application security (2017) |
| Identifier scheme | SPDXRef-* (internal) + PURL (external) | bom-ref + PURL |
| Top-level units | packages + files + relationships | components + dependencies |
| Vulnerabilities | Via separate VEX docs | Built into the BOM (vulnerabilities key) |
| Default for | syft (Anchore), Kubernetes | trivy (Aqua), cyclonedx-cli |

In practice, both formats describe the same image. Tools convert between them. We'll generate both.


How Syft Identifies Packages — Catalogers and Evidence

Before we run any commands, here's the mental model for what syft (or any SBOM generator) actually does inside.

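Syft does not guess at a filesystem's contents with broad heuristics. It runs a fleet of small, specialised programs called catalogers, each of which knows how to recognise one specific kind of evidence on a filesystem. (The binary-classifier in the table below is the exception: it matches byte signatures rather than package metadata.)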

What a cataloger is

A cataloger is a Go component inside Syft (other tools have similar pieces) with one job:

Walk a filesystem. Recognise the metadata files of one packaging system. Parse them. Emit a list of structured Package records.

Each cataloger looks at a few specific path patterns and parses files in formats it knows. They run independently and their outputs are merged.

The catalogers Syft ships with

A short tour of catalogers relevant to container images:

| Cataloger | Looks for | Parses |
|-----------|-----------|--------|
| dpkg-db | /var/lib/dpkg/status, /var/lib/dpkg/info/*.md5sums | Debian/Ubuntu OS packages |
| rpm-db | /var/lib/rpm/Packages (Berkeley DB or sqlite) | RPM packages on Red Hat / Fedora / SUSE |
| apk-db | /lib/apk/db/installed | Alpine packages |
| java-archive | *.jar, *.war, *.ear (and their META-INF/MANIFEST.MF, pom.properties) | Java libraries |
| python-package | *.dist-info/METADATA, *.egg-info/PKG-INFO, requirements.txt | Installed PyPI wheels and pip-style declarations |
| javascript-package | package.json, package-lock.json, yarn.lock | npm modules |
| go-module-binary | ELF binaries with embedded module info | Go modules statically compiled into a binary |
| go-mod-file | go.mod, go.sum | Declared Go dependencies |
| rust-cargo | Cargo.lock | Rust crates |
| ruby-gemspec | *.gemspec, Gemfile.lock | Ruby gems |
| php-composer | composer.lock, installed.json | PHP Composer packages |
| binary-classifier | Specific binaries (node, python3, httpd, nginx, ...) | Identifies a known binary by its byte signature and reads its embedded version |

For the full list: syft cataloger list. As of Syft 1.44 there are 30+ catalogers covering every major ecosystem.

Evidence sources — where the data actually comes from

For our ubuntu:22.04 image, the dpkg-db cataloger is the only one that finds anything. To see what it reads, look at the sourceInfo field Syft records for the apt package (the full package entry appears later in this post):

acquired package info from DPKG DB:
  /var/lib/dpkg/status
  /usr/share/doc/apt/copyright
  /var/lib/dpkg/info/apt.conffiles
  /var/lib/dpkg/info/apt.md5sums
  /var/lib/dpkg/info/apt.list
  /var/lib/dpkg/info/apt.postinst
  /var/lib/dpkg/info/apt.postrm
  /var/lib/dpkg/info/apt.preinst
  /var/lib/dpkg/info/apt.prerm
  /var/lib/dpkg/info/apt.shlibs
  /var/lib/dpkg/info/apt.triggers

That list is verbatim what dpkg itself maintains for every installed package. status gives name/version/architecture/dependencies; copyright gives license text; *.md5sums gives the exact list of files belonging to that package and their MD5 hashes; *.list gives the full file paths.

This is not magic. Syft's dpkg cataloger essentially reads the same files dpkg --status apt would read — it just does it without running dpkg, by parsing the files directly. That's why Syft works on a static filesystem (a tarball, a pulled image, an OCI registry blob) without needing the package manager installed.
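
To show how little is needed, here is a rough Python sketch of the core of such a parser. It is not Syft's implementation (Syft is written in Go and handles far more cases); parse_dpkg_status and the sample stanzas are assumptions for illustration. It reads the blank-line-separated "Key: value" stanzas of a dpkg status file and keeps the packages whose Status ends in "installed".

```python
def parse_dpkg_status(text: str) -> list:
    """Return name/version/arch for installed packages in a dpkg status file."""
    packages = []
    for stanza in text.strip().split("\n\n"):
        fields, key = {}, None
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") and key:   # continuation of previous field
                fields[key] += "\n" + line.strip()
            elif ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        # "install ok installed" marks a fully installed package.
        if fields.get("Status", "").endswith(" installed"):
            packages.append({
                "name": fields["Package"],
                "version": fields["Version"],
                "arch": fields.get("Architecture", ""),
            })
    return packages


# Hand-written sample in the dpkg status format (values are illustrative).
SAMPLE_STATUS = """\
Package: apt
Status: install ok installed
Architecture: arm64
Version: 2.4.14
Description: commandline package manager
 This package provides commandline tools.

Package: oldpkg
Status: deinstall ok config-files
Architecture: arm64
Version: 1.0
"""

print(parse_dpkg_status(SAMPLE_STATUS))
```

Nothing here executes dpkg, which is the point: the evidence is plain text on disk.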

How Syft sees a container image

A container image is a stack of layer tarballs. Syft does this:

1. Pull / open the image (from a registry, daemon, tarball, or directory)
2. Build a layered filesystem view in memory (the "squashed" view, plus per-layer detail)
3. For each registered cataloger:
   a. Use the cataloger's path-glob pattern to find candidate files
   b. Parse each candidate file
   c. Emit Package records
4. Run a relationships pass:
   - Tie each package to the files it owns (from .md5sums / .list)
   - Tie packages to the layer they came from
   - Tie everything to the image as the root
5. Emit the final SBOM in the requested format (SPDX, CycloneDX, syft-json)

Steps 3 and 4 are why an SBOM is a graph, not just a list. The relationships are what make queries like "which files in layer 2 belong to which package?" possible.
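
Step 3 can be sketched in a few lines of Python. This is a toy model rather than Syft's code: the two catalogers, their glob patterns, and the in-memory "filesystem" (a dict of path to file content, with made-up package names and versions) are all assumptions chosen to show the glob, parse, emit loop.

```python
import json
from fnmatch import fnmatch


def catalog_requirements(path, text):
    """Toy cataloger: pinned 'name==version' lines in requirements.txt."""
    for line in text.splitlines():
        if "==" in line:
            name, _, version = line.strip().partition("==")
            yield {"type": "pypi", "name": name, "version": version, "found_in": path}


def catalog_package_json(path, text):
    """Toy cataloger: the name/version pair declared in package.json."""
    data = json.loads(text)
    if "name" in data and "version" in data:
        yield {"type": "npm", "name": data["name"],
               "version": data["version"], "found_in": path}


# Each cataloger is registered with the path pattern it cares about.
CATALOGERS = [
    ("*/requirements.txt", catalog_requirements),
    ("*/package.json", catalog_package_json),
]


def run_catalogers(filesystem: dict) -> list:
    """Run every cataloger over the files matching its pattern; merge results."""
    packages = []
    for pattern, parse in CATALOGERS:
        for path, text in sorted(filesystem.items()):
            if fnmatch(path, pattern):
                packages.extend(parse(path, text))
    return packages


fs = {
    "/app/requirements.txt": "requests==2.31.0\n# a comment\n",
    "/srv/web/package.json": '{"name": "demo", "version": "1.0.0"}',
    "/etc/hostname": "lab\n",
}
for pkg in run_catalogers(fs):
    print(pkg)
```

Files that match no pattern, like /etc/hostname here, simply produce no package records.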

What syft — and any SBOM tool — cannot do reliably

Worth being honest about the limits, because the SBOM is only as good as the catalogers' coverage:

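  • Statically linked binaries with no embedded metadata (for example, a stripped C or C++ program that links its libraries statically) often show up as "unidentified files" or just a binary-classifier hit, and the version may be wrong or missing. Go binaries usually fare better, because the toolchain embeds the module information that the go-module-binary cataloger reads.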
  • Code copied into the source tree (vendored without a manifest) is invisible. There is no metadata file to read.
  • Custom-compiled libraries dropped into /usr/local/lib without a package manager record are invisible to OS-package catalogers; they may still show up via the binary-classifier if syft happens to recognise their signature.
  • Application-level dependencies inside a built artifact (e.g. node_modules already bundled into a single dist.js) usually have to be cataloged from the lockfile before bundling, because the bundle no longer carries per-package metadata.

This is why generating the SBOM at build time — when lockfiles and intermediate artifacts are still present — is the production best practice. Generating it from a finished image is still useful, just less complete.


Prerequisites — The Lab

We use the same network-of-containers pattern from Parts 2 and 3: a real OCI registry plus a lab container with our tools.

# Create network and start the registry
docker network create oci-net
docker run -d --name oci-registry --network oci-net -p 5000:5000 registry:2
docker run -d --name oci-lab --network oci-net ubuntu:22.04 sleep 7200

# Install base tools in the lab and create the /work directory used by later steps
docker exec oci-lab bash -c \
  'apt-get update -qq && apt-get install -y -qq curl jq skopeo ca-certificates > /dev/null 2>&1 && mkdir -p /work'

Install syft, oras, and trivy

docker exec oci-lab bash -c '
  # syft — generates SBOMs (SPDX, CycloneDX, syft-json)
  curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin

  # oras — pushes/pulls arbitrary OCI artifacts (the swiss-army knife for OCI 1.1)
  curl -sSLo /tmp/oras.tar.gz "https://github.com/oras-project/oras/releases/download/v1.2.0/oras_1.2.0_linux_arm64.tar.gz"
  tar -xzf /tmp/oras.tar.gz -C /tmp/ && mv /tmp/oras /usr/local/bin/

  # trivy — vulnerability scanner that also generates CycloneDX SBOMs
  curl -sSLo /tmp/trivy.tar.gz "https://github.com/aquasecurity/trivy/releases/download/v0.70.0/trivy_0.70.0_Linux-ARM64.tar.gz"
  tar -xzf /tmp/trivy.tar.gz -C /tmp/ && mv /tmp/trivy /usr/local/bin/
'

Note: If you're on x86-64, replace arm64 with amd64 in the oras URL and ARM64 with 64bit in the trivy URL. The two projects name their release archives differently.

Verify:

$ syft version | head -3
Application:   syft
Version:       1.44.0
BuildDate:     2026-05-01T17:11:01Z

$ oras version | head -3
Version:        1.2.0
Go version:     go1.22.3

$ trivy --version | head -2
Version: 0.70.0

Push a target image

We'll generate SBOMs for the same ubuntu-curl:v1 image we used in Part 2:

docker exec oci-lab skopeo copy --dest-tls-verify=false \
  docker://ubuntu:22.04 \
  docker://oci-registry:5000/ubuntu-curl:v1

Capture the manifest digest — the SBOM will reference it via the subject field:

docker exec oci-lab bash -c '
  curl -sI http://oci-registry:5000/v2/ubuntu-curl/manifests/v1 \
    -H "Accept: application/vnd.oci.image.manifest.v1+json" \
    | grep -i docker-content-digest
'
Docker-Content-Digest: sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8

Step 1: Generate an SPDX SBOM with Syft

Syft can read images directly from a registry. Since our registry uses plain HTTP, we tell syft to allow that:

docker exec -w /work oci-lab bash -c '
  export SYFT_REGISTRY_INSECURE_USE_HTTP=true
  syft registry:oci-registry:5000/ubuntu-curl:v1 \
    -o spdx-json=/work/sbom.spdx.json
'

Result: a 1.9 MB JSON file describing every package and file in the image.

$ ls -la /work/sbom.spdx.json
-rw-r--r-- 1 root root 1943149 May  9 11:34 /work/sbom.spdx.json

What's inside

Top-level structure of an SPDX 2.3 document. The header is a handful of scalar fields plus creationInfo; the payload is the four arrays at the bottom:

{
  "spdxVersion":       "SPDX-2.3",
  "dataLicense":       "CC0-1.0",
  "SPDXID":            "SPDXRef-DOCUMENT",
  "name":              "oci-registry:5000/ubuntu-curl",
  "documentNamespace": "https://anchore.com/syft/image/oci-registry-5000/ubuntu-curl-da9454a6-742f-497e-a5db-16ae9aa0b48f",

  "creationInfo": {
    "licenseListVersion": "3.28",
    "creators": [
      "Organization: Anchore, Inc",
      "Tool: syft-1.44.0"
    ],
    "created": "2026-05-09T11:34:11Z"
  },

  "packages":                   [ /* 102 entries */    ],
  "files":                      [ /* 2,290 entries */  ],
  "relationships":              [ /* 2,848 entries */  ],
  "hasExtractedLicensingInfos": [ /* custom licenses */ ]
}

Header fields (scalars + the creationInfo object) describe the document itself — covered in detail later under Document-level fields.

The four payload arrays are where the actual SBOM data lives:

| Array | Count (our image) | What it holds |
|-------|-------------------|---------------|
| packages | 102 | Every package Syft identified: OS packages (deb), language-level deps (none in this image), and one entry for the image itself as the root |
| files | 2,290 | Every file Syft cataloged, each with name, multiple checksums, and an SPDXID. Present because filesAnalyzed: true on the packages. |
| relationships | 2,848 | Typed edges between SPDXIDs (DESCRIBES, CONTAINS, DEPENDS_ON, etc.). This is what makes the document a graph rather than a flat list. |
| hasExtractedLicensingInfos | varies | Full text of any non-standard license (LicenseRef-*) referenced from licenseDeclared / licenseConcluded. Empty if every package uses a standard SPDX License List ID. |

A few other arrays the spec defines that may or may not appear, depending on the producer and the input:

| Array | When it shows up |
|-------|------------------|
| snippets | Source-code analysis tools (FOSSology, ScanCode). Almost never in container-image SBOMs. |
| annotations | Document-level reviewer/tool comments. Optional. |
| externalDocumentRefs | When this SBOM references packages defined in another SBOM (e.g. an app SBOM pointing at a base-image SBOM). Optional. |

So the honest summary is: SPDX 2.3 has four payload arrays you'll see in nearly every container-image SBOM (packages, files, relationships, hasExtractedLicensingInfos), plus three optional ones (snippets, annotations, externalDocumentRefs) that show up in specialised use cases.
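
A quick way to check those counts on your own document is to load the JSON and measure each array. The sketch below assumes Python is available wherever the SBOM file lives (the lab container does not install it); summarize_spdx is a hypothetical helper name.

```python
import json

# The four common payload arrays plus the three optional ones.
PAYLOAD_ARRAYS = (
    "packages", "files", "relationships", "hasExtractedLicensingInfos",
    "snippets", "annotations", "externalDocumentRefs",
)


def summarize_spdx(doc: dict) -> dict:
    """Count the entries in each top-level SPDX 2.3 array (0 if absent)."""
    return {key: len(doc.get(key, [])) for key in PAYLOAD_ARRAYS}


def summarize_file(path: str) -> dict:
    """Load an SPDX JSON file and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_spdx(json.load(handle))


# Tiny in-memory stand-in for a real document.
tiny = {"packages": [{}, {}], "files": [{}], "relationships": [{}, {}, {}]}
print(summarize_spdx(tiny))
```

Against the file generated above you would call summarize_file("/work/sbom.spdx.json") and compare the result with the table.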

A sample package

{
  "name": "apt",
  "SPDXID": "SPDXRef-Package-deb-apt-5be364a4af57b701",
  "versionInfo": "2.4.14",
  "supplier": "NOASSERTION",
  "downloadLocation": "NOASSERTION",
  "filesAnalyzed": true,
  "packageVerificationCode": {
    "packageVerificationCodeValue": "e75a97363fdfe68c12c4bb109d55771cae4f3a3c"
  },
  "sourceInfo": "acquired package info from DPKG DB: /var/lib/dpkg/status, /usr/share/doc/apt/copyright, /var/lib/dpkg/info/apt.conffiles, ...",
  "licenseConcluded": "NOASSERTION",
  "licenseDeclared": "GPL-2.0-only AND LicenseRef-GPLv2-",
  "copyrightText": "NOASSERTION",
  "externalRefs": [
    {
      "referenceCategory": "SECURITY",
      "referenceType": "cpe23Type",
      "referenceLocator": "cpe:2.3:a:apt:apt:2.4.14:*:*:*:*:*:*:*"
    },
    {
      "referenceCategory": "PACKAGE-MANAGER",
      "referenceType": "purl",
      "referenceLocator": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04"
    }
  ]
}

Every field is doing meaningful work. Walking the package entry top-to-bottom:

| Field | What it carries | Why it's there |
|-------|-----------------|----------------|
| name, versionInfo | Human-readable identity | Lets people read the SBOM |
| SPDXID | Document-internal ID (SPDXRef-Package-deb-apt-5be364a4af57b701) | Used as the source/target of relationships entries (see below). The hex suffix is a content hash so two builds emit stable IDs. |
| supplier | Person / Organization who supplies the package | Often NOASSERTION for OS packages where dpkg doesn't track this cleanly |
| downloadLocation | Where this version can be re-fetched | NOASSERTION if not known |
| filesAnalyzed | true if Syft enumerated the package's files | Determines whether packageVerificationCode is meaningful |
| packageVerificationCode | SHA-1 over the sorted list of file SHA-1s belonging to the package | Tamper-evident: if any file changes, this value changes. Reproducible across builds. |
| sourceInfo | Free-text trace of which files Syft read to learn about this package | Provenance for the SBOM itself: you can audit Syft's evidence trail |
| licenseConcluded | License concluded by analysis | What an analyst concluded after reading. NOASSERTION means "no claim". The two fields exist precisely because they can disagree. |
| licenseDeclared | License declared by the upstream packager (here, from debian/copyright) | What the project says it is |
| copyrightText | Copyright notice text | License-compliance use case |
| externalRefs | Cross-ecosystem identifiers (PURL, CPE, etc.) | The portable handles other tools use |

externalRefs — not just PURL and CPE

The SPDX spec defines several referenceCategory values that you'll see in the wild:

| referenceCategory | referenceType examples | Used for |
|-------------------|------------------------|----------|
| PACKAGE-MANAGER | purl, npm, maven-central | Cross-ecosystem package handle (PURL is the universal one) |
| SECURITY | cpe23Type, cpe22Type, advisory, fix, url | Identifiers for vulnerability matching, plus links to advisories |
| PERSISTENT-ID | swh (Software Heritage), gitoid | Long-term archival identifiers: the package's source code by content-hash |
| OTHER | (anything) | Custom locator types tools have invented |

packageVerificationCode — the integrity check

The SPDX spec defines packageVerificationCode as: take every file SPDX considers part of this package, compute SHA-1 of each, sort the hex strings, concatenate, and SHA-1 the result. The output depends only on file contents, so it is stable across machines, OSes, and time. Two builds that produce byte-identical files yield the same value; any tampering with any owned file changes it. Together with licenseDeclared and externalRefs, this is what makes SPDX records independently verifiable, not just descriptive.
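
The recipe translates almost line for line into Python. This is a sketch of the algorithm as described above, with two stated simplifications: file contents are passed in as bytes rather than read from disk, and the spec's optional list of excluded files is ignored. The function name and sample contents are made up for the example.

```python
import hashlib


def package_verification_code(file_contents: dict) -> str:
    """SHA-1 over the sorted, concatenated SHA-1 hex digests of each file.

    file_contents maps a file path to that file's bytes. Paths do not
    influence the result; only the contents do.
    """
    file_sha1s = sorted(
        hashlib.sha1(data).hexdigest() for data in file_contents.values()
    )
    return hashlib.sha1("".join(file_sha1s).encode("ascii")).hexdigest()


files = {"/usr/bin/apt": b"binary bytes", "/etc/apt/apt.conf": b"config bytes"}
print(package_verification_code(files))
```

Because the digests are sorted before hashing, the order in which files are discovered does not matter, while changing a single byte in any file changes the code.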

The files array — file-level evidence

With filesAnalyzed: true, the SPDX document contains a files array. In our SBOM that's 2,290 entries. A typical entry:

{
  "SPDXID": "SPDXRef-File-bin-bash-3a7f1c8b9e2d4f56",
  "fileName": "/bin/bash",
  "checksums": [
    { "algorithm": "SHA1",   "checksumValue": "a8c1b..." },
    { "algorithm": "SHA256", "checksumValue": "3f617f3..." },
    { "algorithm": "MD5",    "checksumValue": "44136fa..." }
  ],
  "licenseConcluded": "NOASSERTION",
  "copyrightText": "NOASSERTION"
}

Files get their own SPDXIDs because they are first-class nodes in the relationships graph.
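
Those checksums are what let you verify a file against its SBOM entry. The sketch below, using only Python's hashlib, computes the same three digests for a blob of bytes and compares them with a files entry. Both helper names are hypothetical, and the entry used in the example is built on the fly rather than taken from our SBOM.

```python
import hashlib

# SPDX algorithm label -> hashlib algorithm name.
ALGORITHMS = {"SHA1": "sha1", "SHA256": "sha256", "MD5": "md5"}


def spdx_checksums(data: bytes) -> list:
    """Build an SPDX-style checksums array for the given bytes."""
    return [
        {"algorithm": label, "checksumValue": hashlib.new(name, data).hexdigest()}
        for label, name in ALGORITHMS.items()
    ]


def matches_entry(data: bytes, file_entry: dict) -> bool:
    """True if every checksum recorded in the entry matches the data."""
    computed = {c["algorithm"]: c["checksumValue"] for c in spdx_checksums(data)}
    return all(
        computed.get(c["algorithm"]) == c["checksumValue"]
        for c in file_entry["checksums"]
    )


entry = {"fileName": "/example", "checksums": spdx_checksums(b"hello")}
print(matches_entry(b"hello", entry), matches_entry(b"hello!", entry))
```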

The relationships array — where the graph lives

This is the part most people miss when they first read an SBOM. Our document has 2,848 relationship entries. Each is a typed edge between two SPDXIDs:

[
  {
    "spdxElementId":      "SPDXRef-DOCUMENT",
    "relationshipType":   "DESCRIBES",
    "relatedSpdxElement": "SPDXRef-Package-oci-ubuntu-curl-..."
  },
  {
    "spdxElementId":      "SPDXRef-Package-oci-ubuntu-curl-...",
    "relationshipType":   "CONTAINS",
    "relatedSpdxElement": "SPDXRef-Package-deb-apt-5be364a4af57b701"
  },
  {
    "spdxElementId":      "SPDXRef-Package-deb-apt-5be364a4af57b701",
    "relationshipType":   "CONTAINS",
    "relatedSpdxElement": "SPDXRef-File-usr-bin-apt-..."
  },
  {
    "spdxElementId":      "SPDXRef-Package-deb-apt-5be364a4af57b701",
    "relationshipType":   "DEPENDS_ON",
    "relatedSpdxElement": "SPDXRef-Package-deb-libapt-pkg6.0-..."
  }
]

The relationship types you'll see most often:

| Type | Meaning |
|------|---------|
| DESCRIBES | The document describes the target. Used once, from SPDXRef-DOCUMENT to the root package (here, the image). |
| CONTAINS | The source contains the target as a subcomponent. Image CONTAINS packages; package CONTAINS files. |
| DEPENDS_ON | The source requires the target at runtime. |
| BUILD_DEPENDENCY_OF | The source is required only at build time. |
| PATCH_FOR | The source is a patch for the target. |
| STATIC_LINK / DYNAMIC_LINK | The source links the target statically/dynamically. |

This is what makes SPDX a graph format and not just a list. Queries like "which files belong to package X?" or "what happens if I remove package Y?" are all just graph traversals over the relationships array.

SPDXRef-DOCUMENT
     │ DESCRIBES
     ▼
SPDXRef-Package-oci-ubuntu-curl-...        ← the image (root)
     │ CONTAINS  (×102)
     ├──► SPDXRef-Package-deb-apt-...
     │        │ CONTAINS  (×N files)
     │        ├──► SPDXRef-File-usr-bin-apt-...
     │        └──► SPDXRef-File-etc-apt-apt.conf.d-...
     │        │ DEPENDS_ON
     │        └──► SPDXRef-Package-deb-libapt-pkg6.0-...
     ├──► SPDXRef-Package-deb-bash-...
     └──► SPDXRef-Package-deb-libc6-...
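
Here is roughly what those traversals look like in Python. The SPDXIDs in the sample are shortened, hypothetical stand-ins (real Syft IDs carry a hex suffix), and the helper names are invented for this sketch.

```python
from collections import defaultdict

# Hypothetical, shortened SPDXIDs for illustration only.
RELATIONSHIPS = [
    {"spdxElementId": "SPDXRef-DOCUMENT", "relationshipType": "DESCRIBES",
     "relatedSpdxElement": "SPDXRef-Package-image"},
    {"spdxElementId": "SPDXRef-Package-image", "relationshipType": "CONTAINS",
     "relatedSpdxElement": "SPDXRef-Package-apt"},
    {"spdxElementId": "SPDXRef-Package-image", "relationshipType": "CONTAINS",
     "relatedSpdxElement": "SPDXRef-Package-libapt"},
    {"spdxElementId": "SPDXRef-Package-apt", "relationshipType": "CONTAINS",
     "relatedSpdxElement": "SPDXRef-File-usr-bin-apt"},
    {"spdxElementId": "SPDXRef-Package-apt", "relationshipType": "DEPENDS_ON",
     "relatedSpdxElement": "SPDXRef-Package-libapt"},
]


def index_edges(relationships):
    """Index edges by (source, type) and by (target, type)."""
    outgoing, incoming = defaultdict(list), defaultdict(list)
    for rel in relationships:
        src, kind, dst = (rel["spdxElementId"], rel["relationshipType"],
                          rel["relatedSpdxElement"])
        outgoing[(src, kind)].append(dst)
        incoming[(dst, kind)].append(src)
    return outgoing, incoming


outgoing, incoming = index_edges(RELATIONSHIPS)


def files_of(package_id):
    """Which files belong to package X?"""
    return [t for t in outgoing[(package_id, "CONTAINS")]
            if t.startswith("SPDXRef-File-")]


def dependents_of(package_id):
    """Which packages break if I remove package Y?"""
    return incoming[(package_id, "DEPENDS_ON")]


print(files_of("SPDXRef-Package-apt"), dependents_of("SPDXRef-Package-libapt"))
```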

Spec versions — SPDX 2.3 vs SPDX 3.0

SPDX has gone through several major revisions. The two that matter today:

| | SPDX 2.3 | SPDX 3.0 |
|---|----------|----------|
| Released | 2022 | 2024 |
| Status | Current de-facto standard | Released, slow adoption |
| Schema | Flat: top-level packages, files, relationships arrays | Element-graph: everything is an Element, relationships are first-class elements |
| Profile system | None | Modular profiles: Core, Software, Build, Security, AI, Dataset, Licensing, Lite |
| Tooling | Universal: every SBOM tool emits SPDX 2.3 | Growing: spdx-tools, syft (preview) |

Almost every SBOM you'll meet in the wild today is SPDX 2.3. The Syft output we generated above is SPDX 2.3. SPDX 3.0 is a clean break — fundamentally a different data model — and will probably take a few years to dominate. Knowing 2.3 deeply transfers most of the way to 3.0 once you learn the new element-graph vocabulary.

This post focuses on 2.3 because that's what's in production.

Serialization formats

The same SPDX 2.3 document can be serialized in several formats; these four are the ones you are most likely to meet. The on-the-wire bytes differ; the data is identical.

| Format | File extension | Notes |
|--------|----------------|-------|
| JSON | .spdx.json | The dominant format. What syft -o spdx-json emits. JSON Schema available. |
| YAML | .spdx.yaml | Human-friendlier; less common |
| Tag-Value | .spdx | The original SPDX format (key:value text). Still emitted by some tools. |
| RDF/XML | .spdx.rdf | Semantic-web format. Rare in practice. |

Tools like syft convert and spdx-tools translate between them. Use JSON for anything new: it is the format SBOM tooling reads most widely, and it is what you will typically store in a registry when you oras attach an SBOM.

Document-level fields — what every SPDX document must have

The fields at the very top of an SPDX 2.3 JSON document are mandatory and have specific meanings. Going through ours:

{
  "spdxVersion":       "SPDX-2.3",
  "dataLicense":       "CC0-1.0",
  "SPDXID":            "SPDXRef-DOCUMENT",
  "name":              "oci-registry:5000/ubuntu-curl",
  "documentNamespace": "https://anchore.com/syft/image/oci-registry-5000/ubuntu-curl-da9454a6-742f-497e-a5db-16ae9aa0b48f",
  "creationInfo":      { ... }
}

| Field | Why it must be exactly this |
|-------|-----------------------------|
| spdxVersion | The schema version. Parsers branch on this. Always SPDX- prefix. |
| dataLicense | The license of the SBOM document itself. SPDX 2.x mandates CC0-1.0 so that SBOM data is freely shareable, regardless of the license of the software it describes. |
| SPDXID | The document's own ID. Must be exactly SPDXRef-DOCUMENT. |
| name | A human label for the document. By convention, the name of the thing being described. |
| documentNamespace | A globally unique URI for this document. Two regenerations of the same SBOM should have different namespaces (note the UUID in ours). It's how external documents reference each other unambiguously; see the externalDocumentRefs field in spec. |
| creationInfo | Required metadata about the generation event. |

Why the namespace matters: if Document A wants to reference a package defined in Document B, it points to <B's documentNamespace>#SPDXRef-Package-foo. The namespace is the anchor for cross-document references. Without it, SPDXIDs would only be unique within a single file.

creationInfo — the provenance of the SBOM itself

"creationInfo": {
  "licenseListVersion": "3.28",
  "creators": [
    "Organization: Anchore, Inc",
    "Tool: syft-1.44.0"
  ],
  "created": "2026-05-09T11:34:11Z",
  "comment": "..."
}

| Field | Meaning |
|-------|---------|
| created | UTC timestamp when the SBOM was generated. Required. |
| creators | Array of who/what created it. Each entry must start with Tool:, Organization:, or Person:. Convention: tools list both the tool and the organization behind it. |
| licenseListVersion | Which version of the SPDX License List the document's license expressions were validated against. Important because the License List grows over time (3.28 has IDs that 3.20 didn't). |
| comment | Optional free text. |

This is the SBOM's audit trail — by reading creationInfo you know what tool produced the document, when, and against which license vocabulary.

License expressions — the small DSL inside licenseConcluded and licenseDeclared

The single most underestimated piece of SPDX is its license expression syntax. Every licenseConcluded, licenseDeclared, and file-level license field uses it.

The simplest expression is a single ID from the SPDX License List, which holds several hundred entries:

"licenseDeclared": "MIT"
"licenseDeclared": "Apache-2.0"
"licenseDeclared": "GPL-2.0-only"
"licenseDeclared": "GPL-2.0-or-later"

You can combine IDs with operators:

| Operator | Meaning | Example |
|----------|---------|---------|
| AND | Conjunction: you must comply with both licenses | (MIT AND Apache-2.0) |
| OR | Disjunction: you may comply with either | (GPL-2.0-only OR Apache-2.0) |
| WITH | License + an exception | Apache-2.0 WITH LLVM-exception |
| + | This version or any later | LGPL-2.1+ (deprecated in favor of -or-later IDs) |

Compound expressions are common in real SBOMs:

"licenseDeclared": "(MIT AND Apache-2.0) OR GPL-3.0-or-later"

That reads: "the recipient may comply with the conjunction (MIT and Apache-2.0), OR with GPL-3.0-or-later, at their choice."
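
Because the expression language is so small, a recursive-descent parser fits in a few dozen lines. The Python sketch below is an illustration, not a validator: parse_license_expression is a hypothetical name, it does not check IDs against the License List, and it treats a trailing + as part of the identifier. It does follow the precedence the SPDX spec defines, where WITH binds tighter than AND, and AND binds tighter than OR.

```python
import re


def parse_license_expression(expr: str):
    """Parse an SPDX license expression into nested (operator, left, right) tuples."""
    tokens = re.findall(r"[()]|[A-Za-z0-9.+:-]+", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return tok

    def atom():
        tok = take()
        if tok == "(":
            node = or_expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in (")", "AND", "OR", "WITH"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok  # a license ID, LicenseRef-*, NONE, or NOASSERTION

    def with_expr():
        node = atom()
        if peek() == "WITH":
            take()
            node = ("WITH", node, take())
        return node

    def and_expr():
        node = with_expr()
        while peek() == "AND":
            take()
            node = ("AND", node, with_expr())
        return node

    def or_expr():
        node = and_expr()
        while peek() == "OR":
            take()
            node = ("OR", node, and_expr())
        return node

    tree = or_expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree


print(parse_license_expression("(MIT AND Apache-2.0) OR GPL-3.0-or-later"))
```

The result is a small tree: the outer node is the OR, and its left child is the AND of the two permissive licenses.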

Three special tokens: NONE, NOASSERTION, LicenseRef-*

"licenseConcluded": "NONE"          ← the SBOM author concluded that no license is available for this item
"licenseConcluded": "NOASSERTION"   ← the analyst makes no claim about the license
"licenseDeclared":  "LicenseRef-Vendor-EULA-2024"  ← a custom license defined elsewhere in the document

NONE and NOASSERTION are not the same thing. NONE is a positive claim ("no license applies"); NOASSERTION is a refusal to claim ("I don't know / I won't say"). Tools that auto-generate SBOMs default to NOASSERTION when the license is ambiguous — which, for OS packages, it usually is.

LicenseRef-* IDs let you reference a license that isn't on the SPDX License List. Their text is then provided in the hasExtractedLicensingInfos array, covered next.

hasExtractedLicensingInfos — custom license texts

If your licenseDeclared includes a LicenseRef-Foo, you must also include a hasExtractedLicensingInfos entry with the actual license text:

"hasExtractedLicensingInfos": [
  {
    "licenseId":      "LicenseRef-GPLv2-",
    "name":           "GPLv2 (Debian-modified header)",
    "extractedText":  "                    GNU GENERAL PUBLIC LICENSE\n                       Version 2, June 1991\n\n Copyright (C) 1989, 1991 Free Software Foundation, Inc. ...",
    "comment":        "Found in /usr/share/doc/apt/copyright"
  }
]

This is what makes SPDX legally usable: even if a package ships under some bespoke vendor license, the SBOM carries the full text of that license alongside the reference. License-compliance auditors can review the SBOM as a self-contained legal artifact.
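
That "must also include" rule is easy to check mechanically. The following Python sketch scans the license fields of packages and files for LicenseRef-* identifiers and reports any that have no matching hasExtractedLicensingInfos entry. The helper name is hypothetical, the regular expression is a simplification of the spec's identifier grammar, and references into other documents (DocumentRef-...:LicenseRef-...) are not treated specially.

```python
import re

LICENSE_REF = re.compile(r"LicenseRef-[A-Za-z0-9.-]+")


def undefined_license_refs(doc: dict) -> list:
    """Return LicenseRef-* IDs that are used but never defined in the document."""
    defined = {e["licenseId"] for e in doc.get("hasExtractedLicensingInfos", [])}
    used = set()
    for item in doc.get("packages", []) + doc.get("files", []):
        for field in ("licenseDeclared", "licenseConcluded"):
            used.update(LICENSE_REF.findall(item.get(field, "")))
    return sorted(used - defined)


doc = {
    "packages": [
        {"name": "apt", "licenseConcluded": "NOASSERTION",
         "licenseDeclared": "GPL-2.0-only AND LicenseRef-GPLv2-"},
        {"name": "vendor-tool", "licenseDeclared": "LicenseRef-Vendor-EULA-2024"},
    ],
    "hasExtractedLicensingInfos": [
        {"licenseId": "LicenseRef-GPLv2-", "extractedText": "..."},
    ],
}
print(undefined_license_refs(doc))
```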

More package fields you'll meet

The package table earlier covered the most common fields. SPDX 2.3 defines a few more that show up regularly:

| Field | What it carries |
|-------|-----------------|
| originator | The entity that created the package (vs. supplier, who delivered it). For Debian's apt package: originator is the upstream apt project; supplier is Ubuntu. |
| primaryPackagePurpose | One of: APPLICATION, FRAMEWORK, LIBRARY, CONTAINER, OPERATING-SYSTEM, DEVICE, FIRMWARE, SOURCE, ARCHIVE, FILE, INSTALL, OTHER. Lets consumers filter (e.g. "show me all the FIRMWARE entries"). |
| releaseDate | When upstream released this version (ISO-8601 UTC). |
| builtDate | When the binary in this image was built. |
| validUntilDate | Vendor-declared end-of-support date. Useful for "are we shipping anything past EOL?" queries. |
| homepage | Project website URL. |
| attributionTexts | Required attribution notices (BSD-style "this product includes...") that must appear in derivative work documentation. |
| summary, description | Short and long human-readable descriptions of the package. |
| comment | Free text from the SBOM author. |
| annotations | See next subsection. |

annotations — reviewer or tool commentary

Both packages and the document itself can carry annotations: dated, signed-off comments. This is how an analyst leaves a note on a finding without modifying any other field:

"annotations": [
  {
    "annotationDate":   "2026-05-09T11:35:00Z",
    "annotationType":   "REVIEW",
    "annotator":        "Person: Sandeep Choudary",
    "comment":          "Verified license claim against /usr/share/doc/apt/copyright on 2026-05-09."
  }
]

annotationType is one of REVIEW, OTHER. The lightweight design lets compliance workflows attach evidence to specific package entries.

Snippets — sub-file granularity

Sometimes a single source file mixes code under different licenses. SPDX has snippets for that:

"snippets": [
  {
    "SPDXID":            "SPDXRef-Snippet-libfoo-bsd-fragment",
    "snippetFromFile":   "SPDXRef-File-src-libfoo-merged.c",
    "ranges": [
      { "startPointer": { "offset": 1024 }, "endPointer": { "offset": 4096 } }
    ],
    "licenseConcluded":  "BSD-3-Clause",
    "copyrightText":     "Copyright (c) 2018 Original Author"
  }
]

Container-image SBOMs almost never use snippets — they're a source-code-analysis feature. But if you read SBOMs from compliance tools like FOSSology you'll meet them.

Full relationship type list

The earlier table showed the relationship types you will meet most often. SPDX 2.3 defines roughly 40 in total. The full set falls into rough categories:

Composition:        CONTAINS, CONTAINED_BY, DESCRIBES, DESCRIBED_BY,
                    PACKAGE_OF, HAS_PREREQUISITE, PREREQUISITE_FOR

Dependency:         DEPENDS_ON, DEPENDENCY_OF,
                    DEPENDENCY_MANIFEST_OF, DEV_DEPENDENCY_OF,
                    OPTIONAL_DEPENDENCY_OF, BUILD_DEPENDENCY_OF,
                    RUNTIME_DEPENDENCY_OF, TEST_DEPENDENCY_OF,
                    PROVIDED_DEPENDENCY_OF

Build & source:     GENERATED_FROM, GENERATES, BUILD_TOOL_OF,
                    DEV_TOOL_OF, TEST_TOOL_OF, OPTIONAL_COMPONENT_OF

Linkage:            STATIC_LINK, DYNAMIC_LINK

Lifecycle:          PATCH_FOR, PATCH_APPLIED, COPY_OF,
                    ANCESTOR_OF, DESCENDANT_OF, VARIANT_OF

Distribution:       DISTRIBUTION_ARTIFACT, METAFILE_OF, DATA_FILE_OF,
                    DOCUMENTATION_OF, EXAMPLE_OF, TEST_OF, TEST_CASE_OF,
                    EXPANDED_FROM_ARCHIVE, FILE_ADDED, FILE_DELETED, FILE_MODIFIED

Other:              SPECIFICATION_FOR, REQUIREMENT_DESCRIPTION_FOR, OTHER, AMENDS

Note the inverse pairs (CONTAINS / CONTAINED_BY, DEPENDS_ON / DEPENDENCY_OF). SPDX lets you express the same edge from either direction; producers usually pick one direction and stay consistent.
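A consumer that merges SBOMs from several producers may want every edge in one direction. The sketch below rewrites the inverse forms named above into their forward equivalents; the mapping table and function name are assumptions made for illustration, and only three pairs are covered.

```python
# Inverse relationship types mapped to the forward form used after normalization.
INVERSE_OF = {
    "CONTAINED_BY": "CONTAINS",
    "DEPENDENCY_OF": "DEPENDS_ON",
    "DESCRIBED_BY": "DESCRIBES",
}


def normalize(relationship):
    """Return the relationship in forward form, swapping endpoints when needed."""
    rel_type = relationship["relationshipType"]
    if rel_type not in INVERSE_OF:
        return dict(relationship)
    return {
        **relationship,  # keep any extra fields such as "comment"
        "spdxElementId": relationship["relatedSpdxElement"],
        "relationshipType": INVERSE_OF[rel_type],
        "relatedSpdxElement": relationship["spdxElementId"],
    }


# "libc6 is a DEPENDENCY_OF apt" becomes "apt DEPENDS_ON libc6".
edge = {
    "spdxElementId": "SPDXRef-Package-libc6",
    "relationshipType": "DEPENDENCY_OF",
    "relatedSpdxElement": "SPDXRef-Package-apt",
}
print(normalize(edge))
```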

Inter-document references

A single SBOM can reference packages defined in another SBOM. This is how ecosystems share license analyses without duplicating data:

"externalDocumentRefs": [
  {
    "externalDocumentId": "DocumentRef-ubuntu-base",
    "spdxDocument":       "https://ubuntu.com/sboms/22.04-base/spdx-2.3.json",
    "checksum": {
      "algorithm":     "SHA256",
      "checksumValue": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
    }
  }
]

Then anywhere in this document you can refer to DocumentRef-ubuntu-base:SPDXRef-Package-libc6 and consumers know exactly which libc6 you mean.
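Resolving such a reference is a lookup followed by a split. The helper below is a sketch under the assumption that references take the DocumentRef-<id>:SPDXRef-<id> form shown above; resolve_ref is a hypothetical name, not an SPDX tooling API.

```python
def resolve_ref(ref, external_document_refs):
    """Split an element reference into (external document URI or None, local SPDXID)."""
    if ":" not in ref:
        return None, ref  # the element lives in the current document
    doc_id, local_id = ref.split(":", 1)
    by_id = {e["externalDocumentId"]: e["spdxDocument"] for e in external_document_refs}
    if doc_id not in by_id:
        raise KeyError(f"unknown external document: {doc_id}")
    return by_id[doc_id], local_id


external_refs = [
    {
        "externalDocumentId": "DocumentRef-ubuntu-base",
        "spdxDocument": "https://ubuntu.com/sboms/22.04-base/spdx-2.3.json",
    }
]
print(resolve_ref("DocumentRef-ubuntu-base:SPDXRef-Package-libc6", external_refs))
```

A real consumer would also fetch the referenced document and compare it with the recorded checksum before trusting anything inside it.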

Validating an SPDX document

The official validation tools:

A well-formed SPDX document should pass all three: JSON Schema, the tooling reference checks, and license-expression validation against the License List.

Headline difference, in one sentence

SPDX is a graph of typed elements with rich licensing semantics — packages, files, snippets, and 40+ relationship types — designed first for license compliance and later extended to inventory and security. Everything else (CycloneDX, syft-json, etc.) is some compression of that idea.


Step 2: Generate a CycloneDX SBOM with Trivy

docker exec -w /work oci-lab bash -c '
  TRIVY_INSECURE=true trivy image \
    --format cyclonedx \
    --output /work/sbom.cdx.json \
    oci-registry:5000/ubuntu-curl:v1
'
2026-05-09T11:34:25Z    INFO    Detected OS  family="ubuntu" version="22.04"
2026-05-09T11:34:25Z    INFO    Number of language-specific files    num=0

$ wc -c /work/sbom.cdx.json
209683 /work/sbom.cdx.json

Trivy's CycloneDX is much smaller (210 KB vs 1.9 MB) because it doesn't catalog individual files — only packages.

What's inside

{
  "$schema": "http://cyclonedx.org/schema/bom-1.6.schema.json",
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "serialNumber": "urn:uuid:cb37c624-b4ca-4281-aec9-8ded8176714f",
  "version": 1,
  "metadata":         { ... },           // document-level info + the root component
  "components":       [ ... ],           // 102 packages
  "dependencies":     [ ... ],           // adjacency list of the dependency graph
  "vulnerabilities":  [ ... ]            // optional, populated only when --scanners vuln is set
}

Top-level fields

Field | Purpose
bomFormat | Always "CycloneDX"; lets a parser identify the format without guessing
specVersion | The CycloneDX spec version this document targets (1.4, 1.5, 1.6...)
serialNumber | A urn:uuid: unique to each document. Two regenerations of the same SBOM get different serial numbers but are otherwise equivalent.
version | Document revision counter. Bump it when you re-publish a corrected SBOM for the same image.
metadata | Document-level metadata: timestamp, tools (what generated it), and metadata.component (see below)
components | Flat list of every package, library, file, container, OS, framework, etc. found in the target
dependencies | Adjacency list: which bom-ref depends on which
vulnerabilities | Optional: the same vulnerability records you'd find in a Trivy report
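These marker fields make format detection straightforward. The sketch below tells a CycloneDX document from an SPDX 2.x JSON document by checking bomFormat and spdxVersion; detect_sbom_format is a hypothetical helper and it deliberately ignores every other SBOM format.

```python
import json


def detect_sbom_format(text):
    """Return (format name, spec version) for a JSON SBOM, or ("unknown", None)."""
    doc = json.loads(text)
    if doc.get("bomFormat") == "CycloneDX":
        return "CycloneDX", doc.get("specVersion")
    spdx_version = str(doc.get("spdxVersion", ""))
    if spdx_version.startswith("SPDX-"):
        return "SPDX", spdx_version[len("SPDX-"):]
    return "unknown", None


print(detect_sbom_format('{"bomFormat": "CycloneDX", "specVersion": "1.6"}'))
print(detect_sbom_format('{"spdxVersion": "SPDX-2.3"}'))
```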

metadata.component — the root

This is the CycloneDX equivalent of SPDX's DESCRIBES relationship. It says "this BOM is about the following thing":

"metadata": {
  "timestamp": "2026-05-09T11:34:25Z",
  "tools": {
    "components": [
      { "type": "application", "name": "trivy", "version": "0.70.0" }
    ]
  },
  "component": {
    "bom-ref": "oci-registry:5000/ubuntu-curl@sha256:0124b538...",
    "type": "container",
    "name": "oci-registry:5000/ubuntu-curl",
    "version": "sha256:0124b538...",
    "purl": "pkg:oci/ubuntu-curl@sha256:0124b538...?repository_url=oci-registry%3A5000%2Fubuntu-curl"
  }
}

The metadata.component.type here is container. Other valid types include application, framework, library, operating-system, device, firmware, and file. Trivy emits container; Syft also emits container when serialising to CycloneDX.

The components array

A flat list. Order doesn't matter — the structure is in dependencies, not in nesting. A typical entry:

{
  "bom-ref": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04",
  "type": "library",
  "name": "apt",
  "version": "2.4.14",
  "purl": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04",
  "licenses": [
    { "license": { "name": "GPL-2.0-only" } }
  ],
  "properties": [
    { "name": "aquasecurity:trivy:LayerDigest", "value": "sha256:6edbc812af48..." },
    { "name": "aquasecurity:trivy:PkgID",      "value": "apt@2.4.14" },
    { "name": "aquasecurity:trivy:PkgType",    "value": "ubuntu" }
  ]
}

Things to notice:

  • bom-ref is the in-document identifier. CycloneDX lets you choose anything unique; Trivy and Syft conventionally use the PURL itself, which means refs are stable across regenerations.
  • type classifies what the component is. library is the catch-all for packages; application, framework, operating-system, firmware, file, and container are among the others.
  • purl is duplicated outside bom-ref so consumers that key on purl don't have to re-parse the ref.
  • properties is the format's open-ended escape hatch. Tools embed namespaced key-value pairs (aquasecurity:trivy:*, syft:*) for tool-specific metadata that doesn't fit the schema.
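Because the PURL carries most of a component's identity, it helps to see how one decomposes. The parser below is a simplified sketch, not a full implementation of the purl specification: it assumes a well-formed input and skips the spec's normalization rules. parse_purl is a hypothetical name.

```python
from urllib.parse import parse_qsl, unquote


def parse_purl(purl):
    """Split a package URL into type, namespace, name, version, qualifiers, subpath."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a purl: {purl}")
    rest = purl[len("pkg:"):]
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    path, _, version = rest.partition("@")
    segments = path.split("/")
    if len(segments) < 2:
        raise ValueError(f"purl needs a type and a name: {purl}")
    return {
        "type": segments[0],
        "namespace": "/".join(unquote(s) for s in segments[1:-1]) or None,
        "name": unquote(segments[-1]),
        "version": unquote(version) or None,
        "qualifiers": dict(parse_qsl(query)),
        "subpath": subpath or None,
    }


parts = parse_purl("pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04")
print(parts["type"], parts["namespace"], parts["name"], parts["version"])
```

For production use, a maintained purl library is the safer choice, since the specification has encoding and case rules this sketch does not apply.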

The dependencies array — the graph as adjacency list

This is the structural counterpart to SPDX's relationships. It lists, for each bom-ref, the bom-refs it depends on:

"dependencies": [
  {
    "ref": "oci-registry:5000/ubuntu-curl@sha256:0124b538...",
    "dependsOn": [
      "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04",
      "pkg:deb/ubuntu/bash@5.1-6ubuntu1.1?arch=arm64&distro=ubuntu-22.04",
      "pkg:deb/ubuntu/libc6@2.35-0ubuntu3.4?arch=arm64&distro=ubuntu-22.04"
    ]
  },
  {
    "ref": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04",
    "dependsOn": [
      "pkg:deb/ubuntu/libapt-pkg6.0@2.4.14?arch=arm64&distro=ubuntu-22.04",
      "pkg:deb/ubuntu/libc6@2.35-0ubuntu3.4?arch=arm64&distro=ubuntu-22.04"
    ]
  }
]

The core dependencies schema in CycloneDX does not distinguish runtime, build-time, and static-link dependencies: each entry is a ref plus a dependsOn list (CycloneDX 1.6 adds a provides list, which records implemented specifications rather than dependency kinds). For most container-image SBOMs that's fine, because everything is a runtime dependency once the image is built.
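Since the array is an adjacency list, ordinary graph traversal applies. The sketch below computes everything reachable from a given ref with a breadth-first walk; the function name is an assumption, and the refs are shortened to bare package names for readability (real refs are the full PURLs shown above).

```python
from collections import deque


def transitive_dependencies(dependencies, root):
    """Return the set of bom-refs reachable from root through dependsOn edges."""
    graph = {d["ref"]: d.get("dependsOn", []) for d in dependencies}
    seen, queue = set(), deque([root])
    while queue:
        for dep in graph.get(queue.popleft(), []):
            if dep not in seen:  # also guards against cycles
                seen.add(dep)
                queue.append(dep)
    return seen


deps = [
    {"ref": "image", "dependsOn": ["apt", "bash", "libc6"]},
    {"ref": "apt", "dependsOn": ["libapt-pkg6.0", "libc6"]},
]
print(sorted(transitive_dependencies(deps, "image")))
```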

Same data, different shape — SPDX vs CycloneDX side by side

For the same apt package:

Concern | SPDX | CycloneDX
Internal ID | SPDXRef-Package-deb-apt-5be364a4af57b701 | bom-ref: "pkg:deb/ubuntu/apt@2.4.14?..."
Cross-ecosystem ID | externalRefs array (PURL + CPE) | purl field (CPE optional, in cpe field)
Files owned | relationships[CONTAINS] from package to file SPDXIDs | Not represented (Trivy CycloneDX); file-level requires extra components of type: "file"
Integrity | packageVerificationCode (SHA-1 over file hashes) | hashes array per component (less common in OS-package SBOMs)
License | licenseConcluded + licenseDeclared (separate fields) | licenses array (single notion)
Graph edges | relationships array (typed: CONTAINS, DEPENDS_ON, STATIC_LINK, ...) | dependencies array (single edge type)
Tool metadata | creationInfo.creators | metadata.tools.components
Document root | SPDXRef-DOCUMENT + DESCRIBES relationship | metadata.component

The two formats describe the same reality. SPDX is more granular (separate concluded vs declared license, file-level relationships, multiple typed edges); CycloneDX is more compact and easier to round-trip programmatically. Most modern tools speak both, and you can convert package-level SBOMs between them with cyclonedx-cli convert or syft convert. Detail that only one format models, such as SPDX's file-level relationships, may not survive the trip.
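To make the "same data, different shape" point concrete, the sketch below reduces both formats to the same set of (name, version, purl) tuples. The two function names and the tiny sample documents are illustrative assumptions; the field names follow the SPDX 2.3 and CycloneDX JSON layouts discussed above.

```python
def packages_from_spdx(doc):
    """Extract (name, version, purl) tuples from an SPDX 2.x JSON document."""
    result = set()
    for pkg in doc.get("packages", []):
        purl = next(
            (ref["referenceLocator"] for ref in pkg.get("externalRefs", [])
             if ref.get("referenceType") == "purl"),
            None,
        )
        result.add((pkg["name"], pkg.get("versionInfo"), purl))
    return result


def packages_from_cyclonedx(doc):
    """Extract (name, version, purl) tuples from a CycloneDX JSON document."""
    return {(c["name"], c.get("version"), c.get("purl")) for c in doc.get("components", [])}


APT_PURL = "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04"
spdx_doc = {"packages": [{
    "name": "apt",
    "versionInfo": "2.4.14",
    "externalRefs": [{"referenceType": "purl", "referenceLocator": APT_PURL}],
}]}
cdx_doc = {"components": [{"name": "apt", "version": "2.4.14", "purl": APT_PURL}]}
print(packages_from_spdx(spdx_doc) == packages_from_cyclonedx(cdx_doc))
```

Comparing the two sets is a quick way to check whether two tools agree on what an image contains, before looking at the fields only one format can express.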


Aside: One Downstream Use — Vulnerability Scanning

The SBOM by itself is just an inventory. The most common downstream use is feeding the PURL/CPE list into a vulnerability database to discover CVEs. We won't dwell on it (this post is about SBOMs, not scanners), but here's the one-paragraph version of how the pipeline works:

SBOM (PURL/CPE per package) ──► lookup in vuln DB (NVD, OSV.dev, distro tracker)
                              ──► match version against affected ranges
                              ──► report CVEs + severity + fix version

For reference, running trivy image on our ubuntu-curl:v1 reports 35 CVEs (18 LOW, 17 MEDIUM, 0 HIGH/CRITICAL). A sample finding:

{
  "VulnerabilityID": "CVE-2026-27456",
  "PkgName": "bsdutils",
  "InstalledVersion": "1:2.37.2-4ubuntu3.5",
  "FixedVersion": null,
  "Severity": "MEDIUM",
  "Title": "util-linux: TOCTOU in the mount program when setting up loop devices"
}

Note the matching key: "PkgName": "bsdutils", "InstalledVersion": "1:2.37.2-4ubuntu3.5" — that's the same name+versionInfo we saw in the SBOM. The scanner looked up the package's PURL in its database and got back a CVE list. The SBOM was the input, the database was the lookup table, the CVE report was the output. Once you have an SBOM, this scan can run anywhere — you don't need access to the original image.
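The input, lookup table, and output can be sketched in a few lines. The advisory table here is a toy stand-in for a real vulnerability database and holds only the finding shown above; it matches on exact version, whereas a real scanner compares versions against affected ranges. All names in the block are hypothetical.

```python
# Toy advisory table keyed by (package name, installed version).
ADVISORIES = {
    ("bsdutils", "1:2.37.2-4ubuntu3.5"): ["CVE-2026-27456"],
}


def scan(components, advisories=ADVISORIES):
    """Return one finding per advisory that matches an SBOM component exactly."""
    findings = []
    for component in components:
        key = (component["name"], component["version"])
        for cve in advisories.get(key, []):
            findings.append({
                "VulnerabilityID": cve,
                "PkgName": component["name"],
                "InstalledVersion": component["version"],
            })
    return findings


# Only SBOM data is needed here; the image itself is never opened.
sbom_components = [
    {"name": "bsdutils", "version": "1:2.37.2-4ubuntu3.5"},
    {"name": "apt", "version": "2.4.14"},
]
print(scan(sbom_components))
```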


Step 3: Attach the SBOM to the Image as an OCI Artifact

Now the interesting part: we attach the SBOM to the registry next to the image, using the OCI 1.1 subject + Referrers mechanism we explored in Part 4.

What oras attach does

docker exec -w /work oci-lab bash -c '
  oras attach --plain-http \
    --artifact-type application/spdx+json \
    oci-registry:5000/ubuntu-curl:v1 \
    sbom.spdx.json:application/spdx+json
'
Uploading 28164ea0c196 sbom.spdx.json
Uploaded  28164ea0c196 sbom.spdx.json
Attached to [registry] oci-registry:5000/ubuntu-curl@sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8
Digest: sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4

Three things happened:

  1. The SBOM JSON file was pushed as a blob (sha256:28164ea0..., 1.9 MB).
  2. An OCI manifest was created describing it (sha256:f50bb644...).
  3. The manifest's subject field points to our image manifest.
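The three steps can be approximated without oras. The sketch below computes a descriptor for the SBOM bytes, builds a manifest around it, and sets subject to the image descriptor. It is an illustration of the structure only: the helper names are assumptions, the SBOM bytes are a stand-in, and the exact JSON that oras serializes will differ, so the resulting manifest digest will not match the one above.

```python
import hashlib
import json

OCI_MANIFEST = "application/vnd.oci.image.manifest.v1+json"
EMPTY_CONFIG = b"{}"


def descriptor(media_type, content):
    """Describe a blob by media type, sha256 digest, and size."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(content).hexdigest(),
        "size": len(content),
    }


def build_referrer_manifest(artifact_type, blob, subject):
    """Build an OCI manifest that attaches blob to the subject manifest."""
    return {
        "schemaVersion": 2,
        "mediaType": OCI_MANIFEST,
        "artifactType": artifact_type,
        "config": descriptor("application/vnd.oci.empty.v1+json", EMPTY_CONFIG),
        "layers": [descriptor(artifact_type, blob)],
        "subject": subject,
    }


image = {
    "mediaType": OCI_MANIFEST,
    "digest": "sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8",
    "size": 424,
}
sbom_bytes = b'{"spdxVersion": "SPDX-2.3"}'  # stand-in for the real SBOM file
manifest = build_referrer_manifest("application/spdx+json", sbom_bytes, image)
print(json.dumps(manifest, indent=2))
```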

Let's also attach the CycloneDX one:

docker exec -w /work oci-lab bash -c '
  oras attach --plain-http \
    --artifact-type application/vnd.cyclonedx+json \
    oci-registry:5000/ubuntu-curl:v1 \
    sbom.cdx.json:application/vnd.cyclonedx+json
'
Uploading fb4cd9377fac sbom.cdx.json
Uploaded  fb4cd9377fac sbom.cdx.json
Attached to [registry] oci-registry:5000/ubuntu-curl@sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8
Digest: sha256:39519e85a6346ada4c89cfe66837694f669c07843ecfd81b36d5cc67fe809499

The SBOM Manifest — Look at What Was Just Pushed

docker exec oci-lab bash -c '
  curl -s "http://oci-registry:5000/v2/ubuntu-curl/manifests/sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4" \
    -H "Accept: application/vnd.oci.image.manifest.v1+json" | jq .
'
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/spdx+json",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2,
    "data": "e30="
  },
  "layers": [
    {
      "mediaType": "application/spdx+json",
      "digest": "sha256:28164ea0c19614bfc106fea2ae6107dce12a1e79dfb1a361047b3264915e5195",
      "size": 1943149,
      "annotations": {
        "org.opencontainers.image.title": "sbom.spdx.json"
      }
    }
  ],
  "subject": {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "digest": "sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8",
    "size": 424
  },
  "annotations": {
    "org.opencontainers.image.created": "2026-05-09T11:36:17Z"
  }
}

This is the same shape as a notation signature manifest. The only differences:

Field | Notation Signature | SBOM (this manifest)
artifactType | application/vnd.cncf.notary.signature | application/spdx+json
config.mediaType | application/vnd.cncf.notary.signature | application/vnd.oci.empty.v1+json
layers[0].mediaType | application/cose | application/spdx+json
subject | image manifest digest | image manifest digest (same!)

The subject field works identically. The OCI registry doesn't care that one is a signature and the other is an SBOM — both are just manifests with a subject.

The empty config ({}, 2 bytes, mediaType application/vnd.oci.empty.v1+json) is the OCI-spec-blessed "I have no config" placeholder. Notice the data: "e30=" field: that is {} base64-encoded and inlined directly into the manifest, so even fetching the config blob is optional.
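A client that uses the inlined data can still check it against the descriptor. The sketch below decodes the data field and compares the result with the recorded digest and size; inline_content is a hypothetical helper name.

```python
import base64
import hashlib


def inline_content(descriptor):
    """Decode a descriptor's inline data after checking it against digest and size."""
    content = base64.b64decode(descriptor["data"])
    digest = "sha256:" + hashlib.sha256(content).hexdigest()
    if digest != descriptor["digest"] or len(content) != descriptor["size"]:
        raise ValueError("inline data does not match the descriptor")
    return content


config = {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2,
    "data": "e30=",
}
print(inline_content(config))
```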


Step 4: Discover Attached Artifacts via the Referrers Mechanism

Method 1 — oras discover (the convenient way)

docker exec oci-lab oras discover --plain-http oci-registry:5000/ubuntu-curl:v1 --format tree
oci-registry:5000/ubuntu-curl@sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8
├── application/spdx+json
│   └── sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4
└── application/vnd.cyclonedx+json
    └── sha256:39519e85a6346ada4c89cfe66837694f669c07843ecfd81b36d5cc67fe809499

Both SBOMs are now discoverable, grouped by artifactType.

Method 2 — Raw HTTP via the Referrers tag fallback

registry:2 doesn't support the OCI 1.1 Referrers API natively, so oras (and notation, and trivy) all use the tag-based fallback: a tag named sha256-<hex> whose content is an OCI Image Index listing all referrers.

docker exec oci-lab bash -c '
  curl -s http://oci-registry:5000/v2/ubuntu-curl/tags/list | jq .
'
{
  "name": "ubuntu-curl",
  "tags": [
    "sha256-0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8",
    "v1"
  ]
}

Notice the sha256-... tag — that's the referrer index, named after our image's manifest digest with : replaced by -.

docker exec oci-lab bash -c '
  curl -s "http://oci-registry:5000/v2/ubuntu-curl/manifests/sha256-0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8" \
    -H "Accept: application/vnd.oci.image.index.v1+json" | jq .
'
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4",
      "size": 730,
      "artifactType": "application/spdx+json",
      "annotations": {
        "org.opencontainers.image.created": "2026-05-09T11:36:17Z"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:39519e85a6346ada4c89cfe66837694f669c07843ecfd81b36d5cc67fe809499",
      "size": 746,
      "artifactType": "application/vnd.cyclonedx+json",
      "annotations": {
        "org.opencontainers.image.created": "2026-05-09T11:36:17Z"
      }
    }
  ]
}

The artifactType field on each descriptor lets clients filter: "Give me the SPDX one only" or "Give me everything signature-related". Notation signatures, SBOMs, vulnerability scans, and SLSA attestations all live side by side under the same parent image.
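With the fallback tag the registry does no filtering, so the client filters the index itself. A minimal sketch, using an abbreviated copy of the index above and a hypothetical function name:

```python
def referrers_of_type(index, artifact_type):
    """Return the digests of referrers whose artifactType matches exactly."""
    return [
        m["digest"]
        for m in index.get("manifests", [])
        if m.get("artifactType") == artifact_type
    ]


referrer_index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "digest": "sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4",
            "artifactType": "application/spdx+json",
        },
        {
            "digest": "sha256:39519e85a6346ada4c89cfe66837694f669c07843ecfd81b36d5cc67fe809499",
            "artifactType": "application/vnd.cyclonedx+json",
        },
    ],
}
print(referrers_of_type(referrer_index, "application/spdx+json"))
```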

Method 3 — The Referrers API (when supported)

On a registry that supports OCI 1.1 natively (Zot, Harbor 2.9+, GHCR, ECR, ACR, Docker Hub):

GET /v2/ubuntu-curl/referrers/sha256:0124b538...
Accept: application/vnd.oci.image.index.v1+json
→ 200 OK
→ Body: <same Image Index as above, computed dynamically by the registry>

Optionally filter by artifact type (URL-encode the value; a bare + in a query string can be decoded as a space):

GET /v2/ubuntu-curl/referrers/sha256:0124b538...?artifactType=application%2Fspdx%2Bjson
→ 200 OK
→ OCI-Filters-Applied: artifactType
→ Body: <Image Index containing only SPDX referrers>

Clients try the API first; fall back to the tag if it returns 404.
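That try-then-fall-back logic can be sketched as follows. The three function names are assumptions, and fetch is an injected callable returning (status, body) so the example stays independent of any HTTP library; a real client would also handle authentication and other status codes.

```python
from urllib.parse import urlencode


def referrers_api_url(base, repo, digest, artifact_type=None):
    """Build the Referrers API URL, URL-encoding the optional artifactType filter."""
    url = f"{base}/v2/{repo}/referrers/{digest}"
    if artifact_type:
        url += "?" + urlencode({"artifactType": artifact_type})
    return url


def fallback_tag_url(base, repo, digest):
    """Build the manifest URL for the fallback tag (':' in the digest becomes '-')."""
    return f"{base}/v2/{repo}/manifests/{digest.replace(':', '-', 1)}"


def discover(fetch, base, repo, digest):
    """Try the Referrers API first; on 404, read the fallback tag instead."""
    status, body = fetch(referrers_api_url(base, repo, digest))
    if status == 404:
        status, body = fetch(fallback_tag_url(base, repo, digest))
    return body if status == 200 else None


print(referrers_api_url("http://oci-registry:5000", "ubuntu-curl",
                        "sha256:abc", "application/spdx+json"))
```

Passing fetch as a parameter makes the fallback path easy to exercise in a unit test with a stub that returns 404 for the API URL.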


Step 5: Pull the SBOM Back

A consumer (CI pipeline, security scanner, admission controller) can pull the SBOM by the digest of its manifest:

docker exec oci-lab bash -c '
  mkdir -p /tmp/pulled-sboms && cd /tmp/pulled-sboms
  oras pull --plain-http \
    oci-registry:5000/ubuntu-curl@sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4
  ls -la
'
Downloaded  28164ea0c196 sbom.spdx.json
Pulled [registry] oci-registry:5000/ubuntu-curl@sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4
Digest: sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4

-rw-r--r-- 1 root root 1943149 May  9 11:36 sbom.spdx.json

Verify the content matches what we pushed:

docker exec oci-lab sha256sum /work/sbom.spdx.json /tmp/pulled-sboms/sbom.spdx.json
28164ea0c19614bfc106fea2ae6107dce12a1e79dfb1a361047b3264915e5195  /work/sbom.spdx.json
28164ea0c19614bfc106fea2ae6107dce12a1e79dfb1a361047b3264915e5195  /tmp/pulled-sboms/sbom.spdx.json

Identical. Content-addressable storage at work.


The Big Picture — Object Inventory

BEFORE attaching SBOMs:
┌─────────────────────────────────────────────────────────────────┐
│  Tags:                                                          │
│    v1 ──► sha256:0124b538... (image manifest)                   │
│                                                                 │
│  Blobs:                                                         │
│    sha256:0124b538... = image manifest    (424 B)               │
│    sha256:8bdde1d7... = image config      (2,069 B)             │
│    sha256:6edbc812... = Ubuntu layer      (27,606,543 B)        │
└─────────────────────────────────────────────────────────────────┘

AFTER attaching SPDX + CycloneDX SBOMs:
┌─────────────────────────────────────────────────────────────────┐
│  Tags:                                                          │
│    v1 ──► sha256:0124b538... (image manifest, unchanged)        │
│    sha256-0124b538... ──► referrer index                        │
│                                                                 │
│  Original blobs (untouched, image digest unchanged):            │
│    sha256:0124b538... = image manifest    (424 B)               │
│    sha256:8bdde1d7... = image config      (2,069 B)             │
│    sha256:6edbc812... = Ubuntu layer      (27,606,543 B)        │
│                                                                 │
│  New blobs from SBOM attachment:                                │
│    sha256:f50bb644... = SPDX manifest      (730 B)              │
│    sha256:39519e85... = CycloneDX manifest (746 B)              │
│    sha256:44136fa3... = empty config {}    (2 B, shared)        │
│    sha256:28164ea0... = SPDX SBOM blob     (1,943,149 B)        │
│    sha256:fb4cd937... = CycloneDX SBOM blob (209,683 B)         │
│                                                                 │
│  Relationships:                                                 │
│    referrer index ──lists──► [SPDX manifest, CycloneDX manifest]│
│    SPDX manifest ──subject──► image manifest                    │
│    CycloneDX manifest ──subject──► image manifest               │
└─────────────────────────────────────────────────────────────────┘

The image is untouched. Its manifest digest is exactly the same before and after. Anyone pinning ubuntu-curl@sha256:0124b538... gets bit-for-bit identical bytes. The SBOMs live alongside, discoverable but separate.


Why This Design Wins

The OCI 1.1 subject + Referrers approach has three properties that older "embed-it-in-the-image" approaches lack:

1. The signed image stays signed

If you embedded the SBOM as an extra layer in the image, the image manifest digest would change every time you regenerated the SBOM. That breaks digest pinning, breaks signatures, and forces re-signing on every SBOM update. With referrers, the image is immutable; metadata is mutable.
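The digest argument is easy to demonstrate. In the sketch below the manifests are trimmed to a couple of fields, the layer digests are placeholders, and the digest is taken over a canonical JSON dump; a registry hashes the exact bytes it stores, so the serialization here is an assumption made only to keep the example deterministic.

```python
import hashlib
import json


def manifest_digest(manifest):
    """Hash a deterministic JSON serialization of the manifest."""
    data = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(data).hexdigest()


image = {"schemaVersion": 2, "layers": [{"digest": "sha256:placeholder-os-layer"}]}
before = manifest_digest(image)

# Embedding: the SBOM becomes an extra layer, so the image manifest changes.
embedded = {**image, "layers": image["layers"] + [{"digest": "sha256:placeholder-sbom"}]}

# Referrer: a separate manifest points at the image, which is left as it was.
referrer = {
    "schemaVersion": 2,
    "layers": [{"digest": "sha256:placeholder-sbom"}],
    "subject": {"digest": before},
}
after = manifest_digest(image)

print(manifest_digest(embedded) != before, after == before)
```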

2. Anyone can attach anything, anytime

You don't need to modify the image to attach an SBOM. Your CI pipeline can build and push the image, then a separate stage (or a completely separate team/service) can run syft and attach the result. Vulnerability scans can be re-run weekly and re-attached without touching the image.

3. One mechanism for all metadata

Same plumbing for everything:

Image manifest: sha256:0124b538...
  ↑ subject (referrers)
  ├── application/vnd.cncf.notary.signature  ← Notation signature
  ├── application/spdx+json                  ← SBOM (SPDX)
  ├── application/vnd.cyclonedx+json         ← SBOM (CycloneDX)
  ├── application/vnd.in-toto+json           ← SLSA provenance attestation
  └── application/sarif+json                 ← Vulnerability scan results

The registry doesn't need plugins, special endpoints, or knowledge of these formats. It just stores manifests with subject fields and serves them via GET /v2/<name>/referrers/<digest> or the tag fallback.


Production Patterns

Pattern 1: Generate-and-attach in CI

# After docker push, before declaring success:
- name: Generate SBOM
  run: syft $IMAGE_REF -o spdx-json=sbom.spdx.json

- name: Attach SBOM
  run: |
    oras attach \
      --artifact-type application/spdx+json \
      $IMAGE_REF \
      sbom.spdx.json:application/spdx+json

Pattern 2: Admission-time verification

A Kubernetes admission controller (Kyverno, Ratify) can require both:

  • A valid Notation signature (proves who built it)
  • An attached SBOM with no CRITICAL CVEs (proves what's in it)

Both are discoverable through the same Referrers API call — Kubernetes admission gets the full provenance story in one place.
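The presence check at the heart of such a policy might look like the sketch below. It only confirms that a signature and an SBOM are attached; verifying the signature and scanning the SBOM are separate steps that Kyverno or Ratify perform with their own logic. The function name and return shape are assumptions.

```python
SIGNATURE_TYPE = "application/vnd.cncf.notary.signature"
SBOM_TYPES = {"application/spdx+json", "application/vnd.cyclonedx+json"}


def admission_decision(referrers_index):
    """Return (allowed, missing) based on which artifact types are attached."""
    attached = {m.get("artifactType") for m in referrers_index.get("manifests", [])}
    missing = []
    if SIGNATURE_TYPE not in attached:
        missing.append("signature")
    if not attached & SBOM_TYPES:
        missing.append("sbom")
    return (not missing, missing)


# An image with an SBOM attached but no signature is rejected.
index = {"manifests": [{"artifactType": "application/spdx+json"}]}
print(admission_decision(index))
```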

Pattern 3: SBOM diffing across versions

Pull SBOMs for myapp:v1 and myapp:v2, diff their package lists, and you have an automated changelog of dependencies. New packages → review for licensing. Removed packages → potential dead code. Version bumps → compare against vulnerability feeds.
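A minimal diff over two CycloneDX documents could be sketched like this. It keys components by name only, which is a simplifying assumption (a real tool would key on PURL), and the package versions in the sample documents are made up for the example.

```python
def diff_components(old_doc, new_doc):
    """Compare two CycloneDX documents by component name and version."""
    old = {c["name"]: c.get("version") for c in old_doc.get("components", [])}
    new = {c["name"]: c.get("version") for c in new_doc.get("components", [])}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            (name, old[name], new[name])
            for name in old.keys() & new.keys()
            if old[name] != new[name]
        ),
    }


# Hypothetical SBOMs for myapp:v1 and myapp:v2.
v1 = {"components": [{"name": "apt", "version": "2.4.14"},
                     {"name": "wget", "version": "1.21"}]}
v2 = {"components": [{"name": "apt", "version": "2.4.15"},
                     {"name": "curl", "version": "7.81"}]}
print(diff_components(v1, v2))
```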


Recap

In this part we:

Step | Tool | Result
Generated SPDX SBOM | syft | 1.9 MB JSON, 102 packages, 2,290 files
Generated CycloneDX SBOM | trivy | 210 KB JSON, 102 components
Scanned for CVEs | trivy | 35 vulnerabilities (18 LOW, 17 MEDIUM)
Attached SPDX as OCI artifact | oras attach | New manifest with subject → image
Attached CycloneDX as OCI artifact | oras attach | Second referrer alongside SPDX
Discovered attached artifacts | oras discover + raw HTTP | Tree view + Image Index via referrer tag
Pulled the SBOM back | oras pull | Bit-identical to source (same sha256)

The big takeaway

An SBOM is just an OCI artifact with artifactType: application/spdx+json (or application/vnd.cyclonedx+json). It uses the exact same plumbing as a notation signature: a manifest with a subject field, discoverable via the Referrers API or its tag-based fallback.

Once you understand subject + Referrers, you understand:

  • Notation signatures (Part 4)
  • SBOMs (this part)
  • Vulnerability scan results
  • SLSA build provenance
  • Anything else the supply-chain world dreams up next

The OCI registry is no longer just a place to store images — it's a content-addressable graph of software, its provenance, and everything we know about it.


Cleanup

docker rm -f oci-registry oci-lab
docker network rm oci-net

Series Recap

Across the five parts we went from spec to running container to a fully signed-and-described supply-chain artifact, end to end, with no magic:

Part | We did | Spec
1 | Built an OCI image by hand: manifest, config, layers, content-addressable blobs | OCI Image Spec
2 | Pushed and pulled with raw curl against an nginx "registry" | OCI Distribution Spec
3 | Ran a container with chroot, unshare, mount, overlayfs (no runc needed) | OCI Runtime Spec
4 | Signed an image with Notation and dissected the COSE envelope + Referrers index | OCI 1.1 Referrers / Notary Project
5 (this) | Attached SBOMs as OCI artifacts and discovered them via the same Referrers mechanism | OCI 1.1 Referrers / Reference Types

You now have the mental model to read every CVE feed, every supply-chain SBOM, every container registry response, and recognize what it is and where it fits.


Every digest, byte count, package count, CVE ID, and command output in this post was captured from an actual run inside Docker Desktop for Mac (arm64) on May 9, 2026. Tools used: registry:2, syft 1.44.0, oras 1.2.0, trivy 0.70.0, skopeo, curl, jq.