Part 5: SBOMs for Container Images — Generate, Attach, and Discover with OCI Referrers

Series · Understanding OCI from the Ground Up · Part 5 of 5

In Part 1 we built an OCI image. In Part 2 we pushed it with raw HTTP. In Part 3 we ran it with bare Linux primitives. In Part 4 we signed it with Notation and saw how the OCI 1.1 `subject` + Referrers mechanism works. Now we use the exact same plumbing to attach a Software Bill of Materials (SBOM) to that image — proving the design generalizes far beyond signatures.

What is an SBOM?

A Software Bill of Materials (SBOM) is a machine-readable inventory of everything inside a piece of software. For a container image, an SBOM tells you:

Every OS package with name, version, license, and supplier (apt 2.4.14, libc6 2.35-0ubuntu3.4, ...)
Every language-level dependency (npm modules, pip wheels, Go modules, Maven JARs)
Every file delivered by each package, with its hash
The relationships between them — which package contains which file, which package depends on which other package, which package is the root "thing" the SBOM is about
Cross-ecosystem identifiers (PURLs, CPEs) so the SBOM can be cross-referenced with package registries, advisory feeds, and license databases

Think of an SBOM as a typed graph: nodes are packages and files, edges are typed relationships (CONTAINS, DEPENDS_ON, DESCRIBES), and every node carries enough metadata to be uniquely identified across the world.

Why SBOMs exist:

Inventory — You can't manage what you can't see. An SBOM is the first honest answer to "what's actually in this image?"
Reproducibility & provenance — Two builds of the same Dockerfile a week apart can pull in different upstream versions. An SBOM captures the exact set that shipped.
License compliance — The original driver behind SPDX (2010). Knowing every package's licenseDeclared is a legal requirement in many regulated industries.
Vulnerability matching — A scanner can take an SBOM and look up each package's PURL/CPE in a vulnerability database to find known CVEs (we'll see this briefly later).
Compliance mandates — US Executive Order 14028 and the EU Cyber Resilience Act require SBOMs for software shipped to government and regulated buyers.
Supply chain integrity — Combined with the Notation signatures from Part 4, SBOMs let you verify what's inside an image alongside who built it.

Background — Identifying a Package

Before we look at any SBOM file, we need to answer one question: given a file on disk, how do you describe a package precisely enough that a tool on the other side of the world can recognize it?

The answer is two parallel naming systems: PURL and CPE. Almost every package entry in every SBOM you'll ever see carries both.

PURL — Package URL (the modern identifier)

A PURL (spec) is a single string that uniquely identifies a package across ecosystems. Format:

pkg:<type>/<namespace>/<name>@<version>?<qualifiers>

Part	Meaning	Example
`<type>`	Package ecosystem	`deb`, `rpm`, `apk`, `npm`, `pypi`, `golang`, `maven`, `cargo`, `oci`
`<namespace>`	Distro / org / scope (optional)	`ubuntu`, `debian`, `@angular`, `github.com/gorilla`
`<name>`	Package name	`apt`, `lodash`, `requests`
`<version>`	Exact version string	`2.4.14`, `4.17.21`, `v1.8.0`
`<qualifiers>`	Disambiguators (optional)	`arch=arm64`, `distro=ubuntu-22.04`, `epoch=1`

Examples you'll see in this post:

PURL	What it means
`pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04`	Debian package `apt` 2.4.14 from Ubuntu 22.04, arm64
`pkg:npm/lodash@4.17.21`	npm package `lodash` 4.17.21
`pkg:pypi/django@4.2.7`	PyPI package `django` 4.2.7
`pkg:golang/github.com/gorilla/mux@v1.8.0`	Go module `gorilla/mux` v1.8.0
`pkg:oci/ubuntu-curl@sha256:0124b538...`	The container image itself, by digest

PURLs are the modern community standard — used by OSV.dev, GitHub Advisory Database, Snyk, Trivy, Syft, and almost every new tool. Given a PURL, a scanner can look up known vulnerabilities in seconds.
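
To make the format concrete, here is a minimal Python sketch that splits a PURL into the parts from the table above. It is an illustration under simplifying assumptions, not a spec-complete parser: the function name `parse_purl` is made up for this post, and the per-type normalization rules of the PURL spec are skipped.

```python
from urllib.parse import parse_qsl, unquote


def parse_purl(purl):
    """Split a PURL into type, namespace, name, version, qualifiers, subpath."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"not a PURL: {purl!r}")

    rest, _, subpath = rest.partition("#")     # optional trailing #subpath
    rest, _, query = rest.partition("?")       # optional ?qualifiers
    path, at, version = rest.rpartition("@")   # version follows the last '@'
    if not at:                                 # no version present
        path, version = rest, ""

    segments = [unquote(s) for s in path.strip("/").split("/")]
    if len(segments) < 2:
        raise ValueError(f"PURL needs at least a type and a name: {purl!r}")

    return {
        "type": segments[0].lower(),
        "namespace": "/".join(segments[1:-1]),  # empty string when absent
        "name": segments[-1],
        "version": unquote(version),
        "qualifiers": dict(parse_qsl(query)),
        "subpath": subpath,
    }


apt = parse_purl("pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04")
mux = parse_purl("pkg:golang/github.com/gorilla/mux@v1.8.0")
```

For anything beyond experimentation, a maintained PURL library is a safer choice than hand-rolled string splitting.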

CPE — Common Platform Enumeration

A CPE (NIST spec) is the identifier scheme used by NIST's National Vulnerability Database. Format:

cpe:2.3:<part>:<vendor>:<product>:<version>:<update>:<edition>:<lang>:<sw_edition>:<target_sw>:<target_hw>:<other>

<part> is a (application), o (operating system), or h (hardware). Asterisks are wildcards. Example:

cpe:2.3:a:apt:apt:2.4.14:*:*:*:*:*:*:*
        ↑  ↑       ↑
     part vendor   version

CPEs predate PURLs by about a decade. They live on because NVD and many enterprise tools still use them. Modern SBOMs include both — PURL because that's what the open-source ecosystem uses, CPE because that's what NVD uses.
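
The colon-separated layout can be unpacked in a few lines of Python. This is a rough sketch: `parse_cpe23` is a name invented here, the field names follow the format string above, and the only escaping it handles is a backslash-escaped colon.

```python
import re

CPE23_FIELDS = ("part", "vendor", "product", "version", "update", "edition",
                "lang", "sw_edition", "target_sw", "target_hw", "other")


def parse_cpe23(cpe):
    """Split a CPE 2.3 formatted string into its 11 named attributes."""
    # Split on ':' unless it is escaped with a backslash.
    parts = re.split(r"(?<!\\):", cpe)
    if parts[:2] != ["cpe", "2.3"] or len(parts) != 13:
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    return dict(zip(CPE23_FIELDS, parts[2:]))


apt_cpe = parse_cpe23("cpe:2.3:a:apt:apt:2.4.14:*:*:*:*:*:*:*")
```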

Why two systems? History. CPE came from US-government compliance work in the mid-2000s; PURL came from the open-source community in the late 2010s. SBOM generators emit both so downstream tools can pick whichever they understand.

Where these identifiers go

In an SPDX SBOM, each package's externalRefs array carries them:

"externalRefs": [
  { "referenceCategory": "SECURITY",         "referenceType": "cpe23Type", "referenceLocator": "cpe:2.3:a:apt:apt:2.4.14:*:*:*:*:*:*:*" },
  { "referenceCategory": "PACKAGE-MANAGER",  "referenceType": "purl",      "referenceLocator": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04" }
]

In a CycloneDX SBOM, PURL is a first-class field on every component:

{ "name": "apt", "version": "2.4.14", "purl": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04" }

With those two identifiers in hand, an SBOM is portable knowledge about an image — anyone, anywhere, with any tool, can pick it up and reason about it.
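
The difference in where the PURL lives matters when you write tooling against both formats. A small sketch, with helper names that are purely illustrative, shows the two lookups side by side using the fragments above:

```python
def purl_from_spdx_package(pkg):
    """SPDX: the PURL is one entry among the package's externalRefs."""
    for ref in pkg.get("externalRefs", []):
        if ref.get("referenceType") == "purl":
            return ref.get("referenceLocator")
    return None


def purl_from_cyclonedx_component(component):
    """CycloneDX: the PURL is a plain field on the component."""
    return component.get("purl")


spdx_pkg = {
    "externalRefs": [
        {"referenceCategory": "SECURITY", "referenceType": "cpe23Type",
         "referenceLocator": "cpe:2.3:a:apt:apt:2.4.14:*:*:*:*:*:*:*"},
        {"referenceCategory": "PACKAGE-MANAGER", "referenceType": "purl",
         "referenceLocator": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04"},
    ]
}
cdx_component = {
    "name": "apt", "version": "2.4.14",
    "purl": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04",
}
```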

SBOM Formats — SPDX and CycloneDX

Two standards dominate. Both are most often exchanged as JSON, though each defines other serializations too. Both describe the same things; different communities simply chose different schemas:

	SPDX	CycloneDX
Steward	Linux Foundation (ISO/IEC 5962:2021)	OWASP
Origin	License compliance (2010)	Application security (2017)
Identifier scheme	SPDXRef-* (internal) + PURL (external)	bom-ref + PURL
Top-level units	`packages` + `files` + `relationships`	`components` + `dependencies`
Vulnerabilities	Via separate VEX docs	Built into the BOM (`vulnerabilities` key)
Default for	`syft` (Anchore), Kubernetes	`trivy` (Aqua), `cyclonedx-cli`

In practice, both formats describe the same image. Tools convert between them. We'll generate both.

How Syft Identifies Packages — Catalogers and Evidence

Before we run any commands, here's the mental model for what syft (or any SBOM generator) actually does inside.

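Syft does not guess at an image's contents by blindly scanning bytes. It runs a fleet of small, specialised programs called catalogers, each of which knows how to recognise one specific kind of evidence on a filesystem. Most read package-manager metadata; a few, as the table below shows, read version information embedded in binaries.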

What a cataloger is

A cataloger is a small Go component inside Syft (other tools have equivalents) with one job:

Walk a filesystem. Recognise the metadata files of one packaging system. Parse them. Emit a list of structured Package records.

Each cataloger looks at a few specific path patterns and parses files in formats it knows. They run independently and their outputs are merged.

The catalogers Syft ships with

A short tour of catalogers relevant to container images:

Cataloger	Looks for	Parses
dpkg-db	`/var/lib/dpkg/status`, `/var/lib/dpkg/info/*.md5sums`	Debian/Ubuntu OS packages
rpm-db	`/var/lib/rpm/Packages` (Berkeley DB) or `rpmdb.sqlite` (SQLite)	RPM packages on Red Hat / Fedora / SUSE
apk-db	`/lib/apk/db/installed`	Alpine packages
java-archive	`*.jar`, `*.war`, `*.ear` (and their `META-INF/MANIFEST.MF`, `pom.properties`)	Java libraries
python-package	`*.dist-info/METADATA`, `*.egg-info/PKG-INFO`, `requirements.txt`	Installed PyPI wheels and pip-style declarations
javascript-package	`package.json`, `package-lock.json`, `yarn.lock`	npm modules
go-module-binary	ELF binaries with embedded module info	Go modules statically compiled into a binary
go-mod-file	`go.mod`, `go.sum`	Declared Go dependencies
rust-cargo	`Cargo.lock`	Rust crates
ruby-gemspec	`*.gemspec`, `Gemfile.lock`	Ruby gems
php-composer	`composer.lock`, `installed.json`	PHP Composer packages
binary-classifier	Specific binaries (`node`, `python3`, `httpd`, `nginx`, ...)	Identifies a known binary by its byte signature and reads its embedded version

For the full list: syft cataloger list. As of Syft 1.44 there are 30+ catalogers covering every major ecosystem.

Evidence sources — where the data actually comes from

For our ubuntu:22.04 image, the dpkg-db cataloger is the only one that finds anything. To see what it reads, look at the sourceInfo field Syft records for the apt package (the full package entry appears later in this post):

acquired package info from DPKG DB:
  /var/lib/dpkg/status
  /usr/share/doc/apt/copyright
  /var/lib/dpkg/info/apt.conffiles
  /var/lib/dpkg/info/apt.md5sums
  /var/lib/dpkg/info/apt.list
  /var/lib/dpkg/info/apt.postinst
  /var/lib/dpkg/info/apt.postrm
  /var/lib/dpkg/info/apt.preinst
  /var/lib/dpkg/info/apt.prerm
  /var/lib/dpkg/info/apt.shlibs
  /var/lib/dpkg/info/apt.triggers

That list is verbatim what dpkg itself maintains for every installed package. status gives name/version/architecture/dependencies; copyright gives license text; *.md5sums gives the exact list of files belonging to that package and their MD5 hashes; *.list gives the full file paths.

This is not magic. Syft's dpkg cataloger essentially reads the same files dpkg --status apt would read — it just does it without running dpkg, by parsing the files directly. That's why Syft works on a static filesystem (a tarball, a pulled image, an OCI registry blob) without needing the package manager installed.
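
To show how little is involved, here is a toy version of that idea in Python. It is a sketch only: the sample stanzas are invented for illustration (they are not copied from a real /var/lib/dpkg/status), the function name is hypothetical, and Syft's real cataloger handles many more fields and edge cases.

```python
SAMPLE_STATUS = """\
Package: apt
Status: install ok installed
Architecture: arm64
Version: 2.4.14
Depends: adduser, libapt-pkg6.0 (>= 2.4.14)
Description: commandline package manager
 This package provides commandline tools.

Package: oldpkg
Status: deinstall ok config-files
Architecture: arm64
Version: 1.0
"""


def parse_dpkg_status(text):
    """Parse dpkg 'status' stanzas and return records for installed packages."""
    packages = []
    for stanza in text.strip().split("\n\n"):   # stanzas are blank-line separated
        fields, last_key = {}, None
        for line in stanza.splitlines():
            if line[:1] in (" ", "\t") and last_key:
                # Continuation line: belongs to the previous field.
                fields[last_key] += "\n" + line.strip()
            elif ":" in line:
                key, _, value = line.partition(":")
                last_key = key.strip()
                fields[last_key] = value.strip()
        # Skip packages that were removed but left config files behind.
        if not fields.get("Status", "").endswith(" installed"):
            continue
        depends = [d.split()[0] for d in fields.get("Depends", "").split(",") if d.strip()]
        packages.append({
            "name": fields["Package"],
            "version": fields["Version"],
            "architecture": fields.get("Architecture", ""),
            "depends": depends,
        })
    return packages


installed = parse_dpkg_status(SAMPLE_STATUS)
```

No package manager is invoked anywhere; the function only needs the text of the file, which is the property that lets an SBOM tool work on a tarball.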

How Syft sees a container image

A container image is a stack of layer tarballs. Syft does this:

1. Pull / open the image (from a registry, daemon, tarball, or directory)
2. Build a layered filesystem view in memory (the "squashed" view, plus per-layer detail)
3. For each registered cataloger:
   a. Use the cataloger's path-glob pattern to find candidate files
   b. Parse each candidate file
   c. Emit Package records
4. Run a relationships pass:
   - Tie each package to the files it owns (from .md5sums / .list)
   - Tie packages to the layer they came from
   - Tie everything to the image as the root
5. Emit the final SBOM in the requested format (SPDX, CycloneDX, syft-json)

Steps 3 and 4 are why an SBOM is a graph, not just a list. The relationships are what make queries like "which files in layer 2 belong to which package?" possible.
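
Steps 3 and 4 can be sketched in Python with an in-memory "filesystem" (a dict of path to file content). Everything here is a simplification for illustration: the two toy catalogers, their glob patterns, and the record layout are assumptions of this sketch, not Syft's actual internals.

```python
import fnmatch
import json
from email.parser import Parser  # METADATA files use RFC 822-style headers


def npm_cataloger(content):
    data = json.loads(content)
    return [{"name": data["name"], "version": data["version"], "type": "npm"}]


def python_cataloger(content):
    headers = Parser().parsestr(content, headersonly=True)
    return [{"name": headers["Name"], "version": headers["Version"], "type": "pypi"}]


# (cataloger name, path glob, parser): one entry per kind of evidence.
CATALOGERS = [
    ("javascript-package", "*/package.json", npm_cataloger),
    ("python-package", "*.dist-info/METADATA", python_cataloger),
]


def catalog(filesystem, image_name):
    """Run every cataloger over the file tree; return packages and edges."""
    packages, relationships = [], []
    for cataloger_name, pattern, parse in CATALOGERS:
        for path in sorted(filesystem):
            if not fnmatch.fnmatchcase(path, pattern):
                continue                       # step 3a: glob for candidates
            for pkg in parse(filesystem[path]):  # step 3b: parse
                pkg["foundBy"] = cataloger_name
                pkg["evidence"] = path
                packages.append(pkg)           # step 3c: emit a record
                # Step 4: tie each package back to the image as the root.
                relationships.append(
                    (image_name, "CONTAINS", f'{pkg["name"]}@{pkg["version"]}'))
    return packages, relationships


FILESYSTEM = {
    "/app/node_modules/lodash/package.json": '{"name": "lodash", "version": "4.17.21"}',
    "/usr/lib/python3/dist-packages/requests-2.31.0.dist-info/METADATA":
        "Metadata-Version: 2.1\nName: requests\nVersion: 2.31.0\n",
    "/etc/hostname": "lab\n",
}
found, edges = catalog(FILESYSTEM, "example-image")
```

Files that match no pattern, like /etc/hostname here, simply produce no package record, which mirrors the coverage limits discussed next.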

What syft — and any SBOM tool — cannot do reliably

Worth being honest about the limits, because the SBOM is only as good as the catalogers' coverage:

Statically linked binaries with no metadata (a stripped C binary, or a Go binary whose embedded build info has been removed) often show up as "unidentified files" or just a binary-classifier hit; the version may be wrong or missing.
Code copied into the source tree (vendored without a manifest) is invisible. There is no metadata file to read.
Custom-compiled libraries dropped into /usr/local/lib without a package manager record are invisible to OS-package catalogers; they may still show up via the binary-classifier if syft happens to recognise their signature.
Application-level dependencies inside a built artifact (e.g. node_modules already bundled into a single dist.js) are usually visible only if you catalog the lockfile before bundling, not the bundle afterwards.

This is why generating the SBOM at build time — when lockfiles and intermediate artifacts are still present — is the production best practice. Generating it from a finished image is still useful, just less complete.

Prerequisites — The Lab

We use the same network-of-containers pattern from Parts 2 and 3: a real OCI registry plus a lab container with our tools.

# Create network and start the registry
docker network create oci-net
docker run -d --name oci-registry --network oci-net -p 5000:5000 registry:2
docker run -d --name oci-lab --network oci-net ubuntu:22.04 sleep 7200

# Install base tools in the lab and create the /work directory used below
docker exec oci-lab bash -c \
  'apt-get update -qq && apt-get install -y -qq curl jq skopeo ca-certificates > /dev/null 2>&1 && mkdir -p /work'

Install syft, oras, and trivy

docker exec oci-lab bash -c '
  # syft — generates SBOMs (SPDX, CycloneDX, syft-json)
  curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin

  # oras — pushes/pulls arbitrary OCI artifacts (the swiss-army knife for OCI 1.1)
  curl -sSLo /tmp/oras.tar.gz "https://github.com/oras-project/oras/releases/download/v1.2.0/oras_1.2.0_linux_arm64.tar.gz"
  tar -xzf /tmp/oras.tar.gz -C /tmp/ && mv /tmp/oras /usr/local/bin/

  # trivy — vulnerability scanner that also generates CycloneDX SBOMs
  curl -sSLo /tmp/trivy.tar.gz "https://github.com/aquasecurity/trivy/releases/download/v0.70.0/trivy_0.70.0_Linux-ARM64.tar.gz"
  tar -xzf /tmp/trivy.tar.gz -C /tmp/ && mv /tmp/trivy /usr/local/bin/
'

Note: On Intel/AMD, replace arm64 with amd64 in the oras URL and Linux-ARM64 with Linux-64bit in the trivy URL.

Verify:

$ syft version | head -3
Application:   syft
Version:       1.44.0
BuildDate:     2026-05-01T17:11:01Z

$ oras version | head -3
Version:        1.2.0
Go version:     go1.22.3

$ trivy --version | head -2
Version: 0.70.0

Push a target image

We'll generate SBOMs for the same ubuntu-curl:v1 image we used in Part 2:

docker exec oci-lab skopeo copy --dest-tls-verify=false \
  docker://ubuntu:22.04 \
  docker://oci-registry:5000/ubuntu-curl:v1

Capture the manifest digest — the SBOM will reference it via the subject field:

docker exec oci-lab bash -c '
  curl -sI http://oci-registry:5000/v2/ubuntu-curl/manifests/v1 \
    -H "Accept: application/vnd.oci.image.manifest.v1+json" \
    | grep -i docker-content-digest
'

Docker-Content-Digest: sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8
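
That digest is the SHA-256 of the manifest bytes, which is what lets it act as a stable pointer to the image. As a rough Python sketch of the descriptor a `subject` field carries (media type, digest, size), with a function name and sample bytes that are placeholders rather than anything from the lab:

```python
import hashlib


def subject_descriptor(manifest_bytes, media_type):
    """Build an OCI descriptor pointing at a manifest by content digest."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(manifest_bytes).hexdigest(),
        "size": len(manifest_bytes),
    }


# Stand-in bytes; in the lab these would be the exact manifest body
# returned by the registry.
example = subject_descriptor(
    b'{"schemaVersion": 2}',
    "application/vnd.oci.image.manifest.v1+json",
)
```

The digest only matches if the bytes are hashed exactly as stored; re-serializing the JSON first would change it.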

Step 1: Generate an SPDX SBOM with Syft

Syft can read images directly from a registry. Since our registry uses plain HTTP, we tell syft to allow that:

docker exec -w /work oci-lab bash -c '
  export SYFT_REGISTRY_INSECURE_USE_HTTP=true
  syft registry:oci-registry:5000/ubuntu-curl:v1 \
    -o spdx-json=/work/sbom.spdx.json
'

Result: a 1.9 MB JSON file describing every package and file in the image.

$ ls -la /work/sbom.spdx.json
-rw-r--r-- 1 root root 1943149 May  9 11:34 /work/sbom.spdx.json

What's inside

Top-level structure of an SPDX 2.3 document. The header is a handful of scalar fields plus creationInfo; the payload is the four arrays at the bottom:

{
  "spdxVersion":       "SPDX-2.3",
  "dataLicense":       "CC0-1.0",
  "SPDXID":            "SPDXRef-DOCUMENT",
  "name":              "oci-registry:5000/ubuntu-curl",
  "documentNamespace": "https://anchore.com/syft/image/oci-registry-5000/ubuntu-curl-da9454a6-742f-497e-a5db-16ae9aa0b48f",

  "creationInfo": {
    "licenseListVersion": "3.28",
    "creators": [
      "Organization: Anchore, Inc",
      "Tool: syft-1.44.0"
    ],
    "created": "2026-05-09T11:34:11Z"
  },

  "packages":                   [ /* 102 entries */    ],
  "files":                      [ /* 2,290 entries */  ],
  "relationships":              [ /* 2,848 entries */  ],
  "hasExtractedLicensingInfos": [ /* custom licenses */ ]
}

Header fields (scalars + the creationInfo object) describe the document itself — covered in detail later under Document-level fields.

The four payload arrays are where the actual SBOM data lives:

Array	Count (our image)	What it holds
`packages`	102	Every package Syft identified — OS packages (deb), language-level deps (none in this image), and one entry for the image itself as the root
`files`	2,290	Every file Syft cataloged, each with name, multiple checksums, and an SPDXID. Present because `filesAnalyzed: true` on the packages.
`relationships`	2,848	Typed edges between SPDXIDs — `DESCRIBES`, `CONTAINS`, `DEPENDS_ON`, etc. This is what makes the document a graph rather than a flat list.
`hasExtractedLicensingInfos`	varies	Full text of any non-standard license (`LicenseRef-*`) referenced from `licenseDeclared` / `licenseConcluded`. Empty if every package uses a standard SPDX License List ID.

A few other arrays the spec defines that may or may not appear, depending on the producer and the input:

Array	When it shows up
`snippets`	Source-code analysis tools (FOSSology, ScanCode). Almost never in container-image SBOMs.
`annotations`	Document-level reviewer/tool comments. Optional.
`externalDocumentRefs`	When this SBOM references packages defined in another SBOM (e.g. an app SBOM pointing at a base-image SBOM). Optional.

So the honest summary is: SPDX 2.3 has four payload arrays you'll see in nearly every container-image SBOM (packages, files, relationships, hasExtractedLicensingInfos), plus three optional ones (snippets, annotations, externalDocumentRefs) that show up in specialised use cases.
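
A quick way to get the same overview for any SPDX 2.3 JSON file is to count the arrays yourself. The helper below is a sketch with an invented name; the inline document is a tiny stand-in, and loading the real file with `json.load(open("/work/sbom.spdx.json"))` is left as a comment because the path only exists inside the lab container.

```python
PAYLOAD_ARRAYS = ("packages", "files", "relationships", "hasExtractedLicensingInfos")
OPTIONAL_ARRAYS = ("snippets", "annotations", "externalDocumentRefs")


def summarize_spdx(doc):
    """Return the number of entries in each top-level SPDX array (0 if absent)."""
    return {key: len(doc.get(key, [])) for key in PAYLOAD_ARRAYS + OPTIONAL_ARRAYS}


# In the lab: doc = json.load(open("/work/sbom.spdx.json"))
tiny_doc = {
    "spdxVersion": "SPDX-2.3",
    "packages": [{"name": "apt"}, {"name": "bash"}],
    "files": [{"fileName": "/bin/bash"}],
    "relationships": [],
}
summary = summarize_spdx(tiny_doc)
```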

A sample package

{
  "name": "apt",
  "SPDXID": "SPDXRef-Package-deb-apt-5be364a4af57b701",
  "versionInfo": "2.4.14",
  "supplier": "NOASSERTION",
  "downloadLocation": "NOASSERTION",
  "filesAnalyzed": true,
  "packageVerificationCode": {
    "packageVerificationCodeValue": "e75a97363fdfe68c12c4bb109d55771cae4f3a3c"
  },
  "sourceInfo": "acquired package info from DPKG DB: /var/lib/dpkg/status, /usr/share/doc/apt/copyright, /var/lib/dpkg/info/apt.conffiles, ...",
  "licenseConcluded": "NOASSERTION",
  "licenseDeclared": "GPL-2.0-only AND LicenseRef-GPLv2-",
  "copyrightText": "NOASSERTION",
  "externalRefs": [
    {
      "referenceCategory": "SECURITY",
      "referenceType": "cpe23Type",
      "referenceLocator": "cpe:2.3:a:apt:apt:2.4.14:*:*:*:*:*:*:*"
    },
    {
      "referenceCategory": "PACKAGE-MANAGER",
      "referenceType": "purl",
      "referenceLocator": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04"
    }
  ]
}

Every field is doing meaningful work. Walking the package entry top-to-bottom:

Field	What it carries	Why it's there
`name`, `versionInfo`	Human-readable identity	Lets people read the SBOM
`SPDXID`	Document-internal ID (`SPDXRef-Package-deb-apt-5be364a4af57b701`)	Used as the source/target of `relationships` entries (see below). The hex suffix is a content hash so two builds emit stable IDs.
`supplier`	Person / Organization who supplies the package	Often `NOASSERTION` for OS packages where dpkg doesn't track this cleanly
`downloadLocation`	Where this version can be re-fetched	`NOASSERTION` if not known
`filesAnalyzed`	`true` if Syft enumerated the package's files	Determines whether `packageVerificationCode` is meaningful
`packageVerificationCode`	SHA-1 over the sorted list of file SHA-1s belonging to the package	Tamper-evident: if any file changes, this value changes. Reproducible across builds.
`sourceInfo`	Free-text trace of which files Syft read to learn about this package	Provenance for the SBOM itself — you can audit Syft's evidence trail
`licenseConcluded`	License concluded by analysis	What an analyst concluded after reading. `NOASSERTION` means "no claim". The two fields exist precisely because they can disagree.
`licenseDeclared`	License declared by the upstream packager (here, from `debian/copyright`)	What the project says it is
`copyrightText`	Copyright notice text	License-compliance use case
`externalRefs`	Cross-ecosystem identifiers (PURL, CPE, etc.)	The portable handles other tools use

externalRefs — not just PURL and CPE

The SPDX spec defines several referenceCategory values that you'll see in the wild:

`referenceCategory`	`referenceType` examples	Used for
`PACKAGE-MANAGER`	`purl`, `npm`, `maven-central`	Cross-ecosystem package handle (PURL is the universal one)
`SECURITY`	`cpe23Type`, `cpe22Type`, `advisory`, `fix`, `url`	Identifiers for vulnerability matching, plus links to advisories
`PERSISTENT-ID`	`swh` (Software Heritage), `gitoid`	Long-term archival identifiers — the package's source code by content-hash
`OTHER`	(anything)	Custom locator types tools have invented

packageVerificationCode — the integrity check

The SPDX spec defines packageVerificationCode as: take every file SPDX considers part of this package, compute SHA-1 of each, sort the hex strings, concatenate, and SHA-1 the result. The output depends only on file contents, so it is stable across machines, OSes, and time. Two builds that produce byte-identical files yield the same value; any tampering with any owned file changes it. Together with licenseDeclared and externalRefs, this is what makes SPDX records independently verifiable, not just descriptive.
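
The recipe is short enough to write out. This Python sketch follows the steps just described; the function name is made up, the inputs are raw file contents, and the spec's provision for excluding certain files from the calculation is deliberately left out.

```python
import hashlib


def package_verification_code(file_contents):
    """SHA-1 over the sorted, concatenated SHA-1 hex digests of the files."""
    file_sha1s = sorted(hashlib.sha1(data).hexdigest() for data in file_contents)
    return hashlib.sha1("".join(file_sha1s).encode("ascii")).hexdigest()


code = package_verification_code([b"#!/bin/sh\necho hi\n", b"config=1\n"])
```

Sorting the per-file digests first is what makes the result independent of the order in which files are listed.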

The `files` array — file-level evidence

With filesAnalyzed: true, the SPDX document contains a files array. In our SBOM that's 2,290 entries. A typical entry:

{
  "SPDXID": "SPDXRef-File-bin-bash-3a7f1c8b9e2d4f56",
  "fileName": "/bin/bash",
  "checksums": [
    { "algorithm": "SHA1",   "checksumValue": "a8c1b..." },
    { "algorithm": "SHA256", "checksumValue": "3f617f3..." },
    { "algorithm": "MD5",    "checksumValue": "44136fa..." }
  ],
  "licenseConcluded": "NOASSERTION",
  "copyrightText": "NOASSERTION"
}

Files get their own SPDXIDs because they are first-class nodes in the relationships graph.

The `relationships` array — where the graph lives

This is the part most people miss when they first read an SBOM. Our document has 2,848 relationship entries. Each is a typed edge between two SPDXIDs:

[
  {
    "spdxElementId":      "SPDXRef-DOCUMENT",
    "relationshipType":   "DESCRIBES",
    "relatedSpdxElement": "SPDXRef-Package-oci-ubuntu-curl-..."
  },
  {
    "spdxElementId":      "SPDXRef-Package-oci-ubuntu-curl-...",
    "relationshipType":   "CONTAINS",
    "relatedSpdxElement": "SPDXRef-Package-deb-apt-5be364a4af57b701"
  },
  {
    "spdxElementId":      "SPDXRef-Package-deb-apt-5be364a4af57b701",
    "relationshipType":   "CONTAINS",
    "relatedSpdxElement": "SPDXRef-File-usr-bin-apt-..."
  },
  {
    "spdxElementId":      "SPDXRef-Package-deb-apt-5be364a4af57b701",
    "relationshipType":   "DEPENDS_ON",
    "relatedSpdxElement": "SPDXRef-Package-deb-libapt-pkg6.0-..."
  }
]

The relationship types you'll see most often:

Type	Meaning
`DESCRIBES`	The document describes the target. Used once, from `SPDXRef-DOCUMENT` to the root package (here, the image).
`CONTAINS`	The source contains the target as a subcomponent. Image `CONTAINS` packages; package `CONTAINS` files.
`DEPENDS_ON`	The source depends on the target (typically a runtime or install-time requirement).
`BUILD_DEPENDENCY_OF`	The source is a build-time dependency of the target.
`PATCH_FOR`	The source is a patch for the target.
`STATIC_LINK` / `DYNAMIC_LINK`	The source links the target statically/dynamically.

This is what makes SPDX a graph format and not just a list. Queries like "which files belong to package X?" or "what happens if I remove package Y?" are all just graph traversals over the relationships array.

SPDXRef-DOCUMENT
     │ DESCRIBES
     ▼
SPDXRef-Package-oci-ubuntu-curl-...        ← the image (root)
     │ CONTAINS  (×101)
     ├──► SPDXRef-Package-deb-apt-...
     │        │ CONTAINS  (×N files)
     │        ├──► SPDXRef-File-usr-bin-apt-...
     │        └──► SPDXRef-File-etc-apt-apt.conf.d-...
     │        │ DEPENDS_ON
     │        └──► SPDXRef-Package-deb-libapt-pkg6.0-...
     ├──► SPDXRef-Package-deb-bash-...
     └──► SPDXRef-Package-deb-libc6-...
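
Those traversals are easy to prototype. The Python sketch below indexes a relationships array and answers two of the queries mentioned above. The helper names are invented, and the SPDXIDs in the sample are shortened, hypothetical stand-ins for the real hash-suffixed ones.

```python
from collections import defaultdict


def build_graph(relationships):
    """Index edges by (source SPDXID, relationship type)."""
    graph = defaultdict(list)
    for rel in relationships:
        key = (rel["spdxElementId"], rel["relationshipType"])
        graph[key].append(rel["relatedSpdxElement"])
    return graph


def targets(graph, source, rel_type):
    """Everything `source` points at via `rel_type` (e.g. files of a package)."""
    return graph.get((source, rel_type), [])


def dependents_of(relationships, target):
    """Packages that would be affected if `target` were removed."""
    return [r["spdxElementId"] for r in relationships
            if r["relationshipType"] == "DEPENDS_ON"
            and r["relatedSpdxElement"] == target]


# Hypothetical, shortened IDs for illustration only.
RELS = [
    {"spdxElementId": "SPDXRef-DOCUMENT", "relationshipType": "DESCRIBES",
     "relatedSpdxElement": "SPDXRef-Package-oci-image"},
    {"spdxElementId": "SPDXRef-Package-oci-image", "relationshipType": "CONTAINS",
     "relatedSpdxElement": "SPDXRef-Package-deb-apt"},
    {"spdxElementId": "SPDXRef-Package-deb-apt", "relationshipType": "CONTAINS",
     "relatedSpdxElement": "SPDXRef-File-usr-bin-apt"},
    {"spdxElementId": "SPDXRef-Package-deb-apt", "relationshipType": "DEPENDS_ON",
     "relatedSpdxElement": "SPDXRef-Package-deb-libapt"},
]
graph = build_graph(RELS)
```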

Spec versions — SPDX 2.3 vs SPDX 3.0

SPDX has gone through several major revisions. The two that matter today:

	SPDX 2.3	SPDX 3.0
Released	2022	2024
Status	Current de-facto standard	Released, slow adoption
Schema	Flat: top-level `packages`, `files`, `relationships` arrays	Element-graph: everything is an `Element`, relationships are first-class elements
Profile system	None	Modular profiles: Core, Software, Build, Security, AI, Dataset, Licensing, Lite
Tooling	Universal: every SBOM tool emits SPDX 2.3	Growing: spdx-tools, syft (preview)

Almost every SBOM you'll meet in the wild today is SPDX 2.3. The Syft output we generated above is SPDX 2.3. SPDX 3.0 is a clean break — fundamentally a different data model — and will probably take a few years to dominate. Knowing 2.3 deeply transfers most of the way to 3.0 once you learn the new element-graph vocabulary.

This post focuses on 2.3 because that's what's in production.

Serialization formats

The same SPDX 2.3 document can be serialized in several formats; the four you are most likely to meet are listed below. The on-the-wire bytes differ; the data is identical.

Format	File extension	Notes
JSON	`.spdx.json`	The dominant format. What `syft -o spdx-json` emits. JSON Schema available.
YAML	`.spdx.yaml`	Human-friendlier; less common
Tag-Value	`.spdx`	The original SPDX format (key:value text). Still emitted by some tools.
RDF/XML	`.spdx.rdf`	Semantic-web format. Rare in practice.

Tools like syft convert and the SPDX project's spdx-tools translate between formats. Use JSON for anything new: it's what virtually every tool reads, and it's the natural choice when you oras attach an SBOM to an image.

Document-level fields — what every SPDX document must have

The fields at the very top of an SPDX 2.3 JSON document are mandatory and have specific meanings. Going through ours:

{
  "spdxVersion":       "SPDX-2.3",
  "dataLicense":       "CC0-1.0",
  "SPDXID":            "SPDXRef-DOCUMENT",
  "name":              "oci-registry:5000/ubuntu-curl",
  "documentNamespace": "https://anchore.com/syft/image/oci-registry-5000/ubuntu-curl-da9454a6-742f-497e-a5db-16ae9aa0b48f",
  "creationInfo":      { ... }
}

Field	Why it must be exactly this
`spdxVersion`	The schema version. Parsers branch on this. Always starts with the `SPDX-` prefix.
`dataLicense`	The license of the SBOM document itself. SPDX 2.x mandates `CC0-1.0` so that SBOM data is freely shareable, regardless of the license of the software it describes.
`SPDXID`	The document's own ID. Must be exactly `SPDXRef-DOCUMENT`.
`name`	A human label for the document. By convention, the name of the thing being described.
`documentNamespace`	A globally unique URI for this document. Two regenerations of the same SBOM should have different namespaces (note the UUID in ours). It's how external documents reference each other unambiguously; see the `externalDocumentRefs` field in the spec.
`creationInfo`	Required metadata about the generation event.

Why the namespace matters: if Document A wants to reference a package defined in Document B, it points to <B's documentNamespace>#SPDXRef-Package-foo. The namespace is the anchor for cross-document references. Without it, SPDXIDs would only be unique within a single file.

`creationInfo` — the provenance of the SBOM itself

"creationInfo": {
  "licenseListVersion": "3.28",
  "creators": [
    "Organization: Anchore, Inc",
    "Tool: syft-1.44.0"
  ],
  "created": "2026-05-09T11:34:11Z",
  "comment": "..."
}

Field	Meaning
`created`	UTC timestamp when the SBOM was generated. Required.
`creators`	Array of who/what created it. Each entry must start with `Tool:`, `Organization:`, or `Person:`. Convention: tools list both the tool and the organization behind it.
`licenseListVersion`	Which version of the SPDX License List the document's license expressions were validated against. Important because the License List grows over time (3.28 has IDs that 3.20 didn't).
`comment`	Optional free text.

This is the SBOM's audit trail — by reading creationInfo you know what tool produced the document, when, and against which license vocabulary.
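
Reading that audit trail programmatically takes only a few lines. A minimal Python sketch, assuming the `created` timestamp uses the `YYYY-MM-DDThh:mm:ssZ` form shown above (the function name is illustrative):

```python
from datetime import datetime, timezone

CREATOR_KINDS = ("Tool", "Organization", "Person")


def parse_creation_info(info):
    """Return ([(kind, value), ...], created_datetime) from a creationInfo object."""
    creators = []
    for entry in info["creators"]:
        kind, sep, value = entry.partition(":")
        if not sep or kind not in CREATOR_KINDS:
            raise ValueError(f"unexpected creator entry: {entry!r}")
        creators.append((kind, value.strip()))
    created = datetime.strptime(info["created"], "%Y-%m-%dT%H:%M:%SZ")
    return creators, created.replace(tzinfo=timezone.utc)


creators, created = parse_creation_info({
    "licenseListVersion": "3.28",
    "creators": ["Organization: Anchore, Inc", "Tool: syft-1.44.0"],
    "created": "2026-05-09T11:34:11Z",
})
```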

License expressions — the small DSL inside `licenseConcluded` and `licenseDeclared`

The single most underestimated piece of SPDX is its license expression syntax. Every licenseConcluded, licenseDeclared, and file-level license field uses it.

The simplest expression is a single ID from the SPDX License List, which holds several hundred entries:

"licenseDeclared": "MIT"
"licenseDeclared": "Apache-2.0"
"licenseDeclared": "GPL-2.0-only"
"licenseDeclared": "GPL-2.0-or-later"

You can combine IDs with operators:

Operator	Meaning	Example
`AND`	Conjunction — you must comply with both licenses	`(MIT AND Apache-2.0)`
`OR`	Disjunction — you may comply with either	`(GPL-2.0-only OR Apache-2.0)`
`WITH`	License + an exception	`Apache-2.0 WITH LLVM-exception`
`+`	This version or any later	`LGPL-2.1+` (for GNU licenses, the `+` form is deprecated in favor of `-or-later` IDs)

Compound expressions are common in real SBOMs:

"licenseDeclared": "(MIT AND Apache-2.0) OR GPL-3.0-or-later"

That reads: "the recipient may comply with the conjunction (MIT and Apache-2.0), OR with GPL-3.0-or-later, at their choice."
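
To see how the operators nest, here is a small recursive-descent parser in Python. It is a sketch under stated assumptions: it binds `WITH` tightest, then `AND`, then `OR`, expects upper-case operators, returns nested tuples, and does not validate IDs against the License List or handle `DocumentRef-` prefixes. The function name is invented for this post.

```python
import re

TOKEN = re.compile(r"\(|\)|[A-Za-z0-9.+\-]+")
OPERATORS = ("AND", "OR", "WITH")


def parse_license_expression(expression):
    """Parse an SPDX license expression into nested (operator, left, right) tuples."""
    tokens = TOKEN.findall(expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return token

    def atom():
        token = take()
        if token == "(":
            node = or_expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if token == ")" or token in OPERATORS:
            raise ValueError(f"unexpected token {token!r}")
        return token                      # a license ID or LicenseRef-*

    def with_expr():
        node = atom()
        if peek() == "WITH":
            take()
            node = ("WITH", node, take())  # right side is an exception ID
        return node

    def and_expr():
        node = with_expr()
        while peek() == "AND":
            take()
            node = ("AND", node, with_expr())
        return node

    def or_expr():
        node = and_expr()
        while peek() == "OR":
            take()
            node = ("OR", node, and_expr())
        return node

    tree = or_expr()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return tree


tree = parse_license_expression("(MIT AND Apache-2.0) OR GPL-3.0-or-later")
```

The top-level node of `tree` is the `OR`, which matches the reading above: the recipient picks one branch.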

Three special tokens: `NONE`, `NOASSERTION`, `LicenseRef-*`

"licenseConcluded": "NONE"          ← a positive claim: no license is available for this item
"licenseConcluded": "NOASSERTION"   ← the analyst makes no claim about the license
"licenseDeclared":  "LicenseRef-Vendor-EULA-2024"  ← a custom license defined elsewhere in the document

NONE and NOASSERTION are not the same thing. NONE is a positive claim ("no license applies"); NOASSERTION is a refusal to claim ("I don't know / I won't say"). Tools that auto-generate SBOMs default to NOASSERTION when the license is ambiguous — which, for OS packages, it usually is.

LicenseRef-* IDs let you reference a license that isn't on the SPDX License List. Their text is then provided in...

`hasExtractedLicensingInfos` — custom license texts

If your licenseDeclared includes a LicenseRef-Foo, you must also include a hasExtractedLicensingInfos entry with the actual license text:

"hasExtractedLicensingInfos": [
  {
    "licenseId":      "LicenseRef-GPLv2-",
    "name":           "GPLv2 (Debian-modified header)",
    "extractedText":  "                    GNU GENERAL PUBLIC LICENSE\n                       Version 2, June 1991\n\n Copyright (C) 1989, 1991 Free Software Foundation, Inc. ...",
    "comment":        "Found in /usr/share/doc/apt/copyright"
  }
]

This is what makes SPDX legally usable: even if a package ships under some bespoke vendor license, the SBOM carries the full text of that license alongside the reference. License-compliance auditors can review the SBOM as a self-contained legal artifact.
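
That requirement is easy to check mechanically. The following Python sketch (names are illustrative) collects every `LicenseRef-*` used in package and file license fields and reports the ones with no matching `hasExtractedLicensingInfos` entry:

```python
import re

LICENSE_REF = re.compile(r"LicenseRef-[A-Za-z0-9.\-]+")
LICENSE_FIELDS = ("licenseDeclared", "licenseConcluded")


def undefined_license_refs(doc):
    """Return LicenseRef-* IDs that are used but have no extracted text."""
    defined = {entry["licenseId"]
               for entry in doc.get("hasExtractedLicensingInfos", [])}
    used = set()
    for element in doc.get("packages", []) + doc.get("files", []):
        for field in LICENSE_FIELDS:
            used.update(LICENSE_REF.findall(element.get(field, "")))
    return used - defined


sample_doc = {
    "packages": [
        {"name": "apt", "licenseDeclared": "GPL-2.0-only AND LicenseRef-GPLv2-"},
        {"name": "vendor-tool", "licenseDeclared": "LicenseRef-Vendor-EULA-2024"},
    ],
    "hasExtractedLicensingInfos": [
        {"licenseId": "LicenseRef-GPLv2-", "extractedText": "..."},
    ],
}
missing = undefined_license_refs(sample_doc)
```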

More package fields you'll meet

The package table earlier covered the most common fields. SPDX 2.3 defines a few more that show up regularly:

Field	What it carries
`originator`	The entity that created the package (vs. `supplier`, who delivered it). For Debian's `apt` package: originator is the upstream apt project; supplier is Ubuntu.
`primaryPackagePurpose`	One of: `APPLICATION`, `FRAMEWORK`, `LIBRARY`, `CONTAINER`, `OPERATING-SYSTEM`, `DEVICE`, `FIRMWARE`, `SOURCE`, `ARCHIVE`, `FILE`, `INSTALL`, `OTHER`. Lets consumers filter (e.g. "show me all the FIRMWARE entries").
`releaseDate`	When upstream released this version (ISO-8601 UTC).
`builtDate`	When the binary in this image was built.
`validUntilDate`	Vendor-declared end-of-support date. Useful for "are we shipping anything past EOL?" queries.
`homepage`	Project website URL.
`attributionTexts`	Required attribution notices (BSD-style "this product includes...") that must appear in derivative work documentation.
`summary`, `description`	Short and long human-readable descriptions of the package.
`comment`	Free text from the SBOM author.
`annotations`	See next subsection.

`annotations` — reviewer or tool commentary

Both packages and the document itself can carry annotations: dated, signed-off comments. This is how an analyst leaves a note on a finding without modifying any other field:

"annotations": [
  {
    "annotationDate":   "2026-05-09T11:35:00Z",
    "annotationType":   "REVIEW",
    "annotator":        "Person: Sandeep Choudary",
    "comment":          "Verified license claim against /usr/share/doc/apt/copyright on 2026-05-09."
  }
]

annotationType is one of REVIEW, OTHER. The lightweight design lets compliance workflows attach evidence to specific package entries.

Snippets — sub-file granularity

Sometimes a single source file mixes code under different licenses. SPDX has snippets for that:

"snippets": [
  {
    "SPDXID":            "SPDXRef-Snippet-libfoo-bsd-fragment",
    "snippetFromFile":   "SPDXRef-File-src-libfoo-merged.c",
    "ranges": [
      { "startPointer": { "offset": 1024, "reference": "SPDXRef-File-src-libfoo-merged.c" }, "endPointer": { "offset": 4096, "reference": "SPDXRef-File-src-libfoo-merged.c" } }
    ],
    "licenseConcluded":  "BSD-3-Clause",
    "copyrightText":     "Copyright (c) 2018 Original Author"
  }
]

Container-image SBOMs almost never use snippets — they're a source-code-analysis feature. But if you read SBOMs from compliance tools like FOSSology you'll meet them.

Full relationship type list

The earlier table showed the six most common relationship types. SPDX 2.3 defines more than 40. The full set falls into rough categories:

Composition:        CONTAINS, CONTAINED_BY, DESCRIBES, DESCRIBED_BY,
                    PACKAGE_OF, HAS_PREREQUISITE, PREREQUISITE_FOR

Dependency:         DEPENDS_ON, DEPENDENCY_OF,
                    DEPENDENCY_MANIFEST_OF, DEV_DEPENDENCY_OF,
                    OPTIONAL_DEPENDENCY_OF, BUILD_DEPENDENCY_OF,
                    RUNTIME_DEPENDENCY_OF, TEST_DEPENDENCY_OF,
                    PROVIDED_DEPENDENCY_OF

Build & source:     GENERATED_FROM, GENERATES, BUILD_TOOL_OF,
                    DEV_TOOL_OF, TEST_TOOL_OF, OPTIONAL_COMPONENT_OF

Linkage:            STATIC_LINK, DYNAMIC_LINK

Lifecycle:          PATCH_FOR, PATCH_APPLIED, COPY_OF,
                    ANCESTOR_OF, DESCENDANT_OF, VARIANT_OF

Distribution:       DISTRIBUTION_ARTIFACT, METAFILE_OF, DATA_FILE_OF,
                    DOCUMENTATION_OF, EXAMPLE_OF, TEST_OF, TEST_CASE_OF,
                    EXPANDED_FROM_ARCHIVE, FILE_ADDED, FILE_DELETED, FILE_MODIFIED

Other:              SPECIFICATION_FOR, REQUIREMENT_DESCRIPTION_FOR, OTHER, AMENDS

Note the inverse pairs (CONTAINS / CONTAINED_BY, DEPENDS_ON / DEPENDENCY_OF). SPDX lets you express the same edge from either direction; producers usually pick one direction and stay consistent.
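
Because the same edge can arrive in either direction, a consumer that merges SBOMs from several producers may want to normalise them first. Here is a minimal sketch, assuming the SPDX 2.3 JSON field names `spdxElementId`, `relationshipType`, and `relatedSpdxElement`; the `INVERSE` table covers only a handful of pairs from the list above and the function name is made up for this post.

```python
# Inverse pairs taken from the list above; only a subset is shown here.
INVERSE = {
    "CONTAINED_BY": "CONTAINS",
    "DEPENDENCY_OF": "DEPENDS_ON",
    "DESCRIBED_BY": "DESCRIBES",
    "PREREQUISITE_FOR": "HAS_PREREQUISITE",
    "GENERATED_FROM": "GENERATES",
}


def normalize(relationships):
    """Rewrite inverse-direction edges so each pair uses one canonical type."""
    out = []
    for rel in relationships:
        rtype = rel["relationshipType"]
        if rtype in INVERSE:
            # "A DEPENDENCY_OF B" says the same thing as "B DEPENDS_ON A",
            # so swap the endpoints along with the type.
            out.append({
                "spdxElementId": rel["relatedSpdxElement"],
                "relationshipType": INVERSE[rtype],
                "relatedSpdxElement": rel["spdxElementId"],
            })
        else:
            out.append(dict(rel))
    return out


edges = [{"spdxElementId": "SPDXRef-libc6",
          "relationshipType": "DEPENDENCY_OF",
          "relatedSpdxElement": "SPDXRef-apt"}]
canonical = normalize(edges)
```

After normalisation every dependency edge reads "X DEPENDS_ON Y", which makes graph traversal code a lot simpler.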

Inter-document references

A single SBOM can reference packages defined in another SBOM. This is how ecosystems share license analyses without duplicating data:

"externalDocumentRefs": [
  {
    "externalDocumentId": "DocumentRef-ubuntu-base",
    "spdxDocument":       "https://ubuntu.com/sboms/22.04-base/spdx-2.3.json",
    "checksum": {
      "algorithm":     "SHA256",
      "checksumValue": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
    }
  }
]

Then anywhere in this document you can refer to DocumentRef-ubuntu-base:SPDXRef-Package-libc6 and consumers know exactly which libc6 you mean.
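
To make the lookup concrete, here is a small sketch of how a consumer might resolve such a reference. The function name `resolve_ref` is hypothetical; the only thing it relies on is the `DocumentRef-<id>:SPDXRef-<id>` shape and the `externalDocumentRefs` entries shown above.

```python
def resolve_ref(element_ref, external_document_refs):
    """Split an element reference into (document URI or None, local SPDXID)."""
    if ":" not in element_ref:
        return None, element_ref          # element lives in this document
    doc_id, spdx_id = element_ref.split(":", 1)
    for ext in external_document_refs:
        if ext["externalDocumentId"] == doc_id:
            return ext["spdxDocument"], spdx_id
    raise KeyError(f"undeclared external document {doc_id!r}")


refs = [{
    "externalDocumentId": "DocumentRef-ubuntu-base",
    "spdxDocument": "https://ubuntu.com/sboms/22.04-base/spdx-2.3.json",
}]
external = resolve_ref("DocumentRef-ubuntu-base:SPDXRef-Package-libc6", refs)
local = resolve_ref("SPDXRef-Package-apt", refs)
```

A real consumer would then fetch the external document and check it against the recorded checksum before trusting anything inside it.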

Validating an SPDX document

The official validation tools:

spdx-tools (Java) — the reference implementation. Runs schema validation, license-expression validation, and reference-integrity checks (every SPDXID referenced in relationships must exist).
pyspdxtools (Python) — official Python implementation; same checks.
JSON Schema at https://github.com/spdx/spdx-spec/blob/master/schemas/spdx-schema.json — plug into any JSON Schema validator (ajv, jsonschema, IDE plugins).
Online validator: https://tools.spdx.org/app/validate/

A well-formed SPDX document should pass all three: JSON Schema, the tooling reference checks, and license-expression validation against the License List.
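
The reference-integrity check is the easiest of the three to picture. The sketch below is not a substitute for the official tools; it only illustrates the idea on a parsed SPDX 2.3 JSON document, and `dangling_references` is a name invented for this post.

```python
def dangling_references(doc):
    """Return relationship endpoints that do not resolve to a known SPDXID."""
    known = {doc.get("SPDXID", "SPDXRef-DOCUMENT")}
    for section in ("packages", "files", "snippets"):
        known.update(item["SPDXID"] for item in doc.get(section, []))
    missing = []
    for rel in doc.get("relationships", []):
        for key in ("spdxElementId", "relatedSpdxElement"):
            ref = rel[key]
            # NONE / NOASSERTION are valid placeholders, and DocumentRef-*
            # points into another document, so neither is checked here.
            if ref in ("NONE", "NOASSERTION") or ref.startswith("DocumentRef-"):
                continue
            if ref not in known:
                missing.append(ref)
    return missing


doc = {
    "SPDXID": "SPDXRef-DOCUMENT",
    "packages": [{"SPDXID": "SPDXRef-Package-apt"}],
    "relationships": [
        {"spdxElementId": "SPDXRef-DOCUMENT", "relationshipType": "DESCRIBES",
         "relatedSpdxElement": "SPDXRef-Package-apt"},
        {"spdxElementId": "SPDXRef-Package-apt", "relationshipType": "CONTAINS",
         "relatedSpdxElement": "SPDXRef-File-missing"},
    ],
}
problems = dangling_references(doc)
```

An empty result means every edge in the graph lands on a node that actually exists in the document.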

Headline difference, in one sentence

SPDX is a graph of typed elements with rich licensing semantics — packages, files, snippets, and 40+ relationship types — designed first for license compliance and later extended to inventory and security. Everything else (CycloneDX, syft-json, etc.) is some compression of that idea.

Step 2: Generate a CycloneDX SBOM with Trivy

docker exec -w /work oci-lab bash -c '
  TRIVY_INSECURE=true trivy image \
    --format cyclonedx \
    --output /work/sbom.cdx.json \
    oci-registry:5000/ubuntu-curl:v1
'

2026-05-09T11:34:25Z    INFO    Detected OS  family="ubuntu" version="22.04"
2026-05-09T11:34:25Z    INFO    Number of language-specific files    num=0

$ wc -c /work/sbom.cdx.json
209683 /work/sbom.cdx.json

Trivy's CycloneDX is much smaller (210 KB vs 1.9 MB) because it doesn't catalog individual files — only packages.

What's inside

{
  "$schema": "http://cyclonedx.org/schema/bom-1.6.schema.json",
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "serialNumber": "urn:uuid:cb37c624-b4ca-4281-aec9-8ded8176714f",
  "version": 1,
  "metadata":         { ... },           // document-level info + the root component
  "components":       [ ... ],           // 102 packages
  "dependencies":     [ ... ],           // adjacency list of the dependency graph
  "vulnerabilities":  [ ... ]            // optional, populated only when --scanners vuln is set
}

Top-level fields

Field	Purpose
`bomFormat`	Always `"CycloneDX"`; lets a parser identify the format by checking a single key
`specVersion`	The CycloneDX spec version this document targets (`1.4`, `1.5`, `1.6`...)
`serialNumber`	A `urn:uuid:` per document. Two regenerations get different serial numbers even when their content is otherwise equivalent.
`version`	Document revision counter. Bump when you re-publish a corrected SBOM for the same image.
`metadata`	Document-level metadata: `timestamp`, `tools` (what generated it), and `metadata.component` (see below)
`components`	Flat list of every package, library, file, container, OS, framework, etc. found in the target
`dependencies`	Adjacency list: which `bom-ref` depends on which
`vulnerabilities`	Optional: same vulnerability records you'd find in a Trivy report
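
Those marker fields make format sniffing trivial. A minimal sketch, assuming the document has already been parsed from JSON; `sbom_format` is a hypothetical helper, and `spdxVersion` is the corresponding top-level marker in SPDX 2.x JSON.

```python
def sbom_format(doc):
    """Return (format, version) for a parsed SBOM document."""
    if doc.get("bomFormat") == "CycloneDX":
        return "cyclonedx", doc.get("specVersion")
    if "spdxVersion" in doc:
        return "spdx", doc["spdxVersion"]      # e.g. "SPDX-2.3"
    return "unknown", None


cdx = {"bomFormat": "CycloneDX", "specVersion": "1.6", "version": 1}
spdx = {"spdxVersion": "SPDX-2.3", "SPDXID": "SPDXRef-DOCUMENT"}
```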

`metadata.component` — the root

This is the CycloneDX equivalent of SPDX's DESCRIBES relationship. It says "this BOM is about the following thing":

"metadata": {
  "timestamp": "2026-05-09T11:34:25Z",
  "tools": {
    "components": [
      { "type": "application", "name": "trivy", "version": "0.70.0" }
    ]
  },
  "component": {
    "bom-ref": "oci-registry:5000/ubuntu-curl@sha256:0124b538...",
    "type": "container",
    "name": "oci-registry:5000/ubuntu-curl",
    "version": "sha256:0124b538...",
    "purl": "pkg:oci/ubuntu-curl@sha256:0124b538...?repository_url=oci-registry%3A5000%2Fubuntu-curl"
  }
}

The metadata.component.type here is container. Other valid types include application, framework, library, operating-system, device, firmware, and file. Trivy emits container; Syft also emits container when serialising to CycloneDX.

The `components` array

A flat list. Order doesn't matter — the structure is in dependencies, not in nesting. A typical entry:

{
  "bom-ref": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04",
  "type": "library",
  "name": "apt",
  "version": "2.4.14",
  "purl": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04",
  "licenses": [
    { "license": { "name": "GPL-2.0-only" } }
  ],
  "properties": [
    { "name": "aquasecurity:trivy:LayerDigest", "value": "sha256:6edbc812af48..." },
    { "name": "aquasecurity:trivy:PkgID",      "value": "apt@2.4.14" },
    { "name": "aquasecurity:trivy:PkgType",    "value": "ubuntu" }
  ]
}

Things to notice:

bom-ref is the in-document identifier. CycloneDX lets you choose anything unique; Trivy conventionally uses the PURL itself and Syft uses the PURL plus a package-id qualifier, which means refs are stable across regenerations.
type classifies what the component is. library is the catch-all for packages; application, framework, operating-system, firmware, file, and container are among the others.
purl is duplicated outside bom-ref so consumers that key on purl don't have to re-parse the ref.
properties is the format's open-ended escape hatch. Tools embed namespaced key-value pairs (aquasecurity:trivy:*, syft:*) for tool-specific metadata that doesn't fit the schema.

The `dependencies` array — the graph as adjacency list

This is the structural counterpart to SPDX's relationships. It lists, for each bom-ref, the bom-refs it depends on:

"dependencies": [
  {
    "ref": "oci-registry:5000/ubuntu-curl@sha256:0124b538...",
    "dependsOn": [
      "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04",
      "pkg:deb/ubuntu/bash@5.1-6ubuntu1.1?arch=arm64&distro=ubuntu-22.04",
      "pkg:deb/ubuntu/libc6@2.35-0ubuntu3.4?arch=arm64&distro=ubuntu-22.04"
    ]
  },
  {
    "ref": "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04",
    "dependsOn": [
      "pkg:deb/ubuntu/libapt-pkg6.0@2.4.14?arch=arm64&distro=ubuntu-22.04",
      "pkg:deb/ubuntu/libc6@2.35-0ubuntu3.4?arch=arm64&distro=ubuntu-22.04"
    ]
  }
]

CycloneDX does not distinguish runtime vs build-time vs static-link dependencies in the core dependencies schema: each entry carries a ref and a dependsOn list (CycloneDX 1.6 also adds an optional provides list), with no per-edge type. For most container-image SBOMs that's fine, because everything is a runtime dependency once the image is built.
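
Since the graph is a plain adjacency list, walking it takes a few lines. The sketch below computes everything reachable from one bom-ref; the function name and the shortened refs in the sample data are illustrative assumptions.

```python
def transitive_deps(dependencies, root):
    """Return every bom-ref reachable from `root` via dependsOn edges."""
    graph = {d["ref"]: d.get("dependsOn", []) for d in dependencies}
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, []):
            if dep not in seen:        # also guards against cycles
                seen.add(dep)
                stack.append(dep)
    return seen


# Shortened refs for readability; real refs are the full PURLs shown above.
deps = [
    {"ref": "image", "dependsOn": ["apt", "bash"]},
    {"ref": "apt", "dependsOn": ["libapt-pkg6.0", "libc6"]},
    {"ref": "libc6", "dependsOn": []},
]
reachable = transitive_deps(deps, "image")
```

Refs that never appear as a `ref` entry (like `bash` here) are simply treated as leaves.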

Same data, different shape — SPDX vs CycloneDX side by side

For the same apt package:

Concern	SPDX	CycloneDX
Internal ID	`SPDXRef-Package-deb-apt-5be364a4af57b701`	`bom-ref: "pkg:deb/ubuntu/apt@2.4.14?..."`
Cross-ecosystem ID	`externalRefs` array (PURL + CPE)	`purl` field (CPE optional, in `cpe` field)
Files owned	`relationships[CONTAINS]` from package to file SPDXIDs	Not represented (Trivy CycloneDX); file-level requires extra components of `type: "file"`
Integrity	`packageVerificationCode` (SHA-1 over file hashes)	`hashes` array per component (less common in OS-package SBOMs)
License	`licenseConcluded` + `licenseDeclared` (separate fields)	`licenses` array (single notion)
Graph edges	`relationships` array (typed: `CONTAINS`, `DEPENDS_ON`, `STATIC_LINK`, ...)	`dependencies` array (single edge type)
Tool metadata	`creationInfo.creators`	`metadata.tools.components`
Document root	`SPDXRef-DOCUMENT` + `DESCRIBES` relationship	`metadata.component`

The two formats describe the same reality. SPDX is more granular (separate concluded vs declared license, file-level relationships, multiple typed edges); CycloneDX is more compact and easier to round-trip programmatically. Most modern tools speak both, and you can convert OS-level SBOMs between them with cyclonedx-cli convert or syft convert, though details with no counterpart in the target format (typed edges, file-level relationships) may not survive the trip.
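
One practical consequence of the table: a consumer that only cares about package identity can reduce both formats to the same set of PURLs. A minimal sketch, assuming parsed JSON input and the SPDX 2.3 `externalRefs` field names (`referenceType`, `referenceLocator`); the function name `purls` is made up here.

```python
def purls(doc):
    """Collect package PURLs from either a CycloneDX or an SPDX 2.x document."""
    if doc.get("bomFormat") == "CycloneDX":
        return {c["purl"] for c in doc.get("components", []) if "purl" in c}
    found = set()
    for pkg in doc.get("packages", []):
        for ref in pkg.get("externalRefs", []):
            if ref.get("referenceType") == "purl":
                found.add(ref["referenceLocator"])
    return found


APT = "pkg:deb/ubuntu/apt@2.4.14?arch=arm64&distro=ubuntu-22.04"
cdx = {"bomFormat": "CycloneDX", "components": [{"name": "apt", "purl": APT}]}
spdx = {"spdxVersion": "SPDX-2.3", "packages": [{
    "name": "apt",
    "externalRefs": [
        {"referenceCategory": "PACKAGE-MANAGER", "referenceType": "purl",
         "referenceLocator": APT},
    ],
}]}
```

If two tools describe the same image, the two sets should largely agree; the differences are where the catalogers disagree.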

Aside: One Downstream Use — Vulnerability Scanning

The SBOM by itself is just an inventory. The most common downstream use is feeding the PURL/CPE list into a vulnerability database to discover CVEs. We won't dwell on it (this post is about SBOMs, not scanners), but here's the one-paragraph version of how the pipeline works:

SBOM (PURL/CPE per package) ──► lookup in vuln DB (NVD, OSV.dev, distro tracker)
                              ──► match version against affected ranges
                              ──► report CVEs + severity + fix version

For reference, running trivy image on our ubuntu-curl:v1 reports 35 CVEs (18 LOW, 17 MEDIUM, 0 HIGH/CRITICAL). A sample finding:

{
  "VulnerabilityID": "CVE-2026-27456",
  "PkgName": "bsdutils",
  "InstalledVersion": "1:2.37.2-4ubuntu3.5",
  "FixedVersion": null,
  "Severity": "MEDIUM",
  "Title": "util-linux: TOCTOU in the mount program when setting up loop devices"
}

Note the matching key: "PkgName": "bsdutils", "InstalledVersion": "1:2.37.2-4ubuntu3.5" — that's the same name+versionInfo we saw in the SBOM. The scanner looked up the package's PURL in its database and got back a CVE list. The SBOM was the input, the database was the lookup table, the CVE report was the output. Once you have an SBOM, this scan can run anywhere — you don't need access to the original image.
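
To show the shape of that lookup, here is a deliberately naive sketch. Everything in `ADVISORIES` is made-up illustration data (including the `EXAMPLE-0001` ID), and the match is an exact version comparison; real scanners compare against affected ranges using each ecosystem's version ordering, which this sketch does not attempt.

```python
from urllib.parse import unquote

# Hypothetical advisory table, keyed by versionless PURL. The ID and the
# affected-version set are invented for illustration.
ADVISORIES = {
    "pkg:deb/ubuntu/bsdutils": [("EXAMPLE-0001", {"1:2.37.2-4ubuntu3.5"})],
}


def split_purl(purl):
    """Return (versionless purl, decoded version), ignoring qualifiers."""
    base = purl.split("?", 1)[0].split("#", 1)[0]
    name, sep, version = base.rpartition("@")
    if not sep:
        return base, ""
    return name, unquote(version)


def match(purls, advisories=ADVISORIES):
    """Yield (purl, advisory id) for every exact version hit."""
    for purl in purls:
        name, version = split_purl(purl)
        for advisory_id, affected in advisories.get(name, []):
            if version in affected:
                yield purl, advisory_id


inventory = [
    "pkg:deb/ubuntu/bsdutils@1%3A2.37.2-4ubuntu3.5?arch=arm64",
    "pkg:deb/ubuntu/apt@2.4.14?arch=arm64",
]
hits = list(match(inventory))
```

Note that nothing here touches the image: the inventory list is all the matcher needs.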

Step 3: Attach the SBOM to the Image as an OCI Artifact

Now the interesting part: we attach the SBOM to the registry next to the image, using the OCI 1.1 subject + Referrers mechanism we explored in Part 4.

What `oras attach` does

docker exec -w /work oci-lab bash -c '
  oras attach --plain-http \
    --artifact-type application/spdx+json \
    oci-registry:5000/ubuntu-curl:v1 \
    sbom.spdx.json:application/spdx+json
'

Uploading 28164ea0c196 sbom.spdx.json
Uploaded  28164ea0c196 sbom.spdx.json
Attached to [registry] oci-registry:5000/ubuntu-curl@sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8
Digest: sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4

Three things happened:

The SBOM JSON file was pushed as a blob (sha256:28164ea0..., 1.9 MB).
An OCI manifest was created describing it (sha256:f50bb644...).
The manifest's subject field points to our image manifest.

Let's also attach the CycloneDX one:

docker exec -w /work oci-lab bash -c '
  oras attach --plain-http \
    --artifact-type application/vnd.cyclonedx+json \
    oci-registry:5000/ubuntu-curl:v1 \
    sbom.cdx.json:application/vnd.cyclonedx+json
'

Uploading fb4cd9377fac sbom.cdx.json
Uploaded  fb4cd9377fac sbom.cdx.json
Attached to [registry] oci-registry:5000/ubuntu-curl@sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8
Digest: sha256:39519e85a6346ada4c89cfe66837694f669c07843ecfd81b36d5cc67fe809499

The SBOM Manifest — Look at What Was Just Pushed

docker exec oci-lab bash -c '
  curl -s "http://oci-registry:5000/v2/ubuntu-curl/manifests/sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4" \
    -H "Accept: application/vnd.oci.image.manifest.v1+json" | jq .
'

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/spdx+json",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2,
    "data": "e30="
  },
  "layers": [
    {
      "mediaType": "application/spdx+json",
      "digest": "sha256:28164ea0c19614bfc106fea2ae6107dce12a1e79dfb1a361047b3264915e5195",
      "size": 1943149,
      "annotations": {
        "org.opencontainers.image.title": "sbom.spdx.json"
      }
    }
  ],
  "subject": {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "digest": "sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8",
    "size": 424
  },
  "annotations": {
    "org.opencontainers.image.created": "2026-05-09T11:36:17Z"
  }
}

This is the same shape as a Notation signature manifest. The only differences:

Field	Notation Signature	SBOM (this manifest)
`artifactType`	`application/vnd.cncf.notary.signature`	`application/spdx+json`
`config.mediaType`	`application/vnd.cncf.notary.signature`	`application/vnd.oci.empty.v1+json`
`layers[0].mediaType`	`application/cose`	`application/spdx+json`
`subject`	image manifest digest	image manifest digest (same!)

The subject field works identically. The OCI registry doesn't care that one is a signature and the other is an SBOM — both are just manifests with a subject.

The empty config ({}, 2 bytes, mediaType application/vnd.oci.empty.v1+json) is the OCI-spec-blessed "I have no config" placeholder. Notice the data: "e30=" field — that's {} base64-encoded inlined directly into the manifest, so even fetching the config blob is optional.
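
You can check all three numbers for the empty config yourself. This short Python snippet recomputes the digest, size, and inline data of the two-byte `{}` blob using only the standard library:

```python
import base64
import hashlib

EMPTY_CONFIG = b"{}"

digest = "sha256:" + hashlib.sha256(EMPTY_CONFIG).hexdigest()
size = len(EMPTY_CONFIG)
inline = base64.b64encode(EMPTY_CONFIG).decode("ascii")

print(digest)   # matches config.digest in the manifest above
print(size)     # matches config.size
print(inline)   # matches config.data
```

Because the bytes are always the same, every artifact that uses the empty config shares one blob in the registry.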

Step 4: Discover Attached Artifacts via the Referrers Mechanism

Method 1 — `oras discover` (the convenient way)

docker exec oci-lab oras discover --plain-http oci-registry:5000/ubuntu-curl:v1 --format tree

oci-registry:5000/ubuntu-curl@sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8
├── application/spdx+json
│   └── sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4
└── application/vnd.cyclonedx+json
    └── sha256:39519e85a6346ada4c89cfe66837694f669c07843ecfd81b36d5cc67fe809499

Both SBOMs are now discoverable, grouped by artifactType.

Method 2 — Raw HTTP via the Referrers tag fallback

registry:2 doesn't support the OCI 1.1 Referrers API natively, so oras, notation, and trivy all use the tag-based fallback: a tag named sha256-<hex> whose content is an OCI Image Index listing all referrers.

docker exec oci-lab bash -c '
  curl -s http://oci-registry:5000/v2/ubuntu-curl/tags/list | jq .
'

{
  "name": "ubuntu-curl",
  "tags": [
    "sha256-0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8",
    "v1"
  ]
}

Notice the sha256-... tag — that's the referrer index, named after our image's manifest digest with : replaced by -.
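
The naming rule is simple enough to write down. A minimal sketch of the substitution described here (the function name is an assumption, and it ignores any length limits a spec-complete client would also apply):

```python
def referrers_tag(digest):
    """Turn 'sha256:<hex>' into the fallback tag 'sha256-<hex>'."""
    algorithm, sep, encoded = digest.partition(":")
    if not sep or not algorithm or not encoded:
        raise ValueError(f"not a digest: {digest!r}")
    return f"{algorithm}-{encoded}"


tag = referrers_tag(
    "sha256:0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8"
)
```

The substitution is needed because `:` is not a legal character in a tag name.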

docker exec oci-lab bash -c '
  curl -s "http://oci-registry:5000/v2/ubuntu-curl/manifests/sha256-0124b5388c7c05576c0cedeab121a7c590b0ec16b4238e6b997ad9d57ccdefd8" \
    -H "Accept: application/vnd.oci.image.index.v1+json" | jq .
'

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4",
      "size": 730,
      "artifactType": "application/spdx+json",
      "annotations": {
        "org.opencontainers.image.created": "2026-05-09T11:36:17Z"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:39519e85a6346ada4c89cfe66837694f669c07843ecfd81b36d5cc67fe809499",
      "size": 746,
      "artifactType": "application/vnd.cyclonedx+json",
      "annotations": {
        "org.opencontainers.image.created": "2026-05-09T11:36:17Z"
      }
    }
  ]
}

The artifactType field on each descriptor lets clients filter: "Give me the SPDX one only" or "Give me everything signature-related". Notation signatures, SBOMs, vulnerability scans, and SLSA attestations all live side by side under the same parent image.
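
With the tag fallback, that filtering happens on the client. A small sketch over a parsed index like the one above (digests shortened for readability; `filter_referrers` is a hypothetical helper):

```python
def filter_referrers(index, artifact_type):
    """Keep only the descriptors whose artifactType matches."""
    return [m for m in index.get("manifests", [])
            if m.get("artifactType") == artifact_type]


index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:f50bb644...", "artifactType": "application/spdx+json"},
        {"digest": "sha256:39519e85...", "artifactType": "application/vnd.cyclonedx+json"},
    ],
}
spdx_only = filter_referrers(index, "application/spdx+json")
```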

Method 3 — The Referrers API (when supported)

On a registry that supports OCI 1.1 natively (Zot, Harbor 2.9+, GHCR, ECR, ACR, Docker Hub):

GET /v2/ubuntu-curl/referrers/sha256:0124b538...
Accept: application/vnd.oci.image.index.v1+json
→ 200 OK
→ Body: <same Image Index as above, computed dynamically by the registry>

Optionally filter by artifact type:

GET /v2/ubuntu-curl/referrers/sha256:0124b538...?artifactType=application/spdx+json
→ 200 OK
→ OCI-Filters-Applied: artifactType
→ Body: <Image Index containing only SPDX referrers>

Clients try the API first and fall back to the tag if it returns 404.
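
The client-side logic can be sketched in a few lines. This is an illustration of the try-then-fall-back flow, not how oras implements it: `http_get(url) -> (status, body)` is a hypothetical injected transport so the example stays independent of any HTTP library, and request headers such as Accept are left out.

```python
import json
from urllib.parse import quote


def list_referrers(http_get, registry, repo, digest, artifact_type=None):
    """Return referrer descriptors, trying the Referrers API before the tag."""
    base = f"{registry}/v2/{repo}"
    url = f"{base}/referrers/{digest}"
    if artifact_type:
        # Percent-encode so '+' and '/' survive the query string.
        url += "?artifactType=" + quote(artifact_type, safe="")
    status, body = http_get(url)
    if status == 404:
        # Tag fallback: the same index, stored under <alg>-<hex>.
        status, body = http_get(f"{base}/manifests/{digest.replace(':', '-')}")
        if status == 404:
            return []                 # nothing has been attached
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status}")
    manifests = json.loads(body).get("manifests", [])
    # The fallback tag returns everything, so filter again on the client.
    if artifact_type:
        manifests = [m for m in manifests
                     if m.get("artifactType") == artifact_type]
    return manifests
```

Passing the transport in as a function also makes the fallback path easy to exercise with a fake registry in unit tests.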

Step 5: Pull the SBOM Back

A consumer (CI pipeline, security scanner, admission controller) can pull the SBOM by the digest of its artifact manifest:

docker exec oci-lab bash -c '
  mkdir -p /tmp/pulled-sboms && cd /tmp/pulled-sboms
  oras pull --plain-http \
    oci-registry:5000/ubuntu-curl@sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4
  ls -la
'

Downloaded  28164ea0c196 sbom.spdx.json
Pulled [registry] oci-registry:5000/ubuntu-curl@sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4
Digest: sha256:f50bb644a1c952a613e4867ed714c1975673240d42fb4388fcf743311fdeb8a4

-rw-r--r-- 1 root root 1943149 May  9 11:36 sbom.spdx.json

Verify the content matches what we pushed:

docker exec oci-lab sha256sum /work/sbom.spdx.json /tmp/pulled-sboms/sbom.spdx.json

28164ea0c19614bfc106fea2ae6107dce12a1e79dfb1a361047b3264915e5195  /work/sbom.spdx.json
28164ea0c19614bfc106fea2ae6107dce12a1e79dfb1a361047b3264915e5195  /tmp/pulled-sboms/sbom.spdx.json

Identical. Content-addressable storage at work.
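
The same check can be done in code against the descriptor from the manifest, which carries both the digest and the size. A minimal sketch, assuming sha256 digests only; `matches_descriptor` is a name invented for this post.

```python
import hashlib


def matches_descriptor(data, descriptor):
    """True if `data` has the size and sha256 digest the descriptor claims."""
    algorithm, _, expected = descriptor["digest"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm {algorithm!r}")
    return (len(data) == descriptor["size"]
            and hashlib.sha256(data).hexdigest() == expected)


# Stand-in bytes; in practice `blob` would be the contents of sbom.spdx.json
# and `descriptor` would be layers[0] from the SBOM manifest.
blob = b'{"spdxVersion": "SPDX-2.3"}'
descriptor = {
    "mediaType": "application/spdx+json",
    "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
    "size": len(blob),
}
ok = matches_descriptor(blob, descriptor)
tampered = matches_descriptor(blob + b" ", descriptor)
```

A single changed byte flips the result, which is the whole point of addressing content by its hash.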

The Big Picture — Object Inventory

BEFORE attaching SBOMs:
┌─────────────────────────────────────────────────────────────────┐
│  Tags:                                                          │
│    v1 ──► sha256:0124b538... (image manifest)                   │
│                                                                 │
│  Blobs:                                                         │
│    sha256:0124b538... = image manifest    (424 B)               │
│    sha256:8bdde1d7... = image config      (2,069 B)             │
│    sha256:6edbc812... = Ubuntu layer      (27,606,543 B)        │
└─────────────────────────────────────────────────────────────────┘

AFTER attaching SPDX + CycloneDX SBOMs:
┌─────────────────────────────────────────────────────────────────┐
│  Tags:                                                          │
│    v1 ──► sha256:0124b538... (image manifest, unchanged)        │
│    sha256-0124b538... ──► referrer index                        │
│                                                                 │
│  Original blobs (untouched, image digest unchanged):            │
│    sha256:0124b538... = image manifest    (424 B)               │
│    sha256:8bdde1d7... = image config      (2,069 B)             │
│    sha256:6edbc812... = Ubuntu layer      (27,606,543 B)        │
│                                                                 │
│  New blobs from SBOM attachment:                                │
│    sha256:f50bb644... = SPDX manifest      (730 B)              │
│    sha256:39519e85... = CycloneDX manifest (746 B)              │
│    sha256:44136fa3... = empty config {}    (2 B, shared)        │
│    sha256:28164ea0... = SPDX SBOM blob     (1,943,149 B)        │
│    sha256:fb4cd937... = CycloneDX SBOM blob (209,683 B)         │
│                                                                 │
│  Relationships:                                                 │
│    referrer index ──lists──► [SPDX manifest, CycloneDX manifest]│
│    SPDX manifest ──subject──► image manifest                    │
│    CycloneDX manifest ──subject──► image manifest               │
└─────────────────────────────────────────────────────────────────┘

The image is untouched. Its manifest digest is exactly the same before and after. Anyone pinning ubuntu-curl@sha256:0124b538... gets bit-for-bit identical bytes. The SBOMs live alongside, discoverable but separate.

Why This Design Wins

The OCI 1.1 subject + Referrers approach has three properties that older "embed-it-in-the-image" approaches lack:

1. The signed image stays signed

If you embedded the SBOM as an extra layer in the image, the image manifest digest would change every time you regenerated the SBOM. That breaks digest pinning, breaks signatures, and forces re-signing on every SBOM update. With referrers, the image is immutable; metadata is mutable.

2. Anyone can attach anything, anytime

You don't need to modify the image to attach an SBOM. Your CI pipeline can build and push the image, then a separate stage (or a completely separate team/service) can run syft and attach the result. Vulnerability scans can be re-run weekly and re-attached without touching the image.

3. One mechanism for all metadata

Same plumbing for everything:

Image manifest: sha256:0124b538...
  ↑ subject (referrers)
  ├── application/vnd.cncf.notary.signature  ← Notation signature
  ├── application/spdx+json                  ← SBOM (SPDX)
  ├── application/vnd.cyclonedx+json         ← SBOM (CycloneDX)
  ├── application/vnd.in-toto+json           ← SLSA provenance attestation
  └── application/sarif+json                 ← Vulnerability scan results

The registry doesn't need plugins, special endpoints, or knowledge of these formats. It just stores manifests with subject fields and serves them via GET /v2/<name>/referrers/<digest> or the tag fallback.

Production Patterns

Pattern 1: Generate-and-attach in CI

# After docker push, before declaring success:
- name: Generate SBOM
  run: syft "$IMAGE_REF" -o spdx-json=sbom.spdx.json

- name: Attach SBOM
  run: |
    oras attach \
      --artifact-type application/spdx+json \
      "$IMAGE_REF" \
      sbom.spdx.json:application/spdx+json

Pattern 2: Admission-time verification

A Kubernetes admission controller (Kyverno, Ratify) can require both:

A valid Notation signature (proves who built it)
An attached SBOM whose packages match no CRITICAL CVEs (proves what's in it)

Both are discoverable through the same Referrers API call — Kubernetes admission gets the full provenance story in one place.

Pattern 3: SBOM diffing across versions

Pull SBOMs for myapp:v1 and myapp:v2, diff their package lists, and you have an automated changelog of dependencies. New packages → review for licensing. Removed packages → potential dead code. Version bumps → compare against vulnerability feeds.
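
The diff itself is a few set operations. Here is a minimal sketch over two parsed SPDX documents, keyed by package name; the helper names and the sample versions are illustrative, and a real tool would likely key on PURL to tell apart same-named packages from different ecosystems.

```python
def spdx_packages(doc):
    """Map package name -> versionInfo for a parsed SPDX 2.x document."""
    return {p["name"]: p.get("versionInfo", "") for p in doc.get("packages", [])}


def diff_packages(old, new):
    """Return (added, removed, changed) between two name->version maps."""
    added = {n: v for n, v in new.items() if n not in old}
    removed = {n: v for n, v in old.items() if n not in new}
    changed = {n: (old[n], new[n])
               for n in old.keys() & new.keys() if old[n] != new[n]}
    return added, removed, changed


# Illustrative inputs standing in for the SBOMs of myapp:v1 and myapp:v2.
v1 = {"packages": [{"name": "apt", "versionInfo": "2.4.13"},
                   {"name": "wget", "versionInfo": "1.21.2"}]}
v2 = {"packages": [{"name": "apt", "versionInfo": "2.4.14"},
                   {"name": "curl", "versionInfo": "7.81.0"}]}
added, removed, changed = diff_packages(spdx_packages(v1), spdx_packages(v2))
```

Each of the three buckets maps onto one of the review actions listed above.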

Recap

In this part we:

Step	Tool	Result
Generated SPDX SBOM	`syft`	1.9 MB JSON, 102 packages, 2,290 files
Generated CycloneDX SBOM	`trivy`	210 KB JSON, 102 components
Scanned for CVEs	`trivy`	35 vulnerabilities (18 LOW, 17 MEDIUM)
Attached SPDX as OCI artifact	`oras attach`	New manifest with `subject` → image
Attached CycloneDX as OCI artifact	`oras attach`	Second referrer alongside SPDX
Discovered attached artifacts	`oras discover` + raw HTTP	Tree view + Image Index via referrer tag
Pulled the SBOM back	`oras pull`	Bit-identical to source (same sha256)

The big takeaway

An SBOM is just an OCI artifact with artifactType: application/spdx+json (or application/vnd.cyclonedx+json). It uses the exact same plumbing as a Notation signature: a manifest with a subject field, discoverable via the Referrers API or its tag-based fallback.

Once you understand subject + Referrers, you understand:

Notation signatures (Part 4)
SBOMs (this part)
Vulnerability scan results
SLSA build provenance
Anything else the supply-chain world dreams up next

The OCI registry is no longer just a place to store images — it's a content-addressable graph of software, its provenance, and everything we know about it.

Cleanup

docker rm -f oci-registry oci-lab
docker network rm oci-net

Series Recap

Across the five parts we went from spec to running container to a fully signed-and-described supply-chain artifact, end to end, with no magic:

Part	We did	Spec
1	Built an OCI image by hand — manifest, config, layers, content-addressable blobs	OCI Image Spec
2	Pushed and pulled with raw `curl` against an nginx "registry"	OCI Distribution Spec
3	Ran a container with `chroot`, `unshare`, `mount`, `overlayfs` — no `runc` needed	OCI Runtime Spec
4	Signed an image with Notation and dissected the COSE envelope + Referrers index	OCI 1.1 Referrers / Notary Project
5 (this)	Attached SBOMs as OCI artifacts and discovered them via the same Referrers mechanism	OCI 1.1 Referrers / Reference Types

You now have the mental model to read every CVE feed, every supply-chain SBOM, every container registry response, and recognize what it is and where it fits.

Every digest, byte count, package count, CVE ID, and command output in this post was captured from an actual run inside Docker Desktop for Mac (arm64) on May 9, 2026. Tools used: registry:2, syft 1.44.0, oras 1.2.0, trivy 0.70.0, skopeo, curl, jq.
