@alilleybrinker alilleybrinker commented Aug 6, 2025

This RFD adds support for CNAs to report the artifacts affected by a vulnerability by introducing a new "affectedArtifacts" field in the "cnaPublishedContainer". The new field is an array of objects; each object identifies a single artifact, potentially with multiple identifiers per artifact.


This is a replacement for the previous OmniBOR portion of #407 (which originally covered both Package URLs and OmniBOR Artifact IDs, before being narrowed to solely focus on Package URLs).


The potential implementation can be seen in #441.


Rendered


Signed-off-by: Andrew Lilley Brinker <[email protected]>
@david-waltermire
Collaborator

david-waltermire commented Aug 7, 2025

We may want to consider supporting an array of synonyms.

Here are two variant examples:

{
    "affectedArtifacts": [
        {
            "artifacts": {
                "omnibor": "gitoid:blob:sha256:9f64df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8",
                "sha256": "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
            },
            "status": "affected",
            "version": "0.18.1",
            "versionType": "semver",
            "platforms": ["macOS", "x86"]
        },
        {
            "artifacts": [
                {
                    "type": "omnibor",
                    "value": "gitoid:blob:sha256:4043df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8"
                },
                {
                    "type": "sha256",
                    "value": "40414dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
                }
            ],
            "status": "affected",
            "version": "0.18.1",
            "versionType": "semver",
            "platforms": ["Windows", "x86"]
        }
    ]
}

The value of such an approach is twofold:

  1. It makes it easy to enforce a rule that at least one artifact type must be provided.
  2. It opens the door to add other descriptive properties down the road for each artifact type, making the model more extensible. This is supported by the second variant.

I think both of these reasons outweigh the extra verbosity.
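
As a rough illustration of the first point, the "at least one artifact type" rule could be checked along these lines. This is only a sketch: the function names `identifier_count` and `is_valid_entry` are hypothetical, and the rule itself is an assumption about how the schema might be enforced, not part of the RFD.

```python
from typing import Any


def identifier_count(entry: dict[str, Any]) -> int:
    """Count the identifiers in one entry, accepting either "artifacts" variant."""
    artifacts = entry.get("artifacts")
    if isinstance(artifacts, dict):
        # Variant 1: one key per identifier type, e.g. {"sha256": "..."}.
        return sum(1 for value in artifacts.values() if value)
    if isinstance(artifacts, list):
        # Variant 2: a list of {"type": ..., "value": ...} objects.
        return sum(
            1
            for item in artifacts
            if isinstance(item, dict) and item.get("type") and item.get("value")
        )
    return 0


def is_valid_entry(entry: dict[str, Any]) -> bool:
    """Hypothetical rule: every entry must carry at least one identifier."""
    return identifier_count(entry) >= 1


# The two variants from the example above, trimmed to the relevant fields.
variant_one = {
    "artifacts": {
        "omnibor": "gitoid:blob:sha256:9f64df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8",
        "sha256": "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
    },
    "status": "affected",
}
variant_two = {
    "artifacts": [
        {"type": "omnibor", "value": "gitoid:blob:sha256:4043df92367881be21e23567a31a8ce01994d98b69d28917b5c132ce32a8e6c8"},
        {"type": "sha256", "value": "40414dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"},
    ],
    "status": "affected",
}
```

Either variant can express the rule; the difference is mainly whether the check counts keys or list items.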

@alilleybrinker
Author

Thanks @david-waltermire! A few thoughts:

  • I think any field for specifying synonymous identifiers should be called "identifiers" rather than "artifacts" since all identifiers should be identifying the same artifact.
  • I think we should use distinct fields for different identifier types. Not all identifiers will be expressible with a single "value" field, so I think the "type"/"value" approach would be a bit awkward.
  • For OmniBOR specifically, the identifier is called an OmniBOR Artifact ID, with OmniBOR as the name for the overall specification which includes the Input Manifests, so using the name "omnibor" for the field would not be accurate.
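
To make the "distinct fields" idea concrete, here is a minimal sketch of what a typed "identifiers" object could look like in consuming code. The class name `Identifiers`, the attribute names, and the JSON key `omniborArtifactID` are assumptions for illustration only; the RFD may settle on different names.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical mapping from Python attribute names to JSON field names.
_JSON_NAMES = {"omnibor_artifact_id": "omniborArtifactID", "sha256": "sha256"}


@dataclass(frozen=True)
class Identifiers:
    """Synonymous identifiers that all name the same artifact."""

    omnibor_artifact_id: Optional[str] = None
    sha256: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed rule: an artifact with no identifier at all is meaningless.
        if self.omnibor_artifact_id is None and self.sha256 is None:
            raise ValueError("at least one identifier is required")

    def to_record(self) -> dict:
        """Emit only the identifiers that are present, using the JSON names."""
        return {
            _JSON_NAMES[name]: value
            for name, value in asdict(self).items()
            if value is not None
        }


example = Identifiers(
    sha256="2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
)
```

With one field per identifier type, a type that needs more than a single string could later get its own structured field without changing the others.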

@alilleybrinker
Author

I'll also note that during today's QWG meeting a couple of folks raised the idea of using a "versions" field instead of the "version" field shown in this proposal, to more compactly support cases where the same artifact is associated with multiple versions. I'm open to that, and will update the RFD to reflect that design.

@darakian

darakian commented Aug 8, 2025

I don't think I expressed my thoughts on the version field well in the QWG meeting yesterday, so let me try again in text. First, just so we have a hard example to poke and prod at, I've published version 0.1.4 of this crate:
https://crates.io/crates/discombobulate/versions
which has no code changes from 0.1.3. Crates did require me to change the toml file before upload, so maybe there are ecosystems where we can be assured that there's a bit-level diff between all named versions. I guess that depends on what exactly we're hashing, but this isn't really the thread I want to pull on.
The diff commit is here as well: darakian/discombobulate@273572e

I think it may make more sense to not have the version field at all and I'll try to make that case.

The case for omitting version information from an affected artifact list (at least for now)

A few assumptions

  1. All else equal, fewer properties in the schema is better
  2. A single CVE describes a single problem (perhaps in many things)
  3. Coarse-grained version information already exists in a CVE record

Assumption 1 leads me to a stance of "if there's no clear use case for something, then cut it". The rationale is that fewer properties make for simpler parser design and lower cognitive load for the humans dealing with the records. Given the presence of version information elsewhere in the record, it seems duplicative to have it here as well, especially in the face of dumb edge cases like the one demonstrated above with my package.

Next I consider how I might use this data, and the only use cases I can think of are when the coarse-grained version info has failed me. E.g., if I'm reading some CVE and it lays out that the ranges [1.2.3, 2.3.4), [3.4.5, 4.5.6), etc. are vulnerable for whichever product, then I would first check whether the versions I use intersect with that list. Perhaps it's a failure of my imagination (please call that out if so), but I think I would only be looking for hash-based / intrinsic identifiers in the event that the version information has failed me, and in that case I don't really care if the hash abc123... matches against version 1.2.4 or 3.4.6 or whatever. I simply care that it's in the affected set, or that it's called out as explicitly unaffected.
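
The lookup order described above could be sketched roughly as follows. Everything here is illustrative: the function names are hypothetical, the ranges are the ones from the paragraph above, and the version parsing assumes plain dotted numeric versions rather than full semver.

```python
from typing import Optional

# Half-open ranges [low, high), taken from the example in the text.
AFFECTED_RANGES = [("1.2.3", "2.3.4"), ("3.4.5", "4.5.6")]


def parse(version: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as "1.2.3" (no pre-release tags)."""
    return tuple(int(part) for part in version.split("."))


def in_ranges(version: str, ranges=AFFECTED_RANGES) -> bool:
    """True if the version falls inside any half-open affected range."""
    v = parse(version)
    return any(parse(low) <= v < parse(high) for low, high in ranges)


def is_affected(
    version: Optional[str], artifact_hash: Optional[str], affected_hashes: set[str]
) -> bool:
    """Use version ranges first; fall back to hashes only when no version is known."""
    if version is not None:
        return in_ranges(version)
    # Version info has "failed us": only set membership matters here,
    # not which version the hash happens to correspond to.
    return artifact_hash in affected_hashes
```

Note that the fallback branch never consults a version, which is the core of the argument for leaving version out of the artifact entries.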

I can see potential value in an alternate workflow where someone doing deeper analysis wants to trace a flaw from some blob they find on disk back to a source. However, the discussion yesterday was going toward duplicating much or all of the current version block. I'm not sure it's worth it, and in the interest of simplicity I'd like to suggest that we consider dropping it, observing how people use the field in the wild, and reconsidering for a 5.3.0 if we feel there's a need. Same for platforms.

My $0.02

Other comments

We should create a list of valid identifiers and link out to their definitions, lest we end up with more chaos in the records in practice. I guess that list is omnibor and sha256 right now.

I think any field for specifying synonymous identifiers should be called "identifiers" rather than "artifacts" since all identifiers should be identifying the same artifact.

100% agree.

@alilleybrinker
Author

Thanks @darakian! Regarding removing version/versionType/platforms, I generally agree. My original proposal for the prior design around OmniBOR didn't permit their equivalent fields in affected array objects when the omniborArtifactID was specified.

I'd added them in for this proposal based on feedback from @ElectricNroff (see #427). I assume he would advocate for their inclusion here as well.

In this context, they'd semantically be providing information about the artifact but not constraining applicability the way they do when used in the affected array, so that's a difference worth noting.

@darakian

darakian commented Aug 11, 2025

Ah, I see. Give me a minute or two to read the thread in full, but @david-waltermire, what are your thoughts on dropping version / platforms for now, hashing out how those could/should/would work, and circling back on them in 5.3.0 or whichever release the timing aligns with?

An old PR came to me over the weekend which might add some nice real world context.
A while back I dealt with a researcher who was looking for so-called "shaded" dependencies. I was unfamiliar with the term, but I came to understand it as a copy+paste of some dependency into one or many Maven artifacts. The full thread is here:
github/advisory-database#2258
and research paper (which is also linked in that thread) here
https://arxiv.org/pdf/2306.05534
Notably the nested dependencies had their own versions independent from the version ascribed to the broader maven artifact.

We ended up handling this with a bunch of individual PRs adding different sets of packages to different GHSAs and it worked well enough. That said, being able to express a vuln cleanly as about the specific files via hashes/omnibors/whatever and to let some form of tooling go check public registries and derive coarse grained version info would have been super neat. Less work and (probably) better coverage.
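
A rough sketch of what such tooling might do: hash a file's bytes, then look the hash up in an index built from public registries. This assumes the OmniBOR Artifact ID is the Git blob object hash computed with SHA-256, as the "gitoid:blob:sha256:" prefix in the earlier examples suggests. The index contents and the package names in `REGISTRY_INDEX` are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Plain SHA-256 digest of the file contents."""
    return hashlib.sha256(data).hexdigest()


def gitoid_blob_sha256(data: bytes) -> str:
    """Git-style blob hash: SHA-256 over a "blob <length>\\0" header plus the contents.

    Assumption: this matches how an OmniBOR Artifact ID is derived.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return "gitoid:blob:sha256:" + hashlib.sha256(header + data).hexdigest()


# Hypothetical index a registry-crawling tool might build: one file hash can
# map to several packages, each with its own independent version (shading).
REGISTRY_INDEX = {
    sha256_hex(b"hello"): [
        ("example:outer-lib", "2.0.0"),
        ("example:inner-lib", "0.18.1"),
    ],
}


def packages_containing(data: bytes, index=REGISTRY_INDEX) -> list:
    """Return every (package, version) known to ship exactly these bytes."""
    return index.get(sha256_hex(data), [])
```

The point of the sketch is the direction of the lookup: the record names files, and coarse-grained version info is derived from them rather than maintained by hand.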

@alilleybrinker
Author

Summarizing open issues (meant to do this in the first comment, but better late than never):

  • How should identifiers be represented?
    • Fields directly on the affectedArtifact object?
    • identifiers array, with identifier objects, with a type field?
  • What metadata should be included in the affectedArtifact object?
    • version?
    • versions (with a similar structure to affected[].versions)?
    • platforms?

@ElectricNroff

ElectricNroff commented Aug 14, 2025

We may want to look at other projects that ingest inherent identifiers for similar purposes, and see what type of data they get, e.g., a complete set of every relevant inherent identifier, a well-defined subset, a seemingly random subset, etc. Example: https://github.com/RetireJS/retire.js/blob/6da45fcb6a3425e55ee8181b2ac35168879bf086/repository/jsrepository-master.json#L824-L842

In other words, when I compare that RetireJS data to the https://github.com/AceMetrix/jquery-deparam/tags page, how can I understand why there is no inherent identifier for 0.2.0, 0.5.0, or 0.5.1 - but there is (perhaps surprisingly) an inherent identifier for 0.4.1? Is the typical answer "that's all the work that I wanted to do - if you want more inherent identifiers, then submit your own ADP container"? Or, is it sometimes something else, e.g., the data provider could not reproduce the vulnerability in 0.2.0, 0.5.0, or 0.5.1?
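
The comparison being described amounts to a set difference between tagged versions and versions that have an inherent identifier. A minimal sketch, using only the versions named above (the real tag list and RetireJS data contain more entries, so these sets are illustrative, not complete):

```python
def version_key(version: str) -> tuple[int, ...]:
    """Sort key for plain dotted numeric versions."""
    return tuple(int(part) for part in version.split("."))


def versions_without_identifier(tagged: set[str], identified: set[str]) -> list[str]:
    """Tagged versions for which no inherent identifier was published."""
    return sorted(tagged - identified, key=version_key)


# Illustrative subsets drawn from the comment above.
tagged_versions = {"0.2.0", "0.4.1", "0.5.0", "0.5.1"}
versions_with_identifier = {"0.4.1"}

gaps = versions_without_identifier(tagged_versions, versions_with_identifier)
```

The code can list the gaps but cannot say why they exist, which is exactly the ambiguity the question raises.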

@darakian

darakian commented Aug 14, 2025

Love the idea of inspecting how projects ingest inherent identifiers for use. One thought came to me on having a "complete list", though: that's going to run up against the record size limit, which is de facto 16 MB today (#200) due to a MongoDB limit.
Projects which make a large number of releases may hit record limits. Perhaps there's guidance that could be given along the lines of "only use inherent identifiers when a coarse-grained identifier is insufficient" or something.
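
To get a feel for the scale, here is a back-of-the-envelope sketch. It assumes the limit is 16 MiB, measures compact JSON rather than the stored BSON form, and uses a hypothetical minimal entry shape with a single sha256 identifier, so the numbers are only indicative.

```python
import json

# Assumption: "16 MB" read as 16 MiB; the stored size may differ from JSON size.
RECORD_LIMIT_BYTES = 16 * 1024 * 1024


def make_entry(index: int) -> dict:
    """Hypothetical minimal entry: one 64-hex-digit sha256 plus a status."""
    return {"identifiers": {"sha256": f"{index:064x}"}, "status": "affected"}


def serialized_size(obj) -> int:
    """Size in bytes of the compact JSON encoding."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def fits(entries: list, limit: int = RECORD_LIMIT_BYTES) -> bool:
    """Would an affectedArtifacts list alone stay within the limit?"""
    return serialized_size({"affectedArtifacts": entries}) <= limit


def rough_capacity(limit: int = RECORD_LIMIT_BYTES) -> int:
    """Approximate number of minimal entries that fit, ignoring the rest of the record."""
    per_entry = serialized_size(make_entry(0)) + 1  # +1 for the separating comma
    return limit // per_entry
```

Entries with more identifiers or extra metadata shrink that capacity quickly, and the rest of the record also counts against the same limit.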

I think an answer to this question

Is the typical answer "that's all the work that I wanted to do - if you want more inherent identifiers, then submit your own ADP container"? Or, is it sometimes something else, e.g., the data provider could not reproduce the vulnerability in 0.2.0, 0.5.0, or 0.5.1?

could be modeled on how GitHub takes feedback for their advisories. In essence, the first publication is a best effort, and users come and suggest/request additions/subtractions and provide evidence in a public forum to motivate those changes.

ADP containers could work too and maybe CNAs could delegate/designate authority to groups they trust to do validation work so as to keep the process from just being a free for all. Maybe something else, but just spitballing.

@alilleybrinker
Author

Adding some open questions from discussions on this:

  • Should the affectedArtifacts field be within the objects in the affected array, or even within the versions field of those objects, to provide structural relationships between product / package info and artifact info?
  • Should affectedArtifacts be an object with completeness and description fields?
  • Should objects in the affectedArtifacts array have a description field to explain what the identifier is an identifier for?
