-
-
Notifications
You must be signed in to change notification settings - Fork 586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue #931 : Support for Google OSV #1703
Issue #931 : Support for Google OSV #1703
Conversation
Signed-off-by: Sahiba Mittal <[email protected]>
Signed-off-by: Sahiba Mittal <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The direction you're headed in looks good, @sahibamittal.
[...] but on comparing (say, for ghsa id GHSA-vvw4-rfwf-p6hx), lower/upper ranges are bit different.
Can you elaborate a bit? I am not able to spot a difference in the ranges, but it's totally possible that I'm not looking close enough.
Signed-off-by: Sahiba Mittal <[email protected]>
Signed-off-by: Sahiba Mittal <[email protected]>
@nscuro Anything outstanding prior to merge? |
@stevespringett I haven't had a chance to review the latest changes, and haven't tested the new functionality yet. I'll try to get it done tonight. |
It doesn't appear that authentication is required in order to download the OSV ecosystem data. In the configuration, OSV is disabled by default. Is this what the default should be? Are there reasons why OSV should not be enabled by default? |
@nscuro I tried to run the container but it is showing 'Connection timed out' error at URL.openStream in OSVDownloadTask. |
Yes, it should be enabled by default. The property value for enabled flag is coming as true from front-end. But, I'll change default to true in backend as well. |
Signed-off-by: Sahiba Mittal <[email protected]>
Are you connected to a corporate network of some sorts? Those networks typically require usage of a proxy to be able to reach external hosts. I'm not super sure if It may be a good idea to use dependency-track/src/main/java/org/dependencytrack/tasks/NistMirrorTask.java Lines 211 to 218 in 0aa6517
You may need to then configure the proxy, see "Proxy Configuration" section at https://docs.dependencytrack.org/getting-started/configuration/ For running the entire DT application, you don't need to build the container. I'd recommend to launch it via Maven, see: https://github.com/DependencyTrack/dependency-track/blob/a5037fa42e73343cab513df14af5475cbc67896c/DEVELOPING.md#debugging |
src/main/java/org/dependencytrack/parser/osv/model/OSVAdvisory.java
Outdated
Show resolved
Hide resolved
src/main/java/org/dependencytrack/parser/osv/GoogleOSVAdvisoryParser.java
Outdated
Show resolved
Hide resolved
src/test/java/org/dependencytrack/task/OSVDownloadTaskTest.java
Outdated
Show resolved
Hide resolved
One issue I noticed is that GHSA vulnerabilities are duplicated. We'll have vulnerabilities with dependency-track/src/main/java/org/dependencytrack/tasks/scanners/OssIndexAnalysisTask.java Lines 253 to 254 in 0aa6517
@stevespringett, will #1642 help here? |
Can we may be disable github advisories and make OSV default as vulnerability source, to avoid duplication? since github source is a subset of OSV. |
A viable option for sure, especially since that would also remove the necessity of configuring a PAT for GHSA mirroring. Few things to check:
We'll also need to think of the migration scenario: Folks will already have lots of GHSA vulns in their portfolio, and they'll have audited them as well. We can't just remove the "old" GHSA vulns when upgrading, as audit trails will be lost. |
Signed-off-by: Sahiba Mittal <[email protected]>
@nscuro to my knowledge, OSV only contains a subset of GHSA. They only include the "reviewed" vulnerabilities (approx 7500). They do not include the unreviewed vulnerabilities (>170K). In the future, I want to make a configurable option for the GitHub Advisories integration for enabling reviewed and unreviewed independently. This would allow folks to use OSV for all reviewed GHSAs and use the GitHub Advisory integration for all unreviewed vulnerabilities. |
Signed-off-by: Sahiba Mittal <[email protected]>
Does OSV have any vulnerabilities that are specific to them? For example, Sonatype OSS Index, if what OSS Index found was a CVE, then the source is set to the NVD. If it was a sonatype specific finding, then the source is set to OSSINDEX. We need to do the same here. If OSV finds a CVE, GHSA, etc, then the source needs to be set to the NVD and GITHUB accordingly, since they are the source of the vulnerability, not OSV. OSV is just acting as an interface. But if OSV does have their own findings (which I think they do with OSS Fuzz), then OSV would only be the source for those things. This will also prevent the duplicates from occurring, and will prevent duplicate records being mirrored in the internal database. |
OSV does have its own findings (e.g: https://osv.dev/vulnerability/OSV-2022-217 ) . OSV also supports alias field https://ossf.github.io/osv-schema/#aliases-field. I believe that the OSV data may contain more useful information ("PURLs, expand version ranges to a full list of affected versions, etc." ) @oliverchang please correct me if I'm wrong. I think duplicate findings can be correlated and displayed as a unified finding from multiple sources? I think in this way, we can identify any data issues with a particular source. |
Signed-off-by: Sahiba Mittal <[email protected]>
Signed-off-by: Sahiba Mittal <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sahibamittal, have you by chance tested how this behaves when GHSA mirroring is enabled as well? Will the GHSA mirror task override the vulnerabilities created by OSV, and the other way around?
Signed-off-by: Sahiba Mittal <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some early feedback. Great work so far! But I'll need a little bit more time to complete the review and do some testing.
src/main/java/org/dependencytrack/parser/osv/GoogleOSVAdvisoryParser.java
Outdated
Show resolved
Hide resolved
src/main/java/org/dependencytrack/parser/osv/GoogleOSVAdvisoryParser.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Sahiba Mittal <[email protected]>
src/main/java/org/dependencytrack/parser/osv/GoogleOSVAdvisoryParser.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Sahiba Mittal <[email protected]>
Signed-off-by: Sahiba Mittal <[email protected]>
Signed-off-by: Sahiba Mittal <[email protected]>
Signed-off-by: Sahiba Mittal <[email protected]>
Signed-off-by: nscuro <[email protected]>
Signed-off-by: nscuro <[email protected]>
…fields Signed-off-by: nscuro <[email protected]>
Additional changes: * Rename `OsvVulnerability` to `OsvAffectedPackage` to avoid confusion * Be more strict about ordering of range events Signed-off-by: nscuro <[email protected]>
As requested by @VinodAnandan, I made some changes to address some outstanding issues.
Even before these changes, the code in this PR worked as intended. To demonstrate, using the OSV data, the internal analyzer was able to identify pretty much the same vulnerabilities that OSS Index found: I noticed a few general limitations however. Noting them here for completeness:
Points 2 and 3 bother me the most. While my local testing looked good WRT identifying vulnerabilities, I think it's safe to say that OSV integration will not yield the accurate data we were hoping for, at least not right now. If we end up merging this, we should make it really clear that this is a preview feature. Not due to the implementation, but because mapping from OSV to our data model is error-prone. May be worth investigating whether just using OSV's API would be a better approach for now. That way we can use their documented evaluation algorithm, and only mirror vulnerabilities ad-hoc, like it's done for OSS Index already. I hacked together a MVP to demonstrate what I mean: nscuro@d9fc2ee I guess my open question is: Are we fine with the uncertainties and unconsistencies that we may introduce by mirroring OSV? |
For some odd reason, the query generated by DataNucleus for fetching `VulnerableSoftware` is drastically less efficient when using the `VULNERABLESOFTWARE` `@FetchGroup` over lazy fetching via `Vulnerability#getVulnerableSoftware()`. Query generated by fetch group: ``` SELECT 'org.dependencytrack.model.VulnerableSoftware' AS DN_TYPE,A1.CPE22,A1.CPE23,A1.EDITION,A1.ID AS NUCORDER0,A1."LANGUAGE",A1.OTHER,A1.PART,A1.PRODUCT,A1.PURL,A1.PURL_NAME,A1.PURL_NAMESPACE,A1.PURL_QUALIFIERS,A1.PURL_SUBPATH,A1.PURL_TYPE,A1.PURL_VERSION,A1.SWEDITION,A1.TARGETHW,A1.TARGETSW,A1."UPDATE",A1.UUID,A1.VENDOR,A1.VERSION,A1.VERSIONENDEXCLUDING,A1.VERSIONENDINCLUDING,A1.VERSIONSTARTEXCLUDING,A1.VERSIONSTARTINCLUDING,A1.VULNERABLE,A0.VULNERABILITY_ID FROM VULNERABLESOFTWARE_VULNERABILITIES A0 INNER JOIN VULNERABLESOFTWARE A1 ON A0.VULNERABLESOFTWARE_ID = A1.ID WHERE EXISTS (SELECT 'org.dependencytrack.model.Vulnerability' AS DN_TYPE,A0_SUB.ID AS DN_APPID FROM VULNERABILITY A0_SUB WHERE A0_SUB.SOURCE = 'NVD' AND A0_SUB.VULNID = 'CVE-2020-0404' AND A0.VULNERABILITY_ID = A0_SUB.ID) ORDER BY NUCORDER0 ``` Query generated by `getVulnerableSoftware()`: ``` SELECT 'org.dependencytrack.model.VulnerableSoftware' AS DN_TYPE,A1.CPE22,A1.CPE23,A1.EDITION,A1.ID AS NUCORDER0,A1."LANGUAGE",A1.OTHER,A1.PART,A1.PRODUCT,A1.PURL,A1.PURL_NAME,A1.PURL_NAMESPACE,A1.PURL_QUALIFIERS,A1.PURL_SUBPATH,A1.PURL_TYPE,A1.PURL_VERSION,A1.SWEDITION,A1.TARGETHW,A1.TARGETSW,A1."UPDATE",A1.UUID,A1.VENDOR,A1.VERSION,A1.VERSIONENDEXCLUDING,A1.VERSIONENDINCLUDING,A1.VERSIONSTARTEXCLUDING,A1.VERSIONSTARTINCLUDING,A1.VULNERABLE FROM VULNERABLESOFTWARE_VULNERABILITIES A0 INNER JOIN VULNERABLESOFTWARE A1 ON A0.VULNERABLESOFTWARE_ID = A1.ID WHERE A0.VULNERABILITY_ID = ? ORDER BY NUCORDER0 ``` Signed-off-by: nscuro <[email protected]>
I think there are pros and cons for each approach (individual calls vs batch/mirroring). Considering the current state (New, relatively small VDB, No SLA, etc.) of OSV.dev I believe the batch/mirroring approach is better for now, but we can revisit it later if needed. Also, the mirroring processing can be improved if we can incrementally get the data. I don't believe the affected data problem is just associated with OSV.dev, any vulnerability database may contain inaccurate affected data, and we can't expect perfect affected data from any source. The DT should support correcting the affected data over time by removing the append-only restriction. DT should associate the affected data to a source and the source be allowed to update the data if needed. Otherwise it may result in more FPs even if the vulnerability data source eventually corrects the data. |
@VinodAnandan Agreed. We need a way to track which source added a
+1. However, my main concern with OSV isn't that their data isn't super accurate, it's more that me may not be able to correctly parse / interpret the data they're providing (see the ranges issue above). |
For For github/advisory-database#470, for vulnerabilities without a "fixed" event, we interpret that as every version is affected right now. This is actually not too big of an issue most of the time since GitHub tracks the patched version most of the time -- if one exists, there will be a "fixed" event. If not, most likely it's not fixed. Either way, GitHub has the issue open on their side to make use of the |
Thanks @oliverchang, google/osv.dev#485 would be really nice to have! |
Signed-off-by: Sahiba Mittal <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for all the work you put into this, @sahibamittal!
I'm going to merge this now, and I logged #1815 to address the VulnerableSoftware
situation.
We still need documentation for the OSV datasource (similar to the GHSA one). The docs should also state that this is a preview feature. Could you provide a PR for this, or do you want us to do it?
Thanks Niklas! |
Signed-off-by: Sahiba Mittal [email protected]
Issue: #931
Google OSV Repo -> https://github.com/google/osv
Frontend Changes: DependencyTrack/frontend#170
NOTE: Alternative way is to use Google client library to retrieve from bucket.
TBD: Mapping of OSV data to dependency-track data.
In Github advisory, we receive version range object whereas, in google OSV, we get an array of versions which can be used as range but on comparing (say, for ghsa id GHSA-vvw4-rfwf-p6hx), lower/upper ranges are bit different.
Objects from Github Advisory: https://docs.github.com/en/graphql/reference/objects#securityvulnerability
Objects from Google OSV: https://osv.dev/list?page=21&ecosystem=Maven
Example for difference between github and google for GHSA-vvw4-rfwf-p6hx :
Github: GHSA-vvw4-rfwf-p6hx
Google: attachment json
GHSA-vvw4-rfwf-p6hx.json.zip