Improve matching suggestions quality #391

Open
opened 2024-11-29 12:04:44 +00:00 by fricklerhandwerk · 2 comments
fricklerhandwerk commented 2024-11-29 12:04:44 +00:00 (Migrated from github.com)

This needs refinement.

We discussed a few approaches to reduce the noise in suggestions.

Ideas

  • Filter out hardware type CPEs
  • Compile a hand-curated (allow or deny) list of product names to use for filtering
    • e.g. openshift4/ose-docker-builder or rhel{8,9,10} will never concern us
  • Account for Nixpkgs specifics
    • deduplicate wrapped/unwrapped but still encourage maintainers to fix both
    • drivers and kernel modules contain the kernel version in their attribute names
      • those CVEs will have thousands of matching derivations at the moment
    • handle aliases
    • packages in multiple versions
      • GCC, LLVM, Python are regular offenders
      • we can work with version constraints and help navigation with appropriate rendering
      • https://github.com/Nix-Security-WG/nix-security-tracker/issues/332
        • root dependencies will (rightfully) yield thousands of matches, but denoising that in UI would help
      • (advanced) analyze package variants such as overrides
  • compute Levenshtein distance on matches and cut off after some fixed distance
    • may want to combine it with stripping package-set names and other Nixpkgs-specific prefixes/suffixes
  • Leverage Nixpkgs-native CPEs (long-term)
    • https://github.com/NixOS/nixpkgs/issues/354012
  • Leverage crowd-sourced data
    • Once we dismiss derivations from a suggestion, the tracker can suggest opening a pull request to create a Nixpkgs-native CPE to avoid confusion in the future
    • Related (but more involved):
      • https://github.com/Nix-Security-WG/nix-security-tracker/issues/216

Examples

Substring matching is too dumb:

image (https://github.com/user-attachments/assets/25152837-890c-42ee-9572-33dcf6e5a890)

Low Levenshtein distance is too dumb:

image (https://github.com/user-attachments/assets/c603d87e-35b5-4bca-8d64-1f5082045947)

Too many affected products that don't improve the matching quality but take a lot of screen real estate:

image (https://github.com/user-attachments/assets/dc3b4d41-431b-4966-8d9a-b84663e025e4)
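
To make a few of the ideas above concrete, here is a minimal sketch of how hardware-CPE filtering, a deny list, prefix/suffix stripping, and a Levenshtein cutoff could be combined. It is not the tracker's actual code: the names `DENIED_PRODUCTS`, `PACKAGE_SET_PREFIXES`, `WRAPPER_SUFFIXES`, `normalize_attribute`, and `suggest` are hypothetical, the list contents are placeholders, and the cutoff of 1 is an arbitrary assumption.

```python
# Hypothetical deny list; real entries would be hand-curated.
DENIED_PRODUCTS = {"openshift4/ose-docker-builder", "rhel8", "rhel9", "rhel10"}

# Hypothetical package-set prefixes and wrapper suffixes to strip before comparing.
PACKAGE_SET_PREFIXES = ("python3Packages.", "haskellPackages.", "perlPackages.")
WRAPPER_SUFFIXES = ("-unwrapped", "-wrapped")


def cpe_part(cpe: str) -> str:
    """Return the part field of a CPE 2.3 string: 'a', 'o', or 'h' (hardware)."""
    # Layout: cpe:2.3:<part>:<vendor>:<product>:...
    fields = cpe.split(":")
    return fields[2] if len(fields) > 2 else ""


def normalize_attribute(attr: str) -> str:
    """Strip a known package-set prefix and wrapper suffix, then lowercase."""
    for prefix in PACKAGE_SET_PREFIXES:
        if attr.startswith(prefix):
            attr = attr[len(prefix):]
            break
    for suffix in WRAPPER_SUFFIXES:
        if attr.endswith(suffix):
            attr = attr[: -len(suffix)]
            break
    return attr.lower()


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance, keeping one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def suggest(cpe: str, product: str, attributes: list[str],
            max_distance: int = 1) -> dict[str, list[str]]:
    """Group matching attributes by normalized name.

    Wrapped and unwrapped variants collapse into one group for display,
    but every original attribute stays listed so both can still be fixed.
    """
    if cpe_part(cpe) == "h" or product in DENIED_PRODUCTS:
        return {}
    target = product.lower()
    groups: dict[str, list[str]] = {}
    for attr in attributes:
        name = normalize_attribute(attr)
        if levenshtein(name, target) <= max_distance:
            groups.setdefault(name, []).append(attr)
    return groups


matches = suggest(
    "cpe:2.3:a:gnu:gcc:13.2.0:*:*:*:*:*:*:*",
    "gcc",
    ["gcc", "gcc-unwrapped", "gdb", "python3Packages.requests"],
)
print(matches)
```

With a cutoff of 1, `gcc` and `gcc-unwrapped` end up in one group and `gdb` is dropped. Raising the cutoff to 2 would let `gdb` through, which is the kind of false positive the "Low Levenshtein distance is too dumb" example points at, so a distance cutoff alone is unlikely to be enough.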
h0nIg commented 2024-12-23 23:42:36 +00:00 (Migrated from github.com)

@fricklerhandwerk what does Long-term-operation mean in months / years? Is this something for the next 1-2 years, or something that may land in 6 months?
fricklerhandwerk commented 2025-01-07 22:40:10 +00:00 (Migrated from github.com)

The long-term milestone is merely a collection of items we want to address before committing to a production deployment. This is somewhat detached from the actual development schedule, which will depend on available funding. I'll post updates on the relevant Discourse thread (https://discourse.nixos.org/t/nixpkgs-supply-chain-security-project/34345) if something noteworthy happens in that regard.