CVE fetchers should work using a bulk saving context or perform bulk_create #16
Reference: lix-community/nix-security-tracker#16
Currently, ingesting the initial 230K CVEs takes around 25 minutes on a very fast CPU (130-200 CVE/s).
In practice, SQLite can do much more than that (96K inserts/s is easily achievable).
Because we have M2M relations all over the place, this is not trivial to achieve with the "nice" ORM API.
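To make the row-at-a-time versus batched distinction concrete outside the ORM, here is a minimal sketch using Python's standard `sqlite3` module. The table and column names (`cve`, `reference`, `cve_references`) are hypothetical and are not the tracker's actual schema. The point is only that entity rows and join-table rows can each be written with one `executemany` call inside a single transaction.

```python
import sqlite3


def build_schema(conn):
    # Hypothetical schema: two entity tables plus one M2M join table.
    conn.executescript(
        """
        CREATE TABLE cve (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE reference (id INTEGER PRIMARY KEY, url TEXT UNIQUE);
        CREATE TABLE cve_references (
            id INTEGER PRIMARY KEY,
            cve_id INTEGER NOT NULL REFERENCES cve(id),
            reference_id INTEGER NOT NULL REFERENCES reference(id)
        );
        """
    )


def bulk_ingest(conn, records):
    """Insert records, given as a list of (cve_name, [reference_url, ...])."""
    with conn:  # a single transaction for the whole batch
        conn.executemany(
            "INSERT OR IGNORE INTO cve (name) VALUES (?)",
            [(name,) for name, _ in records],
        )
        urls = sorted({url for _, refs in records for url in refs})
        conn.executemany(
            "INSERT OR IGNORE INTO reference (url) VALUES (?)",
            [(url,) for url in urls],
        )
        # Resolve primary keys once, then build every join row in memory.
        cve_ids = dict(conn.execute("SELECT name, id FROM cve"))
        ref_ids = dict(conn.execute("SELECT url, id FROM reference"))
        links = [
            (cve_ids[name], ref_ids[url])
            for name, refs in records
            for url in refs
        ]
        conn.executemany(
            "INSERT INTO cve_references (cve_id, reference_id) VALUES (?, ?)",
            links,
        )
    return len(links)
```

Whether the ORM can get close to raw SQLite throughput is a separate question. The sketch only shows the shape of the work: a few large statements instead of one statement per object.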
Two solutions:
Using Django vanilla API
The classical way to solve this is to use `Model.through`, an automatically generated table composed of `id`, `$containing_id` and `$contained_id` fields, which can be used to bulk create the M2M rows.
Therefore, all the fetcher code should be reworked to take lists all the time (a single item is just `[x]`) and return a list of models (not yet saved!), all of which are then bulk created at the call site.
The topological sort has to be done manually; usually, we do:
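The usual ordering is not spelled out here. As one hedged illustration of the idea (not necessarily the order the fetchers would use), the creation order can be derived from a declared dependency map with the standard `graphlib` module (Python 3.9+). The model names and the `create` callback are hypothetical. In Django, `create` would presumably wrap a `bulk_create` call for the given model, or for its `through` table.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key lists what must exist before it.
DEPENDS_ON = {
    "Organization": set(),
    "Reference": set(),
    "CveRecord": {"Organization"},
    "CveRecord.references": {"CveRecord", "Reference"},  # M2M through rows
}


def creation_order(depends_on):
    """Return the names so that every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def create_all(batches, depends_on, create):
    """Call create(name, objects) once per name, dependencies first."""
    for name in creation_order(depends_on):
        objects = batches.get(name, [])
        if objects:
            create(name, objects)
```

The M2M rows come last because they need the primary keys of both sides. With a hand-written order the same constraint applies, it is just enforced by convention.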
Using a bulk saver context
We can also remove all references to `save` and use a custom API à la https://gist.github.com/crucialfelix/7fa53265ed11e6761531f1b2e0d1f36a to coalesce any operations we need.
It's unclear whether this would improve performance as-is.
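A minimal sketch of what such a context could look like, in the spirit of the linked gist but not a reproduction of it. `BulkSaver`, `add` and the `flush` callback are hypothetical names. In Django, `flush` would presumably call `bulk_create` on the model identified by `kind`.

```python
from collections import defaultdict


class BulkSaver:
    """Queue objects per kind and hand them to `flush` in chunks."""

    def __init__(self, flush, chunk_size=1000):
        self._flush = flush            # callable(kind, list_of_objects)
        self._chunk_size = chunk_size
        self._queues = defaultdict(list)

    def add(self, kind, obj):
        # Replaces a direct obj.save(): the object is only queued here.
        queue = self._queues[kind]
        queue.append(obj)
        if len(queue) >= self._chunk_size:
            self._flush_kind(kind)

    def _flush_kind(self, kind):
        queue = self._queues[kind]
        if queue:
            self._flush(kind, list(queue))
            queue.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Flush whatever is left, unless the block raised.
        if exc_type is None:
            for kind in list(self._queues):
                self._flush_kind(kind)
        return False
```

Usage would look like `with BulkSaver(flush) as saver: saver.add("cve", obj)`. Note that this sketch flushes each kind independently, so it does not by itself guarantee that parent rows are written before the rows that point to them. That ordering would still have to be handled by the caller or inside `flush`.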