Post run diagnostics #39

Merged
grahamc merged 14 commits from post-run-diagnostics into main 2023-10-04 19:31:05 +00:00
grahamc commented 2023-10-04 04:05:01 +00:00 (Migrated from github.com)
Description

Sometimes Nix releases will cause widespread breakage that is hard to identify, for example [the recent Nix 2.18.0 release](https://github.com/NixOS/nix/issues/9052). To provide the best experience we can, we would like to know if a Nix bump is causing users to have suddenly broken CI experiences so we can roll it back.

This extends our action to send a post-workflow-run diagnostics report which sends 2 bits of data: failure | cancelled | success. This also sets the "attribution" property to a random UUID for the installer's diagnostic report, allowing us to correlate the install diagnostic with the subsequent post-run diagnostic report. This correlation is useful and necessary to connect the status to the version of the installer and Nix that was installed, and to the other diagnostic data in that original capture. This doesn't give us any new insight into who our users are or their behavior, nor does it offer anything identifiable.
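
To make the mechanism concrete, here is a minimal sketch of how a single attribution UUID could persist between an action's main and post phases and be attached to the post-run report. This is illustrative only and assumes an `@actions/core`-based TypeScript action; the state key, endpoint URL, and `reportPostRun` helper are hypothetical, not this action's actual implementation.

```typescript
// Hypothetical sketch: persist one attribution UUID across the action's
// main and post phases so both diagnostic reports can be correlated.
import * as core from "@actions/core";
import { randomUUID } from "node:crypto";

const ATTRIBUTION_STATE_KEY = "diagnostic-attribution"; // hypothetical state key

export function getOrCreateAttribution(): string {
  // core.getState returns "" in the main phase; the post phase sees the
  // value saved earlier, so the same UUID is reported in both phases.
  let id = core.getState(ATTRIBUTION_STATE_KEY);
  if (id === "") {
    id = randomUUID();
    core.saveState(ATTRIBUTION_STATE_KEY, id);
  }
  return id;
}

export async function reportPostRun(
  conclusion: "success" | "failure" | "cancelled",
): Promise<void> {
  // Hypothetical endpoint; the action's real diagnostic URL may differ.
  await fetch("https://example.invalid/post-run-diagnostics", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ attribution: getOrCreateAttribution(), conclusion }),
  });
}
```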

Checklist
  • Tested changes against a test repository
  • Added or updated relevant documentation (leave unchecked if not applicable)
  • (If this PR is for a release) Updated README to point to the new tag (leave unchecked if not applicable)
cole-h (Migrated from github.com) reviewed 2023-10-04 15:07:45 +00:00
@ -4,3 +4,3 @@
Copyright (c) 2016 - 2020 Node Fetch Team
Copyright (c) 2016 David Frank
cole-h (Migrated from github.com) commented 2023-10-04 15:07:45 +00:00

?

cole-h (Migrated from github.com) reviewed 2023-10-04 15:09:42 +00:00
cole-h (Migrated from github.com) reviewed 2023-10-04 15:18:17 +00:00
cole-h (Migrated from github.com) commented 2023-10-04 15:18:17 +00:00

I don't know how the post-run whatever stuff works, but it feels to me like it would just... re-run this and thus get a new random UUID...? Am I wrong?

grahamc commented 2023-10-04 17:05:54 +00:00 (Migrated from github.com)

For readers: we've had a bit of discussion internally about whether this PR is a good idea. We don't all agree, but I've decided that we're going to try it. We discussed the privacy implications and whether or not the data will even be useful. In general, I agree it does feel a little bit weird to be collecting overall workflow / job statuses; however, since we don't collect any data which connects the reports to a given organization or repository, it only provides data in aggregate. That's the point, though. I'm open to ideas about how this data could somehow be identifying; I just haven't figured any out.

Whether the data will be useful or not: I'm not sure. I think it will be. I'll explain: the goal of the Determinate Nix Installer is to provide a working Nix. Not just the most recent, or a version of Nix -- but an installation that works as the user is expecting. We already do some work here, like running a self-test after the installation completes. This is a great start, but it doesn't check a lot: it shows whether the bare minimum of the Nix installation was successful, and it isn't able to exercise enough of Nix's behavior to identify larger problems like the invalid store paths in Nix 2.18.0.

So, why record aggregate job conclusions? The way we roll out updates to the Nix installer is by gradually ramping new releases out, starting with something like 10-20% of GitHub Actions installs. (That's the purpose of the `?ci=github` argument in the download URL.) We prioritize the initial rollout for GHA because the environment is highly likely to be ephemeral, and the user "cost" of a failure is smaller: they're one re-run away from a clean environment where they're not likely to get the new version again. Compare that to starting with regular users, who end up in a bad state and may have to uninstall and reinstall.
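
For illustration only, here is a hypothetical sketch of how a download endpoint might implement such a percentage-based ramp for `?ci=github` requests. The version strings and `candidateFraction` value are placeholders; the real rollout logic lives server-side and is not part of this PR.

```typescript
// Hypothetical illustration of a percentage-based rollout decision for
// ?ci=github download requests. Not part of this PR; server-side only.
interface RolloutConfig {
  stableVersion: string;     // current widely-deployed installer release
  candidateVersion: string;  // new release being ramped
  candidateFraction: number; // e.g. 0.15 => ~15% of CI installs
}

function pickInstallerVersion(config: RolloutConfig, isGithubCi: boolean): string {
  // In this sketch, only GitHub Actions installs participate in the ramp.
  if (!isGithubCi) {
    return config.stableVersion;
  }
  return Math.random() < config.candidateFraction
    ? config.candidateVersion
    : config.stableVersion;
}

// Placeholder version strings, for illustration only.
const chosen = pickInstallerVersion(
  { stableVersion: "vA", candidateVersion: "vB", candidateFraction: 0.15 },
  true,
);
```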

One reason the data may not be useful is if a particularly large user of our action has a bad day and racks up many, many failures. Or perhaps some related infrastructure is broken, causing the job failures to rise. This might be true, and if the data isn't useful we will stop collecting it and delete it. However, we perform many thousands of installs every day on GitHub Actions. In that way we get quite a lot of "signal". And, importantly, using the percentage-based rollout strategy, I believe heavy users will be roughly balanced between the released cases. The data is intended to be examined using comparative analysis: the % of runs per outcome in version A vs. the % of runs per outcome in version B. I think with the install frequency and the randomized distribution of A vs. B, we'll find reasonable signal in the results. However, again: if we don't, we'll get rid of it.
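
As a rough sketch of the comparative analysis described above (the `PostRunReport` shape and field names are assumed for illustration, not the actual schema):

```typescript
// Hypothetical sketch of the comparative analysis: share of each conclusion
// per installer version, compared between version A and version B.
type Conclusion = "success" | "failure" | "cancelled";

interface PostRunReport {
  version: string;        // installer version recorded at install time (assumed field)
  conclusion: Conclusion; // the post-run outcome
}

function outcomeShares(reports: PostRunReport[], version: string): Record<Conclusion, number> {
  const subset = reports.filter((r) => r.version === version);
  const counts: Record<Conclusion, number> = { success: 0, failure: 0, cancelled: 0 };
  for (const r of subset) {
    counts[r.conclusion] += 1;
  }
  const total = subset.length || 1; // avoid division by zero
  return {
    success: counts.success / total,
    failure: counts.failure / total,
    cancelled: counts.cancelled / total,
  };
}

// If version B's failure share is markedly higher than version A's across
// thousands of runs, that is the signal to pause or roll back the ramp.
```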

grahamc (Migrated from github.com) reviewed 2023-10-04 17:06:46 +00:00
@ -4,3 +4,3 @@
Copyright (c) 2016 - 2020 Node Fetch Team
Copyright (c) 2016 David Frank
grahamc (Migrated from github.com) commented 2023-10-04 17:06:46 +00:00

This is automatically maintained by TypeScript, I think.

grahamc (Migrated from github.com) reviewed 2023-10-04 17:08:12 +00:00
grahamc (Migrated from github.com) commented 2023-10-04 17:08:12 +00:00

ohp lol

Reference: lix-project/lix-install-action#39