Figure out the Gerrit/Pyroscope situation #108

Open
opened 2024-09-05 11:38:28 +00:00 by k900 · 6 comments
Owner

Right now we're using Grafana Alloy to profile Gerrit, which does some horrible things with a vendored prebuilt profiler that doesn't even work correctly (see: https://github.com/grafana/alloy/blob/2c70be2d20a835a897fb7bd0c35f67c0ba92b56d/internal/component/pyroscope/java/asprof/asprof.go). I'm not sure what a good solution is here. Patch Alloy to use async-profiler from nixpkgs? Just invoke async-profiler ourselves and shove the output into Pyroscope somehow? Pyroscope can accept some Java specific format: https://grafana.com/docs/pyroscope/latest/configure-server/about-server-api/#jfr-format

What's wrong with vendored asprof, exactly?

I'm not sure how easy it would be to patch Alloy, but from looking at the docs of asprof, it seems not very user-friendly at all. It pretty much does profiling and that's it. It wouldn't be too hard to write a wrapper around it that would launch and stop it and so on and would send profiles to Grafana, but then it's just Alloy all over again. It's not too terrible though, so let's identify the issues that Alloy has and see if we can fix them.
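To make the "wrapper around asprof" idea concrete, here is a rough sketch of what one profiling cycle might look like: run asprof for a fixed duration, then POST the resulting JFR file to Pyroscope. The function names, the output path and the application name are made up for illustration. It also assumes that `/ingest` accepts a raw JFR request body with `format=jfr`; the JFR section of the Pyroscope server API docs linked above should be checked before relying on that.

```python
import subprocess
import time
import urllib.parse
import urllib.request


def build_asprof_command(pid, duration_s, out_path, asprof="asprof"):
    """Command line for one fixed-length CPU profiling session written as JFR."""
    return [asprof, "-e", "cpu", "-d", str(duration_s),
            "-o", "jfr", "-f", out_path, str(pid)]


def build_ingest_url(base_url, app_name, start_s, end_s):
    """URL for the Pyroscope ingest endpoint, asking it to parse the body as JFR."""
    query = urllib.parse.urlencode(
        {"name": app_name, "from": start_s, "until": end_s, "format": "jfr"}
    )
    return f"{base_url.rstrip('/')}/ingest?{query}"


def profile_once(pid, base_url, app_name, duration_s=30, out_path="/tmp/gerrit.jfr"):
    """Profile the JVM once and upload the result (hypothetical helper)."""
    start = int(time.time())
    # asprof blocks until the profiling session is over.
    subprocess.run(build_asprof_command(pid, duration_s, out_path), check=True)
    end = int(time.time())
    with open(out_path, "rb") as f:
        body = f.read()
    # Assumption: the server accepts the raw JFR bytes as the request body.
    req = urllib.request.Request(
        build_ingest_url(base_url, app_name, start, end), data=body, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Calling `profile_once` in a loop from a systemd service would be enough for a first experiment, though at that point it is indeed most of what Alloy's component already does.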

Author
Owner
It doesn't actually run properly, see https://grafana.forkos.org/explore?schemaVersion=1&panes=%7B%22s94%22%3A%7B%22datasource%22%3A%22loki%22%2C%22queries%22%3A%5B%7B%22refId%22%3A%22A%22%2C%22expr%22%3A%22%7Bunit%3D%5C%22alloy.service%5C%22%7D+%7C%3D+%60asprof%60%22%2C%22queryType%22%3A%22range%22%2C%22datasource%22%3A%7B%22type%22%3A%22loki%22%2C%22uid%22%3A%22loki%22%7D%2C%22editorMode%22%3A%22builder%22%7D%5D%2C%22range%22%3A%7B%22from%22%3A%22now-24h%22%2C%22to%22%3A%22now%22%7D%7D%7D&orgId=1

Don't have Explore access in Grafana :(

Author
Owner
```
ts=2024-09-24T08:00:50.532378134Z level=error component_path=/ component_id=pyroscope.java.java pid=816060 err="failed to reset: failed to stop : asprof failed to run: asprof failed to start /tmp/alloy-asprof-glibc-872cc6b164ca32b39e0149ac513fa23e211f47bf/bin/asprof: fork/exec /tmp/alloy-asprof-glibc-872cc6b164ca32b39e0149ac513fa23e211f47bf/bin/asprof: no such file or directory
```
Owner

To be a bit more precise, it works some of the time, but often fails unpacking/setting up the vendored analyzer for some reason. It usually fixes itself after some time. So it's not a pressing issue, but it's still annoying.


Why does it try to fork/exec anyway?

I looked at the code for a few minutes and my best guess is that Alloy uses fork/exec to run asprof in a separate process, and maybe it re-runs it every once in a while. I think it wants to look for some data (musl or glibc libs, for example) in a tmpdir. And since tmp is, well, tmp - the tmp folders are cleaned up sometimes and Alloy freaks out. A proper fix would probably create a new tmpdir for each restart, or use some more stable directory (StateDirectory, maybe?)

Looks like it's actually [documented and configurable](https://grafana.com/docs/alloy/latest/reference/components/pyroscope/pyroscope.java/#arguments). I'll try submitting a PR to fix it.
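For illustration, the "don't trust that the tmpdir still exists" idea could look roughly like the sketch below: check for the extracted binary right before each exec and unpack it again if it is gone. This is not how Alloy is implemented. `ensure_profiler`, the directory layout and the payload are hypothetical stand-ins for whatever the real unpacking step does.

```python
import os
import shutil
import tempfile
from pathlib import Path


def ensure_profiler(extract_dir, payload, name="asprof"):
    """Return the path to the profiler binary, re-extracting it if it is missing."""
    target = Path(extract_dir) / "bin" / name
    if not target.is_file():
        # The directory may have been removed by a tmp cleaner, so recreate it.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
        target.chmod(0o755)
    return target


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    fake_binary = b"#!/bin/sh\nexit 0\n"  # placeholder payload, not a real profiler

    first = ensure_profiler(workdir, fake_binary)
    shutil.rmtree(workdir)  # simulate /tmp being cleaned up behind our back
    second = ensure_profiler(workdir, fake_binary)

    print(first == second, os.access(second, os.X_OK))
    shutil.rmtree(workdir)
```

Pointing `extract_dir` at something under the service's StateDirectory instead of `/tmp` would avoid the cleanup in the first place; the check above only makes the failure self-healing.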
Reference: the-distro/infra#108