Post-mortem debugging support for protocol problems and for debug logs #734

Open
opened 2025-03-13 18:36:13 +00:00 by jade · 0 comments
Owner

Currently with issues like #732, we are basically just screwed. There are no realistic ways to debug them besides some very careful guessing and basically "being pennae".

If we had a facility with a ring buffer of around 256k that stored frames of data in a totally ordered fashion, we could hook up protocol events to it and have a hope of actually debugging these problems.

The idea is, you have a framing mechanism where you store a frame of data and then its length directly after that. To extract the frames from the ring buffer, you pull the length from right before the write pointer and then pull the frame, and repeat backwards from there until you reach data that has already been overwritten.
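A minimal sketch of that framing, to make the backwards walk concrete. Everything here is hypothetical (`FrameRing`, the 4-byte little-endian length trailer, tracking a monotonic byte count instead of a raw write pointer); it is not existing Lix code and it does no locking, which a real version shared between threads would need.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch: payload bytes followed by a 4-byte little-endian length.
class FrameRing {
public:
    explicit FrameRing(std::size_t capacity) : buf_(capacity) {}

    // Append one frame; the oldest bytes are overwritten once the buffer wraps.
    bool push(std::string_view payload) {
        if (payload.size() + kTrailer > buf_.size()) return false; // can never fit
        for (unsigned char c : payload) writeByte(c);
        const auto len = static_cast<std::uint32_t>(payload.size());
        for (std::size_t i = 0; i < kTrailer; ++i)
            writeByte(static_cast<unsigned char>((len >> (8 * i)) & 0xff));
        return true;
    }

    // Walk backwards from the write position, newest frame first, and stop
    // at the first frame that has been (partially) overwritten.
    std::vector<std::string> framesNewestFirst() const {
        std::vector<std::string> out;
        std::size_t valid = std::min(total_, buf_.size()); // bytes still intact
        std::size_t pos = total_;                          // logical write position
        while (valid >= kTrailer) {
            std::size_t len = 0;
            for (std::size_t i = 0; i < kTrailer; ++i)
                len |= static_cast<std::size_t>(byteAt(pos - kTrailer + i)) << (8 * i);
            if (len + kTrailer > valid) break; // payload reaches into overwritten data
            std::string frame(len, '\0');
            for (std::size_t i = 0; i < len; ++i)
                frame[i] = static_cast<char>(byteAt(pos - kTrailer - len + i));
            out.push_back(std::move(frame));
            pos -= len + kTrailer;
            valid -= len + kTrailer;
        }
        return out;
    }

private:
    static constexpr std::size_t kTrailer = 4;

    void writeByte(unsigned char c) { buf_[total_++ % buf_.size()] = c; }
    unsigned char byteAt(std::size_t logical) const { return buf_[logical % buf_.size()]; }

    std::vector<unsigned char> buf_;
    std::size_t total_ = 0; // total bytes ever written
};
```

Putting the length after the frame is what makes this work: the reader always knows where the newest trailer ends, so it never has to guess where a frame starts in a buffer whose beginning has been overwritten.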

In the frames we probably want the following (none of it is critical: what we NEED is the facility itself, and we cannot block on putting anything in particular in it):

  • Something stacktrace-like (not necessarily a full stacktrace, since we want to run this in prod, but we could track callers of the serializers/deserializers with std::source_location).
  • Which serializer/deserializer it was
  • Which protocol command was being executed?
  • Debug logs?
  • The data in the serialization/deserialization (maybe truncated to 1k or something?)
  • Timestamp?
  • Thread ID?

When Lix crashes via terminate, we could print out the base64 of the ring buffer (in the proper order but not deframed).
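A sketch of the dump side, assuming the ring is a flat byte buffer with a write index and a flag saying whether it has wrapped. `linearize` and `base64Encode` are hypothetical helper names; a handler installed with `std::set_terminate` could call them and write the result to stderr.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Put the ring's bytes in oldest-to-newest order without interpreting frames.
// Once the buffer has wrapped, the oldest byte sits at the write index.
std::vector<unsigned char> linearize(const std::vector<unsigned char>& buf,
                                     std::size_t writeIndex, bool wrapped)
{
    const auto split = buf.begin() + static_cast<std::ptrdiff_t>(writeIndex);
    std::vector<unsigned char> out;
    if (wrapped) out.insert(out.end(), split, buf.end());
    out.insert(out.end(), buf.begin(), split);
    return out;
}

// Standard base64 alphabet with '=' padding.
std::string base64Encode(const std::vector<unsigned char>& in)
{
    static constexpr char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);
    std::size_t i = 0;
    for (; i + 3 <= in.size(); i += 3) {
        const std::uint32_t v = (std::uint32_t(in[i]) << 16)
                              | (std::uint32_t(in[i + 1]) << 8)
                              | std::uint32_t(in[i + 2]);
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += tbl[(v >> 6) & 63];
        out += tbl[v & 63];
    }
    const std::size_t rem = in.size() - i;
    if (rem == 1) {
        const std::uint32_t v = std::uint32_t(in[i]) << 16;
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += "==";
    } else if (rem == 2) {
        const std::uint32_t v = (std::uint32_t(in[i]) << 16)
                              | (std::uint32_t(in[i + 1]) << 8);
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += tbl[(v >> 6) & 63];
        out += '=';
    }
    return out;
}
```

Since the dump is in order but not deframed, the offline tool only has to base64-decode it and walk backwards from the end using the length trailers.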

Then we could write a debug tool in Rust or GNU poke or something to pull the frames apart.

I discussed this with @lunaphied who was thinking about doing some of it.

lunaphied was assigned by jade 2025-03-13 18:36:13 +00:00
Reference: lix-project/lix#734