Locale for the daemon on macOS is plausibly wrong #511

Open
opened 2024-09-10 04:27:02 +00:00 by jade · 1 comment
Owner

It is quite plausible that the daemon is accidentally running in the C locale on macOS: LC_CTYPE is set by Terminal.app but possibly not by launchd. Someone needs to check that the Lix daemon on macOS has a correct locale when started by launchd.
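For whoever picks this up, here is a rough sketch of the check, in Python for brevity. The helper names `effective_ctype` and `is_utf8_locale` are made up for illustration and are not part of Lix. The first applies the POSIX precedence (LC_ALL, then LC_CTYPE, then LANG, with empty values treated as unset) to an environment mapping; the second guesses from the locale name alone whether the codeset is UTF-8. Capturing the daemon's actual environment under launchd is not covered here.

```python
import os


def effective_ctype(env):
    """Return the locale name POSIX would select for LC_CTYPE.

    Precedence: LC_ALL, then LC_CTYPE, then LANG. Empty values count
    as unset. With nothing set, the result is the "C" locale.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"


def is_utf8_locale(name):
    """Heuristic: does the locale name advertise a UTF-8 codeset?

    Handles "en_US.UTF-8", "C.UTF-8", "en_US.UTF-8@euro", and the bare
    "UTF-8" name that macOS accepts for LC_CTYPE.
    """
    base = name.partition("@")[0]        # drop any @modifier
    codeset = base.rpartition(".")[2]    # whole name if there is no dot
    return codeset.lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    name = effective_ctype(os.environ)
    print(name, "UTF-8" if is_utf8_locale(name) else "not UTF-8")
```

Run from inside the daemon's environment, this would show whether launchd hands it anything other than C. It only inspects names, so it cannot tell whether the named locale is actually installed.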

Running with a non-UTF-8 locale can cause all kinds of weirdness. POSIX appears to state that strcasecmp (which we use for casehack verification) depends on LC_CTYPE, which may not be set in our launchd plists, so casehack may be broken for Unicode filenames.
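To illustrate the concern, here is a small model in Python. It is not Lix's code and does not call strcasecmp. The hypothetical `ascii_casecmp_equal` stands in for a comparison that only folds ASCII letters, which is the assumption about how strcasecmp behaves in the C locale. It is compared against Unicode-aware case folding.

```python
def ascii_casecmp_equal(a: bytes, b: bytes) -> bool:
    """Case-insensitive compare that folds only ASCII A-Z.

    bytes.lower() leaves every non-ASCII byte untouched, so this is a
    stand-in for an ASCII-only comparison over UTF-8 encoded names.
    """
    return a.lower() == b.lower()


def unicode_casefold_equal(a: str, b: str) -> bool:
    """Case-insensitive compare using Unicode case folding."""
    return a.casefold() == b.casefold()


upper = "\u00c9TUDE"   # "ÉTUDE" with precomposed É
lower = "\u00e9tude"   # "étude" with precomposed é

# ASCII-only folding does not see these two names as a case collision...
print(ascii_casecmp_equal(upper.encode("utf-8"), lower.encode("utf-8")))
# ...while Unicode case folding does.
print(unicode_casefold_equal(upper, lower))
```

If the filesystem treats two such names as the same file and the check does not, the collision goes undetected until extraction.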

I expect we inevitably already have a few round-tripping corruption bugs with UTF-8 filenames, at least on HFS+ where filenames are stored in NFD (this implies that the case hack is bork; we already know it is bork, but this is a new variant of bork, see #332).

Fundamentally you cannot extract NARs on every system and expect them to work, because the filesystems may have wildly different opinions about what a filename means Unicode-wise, and the only way to find out is to fuck around (namely, eat a syscall error). Case sensitivity is merely the most obvious instance in the English-speaking world.
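A minimal sketch of "eat a syscall error", in Python rather than Lix's actual extraction code. The hypothetical helper `create_exclusive` creates each entry with O_CREAT | O_EXCL, so that any name the filesystem considers identical to an existing one surfaces as EEXIST instead of silently overwriting.

```python
import os
import tempfile


def create_exclusive(directory, name):
    """Create `name` in `directory`, failing if the filesystem thinks it exists.

    Returns True if the file was created, and False if the filesystem
    reported EEXIST (which Python raises as FileExistsError).
    """
    path = os.path.join(directory, name)
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True


if __name__ == "__main__":
    # Three names that differ only in normalization form or case.
    # Which of them collide depends entirely on the filesystem.
    with tempfile.TemporaryDirectory() as d:
        for name in ["caf\u00e9", "cafe\u0301", "CAF\u00c9"]:
            print(repr(name), create_exclusive(d, name))
```

The results differ between, say, a case-sensitive Linux filesystem and a case-insensitive APFS volume, which is the point: the answer is only available by asking the filesystem.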

HFS+ stores filenames in not-quite-NFD (https://mjtsai.com/blog/2017/03/24/apfss-bag-of-bytes-filenames/). APFS is normalization-preserving but keys on normalized filenames [1], and Apple tried to hide this by messing with the Cocoa APIs (which means we are unaffected by Apple trying to normalize our filenames for us, but we are affected by multiple normalizations colliding).

[1]: filenames are looked up by the hash of their normalized (and, if the volume is case-insensitive, case-folded) form, so you cannot have two different normalization forms of the same filename

https://eclecticlight.co/2021/05/08/explainer-unicode-normalization-and-apfs/
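A sketch of how colliding normalizations could be detected ahead of time, in Python. It uses standard Unicode NFD as an approximation; HFS+ and APFS each use their own variant, so this is not an exact model of either. The function name `normalization_collisions` is hypothetical.

```python
import unicodedata


def normalization_collisions(names):
    """Group names that become identical after NFD normalization.

    Returns a list of groups (each a list of the original names, in
    input order) that contain more than one distinct spelling.
    """
    groups = {}
    for name in names:
        key = unicodedata.normalize("NFD", name)
        groups.setdefault(key, []).append(name)
    return [group for group in groups.values() if len(group) > 1]


nfc = "caf\u00e9"    # precomposed é (U+00E9)
nfd = "cafe\u0301"   # e followed by combining acute accent (U+0301)

# The two strings differ, yet a normalization-keyed filesystem treats
# them as the same directory entry.
print(nfc == nfd)
print(normalization_collisions([nfc, nfd, "plain"]))
```

A NAR containing both spellings in one directory is valid on a byte-oriented filesystem and cannot be extracted faithfully on one that keys on normalized names.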

jade added the bug label 2024-09-10 04:27:02 +00:00
Member

This issue was mentioned on Gerrit on the following CLs:

  • comment in cl/1973 ("Set an UTF-8 compatible locale by default")
Reference: lix-project/lix#511