Incremental reformatting has significant costs #948
Labels
No labels
Affects/CppNix
Affects/Nightly
Affects/Only nightly
Affects/Stable
Area/build-packaging
Area/cli
Area/evaluator
Area/fetching
Area/flakes
Area/language
Area/lix ci
Area/nix-eval-jobs
Area/profiles
Area/protocol
Area/releng
Area/remote-builds
Area/repl
Area/repl/debugger
Area/store
bug
Context
contributors
Context
drive-by
Context
maintainers
Context
RFD
crash 💥
Cross Compilation
devx
docs
Downstream Dependents
E/easy
E/hard
E/help wanted
E/reproducible
E/requires rearchitecture
imported
Language/Bash
Language/C++
Language/NixLang
Language/Python
Language/Rust
Needs Langver
OS/Linux
OS/macOS
performance
regression
release-blocker
stability
Status
blocked
Status
invalid
Status
postponed
Status
wontfix
testing
testing/flakey
Topic/Large Scale Installations
ux
No milestone
No project
No assignees
4 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: lix-project/lix#948
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
This is a summary of my position on a Matrix discussion between @weethet, @raito, @cobaltcause, @qyriad, and I.
The current Lix codebase is not consistently formatted. There is a pre‐commit hook that formats only changed parts of files but CI does not check formatting.
Partial formatting has costs:
The codebase is in an inconsistent format, with the usual costs to readability.
Pre‐commit hooks can be intrusive to workflows, as they can cause creating draft commits to be a blocking or even fallible operation.
Pre‐commit hooks don’t even necessarily run: users of Jujutsu,
git revise
,git-branchless
, or even the upstreamgit replay
command will not get pre‐commit hooks run, as they are unsupported by design – because pre‐commit hooks are impossible to run in‐memory and depend on expensive, mutating checkouts. I know there are several Lix contributors who use Jujutsu andgit revise
.In combination with the lack of CI enforcement, this means that there is no guarantee of monotonic formatting progress, and contributors such as myself who do not want to send egregious diffs but whose workflow does not work with partial formatting may have to put avoidable manual toil into code formatting.
(This is not to say people who like and can use pre‐commit hooks shouldn’t be able to use them. I’m fine with the pre‐commit hooks existing regardless of any of this; I just can’t use them and really didn’t want to even when I could.)
LSPs, editors with save‐on‐format functionality,
treefmt
, andjj fix
don’t reliably support range formatting. The interface surface of something that has to rewrite diffs in context is much more complicated, and there is an inherent performance cost to computing diffs and rebasing patches in a snapshot‐based VCS, so the set of tooling that does support this scheme is a small subset of the set of tooling that can handle automated formatting. Although the Lix repository usestreefmt
, it (surprisingly, to me) does not format the C++ code for this reason. In general, users are much more likely to have issues contributing formatted code, and resolving conflicts caused by formatting.It will take indefinitely long for the entire codebase to become fully formatted, especially in the absence of any enforcement of formatting of changed hunks. (Enforcement, on the other hand, would remove the current fallback option available to people with tooling issues of just ignoring the formatting style.)
Full formatting has costs:
Conflicts will be caused when a change from before formatting is rebased, cherry‐picked, or reverted.
A
.git-blame-ignore-revs
would need to be configured, andgit blame
can still show formatting commits in the presence of significant whitespace churn.There is a flag day where changes acceptable to submit before would not be acceptable to submit after (though there would be at the end of the partial formatting process, too, and Lix already requires changes to be based on the latest HEAD to submit).
There were some other potential costs of full formatting that were raised on Matrix, but I think they were all addressed (e.g. difficulty backporting changes to stable branches does not seem like a concern, as it would be easy and safe to format stable branches simultaneously).
I believe that the costs of partial formatting outweigh the costs of full formatting:
Conflicts caused solely by formatting are by definition trivially mechanically resolvable. This can be done by a Git mergetool,
jj fix
, or a rebase script like Nixpkgs has.This is a fully‐reliable process (you need only “format both sides”: format the tree before a change, format the tree after a change, and take the diff, and you have successfully rebased a patch across a reformat), so formatting changes alone can never cause more than one additional command’s worth of friction to a merge, rebase, or cherry‐pick; any remaining conflicts will be ones that would exist regardless of reformatting.
This also means that (Gerrit integration surface permitting) the web‐based rebase button that is already required to avoid manual CLI work to submit changes that are not based on the latest HEAD could be augmented to trivially resolve any formatting issues for changes proposed before a tree‐wide reformatting.
The tooling issues caused by partial formatting, on the other hand, are much less trivial to resolve than running a single command.
git blame
is an invaluable tool, but the development shell can set up the.git-blame-ignore-revs
in the same way it currently sets up pre‐commit hooks. It experienced an unfortunate slight reduction in convenience in Nixpkgs even in the presence of.git-blame-ignore-revs
, but the style change there was drastic, involving touching the majority of the repository, drastically shifting the indentation of many definitions, and causing many lines to be split and merged. From what I have seen of theclang-format
format used by Lix, the style change would be much less drastic, and it would therefore be much less common forgit blame
to have any issues attributing lines. This could of course be tested before such a reformat.Based on the discussion, I believe the costs of partial formatting have been considerably underestimated and the costs of full formatting have been considerably overestimated.
(FWIW, Nix recently did a tree‐wide reformat.)
(Not in favor of this proposal but will do a more thorough answer later once time permits.)
Here's the diffstat of a full formatting for reference starting at revision a4717e1315ab62e089e912c2b95b1461f8e49680.
the current autoformatter config for c++ is a best approximation of a reasonable style and produces extremely distateful results in a number of corner cases we have in the codebase[1]. applying it across the entire tree in bulk to make a few people's life easier because their tools of choice do not have full support for our development setup is a very, very skewed proposal. if we do anything here we should disable the formatting check in ci, and maybe the pre-commit hook for c++ formatting as well.
clarification [1]: clang-format simply does not support a number of things we'd need it to do to not produce nonsense when nobody is looking. that alone is reason enough to NACK a bulk format.
Adding on @pennae's comment:
Looking at the reformatting output for a while.
// NOLINT
aware meaning that manual fixups will be required on the top of that bulk reformat, their cost is unknown.#ifdef
is not great in certain cases and makes things unreadable, e.g. AutoCloseFD flags assembling.It's plausible that more of the clang-format output cannot be taken at face value without manual verification.
In general, I would like to generalize the following:
The codebase is an inconsistent state. With or without formatting, this changes nothing in my eyes.
Moving this particular point aside, the rest looks like DX tool integration, Lix is first and foremost a codebase meant for
git
, yes, it doesn't work super well withgit-revise
, butgit-clang-format
can operate to reformat automatically viagit rebase -i --exec
.Picking a certain setup will probably increase toil, and I'm more than willing to assist on decreasing it, but I think bulk formatting is the thing that would add too much to my overloaded plate.
Therefore, I reiterate that I'm against this change.
Interested in the opinions of @jade on this.
I would love to have a code formatter consistently applied, but in spite of my best efforts as the author of the clang format configuration, it still kind of sucks! I think a proposal to improve the state of things requires triaging the egregiously bad formatting instances and figuring out why they happened and if we can fix them.
If we can't fix them then we should add excludes to the relevant code blocks and move on with life. The suggestion that it's a NOLINT problem is incorrect: there is a disable comment, it's just not called NOLINT.
The partial format situation sucks quite a bit and I've been annoyed at it a lot. it's better than no formatter to some degree because it means we're not adding more diff to an eventual proper reformat run, but it is really frustrating to use with jj. Fundamentally it's a stepping stone to just running clang-format and I think we're being too conservative about the amount of bother it is to do that. But also I've been kind of unimpressed with the output quality of clang format. idk.
I think you'd get warmer reception if clang-format were anywhere near as good and well behaved as either nixfmt, fourmolu, or rustfmt. But clang-format makes several utterly baffling alignment choices that are seemingly impossible to turn off and it thus sucks. I wonder how much it would be annoying to fix those upstream.
(It would have been nice if “the
clang-format
style is bad and we are automatically setting up tooling that makes the code worse” had been mentioned a single time as an objection during the long Matrix discussion, as it would have saved me the trouble of writing up this issue and I could have opened a simpler one to propose disabling the formatter entirely.)the nolint problem isn't that clang-format can't be disabled selectively, it's that it sometimes shuffles nolint comments and thus breaks them
@emilazy wrote in #948 (comment):
This is a larger problem which is that not everyone holds this view and the people who are cranky about clang format or remembered well that original discussion and testing a year ago weren't looking at matrix at the time (or literally weren't in the channel at all) or so, so it's not just that "nobody told you" but "the ppl holding the opinions were different ppl than were on matrix".
I didn't realize the incremental formatting tooling was especially objectionable at all, so that part is news to me! I did know that part of the reason we hadn't fully applied it was that the style made code illegible sometimes.