RFD: Structured interpolation via quasi-quoters and AST representation structures #835
4 participants
Reference: lix-project/lix#835
Firstly, this idea is not mine; all credit goes to @pennae and @delroth, who came up with these ideas (independently, in addition, so it must be good :P). (oopsie miscommunications) This issue is a way to keep track of this milestone goal.
Problem
The legacy `${ … }` interpolation conflates coercion and concatenation, which […] 3.141592653589793…);
For example, users would like `let p = 2516; in "${p}"` to work out of the box without having to write `toString p` all the time; see https://gerrit.lix.systems/c/lix/+/3191 for an attempt to solve this.
Design
Introduce a format-AST with a quasi-quoter, better known from Lisp or Lean 4:
- `format ⟨…⟩` is parsed into the AST; no coercion yet.
- […] (`%d`, `%.3g`, `%x`, …).
- […] `string` happens only when a true string is demanded.
Migration
`${ … }` remains supported as-is.
CppNix divergence
This feature is a clear departure from CppNix's syntax.
Action Items
- […] `format ⟨…⟩` surface syntax and the supported printf subset.
I'm unconvinced that this is the right direction (and I think I might have miscommunicated at some point because this isn't really something I've considered).
To go over your problem statement:
- `${}` is in fact fairly restrictive about what it accepts (only supporting string/path/external/attrs that have `__toString`); there's no reason this couldn't be split off into a separate code path while keeping 100% compatibility (or, I don't know, change the already bad `bool coerceMore` to an `enum class CoercionMode`).
- `${}` can be any Nixlang expression returning a string. You can extend the behavior by just calling a function if you need to.
I fail to see what makes for example:
Better than:
Which already works now, is arguably more readable, and does not introduce more syntax. (Assuming a `format` function which supports properly formatting floats. That's an orthogonal problem imo.)
When I originally ranted about what ended up being https://gerrit.lix.systems/c/lix/+/3191 I was mostly annoyed at the case of `${}` with integers. This is a case that has a clear, unambiguous coercion to string, with no data loss. Coercing integers to strings in the context of `${}` templating is also imo not something that makes Nixlang "more weakly typed", not any more than having a `toString` function which accepts different types of arguments, and that's 1. the case; 2. the current main alternative to `${anIntVariable}` anyway.
Oops, no, it doesn't already work now, because coercing ints with `${}` doesn't work. For the 3rd time in a week I would have had an eval failure because I didn't write `${builtins.toString (length xs)}` if this was real code.
this isn't just about formatting strings, it's about the general ability to represent things more cleanly. formatting is one example of this. another is a more principled way to specify build instructions (not that "lol everything gets chucked into a bash script by string replacement" was easy to make any worse).
the root problem we want to solve is that string coercions are a complete trap, especially when combined with `toString` (due to it formatting `null` and `false` as `""`, `true` as `1`, and arrays as space-separated concatenations of their elements). if we're going to change interpolation behavior at all it should be principled and thought out from the beginning, not just tack on more hacks to an already broken system. a formatting system with sensible (i.e., not terminally bash-brained) behavior cannot begin with the current interpolation system if you already refuse to write `${toString x}` instead of `${x}` because the stringification rules of the language are complete garbage. building a new system on quasiquotes is not necessary, but it sure is convenient because it allows us to implement the actual formatting behavior in a place that isn't another broken builtin.
Generally I am all for the proposal!
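For anyone who has not run into these cases, the `toString` behaviour criticised above can be checked in `nix repl`. This is only a minimal sketch: the attribute names are labels made up for this illustration, and the plain `"${2516}"` case is left as a comment because it aborts evaluation instead of producing a value.

```nix
{
  # null and false both collapse to the empty string.
  nullToString = builtins.toString null;       # ""
  falseToString = builtins.toString false;     # ""

  # true becomes "1", not "true".
  trueToString = builtins.toString true;       # "1"

  # Lists are joined with single spaces.
  listToString = builtins.toString [ 1 2 3 ];  # "1 2 3"

  # Integers need an explicit toString inside ${}; writing "${2516}"
  # directly fails with a coercion error, as discussed in this thread.
  intInterpolated = "${builtins.toString 2516}";  # "2516"
}
```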
Formatting was always a source of massive frustration in the language for me (whenever I ran into it, which isn't too often, but the times I did it was just… urgh).
And the idea to introduce AST features, paving the way for future improvements in that direction, while not necessary, is convenient as stated before.
Some unqualified thoughts hidden behind a spoiler and disclaimer:
Big disclaimer: I am not a language person, don't take my word for anything I say here, feel free to ignore
I wanted to split this into a semantics and syntax section, but somehow I failed, so here goes a mix of both, in a horrible back and forth mix of several different stances on different parts of the proposal.
However starting off, is the syntax presented in the Design section of the OP an example of what it could look like or a solid syntax proposal (or something otherwise agreed upon)?
If it is more than a suggestion already then please read the following as an opinion only.
I will readily admit that the syntax is unfamiliar to me so my concerns are probably amplified by that, meaning I'm not a good baseline for any serious criticism.
Anyway, taking this example from the OP:
(Note that I am largely ignoring the second format in the OP because I can't wrap my head around what that would do or how its syntax works; if someone had a cohesive explanation of that specific bit, I think it would also answer most of my uncertainties below.)
What I'm reading here (as someone unfamiliar with language development) is:
- […] `"items: "`, `2`, `": "`, and `"a, b"` (for an `xs` of `[ "a" "b" ]`)
- […] `[ length xs ]` and `[ (length xs) ]` in regular code except for parens (or using a binding of sorts); I would assume that the inner parens are not special, but more on that below
- `format` is either a keyword or a builtin which takes a value of type AST (and could potentially even be written as `builtins.format`)
- […] `format` keyword followed by quasi-quotation, and any use other than this is a syntax error (i.e. `format` is not standalone, and in absence of any other keywords the quasi-quotation is also not usable by itself)
Given this I, as a potential user, would be a bit confused and consider it somewhat unintuitive.
If the last bullet point applies I'd prefer a syntax like this for instance:
While seeing this bothers me personally because I can't put the parens after a newline, that'd be a me-problem.
More importantly however it would indicate that the combination of `format` and the rest of it make up a value and they are not separable.
On the other hand if `format` is a builtin or keyword which is independent (although not really usable without an AST built from quasi-quotation), which the space separation indicates to me, then I would expect this to work:
This also is much more in line with what little I've seen from LISP in that 2 minute glance (though LISP is a bit special in how code is just data).
Note also that I put parens around `foo`.
As mentioned earlier, reading the OP I am not sure how these come into play exactly but I have some assumptions.
However, as an alternative to that, and maybe it's just because I've worked with it recently, but I feel like the maud library (Rust) has interesting syntax which could be applicable here.
If this feature does introduce quasi-quotation as a general feature (and not strictly tied to `format`), regardless of whether it is otherwise possible to use, it would be good to get that part right from the start.
It would be awful if we ended up with several different quasi-quotation syntaxes depending on whether you format things or use some other feature which may use quasi-quotation later (`tryEval` comes to mind).
Basically the syntax there (in maud) means that anything written as-is is taken as a token as-is, but surrounding it with parens makes it an expression.
So in the quasi-quotation syntax here `foo` could refer to a `foo` token while `(foo)` would evaluate (lazily I assume) the local `foo` and use its value as part of the quasi-quotation AST thingy.
This would also mean that the syntax inside the backtick-parens is not necessarily purely literal, opening up possibilities like this:
Although I'm now straying pretty far from the proposal, this would generalize a lot of things.
If we then also changed the backtick-parens to backtick-brackets to be in line with list syntax then it would be pretty clear that this is a list of tokens, and since formatting would be syntax inside quasi-quoting we wouldn't need a format builtin, but rather just a concat one which concats an AST-list of strings into a single string.
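A rough sketch of that last idea in today's Nixlang, assuming the quasi-quoted list is modelled as an ordinary list of tagged nodes. The helper names `lit`, `int`, `render` and `concat` are hypothetical stand-ins for this illustration only; they are not existing builtins and not the syntax proposed in the OP.

```nix
let
  # Hypothetical node constructors standing in for quasi-quoted tokens.
  lit = text: { type = "lit"; inherit text; };
  # An integer node rejects anything that is not an int, so null, bools
  # and lists cannot be silently coerced into the output.
  int = value: assert builtins.isInt value; { type = "int"; inherit value; };

  # Coercion to string happens here, per node, and only when rendering.
  render = node:
    if node.type == "lit" then node.text
    else if node.type == "int" then builtins.toString node.value
    else throw "unsupported node type: ${node.type}";

  # The "concat" mentioned above: turn the list of nodes into one string.
  concat = parts: builtins.concatStringsSep "" (map render parts);

  xs = [ "a" "b" ];
in
concat [
  (lit "items: ")
  (int (builtins.length xs))
  (lit ": ")
  (lit (builtins.concatStringsSep ", " xs))
]
# evaluates to "items: 2: a, b"
```

Because the parts stay structured until `concat` is called, the rules for each node type live in one place (`render` here) rather than in the interpolation machinery.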
However, there are "drawbacks" such as requiring literals for the format specifier. Even ignoring the really broken formatting options, I don't think runtime format options within Nix code are that important, but that's not my area of expertise (none of this is, I cannot overstate that enough).
In general though, this looks pretty neat to me: