bbjubjub2494/lix

Author	SHA1	Message	Date
eldritch horrors	86a1121d16	use byte indexed locations for PosIdx we now keep not a table of all positions, but a table of all origins and their sizes. position indices are now direct pointers into the virtual concatenation of all parsed contents. this slightly reduces memory usage and time spent in the parser, at the cost of not being able to report positions if the total input size exceeds 4GiB. this limit is not unique to nix though, rustc and clang also limit their input to 4GiB (although at least clang refuses to process inputs that are larger, we will not). this new 4GiB limit probably will not cause any problems for quite a while, all of nixpkgs together is less than 100MiB in size and already needs over 700MiB of memory and multiple seconds just to parse. 4GiB worth of input will easily take multiple minutes and over 30GiB of memory without even evaluating anything. if problems do arise we can probably recover the old table-based system by adding some tracking to Pos::Origin (or increasing the size of PosIdx outright), but for the time being this looks like more complexity than it's worth. since we now need to read the entire input again to determine the line/column of a position we'll make unsafeGetAttrPos slightly lazy: mostly the set it returns is only used to determine the file of origin of an attribute, not its exact location. the thunks do not add measurable runtime overhead. notably this change is necessary to allow changing the parser since apparently nothing supports nix's very idiosyncratic line ending choice of "anything goes", making it very hard to calculate line/column positions in the parser (while byte offsets are very easy). (cherry picked from commit 5d9fdab3de0ee17c71369ad05806b9ea06dfceda) Change-Id: `Ie0b2430cb120c09097afa8c0101884d94f4bbf34`	2024-03-18 16:12:46 +01:00
eldritch horrors	b221a14f0a	Merge pull request #9925 from 9999years/fmt-cleanup Cleanup `fmt.hh` (cherry picked from commit 47a1dbb4b8e7913cbb9b4d604728b912e76e4ca0) Change-Id: `Id076a45cb39652f437fe3f8bda10c310a9894777`	2024-03-09 07:00:13 -07:00
eldritch horrors	08252967a8	libexpr: Support structured error classes While preparing PRs like #9753, I've had to change error messages in dozens of code paths. It would be nice if instead of EvalError("expected 'boolean' but found '%1%'", showType(v)) we could write TypeError(v, "boolean") or similar. Then, changing the error message could be a mechanical refactor with the compiler pointing out places the constructor needs to be changed, rather than the error-prone process of grepping through the codebase. Structured errors would also help prevent the "same" error from having multiple slightly different messages, and could be a first step towards error codes / an error index. This PR reworks the exception infrastructure in `libexpr` to support exception types with different constructor signatures than `BaseError`. Actually refactoring the exceptions to use structured data will come in a future PR (this one is big enough already, as it has to touch every exception in `libexpr`). The core design is in `eval-error.hh`. Generally, errors like this: state.error("'%s' is not a string", getAttrPathStr()) .debugThrow<TypeError>() are transformed like this: state.error<TypeError>("'%s' is not a string", getAttrPathStr()) .debugThrow() The type annotation has moved from `ErrorBuilder::debugThrow` to `EvalState::error`. (cherry picked from commit c6a89c1a1659b31694c0fbcd21d78a6dd521c732) Change-Id: `Iced91ba4e00ca9e801518071fb43798936cbd05a`	2024-03-09 04:47:05 -07:00
eldritch horrors	92693973b6	decouple parser and EvalState there's no reason the parser itself should be doing semantic analysis like bindVars. split this bit apart (retaining the previous name in EvalState) and have the parser really do only parsing, decoupled from EvalState. (cherry picked from commit b596cc9e7960b9256bcd557334d81e9d555be5a2) Change-Id: `I481a7623afc783e9d28a6eb4627552cf8a780986`	2024-03-09 00:25:54 -07:00
eldritch horrors	f9f8664879	rename ParserState::{makeCurPos -> at} most instances of this being used do not refer to the "current" position, sometimes not even to one reasonably close by. it could also be called `makePos` instead, but `at` seems clear in context. (cherry picked from commit 835a6c7bcfd0b22acc16f31de5fc7bb650d52017) Change-Id: `I17cab8a6cc14cac5b64624431957bfcf04140809`	2024-03-09 00:25:54 -07:00
eldritch horrors	e1cd0077f3	move ParseData to own header, rename to ParserState ParserState better describes what this struct really is. the parser really does modify its state (most notably position and symbol tables), so calling it that rather than obliquely "data" (which implies being input only) makes sense. (cherry picked from commit 007605616477f4f0d8a0064c375b1d3cf6188ac5) Change-Id: `I92feaec796530e1d4d0f7d4fba924229591cea95`	2024-03-09 00:25:54 -07:00
eldritch horrors	1342c8f18e	Merge pull request #10074 from lf-/jade/ban-implicit-fallthrough Warn on implicit switch case fallthrough (cherry picked from commit 21282c3c204597641402c6bcff8fc9ee7bc31fa1) Change-Id: `I5ebbdfb6c037d2c55254f37dd391c07c2ce7443e`	2024-03-07 00:11:12 -07:00
eldritch horrors	dd180911d8	Merge pull request #9582 from pennae/misc-opts a packet of small optimizations (cherry picked from commit ee439734e924eb337a869ff2e48aff8b989198bc) Change-Id: `I125d870710750a32a0dece48f39a3e9132b0d023`	2024-03-04 07:32:31 +01:00
Yingchi Long	3c90340fe6	libexpr: use `thread_local` to make the parser thread-safe If we call `adjustLoc`, the global variable `prev_yylloc` is shared between threads and racy. Currently, nix itself does not parse files concurrently, but this is helpful for libexpr users. (The parser is thread-safe except for this.)	2023-07-03 16:05:43 +08:00
Eelco Dolstra	27ebb97d0a	Handle EOFs in string literals correctly We can't return a STR token without setting a valid StringToken, otherwise the parser will crash. Fixes #6562.	2022-05-25 17:58:13 +02:00
pennae	6526d1676b	replace most Pos objects/ptrs with indexes into a position table Pos objects are somewhat wasteful as they duplicate the origin file name and input type for each object. on files that produce more than one Pos when parsed this is a sizeable waste of memory (one pointer per Pos). the same goes for ptr<Pos> on 64 bit machines: parsing enough source to require 8 bytes to locate a position would need at least 8GB of input and 64GB of expression memory. it's not likely that we'll hit that any time soon, so we can use a uint32_t index to locate positions instead.	2022-04-21 21:46:06 +02:00
Sergei Trofimovich	9174d884d7	lexer: add error location to lexer errors Before the change lexer errors did not report the location: $ nix build -f. mc error: path has a trailing slash (use '--show-trace' to show detailed location information) Note that it's not clear what file generates the error. After the change location is reported: $ src/nix/nix --extra-experimental-features nix-command build -f ~/nm mc error: path has a trailing slash at .../pkgs/development/libraries/glib/default.nix:54:18: 53| }; 54| src = /tmp/foo/; | ^ 55| (use '--show-trace' to show detailed location information) Here we see both problematic file and the string itself.	2022-03-24 08:16:14 +00:00
pennae	0a7746603e	remove ExprIndStr it can be replaced with StringToken if we add another bit of information to StringToken, namely whether this string should take part in indentation scanning or not. since all escaping terminates indentation scanning we need to set this bit only for the non-escaped IND_STRING rule. this improves performance by about 1%. before nix search --no-eval-cache --offline ../nixpkgs hello Time (mean ± σ): 8.880 s ± 0.048 s [User: 6.809 s, System: 1.643 s] Range (min … max): 8.781 s … 8.993 s 20 runs nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix Time (mean ± σ): 375.0 ms ± 2.2 ms [User: 339.8 ms, System: 35.2 ms] Range (min … max): 371.5 ms … 379.3 ms 20 runs nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system' Time (mean ± σ): 2.831 s ± 0.040 s [User: 2.536 s, System: 0.225 s] Range (min … max): 2.769 s … 2.912 s 20 runs after nix search --no-eval-cache --offline ../nixpkgs hello Time (mean ± σ): 8.832 s ± 0.048 s [User: 6.757 s, System: 1.657 s] Range (min … max): 8.743 s … 8.921 s 20 runs nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix Time (mean ± σ): 367.4 ms ± 3.2 ms [User: 332.7 ms, System: 34.7 ms] Range (min … max): 364.6 ms … 374.6 ms 20 runs nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system' Time (mean ± σ): 2.810 s ± 0.030 s [User: 2.517 s, System: 0.225 s] Range (min … max): 2.742 s … 2.854 s 20 runs	2022-01-19 13:39:42 +01:00
pennae	72f42093e7	optimize unescapeStr mainly to avoid an allocation and a copy of a string that can be modified in place (ever since EvalState holds on to the buffer, not the generated parser itself). # before Benchmark 1: nix search --offline nixpkgs hello Time (mean ± σ): 571.7 ms ± 2.4 ms [User: 563.3 ms, System: 8.0 ms] Range (min … max): 566.7 ms … 579.7 ms 50 runs Benchmark 2: nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix Time (mean ± σ): 376.6 ms ± 1.0 ms [User: 345.8 ms, System: 30.5 ms] Range (min … max): 374.5 ms … 379.1 ms 50 runs Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system' Time (mean ± σ): 2.922 s ± 0.006 s [User: 2.707 s, System: 0.215 s] Range (min … max): 2.906 s … 2.934 s 50 runs # after Benchmark 1: nix search --offline nixpkgs hello Time (mean ± σ): 570.4 ms ± 2.8 ms [User: 561.3 ms, System: 8.6 ms] Range (min … max): 564.6 ms … 578.1 ms 50 runs Benchmark 2: nix eval -f ../nixpkgs/pkgs/development/haskell-modules/hackage-packages.nix Time (mean ± σ): 375.4 ms ± 1.3 ms [User: 343.2 ms, System: 31.7 ms] Range (min … max): 373.4 ms … 378.2 ms 50 runs Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system' Time (mean ± σ): 2.925 s ± 0.006 s [User: 2.704 s, System: 0.219 s] Range (min … max): 2.910 s … 2.942 s 50 runs	2022-01-13 18:06:15 +01:00
pennae	61a9d16d5c	don't strdup tokens in the lexer every stringy token the lexer returns is turned into a Symbol and not used further, so we don't have to strdup. using a string_view is sufficient, but due to limitations of the current parser we have to use a POD type that holds the same information. gives ~2% on system build, 6% on search, 8% on parsing alone # before Benchmark 1: nix search --offline nixpkgs hello Time (mean ± σ): 610.6 ms ± 2.4 ms [User: 602.5 ms, System: 7.8 ms] Range (min … max): 606.6 ms … 617.3 ms 50 runs Benchmark 2: nix eval -f hackage-packages.nix Time (mean ± σ): 430.1 ms ± 1.4 ms [User: 393.1 ms, System: 36.7 ms] Range (min … max): 428.2 ms … 434.2 ms 50 runs Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system' Time (mean ± σ): 3.032 s ± 0.005 s [User: 2.808 s, System: 0.223 s] Range (min … max): 3.023 s … 3.041 s 50 runs # after Benchmark 1: nix search --offline nixpkgs hello Time (mean ± σ): 574.7 ms ± 2.8 ms [User: 566.3 ms, System: 8.0 ms] Range (min … max): 569.2 ms … 580.7 ms 50 runs Benchmark 2: nix eval -f hackage-packages.nix Time (mean ± σ): 394.4 ms ± 0.8 ms [User: 361.8 ms, System: 32.3 ms] Range (min … max): 392.7 ms … 395.7 ms 50 runs Benchmark 3: nix eval --raw --impure --expr 'with import <nixpkgs/nixos> {}; system' Time (mean ± σ): 2.976 s ± 0.005 s [User: 2.757 s, System: 0.218 s] Range (min … max): 2.966 s … 2.990 s 50 runs	2022-01-13 18:06:14 +01:00
Eelco Dolstra	81e7c40264	Optimize primop calls We now parse function applications as a vector of arguments rather than as a chain of binary applications, e.g. 'substring 1 2 "foo"' is parsed as ExprCall { .fun = <substring>, .args = [ <1>, <2>, <"foo"> ] } rather than ExprApp (ExprApp (ExprApp <substring> <1>) <2>) <"foo"> This allows primops to be called immediately (if enough arguments are supplied) without having to allocate intermediate tPrimOpApp values. On $ nix-instantiate --dry-run '<nixpkgs/nixos/release-combined.nix>' -A nixos.tests.simple.x86_64-linux this gives a substantial performance improvement: user CPU time: median = 0.9209 mean = 0.9218 stddev = 0.0073 min = 0.9086 max = 0.9340 [rejected, p=0.00000, Δ=-0.21433±0.00677] elapsed time: median = 1.0585 mean = 1.0584 stddev = 0.0024 min = 1.0523 max = 1.0623 [rejected, p=0.00000, Δ=-0.20594±0.00236] because it reduces the number of tPrimOpApp allocations from 551990 to 42534 (i.e. only a small minority of primop calls are partially applied) which in turn reduces time spent in the garbage collector.	2021-11-04 15:03:40 +01:00
Taeer Bar-Yam	f14660d5e2	reset yylloc when yyless(0) is called	2021-09-29 19:47:01 -04:00
Taeer Bar-Yam	8f9429dcab	add antiquotations to paths	2021-08-06 06:46:05 -04:00
Pamplemousse	99f8fc995b	libexpr: Fix read out-of-bound on the heap Signed-off-by: Pamplemousse <xav.maso@gmail.com>	2021-07-14 09:09:42 -07:00
regnat	0d9e1af695	Remove an `unknown pragma` gcc warning	2020-12-02 14:33:20 +01:00
regnat	438977731c	shut up clang warnings - Fix some class/struct discrepancies - Explicit the overloading of `run` in the `Cmd*` classes - Ignore a warning in the generated lexer	2020-12-01 15:04:03 +01:00
Eelco Dolstra	e14e62fddd	Remove trailing whitespace	2020-06-15 14:12:39 +02:00
Ben Burdette	3bc9155dfc	a few more 'format's removed	2020-04-22 15:00:11 -06:00
Guillaume Maudoux	6a5bf9b143	simplify handling of extra '}'	2018-10-27 00:14:51 +02:00
aszlig	0ad643ed5c	libexpr: Use int64_t for NixInt Using a 64bit integer on 32bit systems will come with a bit of a performance overhead, but given that Nix doesn't use a lot of integers compared to other types, I think the overhead is negligible also considering that 32bit systems are in decline. The biggest advantage however is that when we use a consistent integer size across all platforms it's less likely that we miss things that we break due to that. One example would be: https://github.com/NixOS/nixpkgs/pull/44233 On Hydra it will evaluate, because the evaluator runs on a 64bit machine, but when evaluating the same on a 32bit machine it will fail, so using 64bit integers should make that consistent. While the change of the type in value.hh is rather easy to do, we have a few more options available for doing the conversion in the lexer: * Via an #ifdef on the architecture and using strtol() or strtoll() accordingly depending on which architecture we are. For the #ifdef we would need another AX_COMPILE_CHECK_SIZEOF in configure.ac. * Using istringstream, which would involve copying the value. * As we're already using boost, lexical_cast might be a good idea. Spoiler: I went for the latter, first of all because lexical_cast does have an overload for const char* and second of all, because it doesn't involve copying around the input string. Also, because istringstream seems to come with a bigger overhead than boost::lexical_cast: https://www.boost.org/doc/libs/release/doc/html/boost_lexical_cast/performance.html The first method (still using strtol/strtoll) also wasn't something I pursued further, because it is also locale-aware which I doubt is what we want, given that the regex for int is [0-9]+. Signed-off-by: aszlig <aszlig@nix.build> Fixes: #2339	2018-08-29 01:05:52 +02:00
Eelco Dolstra	1ad19232c4	Don't return negative numbers from the flex tokenizer Fixes #1374. Closes #2129.	2018-05-11 12:05:12 +02:00
Eelco Dolstra	f3c85f9eb3	Revert "Throw a specific error for incomplete parse errors." This reverts commit `6498adb002`. We don't actually use IncompleteParseError in 'nix repl'.	2018-05-11 11:40:50 +02:00
Tuomas Tynkkynen	a0e38c16bc	libexpr: Recognize newline in more places in lexer Flex's regexes have an annoying feature: the dot matches everything except a newline. This causes problems for expressions like: "${0}\ " where the backslash-newline combination matches this rule instead of the intended one mentioned in the comment: <STRING>\$|\\|\$\\ { /* This can only occur when we reach EOF, otherwise the above (...|\$[^\{\"\\]|\\.|\$\\.)+ would have triggered. This is technically invalid, but we leave the problem to the parser who fails with exact location. */ return STR; } However, the parser actually accepts the resulting token sequence ('"' DOLLAR_CURLY 0 '}' STR '"'), which is a problem because the lexer rule didn't assign anything to yylval. Ultimately this leads to a crash when dereferencing a NULL pointer in ExprConcatStrings::bindVars(). The fix does change the syntax of the language in some corner cases but I think it's only turning previously invalid (or crashing) syntax to valid syntax. E.g. "a\ b" and ''a''\ b'' were previously syntax errors but now both result in "a\nb". Found by afl-fuzz.	2018-03-02 17:30:48 +02:00
Tuomas Tynkkynen	f67a7007a2	libexpr: Pre-reserve space in string in unescapeStr() Avoids some malloc() traffic.	2018-02-16 04:39:43 +02:00
Eelco Dolstra	2c39e4eca0	Revert "Don't parse "x:x" as a URI" This reverts commit `f90f660b24`. This broke Hydra's release.nix, which contained preCheck = ''export LOGNAME=${LOGNAME:-foo}'';	2017-11-14 15:10:52 +01:00
Eelco Dolstra	f90f660b24	Don't parse "x:x" as a URI URIs now have to contain "://" or start with "channel:".	2017-10-30 17:58:01 +01:00
Jörg Thalheim	2fd8f8bb99	Replace Unicode quotes in user-facing strings by ASCII Relevant RFC: NixOS/rfcs#4 $ ag -l | xargs sed -i -e "/\"/s/’/'/g;/\"/s/‘/'/g"	2017-07-30 12:32:45 +01:00
Guillaume Maudoux	a143014d73	lexer: remove catch-all rules hiding real errors With catch-all rules, we hide potential errors. It turns out that `a4744254` made one catch-all useless. Flex detected that it was impossible to reach. The other is more subtle, as it can only trigger on unfinished escapes in unfinished strings, which only occurs at EOF.	2017-05-01 01:18:06 +02:00
Guillaume Maudoux	a474425425	Fix lexer to support `$'` in multiline strings.	2017-05-01 01:15:40 +02:00
Eelco Dolstra	603f08506e	Tweak error message	2016-12-06 17:18:40 +01:00
Guillaume Maudoux	e4b82af387	Improve error message on trailing path slashes	2016-11-27 17:48:46 +01:00
Guillaume Maudoux	a5e761dddb	Fix comments parsing Fixed the parsing of multiline strings ending with an even number of stars, like `/** this **/`. Added test cases for comments.	2016-11-13 17:20:34 +01:00
Scott Olson	6498adb002	Throw a specific error for incomplete parse errors. `nix-repl` will use this for deciding whether to keep waiting for input or error out right away.	2016-02-24 04:32:21 -06:00
Eelco Dolstra	b3e8d72770	Merge pull request #762 from ctheune/ctheune-floats Implement floats	2016-02-12 12:49:59 +01:00
Eelco Dolstra	5d8b7eb3e1	Revert "Revert "next try for "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751 """ This reverts commit `b669d3d2e8`.	2016-01-20 16:34:42 +01:00
Eelco Dolstra	b669d3d2e8	Revert "next try for "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751 "" This reverts commit `ed23c8568e`. Let's merge this after the 1.11.1 release.	2016-01-20 00:05:28 +01:00
Fabian Schmitthenner	ed23c8568e	next try for "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751 " This reverts commit `8120b6fb8a` and fixes the regression introduced in `8d22b26448`.	2016-01-19 20:35:35 +00:00
Eelco Dolstra	8120b6fb8a	Revert "don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751 " This reverts commit `8d22b26448`. It breaks Nixpkgs: $ nix-env -qa error: syntax error, unexpected IND_STR, expecting '}', at /home/eelco/Dev/nixpkgs-stable/pkgs/top-level/python-packages.nix:7605:8	2016-01-19 20:33:32 +01:00
Fabian Schmitthenner	8d22b26448	don't abort when given unmatched '}' with 'start-condition stack underflow'. This fixes #751	2016-01-12 20:40:41 +00:00
Christian Theune	a12a43046b	Edge condition: parser did not pick up floats starting exactly with 0.	2016-01-05 09:54:49 +01:00
Christian Theune	f872262e08	Fix up float parsing.	2016-01-05 09:46:37 +01:00
Christian Theune	494fc5acbb	Try a simplified version of float lexing that didn't work. The last one I tried was botched anyway ...	2016-01-05 00:53:22 +01:00
Christian Theune	14ebde5289	First hit at providing support for floats in the language.	2016-01-05 00:40:40 +01:00
Guillaume Maudoux	467977f203	Fix the parsing of "$"'s in strings.	2015-07-03 14:09:58 +02:00
Guillaume Maudoux	65e4dcd69b	Fix the hack that resets the scanner state.	2015-07-03 13:53:36 +02:00
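
The byte-indexed position scheme described in commit 86a1121d16 can be illustrated with a small sketch. This is not the actual Nix/Lix implementation: `Origin`, `Position`, `OriginTable`, `add` and `resolve` are hypothetical names chosen for illustration. It assumes that a position index is a 32-bit offset into the concatenation of all inputs, and that line and column are recomputed on demand by rescanning the source, treating `\n`, `\r\n` and a lone `\r` each as one line break. Overflow past 4GiB is ignored here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
struct Origin {
    std::string name;    // e.g. a file path
    std::string source;  // kept so line/column can be recomputed later
    uint32_t start;      // offset of this origin in the virtual concatenation
};

struct Position {
    std::string origin;
    uint32_t line;    // 1-based
    uint32_t column;  // 1-based
};

class OriginTable {
    std::vector<Origin> origins;
    uint32_t total = 0;  // total bytes registered so far (overflow not handled)

public:
    // Register an input; the return value is the index of its first byte.
    uint32_t add(std::string name, std::string source) {
        uint32_t start = total;
        total += static_cast<uint32_t>(source.size());
        origins.push_back({std::move(name), std::move(source), start});
        return start;
    }

    // Map a global byte offset back to origin, line and column.
    Position resolve(uint32_t idx) const {
        assert(!origins.empty() && idx < total);
        // Binary search for the last origin whose start is <= idx.
        std::size_t lo = 0, hi = origins.size();
        while (hi - lo > 1) {
            std::size_t mid = lo + (hi - lo) / 2;
            if (origins[mid].start <= idx) lo = mid; else hi = mid;
        }
        const Origin & o = origins[lo];
        std::string_view s = o.source;
        uint32_t off = idx - o.start;
        uint32_t line = 1, column = 1;
        for (uint32_t i = 0; i < off; ++i) {
            // The '\r' of a "\r\n" pair is skipped; the '\n' ends the line.
            if (s[i] == '\r' && i + 1 < s.size() && s[i + 1] == '\n') continue;
            if (s[i] == '\n' || s[i] == '\r') { ++line; column = 1; }
            else ++column;
        }
        return {o.name, line, column};
    }
};
```

The parser only needs to store a `uint32_t` per position; the cost of the line/column scan is paid when a position is actually reported, which is the trade-off the commit message describes.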

88 commits