# Regular expressions are hard

Writing a high-quality implementation of POSIX Extended Regular Expressions does not seem easy. Ideally, the following features would be offered at the same time:

* Strict standards compliance (see https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap09.html and https://pubs.opengroup.org/onlinepubs/9799919799/functions/regexec.html).
* Predictable performance (polynomial in the length of the inputs, and linear in the length of the matched string).
* Limited resource usage (memory and CPU time; the latter is also related to the previous point).
* Absence of vendor-specific syntax extensions (for portability).

In practice, most implementations fall short of these goals, due to a variety of issues:

* Unclear wording in the standards, leading to diverging outcomes between implementations.
* Exponential matching time due to a backtracking implementation (sometimes even mandated by non-standard syntax extensions).
* Excessive consumption of stack memory.
* Spurious matching failures (either a false indication of a non-match, or an error).
* Incorrect capturing behaviour of parenthesised subexpressions.

Here we test a couple of popular regex engines against a small sample of regular expressions that are known to be hard to match properly.

## Run it

Build using Lix:

    nix-build -A default # or libcxx, or musl

List all available engines:

    result/bin/driver list

Check the specified engines (tries all if none is specified, which is not recommended on libstdc++ because `std` may crash):

    result/bin/driver check [engine]…

Print the match results of the specified engines (tries all if none is specified, which is not recommended on libstdc++ because `std` may crash):

    result/bin/driver results [engine]…

## List of supported engines

* Boost.Regex (`boost`)
* C standard library (`c`)
* Oniguruma (`oniguruma`, does not claim POSIX compliance)
* PCRE2 (`pcre`, does not claim POSIX compliance)
* RE2 (`re2`, does not claim POSIX compliance)
* C++ standard library (`std`)
* TRE (`tre`)
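
To make the "diverging outcomes" problem from the introduction concrete, here is a minimal sketch in Python. It is not part of this repository, and Python's `re` module is not one of the engines tested here; it only stands in as a readily available backtracking engine. The helper names `first_alternative_match` and `leftmost_longest_match` are hypothetical, and the second one emulates the POSIX leftmost-longest rule only for an alternation of literal strings, not for general expressions.

```python
import re


def first_alternative_match(pattern, text):
    """Return the (start, end) span found by Python's backtracking engine.

    For an alternation, this engine commits to the first alternative that
    succeeds, even when a later alternative would match a longer string.
    """
    m = re.search(pattern, text)
    return m.span() if m else None


def leftmost_longest_match(alternatives, text):
    """Emulate the POSIX leftmost-longest rule for literal alternatives only.

    Among all matches, the one starting earliest wins; among those starting
    at that position, the longest wins. This is an illustrative assumption
    of how the rule plays out, not a general regex matcher.
    """
    for start in range(len(text) + 1):
        best = None
        for alt in alternatives:
            if text.startswith(alt, start):
                end = start + len(alt)
                if best is None or end > best[1]:
                    best = (start, end)
        if best is not None:
            return best
    return None


if __name__ == "__main__":
    # Backtracking semantics: "a" is tried first and succeeds.
    print(first_alternative_match("a|ab", "ab"))       # (0, 1)
    # Leftmost-longest semantics: "ab" is longer, so it wins.
    print(leftmost_longest_match(["a", "ab"], "ab"))   # (0, 2)
```

The same pattern and input yield two different match spans depending on which rule the engine follows, which is one reason engines that do not claim POSIX compliance can disagree with those that do.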