Matching 500KB of data with builtins.match causes stack overflow #476

Open
opened 2024-08-18 07:18:45 +00:00 by sugar · 2 comments

Describe the bug

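builtins.match is implemented with std::regex (#include <regex>), which overflows the stack when matching against a large enough input

this causes issues in practice: for instance, in https://github.com/sodiboo/niri-flake/commit/620a3c32c6d9a026defe9fd35954e2e1b5a17334 the code had to be modified to avoid regexes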

Steps To Reproduce

  1. run

    nix eval --expr 'builtins.match ".*" (builtins.concatStringsSep "" (builtins.genList (_: "a") 500000))'
    
  2. see the following error

    error: stack overflow (possible infinite recursion)
    

Expected behavior

i would expect [ ] to appear, not a stack overflow

nix --version output

nix (Lix, like Nix) 2.91.0

Additional context

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86164

sugar added the bug label 2024-08-18 07:18:45 +00:00
Owner

so, to start off, we agree this is a bug.

there's some history of trying to fix the regexes in lix: #34 (https://git.lix.systems/lix-project/lix/issues/34). replacing them with boost regex has been tried before, but the nixpkgs regex escape function was not escaping enough, which caused a regression.

the most viable option is most likely either to vendor std::regex from libc++ and use it on all platforms, or to rip the right set of features out of rust regex and use that instead.

jade added this to the Broken regexes project 2024-08-18 10:50:24 +00:00
Member

This issue was mentioned on Gerrit on the following CLs:

  • commit message in cl/1821 (https://gerrit.lix.systems/c/lix/+/1821) ("libexpr: Replace regex engine with boost::regex")
jade reopened this issue 2024-08-23 00:12:17 +00:00
Reference: lix-project/lix#476