Matching 500KB of data with builtins.match causes stack overflow #476

Open
opened 2024-08-18 07:18:45 +00:00 by sugar · 2 comments
Member

Describe the bug

builtins.match uses #include <regex> (std::regex), which crashes when matching against too large an amount of data

this causes issues in practice, for instance in https://github.com/sodiboo/niri-flake/commit/620a3c32c6d9a026defe9fd35954e2e1b5a17334, where the code had to be modified to avoid regexes

Steps To Reproduce

  1. run

    nix eval --expr 'builtins.match ".*" (builtins.concatStringsSep "" (builtins.genList (_: "a") 500000))'
    
  2. see the following error

    error: stack overflow (possible infinite recursion)
    

Expected behavior

i would expect [ ] to appear, not a stack overflow

nix --version output

nix (Lix, like Nix) 2.91.0

Additional context

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86164
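
a rough sketch of why input size matters here, assuming (as the gcc bug above suggests) that the matcher recurses roughly once per input character. the toy matcher below is hypothetical and is not libstdc++'s code; `match_dot_star` and `max_recursion_depth` are made-up names. it only shows that call depth grows linearly with input length for a pattern like `.*`:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy recursive matcher for the single anchored pattern ".*".
// Hypothetical illustration only: this is NOT how libstdc++ is implemented.
// It consumes one character per recursive call and treats every character
// as matching '.', so the call depth equals the number of characters consumed.
static bool match_dot_star(const std::string& input, std::size_t pos,
                           std::size_t depth, std::size_t& max_depth) {
    if (depth > max_depth) {
        max_depth = depth;  // record the deepest level reached so far
    }
    if (pos == input.size()) {
        return true;  // whole input consumed: the pattern matches
    }
    return match_dot_star(input, pos + 1, depth + 1, max_depth);
}

// Returns the deepest recursion level reached while matching n copies of 'a'.
// Returns 0 if the match unexpectedly fails.
std::size_t max_recursion_depth(std::size_t n) {
    const std::string input(n, 'a');
    std::size_t max_depth = 0;
    if (!match_dot_star(input, 0, 0, max_depth)) {
        return 0;
    }
    return max_depth;
}
```

calling `max_recursion_depth(1000)` reaches depth 1000. under the same assumption, a 500000-character input would need on the order of 500000 nested frames, which is the kind of growth that can exhaust a fixed-size stack.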

Owner

so, to start off, we agree this is a bug.

there's some history of trying to fix the regexes in lix: #34 (https://git.lix.systems/lix-project/lix/issues/34). replacing them with boost regex has been tried before, but the nixpkgs regex escape function was not escaping enough, which caused a regression.

the most viable option is most likely either to vendor std::regex from libc++ and use it on all platforms, or to rip out the right number of features from rust regex and use that instead.

jade added this to the Broken regexes project 2024-08-18 10:50:24 +00:00
Member

This issue was mentioned on Gerrit on the following CLs:

  • commit message in cl/1821 (https://gerrit.lix.systems/c/lix/+/1821) ("libexpr: Replace regex engine with boost::regex")
jade reopened this issue 2024-08-23 00:12:17 +00:00