Runtime error in manual ingestion from evaluation file #344

Open

opened 2024-11-12 17:10:27 +00:00 by alejandrosame · 0 comments

alejandrosame commented

2024-11-12 17:10:27 +00:00

(Migrated from github.com)

When trying to run a complete data setup from scratch, there's a runtime error that prevents most of the evaluation file from being parsed.

To reproduce, run the following command from the root of the project (CONTRIBUTING.md explains how to get the evaluation.jsonl file):

manage ingest_manual_evaluation cd17fb8b3bfd63bf4a54512cfdd987887e1f15eb nixos-unstable ./contrib/evaluation.jsonl

I didn't look too deeply into how to fix this, but applying the following patch lets the command run without runtime errors:

diff --git a/src/website/shared/management/commands/ingest_manual_evaluation.py b/src/website/shared/management/commands/ingest_manual_evaluation.py
index 8e6d6c7..41e3d0b 100644
--- a/src/website/shared/management/commands/ingest_manual_evaluation.py
+++ b/src/website/shared/management/commands/ingest_manual_evaluation.py
@@ -562,7 +562,11 @@ def parse_evaluation_results(
     lines: list[str],
 ) -> Generator[PartialEvaluatedAttribute, None, None]:
     for line in lines:
-        raw = json.loads(line)
+        try:
+            raw = json.loads(line)
+        except json.JSONDecodeError as e:
+            print(line, e)
+            continue
         error = raw.get("error")
         yield PartialEvaluatedAttribute(
             attr=raw.get("attr"),

If this workaround is considered ok-ish, I can submit it as a PR.
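
For anyone who wants to try the idea outside the Django command, here is a minimal standalone sketch of the same skip-and-report approach. The function name `iter_jsonl_records` and the sample lines are hypothetical and not part of the project; the real `parse_evaluation_results` builds `PartialEvaluatedAttribute` objects, which this sketch leaves out.

```python
import json
import sys
from collections.abc import Iterable, Iterator
from typing import Any


def iter_jsonl_records(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed value per JSONL line, skipping lines that fail to parse.

    Hypothetical helper for illustration only; not the project's API.
    """
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            raw = json.loads(line)
        except json.JSONDecodeError as e:
            # Report the offending input line: `raw` is not assigned
            # when json.loads() raises, so it cannot be printed here.
            print(
                f"line {lineno}: skipping malformed JSON ({e}): {line[:80]!r}",
                file=sys.stderr,
            )
            continue
        yield raw


# Made-up sample data: the second line is truncated on purpose.
sample = [
    '{"attr": "hello", "error": null}',
    '{"attr": "broken", ',
    '{"attr": "world", "error": "eval failed"}',
]
records = list(iter_jsonl_records(sample))
```

With the sample above, the truncated second line is reported on stderr and skipped, while the other two lines are parsed normally. Skipping silently hides whatever produces the bad lines, so this is a workaround rather than a root-cause fix.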
