fix(JsonSchemaValidator): fix recursive loop and general LLM (claude, mistral...) compatibility #7556

lambda-science · 2024-04-17T14:17:24Z

Related Issues

Fixes:
JsonSchemaValidator: unintended primitive value conversion in _recursive_json_to_object method #7457
JsonSchemaValidator: inconsistency in corrective message generation for Claude vs. OpenAI models #7455

Proposed Changes:

Claude compatibility: changed the behaviour so that (i) the error template is now a single message containing the generated JSON, the error and the schema, and (ii) validated messages are always "Assistant" chat messages (for the next pipeline step) while validation_errors are always "User" chat messages (for LLM loops).
Recursive loop in type conversion: used Claude Opus to automatically generate a fix based on the written issue.
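
A minimal sketch of the two message-handling behaviours described above. It uses a simplified stand-in class instead of Haystack's `ChatMessage`, and the names `Message`, `ERROR_TEMPLATE`, `build_error_message` and `route`, the template wording and the output keys are all assumptions made for illustration. The PR's actual template is not shown in this thread.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical stand-in for a chat message; not Haystack's ChatMessage."""
    role: str      # "user" or "assistant"
    content: str


# Illustrative wording only; the PR's real template may differ.
ERROR_TEMPLATE = (
    "The following generated JSON does not conform to the provided schema.\n"
    "Generated JSON: {failing_json}\n"
    "Error details: {error}\n"
    "Schema: {schema}\n"
    "Please return corrected JSON that matches the schema."
)


def build_error_message(failing_json: str, error: str, schema: dict) -> Message:
    """Pack the failing JSON, the error and the schema into ONE user message."""
    content = ERROR_TEMPLATE.format(
        failing_json=failing_json, error=error, schema=json.dumps(schema)
    )
    return Message(role="user", content=content)


def route(last_content: str, error: Optional[str], schema: dict) -> dict:
    """Emit valid output as an assistant message, errors as a user message."""
    # The output key names below are illustrative.
    if error is None:
        return {"validated": [Message(role="assistant", content=last_content)]}
    return {"validation_error": [build_error_message(last_content, error, schema)]}
```

Keeping the corrective feedback in a single user message avoids two consecutive messages with the same role, which some chat APIs reject.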

How did you test it?

Tested on my personal use-case and it solved my issues.

Notes for the reviewer

The behaviour now includes only the last message from the conversation rather than the whole message history (lower cost for long pipeline loops; the previous messages are not needed).

Regarding the auto-generated fix for the recursive loop: the bug may come from the fact that json.loads(value) sometimes returns a string and has to be called a second time to get the actual dict/list contained in that string. This is odd, but I've seen it happen. To be honest, I'm not sure what the fundamental difference is. It may not work for nested JSON.
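
`json.loads` returns a string when the input is a JSON document that has itself been serialized as a JSON string (double-encoded), which would explain the need for a second call. The sketch below is an illustration, not the PR's actual code: the helper names `loads_nested` and `parse_structured_strings` are hypothetical. It decodes repeatedly until a non-string is reached and converts only strings that decode to a dict or list, so primitives such as "123" keep their original type.

```python
import json
from typing import Any


def loads_nested(value: str, max_depth: int = 5) -> Any:
    """Decode a JSON string repeatedly while the result is still a string.

    Returns the input unchanged if it is not valid JSON or if it only
    decodes to a primitive (number, bool, null, plain string).
    """
    current: Any = value
    for _ in range(max_depth):
        if not isinstance(current, str):
            break
        try:
            current = json.loads(current)
        except json.JSONDecodeError:
            break
    return current if isinstance(current, (dict, list)) else value


def parse_structured_strings(obj: Any) -> Any:
    """Recursively convert string values that hold JSON objects or arrays."""
    if isinstance(obj, dict):
        return {key: parse_structured_strings(val) for key, val in obj.items()}
    if isinstance(obj, list):
        return [parse_structured_strings(item) for item in obj]
    if isinstance(obj, str):
        decoded = loads_nested(obj)
        # Recurse only into containers; plain strings are returned as-is,
        # so the recursion always terminates.
        if isinstance(decoded, (dict, list)):
            return parse_structured_strings(decoded)
    return obj


double_encoded = json.dumps(json.dumps({"a": 1}))
print(type(json.loads(double_encoded)).__name__)  # str
print(loads_nested(double_encoded))               # {'a': 1}
```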

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
I documented my code
I ran pre-commit hooks and fixed any issue

…ted by ClaudeOpus). Modify the behaviour to build the error template in a single user_message instead of two separate. Modify the behaviour to only include latest message instead of full history (very costly if long looping pipeline)

vblagoje · 2024-04-22T07:30:41Z

Looks good @lambda-science , would you please add a short reno note (see https://github.com/deepset-ai/haystack/blob/main/CONTRIBUTING.md) and resolve these black issues :-)

masci · 2024-05-15T19:58:58Z

Is this still relevant? Let's merge it or close.

vblagoje · 2024-05-16T08:34:06Z

It's missing a reno note and unit tests. It's an important addition and I would love to see @lambda-science push it towards the finish line 🏁

lambda-science · 2024-05-16T08:54:24Z

Sorry, it slipped my mind. I will do it :)

lambda-science · 2024-05-16T09:24:12Z

I know why I stopped: I had issues setting up the environment (on Windows).
Now that everything is set up, I can see that a test was failing (on top of the missing black fixes and reno note), so I will work on it.


CLAassistant · 2024-05-16T09:55:45Z

All committers have signed the CLA.

lambda-science · 2024-05-16T10:03:51Z

Should be good now.
I had to change the test a bit because, as I explained, I suggest validating only the latest message (and including only the latest message for validation) to reduce the cost of long loops! Tell me whether you agree.
(So validating a multi-message history now returns a list of one message.)

coveralls · 2024-05-16T14:59:56Z

Pull Request Test Coverage Report for Build 9110242140

Details

0 of 0 changed or added relevant lines in 0 files are covered.
28 unchanged lines in 1 file lost coverage.
Overall coverage decreased (-0.01%) to 90.573%

Files with Coverage Reduction	New Missed Lines	%
components/validators/json_schema.py	28	0.0%

Totals
Change from base Build 9103206687:	-0.01%
Covered Lines:	6591
Relevant Lines:	7277

💛 - Coveralls

vblagoje · 2024-05-16T16:46:23Z

haystack/components/validators/json_schema.py

@@ -142,18 +143,22 @@ def run(
                else:
                    validate(instance=content, schema=validation_schema)

-            return {"validated": messages}
+            return {"validated": [last_message]}


@lambda-science looks good, but doesn't this change the method contract? Do you remember why this change was made?

lambda-science · 2024-05-22

By method contract, do you mean the signature?
The type is the same, List[ChatMessage], but the list now always has length 1.
I did this to avoid a quadratically growing cost from passing the whole message history to the LLM each time. Imagine your pipeline does 20 loops: you use 20+19+18+17+16+... = 20×(20+1)/2 = 210 messages' worth of tokens.
By passing only the last error (which should be all that is needed to fix the JSON) you would use only 20 messages' worth of tokens.
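
The arithmetic can be checked with a short sketch, under the simplifying assumptions that every message costs the same number of tokens and that loop i resends i messages. The function names are illustrative.

```python
def messages_sent_full_history(loops: int) -> int:
    """Total messages sent when loop i resends all i messages so far."""
    return sum(range(1, loops + 1))  # equals loops * (loops + 1) // 2


def messages_sent_last_only(loops: int) -> int:
    """Total messages sent when each loop sends only the latest message."""
    return loops


print(messages_sent_full_history(20))  # 210
print(messages_sent_last_only(20))     # 20
```

The full-history total grows quadratically with the number of loops, while the last-message total grows linearly.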

vblagoje · 2024-05-22

OK, I see, but one can also argue that previous failed attempts are useful as well. We should provide some top_k last messages argument, wdyt @lambda-science?
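
One way the suggested option could look, as a sketch only: `top_k` is the name floated in this comment, while the helper `select_messages` and its default value are hypothetical and not part of the PR.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def select_messages(messages: List[T], top_k: Optional[int] = 1) -> List[T]:
    """Keep only the last `top_k` messages; `None` keeps the full history."""
    if top_k is None:
        return list(messages)
    if top_k < 1:
        raise ValueError("top_k must be a positive integer or None")
    # Slicing past the start of the list simply returns the whole list.
    return messages[-top_k:]


history = ["attempt 1", "error 1", "attempt 2", "error 2"]
print(select_messages(history))           # ['error 2']
print(select_messages(history, top_k=2))  # ['attempt 2', 'error 2']
```

A default of 1 would match the behaviour in this PR, and `None` would restore the previous full-history behaviour for users who want earlier failures in the prompt.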

lambda-science requested a review from a team as a code owner April 17, 2024 14:17

lambda-science requested review from anakin87 and removed request for a team April 17, 2024 14:17

github-actions bot added the 2.x and type:documentation labels Apr 17, 2024

anakin87 requested a review from vblagoje April 17, 2024 14:24

lambda-science changed the title ~~feat(JsonSchemaValidator): fix recursive loop and general LLM (claude, mistral...) compatibility~~ fix(JsonSchemaValidator): fix recursive loop and general LLM (claude, mistral...) compatibility Apr 17, 2024

Merge branch 'deepset-ai:main' into fix/json_schema_validator

15dbded

anakin87 removed their request for review May 16, 2024 09:25

lambda-science and others added 3 commits May 16, 2024 11:54

reno

9314424

fix test

4fe77b0

lambda-science requested a review from a team as a code owner May 16, 2024 09:55

lambda-science requested review from dfokina and removed request for a team May 16, 2024 09:55

github-actions bot added the topic:tests label May 16, 2024

vblagoje reviewed May 16, 2024
