New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(JsonSchemaValidator): fix recursive loop and general LLM (claude, mistral...) compatibility #7556
base: main
Are you sure you want to change the base?
fix(JsonSchemaValidator): fix recursive loop and general LLM (claude, mistral...) compatibility #7556
Conversation
…ted by ClaudeOpus). Modify the behaviour to build the error template in a single user_message instead of two separate. Modify the behaviour to only include latest message instead of full history (very costly if long looping pipeline)
Looks good @lambda-science , would you please add a short reno note (see https://github.com/deepset-ai/haystack/blob/main/CONTRIBUTING.md) and resolve these black issues :-) |
Is this still relevant? Let's merge it or close. |
It's missing reno note and unit tests. It's an important addition and would love to see @lambda-science push it towards the finish line 🏁 |
Sorry, it went out of my mind I will do it :) |
I know why I stopped, because I had issue setting up the env (on windows). |
…ted by ClaudeOpus). Modify the behaviour to build the error template in a single user_message instead of two separate. Modify the behaviour to only include latest message instead of full history (very costly if long looping pipeline)
Should be good now. |
Pull Request Test Coverage Report for Build 9110242140Details
💛 - Coveralls |
@@ -142,18 +143,22 @@ def run( | |||
else: | |||
validate(instance=content, schema=validation_schema) | |||
|
|||
return {"validated": messages} | |||
return {"validated": [last_message]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lambda-science looks good but isn't this changing the method contract, do you remember why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By method contract you mean signature ?
The type is the same List[ChatMessage] but now this list is always of length 1.
The reason I did this is to not have exponentional cost by passing all message history to the LLM each time. Imagine your pipe does 20 loops, you used 20+19+18+17+16... (20x(20+1)/2) 220 messages worth of tokens.
By passing only the last error (which should be the only needed to fix the json schema) you would only use 20 messages worth of tokens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see; but one can also argue that previous failure attempts are useful as well. We should provide some top_k last messages argument, wdyt @lambda-science ?
Related Issues
_recursive_json_to_object
method #7457 JsonSchemaValidator: inconsistency in corrective message generation for Claude vs. OpenAI models #7455Proposed Changes:
How did you test it?
Tested on my personal use-case and it solved my issues.
Notes for the reviewer
The behaviour is modified to only include the last messages from the conversation and not the whole history of messages (less cost for long pipeline loops, not necessary to have previous messages).
For the auto-generated fix for recursive, maybe the bug comes from the fact that sometimes
json.loads(value)
output a string and needs to be called twice to get the actual dict/list in the string. This is weird, but I've seen it happen. I'm not sure about the fundamental difference to be honest. Maybe it doesn't work for nested json.Checklist
fix:
,feat:
,build:
,chore:
,ci:
,docs:
,style:
,refactor:
,perf:
,test:
.