Follow up: Means of Evaluation and Instruction Automatic Generation #93
Conversation
When I messed around with ChatGPT, it hallucinated a lot when it did a visual comparison. I'm curious if your prompt works well. How are the visual comparison results? Are they accurate? Thanks for exploring this method of improvement.
It is not bad after limiting it to only care about CSS-property mistakes. But I believe there is room to improve, and this will be an experimental feature: the user can choose to skip it or modify the result it generates. So I'd like to put it here, and hopefully better prompts will be contributed in the future.
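A vision-comparison prompt restricted to CSS-property mistakes might be sketched as below. The exact wording is an assumption for illustration, not the prompt used in this PR:

```typescript
// Hypothetical prompt builder: the wording is an assumption, not the
// prompt shipped in this PR. Restricting the comparison to CSS
// properties is the technique described in the comment above.
function buildVisionComparisonPrompt(): string {
  return [
    "Compare the reference screenshot with the rendered output.",
    "Report ONLY mistakes in CSS properties (e.g. color, padding, font-size).",
    "Ignore content, wording, and image differences.",
    "List one mistake per line, prefixed with '- '.",
  ].join("\n");
}
```

Narrowing the task this way gives the model fewer degrees of freedom, which is one common way to reduce hallucination in visual comparisons.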
Thank you for this, and sorry I'm slow to review it. Will get it in tomorrow.
Following up on #70, we now have a potential way to evaluate the results produced by the system. Meanwhile, we can generate instructions automatically via vision comparison.
Our system now has vision comparison, but it is coupled with the other generation steps, so we cannot fully take advantage of it:
Current flow:
Update flow:
We will add another button, “Generate Instruction”, which inserts the “Auto Generate Instruction” and Eval steps into the flow.
System design
Frontend
It should call instructionGenerate when the user clicks it.
It should disable all buttons on the panel and show a loading state while appState is AppState.INSTRUCTION_GENERATING.
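The frontend behavior above could be sketched as follows. The enum values, the `isPanelDisabled` helper, and the handler name are assumptions for illustration; only `instructionGenerate` and `AppState.INSTRUCTION_GENERATING` come from the description:

```typescript
// Hypothetical sketch: AppState values other than INSTRUCTION_GENERATING,
// isPanelDisabled, and onGenerateInstructionClick are assumed names, not
// the repo's actual API.
enum AppState {
  CODE_READY = "CODE_READY",
  INSTRUCTION_GENERATING = "INSTRUCTION_GENERATING",
}

// All panel buttons are disabled (and show loading) while generating.
function isPanelDisabled(appState: AppState): boolean {
  return appState === AppState.INSTRUCTION_GENERATING;
}

// Click handler: enter the generating state, call the backend,
// then return to the ready state even if the call fails.
async function onGenerateInstructionClick(
  setAppState: (s: AppState) => void,
  instructionGenerate: () => Promise<string>,
): Promise<string> {
  setAppState(AppState.INSTRUCTION_GENERATING);
  try {
    return await instructionGenerate();
  } finally {
    setAppState(AppState.CODE_READY);
  }
}
```

Gating every button on a single `appState` check keeps the disable/loading logic in one place instead of per-button flags.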
Backend
Eval
Have GPT count the mistakes it previously made; this gives the user a rough sense of the system's performance.
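One minimal way to turn that into a number is to count the mistakes listed in the vision-comparison response. This helper is an assumption, not existing code, and it presumes the prompt asks GPT for one mistake per line prefixed with "- ":

```typescript
// Hypothetical helper: assumes the vision-comparison response lists one
// CSS-property mistake per line, prefixed with "- ". Not existing code.
function countCssMistakes(gptResponse: string): number {
  return gptResponse
    .split("\n")
    .filter((line) => line.trim().startsWith("- ")).length;
}
```

A simple count like this is enough for the “vibe check” described here; a more rigorous metric would need the follow-up evaluation work mentioned below.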
For a more rigorous evaluation, I will explore further and follow up.