Are there ways to improve the accuracy of Question Answering? #331

Open
gvzdv opened this issue Jan 9, 2024 · 2 comments

@gvzdv

gvzdv commented Jan 9, 2024

I followed the instructions from this notebook and started with just one document, a driving guide for British Columbia.

While the model manages to answer some questions, many answers, even to simple questions (like "what does flashing green light mean?" or "can you cross a solid yellow line?"), are either wrong or reported as unavailable, despite the document containing them.

Is there a way to improve the accuracy of matching?

@admatt01

I think you might have run into the page limit:


Document AI

The following [limits](https://cloud.google.com/document-ai/quotas) apply for online processing with the Document OCR processor.

| Limit | Value |
| -- | -- |
| Maximum file size | 20 MB |
| Maximum pages | 15 |

For documents that don't meet these limits, you can use batch processing to extract the document text. (Not covered in this notebook.)
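If the 15-page limit were the bottleneck, an alternative to batch processing (not the approach the quoted notebook describes) is to split the PDF into pieces of at most 15 pages and send each piece to online processing separately. A minimal sketch of the page-range arithmetic only, assuming 0-based page indices; `page_ranges` is a hypothetical helper, not part of Document AI, and the actual PDF splitting is left to whatever PDF library you use.

```python
def page_ranges(total_pages: int, max_pages: int = 15) -> list[tuple[int, int]]:
    """Split pages [0, total_pages) into half-open ranges of at most max_pages pages.

    Hypothetical helper: it only computes ranges; it does not touch any PDF.
    """
    if total_pages < 0 or max_pages <= 0:
        raise ValueError("total_pages must be >= 0 and max_pages must be > 0")
    return [
        (start, min(start + max_pages, total_pages))  # last range may be shorter
        for start in range(0, total_pages, max_pages)
    ]


# A 40-page document becomes three requests: pages 0-14, 15-29 and 30-39.
print(page_ranges(40))  # [(0, 15), (15, 30), (30, 40)]
```

Half-open ranges keep the arithmetic simple: each range's end is the next range's start, so no page is skipped or processed twice.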

@holtskinner
Collaborator

@admatt01's comment is not relevant to this issue because the notebook linked doesn't use Document AI.

My theory is that the current setup of the Question Answering with Documents using LangChain 🦜️🔗 and Vertex AI Matching Engine notebook isn't reading or parsing all of the text in the document. I'll need to investigate whether the GCS loader actually extracts the text or performs OCR.
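One quick way to test the "not all text was extracted" theory, before looking at retrieval at all, is to check whether phrases you know are in the guide appear anywhere in the loaded text. A minimal sketch, assuming you can collect the loaded text as plain strings (with LangChain loaders that is typically each `Document`'s `page_content`); `normalize` and `missing_phrases` are hypothetical helpers, and the sample strings are placeholders, not text from the actual guide.

```python
import re


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace runs so PDF line breaks don't block a match.
    return re.sub(r"\s+", " ", text).strip().lower()


def missing_phrases(chunks: list[str], phrases: list[str]) -> list[str]:
    """Return the phrases that never appear in the extracted text.

    Chunks are joined with spaces, so a phrase split across two chunks still
    counts as extracted; this checks extraction, not retrieval.
    """
    corpus = normalize(" ".join(chunks))
    return [p for p in phrases if normalize(p) not in corpus]


# Placeholder chunks standing in for the loader's output.
chunks = ["Section 3: traffic\nsignals", "Section 4: lane markings"]
print(missing_phrases(chunks, ["traffic signals", "solid yellow line"]))
# ['solid yellow line']
```

If phrases that are plainly visible in the PDF come back as missing, the problem is in loading or parsing rather than in matching.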

You might have more luck with the Question answering with Documents using Document AI, Pandas, and PaLM notebook, which performs OCR on the input documents.
