Chipperv2 outputs incorrect table structure and text #319

Closed
six5532one opened this issue Dec 4, 2023 · 1 comment

six5532one commented Dec 4, 2023

Describe the bug
Yokebe.pdf
Bebevita.pdf
Mykoforte.pdf

  • Yokebe.pdf: chipperv2 splits the header of the table on page 2 into a separate table, and the text contains multiple errors (possibly from OCR), e.g. the "μ" in "μg".
  • Bebevita.pdf: no tables found.
  • Mykoforte.pdf: chipperv2 detected one table and missed the others. The detected table has problems in both its structure and its text.

To Reproduce
See attached documents.
A user called the hosted API with the chipperv2 model. They also tried setting "languages" to "['deu']" and "OCR_AGENT" to "paddle" but noticed no difference. Here is their code:

import requests

unstructured_api_key = '.............' 
unstructured_api_headers = {
    "accept": "application/json",
    "unstructured-api-key": unstructured_api_key
}

unstructured_api_url = "https://api.unstructured.io/general/v0/general"

data = {
    "strategy": "hi_res",
    "pdf_infer_table_structure": "true",
    "hi_res_model_name": "yolox",  # change to "chipperv2" to reproduce
    "languages": "['eng']"
}

file_path = "..............."
# Open the PDF in a context manager so the file handle is closed after the upload.
with open(file_path, 'rb') as pdf_file:
    response = requests.post(url=unstructured_api_url,
                             files={'files': pdf_file},
                             data=data,
                             headers=unstructured_api_headers)
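
To compare what each model returns for the same document, it can help to summarize the parsed response. The following is only a sketch, not part of the user's code: it assumes `response.json()` is a list of element dicts with "type", "text", and "metadata" keys, and that table elements may carry "page_number" and "text_as_html" in their metadata. The helper name `summarize_tables` and the sample data are made up for illustration.

```python
from collections import Counter


def summarize_tables(elements):
    """Count elements by type and collect the ones typed as "Table".

    `elements` is assumed to be the parsed JSON response: a list of dicts
    with "type", "text", and "metadata" keys (an assumption, see above).
    """
    type_counts = Counter(el.get("type", "Unknown") for el in elements)
    tables = [
        {
            "page_number": el.get("metadata", {}).get("page_number"),
            "text": el.get("text", ""),
            # Missing when no table structure was attached to the element.
            "text_as_html": el.get("metadata", {}).get("text_as_html"),
        }
        for el in elements
        if el.get("type") == "Table"
    ]
    return type_counts, tables


# Hypothetical sample standing in for response.json(); not real API output.
sample_elements = [
    {"type": "Title", "text": "Nutrition facts", "metadata": {"page_number": 1}},
    {
        "type": "Table",
        "text": "Vitamin D 5 μg",
        "metadata": {
            "page_number": 2,
            "text_as_html": "<table><tr><td>Vitamin D</td><td>5 μg</td></tr></table>",
        },
    },
    {"type": "Table", "text": "per 100 g", "metadata": {"page_number": 2}},
]

type_counts, tables = summarize_tables(sample_elements)
print(dict(type_counts))
for table in tables:
    print(table["page_number"], repr(table["text"]), table["text_as_html"] is not None)
```

Running this once per "hi_res_model_name" value on the same PDF would show whether a table was split in two (more "Table" entries on one page) or missed entirely (none).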

MthwRobinson commented

Closing this because Chipper is only supported in the SaaS API
