infer_table_structure=True leads to "Failed to initialize the model" #2923

Open
spongxin opened this issue Apr 23, 2024 · 0 comments
Labels
bug Something isn't working pdf

Describe the bug
I use partition_pdf to parse a PDF file. When I set infer_table_structure=True, the following messages are printed:

This function will be deprecated in a future release and unstructured will simply use the DEFAULT_MODEL from unstructured_inference.model.base to set default model name
Failed to initialize the model.
Ensure that the model is correct
Review the parameters to initialize a UnstructuredTableTransformerModel obj

To Reproduce

docker run -dt --name unstructured downloads.unstructured.io/unstructured-io/unstructured:latest
docker exec -it unstructured bash

My code is as follows:

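from collections import Counter

from unstructured.partition.pdf import partition_pdf

# Placeholder: replace with the path to the PDF being parsed.
filename = "example.pdf"

try:
    # hi_res strategy with table structure inference enabled
    elements = partition_pdf(
        filename=filename,
        strategy='hi_res',
        infer_table_structure=True
    )
    # Count how many elements of each type were extracted
    print(Counter(type(element) for element in elements))
except Exception as e:
    print(e)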
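
Because print(e) shows only the exception message, a full traceback may make it easier to see where model initialization fails. The following is a minimal sketch, not part of the original report; run_with_diagnostics is a hypothetical helper name, and it only wraps the call and captures the traceback text.

```python
import traceback


def run_with_diagnostics(func, *args, **kwargs):
    """Call func and return (result, None) on success or (None, traceback_text) on failure.

    Hypothetical helper: it does not change how the wrapped function behaves,
    it only keeps the full traceback instead of just str(e).
    """
    try:
        return func(*args, **kwargs), None
    except Exception:
        # format_exc() returns the same text Python would print for an uncaught error
        return None, traceback.format_exc()
```

It could be used in place of the try/except above, for example: elements, tb = run_with_diagnostics(partition_pdf, filename=filename, strategy='hi_res', infer_table_structure=True), and then print(tb) if tb is not None.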

Expected behavior

I want to obtain the text_as_html data for Table elements by setting infer_table_structure=True. When I set infer_table_structure=False, the program runs normally.
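
For reference, this is roughly how I intend to read the HTML once partitioning works. It is a sketch under the assumption that each element exposes a category attribute and a metadata.text_as_html field; collect_table_html is a hypothetical helper, and the SimpleNamespace objects are stand-ins so the snippet runs without unstructured installed.

```python
from types import SimpleNamespace


def collect_table_html(elements):
    """Return the text_as_html value of every element whose category is "Table".

    Assumes elements expose `category` and `metadata.text_as_html`;
    elements without HTML are skipped.
    """
    html_tables = []
    for element in elements:
        if getattr(element, "category", None) != "Table":
            continue
        metadata = getattr(element, "metadata", None)
        html = getattr(metadata, "text_as_html", None)
        if html:
            html_tables.append(html)
    return html_tables


# Stand-in elements (not real partition_pdf output), for illustration only.
fake_elements = [
    SimpleNamespace(
        category="Table",
        metadata=SimpleNamespace(text_as_html="<table><tr><td>1</td></tr></table>"),
    ),
    SimpleNamespace(
        category="NarrativeText",
        metadata=SimpleNamespace(text_as_html=None),
    ),
]
print(collect_table_html(fake_elements))
```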

Environment Info

[notebook-user@57ba27f71222 ~]$ python3 /data/unstructured-main/scripts/collect_env.py
/data/unstructured-main/scripts/collect_env.py:5: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import pkg_resources
OS version: Linux-5.4.0-42-generic-x86_64-with-glibc2.34
Python version: 3.10.13
unstructured version: None
unstructured-inference version: 0.7.27
pytesseract version: 0.3.10
Torch version: 2.2.2
Detectron2 is not installed

[notice] A new release of pip is available: 23.2.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
PaddleOCR is not installed
Libmagic version: file-5.39
magic file from /etc/magic:/usr/share/misc/magic
LibreOffice version: LibreOffice 7.1.8.1 10(Build:1)

Thank you for any help with this issue!

@spongxin spongxin added the bug Something isn't working label Apr 23, 2024
@scanny scanny added the pdf label Apr 24, 2024