Describe the bug
I use partition_pdf to parse a PDF file. When I set infer_table_structure=True, I get the following output:
This function will be deprecated in a future release and unstructured will simply use the DEFAULT_MODEL from unstructured_inference.model.base to set default model name
Failed to initialize the model.
Ensure that the model is correct
Review the parameters to initialize a UnstructuredTableTransformerModel obj
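To Reproduce
My code is as below:
from unstructured.partition.pdf import partition_pdf
from collections import Counter

# filename is the path to the PDF being parsed
try:
    elements = partition_pdf(
        filename=filename,
        strategy='hi_res',
        infer_table_structure=True
    )
    # Count how many elements of each type were extracted
    print(Counter(type(element) for element in elements))
except Exception as e:
    print(e)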
Expected behavior
I want to obtain the text_as_html data for Table elements by setting infer_table_structure=True. When I set infer_table_structure=False, the program runs normally.
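For reference, this is roughly how I intend to read the table HTML once partitioning works. It is only a minimal sketch: collect_table_html is a hypothetical helper name, and it assumes each returned element exposes a metadata object whose text_as_html attribute is filled in only for tables. The stand-in objects below replace real partition_pdf output so the snippet runs without a PDF.

```python
from types import SimpleNamespace
from typing import Iterable, List


def collect_table_html(elements: Iterable[object]) -> List[str]:
    """Return the text_as_html string of every element that carries one.

    Assumption: table elements expose element.metadata.text_as_html,
    and other elements leave it unset (None).
    """
    html_tables = []
    for element in elements:
        metadata = getattr(element, "metadata", None)
        html = getattr(metadata, "text_as_html", None)
        if html:
            html_tables.append(html)
    return html_tables


# Stand-in objects (not real unstructured elements) so the sketch runs on its own.
demo_elements = [
    SimpleNamespace(metadata=SimpleNamespace(text_as_html=None)),
    SimpleNamespace(
        metadata=SimpleNamespace(text_as_html="<table><tr><td>1</td></tr></table>")
    ),
]
demo_html = collect_table_html(demo_elements)
print(demo_html)
```

With real data, the elements list returned by partition_pdf would be passed in place of demo_elements.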
Environment Info
[notebook-user@57ba27f71222 ~]$ python3 /data/unstructured-main/scripts/collect_env.py
/data/unstructured-main/scripts/collect_env.py:5: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import pkg_resources
OS version: Linux-5.4.0-42-generic-x86_64-with-glibc2.34
Python version: 3.10.13
unstructured version: None
unstructured-inference version: 0.7.27
pytesseract version: 0.3.10
Torch version: 2.2.2
Detectron2 is not installed
[notice] A new release of pip is available: 23.2.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
[notice] A new release of pip is available: 23.2.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
PaddleOCR is not installed
Libmagic version: file-5.39
magic file from /etc/magic:/usr/share/misc/magic
LibreOffice version: LibreOffice 7.1.8.1 10(Build:1)
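The report above shows "unstructured version: None" and a pkg_resources deprecation warning. One possible way to double-check which versions are actually installed is the standard-library importlib.metadata module. This is a sketch, not what collect_env.py does: installed_version and report_versions are hypothetical helper names, and the package names in the list are assumed to match the installed distribution names.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not found."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in packages}


# Distribution names are assumptions based on the report above.
versions = report_versions(
    ["unstructured", "unstructured-inference", "pytesseract", "torch"]
)
for name, version in versions.items():
    print(f"{name}: {version}")
```

A None here would mean no installed distribution metadata was found under that name in the current environment.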
Thank you in advance for any help with this issue!