streamlit app using Table Transformer and OCR #200

Open: salman-moh wants to merge 16 commits into master

salman-moh

Adds OCR so that detected tables can be downloaded directly as CSV files.
HF space link: https://huggingface.co/spaces/SalML/TableTransformer2CSV

@salman-moh salman-moh marked this pull request as draft October 19, 2022 07:18
@salman-moh salman-moh marked this pull request as ready for review October 19, 2022 07:29
salman-moh and others added 5 commits October 19, 2022 13:04
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
@maxjeblick

When I upload an image to the app, I get the error below (table extraction probably failed on that particular image):

AttributeError: 'UploadedFile' object has no attribute 'split'
Traceback:

File "/home/user/.local/lib/python3.8/site-packages/streamlit/scriptrunner/script_runner.py", line 554, in _run_script
    exec(code, module.__dict__)
File "/home/user/app/app.py", line 501, in <module>
    asyncio.run(te.start_process(img_name, TD_THRESHOLD=0.6, TSR_THRESHOLD=0.8, padd_top=padd_top, padd_left=padd_left, padd_bottom=padd_bottom, padd_right=padd_right, delta_xmin=0, delta_ymin=0, delta_xmax=0, delta_ymax=0, expand_rowcol_bbox_top=0, expand_rowcol_bbox_bottom=0))
File "/usr/local/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
File "/usr/local/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
File "/home/user/app/app.py", line 438, in start_process
    print('No table found in the pdf-page image'+image_path.split('/')[-1])
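
The last frame suggests that `image_path` is the uploaded file object rather than a path string, so `.split('/')` fails while the "no table found" message is being built. Below is a minimal sketch of a name helper that accepts either kind of input. It assumes the upload exposes a `name` attribute (as Streamlit's `UploadedFile` does); `display_name` is a hypothetical helper for illustration, not code from the app.

```python
import os


def display_name(image):
    """Return a short file name for a path string or an uploaded-file object."""
    # Path-like input: keep only the last path component.
    if isinstance(image, (str, os.PathLike)):
        return os.path.basename(os.fspath(image))
    # File-like input (assumed to carry a `name` attribute): use that name.
    name = getattr(image, "name", None)
    return os.path.basename(name) if name else "<unnamed upload>"
```

With a helper like this, the message could be built as `'No table found in the pdf-page image ' + display_name(image_path)` without caring which type was passed in.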

@salman-moh
Author

salman-moh commented Oct 19, 2022

Yes, the model did not find any bounding box. Thanks for reporting this; I have updated the app to just print 'no table found' in that case.
I also added a slider for the threshold. Lower your threshold and try again, @maxjeblick.
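
Why lowering the threshold can help: detections scoring below the cutoff are discarded, so a table detected at, say, 0.55 confidence is dropped when the detection threshold is 0.6 (the `TD_THRESHOLD` value in the traceback). A rough sketch of that behavior, assuming detections arrive as `(score, bbox)` pairs; `filter_detections` and `report` are hypothetical names, not the app's actual functions.

```python
def filter_detections(detections, threshold):
    """Keep only detections whose confidence score exceeds the threshold."""
    return [(score, box) for score, box in detections if score > threshold]


def report(detections, threshold):
    """Summarize the filtered detections instead of raising when none remain."""
    kept = filter_detections(detections, threshold)
    if not kept:
        return "no table found"
    return f"{len(kept)} table(s) found"
```

For example, `report([(0.55, (0, 0, 10, 10))], 0.6)` gives the "no table found" message, while the same detection passes with a threshold of 0.5.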

@salman-moh
Author

@NielsRogge let me know if you would like any other changes.

@NielsRogge
Owner

Hi,

thanks for your PR. Maybe it's clearer to just include a link to your demo; I'd like to keep this repo just for notebooks.

@salman-moh
Author

Sounds good. I have removed app.py and included the demo link with a screenshot in the README.

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
4 participants