Improper results on scanned pdfs #193

Open
Shravan-Ganji opened this issue Aug 21, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@Shravan-Ganji

I have been using Layout Parser to analyze different types of documents. I get the expected results on true (digitally generated) PDFs but not on scanned PDFs: for a scanned page, the model either labels the page contents as a figure or otherwise returns results that do not match what I expect.

I see this issue only with scanned PDFs.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. The bug has not been fixed in the latest version, see the Layout Parser Releases

To Reproduce

import layoutparser as lp
import cv2

# OpenCV loads images as BGR; reverse the channel axis to get RGB.
image = cv2.imread("test.png")
image = image[..., ::-1]

# PubLayNet Faster R-CNN model; keep only detections scoring at least 0.8.
model = lp.models.Detectron2LayoutModel(
    'lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
)

color_map = {
    'Text': 'red',
    'Title': 'blue',
    'List': 'green',
    'Table': 'purple',
    'Figure': 'pink',
}

layout = model.detect(image)

# Draw the detected blocks on the page, colored by block type.
lp.draw_box(image, layout, box_width=3, color_map=color_map)

Environment

  1. I am using Windows
  2. Latest Layout Parser version

Attached images (2):

  1. Scanned PDF image result (image: "error")
  2. Proper PDF image result (image: "positive")

Shravan-Ganji added the bug (Something isn't working) label Aug 21, 2023
@Permafacture

Have you tried correcting the scanned images so that the background is plain white? Here's a robust-looking example using OpenCV:

https://www.freedomvc.com/index.php/2022/01/17/basic-background-remover-with-opencv/
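To make the suggestion concrete, here is a minimal sketch of one simple way to push a scan's background toward white. It is not the method from the linked article: it swaps in a plain percentile-based white-point normalization using NumPy only, and it assumes the background is roughly uniform and brighter than the ink. The function name `whiten_background` and the `percentile` parameter are hypothetical, chosen for illustration.

```python
import numpy as np


def whiten_background(image_rgb, percentile=90):
    """Rescale each channel so the estimated background becomes white.

    Hypothetical helper, not part of layoutparser or OpenCV. Assumes an
    H x W x 3 uint8 image whose background is brighter than its content.
    """
    img = image_rgb.astype(np.float32)
    # Estimate the background level per channel as a high percentile of
    # the pixel values; most pixels on a scanned page are background.
    background = np.percentile(img, percentile, axis=(0, 1))
    # Guard against division by zero on fully black channels.
    background = np.maximum(background, 1.0)
    # Scale so the background level maps to 255, then clip to valid range.
    scaled = img / background * 255.0
    return np.clip(scaled, 0, 255).astype(np.uint8)


# Synthetic "scan": gray background (200) with a dark patch standing in for text.
page = np.full((20, 20, 3), 200, dtype=np.uint8)
page[5:8, 5:8] = 30
cleaned = whiten_background(page)
```

In the reproduction script, something like `image = whiten_background(image)` could be tried just before `model.detect(image)`. Whether this improves detection on a given scan is untested here; unevenly lit or shadowed scans would likely need a local background estimate instead of a single global one.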
