Text can't be extracted from scanned PDF, jpg and png. #445

Takip31 · 2022-11-03T02:29:21Z

Describe the bug
The output .txt file contains only arrow characters and none of the text from the input file.

To Reproduce
Steps to reproduce the behavior:
Use this code:

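import glob
import os
import textract

# Collect the input files (scanned PDF, jpg or png).
file_paths = glob.glob(r'path/to/retrieve/file.extension')
for file_path in file_paths:
    # textract.process returns bytes, so decode before writing as text.
    text = textract.process(file_path).decode('utf-8')
    # Write the output next to the input, replacing the extension with .txt.
    with open(os.path.splitext(file_path)[0] + '.txt', 'w', encoding='utf-8') as out_file:
        out_file.write(text)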

Expected behavior
The text extracted from the input file should appear in the output .txt file.
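To narrow down what the "arrows" in the output are, it may help to look at the raw bytes that come back from extraction. The following is only a minimal sketch: it assumes the arrows are control characters such as form feeds (page breaks), which is a guess and is not confirmed by this report. The helper name summarize_extracted and the sample bytes are hypothetical and are not part of textract.

```python
def summarize_extracted(raw: bytes) -> dict:
    """Count readable and control characters in extracted output."""
    text = raw.decode("utf-8", errors="replace")
    # Control characters other than newline, carriage return and tab
    # (for example the form feed \x0c) carry no readable text.
    control = sum(1 for ch in text if ch < " " and ch not in "\n\r\t")
    # Letters and digits are treated as readable content.
    readable = sum(1 for ch in text if ch.isalnum())
    return {"length": len(text), "control_chars": control, "readable_chars": readable}


# Hypothetical sample: output made up only of page-break characters.
sample = b"\x0c\x0c\x0c"
print(summarize_extracted(sample))
```

Passing the bytes returned by the extraction call to this helper would show whether the output holds any letters or digits at all, or only control characters.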

Screenshots

Desktop (please complete the following information):

OS: Windows 10
Textract version: 1.6.5
Python version: 3.10
Virtual environment: No


Takip31 changed the title ~~Text can't be extracted from scanned PDF~~ Text can't be extracted from scanned PDF, jpg and png. Nov 3, 2022
