A script to convert PDF files to TXT
Updated Dec 8, 2022 - Python
Graphlit Platform
Apache Tika adapter in Go
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
A collection of PDF tools to convert, merge, and compress PDFs. Free & No installation.
IO management for PCU project
A PDFBox wrapper for extracting text and text coordinates from a printed PDF document (no OCR)
Perl client for SelectPdf Online REST API
JRuby gem to convert PDF to text while keeping the layout of the original PDF file
Sample code for PDF-to-text extraction using the PDF Parser library in CodeIgniter 3
C# demo for converting PDFs to images, extracting PDF text, adding digital signatures and watermarks to PDFs, and compressing PDFs
Python library and web service based on the Poppler pdftotext utility and Tesseract OCR for extracting text from PDF documents
The notebook in this repository uses pytesseract to extract text from a PDF document. The script can be used to automate text acquisition from a large body of printed resources such as books. The acquired text can then be used for downstream tasks such as training language models, building topic models, and document summarization.
Convert PDFs to text, then transform that text into structured JSON objects for Threat Intelligence.
I/O for nocodefunctions: CSV, TXT, PDF, and XLSX so far
A book reader with voice control functionality for blind people
Node.js client for SelectPdf Online REST API
Python script to translate a PDF file to DOCX or ODT
Aspose.PDF for JavaScript via C++
This code is designed to analyze a PDF document and determine the percentage of AI-generated content within the text. It utilizes the PyPDF2 library to extract the text from each page of the PDF and the NLTK library to check for AI-generated words.
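The approach in the last description (extract each page's text, then measure how many words match a word list) might look roughly like the sketch below. It is not that repository's code: the word list `FLAGGED_WORDS` and both function names are hypothetical, and a plain regex tokenizer stands in for NLTK so the counting logic runs without extra downloads. Only `PdfReader`, `reader.pages`, and `page.extract_text()` are real PyPDF2 calls.

```python
import re
from typing import Iterable, List, Set

# Hypothetical marker words; the repository's actual word list is not given.
FLAGGED_WORDS: Set[str] = {"delve", "tapestry", "furthermore"}


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return the text of each page, read with PyPDF2."""
    # Imported lazily so the counting logic below works without PyPDF2 installed.
    from PyPDF2 import PdfReader  # assumes PyPDF2 2.0 or later

    reader = PdfReader(pdf_path)
    # Image-only pages may yield no text, so fall back to an empty string.
    return [page.extract_text() or "" for page in reader.pages]


def flagged_percentage(page_texts: Iterable[str],
                       flagged: Set[str] = FLAGGED_WORDS) -> float:
    """Percentage of word tokens across all pages that are in the flagged set."""
    total = 0
    hits = 0
    for text in page_texts:
        # Simple regex tokenizer (assumption); nltk.word_tokenize could replace it.
        words = re.findall(r"[a-z']+", text.lower())
        total += len(words)
        hits += sum(1 for word in words if word in flagged)
    # Avoid division by zero for empty or image-only documents.
    return 100.0 * hits / total if total else 0.0
```

For a real file, `flagged_percentage(extract_page_texts("report.pdf"))` would chain the two steps, where `report.pdf` is a placeholder path. A word-list match is a crude signal, so the resulting percentage is best read as a rough indicator and not as a reliable detector.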