# pdf-to-text

Here are 58 public repositories matching this topic...

fabriziomiano / pdf2txt-azure-ocr

A script to convert PDF files to TXT

converter ocr azure-cognitive-services pdf-to-text pdf-to-image pdf-tools

Updated Dec 8, 2022
Python

orijtech / tikago

Apache Tika adapter in Go

tika pdf-to-text apache-tika transcribe docs-to-text

Updated Jan 4, 2017
Go

zevio / pcu_io

IO management for PCU project

python pdf parser json text pdf-to-text input-output pcu pcu-io json-to-text

Updated Nov 28, 2018
Python

kanishk-mehta / PDFBox-get-Coordinates-of-text

A PDFBox wrapper for extracting text and text coordinates from a printed PDF document (no OCR)

pdf-to-text coordinate pdf-reading text-coordinates

Updated Jul 10, 2018
JavaScript

Directorman9 / Optical-character-recognition

The notebook in this repository uses pytesseract to extract text from a PDF document. The script can be used to automate text acquisition from a large body of printed resources such as books. The acquired text can then be used for downstream tasks such as training language models and topic models or summarizing documents.

ocr pdf-to-text pytesseract

Updated Apr 30, 2022

dongju93 / extract-ti-from-reports

Convert PDFs to text, then transform that text into structured JSON objects for Threat Intelligence.

python pdf json regex jupyter-notebook pdf-to-text threat-intelligence text-to-json

Updated Mar 24, 2024
Jupyter Notebook

Dheovani / PDFConverter

Python script to convert a PDF file to DOCX or ODT

pdf python-script pdf-converter python3 docx pdf-to-text odt docx-generator odf pdf-to-docx pdf-to-odt

Updated May 12, 2024
Python

Kamaruddheen / document-scanner

Extract structured text and data from documents such as invoices, book pages, and tables using OpenCV and Tesseract OCR

python opencv tesseract-ocr pdf-to-text image-to-text

Updated Mar 14, 2024
HTML

pashaq / PdfToText-Converter

Converts PDF and FB2 documents to text or to a list of articles.

pdf csharp lib pdf-to-text itext pdf2txt fb2-to-text

Updated Aug 23, 2020
C#

mfakca / pdf2text

Converts PDFs to text

pdf-converter pdf-to-text

Updated Oct 9, 2021
Roff

53buahapel / pdf-to-text-converter

A Python script that I made to convert PDF to text

pdf pdf-converter pdf-to-text pdf-to-image

Updated Dec 6, 2023
Python

ajaycode / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

nlp pdf machine-learning natural-language-processing information-retrieval ocr deep-learning ml docx preprocessing pdf-to-text data-pipelines donut document-image-processing pdf-to-json document-ai document-image-analysis document-parsing langchain

Updated Mar 3, 2023
HTML

datalogics / apdfl-csharp-dotnet-framework-samples

Sample code for the Datalogics .NET Framework interface of the Adobe PDF Library

pdf ocr pdf-converter pdf-document pdf-conversion pdf-generation pdf-to-text pdf-manipulation pdfa pdf-split pdf-merger pdf-parser pdf-to-image pdf-tools pdf-compression pdf-lib pdf-render ocr-pdf pdf-to-office

Updated Apr 15, 2024
C#

selectpdf / selectpdf-api-perl-client

Perl client for SelectPdf Online REST API

html-to-pdf pdf-generator pdf-generation pdf-to-text pdf-merge pdf-generator-api html-to-pdf-converter search-pdf html-to-pdf-api

Updated Nov 17, 2021
Perl

aishwarya-art / Pdf-to-text-extract

Sample code for PDF-to-text extraction using the PDF Parser library in CodeIgniter 3

extraction pdf-to-text codeigniter3 composer-library pdfparser samlot

Updated Oct 5, 2023
PHP

amitbd1508 / Blind-EYE

A book reader with voice control functionality for blind people

windows pdf csharp winforms voice-recognition pdf-to-text voice-assistant

Updated Jun 29, 2020
C#

selectpdf / selectpdf-api-nodejs-client

Node.js client for SelectPdf Online REST API

pdf pdf-converter html-to-pdf pdf-to-text pdf-merge html-to-pdf-converter html-to-pdf-api pdf-merge-api pdf-to-text-api

Updated Nov 23, 2021
JavaScript

zevio / pcu_pdf

PDF parser component (Apache Tika) for PCU project

python pdf parser component tika apache pdf-to-text pcu pdf-parser-component

Updated Nov 28, 2018
Python

selectpdf / selectpdf-api-ruby-client

Ruby client for SelectPdf Online REST API

html-to-pdf pdf-to-text pdf-merge pdf-api html-to-pdf-api html-to-pdf-ruby

Updated Nov 17, 2021
Ruby

SaiGanesh-S / OCR-Django

Implementing the concept of Optical Character Recognition in Django

ocr pdf-to-text image-to-text django-project ocr-python

Updated Jan 26, 2023
Python
