
atbasu/document-content-extractor


Document Content Extractor

The Document Content Extractor is a tool for extracting semantic information from PDF text files. It is useful when the structure and format of the text being parsed are dynamic and the information to be extracted is semantic rather than syntactic in nature. It is not as fast as a static parser, but it can extract information that would be hard to encode in one, and it does not need to be modified every time the format of the underlying document changes.
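To illustrate the semantic approach, here is a minimal sketch of how a prompt-driven extractor might ask a completion model for named fields and read back its answer. It is not the tool's actual code: the field-description dictionary, the JSON reply format, and the names `build_extraction_prompt` and `parse_completion` are hypothetical, and the API call itself is left out.

```python
import json
import re
from typing import Dict, Optional

# Hypothetical field spec: field name -> natural-language description of what to find.
FieldSpec = Dict[str, str]


def build_extraction_prompt(document_text: str, fields: FieldSpec) -> str:
    """Ask the model for the requested fields as one JSON object."""
    field_lines = "\n".join(f'- "{name}": {desc}' for name, desc in fields.items())
    return (
        "Extract the following fields from the document below. "
        "Reply with a single JSON object that uses exactly these keys, "
        "and use null for any field the document does not contain.\n"
        f"{field_lines}\n\n"
        f'Document:\n"""\n{document_text}\n"""\n\nJSON:'
    )


def parse_completion(completion_text: str, fields: FieldSpec) -> Dict[str, Optional[object]]:
    """Pull the JSON object out of the model's reply, keeping only requested keys."""
    # Greedy match from the first "{" to the last "}" so nested objects survive.
    match = re.search(r"\{.*\}", completion_text, re.DOTALL)
    data: dict = {}
    if match:
        try:
            parsed = json.loads(match.group(0))
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            data = parsed
    # Missing or unparseable fields come back as None rather than raising.
    return {name: data.get(name) for name in fields}
```

With the official `openai` Python package (v1.x), the prompt could be sent with something like `OpenAI().completions.create(model=..., prompt=prompt, max_tokens=...)` and the reply read from `.choices[0].text`; which model and client version this tool actually uses is not stated here. Because the fields are described in natural language, a layout change in the source document needs no code change in this sketch; at most the field descriptions are adjusted.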

  1. The tool uses the OpenAI Completions API to process the text.
  2. It is fully configurable; see the section on Fine tuning the adapter below for more information.
  3. The tool can be used either as a standalone Python application or as a microservice running inside a larger application.
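The two deployment modes in item 3 can share a single extraction function. This sketch is an assumption about how that wiring could look, not the project's actual entry points: `extract` stands for any hypothetical callable that takes document text plus a mapping of field names to descriptions and returns a dict, and `make_wsgi_app` and `run_cli` are illustrative names. Only the standard library is used.

```python
import argparse
import json
import sys
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical extractor signature: (document text, {field: description}) -> {field: value}.
# In a real deployment this would wrap the call to the OpenAI API.
Extractor = Callable[[str, Dict[str, str]], Dict[str, object]]


def make_wsgi_app(extract: Extractor):
    """Microservice mode: a WSGI app accepting a JSON POST body of the form
    {"text": "...", "fields": {"name": "description", ...}}."""

    def app(environ, start_response) -> Iterable[bytes]:
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
            return [b"POST a JSON body with 'text' and 'fields'\n"]
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
            text, fields = payload["text"], payload["fields"]
        except (ValueError, KeyError, TypeError) as exc:
            # Bad Content-Length, invalid JSON, or missing keys -> client error.
            body = json.dumps({"error": str(exc)}).encode("utf-8")
            start_response("400 Bad Request", [("Content-Type", "application/json")])
            return [body]
        body = json.dumps(extract(text, fields)).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app


def run_cli(extract: Extractor, argv: Optional[List[str]] = None) -> int:
    """Standalone mode: read a text file and a JSON field map, print results as JSON."""
    parser = argparse.ArgumentParser(description="Extract fields from a text file.")
    parser.add_argument("text_file", help="plain-text file, e.g. text extracted from a PDF")
    parser.add_argument("fields_file", help="JSON object mapping field names to descriptions")
    args = parser.parse_args(argv)
    with open(args.text_file, encoding="utf-8") as f:
        text = f.read()
    with open(args.fields_file, encoding="utf-8") as f:
        fields = json.load(f)
    json.dump(extract(text, fields), sys.stdout, indent=2)
    sys.stdout.write("\n")
    return 0
```

For local testing, the WSGI app can be served with the standard library's `wsgiref.simple_server.make_server("", 8000, make_wsgi_app(extract)).serve_forever()`; in production it would more typically sit behind a WSGI server such as Gunicorn. Keeping the OpenAI call behind an injected callable also lets both front ends be tested without network access.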

↓ Click each link below to go to the corresponding section of the Wiki

⚠️ Caution: Before using this application on your data, verify that the OpenAI API meets the requirements for the data you want to process by reviewing OpenAI's security and compliance page.

About

Python program that uses OpenAI APIs to parse user-specified content from text files
