Skip to content

Onify Blueprint: Amazon AWS Textract - PDF to form example

License

Notifications You must be signed in to change notification settings

onify/blueprint-aws-textract-pdf-to-form

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Onify Blueprints

Project Status: WIP – Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

Onify Blueprint: Amazon AWS Textract - PDF to form example

Example how to 1) upload files to AWS S3 and 2) process the PDF file via AWS Textract and 3) send link to form to validate data from PDF. What you need to do is decide where the data from the form should go. But that is a different story and a different Blueprint :-)

Amazon Textract

Amazon Textract is a machine learning service that automatically extracts text, handwriting and data from scanned documents that goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. Read more: https://aws.amazon.com/textract/.

Onify Blueprint: Amazon AWS Textract - PDF to form example

Screenshots

PDF

Here is a example of the PDF files that are in the "inbox".

alt text

AWS Textract

Here is how Amazon Textract sees the PDF as a form.

alt text

Form

Here is how the data ends up in the form in Onify.

alt text

Onify Flow

alt text

Requirements

  • Onify Hub API 2.3.0 or later
  • Mail configured in Onify Hub
  • Onify Agent (tagged agent)
  • Onify Flow license
  • Node.js installed (on agent)
  • Camunda Modeler 4.4 or later
  • Amazon AWS services: S3 Bucket, SNS and SQS

Included

  • 1 x Flow
  • 3 x Scripts (nodejs)

Setup

Amazon AWS

In order for this to work you need the following setup:

  1. Amazon S3 Bucket
  2. AWS user with permissions
  3. Document access key (accessKeyId) and Secure Access Key for AWS user (secretAccessKey)

NOTE: For more information, please read Configuring Amazon Textract for Asynchronous Operations

NOTE: Amazon Textract is not available in all regions. Also make sure S3 bucket and Textract are in same region.

Onify Agent

  • Copy files from .\resources\agent\scripts to .\scripts folder on Onify Agent.
  • Run npm install from the .\scripts folder
  • Update aws_config.json with AWS credentials and region.

Onify Flow

Update flow (aws-textract-pdf-to-form.bpmn) with your own variables:

  • inboxPath - Path to the PDF files
  • bucket - S3 bucket to upload files
  • mailTo - Where to send the link to the form
  • onifyUrl - URL to Onify APP (default is http://localhost:3000)
  • roleArn - The Amazon Resource Name (ARN) of an IAM role that gives Amazon Textract publishing permissions to the Amazon SNS topic
  • snsTopicArn - The Amazon SNS topic that Amazon Textract posts the completion status to
  • sqsQueueUrl - Amazon SQS url that is subscribed to the SNS topic

Run

  1. Open aws-textract-pdf-to-form.bpmn in Camunda Modeler
  2. Click Start current diagram

Support

License

This project is licensed under the MIT License - see the LICENSE file for details.