prateekralhan/Scanned-PDFs-checker

📑📝 Scanned PDFs checker 📄👨‍💻

Project Status: Active

A minimalistic Streamlit-based web app that identifies which PDFs in a large corpus are scanned and which were created digitally. Want to OCR the scanned docs? Don't worry, we have that covered as well 😉

Live Web-App can be found here.
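
This README does not say how the app tells scanned PDFs from digital ones. One common heuristic, shown here only as a sketch, is to treat a page with almost no extractable text as scanned and label the document by the share of such pages. The function name `classify_pdf`, both thresholds, and the assumption that per-page text has already been extracted with some PDF library are hypothetical and not taken from this repository.

```python
from typing import List


def classify_pdf(page_texts: List[str],
                 min_chars_per_page: int = 20,
                 scanned_ratio: float = 0.5) -> str:
    """Label a PDF "scanned" or "digital" from its per-page extracted text.

    Hypothetical heuristic: a page whose text layer holds fewer than
    `min_chars_per_page` non-whitespace-trimmed characters counts as textless;
    the document is "scanned" when more than `scanned_ratio` of its pages
    are textless.
    """
    if not page_texts:
        raise ValueError("page_texts must contain at least one page")
    # Count pages whose extracted text is empty or nearly empty.
    textless = sum(1 for text in page_texts
                   if len(text.strip()) < min_chars_per_page)
    return "scanned" if textless / len(page_texts) > scanned_ratio else "digital"


if __name__ == "__main__":
    # Three pages with no text layer, then three pages with real text.
    print(classify_pdf(["", "  ", ""]))
    print(classify_pdf(["An embedded text layer with plenty of characters."] * 3))
```

The thresholds are arbitrary starting points; a corpus with stamped or watermarked scans may need different values.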

Installation:

  • Simply run the command pip install -r requirements.txt to install the necessary dependencies.

Usage:

  1. Simply run the command:
streamlit run app.py
  2. Navigate to http://localhost:8501 in your web browser.
  3. By default, Streamlit limits uploads to 200 MB per file. To raise the limit, set the maximum upload size (in MB) when launching the app:
streamlit run app.py --server.maxUploadSize=1028

Results


[Result screenshots 1 to 5]

Running the Dockerized App

  1. Ensure you have Docker installed and set up on your OS (Windows/Mac/Linux). For detailed instructions, please refer to this.
  2. Navigate to the folder where you cloned this repository (where the Dockerfile is present).
  3. Build the Docker image (don't forget the dot!! 😄):
docker build -f Dockerfile -t app:latest .
  4. Run the container:
docker run -p 8501:8501 app:latest

This will launch the dockerized app. Navigate to http://localhost:8501/ in your browser to have a look at your application. You can check the status of all running containers with:

docker ps
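
If you would rather script the check than open a browser, a small reachability probe is enough to confirm that something is answering on the published port. This is a sketch using only the Python standard library; the helper name `is_app_up` and the default timeout are assumptions for illustration, not part of this repository.

```python
import urllib.error
import urllib.request


def is_app_up(url: str = "http://localhost:8501", timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to `url` succeeds (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # urlopen raises HTTPError for 4xx/5xx, so reaching here means success.
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("app reachable:", is_app_up())
```

The probe only confirms that the port responds; it does not verify that the page served is the Streamlit app.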
