cygniv404/display-and-search-through-non-machine-readable-PDFs

Find and extract information from non-machine-readable PDFs on which OCR has already been run.

A web application that displays a document and lets users:

1 - Scroll through all the pages of the document

2 - Enter keywords and search the document for them

3 - Highlight words that match their keywords
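
The search and highlight features above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the `Token` fields (`page`, `text`, `bbox`) and the helper names `find_matches` and `to_percent_box` are hypothetical, and the real OCR token schema may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Token:
    """One OCR token. Field names are assumptions, not the repo's schema."""
    page: int                            # page the word appears on
    text: str                            # recognized word
    bbox: Tuple[int, int, int, int]      # (x0, y0, x1, y1) in image pixels


def find_matches(tokens: Iterable[Token], query: str) -> List[Token]:
    """Return tokens whose text contains any keyword, ignoring case."""
    keywords = [k.casefold() for k in query.split()]
    if not keywords:
        return []
    return [t for t in tokens if any(k in t.text.casefold() for k in keywords)]


def to_percent_box(bbox, page_width, page_height):
    """Convert a pixel box to percentages so a highlight scales with the image."""
    x0, y0, x1, y1 = bbox
    return {
        "left": 100 * x0 / page_width,
        "top": 100 * y0 / page_height,
        "width": 100 * (x1 - x0) / page_width,
        "height": 100 * (y1 - y0) / page_height,
    }
```

Expressing boxes as percentages is one possible design choice: a frontend could then position a highlight overlay on top of the page image at any zoom level without knowing the original pixel size.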

Two sample files are provided:

1 - tokens.json (OCR extracted Tokens)

2 - images.zip (11 extracted Images)
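
As a rough sketch of how the two sample files might be read, the snippet below groups tokens by page and lists the page images in the archive. It assumes that tokens.json holds a JSON list of objects with a `page` key and that the images are PNG or JPEG files; neither assumption is confirmed by this README, and the function names are hypothetical.

```python
import json
import zipfile
from collections import defaultdict


def load_tokens_by_page(fp):
    """Group tokens by page. Assumes a JSON list of objects with a "page" key."""
    by_page = defaultdict(list)
    for token in json.load(fp):
        by_page[token["page"]].append(token)
    return dict(by_page)


def list_page_images(zip_file):
    """Return the image file names found in the archive, sorted by name."""
    with zipfile.ZipFile(zip_file) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith((".png", ".jpg", ".jpeg"))
        )
```

For example, `load_tokens_by_page(open("tokens.json"))` and `list_page_images("images.zip")` would work under these assumptions. Note that the sort is lexicographic, so unpadded page numbers in file names would need a numeric sort key instead.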

# Instructions

Using Python 3.6.7

  1. Create a virtual environment named "venv" with 'python3 -m venv venv'. If you do not have the virtualenv Python package installed, follow the installation manual: https://virtualenv.pypa.io/en/stable/installation/
  2. Activate it with '. venv/bin/activate'
  3. Install the requirements with 'pip install -r requirements.txt'
  4. Run the application with 'python3 -m main'
  5. Open http://localhost:7474 in your browser

# Frontend code

Please see the frontend branch for the actual development code used.

About

Python | Flask | ReactJS
