Find and extract information from non-machine-readable PDFs. OCR has already been performed on the document.
A web application that displays a document and allows users to:
1 - Scroll through all the pages of the document
2 - Enter keywords and search the document for them
3 - Highlight words that match their keywords
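The search and highlight features above can be sketched roughly as follows. This is a minimal sketch, not the app's actual implementation. It assumes, hypothetically, that each entry in tokens.json holds the token's text, a zero-based page index, and a bounding box normalized to the range 0 to 1 of the page image size. The field names (`text`, `page`, `x`, `y`, `width`, `height`) and the functions `normalize`, `find_matches` and `highlight_boxes` are illustrative assumptions.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical token shape; check it against the real tokens.json before use:
# {"text": "Total:", "page": 0, "x": 0.5, "y": 0.25, "width": 0.1, "height": 0.02}
Token = Dict[str, object]


def normalize(word: str) -> str:
    """Lowercase a word and strip leading/trailing punctuation ('Total:' -> 'total')."""
    return re.sub(r"^\W+|\W+$", "", word.lower())


def find_matches(tokens: Iterable[Token], keywords: Iterable[str]) -> List[Token]:
    """Return the tokens whose normalized text equals one of the normalized keywords."""
    wanted = {normalize(k) for k in keywords}
    wanted.discard("")  # ignore keywords that were only punctuation or whitespace
    return [t for t in tokens if normalize(str(t["text"])) in wanted]


def highlight_boxes(matches: Iterable[Token], image_width: int,
                    image_height: int) -> List[Dict[str, int]]:
    """Scale each match's normalized box to pixel coordinates on its page image."""
    return [
        {
            "page": int(t["page"]),
            "left": round(float(t["x"]) * image_width),
            "top": round(float(t["y"]) * image_height),
            "width": round(float(t["width"]) * image_width),
            "height": round(float(t["height"]) * image_height),
        }
        for t in matches
    ]
```

Exact word matching is a design choice here: a search for "total" highlights "Total:" but not "Subtotal". The real app might prefer substring or fuzzy matching instead.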
Two sample files are provided:
1 - tokens.json (OCR-extracted tokens)
2 - images.zip (11 extracted images)
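One possible way to load the two sample files is sketched below using only the standard library (`json`, `zipfile`). It is a sketch under assumptions: tokens.json is taken to be a JSON array of token objects with a `page` field, and the images in images.zip are assumed to be ordered by the page number in their file names. The extension list and all function names are hypothetical.

```python
import json
import re
import zipfile
from collections import defaultdict
from typing import Dict, List

# Assumed image extensions inside images.zip (hypothetical; adjust to the real archive).
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".tif", ".tiff")


def load_tokens(path: str) -> List[dict]:
    """Read tokens.json, assumed here to be a JSON array of token objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def tokens_by_page(tokens: List[dict]) -> Dict[int, List[dict]]:
    """Group tokens by their (assumed) 'page' field, keeping their original order."""
    pages = defaultdict(list)  # type: Dict[int, List[dict]]
    for token in tokens:
        pages[token["page"]].append(token)
    return dict(pages)


def natural_key(name: str) -> list:
    """Sort key that orders 'page2.png' before 'page10.png' (case-insensitive)."""
    return [int(part) if part.isdecimal() else part.lower()
            for part in re.split(r"(\d+)", name)]


def sort_page_names(names: List[str]) -> List[str]:
    """Keep only image files and return them in natural (page) order."""
    images = [n for n in names if n.lower().endswith(IMAGE_EXTENSIONS)]
    return sorted(images, key=natural_key)


def list_page_images(zip_path: str) -> List[str]:
    """List the page images stored in images.zip without extracting them."""
    with zipfile.ZipFile(zip_path) as archive:
        return sort_page_names(archive.namelist())
```

Natural sorting matters once a document has ten or more pages. A plain `sorted()` would place "page10.png" before "page2.png".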
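Setup (using Python 3.6.7):
- Create a virtual environment named "venv" with 'python3 -m venv venv'. The venv module ships with Python 3; if it is not available on your system, you can install the virtualenv package instead by following its installation manual (https://virtualenv.pypa.io/en/stable/installation/) and then run 'virtualenv venv'.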
- Activate it with '. venv/bin/activate'
- Install the requirements with 'pip install -r requirements.txt'
- Run the application with 'python3 -m main'
- Open http://localhost:7474 in your browser
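For orientation only, here is a hypothetical sketch of the kind of server that 'python3 -m main' might start on port 7474, built on the standard library's `http.server`. It is not the repository's main.py. The `/search` endpoint, its `q` parameter, the JSON response shape and the in-memory `TOKENS` list are all assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory tokens; a real app would load tokens.json at startup.
TOKENS = [{"text": "Invoice", "page": 0}, {"text": "Total", "page": 3}]


def parse_keywords(path: str) -> List[str]:
    """Turn '/search?q=invoice+total' into ['invoice', 'total']."""
    query = parse_qs(urlparse(path).query)
    return [word.lower() for value in query.get("q", []) for word in value.split()]


def search(keywords: List[str]) -> List[dict]:
    """Return tokens whose lowercased text is one of the keywords."""
    wanted = set(keywords)
    return [t for t in TOKENS if t["text"].lower() in wanted]


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the hypothetical /search endpoint is handled; everything else is a 404.
        if urlparse(self.path).path != "/search":
            self.send_error(404)
            return
        body = json.dumps(search(parse_keywords(self.path))).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 7474) -> None:
    """Block forever, serving on http://localhost:<port>."""
    HTTPServer(("localhost", port), SearchHandler).serve_forever()
```

With `serve()` running, a request to http://localhost:7474/search?q=total would return `[{"text": "Total", "page": 3}]`. A complete app would also need to serve the HTML frontend and the page images.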
# Frontend code
Please see the frontend branch for the actual frontend development code.