A RESTful web API service for the Menelik's Berhan Ethiopic Script OCR app.
Menelik's Berhan (loosely translated as "Menelik's light") is a web API that provides OCR for image and PDF files containing Ethiopic script text.
It uses Google's open-source Tesseract OCR engine and provides OCR for text printed in Amharic, Ge'ez, and Tigrigna.
The API is intended for use in web applications, and the app's overall structure and abstractions reflect this.
The OCR process builds on concepts learned from a previous implementation of an Ethiopic Script CLI OCR app.
Please note that this OCR application is primarily designed to work with printed text. It may not perform well with handwritten text.
- OCR on Images and PDFs: Perform OCR on images and PDFs containing Ethiopic script text.
- OCR Process Tracking: Each OCR process (for image or PDF) is tracked and stored in a database for future analysis.
- Flexible OCR Outputs: OCR results can be provided in various formats including plain text, Microsoft Word, and PDF.
- OCR Result Accuracy: Provides an accuracy score for OCR results based on the average confidence level of words recognized.
- Configurable OCR Process: Users can configure the OCR process by adjusting Tesseract configuration options.
- Image Preprocessing: Includes image preprocessing capabilities to improve OCR results.
- File Storage and Metadata: Uploaded OCR input image and PDF files are stored locally, with file metadata stored in a database using class abstractions.
- Fine-Tuned Language Model: In addition to the default Tesseract language models, includes a fine-tuned model for Amharic, based on texts printed in the 1950s.
- Data Abstraction for Analysis: Abstracts input images & PDFs, Tesseract configuration used for OCR, and the OCR process & result using classes, and stores these data in the database for future analysis.
- OAuth2 User Authentication: Secure user authentication implemented using FastAPI's security utilities (OAuth2 with JWT tokens).
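
The accuracy score mentioned in the list above is described as the average confidence of recognized words. A minimal sketch of that idea follows. It assumes the per-word data has the shape returned by `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)`, that is, parallel sequences under the `text` and `conf` keys. The function name `mean_word_confidence` and the sample values are hypothetical and are not taken from the app's code.

```python
from typing import Dict, List, Optional, Sequence


def mean_word_confidence(data: Dict[str, Sequence]) -> Optional[float]:
    """Average the confidence of recognized words (Tesseract's 0-100 scale).

    `data` is assumed to look like the dict produced by pytesseract's
    image_to_data with Output.DICT: parallel "text" and "conf" sequences.
    """
    confidences: List[float] = []
    for text, conf in zip(data["text"], data["conf"]):
        conf_value = float(conf)  # some pytesseract versions return strings
        # Tesseract reports -1 for layout rows (page, block, line) with no word.
        if conf_value < 0 or not str(text).strip():
            continue
        confidences.append(conf_value)
    if not confidences:
        return None  # nothing was recognized, so there is no meaningful score
    return sum(confidences) / len(confidences)


# Hypothetical sample: two layout rows, two words, one whitespace-only entry.
sample = {
    "text": ["", "", "ሰላም", "ዓለም", " "],
    "conf": [-1, -1, 92.5, 81.5, 95],
}
print(mean_word_confidence(sample))  # 87.0
```

Skipping rows with a confidence of -1 or empty text keeps layout entries from dragging the average down. Whether the app weights words equally, as this sketch does, is an assumption.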
- Python (3.8): The whole app is built with Python.
- FastAPI (0.110.0): This is the web framework used.
- Uvicorn (0.29.0): An ASGI web server implementation for Python.
- MongoDB (7.0.5): This is the database used.
- Motor (3.3.2): Asynchronous Python driver for MongoDB.
- Pydantic (2.6.4): Data validation library for Python.
- python-jose (3.3.0): A JavaScript Object Signing and Encryption (JOSE) implementation in Python.
- Passlib (1.7.4): A password hashing library for Python.
- Tesseract (5.3.4): This is the OCR engine used.
- PyTesseract (0.3.10): An OCR tool for Python. It's a wrapper for Tesseract-OCR Engine.
- Aiofiles (23.2.1): A library for handling asynchronous file I/O.
- PDF2Image (1.17.0): A Python module that converts PDFs into images.
- NumPy (1.24.4): A package for scientific computing with Python.
- OpenCV-Python (4.9.0.80): A library for real-time computer vision.
- Pillow (10.2.0): Adds image processing capabilities to Python.
- python-docx (1.1.0): Reads, queries and modifies Microsoft Word 2007/2008 docx files.
- FPDF2 (2.7.8): A library to create PDF documents using Python.
- Pytest: This is the testing framework used.
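
Tesseract options reach the engine through PyTesseract as a single configuration string, which is one way the configurable OCR process listed among the features could be wired up. The sketch below builds such a string from an OCR engine mode (`--oem`), a page segmentation mode (`--psm`), and optional `-c name=value` variables. The helper name `build_tesseract_config` and its defaults are hypothetical; the app's own abstraction may look different.

```python
from typing import Dict, Optional


def build_tesseract_config(
    oem: int = 1,
    psm: int = 3,
    variables: Optional[Dict[str, str]] = None,
) -> str:
    """Build a Tesseract command-line option string.

    oem: OCR engine mode, 0-3 (1 selects the LSTM engine only).
    psm: page segmentation mode, 0-13 (3 is fully automatic segmentation).
    variables: Tesseract config variables passed as `-c name=value`.
    """
    if oem not in range(4):
        raise ValueError(f"oem must be between 0 and 3, got {oem}")
    if psm not in range(14):
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    parts = [f"--oem {oem}", f"--psm {psm}"]
    for name, value in (variables or {}).items():
        parts.append(f"-c {name}={value}")
    return " ".join(parts)


config = build_tesseract_config(psm=6, variables={"preserve_interword_spaces": "1"})
print(config)  # --oem 1 --psm 6 -c preserve_interword_spaces=1

# With Tesseract and PyTesseract installed, the string would be passed as:
# pytesseract.image_to_string(image, lang="amh", config=config)
```

Validating the modes up front surfaces a bad option as a clear error before Tesseract is invoked.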
Implemented and tested on Ubuntu 20.04 with Python 3.8.
# (Optional) For Tesseract 5.x, add this repository
sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel
# Reload local package database
sudo apt update
# Install tesseract
sudo apt install -y tesseract-ocr
# Install gnupg and curl if they are not already available
sudo apt-get install gnupg curl
# Import the MongoDB public GPG key
curl -fsSL https://www.mongodb.org/static/pgp/server-7.0.asc | \
sudo gpg -o /usr/share/keyrings/mongodb-server-7.0.gpg \
--dearmor
# Create a list file for MongoDB
echo "deb [ arch=amd64,arm64 signed-by=/usr/share/keyrings/mongodb-server-7.0.gpg ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/7.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-7.0.list
# Reload local package database
sudo apt-get update
# Install the MongoDB packages
sudo apt-get install -y mongodb-org
# Start MongoDB
sudo systemctl start mongod
# Enable MongoDB on startup
sudo systemctl enable mongod
git clone https://github.com/MenelikBerhan/REST-API_for_Ethiopic_Script_OCR.git
cd REST-API_for_Ethiopic_Script_OCR
It's recommended to set up a Python virtual environment before installing the requirements:
sudo apt install -y python3.8-venv
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt
To change the default app variables, set their values in the `app_env` file.
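
The format of `app_env` is not described here. Assuming it holds plain `KEY=VALUE` lines, a reader for such a file might look like the sketch below. The function `read_env_file` and the variable names `API_PORT` and `DB_NAME` are hypothetical placeholders, so check the file shipped with the repository for the real variable names and for how the app actually loads them.

```python
import os
import tempfile
from typing import Dict


def read_env_file(path: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and `#` comments."""
    values: Dict[str, str] = {}
    with open(path, encoding="utf-8") as env_file:
        for raw_line in env_file:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value.
            values[key.strip()] = value.strip().strip("'\"")
    return values


# Demonstration with a temporary file and made-up variable names.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "app_env")
    with open(demo_path, "w", encoding="utf-8") as demo_file:
        demo_file.write("# hypothetical settings\nAPI_PORT=8000\nDB_NAME='ocr_db'\n")
    settings = read_env_file(demo_path)

print(settings)  # {'API_PORT': '8000', 'DB_NAME': 'ocr_db'}
```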
python -m api.v1.app
To change the app variables used for testing, set their values in the `test_env` file.
`pytest`
`pytest tests/[<test_file.py>]`
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.