A robust API for extracting information from Bangladesh National ID Cards using OCR technology. This service provides a secure, rate-limited REST API that processes images and extracts key information such as name, date of birth, and ID number.
- Robust Text Extraction: Uses EasyOCR with specialized patterns for Bangladesh ID cards
- High Accuracy: Multiple pattern matching algorithms to handle various ID card formats
- Secure API: Token-based authentication and request rate limiting
- Cross-Platform: Works on both Windows and Linux environments
- Field Validation: Validates extracted information against provided data
- Resource Management: Automatic cleanup of temporary files after processing
- Comprehensive Logging: Detailed logs for debugging and auditing
- Python 3.8+ (tested on Python 3.12 and 3.13)
- Flask web framework
- OpenCV for image processing
- EasyOCR for text extraction
- Storage space for model files (~100MB)
- Clone the repository:
    git clone https://github.com/yourusername/nid-ocr-extractor.git
    cd nid-ocr-extractor
- Create environment file:
    # Copy example environment file
    cp .env.example .env
    # Generate secure tokens and update .env file
    python -c "import secrets; print(f'SECRET_KEY={secrets.token_hex(32)}')"
    python -c "import secrets; print(f'AUTH_TOKEN={secrets.token_hex(16)}')"
- Create and activate virtual environment (Windows):
    python -m venv win_venv
    win_venv\Scripts\activate
- Install dependencies (Windows):
    pip install -r requirements.txt
    # Windows needs python-magic-bin instead of python-magic
    pip uninstall -y python-magic
    pip install python-magic-bin
- Create and activate virtual environment (Linux):
    python -m venv venv
    source venv/bin/activate
- Install system dependencies:
  For Debian/Ubuntu:
    sudo apt-get update
    sudo apt-get install -y libgl1-mesa-glx libglib2.0-0 libsm6 libxext6 libxrender-dev
  For Arch Linux:
    sudo pacman -Syu
    sudo pacman -S mesa glib2 libx11 libxext libxrender
- Install Python dependencies:
    pip install -r requirements.txt
# Start the Flask server
python app.py
By default, the server runs on http://localhost:5000.
The repository includes a client script for testing the API:
# Basic usage with default settings
python client.py --token YOUR_AUTH_TOKEN
# Specify a custom image
python client.py --image path/to/id/image.jpg --token YOUR_AUTH_TOKEN
# Compare with known data
python client.py --name "John Doe" --dob "15 Mar 1985" --token YOUR_AUTH_TOKEN
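The comparison against known data can be pictured with a small sketch. The README does not document how the service scores a match, so the `difflib`-based ratio and the helper names below (`normalize`, `name_similarity`, `dates_match`) are assumptions for illustration, not the project's actual logic.

```python
from datetime import datetime
from difflib import SequenceMatcher


def normalize(text):
    """Upper-case the text and collapse runs of whitespace."""
    return " ".join(text.upper().split())


def name_similarity(expected, extracted):
    """Return a ratio in [0, 1]; 1.0 means the normalized names are identical."""
    return SequenceMatcher(None, normalize(expected), normalize(extracted)).ratio()


def dates_match(expected, extracted, fmt="%d %b %Y"):
    """Compare two dates written like '15 Mar 1985'.

    Month abbreviations are parsed with the current locale (English by default).
    """
    parse = lambda value: datetime.strptime(value.strip(), fmt).date()
    return parse(expected) == parse(extracted)


if __name__ == "__main__":
    print(name_similarity("md samim mia", "MD SAMIM MIA"))
    print(dates_match("07 Jun 1972", "07 Jun 1972"))
```

A ratio-based score tolerates small OCR slips (a dropped letter, extra spacing) better than an exact string comparison would.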
Health check endpoint that confirms the API is running.
Processes an ID card image and extracts information.
Request Headers:
- X-API-Token: Your authentication token from the .env file
Form Data:
- image: The image file (JPEG, PNG)
- Name (optional): Name for comparison
- Date of Birth (optional): Date of birth for comparison
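For readers who want to call the endpoint without the bundled client, here is a minimal standard-library sketch that builds the multipart request from the header and form fields listed above. The endpoint path is not given in this README, so the URL in the usage note is a placeholder, and `build_extract_request` is a hypothetical helper name.

```python
import urllib.request
import uuid


def build_extract_request(url, token, image_bytes, filename="id.jpg",
                          name=None, dob=None):
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    parts = []

    def add_field(field, value):
        # One plain-text form field per part.
        parts.append(
            (f'--{boundary}\r\n'
             f'Content-Disposition: form-data; name="{field}"\r\n\r\n'
             f'{value}\r\n').encode("utf-8")
        )

    # Optional comparison fields, named as in the Form Data list.
    if name is not None:
        add_field("Name", name)
    if dob is not None:
        add_field("Date of Birth", dob)

    # The image part; the JPEG content type is an assumption of this sketch.
    parts.append(
        (f'--{boundary}\r\n'
         f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
         f'Content-Type: image/jpeg\r\n\r\n').encode("utf-8")
        + image_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    request = urllib.request.Request(url, data=b"".join(parts), method="POST")
    request.add_header("X-API-Token", token)
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return request
```

To send it, pass the result to `urllib.request.urlopen(request)` with the URL set to the server address plus the extraction endpoint path used by your deployment.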
Response:
{
  "Name": "MD SAMIM MIA",
  "Date of birth": "07 Jun 1972",
  "ID Number": "9116217028",
  "Full extracted text": "...",
  "similarity": {
    "status": "no_comparison_data_provided"
  }
}
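A client will usually want to check which fields were actually extracted and whether a comparison took place. The sketch below relies only on the keys visible in the sample response; `summarize_response` is a hypothetical helper, and the shape of `similarity` when comparison data is supplied is not documented here, so it is left untouched.

```python
import json

# Field names copied from the sample response above.
EXTRACTED_FIELDS = ("Name", "Date of birth", "ID Number")


def summarize_response(raw_json):
    """Report extracted fields, missing fields, and whether a comparison ran."""
    data = json.loads(raw_json)
    similarity = data.get("similarity", {})
    return {
        "fields": {key: data.get(key) for key in EXTRACTED_FIELDS},
        "missing": [key for key in EXTRACTED_FIELDS if not data.get(key)],
        "compared": similarity.get("status") != "no_comparison_data_provided",
    }
```

Treating an empty or absent field as missing lets the caller decide whether to retry with a clearer image.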
- libmagic not found error:
    ImportError: failed to find libmagic
  Solution: Replace python-magic with python-magic-bin:
    pip uninstall -y python-magic
    pip install python-magic-bin
- DLL load failed error:
    ImportError: DLL load failed while importing cv2
  Solution: Reinstall OpenCV:
    pip uninstall -y opencv-python
    pip install opencv-python
- OpenGL/libGL.so.1 error:
    ImportError: libGL.so.1: cannot open shared object file
  Solution: Install required libraries:
    # For Ubuntu/Debian (on newer releases that no longer ship libgl1-mesa-glx, install libgl1 instead)
    sudo apt-get install -y libgl1-mesa-glx
    # For Arch Linux
    sudo pacman -S mesa
- Permission denied for cache directory:
    PermissionError: [Errno 13] Permission denied: 'cache'
  Solution: Check permissions:
    chmod 750 cache
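The cache-directory fix can also be applied from Python, for example in a startup check. This is a minimal sketch, assuming the directory is owned by the user running the server; `ensure_cache_dir` is a hypothetical helper and not part of the project.

```python
import os


def ensure_cache_dir(path="cache", mode=0o750):
    """Create the directory if needed, set its mode, and report usability."""
    os.makedirs(path, exist_ok=True)
    # Equivalent to `chmod 750 cache`; on Windows only the read-only bit is honored.
    os.chmod(path, mode)
    # The process needs read, write, and traverse access to store temporary files.
    return os.access(path, os.R_OK | os.W_OK | os.X_OK)
```

If the function returns `False`, the directory most likely belongs to another user and its ownership needs to be corrected first.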
The application uses environment variables defined in .env for configuration:
| Variable | Description | Default |
|---|---|---|
| SECRET_KEY | Secret key for Flask | Generated value |
| AUTH_TOKEN | API authentication token | Generated value |
| RATE_LIMIT | Max requests per window | 10 |
| RATE_LIMIT_WINDOW | Rate limit window in seconds | 60 |
| MAX_CONTENT_LENGTH | Max allowed file size in bytes | 5242880 (5 MB) |
| CACHE_DIR | Directory for temporary files | cache |
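To show how these variables and defaults fit together, here is a sketch of a settings loader. It reads from the process environment (or any mapping) and does not parse the .env file itself; the `Settings` class and `load_settings` function are hypothetical names, and how the application actually loads its configuration is not described in this README.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    secret_key: str
    auth_token: str
    rate_limit: int
    rate_limit_window: int
    max_content_length: int
    cache_dir: str


def load_settings(env=None):
    """Build Settings from a mapping, applying the defaults from the table."""
    env = os.environ if env is None else env
    # The two generated values have no safe default, so they are required.
    missing = [key for key in ("SECRET_KEY", "AUTH_TOKEN") if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Settings(
        secret_key=env["SECRET_KEY"],
        auth_token=env["AUTH_TOKEN"],
        rate_limit=int(env.get("RATE_LIMIT", "10")),
        rate_limit_window=int(env.get("RATE_LIMIT_WINDOW", "60")),
        max_content_length=int(env.get("MAX_CONTENT_LENGTH", str(5 * 1024 * 1024))),
        cache_dir=env.get("CACHE_DIR", "cache"),
    )
```

Failing fast on a missing `SECRET_KEY` or `AUTH_TOKEN` avoids starting the server with an empty credential.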
- Always use a strong, randomly generated AUTH_TOKEN
- The API implements rate limiting to prevent abuse
- Temporary files are automatically deleted after processing
- Input validation helps prevent malicious uploads
- Security headers mitigate common web vulnerabilities
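Two of the points above, token checking and rate limiting, can be sketched in a few lines. This is an illustration under stated assumptions: the project's real implementation is not shown in this README, and `token_is_valid` and `SlidingWindowLimiter` are hypothetical names. The defaults mirror `RATE_LIMIT` (10) and `RATE_LIMIT_WINDOW` (60 seconds).

```python
import hmac
import time
from collections import defaultdict, deque


def token_is_valid(presented, expected):
    """Compare tokens in constant time to avoid leaking timing information."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window_seconds`."""

    def __init__(self, limit=10, window_seconds=60, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)

    def allow(self, client_id):
        now = self.clock()
        hits = self._hits[client_id]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

In a Flask application, a check like this would typically run before the request handler, keyed by client address or token, and return HTTP 429 when `allow` is false.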
This project is licensed under the MIT License - see the LICENSE file for details.
- EasyOCR for the OCR engine
- Flask team for the web framework
- OpenCV contributors for image processing capabilities
Note: This software is intended for legitimate identity verification purposes. Please ensure compliance with local data protection and privacy regulations when handling personal identification information.