VeriShield is an open-source initiative to build a modular, scalable, and efficient backend solution for KYC (Know Your Customer) and KYB (Know Your Business) processes. By leveraging technologies such as FastAPI, PostgreSQL, Neo4j, Kafka, and Machine Learning, VeriShield automates identity verification, detects fraud, and delivers real-time risk scoring. Its community-driven architecture ensures flexibility and extensibility, allowing developers to integrate additional data sources (like IP watchlists or advanced device intelligence) and sophisticated ML workflows.
- Introduction
- Project Goals
- Features (Phase 1)
- Features (Phase 2)
- Features (Phase 3)
- Features (Phase 4)
- Quick Start (Docker-Only)
- Testing (Docker-Only)
- Seeding Data (Optional)
- Requirements
- Project Structure
- Roadmap
- License
- Contact
VeriShield serves as a backend simulation for financial institutions, fintech startups, and e-commerce platforms requiring KYC/KYB capabilities. By automating identity verification, fraud detection, and real-time risk assessments, it addresses complex regulatory requirements in identity management. Key highlights:
- Automated identity verification reduces human error and manual overhead.
- Fraud detection using classical ML, deep learning, or graph neural networks (GNNs), the last being especially relevant for ring-based or multi-owner collusion.
- Real-time risk scoring integrated with Kafka.
- Graph-based analysis (Neo4j) for discovering hidden suspicious relationships (e.g., shared IP usage, ring leaders, multi-owner webs).
By design, VeriShield is modular—enabling quick enhancements (like IP watchlists or synergy-based labeling) to keep pace with evolving fraud tactics.
- Automate KYC/KYB processes, reducing manual checks while maintaining regulatory compliance.
- Detect anomalies & potential fraud using synergy-based labeling, ring expansions, and watchlist IP logic.
- Model entity relationships in a graph database for advanced ring or multi-owner detection (Neo4j).
- Harness asynchronous workflows using Kafka, ensuring robust & scalable verification at high volumes.
- Enable easy extensibility through microservices, containerization, and a plug-in approach for advanced ML or GNN solutions.
- Fintech (AML, user signups, suspicious IP tracking)
- Digital Banking & E-Commerce (fraud detection, real-time risk-based transaction blocking)
- Analytics & Risk: Combining ML & GNN for advanced ring-based anomaly detection in complex user–business–IP graphs.
- Dockerized Setup: Local deployment with FastAPI, PostgreSQL, Neo4j.
- Basic Endpoint: A /health route verifying the service's operational status.
- Initial Testing: Basic Pytest coverage verifying that the environment and containers work together.
- Foundational Structure: Clear environment variables, Docker configuration, and code organization.
- CRUD Endpoints (FastAPI):
- User & Business create/read/update.
- Basic data validation with Pydantic.
- Database Integration:
- SQLAlchemy + Postgres for standard relational data.
- Neo4j driver for future graph-based queries or ring expansions.
- Secure Passwords:
- bcrypt hashing.
- Potential to expand for more advanced authentication flows.
- Advanced Testing:
- Integration tests checking CRUD correctness (e.g., duplicates, 404s).
- Additional Docker-based tests.
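The salted hash-and-verify pattern behind the "Secure Passwords" item can be sketched as follows. The project itself uses bcrypt; to stay dependency-free, this sketch swaps in PBKDF2 from the Python standard library. The function names, the `salt$digest` storage format, and the iteration count are assumptions for illustration, not taken from the repository.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration; tune for your hardware.
ITERATIONS = 200_000


def hash_password(password: str) -> str:
    """Return 'salt$digest' (both hex) using a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return f"{salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Only the output of `hash_password` would be stored; the plaintext password never needs to reach the database.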
- Event-Driven Architecture via Kafka:
- Producer publishes events (user_created, user_verified).
- Consumer listens and sets is_verified=true in the background.
- Retries & DLQ:
- Automatic re-delivery on partial failures.
- “Dead Letter Queue” for unresolvable messages.
- Scaling:
- As user volume increases, scale consumer services horizontally.
- Test Coverage:
- Integration tests verifying event-driven flows.
- Demonstrates asynchronous identity checks.
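The retry and dead-letter behaviour described above can be illustrated without a running broker. The sketch below is an in-memory stand-in, not the project's Kafka consumer: the `consume` and `make_verifier` helpers, the event shape, and the retry budget of three attempts are all hypothetical.

```python
from collections import deque
from typing import Callable, Iterable

# Assumed retry budget; the real consumer may use a different policy.
MAX_ATTEMPTS = 3


def consume(
    events: Iterable[dict],
    handler: Callable[[dict], None],
    max_attempts: int = MAX_ATTEMPTS,
) -> list[dict]:
    """Process events, re-queueing failures; return the dead-lettered ones."""
    queue = deque((event, 1) for event in events)
    dead_letters: list[dict] = []
    while queue:
        event, attempt = queue.popleft()
        try:
            handler(event)
        except Exception as exc:
            if attempt < max_attempts:
                # Re-deliver later, behind the events already waiting.
                queue.append((event, attempt + 1))
            else:
                # Retries exhausted: park the message for manual inspection.
                dead_letters.append({"event": event, "error": str(exc)})
    return dead_letters


def make_verifier(users: dict) -> Callable[[dict], None]:
    """Build a handler that marks a user verified on user_created events."""
    def verify_user(event: dict) -> None:
        if event["type"] != "user_created":
            return
        # Raises KeyError for unknown users, which triggers the retry path.
        users[event["user_id"]]["is_verified"] = True
    return verify_user
```

With a `users` dict holding only user 1, feeding in events for users 1 and 99 verifies user 1 and leaves the event for user 99 in the returned dead-letter list after three attempts.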
- Risk Scoring Service:
- ML pipeline generating risk scores for new signups or business registrations.
- Can run offline in batch or in real time inside a Kafka consumer.
- verishield_ml_experiments Sub-Project:
- Found in verishield_ml_experiments/.
- Synthetic data creation (multi-pass synergy, ring leaders, IP collisions).
- EDA & Model Training notebooks (XGBoost, Keras MLP, GNN).
- Demonstrates multi-task classification: user, business, plus IP nodes.
- Offline + Online Flow:
- Offline: train/tune ML or GNN on synthetic or partial real data.
- Online: integrate best models into the microservice or consumer for real-time risk flags.
- Neo4j + GNN:
- Phase 5 focuses on deeper integration with Neo4j for ring-based or IP-based subgraphs.
- Evaluate suspicious patterns (shared IP usage, colluding ring leaders) to refine fraud detection.
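As a rough illustration of the shared-IP pattern mentioned above, the sketch below groups users who are linked, directly or transitively, through a common IP address. It uses a plain union-find in Python rather than Neo4j or a GNN, and the `find_ip_rings` name, the `(user_id, ip)` input shape, and the `min_size` threshold are assumptions made for this example.

```python
from collections import defaultdict
from typing import Iterable


def find_ip_rings(
    logins: Iterable[tuple[str, str]], min_size: int = 3
) -> list[list[str]]:
    """Return sorted groups of users connected via shared IPs."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_user_by_ip: dict[str, str] = {}
    for user, ip in logins:
        find(user)  # register the user even if the IP is unique
        if ip in first_user_by_ip:
            union(first_user_by_ip[ip], user)
        else:
            first_user_by_ip[ip] = user

    groups: dict[str, set[str]] = defaultdict(set)
    for user in parent:
        groups[find(user)].add(user)
    # Keep only groups large enough to be worth flagging.
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)
```

For example, if u1 and u2 share one IP and u2 and u3 share another, all three land in the same group even though u1 and u3 never used the same address. A graph database expresses the same idea as a path query over user and IP nodes.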
- Clone:
git clone https://github.com/Harshil7875/VeriShield-AI-Financial-Verification-Platform.git
cd VeriShield-AI-Financial-Verification-Platform
- Launch:
docker compose up -d --build
- Runs backend (FastAPI), consumer, Postgres, Neo4j, Kafka, Zookeeper.
- Check:
docker compose ps
- Ensure containers are healthy.
- Health:
- Visit http://localhost:8000/health. Expect {"status":"OK"}.
- Logs:
docker compose logs backend -f
docker compose logs consumer -f
- Create a User:
curl -X POST -H "Content-Type: application/json" \
  -d '{"email":"test@example.com","password":"pass123"}' \
  http://localhost:8000/users
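For scripted checks, the health and create-user steps above can also be driven from Python's standard library. This is a minimal sketch that assumes the stack is running on localhost:8000 as in the steps above; the helper names are hypothetical, and the shape of the /users response is not assumed.

```python
import json
import urllib.request

# Matches the address used in the Quick Start steps.
BASE_URL = "http://localhost:8000"


def is_healthy(body: bytes) -> bool:
    """True if the /health response body is {"status": "OK"}."""
    return json.loads(body).get("status") == "OK"


def build_create_user_request(
    email: str, password: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the same POST request as the curl command, without sending it."""
    payload = json.dumps({"email": email, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/users",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(base_url: str = BASE_URL) -> None:
    """Hit /health, then create a user; requires the containers to be up."""
    with urllib.request.urlopen(f"{base_url}/health") as resp:
        print("healthy:", is_healthy(resp.read()))
    request = build_create_user_request(
        "test@example.com", "pass123", base_url
    )
    with urllib.request.urlopen(request) as resp:
        print(resp.status, resp.read().decode("utf-8"))
```

Call `smoke_test()` once `docker compose ps` reports the containers as healthy.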
- Enter Container:
docker compose exec backend /bin/bash
- Pytest:
pytest --cov=app --cov-report=term-missing
- Shows coverage and any warnings.
- Inside container:
docker compose exec backend /bin/bash
- Run:
cd scripts
python seed_data.py 10 15 True
- Seeds 10 users, 15 businesses, optionally Neo4j data.
- Docker (Docker Desktop or engine + compose)
- Git
- (Optional) Python 3.11+ for local dev
- (Optional) Conda/virtualenv for local environment
Apple Silicon: Our images (e.g. postgres:15, neo4j:5) support arm64. If you run into issues, specify platform: linux/amd64 in docker-compose.yml.
VeriShield-AI-Financial-Verification-Platform/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI endpoints
│ │ ├── kafka_consumer.py # Listens for user_created events
│ │ ├── kafka_producer.py # Publishes user_created events
│ │ ├── models.py # SQLAlchemy models (User/Business)
│ │ ├── database.py # Postgres + Neo4j config
│ │ ├── crud.py # DB logic
│ │ ├── schemas.py # Pydantic schemas
│ │ └── __init__.py
│ ├── tests/
│ │ ├── test_kafka.py
│ │ └── test_main.py
│ ├── scripts/
│ │ └── seed_data.py
│ ├── Dockerfile
│ ├── requirements.txt
│ └── __init__.py
├── verishield_ml_experiments/
│ ├── data_generators/
│ ├── notebooks/
│ ├── requirements.txt
│ └── README.md
├── docker-compose.yml
├── LICENSE
└── README.md
- Phase 3: (Complete) Kafka-based asynchronous user verification
- Phase 4: ML & GNN integration for advanced risk scoring (ongoing)
- Phase 5: Neo4j expansions (graph-based synergy, ring-based analytics)
- Phase 6: Cloud deployment, CI/CD
- Phase 7: Observability (monitoring, logging, alerting), performance
Licensed under the MIT License. Feel free to use, modify, and distribute under these terms. We welcome community contributions to enhance synergy-based ring detection, IP classification, or advanced GNN integrations.
For questions, feature requests, or contributions:
- Maintainer: harshilbhandari01@gmail.com
I appreciate feedback and pull requests to strengthen identity verification workflows, ring-based detection, multi-task classification, or advanced GNN modeling for real-time fraud prevention.