The project explores the collaborative potential among development banks using Natural Language Processing (NLP) techniques. By analyzing textual data such as project declarations from various development banks, it aims to uncover potential areas of collaboration and synergy.
- Data Pipeline: Fetches and preprocesses development cooperation projects from the IATI Datastore (IATI)
- Similarity Scores: Calculation of text similarities between fetched projects to find similar projects
- Extended Similarity Scores: Calculation of an extended similarity score that combines cosine text similarity, CRS3 codes, CRS5 codes and SDGs
- Application: Visualization of results in a web application built with Streamlit (App)
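The extended similarity score listed above can be pictured as a weighted combination of its four components. The following is only a minimal sketch under stated assumptions: the `Project` fields, the Jaccard overlap for code sets, and the weights in `WEIGHTS` are hypothetical and not taken from the project's notebooks.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Project:
    """Minimal project record; the field names are hypothetical."""
    crs3_codes: frozenset = field(default_factory=frozenset)
    crs5_codes: frozenset = field(default_factory=frozenset)
    sdgs: frozenset = field(default_factory=frozenset)


def jaccard(a, b):
    """Overlap of two code sets in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical weights that sum to 1.0; the project may weight differently.
WEIGHTS = {"text": 0.4, "crs3": 0.2, "crs5": 0.2, "sdg": 0.2}


def extended_similarity(text_cosine, p, q, weights=WEIGHTS):
    """Combine text cosine similarity with CRS3, CRS5 and SDG overlap."""
    parts = {
        "text": text_cosine,
        "crs3": jaccard(p.crs3_codes, q.crs3_codes),
        "crs5": jaccard(p.crs5_codes, q.crs5_codes),
        "sdg": jaccard(p.sdgs, q.sdgs),
    }
    return sum(weights[name] * value for name, value in parts.items())
```

With these illustrative weights, two projects that share a CRS3 code and one of two SDGs, but differ in their CRS5 code, score higher than their text similarity alone would suggest only if the text similarity is low; the weights control that trade-off.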
Model Name | Description | Link |
---|---|---|
all-MiniLM-L6-v2 | A small, efficient transformer model for various NLP tasks, used here with Sentence Transformer. Used to create text embeddings to calculate cosine similarity. | MiniLMv2 on Hugging Face |
jonas/bert-base-uncased-finetuned-sdg | A classifier model for SDG (Sustainable Development Goals) classification from Hugging Face. | SDG Classifier on Hugging Face |
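To illustrate how text embeddings turn into cosine similarity scores, here is a minimal sketch. It assumes the `sentence-transformers` and `numpy` packages are installed; the function names `embed_texts` and `cosine_similarity_matrix` are hypothetical and the project's notebooks may proceed differently.

```python
import numpy as np


def embed_texts(texts, model_name="all-MiniLM-L6-v2"):
    """Encode texts into embeddings (downloads the model on first use)."""
    # Imported lazily so the similarity helper below works without the model.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(texts)


def cosine_similarity_matrix(embeddings):
    """Return the pairwise cosine similarity of row vectors."""
    emb = np.asarray(embeddings, dtype=np.float64)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    # Guard against division by zero for all-zero vectors.
    unit = emb / np.clip(norms, 1e-12, None)
    return unit @ unit.T
```

A call such as `cosine_similarity_matrix(embed_texts(descriptions))` would then yield one similarity value per project pair, where `descriptions` is a list of project texts.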
git clone https://github.com/JanMuehlnikel/NLP-Development-Banks-Collaboration-Analyzer
cd synergy-app
git clone https://huggingface.co/spaces/GIZ/eb-synergy-app
- Install
NLP-Development-Banks-Collaboration-Analyzer/requirements.txt
in a virtual environment (e.g. conda)
- Navigate to
/config/
- Create KEYS.py
- Add line
IATI_KEY = "{Your_Iati_Datastore_Key}"
- Create an IATI Datastore API key and replace the placeholder with it (Create Key - Full Access subscription used)
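As an illustration of how the key from `KEYS.py` might be used, the sketch below builds an authenticated request without sending it. The endpoint URL, the `Ocp-Apim-Subscription-Key` header name, and the function `build_iati_request` are assumptions for illustration; check the IATI Datastore documentation and `pipeline.py` for the actual calls.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint of the IATI Datastore activity search.
IATI_SELECT_URL = "https://api.iatistandard.org/datastore/activity/select"


def build_iati_request(api_key, query="*:*", rows=10):
    """Build (but do not send) a request that carries the API key."""
    params = urlencode({"q": query, "rows": rows})
    return Request(
        f"{IATI_SELECT_URL}?{params}",
        # Assumed header name for the subscription key.
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )
```

Inside the repository, the key would presumably be imported with something like `from config.KEYS import IATI_KEY` and the request sent with `urllib.request.urlopen`.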
- Navigate to
/data/pipeline
- Run
python pipeline.py
- Wait until the pipeline finishes
- See results in
/src/merged_orgas.csv
- Navigate to
/data/models
- Run
similarity_minilm.ipynb
Notebook - Text-based cosine similarity scores are stored in
/src/similarities.npz
- Navigate to
/data/models
- Run
extended_similarities.ipynb
Notebook - Extended Similarity Results stored in
synergy-app/src/
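Once similarity scores are available, finding the most similar projects for a given project amounts to a top-k lookup in one row of the similarity matrix. The sketch below assumes a dense, square `numpy` matrix; the function `top_k_similar` is hypothetical, and the stored `.npz` results may use a different layout.

```python
import numpy as np


def top_k_similar(similarity, index, k=5):
    """Return (project_index, score) pairs of the k most similar projects."""
    row = np.asarray(similarity[index], dtype=float).copy()
    row[index] = -np.inf  # exclude the project itself
    k = min(k, row.size - 1)
    best = np.argsort(row)[::-1][:k]  # indices sorted by descending score
    return [(int(i), float(row[i])) for i in best]
```

For example, `top_k_similar(matrix, 0, k=3)` would list the three projects closest to the first project in the matrix.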
Launch locally (most likely not possible due to extremely high RAM usage!)
cd synergy-app
streamlit run app.py
Visit HuggingFace Space
Because of the high RAM usage, the Streamlit app is hosted in a Hugging Face Space:
https://huggingface.co/spaces/GIZ/eb-synergy-app
├── config/ # configuration files, constants and keys
├── data/ # pipeline, models and validation
├── src/ # sources
├── synergy-app/ # Streamlit App to display results (different repo (https://huggingface.co/spaces/GIZ/eb-synergy-app))
├── .gitignore # files ignored (especially large memory files)
├── README.md # project information
└── requirements.txt # dependencies and libraries that need to be installed