Secure Your Contract is an AI-based contract assistant that helps you draft contracts while avoiding potentially disadvantageous terms and keywords.
It is built on LLaMA-2 7B, chosen for its simplicity and abundance of references, and fine-tuned with QLoRA (4-bit quantization), which makes it a promising lightweight off-the-shelf system. If you would like to build on this project, consider LLaMA-3 8B (4-bit) as the backbone instead.
Secure Your Contract provides two types of analysis:

- Negative Term Detector (NTD): an LLM-based detector for negative terms/phrases. It flags risky or weakly risky terms, explains why they are risky, and suggests corresponding refinements.
- Negative Keyword Detector (NKD): a traditional NLP & text-mining-based negative keyword detector that combines 9 different methodologies.
Contract - Analysis Pairs

- You need to download 100 contract keyword-related contract/agreement documents from WONDER.LEGAL so that the model learns from realistic examples.
- For both auto-generated data (produced by ChatGPT in our method for memory management) and crawled data, contract(prompt)-analysis(response) pairs are placed under ./data/pair/prompt/ and ./data/pair/response/ respectively, each stored as a separate .txt file.
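The pairing convention above (matching prompt and response files by name) can be sketched as follows. This is a minimal illustration, not the repo's loader; the function name and the assumption that prompt/response files share the same filename are ours.

```python
from pathlib import Path

def load_pairs(prompt_dir, response_dir):
    """Pair prompt/response .txt files by matching filenames.

    Assumes each prompt file under prompt_dir has a response file
    with the same name under response_dir; unmatched files are skipped.
    """
    pairs = []
    for prompt_path in sorted(Path(prompt_dir).glob("*.txt")):
        response_path = Path(response_dir) / prompt_path.name
        if response_path.exists():
            pairs.append((prompt_path.read_text(encoding="utf-8"),
                          response_path.read_text(encoding="utf-8")))
    return pairs
```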
MITIE installation

- Follow the instructions in the MITIE GitHub repository so that you can run ./LLMs/mitie_finetune.py. Place MITIE-models right under the root folder.
Convert Data Format

If you want to push your .json to your Hugging Face repository, run

cd utils
python3 save_oneliner.py

The code converts the normal prompt-response pairs into one-liner records for LLaMA-2, with your template applied.
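The conversion step can be sketched roughly as below. Note this is an illustration using the standard LLaMA-2 instruction template; the exact template string and JSON schema used by save_oneliner.py may differ.

```python
import json

def to_oneliner(prompt, response):
    """Wrap a prompt/response pair in a LLaMA-2-style instruction
    template and serialize it as a single-line JSON record."""
    text = f"<s>[INST] {prompt.strip()} [/INST] {response.strip()} </s>"
    return json.dumps({"text": text})
```

Each returned line can then be appended to a .jsonl file and pushed to the Hub.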
If you want to convert PDF(s) to .txt(s) or .json, run
cd utils
python3 pdf2json.py
python3 pdf2text.py
python3 pdfs2texts.py
Fine-tune NTD (Negative Term Detector) with the LLaMA-2 7B chat model using QLoRA:
cd LLMs
python3 finetune.py
Note that you can change configs/finetune.yml for different settings.
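For orientation, a QLoRA fine-tuning config typically covers the base model, the 4-bit quantization scheme, and the LoRA adapter hyperparameters. The sketch below is hypothetical; the actual keys in this repo's configs/finetune.yml may be named differently.

```yaml
# Hypothetical sketch of configs/finetune.yml; actual keys may differ.
base_model: meta-llama/Llama-2-7b-chat-hf
quantization:
  load_in_4bit: true
  bnb_4bit_quant_type: nf4
  bnb_4bit_compute_dtype: bfloat16
lora:
  r: 16
  alpha: 32
  dropout: 0.05
  target_modules: [q_proj, k_proj, v_proj, o_proj]
training:
  epochs: 3
  learning_rate: 2.0e-4
  per_device_batch_size: 4
```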
Run inference with your fine-tuned model on the sample contract data:
cd LLMs
python3 inference.py
Note that you can change configs/inference.yml for different settings.
To run NKD (Negative Keyword Detector):

cd LLMs
python3 neg_detect.py

Note that using the fine-tuned MITIE model is possible but not helpful.
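One of the simplest traditional approaches NKD-style detection can build on is lexicon matching over tokenized text. The sketch below is only an illustration of that idea with a made-up keyword list; neg_detect.py combines nine methodologies over its own lexicon.

```python
import re
from collections import Counter

# Hypothetical negative-keyword lexicon for illustration only.
NEGATIVE_KEYWORDS = {"penalty", "forfeit", "indemnify", "waive", "terminate"}

def detect_negative_keywords(contract_text):
    """Return counts of negative keywords found in the contract text."""
    tokens = re.findall(r"[a-z]+", contract_text.lower())
    return dict(Counter(t for t in tokens if t in NEGATIVE_KEYWORDS))
```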
To evaluate with RAG (Retrieval-Augmented Generation), run

python3 evaluate_rag.py

Note that you need to construct your FAISS DB first. Since LLaMA-2 has a limited context window, RAG actually degrades the model's performance in our case.
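The retrieval step behind RAG reduces to ranking stored documents by similarity to the query. As a dependency-free illustration, the sketch below uses bag-of-words cosine similarity in place of the FAISS vector index the repo actually uses; all names here are ours.

```python
import math
from collections import Counter

def _bow(text):
    """Bag-of-words vector as a token -> count mapping."""
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = _bow(query)
    ranked = sorted(documents, key=lambda d: _cosine(q, _bow(d)), reverse=True)
    return ranked[:k]
```

In the real pipeline the retrieved chunks are prepended to the LLM prompt, which is why a small context window limits how much RAG can help.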
w/ & w/o Inference (w/o RAG)
To evaluate your model with inference, run
python3 evaluate.py
To evaluate without inference, run
python3 evaluate_onlyscores.py
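Scoring a generated analysis against a reference analysis can be done with a token-overlap metric. The sketch below shows a generic token-level F1; it is only an illustration, and the actual metrics in evaluate.py / evaluate_onlyscores.py may differ.

```python
def token_f1(prediction, reference):
    """Token-level F1 between a generated analysis and a reference one."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Count overlapping tokens, consuming each reference token at most once.
    ref_pool = list(ref)
    common = 0
    for tok in pred:
        if tok in ref_pool:
            ref_pool.remove(tok)
            common += 1
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)
```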
For histograms, run
python3 visualize_histograms.py
For word clouds, run
python3 visualize_wordclouds.py