This is an NLP project that analyzes and matches phrases from US patent documents, using data from a Kaggle competition.
- Clone the repository
git clone https://github.com/waijian1/nlp_phrase_match.git
cd nlp_phrase_match
- Create and activate conda environment
conda env create -f environment.yaml
conda activate nlp_phrase_match # replace with your actual environment name
├── main.ipynb # Main notebook containing analysis and results
├── environment.yaml # Conda & pip packages environment file
└── us-patent-phrase-to-phrase-matching/ # Data directory (download from notebook)
You can view the complete notebook with all outputs and visualizations in these formats:
- View in nbviewer (recommended)
- View in GitHub
The notebook includes:
- Downloading the training and validation data from the Kaggle competition
- Data preprocessing
- Fine-tuning a pretrained Transformer model from Hugging Face
- Model training
- Results
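As a rough illustration of the preprocessing step, the sketch below joins the fields of one data row into a single string that a Transformer tokenizer could consume. It is not taken from `main.ipynb`: the field names `anchor`, `target`, and `context`, the `build_input` helper, the `[SEP]` separator, and the sample row are all assumptions made for the example, and the notebook may combine the fields differently.

```python
# Hypothetical preprocessing sketch; not the notebook's actual code.
# Assumes each row has `anchor`, `target`, and `context` text fields.

def build_input(anchor: str, target: str, context: str, sep: str = " [SEP] ") -> str:
    """Join context, anchor, and target into one model input string."""
    parts = (context, anchor, target)
    # Strip stray whitespace so the separator spacing stays consistent.
    return sep.join(part.strip() for part in parts)


# Illustrative row (made up for this example).
sample_row = {
    "anchor": "abatement",
    "target": "abatement of pollution",
    "context": "A47",
}

text = build_input(sample_row["anchor"], sample_row["target"], sample_row["context"])
print(text)
```

The resulting string would then be passed to the tokenizer of whichever pretrained model is being fine-tuned. Putting the context first is only one possible ordering.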
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.