
Patent Phrase to Phrase Matching (Kaggle Competition). This model compares two phrases and outputs a score between 0 and 1 (where 0 means unrelated and 1 means identical).


waijian1/nlp_phrase_match


NLP - US Patent Phrase to Phrase Matching

This is an NLP project that focuses on analyzing and matching phrases from US patent documents, based on the Kaggle competition of the same name.

Installation

  1. Clone the repository
git clone https://github.com/waijian1/nlp_phrase_match.git
cd nlp_phrase_match
  2. Create and activate the conda environment
conda env create -f environment.yaml
conda activate nlp_phrase_match # replace with your actual environment name

Project Structure

├── main.ipynb          # Main notebook containing analysis and results
├── environment.yaml    # Conda & pip packages environment file
└── us-patent-phrase-to-phrase-matching/    # Data directory (download from notebook)

Notebook Overview

You can view the complete notebook with all outputs and visualizations in these formats:

The notebook includes:

  • Downloading the training and validation data from the Kaggle competition
  • Data preprocessing
  • Fine-tuning a pretrained Transformer model from Hugging Face
  • Model training
  • Results
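
As a rough illustration of the preprocessing step, the sketch below joins the fields of one phrase pair into a single input string for a Transformer and clamps a raw prediction to the 0 to 1 score range. It is not taken from the notebook: the field names `anchor`, `target`, and `context`, the `[SEP]` separator, and the helpers `build_input` and `clip_score` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PhrasePair:
    # Hypothetical field names; the notebook's actual columns may differ.
    anchor: str   # first phrase
    target: str   # second phrase to compare against the anchor
    context: str  # patent classification code giving the domain


def build_input(pair: PhrasePair, sep: str = " [SEP] ") -> str:
    """Join context, anchor, and target into one string for the tokenizer."""
    parts = [
        pair.context.strip(),
        pair.anchor.strip().lower(),
        pair.target.strip().lower(),
    ]
    return sep.join(parts)


def clip_score(raw: float) -> float:
    """Clamp a raw model output to the valid score range [0, 1]."""
    return min(1.0, max(0.0, raw))


if __name__ == "__main__":
    example = PhrasePair("Abatement", "abatement of pollution", "A47")
    print(build_input(example))
    print(clip_score(1.3))
```

Putting the context first is only one possible ordering; the separator token should match whatever the chosen pretrained tokenizer expects.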

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
