Extracting search filters from natural language queries using custom named entity recognition with spaCy and machine learning

aidanbunch/Search-Query-Named-Entity-Recognition-with-spaCy

Search Query Named Entity Recognition with spaCy

This repository contains a project for extracting search filters from natural language queries using custom named entity recognition (NER) with spaCy and machine learning. I trained a custom NER model to extract these entities from a dataset stored in CSV format. The data is first cleaned, then converted to the spaCy tuple format during preprocessing, and finally used to train and validate the model.

Dataset

Put your CSV input data into the data/raw directory. The first column must be titled "Text" and contain the full query string. Every subsequent column is named after an entity and contains the value of that entity as extracted from the text. Something like the following schema should work:

[Screenshot of the example schema: 2023-07-29 at 12:08:18 PM]
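
To make the schema and the "spaCy tuple format" concrete, here is a minimal sketch of how one CSV row could be turned into a training example. It is not the repository's prepare_data.py, whose logic may differ. The sample rows, the COLOR and BRAND column names, and the helper names row_to_spacy_tuple and load_training_data are all hypothetical, and the sketch assumes each entity value appears verbatim in the text.

```python
import csv
import io

# Hypothetical sample in the schema described above: a "Text" column followed
# by one column per entity name. Labels and rows are made up for illustration.
SAMPLE_CSV = """Text,COLOR,BRAND
red nike running shoes,red,nike
blue jeans,blue,
"""


def row_to_spacy_tuple(row):
    """Convert one CSV row into (text, {"entities": [(start, end, label), ...]})."""
    text = row["Text"]
    entities = []
    for label, value in row.items():
        if label == "Text" or not value:
            continue  # skip the text column and empty entity cells
        start = text.find(value)  # first verbatim occurrence only
        if start == -1:
            continue  # value does not appear verbatim in the text
        entities.append((start, start + len(value), label))
    entities.sort()  # order spans by start offset
    return text, {"entities": entities}


def load_training_data(csv_file):
    """Read every row of an open CSV file into spaCy-style tuples."""
    return [row_to_spacy_tuple(row) for row in csv.DictReader(csv_file)]


if __name__ == "__main__":
    for example in load_training_data(io.StringIO(SAMPLE_CSV)):
        print(example)
```

Each span is a pair of character offsets plus a label. This sketch does not check for overlapping spans, which spaCy rejects during training, so real data would likely need an extra validation step.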

Usage

  1. Open a terminal or command prompt and navigate to the project directory.
  2. Create a virtual environment by running the following command:
python -m venv venv
  3. Activate the virtual environment. For Windows:
venv\Scripts\activate
     For macOS/Linux:
source venv/bin/activate
  4. Install the required dependencies by running the following command:
pip install -r requirements.txt
  5. Clean the data by running the following command:
cd cleaning && python clean_data.py
  6. Prepare the data for training by running the following command:
cd .. && cd preprocessing && python prepare_data.py
  7. Train the NER model by running the following command:
cd .. && python train_ner.py
[Screenshot: 2023-07-28 at 7:41:32 PM]
  8. After the model has been generated, you can test it by running the following command:
cd testing && python test_ner.py
[Screenshot: 2023-07-28 at 7:44:15 PM]
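
Once a model has been trained, its predicted entities can be collected into search filters. The following is a minimal sketch of that last step, not the repository's test_ner.py. The helper names entities_to_filters and extract_filters are hypothetical, and the model path in the usage comment is a placeholder because the README does not state where the trained model is saved.

```python
from collections import defaultdict


def entities_to_filters(entities):
    """Group (label, text) pairs into a {label: [values]} filter dict."""
    filters = defaultdict(list)
    for label, text in entities:
        if text not in filters[label]:  # drop duplicate values per label
            filters[label].append(text)
    return dict(filters)


def extract_filters(nlp, query):
    """Run a loaded spaCy pipeline on a query and return its entities as filters."""
    doc = nlp(query)
    return entities_to_filters((ent.label_, ent.text) for ent in doc.ents)


# Usage, assuming spaCy is installed and a trained pipeline exists on disk
# (the path below is a placeholder, not the repository's actual output path):
#
#     import spacy
#     nlp = spacy.load("path/to/trained/model")
#     print(extract_filters(nlp, "red nike running shoes"))
```

Passing the loaded pipeline in as an argument means the model is loaded once and reused across queries, and it keeps the grouping logic testable without spaCy installed.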
