Skip to content

malaysia-ai/sfw-classifier

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Safe for Work Classifier

This repository contains code for training a Safe for Work (SFW) classifier on Malaysian data. The classifier is designed to detect various categories including hate speech, violence, self-harm, harassment, informative content, psychiatric or mental illness-related content, racist speech, religion insults, sexist speech, pornographic content, and content that is safe for work.

Finetuned https://huggingface.co/mesolitica/malaysian-mistral-191M-MLM-512 with Malaysian NSFW data.

Labels: 'racist', 'religion insult','psychiatric or mental illness', 'sexist','hate','porn','safe for work', 'self-harm'

Refer status:

A summary

Flowchart: https://excalidraw.com/#json=TlW-2VsgrA8YdUgUx8Sdu,iPDBbixT0_x3udcl91LZ4A

Developing a safe for work classifier involves an extensive amount of data mining. To summarize:

  1. Scraped social media data
  2. Use LLM to help label data to their respective label
  3. Filter out labelled data by LLM using centroid.
    • generate embedding of labelled data using baai/bge-m3 (this was the best one during this point in time!)
    • compute centroid of the embedding (average of all the embeddings)
    • filter out based on threshold (heurestically determine the threshold for filtering based on data distribution)

"We are assuming that sentences whose embeddings are closer to the centroid are more similar in a way."

  1. We still don't have enough data, or better yet we have too much of it.

    • For hate/harassment data (we have too much)
    • filter only negative sentiment data
    • For safe for work data
    • filter only positive and neutral text
  2. For other labels, not enough labels

    • Train a single label classifier model (example: sexist classifier (sexist/non sexist)
    • Classify unlabelled texts
    • Filter out data with lower confidence
    • Heurestically evaluate the prediction

Some labels are more challenging where we have to label it manually or add more dataset:

LLMOps Pipeline

Image in a markdown cell

Notebook & Dataset Guide

  1. LLM Inference notebook malaysian-mistral.ipynb
  2. Notebook on exploration and filtering using centroid for specific labels filter-{labels}.
  3. Notebook for training single label data for pseudolabeling pseudolabeling-single-label.ipynb
  4. Notebook for training sfw classifier model train-sfw-classifier.ipynb
  5. Notebook for fasttext language detection

Important Data Files for Reference

  • Dataset containing filtered and training dataset for gathering malaysia-sfw-dataset.jsonl
  • Dataset for labelled data from llm inference in inference-data folder result-mixtral, result-mistral, nsfw-inference-mallam.jsonl
  • Dataset of social media data with english and standard malay translation noisy-translation-combined.jsonl
  • Scraped nsfw data using keyword and profiles https://huggingface.co/datasets/malaysia-ai/crawl-twitter-nsfw

How to Use

We applied LLM2Vec paper to pretrain, pretrained malaysian mistral on masked lm task. The model outperforms other models from t5 and malaysian-mistral trained on causalm lm. We decided to use this model as the base model for this classifier.

Refer file classifier.py to understand the code of the model

from classifier import MistralForSequenceClassification
model = MistralForSequenceClassification.from_pretrained('malaysia-ai/malaysian-sfw-classifier')

Reference

Acknowledgment

Thank you to all malaysia-ai volunteers for helping to scrape and mining the data to help finish this project :)

About

Safe for Work Classifier

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published