Commit f4625cb: Respond to Jonas Feedback

nelsonauner committed Jul 2, 2024 (1 parent: f8ad68b)

Showing 3 changed files with 32 additions and 89 deletions.

README.md: 50 changes (24 additions, 26 deletions)
@@ -6,32 +6,30 @@ To quickly learn how to run cleanlab on your own data, first check out the [quic

## Table of Contents

| | Example | Description |
| --- | ---------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1 | [datalab](datalab_image_classification/README.md) | Use Datalab to detect various types of data issues in (a subset of) the Caltech-256 image classification dataset. |
| 2 | [find_label_errors_iris](find_label_errors_iris/find_label_errors_iris.ipynb) | Find label errors introduced into the Iris classification dataset. |
| 3 | [classifier_comparison](classifier_comparison/classifier_comparison.ipynb) | Use CleanLearning to train 10 different classifiers on 4 dataset distributions with label errors. |
| 4 | [hyperparameter_optimization](hyperparameter_optimization/hyperparameter_optimization.ipynb) | Hyperparameter optimization to find the best settings of CleanLearning's optional parameters. |
| 5 | [simplifying_confident_learning](simplifying_confident_learning/simplifying_confident_learning.ipynb) | Straightforward implementation of the Confident Learning algorithm with raw numpy code. |
| 6 | [visualizing_confident_learning](visualizing_confident_learning/visualizing_confident_learning.ipynb) | See how cleanlab estimates parameters of the label error distribution (noise matrix). |
| 7 | [find_tabular_errors](find_tabular_errors/find_tabular_errors.ipynb) | Handle mislabeled [tabular data](https://github.com/cleanlab/s/blob/master/student-grades-demo.csv) to improve an XGBoost classifier. |
| 8 | [fine_tune_LLM](fine_tune_LLM/LLM_with_noisy_labels_cleanlab.ipynb) | Fine-tune OpenAI language models with noisily labeled text data. |
| 9 | [cnn_mnist](cnn_mnist/find_label_errors_cnn_mnist.ipynb) | Find label errors in MNIST image data with a [Convolutional Neural Network](https://github.com/cleanlab/cleanlab/blob/master/cleanlab/experimental/mnist_pytorch.py). |
| 10 | [huggingface_keras_imdb](huggingface_keras_imdb/huggingface_keras_imdb.ipynb) | CleanLearning for text classification with a Keras Model + pretrained BERT backbone and TensorFlow Dataset. |
| 11 | [fasttext_amazon_reviews](fasttext_amazon_reviews/fasttext_amazon_reviews.ipynb) | Find label errors in the Amazon Reviews text dataset using a cleanlab-compatible [FastText model](https://github.com/cleanlab/cleanlab/blob/master/cleanlab/models/fasttext.py). |
| 12 | [multiannotator_cifar10](multiannotator_cifar10/multiannotator_cifar10.ipynb) | Iteratively improve consensus labels and trained classifier from data labeled by multiple annotators. |
| 13 | [llm_evals_w_crowdlab](llm_evals_w_crowdlab/llm_evals_w_crowdlab.ipynb) | Evaluate an LLM from multiple human/AI reviewers of varying competency by using CROWDLAB and GPT token probabilities. |
| 14 | [active_learning_multiannotator](active_learning_multiannotator/active_learning.ipynb) | Improve a classifier model by iteratively collecting additional labels from data annotators. This active learning pipeline considers data labeled in batches by multiple (imperfect) annotators. |
| 15 | [active_learning_single_annotator](active_learning_single_annotator/active_learning_single_annotator.ipynb) | Improve a classifier model by iteratively labeling batches of currently-unlabeled data. This demonstrates a standard active learning pipeline with *at most one label* collected for each example (unlike our multi-annotator active learning notebook which allows re-labeling). |
| 16 | [active_learning_transformers](active_learning_transformers/active_learning.ipynb) | Improve a Transformer model for classifying politeness of text by iteratively labeling and re-labeling batches of data using multiple annotators. If you haven't done active learning with re-labeling, try the [active_learning_multiannotator](active_learning_multiannotator/active_learning.ipynb) notebook first. |
| 17 | [outlier_detection_cifar10](outlier_detection_cifar10/outlier_detection_cifar10.ipynb) | Train an AutoML model for image classification and use it to detect out-of-distribution images. |
| 18 | [multilabel_classification](multilabel_classification/image_tagging.ipynb) | Find label errors in an image tagging dataset ([CelebA](https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html)) using a [PyTorch model](multilabel_classification/pytorch_network_training.ipynb) you can easily train for multi-label classification. |
| 19 | [entity_recognition](entity_recognition/) | Train a Transformer model for Named Entity Recognition and produce out-of-sample `pred_probs` for `cleanlab.token_classification`. |
| 20 | [transformer_sklearn](transformer_sklearn/transformer_sklearn.ipynb) | How to use `KerasWrapperModel` to make any Keras model sklearn-compatible, demonstrated here for a BERT Transformer. |
| 21 | [cnn_coteaching_cifar10](cnn_coteaching_cifar10/README.md) | Train a [Convolutional Neural Network](https://github.com/cleanlab/cleanlab/blob/master/cleanlab/experimental/cifar_cnn.py) on noisily labeled CIFAR-10 image data using cleanlab with [coteaching](https://github.com/cleanlab/cleanlab/blob/master/cleanlab/experimental/coteaching.py). |
| 22 | [non_iid_detection](non_iid_detection/non_iid_detection.ipynb) | Use Datalab to detect non-IID sampling (e.g. drift) in datasets based on numeric features or embeddings. |
| 23 | [object_detection](object_detection/README.md) | Train a Detectron2 object detection model for use with cleanlab. |
| 24 | [semantic segmentation](segmentation/training_ResNeXt50_for_Semantic_Segmentation_on_SYNTHIA.ipynb) | Train a ResNeXt semantic segmentation model for use with cleanlab. |
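
If you are new to cleanlab, a minimal sketch like the one below shows the two entry points that most of the examples above build on: `Datalab` for auditing a dataset for issues, and `CleanLearning` for training an sklearn-compatible classifier robustly despite label errors. This is an illustrative toy setup (synthetic data, placeholder variable names), not code from any particular notebook:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from cleanlab import Datalab
from cleanlab.classification import CleanLearning

# Toy dataset with a few labels flipped so cleanlab has something to find.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
y[:5] = 1 - y[:5]  # inject label errors

# Out-of-sample predicted probabilities from any classifier.
pred_probs = cross_val_predict(
    LogisticRegression(), X, y, cv=5, method="predict_proba"
)

# Entry point 1: audit the dataset for label issues, outliers, duplicates, etc.
lab = Datalab(data=pd.DataFrame({"label": y}), label_name="label")
lab.find_issues(features=X, pred_probs=pred_probs)
lab.report()

# Entry point 2: train a classifier that is robust to the label errors.
cl = CleanLearning(clf=LogisticRegression())
cl.fit(X, y)
label_issues = cl.get_label_issues()  # per-example label quality diagnostics
print(label_issues.query("is_label_issue").head())
```

Each notebook swaps a real dataset and model into this pattern; see the quickstart tutorial referenced above for details.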


## Instructions