Training on the full ImageNet dataset (1,000 classes) requires substantial computational resources, so it is usually hard to quickly check a model on a local or personal computer. The mini-imagenet (100 classes) and tiny-imagenet (200 classes) datasets are far more manageable on such machines, but their original formats are not convenient for the classical classification task. For example, the raw mini-imagenet data is divided into training/validation/testing sets designed for few-shot or meta-learning tasks.
- download mini-imagenet and tiny-imagenet directly, with a single line each:
from MLclf import MLclf
MLclf.miniimagenet_download(Download=True)
MLclf.tinyimagenet_download(Download=True)
- transform the mini-imagenet dataset, which was originally created for few-shot learning, into a format that fits the classical classification task. You can also use this package to download and obtain the raw mini-imagenet data (for few-shot learning tasks).
- transform the tiny-imagenet dataset into a format that fits the classical classification task and can be passed directly to a PyTorch DataLoader, which is easier to use than the original raw format.
- transform other popular datasets into a format that fits the classical classification task or few-shot / zero-shot / transfer learning tasks (feel free to leave a message if you have ideas for datasets to add).
Besides the transformations above, MLclf can also produce the format used for few-shot / meta-learning tasks; see the instructions below for details.
The original mini-imagenet dataset contains 100 classes in total. Because it was designed for meta-learning or few-shot learning, the train/validation/test sets contain different classes: 64, 16 and 20 classes respectively.
The original tiny-imagenet dataset contains 200 classes in total, and the train/validation/test sets each contain all classes. They have 100000/10000/10000 images respectively; for example, the training set has 500 images per class.
To make the mini/tiny-imagenet datasets fit the format required by the classical classification task, MLclf applies a transformation (recombination and splitting) to the original data.
The transformed mini-imagenet dataset is divided into train, validation and test sets, each of which includes all 100 classes. Each image is 84x84 pixels with 3 channels.
The transformed tiny-imagenet dataset is divided into train, validation and test sets, each of which includes all 200 classes. Each image is 64x64 pixels with 3 channels.
Notice: The provider of the tiny-imagenet dataset does not publish the labels of the test set, so the original raw test set has no labels.
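The recombination and splitting described above can be pictured as a per-class (stratified) split: pool the samples of each class, shuffle them, and cut each class by the requested ratios so that every class lands in all three sets. The sketch below is only an illustration in plain Python. The function name `stratified_split`, its parameters and the toy data are assumptions made for this example, and MLclf's internal implementation may differ.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratio_train=0.6, ratio_val=0.2, seed=0):
    """Split sample indices so every class appears in train, val and test.

    Hypothetical helper for illustration; not part of the MLclf API.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * ratio_train)
        n_val = round(len(indices) * ratio_val)
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # the remainder goes to test
    return train, val, test


# Toy stand-in for the pooled data: 100 classes with 10 samples each.
toy_labels = [c for c in range(100) for _ in range(10)]
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

With a class-wise split like this, the class counts of the three sets are identical and only the number of images per class differs between them.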
The MLclf package can be found at: https://github.com/tiger2017/MLclf or at: https://pypi.org/project/MLclf/
You are welcome to open an issue in the MLclf repository on GitHub; I will add more dataset loading functions based on the issues.
The mini-imagenet source data can also be accessed from: https://deepai.org/dataset/imagenet (there is no need to download it manually if you use MLclf).
Requirements:
- Python 3.x
- numpy
- torchvision
How to install the MLclf package:
pip install MLclf
How to use this package for mini-imagenet:
from MLclf import MLclf
import torch
import torchvision.transforms as transforms
# Download the original mini-imagenet data. This is only needed the first time;
# the data is downloaded to a newly created folder in the current directory.
MLclf.miniimagenet_download(Download=True)
# Transform the original data into the format that fits the classification task.
# Note: To keep the original meta-learning / few-shot format, set ratio_train=0.64, ratio_val=0.16, shuffle=False.
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# The transform argument is an optional keyword. Set transform=None, or simply omit it, if you do not want the data standardized and only want it scaled to [0, 1].
# The line below transforms the mini-imagenet data into the format for the traditional classification task, e.g. 60% training, 20% validation and 20% testing, with 100 classes in each of the training/validation/testing sets.
train_dataset, validation_dataset, test_dataset = MLclf.miniimagenet_clf_dataset(ratio_train=0.6, ratio_val=0.2, seed_value=None, shuffle=True, transform=transform, save_clf_data=True)
# The dataset can be wrapped in a DataLoader via torch:
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=128, shuffle=True, num_workers=0)
# You can check the mapping between labels and label_marks of the image data:
# (Note: The mappings below are available only after MLclf.miniimagenet_clf_dataset has been called; otherwise they are None.)
labels_to_marks = MLclf.labels_to_marks['mini-imagenet']
marks_to_labels = MLclf.marks_to_labels['mini-imagenet']
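To make the effect of the transform in the example above concrete: `transforms.ToTensor()` scales 8-bit pixel values from [0, 255] to [0.0, 1.0], and `transforms.Normalize(mean, std)` then computes `(value - mean) / std` per channel. With a mean and standard deviation of 0.5, the values end up in [-1, 1]. The small helper below reproduces that arithmetic in plain Python; `normalize` is a hypothetical name used only for this illustration and is not part of torchvision or MLclf.

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-channel formula applied by Normalize: (value - mean) / std."""
    return (value - mean) / std


# Walk three 8-bit pixel values through both steps.
for pixel in (0, 128, 255):
    scaled = pixel / 255            # what ToTensor() does to a uint8 pixel
    standardized = normalize(scaled)  # what Normalize(0.5, 0.5) does next
    print(pixel, round(scaled, 3), round(standardized, 3))
```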
You can also obtain the raw data of mini-imagenet from the downloaded pkl files:
from MLclf import MLclf
# The raw data of mini-imagenet can also be obtained via the function below:
data_raw_train, data_raw_val, data_raw_test = MLclf.miniimagenet_data_raw()
How to use this package for tiny-imagenet for the traditional classification task (similar to mini-imagenet):
from MLclf import MLclf
import torch
import torchvision.transforms as transforms
# Download the original tiny-imagenet data. This is only needed the first time.
MLclf.tinyimagenet_download(Download=True)
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_dataset, validation_dataset, test_dataset = MLclf.tinyimagenet_clf_dataset(
    ratio_train=0.6, ratio_val=0.2,
    seed_value=None, shuffle=True,
    transform=transform,
    save_clf_data=True,
    few_shot=False)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=5, shuffle=True, num_workers=0)
# You can check the mapping between labels and label_marks of the image data:
# (Note: The mappings below are available only after MLclf.tinyimagenet_clf_dataset has been called; otherwise they are None.)
labels_to_marks = MLclf.labels_to_marks['tiny-imagenet']
marks_to_labels = MLclf.marks_to_labels['tiny-imagenet']
data_raw_train, data_raw_val, data_raw_test = MLclf.tinyimagenet_data_raw()
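As a rough picture of how a pair of lookup tables like `labels_to_marks` and `marks_to_labels` can be used, the sketch below builds two inverse dictionaries and maps a predicted label back to its class identifier. This rests on assumptions: that labels are integer class indices, that label_marks are class identifier strings, and that the two tables are plain dictionaries. The identifiers are made up for the example, so inspect the real objects returned by MLclf before relying on this shape.

```python
# Hypothetical class identifiers; the real ones come from the dataset.
class_marks = ["class_a", "class_b", "class_c"]

# Assumed shape: integer label -> identifier, and the inverse mapping.
labels_to_marks = {label: mark for label, mark in enumerate(class_marks)}
marks_to_labels = {mark: label for label, mark in labels_to_marks.items()}

# Map a model's predicted integer label back to its class identifier.
predicted_label = 1
predicted_mark = labels_to_marks[predicted_label]
print(predicted_mark)

# The two tables are inverses of each other.
print(marks_to_labels[predicted_mark] == predicted_label)
```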
If you want to use tiny-imagenet for the few-shot learning task, set few_shot=True, for example:
train_dataset, validation_dataset, test_dataset = MLclf.tinyimagenet_clf_dataset(
    ratio_train=0.6, ratio_val=0.2,
    seed_value=None, shuffle=True,
    transform=transform,
    save_clf_data=True,
    few_shot=True)
# Only the original training set is used as the whole dataset for the few-shot learning task, so there are 200 classes in total.
# In this example, 120 classes form the training set, 40 classes the validation set and 40 classes the test set, with 500 images per class.
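In the few-shot setting the ratios apply to classes rather than to images, which is why 0.6/0.2 (with the remaining 0.2 for testing) turns 200 classes into 120/40/40 disjoint class groups. A minimal sketch of such a class-wise split is shown below. `split_classes` is a hypothetical helper written for this illustration, not an MLclf function, and the package's own procedure may differ in detail.

```python
import random


def split_classes(class_ids, ratio_train=0.6, ratio_val=0.2, seed=0):
    """Partition class ids into disjoint train/val/test groups.

    Hypothetical helper for illustration; not part of the MLclf API.
    """
    rng = random.Random(seed)
    shuffled = list(class_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * ratio_train)
    n_val = round(len(shuffled) * ratio_val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # the remaining classes form the test set
    )


# 200 classes, as in the tiny-imagenet few-shot example.
train_classes, val_classes, test_classes = split_classes(range(200))
print(len(train_classes), len(val_classes), len(test_classes))
```

Because no class is shared between the groups, a model evaluated on the test group only sees classes it was never trained on, which is the point of the few-shot format.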
If you use the MLclf package, please cite:
@software{xin_cao_2022_7233094,
author = {Xin Cao},
title = {{MLclf: The Project Machine Learning CLassiFication
for Utilizing Mini-imagenet and Tiny-imagenet}},
month = oct,
year = 2022,
publisher = {Zenodo},
version = {v0.2.14},
doi = {10.5281/zenodo.7233094},
url = {https://doi.org/10.5281/zenodo.7233094}
}