What are the model capabilities of GLOM?
Minimal implementation of the GLOM architecture in PyTorch, inspired by the idea described in Hinton's paper "How to represent part-whole hierarchies in a neural network" (https://arxiv.org/pdf/2102.12627.pdf).
Currently, I'm training several GLOM configurations on different benchmarks; accuracy numbers will be filled in as the runs finish.
| Model | Dataset | Accuracy |
|---|---|---|
| GLOM T/4 | MNIST | |
| GLOM S/4 | MNIST | |
| GLOM B/4 | MNIST | |
| GLOM T/4 | CIFAR10 | |
| GLOM S/4 | CIFAR10 | |
| GLOM B/4 | CIFAR10 | |
| GLOM T/4 | CIFAR100 | |
| GLOM S/4 | CIFAR100 | |
| GLOM T/32 | ImageNet | |
| GLOM S/32 | ImageNet | |
| GLOM B/32 | ImageNet | |
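For orientation, here is a minimal sketch of a single GLOM update step as described in the paper, not this repo's actual code (class and parameter names are assumptions). Each image is a grid of patch columns, and every column holds one embedding per level of the part-whole hierarchy. One step mixes four signals per level: the previous state, a bottom-up prediction from the level below, a top-down prediction from the level above, and attention over the same level across columns, which pulls embeddings toward "islands of agreement."

```python
import torch
import torch.nn as nn

class GlomStep(nn.Module):
    """One hypothetical GLOM iteration over a (batch, columns, levels, dim) state."""

    def __init__(self, dim: int, levels: int):
        super().__init__()
        mlp = lambda: nn.Sequential(
            nn.Linear(dim, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim))
        # One bottom-up and one top-down network per level (an assumption;
        # the paper allows per-level weights but leaves details open).
        self.bottom_up = nn.ModuleList(mlp() for _ in range(levels))
        self.top_down = nn.ModuleList(mlp() for _ in range(levels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, l, d = x.shape
        # Bottom-up: level k receives a prediction from level k-1 (level 0 from itself).
        bu = torch.stack(
            [self.bottom_up[k](x[:, :, max(k - 1, 0)]) for k in range(l)], dim=2)
        # Top-down: level k receives a prediction from level k+1 (top level from itself).
        td = torch.stack(
            [self.top_down[k](x[:, :, min(k + 1, l - 1)]) for k in range(l)], dim=2)
        # Same-level attention across columns: similar embeddings reinforce each other.
        attn = torch.einsum('bnld,bmld->blnm', x, x) / d ** 0.5
        consensus = torch.einsum('blnm,bmld->bnld', attn.softmax(dim=-1), x)
        # Equal-weight average of the four contributions (the weighting is an assumption).
        return (x + bu + td + consensus) / 4

step = GlomStep(dim=64, levels=5)
state = torch.randn(2, 49, 5, 64)  # 2 images, 7x7 patch columns, 5 levels
for _ in range(3):                 # run a few iterations toward consensus
    state = step(state)
print(state.shape)
```

Classification can then be done by pooling the top-level embeddings across columns and feeding them to a linear head, which is how the benchmark accuracies above would typically be obtained.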