Building the blocks of the SegFormer architecture:

  1. Overlap Patch Embedding - converts an image into a sequence of overlapping patches.
  2. Efficient Self-Attention - the first core component of all Transformer-based models.
  3. Mix-FeedForward (Mix-FFN) module - the second core component; together with self-attention it forms a single Transformer block.
  4. Transformer block - self-attention + Mix-FFN + layer norm make up one basic Transformer block.
  5. Decoder head - contains MLP layers.

Minimal sketches of each of these blocks follow below.

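A minimal PyTorch sketch of overlap patch embedding, assuming first-stage settings along the lines of the SegFormer paper (7x7 kernel, stride 4, padding so neighbouring patches overlap); class and parameter names here are illustrative, not necessarily the names used in this repo.

```python
import torch
import torch.nn as nn

class OverlapPatchEmbed(nn.Module):
    def __init__(self, in_chans=3, embed_dim=64, patch_size=7, stride=4):
        super().__init__()
        # Strided convolution with padding so neighbouring patches overlap
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=stride,
                              padding=patch_size // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        x = self.proj(x)                      # (B, C, H', W')
        B, C, H, W = x.shape
        x = x.flatten(2).transpose(1, 2)      # (B, H'*W', C) token sequence
        x = self.norm(x)
        return x, H, W
```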
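A sketch of efficient self-attention in the style of the SegFormer paper: keys and values are spatially reduced by a strided convolution (the `sr_ratio` below), which cuts the attention cost from O(N^2) to roughly O(N^2 / R^2). Names and default values are assumptions for illustration.

```python
import torch
import torch.nn as nn

class EfficientSelfAttention(nn.Module):
    def __init__(self, dim=64, num_heads=1, sr_ratio=8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        self.sr_ratio = sr_ratio
        if sr_ratio > 1:
            # Spatial reduction: shrink the key/value sequence by sr_ratio^2
            self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
            self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).permute(0, 2, 1, 3)

        if self.sr_ratio > 1:
            # Reduce the spatial size of the tokens used for keys/values
            x_ = x.transpose(1, 2).reshape(B, C, H, W)
            x_ = self.sr(x_).reshape(B, C, -1).transpose(1, 2)
            x_ = self.norm(x_)
        else:
            x_ = x
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        k, v = kv[0], kv[1]

        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, heads, N, N_reduced)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```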
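A sketch of Mix-FFN and of how it combines with the attention above into a single Transformer block. The 3x3 depthwise convolution inside the FFN supplies positional information, so no explicit positional encoding is needed. This reuses the `EfficientSelfAttention` class from the previous sketch; dimensions are placeholders.

```python
import torch
import torch.nn as nn

class MixFFN(nn.Module):
    def __init__(self, dim=64, expansion=4):
        super().__init__()
        hidden = dim * expansion
        self.fc1 = nn.Linear(dim, hidden)
        # 3x3 depthwise conv provides positional information to the tokens
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x, H, W):
        x = self.fc1(x)
        B, N, C = x.shape
        x = x.transpose(1, 2).reshape(B, C, H, W)   # tokens -> feature map
        x = self.dwconv(x)
        x = x.flatten(2).transpose(1, 2)            # feature map -> tokens
        x = self.act(x)
        return self.fc2(x)

class TransformerBlock(nn.Module):
    def __init__(self, dim=64, num_heads=1, sr_ratio=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = EfficientSelfAttention(dim, num_heads, sr_ratio)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = MixFFN(dim)

    def forward(self, x, H, W):
        # Pre-norm residual connections around attention and Mix-FFN
        x = x + self.attn(self.norm1(x), H, W)
        x = x + self.ffn(self.norm2(x), H, W)
        return x
```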
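A sketch of the all-MLP decoder head: features from each encoder stage are projected to a common width, upsampled to the highest-resolution stage, concatenated, fused, and classified. The channel widths and number of classes below are placeholders, not the values used in this repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPDecoderHead(nn.Module):
    def __init__(self, in_dims=(64, 128, 320, 512), embed_dim=256, num_classes=3):
        super().__init__()
        # One linear projection per encoder stage, mapping to a shared width
        self.proj = nn.ModuleList([nn.Linear(d, embed_dim) for d in in_dims])
        self.fuse = nn.Conv2d(embed_dim * len(in_dims), embed_dim, kernel_size=1)
        self.classify = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, features):
        # features: list of (B, C_i, H_i, W_i) maps from the four encoder stages
        target_size = features[0].shape[2:]
        outs = []
        for f, proj in zip(features, self.proj):
            B, C, H, W = f.shape
            f = proj(f.flatten(2).transpose(1, 2))        # (B, H*W, embed_dim)
            f = f.transpose(1, 2).reshape(B, -1, H, W)    # back to (B, embed_dim, H, W)
            f = F.interpolate(f, size=target_size, mode='bilinear', align_corners=False)
            outs.append(f)
        x = self.fuse(torch.cat(outs, dim=1))
        return self.classify(x)                            # (B, num_classes, H/4, W/4)
```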
Here is the result of a model trained on the BDD100K drivable-area task: highway-seg

Here are the attention maps from the video above: highway-attn