Update Deep Learning Questions & Answers for Data Scientists.md #27


Open · wants to merge 1 commit into base: main


Phoenixcoder-6

The question was: "How can transformers be used for tasks other than natural language processing, such as computer vision?"
Here is the most understandable answer I have found.

In NLP:

  • A sentence is a sequence of words.
  • The transformer treats each word (in practice, often a piece of a word) as a token.
  • It learns how words relate to each other using self-attention.

Example:
In the sentence “The cat sat on the mat”, "cat" and "sat" are related (the cat is the one sitting), and "cat" and "mat" are also related (the cat is on the mat).

The transformer learns these relationships automatically from data; nobody has to write them in by hand.
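To make "self-attention" a bit more concrete, here is a minimal NumPy sketch of single-head scaled dot-product self-attention over the six tokens of the example sentence. It assumes random embeddings and random projection matrices (`w_q`, `w_k`, `w_v` are hypothetical stand-ins for weights a real model would learn), so the printed weights do not reflect real linguistic relationships; the point is the mechanics: every token gets a score against every other token, and each row of scores is turned into weights that sum to 1.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (n_tokens, d_model) token embeddings.
    w_q, w_k, w_v: (d_model, d_head) projections (learned in a real model).
    Returns (output, weights); weights[i, j] is how much token i attends to token j.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # every token scored against every token
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = ["The", "cat", "sat", "on", "the", "mat"]
d_model, d_head = 8, 4  # hypothetical toy sizes

# Random embeddings/projections stand in for trained ones (assumption for illustration).
x = rng.normal(size=(len(tokens), d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape)      # (6, 4)
print(weights.shape)  # (6, 6)
print(weights[tokens.index("cat")].round(2))  # how "cat" attends to each token
```

In a trained model, the row for "cat" would ideally put noticeable weight on "sat" and "mat"; with random weights it is just a demonstration of the shapes and the computation.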

In Computer Vision:

An image is not a sequence; it is a grid of pixels (height × width × channels).
So before feeding it to a transformer, we:

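  • Cut the image into small square patches (say 16×16 pixels, so a 224×224 image gives 14 × 14 = 196 patches).
  • Flatten each patch into a long vector (just line up the pixel values: 16 × 16 × 3 = 768 numbers for an RGB patch).
  • Map each vector to a fixed-size embedding with a learned linear layer (like an embedding layer for text).
  • Add a position embedding to each patch embedding, so the model knows where in the image each patch came from.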
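The patching steps can be sketched in NumPy roughly like this. It assumes a 224×224 RGB image, 16×16 patches, and a hypothetical embedding size of 64; the projection matrix `w_embed` is random here, whereas a real model learns it. `image_to_patches` is a helper name made up for this sketch.

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Cut an (H, W, C) image into non-overlapping square patches and flatten each one.

    Returns shape (num_patches, patch_size * patch_size * C), with patches
    ordered row by row (left to right, then top to bottom).
    """
    h, w, c = image.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch_size"
    # (H, W, C) -> (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C)
    grid = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    # Line up the pixel values of each patch into one long vector.
    return grid.reshape(-1, p * p * c)

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))       # assumed 224x224 RGB input
patches = image_to_patches(image, 16)   # 14 x 14 = 196 patches
print(patches.shape)                    # (196, 768)

d_model = 64  # hypothetical embedding size
# Linear projection to a fixed-size embedding; learned in a real model, random here.
w_embed = rng.normal(size=(patches.shape[1], d_model)) * 0.02
b_embed = np.zeros(d_model)
patch_embeddings = patches @ w_embed + b_embed
print(patch_embeddings.shape)           # (196, 64)
```

The reshape/transpose pair avoids Python loops: it splits each spatial axis into "which patch" and "position inside the patch", then groups the two "which patch" axes together.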

Now treat the patches like "words" and the image like a "sentence"!
Self-attention can then learn which parts of the image should attend to which others.
For example, in face recognition, patches containing the eyes might attend to the patch containing the nose.
In car detection, patches containing the wheels might attend to the car body.
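Putting the pieces together, here is a rough end-to-end sketch: image → patches ("words") → embeddings plus position embeddings ("sentence") → a 196×196 attention matrix. All weights are random (an assumption for illustration; a trained model learns them), and the sizes are hypothetical. The loop shows how to turn a patch index back into a (row, col) position on the 14×14 patch grid, which is how you would check which region of the image a given patch is attending to.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def image_to_patches(image, p):
    # (H, W, C) -> (num_patches, p*p*C), patches ordered row by row.
    h, w, c = image.shape
    grid = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(-1, p * p * c)

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))        # assumed 224x224 RGB input
patch_size, d_model = 16, 64             # hypothetical sizes
grid_w = image.shape[1] // patch_size    # 14 patches per row

patches = image_to_patches(image, patch_size)                    # (196, 768): the "words"
w_embed = rng.normal(size=(patches.shape[1], d_model)) * 0.02
pos_embed = rng.normal(size=(patches.shape[0], d_model)) * 0.02  # learned in a real model
x = patches @ w_embed + pos_embed                                # (196, 64): the "sentence"

# Single-head attention scores between every pair of patches.
w_q, w_k = (rng.normal(size=(d_model, d_model)) for _ in range(2))
scores = (x @ w_q) @ (x @ w_k).T / np.sqrt(d_model)
weights = softmax(scores, axis=-1)                               # (196, 196)

query = 0  # the top-left patch
for idx in np.argsort(weights[query])[::-1][:3]:
    row, col = divmod(int(idx), grid_w)
    print(f"patch {query} -> patch {idx} at (row={row}, col={col}), weight={weights[query, idx]:.3f}")

# Even the farthest patch (bottom-right) gets a nonzero weight from the top-left one.
print(weights[query, -1] > 0)
```

Because the weights are random, the "top 3" patches here mean nothing; after training, these rows are where patterns like "eyes attend to nose" would show up.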

✨ Why does this help?
In traditional CNNs, each convolutional filter looks only at a small local region (say 3×3 pixels), so distant parts of the image only interact after many layers are stacked.
In transformers, every patch can attend to every other patch in a single layer, even if they are far apart!
So transformers can capture global relationships more directly.
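A quick back-of-the-envelope calculation shows the difference. Assuming a plain stack of stride-1 3×3 convolutions with no pooling or dilation (real CNNs usually add pooling or strides, which grow the receptive field much faster), each layer widens the receptive field by 2 pixels per side. The helper names below are made up for this sketch.

```python
def stacked_conv_receptive_field(num_layers, kernel_size=3):
    """Receptive field (pixels along one side) of stacked stride-1 convolutions,
    assuming no pooling, striding, or dilation."""
    return 1 + num_layers * (kernel_size - 1)

def conv_layers_needed(span, kernel_size=3):
    """Smallest number of such layers whose receptive field covers `span` pixels."""
    # Ceiling division of (span - 1) by (kernel_size - 1).
    return -(-(span - 1) // (kernel_size - 1))

image_side = 224
print(stacked_conv_receptive_field(1))  # 3
print(stacked_conv_receptive_field(2))  # 5
print(conv_layers_needed(image_side))   # 112

# With 16x16 patches, a 224x224 image is a 14x14 grid of patches, and a single
# self-attention layer already connects every patch to all 196 patches.
num_patches = (image_side // 16) ** 2
print(num_patches)                      # 196
```

So, under these simplified assumptions, a pure 3×3 conv stack needs over a hundred layers before the top-left pixel can "see" the bottom-right one, while one attention layer links them directly.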

@Phoenixcoder-6
Author

Check this out


1 participant