
Doubt about repeated joint_tensor_query in all_to_all #424

Open
Edenzzzz opened this issue Jan 6, 2025 · 3 comments

Comments

Edenzzzz commented Jan 6, 2025

It seems that in this line, the text conditioning for the query is concatenated before the all_to_all, which means it is repeated ulysses_size times. I wonder if that might affect correctness?

query = torch.cat([joint_tensor_query, query], dim=1)
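
For reference, a minimal single-process sketch of what that concat implies (all shapes and names below are illustrative assumptions, not the library's actual code):

```python
import torch

# Illustrative values, not the library's actual configuration.
P = 4                        # ulysses_size (sequence-parallel degree)
text_len, img_len = 3, 16    # text conditioning tokens, image tokens
heads, dim = 8, 2

text_q = torch.randn(1, text_len, heads, dim)             # same on every rank
img_shards = torch.randn(1, img_len, heads, dim).chunk(P, dim=1)

# Each rank prepends the *full* text query to its image shard before the
# seq-gathering all_to_all, so the text travels through the all_to_all
# once per rank, i.e. ulysses_size times in total.
per_rank = [torch.cat([text_q, s], dim=1) for s in img_shards]

# Concatenating the per-rank shards along seq mimics the post-all_to_all
# gathered sequence on each rank (heads would be scattered in parallel).
gathered = torch.cat(per_rank, dim=1)
print(gathered.shape[1])   # P * (text_len + img_len // P) = 28, not 19
```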

Edenzzzz changed the title from "Doubt about repeated joint_tensor_query" to "Doubt about repeated joint_tensor_query in all_to_all" on Jan 6, 2025
feifeibear (Collaborator) commented:

It is absolutely correct. We have two different parallel strategies for MM-DiT SP:

  1. Split both text and image.
  2. Split image and replicate text, for small text or when the text length is not divisible by sp_degree.

What you see is strategy 2.
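
A hypothetical sketch of the two strategies (shard_query and its arguments are illustrative, not the repo's API):

```python
import torch

def shard_query(text_q, img_q, rank, sp_degree, split_text):
    """Return one rank's pre-all_to_all query under the two strategies."""
    img_shard = img_q.chunk(sp_degree, dim=1)[rank]
    if split_text:
        # Strategy 1: shard text too; needs text_q.size(1) % sp_degree == 0.
        text_shard = text_q.chunk(sp_degree, dim=1)[rank]
        return torch.cat([text_shard, img_shard], dim=1)
    # Strategy 2: replicate the full text query on every rank.
    return torch.cat([text_q, img_shard], dim=1)
```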

Edenzzzz (Author) commented Jan 7, 2025

Thanks for your reply.
In case 2, joint_tensor_query has its full sequence length before the all_to_all. Doesn't that mean it gets duplicated in an interleaved fashion with the image embedding along the seq dim (text_cond, image_seq_1, text_cond, image_seq_2, ...)?
This also seems to prevent packing qkv together (only kv can be packed now) due to the difference in sequence lengths.
Would it be better to cat after the all_to_all, inside the ring computation?
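
For comparison, a self-contained sketch (with made-up sizes) of the sequence layout each ordering would produce:

```python
import torch

P, text_len, img_len, heads, dim = 4, 3, 16, 8, 2   # illustrative only
text_q = torch.randn(1, text_len, heads, dim)
img_chunks = list(torch.randn(1, img_len, heads, dim).chunk(P, dim=1))

# cat before all_to_all: the text rides along with every rank's shard,
# so the gathered sequence interleaves it with each image chunk:
# [text, img_0, text, img_1, text, img_2, text, img_3]
before = torch.cat([torch.cat([text_q, c], dim=1) for c in img_chunks], dim=1)

# cat after all_to_all: the text appears once, contiguously, so q and kv
# could share a single sequence length:
# [text, img_0, img_1, img_2, img_3]
after = torch.cat([text_q] + img_chunks, dim=1)

print(before.shape[1], after.shape[1])   # 28 19
```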

feifeibear (Collaborator) commented:

It does waste some communication bandwidth in exchange for dimension flexibility. I encourage you to try supporting var-len, which is definitely valuable because it avoids padding the text.
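
As a starting point, a hedged sketch of the cu_seqlens bookkeeping that var-len attention kernels (e.g. flash_attn_varlen_func) expect, which is what removes the need to pad text to a fixed length (all values below are made up):

```python
import torch
import torch.nn.functional as F

text_lens = torch.tensor([5, 3, 7])     # per-sample text lengths, unpadded
img_len = 16                            # image tokens per sample
seqlens = text_lens + img_len           # packed per-sample sequence lengths

# Cumulative sequence offsets with a leading 0, as var-len kernels expect.
cu_seqlens = F.pad(seqlens.cumsum(0), (1, 0)).to(torch.int32)
print(cu_seqlens)   # tensor([ 0, 21, 40, 63], dtype=torch.int32)
```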
