Questions about codebook dim #5

Open
gyt1145028706 opened this issue Jan 24, 2025 · 6 comments

Comments

@gyt1145028706

gyt1145028706 commented Jan 24, 2025

Hello!
Have you tried a higher codebook dimension? (The current setup is FSQ with codebook dim=8.)
For example, have you tried a single-layer VQ with codebook dim=512 or 1024?

@zhenye234
Owner

Hi! FSQ probably won't work for that. Take a look at the paper https://arxiv.org/pdf/2309.15505 and this blog post https://spaces.ac.cn/archives/9826
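One way to see why FSQ does not extend to very high dimensions: its codebook is implicit, and its size is the product of the per-dimension level counts. The sketch below is only an illustration of that arithmetic. The level choices are hypothetical and are not X-Codec's actual configuration.

```python
import math

def fsq_codebook_size(levels):
    """Implicit FSQ codebook size: product of the per-dimension level counts."""
    return math.prod(levels)

def fsq_bits_per_token(levels):
    """Bits needed to index one FSQ code (log2 of the implicit codebook size)."""
    return sum(math.log2(n) for n in levels)

# Hypothetical level choices, for illustration only.
low_dim = [4] * 8      # 8 dimensions, 4 levels each
high_dim = [2] * 512   # 512 dimensions, only 2 levels each

print(fsq_codebook_size(low_dim))     # 65536
print(fsq_bits_per_token(low_dim))    # 16.0
print(fsq_bits_per_token(high_dim))   # 512.0
```

Even with only 2 levels per dimension, 512 dimensions imply 2^512 codes, which is far too many to use as a token vocabulary.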

@gyt1145028706
Author

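Thanks for the reply and the suggestions. I may not have been clear. What I meant was: have you tried plain VQ (rather than FSQ) with a higher codebook dimension (e.g. 512 or 1024) to train an X-Codec with a single codebook or a small number of codebooks (e.g. RVQ-3)? I'm trying to do this, and I wonder whether the high codebook dimension is what makes it hard to train.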

@zhenye234
Owner

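We tried BigCodec's VQ at the beginning and found VQ training very unstable. Also, when the VQ codebook size goes up, codebook utilization doesn't necessarily go up with it, so quality doesn't necessarily improve.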
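For anyone who wants to check utilization on their own run, here is a minimal sketch of two common diagnostics: the fraction of codebook entries that are ever selected, and the perplexity of the usage distribution. It is not taken from the X-Codec or BigCodec code. The function name and the toy indices are made up for illustration, and it assumes you can dump the quantizer's selected indices as a flat list of integers.

```python
import math
from collections import Counter

def codebook_usage(indices, codebook_size):
    """Return (utilization, perplexity) for a flat list of selected code indices.

    utilization: fraction of codebook entries used at least once.
    perplexity:  2 ** entropy of the empirical usage distribution; it equals
                 codebook_size only when every entry is used equally often.
    """
    counts = Counter(indices)
    total = len(indices)
    utilization = len(counts) / codebook_size
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return utilization, 2 ** entropy

# Toy example: a 1024-entry codebook where only 3 entries are ever selected.
indices = [0, 0, 0, 0, 1, 1, 2, 2]
util, ppl = codebook_usage(indices, codebook_size=1024)
print(f"utilization={util:.4f}, perplexity={ppl:.3f}")
```

A perplexity far below the codebook size means that most entries are dead or rarely used, so enlarging the codebook adds little usable capacity.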

@zhenye234
Owner

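With a higher feature dimension, if the codebook stays the same, performance is unlikely to improve much, because VQ acts as a bottleneck and only a limited amount of information can pass through it.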
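The bottleneck argument can be made concrete with a small calculation: each frame carries at most log2(codebook size) bits per codebook, and the codebook dimension does not appear in that bound. The sketch below assumes hypothetical values for codebook size and frame rate. They are not X-Codec's settings.

```python
import math

def bits_per_frame(codebook_size, num_codebooks=1):
    """Upper bound on bits carried by one frame of discrete codes.

    Note that the codebook (feature) dimension is not an argument:
    the bound depends only on how many distinct codes exist.
    """
    return num_codebooks * math.log2(codebook_size)

def bitrate_bps(codebook_size, frame_rate_hz, num_codebooks=1):
    """Upper bound on bitrate in bits per second."""
    return bits_per_frame(codebook_size, num_codebooks) * frame_rate_hz

# Hypothetical settings, for illustration only.
print(bits_per_frame(65536))                   # 16.0, single codebook
print(bits_per_frame(1024, num_codebooks=3))   # 30.0, RVQ-3 style
print(bitrate_bps(65536, frame_rate_hz=50))    # 800.0
```

Going from dim=8 to dim=512 with the same codebook size leaves all three numbers unchanged. Only a larger codebook that is actually well utilized, more codebooks, or a higher frame rate raises the bound.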

@dzq84

dzq84 commented Jan 27, 2025

Thanks to the author for sharing this experience. I learned a lot.

@gyt1145028706
Author

Thanks for sharing!

3 participants