This repository has been archived by the owner on Oct 13, 2022. It is now read-only.

Multi-GPU training #76

Open
hegc opened this issue Jan 19, 2021 · 5 comments

Comments

hegc commented Jan 19, 2021

Hi, these examples are excellent. Can we train on multiple GPUs with k2/lhotse, just like DDP in PyTorch?
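For reference, here is a minimal sketch of plain PyTorch DistributedDataParallel (DDP) training, the mechanism the question refers to. The toy model, dataset, rendezvous port, and the `gloo` backend are placeholders chosen so the script runs on CPU; nothing here comes from k2 or lhotse.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def run(rank: int, world_size: int):
    # Each spawned process gets its own rank; they rendezvous via these
    # (placeholder) address/port settings.
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "12355"
    # "gloo" keeps this runnable on CPU; use "nccl" for real GPU training.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = torch.nn.Linear(10, 2)  # toy stand-in for an acoustic model
    ddp_model = DDP(model)  # gradients are all-reduced during backward()

    dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
    # DistributedSampler gives each rank a disjoint shard of the data.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # different shuffle each epoch
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(ddp_model(x), y)
            loss.backward()  # gradient synchronization happens here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    mp.spawn(run, args=(world_size,), nprocs=world_size, join=True)
```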

danpovey (Contributor) commented Jan 19, 2021 via email

csukuangfj (Collaborator) commented

There is a WIP pull request about multi-GPU training: #71

hegc (Author) commented Jan 19, 2021

Thanks, I'll try it. @danpovey @csukuangfj

pzelasko (Collaborator) commented

BTW, I intend to finish that PR when I find some spare time, so that we can switch between single-GPU and multi-GPU training. I will also need to make sure it does the right thing when aggregating quantities like the validation loss across GPUs (I saw the two processes return slightly different values) and when storing/loading checkpoints. If you have more time and can pick it up before I do, you're welcome to do so :)
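To illustrate the two points above, here is a hypothetical sketch using plain torch.distributed, not the actual code from PR #71. Summing both the loss and the frame counts across ranks before dividing gives every rank the same validation loss, and writing checkpoints from rank 0 only, followed by a barrier, keeps the ranks consistent. The function names and the frame-count normalization are assumptions for illustration.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def aggregate_validation_loss(loss_sum: float, num_frames: int,
                              device: torch.device) -> float:
    # Each rank evaluates a different shard of the validation set, so the
    # per-rank averages differ slightly. Summing both numerator and
    # denominator across ranks before dividing yields one consistent value.
    stats = torch.tensor([loss_sum, float(num_frames)],
                         dtype=torch.float64, device=device)
    dist.all_reduce(stats, op=dist.ReduceOp.SUM)
    return (stats[0] / stats[1]).item()


def save_checkpoint(ddp_model: DDP, path: str) -> None:
    # Only rank 0 writes the file; .module unwraps the DDP wrapper so the
    # checkpoint can later be loaded into a plain (non-DDP) model.
    if dist.get_rank() == 0:
        torch.save(ddp_model.module.state_dict(), path)
    # Barrier so no rank tries to load before the file is fully written.
    dist.barrier()
```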

danpovey (Contributor) commented Jan 20, 2021 via email
