nemo-automodel: fsdp2 support for peft #12008
LGTM thanks
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
LGTM Thanks
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12008      +/-   ##
==========================================
- Coverage   30.30%   30.30%   -0.01%
==========================================
  Files        1387     1387
  Lines      176283   176293      +10
  Branches    27091    27096       +5
==========================================
- Hits        53422    53420       -2
- Misses     118775   118788      +13
+ Partials     4086     4085       -1
[🤖]: Hi @akoumpa 👋, we wanted to let you know that a CICD pipeline for this PR just finished successfully, so it might be time to merge this PR or get some approvals. I'm just a bot, so I'll leave it up to you what to do next. //cc @pablo-garay @ko3n1g
* Use patch_linear_module for FSDP2
* Use patch_linear_module for FSDP2
* add test
* add fsdp2 strategy to test
* add --num-nodes option
* add --num-nodes
* minor fix
* rename
* add to_cpu in utils
* use to_cpu
* shard adapter weights
* use to_cpu from utils
* use get_automodel_from_trainer
* remove some mcore logic from save_checkpoint
* call to_cpu in strategy's save_checkpoint
* fix ckpt saving
* remove unused import
* fix
* fix
* fix
* fix
* fix
* docstrings
* docstrings
* update test
* add missing import
* simplify
* fix
* fix
* fix
* minor fix
* fix
* pylint fix
* pylint fix
* pylint fix :/
* pylint fix :/
* pylint fix :/
* pylint fix :/
* fix
* fix
* fix
* # noqa: F821
* fix
* fix
* fix
* fix
* fix
* add docstrings
* noqa
* fix
* fix
* fix
* import
* fix
* fix
* fix
* fix
* fix
* fix
* update test
* fix
* Apply isort and black reformatting
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
What does this PR do?
Adds FSDP2 support for PEFT in nemo-automodel.
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this
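The commit subjects in this PR mention patch_linear_module and sharded adapter weights. As a rough illustration of the general idea only, the sketch below adds a LoRA-style low-rank adapter to a linear layer by patching the existing object in place rather than wrapping it in a new module. One plausible benefit under parameter sharding is that the original layer object and its weight keep their identity and names. Everything here is a toy in plain Python: TinyLinear, TinyLoRALinear, and patch_in_place are hypothetical names invented for this example and are not NeMo's API or implementation.

```python
class TinyLinear:
    """Hypothetical stand-in for a framework linear layer: y = W x."""

    def __init__(self, weight):
        self.weight = weight  # list of rows, shape [out_dim][in_dim]

    def forward(self, x):
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]


class TinyLoRALinear(TinyLinear):
    """Same layer plus a low-rank update: y = W x + scale * B (A x)."""

    def forward(self, x):
        base = super().forward(x)
        # Project the input down to the adapter rank, then back up.
        ax = [sum(a * v for a, v in zip(row, x)) for row in self.lora_a]
        delta = [sum(b * v for b, v in zip(row, ax)) for row in self.lora_b]
        return [y + self.scale * d for y, d in zip(base, delta)]


def patch_in_place(layer, rank, alpha):
    """Attach adapter weights to `layer` and swap its class (assumed approach).

    The object identity and the original `weight` attribute are untouched,
    so anything holding a reference to `layer` sees the adapted behavior.
    """
    out_dim, in_dim = len(layer.weight), len(layer.weight[0])
    layer.lora_a = [[0.01] * in_dim for _ in range(rank)]  # small init for A
    layer.lora_b = [[0.0] * rank for _ in range(out_dim)]  # zero init for B
    layer.scale = alpha / rank
    layer.__class__ = TinyLoRALinear
    return layer


layer = TinyLinear([[1.0, 2.0], [3.0, 4.0]])
patched = patch_in_place(layer, rank=1, alpha=2.0)
```

Because B starts at zero, the patched layer initially produces the same output as the original one, and only the small adapter matrices would need to be trained.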
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information