DeepSeek #11971

Merged

ko3n1g merged 44 commits into main from chcui/deepseek

Feb 22, 2025

+1,358 −71

Collaborator

cuichenx commented Jan 28, 2025

What does this PR do?

Add DeepSeek V2-Lite, V2, and V3 (including R1) models.
Support model import (from HF), model export (to HF), SFT/LoRA recipes.

Currently not supported: SFT/LoRA with packed sequences, and pretraining.

Collection: [Note which collection this PR will affect]

Changelog

Add specific line by line info of high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this
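The PR description leaves the usage snippet unfilled. What follows is a minimal sketch of how the import, export, and LoRA recipe paths described above might be driven, not the author's example. The recipe module names come from the files touched in this PR (`nemo/collections/llm/recipes/deepseek_v2.py`, `deepseek_v2_lite.py`, `deepseek_v3.py`). The Hugging Face repo ids are the public DeepSeek repos. Everything else is an assumption to check against the installed NeMo version: the class names `DeepSeekModel` and `DeepSeek*Config`, the `finetune_recipe` entry point and its `name`, `peft_scheme`, and `packed_sequence` arguments, and the exact keyword arguments of `llm.import_ckpt` / `llm.export_ckpt`. The `VARIANTS` table and helper functions are hypothetical names introduced for illustration.

```python
# Sketch only. Names marked "assumed" are not confirmed by this PR description.
import importlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    hf_repo: str        # public Hugging Face repo id
    config_name: str    # assumed config class name under nemo.collections.llm
    recipe_module: str  # recipe file added in this PR


# R1 shares the V3 architecture ("V3 (including R1)" in the PR description).
VARIANTS = {
    "v2-lite": Variant("deepseek-ai/DeepSeek-V2-Lite", "DeepSeekV2LiteConfig", "deepseek_v2_lite"),
    "v2": Variant("deepseek-ai/DeepSeek-V2", "DeepSeekV2Config", "deepseek_v2"),
    "v3": Variant("deepseek-ai/DeepSeek-V3", "DeepSeekV3Config", "deepseek_v3"),
    "r1": Variant("deepseek-ai/DeepSeek-R1", "DeepSeekV3Config", "deepseek_v3"),
}


def hf_source(name: str) -> str:
    """Build the hf:// source string for a variant."""
    return f"hf://{VARIANTS[name].hf_repo}"


def import_from_hf(name: str):
    """Convert a Hugging Face checkpoint to NeMo format (requires NeMo)."""
    from nemo.collections import llm  # lazy import: the sketch loads without NeMo

    variant = VARIANTS[name]
    config_cls = getattr(llm, variant.config_name)  # assumed class name
    model = llm.DeepSeekModel(config_cls())         # assumed class name
    return llm.import_ckpt(model=model, source=hf_source(name))


def export_to_hf(nemo_ckpt_path: str, output_path: str):
    """Convert a NeMo checkpoint back to Hugging Face format (requires NeMo)."""
    from nemo.collections import llm

    # Keyword names are assumed; verify against the installed NeMo version.
    return llm.export_ckpt(path=nemo_ckpt_path, target="hf", output_path=output_path)


def lora_recipe(name: str, run_name: str = "deepseek_lora"):
    """Build a LoRA fine-tuning recipe (requires NeMo)."""
    module = importlib.import_module(
        f"nemo.collections.llm.recipes.{VARIANTS[name].recipe_module}"
    )
    # finetune_recipe and its arguments are assumed. Packed sequences are
    # listed as unsupported in this PR, so the flag is left off.
    return module.finetune_recipe(name=run_name, peft_scheme="lora", packed_sequence=False)
```

With NeMo installed, `import_from_hf("v2-lite")` would be the natural first call, since V2-Lite is the smallest of the listed variants. The NeMo imports are deferred into the functions so the variant table can be inspected on a machine without the framework.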

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install (e.g., Numba, Pynini, Apex)?
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Related to # (issue)

cuichenx and others added 2 commits

January 27, 2025 21:47


          initial commit

e4d0cd9

Signed-off-by: Chen Cui <chcui@nvidia.com>


          Apply isort and black reformatting

325db93

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

github-advanced-security bot found potential problems

nemo/collections/llm/__init__.py Fixed

cuichenx and others added 20 commits

January 28, 2025 09:42


          clean

c34778e

Signed-off-by: Chen Cui <chcui@nvidia.com>


          Apply isort and black reformatting

87836f6

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>


          fix mscale and remove debug code

0fcdaa5

Signed-off-by: Chen Cui <chcui@nvidia.com>


          Apply isort and black reformatting

31b0f23

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>


          remove MTP to avoid HF warning

4e5929b

Signed-off-by: Chen Cui <chcui@nvidia.com>


          Revert "remove MTP to avoid HF warning"

b3a7328

This reverts commit 4e5929b.


          guard v3 args

d851bc7

Signed-off-by: Chen Cui <chcui@nvidia.com>


          guard one more v3 arg

f194158

Signed-off-by: Chen Cui <chcui@nvidia.com>


          add recipes (wip)

fb73013

Signed-off-by: Chen Cui <chcui@nvidia.com>


          Apply isort and black reformatting

fb90521

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>


          update recipes

52f0778

Signed-off-by: Chen Cui <chcui@nvidia.com>


          update recipes

b92f6be

Signed-off-by: Chen Cui <chcui@nvidia.com>


          update to latest mcore

90c5eb1

Signed-off-by: Chen Cui <chcui@nvidia.com>


          update to latest mcore

fa67d3b

Signed-off-by: Chen Cui <chcui@nvidia.com>


          support lora for TELinear

21e3993

Signed-off-by: Chen Cui <chcui@nvidia.com>


          Apply isort and black reformatting

163ffe9

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>


          support V2 lite

6d3708f

Signed-off-by: Chen Cui <chcui@nvidia.com>


          Apply isort and black reformatting

ad73088

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>


          exporter

1b7ed6f

Signed-off-by: Chen Cui <chcui@nvidia.com>


          Apply isort and black reformatting

cec4326

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

NVIDIA deleted a comment from github-actions bot

NVIDIA deleted a comment from github-actions bot

cuichenx and others added 4 commits

February 13, 2025 22:47


          memory-efficient hf export

7b8940c

Signed-off-by: Chen Cui <chcui@nvidia.com>


          Merge remote-tracking branch 'origin/chcui/deepseek' into chcui/deepseek

f2c95c0

# Conflicts:
#	nemo/collections/llm/gpt/model/deepseek.py


          Merge branch 'main' into chcui/deepseek

955e7ef

Signed-off-by: Chen Cui <chcui@nvidia.com>


          Apply isort and black reformatting

6937ae5

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

cuichenx marked this pull request as ready for review

February 14, 2025 03:53

cuichenx added the skip-linting label

Collaborator Author

cuichenx commented Feb 19, 2025

@JRD971000 @yaoyu-33 could you review the changes to ssm.py and mllama.py

cuichenx and others added 3 commits

February 20, 2025 15:30


          update recipes

af49891

Signed-off-by: Chen Cui <chcui@nvidia.com>


          Apply isort and black reformatting

39f4f49

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>


          guard packed sequence=false

198d7be

Signed-off-by: Chen Cui <chcui@nvidia.com>

github-advanced-security bot found potential problems

nemo/collections/llm/recipes/deepseek_v2.py Fixed

nemo/collections/llm/recipes/deepseek_v2_lite.py Fixed

nemo/collections/llm/recipes/deepseek_v3.py Fixed


          unused imports

74353dd

Signed-off-by: Chen Cui <chcui@nvidia.com>

cuichenx added Run CICD r2.2.0 labels

ko3n1g added Run CICD and removed Run CICD labels

suiyoubi reviewed

nemo/collections/llm/gpt/model/deepseek.py Resolved

suiyoubi reviewed

nemo/collections/llm/recipes/deepseek_v2.py Outdated, resolved

suiyoubi reviewed

nemo/collections/llm/recipes/deepseek_v2_lite.py Outdated, resolved

hemildesai previously approved these changes

Collaborator

hemildesai left a comment

LGTM for llm/api and nemo.lightning related changes

cuichenx added 2 commits

February 21, 2025 15:54


          fix lora

556c864

Signed-off-by: Chen Cui <chcui@nvidia.com>


          address comments

Signed-off-by: Chen Cui <chcui@nvidia.com>

cuichenx dismissed hemildesai’s stale review via

February 21, 2025 21:04


          Apply isort and black reformatting

ae394ff

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

cuichenx added Run CICD and removed Run CICD labels


          fix test

616f76d

Signed-off-by: Chen Cui <chcui@nvidia.com>

cuichenx added Run CICD and removed Run CICD labels

Contributor

github-actions bot commented Feb 22, 2025

[🤖]: Hi @cuichenx 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

I'm just a bot, so I'll leave it to you what to do next.

//cc @pablo-garay @ko3n1g

ko3n1g approved these changes

ko3n1g merged commit b1dd398 into main

192 checks passed

ko3n1g deleted the chcui/deepseek branch

February 22, 2025 10:14

ko3n1g pushed a commit that referenced this pull request


          DeepSeek (#11971)

5f3e168

* initial commit

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* clean

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* fix mscale and remove debug code

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* remove MTP to avoid HF warning

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Revert "remove MTP to avoid HF warning"

This reverts commit 4e5929b.

* guard v3 args

Signed-off-by: Chen Cui <chcui@nvidia.com>

* guard one more v3 arg

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add recipes (wip)

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* update recipes

Signed-off-by: Chen Cui <chcui@nvidia.com>

* update recipes

Signed-off-by: Chen Cui <chcui@nvidia.com>

* update to latest mcore

Signed-off-by: Chen Cui <chcui@nvidia.com>

* update to latest mcore

Signed-off-by: Chen Cui <chcui@nvidia.com>

* support lora for TELinear

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* support V2 lite

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* exporter

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* memory-efficient hf export

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* code scanning

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* comment out cli factory for pretraining recipes

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Support non-layernorm column parallel layer for LoRA SP in MLA

Signed-off-by: Chen Cui <chcui@nvidia.com>

* add v2-lite recipe

Signed-off-by: Chen Cui <chcui@nvidia.com>

* recipe typos

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* linting

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* update recipes

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* guard packed sequence=false

Signed-off-by: Chen Cui <chcui@nvidia.com>

* unused imports

Signed-off-by: Chen Cui <chcui@nvidia.com>

* fix lora

Signed-off-by: Chen Cui <chcui@nvidia.com>

* address comments

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* fix test

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>

ko3n1g added a commit that referenced this pull request


          chore: Cherry pick deepseek (#12324)

328a191

* DeepSeek (#11971)

* missing imports

Signed-off-by: oliver könig <okoenig@nvidia.com>

---------

Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>


Labels

r2.2.0 Run CICD skip-linting