Merge develop to main (InternLM#233)

* feat(utils/writer.py): support tensorboard writer (InternLM#63)

* feat(utils/writer.py): support tensorboard writer

* feat(utils/writer.py): add class comment

---------

Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>

* [Develop] Pull Main Branch (InternLM#121)

* fix/fix_submodule_err (InternLM#61)

* fix/fix_submodule_err

---------

Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu>

* fix issue templates (InternLM#65)

* fix(tokenizer): refactor tokenizer and update usage in readme (InternLM#51)

* update tokenizer example

* fix(readme, requirements): fix typo at Chinese readme and select a lower version of transformers (InternLM#73)

* fix a typo in readme

* in order to find InternLMTokenizer, select a lower version of Transformers

---------

Co-authored-by: gouhchangjiang <gouhchangjiang@gmail.com>

* [Doc] Add wechat and discord link in readme (InternLM#78)

* Doc:add wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* [Docs]: add Japanese README (InternLM#43)

* Add Japanese README

* Update README-ja-JP.md

replace message

* Update README-ja-JP.md

* add repetition_penalty in GenerationConfig in web_demo.py (InternLM#48)

Co-authored-by: YWMditto <862779238@qq.com>

* use fp16 in instruction (InternLM#80)

* [Enhancement] add more options for issue template (InternLM#77)

* [Enhancement] add more options for issue template

* update question icon

* fix link

* Use tempfile for convert2hf.py (InternLM#23)

Fix InternLM#50

* delete torch_dtype of README's example code (InternLM#100)

* set the value of repetition_penalty to 1.0 to avoid random outputs (InternLM#99)

* Update web_demo.py (InternLM#97)

Remove meaningless log.

* [Fix]Fix wrong string cutoff in the script for sft text tokenizing (InternLM#106)

---------

Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu>
Co-authored-by: Kai Chen <chenkaidev@gmail.com>
Co-authored-by: Yang Gao <Gary1546308416AL@gmail.com>
Co-authored-by: Changjiang GOU <gouchangjiang@gmail.com>
Co-authored-by: gouhchangjiang <gouhchangjiang@gmail.com>
Co-authored-by: vansin <msnode@163.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: YWMditto <46778265+YWMditto@users.noreply.github.com>
Co-authored-by: YWMditto <862779238@qq.com>
Co-authored-by: WRH <12756472+wangruohui@users.noreply.github.com>
Co-authored-by: liukuikun <24622904+Harold-lkk@users.noreply.github.com>
Co-authored-by: x54-729 <45304952+x54-729@users.noreply.github.com>
Co-authored-by: Shuo Zhang <zhangshuolove@live.com>
Co-authored-by: Miao Zheng <76149310+MeowZheng@users.noreply.github.com>

* feat(core/scheduler): support pipeline parallel (InternLM#98)

* feat(utils/writer.py): support tensorboard writer

* feat(utils/writer.py): add class comment

* feat(core): support pipeline parallel

* fix(core): fix demo running error

* feat(solver/optimizer): add pp zero optimizer

* fix(solver/optimizer): fix word spelling error

* feat(core/scheduler): add new dir scheduler in core/

* fix(core): fix ci lint error

* feat(solver/optimizer): merge pp and nopp optimizer

* doc(usage.md): update usage doc

* feat(core/scheduler): support post func

* feat(core/scheduler): add dtype para in pp sche and update func get_tensor_shape

* feat(core/scheduler): add _load_micro_batch in base scheduler

* feat(core/scheduler): support optimizer overlap communication in pp scheduler

* feat(core/scheduler): delete data process func code

* feat(core/trainer): schedule pre processing for all schedule

---------

Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>
Co-authored-by: huangting.p <huangting@sensetime.com>

* refactor(rotaryEmbedding): refactor forward (InternLM#120)

* use fp16 in instruction (InternLM#80)

* delete torch_dtype of README's example code (InternLM#100)

* refactor the forward for rotary embedding

---------

Co-authored-by: WRH <12756472+wangruohui@users.noreply.github.com>
Co-authored-by: x54-729 <45304952+x54-729@users.noreply.github.com>

* feat(model/metrics.py): support calculating accuracy and perplexity m… (InternLM#91)

* feat(model/metrics.py): support calculating accuracy and perplexity metrics

* fix(model/metrics.py): fix import error

* feat(train.py): minor update

---------

Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>
Co-authored-by: huangting.p <huangting@sensetime.com>

* fix(optimizer/util.py): change inf definition

* [Dev] Pull Main (InternLM#139)

* fix/fix_submodule_err (InternLM#61)

* fix/fix_submodule_err

---------

Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu>

* fix issue templates (InternLM#65)

* fix(tokenizer): refactor tokenizer and update usage in readme (InternLM#51)

* update tokenizer example

* fix(readme, requirements): fix typo at Chinese readme and select a lower version of transformers (InternLM#73)

* fix a typo in readme

* in order to find InternLMTokenizer, select a lower version of Transformers

---------

Co-authored-by: gouhchangjiang <gouhchangjiang@gmail.com>

* [Doc] Add wechat and discord link in readme (InternLM#78)

* Doc:add wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* [Docs]: add Japanese README (InternLM#43)

* Add Japanese README

* Update README-ja-JP.md

replace message

* Update README-ja-JP.md

* add repetition_penalty in GenerationConfig in web_demo.py (InternLM#48)

Co-authored-by: YWMditto <862779238@qq.com>

* use fp16 in instruction (InternLM#80)

* [Enhancement] add more options for issue template (InternLM#77)

* [Enhancement] add more options for issue template

* update question icon

* fix link

* Use tempfile for convert2hf.py (InternLM#23)

Fix InternLM#50

* delete torch_dtype of README's example code (InternLM#100)

* set the value of repetition_penalty to 1.0 to avoid random outputs (InternLM#99)

* Update web_demo.py (InternLM#97)

Remove meaningless log.

* [Fix]Fix wrong string cutoff in the script for sft text tokenizing (InternLM#106)

* docs(install.md): update dependency package transformers version to >= 4.28.0 (InternLM#124)

Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>

* docs(LICENSE): add license (InternLM#125)

* add license of colossalai and flash-attn

* fix lint

* modify the name

* fix AutoModel map in convert2hf.py (InternLM#116)

* variables are not printed as expected (InternLM#114)

* feat(solver): fix code to adapt to torch2.0 and provide docker images (InternLM#128)

* feat(solver): fix code to adapt to torch2.0

* docs(install.md): publish internlm environment image

* docs(install.md): update dependency packages version

* docs(install.md): update default image

---------

Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>

* add demo test (InternLM#132)

Co-authored-by: qa-caif-cicd <qa-caif-cicd@pjlab.org.cn>

* fix web_demo cache accelerate (InternLM#133)

* fix(hybrid_zero_optim.py): delete math import

* Update embedding.py

---------

Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu>
Co-authored-by: Kai Chen <chenkaidev@gmail.com>
Co-authored-by: Yang Gao <Gary1546308416AL@gmail.com>
Co-authored-by: Changjiang GOU <gouchangjiang@gmail.com>
Co-authored-by: gouhchangjiang <gouhchangjiang@gmail.com>
Co-authored-by: vansin <msnode@163.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: YWMditto <46778265+YWMditto@users.noreply.github.com>
Co-authored-by: YWMditto <862779238@qq.com>
Co-authored-by: WRH <12756472+wangruohui@users.noreply.github.com>
Co-authored-by: liukuikun <24622904+Harold-lkk@users.noreply.github.com>
Co-authored-by: x54-729 <45304952+x54-729@users.noreply.github.com>
Co-authored-by: Shuo Zhang <zhangshuolove@live.com>
Co-authored-by: Miao Zheng <76149310+MeowZheng@users.noreply.github.com>
Co-authored-by: huangting4201 <1538303371@qq.com>
Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>
Co-authored-by: ytxiong <45058324+yingtongxiong@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: kkscilife <126147887+kkscilife@users.noreply.github.com>
Co-authored-by: qa-caif-cicd <qa-caif-cicd@pjlab.org.cn>
Co-authored-by: hw <45089338+MorningForest@users.noreply.github.com>

* style(solver/optimizer/utils.py): fix lint error (InternLM#147)

Co-authored-by: huangting.p <huangting@sensetime.com>

* feat(*): support not-flash-attn for pp and no-pp (InternLM#145)

* support not flash attention for no-pp

* support pipeline

* modify the config

* refactor the code

* refactor the code

* remove some unnecessary code

* fix(initialize/launch.py): set default value for use_flash_attn (InternLM#158)

* add default for use_flash_attn

* fix lint

* feat(utils/logger.py): support uniscale logger (InternLM#152)

* style(internlm): fix lint error

* feat(utils/logger.py): support uniscale logger

* fix(utils/logger.py): fix import circular error

* feat(train.py): support dashboard metric panel and fix ci train config

* fix(ci_scripts/train/slurm_train.sh): fix ci train error

* fix(ci_scripts/train/torchrun.sh): fix ci train error

* fix(ci_scripts/train): restore ci update

* fix(config.json): delete alert webhook

* feat(train.py): optimize func init logger

* feat(config.json): delete config.json

---------

Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>
Co-authored-by: huangting.p <huangting@sensetime.com>

* feat(utils/evaluation.py): support evaluate (InternLM#154)

* style(internlm): fix lint error

* feat(utils/logger.py): support uniscale logger

* fix(utils/logger.py): fix import circular error

* feat(train.py): support dashboard metric panel and fix ci train config

* fix(ci_scripts/train/slurm_train.sh): fix ci train error

* fix(ci_scripts/train/torchrun.sh): fix ci train error

* feat(utils/evaluation.py): support evaluate on validation dataset

* fix(utils/evaluation.py): fix demo error

* fix(ci_scripts/train/ci_7B_sft.py): fix ci train error

* feat(initialize/launch.py): set default value for valid_bsz and valid_every

* fix(ci_scripts/train): restore ci update

* docs(configs/7B_sft.py): update comment for config

* fix(config.json): delete config.json

* fix evaluation bug in scheduler when use_flash_attn=False

* feat(scheduler/no_pipeline_scheduler.py): support micro_bsz>1 in no pp

* modify the judgment in pp and no-pp scheduler

* modify the data_process_func in evaluation

* fix bugs when use_flash_attn=False

* rename symbol

* feat(configs/7B_sft.py): change para valid_bsz to valid_micro_num

* feat(scheduler/no_pipeline_scheduler.py): update para set _grad_accum_batch_size

---------

Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>
Co-authored-by: huangting.p <huangting@sensetime.com>
Co-authored-by: yingtongxiong <974106207@qq.com>

* feat(*): support no apex (InternLM#166)

* support no-apex

* add default for use_apex

* fix lint

* modify the RMSNormTorch

* remove some comments

* remove use_apex parameter

* remove some unnecessary code

* refactor(*): refactor the code with no-apex (InternLM#170)

* support no-apex

* add default for use_apex

* fix lint

* modify the RMSNormTorch

* remove some comments

* remove use_apex parameter

* remove some unnecessary code

* optimize the code including import

* remove the import RMSNorm

* remove warnings

* refactor(scheduler): rewrite pipeline scheduler (InternLM#138)

* refactor(scheduler): rewrite pipeline scheduler

* fix(*): fix pipeline scheduler bugs

* fix(*): fix merge bug

* feat(*): update codes with todo tag

* feat(*): add comments

* feat(internlm/core/scheduler): update recv_prev/next logic

* feat(utils/evaluation.py): update sche metric hook for valid

---------

Co-authored-by: huangting.p <huangting@sensetime.com>

* feat(*): support fp32 training (InternLM#155)

* support float32 training

* fix lint

* add adaptation in model/utils.py

* remove some unnecessary code

* fix lint

* feat(optim): add support for fp32 zero

* Revert "Merge pull request InternLM#2 from SolenoidWGT/fp32_zero"

This reverts commit 53fc50b, reversing
changes made to 40f24d0.

revert commit

* merge develop

* Update utils.py

* support fp32 in zero optimizer

* modify the dtype

---------

Co-authored-by: wangguoteng.p <wangguoteng925@qq.com>

* feat(*): support sequence_parallel (InternLM#180)

* support sequence_parallel for no pipeline

* sequence_parallel does not support no-flash-attn

* support sequence parallel for pipeline

* add memory profiler

* Update 13B.py

* add memory profiler

* fix evaluation bug

* remove some unnecessary code

* remove some unnecessary code

* Update parallel_context.py

* modify the config

* remove memory profiler

* modify the config

* support selective dropout

* feat(monitor): support monitor and alert (InternLM#175)

* feat(monitor): support monitor and alert

* feat(monitor.py): fix demo error

* feat(monitor.py): move cmd monitor args to config file

* feat(hybrid_zero_optim.py): if overflow occurs send alert msg

* feat(monitor.py): remove alert msg filter

* feat(monitor.py): optimize class MonitorTracker

* feat(monitor.py): optimize code

* feat(monitor.py): optimize code

* feat(monitor.py): optimize code

* feat(monitor.py): optimize code

* feat(train.py): update print to log

* style(ci): fix lint error

* fix(utils/evaluation.py): remove useless code

* fix(model/modeling_internlm.py): fix lint error

---------

Co-authored-by: huangting4201 <huangting3@sensetime.com>

* feat(ckpt): add async upload and ckpt snapshot (InternLM#161)

* use fp16 in instruction (InternLM#80)

* delete torch_dtype of README's example code (InternLM#100)

* feat(ckpt): support async ckpt upload and ckpt snapshot

---------

Co-authored-by: WRH <12756472+wangruohui@users.noreply.github.com>
Co-authored-by: x54-729 <45304952+x54-729@users.noreply.github.com>
Co-authored-by: wangguoteng.p <wangguoteng925@qq.com>

* feat(ckpt): add auto ckpt load and signal quit (InternLM#189)

Co-authored-by: wangguoteng.p <wangguoteng925@qq.com>

* Revert "feat(ckpt): add auto ckpt load and signal quit (InternLM#189)" (InternLM#192)

This reverts commit a45a91b.

* refactor(solver/optimizer): improve optimizer memory (InternLM#193)

* refactor(solver/optimizer): improve optimizer memory

* feat(data): remove useless dataset type ids map

* Feat/optimizer (InternLM#194)

* feat(optimizer.py): reduce memory footprint and avoid _check_overflow call

* feat(optimizer.py): reduce memory footprint and avoid _check_overflow call

* feat(optimizer.py): overlap compute norm with allreduce

* update var and function name

* update function compute norm (InternLM#197)

Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu>

* feat(optimizer/hybrid_zero_optim.py): overlap gradients last bucket allreduce and compute norm (InternLM#196)

* support gradients allreduce and compute norm overlap

* fix para set error

* remove timer cal_norm for testing

* feat(optimizer/hybrid_zero_optim.py): support group global norm

* format(lint): fix lint error

* feat(optimizer/store.py): update code based on comment

---------

Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu>
Co-authored-by: huangting4201 <1538303371@qq.com>

* fix(ci): fix ci train error (InternLM#199)

* fix/ci train error (InternLM#200)

* fix(ci): fix ci train error

* fix(ci): fix ci train error

* fix(ci): fix ci train error

* fix(train.py): fix scheduler metric hook skip error (InternLM#204)

* Merge main to develop (InternLM#203)

* fix/fix_submodule_err (InternLM#61)

* fix/fix_submodule_err

---------

Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu>

* fix issue templates (InternLM#65)

* fix(tokenizer): refactor tokenizer and update usage in readme (InternLM#51)

* update tokenizer example

* fix(readme, requirements): fix typo at Chinese readme and select a lower version of transformers (InternLM#73)

* fix a typo in readme

* in order to find InternLMTokenizer, select a lower version of Transformers

---------

Co-authored-by: gouhchangjiang <gouhchangjiang@gmail.com>

* [Doc] Add wechat and discord link in readme (InternLM#78)

* Doc:add wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* Doc:update wechat and discord link

* [Docs]: add Japanese README (InternLM#43)

* Add Japanese README

* Update README-ja-JP.md

replace message

* Update README-ja-JP.md

* add repetition_penalty in GenerationConfig in web_demo.py (InternLM#48)

Co-authored-by: YWMditto <862779238@qq.com>

* use fp16 in instruction (InternLM#80)

* [Enhancement] add more options for issue template (InternLM#77)

* [Enhancement] add more options for issue template

* update question icon

* fix link

* Use tempfile for convert2hf.py (InternLM#23)

Fix InternLM#50

* delete torch_dtype of README's example code (InternLM#100)

* set the value of repetition_penalty to 1.0 to avoid random outputs (InternLM#99)

* Update web_demo.py (InternLM#97)

Remove meaningless log.

* [Fix]Fix wrong string cutoff in the script for sft text tokenizing (InternLM#106)

* docs(install.md): update dependency package transformers version to >= 4.28.0 (InternLM#124)

Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>

* docs(LICENSE): add license (InternLM#125)

* add license of colossalai and flash-attn

* fix lint

* modify the name

* fix AutoModel map in convert2hf.py (InternLM#116)

* variables are not printed as expected (InternLM#114)

* feat(solver): fix code to adapt to torch2.0 and provide docker images (InternLM#128)

* feat(solver): fix code to adapt to torch2.0

* docs(install.md): publish internlm environment image

* docs(install.md): update dependency packages version

* docs(install.md): update default image

---------

Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>

* add demo test (InternLM#132)

Co-authored-by: qa-caif-cicd <qa-caif-cicd@pjlab.org.cn>

* fix web_demo cache accelerate (InternLM#133)

* Doc: add twitter link (InternLM#141)

* Feat add checkpoint fraction (InternLM#151)

* feat(config): add checkpoint_fraction into config

* feat: remove checkpoint_fraction from configs/7B_sft.py

---------

Co-authored-by: wangguoteng.p <wangguoteng925@qq.com>

* [Doc] update deployment guide to keep consistency with lmdeploy (InternLM#136)

* update deployment guide

* fix error

* use llm partition (InternLM#159)

Co-authored-by: qa-caif-cicd <qa-caif-cicd@pjlab.org.cn>

* test(ci_scripts): clean test data after test, remove unnecessary global variables, and other optimizations (InternLM#165)

* test: optimization of ci scripts (variables, test data cleaning, etc.).

* chore(workflows): disable ci job on push.

* fix: update partition

* test(ci_scripts): add install requirements automatically, trigger event about lint check and other optimizations (InternLM#174)

* add pull_request in lint check

* use default variables in ci_scripts

* fix format

* check and install requirements automatically

* fix format

---------

Co-authored-by: qa-caif-cicd <qa-caif-cicd@pjlab.org.cn>

* feat(profiling): add a simple memory profiler (InternLM#89)

* feat(profiling): add simple memory profiler

* feat(profiling): add profiling argument

* feat(CI_workflow): Add PR & Issue auto remove workflow (InternLM#184)

* feat(ci_workflow): Add PR & Issue auto remove workflow

Add a workflow for stale PR & Issue auto removal
- PRs & issues will be labeled as stale after 7 days of inactivity
- stale PRs & issues will be removed after another 7 days
- this workflow runs every day at 1:30 a.m.

* Update stale.yml

* feat(bot): Create .owners.yml for Auto Assign (InternLM#176)

* Create .owners.yml: for issue/pr assign automatically

* Update .owners.yml

* Update .owners.yml

fix typo

* [feat]: add pal reasoning script (InternLM#163)

* [Feat] Add PAL inference script

* Update README.md

* Update tools/README.md

Co-authored-by: BigDong <yudongwang1226@gmail.com>

* Update tools/pal_inference.py

Co-authored-by: BigDong <yudongwang1226@gmail.com>

* Update pal script

* Update README.md

* restore .pre-commit-config.yaml

* Update tools/README.md

Co-authored-by: BigDong <yudongwang1226@gmail.com>

* Update tools/README.md

Co-authored-by: BigDong <yudongwang1226@gmail.com>

* Update pal inference script

* Update README.md

* Update internlm/utils/interface.py

Co-authored-by: Wenwei Zhang <40779233+ZwwWayne@users.noreply.github.com>

* Update pal script

* Update pal script

* Update script

* Add docstring

* Update format

* Update script

* Update script

* Update script

---------

Co-authored-by: BigDong <yudongwang1226@gmail.com>
Co-authored-by: Wenwei Zhang <40779233+ZwwWayne@users.noreply.github.com>

* test(ci_scripts): add timeout settings and clean work after the slurm job (InternLM#185)

* restore pr test on develop branch

* add mask

* add post action to cancel slurm job

* remove readonly attribute on job log

* add debug info

* debug job log

* try stdin

* use stdin

* set default value avoid error

* try setting readonly on job log

* performance echo

* remove debug info

* use squeue to check slurm job status

* restore the lost param

* limit retry times

* use exclusive to avoid port already in use

* optimize loop body

* remove partition

* add {} for variables

* set env variable for slurm partition

---------

Co-authored-by: qa-caif-cicd <qa-caif-cicd@pjlab.org.cn>

* refactor(tools): move interface.py and import it to web_demo (InternLM#195)

* move interface.py and import it to web_demo

* typo

* fix(ci): fix lint error

* fix(ci): fix lint error

---------

Co-authored-by: Sun Peng <sunpengsdu@gmail.com>
Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu>
Co-authored-by: Kai Chen <chenkaidev@gmail.com>
Co-authored-by: Yang Gao <Gary1546308416AL@gmail.com>
Co-authored-by: Changjiang GOU <gouchangjiang@gmail.com>
Co-authored-by: gouhchangjiang <gouhchangjiang@gmail.com>
Co-authored-by: vansin <msnode@163.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: YWMditto <46778265+YWMditto@users.noreply.github.com>
Co-authored-by: YWMditto <862779238@qq.com>
Co-authored-by: WRH <12756472+wangruohui@users.noreply.github.com>
Co-authored-by: liukuikun <24622904+Harold-lkk@users.noreply.github.com>
Co-authored-by: x54-729 <45304952+x54-729@users.noreply.github.com>
Co-authored-by: Shuo Zhang <zhangshuolove@live.com>
Co-authored-by: Miao Zheng <76149310+MeowZheng@users.noreply.github.com>
Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>
Co-authored-by: ytxiong <45058324+yingtongxiong@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: kkscilife <126147887+kkscilife@users.noreply.github.com>
Co-authored-by: qa-caif-cicd <qa-caif-cicd@pjlab.org.cn>
Co-authored-by: hw <45089338+MorningForest@users.noreply.github.com>
Co-authored-by: Guoteng <32697156+SolenoidWGT@users.noreply.github.com>
Co-authored-by: wangguoteng.p <wangguoteng925@qq.com>
Co-authored-by: lvhan028 <lvhan_028@163.com>
Co-authored-by: zachtzy <141206206+zachtzy@users.noreply.github.com>
Co-authored-by: cx <759046501@qq.com>
Co-authored-by: Jaylin Lee <61487970+APX103@users.noreply.github.com>
Co-authored-by: del-zhenwu <dele.zhenwu@gmail.com>
Co-authored-by: Shaoyuan Xie <66255889+Daniel-xsy@users.noreply.github.com>
Co-authored-by: BigDong <yudongwang1226@gmail.com>
Co-authored-by: Wenwei Zhang <40779233+ZwwWayne@users.noreply.github.com>
Co-authored-by: huangting4201 <huangting3@sensetime.com>

* fix(pipeline_scheduler.py): fix tensor shape err and comm block (InternLM#210)

* feat(train.py): support torch profiler (InternLM#201)

* feat(train.py): support torch profiling

* feat(train.py): optimize initialize_llm_profile

* feat(train.py): profiling with tp0 and dp0

* move sequence parallel context manager to evaluation func

* fix lint

* move the process for type_ids to load_new_batch

* fix lint

---------

Co-authored-by: yingtongxiong <974106207@qq.com>

* feat(ckpt): add auto ckpt load and signal quit (InternLM#216)

Co-authored-by: wangguoteng.p <wangguoteng925@qq.com>

* feat(memory_profiler): improve memory profiler (InternLM#217)

* Feat/overlap_bcast_forward (InternLM#218)

* feat/support bcast forward overlap

* feat/optimize the bcast call

* feat/optimize the bcast call

* feat/optimize the bcast call

* fix lint

* fix lint

* fix lint

* fix lint

* add torch.cuda.synchronize in save_checkpoint

---------

Co-authored-by: sunpeng <sunpengsdu@gmail.com>

* fix(*): move sequence_parallel to parallel config (InternLM#224)

* move sequence_parallel to parallel config

* set the sequence_parallel default value to False

* fix lint

* fix lint

* fix lint

* Feat/example training internlm (InternLM#212)

* feat(train/training_internlm.py): move common init funcs to internlm/train

* feat(train/training_internlm.py): update some public funcs

* feat(train/training_internlm.py): update some public funcs

* feat(evaluation.py): adapt evaluate to streaming dataset

* feat(train/training_internlm.py): minor update based on comments

* fix(training_internlm.py): set train dataloader persistent_workers true only when num_worker>0

* fix(training_internlm.py): fix demo error

* feat(data/utils.py): add new dataset type code for streaming dataset (InternLM#225)

* test(model): support fp32 with flash_attn (InternLM#223)

* support tf32 with flash

* move autocast to attention

* fix lint

* fix lint

* fix lint

* fix lint

* fix some bugs in model

* modify the convert dtype

* fix(pipeline): modify the sequence_parallel in pipeline (InternLM#227)

* move sequence_parallel to parallel config

* set the sequence_parallel default value to False

* fix lint

* fix lint

* fix lint

* modify the sequence_parallel in pp

* feat(init): add skip args check flag and add zero overlap flag (InternLM#222)

* feat(init): add skip args check flag

* fix(optim): add param overlap enable flag

* fix(ci): fix train error (InternLM#228)

Co-authored-by: huangting4201 <huangting3@sensetime.com>

* fix(writer): fix tensorboard resume bug (InternLM#229)

* fix(train.py): fix overflow grad norm error (InternLM#230)

* feat(ckpt): add train config into ckpt (InternLM#231)

---------

Co-authored-by: 黄婷 <huangting3@CN0014010744M.local>
Co-authored-by: Sun Peng <sunpengsdu@gmail.com>
Co-authored-by: ChenQiaoling00 <qiaoling_chen@u.nus.edu>
Co-authored-by: Kai Chen <chenkaidev@gmail.com>
Co-authored-by: Yang Gao <Gary1546308416AL@gmail.com>
Co-authored-by: Changjiang GOU <gouchangjiang@gmail.com>
Co-authored-by: gouhchangjiang <gouhchangjiang@gmail.com>
Co-authored-by: vansin <msnode@163.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: YWMditto <46778265+YWMditto@users.noreply.github.com>
Co-authored-by: YWMditto <862779238@qq.com>
Co-authored-by: WRH <12756472+wangruohui@users.noreply.github.com>
Co-authored-by: liukuikun <24622904+Harold-lkk@users.noreply.github.com>
Co-authored-by: x54-729 <45304952+x54-729@users.noreply.github.com>
Co-authored-by: Shuo Zhang <zhangshuolove@live.com>
Co-authored-by: Miao Zheng <76149310+MeowZheng@users.noreply.github.com>
Co-authored-by: huangting.p <huangting@sensetime.com>
Co-authored-by: ytxiong <45058324+yingtongxiong@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: kkscilife <126147887+kkscilife@users.noreply.github.com>
Co-authored-by: qa-caif-cicd <qa-caif-cicd@pjlab.org.cn>
Co-authored-by: hw <45089338+MorningForest@users.noreply.github.com>
Co-authored-by: yingtongxiong <974106207@qq.com>
Co-authored-by: cx <759046501@qq.com>
Co-authored-by: wangguoteng.p <wangguoteng925@qq.com>
Co-authored-by: huangting4201 <huangting3@sensetime.com>
Co-authored-by: Guoteng <32697156+SolenoidWGT@users.noreply.github.com>
Co-authored-by: lvhan028 <lvhan_028@163.com>
Co-authored-by: zachtzy <141206206+zachtzy@users.noreply.github.com>
Co-authored-by: Jaylin Lee <61487970+APX103@users.noreply.github.com>
Co-authored-by: del-zhenwu <dele.zhenwu@gmail.com>
Co-authored-by: Shaoyuan Xie <66255889+Daniel-xsy@users.noreply.github.com>
Co-authored-by: BigDong <yudongwang1226@gmail.com>
Co-authored-by: Wenwei Zhang <40779233+ZwwWayne@users.noreply.github.com>
1 parent e1cefae commit 54f85a6
Showing 59 changed files with 6,122 additions and 1,329 deletions.
3 changes: 1 addition & 2 deletions .github/workflows/demo_in_readme.yaml
@@ -1,5 +1,5 @@
name: demo-in-readme
on:
on:
pull_request:
branches:
- "main"
@@ -110,7 +110,6 @@ jobs:
srun -p ${SLURM_PARTITION} --job-name=${GITHUB_RUN_ID}-${GITHUB_JOB} --gpus-per-task=2 python ../ci_scripts/model/loaded_as_transformer.py
cd ..
rm -rf $GITHUB_WORKSPACE/hf_ckpt
load-chat-model-in-hf:
if: ${{ always() }}
needs: check-requirements
4 changes: 3 additions & 1 deletion .gitignore
@@ -115,6 +115,7 @@ venv.bak/
*.pkl
*.pkl.json
*.log.json
*.trace.json
docs/modelzoo_statistics.md
mmdet/.mim
work_dirs/
@@ -142,4 +143,5 @@ core.*

# Run
llm_ckpts
memory_trace
events.*
memory_trace
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -49,5 +49,5 @@ repos:
args:
[
'--rcfile=.pylintrc',
'--disable=C0114,C0415,W0212,W0235,W0238,W0621,C0103,R1735,C2801,E0402,C0412,W0719,R1728,W1514,W0718,W0105,W0707,C0209,W0703'
'--disable=C0114,C0415,W0212,W0235,W0238,W0621,C0103,R1735,C2801,E0402,C0412,W0719,R1728,W1514,W0718,W0105,W0707,C0209,W0703,W1203'
]
1 change: 1 addition & 0 deletions ci_scripts/train/ci_7B_sft.py
@@ -15,6 +15,7 @@
SAVE_CKPT_FOLDER = "local:llm_ckpts"
# LOAD_CKPT_FOLDER = "local:llm_ckpts/49"
ckpt = dict(
enable_save_ckpt=True,
# Path to save training ckpt.
save_ckpt_folder=SAVE_CKPT_FOLDER,
# Path to continue training ckpt (load model weights and scheduler/context states).
2 changes: 1 addition & 1 deletion ci_scripts/train/load_ckpt.sh
@@ -5,7 +5,7 @@ set -x
readonly CKPTS_PATH="$GITHUB_WORKSPACE/llm_ckpts"
readonly CKPTS40_PATH="$GITHUB_WORKSPACE/llm_ckpts/40"
readonly CKPTS40_OUTPUT="${CKPTS40_PATH}/*.pt"
expected_num=21
expected_num=22
exit_code=0

source ./ci_scripts/common/basic_func.sh
2 changes: 1 addition & 1 deletion ci_scripts/train/slurm_train.sh
@@ -5,7 +5,7 @@ set -x
readonly CKPTS_PATH="$GITHUB_WORKSPACE/llm_ckpts"
readonly CKPTS20_PATH="$GITHUB_WORKSPACE/llm_ckpts/20"
readonly CKPTS20_OUTPUT="${CKPTS20_PATH}/*.pt"
expected_num=21
expected_num=22
exit_code=0

source ./ci_scripts/common/basic_func.sh
2 changes: 1 addition & 1 deletion ci_scripts/train/torchrun.sh
@@ -5,7 +5,7 @@ set -x
readonly CKPTS_PATH="$GITHUB_WORKSPACE/llm_ckpts"
readonly CKPTS20_PATH="$GITHUB_WORKSPACE/llm_ckpts/20"
readonly CKPTS_OUTPUT="${CKPTS20_PATH}/*.pt"
expected_num=21
expected_num=22
exit_code=0

source ./ci_scripts/common/basic_func.sh
50 changes: 35 additions & 15 deletions configs/7B_sft.py
@@ -7,38 +7,51 @@
NUM_LAYER = 32
VOCAB_SIZE = 103168

MODEL_ONLY_FOLDER = "local:llm_ckpts/xxxx"
# Ckpt folder format:
# fs: 'local:/mnt/nfs/XXX'
# oss: 'boto3:s3://model_weights/XXX'
MODEL_ONLY_FOLDER = "local:llm_ckpts/xxxx"
SAVE_CKPT_FOLDER = "local:llm_ckpts"
LOAD_CKPT_FOLDER = "local:llm_ckpts/49"

# boto3 Ckpt folder format:
# import os
# BOTO3_IP = os.environ["BOTO3_IP"] # boto3 bucket endpoint
# SAVE_CKPT_FOLDER = f"boto3:s3://model_weights.{BOTO3_IP}/internlm"
# LOAD_CKPT_FOLDER = f"boto3:s3://model_weights.{BOTO3_IP}/internlm/snapshot/1/"
CHECKPOINT_EVERY = 50
ckpt = dict(
# Path to save training ckpt.
save_ckpt_folder=SAVE_CKPT_FOLDER,
# Path to continue training ckpt (load model weights and scheduler/context states).
# load_ckpt_folder=LOAD_CKPT_FOLDER,
# Path to initialize with given model weights.
# load_model_only_folder=MODEL_ONLY_FOLDER,
checkpoint_every=50,
    # Whether to load optimizer states when continuing training.
load_optimizer=True,
enable_save_ckpt=False, # enable ckpt save.
save_ckpt_folder=SAVE_CKPT_FOLDER, # Path to save training ckpt.
    # load_ckpt_folder=LOAD_CKPT_FOLDER, # Ckpt path to resume training (load weights and scheduler/context states).
# load_model_only_folder=MODEL_ONLY_FOLDER, # Path to initialize with given model weights.
    load_optimizer=True, # Whether to load optimizer states when continuing training.
checkpoint_every=CHECKPOINT_EVERY,
    async_upload=True, # async ckpt upload (only works for boto3 ckpt).
    async_upload_tmp_folder="/dev/shm/internlm_tmp_ckpt/", # path for temporary files during asynchronous upload.
    snapshot_ckpt_folder="/".join([SAVE_CKPT_FOLDER, "snapshot"]), # snapshot ckpt storage directory.
oss_snapshot_freq=int(CHECKPOINT_EVERY / 2), # snapshot ckpt save frequency.
)
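A hedged sketch (not InternLM internals) of how the settings above interact: with `checkpoint_every=CHECKPOINT_EVERY` and `oss_snapshot_freq=CHECKPOINT_EVERY / 2`, snapshots land halfway between full checkpoints.

```python
# Sketch under the assumption that both counters are tested per step,
# with the full checkpoint taking precedence when the two coincide.
CHECKPOINT_EVERY = 50
oss_snapshot_freq = CHECKPOINT_EVERY // 2  # 25

def ckpt_action(step: int) -> str:
    """Return which checkpointing action (if any) a step would trigger."""
    if step % CHECKPOINT_EVERY == 0:
        return "full checkpoint"
    if step % oss_snapshot_freq == 0:
        return "snapshot"
    return "none"

# Steps 1..100 that trigger any action:
print([s for s in range(1, 101) if ckpt_action(s) != "none"])  # [25, 50, 75, 100]
```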

TRAIN_FOLDER = "/path/to/dataset"
VALID_FOLDER = "/path/to/dataset"
data = dict(
seq_len=SEQ_LEN,
    # micro_num is the number of micro-batches in one gradient update
micro_num=4,
# packed_length = micro_bsz * SEQ_LEN
micro_bsz=2,
# defaults to the value of micro_num
valid_micro_num=4,
    # defaults to 0, which disables evaluation
valid_every=50,
pack_sample_into_one=False,
total_steps=50000,
skip_batches="",
rampup_batch_size="",
    # Datasets with fewer than 50 rows will be discarded
min_length=50,
# train_folder=TRAIN_FOLDER,
# valid_folder=VALID_FOLDER,
)
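The batch-size fields above compose as the comments describe; a small sketch (the `dp_world_size` value is a hypothetical example, not part of this config) of the resulting token counts:

```python
# packed_length is the token count of one micro-batch; one optimizer step
# consumes micro_num micro-batches on each data-parallel rank.
SEQ_LEN = 2048
micro_bsz = 2
micro_num = 4
dp_world_size = 8  # hypothetical data-parallel size

packed_length = micro_bsz * SEQ_LEN                      # 4096 tokens per micro-batch
tokens_per_step = packed_length * micro_num * dp_world_size

print(packed_length)    # 4096
print(tokens_per_step)  # 131072
```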

grad_scaler = dict(
@@ -62,7 +75,8 @@

hybrid_zero_optimizer = dict(
# Enable low_level_optimizer overlap_communication
zero_overlap_communication=True,
overlap_sync_grad=True,
overlap_sync_param=True,
# bucket size for nccl communication params
reduce_bucket_size=512 * 1024 * 1024,
# grad clipping
@@ -107,9 +121,11 @@
num_layers=NUM_LAYER,
mlp_ratio=MLP_RATIO,
apply_post_layer_norm=False,
dtype="torch.bfloat16",
dtype="torch.float16", # Support: "torch.float16", "torch.half", "torch.bfloat16", "torch.float32", "torch.tf32"
norm_type="rmsnorm",
layer_norm_epsilon=1e-5,
use_flash_attn=True,
num_chunks=1, # if num_chunks > 1, interleaved pipeline scheduler is used.
)
"""
zero1 parallel:
@@ -118,11 +134,15 @@
2. if zero1 == 1, zero is not used, and all dp groups retain the full amount of model parameters.
3. zero1 > 1 and zero1 <= dp world size, the world size of zero is a subset of dp world size.
For smaller models, it is usually a better choice to split the parameters within nodes with a setting <= 8.
pipeline parallel: pipeline parallel size, only 1 is accepted currently.
tensor parallel: tensor parallel size, usually the number of GPUs per node, only 1 is accepted currently.
pipeline parallel (dict):
1. size: int, the size of pipeline parallel.
2. interleaved_overlap: bool, enable/disable communication overlap when using interleaved pipeline scheduler.
tensor parallel: tensor parallel size, usually the number of GPUs per node.
"""
parallel = dict(
zero1=8,
pipeline=dict(size=1, interleaved_overlap=True),
sequence_parallel=False,
)
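The three `zero1` rules in the docstring above can be sketched as a small helper (a reading of the documented semantics, not InternLM's internal implementation; the example world sizes are hypothetical):

```python
def resolve_zero1_size(zero1: int, dp_world_size: int) -> int:
    """Map the `zero1` config value to the effective ZeRO-1 group size."""
    if zero1 <= 0:
        # Optimizer states are split across the whole data-parallel group.
        return dp_world_size
    if zero1 == 1:
        # ZeRO is not used; every dp rank keeps full optimizer states.
        return 1
    # Otherwise zero1 must evenly subdivide the data-parallel group.
    assert zero1 <= dp_world_size and dp_world_size % zero1 == 0
    return zero1

print(resolve_zero1_size(-1, 16))  # 16
print(resolve_zero1_size(8, 16))   # 8
```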

cudnn_deterministic = False
2 changes: 1 addition & 1 deletion doc/en/usage.md
@@ -174,7 +174,7 @@ parallel = dict(
- When `size <= 0`, the size of the zero1 process group is equal to the size of the data parallel process group, so the optimizer state parameters will be split within the data parallel range.
- When `size == 1`, zero1 is not used, and all data parallel groups retain the complete optimizer state parameters.
- When `size > 1` and `size <= data_parallel_world_size`, the zero1 process group is a subset of the data parallel process group.
- pipeline: pipeline parallel size, currently only supports 1, default value is 1
- pipeline: pipeline parallel size, default value is 1
- tensor: tensor parallel size, usually the number of GPUs per node, default value is 1

Note: `Data parallel size = Total number of GPUs / Pipeline parallel size / Tensor parallel size`
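The note's formula can be sketched directly (the GPU counts below are hypothetical examples):

```python
def data_parallel_size(total_gpus: int, pipeline: int, tensor: int) -> int:
    """Data parallel size = Total number of GPUs / pipeline size / tensor size."""
    assert total_gpus % (pipeline * tensor) == 0, "parallel sizes must divide the GPU count"
    return total_gpus // (pipeline * tensor)

print(data_parallel_size(32, 1, 8))  # 4
```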
2 changes: 1 addition & 1 deletion doc/usage.md
@@ -159,7 +159,7 @@ parallel = dict(
- If `size <= 0`, the size of the zero1 process group equals the size of the data parallel process group, so the optimizer state parameters are split within the data parallel range
- If `size == 1`, zero1 is not used, and all data parallel groups retain the complete optimizer state parameters
- If `size > 1` and `size <= data_parallel_world_size`, the zero1 process group is a subset of the data parallel process group
- pipeline: pipeline parallel size, currently only supports 1, default value is 1
- pipeline: pipeline parallel size, default value is 1
- tensor: tensor parallel size, usually the number of GPUs per node, default value is 1

Note: `Data parallel size = Total number of GPUs / Pipeline parallel size / Tensor parallel size`
32 changes: 32 additions & 0 deletions internlm/core/communication/__init__.py
@@ -0,0 +1,32 @@
from .p2p import (
AsynCommunicator,
recv_backward,
recv_forward,
send_backward,
send_backward_and_recv_next_backward_async,
send_backward_recv_backward,
send_backward_recv_forward,
send_forward,
send_forward_and_recv_next_forward_async,
send_forward_backward_recv_forward_backward,
send_forward_recv_backward,
send_forward_recv_forward,
)
from .utils import recv_obj_meta, send_obj_meta

__all__ = [
"send_forward",
"send_forward_recv_forward",
"send_forward_backward_recv_forward_backward",
"send_backward",
"send_backward_recv_backward",
"send_backward_recv_forward",
"send_forward_recv_backward",
"recv_backward",
"recv_forward",
"send_obj_meta",
"recv_obj_meta",
"send_backward_and_recv_next_backward_async",
"send_forward_and_recv_next_forward_async",
"AsynCommunicator",
]

0 comments on commit 54f85a6

Please sign in to comment.