
FT Vs. FT-L key hyperparameters #430

Closed
asimokby opened this issue Nov 19, 2024 · 5 comments
Labels
question (Further information is requested)

Comments

@asimokby

I am trying to replicate the results in this paper: https://arxiv.org/pdf/2402.11905#page=5.21

They differentiate between FT and FT-L. What do I need to change in hparams to try both? Is setting norm_constraint: false enough to run FT? What are the key hyperparameters to change to run FT versus FT-L? Thanks!

alg_name: "FT"
model_name: "openai-community/gpt2"
device: 0

layers: [0]
num_steps: 25
batch_size: 3
max_length: 40
lr: 5e-4
weight_decay: 0
kl_factor: 0
# norm_constraint: 5e-4
norm_constraint: false
rewrite_module_tmp: "transformer.h.{}.mlp.c_proj"
layer_module_tmp: "transformer.h.{}"
mlp_module_tmp: "transformer.h.{}.mlp"
attn_module_tmp: "transformer.h.{}.attn"
ln_f_module: "transformer.ln_f"
lm_head_module: "transformer.wte"
model_parallel: false
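
For reference, my rough understanding of what norm_constraint controls, as a sketch (this mirrors the usual ROME-style clamp, not the exact EasyEdit code; apply_norm_constraint is a name I made up):

import torch

# Sketch: after each optimizer step, FT-L projects the tuned weights back
# into an L-infinity ball of radius eps around their pre-edit values.
# With norm_constraint: false this step is skipped, so the selected module
# is fine-tuned without any constraint.
def apply_norm_constraint(weights, weights_init, eps):
    # weights: dict of the parameter tensors being tuned (e.g. the
    # transformer.h.0.mlp.c_proj weight); weights_init: copies saved
    # before editing; eps: the norm_constraint value (e.g. 5e-4)
    with torch.no_grad():
        for name, w in weights.items():
            w.clamp_(min=weights_init[name] - eps, max=weights_init[name] + eps)

Is that the right mental model, and is toggling this flag the main difference between the two methods?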
@littlefive5
Collaborator

Our "FT" hparams is used for FT-L training, which is just tune the MLP in one layer. Our repo does not support the whole-parameters tuning, if you want to conduct the FT in their paper to tune the whole parameters, you can just follow the original Huggingface training scripts and you can use our code to evaluate.

@zxlzr
Contributor

zxlzr commented Nov 20, 2024

Thank you for your interest in EasyEdit! EasyEdit is continuously maintained and updated. If you achieve better results (which might happen with certain methods), it could be due to updates in Python library versions or optimizations in the EasyEdit module. If you have any questions, feel free to reach out at any time!

zxlzr added the question (Further information is requested) label Nov 20, 2024
@zxlzr
Contributor

zxlzr commented Nov 20, 2024


Dear asimokby,

We would like to inform you that the KnowEdit results have been updated due to updates and bug fixes in EasyEdit (details in #427). In summary: the results for AdaLoRA, ROME, and MEMIT have improved, FT-L shows a slight decline, and the results for the other methods are unchanged. We recommend referring to the updated results for reproduction.

[Image: table of updated KnowEdit results]

We will also notify researchers using EasyEdit and ensure that the community conducts experiments on fair and comparable datasets, guaranteeing the reproducibility of results.

We sincerely apologize for any issues caused by the updates.

EasyEdit Team

@zxlzr
Contributor

zxlzr commented Nov 20, 2024

Hi buddy, do you have any further questions?

zxlzr closed this as completed Nov 21, 2024
@asimokby
Author

Thank you for your answer! I don't have further questions for now.
