
FT Vs. FT-L key hyperparameters #430

Closed
asimokby opened this issue Nov 19, 2024 · 5 comments
Labels
question (Further information is requested)

Comments

@asimokby

I am trying to replicate the results in this paper: https://arxiv.org/pdf/2402.11905#page=5.21

They differentiate between FT and FT-L. What do I need to change in hparams to try both? Is setting norm_constraint: false enough to run FT? What are the key hyperparameters to change to run FT versus FT-L? Thanks!

alg_name: "FT"
model_name: "openai-community/gpt2"
device: 0

layers: [0]
num_steps: 25
batch_size: 3
max_length: 40
lr: 5e-4
weight_decay: 0
kl_factor: 0
# norm_constraint: 5e-4
norm_constraint: false
rewrite_module_tmp: "transformer.h.{}.mlp.c_proj"
layer_module_tmp: "transformer.h.{}"
mlp_module_tmp: "transformer.h.{}.mlp"
attn_module_tmp: "transformer.h.{}.attn"
ln_f_module: "transformer.ln_f"
lm_head_module: "transformer.wte"
model_parallel: false
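
For reference, my rough understanding of what norm_constraint controls, as a sketch (this mirrors the usual ROME-style clamp, not the exact EasyEdit code; apply_norm_constraint is a name I made up):

import torch

# Sketch: after each optimizer step, FT-L projects the tuned weights back
# into an L-infinity ball of radius eps around their pre-edit values.
# With norm_constraint: false this step is skipped, so the selected module
# is fine-tuned without any constraint.
def apply_norm_constraint(weights, weights_init, eps):
    # weights: dict of the parameter tensors being tuned (e.g. the
    # transformer.h.0.mlp.c_proj weight); weights_init: copies saved
    # before editing; eps: the norm_constraint value (e.g. 5e-4)
    with torch.no_grad():
        for name, w in weights.items():
            w.clamp_(min=weights_init[name] - eps, max=weights_init[name] + eps)

Is that the right mental model, and is toggling this flag the main difference between the two methods?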
@littlefive5
Collaborator

Our "FT" hparams is used for FT-L training, which is just tune the MLP in one layer. Our repo does not support the whole-parameters tuning, if you want to conduct the FT in their paper to tune the whole parameters, you can just follow the original Huggingface training scripts and you can use our code to evaluate.

@zxlzr
Contributor

zxlzr commented Nov 20, 2024

Thank you for your interest in EasyEdit! EasyEdit is continuously maintained and updated. If you achieve better results (which might happen with certain methods), it could be due to updates in Python library versions or optimizations in the EasyEdit module. If you have any questions, feel free to reach out at any time!

zxlzr added the question (Further information is requested) label Nov 20, 2024
@zxlzr
Contributor

zxlzr commented Nov 20, 2024


Dear asimokby,

We would like to inform you that the KnowEdit results have been updated due to updates and bug fixes in EasyEdit (details in #427). In summary: the results for AdaLoRA, ROME, and MEMIT have improved, FT-L shows a slight decline, and the results for the other methods are unchanged. We recommend referring to the updated results for reproduction.

[Image: table of updated KnowEdit results]

We will also notify researchers using EasyEdit and ensure that the community conducts experiments on fair and comparable datasets, guaranteeing the reproducibility of results.

We sincerely apologize for any issues caused by the updates.

EasyEdit Team

@zxlzr
Contributor

zxlzr commented Nov 20, 2024

Hi buddy, do you have any further questions?

zxlzr closed this as completed Nov 21, 2024
@asimokby
Author

Thank you for your answer! I don't have further questions for now.
