IndexError: piece id is out of range. #4
Comments
@lv199882 Hi, you can refer to:
Thank you very much.
Thank you very much for your answer. I'd like to ask another question: since I couldn't find the original LLaMA model online, I used llama2-7b. After running lora_finetune_wn11.py, the following error occurs when I call lora_infer_wn11.py:
I ran into the same problem when fine-tuning llama2-7b on the WN11 dataset with LoRA; the error is as follows. Have you managed to solve it?
Hello, when I run p-tuning it works with the YAGO data, but it throws an error with my own data (only 61 examples so far). I'm using the chatglm-6b model.
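Since the reported error appears to be the same "piece id is out of range" IndexError as in the title, and SentencePiece only raises it when asked for an id outside its vocabulary, one sanity check is to make sure lora_infer_wn11.py loads the tokenizer from the same llama2-7b base checkpoint that lora_finetune_wn11.py was run against. Below is a minimal LoRA-inference sketch with PEFT, not the repo's actual script; the paths, prompt, and generation arguments are placeholders.

```python
# Minimal LoRA-inference sketch (placeholder paths, not the repo's lora_infer_wn11.py).
# The tokenizer must come from the same base checkpoint used during fine-tuning,
# otherwise decode() can be asked for piece ids the SentencePiece model does not have.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "meta-llama/Llama-2-7b-hf"    # assumed base checkpoint
adapter_path = "./lora_wn11_output"       # assumed LoRA adapter directory

tokenizer = AutoTokenizer.from_pretrained(base_path)
model = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, adapter_path)  # attach the LoRA weights
model.eval()

inputs = tokenizer("Is it true that plant is a hypernym of tree?", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```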
Running tokenizer on train dataset: 100%|██████████████████████████████████████| 61/61 [00:00<00:00, 2185.47 examples/s]
input_ids [5, 107883, 102011, 64744, 73948, 63826, 102011, 65407, 65267, 64379, 31, 71492, 63859, 65845, 63984, 64121, 66740, 12, 91831, 85, 65853, 85, 64174, 7, 150001, 150004, 5, 91831, 150005, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
Traceback (most recent call last):
File "/home/yuan/kg-llm-main-KGC/ptuning_main.py", line 399, in
main()
File "/home/yuan/kg-llm-main-KGC/ptuning_main.py", line 226, in main
print_dataset_example(train_dataset[0])
File "/home/yuan/kg-llm-main-KGC/ptuning_main.py", line 206, in print_dataset_example
print("inputs", tokenizer.decode(example["input_ids"]))
File "/home/yuan/.conda/envs/kgc1/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3476, in decode
return self._decode(
File "/home/yuan/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 285, in _decode
return super()._decode(token_ids, **kwargs)
File "/home/yuan/.conda/envs/kgc1/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 931, in _decode
filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
File "/home/yuan/.conda/envs/kgc1/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 912, in convert_ids_to_tokens
tokens.append(self._convert_id_to_token(index))
File "/home/yuan/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 293, in _convert_id_to_token
return self.sp_tokenizer[index]
File "/home/yuan/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 157, in getitem
return self.text_tokenizer.convert_id_to_token(x - self.num_image_tokens)
File "/home/yuan/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 44, in convert_id_to_token
return self.sp.IdToPiece(idx)
File "/home/yuan/.conda/envs/kgc1/lib/python3.10/site-packages/sentencepiece/init.py", line 1045, in _batched_func
return _func(self, arg)
File "/home/yuan/.conda/envs/kgc1/lib/python3.10/site-packages/sentencepiece/init.py", line 1038, in _func
raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.
What could be the cause? Thanks.
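For reference, the IndexError is raised by SentencePiece's IdToPiece, which means at least one value in input_ids is outside the range the loaded chatglm-6b tokenizer can map back to a piece. A small diagnostic sketch (the checkpoint path is a placeholder) that decodes each id individually to locate the offending value:

```python
# Diagnostic sketch: decode the printed input_ids one at a time to find which
# id SentencePiece cannot map back to a piece. Checkpoint path is a placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# Paste the input_ids printed by print_dataset_example here.
input_ids = [5, 107883, 102011, 64744, 73948, 150001, 150004, 3, 3]

for idx in input_ids:
    try:
        print(idx, tokenizer.convert_ids_to_tokens(idx))
    except IndexError:
        print(f"id {idx} is out of range for this tokenizer (len={len(tokenizer)})")
```

Comparing the offending id with the vocabulary of the checkpoint you actually load usually narrows the problem down to a tokenizer/checkpoint mismatch, or to ids that were produced with a different tokenizer during preprocessing.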