IndexError: piece id is out of range. #4
Comments
@lv199882 Hi, you can refer to:
Thank you very much.
Thank you very much for your answer. I'd like to ask another question: since I couldn't find the original LLaMA model online, I used llama2-7b. After running lora_finetune_wn11.py, the following error occurs when I call lora_infer_wn11.py:
I ran into the same problem when fine-tuning llama2-7b on the WN11 dataset with LoRA; the error is as follows. Have you managed to solve it?
Hello, when I run p-tuning it works with the YAGO data, but it throws an error with my own data (only 61 examples so far). I'm using the chatglm-6b model.
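Since the reported error appears to be the same "piece id is out of range" IndexError as in the title, and SentencePiece only raises it when asked for an id outside its vocabulary, one sanity check is to make sure lora_infer_wn11.py loads the tokenizer from the same llama2-7b base checkpoint that lora_finetune_wn11.py was run against. Below is a minimal LoRA-inference sketch with PEFT, not the repo's actual script; the paths, prompt, and generation arguments are placeholders.

```python
# Minimal LoRA-inference sketch (placeholder paths, not the repo's lora_infer_wn11.py).
# The tokenizer must come from the same base checkpoint used during fine-tuning,
# otherwise decode() can be asked for piece ids the SentencePiece model does not have.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "meta-llama/Llama-2-7b-hf"    # assumed base checkpoint
adapter_path = "./lora_wn11_output"       # assumed LoRA adapter directory

tokenizer = AutoTokenizer.from_pretrained(base_path)
model = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, adapter_path)  # attach the LoRA weights
model.eval()

inputs = tokenizer("Is it true that plant is a hypernym of tree?", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```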
Running tokenizer on train dataset: 100%|██████████████████████████████████████| 61/61 [00:00<00:00, 2185.47 examples/s]
input_ids [5, 107883, 102011, 64744, 73948, 63826, 102011, 65407, 65267, 64379, 31, 71492, 63859, 65845, 63984, 64121, 66740, 12, 91831, 85, 65853, 85, 64174, 7, 150001, 150004, 5, 91831, 150005, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
Traceback (most recent call last):
File "/home/yuan/kg-llm-main-KGC/ptuning_main.py", line 399, in
main()
File "/home/yuan/kg-llm-main-KGC/ptuning_main.py", line 226, in main
print_dataset_example(train_dataset[0])
File "/home/yuan/kg-llm-main-KGC/ptuning_main.py", line 206, in print_dataset_example
print("inputs", tokenizer.decode(example["input_ids"]))
File "/home/yuan/.conda/envs/kgc1/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3476, in decode
return self._decode(
File "/home/yuan/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 285, in _decode
return super()._decode(token_ids, **kwargs)
File "/home/yuan/.conda/envs/kgc1/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 931, in _decode
filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
File "/home/yuan/.conda/envs/kgc1/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 912, in convert_ids_to_tokens
tokens.append(self._convert_id_to_token(index))
File "/home/yuan/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 293, in _convert_id_to_token
return self.sp_tokenizer[index]
File "/home/yuan/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 157, in getitem
return self.text_tokenizer.convert_id_to_token(x - self.num_image_tokens)
File "/home/yuan/.cache/huggingface/modules/transformers_modules/chatglm-6b/tokenization_chatglm.py", line 44, in convert_id_to_token
return self.sp.IdToPiece(idx)
File "/home/yuan/.conda/envs/kgc1/lib/python3.10/site-packages/sentencepiece/init.py", line 1045, in _batched_func
return _func(self, arg)
File "/home/yuan/.conda/envs/kgc1/lib/python3.10/site-packages/sentencepiece/init.py", line 1038, in _func
raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.
What could be the cause? Thanks.
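For reference, the IndexError is raised by SentencePiece's IdToPiece, which means at least one value in input_ids is outside the range the loaded chatglm-6b tokenizer can map back to a piece. A small diagnostic sketch (the checkpoint path is a placeholder) that decodes each id individually to locate the offending value:

```python
# Diagnostic sketch: decode the printed input_ids one at a time to find which
# id SentencePiece cannot map back to a piece. Checkpoint path is a placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# Paste the input_ids printed by print_dataset_example here.
input_ids = [5, 107883, 102011, 64744, 73948, 150001, 150004, 3, 3]

for idx in input_ids:
    try:
        print(idx, tokenizer.convert_ids_to_tokens(idx))
    except IndexError:
        print(f"id {idx} is out of range for this tokenizer (len={len(tokenizer)})")
```

Comparing the offending id with the vocabulary of the checkpoint you actually load usually narrows the problem down to a tokenizer/checkpoint mismatch, or to ids that were produced with a different tokenizer during preprocessing.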