Question about start token? #271
Hi @FYYFU, … Alternatively, you can always generate separately using …
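For context, here is a minimal sketch of the "generate separately" route. The model name and generation settings are only illustrative, and it assumes the `generated_texts` argument of `attribute()` accepts a pre-generated continuation, and that the Inseq wrapper exposes the underlying `model.model`, `model.tokenizer`, and `model.device`:

```python
import inseq

# Example model only; any decoder-only model supported by Inseq should behave similarly.
model = inseq.load_model("gpt2", "saliency")

prompt = "Develop an algorithm that"

# Generate the continuation separately with the wrapped Hugging Face model/tokenizer.
enc = model.tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.model.generate(**enc, max_new_tokens=10, do_sample=False)
full_text = model.tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Then attribute the pre-generated text; for decoder-only models
# `generated_texts` must start with the prompt itself.
out = model.attribute(input_texts=prompt, generated_texts=full_text)
out.show()
```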
Thanks for your reply! I tried adding the start token manually:

```python
input_text1 = '<s>Develop an algorithm that...'

for text in [input_text1]:
    out = model.attribute(
        input_texts=text,
        method="saliency",
        generation_args={
            "do_sample": True,
            "max_new_tokens": 10,
            "temperature": 0.7,
            "repetition_penalty": 1.0,
            "skip_special_tokens": False,
        },
    ).show()
```

As long as the start token is the first token of the input text, we get the error.
The start token should be added automatically if it is used by the model. Have you tried removing it from the text and checking whether it is still present in the output that is shown?
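One quick way to check this, independently of Inseq, is to inspect the tokenizer directly. A sketch with a plain Hugging Face tokenizer; the model name is just an example of a BOS-using model, and the printed tokens are indicative:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
text = "Develop an algorithm that..."

# Default encoding: the tokenizer prepends <s> (BOS) on its own.
print(tok.convert_ids_to_tokens(tok(text).input_ids[:2]))
# something like: ['<s>', '▁Develop']

# With special tokens disabled, no BOS is inserted.
print(tok.convert_ids_to_tokens(tok(text, add_special_tokens=False).input_ids[:2]))
# something like: ['▁Develop', '▁an']
```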
Including the `<s>` token in the prompt:

```python
import inseq
import torch

model_name = "Qwen/Qwen1.5-0.5B-Chat"
access_token = None
attr_type = "saliency"

model = inseq.load_model(
    model_name,
    attr_type,
)

prompt_message = "Summarise this article: If you're famous and performing the American national anthem, be prepared to become a national hero or a national disgrace."
messages = [{"role": "user", "content": prompt_message}]
prompt = model.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt = "<s>" + prompt

out = model.attribute(
    prompt,
    generation_args={"max_new_tokens": 64, "do_sample": False, "skip_special_tokens": False},
)
out.show()
```

But yes, in principle, we'll have to take a second look at how special tokens are handled in various cases (e.g. ignoring pad when doing batched attribution, but not ignoring special tokens in the prompt). In particular, having a flag in …
Yes, I also tried your code but changed the model to …, and the problem still lies in the check on the BOS token. Anyway, thanks for your reply and your awesome open-source tool.
Update: a new …
Question
How to change the default setting of adding special tokens?
Additional context
The error message:
AssertionError: Forced generations with decoder-only models must start with the input texts.
I noticed that during attribution the created tokenizer is used to encode the input_text, and add_special_tokens=True is the default setting, which cannot be changed. The encoded input then becomes <s><s>[INST]..., so I wonder whether this is a bug or something else.
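To make the doubled BOS concrete, here is a small sketch of how it can arise when a chat-template prompt (which already starts with `<s>`) is re-encoded with the default `add_special_tokens=True`. This uses Hugging Face only, no Inseq; the model name is just an example of a chat model whose template prepends the BOS token:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [{"role": "user", "content": "Develop an algorithm that..."}]
# tokenize=False returns a string that already begins with "<s>[INST] ..."
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Re-encoding that string with add_special_tokens=True prepends a second <s>.
ids = tok(prompt).input_ids
print(tok.convert_ids_to_tokens(ids)[:2])
# expected: ['<s>', '<s>']  -- the doubled BOS described above
```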
Checklist
- I have searched the existing issues.