[Fix] InfoVQA, WandB logging, CLI problems. (EvolvingLMMs-Lab#31)
* Remove unused code and configuration file

* Remove docvqa.yaml and update vizwizvqa.yaml

* lint

* Add dataset_kwargs to vizwizvqa.yaml

* Add dataset_kwargs to vizwizvqa.yaml

* textvqa (EvolvingLMMs-Lab#27)

* Update textvqa.yaml and utils.py

* Fix YAML formatting in textvqa.yaml and remove unused files

* remove useless metric

* add textvqa val & test

* Update progress bar description in evaluator.py

* Update submission file names in VizWizVQA tasks

* Update output path to include log samples suffix

* Update submission file paths in OKVQA and VizWizVQA tasks

* Refactor llava-in-the-wild.yaml and utils.py

* Update metric for llava evaluation

* Refactor logging message in Task class

* Merge commit '5553d106e5ffd84b280b3d5a3c8d47c35e2d310b'

* Fix formatting issues and add progress bar closing statements

* Update task from "infovqa_val" to "infovqa_test" in infovqa_test.yaml

* Update tqdm progress bar in OtterHD model

* Squashed commit of the following:

commit eae210c3700a59b7d5cc9de46fcb855f443096aa
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Sun Jan 28 09:46:19 2024 +0800

    Black lint

commit 18e4a19e82357352ab25df77b5ae4f1b011d61ae
Merge: ab898e4 fb209e4
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Sun Jan 28 09:45:31 2024 +0800

    Merge branch 'main' into kc/list_tasks_num

commit e899be48f55f95172fdf96bd2a98d3b91ff2aaed
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Sun Jan 28 09:44:23 2024 +0800

    Enable list all tasks num

commit a999fc6889c6986c28ec5d95460a4ab5233e5d4f
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Sun Jan 28 09:41:32 2024 +0800

    Exclude train yaml file in the task list

commit 5553d10
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Sun Jan 28 02:04:57 2024 +0800

    Add InfoVQA, DocVQA, and QwenVL (EvolvingLMMs-Lab#28)

    * add mme

    * black

    * add model specific prompt and gen kwargs

    * black

    * add yaml config to support multi-model eval

    * print table at the end

    * refactor multi model code

    * add chartqa

    * black

    * add ai2d

    * black

    * update chartqa

    * black

    * update ai2d dataset

    * black

    * add qwenvl

    * add infovqa and docvqa

* Fix error handling in loading YAML config files

* Squashed commit of the following:

commit fdb0c6785b0c5d6979d10e7ddf75ce9055038db8
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Sun Jan 28 12:41:40 2024 +0800

    Fix key bugs


* List task #num sorted

* Update prompt messages for image-related tasks

* Delete unused task configuration files

* Remove coco_train.yaml configuration file

* Update task name in mmmu.yaml

* Fix error message for missing tasks

* Add wandb import and integration

* Update generation kwargs for LMMS tasks

* Update lmms_eval MME task configuration and utils

* Update generation_kwargs in lmms_eval tasks

* Update doc_to_text function in coco and okvqa tasks

* Add COCO 2017 version

* Update task name in coco_test2017.yaml

* Squashed commit of the following:

commit 0fd4558
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Mon Jan 29 22:41:33 2024 +0800

    Add/mmmu test (EvolvingLMMs-Lab#30)

    * mmmu_test

    * black

commit f125889
Author: Li Bo <drluodian@gmail.com>
Date:   Sun Jan 28 22:19:13 2024 +0800

    [Dataset Check] dataset check and add wandb logging (EvolvingLMMs-Lab#29)


    Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>
    Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com>

* Refactor CLI evaluate function and improve error logging

---------

Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>
Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com>
3 people authored Jan 30, 2024
1 parent 0fd4558 commit 12675c7
Showing 27 changed files with 146 additions and 64 deletions.
17 changes: 16 additions & 1 deletion lmms_eval/__main__.py
@@ -123,7 +123,7 @@ def parse_eval_args() -> argparse.Namespace:
return args


def cli_evaluate(args: Union[argparse.Namespace, None], wandb_run) -> None:
def cli_evaluate(args: Union[argparse.Namespace, None] = None, wandb_run=None) -> None:
if args is None:
args = parse_eval_args()

@@ -292,10 +292,22 @@ def print_results(args, results):

# initialize Accelerator
accelerator = Accelerator()
all_args_dict = vars(args)

if accelerator.is_main_process:
# initialize a W&B run only on rank 0
wandb_args_dict = utils.simple_parse_args_string(args.wandb_args)
if "name" not in wandb_args_dict:
if "config" not in all_args_dict:
# use the model name and task names as run name
task_names = args.tasks.replace(",", "_")
wandb_args_dict["name"] = f"{args.model}_{task_names}_{args.log_samples_suffix}"
if args.num_fewshot:
wandb_args_dict["name"] += f"_{args.num_fewshot}shot"
else:
# use the name of the config file as run name
wandb_args_dict["name"] = all_args_dict["config"].split("/")[-1].split(".")[0]

wandb_run = wandb.init(**wandb_args_dict)
is_main_process = True
else:
@@ -307,3 +319,6 @@ def print_results(args, results):
for args in args_list:
results = cli_evaluate(args, wandb_run)
results_list.append(results)

if is_main_process:
wandb_run.finish()
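The run-name derivation added above can be sketched standalone. This is a minimal illustration of the logic, not lmms_eval's API: `derive_run_name` is a hypothetical helper, and a plain dict stands in for the argparse namespace.

```python
def derive_run_name(args: dict) -> str:
    """Build a W&B run name the way the diff above does (sketch).

    `args` is a plain dict standing in for vars(args); the helper
    name is hypothetical and not part of lmms_eval.
    """
    if args.get("config"):
        # a config file was given: use its basename, minus the extension
        return args["config"].split("/")[-1].split(".")[0]
    # otherwise combine model, task names, and the log-samples suffix
    task_names = args["tasks"].replace(",", "_")
    name = f"{args['model']}_{task_names}_{args['log_samples_suffix']}"
    if args.get("num_fewshot"):
        name += f"_{args['num_fewshot']}shot"
    return name
```

The resulting string is what would be passed as `name` to `wandb.init` when the user did not set one explicitly in `--wandb_args`.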
10 changes: 6 additions & 4 deletions lmms_eval/models/llava.py
@@ -258,7 +258,7 @@ def _collate(x):
if "image_aspect_ratio" in gen_kwargs.keys() and "image_aspect_ratio" not in self._config.__dict__:
# here we should pop it out of gen_kwargs so that it doesn't get passed to the model for next step of generation
self._config.image_aspect_ratio = gen_kwargs.pop("image_aspect_ratio")

eval_logger.info(f"Setting image aspect ratio: {self._config.image_aspect_ratio}")
# encode, pad, and truncate contexts for this batch
if visuals:
image_tensor = process_images(visuals, self._image_processor, self._config)
@@ -289,7 +289,7 @@ def _collate(x):
input_ids = tokenizer_image_token(prompt, self.tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0).to(self.device)

# preconfigure gen_kwargs with defaults
gen_kwargs["image_sizes"] = [visuals[0].size]
gen_kwargs["image_sizes"] = [visuals[idx].size for idx in range(len(visuals))]
if "max_new_tokens" not in gen_kwargs:
gen_kwargs["max_new_tokens"] = 1024
if "temperature" not in gen_kwargs:
@@ -318,9 +318,11 @@ def _collate(x):
use_cache=self.use_cache,
)
except Exception as e:
print("Error in generating")
eval_logger.error(f"Error {e} in generating")
cont = ""
raise e
eval_logger.error(prompt)
eval_logger.error(visuals)
eval_logger.error(prompts_input)

cont_toks_list = cont.tolist()
for cont_toks, context in zip(cont_toks_list, contexts):
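The `image_sizes` change above replaces a single-image assumption with one size per visual. A minimal sketch of the before/after behavior, using stand-in objects rather than real PIL images:

```python
from collections import namedtuple

# stand-in for PIL.Image, which exposes a (width, height) `.size` tuple
FakeImage = namedtuple("FakeImage", ["size"])

def collect_image_sizes(visuals):
    # before the fix: [visuals[0].size] -- only the first image's size,
    # which is wrong whenever a request carries multiple images.
    # after the fix: one size per visual, in order.
    return [visuals[idx].size for idx in range(len(visuals))]

visuals = [FakeImage((640, 480)), FakeImage((336, 336))]
sizes = collect_image_sizes(visuals)  # [(640, 480), (336, 336)]
```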
4 changes: 4 additions & 0 deletions lmms_eval/tasks/coco/coco2017.yaml
@@ -0,0 +1,4 @@
group : coco2017
task:
- coco_val2017
- coco_test2017
6 changes: 2 additions & 4 deletions lmms_eval/tasks/coco/coco_test.yaml
@@ -6,12 +6,10 @@ group : "coco_caption"
test_split: test
output_type: generate_until
doc_to_visual: !function utils.coco_doc_to_visual
doc_to_text: !function utils.coco_doc_to_text
doc_to_text: "Provide a one-sentence caption for the provided image."
doc_to_target: "answer"
generation_kwargs:
until:
- "ASSISTANT:"
max_new_tokens: 1024
max_new_tokens: 128
temperature: 0
top_p: 0
num_beams: 1
24 changes: 24 additions & 0 deletions lmms_eval/tasks/coco/coco_test2017.yaml
@@ -0,0 +1,24 @@
dataset_path: lmms-lab/COCO-Caption2017
dataset_kwargs:
token: True
task : "coco_test2017"
group : "coco_caption2017"
test_split: test
output_type: generate_until
doc_to_visual: !function utils.coco_doc_to_visual
doc_to_text: !function utils.coco_doc_to_text
doc_to_target: "answer"
generation_kwargs:
max_new_tokens: 128
temperature: 0
top_p: 0
num_beams: 1
do_sample: false
process_results: !function utils.coco_test_process_result
# Note that the metric name can be either a registered metric function (such as the case for GQA) or a key name returned by process_results
metric_list:
- metric: coco_passthrough
aggregation : !function utils.coco_test_aggregation_result
higher_is_better : true
metadata:
- version: 0.0
6 changes: 2 additions & 4 deletions lmms_eval/tasks/coco/coco_val.yaml
@@ -6,12 +6,10 @@ group : "coco_caption"
test_split: val
output_type: generate_until
doc_to_visual: !function utils.coco_doc_to_visual
doc_to_text: !function utils.coco_doc_to_text
doc_to_text: "Provide a one-sentence caption for the provided image."
doc_to_target: "answer"
generation_kwargs:
until:
- "ASSISTANT:"
max_new_tokens: 1024
max_new_tokens: 64
temperature: 0
top_p: 0
num_beams: 1
45 changes: 45 additions & 0 deletions lmms_eval/tasks/coco/coco_val2017.yaml
@@ -0,0 +1,45 @@
dataset_path: lmms-lab/COCO-Caption2017
dataset_kwargs:
token: True
task: "coco_val2017"
group : "coco_caption2017"
test_split: val
output_type: generate_until
doc_to_visual: !function utils.coco_doc_to_visual
doc_to_text: !function utils.coco_doc_to_text
doc_to_target: "answer"
generation_kwargs:
max_new_tokens: 64
temperature: 0
top_p: 0
num_beams: 1
do_sample: false
process_results: !function utils.coco_process_result
# Note that the metric name can be either a registered metric function (such as the case for GQA) or a key name returned by process_results
metric_list:
- metric: coco_Bleu_4
aggregation : !function utils.coco_bleu4
higher_is_better : true
- metric: coco_Bleu_3
aggregation : !function utils.coco_bleu3
higher_is_better : true
- metric: coco_Bleu_2
aggregation : !function utils.coco_bleu2
higher_is_better : true
- metric: coco_Bleu_1
aggregation : !function utils.coco_bleu1
higher_is_better : true
- metric: coco_METEOR
aggregation : !function utils.coco_meteor
higher_is_better : true
- metric: coco_ROUGE_L
aggregation : !function utils.coco_rougel
higher_is_better : true
- metric: coco_CIDEr
aggregation : !function utils.coco_cider
higher_is_better : true
#- metric: coco_SPICE
# aggregation : !function utils.coco_spice
# higher_is_better : true
metadata:
- version: 0.0
3 changes: 1 addition & 2 deletions lmms_eval/tasks/coco/utils.py
@@ -18,8 +18,7 @@ def coco_doc_to_visual(doc):


def coco_doc_to_text(doc):
question = doc["question"]
return f"{question}\nDescribe this image briefly using a single sentence."
return f"Provide a one-sentence caption for the provided image."


def coco_process_result(doc, result):
4 changes: 1 addition & 3 deletions lmms_eval/tasks/flickr30k/flickr30k.yaml
@@ -8,9 +8,7 @@ doc_to_visual: !function utils.flickr_doc_to_visual
doc_to_text: !function utils.flickr_doc_to_text
doc_to_target: "answer"
generation_kwargs:
until:
- "ASSISTANT:"
max_new_tokens: 1024
max_new_tokens: 64
temperature: 0
top_p: 0
num_beams: 1
4 changes: 2 additions & 2 deletions lmms_eval/tasks/flickr30k/utils.py
@@ -18,8 +18,8 @@ def flickr_doc_to_visual(doc):


def flickr_doc_to_text(doc):
question = "Please carefully observe the image and come up with a caption for the image."
return f"{question}\nAnswer the question with a short phrase."
# question = "Please carefully observe the image and come up with a caption for the image"
return f"Provide a one-sentence caption for the provided image."


def flickr_process_result(doc, result):
7 changes: 5 additions & 2 deletions lmms_eval/tasks/gqa/gqa.yaml
@@ -9,8 +9,11 @@ doc_to_visual: !function utils.gqa_doc_to_visual
doc_to_text: !function utils.gqa_doc_to_text
doc_to_target: "answer"
generation_kwargs:
until:
- "ASSISTANT:"
max_new_tokens: 16
temperature: 0
top_p: 0
num_beams: 1
do_sample: false
metric_list:
- metric: exact_match
aggregation: mean
2 changes: 1 addition & 1 deletion lmms_eval/tasks/infovqa/infovqa_test.yaml
@@ -3,7 +3,7 @@ dataset_name: InfographicVQA
dataset_kwargs:
token: True
task: "infovqa_test"
test_split: validation
test_split: test
output_type: generate_until
doc_to_visual: !function utils.infovqa_doc_to_visual
doc_to_text: !function utils.infovqa_doc_to_text
4 changes: 1 addition & 3 deletions lmms_eval/tasks/mmbench_cn/mmbench_cc.yaml
@@ -10,9 +10,7 @@ doc_to_visual: !function cc_utils.mmbench_doc_to_visual
doc_to_text: !function cc_utils.mmbench_cn_cc_doc_to_text
doc_to_target: "answer"
generation_kwargs:
until:
- "ASSISTANT:"
max_new_tokens: 1024
max_new_tokens: 256
temperature: 0
top_p: 0
num_beams: 1
4 changes: 1 addition & 3 deletions lmms_eval/tasks/mmbench_cn/mmbench_cn_dev.yaml
@@ -10,9 +10,7 @@ doc_to_visual: !function utils.mmbench_doc_to_visual
doc_to_text: !function utils.mmbench_doc_to_text
doc_to_target: "answer"
generation_kwargs:
until:
- "ASSISTANT:"
max_new_tokens: 1024
max_new_tokens: 256
temperature: 0
top_p: 0
num_beams: 1
4 changes: 1 addition & 3 deletions lmms_eval/tasks/mmbench_cn/mmbench_cn_test.yaml
@@ -10,9 +10,7 @@ doc_to_visual: !function utils.mmbench_doc_to_visual
doc_to_text: !function utils.mmbench_doc_to_text
doc_to_target: "answer"
generation_kwargs:
until:
- "ASSISTANT:"
max_new_tokens: 1024
max_new_tokens: 256
temperature: 0
top_p: 0
num_beams: 1
4 changes: 1 addition & 3 deletions lmms_eval/tasks/mmbench_en/mmbench_en_test.yaml
@@ -9,9 +9,7 @@ doc_to_visual: !function utils.mmbench_doc_to_visual
doc_to_text: !function utils.mmbench_doc_to_text
doc_to_target: "answer"
generation_kwargs:
until:
- "ASSISTANT:"
max_new_tokens: 1024
max_new_tokens: 256
temperature: 0
top_p: 0
num_beams: 1
13 changes: 10 additions & 3 deletions lmms_eval/tasks/mme/mme.yaml
@@ -8,8 +8,11 @@ doc_to_visual: !function utils.mme_doc_to_visual
doc_to_text: !function utils.mme_doc_to_text
doc_to_target: "answer"
generation_kwargs:
until:
- "ASSISTANT:"
max_new_tokens: 16
temperature: 0
top_p: 0
num_beams: 1
do_sample: false
# The return value of process_results will be used by metrics
process_results: !function utils.mme_process_results
# Note that the metric name can be either a registered metric function (such as the case for GQA) or a key name returned by process_results
@@ -20,5 +23,9 @@ metric_list:
- metric: mme_cognition_score
aggregation: !function utils.mme_aggregate_results
higher_is_better: true
model_specific_prompt_kwargs:
default:
pre_prompt: ""
post_prompt: "\nAnswer the question using a single word or phrase."
metadata:
- version: 0.0
- version: 0.0
16 changes: 10 additions & 6 deletions lmms_eval/tasks/mme/utils.py
@@ -22,18 +22,22 @@
}


replace_prompt = "Please answer yes or no."
replace_prompt = " Please answer yes or no."


def mme_doc_to_visual(doc):
return [doc["image"].convert("RGB")]


def mme_doc_to_text(doc):
question = doc["question"]
# TODO: This is a hack. We should fix this in the dataset.
question = question.replace(replace_prompt, "").strip()
return f"{question}\nAnswer the question using a single word or phrase."
def mme_doc_to_text(doc, model_specific_prompt_kwargs=None):
question = doc["question"].strip()
if "pre_prompt" in model_specific_prompt_kwargs and model_specific_prompt_kwargs["pre_prompt"] != "":
question = question.replace(replace_prompt, "")
question = f"{model_specific_prompt_kwargs['pre_prompt']}{question}"
if "post_prompt" in model_specific_prompt_kwargs and model_specific_prompt_kwargs["post_prompt"] != "":
question = question.replace(replace_prompt, "")
question = f"{question}{model_specific_prompt_kwargs['post_prompt']}"
return question


def parse_pred_ans(pred_ans):
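The reworked `mme_doc_to_text` above can be exercised in isolation. This sketch mirrors the diff's logic (with an added guard for a missing kwargs dict); the sample doc and prompt kwargs below are illustrative, not taken from the dataset:

```python
replace_prompt = " Please answer yes or no."

def mme_doc_to_text(doc, model_specific_prompt_kwargs=None):
    # strip the dataset's built-in instruction once a model-specific
    # pre/post prompt takes over, then wrap the question with them
    question = doc["question"].strip()
    if model_specific_prompt_kwargs is None:
        model_specific_prompt_kwargs = {}
    if model_specific_prompt_kwargs.get("pre_prompt"):
        question = question.replace(replace_prompt, "")
        question = f"{model_specific_prompt_kwargs['pre_prompt']}{question}"
    if model_specific_prompt_kwargs.get("post_prompt"):
        question = question.replace(replace_prompt, "")
        question = f"{question}{model_specific_prompt_kwargs['post_prompt']}"
    return question

doc = {"question": "Is this a cat? Please answer yes or no."}
kwargs = {"pre_prompt": "", "post_prompt": "\nAnswer the question using a single word or phrase."}
# -> "Is this a cat?\nAnswer the question using a single word or phrase."
```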
4 changes: 1 addition & 3 deletions lmms_eval/tasks/nocaps/nocaps_test.yaml
@@ -9,9 +9,7 @@ doc_to_visual: !function utils.nocaps_doc_to_visual
doc_to_text: !function utils.nocaps_doc_to_text
doc_to_target: "annotations_captions"
generation_kwargs:
until:
- "ASSISTANT:"
max_new_tokens: 1024
max_new_tokens: 64
temperature: 0
top_p: 0
num_beams: 1
4 changes: 1 addition & 3 deletions lmms_eval/tasks/nocaps/nocaps_val.yaml
@@ -9,9 +9,7 @@ doc_to_visual: !function utils.nocaps_doc_to_visual
doc_to_text: !function utils.nocaps_doc_to_text
doc_to_target: "annotations_captions"
generation_kwargs:
until:
- "ASSISTANT:"
max_new_tokens: 1024
max_new_tokens: 64
temperature: 0
top_p: 0
num_beams: 1
7 changes: 5 additions & 2 deletions lmms_eval/tasks/okvqa/okvqa.yaml
@@ -6,8 +6,11 @@ doc_to_visual: !function utils.okvqa_doc_to_visual
doc_to_text: !function utils.okvqa_doc_to_text
doc_to_target: "answer"
generation_kwargs:
until:
- "ASSISTANT:"
max_new_tokens: 16
temperature: 0
top_p: 0
num_beams: 1
do_sample: false
metric_list:
- metric: exact_match
aggregation: mean
2 changes: 1 addition & 1 deletion lmms_eval/tasks/okvqa/utils.py
@@ -262,7 +262,7 @@ def okvqa_process_results(doc, result):


def okvqa_doc_to_text(doc):
text = f"{doc['question'].capitalize()}\n Answer the question using a single word or phrase."
text = f"{doc['question'].capitalize()}\nAnswer the question using a single word."
return text


4 changes: 1 addition & 3 deletions lmms_eval/tasks/textcaps/textcaps_test.yaml
@@ -9,9 +9,7 @@ doc_to_visual: !function utils.textcaps_doc_to_visual
doc_to_text: !function utils.textcaps_doc_to_text
doc_to_target: "answer"
generation_kwargs:
until:
- "ASSISTANT:"
max_new_tokens: 1024
max_new_tokens: 64
temperature: 0
top_p: 0
num_beams: 1
