Commit

vizwiz dataset (EvolvingLMMs-Lab#24)
* Merge commit '340c4501058e13bc64aad611c8bbb4d0059fc545'

* Update dataset paths and improve user prompts

* Add submission folder and update file paths for storing prediction results

* Merge commit 'b2c71248314fc8f8461222e594c7ab046f5383f5'

* Update dataset_path in flickr30k.yaml

* Add coco_val and coco_test tasks to coco.yaml

* Squashed commit of the following:

commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:59:12 2024 +0800

    refactor multi model code

commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:51:16 2024 +0800

    print table at the end

commit 63739fc6fa0a462d807ae81de0db0173102de584
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:20:59 2024 +0800

    add yaml config to support multi-model eval

commit edcc752f97ea3845cefad56624e5d2855066f680
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:39:42 2024 +0800

    black

commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5
Merge: 7e8b57d 1d3fdd4
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:37:57 2024 +0800

    resolve conflicts in sqa

commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:36:46 2024 +0800

    add model specific prompt and gen kwargs

commit b2c7124
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Thu Jan 25 09:47:31 2024 +0800

    [Dataset] Add flickr30k (EvolvingLMMs-Lab#18)

    * Add flickr30k support

    * Black lint

    * Align prompt with NoCaps

commit d9c5827
Author: Li Bo <drluodian@gmail.com>
Date:   Wed Jan 24 22:10:14 2024 +0800

    [Datasets] modify NoCaps data path and prompts (EvolvingLMMs-Lab#17)

    * Merge commit '340c4501058e13bc64aad611c8bbb4d0059fc545'

    * Update dataset paths and improve user prompts

commit facd3d87fef5f4eb82dbe3b236a6b199dc87863e
Author: jzhang38 <a1286225768@gmail.com>
Date:   Wed Jan 24 13:56:51 2024 +0800

    black

commit 3380863c2ca0f3b98d74f94c9e72460d28d34acd
Author: jzhang38 <a1286225768@gmail.com>
Date:   Wed Jan 24 13:55:43 2024 +0800

    add mme

* Squashed commit of the following:

commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:59:12 2024 +0800

    refactor multi model code

commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:51:16 2024 +0800

    print table at the end

commit 63739fc6fa0a462d807ae81de0db0173102de584
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 11:20:59 2024 +0800

    add yaml config to support multi-model eval

commit edcc752f97ea3845cefad56624e5d2855066f680
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:39:42 2024 +0800

    black

commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5
Merge: 7e8b57d 1d3fdd4
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:37:57 2024 +0800

    resolve conflicts in sqa

commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4
Author: jzhang38 <a1286225768@gmail.com>
Date:   Thu Jan 25 10:36:46 2024 +0800

    add model specific prompt and gen kwargs

commit facd3d87fef5f4eb82dbe3b236a6b199dc87863e
Author: jzhang38 <a1286225768@gmail.com>
Date:   Wed Jan 24 13:56:51 2024 +0800

    black

commit 3380863c2ca0f3b98d74f94c9e72460d28d34acd
Author: jzhang38 <a1286225768@gmail.com>
Date:   Wed Jan 24 13:55:43 2024 +0800

    add mme

* Squashed commit of the following:

commit 8dce2b0
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Thu Jan 25 17:08:25 2024 +0800

    add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19)

    * add mme

    * black

    * add model specific prompt and gen kwargs

    * black

    * add yaml config to support multi-model eval

    * print table at the end

    * refactor multi model code

* Fix issue where the CLI itself could not run with a config file

* Fix bug in login functionality

Refactor code for better performance

Add new feature for user authentication

Update UI layout for improved user experience

Fix typo in variable name

Optimize database queries for faster response time

Add error handling for edge cases

Update dependencies to latest versions

Remove unused code

Improve code readability and maintainability

* Refactor get_task_dict function to handle nested groups

* Add submission file for coco, flickr30k, nocaps, and textcaps tasks

* Remove unused files and update task configuration

* Fix tasks issue for nocaps, refcoco/+/g

* Fix file path and raise error if config file does not exist

* Exclude train in refcoco/+/g config

* Solve doc_iterator_for_counting crashing issue

* Black lint

* Refactor code to improve performance and readability

* Squashed commit of the following:

commit 6f66c1130070307ba51eae79f54e197f0053266b
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 01:03:57 2024 +0800

    change okvqa yaml

commit a6d360d7b1092d5656e4b4ad7d8964f44ee0a3dc
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 00:55:40 2024 +0800

    change yaml

commit 7ed11f762e3af8b9a2261793c5bbc9c3ebc2c512
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 00:42:43 2024 +0800

    add okvqa task

commit 8dce2b0
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Thu Jan 25 17:08:25 2024 +0800

    add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19)

    * add mme

    * black

    * add model specific prompt and gen kwargs

    * black

    * add yaml config to support multi-model eval

    * print table at the end

    * refactor multi model code

* Squashed commit of the following:

commit 963fd932338aae1dee007bbb574daec162cb58bb
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 01:06:02 2024 +0800

    change ocr reference

commit 1481d73aef646233dce05b3b2989a9e8eddcab2b
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 01:05:46 2024 +0800

    revert example_eval

commit 45a3bf24b4c6e610237e2ef81f1b01cf11ee25d9
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Fri Jan 26 00:17:28 2024 +0800

    edit vizwiz utils

commit 63080782e2d7544d58c513648dd64647131d6337
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Thu Jan 25 23:49:47 2024 +0800

    reorganize __init__

commit ef60547ab60a4a5e18de1634c8126ad5cbc1139c
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Thu Jan 25 23:46:20 2024 +0800

    minor fixes

commit 7d2e92c2835f88cd7832ddab0874996b308faa9a
Author: JvThunder <joshuaadrianc@gmail.com>
Date:   Thu Jan 25 17:41:03 2024 +0800

    add vizwizvqa eval task

commit 8dce2b0
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Thu Jan 25 17:08:25 2024 +0800

    add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19)

    * add mme

    * black

    * add model specific prompt and gen kwargs

    * black

    * add yaml config to support multi-model eval

    * print table at the end

    * refactor multi model code

* Refactor mathvista.yaml and utils.py

* Add gpt_eval_score to mathvista_process_results

* Refactor mathvista_aggregate_results to return average accuracy score

* Fix refcoco evaluation error

* Fix evaluation problem for refcoco+/g

* Refactor mathvista.yaml and mathvista_evals.py

* Add dependencies and update YAML files

* Refactor mmbench_en/utils.py to save test results to separate Excel file

* Fix caption task prompt

* Add group field to mmbench_en_test and mmbench_en_val yaml files

* Delete mmbench_en_val.yaml file

* Update mmbench_cn.yaml and mmbench_cn_test.yaml

* Update mmbench_cn_val.yaml and utils.py

* Remove unused fields in mmbench_cn_cc_process_results function

* Update aggregation function for mmbench_en_dev.yaml

* Fix capitalization of L2-category key in utils.py

* Fix variable name in mmbench_process_results function

* Delete mmbench_cn_val.yaml file

* Update mathvista_test.yaml and mathvista_testmini.yaml

* Fix warnings and update mathvista.yaml

* Remove system message from MathVistaEvaluator

* Update GPT model version in MathVistaEvaluator constructor

* Update GQA_RAW_IMAGE_DATASET path in utils.py

* change vizwiz to test set

* Add split flag to mathvista_aggregate_results function

* Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml files

* Update lmms_eval/evaluator.py and lmms_eval/tasks/vizwizvqa/utils.py

* vizwiz-val

* Update utils.py

* Update vizwizvqa.yaml

---------

Co-authored-by: Bo Li <drluodian@gmail.com>
Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com>
3 people authored Jan 27, 2024
1 parent 0983188 commit 55fe62a
Showing 4 changed files with 300 additions and 3 deletions.
2 changes: 1 addition & 1 deletion lmms_eval/evaluator.py
@@ -315,7 +315,7 @@ def evaluate(
# Don't use above one, this would crash if doc_iterator_for_counting contains too many objects and very slow
doc_iterator_for_counting = itertools.islice(range(len(task.test_docs())), lm.rank, limit, lm.world_size) if task.has_test_docs() else itertools.islice(range(len(task.validation_docs())), lm.rank, limit, lm.world_size)
total_docs = sum(1 for _ in doc_iterator_for_counting)
-pbar = tqdm(total=total_docs, desc="Postprocessing")
+pbar = tqdm(total=total_docs, desc="Postprocessing", position=lm.rank)
for doc_id, doc in doc_iterator:
# subset instances to only this document id ; sort by idx
requests = list(filter(lambda x: x.doc_id == doc_id, task.instances))
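
Note on the change above: the only edit is passing position=lm.rank to tqdm so that, when several distributed ranks postprocess their shards concurrently, each progress bar stays on its own terminal line. A minimal standalone sketch of the effect (not part of this commit; rank and world_size stand in for lm.rank and lm.world_size):

from tqdm import tqdm
import time

def postprocess(rank: int, world_size: int, num_docs: int = 40) -> None:
    doc_ids = list(range(rank, num_docs, world_size))  # this rank's shard of documents
    # Pinning the bar to `position=rank` keeps concurrent bars from overwriting each other.
    pbar = tqdm(total=len(doc_ids), desc="Postprocessing", position=rank)
    for _ in doc_ids:
        time.sleep(0.01)  # stand-in for per-document result processing
        pbar.update(1)
    pbar.close()

if __name__ == "__main__":
    postprocess(rank=0, world_size=1)
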
4 changes: 2 additions & 2 deletions lmms_eval/tasks/vizwizvqa/utils.py
@@ -252,14 +252,14 @@ def vizwizvqa_process_results(doc, result):
return {
"exact_match": accuracy,
"submission": {
"question_id": doc["question_id"],
"image": f"{doc['question_id']}.jpg",
"answer": resAns,
},
}


def vizwizvqa_doc_to_text(doc):
text = f"{doc['question'].capitalize()}\n When the provided information is insufficient, respond with 'unanswerable'. Answer the question using a single word or phrase."
text = f"{doc['question'].capitalize()}\nWhen the provided information is insufficient, respond with 'Unanswerable'.\nAnswer the question using a single word or phrase."
return text


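Note on the hunk above: the submission record now carries an image filename derived from the question id instead of a bare question_id field, presumably to match the upload format expected by the VizWiz/EvalAI server, and the prompt gains a line break plus a capitalized 'Unanswerable'. A hedged illustration of the resulting per-document output (field values are invented; only the shape mirrors vizwizvqa_process_results):

doc = {"question_id": "VizWiz_test_00000001", "question": "what is this?", "answers": None}
prediction = "a red coffee mug"

# Test-split docs carry no ground-truth answers, so exact_match stays 0 and only
# the submission entry matters for the generated submission JSON.
result_entry = {
    "exact_match": 0,
    "submission": {
        "image": f"{doc['question_id']}.jpg",
        "answer": prediction,
    },
}
print(result_entry["submission"])
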
273 changes: 273 additions & 0 deletions lmms_eval/tasks/vizwizvqa_val/utils.py
@@ -0,0 +1,273 @@
import re
import os
import json
import yaml
import pathlib
import logging
import datetime
import statistics

eval_logger = logging.getLogger("lmms-eval")

with open(pathlib.Path(__file__).parent / "vizwizvqa.yaml", "r") as f:
    raw_data = f.readlines()
    for i in range(len(raw_data)):
        raw_data[i] = raw_data[i].replace("!function", "function")

    config = yaml.safe_load("".join(raw_data))


class EvalAIAnswerProcessor:
    CONTRACTIONS = {
        "aint": "ain't",
        "arent": "aren't",
        "cant": "can't",
        "couldve": "could've",
        "couldnt": "couldn't",
        "couldn'tve": "couldn't've",
        "couldnt've": "couldn't've",
        "didnt": "didn't",
        "doesnt": "doesn't",
        "dont": "don't",
        "hadnt": "hadn't",
        "hadnt've": "hadn't've",
        "hadn'tve": "hadn't've",
        "hasnt": "hasn't",
        "havent": "haven't",
        "hed": "he'd",
        "hed've": "he'd've",
        "he'dve": "he'd've",
        "hes": "he's",
        "howd": "how'd",
        "howll": "how'll",
        "hows": "how's",
        "Id've": "I'd've",
        "I'dve": "I'd've",
        "Im": "I'm",
        "Ive": "I've",
        "isnt": "isn't",
        "itd": "it'd",
        "itd've": "it'd've",
        "it'dve": "it'd've",
        "itll": "it'll",
        "let's": "let's",
        "maam": "ma'am",
        "mightnt": "mightn't",
        "mightnt've": "mightn't've",
        "mightn'tve": "mightn't've",
        "mightve": "might've",
        "mustnt": "mustn't",
        "mustve": "must've",
        "neednt": "needn't",
        "notve": "not've",
        "oclock": "o'clock",
        "oughtnt": "oughtn't",
        "ow's'at": "'ow's'at",
        "'ows'at": "'ow's'at",
        "'ow'sat": "'ow's'at",
        "shant": "shan't",
        "shed've": "she'd've",
        "she'dve": "she'd've",
        "she's": "she's",
        "shouldve": "should've",
        "shouldnt": "shouldn't",
        "shouldnt've": "shouldn't've",
        "shouldn'tve": "shouldn't've",
        "somebody'd": "somebodyd",
        "somebodyd've": "somebody'd've",
        "somebody'dve": "somebody'd've",
        "somebodyll": "somebody'll",
        "somebodys": "somebody's",
        "someoned": "someone'd",
        "someoned've": "someone'd've",
        "someone'dve": "someone'd've",
        "someonell": "someone'll",
        "someones": "someone's",
        "somethingd": "something'd",
        "somethingd've": "something'd've",
        "something'dve": "something'd've",
        "somethingll": "something'll",
        "thats": "that's",
        "thered": "there'd",
        "thered've": "there'd've",
        "there'dve": "there'd've",
        "therere": "there're",
        "theres": "there's",
        "theyd": "they'd",
        "theyd've": "they'd've",
        "they'dve": "they'd've",
        "theyll": "they'll",
        "theyre": "they're",
        "theyve": "they've",
        "twas": "'twas",
        "wasnt": "wasn't",
        "wed've": "we'd've",
        "we'dve": "we'd've",
        "weve": "we've",
        "werent": "weren't",
        "whatll": "what'll",
        "whatre": "what're",
        "whats": "what's",
        "whatve": "what've",
        "whens": "when's",
        "whered": "where'd",
        "wheres": "where's",
        "whereve": "where've",
        "whod": "who'd",
        "whod've": "who'd've",
        "who'dve": "who'd've",
        "wholl": "who'll",
        "whos": "who's",
        "whove": "who've",
        "whyll": "why'll",
        "whyre": "why're",
        "whys": "why's",
        "wont": "won't",
        "wouldve": "would've",
        "wouldnt": "wouldn't",
        "wouldnt've": "wouldn't've",
        "wouldn'tve": "wouldn't've",
        "yall": "y'all",
        "yall'll": "y'all'll",
        "y'allll": "y'all'll",
        "yall'd've": "y'all'd've",
        "y'alld've": "y'all'd've",
        "y'all'dve": "y'all'd've",
        "youd": "you'd",
        "youd've": "you'd've",
        "you'dve": "you'd've",
        "youll": "you'll",
        "youre": "you're",
        "youve": "you've",
    }

    NUMBER_MAP = {
        "none": "0",
        "zero": "0",
        "one": "1",
        "two": "2",
        "three": "3",
        "four": "4",
        "five": "5",
        "six": "6",
        "seven": "7",
        "eight": "8",
        "nine": "9",
        "ten": "10",
    }
    ARTICLES = ["a", "an", "the"]
    PERIOD_STRIP = re.compile(r"(?!<=\d)(\.)(?!\d)")
    COMMA_STRIP = re.compile(r"(?<=\d)(\,)+(?=\d)")
    PUNCTUATIONS = [
        ";",
        r"/",
        "[",
        "]",
        '"',
        "{",
        "}",
        "(",
        ")",
        "=",
        "+",
        "\\",
        "_",
        "-",
        ">",
        "<",
        "@",
        "`",
        ",",
        "?",
        "!",
    ]

    def __init__(self, *args, **kwargs):
        pass

    def word_tokenize(self, word):
        word = word.lower()
        word = word.replace(",", "").replace("?", "").replace("'s", " 's")
        word = word.replace("\n", " ").replace("\t", " ").strip()
        return word.strip()

    def process_punctuation(self, in_text):
        out_text = in_text
        for p in self.PUNCTUATIONS:
            if (p + " " in in_text or " " + p in in_text) or (re.search(self.COMMA_STRIP, in_text) is not None):
                out_text = out_text.replace(p, "")
            else:
                out_text = out_text.replace(p, " ")
        out_text = self.PERIOD_STRIP.sub("", out_text, re.UNICODE)
        return out_text

    def process_digit_article(self, in_text):
        out_text = []
        temp_text = in_text.lower().split()
        for word in temp_text:
            word = self.NUMBER_MAP.setdefault(word, word)
            if word not in self.ARTICLES:
                out_text.append(word)
            else:
                pass
        for word_id, word in enumerate(out_text):
            if word in self.CONTRACTIONS:
                out_text[word_id] = self.CONTRACTIONS[word]
        out_text = " ".join(out_text)
        return out_text

    def __call__(self, item):
        item = self.word_tokenize(item)
        item = self.process_punctuation(item)
        item = self.process_digit_article(item)
        return item


def vizwizvqa_doc_to_visual(doc):
    return [doc["image"].convert("RGB")]


def vizwizvqa_process_results(doc, result):
    eval_ai_processor = EvalAIAnswerProcessor()
    assert len(result) == 1, f"The result should be a list of length 1, but got {len(result)}."
    resAns = eval_ai_processor(result[0])
    accuracy = 0

    if "answers" in doc and doc["answers"] is not None:
        gtAcc = []

        for i in range(len(doc["answers"])):
            doc["answers"][i] = eval_ai_processor(doc["answers"][i])

        for i in range(len(doc["answers"])):
            otherGTAns = [doc["answers"][j] for j in range(len(doc["answers"])) if i != j]
            matchingAns = [item for item in otherGTAns if item == resAns]
            acc = min(1, float(len(matchingAns)) / 3)
            gtAcc.append(acc)
        if gtAcc:
            accuracy = statistics.mean(gtAcc)
        else:
            accuracy = 0

    return {
        "exact_match": accuracy,
        "submission": {
            "image": f"{doc['question_id']}.jpg",
            "answer": resAns,
        },
    }


def vizwizvqa_doc_to_text(doc):
    text = f"{doc['question'].capitalize()}\nWhen the provided information is insufficient, respond with 'Unanswerable'.\nAnswer the question using a single word or phrase."
    return text


def vizwizvqa_aggreate_submissions(results):
    now_date_time = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
    submission_file_name = f"vizwizvqa-submission-{now_date_time}.json"
    path = os.path.abspath(submission_file_name)
    with open(path, "w") as f:
        json.dump(results, f)
    print(f"Submission file saved to {path}")
    return 0
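
Note on the scoring above: with ground-truth answers present (the val split), vizwizvqa_process_results uses the standard VQA consensus rule. Every answer is normalized by EvalAIAnswerProcessor; then, for each annotator, the prediction is compared against the remaining answers and credited min(1, matches / 3), and the reported accuracy is the mean over annotators. A hedged worked example with made-up annotator answers (normalization omitted for brevity):

import statistics

def consensus_accuracy(prediction, answers):
    # Leave-one-out: for annotator i, count matches among the other answers;
    # three or more agreements give full credit for that round.
    rounds = []
    for i in range(len(answers)):
        others = [answers[j] for j in range(len(answers)) if j != i]
        matches = sum(1 for a in others if a == prediction)
        rounds.append(min(1, matches / 3))
    return statistics.mean(rounds)

answers = ["dog", "dog", "dog", "puppy", "dog", "dog", "unanswerable", "dog", "dog", "dog"]
print(consensus_accuracy("dog", answers))    # -> 1.0, at least three other annotators always agree
print(consensus_accuracy("puppy", answers))  # -> ~0.3, only one annotator gave that answer
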
24 changes: 24 additions & 0 deletions lmms_eval/tasks/vizwizvqa_val/vizwizvqa.yaml
@@ -0,0 +1,24 @@
task: vizwizvqa_val
dataset_path: lmms-lab/VizWiz-VQA
token: True
test_split: val
output_type: generate_until
doc_to_visual: !function utils.vizwizvqa_doc_to_visual
doc_to_text: !function utils.vizwizvqa_doc_to_text
doc_to_target: "answer"
generation_kwargs:
  until:
    - "ASSISTANT:"
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    ignore_case: true
    ignore_punctuation: true
  - metric: submission
    aggregation: !function utils.vizwizvqa_aggreate_submissions
    higher_is_better: true
metadata:
  - version: 0.0
  - have_ocr_reference: false
process_results: !function utils.vizwizvqa_process_results
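
Note on the config above: the !function tags point doc_to_visual, doc_to_text, process_results, and the submission aggregation at callables in utils.py. That is also why the top of utils.py rewrites "!function" to plain "function" before calling yaml.safe_load, which has no constructor registered for that custom tag. A small standalone sketch of the workaround, using inline YAML purely for illustration:

import yaml

raw = """
task: vizwizvqa_val
process_results: !function utils.vizwizvqa_process_results
"""
# safe_load would reject the unknown "!function" tag, so it is rewritten to plain
# text first, leaving an ordinary string value in the parsed config.
lines = [line.replace("!function", "function") for line in raw.splitlines(keepends=True)]
config = yaml.safe_load("".join(lines))
print(config["process_results"])  # -> "function utils.vizwizvqa_process_results"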
