Unable to reproduce llava evaluation #19
Comments
Interesting, I observe different behavior for log_samples_suffix reproduce (which does not work) and log_samples_suffix llava_v1.5_mme_mmbenchen (which I got to work). To get it functioning, I also had to make two changes to lmms-eval > lmms_eval > models > llava.py
Would anyone be able to clarify the difference between reproduce and llava_v1.5_mme_mmbenchen? Which benchmarks does llava_v1.5_mme_mmbenchen support? Additionally, has anyone else run into the errors I mentioned? I am wondering why I was unable to run the code as is.
I ran into almost the same series of errors as @jacob-hansen. I hope there can be a clean version for llava soon.
Hi @jacob-hansen, @justlovebarbecue, thank you for pointing out the issue. It seems like current
Solution
First cd into lmms-eval
Then cd into the LLaVA repo from https://github.com/haotian-liu/LLaVA?tab=readme-ov-file and do the same thing
This will build llava and lmms_eval without installing any dependencies. Then instead of using the current
Save this into
The correct environment will be installed by this requirements file. Note that this file is generated py
Then you can run
Make sure use_flash_attention_2 is set to False in model_args.
Results
Additional Note
If you want to use flash attention, you can install it by
But note that if you use flash-attn 2, there will be a slight difference in the result. I got
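The exact commands from the comment above did not survive in this copy of the thread. As a hedged sketch only, assuming the "build without installing any dependency" step refers to an editable pip install with pip's `--no-deps` flag, the command for each repo could be assembled as below. The helper name `editable_install_cmd` and the checkout directory names are hypothetical, not taken from the thread.

```python
import sys
from pathlib import Path


def editable_install_cmd(repo_dir: str) -> list[str]:
    """Build a pip command that installs repo_dir in editable mode, skipping dependencies."""
    # --no-deps tells pip not to install the package's declared dependencies.
    return [sys.executable, "-m", "pip", "install", "--no-deps", "-e", str(Path(repo_dir))]


# Hypothetical checkout locations; adjust to wherever the two repos were cloned.
for repo in ("lmms-eval", "LLaVA"):
    print(" ".join(editable_install_cmd(repo)))
```

To actually run a command rather than print it, pass the list to `subprocess.run(cmd, check=True)`.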
Also, @jacob-hansen, there is one typo for the
When following your new protocol, I observe no difference from my workarounds for the installation. By specifying use_flash_attention_2=False, I no longer had to comment that part out of my code. But I still observe a gpu and required the following changes:
This might be specific to my system though
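For readers following along, the use_flash_attention_2=False setting discussed above travels inside the comma-separated model_args string. The following is a minimal sketch of how such a string can be split into key/value pairs and checked before launching a run. `parse_model_args` is a hypothetical helper written for illustration, not the parser lmms-eval itself uses, and the checkpoint path is a placeholder.

```python
def parse_model_args(arg_string: str) -> dict:
    """Split 'key=value,key=value' into a dict, turning 'True'/'False' into booleans."""
    parsed = {}
    for pair in arg_string.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate stray or trailing commas
        key, _, value = pair.partition("=")
        value = value.strip()
        if value in ("True", "False"):
            parsed[key.strip()] = value == "True"
        else:
            parsed[key.strip()] = value
    return parsed


# Placeholder checkpoint path; only the flag name comes from the thread.
model_args = "pretrained=path/to/llava-checkpoint,use_flash_attention_2=False"
args = parse_model_args(model_args)

# Fail early if flash attention was left enabled by mistake.
assert args["use_flash_attention_2"] is False
print(args)
```

Checking the flag up front makes it obvious which attention path a given set of results was produced with.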
@jacob-hansen, you might also need to add
This change has been merged into:
* add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code
…del Specific Prompt. (#20) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit ecb47d7 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 
<a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 18e984c Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Fix cli itself can not run with config file * Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update 
dependencies to latest versions Remove unused code Improve code readability and maintainability * Refactor get_task_dict function to handle nested groups * Add submission file for coco, flickr30k, nocaps, and textcaps tasks * Remove unused files and update task configuration * Fix tasks issue for nocaps, refcoco/+/g * Fix file path and raise error if config file does not exist * Exclude train in refcoco/+/g config * Solve doc_iterator_for_counting crashing issue * Black lint * Refactor code to improve performance and readability * Squashed commit of the following: commit 0df825c9e72a06e6acb4c0bd43c2083ffe8b74c0 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:03:57 2024 +0800 change okvqa yaml commit b9d9f9896993033b92346e9f47420c55b866c715 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:55:40 2024 +0800 change yaml commit 4256bef410e4c8d8761e0cd0d79ac5e57b97651b Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:42:43 2024 +0800 add okvqa task commit 18e984c Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Squashed commit of the following: commit 0c8a3919885b8fe2880bb2892f7a619d060012d1 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:06:02 2024 +0800 change ocr reference commit d2bc7c92ac61179b8c4031e11bc31970355252f6 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:05:46 2024 +0800 revert example_eval commit c78fa29cd0d161641ee05db57bd39314b998c8c7 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:17:28 2024 +0800 edit vizwiz utils commit 397f0906968fd8ba04b883469b96217737c43e09 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:49:47 2024 +0800 reorganize __init__ commit 
52a7ea6c7599adeec2ac2787f500e215ce47cf79 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:46:20 2024 +0800 minor fixes commit f706b2aaf9b288c582611191a1841b58feaeb741 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 17:41:03 2024 +0800 add vizwizvqa eval rask commit 18e984c Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Refactor mathvista.yaml and utils.py * Add gpt_eval_score to mathvista_process_results * Refactor mathvista_aggregate_results to return average accuracy score * Fix refcoco evaluation error * Fix evaluation problem for refcoco+/g * Refactor mathvista.yaml and mathvista_evals.py * Add dependencies and update YAML files * Refactor mmbench_en/utils.py to save test results to separate Excel file * Fix caption task prompt * Add group field to mmbench_en_test and mmbench_en_val yaml files * Delete mmbench_en_val.yaml file * Update mmbench_cn.yaml and mmbench_cn_test.yaml * Update mmbench_cn_val.yaml and utils.py * Remove unused fields in mmbench_cn_cc_process_results function * Update aggregation function for mmbench_en_dev.yaml * Fix capitalization of L2-category key in utils.py * Fix variable name in mmbench_process_results function * Delete mmbench_cn_val.yaml file * Update mathvista_test.yaml and mathvista_testmini.yaml * Fix warnings and update mathvista.yaml * Remove system message from MathVistaEvaluator * Update GPT model version in MathVistaEvaluator constructor * Update GQA_RAW_IMAGE_DATASET path in utils.py * change vizwiz to test set * Add split flag to mathvista_aggregate_results function * Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml files * Add download configuration for dataset * Update 
GQA_RAW_IMAGE_DATASET path in utils.py * add datasets * Update gpt_eval_model_name in mathvista.yaml * Merge commit '0d620f98b49f8204d02633f209eedd5d8b7a1f7c' * Update pyproject.toml with dependencies and URLs * Squashed commit of the following: commit 8b600f5 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Sat Jan 27 13:56:37 2024 +0800 Dev/add chartqa and ai2d (#23) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Refactor image processing and submission file path * Refactor directory creation logic in cli_evaluate_single function * Update dataset path and test split in vqav2.yaml * Remove "total" column from cap_details_columns DataFrame * Add retry logic for dataset download * Add 'tenacity' to dependencies in pyproject.toml * Refactor code in ContextSampler class * Update Black version and configuration, and improve code readability in ContextSampler * Update Black version and line length --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>
* Update tqdm progress bar position * Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' * Squashed commit of the following: commit 18e984c Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit ecb47d7 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 95ef3ea Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * remove useless output file * Update dataset path in vqav2.yaml * Squashed commit of the following: commit 75bb7043ea5a533ab6351fc0f5ab055e86106423 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:56:45 2024 +0800 Black lint commit 6635a8aa34cfbd3c7a4afb6fcd214a7283ce01cb Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:47 2024 +0800 Solve doc_iterator_for_counting crashing issue commit 080f42b88ea8acacd527b8d67b84ba1d7d135b03 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:13 2024 +0800 Exclude train in refcoco/+/g config commit 4da84069c08c95e49e8ab0e64a1e103ff7ac8730 Merge: 6a1ae69 697a438 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:13 2024 +0000 Merge branch 'dev/bli_add_datasets' of https://github.com/EvolvingLMMs-Lab/lmms-eval into dev/bli_add_datasets commit 
6a1ae69923d79ae32a001edac38206b605274ec3 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:06 2024 +0000 Fix file path and raise error if config file does not exist commit 697a4387827ceeec3e393237dd1baa217c714c88 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 00:47:24 2024 +0800 Fix tasks issue for nocaps, refcoco/+/g commit 47e40437126d39a5f062c9a33b4de426c1a29804 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 10:09:43 2024 +0000 Remove unused files and update task configuration commit 9976eb8e9ed03c8613725fdbd822ef5d8cf70e47 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:43:56 2024 +0000 Add submission file for coco, flickr30k, nocaps, and textcaps tasks commit 95f97a69faa6129676e89eee14960fcfe2076b7c Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:32:54 2024 +0000 Refactor get_task_dict function to handle nested groups commit 3b79ee842b2488714baf92ab34528ef77989d392 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:13:46 2024 +0000 Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability commit f5c353f2ce93a2d96add4312b695b57432f68cbb Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 17:07:20 2024 +0800 Fix cli itself can not run with config file commit 9a68fec37be74cfe8d4a73390bc83edee147ae24 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:09:04 2024 +0000 Squashed commit of the following: commit 18e984c Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * 
print table at the end * refactor multi model code commit 93f847c5851fd246716367935d6b807b17d53949 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:02:57 2024 +0000 Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit fa4ad4404e26d8924f55208746dbb9143b464011 Merge: 22c3adf 1d3fdd4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:43:15 2024 +0000 Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' into dev/bli_add_datasets commit 22c3adfd0645acc23b6d7c06b487f4ffd47666c4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:52 2024 +0000 Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 
Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit ecb47d7 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit 4c712336b6f7438e717a865910bb241e413a4688 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:11 2024 +0000 Add coco_val and coco_test tasks to coco.yaml commit b5547126c855927fd4dc8384211e4aceee40870f Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 04:58:28 2024 +0000 Update dataset_path in flickr30k.yaml commit f786f61e2559f082072f21aa9030e2080ddaf809 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:12:25 2024 +0000 Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' commit 
796a011000e0df90f66f8e80cb34dc2318ae9ac8 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:10:18 2024 +0000 Add submission folder and update file paths for storing prediction results commit ecb47d7 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 118744c63eb2d9724571d85fbbd85fcc9ad05b59 Merge: c6370bf 51f2eaa Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:07 2024 +0800 Merge branch 'main' into dev/bli_add_datasets commit c6370bff65903681f00cf3d07111d8e15a57b619 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 14:08:06 2024 +0000 Update dataset paths and improve user prompts commit 810daf458fa94cb3ec2b4a6cc5ecb1e656a24002 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 11:52:33 2024 +0000 Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' commit 95ef3ea Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * Fix bug in login functionality * create vqav2_val * Update vqav2_test.yaml * Update vqav2_test.yaml * Update vqav2_val.yaml --------- Co-authored-by: Li Bo <drluodian@gmail.com>
Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 842fbc6 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit b40d522b6bf483ebdfbf5facd4573de0cf8a93f6 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:11 2024 +0000 Add coco_val and coco_test tasks to coco.yaml commit 5bf643f73d06f1e540897b753450352bb92fd9ec Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 04:58:28 2024 +0000 Update dataset_path in flickr30k.yaml commit 95f110f0eef5196205bc501367e3642c57cc7a17 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:12:25 2024 +0000 Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' commit 
c844ae49b18c1334711832208b0359c9439fe1c0 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:10:18 2024 +0000 Add submission folder and update file paths for storing prediction results commit 842fbc6 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit f0446227f0dd93651e9d6c06254bbf5212ede2dd Merge: c6370bf 51f2eaa Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:07 2024 +0800 Merge branch 'main' into dev/bli_add_datasets commit 1e1f6cfccba758dc606fa4217102518fab73c936 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 14:08:06 2024 +0000 Update dataset paths and improve user prompts commit 966933754b9e5179995b3ab41d746603e13e75c6 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 11:52:33 2024 +0000 Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' commit 767f7e2 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * Fix bug in login functionality * create vqav2_val * Update vqav2_test.yaml * Update vqav2_test.yaml * Update vqav2_val.yaml --------- Co-authored-by: Li Bo <drluodian@gmail.com>
* Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit ecb47d7 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 
13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 18e984c Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Fix cli itself can not run with config file * Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused 
code Improve code readability and maintainability * Refactor get_task_dict function to handle nested groups * Add submission file for coco, flickr30k, nocaps, and textcaps tasks * Remove unused files and update task configuration * Fix tasks issue for nocaps, refcoco/+/g * Fix file path and raise error if config file does not exist * Exclude train in refcoco/+/g config * Solve doc_iterator_for_counting crashing issue * Black lint * Refactor code to improve performance and readability * Squashed commit of the following: commit 0df825c9e72a06e6acb4c0bd43c2083ffe8b74c0 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:03:57 2024 +0800 change okvqa yaml commit b9d9f9896993033b92346e9f47420c55b866c715 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:55:40 2024 +0800 change yaml commit 4256bef410e4c8d8761e0cd0d79ac5e57b97651b Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:42:43 2024 +0800 add okvqa task commit 18e984c Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Squashed commit of the following: commit 0c8a3919885b8fe2880bb2892f7a619d060012d1 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:06:02 2024 +0800 change ocr reference commit d2bc7c92ac61179b8c4031e11bc31970355252f6 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:05:46 2024 +0800 revert example_eval commit c78fa29cd0d161641ee05db57bd39314b998c8c7 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:17:28 2024 +0800 edit vizwiz utils commit 397f0906968fd8ba04b883469b96217737c43e09 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:49:47 2024 +0800 reorganize __init__ commit 52a7ea6c7599adeec2ac2787f500e215ce47cf79 Author: JvThunder 
<joshuaadrianc@gmail.com> Date: Thu Jan 25 23:46:20 2024 +0800 minor fixes commit f706b2aaf9b288c582611191a1841b58feaeb741 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 17:41:03 2024 +0800 add vizwizvqa eval rask commit 18e984c Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Refactor mathvista.yaml and utils.py * Add gpt_eval_score to mathvista_process_results * Refactor mathvista_aggregate_results to return average accuracy score * Fix refcoco evaluation error * Fix evaluation problem for refcoco+/g * Refactor mathvista.yaml and mathvista_evals.py * Add dependencies and update YAML files * Refactor mmbench_en/utils.py to save test results to separate Excel file * Fix caption task prompt * Add group field to mmbench_en_test and mmbench_en_val yaml files * Delete mmbench_en_val.yaml file * Update mmbench_cn.yaml and mmbench_cn_test.yaml * Update mmbench_cn_val.yaml and utils.py * Remove unused fields in mmbench_cn_cc_process_results function * Update aggregation function for mmbench_en_dev.yaml * Fix capitalization of L2-category key in utils.py * Fix variable name in mmbench_process_results function * Delete mmbench_cn_val.yaml file * Update mathvista_test.yaml and mathvista_testmini.yaml * Fix warnings and update mathvista.yaml * Remove system message from MathVistaEvaluator * Update GPT model version in MathVistaEvaluator constructor * Update GQA_RAW_IMAGE_DATASET path in utils.py * change vizwiz to test set * Add split flag to mathvista_aggregate_results function * Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml files * Update lmms_eval/evaluator.py and lmms_eval/tasks/vizwizvqa/utils.py * vizwiz-val * Update utils.py * Update 
vizwizvqa.yaml
---------
Co-authored-by: Bo Li <drluodian@gmail.com>
Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com>
* Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 842fbc6 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 
13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit b13a805 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Fix cli itself can not run with config file * Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused 
code Improve code readability and maintainability * Refactor get_task_dict function to handle nested groups * Add submission file for coco, flickr30k, nocaps, and textcaps tasks * Remove unused files and update task configuration * Fix tasks issue for nocaps, refcoco/+/g * Fix file path and raise error if config file does not exist * Exclude train in refcoco/+/g config * Solve doc_iterator_for_counting crashing issue * Black lint * Refactor code to improve performance and readability * Squashed commit of the following: commit a2cc9303dc72e4d53983bb56e54a32e977c3e270 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:03:57 2024 +0800 change okvqa yaml commit 35e87e7c7a480d005abf607c2527a35457d92311 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:55:40 2024 +0800 change yaml commit 89755323596b85208ed33aa88c296604a39af6eb Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:42:43 2024 +0800 add okvqa task commit b13a805 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Squashed commit of the following: commit 0b0d30dfb247c5f0b7b68398b9e9fcde74cf7fa2 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:06:02 2024 +0800 change ocr reference commit e273f9cbd91540df86bdbc652bff88a847bd0d2d Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:05:46 2024 +0800 revert example_eval commit e84126aaaf8a07bd371a0571a914ccbcd3697f20 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:17:28 2024 +0800 edit vizwiz utils commit 110deab53dc1a2fd349b1872cd261b69074c5fa8 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:49:47 2024 +0800 reorganize __init__ commit 0fa3e0c40075997ea80ed976bdee9615f17d3ece Author: JvThunder 
<joshuaadrianc@gmail.com> Date: Thu Jan 25 23:46:20 2024 +0800 minor fixes commit 2aaca579120def99860f90054233f3358950fa66 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 17:41:03 2024 +0800 add vizwizvqa eval rask commit b13a805 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Refactor mathvista.yaml and utils.py * Add gpt_eval_score to mathvista_process_results * Refactor mathvista_aggregate_results to return average accuracy score * Fix refcoco evaluation error * Fix evaluation problem for refcoco+/g * Refactor mathvista.yaml and mathvista_evals.py * Add dependencies and update YAML files * Refactor mmbench_en/utils.py to save test results to separate Excel file * Fix caption task prompt * Add group field to mmbench_en_test and mmbench_en_val yaml files * Delete mmbench_en_val.yaml file * Update mmbench_cn.yaml and mmbench_cn_test.yaml * Update mmbench_cn_val.yaml and utils.py * Remove unused fields in mmbench_cn_cc_process_results function * Update aggregation function for mmbench_en_dev.yaml * Fix capitalization of L2-category key in utils.py * Fix variable name in mmbench_process_results function * Delete mmbench_cn_val.yaml file * Update mathvista_test.yaml and mathvista_testmini.yaml * Fix warnings and update mathvista.yaml * Remove system message from MathVistaEvaluator * Update GPT model version in MathVistaEvaluator constructor * Update GQA_RAW_IMAGE_DATASET path in utils.py * change vizwiz to test set * Add split flag to mathvista_aggregate_results function * Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml files * Update lmms_eval/evaluator.py and lmms_eval/tasks/vizwizvqa/utils.py * vizwiz-val * Update utils.py * Update 
vizwizvqa.yaml
---------
Co-authored-by: Bo Li <drluodian@gmail.com>
Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com>
* Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * Add coco, refcoco support * Fix doc_to_visual error * Fix segmentation mask error * Add refcoco+, refcocog * Remove debug code * black lint * Remove unused code and scripts * Fix group stderr N/A error between str and int * Fix letter case issue * Update lmms_eval tasks and utils * Fix coco test_split name * Add llava-bench-in-the-wild support * Black codestyle, lint * Add COCO evaluation metric * Add refcoco, refcocog, refcoco+ evaluation kit * Add llava bench coco support --------- Co-authored-by: Bo Li <drluodian@gmail.com> * VQAv2 eval (#4) * vqav2 * Add vqav2_process_results function and update vqav2_doc_to_text function * Implement vqav2_process_results function to return exact match score * Refactor fewshot_docs() to use config.fewshot_config * Refactor Task class to handle fewshot_docs when training and validation docs are not available * Add answer processing logic in vqav2_process_results function * Refactor vqav2_process_results function and add submission aggregation * Add vqav2_aggreate_submissions function to utils.py * textvqa * Refactor answer processing in textvqa_process_results() function * textvqa eval * Update dataset path and modify textvqa_doc_to_text function * Capitalize the question in textvqa_doc_to_text function * Update textvqa.yaml and utils.py * Fix formatting issues in lmms_eval/api/task.py, lmms_eval/tasks/gqa/utils.py, lmms_eval/tasks/textvqa/utils.py, and lmms_eval/tasks/vqav2/utils.py --------- Co-authored-by: Li Bo <drluodian@gmail.com> * [Big Changes] add LLaVA-1.6, MMVet, LLaVA-W, POPE, and many other changes on logs, model args. 
(#7) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * black lint * Remove unused code and scripts * Update lmms_eval tasks and utils * Update LMMS-Eval dependencies and configurations * Squashed commit of the following: commit 209f3904f33210bec0b4b146e96fcbd67a4e1541 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Wed Jan 17 20:27:13 2024 +0800 Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * Add coco, refcoco support * Fix doc_to_visual error * Fix segmentation mask error * Add refcoco+, refcocog * Remove debug code * black lint * Remove unused code and scripts * Fix group stderr N/A error between str and int * Fix letter case issue * Update lmms_eval tasks and utils * Fix coco test_split name * Add llava-bench-in-the-wild support * Black codestyle, lint * Add COCO evaluation metric * Add refcoco, refcocog, refcoco+ evaluation kit * Add llava bench coco support --------- Co-authored-by: Bo Li <drluodian@gmail.com> commit f9e48cec5493010a363b446b81a335ef1484e42f Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 17 20:26:58 2024 +0800 Update utils.py (#6) * Fix logging issue and remove unnecessary whitespace * Add openai and pycocoevalcap dependencies * Fix device mapping issue in Llava constructor * Add support for truncating context in generation * Update Llava model and evaluation configuration * Update YAML configuration files * Update YAML configuration files * add otterhd and gemini models * Add support for custom image aspect ratio in Llava model * Add dataset_kwargs and max_gen_toks to YAML files * Fix log_samples suffix typo and use hash for output name * Refactor LMMS evaluation code and update LLAVA model properties * matched response for mistral-llava * Refactor logging in llava_aggregation function * Print evaluation statistics instead of logging them * Fix logging information in llava_aggregation 
function * Add new models and dataset_kwargs for COCO tasks * Update truncate_context parameter in Llava class constructor * Update dataset_kwargs in YAML files * Remove issue type tags from issue and pull request templates * add mmvet and try to modify llava arch * black lint * Update lmms_eval tasks and utils * Update LMMS-Eval dependencies and configurations * Squashed commit of the following: commit 209f3904f33210bec0b4b146e96fcbd67a4e1541 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Wed Jan 17 20:27:13 2024 +0800 Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * Add coco, refcoco support * Fix doc_to_visual error * Fix segmentation mask error * Add refcoco+, refcocog * Remove debug code * black lint * Remove unused code and scripts * Fix group stderr N/A error between str and int * Fix letter case issue * Update lmms_eval tasks and utils * Fix coco test_split name * Add llava-bench-in-the-wild support * Black codestyle, lint * Add COCO evaluation metric * Add refcoco, refcocog, refcoco+ evaluation kit * Add llava bench coco support --------- Co-authored-by: Bo Li <drluodian@gmail.com> commit f9e48cec5493010a363b446b81a335ef1484e42f Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 17 20:26:58 2024 +0800 Update utils.py (#6) * Fix logging issue and remove unnecessary whitespace * Add openai and pycocoevalcap dependencies * Fix device mapping issue in Llava constructor * Add support for truncating context in generation * Update Llava model and evaluation configuration * Update YAML configuration files * Update YAML configuration files * add otterhd and gemini models * Add support for custom image aspect ratio in Llava model * Add dataset_kwargs and max_gen_toks to YAML files * Fix log_samples suffix typo and use hash for output name * Refactor LMMS evaluation code and update LLAVA model properties * matched response for mistral-llava * Refactor 
logging in llava_aggregation function * Print evaluation statistics instead of logging them * Fix logging information in llava_aggregation function * Add new models and dataset_kwargs for COCO tasks * Update truncate_context parameter in Llava class constructor * Update dataset_kwargs in YAML files * Remove issue type tags from issue and pull request templates * Refactor pope utils functions * Update transformers dependency to version 4.36.2 * Revise llava-in-the-wild prompt for align * Add default values for gen_kwargs in Llava class * Fix formatting issues and import pdb for debugging * Remove pdb.set_trace() and update default value for max_new_tokens * Add llava loglikelihood * Fix formatting and indentation issues in lmms_eval/api/metrics.py and lmms_eval/models/llava.py * Update function to handle edge cases This commit updates the function to handle edge cases, improving the overall reliability and robustness of the code. * Update black version in pre-commit config * Remove duplicate lines in gqa * Another way to solve memory issue * Handle exception in model generation * Refactor pope_aggregate_results to use "score" key instead of "pope_accuracy" * Update pope metrics aggregation functions * Add model_to_prompt in pope.yaml * Update pope.yaml configuration * Refactor code to simplify construct_requests call --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> * Add datetime to output name in cli_evaluate function Add get_datetime_str function to utils.py * Refactor pope_aggregate_f1_score function * Fix datetime format in get_datetime_str function * Update JSON dump indentation in cli_evaluate function * Add datetime to output name in cli_evaluate function (#10) * Revert "Add datetime to output name in cli_evaluate function" This reverts commit ef26f78c46b50d8769a4fb6990b909162c2881c3. * Add datetime to output name in cli_evaluate function * [Datasets] Added POPE and Aligned. 
(#11) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * [Dataset] Add SEED-Bench, TextCaps, NoCaps (#12) * Change coco from print to logger * Add llava loglikelihood * Add Nocaps support * Fix pass through function * Add textcaps support * Fix textcaps eval image_id * Add seedbench support * Add seedbench ppl evaluation * black lint * [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in 
gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * add mmmu (#15) * add mmme * black * add mmmu (#15) * add mmme * black * [Memory issue] Solve memory issue for building context (#14) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * Remove unused function llava_aggregation * Refractor llava-bench aggregation code * Add logs and scripts to .gitignore, and set image_aspect_ratio to original in scienceqa.yaml * Update generation parameters in scienceqa.yaml * Solve memory issue for building context * Solved gather result error * Update lmms_eval scienceqa_img config * Fixed nocaps store results * Revise seedbench prompt * Squashed commit of the following: commit c3cc24a89415aeccad31ccbb10642af677cd6fe5 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 24 14:07:36 2024 +0800 add mmmu (#15) * add mmme * black commit 0dbc5d16c4f45ebea8def5f0bc1a36fcd93f9a05 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 10:00:33 2024 +0800 [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function 
to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions commit fec494dbe5971e8fa5a886b191a4781be3ce7a6f Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Tue Jan 23 19:17:40 2024 +0800 [Dataset] Add SEED-Bench, TextCaps, NoCaps (#12) * Change coco from print to logger * Add llava loglikelihood * Add Nocaps support * Fix pass through function * Add textcaps support * Fix textcaps eval image_id * Add seedbench support * Add seedbench ppl evaluation * black lint commit 4c3c2c63a681f29c537c2467957de1a90568748d Author: Li Bo <drluodian@gmail.com> Date: Tue Jan 23 19:17:12 2024 +0800 [Datasets] Added POPE and Aligned. 
(#11) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function --------- Co-authored-by: Bo Li <drluodian@gmail.com> * [Memory issue] Solve memory issue for building context (#14) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * Remove unused function llava_aggregation * Refractor llava-bench aggregation code * Add logs and scripts to .gitignore, and set image_aspect_ratio to original in scienceqa.yaml * Update generation parameters in scienceqa.yaml * Solve memory issue for building context * Solved gather result error * Update lmms_eval scienceqa_img config * Fixed nocaps store results * Revise seedbench prompt * Squashed commit of the following: commit 290126e6a269db4cca9b3544bd017d6c17012793 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 24 14:07:36 2024 +0800 add mmmu (#15) * add mmme * black commit 8b0227cd7b2602d096d773a01b2199d1f4110f22 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 10:00:33 2024 +0800 [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries 
for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions commit fec494dbe5971e8fa5a886b191a4781be3ce7a6f Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Tue Jan 23 19:17:40 2024 +0800 [Dataset] Add SEED-Bench, TextCaps, NoCaps (#12) * Change coco from print to logger * Add llava loglikelihood * Add Nocaps support * Fix pass through function * Add textcaps support * Fix textcaps eval image_id * Add seedbench support * Add seedbench ppl evaluation * black lint commit 4c3c2c63a681f29c537c2467957de1a90568748d Author: Li Bo <drluodian@gmail.com> Date: Tue Jan 23 19:17:12 2024 +0800 [Datasets] Added POPE and Aligned. 
(#11) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts * [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts * [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps * add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Dev/add chartqa and ai2d (#23) * add mmme * black * add model
specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * [Datasets] Changes for Flickr30K and NoCaps, also merged Peiyuan's Model Specific Prompt. (#20) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 
ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 
13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Fix cli itself can not run with config file * Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability * Refactor get_task_dict function to handle nested groups * Add submission file for coco, flickr30k, nocaps, and textcaps tasks * Remove unused files and update task configuration * Fix tasks issue for nocaps, refcoco/+/g * Fix file path and raise error if config file does not exist * Exclude train in refcoco/+/g config * Solve doc_iterator_for_counting crashing issue * Black lint * Refactor code to improve performance and readability * Squashed commit of the following: commit 0df825c9e72a06e6acb4c0bd43c2083ffe8b74c0 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:03:57 2024 +0800 change okvqa yaml commit b9d9f9896993033b92346e9f47420c55b866c715 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:55:40 2024 +0800 change yaml commit 4256bef410e4c8d8761e0cd0d79ac5e57b97651b Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:42:43 2024 +0800 add okvqa task commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * 
add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Squashed commit of the following: commit 0c8a3919885b8fe2880bb2892f7a619d060012d1 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:06:02 2024 +0800 change ocr reference commit d2bc7c92ac61179b8c4031e11bc31970355252f6 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:05:46 2024 +0800 revert example_eval commit c78fa29cd0d161641ee05db57bd39314b998c8c7 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:17:28 2024 +0800 edit vizwiz utils commit 397f0906968fd8ba04b883469b96217737c43e09 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:49:47 2024 +0800 reorganize __init__ commit 52a7ea6c7599adeec2ac2787f500e215ce47cf79 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:46:20 2024 +0800 minor fixes commit f706b2aaf9b288c582611191a1841b58feaeb741 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 17:41:03 2024 +0800 add vizwizvqa eval rask commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Refactor mathvista.yaml and utils.py * Add gpt_eval_score to mathvista_process_results * Refactor mathvista_aggregate_results to return average accuracy score * Fix refcoco evaluation error * Fix evaluation problem for refcoco+/g * Refactor mathvista.yaml and mathvista_evals.py * Add dependencies and update YAML files * Refactor mmbench_en/utils.py to save test results to separate Excel file * Fix caption task prompt * Add group field to mmbench_en_test and mmbench_en_val yaml files * Delete mmbench_en_val.yaml file * Update mmbench_cn.yaml and mmbench_cn_test.yaml * Update mmbench_cn_val.yaml and utils.py * 
Remove unused fields in mmbench_cn_cc_process_results function * Update aggregation function for mmbench_en_dev.yaml * Fix capitalization of L2-category key in utils.py * Fix variable name in mmbench_process_results function * Delete mmbench_cn_val.yaml file * Update mathvista_test.yaml and mathvista_testmini.yaml * Fix warnings and update mathvista.yaml * Remove system message from MathVistaEvaluator * Update GPT model version in MathVistaEvaluator constructor * Update GQA_RAW_IMAGE_DATASET path in utils.py * change vizwiz to test set * Add split flag to mathvista_aggregate_results function * Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml files * Add download configuration for dataset * Update GQA_RAW_IMAGE_DATASET path in utils.py * add datasets * Update gpt_eval_model_name in mathvista.yaml * Merge commit '0d620f98b49f8204d02633f209eedd5d8b7a1f7c' * Update pyproject.toml with dependencies and URLs * Squashed commit of the following: commit 8b600f55b6cf5627504c407871539db59f6085a3 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Sat Jan 27 13:56:37 2024 +0800 Dev/add chartqa and ai2d (#23) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Refactor image processing and submission file path * Refactor directory creation logic in cli_evaluate_single function * Update dataset path and test split in vqav2.yaml * Remove "total" column from cap_details_columns DataFrame * 
Add retry logic for dataset download * Add 'tenacity' to dependencies in pyproject.toml * Refactor code in ContextSampler class * Update Black version and configuration, and improve code readability in ContextSampler * Update Black version and line length --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg> * [Datasets] Changes for Flickr30K and NoCaps, also merged Peiyuan's Model Specific Prompt. (#20) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * 
Align prompt with NoCaps commit 4bf0504fabc3b62f356c467b2fd1119083d27313 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model 
specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Fix cli itself can not run with config file * Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability * Refactor get_task_dict function to handle nested groups * Add submission file for coco, flickr30k, nocaps, and textcaps tasks * Remove unused files and update task configuration * Fix tasks issue for nocaps, refcoco/+/g * Fix file path and raise error if config file does not exist * Exclude train in refcoco/+/g config * Solve doc_iterator_for_counting crashing issue * Black lint * Refactor code to improve performance and readability * Squashed commit of the following: commit a2cc9303dc72e4d53983bb56e54a32e977c3e270 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:03:57 2024 +0800 change okvqa yaml commit 35e87e7c7a480d005abf607c2527a35457d92311 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:55:40 2024 +0800 change yaml commit 89755323596b85208ed33aa88c296604a39af6eb Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:42:43 2024 +0800 add okvqa task commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Squashed commit of the following: commit 0b0d30dfb247c5f0b7b68398b9e9fcde74cf7fa2 Author: JvThunder 
<joshuaadrianc@gmail.com> Date: Fri Jan 26 01:06:02 2024 +0800 change ocr reference commit e273f9cbd91540df86bdbc652bff88a847bd0d2d Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:05:46 2024 +0800 revert example_eval commit e84126aaaf8a07bd371a0571a914ccbcd3697f20 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:17:28 2024 +0800 edit vizwiz utils commit 110deab53dc1a2fd349b1872cd261b69074c5fa8 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:49:47 2024 +0800 reorganize __init__ commit 0fa3e0c40075997ea80ed976bdee9615f17d3ece Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:46:20 2024 +0800 minor fixes commit 2aaca579120def99860f90054233f3358950fa66 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 17:41:03 2024 +0800 add vizwizvqa eval rask commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Refactor mathvista.yaml and utils.py * Add gpt_eval_score to mathvista_process_results * Refactor mathvista_aggregate_results to return average accuracy score * Fix refcoco evaluation error * Fix evaluation problem for refcoco+/g * Refactor mathvista.yaml and mathvista_evals.py * Add dependencies and update YAML files * Refactor mmbench_en/utils.py to save test results to separate Excel file * Fix caption task prompt * Add group field to mmbench_en_test and mmbench_en_val yaml files * Delete mmbench_en_val.yaml file * Update mmbench_cn.yaml and mmbench_cn_test.yaml * Update mmbench_cn_val.yaml and utils.py * Remove unused fields in mmbench_cn_cc_process_results function * Update aggregation function for mmbench_en_dev.yaml * Fix capitalization of L2-category key in utils.py * Fix variable name in 
mmbench_process_results function * Delete mmbench_cn_val.yaml file * Update mathvista_test.yaml and mathvista_testmini.yaml * Fix warnings and update mathvista.yaml * Remove system message from MathVistaEvaluator * Update GPT model version in MathVistaEvaluator constructor * Update GQA_RAW_IMAGE_DATASET path in utils.py * change vizwiz to test set * Add split flag to mathvista_aggregate_results function * Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml files * Add download configuration for dataset * Update GQA_RAW_IMAGE_DATASET path in utils.py * add datasets * Update gpt_eval_model_name in mathvista.yaml * Merge commit '817eb057bcb61226b33d3ac3c8def01c36c90f96' * Update pyproject.toml with dependencies and URLs * Squashed commit of the following: commit f253968ad703f682a29317bdd51ec6c1fd7c5465 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Sat Jan 27 13:56:37 2024 +0800 Dev/add chartqa and ai2d (#23) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Refactor image processing and submission file path * Refactor directory creation logic in cli_evaluate_single function * Update dataset path and test split in vqav2.yaml * Remove "total" column from cap_details_columns DataFrame * Add retry logic for dataset download * Add 'tenacity' to dependencies in pyproject.toml * Refactor code in ContextSampler class * Update Black version and configuration, and improve code 
readability in ContextSampler * Update Black version and line length --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg> * vqav2 (#25) * Update tqdm progress bar position * Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' * Squashed commit of the following: commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 95ef3ea519cbd772924f9a6afa5394979eb00432 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * remove useless output file * Update dataset path in vqav2.yaml * Squashed commit of the following: commit 75bb7043ea5a533ab6351fc0f5ab055e86106423 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:56:45 2024 +0800 Black lint commit 6635a8aa34cfbd3c7a4afb6fcd214a7283ce01cb Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:47 2024 +0800 Solve doc_iterator_for_counting crashing issue commit 080f42b88ea8acacd527b8d67b84ba1d7d135b03 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 
09:55:13 2024 +0800 Exclude train in refcoco/+/g config commit 4da84069c08c95e49e8ab0e64a1e103ff7ac8730 Merge: 6a1ae69 697a438 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:13 2024 +0000 Merge branch 'dev/bli_add_datasets' of https://github.com/EvolvingLMMs-Lab/lmms-eval into dev/bli_add_datasets commit 6a1ae69923d79ae32a001edac38206b605274ec3 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:06 2024 +0000 Fix file path and raise error if config file does not exist commit 697a4387827ceeec3e393237dd1baa217c714c88 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 00:47:24 2024 +0800 Fix tasks issue for nocaps, refcoco/+/g commit 47e40437126d39a5f062c9a33b4de426c1a29804 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 10:09:43 2024 +0000 Remove unused files and update task configuration commit 9976eb8e9ed03c8613725fdbd822ef5d8cf70e47 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:43:56 2024 +0000 Add submission file for coco, flickr30k, nocaps, and textcaps tasks commit 95f97a69faa6129676e89eee14960fcfe2076b7c Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:32:54 2024 +0000 Refactor get_task_dict function to handle nested groups commit 3b79ee842b2488714baf92ab34528ef77989d392 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:13:46 2024 +0000 Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability commit f5c353f2ce93a2d96add4312b695b57432f68cbb Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 17:07:20 2024 +0800 Fix cli itself can not run with config file commit 9a68fec37be74cfe8d4a73390bc83edee147ae24 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:09:04 
2024 +0000 Squashed commit of the following: commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 93f847c5851fd246716367935d6b807b17d53949 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:02:57 2024 +0000 Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit fa4ad4404e26d8924f55208746dbb9143b464011 Merge: 22c3adf 1d3fdd4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:43:15 2024 +0000 Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' into dev/bli_add_datasets commit 
22c3adfd0645acc23b6d7c06b487f4ffd47666c4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:52 2024 +0000 Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit 4c712336b6f7438e717a865910bb241e413a4688 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:11 2024 +0000 
Add coco_val and coco_test tasks to coco.yaml commit b5547126c855927fd4dc8384211e4aceee40870f Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 04:58:28 2024 +0000 Update dataset_path in flickr30k.yaml commit f786f61e2559f082072f21aa9030e2080ddaf809 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:12:25 2024 +0000 Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' commit 796a011000e0df90f66f8e80cb34dc2318ae9ac8 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:10:18 2024 +0000 Add submission folder and update file paths for storing prediction results commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 118744c63eb2d9724571d85fbbd85fcc9ad05b59 Merge: c6370bf 51f2eaa Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:07 2024 +0800 Merge branch 'main' into dev/bli_add_datasets commit c6370bff65903681f00cf3d07111d8e15a57b619 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 14:08:06 2024 +0000 Update dataset paths and improve user prompts commit 810daf458fa94cb3ec2b4a6cc5ecb1e656a24002 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 11:52:33 2024 +0000 Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' commit 95ef3ea519cbd772924f9a6afa5394979eb00432 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * Fix bug in login functionality * create vqav2_val * Update vqav2_test.yaml * Update vqav2_test.yaml * Update vqav2_val.yaml --------- 
Co-authored-by: Li Bo <drluodian@gmail.com> * vizwiz dataset (#24) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and
improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts i…
* Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * Add coco, refcoco support * Fix doc_to_visual error * Fix segmentation mask error * Add refcoco+, refcocog * Remove debug code * black lint * Remove unused code and scripts * Fix group stderr N/A error between str and int * Fix letter case issue * Update lmms_eval tasks and utils * Fix coco test_split name * Add llava-bench-in-the-wild support * Black codestyle, lint * Add COCO evaluation metric * Add refcoco, refcocog, refcoco+ evaluation kit * Add llava bench coco support --------- Co-authored-by: Bo Li <drluodian@gmail.com> * VQAv2 eval (#4) * vqav2 * Add vqav2_process_results function and update vqav2_doc_to_text function * Implement vqav2_process_results function to return exact match score * Refactor fewshot_docs() to use config.fewshot_config * Refactor Task class to handle fewshot_docs when training and validation docs are not available * Add answer processing logic in vqav2_process_results function * Refactor vqav2_process_results function and add submission aggregation * Add vqav2_aggreate_submissions function to utils.py * textvqa * Refactor answer processing in textvqa_process_results() function * textvqa eval * Update dataset path and modify textvqa_doc_to_text function * Capitalize the question in textvqa_doc_to_text function * Update textvqa.yaml and utils.py * Fix formatting issues in lmms_eval/api/task.py, lmms_eval/tasks/gqa/utils.py, lmms_eval/tasks/textvqa/utils.py, and lmms_eval/tasks/vqav2/utils.py --------- Co-authored-by: Li Bo <drluodian@gmail.com> * [Big Changes] add LLaVA-1.6, MMVet, LLaVA-W, POPE, and many other changes on logs, model args. 
(#7) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * black lint * Remove unused code and scripts * Update lmms_eval tasks and utils * Update LMMS-Eval dependencies and configurations * Squashed commit of the following: commit 209f3904f33210bec0b4b146e96fcbd67a4e1541 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Wed Jan 17 20:27:13 2024 +0800 Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * Add coco, refcoco support * Fix doc_to_visual error * Fix segmentation mask error * Add refcoco+, refcocog * Remove debug code * black lint * Remove unused code and scripts * Fix group stderr N/A error between str and int * Fix letter case issue * Update lmms_eval tasks and utils * Fix coco test_split name * Add llava-bench-in-the-wild support * Black codestyle, lint * Add COCO evaluation metric * Add refcoco, refcocog, refcoco+ evaluation kit * Add llava bench coco support --------- Co-authored-by: Bo Li <drluodian@gmail.com> commit f9e48cec5493010a363b446b81a335ef1484e42f Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 17 20:26:58 2024 +0800 Update utils.py (#6) * Fix logging issue and remove unnecessary whitespace * Add openai and pycocoevalcap dependencies * Fix device mapping issue in Llava constructor * Add support for truncating context in generation * Update Llava model and evaluation configuration * Update YAML configuration files * Update YAML configuration files * add otterhd and gemini models * Add support for custom image aspect ratio in Llava model * Add dataset_kwargs and max_gen_toks to YAML files * Fix log_samples suffix typo and use hash for output name * Refactor LMMS evaluation code and update LLAVA model properties * matched response for mistral-llava * Refactor logging in llava_aggregation function * Print evaluation statistics instead of logging them * Fix logging information in llava_aggregation 
function * Add new models and dataset_kwargs for COCO tasks * Update truncate_context parameter in Llava class constructor * Update dataset_kwargs in YAML files * Remove issue type tags from issue and pull request templates * add mmvet and try to modify llava arch * black lint * Update lmms_eval tasks and utils * Update LMMS-Eval dependencies and configurations * Squashed commit of the following: commit 209f3904f33210bec0b4b146e96fcbd67a4e1541 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Wed Jan 17 20:27:13 2024 +0800 Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * Add coco, refcoco support * Fix doc_to_visual error * Fix segmentation mask error * Add refcoco+, refcocog * Remove debug code * black lint * Remove unused code and scripts * Fix group stderr N/A error between str and int * Fix letter case issue * Update lmms_eval tasks and utils * Fix coco test_split name * Add llava-bench-in-the-wild support * Black codestyle, lint * Add COCO evaluation metric * Add refcoco, refcocog, refcoco+ evaluation kit * Add llava bench coco support --------- Co-authored-by: Bo Li <drluodian@gmail.com> commit f9e48cec5493010a363b446b81a335ef1484e42f Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 17 20:26:58 2024 +0800 Update utils.py (#6) * Fix logging issue and remove unnecessary whitespace * Add openai and pycocoevalcap dependencies * Fix device mapping issue in Llava constructor * Add support for truncating context in generation * Update Llava model and evaluation configuration * Update YAML configuration files * Update YAML configuration files * add otterhd and gemini models * Add support for custom image aspect ratio in Llava model * Add dataset_kwargs and max_gen_toks to YAML files * Fix log_samples suffix typo and use hash for output name * Refactor LMMS evaluation code and update LLAVA model properties * matched response for mistral-llava * Refactor 
logging in llava_aggregation function * Print evaluation statistics instead of logging them * Fix logging information in llava_aggregation function * Add new models and dataset_kwargs for COCO tasks * Update truncate_context parameter in Llava class constructor * Update dataset_kwargs in YAML files * Remove issue type tags from issue and pull request templates * Refactor pope utils functions * Update transformers dependency to version 4.36.2 * Revise llava-in-the-wild prompt for align * Add default values for gen_kwargs in Llava class * Fix formatting issues and import pdb for debugging * Remove pdb.set_trace() and update default value for max_new_tokens * Add llava loglikelihood * Fix formatting and indentation issues in lmms_eval/api/metrics.py and lmms_eval/models/llava.py * Update function to handle edge cases This commit updates the function to handle edge cases, improving the overall reliability and robustness of the code. * Update black version in pre-commit config * Remove duplicate lines in gqa * Another way to solve memory issue * Handle exception in model generation * Refactor pope_aggregate_results to use "score" key instead of "pope_accuracy" * Update pope metrics aggregation functions * Add model_to_prompt in pope.yaml * Update pope.yaml configuration * Refactor code to simplify construct_requests call --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> * Add datetime to output name in cli_evaluate function Add get_datetime_str function to utils.py * Refactor pope_aggregate_f1_score function * Fix datetime format in get_datetime_str function * Update JSON dump indentation in cli_evaluate function * Add datetime to output name in cli_evaluate function (#10) * Revert "Add datetime to output name in cli_evaluate function" This reverts commit ef26f78c46b50d8769a4fb6990b909162c2881c3. * Add datetime to output name in cli_evaluate function * [Datasets] Added POPE and Aligned. 
(#11) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * [Dataset] Add SEED-Bench, TextCaps, NoCaps (#12) * Change coco from print to logger * Add llava loglikelihood * Add Nocaps support * Fix pass through function * Add textcaps support * Fix textcaps eval image_id * Add seedbench support * Add seedbench ppl evaluation * black lint * [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in
gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * add mmmu (#15) * add mmme * black * add mmmu (#15) * add mmme * black * [Memory issue] Solve memory issue for building context (#14) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * Remove unused function llava_aggregation * Refractor llava-bench aggregation code * Add logs and scripts to .gitignore, and set image_aspect_ratio to original in scienceqa.yaml * Update generation parameters in scienceqa.yaml * Solve memory issue for building context * Solved gather result error * Update lmms_eval scienceqa_img config * Fixed nocaps store results * Revise seedbench prompt * Squashed commit of the following: commit 290126e6a269db4cca9b3544bd017d6c17012793 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 24 14:07:36 2024 +0800 add mmmu (#15) * add mmme * black commit 8b0227cd7b2602d096d773a01b2199d1f4110f22 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 10:00:33 2024 +0800 [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function 
to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions commit fec494dbe5971e8fa5a886b191a4781be3ce7a6f Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Tue Jan 23 19:17:40 2024 +0800 [Dataset] Add SEED-Bench, TextCaps, NoCaps (#12) * Change coco from print to logger * Add llava loglikelihood * Add Nocaps support * Fix pass through function * Add textcaps support * Fix textcaps eval image_id * Add seedbench support * Add seedbench ppl evaluation * black lint commit 4c3c2c63a681f29c537c2467957de1a90568748d Author: Li Bo <drluodian@gmail.com> Date: Tue Jan 23 19:17:12 2024 +0800 [Datasets] Added POPE and Aligned. 
following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit 4c712336b6f7438e717a865910bb241e413a4688 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:11 2024 +0000 Add coco_val and coco_test tasks to coco.yaml commit b5547126c855927fd4dc8384211e4aceee40870f Author: Bo Li <drluodian@gmail.com> Date: Thu 
Jan 25 04:58:28 2024 +0000 Update dataset_path in flickr30k.yaml commit f786f61e2559f082072f21aa9030e2080ddaf809 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:12:25 2024 +0000 Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' commit 796a011000e0df90f66f8e80cb34dc2318ae9ac8 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:10:18 2024 +0000 Add submission folder and update file paths for storing prediction results commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 118744c63eb2d9724571d85fbbd85fcc9ad05b59 Merge: c6370bf 51f2eaa Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:07 2024 +0800 Merge branch 'main' into dev/bli_add_datasets commit c6370bff65903681f00cf3d07111d8e15a57b619 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 14:08:06 2024 +0000 Update dataset paths and improve user prompts commit 810daf458fa94cb3ec2b4a6cc5ecb1e656a24002 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 11:52:33 2024 +0000 Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' commit 95ef3ea519cbd772924f9a6afa5394979eb00432 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * Fix bug in login functionality * create vqav2_val * Update vqav2_test.yaml * Update vqav2_test.yaml * Update vqav2_val.yaml --------- Co-authored-by: Li Bo <drluodian@gmail.com> * vizwiz dataset (#24) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and 
improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d407…
Add retry logic for dataset download * Add 'tenacity' to dependencies in pyproject.toml * Refactor code in ContextSampler class * Update Black version and configuration, and improve code readability in ContextSampler * Update Black version and line length --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg> * [Datasets] Changes for Flickr30K and NoCaps, also merged Peiyuan's Model Specific Prompt. (#20) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * 
Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model 
specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Fix cli itself can not run with config file * Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability * Refactor get_task_dict function to handle nested groups * Add submission file for coco, flickr30k, nocaps, and textcaps tasks * Remove unused files and update task configuration * Fix tasks issue for nocaps, refcoco/+/g * Fix file path and raise error if config file does not exist * Exclude train in refcoco/+/g config * Solve doc_iterator_for_counting crashing issue * Black lint * Refactor code to improve performance and readability * Squashed commit of the following: commit 0df825c9e72a06e6acb4c0bd43c2083ffe8b74c0 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:03:57 2024 +0800 change okvqa yaml commit b9d9f9896993033b92346e9f47420c55b866c715 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:55:40 2024 +0800 change yaml commit 4256bef410e4c8d8761e0cd0d79ac5e57b97651b Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:42:43 2024 +0800 add okvqa task commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Squashed commit of the following: commit 0c8a3919885b8fe2880bb2892f7a619d060012d1 Author: JvThunder 
<joshuaadrianc@gmail.com> Date: Fri Jan 26 01:06:02 2024 +0800 change ocr reference commit d2bc7c92ac61179b8c4031e11bc31970355252f6 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:05:46 2024 +0800 revert example_eval commit c78fa29cd0d161641ee05db57bd39314b998c8c7 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:17:28 2024 +0800 edit vizwiz utils commit 397f0906968fd8ba04b883469b96217737c43e09 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:49:47 2024 +0800 reorganize __init__ commit 52a7ea6c7599adeec2ac2787f500e215ce47cf79 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:46:20 2024 +0800 minor fixes commit f706b2aaf9b288c582611191a1841b58feaeb741 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 17:41:03 2024 +0800 add vizwizvqa eval rask commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Refactor mathvista.yaml and utils.py * Add gpt_eval_score to mathvista_process_results * Refactor mathvista_aggregate_results to return average accuracy score * Fix refcoco evaluation error * Fix evaluation problem for refcoco+/g * Refactor mathvista.yaml and mathvista_evals.py * Add dependencies and update YAML files * Refactor mmbench_en/utils.py to save test results to separate Excel file * Fix caption task prompt * Add group field to mmbench_en_test and mmbench_en_val yaml files * Delete mmbench_en_val.yaml file * Update mmbench_cn.yaml and mmbench_cn_test.yaml * Update mmbench_cn_val.yaml and utils.py * Remove unused fields in mmbench_cn_cc_process_results function * Update aggregation function for mmbench_en_dev.yaml * Fix capitalization of L2-category key in utils.py * Fix variable name in 
mmbench_process_results function * Delete mmbench_cn_val.yaml file * Update mathvista_test.yaml and mathvista_testmini.yaml * Fix warnings and update mathvista.yaml * Remove system message from MathVistaEvaluator * Update GPT model version in MathVistaEvaluator constructor * Update GQA_RAW_IMAGE_DATASET path in utils.py * change vizwiz to test set * Add split flag to mathvista_aggregate_results function * Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml files * Add download configuration for dataset * Update GQA_RAW_IMAGE_DATASET path in utils.py * add datasets * Update gpt_eval_model_name in mathvista.yaml * Merge commit '0d620f98b49f8204d02633f209eedd5d8b7a1f7c' * Update pyproject.toml with dependencies and URLs * Squashed commit of the following: commit 8b600f55b6cf5627504c407871539db59f6085a3 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Sat Jan 27 13:56:37 2024 +0800 Dev/add chartqa and ai2d (#23) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Refactor image processing and submission file path * Refactor directory creation logic in cli_evaluate_single function * Update dataset path and test split in vqav2.yaml * Remove "total" column from cap_details_columns DataFrame * Add retry logic for dataset download * Add 'tenacity' to dependencies in pyproject.toml * Refactor code in ContextSampler class * Update Black version and configuration, and improve code 
readability in ContextSampler * Update Black version and line length --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg> * vqav2 (#25) * Update tqdm progress bar position * Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' * Squashed commit of the following: commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504fabc3b62f356c467b2fd1119083d27313 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit 767f7e2cae60cf67ec5878234d84321395a3ed15 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * remove useless output file * Update dataset path in vqav2.yaml * Squashed commit of the following: commit eeb2b9827502f044ef67d8440f53124baf219ba3 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:56:45 2024 +0800 Black lint commit 1ce9f0b37e4bc5e6ff5fbfcd23fd339eb14974ae Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:47 2024 +0800 Solve doc_iterator_for_counting crashing issue commit e12b3bb41ed4f51540cfac84e5e96d15777540c4 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 
09:55:13 2024 +0800 Exclude train in refcoco/+/g config commit 42c56f82bc4ccae12e19e76d09d7e525ca9ef2f4 Merge: 6a1ae69 697a438 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:13 2024 +0000 Merge branch 'dev/bli_add_datasets' of https://github.com/EvolvingLMMs-Lab/lmms-eval into dev/bli_add_datasets commit aed08303fe87808986d206540a0c0ee6d8764988 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:06 2024 +0000 Fix file path and raise error if config file does not exist commit a105386613c443d9e740c89725cbd1281bbdfef6 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 00:47:24 2024 +0800 Fix tasks issue for nocaps, refcoco/+/g commit 21c8119e377760f44c769bed2528d863a8f4333b Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 10:09:43 2024 +0000 Remove unused files and update task configuration commit 0ccb2629c2aacdb297b7cf0c9c2bcfa386bb7582 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:43:56 2024 +0000 Add submission file for coco, flickr30k, nocaps, and textcaps tasks commit 5365e13e93c702a1e0e259ee6a08d6a427d72470 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:32:54 2024 +0000 Refactor get_task_dict function to handle nested groups commit 6773348c807bcfa1b09ceffc90c75e15cad908f7 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:13:46 2024 +0000 Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability commit 31140f9c87dea89ca94c94bc850e3a8d43e5f8b4 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 17:07:20 2024 +0800 Fix cli itself can not run with config file commit df1bad47f6ed13f94848d2bee29b28e00c2384b2 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:09:04 
2024 +0000 Squashed commit of the following: commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 06383aa4a5ff59db52fc8d584f3086efd88b7e74 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:02:57 2024 +0000 Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit 7a71fd6022ee5985100dda38b94956595cec77a5 Merge: 22c3adf 1d3fdd4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:43:15 2024 +0000 Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' into dev/bli_add_datasets commit 
6870cba13cb54976480c1d5e8d97602c246f881b Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:52 2024 +0000 Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504fabc3b62f356c467b2fd1119083d27313 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit b40d522b6bf483ebdfbf5facd4573de0cf8a93f6 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:11 2024 +0000 
Add coco_val and coco_test tasks to coco.yaml commit 5bf643f73d06f1e540897b753450352bb92fd9ec Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 04:58:28 2024 +0000 Update dataset_path in flickr30k.yaml commit 95f110f0eef5196205bc501367e3642c57cc7a17 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:12:25 2024 +0000 Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' commit c844ae49b18c1334711832208b0359c9439fe1c0 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:10:18 2024 +0000 Add submission folder and update file paths for storing prediction results commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504fabc3b62f356c467b2fd1119083d27313 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit f0446227f0dd93651e9d6c06254bbf5212ede2dd Merge: c6370bf 51f2eaa Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:07 2024 +0800 Merge branch 'main' into dev/bli_add_datasets commit 1e1f6cfccba758dc606fa4217102518fab73c936 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 14:08:06 2024 +0000 Update dataset paths and improve user prompts commit 966933754b9e5179995b3ab41d746603e13e75c6 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 11:52:33 2024 +0000 Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' commit 767f7e2cae60cf67ec5878234d84321395a3ed15 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * Fix bug in login functionality * create vqav2_val * Update vqav2_test.yaml * Update vqav2_test.yaml * Update vqav2_val.yaml --------- 
Co-authored-by: Li Bo <drluodian@gmail.com> * vqav2 (#25) * Update tqdm progress bar position * Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' * Squashed commit of the following: commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 95ef3ea519cbd772924f9a6afa5394979eb00432 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * remove useless output file * Update dataset path in vqav2.yaml * Squashed commit of the following: commit 75bb7043ea5a533ab6351fc0f5ab055e86106423 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:56:45 2024 +0800 Black lint commit 6635a8aa34cfbd3c7a4afb6fcd214a7283ce01cb Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:47 2024 +0800 Solve doc_iterator_for_counting crashing issue commit 080f42b88ea8acacd527b8d67b84ba1d7d135b03 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:13 2024 +0800 Exclude train in refcoco/+/g config commit 4da84069c08c95e49e8ab0e64a1e103ff7ac8730 Merge: 6a1ae69 697a438 Author: Bo Li 
<drluodian@gmail.com> Date: Thu Jan 25 17:17:13 2024 +0000 Merge branch 'dev/bli_add_datasets' of https://github.com/EvolvingLMMs-Lab/lmms-eval into dev/bli_add_datasets commit 6a1ae69923d79ae32a001edac38206b605274ec3 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:06 2024 +0000 Fix file path and raise error if config file does not exist commit 697a4387827ceeec3e393237dd1baa217c714c88 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 00:47:24 2024 +0800 Fix tasks issue for nocaps, refcoco/+/g commit 47e40437126d39a5f062c9a33b4de426c1a29804 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 10:09:43 2024 +0000 Remove unused files and update task configuration commit 9976eb8e9ed03c8613725fdbd822ef5d8cf70e47 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:43:56 2024 +0000 Add submission file for coco, flickr30k, nocaps, and textcaps tasks commit 95f97a69faa6129676e89eee14960fcfe2076b7c Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:32:54 2024 +0000 Refactor get_task_dict function to handle nested groups commit 3b79ee842b2488714baf92ab34528ef77989d392 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:13:46 2024 +0000 Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability commit f5c353f2ce93a2d96add4312b695b57432f68cbb Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 17:07:20 2024 +0800 Fix cli itself can not run with config file commit 9a68fec37be74cfe8d4a73390bc83edee147ae24 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:09:04 2024 +0000 Squashed commit of the following: commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> 
Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 93f847c5851fd246716367935d6b807b17d53949 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:02:57 2024 +0000 Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit fa4ad4404e26d8924f55208746dbb9143b464011 Merge: 22c3adf 1d3fdd4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:43:15 2024 +0000 Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' into dev/bli_add_datasets commit 22c3adfd0645acc23b6d7c06b487f4ffd47666c4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:52 2024 +0000 Squashed commit of the 
following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit 4c712336b6f7438e717a865910bb241e413a4688 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:11 2024 +0000 Add coco_val and coco_test tasks to coco.yaml commit b5547126c855927fd4dc8384211e4aceee40870f Author: Bo Li <drluodian@gmail.com> Date: Thu 
Jan 25 04:58:28 2024 +0000 Update dataset_path in flickr30k.yaml commit f786f61e2559f082072f21aa9030e2080ddaf809 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:12:25 2024 +0000 Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' commit 796a011000e0df90f66f8e80cb34dc2318ae9ac8 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:10:18 2024 +0000 Add submission folder and update file paths for storing prediction results commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 118744c63eb2d9724571d85fbbd85fcc9ad05b59 Merge: c6370bf 51f2eaa Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:07 2024 +0800 Merge branch 'main' into dev/bli_add_datasets commit c6370bff65903681f00cf3d07111d8e15a57b619 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 14:08:06 2024 +0000 Update dataset paths and improve user prompts commit 810daf458fa94cb3ec2b4a6cc5ecb1e656a24002 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 11:52:33 2024 +0000 Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' commit 95ef3ea519cbd772924f9a6afa5394979eb00432 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * Fix bug in login functionality * create vqav2_val * Update vqav2_test.yaml * Update vqav2_test.yaml * Update vqav2_val.yaml --------- Co-authored-by: Li Bo <drluodian@gmail.com> * vizwiz dataset (#24) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and 
improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752…
* Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * Add coco, refcoco support * Fix doc_to_visual error * Fix segmentation mask error * Add refcoco+, refcocog * Remove debug code * black lint * Remove unused code and scripts * Fix group stderr N/A error between str and int * Fix letter case issue * Update lmms_eval tasks and utils * Fix coco test_split name * Add llava-bench-in-the-wild support * Black codestyle, lint * Add COCO evaluation metric * Add refcoco, refcocog, refcoco+ evaluation kit * Add llava bench coco support --------- Co-authored-by: Bo Li <drluodian@gmail.com> * VQAv2 eval (#4) * vqav2 * Add vqav2_process_results function and update vqav2_doc_to_text function * Implement vqav2_process_results function to return exact match score * Refactor fewshot_docs() to use config.fewshot_config * Refactor Task class to handle fewshot_docs when training and validation docs are not available * Add answer processing logic in vqav2_process_results function * Refactor vqav2_process_results function and add submission aggregation * Add vqav2_aggreate_submissions function to utils.py * textvqa * Refactor answer processing in textvqa_process_results() function * textvqa eval * Update dataset path and modify textvqa_doc_to_text function * Capitalize the question in textvqa_doc_to_text function * Update textvqa.yaml and utils.py * Fix formatting issues in lmms_eval/api/task.py, lmms_eval/tasks/gqa/utils.py, lmms_eval/tasks/textvqa/utils.py, and lmms_eval/tasks/vqav2/utils.py --------- Co-authored-by: Li Bo <drluodian@gmail.com> * [Big Changes] add LLaVA-1.6, MMVet, LLaVA-W, POPE, and many other changes on logs, model args. 
(#7) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * black lint * Remove unused code and scripts * Update lmms_eval tasks and utils * Update LMMS-Eval dependencies and configurations * Squashed commit of the following: commit 209f3904f33210bec0b4b146e96fcbd67a4e1541 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Wed Jan 17 20:27:13 2024 +0800 Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * Add coco, refcoco support * Fix doc_to_visual error * Fix segmentation mask error * Add refcoco+, refcocog * Remove debug code * black lint * Remove unused code and scripts * Fix group stderr N/A error between str and int * Fix letter case issue * Update lmms_eval tasks and utils * Fix coco test_split name * Add llava-bench-in-the-wild support * Black codestyle, lint * Add COCO evaluation metric * Add refcoco, refcocog, refcoco+ evaluation kit * Add llava bench coco support --------- Co-authored-by: Bo Li <drluodian@gmail.com> commit f9e48cec5493010a363b446b81a335ef1484e42f Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 17 20:26:58 2024 +0800 Update utils.py (#6) * Fix logging issue and remove unnecessary whitespace * Add openai and pycocoevalcap dependencies * Fix device mapping issue in Llava constructor * Add support for truncating context in generation * Update Llava model and evaluation configuration * Update YAML configuration files * Update YAML configuration files * add otterhd and gemini models * Add support for custom image aspect ratio in Llava model * Add dataset_kwargs and max_gen_toks to YAML files * Fix log_samples suffix typo and use hash for output name * Refactor LMMS evaluation code and update LLAVA model properties * matched response for mistral-llava * Refactor logging in llava_aggregation function * Print evaluation statistics instead of logging them * Fix logging information in llava_aggregation 
function * Add new models and dataset_kwargs for COCO tasks * Update truncate_context parameter in Llava class constructor * Update dataset_kwargs in YAML files * Remove issue type tags from issue and pull request templates * add mmvet and try to modify llava arch * black lint * Update lmms_eval tasks and utils * Update LMMS-Eval dependencies and configurations * Squashed commit of the following: commit 209f3904f33210bec0b4b146e96fcbd67a4e1541 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Wed Jan 17 20:27:13 2024 +0800 Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * Add coco, refcoco support * Fix doc_to_visual error * Fix segmentation mask error * Add refcoco+, refcocog * Remove debug code * black lint * Remove unused code and scripts * Fix group stderr N/A error between str and int * Fix letter case issue * Update lmms_eval tasks and utils * Fix coco test_split name * Add llava-bench-in-the-wild support * Black codestyle, lint * Add COCO evaluation metric * Add refcoco, refcocog, refcoco+ evaluation kit * Add llava bench coco support --------- Co-authored-by: Bo Li <drluodian@gmail.com> commit f9e48cec5493010a363b446b81a335ef1484e42f Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 17 20:26:58 2024 +0800 Update utils.py (#6) * Fix logging issue and remove unnecessary whitespace * Add openai and pycocoevalcap dependencies * Fix device mapping issue in Llava constructor * Add support for truncating context in generation * Update Llava model and evaluation configuration * Update YAML configuration files * Update YAML configuration files * add otterhd and gemini models * Add support for custom image aspect ratio in Llava model * Add dataset_kwargs and max_gen_toks to YAML files * Fix log_samples suffix typo and use hash for output name * Refactor LMMS evaluation code and update LLAVA model properties * matched response for mistral-llava * Refactor 
logging in llava_aggregation function * Print evaluation statistics instead of logging them * Fix logging information in llava_aggregation function * Add new models and dataset_kwargs for COCO tasks * Update truncate_context parameter in Llava class constructor * Update dataset_kwargs in YAML files * Remove issue type tags from issue and pull request templates * Refactor pope utils functions * Update transformers dependency to version 4.36.2 * Revise llava-in-the-wild prompt for align * Add default values for gen_kwargs in Llava class * Fix formatting issues and import pdb for debugging * Remove pdb.set_trace() and update default value for max_new_tokens * Add llava loglikelihood * Fix formatting and indentation issues in lmms_eval/api/metrics.py and lmms_eval/models/llava.py * Update function to handle edge cases This commit updates the function to handle edge cases, improving the overall reliability and robustness of the code. * Update black version in pre-commit config * Remove duplicate lines in gqa * Another way to solve memory issue * Handle exception in model generation * Refactor pope_aggregate_results to use "score" key instead of "pope_accuracy" * Update pope metrics aggregation functions * Add model_to_prompt in pope.yaml * Update pope.yaml configuration * Refactor code to simplify construct_requests call --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> * Add datetime to output name in cli_evaluate function Add get_datetime_str function to utils.py * Refactor pope_aggregate_f1_score function * Fix datetime format in get_datetime_str function * Update JSON dump indentation in cli_evaluate function * Add datetime to output name in cli_evaluate function (#10) * Revert "Add datetime to output name in cli_evaluate function" This reverts commit ef26f78c46b50d8769a4fb6990b909162c2881c3. * Add datetime to output name in cli_evaluate function * [Datasets] Added POPE and Aligned. 
(#11) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * [Dataset] Add SEED-Bench, TextCaps, NoCaps (#12) * Change coco from print to logger * Add llava loglikelihood * Add Nocaps support * Fix pass through function * Add textcaps support * Fix textcaps eval image_id * Add seedbench support * Add seedbench ppl evaluation * black lint * [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in 
gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * add mmmu (#15) * add mmme * black * add mmmu (#15) * add mmme * black * [Memory issue] Solve memory issue for building context (#14) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * Remove unused function llava_aggregation * Refractor llava-bench aggregation code * Add logs and scripts to .gitignore, and set image_aspect_ratio to original in scienceqa.yaml * Update generation parameters in scienceqa.yaml * Solve memory issue for building context * Solved gather result error * Update lmms_eval scienceqa_img config * Fixed nocaps store results * Revise seedbench prompt * Squashed commit of the following: commit 290126e6a269db4cca9b3544bd017d6c17012793 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 24 14:07:36 2024 +0800 add mmmu (#15) * add mmme * black commit 8b0227cd7b2602d096d773a01b2199d1f4110f22 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 10:00:33 2024 +0800 [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function 
to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions commit fec494dbe5971e8fa5a886b191a4781be3ce7a6f Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Tue Jan 23 19:17:40 2024 +0800 [Dataset] Add SEED-Bench, TextCaps, NoCaps (#12) * Change coco from print to logger * Add llava loglikelihood * Add Nocaps support * Fix pass through function * Add textcaps support * Fix textcaps eval image_id * Add seedbench support * Add seedbench ppl evaluation * black lint commit 4c3c2c63a681f29c537c2467957de1a90568748d Author: Li Bo <drluodian@gmail.com> Date: Tue Jan 23 19:17:12 2024 +0800 [Datasets] Added POPE and Aligned. 
(#11) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function --------- Co-authored-by: Bo Li <drluodian@gmail.com> * [Memory issue] Solve memory issue for building context (#14) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * Remove unused function llava_aggregation * Refractor llava-bench aggregation code * Add logs and scripts to .gitignore, and set image_aspect_ratio to original in scienceqa.yaml * Update generation parameters in scienceqa.yaml * Solve memory issue for building context * Solved gather result error * Update lmms_eval scienceqa_img config * Fixed nocaps store results * Revise seedbench prompt * Squashed commit of the following: commit c3cc24a89415aeccad31ccbb10642af677cd6fe5 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 24 14:07:36 2024 +0800 add mmmu (#15) * add mmme * black commit 0dbc5d16c4f45ebea8def5f0bc1a36fcd93f9a05 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 10:00:33 2024 +0800 [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries 
for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions commit fec494dbe5971e8fa5a886b191a4781be3ce7a6f Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Tue Jan 23 19:17:40 2024 +0800 [Dataset] Add SEED-Bench, TextCaps, NoCaps (#12) * Change coco from print to logger * Add llava loglikelihood * Add Nocaps support * Fix pass through function * Add textcaps support * Fix textcaps eval image_id * Add seedbench support * Add seedbench ppl evaluation * black lint commit 4c3c2c63a681f29c537c2467957de1a90568748d Author: Li Bo <drluodian@gmail.com> Date: Tue Jan 23 19:17:12 2024 +0800 [Datasets] Added POPE and Aligned. 
(#11) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts * [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts * [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps * [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps * add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Dev/add chartqa and ai2d (#23) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Dev/add chartqa and ai2d (#23) * add mmme * black * add model 
specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * [Datasets] Changes for Flickr30K and NoCaps, also merged Peiyuan's Model Specific Prompt. (#20) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 
842fbc6f2da7d9a118adf9ec27c3d8542d74168e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504fabc3b62f356c467b2fd1119083d27313 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 
13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Fix cli itself can not run with config file * Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability * Refactor get_task_dict function to handle nested groups * Add submission file for coco, flickr30k, nocaps, and textcaps tasks * Remove unused files and update task configuration * Fix tasks issue for nocaps, refcoco/+/g * Fix file path and raise error if config file does not exist * Exclude train in refcoco/+/g config * Solve doc_iterator_for_counting crashing issue * Black lint * Refactor code to improve performance and readability * Squashed commit of the following: commit a2cc9303dc72e4d53983bb56e54a32e977c3e270 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:03:57 2024 +0800 change okvqa yaml commit 35e87e7c7a480d005abf607c2527a35457d92311 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:55:40 2024 +0800 change yaml commit 89755323596b85208ed33aa88c296604a39af6eb Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:42:43 2024 +0800 add okvqa task commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * 
add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Squashed commit of the following: commit 0b0d30dfb247c5f0b7b68398b9e9fcde74cf7fa2 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:06:02 2024 +0800 change ocr reference commit e273f9cbd91540df86bdbc652bff88a847bd0d2d Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:05:46 2024 +0800 revert example_eval commit e84126aaaf8a07bd371a0571a914ccbcd3697f20 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:17:28 2024 +0800 edit vizwiz utils commit 110deab53dc1a2fd349b1872cd261b69074c5fa8 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:49:47 2024 +0800 reorganize __init__ commit 0fa3e0c40075997ea80ed976bdee9615f17d3ece Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:46:20 2024 +0800 minor fixes commit 2aaca579120def99860f90054233f3358950fa66 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 17:41:03 2024 +0800 add vizwizvqa eval rask commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Refactor mathvista.yaml and utils.py * Add gpt_eval_score to mathvista_process_results * Refactor mathvista_aggregate_results to return average accuracy score * Fix refcoco evaluation error * Fix evaluation problem for refcoco+/g * Refactor mathvista.yaml and mathvista_evals.py * Add dependencies and update YAML files * Refactor mmbench_en/utils.py to save test results to separate Excel file * Fix caption task prompt * Add group field to mmbench_en_test and mmbench_en_val yaml files * Delete mmbench_en_val.yaml file * Update mmbench_cn.yaml and mmbench_cn_test.yaml * Update mmbench_cn_val.yaml and utils.py * 
Remove unused fields in mmbench_cn_cc_process_results function * Update aggregation function for mmbench_en_dev.yaml * Fix capitalization of L2-category key in utils.py * Fix variable name in mmbench_process_results function * Delete mmbench_cn_val.yaml file * Update mathvista_test.yaml and mathvista_testmini.yaml * Fix warnings and update mathvista.yaml * Remove system message from MathVistaEvaluator * Update GPT model version in MathVistaEvaluator constructor * Update GQA_RAW_IMAGE_DATASET path in utils.py * change vizwiz to test set * Add split flag to mathvista_aggregate_results function * Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml files * Add download configuration for dataset * Update GQA_RAW_IMAGE_DATASET path in utils.py * add datasets * Update gpt_eval_model_name in mathvista.yaml * Merge commit '817eb057bcb61226b33d3ac3c8def01c36c90f96' * Update pyproject.toml with dependencies and URLs * Squashed commit of the following: commit f253968ad703f682a29317bdd51ec6c1fd7c5465 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Sat Jan 27 13:56:37 2024 +0800 Dev/add chartqa and ai2d (#23) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Refactor image processing and submission file path * Refactor directory creation logic in cli_evaluate_single function * Update dataset path and test split in vqav2.yaml * Remove "total" column from cap_details_columns DataFrame * 
Add retry logic for dataset download * Add 'tenacity' to dependencies in pyproject.toml * Refactor code in ContextSampler class * Update Black version and configuration, and improve code readability in ContextSampler * Update Black version and line length --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg> * [Datasets] Changes for Flickr30K and NoCaps, also merged Peiyuan's Model Specific Prompt. (#20) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * 
Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model 
specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Fix cli itself can not run with config file * Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability * Refactor get_task_dict function to handle nested groups * Add submission file for coco, flickr30k, nocaps, and textcaps tasks * Remove unused files and update task configuration * Fix tasks issue for nocaps, refcoco/+/g * Fix file path and raise error if config file does not exist * Exclude train in refcoco/+/g config * Solve doc_iterator_for_counting crashing issue * Black lint * Refactor code to improve performance and readability * Squashed commit of the following: commit 0df825c9e72a06e6acb4c0bd43c2083ffe8b74c0 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:03:57 2024 +0800 change okvqa yaml commit b9d9f9896993033b92346e9f47420c55b866c715 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:55:40 2024 +0800 change yaml commit 4256bef410e4c8d8761e0cd0d79ac5e57b97651b Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:42:43 2024 +0800 add okvqa task commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Squashed commit of the following: commit 0c8a3919885b8fe2880bb2892f7a619d060012d1 Author: JvThunder 
<joshuaadrianc@gmail.com> Date: Fri Jan 26 01:06:02 2024 +0800 change ocr reference commit d2bc7c92ac61179b8c4031e11bc31970355252f6 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:05:46 2024 +0800 revert example_eval commit c78fa29cd0d161641ee05db57bd39314b998c8c7 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:17:28 2024 +0800 edit vizwiz utils commit 397f0906968fd8ba04b883469b96217737c43e09 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:49:47 2024 +0800 reorganize __init__ commit 52a7ea6c7599adeec2ac2787f500e215ce47cf79 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:46:20 2024 +0800 minor fixes commit f706b2aaf9b288c582611191a1841b58feaeb741 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 17:41:03 2024 +0800 add vizwizvqa eval rask commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Refactor mathvista.yaml and utils.py * Add gpt_eval_score to mathvista_process_results * Refactor mathvista_aggregate_results to return average accuracy score * Fix refcoco evaluation error * Fix evaluation problem for refcoco+/g * Refactor mathvista.yaml and mathvista_evals.py * Add dependencies and update YAML files * Refactor mmbench_en/utils.py to save test results to separate Excel file * Fix caption task prompt * Add group field to mmbench_en_test and mmbench_en_val yaml files * Delete mmbench_en_val.yaml file * Update mmbench_cn.yaml and mmbench_cn_test.yaml * Update mmbench_cn_val.yaml and utils.py * Remove unused fields in mmbench_cn_cc_process_results function * Update aggregation function for mmbench_en_dev.yaml * Fix capitalization of L2-category key in utils.py * Fix variable name in 
mmbench_process_results function * Delete mmbench_cn_val.yaml file * Update mathvista_test.yaml and mathvista_testmini.yaml * Fix warnings and update mathvista.yaml * Remove system message from MathVistaEvaluator * Update GPT model version in MathVistaEvaluator constructor * Update GQA_RAW_IMAGE_DATASET path in utils.py * change vizwiz to test set * Add split flag to mathvista_aggregate_results function * Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml files * Add download configuration for dataset * Update GQA_RAW_IMAGE_DATASET path in utils.py * add datasets * Update gpt_eval_model_name in mathvista.yaml * Merge commit '0d620f98b49f8204d02633f209eedd5d8b7a1f7c' * Update pyproject.toml with dependencies and URLs * Squashed commit of the following: commit 8b600f55b6cf5627504c407871539db59f6085a3 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Sat Jan 27 13:56:37 2024 +0800 Dev/add chartqa and ai2d (#23) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Refactor image processing and submission file path * Refactor directory creation logic in cli_evaluate_single function * Update dataset path and test split in vqav2.yaml * Remove "total" column from cap_details_columns DataFrame * Add retry logic for dataset download * Add 'tenacity' to dependencies in pyproject.toml * Refactor code in ContextSampler class * Update Black version and configuration, and improve code 
readability in ContextSampler * Update Black version and line length --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg> * vqav2 (#25) * Update tqdm progress bar position * Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' * Squashed commit of the following: commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504fabc3b62f356c467b2fd1119083d27313 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit 767f7e2cae60cf67ec5878234d84321395a3ed15 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * remove useless output file * Update dataset path in vqav2.yaml * Squashed commit of the following: commit eeb2b9827502f044ef67d8440f53124baf219ba3 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:56:45 2024 +0800 Black lint commit 1ce9f0b37e4bc5e6ff5fbfcd23fd339eb14974ae Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:47 2024 +0800 Solve doc_iterator_for_counting crashing issue commit e12b3bb41ed4f51540cfac84e5e96d15777540c4 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 
09:55:13 2024 +0800 Exclude train in refcoco/+/g config commit 42c56f82bc4ccae12e19e76d09d7e525ca9ef2f4 Merge: 6a1ae69 697a438 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:13 2024 +0000 Merge branch 'dev/bli_add_datasets' of https://github.com/EvolvingLMMs-Lab/lmms-eval into dev/bli_add_datasets commit aed08303fe87808986d206540a0c0ee6d8764988 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:06 2024 +0000 Fix file path and raise error if config file does not exist commit a105386613c443d9e740c89725cbd1281bbdfef6 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 00:47:24 2024 +0800 Fix tasks issue for nocaps, refcoco/+/g commit 21c8119e377760f44c769bed2528d863a8f4333b Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 10:09:43 2024 +0000 Remove unused files and update task configuration commit 0ccb2629c2aacdb297b7cf0c9c2bcfa386bb7582 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:43:56 2024 +0000 Add submission file for coco, flickr30k, nocaps, and textcaps tasks commit 5365e13e93c702a1e0e259ee6a08d6a427d72470 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:32:54 2024 +0000 Refactor get_task_dict function to handle nested groups commit 6773348c807bcfa1b09ceffc90c75e15cad908f7 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:13:46 2024 +0000 Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability commit 31140f9c87dea89ca94c94bc850e3a8d43e5f8b4 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 17:07:20 2024 +0800 Fix cli itself can not run with config file commit df1bad47f6ed13f94848d2bee29b28e00c2384b2 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:09:04 
2024 +0000 Squashed commit of the following: commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 06383aa4a5ff59db52fc8d584f3086efd88b7e74 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:02:57 2024 +0000 Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit 7a71fd6022ee5985100dda38b94956595cec77a5 Merge: 22c3adf 1d3fdd4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:43:15 2024 +0000 Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' into dev/bli_add_datasets commit 
6870cba13cb54976480c1d5e8d97602c246f881b Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:52 2024 +0000 Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504fabc3b62f356c467b2fd1119083d27313 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit b40d522b6bf483ebdfbf5facd4573de0cf8a93f6 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:11 2024 +0000 
Add coco_val and coco_test tasks to coco.yaml commit 5bf643f73d06f1e540897b753450352bb92fd9ec Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 04:58:28 2024 +0000 Update dataset_path in flickr30k.yaml commit 95f110f0eef5196205bc501367e3642c57cc7a17 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:12:25 2024 +0000 Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' commit c844ae49b18c1334711832208b0359c9439fe1c0 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:10:18 2024 +0000 Add submission folder and update file paths for storing prediction results commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504fabc3b62f356c467b2fd1119083d27313 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit f0446227f0dd93651e9d6c06254bbf5212ede2dd Merge: c6370bf 51f2eaa Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:07 2024 +0800 Merge branch 'main' into dev/bli_add_datasets commit 1e1f6cfccba758dc606fa4217102518fab73c936 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 14:08:06 2024 +0000 Update dataset paths and improve user prompts commit 966933754b9e5179995b3ab41d746603e13e75c6 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 11:52:33 2024 +0000 Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' commit 767f7e2cae60cf67ec5878234d84321395a3ed15 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * Fix bug in login functionality * create vqav2_val * Update vqav2_test.yaml * Update vqav2_test.yaml * Update vqav2_val.yaml --------- 
Co-authored-by: Li Bo <drluodian@gmail.com> * vqav2 (#25) * Update tqdm progress bar position * Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' * Squashed commit of the following: commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 95ef3ea519cbd772924f9a6afa5394979eb00432 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * remove useless output file * Update dataset path in vqav2.yaml * Squashed commit of the following: commit 75bb7043ea5a533ab6351fc0f5ab055e86106423 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:56:45 2024 +0800 Black lint commit 6635a8aa34cfbd3c7a4afb6fcd214a7283ce01cb Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:47 2024 +0800 Solve doc_iterator_for_counting crashing issue commit 080f42b88ea8acacd527b8d67b84ba1d7d135b03 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:13 2024 +0800 Exclude train in refcoco/+/g config commit 4da84069c08c95e49e8ab0e64a1e103ff7ac8730 Merge: 6a1ae69 697a438 Author: Bo Li 
<drluodian@gmail.com> Date: Thu Jan 25 17:17:13 2024 +0000 Merge branch 'dev/bli_add_datasets' of https://github.com/EvolvingLMMs-Lab/lmms-eval into dev/bli_add_datasets commit 6a1ae69923d79ae32a001edac38206b605274ec3 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:06 2024 +0000 Fix file path and raise error if config file does not exist commit 697a4387827ceeec3e393237dd1baa217c714c88 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 00:47:24 2024 +0800 Fix tasks issue for nocaps, refcoco/+/g commit 47e40437126d39a5f062c9a33b4de426c1a29804 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 10:09:43 2024 +0000 Remove unused files and update task configuration commit 9976eb8e9ed03c8613725fdbd822ef5d8cf70e47 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:43:56 2024 +0000 Add submission file for coco, flickr30k, nocaps, and textcaps tasks commit 95f97a69faa6129676e89eee14960fcfe2076b7c Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:32:54 2024 +0000 Refactor get_task_dict function to handle nested groups commit 3b79ee842b2488714baf92ab34528ef77989d392 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:13:46 2024 +0000 Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability commit f5c353f2ce93a2d96add4312b695b57432f68cbb Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 17:07:20 2024 +0800 Fix cli itself can not run with config file commit 9a68fec37be74cfe8d4a73390bc83edee147ae24 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:09:04 2024 +0000 Squashed commit of the following: commit 18e984cfe173390843c73048a931baa17800f918 Author: Zhang Peiyuan <a1286225768@gmail.com> 
Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 93f847c5851fd246716367935d6b807b17d53949 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:02:57 2024 +0000 Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit fa4ad4404e26d8924f55208746dbb9143b464011 Merge: 22c3adf 1d3fdd4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:43:15 2024 +0000 Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' into dev/bli_add_datasets commit 22c3adfd0645acc23b6d7c06b487f4ffd47666c4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:52 2024 +0000 Squashed commit of the 
following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit 4c712336b6f7438e717a865910bb241e413a4688 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:11 2024 +0000 Add coco_val and coco_test tasks to coco.yaml commit b5547126c855927fd4dc8384211e4aceee40870f Author: Bo Li <drluodian@gmail.com> Date: Thu 
Jan 25 04:58:28 2024 +0000 Update dataset_path in flickr30k.yaml commit f786f61e2559f082072f21aa9030e2080ddaf809 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:12:25 2024 +0000 Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' commit 796a011000e0df90f66f8e80cb34dc2318ae9ac8 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:10:18 2024 +0000 Add submission folder and update file paths for storing prediction results commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 118744c63eb2d9724571d85fbbd85fcc9ad05b59 Merge: c6370bf 51f2eaa Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:07 2024 +0800 Merge branch 'main' into dev/bli_add_datasets commit c6370bff65903681f00cf3d07111d8e15a57b619 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 14:08:06 2024 +0000 Update dataset paths and improve user prompts commit 810daf458fa94cb3ec2b4a6cc5ecb1e656a24002 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 11:52:33 2024 +0000 Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' commit 95ef3ea519cbd772924f9a6afa5394979eb00432 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * Fix bug in login functionality * create vqav2_val * Update vqav2_test.yaml * Update vqav2_test.yaml * Update vqav2_val.yaml --------- Co-authored-by: Li Bo <drluodian@gmail.com> * vizwiz dataset (#24) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and 
improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa …
* add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code
…del Specific Prompt. (EvolvingLMMs-Lab#20) * Merge commit 'a0b87f52d0c7cde3c320aeac77eb11165e5bb3ef' * Update dataset paths and improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit '4d11dcea8db1a7e4b7347f3c9880788e8cde5d9f' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 4d11dce Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 4d11dce Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (EvolvingLMMs-Lab#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 7c68ea1 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (EvolvingLMMs-Lab#17) * Merge commit 'a0b87f52d0c7cde3c320aeac77eb11165e5bb3ef' * Update dataset paths and improve user prompts commit 2c63fe7f7b6313ce772edeb41974ba0b08b8c469 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 
c524ca948439157c24faad9b2fc41c7c139e0ed1 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 4d11dce Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 2c63fe7f7b6313ce772edeb41974ba0b08b8c469 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit c524ca948439157c24faad9b2fc41c7c139e0ed1 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 95460de Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Fix cli itself can not run with config file * Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database 
queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability * Refactor get_task_dict function to handle nested groups * Add submission file for coco, flickr30k, nocaps, and textcaps tasks * Remove unused files and update task configuration * Fix tasks issue for nocaps, refcoco/+/g * Fix file path and raise error if config file does not exist * Exclude train in refcoco/+/g config * Solve doc_iterator_for_counting crashing issue * Black lint * Refactor code to improve performance and readability * Squashed commit of the following: commit aaf199c777fe7b81e1ad39bd72cf2cd1daf30d69 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:03:57 2024 +0800 change okvqa yaml commit 07b5317f2d9f85465b35dcb2e11cf0d3a51aeb2a Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:55:40 2024 +0800 change yaml commit 6126fe6d8bdf09825855236377cb78b5e4b242ed Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:42:43 2024 +0800 add okvqa task commit 95460de Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Squashed commit of the following: commit c9f49774bfa0f505fb266871f3e56ae5a397a97b Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:06:02 2024 +0800 change ocr reference commit d2a852842282e211ca885180db1aba4b1d1f8c2b Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:05:46 2024 +0800 revert example_eval commit 8ef634ccbe2bd5f1159674f1ce70349d7adf935f Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:17:28 2024 +0800 edit vizwiz utils commit f49f4961d921b7c8196c1484418ec1673e5e4b74 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu 
Jan 25 23:49:47 2024 +0800 reorganize __init__ commit 368690aad385c5e1972fe5394b94a8eb1a47efca Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:46:20 2024 +0800 minor fixes commit 47463754525984a17f790c5dace6ff05b1ce72f7 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 17:41:03 2024 +0800 add vizwizvqa eval rask commit 95460de Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Refactor mathvista.yaml and utils.py * Add gpt_eval_score to mathvista_process_results * Refactor mathvista_aggregate_results to return average accuracy score * Fix refcoco evaluation error * Fix evaluation problem for refcoco+/g * Refactor mathvista.yaml and mathvista_evals.py * Add dependencies and update YAML files * Refactor mmbench_en/utils.py to save test results to separate Excel file * Fix caption task prompt * Add group field to mmbench_en_test and mmbench_en_val yaml files * Delete mmbench_en_val.yaml file * Update mmbench_cn.yaml and mmbench_cn_test.yaml * Update mmbench_cn_val.yaml and utils.py * Remove unused fields in mmbench_cn_cc_process_results function * Update aggregation function for mmbench_en_dev.yaml * Fix capitalization of L2-category key in utils.py * Fix variable name in mmbench_process_results function * Delete mmbench_cn_val.yaml file * Update mathvista_test.yaml and mathvista_testmini.yaml * Fix warnings and update mathvista.yaml * Remove system message from MathVistaEvaluator * Update GPT model version in MathVistaEvaluator constructor * Update GQA_RAW_IMAGE_DATASET path in utils.py * change vizwiz to test set * Add split flag to mathvista_aggregate_results function * Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml 
files * Add download configuration for dataset * Update GQA_RAW_IMAGE_DATASET path in utils.py * add datasets * Update gpt_eval_model_name in mathvista.yaml * Merge commit '0d620f98b49f8204d02633f209eedd5d8b7a1f7c' * Update pyproject.toml with dependencies and URLs * Squashed commit of the following: commit 60c1d7c Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Sat Jan 27 13:56:37 2024 +0800 Dev/add chartqa and ai2d (EvolvingLMMs-Lab#23) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Refactor image processing and submission file path * Refactor directory creation logic in cli_evaluate_single function * Update dataset path and test split in vqav2.yaml * Remove "total" column from cap_details_columns DataFrame * Add retry logic for dataset download * Add 'tenacity' to dependencies in pyproject.toml * Refactor code in ContextSampler class * Update Black version and configuration, and improve code readability in ContextSampler * Update Black version and line length --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>
…del Specific Prompt. (EvolvingLMMs-Lab#20) * Merge commit '340c4501058e13bc64aad611c8bbb4d0059fc545' * Update dataset paths and improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit 'b2c71248314fc8f8461222e594c7ab046f5383f5' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit b2c7124 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (EvolvingLMMs-Lab#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit d9c5827 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (EvolvingLMMs-Lab#17) * Merge commit '340c4501058e13bc64aad611c8bbb4d0059fc545' * Update dataset paths and improve user prompts commit facd3d87fef5f4eb82dbe3b236a6b199dc87863e Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 
3380863c2ca0f3b98d74f94c9e72460d28d34acd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit facd3d87fef5f4eb82dbe3b236a6b199dc87863e Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3380863c2ca0f3b98d74f94c9e72460d28d34acd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 8dce2b0 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Fix cli itself can not run with config file * Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database 
queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability * Refactor get_task_dict function to handle nested groups * Add submission file for coco, flickr30k, nocaps, and textcaps tasks * Remove unused files and update task configuration * Fix tasks issue for nocaps, refcoco/+/g * Fix file path and raise error if config file does not exist * Exclude train in refcoco/+/g config * Solve doc_iterator_for_counting crashing issue * Black lint * Refactor code to improve performance and readability * Squashed commit of the following: commit 6f66c1130070307ba51eae79f54e197f0053266b Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:03:57 2024 +0800 change okvqa yaml commit a6d360d7b1092d5656e4b4ad7d8964f44ee0a3dc Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:55:40 2024 +0800 change yaml commit 7ed11f762e3af8b9a2261793c5bbc9c3ebc2c512 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:42:43 2024 +0800 add okvqa task commit 8dce2b0 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Squashed commit of the following: commit 963fd932338aae1dee007bbb574daec162cb58bb Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:06:02 2024 +0800 change ocr reference commit 1481d73aef646233dce05b3b2989a9e8eddcab2b Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:05:46 2024 +0800 revert example_eval commit 45a3bf24b4c6e610237e2ef81f1b01cf11ee25d9 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:17:28 2024 +0800 edit vizwiz utils commit 63080782e2d7544d58c513648dd64647131d6337 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu 
Jan 25 23:49:47 2024 +0800 reorganize __init__ commit ef60547ab60a4a5e18de1634c8126ad5cbc1139c Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:46:20 2024 +0800 minor fixes commit 7d2e92c2835f88cd7832ddab0874996b308faa9a Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 17:41:03 2024 +0800 add vizwizvqa eval rask commit 8dce2b0 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Refactor mathvista.yaml and utils.py * Add gpt_eval_score to mathvista_process_results * Refactor mathvista_aggregate_results to return average accuracy score * Fix refcoco evaluation error * Fix evaluation problem for refcoco+/g * Refactor mathvista.yaml and mathvista_evals.py * Add dependencies and update YAML files * Refactor mmbench_en/utils.py to save test results to separate Excel file * Fix caption task prompt * Add group field to mmbench_en_test and mmbench_en_val yaml files * Delete mmbench_en_val.yaml file * Update mmbench_cn.yaml and mmbench_cn_test.yaml * Update mmbench_cn_val.yaml and utils.py * Remove unused fields in mmbench_cn_cc_process_results function * Update aggregation function for mmbench_en_dev.yaml * Fix capitalization of L2-category key in utils.py * Fix variable name in mmbench_process_results function * Delete mmbench_cn_val.yaml file * Update mathvista_test.yaml and mathvista_testmini.yaml * Fix warnings and update mathvista.yaml * Remove system message from MathVistaEvaluator * Update GPT model version in MathVistaEvaluator constructor * Update GQA_RAW_IMAGE_DATASET path in utils.py * change vizwiz to test set * Add split flag to mathvista_aggregate_results function * Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml 
files * Add download configuration for dataset * Update GQA_RAW_IMAGE_DATASET path in utils.py * add datasets * Update gpt_eval_model_name in mathvista.yaml * Merge commit '0d620f98b49f8204d02633f209eedd5d8b7a1f7c' * Update pyproject.toml with dependencies and URLs * Squashed commit of the following: commit 297f023 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Sat Jan 27 13:56:37 2024 +0800 Dev/add chartqa and ai2d (EvolvingLMMs-Lab#23) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Refactor image processing and submission file path * Refactor directory creation logic in cli_evaluate_single function * Update dataset path and test split in vqav2.yaml * Remove "total" column from cap_details_columns DataFrame * Add retry logic for dataset download * Add 'tenacity' to dependencies in pyproject.toml * Refactor code in ContextSampler class * Update Black version and configuration, and improve code readability in ContextSampler * Update Black version and line length --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>
Jan 25 23:49:47 2024 +0800 reorganize __init__ commit 0fa3e0c40075997ea80ed976bdee9615f17d3ece Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:46:20 2024 +0800 minor fixes commit 2aaca579120def99860f90054233f3358950fa66 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 17:41:03 2024 +0800 add vizwizvqa eval rask commit 19db53b Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Refactor mathvista.yaml and utils.py * Add gpt_eval_score to mathvista_process_results * Refactor mathvista_aggregate_results to return average accuracy score * Fix refcoco evaluation error * Fix evaluation problem for refcoco+/g * Refactor mathvista.yaml and mathvista_evals.py * Add dependencies and update YAML files * Refactor mmbench_en/utils.py to save test results to separate Excel file * Fix caption task prompt * Add group field to mmbench_en_test and mmbench_en_val yaml files * Delete mmbench_en_val.yaml file * Update mmbench_cn.yaml and mmbench_cn_test.yaml * Update mmbench_cn_val.yaml and utils.py * Remove unused fields in mmbench_cn_cc_process_results function * Update aggregation function for mmbench_en_dev.yaml * Fix capitalization of L2-category key in utils.py * Fix variable name in mmbench_process_results function * Delete mmbench_cn_val.yaml file * Update mathvista_test.yaml and mathvista_testmini.yaml * Fix warnings and update mathvista.yaml * Remove system message from MathVistaEvaluator * Update GPT model version in MathVistaEvaluator constructor * Update GQA_RAW_IMAGE_DATASET path in utils.py * change vizwiz to test set * Add split flag to mathvista_aggregate_results function * Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml 
files * Add download configuration for dataset * Update GQA_RAW_IMAGE_DATASET path in utils.py * add datasets * Update gpt_eval_model_name in mathvista.yaml * Merge commit '899aa01c40d964fdabf024964c7e96fe3663c7d6' * Update pyproject.toml with dependencies and URLs * Squashed commit of the following: commit 94b86aa Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Sat Jan 27 13:56:37 2024 +0800 Dev/add chartqa and ai2d (EvolvingLMMs-Lab#23) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Refactor image processing and submission file path * Refactor directory creation logic in cli_evaluate_single function * Update dataset path and test split in vqav2.yaml * Remove "total" column from cap_details_columns DataFrame * Add retry logic for dataset download * Add 'tenacity' to dependencies in pyproject.toml * Refactor code in ContextSampler class * Update Black version and configuration, and improve code readability in ContextSampler * Update Black version and line length --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>
* Update tqdm progress bar position * Merge commit '4d11dcea8db1a7e4b7347f3c9880788e8cde5d9f' * Squashed commit of the following: commit 95460de Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 4d11dce Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (EvolvingLMMs-Lab#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 7c68ea1 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (EvolvingLMMs-Lab#17) * Merge commit 'a0b87f52d0c7cde3c320aeac77eb11165e5bb3ef' * Update dataset paths and improve user prompts commit a0b87f5 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (EvolvingLMMs-Lab#16) Update datetime format in get_datetime_str() function * remove useless output file * Update dataset path in vqav2.yaml * Squashed commit of the following: commit 75bb7043ea5a533ab6351fc0f5ab055e86106423 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:56:45 2024 +0800 Black lint commit 6635a8aa34cfbd3c7a4afb6fcd214a7283ce01cb Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:47 2024 +0800 Solve doc_iterator_for_counting crashing issue commit 080f42b88ea8acacd527b8d67b84ba1d7d135b03 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:13 2024 +0800 Exclude train in refcoco/+/g config commit 4da84069c08c95e49e8ab0e64a1e103ff7ac8730 Merge: 6a1ae69 697a438 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:13 2024 +0000 Merge branch 'dev/bli_add_datasets' of 
https://github.com/EvolvingLMMs-Lab/lmms-eval into dev/bli_add_datasets commit 6a1ae69923d79ae32a001edac38206b605274ec3 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:06 2024 +0000 Fix file path and raise error if config file does not exist commit 697a4387827ceeec3e393237dd1baa217c714c88 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 00:47:24 2024 +0800 Fix tasks issue for nocaps, refcoco/+/g commit 47e40437126d39a5f062c9a33b4de426c1a29804 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 10:09:43 2024 +0000 Remove unused files and update task configuration commit 9976eb8e9ed03c8613725fdbd822ef5d8cf70e47 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:43:56 2024 +0000 Add submission file for coco, flickr30k, nocaps, and textcaps tasks commit 95f97a69faa6129676e89eee14960fcfe2076b7c Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:32:54 2024 +0000 Refactor get_task_dict function to handle nested groups commit 3b79ee842b2488714baf92ab34528ef77989d392 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:13:46 2024 +0000 Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability commit f5c353f2ce93a2d96add4312b695b57432f68cbb Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 17:07:20 2024 +0800 Fix cli itself can not run with config file commit 9a68fec37be74cfe8d4a73390bc83edee147ae24 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:09:04 2024 +0000 Squashed commit of the following: commit 95460de Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add 
model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 93f847c5851fd246716367935d6b807b17d53949 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:02:57 2024 +0000 Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 4d11dce Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 2c63fe7f7b6313ce772edeb41974ba0b08b8c469 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit c524ca948439157c24faad9b2fc41c7c139e0ed1 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit fa4ad4404e26d8924f55208746dbb9143b464011 Merge: 22c3adf 4d11dce Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:43:15 2024 +0000 Merge commit '4d11dcea8db1a7e4b7347f3c9880788e8cde5d9f' into dev/bli_add_datasets commit 22c3adfd0645acc23b6d7c06b487f4ffd47666c4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:52 2024 +0000 Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 
11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 4d11dce Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 4d11dce Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (EvolvingLMMs-Lab#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 7c68ea1 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (EvolvingLMMs-Lab#17) * Merge commit 'a0b87f52d0c7cde3c320aeac77eb11165e5bb3ef' * Update dataset paths and improve user prompts commit 2c63fe7f7b6313ce772edeb41974ba0b08b8c469 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit c524ca948439157c24faad9b2fc41c7c139e0ed1 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit 4c712336b6f7438e717a865910bb241e413a4688 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:11 2024 +0000 Add coco_val and coco_test tasks to coco.yaml commit b5547126c855927fd4dc8384211e4aceee40870f Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 04:58:28 2024 +0000 Update dataset_path in flickr30k.yaml commit f786f61e2559f082072f21aa9030e2080ddaf809 Author: Bo Li <drluodian@gmail.com> 
Date: Thu Jan 25 02:12:25 2024 +0000 Merge commit '4d11dcea8db1a7e4b7347f3c9880788e8cde5d9f' commit 796a011000e0df90f66f8e80cb34dc2318ae9ac8 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:10:18 2024 +0000 Add submission folder and update file paths for storing prediction results commit 4d11dce Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (EvolvingLMMs-Lab#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 7c68ea1 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (EvolvingLMMs-Lab#17) * Merge commit 'a0b87f52d0c7cde3c320aeac77eb11165e5bb3ef' * Update dataset paths and improve user prompts commit 118744c63eb2d9724571d85fbbd85fcc9ad05b59 Merge: c6370bf a0b87f5 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:07 2024 +0800 Merge branch 'main' into dev/bli_add_datasets commit c6370bff65903681f00cf3d07111d8e15a57b619 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 14:08:06 2024 +0000 Update dataset paths and improve user prompts commit 810daf458fa94cb3ec2b4a6cc5ecb1e656a24002 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 11:52:33 2024 +0000 Merge commit 'a0b87f52d0c7cde3c320aeac77eb11165e5bb3ef' commit a0b87f5 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (EvolvingLMMs-Lab#16) Update datetime format in get_datetime_str() function * Fix bug in login functionality * create vqav2_val * Update vqav2_test.yaml * Update vqav2_test.yaml * Update vqav2_val.yaml --------- Co-authored-by: Li Bo <drluodian@gmail.com>
* Update tqdm progress bar position * Merge commit '52ee4a18dad22b2399a4248d2aa9204dbfe88624' * Squashed commit of the following: commit f7a7db5 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 52ee4a1 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (EvolvingLMMs-Lab#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 04303b0 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (EvolvingLMMs-Lab#17) * Merge commit 'bee5794a597d8a87794b4bcd9b57a1553efad857' * Update dataset paths and improve user prompts commit bee5794 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (EvolvingLMMs-Lab#16) Update datetime format in get_datetime_str() function * remove useless output file * Update dataset path in vqav2.yaml * Squashed commit of the following: commit 75bb7043ea5a533ab6351fc0f5ab055e86106423 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:56:45 2024 +0800 Black lint commit 6635a8aa34cfbd3c7a4afb6fcd214a7283ce01cb Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:47 2024 +0800 Solve doc_iterator_for_counting crashing issue commit 080f42b88ea8acacd527b8d67b84ba1d7d135b03 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:13 2024 +0800 Exclude train in refcoco/+/g config commit 4da84069c08c95e49e8ab0e64a1e103ff7ac8730 Merge: 6a1ae69 697a438 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:13 2024 +0000 Merge branch 'dev/bli_add_datasets' of 
https://github.com/EvolvingLMMs-Lab/lmms-eval into dev/bli_add_datasets commit 6a1ae69923d79ae32a001edac38206b605274ec3 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:06 2024 +0000 Fix file path and raise error if config file does not exist commit 697a4387827ceeec3e393237dd1baa217c714c88 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 00:47:24 2024 +0800 Fix tasks issue for nocaps, refcoco/+/g commit 47e40437126d39a5f062c9a33b4de426c1a29804 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 10:09:43 2024 +0000 Remove unused files and update task configuration commit 9976eb8e9ed03c8613725fdbd822ef5d8cf70e47 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:43:56 2024 +0000 Add submission file for coco, flickr30k, nocaps, and textcaps tasks commit 95f97a69faa6129676e89eee14960fcfe2076b7c Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:32:54 2024 +0000 Refactor get_task_dict function to handle nested groups commit 3b79ee842b2488714baf92ab34528ef77989d392 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:13:46 2024 +0000 Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability commit f5c353f2ce93a2d96add4312b695b57432f68cbb Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 17:07:20 2024 +0800 Fix cli itself can not run with config file commit 9a68fec37be74cfe8d4a73390bc83edee147ae24 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:09:04 2024 +0000 Squashed commit of the following: commit f7a7db5 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add 
model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 93f847c5851fd246716367935d6b807b17d53949 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:02:57 2024 +0000 Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 7d5058337d3de3cd4f0e85368e3dd463f34e703c Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 73918654650daa0dad965d1b786d53e7c3585010 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit fa4ad4404e26d8924f55208746dbb9143b464011 Merge: 22c3adf 1d3fdd4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:43:15 2024 +0000 Merge commit '52ee4a18dad22b2399a4248d2aa9204dbfe88624' into dev/bli_add_datasets commit 22c3adfd0645acc23b6d7c06b487f4ffd47666c4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:52 2024 +0000 Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 
11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 52ee4a1 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (EvolvingLMMs-Lab#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 04303b0 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (EvolvingLMMs-Lab#17) * Merge commit 'bee5794a597d8a87794b4bcd9b57a1553efad857' * Update dataset paths and improve user prompts commit 7d5058337d3de3cd4f0e85368e3dd463f34e703c Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 73918654650daa0dad965d1b786d53e7c3585010 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit 4c712336b6f7438e717a865910bb241e413a4688 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:11 2024 +0000 Add coco_val and coco_test tasks to coco.yaml commit b5547126c855927fd4dc8384211e4aceee40870f Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 04:58:28 2024 +0000 Update dataset_path in flickr30k.yaml commit f786f61e2559f082072f21aa9030e2080ddaf809 Author: Bo Li <drluodian@gmail.com> 
Date: Thu Jan 25 02:12:25 2024 +0000 Merge commit '52ee4a18dad22b2399a4248d2aa9204dbfe88624' commit 796a011000e0df90f66f8e80cb34dc2318ae9ac8 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:10:18 2024 +0000 Add submission folder and update file paths for storing prediction results commit 52ee4a1 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (EvolvingLMMs-Lab#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 04303b0 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (EvolvingLMMs-Lab#17) * Merge commit 'bee5794a597d8a87794b4bcd9b57a1553efad857' * Update dataset paths and improve user prompts commit 118744c63eb2d9724571d85fbbd85fcc9ad05b59 Merge: c6370bf 51f2eaa Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:07 2024 +0800 Merge branch 'main' into dev/bli_add_datasets commit c6370bff65903681f00cf3d07111d8e15a57b619 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 14:08:06 2024 +0000 Update dataset paths and improve user prompts commit 810daf458fa94cb3ec2b4a6cc5ecb1e656a24002 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 11:52:33 2024 +0000 Merge commit 'bee5794a597d8a87794b4bcd9b57a1553efad857' commit bee5794 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (EvolvingLMMs-Lab#16) Update datetime format in get_datetime_str() function * Fix bug in login functionality * create vqav2_val * Update vqav2_test.yaml * Update vqav2_test.yaml * Update vqav2_val.yaml --------- Co-authored-by: Li Bo <drluodian@gmail.com>
* Update tqdm progress bar position * Merge commit 'bfdf75d7b67680cdc98fdf3f58458633bb492de6' * Squashed commit of the following: commit 19db53b Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit bfdf75d Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (EvolvingLMMs-Lab#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit f69268b Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (EvolvingLMMs-Lab#17) * Merge commit '95f3d3e116db32b49631f2005c9b2a608f778cc0' * Update dataset paths and improve user prompts commit 95f3d3e Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (EvolvingLMMs-Lab#16) Update datetime format in get_datetime_str() function * remove useless output file * Update dataset path in vqav2.yaml * Squashed commit of the following: commit 9e827183b527e9a035a6359448c1e692df089ed1 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:56:45 2024 +0800 Black lint commit 570500320783a594f218699ea1509ec537591b2e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:47 2024 +0800 Solve doc_iterator_for_counting crashing issue commit 0e75485613ff06b532403a152974eedf8e117c9c Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:13 2024 +0800 Exclude train in refcoco/+/g config commit 6429b7e69ddc0eee6a6728772ec5eb2114d6e331 Merge: 6a1ae69 697a438 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:13 2024 +0000 Merge branch 'dev/bli_add_datasets' of 
https://github.com/EvolvingLMMs-Lab/lmms-eval into dev/bli_add_datasets commit 892bc90979fd6b5b64de0ed68b17ac2944b9e6fa Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:06 2024 +0000 Fix file path and raise error if config file does not exist commit aff94aaf134bb404e48cd59d931cd214197df339 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 00:47:24 2024 +0800 Fix tasks issue for nocaps, refcoco/+/g commit d0dc730cbee420e7121b0520eb40a1f30447930d Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 10:09:43 2024 +0000 Remove unused files and update task configuration commit c69ecbfc52492aca3e5ecfc8d425ee9e7af00978 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:43:56 2024 +0000 Add submission file for coco, flickr30k, nocaps, and textcaps tasks commit 9053bc9aafb19d654b30927a8fec72347c745886 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:32:54 2024 +0000 Refactor get_task_dict function to handle nested groups commit bbf0dbb9e7d05ce6aecd251815a66ac38e9a4169 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:13:46 2024 +0000 Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability commit ee76ebb5bd120708d07477e1462e986ece346975 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 17:07:20 2024 +0800 Fix cli itself can not run with config file commit d252441a31ea5ab29bd32accb5b0b9e1ba73587b Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:09:04 2024 +0000 Squashed commit of the following: commit 19db53b Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add 
model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 3278cccfcd5454ab972071555918fc8571f94d37 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:02:57 2024 +0000 Squashed commit of the following: commit 11795cb69caaaceddf6b284f18a386c7787d476d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit fb19895ca28ecf64d2ea5322e5391f7742e540f4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e02df3b556a9d34d32d8bfa1f99ea992b763bc6f Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 388a23ac4bb47644826869562c70c10b470a1817 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit bcb7df038402c5ef73db230126fcd76795ee69df Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 30056b56be382107f520d5c85b84c3d541d970e9 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 53ddf3fb2716fd99b2fa454656312d6fc92227b7 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit d7bbd3b2cbd78fdc3df2137ac0d625b5f5505acc Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit 741278f40ef70df04efd52ddd79e3c260c41a53e Merge: 22c3adf 1d3fdd4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:43:15 2024 +0000 Merge commit 'bfdf75d7b67680cdc98fdf3f58458633bb492de6' into dev/bli_add_datasets commit cbdaa28e87913c26dd6d2de6bd7c2b3acb556b0a Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:52 2024 +0000 Squashed commit of the following: commit 11795cb69caaaceddf6b284f18a386c7787d476d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 
11:59:12 2024 +0800 refactor multi model code commit fb19895ca28ecf64d2ea5322e5391f7742e540f4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e02df3b556a9d34d32d8bfa1f99ea992b763bc6f Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 388a23ac4bb47644826869562c70c10b470a1817 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit bcb7df038402c5ef73db230126fcd76795ee69df Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 30056b56be382107f520d5c85b84c3d541d970e9 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit bfdf75d Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (EvolvingLMMs-Lab#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit f69268b Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (EvolvingLMMs-Lab#17) * Merge commit '95f3d3e116db32b49631f2005c9b2a608f778cc0' * Update dataset paths and improve user prompts commit 53ddf3fb2716fd99b2fa454656312d6fc92227b7 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit d7bbd3b2cbd78fdc3df2137ac0d625b5f5505acc Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit b8389cf8dac3f22c8d07f9789fdd877d8298d786 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:11 2024 +0000 Add coco_val and coco_test tasks to coco.yaml commit f399ed85ace060b3e64bd5468b17f2a856d005bd Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 04:58:28 2024 +0000 Update dataset_path in flickr30k.yaml commit 4657c9b111bac762f3dc5ff9397ea211b2b62656 Author: Bo Li <drluodian@gmail.com> 
Date: Thu Jan 25 02:12:25 2024 +0000 Merge commit 'bfdf75d7b67680cdc98fdf3f58458633bb492de6' commit 9b3a02280e05f15e305eb86a3669e76f011c6444 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:10:18 2024 +0000 Add submission folder and update file paths for storing prediction results commit bfdf75d Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (EvolvingLMMs-Lab#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit f69268b Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (EvolvingLMMs-Lab#17) * Merge commit '95f3d3e116db32b49631f2005c9b2a608f778cc0' * Update dataset paths and improve user prompts commit ad4a267e810a4653e5d7ad0b5b9000ea0a39028e Merge: c6370bf 51f2eaa Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:07 2024 +0800 Merge branch 'main' into dev/bli_add_datasets commit b441be2447ef78dce4c9c8134ad34cfd20765eef Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 14:08:06 2024 +0000 Update dataset paths and improve user prompts commit 9e30e09b429b30cc67389af0ebc94a1149dcc4bb Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 11:52:33 2024 +0000 Merge commit '95f3d3e116db32b49631f2005c9b2a608f778cc0' commit 95f3d3e Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (EvolvingLMMs-Lab#16) Update datetime format in get_datetime_str() function * Fix bug in login functionality * create vqav2_val * Update vqav2_test.yaml * Update vqav2_test.yaml * Update vqav2_val.yaml --------- Co-authored-by: Li Bo <drluodian@gmail.com>
* Merge commit 'a0b87f52d0c7cde3c320aeac77eb11165e5bb3ef' * Update dataset paths and improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit '4d11dcea8db1a7e4b7347f3c9880788e8cde5d9f' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 4d11dce Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 4d11dce Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (EvolvingLMMs-Lab#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 7c68ea1 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (EvolvingLMMs-Lab#17) * Merge commit 'a0b87f52d0c7cde3c320aeac77eb11165e5bb3ef' * Update dataset paths and improve user prompts commit 2c63fe7f7b6313ce772edeb41974ba0b08b8c469 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit c524ca948439157c24faad9b2fc41c7c139e0ed1 Author: jzhang38 
<a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 4d11dce Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 2c63fe7f7b6313ce772edeb41974ba0b08b8c469 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit c524ca948439157c24faad9b2fc41c7c139e0ed1 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 95460de Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Fix cli itself can not run with config file * Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge 
cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability * Refactor get_task_dict function to handle nested groups * Add submission file for coco, flickr30k, nocaps, and textcaps tasks * Remove unused files and update task configuration * Fix tasks issue for nocaps, refcoco/+/g * Fix file path and raise error if config file does not exist * Exclude train in refcoco/+/g config * Solve doc_iterator_for_counting crashing issue * Black lint * Refactor code to improve performance and readability * Squashed commit of the following: commit aaf199c777fe7b81e1ad39bd72cf2cd1daf30d69 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:03:57 2024 +0800 change okvqa yaml commit 07b5317f2d9f85465b35dcb2e11cf0d3a51aeb2a Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:55:40 2024 +0800 change yaml commit 6126fe6d8bdf09825855236377cb78b5e4b242ed Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:42:43 2024 +0800 add okvqa task commit 95460de Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Squashed commit of the following: commit c9f49774bfa0f505fb266871f3e56ae5a397a97b Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:06:02 2024 +0800 change ocr reference commit d2a852842282e211ca885180db1aba4b1d1f8c2b Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:05:46 2024 +0800 revert example_eval commit 8ef634ccbe2bd5f1159674f1ce70349d7adf935f Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:17:28 2024 +0800 edit vizwiz utils commit f49f4961d921b7c8196c1484418ec1673e5e4b74 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:49:47 2024 +0800 reorganize __init__ commit 
368690aad385c5e1972fe5394b94a8eb1a47efca Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:46:20 2024 +0800 minor fixes commit 47463754525984a17f790c5dace6ff05b1ce72f7 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 17:41:03 2024 +0800 add vizwizvqa eval rask commit 95460de Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Refactor mathvista.yaml and utils.py * Add gpt_eval_score to mathvista_process_results * Refactor mathvista_aggregate_results to return average accuracy score * Fix refcoco evaluation error * Fix evaluation problem for refcoco+/g * Refactor mathvista.yaml and mathvista_evals.py * Add dependencies and update YAML files * Refactor mmbench_en/utils.py to save test results to separate Excel file * Fix caption task prompt * Add group field to mmbench_en_test and mmbench_en_val yaml files * Delete mmbench_en_val.yaml file * Update mmbench_cn.yaml and mmbench_cn_test.yaml * Update mmbench_cn_val.yaml and utils.py * Remove unused fields in mmbench_cn_cc_process_results function * Update aggregation function for mmbench_en_dev.yaml * Fix capitalization of L2-category key in utils.py * Fix variable name in mmbench_process_results function * Delete mmbench_cn_val.yaml file * Update mathvista_test.yaml and mathvista_testmini.yaml * Fix warnings and update mathvista.yaml * Remove system message from MathVistaEvaluator * Update GPT model version in MathVistaEvaluator constructor * Update GQA_RAW_IMAGE_DATASET path in utils.py * change vizwiz to test set * Add split flag to mathvista_aggregate_results function * Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml files * Update lmms_eval/evaluator.py and 
lmms_eval/tasks/vizwizvqa/utils.py * vizwiz-val * Update utils.py * Update vizwizvqa.yaml --------- Co-authored-by: Bo Li <drluodian@gmail.com> Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com>
* Merge commit 'e546b08ca8286fe2e4d0943ad9b41667d275f65a' * Update dataset paths and improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit 'f9c9014ba3566cb1bf1f19bf0d85c6e54ce7c8b4' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit a9bdc9b952df662cd7156ccc63af31ae0a83d2ff Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 7313f07606ec94f555d50d4523adcb2c1714922e Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 69f7f0be0eaa855c6c46e7c748a7ac69a04606e8 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 9173602a072c669f3348a58b715c77cfef4f0fbf Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 9386d0011c4d6ed7190373d0951d903c7548ccb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 77079bc826943e187247863d5473237de05b3cf2 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit f9c9014 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (EvolvingLMMs-Lab#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4a97197 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (EvolvingLMMs-Lab#17) * Merge commit 'e546b08ca8286fe2e4d0943ad9b41667d275f65a' * Update dataset paths and improve user prompts commit 1a284e6a412da3cc503297f33417dad19dd59aee Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit d1c04e8c8e509a375c117020b3c241cc736f9365 Author: jzhang38 
<a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit a9bdc9b952df662cd7156ccc63af31ae0a83d2ff Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 7313f07606ec94f555d50d4523adcb2c1714922e Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 69f7f0be0eaa855c6c46e7c748a7ac69a04606e8 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 9173602a072c669f3348a58b715c77cfef4f0fbf Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 9386d0011c4d6ed7190373d0951d903c7548ccb5 Merge: 7e8b57d 1d3fdd4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 77079bc826943e187247863d5473237de05b3cf2 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 1a284e6a412da3cc503297f33417dad19dd59aee Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit d1c04e8c8e509a375c117020b3c241cc736f9365 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 0e74884 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Fix cli itself can not run with config file * Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge 
cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability * Refactor get_task_dict function to handle nested groups * Add submission file for coco, flickr30k, nocaps, and textcaps tasks * Remove unused files and update task configuration * Fix tasks issue for nocaps, refcoco/+/g * Fix file path and raise error if config file does not exist * Exclude train in refcoco/+/g config * Solve doc_iterator_for_counting crashing issue * Black lint * Refactor code to improve performance and readability * Squashed commit of the following: commit a2cc9303dc72e4d53983bb56e54a32e977c3e270 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:03:57 2024 +0800 change okvqa yaml commit 35e87e7c7a480d005abf607c2527a35457d92311 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:55:40 2024 +0800 change yaml commit 89755323596b85208ed33aa88c296604a39af6eb Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:42:43 2024 +0800 add okvqa task commit 0e74884 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Squashed commit of the following: commit 0b0d30dfb247c5f0b7b68398b9e9fcde74cf7fa2 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:06:02 2024 +0800 change ocr reference commit e273f9cbd91540df86bdbc652bff88a847bd0d2d Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:05:46 2024 +0800 revert example_eval commit e84126aaaf8a07bd371a0571a914ccbcd3697f20 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:17:28 2024 +0800 edit vizwiz utils commit 110deab53dc1a2fd349b1872cd261b69074c5fa8 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:49:47 2024 +0800 reorganize __init__ commit 
0fa3e0c40075997ea80ed976bdee9615f17d3ece Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:46:20 2024 +0800 minor fixes commit 2aaca579120def99860f90054233f3358950fa66 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 17:41:03 2024 +0800 add vizwizvqa eval rask commit 0e74884 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (EvolvingLMMs-Lab#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Refactor mathvista.yaml and utils.py * Add gpt_eval_score to mathvista_process_results * Refactor mathvista_aggregate_results to return average accuracy score * Fix refcoco evaluation error * Fix evaluation problem for refcoco+/g * Refactor mathvista.yaml and mathvista_evals.py * Add dependencies and update YAML files * Refactor mmbench_en/utils.py to save test results to separate Excel file * Fix caption task prompt * Add group field to mmbench_en_test and mmbench_en_val yaml files * Delete mmbench_en_val.yaml file * Update mmbench_cn.yaml and mmbench_cn_test.yaml * Update mmbench_cn_val.yaml and utils.py * Remove unused fields in mmbench_cn_cc_process_results function * Update aggregation function for mmbench_en_dev.yaml * Fix capitalization of L2-category key in utils.py * Fix variable name in mmbench_process_results function * Delete mmbench_cn_val.yaml file * Update mathvista_test.yaml and mathvista_testmini.yaml * Fix warnings and update mathvista.yaml * Remove system message from MathVistaEvaluator * Update GPT model version in MathVistaEvaluator constructor * Update GQA_RAW_IMAGE_DATASET path in utils.py * change vizwiz to test set * Add split flag to mathvista_aggregate_results function * Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml files * Update lmms_eval/evaluator.py and 
lmms_eval/tasks/vizwizvqa/utils.py * vizwiz-val * Update utils.py * Update vizwizvqa.yaml --------- Co-authored-by: Bo Li <drluodian@gmail.com> Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com>
* Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * Add coco, refcoco support * Fix doc_to_visual error * Fix segmentation mask error * Add refcoco+, refcocog * Remove debug code * black lint * Remove unused code and scripts * Fix group stderr N/A error between str and int * Fix letter case issue * Update lmms_eval tasks and utils * Fix coco test_split name * Add llava-bench-in-the-wild support * Black codestyle, lint * Add COCO evaluation metric * Add refcoco, refcocog, refcoco+ evaluation kit * Add llava bench coco support --------- Co-authored-by: Bo Li <drluodian@gmail.com> * VQAv2 eval (#4) * vqav2 * Add vqav2_process_results function and update vqav2_doc_to_text function * Implement vqav2_process_results function to return exact match score * Refactor fewshot_docs() to use config.fewshot_config * Refactor Task class to handle fewshot_docs when training and validation docs are not available * Add answer processing logic in vqav2_process_results function * Refactor vqav2_process_results function and add submission aggregation * Add vqav2_aggreate_submissions function to utils.py * textvqa * Refactor answer processing in textvqa_process_results() function * textvqa eval * Update dataset path and modify textvqa_doc_to_text function * Capitalize the question in textvqa_doc_to_text function * Update textvqa.yaml and utils.py * Fix formatting issues in lmms_eval/api/task.py, lmms_eval/tasks/gqa/utils.py, lmms_eval/tasks/textvqa/utils.py, and lmms_eval/tasks/vqav2/utils.py --------- Co-authored-by: Li Bo <drluodian@gmail.com> * [Big Changes] add LLaVA-1.6, MMVet, LLaVA-W, POPE, and many other changes on logs, model args. 
(#7) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * black lint * Remove unused code and scripts * Update lmms_eval tasks and utils * Update LMMS-Eval dependencies and configurations * Squashed commit of the following: commit 209f3904f33210bec0b4b146e96fcbd67a4e1541 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Wed Jan 17 20:27:13 2024 +0800 Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * Add coco, refcoco support * Fix doc_to_visual error * Fix segmentation mask error * Add refcoco+, refcocog * Remove debug code * black lint * Remove unused code and scripts * Fix group stderr N/A error between str and int * Fix letter case issue * Update lmms_eval tasks and utils * Fix coco test_split name * Add llava-bench-in-the-wild support * Black codestyle, lint * Add COCO evaluation metric * Add refcoco, refcocog, refcoco+ evaluation kit * Add llava bench coco support --------- Co-authored-by: Bo Li <drluodian@gmail.com> commit f102f038a161fe667628accd2d9daa33e70fe74f Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 17 20:26:58 2024 +0800 Update utils.py (#6) * Fix logging issue and remove unnecessary whitespace * Add openai and pycocoevalcap dependencies * Fix device mapping issue in Llava constructor * Add support for truncating context in generation * Update Llava model and evaluation configuration * Update YAML configuration files * Update YAML configuration files * add otterhd and gemini models * Add support for custom image aspect ratio in Llava model * Add dataset_kwargs and max_gen_toks to YAML files * Fix log_samples suffix typo and use hash for output name * Refactor LMMS evaluation code and update LLAVA model properties * matched response for mistral-llava * Refactor logging in llava_aggregation function * Print evaluation statistics instead of logging them * Fix logging information in llava_aggregation 
function * Add new models and dataset_kwargs for COCO tasks * Update truncate_context parameter in Llava class constructor * Update dataset_kwargs in YAML files * Remove issue type tags from issue and pull request templates * add mmvet and try to modify llava arch * black lint * Update lmms_eval tasks and utils * Update LMMS-Eval dependencies and configurations * Squashed commit of the following: commit 209f3904f33210bec0b4b146e96fcbd67a4e1541 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Wed Jan 17 20:27:13 2024 +0800 Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * Add coco, refcoco support * Fix doc_to_visual error * Fix segmentation mask error * Add refcoco+, refcocog * Remove debug code * black lint * Remove unused code and scripts * Fix group stderr N/A error between str and int * Fix letter case issue * Update lmms_eval tasks and utils * Fix coco test_split name * Add llava-bench-in-the-wild support * Black codestyle, lint * Add COCO evaluation metric * Add refcoco, refcocog, refcoco+ evaluation kit * Add llava bench coco support --------- Co-authored-by: Bo Li <drluodian@gmail.com> commit f102f038a161fe667628accd2d9daa33e70fe74f Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 17 20:26:58 2024 +0800 Update utils.py (#6) * Fix logging issue and remove unnecessary whitespace * Add openai and pycocoevalcap dependencies * Fix device mapping issue in Llava constructor * Add support for truncating context in generation * Update Llava model and evaluation configuration * Update YAML configuration files * Update YAML configuration files * add otterhd and gemini models * Add support for custom image aspect ratio in Llava model * Add dataset_kwargs and max_gen_toks to YAML files * Fix log_samples suffix typo and use hash for output name * Refactor LMMS evaluation code and update LLAVA model properties * matched response for mistral-llava * Refactor 
logging in llava_aggregation function * Print evaluation statistics instead of logging them * Fix logging information in llava_aggregation function * Add new models and dataset_kwargs for COCO tasks * Update truncate_context parameter in Llava class constructor * Update dataset_kwargs in YAML files * Remove issue type tags from issue and pull request templates * Refactor pope utils functions * Update transformers dependency to version 4.36.2 * Revise llava-in-the-wild prompt for align * Add default values for gen_kwargs in Llava class * Fix formatting issues and import pdb for debugging * Remove pdb.set_trace() and update default value for max_new_tokens * Add llava loglikelihood * Fix formatting and indentation issues in lmms_eval/api/metrics.py and lmms_eval/models/llava.py * Update function to handle edge cases This commit updates the function to handle edge cases, improving the overall reliability and robustness of the code. * Update black version in pre-commit config * Remove duplicate lines in gqa * Another way to solve memory issue * Handle exception in model generation * Refactor pope_aggregate_results to use "score" key instead of "pope_accuracy" * Update pope metrics aggregation functions * Add model_to_prompt in pope.yaml * Update pope.yaml configuration * Refactor code to simplify construct_requests call --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> * Add datetime to output name in cli_evaluate function Add get_datetime_str function to utils.py * Refactor pope_aggregate_f1_score function * Fix datetime format in get_datetime_str function * Update JSON dump indentation in cli_evaluate function * Add datetime to output name in cli_evaluate function (#10) * Revert "Add datetime to output name in cli_evaluate function" This reverts commit ef26f78c46b50d8769a4fb6990b909162c2881c3. * Add datetime to output name in cli_evaluate function * [Datasets] Added POPE and Aligned. 
(#11) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * [Dataset] Add SEED-Bench, TextCaps, NoCaps (#12) * Change coco from print to logger * Add llava loglikelihood * Add Nocaps support * Fix pass through function * Add textcaps support * Fix textcaps eval image_id * Add seedbench support * Add seedbench ppl evaluation * black lint * [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in 
gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * add mmmu (#15) * add mmme * black * add mmmu (#15) * add mmme * black * [Memory issue] Solve memory issue for building context (#14) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * Remove unused function llava_aggregation * Refractor llava-bench aggregation code * Add logs and scripts to .gitignore, and set image_aspect_ratio to original in scienceqa.yaml * Update generation parameters in scienceqa.yaml * Solve memory issue for building context * Solved gather result error * Update lmms_eval scienceqa_img config * Fixed nocaps store results * Revise seedbench prompt * Squashed commit of the following: commit 290126e6a269db4cca9b3544bd017d6c17012793 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 24 14:07:36 2024 +0800 add mmmu (#15) * add mmme * black commit 8b0227cd7b2602d096d773a01b2199d1f4110f22 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 10:00:33 2024 +0800 [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function 
Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 93f847c5851fd246716367935d6b807b17d53949 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:02:57 2024 +0000 Squashed commit of the following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 4d11dce Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit fa4ad4404e26d8924f55208746dbb9143b464011 Merge: 22c3adf 4d11dce Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:43:15 2024 +0000 Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' into dev/bli_add_datasets commit 22c3adfd0645acc23b6d7c06b487f4ffd47666c4 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:52 2024 +0000 Squashed commit of the 
following: commit 4d48d0c9b88e62dfebe05ec909b7f1851e9cd75d Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 4a4b7bec200c72332b61a0c277cd8f8a34e4f721 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit 63739fc6fa0a462d807ae81de0db0173102de584 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit edcc752f97ea3845cefad56624e5d2855066f680 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 41f4b63d3a6e83babe92bac32a7432a8ef740bb5 Merge: 7e8b57d 4d11dce Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit 7e8b57d3bcc21d2a049d3abbc8a8201631641db4 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 5f55126484a7c9325db586d26cf2052538222804 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit aa6f8853cf82384fb3b15306fec4769212fbc5ab Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit 4c712336b6f7438e717a865910bb241e413a4688 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:11 2024 +0000 Add coco_val and coco_test tasks to coco.yaml commit b5547126c855927fd4dc8384211e4aceee40870f Author: Bo Li <drluodian@gmail.com> Date: Thu 
Jan 25 04:58:28 2024 +0000 Update dataset_path in flickr30k.yaml commit f786f61e2559f082072f21aa9030e2080ddaf809 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:12:25 2024 +0000 Merge commit 'ecb47d73d6e000b472be6c5c0cdc9413c7734384' commit 796a011000e0df90f66f8e80cb34dc2318ae9ac8 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:10:18 2024 +0000 Add submission folder and update file paths for storing prediction results commit ecb47d73d6e000b472be6c5c0cdc9413c7734384 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit dc23f4b42b1dd60b41904d7ddbee1412d6851077 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts commit 118744c63eb2d9724571d85fbbd85fcc9ad05b59 Merge: c6370bf a0b87f5 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:07 2024 +0800 Merge branch 'main' into dev/bli_add_datasets commit c6370bff65903681f00cf3d07111d8e15a57b619 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 14:08:06 2024 +0000 Update dataset paths and improve user prompts commit 810daf458fa94cb3ec2b4a6cc5ecb1e656a24002 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 11:52:33 2024 +0000 Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' commit 95ef3ea519cbd772924f9a6afa5394979eb00432 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * Fix bug in login functionality * create vqav2_val * Update vqav2_test.yaml * Update vqav2_test.yaml * Update vqav2_val.yaml --------- Co-authored-by: Li Bo <drluodian@gmail.com> * vizwiz dataset (#24) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and 
improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 4d11dce Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d407…
add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Squashed commit of the following: commit 0b0d30dfb247c5f0b7b68398b9e9fcde74cf7fa2 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:06:02 2024 +0800 change ocr reference commit e273f9cbd91540df86bdbc652bff88a847bd0d2d Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:05:46 2024 +0800 revert example_eval commit e84126aaaf8a07bd371a0571a914ccbcd3697f20 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:17:28 2024 +0800 edit vizwiz utils commit 110deab53dc1a2fd349b1872cd261b69074c5fa8 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:49:47 2024 +0800 reorganize __init__ commit 0fa3e0c40075997ea80ed976bdee9615f17d3ece Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:46:20 2024 +0800 minor fixes commit 2aaca579120def99860f90054233f3358950fa66 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 17:41:03 2024 +0800 add vizwizvqa eval rask commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Refactor mathvista.yaml and utils.py * Add gpt_eval_score to mathvista_process_results * Refactor mathvista_aggregate_results to return average accuracy score * Fix refcoco evaluation error * Fix evaluation problem for refcoco+/g * Refactor mathvista.yaml and mathvista_evals.py * Add dependencies and update YAML files * Refactor mmbench_en/utils.py to save test results to separate Excel file * Fix caption task prompt * Add group field to mmbench_en_test and mmbench_en_val yaml files * Delete mmbench_en_val.yaml file * Update mmbench_cn.yaml and mmbench_cn_test.yaml * Update mmbench_cn_val.yaml and utils.py * 
Remove unused fields in mmbench_cn_cc_process_results function * Update aggregation function for mmbench_en_dev.yaml * Fix capitalization of L2-category key in utils.py * Fix variable name in mmbench_process_results function * Delete mmbench_cn_val.yaml file * Update mathvista_test.yaml and mathvista_testmini.yaml * Fix warnings and update mathvista.yaml * Remove system message from MathVistaEvaluator * Update GPT model version in MathVistaEvaluator constructor * Update GQA_RAW_IMAGE_DATASET path in utils.py * change vizwiz to test set * Add split flag to mathvista_aggregate_results function * Add higher_is_better: false to gpt_eval_info metric in d170_cn, d170_en, dc100_en, and dc200_cn yaml files * Add download configuration for dataset * Update GQA_RAW_IMAGE_DATASET path in utils.py * add datasets * Update gpt_eval_model_name in mathvista.yaml * Merge commit '817eb057bcb61226b33d3ac3c8def01c36c90f96' * Update pyproject.toml with dependencies and URLs * Squashed commit of the following: commit f253968ad703f682a29317bdd51ec6c1fd7c5465 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Sat Jan 27 13:56:37 2024 +0800 Dev/add chartqa and ai2d (#23) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Refactor image processing and submission file path * Refactor directory creation logic in cli_evaluate_single function * Update dataset path and test split in vqav2.yaml * Remove "total" column from cap_details_columns DataFrame * 
Add retry logic for dataset download * Add 'tenacity' to dependencies in pyproject.toml * Refactor code in ContextSampler class * Update Black version and configuration, and improve code
readability in ContextSampler * Update Black version and line length --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg> * vqav2 (#25) * Update tqdm progress bar position * Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' * Squashed commit of the following: commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504fabc3b62f356c467b2fd1119083d27313 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit 767f7e2cae60cf67ec5878234d84321395a3ed15 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * remove useless output file * Update dataset path in vqav2.yaml * Squashed commit of the following: commit eeb2b9827502f044ef67d8440f53124baf219ba3 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:56:45 2024 +0800 Black lint commit 1ce9f0b37e4bc5e6ff5fbfcd23fd339eb14974ae Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 09:55:47 2024 +0800 Solve doc_iterator_for_counting crashing issue commit e12b3bb41ed4f51540cfac84e5e96d15777540c4 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 
09:55:13 2024 +0800 Exclude train in refcoco/+/g config commit 42c56f82bc4ccae12e19e76d09d7e525ca9ef2f4 Merge: 6a1ae69 697a438 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:13 2024 +0000 Merge branch 'dev/bli_add_datasets' of https://github.com/EvolvingLMMs-Lab/lmms-eval into dev/bli_add_datasets commit aed08303fe87808986d206540a0c0ee6d8764988 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 17:17:06 2024 +0000 Fix file path and raise error if config file does not exist commit a105386613c443d9e740c89725cbd1281bbdfef6 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Fri Jan 26 00:47:24 2024 +0800 Fix tasks issue for nocaps, refcoco/+/g commit 21c8119e377760f44c769bed2528d863a8f4333b Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 10:09:43 2024 +0000 Remove unused files and update task configuration commit 0ccb2629c2aacdb297b7cf0c9c2bcfa386bb7582 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:43:56 2024 +0000 Add submission file for coco, flickr30k, nocaps, and textcaps tasks commit 5365e13e93c702a1e0e259ee6a08d6a427d72470 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:32:54 2024 +0000 Refactor get_task_dict function to handle nested groups commit 6773348c807bcfa1b09ceffc90c75e15cad908f7 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:13:46 2024 +0000 Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability commit 31140f9c87dea89ca94c94bc850e3a8d43e5f8b4 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 17:07:20 2024 +0800 Fix cli itself can not run with config file commit df1bad47f6ed13f94848d2bee29b28e00c2384b2 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:09:04 
2024 +0000 Squashed commit of the following: commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code commit 06383aa4a5ff59db52fc8d584f3086efd88b7e74 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 09:02:57 2024 +0000 Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 4d11dce Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit 7a71fd6022ee5985100dda38b94956595cec77a5 Merge: 22c3adf 4d11dce Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:43:15 2024 +0000 Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' into dev/bli_add_datasets commit 
6870cba13cb54976480c1d5e8d97602c246f881b Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:52 2024 +0000 Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 4d11dce Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504fabc3b62f356c467b2fd1119083d27313 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme commit b40d522b6bf483ebdfbf5facd4573de0cf8a93f6 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 08:38:11 2024 +0000 
Add coco_val and coco_test tasks to coco.yaml commit 5bf643f73d06f1e540897b753450352bb92fd9ec Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 04:58:28 2024 +0000 Update dataset_path in flickr30k.yaml commit 95f110f0eef5196205bc501367e3642c57cc7a17 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:12:25 2024 +0000 Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' commit c844ae49b18c1334711832208b0359c9439fe1c0 Author: Bo Li <drluodian@gmail.com> Date: Thu Jan 25 02:10:18 2024 +0000 Add submission folder and update file paths for storing prediction results commit 842fbc6f2da7d9a118adf9ec27c3d8542d74168e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504fabc3b62f356c467b2fd1119083d27313 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit f0446227f0dd93651e9d6c06254bbf5212ede2dd Merge: c6370bf a0b87f5 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:07 2024 +0800 Merge branch 'main' into dev/bli_add_datasets commit 1e1f6cfccba758dc606fa4217102518fab73c936 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 14:08:06 2024 +0000 Update dataset paths and improve user prompts commit 966933754b9e5179995b3ab41d746603e13e75c6 Author: Bo Li <drluodian@gmail.com> Date: Wed Jan 24 11:52:33 2024 +0000 Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' commit 767f7e2cae60cf67ec5878234d84321395a3ed15 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 19:51:34 2024 +0800 Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * Fix bug in login functionality * create vqav2_val * Update vqav2_test.yaml * Update vqav2_test.yaml * Update vqav2_val.yaml --------- 
Co-authored-by: Li Bo <drluodian@gmail.com> * vizwiz dataset (#24) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and
improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 4d11dce Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752…
* Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * Add coco, refcoco support * Fix doc_to_visual error * Fix segmentation mask error * Add refcoco+, refcocog * Remove debug code * black lint * Remove unused code and scripts * Fix group stderr N/A error between str and int * Fix letter case issue * Update lmms_eval tasks and utils * Fix coco test_split name * Add llava-bench-in-the-wild support * Black codestyle, lint * Add COCO evaluation metric * Add refcoco, refcocog, refcoco+ evaluation kit * Add llava bench coco support --------- Co-authored-by: Bo Li <drluodian@gmail.com> * VQAv2 eval (#4) * vqav2 * Add vqav2_process_results function and update vqav2_doc_to_text function * Implement vqav2_process_results function to return exact match score * Refactor fewshot_docs() to use config.fewshot_config * Refactor Task class to handle fewshot_docs when training and validation docs are not available * Add answer processing logic in vqav2_process_results function * Refactor vqav2_process_results function and add submission aggregation * Add vqav2_aggreate_submissions function to utils.py * textvqa * Refactor answer processing in textvqa_process_results() function * textvqa eval * Update dataset path and modify textvqa_doc_to_text function * Capitalize the question in textvqa_doc_to_text function * Update textvqa.yaml and utils.py * Fix formatting issues in lmms_eval/api/task.py, lmms_eval/tasks/gqa/utils.py, lmms_eval/tasks/textvqa/utils.py, and lmms_eval/tasks/vqav2/utils.py --------- Co-authored-by: Li Bo <drluodian@gmail.com> * [Big Changes] add LLaVA-1.6, MMVet, LLaVA-W, POPE, and many other changes on logs, model args. 
(#7) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * black lint * Remove unused code and scripts * Update lmms_eval tasks and utils * Update LMMS-Eval dependencies and configurations * Squashed commit of the following: commit 209f3904f33210bec0b4b146e96fcbd67a4e1541 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Wed Jan 17 20:27:13 2024 +0800 Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * Add coco, refcoco support * Fix doc_to_visual error * Fix segmentation mask error * Add refcoco+, refcocog * Remove debug code * black lint * Remove unused code and scripts * Fix group stderr N/A error between str and int * Fix letter case issue * Update lmms_eval tasks and utils * Fix coco test_split name * Add llava-bench-in-the-wild support * Black codestyle, lint * Add COCO evaluation metric * Add refcoco, refcocog, refcoco+ evaluation kit * Add llava bench coco support --------- Co-authored-by: Bo Li <drluodian@gmail.com> commit f102f038a161fe667628accd2d9daa33e70fe74f Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 17 20:26:58 2024 +0800 Update utils.py (#6) * Fix logging issue and remove unnecessary whitespace * Add openai and pycocoevalcap dependencies * Fix device mapping issue in Llava constructor * Add support for truncating context in generation * Update Llava model and evaluation configuration * Update YAML configuration files * Update YAML configuration files * add otterhd and gemini models * Add support for custom image aspect ratio in Llava model * Add dataset_kwargs and max_gen_toks to YAML files * Fix log_samples suffix typo and use hash for output name * Refactor LMMS evaluation code and update LLAVA model properties * matched response for mistral-llava * Refactor logging in llava_aggregation function * Print evaluation statistics instead of logging them * Fix logging information in llava_aggregation 
function * Add new models and dataset_kwargs for COCO tasks * Update truncate_context parameter in Llava class constructor * Update dataset_kwargs in YAML files * Remove issue type tags from issue and pull request templates * add mmvet and try to modify llava arch * black lint * Update lmms_eval tasks and utils * Update LMMS-Eval dependencies and configurations * Squashed commit of the following: commit 209f3904f33210bec0b4b146e96fcbd67a4e1541 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Wed Jan 17 20:27:13 2024 +0800 Add COCO, RefCOCO, RefCOCO+, RefCOCOg (#5) * Update author name and email in pyproject.toml * add mmvet and try to modify llava arch * Add coco, refcoco support * Fix doc_to_visual error * Fix segmentation mask error * Add refcoco+, refcocog * Remove debug code * black lint * Remove unused code and scripts * Fix group stderr N/A error between str and int * Fix letter case issue * Update lmms_eval tasks and utils * Fix coco test_split name * Add llava-bench-in-the-wild support * Black codestyle, lint * Add COCO evaluation metric * Add refcoco, refcocog, refcoco+ evaluation kit * Add llava bench coco support --------- Co-authored-by: Bo Li <drluodian@gmail.com> commit f102f038a161fe667628accd2d9daa33e70fe74f Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 17 20:26:58 2024 +0800 Update utils.py (#6) * Fix logging issue and remove unnecessary whitespace * Add openai and pycocoevalcap dependencies * Fix device mapping issue in Llava constructor * Add support for truncating context in generation * Update Llava model and evaluation configuration * Update YAML configuration files * Update YAML configuration files * add otterhd and gemini models * Add support for custom image aspect ratio in Llava model * Add dataset_kwargs and max_gen_toks to YAML files * Fix log_samples suffix typo and use hash for output name * Refactor LMMS evaluation code and update LLAVA model properties * matched response for mistral-llava * Refactor 
logging in llava_aggregation function * Print evaluation statistics instead of logging them * Fix logging information in llava_aggregation function * Add new models and dataset_kwargs for COCO tasks * Update truncate_context parameter in Llava class constructor * Update dataset_kwargs in YAML files * Remove issue type tags from issue and pull request templates * Refactor pope utils functions * Update transformers dependency to version 4.36.2 * Revise llava-in-the-wild prompt for align * Add default values for gen_kwargs in Llava class * Fix formatting issues and import pdb for debugging * Remove pdb.set_trace() and update default value for max_new_tokens * Add llava loglikelihood * Fix formatting and indentation issues in lmms_eval/api/metrics.py and lmms_eval/models/llava.py * Update function to handle edge cases This commit updates the function to handle edge cases, improving the overall reliability and robustness of the code. * Update black version in pre-commit config * Remove duplicate lines in gqa * Another way to solve memory issue * Handle exception in model generation * Refactor pope_aggregate_results to use "score" key instead of "pope_accuracy" * Update pope metrics aggregation functions * Add model_to_prompt in pope.yaml * Update pope.yaml configuration * Refactor code to simplify construct_requests call --------- Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> * Add datetime to output name in cli_evaluate function Add get_datetime_str function to utils.py * Refactor pope_aggregate_f1_score function * Fix datetime format in get_datetime_str function * Update JSON dump indentation in cli_evaluate function * Add datetime to output name in cli_evaluate function (#10) * Revert "Add datetime to output name in cli_evaluate function" This reverts commit ef26f78c46b50d8769a4fb6990b909162c2881c3. * Add datetime to output name in cli_evaluate function * [Datasets] Added POPE and Aligned. 
(#11) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * [Dataset] Add SEED-Bench, TextCaps, NoCaps (#12) * Change coco from print to logger * Add llava loglikelihood * Add Nocaps support * Fix pass through function * Add textcaps support * Fix textcaps eval image_id * Add seedbench support * Add seedbench ppl evaluation * black lint * [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in
gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * add mmmu (#15) * add mmme * black * add mmmu (#15) * add mmme * black * [Memory issue] Solve memory issue for building context (#14) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * Remove unused function llava_aggregation * Refractor llava-bench aggregation code * Add logs and scripts to .gitignore, and set image_aspect_ratio to original in scienceqa.yaml * Update generation parameters in scienceqa.yaml * Solve memory issue for building context * Solved gather result error * Update lmms_eval scienceqa_img config * Fixed nocaps store results * Revise seedbench prompt * Squashed commit of the following: commit 290126e6a269db4cca9b3544bd017d6c17012793 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 24 14:07:36 2024 +0800 add mmmu (#15) * add mmme * black commit 8b0227cd7b2602d096d773a01b2199d1f4110f22 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 10:00:33 2024 +0800 [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function 
to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions commit fec494dbe5971e8fa5a886b191a4781be3ce7a6f Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Tue Jan 23 19:17:40 2024 +0800 [Dataset] Add SEED-Bench, TextCaps, NoCaps (#12) * Change coco from print to logger * Add llava loglikelihood * Add Nocaps support * Fix pass through function * Add textcaps support * Fix textcaps eval image_id * Add seedbench support * Add seedbench ppl evaluation * black lint commit 4c3c2c63a681f29c537c2467957de1a90568748d Author: Li Bo <drluodian@gmail.com> Date: Tue Jan 23 19:17:12 2024 +0800 [Datasets] Added POPE and Aligned. 
(#11) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function --------- Co-authored-by: Bo Li <drluodian@gmail.com> * [Memory issue] Solve memory issue for building context (#14) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions * Remove unused function llava_aggregation * Refractor llava-bench aggregation code * Add logs and scripts to .gitignore, and set image_aspect_ratio to original in scienceqa.yaml * Update generation parameters in scienceqa.yaml * Solve memory issue for building context * Solved gather result error * Update lmms_eval scienceqa_img config * Fixed nocaps store results * Revise seedbench prompt * Squashed commit of the following: commit c3cc24a89415aeccad31ccbb10642af677cd6fe5 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Wed Jan 24 14:07:36 2024 +0800 add mmmu (#15) * add mmme * black commit 0dbc5d16c4f45ebea8def5f0bc1a36fcd93f9a05 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 10:00:33 2024 +0800 [Datasets] Add four internal evaluation datasets (#13) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function * Remove unused variable in mmvet_process_results function * Remove unused imports in utils.py * Refactor get_chat_response function to include retries 
for API requests * Update gpt_eval_model_name in lmms_eval/tasks/dc100_en.yaml and add retry logic in get_chat_response function * Update prompt variable in lmms_eval tasks * Refactor output_name variable in cli_evaluate function * Fix logging message in mmvet_process_results function * Update sleep time in get_chat_response function * Merge commit 'fec494dbe5971e8fa5a886b191a4781be3ce7a6f' * Refactor get_eval function to include retries * Add token parameter to load_dataset function in gqa_doc_to_visual * Refactor llava_process_results and llava_aggregation functions commit fec494dbe5971e8fa5a886b191a4781be3ce7a6f Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Tue Jan 23 19:17:40 2024 +0800 [Dataset] Add SEED-Bench, TextCaps, NoCaps (#12) * Change coco from print to logger * Add llava loglikelihood * Add Nocaps support * Fix pass through function * Add textcaps support * Fix textcaps eval image_id * Add seedbench support * Add seedbench ppl evaluation * black lint commit 4c3c2c63a681f29c537c2467957de1a90568748d Author: Li Bo <drluodian@gmail.com> Date: Tue Jan 23 19:17:12 2024 +0800 [Datasets] Added POPE and Aligned. 
(#11) * Update generation_kwargs in pope.yaml * Update pope_doc_to_text function --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * Add output path file naming convention (#16) Update datetime format in get_datetime_str() function * [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts * [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '95ef3ea519cbd772924f9a6afa5394979eb00432' * Update dataset paths and improve user prompts * [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps * [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps * add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Dev/add chartqa and ai2d (#23) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * Dev/add chartqa and ai2d (#23) * add mmme * black * add model 
specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * add chartqa * black * add ai2d * black * update chartqa * blacl * update ai2d dataset * black * Add 'submissions/' directory to .gitignore * Add Python setup and Black version installation workflow Refactor ContextSampler class in samplers.py Remove unnecessary line in DecontaminationFilter class Update dependencies in pyproject.toml * Refactor code in ContextSampler class --------- Co-authored-by: Bo Li <drluodian@gmail.com> * [Datasets] Changes for Flickr30K and NoCaps, also merged Peiyuan's Model Specific Prompt. (#20) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts * Add submission folder and update file paths for storing prediction results * Merge commit '842fbc6f2da7d9a118adf9ec27c3d8542d74168e' * Update dataset_path in flickr30k.yaml * Add coco_val and coco_test tasks to coco.yaml * Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 4d11dce Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 
842fbc6f2da7d9a118adf9ec27c3d8542d74168e Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Thu Jan 25 09:47:31 2024 +0800 [Dataset] Add flickr30k (#18) * Add flickr30k support * Black lint * Align prompt with NoCaps commit 4bf0504fabc3b62f356c467b2fd1119083d27313 Author: Li Bo <drluodian@gmail.com> Date: Wed Jan 24 22:10:14 2024 +0800 [Datasets] modify NoCaps data path and prompts (#17) * Merge commit '767f7e2cae60cf67ec5878234d84321395a3ed15' * Update dataset paths and improve user prompts commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit 542a34dc5721ecdff6c5c68b0568692ad3a17149 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:59:12 2024 +0800 refactor multi model code commit 3c397b8af85192b1821b3b6a0d8b8df746b5347c Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:51:16 2024 +0800 print table at the end commit e7b8a2d1f1e7337f02298efafd2ebf81543f4f85 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 11:20:59 2024 +0800 add yaml config to supprot multi-model eval commit 2626383d99b5eac59d531ca0f293df960570c524 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:39:42 2024 +0800 black commit 8349935fe145e33af0007ad4fb0d71fd925be7a0 Merge: 7e8b57d 4d11dce Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:37:57 2024 +0800 resolve conflicts in sqa commit d4e8e2552d40752bfdc5bbf4cd962c1798096258 Author: jzhang38 <a1286225768@gmail.com> Date: Thu Jan 25 10:36:46 2024 +0800 add model specific prompt and gen kwargs commit 520c7a2cafe60810aca79df814ce6829d4576032 Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 13:56:51 2024 +0800 black commit 3a633240327c078fa4f5a75dbd38ad5bc0d468dd Author: jzhang38 <a1286225768@gmail.com> Date: Wed Jan 24 
13:55:43 2024 +0800 add mmme * Squashed commit of the following: commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Fix cli itself can not run with config file * Fix bug in login functionality Refactor code for better performance Add new feature for user authentication Update UI layout for improved user experience Fix typo in variable name Optimize database queries for faster response time Add error handling for edge cases Update dependencies to latest versions Remove unused code Improve code readability and maintainability * Refactor get_task_dict function to handle nested groups * Add submission file for coco, flickr30k, nocaps, and textcaps tasks * Remove unused files and update task configuration * Fix tasks issue for nocaps, refcoco/+/g * Fix file path and raise error if config file does not exist * Exclude train in refcoco/+/g config * Solve doc_iterator_for_counting crashing issue * Black lint * Refactor code to improve performance and readability * Squashed commit of the following: commit a2cc9303dc72e4d53983bb56e54a32e977c3e270 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:03:57 2024 +0800 change okvqa yaml commit 35e87e7c7a480d005abf607c2527a35457d92311 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:55:40 2024 +0800 change yaml commit 89755323596b85208ed33aa88c296604a39af6eb Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:42:43 2024 +0800 add okvqa task commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * 
add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Squashed commit of the following: commit 0b0d30dfb247c5f0b7b68398b9e9fcde74cf7fa2 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:06:02 2024 +0800 change ocr reference commit e273f9cbd91540df86bdbc652bff88a847bd0d2d Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 01:05:46 2024 +0800 revert example_eval commit e84126aaaf8a07bd371a0571a914ccbcd3697f20 Author: JvThunder <joshuaadrianc@gmail.com> Date: Fri Jan 26 00:17:28 2024 +0800 edit vizwiz utils commit 110deab53dc1a2fd349b1872cd261b69074c5fa8 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:49:47 2024 +0800 reorganize __init__ commit 0fa3e0c40075997ea80ed976bdee9615f17d3ece Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 23:46:20 2024 +0800 minor fixes commit 2aaca579120def99860f90054233f3358950fa66 Author: JvThunder <joshuaadrianc@gmail.com> Date: Thu Jan 25 17:41:03 2024 +0800 add vizwizvqa eval rask commit b13a805623dfd9d826ddd440e1b5ecde773fbb12 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Jan 25 17:08:25 2024 +0800 add model specific prompt and gen kwargs in sqa (#19) * add mmme * black * add model specific prompt and gen kwargs * black * add yaml config to supprot multi-model eval * print table at the end * refactor multi model code * Refactor mathvista.yaml and utils.py * Add gpt_eval_score to mathvista_process_results * Refactor mathvista_aggregate_results to return average accuracy score * Fix refcoco evaluation error * Fix evaluation problem for refcoco+/g * Refactor mathvista.yaml and mathvista_evals.py * Add dependencies and update YAML files * Refactor mmbench_en/utils.py to save test results to separate Excel file * Fix caption task prompt * Add group field to mmbench_en_test and mmbench_en_val yaml files * Delete mmbench_en_val.yaml file * Update mmbench_cn.yaml and mmbench_cn_test.yaml * Update mmbench_cn_val.yaml and utils.py * 
To reproduce (on a Linux machine with 1x A100 80GB):
Received error:
__main__.py: error: unrecognized arguments: --log_samples_sufix reproduce
I then removed the unrecognized argument (--log_samples_sufix) and reran the command.
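For anyone hitting the same message, here is a minimal sketch of why an argparse-based CLI rejects the misspelled flag. The parser below is a stand-in written for illustration, not the actual lmms-eval parser; `build_parser` and `try_parse` are hypothetical helper names, and the only flag assumed is `--log_samples_suffix` (spelled with two f's, as used elsewhere in this thread).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in parser: it defines only the correctly spelled flag.
    parser = argparse.ArgumentParser(prog="__main__.py")
    parser.add_argument("--log_samples_suffix", default="")
    return parser


def try_parse(argv):
    # argparse reports unknown arguments on stderr and raises SystemExit.
    # Catch it here so the failure can be inspected instead of ending the process.
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None


# Misspelled flag: neither the flag nor its value is recognized.
misspelled = try_parse(["--log_samples_sufix", "reproduce"])

# Correct spelling: the value is parsed as expected.
correct = try_parse(["--log_samples_suffix", "reproduce"])
```

Here `misspelled` ends up as `None` while `correct.log_samples_suffix` holds `"reproduce"`, so fixing the spelling of the flag is likely enough; dropping the argument entirely should not be needed.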
Received ValueError:
ValueError: Attempted to load model 'llava', but no model for this name found! Supported model names: qwen_vl, gpt4V, instructblip, minicpm_v
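One plausible cause of this error (an assumption, not something confirmed in this thread) is that model backends are registered only when their import succeeds, so a missing `llava` package silently drops that name from the list of supported models. The sketch below illustrates that pattern with a hypothetical registry; `MODEL_REGISTRY`, `register_optional`, `get_model`, and the module names are invented for illustration and are not lmms-eval's actual code.

```python
import importlib
import importlib.util

# Hypothetical registry mapping model names to their backing modules.
MODEL_REGISTRY = {}


def register_optional(name: str, module_name: str) -> None:
    # Register a backend only if its module imports cleanly.
    try:
        MODEL_REGISTRY[name] = importlib.import_module(module_name)
    except ImportError:
        # A missing dependency is skipped silently, so the name never
        # shows up among the supported models.
        pass


def get_model(name: str):
    if name not in MODEL_REGISTRY:
        supported = ", ".join(MODEL_REGISTRY)
        raise ValueError(
            f"Attempted to load model '{name}', but no model for this name found! "
            f"Supported model names: {supported}"
        )
    return MODEL_REGISTRY[name]


# "json" stands in for a backend whose dependencies are installed;
# the second module name is deliberately one that does not exist.
register_optional("working_backend", "json")
register_optional("llava", "package_that_is_not_installed_for_this_demo")

# Quick diagnostic: is the package importable in the current environment?
llava_importable = importlib.util.find_spec("llava") is not None
```

If `llava_importable` is `False` in the environment used for evaluation, installing the LLaVA package into that same environment would be the first thing to try.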
I've tried from many fresh installations. I'm up to date with the main branch of lmms-eval (I cloned it a few hours ago). What should I do?