[README] near public (EvolvingLMMs-Lab#63)
* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* Update README.md with new features and installation instructions
* Update supported models and datasets
* Delete otter.py file
* Fix capitalization in README.md
* Update image sizes and add new features
* Refactor README.md to improve readability and add new features
* Add description for lmms-eval in README.md
* Update accelerator support in README.md
* Update lmms-eval README with improved description and additional features
* Update README.md with improved task grouping description
* change `Otter-AI/MME` to `lmms-lab/MME`
* Update README.md
* Update README.md
* Remove unused code in mme.yaml
* Squashed commit of the following:

commit 2782eb0
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date: Thu Feb 29 13:40:02 2024 +0800

Dev/py add models (EvolvingLMMs-Lab#57)

* add instructblip
* minicpm_v
* remove <image> from qwen-vl
* speed up postprocessing
* Optimize build context speed

---------

Co-authored-by: Pu Fanyi <FPU001@e.ntu.edu.sg>
Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

commit 7e8d3e4
Author: Pu Fanyi <FPU001@e.ntu.edu.sg>
Date: Wed Feb 28 14:49:07 2024 +0800

Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)

* refactor vizwizvqa task
* Delete vqav2_test and vqav2_val YAML files
* Refactor vqav2_process_results functions
* Add a pack for vqav2
* refactor okvqa
* roll back vizwiz_vqa
* Fix exact_match calculation in ok_vqa_process_results
* Update OKVQA dataset name in readme
* add model_specific_prompt_kwargs
* add model_specific_prompt_kwargs to vizwiz_vqa
* add model_specific_prompt_kwargs for vqav2
* lint
* fix a small bug for eval_logger
* Refactor make_table function to display points as " - " if value is None
* Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7'
* Refactor ok_vqa_aggreate_submissions function
* Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff'
* Refactor VQA submission file saving
* Update file utils
* Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0'
* Refactor file path handling and submission generation
* OKVQA path
* vizwizvqa file
* pack cmmmu
* fix a small metric bug for cmmmu
* Add higher_is_better flag to submission metric
* Add CMMMU dataset to README.md
* Add logging and refactor submission file generation in docvqa utils.py
* pack docvqa
* add traceback to print detailed error
* Refactor docvqa_test_aggregate_results to accept additional arguments
* Add metric check in evaluator.py and update test.yaml and val.yaml
* add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2
* merge textvqa
* textvqa
* Modify submission file generation for COCO test results
* Update test result storage path
* update coco cap file name
* Update COCO 2017 Caption dataset name
* ferret
* Add Ferret dataset
* Refactor hb_doc_to_text function to include model-specific prompts
* Add IconQA and its subtasks
* Refactor image list creation in doc_to_visual function
* Add process_results function to default template
* Update process_results function in iconqa utils.py
* refactor flickr30k
* change aggregation function
* Fix formatting issues and update logging message
* Fix llava can not handle only text question (no visuals)
* Fix qwen can not handle no image question (no visuals)
* Add fuyu prepare accelerator scripts
* refactor mme
* naming consistency
* aggregation_submissions consistency
* flickr30k naming consistency
* remove submissions for mme
* remove unused submission function
* Refactor infovqa_test.yaml and infovqa_val.yaml
* Refactor code for improved readability and maintainability
* stvqa
* remane sqa
* Update lmms_eval textcaps files and utils.py
* Update default prompt for text captions
* Refactor textcaps_aggregation_result function
* Add generate_submission_file function and update mathvista_aggregate_results signature
* Update nocaps_test.yaml and nocaps_val.yaml
* refractor internal_eval
* Add internal evaluation datasets
* pack multidocvqa
* mmvet
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating
* Refractor llava wild
* Refractor llava-bench-coco
* Add JSON file generation for gpt evaluation details
* mmmu
* Remove MMBench English and Chinese tasks
* Remove unnecessary return statement in mmbench_aggregate_test_results function
* Fix distributed process group initialization
* Update dataset paths and group names in mmbench test configs
* Update import statements in cc_utils.py, cn_utils.py, and en_utils.py
* Add torch module import
* lint
* Remove IconQA dataset from README.md
* Add Multi-DocVQA and its submodules
* Add new datasets and update task names
* Refactor flickr_aggregation_result function to accept additional arguments
* Add timeout kwargs in Accelerator constructor
* Add encoding to be utf-8 for cmmmu
* Fix llava try and catch, remove torch.distributed.init in main
* Ds prepare script for llava

---------

Co-authored-by: JvThunder <joshuaadrianc@gmail.com>
Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

commit 4fa73ba
Author: Li Bo <drluodian@gmail.com>
Date: Tue Feb 27 22:52:07 2024 +0800

[Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)

* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* add llava main in pyproject
* Update README.md
* Remove unnecessary dependencies and add specific version for llava_repr
* Add dependencies for llava_repr***
* Update README.md
* add some docs on models and command line commands
* remove some lines
* typo
* Update model_guide.md
* Update model_guide.md
* Update README.md
* Update README.md
* Update README.md
* Fix refcocog dataset path
* Record gpt response in eval info
* Resolve conflict
* Fix hallusionbench gpt json saving path
* Rename hallubench gpt output path
* Change remove image to check by type instead of check by names
* More robust check by type
* Add timeout to API requests
* Remove unnecessary img in data
* Forcing an empty commit.
* Testing
* Delete unnecessary things
* Fix error logging in get_chat_response function
* Fix seedbench2 image issue in doc_to_text
* Add conditional exclude for internal eval
* Squashed commit of the following:

commit e873012d0da2711f2076f7c09f390901f89da2f9
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Sat Mar 2 03:49:36 2024 +0000

Add conditional exclude for internal eval

commit 621cdd663e0197827a5792872f13cdf3d27d2813
Merge: a3cae8e ffb9eb2
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Sat Mar 2 03:24:29 2024 +0000

Merge branch 'dev/readme' into kc/final_fix

commit 6daf75c54fe3d45970c5d35a10000f10c1420c6b
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Sat Mar 2 02:47:31 2024 +0000

Fix seedbench2 image issue in doc_to_text

commit 2a7a03205a2514fe0322ab4aa05c4948f9233109
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 15:32:49 2024 +0000

Delete unnecessary things

commit a99850057224596d01835fface39d4aafd79de3e
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 15:31:42 2024 +0000

Testing

commit 42f5fc125c7ee7d31633647f29f0d02ed3e640a8
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 15:29:30 2024 +0000

Forcing an empty commit.
commit dddd0276003115c8a150a78eb3ae7bd299c460e4 Merge: 786f2b5 1700786 Author: kcz358 <kaichenzhang358@outlook.com> Date: Fri Mar 1 15:24:56 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit bcffe0b45083f48886e18d5ece5f2504b96bbcbd Author: kcz358 <kaichenzhang358@outlook.com> Date: Fri Mar 1 15:24:20 2024 +0000 Remove unnecessary img in data commit f6705996b992363f2fd3c5dedb90e1bd51d04426 Merge: 4240785 888c1c1 Author: kcz358 <kaichenzhang358@outlook.com> Date: Fri Mar 1 13:41:24 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 9290fc1c27ecca86f7ec3df0d932c7fa228e19c9 Author: kcz358 <kaichenzhang358@outlook.com> Date: Fri Mar 1 13:40:51 2024 +0000 More robust check by type commit 2fceaaf8f855d08d642996cd217ec0f6fc0fa04c Author: kcz358 <kaichenzhang358@outlook.com> Date: Fri Mar 1 13:00:57 2024 +0000 Change remove image to check by type instead of check by names commit 33c0a81c91733e9aabe214f0797be2fdd3df1f1c Author: kcz358 <kaichenzhang358@outlook.com> Date: Fri Mar 1 12:33:02 2024 +0000 Rename hallubench gpt output path commit 90ad0ace136a35ecc16a09ce841736842f7eb6dd Author: kcz358 <kaichenzhang358@outlook.com> Date: Fri Mar 1 09:32:52 2024 +0000 Fix hallusionbench gpt json saving path commit 15b0336a932ef1823696e63672837700ce4fdae9 Author: kcz358 <kaichenzhang358@outlook.com> Date: Fri Mar 1 08:51:13 2024 +0000 Resolve conflict commit f75e7cfd35b1ee814f86abb9d4fbace027c00941 Merge: 9cf86fa 93534dc Author: kcz358 <kaichenzhang358@outlook.com> Date: Fri Mar 1 08:37:21 2024 +0000 Merge branch 'kc/final_fix' into dev/readme commit 06c51ea7682e31964ca720a8a40705a3a7f3f360 Author: kcz358 <kaichenzhang358@outlook.com> Date: Fri Mar 1 07:55:03 2024 +0000 Record gpt response in eval info commit cdf7e6f77f7b6eee960e01e80c00ec74b8c1fbe7 Author: kcz358 <kaichenzhang358@outlook.com> Date: Fri Mar 1 07:49:01 2024 +0000 Fix refcocog dataset path commit 2782eb0 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add 
models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <FPU001@e.ntu.edu.sg> Co-authored-by: kcz358 <kaichenzhang358@outlook.com> commit 7e8d3e4 Author: Pu Fanyi <FPU001@e.ntu.edu.sg> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text 
function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and 
catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <joshuaadrianc@gmail.com> Co-authored-by: kcz358 <kaichenzhang358@outlook.com> commit 4fa73ba Author: Li Bo <drluodian@gmail.com> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Fix small bugs in list_with_num * Revise list_with_num model args * Dev/readme rm rolling (EvolvingLMMs-Lab#60) * remove log_likelyhood_rolling * Update time efficiency benchmark in README.md * add task guide --------- Co-authored-by: jzhang38 <a1286225768@gmail.com> Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com> * Remove unnecessary code and update dependencies * Fix logging utils bug on wandb grouping * Add reproduce envs * Squashed commit of the following: commit bf49a3e1de8431193bdf6f7688a4ff7f4683a84d Merge: 2475639 f89a736 Author: kcz358 <kaichenzhang358@outlook.com> Date: Sun Mar 3 22:12:12 2024 +0800 Merge branch 'main' into kc/final_fix commit b535df91bc792b3b2b296572ec4692c75fdfe878 Author: kcz358 <kaichenzhang358@outlook.com> Date: Sun Mar 3 22:11:04 2024 +0800 Add reproduce envs commit d0539a0 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Sun Mar 3 21:19:15 2024 +0800 [Fix] wandb group logging missing columns (EvolvingLMMs-Lab#61) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task 
grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit 2782eb0 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi <FPU001@e.ntu.edu.sg> Co-authored-by: kcz358 <kaichenzhang358@outlook.com> commit 7e8d3e4 Author: Pu Fanyi <FPU001@e.ntu.edu.sg> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for 
okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset 
from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <joshuaadrianc@gmail.com> Co-authored-by: kcz358 <kaichenzhang358@outlook.com> commit 4fa73ba Author: Li Bo <drluodian@gmail.com> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. 
* Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args * Fix logging utils bug on wandb grouping --------- Co-authored-by: Bo Li <drluodian@gmail.com> Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg> Co-authored-by: jzhang38 <a1286225768@gmail.com> commit 7dc049915a1846177e0f9f8eab12366881f82157 Merge: 83358a4 5e1c9c7 Author: kcz358 <kaichenzhang358@outlook.com> Date: Sun Mar 3 07:25:48 2024 +0000 Merge branch 'main' into kc/final_fix commit 5ec98efc7b666341adc726b8d1d4779b6c543f7f Author: kcz358 <kaichenzhang358@outlook.com> Date: Sun Mar 3 07:23:19 2024 +0000 Fix logging utils bug on wandb grouping commit 105d781 Author: kcz358 <92624596+kcz358@users.noreply.github.com> Date: Sun Mar 3 13:01:11 2024 +0800 [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (EvolvingLMMs-Lab#59) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * Update README.md with new features and installation instructions * Update supported models and datasets * Delete otter.py file * Fix capitalization in README.md * Update image sizes and add new features * Refactor README.md to improve readability and add new features * Add description for lmms-eval in README.md * Update accelerator support in README.md * Update lmms-eval README with improved description and additional features * Update README.md with improved task grouping description * change `Otter-AI/MME` to `lmms-lab/MME` * Update README.md * Update README.md * Remove unused code in mme.yaml * Squashed commit of the following: commit 2782eb0 Author: Zhang Peiyuan <a1286225768@gmail.com> Date: Thu Feb 29 13:40:02 2024 +0800 Dev/py add models (EvolvingLMMs-Lab#57) * add instructblip * minicpm_v * remove <image> from qwen-vl * speed up postprocessing * Optimize build context speed --------- Co-authored-by: Pu Fanyi 
<FPU001@e.ntu.edu.sg> Co-authored-by: kcz358 <kaichenzhang358@outlook.com> commit 7e8d3e4 Author: Pu Fanyi <FPU001@e.ntu.edu.sg> Date: Wed Feb 28 14:49:07 2024 +0800 Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56) * refactor vizwizvqa task * Delete vqav2_test and vqav2_val YAML files * Refactor vqav2_process_results functions * Add a pack for vqav2 * refactor okvqa * roll back vizwiz_vqa * Fix exact_match calculation in ok_vqa_process_results * Update OKVQA dataset name in readme * add model_specific_prompt_kwargs * add model_specific_prompt_kwargs to vizwiz_vqa * add model_specific_prompt_kwargs for vqav2 * lint * fix a small bug for eval_logger * Refactor make_table function to display points as " - " if value is None * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7' * Refactor ok_vqa_aggreate_submissions function * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff' * Refactor VQA submission file saving * Update file utils * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0' * Refactor file path handling and submission generation * OKVQA path * vizwizvqa file * pack cmmmu * fix a small metric bug for cmmmu * Add higher_is_better flag to submission metric * Add CMMMU dataset to README.md * Add logging and refactor submission file generation in docvqa utils.py * pack docvqa * add traceback to print detailed error * Refactor docvqa_test_aggregate_results to accept additional arguments * Add metric check in evaluator.py and update test.yaml and val.yaml * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2 * merge textvqa * textvqa * Modify submission file generation for COCO test results * Update test result storage path * update coco cap file name * Update COCO 2017 Caption dataset name * ferret * Add Ferret dataset * Refactor hb_doc_to_text function to include model-specific prompts * Add IconQA and its subtasks * Refactor image list creation in doc_to_visual function * Add process_results function to default template * 
Update process_results function in iconqa utils.py * refactor flickr30k * change aggregation function * Fix formatting issues and update logging message * Fix llava can not handle only text question (no visuals) * Fix qwen can not handle no image question (no visuals) * Add fuyu prepare accelerator scripts * refactor mme * naming consistency * aggregation_submissions consistency * flickr30k naming consistency * remove submissions for mme * remove unused submission function * Refactor infovqa_test.yaml and infovqa_val.yaml * Refactor code for improved readability and maintainability * stvqa * remane sqa * Update lmms_eval textcaps files and utils.py * Update default prompt for text captions * Refactor textcaps_aggregation_result function * Add generate_submission_file function and update mathvista_aggregate_results signature * Update nocaps_test.yaml and nocaps_val.yaml * refractor internal_eval * Add internal evaluation datasets * pack multidocvqa * mmvet * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating * Refractor llava wild * Refractor llava-bench-coco * Add JSON file generation for gpt evaluation details * mmmu * Remove MMBench English and Chinese tasks * Remove unnecessary return statement in mmbench_aggregate_test_results function * Fix distributed process group initialization * Update dataset paths and group names in mmbench test configs * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py * Add torch module import * lint * Remove IconQA dataset from README.md * Add Multi-DocVQA and its submodules * Add new datasets and update task names * Refactor flickr_aggregation_result function to accept additional arguments * Add timeout kwargs in Accelerator constructor * Add encoding to be utf-8 for cmmmu * Fix llava try and catch, remove torch.distributed.init in main * Ds prepare script for llava --------- Co-authored-by: JvThunder <joshuaadrianc@gmail.com> Co-authored-by: kcz358 
<kaichenzhang358@outlook.com> commit 4fa73ba Author: Li Bo <drluodian@gmail.com> Date: Tue Feb 27 22:52:07 2024 +0800 [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55) * Refactor logging in lmms_eval package * Refactor variable names in lmms_eval package * add llava main in pyproject * Update README.md * Remove unnecessary dependencies and add specific version for llava_repr * Add dependencies for llava_repr*** * Update README.md * add some docs on models and command line commands * remove some lines * typo * Update model_guide.md * Update model_guide.md * Update README.md * Update README.md * Update README.md * Fix refcocog dataset path * Record gpt response in eval info * Resolve conflict * Fix hallusionbench gpt json saving path * Rename hallubench gpt output path * Change remove image to check by type instead of check by names * More robust check by type * Remove unnecessary img in data * Forcing an empty commit. * Testing * Delete unnecessary things * Fix seedbench2 image issue in doc_to_text * Add conditional exclude for internal eval * Fix small bugs in list_with_num * Revise list_with_num model args --------- Co-authored-by: Bo Li <drluodian@gmail.com> Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg> Co-authored-by: jzhang38 <a1286225768@gmail.com> commit 8263ca91c87a127d992dd01bdac5f89b8a5ff521 Author: kcz358 <kaichenzhang358@outlook.com> Date: Sat Mar 2 05:58:08 2024 +0000 Revise list_with_num model args commit c413569d46be0ad604cd249df8bd58ffe26c0e39 Author: kcz358 <kaichenzhang358@outlook.com> Date: Sat Mar 2 05:09:15 2024 +0000 Fix small bugs in list_with_num commit e873012d0da2711f2076f7c09f390901f89da2f9 Author: kcz358 <kaichenzhang358@outlook.com> Date: Sat Mar 2 03:49:36 2024 +0000 Add conditional exclude for internal eval commit 621cdd663e0197827a5792872f13cdf3d27d2813 Merge: a3cae8e ffb9eb2 Author: kcz358 <kaichenzhang358@outlook.com> Date: Sat Mar 2 03:24:29 2024 +0000 Merge branch 'dev/readme' into kc/final_fix commit 
6daf75c54fe3d45970c5d35a10000f10c1420c6b
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Sat Mar 2 02:47:31 2024 +0000
Fix seedbench2 image issue in doc_to_text

commit 2a7a03205a2514fe0322ab4aa05c4948f9233109
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 15:32:49 2024 +0000
Delete unnecessary things

commit a99850057224596d01835fface39d4aafd79de3e
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 15:31:42 2024 +0000
Testing

commit 42f5fc125c7ee7d31633647f29f0d02ed3e640a8
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 15:29:30 2024 +0000
Forcing an empty commit.

commit dddd0276003115c8a150a78eb3ae7bd299c460e4
Merge: 786f2b5 1700786
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 15:24:56 2024 +0000
Merge branch 'kc/final_fix' into dev/readme

commit bcffe0b45083f48886e18d5ece5f2504b96bbcbd
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 15:24:20 2024 +0000
Remove unnecessary img in data

commit f6705996b992363f2fd3c5dedb90e1bd51d04426
Merge: 4240785 888c1c1
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 13:41:24 2024 +0000
Merge branch 'kc/final_fix' into dev/readme

commit 9290fc1c27ecca86f7ec3df0d932c7fa228e19c9
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 13:40:51 2024 +0000
More robust check by type

commit 2fceaaf8f855d08d642996cd217ec0f6fc0fa04c
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 13:00:57 2024 +0000
Change remove image to check by type instead of check by names

commit 33c0a81c91733e9aabe214f0797be2fdd3df1f1c
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 12:33:02 2024 +0000
Rename hallubench gpt output path

commit 90ad0ace136a35ecc16a09ce841736842f7eb6dd
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 09:32:52 2024 +0000
Fix hallusionbench gpt json saving path

commit 15b0336a932ef1823696e63672837700ce4fdae9
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 08:51:13 2024 +0000
Resolve conflict

commit f75e7cfd35b1ee814f86abb9d4fbace027c00941
Merge: 9cf86fa 93534dc
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 08:37:21 2024 +0000
Merge branch 'kc/final_fix' into dev/readme

commit 06c51ea7682e31964ca720a8a40705a3a7f3f360
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 07:55:03 2024 +0000
Record gpt response in eval info

commit cdf7e6f77f7b6eee960e01e80c00ec74b8c1fbe7
Author: kcz358 <kaichenzhang358@outlook.com>
Date: Fri Mar 1 07:49:01 2024 +0000
Fix refcocog dataset path

commit 2782eb0
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date: Thu Feb 29 13:40:02 2024 +0800
Dev/py add models (EvolvingLMMs-Lab#57)
* add instructblip
* minicpm_v
* remove <image> from qwen-vl
* speed up postprocessing
* Optimize build context speed
---------
Co-authored-by: Pu Fanyi <FPU001@e.ntu.edu.sg>
Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

commit 7e8d3e4
Author: Pu Fanyi <FPU001@e.ntu.edu.sg>
Date: Wed Feb 28 14:49:07 2024 +0800
Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)
* refactor vizwizvqa task
* Delete vqav2_test and vqav2_val YAML files
* Refactor vqav2_process_results functions
* Add a pack for vqav2
* refactor okvqa
* roll back vizwiz_vqa
* Fix exact_match calculation in ok_vqa_process_results
* Update OKVQA dataset name in readme
* add model_specific_prompt_kwargs
* add model_specific_prompt_kwargs to vizwiz_vqa
* add model_specific_prompt_kwargs for vqav2
* lint
* fix a small bug for eval_logger
* Refactor make_table function to display points as " - " if value is None
* Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7'
* Refactor ok_vqa_aggreate_submissions function
* Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff'
* Refactor VQA submission file saving
* Update file utils
* Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0'
* Refactor file path handling and submission generation
* OKVQA path
* vizwizvqa file
* pack cmmmu
* fix a small metric bug for cmmmu
* Add higher_is_better flag to submission metric
* Add CMMMU dataset to README.md
* Add logging and refactor submission file generation in docvqa utils.py
* pack docvqa
* add traceback to print detailed error
* Refactor docvqa_test_aggregate_results to accept additional arguments
* Add metric check in evaluator.py and update test.yaml and val.yaml
* add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2
* merge textvqa
* textvqa
* Modify submission file generation for COCO test results
* Update test result storage path
* update coco cap file name
* Update COCO 2017 Caption dataset name
* ferret
* Add Ferret dataset
* Refactor hb_doc_to_text function to include model-specific prompts
* Add IconQA and its subtasks
* Refactor image list creation in doc_to_visual function
* Add process_results function to default template
* Update process_results function in iconqa utils.py
* refactor flickr30k
* change aggregation function
* Fix formatting issues and update logging message
* Fix llava can not handle only text question (no visuals)
* Fix qwen can not handle no image question (no visuals)
* Add fuyu prepare accelerator scripts
* refactor mme
* naming consistency
* aggregation_submissions consistency
* flickr30k naming consistency
* remove submissions for mme
* remove unused submission function
* Refactor infovqa_test.yaml and infovqa_val.yaml
* Refactor code for improved readability and maintainability
* stvqa
* remane sqa
* Update lmms_eval textcaps files and utils.py
* Update default prompt for text captions
* Refactor textcaps_aggregation_result function
* Add generate_submission_file function and update mathvista_aggregate_results signature
* Update nocaps_test.yaml and nocaps_val.yaml
* refractor internal_eval
* Add internal evaluation datasets
* pack multidocvqa
* mmvet
* Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating
* Refractor llava wild
* Refractor llava-bench-coco
* Add JSON file generation for gpt evaluation details
* mmmu
* Remove MMBench English and Chinese tasks
* Remove unnecessary return statement in mmbench_aggregate_test_results function
* Fix distributed process group initialization
* Update dataset paths and group names in mmbench test configs
* Update import statements in cc_utils.py, cn_utils.py, and en_utils.py
* Add torch module import
* lint
* Remove IconQA dataset from README.md
* Add Multi-DocVQA and its submodules
* Add new datasets and update task names
* Refactor flickr_aggregation_result function to accept additional arguments
* Add timeout kwargs in Accelerator constructor
* Add encoding to be utf-8 for cmmmu
* Fix llava try and catch, remove torch.distributed.init in main
* Ds prepare script for llava
---------
Co-authored-by: JvThunder <joshuaadrianc@gmail.com>
Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

commit 4fa73ba
Author: Li Bo <drluodian@gmail.com>
Date: Tue Feb 27 22:52:07 2024 +0800
[Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)
* Refactor logging in lmms_eval package
* Refactor variable names in lmms_eval package
* Update commands.md
* Add repr_scripts for reference
* Add timeout for gpt4V
* Remove unnecessary dependencies
* Add reproduce into readme
* Revise seedbench process_result
* Fix exclude dc hardcode postprocess logic error
* Fix metric repeat issue
* Update dataset runtime and add environment info
* Revise val submission file saving path
* Put the correct query into the gpt extraction
* Update sleep time in utils.py
* update
---------
Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>
Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com>
Co-authored-by: jzhang38 <a1286225768@gmail.com>
Co-authored-by: kcz358 <kaichenzhang358@outlook.com>