[README] near public (#63) · kangreen0210/LIME@971cc48

Commit

[README] near public (EvolvingLMMs-Lab#63)
* Refactor logging in lmms_eval package

* Refactor variable names in lmms_eval package

* Update README.md with new features and installation instructions

* Update supported models and datasets

* Delete otter.py file

* Fix capitalization in README.md

* Update image sizes and add new features

* Refactor README.md to improve readability and add new features

* Add description for lmms-eval in README.md

* Update accelerator support in README.md

* Update lmms-eval README with improved description and additional features

* Update README.md with improved task grouping description

* change `Otter-AI/MME` to `lmms-lab/MME`

* Update README.md

* Update README.md

* Remove unused code in mme.yaml

* Squashed commit of the following:

commit 2782eb0
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Thu Feb 29 13:40:02 2024 +0800

    Dev/py add models (EvolvingLMMs-Lab#57)

    * add instructblip

    * minicpm_v

    * remove <image> from qwen-vl

    * speed up postprocessing

    * Optimize build context speed

    ---------

    Co-authored-by: Pu Fanyi <FPU001@e.ntu.edu.sg>
    Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

commit 7e8d3e4
Author: Pu Fanyi <FPU001@e.ntu.edu.sg>
Date:   Wed Feb 28 14:49:07 2024 +0800

    Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)

    * refactor vizwizvqa task

    * Delete vqav2_test and vqav2_val YAML files

    * Refactor vqav2_process_results functions

    * Add a pack for vqav2

    * refactor okvqa

    * roll back vizwiz_vqa

    * Fix exact_match calculation in ok_vqa_process_results

    * Update OKVQA dataset name in readme

    * add model_specific_prompt_kwargs

    * add model_specific_prompt_kwargs to vizwiz_vqa

    * add model_specific_prompt_kwargs for vqav2

    * lint

    * fix a small bug for eval_logger

    * Refactor make_table function to display points as "  -  " if value is None

    * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7'

    * Refactor ok_vqa_aggreate_submissions function

    * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff'

    * Refactor VQA submission file saving

    * Update file utils

    * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0'

    * Refactor file path handling and submission generation

    * OKVQA path

    * vizwizvqa file

    * pack cmmmu

    * fix a small metric bug for cmmmu

    * Add higher_is_better flag to submission metric

    * Add CMMMU dataset to README.md

    * Add logging and refactor submission file generation in docvqa utils.py

    * pack docvqa

    * add traceback to print detailed error

    * Refactor docvqa_test_aggregate_results to accept additional arguments

    * Add metric check in evaluator.py and update test.yaml and val.yaml

    * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2

    * merge textvqa

    * textvqa

    * Modify submission file generation for COCO test results

    * Update test result storage path

    * update coco cap file name

    * Update COCO 2017 Caption dataset name

    * ferret

    * Add Ferret dataset

    * Refactor hb_doc_to_text function to include model-specific prompts

    * Add IconQA and its subtasks

    * Refactor image list creation in doc_to_visual function

    * Add process_results function to default template

    * Update process_results function in iconqa utils.py

    * refactor flickr30k

    * change aggregation function

    * Fix formatting issues and update logging message

    * Fix llava can not handle only text question (no visuals)

    * Fix qwen can not handle no image question (no visuals)

    * Add fuyu prepare accelerator scripts

    * refactor mme

    * naming consistency

    * aggregation_submissions consistency

    * flickr30k naming consistency

    * remove submissions for mme

    * remove unused submission function

    * Refactor infovqa_test.yaml and infovqa_val.yaml

    * Refactor code for improved readability and maintainability

    * stvqa

    * remane sqa

    * Update lmms_eval textcaps files and utils.py

    * Update default prompt for text captions

    * Refactor textcaps_aggregation_result function

    * Add generate_submission_file function and update mathvista_aggregate_results signature

    * Update nocaps_test.yaml and nocaps_val.yaml

    * refractor internal_eval

    * Add internal evaluation datasets

    * pack multidocvqa

    * mmvet

    * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating

    * Refractor llava wild

    * Refractor llava-bench-coco

    * Add JSON file generation for gpt evaluation details

    * mmmu

    * Remove MMBench English and Chinese tasks

    * Remove unnecessary return statement in mmbench_aggregate_test_results function

    * Fix distributed process group initialization

    * Update dataset paths and group names in mmbench test configs

    * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py

    * Add torch module import

    * lint

    * Remove IconQA dataset from README.md

    * Add Multi-DocVQA and its submodules

    * Add new datasets and update task names

    * Refactor flickr_aggregation_result function to accept additional arguments

    * Add timeout kwargs in Accelerator constructor

    * Add encoding to be utf-8 for cmmmu

    * Fix llava try and catch, remove torch.distributed.init in main

    * Ds prepare script for llava

    ---------

    Co-authored-by: JvThunder <joshuaadrianc@gmail.com>
    Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

commit 4fa73ba
Author: Li Bo <drluodian@gmail.com>
Date:   Tue Feb 27 22:52:07 2024 +0800

    [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)

    * Refactor logging in lmms_eval package

    * Refactor variable names in lmms_eval package

* add llava main in pyproject

* Update README.md

* Remove unnecessary dependencies and add specific version for llava_repr

* Add dependencies for llava_repr***

* Update README.md

* add some docs on models and command line commands

* remove some lines

* typo

* Update model_guide.md

* Update model_guide.md

* Update README.md

* Update README.md

* Update README.md

* Fix refcocog dataset path

* Record gpt response in eval info

* Resolve conflict

* Fix hallusionbench gpt json saving path

* Rename hallubench gpt output path

* Change remove image to check by type instead of check by names

* More robust check by type

* Add timeout to API requests

* Remove unnecessary img in data

* Forcing an empty commit.

* Testing

* Delete unnecessary things

* Fix error logging in get_chat_response function

* Fix seedbench2 image issue in doc_to_text

* Add conditional exclude for internal eval

* Squashed commit of the following:

commit e873012d0da2711f2076f7c09f390901f89da2f9
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Sat Mar 2 03:49:36 2024 +0000

    Add conditional exclude for internal eval

commit 621cdd663e0197827a5792872f13cdf3d27d2813
Merge: a3cae8e ffb9eb2
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Sat Mar 2 03:24:29 2024 +0000

    Merge branch 'dev/readme' into kc/final_fix

commit 6daf75c54fe3d45970c5d35a10000f10c1420c6b
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Sat Mar 2 02:47:31 2024 +0000

    Fix seedbench2 image issue in doc_to_text

commit 2a7a03205a2514fe0322ab4aa05c4948f9233109
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 15:32:49 2024 +0000

    Delete unnecessary things

commit a99850057224596d01835fface39d4aafd79de3e
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 15:31:42 2024 +0000

    Testing

commit 42f5fc125c7ee7d31633647f29f0d02ed3e640a8
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 15:29:30 2024 +0000

    Forcing an empty commit.

commit dddd0276003115c8a150a78eb3ae7bd299c460e4
Merge: 786f2b5 1700786
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 15:24:56 2024 +0000

    Merge branch 'kc/final_fix' into dev/readme

commit bcffe0b45083f48886e18d5ece5f2504b96bbcbd
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 15:24:20 2024 +0000

    Remove unnecessary img in data

commit f6705996b992363f2fd3c5dedb90e1bd51d04426
Merge: 4240785 888c1c1
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 13:41:24 2024 +0000

    Merge branch 'kc/final_fix' into dev/readme

commit 9290fc1c27ecca86f7ec3df0d932c7fa228e19c9
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 13:40:51 2024 +0000

    More robust check by type

commit 2fceaaf8f855d08d642996cd217ec0f6fc0fa04c
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 13:00:57 2024 +0000

    Change remove image to check by type instead of check by names

commit 33c0a81c91733e9aabe214f0797be2fdd3df1f1c
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 12:33:02 2024 +0000

    Rename hallubench gpt output path

commit 90ad0ace136a35ecc16a09ce841736842f7eb6dd
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 09:32:52 2024 +0000

    Fix hallusionbench gpt json saving path

commit 15b0336a932ef1823696e63672837700ce4fdae9
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 08:51:13 2024 +0000

    Resolve conflict

commit f75e7cfd35b1ee814f86abb9d4fbace027c00941
Merge: 9cf86fa 93534dc
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 08:37:21 2024 +0000

    Merge branch 'kc/final_fix' into dev/readme

commit 06c51ea7682e31964ca720a8a40705a3a7f3f360
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 07:55:03 2024 +0000

    Record gpt response in eval info

commit cdf7e6f77f7b6eee960e01e80c00ec74b8c1fbe7
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 07:49:01 2024 +0000

    Fix refcocog dataset path

commit 2782eb0
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Thu Feb 29 13:40:02 2024 +0800

    Dev/py add models (EvolvingLMMs-Lab#57)

    * add instructblip

    * minicpm_v

    * remove <image> from qwen-vl

    * speed up postprocessing

    * Optimize build context speed

    ---------

    Co-authored-by: Pu Fanyi <FPU001@e.ntu.edu.sg>
    Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

commit 7e8d3e4
Author: Pu Fanyi <FPU001@e.ntu.edu.sg>
Date:   Wed Feb 28 14:49:07 2024 +0800

    Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)

    * refactor vizwizvqa task

    * Delete vqav2_test and vqav2_val YAML files

    * Refactor vqav2_process_results functions

    * Add a pack for vqav2

    * refactor okvqa

    * roll back vizwiz_vqa

    * Fix exact_match calculation in ok_vqa_process_results

    * Update OKVQA dataset name in readme

    * add model_specific_prompt_kwargs

    * add model_specific_prompt_kwargs to vizwiz_vqa

    * add model_specific_prompt_kwargs for vqav2

    * lint

    * fix a small bug for eval_logger

    * Refactor make_table function to display points as "  -  " if value is None

    * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7'

    * Refactor ok_vqa_aggreate_submissions function

    * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff'

    * Refactor VQA submission file saving

    * Update file utils

    * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0'

    * Refactor file path handling and submission generation

    * OKVQA path

    * vizwizvqa file

    * pack cmmmu

    * fix a small metric bug for cmmmu

    * Add higher_is_better flag to submission metric

    * Add CMMMU dataset to README.md

    * Add logging and refactor submission file generation in docvqa utils.py

    * pack docvqa

    * add traceback to print detailed error

    * Refactor docvqa_test_aggregate_results to accept additional arguments

    * Add metric check in evaluator.py and update test.yaml and val.yaml

    * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2

    * merge textvqa

    * textvqa

    * Modify submission file generation for COCO test results

    * Update test result storage path

    * update coco cap file name

    * Update COCO 2017 Caption dataset name

    * ferret

    * Add Ferret dataset

    * Refactor hb_doc_to_text function to include model-specific prompts

    * Add IconQA and its subtasks

    * Refactor image list creation in doc_to_visual function

    * Add process_results function to default template

    * Update process_results function in iconqa utils.py

    * refactor flickr30k

    * change aggregation function

    * Fix formatting issues and update logging message

    * Fix llava can not handle only text question (no visuals)

    * Fix qwen can not handle no image question (no visuals)

    * Add fuyu prepare accelerator scripts

    * refactor mme

    * naming consistency

    * aggregation_submissions consistency

    * flickr30k naming consistency

    * remove submissions for mme

    * remove unused submission function

    * Refactor infovqa_test.yaml and infovqa_val.yaml

    * Refactor code for improved readability and maintainability

    * stvqa

    * remane sqa

    * Update lmms_eval textcaps files and utils.py

    * Update default prompt for text captions

    * Refactor textcaps_aggregation_result function

    * Add generate_submission_file function and update mathvista_aggregate_results signature

    * Update nocaps_test.yaml and nocaps_val.yaml

    * refractor internal_eval

    * Add internal evaluation datasets

    * pack multidocvqa

    * mmvet

    * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating

    * Refractor llava wild

    * Refractor llava-bench-coco

    * Add JSON file generation for gpt evaluation details

    * mmmu

    * Remove MMBench English and Chinese tasks

    * Remove unnecessary return statement in mmbench_aggregate_test_results function

    * Fix distributed process group initialization

    * Update dataset paths and group names in mmbench test configs

    * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py

    * Add torch module import

    * lint

    * Remove IconQA dataset from README.md

    * Add Multi-DocVQA and its submodules

    * Add new datasets and update task names

    * Refactor flickr_aggregation_result function to accept additional arguments

    * Add timeout kwargs in Accelerator constructor

    * Add encoding to be utf-8 for cmmmu

    * Fix llava try and catch, remove torch.distributed.init in main

    * Ds prepare script for llava

    ---------

    Co-authored-by: JvThunder <joshuaadrianc@gmail.com>
    Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

commit 4fa73ba
Author: Li Bo <drluodian@gmail.com>
Date:   Tue Feb 27 22:52:07 2024 +0800

    [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)

    * Refactor logging in lmms_eval package

    * Refactor variable names in lmms_eval package

* Fix small bugs in list_with_num

* Revise list_with_num model args

* Dev/readme rm rolling (EvolvingLMMs-Lab#60)

* remove log_likelyhood_rolling

* Update time efficiency benchmark in README.md

* add task guide

---------

Co-authored-by: jzhang38 <a1286225768@gmail.com>
Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com>

* Remove unnecessary code and update dependencies

* Fix logging utils bug on wandb grouping

* Add reproduce envs

* Squashed commit of the following:

commit bf49a3e1de8431193bdf6f7688a4ff7f4683a84d
Merge: 2475639 f89a736
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Sun Mar 3 22:12:12 2024 +0800

    Merge branch 'main' into kc/final_fix

commit b535df91bc792b3b2b296572ec4692c75fdfe878
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Sun Mar 3 22:11:04 2024 +0800

    Add reproduce envs

commit d0539a0
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Sun Mar 3 21:19:15 2024 +0800

    [Fix] wandb group logging missing columns (EvolvingLMMs-Lab#61)

    * Refactor logging in lmms_eval package

    * Refactor variable names in lmms_eval package

    * Update README.md with new features and installation instructions

    * Update supported models and datasets

    * Delete otter.py file

    * Fix capitalization in README.md

    * Update image sizes and add new features

    * Refactor README.md to improve readability and add new features

    * Add description for lmms-eval in README.md

    * Update accelerator support in README.md

    * Update lmms-eval README with improved description and additional features

    * Update README.md with improved task grouping description

    * change `Otter-AI/MME` to `lmms-lab/MME`

    * Update README.md

    * Update README.md

    * Remove unused code in mme.yaml

    * Squashed commit of the following:

    commit 2782eb0
    Author: Zhang Peiyuan <a1286225768@gmail.com>
    Date:   Thu Feb 29 13:40:02 2024 +0800

        Dev/py add models (EvolvingLMMs-Lab#57)

        * add instructblip

        * minicpm_v

        * remove <image> from qwen-vl

        * speed up postprocessing

        * Optimize build context speed

        ---------

        Co-authored-by: Pu Fanyi <FPU001@e.ntu.edu.sg>
        Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

    commit 7e8d3e4
    Author: Pu Fanyi <FPU001@e.ntu.edu.sg>
    Date:   Wed Feb 28 14:49:07 2024 +0800

        Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)

        * refactor vizwizvqa task

        * Delete vqav2_test and vqav2_val YAML files

        * Refactor vqav2_process_results functions

        * Add a pack for vqav2

        * refactor okvqa

        * roll back vizwiz_vqa

        * Fix exact_match calculation in ok_vqa_process_results

        * Update OKVQA dataset name in readme

        * add model_specific_prompt_kwargs

        * add model_specific_prompt_kwargs to vizwiz_vqa

        * add model_specific_prompt_kwargs for vqav2

        * lint

        * fix a small bug for eval_logger

        * Refactor make_table function to display points as "  -  " if value is None

        * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7'

        * Refactor ok_vqa_aggreate_submissions function

        * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff'

        * Refactor VQA submission file saving

        * Update file utils

        * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0'

        * Refactor file path handling and submission generation

        * OKVQA path

        * vizwizvqa file

        * pack cmmmu

        * fix a small metric bug for cmmmu

        * Add higher_is_better flag to submission metric

        * Add CMMMU dataset to README.md

        * Add logging and refactor submission file generation in docvqa utils.py

        * pack docvqa

        * add traceback to print detailed error

        * Refactor docvqa_test_aggregate_results to accept additional arguments

        * Add metric check in evaluator.py and update test.yaml and val.yaml

        * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2

        * merge textvqa

        * textvqa

        * Modify submission file generation for COCO test results

        * Update test result storage path

        * update coco cap file name

        * Update COCO 2017 Caption dataset name

        * ferret

        * Add Ferret dataset

        * Refactor hb_doc_to_text function to include model-specific prompts

        * Add IconQA and its subtasks

        * Refactor image list creation in doc_to_visual function

        * Add process_results function to default template

        * Update process_results function in iconqa utils.py

        * refactor flickr30k

        * change aggregation function

        * Fix formatting issues and update logging message

        * Fix llava can not handle only text question (no visuals)

        * Fix qwen can not handle no image question (no visuals)

        * Add fuyu prepare accelerator scripts

        * refactor mme

        * naming consistency

        * aggregation_submissions consistency

        * flickr30k naming consistency

        * remove submissions for mme

        * remove unused submission function

        * Refactor infovqa_test.yaml and infovqa_val.yaml

        * Refactor code for improved readability and maintainability

        * stvqa

        * remane sqa

        * Update lmms_eval textcaps files and utils.py

        * Update default prompt for text captions

        * Refactor textcaps_aggregation_result function

        * Add generate_submission_file function and update mathvista_aggregate_results signature

        * Update nocaps_test.yaml and nocaps_val.yaml

        * refractor internal_eval

        * Add internal evaluation datasets

        * pack multidocvqa

        * mmvet

        * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating

        * Refractor llava wild

        * Refractor llava-bench-coco

        * Add JSON file generation for gpt evaluation details

        * mmmu

        * Remove MMBench English and Chinese tasks

        * Remove unnecessary return statement in mmbench_aggregate_test_results function

        * Fix distributed process group initialization

        * Update dataset paths and group names in mmbench test configs

        * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py

        * Add torch module import

        * lint

        * Remove IconQA dataset from README.md

        * Add Multi-DocVQA and its submodules

        * Add new datasets and update task names

        * Refactor flickr_aggregation_result function to accept additional arguments

        * Add timeout kwargs in Accelerator constructor

        * Add encoding to be utf-8 for cmmmu

        * Fix llava try and catch, remove torch.distributed.init in main

        * Ds prepare script for llava

        ---------

        Co-authored-by: JvThunder <joshuaadrianc@gmail.com>
        Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

    commit 4fa73ba
    Author: Li Bo <drluodian@gmail.com>
    Date:   Tue Feb 27 22:52:07 2024 +0800

        [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)

        * Refactor logging in lmms_eval package

        * Refactor variable names in lmms_eval package

    * add llava main in pyproject

    * Update README.md

    * Remove unnecessary dependencies and add specific version for llava_repr

    * Add dependencies for llava_repr***

    * Update README.md

    * add some docs on models and command line commands

    * remove some lines

    * typo

    * Update model_guide.md

    * Update model_guide.md

    * Update README.md

    * Update README.md

    * Update README.md

    * Fix refcocog dataset path

    * Record gpt response in eval info

    * Resolve conflict

    * Fix hallusionbench gpt json saving path

    * Rename hallubench gpt output path

    * Change remove image to check by type instead of check by names

    * More robust check by type

    * Remove unnecessary img in data

    * Forcing an empty commit.

    * Testing

    * Delete unnecessary things

    * Fix seedbench2 image issue in doc_to_text

    * Add conditional exclude for internal eval

    * Fix small bugs in list_with_num

    * Revise list_with_num model args

    * Fix logging utils bug on wandb grouping

    ---------

    Co-authored-by: Bo Li <drluodian@gmail.com>
    Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>
    Co-authored-by: jzhang38 <a1286225768@gmail.com>

commit 7dc049915a1846177e0f9f8eab12366881f82157
Merge: 83358a4 5e1c9c7
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Sun Mar 3 07:25:48 2024 +0000

    Merge branch 'main' into kc/final_fix

commit 5ec98efc7b666341adc726b8d1d4779b6c543f7f
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Sun Mar 3 07:23:19 2024 +0000

    Fix logging utils bug on wandb grouping

commit 105d781
Author: kcz358 <92624596+kcz358@users.noreply.github.com>
Date:   Sun Mar 3 13:01:11 2024 +0800

    [Fix] refcocog dataset path, record gpt prompt in internal eval, build context issue (EvolvingLMMs-Lab#59)

    * Refactor logging in lmms_eval package

    * Refactor variable names in lmms_eval package

    * Update README.md with new features and installation instructions

    * Update supported models and datasets

    * Delete otter.py file

    * Fix capitalization in README.md

    * Update image sizes and add new features

    * Refactor README.md to improve readability and add new features

    * Add description for lmms-eval in README.md

    * Update accelerator support in README.md

    * Update lmms-eval README with improved description and additional features

    * Update README.md with improved task grouping description

    * change `Otter-AI/MME` to `lmms-lab/MME`

    * Update README.md

    * Update README.md

    * Remove unused code in mme.yaml

    * Squashed commit of the following:

    commit 2782eb0
    Author: Zhang Peiyuan <a1286225768@gmail.com>
    Date:   Thu Feb 29 13:40:02 2024 +0800

        Dev/py add models (EvolvingLMMs-Lab#57)

        * add instructblip

        * minicpm_v

        * remove <image> from qwen-vl

        * speed up postprocessing

        * Optimize build context speed

        ---------

        Co-authored-by: Pu Fanyi <FPU001@e.ntu.edu.sg>
        Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

    commit 7e8d3e4
    Author: Pu Fanyi <FPU001@e.ntu.edu.sg>
    Date:   Wed Feb 28 14:49:07 2024 +0800

        Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)

        * refactor vizwizvqa task

        * Delete vqav2_test and vqav2_val YAML files

        * Refactor vqav2_process_results functions

        * Add a pack for vqav2

        * refactor okvqa

        * roll back vizwiz_vqa

        * Fix exact_match calculation in ok_vqa_process_results

        * Update OKVQA dataset name in readme

        * add model_specific_prompt_kwargs

        * add model_specific_prompt_kwargs to vizwiz_vqa

        * add model_specific_prompt_kwargs for vqav2

        * lint

        * fix a small bug for eval_logger

        * Refactor make_table function to display points as "  -  " if value is None

        * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7'

        * Refactor ok_vqa_aggreate_submissions function

        * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff'

        * Refactor VQA submission file saving

        * Update file utils

        * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0'

        * Refactor file path handling and submission generation

        * OKVQA path

        * vizwizvqa file

        * pack cmmmu

        * fix a small metric bug for cmmmu

        * Add higher_is_better flag to submission metric

        * Add CMMMU dataset to README.md

        * Add logging and refactor submission file generation in docvqa utils.py

        * pack docvqa

        * add traceback to print detailed error

        * Refactor docvqa_test_aggregate_results to accept additional arguments

        * Add metric check in evaluator.py and update test.yaml and val.yaml

        * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2

        * merge textvqa

        * textvqa

        * Modify submission file generation for COCO test results

        * Update test result storage path

        * update coco cap file name

        * Update COCO 2017 Caption dataset name

        * ferret

        * Add Ferret dataset

        * Refactor hb_doc_to_text function to include model-specific prompts

        * Add IconQA and its subtasks

        * Refactor image list creation in doc_to_visual function

        * Add process_results function to default template

        * Update process_results function in iconqa utils.py

        * refactor flickr30k

        * change aggregation function

        * Fix formatting issues and update logging message

        * Fix llava can not handle only text question (no visuals)

        * Fix qwen can not handle no image question (no visuals)

        * Add fuyu prepare accelerator scripts

        * refactor mme

        * naming consistency

        * aggregation_submissions consistency

        * flickr30k naming consistency

        * remove submissions for mme

        * remove unused submission function

        * Refactor infovqa_test.yaml and infovqa_val.yaml

        * Refactor code for improved readability and maintainability

        * stvqa

        * remane sqa

        * Update lmms_eval textcaps files and utils.py

        * Update default prompt for text captions

        * Refactor textcaps_aggregation_result function

        * Add generate_submission_file function and update mathvista_aggregate_results signature

        * Update nocaps_test.yaml and nocaps_val.yaml

        * refractor internal_eval

        * Add internal evaluation datasets

        * pack multidocvqa

        * mmvet

        * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating

        * Refractor llava wild

        * Refractor llava-bench-coco

        * Add JSON file generation for gpt evaluation details

        * mmmu

        * Remove MMBench English and Chinese tasks

        * Remove unnecessary return statement in mmbench_aggregate_test_results function

        * Fix distributed process group initialization

        * Update dataset paths and group names in mmbench test configs

        * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py

        * Add torch module import

        * lint

        * Remove IconQA dataset from README.md

        * Add Multi-DocVQA and its submodules

        * Add new datasets and update task names

        * Refactor flickr_aggregation_result function to accept additional arguments

        * Add timeout kwargs in Accelerator constructor

        * Add encoding to be utf-8 for cmmmu

        * Fix llava try and catch, remove torch.distributed.init in main

        * Ds prepare script for llava

        ---------

        Co-authored-by: JvThunder <joshuaadrianc@gmail.com>
        Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

    commit 4fa73ba
    Author: Li Bo <drluodian@gmail.com>
    Date:   Tue Feb 27 22:52:07 2024 +0800

        [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)

        * Refactor logging in lmms_eval package

        * Refactor variable names in lmms_eval package

    * add llava main in pyproject

    * Update README.md

    * Remove unnecessary dependencies and add specific version for llava_repr

    * Add dependencies for llava_repr***

    * Update README.md

    * add some docs on models and command line commands

    * remove some lines

    * typo

    * Update model_guide.md

    * Update model_guide.md

    * Update README.md

    * Update README.md

    * Update README.md

    * Fix refcocog dataset path

    * Record gpt response in eval info

    * Resolve conflict

    * Fix hallusionbench gpt json saving path

    * Rename hallubench gpt output path

    * Change remove image to check by type instead of check by names

    * More robust check by type

    * Remove unnecessary img in data

    * Forcing an empty commit.

    * Testing

    * Delete unnecessary things

    * Fix seedbench2 image issue in doc_to_text

    * Add conditional exclude for internal eval

    * Fix small bugs in list_with_num

    * Revise list_with_num model args

    ---------

    Co-authored-by: Bo Li <drluodian@gmail.com>
    Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>
    Co-authored-by: jzhang38 <a1286225768@gmail.com>

commit 8263ca91c87a127d992dd01bdac5f89b8a5ff521
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Sat Mar 2 05:58:08 2024 +0000

    Revise list_with_num model args

commit c413569d46be0ad604cd249df8bd58ffe26c0e39
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Sat Mar 2 05:09:15 2024 +0000

    Fix small bugs in list_with_num

commit e873012d0da2711f2076f7c09f390901f89da2f9
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Sat Mar 2 03:49:36 2024 +0000

    Add conditional exclude for internal eval

commit 621cdd663e0197827a5792872f13cdf3d27d2813
Merge: a3cae8e ffb9eb2
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Sat Mar 2 03:24:29 2024 +0000

    Merge branch 'dev/readme' into kc/final_fix

commit 6daf75c54fe3d45970c5d35a10000f10c1420c6b
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Sat Mar 2 02:47:31 2024 +0000

    Fix seedbench2 image issue in doc_to_text

commit 2a7a03205a2514fe0322ab4aa05c4948f9233109
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 15:32:49 2024 +0000

    Delete unnecessary things

commit a99850057224596d01835fface39d4aafd79de3e
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 15:31:42 2024 +0000

    Testing

commit 42f5fc125c7ee7d31633647f29f0d02ed3e640a8
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 15:29:30 2024 +0000

    Forcing an empty commit.

commit dddd0276003115c8a150a78eb3ae7bd299c460e4
Merge: 786f2b5 1700786
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 15:24:56 2024 +0000

    Merge branch 'kc/final_fix' into dev/readme

commit bcffe0b45083f48886e18d5ece5f2504b96bbcbd
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 15:24:20 2024 +0000

    Remove unnecessary img in data

commit f6705996b992363f2fd3c5dedb90e1bd51d04426
Merge: 4240785 888c1c1
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 13:41:24 2024 +0000

    Merge branch 'kc/final_fix' into dev/readme

commit 9290fc1c27ecca86f7ec3df0d932c7fa228e19c9
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 13:40:51 2024 +0000

    More robust check by type

commit 2fceaaf8f855d08d642996cd217ec0f6fc0fa04c
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 13:00:57 2024 +0000

    Change remove image to check by type instead of check by names

commit 33c0a81c91733e9aabe214f0797be2fdd3df1f1c
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 12:33:02 2024 +0000

    Rename hallubench gpt output path

commit 90ad0ace136a35ecc16a09ce841736842f7eb6dd
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 09:32:52 2024 +0000

    Fix hallusionbench gpt json saving path

commit 15b0336a932ef1823696e63672837700ce4fdae9
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 08:51:13 2024 +0000

    Resolve conflict

commit f75e7cfd35b1ee814f86abb9d4fbace027c00941
Merge: 9cf86fa 93534dc
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 08:37:21 2024 +0000

    Merge branch 'kc/final_fix' into dev/readme

commit 06c51ea7682e31964ca720a8a40705a3a7f3f360
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 07:55:03 2024 +0000

    Record gpt response in eval info

commit cdf7e6f77f7b6eee960e01e80c00ec74b8c1fbe7
Author: kcz358 <kaichenzhang358@outlook.com>
Date:   Fri Mar 1 07:49:01 2024 +0000

    Fix refcocog dataset path

commit 2782eb0
Author: Zhang Peiyuan <a1286225768@gmail.com>
Date:   Thu Feb 29 13:40:02 2024 +0800

    Dev/py add models (EvolvingLMMs-Lab#57)

    * add instructblip

    * minicpm_v

    * remove <image> from qwen-vl

    * speed up postprocessing

    * Optimize build context speed

    ---------

    Co-authored-by: Pu Fanyi <FPU001@e.ntu.edu.sg>
    Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

commit 7e8d3e4
Author: Pu Fanyi <FPU001@e.ntu.edu.sg>
Date:   Wed Feb 28 14:49:07 2024 +0800

    Pufanyi/flickr30k refractor (EvolvingLMMs-Lab#56)

    * refactor vizwizvqa task

    * Delete vqav2_test and vqav2_val YAML files

    * Refactor vqav2_process_results functions

    * Add a pack for vqav2

    * refactor okvqa

    * roll back vizwiz_vqa

    * Fix exact_match calculation in ok_vqa_process_results

    * Update OKVQA dataset name in readme

    * add model_specific_prompt_kwargs

    * add model_specific_prompt_kwargs to vizwiz_vqa

    * add model_specific_prompt_kwargs for vqav2

    * lint

    * fix a small bug for eval_logger

    * Refactor make_table function to display points as "  -  " if value is None

    * Merge commit 'c5e52a785d3cc87a866be9b880deb477d9f73fb7'

    * Refactor ok_vqa_aggreate_submissions function

    * Merge commit 'e5aa0a9601d6d8ce727315e4b0a8f13f06f26bff'

    * Refactor VQA submission file saving

    * Update file utils

    * Merge commit '560deca9f72483ca091795d6dc2537d4c54b32b0'

    * Refactor file path handling and submission generation

    * OKVQA path

    * vizwizvqa file

    * pack cmmmu

    * fix a small metric bug for cmmmu

    * Add higher_is_better flag to submission metric

    * Add CMMMU dataset to README.md

    * Add logging and refactor submission file generation in docvqa utils.py

    * pack docvqa

    * add traceback to print detailed error

    * Refactor docvqa_test_aggregate_results to accept additional arguments

    * Add metric check in evaluator.py and update test.yaml and val.yaml

    * add common `EvalAIAnswerProcessor` for okvqa, textvqa, vizwizvqa and vqav2

    * merge textvqa

    * textvqa

    * Modify submission file generation for COCO test results

    * Update test result storage path

    * update coco cap file name

    * Update COCO 2017 Caption dataset name

    * ferret

    * Add Ferret dataset

    * Refactor hb_doc_to_text function to include model-specific prompts

    * Add IconQA and its subtasks

    * Refactor image list creation in doc_to_visual function

    * Add process_results function to default template

    * Update process_results function in iconqa utils.py

    * refactor flickr30k

    * change aggregation function

    * Fix formatting issues and update logging message

    * Fix llava can not handle only text question (no visuals)

    * Fix qwen can not handle no image question (no visuals)

    * Add fuyu prepare accelerator scripts

    * refactor mme

    * naming consistency

    * aggregation_submissions consistency

    * flickr30k naming consistency

    * remove submissions for mme

    * remove unused submission function

    * Refactor infovqa_test.yaml and infovqa_val.yaml

    * Refactor code for improved readability and maintainability

    * stvqa

    * remane sqa

    * Update lmms_eval textcaps files and utils.py

    * Update default prompt for text captions

    * Refactor textcaps_aggregation_result function

    * Add generate_submission_file function and update mathvista_aggregate_results signature

    * Update nocaps_test.yaml and nocaps_val.yaml

    * refractor internal_eval

    * Add internal evaluation datasets

    * pack multidocvqa

    * mmvet

    * Fix gpt eval timeout issue for hallubench, restore load from gpt to avoid re evaluating

    * Refractor llava wild

    * Refractor llava-bench-coco

    * Add JSON file generation for gpt evaluation details

    * mmmu

    * Remove MMBench English and Chinese tasks

    * Remove unnecessary return statement in mmbench_aggregate_test_results function

    * Fix distributed process group initialization

    * Update dataset paths and group names in mmbench test configs

    * Update import statements in cc_utils.py, cn_utils.py, and en_utils.py

    * Add torch module import

    * lint

    * Remove IconQA dataset from README.md

    * Add Multi-DocVQA and its submodules

    * Add new datasets and update task names

    * Refactor flickr_aggregation_result function to accept additional arguments

    * Add timeout kwargs in Accelerator constructor

    * Add encoding to be utf-8 for cmmmu

    * Fix llava try and catch, remove torch.distributed.init in main

    * Ds prepare script for llava

    ---------

    Co-authored-by: JvThunder <joshuaadrianc@gmail.com>
    Co-authored-by: kcz358 <kaichenzhang358@outlook.com>

commit 4fa73ba
Author: Li Bo <drluodian@gmail.com>
Date:   Tue Feb 27 22:52:07 2024 +0800

    [Wandb Logger] add models, and args to wandb tables. (EvolvingLMMs-Lab#55)

    * Refactor logging in lmms_eval package

    * Refactor variable names in lmms_eval package

* Update commands.md

* Add repr_scripts for reference

* Add timeout for gpt4V

* Remove unnecessary dependencies

* Add reproduce into readme

* Revise seedbench process_result

* Fix exclude dc hardcode postprocess logic error

* Fix metric repeat issue

* Update dataset runtime and add environment info

* Revise val submission file saving path

* Put the correct query into the gpt extraction

* Update sleep time in utils.py

* update

---------

Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>
Co-authored-by: kcz358 <92624596+kcz358@users.noreply.github.com>
Co-authored-by: jzhang38 <a1286225768@gmail.com>
Co-authored-by: kcz358 <kaichenzhang358@outlook.com>