Add `llava` model for 🤗 Transformers #47
Conversation
context = f"<image>\n{context}" | ||
if self.tokenizer.chat_template is not None: | ||
messages = [{"role": "user", "content": context}] | ||
text = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
I think using chat templates is the best way to enable flexibility across fine-tuned models, so that one doesn't have to manually implement the template each time (as is currently done for the `llava-next` models)
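(For reference, a minimal sketch of how the chat-template path behaves, using the `llava-hf/llava-1.5-7b-hf` checkpoint from this PR; the exact rendered prompt depends on the template shipped with the tokenizer.)

```python
from transformers import AutoTokenizer

# Checkpoint from this PR; assumes its tokenizer ships a chat template.
tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-7b-hf")

context = "<image>\nWhat is shown in this picture?"
messages = [{"role": "user", "content": context}]

# tokenize=False returns the rendered prompt string; add_generation_prompt=True
# appends the assistant prefix so generation starts in the right place.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(text)  # e.g. "USER: <image>\nWhat is shown in this picture? ASSISTANT:"
```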
lmms_eval/models/llava_hf.py (outdated)
```python
self.cache_hook.add_partial("generate_until", (context, gen_kwargs), text_outputs)
pbar.update(1)
# reorder this group of results back to original unsorted form
res = re_ords.get_original(res)
```
A duplicate line here

Looks quite good to me; I think adding the …
lmms_eval/models/llava_hf.py (outdated)
messages = [{"role": "user", "content": context}] | ||
text = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) | ||
else: | ||
text = f"USER: {context}\nASSISTANT:" |
Should we prepend the Vicuna 1.1 system prompt? @kcz358
Yeah, I think so. But that should be determined by the conv template that is passed in, so that it is much more flexible
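(For reference, prepending the system prompt to the non-chat-template fallback could look like the sketch below. The `build_prompt` helper and its flag are illustrative, and the prompt text is assumed to be the system prompt from LLaVA's conv_vicuna_v1 template.)

```python
# Assumed Vicuna v1.1 system prompt (as used by LLaVA's conv_vicuna_v1 template).
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_prompt(context: str, use_system_prompt: bool = True) -> str:
    """Hypothetical fallback prompt builder for when no chat template is set."""
    prefix = f"{SYSTEM_PROMPT} " if use_system_prompt else ""
    return f"{prefix}USER: {context}\nASSISTANT:"
```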
Good idea, I'll add a `chat_template` arg so users can set/override the template in the tokenizer if desired
I'm working on loglikelihood support. Is there a way to test that it works as expected? If you can point me to a benchmark or command to run, that would be very helpful!
Hi, I think you can try `seedbench_ppl`, which is a multiple_choice output type that appends the options one by one to the context to calculate the loglikelihood. Or you can use this yaml file, which is revised from llava_in_the_wild. This will test the model's perplexity on a generation task.
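(As a rough illustration of what the multiple_choice loglikelihood path computes: each option is appended to the context and scored by the summed log-probability of its tokens. The sketch below is text-only with a stand-in model; the real task would also feed the image through the vision tower.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in text-only model, purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def option_logprob(context: str, option: str) -> float:
    """Summed log-probs of the option tokens, conditioned on the context."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = logits[0, :-1].log_softmax(-1)  # logits at t predict token t+1
    option_ids = full_ids[0, ctx_len:]
    return sum(
        log_probs[pos, tok].item()
        for pos, tok in zip(range(ctx_len - 1, full_ids.shape[1] - 1), option_ids)
    )

# The predicted answer is the option with the highest loglikelihood:
# best = max(options, key=lambda opt: option_logprob(question, " " + opt))
```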
Hello @kcz358 @jzhang38, I've now tidied up the code and pushed support for: …

I also ran the 7B model over several benchmarks to compare against the original. Spreadsheet: https://docs.google.com/spreadsheets/d/1CbV-SOSVNl1S60Ns8B0-DhHBH5k5zPAm9M6XcpwFG5w/edit?usp=sharing

Do you have some ideas about why e.g. the mme scores disagree?

For the loglikelihood benchmarks, here's the chat template that is being applied (inspired by the …)

Please let me know if this is not correct, e.g. should the EOS token be omitted?

Edit: I double checked the prompt template for the …
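(The template block itself did not survive above; a Vicuna-style chat template along the following lines would match the USER/ASSISTANT format discussed earlier. This is a hypothetical reconstruction, shown as a Python string; the trailing `eos_token` after assistant turns is exactly the part the EOS question is about.)

```python
# Hypothetical Vicuna-style chat template (Jinja, written as a Python string).
VICUNA_CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}"
    "USER: {{ message['content'] }} "
    "{% elif message['role'] == 'assistant' %}"
    "ASSISTANT: {{ message['content'] }}{{ eos_token }}"
    "{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}ASSISTANT:{% endif %}"
)
# tokenizer.chat_template = VICUNA_CHAT_TEMPLATE  # would override the shipped template
```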
```diff
@@ -198,7 +198,7 @@ def loglikelihood(self, requests: List[Instance]) -> List[Tuple[float, bool]]:
         else:
             image = None

-        prompts_input = contexts[0]
+        prompts_input = contexts[0] if isinstance(contexts, list) else contexts
```
I think this explains the main diff on `seedbench_ppl` compared to `llava_hf` (running eval now to compare). Basically, the problem was that `contexts` is a string for `batch_size=1`, and thus the prompt sent to the model was just the first character of the full prompt.
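(A two-line demonstration of the bug and the fix from the diff above:)

```python
# With batch_size=1 the harness hands over a bare string, not a list of strings,
# so indexing slices characters instead of selecting the first prompt.
contexts = "What is shown in the image?"
print(contexts[0])  # -> "W"  (the old code sent only this to the model)

# The fixed line keeps the full prompt in both cases:
prompts_input = contexts[0] if isinstance(contexts, list) else contexts
print(prompts_input)  # -> "What is shown in the image?"
```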
OK, this didn't have much impact after all: `seedbench_ppl` went from 0.168 -> 0.112. What is odd is that the numbers reported in the paper are much higher (~0.6), which suggests there is also an issue in the original `llava.py` implementation.
For reference, this is the command I am running:
```
accelerate launch --num_processes=8 -m lmms_eval --model llava --model_args pretrained=liuhaotian/llava-v1.5-7b --tasks seedbench_ppl --batch_size 1 --output_path ./logs/ --log_samples
```
Perhaps this is something that can be dealt with in a follow-up PR?
Hmmm, I also noticed this huge result difference when I wrote the seedbench ppl task. The original llava uses generation instead of ppl to produce the answer, and I achieved a similar result using the generation version. So I feel this is just a matter of whether you use generation or perplexity. May I ask, in your seedbench ppl logs, are the answers being matched correctly?
Wow, your work is amazing @lewtun! Currently everything looks quite good to me, and thank you very much for spotting the loglikelihood issue for us. For the mme disagreement, have you checked that the prompts are exactly the same for the hf version and the llava version? Also, just as you mentioned, different image processing implementations will also affect the score; based on some of the tests in our development, this can cause a significant shift in the score. I checked the eval scripts of llava, and it seems the image processing implementation llava 1.5 used on mme pads the image to a square. Another factor that may affect the final score is the torch version you use. We provide a reproduce environment here that can exactly reproduce the mme score on llava. Whether you use flash attn or not may also affect the score a bit, but not too much, and can be ignored.
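(For readers unfamiliar with that preprocessing step: LLaVA 1.5 pads each image to a square before resizing. Below is a sketch modeled on the original repo's expand2square helper; the mean-color fill value is an assumption drawn from that implementation, and an RGB input is assumed.)

```python
from PIL import Image

def expand2square(img: Image.Image, fill=(122, 116, 104)) -> Image.Image:
    """Pad an RGB image to a square canvas, keeping the original centered.

    Sketch of LLaVA 1.5's pad-to-square preprocessing; the fill color
    approximates the CLIP pixel mean and is an assumption.
    """
    width, height = img.size
    if width == height:
        return img
    side = max(width, height)
    canvas = Image.new(img.mode, (side, side), fill)
    canvas.paste(img, ((side - width) // 2, (side - height) // 2))
    return canvas
```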
Yes, I've checked they are exactly the same, which suggests image processing is the culprit.

Thanks, I am using …
Yeah, I think this is okay for now, since for most of the benchmarks the scores are similar.
Great! Any chance we could merge this soon? We are working on VLM integration in …
lmms_eval/constants.py (outdated)
Should this be moved into the model_utils here? You could create a folder inside the utils named llava and put this inside.
Good idea, will move it there!
Moved to the `llava_hf.py` file for now (to keep things simple) in 6d08fe8
Hi @Luodian, most parts of this PR LGTM. Do you think we can merge it now, or wait until the next release? You might also want to review the changes and see whether there are things that need to change.
Hi, I think it can be merged directly, but let me look over the changes, and after checking I will merge it~
* Refactor logging and model initialization
* Fix wandb_logger.online() method call
* Add error handling during evaluation
* Add wait time and error handling in get_chat_response function
* Update wait_time in get_chat_response function
* Refactor code for improved readability and maintainability
* Refactor doc_to_visual function to handle multiple images in ICON-QA tasks
* Refactor logging_utils.py and utils.py: removes unused imports, adjusts code formatting, and updates the `get_chat_response` function to increase the `wait_time` parameter from 5 to 10
* Refactor code for wandb logging and generation in OtterHD class
* Refactor prepare_report_by_task method in logging_utils.py
* Update generation parameters in OtterHD model
* Update generation parameters in OtterHD model
* Squashed commit of the following:
  * 5a44010 (kcz358, Tue Feb 13 18:50:37 2024 +0800): Fix seedbench choices bugs (#45)
  * cf10a45 (XinrunDu, Tue Feb 13 18:50:23 2024 +0800): add stvqa and multidocvqa (#46)
  * caaad1d (XinrunDu, Sun Feb 11 00:54:39 2024 +0800, co-authored by ygjin11): add cmmmu (#44)
  * cfa11b6 (kcz358, Sun Feb 11 00:54:23 2024 +0800): [Feat] Add qwen loglikelihood (#43). Add qwen loglikelihood; revise the pyproject dependency (move tiktoken out from optional-dependencies); add ferret-bench; add seedbench 2, test on llava
  * 4d42aa8 (JvThunder, Wed Feb 7 00:08:22 2024 +0800): Joshua/vizwizvqa refactor (#42). Refactor vizwizvqa task; merge commit '0cf06439d3c85aee8783034b226f1badd3a08608'; fix exact_match accuracy calculation in vizwiz_vqa_process_results; update vizwiz_vqa tasks

Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>
This PR adds the modelling code needed to evaluate `llava` models in the `transformers` format: https://huggingface.co/collections/llava-hf/llava-15-65f762d5b6941db5c2ba07e0

Example command to run:
```
accelerate launch --num_processes=8 -m lmms_eval --model llava_hf --model_args pretrained="llava-hf/llava-1.5-7b-hf" --tasks mme --batch_size 1 --output_path ./logs/ --log_samples
```
I will share some benchmark numbers shortly, but the code can be reviewed in any case :)