Add `llava` model for 🤗 Transformers #47
Conversation
context = f"<image>\n{context}" | ||
if self.tokenizer.chat_template is not None: | ||
messages = [{"role": "user", "content": context}] | ||
text = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
I think using chat templates is the best way to enable flexibility across fine-tuned models, so that one doesn't have to manually implement the template each time (as is currently done for the `llava-next` models)
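(For reference, a minimal sketch of how the chat-template path behaves, using the `llava-hf/llava-1.5-7b-hf` checkpoint from this PR; the exact rendered prompt depends on the template shipped with the tokenizer.)

```python
from transformers import AutoTokenizer

# Checkpoint from this PR; assumes its tokenizer ships a chat template.
tokenizer = AutoTokenizer.from_pretrained("llava-hf/llava-1.5-7b-hf")

context = "<image>\nWhat is shown in this picture?"
messages = [{"role": "user", "content": context}]

# tokenize=False returns the rendered prompt string; add_generation_prompt=True
# appends the assistant prefix so generation starts in the right place.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(text)  # e.g. "USER: <image>\nWhat is shown in this picture? ASSISTANT:"
```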
lmms_eval/models/llava_hf.py (outdated)
```python
self.cache_hook.add_partial("generate_until", (context, gen_kwargs), text_outputs)
pbar.update(1)
# reorder this group of results back to original unsorted form
res = re_ords.get_original(res)
```
A duplicate line here

Looks quite good to me; I think adding the …
lmms_eval/models/llava_hf.py (outdated)
messages = [{"role": "user", "content": context}] | ||
text = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) | ||
else: | ||
text = f"USER: {context}\nASSISTANT:" |
Should we prepend the Vicuna 1.1 system prompt? @kcz358
Yeah, I think so. But that should be determined by the conv template that is passed in, so that it is much more flexible
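(For reference, prepending the system prompt to the non-chat-template fallback could look like the sketch below. The `build_prompt` helper and its flag are illustrative, and the prompt text is assumed to be the system prompt from LLaVA's conv_vicuna_v1 template.)

```python
# Assumed Vicuna v1.1 system prompt (as used by LLaVA's conv_vicuna_v1 template).
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_prompt(context: str, use_system_prompt: bool = True) -> str:
    """Hypothetical fallback prompt builder for when no chat template is set."""
    prefix = f"{SYSTEM_PROMPT} " if use_system_prompt else ""
    return f"{prefix}USER: {context}\nASSISTANT:"
```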
Good idea, I'll add a `chat_template` arg so users can set/override the template in the tokenizer if desired
I'm working on loglikelihood support. Is there a way to test that it works as expected? If you can point me to a benchmark or command to run, that would be very helpful!
Hi, I think you can try `seedbench_ppl`, which is a multiple_choice output type that appends the options one by one to the context to calculate the loglikelihood. Or you can use this yaml file, which is revised from llava_in_the_wild. This will test the model's perplexity on a generation task.
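(As a rough illustration of what the multiple_choice loglikelihood path computes: each option is appended to the context and scored by the summed log-probability of its tokens. The sketch below is text-only with a stand-in model; the real task would also feed the image through the vision tower.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in text-only model, purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def option_logprob(context: str, option: str) -> float:
    """Summed log-probs of the option tokens, conditioned on the context."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = logits[0, :-1].log_softmax(-1)  # logits at t predict token t+1
    option_ids = full_ids[0, ctx_len:]
    return sum(
        log_probs[pos, tok].item()
        for pos, tok in zip(range(ctx_len - 1, full_ids.shape[1] - 1), option_ids)
    )

# The predicted answer is the option with the highest loglikelihood:
# best = max(options, key=lambda opt: option_logprob(question, " " + opt))
```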
Hello @kcz358 @jzhang38, I've now tidied up the code and pushed support for: …

I also ran the 7B model over several benchmarks to compare against the original. Spreadsheet: https://docs.google.com/spreadsheets/d/1CbV-SOSVNl1S60Ns8B0-DhHBH5k5zPAm9M6XcpwFG5w/edit?usp=sharing

Do you have some ideas about why e.g. the mme scores disagree?

For the loglikelihood benchmarks, here's the chat template that is being applied (inspired by the …)

Please let me know if this is not correct, e.g. should the EOS token be omitted?

Edit: I double checked the prompt template for the …
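(The template block itself did not survive above; a Vicuna-style chat template along the following lines would match the USER/ASSISTANT format discussed earlier. This is a hypothetical reconstruction, shown as a Python string; the trailing `eos_token` after assistant turns is exactly the part the EOS question is about.)

```python
# Hypothetical Vicuna-style chat template (Jinja, written as a Python string).
VICUNA_CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}"
    "USER: {{ message['content'] }} "
    "{% elif message['role'] == 'assistant' %}"
    "ASSISTANT: {{ message['content'] }}{{ eos_token }}"
    "{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}ASSISTANT:{% endif %}"
)
# tokenizer.chat_template = VICUNA_CHAT_TEMPLATE  # would override the shipped template
```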
```diff
@@ -198,7 +198,7 @@ def loglikelihood(self, requests: List[Instance]) -> List[Tuple[float, bool]]:
         else:
             image = None

-        prompts_input = contexts[0]
+        prompts_input = contexts[0] if isinstance(contexts, list) else contexts
```
I think this explains the main diff on `seedbench_ppl` compared to `llava_hf` (running eval now to compare). Basically, the problem was that `contexts` is a string for `batch_size=1`, and thus the prompt sent to the model was just the first character of the full prompt.
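(A two-line demonstration of the bug and the fix from the diff above:)

```python
# With batch_size=1 the harness hands over a bare string, not a list of strings,
# so indexing slices characters instead of selecting the first prompt.
contexts = "What is shown in the image?"
print(contexts[0])  # -> "W"  (the old code sent only this to the model)

# The fixed line keeps the full prompt in both cases:
prompts_input = contexts[0] if isinstance(contexts, list) else contexts
print(prompts_input)  # -> "What is shown in the image?"
```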
OK, this didn't have much impact after all: `seedbench_ppl` went from 0.168 -> 0.112. What is odd is that the numbers reported in the paper are much higher (~0.6), which suggests there is also an issue in the original `llava.py` implementation.
For reference, this is the command I am running:
```
accelerate launch --num_processes=8 -m lmms_eval --model llava --model_args pretrained=liuhaotian/llava-v1.5-7b --tasks seedbench_ppl --batch_size 1 --output_path ./logs/ --log_samples
```
Perhaps this is something that can be dealt with in a follow-up PR?
Hmmm, I also noticed this huge result difference when I wrote the seedbench ppl task. The original llava uses generation instead of ppl to produce the answer, and I achieved a similar result using the generation version. So I feel this is just a matter of whether you use generation or perplexity. May I ask, in your seedbench ppl logs, are the answers being matched correctly?
Wow, your work is amazing @lewtun! Currently everything looks quite good to me, and thank you very much for spotting the loglikelihood issue for us. For the mme disagreement, have you checked that the prompts are exactly the same for the hf version and the llava version? Also, just as you mentioned, different image processing implementations will also affect the score; based on some of the tests in our development, this can cause a significant shift in the score. I checked the eval scripts of llava, and it seems the image processing implementation llava 1.5 used on mme pads the image to a square. Another factor that may affect the final score is the torch version you use. We provide a reproduce environment here that can exactly reproduce the mme score on llava. Whether you use flash attn or not may also affect the score a bit, but not too much, and can be ignored.
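(For readers unfamiliar with that preprocessing step: LLaVA 1.5 pads each image to a square before resizing. Below is a sketch modeled on the original repo's expand2square helper; the mean-color fill value is an assumption drawn from that implementation, and an RGB input is assumed.)

```python
from PIL import Image

def expand2square(img: Image.Image, fill=(122, 116, 104)) -> Image.Image:
    """Pad an RGB image to a square canvas, keeping the original centered.

    Sketch of LLaVA 1.5's pad-to-square preprocessing; the fill color
    approximates the CLIP pixel mean and is an assumption.
    """
    width, height = img.size
    if width == height:
        return img
    side = max(width, height)
    canvas = Image.new(img.mode, (side, side), fill)
    canvas.paste(img, ((side - width) // 2, (side - height) // 2))
    return canvas
```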
Yes, I've checked they are exactly the same, which suggests image processing is the culprit.

Thanks, I am using …
Yeah, I think this is okay for now, since for most of the benchmarks the scores are similar.
Great! Any chance we could merge this soon? We are working on VLM integration in …
lmms_eval/constants.py (outdated)
Should this be moved into the model_utils here? You could create a folder inside the utils named llava and put this inside.
Good idea, will move it there!
Moved to the `llava_hf.py` file for now (to keep things simple) in 6d08fe8
Hi @Luodian, most parts of this PR LGTM. Do you think we can merge it now, or wait until the next release? You might also want to review the changes and see whether there are things that need to change.
Hi, I think it can be merged directly, but let me look over the changes, and after checking I will merge it~
* Refactor logging and model initialization
* Fix wandb_logger.online() method call
* Add error handling during evaluation
* Add wait time and error handling in get_chat_response function
* Update wait_time in get_chat_response function
* Refactor code for improved readability and maintainability
* Refactor doc_to_visual function to handle multiple images in ICON-QA tasks
* Refactor logging_utils.py and utils.py: removes unused imports, adjusts code formatting, and updates the `get_chat_response` function to increase the `wait_time` parameter from 5 to 10
* Refactor code for wandb logging and generation in OtterHD class
* Refactor prepare_report_by_task method in logging_utils.py
* Update generation parameters in OtterHD model
* Update generation parameters in OtterHD model
* Squashed commit of the following:
  * 5a44010 (kcz358, Tue Feb 13 18:50:37 2024 +0800): Fix seedbench choices bugs (#45)
  * cf10a45 (XinrunDu, Tue Feb 13 18:50:23 2024 +0800): add stvqa and multidocvqa (#46)
  * caaad1d (XinrunDu, Sun Feb 11 00:54:39 2024 +0800, co-authored by ygjin11): add cmmmu (#44)
  * cfa11b6 (kcz358, Sun Feb 11 00:54:23 2024 +0800): [Feat] Add qwen loglikelihood (#43). Add qwen loglikelihood; revise the pyproject dependency (move tiktoken out from optional-dependencies); add ferret-bench; add seedbench 2, test on llava
  * 4d42aa8 (JvThunder, Wed Feb 7 00:08:22 2024 +0800): Joshua/vizwizvqa refactor (#42). Refactor vizwizvqa task; merge commit '0cf06439d3c85aee8783034b226f1badd3a08608'; fix exact_match accuracy calculation in vizwiz_vqa_process_results; update vizwiz_vqa tasks

Co-authored-by: Fanyi Pu <FPU001@e.ntu.edu.sg>
This PR adds the modelling code needed to evaluate `llava` models in the `transformers` format: https://huggingface.co/collections/llava-hf/llava-15-65f762d5b6941db5c2ba07e0

Example command to run:
```
accelerate launch --num_processes=8 -m lmms_eval --model llava_hf --model_args pretrained="llava-hf/llava-1.5-7b-hf" --tasks mme --batch_size 1 --output_path ./logs/ --log_samples
```
I will share some benchmark numbers shortly, but the code can be reviewed in any case :)