
Add llava model for 🤗 Transformers #47

Merged: 18 commits merged into EvolvingLMMs-Lab:main on Apr 11, 2024

Conversation

lewtun (Contributor) commented Apr 5, 2024

This PR adds the modelling code needed to evaluate llava models in the transformers format: https://huggingface.co/collections/llava-hf/llava-15-65f762d5b6941db5c2ba07e0

Example command to run:

accelerate launch --num_processes=8 -m lmms_eval --model llava_hf   --model_args pretrained="llava-hf/llava-1.5-7b-hf"   --tasks mme --batch_size 1 --output_path ./logs/ --log_samples

I will share some benchmark numbers shortly, but the code can be reviewed in any case :)

context = f"<image>\n{context}"
if self.tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": context}]
    text = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
lewtun (Contributor, Author) commented:

I think using chat templates is the best way to enable flexibility across fine-tuned models, so that one doesn't have to manually implement the template each time (as is currently done for the llava-next models)

self.cache_hook.add_partial("generate_until", (context, gen_kwargs), text_outputs)
pbar.update(1)
# reorder this group of results back to original unsorted form
res = re_ords.get_original(res)
Collaborator commented:

A duplicate line here

kcz358 (Collaborator) commented Apr 6, 2024

Looks quite good to me. I think after adding the device_map=auto option and loglikelihood support, this should be ready to merge.
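
For reference, a minimal sketch of what loading with device_map=auto looks like on the Transformers side (illustrative only, not necessarily the exact code this PR ends up with):

# Illustrative sketch: device_map="auto" lets accelerate shard the checkpoint
# across whatever GPUs are visible instead of putting everything on one device.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",  # requires the accelerate package
)
processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")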

    messages = [{"role": "user", "content": context}]
    text = self.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
else:
    text = f"USER: {context}\nASSISTANT:"
jzhang38 (Contributor) commented Apr 6, 2024:

Should we prepend the Vicuna 1.1 system prompt? @kcz358

kcz358 (Collaborator) commented Apr 7, 2024:

Yeah, I think so. But that should be determined by the passed-in conv template, so that it is more flexible.

lewtun (Contributor, Author) commented:

Good idea, I'll add a chat_template arg so users can set / override the template in the tokenizer if desired
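
For illustration, a minimal sketch of what such an override could look like (the argument name and its plumbing through lmms-eval are assumptions here; only tokenizer.chat_template and apply_chat_template are standard transformers behaviour):

# Hypothetical sketch: the chat template is just a Jinja string on the tokenizer,
# so a user-supplied template can replace the one shipped with the checkpoint.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
custom_template = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}USER: {{ message['content'] }}\n{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}ASSISTANT:{% endif %}"
)
processor.tokenizer.chat_template = custom_template  # override the default template
messages = [{"role": "user", "content": "<image>\nDescribe this image."}]
text = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)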

lewtun (Contributor, Author) commented Apr 8, 2024

> Looks quite good to me. I think after adding the device_map=auto option and loglikelihood support, this should be ready to merge.

I'm working on loglikelihood support - is there a way to test that it works as expected? If you can point me to a benchmark or command to run, that would be very helpful!

kcz358 (Collaborator) commented Apr 9, 2024

> Looks quite good to me. I think after adding the device_map=auto option and loglikelihood support, this should be ready to merge.
>
> I'm working on loglikelihood support - is there a way to test that it works as expected? If you can point me to a benchmark or command to run, that would be very helpful!

Hi, I think you can try seedbench_ppl, which uses the multiple_choice output type and appends the options one by one to the context to calculate the loglikelihood.
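
For reference, the option scoring amounts to something like the following text-only sketch (illustrative, not the lmms-eval implementation; the real code also runs the image through the processor):

# Illustrative sketch: append each candidate option to the shared context and pick
# the option whose continuation tokens get the highest summed log-probability.
import torch
import torch.nn.functional as F

def option_loglikelihood(model, tokenizer, context, option):
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids=full_ids).logits
    # logits at position i predict token i+1, so shift by one to score only the option tokens
    cont_logits = logits[0, ctx_ids.shape[1] - 1 : -1]
    cont_ids = full_ids[0, ctx_ids.shape[1]:]
    logprobs = F.log_softmax(cont_logits, dim=-1)
    return logprobs.gather(1, cont_ids.unsqueeze(1)).sum().item()

# best_option = max(options, key=lambda o: option_loglikelihood(model, tokenizer, context, o))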

Or you can use this yaml file, which is adapted from llava_in_the_wild:

dataset_path: lmms-lab/llava-bench-coco
dataset_kwargs:
  token: True
task: "llava_in_the_wild_ppl"
test_split: train
output_type: loglikelihood
doc_to_visual: !function utils.llava_doc_to_visual
doc_to_text: !function utils.llava_doc_to_text
doc_to_target: "gpt_answer"
metric_list:
  - metric: perplexity
    higher_is_better: true
metadata:
  version: 0.0
model_specific_prompt_kwargs:
  default:
    pre_prompt: ""
    post_prompt: ""

This will test the model's perplexity on a generation task
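
Assuming the yaml is placed under the tasks directory with the task name given above, the run command mirrors the earlier one, for example:

accelerate launch --num_processes=8 -m lmms_eval --model llava_hf --model_args pretrained="llava-hf/llava-1.5-7b-hf" --tasks llava_in_the_wild_ppl --batch_size 1 --output_path ./logs/ --log_samples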

lewtun (Contributor, Author) commented Apr 9, 2024

Hello @kcz358 @jzhang38 I've now tidied up the code and pushed support for:

  • Chat templates that the user can specify at runtime. Note that I opted for the Jinja templates we use in transformers so that users don't have to install the GitHub Llava repo
  • A proper implementation of the Vicuna chat template with system prompt in the Jinja format (a rough sketch is included after this list)
  • Loglikelihood benchmarks
  • Flash attention
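
For readers unfamiliar with the format, a rough sketch of a Vicuna-style Jinja template with the v1.1 system prompt is shown below (the template actually merged here may differ in whitespace and EOS handling):

# Rough sketch only; not necessarily the exact template merged in this PR.
VICUNA_V1_TEMPLATE = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}USER: {{ message['content'] }} "
    "{% elif message['role'] == 'assistant' %}ASSISTANT: {{ message['content'] }}</s>"
    "{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}ASSISTANT:{% endif %}"
)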

I also ran the 7B model over several benchmarks to compare against the original llava implementation. On some benchmarks we have good agreement, while on others there are significant differences. One possible reason is that the image processing differs across implementations (see here) and/or there are slight differences in how the inputs are formatted.

[Screenshot: benchmark comparison table, 2024-04-09 14:40]

Spreadsheet: https://docs.google.com/spreadsheets/d/1CbV-SOSVNl1S60Ns8B0-DhHBH5k5zPAm9M6XcpwFG5w/edit?usp=sharing

Do you have some ideas about why e.g. mme can be so different, given that other benchmarks like mmbench and mmmu are quite similar?

For the loglikelihood benchmarks, here's the chat template that is being applied (inspired by the llava code):

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image> <image> <image> <image> <image> <image> <image> <image>
Please identify the sequence of actions in this video and record them in sequence. Answer :  ASSISTANT:  scoop sugar, pour milk, carry milk, reach cup, carry cup, reach cup</s>

Please let me know if this is not correct, e.g. should the EOS token be omitted?

Edit: I double checked the prompt template for the llava implementation of loglikelihood and spotted a bug in llava.py. Fixed in 7c7b969

@@ -198,7 +198,7 @@ def loglikelihood(self, requests: List[Instance]) -> List[Tuple[float, bool]]:
         else:
             image = None

-        prompts_input = contexts[0]
+        prompts_input = contexts[0] if isinstance(contexts, list) else contexts
lewtun (Contributor, Author) commented:

I think this explains the main diff on seedbench_ppl compared to llava_hf (running eval now to compare). Basically the problem was that contexts is a string for batch_size=1, and thus the prompt was just the first character of the string.
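
To illustrate the failure mode with a hypothetical prompt:

# With batch_size=1, `contexts` arrives as a plain string rather than a list,
# so indexing [0] silently truncated the prompt to its first character.
contexts = "What is shown in the image?"  # a str, not a list, when batch_size=1
prompts_input = contexts[0]  # old behaviour -> "W"
prompts_input = contexts[0] if isinstance(contexts, list) else contexts  # fix -> full prompt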

lewtun (Contributor, Author) commented:

OK this didn't have much impact after all: seedbench_ppl went from 0.168 -> 0.112.

What is odd is that the numbers reported in the paper are much higher (~0.6), which suggests there is an issue in the original llava.py implementation as well:

[Screenshot: numbers reported in the paper, 2024-04-09 16:13]

For reference, this is the command I am running:

accelerate launch --num_processes=8 -m lmms_eval --model llava   --model_args pretrained=liuhaotian/llava-v1.5-7b   --tasks seedbench_ppl --batch_size 1 --output_path ./logs/ --log_samples

Perhaps this is something that can be dealt with in a follow up PR?

Collaborator commented:

Hmmm, I also noticed this huge result difference when I wrote the seedbench ppl task. The original llava uses generation instead of ppl to produce answers, and I achieve similar results using the generation version, so I feel this comes down to whether you use generation or perplexity. May I ask, in your seedbench ppl logs, are the answers being matched correctly?

kcz358 (Collaborator) commented Apr 9, 2024

Wow, your work is amazing @lewtun! Everything currently looks quite good to me, and thank you very much for spotting the loglikelihood issue for us.

For the mme disagreement, have you checked that the prompts are exactly the same for the hf version and the llava version?

Also, as you mentioned, different image processing implementations will also affect the score. Based on some of the tests during our development, this can cause a significant shift in the score. I checked the eval scripts of llava, and it seems the image processing llava 1.5 uses on mme pads the image to a square.

https://github.com/haotian-liu/LLaVA/blob/4e2277a060da264c4f21b364c867cc622c945874/llava/mm_utils.py#L152-L163
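
Roughly, the linked code does the following (paraphrased sketch; see mm_utils.py at the link above for the exact version):

# Paraphrased sketch of LLaVA's pad-to-square preprocessing: the shorter side is
# padded with a background color (LLaVA uses the image processor's mean pixel value).
from PIL import Image

def expand2square(pil_img, background_color):
    width, height = pil_img.size
    if width == height:
        return pil_img
    if width > height:
        result = Image.new(pil_img.mode, (width, width), background_color)
        result.paste(pil_img, (0, (width - height) // 2))
        return result
    result = Image.new(pil_img.mode, (height, height), background_color)
    result.paste(pil_img, ((height - width) // 2, 0))
    return result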

Another factor that may affect the final score is the torch version you use. We provide a reproduction environment here that can exactly reproduce the mme score on llava. Whether you use flash attn or not may also affect the score a bit, but not by much, so it can be ignored.

lewtun (Contributor, Author) commented Apr 9, 2024

> For the mme disagreement, have you checked that the prompts are exactly the same for the hf version and the llava version?

Yes, I've checked that they are exactly the same, which suggests image processing is the culprit.

> Another factor that may affect the final score is the torch version you use. We provide a reproduction environment here that can exactly reproduce the mme score on llava. Whether you use flash attn or not may also affect the score a bit, but not by much, so it can be ignored.

Thanks, I am using torch==2.1.2, which produces an MME score of 1513.673 for llava and is consistent with the paper. I know there are plans to enable the same padding logic for llava_hf models, so perhaps we can merge this as-is and revisit MME at a future date?

kcz358 (Collaborator) commented Apr 10, 2024

> Thanks, I am using torch==2.1.2, which produces an MME score of 1513.673 for llava and is consistent with the paper. I know there are plans to enable the same padding logic for llava_hf models, so perhaps we can merge this as-is and revisit MME at a future date?

Yeah, I think this is okay for now since for most of the benchmarks the scores are similar.

lewtun (Contributor, Author) commented Apr 10, 2024

> Yeah, I think this is okay for now since for most of the benchmarks the scores are similar.

Great! Any chance we could merge this soon? We are working on VLM integration in trl and would like to point the community to lmms-eval for the release :)

Collaborator commented:

Should this be moved into model_utils here? We could create a folder inside utils named llava and put this inside.

lewtun (Contributor, Author) commented:

Good idea, will move it there!

lewtun (Contributor, Author) commented:

Moved to the llava_hf.py file for now (to keep things simple) in 6d08fe8

kcz358 (Collaborator) commented Apr 10, 2024

> Great! Any chance we could merge this soon? We are working on VLM integration in trl and would like to point the community to lmms-eval for the release :)

Hi @Luodian, most parts of this PR LGTM. Do you think we can merge it now or wait until the next release? You might also want to review the changes and see whether there is anything that needs to change.

Luodian (Contributor) commented Apr 11, 2024

> Great! Any chance we could merge this soon? We are working on VLM integration in trl and would like to point the community to lmms-eval for the release :)
>
> Hi @Luodian, most parts of this PR LGTM. Do you think we can merge it now or wait until the next release? You might also want to review the changes and see whether there is anything that needs to change.

Hi, I think it can be merged directly, but let me look through the changes, and after checking I will merge it~

Luodian merged commit a876169 into EvolvingLMMs-Lab:main on Apr 11, 2024 (1 check passed)
Luodian added a commit that referenced this pull request Apr 16, 2024
Add `llava` model for 🤗 Transformers
kangreen0210 pushed a commit to kangreen0210/LIME that referenced this pull request Oct 6, 2024
Add `llava` model for 🤗 Transformers