Replies: 1 comment
cc: @hzhwcmhf
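I'd like to ask the official team for help. We evaluated qwen2-72b-instruct and qwen2.5-72b-instruct with opencompass and found that our results do not match the officially published metrics, in two main respects:
1. Our mmlu_pro and GPQA_diamond scores for qwen2.5 are roughly on par with qwen2 (the published results say qwen2.5 improves substantially).
2. Our TheoremQA score for qwen2.5 is much lower than for qwen2 (roughly half of qwen2's score).
Could you tell us how these three metrics were evaluated, whether opencompass was also used, and whether any optimizations were made within opencompass, for example to the prompts? Many thanks!
PS:
1. The configurations we used are as follows:
mmlu version: mmlu_pro_gen_cdbebf
gpqa: gpqa_openai_simple_evals_gen_5aeece
TheoremQA: TheoremQA_5shot_gen_6f0af8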