Update index.html
eternal8080 authored Jul 7, 2024
1 parent 897b38b commit 4ce6be7
Showing 1 changed file with 2 additions and 2 deletions.
Expand Up @@ -170,8 +170,8 @@ <h1 class="title is-1 publication-title">GeoEval: Benchmark for Evaluating LLMs
<div class="box m-5">
<div class="content has-text-centered">
<img src="static/images/lidia.jpg" alt="geometric reasoning" width="84%"/>
<p> Accuracy scores of one leading LLM (i.e., PoT GPT-4), four primary LMMs, random chance, and human performance on our proposed benchmark, across mathematical reasoning and visual context types. PoT refers to program-of-thought prompting, and PoT GPT-4 is a textual LLM augmented with the caption and OCR text. GPT-4V is manually evaluated via the playground chatbot. <b class="best-score-text" style="color: #C6011F"> The scores of Gemini Ultra are from the Gemini Team, Google.</b>
</p>
<p> The performance of models varies across subjects, revealing distinct strengths. The WizardMath-7B model significantly outperforms the others on flat geometry problems, such as lengths and lines. Conversely, on solid geometry problems involving cuboids and spheres, GPT-4V surpasses WizardMath-7B, indicating its superior capability on solid geometry questions.
</p>
</div>
</div>
