yhenon/llm-face-vision

LLM Face Vision benchmarks

This repository provides a framework for comparing the capabilities of Vision-Language Models (VLMs) with those of dedicated face recognition systems.

Recognition

Benchmarked models:

Commercial API VLMs:
- Anthropic Haiku
- OpenAI GPT-4o-mini
- Grok-2 vision
- Gemini-2-flash-lite

Open source VLMs:
- LLaVA-NeXT (https://github.com/LLaVA-VL/LLaVA-NeXT)

Face recognition systems:
- InsightFace arcface-resnet-100 (https://github.com/deepinsight/insightface/tree/master/recognition/arcface_torch)

Datasets:

- AgeDB-30
- LFW
- CALFW
- CPLFW
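
These datasets are commonly used as face verification benchmarks, where each sample is a pair of images labeled as showing the same person or different people. As a minimal sketch of how such pairs could be scored (not necessarily how this repository does it), the snippet below computes accuracy from boolean predictions. For an embedding model such as ArcFace, a prediction can be obtained by thresholding the cosine similarity of the two embeddings; for a VLM, it would come from parsing a yes/no answer. The function names and the threshold value of 0.3 are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embedding_predictions(embedding_pairs, threshold=0.3):
    """Predict 'same person' when similarity reaches the threshold.

    The default threshold is a placeholder; in practice it would be
    tuned on held-out pairs.
    """
    return [cosine_similarity(a, b) >= threshold for a, b in embedding_pairs]


def verification_accuracy(predictions, labels):
    """Fraction of pairs whose same/different prediction matches the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one pair is required")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


# Toy example with 2-D embeddings: one matching pair, one non-matching pair.
pairs = [([1.0, 0.0], [0.9, 0.1]), ([1.0, 0.0], [0.0, 1.0])]
labels = [True, False]
accuracy = verification_accuracy(embedding_predictions(pairs), labels)
```

The same `verification_accuracy` helper applies to VLM outputs once each reply has been mapped to a boolean, which keeps the two kinds of system comparable on identical pairs.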

Results

[Figure: LLM face recognition]

Counting

We evaluate the capacity of VLMs to count faces in a scene. Counting is performed on the WIDER FACE validation set.

Benchmarked models:

Commercial API VLMs:
- Anthropic Haiku
- OpenAI GPT-4o-mini
- Grok-2 vision
- Gemini-2-flash-lite

Since models sometimes refuse a query
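
How refusals are handled is not stated here. As one possible approach, the hypothetical sketch below treats any reply without a number as a refusal, reports the refusal rate separately, and computes counting metrics only over the answered queries. The function names, the parsing rule, and the choice of metrics are all assumptions made for illustration.

```python
import re


def parse_count(reply):
    """Return the first integer in a model reply, or None if there is none.

    Treating a reply with no digits as a refusal is an assumption of this
    sketch, not a rule taken from the repository.
    """
    match = re.search(r"\d+", reply)
    return int(match.group()) if match else None


def score_counts(replies, true_counts):
    """Score predicted face counts, tracking refusals separately."""
    answered = []
    refused = 0
    for reply, truth in zip(replies, true_counts):
        predicted = parse_count(reply)
        if predicted is None:
            refused += 1
        else:
            answered.append((predicted, truth))
    n = len(answered)
    return {
        "refusal_rate": refused / len(replies) if replies else 0.0,
        # Metrics below are computed over answered queries only.
        "exact_match": sum(p == t for p, t in answered) / n if n else None,
        "mean_abs_error": sum(abs(p - t) for p, t in answered) / n if n else None,
    }


replies = ["There are 3 faces.", "I can't help with that.", "5"]
true_counts = [3, 2, 4]
scores = score_counts(replies, true_counts)
```

Reporting the refusal rate alongside the accuracy metrics avoids rewarding a model that answers only the easy images.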

[Figure: LLM face counting]
