MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

This demo uses an LLM-based (GPT-4) evaluator to grade the open-ended outputs of your model.

Please upload a JSON file of your model's results containing {v1_0: ..., v1_1: ..., }, like this json file.
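As a rough sketch of how such a file might be produced: the snippet below assumes each key (v1_0, v1_1, and so on) maps to the model's answer string for that sample, and that samples are numbered from 0 in order. The helper names `build_results` and `save_results`, the output file name, and the placeholder answers are hypothetical, for illustration only. Check the example json file for the exact layout expected.

```python
import json


def build_results(answers):
    """Map a list of answer strings to keys v1_0, v1_1, ... in order.

    Assumption: the value for each key is the model's raw answer text.
    """
    return {f"v1_{i}": answer for i, answer in enumerate(answers)}


def save_results(results, path):
    """Write the results dict to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Placeholder answers; replace with your model's actual outputs.
    answers = ["placeholder answer for sample 0", "placeholder answer for sample 1"]
    save_results(build_results(answers), "my_model_results.json")
```

Loading the saved file back with `json.load` before uploading is a quick way to confirm it is valid JSON.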

The grading may take about 5 minutes. Since we support only one queue, it may take longer if you have to wait for other users' grading to finish.

The grading results will be downloaded as a zip file.

Select model (gpt-4.1 is free with our API key)