ProJudge

A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges

Jiaxin Ai, Pengfei Zhou, Zhaopan Xu, Ming Li, Fanrui Zhang, Zizhen Li, Jianwen Sun,
Yukang Feng, Baojin Huang, Zhongyuan Wang†, Kaipeng Zhang†

†Corresponding Author: zhangkaipeng@pjlab.org.cn

🌈 We introduce ProJudgeBench, a comprehensive benchmark for assessing MLLMs' capabilities as process judges, and ProJudge-173k, a large-scale instruction-tuning dataset designed to enhance open-source MLLMs' process evaluation abilities.

🔔News

✨[03/11/2025] We have released our paper and project page. The data and code will be openly available soon!

BibTeX


@article{ai2025projudge,
  title={ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges},
  author={Jiaxin Ai and Pengfei Zhou and Zhaopan Xu and Ming Li and Fanrui Zhang and Zizhen Li and Jianwen Sun and Yukang Feng and Baojin Huang and Zhongyuan Wang and Kaipeng Zhang},
  journal={arXiv preprint arXiv:2503.06553},
  year={2025}
}