Tool-Retrieval benchmark leaderboard

Welcome to the ToolRet benchmark leaderboard!

  • Search: Enter keywords for the model name in the search box. Use a semicolon (;) to separate multiple keywords.
  • Model Type: We provide a wide range of open-source models. Choose the model type(s) you're interested in.
  • Model Size: Select the parameter count range to filter models accordingly.

Click the Filter Data button to update the display with the filtered data.
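
The filtering behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the leaderboard's actual code. The `Entry` fields and the function names are hypothetical. A row is assumed to match the search when ANY semicolon-separated keyword appears in its model name. The size range is assumed to be inclusive and measured in billions of parameters, and rows whose size is "Unknown" are assumed to be hidden whenever a size range is active.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical leaderboard row; the field names are illustrative only."""
    model: str
    model_type: str
    params_b: Optional[float]  # parameter count in billions, None when "Unknown"


def parse_keywords(query: str) -> list[str]:
    """Split a semicolon-separated search string into lowercase, non-empty keywords."""
    return [k.strip().lower() for k in query.split(";") if k.strip()]


def filter_entries(entries, query="", types=None, size_range=None):
    """Return the rows that pass every active filter.

    Assumptions: a row matches the search if ANY keyword is a substring of its
    model name; `types` is a set of selected model types; `size_range` is an
    inclusive (low, high) range in billions, and rows of unknown size are
    dropped whenever a size range is active.
    """
    keywords = parse_keywords(query)
    kept = []
    for e in entries:
        if keywords and not any(k in e.model.lower() for k in keywords):
            continue
        if types is not None and e.model_type not in types:
            continue
        if size_range is not None:
            low, high = size_range
            if e.params_b is None or not low <= e.params_b <= high:
                continue
        kept.append(e)
    return kept


if __name__ == "__main__":
    rows = [
        Entry("jinaai/jina-reranker-v2-base-multilingual", "re-ranking model", None),
        Entry("example-org/hypothetical-embedder", "embedding model", 0.1),  # hypothetical row
    ]
    print([e.model for e in filter_entries(rows, query="jina; bge")])
    # ['jinaai/jina-reranker-v2-base-multilingual']
```

Treating keywords as alternatives (OR) lets a single search such as `jina; bge` compare two model families side by side. If the real leaderboard requires every keyword to match, replace `any` with `all`.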

Rank | Model | Average | Comp@10 | Recall@10 | Prec@10 | NDCG@10 | Number of Parameters | Model Type
10 | jinaai/jina-reranker-v2-base-multilingual | 39.09 | 44.87 | 56.91 | 8.84 | 45.73 | Unknown | re-ranking model
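
The page lists four @10 metrics but does not define them. The sketch below computes per-query versions using common information-retrieval definitions. Comp@10 is assumed to mean completeness: 1 only if all relevant tools appear in the top 10, otherwise 0. The published numbers are presumably these scores averaged over all queries and scaled by 100, but the page does not say so. The Average column is consistent with an unweighted mean of the four metrics. For the row shown, (44.87 + 56.91 + 8.84 + 45.73) / 4 = 39.0875, which rounds to 39.09. All function names are hypothetical and do not come from the benchmark's code.

```python
import math


def _hits(ranked, relevant, k):
    """Count the relevant tools among the top-k retrieved tools."""
    return sum(1 for tool in ranked[:k] if tool in relevant)


def recall_at_k(ranked, relevant, k=10):
    """Share of the relevant tools that appear in the top k (assumes relevant is non-empty)."""
    return _hits(ranked, relevant, k) / len(relevant)


def precision_at_k(ranked, relevant, k=10):
    """Share of the k retrieval slots that are filled by relevant tools."""
    return _hits(ranked, relevant, k) / k


def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG: DCG of the ranking divided by the best achievable DCG."""
    dcg = sum(1.0 / math.log2(i + 2) for i, tool in enumerate(ranked[:k]) if tool in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal


def comp_at_k(ranked, relevant, k=10):
    """Completeness (assumed definition): 1.0 only if every relevant tool is in the top k."""
    return 1.0 if set(relevant) <= set(ranked[:k]) else 0.0


def leaderboard_average(comp, recall, prec, ndcg):
    """Unweighted mean of the four @10 scores, which matches the Average column shown."""
    return (comp + recall + prec + ndcg) / 4


if __name__ == "__main__":
    ranked = ["t3", "t1", "t9", "t2"]  # hypothetical retrieval output for one query
    relevant = {"t1", "t2"}            # hypothetical ground-truth tools for that query
    print(recall_at_k(ranked, relevant))          # 1.0
    print(precision_at_k(ranked, relevant))       # 0.2
    print(comp_at_k(ranked, relevant))            # 1.0
    print(round(ndcg_at_k(ranked, relevant), 4))  # 0.6509
    print(leaderboard_average(44.87, 56.91, 8.84, 45.73))
```

Precision@10 always divides by k = 10. When a query needs only one or two tools, even a perfect ranking scores low on this metric. This may explain why Prec@10 (8.84) sits far below Recall@10 (56.91) in the row above.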

Acknowledgement

This work presents the first diverse tool retrieval benchmark for evaluating the tool retrieval performance of a wide range of information retrieval models. We sincerely thank prior work, such as MAIR and ToolBench, which inspired this project or provided strong technical references.

Citation

@article{ToolRetrieval,
  title    = {Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models},
  author   = {Zhengliang Shi and Yuhan Wang and Lingyong Yan and Pengjie Ren and Shuaiqiang Wang and Dawei Yin and Zhaochun Ren},
  year     = 2025,
  journal  = {arXiv},
}

This demo was built with Gradio.