GitHub Trending AI · 13d ago · 7
benchmark, agent, open source, evaluation
VibeSearchBench is a new benchmark for evaluating multi-turn agentic search systems. It comprises 200 tasks that involve vague queries and progressive user disclosure, and it scores results with knowledge-graph-based metrics: precision, recall, and F1 at both the node and triplet levels. The benchmark integrates with OpenAI-compatible LLMs and the OpenClaw CLI, so engineers can apply it directly when building and testing agentic search workflows.
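To make the node-level and triplet-level metrics concrete, here is a minimal sketch of how precision, recall, and F1 could be computed over knowledge-graph sets. It assumes exact-match comparison of `(head, relation, tail)` triplets; the function names `prf1` and `nodes_of` and the sample data are hypothetical, and VibeSearchBench's own matching rules may differ.

```python
from typing import Hashable, Iterable, Set, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail); assumed shape, not the benchmark's schema


def prf1(predicted: Iterable[Hashable], gold: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between two collections of items."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def nodes_of(triplets: Iterable[Triplet]) -> Set[str]:
    """Collect every entity that appears as a head or a tail."""
    return {entity for head, _, tail in triplets for entity in (head, tail)}


# Hypothetical gold and predicted graphs for a single task.
gold = {
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Inception", "released_in", "2010"),
}
pred = {
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Inception", "released_in", "2011"),  # wrong tail entity
}

node_scores = prf1(nodes_of(pred), nodes_of(gold))  # node level
triplet_scores = prf1(pred, gold)                   # triplet level
print("node P/R/F1:", node_scores)
print("triplet P/R/F1:", triplet_scores)
```

In this sample, one wrong tail entity costs more at the triplet level than at the node level, because the whole triplet must match to count as correct.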