GitHub Trending AI · 13d ago · 7 · benchmark agent open source evaluation

VibeSearchBench is a new benchmark for evaluating multi-turn agentic search systems. It comprises 200 tasks built around vague queries and progressive user disclosure, and it scores results with knowledge-graph-based metrics: precision, recall, and F1 at both the node and triplet levels. The benchmark integrates with OpenAI-compatible LLMs and the OpenClaw CLI, so engineers building and testing agentic search workflows can apply it directly.
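
To make the node-level and triplet-level metrics concrete, here is a minimal Python sketch of how precision, recall, and F1 could be computed over a predicted and a gold knowledge graph. It is not VibeSearchBench's own scoring code: the function names (`set_prf`, `graph_scores`), the `(head, relation, tail)` tuple format, and the use of exact-match set comparison are all assumptions made for illustration, and the benchmark may normalize or match entities differently.

```python
from typing import Hashable, Iterable, NamedTuple, Tuple

# Hypothetical triplet format: (head, relation, tail). Not taken from the benchmark.
Triplet = Tuple[str, str, str]


class PRF(NamedTuple):
    precision: float
    recall: float
    f1: float


def set_prf(predicted: Iterable[Hashable], gold: Iterable[Hashable]) -> PRF:
    """Precision, recall, and F1 between two collections, compared as sets."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return PRF(precision, recall, f1)


def graph_scores(pred: Iterable[Triplet], gold: Iterable[Triplet]) -> dict:
    """Score a predicted graph against a gold graph at node and triplet level.

    Assumes exact string matching; a real evaluator may match more loosely.
    """
    pred_list, gold_list = list(pred), list(gold)

    def nodes(triplets):
        # Every head and tail entity counts as a node.
        return {n for head, _, tail in triplets for n in (head, tail)}

    return {
        "node": set_prf(nodes(pred_list), nodes(gold_list)),
        "triplet": set_prf(pred_list, gold_list),
    }


# Toy graphs: the prediction gets one of two edges right and one node wrong.
gold_graph = [("A", "r1", "B"), ("B", "r2", "C")]
pred_graph = [("A", "r1", "B"), ("B", "r2", "D")]
scores = graph_scores(pred_graph, gold_graph)
print(scores["node"])
print(scores["triplet"])
```

In this toy case the node-level scores are higher than the triplet-level ones, because a triplet counts as correct only when its head, relation, and tail all match, while nodes can earn partial credit individually.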