News Nug

VibeSearchBench — 🔍 The hardest search benchmark in the wild — vague, multi-turn, proactive. 200 long-horizon tasks with persona-driven progressive disclosure, scored by verifiable schema-free knowledge-graph evaluation. No vibes, just triplet F1.

GitHub Trending AI · 13d ago · 7 · benchmark agent open source evaluation

VibeSearchBench is a new benchmark for evaluating multi-turn agentic search systems. Its 200 tasks combine vague queries with progressive user disclosure, and outputs are scored with knowledge-graph-based metrics: precision, recall, and F1 at both the node and triplet levels. The benchmark integrates with OpenAI-compatible LLMs and the OpenClaw CLI, which makes it directly applicable to engineers building and testing agentic search workflows.
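To illustrate what node- and triplet-level precision/recall/F1 mean, here is a minimal sketch in Python. It assumes the agent's answer and the gold answer are both lists of (subject, relation, object) triplets. It also assumes matching is exact after a simple lowercase/whitespace normalization. The benchmark's actual "schema-free" matching is not described here and is likely more lenient, so treat this as a simplified stand-in. The function names (`triplet_scores`, `node_scores`) and the normalization rule are hypothetical, not the benchmark's API.

```python
from dataclasses import dataclass

# A knowledge-graph edge: (subject, relation, object).
Triplet = tuple[str, str, str]


def _norm(text: str) -> str:
    # Hypothetical normalization: lowercase and collapse whitespace.
    # The real benchmark's matching rule is an assumption here.
    return " ".join(text.lower().split())


@dataclass
class PRF:
    precision: float
    recall: float
    f1: float


def prf(predicted: set, gold: set) -> PRF:
    """Set-based precision, recall, and F1; empty sets score 0.0."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return PRF(p, r, f1)


def triplet_scores(pred: list[Triplet], gold: list[Triplet]) -> PRF:
    """Triplet level: a predicted edge counts only if all three parts match."""
    p_set = {tuple(_norm(x) for x in t) for t in pred}
    g_set = {tuple(_norm(x) for x in t) for t in gold}
    return prf(p_set, g_set)


def _nodes(triplets: list[Triplet]) -> set[str]:
    # Assumption: nodes are the entities appearing as subjects or objects.
    return {_norm(x) for s, _, o in triplets for x in (s, o)}


def node_scores(pred: list[Triplet], gold: list[Triplet]) -> PRF:
    """Node level: credit for finding the right entities, ignoring edges."""
    return prf(_nodes(pred), _nodes(gold))
```

For example, with gold triplets `("Paris", "capital of", "France")` and `("Eiffel Tower", "located in", "Paris")`, a prediction of `("paris", "capital of", "France")` and `("Louvre", "located in", "Paris")` gets triplet F1 of 0.5 (one of two edges correct) but node F1 of about 0.67 (Paris and France found, Louvre spurious). Node-level scores reward finding the right entities. Triplet-level scores also require the relations between them to be right, which is why the tagline leads with triplet F1.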