News Nug

Latent Space · 14h ago · 7 · api update model benchmark agent

OpenAI released GPT-5.6 with explicit model stratification (Luna/Terra/Sol) and multiple effort levels, creating 36+ configuration variants that confused users and caused faster-than-expected API usage burn. Initial benchmarks show GPT-5.6 excels at agentic coding and presentation tasks while remaining competitive on cost, though OpenAI quickly course-corrected UX regressions and clarified routing defaults after community feedback.

[AINews] OpenAI launches GPT 5.6 Sol/Terra/Luna, Codex becomes ChatGPT superapp

Latent Space · 1d ago · 9 · new model api update agent benchmark

OpenAI released GPT-5.6 in three sizes (Sol, Terra, Luna) with a new 'ultra' effort level that coordinates four agents in parallel for complex tasks. Terra and Luna achieve better performance than previous flagship models at 1/3 the latency, 1/2 the tokens, and 1/4 the cost, with state-of-the-art results on engineering benchmarks. The release includes expanded API pricing tiers and new capabilities in computer use and long-horizon coding tasks.

UST is bringing Claude to physical AI (Case Study, Jul 9, 2026)

Anthropic Blog · 1d ago · 6 · api update deployment agent

Anthropic is partnering with UST to integrate Claude into hardware validation and chip manufacturing workflows, using Claude Code to automatically generate and run regression tests on hardware designs and validate silicon against digital twins. The partnership targets 20,000 engineers across semiconductor and manufacturing companies, aiming to reduce validation cycle times from 4 days to 48 hours through automated test generation and fault detection.

The new GPT-5.6 family: Luna, Terra, Sol

Simon Willison · 1d ago · 9 · new model api update agent benchmark

OpenAI released the GPT-5.6 family (Luna, Terra, Sol), with significant improvements on agentic benchmarks and new API features for controlling reasoning tokens. The models are more cost-efficient than Claude Fable 5 for agent workflows, though their coding performance is competitive rather than definitively superior.

Introducing Muse Spark 1.1

Simon Willison · 2d ago · 9 · new model api update tool agent

Meta released Muse Spark 1.1 with a new API and claimed improvements in agentic tool calling and computer use capabilities. The post includes a new LLM CLI plugin (llm-meta-ai) for programmatic access to the model, making it immediately useful for engineers building with AI.

ChatGPT is now a partner for your most ambitious work

OpenAI Blog · 2d ago · 7 · agent workflow api update

ChatGPT Work introduces agentic capabilities that automate multi-step tasks across integrated applications and files, with context that persists over extended work. This is a meaningful evolution in practical workflow automation, though the announcement lacks the technical implementation details and API access patterns needed for actionable integration.

[AINews] SpaceXAI launches Grok 4.5, first Opus-class model post Cursor acquisition

Latent Space · 2d ago · 7 · new model tool api update agent inference

Grok 4.5, a new frontier model from xAI trained specifically for coding and agents, launched in partnership with Cursor. It offers Opus-class performance with better speed, cost efficiency, and token efficiency. The model is positioned for practical engineering workflows rather than benchmark supremacy and is immediately available across Cursor, the Grok API, OpenRouter, and agent frameworks like Hermes.

Introducing GPT‑Live

Simon Willison · 2d ago · 8 · new model api update workflow

OpenAI upgraded ChatGPT's voice mode with GPT-Live, a new model that intelligently delegates complex tasks (web search, reasoning) to GPT-5.5 while maintaining conversational flow. The upgrade significantly improves voice mode's usefulness as a brainstorming tool, moving beyond the outdated GPT-4o model previously in use.

GPT-5.6 Thursday ⭐️, Claude Cowork mobile 📱, Gemini API agents 🤖

TLDR AI · 2d ago · 8 · agent api update tool workflow

MiniMax Code offers a practical AI platform for building multi-step agents, with a 1M-token context window, native vision capabilities, and competitive pricing ($500/year for 5.1B tokens). It lets developers build reasoning agents, visual document processing, and codebase-analysis workflows without relying on external vision models.
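To put the quoted price on the usual per-million-token scale, here is a small calculation. It assumes the full 5.1B-token annual quota is used and that all tokens are priced uniformly. The summary does not say whether input and output tokens are counted differently. Under those assumptions the effective rate is roughly $0.098 per million tokens.

```python
def usd_per_million_tokens(annual_price_usd: float, annual_tokens: float) -> float:
    """Effective price per 1M tokens, assuming the whole annual quota is consumed
    and every token (input or output) costs the same."""
    return annual_price_usd / (annual_tokens / 1_000_000)


rate = usd_per_million_tokens(500, 5.1e9)  # $500/year for 5.1B tokens
print(f"~${rate:.3f} per 1M tokens")
```

If only part of the quota is used, the effective rate rises proportionally.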

Introducing GPT-Live

OpenAI Blog · 3d ago · 8 · new model tool api update

OpenAI released advanced voice models that enable natural speech-based interaction with ChatGPT, supporting real-time conversation with improved naturalness and responsiveness. This represents a significant tool update for AI engineers building voice-enabled applications and multimodal interfaces.

Gepard : 0.6B streaming TTS built for real-time dialogue - 20× realtime factor, ~50ms time-to-first-audio, vLLM-native, Apache 2.0

r/LocalLLaMA · 4d ago · 8 · new model tool inference api update

Gepard-1.0 is a streaming text-to-speech model optimized for real-time dialogue and voice agents, built on Qwen3-0.8B with NVIDIA NanoCodec for low-latency audio generation. The model generates speech incrementally as text arrives, delivering natural prosody and supporting zero-shot voice cloning, making it practical for conversational AI applications where latency matters more than perfect speaker matching.
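"Generates speech incrementally as text arrives" is the key property for voice agents: audio can start before the upstream LLM has finished its reply. A common client-side pattern is sketched below. Streamed text is buffered, and each clause is handed to the synthesizer as soon as a punctuation boundary appears. This is a generic sketch, not Gepard's API. `synthesize` is a hypothetical callable standing in for whatever TTS call you use. The boundary set is an assumption, and it will naively split numbers such as "3.5".

```python
from typing import Callable, Iterable, Iterator


def stream_speech(text_chunks: Iterable[str],
                  synthesize: Callable[[str], bytes],
                  boundaries: str = ".,!?;:") -> Iterator[bytes]:
    """Yield audio per clause as soon as each clause of streamed text is complete.

    `synthesize` is a hypothetical stand-in for a streaming TTS call.
    """
    buffer: list[str] = []
    for chunk in text_chunks:
        for ch in chunk:
            buffer.append(ch)
            if ch in boundaries:
                # A clause just ended: synthesize it now instead of waiting
                # for the rest of the LLM's reply.
                clause = "".join(buffer).strip()
                buffer.clear()
                if clause:
                    yield synthesize(clause)
    # Flush whatever text remains once the stream ends.
    tail = "".join(buffer).strip()
    if tail:
        yield synthesize(tail)


if __name__ == "__main__":
    fake_tts = lambda text: text.encode()  # placeholder for a real TTS model
    for audio in stream_speech(["Hello, wo", "rld. How are", " you"], fake_tts):
        print(audio)
```

Flushing at clause boundaries trades a little prosody context for lower time-to-first-audio. The same tension explains why a model advertising ~50ms time-to-first-audio matters for dialogue.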

sqlite-utils 4.0rc4

Simon Willison · 4d ago · 6 · api update

A release candidate for the upcoming sqlite-utils 4.0 stable release, incorporating feedback from Claude Fable 5. It may be relevant for tracking dependency updates, but the post gives no technical specifics about which features or improvements were implemented.

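Vibe-Research: Your Personal Trading Research Agent · A personal investment-research agent for A-shares, US stocks, and Hong Kong stocks: daily market review, news radar, individual stock data, sector hub, my holdings, and research notes. Vibe-Research supplies the data and features; your own AI drives the investment research.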

GitHub Trending AI · 6d ago · 6 · open source agent api update tool

Vibe-Research is an open-source, AI-powered investment research dashboard for A-share, US, and Hong Kong stocks. It integrates market data, financial reports, and news feeds with pluggable AI models (Claude, DeepSeek, Qwen, etc.) via API or an MCP server. Engineers building AI applications can use it as a reference architecture for data aggregation, multi-source integration, and AI agent interfaces, though the trading domain may have limited direct applicability.

The DeepSWE benchmark was run rather incompetently and the results are completely invalid

r/LocalLLaMA · 37d ago · 7 · benchmark inference api update

A detailed technical analysis exposes critical measurement errors in the DeepSWE code-generation benchmark. Cache pricing is inflated ~5x because cache hits are billed at cache-miss rates, and deepseek-v4-pro lacked the effort-level tuning given to competing models. The authors solve all three failing tasks for ~$0.86 in total versus the reported $4.22. This gap between reported and real-world cost matters for engineers evaluating AI models on benchmarks.
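To see how billing cache hits at the miss rate can produce a ~5x inflation, consider the sketch below. It compares correct billing with the mistake described above. All numbers are hypothetical and are not taken from the DeepSWE runs: 8M cached plus 1M uncached input tokens, with cache hits priced at one tenth of misses. With a high cache-hit rate, which is typical of long agentic sessions that resend the same context, the misbilled cost comes out at 5x the correct cost.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    cache_hit_tokens: int
    cache_miss_tokens: int


def cost_usd(usage: Usage, price_hit_per_m: float, price_miss_per_m: float) -> float:
    """Correct billing: cached input tokens are charged at the cheaper hit rate."""
    return (usage.cache_hit_tokens * price_hit_per_m
            + usage.cache_miss_tokens * price_miss_per_m) / 1_000_000


def misbilled_cost_usd(usage: Usage, price_miss_per_m: float) -> float:
    """The error described above: every input token is charged at the miss rate."""
    total = usage.cache_hit_tokens + usage.cache_miss_tokens
    return total * price_miss_per_m / 1_000_000


# Hypothetical workload and prices, chosen only to illustrate the effect.
usage = Usage(cache_hit_tokens=8_000_000, cache_miss_tokens=1_000_000)
correct = cost_usd(usage, price_hit_per_m=0.10, price_miss_per_m=1.00)
wrong = misbilled_cost_usd(usage, price_miss_per_m=1.00)
print(f"correct=${correct:.2f} misbilled=${wrong:.2f} inflation={wrong / correct:.1f}x")
```

The inflation factor grows with the cache-hit rate and with the discount on cached tokens. This is why the error hits long-context agent benchmarks hardest.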

Dreaming: Better memory for a more helpful ChatGPT

OpenAI Blog · 37d ago · 6 · api update workflow

ChatGPT's memory feature allows the model to retain user preferences and context across separate conversations, reducing the need to re-establish context. This is a workflow improvement for developers building ChatGPT-based applications, though the technical implementation details and API implications for custom integrations remain unclear.

Best Visual Reasoning Model in 2026 (Including APIs) [D]

r/MachineLearning · 37d ago · 6 · benchmark api update

A discussion of which AI models handle long-form video understanding and complex reasoning tasks effectively. It covers practical considerations for video input handling and compares reasoning capabilities across model providers.

Designing the hf CLI as an agent-optimized way to work with the Hub

HuggingFace Blog · 37d ago · 8 · tool workflow api update agent

Hugging Face rebuilt its CLI to optimize for both human users and coding agents (Claude Code, Codex, Cursor), with auto-detection via environment variables that switches output formatting between human-readable (colored tables, progress bars) and agent-optimized (compact TSV, no ANSI codes). Benchmarks show the optimized CLI uses 6× fewer tokens than agents manually using curl or Python SDK for multi-step tasks.
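The output switch described above can be sketched as follows. The environment is checked for marker variables, and the tool emits either compact TSV or an aligned table with ANSI styling. The variable names in `AGENT_ENV_VARS` are placeholders, not the ones the real `hf` CLI inspects. The table data is made up, and the rendering is a simplified illustration rather than Hugging Face's implementation.

```python
import os
from typing import Mapping, Sequence

# Placeholder marker variables: NOT the names the real hf CLI checks.
AGENT_ENV_VARS = ("EXAMPLE_AGENT", "EXAMPLE_CODING_AGENT")


def is_agent(env: Mapping[str, str]) -> bool:
    """Assume a coding agent is calling if any marker variable is non-empty."""
    return any(env.get(name) for name in AGENT_ENV_VARS)


def render(header: Sequence[str], rows: Sequence[Sequence[str]],
           env: Mapping[str, str]) -> str:
    if is_agent(env):
        # Agent mode: compact TSV with no ANSI escape codes (fewer tokens to read).
        return "\n".join("\t".join(r) for r in [header, *rows])

    # Human mode: columns padded to a common width, header in ANSI bold.
    widths = [max(len(cell) for cell in col) for col in zip(header, *rows)]

    def line(row: Sequence[str]) -> str:
        return "  ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip()

    return "\n".join(["\033[1m" + line(header) + "\033[0m"] + [line(r) for r in rows])


if __name__ == "__main__":
    header = ("repo_id", "downloads")
    rows = [("org/model-a", "1200"), ("org/model-b", "87")]
    print(render(header, rows, os.environ))
```

Keeping one data path and switching only the renderer means both audiences see the same information. Only the formatting overhead, which an agent pays for in tokens, differs.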

Introducing new capabilities to GPT-Rosalind

OpenAI Blog · 38d ago · 7 · new model api update

GPT-Rosalind is a specialized model variant with enhanced capabilities for biological reasoning, medicinal chemistry, genomics, and experimental workflows. This represents a domain-specific model extension relevant for engineers building life sciences AI applications and needing specialized reasoning in these technical areas.

Uber Caps Usage of AI Tools Like Claude Code to Manage Costs

Simon Willison · 38d ago · 6 · agent deployment api update

Uber has implemented per-tool monthly token spending caps ($1,500/employee) for agentic coding tools like Claude Code and Cursor to manage AI costs. The analysis offers practical insight into enterprise AI tool economics (the caps represent ~11% of median engineer compensation) and reflects real industry patterns of token cost management as AI coding agents become standard infrastructure.

[AINews] Microsoft Build: MAI-Thinking-1 and MAI Family models

Latent Space · 38d ago · 9 · new model research api update agent inference

Microsoft announced 7 new MAI models including the flagship MAI-Thinking-1 reasoning model with a comprehensive 109-page technical report emphasizing clean data lineage and zero third-party distillation. The release covers reasoning, code, image, speech, and voice models, positioning Microsoft as both a platform and frontier lab, with additional launches around local AI, Windows agent infrastructure, and Web IQ APIs for grounding.