Simon Willison · 7h ago · 7 · api update agent deployment

GitHub Copilot is restructuring its pricing and usage limits because agentic workflows consume significantly more compute than originally anticipated. It is shifting from per-request to token-based pricing and adding restrictions on individual plans. This reflects the real infrastructure cost of running AI agents in production and affects developers using Copilot's expanding agentic capabilities across IDE integrations and CLI tools.
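Why agents break per-request pricing is easiest to see with arithmetic. The sketch below compares flat per-request billing with per-token billing for a short chat turn and for one agentic session billed as a single request. All prices and token counts are hypothetical placeholders, not GitHub's actual rates.

```python
from dataclasses import dataclass


@dataclass
class Request:
    input_tokens: int
    output_tokens: int


def per_request_cost(requests, price_per_request):
    """Flat billing: every request costs the same regardless of its size."""
    return len(requests) * price_per_request


def token_cost(requests, price_per_m_input, price_per_m_output):
    """Token billing: cost scales with tokens (prices are per million tokens)."""
    total_in = sum(r.input_tokens for r in requests)
    total_out = sum(r.output_tokens for r in requests)
    return (total_in * price_per_m_input + total_out * price_per_m_output) / 1_000_000


# Hypothetical workloads: a short chat turn vs. an agent loop that re-reads a large context.
chat = [Request(input_tokens=2_000, output_tokens=500)]
agent = [Request(input_tokens=150_000, output_tokens=8_000)]

for name, reqs in [("chat", chat), ("agent", agent)]:
    flat = per_request_cost(reqs, price_per_request=0.04)   # hypothetical flat price
    tok = token_cost(reqs, price_per_m_input=3, price_per_m_output=15)  # hypothetical rates
    print(f"{name:>5}: flat ${flat:.4f} vs token-based ${tok:.4f}")
```

Under these made-up numbers the chat turn costs less than the flat price while the agent session costs roughly fourteen times more, which is the gap a provider absorbs under per-request billing.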

r/MachineLearning · 7h ago · 7 · benchmark inference deployment

Discussion on evaluating quantization impact for DeepSeek V3.2, covering practical benchmark selection for measuring quality degradation from runtime quantization. Relevant for engineers deploying quantized models in production and optimizing inference performance vs. accuracy tradeoffs.
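The summary does not say which metrics the thread settled on. Two checks often used alongside task benchmarks are the KL divergence between the full-precision and quantized next-token distributions, and how often both models pick the same top-1 token. A minimal stdlib sketch, assuming you have already collected logits from both models on identical prompts (the values below are toy numbers):

```python
import math


def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats, with P the full-precision (reference) distribution."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)


def compare(base_logits, quant_logits):
    """Return (mean KL, top-1 agreement rate) over aligned token positions."""
    kls, agree = [], 0
    for b, q in zip(base_logits, quant_logits):
        kls.append(kl_divergence(softmax(b), softmax(q)))
        agree += int(argmax(b) == argmax(q))
    return sum(kls) / len(kls), agree / len(kls)


# Toy logits for two positions over a 3-token vocabulary.
base = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.0]]
quant = [[1.9, 1.1, 0.1], [0.6, 2.4, 0.1]]
mean_kl, top1 = compare(base, quant)
print(f"mean KL = {mean_kl:.5f} nats, top-1 agreement = {top1:.0%}")
```

KL catches distribution drift that accuracy-style benchmarks can hide, while top-1 agreement is a quick sanity check on greedy decoding.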

Simon Willison · 8h ago · 6 · api update workflow deployment

Anthropic briefly tested moving Claude Code from the $20/month Pro plan to exclusive availability on $100+/month Max plans, sparking community backlash. The change was quickly reverted, but the incident reveals product strategy shifts around AI coding agent features and competitive positioning against OpenAI's Codex offerings.

Latent Space · 10h ago · 9 · new model api update benchmark agent

OpenAI released GPT-Image-2, a major image generation model now available via API and ChatGPT with significant improvements in text rendering, layout consistency, and multilingual support. The model achieves #1 on Arena leaderboards with a +242 Elo lead on text-to-image tasks and introduces thinking variants that enable web search and self-checking capabilities, positioning image generation as a front-end interface for coding agents.

r/LocalLLaMA · 13h ago · 8 · new model tool agent open source

Granite-4.1-8B is a new 8B-parameter instruction-tuned model with enhanced tool calling, multilingual support (12 languages), and improved post-training via SFT and RL alignment. It is designed for AI assistants and LLM agents with function-calling abilities, making it relevant for engineers building agentic systems and tool-integrated applications.
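On the application side, tool calling usually means parsing the model's structured call and dispatching it to a real function. The sketch below assumes the call has already been extracted as JSON of the shape `{"name": ..., "arguments": {...}}`, the general shape used by OpenAI-style chat APIs; Granite's own chat template may format calls differently, and `get_weather` is a hypothetical tool.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Hypothetical tool; a real one would call a weather service."""
    return {"city": city, "temp": 21, "unit": unit}


TOOLS = {"get_weather": get_weather}


def dispatch(tool_call_json: str) -> str:
    """Execute one model-emitted tool call and return a JSON result for the next turn."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return json.dumps({"error": f"unknown tool {call['name']!r}"})
    try:
        result = fn(**call.get("arguments", {}))
    except TypeError as exc:  # the model passed wrong or missing arguments
        return json.dumps({"error": str(exc)})
    return json.dumps(result)


print(dispatch('{"name": "get_weather", "arguments": {"city": "Seoul"}}'))
```

Returning errors as JSON instead of raising lets the agent loop feed them back to the model so it can retry with corrected arguments.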

Simon Willison · 14h ago · 7 · new model api update tool

OpenAI released ChatGPT Images 2.0, their latest image generation model with significant improvements over the previous version. The article includes practical testing methodology, code examples using the OpenAI Python client library, and demonstrates the model's capability through a Where's Waldo-style image generation task with quality and resolution comparisons.

r/MachineLearning · 14h ago · 8 · open source fine tuning inference deployment benchmark

Chaperone-Thinking-LQ-1.0 is an open-source quantized reasoning model (4-bit GPTQ + QAT + QLoRA fine-tuning on medical/scientific data) that achieves 84% on MedQA while fitting on a single L40 GPU with 1.6x speedup over base DeepSeek-R1-32B. Directly addresses on-premises deployment constraints for enterprise healthcare with strict data sovereignty requirements.
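A back-of-the-envelope calculation shows why 4-bit weights matter for single-GPU deployment. The sketch assumes a nominal 32B parameters and the L40's 48 GB of memory, and counts weights only; quantization scales, the KV cache, and activations add more on top.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in GB (10^9 bytes), ignoring quantization scales/zero-points."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 32e9      # nominal size of the DeepSeek-R1-32B base named in the summary
L40_VRAM_GB = 48   # NVIDIA L40 memory capacity

for label, bits in [("BF16", 16), ("INT8", 8), ("4-bit", 4)]:
    gb = weight_memory_gb(PARAMS, bits)
    print(f"{label:>5}: {gb:5.1f} GB of weights, fits in 48 GB: {gb < L40_VRAM_GB}")
```

At BF16 the weights alone (64 GB) exceed the card; at 4 bits they take about 16 GB, leaving roughly 32 GB for the KV cache and activations that long reasoning traces need.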

r/MachineLearning · 17h ago · 7 · open source tutorial research

An engineer implemented a discrete diffusion language model from scratch on a MacBook M2 without AI code-generation assistance, training a 7.5M-parameter model on the Shakespeare dataset. The project demonstrates hands-on learning of diffusion mechanisms, tokenization, and encoder-decoder architectures, with the open-source implementation shared on GitHub.
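The summary does not specify which discrete diffusion variant the project uses. One common formulation is absorbing-state (masked) diffusion: the forward process replaces each token with a mask symbol with probability t, and the model is trained to recover the original tokens at the masked positions, then generates by iteratively unmasking. A minimal sketch of that forward step, assuming character-level tokens and a hypothetical `_` mask symbol:

```python
import random

MASK = "_"  # hypothetical mask symbol; a real model reserves a dedicated token id


def corrupt(tokens, t, rng):
    """Forward process of masked discrete diffusion: each token is
    independently replaced by MASK with probability t in [0, 1]."""
    noisy, target_positions = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < t:
            noisy.append(MASK)
            target_positions.append(i)  # the model is trained to predict tok here
        else:
            noisy.append(tok)
    return noisy, target_positions


rng = random.Random(0)
text = list("to be or not to be")
for t in (0.0, 0.5, 1.0):
    noisy, targets = corrupt(text, t, rng)
    print(f"t={t}: {''.join(noisy)!r} (model predicts {len(targets)} positions)")
```

During training, t is typically sampled per example and the cross-entropy loss is computed only at the masked positions.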

HuggingFace Blog · 1d ago · 7 · benchmark research open source

QIMMA is a new Arabic LLM evaluation platform that validates benchmark quality before model evaluation, addressing systematic issues in existing Arabic benchmarks like translation artifacts and annotation inconsistencies. The project consolidates 52,000+ samples across 14 benchmarks with a rigorous multi-stage validation pipeline and releases code/outputs publicly, making it a valuable resource for anyone building or evaluating Arabic language models.

r/LocalLLaMA · 1d ago · 7 · tool deployment open source inference

Open WebUI Desktop is a native application that allows engineers to run LLMs locally via llama.cpp or connect to remote Open WebUI servers without Docker or terminal setup. It provides offline-capable inference with privacy guarantees and supports switching between local and remote model connections.

HuggingFace Blog · 1d ago · 7 · tool dataset tutorial agent deployment

NVIDIA released Nemotron-Personas-Korea, a synthetic dataset of 6M demographically-accurate Korean personas (zero PII) for grounding multilingual agents with cultural and contextual accuracy. The tutorial demonstrates deploying a Korean-aware agent using the dataset with NeMo Data Designer, NIM inference, or NVIDIA APIs—useful for engineers building localized AI systems.

Latent Space · 1d ago · 8 · new model agent open source benchmark inference deployment

Moonshot's Kimi K2.6 (1T MoE, 32B active) released with strong open-source coding benchmarks (58.6% SWE-Bench Pro) and novel long-horizon execution capabilities (4,000+ tool calls, 300 parallel sub-agents, 'Claw Groups' for multi-agent coordination). Alibaba's Qwen3.6-Max-Preview also landed, with improvements to agentic coding and reasoning stability; both models gained immediate deployment support across vLLM, OpenRouter, and other inference platforms.
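"1T MoE, 32B active" means a router selects a few experts per token, so only about 3.2% of the weights run on each forward pass. The sketch below shows generic top-k gating with toy scalar experts; the expert count, k, and router scores are hypothetical and do not reflect Kimi K2.6's actual router configuration.

```python
import math


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights with a softmax."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


def moe_forward(x, experts, router_logits, k):
    """Only the selected experts run; all others are skipped for this token."""
    weights = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items()), sorted(weights)


# Toy setup: 8 experts, each a scalar function standing in for an FFN block.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
out, used = moe_forward(2.0, experts, router_logits=[0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0], k=2)
print("experts used:", used, "output:", round(out, 3))
print(f"active fraction for 32B of 1T: {32e9 / 1e12:.1%}")
```

Compute per token tracks the active parameters, but all 1T parameters still have to be stored, which is why serving such models remains memory-bound.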

HuggingFace Blog · 1d ago · 7 · agent research prompt engineering

Mythos demonstrates that AI-driven security work requires not just frontier models but a system-level architecture combining code analysis, vulnerability detection, and patch generation. The article explores how agentic AI systems can autonomously find and patch software vulnerabilities, and argues that open-source ecosystems may prove more resilient than closed-source ones as AI cybersecurity capabilities proliferate.

OpenAI Blog · 1d ago · 5 · api update deployment

OpenAI announced the Codex Labs initiative, with enterprise partnerships to facilitate Codex deployment at scale, as Codex reaches 4M weekly active users. While the user-growth figure is noteworthy, this is primarily a business/partnership announcement rather than a technical release or capability update.

Simon Willison · 1d ago · 6 · api update tool inference

Simon Willison demonstrates accessing Kimi K2.6 through the OpenRouter API and showcases the model's ability to generate interactive HTML/JavaScript visualizations. The post includes a transcript and highlights how a new model variant can be integrated through existing API infrastructure.

r/MachineLearning · 1d ago · 8 · open source inference research library

Two open-source implementations of KV-cache compaction techniques for long-context inference: Cartridges (corpus-specific compressed caches) and STILL (neural KV-cache compaction with reusable compression). Both repos include benchmark comparisons against baselines and readable code, making recent research directly applicable to production inference optimization.
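The motivation for compaction follows from the KV cache's size, which grows linearly with context length. The sketch computes it for a hypothetical 8B-class configuration (32 layers, 8 KV heads, head dimension 128, FP16 cache); it does not implement Cartridges or STILL, and the compaction ratio is a placeholder, not a result reported by either repo.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2, batch=1):
    """KV cache size: one K and one V vector per layer, per KV head, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch


GiB = 1024 ** 3
# Hypothetical 8B-class config with grouped-query attention and an FP16 cache.
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128)

per_token = kv_cache_bytes(**cfg, seq_len=1)
full = kv_cache_bytes(**cfg, seq_len=131_072)
compaction_ratio = 20  # hypothetical placeholder, not a figure from either repo

print(per_token // 1024, "KiB per token")
print(full / GiB, "GiB at 128k context")
print(full / compaction_ratio / GiB, "GiB after compaction")
```

Compaction methods effectively shrink the `seq_len` term by storing fewer, learned cache entries, so the savings apply per request and compound with batch size.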

Latent Space · 1d ago · 6 · new model tool research deployment

Noetik's TARIO-2 model uses AI to predict high-resolution spatial transcriptomics from standard H&E histology slides, enabling better patient-treatment matching in oncology—GSK signed a $50M deal for this platform approach. The technical innovation involves training an autoregressive transformer on large tumor spatial transcriptomics datasets to predict ~19,000-gene maps, potentially improving clinical trial success rates by better matching patients to existing treatments rather than discovering new drugs.

r/LocalLLaMA · 1d ago · 7 · new model open source agent api update deployment

Kimi K2.6 is a new open-source multimodal agentic model with native int4 quantization, supporting long-horizon coding, video/image understanding, and autonomous task execution. The model is available via OpenAI/Anthropic-compatible APIs on the Moonshot platform, with deployment guides for vLLM/SGLang and new features like preserve_thinking mode for enhanced agent reasoning.