Latent Space · 1d ago · 7 · workflow deployment agent research

Shopify's CTO discusses internal AI infrastructure, including Tangle (reproducible ML workflows), Tangent (auto-research optimization), and SimGym (customer behavior simulation). The conversation offers practical insights on code review bottlenecks and deployment stability, and argues that the real constraint on AI coding is now validation and deployment rather than generation.

r/MachineLearning · 1d ago · 7 · open source tool deployment

Open-source GPU pricing catalog that automatically aggregates real-time data from 20+ cloud providers, covering 50 GPU models and 2K+ offerings with spot and on-demand pricing. Useful infrastructure tool for engineers optimizing cloud costs and managing GPU resource allocation across multiple providers.
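
A minimal sketch of the kind of query such a catalog supports: picking the cheapest offering per GPU model, with or without spot instances. The `Offering` record, provider names, and prices are hypothetical placeholders, not the project's schema or real quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    provider: str      # hypothetical provider name
    gpu: str           # GPU model label
    hourly_usd: float  # made-up price, for illustration only
    spot: bool         # True for spot/preemptible capacity


def cheapest_by_gpu(offerings, allow_spot=True):
    """Return {gpu: cheapest Offering}, optionally ignoring spot capacity."""
    best = {}
    for offer in offerings:
        if offer.spot and not allow_spot:
            continue
        current = best.get(offer.gpu)
        if current is None or offer.hourly_usd < current.hourly_usd:
            best[offer.gpu] = offer
    return best


# Illustrative data only; none of these numbers come from the catalog.
SAMPLE = [
    Offering("provider-a", "H100", 2.50, False),
    Offering("provider-b", "H100", 1.90, True),
    Offering("provider-c", "H100", 2.20, False),
    Offering("provider-a", "A100", 1.10, False),
]
```

Calling `cheapest_by_gpu(SAMPLE, allow_spot=False)` restricts the comparison to on-demand capacity, which matters for jobs that cannot tolerate preemption.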

Simon Willison · 1d ago · 9 · new model open source inference benchmark

Qwen3.6-27B is a new 27B dense model that claims flagship-level coding performance at roughly one-fifteenth the size of its predecessor (55.6GB vs 807GB). The post demonstrates local inference using a GGUF quantization with llama.cpp, achieving strong code generation at reasonable token throughput.

HuggingFace Blog · 1d ago · 7 · tutorial deployment open source tool

Tutorial for building a multimodal Voice Language Agent (VLA) with Gemma 4 on Jetson Orin Nano, enabling autonomous vision and audio interaction without hardcoded triggers. Covers practical setup with llama.cpp native compilation, STT/TTS integration via Hugging Face, and memory optimization techniques for edge deployment.

r/LocalLLaMA · 1d ago · 8 · new model inference open source deployment

Qwen3.6-27B open-weight model release with 262K context length, optimized for coding and real-world applications. Includes deployment guides for SGLang, vLLM, and other inference frameworks with support for tool use and multi-token prediction.

r/MachineLearning · 1d ago · 7 · benchmark inference tool

Discussion of a practical TTS benchmark that evaluates streaming text-to-speech models on real-world failure cases like dates, URLs, and phone numbers using 1000+ test sentences and Gemini evaluation. Identifies a genuine production challenge in TTS systems where models succeed on naturalness but fail on structured data normalization.
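
A rough sketch of how structured-data cases like these could be scored per category. It assumes each synthesized clip has already been transcribed back to text, and it swaps the benchmark's Gemini-based judging for a plain substring check. The cases, field names, and expected phrases are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical cases: phrases a correct spoken rendering should contain.
CASES = [
    {"category": "date", "text": "The meeting is on Jan 5, 2024.",
     "expect": ["january", "fifth"]},
    {"category": "url", "text": "Visit example.com/docs today.",
     "expect": ["dot com", "slash docs"]},
    {"category": "phone", "text": "Call 555-0199 now.",
     "expect": ["five five five"]},
]


def normalize(text):
    """Lowercase and collapse punctuation so comparisons ignore formatting."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def passes(case, transcript):
    """True if every expected phrase appears in the spoken transcript."""
    spoken = normalize(transcript)
    return all(phrase in spoken for phrase in case["expect"])


def pass_rate_by_category(cases, transcripts):
    """Aggregate pass rates so failures on dates, URLs, etc. stay visible."""
    totals, hits = defaultdict(int), defaultdict(int)
    for case, transcript in zip(cases, transcripts):
        totals[case["category"]] += 1
        hits[case["category"]] += passes(case, transcript)
    return {cat: hits[cat] / totals[cat] for cat in totals}
```

Reporting by category rather than as one aggregate score is the point: a model can sound natural overall while consistently misreading one class of input.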

DeepMind Blog · 1d ago · 8 · research deployment inference

DiLoCo introduces a distributed training architecture that decouples compute into asynchronous "islands" across distant data centers, dramatically reducing bandwidth requirements while improving hardware resilience. The system maintains training efficiency during chip failures and reintegrates failed nodes seamlessly, demonstrated on Gemma 4 models with comparable performance to traditional tightly-coupled training.
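
The local-update-then-merge idea can be illustrated with a toy scalar example. This is a sketch under heavy simplification, not DeepMind's implementation: it runs in synchronous rounds, uses plain SGD for the inner steps and a plain averaged step for the outer update, and models a chip failure by skipping one island for a few rounds. All function names and hyperparameters are illustrative.

```python
def local_steps(w, target, steps=5, lr=0.1):
    """Inner loop: an island runs several SGD steps on its own data.

    The toy loss is 0.5 * (w - target) ** 2, so the gradient is (w - target).
    """
    for _ in range(steps):
        w -= lr * (w - target)
    return w


def outer_round(w_global, targets, alive, outer_lr=1.0):
    """Outer loop: average the parameter deltas of the islands that reported."""
    deltas = [
        w_global - local_steps(w_global, target)
        for target, ok in zip(targets, alive)
        if ok
    ]
    if not deltas:  # every island is down; keep the current parameters
        return w_global
    return w_global - outer_lr * sum(deltas) / len(deltas)


def train(targets, rounds=50, failed_island=2, failed_rounds=range(5, 10)):
    """Run the rounds, dropping one island temporarily to mimic a failure."""
    w = 0.0
    for r in range(rounds):
        alive = [
            not (i == failed_island and r in failed_rounds)
            for i in range(len(targets))
        ]
        w = outer_round(w, targets, alive)
    return w
```

Only one delta per island crosses the network each round instead of one gradient per step, which is where the bandwidth saving comes from. In this toy, the run drifts while an island is missing and returns to the shared optimum once it rejoins.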

OpenAI Blog · 1d ago · 7 · agent workflow tutorial

Guide on building workspace agents in ChatGPT to automate workflows and integrate tools for team operations. Covers practical implementation of agent patterns for connecting external tools and scaling automation across teams.

OpenAI Blog · 1d ago · 8 · agent inference workflow

Technical breakdown of optimization patterns in the Codex agent loop using WebSockets for persistent connections and connection-scoped caching to reduce API overhead and improve model latency. Practical architectural insights for engineers building with AI agents and managing inference performance at scale.
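
A minimal sketch of what connection-scoped caching means in general: state is loaded once per persistent connection and reused by later messages on that connection, then discarded when it closes. The `Connection` class and loader are hypothetical stand-ins, no WebSocket library is used, and nothing here reflects the actual Codex implementation.

```python
class Connection:
    """Stand-in for one persistent connection (for example, a WebSocket)."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical expensive fetch, e.g. tool schemas
        self._cache = {}       # lives only as long as this connection
        self.loads = 0         # counts how often the loader actually ran

    def get(self, key):
        """Return the cached value, calling the loader on first use only."""
        if key not in self._cache:
            self._cache[key] = self._loader(key)
            self.loads += 1
        return self._cache[key]

    def close(self):
        """Drop cached state so nothing leaks into a later connection."""
        self._cache.clear()


def expensive_lookup(key):
    # Placeholder for work that a per-request API would repeat every call.
    return f"resolved:{key}"
```

With one request per HTTP call, there is no natural place to keep this state; tying the cache lifetime to the connection avoids both repeated setup and a separate invalidation scheme.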

OpenAI Blog · 1d ago · 6 · agent workflow api update

ChatGPT Workspace agents are cloud-based automation tools powered by Codex that handle multi-step workflows across integrated applications. This is relevant for engineers building AI workflows, though its practical value for daily development depends on the details of actual capabilities, API integration patterns, and security architecture.

r/MachineLearning · 1d ago · 8 · tool inference open source deployment

A researcher shipped Spiral, a model compression tool using INT3 quantization (+0.14 nats) and custom 2-bit KV cache optimization with fused Metal kernels for M-series Macs. It includes a Qwen 7B preview model, and Triton GPU kernels are in development. Directly applicable for engineers optimizing inference on consumer hardware.
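
For readers unfamiliar with 3-bit quantization, here is a generic round-to-nearest sketch. The post does not describe Spiral's actual scheme, so this assumes the simplest symmetric variant with one scale per group of weights. Function names are illustrative.

```python
def quantize_int3(weights):
    """Map floats to signed 3-bit codes in [-4, 3] with one shared scale."""
    qmax = 3  # largest positive code for signed 3-bit integers
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid a zero scale
    codes = [max(-4, min(3, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


def max_abs_error(weights):
    """Worst-case reconstruction error for one group of weights."""
    codes, scale = quantize_int3(weights)
    restored = dequantize(codes, scale)
    return max(abs(w - r) for w, r in zip(weights, restored))
```

With only eight representable levels per group, the per-weight error is bounded by half the scale, which is why practical low-bit schemes typically use small groups or more elaborate rounding than this.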

Simon Willison · 1d ago · 7 · api update agent deployment

GitHub Copilot is restructuring pricing and usage limits due to agentic workflows consuming significantly more compute than originally anticipated, shifting from per-request to token-based pricing with restrictions on individual plans. This reflects the real infrastructure costs of AI agents in production and impacts developers using Copilot's expanding agentic capabilities across IDE integrations and CLI tools.

r/MachineLearning · 1d ago · 7 · benchmark inference deployment

Discussion on evaluating quantization impact for DeepSeek V3.2, covering practical benchmark selection for measuring quality degradation from runtime quantization. Relevant for engineers deploying quantized models in production and optimizing inference performance vs. accuracy tradeoffs.
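
One common way to quantify degradation, shown here as a hedged sketch rather than the method proposed in the thread, is to compare mean negative log-likelihood on the same tokens before and after quantization. The inputs are assumed to be per-token natural-log probabilities collected from each model. Function names are illustrative.

```python
import math


def mean_nll(token_logprobs):
    """Mean negative log-likelihood in nats over a list of token log-probs."""
    return -sum(token_logprobs) / len(token_logprobs)


def degradation_nats(baseline_logprobs, quantized_logprobs):
    """Positive values mean the quantized model fits the text worse."""
    return mean_nll(quantized_logprobs) - mean_nll(baseline_logprobs)


def perplexity_ratio(baseline_logprobs, quantized_logprobs):
    """The same gap expressed as quantized perplexity over baseline perplexity."""
    return math.exp(degradation_nats(baseline_logprobs, quantized_logprobs))
```

A likelihood gap is cheap to compute but does not always track task accuracy, so it is usually paired with a small set of downstream benchmarks.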

Simon Willison · 1d ago · 6 · api update workflow deployment

Anthropic briefly tested moving Claude Code from the $20/month Pro plan to exclusive availability on $100+/month Max plans, sparking community backlash. The change was quickly reverted, but the incident reveals product strategy shifts around AI coding agent features and competitive positioning against OpenAI's Codex offerings.

Latent Space · 1d ago · 9 · new model api update benchmark agent

OpenAI released GPT-Image-2, a major image generation model now available via API and ChatGPT with significant improvements in text rendering, layout consistency, and multilingual support. The model achieves #1 on Arena leaderboards with a +242 Elo lead on text-to-image tasks and introduces thinking variants that enable web search and self-checking capabilities, positioning image generation as a front-end interface for coding agents.

OpenAI Blog · 1d ago · 8 · new model open source tool

OpenAI released an open-weight model specifically designed to detect and redact PII from text with high accuracy, useful for building privacy-preserving applications and data pipelines. This tool directly addresses a common engineering challenge when working with user data and LLMs.

r/LocalLLaMA · 2d ago · 8 · new model tool agent open source

Granite-4.1-8B is a new 8B parameter instruction-tuned model with enhanced tool-calling capabilities, multilingual support (12 languages), and improved post-training via SFT and RL alignment. The model is designed for AI assistants and LLM agents with function-calling abilities, making it relevant for engineers building agentic systems and tool-integrated applications.