Simon Willison · 3d ago · 6 · tool library deployment

A new Datasette plugin enables spending limit controls for LLM usage, integrating with datasette-llm and datasette-llm-accountant to manage per-user or global cost caps. This addresses practical cost management for developers building LLM applications within Datasette environments.

Latent Space · 3d ago · 7 · tool agent api update workflow

GitHub and OpenAI released significant updates to coding agent tooling: GitHub's new Copilot App provides an agent-first desktop environment for parallel workflows, while OpenAI expanded Codex into mobile with remote execution, SSH management, and programmatic automation hooks. VS Code added multi-agent/multi-project support with browser/mobile access via vscode.dev/agents and token-efficiency features.

OpenAI Blog · 3d ago · 5 · workflow api update

The article describes using Codex (OpenAI's code model) to automate documentation generation for data science workflows, converting raw work inputs into structured business outputs such as briefs and analytics specs. It is practical for engineers integrating LLMs into data pipelines, though it focuses more on business process automation than on novel technical implementation.

r/MachineLearning · 3d ago · 6 · research inference

This paper introduces reference-guided flow matching, a technique that leverages mean trajectories to improve generative model training and sampling efficiency. While technically interesting for diffusion model research, it is primarily a theoretical contribution, more relevant to engineers building advanced generative systems than to those with immediate production needs.

r/LocalLLaMA · 3d ago · 8 · inference benchmark research optimization

TurboQuant is a KV-cache quantization method that compresses cache entries to 3-4 bits for storage and dequantizes them to BF16 for attention computation, offering significant GPU memory savings. This comprehensive benchmark study evaluates TurboQuant variants against FP8 baselines across four large models (30B-200B+) and realistic workloads, providing practical guidance on inference optimization and memory-efficiency tradeoffs.
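
The store-low-precision, compute-high-precision pattern described above can be illustrated with a generic min-max quantizer. This is a minimal sketch and not TurboQuant's actual algorithm: the function names, the single per-vector scale, and the use of Python floats in place of BF16 are all assumptions made for illustration.

```python
from typing import List, Tuple


def quantize(values: List[float], bits: int = 4) -> Tuple[List[int], float, float]:
    """Map floats to integer codes in [0, 2**bits - 1] using one scale and offset.

    Hypothetical helper: a plain min-max quantizer, not TurboQuant itself.
    """
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    # Guard against a constant vector, where hi == lo would give a zero scale.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes: List[int], scale: float, lo: float) -> List[float]:
    """Recover approximate floats right before they are used in attention."""
    return [c * scale + lo for c in codes]


# Example: a tiny stand-in for one cached key vector.
cached = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, scale, lo = quantize(cached, bits=4)
restored = dequantize(codes, scale, lo)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`). Storing 4-bit codes instead of 16-bit values cuts the payload by 4x before accounting for the scale and offset metadata.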

OpenAI Blog · 3d ago · 5 · deployment workflow

Sea Limited is adopting Codex (OpenAI's code generation model) to accelerate development across engineering teams in Asia. The piece discusses deployment strategy and organizational workflow changes for AI-assisted coding, relevant for understanding enterprise adoption patterns of code generation tools.

HuggingFace Blog · 3d ago · 8 · new model open source rag deployment inference

The Granite Embedding Multilingual R2 release includes two new multilingual embedding models (97M and 311M parameters) supporting 200+ languages with a 32K-token context length and enhanced retrieval for 52 languages plus code. Both models ship with ONNX/OpenVINO optimization, work out of the box with sentence-transformers and major RAG frameworks (LangChain, LlamaIndex, Haystack, Milvus), and are Apache 2.0 licensed, making them drop-in replacements that broaden language coverage at minimal performance cost.

r/LocalLLaMA · 3d ago · 6 · tool workflow api update

VS Code's AI Toolkit extension now supports agent-first development with configurable language models optimized for different tasks, including reasoning models with adjustable thinking effort levels. The article covers model selection strategies (fast vs. reasoning models), tool-calling support for agents, and how to configure API keys for custom models.

OpenAI Blog · 3d ago · 6 · api update tool workflow

OpenAI's Codex integration in the ChatGPT mobile app enables remote code generation and task monitoring across devices. This expands practical access to AI-assisted coding workflows beyond desktop environments, useful for developers managing remote infrastructure or mobile-first development pipelines.

r/MachineLearning · 3d ago · 7 · workflow tutorial research

A practitioner shares a real-world time series anomaly detection challenge: building failure prediction for IoT chargers with sparse positive labels (~1-2%), variable data rates between operational modes, and high device heterogeneity. They're exploring architectural solutions (dual RNN encoders vs. data-level sampling) and seeking advice on handling extreme class imbalance in time series forecasting.

Simon Willison · 4d ago · 6 · tool workflow deployment

Simon Willison describes using GPT-5.5 to generate a configurable rate-limiting plugin for handling crawler traffic on datasette.io. The post provides practical insights into using LLMs for DevOps/infrastructure automation and production deployment patterns.
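
For readers unfamiliar with configurable rate limiting, a token bucket is one common way to do it. The sketch below is a generic illustration, not the plugin from the post: the class name, its parameters, and the choice of a token bucket are assumptions.

```python
class TokenBucket:
    """Hypothetical rate limiter: allows bursts up to `capacity` requests,
    then refills at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity  # start with a full bucket
        self.last = 0.0         # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = max(0.0, now - self.last)
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Example: bursts of 2 requests, refilling one token per second.
bucket = TokenBucket(capacity=2, refill_rate=1.0)
decisions = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0, 1.0)]
```

Passing the timestamp in explicitly keeps the example deterministic. A real deployment would more likely read `time.monotonic()` and keep one bucket per client or per user agent.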

r/MachineLearning · 4d ago · 8 · agent research workflow

A research paper on Continual Harness shows how foundation models can autonomously refine their own execution harnesses through iterative self-improvement, illustrated by Gemini completing Pokémon games without losses. The work formalizes the agent-harness co-learning loop and shows that self-refinement capabilities are critical for long-horizon task completion, with implications for building more autonomous AI systems.

OpenAI Blog · 4d ago · 5 · api update

OpenAI has implemented safety updates to ChatGPT that improve contextual understanding of sensitive conversations and risk detection patterns. While the safety mechanisms are interesting from an AI safety perspective, the practical technical details and implementation methods are not disclosed, limiting direct applicability for engineers building with AI.

HuggingFace Blog · 4d ago · 8 · inference workflow tutorial

This article explains how to optimize LLM inference performance by decoupling CPU and GPU workloads through asynchronous batching, eliminating idle gaps that waste ~24% of runtime in synchronous approaches. The post builds on continuous batching concepts and provides practical profiling techniques to measure and improve GPU utilization, critical for managing high inference costs on hardware like H200s.
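
The benefit of overlapping CPU preparation with GPU execution can be estimated with a simple two-stage pipeline model. This is an idealized sketch, not the article's implementation: the function names are hypothetical, and the 24 ms and 76 ms stage times are illustrative values chosen only so that the synchronous idle share matches the ~24% figure.

```python
def synchronous_time(n_batches: int, cpu_ms: float, gpu_ms: float) -> float:
    """The CPU prepares a batch, then the GPU runs it; the GPU idles during prep."""
    return n_batches * (cpu_ms + gpu_ms)


def overlapped_time(n_batches: int, cpu_ms: float, gpu_ms: float) -> float:
    """The CPU prepares batch i+1 while the GPU runs batch i (one batch of lookahead)."""
    if n_batches == 0:
        return 0.0
    # The first prep cannot be hidden. After that, each step costs as much as
    # the slower stage, and the last GPU run finishes the schedule.
    return cpu_ms + (n_batches - 1) * max(cpu_ms, gpu_ms) + gpu_ms


# Illustrative numbers (assumed): 24 ms of CPU work and 76 ms of GPU work per batch.
sync_ms = synchronous_time(100, 24.0, 76.0)
async_ms = overlapped_time(100, 24.0, 76.0)
idle_fraction = (100 * 24.0) / sync_ms  # share of synchronous runtime the GPU sits idle
speedup = sync_ms / async_ms
```

Under these assumptions the overlapped schedule hides nearly all CPU time behind GPU work, so the speedup approaches 1 / (1 - 0.24), about 1.32x, as the number of batches grows. Real systems gain less when CPU preparation is the slower stage or when the two stages must synchronize.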

r/MachineLearning · 4d ago · 8 · research fine tuning tutorial open source

An engineer trained rating-conditioned transformer chess models (9M parameters) on 1B Lichess games, reaching parity with MAIA-3 and adding two novel components: thinking-time prediction and clock-aware win-probability models. The technical work emphasizes data pipeline optimization (C++ preprocessing plus sequential shuffling for GPU efficiency) and demonstrates how small models can match larger baselines through careful training setup and conditioning on player and time context.

Anthropic Blog · 4d ago · 6 · api update workflow agent

Anthropic launched Claude for Small Business, a package of pre-built agentic workflows and connectors that integrate Claude into tools like QuickBooks, HubSpot, and Google Workspace for small business automation tasks. The offering includes 15 ready-to-run workflows across finance, sales, and operations, plus an emphasis on data security and AI training partnerships.

r/MachineLearning · 4d ago · 8 · open source tool inference deployment api update

Scenema Audio releases open-source diffusion-based TTS model weights and inference code that decouple emotional performance from voice identity through prompt-based control. Key technical advantages include more natural emotional delivery than autoregressive TTS, support for audio-first video generation workflows, optimized diffusion (8 steps), and Docker/REST API deployment with automatic VRAM management. Noted trade-offs: stochastic output quality that calls for a post-editing workflow, sensitivity to prompt detail, and the need for phonetic spelling of complex words.

r/MachineLearning · 4d ago · 6 · fine tuning tool workflow

A team building synthetic data generation for document understanding (PDFs, forms with PII) seeks feedback on output formats (FUNSD, BIO, YOLO, Donut, COCO) and distribution methods (PyPI SDK vs API vs zip files). This is relevant for engineers working on document processing pipelines and fine-tuning models on structured data, though it's primarily a community discussion rather than a technical resource.