r/MachineLearning · 8h ago · 7 · agent deployment workflow

A solopreneur building a scope verification service for AI agents shares production logging data showing how permission enforcement differs from IAM, distinguishing between action-not-in-scope and grant_revoked denial modes. The post highlights a real latency tradeoff (~12ms per verify call) and frames agent authorization as a problem distinct from credential management, with a concrete cautionary example from Meta's March 2026 agent incident.

r/MachineLearning · 9h ago · 7 · workflow tutorial

A practitioner asks about best practices for iterative dataset curation and model training with 150k medical images, specifically whether manual verification before each training cycle is the right approach. This touches on practical workflows around annotation quality, active learning, and dataset scaling strategies that are directly applicable to building production computer vision systems.
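A common alternative to verifying every image per cycle is uncertainty sampling: only the predictions the current model is least sure about go to manual review. A minimal sketch (function and data layout are illustrative, not from the post):

```python
def select_for_review(predictions, k=100):
    """Pick the k least-confident predictions for manual verification.

    predictions: list of (image_id, max_softmax_confidence) pairs from
    the current model over the unlabeled/unverified pool.
    Classic uncertainty sampling: lowest max-confidence first, so human
    effort concentrates where the model is most likely wrong.
    """
    ranked = sorted(predictions, key=lambda p: p[1])
    return [image_id for image_id, _ in ranked[:k]]
```

At 150k images, reviewing a fixed-size uncertain slice per cycle keeps annotation cost bounded while still catching the labels most likely to move the next training run.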

r/MachineLearning · 12h ago · 6 · api update deployment

Weights & Biases updated their Master Service Agreement with concerning changes to data ownership and usage rights—specifically removing explicit customer ownership statements and adding provisions allowing W&B to use customer data (including ML models and training logs) for product development and AI feature training without clear opt-out mechanisms. For engineers shipping with wandb for experiment tracking and model management, this represents a potential IP and data governance issue worth understanding before the May 11th effective date.

Anthropic Blog · 13h ago · 7 · api update deployment inference

Anthropic has doubled Claude Code rate limits, removed peak hour restrictions for Pro/Max users, and significantly increased Claude API rate limits for Opus models, backed by new compute capacity including a SpaceX partnership providing 220,000+ NVIDIA GPUs. Engineers using Claude API and Code should review the updated rate limits table to understand new quotas for their applications.
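Even with higher quotas, clients still need to handle 429s gracefully. A generic exponential-backoff wrapper, sketched below; the `request_fn` callable and the `rate_limited` exception attribute are assumed conventions, since real SDKs raise their own typed rate-limit errors:

```python
import random
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry a rate-limited API call with exponential backoff plus jitter.

    request_fn is assumed to raise an exception carrying a truthy
    `rate_limited` attribute when the server returns HTTP 429.
    Other exceptions, and the final failed attempt, propagate unchanged.
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except Exception as exc:
            if not getattr(exc, "rate_limited", False) or attempt == max_retries - 1:
                raise
            # Full jitter avoids synchronized retry storms across workers.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Honoring a `Retry-After` header, when the API provides one, is preferable to a fixed backoff schedule.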

r/LocalLLaMA · 14h ago · 8 · new model open source benchmark inference

Zyphra released ZAYA1-8B, a new 8B parameter MoE model trained on AMD hardware that achieves strong performance on reasoning, math, and coding tasks while using <1B active parameters. The model features novel architectural innovations (Compressed Convolutional Attention, MLP-based routing, learned residual scaling) and a Markovian-RSA test-time compute methodology, available as a serverless endpoint on Zyphra Cloud.
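To illustrate what "MLP-based routing" means relative to the single linear router in a standard MoE layer, here is a toy numpy sketch under stated assumptions (one hidden layer, tanh nonlinearity, top-k selection); ZAYA1's actual router design will differ in its details:

```python
import numpy as np

def mlp_router(x, W1, W2, top_k=2):
    """Route each token to its top_k experts via a small MLP (sketch).

    x: (tokens, d_model); W1: (d_model, d_hidden); W2: (d_hidden, n_experts).
    A standard MoE router is a single linear map from token to expert
    logits; an MLP router inserts a hidden nonlinearity, letting the
    routing decision depend on nonlinear features of the token.
    Returns expert indices (tokens, top_k) and renormalized weights.
    """
    h = np.tanh(x @ W1)                       # hidden layer (tanh as a stand-in)
    logits = h @ W2                           # (tokens, n_experts)
    idx = np.argsort(-logits, axis=-1)[:, :top_k]
    top = np.take_along_axis(logits, idx, axis=-1)
    w = np.exp(top - top.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)             # softmax over the selected experts only
    return idx, w
```

With few experts active per token, this is how an 8B-parameter model keeps under 1B active parameters per forward pass.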

HuggingFace Blog · 15h ago · 7 · inference research workflow

Technical deep-dive on migrating vLLM from V0 to V1 for online RL workloads (GSPO/PPO), covering critical fixes for logprob processing, runtime defaults, weight updates, and prefix caching behavior that affected training convergence. The post provides practical debugging methodology for inference engine parity testing in RL systems.
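The core of engine parity testing is simple: run the same prompts through both engine versions and bound the per-token logprob drift. A minimal check (helper name and tolerance are illustrative):

```python
import numpy as np

def logprob_parity(ref_logprobs, new_logprobs):
    """Compare per-token logprobs from two inference engine versions.

    Returns (max_abs_diff, worst_token_index). In online RL (PPO/GSPO),
    systematic logprob drift between the sampling engine and the trainer
    biases importance ratios, so even small parity gaps can silently
    break convergence.
    """
    ref = np.asarray(ref_logprobs, dtype=np.float64)
    new = np.asarray(new_logprobs, dtype=np.float64)
    diff = np.abs(ref - new)
    return float(diff.max()), int(diff.argmax())
```

Reporting the worst token index, not just the max difference, is what makes this useful for debugging: it points directly at the sequence position where the two engines diverge.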

Simon Willison · 20h ago · 7 · workflow prompt engineering agent

Simon Willison discusses the blurring line between 'vibe coding' (non-programmers using AI assistance without concern for code quality) and 'agentic engineering' (professional developers leveraging AI tools while maintaining code standards), noting that as AI coding agents become more reliable, even experienced engineers are reviewing less of the generated code without sacrificing production quality. The key insight: modern AI coding tools let engineers take on a significantly larger scope of work while maintaining or improving code quality, a genuine shift in the engineering paradigm.

Latent Space · 1d ago · 6 · agent deployment workflow

Anthropic and OpenAI are launching services companies ($1.5B and $4B funded respectively) to handle enterprise deployment and system integration of AI agents, recognizing that model capability alone isn't sufficient—organizations need help with context management, workflow modernization, and adoption. This signals a shift toward "last-mile" services businesses as AI labs recognize opportunities in the operational work required to integrate agents into real business processes.

r/LocalLLaMA · 1d ago · 8 · open source security inference

Critical vulnerability (CVE-2026-7482, CVSS 9.1) discovered in Ollama enabling unauthenticated memory leaks from the Ollama process, potentially exposing user prompts, system prompts, and environment variables across 300,000+ servers. The article provides technical details on Ollama's API architecture and how the vulnerability works through the /api/create and /api/blobs endpoints.
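Until a patched build is deployed, the most direct mitigation is to stop binding the Ollama API to all interfaces. Ollama reads its bind address from the `OLLAMA_HOST` environment variable; a loopback-only configuration looks like the sketch below (default port 11434 assumed):

```shell
# Bind the Ollama API to loopback only, so endpoints such as
# /api/create and /api/blobs are unreachable from other hosts.
OLLAMA_HOST=127.0.0.1:11434 ollama serve
```

Deployments that must be reachable over the network should sit behind an authenticating reverse proxy or firewall rule rather than exposing the API directly.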

r/MachineLearning · 1d ago · 7 · research architecture inference

SATFormer introduces a more efficient alternative to recent Transformer variants by replacing static cross-layer pathways with per-token, per-head gating that selectively reuses first-layer representations. The method achieves better efficiency-performance tradeoffs (1.75-1.82× higher throughput than competitors) while improving validation loss at 130M-1.3B scale and showing strong results on retrieval-intensive tasks.
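The gating idea can be sketched in a few lines of numpy. This is an illustrative approximation under stated assumptions (a scalar gate per token/head from a learned projection, convex mixing of first-layer and current representations), not SATFormer's exact formulation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_reuse(h_first, h_current, W_gate):
    """Per-token, per-head gated reuse of first-layer representations.

    h_first, h_current: (tokens, heads, d_head); W_gate: (heads, d_head).
    Each (token, head) pair gets a scalar gate computed from the current
    representation, deciding how much first-layer signal to mix back in.
    This replaces a static cross-layer pathway (a fixed mixing weight)
    with a learned, input-dependent one.
    """
    g = sigmoid(np.einsum("thd,hd->th", h_current, W_gate))  # (tokens, heads)
    g = g[..., None]                                         # broadcast over d_head
    return g * h_first + (1.0 - g) * h_current
```

Because the gate is a per-token scalar rather than a full attention pass, the extra cost is one projection per head, which is consistent with the throughput gains the post reports.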

HuggingFace Blog · 1d ago · 6 · benchmark research open source

The Open ASR Leaderboard now includes private high-quality English speech datasets from Appen and DataoceanAI to prevent benchmark gaming while maintaining standardized evaluation metrics. The leaderboard has reached 710K visits since launch and emphasizes the importance of standardization and openness in benchmarking, with optional toggles to see private dataset impact on model performance.

r/MachineLearning · 1d ago · 6 · agent benchmark

League of Robot Runners (LoRR) 2026 is a research competition focused on large-scale multi-robot coordination using ML/RL methods for task scheduling and path planning under uncertainty. The competition provides starter kits in C++/Python, automated evaluation with live leaderboards, and welcomes diverse technical approaches including RL, search, optimization, and hybrid techniques.

Latent Space · 1d ago · 6 · prompt engineering workflow research

Article explores the 'Jagged Frontier' concept where modern LLMs like GPT-5 show dramatic capability improvements at research/science frontiers while appearing incremental for everyday tasks. Features physicist Alex Lapskasky using AI (o3/GPT-5) to accelerate theoretical physics research, reproducing complex papers in minutes through prompt engineering techniques like 'priming' with textbook problems.

Anthropic Blog · 1d ago · 8 · agent tool api update deployment

Anthropic released 10 pre-built agent templates for financial services workflows (pitchbooks, KYC screening, month-end closing) deployable as Claude plugins or managed agents, plus native integrations with Microsoft 365 apps and expanded MCP/connector ecosystem for real-time data access. The templates package skills, data connectors, and subagents as reference architectures that teams can adapt and deploy in days, with Claude Opus 4.7 achieving 64.37% on Vals AI's Finance Agent benchmark.

r/LocalLLaMA · 1d ago · 7 · benchmark inference research

Comprehensive benchmark comparison of Qwen3.6 vs Qwen3.5 27B and Gemma 4 31B across accuracy, latency, and token efficiency metrics, with extended analysis on thinking-enabled modes. Results show Qwen3.6 excels on math/knowledge tasks but underperforms on instruction-following and some reasoning benchmarks, revealing task-specific trade-offs for practitioners choosing between models.

r/MachineLearning · 1d ago · 7 · deployment workflow

A software engineer shares production cost management challenges with LLM APIs, specifically difficulty tracking token usage and costs across features when moving from prototypes to scaled deployments. The core issue is lack of cost attribution granularity—OpenAI dashboards provide total spend but not per-feature breakdown, requiring manual reconciliation that doesn't scale and lacks confidence.
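The usual workaround is to tag every call at the application layer and keep your own ledger, since the provider dashboard only sees aggregate spend. A minimal sketch; the model name and per-million-token prices below are placeholders, not real pricing:

```python
from collections import defaultdict

# Placeholder per-1M-token prices; real pricing varies by model and vendor.
PRICES = {"example-model": {"input": 2.50, "output": 10.00}}

class CostLedger:
    """Attribute LLM spend to product features via a tag on every call."""

    def __init__(self):
        self.totals = defaultdict(float)

    def record(self, feature, model, input_tokens, output_tokens):
        """Record one API call's token usage under a feature tag; returns its cost."""
        p = PRICES[model]
        cost = (input_tokens * p["input"] + output_tokens * p["output"]) / 1e6
        self.totals[feature] += cost
        return cost

    def by_feature(self):
        """Per-feature spend, largest first."""
        return dict(sorted(self.totals.items(), key=lambda kv: -kv[1]))
```

The hard part in practice is discipline, not code: every call site must pass a feature tag, and untagged calls need a visible bucket so gaps in attribution show up instead of disappearing into the total.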

r/MachineLearning · 1d ago · 8 · library open source inference benchmark

TritonSigmoid is an open-source GPU kernel implementing sigmoid attention with native padding awareness, achieving 515 TFLOPS on H100 and outperforming softmax/FlashAttention on variable-length sequences. Designed for single-cell biology models where multi-token attention is semantically required, it demonstrates both computational efficiency and empirical improvements in loss and representation quality across benchmarks.
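To see how sigmoid attention differs from softmax and where padding awareness enters, here is a plain numpy reference of the math (not the Triton kernel, and the exact normalization TritonSigmoid uses may differ):

```python
import numpy as np

def sigmoid_attention(q, k, v, pad_mask):
    """Sigmoid attention with padding awareness (numpy reference).

    q, k, v: (seq, d); pad_mask: (seq,) bool, True = real token.
    Unlike softmax, sigmoid scores each key independently, so several
    keys can receive near-full weight at once (the "multi-token
    attention" the model needs). Padded keys are simply zeroed out
    rather than competing in a normalization over the row.
    """
    d = q.shape[-1]
    scores = (q @ k.T) / np.sqrt(d)          # (seq, seq)
    weights = 1.0 / (1.0 + np.exp(-scores))  # elementwise sigmoid, no row normalization
    weights = weights * pad_mask[None, :]    # drop padded keys entirely
    return weights @ v
```

Because there is no row-wise normalization, masking a key changes only that key's contribution, which is what makes variable-length batches cheap to handle correctly.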

r/MachineLearning · 1d ago · 7 · fine tuning tool workflow open source

An engineer shares a practical approach to detecting obfuscated transaction patterns: convert transaction graphs to 2D images and fine-tune Qwen2-VL-2B-Instruct with LoRA to exploit the VLM's visual understanding. The post demonstrates an interesting alternative to standard GNN workflows and includes published LoRA weights and a synthetic-dataset methodology, all run on AMD/ROCm hardware.
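The graph-to-image step can be as simple as rasterizing a weighted adjacency matrix. A toy sketch below; the post's actual rendering is presumably richer (layouts, colors, node features), and the function name and cell scheme here are illustrative only:

```python
import numpy as np

def graph_to_image(n_nodes, edges, size=64):
    """Render a transaction graph as a grayscale adjacency image (sketch).

    edges: iterable of (src, dst, intensity) with node ids < n_nodes and
    intensity in [0, 1] (e.g. normalized transaction amount).
    Each (src, dst) pair maps to a cell block in a size x size grid;
    the block's pixel value encodes the edge weight, giving the VLM a
    fixed-size 2D view of arbitrary graph structure.
    """
    img = np.zeros((size, size), dtype=np.float32)
    cell = max(size // n_nodes, 1)
    for src, dst, intensity in edges:
        r, c = src * cell, dst * cell
        img[r:r + cell, c:c + cell] = intensity
    return img
```

The appeal over a GNN is that the fine-tuned VLM can pick up visual motifs (fan-in/fan-out blocks, layering chains rendered as diagonal bands) without hand-designed message passing, at the cost of losing permutation invariance over node ordering.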