HuggingFace Blog · 10h ago · 7 · tool dataset tutorial agent deployment

NVIDIA released Nemotron-Personas-Korea, a synthetic dataset of 6M demographically-accurate Korean personas (zero PII) for grounding multilingual agents with cultural and contextual accuracy. The tutorial demonstrates deploying a Korean-aware agent using the dataset with NeMo Data Designer, NIM inference, or NVIDIA APIs—useful for engineers building localized AI systems.

Latent Space · 10h ago · 8 · new model agent open source benchmark inference deployment

Moonshot's Kimi K2.6 (1T MoE, 32B active) was released with strong open-source coding benchmarks (58.6% SWE-Bench Pro) and novel long-horizon execution capabilities (4,000+ tool calls, 300 parallel sub-agents, 'Claw Groups' for multi-agent coordination). Alibaba's Qwen3.6-Max-Preview also landed with improvements to agentic coding and reasoning stability. Both models gained immediate deployment support across vLLM, OpenRouter, and other inference platforms.

r/MachineLearning · 18h ago · 8 · open source inference research library

Two open-source implementations of KV-cache compaction techniques for long-context inference: Cartridges (corpus-specific compressed caches) and STILL (neural KV-cache compaction with reusable compression). Both repos include benchmark comparisons against baselines and readable code, making recent research directly applicable to production inference optimization.

Latent Space · 18h ago · 6 · new model tool research deployment

Noetik's TARIO-2 model uses AI to predict high-resolution spatial transcriptomics from standard H&E histology slides, enabling better patient-treatment matching in oncology—GSK signed a $50M deal for this platform approach. The technical innovation involves training an autoregressive transformer on large tumor spatial transcriptomics datasets to predict ~19,000-gene maps, potentially improving clinical trial success rates by better matching patients to existing treatments rather than discovering new drugs.

r/LocalLLaMA · 19h ago · 7 · new model open source agent api update deployment

Kimi K2.6 is a new open-source multimodal agentic model with native int4 quantization, supporting long-horizon coding, video/image understanding, and autonomous task execution. The model is available via OpenAI/Anthropic-compatible APIs on the Moonshot platform, with deployment guides for vLLM/SGLang and new features like preserve_thinking mode for enhanced agent reasoning.

r/MachineLearning · 1d ago · 8 · workflow inference tool tutorial

A practitioner discusses the shift from C++/CuTe/CUTLASS template metaprogramming to NVIDIA's newer CuTeDSL Python DSL for GPU kernel development, questioning whether newcomers should learn legacy C++ or adopt the newer stack (CuTeDSL + Triton + Mojo) for LLM inference optimization work. This reflects real ecosystem changes in kernel engineering for projects like FlashAttention, FlashInfer, and SGLang, with implications for skill prioritization and hiring.

r/MachineLearning · 1d ago · 7 · open source tool dataset agent

A developer released SGOCR, an open-source dataset pipeline for generating spatially-grounded, OCR-focused VQA data with rich metadata for training vision-language models. The project details a practical multi-stage architecture using NVIDIA's nemotron-ocr-v2, Gemma/Qwen models, and Gemini 2.5 Flash for verification, plus an agentic optimization loop inspired by Karpathy's autoresearch for dataset quality improvement.

r/MachineLearning · 1d ago · 8 · agent deployment open source tool

Open-source runtime monitoring system for production AI agents that scores risk across five dimensions (action type, resource sensitivity, blast radius, frequency, context deviation) to detect failure modes like unintended actions, PII leaks, and runaway loops. Addresses critical gap between agent demos and production deployment with real-time behavioral guardrails.
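
A minimal sketch of how a five-dimension risk score like this could be combined into a single decision. The summary names the dimensions but not the scoring formula, so the weighted average, the weights, the thresholds, and the function names below are all assumptions for illustration, not the project's actual method.

```python
# Hypothetical weights: the post lists the five dimensions but not how
# they are combined. These values are illustrative only and sum to 1.0.
WEIGHTS = {
    "action_type": 0.30,
    "resource_sensitivity": 0.25,
    "blast_radius": 0.25,
    "frequency": 0.10,
    "context_deviation": 0.10,
}


def risk_score(dimensions: dict[str, float]) -> float:
    """Weighted average of per-dimension scores, each expected in [0, 1]."""
    missing = WEIGHTS.keys() - dimensions.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = dimensions[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        total += weight * value
    return total


def decide(score: float, block_at: float = 0.7, review_at: float = 0.4) -> str:
    """Map a score to a guardrail action; thresholds are assumed values."""
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"
    return "allow"
```

Under these assumed weights, an agent action that is destructive, touches a sensitive resource, and has a wide blast radius lands in the "block" band even when its frequency and context-deviation scores are low.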

r/LocalLLaMA · 1d ago · 5 · inference deployment

SK hynix is mass-producing 192GB SOCAMM2 LPDDR5X memory modules optimized for AI servers, offering 2x bandwidth and 75% better power efficiency than traditional RDIMM. The article argues memory bandwidth is becoming the critical bottleneck in AI infrastructure scaling, particularly for training workloads, with these modules co-engineered for NVIDIA's upcoming platforms.

Simon Willison · 1d ago · 7 · api update tool inference

Claude Opus 4.7 introduces a new tokenizer that uses 1.46x as many tokens for text and 3.01x as many for high-resolution images as Opus 4.6. Because per-token pricing is identical, the model is effectively ~40% more expensive per task. The author's upgraded token counter tool now enables side-by-side comparisons across Claude models (Opus 4.7/4.6, Sonnet 4.6, Haiku 4.5) to help engineers assess cost implications of the new tokenizer.
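
A rough sketch of the cost arithmetic, assuming cost scales linearly with token count at an unchanged per-token price. The 1.46 and 3.01 factors come from the summary above; the function name, the `text_share` parameter, and the idea of blending the two factors by workload mix are assumptions added for illustration.

```python
def cost_multiplier(
    text_share: float,
    text_factor: float = 1.46,   # token inflation for text, from the summary
    image_factor: float = 3.01,  # token inflation for high-res images, from the summary
) -> float:
    """Relative cost of the same task after a tokenizer change.

    text_share is the fraction of the task's old-tokenizer tokens that came
    from text; the remainder is assumed to come from high-resolution images.
    Per-token price is assumed unchanged, so cost tracks token count.
    """
    if not 0.0 <= text_share <= 1.0:
        raise ValueError("text_share must be in [0, 1]")
    return text_share * text_factor + (1.0 - text_share) * image_factor
```

For example, `cost_multiplier(1.0)` covers a text-only task and `cost_multiplier(0.9)` a task whose tokens were 90% text and 10% image, which comes out to roughly 1.6 under this simple model. Real per-task figures depend on the actual content, which is what the token counter tool measures directly.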

Simon Willison · 1d ago · 7 · api update agent workflow

Headless APIs are becoming the preferred interface for personal AI agents rather than GUI automation, with Salesforce exposing its entire platform through APIs and MCP. This architectural shift lets agents access data and workflows directly without browser automation, fundamentally changing how SaaS platforms should be designed for AI integration.

r/MachineLearning · 1d ago · 5 · research

A philosophical essay on developing a rigorous science of deep learning and foundation models, discussing scientific methodology and how to systematically understand complex ML systems. While conceptually interesting for ML engineers, it is primarily a theoretical discussion of the philosophy of science rather than practical technical guidance or concrete tools.

r/MachineLearning · 1d ago · 8 · research open source benchmark

Curated list of ~1,200 ICLR 2026 accepted papers with publicly available code, data, or demos (22% of total papers). Direct links to implementations across GitHub and official repositories provide immediate access to reproducible research for exploring cutting-edge ML techniques.

r/MachineLearning · 1d ago · 7 · agent workflow deployment

A technical discussion distinguishing between reactive agent harnesses and truly autonomous agent runtime environments, questioning whether current infrastructure (LangChain, etc.) supports persistent, self-managing agents with heartbeats, self-healing, and long-term memory. The post identifies a potential gap between execution frameworks and operational infrastructure needed for continuous autonomous systems.
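
A minimal sketch of the heartbeat and self-healing idea the post mentions, assuming agents report liveness to a supervisor that restarts any agent whose heartbeat goes stale. The class, method names, and restart callback are hypothetical and not taken from LangChain or any other framework.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Tracks the last heartbeat time per agent (hypothetical design)."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_beat: dict[str, float] = {}

    def beat(self, agent_id: str) -> None:
        """Record that agent_id is alive right now."""
        self.last_beat[agent_id] = self.clock()

    def stale_agents(self) -> list[str]:
        """Return agents whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            agent_id
            for agent_id, seen in self.last_beat.items()
            if now - seen > self.timeout_s
        )


def supervise(monitor: HeartbeatMonitor,
              restart: Callable[[str], None]) -> list[str]:
    """One supervision pass: restart every stale agent and reset its timer."""
    restarted = monitor.stale_agents()
    for agent_id in restarted:
        restart(agent_id)
        monitor.beat(agent_id)  # treat a restart as a fresh heartbeat
    return restarted
```

A reactive harness only runs when called, so a persistent runtime would need something like `supervise` invoked on a schedule, alongside the long-term memory and recovery logic this sketch leaves out.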

r/MachineLearning · 1d ago · 6 · deployment workflow

A Reddit discussion of a subtle failure mode in production AI systems: formally correct outputs become contextually wrong when underlying assumptions shift. The failure is structural rather than technical, because governance and monitoring reinforce outdated decision frameworks. The post names this the 'Formalisation Trap' and treats it as a distinct operational problem that requires rethinking system design beyond traditional controls.