r/MachineLearning · 7h ago · 8 · workflow inference tool tutorial

A practitioner discusses the shift from C++/CuTe/CUTLASS template metaprogramming to NVIDIA's newer CuTeDSL Python DSL for GPU kernel development, questioning whether newcomers should learn legacy C++ or adopt the newer stack (CuTeDSL + Triton + Mojo) for LLM inference optimization work. This reflects real ecosystem changes in kernel engineering for projects like FlashAttention, FlashInfer, and SGLang, with implications for skill prioritization and hiring.

r/MachineLearning · 8h ago · 7 · open source tool dataset agent

Developer released SGOCR, an open-source dataset pipeline for generating spatially-grounded OCR-focused VQA data with rich metadata for training vision-language models. The project details a practical multi-stage architecture using Nvidia's nemotron-ocr-v2, Gemma/Qwen models, and Gemini 2.5 Flash for verification, plus an agentic optimization loop inspired by Karpathy's autoresearch for dataset quality improvement.

r/MachineLearning · 9h ago · 8 · agent deployment open source tool

An open-source runtime monitoring system for production AI agents scores risk across five dimensions (action type, resource sensitivity, blast radius, frequency, context deviation) to detect failure modes like unintended actions, PII leaks, and runaway loops. It addresses the critical gap between agent demos and production deployment with real-time behavioral guardrails.
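The five-dimension scoring idea can be sketched in a few lines. This is an illustrative toy, not the project's actual API: the weights, field names, and threshold below are invented for the example.

```python
# Illustrative sketch (not the project's actual API): score an agent action
# across the five risk dimensions named in the post, then gate on a threshold.

RISK_WEIGHTS = {
    "action_type": 0.30,           # e.g. read vs. write vs. delete
    "resource_sensitivity": 0.25,  # e.g. PII store vs. public docs
    "blast_radius": 0.20,          # how many users/systems the action touches
    "frequency": 0.15,             # repeated calls may signal a runaway loop
    "context_deviation": 0.10,     # drift from the task the agent was given
}

def risk_score(dimension_scores: dict[str, float]) -> float:
    """Weighted sum of per-dimension scores, each in [0, 1]."""
    return sum(RISK_WEIGHTS[d] * dimension_scores.get(d, 0.0) for d in RISK_WEIGHTS)

def should_block(dimension_scores: dict[str, float], threshold: float = 0.7) -> bool:
    return risk_score(dimension_scores) >= threshold

action = {
    "action_type": 0.9,           # destructive write
    "resource_sensitivity": 0.8,  # customer PII table
    "blast_radius": 0.6,
    "frequency": 0.2,
    "context_deviation": 0.4,
}
print(round(risk_score(action), 3), should_block(action))
```

A real system would derive the per-dimension scores from the agent's tool calls and context rather than hand-assigning them, but the gating logic reduces to a weighted aggregate like this.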

r/LocalLLaMA · 11h ago · 5 · inference deployment

SK hynix is mass-producing 192GB SOCAMM2 LPDDR5X memory modules optimized for AI servers, offering 2x bandwidth and 75% better power efficiency than traditional RDIMM. The article argues memory bandwidth is becoming the critical bottleneck in AI infrastructure scaling, particularly for training workloads, with these modules co-engineered for NVIDIA's upcoming platforms.

Simon Willison · 11h ago · 7 · api update tool inference

Claude Opus 4.7 introduces a new tokenizer that raises token consumption to 1.46x for text and 3.01x for high-resolution images relative to Opus 4.6. Per-token pricing is unchanged, so the author estimates the model is roughly 40% more expensive per task. The author's upgraded token counter tool now enables side-by-side comparisons across Claude models (Opus 4.7/4.6, Sonnet 4.6, Haiku 4.5) so engineers can assess the cost implications of the new tokenizer.
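Since per-token prices are unchanged, the effective cost multiplier for a task is just the weighted token multiplier across modalities. The 1.46x/3.01x figures come from the post; the text/image mix in the usage example is invented.

```python
# Back-of-envelope cost check: per-token prices are identical between Opus 4.6
# and 4.7, so a task's cost multiplier equals its token multiplier, weighted
# by how the old token spend split between text and high-res images.

TEXT_MULT = 1.46    # text tokens, Opus 4.7 vs. 4.6 (from the post)
IMAGE_MULT = 3.01   # high-resolution image tokens (from the post)

def cost_multiplier(text_fraction: float) -> float:
    """Effective cost multiplier when `text_fraction` of the Opus 4.6 token
    spend was text and the remainder was high-resolution images."""
    return text_fraction * TEXT_MULT + (1 - text_fraction) * IMAGE_MULT

print(cost_multiplier(1.0))   # pure-text task: 46% more expensive
print(cost_multiplier(0.9))   # hypothetical 90/10 text/image mix
```

The per-task figure a team actually sees depends on its own modality mix, which is why a side-by-side counter is more useful than a single headline percentage.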

Simon Willison · 14h ago · 7 · api update agent workflow

Headless APIs, rather than GUI automation, are becoming the preferred interface for personal AI agents, with Salesforce exposing its entire platform through APIs and MCP. This architectural shift lets agents reach data and workflows directly without browser automation, fundamentally changing how SaaS platforms should be designed for AI integration.

r/MachineLearning · 16h ago · 5 · research

A philosophical essay on developing a rigorous science of deep learning and foundation models, discussing scientific methodology and how to systematically understand complex ML systems. While conceptually interesting for ML engineers, it's primarily theoretical discussion about the philosophy of science rather than practical technical guidance or concrete tools.

r/MachineLearning · 20h ago · 8 · research open source benchmark

Curated list of ~1,200 ICLR 2026 accepted papers with publicly available code, data, or demos (22% of total papers). Direct links to implementations across GitHub and official repositories provide immediate access to reproducible research for exploring cutting-edge ML techniques.

r/MachineLearning · 21h ago · 7 · agent workflow deployment

A technical discussion distinguishing between reactive agent harnesses and truly autonomous agent runtime environments, questioning whether current infrastructure (LangChain, etc.) supports persistent, self-managing agents with heartbeats, self-healing, and long-term memory. The post identifies a potential gap between execution frameworks and operational infrastructure needed for continuous autonomous systems.
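The heartbeat/self-healing pattern the post argues is missing can be illustrated with a toy supervisor that restarts an agent whose heartbeat goes stale. Class name, timings, and the restart stub are all invented for the sketch; a real runtime would respawn an actual task.

```python
# Toy sketch of heartbeat + self-healing for an autonomous agent runtime:
# the supervisor checks a monotonic-clock heartbeat and "restarts" the agent
# when it goes stale. All names and timings here are illustrative.

import time

class AgentRuntime:
    def __init__(self, stale_after: float = 0.5):
        self.stale_after = stale_after          # seconds before a beat is stale
        self.last_beat = time.monotonic()
        self.restarts = 0

    def beat(self) -> None:
        """Called by the agent's main loop to signal liveness."""
        self.last_beat = time.monotonic()

    def is_stale(self) -> bool:
        return time.monotonic() - self.last_beat > self.stale_after

    def supervise_once(self) -> None:
        """One supervisor tick: restart the agent if its heartbeat is stale."""
        if self.is_stale():
            self.restarts += 1   # in a real runtime: respawn the agent task
            self.beat()          # fresh heartbeat for the restarted agent

rt = AgentRuntime(stale_after=0.1)
rt.beat()
rt.supervise_once()   # heartbeat is fresh: no restart
time.sleep(0.2)       # agent "hangs"
rt.supervise_once()   # stale heartbeat: self-heal
print(rt.restarts)
```

Frameworks like LangChain give you the inner loop that calls `beat()`; the post's point is that the outer supervisor, plus persistence and long-term memory, is usually left to the operator.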

r/MachineLearning · 21h ago · 6 · deployment workflow

A discussion of a subtle failure mode in production AI systems where formally correct outputs become contextually wrong once underlying assumptions shift: not a technical failure but a structural one, in which governance and monitoring reinforce outdated decision frameworks. The post names this 'Formalisation Trap' as a distinct operational problem that requires rethinking system design beyond traditional controls.

r/MachineLearning · 1d ago · 7 · fine tuning prompt engineering workflow rag

A practical technical discussion on converting XQuery to SQL using local LLMs with limited training data (~110-120 samples), comparing parsing, prompt-engineering, and fine-tuning (QLoRA with Qwen2.5-Coder 7B) approaches. The post identifies key challenges like query sensitivity and missing conditions, directly relevant for engineers building AI solutions with constrained resources in enterprise environments.
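The prompt-engineering baseline among the compared approaches amounts to assembling a few-shot prompt from the small sample pool and sending it to the local model. A minimal sketch: the XQuery/SQL pairs below are invented, not from the post's ~110-120-sample dataset.

```python
# Illustrative few-shot prompting baseline for XQuery -> SQL with a small
# sample pool. The example pairs are invented; a local LLM (e.g. a
# Qwen2.5-Coder variant, as in the post) would consume the resulting prompt.

EXAMPLES = [
    ("for $e in doc('emp.xml')//employee return $e/name",
     "SELECT name FROM employee;"),
    ("for $e in doc('emp.xml')//employee where $e/salary > 50000 return $e/name",
     "SELECT name FROM employee WHERE salary > 50000;"),
]

def build_prompt(xquery: str, k: int = 2) -> str:
    """Assemble a few-shot prompt from the first k example pairs."""
    shots = "\n\n".join(f"XQuery:\n{x}\nSQL:\n{s}" for x, s in EXAMPLES[:k])
    return (
        "Translate XQuery to SQL. Preserve every filter condition.\n\n"
        f"{shots}\n\nXQuery:\n{xquery}\nSQL:\n"
    )

prompt = build_prompt(
    "for $e in doc('emp.xml')//employee where $e/dept = 'IT' return $e/name"
)
print(prompt)
```

The explicit "preserve every filter condition" instruction targets the missing-conditions failure the post reports; with this few samples, retrieval-selecting the k nearest examples per query is the usual next step before reaching for QLoRA.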

Simon Willison · 1d ago · 8 · prompt engineering workflow research

Anthropic published system prompt changes between Claude Opus 4.6 and 4.7, revealing important instruction updates around tool usage, task completion, and response handling. The changes show evolved guidance on when Claude should use tools to resolve ambiguity before asking users, when to ask clarifying questions, and refined behavioral guidelines around disclaimers and specific sensitive topics like eating disorders.

r/MachineLearning · 1d ago · 8 · fine tuning deployment workflow tutorial

An ML team documents critical issues and workarounds for fine-tuning and deploying Gemma-4 with PEFT and TRL, including problems with custom layer compatibility, KV-sharing attention, DeepSpeed ZeRO-3 adapter corruption, and runtime LoRA serving limitations. It provides practical fixes like unwrapping custom layers before PEFT, upgrading transformers to v5.5.2+, and manual weight merging for deployment.

r/MachineLearning · 1d ago · 8 · open source library tool workflow

easyaligner is a new open-source forced alignment library built for speech-to-text preprocessing that handles practical pain points like partial transcripts, long audio segments without chunking, and text normalization with format recovery. It leverages PyTorch's forced alignment API with GPU-optimized Viterbi algorithm and supports any language with wav2vec2 models on Hugging Face Hub, achieving 35-102% faster transcription than WhisperX.

Simon Willison · 1d ago · 7 · prompt engineering tool open source

Anthropic publicly released system prompts for Claude models as Markdown, which Simon Willison converted into version-tracked files using Claude Code to enable easy comparison. This provides valuable transparency into how Claude's behavior is shaped across model versions, with detailed notes on changes between Opus 4.6 and 4.7 for understanding prompt engineering decisions.

Ahead of AI · 2d ago · 7 · workflow tutorial open source

A practical workflow guide for reverse-engineering and understanding LLM architectures by inspecting official reports, Hugging Face model configs, and transformers library implementations. The author emphasizes learning through manual analysis of open-weight models rather than relying on proprietary documentation, making it valuable for engineers who want to deeply understand model design patterns.
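Much of that manual analysis starts from the `config.json` shipped with an open-weight model, which encodes the architecture choices directly. A minimal sketch of reading the telling fields; the dict below imitates a Llama-style config with illustrative values, not any specific released model.

```python
# The config.json of an open-weight model encodes most architecture decisions.
# The values below are illustrative (Llama-style), not a real model's config.

import json

config_json = """{
  "hidden_size": 4096,
  "intermediate_size": 14336,
  "num_hidden_layers": 32,
  "num_attention_heads": 32,
  "num_key_value_heads": 8,
  "vocab_size": 128256,
  "rope_theta": 500000.0
}"""

cfg = json.loads(config_json)

# Derived quantities that reveal design choices:
head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]
gqa_group = cfg["num_attention_heads"] // cfg["num_key_value_heads"]

print(f"head_dim={head_dim}")
print(f"grouped-query attention: {gqa_group} query heads per KV head")
print(f"FFN expansion: {cfg['intermediate_size'] / cfg['hidden_size']:.2f}x")
```

Comparing these few numbers across model families (KV-head counts, FFN expansion ratios, RoPE base) surfaces most of the architectural drift the post walks through, before ever opening the transformers source.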

Latent Space · 2d ago · 7 · new model api update tool benchmark

Anthropic released Claude Opus 4.7 with improved coding/reasoning capabilities and introduced Claude Design, a new design prototyping tool competing with Figma/Bolt/v0. The update shows strong benchmark performance (ranked #1 in Code Arena, 57.3 on Intelligence Index) with ~35% token efficiency gains, though initial rollout had stability issues that were quickly patched.