HuggingFace Blog · 11h ago · 9 · new model research inference benchmark

Cosmos 3 is a unified multimodal foundation model built on a Mixture-of-Transformers architecture. It combines video generation, scene understanding, reasoning, and policy generation in a single model for physical AI applications such as robotics and autonomous vehicles. The architecture supports text, image, video, audio, and action modalities through a shared representation with separate autoregressive and diffusion pathways, so teams do not need to juggle multiple specialized models.

r/MachineLearning · 14h ago · 6 · research

A discussion post asking about current academic research directions in world models and self-supervised learning, noting a shift from methods like Barlow Twins/DINO toward scaled video generation. While this reflects genuine technical evolution in representation learning, it's community commentary rather than a concrete technical resource, tool, or research paper.

r/LocalLLaMA · 15h ago · 9 · new model benchmark agent open source

M3 is a new open-weight frontier model that combines a 1M-token context window (via proprietary Sparse Attention), native multimodality, and claimed world-leading coding and agentic capabilities. These are demonstrated through autonomous task execution, including ICLR paper reproduction and GPU kernel optimization without human intervention. The model reports top scores on coding tasks, agent-based browsing (83.5 on BrowseComp), and multi-step reasoning, making it directly relevant for building AI assistants and automated workflows.

r/MachineLearning · 19h ago · 6 · tutorial workflow

A software engineer is troubleshooting convergence issues with a Conformer-based ASR model trained on dialectal Arabic speech using SpeechBrain, where combined CTC+KLDiv losses plateau early and validation WER remains near 100% despite multiple hyperparameter adjustments. This represents a practical deep learning debugging challenge relevant to engineers building speech models, though it's a specific problem thread rather than a generalizable technique or tool release.
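
For context on the "WER near 100%" symptom, here is a minimal sketch of how word error rate is usually computed: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` is illustrative and is not taken from the thread or from SpeechBrain.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat down", "the cat down"))  # one deletion out of four words
print(word_error_rate("the cat sat down", ""))              # empty output: every word deleted
```

An empty hypothesis scores exactly 1.0, so a validation WER pinned near 100% is consistent with a model that emits almost nothing. Insertions can also push WER above 100%.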

Simon Willison · 1d ago · 5 · workflow prompt engineering

A thoughtful essay examining how AI coding agents can paradoxically reduce productivity by making project creation frictionless, leading to abandoned side projects and attention fragmentation. The post explores the psychological challenge of managing the efficiency of AI tools and discusses how some users with ADHD find agents helpful for focus, presenting a nuanced perspective on workflow management with AI.

r/MachineLearning · 1d ago · 6 · workflow benchmark

A software engineer working on computer vision is seeking advice on clustering YOLO detections into groups and predicting strand counts. They've trained a YOLO object detector and XGBoost classifier achieving 70% accuracy, but believe better performance is possible given the constrained problem space (max 8 groups, 3 strands per group). This is a practical computer vision engineering problem discussing detection post-processing and classification approaches.

Simon Willison · 1d ago · 8 · deployment open source agent benchmark

Anthropic published detailed documentation on their sandboxing techniques across Claude products (Claude.ai, Claude Code, Cowork), covering process isolation methods like gVisor, Seatbelt, Bubblewrap, and full VMs. The post explains threat models and constraint strategies for preventing agent escapes and credential exfiltration, plus mentions their open-source srt (Sandbox Runtime) tool for building secure AI applications.

Simon Willison · 1d ago · 7 · tool workflow open source

Datasette Lite now uses Service Workers with Pyodide to run Python ASGI apps in the browser, enabling full JavaScript execution that was previously broken. This approach, developed with Claude Opus 4.8's assistance, allows running full Python web applications like Datasette in WebAssembly without server infrastructure.

r/LocalLLaMA · 1d ago · 7 · new model inference deployment open source

NVIDIA released a quantized version of Alibaba's Qwen3.6-35B model using NVIDIA Model Optimizer, enabling efficient deployment on GPU hardware with a 262K context window and multimodal capabilities. The NVFP4 quantization reduces model size while maintaining performance for AI agents, chatbots, and RAG systems, making it immediately deployable via Hugging Face.

TLDR AI · 2d ago · 6 · agent workflow deployment

A workshop announcement covering orchestration patterns for AI agents using AWS Step Functions, Amazon Bedrock Agents, and Apache Airflow, with a focus on production reliability features such as retry logic and human-in-the-loop approvals. It targets teams building production-ready agent applications.

TLDR AI · 2d ago · 7 · rag prompt engineering tutorial deployment

A technical guide to RAG (Retrieval-Augmented Generation) implementation patterns, including code snippets, prompt templates, and production anti-patterns to avoid when scaling AI-powered search systems. It also offers ready-to-use prompt contracts for building reliable RAG applications.

r/MachineLearning · 2d ago · 7 · research workflow rag

Two ML students question whether robotics has a data scarcity problem or a data interoperability problem, proposing to normalize disparate public robotics datasets into a common schema and evaluate reusability across tasks and embodiments. They're seeking practitioner feedback on whether unified access to standardized robot-learning datasets would actually be useful, or if teams prefer collecting their own data due to embodiment mismatch, quality concerns, and task-specific requirements.

r/LocalLLaMA · 2d ago · 6 · fine tuning research open source

A new fine-tuned model combines Qwen 3.6 27B with reasoning traces for roleplay tasks to test whether chain-of-thought planning improves character consistency and narrative quality. The model uses DeepSeek-generated thinking traces validated by a judge model, paired with diverse persona training data from the Pantheon series.

r/MachineLearning · 2d ago · 8 · tool open source library workflow

A developer shares NeuralDBG, an open-source PyTorch tool that automatically detects and localizes training failures by monitoring per-layer gradient norm transitions rather than global loss curves. The key insight is that training failures are typically localized to specific layers. The post includes practical code snippets for gradient monitoring that the author says can catch 80% of failures without additional tooling.
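
The idea of watching per-layer gradient norms for abrupt transitions can be sketched in plain Python. This is an illustrative sketch, not NeuralDBG's API: the function `find_gradient_transitions`, the `ratio_threshold` default, and the sample layer names are all assumptions. In a PyTorch loop, the norms could be recorded after each `loss.backward()` by iterating over `model.named_parameters()` and calling `param.grad.norm().item()`.

```python
from typing import Dict, List, Tuple


def find_gradient_transitions(
    history: Dict[str, List[float]],
    ratio_threshold: float = 10.0,  # assumed default, tune per model
    eps: float = 1e-12,
) -> List[Tuple[int, str, str]]:
    """Flag steps where a layer's gradient norm jumps or drops sharply.

    `history` maps a layer name to its gradient norm at each training step.
    Returns (step, layer, kind) tuples sorted by step, then layer name.
    """
    events = []
    for layer, norms in history.items():
        for step in range(1, len(norms)):
            before, after = norms[step - 1], norms[step]
            if after > ratio_threshold * max(before, eps):
                events.append((step, layer, "spike"))      # possible explosion
            elif after * ratio_threshold < before:
                events.append((step, layer, "collapse"))   # possible vanishing
    events.sort()
    return events


# Hypothetical recorded norms for three layers over four steps.
history = {
    "encoder.0": [1.0, 0.9, 1.1, 1.0],
    "encoder.1": [1.0, 0.8, 0.01, 0.0],
    "head": [2.0, 2.1, 1.9, 50.0],
}
for step, layer, kind in find_gradient_transitions(history):
    print(f"step {step}: {layer} {kind}")
```

With this sample data the collapse in `encoder.1` is flagged at step 2, one step before the spike in `head`, which is the kind of layer-level localization a global loss curve would not show.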

r/MachineLearning · 2d ago · 6 · benchmark tool open source

A developer shares a vision classifier model trained on Wikipedia data using Gemini Flash 3.5, benchmarked against PyTorch. The project demonstrates practical use of multimodal AI models for building and evaluating custom vision tasks on Hugging Face.