News Nug

Talos-XII: hand-written autograd + small RL/MLP stack in Rust, applied to gacha probability modeling (no tch-rs/ndarray/PyTorch) — looking for benchmark help on ARM/AVX-512/GPU [P]

r/MachineLearning · 2d ago · 7 · research open source inference library

Talos-XII is a hand-written ML systems project in Rust that trains neural networks (EnvNet, DQN, PPO) to model gacha probability dynamics, featuring a custom autograd engine, SIMD dispatch (AVX2/AVX-512/NEON), and an experimental adaptive caching component (ACHF) for CPU-bound RL inference. The project demonstrates practical systems engineering for embedded ML, including custom autodiff, parallelization, and BF16 optimization, though ACHF, its core innovation, is still experimental and lacks cross-hardware validation.

Native-speed vLLM transformers modeling backend

HuggingFace Blog · 3d ago · 8 · inference tool library deployment open source

The transformers library's vLLM integration now uses torch.fx graph analysis and AST-based code rewriting to dynamically optimize model inference at runtime, matching native vLLM performance without custom implementations. This enables single-flag deployment of Hugging Face models with optimized inference (continuous batching, fused kernels) through --model-impl transformers, with benchmark comparisons showing performance parity across Qwen3 variants.

sqlite-utils 4.0

Simon Willison · 4d ago · 7 · tool library workflow

sqlite-utils 4.0 introduces database schema migrations, a practical feature for developers managing evolving data structures in SQLite-backed applications. This is particularly useful for AI engineers building data pipelines, RAG systems, or applications that need reliable database versioning alongside their model workflows.
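
To make the idea of schema migrations concrete, here is a minimal sketch of a versioned migration runner built only on Python's standard `sqlite3` module and SQLite's `PRAGMA user_version`. It is not the sqlite-utils 4.0 API: the `MIGRATIONS` list, the `migrate` function, and the `documents` table are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical migration list for illustration only; each entry moves the
# schema forward by one version. This is NOT the sqlite-utils 4.0 API.
MIGRATIONS = [
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)",
    "ALTER TABLE documents ADD COLUMN embedding BLOB",
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply pending migrations in order and return the schema version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS, start=1):
        if version <= current:
            continue  # this step was applied on an earlier run
        conn.execute(statement)
        # PRAGMA does not accept bound parameters; version is a trusted int.
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)
second_run = migrate(conn)  # running again applies nothing new
columns = [row[1] for row in conn.execute("PRAGMA table_info(documents)")]
```

A production runner would typically wrap each step in an explicit transaction so that a failed migration leaves the version number untouched; the sketch omits that for brevity.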

Repo for implementations of various Transformer Attn mechanisms [P]

r/MachineLearning · 37d ago · 7 · library open source tool

AttnHut is an open-source repository providing modular, swappable attention mechanism implementations for language models and vision tasks, including MiniMax M3's sparse attention. The library enables easy experimentation and benchmarking of different attention variants, with applications across SLMs, computer vision, and RL.

qwen35: use post-norm hidden state for MTP by am17an · Pull Request #24025 · ggml-org/llama.cpp

r/LocalLLaMA · 37d ago · 7 · library inference open source

Technical discussion in llama.cpp about extracting embeddings for Multi-Token Prediction (MTP) models, specifically whether to use pre-norm or post-norm hidden states depending on the model architecture. The thread explores API design options for decoupling embedding extraction from logits computation to support different MTP model requirements.

Encodec.cpp, a portable C++ implementation of Meta's EnCodec using Eigen [P]

r/MachineLearning · 38d ago · 8 · open source library tool inference

Encodec.cpp is a lightweight C++ implementation of Meta's EnCodec audio codec using Eigen, with zero ML runtime dependencies, compiled weights, and single-threaded performance matching or exceeding ONNX Runtime. It provides an easily integrable CMake library for audio tokenization and compression tasks without external model files.

TorchDAE: Implicit DAE Solvers with Index Reduction and Adjoint Sensitivity [P]

r/MachineLearning · 38d ago · 7 · library open source tool

TorchDAE is a new PyTorch library for solving Differential Algebraic Equations with GPU acceleration and differentiable workflows, implementing Generalized-Alpha integration and adjoint sensitivity methods. It enables physics-informed machine learning applications such as system identification and scientific ML by bridging traditional numerical methods with PyTorch's autograd ecosystem.

What I learned building a debugger for PyTorch training loops and how it changed how I think about failure diagnosis [D]

r/MachineLearning · 42d ago · 8 · tool open source library workflow

A developer shares NeuralDBG, an open-source PyTorch tool that automatically detects and localizes training failures by monitoring per-layer gradient norm transitions rather than global loss curves. The key insight is that training failures are typically localized to specific layers. The post includes practical gradient-monitoring code snippets that, according to the author, can catch 80% of failures without additional tooling.
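
As a rough illustration of what monitoring per-layer gradient norm transitions could look like, the sketch below flags the steps at which a layer's gradient norm jumps or collapses by a large factor between consecutive steps. It is not NeuralDBG's code: the `flag_transitions` function, the `ratio` threshold, and the sample history are assumptions. In PyTorch, the per-layer norms would typically be collected after `loss.backward()` from each parameter's `.grad`.

```python
from typing import Dict, List


def flag_transitions(
    norm_history: Dict[str, List[float]],
    ratio: float = 10.0,   # assumed threshold, not taken from the post
    floor: float = 1e-12,  # avoids division by zero for vanished gradients
) -> Dict[str, List[int]]:
    """Return {layer: [steps]} where the norm changed by at least `ratio`x."""
    flagged: Dict[str, List[int]] = {}
    for layer, norms in norm_history.items():
        steps = []
        for step in range(1, len(norms)):
            prev = max(norms[step - 1], floor)
            cur = max(norms[step], floor)
            # Flag both explosions (cur >> prev) and collapses (cur << prev).
            if cur / prev >= ratio or prev / cur >= ratio:
                steps.append(step)
        if steps:
            flagged[layer] = steps
    return flagged


# Hypothetical per-layer gradient norms recorded over four training steps.
history = {
    "encoder.0": [1.0, 0.9, 1.1, 1.0],        # healthy
    "encoder.1": [0.8, 0.7, 0.00001, 0.00001],  # vanishes at step 2
    "head": [2.0, 2.1, 50.0, 2.0],            # spikes at step 2, drops at 3
}
result = flag_transitions(history)
```

Because the output is keyed by layer name, a failure shows up next to the layer that caused it instead of as an unexplained change in the global loss.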

A new dataset with more than 100M high-quality, curated images, with captions and metadata! [P]

r/MachineLearning · 44d ago · 8 · tool library open source dataset benchmark

MONET is a new Apache 2.0 open-source image-text dataset with 104.9M high-quality samples curated from 2.9B images, accompanied by visualization tools, a retrieval system, and a T2I training codebase. This is a significant resource for engineers building multimodal AI systems, offering both the dataset and practical tooling for training text-to-image models.

Duel-Agents — CLI, SDK, and IDE plugins for Duel Agents

GitHub Trending AI · 44d ago · 7 · tool library open source inference workflow

Duel Agents is an IDE-native routing layer that automatically selects the cheapest LLM response while maintaining quality across multiple models (Anthropic, OpenAI, etc.). The open-source integration package provides CLI, SDK, and plugins for Claude, Cursor, and OpenAI-compatible clients, with setup automation and environment configuration.

Cross-Platform Fused MoE Dispatch in Triton: Portable Expert Routing Without CUDA [R]

r/MachineLearning · 44d ago · 8 · inference open source library benchmark

TritonMoE is a portable MoE inference kernel written in Triton that achieves 89-131% of Megablocks throughput while running unchanged on both NVIDIA and AMD GPUs. The key optimization uses fused gate+up GEMM operations to reduce global memory traffic by 35%, though performance degrades at very long sequences (2048+ tokens) and under extreme routing skew.

EMA-Gated Temporal Sequence Compression in Vision Transformers [P]

r/MachineLearning · 45d ago · 8 · research inference open source library

NeuroFlow is a training-free dynamic routing framework for Vision Transformers that achieves 55.8× wall-clock speedup on high-res video inference by eliminating redundant tokens via semantic surprise tracking in embedding space. The approach uses a dual-memory architecture with retinal gating and cortical caching to maintain 97%+ fidelity while achieving extreme sparsity (84% token reduction), with code and paper publicly available.
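
The general idea of gating tokens on an exponential moving average (EMA) can be sketched in a few lines. The example below keeps a per-position EMA of token embeddings and only passes through tokens whose distance from that average exceeds a threshold. It is a simplified illustration, not NeuroFlow's method: the Euclidean "surprise" measure, the `alpha` and `threshold` values, and the `ema_gate` function are assumptions, and the dual-memory components described above are not modeled.

```python
import math
from typing import List

Vector = List[float]


def ema_gate(
    frames: List[List[Vector]],
    alpha: float = 0.1,      # assumed EMA update rate
    threshold: float = 0.5,  # assumed surprise cutoff
) -> List[List[int]]:
    """For each frame, return the indices of tokens that should be recomputed."""
    ema: List[Vector] = []
    kept: List[List[int]] = []
    for frame in frames:
        if not ema:
            # First frame: nothing to compare against, so keep every token.
            ema = [list(tok) for tok in frame]
            kept.append(list(range(len(frame))))
            continue
        keep_now = []
        for i, tok in enumerate(frame):
            surprise = math.dist(tok, ema[i])  # distance from running average
            if surprise > threshold:
                keep_now.append(i)
            # Update the running average for this token position.
            ema[i] = [(1 - alpha) * m + alpha * x for m, x in zip(ema[i], tok)]
        kept.append(keep_now)
    return kept


# Three toy frames with two tokens each; only token 1 changes in frame 2.
frames = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [3.0, 1.0]],
]
kept_indices = ema_gate(frames)
```

Tokens that are not kept would reuse previously cached outputs, which is where the savings on static video regions would come from.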

Augmented Equivariant Mesh Networks for Anatomical Mesh Segmentation (ICML 2026 Workshops) [R]

r/MachineLearning · 46d ago · 7 · research library benchmark

EAMS presents an Equivariant Mesh Neural Network framework for robust anatomical mesh segmentation across medical imaging tasks (dental, liver, aneurysm), maintaining performance under geometric perturbations like patient pose variation where standard methods degrade by 25+ IoU points. The work combines intrinsic mesh descriptors with anatomy-aware PCA-derived priors in a lightweight (<2M parameter) architecture, demonstrating that equivariance principles from molecular modeling transfer effectively to 3D medical mesh tasks despite trade-offs in capturing subtle asymmetric features.

OpenMOSS-Team/MOSS-TTS-v1.5 · Hugging Face

r/LocalLLaMA · 46d ago · 7 · new model library inference

MOSS-TTS-v1.5 expands multilingual text-to-speech capabilities to 31 languages with improved performance through FlashAttention 2 support and optimized dependencies. The update maintains backward compatibility with v1.0 while adding support for languages like Cantonese, Hindi, Thai, and Vietnamese, with straightforward installation and generation APIs.

Thermocompute constant time inference [P]

r/MachineLearning · 48d ago · 6 · library open source inference benchmark

Thermocompute is a PyTorch library that emulates thermodynamic probabilistic computing, offering stochastic neural layers (p-bits, samplers, generative models) designed to exploit parallel hardware where inference time remains constant as layer width increases. The key technical insight is that on GPUs with available parallel capacity, thermodynamic layers can achieve flat wall-clock time scaling with width, potentially outperforming classical dense FFNs for certain workloads.

Working on a cgo-free CUDA binding in Go for ML stuff Week 3 - open source [P]

r/MachineLearning · 48d ago · 7 · library open source inference

A Go developer created a pure Go CUDA binding library (gocudrv) that eliminates cgo dependencies by loading libcuda.so at runtime using purego, enabling cross-compilation and smaller Docker images for ML workloads. The implementation uses OS thread locking to handle CUDA's per-thread context model via goroutine channels, with early support for memory allocation, kernel launches, and GPU event timing.

I built a Mamba1 variant I call SM1 with d_state=1 that runs on Blackwell in pure PyTorch [P]

r/MachineLearning · 49d ago · 7 · research inference open source library

SM1 (Scalar Mamba1) implements a closed-form solution for state-space models with d_state=1 using pure PyTorch operations, eliminating the selective scan bottleneck and reducing memory by 16x compared to standard Mamba implementations. The author demonstrates practical benefits: training a 130M parameter model on MIDI data with minimal memory footprint (56KB state, no KV cache) on consumer hardware, highlighting that scalar state dimensions can be sufficient when token representations already encode structure.
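
For a scalar state, the recurrence h_t = a_t * h_{t-1} + u_t (with u_t standing for the input term and h starting at 0) has a closed form built from prefix products and prefix sums, which is one way such a model could avoid a sequential scan. The sketch below compares the two in plain Python; it is not the author's implementation, and the function names and sample values are assumptions. In PyTorch, the same steps would map onto `torch.cumprod` and `torch.cumsum`.

```python
import math
import operator
from itertools import accumulate
from typing import List


def scan_sequential(a: List[float], u: List[float]) -> List[float]:
    """Reference recurrence: h_t = a_t * h_{t-1} + u_t, starting from h = 0."""
    h, out = 0.0, []
    for a_t, u_t in zip(a, u):
        h = a_t * h + u_t
        out.append(h)
    return out


def scan_closed_form(a: List[float], u: List[float]) -> List[float]:
    """Closed form: h_t = P_t * sum_{s<=t} u_s / P_s, with P_t = prod_{r<=t} a_r.

    Requires every a_t to be nonzero. Dividing by a very small P_s can lose
    precision on long sequences, so real implementations often work in log space.
    """
    prefix = list(accumulate(a, operator.mul))  # P_t (cumulative product)
    scaled = accumulate(u_t / p_t for u_t, p_t in zip(u, prefix))  # running sum
    return [p_t * s_t for p_t, s_t in zip(prefix, scaled)]


# Hypothetical decay factors and inputs for a four-step sequence.
a = [0.9, 0.8, 0.5, 0.95]
u = [1.0, -2.0, 0.5, 3.0]
sequential = scan_sequential(a, u)
closed = scan_closed_form(a, u)
match = all(
    math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12)
    for x, y in zip(sequential, closed)
)
```

Both prefix operations are parallelizable, which is why the closed form suits stock tensor operations.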

Datasette Agent

Simon Willison · 50d ago · 8 · tool agent open source library plugin

Datasette Agent is a new conversational AI assistant that lets users query data stored in Datasette using natural language, with LLM-powered SQL generation and an extensible plugin architecture. The tool integrates with modern LLMs (Gemini, Claude, local models) for reliable tool calling and SQL generation, and includes plugins for charts and other functionality. This represents a practical fusion of data querying and LLM agents with immediate applicability for engineers working with databases and AI.

NOML-NOML: hierarchical TD3 + anchor policy for flight control [P]

r/MachineLearning · 52d ago · 8 · open source research library agent

An engineer open-sourced NOML, a custom RL algorithm for continuous control that addresses instability in flight simulation by combining an anchor policy (safe action fallback), a hierarchical actor architecture (independent MLP heads per control axis), and mirror learning for data efficiency. The approach diverges from standard TD3 by eliminating exploration noise while maintaining stability through structural constraints rather than reward shaping.

Witchcraft, fast local semantic search on top of SQLite [P]

r/MachineLearning · 54d ago · 8 · open source tool library rag deployment

Witchcraft is a Rust-based semantic search engine for client-side deployment using SQLite, achieving 20ms latency without external APIs or vector databases. It includes Pickbrain, a CLI tool that indexes Claude/Codex transcripts and documents for semantic search with direct session resumption, plus skills for both AI platforms to maintain cross-session memory.
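
To show what semantic search on top of SQLite means at its simplest, the sketch below stores embedding vectors as BLOBs in a SQLite table and ranks rows by cosine similarity with a brute-force scan. It is written in Python with the standard library only and does not reflect Witchcraft's Rust implementation or its indexing strategy: the `chunks` table, the helper functions, and the hand-made three-dimensional vectors are assumptions standing in for real model embeddings.

```python
import math
import sqlite3
import struct
from typing import List


def pack(vec: List[float]) -> bytes:
    """Serialize a vector as float32 values for storage in a BLOB column."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob: bytes) -> List[float]:
    """Inverse of pack(): each float32 occupies 4 bytes."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(conn: sqlite3.Connection, query_vec: List[float], k: int = 2) -> List[str]:
    """Brute-force scan: score every row, return the top-k texts."""
    rows = conn.execute("SELECT text, embedding FROM chunks").fetchall()
    scored = [(cosine(query_vec, unpack(blob)), text) for text, blob in rows]
    return [text for _, text in sorted(scored, reverse=True)[:k]]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)")
# Toy vectors chosen by hand; a real system would use model embeddings.
docs = [
    ("resume the training session", [0.9, 0.1, 0.0]),
    ("sqlite schema notes", [0.0, 0.2, 0.9]),
    ("session transcript about training", [0.8, 0.3, 0.1]),
]
conn.executemany(
    "INSERT INTO chunks (text, embedding) VALUES (?, ?)",
    [(text, pack(vec)) for text, vec in docs],
)
top = search(conn, [1.0, 0.1, 0.0])
```

A full scan like this is only practical for small collections; reaching low latency on larger ones would need some form of index or approximate search.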