News Nug

noisekit - CLI for generating realistic degraded speech datasets for ASR benchmarking [P]

r/MachineLearning · 5h ago · 8 · tool open source benchmark workflow

noisekit is an open-source tool that generates realistic degraded audio datasets from clean annotated speech data, enabling accurate STT vendor benchmarking under production conditions (phone noise, codecs, reverb). It fills a critical gap for voice agent builders by providing WER-measurable datasets that approximate real-world phone call audio rather than relying on clean studio recordings.
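As a rough illustration of the two measurements involved, the sketch below mixes noise into clean samples at a target SNR and scores a transcript with word error rate. It is not noisekit's code: the function names `mix_at_snr` and `wer` are hypothetical, audio is represented as plain lists of floats, and codec and reverb effects are left out.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals snr_db, then add it.

    Assumes both inputs are equal-length lists of float samples.
    """
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    # From SNR_dB = 10 * log10(p_clean / (gain^2 * p_noise)).
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)
```

Running the same clean utterance through several SNR levels and comparing the resulting WER per vendor is the kind of comparison the post describes.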

EMA-Gated Temporal Sequence Compression in Vision Transformers [P]

r/MachineLearning · 5h ago · 8 · research inference open source library

NeuroFlow is a training-free dynamic routing framework for Vision Transformers that achieves 55.8× wall-clock speedup on high-res video inference by eliminating redundant tokens via semantic surprise tracking in embedding space. The approach uses a dual-memory architecture with retinal gating and cortical caching to maintain 97%+ fidelity while achieving extreme sparsity (84% token reduction), with code and paper publicly available.
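The gating idea can be sketched in a few lines. This is an assumption-heavy toy, not NeuroFlow's algorithm: "semantic surprise" is approximated here as cosine distance between a token's embedding and an exponential moving average of its past embeddings, and the class name `EmaGate`, `alpha`, and `threshold` are all hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; treated as maximal (1.0) if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


class EmaGate:
    """Hypothetical gate: recompute a token only when it drifts from its running average."""

    def __init__(self, alpha=0.2, threshold=0.05):
        self.alpha = alpha
        self.threshold = threshold
        self.ema = {}  # token position -> running average embedding

    def select(self, tokens):
        """Return positions of tokens that should be recomputed for this frame."""
        keep = []
        for pos, emb in enumerate(tokens):
            avg = self.ema.get(pos)
            if avg is None:
                # First sighting: always process and seed the average.
                keep.append(pos)
                self.ema[pos] = list(emb)
                continue
            if cosine_distance(emb, avg) > self.threshold:
                keep.append(pos)
            self.ema[pos] = [
                (1 - self.alpha) * m + self.alpha * e for m, e in zip(avg, emb)
            ]
        return keep
```

Tokens that are not selected would reuse cached outputs from an earlier frame, which is where the claimed sparsity would come from on mostly static video.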

Cross-species RSA: same learning rules (BP, PC, STDP, FA) tested against both human fMRI and macaque electrophysiology [P]

r/MachineLearning · 6h ago · 7 · research benchmark open source

Cross-species neuroscience study comparing learning rules (BP, FA, PC, STDP) across human fMRI and macaque electrophysiology (V1/V2/V4/IT), finding that early visual alignment is conserved but IT alignment scales with model capacity rather than learning rule. Includes careful controls for stimulus confounds and capacity baselines, with code and companion papers provided.
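For readers unfamiliar with RSA, a minimal version of the comparison looks like the sketch below. It assumes a common formulation (dissimilarity as 1 minus Pearson correlation between stimulus response patterns, Spearman correlation between the two dissimilarity vectors); the study's exact metrics and controls may differ, and all function names are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def rdm(responses):
    """Upper triangle of the dissimilarity matrix: 1 - r for each stimulus pair."""
    n = len(responses)
    return [
        1.0 - pearson(responses[i], responses[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]


def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def rsa_score(model_responses, brain_responses):
    """Spearman correlation between the model RDM and the brain RDM."""
    return pearson(ranks(rdm(model_responses)), ranks(rdm(brain_responses)))
```

Each row of `model_responses` or `brain_responses` is one stimulus; the two systems may have different numbers of units or voxels because only the stimulus-by-stimulus geometry is compared.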

A Tiny Open-Source Self-Driving AI That Runs on a Phone [P]

r/MachineLearning · 12h ago · 7 · open source inference deployment

Open-source 7MB autonomous driving model that learns visual navigation, lane following, and drift recovery for edge deployment on lightweight hardware. Demonstrates practical real-time inference optimization for complex perception tasks without cloud infrastructure, valuable for understanding model compression and embedded AI systems.

Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL

HuggingFace Blog · 18h ago · 8 · tool workflow open source research inference

A practical open-source solution for efficient async RL training using sparse weight deltas instead of full model snapshots, reducing synchronization overhead from ~1TB to ~20GB per checkpoint. The approach leverages bf16 arithmetic properties where 98% of weights remain bit-equivalent between steps, enabling asynchronous weight distribution via shared storage (S3) without direct trainer-inference connectivity.
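The core of a sparse delta can be shown with a toy example. This is a sketch under stated assumptions rather than TRL's implementation: weights are plain Python floats, bf16 is emulated by truncating the float32 encoding to its top 16 bits (real casts usually round to nearest even), and `make_delta` and `apply_delta` are hypothetical names.

```python
import struct


def to_bf16_bits(x):
    """Emulate bf16 by keeping the top 16 bits of the float32 encoding (truncation)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def make_delta(old_bits, new_bits):
    """Map index -> new bf16 bit pattern, only where the pattern changed."""
    return {i: n for i, (o, n) in enumerate(zip(old_bits, new_bits)) if o != n}


def apply_delta(bits, delta):
    """Patch a copy of the old weights with the changed entries."""
    patched = list(bits)
    for index, value in delta.items():
        patched[index] = value
    return patched


old = [to_bf16_bits(x) for x in [0.5, 1.0, -2.0, 3.0]]
# The 1.0 -> 1.0001 update is below bf16 resolution near 1.0, so its bits do not change.
new = [to_bf16_bits(x) for x in [0.5, 1.0001, -2.0, 3.5]]
delta = make_delta(old, new)
```

Here only one of four entries lands in `delta`. An inference worker that already holds `old` needs just that dictionary, which is the property that lets the trainer publish small objects to shared storage instead of full snapshots.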

Reachy Mini goes fully local

HuggingFace Blog · 18h ago · 8 · tutorial tool open source workflow inference deployment

Technical guide for building a fully local speech-to-speech pipeline (VAD → STT → LLM → TTS) with the Reachy Mini robot using open-source tools such as llama.cpp, Parakeet, and Qwen3TTS. Demonstrates how to run conversational AI systems without cloud dependencies, with swappable components for tuning latency/quality tradeoffs.
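The modular structure might look something like the following sketch, where each stage is just a callable that can be replaced independently. The `SpeechPipeline` class and its field names are hypothetical, and the stages in the usage note are stand-in lambdas rather than real llama.cpp, Parakeet, or Qwen3TTS bindings.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpeechPipeline:
    """VAD -> STT -> LLM -> TTS, with every stage swappable."""

    vad: Callable[[bytes], bool]   # does this chunk contain speech?
    stt: Callable[[bytes], str]    # audio -> transcript
    llm: Callable[[str], str]      # transcript -> reply text
    tts: Callable[[str], bytes]    # reply text -> audio

    def handle(self, audio_chunk: bytes) -> Optional[bytes]:
        """Return synthesized reply audio, or None if no speech was detected."""
        if not self.vad(audio_chunk):
            return None
        transcript = self.stt(audio_chunk)
        reply = self.llm(transcript)
        return self.tts(reply)
```

Swapping in a smaller LLM or a faster TTS engine then means passing a different callable, for example `SpeechPipeline(vad=my_vad, stt=my_stt, llm=small_model, tts=fast_tts)`, without touching the control flow.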

Tomesphere, 3M paper pages with TLDRs, peer reviews, code, and a SPECTER2 similarity graph [P]

r/MachineLearning · 1d ago · 8 · tool open source rag

Tomesphere is a free research paper discovery platform indexing 3M arXiv/OpenAlex papers with AI-generated TLDRs, peer reviews, GitHub repos, HuggingFace models, and semantic similarity search using SPECTER2 embeddings in pgvector. The semantic graph approach enables discovery of topically related papers beyond citation networks, with a Chrome extension for arXiv integration and multiple ranking modes (influential, recent, hidden gems, nearest neighbors).
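The "nearest neighbors" mode boils down to ranking papers by embedding similarity. The brute-force sketch below shows the idea on toy two-dimensional vectors; it assumes cosine similarity as the metric, uses a hypothetical `nearest` helper, and stands in for what a vector index would do in the database at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest(query_id, embeddings, k=2):
    """Return the ids of the k papers most similar to `query_id`."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), paper_id)
        for paper_id, vec in embeddings.items()
        if paper_id != query_id
    ]
    scored.sort(reverse=True)
    return [paper_id for _, paper_id in scored[:k]]
```

Because the ranking depends only on embeddings, two papers can be neighbors without citing each other, which is the discovery benefit the summary points to.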

Built a portable GPU ISA after reading too many architecture manuals [P]

r/MachineLearning · 1d ago · 9 · tool open source inference deployment

WAVE is a portable GPU kernel abstraction layer that compiles to a unified binary compatible with Metal, PTX, HIP, and SYCL across Apple, NVIDIA, and AMD hardware. This solves a critical pain point for AI engineers building cross-platform systems—write kernels once and deploy identically across diverse GPU architectures with verified PyTorch integration.

Qwen3.5 27B Uncensored Heretic Native MTP Preserved is Out Now With the Full 15 MTPs Preserved and Retained, Available in Safetensors, GGUFs, NVFP4, NVFP4 GGUFs and GPTQ-Int4 Formats!

r/LocalLLaMA · 1d ago · 7 · tutorial inference open source deployment

Practical guide covering multiple inference frameworks (Transformers, llama-cpp-python, vLLM, SGLang, Ollama, etc.) for running a 27B quantized Qwen model. Includes GGUF quantization options and benchmark comparisons showing minimal accuracy degradation, useful for engineers optimizing local model deployment.

Qwen3.5 35B A3B uncensored heretic Native MTP Preserved is Out Now With the Full 785 MTPs Preserved and Retained, Available in Safetensors, GGUFs, NVFP4, NVFP4 GGUFs and GPTQ-Int4 Formats

r/LocalLLaMA · 1d ago · 6 · open source inference deployment fine tuning

Guide for using a fine-tuned Qwen3.5 35B A3B variant (with reduced content restrictions) across multiple inference frameworks including Transformers, vLLM, and SGLang, with MMLU benchmark results (83.72% accuracy) and multiple quantization options available. Practical for engineers looking to deploy modified open-source models with different inference backends.

Aiki my local Wikipedia Retrieval-Augmented Generation system [R]

r/MachineLearning · 1d ago · 7 · tool open source rag

Aiki is a lightweight local tool for querying Wikipedia with custom TF-IDF retrieval and optional LLM answer generation. It demonstrates practical RAG implementation with minimal dependencies, featuring query expansion via Wikipedia links and flexible article selection—useful reference for building local knowledge systems.
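A dependency-free TF-IDF retriever of the kind described can be sketched as follows. This is not Aiki's code: the smoothed IDF formula, whitespace tokenizer, and the `build_index` and `search` names are assumptions, and the link-based query expansion is omitted.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def build_index(docs):
    """docs maps title -> text. Returns (idf table, per-document TF-IDF vectors)."""
    counts = {title: Counter(tokenize(text)) for title, text in docs.items()}
    doc_freq = Counter(term for c in counts.values() for term in c)
    n = len(docs)
    # Smoothed IDF so that terms present in every document keep a small weight.
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = {
        title: {term: tf * idf[term] for term, tf in c.items()}
        for title, c in counts.items()
    }
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, idf, vectors, top_k=3):
    """Return up to top_k titles with non-zero similarity, best first."""
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = sorted(((cosine(q, v), title) for title, v in vectors.items()), reverse=True)
    return [title for score, title in scored[:top_k] if score > 0]
```

The retrieved article text would then be passed to the optional LLM step as context for answer generation.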

DCGAN inference on a microcontroller: 12.6M parameters, 512KB SRAM, 26-second generation, pure C [P]

r/MachineLearning · 1d ago · 8 · inference open source deployment quantization

Novel implementation of DCGAN inference on a resource-constrained RISC-V microcontroller (CH32H417) with 512KB of shared SRAM, using int8 quantization, SD card weight streaming with double buffering, and a custom C inference engine whose outputs are bit-identical to PyTorch's. Demonstrates practical techniques for embedded generative models on non-ARM architectures where ecosystem tools like CMSIS-NN don't exist, with creative integration of quantum entropy for latent vector seeding.
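To make the int8 step concrete, the sketch below shows one common scheme, symmetric per-tensor quantization with integer accumulation. Whether the project uses this exact scheme is an assumption, the names `quantize_int8` and `int8_dot` are hypothetical, and Python is used here for brevity even though the project itself is written in C.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: value is approximated by q * scale, q in [-127, 127]."""
    peak = max(abs(v) for v in values)
    scale = peak / 127 if peak else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def int8_dot(qa, scale_a, qb, scale_b):
    """Dot product with integer accumulation, rescaled to float once at the end."""
    acc = sum(x * y for x, y in zip(qa, qb))  # would be an int32 accumulator on the MCU
    return acc * scale_a * scale_b
```

Storing weights as one byte each is what makes streaming them from the SD card in small chunks practical, since only the chunk currently being multiplied has to sit in SRAM.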

Reconstructing the agent methodology: Decoupling decision-making and execution - open source [P]

r/MachineLearning · 2d ago · 8 · open source agent workflow tool

Spice is an open-source decision layer framework that sits above execution agents to make agent decision-making explicit and interpretable. It captures what was observed, options considered, reasoning for selection, trade-offs rejected, and execution outcomes—addressing a key gap where agents excel at execution but lack transparent decision-making processes. The project is early-stage but functional, installable, and designed to work with existing agents like Claude Code and other tools.

CUDA: add fast walsh-hadamard transform by am17an · Pull Request #23615 · ggml-org/llama.cpp

r/LocalLLaMA · 2d ago · 7 · inference optimization cuda open source benchmark

Discussion of FWHT (Fast Walsh-Hadamard Transform) CUDA kernel implementation for quantized KV-cache in LLM inference, with performance benchmarks across different model architectures and head sizes. Shows practical optimization work for inference speed-ups when using q8_0 quantization on different GPU architectures (RTX 5090, CDNA).
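For reference, the transform itself is short. The sketch below is the textbook iterative butterfly (unnormalized) in plain Python, shown only to illustrate what the CUDA kernel computes; it is not taken from the pull request.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform in O(n log n); n must be a power of two."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Combine pairs that are h apart within each block of size 2h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = data[i], data[i + h]
                data[i], data[i + h] = a + b, a - b
        h *= 2
    return data
```

Applying the transform twice returns the input scaled by `n`, so the inverse is the same routine followed by a division. That self-inverse property is part of why Hadamard rotations are cheap to apply before and after quantization.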

MiniCPM5-1B

r/LocalLLaMA · 2d ago · 8 · new model tool inference open source deployment

MiniCPM5-1B is a new 1B-class open-source model achieving SOTA in its weight class with built-in hybrid reasoning modes, designed for on-device deployment and resource-constrained scenarios. The release includes deployment guides for Transformers, vLLM, and SGLang, plus fine-tuning resources and newly released training datasets (Ultra-FineWeb, UltraData-Math, UltraData-SFT).

Sponsio: Deterministic Contract Layer for LLM Agents [P]

r/MachineLearning · 2d ago · 8 · tool agent deployment open source

Production-tested solution for enforcing tool-call constraints in LangGraph agents using a YAML-based contract layer that validates rules deterministically before execution. Addresses a critical failure mode where prompt engineering and post-hoc auditing fail to prevent compliance violations, with the approach open-sourced as Sponsio for community feedback.
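A deterministic pre-execution check could be as simple as the sketch below. The rule keys (`forbidden`, `max_amount`, `requires_prior`), the tool names, and `check_tool_call` are all invented for illustration and are not Sponsio's schema; the contract is written as a Python dict standing in for a parsed YAML file.

```python
# Hypothetical contract, shaped like what a parsed YAML file might contain.
CONTRACT = {
    "refund": {"max_amount": 100, "requires_prior": ["verify_identity"]},
    "delete_account": {"forbidden": True},
}


def check_tool_call(tool, args, history, contract=CONTRACT):
    """Return a list of violations; an empty list means the call may proceed."""
    rule = contract.get(tool, {})
    violations = []
    if rule.get("forbidden"):
        violations.append(f"{tool} is forbidden")
    limit = rule.get("max_amount")
    if limit is not None and args.get("amount", 0) > limit:
        violations.append(f"{tool} amount exceeds {limit}")
    for needed in rule.get("requires_prior", []):
        if needed not in history:
            violations.append(f"{tool} requires prior call to {needed}")
    return violations
```

The agent runtime would call this before dispatching each tool call and refuse to execute when the list is non-empty, so the outcome does not depend on how the model was prompted.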

MergeNB: An intuitive merge conflict resolver built for Jupyter notebooks in VS Code [P]

r/MachineLearning · 2d ago · 7 · tool open source workflow

MergeNB is a VS Code extension that improves Jupyter Notebook merging for collaborative workflows, addressing pain points with existing tools like nbdime. The tool features a web UI and plans to expand as a git mergetool, offering practical improvements for teams managing notebook-based research and development.

Thermocompute constant time inference [P]

r/MachineLearning · 3d ago · 6 · library open source inference benchmark

Thermocompute is a PyTorch library that emulates thermodynamic probabilistic computing, offering stochastic neural layers (p-bits, samplers, generative models) designed to exploit parallel hardware where inference time remains constant as layer width increases. The key technical insight is that on GPUs with available parallel capacity, thermodynamic layers can achieve flat wall-clock time scaling with width, potentially outperforming classical dense FFNs for certain workloads.
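A p-bit is usually modeled as a binary unit whose probability of being +1 follows a tanh of its input. The sketch below uses that standard formulation in plain Python; it does not use Thermocompute's API, and `pbit` and `pbit_layer` are hypothetical names.

```python
import math
import random


def pbit(input_current, rng):
    """Sample +1 or -1 with P(+1) = (1 + tanh(input_current)) / 2."""
    return 1 if math.tanh(input_current) > rng.uniform(-1.0, 1.0) else -1


def pbit_layer(inputs, weights, biases, rng):
    """One stochastic layer: each unit samples independently given the same inputs."""
    outputs = []
    for row, bias in zip(weights, biases):
        current = bias + sum(w * x for w, x in zip(row, inputs))
        outputs.append(pbit(current, rng))
    return outputs
```

Because each unit's sample depends only on the shared inputs and not on the other units in the layer, all of them can in principle be evaluated at once, which is the property behind the flat-scaling claim when spare parallel capacity exists.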

Working on a cgo-free CUDA binding in Go for ML stuff Week 3 - open source [P]

r/MachineLearning · 3d ago · 7 · library open source inference

A Go developer created a pure Go CUDA binding library (gocudrv) that eliminates cgo dependencies by loading libcuda.so at runtime using purego, enabling cross-compilation and smaller Docker images for ML workloads. The implementation uses OS thread locking to handle CUDA's per-thread context model via goroutine channels, with early support for memory allocation, kernel launches, and GPU event timing.
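The threading pattern is language-agnostic: funnel every call that needs a particular thread's state through one dedicated thread. The sketch below shows the shape of it in Python with a queue in place of Go channels; it is an analogy rather than gocudrv's code, and `PinnedWorker` is a hypothetical name.

```python
import queue
import threading


class PinnedWorker:
    """Runs every submitted function on one dedicated thread."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            job = self._jobs.get()
            if job is None:  # shutdown sentinel
                return
            func, reply = job
            try:
                reply.put((True, func()))
            except Exception as exc:
                reply.put((False, exc))

    def call(self, func):
        """Submit func, block until it has run on the worker thread, return its result."""
        reply = queue.Queue(maxsize=1)
        self._jobs.put((func, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def close(self):
        self._jobs.put(None)
        self._thread.join()
```

In Go the equivalent would be a goroutine that calls `runtime.LockOSThread()` once and then serves requests from a channel, so that thread-bound driver state is always touched from the same OS thread.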

PapersWithCode new features - week 1 [P]

r/MachineLearning · 3d ago · 7 · tool benchmark open source

Papers with Code has been revived with new features for tracking AI SOTA across domains, including multi-metric leaderboards, paper lineage tracking, method taxonomy, and ~3k model evaluations. The platform now supports external paper submissions (non-arXiv) with auto-enrichment via AI, making it a useful reference tool for staying current with model releases and benchmarks.