noisekit is an open-source tool that generates realistic degraded audio datasets from clean annotated speech data, enabling accurate STT vendor benchmarking under production conditions (phone noise, codecs, reverb). It fills a critical gap for voice agent builders by providing WER-measurable datasets that approximate real-world phone call audio rather than relying on clean studio recordings.
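For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and an STT hypothesis, divided by the number of reference words. The sketch below is a generic illustration and is not taken from noisekit; the function name and the lowercase/whitespace normalization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of five reference words.
wer = word_error_rate("turn left at the light", "turn left at light")
```

Running the same scorer over clean and degraded versions of one utterance gives a per-condition WER gap for each vendor.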
NeuroFlow is a training-free dynamic routing framework for Vision Transformers that achieves 55.8× wall-clock speedup on high-res video inference by eliminating redundant tokens via semantic surprise tracking in embedding space. The approach uses a dual-memory architecture with retinal gating and cortical caching to maintain 97%+ fidelity while achieving extreme sparsity (84% token reduction), with code and paper publicly available.
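As a rough illustration of dropping redundant tokens by tracking change in embedding space, the sketch below recomputes a token only when its embedding has drifted from a cached copy by more than a threshold. It is not NeuroFlow's algorithm: the cosine-distance criterion, the per-position cache, the threshold value, and all names are assumptions made for the example.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; treats a zero vector as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def select_surprising_tokens(cache, frame_tokens, threshold=0.1):
    """Return indices of tokens that changed enough to need recomputation.

    `cache` maps token position -> last embedding that was processed and is
    updated in place for every token that is selected.
    """
    active = []
    for idx, token in enumerate(frame_tokens):
        cached = cache.get(idx)
        if cached is None or cosine_distance(cached, token) > threshold:
            active.append(idx)
            cache[idx] = list(token)
    return active


cache = {}
first = select_surprising_tokens(cache, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
second = select_surprising_tokens(cache, [[1.0, 0.01], [1.0, 0.0], [1.0, 1.0]])
```

On the first frame every token is new, so all are selected; on the second, only the token whose embedding actually moved is recomputed.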
Cross-species neuroscience study comparing learning rules (BP, FA, PC, STDP) across human fMRI and macaque electrophysiology (V1/V2/V4/IT), finding that early visual alignment is conserved but IT alignment scales with model capacity rather than learning rule. Includes careful controls for stimulus confounds and capacity baselines, with code and companion papers provided.
Open-source 7MB autonomous driving model that learns visual navigation, lane following, and drift recovery for edge deployment on lightweight hardware. Demonstrates practical real-time inference optimization for complex perception tasks without cloud infrastructure, valuable for understanding model compression and embedded AI systems.
A practical open-source solution for efficient async RL training using sparse weight deltas instead of full model snapshots, reducing synchronization overhead from ~1TB to ~20GB per checkpoint. The approach leverages bf16 arithmetic properties where 98% of weights remain bit-equivalent between steps, enabling asynchronous weight distribution via shared storage (S3) without direct trainer-inference connectivity.
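The sparse-delta idea can be sketched in a few lines: compare the bf16 bit patterns of two checkpoints and ship only the positions that differ. This is a minimal illustration, not the project's code; the helper names are hypothetical, and bf16 is approximated here by truncating a float32 to its top 16 bits rather than rounding to nearest.

```python
import struct


def to_bf16_bits(value: float) -> int:
    """Top 16 bits of the float32 encoding (truncating bf16, an assumption)."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", value))
    return bits32 >> 16


def make_delta(old_bits, new_bits):
    """List of (index, new_bits) pairs for positions whose bits changed."""
    if len(old_bits) != len(new_bits):
        raise ValueError("checkpoints must have the same number of weights")
    return [(i, new) for i, (old, new) in enumerate(zip(old_bits, new_bits)) if old != new]


def apply_delta(bits, delta):
    """Patch a copy of the old bit patterns with the sparse delta."""
    patched = list(bits)
    for index, value in delta:
        patched[index] = value
    return patched


old = [to_bf16_bits(w) for w in [0.5, -1.25, 3.0, 0.001]]
# The update to the third weight is too small to change its bf16 encoding.
new = [to_bf16_bits(w) for w in [0.5, -1.25, 3.0001, 0.002]]
delta = make_delta(old, new)
restored = apply_delta(old, delta)
```

Here only the last weight ends up in the delta, which is the effect the summary describes: small optimizer steps often leave the low-precision encoding untouched.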
Technical guide for building a fully local speech-to-speech pipeline (VAD → STT → LLM → TTS) with Reachy Mini robot using open-source tools like llama.cpp, Parakeet, and Qwen3TTS. Demonstrates how to run conversational AI systems without cloud dependencies, with modular component swapping and customization for latency/quality tradeoffs.
Tomesphere is a free research paper discovery platform indexing 3M arXiv/OpenAlex papers with AI-generated TLDRs, peer reviews, GitHub repos, HuggingFace models, and semantic similarity search using SPECTER2 embeddings in pgvector. The semantic graph approach enables discovery of topically related papers beyond citation networks, with a Chrome extension for arXiv integration and multiple ranking modes (influential, recent, hidden gems, nearest neighbors).
WAVE is a portable GPU kernel abstraction layer that compiles to a unified binary compatible with Metal, PTX, HIP, and SYCL across Apple, NVIDIA, and AMD hardware. This solves a critical pain point for AI engineers building cross-platform systems: kernels are written once and deployed identically across diverse GPU architectures, with verified PyTorch integration.
Practical guide covering multiple inference frameworks (Transformers, llama-cpp-python, vLLM, SGLang, Ollama, etc.) for running a 27B quantized Qwen model. Includes GGUF quantization options and benchmark comparisons showing minimal accuracy degradation, useful for engineers optimizing local model deployment.
Guide for using a fine-tuned Qwen 3.5-35B variant (with reduced content restrictions) across multiple inference frameworks including Transformers, vLLM, and SGLang, with MMLU benchmark results (83.72% accuracy) and multiple quantization options available. Practical for engineers looking to deploy modified open-source models with different inference backends.
Aiki is a lightweight local tool for querying Wikipedia with custom TF-IDF retrieval and optional LLM answer generation. It demonstrates a practical RAG implementation with minimal dependencies, featuring query expansion via Wikipedia links and flexible article selection, which makes it a useful reference for building local knowledge systems.
Novel implementation of DCGAN inference on a resource-constrained RISC-V microcontroller (CH32H417) with 512KB shared SRAM, using int8 quantization, SD card weight streaming with double buffering, and a custom C inference engine whose outputs are bit-identical to PyTorch. Demonstrates practical techniques for embedded generative models on non-ARM architectures where ecosystem tools like CMSIS-NN don't exist, with creative integration of quantum entropy for latent vector seeding.
Spice is an open-source decision layer framework that sits above execution agents to make agent decision-making explicit and interpretable. It captures what was observed, options considered, reasoning for selection, trade-offs rejected, and execution outcomes, addressing a key gap where agents excel at execution but lack transparent decision-making processes. The project is early-stage but functional, installable, and designed to work with existing agents like Claude Code and other tools.
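For reference, TF-IDF retrieval of the kind mentioned above can be written with the standard library alone. The sketch below is a generic version, not Aiki's implementation; the tokenizer, the smoothed IDF formula, and the function names are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """documents: dict of title -> body. Returns per-document counts and IDF."""
    term_counts = {title: Counter(tokenize(body)) for title, body in documents.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    n_docs = len(documents)
    # Smoothed IDF so that no term gets a zero or negative weight.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    return term_counts, idf


def rank(query, term_counts, idf, top_k=3):
    """Score each document by summed TF * IDF over the query terms."""
    query_terms = tokenize(query)
    scores = {}
    for title, counts in term_counts.items():
        total = sum(counts.values()) or 1
        score = sum((counts[t] / total) * idf.get(t, 0.0) for t in query_terms)
        if score > 0:
            scores[title] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


docs = {
    "Alan Turing": "Alan Turing was a mathematician and computer scientist",
    "Python": "Python is a programming language",
    "Enigma": "The Enigma machine was a cipher device",
}
counts, idf = build_index(docs)
results = rank("who was Turing", counts, idf)
```

The top-ranked bodies would then be passed to the optional LLM step as context.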
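To make the int8 step concrete, here is a minimal sketch of symmetric per-tensor quantization with integer accumulation. It is written in Python for readability and does not reproduce the project's C engine; the choice of a single scale per tensor and the function names are assumptions.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one scale per tensor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 codes."""
    return [q * scale for q in quantized]


def int8_dot(q_a, scale_a, q_b, scale_b):
    """Dot product accumulated in integers, rescaled to float once at the end."""
    acc = sum(a * b for a, b in zip(q_a, q_b))
    return acc * scale_a * scale_b


weights = [0.5, -1.0, 0.25, 0.8]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

Keeping the accumulation in integers is what makes results reproducible across platforms, since integer addition does not depend on floating-point summation order.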
Discussion of FWHT (Fast Walsh-Hadamard Transform) CUDA kernel implementation for quantized KV-cache in LLM inference, with performance benchmarks across different model architectures and head sizes. Shows practical optimization work for inference speed-ups when using q8_0 quantization on different GPU architectures (RTX 5090, CDNA).
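For context, the transform itself is short. The sketch below is a plain, unnormalized reference version of the Fast Walsh-Hadamard Transform using the standard butterfly structure; it is not the CUDA kernel under discussion, and the function name is an assumption.

```python
def fwht(values):
    """Unnormalized Fast Walsh-Hadamard Transform; length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly stage: combine elements that are h apart within blocks of 2h.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


transformed = fwht([1, 0, 1, 0, 0, 1, 1, 0])
# Applying the unnormalized transform twice returns the input scaled by n.
round_trip = fwht(transformed)
```

The transform needs only additions and subtractions in log2(n) stages, which is why it maps well onto a GPU kernel operating on one attention head at a time.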
MiniCPM5-1B is a new 1B-class open-source model achieving SOTA in its weight class with built-in hybrid reasoning modes, designed for on-device deployment and resource-constrained scenarios. The release includes deployment guides for Transformers, vLLM, and SGLang, plus fine-tuning resources and newly released training datasets (Ultra-FineWeb, UltraData-Math, UltraData-SFT).
Production-tested solution for enforcing tool-call constraints in LangGraph agents using a YAML-based contract layer that validates rules deterministically before execution. Addresses a critical failure mode in which prompt engineering and post-hoc auditing fail to prevent compliance violations; the approach is open-sourced as Sponsio for community feedback.
MergeNB is a VS Code extension that improves Jupyter Notebook merging for collaborative workflows, addressing pain points with existing tools like nbdime. The tool features a web UI and plans to expand as a git mergetool, offering practical improvements for teams managing notebook-based research and development.
Thermocompute is a PyTorch library that emulates thermodynamic probabilistic computing, offering stochastic neural layers (p-bits, samplers, generative models) designed to exploit parallel hardware where inference time remains constant as layer width increases. The key technical insight is that on GPUs with available parallel capacity, thermodynamic layers can achieve flat wall-clock time scaling with width, potentially outperforming classical dense FFNs for certain workloads.
gocudrv is a pure Go CUDA binding library that eliminates cgo dependencies by loading libcuda.so at runtime using purego, enabling cross-compilation and smaller Docker images for ML workloads. The implementation handles CUDA's per-thread context model by locking goroutines to OS threads and communicating with them over channels, with early support for memory allocation, kernel launches, and GPU event timing.
Papers with Code has been revived with new features for tracking AI SOTA across domains, including multi-metric leaderboards, paper lineage tracking, method taxonomy, and ~3k model evaluations. The platform now supports external paper submissions (non-arXiv) with auto-enrichment via AI, making it a useful reference tool for staying current with model releases and benchmarks.