EAMS presents an Equivariant Mesh Neural Network framework for robust anatomical mesh segmentation across medical imaging tasks (dental, liver, aneurysm), maintaining performance under geometric perturbations like patient pose variation where standard methods degrade by 25+ IoU points. The work combines intrinsic mesh descriptors with anatomy-aware PCA-derived priors in a lightweight (<2M parameter) architecture, demonstrating that equivariance principles from molecular modeling transfer effectively to 3D medical mesh tasks despite trade-offs in capturing subtle asymmetric features.
MOSS-TTS-v1.5 expands multilingual text-to-speech capabilities to 31 languages with improved performance through FlashAttention 2 support and optimized dependencies. The update maintains backward compatibility with v1.0 while adding support for languages like Cantonese, Hindi, Thai, and Vietnamese, with straightforward installation and generation APIs.
Thermocompute is a PyTorch library that emulates thermodynamic probabilistic computing, offering stochastic neural layers (p-bits, samplers, generative models) designed to exploit parallel hardware where inference time remains constant as layer width increases. The key technical insight is that on GPUs with available parallel capacity, thermodynamic layers can achieve flat wall-clock time scaling with width, potentially outperforming classical dense FFNs for certain workloads.
A Go developer created a pure Go CUDA binding library (gocudrv) that eliminates cgo dependencies by loading libcuda.so at runtime using purego, enabling cross-compilation and smaller Docker images for ML workloads. The implementation uses OS thread locking to handle CUDA's per-thread context model via goroutine channels, with early support for memory allocation, kernel launches, and GPU event timing.
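The thread-pinning idea can be illustrated outside Go. The following is a minimal sketch in Python, not gocudrv's code: the `PinnedWorker` class and its method names are hypothetical, and a queue stands in for the goroutine channel. It only shows the general pattern of routing every call through one dedicated thread so that thread-bound state (such as a CUDA context) is always touched from the same place.

```python
import queue
import threading


class PinnedWorker:
    """Hypothetical helper: runs every submitted call on one dedicated thread."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # All work is executed here, so thread-local state stays consistent.
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel posted by close()
                break
            fn, args, reply = job
            try:
                reply.put((True, fn(*args)))
            except Exception as exc:  # hand the error back to the caller
                reply.put((False, exc))

    def call(self, fn, *args):
        """Submit fn(*args) to the pinned thread and block until it returns."""
        reply = queue.Queue(maxsize=1)
        self._jobs.put((fn, args, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def close(self):
        self._jobs.put(None)
        self._thread.join()
```

Calling `worker.call(threading.get_ident)` repeatedly returns the same thread id each time, which is the property a per-thread context model needs. In Go, the worker goroutine would presumably also call `runtime.LockOSThread` so the scheduler cannot migrate it.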
SM1 (Scalar Mamba1) implements a closed-form solution for state-space models with d_state=1 using pure PyTorch operations, eliminating the selective scan bottleneck and reducing memory by 16x compared to standard Mamba implementations. The author demonstrates practical benefits: training a 130M parameter model on MIDI data with minimal memory footprint (56KB state, no KV cache) on consumer hardware, highlighting that scalar state dimensions can be sufficient when token representations already encode structure.
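To see why a scalar state admits a closed form, consider the generic recurrence h_t = a_t * h_{t-1} + b_t. The sketch below is an illustration under that assumption, not SM1's actual code: the function names are hypothetical, and it uses plain Python so it runs without dependencies. It checks that cumulative products and sums reproduce the step-by-step scan.

```python
from itertools import accumulate
import operator


def scan_sequential(a, b):
    """Reference recurrence: h_t = a_t * h_{t-1} + b_t, starting from h = 0."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def scan_closed_form(a, b):
    """Same values using only cumulative operations.

    With P_t = a_0 * ... * a_t, the recurrence unrolls to
    h_t = P_t * sum_{s <= t} b_s / P_s. Requires every a_t to be nonzero.
    """
    p = list(accumulate(a, operator.mul))                    # cumulative product
    s = list(accumulate(b_t / p_t for b_t, p_t in zip(b, p)))  # cumulative sum
    return [p_t * s_t for p_t, s_t in zip(p, s)]


if __name__ == "__main__":
    a = [0.9, 0.5, 0.8, 0.7]
    b = [1.0, 2.0, -1.0, 0.5]
    print(scan_sequential(a, b))
    print(scan_closed_form(a, b))
```

In a tensor library the two `accumulate` calls would map to `torch.cumprod` and `torch.cumsum`. Dividing by a shrinking cumulative product becomes numerically fragile on long sequences, so a practical version would likely work in log space or in chunks.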
Datasette Agent is a new conversational AI assistant that lets users query data stored in Datasette using natural language, with LLM-powered SQL generation and an extensible plugin architecture. The tool integrates with modern LLMs (Gemini, Claude, local models) for reliable tool calling and SQL generation, and includes plugins for charts and other functionality. This represents a practical fusion of data querying and LLM agents with immediate applicability for engineers working with databases and AI.
An engineer open-sourced NOML, a custom RL algorithm for continuous control that addresses instability in flight simulation by combining an anchor policy (a safe fallback action), a hierarchical actor architecture (independent MLP heads per control axis), and mirror learning for data efficiency. The approach diverges from standard TD3 by eliminating exploration noise, relying on structural constraints rather than reward shaping to maintain stability.
Witchcraft is a Rust-based semantic search engine for client-side deployment using SQLite, achieving 20ms latency without external APIs or vector databases. It includes Pickbrain, a CLI tool that indexes Claude/Codex transcripts and documents for semantic search with direct session resumption, plus skills for both AI platforms to maintain cross-session memory.
PaddleOCR 3.5 now supports Transformers as a backend, enabling easier integration of OCR and document parsing into Hugging Face-centered workflows. This addresses document ingestion for RAG and Document AI pipelines by allowing developers to run PP-OCRv5 and PaddleOCR-VL models with flexible backend selection through a simple engine parameter.
Orthrus achieves 7.8× tokens-per-frame speedup by injecting a trainable diffusion attention module into frozen AR Transformer layers, maintaining exact output distribution while freezing backbone weights and outperforming existing diffusion LMs and speculative decoding methods. The approach trains only 16% of parameters on <1B tokens, eliminates external drafter overhead, and achieves 11.7 mean acceptance length on MATH-500 with zero TTFT penalty.
Cola DLM is a new hierarchical continuous latent-space diffusion language model from ByteDance that combines a Text VAE with a block-causal Diffusion Transformer, using Flow Matching for latent prior transport. The documentation provides integration guides for Transformers, vLLM, SGLang, and Docker deployment, along with benchmark results and an OpenAI-compatible API adapter for experimentation.
A new Datasette plugin enables spending limit controls for LLM usage, integrating with datasette-llm and datasette-llm-accountant to manage per-user or global cost caps. This addresses practical cost management for developers building LLM applications within Datasette environments.
A novel Vision Transformer backbone uses block-sparse core-periphery attention to reduce attention complexity from O(N²) to O(2NC + C²), and is trained with nested dropout so that inference cost can be adjusted elastically. It achieves accuracy competitive with DINOv3, remains stable across resolutions (256-1024), and exhibits emergent attention patterns.
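The O(2NC + C²) figure can be sanity-checked by counting allowed query-key pairs. The sketch below rests on assumptions that are not confirmed by the summary: N is taken to be the number of periphery tokens, C the number of core tokens, and periphery tokens are assumed to attend only to core tokens (not to each other or themselves). The function names are hypothetical.

```python
def core_periphery_mask(n_periphery, n_core):
    """Boolean attention mask; indices 0..n_core-1 are the core tokens.

    Assumed rule: a (query, key) pair is allowed when at least one of the
    two tokens belongs to the core.
    """
    total = n_core + n_periphery
    return [
        [(q < n_core) or (k < n_core) for k in range(total)]
        for q in range(total)
    ]


def allowed_pairs(mask):
    """Number of query-key pairs the mask lets through."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    n, c = 6, 2
    mask = core_periphery_mask(n, c)
    print(allowed_pairs(mask), 2 * n * c + c * c, (n + c) ** 2)
```

Under these assumptions the count equals (N + C)² minus the N² periphery-periphery block, which is exactly 2NC + C², so the cost grows linearly in N for a fixed core size.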
A minimal PyTorch implementation of JEPA (Joint-Embedding Predictive Architecture) algorithms, at 160-200 lines, strips away scaling complexities to expose the core mathematical concepts. It includes tutorial documentation that maps algorithm theory directly to implementation, making it useful for understanding self-supervised learning approaches.
Parax v0.7 is a JAX library that bridges functional PyTree-based modeling with object-oriented approaches, offering derived parameters, computed PyTrees, and abstract interfaces for constrained optimization and probabilistic sampling. The release includes polished APIs and practical examples for bounded optimization (JAXopt) and Bayesian sampling (BlackJAX), making it valuable for engineers building probabilistic ML systems in JAX.
A new Python library wraps NumPy operations with mathematical expression syntax, using C++/pybind11 for performance. While it provides cleaner notation for complex vectorized operations, it is early-stage and represents an ergonomic enhancement rather than a fundamental capability addition for AI engineers.
FormalSLT is a machine-verified Lean 4 library implementing core statistical learning theory results (VC bounds, PAC-Bayes, algorithmic stability) with 45 modules and zero unproven statements, providing formally certified generalization bounds for AI practitioners who need mathematically rigorous foundations. The library bridges the gap between paper proofs and executable code by encoding hypotheses and finite-sample assumptions directly into theorem signatures.
This guide covers z-lab/gemma-4-26B-A4B-it-DFlash, a speculative decoding drafter model that achieves up to 3.7x speedup through parallel token drafting via block diffusion. It includes integration instructions for Transformers, vLLM, SGLang, and Docker, along with performance benchmarks on NVIDIA B300 GPUs.
TritonSigmoid is an open-source GPU kernel implementing sigmoid attention with native padding awareness, achieving 515 TFLOPS on H100 and outperforming softmax/FlashAttention on variable-length sequences. Designed for single-cell biology models where multi-token attention is semantically required, it demonstrates both computational efficiency and empirical improvements in loss and representation quality across benchmarks.
This post explores how the TRE regex engine handles ReDoS attacks better than the regex module in Python's standard library, using Claude Code to build experimental Python bindings and test malicious regex patterns. It demonstrates the practical security benefits of backtracking-free regex implementations for AI engineers building systems that process untrusted regex inputs.
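For readers unfamiliar with ReDoS, the sketch below shows the kind of pattern involved, using only Python's standard `re` module (a backtracking engine). It does not use TRE or the experimental bindings from the post; the pattern `(a+)+$` and the `probe` helper are illustrative choices. Input sizes are kept small on purpose, since match time typically grows very quickly with each extra character.

```python
import re
import time

# Nested quantifiers: many different ways to split a run of "a" characters.
EVIL = re.compile(r"(a+)+$")


def probe(n):
    """Match against n 'a' characters followed by 'b'; return (match, seconds)."""
    text = "a" * n + "b"  # the trailing 'b' forces the overall match to fail
    start = time.perf_counter()
    result = EVIL.match(text)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for n in (10, 14, 18):
        _, seconds = probe(n)
        print(f"n={n}: {seconds:.4f}s")
```

The match always fails, but a backtracking engine tries a large number of groupings before giving up. Engines that avoid backtracking are designed to sidestep this failure mode, which is the property the post examines.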