OpenAI Blog · 10h ago · 6 · agent workflow api update

Case study on building a self-improving tax agent using OpenAI's Codex for automating tax filings and improving accuracy through iterative refinement. Demonstrates practical application of code generation models to domain-specific workflow automation.

r/MachineLearning · 11h ago · 7 · open source inference deployment

Open-source 7MB autonomous driving model that learns visual navigation, lane following, and drift recovery for edge deployment on lightweight hardware. Demonstrates practical real-time inference optimization for complex perception tasks without cloud infrastructure, valuable for understanding model compression and embedded AI systems.

r/MachineLearning · 12h ago · 6 · research benchmark

A researcher shares a GNN fraud-detection implementation on the IEEE-CIS dataset that reaches only suboptimal performance (AUC 0.87, PR-AUC 0.52) across multiple architectures (GCN, GraphSAGE, GAT). This is practical ML engineering content with specific technical challenges but no novel insights; it is relevant mainly for learning what not to do and for potential debugging approaches.
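To make the gap between the two reported metrics concrete, here is a minimal sketch that computes ROC-AUC and a PR-AUC estimate (average precision) on a small imbalanced toy ranking. The data and the function names `roc_auc` and `average_precision` are illustrative assumptions and have nothing to do with the researcher's actual pipeline; average precision is only one common way of summarizing the precision-recall curve.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits


# Toy data (hypothetical): 2 positives among 10 items, ranked 2nd and 5th.
labels = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(round(roc_auc(labels, scores), 2))            # 0.75
print(round(average_precision(labels, scores), 2))  # 0.45
```

In this toy case a few negatives ranked above the positives barely dent ROC-AUC but cut precision sharply, which is one reason the two numbers can diverge on rare-positive problems.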

r/MachineLearning · 1d ago · 7 · research library benchmark

EAMS presents an Equivariant Mesh Neural Network framework for robust anatomical mesh segmentation across medical imaging tasks (dental, liver, aneurysm), maintaining performance under geometric perturbations like patient pose variation where standard methods degrade by 25+ IoU points. The work combines intrinsic mesh descriptors with anatomy-aware PCA-derived priors in a lightweight (<2M parameter) architecture, demonstrating that equivariance principles from molecular modeling transfer effectively to 3D medical mesh tasks despite trade-offs in capturing subtle asymmetric features.

r/MachineLearning · 1d ago · 8 · tool open source rag

Tomesphere is a free research paper discovery platform indexing 3M arxiv/OpenAlex papers with AI-generated TLDRs, peer reviews, GitHub repos, HuggingFace models, and semantic similarity search using SPECTER2 embeddings in pgvector. The semantic graph approach enables discovery of topically related papers beyond citation networks, with a Chrome extension for arxiv integration and multiple ranking modes (influential, recent, hidden gems, nearest neighbors).

Simon Willison · 1d ago · 7 · agent security prompt engineering

Microsoft Copilot Cowork contained a critical security vulnerability where agentic systems could exfiltrate files through unapproved email messages with external image requests and pre-authenticated OneDrive links. This highlights a major design challenge in building safe autonomous agents: preventing prompt injection attacks from enabling data theft while maintaining agent autonomy.

r/MachineLearning · 1d ago · 7 · research agent inference

A technical essay critiques reasoning models' ability to perform faithful inference, arguing that jointly-generated reasoning traces and final answers lack genuine separation of concerns. The piece engages empirically with recent work (Lanham/Turpin/Mirzadeh) and compares architectural approaches (HRM, TRM, GRAM, AlphaProof, Kona/Aleph), offering conceptual framing around constraints vs. influence that's relevant for engineers building reasoning systems.

r/LocalLLaMA · 1d ago · 7 · new model library inference

MOSS-TTS-v1.5 expands multilingual text-to-speech capabilities to 31 languages with improved performance through FlashAttention 2 support and optimized dependencies. The update maintains backward compatibility with v1.0 while adding support for languages like Cantonese, Hindi, Thai, and Vietnamese, with straightforward installation and generation APIs.

r/MachineLearning · 1d ago · 9 · tool open source inference deployment

WAVE is a portable GPU kernel abstraction layer that compiles to a unified binary compatible with Metal, PTX, HIP, and SYCL across Apple, NVIDIA, and AMD hardware. This solves a critical pain point for AI engineers building cross-platform systems—write kernels once and deploy identically across diverse GPU architectures with verified PyTorch integration.

r/MachineLearning · 1d ago · 6 · workflow

A Reddit discussion asking for ML/AI community recommendations focused on deep technical work—papers, training dynamics, model debugging, and infrastructure challenges rather than LLM API projects. The post seeks spaces for sharing specific technical problems (e.g., anomalies in SSL training) and receiving substantive expert feedback.

r/LocalLLaMA · 1d ago · 7 · tutorial inference open source deployment

Practical guide covering multiple inference frameworks (Transformers, llama-cpp-python, vLLM, SGLang, Ollama, etc.) for running a 27B quantized Qwen model. Includes GGUF quantization options and benchmark comparisons showing minimal accuracy degradation, useful for engineers optimizing local model deployment.

r/LocalLLaMA · 1d ago · 6 · open source inference deployment fine tuning

Guide for using a fine-tuned Qwen 3.5-35B variant (with reduced content restrictions) across multiple inference frameworks including Transformers, vLLM, and SGLang, with MMLU benchmark results (83.72% accuracy) and multiple quantization options available. Practical for engineers looking to deploy modified open-source models with different inference backends.

r/MachineLearning · 1d ago · 7 · tool open source rag

Aiki is a lightweight local tool for querying Wikipedia with custom TF-IDF retrieval and optional LLM answer generation. It demonstrates practical RAG implementation with minimal dependencies, featuring query expansion via Wikipedia links and flexible article selection—useful reference for building local knowledge systems.
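As a rough illustration of the TF-IDF retrieval step mentioned above, the following sketch ranks a handful of in-memory documents against a query using cosine similarity over TF-IDF vectors. It is not Aiki's code: the function names (`tokenize`, `build_index`, `search`), the smoothed IDF formula, and the sample documents are all assumptions chosen for brevity, and query expansion via Wikipedia links is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs maps title -> text. Returns (unit-length TF-IDF vectors, idf table)."""
    counts = {title: Counter(tokenize(text)) for title, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    # Smoothed IDF (an assumed variant): rarer terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {}
    for title, term_counts in counts.items():
        vec = {t: c * idf[t] for t, c in term_counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[title] = {t: w / norm for t, w in vec.items()}
    return vectors, idf


def search(query, vectors, idf, top_k=3):
    """Return up to top_k (title, cosine score) pairs with a non-zero score."""
    q_counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: c * idf[t] for t, c in q_counts.items()}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values())) or 1.0
    scored = []
    for title, vec in vectors.items():
        score = sum(w * vec.get(t, 0.0) for t, w in q_vec.items()) / q_norm
        if score > 0:
            scored.append((title, score))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Hypothetical mini-corpus standing in for downloaded articles.
docs = {
    "Python (language)": "Python is a programming language",
    "Pythonidae": "A python is a large snake",
    "Paris": "Paris is the capital of France",
}
vectors, idf = build_index(docs)
top = search("capital of France", vectors, idf)
```

The retrieved titles could then be passed to an optional LLM step as context; that part is omitted here.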

r/MachineLearning · 1d ago · 6 · benchmark research

Critical analysis of METR's widely-cited AI capability benchmark, exposing methodological flaws including biased sampling (METR employees' peers), perverse incentives (hourly pay encouraging slower completion), unmeasured baselines, and likely training data contamination. Highlights systemic issues in AI research evaluation practices that engineers should be aware of when assessing capability claims.

r/MachineLearning · 1d ago · 8 · inference open source deployment quantization

Novel implementation of DCGAN inference on a resource-constrained RISC-V microcontroller (CH32H417) with 512KB of shared SRAM, using int8 quantization, SD card weight streaming with double buffering, and a custom C inference engine whose outputs are bit-identical to PyTorch. Demonstrates practical techniques for embedded generative models on non-ARM architectures where ecosystem tools like CMSIS-NN don't exist, with creative integration of quantum entropy for latent vector seeding.
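For readers unfamiliar with int8 quantization, the sketch below shows one common variant: symmetric per-tensor scaling, where a single float scale maps weights into the range [-127, 127]. The post does not say which scheme the project uses, so the scheme, the function names `quantize_int8` and `dequantize`, and the sample weights are assumptions for illustration only.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: returns (int8 values, float scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values."""
    return [q * scale for q in quantized]


# Hypothetical weights, not taken from the project.
weights = [-1.0, 0.0, 0.25, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each restored value differs from its original by at most half a quantization step (`scale / 2`), which is the rounding error the int8 representation trades for a 4x size reduction over float32.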

r/MachineLearning · 1d ago · 8 · open source agent workflow tool

Spice is an open-source decision layer framework that sits above execution agents to make agent decision-making explicit and interpretable. It captures what was observed, options considered, reasoning for selection, trade-offs rejected, and execution outcomes—addressing a key gap where agents excel at execution but lack transparent decision-making processes. The project is early-stage but functional, installable, and designed to work with existing agents like Claude Code and other tools.

r/LocalLLaMA · 1d ago · 7 · inference optimization cuda open source benchmark

Discussion of an FWHT (Fast Walsh-Hadamard Transform) CUDA kernel implementation for the quantized KV-cache in LLM inference, with performance benchmarks across different model architectures and head sizes. Shows practical optimization work for inference speed-ups when using q8_0 quantization on different GPU architectures (RTX 5090, CDNA).
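The transform itself is compact enough to show in a few lines. This is a plain Python reference version of the unnormalized Fast Walsh-Hadamard Transform, intended only to show the butterfly structure that a GPU kernel would parallelize; it is not the CUDA kernel from the discussion, and the function name `fwht` is an assumption.

```python
def fwht(values):
    """Unnormalized Fast Walsh-Hadamard Transform of a power-of-two-length list."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    out = list(values)
    h = 1
    while h < n:
        # Each pass combines elements that are h apart with a sum/difference butterfly.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))  # [4, 2, 0, -2, 0, 2, 0, 2]
```

The transform needs only additions and subtractions and runs in O(n log n). Applying it twice returns the input scaled by n, so the inverse is the same routine followed by a division by n.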

r/MachineLearning · 2d ago · 6 · inference fine tuning deployment research

Call for papers for the 2nd Workshop on Efficient Reasoning at COLM 2026, covering practical topics like inference optimization (pruning, compression, KV-cache), efficient training/fine-tuning, and deployment of reasoning systems under resource constraints. Relevant for engineers working on cost-effective LLM inference and on-device reasoning, though this is primarily a conference submission announcement rather than technical content.

r/LocalLLaMA · 2d ago · 8 · new model tool inference open source deployment

MiniCPM5-1B is a new 1B-class open-source model achieving SOTA in its weight class with built-in hybrid reasoning modes, designed for on-device deployment and resource-constrained scenarios. The release includes deployment guides for Transformers, vLLM, and SGLang, plus fine-tuning resources and newly released training datasets (Ultra-FineWeb, UltraData-Math, UltraData-SFT).

r/LocalLLaMA · 2d ago · 7 · tool inference deployment tutorial

Practical guide for running MiMo-V2.5-coder-Q2, a quantized coding model optimized for Apple Silicon, across multiple inference frameworks (llama.cpp, vLLM, Ollama, etc.). Includes specific configurations for 128GB M5 systems and fallback strategies for memory-constrained setups, directly applicable for engineers deploying local coding assistants.