r/MachineLearning · 1d ago · 7 · research benchmark inference workflow

Deep technical analysis of SSM (State Space Model) vs Transformer performance constraints from OpenAI's Parameter Golf competition, revealing that SSMs have fundamental compression disadvantages (3.26x worse LZMA compression on weights) in size-constrained regimes. Includes kernel-level optimization experiments on Mamba-3 Triton kernels and practical findings on mixed-precision techniques that recovered 0.8 mBPB.

r/MachineLearning · 1d ago · 7 · benchmark tool inference workflow

AutoBe introduces a structured benchmark for end-to-end backend generation using AST-based function calling rather than unstructured code generation, with deterministic static analysis scoring. Key finding: smaller/cheaper models (qwen3.5-27b, local models) achieve competitive results with frontier models when using well-structured harnesses, suggesting harness design matters more than model size for backend generation tasks.

r/LocalLLaMA · 1d ago · 8 · inference optimization open source tool

A Pull Request implementing Multi Token Prediction (MTP) head support in llama.cpp, enabling speculative decoding with ~2.5x speedup and 75% token acceptance rates on Qwen3.6 models. The implementation optimizes host-device data transfers and is designed to work with any MTP-capable model, with working examples and performance benchmarks provided.

r/LocalLLaMA · 2d ago · 7 · research fine tuning inference open source

Developer shares work on a reverse LLM sidecar architecture that improves code generation in small models (1.7B-9B) by reading outputs end-to-start and injecting feedback loops focused on syntax correction. The approach shows promise on HumanEval benchmarks and code is being cleaned up for GitHub release.

OpenAI Blog · 2d ago · 7 · inference deployment workflow

OpenAI details architectural improvements to their WebRTC implementation for real-time voice AI, focusing on latency optimization and conversation management. This provides practical insights into building low-latency audio systems for AI applications, relevant for engineers implementing real-time voice features.

r/MachineLearning · 2d ago · 8 · tool inference open source benchmark

A proof-of-concept leveraging idle NVENC hardware on GPUs to compress neural network intermediate states (activations, KV cache) for PCIe transfer, achieving ~180 GB/s effective bandwidth on consumer GPUs like the RTX 5090—effectively recovering NVLink-class performance through hardware-pipelined codec operations that hide behind compute.

r/MachineLearning · 2d ago · 6 · tutorial workflow agent

Developer shares practical experience implementing Behavior Cloning on a game environment, covering action space remapping, trajectory alignment, and LSTM evaluation challenges. While this demonstrates real reinforcement learning workflow problems (BC→PPO transition, partial observability), it's primarily a case study rather than introducing new techniques or tools.

Simon Willison · 2d ago · 6 · research benchmark

Anthropic released research on Claude's sycophancy behavior across different domains, finding it exhibits problematic deference in 38% of spirituality conversations and 25% of relationship discussions, while maintaining critical pushback in most other contexts. This is relevant for engineers building with Claude to understand behavioral biases and potential limitations when using the model for sensitive advice or guidance tasks.

r/MachineLearning · 2d ago · 7 · research open source inference

Engineer demonstrates language model-based source code compression using n-gram models + arithmetic coding, achieving 82.4% compression (0.176x ratio) on Flask codebase—33% better than zlib but 1600× slower. The work showcases how token-level modeling captures syntactic patterns better than byte-level compressors, with practical implications for downstream transformer/LSTM approaches and batch optimization.

r/MachineLearning · 3d ago · 6 · workflow api update library

A developer encounters a breaking change in the Hugging Face Transformers library where the 'question-answering' pipeline task has been deprecated, and seeks alternatives for zero-shot extractive QA on text. The post highlights a practical workflow issue: the code previously used `pipeline('question-answering')` no longer works, and available alternatives like 'document-question-answering' don't fit text-only use cases.

r/MachineLearning · 3d ago · 7 · research rag fine tuning inference

Experimental work on augmenting frozen transformers with lightweight external memory for in-context adaptation without weight updates. Uses forward-pass derived correction vectors to enable one-shot binding of new facts while maintaining context separation, with results showing 80%+ accuracy on same-context recall but degraded generalization to new contexts.

r/MachineLearning · 3d ago · 8 · workflow tutorial

A discussion thread addressing the common blocker of content consumption without practical application—exploring how to transition from learning AI concepts to independently building systems. The conversation likely covers project-based learning strategies, determining necessary depth in math/theory, and developing the problem-solving mindset needed for real-world engineering rather than tutorial-following.

r/MachineLearning · 3d ago · 8 · research agent open source benchmark inference

A minimal research implementation of Meta AI's test-time compute scaling paper (PDR+RTV pipeline) for agentic coding tasks, enabling developers to experiment with the approach using Gemini 3.1 Pro on SWE-bench. This is the first public implementation of the paper's core techniques, making it immediately useful for engineers exploring advanced reasoning strategies in coding agents.

r/MachineLearning · 3d ago · 5 · research

Discussion thread exploring practical applications of physics-informed neural networks (PINNs) and physics-informed AI beyond academia. The post raises valid questions about deployment in real industries but is primarily a question seeking examples rather than showcasing actual technical implementations or breakthroughs.

r/MachineLearning · 4d ago · 7 · open source agent rag tutorial

OpenVidya is an open-source multi-agent AI system for curriculum-aware lesson generation tailored to Indian education (NCERT/CBSE), featuring concept dependency graphs, exam-pattern grounding, and five pedagogical modes with mode-specific prompting. The project demonstrates practical application of agentic AI and RAG patterns for domain-specific education, with structured curriculum integration as a reusable architecture pattern.

Simon Willison · 4d ago · 7 · tool workflow prompt engineering

Developer built a complete web app entirely on mobile using Claude Code, demonstrating a practical AI-assisted workflow: created a Python CLI tool, set up Git scraping automation, and generated a JavaScript frontend with a single LLM prompt. Shows how Claude can handle multi-layer full-stack development from local tooling to cloud-hosted APIs.