r/MachineLearning · 3d ago · 6 · tool rag workflow

A developer built a Steam game recommender system using custom vector embeddings to capture nuanced game characteristics (gameplay focus, music, vibe) instead of broad tags, enabling more personalized recommendations and discovery of underrated games. The project uses a database-driven approach with explanations for each recommendation and includes an advanced mode for fine-tuned filtering.

r/MachineLearning · 3d ago · 9 · new model benchmark inference open source

TabPFN-3 is a major update to the tabular foundation model, enabling 1M-row inference on a single H100 with 10-1000x faster inference and a novel thinking mode for test-time compute. The model achieves a 93% win rate over classical ML baselines and delivers significant gains in speed, scale, and multi-class support through architectural changes such as row-chunked inference and KV caching.
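The row-chunking idea can be sketched in a few lines. Everything below is a toy stand-in: the real model caches transformer KV states for the training context rather than a centroid, and the predictor is hypothetical; only the memory-shape argument carries over.

```python
import numpy as np

def fit_context(x_train):
    """One expensive pass over the training rows. A real transformer would
    cache the KV states for this context; this toy caches a centroid."""
    return x_train.mean(axis=0)

def predict_chunked(centroid, x_test, chunk_size=1024):
    """Score test rows chunk by chunk, reusing the cached context each time,
    so peak memory scales with chunk_size rather than the full test set."""
    scores = []
    for start in range(0, len(x_test), chunk_size):
        chunk = x_test[start:start + chunk_size]
        # Toy scorer: negative distance to the cached centroid.
        scores.append(-np.linalg.norm(chunk - centroid, axis=1))
    return np.concatenate(scores)

x_train = np.random.default_rng(1).normal(size=(100, 8))
scores = predict_chunked(fit_context(x_train), x_train, chunk_size=16)
```

Chunking changes memory use, not results: any chunk size yields the same predictions as a single full-batch pass.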

r/MachineLearning · 3d ago · 7 · research benchmark open source

Research finds that the ratio of MLP to attention spectral norms in decoder transformers predicts rank collapse in the final layers, with stability best maintained when the ratio stays in the 0.5-2 range. This offers actionable guidance for architecture design and debugging, and ships with an open-source implementation for analysis.
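A hedged sketch of how such a diagnostic might be computed from raw weight matrices; the paper's exact norm computation, and which matrices it compares per block, are assumptions here.

```python
import numpy as np

def spectral_norm(w: np.ndarray) -> float:
    """Largest singular value of a weight matrix."""
    return float(np.linalg.svd(w, compute_uv=False)[0])

def mlp_attn_ratio(w_mlp: np.ndarray, w_attn: np.ndarray) -> float:
    """Ratio of MLP to attention spectral norms for one decoder block."""
    return spectral_norm(w_mlp) / spectral_norm(w_attn)

# Toy check: Gaussian layers of similar scale land near ratio 1,
# inside the 0.5-2 band the post associates with stable final layers.
rng = np.random.default_rng(0)
ratio = mlp_attn_ratio(rng.normal(size=(512, 128)), rng.normal(size=(128, 128)))
```

In practice one would track this ratio per layer across training and flag layers drifting outside the band.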

r/LocalLLaMA · 3d ago · 6 · tool open source benchmark

An open-source evaluation tool for distributed LLM assessment that supports multiple grading methods (LLM-based, regex, custom scripts) and distributes tasks across machines. The tool enables engineers to evaluate model outputs at scale, though discussions raise concerns about the reliability of LLM self-grading and regex false negatives.

r/MachineLearning · 3d ago · 6 · tool benchmark inference

Engineer seeks specialized cache simulation tools for LLM prompt caching workloads with multi-tier hierarchies, token-weighted objects, and edit-driven traces—current options like libCacheSim don't model the cost/residency structure of systems like Anthropic's tiered prompt cache. This is a technical community question surfacing a real gap in tooling for LLM inference optimization and cache policy research.
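To make the gap concrete, here is a minimal toy simulator with token-weighted residency, one of the properties the poster says existing tools miss. The eviction policy and API are hypothetical illustrations, not Anthropic's actual cache design.

```python
from collections import OrderedDict

class TokenWeightedLRU:
    """Minimal cache simulator where residency is charged per token,
    not per object -- the cost structure generic simulators don't model."""

    def __init__(self, token_budget: int):
        self.budget = token_budget
        self.entries = OrderedDict()  # prefix_id -> token_count
        self.used = 0
        self.hits = self.misses = 0

    def access(self, prefix_id: str, token_count: int) -> bool:
        if prefix_id in self.entries:
            self.hits += 1
            self.entries.move_to_end(prefix_id)  # refresh recency
            return True
        self.misses += 1
        # Evict least-recently-used prefixes until the tokens fit.
        while self.used + token_count > self.budget and self.entries:
            _, evicted_tokens = self.entries.popitem(last=False)
            self.used -= evicted_tokens
        if token_count <= self.budget:
            self.entries[prefix_id] = token_count
            self.used += token_count
        return False

sim = TokenWeightedLRU(token_budget=1000)
for pid, tokens in [("sys", 400), ("doc-a", 500), ("sys", 400), ("doc-b", 300)]:
    sim.access(pid, tokens)
```

A real study would replay edit-driven traces through variants of this loop and add per-tier read/write costs, which is exactly the part the poster found missing from tools like libCacheSim.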

Latent Space · 4d ago · 9 · new model benchmark inference

Thinking Machines released TML-Interaction-Small, a 276B-parameter MoE model optimized for real-time multimodal interaction with <200ms latency. It features encoder-free early fusion and introduces novel benchmarks (TimeSpeak, CueSpeak, RepCount-A, ProactiveVideoQA) for measuring continuous, simultaneous interaction, on which it exceeds GPT-4o Realtime and Gemini 3.1-Flash on audio/visual tasks. The approach prioritizes time-aligned microturns and synchronized audio-visual processing, advancing the practical implementation of responsive voice AI systems.

OpenAI Blog · 4d ago · 5 · workflow api update

AutoScout24 Group's case study demonstrates practical applications of Codex and ChatGPT for accelerating development workflows and code quality improvements. While showing real-world AI integration in software teams, the content is primarily business-focused with limited technical depth on implementation details or novel engineering techniques.

OpenAI Blog · 4d ago · 7 · benchmark research agent

Parameter Golf is a competition framework that challenged 1,000+ participants to optimize ML research, coding agents, and model design under computational constraints, covering practical techniques like quantization and efficient model architectures. The large submission volume suggests useful real-world patterns and techniques emerged for building efficient AI systems.

HuggingFace Blog · 4d ago · 7 · workflow deployment open source infrastructure

Technical overview of open-source software stacks for foundation model training and inference, covering the layered architecture spanning hardware infrastructure, resource orchestration (Kubernetes, Slurm), ML frameworks (PyTorch, JAX), and observability tools (Prometheus, Grafana). Provides practical guidance on systems bottlenecks and scaling characteristics for engineers building distributed LLM training/inference pipelines.

r/MachineLearning · 4d ago · 9 · tool tutorial inference open source

A deep technical breakdown of building a minimal LLM compiler from scratch in Python that lowers models (TinyLlama, Qwen2.5-7B) to optimized CUDA kernels across six IR levels. Demonstrates practical GPU optimization techniques (tiling, shared memory staging, bank conflict resolution, pipelining) with competitive performance (1.11-1.20× vs PyTorch/torch.compile on some ops) and includes reproducible CLI commands for each optimization stage.

Simon Willison · 4d ago · 6 · workflow agent

Analysis of the economic trade-offs of AI coding agents, arguing that productivity gains only make financial sense if paired with proportional reductions in code maintenance costs. The piece highlights a critical blind spot in AI-assisted development: increased code volume without matching maintenance-efficiency improvements can compound into higher total costs over time.

Simon Willison · 4d ago · 7 · tutorial workflow prompt engineering tool

Simon Willison demonstrates practical patterns for executing LLM-generated code directly from shell scripts using shebang syntax, including examples with tool calls and YAML-defined functions. The post covers workflow techniques for integrating LLM outputs into command-line workflows and debugging with options like --td for tool inspection.

r/MachineLearning · 4d ago · 6 · workflow prompt engineering

A developer seeks guidance on optimal methods for inputting multidimensional time series data alongside video to VLMs, noting that common approaches (text formatting and line chart visualization) underperformed on their task. This represents a practical workflow challenge in multimodal AI engineering with potential solutions in data representation and prompt engineering.

r/LocalLLaMA · 4d ago · 9 · new model deployment open source inference

MiniCPM-V 4.6 is a lightweight multimodal model optimized for on-device deployment across iOS, Android, and HarmonyOS, achieving Qwen 2B-level performance with 50% fewer visual encoding FLOPs and 1.5x better throughput than comparable models. The model supports mixed visual token compression (4x/16x), works with popular inference frameworks (vLLM, llama.cpp, Ollama) and fine-tuning tools (SWIFT, LLaMA-Factory), with all edge adaptation code open-sourced for developer customization.

Simon Willison · 4d ago · 7 · agent workflow deployment

Shopify's internal coding agent 'River' enforces public-only Slack interactions to create visible, searchable work that enables organizational learning at scale—a practical implementation of how transparency and observability can improve both productivity and knowledge sharing in AI-assisted development workflows.

r/MachineLearning · 4d ago · 5 · tool tutorial

Interactive visualization tool for Jensen-Shannon divergence, a symmetric divergence measure useful for comparing probability distributions. While mathematically foundational for ML work, this is primarily an educational visualization rather than a practical tool for daily AI development workflows.
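For readers who want the math behind the visualization: JSD is the mean KL divergence of each distribution to their mixture, symmetric and bounded by 1 bit. A minimal sketch:

```python
import math

def kl(p, q):
    """KL divergence in bits, skipping zero-probability terms."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m.
    Symmetric in p and q, and bounded in [0, 1] when measured in bits."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Disjoint distributions hit the 1-bit maximum; identical ones give 0.
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # -> 1.0
```

Unlike KL, JSD never divides by zero (the mixture is nonzero wherever either input is), which is part of why it is a convenient comparison measure.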

r/MachineLearning · 4d ago · 8 · research benchmark open source

Pre-registered robustness study of Meta's V-JEPA 2.1 across model sizes (80M-2B) reveals that representational drift (the M2 metric) predicts failure on temporal corruptions but not image noise, that scaling is non-monotonic (larger models aren't reliably more robust), and that the models show unexpected orientation sensitivity despite preserving temporal structure. Includes a mechanistic hypothesis linking the findings to hub marginalization in deep ViTs, with fully reproducible code and pre-registered decision rules.