r/MachineLearning · 1d ago · 7 · rag tool workflow open source deployment

An engineer built a Steam game recommender using RAG and vector embeddings over 2k reviews across 80k games, with a pipeline that extracts game vibes and mechanics into interpretable vectors stored in PostgreSQL + Chroma DB. The system uses ChatGPT to generate structured tags from reviews, clusters them semantically, and serves explainable recommendations via a React frontend deployed on DigitalOcean, favoring interpretability over black-box collaborative filtering.
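The tag-embedding retrieval step can be sketched as follows; this is a toy illustration with made-up game names and 3-dimensional vectors standing in for the LLM-extracted tag embeddings (the post's actual pipeline stores them in PostgreSQL + Chroma DB):

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def recommend(query, game_vecs, k=3):
    """Rank games by similarity of their tag embedding to the query vector."""
    ranked = sorted(game_vecs, key=lambda name: cosine_sim(query, game_vecs[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical 3-dim "vibe/mechanics" embeddings.
games = {
    "cozy_farm_sim": [0.9, 0.1, 0.0],
    "bullet_hell":   [0.0, 0.2, 0.95],
    "farm_rpg":      [0.8, 0.4, 0.1],
}
top = recommend([1.0, 0.2, 0.0], games, k=2)  # user likes cozy farming vibes
```

Because the dimensions correspond to extracted tags rather than opaque latent factors, each recommendation can be explained by pointing at the tags that drove the similarity score.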

r/LocalLLaMA · 1d ago · 8 · new model open source benchmark agent

Zyphra releases ZAYA1-74B-Preview, a 74B parameter MoE model (4B active) trained end-to-end on AMD hardware, demonstrating strong pass@4 reasoning performance on math and coding benchmarks despite being a pre-RL checkpoint. The open-source model (Apache 2.0) shows competitive reasoning capabilities and promising agentic task performance, with expectations for significant gains from pending RL post-training based on patterns observed in their 8B variant.

Anthropic Research · 1d ago · 8 · open source tool benchmark research

Petri 3.0, an open-source alignment testing toolbox, has been transferred to Meridian Labs nonprofit to evaluate LLMs for misaligned behaviors like deception and sycophancy. The tool uses separate auditor and judge models to systematically test alignment across scenarios, and is now part of a broader evaluation stack alongside Inspect and Scout for independent, credible model assessment.

Anthropic Research · 1d ago · 9 · research tool open source workflow

Anthropic introduces Natural Language Autoencoders (NLAs), a method that converts neural network activations into human-readable text explanations, enabling direct interpretation of what language models are thinking. The approach trains models to explain their own activations and reconstruct them from text, with applications to safety testing and reliability improvements. Code and an interactive frontend are released for researchers to build on this interpretability technique.

r/MachineLearning · 1d ago · 7 · tutorial inference deployment

Manning is releasing 'Quantization and Fast Inference' by Kalyan Aranganathan, a practical guide covering PTQ, QAT, and production deployment trade-offs for efficient model inference. The book addresses real-world quantization challenges like activation outliers in LLMs, KV cache optimization, and hardware-specific behavior—moving beyond theory to operational constraints.
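The PTQ trade-offs the book covers can be illustrated with a minimal symmetric int8 quantizer (a generic sketch, not code from the book). Note how a single activation outlier would inflate `scale` and crush the resolution left for the remaining values, which is exactly the LLM outlier problem mentioned above:

```python
def quantize_int8(xs):
    """Symmetric per-tensor post-training quantization to int8."""
    scale = max(abs(x) for x in xs) / 127.0  # map the largest magnitude to 127
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, s = quantize_int8(weights)
recon = dequantize(q, s)
```

QAT differs in that the rounding is simulated during training so the model learns weights that survive it; the sketch above is the simpler post-training path.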

Simon Willison · 1d ago · 6 · new model benchmark

Simon Willison shares retrospective analysis of Gemini 3.1 Flash-Lite, comparing the March preview version to the now-released production model. The writeup covers technical characteristics of this lightweight variant in Google's Gemini 3.1 lineup, useful for understanding model capabilities and trade-offs for different deployment scenarios.

r/MachineLearning · 1d ago · 7 · new model open source fine tuning ner

A new open-source NER model (en_legal_ner_ind_trf v0.1), fine-tuned on InLegalBERT for Indian legal document extraction, achieves 78.67% F1 across 13 entity types, with exceptional performance on case citations (97.76% F1). It addresses the gap left by the unmaintained OpenNyAI model and handles pre-1990 OCR-degraded constitutional texts via a silver-annotation pipeline combining regex, metadata projection, transformer NER, and gazetteers, trained with Focal Loss to counter label imbalance.
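Focal Loss, used here against label imbalance, down-weights well-classified examples so that rare entity types contribute more to the gradient. A minimal sketch of the standard formula (generic, not the model's training code):

```python
import math

def focal_loss(p_true, gamma=2.0, alpha=0.25):
    """Focal loss given the probability assigned to the true class.

    The (1 - p)^gamma factor shrinks the loss on easy, confident
    predictions, so frequent classes stop dominating training.
    """
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

easy = focal_loss(0.95)  # confident prediction on a common label
hard = focal_loss(0.30)  # uncertain prediction on a rare label
```

With `gamma=2`, the easy example's loss is scaled by `0.05**2`, making it thousands of times smaller than the hard example's, which is the mechanism that keeps rare legal entity types from being drowned out.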

Simon Willison · 1d ago · 8 · workflow research tool agent

Mozilla leveraged Claude Mythos preview to systematically identify and fix hundreds of Firefox security vulnerabilities using improved AI-guided techniques for steering, scaling, and filtering model outputs. The approach discovered 423 security bugs in April 2026 (vs. 20-30/month previously), demonstrating practical application of advanced LLMs for security auditing at scale.

r/MachineLearning · 1d ago · 6 · research prompt engineering

A developer proposes using diffusion models operating on abstract syntax trees (ASTs) to guarantee syntactically correct code generation by constraining the search space to valid program structures rather than token sequences. The idea suggests this approach could reduce training data requirements by leveraging the finite combinatorial space of valid ASTs with fixed node counts.
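The proposal operates directly in AST space so syntactically invalid programs are never sampled; for contrast, the common workaround today is to generate token sequences and then filter them by parsing. A hypothetical filter using Python's `ast` module shows what the AST constraint makes unnecessary:

```python
import ast

def is_valid_program(src: str) -> bool:
    """Accept a candidate only if it parses into a valid Python AST."""
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False

# Token-level generation can emit either of these; AST-space generation
# could never produce the second one.
candidates = ["def f(x): return x + 1", "def f(x: return x"]
valid = [c for c in candidates if is_valid_program(c)]
```

A diffusion model over ASTs would instead denoise within the space of valid trees, which is where the claimed reduction in training data comes from: the model never spends capacity learning to avoid malformed programs.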

r/MachineLearning · 2d ago · 6 · research workflow benchmark

A software engineer shares a technical approach using Jensen-Shannon divergence (JSD) to detect narrative shifts in AI news before sentiment aggregates register them, comparing rolling 7-day windows across vocabulary distributions and an 8-category narrative frame taxonomy. The core challenge is establishing reliable baselines and trigger thresholds at short time horizons where existing semantic change literature (typically longer-term) may not directly apply, raising questions about window sizing, distance metrics, and frame granularity for daily news regime detection.
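The JSD comparison between rolling windows can be sketched as follows, using toy 4-term vocabulary distributions in place of the real 7-day news windows:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence in bits (terms with p_i = 0 contribute 0)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence (base 2), symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Vocabulary distributions over four terms in two consecutive windows.
week1 = [0.40, 0.30, 0.20, 0.10]
week2 = [0.10, 0.20, 0.30, 0.40]  # probability mass shifted to new terms
baseline = jsd(week1, week1)      # identical windows: 0.0
shift = jsd(week1, week2)         # flag a regime change if above a threshold
```

The open questions in the post map directly onto the knobs here: how wide the windows are, where the trigger threshold on `shift` sits, and whether term-level distributions or the 8-category frame taxonomy give the more stable baseline.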

r/MachineLearning · 2d ago · 6 · inference deployment workflow

A practitioner explores ROCm viability for model training on AMD GPUs (RX 7900 XTX) as an alternative to NVIDIA RTX 3090s, noting that PyTorch supports ROCm but that concrete user reports on training performance and ecosystem maturity are scarce. The comparison centers on FP16 throughput advantages and seeks real-world validation of ROCm's production readiness for training workflows.

r/MachineLearning · 2d ago · 8 · tool tutorial open source

An interactive dataflow visualization tool for understanding transformer architectures from first principles, covering attention mechanisms (MLA, hybrid attention, RoPE), routing methods (MoE), and model variants from GPT-2 to Qwen 3.6. Useful for engineers who need to understand architectural differences and implementations across modern LLM families.
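One of the covered mechanisms, RoPE, encodes position by rotating pairs of query/key dimensions so that attention scores depend only on relative position. A minimal sketch of the standard formulation (not the tool's own code):

```python
import math

def rope(vec, pos, base=10000.0):
    """Apply rotary position embedding to one even-length head vector.

    Each (even, odd) dimension pair is rotated by pos * base**(-i/d),
    so lower dimensions rotate faster than higher ones.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

q = [1.0, 0.0, 1.0, 0.0]
q0 = rope(q, pos=0)  # zero rotation at position 0: vector unchanged
q1 = rope(q, pos=1)
```

The key property is that the dot product of a rotated query and key depends only on the distance between their positions, which is why RoPE extrapolates position information without learned absolute embeddings.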

r/MachineLearning · 2d ago · 6 · inference workflow

A software engineer asks about reproducibility of video diffusion models across different GPU architectures, questioning whether identical weights, prompts, and noise seeds produce perceptually similar outputs despite floating-point arithmetic differences. This technical question touches on practical concerns for deterministic inference and model deployment consistency.
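On a fixed backend, seeding does pin the sampled noise exactly; the open question is whether different GPU architectures, whose kernels accumulate floating-point rounding differently, stay perceptually close rather than bitwise identical. A sketch of the deterministic-seed half of that claim, using only the standard library (a real pipeline would seed `torch` and the diffusion sampler instead):

```python
import random

def sample_noise(seed, n=4):
    """Draw a reproducible Gaussian noise vector from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

a = sample_noise(42)
b = sample_noise(42)  # same seed, same backend: identical values
c = sample_noise(7)   # different seed: different noise
```

Across GPU architectures, non-associative floating-point accumulation in fused kernels breaks bitwise identity even with identical seeds, so perceptual similarity, not exact reproduction, is the realistic bar for cross-hardware video diffusion.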

r/LocalLLaMA · 2d ago · 7 · tool inference open source

This PR adds MiMo V2.5 model support to llama.cpp with text-to-text inference capabilities, including proper FP8 dequantization handling and attention value scale fixes for better transformer compatibility. The implementation addresses weight sharding complexities and unfuses attention components to maintain compatibility with existing MiMo V2 inference paths.

OpenAI Blog · 2d ago · 6 · agent deployment api update

Parloa is a platform that uses OpenAI's models to build voice-based customer service agents with simulation and deployment capabilities. While it demonstrates practical application of LLMs for enterprise use, it's primarily a SaaS product rather than a new technical capability or tool that directly impacts daily AI engineering workflows.

OpenAI Blog · 2d ago · 9 · api update new model inference

OpenAI has released new realtime voice models in their API supporting reasoning, translation, and transcription capabilities. This enables building voice applications with lower latency and more natural interactions, expanding the technical possibilities for voice-based AI products and integrations.