r/MachineLearning · 2d ago · 6 · tool tutorial

An interactive visualization tool for understanding KL divergence behavior across different distribution parameters (mean, skew, truncation, discretization). The tool runs client-side and provides intuitive exploration of how the divergence changes under various distribution transformations (KL is not a true metric, so direction matters).
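For discrete distributions, the quantity the tool visualizes reduces to a few lines; a minimal sketch (not the tool's own code) that also shows the asymmetry mentioned above:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions given as equal-length
    probability lists; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# KL(P||Q) != KL(Q||P) in general, which is why direction matters.
p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]
forward = kl_divergence(p, q)
reverse = kl_divergence(q, p)
```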

Simon Willison · 2d ago · 8 · prompt engineering workflow tutorial

Practical guide on using Claude to generate rich HTML output instead of Markdown, enabling interactive visualizations, SVG diagrams, and better information presentation. Includes concrete prompt examples and demonstrates real-world applications like PR reviews and security exploit explanations.

HuggingFace Blog · 2d ago · 8 · fine tuning open source benchmark deployment tool

CyberSecQwen-4B demonstrates that a carefully fine-tuned 4B model can match an 8B specialist on cybersecurity tasks (CWE classification, CVE mapping, CTI Q&A) while fitting on consumer GPUs, achieving 97.3% of the larger model's accuracy and +8.7 points on multiple-choice benchmarks. The post details the training methodology on AMD MI300X hardware with cybersecurity-specific datasets, and provides open-source configs for reproducing the work on various hardware stacks.

HuggingFace Blog · 2d ago · 8 · new model research inference

EMO is a new 14B-parameter mixture-of-experts model that enables task-specific expert subsets (12.5% of total) to achieve near-full performance without predefined domains, using emergent modular structure discovered during pretraining. This addresses practical deployment challenges by allowing selective expert activation for reduced computational costs while maintaining strong general-purpose capabilities.
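The "activate only a 12.5% expert subset" idea can be sketched as a toy router; this is illustrative of the general mechanism, not EMO's actual routing code:

```python
import math

def route_subset(router_logits, allowed, k=2):
    """Toy MoE router: choose top-k experts restricted to an allowed
    subset (here 2 of 16 = 12.5%, echoing the figure in the post),
    then softmax-normalize the chosen experts' gate weights."""
    candidates = [(logit, i) for i, logit in enumerate(router_logits) if i in allowed]
    top = sorted(candidates, reverse=True)[:k]
    m = max(logit for logit, _ in top)          # stabilize the softmax
    exps = [math.exp(logit - m) for logit, _ in top]
    z = sum(exps)
    return {i: e / z for (_, i), e in zip(top, exps)}

logits = [0.2, 1.1, -0.3, 2.0, 0.0, 0.7, 1.5, -1.0,
          0.4, 2.4, -0.2, 0.9, 1.2, 0.1, -0.5, 0.6]
gates = route_subset(logits, allowed={3, 9})    # task-specific subset
```

Experts outside the subset are never consulted, which is where the compute savings come from.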

r/MachineLearning · 2d ago · 7 · library open source research

FormalSLT is a machine-verified Lean 4 library implementing core statistical learning theory results (VC bounds, PAC-Bayes, algorithmic stability) with 45 modules and zero unproven statements, providing formally certified generalization bounds for AI practitioners who need mathematically rigorous foundations. The library bridges the gap between paper proofs and executable code by encoding hypotheses and finite-sample assumptions directly into theorem signatures.
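The "assumptions in theorem signatures" style might look like the following Lean 4 sketch; all names here are illustrative, not FormalSLT's actual declarations, and the `sorry` stands in for the proof the real library supplies:

```lean
-- Hypothetical signature in the style described: finite VC dimension,
-- the sample-size requirement, and the confidence parameter all appear
-- as explicit hypotheses, so a caller cannot invoke the bound without
-- discharging its finite-sample assumptions.
theorem vc_generalization_bound
    (H : HypothesisClass) (m : ℕ) (δ : ℝ)
    (h_vc : vcDim H < ⊤)          -- finite VC dimension, stated up front
    (h_m : 1 ≤ m)                 -- finite-sample assumption in the signature
    (h_δ : 0 < δ ∧ δ < 1) :       -- confidence parameter constraints
    ∀ h ∈ H, trueRisk h ≤ empRisk h m + vcBound (vcDim H) m δ := by
  sorry  -- placeholder here only; the library itself has zero unproven statements
```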

r/MachineLearning · 2d ago · 6 · open source tool

Community discussion about open-source embedding models for time series data with frequency domain support. Relevant for engineers building RAG systems or ML pipelines that need to handle variable-length temporal sequences.
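As a concrete illustration of "frequency domain support," one simple approach (an assumption for illustration, not a model from the thread) is to embed a sequence by the magnitudes of its first few DFT coefficients, which yields a fixed-size vector regardless of input length:

```python
import math

def freq_embedding(x, n_bins=4):
    """Toy frequency-domain embedding: magnitudes of the first n_bins
    DFT coefficients, L2-normalized. Variable-length inputs are fine
    because the DFT is computed over whatever length arrives."""
    N = len(x)
    mags = []
    for k in range(n_bins):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / N) for t in range(N))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / N) for t in range(N))
        mags.append(math.hypot(re, im))
    norm = math.sqrt(sum(m * m for m in mags)) or 1.0
    return [m / norm for m in mags]

# Two sequences of different lengths map to same-size vectors.
a = freq_embedding([math.sin(2 * math.pi * t / 8) for t in range(8)])
b = freq_embedding([math.sin(2 * math.pi * t / 16) for t in range(16)])
```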

r/LocalLLaMA · 2d ago · 7 · tool inference library benchmark

Guide for using z-lab/gemma-4-26B-A4B-it-DFlash, a speculative decoding drafter model that achieves up to 3.7x speedup through parallel token drafting via block diffusion. Includes integration instructions for Transformers, vLLM, SGLang, and Docker with performance benchmarks on NVIDIA B300 GPUs.
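The speedup comes from the generic draft-then-verify pattern underlying all speculative decoding; a sketch of that idea (not DFlash's block-diffusion drafter, and with toy stand-in "models"):

```python
def speculative_decode(draft_model, target_model, prompt, k=4, max_new=8):
    """Draft-then-verify loop: the cheap drafter proposes k tokens, the
    target keeps the agreeing prefix and substitutes its own token at
    the first mismatch, so one verify pass can accept several tokens."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        draft = []
        for _ in range(k):                  # cheap drafter, autoregressive
            draft.append(draft_model(out + draft))
        accepted = []
        for tok in draft:                   # target verifies the draft
            t = target_model(out + accepted)
            if t == tok:
                accepted.append(tok)        # agreement: token comes for free
            else:
                accepted.append(t)          # mismatch: take target's token
                break
        out += accepted
    return out[:len(prompt) + max_new]

# Toy "models": the target counts up; the drafter errs on every 3rd call.
target = lambda seq: seq[-1] + 1
drafter = lambda seq: seq[-1] + 1 if len(seq) % 3 else seq[-1] + 2
result = speculative_decode(drafter, target, [0])
```

Output always matches what the target alone would produce; only the number of target passes changes.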

OpenAI Blog · 2d ago · 7 · deployment agent workflow

OpenAI details Codex's production security architecture including sandboxing, approval workflows, network policies, and telemetry for safe agent deployment. Practical for engineers building coding agents who need enterprise-grade safety patterns and compliance mechanisms.

r/LocalLLaMA · 2d ago · 8 · tool inference open source deployment

ds4.c is a specialized native inference engine optimized for DeepSeek V4 Flash models, featuring Metal graph execution, aggressive 2-bit quantization (IQ2_XXS for MoE experts), and built-in server API. The project builds on GGML/llama.cpp foundations but is intentionally narrow and model-specific, delivering efficient inference for AI engineering workflows.

HuggingFace Blog · 2d ago · 8 · fine tuning tutorial workflow open source

MedQA demonstrates a complete LoRA fine-tuning pipeline for clinical question-answering on AMD ROCm hardware, showing that HuggingFace ecosystem tools (Transformers, PEFT, TRL, Accelerate) work without CUDA. The project fine-tunes Qwen3-1.7B on the MedMCQA dataset in ~5 minutes on an MI300X with 192GB of HBM3, requiring only three environment variables to switch from CUDA to ROCm.
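LoRA itself is hardware-agnostic math: the frozen weight W is augmented by a low-rank update (alpha/r) * B @ A, and only the small A and B matrices are trained. A pure-Python sketch with toy shapes (Qwen3-1.7B's are of course much larger):

```python
def matvec(M, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=16, r=2):
    """y = W x + (alpha/r) * B (A x). A is r x d_in, B is d_out x r;
    W stays frozen, so gradients only touch A and B."""
    base = matvec(W, x)
    low = matvec(A, x)            # project down to rank-r space
    delta = matvec(B, low)        # project back up
    s = alpha / r
    return [b + s * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.1, 0.0], [0.0, 0.1]]      # r = 2 rows
B = [[1.0, 0.0], [0.0, 1.0]]
y = lora_forward(W, A, B, [1.0, 2.0])
```

Initializing B to zero (the usual LoRA init) makes the adapter a no-op at step 0, which is why fine-tuning starts from the base model's behavior.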

Latent Space · 2d ago · 9 · new model api update agent inference

OpenAI released GPT-Realtime-2 with significant improvements for voice agent development: 128K context window, parallel tool calls with audible feedback, better interruption handling, adjustable reasoning levels (minimal to xhigh), and improved domain terminology retention. Also launched GPT-Realtime-Translate (70+ languages) and GPT-Realtime-Whisper for streaming transcription, all available in the Realtime API.

r/MachineLearning · 3d ago · 7 · rag tool workflow open source deployment

An engineer built a Steam game recommender using RAG/vector embeddings on 2k reviews across 80k games, with a pipeline that extracts game vibes and mechanics into interpretable vectors stored in PostgreSQL + Chroma DB. The system uses ChatGPT to generate structured tags from reviews, clusters them semantically, and serves explainable recommendations via a React frontend deployed on Digital Ocean. It's a practical example of LLM integration for recommendation systems, favoring interpretability over black-box collaborative filtering.
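The retrieval core of such a system is cosine similarity over tag vectors; a self-contained sketch with invented game names and tag dimensions (stand-ins for the post's ChatGPT-extracted "vibe" tags):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(query_vec, games, top_n=2):
    """Rank games by cosine similarity to the query vector; results are
    explainable because each dimension is a named tag, not a latent."""
    ranked = sorted(games.items(), key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_n]]

# Toy vectors over dimensions (cozy, combat, puzzle).
games = {
    "FarmSim":    [0.9, 0.1, 0.2],
    "ArenaBrawl": [0.0, 1.0, 0.1],
    "MindMaze":   [0.2, 0.1, 0.9],
}
picks = recommend([1.0, 0.0, 0.3], games)
```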

r/LocalLLaMA · 3d ago · 8 · new model open source benchmark agent

Zyphra releases ZAYA1-74B-Preview, a 74B parameter MoE model (4B active) trained end-to-end on AMD hardware, demonstrating strong pass@4 reasoning performance on math and coding benchmarks despite being a pre-RL checkpoint. The open-source model (Apache 2.0) shows competitive reasoning capabilities and promising agentic task performance, with expectations for significant gains from pending RL post-training based on patterns observed in their 8B variant.
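For readers unfamiliar with the pass@4 numbers cited above, the standard unbiased pass@k estimator (from the Codex paper lineage; a sketch, not Zyphra's evaluation harness) is:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: probability that at least one of k samples
    drawn (without replacement) from n attempts, c of which are
    correct, passes. Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0          # too few failures to fill k slots
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 2 correct out of 8 attempts, reported at k=4:
score = pass_at_k(8, 2, 4)
```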

Anthropic Research · 3d ago · 8 · open source tool benchmark research

Petri 3.0, an open-source alignment-testing toolbox, has been transferred to the nonprofit Meridian Labs to evaluate LLMs for misaligned behaviors like deception and sycophancy. The tool uses separate auditor and judge models to systematically test alignment across scenarios, and now sits in a broader evaluation stack alongside Inspect and Scout for independent, credible model assessment.

Anthropic Research · 3d ago · 9 · research tool open source workflow

Anthropic introduces Natural Language Autoencoders (NLAs), a method that converts neural network activations into human-readable text explanations, enabling direct interpretation of what language models are thinking. The approach trains models to explain their own activations and reconstruct them from text, with applications to safety testing and reliability improvements. Code and an interactive frontend are released for researchers to build on this interpretability technique.

r/MachineLearning · 3d ago · 7 · tutorial inference deployment

Manning is releasing 'Quantization and Fast Inference' by Kalyan Aranganathan, a practical guide covering PTQ, QAT, and production deployment trade-offs for efficient model inference. The book addresses real-world quantization challenges like activation outliers in LLMs, KV cache optimization, and hardware-specific behavior—moving beyond theory to operational constraints.
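The activation-outlier problem the book covers falls straight out of the basic PTQ arithmetic; a minimal symmetric int8 sketch (illustrative, not from the book):

```python
def quantize_int8(xs):
    """Symmetric per-tensor PTQ: pick the scale from the max magnitude,
    round to int8. A single outlier inflates the scale and crushes
    resolution for every other value, which is the LLM activation
    failure mode the book discusses."""
    scale = max(abs(x) for x in xs) / 127.0 or 1.0
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return [qi * scale for qi in q]

vals = [0.1, -0.2, 0.05, 0.15]
q, s = quantize_int8(vals)
restored = dequantize(q, s)
```

Round-trip error is bounded by half the scale, so widening the scale to accommodate one outlier directly worsens every other value's reconstruction.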

Simon Willison · 3d ago · 6 · new model benchmark

Simon Willison shares retrospective analysis of Gemini 3.1 Flash-Lite, comparing the March preview version to the now-released production model. The writeup covers technical characteristics of this lightweight variant in Google's Gemini 3.1 lineup, useful for understanding model capabilities and trade-offs for different deployment scenarios.

r/MachineLearning · 3d ago · 7 · new model open source fine tuning ner

A new open-source NER model (en_legal_ner_ind_trf v0.1), fine-tuned on InLegalBERT for Indian legal document extraction, achieves 78.67% F1 across 13 entity types, with exceptional performance on case citations (97.76% F1). It addresses the gap left by the unmaintained OpenNyAI model and handles pre-1990 OCR-degraded constitutional texts via a silver-annotation pipeline combining regex, metadata projection, transformer NER, and gazetteer approaches, trained with Focal Loss to counter label imbalance.
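The Focal Loss mentioned above down-weights easy examples so rare entity types dominate the gradient; a per-example sketch (gamma=2 is the common default, the post does not state the value used):

```python
import math

def focal_loss(p_correct, gamma=2.0):
    """Focal loss for one example: -(1 - p)^gamma * log(p), where p is
    the probability the model assigns to the true label. gamma = 0
    recovers plain cross-entropy."""
    return -((1.0 - p_correct) ** gamma) * math.log(p_correct)

# Confident, easy predictions contribute far less than hard ones,
# which keeps frequent entity types from swamping rare ones.
easy = focal_loss(0.9)
hard = focal_loss(0.3)
ce_easy = -math.log(0.9)     # plain cross-entropy on the easy example
```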