r/MachineLearning · 2d ago · 7 · tutorial prompt engineering

A hands-on explanation of LLM architecture that breaks down how token prediction works through embeddings, positional encoding, attention, and the LM head, using a simple four-sentence example to illustrate why models predict contextually appropriate tokens. It demystifies transformer mechanics by focusing on the core probability matching problem rather than advanced concepts, making it accessible to engineers learning from first principles.
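
The final step of that pipeline, turning LM head scores into a next-token probability distribution, can be sketched in a few lines. This is a minimal illustration, not code from the linked post: the three-word vocabulary and the logit values are invented for the example, and a real model would produce one logit per entry of a vocabulary with tens of thousands of tokens.

```python
import math


def softmax(logits):
    """Convert raw LM head scores (logits) into probabilities that sum to 1."""
    # Subtracting the max keeps exp() numerically stable without changing the result.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy vocabulary and logits for a prompt like "The cat sat on the ..."
vocab = ["mat", "moon", "car"]
logits = [2.0, 0.5, -1.0]

probs = softmax(logits)

# Greedy decoding: pick the token with the highest probability.
predicted = vocab[probs.index(max(probs))]

for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")
print("predicted:", predicted)
```

Greedy selection is only one decoding choice; sampling from `probs` instead would occasionally pick the less likely tokens.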

r/MachineLearning · 2d ago · 6 · api update inference deployment

Analysis of AI lab profitability models (Anthropic, xAI, OpenAI) and their implications for API pricing and developer costs. The article contrasts two strategies, Anthropic's enterprise lock-in with claimed 77% margins and xAI's aggressive subsidy-driven pricing, and examines their direct impact on token pricing through Q3.

r/LocalLLaMA · 2d ago · 8 · new model open source tool inference

LongCat-Video-Avatar 1.5 is an open-source framework for audio-driven human video generation with production-ready stability, supporting multiple input modalities (Audio-Text-to-Video, Audio-Text-Image-to-Video, Video Continuation) and compatible with Diffusers/Transformers libraries. The release includes comprehensive technical documentation, integration guides, and a detailed human evaluation benchmark across 6 application scenarios with both subjective and objective quality metrics.

r/MachineLearning · 2d ago · 7 · research rag architecture benchmark

PHI // DRIFT is a cognitive architecture adding persistent internal state and advanced memory retrieval to LLMs through a Decision Memory Unit (DMU) that shows 14.8% context improvement over cosine-only RAG. The approach is validated on consumer hardware without GPU acceleration and includes measurable continuity metrics (PEDI) for evaluating conversation coherence across interactions.

HuggingFace Blog · 2d ago · 8 · new model inference open source tool

NVIDIA introduces Nemotron-Labs Diffusion, a new family of diffusion language models that generate multiple tokens in parallel and iteratively refine them, addressing latency bottlenecks in autoregressive generation. These models offer 3x-4x speedups on modern GPUs, support multiple generation modes (autoregressive, diffusion, self-speculation), and are available at 3B-14B scales with open licensing and training code via the Megatron framework.

Anthropic Research · 2d ago · 7 · new model benchmark tool

Anthropic's Project Glasswing has discovered 10,000+ high/critical vulnerabilities in critical infrastructure software using Claude Mythos Preview, demonstrating AI's capability in automated security testing at scale. The post discusses Mythos Preview's vulnerability detection performance, coordination challenges with the 90-day disclosure timeline, and implications for AI-assisted security workflows.

r/MachineLearning · 2d ago · 6 · inference deployment workflow

Discussion of whether to build a custom lightweight image encoder for video frame classification instead of using foundation models like CLIP/DINO, with a focus on CPU inference speed and deployment constraints. The poster describes a practical pipeline that processes video streams through embeddings into a small transformer, and asks whether custom training on domain-specific data (a few million images, 4-5 labels) would improve both speed and accuracy over established encoders.

HuggingFace Blog · 2d ago · 8 · fine tuning benchmark open source inference

Dharma released DharmaOCR, a pair of specialized 3B-parameter language models that outperform frontier APIs on structured OCR tasks while being significantly cheaper to operate, challenging the industry assumption that the largest models are always best. The article explores how specialization, fine-tuning pipelines, and distributional alignment can yield better performance and cost-efficiency than scaling parameters, supported by benchmarks and research across multiple domains.

r/MachineLearning · 2d ago · 8 · new model open source tool deployment

NuExtract3 is a new 4B open-weight model (Apache-2.0) purpose-built for document understanding tasks like PDF extraction, table recognition, and structured data extraction from complex layouts. It is immediately practical thanks to a free HuggingFace Space, multiple quantization options (GPTQ, W8A8, FP8, Q4, Q6), and low resource requirements (4GB VRAM), making it a viable local alternative to API-based document extraction pipelines.

r/MachineLearning · 3d ago · 7 · benchmark workflow agent

Community discussion identifying gaps between standard benchmarks and real-world AI system robustness, particularly around ambiguous intent, context handling, and multi-turn sessions. It highlights the disconnect between optimizing for clean evaluation metrics and building production-resilient systems.

OpenAI Blog · 3d ago · 6 · tool deployment workflow

Virgin Atlantic leveraged OpenAI's Codex to accelerate mobile app development under tight deadline constraints, achieving high test coverage and production quality. The case study demonstrates practical application of AI code generation for shipping real-world products with strong quality metrics.

Latent Space · 3d ago · 7 · tool deployment agent inference

Daytona provides cloud-based sandboxed compute infrastructure optimized for AI agents, offering stateful environments that spin up instantly and handle massive scale (850k+ sandboxes/day). The infrastructure supports agentic workflows requiring composable computers with dynamic resource scaling, bare-metal architecture, and instant startup times (~60ms), addressing the emerging market gap between traditional code execution and agent-specific compute needs.

Simon Willison · 3d ago · 8 · tool agent open source library plugin

Datasette Agent is a new conversational AI assistant that lets users query data stored in Datasette using natural language, with LLM-powered SQL generation and an extensible plugin architecture. The tool integrates with modern LLMs (Gemini, Claude, local models) for reliable tool calling and SQL generation, and includes plugins for charts and other functionality. This represents a practical fusion of data querying and LLM agents with immediate applicability for engineers working with databases and AI.

r/LocalLLaMA · 3d ago · 6 · new model fine tuning open source

Latitude released Equinox, a 31B-parameter model fine-tuned from Gemma 4 via supervised fine-tuning on balanced datasets that combine dark adventure narratives and slice-of-life storytelling. The model is available via subscription on AI Dungeon, with quantized GGUF weights provided for download, and serves as a practical example of multi-dataset fine-tuning for specialized narrative generation tasks.

Simon Willison · 3d ago · 7 · tool agent open source

A new Datasette Agent plugin enables running commands in a Fly Sprites sandbox environment, extending Datasette's capabilities for AI agents to execute code safely. This is a practical tool for developers building agentic systems that need sandboxed command execution alongside database operations.