r/MachineLearning · 2d ago · 7 · research rag architecture benchmark

PHI // DRIFT is a cognitive architecture that adds persistent internal state and advanced memory retrieval to LLMs through a Decision Memory Unit (DMU), with a reported 14.8% context improvement over cosine-only RAG. The approach is validated on consumer hardware without GPU acceleration and includes measurable continuity metrics (PEDI) for evaluating conversation coherence across interactions.
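For context, the cosine-only RAG baseline mentioned above can be sketched roughly as follows. This is a minimal illustration of plain cosine-similarity retrieval, not the DMU itself; the function names, the toy three-dimensional embeddings, and the memory entries are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, memory, k=2):
    """Return the k memory texts whose embeddings score highest against query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in memory]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Hypothetical toy embeddings; a real system would get these from an embedding model.
memory = [
    ("user prefers metric units", [1.0, 0.0, 0.0]),
    ("user is planning a trip", [0.0, 1.0, 0.0]),
    ("user asked about distances", [0.9, 0.1, 0.0]),
]
top = retrieve_top_k([1.0, 0.0, 0.0], memory, k=2)
```

A baseline like this ranks memories purely by embedding similarity and carries no state between turns, which is the gap the post says the DMU targets.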

HuggingFace Blog · 2d ago · 8 · new model inference open source tool

NVIDIA introduces Nemotron-Labs Diffusion, a new family of diffusion language models that generate multiple tokens in parallel and iteratively refine them, addressing latency bottlenecks in autoregressive generation. These models offer 3x-4x speedups on modern GPUs, support multiple generation modes (autoregressive, diffusion, self-speculation), and are available at 3B to 14B scales with open licensing and training code via the Megatron framework.

Anthropic Research · 2d ago · 7 · new model benchmark tool

Anthropic's Project Glasswing has discovered 10,000+ high- or critical-severity vulnerabilities in critical infrastructure software using Claude Mythos Preview, demonstrating AI's capability in automated security testing at scale. The post discusses Mythos Preview's vulnerability detection performance, coordination challenges with the 90-day disclosure timeline, and implications for AI-assisted security workflows.

r/MachineLearning · 2d ago · 6 · inference deployment workflow

Discussion of whether to build a custom lightweight image encoder for video frame classification instead of using foundation models like CLIP/DINO, with a focus on CPU inference speed and deployment constraints. The poster describes a practical pipeline that processes video streams through embeddings into a small transformer, and asks whether custom training on domain-specific data (a few million images, 4-5 labels) would improve both speed and accuracy over established encoders.

HuggingFace Blog · 2d ago · 8 · fine tuning benchmark open source inference

Dharma released DharmaOCR, a pair of specialized 3B-parameter language models that outperform frontier APIs on structured OCR tasks while being significantly cheaper to operate, challenging the industry assumption that the largest models are always best. The article explores how specialization, fine-tuning pipelines, and distributional alignment can yield better performance and cost-efficiency than scaling parameters, supported by benchmarks and research across multiple domains.

r/MachineLearning · 2d ago · 8 · new model open source tool deployment

NuExtract3 is a new 4B open-weight model (Apache-2.0) purpose-built for document understanding tasks like PDF extraction, table recognition, and structured data extraction from complex layouts. It is immediately practical, with a free HuggingFace Space, multiple quantization options (GPTQ, W8A8, FP8, Q4, Q6), and low resource requirements (4GB VRAM), making it a viable local alternative to API-based document extraction pipelines.

r/MachineLearning · 2d ago · 7 · benchmark workflow agent

Community discussion identifying gaps between standard benchmarks and real-world AI system robustness, particularly around ambiguous intent, context handling, and multi-turn sessions. It highlights the disconnect between optimizing for clean evaluation metrics and building production-resilient systems.

OpenAI Blog · 3d ago · 6 · tool deployment workflow

Virgin Atlantic used OpenAI's Codex to accelerate mobile app development under a tight deadline, achieving high test coverage and production quality. The case study demonstrates a practical application of AI code generation for shipping real-world products with strong quality metrics.

Latent Space · 3d ago · 7 · tool deployment agent inference

Daytona provides cloud-based sandboxed compute infrastructure optimized for AI agents, offering stateful environments that spin up almost instantly and run at massive scale (850k+ sandboxes/day). The infrastructure supports agentic workflows that need composable computers with dynamic resource scaling, a bare-metal architecture, and fast startup times (~60ms), addressing the emerging market gap between traditional code execution and agent-specific compute needs.

Simon Willison · 3d ago · 8 · tool agent open source library plugin

Datasette Agent is a new conversational AI assistant that lets users query data stored in Datasette using natural language, with LLM-powered SQL generation and an extensible plugin architecture. The tool integrates with modern LLMs (Gemini, Claude, local models) for reliable tool calling and SQL generation, and includes plugins for charts and other functionality. This represents a practical fusion of data querying and LLM agents with immediate applicability for engineers working with databases and AI.
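The general pattern of LLM-generated SQL running against a database can be sketched as below. This is an illustrative outline only and does not use Datasette's or Datasette Agent's actual API: `generate_sql` is a hypothetical stand-in for the model call, `run_read_only` is a hypothetical helper, and the `books` table is invented for the example.

```python
import sqlite3

def generate_sql(question):
    """Hypothetical stand-in for an LLM call; returns a canned query for a known question."""
    canned = {"how many books are there?": "SELECT COUNT(*) FROM books"}
    return canned[question.lower()]

def run_read_only(conn, sql):
    """Execute model-written SQL only if it looks like a SELECT statement.

    The prefix check is a deliberately naive guard for illustration;
    a real deployment would rely on a read-only connection or stricter validation.
    """
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql).fetchall()

# Invented in-memory table so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT)")
conn.executemany("INSERT INTO books VALUES (?)", [("Dune",), ("Emma",)])

rows = run_read_only(conn, generate_sql("How many books are there?"))
```

Keeping the generation step and the execution step separate makes it possible to inspect or reject the SQL before it touches the data.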

r/LocalLLaMA · 3d ago · 6 · new model fine tuning open source

Latitude released Equinox, a 31B-parameter model fine-tuned from Gemma 4 via supervised fine-tuning on a balanced mix of dark adventure narratives and slice-of-life storytelling. The model is available via subscription on AI Dungeon, with quantized GGUF weights provided for download, and is a practical example of multi-dataset fine-tuning for specialized narrative generation tasks.

Simon Willison · 3d ago · 7 · tool agent open source

A new Datasette Agent plugin enables running commands in a Fly Sprites sandbox environment, extending Datasette's capabilities for AI agents to execute code safely. This is a practical tool for developers building agentic systems that need sandboxed command execution alongside database operations.

r/MachineLearning · 3d ago · 6 · research fine tuning workflow

RPS (Regressive Plasticity Schedule) is a two-stage training approach combining curriculum learning with adaptive learning rate decay, showing improvements on ARC-AGI benchmarks and program synthesis tasks. The method trains on easy data at a high learning rate, then on hard data at a reduced learning rate, and reports 4% versus 2.4% for a baseline that keeps the learning rate equal across both stages.
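The two-stage idea described above (easy data at a high learning rate, then hard data at a lower one) can be sketched as a simple schedule generator. This is a rough illustration rather than the poster's implementation: the function name, the constant per-stage rates, and the example values are assumptions, and the adaptive decay mentioned in the post is not modeled.

```python
def two_stage_schedule(easy_examples, hard_examples, high_lr=1e-3, low_lr=1e-4):
    """Yield (example, learning_rate) pairs: easy data first at high_lr, then hard data at low_lr.

    Assumption: each stage uses one constant rate; the post's adaptive decay is not modeled.
    """
    for example in easy_examples:
        yield example, high_lr
    for example in hard_examples:
        yield example, low_lr

# Hypothetical placeholders standing in for real training examples.
plan = list(two_stage_schedule(["easy-1", "easy-2"], ["hard-1"], high_lr=0.1, low_lr=0.01))
```

An equal-learning-rate baseline in this framing would correspond to calling the same function with `high_lr == low_lr`.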

r/MachineLearning · 3d ago · 7 · research inference open source

This proof-of-concept explores inference-time learning within Mixture of Experts (MoE) architectures by inserting specialized expert modules that can dynamically update the weights of sibling experts. The work combines existing components in a novel way to enable adaptive behavior during inference, potentially useful for building more flexible AI systems without retraining.

Simon Willison · 3d ago · 7 · tool agent open source

Datasette Agent is a new extensible AI assistant built for Datasette, enabling users to query and interact with databases through an agentic interface. This tool bridges LLMs with database systems, useful for engineers building AI applications that need structured data access patterns.

r/MachineLearning · 3d ago · 6 · research inference

A Reddit discussion asks why major AI labs haven't adopted adaptive/dynamic vision tokenization despite research showing potential efficiency gains. The post explores technical trade-offs such as pipeline constraints that require fixed token counts, uncertainty about scaling laws for adaptive methods, and whether marginal improvements justify the implementation complexity.