News Nug

LatitudeGames/Equinox-31B · Hugging Face

r/LocalLLaMA · 5d ago · 6 · new model fine tuning open source

Latitude released Equinox, a 31B-parameter model fine-tuned from Gemma 4 via supervised fine-tuning on a balanced mix of dark adventure narratives and slice-of-life storytelling. The model is available by subscription on AI Dungeon, with quantized GGUF weights provided for download. It is a practical example of multi-dataset fine-tuning for specialized narrative generation.

datasette-agent-sprites 0.1a0

Simon Willison · 5d ago · 7 · tool agent open source

A new Datasette Agent plugin enables running commands in a Fly Sprites sandbox environment, extending Datasette's capabilities for AI agents to execute code safely. This is a practical tool for developers building agentic systems that need sandboxed command execution alongside database operations.

I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]

r/MachineLearning · 6d ago · 6 · research fine tuning workflow

RPS (Regressive Plasticity Schedule) is a two-stage training approach that combines curriculum learning with adaptive learning rate decay, with preliminary improvements reported on ARC-AGI benchmarks and program synthesis tasks. The method trains the model on easy data at a high learning rate, then on hard data at a reduced learning rate. The post reports 4% versus 2.4% in a comparison against equal-learning-rate baselines.
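
The two-stage ordering described above can be sketched roughly as follows. This is only an illustration of "easy data at a high learning rate, then hard data at a lower one"; the function name, the batch labels, and the learning-rate values are hypothetical, and the adaptive decay mentioned in the post is not modeled here.

```python
from typing import Iterable, Iterator, Tuple


def two_stage_schedule(
    easy_batches: Iterable[str],
    hard_batches: Iterable[str],
    high_lr: float = 1e-4,  # assumed value, not taken from the post
    low_lr: float = 2e-5,   # assumed value, not taken from the post
) -> Iterator[Tuple[str, str, float]]:
    """Yield (stage, batch, learning_rate) in curriculum order."""
    # Stage 1: easy examples, trained with the higher learning rate.
    for batch in easy_batches:
        yield ("easy", batch, high_lr)
    # Stage 2: hard examples, trained with the reduced learning rate.
    for batch in hard_batches:
        yield ("hard", batch, low_lr)


# Placeholder batch names stand in for real training batches.
steps = list(two_stage_schedule(["e1", "e2"], ["h1"]))
for stage, batch, lr in steps:
    print(stage, batch, lr)
```

A training loop would set the optimizer's learning rate from each yielded tuple before the update step.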

Does this idea sound fun? [R]

r/MachineLearning · 6d ago · 7 · research inference open source

A proof-of-concept exploring inference-time learning within Mixture of Experts (MoE) architectures by inserting specialized expert modules that can update sibling expert weights dynamically. The work combines existing components in a novel way to enable adaptive behavior during inference, potentially useful for building more flexible AI systems without retraining.

datasette-agent 0.1a3

Simon Willison · 6d ago · 7 · tool agent open source

Datasette Agent is a new extensible AI assistant built for Datasette, enabling users to query and interact with databases through an agentic interface. This tool bridges LLMs with database systems, useful for engineers building AI applications that need structured data access patterns.

Do VLMs in production still use fixed-patch ViTs for their vision capabilities? [D]

r/MachineLearning · 6d ago · 6 · research inference

A Reddit discussion questioning why major AI labs haven't adopted adaptive/dynamic vision tokenization despite research showing potential efficiency gains. The post explores technical trade-offs like pipeline constraints requiring fixed token counts, uncertainty in scaling laws for adaptive methods, and whether marginal improvements justify implementation complexity.

[AINews] OpenAI GPT-next disproves 80 year old Erdős planar unit distance problem for under $1000

Latent Space · 6d ago · 9 · new model research inference benchmark

OpenAI's general-purpose LLM achieved a novel research result on the Erdős unit distance problem through extended reasoning (125-page output), demonstrating that inference-time scaling enables frontier mathematical reasoning without domain-specific scaffolding. This validates test-time compute as a key scaling paradigm and suggests reasoning capabilities may generalize beyond competition math to open research problems.

Masked Diffusion Language Models are Strong and Steerable Text-Based World Models for Agentic RL [R]

r/MachineLearning · 6d ago · 8 · research agent open source benchmark

This research applies masked diffusion language models (MDLMs) to world modeling in RL environments, addressing the mode collapse and limited diversity of autoregressive models. It introduces a GRPO training framework with zero-shot transfer across multiple open-source environments and agent backbones, and open-sources the code along with a dataset of 239K trajectories.

OpenAI claims a general-purpose reasoning model found a counterexample to Erdos's unit-distance bound [D]

r/MachineLearning · 6d ago · 8 · research benchmark inference

OpenAI's reasoning model discovered a counterexample to a long-standing conjecture in discrete geometry (Erdős's unit-distance problem), with the proof verified by an AI grading pipeline and human mathematicians. The result is technically significant for AI-for-science, but lacks crucial experimental details (model name, sampling strategy, compute budget, full pipeline specs) needed to assess whether this represents genuine autonomous research capability or selective reporting from extensive search.

How fast is 10 tokens per second really?

Simon Willison · 6d ago · 6 · tool inference

Interactive tool that visualizes LLM token generation speeds (5-800 tokens/second) to help developers understand what different inference throughput claims actually feel like in practice. Useful for evaluating model performance claims and understanding real-world latency implications.
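
As a rough back-of-the-envelope companion to the visualization, wall-clock time is simply token count divided by throughput. The sketch below assumes a 500-token response, a number chosen only for illustration, and ignores time-to-first-token and network latency.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a steady generation rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


RESPONSE_TOKENS = 500  # assumed response length, for illustration only

# Rates spanning the 5-800 tokens/second range the tool covers.
for rate in (5, 10, 50, 800):
    wait = seconds_to_generate(RESPONSE_TOKENS, rate)
    print(f"{rate:>4} tok/s: {wait:.2f} s for {RESPONSE_TOKENS} tokens")
```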

under 2% quality gap but 10x cost difference: tested 5 models on identical tool calling tasks [D]

r/MachineLearning · 7d ago · 8 · agent inference deployment benchmark

A practical cost-optimization study comparing five LLMs (Opus, GPT-5, Sonnet, DeepSeek V4, Hunyuan) on an MCP-based file management agent across 500+ tool calls. It found surprisingly small quality gaps (96-99% success) despite 10x price differences. The author deployed Hunyuan locally via MLX on an M2 Ultra for $5.5k and cut daily inference costs from $40 to $9 through routing: a local or cheap API model handles routine tasks, and expensive models handle complex failures.
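
A minimal sketch of the routing idea and the cost arithmetic, under stated assumptions: the cheap model is tried first and the task escalates only when it fails, and the $5.5k figure is treated as a one-time hardware cost. The function names and stub models are hypothetical and do not come from the post.

```python
from typing import Callable, Optional, Tuple

Model = Callable[[str], Optional[str]]  # returns None to signal failure


def route(task: str, cheap: Model, expensive: Model) -> Tuple[str, Optional[str]]:
    """Try the cheap/local model first; escalate only if it fails."""
    result = cheap(task)
    if result is not None:
        return ("cheap", result)
    return ("expensive", expensive(task))


def payback_days(hardware_cost: float, daily_before: float, daily_after: float) -> float:
    """Days until the daily savings cover a one-time hardware cost."""
    saving = daily_before - daily_after
    if saving <= 0:
        raise ValueError("no daily saving, hardware never pays back")
    return hardware_cost / saving


# Stub models for illustration: the cheap one gives up on "complex" tasks.
def cheap_stub(task: str) -> Optional[str]:
    return None if "complex" in task else f"cheap handled: {task}"


def expensive_stub(task: str) -> Optional[str]:
    return f"expensive handled: {task}"


print(route("rename file", cheap_stub, expensive_stub))
print(route("complex merge", cheap_stub, expensive_stub))
print(round(payback_days(5500, 40, 9)), "days to break even")
```

With the figures quoted in the summary, the saving is $31 per day, so the hardware would pay for itself in roughly 177 days, ignoring electricity and maintenance.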

CohereLabs/command-a-plus-05-2026-bf16 · Hugging Face

r/LocalLLaMA · 7d ago · 8 · new model tool inference open source deployment

Command A+ is a new 25B active parameter open-source MoE model from Cohere optimized for agentic and reasoning tasks with multimodal support. The article provides practical integration guides for Transformers, vLLM, SGLang, and Docker deployments, plus details on quantization options and model architecture including sparse MoE with 128 experts and multilingual support across 48 languages.

Google I/O, Gemini Spark, Antigravity

Simon Willison · 7d ago · 6 · new model agent deployment

Google I/O 2026 introduced Gemini 3.5 Flash and Gemini Spark, a new AI agent product integrating with Google Workspace apps, running on Gemini 3.5 Flash and a closed-source Go binary called Antigravity. Key technical consideration: Spark uses isolated ephemeral VMs with DLP policies for enterprise security, though the author notes this is a critical area given prompt injection risks with sensitive data flows.

AMD Ryzen AI Halo PC will cost 3999$ with 128GB memory on board

r/LocalLLaMA · 7d ago · 5

NOML-NOML: hierarchical TD3 + anchor policy for flight control [P]

r/MachineLearning · 7d ago · 8 · open source research library agent

An engineer open-sourced NOML, a custom RL algorithm for continuous control that addresses instability in flight simulation by combining an anchor policy (a safe action fallback), a hierarchical actor architecture (independent MLP heads per control axis), and mirror learning for data efficiency. The approach diverges from standard TD3 by eliminating exploration noise, maintaining stability through structural constraints rather than reward shaping.

[WIP] Gemma 4 MTP

r/LocalLLaMA · 7d ago · 7 · inference optimization open source

Pull request discussion on implementing MTP (Multi-token prediction) speculative decoding for Gemma 4 models in llama.cpp, achieving >2x speedup on dense models with caveats around hardware compatibility and multi-GPU support. The thread documents real-world performance testing across different GPU setups, revealing variable results depending on hardware configuration and noting current limitations like broken multi-GPU support and incompatibility with quantized KV cache.
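
For readers unfamiliar with speculative decoding, the general draft-and-verify idea can be sketched as below. This is a generic greedy toy version, not the llama.cpp implementation and not Gemma 4's MTP heads; the function names and the toy "models" are hypothetical. A real system verifies all drafted positions in one batched forward pass, which is where the speedup comes from.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy next-token function


def speculative_step(
    context: List[int], draft_next: NextToken, target_next: NextToken, k: int = 4
) -> List[int]:
    """Return the tokens emitted by one greedy draft-and-verify step."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    drafted: List[int] = []
    ctx = list(context)
    for _ in range(k):
        tok = draft_next(ctx)
        drafted.append(tok)
        ctx.append(tok)

    # 2. The target model checks each drafted token in order
    #    (sequentially here; batched in a real implementation).
    accepted: List[int] = []
    ctx = list(context)
    for tok in drafted:
        expected = target_next(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target adds one bonus token.
    accepted.append(target_next(ctx))
    return accepted


# Toy models: the target counts upward mod 10; the draft errs after a 3.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


print(speculative_step([1], toy_draft, toy_target, k=4))
```

The output always matches what the target model alone would have produced greedily; the draft only changes how many tokens are emitted per target pass.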

CANTANTE: Optimizing Agentic Systems via Contrastive Credit Attribution [R]

r/MachineLearning · 7d ago · 8 · agent prompt engineering research open source

CANTANTE is a novel framework that automates multi-agent LLM system configuration by solving the credit assignment problem, allowing per-agent prompt optimization from global task rewards rather than manual tuning. The approach outperforms DSPy baselines (GEPA, MIPROv2) by 12-19 points on standard benchmarks while maintaining inference costs, with open-source code available.

Machine Learning on Spherical Manifold [R]

r/MachineLearning · 7d ago · 7 · tutorial research workflow

This article explains Riemannian optimization techniques for machine learning on manifolds (like hyperspheres), focusing on how to adapt gradient descent to preserve geometric constraints using exponential maps and retractions. It provides practical implementation guidance for constraining neural network parameters to stay on spherical manifolds, with code examples using PyTorch.
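
One common form of the update for the unit sphere is: project the Euclidean gradient onto the tangent space at the current point, take a step, then retract by renormalizing. The sketch below uses plain Python rather than PyTorch to stay dependency-free; the helper names and the toy objective are assumptions for illustration, not code from the article.

```python
import math
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def sphere_step(x: List[float], euclid_grad: List[float], lr: float) -> List[float]:
    """One Riemannian gradient step on the unit sphere."""
    # Tangent projection: remove the gradient component along x.
    radial = dot(euclid_grad, x)
    tangent = [g - radial * xi for g, xi in zip(euclid_grad, x)]
    # Step in the tangent direction, then retract onto the sphere.
    moved = [xi - lr * t for xi, t in zip(x, tangent)]
    return normalize(moved)


# Toy objective: f(x) = -<target, x>, minimized on the sphere at x = target.
target = normalize([1.0, 2.0, 2.0])
x = normalize([1.0, 0.0, 0.0])
for _ in range(200):
    grad = [-t for t in target]  # Euclidean gradient of f
    x = sphere_step(x, grad, lr=0.1)

print("norm:", math.sqrt(dot(x, x)), "alignment:", dot(x, target))
```

Because the tangent step is orthogonal to x, the moved point always has norm at least 1, so the normalization retraction never divides by zero.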

[AINews] Google I/O 2026: Gemini 3.5 Flash, Omni (NanoBanana for Video), Spark (background agents), and Antigravity 2.0

Latent Space · 7d ago · 9 · new model api update agent workflow

Google released Gemini 3.5 Flash (GA immediately) with 1M context window, 65k max output, and agentic/coding capabilities, plus the new Gemini Omni multimodal family for video/audio generation and editing. The stack includes expanded Antigravity agents across desktop/CLI/SDK/API, with Google reporting 3.2 quadrillion tokens/month processed and 900M+ monthly users.

How Ramp engineers accelerate code review with Codex

OpenAI Blog · 7d ago · 6 · workflow tool

Ramp shares their workflow using Codex (OpenAI's code model) integrated with GPT-5.5 for automated code review, reducing feedback cycles from hours to minutes. The article highlights practical implementation of AI-assisted code review as part of their development process, offering insights into how organizations can adopt similar AI-powered review systems.