r/LocalLLaMA · 22h ago · 7 · new model library inference

MOSS-TTS-v1.5 expands multilingual text-to-speech capabilities to 31 languages with improved performance through FlashAttention 2 support and optimized dependencies. The update maintains backward compatibility with v1.0 while adding support for languages like Cantonese, Hindi, Thai, and Vietnamese, with straightforward installation and generation APIs.

r/LocalLLaMA · 2d ago · 8 · new model tool inference open source deployment

MiniCPM5-1B is a new 1B-class open-source model achieving SOTA in its weight class with built-in hybrid reasoning modes, designed for on-device deployment and resource-constrained scenarios. The release includes deployment guides for Transformers, vLLM, and SGLang, plus fine-tuning resources and newly released training datasets (Ultra-FineWeb, UltraData-Math, UltraData-SFT).

r/LocalLLaMA · 4d ago · 8 · new model open source tool inference

LongCat-Video-Avatar 1.5 is an open-source framework for audio-driven human video generation with production-ready stability, supporting multiple input modalities (Audio-Text-to-Video, Audio-Text-Image-to-Video, Video Continuation) and compatible with Diffusers/Transformers libraries. The release includes comprehensive technical documentation, integration guides, and a detailed human evaluation benchmark across 6 application scenarios with both subjective and objective quality metrics.

HuggingFace Blog · 4d ago · 8 · new model inference open source tool

NVIDIA introduces Nemotron-Labs Diffusion, a new family of diffusion language models that generate multiple tokens in parallel and iteratively refine them, addressing latency bottlenecks in autoregressive generation. These models offer 3x-4x speedups on modern GPUs, support multiple generation modes (autoregressive, diffusion, self-speculation), and are available at 3B-14B scales with open licensing and training code via the Megatron framework.

Anthropic Research · 4d ago · 7 · new model benchmark tool

Anthropic's Project Glasswing has discovered 10,000+ high/critical vulnerabilities in critical infrastructure software using Claude Mythos Preview, demonstrating AI's capability in automated security testing at scale. The post discusses Mythos Preview's vulnerability detection performance, coordination challenges with the 90-day disclosure timeline, and implications for AI-assisted security workflows.

r/MachineLearning · 5d ago · 8 · new model open source tool deployment

NuExtract3 is a new 4B open-weight model (Apache-2.0) purpose-built for document understanding tasks like PDF extraction, table recognition, and structured data extraction from complex layouts. It's immediately practical, with a free HuggingFace Space, multiple quantization options (GPTQ, W8A8, FP8, Q4, Q6), and low resource requirements (4GB VRAM), making it a viable local alternative to API-based document extraction pipelines.

r/LocalLLaMA · 5d ago · 6 · new model fine tuning open source

Latitude released Equinox, a 31B-parameter model fine-tuned from Gemma 4 via supervised fine-tuning on a balanced mix of dark adventure narratives and slice-of-life storytelling. The model is available via subscription on AI Dungeon, with quantized GGUF weights provided for download, and is a practical example of multi-dataset fine-tuning for specialized narrative generation tasks.

Latent Space · 6d ago · 9 · new model research inference benchmark

OpenAI's general-purpose LLM achieved a novel research result on the Erdős unit distance problem through extended reasoning (125-page output), demonstrating that inference-time scaling enables frontier mathematical reasoning without domain-specific scaffolding. This validates test-time compute as a key scaling paradigm and suggests reasoning capabilities may generalize beyond competition math to open research problems.

r/LocalLLaMA · 6d ago · 8 · new model tool inference open source deployment

Command A+ is a new 25B active parameter open-source MoE model from Cohere optimized for agentic and reasoning tasks with multimodal support. The article provides practical integration guides for Transformers, vLLM, SGLang, and Docker deployments, plus details on quantization options and model architecture including sparse MoE with 128 experts and multilingual support across 48 languages.

Simon Willison · 6d ago · 6 · new model agent deployment

Google I/O 2026 introduced Gemini 3.5 Flash and Gemini Spark, a new AI agent product integrating with Google Workspace apps, running on Gemini 3.5 Flash and a closed-source Go binary called Antigravity. Key technical consideration: Spark uses isolated ephemeral VMs with DLP policies for enterprise security, though the author notes this is a critical area given prompt injection risks with sensitive data flows.

Latent Space · 7d ago · 9 · new model api update agent workflow

Google released Gemini 3.5 Flash (GA immediately) with 1M context window, 65k max output, and agentic/coding capabilities, plus the new Gemini Omni multimodal family for video/audio generation and editing. The stack includes expanded Antigravity agents across desktop/CLI/SDK/API, with Google reporting 3.2 quadrillion tokens/month processed and 900M+ monthly users.

Simon Willison · 7d ago · 9 · new model api update deployment inference

Google released Gemini 3.5 Flash to general availability with 1M input/65K output tokens, integrated into billions of consumer products, but at 3-6x higher pricing than previous Flash models ($1.50/$9 per million tokens). The release includes a new Interactions API (beta) for server-side history management and reflects an industry-wide trend of price increases for new model releases across OpenAI, Anthropic, and Google.
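As a rough illustration of what the quoted pricing means per request, the sketch below estimates cost from token counts. It assumes the $1.50 figure is the input price and the $9 figure is the output price, both per million tokens; the function name is hypothetical, and the numbers should be checked against Google's current price list.

```python
# Assumption: $1.50 is the input price and $9.00 the output price,
# both in USD per million tokens (the summary does not label them).
INPUT_USD_PER_MTOK = 1.50
OUTPUT_USD_PER_MTOK = 9.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Worst case for a single call: full 1M-token context and 65K-token output.
max_request = request_cost_usd(1_000_000, 65_000)
print(f"{max_request:.3f} USD")  # 2.085 USD
```

Under these assumptions a single maxed-out request costs a little over two dollars, with the output tokens accounting for roughly 28% of the bill despite being about 6% of the tokens.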

r/LocalLLaMA · 7d ago · 6 · new model benchmark

A community discussion about HRM-Text, a new 1B-parameter model with impressive benchmark claims. The post raises valid skepticism about the benchmarks and asks for a technical explanation of the model's architecture and practical limitations, aimed at engineers evaluating whether to adopt it.

HuggingFace Blog · 7d ago · 7 · new model inference benchmark

OlmoEarth v1.1 achieves a 3x compute cost reduction for satellite imagery processing while maintaining performance, through an optimized transformer architecture and token representation strategies. The release demonstrates practical efficiency improvements in large-scale geospatial AI inference, with technical details on patch-based tokenization and multi-resolution handling for remote sensing data.

r/LocalLLaMA · 8d ago · 8 · new model research inference

Lance is a unified multimodal model from ByteDance that handles image and video understanding, generation, and editing in a single framework. The paper demonstrates strong performance on diverse visual reasoning tasks including video QA, chart analysis, and detailed scene description, making it relevant for engineers building multimodal AI applications.

HuggingFace Blog · 8d ago · 8 · new model tool open source rag tutorial

Six new Sentence Transformers CrossEncoder rerankers, built on ModernBERT and trained with distillation on open datasets, achieve SOTA performance at multiple model sizes. The release includes full training recipes, an easy 3-line inference API, and a new Hugging Face Agent Skill for fine-tuning rerankers on custom data.

DeepMind Blog · 9d ago · 7 · new model tool agent

Google has expanded Project Genie, its world model capable of generating interactive environments, by integrating Street View imagery to ground virtual worlds in real-world locations. This enables AI agents and robots to train and simulate in realistic environments tied to actual places, with the capability now rolling out to Google AI Ultra subscribers globally.

DeepMind Blog · 9d ago · 8 · new model api update inference

Google released Gemini Omni Flash, a multimodal generative model that creates and edits video from text, image, audio, and video inputs with consistent physics and character continuity. The model supports iterative natural language editing and reasoning about real-world physics, now rolling out to Gemini app, Google Flow, and YouTube Shorts with plans to add image and audio generation.

DeepMind Blog · 10d ago · 7 · new model agent api update workflow

Google launches Gemini for Science, a collection of experimental AI tools (Co-Scientist, AlphaEvolve, Empirical Research Assistance, NotebookLM) designed to accelerate scientific research workflows by automating complex tasks like literature analysis and data synthesis. Enterprise versions are already in private preview with companies like BASF and Bayer, with validation papers published in Nature.

r/LocalLLaMA · 10d ago · 7 · new model open source tool inference agent benchmark

Jackrong/Qwopus3.5-9B-Coder-GGUF is a 9B fine-tuned coding model optimized for agentic tasks, tool calling, and complex reasoning, with practical integration guides across multiple inference frameworks (llama.cpp, vLLM, Ollama, etc.) and strong reported performance on SWE-bench. The model runs efficiently on 16GB RAM devices at 8-bit precision, making it accessible for local development while maintaining competitive coding capabilities.
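A back-of-the-envelope check on the 16GB claim: weight memory is roughly parameter count times bits per weight. The sketch below assumes exactly 8 bits per weight and ignores the KV cache, activations, and per-block quantization metadata, so real usage will be somewhat higher; the helper name is hypothetical.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Assumption: 9B parameters stored at a flat 8 bits each.
weights_gb = weight_footprint_gb(9, 8)
print(f"{weights_gb:.1f} GB of weights")  # 9.0 GB of weights

# Memory left on a 16GB device for KV cache, runtime, and the OS.
headroom_gb = 16 - weights_gb
```

Under these assumptions roughly 7 GB of a 16GB machine remains for the KV cache, the inference runtime, and the operating system.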