News Nug

internlm/Intern-S2-Preview · Hugging Face

r/LocalLLaMA · 26d ago · 8 · new model tool inference open source agent

Intern-S2-Preview is a new 35B multimodal scientific foundation model that achieves strong performance through task scaling and full-chain training (pre-training to RL), with enhanced agent capabilities and efficient reasoning techniques. The release includes deployment guides for popular inference frameworks (Transformers, vLLM, SGLang) and demonstrates competitive performance on scientific and general reasoning benchmarks while maintaining multimodal understanding.

elephant-agent — Personal-Model First Self Evolving AI Agent 🐘

GitHub Trending AI · 26d ago · 6 · agent workflow open source

Elephant Agent is a personal AI agent framework that maintains persistent, evolving context about a user through selective memory and a correctable personal model rather than storing full transcripts. The system uses curiosity-driven loops to extract durable knowledge from interactions and present it through a dashboard for user oversight and correction.

HermesPet — Let AI live in your MacBook's notch · Zero dependencies, works out of the box · A multi-engine parallel desktop AI companion (Swift 6 / SwiftUI / macOS 14+)

GitHub Trending AI · 26d ago · 6 · tool open source deployment

HermesPet is a native macOS AI client that runs in the notch, supports multiple AI engines in parallel (Claude, DeepSeek, Kimi, OpenAI, etc.), and offers local-first features such as file handling, voice input, and knowledge graph visualization. A Windows beta is also available with core chat and tool capabilities. The project is technically interesting for its Swift 6/SwiftUI implementation and its multi-model orchestration approach.

parastore — Draw a store, generate LLM personas, and watch them shop — an isometric 3D sandbox for synthetic-consumer experiments.

GitHub Trending AI · 26d ago · 5

arXiv implements 1-year ban for papers containing incontrovertible evidence of unchecked LLM-generated errors, such as hallucinated references or results. [N]

r/MachineLearning · 27d ago · 6 · workflow prompt engineering

arXiv moderator Thomas Dietterich clarifies the platform's Code of Conduct regarding AI-generated content in academic papers, emphasizing author responsibility for all submitted material regardless of generation method. The post outlines specific penalties (1-year ban + peer-review requirement) for papers with evidence of unchecked LLM outputs, with concrete examples like hallucinated references and meta-comments left in final submissions.

datasette-llm-limits 0.1a0

Simon Willison · 27d ago · 6 · tool library deployment

A new Datasette plugin enables spending limit controls for LLM usage, integrating with datasette-llm and datasette-llm-accountant to manage per-user or global cost caps. This addresses practical cost management for developers building LLM applications within Datasette environments.
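
The plugin's own configuration and API are not described here, so the following is only a minimal sketch of the general idea of combining a per-user cap with a global cap. The class and method names (SpendingLimiter, can_spend, record) are hypothetical, and amounts are kept in integer cents to avoid floating-point drift.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpendingLimiter:
    """Hypothetical cost cap: one global budget plus a budget per user (integer cents)."""

    global_limit_cents: int
    per_user_limit_cents: int
    global_spent_cents: int = 0
    user_spent_cents: dict[str, int] = field(default_factory=dict)

    def can_spend(self, user: str, estimated_cents: int) -> bool:
        """True only if neither this user's cap nor the global cap would be exceeded."""
        user_total = self.user_spent_cents.get(user, 0) + estimated_cents
        global_total = self.global_spent_cents + estimated_cents
        return (user_total <= self.per_user_limit_cents
                and global_total <= self.global_limit_cents)

    def record(self, user: str, actual_cents: int) -> None:
        """Charge the actual cost once the LLM call has finished."""
        self.user_spent_cents[user] = self.user_spent_cents.get(user, 0) + actual_cents
        self.global_spent_cents += actual_cents
```

A caller would check can_spend() with an estimate before the request and call record() with the real cost afterwards; checking both caps in one place means a single heavy user cannot exhaust the shared budget on their own.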

[AINews] Everything is Conductor

Latent Space · 27d ago · 7 · tool agent api update workflow

GitHub and OpenAI released significant updates to coding agent tooling: GitHub's new Copilot App provides an agent-first desktop environment for parallel workflows, while OpenAI expanded Codex into mobile with remote execution, SSH management, and programmatic automation hooks. VS Code added multi-agent/multi-project support with browser/mobile access via vscode.dev/agents and token-efficiency features.

How data science teams use Codex

OpenAI Blog · 27d ago · 5 · workflow api update

The article describes using Codex (OpenAI's code model) to automate documentation for data science workflows, turning raw work inputs into structured business outputs such as briefs and analytics specs. It is practical for engineers integrating LLMs into data pipelines, though it focuses more on business process automation than on novel technical implementation.

Follow the Mean: Reference-Guided Flow Matching [R]

r/MachineLearning · 27d ago · 6 · research inference

This paper introduces reference-guided flow matching, a technique that leverages mean trajectories to improve generative model training and sampling efficiency. While technically interesting for diffusion model research, it is primarily a theoretical contribution, likely more relevant to engineers building advanced generative systems than to those focused on immediate production use.

A First Comprehensive Study of TurboQuant: Accuracy and Performance

r/LocalLLaMA · 27d ago · 8 · inference benchmark research optimization

TurboQuant is a KV-cache quantization method that compresses to 3-4 bits during storage and dequantizes to BF16 for attention computation, offering significant GPU memory savings. This comprehensive benchmark study evaluates TurboQuant variants against FP8 baselines across four large models (30B-200B+) and realistic workloads, providing practical guidance for inference optimization and memory efficiency tradeoffs.
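
To make the storage/compute split concrete, here is a minimal sketch of low-bit KV-cache storage with dequantization before attention. It uses plain uniform per-group quantization, which is a simpler stand-in and not TurboQuant's own scheme; the function names, the FP16 scale-plus-offset metadata (4 bytes per group), and the group size are all assumptions for illustration.

```python
from __future__ import annotations


def quantize_group(values: list[float], bits: int = 4) -> tuple[list[int], float, float]:
    """Uniform asymmetric quantization of one group of cache values to `bits`-bit codes."""
    levels = (1 << bits) - 1  # 4-bit codes span 0..15
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Round to the nearest level and clamp against float edge cases.
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes: list[int], scale: float, lo: float) -> list[float]:
    """Reconstruct approximate values; a real kernel would emit BF16 for attention."""
    return [lo + c * scale for c in codes]


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, tokens: int,
                   bits: int, group_size: int | None = None, meta_bytes: int = 4) -> int:
    """Bytes for keys and values across all layers, plus optional per-group metadata."""
    elements = 2 * layers * kv_heads * head_dim * tokens  # factor 2 = keys + values
    total = elements * bits // 8
    if group_size:
        total += (elements // group_size) * meta_bytes  # e.g. FP16 scale + FP16 offset
    return total
```

For a hypothetical model with 48 layers, 8 KV heads, head_dim 128 and a 32K-token context, kv_cache_bytes gives 6.0 GiB at 16 bits versus about 1.69 GiB at 4 bits with group_size=64, which shows why the per-group metadata overhead matters alongside the raw bit width.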

Sea's View on the Future of Agentic Software Development with Codex

OpenAI Blog · 27d ago · 5 · deployment workflow

Sea Limited is adopting Codex (OpenAI's code generation model) to accelerate development across engineering teams in Asia. The piece discusses deployment strategy and organizational workflow changes for AI-assisted coding, relevant for understanding enterprise adoption patterns of code generation tools.

NVIDIA Reportedly Prepares RTX 5090 Price Hike Amid Rising GDDR7 Costs (maybe RTX 50 and PRO series as well)

r/LocalLLaMA · 27d ago · 5

Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

HuggingFace Blog · 27d ago · 8 · new model open source rag deployment inference

Granite Embedding Multilingual R2 releases two new multilingual embedding models (97M and 311M parameters) supporting 200+ languages with 32K token context length and enhanced retrieval for 52 languages plus code. Both models ship with ONNX/OpenVINO optimization, work out-of-the-box with sentence-transformers and major RAG frameworks (LangChain, LlamaIndex, Haystack, Milvus), and are Apache 2.0 licensed—enabling drop-in replacement for language coverage at minimal performance cost.

VS Code's new "Agents window" lets you use local AI models. Still requires an Internet connection and a Github Copilot plan (because we can't have nice things)

r/LocalLLaMA · 27d ago · 6 · tool workflow api update

VS Code's AI Toolkit extension now supports agent-first development with configurable language models optimized for different tasks, including reasoning models with adjustable thinking effort levels. The article covers model selection strategies (fast vs. reasoning models), tool-calling support for agents, and how to configure API keys for custom models.

Work with Codex from anywhere

OpenAI Blog · 27d ago · 6 · api update tool workflow

OpenAI's Codex integration in the ChatGPT mobile app enables remote code generation and task monitoring across devices. This expands practical access to AI-assisted coding workflows beyond desktop environments, useful for developers managing remote infrastructure or mobile-first development pipelines.

mobilegym — MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research · Browser-hosted Android Simulator · Verifiable Evaluation · Scalable Online RL Training

GitHub Trending AI · 27d ago · 8 · tool benchmark open source agent rl training

MobileGym is a browser-based mobile simulation environment for training and evaluating GUI agents, featuring 28 simulated apps, 416 task templates, and proven Sim-to-Real transfer (95.1% retention on real devices). The platform supports 256 parallel instances for scalable RL training and includes detailed benchmarking infrastructure with GRPO training recipes tested on Qwen3-VL-4B.
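
The GRPO recipes themselves are not shown here; as background, the core of GRPO is a group-relative advantage: several rollouts of the same task are scored (for example with a verifiable 0/1 task-success reward), and each reward is normalized against its own group instead of a learned value function. The sketch below assumes population standard deviation and a small epsilon; implementations differ on both details.

```python
from __future__ import annotations

import statistics


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages for rollouts sampled from the same task.

    Each reward is centred on the group mean and scaled by the group's
    population standard deviation, so no critic network is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

With rewards [1, 0, 0, 1] the successful rollouts get advantages close to +1 and the failures close to -1, while a group where every rollout succeeds (or every one fails) yields all zeros and contributes no gradient, which is one reason verifiable, varied task templates matter for this kind of training.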

Rare event prediction on time series that change structure mid-stream? [D]

r/MachineLearning · 27d ago · 7 · workflow tutorial research

A practitioner shares a real-world time series anomaly detection challenge: building failure prediction for IoT chargers with sparse positive labels (~1-2%), variable data rates between operational modes, and high device heterogeneity. They're exploring architectural solutions (dual RNN encoders vs. data-level sampling) and seeking advice on handling extreme class imbalance in time series forecasting.
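
Besides the architectural and data-level sampling options mentioned in the thread, a common loss-level alternative (swapped in here, not the poster's approach) is to up-weight the rare positive class in binary cross-entropy by the negative-to-positive ratio, the same idea as the pos_weight argument of PyTorch's BCEWithLogitsLoss. The helper names below are hypothetical; at 1-2% positives the ratio comes out between roughly 49 and 99.

```python
from __future__ import annotations

import math


def pos_weight(labels: list[int]) -> float:
    """Negatives per positive; used to up-weight the rare failure class."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    return (len(labels) - n_pos) / n_pos


def weighted_bce(prob: float, label: int, w_pos: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy where errors on positive (failure) windows cost w_pos times more."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    if label == 1:
        return -w_pos * math.log(p)
    return -math.log(1.0 - p)
```

The weight should be computed on the training split only; with heterogeneous devices it may also be worth checking how the ratio varies per device group before settling on one global value.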

datasette-ip-rate-limit 0.1a0

Simon Willison · 27d ago · 6 · tool workflow deployment

Simon Willison describes using GPT-5.5 to generate a configurable rate-limiting plugin for handling crawler traffic on datasette.io. The post provides practical insights into using LLMs for DevOps/infrastructure automation and production deployment patterns.
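
The plugin's actual options and internals are not covered in this summary, so the following is only a generic sketch of per-IP rate limiting with a sliding window, one common way to throttle crawler traffic. The class name and parameters are hypothetical, and the clock is injectable so behavior can be tested without waiting.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client IP within a rolling `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock  # injectable so tests can control time
        self.hits: defaultdict[str, deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        q = self.hits[ip]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # a web handler would typically answer HTTP 429 here
        q.append(now)
        return True
```

A sliding window avoids the burst at fixed-window boundaries, at the cost of storing one timestamp per allowed request per IP; a production version would also need to evict idle IPs from the dictionary.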

Continual Harness: Online Adaptation for Self-Improving Foundation Agents [R]

r/MachineLearning · 27d ago · 8 · agent research workflow

This research paper on Continual Harness shows how foundation models can autonomously refine their own execution harnesses through iterative self-improvement, illustrated by Gemini completing Pokémon games without losses. The work formalizes the agent-harness co-learning loop and finds that self-refinement capabilities are critical for long-horizon task completion, with implications for building more autonomous AI systems.

Unlocking asynchronicity in continuous batching

HuggingFace Blog · 28d ago · 8 · inference workflow tutorial

This article explains how to optimize LLM inference performance by decoupling CPU and GPU workloads through asynchronous batching, eliminating idle gaps that waste ~24% of runtime in synchronous approaches. The post builds on continuous batching concepts and provides practical profiling techniques to measure and improve GPU utilization, critical for managing high inference costs on hardware like H200s.
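
A back-of-the-envelope model shows why overlapping the two sides helps. The sketch assumes, as a simplification, that CPU-side preparation for step i+1 can run while the GPU executes step i, so the pipeline is bounded by the slower stage. The 24 ms / 76 ms split is illustrative, chosen only to match the ~24% idle figure, and the function names are hypothetical.

```python
def sync_time(n_steps: int, cpu_ms: float, gpu_ms: float) -> float:
    """Synchronous loop: CPU prepares each batch, then the GPU runs it; GPU idles meanwhile."""
    return n_steps * (cpu_ms + gpu_ms)


def async_time(n_steps: int, cpu_ms: float, gpu_ms: float) -> float:
    """Two-stage pipeline: CPU prep for step i+1 overlaps the GPU work of step i."""
    if n_steps == 0:
        return 0.0
    # First step pays both stages; later steps are bounded by the slower stage.
    return cpu_ms + gpu_ms + (n_steps - 1) * max(cpu_ms, gpu_ms)


def gpu_idle_fraction(total_ms: float, n_steps: int, gpu_ms: float) -> float:
    """Share of wall-clock time in which the GPU is not executing a step."""
    return 1.0 - n_steps * gpu_ms / total_ms
```

With 10 steps of 24 ms CPU and 76 ms GPU work, the synchronous loop takes 1000 ms with the GPU idle 24% of the time, while the overlapped version takes 784 ms and idles only about 3%. Once CPU work exceeds GPU work, overlap can no longer hide it, so profiling both sides remains necessary.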