News Nug

Marc Andreessen introspects on The Death of the Browser, Pi + OpenClaw, and Why "This Time Is Different"

Latent Space · 54d ago · 6 · agent open source inference prompt engineering

Marc Andreessen discusses AI's 80-year technical trajectory, scaling laws, reasoning models, agents, and edge inference in a long-form conversation. Key technical insights include his perspectives on agents as a Unix-like architecture, edge AI economics, open-source models, and why software bottlenecks may matter more than model improvements going forward.

[AINews] Gemma 4: The best small Multimodal Open Models, dramatically better than Gemma 3 in every way

Latent Space · 54d ago · 9 · new model open source agent inference api update

Google DeepMind released Gemma 4, a family of open-weight models (31B dense, 26B MoE, edge variants) under Apache 2.0 license with native multimodal support (text/image/video/audio), 256K context, and function calling—positioning it as a top-tier open model for reasoning, agents, and edge deployment. The 31B variant achieves competitive performance with significantly fewer parameters than rivals, with strong benchmarks on GPQA and AIME, and rapid ecosystem adoption already underway.

Moonlake: Causal World Models should be Multimodal, Interactive, and Efficient — with Chris Manning and Fan-yun Sun

Latent Space · 55d ago · 7 · research workflow benchmark

Moonlake AI presents an alternative world modeling approach using game engine bootstrapping and structured representations rather than pure scaling, addressing limitations of models like Genie 3 through multiplayer interactivity, indefinite lifetimes, and better physical consistency. The research emphasizes efficiency via causal structure and semantic understanding over high-resolution pixel prediction, with insights from Chris Manning and Ian Goodfellow on why this architectural approach is necessary for practical planning and environmental understanding.

Gemma 4: Byte for byte, the most capable open models

DeepMind Blog · 55d ago · 9 · new model open source inference fine tuning

Google released Gemma 4, a family of open-source models (2B to 31B parameters) built on Gemini 3 technology, ranked #3 and #6 on Arena AI leaderboard for their sizes. The models are optimized for on-device deployment, agentic workflows, and fine-tuning across hardware from mobile to datacenter, with Apache 2.0 licensing enabling direct integration into engineering workflows.

[AINews] A quiet April Fools

Latent Space · 55d ago · 7 · new model open source agent benchmark deployment

Multiple open-weight model releases including Arcee's 400B Trinity-Large-Thinking (Apache 2.0, strong agentic benchmarks), Z.ai's GLM-5V-Turbo (native multimodal vision-coding), and TII's Falcon Perception with efficient OCR. Also covers a Claude Code source leak analysis and competitive landscape updates relevant to developers building agents and deploying models.

Welcome Gemma 4: Frontier multimodal intelligence on device

HuggingFace Blog · 55d ago · 9 · new model open source benchmark deployment

Google releases Gemma 4, a new family of open-source multimodal models (4 sizes, up to 31B dense and 26B MoE) with Apache 2 licenses, strong arena benchmark scores, and support for image/audio/text inputs. The models feature novel architecture improvements like Per-Layer Embeddings and variable aspect ratio image encoding, with broad framework support (transformers, llama.cpp, MLX, WebGPU, Rust) for on-device and server deployment.

Holo3: Breaking the Computer Use Frontier

HuggingFace Blog · 56d ago · 7 · new model agent benchmark open source inference

Holo3 is a new 10B-parameter agent model achieving 78.85% on OSWorld benchmark for autonomous desktop task execution, with weights openly available on Hugging Face under Apache2 license. The model is production-ready and trained via a specialized flywheel combining synthetic navigation data, out-of-domain augmentation, and curated reinforcement learning for computer use tasks across enterprise applications.

Vibe-Trading — "Vibe-Trading: Your Personal Trading Agent"

GitHub Trending AI · 56d ago · 6 · tool agent open source api update

Vibe-Trading is an open-source multi-agent finance workspace that converts natural language into trading strategies and market analysis, with support for multiple LLM providers and free data sources. It offers API, CLI, and MCP plugin interfaces for integration into AI agent workflows, with backtesting capabilities and multi-platform export for TradingView/MT5.

Falcon Perception

HuggingFace Blog · 56d ago · 7 · new model open source research

TII releases Falcon OCR, a 0.3B parameter model achieving 80.3/88.6 on olmOCR/OmniDocBench benchmarks with the highest throughput among open-source OCR models. The post details their unified early-fusion Transformer architecture that combines vision and language modeling in a single backbone with hybrid attention masks and structured Chain-of-Perception decoding for dense object detection and segmentation.

Any Custom Frontend with Gradio's Backend

HuggingFace Blog · 56d ago · 8 · tool workflow api update deployment

gradio.Server enables building custom frontends (React, Svelte, vanilla JS) while leveraging Gradio's backend infrastructure including queuing, concurrency management, ZeroGPU support, and gradio_client compatibility. The approach extends FastAPI to provide both traditional Gradio UI components and full custom frontend flexibility with the same backend power.

open-multi-agent — TypeScript multi-agent framework — one runTeam() call from goal to result. Auto task decomposition, parallel execution. 3 dependencies, deploys anywhere Node.js runs.

GitHub Trending AI · 56d ago · 8 · library open source agent tool

open-multi-agent is a lightweight TypeScript multi-agent orchestration framework with minimal dependencies (3 runtime deps) designed for goal-driven agent coordination in Node.js environments. It provides a simpler alternative to LangGraph (declarative graph approach) and CrewAI (Python), with built-in features like structured output, task retry, and human-in-the-loop capabilities.

claude-code-book — 《御舆：解码 Agent Harness》42万字拆解 AI Agent 的Harness骨架与神经 —— Claude Code 架构深度剖析，15 章从对话循环到构建你自己的 Agent Harness。在线阅读网站：

GitHub Trending AI · 57d ago · 7 · agent architecture tutorial workflow

A comprehensive Chinese technical guide ("御舆") that deconstructs AI Agent architecture, specifically analyzing Claude Code's design patterns including conversation loops, tool permission pipelines, context compression, and the Agent Harness runtime framework. Provides a transferable mental model for building production-grade agent systems across different frameworks without relying on prompt engineering tutorials.

Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents

HuggingFace Blog · 57d ago · 8 · new model open source fine tuning research

IBM releases Granite 4.0 3B Vision, a modular vision-language model optimized for chart and document understanding, delivered as a LoRA adapter on Granite 4.0 Micro with a novel DeepStack architecture for multi-layer visual feature injection. The release includes ChartNet, a 1.7M-sample synthetic dataset for chart interpretation with code-guided augmentation, addressing a key VLM weakness in structured data reasoning.

claude-reviews-claude — Claude reads its own source code — 17-chapter architectural deep-dive into Claude Code v2.1.88. EN/ZH bilingual.

GitHub Trending AI · 57d ago · 6 · research open source

A comprehensive architectural analysis of Claude Code v2.1.88's TypeScript codebase (1,902 files, 477K lines), written by Claude itself, covering the query engine, 42 tools, multi-agent coordination, and 7-layer security model. While meta and entertaining, this is primarily documentation/breakdown of Anthropic's internal tooling rather than actionable technical content for building with AI.

how-claude-code-works — Deep dive into Claude Code internals — architecture, agent loop, context engineering, and more. / 深入解析 Claude Code 源码：架构、Agent 循环、上下文工程、工具系统等

GitHub Trending AI · 57d ago · 9 · agent architecture tutorial open source workflow

In-depth technical analysis of Claude Code's source architecture, covering the agent loop, context engineering, tool system, and production-grade error recovery strategies. Includes a companion project (Claude Code From Scratch) with ~4000 lines of TypeScript/Python and 11-chapter tutorial for building your own AI programming agent from scratch.

m_flow — A bio-inspired cognitive memory engine — a new paradigm for Graph RAG.

GitHub Trending AI · 57d ago · 7 · rag research workflow

M-flow introduces a novel RAG architecture where the knowledge graph becomes the scoring engine rather than a supporting structure, using path-cost reasoning and granularity-matched retrieval to find relevant evidence chains instead of relying on vector similarity alone. The system organizes knowledge in a four-layer cone graph (Episode→Facet→FacetPoint→Entity) and propagates evidence through typed edges to score relevance by coherent reasoning paths rather than embedding proximity.

Training mRNA Language Models Across 25 Species for $165

HuggingFace Blog · 57d ago · 8 · open source tool research benchmark fine tuning

OpenMed built an end-to-end open-source protein engineering pipeline combining structure prediction, sequence design, and codon optimization, with novel contributions in codon-level language modeling. They benchmarked transformer architectures (CodonRoBERTa-large-v2 vs ModernBERT) for codon optimization, scaled to 25 species in 55 GPU-hours, and released runnable code with full experimental transparency—directly applicable for engineers building biological AI systems.

TRL v1.0: Post-Training Library Built to Move with the Field

HuggingFace Blog · 57d ago · 8 · library fine tuning workflow research

TRL v1.0 introduces architectural lessons for building stable post-training libraries that can adapt as methods evolve from PPO to DPO to RLVR approaches. The library design prioritizes flexibility over fixed abstractions, recognizing that core concepts like reward models shift between being fundamental, optional, or reimagined as verifiers across different training paradigms.

Reimagining the mouse pointer for the AI era

DeepMind Blog · 59d ago · 6 · workflow ui design api update

Google is exploring AI-powered pointer interactions that bring contextual AI capabilities directly into users' existing workflows across applications, powered by Gemini. The approach focuses on reducing prompt engineering friction by letting AI understand visual and semantic context from pointer position and natural language commands, demonstrating principles for more intuitive human-AI interaction patterns.

Gemini 3.1 Flash Live: Making audio AI more natural and reliable

DeepMind Blog · 62d ago · 8 · new model api update agent

Google released Gemini 3.1 Flash Live, an improved real-time audio model with better precision, lower latency, and enhanced tonal understanding for voice-first applications. Available via Gemini Live API, it achieves 90.8% on ComplexFuncBench Audio and 36.1% on Scale AI's Audio MultiChallenge, enabling developers to build voice agents that handle complex tasks with natural dialogue in noisy environments.