News Nug

Looking for arXiv endorsement + sharing a preprint on homeostatic cognitive architecture for AI companions [R]

r/MachineLearning · 4d ago · 7 · research rag architecture benchmark

PHI // DRIFT is a cognitive architecture adding persistent internal state and advanced memory retrieval to LLMs through a Decision Memory Unit (DMU) that shows 14.8% context improvement over cosine-only RAG. The approach is validated on consumer hardware without GPU acceleration and includes measurable continuity metrics (PEDI) for evaluating conversation coherence across interactions.

Scaling LLMs horizontally: hidden-state coupling without weight modification [R]

r/MachineLearning · 9d ago · 8 · research architecture inference open source

Residual Coupling (RC) is a novel architecture that connects frozen language models in parallel using lightweight linear bridge projections, achieving significant improvements over baselines and MoE routing (80.7% perplexity reduction in medical domain). The approach enables horizontal scaling of multi-model systems without modifying base weights, with potential applications in reducing multi-turn prompting to single parallel forward passes and edge deployment.

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention

Ahead of AI · 11d ago · 9 · new model research architecture inference open source

Deep technical analysis of long-context efficiency improvements in recent open-weight LLMs, focusing on architectural innovations like KV sharing, layer-wise attention budgeting, and compressed convolutional attention across Gemma 4, Laguna XS.2, ZAYA1, and DeepSeek V4. The article provides detailed explanations of how modern models optimize KV-cache size, memory traffic, and attention computation costs—critical constraints for building production AI systems with extended context windows.

Transformers with Selective Access to Early Representations [R]

r/MachineLearning · 21d ago · 7 · research architecture inference

SATFormer introduces a more efficient alternative to recent Transformer variants by replacing static cross-layer pathways with per-token, per-head gating that selectively reuses first-layer representations. The method achieves better efficiency-performance tradeoffs (1.75-1.82× higher throughput than competitors) while improving validation loss at 130M-1.3B scale and showing strong results on retrieval-intensive tasks.

claude-code-book — 《御舆：解码 Agent Harness》42万字拆解 AI Agent 的Harness骨架与神经 —— Claude Code 架构深度剖析，15 章从对话循环到构建你自己的 Agent Harness。在线阅读网站：

GitHub Trending AI · 56d ago · 7 · agent architecture tutorial workflow

A comprehensive Chinese technical guide ("御舆") that deconstructs AI Agent architecture, specifically analyzing Claude Code's design patterns including conversation loops, tool permission pipelines, context compression, and the Agent Harness runtime framework. Provides a transferable mental model for building production-grade agent systems across different frameworks without relying on prompt engineering tutorials.

how-claude-code-works — Deep dive into Claude Code internals — architecture, agent loop, context engineering, and more. / 深入解析 Claude Code 源码：架构、Agent 循环、上下文工程、工具系统等

GitHub Trending AI · 57d ago · 9 · agent architecture tutorial open source workflow

In-depth technical analysis of Claude Code's source architecture, covering the agent loop, context engineering, tool system, and production-grade error recovery strategies. Includes a companion project (Claude Code From Scratch) with ~4000 lines of TypeScript/Python and 11-chapter tutorial for building your own AI programming agent from scratch.