PHI // DRIFT is a cognitive architecture adding persistent internal state and advanced memory retrieval to LLMs through a Decision Memory Unit (DMU) that shows 14.8% context improvement over cosine-only RAG. The approach is validated on consumer hardware without GPU acceleration and includes measurable continuity metrics (PEDI) for evaluating conversation coherence across interactions.
Residual Coupling (RC) is a novel architecture that connects frozen language models in parallel using lightweight linear bridge projections, achieving significant improvements over baselines and MoE routing (80.7% perplexity reduction in medical domain). The approach enables horizontal scaling of multi-model systems without modifying base weights, with potential applications in reducing multi-turn prompting to single parallel forward passes and edge deployment.
Deep technical analysis of long-context efficiency improvements in recent open-weight LLMs, focusing on architectural innovations like KV sharing, layer-wise attention budgeting, and compressed convolutional attention across Gemma 4, Laguna XS.2, ZAYA1, and DeepSeek V4. The article provides detailed explanations of how modern models optimize KV-cache size, memory traffic, and attention computation costs—critical constraints for building production AI systems with extended context windows.
SATFormer introduces a more efficient alternative to recent Transformer variants by replacing static cross-layer pathways with per-token, per-head gating that selectively reuses first-layer representations. The method achieves better efficiency-performance tradeoffs (1.75-1.82× higher throughput than competitors) while improving validation loss at 130M-1.3B scale and showing strong results on retrieval-intensive tasks.
A comprehensive Chinese technical guide ("御舆") that deconstructs AI Agent architecture, specifically analyzing Claude Code's design patterns including conversation loops, tool permission pipelines, context compression, and the Agent Harness runtime framework. Provides a transferable mental model for building production-grade agent systems across different frameworks without relying on prompt engineering tutorials.
In-depth technical analysis of Claude Code's source architecture, covering the agent loop, context engineering, tool system, and production-grade error recovery strategies. Includes a companion project (Claude Code From Scratch) with ~4000 lines of TypeScript/Python and 11-chapter tutorial for building your own AI programming agent from scratch.