This essay explores whether LLM capabilities emerge purely from scale (data plus compute) or require fundamental algorithmic innovations, tracing the debate from early computer vision work through GPT scaling. While intellectually engaging, it is primarily a philosophical reflection on existing trends and does not introduce new technical methods, models, or practical tools for engineers building with AI.
Anthropic released Claude Mythos Preview under restricted access through Project Glasswing. The model has dramatically enhanced cybersecurity research capabilities: it can autonomously develop complex multi-vulnerability exploits and ROP chains, achieving a 181/210 success rate on exploit development versus near 0% for Claude Opus 4.6. This represents a significant capability jump in AI-assisted vulnerability research, with direct implications for how engineers must approach security testing and deployment of foundational systems.
Moonlake AI presents an alternative world modeling approach using game engine bootstrapping and structured representations rather than pure scaling, addressing limitations of models like Genie 3 through multiplayer interactivity, indefinite lifetimes, and better physical consistency. The research emphasizes efficiency via causal structure and semantic understanding over high-resolution pixel prediction, with insights from Chris Manning and Ian Goodfellow on why this architectural approach is necessary for practical planning and environmental understanding.
TII releases Falcon OCR, a 0.3B parameter model achieving 80.3/88.6 on olmOCR/OmniDocBench benchmarks with the highest throughput among open-source OCR models. The post details their unified early-fusion Transformer architecture that combines vision and language modeling in a single backbone with hybrid attention masks and structured Chain-of-Perception decoding for dense object detection and segmentation.
IBM releases Granite 4.0 3B Vision, a modular vision-language model optimized for chart and document understanding, delivered as a LoRA adapter on Granite 4.0 Micro with a novel DeepStack architecture for multi-layer visual feature injection. The release includes ChartNet, a 1.7M-sample synthetic dataset for chart interpretation with code-guided augmentation, addressing a key VLM weakness in structured data reasoning.
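Since the model ships as a LoRA adapter on a base model, it may help to recall what applying such an adapter amounts to. The sketch below is a generic, minimal illustration of the standard LoRA update, W' = W + (alpha / r) * B A, in pure Python; it is not IBM's implementation, and the function names and toy matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product (illustrative, not optimized)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merged_weight(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Merge a low-rank adapter into a frozen base weight.

    w: base weight, shape (d_out, d_in)
    a: LoRA "A" matrix, shape (rank, d_in)
    b: LoRA "B" matrix, shape (d_out, rank)
    """
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example (hypothetical values): a rank-1 adapter on a 2x2 identity weight.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]
lora_b_zero = [[0.0], [0.0]]   # B is conventionally initialized to zero
lora_b_trained = [[1.0], [0.0]]

untouched = lora_merged_weight(base, lora_a, lora_b_zero, alpha=2.0)
adapted = lora_merged_weight(base, lora_a, lora_b_trained, alpha=2.0)
```

With B at zero the merged weight equals the base weight, which is why a freshly initialized adapter leaves the base model's behavior unchanged.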
OpenMed built an end-to-end open-source protein engineering pipeline combining structure prediction, sequence design, and codon optimization, with novel contributions in codon-level language modeling. They benchmarked transformer architectures (CodonRoBERTa-large-v2 vs ModernBERT) for codon optimization, scaled to 25 species in 55 GPU-hours, and released runnable code with full experimental transparency, making the work directly applicable for engineers building biological AI systems.
Research release of an empirically validated toolkit for measuring AI manipulation capabilities, tested across 10,000+ participants in finance and health domains. It provides open-source methodology and materials for evaluating how AI systems can be misused to deceptively influence human behavior and beliefs in high-stakes scenarios.
Comprehensive reference guide organizing 45+ LLM architectures with visual model cards and detailed explanations of attention variants (MHA, GQA, sliding window, etc.) used in modern models. Includes both a web gallery and printable poster, serving as a practical learning resource for understanding contemporary transformer architectures.
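Two of the attention variants named above can be sketched in a few lines. The helper names are hypothetical and the code is a minimal illustration, not taken from the guide. It shows how grouped-query attention (GQA) maps query heads onto shared key/value heads, and how a causal sliding-window mask limits which positions a token can attend to.

```python
from typing import List


def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the key/value head shared by a given query head under GQA.

    n_kv_heads == n_q_heads reduces to standard multi-head attention (MHA);
    n_kv_heads == 1 reduces to multi-query attention (MQA).
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be divisible by n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def sliding_window_causal_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j.

    Position i sees itself and the (window - 1) positions before it.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# 8 query heads sharing 2 KV heads: heads 0-3 use KV head 0, heads 4-7 use KV head 1.
gqa_mapping = [kv_head_for_query_head(h, 8, 2) for h in range(8)]

# 4 tokens with a window of 2: each token sees itself and one predecessor.
mask = sliding_window_causal_mask(seq_len=4, window=2)
```

Sharing KV heads shrinks the KV cache by a factor of `n_q_heads / n_kv_heads`, while the sliding window bounds the cache to `window` positions per layer.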
Google DeepMind released a cognitive taxonomy framework for measuring AGI progress, grounded in psychology and neuroscience, identifying 10 key cognitive abilities. They're launching a $200K Kaggle hackathon where engineers can design evaluations for five priority abilities (learning, metacognition, attention, executive functions, social cognition) using their new Community Benchmarks platform to test against frontier models.
Comprehensive technical comparison of 10+ major open-weight LLM releases from January-March 2026, analyzing architectural innovations like mixture-of-experts, sliding window attention, QK-norm, and gating mechanisms across models from Arcee, Moonshot, Qwen, and others. Serves as a practical reference for understanding current design patterns and trade-offs in large model architecture.
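For readers unfamiliar with QK-norm, the idea is commonly described as normalizing queries and keys (typically with RMSNorm over the head dimension) before their dot product, so attention logits stay bounded regardless of activation scale. The sketch below assumes that formulation, omits the learnable scale parameters real implementations usually include, and uses hypothetical function names.

```python
import math
from typing import List, Sequence


def rms_norm(vec: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Scale a vector to unit root-mean-square (no learnable gain in this sketch)."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def qk_norm_score(q: Sequence[float], k: Sequence[float]) -> float:
    """Attention logit for one query/key pair with QK-norm applied first."""
    qn, kn = rms_norm(q), rms_norm(k)
    dot = sum(a * b for a, b in zip(qn, kn))
    return dot / math.sqrt(len(q))  # usual 1/sqrt(head_dim) scaling


# Toy vectors (hypothetical values): scaling the query by 100x barely moves the logit.
small = qk_norm_score([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
large = qk_norm_score([100.0, 200.0, 300.0, 400.0], [4.0, 3.0, 2.0, 1.0])
```

Without the normalization step, the second logit would be 100 times the first, which is the kind of growth QK-norm is meant to suppress.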
Comprehensive overview of inference-time scaling techniques for LLMs, covering methods like chain-of-thought prompting, self-consistency, best-of-N ranking, and rejection sampling with verifiers. The author shares practical experimentation results (achieving 15% to 52% accuracy improvement) and categorizes approaches from both academic literature and proprietary LLM implementations, making it directly applicable to deployed systems.
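Two of the techniques listed above are simple enough to sketch without a model in the loop. This is a minimal illustration with hypothetical function names, not the author's code: self-consistency takes a majority vote over sampled final answers, and best-of-N keeps the candidate that a scoring function (for example, a verifier or reward model) rates highest.

```python
from collections import Counter
from typing import Callable, Sequence


def self_consistency(answers: Sequence[str]) -> str:
    """Return the most frequent final answer among sampled completions."""
    counts = Counter(a.strip() for a in answers)
    return counts.most_common(1)[0][0]


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest score under a caller-supplied scorer."""
    return max(candidates, key=score)


# Stand-in samples; in practice these would come from N sampled generations.
voted = self_consistency(["42", "41", "42 ", "40"])

# Stand-in scorer (string length) in place of a real verifier or reward model.
picked = best_of_n(["a", "bbb", "cc"], score=len)
```

Both methods trade extra inference compute (N samples instead of one) for accuracy; they differ in whether the selection signal comes from agreement among samples or from an external scorer.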
A comprehensive retrospective on 2025's major LLM developments, starting with DeepSeek R1's January release showing that reinforcement learning (specifically RLVR/GRPO) can enable reasoning-like behavior in LLMs, and revealing that state-of-the-art model training may cost an order of magnitude less than previously estimated. The article examines how post-training scaling through verifiable rewards represents a significant algorithmic shift from SFT/RLHF approaches, opening new possibilities for capability unlocking.
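The core of GRPO that distinguishes it from PPO-style RLHF is that it needs no learned value model: each sampled answer's advantage is its reward standardized against the other answers sampled for the same prompt. The sketch below shows only that step, with a hypothetical function name, and assumes population standard deviation plus a small epsilon for numerical stability; implementations differ on such details.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-8
) -> List[float]:
    """Standardize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Verifiable rewards for four sampled answers: 1.0 if correct, 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# If every answer gets the same reward, the group carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Correct answers receive positive advantages and incorrect ones negative, so the policy update pushes probability toward whatever the verifier accepted within each group.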
DeepSeek V3.2 is a new open-weight flagship model that achieves GPT-5/Gemini 3.0 Pro-level performance and uses a custom sparse attention mechanism that requires specialized inference infrastructure. The article provides a technical deep dive into the model's architecture, training pipeline, and what has changed since V3/R1, making it essential for engineers working with state-of-the-art open-source models.
Comprehensive overview of alternative LLM architectures beyond standard transformers, including diffusion models, linear attention hybrids, state space models (SSMs), and specialized architectures like code world models. The article surveys emerging approaches aimed at improving efficiency and modeling performance, with comparisons to current SOTA transformer-based models like DeepSeek R1, Llama 4, and Qwen3.
Deep dive into implementing the Qwen3 architecture from scratch in PyTorch, covering the open-weight model family's design choices and building blocks. It provides practical code examples and architectural patterns directly applicable to understanding modern LLM internals and building custom variations.