Ahead of AI · 254d ago · 8 · tutorial open source research

Deep dive into implementing the Qwen3 architecture from scratch in PyTorch, covering the design choices and building blocks of this open-weight model family. The practical code examples and architectural patterns are directly useful for understanding modern LLM internals and building custom variants.

HN AI Stories · 502d ago · 7 · benchmark new model inference

Comprehensive year-in-review of LLM developments in 2024. It reports that 18 organizations now have models surpassing GPT-4, alongside major advances in context length (up to 2M tokens with Gemini), multimodal capabilities (including video input), and wider model availability across open-source and commercial providers. Key takeaways: competitive model performance is no longer limited to a few labs, long-context handling has become practical for code and document analysis, and capabilities such as AI agents and multimodal processing are becoming standard.

HN AI Stories · 769d ago · 9 · open source library inference tutorial

llm.c is a high-performance C/CUDA implementation of LLM pretraining that avoids heavy dependencies (PyTorch, Python) while running about 7% faster than PyTorch Nightly. It provides clean reference implementations for reproducing GPT-2/GPT-3 models, with both GPU (CUDA) and CPU code paths, which makes it valuable for understanding training mechanics and CUDA optimization.

HN AI Stories · 900d ago · 8 · tool open source deployment inference

The llamafile 0.10.0 update from Mozilla.ai lets developers distribute and run open LLMs as single-file executables across platforms with no installation required, now with closer alignment to recent llama.cpp versions and support for more recent models. The release also includes whisperfile for single-file speech-to-text, making local LLM deployment considerably more accessible for developers.