HN AI Stories · 512d ago · 7 · benchmark new model inference

Comprehensive year-in-review of LLM developments in 2024. It highlights that 18 organizations now have models surpassing GPT-4, alongside major advances in context length (up to 2M tokens with Gemini), multimodal input (including video), and wider model availability across open-source and commercial providers. Key takeaways: GPT-4-class performance is no longer limited to a handful of labs, long-context models have become practical for code and document analysis, and AI agents and multimodal processing are moving from emerging capabilities toward standard features.

HN AI Stories · 779d ago · 9 · open source library inference tutorial

llm.c is a high-performance C/CUDA implementation of LLM pretraining that avoids heavy dependencies (PyTorch, Python) while running about 7% faster than PyTorch Nightly. It provides clean reference implementations for reproducing GPT-2/GPT-3 models, with both GPU (CUDA) and CPU code paths, which makes it useful for understanding training mechanics and CUDA optimization.

HN AI Stories · 910d ago · 8 · tool open source deployment inference

The llamafile 0.10.0 update from Mozilla.ai lets developers distribute and run open LLMs as single-file executables across platforms, with no installation required. This release tracks recent llama.cpp versions more closely and supports more recent models. The tool also includes whisperfile for single-file speech-to-text, making local LLM deployment considerably more accessible for developers.