News Nug

OpenAI Blog · 36d ago · 5

Fixing Unsupervised Hyperbolic Contrastive Loss [D]

r/MachineLearning · 36d ago · 7 · research fine tuning workflow

A software engineer is debugging an implementation of unsupervised hyperbolic contrastive learning on ImageNet-1k, where their hyperbolic version (57% 1-NN accuracy) significantly underperforms standard Euclidean cosine contrastive learning (64%). The issue likely involves manifold constraint enforcement, loss formulation design, or hyperparameter tuning specific to hyperbolic geometry.
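
For readers unfamiliar with the "manifold constraint" issue, here is a minimal sketch. It assumes the Poincaré ball model with curvature -1, which the post summary does not confirm. The helper names and the `EPS` margin are illustrative and not taken from the poster's code. Embeddings that drift onto or past the unit boundary make the distance undefined, so a projection step is usually applied before computing it.

```python
import math

EPS = 1e-5  # assumed safety margin from the ball boundary (illustrative value)

def project_to_ball(x, eps=EPS):
    """Rescale x so its Euclidean norm stays strictly below 1."""
    norm = math.sqrt(sum(v * v for v in x))
    max_norm = 1.0 - eps
    if norm > max_norm:
        scale = max_norm / norm
        return [v * scale for v in x]
    return list(x)

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # The denominator approaches 0 as either point nears the boundary,
    # which is why unprojected embeddings can produce inf or NaN losses.
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v))
    return math.acosh(arg)

if __name__ == "__main__":
    origin = [0.0, 0.0]
    point = project_to_ball([0.5, 0.0])
    print(poincare_distance(origin, point))
```

In a contrastive loss the negative of this distance would typically replace cosine similarity as the logit, though whether the poster does this is not stated.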

datasette-llm 0.1a7

Simon Willison · 36d ago · 7 · tool workflow open source

Datasette now supports configurable default options for LLM models in plugins, allowing users to specify model selection and parameters like temperature across enrichment operations. This workflow improvement addresses practical concerns for teams building LLM-integrated data tools.

llm-echo 0.5a0

Simon Willison · 36d ago · 7 · tool testing open source

A new testing plugin provides a fake LLM model ('echo') that echoes prompts without actual inference, enabling developers to write automated tests for LLM-based applications. The tool supports faking reasoning blocks and JSON responses, streamlining test development workflows.

Granite 4.1 3B SVG Pelican Gallery

Simon Willison · 36d ago · 7 · new model open source inference

IBM released Granite 4.1 LLMs (3B, 8B, 30B sizes) under the Apache 2.0 license with detailed training documentation, and Unsloth published 21 GGUF quantized variants of the 3B model ranging from 1.2GB to 6.34GB. The post documents an experimental evaluation of how quantization affects model performance on SVG generation tasks, providing practical insights into model size-quality tradeoffs for local deployment.

Building a 9-ball AI player: Candidate generation for direct cut shots [P]

r/MachineLearning · 36d ago · 5

How do you experiment with a (very) large model architecture? [D]

r/MachineLearning · 37d ago · 6 · workflow research

Reddit discussion on practical strategies for validating expensive diffusion model experiments, covering dataset reduction, batch size/learning rate tradeoffs, and early stopping. While not a formal resource, it discusses real engineering constraints relevant to researchers reproducing compute-heavy papers.

White House Considers Vetting A.I. Models Before They Are Released

r/LocalLLaMA · 37d ago · 5

TRE Python binding — ReDoS robustness demo

Simon Willison · 37d ago · 6 · tool library research

Explores TRE regex engine's superior handling of ReDoS attacks compared to Python's standard library, with Claude Code used to build experimental Python bindings and test malicious regex patterns. Demonstrates practical security benefits of backtracking-free regex implementations for AI engineers building systems that process untrusted regex inputs.
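
To illustrate the kind of input involved, the sketch below times a classic catastrophic-backtracking pattern against Python's standard `re` module. The pattern and input sizes are assumptions chosen for illustration, not the ones used in the post. The TRE binding is not shown because its API is not described in the summary.

```python
import re
import time

# Nested quantifiers over the same character: a well-known ReDoS shape.
EVIL_PATTERN = re.compile(r"(a+)+$")

def time_failed_match(n):
    """Return (seconds, match) for EVIL_PATTERN against n 'a's followed by 'b'."""
    text = "a" * n + "b"  # the trailing 'b' forces the overall match to fail
    start = time.perf_counter()
    match = EVIL_PATTERN.match(text)
    return time.perf_counter() - start, match

if __name__ == "__main__":
    # Each extra 'a' roughly doubles the work for a backtracking engine.
    for n in (8, 12, 16, 20):
        seconds, match = time_failed_match(n)
        print(f"n={n:2d}  matched={match is not None}  {seconds:.4f}s")
```

An engine that does not backtrack would be expected to handle the same inputs in time roughly linear in their length.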

QLoRA Fine-Tuning of Qwen2.5-1.5B for CEFR English Proficiency Classification (A1–C2) [P]

r/MachineLearning · 37d ago · 8 · fine tuning open source tool tutorial benchmark

A practical fine-tuning case study using QLoRA to adapt Qwen2.5-1.5B for CEFR English proficiency classification, reaching 84.9% accuracy across the six levels (A1–C2). The work includes synthetic dataset generation via Llama-3.3-70B, 4-bit quantization, and FastAPI deployment, demonstrating parameter-efficient fine-tuning (0.28% of parameters trained) for real-world educational NLP tasks.

Parax v0.5: Parametric Modeling in JAX [P]

r/MachineLearning · 37d ago · 7 · library open source tool

Parax is a general-purpose JAX library for parametric modeling. It provides derived and constrained parameters, computed PyTrees, and abstract interfaces for parameter management, favoring clean, extensible, opt-in APIs over framework overhead.

Why SSMs struggle in parameter-constrained training: empirical findings at 25M parameters [R]

r/MachineLearning · 37d ago · 7 · research benchmark inference workflow

Deep technical analysis of SSM (State Space Model) vs Transformer performance constraints from OpenAI's Parameter Golf competition, revealing that SSMs have fundamental compression disadvantages (3.26x worse LZMA compression on weights) in size-constrained regimes. Includes kernel-level optimization experiments on Mamba-3 Triton kernels and practical findings on mixed-precision techniques that recovered 0.8 mBPB.
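
As a rough illustration of what an LZMA compression comparison measures, the sketch below computes a compression ratio with Python's standard `lzma` module. It assumes the ratio is raw size divided by compressed size, and it uses synthetic byte strings as stand-ins. The post's actual procedure and weight serialization format are not given in the summary.

```python
import lzma
import os

def lzma_ratio(raw: bytes) -> float:
    """Raw size divided by LZMA-compressed size; higher means more compressible."""
    return len(raw) / len(lzma.compress(raw))

if __name__ == "__main__":
    # Stand-ins only: real use would pass serialized model weights instead.
    low_entropy = bytes(4096)        # all zeros, compresses very well
    high_entropy = os.urandom(4096)  # random bytes, barely compresses
    print(f"low entropy ratio:  {lzma_ratio(low_entropy):.2f}")
    print(f"high entropy ratio: {lzma_ratio(high_entropy):.2f}")
```

In a size-capped setting where weights are stored compressed, a lower ratio means fewer effective parameters fit in the same budget.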

AutoBe benchmark: structured harness narrows frontier-vs-local gap in backend generation [D]

r/MachineLearning · 37d ago · 7 · benchmark tool inference workflow

AutoBe introduces a structured benchmark for end-to-end backend generation using AST-based function calling rather than unstructured code generation, with deterministic static analysis scoring. Key finding: smaller/cheaper models (qwen3.5-27b, local models) achieve competitive results with frontier models when using well-structured harnesses, suggesting harness design matters more than model size for backend generation tasks.

Llama.cpp MTP support now in beta!

r/LocalLLaMA · 37d ago · 8 · inference optimization open source tool

A Pull Request implementing Multi Token Prediction (MTP) head support in llama.cpp, enabling speculative decoding with ~2.5x speedup and 75% token acceptance rates on Qwen3.6 models. The implementation optimizes host-device data transfers and is designed to work with any MTP-capable model, with working examples and performance benchmarks provided.
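
To give a feel for how an acceptance rate relates to throughput, here is a back-of-the-envelope sketch. It assumes each drafted token is accepted independently with the same probability, and it uses a hypothetical draft length of 3, which the summary does not state. It ignores the cost of running the MTP head, so it is an upper-bound intuition and not a prediction of the reported speedup.

```python
def expected_tokens_per_pass(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass of the main model.

    Assumes independent acceptance: the pass yields the accepted prefix
    of the draft plus one token produced by the main model itself.
    """
    if accept_rate == 1.0:
        return float(draft_len + 1)
    # Geometric series: 1 + a + a^2 + ... + a^draft_len
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)

if __name__ == "__main__":
    # draft_len=3 is an illustrative assumption.
    print(expected_tokens_per_pass(0.75, 3))
```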

Live demo of LocalVQE: Tiny ~1M param audio model that cancels echo and noise in realtime

r/LocalLLaMA · 37d ago · 5

Photo-agents — Autonomous self-evolving agents. Vision-grounded layered memory and self-written skills for LLM agents that operate your computer.

GitHub Trending AI · 37d ago · 7 · agent tool open source

Photo Agents is a Python runtime for building vision-grounded autonomous agents that perceive screen state, reason about tasks, and execute actions with self-learned skills—running locally for full data ownership. The framework uses layered memory similar to biological systems and requires Python 3.10+ with API key validation, offering multiple frontend options for integration.

"Second Thoughts" Been playing with adding a small transformer that reads output near the end of generation, and feeds it back near the top as a refinement loop. A quick test of 1.7B model showed drastic improvement in focused tasks (like coding)

r/LocalLLaMA · 37d ago · 7 · research fine tuning inference open source

Developer shares work on a reverse LLM sidecar architecture that improves code generation in small models (1.7B-9B) by reading outputs end-to-start and injecting feedback loops focused on syntax correction. The approach shows promise on HumanEval benchmarks and code is being cleaned up for GitHub release.

How OpenAI delivers low-latency voice AI at scale

OpenAI Blog · 37d ago · 7 · inference deployment workflow

OpenAI details architectural improvements to their WebRTC implementation for real-time voice AI, focusing on latency optimization and conversation management. This provides practical insights into building low-latency audio systems for AI applications, relevant for engineers implementing real-time voice features.

torch-nvenc-compress: GPU NVENC silicon as a PCIe bandwidth multiplier — PCA + pure-ctypes Video Codec SDK wrapper. Parallel-path overlap measured at 67% of theoretical max on a real GEMM + encode workload. [P]

r/MachineLearning · 37d ago · 8 · tool inference open source benchmark

A proof-of-concept leveraging idle NVENC hardware on GPUs to compress neural network intermediate states (activations, KV cache) for PCIe transfer, achieving ~180 GB/s effective bandwidth on consumer GPUs like the RTX 5090—effectively recovering NVLink-class performance through hardware-pipelined codec operations that hide behind compute.

how-to-train-your-gpt — Build a modern LLM from scratch. Every line commented. Explained like we are five.

GitHub Trending AI · 37d ago · 8 · tutorial open source workflow

A comprehensive 12-chapter interactive textbook that teaches building a decoder-only Transformer from scratch with fully annotated code (~860 lines core implementation, ~2600 lines explanation). Covers tokenization, embeddings, attention mechanisms, training loops, and inference—the same architecture family behind ChatGPT, Claude, and LLaMA—with no prerequisites beyond Python basics.