News Nug

Public Repository "Codegraph" claims to reduce Claude, Cursor, Codex, and OpenCode API tool calls by 94% locally, an innovation that could directly offset the most recent Claude API pricing model.

r/LocalLLaMA · 22d ago · 8 · tool agent workflow open source

CodeGraph is a new MCP server tool that pre-indexes codebases into knowledge graphs (symbol relationships, call graphs, code structure), enabling AI agents like Claude Code to explore repositories with 92% fewer tool calls and 71% faster performance by querying local SQLite indices instead of scanning files. The tool auto-syncs via file watchers, integrates with Claude Code/Cursor/Codex CLI, and includes framework-specific routing detection for web apps.
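To make the "query a local index instead of scanning files" idea concrete, here is a minimal sketch using Python's built-in `sqlite3`. The `symbols` and `calls` tables, the sample rows, and the helper names are hypothetical and chosen for illustration; CodeGraph's actual schema and query interface may look quite different.

```python
import sqlite3

# Hypothetical schema for illustration only; not CodeGraph's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE symbols (id INTEGER PRIMARY KEY, name TEXT, file TEXT);
    CREATE TABLE calls (caller INTEGER, callee INTEGER);
""")
conn.executemany("INSERT INTO symbols VALUES (?, ?, ?)", [
    (1, "main", "app.py"),
    (2, "load_config", "config.py"),
    (3, "parse_args", "cli.py"),
    (4, "read_file", "io_utils.py"),
])
conn.executemany("INSERT INTO calls VALUES (?, ?)", [(1, 2), (1, 3), (2, 4)])


def callers_of(name):
    """Return (caller name, file) pairs for every direct caller of `name`."""
    return conn.execute("""
        SELECT s_from.name, s_from.file
        FROM calls
        JOIN symbols AS s_from ON s_from.id = calls.caller
        JOIN symbols AS s_to ON s_to.id = calls.callee
        WHERE s_to.name = ?
        ORDER BY s_from.name
    """, (name,)).fetchall()


def reachable_from(name):
    """Return the names of all functions transitively called by `name`."""
    rows = conn.execute("""
        WITH RECURSIVE reach(id) AS (
            SELECT id FROM symbols WHERE name = ?
            UNION
            SELECT calls.callee FROM calls JOIN reach ON calls.caller = reach.id
        )
        SELECT name FROM symbols
        WHERE id IN (SELECT id FROM reach) AND name != ?
        ORDER BY name
    """, (name, name)).fetchall()
    return [row[0] for row in rows]


print(callers_of("read_file"))
print(reachable_from("main"))
```

A single indexed query like `reachable_from` answers a question that would otherwise take an agent several file reads and searches, which is the kind of saving the post describes.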

Backprop-free Pong: PC + distributional Hebbian plasticity vs. PPO: 57% vs. 59%, ~1500 lines from scratch [P]

r/MachineLearning · 22d ago · 6 · research agent open source

Empirical comparison of bio-plausible learning (Hebbian plasticity + predictive coding) versus PPO on Pong, achieving 57% of PPO performance with zero backpropagation. Identifies catastrophic forgetting in non-stationary self-play as the key bottleneck rather than the lack of backprop, revealing the plasticity-stability tradeoff in biologically-inspired RL systems.

What do you think about Tabular Foundation Models [D]

r/MachineLearning · 22d ago · 5 · research benchmark

Reddit discussion questioning the practical utility of tabular foundation models (TabPFN-3, TabICL) despite impressive benchmark results, arguing that resource overhead (GB models for MB datasets) may not justify gains over classical ML with feature engineering. Raises valid engineering tradeoffs about model size, inference requirements, and explainability versus performance metrics.

bytedance released an open source model that attempts to do just about anything with only 3b parameters

r/LocalLLaMA · 22d ago · 8 · new model research inference

Lance is a unified multimodal model from ByteDance that handles image and video understanding, generation, and editing in a single framework. The paper demonstrates strong performance on diverse visual reasoning tasks including video QA, chart analysis, and detailed scene description, making it relevant for engineers building multimodal AI applications.

Advancing content provenance for a safer, more transparent AI ecosystem

OpenAI Blog · 22d ago · 5 · tool deployment

OpenAI has released Content Credentials integration and verification tools to help identify AI-generated media through technical standards. While not directly impacting daily AI engineering workflows, this is relevant for developers building content creation systems who need to implement transparency and provenance tracking.

[AINews] How to land a job at a frontier lab (on Pretraining)

Latent Space · 22d ago · 8 · tutorial workflow inference benchmark

Vlad Feinberg's hiring/skill guide emphasizes kernel-level performance optimization as the critical bottleneck in LLM work, highlighting the need for JAX/Pallas expertise to fuse operations like MoE projections for measurable speedups. The piece connects pretraining fundamentals (Chinchilla laws, dense vs MoE tradeoffs) with low-level optimization as a direct path into AI labs, plus practical exercises (deriving scaling laws, implementing kernels from scratch) that double as hiring tests.
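As a rough illustration of the "deriving scaling laws" exercise, the sketch below combines two widely used approximations: training compute C ≈ 6·N·D FLOPs for N parameters and D tokens, and the Chinchilla rule of thumb of about 20 tokens per parameter. Both constants are approximations rather than figures from the guide, and the function name is made up for this example.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6  # approximation: C ~ 6 * N * D
TOKENS_PER_PARAM = 20      # Chinchilla rule of thumb: D ~ 20 * N


def compute_optimal(flops_budget):
    """Split a FLOPs budget into (parameters, tokens) under the two approximations.

    Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N**2,
    so N = sqrt(C / 120) and D = 20 * N.
    """
    n_params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


n, d = compute_optimal(5.76e23)
print(f"params ~ {n:.3g}, tokens ~ {d:.3g}")
```

With a budget of 5.76e23 FLOPs this lands near 70B parameters and 1.4T tokens, in line with the published Chinchilla configuration.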

First-time ICML workshop acceptance (GlobalSouthML) but can't afford to travel to South Korea. What are my options? [D]

r/MachineLearning · 22d ago · 5

We built a tool that installs frameworks like ComfyUI, Ollama, OpenWebUI etc on any cloud GPU in one command and saves your whole setup between sessions [R]

r/MachineLearning · 22d ago · 8 · tool open source deployment workflow

swm is an open-source tool that simplifies GPU rental workflows by providing unified pricing across providers (RunPod, Vast.ai, Lambda, etc.), automatic workspace syncing to S3-compatible storage, and lifecycle management to prevent runaway costs. It supports popular AI frameworks like ComfyUI, Ollama, vLLM, and Axolotl, eliminating the 45-minute reinstall cycle that plagues multi-provider GPU usage.

The last six months in LLMs in five minutes

Simon Willison · 22d ago · 6 · benchmark agent workflow

A retrospective on LLM developments from November 2025 to May 2026, highlighting the inflection point where coding agents became production-ready through RL from verifiable rewards, and models rapidly iterated across providers. The author discusses practical experiences building ambitious projects with these new capabilities and references an emerging open-source coding agent framework (Warelay).

Architecture advice: Real-time pipeline for YouTube Audio -> Whisper -> LLM -> SSE (Sub-10s latency) [D]

r/MachineLearning · 22d ago · 7 · workflow inference deployment

Engineer discusses streaming architecture for processing long videos with Whisper and LLMs, addressing chunking strategies to maintain context, audio VAD techniques, and whether asyncio/FastAPI suffices versus Celery/Redis for pipelined task processing. Practical workflow optimization relevant for building real-time AI video analysis backends.
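One way to picture the plain-asyncio option is a bounded queue between a transcription stage and an LLM stage, fed by overlapping audio windows. The sketch below is an assumption-laden toy: `transcribe` and `summarize` are stand-ins for the real Whisper and LLM calls, and the chunk and overlap lengths are arbitrary values, not numbers from the thread.

```python
import asyncio

CHUNK_SECONDS = 10   # hypothetical window length
OVERLAP_SECONDS = 2  # hypothetical overlap so context carries across windows


def chunk_spans(total_seconds, chunk=CHUNK_SECONDS, overlap=OVERLAP_SECONDS):
    """Return overlapping (start, end) windows that cover the whole audio."""
    if total_seconds <= 0:
        return []
    spans, start, step = [], 0, chunk - overlap
    while True:
        end = min(start + chunk, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            return spans
        start += step


async def transcribe(span):
    """Stand-in for a Whisper call on one audio window."""
    await asyncio.sleep(0)
    return f"text[{span[0]}-{span[1]}]"


async def summarize(text):
    """Stand-in for an LLM call on one transcript chunk."""
    await asyncio.sleep(0)
    return f"summary({text})"


async def run_pipeline(total_seconds):
    queue = asyncio.Queue(maxsize=4)  # bounded, so a slow LLM stage applies backpressure
    results = []

    async def producer():
        for span in chunk_spans(total_seconds):
            await queue.put(await transcribe(span))
        await queue.put(None)  # sentinel: no more chunks

    async def consumer():
        while (text := await queue.get()) is not None:
            # An SSE endpoint would emit each result here instead of collecting it.
            results.append(await summarize(text))

    await asyncio.gather(producer(), consumer())
    return results


print(asyncio.run(run_pipeline(25)))
```

If the transcription step is CPU- or GPU-bound and blocks the event loop, that is the point where a separate worker process or a task queue such as Celery starts to earn its complexity.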

Introducing the Ettin Reranker Family

HuggingFace Blog · 22d ago · 8 · new model tool open source rag tutorial

Six new Sentence Transformers CrossEncoder rerankers built on ModernBERT, trained with distillation on open datasets, achieving SOTA performance at multiple model sizes. Includes full training recipes, easy 3-line inference API, and a new Hugging Face Agent Skill for fine-tuning rerankers on custom data.

Released a free 9.8M doc Indic multilingual corpus — Hindi, Bengali, Tamil, Telugu + 7 more (CC0, HuggingFace) [P]

r/MachineLearning · 22d ago · 7 · dataset open source fine tuning

A new multilingual dataset (Indic HPLT v1) with 9.8M documents across 11 Indian languages plus English, totaling 8.4B tokens, released under CC0 license on Hugging Face. Useful for training and fine-tuning language models for underrepresented Indian language families, though primarily a resource rather than a novel technical breakthrough.

21 GPUs benchmarked running a small TTS model (VRAM peak: 5GB)

r/LocalLLaMA · 22d ago · 5

Anthropic acquires Stainless (Announcements, May 18, 2026)

Anthropic Blog · 23d ago · 8 · tool agent api update deployment

Anthropic acquired Stainless, the company behind SDK generation and MCP server tooling that powers Claude integrations. This acquisition strengthens agent connectivity by consolidating SDK/CLI generation and Model Context Protocol infrastructure, directly impacting how developers build tool-calling capabilities for AI agents.

Rewriting model inference with CUDA kernels: the bottleneck was not just GEMM [P]

r/MachineLearning · 23d ago · 9 · inference open source benchmark deployment

FlashRT is a CUDA-first inference runtime that targets small-batch, real-time ML workloads (robotics, VLAs, world models) by rewriting model inference directly in C++/CUDA rather than relying on generic runtimes. The project demonstrates that for batch-size-1 inference, runtime overhead (kernel launches, synchronization, format conversions, precision transitions) dominates latency more than raw compute speed. It reports 17.6ms on Pi0.5 and 2.39ms/token on RTX 5090, and its key insight is that lower precision (FP4/NVFP4) provides mixed returns unless heavily fused.

Fast-tracking genetic leads to reverse cellular aging

DeepMind Blog · 23d ago · 6 · agent workflow

Researchers are using Co-Scientist (an AI agent/tool) to accelerate aging research by autonomously mining scientific literature for novel genetic targets and analyzing large-scale screening data, reducing months of analysis to days. While demonstrating practical AI application in biology, this is primarily a case study of using existing AI capabilities rather than introducing new models, libraries, or technical workflows for software engineers.

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

HuggingFace Blog · 23d ago · 8 · fine tuning tool tutorial open source

Practical guide for parameter-efficient fine-tuning of NVIDIA's Cosmos Predict 2.5 video world model using LoRA and DoRA adapters, enabling domain-specific adaptation on consumer GPUs without catastrophic forgetting. Includes complete implementation walkthrough using diffusers and accelerate libraries for generating synthetic robot trajectories for policy learning.
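The arithmetic behind "parameter-efficient" can be sketched in a few lines. LoRA freezes a weight matrix of shape d_out × d_in and trains two low-rank factors of shapes d_out × r and r × d_in. The layer size and rank below are hypothetical, not taken from Cosmos Predict 2.5, and the small extra magnitude vector that DoRA trains is not modeled here.

```python
def full_params(d_in, d_out):
    """Weights in a dense layer that full fine-tuning would update."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """Weights in the two LoRA factors: (d_out x rank) plus (rank x d_in)."""
    return rank * (d_in + d_out)


# Hypothetical layer size and rank, chosen only to show the ratio.
d_in, d_out, rank = 4096, 4096, 16
full = full_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)
print(f"full: {full:,}  lora: {lora:,}  fraction trained: {lora / full:.4%}")
```

For this example layer the adapter trains under 1% of the weights, which is what makes fine-tuning on consumer GPUs plausible.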

Witchcraft, fast local semantic search on top of SQLite [P]

r/MachineLearning · 23d ago · 8 · open source tool library rag deployment

Witchcraft is a Rust-based semantic search engine for client-side deployment using SQLite, achieving 20ms latency without external APIs or vector databases. It includes Pickbrain, a CLI tool that indexes Claude/Codex transcripts and documents for semantic search with direct session resumption, plus skills for both AI platforms to maintain cross-session memory.
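For readers unfamiliar with the pattern, the sketch below shows semantic search over SQLite in its simplest form: embeddings stored as BLOBs and ranked by cosine similarity with a brute-force scan. It is written in Python rather than Rust, the three-dimensional vectors are invented stand-ins for real embeddings, and nothing here reflects how Witchcraft actually stores or indexes its data.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats as a float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Deserialize a float32 BLOB back into a list of floats."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)")

# Toy embeddings; a real system would get these from an embedding model.
docs = [
    ("resume the debugging session", [0.9, 0.1, 0.0]),
    ("grocery list for the weekend", [0.0, 0.2, 0.9]),
    ("fix the failing unit test", [0.8, 0.3, 0.1]),
]
conn.executemany(
    "INSERT INTO docs (text, embedding) VALUES (?, ?)",
    [(text, pack(vec)) for text, vec in docs],
)


def search(query_vec, k=2):
    """Return the k texts whose embeddings are most similar to query_vec."""
    rows = conn.execute("SELECT text, embedding FROM docs").fetchall()
    scored = [(cosine(query_vec, unpack(blob)), text) for text, blob in rows]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]


print(search([1.0, 0.2, 0.0]))
```

A full scan like this is fine for small collections; low latency on larger ones would need some form of indexing or quantization, which this sketch does not attempt.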

PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend

HuggingFace Blog · 23d ago · 7 · tool library inference rag open source

PaddleOCR 3.5 now supports Transformers as a backend, enabling easier integration of OCR and document parsing into Hugging Face-centered workflows. This addresses document ingestion for RAG and Document AI pipelines by allowing developers to run PP-OCRv5 and PaddleOCR-VL models with flexible backend selection through a simple engine parameter.

The Open Agent Leaderboard

HuggingFace Blog · 23d ago · 8 · benchmark agent open source deployment

A new Open Agent Leaderboard benchmark evaluates full agent systems (not just models) across diverse tasks, reporting both quality and cost metrics to measure practical generality. Released with the Exgentic framework and methodology paper, it tests agents across coding, customer service, technical support, and research tasks to reveal what actually drives real-world agent performance.