News Nug

OpenAI Blog · 47d ago · 5 · prompt engineering workflow

A general guide on using ChatGPT for ideation and planning workflows. While useful for understanding prompt patterns and LLM capabilities, it's broad instructional content rather than technical implementation details or new tools that would directly impact daily AI development work.

Applications of AI at OpenAI

OpenAI Blog · 47d ago · 5 · api update workflow

General overview of OpenAI's existing product portfolio (ChatGPT, Codex, APIs) and their applications across work and development contexts. While relevant to AI engineers, this reads as introductory content without specific technical updates, new capabilities, or implementation guidance.

ChatGPT for operations teams

OpenAI Blog · 47d ago · 5 · workflow prompt engineering

Article discusses practical applications of ChatGPT for operations teams focusing on workflow optimization, process standardization, and coordination improvements. While relevant to AI engineers building with models daily, it's primarily business-focused rather than technical implementation guidance.

Multimodal Embedding & Reranker Models with Sentence Transformers

HuggingFace Blog · 48d ago · 8 · tutorial rag library inference

Practical guide to multimodal embedding and reranker models that extend traditional RAG pipelines to handle text, images, and other modalities in a shared embedding space. Covers model loading, encoding mixed-modality inputs, and computing cross-modal similarities with concrete code examples and performance considerations.

Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs

HuggingFace Blog · 48d ago · 8 · new model inference open source

Waypoint-1.5 is Overworld's improved real-time video world model now optimized for consumer hardware, running up to 720p/60fps on RTX 3090+ and 360p on broader gaming laptops/Apple Silicon. The model was trained on 100x more data than v1 with more efficient video modeling techniques, prioritizing interactive responsiveness and local deployment over pure visual fidelity.

Meta's new model is Muse Spark, and meta.ai chat has some interesting tools

Simon Willison · 48d ago · 8 · new model api update agent tool benchmark

Meta released Muse Spark, a new hosted AI model with Instant and Thinking modes, accessible via meta.ai with a private API preview. The model includes integrated tools for web search, image generation, code execution, and Meta content search, making it relevant for understanding multi-tool agent systems and comparing reasoning capabilities against current SOTA models like GPT-5.4 and Gemini 3.1.

ALTK‑Evolve: On‑the‑Job Learning for AI Agents

HuggingFace Blog · 49d ago · 7 · agent workflow open source tool

ALTK-Evolve is a long-term episodic memory system for AI agents that distills interaction traces into reusable guidelines rather than storing raw transcripts, enabling agents to generalize principles across tasks. The framework shows significant improvements on multi-step API tasks (AppWorld benchmark) and integrates as a Claude Code plugin or with existing tools like Arize Phoenix and Codex without major stack changes.

Safetensors is Joining the PyTorch Foundation

HuggingFace Blog · 49d ago · 7 · tool open source deployment

Safetensors, the secure model weight format that replaced pickle-based serialization, is moving to PyTorch Foundation governance to become truly community-owned while remaining the de facto standard for model distribution across Hugging Face Hub. The move enables vendor-neutral stewardship and potential integration into PyTorch core, with no breaking changes for existing users but clearer paths for community contributors.

GLM-5.1: Towards Long-Horizon Tasks

Simon Willison · 49d ago · 7 · new model open source benchmark

GLM-5.1, a 754B parameter open-weights model from Z.ai, demonstrates strong capabilities in multimodal generation and instruction-following, particularly for SVG/HTML creation tasks. The model can self-correct technical issues (CSS animations breaking SVG positioning) and generate well-structured code with detailed comments, making it worth testing for creative code generation workflows.

Anthropic's Project Glasswing - restricting Claude Mythos to security researchers - sounds necessary to me

Simon Willison · 49d ago · 9 · new model research benchmark deployment

Anthropic released Claude Mythos Preview under restricted access through Project Glasswing, a model with dramatically enhanced cybersecurity research capabilities that can autonomously develop complex multi-vulnerability exploits and ROP chains—achieving 181/210 success rate on exploit development vs near-0% for Claude Opus 4.6. This represents a significant capability jump in AI-assisted vulnerability research with direct implications for how engineers must approach security testing and deployment of foundational systems.

Extreme Harness Engineering for Token Billionaires: 1M LOC, 1B toks/day, 0% human code, 0% human review — Ryan Lopopolo, OpenAI Frontier & Symphony

Latent Space · 50d ago · 7 · agent workflow prompt engineering open source

OpenAI's Ryan Lopopolo discusses 'Harness Engineering'—a methodology for building AI-native software where agents operate autonomously with zero human-written code, using >1B tokens/day and extensive prompt engineering via Symphony (a multi-agent orchestration system). The approach shifts focus from prompt optimization to building proper context, structure, and observability for agents to function as full teammates rather than copilots.

[AINews] Gemma 4 crosses 2 million downloads

Latent Space · 50d ago · 7 · new model deployment inference open source tool

Gemma 4 is gaining traction as a practical edge-inference model with strong on-device performance (40 tok/s on iPhone 17 Pro via MLX), achieving 2M downloads in its first week and becoming the top trending model on Hugging Face. The release demonstrates mature ecosystem support across llama.cpp, Ollama, vLLM, and other deployment tools, positioning it as a reference point for local-first development and reducing reliance on paid cloud APIs.

gemma-gem — Gemma Gem runs Google's Gemma 4 model entirely on-device via WebGPU — no API keys, no cloud, no data leaving your machine.

GitHub Trending AI · 51d ago · 7 · tool inference open source deployment

Gemma Gem is a browser extension that runs Google's Gemma 4 model locally via WebGPU for on-device AI inference with web automation capabilities (page reading, form filling, JavaScript execution). This is relevant for engineers building AI applications that require privacy-preserving inference and browser-based automation without cloud dependencies.

llm-wiki-compiler — The knowledge compiler. Raw sources in, interlinked wiki out. Inspired by Karpathy's LLM Wiki pattern.

GitHub Trending AI · 51d ago · 7 · tool open source workflow rag

llmwiki is an open-source tool that compiles raw sources into an interlinked markdown wiki using LLMs, inspired by Karpathy's approach. It supports multiple providers (Anthropic, OpenAI, Ollama) with configurable endpoints and offers persistent knowledge compilation that compounds over time, complementing RAG workflows with structured artifact generation.

Jackrong-llm-finetuning-guide

GitHub Trending AI · 52d ago · 7 · fine tuning tutorial open source workflow

An open-source educational repository providing end-to-end LLM fine-tuning pipelines with theoretical explanations, data processing workflows, and deployment strategies designed for beginners. Covers Supervised Fine-Tuning with plans for Reinforcement Learning, using accessible tools like Unsloth to reduce computational barriers and making advanced model adaptation feasible with minimal resources.

mempalace — The highest-scoring AI memory system ever benchmarked. And it's free.

GitHub Trending AI · 52d ago · 7 · open source tool library

MemPalace is an open-source local AI memory system that stores raw conversation transcripts in ChromaDB without summarization, achieving 96.6% on LongMemEval benchmarks. It organizes conversations hierarchically (wings/halls/rooms) for semantic searchability and includes an experimental AAAK compression dialect for handling repeated entities at scale, though the developers transparently document current limitations (84.2% recall with AAAK vs 96.6% with raw storage).

codesight — Universal AI context generator. Saves thousands of tokens per conversation in Claude Code, Cursor, Copilot, Codex, and more.

GitHub Trending AI · 53d ago · 7 · tool open source workflow

Codesight is a zero-dependency CLI tool that generates persistent AI-friendly codebase documentation through AST parsing and regex detection across 30+ frameworks. It creates a wiki and knowledge base that reduces token usage by up to 91x when used with Claude Code, Cursor, and other AI coding assistants by providing targeted context instead of full codebase dumps.

OpenKB — OpenKB: Open LLM Knowledge Base

GitHub Trending AI · 53d ago · 8 · tool open source rag library

OpenKB is an open-source CLI tool that builds persistent, structured knowledge bases from raw documents using LLMs and vectorless retrieval via PageIndex, enabling accumulative knowledge compilation instead of repeated retrieval from scratch. It supports multi-LLM backends (OpenAI, Claude, Gemini via LiteLLM), hierarchical indexing for long documents, and interactive chat with conversation history over the wiki.

Components of A Coding Agent

Ahead of AI · 53d ago · 8 · agent workflow tutorial

Comprehensive reference on coding agent architecture covering six main building blocks of agentic systems (tool use, context management, memory, prompt caching, etc.) and how they differ from raw LLMs and reasoning models. Explains why systems like Claude Code outperform standalone models through their surrounding harness design rather than model capability alone.

[AINews] Good Friday

Latent Space · 53d ago · 8 · new model open source inference benchmark deployment

Gemma 4 launched under Apache 2.0 with strong day-0 ecosystem support across vLLM, llama.cpp, Ollama, and major inference platforms. Key technical highlights include MoE architecture, multimodal capabilities, impressive local inference benchmarks (162 tok/s on RTX 4090, runs on M4 MacBooks and iPhones), and ecosystem-wide quantization/optimization support within hours of release.