News Nug

1,200 ICLR 2026 Papers with Public Code or Data [R]

r/MachineLearning · 51d ago · 8 · research open source benchmark

Curated list of ~1,200 ICLR 2026 accepted papers with publicly available code, data, or demos (22% of total papers). Direct links to implementations across GitHub and official repositories provide immediate access to reproducible research for exploring cutting-edge ML techniques.

Are we confusing Agent Execution Runtimes with true Agent Runtime Environments? [D]

r/MachineLearning · 51d ago · 7 · agent workflow deployment

A technical discussion distinguishing between reactive agent harnesses and truly autonomous agent runtime environments, questioning whether current infrastructure (LangChain, etc.) supports persistent, self-managing agents with heartbeats, self-healing, and long-term memory. The post identifies a potential gap between execution frameworks and operational infrastructure needed for continuous autonomous systems.

Why production systems keep making “correct” decisions that are no longer right [D]

r/MachineLearning · 52d ago · 6 · deployment workflow

A discussion of a subtle failure mode in production AI systems: formally correct outputs become contextually wrong when the underlying assumptions shift. This is not a technical failure but a structural one, in which governance and monitoring reinforce outdated decision frameworks. The post names this the 'Formalisation Trap' and treats it as a distinct operational problem that requires rethinking system design beyond traditional controls.

Converting XQuery to SQL with Local LLMs: Do I Need Fine-Tuning or a Better Approach? [P]

r/MachineLearning · 52d ago · 7 · fine tuning prompt engineering workflow rag

A practical technical discussion on converting XQuery to SQL using local LLMs with limited training data (~110-120 samples), comparing parsing, prompt-engineering, and fine-tuning (QLoRA with Qwen2.5-Coder 7B) approaches. The post identifies key challenges like query sensitivity and missing conditions, directly relevant for engineers building AI solutions with constrained resources in enterprise environments.

Cairn: A general-purpose AI state-space search engine, validated first on autonomous penetration testing.

GitHub Trending AI · 52d ago · 7 · agent open source tool workflow

Cairn is an open-source general-purpose problem-solving engine using a blackboard architecture with explicit fact-intent graphs and multi-agent workers coordinating via stigmergy. The system demonstrated practical value in the Tencent Cloud Hackathon AI Penetration Testing Challenge without predefined roles, RAG, or MCP tools, making it relevant for building autonomous agent systems that navigate complex state spaces.

Changes in the system prompt between Claude Opus 4.6 and 4.7

Simon Willison · 52d ago · 8 · prompt engineering workflow research

Anthropic published system prompt changes between Claude Opus 4.6 and 4.7, revealing important instruction updates around tool usage, task completion, and response handling. The changes show evolved guidance on when Claude should use tools to resolve ambiguity before asking users, when to ask clarifying questions, and refined behavioral guidelines around disclaimers and specific sensitive topics like eating disorders.

Trials and tribulations fine-tuning & deploying Gemma-4 [P]

r/MachineLearning · 52d ago · 8 · fine tuning deployment workflow tutorial

ML team documents critical issues and workarounds for fine-tuning and deploying Gemma-4 with PEFT and TRL, including problems with custom layer compatibility, KV-sharing attention, DeepSpeed ZeRO-3 adapter corruption, and runtime LoRA serving limitations. Provides practical fixes like unwrapping custom layers before PEFT, upgrading transformers to v5.5.2+, and manual weight merging for deployment.
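
The "manual weight merging" step can be illustrated with the standard LoRA formulation, in which the merged weight is W + (alpha / r) * B A. This is a minimal pure-Python sketch of that arithmetic only; the function names `matmul` and `merge_lora` are hypothetical, the scaling convention is an assumption, and the post's actual merge procedure for Gemma-4 is not reproduced here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a) as a new matrix.

    Assumed shapes: weight (out, in), lora_a (rank, in), lora_b (out, rank).
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update, shape (out, in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: a 2x2 base weight with a rank-1 adapter.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]      # shape (rank, in_features)
lora_b = [[0.5], [0.25]]   # shape (out_features, rank)
merged = merge_lora(base, lora_a, lora_b, alpha=2.0, rank=1)
print(merged)
```

Once merged, the adapter matrices are no longer needed at inference time, which is why merging is a common workaround when a serving stack cannot load LoRA adapters at runtime.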

easyaligner: Forced alignment with GPU acceleration and flexible text normalization (compatible with all w2v2 models on HF Hub) [P]

r/MachineLearning · 53d ago · 8 · open source library tool workflow

easyaligner is a new open-source forced alignment library built for speech-to-text preprocessing that handles practical pain points like partial transcripts, long audio segments without chunking, and text normalization with format recovery. It leverages PyTorch's forced alignment API with GPU-optimized Viterbi algorithm and supports any language with wav2vec2 models on Hugging Face Hub, achieving 35-102% faster transcription than WhisperX.
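
For readers unfamiliar with Viterbi-based forced alignment, here is a deliberately simplified sketch. It assumes per-frame log-probabilities over a small vocabulary and a known token sequence, and it omits the CTC blank token that wav2vec2 models use. The function `force_align` is hypothetical and is not easyaligner's or PyTorch's API.

```python
import math


def force_align(log_probs, tokens):
    """Assign each frame to one position in `tokens`, monotonically.

    log_probs[t][v] is the log-probability of vocabulary item v at frame t.
    At every frame the path either stays on the current token or advances
    to the next one. Simplification: no CTC blank symbol is modelled.
    """
    num_frames, num_tokens = len(log_probs), len(tokens)
    if num_tokens == 0 or num_frames < num_tokens:
        raise ValueError("need at least one frame per token")
    neg_inf = -math.inf
    score = [[neg_inf] * num_tokens for _ in range(num_frames)]
    back = [[0] * num_tokens for _ in range(num_frames)]
    score[0][0] = log_probs[0][tokens[0]]
    for t in range(1, num_frames):
        for j in range(num_tokens):
            stay = score[t - 1][j]
            advance = score[t - 1][j - 1] if j > 0 else neg_inf
            if advance > stay:
                best, back[t][j] = advance, j - 1
            else:
                best, back[t][j] = stay, j
            score[t][j] = best + log_probs[t][tokens[j]]
    # Backtrack from the last token at the last frame.
    path = [num_tokens - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy emissions: 4 frames over a 3-item vocabulary; the transcript is [0, 2].
probs = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.1, 0.7],
    [0.1, 0.1, 0.8],
]
emissions = [[math.log(p) for p in frame] for frame in probs]
alignment = force_align(emissions, [0, 2])
print(alignment)
```

The returned list gives, for each frame, the index of the transcript token it is aligned to; consecutive runs of the same index mark that token's time span.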

Claude system prompts as a git timeline

Simon Willison · 53d ago · 7 · prompt engineering tool open source

Anthropic publicly released system prompts for Claude models as Markdown, which Simon Willison converted into version-tracked files using Claude Code to enable easy comparison. This provides valuable transparency into how Claude's behavior is shaped across model versions, with detailed notes on changes between Opus 4.6 and 4.7 for understanding prompt engineering decisions.

My Workflow for Understanding LLM Architectures

Ahead of AI · 53d ago · 7 · workflow tutorial open source

A practical workflow guide for reverse-engineering and understanding LLM architectures by inspecting official reports, Hugging Face model configs, and transformers library implementations. The author emphasizes learning through manual analysis of open-weight models rather than relying on proprietary documentation, making it valuable for engineers who want to deeply understand model design patterns.

[AINews] The Two Sides of OpenClaw

Latent Space · 53d ago · 7 · new model api update tool benchmark

Anthropic released Claude Opus 4.7 with improved coding/reasoning capabilities and introduced Claude Design, a new design prototyping tool competing with Figma/Bolt/v0. The update shows strong benchmark performance (ranked #1 in Code Arena, 57.3 on Intelligence Index) with ~35% token efficiency gains, though initial rollout had stability issues that were quickly patched.

WorldX: One sentence creates an AI-driven world. Generate maps and characters, and watch stories emerge on their own.

GitHub Trending AI · 53d ago · 7 · tool open source agent simulation

WorldX is an open-source framework for procedural generation of interactive AI worlds using multiple LLMs (world driver, character, narrative, image generation models). Software engineers can generate complex simulated environments with autonomous AI agents that make decisions, interact, and create emergent narratives—configurable via OpenAI-compatible APIs and supporting multiple providers like OpenRouter and Google AI Studio.

Adding a new content type to my blog-to-newsletter tool

Simon Willison · 53d ago · 8 · agent workflow tutorial prompt engineering

Practical guide demonstrating effective agentic engineering patterns through a real-world example of using Claude Code to modify a blog-to-newsletter tool. Key techniques include cloning reference repositories for context, referencing existing code patterns to explain requirements, and building in validation mechanisms for agents to test their own work.

Introducing Claude Design by Anthropic Labs (Product, Apr 17, 2026): Today, we’re launching Claude Design, a new Anthropic Labs product that lets you collaborate with Claude to create polished visual work like designs, prototypes, slides, one-pagers, and more.

Anthropic Blog · 53d ago · 6 · api update tool workflow

Anthropic launched Claude Design, a new visual design tool powered by Claude Opus 4.7 that integrates with their API ecosystem and offers design system automation, multi-format imports, and seamless handoff to Claude Code for implementation. While primarily a product announcement, it's relevant for engineers building AI applications as it demonstrates practical multimodal AI workflows and introduces new integration opportunities with Claude's expanding toolkit.

Independent researcher looking for technical feedback on a paper about a revision-capable language model [P]

r/MachineLearning · 53d ago · 7 · new model research inference

Reviser is a novel language model architecture that generates text through cursor-relative edit actions on a mutable canvas rather than standard left-to-right autoregressive decoding, enabling revision capabilities while maintaining computational efficiency. The model decodes over the sequence of edit-history actions rather than over the final text order, potentially offering practical benefits for iterative text generation workflows. This is interesting research on alternative decoding paradigms that could influence how engineers think about model inference and editing systems.
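
To make the idea of cursor-relative edit actions on a mutable canvas concrete, here is a toy interpreter. The action vocabulary (`insert`, `move`, `delete`) and the function `apply_actions` are hypothetical illustrations; the paper's actual action set is not described in this summary.

```python
def apply_actions(actions):
    """Replay a list of edit actions on an empty canvas and return the text.

    Hypothetical actions:
      ("insert", token)  insert token at the cursor, then step past it
      ("move", delta)    shift the cursor by delta positions (clamped)
      ("delete",)        remove the token at the cursor, if any
    """
    canvas = []
    cursor = 0
    for action in actions:
        kind = action[0]
        if kind == "insert":
            canvas.insert(cursor, action[1])
            cursor += 1
        elif kind == "move":
            cursor = max(0, min(len(canvas), cursor + action[1]))
        elif kind == "delete":
            if cursor < len(canvas):
                del canvas[cursor]
        else:
            raise ValueError(f"unknown action: {kind}")
    return "".join(canvas)


# Write three tokens, then go back and revise the middle one.
history = [
    ("insert", "The"),
    ("insert", " cat"),
    ("insert", " sat"),
    ("move", -2),
    ("delete",),
    ("insert", " dog"),
]
text = apply_actions(history)
print(text)
```

The final text order ("The dog sat") differs from the order in which tokens were emitted, which is the distinction the summary draws between edit history and final text.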

Building a Fast Multilingual OCR Model with Synthetic Data

HuggingFace Blog · 53d ago · 8 · new model open source tool research

NVIDIA released Nemotron OCR v2, a multilingual OCR model trained on 12M synthetic images across 6 languages, achieving significant accuracy improvements (normalized edit distance, NED, of 0.035-0.069; lower is better) through programmatic text rendering with precise ground truth labels. The approach demonstrates how synthetic data generation can overcome annotation bottlenecks while maintaining real-world performance, with the model, dataset, and pipeline available as open source.
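
As a rough guide to reading the NED figures, the sketch below computes a normalized edit distance as Levenshtein distance divided by the longer string's length. The normalization choice is an assumption; benchmarks differ (some divide by the reference length), and the blog post's exact definition is not given in this summary.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance scaled to [0, 1]; 0.0 means an exact match."""
    if not pred and not ref:
        return 0.0
    return edit_distance(pred, ref) / max(len(pred), len(ref))


ned = normalized_edit_distance("kitten", "sitting")
print(round(ned, 3))
```

Under this definition, a score of 0.035 would correspond to roughly 3.5 character errors per 100 characters of the longer string.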

My agent diagnosed a bug in its own system and routed around it unprompted [P]

r/MachineLearning · 53d ago · 8 · agent open source workflow research

Springdrift is a persistent runtime architecture for LLM agents featuring append-only memory, OTP supervision, and passive sensorium (injected self-state context) instead of tool-call-based introspection. The post demonstrates practical advantages through a real example where the agent autonomously diagnosed a missing writer agent without diagnostic tool calls and routed around the error. This workflow design enables LLM agents to serve as collaborative pair programmers on their own systems.

Low accuracy (~50%) with SSL (BYOL/MAE/VICReg) on hyperspectral crop stress data — what am I missing? [R]

r/MachineLearning · 54d ago · 7 · research tutorial fine tuning

A practitioner shares a real hyperspectral classification problem with SSL pretraining stuck at ~45-50% accuracy on nitrogen stress detection in crops. The post discusses SSL method choices (BYOL, MAE, VICReg), data augmentation strategies, and model architectures (ViT vs CNN), providing practical debugging insights for domain-specific computer vision tasks.

Looking for help from people who built multi-agent systems [P]

r/MachineLearning · 54d ago · 6 · agent tool benchmark

Engineer shares a chaos engineering framework they built for testing multi-agent systems in production, designed to prevent customer-facing failures. They're seeking collaboration to develop it further and establish benchmarking capabilities for agent reliability.

datasette 1.0a28

Simon Willison · 54d ago · 6 · api update tool workflow

Datasette 1.0a28 fixes breakages introduced by a previous alpha release, with development accelerated using Claude Code and the new Claude Opus 4.7 model. While the tool update is niche, the mention of Claude Opus 4.7 and the AI-assisted development workflow shows a practical application of new model capabilities.