News Nug

Simon Willison · 38d ago · 6 · research benchmark

Anthropic released research on Claude's sycophancy across different domains. It found problematic deference in 38% of spirituality conversations and 25% of relationship discussions, while the model maintained critical pushback in most other contexts. Engineers building with Claude should account for these behavioral biases and limitations when using the model for sensitive advice or guidance tasks.

[Paper on Hummingbird+: low-cost FPGAs for LLM inference] Qwen3-30B-A3B Q4 at 18 t/s token-gen, 24GB, expected $150 mass production cost

r/LocalLLaMA · 38d ago · 5

Language-model-based compression for Python source using n-grams + arithmetic coding (~33% better than zlib on Flask) [P]

r/MachineLearning · 38d ago · 7 · research open source inference

An engineer demonstrates language-model-based source code compression using n-gram models plus arithmetic coding, reaching 82.4% compression (a 0.176x ratio) on the Flask codebase: 33% better than zlib, but 1600× slower. The work shows how token-level modeling captures syntactic patterns better than byte-level compressors, with practical implications for follow-up transformer/LSTM approaches and batch optimization.
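To make the n-gram plus arithmetic coding idea concrete, here is a minimal sketch that estimates how small an arithmetic coder could make an input under an adaptive n-gram model. It is not the post's implementation: it works on bytes rather than tokens (so the alphabet is closed at 256 symbols), uses simple add-one smoothing, and reports the ideal code length (the sum of -log2 p over all symbols) instead of producing an actual bitstream. The function name `ngram_code_length_bits` and the sample input are made up for illustration.

```python
import math
import zlib
from collections import defaultdict


def ngram_code_length_bits(data: bytes, order: int = 2) -> float:
    """Ideal arithmetic-coding cost of `data`, in bits, under an adaptive
    byte-level n-gram model (assumption: add-one smoothing over 256 symbols)."""
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, sym in enumerate(data):
        ctx = data[max(0, i - order):i]             # up to `order` preceding bytes
        # Probability the model assigns to this symbol before seeing it.
        p = (counts[ctx][sym] + 1) / (totals[ctx] + 256)
        bits += -math.log2(p)
        # Update after coding, so a decoder could mirror the same model state.
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits


# Hypothetical sample: a small, repetitive piece of Python source.
sample = b"def add(x, y):\n    return x + y\n" * 40
model_bytes = ngram_code_length_bits(sample) / 8
zlib_bytes = len(zlib.compress(sample, 9))
print(f"raw: {len(sample)} B, n-gram estimate: ~{model_bytes:.0f} B, zlib: {zlib_bytes} B")
```

A real arithmetic coder lands within a few bits of this estimate for the whole input. Whether the model beats zlib depends heavily on the data and on the model order, so the numbers from this toy should not be read as reproducing the post's results.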

Question regarding Transformer's pipeline module [D]

r/MachineLearning · 38d ago · 6 · workflow api update library

A developer encounters a breaking change in the Hugging Face Transformers library where the 'question-answering' pipeline task has been deprecated, and seeks alternatives for zero-shot extractive QA on text. The post highlights a practical workflow issue: code that previously used `pipeline('question-answering')` no longer works, and available alternatives like 'document-question-answering' don't fit text-only use cases.
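For readers in the same position, it may help to see that the core of extractive QA decoding is small enough to write by hand. An extractive QA model emits one start score and one end score per token, and the answer is the span with the highest combined score. The sketch below shows only that span-selection step with made-up scores. It is not the library's implementation, and the name `best_span` is hypothetical. In practice the scores would come from a QA model, for example one loaded through `AutoModelForQuestionAnswering`.

```python
from typing import Sequence, Tuple


def best_span(
    start_scores: Sequence[float],
    end_scores: Sequence[float],
    max_answer_len: int = 15,
) -> Tuple[int, int]:
    """Return (start, end) token indices, inclusive, maximizing
    start_scores[start] + end_scores[end] with start <= end and a
    span of at most `max_answer_len` tokens."""
    if len(start_scores) != len(end_scores) or not start_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    best = (0, 0)
    best_score = float("-inf")
    for i, s in enumerate(start_scores):
        # Only consider ends at or after the start, within the length limit.
        last = min(len(end_scores), i + max_answer_len)
        for j in range(i, last):
            score = s + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best


# Made-up scores for a four-token context.
start = [0.1, 2.0, 0.3, 0.0]
end = [0.0, 0.5, 3.0, 0.2]
print(best_span(start, end))
```

Mapping the token indices back to character offsets in the original text is a separate step that depends on the tokenizer and is not shown here.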

Qwen3.6-27B vs Coder-Next

r/LocalLLaMA · 38d ago · 5

Toy experiment: frozen Pythia-70M can use a forward-derived fast memory for contextual one-shot symbolic recall [D]

r/MachineLearning · 38d ago · 7 · research rag fine tuning inference

Experimental work on augmenting frozen transformers with lightweight external memory for in-context adaptation without weight updates. Uses forward-pass derived correction vectors to enable one-shot binding of new facts while maintaining context separation, with results showing 80%+ accuracy on same-context recall but degraded generalization to new contexts.

How do I actually learn AI/ML deeply enough to build systems (not just follow tutorials)? [D]

r/MachineLearning · 39d ago · 8 · workflow tutorial

A discussion thread addressing the common blocker of content consumption without practical application—exploring how to transition from learning AI concepts to independently building systems. The conversation likely covers project-based learning strategies, determining necessary depth in math/theory, and developing the problem-solving mindset needed for real-world engineering rather than tutorial-following.

I implemented meta paper [P]

r/MachineLearning · 39d ago · 8 · research agent open source benchmark inference

A minimal research implementation of Meta AI's test-time compute scaling paper (PDR+RTV pipeline) for agentic coding tasks, enabling developers to experiment with the approach using Gemini 3.1 Pro on SWE-bench. This is the first public implementation of the paper's core techniques, making it immediately useful for engineers exploring advanced reasoning strategies in coding agents.

Real World Physics-Informed AI Applications [D]

r/MachineLearning · 39d ago · 5 · research

Discussion thread exploring practical applications of physics-informed neural networks (PINNs) and physics-informed AI beyond academia. The post raises valid questions about deployment in real industries but is primarily a question seeking examples rather than showcasing actual technical implementations or breakthroughs.

Qwen3.6-27B at 72 tok/s on RTX 3090 on Windows using native vLLM (no WSL, no Docker), portable launcher and installer

r/LocalLLaMA · 39d ago · 5

Looking for feedback on OpenVidya: an open-source AI classroom layer for NCERT/CBSE [R]

r/MachineLearning · 39d ago · 7 · open source agent rag tutorial

OpenVidya is an open-source multi-agent AI system for curriculum-aware lesson generation tailored to Indian education (NCERT/CBSE), featuring concept dependency graphs, exam-pattern grounding, and five pedagogical modes with mode-specific prompting. The project demonstrates practical application of agentic AI and RAG patterns for domain-specific education, with structured curriculum integration as a reusable architecture pattern.
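As a rough illustration of what a concept dependency graph buys a lesson generator, the sketch below orders concepts so that every prerequisite is taught before the concepts that depend on it. The concept names and the `lesson_order` helper are hypothetical and are not taken from OpenVidya. The ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def lesson_order(prerequisites: Dict[str, Set[str]]) -> List[str]:
    """Return concepts in an order where each concept appears after all of
    its prerequisites. Raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(prerequisites).static_order())


# Hypothetical curriculum fragment: concept -> set of prerequisite concepts.
prereqs = {
    "addition": set(),
    "multiplication": {"addition"},
    "division": {"multiplication"},
    "fractions": {"division"},
    "decimals": {"fractions"},
}

for step, concept in enumerate(lesson_order(prereqs), start=1):
    print(step, concept)
```

A generator could walk this order and refuse to introduce a concept until its prerequisites have been covered, which is one plausible way to read "curriculum-aware".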

iNaturalist Sightings

Simon Willison · 39d ago · 7 · tool workflow prompt engineering

A developer built a complete web app entirely on mobile using Claude Code, demonstrating a practical AI-assisted workflow: created a Python CLI tool, set up Git scraping automation, and generated a JavaScript frontend with a single LLM prompt. The write-up shows how Claude can handle multi-layer full-stack development from local tooling to cloud-hosted APIs.

I spent years building a 103B-token Usenet corpus (1980–2013) and finally documented it [P]

r/MachineLearning · 40d ago · 8 · open source benchmark research

A researcher has assembled and open-sourced a 103.1B token Usenet corpus (1980-2013) with comprehensive metadata, deduplication, and cleaning—representing a rare, temporally-coherent pretraining dataset spanning 33 years of language evolution before modern web interference. The dataset includes 408M posts across diverse hierarchies with 96.6% English coverage plus 100+ other languages, complete with published data card and processing methodology on Hugging Face.

PFlash: 10x prefill speedup over llama.cpp at 128K on a RTX 3090

r/LocalLLaMA · 40d ago · 5

GitHub - intel/auto-round: A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

r/LocalLLaMA · 40d ago · 8 · library tool open source inference deployment

AutoRound is a mature quantization toolkit for LLMs/VLMs achieving 2-4 bit quantization with minimal accuracy loss using sign-gradient descent, now integrated into major frameworks like vLLM, SGLang, and Transformers. Recent updates include block-wise FP8, mixed-precision schemes, and GGUF format support, making it practical for production deployment with fast quantization times (~10 min for 7B models).
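The phrase "sign-gradient descent" can be made concrete with a toy example. The sketch below is not AutoRound's implementation. It is a pure-Python illustration of the general idea on a single weight vector: learn a per-weight rounding offset `v` in [-0.5, 0.5] by stepping against the sign of the gradient of the output error, treating `round()` as the identity when differentiating (a straight-through estimator). All names, the 4-bit range, the step count, and the learning rate are assumptions chosen for illustration.

```python
import random


def quantize(w, v, scale, qmin=-8, qmax=7):
    """Symmetric 4-bit quantize-dequantize with a rounding offset v per weight."""
    return [scale * min(qmax, max(qmin, round(wi / scale + vi)))
            for wi, vi in zip(w, v)]


def output_error(w_ref, w_q, X):
    """Mean squared difference between the outputs x.w_q and x.w_ref over X."""
    total = 0.0
    for x in X:
        diff = sum((q - r) * xi for q, r, xi in zip(w_q, w_ref, x))
        total += diff * diff
    return total / len(X)


def signsgd_rounding(w, X, scale, steps=200, lr=0.005):
    """Tune rounding offsets with signed gradient steps and keep the best seen."""
    v = [0.0] * len(w)  # v = 0 is plain round-to-nearest
    best_v, best_err = list(v), output_error(w, quantize(w, v, scale), X)
    for _ in range(steps):
        w_q = quantize(w, v, scale)
        # Gradient of the error w.r.t. each quantized weight. Under the
        # straight-through estimator the gradient w.r.t. v has the same sign.
        grad = [0.0] * len(w)
        for x in X:
            diff = sum((q - r) * xi for q, r, xi in zip(w_q, w, x))
            for j, xj in enumerate(x):
                grad[j] += 2.0 * diff * xj / len(X)
        # Signed update: only the direction of the gradient is used.
        v = [min(0.5, max(-0.5, vj - lr * ((g > 0) - (g < 0))))
             for vj, g in zip(v, grad)]
        err = output_error(w, quantize(w, v, scale), X)
        if err < best_err:
            best_v, best_err = list(v), err
    return best_v, best_err


random.seed(0)
w = [random.uniform(-1, 1) for _ in range(8)]                       # toy weights
X = [[random.gauss(0, 1) for _ in range(8)] for _ in range(32)]     # toy calibration inputs
scale = max(abs(wi) for wi in w) / 7
baseline = output_error(w, quantize(w, [0.0] * len(w), scale), X)
_, tuned = signsgd_rounding(w, X, scale)
print(f"round-to-nearest error: {baseline:.6f}, tuned error: {tuned:.6f}")
```

Because the loop keeps the best offsets seen so far and starts from round-to-nearest, the tuned error can never be worse than the baseline on the calibration data. How much it improves depends on the weights and inputs.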

Why Is Table Extraction with VLM Models Still Challenging? [D]

r/MachineLearning · 40d ago · 6 · tool workflow

A discussion thread about open-source PDF-to-Markdown conversion tools, with focus on handling complex tables in financial documents. User compares existing solutions (docling, marker, graphite-docling) against paid alternatives like LandingAI, seeking recommendations for robust table parsing.

Phosphene local video and audio generation for Apple Silicon open source (LTX 2.3) [P]

r/MachineLearning · 40d ago · 8 · tool open source inference deployment

Phosphene is a free macOS desktop app that wraps Lightricks' LTX 2.3 video generation model on Apple Silicon, notable for synced audio-video generation in a single forward pass rather than post-processing. It features multiple generation modes (text→video, image→video, frame interpolation), three quality tiers with honest hardware gating based on RAM availability, and local prompt rewriting via Gemma 3 12B, making it a practical tool for engineers building video generation workflows on Apple Silicon.

[AINews] Agents for Everything Else: Codex for Knowledge Work, Claude for Creative Work

Latent Space · 40d ago · 7 · new model agent tool inference deployment

OpenAI released GPT-5.5 with strong cyber task performance (71.4% pass rate on multi-step attack simulations) and expanded Codex into a general-purpose agent for non-coding computer work with 42% faster inference, dynamic UI routing, and integrations with Microsoft/Google/Salesforce/creative tools. Anthropic launched Claude Security for code review and expanded creative tool support, while the broader narrative shows AI agents increasingly capable of autonomous task execution across diverse domains.

nvidia/Gemma-4-26B-A4B-NVFP4

r/LocalLLaMA · 40d ago · 9 · new model open source inference deployment

Google DeepMind released Gemma 4 26B IT, an open multimodal model supporting text, images, and video, with a 256K context window and a hybrid attention mechanism for efficient inference on consumer GPUs. The NVIDIA-quantized NVFP4 version is pitched as delivering frontier-level performance for reasoning, coding, and agentic workflows, and is licensed under Apache 2.0 for both commercial and non-commercial use.

workshop — Give your coding agent the power to write and run agent evals.

GitHub Trending AI · 40d ago · 7 · tool open source agent deployment

Raindrop Workshop is an open-source local debugger for AI agents that provides real-time token-level tracing, tool call inspection, and decision monitoring. It integrates with Claude Code and other coding agents, enabling developers to evaluate agent behavior against their codebase with built-in eval writing capabilities.