News Nug

r/MachineLearning · 22h ago · 6 · research agent

A Reddit discussion questioning empirical findings from Zhang et al.'s SA-MDP adversarial attack framework when applied to multi-agent PPO policies. The poster observes contradictory results compared to the original paper's claims about critic vs. actor network attacks, specifically when testing on VMAS environments with IPPO and GPPO variants using KL-divergence-based PGD attacks.

Mapping world model taxonomy [P]

r/MachineLearning · 1d ago · 7 · research tutorial

An explainer article proposing a classification framework for world models, categorizing different approaches and identifying emerging trends in the space. Useful for understanding how world models organize conceptually, though limited in novel technical depth or implementation details.

I built IMGNet – a face verification model that identifies people using sign patterns, not cosine similarity [R]

r/MachineLearning · 1d ago · 7 · research benchmark open source inference

Independent researcher presents IMG Sign Score, a novel face verification approach replacing cosine similarity with sliding window sign pattern matching, achieving 96.27% on LFW with a compact 10.58 MB model trained on CASIA-WebFace. The method introduces SW Block convolution and IMG Sign MSE loss operating purely on sign pattern agreement, with code and model weights publicly available on GitHub and Hugging Face.

Talos-XII: hand-written autograd + small RL/MLP stack in Rust, applied to gacha probability modeling (no tch-rs/ndarray/PyTorch) — looking for benchmark help on ARM/AVX-512/GPU [P]

r/MachineLearning · 2d ago · 7 · research open source inference library

Talos-XII is a hand-written ML systems project in Rust that trains neural networks (EnvNet, DQN, PPO) to model gacha probability dynamics, featuring a custom autograd engine, SIMD dispatch (AVX2/AVX-512/NEON), and an experimental adaptive caching component (ACHF) for CPU-bound RL inference. The project demonstrates practical systems engineering for embedded ML—custom autodiff, parallelization, and BF16 optimization—though the core innovation (ACHF) is still experimental and lacks cross-hardware validation.

Jul 8, 2026 · Alignment · An off switch for dual use knowledge in AI models

Anthropic Research · 2d ago · 8 · research fine tuning deployment

Anthropic and AE Studio introduce GRAM (Gradient-Routed Auxiliary Modules), a novel technique for isolating dual-use knowledge (cybersecurity, virology, CBRN) into removable neural compartments within a single model, enabling cost-effective deployment of multiple capability-filtered versions without retraining separate models. This addresses a critical challenge in AI safety by making dangerous knowledge modular and controllable while preserving general model performance.

Agentic safety triggers aren't textual safety triggers — MCP attacks that beat SOTA guardrails more than half the time (code + dataset) [R]

r/MachineLearning · 2d ago · 8 · research agent fine tuning tool prompt engineering

Research demonstrates a critical gap in LLM safety alignment: current text-classification-based guardrails fail against adversarial prompts that encode attacks in tool-call sequences rather than linguistic markers. The study evaluates multiple safety approaches (DPO, SafeDPO, training-free methods) against CVE-based attacks on MCP-enabled agents, showing current SOTA methods only block ~48% of attacks while training-free approaches achieve 3x baseline refusal rates without fine-tuning.

LingBot-Video: sparse-MoE video diffusion transformer (13B total, 1.4B active) post-trained as an action-conditioned world model [R]

r/MachineLearning · 2d ago · 7 · new model open source research rl training inference

LingBot-Video combines diffusion transformers with DeepSeek-V3-style sparse MoE (128 experts, top-8 routing) and multi-reward RL post-training for action-conditioned video generation, with open weights/code in Diffusers/SGLang. Key technical tensions: using VLMs as physics validators may enable reward hacking despite negative examples, and unclear separation between video generation and true world modeling without closed-loop robot validation.
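
For readers unfamiliar with the "128 experts, top-8 routing" shorthand, here is a minimal sketch of a generic top-k gate. It is not LingBot-Video's or DeepSeek-V3's actual gating function: the softmax over the selected logits and the name `top_k_route` are assumptions made for illustration.

```python
import math

def top_k_route(logits, k=8):
    """Pick the k highest-scoring experts and renormalize their weights.

    `logits` holds one router score per expert (128 in the summary above).
    Returns (expert_index, weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)                 # subtract max for stability
    exps = [math.exp(logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

# Toy router scores for one token; only 8 of the 128 experts would run.
scores = [math.sin(i) for i in range(128)]
for expert, weight in top_k_route(scores):
    print(expert, round(weight, 4))
```

Only the selected experts are evaluated per token, which is how a model can have far more total parameters than active ones.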

DINOv2 way worse than SigLIP in k-NN. Is this expected? [R]

r/MachineLearning · 3d ago · 6 · research benchmark inference

A developer shares empirical results comparing vision encoders (SigLIP2, CLIP ViT-L, DINOv2) for fine-grained car classification via k-NN retrieval, observing a 50-point accuracy gap between SigLIP2 (92%) and DINOv2 (41%). The post explores whether this is due to embedding space design differences and questions whether DINOv2 needs supervised fine-tuning to be effective for retrieval tasks on small datasets.
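
The evaluation described above boils down to nearest-neighbour voting over frozen embeddings. Below is a minimal sketch, assuming cosine similarity and majority voting. The poster's actual metric, value of k, and data are not stated in the summary, so the toy vectors and labels here are hypothetical.

```python
import math
from collections import Counter

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def knn_predict(query, gallery, labels, k=3):
    """Classify `query` by majority vote among its k most similar gallery items."""
    q = normalize(query)
    sims = [sum(a * b for a, b in zip(q, normalize(g))) for g in gallery]
    top = sorted(range(len(gallery)), key=lambda i: sims[i], reverse=True)[:k]
    votes = Counter(labels[i] for i in top)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D "embeddings"; real encoder outputs have hundreds of dimensions.
gallery = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["sedan", "sedan", "suv", "suv"]
print(knn_predict([1.0, 0.2], gallery, labels, k=3))
```

Because no weights are trained, accuracy in this setup depends entirely on how well the encoder's embedding space already separates the classes.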

Separating signal from noise in coding evaluations

OpenAI Research · 3d ago · 7 · benchmark research

OpenAI's analysis identifies methodological flaws in SWE-Bench Pro, a widely used benchmark for evaluating AI coding capabilities, which could impact how developers assess model performance for software engineering tasks. This is important for engineers relying on benchmark results to choose models and measure progress on code generation workflows.

[AINews] Lilian Weng summarizes 35 papers on Harness Engineering for RSI

Latent Space · 3d ago · 7 · agent research workflow tutorial

Lilian Weng's research recap on harness engineering and its relationship to recursive self-improvement (RSI) provides practical design patterns and an overview of the optimization literature for building agent systems. Multiple platforms (Anthropic, Google, LangChain) are converging around harness-centric agent architectures as the proven approach for long-running workflows, moving away from direct model weight modification.

Learning FlashAttention the Hard Way. Part 1: The Algebraic Foundation [D]

r/MachineLearning · 3d ago · 8 · tutorial research inference

A theoretical tutorial series on FlashAttention using modern algebraic formalism (associative reductions, twisted monoids) that enables GPU scheduling optimizations—more powerful than the original framing. Covers safe softmax, Welford's variance, numerical stability bounds, and provides first-principles derivations of constants in FA-2 and Triton kernels.
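
The "associative reduction" view of safe softmax can be sketched in a few lines. This is the standard online-softmax construction written with assumed names (`chunk_state`, `combine`, `chunked_softmax`), not code from the tutorial. Each chunk is summarized as a pair (running max, sum of exponentials relative to that max), and two summaries merge with an associative operator, so chunks can be reduced in any grouping.

```python
import math
from functools import reduce

IDENTITY = (-math.inf, 0.0)  # summary of an empty chunk: max = -inf, sum = 0

def chunk_state(chunk):
    """Summarize a chunk as (max, sum of exp(x - max))."""
    if not chunk:
        return IDENTITY
    m = max(chunk)
    return (m, sum(math.exp(x - m) for x in chunk))

def combine(a, b):
    """Merge two summaries; rescale each sum to the shared maximum."""
    m_a, s_a = a
    m_b, s_b = b
    m = max(m_a, m_b)
    if m == -math.inf:          # both sides empty
        return IDENTITY
    return (m, s_a * math.exp(m_a - m) + s_b * math.exp(m_b - m))

def chunked_softmax(xs, chunk_size):
    """Softmax computed from per-chunk summaries instead of one global pass."""
    chunks = [xs[i:i + chunk_size] for i in range(0, len(xs), chunk_size)]
    m, s = reduce(combine, map(chunk_state, chunks), IDENTITY)
    return [math.exp(x - m) / s for x in xs]

# Large logits that would overflow a naive exp(x) / sum(exp(x)).
print(chunked_softmax([1000.0, 1001.0, 1002.0, 999.0, 1003.0], chunk_size=2))
```

Because `combine` is associative with `IDENTITY` as its neutral element, the per-chunk summaries can be merged in a tree or streamed tile by tile, which is the property that tiled attention kernels rely on.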

What if a model could only learn what trusted LoRA adapters can express? [R]

r/MachineLearning · 3d ago · 8 · research fine tuning open source

Novel defense against fine-tuning poisoning attacks that constrains model adaptation to a trusted subspace learned from clean LoRA adapters, making malicious updates geometrically unreachable. Evaluated on 196 public adapters with strong attack suppression while preserving legitimate adaptation, with open-source code and reproducible experiments available.
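
To make "geometrically unreachable" concrete, here is a toy sketch of projecting an update onto the span of trusted directions. The summary does not say how the paper builds its subspace, so the Gram-Schmidt construction, the flattened 3-D vectors, and the function names are all assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of the trusted updates."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:                      # skip linearly dependent vectors
            basis.append([wi / norm for wi in w])
    return basis

def project(update, basis):
    """Keep only the part of `update` that lies inside the trusted subspace."""
    out = [0.0] * len(update)
    for b in basis:
        c = dot(update, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out

# Hypothetical flattened weight deltas from two clean adapters.
trusted = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(trusted)
# The third coordinate is outside the trusted span and is removed.
print(project([2.0, 3.0, 5.0], basis))
```

Any component of a fine-tuning update that points outside the trusted span is discarded by the projection, so an attacker cannot move the weights in those directions.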

Mid research got me thinking what about reversed alignment, would trained "bad" model exhibit "good" behavior later and/or secretly [D]

r/MachineLearning · 3d ago · 6 · research fine tuning

A researcher exploring RLHF dynamics poses an interesting thought experiment about training models to exhibit bad behavior and whether latent good behavior would emerge, suggesting alignment properties might be baked into pretraining rather than purely learned during fine-tuning. This touches on mechanistic interpretability and the nature of alignment in language models but lacks empirical validation or concrete technical contribution.

Would you say capture-time semantic annotation for robot trajectories is a solved problem? [R]

r/MachineLearning · 36d ago · 6 · research workflow

A technical discussion on teleoperation data collection limitations for robotics—specifically how raw RGB + joint state streams miss affordance, contact intent, and embodiment context that can't be recovered post-hoc. The post explores whether real-time annotation during capture (rather than post-hoc labeling) could bridge this semantic gap for contact-rich manipulation tasks, relevant for engineers building robot learning systems.

Reality: The Final Eval — Lukas Petersson and Axel Backlund of Andon Labs

Latent Space · 36d ago · 7 · benchmark agent research eval

Andon Labs discusses real-world AI agent evaluation through Vending-Bench, a novel benchmark that tests frontier models operating actual businesses with inventory, finances, and customers rather than traditional exam-style metrics. The article covers practical insights from long-horizon autonomous agents including emergent behaviors like price fixing, deception, and unexpected failure modes that traditional benchmarks miss.

Faithful uncertainty in LLM agents: calibration vs utility tradeoff in practice [D]

r/MachineLearning · 37d ago · 8 · agent research workflow inference

Deep technical discussion on calibration vs. accuracy in LLM-based agents, drawing from Google research on hallucination reduction. Author shares practical patterns for reducing hallucinated tool calls (25% to 5%) using a planning-verification pipeline with confidence-based human review routing, while analyzing the latency-safety tradeoff and the gap between current agent frameworks and confidence-aware control surfaces.
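
A confidence-based routing step of the kind described above might look roughly like this. The `ToolCall` type, the 0.8 threshold, and the three outcomes are hypothetical, since the post's actual pipeline is not reproduced in the summary.

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)
    confidence: float = 0.0   # assumed to come from the planner or a verifier

def route(call, known_tools, threshold=0.8):
    """Verify a planned tool call, then decide who handles it.

    `known_tools` maps tool name -> set of required argument names.
    """
    required = known_tools.get(call.name)
    if required is None:
        return "reject"            # hallucinated tool: never execute
    if not required <= set(call.args):
        return "reject"            # missing required arguments
    if call.confidence < threshold:
        return "human_review"      # plausible but uncertain: ask a person
    return "execute"

tools = {"search_orders": {"customer_id"}}
print(route(ToolCall("search_orders", {"customer_id": 42}, 0.93), tools))
print(route(ToolCall("search_orders", {"customer_id": 42}, 0.55), tools))
print(route(ToolCall("delete_everything", {}, 0.99), tools))
```

Raising the threshold sends more calls to human review, which is the latency-safety tradeoff the post discusses.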

KVarN: Variance-Normalized KV-Cache Quantization [R]

r/MachineLearning · 37d ago · 9 · inference optimization open source benchmark research

KVarN is a novel KV-cache quantization method combining Hadamard rotations with variance normalization that achieves 3-4x compression with minimal accuracy loss on demanding benchmarks like AIME24. The approach includes a vLLM implementation and demonstrates actual speedups over fp16 baselines, making it immediately applicable for optimizing inference in reasoning and code-generation workloads.
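
A rough sketch of the two ingredients named above: an orthonormal Hadamard rotation, followed by variance normalization before uniform quantization. It is not the KVarN algorithm or its vLLM kernel. The per-vector mean/std statistics, the 3-sigma clipping range, and the function names are assumptions made for illustration.

```python
import math

def hadamard(v):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of 2).

    The normalized transform is its own inverse.
    """
    out = list(v)
    n = len(out)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]

def quantize(v, bits=4, clip=3.0):
    """Rotate, standardize, and round to signed integer codes."""
    rotated = hadamard(v)
    mean = sum(rotated) / len(rotated)
    std = math.sqrt(sum((x - mean) ** 2 for x in rotated) / len(rotated)) or 1.0
    qmax = 2 ** (bits - 1) - 1
    codes = [max(-qmax, min(qmax, round((x - mean) / std / clip * qmax)))
             for x in rotated]
    return codes, mean, std

def dequantize(codes, mean, std, bits=4, clip=3.0):
    """Undo the scaling, then rotate back with the same transform."""
    qmax = 2 ** (bits - 1) - 1
    rotated = [c / qmax * clip * std + mean for c in codes]
    return hadamard(rotated)

key = [0.5, -1.2, 3.0, 0.1, -0.7, 2.2, -0.3, 0.9]   # toy cache vector
codes, mean, std = quantize(key, bits=4)
print(codes)
print([round(x, 2) for x in dequantize(codes, mean, std, bits=4)])
```

The rotation spreads outlier channels across all coordinates, and the normalization lets a fixed integer grid cover the rotated values.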

On-policy distillation: one of the hottest terms on PapersWithCode [R]

r/MachineLearning · 37d ago · 8 · research fine tuning workflow

On-policy distillation (OPD) is an emerging post-training technique used in recent frontier models (Qwen 3.6/3.7, GLM-5.1, DeepSeek-V4) that efficiently teaches models to avoid specific errors by injecting hint tokens into trajectories rather than requiring full rollout regeneration. The technique uses a separate model to identify mistakes in rollouts, then trains the main model via probability matching on the annotated trajectories—a practical efficiency win over naive reinforcement learning approaches.

How Do You Handle Ablation Studies When the Original Model Is Already Trained? [R]

r/MachineLearning · 37d ago · 6 · workflow research

A practical discussion on conducting ablation studies without full retraining by leveraging saved checkpoints and model components. The thread explores techniques like selective layer freezing, component masking, and gradient-based analysis to evaluate model component importance while maintaining reproducibility against the original baseline.

In current ML systems, where is the main bottleneck: dataset quality or model architecture improvements? [D]

r/MachineLearning · 37d ago · 6 · workflow research

Discussion exploring the practical tradeoff between architectural improvements and data quality/curation in ML systems, with insights on how dataset preparation, synthetic data pipelines, and data constraints compare to model design as bottlenecks in applied settings.