r/MachineLearning · 10h ago · 6 · research benchmark

A researcher shares a struggling GNN implementation for fraud detection on the IEEE-CIS dataset, reaching suboptimal performance (AUC 0.87, PR-AUC 0.52) across multiple architectures (GCN, GraphSAGE, GAT). This is practical ML engineering content with specific technical challenges but no novel insights; it is mainly useful for learning what not to do and for potential debugging approaches.

r/MachineLearning · 23h ago · 7 · research library benchmark

EAMS presents an Equivariant Mesh Neural Network framework for robust anatomical mesh segmentation across medical imaging tasks (dental, liver, aneurysm), maintaining performance under geometric perturbations like patient pose variation where standard methods degrade by 25+ IoU points. The work combines intrinsic mesh descriptors with anatomy-aware PCA-derived priors in a lightweight (<2M parameter) architecture, demonstrating that equivariance principles from molecular modeling transfer effectively to 3D medical mesh tasks despite trade-offs in capturing subtle asymmetric features.

r/MachineLearning · 1d ago · 7 · research agent inference

A technical essay critiques reasoning models' ability to perform faithful inference, arguing that jointly-generated reasoning traces and final answers lack genuine separation of concerns. The piece engages empirically with recent work (Lanham/Turpin/Mirzadeh) and compares architectural approaches (HRM, TRM, GRAM, AlphaProof, Kona/Aleph), offering conceptual framing around constraints vs. influence that's relevant for engineers building reasoning systems.

r/MachineLearning · 1d ago · 6 · benchmark research

Critical analysis of METR's widely-cited AI capability benchmark, exposing methodological flaws including biased sampling (METR employees' peers), perverse incentives (hourly pay encouraging slower completion), unmeasured baselines, and likely training data contamination. Highlights systemic issues in AI research evaluation practices that engineers should be aware of when assessing capability claims.

r/MachineLearning · 2d ago · 6 · inference fine tuning deployment research

Call for papers for the 2nd Workshop on Efficient Reasoning at COLM 2026, covering practical topics like inference optimization (pruning, compression, KV-cache), efficient training/fine-tuning, and deployment of reasoning systems under resource constraints. Relevant for engineers working on cost-effective LLM inference and on-device reasoning, though this is primarily a conference submission announcement rather than technical content.

r/MachineLearning · 2d ago · 5 · research workflow

A technical discussion of how to evaluate self-supervised learning (SSL) methods such as BYOL and JEPA. It asks whether the RankMe metric (the effective rank of the embeddings, computed via SVD) remains meaningful as an evaluation criterion once it is also used as a loss term during training. The post explores the tension between using a metric to assess learning quality and explicitly optimizing it, which is relevant for practitioners evaluating SSL model representations.
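
For reference, RankMe is usually defined as the exponential of the Shannon entropy of the normalized singular values of the embedding matrix. A minimal sketch, assuming the singular values have already been obtained from an SVD routine; the function name `rankme` and the `eps` default are illustrative choices, not taken from the post.

```python
import math

def rankme(singular_values, eps=1e-7):
    """Effective rank: exp of the entropy of the normalized singular values."""
    total = sum(singular_values)
    # Normalize to a distribution over singular directions; eps avoids log(0).
    probs = [s / total + eps for s in singular_values]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)
```

A flat spectrum such as `[1, 1, 1, 1]` scores close to 4, while a spectrum dominated by one direction scores close to 1. That is why adding the metric to the loss can raise the score directly, which is the concern the post raises.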

r/MachineLearning · 3d ago · 7 · research inference open source

Deep dive into WordDetectorNN, a handwritten word detection model that uses per-pixel distance regression to bounding boxes instead of anchor-based detection, followed by DBSCAN clustering with an IoU-based distance metric. The architecture uses a ResNet18 backbone with an FPN decoder and 6-channel pixel-level outputs. It offers tuning-free detection, but the clustering step is an O(n²) bottleneck and the post-processing is non-differentiable.
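
The IoU-based distance can be sketched as follows. This is an illustration under stated assumptions, not the model's actual code: boxes are taken to be `(x_min, y_min, x_max, y_max)` tuples, and the helper names `iou` and `iou_distance_matrix` are hypothetical.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # boxes do not overlap
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def iou_distance_matrix(boxes):
    """Pairwise 1 - IoU; building this n x n matrix is the quadratic cost."""
    n = len(boxes)
    return [[1.0 - iou(boxes[i], boxes[j]) for j in range(n)] for i in range(n)]
```

A matrix like this can be handed to a clustering routine that accepts precomputed distances, for example scikit-learn's `DBSCAN(metric="precomputed")`. Every pair of candidate boxes is compared, so the cost grows quadratically with the number of per-pixel predictions.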

r/MachineLearning · 3d ago · 8 · fine tuning research tutorial

Practical fine-tuning research comparing three supervised fine-tuning (SFT) approaches for personality injection: chat demonstrations, first-person statements, and synthetic documents. The author empirically tests which training data format most effectively shapes model behavior and self-representation, finding first-person statements outperform intuitive conversation-based approaches on generalization.

r/MachineLearning · 4d ago · 7 · research open source inference

A software engineer describes a novel Hebbian learning architecture that achieves CIFAR-10 results without backpropagation, using only 5-7% of parameters through emergent sparse connectivity on a consumer GPU. The system exhibits interesting emergent behaviors including self-recovery after targeted neuron damage and performance jumps, suggesting biological plausibility might yield practical insights for efficient model design.

r/MachineLearning · 4d ago · 6 · research prompt engineering

A researcher observes that transformer models' inherent drive to predict accurate tokens ("clarity-seeking") can prioritize semantic coherence over safety constraints when discussing higher-order topics, potentially explaining constraint bypass behaviors. This touches on model alignment and interpretability but lacks technical depth, experimental validation, or concrete mechanisms.

r/MachineLearning · 4d ago · 7 · research inference open source library

SM1 (Scalar Mamba1) implements a closed-form solution for state-space models with d_state=1 using pure PyTorch operations, eliminating the selective scan bottleneck and reducing memory by 16x compared to standard Mamba implementations. The author demonstrates practical benefits: training a 130M parameter model on MIDI data with minimal memory footprint (56KB state, no KV cache) on consumer hardware, highlighting that scalar state dimensions can be sufficient when token representations already encode structure.
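
To illustrate why a scalar state admits a scan-free form: the recurrence h_t = a_t * h_{t-1} + u_t can be rewritten with one cumulative product and one cumulative sum. The sketch below is in plain Python and assumes nonzero decay terms `a_t`. Whether SM1 uses exactly this formulation is an assumption, and the function names are hypothetical.

```python
from itertools import accumulate
import operator

def scan_sequential(a, u):
    """Reference: h_t = a_t * h_{t-1} + u_t, starting from h = 0."""
    h, out = 0.0, []
    for a_t, u_t in zip(a, u):
        h = a_t * h + u_t
        out.append(h)
    return out

def scan_closed_form(a, u):
    """Same values via cumulative products and a cumulative sum (needs a_t != 0)."""
    prods = list(accumulate(a, operator.mul))            # P_t = a_1 * ... * a_t
    scaled = [u_t / p_t for u_t, p_t in zip(u, prods)]   # u_s / P_s
    sums = accumulate(scaled)                            # sum over s <= t of u_s / P_s
    return [p_t * s_t for p_t, s_t in zip(prods, sums)]  # h_t = P_t * sum
```

In PyTorch the same two steps map onto `torch.cumprod` and `torch.cumsum`. Dividing by a long running product can underflow on long sequences, so a practical version would likely work in log space.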

r/MachineLearning · 4d ago · 7 · research rag architecture benchmark

PHI // DRIFT is a cognitive architecture adding persistent internal state and advanced memory retrieval to LLMs through a Decision Memory Unit (DMU) that shows 14.8% context improvement over cosine-only RAG. The approach is validated on consumer hardware without GPU acceleration and includes measurable continuity metrics (PEDI) for evaluating conversation coherence across interactions.

r/MachineLearning · 5d ago · 6 · research fine tuning workflow

RPS (Regressive Plasticity Schedule) is a two-stage training approach combining curriculum learning with adaptive learning rate decay, showing improvements on ARC-AGI benchmarks and program synthesis tasks. The method trains models on easy data with high learning rates, then hard data with reduced learning rates, demonstrating 4% vs 2.4% performance gains over equal learning rate baselines.
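
The two-stage idea can be sketched as a small planning helper. This simplification uses one fixed learning rate per stage rather than the adaptive decay the post describes. The names `two_stage_plan`, `difficulty`, and `threshold` are hypothetical.

```python
def two_stage_plan(examples, difficulty, threshold, high_lr, low_lr):
    """Split examples into an easy stage (high LR) and a hard stage (low LR)."""
    easy = [x for x in examples if difficulty(x) <= threshold]
    hard = [x for x in examples if difficulty(x) > threshold]
    # Stages are returned in training order: easy first, then hard.
    return [
        {"stage": "easy", "lr": high_lr, "data": easy},
        {"stage": "hard", "lr": low_lr, "data": hard},
    ]
```

A training loop would iterate over the returned stages in order, setting the optimizer's learning rate from `lr` before consuming that stage's `data`.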

r/MachineLearning · 6d ago · 7 · research inference open source

A proof-of-concept exploring inference-time learning within Mixture of Experts (MoE) architectures by inserting specialized expert modules that can update sibling expert weights dynamically. The work combines existing components in a novel way to enable adaptive behavior during inference, potentially useful for building more flexible AI systems without retraining.

r/MachineLearning · 6d ago · 6 · research inference

A Reddit discussion questioning why major AI labs haven't adopted adaptive/dynamic vision tokenization despite research showing potential efficiency gains. The post explores technical trade-offs like pipeline constraints requiring fixed token counts, uncertainty in scaling laws for adaptive methods, and whether marginal improvements justify implementation complexity.

Latent Space · 6d ago · 9 · new model research inference benchmark

OpenAI's general-purpose LLM achieved a novel research result on the Erdős unit distance problem through extended reasoning (125-page output), demonstrating that inference-time scaling enables frontier mathematical reasoning without domain-specific scaffolding. This validates test-time compute as a key scaling paradigm and suggests reasoning capabilities may generalize beyond competition math to open research problems.

r/MachineLearning · 6d ago · 8 · research agent open source benchmark

Research on masked diffusion language models (MDLMs) for world modeling in RL environments, addressing the mode collapse and limited diversity of autoregressive models. Introduces a GRPO training framework with zero-shot transfer across multiple open-source environments and agent backbones, and releases the code and a dataset of 239K trajectories.
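
For readers unfamiliar with GRPO, its core step is to score each sampled output relative to the other samples drawn for the same prompt. A minimal sketch of that normalization follows. It uses the population standard deviation and a small `eps`, both of which are implementation choices assumed here rather than details from the post.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every sample gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]
```

When all samples in a group receive the same reward, every advantage is zero. That group then contributes no learning signal, which is one reason sample diversity matters for this family of methods.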

r/MachineLearning · 6d ago · 8 · research benchmark inference

OpenAI's reasoning model discovered a counterexample to a long-standing conjecture in discrete geometry (Erdős's unit-distance problem), with the proof verified by an AI grading pipeline and human mathematicians. The result is technically significant for AI-for-science, but lacks crucial experimental details (model name, sampling strategy, compute budget, full pipeline specs) needed to assess whether this represents genuine autonomous research capability or selective reporting from extensive search.

r/MachineLearning · 7d ago · 8 · open source research library agent

An engineer open-sourced NOML, a custom RL algorithm for continuous control that addresses instability in flight simulation by combining an anchor policy (a safe action fallback), a hierarchical actor architecture (independent MLP heads per control axis), and mirror learning for data efficiency. The approach diverges from standard TD3 by eliminating exploration noise while maintaining stability through structural constraints rather than reward shaping.