r/MachineLearning · 6h ago · 8 · agent research workflow

A research paper on Continual Harness shows how foundation models can autonomously refine their own execution harnesses through iterative self-improvement, demonstrated via Gemini completing Pokémon games without losses. The work formalizes the agent-harness co-learning loop and argues that self-refinement capabilities are critical for long-horizon task completion, with implications for building more autonomous AI systems.
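
A minimal sketch of what such a co-learning loop might look like; the callback names below are stand-ins, not the paper's actual interface:

```python
# Hypothetical sketch of an agent-harness co-learning loop; run_episode,
# critique_harness, and apply_patch are stand-ins, not the paper's code.

def co_learning_loop(run_episode, critique_harness, apply_patch,
                     harness, task, max_rounds=10):
    """Alternate between acting through the harness and revising it."""
    for _ in range(max_rounds):
        trajectory = run_episode(harness, task)   # agent acts via the harness
        if trajectory["solved"]:
            return harness
        # The agent reflects on its own failures and proposes a harness edit
        # (new tools, retry policy, memory layout, ...).
        patch = critique_harness(harness, trajectory["failures"])
        harness = apply_patch(harness, patch)
    return harness
```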

r/MachineLearning · 12h ago · 8 · research fine tuning tutorial open source

An engineer trained rating-conditioned transformer chess models (9M parameters) on 1B Lichess games, reaching MAIA-3 parity and adding two novel components: thinking-time prediction and clock-aware win-probability models. The write-up emphasizes data pipeline optimization (C++ preprocessing plus sequential shuffling for GPU efficiency) and shows how small models can match larger baselines through a careful training setup and conditioning on player and time context.
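
A hedged sketch of the rating-conditioning idea: bucket each player's Elo into discrete tokens and prepend them to the move sequence, so one model can imitate play at any strength. Vocabulary size, bucket width, and the architecture below are illustrative, not the post's actual configuration:

```python
import torch
import torch.nn as nn

NUM_MOVES = 4096        # UCI-style move vocabulary (illustrative)
RATING_BUCKETS = 30     # e.g. 100-Elo-wide buckets (illustrative)

class RatingConditionedChessLM(nn.Module):
    def __init__(self, d_model=256, n_layers=8, n_heads=8):
        super().__init__()
        self.move_emb = nn.Embedding(NUM_MOVES, d_model)
        self.rating_emb = nn.Embedding(RATING_BUCKETS, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, NUM_MOVES)

    def forward(self, moves, white_elo, black_elo):
        # moves: (B, T) move-token ids; *_elo: (B,) integer Elo ratings
        buckets = torch.stack([white_elo // 100, black_elo // 100], dim=1)
        prefix = self.rating_emb(buckets.clamp(0, RATING_BUCKETS - 1))
        x = torch.cat([prefix, self.move_emb(moves)], dim=1)  # (B, 2+T, D)
        # NOTE: causal masking omitted for brevity; a real next-move model
        # would pass an attention mask to the backbone here.
        return self.head(self.backbone(x))  # next-move logits per position
```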

Anthropic Blog · 12h ago · 6 · api update workflow agent

Anthropic launched Claude for Small Business, a package of pre-built agentic workflows and connectors that integrate Claude into tools like QuickBooks, HubSpot, and Google Workspace for small business automation tasks. The offering includes 15 ready-to-run workflows across finance, sales, and operations, plus emphasis on data security and AI training partnerships.

r/MachineLearning · 12h ago · 8 · open source tool inference deployment api update

Scenema Audio released open-source weights and inference code for a diffusion-based TTS model that decouples emotional performance from voice identity through prompt-based control. Key technical advantages include more natural emotional delivery than autoregressive TTS, support for audio-first video generation workflows, an optimized 8-step diffusion sampler, and Docker/REST API deployment with automatic VRAM management. Noted practical trade-offs: stochastic output quality that requires a post-editing workflow, sensitivity to detailed prompting, and phonetic spelling for complex words.
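
A hypothetical call against such a Docker/REST deployment; the endpoint path and field names are assumptions for illustration, not Scenema Audio's documented API:

```python
import requests

resp = requests.post(
    "http://localhost:8000/tts",  # hypothetical endpoint
    json={
        "text": "We should NOT have come here.",
        "voice_id": "narrator_f1",        # voice identity, kept fixed
        "style_prompt": "whispered, tense, slightly out of breath",
        "num_steps": 8,                   # the optimized 8-step diffusion
        "seed": 1234,                     # pin the stochastic sampler for retakes
    },
    timeout=120,
)
resp.raise_for_status()
with open("take_01.wav", "wb") as f:
    f.write(resp.content)
```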

r/LocalLLaMA · 18h ago · 8 · new model open source inference deployment

SenseNova-U1 is a new unified multimodal model architecture that natively integrates visual understanding and generation without separate encoders/decoders, achieving state-of-the-art performance on multiple benchmarks while supporting efficient 8-step inference and interleaved image-text generation. Open-source weights, GGUF quantizations, and inference code are now available, with practical optimization features like layer-offload VRAM modes for low-resource deployment.
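
Layer offload in GGUF runtimes generally looks like the sketch below (shown with llama-cpp-python); the filename is a placeholder, and first-class llama.cpp support for this particular architecture is an assumption, not something the post confirms:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="sensenova-u1-Q4_K_M.gguf",  # hypothetical quantized weights
    n_gpu_layers=20,   # keep 20 transformer layers on the GPU, rest on CPU
    n_ctx=8192,        # context window to allocate
)
print(llm("Describe the attached scene in one sentence.", max_tokens=64))
```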

r/MachineLearning · 19h ago · 6 · workflow open source

A developer seeks architectural patterns for organizing benchmark infrastructure using type-safe data structures (dataclasses/Pydantic) to manage datasets, task schemas, and experiment composition. While this is a practical engineering question rather than news, it reflects real challenges in building reproducible ML benchmarks and may surface useful open-source projects or design patterns worth studying.
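
One possible shape for that kind of type-safe benchmark plumbing; the class and field names here are illustrative:

```python
from pydantic import BaseModel, Field

class DatasetSpec(BaseModel):
    name: str
    split: str = "test"
    path: str                        # local path or hub identifier
    num_examples: int | None = None

class TaskSpec(BaseModel):
    name: str
    dataset: DatasetSpec
    metric: str = "accuracy"
    prompt_template: str = "{input}"

class ExperimentSpec(BaseModel):
    model_name: str
    tasks: list[TaskSpec]
    seed: int = 0
    tags: list[str] = Field(default_factory=list)

# Validation happens at construction time, so a malformed experiment
# config fails loudly before any GPU time is spent.
exp = ExperimentSpec(
    model_name="my-model-v1",
    tasks=[TaskSpec(name="qa", dataset=DatasetSpec(name="squad", path="data/squad"))],
)
print(exp.model_dump_json(indent=2))
```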

r/MachineLearning · 19h ago · 6 · research benchmark

A technical critique of the 2024 'Ingenia Theorem' paper claiming AGI via ML is impossible, identifying a critical flaw: the proof equivocates between 'human-level classifier' and 'all polytime-sampleable distributions,' which would absurdly prove ImageNet classification is intractable. This is relevant for understanding the theoretical foundations and limitations arguments in AI/ML research.
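
A hedged formalization of the equivocation as the critique describes it; the notation here is ours, not the paper's:

```latex
\[
  \underbrace{\neg\,\exists A\ \forall D \in \mathcal{D}_{\text{poly}}:\ A \text{ learns } D}_{\text{what the proof shows}}
  \quad\not\Rightarrow\quad
  \underbrace{\neg\,\exists A:\ A \text{ learns } D_{\text{human}}}_{\text{what the paper concludes}}
\]
% The implication holds only if $D_{\text{human}}$ is required to range
% over all of $\mathcal{D}_{\text{poly}}$; the same slip would "prove"
% ImageNet classification intractable, since ImageNet is one fixed
% distribution, not all polytime-sampleable ones.
```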

r/MachineLearning · 20h ago · 5 · tutorial workflow

A developer discusses choosing between logistic regression and tree-based models (random forests) for a UFC fight prediction project, noting that MMA statistics exhibit nonlinear relationships and feature interactions that logistic regression may miss. The post highlights practical ML modeling decisions around feature engineering and model selection for binary classification with domain-specific constraints like betting value optimization.
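
The question is easy to settle empirically; here is a quick cross-validated comparison on synthetic stand-in data (the feature matrix below is random, not real fight stats):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                     # e.g. strike/grappling diffs
y = (X[:, 0] * X[:, 1] + X[:, 2] > 0).astype(int)  # nonlinear feature interaction

logit = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
forest = RandomForestClassifier(n_estimators=300, random_state=0)

# For betting value, log-loss matters more than accuracy: calibrated
# probabilities are what get compared against the bookmaker's odds.
for name, model in [("logit", logit), ("forest", forest)]:
    score = cross_val_score(model, X, y, cv=5, scoring="neg_log_loss").mean()
    print(f"{name}: log-loss {-score:.3f}")
```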

r/LocalLLaMA · 21h ago · 9 · new model inference tool deployment

Ovis2.6-80B-A3B is a new multimodal LLM featuring a Mixture-of-Experts architecture with 80B total parameters but only ~3B active during inference, offering strong performance with low serving costs. Key improvements include 64K context window, up to 2880×2880 image resolution support, active visual reasoning via "Think with Image" capability, and enhanced OCR/document understanding—with practical implementation examples provided.
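
A hedged loading sketch, assuming the release follows the trust_remote_code pattern of earlier Ovis models; the hub id below is a guess:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Ovis2.6-80B-A3B",   # hypothetical hub id
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,       # custom multimodal code shipped in the repo
    device_map="auto",            # shard across available GPUs (needs accelerate)
)
# With ~3B active parameters per token, the MoE routes each token through a
# small expert subset, which is why serving cost stays near a 3B dense model.
```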

r/MachineLearning · 22h ago · 7 · research library open source

A novel Vision Transformer backbone uses block-sparse core-periphery attention to reduce complexity from O(N²) to O(2NC + C²), and is trained with nested dropout for elastic inference-time cost adjustment. It achieves accuracy competitive with DINOv3 while remaining stable across resolutions (256-1024) and exhibits interesting emergent attention patterns.
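
The core-periphery structure is easiest to see as an attention mask: C "core" tokens attend globally while the remaining periphery tokens attend only to the core and themselves, dropping the O(N²) periphery-periphery block. A dense-mask illustration follows (the paper's block-sparse kernels would avoid materializing it):

```python
import torch

def core_periphery_mask(n_tokens: int, n_core: int) -> torch.Tensor:
    allowed = torch.zeros(n_tokens, n_tokens, dtype=torch.bool)
    allowed[:n_core, :] = True           # core rows: attend to everyone
    allowed[:, :n_core] = True           # core columns: everyone attends to core
    allowed |= torch.eye(n_tokens, dtype=torch.bool)  # keep self-attention
    return allowed

mask = core_periphery_mask(n_tokens=8, n_core=2)
# Each periphery row has ~C allowed entries and each core row has N, so
# total attention cost scales as O(NC + C^2) rather than O(N^2).
print(mask.int())
```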

OpenAI Blog · 23h ago · 7 · agent deployment research

OpenAI's sandbox architecture for Codex on Windows provides technical insights into secure execution environments for AI coding agents, with controlled file access and network restrictions—directly applicable for building safe autonomous coding systems.
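
A minimal illustration of the policy such a sandbox enforces; a real implementation relies on OS-level isolation (e.g. AppContainer on Windows or seccomp/namespaces on Linux), not just the subprocess settings shown here:

```python
import os
import subprocess
import tempfile

def run_untrusted(cmd: list[str], timeout_s: int = 30) -> subprocess.CompletedProcess:
    scratch = tempfile.mkdtemp(prefix="agent_sandbox_")  # confined writable dir
    env = {"PATH": os.environ.get("PATH", "")}           # strip ambient secrets
    return subprocess.run(
        cmd,
        cwd=scratch,          # file access starts (and should stay) here
        env=env,
        capture_output=True,
        timeout=timeout_s,    # bound runaway agent-generated code
        text=True,
    )

result = run_untrusted(["python", "-c", "print('hello from the sandbox')"])
print(result.stdout)
```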

r/MachineLearning · 23h ago · 8 · research fine tuning prompt engineering workflow

Fast-Slow Training (FST) combines in-context learning via optimized prompts (fast weights) with parameter updates (slow weights) to achieve 3x better sample efficiency than pure RL while reducing catastrophic forgetting and preserving model plasticity. This dual-timescale approach maintains closer alignment to base models while enabling effective continual learning across multiple tasks.
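
A hedged sketch of the dual-timescale loop as described; the helper functions (mutate_prompt, evaluate, task_loss) are hypothetical stand-ins, not the paper's code:

```python
def fast_slow_train(model, optimizer, tasks, inner_steps=8, outer_steps=100):
    prompt = "You are a helpful assistant."
    for step in range(outer_steps):
        task = tasks[step % len(tasks)]
        # Fast timescale: search over prompts with frozen parameters
        # (in-context learning, no gradients into the model).
        for _ in range(inner_steps):
            candidate = mutate_prompt(prompt)              # hypothetical helper
            if evaluate(model, candidate, task) > evaluate(model, prompt, task):
                prompt = candidate
        # Slow timescale: a small parameter update on what the fast loop
        # could not solve, limiting drift from the base model.
        loss = task_loss(model, prompt, task)              # hypothetical helper
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model, prompt
```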

r/MachineLearning · 1d ago · 6 · rag workflow deployment

A post shares conference decks from the Knowledge Graph Conference highlighting production enterprise systems (Bloomberg, AbbVie, Morgan Stanley) that use knowledge graphs as reasoning infrastructure rather than mere retrieval layers, with real compliance and governance implementations where KGs serve as the source of truth behind LLM interfaces.
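
A toy illustration of the source-of-truth pattern those decks describe: the LLM only translates between language and graph queries, while the answer itself is read from the governed graph. The graph and lookup helper here are stand-ins:

```python
graph = {
    ("AcmeCorp", "regulated_by"): ["SEC"],
    ("AcmeCorp", "subsidiary_of"): ["AcmeHoldings"],
}

def kg_answer(subject: str, predicate: str) -> list[str]:
    # Source-of-truth lookup: auditable edges, nothing generated.
    return graph.get((subject, predicate), [])

# An LLM's only job is to map "Who regulates AcmeCorp?" onto this
# structured query; the reply is grounded entirely in the graph.
print(kg_answer("AcmeCorp", "regulated_by"))  # ['SEC']
```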

Latent Space · 1d ago · 7 · fine tuning benchmark agent open source research api update

OpenAI is deprecating fine-tuning APIs, shifting the AI engineering landscape toward open models, longer context windows, and agentic systems. The piece covers emerging research benchmarks (FrontierMath, medical evals), agentic breakthroughs in math/physics/coding, and the practical move away from proprietary model fine-tuning toward prompt engineering and open-source RLFT alternatives.

r/MachineLearning · 1d ago · 8 · open source tutorial library

A minimal 160-200 line PyTorch implementation of JEPA (Joint-Embedding Predictive Architecture) strips away scaling complexities to expose the core mathematical ideas. It includes tutorial documentation that maps the algorithm's theory directly to the implementation, making it valuable for understanding self-supervised learning approaches.
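
A condensed JEPA core in the same spirit (not the repo's actual code): a context encoder predicts, in latent space, what an EMA target encoder produces for the masked-out region:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = nn.Sequential(nn.Linear(784, 256), nn.GELU(), nn.Linear(256, 128))
target_enc = copy.deepcopy(enc)          # EMA copy, never backpropped through
predictor = nn.Sequential(nn.Linear(128, 128), nn.GELU(), nn.Linear(128, 128))
opt = torch.optim.AdamW([*enc.parameters(), *predictor.parameters()], lr=1e-3)

x = torch.randn(32, 784)                 # stand-in batch of flattened images
context = x.clone()
context[:, 392:] = 0                     # crude "mask" of the second half

pred = predictor(enc(context))           # predict target latent from context
with torch.no_grad():
    target = target_enc(x)               # no gradients through the target
loss = F.mse_loss(pred, target)          # loss lives in embedding space
opt.zero_grad(); loss.backward(); opt.step()

# EMA update of the target encoder (momentum 0.996)
with torch.no_grad():
    for p_t, p in zip(target_enc.parameters(), enc.parameters()):
        p_t.mul_(0.996).add_(p, alpha=0.004)
```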

Simon Willison · 1d ago · 9 · api update inference tool

OpenAI's reasoning-capable models now use a new /v1/responses endpoint instead of /v1/chat/completions, enabling interleaved reasoning across tool calls for GPT-5 class models. Developers can also view summarized reasoning tokens, with new command-line flags (-R/--hide-reasoning) to control their visibility.
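
A minimal call against the new endpoint via the official Python SDK; the model name and the reasoning-summary option reflect current docs and may change:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.responses.create(
    model="gpt-5",
    input="What is 37 * 43? Think step by step.",
    reasoning={"effort": "medium", "summary": "auto"},  # surface summarized reasoning
)
print(resp.output_text)
```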

r/MachineLearning · 1d ago · 6 · tool rag workflow

A developer built a Steam game recommender system using custom vector embeddings to capture nuanced game characteristics (gameplay focus, music, vibe) instead of broad tags, enabling more personalized recommendations and discovery of underrated games. The project uses a database-driven approach with explanations for each recommendation and includes an advanced mode for fine-tuned filtering.
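
A skeleton of the embed-and-rank approach; the game descriptions and the choice of embedding model are placeholders, not the project's actual setup:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence embedder works

games = {
    "Hollow Knight": "melancholy metroidvania, tight platforming, haunting score",
    "Celeste": "precision platformer, synth soundtrack, themes of anxiety",
    "Factorio": "automation sandbox, logistics optimization, industrial hum",
}
names = list(games)
vecs = model.encode([games[n] for n in names], normalize_embeddings=True)

query = model.encode(["atmospheric platformer with a moody soundtrack"],
                     normalize_embeddings=True)[0]
scores = vecs @ query                      # cosine similarity (unit vectors)
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {names[i]}")  # ranked by description, not broad tags
```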

r/MachineLearning · 1d ago · 9 · new model benchmark inference open source

TabPFN-3 is a major update to the tabular foundation model, enabling 1M-row inference on a single H100 with 10-1000x faster inference and a novel thinking mode for test-time compute optimization. The model achieves a 93% win rate over classical ML and delivers significant gains in speed, scale, and multi-class support through architectural innovations like row-chunked inference and KV caching.
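
A usage sketch assuming the release keeps the scikit-learn-style interface of earlier TabPFN versions; the thinking-mode and chunking knobs aren't shown because their flags aren't specified here:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_classes=3,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()   # no task-specific training: inference is the fit
clf.fit(X_tr, y_tr)        # "fit" just stores the context rows
print(clf.score(X_te, y_te))
```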