r/MachineLearning · 1d ago · 7 · tutorial research workflow

This article explains Riemannian optimization techniques for machine learning on manifolds (like hyperspheres), focusing on how to adapt gradient descent to preserve geometric constraints using exponential maps and retractions. It provides practical implementation guidance for constraining neural network parameters to stay on spherical manifolds, with code examples using PyTorch.
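
A minimal sketch of the core idea, assuming the unit sphere and plain-Python vectors rather than the article's PyTorch code. It projects the Euclidean gradient onto the tangent space, then steps along the manifold with either the exact exponential map or the cheaper normalize-after-step retraction. The helper names (`project_to_tangent`, `exp_map`, `retract`, `riemannian_sgd`) are illustrative, not taken from the article.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def project_to_tangent(x, g):
    # Riemannian gradient on the sphere: remove the radial component of g at x.
    c = dot(x, g)
    return [gi - c * xi for gi, xi in zip(g, x)]

def exp_map(x, v):
    # Exponential map: follow the great circle from x in direction v
    # for arc length ||v||. Assumes x is unit-norm and v is tangent at x.
    t = norm(v)
    if t < 1e-12:
        return list(x)
    return [math.cos(t) * xi + math.sin(t) * vi / t for xi, vi in zip(x, v)]

def retract(x, v):
    # Cheaper retraction: Euclidean step, then renormalize back onto the sphere.
    y = [xi + vi for xi, vi in zip(x, v)]
    n = norm(y)
    return [yi / n for yi in y]

def riemannian_sgd(x, grad_fn, lr=0.1, steps=200, use_exp=True):
    # grad_fn returns the Euclidean gradient of the loss at x.
    for _ in range(steps):
        rgrad = project_to_tangent(x, grad_fn(x))
        step = [-lr * r for r in rgrad]
        x = exp_map(x, step) if use_exp else retract(x, step)
    return x

if __name__ == "__main__":
    a = [3.0, 4.0, 0.0]
    # Minimize f(x) = -a.x on the sphere; its Euclidean gradient is -a.
    x = riemannian_sgd([0.0, 0.0, 1.0], lambda v: [-ai for ai in a])
    print([round(v, 4) for v in x], norm(x))
```

Minimizing f(x) = -a·x with a = (3, 4, 0) should drive x toward a/‖a‖ = (0.6, 0.8, 0) while every iterate keeps unit norm. The retraction trades exactness of the geodesic step for a single normalization, which is usually the cheaper choice for neural-network parameters.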

Latent Space · 2d ago · 9 · new model api update agent workflow

Google released Gemini 3.5 Flash (GA immediately) with 1M context window, 65k max output, and agentic/coding capabilities, plus the new Gemini Omni multimodal family for video/audio generation and editing. The stack includes expanded Antigravity agents across desktop/CLI/SDK/API, with Google reporting 3.2 quadrillion tokens/month processed and 900M+ monthly users.

OpenAI Blog · 2d ago · 6 · workflow tool

Ramp shares their workflow using Codex (OpenAI's code model) integrated with GPT-5.5 for automated code review, reducing feedback cycles from hours to minutes. The article highlights practical implementation of AI-assisted code review as part of their development process, offering insights into how organizations can adopt similar AI-powered review systems.

OpenAI Research · 2d ago · 6 · research benchmark

OpenAI's model solved a long-standing discrete geometry problem (the unit distance conjecture), demonstrating AI capability in mathematical reasoning and proof generation. While impressive as a research milestone, this is primarily a mathematics/science application story rather than a technical advancement for building AI systems.

Simon Willison · 2d ago · 9 · new model api update deployment inference

Google released Gemini 3.5 Flash to general availability with a 1M-token input limit and a 65K-token output limit, integrated into billions of consumer products, but at 3-6x higher pricing than previous Flash models ($1.50 input / $9 output per million tokens). The release includes a new Interactions API (beta) for server-side conversation-history management and reflects an industry-wide trend of price increases for new model releases from OpenAI, Anthropic, and Google.
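
A back-of-envelope cost helper. It assumes the two quoted figures are the per-million-token input and output rates respectively, and it ignores any long-context tiers, caching discounts, or batch pricing. The constant and function names are hypothetical.

```python
# Prices quoted in the summary, in USD per 1M tokens. Assumption: the first
# figure is the input rate and the second the output rate; long-context tiers,
# caching discounts, and batch pricing are ignored.
INPUT_USD_PER_M = 1.50
OUTPUT_USD_PER_M = 9.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request at flat per-million-token rates."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000

# Illustrative request: 100K tokens in, 10K tokens out.
print(f"${request_cost(100_000, 10_000):.3f}")  # prints $0.240
```

Because output tokens cost 6x as much as input tokens under these rates, output-heavy agentic workloads feel the increase most.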

r/LocalLLaMA · 2d ago · 6 · new model benchmark

Community discussion about HRM-Text, a new 1B-parameter model with impressive benchmark claims. The post raises valid skepticism about the benchmarks and asks for a technical explanation of the model's architecture and its practical limitations, for engineers evaluating whether to adopt it.

r/LocalLLaMA · 2d ago · 6 · tool inference deployment

Release notes for AI Edge Gallery showing incremental updates including experimental Model Context Protocol (MCP) support, Gemma3 1B NPU optimization for Qualcomm SoCs, and various agent capability enhancements. Relevant for engineers building on-device AI applications, particularly those targeting edge deployment on mobile/embedded hardware.

r/LocalLLaMA · 2d ago · 6 · inference deployment

Intel's Crescent Island GPU, based on the new Xe3P architecture, is an upcoming AI inference accelerator featuring 160GB of LPDDR5X memory designed for cost-effective, power-optimized data center deployment. The PCB leak reveals hardware specifications including 20 memory modules, 13 VRMs, and a single 16-pin power connector, positioning it as a competitor to NVIDIA and AMD's HBM-based solutions.

r/MachineLearning · 2d ago · 9 · tool open source inference research

AXON is a real-time mechanistic interpretability visualization tool that streams SAE-decomposed residual stream features from GPT-2 as an interactive 3D force graph, enabling developers to observe which semantic features activate before token generation. Built with TransformerLens, SAELens, FastAPI WebSocket, and Three.js, it supports multiple model architectures and runs on both CPU and GPU, providing practical insight into model internals during inference.
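
To make "SAE-decomposed residual stream features" concrete, here is a toy sketch of a common sparse-autoencoder encode/decode form, f = ReLU(W_enc(x - b_dec) + b_enc) and x̂ = b_dec + Σ f_i·W_dec[i], in plain Python. It is not SAELens's API. The weights are made-up 2-dimensional values, whereas a real SAE for GPT-2 would load a trained dictionary with thousands of features per layer. The function names are hypothetical.

```python
def sae_encode(x, W_enc, b_enc, b_dec):
    # Feature activations: ReLU(W_enc (x - b_dec) + b_enc); W_enc holds one row per feature.
    centered = [xj - bj for xj, bj in zip(x, b_dec)]
    return [max(0.0, sum(w * c for w, c in zip(row, centered)) + b)
            for row, b in zip(W_enc, b_enc)]

def sae_decode(f, W_dec, b_dec):
    # Reconstruction: b_dec plus each feature's decoder direction scaled by its activation.
    out = list(b_dec)
    for fi, row in zip(f, W_dec):
        for j, w in enumerate(row):
            out[j] += fi * w
    return out

def top_features(f, k=3):
    # The strongest active features: what a visualizer would highlight per token.
    active = [(i, v) for i, v in enumerate(f) if v > 0]
    return sorted(active, key=lambda p: p[1], reverse=True)[:k]

if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # toy dictionary, tied encoder/decoder
    f = sae_encode([2.0, 0.5], W, [0.0, 0.0, 0.0], [0.0, 0.0])
    print(top_features(f), sae_decode(f, W, [0.0, 0.0]))
```

Sparsity is the point: for any given token only a handful of features fire, which is what makes a live graph of them readable.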

HuggingFace Blog · 2d ago · 7 · new model inference benchmark

OlmoEarth v1.1 achieves a 3x reduction in compute cost for satellite imagery processing while maintaining performance, through an optimized transformer architecture and token representation strategies. The release demonstrates practical efficiency improvements in large-scale geospatial AI inference, with technical details on patch-based tokenization and multi-resolution handling for remote sensing data.
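
A generic sketch of ViT-style patch tokenization. It assumes square, non-overlapping patches on a single-band tile, and it does not reproduce OlmoEarth's actual tokenizer or explain how v1.1 reaches its 3x saving. It shows why patch size is such a strong compute lever: token count falls with the square of the patch size, and self-attention cost grows with the square of the token count.

```python
def patchify(image, patch):
    """Split an H x W grid (list of rows) into non-overlapping patch x patch
    tokens, each flattened row-major. Assumes H and W are divisible by patch."""
    H, W = len(image), len(image[0])
    if H % patch or W % patch:
        raise ValueError("image dimensions must be divisible by the patch size")
    tokens = []
    for r in range(0, H, patch):
        for c in range(0, W, patch):
            tokens.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return tokens

def attention_pairs(num_tokens):
    # Full self-attention scores every token pair, so cost scales as N^2.
    return num_tokens * num_tokens

if __name__ == "__main__":
    tile = [[0] * 64 for _ in range(64)]
    for p in (8, 16):
        n = len(patchify(tile, p))
        print(f"patch {p}: {n} tokens, {attention_pairs(n)} attention pairs")
```

On a 64x64 tile, 8x8 patches yield 64 tokens (4,096 attention pairs) while 16x16 patches yield 16 tokens (256 pairs), at the cost of coarser spatial detail per token.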

r/LocalLLaMA · 2d ago · 8 · tool agent workflow open source

CodeGraph is a new MCP server tool that pre-indexes codebases into knowledge graphs (symbol relationships, call graphs, code structure), enabling AI agents like Claude Code to explore repositories with 92% fewer tool calls and 71% faster performance by querying local SQLite indices instead of scanning files. The tool auto-syncs via file watchers, integrates with Claude Code/Cursor/Codex CLI, and includes framework-specific routing detection for web apps.
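
A minimal sketch of the pre-indexing idea. It assumes a hypothetical one-table schema of (caller, callee) edges in SQLite; CodeGraph's real schema and MCP tool names are not described in the summary. Once edges are indexed, "who calls X?", even transitively, becomes a single query instead of a chain of file reads by the agent.

```python
import sqlite3

def build_index(calls):
    """calls: iterable of (caller, callee) symbol-name pairs, e.g. from a parser.
    Hypothetical schema for illustration only."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE calls (caller TEXT NOT NULL, callee TEXT NOT NULL)")
    db.execute("CREATE INDEX idx_callee ON calls(callee)")
    db.executemany("INSERT INTO calls VALUES (?, ?)", calls)
    db.commit()
    return db

def callers_of(db, symbol):
    rows = db.execute("SELECT caller FROM calls WHERE callee = ? ORDER BY caller", (symbol,))
    return [r[0] for r in rows]

def transitive_callers(db, symbol):
    # Recursive CTE walks the call graph upward; UNION (not UNION ALL)
    # de-duplicates rows, so cycles in the graph terminate.
    rows = db.execute(
        """
        WITH RECURSIVE up(name) AS (
            SELECT caller FROM calls WHERE callee = ?
            UNION
            SELECT c.caller FROM calls AS c JOIN up ON c.callee = up.name
        )
        SELECT name FROM up ORDER BY name
        """,
        (symbol,),
    )
    return [r[0] for r in rows]

if __name__ == "__main__":
    db = build_index([("main", "handle_request"), ("handle_request", "save"),
                      ("save", "db_write"), ("cli", "save")])
    print(callers_of(db, "save"), transitive_callers(db, "db_write"))
```

A real indexer would also store symbol definitions and file locations, and a file watcher would re-parse changed files and replace their rows.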

r/MachineLearning · 2d ago · 6 · research agent open source

An empirical comparison of bio-plausible learning (Hebbian plasticity + predictive coding) against PPO on Pong: the bio-plausible agent reaches 57% of PPO's performance with zero backpropagation. The post identifies catastrophic forgetting during non-stationary self-play, rather than the lack of backprop, as the key bottleneck, illustrating the plasticity-stability tradeoff in biologically inspired RL systems.
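
For readers new to Hebbian updates, a textbook sketch (not the post's agent, which also uses predictive coding). The plain Hebbian rule Δw = η·y·x grows weights without bound, while Oja's rule Δw = η·y·(x - y·w) adds a decay term that keeps ‖w‖ near 1 and converges to the input's principal direction. This illustrates only the weight-stability side of Hebbian learning, not forgetting in self-play; the toy data and hyperparameters are illustrative assumptions.

```python
import math
import random

def hebbian_step(w, x, lr):
    # Plain Hebbian: strengthen weights in proportion to pre * post activity.
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * y * xi for wi, xi in zip(w, x)]

def oja_step(w, x, lr):
    # Oja's rule: Hebbian term plus a decay term -y^2 * w that bounds ||w||.
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]

def sample(rng):
    # Toy 2-D inputs whose dominant variance lies along (1, 1) / sqrt(2).
    s = rng.gauss(0.0, 1.0)
    return [s / math.sqrt(2) + rng.gauss(0.0, 0.1),
            s / math.sqrt(2) + rng.gauss(0.0, 0.1)]

def train(step_fn, n=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    w = [1.0, 0.0]
    for _ in range(n):
        w = step_fn(w, sample(rng), lr)
    return w

if __name__ == "__main__":
    print("Hebbian |w| =", math.hypot(*train(hebbian_step)))  # grows without bound
    print("Oja     |w| =", math.hypot(*train(oja_step)))      # stays close to 1
```

Normalization buys stability by constantly pulling weights back toward a fixed scale, which is one reason purely local rules struggle to retain old structure while still adapting quickly.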

r/MachineLearning · 2d ago · 5 · research benchmark

Reddit discussion questioning the practical utility of tabular foundation models (TabPFN-3, TabICL) despite impressive benchmark results, arguing that resource overhead (GB models for MB datasets) may not justify gains over classical ML with feature engineering. Raises valid engineering tradeoffs about model size, inference requirements, and explainability versus performance metrics.

r/LocalLLaMA · 2d ago · 8 · new model research inference

Lance is a unified multimodal model from ByteDance that handles image and video understanding, generation, and editing in a single framework. The paper demonstrates strong performance on diverse visual reasoning tasks including video QA, chart analysis, and detailed scene description, making it relevant for engineers building multimodal AI applications.

OpenAI Blog · 2d ago · 5 · tool deployment

OpenAI has released Content Credentials integration and verification tools to help identify AI-generated media through technical standards. While not directly impacting daily AI engineering workflows, this is relevant for developers building content creation systems who need to implement transparency and provenance tracking.

Latent Space · 2d ago · 8 · tutorial workflow inference benchmark

Vlad Feinberg's hiring/skill guide emphasizes kernel-level performance optimization as the critical bottleneck in LLM work, highlighting the need for JAX/Pallas expertise to fuse operations such as MoE projections for measurable speedups. The piece connects pretraining fundamentals (Chinchilla scaling laws, dense vs MoE tradeoffs) with low-level optimization as a direct path into AI labs, and offers practical exercises (deriving scaling laws, implementing kernels from scratch) that double as hiring tests.
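
A scaling-law derivation exercise typically starts from two widely used Chinchilla approximations: training compute C ≈ 6·N·D for N parameters and D tokens, and a compute-optimal ratio of roughly D ≈ 20·N. Substituting gives C ≈ 120·N², so N ≈ sqrt(C / 120). A minimal sketch, treating the 20 tokens-per-parameter ratio as an adjustable assumption rather than the paper's full fitted law:

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Rule-of-thumb compute-optimal split from C ~= 6*N*D and D ~= r*N.
    Returns (parameters N, training tokens D)."""
    n = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    return n, tokens_per_param * n

if __name__ == "__main__":
    n, d = chinchilla_optimal(5.88e23)
    print(f"N ~ {n:.3g} params, D ~ {d:.3g} tokens")
```

At C ≈ 5.88e23 FLOPs this returns about 70B parameters and 1.4T tokens, which is the configuration of the Chinchilla model itself.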

r/MachineLearning · 3d ago · 8 · tool open source deployment workflow

swm is an open-source tool that simplifies GPU rental workflows by providing unified pricing across providers (RunPod, Vast.ai, Lambda, etc.), automatic workspace syncing to S3-compatible storage, and lifecycle management to prevent runaway costs. It supports popular AI frameworks like ComfyUI, Ollama, vLLM, and Axolotl, eliminating the 45-minute reinstall cycle that plagues multi-provider GPU usage.

Simon Willison · 3d ago · 6 · benchmark agent workflow

A retrospective on LLM developments from November 2025 to May 2026, highlighting the inflection point at which coding agents became production-ready through reinforcement learning from verifiable rewards (RLVR), while providers shipped rapid model iterations. The author discusses practical experience building ambitious projects with these new capabilities and references an emerging open-source coding agent framework (Warelay).

r/MachineLearning · 3d ago · 7 · workflow inference deployment

An engineer discusses a streaming architecture for processing long videos with Whisper and LLMs, covering chunking strategies that preserve context across segments, audio voice activity detection (VAD), and whether asyncio/FastAPI suffices or Celery/Redis is needed for pipelined task processing. It is a practical workflow discussion for anyone building real-time AI video analysis backends.
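
A minimal sketch of fixed-window chunking with overlap. It assumes 30-second windows (Whisper's native input length) and a 5-second overlap; both values are illustrative. Overlap means an utterance cut at one boundary appears intact in the neighboring chunk, at the cost of transcribing some audio twice and de-duplicating text afterwards. VAD-based splitting at silences is the usual alternative and is not shown. The function name is hypothetical.

```python
def chunk_spans(duration_s: float, chunk_s: float = 30.0, overlap_s: float = 5.0):
    """Return (start, end) windows in seconds covering [0, duration_s], with
    consecutive windows overlapping by overlap_s seconds."""
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap must be in [0, chunk_s)")
    step = chunk_s - overlap_s
    spans, start = [], 0.0
    while True:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            return spans
        start += step

if __name__ == "__main__":
    print(chunk_spans(70))  # three windows for a 70-second clip
```

Each span can then be enqueued as an independent transcription task, which is where the asyncio-versus-Celery choice comes in.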