r/MachineLearning · 1d ago · 8 · fine tuning deployment workflow tutorial

ML team documents critical issues and workarounds for fine-tuning and deploying Gemma-4 with PEFT and TRL, including problems with custom layer compatibility, KV-sharing attention, DeepSpeed ZeRO-3 adapter corruption, and runtime LoRA serving limitations. Provides practical fixes like unwrapping custom layers before PEFT, upgrading transformers to v5.5.2+, and manual weight merging for deployment.
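The manual merge mentioned above comes down to folding the low-rank update back into the base weight. This is a minimal sketch, not the team's actual code. It assumes the standard LoRA parameterisation W' = W + (alpha / r) · B·A, with no DoRA or rsLoRA scaling, and uses plain Python lists instead of tensors. `merge_lora` and `matmul` are hypothetical helpers.

```python
# Minimal sketch: merge a LoRA adapter into a base weight matrix by hand.
# Assumes standard LoRA: W' = W + (alpha / r) * B @ A, where
# A has shape (r, in_features) and B has shape (out_features, r).
# Plain lists keep it dependency-free; real checkpoints would use tensors.

def matmul(x, y):
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    k, m = len(y), len(y[0])
    return [[sum(row[t] * y[t][j] for t in range(k)) for j in range(m)] for row in x]


def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return weight + (alpha / r) * lora_b @ lora_a as a new matrix."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # base weight (out=2, in=2)
    A = [[1.0, 2.0]]              # rank r=1, shape (1, 2)
    B = [[0.5], [0.0]]            # shape (2, 1)
    print(merge_lora(W, A, B, alpha=2, r=1))  # → [[2.0, 2.0], [0.0, 1.0]]
```

The merged matrix has the same shape as W, so the result can be served as an ordinary dense checkpoint. That is presumably why merging works around the runtime LoRA serving limitations the post describes.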

r/MachineLearning · 2d ago · 8 · open source library tool workflow

easyaligner is a new open-source forced alignment library built for speech-to-text preprocessing that handles practical pain points like partial transcripts, long audio segments without chunking, and text normalization with format recovery. It leverages PyTorch's forced alignment API with a GPU-optimized Viterbi algorithm and supports any language that has a wav2vec2 model on the Hugging Face Hub, achieving 35-102% faster transcription than WhisperX.
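Forced alignment finds the most likely frame-by-frame path through a CTC model's outputs that spells out a known transcript. The sketch below is a plain-Python Viterbi over CTC log-probabilities. It assumes blank id 0 and targets that contain no blanks, and it illustrates the technique only; it is not easyaligner's code or PyTorch's GPU kernel. `ctc_forced_align` is a hypothetical name.

```python
import math

NEG_INF = float("-inf")


def ctc_forced_align(log_probs, targets, blank=0):
    """Viterbi alignment of `targets` to frame-level CTC log-probabilities.

    log_probs: T lists, each holding V log-probabilities for one frame.
    targets: L token ids (must not contain `blank`).
    Returns the token id assigned to each of the T frames (blanks included).
    """
    # Interleave blanks: [blank, y1, blank, y2, ..., yL, blank]
    ext = [blank]
    for tok in targets:
        ext += [tok, blank]
    T, S = len(log_probs), len(ext)
    dp = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    # A path may start on the leading blank or on the first token
    dp[0][0] = log_probs[0][ext[0]]
    if S > 1:
        dp[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            # Predecessors: stay, advance by one, or skip a blank between distinct tokens
            cands = [p for p in (s, s - 1) if p >= 0]
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            prev = max(cands, key=lambda p: dp[t - 1][p])
            if dp[t - 1][prev] == NEG_INF:
                continue  # state unreachable at this frame
            dp[t][s] = dp[t - 1][prev] + log_probs[t][ext[s]]
            back[t][s] = prev
    # A valid path ends on the last token or the trailing blank
    ends = [S - 1] + ([S - 2] if S > 1 else [])
    s = max(ends, key=lambda e: dp[T - 1][e])
    if dp[T - 1][s] == NEG_INF:
        raise ValueError("targets cannot be aligned within the given frames")
    # Backtrack from the best end state to recover per-frame labels
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path


if __name__ == "__main__":
    probs = [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]]
    lp = [[math.log(p) for p in row] for row in probs]
    print(ctc_forced_align(lp, [1, 2]))  # → [1, 0, 2, 2]
```

Timestamps then follow from the frame indices where each token appears, scaled by the acoustic model's frame stride.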

Simon Willison · 2d ago · 7 · prompt engineering tool open source

Anthropic publicly released system prompts for Claude models as Markdown, which Simon Willison converted into version-tracked files using Claude Code to enable easy comparison. This provides valuable transparency into how Claude's behavior is shaped across model versions, with detailed notes on changes between Opus 4.6 and 4.7 for understanding prompt engineering decisions.

Ahead of AI · 2d ago · 7 · workflow tutorial open source

A practical workflow guide for reverse-engineering and understanding LLM architectures by inspecting official reports, Hugging Face model configs, and transformers library implementations. The author emphasizes learning through manual analysis of open-weight models rather than relying on proprietary documentation, making it valuable for engineers who want to deeply understand model design patterns.

Latent Space · 2d ago · 7 · new model api update tool benchmark

Anthropic released Claude Opus 4.7 with improved coding/reasoning capabilities and introduced Claude Design, a new design prototyping tool competing with Figma/Bolt/v0. The update shows strong benchmark performance (ranked #1 in Code Arena, 57.3 on Intelligence Index) with ~35% token efficiency gains, though initial rollout had stability issues that were quickly patched.

Simon Willison · 2d ago · 8 · agent workflow tutorial prompt engineering

Practical guide demonstrating effective agentic engineering patterns through a real-world example of using Claude Code to modify a blog-to-newsletter tool. Key techniques include cloning reference repositories for context, referencing existing code patterns to explain requirements, and building in validation mechanisms for agents to test their own work.

Anthropic Blog · 3d ago · 6 · api update tool workflow

Anthropic launched Claude Design, a new visual design tool powered by Claude Opus 4.7 that integrates with their API ecosystem and offers design system automation, multi-format imports, and seamless handoff to Claude Code for implementation. While primarily a product announcement, it's relevant for engineers building AI applications as it demonstrates practical multimodal AI workflows and introduces new integration opportunities with Claude's expanding toolkit.

r/MachineLearning · 3d ago · 7 · new model research inference

Reviser is a novel language model architecture that generates text through cursor-relative edit actions on a mutable canvas rather than standard left-to-right autoregressive decoding, enabling revision while maintaining computational efficiency. The model generates the sequence of edit actions (the edit history) rather than the final text in reading order, which could benefit iterative text-generation workflows. It is interesting research on alternative decoding paradigms that could influence how engineers think about model inference and editing systems.
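The summary does not give Reviser's actual action vocabulary, so the sketch below uses three hypothetical actions (`Insert`, `Move`, `Delete`) to show how a cursor-relative edit history can produce text in a different order from the one it was written in. It is an illustration of the idea, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Insert:
    token: str  # written at the cursor; the cursor then moves past it


@dataclass
class Move:
    offset: int  # relative cursor move (negative = left), clamped to the canvas


@dataclass
class Delete:
    count: int = 1  # tokens removed to the left of the cursor, like backspace


Action = Union[Insert, Move, Delete]


def apply_actions(actions: List[Action]) -> List[str]:
    """Replay an edit history on an empty canvas and return the final tokens."""
    canvas: List[str] = []
    cursor = 0
    for act in actions:
        if isinstance(act, Insert):
            canvas.insert(cursor, act.token)
            cursor += 1
        elif isinstance(act, Move):
            cursor = max(0, min(len(canvas), cursor + act.offset))
        elif isinstance(act, Delete):
            start = max(0, cursor - act.count)
            del canvas[start:cursor]
            cursor = start
    return canvas


if __name__ == "__main__":
    history = [
        Insert("the"), Insert("cat"), Insert("sat"),
        Move(-2), Insert("black"),            # revise: go back and add a word
        Move(2), Delete(), Insert("slept"),   # revise: replace the last word
    ]
    print(" ".join(apply_actions(history)))  # → the black cat slept
```

Here "black" is generated after "sat" yet lands before "cat", which is the difference between edit-history order and final text order.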

HuggingFace Blog · 3d ago · 8 · new model open source tool research

NVIDIA released Nemotron OCR v2, a multilingual OCR model trained on 12M synthetic images across 6 languages, achieving significant accuracy improvements (NED scores 0.035-0.069) through programmatic text rendering with precise ground truth labels. The approach demonstrates how synthetic data generation can overcome annotation bottlenecks while maintaining real-world performance, with the model, dataset, and pipeline available open-source.

r/MachineLearning · 3d ago · 8 · agent open source workflow research

Springdrift is a persistent runtime architecture for LLM agents featuring append-only memory, OTP supervision, and passive sensorium (injected self-state context) instead of tool-call-based introspection. The post demonstrates practical advantages through a real example where the agent autonomously diagnosed a missing writer agent without diagnostic tool calls and routed around the error. This workflow design enables LLM agents to serve as collaborative pair programmers on their own systems.

r/MachineLearning · 3d ago · 7 · research tutorial fine tuning

A practitioner shares a real hyperspectral classification problem with SSL pretraining stuck at ~45-50% accuracy on nitrogen stress detection in crops. The post discusses SSL method choices (BYOL, MAE, VICReg), data augmentation strategies, and model architectures (ViT vs CNN), providing practical debugging insights for domain-specific computer vision tasks.

r/MachineLearning · 3d ago · 6 · agent tool benchmark

Engineer shares a chaos engineering framework they built for testing multi-agent systems in production, designed to prevent customer-facing failures. They're seeking collaboration to develop it further and establish benchmarking capabilities for agent reliability.

Simon Willison · 3d ago · 6 · api update tool workflow

Datasette Cloud 1.0a27 fixes breaking changes from a previous alpha release, with development accelerated using Claude Code and the new Claude Opus 4.7 model. While the tool update is niche, its use of Claude Opus 4.7 in an AI-assisted development workflow shows a practical application of the new model's capabilities.

Latent Space · 3d ago · 9 · new model api update inference benchmark

Claude Opus 4.7 launched with significant improvements: a new tokenizer with up to 35% higher token efficiency alongside a 50% reduction in overall token usage; vision input expanded to 2,576px (3.75MP), enabling pixel-perfect multimodal work; and a new 'xhigh' reasoning effort level with an 11-point SWE-Bench Pro improvement on code tasks. Pricing is unchanged at $5/$25 per million input/output tokens, making this a critical update for AI engineers doing coding, computer-use agents, and vision-dependent workflows.

Anthropic Blog · 4d ago · 10 · new model api update benchmark deployment

Claude Opus 4.7 is now generally available with significant improvements in software engineering tasks, complex multi-step reasoning, and vision, and it can autonomously handle coding work that previously required supervision. The model is available through the Claude API (claude-opus-4-7) and all major cloud platforms, keeps Opus 4.6 pricing ($5/$25 per million input/output tokens), and ships with intentionally reduced cybersecurity capabilities and new safeguards for responsible deployment.

r/MachineLearning · 4d ago · 7 · research training inference

ResBM introduces a residual bottleneck architecture for efficient pipeline-parallel training that achieves 128× activation compression while maintaining convergence, directly addressing bandwidth constraints in distributed AI model training. The work combines encoder-decoder bottlenecks with low-rank identity paths and demonstrates practical results using Muon optimization, relevant for engineers optimizing large-scale model training infrastructure.

r/MachineLearning · 4d ago · 6 · research benchmark

An experiment testing frontier multimodal models' ability to appraise fine art from vision alone, revealing a gap between visual recognition and commitment to vision-based decisions. The analysis compares image-only vs. image+metadata approaches across GPT-4o, Claude 3.5 Sonnet, Gemini 3.1 Pro, and others, with implications for understanding multimodal model behavior and visual grounding.

OpenAI Blog · 4d ago · 7 · tool agent workflow

Codex app gains computer use capability, in-app browsing, image generation, and memory features that enable more autonomous agent behaviors for developers. The plugin system and memory persistence could streamline repetitive coding workflows and integrate with existing development tools.