r/LocalLLaMA · 9h ago · 7 · tool inference deployment tutorial

Practical guide for running MiMo-V2.5-coder-Q2, a quantized coding model optimized for Apple Silicon, across multiple inference frameworks (llama.cpp, vLLM, Ollama, etc.). Includes specific configurations for 128GB M5 systems and fallback strategies for memory-constrained setups, directly applicable for engineers deploying local coding assistants.

r/MachineLearning · 16h ago · 8 · tool agent deployment open source

Production-tested solution for enforcing tool-call constraints in LangGraph agents using a YAML-based contract layer that validates rules deterministically before execution. Addresses a critical failure mode in which prompt engineering and post-hoc auditing fail to prevent compliance violations; the approach is open-sourced as Sponsio for community feedback.
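
To make the idea of a deterministic pre-execution check concrete, here is a minimal sketch. It is not Sponsio's actual schema or API: the contract keys (`allowed_tools`, `rules`, `max`, `requires_prior`), the tool names, and the function `check_tool_call` are all hypothetical, and the contract is written as the dict a YAML loader would produce.

```python
# Hypothetical contract, shown as the dict a YAML loader would return.
# None of these keys or tool names come from Sponsio itself.
CONTRACT = {
    "allowed_tools": ["search_orders", "issue_refund"],
    "rules": [
        {"tool": "issue_refund", "arg": "amount", "max": 100},
        {"tool": "issue_refund", "requires_prior": "search_orders"},
    ],
}


def check_tool_call(contract, history, tool, args):
    """Return a list of violations; an empty list means the call may run."""
    violations = []
    if tool not in contract["allowed_tools"]:
        violations.append(f"tool '{tool}' is not allowed")
    for rule in contract["rules"]:
        if rule["tool"] != tool:
            continue
        # Numeric ceiling on a single argument.
        if "max" in rule and args.get(rule["arg"], 0) > rule["max"]:
            violations.append(f"{rule['arg']} exceeds {rule['max']}")
        # Ordering constraint: another tool must already have been called.
        if "requires_prior" in rule and rule["requires_prior"] not in history:
            violations.append(f"'{rule['requires_prior']}' must run first")
    return violations
```

An agent loop would call `check_tool_call` before dispatching each tool call and refuse to execute when the returned list is non-empty, so the decision never depends on model output.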

HuggingFace Blog · 17h ago · 7 · agent workflow

A practical glossary clarifying commonly confused terminology in AI agent development (model, scaffold, harness, tool definitions) with examples from frameworks like Claude Code and Codex. Provides mental models for understanding agent architecture that are essential when building or deploying agentic systems, though it is not a technical tutorial.

Simon Willison · 18h ago · 6 · tool api update agent

Datasette 1.0a30 introduced a new makeJumpSections() JavaScript plugin hook that datasette-agent leverages to add agent chat functionality directly into the Jump to menu interface. This represents a practical integration pattern for embedding AI agents into existing tools, though it's specific to the Datasette ecosystem rather than broadly applicable.

r/MachineLearning · 19h ago · 7 · tool open source workflow

MergeNB is a VS Code extension that improves Jupyter Notebook merging for collaborative workflows, addressing pain points with existing tools like nbdime. The tool features a web UI and plans to expand as a git mergetool, offering practical improvements for teams managing notebook-based research and development.

r/MachineLearning · 19h ago · 5 · research workflow

This is a technical discussion about evaluating self-supervised learning (SSL) methods like BYOL and JEPA, questioning whether the RankMe metric (embedding effective rank via SVD) remains meaningful as an evaluation criterion when incorporated as a loss term during training. The post explores the tension between using metrics to assess learning quality versus explicitly optimizing them, relevant for practitioners evaluating SSL model representations.
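
For readers unfamiliar with the metric, a minimal sketch of effective rank as the exponential of the entropy of the normalized singular values. It assumes the singular values have already been computed from the embedding matrix (for example with `numpy.linalg.svd(Z, compute_uv=False)`) and that they are not all zero; the function name and the `eps` default are illustrative choices, not taken from the post.

```python
import math


def rankme(singular_values, eps=1e-7):
    """Effective rank: exp of the Shannon entropy of normalized singular values.

    Assumes at least one singular value is non-zero.
    """
    total = sum(singular_values)
    # Normalize to a distribution; eps keeps log() finite for zero entries.
    probs = [s / total + eps for s in singular_values]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)
```

A flat spectrum gives a score close to the number of dimensions, while a single dominant singular value gives a score close to 1. Adding this quantity to the loss pushes the spectrum toward flatness directly, which is why the post questions whether it still works as an independent evaluation signal.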

Simon Willison · 22h ago · 6 · prompt engineering workflow

Armin Ronacher discusses a growing problem in open-source development where AI-generated issue reports obscure actual user observations with confident but often inaccurate interpretations, making debugging harder. The post highlights practical friction when LLMs are used to process and reword user problems without preserving the original observed facts.

r/MachineLearning · 1d ago · 6 · library open source inference benchmark

Thermocompute is a PyTorch library that emulates thermodynamic probabilistic computing, offering stochastic neural layers (p-bits, samplers, generative models) designed to exploit parallel hardware where inference time remains constant as layer width increases. The key technical insight is that on GPUs with available parallel capacity, thermodynamic layers can achieve flat wall-clock time scaling with width, potentially outperforming classical dense FFNs for certain workloads.

r/MachineLearning · 1d ago · 7 · library open source inference

A Go developer created a pure Go CUDA binding library (gocudrv) that eliminates cgo dependencies by loading libcuda.so at runtime using purego, enabling cross-compilation and smaller Docker images for ML workloads. The implementation uses OS thread locking to handle CUDA's per-thread context model via goroutine channels, with early support for memory allocation, kernel launches, and GPU event timing.

r/MachineLearning · 1d ago · 7 · tool benchmark open source

Papers with Code has been revived with new features for tracking AI SOTA across domains, including multi-metric leaderboards, paper lineage tracking, method taxonomy, and ~3k model evaluations. The platform now supports external paper submissions (non-arXiv) with auto-enrichment via AI, making it a useful reference tool for staying current with model releases and benchmarks.

r/MachineLearning · 1d ago · 8 · benchmark rag inference workflow

Comprehensive benchmark comparing vision-capable LLMs (native PDF) against OCR-based RAG pipelines on long document processing, showing OCR approaches achieve higher accuracy (59.6% vs 52.0%) and lower cost ($0.19 vs $0.25/query) despite the 'vision makes OCR obsolete' narrative. Key findings: vision LLMs struggle with tables and charts and have a 7% failure rate on large PDFs that persists across retries, while premium OCR plus layout extraction proves more robust for document-heavy workloads.
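
One way to combine the two reported figures is cost per correct answer. The sketch below is a derived illustration rather than a number from the benchmark, and it assumes accuracy is the fraction of queries answered correctly; the function and variable names are illustrative.

```python
def cost_per_correct_answer(cost_per_query, accuracy):
    """Expected spend per correct answer, assuming accuracy is a per-query hit rate."""
    return cost_per_query / accuracy


# Figures quoted in the summary above.
ocr_rag = cost_per_correct_answer(0.19, 0.596)
vision = cost_per_correct_answer(0.25, 0.520)
```

Under that assumption the OCR pipeline comes out at roughly $0.32 per correct answer against roughly $0.48 for native-PDF vision models.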

r/MachineLearning · 1d ago · 7 · research inference open source

Deep dive into WordDetectorNN, a handwritten word detection model that uses per-pixel distance regression to bounding boxes instead of anchor-based detection, followed by DBSCAN clustering with an IoU-based distance metric. The architecture uses a ResNet18 + FPN decoder with 6-channel pixel-level outputs, offering tuning-free detection but with an O(n²) clustering bottleneck and non-differentiable post-processing.
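
The O(n²) cost comes from comparing every predicted box with every other one. A minimal sketch of an IoU-based distance matrix follows, assuming boxes are `(x_min, y_min, x_max, y_max)` tuples and that distance is defined as `1 - IoU`; the function names are illustrative and not taken from the WordDetectorNN code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_distance_matrix(boxes):
    """Pairwise 1 - IoU distances; the nested loop is the O(n^2) step."""
    n = len(boxes)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - iou(boxes[i], boxes[j])
            dist[i][j] = dist[j][i] = d
    return dist
```

A matrix like this can be handed to a DBSCAN implementation that accepts precomputed distances, such as scikit-learn's `DBSCAN(metric="precomputed")`, so that heavily overlapping per-pixel predictions collapse into one word box.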

r/MachineLearning · 1d ago · 8 · fine tuning research tutorial

Practical fine-tuning research comparing three supervised fine-tuning (SFT) approaches for personality injection: chat demonstrations, first-person statements, and synthetic documents. The author empirically tests which training data format most effectively shapes model behavior and self-representation, finding first-person statements outperform intuitive conversation-based approaches on generalization.

r/MachineLearning · 1d ago · 6 · workflow inference benchmark

A software engineer is debugging significant training bottlenecks in a robotics imitation learning pipeline (ResNet18 + DiT policy, 50M params) that runs at 10 iterations/sec with low GPU utilization despite high CPU usage. The profiler data suggests dataloader and optimizer operations consume at least 62% of the time, pointing to CPU-GPU synchronization issues, an inefficient data pipeline, or framework overhead rather than a compute-bound problem.

r/MachineLearning · 2d ago · 8 · tool open source agent deployment

AgentLantern is an open-source devtool that provides visibility into AI agent project structure and execution, addressing the debugging and observability challenges in multi-agent systems. It offers three components: static documentation generation, linting for design issues, and a runtime viewer for observing agent behavior—currently supporting CrewAI with plans for broader framework support.

r/MachineLearning · 2d ago · 7 · research open source inference

A software engineer describes a novel Hebbian learning architecture that achieves CIFAR-10 results without backpropagation, using only 5-7% of parameters through emergent sparse connectivity on a consumer GPU. The system exhibits interesting emergent behaviors including self-recovery after targeted neuron damage and performance jumps, suggesting biological plausibility might yield practical insights for efficient model design.

r/MachineLearning · 2d ago · 6 · research prompt engineering

A researcher observes that transformer models' inherent drive to predict accurate tokens ("clarity-seeking") can prioritize semantic coherence over safety constraints when discussing higher-order topics, potentially explaining constraint bypass behaviors. This touches on model alignment and interpretability but lacks technical depth, experimental validation, or concrete mechanisms.

r/MachineLearning · 2d ago · 6 · rag workflow prompt engineering

Reddit discussion proposing a personalized cognitive profiling system that tracks not just facts but learning patterns, struggling points, and effective explanation styles to improve LLM context retrieval over time. The idea combines dynamic profiling with RAG-like personalization to create an evolving understanding of how individual users think, rather than basic chat memory.

r/MachineLearning · 2d ago · 7 · open source agent tool workflow

Spice is an open-source decision layer framework that sits above execution agents, providing context-aware task routing and decision-making through a perception → simulation → decision → execution → reflection loop. Rather than replacing agents like Claude or Codex, it adds orchestration capabilities including state modeling, option simulation, and outcome reflection to coordinate multi-agent workflows.
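
The five-stage loop can be sketched in a few lines. This is a hypothetical illustration and not Spice's API: `run_decision_loop`, the `simulate` and `execute` callbacks, and the numeric scoring are all assumptions made for the example.

```python
def run_decision_loop(observation, options, simulate, execute):
    """One pass of perception -> simulation -> decision -> execution -> reflection.

    simulate(state, option) returns a predicted numeric score;
    execute(option) returns the numeric outcome actually observed.
    """
    state = {"observation": observation}                        # perception
    scored = [(simulate(state, opt), opt) for opt in options]   # simulation
    predicted, choice = max(scored, key=lambda pair: pair[0])   # decision
    outcome = execute(choice)                                   # execution
    # Reflection: record how far the outcome was from the prediction.
    return {
        "choice": choice,
        "predicted": predicted,
        "outcome": outcome,
        "surprise": abs(outcome - predicted),
    }
```

In this framing the execution agent lives behind `execute`, and the decision layer only chooses which option to hand off and records how well its prediction held up.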