r/LocalLLaMA · 1d ago · 7 · tutorial inference open source deployment

Practical guide covering multiple inference frameworks (Transformers, llama-cpp-python, vLLM, SGLang, Ollama, etc.) for running a 27B quantized Qwen model. Includes GGUF quantization options and benchmark comparisons showing minimal accuracy degradation, useful for engineers optimizing local model deployment.
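When choosing among the GGUF quantization options, a quick way to size the weights is parameter count × bits per weight ÷ 8 bytes. The sketch below uses that formula and ignores KV cache, activations, and runtime overhead. The bits-per-weight values are illustrative inputs, not measured figures for any particular GGUF quant type, since real quant types mix precisions.

```python
def estimate_weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and framework overhead, all of which add
    to the real memory requirement at inference time.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Illustrative bits-per-weight values; actual GGUF quant types vary.
    for bits in (16, 8, 4, 2):
        print(f"27B @ {bits:>2} bits/weight ≈ {estimate_weight_gb(27e9, bits):.1f} GB")
```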

r/LocalLLaMA · 2d ago · 7 · tool inference deployment tutorial

Practical guide for running MiMo-V2.5-coder-Q2, a quantized coding model optimized for Apple Silicon, across multiple inference frameworks (llama.cpp, vLLM, Ollama, etc.). Includes specific configurations for 128GB M5 systems and fallback strategies for memory-constrained setups, directly applicable for engineers deploying local coding assistants.

r/MachineLearning · 3d ago · 8 · fine tuning research tutorial

Practical fine-tuning study comparing three supervised fine-tuning (SFT) approaches for personality injection: chat demonstrations, first-person statements, and synthetic documents. The author empirically tests which training-data format most effectively shapes model behavior and self-representation, finding that first-person statements generalize better than the more intuitive conversation-based approach.

r/MachineLearning · 4d ago · 7 · tutorial prompt engineering

A hands-on explanation of LLM architecture that breaks down how token prediction works through embeddings, positional encoding, attention, and the LM head, using a simple four-sentence example to illustrate why models predict contextually appropriate tokens. It demystifies transformer mechanics by focusing on the core probability-matching problem rather than advanced concepts, making it accessible to engineers learning from first principles.
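The final step described, the LM head, can be sketched as a projection of the last hidden state onto one logit per vocabulary token, followed by a softmax and (for greedy decoding) an argmax. The toy vocabulary, hidden vector, and weights below are made-up numbers for illustration, not values from the post.

```python
import math


def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lm_head(hidden, weight_rows):
    # One dot product per vocabulary entry: logit_v = w_v . h (no bias here).
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight_rows]


vocab = ["mat", "dog", "moon", "the"]
hidden = [0.9, -0.2, 0.4]  # hypothetical final hidden state after "The cat sat on the"
weights = [
    [1.2, 0.1, 0.8],   # row for "mat"
    [0.3, 0.9, -0.5],  # row for "dog"
    [-0.4, 0.2, 0.1],  # row for "moon"
    [0.1, -0.3, 0.0],  # row for "the"
]

probs = softmax(lm_head(hidden, weights))
# Greedy decoding picks the most probable token; sampling would draw from probs instead.
next_token = vocab[max(range(len(vocab)), key=lambda i: probs[i])]
print(next_token, [round(p, 3) for p in probs])
```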

r/MachineLearning · 7d ago · 7 · tutorial research workflow

This article explains Riemannian optimization techniques for machine learning on manifolds (like hyperspheres), focusing on how to adapt gradient descent to preserve geometric constraints using exponential maps and retractions. It provides practical implementation guidance for constraining neural network parameters to stay on spherical manifolds, with code examples using PyTorch.
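A minimal sketch of one Riemannian gradient step on the unit sphere, assuming the usual recipe: project the Euclidean gradient onto the tangent space at the current point, step along it, then retract back onto the sphere by renormalizing. It uses plain Python lists instead of the article's PyTorch code, and normalization stands in as the retraction rather than the exact exponential map.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def sphere_sgd_step(x, euclid_grad, lr):
    """One Riemannian SGD step on the unit sphere.

    1. Riemannian gradient: project g onto the tangent space at x, g - (x . g) x.
    2. Move along the negative Riemannian gradient.
    3. Retract onto the sphere by renormalizing.
    """
    radial = dot(x, euclid_grad)
    riem_grad = [g - radial * xi for g, xi in zip(euclid_grad, x)]
    stepped = [xi - lr * gi for xi, gi in zip(x, riem_grad)]
    return normalize(stepped)


# Example: minimize f(x) = -c . x on the sphere; the minimizer is c / |c| = [0.6, 0.8, 0].
c = [3.0, 4.0, 0.0]
x = normalize([1.0, 1.0, 1.0])
for _ in range(200):
    grad = [-ci for ci in c]  # Euclidean gradient of -c . x
    x = sphere_sgd_step(x, grad, lr=0.1)
print(x)
```

Every iterate stays exactly on the sphere (up to rounding), which is the constraint-preservation property the article is about; plain gradient descent on x would drift off it.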

Latent Space · 8d ago · 8 · tutorial workflow inference benchmark

Vlad Feinberg's hiring/skill guide emphasizes kernel-level performance optimization as the critical bottleneck in LLM work, highlighting the need for JAX/Pallas expertise to fuse operations such as MoE projections for measurable speedups. The piece pairs pretraining fundamentals (Chinchilla scaling laws, dense vs. MoE trade-offs) with low-level optimization as a direct path into AI labs, and includes practical exercises (deriving scaling laws, implementing kernels from scratch) that double as hiring tests.
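A back-of-the-envelope version of the scaling-law exercise can use two widely cited rules of thumb: training compute C ≈ 6·N·D FLOPs for N parameters and D tokens, and the Chinchilla heuristic of roughly 20 training tokens per parameter. This sketch relies only on those approximations. It is not the fitted scaling law from the Chinchilla paper or the guide's own derivation.

```python
import math

TOKENS_PER_PARAM = 20.0      # Chinchilla rule of thumb (approximate)
FLOPS_PER_PARAM_TOKEN = 6.0  # forward + backward pass ≈ 6 FLOPs per parameter per token


def compute_optimal(compute_flops: float):
    """Return (params, tokens) that spend compute_flops under C = 6 N D with D = 20 N.

    Substituting D = 20 N gives C = 120 N^2, so N = sqrt(C / 120).
    """
    n = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return n, TOKENS_PER_PARAM * n


params, tokens = compute_optimal(1e21)
print(f"~{params / 1e9:.2f}B params trained on ~{tokens / 1e9:.0f}B tokens")
```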

HuggingFace Blog · 8d ago · 8 · new model tool open source rag tutorial

Six new Sentence Transformers CrossEncoder rerankers built on ModernBERT and trained with distillation on open datasets, achieving SOTA performance at multiple model sizes. Includes full training recipes, a simple 3-line inference API, and a new Hugging Face Agent Skill for fine-tuning rerankers on custom data.

HuggingFace Blog · 8d ago · 8 · fine tuning tool tutorial open source

Practical guide for parameter-efficient fine-tuning of NVIDIA's Cosmos Predict 2.5 video world model using LoRA and DoRA adapters, enabling domain-specific adaptation on consumer GPUs without catastrophic forgetting. Includes a complete implementation walkthrough using the diffusers and accelerate libraries, aimed at generating synthetic robot trajectories for policy learning.
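The core LoRA idea works like this: the pretrained weight W stays frozen and a low-rank update (alpha / r)·B·A is added to its output. B is initialized to zero, so training starts from the base model's behavior. This plain-Python sketch covers only the forward pass of standard LoRA. It is not the diffusers/PEFT implementation used in the guide, and DoRA's additional magnitude/direction decomposition is not shown.

```python
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen linear layer W (d_out x d_in) plus a trainable rank-r update B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weights
        self.scale = alpha / rank
        # A: small random init (r x d_in); B: zeros (d_out x r), so the update starts at 0.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))  # B (A x): never materializes B @ A
        return [b + self.scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
layer = LoRALinear(W, rank=1, alpha=2.0)
print(layer.forward([1.0, 2.0, 3.0]))  # equals W x while B is still zero
```

Only A and B would be trained, which is r·(d_in + d_out) parameters instead of d_in·d_out. That saving makes adaptation on consumer GPUs feasible at the scale of a video world model.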

r/LocalLLaMA · 10d ago · 6 · tool inference tutorial

Tutorial covering deployment of a fine-tuned Gemma 4 31B GGUF model across multiple inference frameworks (Transformers, llama-cpp-python, vLLM, Ollama, etc.), with a focus on creative writing and reduced content restrictions. While practically useful for engineers running quantized models locally, it is primarily a model card and deployment guide and does not introduce new technical capabilities or frameworks.

r/MachineLearning · 11d ago · 7 · tutorial workflow fine tuning

A software engineer shares a practical medical imaging classification problem (coronary artery classification from X-ray angiograms), detailing the overfitting issues encountered and the debugging attempts so far. The real-world scenario illustrates transfer learning challenges, data augmentation strategies, and regularization techniques on a small medical dataset (~900 samples), with actionable technical insights for practitioners building medical AI systems.

r/MachineLearning · 11d ago · 6 · research workflow tutorial

A practitioner is debugging Physics-Informed Neural Networks (PINNs) for solving a damped harmonic oscillator ODE and sees convergence failures at higher stiffness values (k > 50). The thread touches on important PINN training-stability issues, including loss-landscape challenges and hyperparameter sensitivity, that are relevant to AI engineers building physics-based models.
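In a PINN, the physics term of the loss is the ODE residual evaluated at collocation points. The sketch below assumes the common form m·x'' + c·x' + k·x = 0; the post's exact parameterization is not given here. It evaluates that residual with central finite differences instead of autograd and checks it against the closed-form underdamped solution. The last loop illustrates one commonly cited reason larger k is harder: the oscillation frequency grows like sqrt(k/m), so a network must fit more oscillations over the same time window.

```python
import math

M, C, K = 1.0, 0.4, 50.0  # illustrative mass, damping, stiffness


def exact_solution(t, m=M, c=C, k=K):
    """Underdamped closed form for m x'' + c x' + k x = 0 with x(0) = 1, x'(0) = 0."""
    gamma = c / (2 * m)                    # decay rate
    omega = math.sqrt(k / m - gamma ** 2)  # damped angular frequency
    return math.exp(-gamma * t) * (math.cos(omega * t) + (gamma / omega) * math.sin(omega * t))


def physics_residual(x_fn, t, m=M, c=C, k=K, h=1e-4):
    """m x'' + c x' + k x at time t, with derivatives from central finite differences.

    In an actual PINN, x_fn is the network, derivatives come from autograd, and the
    physics loss is the mean squared residual over collocation points.
    """
    x_prev, x_mid, x_next = x_fn(t - h), x_fn(t), x_fn(t + h)
    dx = (x_next - x_prev) / (2 * h)
    d2x = (x_next - 2 * x_mid + x_prev) / h ** 2
    return m * d2x + c * dx + k * x_mid


# The exact solution should give a near-zero residual at every collocation point.
max_res = max(abs(physics_residual(exact_solution, i / 10)) for i in range(1, 50))
print(f"max residual of exact solution: {max_res:.1e}")

# Larger k means more oscillations to fit over the same window t in [0, 5].
for k in (10.0, 50.0, 200.0):
    omega = math.sqrt(k / M - (C / (2 * M)) ** 2)
    print(f"k = {k:>5.0f}: ~{5 * omega / (2 * math.pi):.1f} periods in [0, 5]")
```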

r/MachineLearning · 13d ago · 7 · workflow tutorial research

A practitioner shares a real-world time series anomaly detection challenge: building failure prediction for IoT chargers with sparse positive labels (~1-2%), data rates that vary between operational modes, and high device heterogeneity. They are weighing an architectural solution (dual RNN encoders) against data-level sampling and seeking advice on handling extreme class imbalance in time series forecasting.
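Two standard data-level remedies for a ~1-2% positive rate are inverse-frequency class weights in the loss and oversampling of positive windows. The sketch below uses the "balanced" weighting formula n_samples / (n_classes × count_c), the same formula scikit-learn uses for class_weight="balanced", plus a simple random oversampler. The window data and the target positive fraction are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def oversample(windows, labels, target_pos_frac=0.2, seed=0):
    """Duplicate random positive windows until they make up about target_pos_frac of the data."""
    rng = random.Random(seed)
    pos = [w for w, y in zip(windows, labels) if y == 1]
    neg = [w for w, y in zip(windows, labels) if y == 0]
    # Solve n_pos / (n_pos + n_neg) = target_pos_frac for n_pos.
    n_pos_target = int(target_pos_frac * len(neg) / (1 - target_pos_frac))
    extra = [rng.choice(pos) for _ in range(max(0, n_pos_target - len(pos)))]
    data = [(w, 0) for w in neg] + [(w, 1) for w in pos + extra]
    rng.shuffle(data)
    return data


# Hypothetical dataset: 1,000 windows, 15 of which precede a failure (1.5% positive).
labels = [1] * 15 + [0] * 985
windows = [f"window_{i}" for i in range(len(labels))]
print(balanced_class_weights(labels))
resampled = oversample(windows, labels)
```

Resample only the training split, so that validation and test metrics still reflect the true base rate.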

HuggingFace Blog · 13d ago · 8 · inference workflow tutorial

This article explains how to optimize LLM inference performance by decoupling CPU and GPU workloads through asynchronous batching, eliminating idle gaps that waste ~24% of runtime in synchronous approaches. The post builds on continuous batching concepts and provides practical profiling techniques to measure and improve GPU utilization, critical for managing high inference costs on hardware like H200s.
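The overlap idea can be illustrated with a toy producer/consumer loop, assuming CPU-side batch preparation and the GPU step take similar time. Here time.sleep stands in for both kinds of work, so the sketch shows only the scheduling pattern. It is not the article's implementation, does not model CUDA streams, and does not reproduce the ~24% figure.

```python
import queue
import threading
import time


def prepare_batch(step):
    # Stand-in for CPU-side work (scheduling, tokenization, building inputs).
    time.sleep(0.01)
    return [step, step + 1, step + 2]


def run_forward(batch):
    # Stand-in for the GPU forward pass.
    time.sleep(0.01)
    return [x * 2 for x in batch]


def sync_loop(num_steps):
    # CPU and "GPU" work strictly alternate, so each sits idle while the other runs.
    return [run_forward(prepare_batch(s)) for s in range(num_steps)]


def async_loop(num_steps):
    # A background thread prepares batch s + 1 while batch s is being processed.
    ready = queue.Queue(maxsize=1)

    def producer():
        for s in range(num_steps):
            ready.put(prepare_batch(s))
        ready.put(None)  # sentinel: no more batches

    threading.Thread(target=producer, daemon=True).start()
    outputs = []
    while (batch := ready.get()) is not None:
        outputs.append(run_forward(batch))
    return outputs


for loop in (sync_loop, async_loop):
    start = time.perf_counter()
    loop(20)
    print(f"{loop.__name__}: {time.perf_counter() - start:.2f}s")
```

The bounded queue (maxsize=1) keeps the CPU at most one batch ahead. That is enough to hide preparation time without letting scheduling decisions run far ahead of the model's state.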

r/MachineLearning · 13d ago · 8 · research fine tuning tutorial open source

An engineer trained rating-conditioned transformer chess models (9M parameters) on 1B Lichess games, reaching parity with MAIA-3 and adding two novel components: thinking-time prediction and clock-aware win-probability models. The write-up emphasizes data pipeline optimization (C++ preprocessing plus sequential shuffling for GPU efficiency) and shows how small models can match larger baselines through a careful training setup and conditioning on player and time context.

r/MachineLearning · 14d ago · 5 · tutorial workflow

A developer discusses choosing between logistic regression and tree-based models (random forests) for a UFC fight prediction project, noting that MMA statistics exhibit nonlinear relationships and feature interactions that logistic regression may miss. The post highlights practical ML modeling decisions around feature engineering and model selection for binary classification with domain-specific constraints like betting value optimization.
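The interaction concern can be made concrete with an XOR-style toy. When the outcome depends on whether two features agree, no linear boundary on the raw features separates the classes, so logistic regression misses the pattern. Adding a hand-engineered product feature fixes this; a tree could instead learn the split on its own. The data below is synthetic, not UFC statistics, and the training loop is a minimal gradient-descent sketch rather than a library implementation.

```python
import math


def train_logreg(features, labels, lr=0.5, epochs=2000):
    """Full-batch gradient descent on the mean logistic loss; returns (weights, bias)."""
    n, d = len(labels), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(y = 1)
            for j in range(d):
                grad_w[j] += (p - y) * x[j] / n
            grad_b += (p - y) / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def accuracy(features, labels, w, b):
    preds = [int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0) for x in features]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Synthetic XOR-style data: the label is 1 exactly when the two features disagree.
raw = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
labels = [0, 1, 1, 0]
with_product = [(x1, x2, x1 * x2) for x1, x2 in raw]

results = {}
for name, feats in (("raw features", raw), ("+ product feature", with_product)):
    w, b = train_logreg(feats, labels)
    results[name] = accuracy(feats, labels, w, b)
    print(f"{name}: accuracy {results[name]:.2f}")
```

On the raw features the gradients cancel by symmetry and the model stays at chance. With the product term, the single weight on x1·x2 is enough to separate the classes.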

r/MachineLearning · 14d ago · 8 · open source tutorial library

A minimal 160-200 line PyTorch implementation of JEPA (Joint-Embedding Predictive Architecture) algorithms that strips away scaling complexities to expose core mathematical concepts. Includes tutorial documentation mapping algorithm theory directly to implementation, making it valuable for understanding self-supervised learning approaches.
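Two pieces that JEPA-style methods commonly share can be sketched in isolation. First, the loss compares predicted and target embeddings in latent space rather than in input space. Second, the target encoder's weights are an exponential moving average (EMA) of the online encoder's weights. This plain-Python sketch assumes those two components and flat parameter lists; it is not code from the linked implementation, and the decay values are illustrative.

```python
def ema_update(target_params, online_params, decay=0.996):
    """Target encoder update: theta_t <- decay * theta_t + (1 - decay) * theta_online.

    No gradients flow into the target encoder; it only tracks the online encoder.
    """
    return [decay * t + (1.0 - decay) * o for t, o in zip(target_params, online_params)]


def latent_l2_loss(predicted, target):
    """Mean squared error between predicted and target embeddings (latent space)."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


target = [0.0, 0.0]
online = [1.0, -1.0]
for _ in range(3):
    target = ema_update(target, online, decay=0.5)  # small decay so the drift is visible
print(target)  # moves toward the online weights
print(latent_l2_loss([0.5, 0.5], [1.0, 0.0]))
```

With a decay close to 1, the target encoder changes slowly. Methods in this family rely on that slow-moving target, together with stop-gradient, to avoid representational collapse.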

r/MachineLearning · 15d ago · 9 · tool tutorial inference open source

A deep technical breakdown of building a minimal LLM compiler from scratch in Python that lowers models (TinyLlama, Qwen2.5-7B) to optimized CUDA kernels across six IR levels. Demonstrates practical GPU optimization techniques (tiling, shared memory staging, bank conflict resolution, pipelining) with competitive performance (1.11-1.20× vs PyTorch/torch.compile on some ops) and includes reproducible CLI commands for each optimization stage.
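Tiling, the first technique listed, can be illustrated on the CPU. The matrix multiply is split into tile × tile blocks, so each block of A and B is reused many times while it is close at hand. On a GPU, staging tiles in shared memory serves the same purpose. This pure-Python sketch shows only the loop structure and will not be fast in Python. It is not code from the compiler described.

```python
def matmul_naive(a, b):
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)] for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """C = A @ B computed block by block.

    The outer loops pick an output tile (i0, j0); the p0 loop walks the shared
    dimension one tile at a time. In a CUDA kernel, each thread block would load
    the A[i0.., p0..] and B[p0.., j0..] tiles into shared memory at this point.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0 for _ in range(m)] for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


A = [[i * 3 + j for j in range(3)] for i in range(5)]  # 5 x 3
B = [[i - j for j in range(4)] for i in range(3)]      # 3 x 4
assert matmul_tiled(A, B, tile=2) == matmul_naive(A, B)
```

The min(...) bounds handle edge tiles when the matrix dimensions are not multiples of the tile size.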

Simon Willison · 15d ago · 7 · tutorial workflow prompt engineering tool

Simon Willison demonstrates practical patterns for executing LLM-generated code directly from shell scripts using shebang syntax, including examples with tool calls and YAML-defined functions. The post covers techniques for integrating LLM output into command-line workflows and for debugging with options such as --td to inspect tool calls.

r/MachineLearning · 15d ago · 5 · tool tutorial

Interactive visualization tool for Jensen-Shannon divergence, a symmetric, bounded divergence for comparing probability distributions (its square root, the Jensen-Shannon distance, is a true metric). While the concept is foundational for ML work, this is primarily an educational visualization rather than a practical tool for daily AI development workflows.
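The quantity being visualized can be computed directly from its definition. This sketch uses natural logarithms, so values lie between 0 and ln 2; with base-2 logarithms the upper bound would be 1.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats; terms with p_i = 0 contribute 0 by convention."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """JS(P, Q) = 1/2 KL(P || M) + 1/2 KL(Q || M), with M = (P + Q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
print(js_divergence(p, q), js_divergence(q, p))  # symmetric
print(js_divergence([1.0, 0.0], [0.0, 1.0]))     # disjoint supports give the maximum, ln 2
print(math.sqrt(js_divergence(p, q)))            # Jensen-Shannon distance
```

Because M is positive wherever P or Q is, the divergence stays finite even when the supports differ. Plain KL divergence does not have this property.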

r/MachineLearning · 16d ago · 5 · workflow tutorial

A discussion thread about data-labeling trade-offs for ML practitioners: Scale AI offers quality at a high cost, while MTurk is cheap but low quality, leaving a gap for teams that need thousands of labeled examples for evals or fine-tuning. The post seeks practical solutions and community experiences for filling this middle ground.