News Nug

Qwen3.5 35B A3B uncensored heretic Native MTP Preserved is Out Now With the Full 785 MTPs Preserved and Retained, Available in Safetensors, GGUFs, NVFP4, NVFP4 GGUFs and GPTQ-Int4 Formats

r/LocalLLaMA · 1d ago · 6 · open source inference deployment fine tuning

Guide for using a fine-tuned Qwen 3.5-35B variant (with reduced content restrictions) across multiple inference frameworks including Transformers, vLLM, and SGLang, with MMLU benchmark results (83.72% accuracy) and multiple quantization options available. Practical for engineers looking to deploy modified open-source models with different inference backends.

Call for Papers - Workshop on Efficient Reasoning at COLM 2026 [R]

r/MachineLearning · 2d ago · 6 · inference fine tuning deployment research

Call for papers for the 2nd Workshop on Efficient Reasoning at COLM 2026, covering practical topics like inference optimization (pruning, compression, KV-cache), efficient training/fine-tuning, and deployment of reasoning systems under resource constraints. Relevant for engineers working on cost-effective LLM inference and on-device reasoning, though this is primarily a conference submission announcement rather than technical content.

I fine-tuned an LLM to be C-3PO to test which training data format works best for persona injection [P]

r/MachineLearning · 3d ago · 8 · fine tuning research tutorial

Practical fine-tuning research comparing three supervised fine-tuning (SFT) approaches for personality injection: chat demonstrations, first-person statements, and synthetic documents. The author empirically tests which training data format most effectively shapes model behavior and self-representation, finding first-person statements outperform intuitive conversation-based approaches on generalization.

Specialization Beats Scale: A Strategic Variable Most AI Procurement Decisions Overlook

HuggingFace Blog · 5d ago · 8 · fine tuning benchmark open source inference

Dharma released DharmaOCR, a pair of specialized 3B-parameter language models that outperform frontier APIs on structured OCR tasks while being significantly cheaper to operate, challenging the industry assumption that the largest models are always best. The article explores how specialization, fine-tuning pipelines, and distributional alignment can yield better performance and cost-efficiency than scaling parameters, supported by benchmarks and research across multiple domains.

LatitudeGames/Equinox-31B · Hugging Face

r/LocalLLaMA · 5d ago · 6 · new model fine tuning open source

Latitude released Equinox, a 31B parameter model fine-tuned on Gemma 4 using balanced datasets combining dark adventure narratives and slice-of-life storytelling via supervised fine-tuning. The model is available via subscription on AI Dungeon with quantized GGUF weights provided for download, representing a practical example of multi-dataset fine-tuning for specialized narrative generation tasks.

I created an LLM post-training method called RPS. Preliminary results show that it improved Qwen3-8b's program synthesis reliability. [R]

r/MachineLearning · 5d ago · 6 · research fine tuning workflow

RPS (Regressive Plasticity Schedule) is a two-stage training approach combining curriculum learning with adaptive learning rate decay, showing improvements on ARC-AGI benchmarks and program synthesis tasks. The method trains models on easy data with high learning rates, then hard data with reduced learning rates, demonstrating 4% vs 2.4% performance gains over equal learning rate baselines.
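The two-stage ordering described above can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `two_stage_schedule` and both learning-rate values are hypothetical, and a fixed drop between stages stands in for whatever adaptive decay the method actually uses.

```python
def two_stage_schedule(easy, hard, lr_easy=1e-4, lr_hard=5e-5):
    """Yield (example, learning_rate) pairs.

    Easy examples come first at the higher rate, then hard examples
    at the reduced rate. Both rates are illustrative assumptions.
    """
    for example in easy:
        yield example, lr_easy
    for example in hard:
        yield example, lr_hard


# Hypothetical toy data: two easy examples followed by one hard example.
plan = list(two_stage_schedule(["e1", "e2"], ["h1"]))
```

An equal-learning-rate baseline of the kind the summary mentions would correspond to calling the same function with `lr_easy == lr_hard`.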

Released a free 9.8M doc Indic multilingual corpus — Hindi, Bengali, Tamil, Telugu + 7 more (CC0, HuggingFace) [P]

r/MachineLearning · 8d ago · 7 · dataset open source fine tuning

A new multilingual dataset (Indic HPLT v1) with 9.8M documents across 11 Indian languages plus English, totaling 8.4B tokens, released under CC0 license on Hugging Face. Useful for training and fine-tuning language models for underrepresented Indian language families, though primarily a resource rather than a novel technical breakthrough.

Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

HuggingFace Blog · 8d ago · 8 · fine tuning tool tutorial open source

Practical guide for parameter-efficient fine-tuning of NVIDIA's Cosmos Predict 2.5 video world model using LoRA and DoRA adapters, enabling domain-specific adaptation on consumer GPUs without catastrophic forgetting. Includes complete implementation walkthrough using diffusers and accelerate libraries for generating synthetic robot trajectories for policy learning.

Struggling with Overfitting on Medical Imaging Task [D]

r/MachineLearning · 11d ago · 7 · tutorial workflow fine tuning

A software engineer shares a practical medical imaging classification problem (coronary artery classification from X-ray angiograms) with detailed overfitting issues and debugging attempts. This is a real-world scenario demonstrating transfer learning challenges, data augmentation strategies, and regularization techniques on small medical datasets (~900 samples), with actionable technical insights for practitioners building medical AI systems.

Trained transformer-based chess models to play like humans (including thinking time) [P]

r/MachineLearning · 13d ago · 8 · research fine tuning tutorial open source

An engineer trained rating-conditioned transformer chess models (9M parameters) on 1B Lichess games, achieving MAIA-3 parity with novel additions: thinking time prediction and clock-aware win probability models. The technical work emphasizes data pipeline optimization (C++ preprocessing + sequential shuffling for GPU efficiency) and demonstrates how small models can match larger baselines through careful training setup and conditioning on player/time context.

What kinds of models are people training with document data? [P]

r/MachineLearning · 13d ago · 6 · fine tuning tool workflow

A team building synthetic data generation for document understanding (PDFs, forms with PII) seeks feedback on output formats (FUNSD, BIO, YOLO, Donut, COCO) and distribution methods (PyPI SDK vs API vs zip files). This is relevant for engineers working on document processing pipelines and fine-tuning models on structured data, though it's primarily a community discussion rather than a technical resource.

Learning, Fast and Slow: Towards LLMs That Adapt Continually [R]

r/MachineLearning · 14d ago · 8 · research fine tuning prompt engineering workflow

Fast-Slow Training (FST) combines in-context learning via optimized prompts (fast weights) with parameter updates (slow weights) to achieve 3x better sample efficiency than pure RL while reducing catastrophic forgetting and preserving model plasticity. This dual-timescale approach maintains closer alignment to base models while enabling effective continual learning across multiple tasks.

[AINews] The End of Finetuning

Latent Space · 14d ago · 7 · fine tuning benchmark agent open source research api update

OpenAI is deprecating fine-tuning APIs, shifting the AI engineering landscape toward open models, longer context windows, and agentic systems. The piece covers emerging research benchmarks (FrontierMath, medical evals), agentic breakthroughs in math/physics/coding, and the practical move away from proprietary model fine-tuning toward prompt engineering and open-source RLFT alternatives.

"OncoAgent: A Dual-Tier Multi-Agent Framework for Privacy-Preserving Oncology Clinical Decision Support"

HuggingFace Blog · 17d ago · 9 · open source fine tuning agent rag inference deployment

OncoAgent is an open-source clinical decision support system combining dual-tier fine-tuned LLMs (9B/27B via QLoRA), a multi-agent LangGraph architecture, and Corrective RAG over medical guidelines with strict privacy (Zero-PHI). The system's reported technical highlights are a 56× speedup on AMD MI300X hardware via sequence packing, a 266K-case oncology fine-tuning dataset, and on-premises inference that eliminates cloud API dependency.
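For readers unfamiliar with sequence packing, the idea is to place several short training sequences into one fixed-length slot so fewer tokens are wasted on padding. The sketch below uses a greedy first-fit-decreasing strategy; it is an assumption for illustration only, since the summary does not say which packing algorithm OncoAgent uses, and `pack_sequences` is a hypothetical helper name.

```python
def pack_sequences(lengths, max_len):
    """Group sequence indices into bins whose total length fits max_len.

    Greedy first-fit-decreasing: visit sequences longest first and put
    each one into the first bin that still has room for it.
    """
    bins = []        # bins[b] holds the indices packed into bin b
    remaining = []   # remaining[b] is the free token budget of bin b
    for idx in sorted(range(len(lengths)), key=lambda i: -lengths[i]):
        n = lengths[idx]
        if n > max_len:
            raise ValueError(f"sequence {idx} is longer than max_len")
        for b, free in enumerate(remaining):
            if n <= free:
                bins[b].append(idx)
                remaining[b] -= n
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([idx])
            remaining.append(max_len - n)
    return bins


# Five sequences padded individually would need five slots of length 8;
# packed, they fit into three.
packed = pack_sequences([6, 3, 5, 2, 4], max_len=8)
```

A real training pipeline would also need attention masking or position resets so packed sequences do not attend to each other; that part is omitted here.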

DeepSeek V4 paper full version is out, FP4 QAT details and stability tricks [D]

r/MachineLearning · 18d ago · 9 · new model research inference fine tuning benchmark

DeepSeek V4 paper reveals production-ready FP4 quantization-aware training achieving 2x QK selector speedup with 99.7% recall and 27% FLOPs reduction, plus novel training stabilization techniques (anticipatory routing, SwiGLU clamping) for trillion-parameter MoE models. Includes practical inference optimizations and generative reward modeling for RLHF that significantly reduce computational overhead for multi-agent and multi-call workflows.

May 8, 2026 · Alignment · Teaching Claude why

Anthropic Research · 18d ago · 8 · research fine tuning agent

Anthropic shares practical lessons from improving AI alignment training that reduced agentic misalignment from 96% to 0% across Claude models. The key findings emphasize that data quality/diversity matters more than scale, and that alignment training must specifically include agentic tool-use scenarios rather than relying solely on chat-based RLHF—providing actionable insights for building safer AI systems.

CyberSecQwen-4B: Why Defensive Cyber Needs Small, Specialized, Locally-Runnable Models

HuggingFace Blog · 18d ago · 8 · fine tuning open source benchmark deployment tool

CyberSecQwen-4B demonstrates that a carefully fine-tuned 4B model can match an 8B specialist on cybersecurity tasks (CWE classification, CVE mapping, CTI Q&A) while fitting on consumer GPUs, achieving 97.3% of the larger model's accuracy with +8.7 points on multiple-choice benchmarks. The post details the training methodology on AMD MI300X with cybersecurity-specific datasets and provides open-source configs for reproducing the work on various hardware stacks.

MedQA: Fine-Tuning a Clinical AI on AMD ROCm — No CUDA Required

HuggingFace Blog · 19d ago · 8 · fine tuning tutorial workflow open source

MedQA demonstrates a complete LoRA fine-tuning pipeline for clinical question-answering on AMD ROCm hardware, proving that HuggingFace ecosystem tools (Transformers, PEFT, TRL, Accelerate) work seamlessly without CUDA. The project fine-tunes Qwen3-1.7B on the MedMCQA dataset in ~5 minutes on an MI300X with 192GB HBM3, requiring only three environment variables to switch from CUDA to ROCm.

I trained a NER model on 33,000 Indian Supreme Court judgments (1950–2024) CASE_CITATION hits 97.76% F1, +17 points over the only prior baseline [P]

r/MachineLearning · 19d ago · 7 · new model open source fine tuning ner

New open-source NER model (en_legal_ner_ind_trf v0.1) fine-tuned on InLegalBERT for Indian legal document extraction, achieving 78.67% F1 across 13 entity types with exceptional performance on case citations (97.76% F1). It addresses the gap left by the unmaintained OpenNyAI model, particularly in handling pre-1990 OCR-degraded constitutional texts, using a silver-annotation pipeline that combines regex, metadata projection, transformer NER, and gazetteer approaches; the model is trained with Focal Loss to counter label imbalance.
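As background on why Focal Loss helps with label imbalance, here is a minimal sketch of the standard per-example formulation, FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the probability the model assigns to the true label. The value gamma = 2.0 is a common default and an assumption here; the post's actual hyperparameters and any class-weighting term are not given in the summary.

```python
import math


def focal_loss(p_true, gamma=2.0):
    """Focal loss for one example.

    p_true is the predicted probability of the correct label (0 < p_true <= 1).
    The (1 - p_true) ** gamma factor shrinks the loss on confident, easy
    examples, so rare or hard labels contribute relatively more.
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


easy = focal_loss(0.9)              # confident and correct: heavily down-weighted
hard = focal_loss(0.1)              # badly wrong: keeps most of its loss
plain_ce = focal_loss(0.9, gamma=0.0)  # gamma = 0 reduces to cross-entropy
```

With gamma set to 0 the modulating factor is 1, which is why `plain_ce` equals ordinary cross-entropy for the same prediction.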

Visual graph classification for blockchain security: Experiences fine-tuning Qwen2-VL on AMD MI300X [D]

r/MachineLearning · 22d ago · 7 · fine tuning tool workflow open source

An engineer shares a practical approach to detecting obfuscated transaction patterns: convert transaction graphs to 2D images and fine-tune Qwen2-VL-2B-Instruct with LoRA so that the VLM's visual understanding does the classification. The post presents this as an interesting workflow alternative to standard GNNs and includes published LoRA weights and a synthetic dataset methodology on AMD/ROCm hardware.