HuggingFace Blog · 2d ago · 8 · new model open source rag deployment inference

Granite Embedding Multilingual R2 releases two new multilingual embedding models (97M and 311M parameters) supporting 200+ languages with 32K token context length and enhanced retrieval for 52 languages plus code. Both models ship with ONNX/OpenVINO optimization, work out-of-the-box with sentence-transformers and major RAG frameworks (LangChain, LlamaIndex, Haystack, Milvus), and are Apache 2.0 licensed, enabling drop-in replacement of existing embedding models for broader language coverage at minimal performance cost.
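Since the models are drop-in compatible with sentence-transformers, retrieval looks like any other embedding workflow. A minimal sketch; the `model_id` argument is left to the caller because the exact R2 hub IDs should be taken from the Granite model cards, and the similarity helper is plain Python so it runs without the library installed:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_passages(query, passages, model_id):
    """Rank passages against a query; pass the Granite R2 ID from the model card."""
    # Heavy import kept local so cosine_sim stays dependency-free.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(model_id)
    q_vec = model.encode(query)
    doc_vecs = model.encode(passages)
    scores = [cosine_sim(q_vec, d) for d in doc_vecs]
    return sorted(zip(scores, passages), reverse=True)
```

Swapping an existing model for a Granite R2 one is then a matter of changing `model_id`, since the encode-and-score loop is identical across sentence-transformers models.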

r/LocalLLaMA · 2d ago · 6 · tool workflow api update

VS Code's AI Toolkit extension now supports agent-first development with configurable language models optimized for different tasks, including reasoning models with adjustable thinking effort levels. The article covers model selection strategies (fast vs. reasoning models), tool-calling support for agents, and how to configure API keys for custom models.

OpenAI Blog · 2d ago · 6 · api update tool workflow

OpenAI's Codex integration in the ChatGPT mobile app enables remote code generation and task monitoring across devices. This expands practical access to AI-assisted coding workflows beyond desktop environments, useful for developers managing remote infrastructure or mobile-first development pipelines.

r/MachineLearning · 2d ago · 7 · workflow tutorial research

A practitioner shares a real-world time series anomaly detection challenge: building failure prediction for IoT chargers with sparse positive labels (~1-2%), variable data rates between operational modes, and high device heterogeneity. They're exploring architectural solutions (dual RNN encoders vs. data-level sampling) and seeking advice on handling extreme class imbalance in time series forecasting.
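For the class-imbalance side of the question, one standard data-level baseline is inverse-frequency loss weighting, so the ~1-2% positive windows contribute proportionally to the gradient. A minimal sketch of that idea (not the poster's dual-encoder approach), with the `n / (2 * count)` normalization chosen so a balanced dataset yields weight 1.0 per class:

```python
import math

def class_weights(labels):
    """Inverse-frequency weights: rare positives get proportionally more loss."""
    n = len(labels)
    pos = sum(labels)
    neg = n - pos
    return {0: n / (2 * neg), 1: n / (2 * pos)}

def weighted_bce(y_true, y_prob, weights):
    """Binary cross-entropy with per-class weights applied per example."""
    eps = 1e-7
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -weights[y] * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

At 1% positives this gives the positive class a ~50x weight, which is often combined with (rather than replaced by) window-level oversampling when positives are this sparse.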

Simon Willison · 3d ago · 6 · tool workflow deployment

Simon Willison describes using GPT-5.5 to generate a configurable rate-limiting plugin for handling crawler traffic on datasette.io. The post provides practical insights into using LLMs for DevOps/infrastructure automation and production deployment patterns.
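The core of any such plugin is the limiter itself. A generic token-bucket sketch (not Willison's actual plugin code), with an injectable clock so it can be tested deterministically; per-client state would be keyed by IP or user agent:

```python
import time

class TokenBucket:
    """Allow a burst of `capacity` requests, refilled at `rate` tokens/second."""
    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per client; hypothetical keying scheme for illustration.
buckets = {}
def check(client_ip, capacity=10, rate=1.0):
    bucket = buckets.setdefault(client_ip, TokenBucket(capacity, rate))
    return bucket.allow()
```

A middleware would call `check()` per request and return HTTP 429 on `False`; crawler traffic that exceeds the refill rate gets throttled while normal browsing stays under the burst capacity.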

r/MachineLearning · 3d ago · 8 · agent research workflow

Research paper on Continual Harness shows how foundation models can autonomously refine their own execution harnesses through iterative self-improvement, demonstrated via Gemini completing Pokémon games without losses. The work formalizes the agent-harness co-learning loop and shows that self-refinement capabilities are critical for long-horizon task completion, with implications for building more autonomous AI systems.

OpenAI Blog · 3d ago · 5 · api update

OpenAI has implemented safety updates to ChatGPT that improve contextual understanding of sensitive conversations and risk detection patterns. While the safety mechanisms are interesting from an AI safety perspective, the practical technical details and implementation methods are not disclosed, limiting direct applicability for engineers building with AI.

HuggingFace Blog · 3d ago · 8 · inference workflow tutorial

This article explains how to optimize LLM inference performance by decoupling CPU and GPU workloads through asynchronous batching, eliminating idle gaps that waste ~24% of runtime in synchronous approaches. The post builds on continuous batching concepts and provides practical profiling techniques to measure and improve GPU utilization, critical for managing high inference costs on hardware like H200s.
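The decoupling pattern is simple to sketch: a producer thread does CPU-side work (tokenization, padding, host-to-device prep) for batch i+1 while the main thread runs batch i, so the accelerator never idles waiting on the host. A minimal illustration of the overlap structure only, with mock work functions standing in for real tokenizer and GPU calls:

```python
import queue
import threading

def pipeline(batches, preprocess, run_on_gpu):
    """Overlap CPU preprocessing of batch i+1 with execution of batch i."""
    ready = queue.Queue(maxsize=2)  # small buffer: CPU stays at most 2 batches ahead
    results = []

    def producer():
        for batch in batches:
            ready.put(preprocess(batch))  # CPU work runs on this thread
        ready.put(None)                   # sentinel: no more batches

    t = threading.Thread(target=producer)
    t.start()
    while (item := ready.get()) is not None:
        results.append(run_on_gpu(item))  # "GPU" work overlaps with producer
    t.join()
    return results
```

The bounded queue is the key design choice: it gives the CPU a head start without letting preprocessed batches pile up in host memory, which is the same back-pressure idea continuous-batching servers use internally.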

r/MachineLearning · 3d ago · 8 · research fine tuning tutorial open source

An engineer trained rating-conditioned transformer chess models (9M parameters) on 1B Lichess games, achieving MAIA-3 parity with novel additions: thinking time prediction and clock-aware win probability models. The technical work emphasizes data pipeline optimization (C++ preprocessing + sequential shuffling for GPU efficiency) and demonstrates how small models can match larger baselines through careful training setup and conditioning on player/time context.
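Rating conditioning typically means prepending discrete rating tokens to the move sequence so the model learns rating-specific play. A sketch of one plausible encoding; the bucket edges and token vocabulary here are illustrative, not the post's actual setup:

```python
def rating_token(elo, lo=800, hi=2800, buckets=20):
    """Map an Elo rating to one of `buckets` discrete conditioning tokens."""
    elo = max(lo, min(hi, elo))  # clamp outliers into the supported range
    idx = min(buckets - 1, (elo - lo) * buckets // (hi - lo))
    return f"<elo_{idx}>"

def encode_game(white_elo, black_elo, moves):
    """Conditioning tokens first, then the move sequence."""
    return [rating_token(white_elo), rating_token(black_elo)] + moves
```

At inference time you pick the rating tokens you want the model to imitate, which is how a single small model covers the full skill spectrum instead of training one model per rating band.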

Anthropic Blog · 3d ago · 6 · api update workflow agent

Anthropic launched Claude for Small Business, a package of pre-built agentic workflows and connectors that integrate Claude into tools like QuickBooks, HubSpot, and Google Workspace for small business automation tasks. The offering includes 15 ready-to-run workflows across finance, sales, and operations, plus emphasis on data security and AI training partnerships.

r/MachineLearning · 3d ago · 8 · open source tool inference deployment api update

Scenema Audio releases open-source diffusion-based TTS model weights and inference code that decouples emotional performance from voice identity through prompt-based control. Key technical advantages include more natural emotional delivery than autoregressive TTS, support for audio-first video generation workflows, optimized diffusion (8 steps), and Docker/REST API deployment with automatic VRAM management. Practical trade-offs noted: stochastic output quality that requires a post-editing pass, sensitivity to detailed prompting, and the need for phonetic spelling of complex words.

r/MachineLearning · 3d ago · 6 · fine tuning tool workflow

A team building synthetic data generation for document understanding (PDFs, forms with PII) seeks feedback on output formats (FUNSD, BIO, YOLO, Donut, COCO) and distribution methods (PyPI SDK vs API vs zip files). This is relevant for engineers working on document processing pipelines and fine-tuning models on structured data, though it's primarily a community discussion rather than a technical resource.

r/MachineLearning · 3d ago · 5 · inference open source

A community discussion about running high-quality image generation models locally on CPU-only hardware with 16GB RAM. The user seeks alternatives to Google Imagen for generating book covers, having found Stable Diffusion insufficient; the thread explores trade-offs between quality, cost, and computational constraints for open-source models.

r/LocalLLaMA · 3d ago · 8 · new model open source inference deployment

SenseNova-U1 is a new unified multimodal model architecture that natively integrates visual understanding and generation without separate encoders/decoders, achieving state-of-the-art performance on multiple benchmarks while supporting efficient 8-step inference and interleaved image-text generation. Open-source weights, GGUF quantizations, and inference code are now available, with practical optimization features like layer-offload VRAM modes for low-resource deployment.

r/MachineLearning · 3d ago · 6 · workflow open source

A developer seeks architectural patterns for organizing benchmark infrastructure using type-safe data structures (Dataclasses/Pydantic) to manage datasets, task schemas, and experiment composition. While this is a practical engineering question rather than news, it reflects real challenges in building reproducible ML benchmarks and may surface useful open-source projects or design patterns worth studying.
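One common answer to this question is a small hierarchy of frozen dataclasses (dataset → task → experiment), so suites are composed from immutable, hashable specs. A hypothetical sketch of that pattern, not any particular project's schema:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DatasetSpec:
    """Immutable handle to one dataset split."""
    name: str
    split: str
    num_examples: int

@dataclass(frozen=True)
class TaskSpec:
    """One evaluation task: a dataset plus the metric to report."""
    name: str
    dataset: DatasetSpec
    metric: str  # e.g. "accuracy", "f1"

@dataclass
class Experiment:
    """Mutable composition root: a model paired with a list of tasks."""
    model_id: str
    tasks: list = field(default_factory=list)

    def add(self, task: TaskSpec):
        self.tasks.append(task)
        return self  # allow chaining when composing suites
```

Frozen specs can be dict keys for result caching, and Pydantic models are a drop-in upgrade from this shape when you also need validation and config-file (de)serialization.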

r/MachineLearning · 3d ago · 6 · research benchmark

A technical critique of the 2024 'Ingenia Theorem' paper claiming AGI via ML is impossible, identifying a critical flaw: the proof equivocates between 'human-level classifier' and 'all polytime-sampleable distributions,' an equivocation which, applied consistently, would absurdly prove that ImageNet classification is intractable. This is relevant for understanding the theoretical foundations and limitations arguments in AI/ML research.

r/MachineLearning · 3d ago · 5 · tutorial workflow

A developer discusses choosing between logistic regression and tree-based models (random forests) for a UFC fight prediction project, noting that MMA statistics exhibit nonlinear relationships and feature interactions that logistic regression may miss. The post highlights practical ML modeling decisions around feature engineering and model selection for binary classification with domain-specific constraints like betting value optimization.
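The core issue is easy to demonstrate: a linear model cannot capture a feature interaction unless you engineer it in, whereas tree ensembles learn it from the raw features. A toy sketch using an XOR-style interaction (illustrative data, not the poster's UFC features), with plain-Python logistic regression so nothing beyond the standard library is needed:

```python
import math

def train_logreg(X, y, steps=500, lr=0.5):
    """Plain-Python logistic regression trained by stochastic gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # last weight is the bias
    for _ in range(steps):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of log-loss w.r.t. the logit
            for j in range(len(xi)):
                w[j] -= lr * g * xi[j]
            w[-1] -= lr * g
    return w

def accuracy(w, X, y):
    correct = 0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
        correct += (z > 0) == (yi == 1)
    return correct / len(y)

# XOR-style target: neither raw feature is predictive on its own.
X = [[0, 0], [0, 1], [1, 0], [1, 1]] * 25
y = [int(a != b) for a, b in X]
X_int = [xi + [xi[0] * xi[1]] for xi in X]  # add the product (interaction) feature

w_raw = train_logreg(X, y)      # linear model on raw features: fails
w_int = train_logreg(X_int, y)  # same model with the interaction: separable
```

Random forests find such interactions automatically at the cost of interpretability and calibrated probabilities; for betting-value use cases, a well-calibrated probability output matters as much as raw accuracy, which is worth weighing in the model choice.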

r/LocalLLaMA · 3d ago · 9 · new model inference tool deployment

Ovis2.6-80B-A3B is a new multimodal LLM featuring a Mixture-of-Experts architecture with 80B total parameters but only ~3B active during inference, offering strong performance with low serving costs. Key improvements include 64K context window, up to 2880×2880 image resolution support, active visual reasoning via "Think with Image" capability, and enhanced OCR/document understanding—with practical implementation examples provided.

r/MachineLearning · 3d ago · 7 · research library open source

A novel Vision Transformer backbone uses block-sparse core-periphery attention to reduce complexity from O(N²) to O(2NC + C²), trained with nested dropout for elastic inference-time cost adjustment. It achieves accuracy competitive with DINOv3 while maintaining stability across resolutions (256-1024), and demonstrates interesting emergent attention patterns.
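One plausible reading of the core-periphery structure: C "core" tokens attend everywhere, while peripheral tokens attend only to the core, giving 2NC − C² nonzero attention entries, i.e. the stated O(NC) order instead of O(N²). A mask-construction sketch under that assumption (the paper's exact block pattern may differ):

```python
def core_periphery_mask(n, c):
    """Boolean attention mask: token pair (i, j) is attended iff either
    side is one of the first `c` (core) tokens. Nonzeros = 2*n*c - c**2."""
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < c or j < c:
                mask[i][j] = True
    return mask
```

With C fixed as resolution (and hence N) grows, cost scales linearly in token count, which is consistent with the reported stability across the 256-1024 resolution range.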