News Nug

Tomesphere, 3M paper pages with TLDRs, peer reviews, code, and a SPECTER2 similarity graph [P]

r/MachineLearning · 22h ago · 8 · tool open source rag

Tomesphere is a free research paper discovery platform indexing 3M arXiv/OpenAlex papers with AI-generated TLDRs, peer reviews, GitHub repos, HuggingFace models, and semantic similarity search using SPECTER2 embeddings in pgvector. The semantic graph approach enables discovery of topically related papers beyond citation networks, with a Chrome extension for arXiv integration and multiple ranking modes (influential, recent, hidden gems, nearest neighbors).
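To make the "nearest neighbors" ranking mode concrete, here is a minimal sketch of embedding-based similarity search. It is not Tomesphere's code: the three-dimensional vectors and the paper IDs are made-up stand-ins for real SPECTER2 embeddings, and a brute-force Python loop replaces the pgvector index.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query_id, embeddings, k=2):
    """Return the k papers most similar to query_id, best first."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real paper embeddings have far more dimensions.
toy_embeddings = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.8, 0.2, 0.1],
    "paper_c": [0.0, 0.1, 0.9],
}

for paper_id, score in nearest_neighbors("paper_a", toy_embeddings):
    print(paper_id, round(score, 3))
```

A vector index does the same ranking without scanning every row, which is what makes this practical at the scale of millions of papers.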

Aiki my local Wikipedia Retrieval-Augmented Generation system [R]

r/MachineLearning · 1d ago · 7 · tool open source rag

Aiki is a lightweight local tool for querying Wikipedia with custom TF-IDF retrieval and optional LLM answer generation. It demonstrates a practical RAG implementation with minimal dependencies, featuring query expansion via Wikipedia links and flexible article selection, which makes it a useful reference for building local knowledge systems.
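For readers unfamiliar with TF-IDF retrieval, the following is a minimal standard-library sketch of the general technique. It is not Aiki's implementation: the function names, the smoothed IDF formula, and the three toy articles are all assumptions chosen for illustration, and query expansion via links is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build smoothed IDF weights and one sparse TF-IDF vector per document."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    n_docs = len(docs)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[doc_id] = {t: (c / len(tokens)) * idf[t] for t, c in counts.items()}
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, idf, vectors, k=3):
    """Rank documents by cosine similarity to the query; drop zero scores."""
    counts = Counter(tokenize(query))
    query_vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in vectors.items()]
    scored = [pair for pair in scored if pair[1] > 0.0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical miniature corpus standing in for Wikipedia articles.
articles = {
    "Alan Turing": "Alan Turing was a mathematician and pioneer of computer science",
    "Ada Lovelace": "Ada Lovelace wrote the first published computer program",
    "Mount Everest": "Mount Everest is the highest mountain on Earth",
}

idf, vectors = build_index(articles)
print(search("highest mountain", idf, vectors))
```

In a RAG setup, the top-ranked article text would then be passed to the LLM as context for the optional answer-generation step.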

Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA [D]

r/MachineLearning · 3d ago · 8 · benchmark rag inference workflow

Comprehensive benchmark comparing vision-capable LLMs (native PDF) against OCR-based RAG pipelines on long document processing, showing OCR approaches achieve higher accuracy (59.6% vs 52.0%) and lower cost ($0.19 vs $0.25/query) despite the 'vision makes OCR obsolete' narrative. Key findings: vision LLMs struggle with tables/charts, have a 7% failure rate on large PDFs that survives retries, while premium OCR + layout extraction proves more robust for document-heavy workloads.

Is personalized AI memory actually a problem worth solving or am I just coping [D]

r/MachineLearning · 4d ago · 6 · rag workflow prompt engineering

Reddit discussion proposing a personalized cognitive profiling system that tracks not just facts but learning patterns, struggling points, and effective explanation styles to improve LLM context retrieval over time. The idea combines dynamic profiling with RAG-like personalization to create an evolving understanding of how individual users think, rather than basic chat memory.

Tested chunking + embeddings data from 3 production websites. [P]

r/MachineLearning · 4d ago · 7 · rag workflow benchmark

This post demonstrates practical RAG optimization techniques including tiered retrieval scoring, corpus-quality awareness metrics, and empirical results across three real-world datasets with varying content density. The author introduces a 'yield score' metric to predict generation quality and notes that semantic relevance still performs reasonably well even on thin, positioning-heavy corpora—a pattern RAG benchmarks typically don't account for.

Looking for arXiv endorsement + sharing a preprint on homeostatic cognitive architecture for AI companions [R]

r/MachineLearning · 4d ago · 7 · research rag architecture benchmark

PHI // DRIFT is a cognitive architecture adding persistent internal state and advanced memory retrieval to LLMs through a Decision Memory Unit (DMU) that shows 14.8% context improvement over cosine-only RAG. The approach is validated on consumer hardware without GPU acceleration and includes measurable continuity metrics (PEDI) for evaluating conversation coherence across interactions.

Introducing the Ettin Reranker Family

HuggingFace Blog · 8d ago · 8 · new model tool open source rag tutorial

Six new Sentence Transformers CrossEncoder rerankers built on ModernBERT, trained with distillation on open datasets, achieving SOTA performance at multiple model sizes. Includes full training recipes, easy 3-line inference API, and a new Hugging Face Agent Skill for fine-tuning rerankers on custom data.

Witchcraft, fast local semantic search on top of SQLite [P]

r/MachineLearning · 8d ago · 8 · open source tool library rag deployment

Witchcraft is a Rust-based semantic search engine for client-side deployment using SQLite, achieving 20ms latency without external APIs or vector databases. It includes Pickbrain, a CLI tool that indexes Claude/Codex transcripts and documents for semantic search with direct session resumption, plus skills for both AI platforms to maintain cross-session memory.

PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend

HuggingFace Blog · 8d ago · 7 · tool library inference rag open source

PaddleOCR 3.5 now supports Transformers as a backend, enabling easier integration of OCR and document parsing into Hugging Face-centered workflows. This addresses document ingestion for RAG and Document AI pipelines by allowing developers to run PP-OCRv5 and PaddleOCR-VL models with flexible backend selection through a simple engine parameter.

#1 on memory benchmark LongMemEval with Gemini Flash, not Pro [R]

r/MachineLearning · 9d ago · 7 · rag research benchmark

An experimental memory retrieval system reaches 96.4% on the LongMemEval benchmark, drawing on cognitive science foundations (episodic memory theory, temporal context modeling) with key innovations in query decomposition, temporal salience scoring, and coherence re-ranking. The work isolates retrieval quality from model capability by using a smaller answering model and provides a detailed category-level performance breakdown, though the author acknowledges limitations, including single-benchmark evaluation and no ablation studies.

Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

HuggingFace Blog · 12d ago · 8 · new model open source rag deployment inference

The Granite Embedding Multilingual R2 release includes two new multilingual embedding models (97M and 311M parameters) supporting 200+ languages with 32K token context length and enhanced retrieval for 52 languages plus code. Both models ship with ONNX/OpenVINO optimization, work out-of-the-box with sentence-transformers and major RAG frameworks (LangChain, LlamaIndex, Haystack, Milvus), and are Apache 2.0 licensed, so they can serve as drop-in replacements that broaden language coverage at minimal performance cost.

Sharing all KGC 2026 decks. More production-grade KG systems than I've seen at any conference. [D]

r/MachineLearning · 14d ago · 6 · rag workflow deployment

Post sharing conference decks from Knowledge Graph Conference highlighting production enterprise systems (Bloomberg, AbbVie, Morgan Stanley) using knowledge graphs as reasoning infrastructure rather than retrieval layers, demonstrating real compliance and governance implementations where KGs serve as source-of-truth with LLM interfaces.

Steam Recommender using similarity! (Undergraduate Student Project) [P]

r/MachineLearning · 14d ago · 6 · tool rag workflow

A developer built a Steam game recommender system using custom vector embeddings to capture nuanced game characteristics (gameplay focus, music, vibe) instead of broad tags, enabling more personalized recommendations and discovery of underrated games. The project uses a database-driven approach with explanations for each recommendation and includes an advanced mode for fine-tuned filtering.

MachinaCheck: Building a Multi-Agent CNC Manufacturability System on AMD MI300X

HuggingFace Blog · 16d ago · 7 · agent open source inference workflow rag

MachinaCheck is a multi-agent AI system for CNC machine shops that analyzes STEP CAD files to determine manufacturability in 30 seconds. It uses Qwen 2.5 7B running locally on AMD MI300X (for on-premise privacy), cadquery for geometric feature extraction, and a five-component LangChain pipeline with vLLM inference to replace manual 30-60 minute feasibility assessments.

"OncoAgent: A Dual-Tier Multi-Agent Framework for Privacy-Preserving Oncology Clinical Decision Support"

HuggingFace Blog · 17d ago · 9 · open source fine tuning agent rag inference deployment

OncoAgent is an open-source clinical decision support system combining dual-tier fine-tuned LLMs (9B/27B via QLoRA), a multi-agent LangGraph architecture, and Corrective RAG over medical guidelines with strict privacy (Zero-PHI). Reported technical highlights include a 56× speedup on AMD MI300X hardware via sequence packing, a fine-tuning dataset of 266K oncological cases, and on-premises inference that eliminates cloud API dependency.
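Sequence packing, in general terms, means concatenating several short training examples into one fixed-length row so that less compute is spent on padding tokens. The sketch below shows one common way to do it, greedy first-fit-decreasing bin packing. It is an illustration of the idea only, not OncoAgent's pipeline; the function name and the example lengths are hypothetical.

```python
def pack_sequences(lengths, max_len):
    """Group sequence indices into rows whose total length fits max_len.

    Uses first-fit decreasing: longest sequences are placed first, each
    into the first row that still has room.
    """
    rows = []  # each row: {"remaining": free token slots, "items": [indices]}
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sequence {i} ({n} tokens) exceeds max_len={max_len}")
        for row in rows:
            if row["remaining"] >= n:
                row["items"].append(i)
                row["remaining"] -= n
                break
        else:
            # No existing row had room, so open a new one.
            rows.append({"remaining": max_len - n, "items": [i]})
    return [row["items"] for row in rows]


# Hypothetical token counts for five training examples.
example_lengths = [5, 3, 8, 2, 6]
packed = pack_sequences(example_lengths, max_len=8)
used = sum(example_lengths)
print(packed)
print("fill rate:", used / (len(packed) * 8))
```

Without packing, these five examples would occupy five padded rows of eight tokens; packed, they fit in three. Real training code also needs attention masks or position resets so that examples sharing a row do not attend to each other.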

r/MachineLearning · 18d ago · 7 · rag embedding open source deployment

A software engineer built a Steam game recommender system using LLM-powered review analysis to extract nuanced game characteristics (vibes, mechanics, focus percentages) into vector embeddings, then implemented retrieval using PostgreSQL and Chroma DB with a React frontend. The project demonstrates practical RAG and embedding techniques for creating explainable recommendations that surface why games are suggested, avoiding collaborative filtering homogeneity.

Steam Similarity Recommender [P]

r/MachineLearning · 19d ago · 7 · rag tool workflow open source deployment

An engineer built a Steam game recommender system using RAG/vector embeddings on 2k reviews across 80k games, with a pipeline that extracts game vibes and mechanics into interpretable vectors stored in PostgreSQL + Chroma DB. The system uses ChatGPT to generate structured tags from reviews, clusters them semantically, and provides explainable recommendations via a React frontend deployed on Digital Ocean. It demonstrates practical LLM integration for recommendation systems, with a focus on interpretability over black-box collaborative filtering.
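One reason interpretable vectors help with explanations is that each dimension has a name, so the traits contributing most to a match can be reported directly. The sketch below illustrates that idea under assumed data: the trait names, weights, and the `explain_match` helper are hypothetical and not taken from the project.

```python
def explain_match(game_a, game_b, top=2):
    """Return the shared traits that contribute most to the dot product.

    Each game is a dict mapping a named trait to a weight in [0, 1].
    """
    contributions = {
        trait: weight * game_b[trait]
        for trait, weight in game_a.items()
        if trait in game_b
    }
    ranked = sorted(contributions, key=contributions.get, reverse=True)
    return ranked[:top]


# Hypothetical trait vectors for two games.
game_one = {"exploration": 0.8, "combat": 0.1, "soundtrack": 0.6}
game_two = {"exploration": 0.7, "combat": 0.2, "soundtrack": 0.5}

print(explain_match(game_one, game_two))
```

The returned trait names can be turned into a short "recommended because" line alongside each suggestion.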

Toy experiment: frozen Pythia-70M can use a forward-derived fast memory for contextual one-shot symbolic recall [D]

r/MachineLearning · 24d ago · 7 · research rag fine tuning inference

Experimental work on augmenting frozen transformers with lightweight external memory for in-context adaptation without weight updates. Uses forward-pass derived correction vectors to enable one-shot binding of new facts while maintaining context separation, with results showing 80%+ accuracy on same-context recall but degraded generalization to new contexts.

Looking for feedback on OpenVidya: an open-source AI classroom layer for NCERT/CBSE [R]

r/MachineLearning · 25d ago · 7 · open source agent rag tutorial

OpenVidya is an open-source multi-agent AI system for curriculum-aware lesson generation tailored to Indian education (NCERT/CBSE), featuring concept dependency graphs, exam-pattern grounding, and five pedagogical modes with mode-specific prompting. The project demonstrates practical application of agentic AI and RAG patterns for domain-specific education, with structured curriculum integration as a reusable architecture pattern.

Codebase-scale retrieval using AST-derived graphs + BM25 — reducing LLM context from 100K to 5K tokens [D]

r/MachineLearning · 26d ago · 8 · rag workflow tutorial

A practical approach to code-specific RAG using AST-derived typed graphs stored in SQLite with BM25 retrieval instead of embeddings, achieving ~5K tokens per query vs ~100K with naive chunking. The method leverages structural code relationships (imports, calls, inheritance) through graph traversal and uses lexical matching on distinctive identifiers, with hierarchical fallback for complex multi-file queries.
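The two-stage idea, lexical seeding followed by graph expansion, can be sketched as follows. This is an illustration under assumptions rather than the author's implementation: the symbol names, token lists, and edge table are hypothetical, an in-memory dict stands in for SQLite, and the BM25 parameters are common defaults.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each document (a list of tokens) against the query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in counts:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            denom = counts[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * counts[term] * (k1 + 1) / denom
        if score > 0.0:
            scores[doc_id] = score
    return scores


def expand(seeds, edges, hops=1):
    """Add nodes reachable from the seeds within the given number of hops."""
    selected = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        reached = {dst for src in frontier for _kind, dst in edges.get(src, [])}
        frontier = reached - selected
        selected |= frontier
    return selected


# Hypothetical symbols with identifier tokens extracted from an AST.
symbols = {
    "auth.login": ["login", "verify_password", "session"],
    "auth.verify_password": ["verify_password", "hash"],
    "db.session": ["session", "connect"],
    "ui.render": ["render", "template"],
}
# Hypothetical typed edges: source -> [(edge type, destination)].
typed_edges = {
    "auth.login": [("calls", "auth.verify_password"), ("imports", "db.session")],
}

seeds = bm25_scores(["login"], symbols)
context = expand(seeds, typed_edges, hops=1)
print(sorted(context))
```

Only the seed symbol and its direct structural neighbors are sent to the LLM, which is how this style of retrieval keeps the context far smaller than naive chunking of whole files.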