r/MachineLearning · 22h ago · 8 · tool open source rag

Tomesphere is a free research paper discovery platform that indexes 3M arXiv/OpenAlex papers with AI-generated TLDRs, peer reviews, GitHub repos, and HuggingFace models, plus semantic similarity search using SPECTER2 embeddings in pgvector. The semantic graph approach surfaces topically related papers beyond citation networks. It also offers a Chrome extension for arXiv integration and multiple ranking modes (influential, recent, hidden gems, nearest neighbors).

r/MachineLearning · 1d ago · 7 · tool open source rag

Aiki is a lightweight local tool for querying Wikipedia with custom TF-IDF retrieval and optional LLM answer generation. It demonstrates a practical RAG implementation with minimal dependencies, featuring query expansion via Wikipedia links and flexible article selection, which makes it a useful reference for building local knowledge systems.
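As a rough illustration of what TF-IDF retrieval involves, here is a minimal, dependency-free sketch. It is not Aiki's code: the function names (`tokenize`, `build_index`, `search`), the smoothed IDF formula, and the cosine scoring are all assumptions chosen for brevity, and the Wikipedia-link query expansion is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs maps title -> text. Returns (per-document TF-IDF vectors, IDF table)."""
    tokenized = {title: tokenize(text) for title, text in docs.items()}
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    n_docs = len(docs)
    # Smoothed IDF (an assumption; many variants exist).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {}
    for title, tokens in tokenized.items():
        counts = Counter(tokens)
        total = max(len(tokens), 1)
        vectors[title] = {t: (c / total) * idf[t] for t, c in counts.items()}
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, vectors, idf, k=3):
    """Return up to k (title, score) pairs with a non-zero score, best first."""
    counts = Counter(tokenize(query))
    total = sum(counts.values()) or 1
    # Query terms that never occur in the corpus are ignored.
    q = {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}
    scored = sorted(((cosine(q, v), title) for title, v in vectors.items()), reverse=True)
    return [(title, score) for score, title in scored[:k] if score > 0]
```

With a handful of article texts in `docs`, `search("memory safety language", *build_index(docs))` returns the best-matching titles. In a RAG setup those articles would then be passed to the optional LLM step as context.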

r/MachineLearning · 3d ago · 8 · benchmark rag inference workflow

A comprehensive benchmark comparing vision-capable LLMs (native PDF input) against OCR-based RAG pipelines on long-document processing. OCR approaches achieve higher accuracy (59.6% vs 52.0%) and lower cost ($0.19 vs $0.25/query), despite the 'vision makes OCR obsolete' narrative. Key findings: vision LLMs struggle with tables and charts and have a 7% failure rate on large PDFs that persists across retries, while premium OCR with layout extraction proves more robust for document-heavy workloads.
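To put the two headline figures on one scale, the short calculation below combines them into a cost per correct answer. This derived metric is not from the benchmark itself. It assumes accuracy is the fraction of queries answered correctly and that the quoted cost applies to every query.

```python
# Figures quoted in the summary; everything derived from them is illustrative.
ocr = {"accuracy": 0.596, "cost_per_query": 0.19}
vision = {"accuracy": 0.520, "cost_per_query": 0.25}


def cost_per_correct(pipeline):
    """Dollars spent per correctly answered query (assumed metric)."""
    return pipeline["cost_per_query"] / pipeline["accuracy"]


accuracy_gap_points = (ocr["accuracy"] - vision["accuracy"]) * 100
relative_cost_saving = 1 - ocr["cost_per_query"] / vision["cost_per_query"]

print(f"accuracy gap: {accuracy_gap_points:.1f} points")        # 7.6 points
print(f"cost saving per query: {relative_cost_saving:.0%}")     # 24%
print(f"OCR cost per correct answer: ${cost_per_correct(ocr):.2f}")        # $0.32
print(f"vision cost per correct answer: ${cost_per_correct(vision):.2f}")  # $0.48
```

Under these assumptions the gap widens once accuracy is priced in: roughly $0.32 versus $0.48 per correct answer.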

r/MachineLearning · 4d ago · 6 · rag workflow prompt engineering

Reddit discussion proposing a personalized cognitive profiling system that tracks not just facts but also learning patterns, sticking points, and effective explanation styles to improve LLM context retrieval over time. The idea combines dynamic profiling with RAG-like personalization to build an evolving understanding of how individual users think, rather than basic chat memory.

r/MachineLearning · 4d ago · 7 · rag workflow benchmark

This post demonstrates practical RAG optimization techniques including tiered retrieval scoring, corpus-quality awareness metrics, and empirical results across three real-world datasets with varying content density. The author introduces a 'yield score' metric to predict generation quality and notes that semantic relevance still performs reasonably well even on thin, positioning-heavy corpora—a pattern RAG benchmarks typically don't account for.

r/MachineLearning · 4d ago · 7 · research rag architecture benchmark

PHI // DRIFT is a cognitive architecture adding persistent internal state and advanced memory retrieval to LLMs through a Decision Memory Unit (DMU) that shows 14.8% context improvement over cosine-only RAG. The approach is validated on consumer hardware without GPU acceleration and includes measurable continuity metrics (PEDI) for evaluating conversation coherence across interactions.

HuggingFace Blog · 8d ago · 8 · new model tool open source rag tutorial

Six new Sentence Transformers CrossEncoder rerankers built on ModernBERT, trained with distillation on open datasets, achieving SOTA performance at multiple model sizes. Includes full training recipes, an easy 3-line inference API, and a new Hugging Face Agent Skill for fine-tuning rerankers on custom data.

r/MachineLearning · 8d ago · 8 · open source tool library rag deployment

Witchcraft is a Rust-based semantic search engine for client-side deployment using SQLite, achieving 20ms latency without external APIs or vector databases. It includes Pickbrain, a CLI tool that indexes Claude/Codex transcripts and documents for semantic search with direct session resumption, plus skills for both AI platforms to maintain cross-session memory.

HuggingFace Blog · 8d ago · 7 · tool library inference rag open source

PaddleOCR 3.5 now supports Transformers as a backend, enabling easier integration of OCR and document parsing into Hugging Face-centered workflows. This addresses document ingestion for RAG and Document AI pipelines by allowing developers to run PP-OCRv5 and PaddleOCR-VL models with flexible backend selection through a simple engine parameter.

r/MachineLearning · 9d ago · 7 · rag research benchmark

Experimental memory retrieval system achieving 96.4% on the LongMemEval benchmark using cognitive science foundations (episodic memory theory, temporal context modeling), with key innovations in query decomposition, temporal salience scoring, and coherence re-ranking. The work isolates retrieval quality from model capability by using a smaller answering model and provides a detailed category-level performance breakdown, though it acknowledges limitations including single-benchmark evaluation and no ablation studies.

HuggingFace Blog · 12d ago · 8 · new model open source rag deployment inference

The Granite Embedding Multilingual R2 release includes two new multilingual embedding models (97M and 311M parameters) supporting 200+ languages with 32K token context length and enhanced retrieval for 52 languages plus code. Both models ship with ONNX/OpenVINO optimization, work out-of-the-box with sentence-transformers and major RAG frameworks (LangChain, LlamaIndex, Haystack, Milvus), and are Apache 2.0 licensed, enabling drop-in replacement for language coverage at minimal performance cost.

r/MachineLearning · 14d ago · 6 · rag workflow deployment

Post sharing conference decks from the Knowledge Graph Conference, highlighting production enterprise systems (Bloomberg, AbbVie, Morgan Stanley) that use knowledge graphs as reasoning infrastructure rather than retrieval layers. The decks show real compliance and governance implementations where KGs serve as the source of truth with LLM interfaces.

r/MachineLearning · 14d ago · 6 · tool rag workflow

A developer built a Steam game recommender system using custom vector embeddings to capture nuanced game characteristics (gameplay focus, music, vibe) instead of broad tags, enabling more personalized recommendations and discovery of underrated games. The project uses a database-driven approach with explanations for each recommendation and includes an advanced mode for fine-tuned filtering.

HuggingFace Blog · 16d ago · 7 · agent open source inference workflow rag

MachinaCheck is a multi-agent AI system for CNC machine shops that analyzes STEP CAD files to determine manufacturability in 30 seconds. It uses Qwen 2.5 7B running locally on AMD MI300X (for on-premise privacy), cadquery for geometric feature extraction, and a five-component LangChain pipeline with vLLM inference to replace manual 30-60 minute feasibility assessments.

HuggingFace Blog · 17d ago · 9 · open source fine tuning agent rag inference deployment

OncoAgent is an open-source clinical decision support system combining dual-tier fine-tuned LLMs (9B/27B via QLoRA), a multi-agent LangGraph architecture, and Corrective RAG over medical guidelines with strict privacy (Zero-PHI). The system demonstrates significant technical innovations: a 56× speedup on AMD MI300X hardware via sequence packing, a fine-tuning dataset of 266K oncological cases, and on-premises inference that eliminates cloud API dependency.

r/MachineLearning · 18d ago · 7 · rag embedding open source deployment

A software engineer built a Steam game recommender system using LLM-powered review analysis to extract nuanced game characteristics (vibes, mechanics, focus percentages) into vector embeddings, then implemented retrieval using PostgreSQL and Chroma DB with a React frontend. The project demonstrates practical RAG and embedding techniques for creating explainable recommendations that surface why games are suggested, avoiding collaborative filtering homogeneity.

r/MachineLearning · 19d ago · 7 · rag tool workflow open source deployment

An engineer built a Steam game recommender system using RAG/vector embeddings on 2k reviews across 80k games, with a pipeline that extracts game vibes and mechanics into interpretable vectors stored in PostgreSQL + Chroma DB. The system uses ChatGPT to generate structured tags from reviews, clusters them semantically, and provides explainable recommendations via a React frontend deployed on DigitalOcean. It demonstrates practical LLM integration for recommendation systems, with a focus on interpretability over black-box collaborative filtering.

r/MachineLearning · 24d ago · 7 · research rag fine tuning inference

Experimental work on augmenting frozen transformers with lightweight external memory for in-context adaptation without weight updates. Uses forward-pass derived correction vectors to enable one-shot binding of new facts while maintaining context separation, with results showing 80%+ accuracy on same-context recall but degraded generalization to new contexts.

r/MachineLearning · 25d ago · 7 · open source agent rag tutorial

OpenVidya is an open-source multi-agent AI system for curriculum-aware lesson generation tailored to Indian education (NCERT/CBSE), featuring concept dependency graphs, exam-pattern grounding, and five pedagogical modes with mode-specific prompting. The project demonstrates practical application of agentic AI and RAG patterns for domain-specific education, with structured curriculum integration as a reusable architecture pattern.

r/MachineLearning · 26d ago · 8 · rag workflow tutorial

A practical approach to code-specific RAG using AST-derived typed graphs stored in SQLite with BM25 retrieval instead of embeddings, achieving ~5K tokens per query vs ~100K with naive chunking. The method leverages structural code relationships (imports, calls, inheritance) through graph traversal and uses lexical matching on distinctive identifiers, with hierarchical fallback for complex multi-file queries.
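The typed-graph idea can be sketched with Python's standard `ast` and `sqlite3` modules. This is a minimal illustration under stated assumptions, not the post's implementation: the `edges` table schema, the three edge types, and the helper names (`extract_edges`, `build_graph`, `neighbors`) are hypothetical, and the BM25 ranking step is omitted.

```python
import ast
import sqlite3


def extract_edges(module_name, source):
    """Return (src, edge_type, dst) triples for imports, inheritance, and calls."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            edges += [(module_name, "imports", alias.name) for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            edges.append((module_name, "imports", node.module))
        elif isinstance(node, ast.ClassDef):
            for base in node.bases:
                if isinstance(base, ast.Name):
                    edges.append((node.name, "inherits", base.id))
        elif isinstance(node, ast.FunctionDef):
            # Only direct calls to bare names are recorded; calls made in a
            # nested function are attributed to the outer function as well.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.append((node.name, "calls", inner.func.id))
    return edges


def build_graph(files):
    """files maps module name -> source text. Returns an in-memory SQLite connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT, type TEXT, dst TEXT)")
    for name, source in files.items():
        conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", extract_edges(name, source))
    return conn


def neighbors(conn, symbol):
    """One-hop traversal: every edge that touches the given symbol."""
    return conn.execute(
        "SELECT src, type, dst FROM edges WHERE src = ? OR dst = ? "
        "ORDER BY src, type, dst",
        (symbol, symbol),
    ).fetchall()
```

A retrieval step built on this would first find candidate symbols by lexical match, then call `neighbors` to pull in only the structurally related definitions. That keeps the context to a few related snippets instead of whole files.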