Tomesphere is a free research paper discovery platform that indexes 3M arXiv/OpenAlex papers with AI-generated TLDRs, peer reviews, GitHub repos, HuggingFace models, and semantic similarity search using SPECTER2 embeddings stored in pgvector. The semantic graph approach surfaces topically related papers that citation networks miss. It also offers a Chrome extension for arXiv integration and multiple ranking modes (influential, recent, hidden gems, nearest neighbors).
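The nearest-neighbor ranking mode can be illustrated with a small sketch. This is not Tomesphere's code: the paper IDs and three-dimensional vectors are made-up stand-ins for SPECTER2 embeddings, and the ranking is done in plain Python rather than in the database, where pgvector's `<=>` cosine-distance operator would handle the same ordering.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_neighbors(query_id, embeddings, k=2):
    """Return the k paper IDs most similar to query_id, best first."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), pid)
        for pid, vec in embeddings.items()
        if pid != query_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pid for _, pid in scored[:k]]

# Hypothetical toy embeddings; real SPECTER2 vectors have hundreds of dimensions.
toy_embeddings = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.8, 0.2, 0.1],
    "paper_c": [0.0, 0.1, 0.9],
    "paper_d": [0.1, 0.9, 0.2],
}

print(nearest_neighbors("paper_a", toy_embeddings, k=2))  # -> ['paper_b', 'paper_d']
```

Because the ranking depends only on embedding geometry, two papers can be neighbors without citing each other, which is the property the summary highlights.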
Aiki is a lightweight local tool for querying Wikipedia with custom TF-IDF retrieval and optional LLM answer generation. It demonstrates a practical RAG implementation with minimal dependencies, featuring query expansion via Wikipedia links and flexible article selection, and is a useful reference for building local knowledge systems.
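A minimal sketch of TF-IDF retrieval of the kind described above, using only the standard library. It is not Aiki's implementation: the article titles and texts are toy placeholders, and the smoothed IDF formula and cosine scoring are assumptions chosen for brevity.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Return (idf, vectors) where vectors maps title -> {term: tf-idf weight}."""
    tokenized = {title: tokenize(body) for title, body in docs.items()}
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    n = len(docs)
    # Smoothed IDF so that terms present in every document keep a small weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = {
        title: {term: count * idf[term] for term, count in Counter(tokens).items()}
        for title, tokens in tokenized.items()
    }
    return idf, vectors

def search(query, idf, vectors, k=3):
    """Rank titles by cosine similarity between query and document vectors."""
    counts = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {term: count * idf[term] for term, count in counts.items()}
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    results = []
    for title, d_vec in vectors.items():
        dot = sum(w * d_vec.get(term, 0.0) for term, w in q_vec.items())
        d_norm = math.sqrt(sum(w * w for w in d_vec.values()))
        if dot > 0 and q_norm > 0 and d_norm > 0:
            results.append((dot / (q_norm * d_norm), title))
    results.sort(reverse=True)
    return [title for _, title in results[:k]]

# Hypothetical mini-corpus standing in for downloaded Wikipedia articles.
articles = {
    "Python (programming language)": "Python is a programming language created by Guido van Rossum",
    "Pythonidae": "Pythons are a family of nonvenomous snakes found in Africa Asia and Australia",
    "Monty Python": "Monty Python were a British comedy troupe",
}
idf, vectors = build_index(articles)
top = search("who created the python programming language", idf, vectors)
print(top[0])  # -> Python (programming language)
```

The top-ranked article texts would then be passed to the optional LLM step as context; query expansion via Wikipedia links is omitted here.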
A comprehensive benchmark comparing vision-capable LLMs (native PDF input) against OCR-based RAG pipelines on long-document processing. OCR approaches achieve higher accuracy (59.6% vs 52.0%) and lower cost ($0.19 vs $0.25 per query), despite the 'vision makes OCR obsolete' narrative. Key findings: vision LLMs struggle with tables and charts and have a 7% failure rate on large PDFs that persists across retries, while premium OCR with layout extraction proves more robust for document-heavy workloads.
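One way to combine the two reported figures is cost per correct answer. This derived metric does not appear in the benchmark itself; the sketch below only divides the per-query cost by the accuracy quoted above, under the assumption that every query costs the same whether or not it is answered correctly.

```python
def cost_per_correct_answer(cost_per_query, accuracy):
    """Expected spend to obtain one correct answer (derived, not reported)."""
    return cost_per_query / accuracy

ocr_cost = cost_per_correct_answer(0.19, 0.596)
vision_cost = cost_per_correct_answer(0.25, 0.520)

print(round(ocr_cost, 3))     # -> 0.319
print(round(vision_cost, 3))  # -> 0.481
```

On these numbers the OCR pipeline is roughly a third cheaper per correct answer, a wider gap than the raw per-query costs suggest.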
Reddit discussion proposing a personalized cognitive profiling system that tracks not just facts but learning patterns, struggling points, and effective explanation styles to improve LLM context retrieval over time. The idea combines dynamic profiling with RAG-like personalization to create an evolving understanding of how individual users think, rather than basic chat memory.
This post demonstrates practical RAG optimization techniques including tiered retrieval scoring, corpus-quality awareness metrics, and empirical results across three real-world datasets with varying content density. The author introduces a 'yield score' metric to predict generation quality and notes that semantic relevance still performs reasonably well even on thin, positioning-heavy corpora—a pattern RAG benchmarks typically don't account for.
PHI // DRIFT is a cognitive architecture adding persistent internal state and advanced memory retrieval to LLMs through a Decision Memory Unit (DMU) that shows 14.8% context improvement over cosine-only RAG. The approach is validated on consumer hardware without GPU acceleration and includes measurable continuity metrics (PEDI) for evaluating conversation coherence across interactions.
Six new Sentence Transformers CrossEncoder rerankers built on ModernBERT, trained with distillation on open datasets, achieving SOTA performance at multiple model sizes. Includes full training recipes, easy 3-line inference API, and a new Hugging Face Agent Skill for fine-tuning rerankers on custom data.
Witchcraft is a Rust-based semantic search engine for client-side deployment using SQLite, achieving 20ms latency without external APIs or vector databases. It includes Pickbrain, a CLI tool that indexes Claude/Codex transcripts and documents for semantic search with direct session resumption, plus skills for both AI platforms to maintain cross-session memory.
PaddleOCR 3.5 now supports Transformers as a backend, enabling easier integration of OCR and document parsing into Hugging Face-centered workflows. This addresses document ingestion for RAG and Document AI pipelines by allowing developers to run PP-OCRv5 and PaddleOCR-VL models with flexible backend selection through a simple engine parameter.
Experimental memory retrieval system achieving 96.4% on the LongMemEval benchmark using cognitive science foundations (episodic memory theory, temporal context modeling), with key innovations in query decomposition, temporal salience scoring, and coherence re-ranking. The work isolates retrieval quality from model capability by using a smaller answering model and provides a detailed category-level performance breakdown, though the author acknowledges limitations including single-benchmark evaluation and no ablation studies.
Granite Embedding Multilingual R2 releases two new multilingual embedding models (97M and 311M parameters) supporting 200+ languages with a 32K-token context length and enhanced retrieval for 52 languages plus code. Both models ship with ONNX/OpenVINO optimization, work out of the box with sentence-transformers and major RAG frameworks (LangChain, LlamaIndex, Haystack, Milvus), and are Apache 2.0 licensed, making them a drop-in replacement that broadens language coverage at minimal performance cost.
Post sharing conference decks from the Knowledge Graph Conference that highlight production enterprise systems (Bloomberg, AbbVie, Morgan Stanley) using knowledge graphs as reasoning infrastructure rather than retrieval layers. The decks show real compliance and governance implementations where KGs serve as the source of truth behind LLM interfaces.
A developer built a Steam game recommender system using custom vector embeddings to capture nuanced game characteristics (gameplay focus, music, vibe) instead of broad tags, enabling more personalized recommendations and discovery of underrated games. The project uses a database-driven approach with explanations for each recommendation and includes an advanced mode for fine-tuned filtering.
MachinaCheck is a multi-agent AI system for CNC machine shops that analyzes STEP CAD files to determine manufacturability in 30 seconds. It uses Qwen 2.5 7B running locally on AMD MI300X (for on-premise privacy), cadquery for geometric feature extraction, and a five-component LangChain pipeline with vLLM inference to replace manual 30-60 minute feasibility assessments.
OncoAgent is an open-source clinical decision support system combining dual-tier fine-tuned LLMs (9B/27B via QLoRA), a multi-agent LangGraph architecture, and Corrective RAG over medical guidelines with strict privacy (Zero-PHI). Notable technical contributions include a 56× speedup on AMD MI300X hardware via sequence packing, a fine-tuning dataset of 266K oncology cases, and on-premises inference that eliminates cloud API dependency.
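Sequence packing, credited above for the speedup, groups several short training examples into one fixed-length row so fewer tokens are wasted on padding. The sketch below shows one common variant, first-fit-decreasing bin packing; the summary does not say which packing strategy OncoAgent uses, and the example lengths and `max_len` are made up.

```python
def pack_sequences(lengths, max_len):
    """Greedy first-fit-decreasing packing.

    lengths: token count of each example.
    Returns a list of bins; each bin is a list of example indices whose
    combined length fits within max_len.
    """
    bins = []  # each entry: {"free": remaining tokens, "items": [indices]}
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"example {i} has {n} tokens, exceeds max_len={max_len}")
        for b in bins:
            if b["free"] >= n:
                b["items"].append(i)
                b["free"] -= n
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append({"free": max_len - n, "items": [i]})
    return [b["items"] for b in bins]

# Hypothetical token counts for six training examples.
example_lengths = [700, 300, 512, 100, 400, 36]
packed = pack_sequences(example_lengths, max_len=1024)
print(packed)  # -> [[0, 1], [2, 4, 3], [5]]

padded_rows = len(example_lengths)  # one row per example without packing
packed_rows = len(packed)
print(padded_rows, packed_rows)     # -> 6 3
```

Here six padded rows shrink to three. A real training pipeline would also need attention masks or position resets so that packed examples do not attend to each other.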
A software engineer built a Steam game recommender system using LLM-powered review analysis to extract nuanced game characteristics (vibes, mechanics, focus percentages) into vector embeddings, then implemented retrieval using PostgreSQL and Chroma DB with a React frontend. The project demonstrates practical RAG and embedding techniques for creating explainable recommendations that surface why games are suggested, avoiding collaborative filtering homogeneity.
An engineer built a Steam game recommender system using RAG/vector embeddings on 2k reviews across 80k games, with a pipeline that extracts game vibes and mechanics into interpretable vectors stored in PostgreSQL + Chroma DB. The system uses ChatGPT to generate structured tags from reviews, clusters them semantically, and provides explainable recommendations via a React frontend deployed on Digital Ocean. It demonstrates practical LLM integration for recommendation systems, favoring interpretability over black-box collaborative filtering.
Experimental work on augmenting frozen transformers with lightweight external memory for in-context adaptation without weight updates. Uses forward-pass derived correction vectors to enable one-shot binding of new facts while maintaining context separation, with results showing 80%+ accuracy on same-context recall but degraded generalization to new contexts.
OpenVidya is an open-source multi-agent AI system for curriculum-aware lesson generation tailored to Indian education (NCERT/CBSE), featuring concept dependency graphs, exam-pattern grounding, and five pedagogical modes with mode-specific prompting. The project demonstrates practical application of agentic AI and RAG patterns for domain-specific education, with structured curriculum integration as a reusable architecture pattern.
A practical approach to code-specific RAG using AST-derived typed graphs stored in SQLite with BM25 retrieval instead of embeddings, achieving ~5K tokens per query vs ~100K with naive chunking. The method leverages structural code relationships (imports, calls, inheritance) through graph traversal and uses lexical matching on distinctive identifiers, with hierarchical fallback for complex multi-file queries.
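The combination described above (typed edges from the AST, stored in SQLite, queried lexically, then expanded by graph traversal) can be sketched in a few dozen lines. This is an illustration under assumptions, not the author's implementation: the toy module, the `edges` schema, the edge type names, and the pure-Python BM25 scorer are all hypothetical, and the hierarchical fallback is omitted.

```python
import ast
import math
import re
import sqlite3
from collections import Counter

# Hypothetical toy module to index.
SOURCE = """
import json

class Base:
    def save(self):
        return json.dumps({})

class User(Base):
    def rename(self, name):
        self.name = name
        return self.save()
"""

def extract_edges(source, module="example"):
    """Return (src, type, dst) triples for imports, inheritance, definitions, calls."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            edges += [(module, "imports", alias.name) for alias in node.names]
        elif isinstance(node, ast.ClassDef):
            edges += [(node.name, "inherits", b.id) for b in node.bases if isinstance(b, ast.Name)]
            for item in node.body:
                if not isinstance(item, ast.FunctionDef):
                    continue
                qualname = f"{node.name}.{item.name}"
                edges.append((node.name, "defines", qualname))
                for call in ast.walk(item):
                    if isinstance(call, ast.Call):
                        func = call.func
                        callee = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
                        if callee:
                            edges.append((qualname, "calls", callee))
    return edges

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank document names by BM25 score against the query (matches only)."""
    toks = {name: tokenize(text) for name, text in docs.items()}
    avg_len = sum(len(t) for t in toks.values()) / len(toks)
    df = Counter(term for t in toks.values() for term in set(t))
    scores = {}
    for name, t in toks.items():
        tf = Counter(t)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                idf = math.log(1 + (len(toks) - df[term] + 0.5) / (df[term] + 0.5))
                score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(t) / avg_len))
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

def neighbors(conn, node):
    """One-hop graph traversal: outgoing typed edges of a node."""
    return conn.execute("SELECT type, dst FROM edges WHERE src = ?", (node,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, type TEXT, dst TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", extract_edges(SOURCE))

# One lexical "document" per node: its own identifier plus the names it points to.
docs = {}
for src, dst in conn.execute("SELECT src, dst FROM edges"):
    docs[src] = docs.get(src, src) + " " + dst

hits = bm25_rank("rename", docs)
print(hits[0])                   # -> User.rename
print(neighbors(conn, hits[0]))  # -> [('calls', 'save')]
```

Only the matched node and its immediate neighbors would be sent to the model, which is where the token savings over naive chunking would come from. SQLite's FTS5 extension provides a built-in `bm25()` ranking function that could replace the hand-written scorer where FTS5 is available.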