Lesson Learned #047: RAG Dual Database Cleanup Needed
ID: LL-047 Impact: Identified through automated analysis
Date: December 15, 2025 Severity: MEDIUM Category: Technical Debt, RAG, Infrastructure Status: IDENTIFIED (cleanup pending)
The Problem
We have two RAG vector databases in the codebase:
| Database | Files Using | Status |
|---|---|---|
| ChromaDB | 2 files | Legacy |
| LanceDB | 3 files | Newer |
This creates:
- Confusion about which to use
- Duplicate dependencies (two vector-store packages to install and pin)
- Inconsistent behavior
- CI failures when either package is missing from the environment
How We Got Here
- Started with ChromaDB (legacy)
- Someone added LanceDB (newer, simpler)
- The codebase was never fully migrated to either one
- Both are now “required”
Today’s Fix (Band-Aid)
Added both to requirements-minimal.txt:
chromadb==0.6.3
lancedb>=0.4.0
This fixes CI but doesn’t solve the underlying mess.
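While both packages stay pinned, one way to turn a missing dependency into an explicit, early CI error (rather than a failure deep inside a RAG test) is a small pre-test check. This is a sketch under assumptions: the function name `missing_backends` and where it would be called from are hypothetical, not existing project code.

```python
import importlib.util


def missing_backends(required=("chromadb", "lancedb")):
    """Return the names of required top-level packages that cannot be found.

    importlib.util.find_spec locates a top-level module without importing it,
    so this check is cheap and has no side effects from the packages themselves.
    """
    return [name for name in required if importlib.util.find_spec(name) is None]
```

A CI step or pytest fixture could then run `assert not missing_backends(), missing_backends()`. Once the cleanup removes one backend, `required` shrinks to a single name and the check keeps guarding the survivor.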
Recommended Cleanup (TODO)
Option A: Consolidate to LanceDB (Recommended)
Why LanceDB:
- Simpler API
- No server needed
- Better for embeddings
- More modern
Steps:
- Migrate ChromaDB code to LanceDB
- Remove ChromaDB from requirements
- Test RAG functionality
- Delete old ChromaDB files
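A minimal sketch of the "Migrate ChromaDB code to LanceDB" step, covering the data side only. It assumes the ChromaDB data lives in a persistent on-disk collection and that a flat `id` / `vector` / `text` / `metadata` table layout is acceptable in LanceDB. The function names, paths, and column layout are hypothetical. The actual schema used by `rag_store/vector_store.py` and `src/rag/unified_rag.py` should be checked first. The imports inside `migrate_collection` are deferred so the pure conversion helper can be tested without either package installed.

```python
import json


def chroma_get_to_lance_rows(result):
    """Convert a ChromaDB ``collection.get(...)`` result into LanceDB row dicts.

    Expects ``result`` to contain ids, embeddings, documents and metadatas
    (i.e. fetched with include=["embeddings", "documents", "metadatas"]).
    Metadata is serialized to a JSON string so every row shares one flat schema.
    """
    ids = result["ids"]
    embeddings = result["embeddings"]
    documents = result["documents"] or [None] * len(ids)
    metadatas = result["metadatas"] or [None] * len(ids)
    rows = []
    for id_, emb, doc, meta in zip(ids, embeddings, documents, metadatas):
        rows.append({
            "id": id_,
            "vector": [float(x) for x in emb],  # plain floats; Chroma may return numpy arrays
            "text": doc or "",
            "metadata": json.dumps(meta or {}, sort_keys=True),
        })
    return rows


def migrate_collection(chroma_path, lance_path, name):
    """Copy one persistent ChromaDB collection into a LanceDB table of the same name."""
    # Deferred imports: the helper above can be tested without either package.
    import chromadb
    import lancedb

    client = chromadb.PersistentClient(path=chroma_path)
    collection = client.get_collection(name)
    result = collection.get(include=["embeddings", "documents", "metadatas"])
    rows = chroma_get_to_lance_rows(result)
    if not rows:
        raise ValueError(f"collection {name!r} is empty; nothing to migrate")

    db = lancedb.connect(lance_path)
    # mode="overwrite" makes the migration safe to re-run while iterating.
    table = db.create_table(name, data=rows, mode="overwrite")
    if table.count_rows() != collection.count():
        raise RuntimeError("row count mismatch after migration")
    return table
```

`collection.get()` with no `limit` loads the whole collection into memory. That is probably fine for a lessons index, but larger stores would need paging via `limit`/`offset`. Storing metadata as a JSON string gives up column-level filtering in exchange for a schema that doesn't break when records carry different metadata keys.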
Option B: Consolidate to ChromaDB
Why ChromaDB:
- More established
- Better documentation
- Larger community
Steps:
- Migrate LanceDB code to ChromaDB
- Remove LanceDB from requirements
- Test RAG functionality
- Delete old LanceDB files
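The "Test RAG functionality" step in either option can be made concrete with a side-by-side retrieval check. Run the same queries against the old and new backend before deleting anything, then flag queries whose top-k results diverge. This is a sketch: `search_old` / `search_new` are hypothetical callables that would wrap each backend's query code. The 0.8 threshold is an arbitrary starting point, not a measured value.

```python
def topk_overlap(old_ids, new_ids, k=5):
    """Fraction of the old backend's top-k ids that also appear in the new top-k."""
    old_top = list(old_ids)[:k]
    if not old_top:
        return 1.0  # nothing to compare against; treat as agreement
    new_top = set(list(new_ids)[:k])
    return sum(1 for i in old_top if i in new_top) / len(old_top)


def compare_backends(queries, search_old, search_new, k=5, threshold=0.8):
    """Return (query, overlap) pairs whose top-k overlap falls below threshold.

    search_old / search_new: hypothetical callables mapping a query string
    to a ranked list of document ids.
    """
    failures = []
    for q in queries:
        score = topk_overlap(search_old(q), search_new(q), k)
        if score < threshold:
            failures.append((q, score))
    return failures
```

Set overlap is used instead of exact ordering because two vector stores can legitimately rank near-ties differently. If the same embeddings and distance metric are used on both sides, large drops in overlap usually point to a migration or query-code bug.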
Files to Review
ChromaDB users:
- rag_store/vector_store.py
- src/rag/unified_rag.py
LanceDB users:
- src/rag/lessons_indexer.py
- src/rag/lightweight_rag.py
- src/rag/lessons_search.py
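To keep this file list current during the cleanup, and to confirm that no imports of the removed backend remain afterwards, one option is a small import scanner. The sketch below uses the standard-library `ast` module, so it also catches imports inside functions. It does not see dynamic imports such as `importlib.import_module("chromadb")`. The function names are hypothetical.

```python
import ast
from pathlib import Path

BACKENDS = {"chromadb", "lancedb"}


def backends_imported(source):
    """Return the vector-DB packages imported anywhere in a module's source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from X import ..." only
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in BACKENDS:
                found.add(top)
    return found


def scan_repo(root="."):
    """Map each .py file under root to the backends it imports (others are skipped)."""
    report = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            hits = backends_imported(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that are not valid UTF-8 Python 3 source
        if hits:
            report[str(path)] = sorted(hits)
    return report
```

After the cleanup, `scan_repo()` returning only the surviving backend (or nothing at all for the removed one) is a quick sanity check before deleting the package from the requirements file.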
Priority
- P2 - Not urgent but creates technical debt
- Clean up in the next sprint (Week 4)
Tags
rag, technical_debt, chromadb, lancedb, cleanup