A community-driven registry for Claude, Cursor, Windsurf, Cline & more. Not affiliated with Anthropic.
Are you the author? Sign in to claim
Local Retrieval-Augmented Generation (RAG) plugin for Claude Code that combines Chroma db vector embeddings with intelli
Just a heads-up, turns out Anthropic / Claude does not like it when you avoid token usage cost by routing traffic to the CLI tool from them. This shadow banned me from their platform when I was on their $200 a month plan. They refuse to respond after months of submitting an appeal, etc, and no project I worked on violated any aspect of their Terms. After research, I see many people have been banned on similar cases. You have been warned.
Local Retrieval-Augmented Generation system for Claude Code with Multi-Agent Framework integration.
A production-ready Claude Code plugin that combines ChromaDB vector embeddings with intelligent document retrieval and Multi-Agent Framework (MAF) orchestration for context-aware development assistance.
Current Version: 2.0.0 Status: Production Ready (with known limitations documented in KNOWN_ISSUES.md)
Key Features:
Alternative Project: For a standalone CLI experience with extended features, see dt-cli. Both projects are actively maintained and can be used together.
RAG-CLI is a production-ready local Retrieval-Augmented Generation system that enhances your development workflow by providing instant access to your project documentation, codebase context, and external resources. It works seamlessly with Claude Code as a native plugin, eliminating the need for external API calls while processing documents locally with enterprise-grade security and performance.
RAG-CLI runs efficiently on:
The easiest way to get RAG-CLI as a Claude Code plugin:
# In Claude Code terminal
/plugin marketplace add https://github.com/ItMeDiaTech/rag-cli.git
/plugin install rag-cli
Then restart Claude Code. The plugin will activate automatically with zero configuration.
Benefits:
/plugin update rag-cliFor development, testing, or custom configuration:
# Clone the repository
git clone https://github.com/ItMeDiaTech/rag-cli.git
cd rag-cli
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Verify installation
python -c "from rag_cli.core import embeddings; print('Installation successful!')"
For contributing to RAG-CLI:
# Clone and install in editable mode
git clone https://github.com/ItMeDiaTech/rag-cli.git
cd rag-cli
# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install with development dependencies
pip install -e ".[dev]"
# Configure MCP server for development mode
python scripts/configure_mcp.py
# Run tests to verify
pytest tests/
Important for Contributors: The configure_mcp.py script generates .mcp.json with absolute paths for your system. This file is gitignored and preserves any other MCP servers you have configured. You can re-run the script anytime if your project path changes.
If you installed manually and want to use it as a Claude Code plugin:
# From the RAG-CLI directory
python scripts/sync_plugin.py
# This will copy necessary files to:
# ~/.claude/plugins/marketplaces/rag-cli/
# Then restart Claude Code
No configuration needed. RAG-CLI auto-detects Claude Code environment:
# First time setup: Index your documents
/rag-project
# Or manually index
python scripts/index.py --input /path/to/docs
For development or testing outside Claude Code:
# Set environment variables
export ANTHROPIC_API_KEY="sk-ant-..."
export RAG_CLI_MODE="standalone"
export RAG_CLI_LOG_LEVEL="INFO"
# Index documents
python scripts/index.py --input data/documents
# Test retrieval
python scripts/retrieve.py "Your question here"
Edit config/default.yaml to customize:
# Model selection
embeddings:
model_name: sentence-transformers/all-MiniLM-L6-v2 # Fast, 384-dim
# Search parameters
retrieval:
top_k: 5 # Number of results
hybrid_ratio: 0.7 # 70% semantic, 30% keyword
rerank: true # Use cross-encoder reranking
# Claude settings (standalone only)
claude:
model: claude-haiku-4-5-20251001
max_tokens: 4096
temperature: 0.7
# Test plugin installation
/plugin
# Should show: RAG-CLI plugin is installed and loaded
# Test basic functionality
/search "test query"
# Check system status
python scripts/validate_plugin.py
Use Method 1 (Marketplace) for easiest setup.
Gather your documentation:
# Create documents directory
mkdir -p data/documents
# Copy your files
cp /path/to/docs/*.md data/documents/
cp /path/to/docs/*.pdf data/documents/
Supported formats: Markdown, PDF, DOCX, HTML, TXT
In Claude Code or terminal:
# Option 1: As Claude Code plugin (easiest)
/rag-project # Auto-indexes current project
# Option 2: Manual indexing
python scripts/index.py --input data/documents --output data/vectors
Ask Claude Code questions about your documents:
# In Claude Code
/search "How do I configure authentication?"
# Or directly ask Claude
"How do I configure authentication?"
# RAG-CLI will automatically enhance with context
# In Claude Code
/rag-enable
# Now all your questions will automatically get document context
Disable with: /rag-disable
Traditional workflow:
With RAG-CLI:
Real-world impact: Process 10x more questions per session.
RAG-CLI provides Claude with your actual documentation, code patterns, and project conventions:
Without RAG-CLI:
With RAG-CLI:
Stop mentally tracking:
RAG-CLI automatically provides this context, freeing your mind for actual problem-solving.
API Usage:
Time Savings:
Organizations using RAG-CLI report:
| Metric | Improvement |
|---|---|
| Development Speed | 30-40% faster completion |
| Code Quality | 25% fewer bugs in reviews |
| Documentation Accuracy | 90% vs 60% without context |
| Onboarding Time | 50% reduction |
| API Costs | Up to 60% savings |
RAG-CLI implements a sophisticated document retrieval pipeline:
Document Ingestion
core.constants)Embedding Generation
sentence-transformers/all-MiniLM-L6-v2Intelligent Retrieval
core.constants)Query Enhancement
Response Generation
Local Processing:
Performance Optimized:
Production Ready:
Frontend: Claude Code Plugin
|
Integration Layer: MCP Server + Hooks + Slash Commands
|
Retrieval: Hybrid Search Pipeline
|
ML/AI:
- Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
- Reranking: Cross-encoder (ms-marco-MiniLM-L-6-v2)
- Storage: ChromaDB (with HNSW indexing)
Document Processing:
- Parsing: LangChain + BeautifulSoup + PyPDF2 + python-docx
- Chunking: Semantic boundary detection
- Metadata: Automatic extraction
LLM Integration:
- Model: Claude Haiku (via Anthropic API)
- When plugin: Claude Code internal processing
- Streaming: For better perceived performance
Monitoring:
- Structured Logging: structlog
- Metrics: Prometheus-compatible
- TCP Server: Real-time status monitoring
API Integration
Bug Fixing
Code Review
Knowledge Base
Onboarding
Continuous Learning
Knowledge Synthesis
RAG-CLI supports three operation modes:
Set mode via environment variable:
export RAG_CLI_MODE="claude_code" # or "standalone" or "hybrid"
RAG-CLI/
src/
core/ # Core RAG pipeline
constants.py # Global configuration constants
embeddings.py # Sentence transformer integration
vector_store.py # ChromaDB vector operations
document_processor.py # Document chunking
retrieval_pipeline.py # Hybrid search
claude_integration.py # Claude API interface
monitoring/ # Observability
logger.py # Structured logging
tcp_server.py # Monitoring server
plugin/ # Claude Code integration
skills/ # Agent skills
commands/ # Slash commands
hooks/ # Event hooks
mcp/ # MCP server
scripts/ # CLI utilities
tests/ # Test suites
data/ # Documents and vectors
config/ # Configuration files
config/default.yaml)# Operation Mode
mode:
operation: hybrid # claude_code, standalone, or hybrid
claude_code:
format_context: true
include_metadata: true
max_context_length: 10000
# Embeddings
embeddings:
model_name: sentence-transformers/all-MiniLM-L6-v2
model_dim: 384
batch_size: 32
cache_enabled: true
# Vector Store
vector_store:
type: chromadb
index_type: hnsw # ChromaDB uses HNSW for efficient similarity search
save_path: data/vectors
# Retrieval
retrieval:
top_k: 5
hybrid_ratio: 0.7 # 70% vector, 30% keyword
rerank: true
reranker_model: cross-encoder/ms-marco-MiniLM-L-6-v2
# Claude (for standalone mode)
claude:
model: claude-haiku-4-5-20251001
max_tokens: 4096
temperature: 0.7
api_key_env: ANTHROPIC_API_KEY # Only needed for standalone
Environment Variable Protection:
Never Commit .env Files: The .env file contains sensitive API keys and should NEVER be committed to version control
.gitignoreconfig/templates/.env.template as a referenceFile Permissions: On Unix systems, ensure .env has restricted permissions:
chmod 600 .env # Read/write for owner only
API Key Storage:
.env file onlyos.getenv()Subprocess Security: RAG-CLI automatically sanitizes environment variables when spawning subprocesses, removing sensitive keys like:
ANTHROPIC_API_KEYTAVILY_API_KEYOPENAI_API_KEYConfiguration Files: User-specific configurations in config/ are gitignored. Only default templates in config/defaults/ and config/templates/ are version controlled.
/search [query] - Search indexed documents/rag-enable - Enable automatic RAG enhancement/rag-disable - Disable automatic RAG enhancement/rag-project - Analyze current project and index relevant documentation/update-rag - Synchronize RAG-CLI plugin filesAccess the RAG retrieval skill:
/skill rag-retrieval "Your question here"
RAG-CLI includes several hooks that enhance your Claude Code experience:
RAG-CLI integrates with the Multi-Agent Framework (MAF) to provide intelligent query routing:
The orchestrator automatically selects the best strategy based on query intent, providing faster and more accurate responses.
RAG-CLI provides structured, readable output for all operations:
Search Results Example:
# RAG Search Results
## Retrieval Results
Found: 5 relevant documents
Time: 145ms
## Retrieved Documents
**1. Getting Started Guide (score: 0.890)**
> This guide will help you get started with the installation process...
**2. Configuration Reference (score: 0.870)**
> The configuration file allows you to customize various aspects...
Orchestration Output Example:
## Query Processing
**Strategy:** parallel
**Intent:** troubleshooting
**Confidence:** 87.5%
**Documents:** 3
**MAF Agent:** debugger
The formatting system provides:
from src.core.document_processor import get_document_processor
from src.core.embeddings import get_embedding_model
from src.core.vector_store import get_vector_store
# Process documents
processor = get_document_processor()
documents = processor.process_directory("data/documents")
# Generate embeddings
model = get_embedding_model()
embeddings = model.encode_batch([doc["content"] for doc in documents])
# Store vectors
store = get_vector_store()
store.add_documents(documents, embeddings)
from src.core.retrieval_pipeline import HybridRetriever
# Initialize retriever
retriever = HybridRetriever(vector_store, embedding_model, config)
# Search
results = retriever.search("Your query", top_k=5)
from src.core.claude_integration import ClaudeAssistant
# Initialize assistant
assistant = ClaudeAssistant(config)
# Generate response
response = assistant.generate_response(query, retrieved_docs)
The monitoring server runs on port 9999 and accepts these commands:
STATUS - System health and statisticsMETRICS - Performance metricsLOGS - Recent log entriesHEALTH - Health check status# Check status
./scripts/monitor.ps1 -Command STATUS
# View metrics
./scripts/monitor.ps1 -Command METRICS
import socket
import json
def query_monitor(command):
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
s.connect(("localhost", 9999))
s.send(command.encode())
response = s.recv(4096).decode()
return json.loads(response)
status = query_monitor("STATUS")
print(status)
| Operation | Target | Typical |
|---|---|---|
| Vector Search | <100ms | 45ms |
| End-to-End | <5s | 3.2s |
| Embedding Generation | <500ms | 200ms |
| Document Processing | 0.5s/100 docs | 0.4s/100 docs |
# Run all tests
pytest
# Run with coverage
pytest --cov=src --cov-report=html
# Run specific test
pytest tests/test_core.py::TestEmbeddings
No results found:
ls data/vectors/--threshold 0.5Slow performance:
API errors (Standalone mode only):
tail -f logs/rag_cli.logMode detection issues:
python -c "from src.core.claude_code_adapter import get_adapter; print(get_adapter().get_mode_info())"export RAG_CLI_MODE="claude_code"export RAG_CLI_LOG_LEVEL=DEBUG
python scripts/retrieve.py "test query" --verbose
src/core/ - Core RAG components (includes constants.py for centralized configuration)src/monitoring/ - Logging and metricssrc/plugin/ - Claude Code integrationscripts/ - CLI utilitiestests/ - Test suitesconfig/ - Configuration filesRAG-CLI uses a centralized constants module (core.constants) for all tunable parameters:
This design makes it easy to tune performance without modifying code throughout the codebase.
black for formattingMIT License - see LICENSE file for details.
Built with focus on performance, accuracy, and developer experience.
Native macOS app to monitor Claude AI usage limits and watch your coding sessions live
npx CLI installing 100+ agents, commands, hooks, and integrations in one command
干净、强大、属于你的 AI Agent 平台 --AI agents, without the clutter.
Pocket Flow: Codebase to Tutorial