New 5/6/26: 15 property graph databases total: 8 supported on both LlamaIndex and LangChain, 1 LI-only (Google Cloud Spanner Graph), 6 LC-only (ArangoDB, Apache AGE, Azure Cosmos DB for Gremlin, Apache HugeGraph, SurrealDB, TigerGraph). AWS Neptune RDF/SPARQL added. All 10 vector databases, all 3 search engines, and all LLM/embedding providers work with both LlamaIndex and LangChain. Every pipeline stage (chunking, KG extraction, graph write, vector write, search write, and retrieval fusion) can be configured independently. (Data source reading is LlamaIndex only; RDF stores use framework-independent adapters with LangChain Text-to-SPARQL retrieval.)

New: Flexible GraphRAG now supports RDF-based ontologies for both property graph databases and RDF triple store databases (Graphwise Ontotext GraphDB, Fuseki, and Oxigraph). Document ingestion with KG extraction, auto incremental data source change detection, and UI search (hybrid search, AI query, and AI chat) are all supported with both database types.

New: Flexible GraphRAG supports automatic incremental updates (Optional) from most data sources, keeping your Vector, Search and Graph databases synchronized in real-time or near real-time.

New: KG Spaces Integration of Flexible GraphRAG in Alfresco ACA Client

Flexible GraphRAG

Flexible GraphRAG is an open source AI context platform supporting a document processing pipeline (Docling or LlamaParse), knowledge graph auto-building, ontologies, schemas, many LLM providers, GraphRAG and RAG, hybrid semantic search (fulltext, vector, property graph, RDF/SPARQL), AI query, and AI chat. The backend is Python with LlamaIndex and LangChain as peer frameworks. LlamaIndex is the default for each pipeline stage; LangChain can be selected per stage in environment configuration. The API is a REST FastAPI service. Angular, React, and Vue TypeScript frontends and an MCP server are included. The stack supports 13 data sources (9 with incremental auto-sync), 15 property graph databases, 4 RDF triple stores (Apache Jena Fuseki, Ontotext GraphDB, Oxigraph, Amazon Neptune RDF), 10 vector databases, OpenSearch / Elasticsearch / BM25 search, and Alfresco. Services and dashboards can be enabled with the provided Docker Compose layout.

Flexible GraphRAG data sources, processing tab, auto-sync document states in Postgres, Neo4j

v0.6.0 in brief

Version 0.6.0 broadens framework and database choice: LangChain is a full peer to LlamaIndex (per-stage env pickers for chunking, vector, search, property graph, KG extraction, fusion). 15 property graph backends: 8 on both frameworks, Google Cloud Spanner (LlamaIndex-only), 6 LangChain-only (ArangoDB, Apache AGE, Azure Cosmos DB for Gremlin, HugeGraph, SurrealDB, TigerGraph). RDF includes Apache Jena Fuseki, Ontotext GraphDB, Oxigraph, and Amazon Neptune RDF. Incremental delete, LangChain adapters, and cleanup paths were extended across stores (see CHANGELOG.md).

Features

Hybrid Search: Configurable hybrid search combining vector search, full-text search, property-graph GraphRAG, and SPARQL against RDF stores.
Knowledge Graph GraphRAG: Extracts entities and relationships from documents to build graphs in property graph databases and RDF stores. Optional schemas and ontologies guide extraction or act as a starting point for the LLM to extend.
RDF/Ontology Support: Load OWL/RDFS ontologies to guide KG extraction into any property graph or RDF store; SPARQL 1.1 queries; RDF 1.2 triple annotations; full UI pipeline (ingest, hybrid search, AI query/chat, incremental auto-sync). See Ontology and RDF Support below.
15 Property Graph Databases: 8 on both LI+LC (Neo4j, ArcadeDB, FalkorDB, Ladybug, Memgraph, NebulaGraph, Amazon Neptune, Neptune Analytics), 1 LI-only (Google Cloud Spanner), 6 LC-only (ArangoDB, Apache AGE, Cosmos Gremlin, HugeGraph, SurrealDB, TigerGraph) — with KG extraction, hybrid search, and AI query/chat
4 RDF Triple Stores: Apache Jena Fuseki, Ontotext GraphDB, Oxigraph, Amazon Neptune RDF.
10 Vector Databases: Qdrant, Elasticsearch, OpenSearch, Neo4j, Chroma, Milvus, Weaviate, Pinecone, PostgreSQL pgvector, LanceDB — for semantic similarity search
3 Search Databases: Elasticsearch, OpenSearch, BM25 (built-in) — for full-text search and hybrid ranking
LLM providers (KG extraction & chat): Ollama, OpenAI, Azure OpenAI, Google Gemini, Anthropic Claude, Google Vertex AI, Amazon Bedrock, Groq, Fireworks AI, OpenAI-compatible endpoints (openai_like), OpenRouter, LiteLLM proxy, and vLLM — configurable via LLM_PROVIDER; see Supported LLM Providers
Embedding providers: OpenAI, Ollama, Azure OpenAI, Google GenAI, Vertex AI, Bedrock, Fireworks, OpenAI-like (EMBEDDING_KIND=openai_like), and LiteLLM — see LLM Configuration
Dual-framework pipeline: LlamaIndex and LangChain are first-class choices for chunking, vector and search adapters, property graphs, KG extraction, RDF text-to-SPARQL retrieval, and hybrid fusion—each stage can be set independently (LlamaIndex defaults). See Framework Configuration.
Multi-Source Ingestion: Processes documents from 13 data sources (9 with incremental auto-sync), spanning file upload, cloud storage, enterprise repositories, and web sources, with Docling (default) or LlamaParse (cloud API) document parsing.
Observability: Built-in OpenTelemetry instrumentation with automatic LlamaIndex tracing, Prometheus metrics, Jaeger traces, and Grafana dashboards for production monitoring
FastAPI Server with REST API: Python-based FastAPI server with REST APIs for document ingestion, hybrid search, AI query, and AI chat.
MCP Server: MCP server providing Claude Desktop and other MCP clients with tools for document and text ingestion (all 13 data sources, 9 of which support incremental auto-sync), hybrid search, and AI query. Uses the FastAPI backend REST APIs.
UI Clients: Angular, React, and Vue UI clients support choosing the data source (filesystem, Alfresco, CMIS, etc.), ingesting documents, performing hybrid searches, AI queries, and AI chat. The UI clients use the REST APIs of the FastAPI backend.
Docker Deployment Flexibility: Supports both standalone and Docker deployment modes. Docker infrastructure provides modular database selection via docker-compose includes - vector, graph, search engines, and Alfresco can be included or excluded with a single comment. Choose between hybrid deployment (databases in Docker, backend and UIs standalone) or full containerization.

Frontend Screenshots

Screenshots (not reproduced here) show the tabbed interface of each frontend, with Sources, Processing, Search, and Chat tabs: Angular (light theme), React (dark and light themes), and Vue (light theme).

System Components

FastAPI Backend (`/flexible-graphrag`)

REST API Server: Provides endpoints for document ingestion, search, and AI query/chat
Hybrid Search Engine: Combines vector similarity (RAG), fulltext (BM25), and graph traversal (GraphRAG)
Document Processing: Advanced document conversion with Docling and LlamaParse integration
Configurable Architecture: Environment-based configuration for all components
Async Processing: Background task processing with real-time progress updates
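
The Hybrid Search Engine item above merges ranked results from several retrievers. This README does not say which fusion algorithm Flexible GraphRAG applies, so the following is only an illustrative sketch of one widely used technique, reciprocal rank fusion (RRF). The function name, the constant `k=60`, and the document ids are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked highly by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from three retrievers.
vector_hits = ["a", "b", "c"]
fulltext_hits = ["b", "a", "d"]
graph_hits = ["b", "c"]
fused = reciprocal_rank_fusion([vector_hits, fulltext_hits, graph_hits])
```

RRF only needs ranks, not comparable scores, which is why it is a common choice when vector, full-text, and graph retrievers each score on a different scale.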

MCP Server (`/flexible-graphrag-mcp`)

MCP Client support: Model Context Protocol server for Claude Desktop and other MCP clients
Full API Parity: Tools like ingest_documents() support all 13 data sources with source-specific configs (filesystem, repositories such as Alfresco, SharePoint, Box, and CMIS, cloud storage, and web). A skip_graph flag is available for all data sources, and a paths parameter for filesystem, Alfresco, and CMIS. Alfresco also accepts a nodeDetails list (multi-select for KG Spaces).
Additional Tools: search_documents(), query_documents(), ingest_text(), system diagnostics, and health checks
Dual Transport: HTTP mode for debugging, stdio mode for production
Tool Suite: 9 specialized tools for document processing, search, and system management
Multiple Installation: pipx system installation or uvx no-install execution

UI Clients (`/flexible-graphrag-ui`)

Angular Frontend: Material Design with TypeScript
React Frontend: Modern React with Vite and TypeScript
Vue Frontend: Vue 3 Composition API with Vuetify and TypeScript
Unified Features: All clients support the 4 tab views, async processing, progress tracking, and cancellation

Docker Infrastructure (`/docker`)

Modular Database Selection: Include/exclude vector, graph, and search engines, and Alfresco with single-line comments
Flexible Deployment: Hybrid mode (databases in Docker, apps standalone) or full containerization
NGINX Reverse Proxy: Unified access to all services with proper routing
Built-in Database Dashboards: Most database containers also provide built-in web dashboards (Neo4j Browser, ArcadeDB, FalkorDB, OpenSearch, etc.)
Separate Dashboards: Additional dashboard containers are provided, including Kibana for Elasticsearch and the optional Ladybug Explorer (see docker/includes/ladybug-explorer.yaml).

Data Sources

Flexible GraphRAG supports 13 different data sources for ingesting documents into your knowledge base:

File & Upload Sources

File Upload - Direct file upload through web interface with drag & drop support

Cloud Storage Sources

Amazon S3 - AWS S3 bucket integration
Google Cloud Storage (GCS) - Google Cloud storage buckets
Azure Blob Storage - Microsoft Azure blob containers
OneDrive - Microsoft OneDrive personal/business storage
Google Drive - Google Drive file storage

Enterprise Repository Sources

Alfresco - Alfresco ECM/content repository with two integration options:
- KG Spaces ACA Extension - Integrates the Flexible GraphRAG Angular UI as an extension plugin within the Alfresco Content Application (ACA), enabling multi-select document/folder ingestion with nodeIds directly from the Alfresco interface
- Flexible GraphRAG Alfresco Data Source - Direct integration using Alfresco paths (e.g., /Shared/GraphRAG, /Company Home/Shared/GraphRAG, or /Shared/GraphRAG/cmispress.txt)
SharePoint - Microsoft SharePoint document libraries
Box - Box.com cloud storage
CMIS (Content Management Interoperability Services) - Industry-standard content repository interface

Web Sources

Web Pages - Extract content from web URLs
Wikipedia - Ingest Wikipedia articles by title or URL
YouTube - Process YouTube video transcripts

Each data source includes:

Configuration Forms: Easy-to-use interfaces for credentials and settings
Progress Tracking: Real-time per-file progress indicators
Flexible Authentication: Support for various auth methods (API keys, OAuth, service accounts)

Incremental Updates & Auto-Sync

Flexible GraphRAG optionally supports automatic incremental updates from most data sources, keeping your vector, search, and graph databases synchronized in real time or near real time:

Data Source	Auto-Sync Support	Detection Method	Status	Notes
Alfresco	✅ Real-time	Community ActiveMQ	Ready	Enterprise Event Gateway planned
Amazon S3	✅ Real-time	SQS event notifications	Ready
Azure Blob Storage	✅ Real-time	Change feed	Ready
Google Cloud Storage	✅ Real-time	Pub/Sub notifications	Ready
Google Drive	✅ Near real-time	Changes API (polling)	Ready
OneDrive	✅ Near real-time	Polling	Ready	Delta query support planned
SharePoint	✅ Near real-time	Polling	Ready	Delta query support planned
Box	✅ Near real-time	Events API (polling)	Ready
Local Filesystem	✅ Real-time	OS events (watchdog)	Ready	REST API and MCP Server only
File Upload UI, CMIS, Web Pages, Wikipedia, YouTube	➖ Not supported	-	-	No support for incremental updates

Features:

Modification Date Tracking: Uses file modification timestamps (ordinal) to detect changes
Content Hash Optimization: Skips reprocessing when a file's modification date has changed but its content has not
Dual Mechanism: Event-driven streams (real-time) + periodic polling fallback
LlamaIndex Integration: Uses proper abstractions for all databases
UI, REST API, MCP Server: An auto-update data source location can be set up through any of the three UIs, the REST API, or the MCP server
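
The modification date tracking and content hash optimization listed above can be sketched as a two-step check: compare timestamps first, and hash the content only when the timestamp has moved. This is a minimal illustration, not the project's implementation. `DocState`, `content_hash`, and `needs_reprocessing` are hypothetical names, and SHA-256 is an assumed hash choice.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocState:
    """Hypothetical record of what was last ingested for one document."""
    modified_at: float   # modification timestamp stored at last ingest
    content_hash: str    # hash of the content stored at last ingest


def content_hash(data: bytes) -> str:
    """Hex SHA-256 digest of the document bytes (assumed hash choice)."""
    return hashlib.sha256(data).hexdigest()


def needs_reprocessing(stored: Optional[DocState], modified_at: float, data: bytes) -> bool:
    """Return True only when the document is new or its content really changed."""
    if stored is None:
        return True    # never ingested before
    if modified_at == stored.modified_at:
        return False   # timestamp unchanged: skip without hashing
    # Timestamp moved: hash the content to see whether anything actually changed.
    return content_hash(data) != stored.content_hash
```

A file that was only touched (new timestamp, identical bytes) returns False here, which is the case the content hash optimization is meant to catch.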

Setup Requirements:

Enable incremental updates in your .env file:

```bash
ENABLE_INCREMENTAL_UPDATES=true

# PostgreSQL database for state management
# By default, uses the pgvector database from docker-compose.yaml
POSTGRES_INCREMENTAL_URL=postgresql://postgres:password@localhost:5433/postgres
```

Note: The incremental updates system uses PostgreSQL to track document state. The docker-compose.yaml includes a pgvector container that can be used both as a vector database option and for incremental updates state management. The database connection creates the necessary tables automatically on first use.
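
As a quick sanity check of the two settings above, the following sketch reads them and splits the connection URL into its parts using only the Python standard library. The function name `incremental_settings` and the returned dictionary layout are assumptions for illustration. The backend's own loader may differ.

```python
import os
from urllib.parse import urlsplit


def incremental_settings(env=os.environ):
    """Return None when incremental updates are disabled,
    otherwise the parsed parts of POSTGRES_INCREMENTAL_URL."""
    if env.get("ENABLE_INCREMENTAL_UPDATES", "false").strip().lower() != "true":
        return None
    url = urlsplit(env["POSTGRES_INCREMENTAL_URL"])
    return {
        "host": url.hostname,
        "port": url.port,
        "user": url.username,
        "password": url.password,
        "database": url.path.lstrip("/"),
    }
```

Passing a plain dictionary instead of `os.environ` makes it easy to verify a `.env` value before starting the backend.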

Usage:

Check the "Enable auto change sync" checkbox in the Processing tab when configuring your data source
For S3: Also provide the "SQS Queue URL" for event notifications
For GCS: Also provide the "Pub/Sub Subscription Name" for real-time updates

PostgreSQL for State Management:

The docker/includes/postgres-pgvector.yaml sets up two databases automatically on first start: flexible_graphrag (for optional pgvector vector storage) and flexible_graphrag_incremental (for incremental update state management, with its schema created automatically). pgAdmin is also configured at http://localhost:5050 with both databases pre-registered. When prompted, enter the master password `admin`, then enter `password` as the server connection password and save it. See docs/DATABASES/POSTGRES-SETUP.md for details.

Documentation:

System overview: docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/README.md
Quick start: docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/QUICKSTART.md
Detailed setup: docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/SETUP-GUIDE.md
API reference: docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/API-REFERENCE.md
PostgreSQL setup: docs/DATABASES/POSTGRES-SETUP.md

Scripts:

scripts/incremental/sync-now.sh|.ps1|.bat - Trigger immediate synchronization
scripts/incremental/set-refresh-interval.sh|.ps1|.bat - Configure polling interval
scripts/incremental/TIMING-CONFIGURATION.md - Timing configuration details
scripts/incremental/README.md - Script usage documentation

Document Processing Options

All data sources support two document parser options:

Docling (Default):

Open-source, local processing
Free with no API costs
GPU acceleration supported (CUDA/Apple Silicon) for 5-10x faster processing
Built-in OCR for scanned documents and images — DOCLING_OCR=true + DOCLING_OCR_ENGINE=auto|rapidocr|easyocr|tesseract_cli|tesserocr|ocrmac
Multi-language support (English, German, French, Spanish, Czech, Russian, Chinese, Japanese, etc.)
Configured via: DOCUMENT_PARSER=docling
DOCLING_DEVICE=auto|cpu|cuda|mps — control GPU vs CPU processing
SAVE_PARSING_OUTPUT=true — save intermediate parsing results for inspection (works for both parsers)
PARSER_FORMAT_FOR_EXTRACTION=auto|markdown|plaintext — control format used for knowledge graph extraction
See Docling GPU + OCR Configuration Guide for setup details | Quick Reference

LlamaParse:

Cloud-based API service with advanced AI
Multimodal parsing with Claude Sonnet 3.5
Three modes available:
- parse_page_without_llm - 1 credit/page
- parse_page_with_llm - 3 credits/page (default)
- parse_page_with_agent - 10-90 credits/page
Configured via: DOCUMENT_PARSER=llamaparse + LLAMAPARSE_API_KEY
Get your API key from LlamaCloud
New: SAVE_PARSING_OUTPUT=true - Save parsed output and metadata for inspection
New: PARSER_FORMAT_FOR_EXTRACTION=auto|markdown|plaintext - Control format used for knowledge graph extraction

Supported File Formats

Document Formats

PDF: .pdf
- Docling: Advanced layout analysis, table extraction, formula recognition, configurable OCR (EasyOCR, Tesseract, RapidOCR)
- LlamaParse: Automatic OCR within parsing pipeline, multimodal vision processing
Microsoft Office: .docx, .xlsx, .pptx and legacy formats (.doc, .xls, .ppt)
- Docling: DOCX, XLSX, PPTX structure preservation and content extraction
- LlamaParse: Full Office suite support including legacy formats and hundreds of variants
Web Formats: .html, .htm, .xhtml
- Docling: HTML/XHTML markup structure analysis
- LlamaParse: HTML/XHTML content extraction and formatting
Data Formats: .csv, .tsv, .json, .xml
- Docling: CSV structured data processing
- LlamaParse: CSV, TSV, JSON, XML with enhanced table understanding
Documentation: .md, .markdown, .asciidoc, .adoc, .rtf, .txt, .epub
- Docling: Markdown, AsciiDoc technical documentation with markup preservation
- LlamaParse: Extended format support including RTF, EPUB, and hundreds of text format variants

Image Formats

Standard Images: .png, .jpg, .jpeg, .gif, .bmp, .webp, .tiff, .tif
- Docling: OCR text extraction with configurable OCR backends (EasyOCR, Tesseract, RapidOCR)
- LlamaParse: Automatic OCR with multimodal vision processing and context understanding

Audio Formats

Audio Files: .wav, .mp3, .mp4, .m4a
- Docling: Automatic speech recognition (ASR) support
- LlamaParse: Transcription and content extraction for MP3, MP4, MPEG, MPGA, M4A, WAV, WEBM

Processing Intelligence

Parser Selection:
- Docling (default, free): Local processing with specialized CV models (DocLayNet layout analysis, TableFormer for tables), configurable OCR backends (EasyOCR/Tesseract/RapidOCR), optional local VLM support (Granite-Docling, SmolDocling, Qwen2.5-VL, Pixtral)
- LlamaParse (cloud API, 3 credits/page): Automatic OCR in parsing pipeline, supports hundreds of file formats, fast mode (OCR-only), default mode (proprietary LlamaCloud model), premium mode (proprietary VLM mixture), multimodal mode (bring your own API keys: OpenAI GPT-4o, Anthropic Claude 3.5/4.5 Sonnet, Google Gemini 1.5/2.0, Azure OpenAI)
Output Formats:
- Flexible GraphRAG saves both markdown and plaintext, then automatically selects which to use for processing (knowledge graph extraction, vector embeddings, and search indexing) - defaults to markdown for tables, plaintext for text-heavy docs - override with PARSER_FORMAT_FOR_EXTRACTION
- Docling supports: Markdown, JSON (lossless with bounding boxes and provenance), HTML, plain text, and DocTags (specialized markup preserving multi-column layouts, mathematical formulas, and code blocks)
- LlamaParse supports: Markdown, plain text, raw JSON, XLSX (extracted tables), PDF, images (extracted separately), and structured output (beta - enforces custom JSON schema for strict data model extraction)
Format Detection: Automatic routing based on file extension and content analysis
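
The automatic choice between markdown and plaintext described under Output Formats can be pictured with a small sketch. The table-detection heuristic below (looking for a Markdown table separator row) is an assumption made for illustration. The README only states that markdown is preferred for tables and plaintext for text-heavy documents. The function name is hypothetical, and the `override` values mirror PARSER_FORMAT_FOR_EXTRACTION.

```python
def choose_extraction_format(markdown: str, plaintext: str, override: str = "auto") -> str:
    """Return the text to use downstream, honoring an explicit override."""
    if override == "markdown":
        return markdown
    if override == "plaintext":
        return plaintext
    # "auto": treat the document as table-heavy when it contains a Markdown
    # table separator row such as |---|---| (assumed heuristic).
    has_table = any(
        line.strip().startswith("|") and set(line.strip()) <= set("|-: ")
        for line in markdown.splitlines()
    )
    return markdown if has_table else plaintext
```

Calling it with `override="plaintext"` corresponds to setting PARSER_FORMAT_FOR_EXTRACTION=plaintext, which bypasses the heuristic entirely.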

Database Configuration

Flexible GraphRAG uses three types of databases for its hybrid search capabilities. Each can be configured independently via environment variables.

Search Databases (Full-Text Search)

Set SEARCH_DB to select the store and SEARCH_BACKEND=llamaindex or langchain for the framework.

BM25 (Built-in): Local in-memory BM25 full-text search with TF-IDF ranking
- Dashboard: None (file-based)
- Configuration:
```bash
SEARCH_DB=bm25
BM25_SEARCH_DB_CONFIG={"persist_dir": "./bm25_index"}
```
Elasticsearch: Enterprise search engine with advanced analyzers, faceted search, and real-time analytics
- Dashboard: Kibana (http://localhost:5601)
- Configuration:
```bash
SEARCH_DB=elasticsearch
ELASTICSEARCH_SEARCH_DB_CONFIG={"hosts": ["http://localhost:9200"], "index_name": "hybrid_search"}
```
OpenSearch: AWS-led open-source fork with native hybrid scoring (vector + BM25) and k-NN algorithms
- Dashboard: OpenSearch Dashboards (http://localhost:5601)
- Configuration:
```bash
SEARCH_DB=opensearch
OPENSEARCH_SEARCH_DB_CONFIG={"hosts": ["http://localhost:9201"], "index_name": "hybrid_search"}
```
None: Disable full-text search (vector search only)
- Configuration:
```bash
SEARCH_DB=none
```
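
The settings above follow a pattern: SEARCH_DB names the store, and a matching `<NAME>_SEARCH_DB_CONFIG` variable carries a JSON object. A minimal sketch of resolving and validating that pair is shown below. The function name and the fallback to `none` when SEARCH_DB is unset are assumptions, not the backend's documented behavior.

```python
import json
import os


def load_search_config(env=os.environ):
    """Resolve SEARCH_DB and parse its matching <NAME>_SEARCH_DB_CONFIG JSON value."""
    store = env.get("SEARCH_DB", "none").strip().lower()  # assumed default
    if store == "none":
        return store, {}
    key = f"{store.upper()}_SEARCH_DB_CONFIG"
    raw = env.get(key, "{}")
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{key} is not valid JSON: {exc}") from exc
    if not isinstance(config, dict):
        raise ValueError(f"{key} must be a JSON object")
    return store, config
```

Because the values are JSON, keys and strings need double quotes. A single-quoted or truncated value fails here with a clear error instead of surfacing later as a connection problem.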

Vector Databases (Semantic Search)

Set VECTOR_DB to select the store and VECTOR_BACKEND=llamaindex or langchain for the framework.

When switching embedding models, delete existing vector indexes — dimensions differ by provider. See docs/DATABASES/VECTOR-DATABASES/VECTOR-DIMENSIONS.md for cleanup instructions.
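
To illustrate why an existing index has to be deleted, the sketch below compares the dimension an index was created with against the output size of the newly selected model. The model names and the helper are illustrative and not taken from this README. The sizes are the published output dimensions for those models, and VECTOR-DIMENSIONS.md remains the reference for this project.

```python
# Published output sizes for a few embedding models (illustrative selection).
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "nomic-embed-text": 768,
}


def index_must_be_rebuilt(index_dimension: int, new_model: str) -> bool:
    """True when vectors from new_model cannot be stored in the existing index."""
    return EMBEDDING_DIMENSIONS[new_model] != index_dimension
```

An index built with 1536-dimensional vectors cannot accept 768-dimensional ones, so moving between such models requires dropping and recreating the index.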

Supported Vector Databases

Neo4j: Can be used as vector database with separate vector configuration
- Dashboard: Neo4j Browser (http://localhost:7474)
- Configuration:
```bash
VECTOR_DB=neo4j
NEO4J_VECTOR_DB_CONFIG={"uri": "bolt://localhost:7687", "username": "neo4j", "password": "your_password", "index_name": "hybrid_search_vector"}
```

Qdrant: Dedicated vector database with advanced filtering
- Dashboard: Qdrant Web UI (http://localhost:6333/dashboard)
- Configuration:
```bash
VECTOR_DB=qdrant
QDRANT_VECTOR_DB_CONFIG={"host": "localhost", "port": 6333, "collection_name": "hybrid_search"}
```
Elasticsearch: Can be used as vector database with separate vector configuration
- Dashboard: Kibana (http://localhost:5601)
- Configuration:
```bash
VECTOR_DB=elasticsearch
ELASTICSEARCH_VECTOR_DB_CONFIG={"hosts": ["http://localhost:9200"], "index_name": "hybrid_search_vectors"}
```
OpenSearch: Can be used as vector database with separate vector configuration
- Dashboard: OpenSearch Dashboards (http://localhost:5601)
- Configuration:
```bash
VECTOR_DB=opensearch
OPENSEARCH_VECTOR_DB_CONFIG={"hosts": ["http://localhost:9201"], "index_name": "hybrid_search_vectors"}
```

Chroma: Open-source vector database with dual deployment modes
- Dashboard: Swagger UI (http://localhost:8001/docs/) (HTTP mode)
- Configuration (Local Mode):
```bash
VECTOR_DB=chroma
CHROMA_VECTOR_DB_CONFIG={"persist_directory": "./chroma_db", "collection_name": "hybrid_search"}
```
- Configuration (HTTP Mode):
```bash
VECTOR_DB=chroma
CHROMA_VECTOR_DB_CONFIG={"host": "localhost", "port": 8001, "collection_name": "hybrid_search"}
```

Milvus: Cloud-native, scalable vector database for similarity search
- Dashboard: Attu (http://localhost:3003)
- Configuration:
```bash
VECTOR_DB=milvus
MILVUS_VECTOR_DB_CONFIG={"host": "localhost", "port": 19530, "collection_name": "hybrid_search"}
```
Weaviate: Vector search engine with semantic capabilities and data enrichment
- Dashboard: Weaviate Console (http://localhost:8081/console)
- Configuration:
```bash
VECTOR_DB=weaviate
WEAVIATE_VECTOR_DB_CONFIG={"url": "http://localhost:8081", "index_name": "HybridSearch"}
```
Pinecone: Managed vector database service optimized for real-time applications
- Dashboard: Pinecone Console (web-based)
- Configuration:
```bash
VECTOR_DB=pinecone
PINECONE_VECTOR_DB_CONFIG={"api_key": "your_api_key", "region": "us-east-1", "cloud": "aws", "index_name": "hybrid-search"}
```

PostgreSQL: Traditional database with pgvector extension for vector similarity search
- Dashboard: pgAdmin (http://localhost:5050)
- Configuration:
```bash
VECTOR_DB=postgres
POSTGRES_VECTOR_DB_CONFIG={"host": "localhost", "port": 5433, "database": "postgres", "username": "postgres", "password": "your_password"}
```

LanceDB: Modern, lightweight vector database designed for high-performance ML applications
- Dashboard: LanceDB Viewer (http://localhost:3005)
- Configuration:
```bash
VECTOR_DB=lancedb
LANCEDB_VECTOR_DB_CONFIG={"uri": "./lancedb", "table_name": "hybrid_search"}
```

RAG without GraphRAG

For faster document ingestion (no graph extraction) and hybrid search over full-text and vector results only, configure:

```bash
VECTOR_DB=qdrant       # Any vector store
SEARCH_DB=elasticsearch  # Any search engine
PG_GRAPH_DB=none
```

Property Graph Databases (Knowledge Graph / GraphRAG)

Set PG_GRAPH_DB to select the store and GRAPH_BACKEND=llamaindex or langchain for the framework where both are supported. LangChain-only stores (ArangoDB, Apache AGE, HugeGraph, SurrealDB, TigerGraph, Cosmos Gremlin) route property-graph ingestion and retrieval through LangChain adapters regardless of other env defaults. LlamaIndex-only stores (Spanner): when PG_GRAPH_DB=spanner, startup forces GRAPH_BACKEND=llamaindex and ignores GRAPH_BACKEND=langchain.

Neo4j Property Graph: Primary knowledge graph storage with Cypher querying
- Dashboard: Neo4j Browser (http://localhost:7474)
- Configuration:
```bash
PG_GRAPH_DB=neo4j
NEO4J_GRAPH_DB_CONFIG={"uri": "bolt://localhost:7687", "username": "neo4j", "password": "your_password"}
```

ArcadeDB: Multi-model database supporting graph, document, key-value, and search with SQL and Cypher
- Dashboard: ArcadeDB Studio (http://localhost:2480)
- Configuration:
```bash
PG_GRAPH_DB=arcadedb
ARCADEDB_GRAPH_DB_CONFIG={"host": "localhost", "port": 2480, "username": "root", "password": "password", "database": "flexible_graphrag", "query_language": "sql"}
```

FalkorDB: High-performance graph database using GraphBLAS; purpose-built for LLM / GraphRAG
- Dashboard: FalkorDB Browser (http://localhost:3001)
- Configuration:
```bash
PG_GRAPH_DB=falkordb
FALKORDB_GRAPH_DB_CONFIG={"url": "falkor://localhost:6379", "database": "falkor"}
```
Ladybug: Embedded property graph database (Cypher, single .lbug file) with optional structured schema and HNSW vector index on chunks; Explorer UI via Docker (port 7003)
- Configuration:
```bash
PG_GRAPH_DB=ladybug
LADYBUG_GRAPH_DB_CONFIG={"db_dir": "./ladybug", "db_file": "database.lbug", "use_vector_index": true, "has_structured_schema": false, "strict_schema": false}
```
Memgraph: Real-time graph database with streaming support and advanced graph algorithms
- Dashboard: Memgraph Lab (http://localhost:3002)
- Configuration:
```bash
PG_GRAPH_DB=memgraph
MEMGRAPH_GRAPH_DB_CONFIG={"url": "bolt://localhost:7687", "username": "", "password": ""}
```
NebulaGraph: Distributed graph database for large-scale data with horizontal scalability
- Dashboard: NebulaGraph Studio (http://localhost:7001)
- Configuration:
```bash
PG_GRAPH_DB=nebula
NEBULA_GRAPH_DB_CONFIG={"space": "flexible_graphrag", "host": "localhost", "port": 9669, "username": "root", "password": "nebula"}
```
Amazon Neptune: Fully managed graph database service supporting property graph and RDF models
- Dashboard: Graph-Explorer (http://localhost:3007) or Neptune Workbench (AWS Console)
- Configuration:
```bash
PG_GRAPH_DB=neptune
NEPTUNE_GRAPH_DB_CONFIG={"host": "your-cluster.region.neptune.amazonaws.com", "port": 8182}
```
Amazon Neptune Analytics: Serverless graph analytics with openCypher support
- Dashboard: Graph-Explorer (http://localhost:3007) or Neptune Workbench (AWS Console)
- Configuration:
```bash
PG_GRAPH_DB=neptune_analytics
NEPTUNE_ANALYTICS_GRAPH_DB_CONFIG={"graph_identifier": "g-xxxxx", "region": "us-east-1"}
```
Google Cloud Spanner Graph (LlamaIndex only): Managed relational + property graph (GQL). Uses llama-index-spanner — install with uv pip install -e ".[spanner-extras]" then uv pip uninstall llama-index (see Optional under Prerequisites). LangChain is not supported for this store (langchain-google-spanner pins incompatible langchain-core).
- Setup: docs/DATABASES/GRAPH-DATABASES/SPANNER-SETUP.md
- Configuration:
```bash
PG_GRAPH_DB=spanner
# GRAPH_BACKEND=llamaindex is forced for Spanner (LlamaIndex-only); langchain is ignored
SPANNER_GRAPH_DB_CONFIG={"project_id": "my-gcp-project", "instance_id": "my-spanner-instance", "database_id": "my-database", "graph_name": "knowledge_graph", "credentials_file": "./gcs.json"}
```

ArangoDB (LangChain only): Multi-model database with AQL graph queries
- Dashboard: ArangoDB Web UI (http://localhost:8529)
- Configuration:
```bash
PG_GRAPH_DB=arangodb
ARANGODB_GRAPH_DB_CONFIG={"url": "http://localhost:8529", "database": "flexible_graphrag", "username": "root", "password": "password"}
```

Apache AGE (LangChain only): PostgreSQL extension for graph data via Cypher
- Dashboard: pgAdmin (http://localhost:5050)
- Configuration:
```bash
PG_GRAPH_DB=apache_age
APACHE_AGE_GRAPH_DB_CONFIG={"host": "localhost", "port": 5434, "database": "flexible_graphrag_age", "username": "postgres", "password": "password", "graph_name": "knowledge_graph"}
```

HugeGraph (LangChain only): Distributed graph database with Gremlin and openCypher
- Dashboard: HugeGraph Hubble (http://localhost:8085)
- Configuration:
```bash
PG_GRAPH_DB=hugegraph
HUGEGRAPH_GRAPH_DB_CONFIG={"host": "localhost", "port": 8082, "database": "hugegraph"}
```

SurrealDB (LangChain only): Multi-model database with SurrealQL graph queries
- Dashboard: Surrealist (http://localhost:8011)
- Configuration:
```bash
PG_GRAPH_DB=surrealdb
SURREALDB_GRAPH_DB_CONFIG={"url": "ws://localhost:8010/rpc", "namespace": "test", "database": "flexible_graphrag", "username": "root", "password": "root"}
```

TigerGraph (LangChain only): Distributed graph database with GSQL
- Dashboard: GraphStudio (http://localhost:14240)
- Configuration:
```bash
PG_GRAPH_DB=tigergraph
TIGERGRAPH_GRAPH_DB_CONFIG={"host": "http://localhost", "port": 14240, "restpp_port": 9002, "database": "MyGraph", "username": "tigergraph", "password": "tigergraph"}
```

Cosmos Gremlin (LangChain only): Azure Cosmos DB for Gremlin API
- Configuration:
```bash
PG_GRAPH_DB=cosmos_gremlin
COSMOS_GREMLIN_GRAPH_DB_CONFIG={"url": "ws://localhost:8182/gremlin"}
```

None: Disable knowledge graph extraction for RAG-only mode
- Configuration:
```bash
PG_GRAPH_DB=none
```

Ontology and RDF Support

Flexible GraphRAG supports RDF/RDFS/OWL ontologies to guide knowledge graph extraction, with optional RDF graph store backends. Ontology-guided extraction works with any configured store — property graph, RDF graph store, or both.

Load OWL/RDFS ontologies (owl:Class, owl:ObjectProperty, owl:DatatypeProperty, rdfs:domain, rdfs:range) to constrain entity/relation extraction; OWL is supported but not required
Works with all 15 property graph databases — no RDF store required to use ontology-guided extraction
Full pipeline for all 4 RDF graph stores: UI document ingest → KG extraction → RDF storage; auto incremental sync; Hybrid Search and AI Query/Chat fuse RDF store results alongside vector, BM25, and property graph results
SPARQL 1.1 queries; RDF 1.2 triple terms and relation annotations ({| |} syntax); XSD-typed literals from OWL DatatypeProperty ranges

RDF Graph Store Configuration — set RDF_GRAPH_DB to select the store (all four support RDF 1.2 triple terms; Neptune is AWS-managed—no local compose include):

Apache Jena Fuseki — SPARQL 1.1 server; dashboard: http://localhost:3030
```bash
RDF_GRAPH_DB=fuseki
FUSEKI_BASE_URL=http://localhost:3030
FUSEKI_DATASET=flexible-graphrag
```

Ontotext GraphDB — enterprise RDF store with OWL reasoning; dashboard: http://localhost:7200

```bash
RDF_GRAPH_DB=graphdb
GRAPHDB_BASE_URL=http://localhost:7200
GRAPHDB_REPOSITORY=flexible-graphrag
GRAPHDB_USERNAME=admin
GRAPHDB_PASSWORD=admin
```

Oxigraph — lightweight local store, native RDF 1.2; dashboard: http://localhost:7878
```bash
RDF_GRAPH_DB=oxigraph
OXIGRAPH_URL=http://localhost:7878
```

Amazon Neptune RDF — managed SPARQL 1.1 on Neptune (same cluster can host property graph and RDF; IAM SigV4 auth). See Neptune RDF setup.

```bash
RDF_GRAPH_DB=neptune_rdf
NEPTUNE_RDF_HOST=db-neptune-1.cluster-xxxxxxxxxxxx.us-east-1.neptune.amazonaws.com
NEPTUNE_RDF_PORT=8182
NEPTUNE_RDF_REGION=us-east-1
NEPTUNE_RDF_USE_IAM_AUTH=true
NEPTUNE_RDF_USE_HTTPS=true
# Optional explicit keys (else default AWS credential chain):
# NEPTUNE_RDF_AWS_ACCESS_KEY_ID=
# NEPTUNE_RDF_AWS_SECRET_ACCESS_KEY=
```

None — disable RDF graph store:
```bash
RDF_GRAPH_DB=none
```
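
For orientation, the sketch below derives a SPARQL query endpoint URL from the variables shown for each store. The paths follow each product's standard HTTP layout (Fuseki `/{dataset}/sparql`, GraphDB `/repositories/{repository}`, Oxigraph `/query`, Neptune `/sparql`). Whether the project's adapters build their URLs the same way is an assumption, and the function name is hypothetical.

```python
import os


def sparql_query_endpoint(env=os.environ):
    """Return the SPARQL query URL for the configured RDF store, or None if disabled."""
    store = env.get("RDF_GRAPH_DB", "none").strip().lower()
    if store == "none":
        return None
    if store == "fuseki":
        base = env["FUSEKI_BASE_URL"].rstrip("/")
        return f"{base}/{env['FUSEKI_DATASET']}/sparql"
    if store == "graphdb":
        base = env["GRAPHDB_BASE_URL"].rstrip("/")
        return f"{base}/repositories/{env['GRAPHDB_REPOSITORY']}"
    if store == "oxigraph":
        return env["OXIGRAPH_URL"].rstrip("/") + "/query"
    if store == "neptune_rdf":
        use_https = env.get("NEPTUNE_RDF_USE_HTTPS", "true").strip().lower() == "true"
        scheme = "https" if use_https else "http"
        port = env.get("NEPTUNE_RDF_PORT", "8182")
        return f"{scheme}://{env['NEPTUNE_RDF_HOST']}:{port}/sparql"
    raise ValueError(f"Unknown RDF_GRAPH_DB value: {store!r}")
```

The resulting URL can be pasted into a SPARQL client to confirm that the store is reachable before running an ingest.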

Docker Setup: Uncomment local RDF store includes in docker-compose.yaml (Fuseki, GraphDB, Oxigraph):

```yaml
include:
  # - includes/jena-fuseki.yaml
  # - includes/ontotext-graphdb.yaml
  # - includes/oxigraph.yaml
```

Complete Documentation: docs/DATABASES/RDF/RDF-ONTOLOGY-SUPPORT.md | docs/DATABASES/RDF/RDF-STORE-USER-GUIDE.md

Framework Configuration

Each pipeline stage can run on LlamaIndex or LangChain independently, selected through the following environment variables:

| Variable | Options | Description |
| --- | --- | --- |
| `GRAPH_BACKEND` | `llamaindex` \| `langchain` | Property graph store and KG retrieval |
| `VECTOR_BACKEND` | `llamaindex` \| `langchain` | Vector store adapter |
| `SEARCH_BACKEND` | `llamaindex` \| `langchain` | Full-text search adapter |
| `CHUNKER_BACKEND` | `llamaindex` \| `langchain` | Document chunking / splitting |
| `KG_EXTRACTOR_BACKEND` | `llamaindex` \| `langchain` | KG extraction from chunks |
| `RETRIEVAL_FUSION` | `llamaindex` \| `langchain` | Result fusion across retrievers |

LangChain-only graph stores (ArangoDB, Apache AGE, HugeGraph, SurrealDB, TigerGraph, Cosmos Gremlin) auto-select GRAPH_BACKEND=langchain. LlamaIndex-only Spanner (PG_GRAPH_DB=spanner) forces GRAPH_BACKEND=llamaindex at startup and ignores GRAPH_BACKEND=langchain (no LangChain adapter).
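
The override rules in the paragraph above can be pictured as a small resolution function. This is a sketch under stated assumptions: only `spanner` is confirmed above as a `PG_GRAPH_DB` value, so the other store identifiers and the function name `resolve_graph_backend` are hypothetical.

```python
# Store identifiers are hypothetical except "spanner" (PG_GRAPH_DB=spanner).
LANGCHAIN_ONLY_STORES = {
    "arangodb", "apache_age", "hugegraph",
    "surrealdb", "tigergraph", "cosmos_gremlin",
}
LLAMAINDEX_ONLY_STORES = {"spanner"}
VALID_BACKENDS = ("llamaindex", "langchain")


def resolve_graph_backend(pg_graph_db, requested="llamaindex"):
    """Return the backend that would be used for a given property graph store.

    Stores with a single adapter override the requested GRAPH_BACKEND;
    every other store honours the request (LlamaIndex is the default).
    """
    store = pg_graph_db.strip().lower()
    if store in LANGCHAIN_ONLY_STORES:
        return "langchain"
    if store in LLAMAINDEX_ONLY_STORES:
        return "llamaindex"
    if requested not in VALID_BACKENDS:
        raise ValueError(f"GRAPH_BACKEND must be one of {VALID_BACKENDS}")
    return requested


# Spanner has no LangChain adapter, so the request is ignored.
backend = resolve_graph_backend("spanner", requested="langchain")
```

For a store supported by both frameworks, such as Neo4j, the requested value is returned unchanged.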

Complete Documentation: docs/ADVANCED/LANGCHAIN/LANGCHAIN-GRAPH-INTEGRATION.md

LLM and Embedding Configuration

Select the LLM with LLM_PROVIDER plus the provider-specific environment variables.

Supported LLM Providers

OpenAI - gpt-4o-mini (default), gpt-4o, gpt-4.1-mini, gpt-5-mini, etc.
Ollama - Local deployment (llama3.2, llama3.1, qwen2.5, gpt-oss, etc.)
Azure OpenAI - Azure-hosted OpenAI models
Google Gemini - gemini-2.5-flash, gemini-3-flash-preview, gemini-3.1-pro-preview, etc.
Anthropic Claude - claude-sonnet-4-5, claude-haiku-4-5, etc.
Google Vertex AI - Google Cloud-hosted Vertex AI Platform Gemini models
Amazon Bedrock - Amazon Nova, Titan, Anthropic Claude, Meta Llama, Mistral AI, etc.
Groq - Fast, low-cost, low-latency LPU inference: OpenAI GPT-OSS, Meta Llama (4, 3.3, 3.1), Qwen3, Kimi, etc.
Fireworks AI - Wide model selection and fine-tuning: Meta, Qwen, Mistral AI, DeepSeek, OpenAI GPT-OSS, Kimi, GLM, MiniMax, etc.
OpenAI-Compatible (openai_like) - Any OpenAI-compatible endpoint (LM Studio, LocalAI, Llamafile, vLLM, etc.)
OpenRouter - 200+ models via unified API (openai/gpt-4o-mini, anthropic/claude, meta-llama, etc.)
LiteLLM Proxy - 100+ providers via LiteLLM proxy; sample config in scripts/litellm_config.yaml
vLLM - High-performance local inference server (Linux/macOS; use openai_like on Windows)

LLM Provider Configuration

See docs/LLM/LLM-EMBEDDING-CONFIG.md for all 13 providers with detailed configuration examples.

OpenAI (recommended):

```bash
LLM_PROVIDER=openai
OPENAI_API_KEY=your_api_key
OPENAI_MODEL=gpt-4o-mini
```

Ollama (local):

```bash
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2:latest
```

Azure OpenAI:

```bash
LLM_PROVIDER=azure_openai
AZURE_OPENAI_API_KEY=your_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_ENGINE=gpt-4o-mini
```

Embedding Configuration

Set EMBEDDING_KIND to choose the embedding provider — independent of the LLM provider. All 13 LLM providers are also supported as embedding providers. See docs/LLM/LLM-EMBEDDING-CONFIG.md for all providers and options.

OpenAI:

```bash
EMBEDDING_KIND=openai
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=your_api_key
```

Ollama (local):

```bash
EMBEDDING_KIND=ollama
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_BASE_URL=http://localhost:11434
```

Azure OpenAI:

```bash
EMBEDDING_KIND=azure_openai
AZURE_EMBEDDING_MODEL=text-embedding-3-small
AZURE_EMBEDDING_DEPLOYMENT=your_deployment_name
AZURE_OPENAI_API_KEY=your_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
```

Common embedding dimensions:

OpenAI: 1536 (text-embedding-3-small), 3072 (text-embedding-3-large)
Ollama: 384 (all-minilm), 768 (nomic-embed-text), 1024 (mxbai-embed-large)
Google: 768 (gemini-embedding-2-preview)
Bedrock: 1024 (amazon.titan-embed-text-v2:0)

When switching embedding models, delete existing vector indexes. See docs/DATABASES/VECTOR-DATABASES/VECTOR-DIMENSIONS.md for cleanup instructions.
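
To make the switching rule concrete, the dimensions listed above can be kept in a lookup table and compared before a change. This is a minimal sketch; the helper name `plan_model_switch` and the returned field names are assumptions, and the dimensions are simply the values listed above.

```python
# Dimensions as listed in "Common embedding dimensions" above.
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "all-minilm": 384,
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "gemini-embedding-2-preview": 768,
    "amazon.titan-embed-text-v2:0": 1024,
}


def plan_model_switch(old_model, new_model):
    """Summarise what changing the embedding model implies.

    Indexes should be deleted whenever the model changes, even when the
    dimension stays the same, because vectors from different models are
    not comparable.
    """
    old_dim = EMBEDDING_DIMENSIONS[old_model]
    new_dim = EMBEDDING_DIMENSIONS[new_model]
    return {
        "delete_indexes": old_model != new_model,
        "dimension_changes": old_dim != new_dim,
        "old_dim": old_dim,
        "new_dim": new_dim,
    }


plan = plan_model_switch("text-embedding-3-small", "nomic-embed-text")
```

A dimension change typically makes the old index unusable outright, while a same-dimension switch can fail silently with poor search quality, which is why the index is flagged for deletion in both cases.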

Ollama Configuration

When using Ollama, configure system-wide environment variables before starting the Ollama service:

Key requirements:

Configure environment variables system-wide (not in Flexible GraphRAG .env file)
OLLAMA_NUM_PARALLEL=4 for optimal performance (or 1-2 if resource constrained)
Always restart Ollama service after changing environment variables

See docs/LLM/OLLAMA-CONFIGURATION.md for complete setup instructions including platform-specific steps and performance optimization.

Prerequisites

Required

Python 3.12, 3.13, or 3.14 (as specified in pyproject.toml)
UV package manager (for dependency management)
Node.js 22.x (for UI clients)
npm (package manager)
Search database: Elasticsearch or OpenSearch
Vector database: Qdrant (or other supported vector databases)
Property graph database: Neo4j (or other supported property graph databases) - unless using vector-only RAG
OpenAI with API key (recommended) or Ollama (for LLM processing)

Note: The docker/docker-compose.yaml file can provide all these databases via Docker containers.

Install

```bash
cd flexible-graphrag
uv pip install -e .
```

Optional (see flexible-graphrag/pyproject.toml for all options)

LangChain 1.x integration — Optional peer stack alongside LlamaIndex (extras pin langchain>=1.0 and the LangChain 1.x line, not legacy 0.3):
- uv pip install -e ".[langchain]" — core LC extras: property graph stores via langchain-community where supported, 10 vector stores, 3 search stores, RDF SPARQL retrieval, native LC LLM/embedding clients for all 13 providers, KG extraction via langchain-experimental, retrieval fusion
- uv pip install --override extras-overrides.txt -e ".[langchain,langchain-extras]" — adds Neo4j (LC), PostgreSQL pgvector, ArcadeDB, ArangoDB, Cosmos Gremlin, HugeGraph, TigerGraph, and related dependencies (see pyproject.toml group langchain-extras)
- Apache AGE — property graph via LangChain needs the separate age-extras group (BAEM1N langchain-age driver):
```
uv pip install --override extras-overrides.txt -e ".[langchain,langchain-extras,age-extras]"
python scripts/patch_langchain_age.py
```
  Run patch_langchain_age.py on Python 3.14+ (required); on 3.12/3.13 it is harmless.
- uv pip install -e ".[spanner-extras]" — adds LI-only Spanner support via llama-index-spanner. Note: llama-index-spanner declares llama-index (the meta-package) as a dependency, which uv will install. Uninstall it immediately after: uv pip uninstall llama-index — having both llama-index and llama-index-core installed simultaneously can cause version conflicts, as the meta-package pins versions of llama-index-* component packages that can clash with the versions already required by this project
- SurrealDB — two-step install required (resolver conflict):
```
uv pip install -e ".[surrealdb-extras]"
uv pip install "surrealdb>=2.0" "langchain-core>=1.3"
```
ArcadeDB embedded mode (uv pip install "arcadedb-embedded>=26.3.2"; quote the requirement so the shell does not treat >= as an output redirect): runs ArcadeDB in-process with a bundled JVM, so no separate Java install is needed; latest release: 26.3.2
Enterprise Repositories:
- Alfresco repository - only if using Alfresco data source
- SharePoint - requires SharePoint access
- Box - requires Box Business account (3 users minimum), API keys
- CMIS-compliant repository (e.g., Alfresco) - only if using CMIS data source
Cloud Storage (requires accounts and API keys/credentials):
- Amazon S3 - requires AWS account and access keys
- Google Cloud Storage - requires GCP account and service account credentials
- Google Drive - requires Google Cloud account and OAuth credentials or service account
- Azure Blob Storage - requires Azure account and connection string or account keys
- Microsoft OneDrive - requires OneDrive for Business (not personal OneDrive)
- Note: SharePoint and OneDrive for Business are also available with an M365 Developer Program sandbox (with full Visual Studio annual subscription, not monthly).
File Upload (no account required):
- Web interface with file dialog (drag & drop or click to select)
Web Sources (no account required):
- Web pages, Wikipedia, YouTube - no accounts needed

Setup

🐳 Docker Deployment

Docker deployment offers multiple scenarios. Before deploying any scenario, set up your environment files:

Environment File Setup (Required for All Scenarios):

Backend Configuration (.env):

```bash
# Navigate to backend directory
cd flexible-graphrag

# Linux/macOS
cp env-sample.txt .env

# Windows Command Prompt
copy env-sample.txt .env

# Edit .env with your database credentials, API keys, and settings
# Then return to project root
cd ..
```

Docker Configuration (docker.env):

```bash
# Navigate to docker directory
cd docker

# Linux/macOS
cp docker-env-sample.txt docker.env

# Windows Command Prompt
copy docker-env-sample.txt docker.env

# Edit docker.env for Docker-specific overrides (network addresses, service names)
# Stay in docker directory for next steps
```

Scenario A: Databases in Docker, App Standalone (Hybrid)

Configuration Setup:

```bash
# If not already in docker directory from previous step:
# cd docker

# Edit docker-compose.yaml to uncomment/comment services as needed
# Scenario A setup in docker-compose.yaml:
# Keep these services uncommented (default setup):
  - includes/neo4j.yaml
  - includes/qdrant.yaml
  - includes/elasticsearch-dev.yaml
  - includes/kibana-simple.yaml

# Keep these services commented out:
# - includes/app-stack.yaml       # Must be commented out for Scenario A
# - includes/proxy.yaml           # Must be commented out for Scenario A
# - All other services remain commented unless you want a different vector database,
#   graph database, OpenSearch for search, or Alfresco included
```

Deploy Services:

```bash
# From the docker directory
docker-compose -f docker-compose.yaml -p flexible-graphrag up -d
```

Scenario B: Full Stack in Docker (Complete)

Configuration Setup:

```bash
# If not already in docker directory from previous step:
# cd docker

# Edit docker-compose.yaml to uncomment/comment services as needed
# Scenario B setup in docker-compose.yaml:
# Keep these services uncommented:
  - includes/neo4j.yaml
  - includes/qdrant.yaml
  - includes/elasticsearch-dev.yaml
  - includes/kibana-simple.yaml
  - includes/app-stack.yaml       # Backend and UI in Docker
  - includes/proxy.yaml           # NGINX reverse proxy

# Keep other services commented out unless you want a different vector database,
# graph database, OpenSearch for search, or Alfresco included
```

Deploy Services:

```bash
# From the docker directory
docker-compose -f docker-compose.yaml -p flexible-graphrag up -d
```

Scenario B Service URLs:

Angular UI: http://localhost:8070/ui/angular/
React UI: http://localhost:8070/ui/react/
Vue UI: http://localhost:8070/ui/vue/
Backend API: http://localhost:8070/api/

Other Deployment Scenarios

Scenario C: Fully Standalone - Not using docker-compose at all

Standalone backend, standalone UIs, all databases running separately
Configure all database connections in flexible-graphrag/.env

Scenario D: Backend/UIs in Docker, Databases External

Using docker-compose for backend and UIs (app-stack + proxy)
Some or all databases running separately (same docker-compose, other local Docker, cloud/remote servers)
Configure database connections in docker/docker.env: Backend in Docker reads this file
- For databases in same docker-compose: Use service names (e.g., neo4j:7687, qdrant:6333)
- For databases in other local Docker containers: Use host.docker.internal:PORT
- For remote/cloud databases: Use actual hostnames/IPs
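
The three addressing rules above can be summarised in a small helper. This is only an illustration of the rules; the function name and the location labels are hypothetical, and the project itself reads plain values from docker/docker.env.

```python
def database_address(location, port, service_name=None, hostname=None):
    """Build the host:port a Dockerised backend should use for a database.

    location is one of (labels invented for this sketch):
      "same_compose"       - database defined in the same docker-compose project
      "other_local_docker" - database in another local Docker container
      "remote"             - cloud or remote server
    """
    if location == "same_compose":
        if not service_name:
            raise ValueError("service_name is required for same_compose")
        return f"{service_name}:{port}"
    if location == "other_local_docker":
        return f"host.docker.internal:{port}"
    if location == "remote":
        if not hostname:
            raise ValueError("hostname is required for remote")
        return f"{hostname}:{port}"
    raise ValueError(f"Unknown location: {location!r}")


neo4j_address = database_address("same_compose", 7687, service_name="neo4j")
qdrant_address = database_address("other_local_docker", 6333)
```

The same distinction applies in reverse for Scenario E, where a standalone backend reaches local Docker databases through their published ports.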

Scenario E: Mixed Docker/Standalone

Standalone backend and UIs
Running some databases in Docker (local) and some outside (cloud, external servers)
Configure all database connections in flexible-graphrag/.env: Use host.docker.internal:PORT for locally-running Docker databases, use actual hostnames/IPs for remote Docker or non-Docker databases

Docker Control and Configuration

Managing Docker services:

```bash
# Navigate to docker directory (if not already there)
cd docker

# Create and start services (recreates if configuration changed)
docker-compose -f docker-compose.yaml -p flexible-graphrag up -d

# Stop services (keeps containers)
docker-compose -f docker-compose.yaml -p flexible-graphrag stop

# Start stopped services
docker-compose -f docker-compose.yaml -p flexible-graphrag start

# Stop and remove services
docker-compose -f docker-compose.yaml -p flexible-graphrag down

# View logs
docker-compose -f docker-compose.yaml -p flexible-graphrag logs -f

# Restart after configuration changes
docker-compose -f docker-compose.yaml -p flexible-graphrag down
# Edit docker-compose.yaml, docker.env, or includes/app-stack.yaml as needed
docker-compose -f docker-compose.yaml -p flexible-graphrag up -d
```

Configuration:

Modular deployment: Comment/uncomment services in docker/docker-compose.yaml
Backend configuration (Scenario B): Backend uses flexible-graphrag/.env with docker/docker.env for Docker-specific overrides (like using service names instead of localhost). No configuration needed in app-stack.yaml

See docker/README.md for detailed Docker configuration.

🔧 Local Development Setup (Scenario A)

Note: Skip this entire section if using Scenario B (Full Stack in Docker).

Environment Configuration

Create environment file (cross-platform):

```bash
# Linux/macOS
cp flexible-graphrag/env-sample.txt flexible-graphrag/.env

# Windows Command Prompt
copy flexible-graphrag\env-sample.txt flexible-graphrag\.env
```

Edit .env with your database credentials and API keys.

Python Backend Setup (Standalone)

Option A — Install from PyPI package (Quickstart)

```bash
# 1. Create and activate a virtual environment
uv venv venv-3.13 --python 3.13
venv-3.13\Scripts\Activate   # Windows
source venv-3.13/bin/activate  # Linux/macOS

# 2. Install flexible-graphrag
uv pip install flexible-graphrag

# 3. Optionally install ArcadeDB embedded mode support (includes bundled JVM, no Java install needed)
#    Quote the requirement so the shell does not treat ">=" as an output redirect
uv pip install "arcadedb-embedded>=26.3.2"

# 3a. Optional dependency groups, for example:
uv pip install "flexible-graphrag[langchain]"
# Other extras ([langchain-extras], [age-extras], overrides): see source README, Prerequisites > Optional.

# 4. Create .env from the sample (copy from the source repo or download env-sample.txt)
copy env-sample.txt .env   # Windows
cp env-sample.txt .env     # Linux/macOS
# Edit .env with your LLM API keys and database settings

# 5. Start your databases (docker compose or standalone)
docker compose -f docker/docker-compose.yaml up -d

# 6. Run the backend
flexible-graphrag
# or: uv run start.py
```

Option B — Install from source (editable)

Navigate to the backend directory:
```
cd flexible-graphrag
```

Create and activate a virtual environment, then install in editable mode:

```bash
uv venv venv-3.13 --python 3.13
venv-3.13\Scripts\Activate   # Windows
source venv-3.13/bin/activate  # Linux/macOS
uv pip install -e .

# see flexible-graphrag/pyproject.toml for all options
# --- Optional: dependency groups from pyproject.toml [project.optional-dependencies] ---
# LangChain (peer framework; use overrides when combining with langchain-extras)
uv pip install -e ".[langchain]"
uv pip install --override extras-overrides.txt -e ".[langchain,langchain-extras]"
uv pip install --override extras-overrides.txt -e ".[langchain,langchain-extras,age-extras]"
python scripts/patch_langchain_age.py
uv pip install --override extras-overrides.txt -e ".[surrealdb-extras]"
uv pip install "surrealdb>=2.0" "langchain-core>=1.3"
uv pip install --override extras-overrides.txt -e ".[spanner-extras]"
uv pip uninstall llama-index

# RDF extras (base install already includes rdflib/pyoxigraph; use these if you need the named groups)
uv pip install -e ".[rdf]"
uv pip install -e ".[rdf-full]"

# Observability
uv pip install -e ".[observability]"
uv pip install -e ".[observability-openlit]"
uv pip install -e ".[observability-dual]"

# Development tests / tooling
uv pip install -e ".[dev]"

# Docling OCR backends (see DOCLING_OCR in env-sample)
uv pip install -e ".[docling-ocr-easyocr]"
uv pip install -e ".[docling-ocr-tesserocr]"
uv pip install -e ".[docling-ocr-ocrmac]"   # macOS only

# Embedded ArcadeDB (not a bracket extra; bundled JVM)
# Quoted so the shell does not treat ">=" as an output redirect
uv pip install "arcadedb-embedded>=26.3.2"
```

uv-managed venv (alternative): change managed = false to managed = true in the [tool.uv] section of pyproject.toml, then run uv pip install -e . (the trailing dot is the current directory).

Notes: run only the optional lines you need. For age-extras, run patch_langchain_age.py on Python 3.14+ (safe on 3.12/3.13). For surrealdb-extras, keep the follow-up surrealdb / langchain-core upgrades. For spanner-extras, uv pip uninstall llama-index removes the meta-package pulled in by llama-index-spanner. See Optional under Prerequisites for context.

Windows Note: If installation fails with "Microsoft Visual C++ 14.0 or greater is required" error, install Microsoft C++ Build Tools (required for compiling Docling dependencies). Select "Desktop development with C++" during installation.

Create a .env file by copying the sample and customizing:
```
cp env-sample.txt .env   # Linux/macOS
copy env-sample.txt .env  # Windows
```
Edit .env with your specific configuration. See docs/GETTING-STARTED/ENVIRONMENT-CONFIGURATION.md for detailed setup guide.

Note: The system requires Python 3.12, 3.13, or 3.14 as specified in pyproject.toml (requires-python = ">=3.12,<3.15"). Python 3.12 and 3.13 are fully tested. Python 3.14 works with the patches applied automatically in main.py at startup. Virtual environment management is controlled by managed = false in pyproject.toml [tool.uv] section (you control venv creation and naming).
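
The version constraint quoted above (`>=3.12,<3.15`) can be checked from a script before installing. This is a small sketch; the function name `python_supported` is an assumption, and it compares only the major and minor version numbers.

```python
import sys

SUPPORTED_MIN = (3, 12)  # inclusive lower bound from requires-python
SUPPORTED_MAX = (3, 15)  # exclusive upper bound from requires-python


def python_supported(version=None):
    """Return True if the interpreter satisfies >=3.12,<3.15.

    `version` is a tuple such as (3, 13) or sys.version_info; when omitted,
    the running interpreter is checked.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return SUPPORTED_MIN <= major_minor < SUPPORTED_MAX


running_ok = python_supported()
```

Calling `python_supported((3, 11))` returns `False`, which is a quick way to catch an older default interpreter before `uv venv` is run.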

Start the backend:

```bash
flexible-graphrag        # after uv pip install flexible-graphrag
# or: uv run start.py   # with source
```

The backend will be available at http://localhost:8000.

Frontend Setup (Standalone)

Standalone backend and frontend URLs:

Backend API: http://localhost:8000 (FastAPI server)
Angular: http://localhost:4200 (npm start)
React: http://localhost:5173 (npm run dev)
Vue: http://localhost:3000 (npm run dev)

Choose one of the following frontend options to work with:

React Frontend

Navigate to the React frontend directory:
```
cd flexible-graphrag-ui/frontend-react
```
Install Node.js dependencies (first time only):
```
npm install
```
Start the development server (uses Vite):
```
npm run dev
```

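The React frontend will be available at http://localhost:5173 (Vite's default port; if that port is already in use, Vite moves to the next free one, such as 5174).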

Angular Frontend

Navigate to the Angular frontend directory:
```
cd flexible-graphrag-ui/frontend-angular
```
Install Node.js dependencies (first time only):
```
npm install
```
Start the development server (uses Angular CLI):
```
npm start
```

The Angular frontend will be available at http://localhost:4200.

Vue Frontend

Navigate to the Vue frontend directory:
```
cd flexible-graphrag-ui/frontend-vue
```
Install Node.js dependencies (first time only):
```
npm install
```
Start the development server (uses Vite):
```
npm run dev
```

The Vue frontend will be available at http://localhost:3000.

UI Usage

The system provides a tabbed interface for document processing and querying. Follow these steps in order. See docs/UI-GUIDE/UI-GUIDE.md for full details.

1. Sources Tab

Configure your data source and select files for processing. The system supports 13 data sources:

Detailed Configuration:

File Upload Data Source

Select: "File Upload" from the data source dropdown
Add Files:
- Drag & Drop: Drag files directly onto the upload area
- Click to Select: Click the upload area to open file selection dialog (supports multi-select)
- Note: If you drag & drop new files after selecting via dialog, only the dragged files will be used
Supported Formats: PDF, DOCX, XLSX, PPTX, TXT, MD, HTML, CSV, PNG, JPG, and more
Next Step: Click "CONFIGURE PROCESSING →" to proceed to Processing tab

Alfresco Repository

Select: "Alfresco Repository" from the data source dropdown
Configure:
- Alfresco Base URL (e.g., http://localhost:8080/alfresco)
- Username and password
- Path (e.g., /Sites/example/documentLibrary)
Next Step: Click "CONFIGURE PROCESSING →" to proceed to Processing tab

CMIS Repository

Select: "CMIS Repository" from the data source dropdown
Configure:
- CMIS Repository URL (e.g., http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/atom)
- Username and password
- Folder path (e.g., /Sites/example/documentLibrary)
Next Step: Click "CONFIGURE PROCESSING →" to proceed to Processing tab

All Data Sources (13 available):

File Upload: local files added through the web interface
Web Sources: Web Page, Wikipedia, YouTube
Cloud Storage: Amazon S3, Google Cloud Storage, Azure Blob Storage, Google Drive, Microsoft OneDrive
Enterprise Repositories: Alfresco, Microsoft SharePoint, Box, CMIS

See the Data Sources section for complete details on all 13 sources.

2. Processing Tab

Process your selected documents and monitor progress:

Start Processing: Click "START PROCESSING" to begin document ingestion
Monitor Progress: View real-time progress bars for each file
File Management:
- Use checkboxes to select files
- Click "REMOVE SELECTED (N)" to remove selected files from the list
- Note: This removes files from the processing queue, not from your system
Processing Pipeline: Documents are processed through Docling conversion, vector indexing, and knowledge graph creation

3. Search Tab

Perform searches on your processed documents:

Hybrid Search

Purpose: Find and rank the most relevant document excerpts
Usage: Enter search terms or phrases (e.g., "machine learning algorithms", "financial projections")
Action: Click "SEARCH" button
Results: Ranked list of document excerpts with relevance scores and source information
Best for: Research, fact-checking, finding specific information across documents

Q&A Query

Purpose: Get AI-generated answers to natural language questions
Usage: Enter natural language questions (e.g., "What are the main findings in the research papers?")
Action: Click "ASK" button
Results: AI-generated narrative answers that synthesize information from multiple documents
Best for: Summarization, analysis, getting overviews of complex topics

4. Chat Tab

Interactive conversational interface for document Q&A:

Chat Interface:
- Your Questions: Displayed on the right side vertically
- AI Answers: Displayed on the left side vertically
Usage: Type questions and press Enter or click send
Conversation History: All questions and answers are preserved in the chat history
Clear History: Click "CLEAR HISTORY" button to start a new conversation
Best for: Iterative questioning, follow-up queries, conversational document exploration

Testing Cleanup

Between tests you can clean up data:

Run cleanup.py: Clears vector, graph, and search indexes in one step — run from the flexible-graphrag directory
Vector Indexes: See docs/DATABASES/VECTOR-DATABASES/VECTOR-DIMENSIONS.md for vector database cleanup instructions
Graph Data: See docs/DATABASES/GRAPH-DATABASES/README-neo4j.md for graph-related cleanup commands

MCP Server Setup (Quickstart)

The MCP server (flexible-graphrag-mcp) is a lightweight standalone package that connects MCP clients (Claude Desktop, Cursor, etc.) to the Flexible GraphRAG backend via its REST API.

For full details see flexible-graphrag-mcp/README.md and flexible-graphrag-mcp/QUICK-USAGE-GUIDE.md. For the full list of available MCP tools see MCP Tools for Claude Desktop and Other MCP Clients below.

Steps

First terminal — install and run the flexible-graphrag backend (see Python Backend Setup above) — it must be running on http://localhost:8000.

Second terminal — install and start the MCP server in HTTP mode:

```bash
uv venv venv-mcp --python 3.13
venv-mcp\Scripts\Activate   # Windows
source venv-mcp/bin/activate  # Linux/macOS
uv pip install flexible-graphrag-mcp
flexible-graphrag-mcp --http --port 3001
```

Third terminal — test with MCP Inspector:
```
npx @modelcontextprotocol/inspector
```
Open the URL printed in the console (token pre-filled), set transport to Streamable HTTP, URL to http://localhost:3001/mcp, then click Connect.
Use with Claude Desktop and other MCP clients — see flexible-graphrag-mcp/README.md for stdio transport config and client-specific setup.

MCP Tools for Claude Desktop and Other MCP Clients

The MCP server provides 9 specialized tools for document intelligence workflows:

| Tool | Purpose | Usage |
| --- | --- | --- |
| `get_system_status()` | System health and configuration | Verify setup and database connections |
| `ingest_documents()` | Bulk document processing | All sources support `skip_graph`; filesystem/Alfresco/CMIS use `paths`; Alfresco also supports `nodeDetails` list (13 sources have their own config: filesystem, repositories (Alfresco, SharePoint, Box, CMIS), cloud storage, web) |
| `ingest_text(content, source_name)` | Custom text analysis | Analyze specific text content |
| `search_documents(query, top_k)` | Hybrid document retrieval | Find relevant document excerpts |
| `query_documents(query, top_k)` | AI-powered Q&A | Generate answers from document corpus |
| `test_with_sample()` | System verification | Quick test with sample content |
| `check_processing_status(id)` | Async operation monitoring | Track long-running ingestion tasks |
| `get_python_info()` | Environment diagnostics | Debug Python environment issues |
| `health_check()` | Backend connectivity | Verify API server connection |

Client Support

Claude Desktop and other MCP clients: Native MCP integration with stdio transport
MCP Inspector: HTTP transport for debugging and development
Multiple installation options: pipx (system-wide) or uvx (no install)

Backend REST API

The FastAPI backend provides the following REST API endpoints:

Base URL: http://localhost:8000/api/

System

| Endpoint | Method | Purpose |
| --- | --- | --- |
| `/api/health` | GET | Health check: verify backend is running |
| `/api/status` | GET | System status and configuration (databases, LLM, feature flags) |
| `/api/info` | GET | System information and package versions |
| `/api/python-info` | GET | Python environment diagnostics |

Ingestion

| Endpoint | Method | Purpose |
| --- | --- | --- |
| `/api/ingest` | POST | Ingest documents from a data source (`filesystem`, `s3`, `web`, `cmis`, ...) |
| `/api/upload` | POST | Upload files directly for processing |
| `/api/ingest-text` | POST | Ingest raw text content |
| `/api/test-sample` | POST | Test the system with built-in sample content |
| `/api/cleanup-uploads` | POST | Remove temporarily uploaded files |

Async Processing

| Endpoint | Method | Purpose |
| --- | --- | --- |
| `/api/processing-status/{id}` | GET | Poll status of an async ingestion operation |
| `/api/processing-events/{id}` | GET | Server-Sent Events stream for real-time progress |
| `/api/cancel-processing/{id}` | POST | Cancel an ongoing processing operation |
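
Polling the status endpoint might look like the loop below. This is a sketch under loud assumptions: the response shape (a JSON object with a `status` field) and the terminal status values are guesses, not taken from the API reference, and `fetch_status` stands in for whatever HTTP call retrieves `/api/processing-status/{id}`.

```python
import time

# Assumed terminal states; check docs/DEVELOPER/REST-API.md for the real values.
TERMINAL_STATES = ("completed", "failed", "cancelled")


def wait_for_processing(fetch_status, processing_id, interval=2.0,
                        timeout=600.0, sleep=time.sleep):
    """Poll until the operation reaches a terminal state or the timeout passes.

    fetch_status(processing_id) must return a dict; in practice it would
    issue GET /api/processing-status/{id} and decode the JSON body.
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(processing_id)
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Processing {processing_id} did not finish in time")
        sleep(interval)


# Stand-in for the HTTP call, so the loop can be exercised offline.
_fake_responses = iter([{"status": "processing"}, {"status": "completed"}])
final = wait_for_processing(lambda _id: next(_fake_responses), "demo-id",
                            interval=0, sleep=lambda _seconds: None)
```

For live progress without polling, the Server-Sent Events endpoint in the table above is the alternative.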

Search & Query

| Endpoint | Method | Purpose |
| --- | --- | --- |
| `/api/search` | POST | Hybrid search: returns ranked document excerpts |
| `/api/query` | POST | AI-powered Q&A: generates an answer from the document corpus |
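
A request to either endpoint could be assembled with the standard library as follows. The JSON field names `query` and `top_k` mirror the MCP tool parameters listed earlier and are an assumption about the REST request body; the Swagger UI at /docs shows the actual schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # standalone backend address used in this README


def build_search_request(query, top_k=10, endpoint="/api/search"):
    """Build (but do not send) a JSON POST request for search or query.

    The body fields "query" and "top_k" are assumed, not confirmed.
    """
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("machine learning algorithms", top_k=5)

# To send it against a running backend:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

Passing `endpoint="/api/query"` targets the Q&A endpoint with the same assumed body.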

Graph

| Endpoint | Method | Purpose |
| --- | --- | --- |
| `/api/graph` | GET | Graph database status and node/relationship counts (Neo4j: live Cypher counts; other LC-backed stores: counts via `lc_graph.query()` where supported; remaining stores: status + dashboard URL) |
| `/api/graph/query` | POST | Execute a native graph query against the configured store: Cypher (Neo4j, Memgraph, FalkorDB, ArcadeDB, Ladybug, Apache AGE), AQL (ArangoDB), SurrealQL (SurrealDB), Gremlin (Cosmos), GSQL (TigerGraph), openCypher (Neptune/Analytics), GQL (Spanner), SPARQL fallback for RDF-only |
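
Because `/api/graph/query` expects the store's native language, a client may want a lookup like the one below before composing a query. The language names are transcribed from the table row above; the lowercase store keys and the function name are hypothetical identifiers for this sketch.

```python
# Query language per store, from the /api/graph/query description above.
# Keys are hypothetical identifiers, not confirmed configuration values.
QUERY_LANGUAGE_BY_STORE = {
    "neo4j": "Cypher",
    "memgraph": "Cypher",
    "falkordb": "Cypher",
    "arcadedb": "Cypher",
    "ladybug": "Cypher",
    "apache_age": "Cypher",
    "arangodb": "AQL",
    "surrealdb": "SurrealQL",
    "cosmos_gremlin": "Gremlin",
    "tigergraph": "GSQL",
    "neptune": "openCypher",
    "neptune_analytics": "openCypher",
    "spanner": "GQL",
}


def query_language_for(store=None, rdf_only=False):
    """Return the query language to use for the configured graph store."""
    if rdf_only:
        # With only an RDF store configured, the endpoint falls back to SPARQL.
        return "SPARQL"
    key = (store or "").strip().lower()
    if key not in QUERY_LANGUAGE_BY_STORE:
        raise ValueError(f"No query language known for store {store!r}")
    return QUERY_LANGUAGE_BY_STORE[key]


language = query_language_for("ArangoDB")
```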

RDF / Ontology (when RDF_GRAPH_DB is configured)

| Endpoint | Method | Purpose |
| --- | --- | --- |
| `/api/rdf/query/sparql` | POST | Execute a SPARQL query against the configured RDF store |
| `/api/rdf/ontology/info` | GET | Return loaded ontology entity and relation type lists |
| `/api/rdf/ontology/upload` | POST | Upload a new ontology file at runtime |
| `/api/rdf/rdf-store/list` | GET | List registered RDF stores |
| `/api/rdf/rdf-store/connect` | POST | Register an additional RDF store at runtime |
| `/api/rdf/rdf-store/{name}` | DELETE | Deregister an RDF store |
| `/api/rdf/export/rdf` | POST | Export knowledge graph as RDF (501 stub, not yet implemented) |

Interactive API Documentation (requires running backend):

| UI | URL | Notes |
| --- | --- | --- |
| Swagger UI | http://localhost:8000/docs | Try endpoints, inspect schemas, submit requests |
| ReDoc | http://localhost:8000/redoc | Cleaner read-only reference view |

See docs/DEVELOPER/REST-API.md for the full endpoint reference with request/response examples.

Full-Stack Debugging (Standalone Mode)

VS Code launch configurations, backend/frontend debugging, log levels, and MCP Inspector setup — see docs/DEVELOPER/DEVELOPER-FULL-STACK-DEBUGGING.md.

Observability and Monitoring

Flexible GraphRAG includes comprehensive observability features for production monitoring:

OpenTelemetry Integration: Industry-standard instrumentation with automatic LlamaIndex tracing
Distributed Tracing: Jaeger UI for visualizing complete request flows
Metrics Collection: Prometheus for RAG-specific metrics (retrieval/LLM latency, token usage, entity/relation counts)
Visualization: Grafana dashboards with pre-configured RAG metrics panels
Dual Mode Support: OpenInference (LlamaIndex) + OpenLIT (optional) as dual OTLP producers
Custom Instrumentation: Decorators for adding tracing to custom code

Quick Start

Install observability dependencies (optional):

```bash
cd flexible-graphrag
uv pip install -e ".[observability-dual]"  # OpenInference (LlamaIndex + LangChain) + OpenLIT (recommended)
# Or combine with dev tools: uv pip install -e ".[observability-dual,dev]"
```

Enable in .env:

```bash
ENABLE_OBSERVABILITY=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OBSERVABILITY_BACKEND=both  # openinference, openlit, or both (recommended)
```

Start observability stack:

```bash
cd docker
# Uncomment observability.yaml in docker-compose.yaml first
docker-compose -f docker-compose.yaml -p flexible-graphrag up -d
```

Access dashboards:
- Grafana: http://localhost:3009 (admin/admin) - RAG metrics dashboards
- Jaeger: http://localhost:16686 - Distributed tracing
- Prometheus: http://localhost:9090 - Raw metrics

See docs/DEVELOPER/OBSERVABILITY/OBSERVABILITY.md for complete setup, custom instrumentation, and production best practices.

Project Structure

/flexible-graphrag: Python FastAPI backend
- main.py: FastAPI REST API server
- backend.py: Shared business logic used by both API and MCP
- config.py: Configurable settings for data sources, databases, and LLM providers
- factories.py: Factory classes for LLM and database creation
- hybrid_system.py: Main hybrid search and ingestion system
- post_ingestion_state.py: Post-ingestion document state tracking
- query_engine.py: Query engine with result deduplication and re-scoring
- retriever_setup.py: Retriever assembly — vector, search, graph, RDF, synonym expansion
- schema_manager.py: Database schema management
- adapters/: Framework-neutral ABCs and factories for all subsystems
  - adapters/graph/: Property graph and RDF store adapter ABCs
  - adapters/llm/: LLM and embedding adapter ABCs (BothLLMAdapter, BothEmbeddingAdapter)
  - adapters/process/: Chunker and KG extractor ABCs and build_* factories
  - adapters/search/: Search store adapter ABC
  - adapters/vector/: Vector store adapter ABC
- incremental_updates/: Auto-sync engine — detectors, orchestrator, state manager for real-time/near-real-time source sync
- ingest/: Modular ingestion steps — ingest_from_files, ingest_from_text, ingest_from_source, run_chunk_pipeline, update_pg_graph, update_rdf_graph, update_vector, update_search
- langchain/: LangChain peer framework — graph, vector, search, chunking, KG extraction, retrieval
  - langchain/graph/pg_store_adapters/: 15 property graph store adapters (one file per store)
  - langchain/graph/rdf_store_adapters/: 4 RDF/SPARQL store adapters (Fuseki, GraphDB, Oxigraph, Neptune)
  - langchain/graph/retrievers/: li_/lc_ two-layer retriever classes — text-to-query, neighborhood, vector, logging, synonym
  - langchain/llm/: LangChain LLM + embedding factories for all 13 providers
  - langchain/process/: LangChainChunkerAdapter (6 splitter types), LangChainKGExtractorAdapter
  - langchain/search/adapters/: BM25, Elasticsearch, OpenSearch search adapters
  - langchain/vector/adapters/: 10 vector store adapters
- llamaindex/: LlamaIndex peer framework — graph, vector, search, chunking, KG extraction
  - llamaindex/graph/adapters/: LlamaIndex property graph store adapters (Neo4j, ArcadeDB, FalkorDB, Memgraph, Nebula, Neptune, etc.)
  - llamaindex/llm/: LlamaIndex LLM + embedding factories for all 13 providers
  - llamaindex/process/: LlamaIndexChunkerAdapter, LlamaIndexKGExtractorAdapter
  - llamaindex/search/adapters/: Elasticsearch, OpenSearch search adapters
  - llamaindex/vector/adapters/: Qdrant, Elasticsearch, OpenSearch, pgvector, Chroma, and others
- observability/: OpenTelemetry instrumentation, Prometheus metrics, tracing setup
- process/: Core document processing — document_processor.py (Docling/LlamaParse), kg_extractor.py, node_pipeline.py
- rdf/: RDF/ontology support — ontology manager, KG-to-RDF converter, SPARQL tools, bundled schemas (rdf/schemas/)
  - rdf/store/: RDF store adapters — Fuseki, GraphDB, Oxigraph, store factory
- sources/: Data source connectors — filesystem, CMIS/Alfresco, Azure Blob, S3, GCS, OneDrive, SharePoint, Google Drive, Box, web, Wikipedia, YouTube, etc.
- stores/: Index managers — index_manager.py, rdf_manager.py
- pyproject.toml: Modern Python package definition (PEP 517/518)
- uv.toml: UV package manager configuration
- start.py: Startup script (flexible-graphrag console entry point)
- install.py: Installation helper script
/flexible-graphrag-mcp: Standalone MCP server
- main.py: HTTP-based MCP server (calls REST API)
- pyproject.toml: MCP package definition with minimal dependencies
- README.md: MCP server setup and installation instructions
- QUICK-USAGE-GUIDE.md: Quick usage guide
- Lightweight: Only 4 dependencies (fastmcp, nest-asyncio, httpx, python-dotenv)
/flexible-graphrag-ui: Frontend applications
- /frontend-react: React + TypeScript frontend (built with Vite)
  - /src: Source code
  - vite.config.ts: Vite configuration
  - tsconfig.json: TypeScript configuration
  - package.json: Node.js dependencies and scripts
- /frontend-angular: Angular + TypeScript frontend (built with Angular CLI)
  - /src: Source code
  - angular.json: Angular configuration
  - tsconfig.json: TypeScript configuration
  - package.json: Node.js dependencies and scripts
- /frontend-vue: Vue + TypeScript frontend (built with Vite)
  - /src: Source code
  - vite.config.ts: Vite configuration
  - tsconfig.json: TypeScript configuration
  - package.json: Node.js dependencies and scripts
/docker: Docker infrastructure
- docker-compose.yaml: Main compose file with modular includes
- /includes: Modular database and service configurations
- /nginx: Reverse proxy configuration
- README.md: Docker deployment documentation
/docs: Documentation
- ARCHITECTURE.md: System architecture and component relationships
- DEPLOYMENT-CONFIGURATIONS.md: Standalone, hybrid, and full Docker deployment guides
- DOCKER-RESOURCE-CONFIGURATION.md: Docker memory/CPU configuration for Windows (WSL2), macOS, and Linux — essential for running the full stack, especially with vLLM
- ENVIRONMENT-CONFIGURATION.md: Environment setup guide with database switching
- POSTGRES-SETUP.md: PostgreSQL setup for pgvector and incremental state management
- SCHEMA-EXAMPLES.md: Knowledge graph schema examples
- PERFORMANCE.md: Performance benchmarks and optimization guides
- DEFAULT-USERNAMES-PASSWORDS.md: Database credentials and dashboard access
- PORT-MAPPINGS.md: Complete port reference for all services
- DATA-SOURCES/: Data source setup guides (Azure Blob, S3, GCS, Alfresco etc.)
- DOC-PROCESSING/: Document processing guides (Docling GPU, parser output)
- GRAPH-DATABASES/: Graph database guides (Neo4j, Neptune, Nebula, ArcadeDB, etc.)
- INCREMENTAL-UPDATE-AUTO-SYNC/: Incremental updates documentation (README, QUICKSTART, SETUP-GUIDE, API-REFERENCE)
- LLM/: LLM and embedding configuration guides
- LANGCHAIN/: LangChain integration guides (RDF QA fusion, graph retriever setup, adapter reference)
- OBSERVABILITY/: Observability and monitoring guides
- RDF/: RDF/ontology guides (store setup, ontology config, ingestion modes, SPARQL examples, user guide)
- VECTOR-DATABASES/: Vector database guides (dimensions, integration, Chroma modes)
/scripts: Utility scripts
- create_opensearch_pipeline.py: OpenSearch hybrid search pipeline setup
- setup-opensearch-pipeline.sh/.bat: Cross-platform pipeline creation
- rdf_cleanup.py: RDF store CLI tool — list-docs, count, clear-doc, clear-all
- litellm_config.yaml: Sample LiteLLM proxy config (copy to your LiteLLM install dir)
- /incremental: Incremental updates control scripts
  - sync-now.sh/.ps1/.bat: Trigger immediate synchronization
  - set-refresh-interval.sh/.ps1/.bat: Configure polling interval
  - README.md: Script usage documentation
/tests: Test suite
- test_bm25_*.py: BM25 configuration and integration tests
- conftest.py: Test configuration and fixtures
- run_tests.py: Test runner
/examples: Standalone usage examples (not re-tested)
- observability_example.py: OpenTelemetry / observability integration example
- /rdf: RDF/ontology examples
  - sparql_examples.py: Sample SPARQL queries for all three stores
  - unified_query_engine_examples.py: UnifiedQueryEngine usage examples
  - store_index_example.py: Build a LlamaIndex from an RDF store
  - ontology_guided_ingestion_example.py: OntologyAwarePropertyGraphBuilder usage
  - ingest_with_ontology.py: Ontology-guided ingestion example class
  - rdf_export_import_examples.py: RDF export/import patterns
  - config_rdf_stores.py: RDF store config reference snippets
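
The adapters/ layout above (framework-neutral ABCs plus factories, with LlamaIndex and LangChain implementations behind them) follows a common pattern that can be sketched as below. This is a minimal illustration only: SearchStoreAdapter, InMemorySearchAdapter, and build_search_adapter are hypothetical names, not the project's actual classes or signatures.

```python
from abc import ABC, abstractmethod


class SearchStoreAdapter(ABC):
    """Framework-neutral interface (hypothetical name and methods)."""

    @abstractmethod
    def add(self, doc_id: str, text: str) -> None:
        ...

    @abstractmethod
    def search(self, query: str) -> list[str]:
        ...


class InMemorySearchAdapter(SearchStoreAdapter):
    """Toy stand-in for a framework-specific adapter."""

    def __init__(self, framework: str) -> None:
        self.framework = framework
        self._docs: dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def search(self, query: str) -> list[str]:
        # Naive substring match; a real adapter would call its search store.
        needle = query.lower()
        return [d for d, t in self._docs.items() if needle in t.lower()]


def build_search_adapter(backend: str = "llamaindex") -> SearchStoreAdapter:
    """Factory: callers depend on the ABC, never on a framework class."""
    if backend not in ("llamaindex", "langchain"):
        raise ValueError(f"unknown backend: {backend!r}")
    return InMemorySearchAdapter(backend)
```

Because callers only see the abstract interface, switching a stage between frameworks becomes a configuration change rather than a code change.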

License

This project is licensed under the terms of the Apache License 2.0. See the LICENSE file for details.

Flexible GraphRAG

Flexible GraphRAG data sources, processing tab, auto-sync document states in Postgres, Neo4j

v0.6.0 in brief

Features

Hybrid Search: Configurable hybrid search combining vector search, full-text search, property-graph GraphRAG, and SPARQL against RDF stores.
Knowledge Graph GraphRAG: Extracts entities and relationships from documents to build graphs in property graph databases and RDF stores. Optional schemas and ontologies guide extraction or act as a starting point for the LLM to extend.
RDF/Ontology Support: Load OWL/RDFS ontologies to guide KG extraction into any property graph or RDF store; SPARQL 1.1 queries; RDF 1.2 triple annotations; full UI pipeline (ingest, hybrid search, AI query/chat, incremental auto-sync). See Ontology and RDF Support below.
15 Property Graph Databases: 8 on both LI+LC (Neo4j, ArcadeDB, FalkorDB, Ladybug, Memgraph, NebulaGraph, Amazon Neptune, Neptune Analytics), 1 LI-only (Google Cloud Spanner), 6 LC-only (ArangoDB, Apache AGE, Cosmos Gremlin, HugeGraph, SurrealDB, TigerGraph) — with KG extraction, hybrid search, and AI query/chat
4 RDF Triple Stores: Apache Jena Fuseki, Ontotext GraphDB, Oxigraph, Amazon Neptune RDF.
10 Vector Databases: Qdrant, Elasticsearch, OpenSearch, Neo4j, Chroma, Milvus, Weaviate, Pinecone, PostgreSQL pgvector, LanceDB — for semantic similarity search
3 Search Databases: Elasticsearch, OpenSearch, BM25 (built-in) — for full-text search and hybrid ranking
LLM providers (KG extraction & chat): Ollama, OpenAI, Azure OpenAI, Google Gemini, Anthropic Claude, Google Vertex AI, Amazon Bedrock, Groq, Fireworks AI, OpenAI-compatible endpoints (openai_like), OpenRouter, LiteLLM proxy, and vLLM — configurable via LLM_PROVIDER; see Supported LLM Providers
Embedding providers: OpenAI, Ollama, Azure OpenAI, Google GenAI, Vertex AI, Bedrock, Fireworks, OpenAI-like (EMBEDDING_KIND=openai_like), and LiteLLM — see LLM Configuration
Dual-framework pipeline: LlamaIndex and LangChain are first-class choices for chunking, vector and search adapters, property graphs, KG extraction, RDF text-to-SPARQL retrieval, and hybrid fusion—each stage can be set independently (LlamaIndex defaults). See Framework Configuration.
Multi-Source Ingestion: Processes documents from 13 data sources (9 with incremental auto-sync), including file upload, cloud storage, enterprise repositories, and web sources, with Docling (default) or LlamaParse (cloud API) document parsing.
Observability: Built-in OpenTelemetry instrumentation with automatic LlamaIndex tracing, Prometheus metrics, Jaeger traces, and Grafana dashboards for production monitoring
FastAPI Server with REST API: Python-based FastAPI server with REST APIs for document ingestion, hybrid search, AI query, and AI chat.
MCP Server: MCP server providing Claude Desktop and other MCP clients with tools for document/text ingestion (all 13 data sources, 9 of which support incremental auto-sync), hybrid search, and AI query. Uses the FastAPI backend REST APIs.
UI Clients: Angular, React, and Vue UI clients support choosing the data source (filesystem, Alfresco, CMIS, etc.), ingesting documents, performing hybrid searches, AI queries, and AI chat. The UI clients use the REST APIs of the FastAPI backend.
Docker Deployment Flexibility: Supports both standalone and Docker deployment modes. Docker infrastructure provides modular database selection via docker-compose includes - vector, graph, search engines, and Alfresco can be included or excluded with a single comment. Choose between hybrid deployment (databases in Docker, backend and UIs standalone) or full containerization.

Frontend Screenshots

Angular Frontend - Tabbed Interface

Screenshots (Light Theme): Sources Tab, Processing Tab, Search Tab, Chat Tab

React Frontend - Tabbed Interface

Screenshots (Dark Theme and Light Theme): Sources Tab, Processing Tab, Search Tab, Chat Tab

Vue Frontend - Tabbed Interface

Screenshots (Light Theme): Sources Tab, Processing Tab, Search Tab, Chat Tab

System Components

FastAPI Backend (`/flexible-graphrag`)

REST API Server: Provides endpoints for document ingestion, search, and AI query/chat
Hybrid Search Engine: Combines vector similarity (RAG), fulltext (BM25), and graph traversal (GraphRAG)
Document Processing: Advanced document conversion with Docling and LlamaParse integration
Configurable Architecture: Environment-based configuration for all components
Async Processing: Background task processing with real-time progress updates
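
To illustrate how ranked results from vector, full-text, and graph retrievers can be combined into one list, here is a minimal sketch using reciprocal rank fusion (RRF). RRF is a swapped-in, widely used technique chosen for illustration; this README does not state which fusion method the hybrid search engine actually uses, and the function name and k=60 default are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank
    starts at 1); scores are summed across lists.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a", "b", "c"]
fulltext_hits = ["b", "a", "d"]
graph_hits = ["b", "e"]
fused = reciprocal_rank_fusion([vector_hits, fulltext_hits, graph_hits])
print(fused)
```

Rank-based fusion avoids having to normalize scores that come from different engines (cosine similarity, BM25, graph traversal) onto a common scale.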

MCP Server (`/flexible-graphrag-mcp`)

MCP Client support: Model Context Protocol server for Claude Desktop and other MCP clients
Full API Parity: Tools such as ingest_documents() support all 13 data sources with source-specific configs (filesystem, repositories such as Alfresco, SharePoint, Box, and CMIS, cloud storage, and web). A skip_graph flag is available for all data sources, a paths parameter for filesystem/Alfresco/CMIS, and Alfresco also accepts a nodeDetails list (multi-select for KG Spaces).
Additional Tools: search_documents(), query_documents(), ingest_text(), system diagnostics, and health checks
Dual Transport: HTTP mode for debugging, stdio mode for production
Tool Suite: 9 specialized tools for document processing, search, and system management
Multiple Installation Options: System-wide installation with pipx, or no-install execution with uvx

UI Clients (`/flexible-graphrag-ui`)

Angular Frontend: Material Design with TypeScript
React Frontend: Modern React with Vite and TypeScript
Vue Frontend: Vue 3 Composition API with Vuetify and TypeScript
Unified Features: All clients support the 4 tab views, async processing, progress tracking, and cancellation

Docker Infrastructure (`/docker`)

Modular Database Selection: Include/exclude vector, graph, and search engines, and Alfresco with single-line comments
Flexible Deployment: Hybrid mode (databases in Docker, apps standalone) or full containerization
NGINX Reverse Proxy: Unified access to all services with proper routing
Built-in Database Dashboards: Most database containers also ship with built-in web dashboards (Neo4j Browser, ArcadeDB, FalkorDB, OpenSearch, etc.)
Separate Dashboards: Additional dashboard containers are provided, including Kibana for Elasticsearch and an optional Ladybug Explorer (see docker/includes/ladybug-explorer.yaml).

Data Sources

Flexible GraphRAG supports 13 different data sources for ingesting documents into your knowledge base:

File & Upload Sources

File Upload - Direct file upload through web interface with drag & drop support

Cloud Storage Sources

Amazon S3 - AWS S3 bucket integration
Google Cloud Storage (GCS) - Google Cloud storage buckets
Azure Blob Storage - Microsoft Azure blob containers
OneDrive - Microsoft OneDrive personal/business storage
Google Drive - Google Drive file storage

Enterprise Repository Sources

Alfresco - Alfresco ECM/content repository with two integration options:
- KG Spaces ACA Extension - Integrates the Flexible GraphRAG Angular UI as an extension plugin within the Alfresco Content Application (ACA), enabling multi-select document/folder ingestion with nodeIds directly from the Alfresco interface
- Flexible GraphRAG Alfresco Data Source - Direct integration using Alfresco paths (e.g., /Shared/GraphRAG, /Company Home/Shared/GraphRAG, or /Shared/GraphRAG/cmispress.txt)
SharePoint - Microsoft SharePoint document libraries
Box - Box.com cloud storage
CMIS (Content Management Interoperability Services) - Industry-standard content repository interface

Web Sources

Web Pages - Extract content from web URLs
Wikipedia - Ingest Wikipedia articles by title or URL
YouTube - Process YouTube video transcripts

Each data source includes:

Configuration Forms: Easy-to-use interfaces for credentials and settings
Progress Tracking: Real-time per-file progress indicators
Flexible Authentication: Support for various auth methods (API keys, OAuth, service accounts)

Incremental Updates & Auto-Sync

NEW! Flexible GraphRAG supports optional automatic incremental updates from most data sources, keeping your vector, search, and graph databases synchronized in real time or near real time:

| Data Source | Auto-Sync Support | Detection Method | Status | Notes |
|---|---|---|---|---|
| Alfresco | ✅ Real-time | Community ActiveMQ | Ready | Enterprise Event Gateway planned |
| Amazon S3 | ✅ Real-time | SQS event notifications | Ready | |
| Azure Blob Storage | ✅ Real-time | Change feed | Ready | |
| Google Cloud Storage | ✅ Real-time | Pub/Sub notifications | Ready | |
| Google Drive | ✅ Near real-time | Changes API (polling) | Ready | |
| OneDrive | ✅ Near real-time | Polling | Ready | Delta query support planned |
| SharePoint | ✅ Near real-time | Polling | Ready | Delta query support planned |
| Box | ✅ Near real-time | Events API (polling) | Ready | |
| Local Filesystem | ✅ Real-time | OS events (watchdog) | Ready | REST API and MCP Server only |
| File Upload UI, CMIS, Web Pages, Wikipedia, YouTube | ➖ Not supported | - | - | No support for incremental updates |

Features:

Modification Date Tracking: Uses file modification timestamps (ordinal) to detect changes
Content Hash Optimization: Skips reprocessing when file modification date changed but content hasn't
Dual Mechanism: Event-driven streams (real-time) + periodic polling fallback
LlamaIndex Integration: Uses proper abstractions for all databases
UI, REST API, MCP Server: An auto-sync data source location can be set up through any of the 3 UIs, the REST API, or the MCP server
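
The modification-date and content-hash checks listed above can be sketched roughly as follows. This is an assumption-laden illustration, not the project's implementation: DocState, needs_reprocessing, the integer modified_ordinal field, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocState:
    """Last known state of a document (hypothetical record layout)."""
    modified_ordinal: int  # modification time stored as an integer
    content_hash: str      # hex digest of the content last processed


def sha256_hex(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def needs_reprocessing(previous: Optional[DocState],
                       modified_ordinal: int,
                       content: bytes) -> bool:
    """Decide whether a document must be re-ingested."""
    if previous is None:
        return True   # never seen before
    if modified_ordinal == previous.modified_ordinal:
        return False  # timestamp unchanged: cheap skip, no hashing needed
    # Timestamp changed: re-ingest only if the bytes actually differ.
    return sha256_hex(content) != previous.content_hash


state = DocState(modified_ordinal=1700000000, content_hash=sha256_hex(b"hello"))
print(needs_reprocessing(state, 1700000500, b"hello"))
print(needs_reprocessing(state, 1700000500, b"hello, world"))
```

Checking the timestamp first keeps the common case cheap; the hash is computed only when the timestamp has moved, which is what lets a "touched but unchanged" file skip the expensive parsing and extraction steps.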

Setup Requirements:

Enable incremental updates in your .env file:

ENABLE_INCREMENTAL_UPDATES=true

# PostgreSQL database for state management
# By default, uses the pgvector database from docker-compose.yaml
POSTGRES_INCREMENTAL_URL=postgresql://postgres:password@localhost:5433/postgres

Usage:

Check the "Enable auto change sync" checkbox in the Processing tab when configuring your data source
For S3: Also provide the "SQS Queue URL" for event notifications
For GCS: Also provide the "Pub/Sub Subscription Name" for real-time updates

PostgreSQL for State Management:

Documentation:

System overview: docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/README.md
Quick start: docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/QUICKSTART.md
Detailed setup: docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/SETUP-GUIDE.md
API reference: docs/DATA-SOURCES/INCREMENTAL-UPDATE-AUTO-SYNC/API-REFERENCE.md
PostgreSQL setup: docs/DATABASES/POSTGRES-SETUP.md

Scripts:

scripts/incremental/sync-now.sh|.ps1|.bat - Trigger immediate synchronization
scripts/incremental/set-refresh-interval.sh|.ps1|.bat - Configure polling interval
scripts/incremental/TIMING-CONFIGURATION.md - Timing configuration details
scripts/incremental/README.md - Script usage documentation

Document Processing Options

All data sources support two document parser options:

Docling (Default):

Open-source, local processing
Free with no API costs
GPU acceleration supported (CUDA/Apple Silicon) for 5-10x faster processing
Built-in OCR for scanned documents and images — DOCLING_OCR=true + DOCLING_OCR_ENGINE=auto|rapidocr|easyocr|tesseract_cli|tesserocr|ocrmac
Multi-language support (English, German, French, Spanish, Czech, Russian, Chinese, Japanese, etc.)
Configured via: DOCUMENT_PARSER=docling
DOCLING_DEVICE=auto|cpu|cuda|mps — control GPU vs CPU processing
SAVE_PARSING_OUTPUT=true — save intermediate parsing results for inspection (works for both parsers)
PARSER_FORMAT_FOR_EXTRACTION=auto|markdown|plaintext — control format used for knowledge graph extraction
See Docling GPU + OCR Configuration Guide for setup details | Quick Reference

LlamaParse:

Cloud-based API service with advanced AI
Multimodal parsing with Claude Sonnet 3.5
Three modes available:
- parse_page_without_llm - 1 credit/page
- parse_page_with_llm - 3 credits/page (default)
- parse_page_with_agent - 10-90 credits/page
Configured via: DOCUMENT_PARSER=llamaparse + LLAMAPARSE_API_KEY
Get your API key from LlamaCloud
New: SAVE_PARSING_OUTPUT=true - Save parsed output and metadata for inspection
New: PARSER_FORMAT_FOR_EXTRACTION=auto|markdown|plaintext - Control format used for knowledge graph extraction
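
As a quick way to budget LlamaParse usage, the per-page credit figures above can be turned into a small estimator. The mode names and credit values are taken from the list above; the function name and the (low, high) return shape are assumptions made for this sketch.

```python
# Credits per page for each mode, as (low, high) bounds.
CREDITS_PER_PAGE = {
    "parse_page_without_llm": (1, 1),
    "parse_page_with_llm": (3, 3),      # default mode
    "parse_page_with_agent": (10, 90),  # documented as a range
}


def estimate_credits(pages: int, mode: str = "parse_page_with_llm") -> tuple[int, int]:
    """Return the (minimum, maximum) credits needed to parse `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    low, high = CREDITS_PER_PAGE[mode]
    return pages * low, pages * high


print(estimate_credits(120))
print(estimate_credits(120, "parse_page_with_agent"))
```

For a 120-page document this gives 360 credits in the default mode and between 1,200 and 10,800 credits in agent mode.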

Supported File Formats

Document Formats

PDF: .pdf
- Docling: Advanced layout analysis, table extraction, formula recognition, configurable OCR (EasyOCR, Tesseract, RapidOCR)
- LlamaParse: Automatic OCR within parsing pipeline, multimodal vision processing
Microsoft Office: .docx, .xlsx, .pptx and legacy formats (.doc, .xls, .ppt)
- Docling: DOCX, XLSX, PPTX structure preservation and content extraction
- LlamaParse: Full Office suite support including legacy formats and hundreds of variants
Web Formats: .html, .htm, .xhtml
- Docling: HTML/XHTML markup structure analysis
- LlamaParse: HTML/XHTML content extraction and formatting
Data Formats: .csv, .tsv, .json, .xml
- Docling: CSV structured data processing
- LlamaParse: CSV, TSV, JSON, XML with enhanced table understanding
Documentation: .md, .markdown, .asciidoc, .adoc, .rtf, .txt, .epub
- Docling: Markdown, AsciiDoc technical documentation with markup preservation
- LlamaParse: Extended format support including RTF, EPUB, and hundreds of text format variants

Image Formats

Standard Images: .png, .jpg, .jpeg, .gif, .bmp, .webp, .tiff, .tif
- Docling: OCR text extraction with configurable OCR backends (EasyOCR, Tesseract, RapidOCR)
- LlamaParse: Automatic OCR with multimodal vision processing and context understanding

Audio Formats

Audio Files: .wav, .mp3, .mp4, .m4a
- Docling: Automatic speech recognition (ASR) support
- LlamaParse: Transcription and content extraction for MP3, MP4, MPEG, MPGA, M4A, WAV, WEBM

Processing Intelligence

Parser Selection:
- Docling (default, free): Local processing with specialized CV models (DocLayNet layout analysis, TableFormer for tables), configurable OCR backends (EasyOCR/Tesseract/RapidOCR), optional local VLM support (Granite-Docling, SmolDocling, Qwen2.5-VL, Pixtral)
- LlamaParse (cloud API, 3 credits/page): Automatic OCR in parsing pipeline, supports hundreds of file formats, fast mode (OCR-only), default mode (proprietary LlamaCloud model), premium mode (proprietary VLM mixture), multimodal mode (bring your own API keys: OpenAI GPT-4o, Anthropic Claude 3.5/4.5 Sonnet, Google Gemini 1.5/2.0, Azure OpenAI)
Output Formats:
- Flexible GraphRAG saves both markdown and plaintext, then automatically selects which to use for processing (knowledge graph extraction, vector embeddings, and search indexing) - defaults to markdown for tables, plaintext for text-heavy docs - override with PARSER_FORMAT_FOR_EXTRACTION
- Docling supports: Markdown, JSON (lossless with bounding boxes and provenance), HTML, plain text, and DocTags (specialized markup preserving multi-column layouts, mathematical formulas, and code blocks)
- LlamaParse supports: Markdown, plain text, raw JSON, XLSX (extracted tables), PDF, images (extracted separately), and structured output (beta - enforces custom JSON schema for strict data model extraction)
Format Detection: Automatic routing based on file extension and content analysis
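
The README does not spell out how the auto setting decides between markdown and plaintext, so the following is only a hypothetical heuristic showing the general idea: honor an explicit PARSER_FORMAT_FOR_EXTRACTION value, otherwise prefer markdown when the parsed output contains a table. The function name, the regular expression, and the three-row threshold are all assumptions.

```python
import re

# A markdown table row: a line that starts and ends with a pipe character.
TABLE_ROW = re.compile(r"^[ \t]*\|.*\|[ \t]*$", re.MULTILINE)


def choose_extraction_format(markdown: str, setting: str = "auto",
                             min_table_rows: int = 3) -> str:
    """Pick 'markdown' or 'plaintext' for downstream extraction."""
    if setting in ("markdown", "plaintext"):
        return setting  # explicit override wins
    if setting != "auto":
        raise ValueError(f"unsupported setting: {setting!r}")
    # Header + separator + one data row is the smallest markdown table.
    rows = len(TABLE_ROW.findall(markdown))
    return "markdown" if rows >= min_table_rows else "plaintext"


table_doc = "| name | qty |\n|------|-----|\n| bolt | 4 |\n"
prose_doc = "A plain paragraph.\nAnother line of text.\n"
print(choose_extraction_format(table_doc))
print(choose_extraction_format(prose_doc))
```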

Database Configuration

Flexible GraphRAG uses three types of databases for its hybrid search capabilities. Each can be configured independently via environment variables.

Search Databases (Full-Text Search)

Set SEARCH_DB to select the store and SEARCH_BACKEND=llamaindex or langchain for the framework.

BM25 (Built-in): Local BM25 full-text search (BM25 is a TF-IDF-family ranking function) that runs in-process, with no external server
- Dashboard: None (file-based)
- Configuration:
```
SEARCH_DB=bm25
BM25_SEARCH_DB_CONFIG={"persist_dir": "./bm25_index"}
```
Elasticsearch: Enterprise search engine with advanced analyzers, faceted search, and real-time analytics
- Dashboard: Kibana (http://localhost:5601)
- Configuration:
```
SEARCH_DB=elasticsearch
ELASTICSEARCH_SEARCH_DB_CONFIG={"hosts": ["http://localhost:9200"], "index_name": "hybrid_search"}
```
OpenSearch: AWS-led open-source fork with native hybrid scoring (vector + BM25) and k-NN algorithms
- Dashboard: OpenSearch Dashboards (http://localhost:5601)
- Configuration:
```
SEARCH_DB=opensearch
OPENSEARCH_SEARCH_DB_CONFIG={"hosts": ["http://localhost:9201"], "index_name": "hybrid_search"}
```
None: Disable full-text search (vector search only)
- Configuration:
```
SEARCH_DB=none
```
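
Each store's settings above are a JSON object stored in an environment variable named after the store. A minimal sketch of reading such a pair is shown below; the variable naming pattern follows the examples above, while the function name, the validation, and the fallback to "none" when SEARCH_DB is unset are assumptions rather than the project's actual loader.

```python
import json
from typing import Mapping


def load_search_config(env: Mapping[str, str]) -> tuple[str, dict]:
    """Return (search_db, config) from SEARCH_DB and <NAME>_SEARCH_DB_CONFIG."""
    search_db = env.get("SEARCH_DB", "none").strip().lower()
    if search_db == "none":
        return search_db, {}
    key = f"{search_db.upper()}_SEARCH_DB_CONFIG"
    raw = env.get(key, "{}")
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{key} is not valid JSON: {exc}") from exc
    if not isinstance(config, dict):
        raise ValueError(f"{key} must be a JSON object")
    return search_db, config


env = {
    "SEARCH_DB": "opensearch",
    "OPENSEARCH_SEARCH_DB_CONFIG":
        '{"hosts": ["http://localhost:9201"], "index_name": "hybrid_search"}',
}
print(load_search_config(env))
```

In a running process the same function can be called with os.environ. Note that the JSON must use double quotes, which is easy to get wrong when the value is also quoted by a shell.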

Vector Databases (Semantic Search)

Set VECTOR_DB to select the store and VECTOR_BACKEND=llamaindex or langchain for the framework.

When switching embedding models, delete existing vector indexes — dimensions differ by provider. See docs/DATABASES/VECTOR-DATABASES/VECTOR-DIMENSIONS.md for cleanup instructions.
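
The reason for this cleanup step is that a vector index is created for one fixed dimension, and vectors of another length cannot be stored in or compared against it. The sketch below illustrates the check; the table lists the default output dimensions of a few well-known embedding models, and the function name is hypothetical.

```python
# Default output dimensions of some common embedding models.
EMBEDDING_DIMS = {
    "text-embedding-3-small": 1536,  # OpenAI
    "text-embedding-3-large": 3072,  # OpenAI
    "nomic-embed-text": 768,         # commonly served via Ollama
    "all-minilm": 384,               # commonly served via Ollama
}


def index_is_compatible(index_dim: int, model: str) -> bool:
    """True if an existing index of `index_dim` can hold `model` vectors."""
    return EMBEDDING_DIMS[model] == index_dim


# An index built with text-embedding-3-small cannot be reused after
# switching to nomic-embed-text: 1536 != 768.
print(index_is_compatible(1536, "text-embedding-3-small"))
print(index_is_compatible(1536, "nomic-embed-text"))
```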

Supported Vector Databases

Neo4j: Can be used as vector database with separate vector configuration

Dashboard: Neo4j Browser (http://localhost:7474)

Configuration:

VECTOR_DB=neo4j
NEO4J_VECTOR_DB_CONFIG={"uri": "bolt://localhost:7687", "username": "neo4j", "password": "your_password", "index_name": "hybrid_search_vector"}

Qdrant: Dedicated vector database with advanced filtering
- Dashboard: Qdrant Web UI (http://localhost:6333/dashboard)
- Configuration:
```
VECTOR_DB=qdrant
QDRANT_VECTOR_DB_CONFIG={"host": "localhost", "port": 6333, "collection_name": "hybrid_search"}
```
Elasticsearch: Can be used as vector database with separate vector configuration
- Dashboard: Kibana (http://localhost:5601)
- Configuration:
```
VECTOR_DB=elasticsearch
ELASTICSEARCH_VECTOR_DB_CONFIG={"hosts": ["http://localhost:9200"], "index_name": "hybrid_search_vectors"}
```
OpenSearch: Can be used as vector database with separate vector configuration
- Dashboard: OpenSearch Dashboards (http://localhost:5601)
- Configuration:
```
VECTOR_DB=opensearch
OPENSEARCH_VECTOR_DB_CONFIG={"hosts": ["http://localhost:9201"], "index_name": "hybrid_search_vectors"}
```

Chroma: Open-source vector database with dual deployment modes

Dashboard: Swagger UI (http://localhost:8001/docs/) (HTTP mode)

Configuration (Local Mode):

VECTOR_DB=chroma
CHROMA_VECTOR_DB_CONFIG={"persist_directory": "./chroma_db", "collection_name": "hybrid_search"}

Configuration (HTTP Mode):

VECTOR_DB=chroma
CHROMA_VECTOR_DB_CONFIG={"host": "localhost", "port": 8001, "collection_name": "hybrid_search"}

Milvus: Cloud-native, scalable vector database for similarity search
- Dashboard: Attu (http://localhost:3003)
- Configuration:
```
VECTOR_DB=milvus
MILVUS_VECTOR_DB_CONFIG={"host": "localhost", "port": 19530, "collection_name": "hybrid_search"}
```
Weaviate: Vector search engine with semantic capabilities and data enrichment
- Dashboard: Weaviate Console (http://localhost:8081/console)
- Configuration:
```
VECTOR_DB=weaviate
WEAVIATE_VECTOR_DB_CONFIG={"url": "http://localhost:8081", "index_name": "HybridSearch"}
```
Pinecone: Managed vector database service optimized for real-time applications
- Dashboard: Pinecone Console (web-based)
- Configuration:
```
VECTOR_DB=pinecone
PINECONE_VECTOR_DB_CONFIG={"api_key": "your_api_key", "region": "us-east-1", "cloud": "aws", "index_name": "hybrid-search"}
```

PostgreSQL: Traditional database with pgvector extension for vector similarity search

Dashboard: pgAdmin (http://localhost:5050)

Configuration:

VECTOR_DB=postgres
POSTGRES_VECTOR_DB_CONFIG={"host": "localhost", "port": 5433, "database": "postgres", "username": "postgres", "password": "your_password"}

LanceDB: Modern, lightweight vector database designed for high-performance ML applications
- Dashboard: LanceDB Viewer (http://localhost:3005)
- Configuration:
```
VECTOR_DB=lancedb
LANCEDB_VECTOR_DB_CONFIG={"uri": "./lancedb", "table_name": "hybrid_search"}
```

RAG without GraphRAG

For faster document ingest processing (no graph extraction), and hybrid search with only full text + vector, configure:

VECTOR_DB=qdrant       # Any vector store
SEARCH_DB=elasticsearch  # Any search engine
PG_GRAPH_DB=none
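
The effect of these three settings can be summarized with a small helper that reports which stores are active. This is a sketch under stated assumptions: the function name is hypothetical, an unset variable is treated as "none" here, and VECTOR_DB=none is assumed to behave like the documented SEARCH_DB=none and PG_GRAPH_DB=none.

```python
from typing import Mapping


def enabled_stages(env: Mapping[str, str]) -> dict[str, bool]:
    """Report which of the three store types are switched on."""
    def is_on(name: str) -> bool:
        return env.get(name, "none").strip().lower() != "none"

    return {
        "vector": is_on("VECTOR_DB"),
        "search": is_on("SEARCH_DB"),
        "graph": is_on("PG_GRAPH_DB"),
    }


rag_only = {"VECTOR_DB": "qdrant", "SEARCH_DB": "elasticsearch", "PG_GRAPH_DB": "none"}
print(enabled_stages(rag_only))
```

With the graph stage off, ingestion skips knowledge graph extraction, which is typically the slowest step because it calls the LLM for every chunk.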

Property Graph Databases (Knowledge Graph / GraphRAG)

Neo4j Property Graph: Primary knowledge graph storage with Cypher querying
- Dashboard: Neo4j Browser (http://localhost:7474)
- Configuration:
```
PG_GRAPH_DB=neo4j
NEO4J_GRAPH_DB_CONFIG={"uri": "bolt://localhost:7687", "username": "neo4j", "password": "your_password"}
```

ArcadeDB: Multi-model database supporting graph, document, key-value, and search with SQL and Cypher

Dashboard: ArcadeDB Studio (http://localhost:2480)

Configuration:

PG_GRAPH_DB=arcadedb
ARCADEDB_GRAPH_DB_CONFIG={"host": "localhost", "port": 2480, "username": "root", "password": "password", "database": "flexible_graphrag", "query_language": "sql"}

FalkorDB: High-performance graph database using GraphBLAS; purpose-built for LLM / GraphRAG
- Dashboard: FalkorDB Browser (http://localhost:3001)
- Configuration:
```
PG_GRAPH_DB=falkordb
FALKORDB_GRAPH_DB_CONFIG={"url": "falkor://localhost:6379", "database": "falkor"}
```
Ladybug: Embedded property graph database (Cypher, single .lbug file) with optional structured schema and HNSW vector index on chunks; Explorer UI via Docker (port 7003)
- Configuration:
```
PG_GRAPH_DB=ladybug
LADYBUG_GRAPH_DB_CONFIG={"db_dir": "./ladybug", "db_file": "database.lbug", "use_vector_index": true, "has_structured_schema": false, "strict_schema": false}
```
MemGraph: Real-time graph database with streaming support and advanced graph algorithms
- Dashboard: MemGraph Lab (http://localhost:3002)
- Configuration:
```
PG_GRAPH_DB=memgraph
MEMGRAPH_GRAPH_DB_CONFIG={"url": "bolt://localhost:7687", "username": "", "password": ""}
```
NebulaGraph: Distributed graph database for large-scale data with horizontal scalability
- Dashboard: NebulaGraph Studio (http://localhost:7001)
- Configuration:
```
PG_GRAPH_DB=nebula
NEBULA_GRAPH_DB_CONFIG={"space": "flexible_graphrag", "host": "localhost", "port": 9669, "username": "root", "password": "nebula"}
```
Amazon Neptune: Fully managed graph database service supporting property graph and RDF models
- Dashboard: Graph-Explorer (http://localhost:3007) or Neptune Workbench (AWS Console)
- Configuration:
```
PG_GRAPH_DB=neptune
NEPTUNE_GRAPH_DB_CONFIG={"host": "your-cluster.region.neptune.amazonaws.com", "port": 8182}
```
Amazon Neptune Analytics: Serverless graph analytics with openCypher support
- Dashboard: Graph-Explorer (http://localhost:3007) or Neptune Workbench (AWS Console)
- Configuration:
```
PG_GRAPH_DB=neptune_analytics
NEPTUNE_ANALYTICS_GRAPH_DB_CONFIG={"graph_identifier": "g-xxxxx", "region": "us-east-1"}
```
Google Cloud Spanner Graph (LlamaIndex only): Managed relational + property graph (GQL). Uses llama-index-spanner — install with uv pip install -e ".[spanner-extras]" then uv pip uninstall llama-index (see Optional under Prerequisites). LangChain is not supported for this store (langchain-google-spanner pins incompatible langchain-core).
- Setup: docs/DATABASES/GRAPH-DATABASES/SPANNER-SETUP.md
- Configuration:
```
PG_GRAPH_DB=spanner
# GRAPH_BACKEND=llamaindex is forced for Spanner (LlamaIndex-only); langchain is ignored
SPANNER_GRAPH_DB_CONFIG={"project_id": "my-gcp-project", "instance_id": "my-spanner-instance", "database_id": "my-database", "graph_name": "knowledge_graph", "credentials_file": "./gcs.json"}
```

ArangoDB (LangChain only): Multi-model database with AQL graph queries

Dashboard: ArangoDB Web UI (http://localhost:8529)

Configuration:

PG_GRAPH_DB=arangodb
ARANGODB_GRAPH_DB_CONFIG={"url": "http://localhost:8529", "database": "flexible_graphrag", "username": "root", "password": "password"}

Apache AGE (LangChain only): PostgreSQL extension for graph data via Cypher

Dashboard: pgAdmin (http://localhost:5050)

Configuration:

PG_GRAPH_DB=apache_age
APACHE_AGE_GRAPH_DB_CONFIG={"host": "localhost", "port": 5434, "database": "flexible_graphrag_age", "username": "postgres", "password": "password", "graph_name": "knowledge_graph"}

HugeGraph (LangChain only): Distributed graph database with Gremlin and openCypher
- Dashboard: HugeGraph Hubble (http://localhost:8085)
- Configuration:
```
PG_GRAPH_DB=hugegraph
HUGEGRAPH_GRAPH_DB_CONFIG={"host": "localhost", "port": 8082, "database": "hugegraph"}
```

SurrealDB (LangChain only): Multi-model database with SurrealQL graph queries

Dashboard: Surrealist (http://localhost:8011)

Configuration:

PG_GRAPH_DB=surrealdb
SURREALDB_GRAPH_DB_CONFIG={"url": "ws://localhost:8010/rpc", "namespace": "test", "database": "flexible_graphrag", "username": "root", "password": "root"}

TigerGraph (LangChain only): Distributed graph database with GSQL

Dashboard: GraphStudio (http://localhost:14240)

Configuration:

PG_GRAPH_DB=tigergraph
TIGERGRAPH_GRAPH_DB_CONFIG={"host": "http://localhost", "port": 14240, "restpp_port": 9002, "database": "MyGraph", "username": "tigergraph", "password": "tigergraph"}

Cosmos Gremlin (LangChain only): Azure Cosmos DB for Gremlin API

Configuration:

PG_GRAPH_DB=cosmos_gremlin
COSMOS_GREMLIN_GRAPH_DB_CONFIG={"url": "ws://localhost:8182/gremlin"}

None: Disable knowledge graph extraction for RAG-only mode
- Configuration:
```
PG_GRAPH_DB=none
```
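
The JSON-valued settings above can be read with a few lines of Python. The sketch below is an illustration only: `load_graph_db_config` is a hypothetical helper, the rule that the variable name is the upper-cased `PG_GRAPH_DB` value plus `_GRAPH_DB_CONFIG` is inferred from the examples in this section, and falling back to `none` when `PG_GRAPH_DB` is unset is an assumption. The backend's own configuration loading may differ.

```python
import json
import os


def load_graph_db_config(environ=None):
    """Return (store_name, config_dict) for the selected property graph store."""
    env = os.environ if environ is None else environ
    # Assumption: treat an unset PG_GRAPH_DB like "none" (RAG-only mode).
    store = env.get("PG_GRAPH_DB", "none").strip().lower()
    if store == "none":
        return store, {}
    # Assumed naming rule, inferred from the examples above.
    var_name = f"{store.upper()}_GRAPH_DB_CONFIG"
    raw = env.get(var_name)
    if raw is None:
        raise KeyError(f"{var_name} is not set for PG_GRAPH_DB={store}")
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{var_name} is not valid JSON: {exc}") from exc
    if not isinstance(config, dict):
        raise ValueError(f"{var_name} must be a JSON object")
    return store, config
```

Passing a plain dict instead of relying on `os.environ` makes it easy to check a `.env` value before starting the backend, since a stray quote or trailing comma in the JSON is reported with the variable name.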

Ontology and RDF Support

Load OWL/RDFS ontologies (owl:Class, owl:ObjectProperty, owl:DatatypeProperty, rdfs:domain, rdfs:range) to constrain entity/relation extraction; OWL is supported but not required
Works with all 15 property graph databases — no RDF store required to use ontology-guided extraction
Full pipeline for all 4 RDF graph stores: UI document ingest → KG extraction → RDF storage; auto incremental sync; Hybrid Search and AI Query/Chat fuse RDF store results alongside vector, BM25, and property graph results
SPARQL 1.1 queries; RDF 1.2 triple terms and relation annotations ({| |} syntax); XSD-typed literals from OWL DatatypeProperty ranges

RDF Graph Store Configuration: set RDF_GRAPH_DB to select the store. All four stores support RDF 1.2 triple terms; Neptune is AWS-managed, so it has no local compose include.

Apache Jena Fuseki — SPARQL 1.1 server; dashboard: http://localhost:3030
```
RDF_GRAPH_DB=fuseki
FUSEKI_BASE_URL=http://localhost:3030
FUSEKI_DATASET=flexible-graphrag
```

Ontotext GraphDB — enterprise RDF store with OWL reasoning; dashboard: http://localhost:7200

```
RDF_GRAPH_DB=graphdb
GRAPHDB_BASE_URL=http://localhost:7200
GRAPHDB_REPOSITORY=flexible-graphrag
GRAPHDB_USERNAME=admin
GRAPHDB_PASSWORD=admin
```

Oxigraph — lightweight local store, native RDF 1.2; dashboard: http://localhost:7878
```
RDF_GRAPH_DB=oxigraph
OXIGRAPH_URL=http://localhost:7878
```

Amazon Neptune RDF — managed SPARQL 1.1 on Neptune (same cluster can host property graph and RDF; IAM SigV4 auth). See Neptune RDF setup.

```
RDF_GRAPH_DB=neptune_rdf
NEPTUNE_RDF_HOST=db-neptune-1.cluster-xxxxxxxxxxxx.us-east-1.neptune.amazonaws.com
NEPTUNE_RDF_PORT=8182
NEPTUNE_RDF_REGION=us-east-1
NEPTUNE_RDF_USE_IAM_AUTH=true
NEPTUNE_RDF_USE_HTTPS=true
# Optional explicit keys (else default AWS credential chain):
# NEPTUNE_RDF_AWS_ACCESS_KEY_ID=
# NEPTUNE_RDF_AWS_SECRET_ACCESS_KEY=
```

None — disable RDF graph store:
```
RDF_GRAPH_DB=none
```
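
To make the relationship between these variables and the actual SPARQL endpoint concrete, here is a minimal sketch. It relies on each server's usual query path (`/{dataset}/sparql` for Fuseki, `/repositories/{id}` for GraphDB, `/query` for Oxigraph Server, `/sparql` for Neptune). `sparql_query_endpoint` is a hypothetical helper, and the project's RDF adapters may build their URLs differently.

```python
def sparql_query_endpoint(env):
    """Return the SPARQL query URL implied by the RDF_GRAPH_DB settings, or None."""
    store = env.get("RDF_GRAPH_DB", "none").strip().lower()
    if store == "fuseki":
        base = env["FUSEKI_BASE_URL"].rstrip("/")
        return f"{base}/{env['FUSEKI_DATASET']}/sparql"
    if store == "graphdb":
        base = env["GRAPHDB_BASE_URL"].rstrip("/")
        return f"{base}/repositories/{env['GRAPHDB_REPOSITORY']}"
    if store == "oxigraph":
        return env["OXIGRAPH_URL"].rstrip("/") + "/query"
    if store == "neptune_rdf":
        # Assumption: HTTPS unless NEPTUNE_RDF_USE_HTTPS is explicitly not "true".
        use_https = env.get("NEPTUNE_RDF_USE_HTTPS", "true").lower() == "true"
        scheme = "https" if use_https else "http"
        port = env.get("NEPTUNE_RDF_PORT", "8182")
        return f"{scheme}://{env['NEPTUNE_RDF_HOST']}:{port}/sparql"
    return None  # RDF_GRAPH_DB=none or an unrecognized value
```

With the Fuseki settings shown above, the function returns http://localhost:3030/flexible-graphrag/sparql, which is a convenient URL to try in a browser or SPARQL client when checking connectivity.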

Docker Setup: Uncomment local RDF store includes in docker-compose.yaml (Fuseki, GraphDB, Oxigraph):

```
includes:
  # - includes/jena-fuseki.yaml
  # - includes/ontotext-graphdb.yaml
  # - includes/oxigraph.yaml
```

Complete Documentation: docs/DATABASES/RDF/RDF-ONTOLOGY-SUPPORT.md | docs/DATABASES/RDF/RDF-STORE-USER-GUIDE.md

Framework Configuration

Each pipeline stage can run on LlamaIndex or LangChain independently of the others. The framework for each stage is selected with an environment variable:

| Variable | Options | Description |
|---|---|---|
| `GRAPH_BACKEND` | `llamaindex` \| `langchain` | Property graph store and KG retrieval |
| `VECTOR_BACKEND` | `llamaindex` \| `langchain` | Vector store adapter |
| `SEARCH_BACKEND` | `llamaindex` \| `langchain` | Full-text search adapter |
| `CHUNKER_BACKEND` | `llamaindex` \| `langchain` | Document chunking / splitting |
| `KG_EXTRACTOR_BACKEND` | `llamaindex` \| `langchain` | KG extraction from chunks |
| `RETRIEVAL_FUSION` | `llamaindex` \| `langchain` | Result fusion across retrievers |

Complete Documentation: docs/ADVANCED/LANGCHAIN/LANGCHAIN-GRAPH-INTEGRATION.md
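
As an illustration of how the six per-stage variables combine, the sketch below resolves them from an environment mapping and rejects unknown values. `resolve_backends` is a hypothetical helper, not the project's loader. The `llamaindex` default reflects the statement that LlamaIndex is the default for each stage.

```python
import os

STAGE_VARS = (
    "GRAPH_BACKEND",
    "VECTOR_BACKEND",
    "SEARCH_BACKEND",
    "CHUNKER_BACKEND",
    "KG_EXTRACTOR_BACKEND",
    "RETRIEVAL_FUSION",
)
ALLOWED = {"llamaindex", "langchain"}


def resolve_backends(environ=None):
    """Map each stage variable to its chosen framework, defaulting to llamaindex."""
    env = os.environ if environ is None else environ
    chosen = {}
    for var in STAGE_VARS:
        value = env.get(var, "llamaindex").strip().lower()
        if value not in ALLOWED:
            raise ValueError(f"{var}={value!r}; expected one of {sorted(ALLOWED)}")
        chosen[var] = value
    return chosen
```

For example, setting only `GRAPH_BACKEND=langchain` leaves the other five stages on LlamaIndex, which is the kind of mixed setup needed for the LangChain-only graph stores.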

LLM and Embedding Configuration

Select the LLM with LLM_PROVIDER and the provider-specific environment variables.

Supported LLM Providers

OpenAI - gpt-4o-mini (default), gpt-4o, gpt-4.1-mini, gpt-5-mini, etc.
Ollama - Local deployment (llama3.2, llama3.1, qwen2.5, gpt-oss, etc.)
Azure OpenAI - Azure-hosted OpenAI models
Google Gemini - gemini-2.5-flash, gemini-3-flash-preview, gemini-3.1-pro-preview, etc.
Anthropic Claude - claude-sonnet-4-5, claude-haiku-4-5, etc.
Google Vertex AI - Google Cloud-hosted Vertex AI Platform Gemini models
Amazon Bedrock - Amazon Nova, Titan, Anthropic Claude, Meta Llama, Mistral AI, etc.
Groq - Fast, low-cost LPU inference: OpenAI GPT-OSS, Meta Llama (4, 3.3, 3.1), Qwen3, Kimi, etc.
Fireworks AI - More choices, fine-tuning: Meta, Qwen, Mistral AI, DeepSeek, OpenAI GPT-OSS, Kimi, GLM, MiniMax, etc.
OpenAI-Compatible (openai_like) - Any OpenAI-compatible endpoint (LM Studio, LocalAI, Llamafile, vLLM, etc.)
OpenRouter - 200+ models via unified API (openai/gpt-4o-mini, anthropic/claude, meta-llama, etc.)
LiteLLM Proxy - 100+ providers via LiteLLM proxy; sample config in scripts/litellm_config.yaml
vLLM - High-performance local inference server (Linux/macOS; use openai_like on Windows)

LLM Provider Configuration

See docs/LLM/LLM-EMBEDDING-CONFIG.md for all 13 providers with detailed configuration examples.

OpenAI (recommended):

```
LLM_PROVIDER=openai
OPENAI_API_KEY=your_api_key
OPENAI_MODEL=gpt-4o-mini
```

Ollama (local):

```
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.2:latest
```

Azure OpenAI:

```
LLM_PROVIDER=azure_openai
AZURE_OPENAI_API_KEY=your_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_ENGINE=gpt-4o-mini
```

Embedding Configuration

OpenAI:

```
EMBEDDING_KIND=openai
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
OPENAI_API_KEY=your_api_key
```

Ollama (local):

```
EMBEDDING_KIND=ollama
OLLAMA_EMBEDDING_MODEL=nomic-embed-text
OLLAMA_BASE_URL=http://localhost:11434
```

Azure OpenAI:

```
EMBEDDING_KIND=azure_openai
AZURE_EMBEDDING_MODEL=text-embedding-3-small
AZURE_EMBEDDING_DEPLOYMENT=your_deployment_name
AZURE_OPENAI_API_KEY=your_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
```

Common embedding dimensions:

OpenAI: 1536 (text-embedding-3-small), 3072 (text-embedding-3-large)
Ollama: 384 (all-minilm), 768 (nomic-embed-text), 1024 (mxbai-embed-large)
Google: 768 (gemini-embedding-2-preview)
Bedrock: 1024 (amazon.titan-embed-text-v2:0)

When switching embedding models, delete existing vector indexes. See docs/DATABASES/VECTOR-DATABASES/VECTOR-DIMENSIONS.md for cleanup instructions.
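
The reason for deleting indexes can be shown with a small lookup. This is a sketch only: the dimensions are copied from the list above, and `describe_model_switch` is a hypothetical helper rather than part of the project.

```python
# Dimensions copied from the "Common embedding dimensions" list above.
EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "all-minilm": 384,
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
    "gemini-embedding-2-preview": 768,
    "amazon.titan-embed-text-v2:0": 1024,
}


def describe_model_switch(old_model, new_model):
    """Summarize what changing the embedding model means for existing indexes."""
    if old_model == new_model:
        return "unchanged"
    old_dim = EMBEDDING_DIMENSIONS[old_model]
    new_dim = EMBEDDING_DIMENSIONS[new_model]
    if old_dim != new_dim:
        return f"recreate index: dimension changes from {old_dim} to {new_dim}"
    # Same size, but vectors produced by different models are not comparable.
    return "recreate index: same dimension, different model"
```

A dimension change usually makes the vector database reject new vectors outright. A switch between two models of the same size fails more quietly, because old and new vectors sit in unrelated embedding spaces, so the index should be recreated in both cases.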

Ollama Configuration

When using Ollama, configure system-wide environment variables before starting the Ollama service:

Key requirements:

Configure environment variables system-wide (not in Flexible GraphRAG .env file)
OLLAMA_NUM_PARALLEL=4 for optimal performance (or 1-2 if resource constrained)
Always restart Ollama service after changing environment variables

See docs/LLM/OLLAMA-CONFIGURATION.md for complete setup instructions including platform-specific steps and performance optimization.

Prerequisites

Required

Python 3.12, 3.13, or 3.14 (as specified in pyproject.toml)
UV package manager (for dependency management)
Node.js 22.x (for UI clients)
npm (package manager)
Search database: Elasticsearch or OpenSearch
Vector database: Qdrant (or other supported vector databases)
Property graph database: Neo4j (or other supported property graph databases) - unless using vector-only RAG
OpenAI with API key (recommended) or Ollama (for LLM processing)

Note: The docker/docker-compose.yaml file can provide all these databases via Docker containers.

Install

```
cd flexible-graphrag
uv pip install -e .
```

Optional (see flexible-graphrag/pyproject.toml for all options)

LangChain 1.x integration — Optional peer stack alongside LlamaIndex (extras pin langchain>=1.0 and the LangChain 1.x line, not legacy 0.3):
- uv pip install -e ".[langchain]" — core LC extras: property graph stores via langchain-community where supported, 10 vector stores, 3 search stores, RDF SPARQL retrieval, native LC LLM/embedding clients for all 13 providers, KG extraction via langchain-experimental, retrieval fusion
- uv pip install --override extras-overrides.txt -e ".[langchain,langchain-extras]" — adds Neo4j (LC), PostgreSQL pgvector, ArcadeDB, ArangoDB, Cosmos Gremlin, HugeGraph, TigerGraph, and related dependencies (see pyproject.toml group langchain-extras)
- Apache AGE — property graph via LangChain needs the separate age-extras group (BAEM1N langchain-age driver):
```
uv pip install --override extras-overrides.txt -e ".[langchain,langchain-extras,age-extras]"
python scripts/patch_langchain_age.py
```
  Run patch_langchain_age.py on Python 3.14+ (required); on 3.12/3.13 it is harmless.
- uv pip install -e ".[spanner-extras]" — adds LI-only Spanner support via llama-index-spanner. Note: llama-index-spanner declares llama-index (the meta-package) as a dependency, which uv will install. Uninstall it immediately after: uv pip uninstall llama-index — having both llama-index and llama-index-core installed simultaneously can cause version conflicts, as the meta-package pins versions of llama-index-* component packages that can clash with the versions already required by this project
- SurrealDB — two-step install required (resolver conflict):
```
uv pip install -e ".[surrealdb-extras]"
uv pip install "surrealdb>=2.0" "langchain-core>=1.3"
```
ArcadeDB embedded mode (uv pip install "arcadedb-embedded>=26.3.2"): runs ArcadeDB in-process; includes a bundled JVM, no separate Java install needed; latest release: 26.3.2. Quote the requirement so the shell does not treat >= as an output redirect.
Enterprise Repositories:
- Alfresco repository - only if using Alfresco data source
- SharePoint - requires SharePoint access
- Box - requires Box Business account (3 users minimum), API keys
- CMIS-compliant repository (e.g., Alfresco) - only if using CMIS data source
Cloud Storage (requires accounts and API keys/credentials):
- Amazon S3 - requires AWS account and access keys
- Google Cloud Storage - requires GCP account and service account credentials
- Google Drive - requires Google Cloud account and OAuth credentials or service account
- Azure Blob Storage - requires Azure account and connection string or account keys
- Microsoft OneDrive - requires OneDrive for Business (not personal OneDrive)
- Note: SharePoint and OneDrive for Business are also available through an M365 Developer Program sandbox (requires a full annual Visual Studio subscription, not a monthly one).
File Upload (no account required):
- Web interface with file dialog (drag & drop or click to select)
Web Sources (no account required):
- Web pages, Wikipedia, YouTube - no accounts needed
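
The Spanner note in the optional installs above warns against having both llama-index and llama-index-core installed. A quick way to check an environment is sketched below using the standard library's `importlib.metadata`; `meta_package_conflict` is a hypothetical helper name, not a script shipped with the project.

```python
from importlib import metadata


def installed_version(package, lookup=metadata.version):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return lookup(package)
    except metadata.PackageNotFoundError:
        return None


def meta_package_conflict(lookup=metadata.version):
    """True when the llama-index meta-package sits alongside llama-index-core."""
    has_meta = installed_version("llama-index", lookup) is not None
    has_core = installed_version("llama-index-core", lookup) is not None
    return has_meta and has_core
```

If `meta_package_conflict()` returns True after installing the spanner-extras group, running uv pip uninstall llama-index as described above removes the meta-package and leaves the core package in place.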

Setup

🐳 Docker Deployment

Docker deployment offers multiple scenarios. Before deploying any scenario, set up your environment files:

Environment File Setup (Required for All Scenarios):

Backend Configuration (.env):

```
# Navigate to backend directory
cd flexible-graphrag

# Linux/macOS
cp env-sample.txt .env

# Windows Command Prompt
copy env-sample.txt .env

# Edit .env with your database credentials, API keys, and settings
# Then return to project root
cd ..
```

Docker Configuration (docker.env):

```
# Navigate to docker directory
cd docker

# Linux/macOS
cp docker-env-sample.txt docker.env

# Windows Command Prompt
copy docker-env-sample.txt docker.env

# Edit docker.env for Docker-specific overrides (network addresses, service names)
# Stay in docker directory for next steps
```

Scenario A: Databases in Docker, App Standalone (Hybrid)

Configuration Setup:

```
# If not already in docker directory from previous step:
# cd docker

# Edit docker-compose.yaml to uncomment/comment services as needed
# Scenario A setup in docker-compose.yaml:
# Keep these services uncommented (default setup):
  - includes/neo4j.yaml
  - includes/qdrant.yaml
  - includes/elasticsearch-dev.yaml
  - includes/kibana-simple.yaml

# Keep these services commented out:
# - includes/app-stack.yaml       # Must be commented out for Scenario A
# - includes/proxy.yaml           # Must be commented out for Scenario A
# - All other services remain commented unless you want a different vector database, 
#   graph database, OpenSearch for search, or Alfresco included
```

Deploy Services:

```
# From the docker directory
docker-compose -f docker-compose.yaml -p flexible-graphrag up -d
```

Scenario B: Full Stack in Docker (Complete)

Configuration Setup:

```
# If not already in docker directory from previous step:
# cd docker

# Edit docker-compose.yaml to uncomment/comment services as needed
# Scenario B setup in docker-compose.yaml:
# Keep these services uncommented:
  - includes/neo4j.yaml
  - includes/qdrant.yaml
  - includes/elasticsearch-dev.yaml
  - includes/kibana-simple.yaml
  - includes/app-stack.yaml       # Backend and UI in Docker
  - includes/proxy.yaml           # NGINX reverse proxy

# Keep other services commented out unless you want a different vector database,
# graph database, OpenSearch for search, or Alfresco included
```

Deploy Services:

```
# From the docker directory
docker-compose -f docker-compose.yaml -p flexible-graphrag up -d
```

Scenario B Service URLs:

Angular UI: http://localhost:8070/ui/angular/
React UI: http://localhost:8070/ui/react/
Vue UI: http://localhost:8070/ui/vue/
Backend API: http://localhost:8070/api/

Other Deployment Scenarios

Scenario C: Fully Standalone - Not using docker-compose at all

Standalone backend, standalone UIs, all databases running separately
Configure all database connections in flexible-graphrag/.env

Scenario D: Backend/UIs in Docker, Databases External

Using docker-compose for backend and UIs (app-stack + proxy)
Some or all databases running separately (same docker-compose, other local Docker, cloud/remote servers)
Configure database connections in docker/docker.env: Backend in Docker reads this file
- For databases in same docker-compose: Use service names (e.g., neo4j:7687, qdrant:6333)
- For databases in other local Docker containers: Use host.docker.internal:PORT
- For remote/cloud databases: Use actual hostnames/IPs
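
The three addressing rules for Scenario D can be summarized in a small function. This is an illustrative sketch: `database_address` and its `location` labels are hypothetical names chosen here, not settings read by the backend.

```python
def database_address(location, port, service_name=None, hostname=None):
    """Return host:port as seen from the backend container in Scenario D."""
    if location == "same_compose":
        # Containers in the same compose project resolve each other by service name.
        if not service_name:
            raise ValueError("service_name is required for same_compose")
        return f"{service_name}:{port}"
    if location == "other_local_docker":
        # Reach a port published on the host from inside the backend container.
        return f"host.docker.internal:{port}"
    if location == "remote":
        if not hostname:
            raise ValueError("hostname is required for remote")
        return f"{hostname}:{port}"
    raise ValueError(f"unknown location: {location!r}")
```

For instance, `database_address("same_compose", 7687, service_name="neo4j")` gives neo4j:7687, matching the service-name example above; the resulting values belong in docker/docker.env.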

Scenario E: Mixed Docker/Standalone

Standalone backend and UIs
Running some databases in Docker (local) and some outside (cloud, external servers)
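Configure all database connections in flexible-graphrag/.env: use localhost:PORT for Docker databases running locally with published ports (the standalone backend runs on the host, so host.docker.internal is only needed from inside a container), and use actual hostnames/IPs for remote Docker or non-Docker databases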

Docker Control and Configuration

Managing Docker services:

```
# Navigate to docker directory (if not already there)
cd docker

# Create and start services (recreates if configuration changed)
docker-compose -f docker-compose.yaml -p flexible-graphrag up -d

# Stop services (keeps containers)
docker-compose -f docker-compose.yaml -p flexible-graphrag stop

# Start stopped services
docker-compose -f docker-compose.yaml -p flexible-graphrag start

# Stop and remove services
docker-compose -f docker-compose.yaml -p flexible-graphrag down

# View logs
docker-compose -f docker-compose.yaml -p flexible-graphrag logs -f

# Restart after configuration changes
docker-compose -f docker-compose.yaml -p flexible-graphrag down
# Edit docker-compose.yaml, docker.env, or includes/app-stack.yaml as needed
docker-compose -f docker-compose.yaml -p flexible-graphrag up -d
```

Configuration:

Modular deployment: Comment/uncomment services in docker/docker-compose.yaml
Backend configuration (Scenario B): Backend uses flexible-graphrag/.env with docker/docker.env for Docker-specific overrides (like using service names instead of localhost). No configuration needed in app-stack.yaml

See docker/README.md for detailed Docker configuration.

🔧 Local Development Setup (Scenario A)

Note: Skip this entire section if using Scenario B (Full Stack in Docker).

Environment Configuration

Create environment file (cross-platform):

```
# Linux/macOS
cp flexible-graphrag/env-sample.txt flexible-graphrag/.env

# Windows Command Prompt
copy flexible-graphrag\env-sample.txt flexible-graphrag\.env
```

Edit .env with your database credentials and API keys.

Python Backend Setup (Standalone)

Option A — Install from PyPI package (Quickstart)

```
# 1. Create and activate a virtual environment
uv venv venv-3.13 --python 3.13
venv-3.13\Scripts\Activate   # Windows
source venv-3.13/bin/activate  # Linux/macOS

# 2. Install flexible-graphrag
uv pip install flexible-graphrag

# 3. Optionally install ArcadeDB embedded mode support (includes bundled JVM, no Java install needed)
#    Quote the requirement so the shell does not treat >= as a redirect
uv pip install "arcadedb-embedded>=26.3.2"

# 3a. Optional dependency groups, for example:
uv pip install "flexible-graphrag[langchain]"
# Other extras ([langchain-extras], [age-extras], overrides): see source README, Prerequisites > Optional.

# 4. Create .env from the sample (copy from the source repo or download env-sample.txt)
copy env-sample.txt .env   # Windows
cp env-sample.txt .env     # Linux/macOS
# Edit .env with your LLM API keys and database settings

# 5. Start your databases (docker compose or standalone)
docker compose -f docker/docker-compose.yaml up -d

# 6. Run the backend
flexible-graphrag
# or: uv run start.py
```

Option B — Install from source (editable)

Navigate to the backend directory:
```
cd flexible-graphrag
```

Create and activate a virtual environment, then install in editable mode:

```
uv venv venv-3.13 --python 3.13
venv-3.13\Scripts\Activate   # Windows
source venv-3.13/bin/activate  # Linux/macOS
uv pip install -e .

# see flexible-graphrag/pyproject.toml for all options
# --- Optional: dependency groups from pyproject.toml [project.optional-dependencies] ---
# LangChain (peer framework; use overrides when combining with langchain-extras)
uv pip install -e ".[langchain]"
uv pip install --override extras-overrides.txt -e ".[langchain,langchain-extras]"
uv pip install --override extras-overrides.txt -e ".[langchain,langchain-extras,age-extras]"
python scripts/patch_langchain_age.py
uv pip install --override extras-overrides.txt -e ".[surrealdb-extras]"
uv pip install "surrealdb>=2.0" "langchain-core>=1.3"
uv pip install --override extras-overrides.txt -e ".[spanner-extras]"
uv pip uninstall llama-index

# RDF extras (base install already includes rdflib/pyoxigraph; use these if you need the named groups)
uv pip install -e ".[rdf]"
uv pip install -e ".[rdf-full]"

# Observability
uv pip install -e ".[observability]"
uv pip install -e ".[observability-openlit]"
uv pip install -e ".[observability-dual]"

# Development tests / tooling
uv pip install -e ".[dev]"

# Docling OCR backends (see DOCLING_OCR in env-sample)
uv pip install -e ".[docling-ocr-easyocr]"
uv pip install -e ".[docling-ocr-tesserocr]"
uv pip install -e ".[docling-ocr-ocrmac]"   # macOS only

# Embedded ArcadeDB (not a bracket extra; bundled JVM); quoted so >= is not a shell redirect
uv pip install "arcadedb-embedded>=26.3.2"
```

uv-managed venv (alternative): change managed = false to managed = true in the [tool.uv] section of pyproject.toml, then run uv pip install -e . (including the trailing dot).

Create a .env file by copying the sample and customizing:
```
cp env-sample.txt .env   # Linux/macOS
copy env-sample.txt .env  # Windows
```
Edit .env with your specific configuration. See docs/GETTING-STARTED/ENVIRONMENT-CONFIGURATION.md for detailed setup guide.

Start the backend:

```
flexible-graphrag        # after uv pip install flexible-graphrag
# or: uv run start.py   # with source
```

The backend will be available at http://localhost:8000.

Frontend Setup (Standalone)

Standalone backend and frontend URLs:

Backend API: http://localhost:8000 (FastAPI server)
Angular: http://localhost:4200 (npm start)
React: http://localhost:5173 (npm run dev)
Vue: http://localhost:3000 (npm run dev)

Choose one of the following frontend options to work with:

React Frontend

Navigate to the React frontend directory:
```
cd flexible-graphrag-ui/frontend-react
```
Install Node.js dependencies (first time only):
```
npm install
```
Start the development server (uses Vite):
```
npm run dev
```

The React frontend will be available at http://localhost:5173.

Angular Frontend

Navigate to the Angular frontend directory:
```
cd flexible-graphrag-ui/frontend-angular
```
Install Node.js dependencies (first time only):
```
npm install
```
Start the development server (uses Angular CLI):
```
npm start
```

The Angular frontend will be available at http://localhost:4200.

Vue Frontend

Navigate to the Vue frontend directory:
```
cd flexible-graphrag-ui/frontend-vue
```
Install Node.js dependencies (first time only):
```
npm install
```
Start the development server (uses Vite):
```
npm run dev
```

The Vue frontend will be available at http://localhost:3000.

UI Usage

The system provides a tabbed interface for document processing and querying. Follow these steps in order. See docs/UI-GUIDE/UI-GUIDE.md for full details.

1. Sources Tab

Configure your data source and select files for processing. The system supports 13 data sources.

Detailed Configuration:

File Upload Data Source

Select: "File Upload" from the data source dropdown
Add Files:
- Drag & Drop: Drag files directly onto the upload area
- Click to Select: Click the upload area to open file selection dialog (supports multi-select)
- Note: If you drag & drop new files after selecting via dialog, only the dragged files will be used
Supported Formats: PDF, DOCX, XLSX, PPTX, TXT, MD, HTML, CSV, PNG, JPG, and more
Next Step: Click "CONFIGURE PROCESSING →" to proceed to Processing tab

Alfresco Repository

Select: "Alfresco Repository" from the data source dropdown
Configure:
- Alfresco Base URL (e.g., http://localhost:8080/alfresco)
- Username and password
- Path (e.g., /Sites/example/documentLibrary)
Next Step: Click "CONFIGURE PROCESSING →" to proceed to Processing tab

CMIS Repository

Select: "CMIS Repository" from the data source dropdown
Configure:
- CMIS Repository URL (e.g., http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/atom)
- Username and password
- Folder path (e.g., /Sites/example/documentLibrary)
Next Step: Click "CONFIGURE PROCESSING →" to proceed to Processing tab

All Data Sources (13 available):

Web Sources: Web Page, Wikipedia, YouTube
Cloud Storage: Amazon S3, Google Cloud Storage, Azure Blob Storage, Google Drive, Microsoft OneDrive
Enterprise Repositories: Alfresco, Microsoft SharePoint, Box, CMIS

See the Data Sources section for complete details on all 13 sources.

2. Processing Tab

Process your selected documents and monitor progress:

Start Processing: Click "START PROCESSING" to begin document ingestion
Monitor Progress: View real-time progress bars for each file
File Management:
- Use checkboxes to select files
- Click "REMOVE SELECTED (N)" to remove selected files from the list
- Note: This removes files from the processing queue, not from your system
Processing Pipeline: Documents are processed through Docling conversion, vector indexing, and knowledge graph creation

3. Search Tab

Perform searches on your processed documents:

Hybrid Search

Purpose: Find and rank the most relevant document excerpts
Usage: Enter search terms or phrases (e.g., "machine learning algorithms", "financial projections")
Action: Click "SEARCH" button
Results: Ranked list of document excerpts with relevance scores and source information
Best for: Research, fact-checking, finding specific information across documents

Q&A Query

Purpose: Get AI-generated answers to natural language questions
Usage: Enter natural language questions (e.g., "What are the main findings in the research papers?")
Action: Click "ASK" button
Results: AI-generated narrative answers that synthesize information from multiple documents
Best for: Summarization, analysis, getting overviews of complex topics

4. Chat Tab

Interactive conversational interface for document Q&A:

Chat Interface:
- Your Questions: Stacked vertically on the right side
- AI Answers: Stacked vertically on the left side
Usage: Type questions and press Enter or click send
Conversation History: All questions and answers are preserved in the chat history
Clear History: Click "CLEAR HISTORY" button to start a new conversation
Best for: Iterative questioning, follow-up queries, conversational document exploration

Testing Cleanup

Between tests you can clean up data:

Run cleanup.py: Clears vector, graph, and search indexes in one step — run from the flexible-graphrag directory
Vector Indexes: See docs/DATABASES/VECTOR-DATABASES/VECTOR-DIMENSIONS.md for vector database cleanup instructions
Graph Data: See docs/DATABASES/GRAPH-DATABASES/README-neo4j.md for graph-related cleanup commands

MCP Server Setup (Quickstart)

The MCP server (flexible-graphrag-mcp) is a lightweight standalone package that connects MCP clients (Claude Desktop, Cursor, etc.) to the Flexible GraphRAG backend via its REST API.

Steps

First terminal — install and run the flexible-graphrag backend (see Python Backend Setup above) — it must be running on http://localhost:8000.

Second terminal — install and start the MCP server in HTTP mode:

```
uv venv venv-mcp --python 3.13
venv-mcp\Scripts\Activate   # Windows
source venv-mcp/bin/activate  # Linux/macOS
uv pip install flexible-graphrag-mcp
flexible-graphrag-mcp --http --port 3001
```

Third terminal — test with MCP Inspector:
```
npx @modelcontextprotocol/inspector
```
Open the URL printed in the console (token pre-filled), set transport to Streamable HTTP, URL to http://localhost:3001/mcp, then click Connect.
Use with Claude Desktop and other MCP clients — see flexible-graphrag-mcp/README.md for stdio transport config and client-specific setup.

MCP Tools for Claude Desktop and Other MCP Clients

The MCP server provides 9 specialized tools for document intelligence workflows:

| Tool | Purpose | Usage |
|---|---|---|
| `get_system_status()` | System health and configuration | Verify setup and database connections |
| `ingest_documents()` | Bulk document processing | All sources support `skip_graph`; filesystem/Alfresco/CMIS use `paths`; Alfresco also supports a `nodeDetails` list. Each of the 13 sources has its own config: filesystem, repositories (Alfresco, SharePoint, Box, CMIS), cloud storage, web |
| `ingest_text(content, source_name)` | Custom text analysis | Analyze specific text content |
| `search_documents(query, top_k)` | Hybrid document retrieval | Find relevant document excerpts |
| `query_documents(query, top_k)` | AI-powered Q&A | Generate answers from document corpus |
| `test_with_sample()` | System verification | Quick test with sample content |
| `check_processing_status(id)` | Async operation monitoring | Track long-running ingestion tasks |
| `get_python_info()` | Environment diagnostics | Debug Python environment issues |
| `health_check()` | Backend connectivity | Verify API server connection |

Client Support

Claude Desktop and other MCP clients: Native MCP integration with stdio transport
MCP Inspector: HTTP transport for debugging and development
Multiple installation options: pipx (system-wide) or uvx (no install)

Backend REST API

The FastAPI backend provides the following REST API endpoints:

Base URL: http://localhost:8000/api/

System

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/health` | GET | Health check: verify backend is running |
| `/api/status` | GET | System status and configuration (databases, LLM, feature flags) |
| `/api/info` | GET | System information and package versions |
| `/api/python-info` | GET | Python environment diagnostics |
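
A minimal way to script the health check from Python is sketched below with the standard library only. `backend_is_up` is a hypothetical helper; it looks only at the HTTP status code and makes no assumption about the response body, whose exact shape is documented in docs/DEVELOPER/REST-API.md.

```python
import urllib.error
import urllib.request


def backend_is_up(base_url="http://localhost:8000", opener=urllib.request.urlopen, timeout=5):
    """Return True if GET /api/health answers with HTTP 200, False otherwise."""
    url = base_url.rstrip("/") + "/api/health"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx HTTP error.
        return False
```

The `opener` parameter exists only so the function can be exercised without a running server; in normal use, call `backend_is_up()` and wait or retry while it returns False.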

Ingestion

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/ingest` | POST | Ingest documents from a data source (`filesystem`, `s3`, `web`, `cmis`, ...) |
| `/api/upload` | POST | Upload files directly for processing |
| `/api/ingest-text` | POST | Ingest raw text content |
| `/api/test-sample` | POST | Test the system with built-in sample content |
| `/api/cleanup-uploads` | POST | Remove temporarily uploaded files |

Async Processing

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/processing-status/{id}` | GET | Poll status of an async ingestion operation |
| `/api/processing-events/{id}` | GET | Server-Sent Events stream for real-time progress |
| `/api/cancel-processing/{id}` | POST | Cancel an ongoing processing operation |
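
For clients that consume the processing-events stream without an SSE library, the wire format can be parsed in a few lines. The sketch below follows the standard Server-Sent Events framing (`event:` and `data:` fields, blank line as event terminator, lines starting with `:` as comments). `parse_sse` is a hypothetical helper, and the payload shown in the usage note is made up; the actual event names and data fields emitted by the backend are not specified here.

```python
def parse_sse(lines):
    """Yield one dict per event from an iterable of decoded SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it carries data.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

Feeding it the lines `data: {"pct": 50}` and an empty line yields one event of type message whose data is the string `{"pct": 50}`; decoding that string (for example with `json.loads`) is left to the caller because the payload format is an assumption.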

Search & Query

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/search` | POST | Hybrid search: returns ranked document excerpts |
| `/api/query` | POST | AI-powered Q&A: generates an answer from the document corpus |
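
A search call from Python might look like the sketch below. The JSON field names `query` and `top_k` are an assumption borrowed from the MCP tool signature `search_documents(query, top_k)`; check docs/DEVELOPER/REST-API.md or the Swagger UI for the actual request schema. `build_search_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_search_request(query, top_k=5, base_url="http://localhost:8000"):
    """Build a POST request for /api/search with a JSON body."""
    # Assumed body fields, mirroring the MCP tool parameters.
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` and decode the response body as JSON. Building the request separately keeps the assumed payload in one place, so it is easy to adjust once the real schema is confirmed.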

Graph

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/graph` | GET | Graph database status and node/relationship counts (Neo4j: live Cypher counts; other LC-backed stores: counts via `lc_graph.query()` where supported; remaining stores: status + dashboard URL) |
| `/api/graph/query` | POST | Execute a native graph query against the configured store: Cypher (Neo4j, Memgraph, FalkorDB, ArcadeDB, Ladybug, Apache AGE), AQL (ArangoDB), SurrealQL (SurrealDB), Gremlin (Cosmos), GSQL (TigerGraph), openCypher (Neptune/Analytics), GQL (Spanner), SPARQL fallback for RDF-only |

RDF / Ontology (when RDF_GRAPH_DB is configured)

| Endpoint | Method | Purpose |
|---|---|---|
| `/api/rdf/query/sparql` | POST | Execute a SPARQL query against the configured RDF store |
| `/api/rdf/ontology/info` | GET | Return loaded ontology entity and relation type lists |
| `/api/rdf/ontology/upload` | POST | Upload a new ontology file at runtime |
| `/api/rdf/rdf-store/list` | GET | List registered RDF stores |
| `/api/rdf/rdf-store/connect` | POST | Register an additional RDF store at runtime |
| `/api/rdf/rdf-store/{name}` | DELETE | Deregister an RDF store |
| `/api/rdf/export/rdf` | POST | Export knowledge graph as RDF (501 stub, not yet implemented) |

Interactive API Documentation (requires running backend):

| UI | URL | Notes |
|---|---|---|
| Swagger UI | http://localhost:8000/docs | Try endpoints, inspect schemas, submit requests |
| ReDoc | http://localhost:8000/redoc | Cleaner read-only reference view |

See docs/DEVELOPER/REST-API.md for the full endpoint reference with request/response examples.

Full-Stack Debugging (Standalone Mode)

VS Code launch configurations, backend/frontend debugging, log levels, and MCP Inspector setup — see docs/DEVELOPER/DEVELOPER-FULL-STACK-DEBUGGING.md.

Observability and Monitoring

Flexible GraphRAG includes comprehensive observability features for production monitoring:

OpenTelemetry Integration: Industry-standard instrumentation with automatic LlamaIndex tracing
Distributed Tracing: Jaeger UI for visualizing complete request flows
Metrics Collection: Prometheus for RAG-specific metrics (retrieval/LLM latency, token usage, entity/relation counts)
Visualization: Grafana dashboards with pre-configured RAG metrics panels
Dual Mode Support: OpenInference (LlamaIndex) + OpenLIT (optional) as dual OTLP producers
Custom Instrumentation: Decorators for adding tracing to custom code

Quick Start

Install observability dependencies (optional):

```
cd flexible-graphrag
uv pip install -e ".[observability-dual]"  # OpenInference (LlamaIndex + LangChain) + OpenLIT (recommended)
# Or combine with dev tools: uv pip install -e ".[observability-dual,dev]"
```

Enable in .env:

```
ENABLE_OBSERVABILITY=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OBSERVABILITY_BACKEND=both  # openinference, openlit, or both (recommended)
```

Start observability stack:

```
cd docker
# Uncomment observability.yaml in docker-compose.yaml first
docker-compose -f docker-compose.yaml -p flexible-graphrag up -d
```

Access dashboards:
- Grafana: http://localhost:3009 (admin/admin) - RAG metrics dashboards
- Jaeger: http://localhost:16686 - Distributed tracing
- Prometheus: http://localhost:9090 - Raw metrics

See docs/DEVELOPER/OBSERVABILITY/OBSERVABILITY.md for complete setup, custom instrumentation, and production best practices.

Project Structure

/flexible-graphrag: Python FastAPI backend
- main.py: FastAPI REST API server
- backend.py: Shared business logic used by both API and MCP
- config.py: Configurable settings for data sources, databases, and LLM providers
- factories.py: Factory classes for LLM and database creation
- hybrid_system.py: Main hybrid search and ingestion system
- post_ingestion_state.py: Post-ingestion document state tracking
- query_engine.py: Query engine with result deduplication and re-scoring
- retriever_setup.py: Retriever assembly — vector, search, graph, RDF, synonym expansion
- schema_manager.py: Database schema management
- adapters/: Framework-neutral ABCs and factories for all subsystems
  - adapters/graph/: Property graph and RDF store adapter ABCs
  - adapters/llm/: LLM and embedding adapter ABCs (BothLLMAdapter, BothEmbeddingAdapter)
  - adapters/process/: Chunker and KG extractor ABCs and build_* factories
  - adapters/search/: Search store adapter ABC
  - adapters/vector/: Vector store adapter ABC
- incremental_updates/: Auto-sync engine — detectors, orchestrator, state manager for real-time/near-real-time source sync
- ingest/: Modular ingestion steps — ingest_from_files, ingest_from_text, ingest_from_source, run_chunk_pipeline, update_pg_graph, update_rdf_graph, update_vector, update_search
- langchain/: LangChain peer framework — graph, vector, search, chunking, KG extraction, retrieval
  - langchain/graph/pg_store_adapters/: 15 property graph store adapters (one file per store)
  - langchain/graph/rdf_store_adapters/: 4 RDF/SPARQL store adapters (Fuseki, GraphDB, Oxigraph, Neptune)
  - langchain/graph/retrievers/: li_/lc_ two-layer retriever classes — text-to-query, neighborhood, vector, logging, synonym
  - langchain/llm/: LangChain LLM + embedding factories for all 13 providers
  - langchain/process/: LangChainChunkerAdapter (6 splitter types), LangChainKGExtractorAdapter
  - langchain/search/adapters/: BM25, Elasticsearch, OpenSearch search adapters
  - langchain/vector/adapters/: 10 vector store adapters
- llamaindex/: LlamaIndex peer framework — graph, vector, search, chunking, KG extraction
  - llamaindex/graph/adapters/: LlamaIndex property graph store adapters (Neo4j, ArcadeDB, FalkorDB, Memgraph, Nebula, Neptune, etc.)
  - llamaindex/llm/: LlamaIndex LLM + embedding factories for all 13 providers
  - llamaindex/process/: LlamaIndexChunkerAdapter, LlamaIndexKGExtractorAdapter
  - llamaindex/search/adapters/: Elasticsearch, OpenSearch search adapters
  - llamaindex/vector/adapters/: Qdrant, Elasticsearch, OpenSearch, pgvector, Chroma, and others
- observability/: OpenTelemetry instrumentation, Prometheus metrics, tracing setup
- process/: Core document processing — document_processor.py (Docling/LlamaParse), kg_extractor.py, node_pipeline.py
- rdf/: RDF/ontology support — ontology manager, KG-to-RDF converter, SPARQL tools, bundled schemas (rdf/schemas/)
  - rdf/store/: RDF store adapters — Fuseki, GraphDB, Oxigraph, store factory
- sources/: Data source connectors — filesystem, CMIS/Alfresco, Azure Blob, S3, GCS, OneDrive, SharePoint, Google Drive, Box, web, Wikipedia, YouTube, etc.
- stores/: Index managers — index_manager.py, rdf_manager.py
- pyproject.toml: Modern Python package definition (PEP 517/518)
- uv.toml: UV package manager configuration
- start.py: Startup script (flexible-graphrag console entry point)
- install.py: Installation helper script
/flexible-graphrag-mcp: Standalone MCP server
- main.py: HTTP-based MCP server (calls REST API)
- pyproject.toml: MCP package definition with minimal dependencies
- README.md: MCP server setup and installation instructions
- QUICK-USAGE-GUIDE.md: Quick usage guide
- Lightweight: Only 4 dependencies (fastmcp, nest-asyncio, httpx, python-dotenv)
/flexible-graphrag-ui: Frontend applications
- /frontend-react: React + TypeScript frontend (built with Vite)
  - /src: Source code
  - vite.config.ts: Vite configuration
  - tsconfig.json: TypeScript configuration
  - package.json: Node.js dependencies and scripts
- /frontend-angular: Angular + TypeScript frontend (built with Angular CLI)
  - /src: Source code
  - angular.json: Angular configuration
  - tsconfig.json: TypeScript configuration
  - package.json: Node.js dependencies and scripts
- /frontend-vue: Vue + TypeScript frontend (built with Vite)
  - /src: Source code
  - vite.config.ts: Vite configuration
  - tsconfig.json: TypeScript configuration
  - package.json: Node.js dependencies and scripts
/docker: Docker infrastructure
- docker-compose.yaml: Main compose file with modular includes
- /includes: Modular database and service configurations
- /nginx: Reverse proxy configuration
- README.md: Docker deployment documentation
/docs: Documentation
- ARCHITECTURE.md: System architecture and component relationships
- DEPLOYMENT-CONFIGURATIONS.md: Standalone, hybrid, and full Docker deployment guides
- DOCKER-RESOURCE-CONFIGURATION.md: Docker memory/CPU configuration for Windows (WSL2), macOS, and Linux — essential for running the full stack, especially with vLLM
- ENVIRONMENT-CONFIGURATION.md: Environment setup guide with database switching
- POSTGRES-SETUP.md: PostgreSQL setup for pgvector and incremental state management
- SCHEMA-EXAMPLES.md: Knowledge graph schema examples
- PERFORMANCE.md: Performance benchmarks and optimization guides
- DEFAULT-USERNAMES-PASSWORDS.md: Database credentials and dashboard access
- PORT-MAPPINGS.md: Complete port reference for all services
- DATA-SOURCES/: Data source setup guides (Azure Blob, S3, GCS, Alfresco, etc.)
- DOC-PROCESSING/: Document processing guides (Docling GPU, parser output)
- GRAPH-DATABASES/: Graph database guides (Neo4j, Neptune, Nebula, ArcadeDB, etc.)
- INCREMENTAL-UPDATE-AUTO-SYNC/: Incremental updates documentation (README, QUICKSTART, SETUP-GUIDE, API-REFERENCE)
- LLM/: LLM and embedding configuration guides
- LANGCHAIN/: LangChain integration guides (RDF QA fusion, graph retriever setup, adapter reference)
- OBSERVABILITY/: Observability and monitoring guides
- RDF/: RDF/ontology guides (store setup, ontology config, ingestion modes, SPARQL examples, user guide)
- VECTOR-DATABASES/: Vector database guides (dimensions, integration, Chroma modes)
/scripts: Utility scripts
- create_opensearch_pipeline.py: OpenSearch hybrid search pipeline setup
- setup-opensearch-pipeline.sh/.bat: Cross-platform pipeline creation
- rdf_cleanup.py: RDF store CLI tool — list-docs, count, clear-doc, clear-all
- litellm_config.yaml: Sample LiteLLM proxy config (copy to your LiteLLM install dir)
- /incremental: Incremental updates control scripts
  - sync-now.sh/.ps1/.bat: Trigger immediate synchronization
  - set-refresh-interval.sh/.ps1/.bat: Configure polling interval
  - README.md: Script usage documentation
/tests: Test suite
- test_bm25_*.py: BM25 configuration and integration tests
- conftest.py: Test configuration and fixtures
- run_tests.py: Test runner
/examples: Standalone usage examples (not re-tested)
- observability_example.py: OpenTelemetry / observability integration example
- /rdf: RDF/ontology examples
  - sparql_examples.py: Sample SPARQL queries for all three stores
  - unified_query_engine_examples.py: UnifiedQueryEngine usage examples
  - store_index_example.py: Build a LlamaIndex index from an RDF store
  - ontology_guided_ingestion_example.py: OntologyAwarePropertyGraphBuilder usage
  - ingest_with_ontology.py: Ontology-guided ingestion example class
  - rdf_export_import_examples.py: RDF export/import patterns
  - config_rdf_stores.py: RDF store config reference snippets
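
The adapters/ layout listed above (framework-neutral ABCs plus build_* factories, with LlamaIndex and LangChain implementations as peers) follows a common pattern that can be sketched in a few lines. The example below is illustrative only: the toy chunker classes, the `build_chunker` signature, and the `CHUNKER_FRAMEWORK` variable name are hypothetical and are not taken from the project's code. The only behavior borrowed from this README is that LlamaIndex is the default and LangChain can be chosen per stage.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Mapping


class ChunkerAdapter(ABC):
    """Framework-neutral contract: callers depend only on this interface."""

    @abstractmethod
    def chunk(self, text: str) -> List[str]:
        ...


class ToyLlamaIndexChunker(ChunkerAdapter):
    # Stand-in for a LlamaIndex-backed chunker: splits on blank lines.
    def chunk(self, text: str) -> List[str]:
        return [part.strip() for part in text.split("\n\n") if part.strip()]


class ToyLangChainChunker(ChunkerAdapter):
    # Stand-in for a LangChain-backed chunker: fixed-size character windows.
    def __init__(self, size: int = 20) -> None:
        self.size = size

    def chunk(self, text: str) -> List[str]:
        return [text[i : i + self.size] for i in range(0, len(text), self.size)]


# Registry mapping a framework name to a constructor for that stage.
_CHUNKERS: Dict[str, Callable[[], ChunkerAdapter]] = {
    "llamaindex": ToyLlamaIndexChunker,
    "langchain": ToyLangChainChunker,
}


def build_chunker(env: Mapping[str, str]) -> ChunkerAdapter:
    # CHUNKER_FRAMEWORK is a hypothetical variable name; LlamaIndex is the default.
    choice = env.get("CHUNKER_FRAMEWORK", "llamaindex").strip().lower()
    try:
        return _CHUNKERS[choice]()
    except KeyError:
        raise ValueError(f"Unknown chunker framework: {choice!r}") from None
```

Because the rest of the pipeline sees only `ChunkerAdapter`, switching one stage to the other framework is a configuration change, and each stage can have its own registry and its own setting.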

License

This project is licensed under the terms of the Apache License 2.0. See the LICENSE file for details.
