BGI tries to group code based on what the code actually does (its behavior), not just which file imports what.
BGI is a static architecture analysis tool for large codebases.
It groups code units by behavioral role and emits explicit architectural boundaries.
Project domain: bigindexer.com
Big Indexer is published in the MCP Registry as io.github.ahmedxuhri/bigindexer.
pip install bigindexer==0.1.3
bgi mcp --graph bgi-graph.json --fuse-graph fuse-graph.json
Validation: https://bigindexer.com/validation
Most architecture graphs fail at scale in two ways:
BGI is built to keep both under control, so the output remains usable on large repos.
- bgi-graph.json, fuse-graph.json) plus optional human context (bigindexer.md).
- task_fingerprint, behavioral_twins, twin_context) ground prompts in in-repo behavior patterns.
- ahmedxuhri/bigindexer-pr-risk-bot to auto-comment PRs with blast radius, seams, and risk hints.

Run BGI on the included fixture repo:
git clone https://github.com/ahmedxuhri/bigindexer
cd bigindexer
pip install -e .
bgi scan tests/fixtures --lang python --out /tmp/bgi-example.json
head -50 /tmp/bgi-example.json
Observed result on this repository:
- 121426 units

One produced edge looks like:
{
"source": "auth_module.py::AuthService::__init__",
"target": "auth_module.py::AuthService::__del__",
"key": "COV.INIT",
"lock": "COV.TEARDOWN",
"type": "HARD"
}
Why this matters: instead of raw syntax references only, you get behavioral relationships plus cluster structure that can drive architecture decisions.
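As a minimal sketch of how such an edge record might be consumed downstream, the snippet below parses the sample edge and tallies edges per key/lock pair. It assumes unit identifiers follow the `file::Class::method` shape seen in the sample and that edges arrive as a JSON list of such objects; the helper names `split_unit_id` and `summarize_edges` are hypothetical and not part of BGI's API.

```python
import json
from collections import Counter

# The single edge shown above, wrapped in a list (the list wrapper is an assumption).
EDGES_JSON = """
[
  {"source": "auth_module.py::AuthService::__init__",
   "target": "auth_module.py::AuthService::__del__",
   "key": "COV.INIT", "lock": "COV.TEARDOWN", "type": "HARD"}
]
"""


def split_unit_id(unit_id):
    """Split 'file::Class::method' into (file, qualified name). Hypothetical helper."""
    file_path, _, qualname = unit_id.partition("::")
    return file_path, qualname


def summarize_edges(edges):
    """Count edges per (key, lock) token pair. Hypothetical helper."""
    return Counter((edge["key"], edge["lock"]) for edge in edges)


edges = json.loads(EDGES_JSON)
print(split_unit_id(edges[0]["source"]))
print(summarize_edges(edges))
```

A tally like this is one way to see which behavioral pairings dominate a scan before looking at cluster structure.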
| BGI term | Plain meaning |
|---|---|
| COV token | A behavior label for a unit (for example: FETCH, PERSIST, AUTHENTICATE) |
| Key-Lock edge | A behavioral connection between two units with complementary roles |
| DRS cluster | A unit-level grouping by behavioral role. Mostly intra-file in practice. File-level architectural components are better expressed via the BGI edge graph or the fuse-graph boundary signal — see external benchmark |
| Fuse edge / fuse event | A refused merge because cluster growth hit the cap; treated as boundary signal |
| Spectral masks | Scope rules that limit where matching is allowed (global, directory, file) |
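To make the Key-Lock and spectral-mask terms concrete, here is an illustrative sketch, not BGI's implementation. It assumes a hypothetical table of complementary token pairs (`KEY_TO_LOCK`) and a hypothetical per-token scope table (`MASKS`), and links a unit holding a key token to a unit holding the matching lock token only when the scope rule allows it.

```python
from itertools import product
from pathlib import PurePosixPath

# Hypothetical complementary pairs: key token -> lock token.
KEY_TO_LOCK = {"COV.INIT": "COV.TEARDOWN"}

# Hypothetical scope per key token: "file", "directory", or "global".
MASKS = {"COV.INIT": "file"}


def in_scope(mask, src_file, dst_file):
    """Return True if the mask permits matching between the two files."""
    if mask == "global":
        return True
    if mask == "directory":
        return PurePosixPath(src_file).parent == PurePosixPath(dst_file).parent
    return src_file == dst_file  # "file" mask


def key_lock_edges(units, masks=MASKS):
    """units: list of (unit_id, file, set_of_tokens). Returns a list of edge dicts."""
    edges = []
    for (src, src_file, src_tokens), (dst, dst_file, dst_tokens) in product(units, repeat=2):
        if src == dst:
            continue
        for key in sorted(src_tokens):
            lock = KEY_TO_LOCK.get(key)
            if lock in dst_tokens and in_scope(masks.get(key, "file"), src_file, dst_file):
                edges.append({"source": src, "target": dst, "key": key, "lock": lock})
    return edges


units = [
    ("a.py::Svc::__init__", "a.py", {"COV.INIT"}),
    ("a.py::Svc::__del__", "a.py", {"COV.TEARDOWN"}),
    ("b.py::Other::__del__", "b.py", {"COV.TEARDOWN"}),
]
print(key_lock_edges(units))                                # file mask: same-file match only
print(key_lock_edges(units, masks={"COV.INIT": "global"}))  # global mask: both teardown units match
```

The all-pairs loop is used here only for clarity; it is quadratic in the number of units, which is the kind of blow-up that scoped matching is meant to contain.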
Source files
->
Gate 1: fingerprint unit behavior (COV tokens)
->
Gate 2: create behavioral edges with scoped matching
->
Gate 3: cluster with hard size cap + boundary emission
->
Artifacts: bgi-graph.json, fuse-graph.json, bigindexer.md, optional routes/graphml/html
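The Gate 3 idea (cluster under a hard size cap, and report refused merges as boundary signal) can be sketched as follows. This is an illustration under stated assumptions: union-find as the merge structure and edge order as the merge order are choices made here for the example, and `cluster_with_cap` is a hypothetical name, not a BGI function.

```python
def cluster_with_cap(nodes, edges, cap):
    """Merge nodes along edges, refusing any merge that would exceed `cap`.

    Returns (clusters, fuse_edges). A refused merge is recorded as a fuse edge
    instead of being silently dropped.
    """
    parent = {n: n for n in nodes}
    size = {n: 1 for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    fuse_edges = []
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # already in the same cluster
        if size[root_a] + size[root_b] > cap:
            fuse_edges.append((a, b))  # boundary signal: merge refused at the cap
            continue
        if size[root_a] < size[root_b]:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a
        size[root_a] += size[root_b]

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values()), fuse_edges


clusters, fuse_edges = cluster_with_cap(
    nodes=["a", "b", "c", "d"],
    edges=[("a", "b"), ("b", "c"), ("c", "d")],
    cap=2,
)
print(clusters)    # two clusters of two units each
print(fuse_edges)  # the refused b-c merge
```

With a cap of 2, the chain a-b-c-d cannot collapse into one cluster; the refused b-c merge is what ends up marking the boundary between the two halves.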
Core approach:
- .scm - single-pass query extraction path in Gate 1.

| Capability | LSP / SCIP index | Call-graph + generic community detection | BGI |
|---|---|---|---|
| Fast symbol lookup | Strong | Medium | Available (Phase 6 index) |
| Behavioral token model | No | Usually no | Yes |
| Hard-bounded clustering | No | Usually no | Yes (unit-level) |
| First-class boundary artifact | No | Usually no | Yes (fuse-graph.json) |
| Scope-constrained edge generation | Limited | Rare | Yes (spectral masks) |
External head-to-head benchmark (Louvain on BGI's edges vs Louvain on raw imports, scored against package layout): BGI's edges win on Python (django F1 0.38 vs 0.29, MoJoFM 0.45 vs 0.34) and currently tie/lose on Go due to lower cross-file edge density on tier-2 scanners. Full results and methodology in docs/VALIDATION_EVIDENCE.md.
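For readers unfamiliar with scoring a clustering against a reference partition, the sketch below computes a pairwise F1. This is one common definition and is an assumption here; the exact F1 variant and the MoJoFM computation used in the benchmark are described in docs/VALIDATION_EVIDENCE.md, not reproduced in this example. The function name `pairwise_f1` is hypothetical.

```python
from itertools import combinations


def pairwise_f1(predicted, truth):
    """predicted / truth: dicts mapping item -> cluster label.

    A pair of items counts as a true positive when both partitions put the
    two items in the same cluster.
    """
    items = sorted(set(predicted) & set(truth))
    tp = fp = fn = 0
    for a, b in combinations(items, 2):
        same_pred = predicted[a] == predicted[b]
        same_true = truth[a] == truth[b]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: package layout as ground truth, detected communities as prediction.
truth = {"a": "pkg1", "b": "pkg1", "c": "pkg2", "d": "pkg2"}
predicted = {"a": 0, "b": 0, "c": 0, "d": 1}
print(round(pairwise_f1(predicted, truth), 3))
```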
Comparable kubernetes sample (go comparable mode, 162,917 units):
- 141.964s
- 67.261s (historical comparable baseline: 138.869s)
- 9.359s
- 218.584s
- 1.113%
- 0

Artifact: output/validation/kubernetes-optionb-controlled-median-v21.json

- tests/test_gate2.py).
- tests/test_gate3.py).
- python3 -m pytest tests/ -x -q (project baseline target remains passing).
- task → COV → top-3 twins + seam + rubric) is complete: actionability 4.75/5 (p04 slice: 4.8/5), boundary 1.0, hallucinations 0.
- django/p02 miss.

BGI does not treat all languages equally; support is tiered:

- .scm): python, typescript, tsx, javascript, go, rust, java, csharp, php, ruby, kotlin, scala
- c, lua, elixir
- swift, r, dart, bash, nim, zig, haskell, ocaml, fsharp, clojure, erlang, matlab, vb, crystal, cobol, groovy

Use this as a reliability signal: query-backed and dedicated scanner tiers are stronger than generic fallback.
Cross-file edge density caveat: the language tiers above describe parser quality. A separate axis is cross-file behavioral edge density — how many key-lock pairs the scanner produces that link units in different files. Tier-1 (.scm-backed) languages produce dense cross-file edges. Tier-2 scanner-backed languages currently produce sparser cross-file edges because their token mix is dominated by structural tokens (INTAKE/OUTPUT/CONDITIONAL/LOOP) that gate-2 deliberately scopes to same-file to prevent O(N²) noise. The user-visible MCP product (boundary detection, twin retrieval, AI-assistant context) still works on tier-2 languages — see the validation evidence — but cluster-recovery benchmarks against import-graph baselines reflect this density gap. Concrete numbers in docs/VALIDATION_EVIDENCE.md.
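One rough way to check this caveat on your own scan is to measure what share of edges cross a file boundary. The sketch below assumes edges carry `source` and `target` unit identifiers of the form `file::...`, as in the sample edge earlier; normalizing to a fraction is a choice made for this example, and `cross_file_density` is a hypothetical helper.

```python
def cross_file_density(edges):
    """Fraction of edges whose source and target live in different files."""
    if not edges:
        return 0.0

    def file_of(unit_id):
        return unit_id.split("::", 1)[0]

    cross = sum(1 for e in edges if file_of(e["source"]) != file_of(e["target"]))
    return cross / len(edges)


sample_edges = [
    {"source": "auth_module.py::AuthService::__init__",
     "target": "auth_module.py::AuthService::__del__"},
    # Made-up cross-file edge for illustration.
    {"source": "api.py::handler", "target": "store.py::save"},
]
print(cross_file_density(sample_edges))
```

A value near zero would suggest that most behavioral edges stay within a file, which is the situation described above for tier-2 languages.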
pip install -e .
# scan
bgi scan /path/to/repo --lang auto --out bgi-graph.json
# optional outputs
bgi scan /path/to/repo --lang auto \
--fuse-graph fuse-graph.json \
--routes routes.json \
--graphml graph.graphml \
--html
# incremental
bgi scan /path/to/repo --lang auto --incremental --cache .bgi-cache.json
# diff
bgi diff /path/before /path/after --lang auto --out diff.json
# run MCP server over generated artifacts
bgi mcp --graph bgi-graph.json --fuse-graph fuse-graph.json
Example MCP usage pattern (from your client prompt):
Use MCP tool twin_context for:
"Add endpoint that validates input and persists data."
Return top twin candidate, seam suggestion, and rubric checklist.
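Under the hood, an MCP client turns a prompt like this into a JSON-RPC `tools/call` request. The sketch below builds such a request for the `twin_context` tool. The envelope follows the MCP specification; the argument name `task` is an assumption made for illustration, so check the tool's published input schema (see docs/MCP_SETUP.md) for the real parameter names.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    request_id=1,
    tool_name="twin_context",
    # "task" is a hypothetical argument name, not confirmed by the source.
    arguments={"task": "Add endpoint that validates input and persists data."},
)
print(json.dumps(request, indent=2))
```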
BGI ships with opt-in, off-by-default anonymous telemetry. To enable:
export BGI_TELEMETRY=1
bgi mcp --graph bgi-graph.json --fuse-graph fuse-graph.json
What's collected when enabled: BGI version, OS, repo size bucket, and a 12-char hash of your repo's git remote (so we can deduplicate "same repo seen twice" without ever knowing which repo). What's never collected: file paths, source code, repo names, user identity, or IP addresses. Full schema and disable instructions in docs/TELEMETRY.md.
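The deduplication idea can be illustrated with a short sketch. It assumes SHA-256 truncated to 12 hex characters and a check for `BGI_TELEMETRY=1`; the actual hash function and schema are documented in docs/TELEMETRY.md, and both function names here are hypothetical.

```python
import hashlib
import os


def telemetry_enabled(env=None):
    """Telemetry is off unless BGI_TELEMETRY is set to "1"."""
    env = os.environ if env is None else env
    return env.get("BGI_TELEMETRY") == "1"


def repo_fingerprint(remote_url):
    """12-character digest of the git remote URL (SHA-256 is an assumption)."""
    return hashlib.sha256(remote_url.encode("utf-8")).hexdigest()[:12]


fingerprint = repo_fingerprint("https://github.com/ahmedxuhri/bigindexer")
print(len(fingerprint), telemetry_enabled({}))
```

The same remote always yields the same fingerprint, so repeat sightings can be merged, while the truncated digest alone does not reveal the URL.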
- MEMORANDUM.md - design contracts and invariants
- docs/LANGUAGE_SUPPORT.md - language implementation details
- docs/CONTRIBUTING_LANGUAGES.md - language contribution guide
- docs/INDEX_SCHEMA.md - interactive index schema
- docs/QUERY_PLANNER.md - query planner scoring
- docs/MCP_SETUP.md - MCP server setup and usage
- docs/MCP_WITH_CONTINUE.md - 5-minute Continue + BGI walkthrough
- docs/TELEMETRY.md - opt-in telemetry: what we collect and how to disable
- https://bigindexer.com/validation - public validation evidence
- docs/MCP_QUICKSTART_DEMO.md - 5-minute demo walkthrough
- docs/MCP_EXAMPLE_TRANSCRIPTS.md - real-world MCP tool invocation examples
- docs/MCP_REAL_TRANSCRIPT.md - unedited transcript from FastAPI analysis
- scripts/mcp-demo.sh - automated demo script for multiple CLIs and repositories
- LICENSE)
- DCO) enforced on pull requests