mcp-condenser

An MCP proxy that condenses verbose JSON, YAML, XML, and CSV tool responses into compact TOON text. It substantially reduces token usage for API outputs that return many records with the same nested schema, such as pod listings, node status tables, and cloud resource inventories.

How it works

Flatten nested objects into dot-notation keys (spec.containers.0.image).
Tabulate lists of same-shaped records into compact TOON tables.
Condense columns that repeat the same signal: zero-values, nulls, constants, and timestamps within the same 60-second window are summarized once rather than repeated per row.
Group related keys (e.g. requests.cpu, requests.memory) into compact combined columns.

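The result is a human-readable, LLM-friendly text representation that typically achieves 25-68% token reduction on nested API responses (JSON, XML, JSONL) while maintaining or improving LLM comprehension. Input that is already tabular, such as CSV, sees little or no reduction; see Benchmark results.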
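The flatten and condense steps above can be illustrated with a minimal Python sketch. This is not the project's implementation: the function names `flatten` and `split_constant_columns` and the sample pod records are hypothetical, and only constant-column detection is shown (the zero, null, and timestamp-window heuristics are omitted).

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into dot-notation keys (hypothetical helper)."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)  # list indices become path segments
    else:
        return {prefix: value}
    flat: dict[str, Any] = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


def split_constant_columns(rows: list[dict[str, Any]]):
    """Separate columns that hold the same value in every row from those that vary."""
    columns = list(dict.fromkeys(key for row in rows for key in row))
    constants: dict[str, Any] = {}
    varying: list[str] = []
    for col in columns:
        values = [row.get(col) for row in rows]
        if all(v == values[0] for v in values):
            constants[col] = values[0]  # can be reported once instead of per row
        else:
            varying.append(col)
    return constants, varying


# Made-up records for illustration only.
pods = [
    {"metadata": {"name": "api-0", "namespace": "prod"},
     "spec": {"containers": [{"image": "api:1.2"}]}, "restarts": 0},
    {"metadata": {"name": "api-1", "namespace": "prod"},
     "spec": {"containers": [{"image": "api:1.3"}]}, "restarts": 0},
]
rows = [flatten(p) for p in pods]
constants, varying = split_constant_columns(rows)
print(constants)  # {'metadata.namespace': 'prod', 'restarts': 0}
print(varying)    # ['metadata.name', 'spec.containers.0.image']
```

Only the varying columns need a cell per row; the constant ones can be summarized in a single line, which is where much of the saving on repetitive records comes from.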

Quick start

```bash
docker run -p 9000:9000 \
  -e UPSTREAM_MCP_URL=http://host.docker.internal:8080/mcp \
  teriyakichild/mcp-condenser:latest
```

Point your MCP client at http://localhost:9000/mcp.

host.docker.internal resolves to the host machine on Docker Desktop (macOS/Windows). On Linux, add --add-host=host.docker.internal:host-gateway or use the upstream's real address.

MCP proxy usage

To run from source instead of Docker:

Single upstream (env vars)

Set UPSTREAM_MCP_URL and optionally tune behavior with the environment variables listed in the configuration reference below.

```bash
UPSTREAM_MCP_URL=http://localhost:8080/mcp uv run mcp-condenser-proxy
```

Multi-upstream (config file)

Aggregate multiple MCP servers behind one endpoint using a JSON config file. Each server block specifies its URL, which tools to expose, per-server condensing toggles, and authentication headers (static or forwarded from the client).

```bash
CONDENSER_CONFIG=config.json uv run mcp-condenser-proxy
```

Tool names are prefixed with the server name by default (e.g. k8s_get_pods). Set "prefix_tools": false in the global section to disable prefixing.
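The naming rule above can be sketched as follows. The function `exposed_tool_name` is hypothetical and only mirrors the documented behavior (server name, underscore, tool name); the proxy's actual code may differ.

```python
def exposed_tool_name(server: str, tool: str, prefix_tools: bool = True) -> str:
    """Return the name a client would see for an upstream tool (illustrative only)."""
    return f"{server}_{tool}" if prefix_tools else tool


print(exposed_tool_name("k8s", "get_pods"))                      # k8s_get_pods
print(exposed_tool_name("k8s", "get_pods", prefix_tools=False))  # get_pods
```

Prefixing keeps names unique when two upstreams expose a tool with the same name; with prefixing disabled, such names would collide.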

See examples/docker-compose/config.json for a complete multi-upstream config example.

Docker Compose

Ready-to-use Compose files for single- and multi-upstream modes are in examples/docker-compose/.

Helm

A Helm chart is included under helm/mcp-condenser/:

```bash
helm install mcp-condenser ./helm/mcp-condenser \
  --set config.upstreamMcpUrl=http://upstream:8080/mcp
```

See examples/helm/ for values files and a Helmfile example, and helm/mcp-condenser/values.yaml for all configurable chart values.

Configuration reference

See docs/CONFIGURATION.md for the full reference, including all environment variables, config file schema, condensing heuristics, and Helm chart values.

Quick links:

Single-upstream mode (env vars)
Multi-upstream mode (config file)
Condensing heuristics
Helm chart values

CLI usage

Condense a JSON, YAML, XML, or CSV file directly:

```bash
uv run mcp-condenser input.json
uv run mcp-condenser input.yaml
uv run mcp-condenser deployments.xml
uv run mcp-condenser metrics.csv
cat pods.yaml | uv run mcp-condenser
```

Benchmark results

Token reduction

Measured across Kubernetes, AWS, database, monitoring, logging, and CDN fixtures using tiktoken/cl100k_base:

Fixture	Domain	Raw tokens	TOON tokens	Reduction
K8s 16-pod node	Kubernetes	9,876	3,656	63.0%
K8s 6-pod node	Kubernetes	15,285	5,919	61.3%
K8s 30-pod node	Kubernetes	69,885	22,229	68.2%
EC2 instances	AWS	33,498	14,645	56.3%
SQL orders	Database	26,165	11,298	56.8%
Deploy inventory	DevOps (XML)	1,928	664	65.6%
Server metrics	Infra (CSV)	959	994	-3.6%
App performance	APM (CSV)	1,760	1,535	12.8%
Prometheus query	Monitoring	3,083	2,292	25.7%
Elasticsearch logs	Logging	6,489	3,468	46.6%
Istio VirtualServices	Kubernetes	4,197	3,141	25.2%
Access logs	CDN/LB (JSONL)	7,158	4,582	36.0%
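The Reduction column follows from the two token counts as (raw - TOON) / raw. A small sketch of that arithmetic, using rows from the table above (the helper name `reduction_pct` is made up for illustration):

```python
def reduction_pct(raw_tokens: int, toon_tokens: int) -> float:
    """Percentage of tokens saved; a negative value means the TOON output is larger."""
    return round((raw_tokens - toon_tokens) / raw_tokens * 100, 1)


print(reduction_pct(9_876, 3_656))    # 63.0  (K8s 16-pod node)
print(reduction_pct(69_885, 22_229))  # 68.2  (K8s 30-pod node)
print(reduction_pct(959, 994))        # -3.6  (Server metrics, CSV)
```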

Compression is domain-agnostic: Kubernetes pod listings, AWS EC2 describe-instances responses, SQL result sets, Prometheus time-series, Elasticsearch log queries, and XML and JSONL responses all benefit, with reductions ranging from 25% to 68%. The two CSV fixtures are the exception, at -3.6% and 12.8%.

Note on CSV: CSV is already a tabular format, so TOON condensation saves few tokens and can add slight overhead (the Server metrics fixture grows by 3.6%). The value of CSV support is parsing and type inference: the condenser auto-detects CSV/TSV input, converts strings to native types (int, float, null), and feeds the result through the same heuristic pipeline. CSV responses therefore still benefit from column elision (zero-only, null-only, and constant columns) when those patterns are present. For maximum savings, prefer format_hint: "json" or "xml" on tools whose upstream supports multiple output formats.
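To make the type-inference step concrete, here is a minimal sketch using Python's standard `csv` module. It assumes that empty cells and the literals "null"/"none" map to null; the helpers `infer` and `parse_csv` and the sample data are hypothetical, and the condenser's real rules may be stricter or broader.

```python
import csv
import io


def infer(cell):
    """Convert a CSV cell to int, float, or None where possible; otherwise keep the string."""
    if cell is None:  # DictReader fills missing trailing fields with None
        return None
    text = cell.strip()
    if text == "" or text.lower() in {"null", "none"}:  # assumed null spellings
        return None
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def parse_csv(text: str) -> list[dict]:
    """Parse CSV text into typed row dictionaries."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: infer(value) for key, value in row.items()} for row in reader]


sample = "host,cpu,errors,zone\nweb-1,0.42,0,\nweb-2,0.57,0,\n"
rows = parse_csv(sample)
print(rows[0])  # {'host': 'web-1', 'cpu': 0.42, 'errors': 0, 'zone': None}
```

Once cells are typed, the `errors` column is recognizably zero-only and `zone` null-only, which is what lets column elision apply to CSV input.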

LLM accuracy

Run the accuracy benchmark against a local Ollama instance to verify TOON preserves answer quality:

```bash
# Single model
uv run python benchmarks/accuracy.py --model qwen3:4b --host http://localhost:11434

# Multi-model matrix (generates markdown tables)
uv run python benchmarks/matrix.py --host http://localhost:11434
```

The benchmark suite tests 120 questions across 7 fixtures (Kubernetes, AWS EC2, SQL, CSV, XML) covering direct lookups, cross-reference queries, aggregations, and multi-hop reasoning.

Local models: context window enablement

Small context windows (8K-64K) common with local models can't fit large API responses in raw form. TOON condensing brings them within reach.

Fixture	Raw tok	TOON tok	8K	16K	32K	64K	128K
K8s 16-pod node	9,876	3,656	Neither	TOON only	Raw + TOON	Raw + TOON	Raw + TOON
K8s 6-pod node	15,285	5,919	Neither	Neither	TOON only	Raw + TOON	Raw + TOON
EC2 instances	33,498	4,386	Neither	TOON only	TOON only	TOON only	Raw + TOON
SQL orders	26,165	11,298	Neither	Neither	Neither	TOON only	Raw + TOON
K8s 30-pod node	69,885	22,229	Neither	Neither	Neither	Neither	TOON only

Run the token reduction tests (no Ollama required):

```bash
uv run pytest tests/test_benchmark.py -v -s
```

Development

```bash
uv sync
uv run pytest tests/ -v
```

License

Apache-2.0 — see LICENSE.
