Self-contained Python MCP server for LLM quantization. Compress any HuggingFace model to GGUF, GPTQ, or AWQ format in a single tool call.
No external CLI required -- all quantization logic is embedded.
pip install mcp-turboquant
Or run directly with uvx:
uvx mcp-turboquant
The info, check, and recommend tools work out of the box. For actual quantization, install the backend you need:
# GGUF (Ollama, llama.cpp, LM Studio)
pip install "mcp-turboquant[gguf]"
# GPTQ (vLLM, TGI)
pip install "mcp-turboquant[gptq]"
# AWQ (vLLM, TGI)
pip install "mcp-turboquant[awq]"
# Everything (quotes keep shells such as zsh from expanding the brackets)
pip install "mcp-turboquant[all]"
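To see which backends are importable before calling the check tool, a minimal sketch along these lines may help. The import names (`llama_cpp`, `auto_gptq`, `awq`) are assumptions based on the usual module names of llama-cpp-python, auto-gptq, and autoawq. This is not necessarily how mcp-turboquant's own check tool works.

```python
import importlib.util

# Assumed mapping from format to the backend's top-level import name.
BACKEND_MODULES = {
    "gguf": "llama_cpp",   # llama-cpp-python
    "gptq": "auto_gptq",   # auto-gptq
    "awq": "awq",          # autoawq
}


def available_backends(modules=BACKEND_MODULES):
    """Return {format: True/False} depending on whether the module can be found.

    find_spec only locates the module; it does not import it, so this stays
    cheap even when the heavy dependencies are installed.
    """
    return {
        fmt: importlib.util.find_spec(module) is not None
        for fmt, module in modules.items()
    }


if __name__ == "__main__":
    for fmt, present in available_backends().items():
        print(f"{fmt}: {'installed' if present else 'missing'}")
```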
Add to ~/.claude/settings.json:
{
  "mcpServers": {
    "turboquant": {
      "command": "mcp-turboquant"
    }
  }
}
Or with uvx (no install needed):
{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "turboquant": {
      "command": "uvx",
      "args": ["mcp-turboquant"]
    }
  }
}
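If the config file already holds other settings, hand-editing risks overwriting them. The sketch below merges a server entry into an existing JSON config while leaving other keys alone. The helper names `add_mcp_server` and `update_config_file` are hypothetical and not part of mcp-turboquant.

```python
import json
from pathlib import Path


def add_mcp_server(config, name, command, args=None):
    """Return a copy of `config` with an mcpServers entry added or replaced."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    entry = {"command": command}
    if args:
        entry["args"] = list(args)
    servers[name] = entry
    updated["mcpServers"] = servers
    return updated


def update_config_file(path, name, command, args=None):
    """Read a JSON config (if it exists), merge the entry, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text()) if path.exists() else {}
    merged = add_mcp_server(config, name, command, args)
    path.write_text(json.dumps(merged, indent=2) + "\n")
```

For example, `update_config_file(Path.home() / ".claude" / "settings.json", "turboquant", "uvx", ["mcp-turboquant"])` would produce the uvx variant shown above.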
| Tool | Description | Heavy deps? |
|---|---|---|
| info | Get model info from HuggingFace (params, size, architecture) | No |
| check | Check available quantization backends on the system | No |
| recommend | Hardware-aware recommendation for best format + bits | No |
| quantize | Quantize a model to GGUF/GPTQ/AWQ | Yes |
| evaluate | Run perplexity evaluation on a quantized model | Yes |
| push | Push quantized model to HuggingFace Hub | No |
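To get a feel for what a hardware-aware recommendation has to weigh, here is a back-of-the-envelope estimate of weight storage at a given bit width. It is only arithmetic (parameters × bits / 8), not the logic of the recommend tool. Real quantized files come out somewhat larger because of scales, metadata, and layers kept at higher precision. The `headroom` factor is an arbitrary assumption.

```python
def approx_weight_size_gb(num_params, bits):
    """Approximate weight storage in decimal GB: params * bits / 8 bytes."""
    return num_params * bits / 8 / 1e9


def fits_in_memory(num_params, bits, available_gb, headroom=1.2):
    """Rough check that the weights plus some headroom fit in available memory.

    `headroom` is an assumed safety factor for activations and KV cache.
    """
    return approx_weight_size_gb(num_params, bits) * headroom <= available_gb


if __name__ == "__main__":
    params = 8e9  # roughly an 8B-parameter model
    for bits in (16, 8, 4):
        print(f"{bits}-bit: ~{approx_weight_size_gb(params, bits):.1f} GB")
```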
Once configured, ask Claude:
"Get info on meta-llama/Llama-3.1-8B-Instruct"
"What quantization format should I use for Mistral-7B on my machine?"
"Quantize meta-llama/Llama-3.1-8B to 4-bit GGUF"
"Check which quantization backends I have installed"
"Evaluate the perplexity of my quantized model at /path/to/model.gguf"
"Push my quantized model to myuser/model-GGUF on HuggingFace"
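For the evaluation prompt, it helps to know what the reported number means. Perplexity is conventionally the exponential of the mean negative log-likelihood per token, so lower is better. The sketch below computes it from a list of per-token natural-log probabilities. How the evaluate tool obtains those probabilities is not specified here.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-mean(log p(token))) over natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))
```

A model that assigns probability 0.25 to every token has a perplexity of about 4, as if it were choosing uniformly among four options at each step.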
Claude / Agent <--> MCP Protocol (stdio) <--> mcp-turboquant (Python) <--> llama-cpp-python / auto-gptq / autoawq
All quantization logic runs in-process. No external CLI tools needed.
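On the wire, the MCP stdio transport carries JSON-RPC 2.0 messages, one per line, and a tool invocation uses the `tools/call` method. The sketch below builds such a request for the quantize tool. The argument names (`model`, `format`, `bits`) are hypothetical, since the tool's actual input schema is not documented here.

```python
import json
from itertools import count

_request_ids = count(1)


def tools_call(name, arguments):
    """Build a JSON-RPC 2.0 request that invokes an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def encode(message):
    """Serialize one message for stdio: compact JSON followed by a newline."""
    return json.dumps(message, separators=(",", ":")) + "\n"


if __name__ == "__main__":
    request = tools_call(
        "quantize",
        # Hypothetical argument names; check the tool's schema via tools/list.
        {"model": "meta-llama/Llama-3.1-8B", "format": "gguf", "bits": 4},
    )
    print(encode(request), end="")
```

In practice an MCP client such as Claude handles the initialization handshake and this framing, so the sketch is only meant to show what crosses the stdio boundary.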
# As a command
mcp-turboquant
# As a module
python -m mcp_turboquant
MIT