A community-driven registry for the Claude Code ecosystem. Not affiliated with Anthropic.
Are you the author? Sign in to claim
supertone-mcp
A composable MCP toolkit for the Supertone TTS API. Rather than a single "speak this text" command, it exposes Supertone's SDK as a set of building-block tools — synthesis, voice discovery, preview, duration/credit prediction, usage tracking, and full voice-cloning CRUD — that an LLM assembles to fulfill a request. Works in Claude Desktop, Cursor, or any MCP-compatible client.
Covers Korean, English, Japanese, and 31 languages total. Speed (0.5x–2.0x), pitch shift (-24 to +24 semitones), emotion styles, per-call output mode, streaming, and model selection.
Synthesis
text_to_speech — Convert text to audio. Per-call control of output_mode (files / resources / both), autoplay, streaming, model, plus include_phonemes / normalized_text. Long text is auto-chunked by the SDK.predict_duration — Estimate audio length (and credit cost) without synthesizing.Voice discovery (preset)
search_voice — Filter the catalog by language, gender, age, use_case, style, model, name, or description.get_voice — Full detail for one voice.preview_voice — Sample audio URLs for a voice (filterable by language/style/model).Custom voice cloning
clone_voice — Create a cloned voice from a local WAV/MP3 (≤3MB).search_custom_voice — List/filter cloned voices.get_custom_voice — Full detail for one cloned voice.edit_custom_voice — Update name and/or description.delete_custom_voice — Permanently delete (irreversible).Usage & credits
get_credit_balance — Remaining credits.get_usage_history — Usage over a time window.get_voice_usage — Usage for a specific voice.0.2.0 moves behavior control out of environment variables and into per-call tool parameters — so the LLM decides per request, not the server config.
| Before (env var) | After (per-call parameter) | Note |
|---|---|---|
SUPERTONE_MCP_OUTPUT_MODE=files|resources|both | text_to_speech(output_mode=...) | Default still files |
SUPERTONE_MCP_AUTOPLAY=true | text_to_speech(autoplay=...) | Default changed true → false (playback is now explicit) |
| (always streamed) | text_to_speech(streaming=...) | New, default false (one-shot). streaming=true requires model="sona_speech_1" |
Other changes:
sona_speech_1 → sona_speech_2_flash.list_voices was removed (since the discovery release) and replaced by search_voice — call it with no arguments to reproduce the old "list everything" behavior.If you previously set SUPERTONE_MCP_OUTPUT_MODE or SUPERTONE_MCP_AUTOPLAY, remove them from your client config and pass output_mode / autoplay per call instead. (The server prints a one-time stderr notice if it sees the removed vars.)
# Using uvx (recommended)
uvx supertone-mcp
# Using pip
pip install supertone-mcp
Add to claude_desktop_config.json:
{
"mcpServers": {
"supertone-tts": {
"command": "uvx",
"args": ["supertone-mcp"],
"env": {
"SUPERTONE_API_KEY": "your-api-key-here"
}
}
}
}
Add to your Cursor MCP settings (same JSON shape as above).
Only authentication and stable defaults are configured via the environment — all behavior is controlled per call.
| Variable | Required | Default | Description |
|---|---|---|---|
SUPERTONE_API_KEY | Yes | — | Your Supertone API key |
SUPERTONE_MCP_VOICE_ID | No | preset voice (Aiden, multilingual) | Default voice_id for text_to_speech / predict_duration (override per call) |
SUPERTONE_OUTPUT_DIR | No | ~/supertone-tts-output/ | Directory where audio files are saved (used by output_mode=files/both) |
Removed in 0.2.0:
SUPERTONE_MCP_OUTPUT_MODEandSUPERTONE_MCP_AUTOPLAY— see Migration.
text_to_speech output_mode)| Mode | Returns | Use when |
|---|---|---|
files (default) | Plain text with the saved file path + metadata | You want the file on disk |
resources | MCP AudioContent + TextContent (no file written) | The client renders audio inline (e.g., Claude.ai chat) |
both | File on disk and AudioContent/TextContent | You want both — preview inline, keep the file |
The MCP client routes natural-language requests across these tools — the value of the toolkit is composition: the LLM chains several tools to satisfy one request.
"Find a calm Korean female voice, let me hear a sample, check the cost, then make this announcement as an mp3."
The LLM assembles:
search_voice(language="ko", gender="female", style="neutral") # find candidates
→ preview_voice(voice_id) # sample URLs to confirm the voice
→ predict_duration(text, voice_id) + get_credit_balance() # gauge cost before spending
→ text_to_speech(text, voice_id, output_format="mp3",
output_mode="files") # synthesize
"Make a cloned voice from ~/recordings/sample.wav named MyVoice, then read this greeting with it and play it for me."
The LLM assembles:
clone_voice(name="MyVoice", audio_path="~/recordings/sample.wav") # create the cloned voice
→ get_custom_voice(voice_id) # confirm it was created
→ text_to_speech(text, voice_id=<cloned>, autoplay=true) # synthesize, then play immediately
autoplayis a per-call parameter (defaultfalse), so playback happens only when explicitly requested.
text_to_speech| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
text | string | Yes | — | Text to convert (long text is auto-chunked by the SDK) |
voice_id | string | No | env or preset | Voice identifier (browse via search_voice) |
language | string | No | ko | Language code — one of 31 (ko, en, ja, …) |
output_format | string | No | mp3 | mp3 or wav |
model | string | No | sona_speech_2_flash | sona_speech_1, sona_speech_2, sona_speech_2_flash, sona_speech_2t, sona_speech_3t, supertonic_api_1, supertonic_api_3 |
speed | float | No | 1.0 | 0.5–2.0 |
pitch_shift | int | No | 0 | -24 to +24 semitones |
style | string | No | — | Emotion style (varies by voice) |
output_mode | string | No | files | files, resources, or both (see Output modes) |
autoplay | bool | No | false | Play the audio locally after synthesis (macOS afplay) |
streaming | bool | No | false | Stream synthesis. Only supported by model="sona_speech_1" |
include_phonemes | bool | No | false | Return phoneme timing data alongside the audio |
normalized_text | string | No | — | Pre-normalized text (only used by sona_speech_2 / sona_speech_2_flash) |
predict_durationSame core parameter schema as text_to_speech (long text auto-chunked). Returns "Predicted duration: 2.34s (credit usage is proportional to duration).".
search_voiceAll parameters optional. With no filters → full catalog. With any filter → first response line is Filters applied: ....
| Parameter | Type | Description |
|---|---|---|
language | string | e.g., ko, en, ja |
gender | string | e.g., male, female |
age | string | e.g., young_adult, child |
use_case | string | e.g., narration, advertisement |
style | string | e.g., neutral, happy |
model | string | e.g., sona_speech_2_flash |
name | string | partial match |
description | string | partial match |
get_voice / preview_voice| Tool | Required | Optional |
|---|---|---|
get_voice | voice_id | — |
preview_voice | voice_id | language, style, model (filter samples) |
clone_voice| Parameter | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Display name (non-empty) |
audio_path | string | Yes | Local WAV or MP3 path (≤3MB). Supports ~ expansion |
description | string | No | Optional note |
| Tool | Required | Optional |
|---|---|---|
search_custom_voice | — | name, description (partial match) |
get_custom_voice | voice_id | — |
edit_custom_voice | voice_id | name, description (at least one required) |
delete_custom_voice | voice_id | — (IRREVERSIBLE) |
| Tool | Required | Optional |
|---|---|---|
get_credit_balance | — | — |
get_usage_history | — | — (reports a recent default window) |
get_voice_usage | voice_id | — |
# Clone and install
git clone https://github.com/supertone-inc/supertone-mcp.git
cd supertone-mcp
uv sync
# Run tests
uv run pytest -q
# Run with coverage
uv run pytest --cov=src --cov-report=term-missing
MIT
Run Claude Code as an MCP server so any agent can delegate coding tasks to it
Browser automation using accessibility snapshots instead of screenshots
English-first Korean equity intelligence MCP — DART filings, foreign-holder 5%-rule flows, activist filings, KRX news. F
Unity MCP acts as a bridge between AI assistants and your Unity Editor. Give your LLM tools to manage assets, control sc