Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs.
Lemonade is the local AI server that gives you the same capabilities as cloud APIs, except 100% free and private. Use the latest models for chat, coding, speech, and image generation on your own NPU and GPU.
Lemonade comes in two flavors:
This project is built by the community for every PC, with optimizations by AMD engineers to get the most from Ryzen AI, Radeon, and Strix Halo PCs.
Want your app featured here? Just submit a marketplace PR!
To run and chat with Gemma:
lemonade run Gemma-4-E2B-it-GGUF
To code with Lemonade models:
lemonade launch claude
Multi-modality:
# image gen
lemonade run SDXL-Turbo
# speech gen
lemonade run kokoro-v1
# transcription
lemonade run Whisper-Large-v3-Turbo
To see available models and download them:
lemonade list
lemonade pull Gemma-4-E2B-it-GGUF
To see the backends available on your PC:
lemonade backends
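If you want to drive these commands from a script, the following is a minimal Python sketch. It only covers the four command forms shown above (`list`, `backends`, `pull <model>`, `run <model>`); the helper names `lemonade_argv` and `run_lemonade` are hypothetical and not part of Lemonade.

```python
import shutil
import subprocess

# Command forms copied from the examples above; no other flags are assumed.
NO_ARG_COMMANDS = {"list", "backends"}
MODEL_COMMANDS = {"pull", "run"}


def lemonade_argv(subcommand, model=None):
    """Build the argument list for one of the lemonade commands shown above."""
    if subcommand in NO_ARG_COMMANDS:
        return ["lemonade", subcommand]
    if subcommand in MODEL_COMMANDS:
        if not model:
            raise ValueError(f"'{subcommand}' needs a model name")
        return ["lemonade", subcommand, model]
    raise ValueError(f"this sketch does not cover: {subcommand}")


def run_lemonade(subcommand, model=None):
    """Run the command if the lemonade CLI is on PATH; return None otherwise."""
    argv = lemonade_argv(subcommand, model)
    if shutil.which(argv[0]) is None:
        return None  # the CLI is not installed on this machine
    # Inherit stdin/stdout so that interactive commands still work.
    return subprocess.run(argv, check=False)
```

For example, `run_lemonade("pull", "Gemma-4-E2B-it-GGUF")` would download the model used in the examples above, provided the `lemonade` CLI is installed.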
For hybrid setups, Lemonade can also route to any OpenAI-compatible cloud provider (Fireworks, OpenAI, OpenRouter, Together, …) alongside local models — see Cloud Offload. (Experimental.)
Lemonade supports a wide variety of models across CPU, GPU, and NPU: LLMs (GGUF, FLM, and ONNX), Whisper, Stable Diffusion, and more.
Use lemonade pull or the built-in Model Manager to download models. You can also import custom GGUF/ONNX models from Hugging Face.
Lemonade supports multiple inference engines for LLM, speech, TTS, and image generation, and each has its own backend and hardware requirements.
| Modality | Engine | Backend | Device | OS |
|---|---|---|---|---|
| Text generation | llamacpp | vulkan | x86_64 CPU, AMD iGPU, AMD dGPU; ARM64 CPU/GPU (Linux) | Windows, Linux |
| Text generation | llamacpp | rocm | Supported AMD ROCm iGPU/dGPU families* | Windows, Linux |
| Text generation | llamacpp | cuda | NVIDIA GPUs (Turing or newer)** | Windows, Linux |
| Text generation | llamacpp | cpu | x86_64 CPU; ARM64 CPU (Linux) | Windows, Linux |
| Text generation | llamacpp | metal | Apple Silicon GPU | macOS |
| Text generation | llamacpp | system | x86_64/ARM64 CPU, GPU | Linux |
| Text generation | flm | npu | XDNA2 NPU | Windows, Linux |
| Text generation | ryzenai-llm | npu | XDNA2 NPU | Windows |
| Text generation | vllm (experimental) | rocm | Strix Halo iGPU (gfx1151) | Linux |
| Speech-to-text | whispercpp | npu | XDNA2 NPU | Windows |
| Speech-to-text | whispercpp | vulkan | x86_64 CPU | Linux |
| Speech-to-text | whispercpp | cpu | x86_64 CPU | Windows, Linux |
| Speech-to-text | moonshine | cpu | x86_64/arm64 CPU | Windows, Linux, macOS |
| Text-to-speech | kokoro | cpu | x86_64 CPU | Windows, Linux |
| Image generation | sd-cpp | rocm | Supported AMD ROCm iGPU/dGPU families* | Windows, Linux |
| Image generation | sd-cpp | vulkan | Vulkan-capable GPUs | Windows, Linux |
| Image generation | sd-cpp | cuda | NVIDIA GPUs (Turing or newer)** | Linux |
| Image generation | sd-cpp | cpu | x86_64 CPU | Windows, Linux |
To check exactly which recipes/backends are supported on your own machine, run:
lemonade backends
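As a rough illustration of how the support table can be read, here is a small Python sketch that transcribes its rows and filters them by modality and operating system. The `Recipe` type and the `candidates` function are hypothetical names used only for this example, and the device column is omitted for brevity. The output of `lemonade backends` remains the authoritative answer for a given machine.

```python
from typing import NamedTuple, Tuple


class Recipe(NamedTuple):
    modality: str
    engine: str
    backend: str
    os_names: Tuple[str, ...]


WIN_LINUX = ("Windows", "Linux")

# Rows transcribed from the support table above.
RECIPES = [
    Recipe("Text generation", "llamacpp", "vulkan", WIN_LINUX),
    Recipe("Text generation", "llamacpp", "rocm", WIN_LINUX),
    Recipe("Text generation", "llamacpp", "cuda", WIN_LINUX),
    Recipe("Text generation", "llamacpp", "cpu", WIN_LINUX),
    Recipe("Text generation", "llamacpp", "metal", ("macOS",)),
    Recipe("Text generation", "llamacpp", "system", ("Linux",)),
    Recipe("Text generation", "flm", "npu", WIN_LINUX),
    Recipe("Text generation", "ryzenai-llm", "npu", ("Windows",)),
    Recipe("Text generation", "vllm", "rocm", ("Linux",)),
    Recipe("Speech-to-text", "whispercpp", "npu", ("Windows",)),
    Recipe("Speech-to-text", "whispercpp", "vulkan", ("Linux",)),
    Recipe("Speech-to-text", "whispercpp", "cpu", WIN_LINUX),
    Recipe("Speech-to-text", "moonshine", "cpu", ("Windows", "Linux", "macOS")),
    Recipe("Text-to-speech", "kokoro", "cpu", WIN_LINUX),
    Recipe("Image generation", "sd-cpp", "rocm", WIN_LINUX),
    Recipe("Image generation", "sd-cpp", "vulkan", WIN_LINUX),
    Recipe("Image generation", "sd-cpp", "cuda", ("Linux",)),
    Recipe("Image generation", "sd-cpp", "cpu", WIN_LINUX),
]


def candidates(modality, os_name):
    """Return (engine, backend) pairs the table lists for a modality on an OS."""
    return [
        (r.engine, r.backend)
        for r in RECIPES
        if r.modality == modality and os_name in r.os_names
    ]
```

Calling `candidates("Image generation", "Windows")` returns the `rocm`, `vulkan`, and `cpu` backends of `sd-cpp`, since the table lists `cuda` for image generation on Linux only. Whether a listed backend actually works also depends on the device column.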
Supported AMD ROCm iGPU/dGPU families (*):

| Architecture | Platform Support | GPU Models |
|---|---|---|
| gfx1151 (STX Halo) | Windows, Ubuntu | Ryzen AI MAX+ Pro 395 |
| gfx120X (RDNA4) | Windows, Ubuntu | Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT |
| gfx110X (RDNA3) | Windows, Ubuntu | Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT |
Supported NVIDIA GPUs, Turing or newer (**):

| Compute Capability | Architecture | GPU Models |
|---|---|---|
| sm_75 | Turing | RTX 20-series, GTX 16-series, T4 |
| sm_80 / sm_86 | Ampere | RTX 30-series, A100, A40 |
| sm_89 | Ada Lovelace | RTX 40-series, L40, L4 |
| sm_90 | Hopper | H100, H200 |
| sm_100 / sm_120 | Blackwell | RTX 50-series, B100, B200 |
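The compute capability table can be turned into a simple lookup. The sketch below is an illustration only: it maps a compute capability, written either as `sm_86` or as `8.6`, to the architecture name in the table, and returns `None` for anything the table does not list. The function names are hypothetical.

```python
# Compute capability -> architecture, transcribed from the table above.
ARCH_BY_SM = {
    75: "Turing",
    80: "Ampere",
    86: "Ampere",
    89: "Ada Lovelace",
    90: "Hopper",
    100: "Blackwell",
    120: "Blackwell",
}


def parse_sm(value):
    """Convert 'sm_86' or '8.6' into the integer 86."""
    text = value.strip().lower()
    if text.startswith("sm_"):
        return int(text[3:])
    major, minor = text.split(".")
    return int(major) * 10 + int(minor)


def architecture(value):
    """Return the architecture name from the table, or None if not listed."""
    return ARCH_BY_SM.get(parse_sm(value))
```

For instance, `architecture("sm_86")` and `architecture("8.6")` both return `"Ampere"`, while a value the table does not list, such as `sm_61`, returns `None`.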
Lemonade's roadmap is defined by a set of working groups. Visit the landing page here to learn each group's goal and roadmap.
Embeddable Lemonade is a binary version of Lemonade that you can bundle into your own app to give it a portable, auto-optimizing, multi-modal local AI stack. This lets users focus on your app, with zero Lemonade installers, branding, or telemetry.
Check out the Embeddable Lemonade guide.
You can use any OpenAI-compatible client library by configuring it to use http://localhost:13305/v1 as the base URL. The table below lists official and popular OpenAI clients for several languages; pick whichever suits your stack.
| Python | C++ | Java | C# | Node.js | Go | Ruby | Rust | PHP |
|---|---|---|---|---|---|---|---|---|
| openai-python | openai-cpp | openai-java | openai-dotnet | openai-node | go-openai | ruby-openai | async-openai | openai-php |
from openai import OpenAI

# Initialize the client to use Lemonade Server
client = OpenAI(
    base_url="http://localhost:13305/api/v1",
    api_key="lemonade",  # required but unused
)

# Create a chat completion
completion = client.chat.completions.create(
    model="Gemma-4-E2B-it-GGUF",  # or any other available model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)

# Print the response
print(completion.choices[0].message.content)
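If you would rather not install a client library, the same request can be sketched with only the Python standard library. This assumes the server follows the usual OpenAI chat-completions convention, that is, a `POST` to `<base URL>/chat/completions` with a JSON body. The base URL is copied from the sample above (the prose gives the `/v1` form, so adjust it to whichever your server exposes), and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: base URL copied from the client sample above.
BASE_URL = "http://localhost:13305/api/v1"


def build_chat_request(prompt, model="Gemma-4-E2B-it-GGUF", base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The key is required by OpenAI clients but unused, as noted above.
            "Authorization": "Bearer lemonade",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running and the model available, `print(send(build_chat_request("What is the capital of France?")))` should print the model's reply. Building the request separately from sending it makes the payload easy to inspect or log first.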
Click to learn more about the available APIs and how to embed Lemonade in your own application.
For answers to frequently asked questions, see our FAQ Guide.
Lemonade is built by the local AI community! If you would like to contribute to this project, please check out our contribution guide.
This is a community project maintained by @amd-pworfolk @bitgamma @danielholanda @jeremyfowers @kenvandine @Geramy @ramkrishna2910 @sawansri @siavashhub @sofiageo @superm1 @vgodsoe, and sponsored by AMD. You can reach us by filing an issue, emailing lemonade@amd.com, or joining our Discord.
Free code signing provided by SignPath.io, certificate by SignPath Foundation.
Privacy policy: This program will not transfer any information to other networked systems unless specifically requested by the user or the person installing or operating it. When the user requests it, Lemonade downloads AI models from Hugging Face Hub (see their privacy policy).
This project is: