🍋 Lemonade: Refreshingly fast local AI

Download | Documentation | Discord

Lemonade is the local AI server that gives you the same capabilities as cloud APIs, except 100% free and private. Use the latest models for chat, coding, speech, and image generation on your own NPU and GPU.

Lemonade comes in two flavors:

- Lemonade Server installs a service you can connect to hundreds of great apps using standard OpenAI, Anthropic, and Ollama APIs.
- Embeddable Lemonade is a portable binary you can package into your own application to give it multi-modal local AI that auto-optimizes for your user’s PC.

This project is built by the community for every PC, with optimizations by AMD engineers to get the most from Ryzen AI, Radeon, and Strix Halo PCs.

Getting Started

Install: Windows · Linux · macOS · Docker · Source
Get Models: Browse and download with the Model Manager
Generate: Try models with the built-in interfaces for chat, image gen, speech gen, and more
Mobile: Take your lemonade to go: iOS · Android · Source
Connect: Use Lemonade with your favorite apps:

Want your app featured here? Just submit a marketplace PR!

Supported Platforms

Platform	Build

Using the CLI

To run and chat with Gemma:

lemonade run Gemma-4-E2B-it-GGUF

To code with Lemonade models:

lemonade launch claude

Multi-modality:

# image gen
lemonade run SDXL-Turbo

# speech gen
lemonade run kokoro-v1

# transcription
lemonade run Whisper-Large-v3-Turbo

To see available models and download them:

lemonade list

lemonade pull Gemma-4-E2B-it-GGUF

To see the backends available on your PC:

lemonade backends
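
The commands above can also be driven from a script. The following is a minimal sketch, assuming the `lemonade` executable is on your PATH; the helper names `lemonade_command` and `run_lemonade` are hypothetical and not part of Lemonade, and only subcommands shown in this README are used.

```python
import shutil
import subprocess
from typing import List, Optional


def lemonade_command(*args: str) -> List[str]:
    """Build an argument list for the lemonade CLI (no shell is involved)."""
    return ["lemonade", *args]


def run_lemonade(*args: str) -> Optional[str]:
    """Run a lemonade subcommand and return its stdout.

    Returns None when the CLI is not installed, so callers can degrade gracefully.
    """
    if shutil.which("lemonade") is None:
        return None
    result = subprocess.run(
        lemonade_command(*args), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # "list" and "backends" are the read-only subcommands shown above.
    for subcommand in ("list", "backends"):
        output = run_lemonade(subcommand)
        print(output if output is not None else "lemonade CLI not found on PATH")
```

Passing arguments as a list rather than a shell string avoids quoting problems with model names.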

For hybrid setups, Lemonade can also route to any OpenAI-compatible cloud provider (Fireworks, OpenAI, OpenRouter, Together, …) alongside local models — see Cloud Offload. (Experimental.)

Model Library

Lemonade supports a wide variety of models across CPU, GPU, and NPU: LLMs (GGUF, FLM, and ONNX), Whisper, Stable Diffusion, and more.

Use lemonade pull or the built-in Model Manager to download models. You can also import custom GGUF/ONNX models from Hugging Face.
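
Models can also be enumerated over HTTP. This is a hedged sketch, assuming Lemonade Server exposes the OpenAI-style `GET /models` endpoint under its base URL and returns the usual `{"data": [{"id": ...}]}` shape; the helper names `model_ids` and `fetch_model_ids` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List


def model_ids(payload: Dict[str, Any]) -> List[str]:
    """Extract model names from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str, timeout: float = 5.0) -> List[str]:
    """GET <base_url>/models and return the ids the server reports."""
    url = f"{base_url.rstrip('/')}/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return model_ids(json.load(response))


if __name__ == "__main__":
    # Base URL taken from the Python client example in this README.
    try:
        print(fetch_model_ids("http://localhost:13305/api/v1"))
    except OSError as exc:  # urllib's URLError is a subclass of OSError
        print(f"Lemonade Server not reachable: {exc}")
```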

Browse all built-in models →

Supported Configurations

Lemonade supports multiple inference engines for text generation, speech-to-text, text-to-speech, and image generation. Each engine has its own backend and hardware requirements.

Modality	Engine	Backend	Device	OS
Text generation	`llamacpp`	`vulkan`	`x86_64` CPU, AMD iGPU, AMD dGPU; ARM64 CPU/GPU (Linux)	Windows, Linux
		`rocm`	Supported AMD ROCm iGPU/dGPU families*	Windows, Linux
		`cuda`	NVIDIA GPUs (Turing or newer)**	Windows, Linux
		`cpu`	`x86_64` CPU; ARM64 CPU (Linux)	Windows, Linux
		`metal`	Apple Silicon GPU	macOS
		`system`	`x86_64`/ARM64 CPU, GPU	Linux
	`flm`	`npu`	XDNA2 NPU	Windows, Linux
	`ryzenai-llm`	`npu`	XDNA2 NPU	Windows
	`vllm` (experimental)	`rocm`	Strix Halo iGPU (gfx1151)	Linux
Speech-to-text	`whispercpp`	`npu`	XDNA2 NPU	Windows
		`vulkan`	`x86_64` CPU	Linux
		`cpu`	`x86_64` CPU	Windows, Linux
	`moonshine`	`cpu`	`x86_64`/`arm64` CPU	Windows, Linux, macOS
Text-to-speech	`kokoro`	`cpu`	`x86_64` CPU	Windows, Linux
Image generation	`sd-cpp`	`rocm`	Supported AMD ROCm iGPU/dGPU families*	Windows, Linux
		`vulkan`	Vulkan-capable GPUs	Windows, Linux
		`cuda`	NVIDIA GPUs (Turing or newer)**	Linux
		`cpu`	`x86_64` CPU	Windows, Linux

To check exactly which recipes/backends are supported on your own machine, run:

lemonade backends
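
To reason about the table programmatically, it can be transcribed into data. The sketch below is illustrative only: it filters by modality and operating system, omits the Device column, and uses hypothetical names (`Config`, `CONFIGS`, `candidates`). What your hardware actually supports is whatever `lemonade backends` reports.

```python
from typing import List, NamedTuple, Tuple


class Config(NamedTuple):
    modality: str
    engine: str
    backend: str
    oses: Tuple[str, ...]


# Rows transcribed from the Supported Configurations table (Device column omitted).
CONFIGS = [
    Config("text generation", "llamacpp", "vulkan", ("Windows", "Linux")),
    Config("text generation", "llamacpp", "rocm", ("Windows", "Linux")),
    Config("text generation", "llamacpp", "cuda", ("Windows", "Linux")),
    Config("text generation", "llamacpp", "cpu", ("Windows", "Linux")),
    Config("text generation", "llamacpp", "metal", ("macOS",)),
    Config("text generation", "llamacpp", "system", ("Linux",)),
    Config("text generation", "flm", "npu", ("Windows", "Linux")),
    Config("text generation", "ryzenai-llm", "npu", ("Windows",)),
    Config("text generation", "vllm", "rocm", ("Linux",)),
    Config("speech-to-text", "whispercpp", "npu", ("Windows",)),
    Config("speech-to-text", "whispercpp", "vulkan", ("Linux",)),
    Config("speech-to-text", "whispercpp", "cpu", ("Windows", "Linux")),
    Config("speech-to-text", "moonshine", "cpu", ("Windows", "Linux", "macOS")),
    Config("text-to-speech", "kokoro", "cpu", ("Windows", "Linux")),
    Config("image generation", "sd-cpp", "rocm", ("Windows", "Linux")),
    Config("image generation", "sd-cpp", "vulkan", ("Windows", "Linux")),
    Config("image generation", "sd-cpp", "cuda", ("Linux",)),
    Config("image generation", "sd-cpp", "cpu", ("Windows", "Linux")),
]


def candidates(modality: str, os_name: str) -> List[Tuple[str, str]]:
    """Return (engine, backend) pairs the table lists for a modality on an OS."""
    return [
        (c.engine, c.backend)
        for c in CONFIGS
        if c.modality == modality and os_name in c.oses
    ]


if __name__ == "__main__":
    print(candidates("speech-to-text", "macOS"))  # [('moonshine', 'cpu')]
```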

* Supported AMD ROCm platforms:

Architecture	Platform Support	GPU Models
gfx1151 (STX Halo)	Windows, Ubuntu	Ryzen AI MAX+ Pro 395
gfx120X (RDNA4)	Windows, Ubuntu	Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT
gfx110X (RDNA3)	Windows, Ubuntu	Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT

** Supported NVIDIA CUDA platforms:

Compute Capability	Architecture	GPU Models
sm_75	Turing	RTX 20-series, GTX 16-series, T4
sm_80 / sm_86	Ampere	RTX 30-series, A100, A40
sm_89	Ada Lovelace	RTX 40-series, L40, L4
sm_90	Hopper	H100, H200
sm_100 / sm_120	Blackwell	RTX 50-series, B100, B200
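
The "Turing or newer" requirement corresponds to compute capability `sm_75` and above. As a small sketch of that check, with hypothetical helper names and the simplifying assumption that a higher `sm_` number means a newer GPU (which holds for the rows listed):

```python
from typing import Optional

# Transcribed from the NVIDIA table above.
ARCH_BY_SM = {
    75: "Turing",
    80: "Ampere",
    86: "Ampere",
    89: "Ada Lovelace",
    90: "Hopper",
    100: "Blackwell",
    120: "Blackwell",
}

MINIMUM_SM = 75  # Turing


def parse_sm(tag: str) -> int:
    """Convert a tag such as 'sm_86' to the integer 86."""
    prefix, _, digits = tag.partition("_")
    if prefix != "sm" or not digits.isdigit():
        raise ValueError(f"not a compute capability tag: {tag!r}")
    return int(digits)


def architecture(tag: str) -> Optional[str]:
    """Return the architecture name from the table, or None if the tag is not listed."""
    return ARCH_BY_SM.get(parse_sm(tag))


def meets_minimum(tag: str) -> bool:
    """True when the compute capability is Turing (sm_75) or higher."""
    return parse_sm(tag) >= MINIMUM_SM


if __name__ == "__main__":
    for tag in ("sm_61", "sm_75", "sm_89"):
        print(tag, architecture(tag), meets_minimum(tag))
```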

Project Roadmap

Lemonade's roadmap is defined by a set of working groups. Visit the working groups landing page to learn each group's goal and roadmap.

Integrate Embeddable Lemonade in Your Application

Embeddable Lemonade is a binary version of Lemonade that you can bundle into your own app to give it a portable, auto-optimizing, multi-modal local AI stack. This lets users focus on your app, with zero Lemonade installers, branding, or telemetry.

Check out the Embeddable Lemonade guide.

Connect Lemonade Server to Your Application

You can use any OpenAI-compatible client library by configuring it to use http://localhost:13305/v1 as the base URL. The table below lists official and popular OpenAI clients for several languages; pick whichever suits your application.

Python	C++	Java	C#	Node.js	Go	Ruby	Rust	PHP
openai-python	openai-cpp	openai-java	openai-dotnet	openai-node	go-openai	ruby-openai	async-openai	openai-php
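
A client library is optional: any HTTP client that can send an OpenAI-style request should work. The sketch below uses only the Python standard library and assumes the server accepts the standard `POST /chat/completions` request and response shape; `build_chat_request` and `first_reply` are hypothetical helper names, and the base URL is the one used in the Python client example below.

```python
import json
import urllib.request
from typing import Any, Dict


def build_chat_request(
    base_url: str, model: str, prompt: str, api_key: str = "lemonade"
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def first_reply(response_body: Dict[str, Any]) -> str:
    """Pull the assistant's text out of a chat completion response."""
    return response_body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    request = build_chat_request(
        "http://localhost:13305/api/v1",
        "Gemma-4-E2B-it-GGUF",
        "What is the capital of France?",
    )
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            print(first_reply(json.load(response)))
    except OSError as exc:  # urllib's URLError is a subclass of OSError
        print(f"Lemonade Server not reachable: {exc}")
```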

Python Client Example

from openai import OpenAI

# Initialize the client to use Lemonade Server
client = OpenAI(
    base_url="http://localhost:13305/api/v1",
    api_key="lemonade"  # required but unused
)

# Create a chat completion
completion = client.chat.completions.create(
    model="Gemma-4-E2B-it-GGUF",  # or any other available model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

# Print the response
print(completion.choices[0].message.content)
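
For interactive apps, you may prefer to show tokens as they arrive. The openai-python client supports this through `stream=True`; the sketch below assumes the model you pick streams through Lemonade Server in the same way. `stream_reply` is a hypothetical helper that accepts the `client` object created in the example above.

```python
def stream_reply(client, model: str, prompt: str) -> str:
    """Stream a chat completion, printing text as it arrives, and return the full reply."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        # Some chunks carry no choices, or a delta without text; skip those.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            print(text, end="", flush=True)
            parts.append(text)
    print()  # finish the line once the stream ends
    return "".join(parts)
```

With the server running, `stream_reply(client, "Gemma-4-E2B-it-GGUF", "What is the capital of France?")` prints the answer incrementally and returns it as a single string.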

See the documentation to learn more about the available APIs and how to embed Lemonade in your own application.

FAQ

To read our frequently asked questions, see our FAQ Guide.

Contributing

Lemonade is built by the local AI community! If you would like to contribute to this project, please check out our contribution guide.

Maintainers

This is a community project maintained by @amd-pworfolk @bitgamma @danielholanda @jeremyfowers @kenvandine @Geramy @ramkrishna2910 @sawansri @siavashhub @sofiageo @superm1 @vgodsoe, and sponsored by AMD. You can reach us by filing an issue, emailing lemonade@amd.com, or joining our Discord.

Code Signing Policy

Free code signing provided by SignPath.io, certificate by SignPath Foundation.

Committers and reviewers: Maintainers of this repo
Approvers: Owners

Privacy policy: This program will not transfer any information to other networked systems unless specifically requested by the user or the person installing or operating it. When the user requests it, Lemonade downloads AI models from Hugging Face Hub (see their privacy policy).

License and Attribution

This project is:

Built with C++ (server) and React (app) with ❤️ for the open source community,
Standing on the shoulders of great tools from:
- ggml/llama.cpp
- ggml/whisper.cpp
- ggml/stable-diffusion.cpp
- kokoros
- OnnxRuntime GenAI
- Hugging Face Hub
- OpenAI API
- IRON/MLIR-AIE
- and more...
Licensed under the Apache 2.0 License.
- Portions of the project are licensed as described in LICENSE.
