Cascading runtime for AI agents. Optimize cost, latency, quality, and policy decisions inside the agent loop.
Cost savings: 69% (MT-Bench), 93% (GSM8K), 52% (MMLU), and 80% (TruthfulQA), while retaining 96% of GPT-5 quality.
Python • TypeScript • OpenAI Agents • CrewAI • PydanticAI • Google ADK • n8n • OpenClaw • Hermes Agent • 📖 Docs • 💡 Examples
The in-process intelligence layer for AI agents. Optimize cost, latency, quality, budget, compliance, and energy — inside the execution loop, not at the HTTP boundary.
cascadeflow works where external proxies can't: per-step model decisions based on agent state, per-tool-call budget gating, runtime stop/continue/escalate actions, and business KPI injection during agent loops. It accumulates insight from every model call, tool result, and quality score — the agent gets smarter the more it runs. Sub-5ms overhead. Works with LangChain, OpenAI Agents SDK, CrewAI, PydanticAI, Google ADK, n8n, Vercel AI SDK, and Hermes Agent.
Update: Hermes Agent delegation cascading
CascadeFlow now provides a Hermes Agent integration for per-skill model cascading, task-complexity cascading, topic-aware subagent cascading, observe-mode rollout, and auditable decisions without taking over provider credentials, base URLs, fallback chains, or API modes.
```bash
# Python
pip install cascadeflow
# TypeScript / Node.js
npm install @cascadeflow/core
```
| Dimension | External Proxy | cascadeflow Harness |
|---|---|---|
| Scope | HTTP request boundary | Inside agent execution loop |
| Dimensions | Cost only | Cost + quality + latency + budget + compliance + energy |
| Latency overhead | 10-50ms network RTT | <5ms in-process |
| Business logic | None | KPI weights and targets |
| Enforcement | None (observe only) | stop, deny_tool, switch_model |
| Auditability | Request logs | Per-step decision traces |
cascadeflow is a library and agent harness: an intelligent AI model cascading package that dynamically selects the optimal model for each query or tool call through speculative execution. It is based on research showing that 40-70% of queries don't require slow, expensive flagship models, and that domain-specific smaller models often outperform large general-purpose models on specialized tasks. For the remaining queries that need advanced reasoning, cascadeflow automatically escalates to a flagship model.
Runtime actions (allow, switch_model, deny_tool, stop) are chosen per step based on the current context and policy state. This closes the gap between analytics and execution.

ℹ️ Note: SLMs (under 10B parameters) are sufficiently powerful for 60-70% of agentic AI tasks (research paper).
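cascadeflow uses speculative execution with quality validation: it tries cheap models first, validates each response, and escalates to a flagship model only when the draft fails validation.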
Zero configuration. Works with YOUR existing models (>17 providers currently supported).
In practice, 60-70% of queries are handled by small, efficient models (8-20x cost difference) without requiring escalation.
Result: 40-85% cost reduction, 2-10x faster responses, zero quality loss.
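The speculative execution described above (try a cheap model, validate, escalate only when needed) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not cascadeflow's implementation: the `Model` and `CascadeResult` types, the `validate` callback, and the fixed per-query token count are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Model:
    """Hypothetical model stub: a name, a price per 1K tokens, and a generate function."""
    name: str
    cost_per_1k: float
    generate: Callable[[str], str]


@dataclass
class CascadeResult:
    content: str
    model_used: str
    cost: float
    escalated: bool


def run_cascade(query: str, drafter: Model, verifier: Model,
                validate: Callable[[str, str], bool], tokens: int = 500) -> CascadeResult:
    """Try the cheap drafter first; call the verifier only if the draft fails validation.

    Simplification: both models are billed for the same fixed token count.
    """
    draft = drafter.generate(query)
    cost = drafter.cost_per_1k * tokens / 1000
    if validate(query, draft):
        return CascadeResult(draft, drafter.name, cost, escalated=False)
    # Draft rejected: escalate to the flagship model. The draft's cost is already spent.
    answer = verifier.generate(query)
    cost += verifier.cost_per_1k * tokens / 1000
    return CascadeResult(answer, verifier.name, cost, escalated=True)


def non_empty(query: str, response: str) -> bool:
    """Toy validator: accept any non-empty draft."""
    return bool(response.strip())


cheap = Model("small-model", 0.000375, lambda q: "Paris" if "France" in q else "")
flagship = Model("flagship-model", 0.00562, lambda q: "A detailed flagship answer.")
print(run_cascade("What's the capital of France?", cheap, flagship, non_empty).model_used)
print(run_cascade("Prove the Riemann hypothesis", cheap, flagship, non_empty).model_used)
```

The design trade-off is visible in the escalation branch: an escalated query pays for both the draft and the verifier, so overall savings depend on how often drafts are accepted.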
```text
cascadeflow Stack

Cascade Agent
  Orchestrates the entire cascade execution
  • Query routing & model selection
  • Drafter -> Verifier coordination
  • Cost tracking & telemetry
        ↓
Domain Pipeline
  Automatic domain classification
  • Rule-based detection (CODE, MATH, DATA, etc.)
  • Optional ML semantic classification
  • Domain-optimized pipelines & model selection
        ↓
Quality Validation Engine
  Multi-dimensional quality checks
  • Length validation (too short/verbose)
  • Confidence scoring (logprobs analysis)
  • Format validation (JSON, structured output)
  • Semantic alignment (intent matching)
        ↓
Cascading Engine (<2ms overhead)
  Smart model escalation strategy
  • Try cheap models first (speculative execution)
  • Validate quality instantly
  • Escalate only when needed
  • Automatic retry & fallback
        ↓
Provider Abstraction Layer
  Unified interface for >17 providers
  • OpenAI • Anthropic • Groq • Ollama
  • Together • vLLM • HuggingFace • LiteLLM
  • Vercel AI SDK (17+ additional providers)
```
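The Quality Validation Engine layer in the stack above lists length, logprob-confidence, and format checks. A minimal sketch of how such checks might be combined follows; the thresholds, the `QualityReport` type, and the use of the geometric-mean token probability (`exp(mean logprob)`) as the confidence score are illustrative assumptions, not cascadeflow's actual scoring.

```python
import json
import math
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    passed: bool
    confidence: float
    failures: list[str] = field(default_factory=list)


def check_quality(response: str, token_logprobs: list[float], *, min_words: int = 3,
                  max_words: int = 800, min_confidence: float = 0.6,
                  expect_json: bool = False) -> QualityReport:
    """Run illustrative length, confidence, and format checks on a draft response."""
    failures = []

    # Length: reject drafts that are too short or too verbose.
    n_words = len(response.split())
    if n_words < min_words:
        failures.append("too_short")
    elif n_words > max_words:
        failures.append("too_verbose")

    # Confidence: geometric-mean token probability, i.e. exp(mean logprob).
    if token_logprobs:
        confidence = math.exp(sum(token_logprobs) / len(token_logprobs))
    else:
        confidence = 0.0
    if confidence < min_confidence:
        failures.append("low_confidence")

    # Format: optionally require the whole response to parse as JSON.
    if expect_json:
        try:
            json.loads(response)
        except json.JSONDecodeError:
            failures.append("invalid_json")

    return QualityReport(passed=not failures, confidence=confidence, failures=failures)


print(check_quality("Paris is the capital of France.", [-0.05, -0.1, -0.02]))
```

A draft that fails any check would be escalated; collecting every failure reason (rather than stopping at the first) keeps the decision auditable.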
Three tiers of integration — zero-change observability to full policy control:
Tier 1: Zero-change observability
```python
import cascadeflow

cascadeflow.init(mode="observe")
# All OpenAI/Anthropic SDK calls are now tracked. No code changes needed.
```
Tier 2: Scoped runs with budget
```python
# Inside an async function; `agent` is your existing agent
with cascadeflow.run(budget=0.50, max_tool_calls=10) as session:
    result = await agent.run("Analyze this dataset")
    print(session.summary())  # cost, latency, energy, steps, tool calls
    print(session.trace())  # full decision audit trail
```
Tier 3: Decorated agents with policy
```python
# `llm` stands for your existing model client
@cascadeflow.agent(budget=0.20, compliance="gdpr", kpi_weights={"quality": 0.6, "cost": 0.3, "latency": 0.1})
async def my_agent(query: str):
    return await llm.complete(query)
```
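Tiers 2 and 3 attach a budget and a tool-call cap to a run, and the harness then chooses among allow, switch_model, deny_tool, and stop. The function below is a hypothetical sketch of that per-step decision under simple assumed rules (stop once the budget is spent, deny tools past the cap, switch to a cheaper model when the next call would overrun); cascadeflow's real policy engine and its names may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Action(str, Enum):
    ALLOW = "allow"
    SWITCH_MODEL = "switch_model"
    DENY_TOOL = "deny_tool"
    STOP = "stop"


@dataclass
class RunState:
    """Hypothetical per-run accounting, loosely mirroring budget= and max_tool_calls=."""
    budget: float
    spent: float = 0.0
    tool_calls: int = 0
    max_tool_calls: int = 10


def decide(state: RunState, next_call_cost: float, cheaper_call_cost: float,
           is_tool_call: bool) -> Action:
    """Pick one enforcement action for the next agent step (illustrative rules only)."""
    remaining = state.budget - state.spent
    if remaining <= 0:
        return Action.STOP
    if is_tool_call and state.tool_calls >= state.max_tool_calls:
        return Action.DENY_TOOL
    if next_call_cost > remaining:
        # Fall back to a cheaper model if it still fits; otherwise end the run.
        return Action.SWITCH_MODEL if cheaper_call_cost <= remaining else Action.STOP
    return Action.ALLOW


state = RunState(budget=0.50, spent=0.10)
print(decide(state, next_call_cost=0.45, cheaper_call_cost=0.02, is_tool_call=False).value)
```

In observe mode, a harness would presumably log the chosen action instead of enforcing it; the sketch leaves that distinction to the caller.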
```bash
pip install "cascadeflow[all]"
```

```python
import asyncio

from cascadeflow import CascadeAgent, ModelConfig

# Define your cascade: try the cheap model first, escalate if needed
agent = CascadeAgent(models=[
    ModelConfig(name="nous/hermes-flash", provider="openai", cost=0.000375),  # Draft model (~$0.375/1M tokens)
    ModelConfig(name="gpt-5", provider="openai", cost=0.00562),  # Verifier model (~$5.62/1M tokens)
])


async def main():
    # Run query: automatically routes to the optimal model
    result = await agent.run("What's the capital of France?")
    print(f"Answer: {result.content}")
    print(f"Model used: {result.model_used}")
    print(f"Cost: ${result.total_cost:.6f}")


asyncio.run(main())
```
For advanced use cases, you can add ML-based semantic similarity checking to validate that responses align with queries.
Step 1: Install the optional ML package:
```bash
pip install "cascadeflow[semantic]"  # Adds semantic similarity via FastEmbed (~80MB model)
```
Step 2: Use semantic quality validation:
```python
from cascadeflow.quality.semantic import SemanticQualityChecker

# Initialize semantic checker (downloads model on first use)
checker = SemanticQualityChecker(
    similarity_threshold=0.5,  # Minimum similarity score (0-1)
    toxicity_threshold=0.7,  # Maximum toxicity score (0-1)
)

# Validate query-response alignment
query = "Explain Python decorators"
response = "Decorators are a way to modify functions using @syntax..."
result = checker.validate(query, response, check_toxicity=True)
print(f"Similarity: {result.similarity:.2%}")
print(f"Passed: {result.passed}")
print(f"Toxic: {result.is_toxic}")
```
What you get:
Full example: See semantic_quality_domain_detection.py
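A semantic check like this generally embeds the query and the response and compares the two vectors with cosine similarity. The sketch below shows only that comparison and the threshold step, with hand-made toy vectors standing in for real embeddings; it illustrates the general technique and is an assumption about, not a description of, the internals of `SemanticQualityChecker`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def passes_similarity(query_vec: list[float], response_vec: list[float],
                      threshold: float = 0.5) -> bool:
    """Accept a response only if its embedding points roughly the same way as the query's."""
    return cosine_similarity(query_vec, response_vec) >= threshold


# Toy 3-dimensional "embeddings"; real embedding models produce hundreds of dimensions.
query = [0.9, 0.1, 0.0]
on_topic = [0.8, 0.2, 0.1]
off_topic = [0.0, 0.1, 0.9]
print(passes_similarity(query, on_topic), passes_similarity(query, off_topic))
```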
⚠️ GPT-5 Note: GPT-5 streaming requires organization verification. Non-streaming works for all users. Verify here if needed (~15 min). Basic cascadeflow examples work without verification, because GPT-5 is only called when needed (typically for 20-30% of requests).
📖 Learn more: Python Documentation | Quickstart Guide | Providers Guide
```bash
npm install @cascadeflow/core
```

```typescript
import { CascadeAgent, ModelConfig } from '@cascadeflow/core';

// Same API as Python!
const agent = new CascadeAgent({
  models: [
    { name: 'nous/hermes-flash', provider: 'openai', cost: 0.000375 },
    { name: 'gpt-4o', provider: 'openai', cost: 0.00625 },
  ],
});

const result = await agent.run('What is TypeScript?');
console.log(`Model: ${result.modelUsed}`);
console.log(`Cost: $${result.totalCost}`);
console.log(`Saved: ${result.savingsPercentage}%`);
```
For advanced quality validation, enable ML-based semantic similarity checking to ensure responses align with queries.
Step 1: Install the optional ML packages:
```bash
npm install @cascadeflow/ml @huggingface/transformers
```
Step 2: Enable semantic validation in your cascade:
```typescript
import { CascadeAgent, SemanticQualityChecker } from '@cascadeflow/core';

const agent = new CascadeAgent({
  models: [
    { name: 'nous/hermes-flash', provider: 'openai', cost: 0.000375 },
    { name: 'gpt-4o', provider: 'openai', cost: 0.00625 },
  ],
  quality: {
    threshold: 0.40, // Traditional confidence threshold
    requireMinimumTokens: 5, // Minimum response length
    useSemanticValidation: true, // Enable ML validation
    semanticThreshold: 0.5, // 50% minimum similarity
  },
});

// Responses now validated for semantic alignment
const result = await agent.run('Explain TypeScript generics');
```
Step 3: Or use semantic validation directly:
```typescript
import { SemanticQualityChecker } from '@cascadeflow/core';

const checker = new SemanticQualityChecker();

if (await checker.isAvailable()) {
  const result = await checker.checkSimilarity(
    'What is TypeScript?',
    'TypeScript is a typed superset of JavaScript.'
  );
  console.log(`Similarity: ${(result.similarity * 100).toFixed(1)}%`);
  console.log(`Passed: ${result.passed}`);
}
```
What you get:
Example: semantic-quality.ts
📖 Learn more: TypeScript Documentation | Quickstart Guide | Node.js Examples
Migrate from a direct provider integration in about five minutes and gain cost savings, full cost control, and transparency.

Before, with a flagship model for every request (cost: $0.000113, latency: 850ms):

```python
import openai

# Using an expensive model for everything
result = openai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's 2+2?"}],
)
```

After, with cascadeflow (cost: $0.000007, latency: 234ms):

```python
from cascadeflow import CascadeAgent, ModelConfig

agent = CascadeAgent(models=[
    ModelConfig(name="nous/hermes-flash", provider="openai", cost=0.000375),
    ModelConfig(name="gpt-4o", provider="openai", cost=0.00625),
])
result = await agent.run("What's 2+2?")  # inside an async function
```
🔥 Saved: $0.000106 (94% reduction), 3.6x faster
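The headline figures follow directly from the two single-query measurements above. A quick check of the arithmetic (the `compare_runs` helper is ours for illustration, not part of cascadeflow):

```python
def compare_runs(before_cost: float, after_cost: float,
                 before_ms: float, after_ms: float) -> dict[str, float]:
    """Absolute savings, percentage reduction, and speedup between two runs."""
    saved = before_cost - after_cost
    return {
        "saved_usd": saved,
        "reduction_pct": 100 * saved / before_cost,
        "speedup": before_ms / after_ms,
    }


stats = compare_runs(before_cost=0.000113, after_cost=0.000007, before_ms=850, after_ms=234)
print(f"Saved: ${stats['saved_usd']:.6f} ({stats['reduction_pct']:.0f}% reduction), "
      f"{stats['speedup']:.1f}x faster")
# prints: Saved: $0.000106 (94% reduction), 3.6x faster
```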
📊 Learn more: Cost Tracking Guide | Production Best Practices | Performance Optimization
If you already have an app using the OpenAI or Anthropic APIs and want the fastest integration, run the gateway and point your existing client at it:
```bash
python -m cascadeflow.server --mode auto --port 8084
```
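Once the gateway is running, an OpenAI-compatible client mainly needs a different base URL (with the official OpenAI Python SDK, that is the `base_url` argument to `OpenAI(...)`). The stdlib sketch below builds, but does not send, a Chat Completions request aimed at the gateway. It assumes the gateway serves the OpenAI-style `/v1/chat/completions` route on port 8084, and the `"cascadeflow"` model name is a hypothetical placeholder; check the gateway docs for the exact route and model naming.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8084/v1"  # assumption: OpenAI-compatible /v1 prefix


def build_chat_request(prompt: str, model: str = "cascadeflow",
                       base_url: str = GATEWAY_URL,
                       api_key: str = "unused") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at the local gateway.

    The default model name is a hypothetical placeholder.
    """
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


req = build_chat_request("What's 2+2?")
# To send it while the gateway is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```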
Use cascadeflow in n8n workflows for no-code AI automation with automatic cost optimization!
Package: @cascadeflow/n8n-nodes-cascadeflow

| Node | Type | Use case |
|---|---|---|
| CascadeFlow (Model) | Language Model sub-node | Drop-in for any Chain/LLM node |
| CascadeFlow Agent | Standalone agent (main in/out) | Tool calling, memory, multi-step reasoning |
Quick Start (Model):
Quick Start (Agent):
Result: 40-85% cost savings in your n8n workflows!
Features:
🔌 Learn more: n8n Integration Guide | n8n Package
Use CascadeFlow as an optional Hermes Agent delegation router for subagents. Hermes keeps provider credentials, base URLs, fallback chains, and API modes; CascadeFlow returns a structured routing decision before Hermes spawns a child agent.
This works as a released CascadeFlow module even before a native Hermes PR is accepted. You can call the router from a local wrapper, a local Hermes fork, or a small hook script while keeping Hermes' current provider configuration as the final source of truth.
```python
from cascadeflow.integrations.hermes import (
    HermesDelegationRequest,
    HermesDelegationRouter,
)

router = HermesDelegationRouter.from_dict({
    "enabled": True,
    "mode": "observe",
    "routes": {
        "code": {
            "provider": "nous",
            "model": "nous/hermes-4.1",
            "reasoning_effort": "high",
        },
        "simple": {
            "provider": "openai",
            "model": "gpt-4.1-mini",
            "reasoning_effort": "low",
        },
    },
})

decision = router.route_delegation(HermesDelegationRequest(
    goal="Debug the failing unit test and propose a patch",
    toolsets=("terminal", "git"),
    loaded_skills=("python", "debugging"),
))
print(decision.to_dict())
```
What Hermes gets:
Learn more: Hermes Agent Integration Guide
Standalone example: examples/integrations/hermes_delegation_router.py
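To make the routing idea concrete, here is a hypothetical, self-contained sketch of how a delegation request could be mapped to the `code` or `simple` route from the router config above, and how observe mode differs from enforce mode. The keyword heuristic and the `RouteDecision` fields are our assumptions; `HermesDelegationRouter` may classify tasks quite differently.

```python
from dataclasses import dataclass

# Hypothetical signals that a delegated task is code work.
CODE_TOOLSETS = {"terminal", "git"}
CODE_KEYWORDS = ("debug", "patch", "refactor", "unit test", "stack trace")


@dataclass(frozen=True)
class RouteDecision:
    route: str     # which configured route matched ("code" or "simple")
    model: str     # model that route recommends
    applied: bool  # False in observe mode: the recommendation is only recorded
    reason: str    # human-readable audit reason


def route_task(goal: str, toolsets: tuple[str, ...], routes: dict[str, dict[str, str]],
               mode: str = "observe") -> RouteDecision:
    """Classify a delegated task and return an auditable routing decision."""
    text = goal.lower()
    if CODE_TOOLSETS & set(toolsets):
        route, reason = "code", "code toolset requested"
    elif any(keyword in text for keyword in CODE_KEYWORDS):
        route, reason = "code", "code keyword in goal"
    else:
        route, reason = "simple", "no code signals"
    return RouteDecision(route, routes[route]["model"], applied=(mode == "enforce"), reason=reason)


routes = {"code": {"model": "nous/hermes-4.1"}, "simple": {"model": "gpt-4.1-mini"}}
print(route_task("Debug the failing unit test and propose a patch", ("terminal", "git"), routes))
```

Returning a decision object instead of switching models directly is what makes an observe-mode rollout possible: the caller can log the recommendation and keep its current provider configuration until it trusts the routing.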

Use cascadeflow with LangChain for intelligent model cascading with full LCEL, streaming, and tools support!
TypeScript

```bash
npm install @cascadeflow/langchain @langchain/core @langchain/openai @langchain/anthropic
```

Python

```bash
pip install cascadeflow langchain-openai langchain-anthropic
```
```typescript
import { ChatOpenAI } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';
import { withCascade } from '@cascadeflow/langchain';

const cascade = withCascade({
  drafter: new ChatOpenAI({ model: 'nous/hermes-flash' }), // $0.15/$0.60 per 1M tokens
  verifier: new ChatAnthropic({ model: 'claude-sonnet-4-5' }), // $3/$15 per 1M tokens
  qualityThreshold: 0.8, // ~80% of queries use the drafter
});

// Use like any LangChain chat model
const result = await cascade.invoke('Explain quantum computing');

// Optional: Enable LangSmith tracing (see https://smith.langchain.com)
// Set LANGSMITH_API_KEY, LANGSMITH_PROJECT, LANGSMITH_TRACING=true

// Or with LCEL chains
const prompt = ChatPromptTemplate.fromTemplate('Explain {topic} in simple terms');
const chain = prompt.pipe(cascade).pipe(new StringOutputParser());
```
```python
from langchain_anthropic import ChatAnthropic
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

from cascadeflow.integrations.langchain import CascadeFlow

cascade = CascadeFlow(
    drafter=ChatOpenAI(model="nous/hermes-flash"),  # $0.15/$0.60 per 1M tokens
    verifier=ChatAnthropic(model="claude-sonnet-4-5"),  # $3/$15 per 1M tokens
    quality_threshold=0.8,  # ~80% of queries use the drafter
)

# Use like any LangChain chat model (inside an async function)
result = await cascade.ainvoke("Explain quantum computing")

# Optional: Enable LangSmith tracing (see https://smith.langchain.com)
# Set LANGSMITH_API_KEY, LANGSMITH_PROJECT, LANGSMITH_TRACING=true

# Or with LCEL chains
prompt = ChatPromptTemplate.from_template("Explain {topic} in simple terms")
chain = prompt | cascade | StrOutputParser()
```
Track costs, tokens, and cascade decisions with LangChain-compatible callbacks:
```python
from cascadeflow.integrations.langchain.langchain_callbacks import get_cascade_callback

# Track costs similar to get_openai_callback() (inside an async function)
with get_cascade_callback() as cb:
    response = await cascade.ainvoke("What is Python?")

print(f"Total cost: ${cb.total_cost:.6f}")
print(f"Drafter cost: ${cb.drafter_cost:.6f}")
print(f"Verifier cost: ${cb.verifier_cost:.6f}")
print(f"Total tokens: {cb.total_tokens}")
print(f"Successful requests: {cb.successful_requests}")
```
Features: follows the familiar get_openai_callback() pattern.

Full example: See langchain_cost_tracking.py
For discovering optimal cascade pairs from your existing LangChain models, use the built-in discovery helpers:
```typescript
import { ChatOpenAI } from '@langchain/openai';
import { ChatAnthropic } from '@langchain/anthropic';
import {
  discoverCascadePairs,
  findBestCascadePair,
  analyzeModel,
  validateCascadePair,
  withCascade,
} from '@cascadeflow/langchain';

// Your existing LangChain models (configured with YOUR API keys)
const myModels = [
  new ChatOpenAI({ model: 'gpt-3.5-turbo' }),
  new ChatOpenAI({ model: 'nous/hermes-flash' }),
  new ChatOpenAI({ model: 'gpt-4o' }),
  new ChatAnthropic({ model: 'claude-3-haiku' }),
  // ... any LangChain chat models
];

// Quick: Find best cascade pair
const best = findBestCascadePair(myModels);
console.log(`Best pair: ${best.analysis.drafterModel} → ${best.analysis.verifierModel}`);
console.log(`Estimated savings: ${best.estimatedSavings}%`);

// Use it immediately
const cascade = withCascade({
  drafter: best.drafter,
  verifier: best.verifier,
});

// Advanced: Discover all valid pairs
const pairs = discoverCascadePairs(myModels, {
  minSavings: 50, // Only pairs with ≥50% savings
  requireSameProvider: false, // Allow cross-provider cascades
});

// Validate a specific pair (here: nous/hermes-flash drafting for gpt-4o)
const validation = validateCascadePair(myModels[1], myModels[2]);
console.log(`Valid: ${validation.valid}`);
console.log(`Warnings: ${validation.warnings}`);
```
What you get:
Full example: See model-discovery.ts
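Estimated savings for a drafter/verifier pair come down to expected-cost arithmetic. Assuming every query pays for the draft and only rejected drafts also pay for the verifier, savings versus always calling the verifier are `1 - (c_d + (1 - a) * c_v) / c_v`, where `c_d` and `c_v` are the drafter and verifier prices and `a` is the draft acceptance rate. The Python sketch below applies that formula and picks the best pair from a price list; it is an illustrative cost model with made-up acceptance rates, not how `findBestCascadePair` actually scores pairs.

```python
from itertools import permutations


def cascade_savings(drafter_cost: float, verifier_cost: float, acceptance: float) -> float:
    """Fractional savings versus always calling the verifier.

    Every query pays for the draft; the rejected fraction (1 - acceptance)
    also pays for the verifier.
    """
    expected_cost = drafter_cost + (1 - acceptance) * verifier_cost
    return 1 - expected_cost / verifier_cost


def best_pair(prices: dict[str, float],
              acceptance: dict[tuple[str, str], float]) -> tuple[str, str, float]:
    """Return (drafter, verifier, savings) for the pair with the highest savings.

    Only pairs where the drafter is cheaper than the verifier are considered;
    pairs without a measured acceptance rate are assumed to accept nothing.
    """
    candidates = [
        (d, v, cascade_savings(prices[d], prices[v], acceptance.get((d, v), 0.0)))
        for d, v in permutations(prices, 2)
        if prices[d] < prices[v]
    ]
    return max(candidates, key=lambda pair: pair[2])


# Prices per 1K tokens from the examples above; 80% acceptance is a hypothetical rate.
print(f"{cascade_savings(0.000375, 0.00625, 0.8):.0%}")
```

With a low acceptance rate the result goes negative: a cascade that escalates almost everything costs more than calling the verifier directly, which is why pair discovery needs some estimate of how often the drafter succeeds.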
Features:
🦜 Learn more: LangChain Integration Guide | TypeScript Package | Python Examples
Python Examples:
| Example | Description | Link |
|---|---|---|
| Basic Usage | Simple cascade setup with OpenAI models | View |
| Preset Usage | Use built-in presets for quick setup | View |
| Tool Execution | Function calling and tool usage | View |
| Streaming Text | Stream responses from cascade agents | View |
| Cost Tracking | Track and analyze costs across queries | View |
| Agentic Multi-Agent | Multi-turn tool loops & agent-as-a-tool delegation | View |
| Multi-Step Cascade | Multi-step agent loops with tool calls | View |
| Example | Description | Link |
|---|---|---|
| Budget Enforcement | Budget caps with stop actions in enforce mode | View |
| User Budget Tracking | Per-user budget enforcement and tracking | View |
| Guardrails | Safety and content guardrails | View |
| Rate Limiting | Rate limiting for cascades | View |
| User Profile Usage | User-specific routing and configurations | View |
| Stripe Integration | Billing integration with budget enforcement | View |
| Example | Description | Link |
|---|---|---|
| LangChain Harness | cascadeflow harness with LangChain callback handler | View |
| OpenAI Agents Harness | cascadeflow harness with OpenAI Agents SDK | View |
| CrewAI Harness | cascadeflow harness with CrewAI hooks | View |
| PydanticAI Harness | cascadeflow cascade Model with PydanticAI agents | View |
| Google ADK Harness | cascadeflow harness with Google ADK plugin | View |
| LangChain Basic | Simple LangChain cascade setup | View |
| LangChain LCEL Pipeline | LCEL chains with cascade routing | View |
| LangGraph Multi-Agent | LangGraph multi-agent orchestration | View |
| Example | Description | Link |
|---|---|---|
| Production Patterns | Best practices for production deployments | View |
| Multi-Provider | Mix multiple AI providers in one cascade | View |
| Reasoning Models | Use reasoning models (o1/o3, Claude Sonnet 4, DeepSeek-R1) | View |
| Streaming Tools | Stream tool calls and responses | View |
| Batch Processing | Process multiple queries efficiently | View |
| FastAPI Integration | Integrate cascades with FastAPI | View |
| Edge Device | Run cascades on edge devices with local models | View |
| vLLM Example | Use vLLM for local model deployment | View |
| Multi-Instance Ollama | Run draft/verifier on separate Ollama instances | View |
| Custom Cascade | Build custom cascade strategies | View |
| Custom Validation | Implement custom quality validators | View |
| Semantic Quality Detection | ML-based domain and quality detection | View |
| Cost Forecasting | Forecast costs and detect anomalies | View |
TypeScript Examples:
| Example | Description | Link |
|---|---|---|
| Basic Usage | Simple cascade setup (Node.js) | View |
| Tool Calling | Function calling with tools (Node.js) | View |
| Multi-Provider | Mix providers in TypeScript (Node.js) | View |
| Reasoning Models | Use reasoning models (o1/o3, Claude Sonnet 4, DeepSeek-R1) | View |
| Cost Tracking | Track and analyze costs across queries | View |
| Semantic Quality | ML-based semantic validation with embeddings | View |
| Streaming | Stream responses in TypeScript | View |
| Tool Execution | Tool execution engine and result handling | View |
| Streaming Tools | Stream tool calls with event detection | View |
| Agentic Multi-Agent | Multi-turn tool loops & multi-agent orchestration | View |
| Example | Description | Link |
|---|---|---|
| Production Patterns | Production best practices (Node.js) | View |
| Multi-Instance Ollama | Run draft/verifier on separate Ollama instances | View |
| Multi-Instance vLLM | Run draft/verifier on separate vLLM instances | View |
| Browser/Edge | Vercel Edge runtime example | View |
| LangChain Basic | Simple LangChain cascade setup | View |
| LangChain Cross-Provider | Haiku → GPT-5 with PreRouter | View |
| LangChain LangSmith | Cost tracking with LangSmith | View |
| LangChain Cost Tracking | Compare cascadeflow vs LangSmith cost tracking | View |
| LangGraph Multi-Agent | LangGraph multi-agent orchestration | View |
| LangChain Tool Risk Gating | Tool routing based on risk and complexity | View |
📂 View All Python Examples → | View All TypeScript Examples →
| Guide | Description | Link |
|---|---|---|
| Quickstart | Get started with cascadeflow in 5 minutes | Read |
| Providers Guide | Configure and use different AI providers | Read |
| Presets Guide | Using and creating custom presets | Read |
| Streaming Guide | Stream responses from cascade agents | Read |
| Tools Guide | Function calling and tool usage | Read |
| Cost Tracking | Track and analyze API costs | Read |
| Agentic Patterns | Tool loops, multi-agent, agent-as-a-tool delegation | Read |
| Agent Harness | Budget, compliance, KPI, and energy controls | Read |
| Rollout Guide | Plan your production rollout | Read |
| Guide | Description | Link |
|---|---|---|
| Production Guide | Best practices for production deployments | Read |
| Enterprise Networking | Proxy, TLS, and network configuration | Read |
| Customization | Custom cascade strategies and validators | Read |
| Observability | Telemetry, logging, and privacy controls | Read |
| LangChain Integration | Use cascadeflow with LangChain | Read |
| OpenAI Agents SDK | Use cascadeflow with OpenAI Agents | Read |
| CrewAI Integration | Use cascadeflow with CrewAI | Read |
| PydanticAI Integration | Cascade Model for PydanticAI agents | Read |
| Google ADK | Use cascadeflow with Google ADK | Read |
| Hermes Agent | Per-skill, complexity, and topic-aware subagent routing | Read |
| n8n Integration | Use cascadeflow in n8n workflows | Read |
| Vercel AI SDK | Middleware for Vercel AI SDK | Read |
| Feature | Benefit |
|---|---|
| 🎯 Speculative Cascading | Tries cheap models first, escalates intelligently |
| 💰 40-85% Cost Savings | Research-backed, proven in production |
| ⚡ 2-10x Faster | Small models respond in <50ms vs 500-2000ms |
| ⚡ Low Latency | Sub-2ms framework overhead, negligible performance impact |
| 🔄 Mix Any Providers | OpenAI, Anthropic, Groq, Ollama, vLLM, Together + LiteLLM (optional) + LangChain integration |
| 👤 User Profile System | Per-user budgets, tier-aware routing, enforcement callbacks |
| ✅ Quality Validation | Automatic checks + semantic similarity (optional ML, ~80MB, CPU) |
| 🎨 Cascading Policies | Domain-specific pipelines, multi-step validation strategies |
| 🧠 Domain Understanding | 15 domains auto-detected (code, medical, legal, finance, math, etc.), routes to specialists |
| 🤖 Drafter/Validator Pattern | 20-60% savings for agent/tool systems |
| 🔧 Tool Calling Support | Universal format, works across all providers |
| 📊 Cost Tracking | Built-in analytics + OpenTelemetry export (vendor-neutral) |
| 🚀 3-Line Integration | Zero architecture changes needed |
| 🔁 Agent Loops | Multi-turn tool execution with automatic tool call, result, re-prompt cycles |
| 🧭 Hermes Agent Routing | Per-skill, task-complexity, and topic-aware subagent routing with observe-mode rollout |
| 📋 Message & Tool Call Lists | Full conversation history with tool_calls and tool_call_id preservation across turns |
| 🪝 Hooks & Callbacks | Telemetry callbacks, cost events, and streaming hooks for observability |
| 🏭 Production Ready | Streaming, batch processing, tool handling, reasoning model support, caching, error recovery, anomaly detection |
| 💳 Budget Enforcement | Per-run and per-user budget caps with automatic stop actions when limits are exceeded |
| 🔒 Compliance Gating | GDPR, HIPAA, PCI, and strict model allowlists — block non-compliant models before execution |
| 📊 KPI-Weighted Routing | Inject business priorities (quality, cost, latency, energy) as weights into every model decision |
| 🌱 Energy Tracking | Deterministic compute-intensity coefficients for carbon-aware AI operations |
| 🔍 Decision Traces | Full per-step audit trail: action, reason, model, cost, budget state, enforcement status |
| ⚙️ Harness Modes | off / observe / enforce — roll out safely with observe, then switch to enforce when ready |
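KPI-weighted routing (the `kpi_weights` argument in the Tier 3 example) can be pictured as a weighted score over normalized metrics. The sketch below shows one plausible scoring scheme, using min-max normalization and hypothetical candidate numbers chosen for illustration; cascadeflow's actual scoring may differ, and further KPIs such as energy would be added as extra weighted terms in the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    quality: float     # expected quality score in [0, 1], higher is better
    cost: float        # USD per 1K tokens, lower is better
    latency_ms: float  # typical response time, lower is better


def normalize(values: list[float], higher_is_better: bool) -> list[float]:
    """Min-max scale to [0, 1] so that 1.0 always marks the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def pick_model(candidates: list[Candidate], weights: dict[str, float]) -> Candidate:
    """Return the candidate with the highest weighted KPI score."""
    quality = normalize([m.quality for m in candidates], higher_is_better=True)
    cost = normalize([m.cost for m in candidates], higher_is_better=False)
    latency = normalize([m.latency_ms for m in candidates], higher_is_better=False)
    scores = [
        weights.get("quality", 0.0) * q + weights.get("cost", 0.0) * c + weights.get("latency", 0.0) * t
        for q, c, t in zip(quality, cost, latency)
    ]
    return candidates[scores.index(max(scores))]


candidates = [
    Candidate("small-model", quality=0.80, cost=0.000375, latency_ms=300),
    Candidate("mid-model", quality=0.90, cost=0.002, latency_ms=600),
    Candidate("flagship-model", quality=0.95, cost=0.00625, latency_ms=1500),
]
print(pick_model(candidates, {"quality": 0.6, "cost": 0.3, "latency": 0.1}).name)
```

With the Tier 3 weights, the balanced mid-tier candidate wins (score ≈ 0.69 versus 0.6 for the flagship and 0.4 for the small model); shifting all weight to cost or to quality flips the choice to the small or flagship model.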
MIT License. See the LICENSE file for details.
Free for commercial use. Attribution appreciated but not required.
We ❤️ contributions!
📝 Contributing Guide - Python & TypeScript development setup
If you use cascadeflow in your research or project, please cite:
```bibtex
@software{cascadeflow2025,
  author = {Lemony Inc., Sascha Buehrle and Contributors},
  title = {cascadeflow: Agent runtime intelligence layer for AI agent workflows},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/lemony-ai/cascadeflow}
}
```
Ready to cut your AI costs by 40-85%?
```bash
pip install cascadeflow
npm install @cascadeflow/core
```
Read the Docs • View Python Examples • View TypeScript Examples • Join Discussions
Built with ❤️ by Lemony Inc. and the cascadeflow Community
One cascade. Hundreds of specialists.
New York | Zurich
⭐ Star us on GitHub if cascadeflow helps you save money!