Self-growing multi-agent system: Gemma Orchestrator + Qwen Executor + Claude Escalation + Skill Factory
The platform that makes local LLMs ship work.
Local models alone are weak. Wrap them in OpenTeddy and you get a real agent — hardened orchestration, a self-growing skills library, and just enough commercial-LLM escalation to finish what local can't.
🌐 Web: https://openteddy.net/ · 📦 Source: github.com/m31527/OpenTeddy
| Platform | Get it |
|---|---|
| 🍎 macOS desktop | OpenTeddy-1.0.2-aarch64.dmg (105 MB, Apple Silicon, signed + notarized) |
| 🐧 Linux desktop (NEW) | .AppImage / .deb for x86_64; see Releases (auto-built via GitHub Actions on every tag) |
| 🐧 Linux / WSL2 OSS | `curl -fsSL https://openteddy.net/install \| bash` |
| 🐳 Docker | see Docker Deployment below |
A 2B / 4B / 7B local model on its own is a toy. It hallucinates, it loops, it stops mid-task. The model isn't the product — the platform around it is. OpenTeddy is that platform:
- … `num_ctx`, `max_tokens`, and timeout per tier.

The result: your $0/token local hardware actually finishes the job, and the savings counter in the sidebar is what makes you stop worrying about Claude Pro auto-renewing.
If this resonates with you — or you just want to cheer the project on — please drop a ⭐ on the repo. It genuinely helps and keeps me motivated to ship more. → github.com/m31527/OpenTeddy
- … unhealthy containers, `ERROR 1045`, `command not found`), or "the task asked for a file but the model produced zero" all trigger cloud-LLM intervention automatically.
- … `.pdf` into chat, ask questions; the agent extracts text page-by-page (CJK supported) and can cite page numbers. Image-only PDFs are flagged honestly rather than fabricated.
- `doc_to_markdown` (v1.1.0): PowerPoint, Word, Excel, EPUB, images (EXIF + OCR), audio (EXIF + transcription), HTML, CSV/JSON/XML, ZIP archives, and YouTube URLs, all read through a single tool backed by Microsoft markitdown. PDFs are unchanged (pypdf is still canonical; it A/B tested better on resumes / forms).
- `cyber_skill_lookup` (v1.1.0): an indexed catalogue of cybersecurity workflows from Anthropic-Cybersecurity-Skills (754 entries, mapped to MITRE ATT&CK / NIST CSF / D3FEND / ATLAS / NIST AI RMF) plus last30days-skill for multi-platform trend research. The agent consults the catalogue first when a goal touches security / IR / forensics / trend analysis, so Nitter / Reddit JSON API workarounds replace the "browser_fetch → login wall → fail" dead end.
- … `DELETE` / `DROP` / `TRUNCATE` / `UPDATE`) is hard-blocked on every code path (defence in depth).
- … `agent-workspace/sessions/<id>/`; files from one session never bleed into another. Toggleable in Settings.
- … `num_ctx`, and pinned session-workspace context that prevents "model wanders to the wrong directory" drift.
- … `tail -f`, `journalctl -f`, `watch …`, and auto-adds `-d` to `docker compose up` so a runaway log stream can't hang an entire subtask.
- … `web_search` tool (Brave Search API) so the local model can ground answers in current data instead of hallucinating recent events / version numbers / today's prices.
- … `TESTING` to `ACTIVE` automatically; the count grows as the install matures.
- 📎 Files produced, with text artifacts inlined and binary artifacts sent as tap-to-download attachments.
- Auto-approves high-risk tools so you don't have to leave Telegram for routine work, but a hard denylist (`rm`, `rmdir`, `DROP TABLE`, `TRUNCATE TABLE`, `DELETE FROM`, `mkfs`, `dd if=…/of=/dev/…`, …) hard-blocks destructive actions regardless. Commands: `/start`, `/help`, `/cancel`, `/new`. See Remote Access.
- `./run.sh --host 0.0.0.0` + your tailnet means the dashboard works from your phone's browser too. Sessions / chat mode / artifact previews are all responsive; the mobile header collapses the session controls into a ⋯ kebab. No port-forwarding, no nginx, no public DNS: just install the Tailscale app on your phone and open `http://<your-machine>:8000`.
- … `desktop/`.
- … `csv_describe` + `python_exec` tools and an HTML report generator that renders charts with value labels.
- … `static/i18n.js`; build-hash check + per-commit cache-buster auto-reload when the dashboard is updated.

```
                 User Goal
                     │
                     ▼
┌─────────────────────────────────────────────────────┐
│ Orchestrator (Gemma via Ollama)                     │
│ • Decomposes goal into ordered SubTasks             │
│ • Streams plan tokens to the UI as it thinks        │
│ • Retrieves long-term memory for context            │
│ • Drives execution + escalation loop                │
└────────────────────┬────────────────────────────────┘
                     │ SubTasks
                     ▼
┌─────────────────────────────────────────────────────┐
│ Executor (Qwen via Ollama, function calling)        │
│ • Runs a matching Skill if available                │
│ • Uses tools: shell, file (incl. pdf_extract_text), │
│   http, db, gcp, package, csv_describe,             │
│   python_exec, generate_report, web_search          │
│ • Streams answer tokens; parallelises low-risk      │
│   tool calls; caps per-tool-name retries            │
│ • Compresses old turns when context fills up        │
│ • Reports confidence (clamped on hard failures)     │
└────────────────────┬────────────────────────────────┘
                     │ produced files
                     ▼
┌─────────────────────────────────────────────────────┐
│ Deliverable Verifier (LLM-as-judge, Qwen)           │
│ • Reads the produced HTML/MD/Py/etc.                │
│ • Verdict: PASS or FAIL; FAIL forces a retry        │
│ • Skipped when verification_enabled = false         │
└────────────────────┬────────────────────────────────┘
                     │ low conf / timeout / failure signal / unhealthy
                     ▼
┌─────────────────────────────────────────────────────┐
│ Escalation Agent (Claude via API)                   │
│ • Resolves hard subtasks with full diagnostics      │
│ • Synthesises the final summary                     │
└────────────────────┬────────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────────┐
│ Skill Factory (Claude via API)                      │
│ • Generates new Python skills on demand             │
│ • Promotes skills after N successes                 │
│ • Saves skills to disk + SQLite DB                  │
└─────────────────────────────────────────────────────┘
```
The agent loop has been progressively hardened to make small / mid-size local models (Gemma 3 4B / Qwen 2.5 3B class) reliable enough to ship work end-to-end, not just fast enough to look impressive on a single tool call:
| Mechanism | What it does |
|---|---|
| Adaptive prompts | Compact system prompts on small models; richer guidance only when context allows. |
| Parallel tool fan-out | Low-risk tool calls (file reads, shell ls, HTTP gets, csv_describe) inside a single round are dispatched with asyncio.gather instead of serially. |
| Per-step deliverable verification | After each successful subtask, an LLM-as-judge reviews the produced HTML/MD/code file. If it looks like a description of the goal rather than the actual deliverable (the "Snake Game report" failure pattern), the subtask is forced to retry with feedback. |
| Context watchdog | When the prompt size approaches num_ctx, the executor compresses earlier turns into a recap and pins discovery memos to the system prompt — keeping recent tool context intact instead of letting Ollama silently truncate. |
| Discovery memos | Useful one-off facts learned from tool calls (e.g. "the workspace already contains data.csv with columns X/Y/Z") are pinned to the system prompt so the model doesn't re-discover them every round. |
| Per-tool-name cap (tiered) | Read-only inspection tools (read_file, list_directory, db_query, db_describe_table, csv_describe, pdf_extract_text, web_search, shell_exec_readonly) get a cap of 10–15. State-mutating tools (write_file, python_exec, shell_exec_write, …) stay at 5 — they're the ones that actually loop. |
| Empty-artifact guard | When a code/analytic subtask description mentions building / creating / writing a concrete file but workspace got 0 new artifacts, confidence is clamped and the loop forces a retry (eventually escalates). Catches "small model wrote a confident summary without actually producing the deliverable" — the deliverable judge can't catch this case because it has no file to look at. |
| macOS PATH augmentation | Shell subprocesses get /opt/homebrew/bin, /usr/local/bin, ~/.cargo/bin, /Applications/Docker.app/Contents/Resources/bin, … appended to PATH. Tauri's minimal LaunchServices PATH would otherwise leave the agent with docker: command not found on a Mac that obviously has Docker installed. |
| Circuit breaker | After 5 cumulative tool failures the loop is forced to commit to a final answer instead of looping forever. |
| Common error hints | Twelve frequent stack-trace patterns (ModuleNotFoundError, KeyError, PermissionError, …) are matched against tool stderr and converted into one-line hints so the model corrects itself instead of repeating the same mistake. |
| WS reconnect + replay | The dashboard WebSocket carries a 600-event ring buffer keyed by sequence number — a refreshed tab or a wifi blip replays missed events on reconnect. |
| Pinned workspace context | Every executor round prepends WORKSPACE: <abs-path> to the user message, and shell-tool refusals embed the correct path — small models stop drifting to "their idea of the project root" and re-emitting working_dir=/home/.../OpenTeddy round after round. |
| Forever-command guard | _sanitize_command auto-adds -d to docker compose up, strips -f/--follow from docker logs / docker compose logs, and refuses tail -f, journalctl -f, watch … outright. Stops a container in a restart-crash loop from holding a subtask hostage forever. |
| Web-search grounding | Chat mode exposes the web_search tool (Brave Search) so the local model can ground answers in current data instead of hallucinating events / version numbers / prices past its training cutoff. |
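The forever-command guard can be sketched as a small pre-execution filter. This is a minimal illustration, not OpenTeddy's `_sanitize_command`: the function name `sanitize_command`, the `RefusedCommand` exception, and the exact matching rules are assumptions that only mirror the behaviour described in the table (refuse `tail -f` / `journalctl -f` / `watch`, strip follow flags from `docker logs`, add `-d` to `docker compose up`).

```python
import shlex

# Hypothetical sketch of a "forever-command guard"; not OpenTeddy's actual
# _sanitize_command. The rules only mirror the README table.
FOLLOW_FLAGS = {"-f", "--follow"}


class RefusedCommand(ValueError):
    """Raised for commands that would stream output forever."""


def sanitize_command(cmd: str) -> str:
    tokens = shlex.split(cmd)
    if not tokens:
        return cmd

    prog, args = tokens[0], tokens[1:]
    # `watch …` never exits on its own, so refuse it outright.
    if prog == "watch":
        raise RefusedCommand(f"refusing never-ending command: {cmd!r}")
    # Follow-mode tail / journalctl stream until killed.
    if prog in ("tail", "journalctl") and FOLLOW_FLAGS & set(args):
        raise RefusedCommand(f"refusing follow-mode command: {cmd!r}")

    if tokens[:2] == ["docker", "logs"] or tokens[:3] == ["docker", "compose", "logs"]:
        # Keep the log snapshot, drop the "follow forever" part.
        tokens = [t for t in tokens if t not in FOLLOW_FLAGS]
    elif tokens[:3] == ["docker", "compose", "up"] and not ({"-d", "--detach"} & set(tokens)):
        # Detach so a restart-crash loop can't hold the subtask hostage.
        tokens.insert(3, "-d")

    return shlex.join(tokens)
```

Token-level matching via `shlex` keeps quoted arguments intact. A production guard would also need to handle combined short flags such as `tail -fn 100`, which this sketch does not catch.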
```
OpenTeddy/
├── config.py            # Config via .env / environment variables
├── models.py            # Pydantic models + SQLite schema
├── tracker.py           # Async SQLite persistence (aiosqlite) + perf stats
├── skill_factory.py     # Claude-powered skill generation & loader
├── executor.py          # Qwen executor: function calling, streaming,
│                        # parallel low-risk tools, context watchdog,
│                        # discovery memos, per-tool cap, circuit breaker
├── escalation.py        # Claude escalation agent
├── orchestrator.py      # Gemma orchestrator (plan → execute → verify →
│                        # escalate) + per-step deliverable judge
├── memory.py            # ChromaDB long-term memory
├── approval_store.py    # Human-in-the-loop approval queue
├── settings_store.py    # Hot-reloadable settings (SQLite-backed)
├── tool_registry.py     # Tool registration + risk gating
├── tools/               # shell / file / http / db / gcp / package /
│                        # analytic (csv_describe, python_exec) /
│                        # report_tool (HTML + Chart.js datalabels)
├── skills/              # Auto-generated skill .py files
├── static/              # Web dashboard (index.html, i18n.js with 22 locales,
│                        # OpenTeddy-logo.svg)
├── desktop/             # Native macOS Tauri 2.x client (own repo)
├── main.py              # FastAPI server + CLI entry point + WS ring buffer
└── .env.example         # Environment variable template
```
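The per-tool-name cap and circuit breaker that `executor.py` implements can be approximated with a small budget object. This is a hedged sketch: `ToolBudget` is a hypothetical name, the read-only set is copied from the hardening table, and `READ_ONLY_CAP = 12` is an assumed value inside the documented 10–15 range; any tool outside the read-only set is treated as state-mutating with a cap of 5.

```python
from collections import Counter

# Hypothetical approximation of executor.py's tiered per-tool cap + circuit breaker.
READ_ONLY_TOOLS = frozenset({
    "read_file", "list_directory", "db_query", "db_describe_table",
    "csv_describe", "pdf_extract_text", "web_search", "shell_exec_readonly",
})
READ_ONLY_CAP = 12           # assumption: the README only says "10–15"
MUTATING_CAP = 5             # write_file, python_exec, shell_exec_write, …
CIRCUIT_BREAKER_FAILURES = 5


class ToolBudget:
    def __init__(self) -> None:
        self.calls: Counter = Counter()
        self.failures = 0

    @property
    def tripped(self) -> bool:
        # After 5 cumulative failures the loop must commit to a final answer.
        return self.failures >= CIRCUIT_BREAKER_FAILURES

    def cap_for(self, tool: str) -> int:
        return READ_ONLY_CAP if tool in READ_ONLY_TOOLS else MUTATING_CAP

    def allow(self, tool: str) -> bool:
        """True if another call to `tool` is permitted in this subtask."""
        return not self.tripped and self.calls[tool] < self.cap_for(tool)

    def record(self, tool: str, ok: bool) -> None:
        self.calls[tool] += 1
        if not ok:
            self.failures += 1
```

Keeping the counters per subtask (a fresh `ToolBudget` each time) means a looping `write_file` is stopped quickly while cheap inspection calls keep their larger allowance.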
```bash
curl -fsSL https://openteddy.net/install | bash
```
The installer:
- git clones OpenTeddy to `~/OpenTeddy`
- … `.venv` + `pip install -r requirements.txt`
- … `gemma4:e2b`, `qwen3.5:2b`) if Ollama is present

It's idempotent: re-run it any time to pull the latest source + refresh deps. All work is scoped to `$HOME/OpenTeddy`, with no sudo and no secondary scripts fetched.
Want to audit before running? `curl -fsSL <url> -o install.sh && less install.sh && bash install.sh` is encouraged. The script also accepts `--dry-run` to preview what it would do without changing anything.
After install:
```bash
cd ~/OpenTeddy
./run.sh --open   # boots uvicorn on :8000 + opens browser
```
./run.sh auto-activates .venv, pings the Ollama daemon, then runs uvicorn main:app --reload so editing source hot-reloads the backend. Common flags:
```bash
./run.sh                   # local-only on 127.0.0.1:8000 (the safe default)
./run.sh --open            # also opens http://localhost:8000 in your browser
./run.sh --port 8001       # bind a different port
./run.sh --host 0.0.0.0    # ⚠ expose to LAN / Tailscale / other machines
./run.sh --no-reload       # production-style: don't watch for file changes
./run.sh --help            # full flag list
```
⚠️ `--host 0.0.0.0` opens the agent to every machine that can reach the port. The agent has `shell_exec_write` / `delete_file` and other powerful tools. Only use `0.0.0.0` when you trust every device on that network: a private home LAN, a Tailscale tailnet, or a server behind a real firewall. For public servers, put it behind nginx / Caddy / Cloudflare Tunnel with auth. For "I want to use OpenTeddy from my phone", the recommended setup is `--host 0.0.0.0` + Tailscale; see Remote Access.
Customisation flags for the installer: --dir <path>, --force, --skip-models. See ./install.sh --help.
If you'd rather install by hand:
```bash
ollama pull gemma4:e2b
ollama pull qwen3.5:2b

git clone https://github.com/m31527/OpenTeddy.git
cd OpenTeddy
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt

cp .env.example .env
# Edit .env: at minimum set ANTHROPIC_API_KEY
uvicorn main:app --reload
# Dashboard → http://localhost:8000
# API docs  → http://localhost:8000/docs
```
If you're deploying OpenTeddy on a Linux server (NVIDIA DGX Spark, Jetson, Raspberry Pi 5, a rack box, anywhere without a permanent monitor), there are extra moving parts beyond uvicorn main:app:
- … `chrome_attached_tool` suite
- … `x_search('黴菌', top_n=10)` (a search for "mold") is meaningless if the underlying browser is signed out

Three bundled scripts under `scripts/` handle the entire flow:
| Script | When to run |
|---|---|
| `scripts/quickstart.sh --login` | First time only on a fresh node. Sets up everything below in one command. |
| `scripts/setup-edge-cdp.sh` | Called by quickstart. Installs Brave (ARM64) or Microsoft Edge (amd64) + writes the systemd unit. |
| `scripts/login-helper.sh` | Whenever a site's cookies expire (X ~30 days, LinkedIn ~90). Just opens a GUI browser for you to log in again. |
On a GUI-capable session (AnyDesk, physical monitor, ssh -X, GNOME/KDE):
```bash
git clone https://github.com/m31527/OpenTeddy
cd OpenTeddy
bash scripts/quickstart.sh --login
```
This will:
- `aarch64` / `arm64` → Brave Browser (official ARM64 apt repo, no sandbox issues)
- `x86_64` → Microsoft Edge (official amd64 apt repo)
- … `openteddy-cdp.service`) that keeps the browser running headless on `127.0.0.1:9222` with `Restart=always`. Survives reboots.
- … `pip install -r requirements.txt`.
- … `:8000` + healthcheck-verify it.

After this, `http://<host>:8000` is a fully working OpenTeddy. Ask in chat "Summarize the top 10 trending recent tweets on X that discuss mold" and the `x_search` tool will pull real tweets through your logged-in session.
You have two clean options:
1. `ssh -X admin@<host>`: Brave's window forwards to your local laptop's screen. This works as long as you have X11 on your laptop (Linux has it natively, macOS needs XQuartz).
2. `scripts/setup-novnc-login.sh`: installs Xvfb + noVNC so you can log in via a web browser:
```bash
sudo bash scripts/setup-novnc-login.sh
sudo systemctl start openteddy-novnc-login.service

# On your laptop:
ssh -L 6080:localhost:6080 admin@<host>
# Then open http://localhost:6080/vnc.html in any browser
# Log in via the Brave window that appears, close the tab, then (back on the server):
sudo systemctl stop openteddy-novnc-login.service
```
The noVNC service listens on `127.0.0.1` only by default; the SSH tunnel is the auth + encryption layer. To expose it more widely (Tailscale mesh, etc.), pass `OPENTEDDY_NOVNC_BIND=0.0.0.0` to the setup script; the install summary explains the security trade-offs.

You'll know cookies have expired when `x_search` starts returning empty `posts: []` or your scheduled trend-tracking task starts coming back blank. Just run:
```bash
bash scripts/login-helper.sh
```
It pauses the headless service, opens a GUI browser pointed at https://x.com/login, waits for you to log in and close the window, then automatically restarts the headless service. Takes about 2 minutes.
If you're running OpenTeddy on a Mac (Apple Silicon or Intel) and want
the same browser-scraping capability without dropping into Terminal every
time, the macOS counterpart of the above is `scripts/setup-mac-chrome.sh`
+ `scripts/login-mac-helper.sh`:

```bash
# First-time setup: installs a LaunchAgent that keeps Chrome (or Brave /
# Edge / Chromium if Chrome isn't installed) running headless on
# 127.0.0.1:9222 across reboots. Idempotent.
bash scripts/setup-mac-chrome.sh

# Whenever you need to log in to a site (X / Threads / LinkedIn / etc.):
bash scripts/login-mac-helper.sh
# → temporarily swaps in a headful Chrome window pointed at the same
#   profile; log in, close the window, and the LaunchAgent restarts the
#   headless instance automatically.

# Uninstall:
bash scripts/setup-mac-chrome.sh --uninstall
```
Key difference from Linux: the macOS setup uses a SEPARATE Chrome profile
(~/Library/Application Support/OpenTeddy/Chrome-CDP) instead of sharing
your day-to-day profile. That way OpenTeddy's scraping Chrome runs
alongside the Chrome window you have open for normal browsing — no
killing your existing tabs every time. Trade-off: you have to log in to
scraping sites once inside the OpenTeddy profile (via
login-mac-helper.sh), separately from your normal Chrome.
The same three scripts work as an Ansible playbook payload — each node gets identical setup, then each operator-trusted node gets its own login once. Pattern that we use on NVIDIA DGX Spark fleets:
```bash
# One-time per node (e.g. via Ansible / cloud-init)
ansible all -m shell -a "git clone https://github.com/m31527/OpenTeddy /opt/openteddy"
ansible all -m shell -a "bash /opt/openteddy/scripts/quickstart.sh"

# Then on each node, an operator logs in via AnyDesk / VNC + runs:
#   bash /opt/openteddy/scripts/login-helper.sh
```
Each node owns its own browser profile + cookies; no cross-node cookie sync needed. If you want fleet-wide cookie sharing (one login, all nodes inherit), drop a storage_state.json (Playwright format) at /var/lib/openteddy/storage_state.json — chrome_attached_tool auto-injects it on every attach. See scripts/capture-edge-state.md for the capture recipe.
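If you go the shared-cookie route, it helps to sanity-check the file before dropping it at `/var/lib/openteddy/storage_state.json`. The sketch below is assumption-laden and is not the recipe in `scripts/capture-edge-state.md`: `capture_storage_state` assumes Playwright is installed and that the first context of the CDP browser on `127.0.0.1:9222` holds the logged-in session, while `load_state` and `cookie_domains` are hypothetical helpers that only check the standard Playwright shape (`cookies` and `origins` lists).

```python
from __future__ import annotations

import json
from pathlib import Path


def capture_storage_state(out: str, cdp_url: str = "http://127.0.0.1:9222") -> None:
    """Dump cookies + localStorage from an already-running CDP browser."""
    # Imported lazily so the validator below works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_url)
        # Assumption: the default (first) context is the logged-in profile.
        browser.contexts[0].storage_state(path=out)


def cookie_domains(state: dict) -> set[str]:
    """Check the Playwright storage-state shape and list the cookie domains."""
    if not isinstance(state.get("cookies"), list) or not isinstance(state.get("origins"), list):
        raise ValueError("not a Playwright storage_state: need 'cookies' and 'origins' lists")
    domains = set()
    for cookie in state["cookies"]:
        missing = {"name", "value", "domain"} - set(cookie)
        if missing:
            raise ValueError(f"cookie is missing {sorted(missing)}")
        domains.add(cookie["domain"].lstrip("."))
    return domains


def load_state(path: str | Path) -> dict:
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Running `cookie_domains(load_state(...))` on the file before distributing it is a cheap way to confirm that the sites you expect (for example `x.com`) actually have cookies in it.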
Two complementary ways to reach your OpenTeddy instance away from the machine it runs on. Both work against the same server and the same sessions — you can start a goal in Telegram on the train and finish reading the artifact output in the desktop web UI when you're back home.
The simplest "let me check on the agent from anywhere" setup. Zero port-forwarding, no DNS, no nginx.
On the server (the machine running OpenTeddy):
```bash
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
./run.sh --host 0.0.0.0
```
--host 0.0.0.0 makes uvicorn bind to all interfaces (including the
tailnet one). Tailscale itself is what restricts who can actually reach
the port — only devices on your tailnet.
On your phone: install the Tailscale app from the App Store / Play Store, sign in with the same account, turn on the VPN toggle.
Open the browser and hit http://<machine-name>:8000 (or the
tailnet IP from tailscale status). The web UI loads with the same
sessions, the same artifact chips, the same WebSocket live stream. On
a phone-width screen the session header collapses memory / privacy /
export controls into a ⋯ kebab and the mode switcher reflows.
⚠️ Why Tailscale instead of just `--host 0.0.0.0` on the LAN? Plain LAN exposure means anyone on your WiFi (including guest devices) can drive your agent, and the agent has `shell_exec_write` and friends. Tailscale ACLs let you keep the port reachable from only the devices you explicitly approve. If you do want plain LAN, only do it on a private home network you fully control.
Send a goal from anywhere, get the result pushed back to the same chat. Live progress updates edit a single message in place — no spam. Built for self-hosted servers that stay running 24/7 (Mac mini, NUC, home Linux box) — long-polling stops when the server stops, so this is a worse fit for the desktop app that you close and reopen all day.
Open Telegram, talk to @BotFather, send /newbot, follow the prompts.
Save the bot token (looks like 123456:ABC-DEF1234...).
Send any message to @userinfobot. It replies with your numeric id
(e.g. 987654321). For a group: send a message in the group, then
forward it to @userinfobot — it shows the group's chat-ID (negative
number, e.g. -1001234567890).
Search for your bot's @username in Telegram and tap Start (or send
/start). This is the single most-missed step — without it, Telegram's
"bot can't message you out of the blue" rule kicks in and every
outbound test pings back chat not found.
In Settings → Notification Credentials:
| Field | Value |
|---|---|
| Bot Token | the token from BotFather |
| Default Chat ID | your numeric chat-ID (lets the telegram_send tool work) |
| Test ping button | click — expect ✓ + a "🐻 OpenTeddy ping" message in Telegram |
| Enable inbound polling | ✅ check |
| Chat-ID whitelist | your chat-ID(s), comma-separated |
Save. Restart the server (hot-reload of the toggle is on the
backlog — for now ./run.sh must be restarted to start the polling
loop). On boot you should see in the log:
[INFO] telegram_bridge: Telegram inbound bridge started — polling with 1-id whitelist.
If you instead see Telegram inbound bridge NOT started: … the message
spells out exactly which field needs another look.
| You send | What happens |
|---|---|
| any text | run as a goal in this chat's bound session; reply with the result |
| `/start` | confirm you're connected |
| `/help` | command list |
| `/cancel` | abort the currently-running task |
| `/new` | start a fresh session (old one stays in history) |
The agent's reply includes:

- `✅ Completed · 12.4s · 3 subtasks`
- `📎 Files produced`: every artifact (incl. shell-redirect outputs caught by the post-subtask workspace scanner) with size + emitting tool
- … `sendDocument` so you can tap-to-download from inside the chat.
- … `shell_exec_write`, `python_exec`, `file_write` and friends run without web-UI approval prompts that nobody would see.
- … `rm` / `rmdir` / `unlink` / `git rm` / `shred`, SQL `DROP TABLE` / `DROP DATABASE` / `TRUNCATE TABLE` / `DELETE FROM`, system-level `mkfs` / `dd if=…/of=/dev/…` / `> /dev/sd[a-z]` / `fdisk` / `format X:` / recursive `chmod 0…`, plus any tool whose name matches `*delete*` / `*remove*` / `*drop_table*` / `*truncate*` / `*wipe*` / `*purge*`. The agent's reply explains the block and points at the web UI for interactive approval.
- … `asyncio.wait_for(timeout=600)` so a hung Ollama call or tool deadlock can't freeze the chat: the bot replies `⌛ Task ran longer than 10 min and was force-cancelled` instead of going silent.

```bash
curl -s http://<server>:8000/admin/telegram/status | jq
```
Returns the bridge's runtime state — running, inbound_enabled,
token_set, the parsed whitelist, the most recent silently-dropped
chat_id (the single fastest answer to "why isn't my bot replying?"),
and any in-flight chats. Token is never returned, only a boolean flag.
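The hard denylist that keeps Telegram's auto-approval safe boils down to two checks: tool-name globs and command-text patterns. The sketch below is a hypothetical approximation assembled from the lists above; `is_blocked`, the regexes, and their exact coverage are assumptions, and OpenTeddy's real matcher is not shown in this README.

```python
import fnmatch
import re

# Hypothetical denylist modelled on the README's Telegram section.
DENY_TOOL_GLOBS = ("*delete*", "*remove*", "*drop_table*", "*truncate*", "*wipe*", "*purge*")

DENY_COMMAND_PATTERNS = [
    # rm / rmdir / unlink / shred as a command word (this also catches
    # `git rm`), but not flags such as `docker run --rm`.
    re.compile(r"(?:^|[\s;&|(])(?:rm|rmdir|unlink|shred)(?:\s|$)"),
    re.compile(r"\b(?:DROP\s+(?:TABLE|DATABASE)|TRUNCATE\s+TABLE|DELETE\s+FROM)\b", re.IGNORECASE),
    re.compile(r"\bmkfs(?:\.\w+)?\b"),
    re.compile(r"\bdd\b.*\bof=/dev/"),
    re.compile(r">\s*/dev/sd[a-z]"),
    re.compile(r"\bfdisk\b"),
    re.compile(r"\bformat\s+[A-Za-z]:"),
    re.compile(r"\bchmod\s+-R\s+0"),
]


def is_blocked(tool_name: str, command: str = "") -> bool:
    """True if the call must be refused even when auto-approval is on."""
    if any(fnmatch.fnmatch(tool_name.lower(), glob) for glob in DENY_TOOL_GLOBS):
        return True
    return any(pattern.search(command) for pattern in DENY_COMMAND_PATTERNS)
```

Checking the tool name first means a destructive tool is refused no matter what arguments it carries; the command patterns then cover generic tools like `shell_exec_write` and `db_query`.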
Turn several OpenTeddy installs into one cluster: a central node dispatches goals to worker nodes (each runs the goal on its own model + tools + data), and workers proactively push alerts when their watcher loop spots an anomaly. Built for 5–10 node NVIDIA DGX Spark fleets but works on any networked installs.
Off by default — zero impact on single-machine installs. Nothing in
fleet/ is imported unless OPENTEDDY_FLEET_ROLE is set. Desktop /
personal users never touch any of this; no .env fleet keys are
required.
```bash
bash scripts/fleet-demo.sh   # in-process central+worker → 🎉 PASSED
```

… `.env` per role

Step 1: one shared token, same value on every node:
```bash
openssl rand -hex 32   # copy the output
```
Step 2 — the central node (exactly one). Append to its .env:
```bash
OPENTEDDY_FLEET_TOKEN=<the token from step 1>
OPENTEDDY_FLEET_ROLE=orchestrator
OPENTEDDY_FLEET_PORT=8770
```
Step 3 — each worker node. Append to its .env:
```bash
OPENTEDDY_FLEET_TOKEN=<the SAME token>
OPENTEDDY_FLEET_ROLE=worker
OPENTEDDY_FLEET_CENTRAL=ws://<central-host-or-ip>:8770
OPENTEDDY_FLEET_NODE_ID=dgx-02       # a name for this node
OPENTEDDY_FLEET_NODE_ROLE=finance    # its job: finance / secops / …
# optional: proactive monitoring
OPENTEDDY_FLEET_WATCH_ENABLED=1
OPENTEDDY_FLEET_WATCH_PROMPT=Check /data/finance for unusually large payments in the last hour
```
Ready-made templates with full comments:
cat fleet/env.orchestrator.example >> .env (central) /
cat fleet/env.worker.example >> .env (worker).
Step 4 — restart each node, then test from the central:
```bash
# list connected nodes
curl -s http://localhost:8000/fleet/nodes | python3 -m json.tool

# dispatch a goal (auto-picks an idle worker)
curl -s -X POST http://localhost:8000/fleet/dispatch \
  -H 'Content-Type: application/json' \
  -d '{"node_id":"auto","goal":"Summarize the top 5 GitHub trending repos today","mode":"code"}' \
  | python3 -m json.tool

# proactive alerts pushed by the workers' watchers
curl -s http://localhost:8000/fleet/alerts | python3 -m json.tool
```
Or just open the web console at http://<central>:8000/fleet —
three tabs: Workers (live status), Playground (type a goal → it
auto-picks an idle worker), Alerts (proactive anomaly reports).
Full guide: fleet/README.md ·
design: docs/fleet-architecture.md.
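The dispatch call can also be made from Python with only the standard library. A minimal sketch: the helper names are hypothetical, the JSON body mirrors the `curl` dispatch example, the response is returned as parsed JSON without assuming its fields, and the 60-second default timeout is an arbitrary assumption.

```python
import json
import urllib.request


def build_dispatch_request(central: str, goal: str, node_id: str = "auto",
                           mode: str = "code") -> urllib.request.Request:
    """Build the POST /fleet/dispatch request (same body as the curl example)."""
    body = json.dumps({"node_id": node_id, "goal": goal, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{central.rstrip('/')}/fleet/dispatch",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def dispatch(central: str, goal: str, timeout: float = 60.0, **kwargs) -> object:
    """Send the goal to the central node and return the decoded JSON reply."""
    request = build_dispatch_request(central, goal, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `dispatch("http://localhost:8000", "Summarize the top 5 GitHub trending repos today")` targets an idle worker, and `node_id="dgx-02"` would pin the goal to a specific node.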
By default OpenTeddy's local executor runs on Ollama (cross-platform, zero setup). On a Linux+CUDA fleet node serving concurrent load (multiple operators + watcher loops hitting one node), vLLM can serve those requests together via continuous batching. On single-stream use it does not beat Ollama — both are memory-bandwidth-bound on DGX-class hardware — so vLLM only earns its keep under concurrency.
macOS / non-CUDA: skip this. vLLM is Linux + NVIDIA only; OpenTeddy hard-gates Darwin to Ollama, and the Settings toggle is disabled there.
One script sets it up in a dedicated venv (never touches OpenTeddy's own — vLLM pins conflicting deps that would otherwise break chromadb) and installs the JIT build toolchain (ninja / python3-dev) that vLLM needs:
```bash
df -h /                      # need ~25 GB free for a 7B model + venv
sudo systemctl stop ollama   # free GPU memory for the test
sudo bash scripts/setup-vllm.sh --model Qwen/Qwen2.5-7B-Instruct --gpu-mem 0.5 --enforce-eager

# verify OpenTeddy ↔ vLLM (uses OpenTeddy's .venv; it only HTTP-calls vLLM)
OPENTEDDY_LOCAL_ENGINE=vllm VLLM_BASE_URL=http://127.0.0.1:8001 \
  QWEN_MODEL=Qwen/Qwen2.5-7B-Instruct .venv/bin/python scripts/verify-vllm.py
# → "ALL PASS" means it works
```
Then switch to it via Settings → Model Settings → Local Inference
Engine → vLLM, or set OPENTEDDY_LOCAL_ENGINE=vllm +
VLLM_BASE_URL=http://127.0.0.1:8001 in .env.
For coexistence with Ollama (the real fleet config — planner on
Ollama, executor on vLLM) run vLLM at --gpu-mem 0.35 so both fit in
unified memory, and don't stop Ollama.
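How the engine setting and the Darwin hard-gate interact can be summarised as a small resolver. This is a hedged sketch, not OpenTeddy's code: `resolve_local_engine` is hypothetical, the `http://127.0.0.1:8001` fallback is borrowed from the setup examples, and the Ollama fallback reuses the documented `QWEN_BASE_URL` default.

```python
from __future__ import annotations

import os
import platform
from typing import Mapping


def resolve_local_engine(env: Mapping[str, str] | None = None,
                         system: str | None = None) -> tuple[str, str]:
    """Return (engine, base_url) for the local executor."""
    env = os.environ if env is None else env
    system = platform.system() if system is None else system
    requested = env.get("OPENTEDDY_LOCAL_ENGINE", "ollama").strip().lower()
    # vLLM is Linux + NVIDIA only; macOS is always forced back to Ollama.
    # (This sketch does not probe for CUDA on Linux.)
    if requested == "vllm" and system != "Darwin":
        return "vllm", env.get("VLLM_BASE_URL", "http://127.0.0.1:8001")
    return "ollama", env.get("QWEN_BASE_URL", "http://localhost:11434")
```

Injecting `env` and `system` as parameters keeps the gate testable without touching the real environment.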
Hit a wall during setup? Every gotcha we tripped over (dedicated-venv
isolation, missing ninja / Python.h, OOM vs Ollama, systemd restart loops
hiding the real error, --enforce-eager for fast startup) is documented
with fixes in docs/vllm-deployment.md.
| Method | Endpoint | Description |
|---|---|---|
| POST | `/run` | Submit a task |
| GET | `/tasks/{id}` | Check task status |
| GET | `/tasks` | List recent tasks (filter by `session_id`) |
| GET | `/skills` | List all skills |
| POST | `/skills/generate?name=…&description=…` | Manually create a skill |
| GET | `/tools` | List available tools |
| GET | `/approvals` | Pending human approvals |
| POST | `/approvals/{id}/approve` or `/reject` | Resolve an approval |
| GET | `/memory` | Browse long-term memory |
| GET | `/usage`, `/usage/summary` | Token usage & estimated cost |
| GET | `/benchmark/stats` | Per-model token-throughput stats (#6) |
| GET / POST | `/settings` | Read/update runtime settings |
| GET | `/settings/ollama/models` or `/status` | Local model management |
| POST | `/settings/ollama/pull` | Pull a model (streamed progress) |
| GET | `/version` | Build hash + version (used by UI auto-reload) |
| GET | `/update/check` | Check GitHub Releases for a newer version |
| POST | `/update/apply` | Apply an available update |
| POST | `/optimize_prompt` | Rewrite a draft goal via Claude |
| GET | `/admin/diagnostics` | Download a zipped diagnostic bundle |
| GET | `/admin/telegram/status` | Inbound bridge runtime snapshot (running, whitelist, last-dropped `chat_id`, in-flight chats). Safe to expose: never returns the bot token. |
| POST | `/settings/telegram/test` | One-shot "OpenTeddy is connected" message to the default chat; friendly-error remapping translates Telegram's terse codes into 30-second fix steps. |
| GET | `/sessions/{id}/export` | Download a single-JSON dump of the session (metadata + tasks + subtasks + memory + DB connection with password masked). Used by the chat header's ⋯ kebab → 📥 Export. |
| GET | `/health` | Health check |
| WS | `/ws?since=N` | Live event stream; `since` replays the ring buffer from sequence N |
```bash
curl -X POST http://localhost:8000/run \
  -H 'Content-Type: application/json' \
  -d '{"goal": "Summarise the key benefits of async Python", "priority": 7}'
```
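The `/ws?since=N` replay relies on a sequence-numbered ring buffer (600 events, per the hardening table). A minimal sketch of that data structure, assuming `since` is inclusive and that a reconnecting client should be told when events it asked for were already evicted; `EventRing` and the gap flag are hypothetical, not OpenTeddy's implementation.

```python
from __future__ import annotations

from collections import deque
from typing import Any


class EventRing:
    """Fixed-size event log keyed by a monotonically increasing sequence number."""

    def __init__(self, capacity: int = 600) -> None:
        self._buf: deque[tuple[int, Any]] = deque(maxlen=capacity)
        self._next_seq = 0

    def publish(self, event: Any) -> int:
        seq = self._next_seq
        self._next_seq += 1
        self._buf.append((seq, event))  # the oldest entry is dropped when full
        return seq

    def replay(self, since: int) -> tuple[list[tuple[int, Any]], bool]:
        """Events with seq >= since, plus True if some were already evicted."""
        events = [(seq, ev) for seq, ev in self._buf if seq >= since]
        gap = bool(self._buf) and since < self._buf[0][0]
        return events, gap
```

A client that reconnects after a Wi-Fi blip sends the last sequence it saw plus one; if `gap` comes back true, the only safe recovery is a full refresh of the view.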
OpenTeddy tries to keep every task local. Claude is called only when the local path breaks down:
| Trigger | Where | Default |
|---|---|---|
| Subtask timeout (local model hangs) | orchestrator._run_subtask | 120 s |
| Low self-reported confidence | executor._qwen_execute | < 0.6 |
| Repeated failures in a row | orchestrator._run_subtask | 3 |
| Hard-failure signal in tool output (`unhealthy`, `Exited`, `ERROR 1045`, `Error response from daemon`, …) | `executor._finalize_response` | confidence clamped to 0.3 → escalates |
| Container health check fails after a Docker task | `orchestrator._inspect_docker_health` | auto-pulls `docker logs` + `inspect`, then escalates |
| Deliverable verifier returns FAIL | `orchestrator._verify_deliverable` | confidence clamped to 0.3 → retry, then escalate |
| Circuit breaker tripped (5 cumulative tool failures) | executor._qwen_execute | forces final-answer commit; escalation kicks in if confidence is still low |
This keeps cost low for everyday work while still guaranteeing you get a real answer when the local model cannot deliver one. Two ways to opt out:
Set OPENTEDDY_LLM_MODE=local in .env if you want the local-only choice baked in across restarts.
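The escalation triggers in the table combine into a single decision. The sketch below is a simplified, hypothetical model: `SubtaskOutcome` and `should_escalate` are illustrative names, the marker list is the subset named in the table, and the verifier's "retry, then escalate" path is collapsed into the confidence clamp.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical model of the escalation table; thresholds are the documented defaults.
HARD_FAILURE_MARKERS = ("unhealthy", "Exited", "ERROR 1045", "Error response from daemon")
CLAMPED_CONFIDENCE = 0.3


@dataclass
class SubtaskOutcome:
    confidence: float
    consecutive_failures: int = 0
    timed_out: bool = False
    tool_output: str = ""
    verifier_verdict: str | None = None  # "PASS", "FAIL", or None if disabled


def effective_confidence(outcome: SubtaskOutcome) -> float:
    """Self-reported confidence, clamped when there is hard-failure evidence."""
    conf = outcome.confidence
    hard_failure = any(m in outcome.tool_output for m in HARD_FAILURE_MARKERS)
    if hard_failure or outcome.verifier_verdict == "FAIL":
        conf = min(conf, CLAMPED_CONFIDENCE)
    return conf


def should_escalate(outcome: SubtaskOutcome, mode: str = "mixed",
                    threshold: float = 0.6, failure_limit: int = 3) -> bool:
    # `local` never calls the cloud; `cloud` already runs every subtask there.
    if mode != "mixed":
        return False
    return (
        outcome.timed_out
        or outcome.consecutive_failures >= failure_limit
        or effective_confidence(outcome) < threshold
    )
```

Clamping rather than trusting the model's own number is the point: a small model that says "0.95, done!" over an `unhealthy` container still ends up below the threshold.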
OpenTeddy learns by spotting patterns in your usage, not by asking the local model to introspect about itself. The mechanism:
- … `task_result` memories whose embedding is semantically close to the just-finished goal. When ≥ `SKILL_AUTO_DETECT_MIN_REPEATS` (default 3) past goals score ≥ `SKILL_AUTO_DETECT_SIMILARITY` (default 0.75), that's a recurring pattern.
- … `{name, description}` capturing the reusable function.
- `SkillFactory.generate_skill(name, description)` asks Claude (or whichever cloud LLM is configured) to write the `async def run(input_data: dict) -> str` function and saves it to `skills/<name>.py`.
- … `TESTING`. After `SKILL_PROMOTION_THRESHOLD` (default 5) successful invocations it's promoted to `ACTIVE`.
- … `ACTIVE` skill at ≥ `SKILL_MATCH_THRESHOLD` (default 0.4) invoke it directly, skipping the LLM tool-call round entirely.

The original mechanism asked the executor LLM to set
skill_needed/skill_description in its JSON output. Empirically, 2-3B
parameter models almost never produce that kind of metacognitive
self-flag — verified on a real install with > 100 tasks and zero
auto-generated skills. The embedding approach moves the "is this
recurring" judgment from the model's introspection (unreliable) to a
deterministic similarity check against memory (reliable).
Tunable knobs live in Settings → Parameter Settings, or via the
SKILL_AUTO_DETECT_* env vars in .env. Set min_repeats=0 to
disable auto-detection entirely (skills can still be created manually
via POST /skills/generate?name=…&description=…).
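The detection rule reduces to counting sufficiently similar past goals. A minimal sketch under stated assumptions: embeddings are plain float lists compared with cosine similarity, whereas the real code queries `task_result` memories in ChromaDB (which returns distances, so a conversion step would be needed there); `is_recurring` is a hypothetical name.

```python
from __future__ import annotations

import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_recurring(goal_vec: Sequence[float], past_vecs: Sequence[Sequence[float]],
                 min_repeats: int = 3, similarity: float = 0.75) -> bool:
    """True when enough past goals are close to the just-finished goal."""
    if min_repeats <= 0:  # min_repeats=0 disables auto-detection
        return False
    hits = sum(1 for vec in past_vecs if cosine(goal_vec, vec) >= similarity)
    return hits >= min_repeats
```

Because the decision is a deterministic count, it behaves the same on a 2B model as on a 70B one, which is exactly why it replaced the self-flagging approach.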
OpenTeddy is a one-person side project. Hosting, cloud-LLM API testing, the macOS signing + notarisation pipeline, and the time it takes to keep shipping new tools all cost real money and weekends each month. If this project saves you time, please consider chipping in:
No pricing tiers, no feature gates, no "premium". Everything in this repo stays MIT-licensed and free forever — the coffee just buys me a few extra hours to keep adding tools, fixing planner edge cases, and writing the docs nobody else will write.
Most of these can also be edited live from the dashboard's Settings panel —
changes are persisted to SQLite and config.reload_from_store() re-applies them
without a server restart.
| Variable | Default | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | (unset) | Required only if escalation is enabled. Anthropic API key. |
| `CLAUDE_MODEL` | `claude-opus-4-6` | Claude model for escalation. |
| `GEMMA_BASE_URL` | `http://localhost:11434` | Ollama base URL for the orchestrator. |
| `GEMMA_MODEL` | `gemma4:e2b` | Orchestrator model tag. |
| `QWEN_BASE_URL` | `http://localhost:11434` | Ollama base URL for the executor. |
| `QWEN_MODEL` | `qwen3.5:2b` | Executor model tag. |
| `BRAVE_SEARCH_API_KEY` | (unset) | Optional. Powers the Chat-mode `web_search` tool. The free tier covers 2,000 queries/month at api-dashboard.search.brave.com. Without it, the local model answers from training data and warns the user about staleness. |
| `DB_PATH` | `openteddy.db` | SQLite database path. |
| `MEMORY_DB_PATH` | `./memory_db` | ChromaDB directory. |
| `SKILLS_DIR` | `skills` | Directory for skill files. |
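For orientation, Brave's Web Search API, which `BRAVE_SEARCH_API_KEY` unlocks, is a single authenticated GET. The sketch below builds such a request with the standard library. The endpoint and `X-Subscription-Token` header follow Brave's public API documentation; `build_brave_request` and `web_search` are hypothetical names, not OpenTeddy's implementation.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"


def build_brave_request(query: str, api_key: str, count: int = 5) -> urllib.request.Request:
    """Build (but do not send) a Brave Web Search request."""
    params = urllib.parse.urlencode({"q": query, "count": count})
    return urllib.request.Request(
        f"{BRAVE_ENDPOINT}?{params}",
        headers={"Accept": "application/json", "X-Subscription-Token": api_key},
    )


def web_search(query: str, api_key: str) -> list[dict]:
    """Send the request and keep only title + URL; needs a key and network access."""
    with urllib.request.urlopen(build_brave_request(query, api_key), timeout=10) as resp:
        data = json.load(resp)
    return [
        {"title": r.get("title"), "url": r.get("url")}
        for r in data.get("web", {}).get("results", [])
    ]
```

Splitting request construction from sending keeps the key handling testable offline.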
| Variable | Default | Description |
|---|---|---|
| `OPENTEDDY_LLM_MODE` | `mixed` | One of `local` / `mixed` / `cloud`. `local`: Gemma plans, Qwen executes, never call cloud. `mixed`: local-first with a cloud safety net on failure. `cloud`: every subtask is handled directly by the configured cloud LLM (Ollama not required). The legacy `ESCALATION_ENABLED` below is auto-derived from this. |
| `LLM_PROVIDER` | `anthropic` | Which cloud LLM provider escalation and Cloud mode route to. One of `anthropic` / `openrouter` / `openai` / `gemini` / `deepseek`. |
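The derivation rule for `ESCALATION_ENABLED` can be expressed in a few lines. This sketch reads an `os.environ`-style mapping; the function name, the accepted truthy strings, and the assumption that an explicitly set `ESCALATION_ENABLED` overrides the derived value are illustrative choices, not confirmed details of OpenTeddy's config loader.

```python
from __future__ import annotations

import os
from typing import Mapping

VALID_MODES = {"local", "mixed", "cloud"}
VALID_PROVIDERS = {"anthropic", "openrouter", "openai", "gemini", "deepseek"}
TRUTHY = {"1", "true", "yes", "on"}


def resolve_llm_settings(env: Mapping[str, str] = os.environ) -> dict[str, object]:
    """Derive routing settings from OPENTEDDY_LLM_MODE / LLM_PROVIDER / ESCALATION_ENABLED."""
    mode = env.get("OPENTEDDY_LLM_MODE", "mixed").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"OPENTEDDY_LLM_MODE must be one of {sorted(VALID_MODES)}, got {mode!r}")
    provider = env.get("LLM_PROVIDER", "anthropic").strip().lower()
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"LLM_PROVIDER must be one of {sorted(VALID_PROVIDERS)}, got {provider!r}")

    # Derived default: local never escalates, mixed and cloud do.
    escalation = mode != "local"
    # Assumed precedence: an explicit legacy ESCALATION_ENABLED overrides the derived value.
    explicit = env.get("ESCALATION_ENABLED")
    if explicit is not None:
        escalation = explicit.strip().lower() in TRUTHY
    return {"mode": mode, "provider": provider, "escalation_enabled": escalation}
```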
| Variable | Default | Description |
|---|---|---|
| `ESCALATION_ENABLED` | `true` | Legacy kill-switch, now auto-derived from `OPENTEDDY_LLM_MODE` (`local` → `False`, `mixed` / `cloud` → `True`). Setting it directly still works for backward compatibility. |
| `ESCALATION_THRESHOLD` | `0.6` | Minimum Qwen confidence before escalation. |
| `ESCALATION_FAILURE_LIMIT` | `3` | Maximum consecutive failures before escalation. |
| `SUBTASK_TIMEOUT` | `900` | Wall-clock seconds before a subtask is treated as hung. Real hang detection is done by `SHELL_SILENCE_TIMEOUT`; this is just the ceiling. |
| `SHELL_SILENCE_TIMEOUT` | `180` | Kill a shell command after this many seconds without stdout/stderr output. Long-but-active commands (`docker build`, `pip install`) stay alive as long as they keep printing progress. |
| `SKILL_PROMOTION_THRESHOLD` | `5` | Successes needed to promote a `TESTING` skill to `ACTIVE`. |
| `SKILL_AUTO_DETECT_MIN_REPEATS` | `3` | Minimum number of past goals that must match the current goal (above the similarity floor) before OpenTeddy synthesises a new skill. Set to `0` to disable auto-detection entirely. |
| `SKILL_AUTO_DETECT_SIMILARITY` | `0.75` | Cosine-similarity floor (0.0–1.0) for counting a past task as "recurring". Calibrated against ChromaDB's default MiniLM embedder; raise it to ~0.9 if unwanted skills are being generated, or lower it to ~0.65 if expected patterns aren't being caught. |
| `SKILL_MATCH_THRESHOLD` | `0.4` | Minimum similarity for a new goal to match an existing `ACTIVE` skill; a match skips the LLM round entirely. |
| `APPROVAL_AUTO_APPROVE_AFTER` | `0` | Seconds after which a high-risk approval auto-resolves to approved. `0` = off (the safer default: wait for an explicit click). |
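The split between `SHELL_SILENCE_TIMEOUT` and `SUBTASK_TIMEOUT` is easier to see in code. Below is a minimal asyncio sketch, assuming stderr is merged into stdout and that any output counts as progress; `run_with_silence_timeout` is a hypothetical helper, not OpenTeddy's shell tool.

```python
from __future__ import annotations

import asyncio
import time


async def run_with_silence_timeout(
    cmd: list[str], silence: float = 180.0, ceiling: float = 900.0
) -> tuple[int, str, str]:
    """Run `cmd`, killing it after `silence` s without output or `ceiling` s in total.

    Returns (returncode, combined_output, reason); reason is "exited", "silent" or "ceiling".
    """
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # output on either stream resets the silence timer
    )
    deadline = time.monotonic() + ceiling
    chunks: list[bytes] = []
    reason = "exited"
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            reason = "ceiling"
            break
        try:
            # read() returns as soon as any bytes arrive, so progress output keeps the command alive.
            chunk = await asyncio.wait_for(proc.stdout.read(4096), timeout=min(silence, remaining))
        except asyncio.TimeoutError:
            reason = "silent" if silence <= remaining else "ceiling"
            break
        if not chunk:  # EOF: the process closed its output
            break
        chunks.append(chunk)
    if reason != "exited":
        try:
            proc.kill()
        except ProcessLookupError:
            pass  # it exited on its own in the meantime
    returncode = await proc.wait()
    return returncode, b"".join(chunks).decode(errors="replace"), reason
```

With the README defaults you would pass `silence=180, ceiling=900`: a build that prints a line every few seconds runs to completion, while a command that goes quiet for three minutes is killed.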
These knobs matter mainly on big models: turn them off to trade safety nets for speed.
| Variable | Default | Description |
|---|---|---|
| `STREAMING_ENABLED` | `true` | Stream LLM tokens to the chat as they are generated. A major perceived-latency win on small thinking models. |
| `VERIFICATION_ENABLED` | `true` | Run the per-step LLM-as-judge verifier after each successful subtask. Set to `false` on big-model setups (DGX Spark, `qwen3.5:35b`) where each judge call takes 5–60 s. |
| `QWEN_NUM_CTX` | `16384` | Ollama `num_ctx` for the executor. Larger means more tool-round history before the watchdog has to compress, at the cost of more VRAM. |
| `GEMMA_NUM_CTX` | `16384` | Same, for the orchestrator. |
| `CONTEXT_COMPRESS_AT` | `0.7` | Trigger context compression when prompt-token usage crosses this fraction of `num_ctx`. |
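The compression trigger is plain arithmetic: with the defaults, compression starts once the prompt reaches 0.7 × 16384 = 11,468.8, i.e. 11,469 tokens. A hedged sketch; `compression_trigger` and `should_compress` are hypothetical names, token counting is left to the caller, and treating the boundary as inclusive is an assumption.

```python
import math

QWEN_NUM_CTX = 16384       # README default for the executor
CONTEXT_COMPRESS_AT = 0.7  # README default trigger fraction


def compression_trigger(num_ctx: int = QWEN_NUM_CTX, fraction: float = CONTEXT_COMPRESS_AT) -> int:
    """Smallest prompt size (in tokens) at which compression should kick in."""
    return math.ceil(num_ctx * fraction)


def should_compress(
    prompt_tokens: int, num_ctx: int = QWEN_NUM_CTX, fraction: float = CONTEXT_COMPRESS_AT
) -> bool:
    """True once prompt-token usage reaches the trigger fraction of the context window."""
    return prompt_tokens >= compression_trigger(num_ctx, fraction)
```

Doubling `num_ctx` to 32768 moves the trigger to 22,938 tokens, which is why a larger window buys more tool-round history before compression.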
All blank by default — the relevant tools / bridges report a clear "not configured" error pointing at Settings, so a stock install never silently does the wrong thing. These can all be edited from Settings → Notification Credentials at runtime; the env-var form is just for headless / Docker-compose installs.
| Variable | Default | Description |
|---|---|---|
| `TELEGRAM_BOT_TOKEN` | (unset) | Bot token from @BotFather. Required for both outbound `telegram_send` and inbound polling. |
| `TELEGRAM_DEFAULT_CHAT_ID` | (unset) | Optional. When set, `telegram_send` and `/settings/telegram/test` can omit `chat_id` and send here by default. |
| `TELEGRAM_INBOUND_ENABLED` | `false` | Master toggle for the long-polling Telegram → OpenTeddy bridge (see Remote Access → Bidirectional Telegram bot). |
| `TELEGRAM_INBOUND_WHITELIST` | (unset) | Comma-separated chat IDs allowed to drive the agent (numeric for users / groups, `@channelname` for public channels). Empty = inbound refuses to start even if the toggle is on; we don't run open bots. |
| `SMTP_HOST` / `SMTP_PORT` / `SMTP_USER` / `SMTP_PASSWORD` / `SMTP_FROM` | (unset) | Used by the `email_send` tool. `SMTP_PORT` defaults to `587`. |
| `WEBHOOK_SECRET` | (unset) | Optional shared secret for `POST /webhooks/{session_id}`. Empty = the endpoint is open to anyone on the network (the UI warns when this is the case). |
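The whitelist rules above (comma-separated, numeric IDs or `@channelname`, empty means refuse to start) can be sketched as follows. The helper names are hypothetical, and the sketch assumes the bridge sees each incoming chat's numeric ID plus, for public channels, its username without the `@`.

```python
from __future__ import annotations


def parse_whitelist(raw: str) -> set[str]:
    """Normalise TELEGRAM_INBOUND_WHITELIST into a set of chat IDs / @channel names."""
    entries: set[str] = set()
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("@"):
            entries.add(item.lower())  # Telegram usernames are case-insensitive
        else:
            entries.add(str(int(item)))  # numeric user/group ID; raises ValueError if malformed
    return entries


def inbound_may_start(enabled: bool, whitelist: set[str]) -> bool:
    # An empty whitelist refuses to start even with the toggle on: no open bots.
    return enabled and bool(whitelist)


def is_allowed(whitelist: set[str], chat_id: int, username: str | None = None) -> bool:
    """Accept a chat by numeric ID, or a public channel by its username."""
    if str(chat_id) in whitelist:
        return True
    return username is not None and f"@{username.lower()}" in whitelist
```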
OpenTeddy ships with a native macOS shell built on Tauri 2.x that wraps the
web dashboard inside a polished launcher. Source lives in desktop/
(its own repo — gitignored from the main repo).
What you get on top of the web UI:
- …`confirm` / `alert` / `prompt` (which Tauri blocks) with in-app modals that match the chrome.
- …`users/{uid}` doc on first launch. Google sign-in is optional and unlocks cross-device cloud sync.
- …`pairings/{pairId}` Firestore doc with HMAC-style nonce verification.
- …`subscription.status` to `active` → upgrade pill auto-disappears.
- …`app.log` + tasks/usage/settings zip for bug reports.
- …`enter_main`, so subsequent starts land on the main window immediately.

```bash
cd desktop
npm install
npx tauri dev    # hot-reload dev (still needs uvicorn running separately)

# Iteration: dev build (filename gets a "dev-" prefix so you can't
# accidentally mistake it for a public release)
./scripts/build_macos.sh

# Ship: real release; builds + signs + notarizes + git-tags + uploads
# to GitHub Releases in one shot. Requires APPLE_DEV_ID +
# APPLE_NOTARY_PROFILE in desktop/.notarize.env.
./scripts/release.sh 1.0.2
./scripts/release.sh 1.0.2 --dry-run   # walk through without acting
```
The shipping .dmg (from release.sh) is signed with our Apple
Developer ID and notarized by Apple, so the published builds work
with one double-click on any Mac — no Gatekeeper warnings, no
xattr workaround needed. (Self-built .dmgs from
build_macos.sh without the notarize env set are ad-hoc signed and
will hit Gatekeeper on machines other than the one that built
them — that's expected for dev iteration.)
| Platform | Status | Notes |
|---|---|---|
| macOS (Apple Silicon) | ✅ Native desktop .dmg (signed + notarized) + OSS web | Primary development target. |
| Linux (x86_64) | ✅ Native desktop .AppImage / .deb (NEW v1.0.3) + OSS web | Built on Ubuntu 22.04 CI; AppImage works on any glibc 2.34+ distro. |
| Windows (native) | ⚠️ Partial — use WSL2 if possible | See caveats below. No native desktop installer yet (roadmap). |
| Windows (WSL2) | ✅ Fully supported (OSS web) | Behaves like Linux. Recommended on Windows. |
The codebase itself is cross-platform Python (it uses `pathlib`, `os.path.join`,
`asyncio`), and `package_tool.py` already handles the Windows venv layout
(`Scripts\pip.exe`). The things that actually trip Windows users are:

- …`ls`, `rm -rf`, `grep`, `chmod`, or pipes like `cmd1 | tee file`, those are executed through the system shell, which is `cmd.exe` / PowerShell on native Windows, so they fail. Running OpenTeddy under WSL2 makes this a non-issue.
- `lsof` / `ps` are not available on native Windows. The deploy-tool helpers that inspect port occupancy (`port_probe`, `port_free` in `tools/deploy_tool.py`) degrade: `port_probe` returns a bound/free flag but no PID/process name; `port_free` returns an error and cannot kill by port.

Recommendation: on Windows, install Ollama natively on the host, then run OpenTeddy itself inside WSL2 Ubuntu. That gives you GPU-accelerated local inference plus a POSIX userspace for the shell-heavy parts of the agent.
docker-compose.yml uses extra_hosts: ["host-gateway:host-gateway"] so
the container can reach Ollama running on the host. This requires Docker
Engine 20.10+ on Linux, and Ollama must be bound to 0.0.0.0, not
just 127.0.0.1 — otherwise the container's bridged traffic can't reach
it. Set OLLAMA_HOST=0.0.0.0:11434 before ollama serve. On Docker
Desktop (Mac / Windows) this "just works".
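If the container still cannot see Ollama, a quick probe from inside it separates networking problems from model problems. Ollama's `GET /api/tags` endpoint lists the locally pulled models; the sketch below uses only the standard library, and the `http://host-gateway:11434` default assumes the compose alias above plus Ollama's default port (`ollama_models` is a hypothetical helper).

```python
from __future__ import annotations

import http.client
import json
import urllib.request


def ollama_models(base_url: str = "http://host-gateway:11434", timeout: float = 3.0) -> list[str] | None:
    """Return the model tags Ollama reports via GET /api/tags, or None if unreachable."""
    # Bypass HTTP(S)_PROXY so the probe talks to Ollama directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"{base_url.rstrip('/')}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers URLError, refused connections and timeouts; ValueError covers bad JSON.
        return None
    return [m.get("name", "") for m in data.get("models", [])]
```

A `None` result while `ollama serve` is running on the host usually points at the `127.0.0.1` binding described above.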
```bash
cp .env.example .env
# Fill in ANTHROPIC_API_KEY
docker compose up -d
# Open http://localhost:8000
```
Notes:
- …`ollama serve`).
- …`host-gateway` alias set in `docker-compose.yml`.
- …`openteddy_data` Docker volume.
- …`docker compose up -d --build`.

The default `docker-compose.yml` only mounts an isolated named volume
(`openteddy_data` → `/app/data`). It does not bind-mount your home
directory, Desktop, Downloads, or any other host folder. That means:

- …~/Documents/report.pdf", "tidy up my Downloads folder", or "run this script on my Desktop" will not work in the Docker setup: the container simply cannot see those files.
- …`/app/data` and disappear if the volume is removed.

If you need the agent to operate on files on your machine, run OpenTeddy
directly with uvicorn (see Quick Start) instead of Docker.
The native process has full access to your filesystem (subject to your user's
permissions), which is what most "local assistant" use cases actually want.
Alternatively, if you really want to stay on Docker, you can add a bind mount
to `docker-compose.yml`, e.g.:

```yaml
volumes:
  - openteddy_data:/app/data
  - ${HOME}/openteddy-workspace:/workspace   # ← exposed host folder
```
…and then point the agent at /workspace inside the container. Only the
folders you explicitly mount are visible; everything else stays isolated.
OpenTeddy is a solo side-project trying to prove that a small open stack can get close to the big commercial agents. If you want to see it keep growing:
OpenTeddy itself is MIT.
Third-party content bundled in this repo:
| Bundled artifact | Source | Upstream license |
|---|---|---|
| `cyber_skills/index.json` (the indexed 755-workflow catalogue) | mukul975/Anthropic-Cybersecurity-Skills (754 entries) + mvanhorn/last30days-skill (1 entry) | Apache 2.0 + MIT |
| `tools/doc_to_markdown.py` wrapper around microsoft/markitdown | upstream PyPI package | Apache 2.0 |
cyber_skills/index.json is a derivative work — see
cyber_skills/README.md for attribution details. Every indexed entry
carries source_repo + upstream_url fields so any single workflow
can be traced back to its origin. Refer to the linked upstream repos
for the full license text and NOTICE files where applicable.