A community-driven registry for Claude, Cursor, Windsurf, Cline & more. Not affiliated with Anthropic.
Are you the author? Sign in to claim
A 27-chapter hands-on tutorial for building an autonomous AI agent from zero in Python. Agent loop, tool system, memory,
Build a production-grade autonomous AI agent from scratch in Python. A 27-chapter, code-first tutorial covering the agent loop, tool system, session persistence, memory, skills, context compression, MCP, multi-platform gateway (Telegram / Discord / Slack / WeChat), and RL-based self-evolution — inspired by Hermes Agent.
Every chapter ships a runnable reference implementation under agents/sNN_*.py, paired with a prose explanation under docs/en/ (and docs/zh/ for the Chinese mainline). Read, run, tweak, repeat.
This repo does not try to mirror every product detail from the Hermes Agent codebase. It focuses on the mechanisms that actually decide whether an agent can work autonomously across platforms:
The goal is simple:
understand the real design backbone well enough that you can rebuild it yourself.
One sentence first:
The model does the reasoning. The harness gives the model a working environment that spans platforms, persists across sessions, and manages its own skills.
That working environment is made of a few cooperating parts:
Agent Loop: send messages to the model, execute tool calls, append results, continueTool System: a self-registering dispatch layer — the agent's handsSession Store: SQLite with FTS5 — conversation memory that survives restartsPrompt Builder: assemble system prompts from personality, memory, config, and contextContext Compression: keep the active window small when conversations grow longMemory & Skills: durable knowledge and agent-managed skill filesPermission System: detect dangerous commands before executionGateway: a single agent loop that listens on Telegram, Discord, Slack, WeChat, and moreTerminal Backends: run commands locally, in Docker, over SSH, or on serverless platformsCron / MCP / Voice: grow the single-agent core into a full working platformThis is the teaching promise of the repo:
This repo is not trying to preserve every detail that exists in the production system.
If a detail is not central to the agent's core operating model, it should not dominate the teaching line. That includes things like:
Those details may matter in production. They do not belong at the center of a 0-to-1 teaching path.
The assumed reader:
So the repo tries to keep a few strong teaching rules:
docs/en/s00-architecture-overview.mddocs/en/s00f-code-reading-order.mddocs/en/glossary.mddocs/en/teaching-scope.mddocs/en/data-structures.mdDo not open random chapters first.
The safest path is:
docs/en/s00-architecture-overview.md for the full system map.docs/en/s00f-code-reading-order.md so you know which source files to open first.s01-s06 -> s07-s11 -> s12-s15 -> s16-s20 -> s21-s27.If the middle and late chapters start to blur together, reset in this order:
docs/en/data-structures.mddocs/en/entity-map.mds01-s06: build a working single-agent core with persistences07-s11: add intelligence — memory, skills, safety, delegation, and configurations12-s15: go multi-platform — gateway, adapters, terminal backends, and schedulings16-s20: add advanced capabilities — MCP, browser, voice, vision, and background reviews21-s27: self-improvement — skill creation, hooks, trajectory/RL, plugins, evaluation, and optimization| Chapter | Topic | What you get |
|---|---|---|
s00 | Architecture Overview | the global map, key terms, and learning order |
s01 | Agent Loop | the synchronous conversation loop — ask, tool-call, append, continue |
s02 | Tool System | a self-registering tool registry with dispatch orchestration |
s03 | Session Store | SQLite + FTS5 persistence — conversations that survive restarts |
s04 | Prompt Builder | section-based system prompt assembly from personality, memory, and config |
s05 | Context Compression | auto-triggered LLM summarization when context grows too long |
s06 | Error Recovery | API error classification, retry with backoff, and provider failover |
s07 | Memory System | cross-session persistent knowledge with MEMORY.md and USER.md |
s08 | Skill System | agent-managed skills — create, edit, and execute |
s09 | Permission System | dangerous command detection and approval gates |
s10 | Subagent Delegation | spawn fresh context for isolated subtasks |
s11 | Configuration System | YAML config, env vars, profiles, and runtime migration |
s12 | Gateway Architecture | the multi-platform message dispatch loop |
s13 | Platform Adapters | building integrations for Telegram, Discord, Slack, WeChat, and more |
s14 | Terminal Backends | run commands in Docker, over SSH, on Modal, or Daytona |
s15 | Cron Scheduler | time-based automation with duration strings and cron expressions |
s16 | MCP Integration | external capability routing via Model Context Protocol |
s17 | Browser Automation | Playwright + Browserbase for web interaction |
s18 | Voice & Vision | TTS/STT pipelines and image analysis |
s19 | CLI Interface | prompt_toolkit + Rich for an interactive terminal experience |
s20 | Background Review | every N turns, a background pass updates memory and extracts skills |
s21 | Skill Creation Loop | background review extracts patterns into reusable skills |
s22 | Hook System | lifecycle hooks for extensibility without modifying core code |
s23 | Trajectory & RL | conversation trajectories become training data for model improvement |
s24 | Plugin Architecture | pluggable memory, compression, and capability providers |
s25 | Self-Evolution Overview | the core insight, four evolution targets, and full pipeline overview |
s26 | Evaluation System | eval datasets, LLM-as-judge fitness scoring, and constraint gates |
s27 | Optimization & Deploy | the feedback→mutate→select loop, full pipeline, and Phase 2-4 concepts |
If this is your first time learning this material systematically, do not spread your attention evenly across all details. For each chapter, focus on 3 things:
| Chapter | Key Data Structures / Entities | What you should have after this chapter |
|---|---|---|
s01 | messages list / AIAgent class / run_conversation() | a minimal working synchronous conversation loop |
s02 | ToolRegistry / ToolEntry / tool_result | a self-registering, self-discovering tool system |
s03 | SessionDB / state.db / FTS5 index | a SQLite persistence layer — conversations survive restarts |
s04 | build_context_files_prompt() / build_skills_system_prompt() | a pipeline assembling prompts from personality, memory, and config |
s05 | ContextCompressor / compression trigger threshold | an auto-summarization layer when context grows too long |
s06 | ClassifiedError / FailoverReason / classify_api_error() | error classification + backoff retry + provider failover |
s07 | MemoryStore / MemoryManager / MEMORY.md / USER.md | a layer that separates "temporary context" from "cross-session memory" |
s08 | SkillMeta / SkillBundle / skill SKILL.md files | a skill system that can create, edit, and execute |
s09 | DANGEROUS_PATTERNS / detect_dangerous_command() / _ApprovalEntry | a "dangerous operations must pass the gate" approval pipeline |
s10 | delegate_tool / child messages / isolated AIAgent | a subagent mechanism with isolated context for one-off delegation |
s11 | config dict / Profile management / migration functions | YAML config + profiles + runtime migration |
s12 | GatewayRunner / MessageEvent / platform routing | a unified multi-platform message dispatch loop |
s13 | BasePlatformAdapter / MessageType / SendResult | a reusable platform adapter pattern |
s14 | BaseEnvironment / local / docker / ssh / modal / daytona | abstract execution environments: local, Docker, SSH, cloud |
s15 | parse_schedule() / create_job() / get_due_jobs() / job dicts | a "when the time comes, work starts" scheduling layer |
s16 | mcp_tool / MCP config / tool schema bridging | a bus for plugging external tools and capabilities into the system |
s17 | browser_tool / Playwright / Browserbase provider | a browser automation layer for web interaction |
s18 | tts_tool / voice_mode / vision_tools | multimodal pipelines: voice I/O + image analysis |
s19 | HermesCLI / CommandDef / KawaiiSpinner / Rich rendering | a fully-featured interactive terminal interface |
s20 | BackgroundReviewer / _MEMORY_REVIEW_PROMPT / dual trigger counters | an "every N turns, auto-reflect → update memory/skills" background review mechanism |
s21 | skill creation loop / pattern extraction prompt / skill persistence pipeline | the "discover patterns → create reusable skills" prerequisite for self-evolution |
s22 | HookRegistry / PluginHookRegistry / BOOT.md handler | lifecycle hooks — inject custom logic without modifying core code |
s23 | convert_to_trajectory() / compress_trajectory() / reward functions | conversation data → training pipeline for model improvement |
s24 | plugin interfaces / memory providers / compression providers | pluggable memory and compression without touching core code |
s25 | EvalExample / EvalDataset | the foundational data structures for self-evolution |
s26 | SyntheticDatasetBuilder / FitnessScore / ConstraintValidator | measurement infrastructure — generate data, score outputs, gate changes |
s27 | SkillOptimizer / EvolutionResult / evolve_skill() | the optimization loop and full 7-step pipeline |
Best for readers encountering agent systems for the first time.
Read in this order:
s00 -> s01 -> ... -> s20 -> s21 -> ... -> s27 (follow the numbers; s24 is docs-only).
Best for "get it running, then fill in the gaps" readers.
Read in this order:
s01-s06: build a core agent with persistence and context compressions07-s11: add memory, skills, safety, delegation, and configs12-s15: go multi-platform, learn cross-environment executions16-s20: advanced capabilities plus the background self-reviews21-s27: step into self-evolution — skill creation, hooks, trajectories, evaluation, and optimizationIf you hit a wall in the middle or late chapters, do not push forward blindly.
Reset in this order:
docs/en/s00-architecture-overview.mddocs/en/data-structures.mddocs/en/entity-map.mdWhen readers truly get stuck, it is usually not "I can't read the code" but rather:
git clone <repo-url>
cd learn-hermes-agent
pip install -r requirements.txt
cp .env.example .env
Then configure your API key in .env, and run:
python agents/s01_agent_loop.py
Suggested order:
s01 and make sure the minimal loop really works.s00, then move through s01 -> s06 in order.s07 -> s11.s12 -> s15 only after the core agent makes sense.s16 -> s20, then the self-evolution chapters s21 -> s27.Each chapter is easier to absorb if you keep the same reading rhythm:
If you keep asking:
go back to:
learn-hermes-agent/
├── agents/ # runnable Python reference implementations per chapter (s24 is an exception, see below)
├── docs/zh/ # Chinese mainline docs
├── docs/en/ # English docs
├── illustrations/ # chalkboard-style diagrams for each chapter
├── tests/ # smoke tests
├── web/ # web teaching platform (optional)
├── .env.example # environment variable template
└── requirements.txt # Python dependencies
Note:
s24 Plugin Architecturecurrently ships with documentation only (docs/en/s24-plugin-architecture.mdand the Chinese counterpart). There is noagents/s24_*.pyreference implementation. The doc is self-contained and does not block the rest of the reading order.
To ensure "buildable from 0 to 1", this repo makes deliberate tradeoffs:
This means the repo aims for:
High fidelity on core mechanisms, deliberate tradeoffs on peripheral details.
Chinese is the canonical teaching line and the fastest-moving version.
zh: most reviewed and most completeen: all chapters s00-s27 available; Chinese is updated firstIf you want the fullest and most frequently refined explanation path, use the Chinese docs first.
By the end of the repo, you should be able to answer these questions clearly:
If you can answer those questions clearly and build a similar system yourself, this repo has done its job.
This is not "copy the source code line by line." This is "grasp the designs that truly matter, then build it yourself."
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
An AI-powered custom node for ComfyUI designed to enhance workflow automation and provide intelligent assistance
Deterministic multi-agent pipeline for end-to-end software development, orchestrating CLI-based AI tools (e.g. Gemini, C
干净、强大、属于你的 AI Agent 平台 --AI agents, without the clutter.