GuideTech Frontier

Hermes Agent Source Code Teardown: How Nous Research Built a Self-Evolving AI Agent

A deep teardown of the open-source Hermes Agent framework by Nous Research: 141K lines of Python code, 74 built-in skills, 12 platform adapters, and a closed-loop learning system. From the tool registry pattern to iterative context compression, from MCP OAuth 2.1 to intelligent model routing, we analyze why this is currently the most opinionated Agent framework.

LangChain teaches you to stack blocks, CrewAI teaches you to form squads, AutoGen teaches you to hold meetings. But there is one problem none of them solve: the mistakes your Agent made yesterday, it will make again today. Nous Research's Hermes Agent, with 141,000 lines of Python code, offers a different answer — let the Agent create skills from its own experience, improve those skills through use, and remember who you are across sessions. This is not just another LLM wrapper — it is a self-evolving AI operating system.

At a glance:

  • 141K lines of Python production code
  • 12 messaging platform adapters
  • 74 built-in skills
  • 6 terminal execution backends
Why Another Agent Framework? Because Everyone Else Is Building Tools — This One Is Building a Brain

Let's start with a comparison:

| Capability | Hermes Agent | LangChain | CrewAI | AutoGen |
| --- | --- | --- | --- | --- |
| Closed-Loop Learning | Built-in skill creation + improvement | None | None | None |
| Multi-Platform Messaging | 12 adapters out of the box | Self-built required | Limited | None |
| Persistent Memory | Honcho user modeling + FTS5 search | Basic Chain memory | None | None |
| Context Compression | Iterative LLM summarization | None | None | None |
| MCP Integration | Full support + OAuth 2.1 | None | None | None |
| Cost Tracking | Per-session + per-provider billing | Partial | None | None |

The key differentiator is closed-loop learning. LangChain's Agent finishes a task and that's it — next time, it starts from scratch. Hermes Agent is different: after completing a complex task, it automatically creates a .md skill file, and the next time it encounters a similar problem, it calls the skill directly. Moreover, skills continuously improve through use. This mechanism makes the Agent smarter over time, rather than reinventing the wheel every time.

Architecture Teardown: Six Progressive Layers, Each with Highlights

Hermes Agent's code organization follows a clear layered architecture, with 202 Python modules each handling distinct responsibilities. Let's break it down from top to bottom:

Layer 1: Entry Points and Gateway

The Gateway layer (the gateway/ directory) is the heaviest layer in the codebase. The core file gateway/run.py alone reaches 6,332 lines and handles all platform message routing, session management, and instruction parsing. Its 12 platform adapters cover mainstream enterprise communication scenarios:

  • International platforms: Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Email, SMS
  • Chinese platforms: DingTalk, Feishu (Lark)
  • Developer tools: Mattermost, native CLI

The Gateway also includes a built-in OpenAI-compatible HTTP API (/v1/chat/completions), meaning you can treat Hermes Agent as an enhanced LLM API endpoint, directly replacing OpenAI calls in existing systems.
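
In practice that means you can point any OpenAI-compatible client straight at the gateway. A minimal sketch (the host, port, and model name below are placeholders for your own deployment, not values documented by the project):

from openai import OpenAI

# Hypothetical endpoint and model name; substitute your gateway's address.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused-locally")
resp = client.chat.completions.create(
    model="hermes",
    messages=[{"role": "user", "content": "Summarize this week's deploy log"}],
)
print(resp.choices[0].message.content)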

Layer 2: Agent Core

run_agent.py (8,492 lines) is the brain. At its core is a conversation loop with tool calling, executing up to 90 iterations per conversation. Several design choices worth noting:

Iterative Context Compression (agent/context_compressor.py, 2,030 lines) — this is the most elegant context management approach I have seen:

  1. Protects head and tail messages based on token budget (not message count)
  2. First performs cheap preprocessing: trimming old tool output results
  3. Then uses an auxiliary cheap model (e.g., Haiku 4.5) to generate summaries
  4. In subsequent compressions, feeds the previous summary alongside new messages to the LLM for incremental updates
  5. Summary budget cap: 20% of compressed content, but never exceeding 12K tokens
SUMMARY_PREFIX = "[CONTEXT COMPACTION] Earlier turns compacted..."
_SUMMARY_RATIO = 0.20   # Allocate 20% of compressed content to summary
_SUMMARY_TOKENS_CEILING = 12_000  # Summary never exceeds 12K tokens
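
Put together, the whole scheme fits in a dozen lines. A minimal sketch, assuming hypothetical count_tokens and summarize_llm helpers and modeling messages as plain strings (the real implementation budgets the protected head/tail by tokens rather than message count):

def compress(messages, prev_summary, count_tokens, summarize_llm,
             head_keep=2, tail_keep=6):
    # Sketch only: Hermes protects head/tail by token budget, not message count.
    if len(messages) <= head_keep + tail_keep:
        return messages, prev_summary
    head, middle, tail = (messages[:head_keep],
                          messages[head_keep:-tail_keep],
                          messages[-tail_keep:])
    # Cheap preprocessing first: trim stale tool output before involving an LLM.
    middle = [m[:2000] for m in middle]
    # Summary budget: 20% of the compressed span, hard-capped at 12K tokens.
    cap = min(int(_SUMMARY_RATIO * sum(count_tokens(m) for m in middle)),
              _SUMMARY_TOKENS_CEILING)
    # Incremental update: previous summary + newly evicted turns -> new summary.
    new_summary = summarize_llm(prev_summary, middle, max_tokens=cap)
    return head + ["[CONTEXT COMPACTION] " + new_summary] + tail, new_summary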

Auxiliary Model System (agent/auxiliary_client.py, 1,926 lines) — the primary model handles reasoning and decision-making, while auxiliary models handle vision analysis, text summarization, sampling, and other low-complexity tasks. This lets you use Claude Opus for core reasoning while using Haiku for context compression, reducing costs by 10x.
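
The split can be expressed as a one-line dispatch. A sketch with placeholder model IDs (not actual Hermes config values):

# Placeholder model IDs; the point is the routing, not the names.
AUX_TASKS = {"summarize", "vision", "sample"}

def pick_model(task: str) -> str:
    # Low-complexity subtasks go to the cheap auxiliary model;
    # reasoning and tool-call decisions stay on the primary model.
    return "claude-haiku-4-5" if task in AUX_TASKS else "claude-opus-4"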

Layer 3: Tool Registry — An Elegant Solution to Circular Imports

tools/registry.py is only about 110 lines of code, but it solves a problem that plagues all Agent frameworks: circular imports between tool files.

The traditional approach is to import all tools in one large file, but this means any tool's import failure cascades to the entire system. Hermes Agent's approach:

Registry Pattern

Each tool file calls registry.register() at module level, declaring its name, toolset, JSON Schema, and handler function. Tools never import each other.

Lazy Discovery

model_tools.py (a mere 472 lines) triggers _discover_tools() only when tools are first needed, loading approximately 21 tool modules via importlib.import_module().

Failure Isolation

If a tool's import fails (e.g., missing dependencies), it only logs the error without affecting other tools or Agent startup.
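
The three ideas combine into roughly the following shape (module and function names are illustrative, not the actual Hermes API):

# registry.py (sketch): tools self-register; they never import each other.
import importlib
import logging

_TOOLS: dict[str, dict] = {}

def register(name, toolset, schema, handler):
    _TOOLS[name] = {"toolset": toolset, "schema": schema, "handler": handler}

def discover(module_names):
    # Lazy: called only when tools are first needed.
    for mod in module_names:
        try:
            importlib.import_module(mod)  # module-level register() calls fire here
        except Exception as exc:
            # Failure isolation: one broken tool never blocks Agent startup.
            logging.error("tool module %s failed to load: %s", mod, exc)
    return _TOOLS

# weather_tool.py (sketch): registration happens at import time, e.g.
# registry.register("get_weather", "web", {...JSON Schema...}, get_weather)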

This design pattern is worth adopting for all Agent frameworks. In FluxWise's enterprise Agent platform design, we have also adopted a similar registry pattern for managing pluggable business skills.

Layer 4: Skill System — The Agent's Long-Term Memory

Hermes ships 74 built-in skills plus 50+ optional ones, all stored as Markdown (YAML frontmatter + Markdown body) and compatible with the agentskills.io standard. The skill system uses progressive disclosure:

  1. Metadata layer: Name, description, trigger conditions — used for quick matching
  2. Full instruction layer: Detailed execution steps — loaded after a successful match
  3. Reference layer: Template files, reference documents — loaded on demand
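
To make the format concrete, a skill file in this style might look like the following (contents invented for illustration; the actual spec lives at agentskills.io):

---
name: release-notes
description: Draft release notes from a git log
triggers: ["release notes", "changelog"]
---
# Release Notes

1. Run git log --oneline <last-tag>..HEAD
2. Group commits into features / fixes / chores
3. Fill in references/template.md (a reference-layer file, loaded on demand)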

The most interesting feature is automatic skill creation. When the Agent completes a complex multi-step task, it automatically distills the operation sequence into a new skill file. The next time it encounters a similar problem, it calls the skill directly instead of re-reasoning. While still experimental, this capability points toward an AI system that truly accumulates experience.

Layer 5: Memory and Persistence

Three-layer memory architecture:

Honcho User Modeling — builds user profiles through dialectical Q&A, with semantic search + LLM synthesis. The result is an Agent that can say things like "based on our previous discussion about X..."

Session Search — SQLite + FTS5 full-text search virtual table, with 6 versioned schema migrations. LLM-driven cross-session recall, not simple keyword matching.

Procedural Memory — user's MEMORY.md and persona's SOUL.md, persistently stored in the file system.
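
The session-search layer is reproducible with nothing but the standard library, which is part of its appeal. A simplified sketch (the real schema has 6 versioned migrations; this has none):

import sqlite3

con = sqlite3.connect("sessions.db")
# FTS5 virtual table: a full-text index over message content
con.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS messages USING fts5(session_id, role, content)"
)
con.execute(
    "INSERT INTO messages VALUES (?, ?, ?)",
    ("s1", "user", "deploy hermes agent to the staging VPS"),
)
con.commit()
# MATCH performs ranked lexical search -- no embeddings, hence the weak semantic recall
for row in con.execute(
    "SELECT session_id, content FROM messages WHERE messages MATCH ? ORDER BY rank",
    ("staging",),
):
    print(row)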

Layer 6: Intelligent Model Routing

agent/smart_model_routing.py (168 lines) implements a lightweight but practical multi-model router:

  • Rate limit detected → immediately switch to backup model
  • Context approaching limits → suggest cheaper alternatives
  • Multi-provider aware: fuzzy matching via models.dev registry + custom endpoint metadata

For enterprise users, this means: one configuration file handles multi-vendor failover, without handling OpenAI vs. Anthropic differences at the code level.
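
The failover half reduces to a loop over a preference list. A sketch (the model list is a placeholder, and the real router normalizes provider-specific errors rather than catching openai's exception directly):

from openai import RateLimitError

FALLBACKS = ["anthropic/claude-opus-4", "openai/gpt-4.1", "deepseek/deepseek-chat"]

def complete_with_failover(call, messages):
    last_exc = None
    for model in FALLBACKS:
        try:
            return call(model=model, messages=messages)
        except RateLimitError as exc:
            # Rate limit detected: switch to the next provider immediately, no backoff.
            last_exc = exc
    raise last_exc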

MCP Integration: Not Toy-Grade

Many frameworks claim to support MCP (Model Context Protocol), but Hermes Agent's implementation depth is on another level:

  • Complete OAuth 2.1 PKCE flow: Supports authentication for modern MCP servers like GitHub and Google
  • CSRF protection + state validation: Production-grade security standards
  • Automatic token refresh: Long-running Agents won't be interrupted by token expiration
  • Sampling support: MCP servers can request LLM completions during tool invocation
  • Fine-grained control: Each MCP server can have individual RPM limits, token caps, and model overrides

tools/mcp_tool.py + tools/mcp_oauth.py total 2,019 lines of code — this is not a demo; this is a production-grade implementation.
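
The PKCE portion of that flow is small enough to show whole (standard RFC 7636 steps; the authorization URL and client ID below are placeholders):

import base64
import hashlib
import secrets

# code_verifier: high-entropy random string kept locally until token exchange
verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
# code_challenge: base64url-encoded SHA-256 of the verifier, sent up front
challenge = base64.urlsafe_b64encode(
    hashlib.sha256(verifier.encode()).digest()
).rstrip(b"=").decode()
state = secrets.token_urlsafe(16)  # CSRF protection: must match on the callback

auth_url = (
    "https://mcp.example.com/authorize"
    f"?response_type=code&client_id=YOUR_CLIENT_ID&state={state}"
    f"&code_challenge={challenge}&code_challenge_method=S256"
)
# After the redirect: verify state, exchange code + verifier for tokens,
# then refresh automatically before expiry so long-running Agents never stall.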

Six Terminal Backends: From Laptops to HPC Clusters

When the Agent executes tools, it needs a runtime environment. Hermes Agent provides 6 options:

| Backend | Use Case | Cost | Isolation |
| --- | --- | --- | --- |
| Local Shell | Development & debugging | Free | None |
| Docker Container | Secure execution | Low | High |
| Remote SSH | Existing servers | Existing hardware | Medium |
| Modal Serverless | On-demand compute | Pay-per-second | Fully isolated |
| Daytona Persistent Containers | State persistence needed | Low | High |
| Singularity HPC | Scientific computing | Depends on cluster | Fully isolated |

The Modal backend is particularly interesting: the Agent costs almost nothing when idle (less than 1 cent), and automatically spins up GPU instances when code needs to execute. This is practical for manufacturing enterprises — your AI Agent doesn't need to occupy server resources 24/7, consuming resources only on-demand when tasks arise.
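
In Modal's own SDK, that pay-per-second pattern looks roughly like this (a sketch against Modal's public API; how Hermes wires its backend internally may differ):

import modal

app = modal.App("agent-sandbox")

@app.function(gpu="T4", timeout=600)  # the GPU container exists only while this runs
def run_snippet(code: str) -> str:
    import contextlib, io
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code)  # executes inside the container, not on your host
    return buf.getvalue()

@app.local_entrypoint()
def main():
    print(run_snippet.remote("print(2 ** 10)"))  # billed per second of execution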

Engineering Quality: 3,700 Tests Are Not Just for Show

  • 372 test files, approximately 3,700 test cases
  • Integration tests separated with @pytest.mark.integration
  • pytest-xdist support for parallel execution
  • Dependencies locked to explicit version ranges (supply chain security)
  • Pydantic 2.12+ for data validation
  • Comprehensive type annotations

Several engineering details in the code worth learning from:

Thread-Safe Async Bridging (model_tools.py) — the most common pitfall in Python async programming is asyncio.run() creating and then closing the event loop, which leaves cached httpx/AsyncOpenAI clients throwing "Event loop is closed" errors. Hermes Agent uses threading.local() to maintain a persistent event loop per thread, with three handling paths covering the main CLI thread, already-async contexts, and worker threads.
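
A sketch of the thread-local loop pattern (simplified from the description above, not copied from model_tools.py):

import asyncio
import threading

_local = threading.local()

def _thread_loop() -> asyncio.AbstractEventLoop:
    # One persistent loop per thread, so cached clients never see a closed loop.
    loop = getattr(_local, "loop", None)
    if loop is None or loop.is_closed():
        loop = asyncio.new_event_loop()
        _local.loop = loop
    return loop

def run_async(coro):
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # Sync caller (main CLI thread or a worker thread): reuse this thread's loop.
        return _thread_loop().run_until_complete(coro)
    # Already inside a running loop: schedule instead of blocking; caller awaits it.
    return asyncio.ensure_future(coro)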

Structured Summary Templates — summaries generated by context compression follow a fixed format: Goals, Progress, Decisions, File Changes, Next Steps. This ensures compressed information remains actionable rather than becoming vague generalizations.
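
That can amount to no more than a fixed instruction handed to the auxiliary summarizer (wording below invented for illustration):

SUMMARY_SECTIONS = ("Goals", "Progress", "Decisions", "File Changes", "Next Steps")
# Hypothetical instruction; the fixed sections keep every summary actionable.
SUMMARY_INSTRUCTIONS = (
    "Compress the evicted turns into exactly these sections, preferring concrete "
    "details (paths, commands, decisions) over prose: " + ", ".join(SUMMARY_SECTIONS)
)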

Enterprise Deployment: Strengths and Weaknesses

Enterprise-Grade Advantages

Multi-tenant messaging gateway — 12 platform adapters + session isolation. Building internal Slack/DingTalk bots without writing platform SDKs yourself.

Cost controllable — per-provider billing tracking, context compression to reduce token consumption, auxiliary model offloading for low-complexity tasks, automatic degradation on rate limits.

Audit compliance — SQLite session storage records complete message history, tool call logs, and cost breakdowns, exportable for GDPR/SOX compliance.

Security defenses — approval required before dangerous tool execution, path traversal protection, prompt injection defense, credential stripping from error messages.

Self-hosted — runs on a $5/month VPS, no cloud vendor dependency. MIT license, commercially usable.

But there are also hard limitations:

  1. Single-process constraint — no cluster/distributed Agent pool, so it cannot scale horizontally; each VPS runs one Agent instance
  2. Gateway coupling — a 6,332-line core routing file; extending new platforms requires modifying core code
  3. No vector retrieval — memory search relies on FTS5 full-text matching, with weak semantic recall capabilities
  4. Insufficient team documentation — tutorials target individual developers, lacking enterprise-level deployment guides (multi-user collaboration, secret management, monitoring & alerting)

Verdict: Who Should Use It, Who Shouldn't

Good fit for:

  • Teams needing an internal AI assistant that remembers users and improves with use
  • Deploying AI capabilities within DingTalk/Feishu/Slack and other existing tools
  • R&D teams wanting Agents for paper reading, code review, knowledge accumulation
  • Budget-conscious teams needing multi-model intelligent routing for cost control

Not a good fit for:

  • High-throughput systems requiring hundreds of concurrent Agents processing tasks
  • Engineering teams needing deep customization at every level (better off building with LangChain)
  • Simple ChatBot functionality (overkill)

Hermes Agent's positioning is clear: it is not a library, it is a product. You don't build on top of it — you deploy it, configure it, and let it evolve on its own. For 80% of enterprise AI use cases, this may be the right approach: instead of spending 3 months building an Agent from scratch with LangChain, spend 3 days deploying Hermes Agent and let it learn your business.

Want to learn more?

Book a free business consultation and see what AI can do for your company.