GuideTech Frontier

Hermes Agent Source Code Teardown: How Nous Research Built a Self-Evolving AI Agent

A deep teardown of the open-source Hermes Agent framework by Nous Research: 141K lines of Python code, 74 built-in skills, 12 platform adapters, and a closed-loop learning system. From the tool registry pattern to iterative context compression, from MCP OAuth 2.1 to intelligent model routing, we analyze why this is currently the most opinionated Agent framework.

LangChain teaches you to stack blocks, CrewAI teaches you to form squads, AutoGen teaches you to hold meetings. But there is one problem none of them solve: the mistakes your Agent made yesterday, it will make again today. Nous Research's Hermes Agent, with 141,000 lines of Python code, offers a different answer — let the Agent create skills from its own experience, improve those skills through use, and remember who you are across sessions. This is not just another LLM wrapper — it is a self-evolving AI operating system.

At a glance:

  • 141K lines of Python production code
  • 12 messaging platform adapters
  • 74 built-in skills
  • 6 terminal execution backends
Why Another Agent Framework? Because Everyone Else Is Building Tools — This One Is Building a Brain

Let's start with a comparison:

| Capability | Hermes Agent | LangChain | CrewAI | AutoGen |
| --- | --- | --- | --- | --- |
| Closed-Loop Learning | Built-in skill creation + improvement | None | None | None |
| Multi-Platform Messaging | 12 adapters out of the box | Self-built required | Limited | None |
| Persistent Memory | Honcho user modeling + FTS5 search | Basic Chain memory | None | None |
| Context Compression | Iterative LLM summarization | None | None | None |
| MCP Integration | Full support + OAuth 2.1 | None | None | None |
| Cost Tracking | Per-session + per-provider billing | Partial | None | None |

The key differentiator is closed-loop learning. LangChain's Agent finishes a task and that's it — next time, it starts from scratch. Hermes Agent is different: after completing a complex task, it automatically creates a .md skill file, and the next time it encounters a similar problem, it calls the skill directly. Moreover, skills continuously improve through use. This mechanism makes the Agent smarter over time, rather than reinventing the wheel every time.

Architecture Teardown: Six Progressive Layers, Each with Highlights

Hermes Agent's code organization follows a clear layered architecture, with 202 Python modules each handling distinct responsibilities. Let's break it down from top to bottom:

Layer 1: Entry Points and Gateway

The Gateway layer (the gateway/ directory) is the heaviest layer in the codebase. The core file gateway/run.py alone reaches 6,332 lines and handles all platform message routing, session management, and instruction parsing. Its 12 platform adapters cover mainstream enterprise communication scenarios:

  • International platforms: Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Email, SMS
  • Chinese platforms: DingTalk, Feishu (Lark)
  • Developer tools: Mattermost, native CLI

The Gateway also includes a built-in OpenAI-compatible HTTP API (/v1/chat/completions), meaning you can treat Hermes Agent as an enhanced LLM API endpoint, directly replacing OpenAI calls in existing systems.
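
In practice that means you can point any OpenAI-compatible client straight at the gateway. A minimal sketch (the host, port, and model name below are placeholders for your own deployment, not values documented by the project):

from openai import OpenAI

# Hypothetical endpoint and model name; substitute your gateway's address.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused-locally")
resp = client.chat.completions.create(
    model="hermes",
    messages=[{"role": "user", "content": "Summarize this week's deploy log"}],
)
print(resp.choices[0].message.content)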

Layer 2: Agent Core

run_agent.py (8,492 lines) is the brain. At its core is a conversation loop with tool calling, executing up to 90 iterations per conversation. Several design choices worth noting:

Iterative Context Compression (agent/context_compressor.py, 2,030 lines) — this is the most elegant context management approach I have seen:

  1. Protects head and tail messages based on token budget (not message count)
  2. First performs cheap preprocessing: trimming old tool output results
  3. Then uses an auxiliary cheap model (e.g., Haiku 4.5) to generate summaries
  4. In subsequent compressions, feeds the previous summary alongside new messages to the LLM for incremental updates
  5. Summary budget cap: 20% of compressed content, but never exceeding 12K tokens
SUMMARY_PREFIX = "[CONTEXT COMPACTION] Earlier turns compacted..."
_SUMMARY_RATIO = 0.20   # Allocate 20% of compressed content to summary
_SUMMARY_TOKENS_CEILING = 12_000  # Summary never exceeds 12K tokens
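
Put together, the whole scheme fits in a dozen lines. A minimal sketch, assuming hypothetical count_tokens and summarize_llm helpers and modeling messages as plain strings (the real implementation budgets the protected head/tail by tokens rather than message count):

def compress(messages, prev_summary, count_tokens, summarize_llm,
             head_keep=2, tail_keep=6):
    # Sketch only: Hermes protects head/tail by token budget, not message count.
    if len(messages) <= head_keep + tail_keep:
        return messages, prev_summary
    head, middle, tail = (messages[:head_keep],
                          messages[head_keep:-tail_keep],
                          messages[-tail_keep:])
    # Cheap preprocessing first: trim stale tool output before involving an LLM.
    middle = [m[:2000] for m in middle]
    # Summary budget: 20% of the compressed span, hard-capped at 12K tokens.
    cap = min(int(_SUMMARY_RATIO * sum(count_tokens(m) for m in middle)),
              _SUMMARY_TOKENS_CEILING)
    # Incremental update: previous summary + newly evicted turns -> new summary.
    new_summary = summarize_llm(prev_summary, middle, max_tokens=cap)
    return head + ["[CONTEXT COMPACTION] " + new_summary] + tail, new_summary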

Auxiliary Model System (agent/auxiliary_client.py, 1,926 lines) — the primary model handles reasoning and decision-making, while auxiliary models handle vision analysis, text summarization, sampling, and other low-complexity tasks. This lets you use Claude Opus for core reasoning while using Haiku for context compression, reducing costs by 10x.
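
The split can be expressed as a one-line dispatch. A sketch with placeholder model IDs (not actual Hermes config values):

# Placeholder model IDs; the point is the routing, not the names.
AUX_TASKS = {"summarize", "vision", "sample"}

def pick_model(task: str) -> str:
    # Low-complexity subtasks go to the cheap auxiliary model;
    # reasoning and tool-call decisions stay on the primary model.
    return "claude-haiku-4-5" if task in AUX_TASKS else "claude-opus-4"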

Layer 3: Tool Registry — An Elegant Solution to Circular Imports

tools/registry.py is only about 110 lines of code, but it solves a problem that plagues all Agent frameworks: circular imports between tool files.

The traditional approach is to import all tools in one large file, but this means any tool's import failure cascades to the entire system. Hermes Agent's approach:

Registry Pattern

Each tool file calls registry.register() at module level, declaring its name, toolset, JSON Schema, and handler function. Tools never import each other.

Lazy Discovery

model_tools.py (a mere 472 lines) triggers _discover_tools() only when tools are first needed, loading approximately 21 tool modules via importlib.import_module().

Failure Isolation

If a tool's import fails (e.g., missing dependencies), it only logs the error without affecting other tools or Agent startup.
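
The three ideas combine into roughly the following shape (module and function names are illustrative, not the actual Hermes API):

# registry.py (sketch): tools self-register; they never import each other.
import importlib
import logging

_TOOLS: dict[str, dict] = {}

def register(name, toolset, schema, handler):
    _TOOLS[name] = {"toolset": toolset, "schema": schema, "handler": handler}

def discover(module_names):
    # Lazy: called only when tools are first needed.
    for mod in module_names:
        try:
            importlib.import_module(mod)  # module-level register() calls fire here
        except Exception as exc:
            # Failure isolation: one broken tool never blocks Agent startup.
            logging.error("tool module %s failed to load: %s", mod, exc)
    return _TOOLS

# weather_tool.py (sketch): registration happens at import time, e.g.
# registry.register("get_weather", "web", {...JSON Schema...}, get_weather)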

This design pattern is worth adopting for all Agent frameworks. In FluxWise's enterprise Agent platform design, we have also adopted a similar registry pattern for managing pluggable business skills.

Layer 4: Skill System — The Agent's Long-Term Memory

Hermes ships 74 built-in skills plus 50+ optional ones, all stored as Markdown (YAML frontmatter + Markdown body) and compatible with the agentskills.io standard. The skill system uses progressive disclosure:

  1. Metadata layer: Name, description, trigger conditions — used for quick matching
  2. Full instruction layer: Detailed execution steps — loaded after a successful match
  3. Reference layer: Template files, reference documents — loaded on demand
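
To make the format concrete, a skill file in this style might look like the following (contents invented for illustration; the actual spec lives at agentskills.io):

---
name: release-notes
description: Draft release notes from a git log
triggers: ["release notes", "changelog"]
---
# Release Notes

1. Run git log --oneline <last-tag>..HEAD
2. Group commits into features / fixes / chores
3. Fill in references/template.md (a reference-layer file, loaded on demand)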

The most interesting feature is automatic skill creation. When the Agent completes a complex multi-step task, it automatically distills the operation sequence into a new skill file. The next time it encounters a similar problem, it calls the skill directly instead of re-reasoning. While still experimental, this capability points toward an AI system that truly accumulates experience.

Layer 5: Memory and Persistence

Three-layer memory architecture:

Honcho User Modeling — builds user profiles through dialectical Q&A, with semantic search + LLM synthesis. The result is an Agent that can say things like "based on our previous discussion about X..."

Session Search — SQLite + FTS5 full-text search virtual table, with 6 versioned schema migrations. LLM-driven cross-session recall, not simple keyword matching.

Procedural Memory — user's MEMORY.md and persona's SOUL.md, persistently stored in the file system.
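
The session-search layer is reproducible with nothing but the standard library, which is part of its appeal. A simplified sketch (the real schema has 6 versioned migrations; this has none):

import sqlite3

con = sqlite3.connect("sessions.db")
# FTS5 virtual table: a full-text index over message content
con.execute(
    "CREATE VIRTUAL TABLE IF NOT EXISTS messages USING fts5(session_id, role, content)"
)
con.execute(
    "INSERT INTO messages VALUES (?, ?, ?)",
    ("s1", "user", "deploy hermes agent to the staging VPS"),
)
con.commit()
# MATCH performs ranked lexical search -- no embeddings, hence the weak semantic recall
for row in con.execute(
    "SELECT session_id, content FROM messages WHERE messages MATCH ? ORDER BY rank",
    ("staging",),
):
    print(row)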

Layer 6: Intelligent Model Routing

agent/smart_model_routing.py (168 lines) implements a lightweight but practical multi-model router:

  • Rate limit detected → immediately switch to backup model
  • Context approaching limits → suggest cheaper alternatives
  • Multi-provider aware: fuzzy matching via models.dev registry + custom endpoint metadata

For enterprise users, this means: one configuration file handles multi-vendor failover, without handling OpenAI vs. Anthropic differences at the code level.
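
The failover half reduces to a loop over a preference list. A sketch (the model list is a placeholder, and the real router normalizes provider-specific errors rather than catching openai's exception directly):

from openai import RateLimitError

FALLBACKS = ["anthropic/claude-opus-4", "openai/gpt-4.1", "deepseek/deepseek-chat"]

def complete_with_failover(call, messages):
    last_exc = None
    for model in FALLBACKS:
        try:
            return call(model=model, messages=messages)
        except RateLimitError as exc:
            # Rate limit detected: switch to the next provider immediately, no backoff.
            last_exc = exc
    raise last_exc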

MCP Integration: Not Toy-Grade

Many frameworks claim to support MCP (Model Context Protocol), but Hermes Agent's implementation depth is on another level:

  • Complete OAuth 2.1 PKCE flow: Supports authentication for modern MCP servers like GitHub and Google
  • CSRF protection + state validation: Production-grade security standards
  • Automatic token refresh: Long-running Agents won't be interrupted by token expiration
  • Sampling support: MCP servers can request LLM completions during tool invocation
  • Fine-grained control: Each MCP server can have individual RPM limits, token caps, and model overrides

tools/mcp_tool.py + tools/mcp_oauth.py total 2,019 lines of code — this is not a demo; this is a production-grade implementation.
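
The PKCE portion of that flow is small enough to show whole (standard RFC 7636 steps; the authorization URL and client ID below are placeholders):

import base64
import hashlib
import secrets

# code_verifier: high-entropy random string kept locally until token exchange
verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
# code_challenge: base64url-encoded SHA-256 of the verifier, sent up front
challenge = base64.urlsafe_b64encode(
    hashlib.sha256(verifier.encode()).digest()
).rstrip(b"=").decode()
state = secrets.token_urlsafe(16)  # CSRF protection: must match on the callback

auth_url = (
    "https://mcp.example.com/authorize"
    f"?response_type=code&client_id=YOUR_CLIENT_ID&state={state}"
    f"&code_challenge={challenge}&code_challenge_method=S256"
)
# After the redirect: verify state, exchange code + verifier for tokens,
# then refresh automatically before expiry so long-running Agents never stall.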

Six Terminal Backends: From Laptops to HPC Clusters

When the Agent executes tools, it needs a runtime environment. Hermes Agent provides 6 options:

| Backend | Use Case | Cost | Isolation |
| --- | --- | --- | --- |
| Local Shell | Development & debugging | Free | None |
| Docker Container | Secure execution | Low | High |
| Remote SSH | Existing servers | Existing hardware | Medium |
| Modal Serverless | On-demand compute | Pay-per-second | Fully isolated |
| Daytona Persistent Containers | State persistence needed | Low | High |
| Singularity HPC | Scientific computing | Depends on cluster | Fully isolated |

The Modal backend is particularly interesting: the Agent costs almost nothing when idle (less than 1 cent), and automatically spins up GPU instances when code needs to execute. This is practical for manufacturing enterprises — your AI Agent doesn't need to occupy server resources 24/7, consuming resources only on-demand when tasks arise.
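
In Modal's own SDK, that pay-per-second pattern looks roughly like this (a sketch against Modal's public API; how Hermes wires its backend internally may differ):

import modal

app = modal.App("agent-sandbox")

@app.function(gpu="T4", timeout=600)  # the GPU container exists only while this runs
def run_snippet(code: str) -> str:
    import contextlib, io
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code)  # executes inside the container, not on your host
    return buf.getvalue()

@app.local_entrypoint()
def main():
    print(run_snippet.remote("print(2 ** 10)"))  # billed per second of execution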

Engineering Quality: 3,700 Tests Are Not Just for Show

  • 372 test files, approximately 3,700 test cases
  • Integration tests separated with @pytest.mark.integration
  • pytest-xdist support for parallel execution
  • Dependencies locked to explicit version ranges (supply chain security)
  • Pydantic 2.12+ for data validation
  • Comprehensive type annotations

Several engineering details in the code worth learning from:

Thread-Safe Async Bridging (model_tools.py) — the most common pitfall in Python async programming is asyncio.run() creating and then closing the event loop, which leaves cached httpx/AsyncOpenAI clients throwing "Event loop is closed" errors. Hermes Agent uses threading.local() to maintain a persistent event loop per thread, with three handling paths covering the main CLI thread, already-async contexts, and worker threads.
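
A sketch of the thread-local loop pattern (simplified from the description above, not copied from model_tools.py):

import asyncio
import threading

_local = threading.local()

def _thread_loop() -> asyncio.AbstractEventLoop:
    # One persistent loop per thread, so cached clients never see a closed loop.
    loop = getattr(_local, "loop", None)
    if loop is None or loop.is_closed():
        loop = asyncio.new_event_loop()
        _local.loop = loop
    return loop

def run_async(coro):
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # Sync caller (main CLI thread or a worker thread): reuse this thread's loop.
        return _thread_loop().run_until_complete(coro)
    # Already inside a running loop: schedule instead of blocking; caller awaits it.
    return asyncio.ensure_future(coro)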

Structured Summary Templates — summaries generated by context compression follow a fixed format: Goals, Progress, Decisions, File Changes, Next Steps. This ensures compressed information remains actionable rather than becoming vague generalizations.
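
That can amount to no more than a fixed instruction handed to the auxiliary summarizer (wording below invented for illustration):

SUMMARY_SECTIONS = ("Goals", "Progress", "Decisions", "File Changes", "Next Steps")
# Hypothetical instruction; the fixed sections keep every summary actionable.
SUMMARY_INSTRUCTIONS = (
    "Compress the evicted turns into exactly these sections, preferring concrete "
    "details (paths, commands, decisions) over prose: " + ", ".join(SUMMARY_SECTIONS)
)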

Enterprise Deployment: Strengths and Weaknesses

Enterprise-Grade Advantages

Multi-tenant messaging gateway — 12 platform adapters + session isolation. Building internal Slack/DingTalk bots without writing platform SDKs yourself.

Cost controllable — per-provider billing tracking, context compression to reduce token consumption, auxiliary model offloading for low-complexity tasks, automatic degradation on rate limits.

Audit compliance — SQLite session storage records complete message history, tool call logs, and cost breakdowns, exportable for GDPR/SOX compliance.

Security defenses — approval required before dangerous tool execution, path traversal protection, prompt injection defense, credential stripping from error messages.

Self-hosted — runs on a $5/month VPS, no cloud vendor dependency. MIT license, commercially usable.

But there are also hard limitations:

  1. Single-process constraint — no cluster/distributed Agent pool, so it cannot scale horizontally; each VPS runs one Agent instance
  2. Gateway coupling — a 6,332-line core routing file; extending new platforms requires modifying core code
  3. No vector retrieval — memory search relies on FTS5 full-text matching, with weak semantic recall capabilities
  4. Insufficient team documentation — tutorials target individual developers, lacking enterprise-level deployment guides (multi-user collaboration, secret management, monitoring & alerting)

Verdict: Who Should Use It, Who Shouldn't

Good fit for:

  • Teams needing an internal AI assistant that remembers users and improves with use
  • Deploying AI capabilities within DingTalk/Feishu/Slack and other existing tools
  • R&D teams wanting Agents for paper reading, code review, knowledge accumulation
  • Budget-conscious teams needing multi-model intelligent routing for cost control

Not a good fit for:

  • High-throughput systems requiring hundreds of concurrent Agents processing tasks
  • Engineering teams needing deep customization at every level (better off building with LangChain)
  • Simple ChatBot functionality (overkill)

Hermes Agent's positioning is clear: it is not a library, it is a product. You don't build on top of it — you deploy it, configure it, and let it evolve on its own. For 80% of enterprise AI use cases, this may be the right approach: instead of spending 3 months building an Agent from scratch with LangChain, spend 3 days deploying Hermes Agent and let it learn your business.

Want to learn more?

Book a free business consultation and see what AI can do for your company.