Hermes Agent Guide to Multi‑Agent AI: Setup, Benchmarks, and Real‑World Use

AI agents now do more than chat. They can remember past work, call tools, and run on your own server. Hermes Agent is a new open‑source agent that learns from experience and supports multi‑agent workflows.

This guide explains what Hermes Agent is, how multi‑agent AI works, and why this mix matters.

What Is Hermes Agent and Multi‑Agent AI?

Hermes Agent is an open‑source autonomous AI agent built by Nous Research. It runs on your own infrastructure, remembers what it learns, and creates reusable “skills” from your tasks.

You can talk to it from the terminal, Telegram, Discord, Slack, or WhatsApp through a single gateway. It works with many language models, including Hermes 4, OpenAI models, Anthropic models, and any OpenAI‑compatible endpoint.

Multi‑agent AI is a system where several agents work together in one environment.
Each agent is an independent component that can take actions, hold goals, and share information with others.

These agents can split a complex task into parts, coordinate, and then combine their results into one answer. This approach helps with large, changing problems that are hard for a single agent.

Key Features of Hermes Agent

Core Agent Features

Persistent cross‑session memory – Hermes stores past conversations and events in a SQLite database with FTS5 full‑text search and language‑model summarization, so it can recall relevant history across sessions.
Self‑created skills – After complex tasks, Hermes can write “skill documents” as markdown files using the agentskills.io open standard, then reuse and improve these skills over time.
Honcho user modeling – Hermes integrates with Honcho to build a profile of your preferences, work style, and recurring patterns, which shapes future decisions.
40+ built‑in tools – It ships with tools for web search, file operations, terminal access, browser automation, image generation, speech synthesis, and more.

Deployment and Integration

Runs anywhere – Hermes supports six terminal backends: local, Docker, SSH, Daytona, Singularity, and Modal, from a $5 VPS up to GPU clusters or serverless setups.
Multi‑platform messaging – A unified gateway lets you talk to the agent from CLI, Telegram, Discord, Slack, and WhatsApp while it keeps one shared context.
Model‑agnostic design – You can connect Nous Portal, OpenRouter, OpenAI, Anthropic, or any OpenAI‑compatible API, and even local models through Ollama or vLLM.
MCP and external tools – Hermes supports the Model Context Protocol (MCP) and other integrations to call external tools and services as part of workflows.

Multi‑Agent and Automation Features

Delegation and subagents – Hermes can spawn subagents for separate workstreams and run them in parallel, then aggregate their outputs.
Cron‑based scheduling – A built‑in scheduler lets you run tasks on a schedule, send reports, or monitor systems without manual prompts.
Research and RL support – Hermes includes an environment framework for batch processing, trajectory export, and reinforcement learning through the Atropos backend.
Transparent skill evolution – Skills are plain files you can inspect, edit, share, or remove, which gives clear control over how the agent grows.

How to Install or Set Up Hermes Agent

This section focuses on Linux, macOS, and WSL2, which the project supports.

Option 1: One‑Line Quick Install

Check Git is installed
Run git --version and confirm you see a version number, because the installer needs Git.
Run the installer
On Linux, macOS, or WSL2, run:
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
Reload your shell
Run source ~/.bashrc or source ~/.zshrc so the new hermes command is on your PATH.
Verify the install
Run hermes version and hermes doctor to confirm dependencies, model provider, and config.
Start the agent
Run hermes to open an interactive chat in your terminal.

The installer sets up Python, Node.js, ripgrep, ffmpeg, a virtual environment, and the hermes entry point.

Option 2: Manual Install (More Control)

Clone the repo
Run git clone --recurse-submodules https://github.com/NousResearch/hermes-agent.git then cd hermes-agent.
Install uv and create a virtual environment
Run curl -LsSf https://astral.sh/uv/install.sh | sh then uv venv venv --python 3.11.
Install Python dependencies
Export VIRTUAL_ENV="$(pwd)/venv" and run uv pip install -e ".[all]" to install Hermes with all extras.
Install optional Atropos backend
Run uv pip install -e "./tinker-atropos" if you want RL environments.
Install Node.js dependencies for browser tools
Run npm install in the repo root to support browser automation and WhatsApp.
Create config folders
Create ~/.hermes subfolders for sessions, logs, memories, skills, hooks, and caches, and copy cli-config.yaml.example to ~/.hermes/config.yaml.
Add API keys
Edit ~/.hermes/.env and set keys like OPENROUTER_API_KEY and optional keys for web scraping or image generation.
Add hermes to PATH
Symlink venv/bin/hermes into ~/.local/bin and ensure that folder is on your PATH.
Configure the model provider
Run hermes model to pick your provider and preferred model.
Run diagnostics and start
Use hermes doctor to check your setup, then run hermes to start chatting.

Windows is only supported through WSL2; native Windows shells are not supported.

How to Run or Use Hermes Agent

First Run in the Terminal

When you run hermes, you get a simple chat interface in your terminal.
On the first run, it may ask you to pick a model provider if no key is set.

After that, you can type a natural language request like “Help me organize my project tasks for this week.” Hermes will plan the steps, use tools like web search or file access if needed, and respond.

If you connect it to a model like Hermes 4 70B through a provider, you get strong reasoning and tool‑use skills. Hermes Agent then uses its orchestration loop to decide when to search, when to run code, and when to ask follow‑up questions.

ts memory layer stores the session summary and key facts for later recall.
Over time, repeated patterns turn into skills that it can trigger for similar tasks.

Connecting Messaging Platforms

You can run hermes gateway setup to connect Telegram, Discord, Slack, or WhatsApp. This sets up a unified gateway service that routes chats from these platforms into the same agent.

After setup, sending a message on Telegram is like talking directly to the terminal agent. Hermes keeps one memory across channels, so it can link a Slack request to a Telegram follow‑up.

Using Tools and Multi‑Agent Workflows

Hermes can call tools such as: web search, terminal commands, browser automation, and code execution. You can manage tools with hermes tools, enabling or disabling groups for safety or focus.

For complex work, Hermes can spawn subagents for parallel tasks, such as code refactoring and documentation at the same time. Each subagent uses the shared memory and skills, then returns its result to the main agent for final synthesis.

A typical pattern is: one subagent collects research, another edits files, and a third runs tests. This looks like a multi‑agent system but feels like a single assistant from the user’s view.

Hermes’s scheduler can then run these workflows on cron, such as nightly reports or health checks. All results and important events become part of its long‑term memory and skills.

Benchmark Results

Hermes Agent itself is a framework, so raw model benchmarks depend on the LLM you use. Many users pair it with Hermes 4 70B or similar strong models, so this table focuses on model‑level scores.

Model Benchmarks Relevant to Hermes Agent

Model	Benchmark	Score / Metric	Notes
Hermes 4 70B	MMLU Pro	66.4%	Academic knowledge benchmark; Gemini 3 Pro scores 89.8% for context.
Hermes 4 70B	GPQA	49.1%	Graduate‑level science reasoning.
Hermes 4 70B	Throughput	~74.6 tokens per second	Measured across providers on PricePerToken and model trackers.
Hermes 4.3 36B (Psyche)	MATH‑500	93.8%	Competitive with larger closed models on math tasks.
GPT‑4o (reference)	GPQA	~54–55%	Reported in model comparisons; used as a yardstick.
Claude 3.5 Sonnet	MMLU	~90.4%	Strong on general knowledge benchmarks.
Claude 3.5 Sonnet	GPQA	~59–67% (5‑shot)	High graduate‑level reasoning scores.
Gemini 3 Pro Preview	MMLU Pro	89.8%	Very strong academic knowledge.
Gemini 3 Pro Preview	GPQA	90.8%	High graduate‑level reasoning.

Hermes Agent can use any of these models as the “brain” behind its tools and multi‑agent workflows. The choice depends on your budget, latency goals, and target quality.

Multi‑Agent Framework Benchmark (Research Example)

A separate academic Hermes framework (telecom network planning) shows why multi‑agent structures help. In that study, a multi‑agent Hermes setup reached 75–85% success on complex network design tasks, above chain‑of‑thought baselines.

This supports the idea that structured agent teams can solve tasks that overwhelm a single reasoning loop.

Testing Details

Public model benchmarks are based on standard suites like MMLU, GPQA, MATH‑500, MT‑Bench, and others. These tests run fixed question sets that cover areas such as science, law, common sense, and math reasoning.

For example, Hermes 4 models were tested on MATH‑500, where a 36B Psyche build scored 93.8%. They also appear in price‑per‑token dashboards that log throughput, latency, and benchmark scores by provider.

Multi‑agent system tests, like the telecom Hermes framework, used simulation tasks that required many modeling blocks. These tasks combined multiple steps, such as predicting SINR values after base station changes.

The study ran 20 independent trials per task and counted runs with errors under 10% as successful. This gave success rates that showed Hermes agents outperformed chain‑of‑thought and single‑agent coder baselines, especially on harder tasks.

Comparison Table: Hermes Agent vs Other Multi‑Agent Frameworks

This table compares Hermes Agent with three well‑known frameworks: OpenClaw, CrewAI, and LangGraph.

Criteria	Hermes Agent	OpenClaw	CrewAI	LangGraph (LangChain)
Open source core	Yes, fully open source, self‑hosted.	Yes, open‑source agent framework.	Open‑source framework plus paid cloud.	Open‑source library plus paid cloud.
Memory	Multi‑level memory with FTS5 search and skill docs.	Markdown‑based persistent memory on local files.	Role‑based agents with RAG‑style memory.	State‑based memory with checkpoints.
Skills / tools	40+ built‑in tools, skill documents via agentskills.io.	Skill system with configurable permissions and local tools.	Task‑oriented agents, tools and crews.	Nodes as functions or agents; uses LangChain tools.
Multi‑agent style	Single agent with subagents and delegation.	Multi‑agent routing and heartbeat scheduler.	Role‑based crews of agents.	Graph of agents / tools as nodes.
Hosting model	Self‑host only, runs on VPS, local, or serverless backends.	Self‑hosted on your hardware or VPS.	Open‑source self‑host plus managed cloud tiers.	Open source self‑host plus hosted “LangGraph Cloud.”
Pricing model	Software free; you pay only for models and infra.	Software free; you pay for models and server.	Tiered SaaS pricing from $99/month up to large enterprise.	Free open source; paid usage‑based cloud at $0.001 per node.
Best suited for	Long‑term personal or team assistant that learns.	High‑autonomy “AI that does things” across many apps.	Structured multi‑agent business workflows.	Complex, branching workflows and enterprise graphs.

Pricing Table

Hermes Agent software is free, but real costs come from models and infrastructure. Other frameworks offer clear SaaS tiers.

Tier Type	Tool / Plan	Platform Cost (Typical)	Notes
Free	Hermes Agent self‑hosted	$0 for software; pay only for API tokens or local host.	Works on a $5 VPS or local machine; supports many models.
Free	LangGraph open source	$0 for library; infra and models separate.	Good for teams that self‑host orchestration.
Free	OpenClaw self‑hosted	$0 for framework; VPS and tokens extra.	Common setup uses a low‑cost VPS and pay‑per‑token models.
Paid	CrewAI Basic	$99/month, limited executions.	Cloud platform for small deployments.
Paid	CrewAI Standard / Pro	$6,000–$12,000/year.	More executions, more crews, extra support.
Paid	LangGraph Plus (cloud)	$0.001 per node + standby charges; LangSmith $39/user/month.	Usage‑based; first 100k nodes often free on dev tier.
Enterprise	CrewAI Enterprise	~$60,000/year.	10,000 executions per month and many deployed crews.
Enterprise	CrewAI Ultra	~$120,000/year.	500,000 executions per month, 100 crews.
Enterprise	LangGraph Enterprise	Custom pricing.	For large teams and strict security needs.

For Hermes Agent, model costs matter more than platform fees.
For example, Hermes 4 70B through Nous costs about $0.13 per million input tokens and $0.40 per million output tokens.

Unique Selling Proposition (USP) of Hermes Agent

Hermes Agent combines three traits that are rare in one tool.

First, it has deep, persistent memory that covers sessions, user preferences, and reusable skills.

Second, it runs entirely on your own infrastructure, so you keep control over data and hosting choices.

Third, it contains a closed learning loop that turns long‑running use into better performance through new skills and better recall.

Many frameworks offer multi‑agent orchestration, but Hermes focuses on a self‑improving agent that feels like one growing teammate, not just a static workflow.

Pros and Cons

Pros

Open‑source and free to run, with no subscription for the core agent.
Strong multi‑level memory and user modeling that reduce repeated instructions.
Self‑created skills that capture your workflows as reusable procedures.
Runs on many backends, from a $5 VPS to serverless Modal or Daytona.
Works with many LLMs and providers, including the Hermes model family and mainstream APIs.
Single gateway for CLI, Telegram, Discord, Slack, and WhatsApp.
Built‑in support for research, batch processing, and RL experiments.

Cons

No hosted SaaS version; you must manage your own server or VPS.
Setup needs basic command‑line comfort and some knowledge of API keys.
Model costs can grow with heavy use, especially with large models like Hermes 4 70B.
Community ecosystem and skills library are newer than older frameworks like LangGraph or AutoGen.
Native Windows support is missing; you must use WSL2.

Quick Comparison Chart

Aspect	Hermes Agent	Classic Single Agent	Typical Multi‑Agent Framework (CrewAI / LangGraph)
Memory	Persistent, cross‑session with summaries.	Often session‑only history.	Varies; some state and RAG for workflows.
Skill growth	Autonomously writes and improves skills.	None; behavior stays fixed.	Depends on custom code or external storage.
Hosting	Self‑host only, wide backend choice.	Cloud or vendor‑hosted in many tools.	Self‑host plus managed SaaS options.
Multi‑agent support	Single agent with subagents and cron.	Usually single loop only.	Rich orchestration graphs or crews.
Setup effort	One‑line install plus config.	Often just a web UI.	Framework plus extra infra; more design work.
Best for	Ongoing personal or team assistant that learns.	Simple Q&A and chat.	Complex enterprise workflows and pipelines.

Demo or Real‑World Example: Using Hermes as a Long‑Term Project Assistant

This example shows a developer using Hermes Agent as a long‑term assistant for a software project.

Step 1: Install and Connect a Model

The developer installs Hermes on a small VPS using the one‑line installer.
They run hermes model and choose a provider like OpenRouter with Hermes 4 70B or a smaller model.

They then add the API key to ~/.hermes/.env.

Finally, they connect Telegram through hermes gateway setup so they can chat from their phone.

They upload their project README and key design docs into a folder that Hermes can read.

Using a “context file” feature described in community guides, they point Hermes at this folder so it can index it.

Hermes summarizes the codebase and stores these summaries in its memory layer.
From now on, it can recall project structure and important decisions without new uploads.

Step 3: Daily Multi‑Agent‑Style Workflow

Each morning, the developer sends a Telegram message like: “Review new pull requests, update the changelog, and propose release notes.”

Hermes decomposes this into subtasks: code review, documentation updates, and release note drafting.

One subagent reads pull requests and comments on style and possible bugs using terminal and Git tools. Another subagent edits the changelog file and suggests version bumps.

Hermes then merges these results into a single summary plus suggested Git commands.

The developer reviews and applies the commands on their own machine for safety.
Hermes stores the successful workflow as a skill document, so next time it can run a similar flow with less guidance.

Over weeks, it learns the team’s preferred commit style, release schedule, and code review tone.

Step 4: Scheduled Automation

The developer uses Hermes’s cron scheduler to set a weekly “project health report.”

At a fixed time, Hermes scans logs, open issues, and recent commits. It then posts a report in a shared Slack channel through the messaging gateway.

Team members see one steady assistant that tracks progress and highlights risks, powered by Hermes’s multi‑level memory and skills.

Conclusion

Hermes Agent brings multi‑agent ideas into a self‑hosted assistant that grows with real use. Its memory system, skills, and subagent support move it beyond a simple stateless chatbot.

Because it is open source and model‑agnostic, you can tune cost and performance to your needs. For people who want an AI teammate that remembers work and improves across months, Hermes Agent offers a strong option.

FAQ

1. Is Hermes Agent a single agent or a full multi‑agent system?

Hermes acts as one main agent that can spawn subagents for separate tasks.
This gives multi‑agent behavior while keeping a single shared memory and skills library.

2. Do I need a powerful server to run Hermes Agent?

No. Many users run it on a $5 VPS or a basic home machine.
Larger models or heavy workloads may need stronger CPUs or GPUs.

3. Can Hermes Agent work fully offline?

Hermes can use local models through tools like Ollama or vLLM, which reduces outside calls.
Some web tools and external APIs still need network access if you enable them.

4. How is Hermes Agent different from OpenClaw?

OpenClaw focuses on a multi‑agent gateway with strong task autonomy and routing.
Hermes focuses on one self‑improving agent with deep memory and skills that lives on your server.

5. Who should consider Hermes Agent today?

Good fits include developers and teams who want a long‑term AI partner that learns their stack.
It also suits power users who value open source, data control, and flexible model choices.