Nvidia NemoClaw + OpenClaw: Secure Sandbox Guide for Local vLLM Agents

Nvidia NemoClaw is a new open-source stack that adds privacy and security controls to the fast-growing OpenClaw agent platform. It wraps OpenClaw agents in Nvidia’s OpenShell sandbox and connects them to local and cloud language models. This guide explains the main ideas and shows how to set up a secure local vLLM backend. It also shares benchmark data and compares NemoClaw with other agent frameworks.

What Is Nvidia NemoClaw + OpenClaw?

OpenClaw is a free, open-source, self-hosted AI agent that runs on your own hardware and connects to many chat channels and tools. It can use local and cloud models to automate tasks such as coding, file work, and web research. NemoClaw is Nvidia’s open-source stack that adds a secure runtime, models, and policies around OpenClaw with a single installation command.

NemoClaw uses Nvidia’s Agent Toolkit and the new OpenShell runtime to isolate agents in sandboxed environments. A sandbox is a locked area where the agent process runs with strict rules for file access, network connections, and data handling. This helps reduce the risk that a bug, a malicious skill, or a prompt attack can damage your system or leak data.

NemoClaw also makes it easier to mix local and cloud models for OpenClaw. It can run Nvidia Nemotron models locally on RTX GPUs or DGX systems and route some requests to frontier cloud models through a privacy router. A privacy router is a gateway that controls which calls can go to the internet and can hide sensitive fields in those calls.

Core Components in the Stack

The NemoClaw + OpenClaw stack has four main parts.

  • OpenClaw core: the agent platform with skills, channels, and workflows.
  • NemoClaw plugin: commands and “blueprints” that wire OpenClaw into OpenShell and model providers.
  • OpenShell runtime: a secure sandbox with kernel-level isolation using features such as Landlock, seccomp, and network namespaces.
  • Model backends: Nemotron and other models via Nvidia cloud, vLLM servers, or tools such as Ollama.

Several reports describe OpenClaw as an “operating system for personal AI agents,” and NemoClaw adds the missing security layer around it for enterprise use.

Key Features

NemoClaw focuses on security, privacy, and practical deployment for always-on OpenClaw agents.

  • Open-source stack
    NemoClaw and OpenShell are released as open source under Apache 2.0 style terms, so you can download, modify, and deploy them.
  • One-command installation
    An official script installs OpenShell, configures NemoClaw, and links it to OpenClaw, which reduces manual setup steps.
  • Secure sandbox runtime
    OpenShell enforces policy-based control over file system access, network egress, and process capabilities for every agent.
  • Privacy router for model calls
    Model requests flow through a gateway that can hide or strip sensitive data before they reach cloud providers.
  • Local and cloud model support
    NemoClaw supports Nemotron models on local GPUs and can connect to cloud models through APIs for hybrid workloads.
  • Hardware-agnostic deployment
    NemoClaw and OpenShell run on general Linux servers and are not locked to Nvidia hardware, though performance is better on Nvidia GPUs.
  • Integration with OpenClaw ecosystem
    The stack works with existing OpenClaw skills, channels, and templates, so current agents can move into the sandbox with few changes.

How to Install or Set Up

These steps assume a Linux host such as Ubuntu 22.04 with sudo access and internet connectivity.

1. Check Hardware and OS

  • CPU: at least 4 vCPUs.
  • RAM: 8 GB minimum, 16 GB recommended.
  • Disk: at least 20 GB free.
  • OS: Ubuntu 22.04 LTS or newer.
  • GPU: an Nvidia RTX card for strong local inference, or CPU-only for smaller models.
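Before installing anything, a quick preflight check against these minimums can save time. The sketch below uses only the standard library and is Linux-oriented; the RAM probe relies on sysconf and is an illustration, not an official NemoClaw tool.

```python
import os
import shutil

def preflight(min_cpus=4, min_ram_gb=8, min_disk_gb=20):
    """Report whether the host meets the minimum specs listed above."""
    cpus = os.cpu_count() or 0
    # Total RAM in bytes via sysconf (Linux); may be unavailable elsewhere.
    try:
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (ValueError, OSError, AttributeError):
        ram_gb = 0.0
    disk_gb = shutil.disk_usage("/").free / 1024**3
    return {
        "cpus_ok": cpus >= min_cpus,
        "ram_ok": ram_gb >= min_ram_gb,
        "disk_ok": disk_gb >= min_disk_gb,
    }

print(preflight())
```

If any flag comes back False, fix the hardware gap before moving on to the software install.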

2. Install Required Software

Install core dependencies.

  1. Update system packages with your package manager.
  2. Install Node.js 20 or later and npm 10 or later for the OpenClaw and NemoClaw CLIs.
  3. Install Docker and confirm that the Docker daemon is running, because OpenShell uses containers to run sandboxes.
  4. Optionally install vLLM or Ollama for local models and verify that the model server runs on a local port.

3. Install OpenClaw

Use the official OpenClaw documentation for installation on Linux.

  1. Clone the official OpenClaw repository from GitHub.
  2. Run the installer or setup script described in the README.
  3. Configure at least one control channel, such as the terminal user interface or web dashboard.

4. Install OpenShell Runtime

OpenShell is the secure runtime that NemoClaw uses for sandboxing.

  1. Install OpenShell from Nvidia’s releases or from the DGX Spark “NemoClaw” guide.
  2. Confirm that the openshell CLI runs and can create a basic sandbox.
  3. Verify that Docker integration and kernel features such as Landlock and seccomp are enabled.
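Landlock shipped in Linux 5.13, so one quick sanity check for step 3 is to compare the kernel release against that floor. A passing version check is necessary but not sufficient, since a kernel can still be built without Landlock; this helper is an illustrative sketch, not part of the OpenShell CLI.

```python
import re

# Landlock was merged in Linux 5.13, so older kernels cannot provide it.
LANDLOCK_MIN = (5, 13)

def kernel_supports_landlock(release: str) -> bool:
    """Parse a `uname -r` style string such as '5.15.0-91-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2))) >= LANDLOCK_MIN

print(kernel_supports_landlock("5.15.0-91-generic"))  # True
print(kernel_supports_landlock("5.4.0-150-generic"))  # False
```

Feed it the output of `uname -r`; if it returns False, upgrade the kernel before expecting Landlock policies to work.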

5. Install NemoClaw Plugin

NemoClaw provides the orchestration layer between OpenClaw and OpenShell.

  1. Run the official installation script, which looks like:
    curl -fsSL https://nvidia.com/nemoclaw.sh | bash
  2. This installs the NemoClaw CLI and downloads a versioned blueprint for sandbox orchestration.
  3. Confirm installation with nemoclaw --help and openclaw nemoclaw status.

6. Run NemoClaw Onboard Wizard

The onboard wizard configures core components for the first run.

  1. Run nemoclaw onboard.
  2. Provide an Nvidia API key if you want to use Nemotron models from Nvidia cloud.
  3. Choose a default inference provider: Nvidia cloud, a local vLLM server, or another backend.
  4. Let the wizard create the OpenShell gateway, sandbox, and base policies.

7. Launch the Sandboxed Agent

Launch OpenClaw inside the NemoClaw-managed sandbox.

  1. Use the OpenClaw CLI with the NemoClaw plugin, for example:
    openclaw nemoclaw launch --profile my-assistant
  2. Wait while the blueprint runner creates the sandbox container and applies network and file policies.
  3. Check health with openclaw nemoclaw status and inspect logs with openclaw nemoclaw logs -f.

When status is healthy, the OpenClaw agent runs inside an OpenShell sandbox and can use local or cloud models.

How to Run or Use It

From a user’s view, the stack behaves like normal OpenClaw, but with extra security and routing layers. The focus here is a setup that uses a local vLLM server as the primary backend.

1. Prepare a Local vLLM Server

vLLM is an inference engine that serves large language models with high throughput and low latency.

  1. Install vLLM in a virtual environment or container on the same host or on a GPU server.
  2. Download a compatible model, such as a Qwen2.5 Coder variant or an open Nemotron release.
  3. Start the vLLM HTTP server and note the base URL, for example http://localhost:8000.

If the vLLM server is on a remote GPU machine, expose it with SSH port forwarding or a secure tunnel, not through a public port.
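vLLM exposes an OpenAI-compatible HTTP API under /v1, so any OpenAI-style client can talk to it with a plain chat-completions request. The sketch below only builds the request body rather than sending it; the base URL and model name are assumptions for a typical local setup, not values NemoClaw requires.

```python
import json

# Assumed local endpoint and model -- substitute what your server serves.
BASE_URL = "http://localhost:8000"

def chat_request(prompt: str,
                 model: str = "Qwen/Qwen2.5-Coder-7B-Instruct",
                 max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completions request for vLLM."""
    return {
        "url": f"{BASE_URL}/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
            "stream": False,
        },
    }

req = chat_request("Say hello in one sentence.")
print(json.dumps(req["body"], indent=2))
```

Once the server is up, send it with any HTTP client, for example `requests.post(req["url"], json=req["body"])`, and confirm a completion comes back before wiring NemoClaw to the endpoint.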

2. Configure NemoClaw to Use vLLM

NemoClaw’s wizard can register multiple providers, including a local vLLM endpoint.

  1. Run nemoclaw providers add or re-run nemoclaw onboard.
  2. Choose a custom HTTP provider and enter the vLLM base URL.
  3. Map one or more logical model IDs to the vLLM deployment, such as local/qwen2.5-coder.
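The mapping the wizard creates can be pictured as a small registry from logical model IDs to concrete backends. This is an illustrative sketch of that idea, not NemoClaw's real data model; the endpoint and model names are assumptions.

```python
def add_provider(registry: dict, logical_id: str, base_url: str, model: str) -> dict:
    """Map a logical model ID (as OpenClaw sees it) to a concrete backend."""
    registry[logical_id] = {"base_url": base_url, "model": model}
    return registry

def resolve(registry: dict, logical_id: str) -> dict:
    """Look up the backend for a logical ID, failing loudly if unmapped."""
    try:
        return registry[logical_id]
    except KeyError:
        raise LookupError(f"no provider registered for {logical_id!r}")

providers = {}
add_provider(providers, "local/qwen2.5-coder",
             "http://localhost:8000", "Qwen/Qwen2.5-Coder-7B-Instruct")
print(resolve(providers, "local/qwen2.5-coder")["base_url"])
```

The indirection is the point: agents reference local/qwen2.5-coder, and the registry decides which server actually answers, so you can swap backends without touching agent configuration.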

3. Update OpenClaw Model Configuration

OpenClaw stores model providers in a JSON or YAML configuration file.

  1. Open the model configuration section.
  2. Add a provider entry for a local backend, such as vLLM or Ollama, pointing to the NemoClaw or gateway URL.
  3. Set the default agent model to the local ID, for example local/qwen2.5-coder.

OpenClaw now sends model requests through NemoClaw and OpenShell to your vLLM server.
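As a rough illustration of steps 2 and 3, the fragment below generates a provider entry of the kind described. The key names are assumptions for illustration only; OpenClaw's actual configuration schema may differ, so check its documentation for the real field names.

```python
import json

# Illustrative only: these keys are assumptions, not OpenClaw's documented schema.
config = {
    "providers": {
        "local-vllm": {
            "type": "openai-compatible",
            "base_url": "http://localhost:8000/v1",
            "models": ["local/qwen2.5-coder"],
        }
    },
    "defaults": {"agent_model": "local/qwen2.5-coder"},
}

# Emit the JSON you would merge into the model configuration section.
print(json.dumps(config, indent=2))
```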

4. Start the Agent Inside the Sandbox

Use the NemoClaw commands to connect to the sandbox shell.

  1. Run nemoclaw my-assistant connect.
  2. A shell prompt appears inside the sandbox, where OpenClaw runs under OpenShell controls.
  3. From here, use the OpenClaw terminal interface or web dashboard to send prompts and watch logs.

Every prompt passes through sandbox policies, then to the vLLM backend, and then back to OpenClaw for planning and actions.

5. Example: Local Coding Agent

With vLLM and a coding model such as Qwen2.5 Coder, NemoClaw can drive a local coding assistant.

  • The user sends a message like “Refactor this Python script and add logging.”
  • OpenClaw receives the request on Telegram, terminal, or another channel and turns it into a task.
  • NemoClaw routes the model call to the local vLLM server inside the sandbox.
  • The agent plans steps, edits files inside the sandbox file system, and reports results back.

The sandbox rules stop the agent from touching unapproved paths or making network calls to unknown hosts.
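The path and host rules described above amount to allowlist checks. Here is a minimal sketch of that idea in Python; OpenShell's real enforcement happens at the kernel level (Landlock, network namespaces), and a real policy engine would resolve symlinks and ".." components before comparing paths, which this sketch does not do.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical policy: paths and hosts the agent may touch.
ALLOWED_PATHS = ["/workspace", "/tmp/agent"]
ALLOWED_HOSTS = {"localhost", "api.nvidia.com"}

def path_allowed(path: str) -> bool:
    """True if the path sits under an approved root (no '..' resolution here)."""
    p = PurePosixPath(path)
    return any(p.is_relative_to(root) for root in ALLOWED_PATHS)

def host_allowed(url: str) -> bool:
    """True if the URL's host is on the egress allowlist."""
    return urlparse(url).hostname in ALLOWED_HOSTS

print(path_allowed("/workspace/project/main.py"))  # True
print(path_allowed("/etc/shadow"))                 # False
print(host_allowed("http://localhost:8000/v1"))    # True
print(host_allowed("https://evil.example.com"))    # False
```

The deny-by-default shape is what matters: anything not explicitly allowed is blocked, which is the property that limits damage from a compromised skill or prompt injection.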

Benchmark Results

Below is real performance data from public Nemotron, Qwen2.5 Coder, and vLLM benchmarks. These numbers show expected ranges, not exact results for every NemoClaw deployment.

| Setup | Model and Provider | Hardware | Output Speed (tokens/s) | Notes |
|---|---|---|---|---|
| A | Nemotron 3 Super 120B A12B (reasoning) via DeepInfra | Cloud GPUs | 515.7 | Fastest listed provider in one benchmark set. |
| B | Nemotron 3 Super 120B A12B via Lightning AI | Cloud GPUs | 491.1 | Slightly lower speed, similar latency. |
| C | Qwen2.5 Coder 7B via local runtime | RTX 3090 | ~27.9 in community tests | Full GPU offload and flash attention enabled. |
| D | Qwen2.5 Coder 32B via local runtime | Apple M4 Max | ~14 in reported tests | Throughput depends on context length and quantization. |

When NemoClaw and OpenClaw route to these backends, end-to-end speed also depends on sandbox overhead, network path, and tool-calling depth.

Testing Details

Different sources describe NemoClaw and OpenShell performance in qualitative terms, while community tests give concrete numbers for vLLM-based setups.

What Was Tested

  • Nemotron 3 Super providers
    ArtificialAnalysis tracks output speed and latency for Nemotron 3 Super 120B across multiple providers such as DeepInfra and Lightning AI.​
  • Local Qwen2.5 Coder models
    Community benchmarks in LocalLLaMA threads report tokens per second for Qwen2.5 Coder models on various GPUs and quantization levels.​
  • vLLM video workload
    A vLLM GitHub issue shows prompt throughput near 868 tokens/s and generation throughput around 10 tokens/s for a video description task.​

How the Tests Ran

Nemotron benchmarks measure how many tokens per second providers return once streaming begins, plus time to first token. They usually fix input length and compute end-to-end time for 500 output tokens. Community Qwen2.5 Coder tests share rig specs and split prompt and response throughput.

The vLLM video case uses about 30 frames of 360p video and a short prompt like “describe this video,” then tracks throughput and GPU use. These results show the impact of vision encoders and long context on speed.
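The timing method described above, time to first token plus streaming time for a fixed output length, reduces to simple arithmetic. The sketch below applies it to the two cloud speeds from the benchmark table, with an illustrative 0.5 s time to first token (an assumption, not a measured value).

```python
def end_to_end_seconds(ttft_s: float, tokens: int, tokens_per_s: float) -> float:
    """Time to first token plus streaming time for the remaining output."""
    return ttft_s + tokens / tokens_per_s

# 500 output tokens at the two cloud speeds from the benchmark table,
# assuming an illustrative 0.5 s time to first token.
for speed in (515.7, 491.1):
    print(f"{speed} tok/s -> {end_to_end_seconds(0.5, 500, speed):.2f} s")
```

At these speeds the streaming term dominates only slightly, which is why cloud providers with similar tokens/s can still differ noticeably on perceived latency when their time to first token diverges.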

Key Findings

  • Cloud Nemotron providers can exceed 450 tokens/s output speed with sub-second latency to first token.
  • Local Qwen2.5 Coder models often reach 30 to 55 tokens/s on strong GPUs for code tasks.
  • Vision tasks and very long contexts reduce throughput even with fast hardware and tuned runtimes.

For NemoClaw + OpenClaw, a local vLLM backend with a 7B or 14B model often gives a good balance between speed and hardware cost.

Comparison Table

Comparison of NemoClaw + OpenClaw with plain OpenClaw, OpenAI Swarm, and LangGraph.

| Criterion | NemoClaw + OpenClaw | OpenClaw (alone) | OpenAI Swarm | LangGraph |
|---|---|---|---|---|
| Core type | Secure stack for OpenClaw agents with sandbox and model routing | Self-hosted multi-channel agent platform without built-in sandbox | Experimental multi-agent orchestration library from OpenAI | Agent framework and orchestration library with hosted platform |
| Security runtime | OpenShell sandbox, kernel-level isolation, privacy router | No standard sandbox; depends on host OS, Docker, and community patterns | Depends on host environment; focuses on agent handoff logic | No built-in OS sandbox; focuses on graph orchestration and state |
| License | Apache 2.0 open source for NemoClaw and OpenShell | MIT / open source in official repo | Open-source framework, free core | Open-source core (MIT) plus paid hosted tiers |
| Model support | Nemotron local, vLLM, and cloud frontier models via router | Local models via Ollama and LM Studio, plus cloud APIs | OpenAI models and tools, with some external integrations | Any model reachable from user code or integrations |
| Target users | Developers and enterprises that need secure OpenClaw deployments | Power users and teams that accept more manual security work | Developers prototyping multi-agent apps on OpenAI stack | Teams building complex agent graphs with observability |
| Deployment | Runs on local Linux, RTX PCs, DGX, and cloud servers | Runs on local machines or servers; no extra runtime layer | Library runs wherever Python runs; many use cloud hosts | Library for self-hosting plus paid managed SaaS |

Pricing Table

NemoClaw software is free and open-source, but there are optional paid support tiers and external costs for models and hosting.

| Stack / Tier | Software Cost | Model / Usage Cost | Notes |
|---|---|---|---|
| NemoClaw Community | $0 for open-source stack (Apache 2.0) | Pay for Nemotron API calls or local GPU power | For developers and startups. |
| NemoClaw Pro | Around $79 per month in one published offer | Same model costs as Community | Adds support, integrations, and monitoring. |
| NemoClaw Enterprise | Custom pricing | Depends on scale and support level | Includes enterprise support and SLAs. |
| OpenClaw (self-hosted) | $0 for MIT-licensed core | Pay for chosen model APIs or GPU hosting | No official managed cloud listed yet. |
| OpenAI Swarm | Free open-source framework | Pay per token for OpenAI models | No extra fee beyond API usage. |
| LangGraph OSS | $0 for open-source framework | Model and infrastructure costs only | No built-in SaaS layer. |
| LangGraph Plus | Charges per node executed and standby time | Same plus model costs | Requires LangSmith Plus at about $39 per user monthly. |

Always confirm current prices on official sites before you plan budgets.
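For budget planning, the listed figures reduce to seat fees plus metered usage. A minimal sketch of that arithmetic, using the $79/month Pro figure quoted above and a hypothetical per-million-token price for cloud model calls:

```python
def monthly_cost(seat_fee: float, seats: int,
                 tokens_millions: float, price_per_million: float) -> float:
    """Rough monthly estimate: seat fees plus metered token spend."""
    return seat_fee * seats + tokens_millions * price_per_million

# One Pro seat at $79/month plus 50M tokens at a hypothetical $0.60/M.
print(monthly_cost(79.0, 1, 50, 0.60))
```

Swap in current prices from the official sites; the point of the formula is only that token spend scales with usage while seat fees do not.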

USP — What Makes It Different

NemoClaw stands out because it pairs the open and flexible OpenClaw ecosystem with a hardened, policy-driven sandbox designed for enterprise security needs. Many frameworks focus on orchestration or developer experience but leave runtime isolation and privacy controls to each team. NemoClaw’s tight integration of OpenShell, Nemotron models, and a privacy router into one stack gives a consistent way to run always-on agents near sensitive data while still using local vLLM or cloud models.

Pros and Cons

Pros

  • Open-source stack with Apache 2.0 license and a community edition at zero software cost.
  • Strong sandboxing and privacy features through OpenShell and Nvidia Agent Toolkit.
  • Supports hybrid local and cloud model routing, including Nemotron and vLLM backends.
  • Integrates with the rich OpenClaw ecosystem of skills, channels, and tools.
  • Hardware-agnostic design that still runs best on Nvidia GPUs but does not require them.

Cons

  • Setup needs comfort with Linux, Docker, containers, and basic networking.
  • Governance and workflow tooling are lighter than some fully managed enterprise platforms.
  • Performance and stability depend on correct kernel configuration and GPU drivers.
  • Security still depends on good policy design; weak rules can leave gaps even inside a sandbox.

Quick Comparison Chart

| Scenario | Recommended Stack | Reason |
|---|---|---|
| Secure always-on OpenClaw agents near internal data | NemoClaw + OpenClaw | Combines sandboxing with open-source flexibility. |
| Lightweight personal agent on a laptop | OpenClaw alone with local Ollama | Faster initial setup with fewer moving parts. |
| Multi-agent experiments on OpenAI stack | OpenAI Swarm | Tight integration with OpenAI models and tools. |
| Complex agent workflows with graphs and observability | LangGraph plus your model stack | Focus on orchestration and monitoring. |
| Fully managed enterprise agent on OpenClaw | Third-party platforms like ClawWorker | Add governance and admin features on top of OpenClaw. |

Demo or Real-World Example

Here is a concrete use case: a small team builds a secure coding assistant using NemoClaw, OpenClaw, and a local vLLM backend.

Step-by-Step Use Case

  1. Prepare hardware and OS
    Use a workstation with an RTX 4090 GPU, at least 24 GB RAM, and Ubuntu 22.04. Install Nvidia drivers and CUDA that match vLLM support.
  2. Deploy vLLM with a coding model
    Install vLLM and download a Qwen2.5 Coder 7B or 14B model. Start vLLM on localhost:8000 with GPU offload and confirm a test prompt responds at around 30 to 50 tokens per second.
  3. Install OpenShell and NemoClaw
    Follow Nvidia’s guide or DGX Spark tutorial to install OpenShell and its CLI. Run the NemoClaw installation script and confirm that nemoclaw --help works.
  4. Install and configure OpenClaw
    Install OpenClaw from its official repository and run its setup steps. Configure the terminal interface or web dashboard as the main channel.
  5. Run NemoClaw onboard and register vLLM
    Run nemoclaw onboard and choose local model routing. Add a provider entry that points to the vLLM endpoint and map a logical ID such as local/qwen-coder to that deployment.
  6. Launch the sandboxed agent
    Run openclaw nemoclaw launch --profile dev-coder to start a sandboxed OpenClaw agent. Use nemoclaw dev-coder connect to enter the sandbox shell, then start the OpenClaw TUI from there.
  7. Use the coding assistant
    Send a task like “Scan this repository and list risky functions with reasons.” The agent uses vLLM to read and understand the code, plans edits, and changes files inside the sandboxed file system. Policy rules block network calls to unknown hosts and writes outside approved directories, which limits damage if something goes wrong.

This flow gives the team strong AI coding help while keeping code and secrets on their own hardware, with NemoClaw and OpenShell reducing risk from agent mistakes or hostile prompts.

Conclusion

NemoClaw turns OpenClaw from a powerful but risky agent framework into a safer option for always-on agents by wrapping it in OpenShell sandboxes and adding model routing and privacy controls. It stays open-source and hardware-agnostic, and it integrates well with Nvidia’s Nemotron models and wider AI stack.

For teams that already like OpenClaw but need stronger isolation, or that want to run local vLLM backends near sensitive data, NemoClaw offers a practical path. Good policy design and monitoring still matter, but the stack provides a better foundation than running agents without a dedicated runtime.

FAQ

1. Is NemoClaw really free to use?

Yes. NemoClaw is open source under Apache 2.0 style terms, and the community edition has no software fee. You still pay for model usage and your own hardware or cloud resources.

2. Do I need Nvidia GPUs to use NemoClaw?

No. NemoClaw and OpenShell are hardware-agnostic and run on general Linux servers. Nvidia GPUs give better performance, but they are optional.

3. Can NemoClaw work with models other than Nemotron?

Yes. NemoClaw can route to vLLM servers, Ollama, and other backends when the gateway configuration points to them. Nemotron support is a key feature, but not a requirement.

4. Does NemoClaw replace enterprise governance tools?

No. NemoClaw focuses on runtime sandboxing and privacy routing. Enterprise platforms such as ClawWorker build on top of OpenClaw and NemoClaw to add workflow, audit, and admin controls.

5. Is a local vLLM backend mandatory for secure use?

No. You can use only cloud models with NemoClaw if that matches your needs. A local vLLM backend is useful when you want more privacy, speed, or control over the model runtime.