Nvidia NemoClaw + OpenClaw: Secure Sandbox Guide for Local vLLM Agents
Nvidia NemoClaw is a new open-source stack that adds privacy and security controls to the fast-growing OpenClaw agent platform. It wraps OpenClaw agents in Nvidia’s OpenShell sandbox and connects them to local and cloud language models. This guide explains the main ideas and shows how to set up a secure local vLLM backend. It also shares benchmark data and compares NemoClaw with other agent frameworks.
What Is Nvidia NemoClaw + OpenClaw?
OpenClaw is a free, open-source, self-hosted AI agent that runs on your own hardware and connects to many chat channels and tools. It can use local and cloud models to automate tasks such as coding, file work, and web research. NemoClaw is Nvidia’s open-source stack that adds a secure runtime, models, and policies around OpenClaw with a single installation command.
NemoClaw uses Nvidia’s Agent Toolkit and the new OpenShell runtime to isolate agents in sandboxed environments. A sandbox is a locked area where the agent process runs with strict rules for file access, network connections, and data handling. This helps reduce the risk that a bug, a malicious skill, or a prompt attack can damage your system or leak data.
NemoClaw also makes it easier to mix local and cloud models for OpenClaw. It can run Nvidia Nemotron models locally on RTX GPUs or DGX systems and route some requests to frontier cloud models through a privacy router. A privacy router is a gateway that controls which calls can go to the internet and can hide sensitive fields in those calls.
Core Components in the Stack
The NemoClaw + OpenClaw stack has four main parts.
- OpenClaw core: the agent platform with skills, channels, and workflows.
- NemoClaw plugin: commands and “blueprints” that wire OpenClaw into OpenShell and model providers.
- OpenShell runtime: a secure sandbox with kernel-level isolation using features such as Landlock, seccomp, and network namespaces.
- Model backends: Nemotron and other models via Nvidia cloud, vLLM servers, or tools such as Ollama.
Several reports describe OpenClaw as an “operating system for personal AI agents,” and NemoClaw adds the missing security layer around it for enterprise use.
Key Features
NemoClaw focuses on security, privacy, and practical deployment for always-on OpenClaw agents.
- Open-source stack: NemoClaw and OpenShell are released as open source under Apache 2.0 style terms, so you can download, modify, and deploy them.
- One-command installation: An official script installs OpenShell, configures NemoClaw, and links it to OpenClaw, which reduces manual setup steps.
- Secure sandbox runtime: OpenShell enforces policy-based control over file system access, network egress, and process capabilities for every agent.
- Privacy router for model calls: Model requests flow through a gateway that can hide or strip sensitive data before they reach cloud providers.
- Local and cloud model support: NemoClaw supports Nemotron models on local GPUs and can connect to cloud models through APIs for hybrid workloads.
- Hardware-agnostic deployment: NemoClaw and OpenShell run on general Linux servers and are not locked to Nvidia hardware, though performance is better on Nvidia GPUs.
- Integration with OpenClaw ecosystem: The stack works with existing OpenClaw skills, channels, and templates, so current agents can move into the sandbox with few changes.
How to Install or Set Up
These steps assume a Linux host such as Ubuntu 22.04 with sudo access and internet connectivity.
1. Check Hardware and OS
- CPU: at least 4 vCPUs.
- RAM: 8 GB minimum, 16 GB recommended.
- Disk: at least 20 GB free.
- OS: Ubuntu 22.04 LTS or newer.
- GPU: an Nvidia RTX card for strong local inference, or CPU-only for smaller models.
2. Install Required Software
Install core dependencies.
- Update system packages with your package manager.
- Install Node.js 20 or later and npm 10 or later for the OpenClaw and NemoClaw CLIs.
- Install Docker and confirm that the Docker daemon is running, because OpenShell uses containers to run sandboxes.
- Optionally install vLLM or Ollama for local models and verify that the model server runs on a local port.
3. Install OpenClaw
Use the official OpenClaw documentation for installation on Linux.
- Clone the official OpenClaw repository from GitHub.
- Run the installer or setup script described in the README.
- Configure at least one control channel, such as the terminal user interface or web dashboard.
4. Install OpenShell Runtime
OpenShell is the secure runtime that NemoClaw uses for sandboxing.
- Install OpenShell from Nvidia’s releases or from the DGX Spark “NemoClaw” guide.
- Confirm that the `openshell` CLI runs and can create a basic sandbox.
- Verify that Docker integration and kernel features such as Landlock and seccomp are enabled.
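As a quick sanity check for the Landlock requirement, Linux lists the active security modules in `/sys/kernel/security/lsm`. The short helper below reads that file; it is a generic kernel check, not part of the OpenShell CLI:

```python
from pathlib import Path

def active_lsms(lsm_file="/sys/kernel/security/lsm"):
    """Return the comma-separated list of active Linux Security Modules."""
    try:
        return Path(lsm_file).read_text().strip().split(",")
    except OSError:
        return []  # file missing or unreadable (e.g. non-Linux host)

def has_landlock(lsms):
    """True if the Landlock LSM is enabled in the running kernel."""
    return "landlock" in lsms

if __name__ == "__main__":
    lsms = active_lsms()
    print("Active LSMs:", lsms)
    print("Landlock enabled:", has_landlock(lsms))
```

If Landlock is missing, it usually means the kernel was built without it or it is not listed in the `lsm=` boot parameter.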
5. Install NemoClaw Plugin
NemoClaw provides the orchestration layer between OpenClaw and OpenShell.
- Run the official installation script, which looks like: `curl -fsSL https://nvidia.com/nemoclaw.sh | bash`.
- This installs the NemoClaw CLI and downloads a versioned blueprint for sandbox orchestration.
- Confirm installation with `nemoclaw --help` and `openclaw nemoclaw status`.
6. Run NemoClaw Onboard Wizard
The onboard wizard configures core components for the first run.
- Run `nemoclaw onboard`.
- Provide an Nvidia API key if you want to use Nemotron models from Nvidia cloud.
- Choose a default inference provider: Nvidia cloud, a local vLLM server, or another backend.
- Let the wizard create the OpenShell gateway, sandbox, and base policies.
7. Link OpenClaw to NemoClaw
Launch OpenClaw inside the NemoClaw-managed sandbox.
- Use the OpenClaw CLI with the NemoClaw plugin, for example: `openclaw nemoclaw launch --profile my-assistant`.
- Wait while the blueprint runner creates the sandbox container and applies network and file policies.
- Check health with `openclaw nemoclaw status` and inspect logs with `openclaw nemoclaw logs -f`.
When status is healthy, the OpenClaw agent runs inside an OpenShell sandbox and can use local or cloud models.
How to Run or Use It
From a user’s view, the stack behaves like normal OpenClaw, but with extra security and routing layers. The focus here is a setup that uses a local vLLM server as the primary backend.
1. Prepare a Local vLLM Server
vLLM is an inference engine that serves large language models with high throughput and low latency.
- Install vLLM in a virtual environment or container on the same host or on a GPU server.
- Download a compatible model, such as a Qwen2.5 Coder variant or an open Nemotron release.
- Start the vLLM HTTP server and note the base URL, for example `http://localhost:8000`.
If the vLLM server is on a remote GPU machine, expose it with SSH port forwarding or a secure tunnel, not through a public port.
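vLLM serves an OpenAI-compatible HTTP API, so a quick smoke test is a plain POST to `/v1/chat/completions`. The helper below only builds the request with the standard library; the base URL and model name are placeholders for your own deployment:

```python
import json
import urllib.request

def build_chat_request(base_url, model, prompt, max_tokens=64):
    """Build a urllib Request for vLLM's OpenAI-compatible chat endpoint."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example against a running server (model name is whatever you loaded):
# req = build_chat_request("http://localhost:8000", "Qwen/Qwen2.5-Coder-7B-Instruct", "Say hi")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

If this round-trip works before you wire in NemoClaw, any later failure is in the routing or policy layer rather than the model server.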
2. Configure NemoClaw to Use vLLM
NemoClaw’s wizard can register multiple providers, including a local vLLM endpoint.
- Run `nemoclaw providers add` or re-run `nemoclaw onboard`.
- Choose a custom HTTP provider and enter the vLLM base URL.
- Map one or more logical model IDs to the vLLM deployment, such as `local/qwen2.5-coder`.
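NemoClaw's on-disk provider format is not documented here, so as a sketch only, the wizard's output might conceptually resemble the following (every field name below is illustrative, not NemoClaw's real schema):

```yaml
# Hypothetical NemoClaw provider entry; field names are illustrative only.
providers:
  local-vllm:
    type: openai-compatible      # vLLM serves an OpenAI-style API
    base_url: http://localhost:8000/v1
    models:
      local/qwen2.5-coder: Qwen/Qwen2.5-Coder-7B-Instruct
```

The key idea is the indirection: agents refer to the logical ID `local/qwen2.5-coder`, so you can swap the underlying deployment without touching agent configuration.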
3. Update OpenClaw Model Configuration
OpenClaw stores model providers in a JSON or YAML configuration file.
- Open the model configuration section.
- Add a provider entry for a local backend, such as vLLM or Ollama, pointing to the NemoClaw or gateway URL.
- Set the default agent model to the local ID, for example `local/qwen2.5-coder`.
OpenClaw now sends model requests through NemoClaw and OpenShell to your vLLM server.
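OpenClaw's exact configuration schema depends on the version you installed; as a hedged sketch, a provider entry along these lines is the shape to aim for (key names here are illustrative):

```json
{
  "models": {
    "providers": {
      "local-vllm": {
        "type": "openai-compatible",
        "baseUrl": "http://localhost:8000/v1"
      }
    },
    "default": "local/qwen2.5-coder"
  }
}
```

Point `baseUrl` at the NemoClaw gateway instead of the raw vLLM port if you want every request to pass through the privacy router.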
4. Start the Agent Inside the Sandbox
Use the NemoClaw commands to connect to the sandbox shell.
- Run `nemoclaw my-assistant connect`.
- A shell prompt appears inside the sandbox, where OpenClaw runs under OpenShell controls.
- From here, use the OpenClaw terminal interface or web dashboard to send prompts and watch logs.
Every prompt passes through sandbox policies, then to the vLLM backend, and then back to OpenClaw for planning and actions.
5. Example: Local Coding Agent
With vLLM and a coding model such as Qwen2.5 Coder, NemoClaw can drive a local coding assistant.
- The user sends a message like “Refactor this Python script and add logging.”
- OpenClaw receives the request on Telegram, terminal, or another channel and turns it into a task.
- NemoClaw routes the model call to the local vLLM server inside the sandbox.
- The agent plans steps, edits files inside the sandbox file system, and reports results back.
The sandbox rules stop the agent from touching unapproved paths or making network calls to unknown hosts.
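To make "unapproved paths" and "unknown hosts" concrete, a policy for this coding agent might conceptually look like the fragment below. OpenShell's real policy schema is not shown in this guide, so treat every key as illustrative:

```yaml
# Hypothetical OpenShell-style sandbox policy; schema is illustrative only.
sandbox:
  filesystem:
    allow_write:
      - /workspace/repo          # the project the agent may edit
    read_only:
      - /workspace/reference     # docs the agent may read but not change
  network:
    egress:
      allow:
        - host: localhost
          port: 8000             # local vLLM backend only
      default: deny              # everything else is blocked
```

A default-deny egress rule like this is what turns a prompt-injected "upload the repo" instruction into a blocked connection instead of a data leak.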
Benchmark Results
Below is real performance data from public Nemotron, Qwen2.5 Coder, and vLLM benchmarks. These numbers show expected ranges, not exact results for every NemoClaw deployment.
When NemoClaw and OpenClaw route to these backends, end-to-end speed also depends on sandbox overhead, network path, and tool-calling depth.
Testing Details
Different sources describe NemoClaw and OpenShell performance in qualitative terms, while community tests give concrete numbers for vLLM-based setups.
What Was Tested
- Nemotron 3 Super providers: ArtificialAnalysis tracks output speed and latency for Nemotron 3 Super 120B across multiple providers such as DeepInfra and Lightning AI.
- Local Qwen2.5 Coder models: Community benchmarks in LocalLLaMA threads report tokens per second for Qwen2.5 Coder models on various GPUs and quantization levels.
- vLLM video workload: A vLLM GitHub issue shows prompt throughput near 868 tokens/s and generation throughput around 10 tokens/s for a video description task.
How the Tests Ran
Nemotron benchmarks measure how many tokens per second providers return once streaming begins, plus time to first token. They usually fix input length and compute end-to-end time for 500 output tokens. Community Qwen2.5 Coder tests share rig specs and split prompt and response throughput.
The vLLM video case uses about 30 frames of 360p video and a short prompt like “describe this video,” then tracks throughput and GPU use. These results show the impact of vision encoders and long context on speed.
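The decode-speed metric described above has a simple definition: output tokens divided by the time after the first token arrives. A small helper makes the arithmetic explicit:

```python
def decode_tokens_per_second(output_tokens, total_seconds, ttft_seconds):
    """Tokens/s once streaming begins.

    total_seconds is end-to-end time for the request; ttft_seconds is
    time to first token, which these benchmarks report separately.
    """
    decode_time = total_seconds - ttft_seconds
    if decode_time <= 0 or output_tokens <= 1:
        raise ValueError("need positive decode time and more than one token")
    # The first token lands at ttft, so (n - 1) tokens stream afterwards.
    return (output_tokens - 1) / decode_time

# Example: 500 output tokens in 2.1 s total with 0.6 s to first token.
print(round(decode_tokens_per_second(500, 2.1, 0.6)))
```

Separating time to first token from decode speed matters because sandbox and gateway hops mostly add to the former, not the latter.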
Key Findings
- Cloud Nemotron providers can exceed 450 tokens/s output speed with sub-second latency to first token.
- Local Qwen2.5 Coder models often reach 30 to 55 tokens/s on strong GPUs for code tasks.
- Vision tasks and very long contexts reduce throughput even with fast hardware and tuned runtimes.
For NemoClaw + OpenClaw, a local vLLM backend with a 7B or 14B model often gives a good balance between speed and hardware cost.
Comparison Table
Comparison of NemoClaw + OpenClaw with plain OpenClaw, OpenAI Swarm, and LangGraph.
Pricing Table
NemoClaw software is free and open-source, but there are optional paid support tiers and external costs for models and hosting.
Always confirm current prices on official sites before you plan budgets.
USP — What Makes It Different
NemoClaw stands out because it pairs the open and flexible OpenClaw ecosystem with a hardened, policy-driven sandbox designed for enterprise security needs. Many frameworks focus on orchestration or developer experience but leave runtime isolation and privacy controls to each team. NemoClaw’s tight integration of OpenShell, Nemotron models, and a privacy router into one stack gives a consistent way to run always-on agents near sensitive data while still using local vLLM or cloud models.
Pros and Cons
Pros
- Open-source stack with Apache 2.0 license and a community edition at zero software cost.
- Strong sandboxing and privacy features through OpenShell and Nvidia Agent Toolkit.
- Supports hybrid local and cloud model routing, including Nemotron and vLLM backends.
- Integrates with the rich OpenClaw ecosystem of skills, channels, and tools.
- Hardware-agnostic design that still runs best on Nvidia GPUs but does not require them.
Cons
- Setup needs comfort with Linux, Docker, containers, and basic networking.
- Governance and workflow tooling are lighter than some fully managed enterprise platforms.
- Performance and stability depend on correct kernel configuration and GPU drivers.
- Security still depends on good policy design; weak rules can leave gaps even inside a sandbox.
Demo or Real-World Example
Here is a concrete use case: a small team builds a secure coding assistant using NemoClaw, OpenClaw, and a local vLLM backend.
Step-by-Step Use Case
- Prepare hardware and OS: Use a workstation with an RTX 4090 GPU, at least 24 GB RAM, and Ubuntu 22.04. Install Nvidia drivers and CUDA that match vLLM support.
- Deploy vLLM with a coding model: Install vLLM and download a Qwen2.5 Coder 7B or 14B model. Start vLLM on `localhost:8000` with GPU offload and confirm a test prompt responds at around 30 to 50 tokens per second.
- Install OpenShell and NemoClaw: Follow Nvidia's guide or DGX Spark tutorial to install OpenShell and its CLI. Run the NemoClaw installation script and confirm that `nemoclaw --help` works.
- Install and configure OpenClaw: Install OpenClaw from its official repository and run its setup steps. Configure the terminal interface or web dashboard as the main channel.
- Run NemoClaw onboard and register vLLM: Run `nemoclaw onboard` and choose local model routing. Add a provider entry that points to the vLLM endpoint and map a logical ID such as `local/qwen-coder` to that deployment.
- Launch the sandboxed agent: Run `openclaw nemoclaw launch --profile dev-coder` to start a sandboxed OpenClaw agent. Use `nemoclaw dev-coder connect` to enter the sandbox shell, then start the OpenClaw TUI from there.
- Use the coding assistant: Send a task like "Scan this repository and list risky functions with reasons." The agent uses vLLM to read and understand the code, plans edits, and changes files inside the sandboxed file system. Policy rules block network calls to unknown hosts and writes outside approved directories, which limits damage if something goes wrong.
This flow gives the team strong AI coding help while keeping code and secrets on their own hardware, with NemoClaw and OpenShell reducing risk from agent mistakes or hostile prompts.
Conclusion
NemoClaw turns OpenClaw from a powerful but risky agent framework into a safer option for always-on agents by wrapping it in OpenShell sandboxes and adding model routing and privacy controls. It stays open-source and hardware-agnostic, and it integrates well with Nvidia’s Nemotron models and wider AI stack.
For teams that already like OpenClaw but need stronger isolation, or that want to run local vLLM backends near sensitive data, NemoClaw offers a practical path. Good policy design and monitoring still matter, but the stack provides a better foundation than running agents without a dedicated runtime.
FAQ
1. Is NemoClaw really free to use?
Yes. NemoClaw is open source under Apache 2.0 style terms, and the community edition has no software fee. You still pay for model usage and your own hardware or cloud resources.
2. Do I need Nvidia GPUs to use NemoClaw?
No. NemoClaw and OpenShell are hardware-agnostic and run on general Linux servers. Nvidia GPUs give better performance, but they are optional.
3. Can NemoClaw work with models other than Nemotron?
Yes. NemoClaw can route to vLLM servers, Ollama, and other backends when the gateway configuration points to them. Nemotron support is a key feature, but not a requirement.
4. Does NemoClaw replace enterprise governance tools?
No. NemoClaw focuses on runtime sandboxing and privacy routing. Enterprise platforms such as ClawWorker build on top of OpenClaw and NemoClaw to add workflow, audit, and admin controls.
5. Is a local vLLM backend mandatory for secure use?
No. You can use only cloud models with NemoClaw if that matches your needs. A local vLLM backend is useful when you want more privacy, speed, or control over the model runtime.