Nvidia NemoClaw + OpenClaw: Secure Sandbox Guide for Local vLLM Agents

Nvidia NemoClaw is a new open-source stack that adds privacy and security controls to the fast-growing OpenClaw agent platform. It wraps OpenClaw agents in Nvidia’s OpenShell sandbox and connects them to local and cloud language models. This guide explains the main ideas and shows how to set up a secure local vLLM backend. It also shares benchmark data and compares NemoClaw with other agent frameworks.

What Is Nvidia NemoClaw + OpenClaw?

OpenClaw is a free, open-source, self-hosted AI agent that runs on your own hardware and connects to many chat channels and tools. It can use local and cloud models to automate tasks such as coding, file work, and web research. NemoClaw is Nvidia’s open-source stack that adds a secure runtime, models, and policies around OpenClaw with a single installation command.

NemoClaw uses Nvidia’s Agent Toolkit and the new OpenShell runtime to isolate agents in sandboxed environments. A sandbox is a locked area where the agent process runs with strict rules for file access, network connections, and data handling. This helps reduce the risk that a bug, a malicious skill, or a prompt attack can damage your system or leak data.

NemoClaw also makes it easier to mix local and cloud models for OpenClaw. It can run Nvidia Nemotron models locally on RTX GPUs or DGX systems and route some requests to frontier cloud models through a privacy router. A privacy router is a gateway that controls which calls can go to the internet and can hide sensitive fields in those calls.

Core Components in the Stack

The NemoClaw + OpenClaw stack has four main parts.

  • OpenClaw core: the agent platform with skills, channels, and workflows.
  • NemoClaw plugin: commands and “blueprints” that wire OpenClaw into OpenShell and model providers.
  • OpenShell runtime: a secure sandbox with kernel-level isolation using features such as Landlock, seccomp, and network namespaces.
  • Model backends: Nemotron and other models via Nvidia cloud, vLLM servers, or tools such as Ollama.

Several reports describe OpenClaw as an “operating system for personal AI agents,” and NemoClaw adds the missing security layer around it for enterprise use.

Key Features

NemoClaw focuses on security, privacy, and practical deployment for always-on OpenClaw agents.

  • Open-source stack
    NemoClaw and OpenShell are released as open source under Apache 2.0 style terms, so you can download, modify, and deploy them.
  • One-command installation
    An official script installs OpenShell, configures NemoClaw, and links it to OpenClaw, which reduces manual setup steps.
  • Secure sandbox runtime
    OpenShell enforces policy-based control over file system access, network egress, and process capabilities for every agent.
  • Privacy router for model calls
    Model requests flow through a gateway that can hide or strip sensitive data before they reach cloud providers.
  • Local and cloud model support
    NemoClaw supports Nemotron models on local GPUs and can connect to cloud models through APIs for hybrid workloads.
  • Hardware-agnostic deployment
    NemoClaw and OpenShell run on general Linux servers and are not locked to Nvidia hardware, though performance is better on Nvidia GPUs.
  • Integration with OpenClaw ecosystem
    The stack works with existing OpenClaw skills, channels, and templates, so current agents can move into the sandbox with few changes.

How to Install or Set Up

These steps assume a Linux host such as Ubuntu 22.04 with sudo access and internet connectivity.

1. Check Hardware and OS

  • CPU: at least 4 vCPUs.
  • RAM: 8 GB minimum, 16 GB recommended.
  • Disk: at least 20 GB free.
  • OS: Ubuntu 22.04 LTS or newer.
  • GPU: an Nvidia RTX card for strong local inference, or CPU-only for smaller models.
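Before installing anything, a quick preflight check against these minimums can save time. The sketch below uses only the standard library and is Linux-oriented; the RAM probe relies on sysconf and is an illustration, not an official NemoClaw tool.

```python
import os
import shutil

def preflight(min_cpus=4, min_ram_gb=8, min_disk_gb=20):
    """Report whether the host meets the minimum specs listed above."""
    cpus = os.cpu_count() or 0
    # Total RAM in bytes via sysconf (Linux); may be unavailable elsewhere.
    try:
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (ValueError, OSError, AttributeError):
        ram_gb = 0.0
    disk_gb = shutil.disk_usage("/").free / 1024**3
    return {
        "cpus_ok": cpus >= min_cpus,
        "ram_ok": ram_gb >= min_ram_gb,
        "disk_ok": disk_gb >= min_disk_gb,
    }

print(preflight())
```

If any flag comes back False, fix the hardware gap before moving on to the software install.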

2. Install Required Software

Install core dependencies.

  1. Update system packages with your package manager.
  2. Install Node.js 20 or later and npm 10 or later for the OpenClaw and NemoClaw CLIs.
  3. Install Docker and confirm that the Docker daemon is running, because OpenShell uses containers to run sandboxes.
  4. Optionally install vLLM or Ollama for local models and verify that the model server runs on a local port.

3. Install OpenClaw

Use the official OpenClaw documentation for installation on Linux.

  1. Clone the official OpenClaw repository from GitHub.
  2. Run the installer or setup script described in the README.
  3. Configure at least one control channel, such as the terminal user interface or web dashboard.

4. Install OpenShell Runtime

OpenShell is the secure runtime that NemoClaw uses for sandboxing.

  1. Install OpenShell from Nvidia’s releases or from the DGX Spark “NemoClaw” guide.
  2. Confirm that the openshell CLI runs and can create a basic sandbox.
  3. Verify that Docker integration and kernel features such as Landlock and seccomp are enabled.
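Landlock shipped in Linux 5.13, so one quick sanity check for step 3 is to compare the kernel release against that floor. A passing version check is necessary but not sufficient, since a kernel can still be built without Landlock; this helper is an illustrative sketch, not part of the OpenShell CLI.

```python
import re

# Landlock was merged in Linux 5.13, so older kernels cannot provide it.
LANDLOCK_MIN = (5, 13)

def kernel_supports_landlock(release: str) -> bool:
    """Parse a `uname -r` style string such as '5.15.0-91-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2))) >= LANDLOCK_MIN

print(kernel_supports_landlock("5.15.0-91-generic"))  # True
print(kernel_supports_landlock("5.4.0-150-generic"))  # False
```

Feed it the output of `uname -r`; if it returns False, upgrade the kernel before expecting Landlock policies to work.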

5. Install NemoClaw Plugin

NemoClaw provides the orchestration layer between OpenClaw and OpenShell.

  1. Run the official installation script, which looks like:
    curl -fsSL https://nvidia.com/nemoclaw.sh | bash
  2. This installs the NemoClaw CLI and downloads a versioned blueprint for sandbox orchestration.
  3. Confirm installation with nemoclaw --help and openclaw nemoclaw status.

6. Run NemoClaw Onboard Wizard

The onboard wizard configures core components for the first run.

  1. Run nemoclaw onboard.
  2. Provide an Nvidia API key if you want to use Nemotron models from Nvidia cloud.
  3. Choose a default inference provider: Nvidia cloud, a local vLLM server, or another backend.
  4. Let the wizard create the OpenShell gateway, sandbox, and base policies.

7. Launch the Sandboxed Agent

Launch OpenClaw inside the NemoClaw-managed sandbox.

  1. Use the OpenClaw CLI with the NemoClaw plugin, for example:
    openclaw nemoclaw launch --profile my-assistant
  2. Wait while the blueprint runner creates the sandbox container and applies network and file policies.
  3. Check health with openclaw nemoclaw status and inspect logs with openclaw nemoclaw logs -f.

When status is healthy, the OpenClaw agent runs inside an OpenShell sandbox and can use local or cloud models.

How to Run or Use It

From a user’s view, the stack behaves like normal OpenClaw, but with extra security and routing layers. The focus here is a setup that uses a local vLLM server as the primary backend.

1. Prepare a Local vLLM Server

vLLM is an inference engine that serves large language models with high throughput and low latency.

  1. Install vLLM in a virtual environment or container on the same host or on a GPU server.
  2. Download a compatible model, such as a Qwen2.5 Coder variant or an open Nemotron release.
  3. Start the vLLM HTTP server and note the base URL, for example http://localhost:8000.

If the vLLM server is on a remote GPU machine, expose it with SSH port forwarding or a secure tunnel, not through a public port.
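vLLM exposes an OpenAI-compatible HTTP API under /v1, so any OpenAI-style client can talk to it with a plain chat-completions request. The sketch below only builds the request body rather than sending it; the base URL and model name are assumptions for a typical local setup, not values NemoClaw requires.

```python
import json

# Assumed local endpoint and model -- substitute what your server serves.
BASE_URL = "http://localhost:8000"

def chat_request(prompt: str,
                 model: str = "Qwen/Qwen2.5-Coder-7B-Instruct",
                 max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completions request for vLLM."""
    return {
        "url": f"{BASE_URL}/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
            "stream": False,
        },
    }

req = chat_request("Say hello in one sentence.")
print(json.dumps(req["body"], indent=2))
```

Once the server is up, send it with any HTTP client, for example `requests.post(req["url"], json=req["body"])`, and confirm a completion comes back before wiring NemoClaw to the endpoint.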

2. Configure NemoClaw to Use vLLM

NemoClaw’s wizard can register multiple providers, including a local vLLM endpoint.

  1. Run nemoclaw providers add or re-run nemoclaw onboard.
  2. Choose a custom HTTP provider and enter the vLLM base URL.
  3. Map one or more logical model IDs to the vLLM deployment, such as local/qwen2.5-coder.
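The mapping the wizard creates can be pictured as a small registry from logical model IDs to concrete backends. This is an illustrative sketch of that idea, not NemoClaw's real data model; the endpoint and model names are assumptions.

```python
def add_provider(registry: dict, logical_id: str, base_url: str, model: str) -> dict:
    """Map a logical model ID (as OpenClaw sees it) to a concrete backend."""
    registry[logical_id] = {"base_url": base_url, "model": model}
    return registry

def resolve(registry: dict, logical_id: str) -> dict:
    """Look up the backend for a logical ID, failing loudly if unmapped."""
    try:
        return registry[logical_id]
    except KeyError:
        raise LookupError(f"no provider registered for {logical_id!r}")

providers = {}
add_provider(providers, "local/qwen2.5-coder",
             "http://localhost:8000", "Qwen/Qwen2.5-Coder-7B-Instruct")
print(resolve(providers, "local/qwen2.5-coder")["base_url"])
```

The indirection is the point: agents reference local/qwen2.5-coder, and the registry decides which server actually answers, so you can swap backends without touching agent configuration.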

3. Update OpenClaw Model Configuration

OpenClaw stores model providers in a JSON or YAML configuration file.

  1. Open the model configuration section.
  2. Add a provider entry for a local backend, such as vLLM or Ollama, pointing to the NemoClaw or gateway URL.
  3. Set the default agent model to the local ID, for example local/qwen2.5-coder.

OpenClaw now sends model requests through NemoClaw and OpenShell to your vLLM server.
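As a rough illustration of steps 2 and 3, the fragment below generates a provider entry of the kind described. The key names are assumptions for illustration only; OpenClaw's actual configuration schema may differ, so check its documentation for the real field names.

```python
import json

# Illustrative only: these keys are assumptions, not OpenClaw's documented schema.
config = {
    "providers": {
        "local-vllm": {
            "type": "openai-compatible",
            "base_url": "http://localhost:8000/v1",
            "models": ["local/qwen2.5-coder"],
        }
    },
    "defaults": {"agent_model": "local/qwen2.5-coder"},
}

# Emit the JSON you would merge into the model configuration section.
print(json.dumps(config, indent=2))
```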

4. Start the Agent Inside the Sandbox

Use the NemoClaw commands to connect to the sandbox shell.

  1. Run nemoclaw my-assistant connect.
  2. A shell prompt appears inside the sandbox, where OpenClaw runs under OpenShell controls.
  3. From here, use the OpenClaw terminal interface or web dashboard to send prompts and watch logs.

Every prompt passes through sandbox policies, then to the vLLM backend, and then back to OpenClaw for planning and actions.

5. Example: Local Coding Agent

With vLLM and a coding model such as Qwen2.5 Coder, NemoClaw can drive a local coding assistant.

  • The user sends a message like “Refactor this Python script and add logging.”
  • OpenClaw receives the request on Telegram, terminal, or another channel and turns it into a task.
  • NemoClaw routes the model call to the local vLLM server inside the sandbox.
  • The agent plans steps, edits files inside the sandbox file system, and reports results back.

The sandbox rules stop the agent from touching unapproved paths or making network calls to unknown hosts.
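The path and host rules described above amount to allowlist checks. Here is a minimal sketch of that idea in Python; OpenShell's real enforcement happens at the kernel level (Landlock, network namespaces), and a real policy engine would resolve symlinks and ".." components before comparing paths, which this sketch does not do.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical policy: paths and hosts the agent may touch.
ALLOWED_PATHS = ["/workspace", "/tmp/agent"]
ALLOWED_HOSTS = {"localhost", "api.nvidia.com"}

def path_allowed(path: str) -> bool:
    """True if the path sits under an approved root (no '..' resolution here)."""
    p = PurePosixPath(path)
    return any(p.is_relative_to(root) for root in ALLOWED_PATHS)

def host_allowed(url: str) -> bool:
    """True if the URL's host is on the egress allowlist."""
    return urlparse(url).hostname in ALLOWED_HOSTS

print(path_allowed("/workspace/project/main.py"))  # True
print(path_allowed("/etc/shadow"))                 # False
print(host_allowed("http://localhost:8000/v1"))    # True
print(host_allowed("https://evil.example.com"))    # False
```

The deny-by-default shape is what matters: anything not explicitly allowed is blocked, which is the property that limits damage from a compromised skill or prompt injection.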

Benchmark Results

Below is real performance data from public Nemotron, Qwen2.5 Coder, and vLLM benchmarks. These numbers show expected ranges, not exact results for every NemoClaw deployment.

| Setup | Model and Provider | Hardware | Output Speed (tokens/s) | Notes |
|---|---|---|---|---|
| A | Nemotron 3 Super 120B A12B (reasoning) via DeepInfra | Cloud GPUs | 515.7 | Fastest listed provider in one benchmark set. |
| B | Nemotron 3 Super 120B A12B via Lightning AI | Cloud GPUs | 491.1 | Slightly lower speed, similar latency. |
| C | Qwen2.5 Coder 7B via local runtime | RTX 3090 | ~27.9 in community tests | Full GPU offload and flash attention enabled. |
| D | Qwen2.5 Coder 32B via local runtime | Apple M4 Max | ~14 in reported tests | Throughput depends on context length and quantization. |

When NemoClaw and OpenClaw route to these backends, end-to-end speed also depends on sandbox overhead, network path, and tool-calling depth.

Testing Details

Different sources describe NemoClaw and OpenShell performance in qualitative terms, while community tests give concrete numbers for vLLM-based setups.

What Was Tested

  • Nemotron 3 Super providers
    ArtificialAnalysis tracks output speed and latency for Nemotron 3 Super 120B across multiple providers such as DeepInfra and Lightning AI.​
  • Local Qwen2.5 Coder models
    Community benchmarks in LocalLLaMA threads report tokens per second for Qwen2.5 Coder models on various GPUs and quantization levels.​
  • vLLM video workload
    A vLLM GitHub issue shows prompt throughput near 868 tokens/s and generation throughput around 10 tokens/s for a video description task.​

How the Tests Ran

Nemotron benchmarks measure how many tokens per second providers return once streaming begins, plus time to first token. They usually fix input length and compute end-to-end time for 500 output tokens. Community Qwen2.5 Coder tests share rig specs and split prompt and response throughput.

The vLLM video case uses about 30 frames of 360p video and a short prompt like “describe this video,” then tracks throughput and GPU use. These results show the impact of vision encoders and long context on speed.
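The timing method described above, time to first token plus streaming time for a fixed output length, reduces to simple arithmetic. The sketch below applies it to the two cloud speeds from the benchmark table, with an illustrative 0.5 s time to first token (an assumption, not a measured value).

```python
def end_to_end_seconds(ttft_s: float, tokens: int, tokens_per_s: float) -> float:
    """Time to first token plus streaming time for the remaining output."""
    return ttft_s + tokens / tokens_per_s

# 500 output tokens at the two cloud speeds from the benchmark table,
# assuming an illustrative 0.5 s time to first token.
for speed in (515.7, 491.1):
    print(f"{speed} tok/s -> {end_to_end_seconds(0.5, 500, speed):.2f} s")
```

At these speeds the streaming term dominates only slightly, which is why cloud providers with similar tokens/s can still differ noticeably on perceived latency when their time to first token diverges.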

Key Findings

  • Cloud Nemotron providers can exceed 450 tokens/s output speed with sub-second latency to first token.
  • Local Qwen2.5 Coder models often reach 30 to 55 tokens/s on strong GPUs for code tasks.
  • Vision tasks and very long contexts reduce throughput even with fast hardware and tuned runtimes.

For NemoClaw + OpenClaw, a local vLLM backend with a 7B or 14B model often gives a good balance between speed and hardware cost.

Comparison Table

Comparison of NemoClaw + OpenClaw with plain OpenClaw, OpenAI Swarm, and LangGraph.

| Criterion | NemoClaw + OpenClaw | OpenClaw (alone) | OpenAI Swarm | LangGraph |
|---|---|---|---|---|
| Core type | Secure stack for OpenClaw agents with sandbox and model routing | Self-hosted multi-channel agent platform without built-in sandbox | Experimental multi-agent orchestration library from OpenAI | Agent framework and orchestration library with hosted platform |
| Security runtime | OpenShell sandbox, kernel-level isolation, privacy router | No standard sandbox; depends on host OS, Docker, and community patterns | Depends on host environment; focuses on agent handoff logic | No built-in OS sandbox; focuses on graph orchestration and state |
| License | Apache 2.0 open source for NemoClaw and OpenShell | MIT / open source in official repo | Open-source framework, free core | Open-source core (MIT) plus paid hosted tiers |
| Model support | Nemotron local, vLLM, and cloud frontier models via router | Local models via Ollama and LM Studio, plus cloud APIs | OpenAI models and tools, with some external integrations | Any model reachable from user code or integrations |
| Target users | Developers and enterprises that need secure OpenClaw deployments | Power users and teams that accept more manual security work | Developers prototyping multi-agent apps on OpenAI stack | Teams building complex agent graphs with observability |
| Deployment | Runs on local Linux, RTX PCs, DGX, and cloud servers | Runs on local machines or servers; no extra runtime layer | Library runs wherever Python runs; many use cloud hosts | Library for self-hosting plus paid managed SaaS |

Pricing Table

NemoClaw software is free and open-source, but there are optional paid support tiers and external costs for models and hosting.

| Stack / Tier | Software Cost | Model / Usage Cost | Notes |
|---|---|---|---|
| NemoClaw Community | $0 for open-source stack (Apache 2.0) | Pay for Nemotron API calls or local GPU power | For developers and startups. |
| NemoClaw Pro | Around $79 per month in one published offer | Same model costs as Community | Adds support, integrations, and monitoring. |
| NemoClaw Enterprise | Custom pricing | Depends on scale and support level | Includes enterprise support and SLAs. |
| OpenClaw (self-hosted) | $0 for MIT-licensed core | Pay for chosen model APIs or GPU hosting | No official managed cloud listed yet. |
| OpenAI Swarm | Free open-source framework | Pay per token for OpenAI models | No extra fee beyond API usage. |
| LangGraph OSS | $0 for open-source framework | Model and infrastructure costs only | No built-in SaaS layer. |
| LangGraph Plus | Charges per node executed and standby time | Same plus model costs | Requires LangSmith Plus at about $39 per user monthly. |

Always confirm current prices on official sites before you plan budgets.
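For budget planning, the listed figures reduce to seat fees plus metered usage. A minimal sketch of that arithmetic, using the $79/month Pro figure quoted above and a hypothetical per-million-token price for cloud model calls:

```python
def monthly_cost(seat_fee: float, seats: int,
                 tokens_millions: float, price_per_million: float) -> float:
    """Rough monthly estimate: seat fees plus metered token spend."""
    return seat_fee * seats + tokens_millions * price_per_million

# One Pro seat at $79/month plus 50M tokens at a hypothetical $0.60/M.
print(monthly_cost(79.0, 1, 50, 0.60))
```

Swap in current prices from the official sites; the point of the formula is only that token spend scales with usage while seat fees do not.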

USP — What Makes It Different

NemoClaw stands out because it pairs the open and flexible OpenClaw ecosystem with a hardened, policy-driven sandbox designed for enterprise security needs. Many frameworks focus on orchestration or developer experience but leave runtime isolation and privacy controls to each team. NemoClaw’s tight integration of OpenShell, Nemotron models, and a privacy router into one stack gives a consistent way to run always-on agents near sensitive data while still using local vLLM or cloud models.

Pros and Cons

Pros

  • Open-source stack with Apache 2.0 license and a community edition at zero software cost.
  • Strong sandboxing and privacy features through OpenShell and Nvidia Agent Toolkit.
  • Supports hybrid local and cloud model routing, including Nemotron and vLLM backends.
  • Integrates with the rich OpenClaw ecosystem of skills, channels, and tools.
  • Hardware-agnostic design that still runs best on Nvidia GPUs but does not require them.

Cons

  • Setup needs comfort with Linux, Docker, containers, and basic networking.
  • Governance and workflow tooling are lighter than some fully managed enterprise platforms.
  • Performance and stability depend on correct kernel configuration and GPU drivers.
  • Security still depends on good policy design; weak rules can leave gaps even inside a sandbox.

Quick Comparison Chart

| Scenario | Recommended Stack | Reason |
|---|---|---|
| Secure always-on OpenClaw agents near internal data | NemoClaw + OpenClaw | Combines sandboxing with open-source flexibility. |
| Lightweight personal agent on a laptop | OpenClaw alone with local Ollama | Faster initial setup with fewer moving parts. |
| Multi-agent experiments on OpenAI stack | OpenAI Swarm | Tight integration with OpenAI models and tools. |
| Complex agent workflows with graphs and observability | LangGraph plus your model stack | Focus on orchestration and monitoring. |
| Fully managed enterprise agent on OpenClaw | Third-party platforms like ClawWorker | Add governance and admin features on top of OpenClaw. |

Demo or Real-World Example

Here is a concrete use case: a small team builds a secure coding assistant using NemoClaw, OpenClaw, and a local vLLM backend.

Step-by-Step Use Case

  1. Prepare hardware and OS
    Use a workstation with an RTX 4090 GPU, at least 24 GB RAM, and Ubuntu 22.04. Install Nvidia drivers and CUDA that match vLLM support.
  2. Deploy vLLM with a coding model
    Install vLLM and download a Qwen2.5 Coder 7B or 14B model. Start vLLM on localhost:8000 with GPU offload and confirm a test prompt responds at around 30 to 50 tokens per second.
  3. Install OpenShell and NemoClaw
    Follow Nvidia’s guide or DGX Spark tutorial to install OpenShell and its CLI. Run the NemoClaw installation script and confirm that nemoclaw --help works.
  4. Install and configure OpenClaw
    Install OpenClaw from its official repository and run its setup steps. Configure the terminal interface or web dashboard as the main channel.
  5. Run NemoClaw onboard and register vLLM
    Run nemoclaw onboard and choose local model routing. Add a provider entry that points to the vLLM endpoint and map a logical ID such as local/qwen-coder to that deployment.
  6. Launch the sandboxed agent
    Run openclaw nemoclaw launch --profile dev-coder to start a sandboxed OpenClaw agent. Use nemoclaw dev-coder connect to enter the sandbox shell, then start the OpenClaw TUI from there.
  7. Use the coding assistant
    Send a task like “Scan this repository and list risky functions with reasons.” The agent uses vLLM to read and understand the code, plans edits, and changes files inside the sandboxed file system. Policy rules block network calls to unknown hosts and writes outside approved directories, which limits damage if something goes wrong.

This flow gives the team strong AI coding help while keeping code and secrets on their own hardware, with NemoClaw and OpenShell reducing risk from agent mistakes or hostile prompts.

Conclusion

NemoClaw turns OpenClaw from a powerful but risky agent framework into a safer option for always-on agents by wrapping it in OpenShell sandboxes and adding model routing and privacy controls. It stays open-source and hardware-agnostic, and it integrates well with Nvidia’s Nemotron models and wider AI stack.

For teams that already like OpenClaw but need stronger isolation, or that want to run local vLLM backends near sensitive data, NemoClaw offers a practical path. Good policy design and monitoring still matter, but the stack provides a better foundation than running agents without a dedicated runtime.

FAQ

1. Is NemoClaw really free to use?

Yes. NemoClaw is open source under Apache 2.0 style terms, and the community edition has no software fee. You still pay for model usage and your own hardware or cloud resources.

2. Do I need Nvidia GPUs to use NemoClaw?

No. NemoClaw and OpenShell are hardware-agnostic and run on general Linux servers. Nvidia GPUs give better performance, but they are optional.

3. Can NemoClaw work with models other than Nemotron?

Yes. NemoClaw can route to vLLM servers, Ollama, and other backends when the gateway configuration points to them. Nemotron support is a key feature, but not a requirement.

4. Does NemoClaw replace enterprise governance tools?

No. NemoClaw focuses on runtime sandboxing and privacy routing. Enterprise platforms such as ClawWorker build on top of OpenClaw and NemoClaw to add workflow, audit, and admin controls.

5. Is a local vLLM backend mandatory for secure use?

No. You can use only cloud models with NemoClaw if that matches your needs. A local vLLM backend is useful when you want more privacy, speed, or control over the model runtime.