Run SpatialLM on Ubuntu: Step-by-Step Installation Guide

SpatialLM is an AI model that analyzes 3D scenes, including those reconstructed from video, and identifies structural elements such as walls, doors, windows, and furniture. This guide walks through installing and configuring SpatialLM on Ubuntu, running inference, and visualizing the results.
Introduction to SpatialLM
SpatialLM is a large language model designed for spatial understanding through 3D scene reconstruction. It processes point cloud data from sources like monocular video sequences, RGBD images, and LiDAR sensors to generate structured outputs such as floor plans or bounding boxes for architectural elements.
Key Features:
- Multimodal data processing (video, RGBD images, LiDAR)
- High-level semantic understanding of environments
- Lightweight models suitable for consumer-grade GPUs
How SpatialLM Works
Video Analysis and 3D Mapping
SpatialLM uses input videos to create 3D point cloud representations of environments. It identifies objects within the space while ensuring spatial relationships remain consistent across viewpoints.
MASt3R-SLAM and Point Cloud Encoding
The tool employs Simultaneous Localization and Mapping (SLAM) techniques to generate point clouds from video data. These point clouds are compressed using specialized encoders for efficient processing.
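The guide does not describe the encoder itself, so the following is only a rough illustration of the general idea of reducing a dense point cloud before further processing. It implements plain voxel-grid downsampling, a common preprocessing technique, and not SpatialLM's learned encoder. The function name `voxel_downsample` and the voxel size are hypothetical choices made for this sketch.

```python
import math
from collections import defaultdict


def voxel_downsample(points, voxel_size):
    """Replace all points inside each cubic voxel with their centroid.

    Illustrative only: this is classic voxel-grid downsampling,
    NOT the point cloud encoder that SpatialLM actually uses.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    # Group points by the integer index of the voxel that contains them.
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[key].append((x, y, z))

    # Emit one centroid per occupied voxel, in a deterministic order.
    centroids = []
    for key in sorted(buckets):
        pts = buckets[key]
        n = len(pts)
        centroids.append(tuple(sum(axis) / n for axis in zip(*pts)))
    return centroids
```

A larger `voxel_size` gives a coarser, smaller cloud; a smaller one preserves more detail at the cost of more points.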
Large Language Model Integration
Compressed spatial data is fed into a large language model that generates structured outputs in formats such as:
- Detailed structural datasets
- 2D floor plans
- Industry-standard formats for architectural analysis
Prerequisites for Running SpatialLM on Ubuntu
Before proceeding with installation, ensure your system meets the following requirements:
- Operating System: Ubuntu 20.04 or later
- Python Version: Python 3.11
- PyTorch Version: PyTorch 2.4.1
- CUDA Version: CUDA Toolkit 12.4
- GPU: NVIDIA GPU with CUDA support
- Dependencies: Conda package manager and Poetry for dependency management
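A quick way to see which of these requirements are already satisfied is a small check script. The sketch below uses only the Python standard library; the function name `check_prerequisites` and the choice of tools to look for are assumptions made for illustration, and the script does not verify the PyTorch version or GPU model.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 11)  # Python version from the list above
# Hypothetical selection of executables to look for on PATH.
REQUIRED_TOOLS = ("conda", "poetry", "nvcc")


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a dict mapping each requirement to True (found) or False."""
    report = {"python": tuple(version_info[:2]) == REQUIRED_PYTHON}
    for tool in REQUIRED_TOOLS:
        report[tool] = which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:10s} {'ok' if ok else 'MISSING'}")
```

Some entries are expected to show as missing on a fresh machine, because Poetry and the CUDA toolkit are installed during the steps that follow. Running the script again inside the activated environment shows what is still outstanding.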
Installation Steps
Step 1: Cloning the Repository
Start by cloning the SpatialLM GitHub repository:
git clone https://github.com/manycore-research/SpatialLM.git
cd SpatialLM
Step 2: Setting Up the Environment
Create a Conda environment tailored for SpatialLM:
conda create -n spatiallm python=3.11
conda activate spatiallm
conda install -y nvidia/label/cuda-12.4.0::cuda-toolkit conda-forge::sparsehash
Step 3: Installing Dependencies
Install required dependencies using Poetry:
pip install poetry && poetry config virtualenvs.create false --local
poetry install
poe install-torchsparse # Building wheel for torchsparse may take time.
Running Inference with SpatialLM
Preparing Input Data
Download preprocessed point clouds from Hugging Face:
huggingface-cli download manycore-research/SpatialLM-Testset pcd/scene0000_00.ply --repo-type dataset --local-dir .
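After the download finishes, it can be worth confirming that the file is a readable PLY point cloud before running inference. The sketch below reads only the text header defined by the PLY format, using the standard library; the helper name `read_ply_header` is hypothetical and not part of SpatialLM.

```python
def read_ply_header(path):
    """Return (format, vertex_count, vertex_property_names) from a PLY header."""
    fmt, vertex_count, props = None, 0, []
    current_element = None
    with open(path, "rb") as fh:
        # Every PLY file starts with the magic line "ply".
        if fh.readline().strip() != b"ply":
            raise ValueError(f"{path} does not start with the PLY magic line")
        for raw in fh:
            parts = raw.decode("ascii", errors="replace").split()
            if not parts:
                continue
            keyword = parts[0]
            if keyword == "end_header":
                return fmt, vertex_count, props
            if keyword == "format":
                fmt = parts[1]
            elif keyword == "element":
                current_element = parts[1]
                if current_element == "vertex":
                    vertex_count = int(parts[2])
            elif keyword == "property" and current_element == "vertex":
                # The property name is the last token on the line.
                props.append(parts[-1])
    raise ValueError(f"{path} has no end_header line")
```

Calling `read_ply_header("pcd/scene0000_00.ply")` should return a non-zero vertex count and property names that include `x`, `y`, and `z`. A `ValueError` suggests a truncated or corrupted download.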
Executing Inference
Run the inference script to process the point cloud:
python inference.py --point_cloud pcd/scene0000_00.ply --output scene0000_00.txt --model_path manycore-research/SpatialLM-Llama-1B
The output will include bounding boxes and labels for structural elements like walls, doors, and windows.
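To work with the results programmatically, the output file can be loaded into Python. The sketch below assumes each line has the form `name=Type(arg1,arg2,...)`, for example `wall_0=Wall(...)`; this guide does not specify the format, so treat that layout, the sample values, and the helper name `parse_layout` as assumptions and check them against your own output file.

```python
import re
from collections import defaultdict

# Assumed line shape: name=Type(arg1,arg2,...). Not confirmed by this guide.
LINE_RE = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\((.*)\)\s*$")


def _convert(token):
    """Turn numeric tokens into floats; leave other tokens as strings."""
    token = token.strip()
    try:
        return float(token)
    except ValueError:
        return token


def parse_layout(text):
    """Group entities by type: {"Wall": {"wall_0": [args...]}, ...}."""
    entities = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank lines and anything that is not an entity
        name, kind, args = match.groups()
        values = [_convert(a) for a in args.split(",")] if args.strip() else []
        entities[kind][name] = values
    return dict(entities)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = "wall_0=Wall(0,0,0,4,0,0,2.8,0.1)\ndoor_0=Door(wall_0,2,0,1,0.9,2.0)"
    for kind, items in parse_layout(sample).items():
        print(kind, len(items))
```

To parse a real result, read the file first, for example `parse_layout(open("scene0000_00.txt").read())`.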
Visualizing Outputs
Use the rerun tool to visualize the processed outputs:
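# Convert the point cloud and predicted layout to Rerun format
python visualize.py --point_cloud pcd/scene0000_00.ply --layout scene0000_00.txt --save scene0000_00.rrd
# Open the result in the Rerun viewer
rerun scene0000_00.rrd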
Viewing the predicted layout together with the point cloud makes it easier to check the detected elements against the actual scene.
Applications of SpatialLM
Interior Design and Architecture
SpatialLM enables architects to quickly map spaces and optimize layouts by identifying structural constraints.
Robotics and Intelligent Assistants
Robots equipped with SpatialLM can navigate environments intelligently based on real-time spatial awareness.
Enhanced Human Interaction
SpatialLM serves as an intelligent assistant capable of answering spatial queries or suggesting modifications in room layouts.
Troubleshooting Common Issues
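- Dependency Errors: Ensure all dependencies are installed correctly using Poetry.
- Inference Failures: Check that the input point cloud is axis-aligned, as SpatialLM requires.
- CUDA Compatibility: Verify that the installed CUDA toolkit is version 12.4 by running the command below. Note that nvcc reports the toolkit version, not what the GPU supports; nvidia-smi shows the highest CUDA version the installed driver supports.
nvcc --version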
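The version check can also be scripted. The sketch below runs `nvcc --version` and extracts the release number from the line containing `release X.Y`; the helper names are hypothetical, and the script only inspects the toolkit found on PATH, not the driver or the GPU.

```python
import re
import shutil
import subprocess

EXPECTED_RELEASE = (12, 4)  # CUDA Toolkit version listed in the prerequisites


def parse_nvcc_release(output):
    """Extract (major, minor) from nvcc --version output, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_cuda_release():
    """Return the release of the nvcc found on PATH, or None if unavailable."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found; is the conda environment activated?")
    elif release != EXPECTED_RELEASE:
        print(f"Found CUDA {release[0]}.{release[1]}, expected 12.4")
    else:
        print("CUDA toolkit 12.4 detected")
```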
Conclusion
SpatialLM simplifies 3D space mapping and analysis. Because it accepts point clouds derived from video, RGBD images, and LiDAR, it suits applications ranging from architecture to robotics.
By following this guide, you can install SpatialLM on Ubuntu, run inference on a sample point cloud, and visualize the predicted layout.