Run SmolVLM2 2.2B on Linux/Ubuntu: Installation Guide
SmolVLM2 2.2B is a cutting-edge vision and video model that has garnered significant attention in the AI community for its efficiency and performance. This article provides a detailed guide on how to install and run SmolVLM2 2.2B on Linux, covering the prerequisites, installation steps, and troubleshooting tips.
What is SmolVLM2 2.2B?
SmolVLM2 2.2B is part of a series of models designed to be compact yet powerful, making them suitable for deployment on a variety of devices, including those with limited computational resources. The model is available in different sizes, but the 2.2B version is particularly notable for its balance between size and capability.
Prerequisites to Run SmolVLM2 2.2B on Linux
Before you start installing SmolVLM2 2.2B on your Linux system, ensure you have the following prerequisites:
- Linux Distribution: You can use any modern Linux distribution such as Ubuntu, Debian, or Fedora. This guide will focus on Ubuntu as an example.
- Python Environment: Python 3.8 or later is recommended. You will also need to create a virtual environment to isolate your project dependencies.
- GPU Support: While not strictly necessary, a GPU significantly speeds up model execution. Ensure your system has a compatible GPU and install the appropriate drivers (a quick check is shown after this list).
- Memory and Storage: Ensure you have sufficient RAM (at least 16 GB recommended) and disk space (about 10 GB for the model and dependencies).
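To verify that a compatible GPU and driver are visible before you begin (assuming an NVIDIA card; the check differs on other hardware):
nvidia-smi
If the command reports your GPU and driver version, you are ready. If it is missing, install the drivers first, for example with sudo ubuntu-drivers autoinstall on Ubuntu.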
Step-by-Step Installation Guide
Step 1: Update Your Linux System
First, update your Linux system to ensure you have the latest packages:
sudo apt update
sudo apt upgrade
Step 2: Install Python and Virtual Environment
Install Python: If Python is not already installed, you can install it using:
sudo apt install python3 python3-pip
Create a Virtual Environment: Install the venv module if you don't have it, then create a new virtual environment:
sudo apt install python3-venv
python3 -m venv smolvlm-env
Activate the Virtual Environment:
source smolvlm-env/bin/activate
Step 3: Install Required Packages
Install the necessary packages for running SmolVLM2 2.2B:
pip install torch torchvision transformers
If you have a GPU, ensure you install the CUDA toolkit and cuDNN library compatible with your GPU. You can find instructions on the NVIDIA website.
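For example, to install a CUDA-enabled PyTorch build with pip, you can point pip at one of the CUDA wheel indexes (cu121 below is only an illustration; check the PyTorch website for the index matching your CUDA version):
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121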
Step 4: Download SmolVLM2 2.2B Model
You can download the SmolVLM2 2.2B model from the Hugging Face model hub using the transformers library installed in the previous step (run pip install transformers if you skipped it). Download the model with the following Python script:
from transformers import AutoProcessor, AutoModelForImageTextToText

# Load model and processor
model_name = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"
model = AutoModelForImageTextToText.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)
This script will automatically download the model if it's not already present locally.
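If you prefer to pre-fetch the weights into the local Hugging Face cache (useful on a slow or unreliable connection), the huggingface_hub CLI can do it outside of Python:
pip install huggingface_hub
huggingface-cli download HuggingFaceTB/SmolVLM2-2.2B-Instruct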
Step 5: Run SmolVLM2 2.2B
To run the model, you can use a simple Python script. Here’s an example that processes an image:
from PIL import Image
import torch

# Load image
image = Image.open("path/to/your/image.jpg")
# Preprocess image
inputs = processor(images=image, return_tensors="pt")
# Run inference
outputs = model.generate(**inputs, max_new_tokens=64)
# Decode the generated token IDs into text and print the result
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
Replace "path/to/your/image.jpg"
with the path to the image you want to process.
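Note that instruct-tuned checkpoints such as SmolVLM2-2.2B-Instruct usually respond best when prompted through the processor's chat template rather than with a bare image. Here is a minimal sketch, assuming the chat-template API of recent transformers versions (the accepted content keys — "path", "url", or a PIL image — vary by version):
# Build a chat-style prompt pairing the image with a question
messages = [
    {"role": "user", "content": [
        {"type": "image", "path": "path/to/your/image.jpg"},
        {"type": "text", "text": "Describe this image."},
    ]},
]
# Convert the chat into model inputs (recent transformers versions)
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])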
Creating a GUI Application with Gradio
For a more interactive experience, you can create a GUI application using Gradio. First, install Gradio:
pip install gradio
Then, create a simple Gradio app:
import gradio as gr
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

# Load model and processor
model_name = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"
model = AutoModelForImageTextToText.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)

def process_image(image):
    # Preprocess the uploaded image, generate, and decode to text
    inputs = processor(images=image, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    return processor.batch_decode(outputs, skip_special_tokens=True)[0]

demo = gr.Interface(
    fn=process_image,
    inputs=gr.Image(type="pil"),
    outputs="text",
    title="SmolVLM2 2.2B Image Processing",
    description="Upload an image to generate text",
)

if __name__ == "__main__":
    demo.launch()
Run this script to launch the Gradio app in your web browser.
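By default Gradio serves the app locally (typically at http://127.0.0.1:7860). To reach it from another machine on your network, you can pass Gradio's standard bind parameters:
demo.launch(server_name="0.0.0.0", server_port=7860)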
Real-World Coding Examples
Example 1: Using Python with Hugging Face Transformers
To run SmolVLM2 2.2B on Linux using Python and the Hugging Face Transformers library, follow these steps:
Install Dependencies: Ensure you have the latest version of the Transformers library installed. You can install it directly from the GitHub repository to get the most recent features:
pip install git+https://github.com/huggingface/transformers.git
Load the Model and Processor: Load the SmolVLM2 2.2B model and processor using the following Python script:
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

# Replace with the actual model path
model_path = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"

# Use GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the processor and model (flash_attention_2 requires the
# flash-attn package and a CUDA GPU; omit it to use the default attention)
processor = AutoProcessor.from_pretrained(model_path)
model = AutoModelForImageTextToText.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    _attn_implementation="flash_attention_2"
).to(device)
Perform Inference: Use the loaded model to perform inference on an image. Here's an example of how to generate a text description of an image:
from PIL import Image

# Load an image
image = Image.open("path_to_your_image.jpg")
# Prepare inputs, casting floating-point tensors to match the model's bfloat16 dtype
inputs = processor(images=image, return_tensors="pt").to(device, dtype=torch.bfloat16)
# Generate text
generated_ids = model.generate(**inputs, max_new_tokens=64)
generated_text = processor.decode(generated_ids[0], skip_special_tokens=True)
print(generated_text)
This script loads the SmolVLM2 2.2B model, processes an image, and generates a text description of the image.
Example 2: Using Docker for a Portable Setup
For a more isolated and portable setup, you can run SmolVLM2 2.2B using Docker on Linux. This ensures that all dependencies are contained within the Docker environment.
- Install Docker: Download and install Docker on your Linux system. You can find the installation instructions on the official Docker website (a minimal Ubuntu quick-start is shown after this list).
- Pull and Run the Docker Image: Use the following commands to pull and run the Docker image:
docker pull clamsproject/app-smolvlm2-captioner
docker run -p 5000:5000 clamsproject/app-smolvlm2-captioner
- Access the Web Interface: Open your web browser and navigate to http://localhost:5000 to access the web interface for SmolVLM2.
This will start the SmolVLM2 server inside a Docker container, accessible at http://localhost:5000.
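If Docker is not yet installed, a common quick path on Ubuntu is the distribution package (the docker-ce packages from Docker's own apt repository are the officially recommended alternative):
sudo apt install docker.io
sudo systemctl enable --now docker
docker --version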
Troubleshooting Tips
- GPU Issues: Ensure your GPU drivers are up-to-date. If you encounter CUDA errors, check that your CUDA and cuDNN versions are compatible with your PyTorch installation (a quick sanity check follows this list).
- Memory Errors: If you encounter memory errors, consider reducing the batch size or using a smaller model.
- Model Download Issues: If the model fails to download, check your internet connection and try downloading it manually from the Hugging Face model hub.
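A quick way to confirm that PyTorch can actually see your GPU from the active environment:
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
If this prints False on a machine with an NVIDIA GPU, the usual culprits are a CPU-only PyTorch build or a driver/CUDA version mismatch.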
Future Directions
As AI models continue to evolve, it's essential to stay updated with the latest developments. Here are some future directions you might consider:
- Experiment with Different Models: Explore other models available on the Hugging Face hub to compare performance and capabilities.
- Optimize Performance: Look into techniques like model pruning or quantization to improve efficiency on devices with limited resources (see the sketch after this list).
- Integrate with Other Tools: Consider integrating SmolVLM2 with other AI tools or frameworks to build more complex applications.
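As one concrete illustration of the quantization idea, here is a minimal sketch that loads the model in 4-bit precision via the standard transformers quantization API. It assumes a CUDA GPU and the bitsandbytes and accelerate packages; it is a sketch of the general approach, not a configuration verified against this specific checkpoint:
from transformers import AutoProcessor, AutoModelForImageTextToText, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization (requires the bitsandbytes package and a CUDA GPU)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForImageTextToText.from_pretrained(
    "HuggingFaceTB/SmolVLM2-2.2B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",  # requires the accelerate package
)
processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM2-2.2B-Instruct")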
Conclusion
Running SmolVLM2 2.2B on Linux is a straightforward process that requires careful setup of your environment and dependencies. By following this guide, you can leverage the power of this model for vision and video tasks, whether you're working on a research project or building a practical application.