How to Run Gemma 3 on Ubuntu: A Comprehensive Guide

Gemma 3, Google's latest open-weight multimodal AI model, is a groundbreaking tool capable of processing text, images, and short videos. Designed for accessibility, efficiency, and versatility, it is an excellent choice for developers and researchers.
This guide provides a detailed walkthrough on running Gemma 3 on Ubuntu, covering prerequisites, installation methods, and optimization tips.
Overview of Gemma 3
Key Features:
- Multimodal Capabilities: Processes text, images, and videos seamlessly.
- Open Weights: Allows fine-tuning and commercial use.
- Optimized Performance: Runs efficiently on single GPUs.
- Multilingual Support: Compatible with over 140 languages.
- Scalability: Model sizes range from 1 billion to 27 billion parameters.
Gemma 3 is useful for applications such as content creation, multilingual translation, medical image analysis, and autonomous systems.
Prerequisites
Before installing Gemma 3 on Ubuntu, ensure your system meets the following requirements:
Hardware Requirements
- GPU:
  - Small models (1B or 4B parameters): NVIDIA GTX 1650 (4 GB VRAM) or equivalent.
  - Large models (12B or 27B parameters): NVIDIA RTX 3090 (24 GB VRAM) or higher.
- Disk Space: At least 100 GB free storage.
- RAM: Minimum of 16 GB recommended.
Software Requirements
- Ubuntu 20.04 or later (64-bit).
- NVIDIA CUDA Toolkit for GPU acceleration.
- Python (version ≥3.8).
- Administrative privileges for software installation.
Optional Tools
- Jupyter Notebook for experimentation.
- Docker for containerized deployment.
Step-by-Step Installation Guide
There are two primary methods to run Gemma 3 on Ubuntu: using Ollama or Hugging Face Transformers. Both approaches are covered below.
Method 1: Using Ollama
Ollama simplifies running AI models locally. Follow these steps:
Update System Packages Bring the system up to date:
sudo apt update && sudo apt upgrade -y
Install GPU Utilities Ensure your GPU can be detected:
sudo apt install pciutils lshw -y
Install Ollama Download and install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
Start the Ollama Server Launch the server (on most installs it also runs automatically as a background service):
ollama serve
Install Gemma 3 Models Run the appropriate command for your model size; Ollama downloads the model on first use:
ollama run gemma3:1b
ollama run gemma3:4b
ollama run gemma3:12b
ollama run gemma3:27b
Verify Installation List the installed models:
ollama list
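Once the server is running and a model is pulled, you can also query Ollama programmatically through its local REST API (default port 11434). A minimal Python sketch using only the standard library, assuming the default endpoint and a pulled gemma3:4b model:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(prompt, model="gemma3:4b"):
    """Build the JSON body expected by Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="gemma3:4b"):
    """Send a prompt to the local Ollama server and return the generated text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With the server running:
# print(generate("Explain quantization in one sentence."))
```

Setting "stream": False returns the whole response in a single JSON object; omit it to receive incremental chunks as the model generates.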
Method 2: Using Hugging Face Transformers
Hugging Face provides flexibility for developers familiar with Python and machine learning.
Install Python Dependencies
pip install transformers torch torchvision accelerate
Download Pretrained Weights Gemma 3 checkpoints on Hugging Face are published per size and variant (for example google/gemma-3-1b-it); you must accept the license on the model page before downloading:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")
Run Inference
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
inputs = tokenizer("Your input text", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Fine-Tune the Model (Optional)
from peft import LoraConfig, get_peft_model
config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16)
model = get_peft_model(model, config)
# Proceed with fine-tuning...
Optimizations for Low-End Devices
If running Gemma 3 on consumer-grade hardware:
- Use smaller models (gemma3:1b or gemma3:4b).
- Optimize inference speed using tools like llama.cpp.
- Enable quantization to reduce memory usage, e.g. by pulling a pre-quantized variant such as the 4-bit quantization-aware-trained (QAT) builds:
ollama run gemma3:12b-it-qat
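Quantization helps because weight memory scales directly with bits per parameter. A back-of-the-envelope estimate (weights only; activations and KV cache add overhead on top):

```python
def model_memory_gb(params_billion, bits_per_param):
    """Approximate weight memory in GB: parameter count x bytes per parameter."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# Gemma 3 27B: ~54 GB at 16-bit vs ~13.5 GB at 4-bit
print(model_memory_gb(27, 16))  # 54.0
print(model_memory_gb(27, 4))   # 13.5
```

This is why the 27B model needs quantization to approach a single 24 GB consumer GPU, while the 1B and 4B models fit comfortably even at 16-bit precision.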
Practical Applications
- Content Creation:
  - Automate blog writing with multimodal inputs (text + images).
  - Generate social media posts in multiple languages.
- Medical Image Analysis:
  - Analyze X-rays or MRI scans using high-resolution image processing.
- Multilingual Chatbots:
  - Build AI assistants that understand and respond in over 140 languages.
- Autonomous Systems:
  - Train robots or self-driving cars using multimodal datasets.
Troubleshooting Common Issues
- Insufficient VRAM Error:
  - Reduce model size or enable quantization.
  - Run in CPU-only mode as a fallback (not recommended for large models).
- Slow Performance:
  - Use smaller models or spread inference across multiple GPUs.
  - Optimize batch sizes during inference.
- CUDA Not Found:
  - Ensure the NVIDIA driver and CUDA Toolkit are installed and properly configured:
nvidia-smi
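As a quick programmatic check, you can verify which NVIDIA tools are visible on your PATH from Python (standard library only; this confirms the tools are installed, not that a GPU is actually usable):

```python
import shutil

def cuda_tooling_status():
    """Return whether common NVIDIA/CUDA command-line tools are on PATH."""
    return {tool: shutil.which(tool) is not None
            for tool in ("nvidia-smi", "nvcc")}

print(cuda_tooling_status())  # e.g. {'nvidia-smi': True, 'nvcc': False}
```

If nvidia-smi is present but a CUDA build of PyTorch still reports no GPU, check that torch.cuda.is_available() returns True and that the installed driver supports the CUDA version your PyTorch build was compiled against.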
Conclusion
Running Gemma 3 on Ubuntu opens up a world of possibilities for developers and researchers. By following this guide, you can harness the power of this state-of-the-art AI model for applications ranging from content generation to advanced image analysis.