How to Run Gemma 3 on Ubuntu: A Comprehensive Guide

Gemma 3, Google's latest open-weight multimodal AI model, is a groundbreaking tool capable of processing text, images, and short videos. Designed for accessibility, efficiency, and versatility, it is an excellent choice for developers and researchers.

This guide provides a detailed walkthrough on running Gemma 3 on Ubuntu, covering prerequisites, installation methods, and optimization tips.

Overview of Gemma 3

Key Features:

  • Multimodal Capabilities: The 4B, 12B, and 27B models process text and images (short video is handled as sequences of frames); the 1B model is text-only.
  • Open Weights: Allows fine-tuning and commercial use.
  • Optimized Performance: Runs efficiently on single GPUs.
  • Multilingual Support: Compatible with over 140 languages.
  • Scalability: Model sizes range from 1 billion to 27 billion parameters.

Gemma 3 is useful for applications such as content creation, multilingual translation, medical image analysis, and autonomous systems.

Prerequisites

Before installing Gemma 3 on Ubuntu, ensure your system meets the following requirements:

Hardware Requirements

  • GPU:
    • Small models (1B or 4B parameters): NVIDIA GTX 1650 (4GB VRAM) or equivalent.
    • Large models (12B or 27B parameters): NVIDIA RTX 3090 (24GB VRAM) or higher.
  • Disk Space: At least 100 GB free storage.
  • RAM: Minimum of 16 GB recommended.
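
Before choosing a model size, you can confirm how much VRAM your GPU actually exposes. Assuming the NVIDIA driver is already installed, nvidia-smi reports the GPU name and total memory:

nvidia-smi --query-gpu=name,memory.total --format=csv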

Software Requirements

  • Ubuntu 20.04 or later (64-bit).
  • NVIDIA CUDA Toolkit for GPU acceleration.
  • Python (version ≥3.8).
  • Administrative privileges for software installation.
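
The following terminal commands confirm the OS release, Python version, and CUDA toolkit version (the last assumes nvcc is on your PATH):

lsb_release -a
python3 --version
nvcc --version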

Optional Tools

  • Jupyter Notebook for experimentation.
  • Docker for containerized deployment.
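
If you prefer the containerized route, Ollama publishes an official Docker image. A minimal sketch of a GPU-enabled container, assuming the NVIDIA Container Toolkit is installed:

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama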

Step-by-Step Installation Guide

There are two primary methods to run Gemma 3 on Ubuntu: using Ollama or Hugging Face Transformers. Both approaches are covered below.

Method 1: Using Ollama

Ollama simplifies running AI models locally. Follow these steps in order:

Step 1: Update System Packages

sudo apt update && sudo apt upgrade -y

Step 2: Install GPU Utilities

Ensure your GPU can be detected:

sudo apt install pciutils lshw -y

Step 3: Install Ollama

Download and install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

Step 4: Start the Ollama Server

The installer typically registers Ollama as a systemd service; if the server is not already running, launch it manually:

ollama serve

Step 5: Install a Gemma 3 Model

Run the appropriate command for your model size (each command downloads the model on first use and opens an interactive prompt):

ollama run gemma3:1b
ollama run gemma3:4b
ollama run gemma3:12b
ollama run gemma3:27b

Step 6: Verify the Installation

Check that the model appears in the local model list:

ollama list
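
With the server running, you can also query the model programmatically through Ollama's local REST API, which listens on port 11434 by default. A minimal sketch:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:4b",
  "prompt": "Explain quantization in one sentence.",
  "stream": false
}'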

Method 2: Using Hugging Face Transformers

Hugging Face provides flexibility for developers familiar with Python and machine learning. Note that the Gemma weights are gated: accept the license on the model page and authenticate with huggingface-cli login before downloading.

Step 1: Install Python Dependencies

pip install transformers torch torchvision accelerate peft

Step 2: Download Pretrained Weights

from transformers import AutoModelForCausalLM

# Text-only 1B instruction-tuned checkpoint; the multimodal 4B/12B/27B
# checkpoints (e.g., google/gemma-3-4b-it) use vision-capable classes
# instead; see their model cards.
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it", device_map="auto")

Step 3: Run Inference

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
inputs = tokenizer("Your input text", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Step 4: Fine-Tune the Model (Optional)

from peft import LoraConfig, get_peft_model

# Illustrative LoRA settings; tune the rank and target modules for your task.
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, config)

# Proceed with fine-tuning...
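
For quick experiments, the high-level pipeline API wraps the tokenizer and model in a single call. A minimal sketch using the same 1B checkpoint as above:

from transformers import pipeline

# pipeline handles tokenization, generation, and decoding in one step.
generator = pipeline("text-generation", model="google/gemma-3-1b-it", device_map="auto")
print(generator("Write a haiku about Ubuntu.", max_new_tokens=60)[0]["generated_text"])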

Optimizations for Low-End Devices

If running Gemma 3 on consumer-grade hardware:

  • Use smaller models (gemma3:1b or gemma3:4b).
  • Optimize inference speed with GGUF-based runtimes such as llama.cpp.

Quantization (e.g., 4-bit precision) further reduces memory usage. The default gemma3 tags served by Ollama are already 4-bit quantized, so no extra step is needed on that side; with Hugging Face Transformers, the weights can be loaded in 4-bit as sketched below.
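A minimal 4-bit loading sketch with Transformers and bitsandbytes, assuming pip install bitsandbytes and a CUDA-capable GPU:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the weights as 4-bit NF4 to roughly quarter the memory footprint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it",
    quantization_config=bnb_config,
    device_map="auto",
)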

Practical Applications

  1. Content Creation:
    • Automate blog writing with multimodal inputs (text + images).
    • Generate social media posts in multiple languages.
  2. Medical Image Analysis:
    • Analyze X-rays or MRI scans using high-resolution image processing.
  3. Multilingual Chatbots:
    • Build AI assistants that understand and respond in over 140 languages.
  4. Autonomous Systems:
    • Train robots or self-driving cars using multimodal datasets.

Troubleshooting Common Issues

  1. Insufficient VRAM Error:
    • Reduce model size or enable quantization.
    • Run in CPU-only mode as a fallback (not recommended for large models).
  2. Slow Performance:
    • Use smaller models or distributed training across multiple GPUs.
    • Optimize batch sizes during inference.
  3. CUDA Not Found:
    • Ensure the NVIDIA driver and CUDA toolkit are installed and properly configured:

nvidia-smi
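
If nvidia-smi looks healthy but inference still falls back to the CPU, check whether PyTorch itself can see the GPU:

python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"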

Conclusion

Running Gemma 3 on Ubuntu opens up a world of possibilities for developers and researchers. By following this guide, you can harness the power of this state-of-the-art AI model for applications ranging from content generation to advanced image analysis.
