Run Microsoft Phi-4 on Ubuntu: A Comprehensive Guide

Microsoft’s Phi-4 is a state-of-the-art 14-billion-parameter AI model optimized for complex reasoning tasks such as mathematical problem-solving, code generation, and natural language understanding, with a multimodal variant that adds vision and audio understanding. Running Phi-4 locally on Ubuntu allows developers, researchers, and enthusiasts to leverage its capabilities on their own hardware.

What is Microsoft Phi-4?

Microsoft Phi-4 is a cutting-edge AI model designed to process and generate text; the multimodal variant (Phi-4-multimodal-instruct) can additionally understand images and transcribe or translate audio.

Running Phi-4 locally on Ubuntu provides:

  • Data privacy: Keep your data on-premises.
  • Customization: Fine-tune and experiment freely.
  • Performance: Utilize local hardware for faster inference.

System Requirements

Before installing Phi-4, ensure your system meets the following requirements:

  • Operating System: Ubuntu 20.04 or later (64-bit)
  • GPU: NVIDIA GPU with CUDA support (RTX 4090, A6000, or equivalent recommended)
  • CUDA Toolkit: Version compatible with your GPU drivers (CUDA 12.x preferred)
  • RAM: At least 32 GB (48 GB recommended for large models)
  • Disk Space: Minimum 40 GB free
  • Python: Version 3.8 or higher (Python 3.12 recommended)
  • Internet Connection: Required for downloading model weights and dependencies
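
You can quickly check most of these requirements from a terminal with standard utilities:

free -h            # total and available RAM
df -h .            # free disk space on the current partition
nvidia-smi         # GPU model, driver version, and VRAM
python3 --version  # installed Python version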

Preparing Your Ubuntu Environment

Update System Packages

Open your terminal and run:

sudo apt update && sudo apt upgrade -y

Install Essential Tools

sudo apt install python3 python3-pip python3-venv git unzip curl -y

Verify GPU and CUDA Installation

Check whether the NVIDIA drivers and CUDA Toolkit are already installed:

nvidia-smi
nvcc --version

If these commands fail, refer to the official NVIDIA documentation to install the latest drivers and CUDA toolkit.
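
On Ubuntu, one common way to install a suitable NVIDIA driver is the ubuntu-drivers tool; this is a general sketch, and the CUDA Toolkit itself should still be installed per NVIDIA's documentation:

sudo apt install ubuntu-drivers-common -y
sudo ubuntu-drivers autoinstall   # installs the recommended proprietary driver
sudo reboot                       # after rebooting, re-run nvidia-smi to confirm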

Installation Methods Overview

There are several ways to run Phi-4 on Ubuntu. The best method depends on your use case, technical comfort, and hardware:

Method          Best For                   Ease of Setup   Flexibility   GPU Required
Ollama          Quick setup, chat UI       Easiest         Moderate      Yes
Python Direct   Custom scripts, research   Moderate        High          Yes
vLLM            High performance, APIs     Advanced        High          Yes

Method 1: Running Phi-4 with Ollama

Ollama is a user-friendly platform for running large language models locally. It abstracts much of the complexity, making it ideal for quick deployment and experimentation.

Step 1: Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

This script installs Ollama and its dependencies on your system.

Step 2: Start Ollama Service

ollama serve

This command starts the Ollama server, allowing you to interact with models.

Step 3: Download Phi-4 Model

List available models:

ollama list

Pull the Phi-4 model:

ollama pull vanilj/Phi-4

If you have more RAM or VRAM available, you can pull the 8-bit quantized build, which trades a larger memory footprint for higher output quality:

ollama pull vanilj/Phi-4-q8_0

Step 4: Run Phi-4 in Terminal

Start an interactive chat session:

ollama run vanilj/Phi-4

You can now type queries and receive responses directly in your terminal.
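
Ollama also exposes a local HTTP API (on port 11434 by default), which is convenient for scripting. A minimal sketch using Python's requests library (install it with pip if needed):

import requests

# Send a single prompt to the locally running Ollama server and print the reply.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "vanilj/Phi-4",
        "prompt": "Explain quantum entanglement in simple terms.",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(response.json()["response"])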

Optional: Web Interface with OpenWebUI

For a graphical interface, you can use OpenWebUI via Docker:

docker pull openwebui/openwebui
docker run -d -p 8080:8080 openwebui/openwebui

Then connect OpenWebUI to your local Ollama instance.

Method 2: Direct Python Installation (Hugging Face Transformers)

For full flexibility, especially for research or custom pipelines, install and run Phi-4 using Python and Hugging Face Transformers.

Step 1: Set Up Python Virtual Environment

python3 -m venv phi4env
source phi4env/bin/activate

Step 2: Install Required Python Packages

Create a requirements.txt file with the following content:

flash_attn==2.7.4.post1
torch==2.6.0
transformers==4.48.2
accelerate==1.3.0
soundfile==0.13.1
pillow==11.1.0
scipy==1.15.2
torchvision==0.21.0
backoff==2.2.1
peft==0.13.2
huggingface-hub
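
Then install the packages into the active virtual environment (flash_attn builds CUDA extensions, so this step may take a while):

pip install -r requirements.txt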

Step 3: Download Phi-4 Model Weights

Create a directory for the model:

mkdir model

Download the model from Hugging Face:

pip install "huggingface_hub[cli]"
huggingface-cli download microsoft/Phi-4-multimodal-instruct --local-dir ./model

Step 4: Example Python Script to Run Phi-4

import torch
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig
from PIL import Image          # used for the image example later in this guide
import soundfile as sf         # used for the audio example later in this guide
import io
from urllib.request import urlopen

model_path = "./model"

# Load the processor and the model from the locally downloaded weights
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
    attn_implementation="flash_attention_2",
)
generation_config = GenerationConfig.from_pretrained(model_path)
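
To sanity-check the setup, here is a minimal text-only generation sketch; the <|user|>, <|end|>, and <|assistant|> markers follow the chat format described on the Phi-4-multimodal-instruct model card, and the snippet reuses the processor, model, and generation_config loaded above:

# Minimal text-only prompt using the model's chat markers
prompt = "<|user|>Explain quantum entanglement in simple terms.<|end|><|assistant|>"
inputs = processor(text=prompt, return_tensors="pt").to("cuda")

output_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    generation_config=generation_config,
)
# Drop the prompt tokens so only the generated answer is decoded
answer_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(answer_ids, skip_special_tokens=True)[0])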

Method 3: High-Performance Serving with vLLM

Step 1: Install vLLM

Follow the official vLLM installation instructions (typically via pip):

pip install vllm

Step 2: Download GGUF Model Weights

wget https://huggingface.co/microsoft/phi-4-gguf/resolve/main/phi-4-q4.gguf

Step 3: Serve the Model

vllm serve ./phi-4-q4.gguf --tokenizer microsoft/phi-4 --host 0.0.0.0 --port 7000

This will start the vLLM API server, accessible on port 7000.
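vLLM's server exposes OpenAI-compatible endpoints. A minimal sketch that queries the completions endpoint with Python's requests library; the "model" value is assumed to match the path passed to vllm serve above:

import requests

# Ask the locally served Phi-4 GGUF model for a completion.
response = requests.post(
    "http://localhost:7000/v1/completions",
    json={
        "model": "./phi-4-q4.gguf",   # assumed to match the `vllm serve` path
        "prompt": "Explain quantum entanglement in simple terms.",
        "max_tokens": 200,
    },
)
print(response.json()["choices"][0]["text"])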

Note: Some users have reported issues with engine process failures and LoRA adapter errors. Ensure all dependencies are compatible and your GPU drivers are up to date. See the Troubleshooting section for more details.

Method 4: Running the Text-Only Phi-4 with Hugging Face Transformers

This method mirrors Method 2 but targets the text-only microsoft/Phi-4 checkpoint, which uses a standard tokenizer instead of the multimodal processor.

Installation Steps

Update Ubuntu:

sudo apt update && sudo apt upgrade -y

Install Dependencies:

sudo apt install python3 python3-pip python3-venv git unzip -y

Create a Virtual Environment:

python3 -m venv venv
source venv/bin/activate

Install Python Packages: Create a requirements.txt file with the following dependencies:

flash_attn==2.7.4.post1
torch==2.6.0
transformers==4.48.2
accelerate==1.3.0
soundfile==0.13.1
pillow==11.1.0
scipy==1.15.2
torchvision==0.21.0
backoff==2.2.1
peft==0.13.2

Install the dependencies:

pip install -r requirements.txt

Download Phi-4 Model (if the huggingface-cli tool is not yet available, install it with pip install "huggingface_hub[cli]"):

mkdir model
huggingface-cli download microsoft/Phi-4 --local-dir ./model

Run Phi-4: Create a Python script to load and run the Phi-4 model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the locally downloaded model and tokenizer; device_map="auto" places the
# weights on the GPU when one is available (requires the accelerate package).
model = AutoModelForCausalLM.from_pretrained('./model', torch_dtype='auto', device_map='auto')
tokenizer = AutoTokenizer.from_pretrained('./model')

# Tokenize the input prompt and move it to the model's device
input_text = "What are the applications of quantum computing?"
inputs = tokenizer(input_text, return_tensors='pt').to(model.device)

# Generate and decode a response
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
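
Save the script (for example as run_phi4.py; the filename here is only illustrative) and run it from inside the activated virtual environment:

python run_phi4.py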

Live Examples

Example 1: Solving a Complex Algebra Problem

    • Problem: Solve the quadratic equation 2x² + 3x − 2 = 0.
    • Explanation: The model provides a step-by-step solution to the quadratic equation, making it a valuable tool for learning and understanding complex problems.

Steps:

ollama pull vanilj/Phi-4
ollama run vanilj/Phi-4 "Solve the quadratic equation 2x^2 + 3x - 2 = 0"

Output:

To solve the quadratic equation 2x^2 + 3x - 2 = 0, we use the quadratic formula:
x = (-b ± sqrt(b^2 - 4ac)) / (2a)

Here, a = 2, b = 3, and c = -2.

First, calculate the discriminant (D):
D = b^2 - 4ac
D = 3^2 - 4 * 2 * (-2)
D = 9 + 16
D = 25

Now, calculate the roots:
x1 = (-3 + sqrt(25)) / (2 * 2)
x1 = (-3 + 5) / 4
x1 = 2 / 4
x1 = 0.5

x2 = (-3 - sqrt(25)) / (2 * 2)
x2 = (-3 - 5) / 4
x2 = -8 / 4
x2 = -2

The roots of the equation 2x^2 + 3x - 2 = 0 are x1 = 0.5 and x2 = -2.

Example 2: Generating Python Code for Fibonacci Sequence

    • Problem: Generate a Python script to continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8.
    • Explanation: The model generates a Python function fibonacci_sequence that computes the Fibonacci sequence up to n terms and prints the sequence.

Steps:

ollama pull vanilj/Phi-4
ollama run vanilj/Phi-4 "Generate a Python script to continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8"

Output:

# Python script to continue the Fibonacci sequence
def fibonacci_sequence(n):
    a, b = 1, 1
    sequence = [a, b]
    for _ in range(n - 2):
        a, b = b, a + b
        sequence.append(b)
    return sequence

# Continue the sequence up to 10 terms
fib_sequence = fibonacci_sequence(10)
print(fib_sequence)

Using Phi-4: Examples

Text Generation

Prompt:

prompt = "Explain quantum entanglement in simple terms."

Command (Ollama):

ollama run vanilj/Phi-4
# Then type your prompt

Python:
Use the processor and model as shown in the previous script.

Image Understanding

Provide an image and ask for a description or analysis, as in the sketch below.
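
A hedged sketch of image input, following the pattern shown on the Phi-4-multimodal-instruct model card; it reuses the processor, model, and generation_config loaded in Method 2, example.jpg is only a placeholder path, and the <|image_1|> token marks where the image is inserted into the prompt:

from PIL import Image

image = Image.open("example.jpg")  # placeholder path; use any local image
prompt = "<|user|><|image_1|>Describe this image in detail.<|end|><|assistant|>"
inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")

output_ids = model.generate(**inputs, max_new_tokens=256, generation_config=generation_config)
answer_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(answer_ids, skip_special_tokens=True)[0])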

Audio Transcription and Translation

Provide an audio file and prompt Phi-4 to transcribe or translate it, as in the sketch below.
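
A similar hedged sketch for audio, again following the model card's pattern; sample.wav is a placeholder filename, the audio is passed as a (waveform, sample_rate) tuple, and the <|audio_1|> token marks its position in the prompt:

import soundfile as sf

audio, samplerate = sf.read("sample.wav")  # placeholder path; any speech recording
prompt = "<|user|><|audio_1|>Transcribe this audio and translate it into French.<|end|><|assistant|>"
inputs = processor(text=prompt, audios=[(audio, samplerate)], return_tensors="pt").to("cuda")

output_ids = model.generate(**inputs, max_new_tokens=256, generation_config=generation_config)
answer_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(answer_ids, skip_special_tokens=True)[0])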

Troubleshooting and Common Issues

  • Model Initialization Failures: Double-check file paths and ensure all dependencies are installed.
  • CUDA/Driver Issues: Ensure your NVIDIA drivers and CUDA toolkit are compatible and up to date.
  • Out of Memory Errors: Use quantized model versions (e.g., Q4, Q8) or upgrade your hardware.
  • vLLM Engine Process Failed: Review stack traces for missing dependencies or incompatible versions.
  • LoRA Adapter Errors: Ensure all required adapters are present and correctly referenced.
  • Slow Performance: Close unnecessary applications, ensure sufficient VRAM, and consider using more powerful GPUs.

Best Practices for Performance and Security

  • Use a Dedicated GPU Node: For production or research, dedicate a machine with ample GPU resources.
  • Virtual Environments: Always use Python virtual environments to avoid dependency conflicts.
  • SSH Keys for Remote Access: Use SSH keys instead of passwords for secure remote connections.
  • Monitor GPU Usage: Use nvidia-smi to monitor GPU utilization and temperature (see the example after this list).
  • Regular Updates: Keep your system, drivers, and packages updated to benefit from performance improvements and security patches.
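
For continuous monitoring, one simple option is to refresh nvidia-smi every second:

watch -n 1 nvidia-smi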

Conclusion

Running Microsoft Phi-4 on Ubuntu gives you a powerful, flexible, and private AI system capable of advanced language, vision, and audio tasks. Phi-4 is particularly strong at mathematical reasoning and detailed step-by-step explanations, which makes it a valuable tool for students, educators, and professionals. Whether you choose the simplicity of Ollama, the flexibility of Python, or the performance of vLLM, Phi-4 can be tailored to your workflow.
