Run Microsoft Phi-4 on Ubuntu: A Comprehensive Guide
Microsoft’s Phi-4 is a state-of-the-art small language model family: the base Phi-4 is a 14-billion-parameter model optimized for complex reasoning, and the Phi-4-multimodal variant adds vision and audio understanding on top of its language capabilities. Running Phi-4 locally on Ubuntu lets developers, researchers, and enthusiasts leverage these capabilities on their own hardware.
What is Microsoft Phi-4?
Microsoft Phi-4 is a cutting-edge multimodal AI model, designed to process and generate text, understand images, and transcribe or translate audio.
Running Phi-4 locally on Ubuntu provides:
- Data privacy: Keep your data on-premises.
- Customization: Fine-tune and experiment freely.
- Performance: Utilize local hardware for faster inference.
System Requirements
Before installing Phi-4, ensure your system meets the following requirements:
- Operating System: Ubuntu 20.04 or later (64-bit)
- GPU: NVIDIA GPU with CUDA support (RTX 4090, A6000, or equivalent recommended)
- CUDA Toolkit: Version compatible with your GPU drivers (CUDA 12.x preferred)
- RAM: At least 32 GB (48 GB recommended for large models)
- Disk Space: Minimum 40 GB free
- Python: Version 3.8 or higher (Python 3.12 recommended)
- Internet Connection: Required for downloading model weights and dependencies
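Before installing anything, it can help to confirm these basics from a terminal. The commands below only read system information; the nvidia-smi check assumes the NVIDIA driver is already installed:

```bash
lsb_release -a      # Ubuntu release
python3 --version   # Python version
free -h             # installed RAM
df -h /             # free disk space
nvidia-smi          # GPU model, driver version, and VRAM
```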
Preparing Your Ubuntu Environment
Update System Packages
Open your terminal and run:
```bash
sudo apt update && sudo apt upgrade -y
```
Install Essential Tools
```bash
sudo apt install python3 python3-pip python3-venv git unzip curl -y
```
Verify GPU and CUDA Installation
Install NVIDIA drivers and CUDA Toolkit if not already present:
```bash
nvidia-smi
nvcc --version
```
If these commands fail, refer to the official NVIDIA documentation to install the latest drivers and CUDA toolkit.
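If either command fails and you prefer Ubuntu's packaged tooling over a manual download from NVIDIA, one common route is sketched below. Note that this installs the distribution's recommended driver and a packaged CUDA toolkit, which may lag behind the newest releases:

```bash
sudo ubuntu-drivers autoinstall          # install the recommended NVIDIA driver
sudo apt install -y nvidia-cuda-toolkit  # packaged CUDA toolkit (provides nvcc)
sudo reboot                              # reboot so the new driver is loaded
```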
Installation Methods Overview
There are several ways to run Phi-4 on Ubuntu. The best method depends on your use case, technical comfort, and hardware:
Method | Best For | Ease of Setup | Flexibility | GPU Required |
---|---|---|---|---|
Ollama | Quick setup, chat UI | Easiest | Moderate | Yes |
Python Direct | Custom scripts, research | Moderate | High | Yes |
vLLM | High performance, APIs | Advanced | High | Yes |
Method 1: Running Phi-4 with Ollama
Ollama is a user-friendly platform for running large language models locally. It abstracts much of the complexity, making it ideal for quick deployment and experimentation.
Step 1: Install Ollama
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
This script installs Ollama and its dependencies on your system.
Step 2: Start Ollama Service
```bash
ollama serve
```
This command starts the Ollama server, allowing you to interact with models.
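On many systems the install script also registers Ollama as a systemd service, so a server may already be running in the background. If `ollama serve` reports that the address is already in use, check the existing service instead; a quick check, assuming a systemd-based install:

```bash
systemctl status ollama
curl http://localhost:11434/    # should respond "Ollama is running"
```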
Step 3: Download Phi-4 Model
List available models:
```bash
ollama list
```
Pull the Phi-4 model:
```bash
ollama pull vanilj/Phi-4
```
If you have more memory to spare and want higher-fidelity output, pull the 8-bit quantized variant instead:
```bash
ollama pull vanilj/Phi-4-q8_0
```
Step 4: Run Phi-4 in Terminal
Start an interactive chat session:
```bash
ollama run vanilj/Phi-4
```
You can now type queries and receive responses directly in your terminal.
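Beyond the interactive terminal session, Ollama also exposes a local REST API (port 11434 by default), which is handy for scripting. A minimal sketch, assuming the default port and the model tag pulled above:

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "vanilj/Phi-4",
  "prompt": "Explain quantum entanglement in simple terms.",
  "stream": false
}'
```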
Optional: Web Interface with OpenWebUI
For a graphical interface, you can use OpenWebUI via Docker:
```bash
docker pull openwebui/openwebui
docker run -d -p 8080:8080 openwebui/openwebui
```
Then connect OpenWebUI to your local Ollama instance.
Method 2: Direct Python Installation (Hugging Face Transformers)
For full flexibility, especially for research or custom pipelines, install and run Phi-4 using Python and Hugging Face Transformers.
Step 1: Set Up Python Virtual Environment
```bash
python3 -m venv phi4env
source phi4env/bin/activate
```
Step 2: Install Required Python Packages
Create a requirements.txt file with the following content:
```
flash_attn==2.7.4.post1
torch==2.6.0
transformers==4.48.2
accelerate==1.3.0
soundfile==0.13.1
pillow==11.1.0
scipy==1.15.2
torchvision==0.21.0
backoff==2.2.1
peft==0.13.2
huggingface-hub
```
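With the virtual environment still active, install the pinned packages (flash_attn compiles against your local CUDA toolkit, so this step can take a while):

```bash
pip install -r requirements.txt
```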
Step 3: Download Phi-4 Model Weights
Create a directory for the model:
```bash
mkdir model
```
Download the model from Hugging Face:
```bash
pip install "huggingface_hub[cli]"
huggingface-cli download microsoft/Phi-4-multimodal-instruct --local-dir ./model
```
Step 4: Example Python Script to Run Phi-4
```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig
from PIL import Image          # used when passing images to the processor
import soundfile as sf         # used when passing audio to the processor
import io
import requests
from urllib.request import urlopen

model_path = "./model"

# The processor handles text, image, and audio inputs for the multimodal model
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

# Load the model onto the GPU with FlashAttention 2 enabled
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
    attn_implementation='flash_attention_2',
)

# Default generation settings shipped with the model
generation_config = GenerationConfig.from_pretrained(model_path)
```
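The script above only loads the model; to generate text you still need to build a prompt, tokenize it with the processor, and decode the result. A minimal text-only continuation is sketched below; the `<|user|>`/`<|end|>`/`<|assistant|>` tags follow the chat format documented for Phi-4-multimodal-instruct, so treat them as an assumption and adjust if your model revision expects a different template:

```python
# Build a simple text-only chat prompt (tags assumed from the model's chat format)
prompt = "<|user|>Explain quantum entanglement in simple terms.<|end|><|assistant|>"

# Tokenize and move the tensors to the GPU
inputs = processor(text=prompt, return_tensors="pt").to("cuda")

# Generate a bounded response
generate_ids = model.generate(
    **inputs,
    max_new_tokens=256,
    generation_config=generation_config,
)

# Strip the prompt tokens and decode only the newly generated text
new_tokens = generate_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```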
Method 3: Running Phi-4 with vLLM
vLLM is a high-throughput inference engine that exposes an OpenAI-compatible API, making it well suited to serving Phi-4 to multiple clients or applications.
Step 1: Install vLLM
Follow the official vLLM installation instructions (typically via pip):
```bash
pip install vllm
```
Step 2: Download GGUF Model Weights
```bash
wget https://huggingface.co/microsoft/phi-4-gguf/resolve/main/phi-4-q4.gguf
```
Step 3: Serve the Model
```bash
vllm serve ./phi-4-q4.gguf --tokenizer microsoft/phi-4 --host 0.0.0.0 --port 7000
```
This will start the vLLM API server, accessible on port 7000.
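Because vLLM exposes an OpenAI-compatible API, you can query the server with any OpenAI-style client once it is up. A minimal check with curl, assuming the defaults above (port 7000) and that the served model is addressed by the path passed to `vllm serve`:

```bash
curl http://localhost:7000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "./phi-4-q4.gguf",
        "prompt": "Explain quantum entanglement in simple terms.",
        "max_tokens": 200
      }'
```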
Note: Some users have reported issues with engine process failures and LoRA adapter errors. Ensure all dependencies are compatible and your GPU drivers are up to date. See the Troubleshooting section for more details.
Method 4: Manual Python Setup (Text-Only Phi-4)
This method mirrors Method 2 but targets the text-only microsoft/Phi-4 checkpoint and the standard tokenizer interface.
Installation Steps
Step 1: Update Ubuntu
```bash
sudo apt update && sudo apt upgrade -y
```
Step 2: Install Dependencies
```bash
sudo apt install python3 python3-pip python3-venv git unzip -y
```
Step 3: Create a Virtual Environment
```bash
python3 -m venv venv
source venv/bin/activate
```
Step 4: Install Python Packages
Create a requirements.txt file with the following dependencies:
```
flash_attn==2.7.4.post1
torch==2.6.0
transformers==4.48.2
accelerate==1.3.0
soundfile==0.13.1
pillow==11.1.0
scipy==1.15.2
torchvision==0.21.0
backoff==2.2.1
peft==0.13.2
```
Install the dependencies:
```bash
pip install -r requirements.txt
```
Step 5: Download the Phi-4 Model
```bash
mkdir model
huggingface-cli download microsoft/Phi-4 --local-dir ./model
```
Step 6: Run Phi-4
Create a Python script to load and run the Phi-4 model:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the pre-trained model and tokenizer from the local download
model = AutoModelForCausalLM.from_pretrained('./model', torch_dtype='auto', device_map='auto')
tokenizer = AutoTokenizer.from_pretrained('./model')

# Tokenize the input prompt and move it to the model's device
input_text = "What are the applications of quantum computing?"
inputs = tokenizer(input_text, return_tensors='pt').to(model.device)

# Generate a bounded response and decode it back to text
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
Live Examples
Example 1: Solving a Complex Algebra Problem
- Problem: Solve the quadratic equation 2x^2 + 3x - 2 = 0.
Steps:
```bash
ollama pull vanilj/Phi-4
ollama run vanilj/Phi-4 "Solve the quadratic equation 2x^2 + 3x - 2 = 0"
```
Output:
```
To solve the quadratic equation 2x^2 + 3x - 2 = 0, we use the quadratic formula:
x = (-b ± sqrt(b^2 - 4ac)) / (2a)
Here, a = 2, b = 3, and c = -2.

First, calculate the discriminant (D):
D = b^2 - 4ac
D = 3^2 - 4 * 2 * (-2)
D = 9 + 16
D = 25

Now, calculate the roots:
x1 = (-3 + sqrt(25)) / (2 * 2)
x1 = (-3 + 5) / 4
x1 = 2 / 4
x1 = 0.5

x2 = (-3 - sqrt(25)) / (2 * 2)
x2 = (-3 - 5) / 4
x2 = -8 / 4
x2 = -2

The roots of the equation 2x^2 + 3x - 2 = 0 are x1 = 0.5 and x2 = -2.
```
- Explanation: The model provides a step-by-step solution to the quadratic equation, making it a valuable tool for learning and understanding complex problems.

Example 2: Generating Python Code for Fibonacci Sequence
- Problem: Generate a Python script to continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8.
Steps:
```bash
ollama pull vanilj/Phi-4
ollama run vanilj/Phi-4 "Generate a Python script to continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8"
```
Output:
```python
# Python script to continue the Fibonacci sequence
def fibonacci_sequence(n):
    a, b = 1, 1
    sequence = [a, b]
    for _ in range(n - 2):
        a, b = b, a + b
        sequence.append(b)
    return sequence

# Continue the sequence up to 10 terms
fib_sequence = fibonacci_sequence(10)
print(fib_sequence)
```
- Explanation: The model generates a Python function fibonacci_sequence that computes the Fibonacci sequence up to n terms and prints the sequence.
Using Phi-4: Examples
Text Generation
Prompt:
```python
prompt = "Explain quantum entanglement in simple terms."
```
Command (Ollama):
```bash
ollama run vanilj/Phi-4
# Then type your prompt at the interactive prompt
```
Python:
Use the processor and model as shown in the previous script.
Image Understanding
Provide an image and ask for a description or analysis (a minimal sketch is shown below).
Audio Transcription and Translation
Provide an audio file and prompt Phi-4 to transcribe and translate; the sketch below shows the processor pattern for images, and audio inputs follow the same structure.
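For image understanding (and, analogously, audio), the multimodal processor from Method 2 accepts the extra modality alongside the text prompt. The sketch below assumes the prompt tags, the `<|image_1|>` placeholder, and the `images=`/`audios=` keyword arguments published on the Phi-4-multimodal-instruct model card; verify against the card if your version differs. The URL is only an illustrative example.

```python
import requests
from PIL import Image

# Reuse `processor`, `model`, and `generation_config` from the Method 2 script

# Fetch an example image (any local file opened with PIL works too)
url = "https://www.ilankelman.org/stopsigns/australia.jpg"  # example image URL
image = Image.open(requests.get(url, stream=True).raw)

# The <|image_1|> placeholder tells the model where the image belongs in the prompt
prompt = "<|user|><|image_1|>Describe this image in detail.<|end|><|assistant|>"

inputs = processor(text=prompt, images=image, return_tensors="pt").to("cuda")
generate_ids = model.generate(**inputs, max_new_tokens=256, generation_config=generation_config)
new_tokens = generate_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])

# Audio follows the same pattern: read the file with soundfile and pass
# audios=[(audio_array, sample_rate)] together with an <|audio_1|> placeholder.
```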
Troubleshooting and Common Issues
- Model Initialization Failures: Double-check file paths and ensure all dependencies are installed.
- CUDA/Driver Issues: Ensure your NVIDIA drivers and CUDA toolkit are compatible and up to date.
- Out of Memory Errors: Use quantized model versions (e.g., Q4, Q8) or upgrade your hardware.
- vLLM Engine Process Failed: Review stack traces for missing dependencies or incompatible versions.
- LoRA Adapter Errors: Ensure all required adapters are present and correctly referenced.
- Slow Performance: Close unnecessary applications, ensure sufficient VRAM, and consider using more powerful GPUs.
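For out-of-memory errors with vLLM in particular, capping the context length and the GPU memory fraction often helps before resorting to new hardware. A sketch using standard vLLM flags (the values are illustrative, not tuned):

```bash
vllm serve ./phi-4-q4.gguf --tokenizer microsoft/phi-4 \
  --max-model-len 4096 --gpu-memory-utilization 0.90 \
  --host 0.0.0.0 --port 7000
```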
Best Practices for Performance and Security
- Use a Dedicated GPU Node: For production or research, dedicate a machine with ample GPU resources.
- Virtual Environments: Always use Python virtual environments to avoid dependency conflicts.
- SSH Keys for Remote Access: Use SSH keys instead of passwords for secure remote connections.
- Monitor GPU Usage: Use nvidia-smi to monitor GPU utilization and temperature (see the example after this list).
- Regular Updates: Keep your system, drivers, and packages updated to benefit from performance improvements and security patches.
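A simple way to keep an eye on GPU utilization and temperature while a model is generating is to refresh nvidia-smi in a second terminal:

```bash
watch -n 2 nvidia-smi
```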
Conclusion
Running Microsoft Phi-4 on Ubuntu empowers you with a powerful, flexible, and private AI system capable of advanced language, vision, and audio tasks. Whether you choose the simplicity of Ollama, the flexibility of Python, or the performance of vLLM, Phi-4 can be tailored to your workflow. It is particularly strong at mathematical reasoning and step-by-step problem solving, which makes it a valuable tool for students, educators, and professionals alike.