Run Nari Dia 1.6B on Mac: Installation Guide

Nari Dia 1.6B is an advanced, open-source text-to-speech (TTS) model developed by Nari Labs. With 1.6 billion parameters, it is designed to generate highly realistic, multi-speaker conversational audio.

Its open weights and code, Apache 2.0 license, and dialogue-centric features make it a compelling alternative to commercial TTS services like ElevenLabs.

What is Nari Dia 1.6B?

  • Model Size: 1.6 billion parameters, optimized for capturing intricate speech patterns.
  • Dialogue Generation: Supports scripts with multiple speakers using simple tags (e.g., [S1], [S2]).
  • Non-Verbal Communication: Can generate sounds like laughter, coughs, and throat clearing when specified in the input.
  • Audio Conditioning: Allows users to influence voice output via audio samples, enabling emotion and tone control.
  • Open Source: Released under Apache 2.0, with open weights and code available on Hugging Face.
  • Language Support: Currently, only English is supported.

Hardware and Software Requirements

A. Hardware Requirements

  • GPU Dependency: Dia 1.6B is designed for CUDA-enabled NVIDIA GPUs and needs about 10GB of GPU memory for full performance. In practice, that means a mid-range to high-end GPU (e.g., RTX 3070/4070 or better).
  • Mac Hardware: Macs, including the latest Apple Silicon models (M1, M2, M3), have no NVIDIA GPU and therefore no CUDA support, which is the main obstacle to running Dia 1.6B at full speed on a Mac.
  • CPU Support: Official CPU support is planned but not yet available. When it arrives, the model will run on CPU, but with significant performance limitations.
  • RAM: At least 16GB of system RAM is recommended, especially if running in a virtualized or emulated environment. (A quick way to check your Mac's chip and memory is shown below.)
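
If you are not sure whether your Mac meets these requirements, the following Terminal commands (standard macOS utilities, independent of Dia) report the chip and the installed memory:

```bash
# Print the CPU / Apple Silicon chip in this Mac
sysctl -n machdep.cpu.brand_string

# Print installed RAM in GB (hw.memsize reports bytes)
echo "$(($(sysctl -n hw.memsize) / 1024 / 1024 / 1024)) GB RAM"
```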

B. Software Requirements

  • Operating System: macOS (latest version recommended for compatibility and security).
  • Python: Python 3.8 or later.
  • Git: For cloning the repository.
  • uv (Recommended): A fast Python package manager (pip install uv).
  • PyTorch: The model requires PyTorch 2.0+ with CUDA 12.6 for GPU acceleration. On a Mac, PyTorch can only be installed without CUDA support, i.e., as a CPU build (see the sketch after this list).
  • Other Dependencies: Hugging Face Transformers, Gradio, and Descript Audio Codec (handled by the setup script).
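
As a minimal sketch, this is how a CPU-only PyTorch build is typically installed on macOS (run it inside the project's virtual environment once it exists; the exact versions Dia expects may differ from whatever pip resolves here):

```bash
# The default macOS wheels ship without CUDA, so this installs the CPU build
pip3 install torch torchaudio

# Confirm the installed version
python3 -c "import torch; print(torch.__version__)"
```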

Challenges of Running Dia 1.6B on Mac

A. Lack of Native CUDA Support

Apple Silicon and Intel-based Macs do not natively support NVIDIA’s CUDA, which is required for optimal Dia 1.6B performance. This means:

  • No direct GPU acceleration: Running on Mac will default to CPU, resulting in much slower inference (you can confirm this with the check below).
  • Workarounds: Advanced users may attempt to use cloud GPUs, Docker containers with GPU passthrough (on supported hardware), or emulation/virtualization, but these are complex and not officially supported.
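
To see what PyTorch can actually use on your machine, you can run the one-liner below. It checks for CUDA and for Apple's MPS backend; note that MPS showing as available does not mean Dia will use it, since the project targets CUDA, but a False for CUDA confirms that inference will fall back to CPU:

```bash
# On a Mac this typically prints "CUDA: False" and, on Apple Silicon, "MPS: True"
python3 -c "import torch; print('CUDA:', torch.cuda.is_available(), 'MPS:', torch.backends.mps.is_available())"
```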

B. Alternatives for Mac Users

  • Try Online Demo: Use the Hugging Face ZeroGPU Space for a cloud-hosted demo without local setup.
  • Wait for CPU/Quantized Versions: Nari Labs plans to release CPU-compatible and quantized versions, which will lower hardware requirements and improve accessibility for Mac users.
  • Explore Orpheus.CPP: Other TTS models like Orpheus.CPP can be run on Mac CPUs, though with different features and quality.

Step-by-Step Installation Guide

A. Preparing Your Mac

  1. Update macOS: Ensure your system is up to date for best compatibility and security.
  2. Install Homebrew (if not already installed):

     ```bash
     /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
     ```

  3. Install Python and Git:

     ```bash
     brew install python git
     ```

  4. (Optional) Install uv:

     ```bash
     pip3 install uv
     ```
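
A quick sanity check that the tools above are on your PATH (each command should print a version string rather than "command not found"):

```bash
brew --version
python3 --version
git --version
uv --version   # only if you installed uv in step 4
```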

B. Clone the Dia Repository

Open Terminal and run:

```bash
git clone https://github.com/nari-labs/dia.git
cd dia
```

C. Set Up Python Environment

Using uv (Recommended):

```bash
uv run app.py
```

  • The first run will install all dependencies and download the model weights. This may take some time.

Manual Setup (Alternative):

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python app.py
```

D. Running the Application

  • The application will launch a Gradio web interface, allowing you to enter text, select speakers, and generate audio (see the note below on opening it in your browser).
  • On Mac, expect slower performance since inference will run on CPU.
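
Gradio prints a local URL in the Terminal when the app starts; its default is http://127.0.0.1:7860, although app.py may bind a different port, so prefer whatever address is actually printed. On macOS you can open it directly from the Terminal:

```bash
# Replace the URL with the one shown in the Gradio startup output
open http://127.0.0.1:7860
```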

Using Dia 1.6B: Features and Workflow

A. Dialogue Input Format

  • Use tags to designate speakers:

    ```text
    [S1] Hello, how are you?
    [S2] I'm fine, thank you! (laughs)
    ```

  • Non-verbal cues (e.g., (laughs), (coughs)) are supported in the script.

B. Audio Conditioning

  • Upload a short audio sample to clone a voice or set emotional tone.
  • This feature enables custom voices and expressive speech output.

C. Gradio UI

  • The Gradio interface provides fields to input text, select speaker tags, upload audio samples, and listen to generated speech.

Performance Considerations

  • CPU Inference: On Mac, generation will be significantly slower than on a CUDA-enabled GPU. Expect long wait times for audio synthesis, especially for longer scripts.
  • Resource Usage: The model is memory-intensive; ensure you have sufficient RAM and disk space (the commands below show one way to watch memory usage while generating).
  • Future Improvements: Quantized and CPU-optimized versions are expected to improve performance and accessibility for Mac users.
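
If generation stalls or macOS starts swapping heavily, it can help to watch memory while a script is synthesizing. These are standard macOS utilities, not part of Dia:

```bash
# One-line summary of system-wide memory pressure
memory_pressure | tail -n 1

# Non-interactive snapshot of the heaviest processes by memory
top -o mem -l 1 | head -n 20
```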

Troubleshooting and Tips

  • Dependency Issues: Ensure all required Python packages are installed. If issues arise, try updating pip and reinstalling the dependencies (see the sketch after this list).
  • Model Download Problems: Check your internet connection and available disk space.
  • Slow Performance: This is expected on Mac due to lack of GPU acceleration. Consider using cloud-based inference for faster results.
  • Audio Output Issues: Verify your Mac’s sound settings and output device configuration.
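
A minimal sketch of the "update pip and reinstall" step, assuming the manual virtual-environment setup from section C (if you used uv, rerunning uv run app.py will normally re-resolve dependencies instead):

```bash
# From inside the dia directory, with the virtual environment activated
source .venv/bin/activate

# Upgrade pip itself, then force-reinstall the project's dependencies
pip install --upgrade pip
pip install --force-reinstall -r requirements.txt
```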

Alternative Approaches for Mac Users

A. Use Hugging Face ZeroGPU Space

  • No installation required.
  • Try Dia 1.6B online with limited usage.

B. Wait for CPU/Quantized Support

  • Monitor Nari Labs’ announcements for updates on CPU-compatible releases.

C. Explore Other TTS Options

  • Orpheus.CPP: Can run on Mac CPU, supports text-to-speech with less hardware demand, though with different feature sets.
  • Other Open-Source TTS Models: Research alternatives that are optimized for CPU or Apple Silicon.

Conclusion

Running Nari Dia 1.6B on a Mac is possible, but with significant performance limitations due to the lack of native CUDA GPU support. The model’s open-source nature and advanced dialogue capabilities make it an exciting tool for developers, researchers, and hobbyists interested in TTS and voice cloning.

For the best experience, use a CUDA-enabled GPU on a supported system, or leverage cloud-based demos until Mac-optimized versions are available.
