GPU acceleration
How Ditto uses your NVIDIA GPU to speed up transcription, when to turn it off, and what to check if it isn't working.
What it does
Whisper.cpp can run on either CPU or GPU. On supported NVIDIA cards, GPU mode uses CUDA to push the heavy matrix math (attention layers, FFT, matmul) onto the graphics card. The result: 5x to 30x faster transcriptions depending on model size.
The speedup is more dramatic on bigger models. Tiny barely benefits, since its runtime is dominated by fixed overhead (model loading and audio I/O) rather than compute. Large-v3 can take 80 seconds on CPU and a few seconds on a recent GPU.
The toggle
Settings → Models → Use GPU (CUDA) controls this. It’s on by default.
When on, Ditto launches whisper-cli.exe without the -ng flag and the GPU backend takes over. When off, the same binary runs with -ng (no GPU) and falls back to CPU.
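The toggle's effect on the command line can be pictured with a short sketch. This is illustrative only — the function name and file paths are assumptions, not Ditto's actual code — but the `-ng` flag is real whisper.cpp behavior:

```python
def build_whisper_args(model_path: str, audio_path: str, use_gpu: bool) -> list[str]:
    """Assemble a whisper-cli.exe argument list.

    Hypothetical sketch: Ditto's real invocation likely passes more flags.
    """
    args = ["whisper-cli.exe", "-m", model_path, "-f", audio_path]
    if not use_gpu:
        # -ng ("no GPU") forces whisper.cpp onto the CPU backend.
        args.append("-ng")
    return args

# Toggle on: no -ng, the CUDA backend takes over.
print(build_whisper_args("ggml-base.bin", "clip.wav", use_gpu=True))
# Toggle off: -ng appended, CPU fallback.
print(build_whisper_args("ggml-base.bin", "clip.wav", use_gpu=False))
```

The same binary serves both modes; only the flag changes.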
What’s bundled
Ditto ships the runtime libraries needed for CUDA so you don’t have to install the CUDA Toolkit yourself. The installer includes:
- cudart64_12.dll — CUDA runtime
- cublas64_12.dll, cublasLt64_12.dll, nvblas64_12.dll — cuBLAS for linear algebra
- nvrtc64_120_0.dll, nvrtc-builtins64_124.dll — runtime kernel compilation
- ggml-cuda.dll — Whisper.cpp’s CUDA backend
These are distributed by NVIDIA under their CUDA EULA and are bundled with the whisper.cpp Windows binaries.
You do still need a recent NVIDIA driver. Anything from the last few years should work — drivers ship the kernel-mode CUDA support that lives on the GPU side.
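If you want to sanity-check an install, a small script can verify the runtime DLLs are where the backend expects them. This sketch assumes the DLLs sit in the same directory as whisper-cli.exe, which is how the whisper.cpp Windows binaries are laid out; the function name is illustrative:

```python
from pathlib import Path

# Core DLLs whisper.cpp's CUDA backend needs at runtime (from the list above).
REQUIRED_DLLS = [
    "cudart64_12.dll",
    "cublas64_12.dll",
    "cublasLt64_12.dll",
    "ggml-cuda.dll",
]

def missing_cuda_dlls(install_dir: str) -> list[str]:
    """Return the required DLLs not found in install_dir.

    Assumes the DLLs live next to whisper-cli.exe.
    """
    d = Path(install_dir)
    return [name for name in REQUIRED_DLLS if not (d / name).exists()]

# Example: check the current directory.
print(missing_cuda_dlls("."))
```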
Hardware support
GPU acceleration in Ditto requires:
- An NVIDIA GPU with up-to-date drivers. Most consumer cards from the last 5–6 years work (GeForce GTX 10-series and newer, RTX cards, Quadros).
- CUDA Compute Capability 5.0 or higher — the Maxwell architecture (2014) or newer. The CUDA 12 runtime that Ditto bundles dropped support for older Kepler-era cards.
- Enough VRAM for the model. Roughly: Tiny needs ~150 MB, Base ~250 MB, Small ~600 MB, Medium ~2 GB, Large-v3 ~4 GB.
Verifying it’s working
There’s no in-app indicator yet (planned for a future version), but you can confirm GPU usage two ways:
1. Watch your GPU during transcription.
Open Task Manager → Performance → GPU, and switch one of the engine graphs to “Cuda” using its dropdown. Trigger a transcription with a medium or large model. You should see a brief spike on the Cuda graph (not 3D, not Video). Tiny and Base finish too fast for the spike to register clearly.
2. Compare timings with the toggle off and on.
Record a 10-second sentence and transcribe it with a medium-sized model, once with GPU on and once with GPU off. The difference should be obvious — at minimum 3-4x faster on most cards.
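The comparison can be scripted. A minimal timing harness — the helper is illustrative, and in practice the callable would launch whisper-cli.exe via subprocess with and without -ng (here a sleep stands in for the transcription):

```python
import time

def timed(run) -> float:
    """Wall-clock any zero-argument callable, returning elapsed seconds."""
    start = time.perf_counter()
    run()
    return time.perf_counter() - start

# Stand-ins for CPU and GPU transcription runs.
cpu_s = timed(lambda: time.sleep(0.02))
gpu_s = timed(lambda: time.sleep(0.005))
print(f"speedup: {cpu_s / gpu_s:.1f}x")
```

Run each mode a couple of times and ignore the first run — model loading dominates it.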
When to turn it off
GPU mode is the default and should work transparently for most NVIDIA users. Turn it off only if:
- You’re seeing driver crashes or system instability during transcription.
- Your card predates 2014 (pre-Maxwell) and CUDA mode silently falls back to a slow path.
- You’re on a laptop and want to save battery — on short clips the CPU can be more power-efficient than spinning up the discrete GPU.
Otherwise, leave it on.
If GPU mode fails to start
Symptoms: transcription takes much longer than expected, or the binary silently runs in CPU mode despite the toggle being on.
Things to check:
- Driver version. Open NVIDIA’s app or run Get-CimInstance Win32_VideoController in PowerShell. Drivers older than ~2 years may not support the CUDA 12 runtime that Ditto bundles. Update via GeForce Experience or NVIDIA’s driver page.
- GPU is shared with another process that’s hogging VRAM. A browser doing GPU acceleration on dozens of tabs can leave you with too little free VRAM for Whisper. Close some tabs and retry.
- Hybrid laptops. On laptops with both an integrated GPU and a discrete NVIDIA GPU, Windows may route Ditto to the integrated one. Open Settings → System → Display → Graphics in Windows and force “High performance” for Ditto.exe.
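Another quick check is nvidia-smi, which reports both driver version and free VRAM. The query `nvidia-smi --query-gpu=driver_version,memory.free --format=csv,noheader` prints one CSV line per GPU; a small parser (the function name and sample values are illustrative):

```python
def parse_smi_line(line: str) -> tuple[str, int]:
    """Parse one CSV line from:
    nvidia-smi --query-gpu=driver_version,memory.free --format=csv,noheader

    Returns (driver_version, free_vram_mb).
    """
    driver, mem = [field.strip() for field in line.split(",")]
    # memory.free is reported like "6144 MiB"; keep just the number.
    return driver, int(mem.split()[0])

driver, free_mb = parse_smi_line("552.22, 6144 MiB")  # sample line, not real output
print(driver, free_mb)
```

If free VRAM is below the estimate for your model size, that alone explains a CPU fallback.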
If none of those help, file an issue on GitHub with the model size, the Get-FileHash output for the model file, and your GPU model.