GPU acceleration

How Ditto uses your NVIDIA GPU to speed up transcription, when to turn it off, and what to check if it isn't working.

What it does

Whisper.cpp can run on either CPU or GPU. On supported NVIDIA cards, GPU mode uses CUDA to push the heavy compute (attention layers, large matrix multiplications) onto the graphics card. The result: 5x to 30x faster transcriptions depending on model size.

The speedup is more dramatic on bigger models. Tiny barely benefits, since its runtime is dominated by fixed overhead (model loading, audio decoding, I/O) rather than compute. Large-v3 can take 80 seconds on CPU and only a few seconds on a recent GPU.

The toggle

Settings → Models → Use GPU (CUDA) controls this. It’s on by default.

When on, Ditto launches whisper-cli.exe without the -ng flag and the GPU backend takes over. When off, the same binary runs with -ng (no GPU) and falls back to CPU.
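The flag mapping is simple enough to sketch. This is an illustrative reconstruction, not Ditto's actual launcher code; the model and audio paths are placeholders, though `-ng` (`--no-gpu`) is a real whisper-cli flag.

```shell
# Illustrative sketch of how the toggle maps to the command line.
# Model and audio paths are placeholders, not Ditto's real paths.
USE_GPU=false                      # mirrors Settings → Models → Use GPU (CUDA)

CMD="whisper-cli.exe -m ggml-medium.bin -f recording.wav"
if [ "$USE_GPU" = "false" ]; then
  CMD="$CMD -ng"                   # -ng / --no-gpu: force the CPU path
fi
echo "$CMD"
```

With the toggle off, the printed command ends in `-ng`; with it on, the flag is simply absent and whisper.cpp picks the CUDA backend on its own.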

What’s bundled

Ditto ships the runtime libraries needed for CUDA so you don’t have to install the CUDA Toolkit yourself. The installer includes:

These are distributed by NVIDIA under their CUDA EULA and are bundled with the whisper.cpp Windows binaries.

You do still need a recent NVIDIA driver. Anything from the last few years should work: the driver supplies the kernel-mode half of the CUDA stack, while Ditto bundles the user-mode runtime libraries.

Hardware support

GPU acceleration in Ditto requires:

Verifying it’s working

There’s no in-app indicator yet (planned for a future version), but you can confirm GPU usage in two ways:

1. Watch your GPU during transcription.

Open Task Manager → Performance → GPU. Trigger a transcription with a medium or large model. You should see a brief spike on the Cuda engine graph (switch one of the engine graphs to Cuda via its dropdown; the 3D and Video graphs won't move). Tiny and Base finish too fast for the spike to register clearly.
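If you prefer a terminal over Task Manager, nvidia-smi (installed with the driver) can poll the same counter. The sketch below only builds and prints the command so you can paste it into a second window while a transcription runs; the query flags are standard nvidia-smi options.

```shell
# Build a polling command for GPU utilization: one CSV line per second.
# Paste the printed command into a second terminal during a transcription.
WATCH_CMD='nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader -l 1'
echo "$WATCH_CMD"
```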

2. Compare timings with the toggle off and on.

Record a 10-second sentence with a medium-sized model. Transcribe once with GPU on, once with GPU off. The difference should be obvious — at minimum 3-4x faster on most cards.
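A rough way to put numbers on the comparison, assuming you note the wall-clock seconds of each run (the sample timings below are made up, not measurements):

```shell
# Hypothetical timings from the two runs, in seconds; substitute your own.
CPU_SECS=24
GPU_SECS=6

# Integer speedup ratio (bash arithmetic is fine for a rough comparison).
SPEEDUP=$(( CPU_SECS / GPU_SECS ))
echo "GPU run was ~${SPEEDUP}x faster"   # → GPU run was ~4x faster
```

If the ratio comes out near 1x, the GPU path almost certainly never engaged; see the troubleshooting section below.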

When to turn it off

GPU mode is the default and should work transparently for most NVIDIA users. Turn it off only if:

Otherwise, leave it on.

If GPU mode fails to start

Symptoms: transcription takes much longer than expected, or the binary silently runs in CPU mode despite the toggle being on.

Things to check:

If none of those help, file an issue on GitHub with the model size, the Get-FileHash output for the model file, and your GPU model.
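To produce that hash, PowerShell's `Get-FileHash` is the tool the text refers to; `sha256sum` is the equivalent in other shells. The demo below hashes a stand-in file so it is self-contained; your real model path (e.g. the filename shown here, which is just an example) will differ.

```shell
# Stand-in file so the example is self-contained; point sha256sum at your
# actual model file instead.
printf 'stand-in model bytes' > model-standin.bin
sha256sum model-standin.bin

# On Windows PowerShell the equivalent is:
#   Get-FileHash -Algorithm SHA256 .\ggml-medium.bin
```

Paste the full 64-character hex digest into the issue so a corrupted or truncated model download can be ruled out quickly.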