First transcription
A walkthrough of dictating your first sentence, what each pill state means, and how to cancel or recover if something goes wrong.
Before you start
Two things should be in place from the previous steps:
- Ditto is installed and running. The pill is somewhere on screen, or hidden in the tray.
- A Whisper model is downloaded. If you skipped the welcome window, open Settings → Models and pick one.
Dictating
- Click into a text field. Anywhere a cursor blinks works: a Slack message, a code editor, an email draft, a search bar. Ditto pastes wherever your cursor is when transcription finishes.
- Press Ctrl+Shift+Space. The pill switches to its recording state. On Matte and Onyx themes it shows voice-driven bars; on Aurora, Sunset, and Mint it shows a wave animation at the bottom that reacts to your voice.
- Talk normally. Speak at your usual pace. Pauses are fine; Whisper handles them. Background noise is OK in moderation, especially with the noise filter on.
- Press Ctrl+Shift+Space again. The pill collapses to a small dots loader while Ditto runs the audio through the model.
- Your text is pasted. Within a second or two (longer for big models or long recordings), the transcription appears in the text field where your cursor was.
What each pill state looks like
| State | What you see | Meaning |
|---|---|---|
| Idle | Wide pill that reads “Ditto” | Ready, waiting for the shortcut |
| Recording | Voice-reactive bars or wave | Capturing audio |
| Transcribing | Small pill with a row of animated dots | Running Whisper, generating text |
| Copied | Short pill with a check icon and “Copied!” | Text is in your clipboard but auto-paste is off |
The “Copied” state only appears if you turned off auto-paste in Settings → General. By default Ditto pastes the text for you.
Canceling a recording
If you change your mind mid-sentence:
- Press Esc while recording. The pill returns to idle and the audio is discarded. Nothing is sent to Whisper.
The cancel shortcut is configurable in Settings → Shortcut, in case Esc clashes with something else for you.
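If it helps to see the pill states and shortcuts as one picture, they can be sketched as a small state machine. This is an illustrative model only, not Ditto's actual code: the names `PillState`, `Pill`, `press_shortcut`, `press_cancel`, and `transcription_done` are hypothetical. It also assumes two things the steps above don't state: that the pill returns to idle after an auto-paste, and that key presses are ignored in states where they have no documented effect.

```python
from enum import Enum, auto


class PillState(Enum):
    IDLE = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()
    COPIED = auto()


class Pill:
    """Hypothetical model of the pill; not Ditto's real implementation."""

    def __init__(self, auto_paste: bool = True) -> None:
        self.auto_paste = auto_paste  # mirrors the auto-paste setting in Settings → General
        self.state = PillState.IDLE

    def press_shortcut(self) -> None:
        # Ctrl+Shift+Space starts a recording from idle and stops it while recording.
        if self.state is PillState.IDLE:
            self.state = PillState.RECORDING
        elif self.state is PillState.RECORDING:
            self.state = PillState.TRANSCRIBING
        # Assumption: the shortcut does nothing in any other state.

    def press_cancel(self) -> None:
        # Esc discards the audio and returns to idle, but only while recording.
        if self.state is PillState.RECORDING:
            self.state = PillState.IDLE

    def transcription_done(self) -> None:
        if self.state is not PillState.TRANSCRIBING:
            return
        # Auto-paste on: text is pasted (assumed: pill goes back to idle).
        # Auto-paste off: the pill shows "Copied!" instead.
        self.state = PillState.IDLE if self.auto_paste else PillState.COPIED


pill = Pill(auto_paste=False)
pill.press_shortcut()       # idle -> recording
pill.press_shortcut()       # recording -> transcribing
pill.transcription_done()   # transcribing -> copied, because auto-paste is off
print(pill.state.name)
```

The point of the sketch is that the same shortcut both starts and stops a recording, while Esc only matters during the recording state.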
If something goes wrong
A few common situations:
- Nothing happens when you press the shortcut. Another app may have registered the same combo. Open Settings → Shortcut and pick a different one, or check the Shortcuts page for details.
- The pill records but no text appears. Check that the model file still exists. If you deleted it manually, Ditto reopens the welcome window so you can pick another. See Models for more.
- The text is wrong or in the wrong language. Open Settings → Audio and set a specific transcription language instead of “Auto-detect”. See Audio input and Languages.
- The pill never reaches “Copied” and your app is empty. Some apps (RDP sessions, certain games, sandboxed text fields) block simulated Ctrl+V. Turn on Settings → General → Keep previous clipboard if you’d rather paste manually.
For broader issues, jump to Troubleshooting.
A few tips for better results
- Say punctuation. Whisper picks up “comma”, “period”, and “question mark” if you say them clearly. It doesn’t always work, but it works often enough to be useful.
- Don’t whisper. A normal speaking voice gives the model the most signal.
- Bigger isn’t always better. Base or Small are usually plenty for daily notes. Medium and Large-v3 shine for technical jargon, accents, or noisy environments, but they’re slower.
- Pick a fixed language if you always speak the same one. Auto-detect is flexible but adds latency. Setting it explicitly speeds things up and reduces mistakes.