Voice
Voice is Supervertaler’s voice command and dictation engine. It lets you control any application on your computer – Trados, memoQ, Word, or anything else in the foreground – using your voice, while Supervertaler Workbench stays running in the background.
Open it from the 🎤 Voice top tab in Workbench or from the tray icon’s Open Voice entry. To toggle Always-On listening from anywhere on your computer, press Ctrl+Alt+A.

Two modes
Always-On listening
Always-On runs a continuous microphone stream in the background. When you speak, Voice detects speech via amplitude-based VAD (voice activity detection), captures the utterance, and hands it to the active recognition engine.
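The detect, capture, and hand-off flow described above can be sketched roughly as follows. This is an illustration only, not Voice’s actual implementation: the function name `segment_utterances`, the per-frame amplitude input, and the `silence_frames` hang-over are assumptions made for the example.

```python
from typing import List, Tuple


def segment_utterances(
    amplitudes: List[float],
    threshold: float,
    silence_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs for each detected utterance.

    An utterance starts at the first frame whose amplitude reaches the
    threshold and ends once `silence_frames` consecutive quiet frames follow.
    The end index is exclusive and leaves out the trailing silence.
    """
    utterances = []
    start = None      # index of the first loud frame of the current utterance
    quiet_run = 0     # consecutive quiet frames seen since the last loud one

    for i, amp in enumerate(amplitudes):
        if amp >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= silence_frames:
                utterances.append((start, i - quiet_run + 1))
                start, quiet_run = None, 0

    # Close an utterance that is still open when the stream ends.
    if start is not None:
        utterances.append((start, len(amplitudes) - quiet_run))
    return utterances
```

Each returned span would then be cut out of the audio buffer and passed to whichever recognition engine is active. Short pauses inside a phrase do not split it, because the utterance only closes after several quiet frames in a row.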
With the Vosk engine (default) the recogniser only emits text for phrases in your command list – anything else is silently dropped as [unk]. So Vosk Always-On is “commands only” by design: you can leave it on all day, talk to colleagues, take phone calls, etc., and only matching command phrases will trigger actions.
With faster-whisper or OpenAI Whisper API every utterance is transcribed in full. If it matches a command the action fires; if not (and “Listen for commands only” is off), the transcribed text is typed into whichever window is in the foreground.
To start: click ▶ Start Always-On in the Voice tab, or press Ctrl+Alt+A from any application. A red mic icon appears in the system tray while Always-On is active.
To stop: click ⏹ Stop Always-On or press Ctrl+Alt+A again.
Focus matters: Voice sends keystrokes and text to whichever window is currently focused. After starting Always-On, click into Trados, Word, or your browser before you speak.
Push-to-Talk (F9 / Ctrl+Shift+Space)
Press F9 (inside the Workbench editor) or Ctrl+Shift+Space (globally, from any application – ⌘⇧Space on macOS) to record a single utterance for free-form running-text dictation. A small “🎤 Listening…” toast appears in the top-right of the screen so you know the recording is live; it disappears when you stop. Recording stops when you release the key (in hold-to-talk mode) or when you press the trigger again (in toggle mode). The transcribed text is then typed at the cursor position.
Always-On + push-to-talk coexist. If Always-On is running when you trigger push-to-talk, Voice pauses the always-on listener for the duration of the recording, runs the dictation, then resumes Always-On automatically. So you get free continuous Vosk command recognition all day plus a hotkey for occasional running-text dictation, without having to manually toggle Always-On off and on.
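The pause-and-resume behaviour can be pictured as a context manager wrapped around the dictation call. The sketch below is hypothetical: `AlwaysOnListener`, `paused`, and `push_to_talk` are illustrative names, not Workbench APIs, and the real listener manages an audio stream rather than a simple flag.

```python
from contextlib import contextmanager


class AlwaysOnListener:
    """Hypothetical stand-in for the continuous background listener."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


@contextmanager
def paused(listener: AlwaysOnListener):
    """Pause the listener for the duration of the block, then restore it.

    The listener is only resumed if it was running beforehand, so a
    push-to-talk recording never switches Always-On on by accident.
    """
    was_running = listener.running
    if was_running:
        listener.stop()
    try:
        yield
    finally:
        if was_running:
            listener.start()


def push_to_talk(listener: AlwaysOnListener, record_and_transcribe) -> str:
    """Run one dictation while Always-On is temporarily paused."""
    with paused(listener):
        return record_and_transcribe()
```

Putting the resume step in a `finally` clause means Always-On comes back even if the dictation fails partway through.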
F9 modes (configurable in the Push-to-Talk settings):
- Toggle (default) – press once to start, press again to stop
- Hold-to-talk – hold the key to record, release to stop. Note: this only affects F9. The global Ctrl+Shift+Space hotkey always uses Toggle mode (Windows can’t reliably deliver key-up events across processes for global hotkeys).
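The difference between the two modes comes down to which key events change the recording state. A minimal sketch, assuming a hypothetical `PushToTalkTrigger` class that receives key-down and key-up events:

```python
class PushToTalkTrigger:
    """Tracks whether a recording is live for a given trigger mode.

    mode is "toggle" (press to start, press again to stop) or
    "hold" (record while the key is down).
    """

    def __init__(self, mode: str = "toggle") -> None:
        if mode not in ("toggle", "hold"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.recording = False

    def key_down(self) -> None:
        if self.mode == "toggle":
            self.recording = not self.recording
        else:
            self.recording = True

    def key_up(self) -> None:
        # Key-up only matters in hold mode. Toggle mode ignores it, which is
        # why it still works when no key-up event is delivered.
        if self.mode == "hold":
            self.recording = False
```

Toggle mode never depends on `key_up`, which fits the note above about the global hotkey. A real handler would also need to ignore auto-repeated key-down events, which this sketch leaves out.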
Voice commands
Voice commands execute specific actions when you speak a trigger phrase. They can type text, press keyboard shortcuts, run AutoHotkey scripts, or call internal Workbench functions.
The commands table
The commands table (right side of the Voice tab) lists all your configured commands.
| Column | Description |
|---|---|
| ☑ | Enable/disable checkbox – uncheck to silence a command without deleting it |
| Phrase | The primary trigger word or phrase |
| Aliases | Alternative phrases that also trigger the command |
| Type | Command / Keystroke / AHK Script / AHK Inline |
| Action | What happens when the phrase is recognised |
| Category | Organisational label (Navigation, Editing, etc.) |
Enabling and disabling commands
- Single command – click the checkbox in the first column
- All commands – click the checkbox column header to toggle all at once: if any command is disabled, all are enabled; if all are already enabled, all are disabled
- Multiple commands – select rows with Shift+Click or Ctrl+Click, then right-click and choose ✅ Activate or ⬜ Deactivate
Disabled commands are greyed out and are skipped during recognition. Their settings are preserved.
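One way to picture a row of the commands table and the enable/disable rules is the sketch below. The `VoiceCommand` dataclass and both helper functions are assumptions made for illustration; the field names simply mirror the table columns and say nothing about how Workbench stores commands internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VoiceCommand:
    phrase: str
    type: str                 # "Command", "Keystroke", "AHK Script" or "AHK Inline"
    action: str
    aliases: List[str] = field(default_factory=list)
    category: str = ""
    enabled: bool = True


def toggle_all(commands: List[VoiceCommand]) -> None:
    """Header checkbox: enable everything unless everything is already enabled."""
    all_enabled = all(cmd.enabled for cmd in commands)
    for cmd in commands:
        cmd.enabled = not all_enabled


def active_phrases(commands: List[VoiceCommand]) -> List[str]:
    """Phrases and aliases of enabled commands; disabled ones are skipped."""
    phrases = []
    for cmd in commands:
        if cmd.enabled:
            phrases.append(cmd.phrase)
            phrases.extend(cmd.aliases)
    return phrases
```

Disabling a command only flips its `enabled` flag, so the phrase, aliases, and action remain available when it is re-enabled.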
Editing a command
Double-click any row to open the Edit Voice Command dialog. You can change the phrase, aliases, action type, and action value.
You can also use the Edit button in the toolbar above the table.
Adding a command
Click + Add above the table. Choose a command type:
- Command – calls an internal Workbench action (confirm segment, next segment, etc.)
- Keystroke – sends a key combination to the active window. Click into the Keystroke field and press the keys you want to send (e.g. press Ctrl+Enter); the field shows the platform-native symbols (⌘⇧⌥⌃ on macOS, Ctrl+Shift+Alt elsewhere) so you don’t have to translate between platforms.
- AHK Script – runs an AutoHotkey v2 script file
- AHK Inline – runs a short AutoHotkey v2 snippet directly
The Edit Voice Command dialog includes a context-sensitive cheat sheet below the Action field that updates with the Type dropdown: it explains the press-to-capture editor for Keystroke commands, lists common AHK patterns for the AHK Script and AHK Inline types, and names the available internal actions for the Command type, so you don’t need to memorise the full reference up front.
Removing a command
Select the row and click Remove, or select multiple rows and remove them together.
Edits take effect immediately under Vosk
When the Always-On engine is Vosk, adding / editing / removing / disabling a command immediately rebuilds Vosk’s recogniser grammar in the background – you don’t have to stop and restart Always-On to “teach” Vosk a new phrase. The status bar briefly shows 🔄 Vosk grammar refreshed (N phrases) to confirm the swap took effect. The next utterance you speak will use the new grammar.
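Vosk’s Python bindings accept a grammar as a JSON array of phrases, with "[unk]" standing for everything else. The sketch below shows how such a grammar string could be rebuilt from the active phrase list. The helper name `build_vosk_grammar` and its normalisation rules (lower-casing, whitespace collapsing, de-duplication) are assumptions; how Voice actually swaps the grammar into a running recogniser is not shown here.

```python
import json
from typing import Iterable


def build_vosk_grammar(phrases: Iterable[str]) -> str:
    """Build the JSON phrase list that Vosk accepts as a recogniser grammar.

    Phrases are lower-cased and de-duplicated (order preserved), and "[unk]"
    is appended so that non-matching speech maps to the unknown token.
    """
    seen = []
    for phrase in phrases:
        normalised = " ".join(phrase.lower().split())
        if normalised and normalised not in seen:
            seen.append(normalised)
    return json.dumps(seen + ["[unk]"])


grammar = build_vosk_grammar(["Confirm segment", "next segment", "confirm  segment"])
# With the vosk package and a downloaded model, the grammar would be passed as
# the third argument when constructing the recogniser:
#   recogniser = vosk.KaldiRecognizer(model, 16000, grammar)
```

Rebuilding the string is cheap, so regenerating it on every add, edit, remove, or disable is a plausible design.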
Settings
Always-On engine
The dropdown in the Always-On section picks which speech-recognition backend listens for commands.
| Engine | Best for | Speed | Cost | Internet |
|---|---|---|---|---|
| Vosk (default, recommended) | Commands only – your phrase list, ignores everything else | Instant (~30 ms) | Free | No |
| faster-whisper | Commands + dictation of running text from one continuous mic stream | ~1–3 s | Free | No |
| OpenAI Whisper API | Same as faster-whisper but offloaded to OpenAI’s servers | ~0.5–2 s | $0.006 / minute of audio | Yes (API key) |
Vosk is the default for new installs. It’s purpose-built for fixed-vocabulary command recognition: pass it your active phrase list, and it biases the recogniser toward those phrases while silently dropping anything else as [unk]. That makes it both faster and more accurate for commands than any Whisper variant – and you can leave Always-On running all day for $0 in API fees and near-zero CPU load.
faster-whisper runs the same Whisper models OpenAI ships, but on a CTranslate2 C++ engine – roughly 4× faster than the original openai-whisper package on CPU, with much lower RAM. Choose this if you want continuous dictation of running text in always-on mode (every utterance gets transcribed in full, then either typed if it doesn’t match a command, or fires the matched command).
OpenAI Whisper API sends each utterance to OpenAI’s hosted whisper-1 model. Slightly faster end-to-end than running faster-whisper locally on most laptops, but each minute of audio costs about $0.006 – so leaving it on all day adds up. Requires an OpenAI API key in Settings → AI Settings.
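To get a feel for what “adds up” means, the quoted rate can be turned into a quick estimate. The workload figures below (90 minutes of captured speech per day, 22 working days per month) are assumptions for illustration, and the sketch assumes billing is simply proportional to the audio sent.

```python
PRICE_PER_MINUTE_USD = 0.006  # rate quoted above


def api_cost_usd(audio_minutes: float) -> float:
    """Cost of sending the given amount of audio to the API, in US dollars."""
    return round(audio_minutes * PRICE_PER_MINUTE_USD, 4)


# Assumed workload: 90 minutes of actual speech captured over a working day.
daily = api_cost_usd(90)          # 0.54
monthly = round(daily * 22, 2)    # 11.88, assuming 22 working days
```

Since only captured utterances are sent, the figure depends on how much you actually speak, not on how many hours Always-On stays switched on.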
The first time you start Always-On with Vosk, the small English model (~40 MB) is downloaded automatically to <data folder>/vosk-models/. The small Dutch model is downloaded the same way when your project’s target language is Dutch. Once downloaded, models are cached and are not fetched again.
Push-to-talk dictation engine
The Dictation engine dropdown in the Push-to-Talk Mode group controls what handles F9 / Ctrl+Shift+Space when you trigger push-to-talk dictation. This is independent of the Always-On engine, because the two paths have different needs:
| Setting | What runs when you press Ctrl+Shift+Space / F9 |
|---|---|
| Same as Always-On (default) | Auto-routes: Vosk or faster-whisper Always-On → faster-whisper push-to-talk; OpenAI API Always-On → OpenAI API push-to-talk |
| faster-whisper (offline) | Always faster-whisper, regardless of Always-On engine |
| OpenAI Whisper API (online, fast) | Always the API, regardless of Always-On engine. Useful pairing: Vosk for free continuous commands + OpenAI API for fast running-text dictation. |
The “ℹ️ Push-to-talk will use: …” indicator below the dropdown shows the resolved engine (after auto-routing) so you always know which backend will run.
Why isn’t Vosk an option here? Vosk’s grammar mode is built for fixed phrases, not free-form transcription. Pressing Ctrl+Shift+Space produces running text, which Whisper handles vastly better. So push-to-talk silently falls through to a Whisper engine even when Always-On is set to Vosk.
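The auto-routing rules in the table above can be written as a small lookup function. The function name and the string identifiers for settings and engines are hypothetical; only the routing logic itself is taken from the table.

```python
def resolve_push_to_talk_engine(dictation_setting: str, always_on_engine: str) -> str:
    """Return the engine push-to-talk will use: "faster-whisper" or "openai-api".

    dictation_setting: "same-as-always-on", "faster-whisper" or "openai-api"
    always_on_engine:  "vosk", "faster-whisper" or "openai-api"
    """
    if dictation_setting in ("faster-whisper", "openai-api"):
        return dictation_setting          # explicit choice wins
    if dictation_setting != "same-as-always-on":
        raise ValueError(f"unknown dictation setting: {dictation_setting}")
    # Auto-routing: Vosk cannot do free-form dictation, so it maps to faster-whisper.
    if always_on_engine == "openai-api":
        return "openai-api"
    if always_on_engine in ("vosk", "faster-whisper"):
        return "faster-whisper"
    raise ValueError(f"unknown Always-On engine: {always_on_engine}")
```

For example, `resolve_push_to_talk_engine("same-as-always-on", "vosk")` yields `"faster-whisper"`, while an explicit `"openai-api"` setting is returned unchanged whatever the Always-On engine is.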
faster-whisper model
The Whisper model size dropdown applies whenever faster-whisper is the active engine, whether for Always-On or for push-to-talk. The OpenAI API ignores this setting and always uses whisper-1 server-side. Larger models are more accurate but slower and need more RAM.
| Model | Download size | Notes |
|---|---|---|
| tiny | ~75 MB | Very fast, lowest accuracy |
| base | ~142 MB | Good balance (recommended) |
| small | ~466 MB | Noticeably better accuracy |
| medium | ~1.5 GB | High accuracy |
| large | ~2.9 GB | Best accuracy, slow on CPU |
Mic sensitivity
Controls the amplitude threshold used to detect speech onset.
- Low (noisy) – raises the threshold; ignores quiet background sounds but may miss soft speech
- Medium (normal) – default; works well in a typical home office
- High (quiet) – lowers the threshold; captures quiet voices but may trigger on background noise
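The three presets can be thought of as three points on a single amplitude threshold. The sketch below assumes 16-bit PCM samples and uses RMS as the amplitude measure. Both the threshold values and the choice of RMS are illustrative assumptions, not the values Voice uses.

```python
import math
from typing import Sequence

# Hypothetical RMS thresholds for 16-bit samples (range -32768..32767).
SENSITIVITY_THRESHOLDS = {
    "low": 1500.0,     # noisy room: only louder sounds count as speech
    "medium": 700.0,
    "high": 300.0,     # quiet room: soft speech counts too
}


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples: Sequence[int], sensitivity: str = "medium") -> bool:
    """True when the frame's RMS amplitude reaches the preset's threshold."""
    return rms(samples) >= SENSITIVITY_THRESHOLDS[sensitivity]
```

A frame with an RMS of 500 would count as speech only at the High setting in this sketch, which illustrates the trade-off: higher sensitivity catches soft voices but also lets more background noise through.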
Listen for commands only
Whisper engines only. The checkbox is hidden when the Always-On engine is Vosk, because Vosk’s grammar mode already drops non-command speech at the recogniser level – the setting would be a structural no-op there.
For faster-whisper and the OpenAI Whisper API: when checked, Always-On fires voice commands but discards any speech that doesn’t match a command – it is not typed. Use this if you only want voice control with a Whisper engine, not dictation. When unchecked, unmatched speech is transcribed and typed at the cursor position.
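The decision made for each transcribed utterance under a Whisper engine can be sketched as below. The function `route_utterance`, its parameters, and the normalisation step (lower-casing and stripping trailing punctuation before matching) are assumptions for illustration; the actual matching in Voice may be more tolerant.

```python
from typing import Callable, Dict, Optional


def route_utterance(
    text: str,
    commands: Dict[str, Callable[[], None]],
    commands_only: bool,
    type_text: Callable[[str], None],
) -> Optional[str]:
    """Fire a matching command, otherwise type or discard the transcription.

    Returns "command", "typed" or None (discarded) to make the outcome visible.
    """
    key = text.strip().lower().rstrip(".!?")
    action = commands.get(key)
    if action is not None:
        action()
        return "command"
    if commands_only:
        return None
    type_text(text)
    return "typed"
```

With `commands_only=True` the final branch is never reached, so unmatched speech is dropped instead of being typed at the cursor position.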
Maximum recording duration
Sets the upper limit (in seconds) for a single voice clip. Speech that exceeds this limit is cut off, and only the audio up to the limit is transcribed. This prevents a recording from being held open indefinitely.
Language
- Auto – uses the project’s target language as the transcription hint
- Explicit language – forces Whisper to transcribe in the selected language, which improves accuracy when you dictate in a language other than the project’s target language
AutoHotkey integration
AutoHotkey v2 must be installed for AHK-type commands to work. Supervertaler checks for it automatically and shows the path in the AutoHotkey section of the Voice settings panel.
To verify: the status line shows either the AHK path (green) or “AutoHotkey v2 not found” (orange).
Click Open scripts folder to open the folder where standalone AHK script files are stored.
Using Voice with Trados Studio
Voice sends input at the Win32 hardware-input level (equivalent to physical keystrokes), which is fully compatible with Trados Studio’s WPF editor. Useful commands to add:
| Phrase | Type | Action |
|---|---|---|
| “confirm segment” | Keystroke | ctrl+enter |
| “next segment” | Keystroke | alt+down |
| “previous segment” | Keystroke | alt+up |
| “go to top” | Keystroke | ctrl+home |
| “undo” | Keystroke | ctrl+z |
After creating a command, start Always-On, click into Trados Studio, and speak the phrase.
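The Action values in the table follow a simple modifier+key notation. As an illustration of how such a string breaks down, here is a small parser; `parse_keystroke`, the `MODIFIERS` set (including "win"), and the validation rules are hypothetical and not part of Workbench.

```python
from typing import FrozenSet, Tuple

MODIFIERS = {"ctrl", "alt", "shift", "win"}  # assumed set of modifier names


def parse_keystroke(spec: str) -> Tuple[FrozenSet[str], str]:
    """Split a spec such as "ctrl+enter" into (modifiers, key)."""
    parts = [p.strip().lower() for p in spec.split("+")]
    if any(not p for p in parts):
        raise ValueError(f"malformed keystroke: {spec!r}")
    *mods, key = parts
    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s) in {spec!r}: {unknown}")
    return frozenset(mods), key


# The Trados Studio commands from the table above.
TRADOS_COMMANDS = {
    "confirm segment": "ctrl+enter",
    "next segment": "alt+down",
    "previous segment": "alt+up",
    "go to top": "ctrl+home",
    "undo": "ctrl+z",
}
parsed = {phrase: parse_keystroke(spec) for phrase, spec in TRADOS_COMMANDS.items()}
```

Here `parsed["confirm segment"]` is `(frozenset({"ctrl"}), "enter")`. In practice you enter these combinations with the press-to-capture Keystroke field, so no manual parsing is needed.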
Global hotkeys
| Shortcut | Action |
|---|---|
| Ctrl+Alt+A (⌘⌥A on macOS) | Toggle Always-On listening |
| Ctrl+Shift+Space (⌘⇧Space on macOS) | Push-to-talk (one utterance) |
| F9 | Push-to-talk (inside Workbench editor) |
Global hotkeys work on macOS too (via the NSEvent monitor), but require Accessibility permission for whichever binary launched Python – see Keyboard Shortcuts for setup. All hotkeys can be customised in Settings → Keyboard Shortcuts.