Voice and Dictation
Witsy allows you to interact with AI models using your voice, supporting both Speech-to-Text (STT) for hands-free dictation and Text-to-Speech (TTS) for hearing AI responses read aloud.
How to Configure Voice Providers
Before using voice features, you must configure a provider in your settings. Witsy supports a wide range of cloud and local options.
- Open Settings in Witsy.
- Navigate to the Speech-to-Text or Text-to-Speech tab.
- Select your preferred provider:
  - Cloud (High Quality): OpenAI (Whisper), ElevenLabs, Groq, or fal.ai.
  - Local (Private/Free): Local Whisper (runs directly on your machine).
- Enter your API key for the selected provider (if using a cloud service).
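The rule in the last step (cloud providers need an API key, local ones do not) can be sketched as a small check. This is an illustration only: the function name, the provider identifiers, and the messages are hypothetical and do not reflect how Witsy stores or validates its settings.

```python
from typing import Optional

# Provider names come from the list above; the lowercase identifiers
# are illustrative, not Witsy's internal names.
CLOUD_PROVIDERS = {"openai", "elevenlabs", "groq", "fal.ai"}
LOCAL_PROVIDERS = {"local whisper"}


def config_problem(provider: str, api_key: Optional[str]) -> Optional[str]:
    """Return a readable problem, or None if the selection looks usable."""
    name = provider.strip().lower()
    if name in LOCAL_PROVIDERS:
        return None  # local transcription needs no key
    if name in CLOUD_PROVIDERS:
        if not api_key or not api_key.strip():
            return f"{provider} is a cloud provider and needs an API key"
        return None
    return f"unknown provider: {provider}"
```

For example, `config_problem("Local Whisper", None)` returns `None`, while `config_problem("Groq", "")` reports the missing key.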
How to Dictate Your Prompts
Instead of typing, you can speak directly to Witsy. This is ideal for long-form brainstorming or when you want to capture thoughts quickly.
Using the Microphone Icon
- Click the Microphone icon in the chat input area.
- Start speaking. Witsy will record your voice.
- Click the Stop/Checkmark button when finished.
- The recorded audio will be transcribed into text inside the input field.
- Press Enter to send the prompt to the AI.
Recipe: Rapid Transcription with Groq
If you need near-instant transcription, use Groq as your Speech-to-Text provider.
- Setup: Select Groq in Settings and provide an API key.
- Result: Groq’s hosted Whisper models typically return the transcript of a short dictation in about a second, so the transition from voice to text feels seamless. Actual latency depends on the length of the recording and your network connection.
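If you are curious what a transcription call to Groq looks like outside of Witsy, the sketch below builds a request for Groq's OpenAI-compatible transcription endpoint using only the Python standard library. It is not Witsy's own code. The function name, the default file name, and the `audio/wav` content type are assumptions made for illustration, and the request is built but deliberately not sent.

```python
import urllib.request
import uuid

GROQ_STT_URL = "https://api.groq.com/openai/v1/audio/transcriptions"


def build_groq_transcription_request(
    audio: bytes,
    api_key: str,
    model: str = "whisper-large-v3",
    filename: str = "dictation.wav",  # hypothetical file name
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex

    # Plain form field carrying the model name.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode("utf-8")

    # File field: headers first, then the raw audio bytes.
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")

    closing = f"--{boundary}--\r\n".encode("utf-8")
    body = model_part + file_head + audio + b"\r\n" + closing

    req = urllib.request.Request(GROQ_STT_URL, data=body, method="POST")
    req.add_header("Authorization", f"Bearer {api_key}")
    req.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return req
```

Passing the returned request to `urllib.request.urlopen` would send it. The response is expected to be JSON containing the transcript in a `text` field.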
How to Hear AI Responses (TTS)
Witsy can turn AI-generated text into natural-sounding speech. This is useful for accessibility or when you are multitasking.
Listening to a Specific Message
- Hover over any AI response in the chat window.
- Click the Speaker/Play icon that appears in the message actions.
- Witsy reads the response aloud using your configured TTS engine (e.g., ElevenLabs or OpenAI).
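For a sense of what a cloud TTS engine does with the message text, here is a minimal sketch of a request to OpenAI's speech endpoint. It is an illustration, not Witsy's implementation. The function name is hypothetical, and the `tts-1` model and `alloy` voice are example choices. The request is built but not sent.

```python
import json
import urllib.request

OPENAI_TTS_URL = "https://api.openai.com/v1/audio/speech"


def build_tts_request(
    text: str,
    api_key: str,
    model: str = "tts-1",   # example model choice
    voice: str = "alloy",   # example voice choice
) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request."""
    payload = json.dumps(
        {"model": model, "input": text, "voice": voice}
    ).encode("utf-8")

    req = urllib.request.Request(OPENAI_TTS_URL, data=payload, method="POST")
    req.add_header("Authorization", f"Bearer {api_key}")
    req.add_header("Content-Type", "application/json")
    return req
```

When such a request is sent, the response body is the audio itself rather than JSON, so a client would write the bytes to a file or hand them to an audio player.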
Enabling Auto-Read
If you want Witsy to act like a voice assistant, you can enable automatic playback:
- Go to Settings > General.
- Toggle Auto-read responses to "On."
- From then on, Witsy reads each response aloud automatically as soon as the AI finishes generating it.
How to Use Local Dictation for Privacy
If you are working with sensitive information and don't want your voice data sent to external servers, you can use Local Whisper.
- In Settings > Speech-to-Text, select Local Whisper.
- The first time you use it, Witsy will download the necessary model weights (this may take a moment depending on your internet speed).
- Once downloaded, all transcription happens entirely on your CPU/GPU, ensuring your audio never leaves your device.
Recommended Usage Scenarios
| Scenario | Recommended STT | Recommended TTS |
| :--- | :--- | :--- |
| Maximum Privacy | Local Whisper | None (Visual only) |
| Natural Conversations | Groq (Fast) | ElevenLabs (High Realism) |
| Budget Friendly | Groq / Local Whisper | OpenAI (HD Voices) |
| Mobile/Remote Use | OpenAI Whisper | fal.ai |
Pro-Tip: Keyboard Shortcuts
To speed up your workflow, check the Shortcuts section in Settings. You can often map a global hotkey to trigger the microphone, allowing you to dictate into Witsy even when the app is in the background.