Parshu-STT
Parshu-STT – Real-Time Voice Transcription
Parshu-STT is a lightweight, always-on voice transcription tool for Windows. A global hotkey starts recording from any application; the tool then transcribes speech in near real time and pastes the result at the cursor position. It is built for production use and is used daily for hands-free typing and documentation tasks. It combines Groq Whisper v3 Turbo for fast transcription, FFmpeg for audio capture, and PyQt6 for native Windows integration. Its distinguishing feature is a workflow-focused design: custom voice commands such as "nextline" (inserts a newline) and "and" (inserts a comma followed by a space) let you dictate without stopping to punctuate. Because the text is pasted through the clipboard, it works in any application that accepts pasted text, including text editors, browsers, and IDEs.
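The voice commands described above amount to a text substitution pass over the transcript. The following is a minimal sketch of how that could look, not the tool's actual implementation: the function name `apply_voice_commands` and the exact matching rules (case-insensitive, trailing punctuation after "nextline" swallowed, "and" replaced only between words on the same line) are assumptions.

```python
import re

# Hypothetical patterns; the real tool's matching rules are not documented.
# "nextline", optionally followed by punctuation the transcriber may add.
_NEXTLINE = re.compile(r"[ \t]*\bnextline\b[.,!?]?[ \t]*", re.IGNORECASE)
# "and" as a standalone word between two words on the same line.
_AND = re.compile(r"[ \t]+and[ \t]+", re.IGNORECASE)


def apply_voice_commands(text: str) -> str:
    """Replace spoken command words with the characters they stand for."""
    text = _NEXTLINE.sub("\n", text)   # "nextline" -> newline
    text = _AND.sub(", ", text)        # "and" -> comma followed by a space
    return text


if __name__ == "__main__":
    print(apply_voice_commands("buy milk and eggs nextline call the bank"))
```

Replacing every "and" is a deliberate trade-off in this sketch: it matches the described behavior, but it means the word itself cannot be dictated without some escape mechanism.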
1. Activation: The user presses the global hotkey (Ctrl+Shift+V) from any application to start recording.
2. Audio Capture: FFmpeg captures audio from the default microphone at a 16 kHz sample rate, the rate Whisper expects.
3. Recording Indicator: A visual notification shows the recording status while the user speaks.
4. Streaming to API: The audio is buffered and sent to Groq's Whisper v3 Turbo API for low-latency transcription.
5. Real-Time Transcription: The Groq API returns the transcribed text within 1-2 seconds, with automatic punctuation and formatting.
6. Command Processing: The special keywords "nextline" and "and" are replaced with their corresponding characters.
7. Auto-Paste: The transcribed text is pasted at the current cursor position by simulating a clipboard paste.
8. Ready State: The system returns to idle, ready for the next dictation session.
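The capture-to-paste flow in the steps above can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: `build_capture_command` and `dictate_once` are hypothetical names, the DirectShow device name must be supplied by the caller, and a fixed recording duration stands in for whatever stop condition the tool really uses. The transcription and paste steps are passed in as callables so the sketch does not guess at the Groq or clipboard calls.

```python
from typing import Callable, List


def build_capture_command(device: str, out_path: str, seconds: int = 10) -> List[str]:
    """Return an FFmpeg argument list that records mono 16 kHz audio on Windows."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-f", "dshow",            # DirectShow input (Windows)
        "-i", f"audio={device}",  # device name is an assumption; list devices with
                                  # ffmpeg -list_devices true -f dshow -i dummy
        "-ac", "1",               # mono
        "-ar", "16000",           # 16 kHz sample rate
        "-t", str(seconds),       # fixed duration (assumed stop condition)
        "-y", out_path,           # overwrite the output file if it exists
    ]


def dictate_once(
    capture: Callable[[], str],
    transcribe: Callable[[str], str],
    post_process: Callable[[str], str],
    paste: Callable[[str], None],
) -> str:
    """Run one dictation cycle: capture, transcribe, process commands, paste."""
    audio_path = capture()        # step 2: record audio, return the file path
    raw = transcribe(audio_path)  # steps 4-5: send audio, receive text
    text = post_process(raw)      # step 6: replace command keywords
    paste(text)                   # step 7: paste at the cursor
    return text
```

In a real build, `capture` would typically run the command with `subprocess.run(build_capture_command(...), check=True)`, and the hotkey handler would call `dictate_once` on a worker thread so the UI stays responsive.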