Parshu-STT – Real-Time Voice Transcription

Parshu-STT is a lightweight, always-on voice transcription tool for Windows. A global hotkey starts recording, speech is transcribed in real time, and the resulting text is pasted automatically at the cursor position. Built for production use, it is used daily for hands-free typing and documentation tasks. It combines Groq Whisper v3 Turbo for fast transcription, FFmpeg for audio capture, and PyQt6 for native Windows integration. The design is workflow-focused: custom voice commands such as "nextline" (inserts a newline) and "and" (inserts a comma followed by a space) let you keep dictating without reaching for the keyboard. It works in any application that accepts text input, including text editors, browsers, and IDEs.

1. Activation: User presses global hotkey (Ctrl+Shift+V) from any application to activate recording.
2. Audio Capture: FFmpeg begins capturing real-time audio from the default microphone at 16 kHz sample rate optimized for Whisper.
3. Recording Indicator: Visual notification shows recording status while user speaks.
4. Streaming to API: Audio is buffered and sent to Groq's Whisper v3 Turbo API for low-latency transcription.
5. Real-Time Transcription: Groq API returns transcribed text within 1-2 seconds with automatic punctuation and formatting.
6. Command Processing: Special keywords ("nextline", "and") are replaced with their corresponding characters.
7. Auto-Paste: Transcribed text is automatically pasted at the current cursor position using clipboard simulation.
8. Ready State: System returns to idle state, ready for next dictation session.
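The audio capture step can be sketched as a small helper that builds an FFmpeg command line. This is a minimal sketch, not the project's actual code: it assumes FFmpeg's DirectShow (dshow) input on Windows, a mono WAV output file, and a device name supplied by the caller. The function names build_capture_command and start_capture are hypothetical.

```python
import subprocess


def build_capture_command(device_name: str, output_path: str,
                          sample_rate: int = 16000) -> list[str]:
    """Build an FFmpeg command that records the given microphone to a file.

    Assumption: audio is captured through FFmpeg's DirectShow input, which
    needs an explicit device name rather than a generic "default" device.
    """
    return [
        "ffmpeg",
        "-hide_banner",
        "-loglevel", "error",
        "-f", "dshow",                  # DirectShow capture (Windows only)
        "-i", f"audio={device_name}",   # microphone as named by DirectShow
        "-ac", "1",                     # mono
        "-ar", str(sample_rate),        # 16 kHz, as described in step 2
        "-y",                           # overwrite the output file if present
        output_path,
    ]


def start_capture(device_name: str, output_path: str) -> subprocess.Popen:
    """Start recording; the caller stops it by sending 'q' on stdin."""
    return subprocess.Popen(
        build_capture_command(device_name, output_path),
        stdin=subprocess.PIPE,
    )
```

Available device names can be listed with `ffmpeg -list_devices true -f dshow -i dummy`. A recording started with start_capture can be ended cleanly with `proc.communicate(b"q")`, which asks FFmpeg to finalize the file before exiting.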

Global Hotkey: Works from any application with system-wide hotkey registration

Auto-Paste: Automatically inserts text at cursor position without manual copy-paste
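One way to picture the clipboard-based paste is the sketch below. It is an illustration under assumptions, not the tool's implementation: the three callables (get_clipboard, set_clipboard, send_paste_keys) are hypothetical hooks that a real application would back with its own clipboard and keystroke code, and restoring the previous clipboard contents is an added courtesy that the description above does not mention.

```python
import time
from typing import Callable


def paste_via_clipboard(
    text: str,
    get_clipboard: Callable[[], str],
    set_clipboard: Callable[[str], None],
    send_paste_keys: Callable[[], None],
    settle_seconds: float = 0.1,
) -> None:
    """Put text on the clipboard, trigger a paste, then restore the clipboard.

    All three callables are hypothetical hooks supplied by the caller.
    """
    previous = get_clipboard()          # remember what the user had copied
    set_clipboard(text)
    try:
        send_paste_keys()               # e.g. simulate Ctrl+V in the focused app
        # Give the target application time to read the clipboard before
        # the old contents are put back.
        time.sleep(settle_seconds)
    finally:
        set_clipboard(previous)
```

In a PyQt6 application the clipboard hooks could wrap QApplication.clipboard().text() and setText(); the keystroke hook would need a platform-specific mechanism for sending Ctrl+V.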

Custom Commands: "nextline" for newlines, "and" for commas; the design is extensible to more commands
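The keyword replacement can be sketched as a lookup table plus a whole-word regular expression. This is a minimal sketch under assumptions: the COMMANDS table and the apply_commands function are hypothetical names, and matching is assumed to be case-insensitive and to absorb the whitespace (and any comma or period) around a keyword.

```python
import re

# Hypothetical command table: spoken keyword -> text to insert.
# Extending the tool with a new command would mean adding one entry here.
COMMANDS = {
    "nextline": "\n",
    "and": ", ",
}


def apply_commands(text: str, commands: dict[str, str] = COMMANDS) -> str:
    """Replace each spoken keyword with its corresponding characters."""
    for keyword, replacement in commands.items():
        # \b keeps the match to whole words, so "sandwich" is left alone.
        # Surrounding whitespace and an optional trailing comma or period
        # are consumed so the replacement sits flush against its neighbours.
        pattern = re.compile(
            r"\s*\b" + re.escape(keyword) + r"\b[,.]?\s*",
            re.IGNORECASE,
        )
        # A function replacement avoids backslash processing in the template.
        text = pattern.sub(lambda _match, r=replacement: r, text)
    return text
```

With this table, apply_commands("apples and oranges") gives "apples, oranges", and "hello nextline world" becomes "hello" and "world" on separate lines. Because every spoken "and" is converted, a literal "and" cannot be dictated with this table as written.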

Universal Compatibility: Works with all text input fields across any application

Groq Whisper v3 Turbo: Fast transcription, with text typically returned in under 2 seconds
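A transcription request might look like the sketch below. It assumes the official groq Python SDK and the model identifier "whisper-large-v3-turbo"; the helper name transcribe_wav and the file name "dictation.wav" are hypothetical, and the tool itself may call the API differently.

```python
def transcribe_wav(client, wav_bytes: bytes,
                   model: str = "whisper-large-v3-turbo") -> str:
    """Send recorded audio to the transcription endpoint and return the text.

    `client` is expected to be a groq.Groq instance (an assumption); any
    object exposing the same audio.transcriptions.create call also works.
    """
    result = client.audio.transcriptions.create(
        file=("dictation.wav", wav_bytes),  # (filename, raw bytes) pair
        model=model,
    )
    return result.text.strip()


# Usage sketch (needs the `groq` package and a GROQ_API_KEY environment variable):
#
#   from groq import Groq
#   client = Groq()
#   with open("out.wav", "rb") as audio_file:
#       print(transcribe_wav(client, audio_file.read()))
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub object, so the rest of the pipeline can be tested without network access.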

Windows-Native: System tray integration, auto-startup option, minimal resource footprint

Python, Electron 28, Groq Whisper v3 Turbo, FFmpeg, PyQt6
