Manual Transcription

Overview

Beyond real-time voice dictation, Vowen can transcribe pre-recorded audio and video files. Use this for interviews, podcasts, recorded meetings, or any media file.

How to Transcribe a File

Open the Transcribe dialog

Click the + Transcribe button at the top right of the Vowen window, or open Transcribe from the sidebar and click the same button there. A dialog opens with the upload options.

Pick a transcription model

Choose any of your configured local or cloud models from the Transcription Model dropdown. This selection is per-file, so you can run a single file through a higher-accuracy model without changing your global default.

Set language and speaker options

Pick a language from the Language dropdown, or leave it on Auto-detect. If your selected model supports speaker diarization, toggle Identify Speakers to label each speaker in the output.

Add your file

Drag and drop an audio or video file into the drop zone, or click to browse.

Click Transcribe

Vowen handles compression and chunking automatically if needed. The transcription appears with timestamps shown as [MM:SS] badges where the model supports them. You can edit, copy, regenerate, or export the result.

Supported Formats

Audio: mp3, wav, m4a, aac, ogg, flac, wma, opus

Video: mp4, mov, avi, mkv, flv, wmv, webm, mpeg, mpg

For video files, Vowen automatically extracts the audio track before transcription.
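
If you want to pre-check a folder of recordings before uploading, the extension lists above can be turned into a small script. This is a minimal sketch, not Vowen's own validation: the function name `classify_media` is hypothetical, and it looks only at the file extension, not at the file's actual contents.

```python
from pathlib import Path

# Extensions copied from the Supported Formats list in this guide.
AUDIO_EXTENSIONS = {"mp3", "wav", "m4a", "aac", "ogg", "flac", "wma", "opus"}
VIDEO_EXTENSIONS = {"mp4", "mov", "avi", "mkv", "flv", "wmv", "webm", "mpeg", "mpg"}


def classify_media(filename: str) -> str:
    """Return "audio", "video", or "unsupported" from the extension alone.

    Hypothetical helper for illustration; Vowen may inspect files differently.
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    return "unsupported"
```

For example, `classify_media("standup.MKV")` returns `"video"`, which tells you Vowen would extract the audio track first.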

Timestamps

For manual transcriptions using Parakeet or Whisper CLI mode, timestamps are included in the output. These appear as time badges marking when each segment was spoken. Most cloud transcription models also produce timestamps; refer to the Models Guide for specifics.
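
To make the badge format concrete, here is a rough sketch of how a segment's start time in seconds maps to an [MM:SS] badge. The function name `time_badge` is hypothetical, and the behavior past one hour (minutes keep counting beyond 59) is an assumption; this guide does not say how Vowen displays longer recordings.

```python
def time_badge(seconds: float) -> str:
    """Format a segment start time as an [MM:SS] badge.

    Assumption: fractional seconds are dropped and minutes are not
    wrapped into hours, so 3725 seconds renders as [62:05].
    """
    total = int(seconds)  # drop fractional seconds
    minutes, secs = divmod(total, 60)
    return f"[{minutes:02d}:{secs:02d}]"
```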

File Size and Duration Limits

Vowen handles large files automatically. Compression and chunking happen behind the scenes based on the provider you pick:

Provider	Threshold	What Vowen does
Groq Whisper	24 MB	Compresses WAV to MP3 at 96 kbps, then splits into chunks of about 30 minutes
ElevenLabs Scribe v2	50 MB	Splits into 20-minute chunks and merges the results
Mistral Voxtral	50 MB or 27 minutes	Splits into 27-minute chunks
Sarvam Saaras v3	30 seconds (hard provider limit)	Splits into 28-second chunks
Deepgram, AssemblyAI, Soniox, Speechmatics, xAI Aurora	No client-side limit	Audio is sent as-is; provider-side limits apply

You never need to split files manually. The five providers in the last row enforce their own limits at the API level; the others have explicit handling in Vowen.
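
To estimate how many chunks a recording will produce, the chunk lengths in the table can be applied with simple arithmetic. The sketch below is illustrative only: `plan_chunks` and `CHUNK_SECONDS` are hypothetical names, the Groq value is approximate because the table says "about 30 minutes", and Vowen's real chunker may choose boundaries differently (for example, at pauses in speech).

```python
import math

# Chunk lengths in seconds, taken from the provider table in this guide.
CHUNK_SECONDS = {
    "Groq Whisper": 30 * 60,          # "about 30 minutes"; approximate
    "ElevenLabs Scribe v2": 20 * 60,
    "Mistral Voxtral": 27 * 60,
    "Sarvam Saaras v3": 28,
}


def plan_chunks(duration_s: float, chunk_s: float) -> list[tuple[float, float]]:
    """Split [0, duration_s] into back-to-back (start, end) spans of chunk_s.

    The final span is shortened so it ends exactly at duration_s.
    """
    if duration_s <= 0 or chunk_s <= 0:
        raise ValueError("duration_s and chunk_s must be positive")
    count = math.ceil(duration_s / chunk_s)
    return [
        (i * chunk_s, min((i + 1) * chunk_s, duration_s))
        for i in range(count)
    ]
```

Under these assumptions, a one-hour recording sent to ElevenLabs Scribe v2 yields three 20-minute spans, while a one-minute clip sent to Sarvam Saaras v3 yields three spans of 28, 28, and 4 seconds.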

Regenerating Transcriptions

When you open a completed transcription, the detail page shows a Regenerate Transcript panel in the sidebar with every model you have configured, grouped by Local and Cloud. Local models include Parakeet; cloud options can include Groq, Soniox, Deepgram, Mistral, AssemblyAI, Sarvam AI, ElevenLabs, Speechmatics, xAI, and any other provider you have connected.

Open the transcription detail page
Pick any model from the panel
For models that support speaker diarization, toggle Identify Speakers to label each speaker in the new transcript
Click Regenerate Transcript

The original transcript is preserved as a version, so regeneration produces a new one without overwriting the first. This is useful for comparing how different models handle the same audio, or for running a higher-accuracy pass after a quick first run.

Export Options

From the transcription detail page or the export modal, you can save the transcript in three formats:

Format	Extension	Best For
Plain Text	`.txt`	Notes, copy-paste, archival
WebVTT	`.vtt`	Web video captions
SubRip	`.srt`	Standard video editor subtitles

The export modal also lets you swap the speech model and toggle speaker diarization right before export, so you can re-process the audio without leaving the modal.

WebVTT and SubRip exports require a model that produces fine-grained timestamps. Parakeet’s batch output does not include the granularity these subtitle formats need, so VTT and SRT are disabled when Parakeet is the active model. Switch to a different model in the export modal to enable them, or use Plain Text export.

Chat with the Transcript

Once a transcription is complete, you can ask the AI questions about its contents from the chat panel, for example to pull out action items, summarize a section, or find a quote. See Chat with Transcriptions.
