Bulk Transcribe Audio to SRT Subtitles Online Free

Generate SRT subtitles with accurate timestamps — two Whisper models, up to 216x realtime.

Drop up to 50 files at once — no install, no sign-up required.

100 MB or 1 hour per file · Up to 50 files · 3 parallel conversions · 2 credits per minute

Encrypted · AI-Powered · Global Servers · Auto-delete 1h

Median transcription time (last 10k jobs): 1.2s per minute
Outputs: SRT · Model: Fast

How it works

  1. Drop your files

    Drag & drop audio or video files. Supports MP3, WAV, M4A, MP4, and more. No account required.

  2. We generate subtitles

    Transcribed by OpenAI Whisper (choose Fast or Quality model) with accurate timestamps. Encrypted in transit & at rest.

  3. Download & auto-delete

    Get your SRT subtitle file in seconds. Files delete automatically after 1 hour.

Frequently Asked Questions

What is the SRT format?

SRT (SubRip Text) is the most widely supported subtitle format.

It contains numbered subtitle entries with timestamps and text.

SRT files work with virtually all video players, editors, and platforms including YouTube, Vimeo, and social media.
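
As a rough illustration of that structure, the sketch below builds one SRT entry in Python. The function names `format_timestamp` and `format_entry` are made up for this example and are not part of any converter API; the `HH:MM:SS,mmm` timestamp layout follows the usual SubRip convention.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp layout HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def format_entry(index: int, start: float, end: float, text: str) -> str:
    """Build one numbered SRT entry: index line, time range, then text."""
    return f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"


# One entry: sequence number, start --> end, subtitle text
print(format_entry(1, 1.0, 4.0, "Welcome to the show."))
```

Entries in a real file are separated by a blank line, so joining several `format_entry` results with `"\n"` should give a usable file.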

SRT vs VTT — which should I use?

Use SRT for:

  • Video editing software
  • Desktop players (VLC, QuickTime)
  • Platforms like YouTube, Vimeo, or Facebook

Use VTT for HTML5 web video with <track> tags.

SRT has broader legacy compatibility across devices and software; VTT supports advanced styling and positioning for web.
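
If you already have an SRT file and later need VTT, the two formats are close enough that a basic conversion is short. The sketch below is a minimal illustration, not the converter's own method: it adds the `WEBVTT` header and swaps the comma before the milliseconds for a dot. The helper name `srt_to_vtt` is hypothetical, and the sketch ignores styling and positioning.

```python
import re

# Matches an SRT timestamp such as 00:00:01,000 (comma before milliseconds)
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Return basic WebVTT text for the given SRT text."""
    converted = []
    for line in srt_text.strip().split("\n"):
        # Only touch timing lines, so subtitle text is left alone
        if "-->" in line:
            line = _SRT_TIME.sub(r"\1.\2", line)
        converted.append(line)
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"


srt = "1\n00:00:01,000 --> 00:00:04,000\nJohn: Welcome to the show.\n"
print(srt_to_vtt(srt))
```

The numeric entry indices are kept because WebVTT allows an optional identifier line before each cue.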

Need web captions instead? Try Audio to VTT →

How accurate are the timestamps?

Our AI generates word-level timestamps using OpenAI Whisper — one of the most accurate speech recognition models available.

Timestamps are typically accurate to within a few hundred milliseconds.

What is the difference between Fast and Quality models?

Two OpenAI Whisper models — choose speed or accuracy:

Model    Engine                          Speed            Cost
Fast     Whisper V3 Large Turbo (809M)   ~216x realtime   2 credits/min
Quality  Whisper V3 Large (1.55B)        ~189x realtime   5 credits/min

Fast is the default — great for clear audio, podcasts, and lectures.

Quality uses the full 1.55B-parameter model. Independent benchmarks show a word error rate (WER) of ~10% for Quality vs ~12% for Fast (Artificial Analysis). Choose Quality for accented speech, noisy recordings, or technical content.

Both models support 99+ languages. Switch in the options panel above.

Sources: Groq docs, Artificial Analysis benchmark, Hugging Face model cards.
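
To get a feel for the trade-off, the sketch below turns the speed and cost figures quoted above into a rough estimate for one file. It is back-of-the-envelope arithmetic only: the `estimate` helper is hypothetical, the realtime factors are approximate, and the result likely excludes upload and queueing time.

```python
# Approximate figures from the model comparison above; real throughput varies.
MODELS = {
    "fast": {"realtime_factor": 216, "credits_per_min": 2},
    "quality": {"realtime_factor": 189, "credits_per_min": 5},
}


def estimate(model: str, audio_minutes: float) -> tuple[float, float]:
    """Return (approximate processing seconds, credits) for one file."""
    spec = MODELS[model]
    seconds = audio_minutes * 60 / spec["realtime_factor"]
    credits = audio_minutes * spec["credits_per_min"]
    return seconds, credits


for name in MODELS:
    seconds, credits = estimate(name, 60)
    print(f"{name}: about {seconds:.1f}s of processing, {credits:.0f} credits")
```

For a 60-minute file the speed gap is only a few seconds, so the choice mostly comes down to accuracy versus credits.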

Can Meeting Intelligence label who speaks in subtitles?

Yes — Meeting Intelligence adds speaker labels to your SRT subtitles. This makes it clear who's speaking in multi-speaker videos like interviews, panel discussions, or podcasts.

When enabled, speaker names or labels appear at the start of subtitle entries:

1
00:00:01,000 --> 00:00:04,000
John: Welcome to the show.

2
00:00:04,500 --> 00:00:07,000
Sarah: Thanks for having me.

Our AI post-processing attempts to identify speakers by name when they're introduced or addressed in the audio. Important: Name detection isn't perfect — it works best when speakers introduce themselves ("Hi, I'm John...") or are called by name. If names aren't detected, you'll see generic labels like "Speaker 1" and "Speaker 2."

Meeting Intelligence costs extra credits and is ideal for content where knowing who speaks matters.
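
If you post-process labeled subtitles yourself, the speaker prefix can be split off with a small pattern. The sketch below assumes the label is a short prefix followed by a colon and a space, as in the sample entries above; `split_speaker` is a hypothetical helper, and a line of ordinary speech that happens to start with something like "Note: " would be misread as a label.

```python
import re
from typing import Optional

# Assumption: a label is up to 40 characters, contains no colon,
# and is followed by ": " at the very start of the subtitle text.
_LABEL = re.compile(r"^([^:\n]{1,40}):\s+(.*)$", re.DOTALL)


def split_speaker(text: str) -> tuple[Optional[str], str]:
    """Return (speaker, spoken_text); speaker is None when no label is found."""
    match = _LABEL.match(text)
    if match is None:
        return None, text
    return match.group(1), match.group(2)


print(split_speaker("John: Welcome to the show."))
print(split_speaker("Speaker 1: Thanks for having me."))
print(split_speaker("No label on this line"))
```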

Can I customize the subtitle timing?

The generated SRT file uses Whisper's word-level timing, which works well for most use cases.

For fine-tuning, import the SRT into any subtitle editor (Aegisub, Subtitle Edit) to adjust timing, merge lines, or split captions.

Editing the timing leaves the transcript text unchanged; you're only adjusting presentation.
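
For a simple global adjustment, such as subtitles that run consistently early or late, a script can do the job without a full editor. The sketch below shifts every timing line by a fixed number of milliseconds; `shift_srt` is a hypothetical helper written for this example, and it clamps negative results to zero rather than dropping entries.

```python
import re

_TIME = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt_text: str, offset_ms: int) -> str:
    """Shift all SRT timestamps by offset_ms (negative values shift earlier)."""

    def _shift(match: re.Match) -> str:
        h, m, s, ms = (int(g) for g in match.groups())
        total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
        total = max(0, total)  # never produce a negative timestamp
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    lines = []
    for line in srt_text.split("\n"):
        # Only timing lines contain "-->"; subtitle text is left untouched
        if "-->" in line:
            line = _TIME.sub(_shift, line)
        lines.append(line)
    return "\n".join(lines)


print(shift_srt("1\n00:00:01,000 --> 00:00:04,000\nWelcome to the show.\n", 500))
```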

What are the limits for this converter?

Tier        Max File Size  Max Files/Batch  Parallel Processing
Guest/Free  100 MB         50 files         3 at once
Pro         1024 MB        1000 files       6 at once

Note: File size limits are specific to this converter. Batch and parallel processing limits apply to all converters site-wide. See all converter limits →
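
If you are preparing a large upload, it can help to check files against these limits first. The sketch below is one way to do that locally, using the per-tier numbers listed above; `TierLimits` and `plan_batches` are hypothetical names, and the check covers file size and batch size only, not the 1-hour duration limit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    max_file_mb: int
    max_files_per_batch: int
    parallel: int


# Values copied from the limits listed above
LIMITS = {
    "free": TierLimits(max_file_mb=100, max_files_per_batch=50, parallel=3),
    "pro": TierLimits(max_file_mb=1024, max_files_per_batch=1000, parallel=6),
}


def plan_batches(file_sizes_mb: list[float], tier: str) -> tuple[list[list[float]], list[float]]:
    """Split files that fit the size limit into batches; return (batches, rejected)."""
    limits = LIMITS[tier]
    accepted = [size for size in file_sizes_mb if size <= limits.max_file_mb]
    rejected = [size for size in file_sizes_mb if size > limits.max_file_mb]
    step = limits.max_files_per_batch
    batches = [accepted[i:i + step] for i in range(0, len(accepted), step)]
    return batches, rejected


batches, rejected = plan_batches([10] * 120 + [150], "free")
print([len(batch) for batch in batches], rejected)
```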

How are credits calculated for this conversion?

Cost: 2 credits per minute

How it works:

  • Files up to 1 minute: 2 credits
  • 2 minutes: 4 credits
  • 3 minutes: 6 credits
  • 4 minutes: 8 credits

Example: A 10-minute file = 20 credits. A 180-minute (3h) audiobook = 360 credits.

Why per-minute? Audio conversion time scales with content duration, not file size. Longer audio requires proportionally more processing.
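
The pricing above can be sketched as a small function. One detail is an assumption here: the examples only list whole minutes, so this sketch rounds partial minutes up to the next minute, which may not match how the site actually bills. The name `credits_for` is hypothetical.

```python
import math

CREDITS_PER_MINUTE = 2  # Fast model rate quoted above


def credits_for(duration_seconds: float, rate: int = CREDITS_PER_MINUTE) -> int:
    """Credits for one file, rounding partial minutes up (an assumption)."""
    minutes = max(1, math.ceil(duration_seconds / 60))
    return minutes * rate


print(credits_for(45))        # a file under 1 minute
print(credits_for(10 * 60))   # the 10-minute example
print(credits_for(180 * 60))  # the 180-minute audiobook example
```

Passing `rate=5` gives the equivalent figure for the Quality model.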

What are my daily and monthly credit limits?

Credit allocations vary by account tier:

Tier   Daily Limit      Monthly Limit
Guest  100 credits/day  -
Free   100 credits/day  -
Pro    -                12,000 credits/month

Daily credits (Guest & Free tiers) reset every day at midnight UTC. Monthly credits (Pro) reset on your billing cycle date.

Note: At 2 credits per minute, an audio file of up to 1 minute costs 2 credits. Pro users can convert up to 6,000 such files per month (12,000 ÷ 2).
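
To see what an allowance covers in practice, divide it by the per-minute rate. The sketch below does that for each tier; `minutes_available` is a hypothetical helper, and the allowances are the figures listed above (per day for Guest and Free, per month for Pro).

```python
# Credits per day (guest, free) or per month (pro), as listed above
ALLOWANCE = {"guest": 100, "free": 100, "pro": 12_000}


def minutes_available(tier: str, credits_per_minute: int = 2) -> int:
    """Whole minutes of audio an allowance covers at the given rate."""
    return ALLOWANCE[tier] // credits_per_minute


print(minutes_available("free"))     # Fast model, per day
print(minutes_available("free", 5))  # Quality model, per day
print(minutes_available("pro"))      # Fast model, per month
```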

What's New in Audio to SRT

Latest improvements to this converter

Last updated February 9, 2026

Feb 9, 2026

Added Whisper V3 Large as a Quality mode for higher-accuracy transcription.

Jan 22, 2026

Launched Audio to SRT transcription with timestamped subtitles.

Need to get more done? Pro starts from $5.

1 GB files · 1,000 per batch · Priority queue · Web + API

No subscription required.