ChatGPT Realtime Audio
- Realtime-2 — voice agent · GPT-5-class reasoning · $32 / $64 per 1M audio tokens
- Realtime-Translate — live translation · 70 → 13 languages · $0.034 / min ≈ $2.04 / hour
- Realtime-Whisper — streaming speech-to-text · $0.017 / min ≈ $1.02 / hour
This page is an honest comparison of ChatGPT Realtime Audio with Whipscribe. Prices were last checked on 2026-05-08.
If you're a developer building a voice agent and you want GPT-5 reasoning on the audio path with a single endpoint — Realtime-2 is the right tool, and Whipscribe doesn't compete on that.
If you want a transcript (paste a URL, drop a file, or hit an MCP/REST endpoint), Whipscribe is $0.010/minute on the $2 / 200-minute Starter pack. That works out to $0.60/hr, already below Realtime-Whisper's ~$1.02/hr, although the $2 minimum purchase costs more than a single metered hour. Bigger packs collapse the rate fast: $12 buys 6,000 minutes (effective $0.12/hr, about 8.5× cheaper than Realtime-Whisper at the same volume). Every pack is a one-time payment; credits never expire and there is no subscription. 30 minutes/day are free without signup. 99 languages, diarization included, no audio sent to OpenAI servers.
If you care about privacy — Whipscribe runs self-hosted faster-whisper / whisperX with no third-party AI calls on the audio path; ChatGPT Realtime sends every audio stream to OpenAI's US servers, retained per OpenAI's data-usage policy.
The three new models
GPT-Realtime-2
Voice agent with GPT-5-class reasoning. Carries multi-turn dialogue, calls tools, executes actions while the user is still talking. The flagship — and the most expensive of the three.
GPT-Realtime-Translate
Live speech-to-speech translation. 70 input languages → 13 output languages, low-latency streaming. Aimed at meeting bots, dubbing, and customer-support overlay use cases.
GPT-Realtime-Whisper
Streaming speech-to-text. Same Whisper family OpenAI has shipped before, now exposed as a real-time stream so partial transcripts appear as the speaker talks. The closest direct competitor to Whipscribe's transcription API.
Pricing pulled from OpenAI's launch announcement on 2026-05-08. Audio billing is rounded to the nearest second; Realtime-2 token math depends on input length.
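The per-minute billing described above can be sketched as a small calculation. This is a minimal sketch, assuming the rates quoted on this page and assuming that "rounded to the nearest second" means the stream duration is rounded first and then billed pro rata. The helper name `estimate_audio_cost` is hypothetical, and this is not OpenAI's published billing logic.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-minute rates quoted on this page (price-check 2026-05-08).
RATES_PER_MIN = {
    "realtime-whisper": Decimal("0.017"),
    "realtime-translate": Decimal("0.034"),
}


def estimate_audio_cost(model: str, duration_seconds: float) -> Decimal:
    """Estimate the charge in USD for one audio stream (hypothetical helper).

    Assumption: the duration is rounded to the nearest whole second,
    then billed pro rata at the per-minute rate.
    """
    seconds = Decimal(str(duration_seconds)).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    return seconds * RATES_PER_MIN[model] / Decimal(60)


if __name__ == "__main__":
    # One hour of streaming speech-to-text, then a 90.4-second translation clip.
    print(estimate_audio_cost("realtime-whisper", 3600))
    print(estimate_audio_cost("realtime-translate", 90.4))
```

Realtime-2 is left out on purpose: its cost depends on token counts, which cannot be derived from duration alone.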
At a glance
ChatGPT Realtime Audio vs Whipscribe
| Feature | ChatGPT Realtime Audio (OpenAI · 2026-05-08) | Whipscribe (Neugence · privacy-first) |
|---|---|---|
| Product category | Voice AI · streaming STT API; developer-only | Transcription utility: web app · API · MCP · ChatGPT GPT · Mac desktop |
| Cheapest transcription rate | Realtime-Whisper · $0.017/min (~$1.02/hr); Realtime-Translate · $0.034/min (~$2.04/hr) | $0.010 / minute on $2 Starter pack; drops fast on bigger packs: $12 = 6,000 min ($0.002/min), $29 = 30,000 min ($0.001/min); credits never expire · no subscription |
| Voice-agent / GPT-5 reasoning | Yes: Realtime-2, $32 / $64 per 1M tokens | Not offered; we ship transcripts, bring your own model (Claude, GPT, local) |
| Try without signing up | No; OpenAI account + API key required | Yes; 30 min/day free, no account, no card |
| Free tier | None; billed from minute zero | Anonymous · 30 min/day, plus 60 minutes free on signup |
| Privacy / data residency | Audio sent to OpenAI servers (US); retention per OpenAI data policy; not HIPAA-eligible by default | Self-hosted Whisper (our own GPU cluster); audio never sent to OpenAI; no training on uploads; see /security + /privacy |
| Speaker diarization | Not built-in | Included; whisperX + pyannote; no extra fee |
| Word-level timestamps | Streaming token deltas; no aligned word timings | SRT · VTT · JSON · DOCX · TXT; word-level alignment |
| Languages | Realtime-Whisper · 99; Translate · 70 → 13 | 99 (full Whisper coverage, all tiers) |
| URL input (YouTube, Vimeo, podcast feeds) | No; raw audio stream only | Yes; paste a link, we pull the audio |
| File uploads (.mp3, .mp4, .m4a, .wav, .mov…) | No; raw audio stream only; you write the chunking, the resampling, the VAD, the retry logic | Yes: drag many files; up to 10 hours / 5 GB per file; parallel jobs; auto-resume on tab close; any container we can decode |
| Searchable transcript library | None; token deltas in a stream; no archive, no search, no folders, no share links; you build all of it | Full library; every transcript saved · full-text search; folders + share links + trash; rename · delete · re-export anytime; cross-device sync via your account |
| Chat with the transcript (Claude Sonnet) | No; token feed only; "summarize my meeting" = your code, your prompt, your model bill | Built-in; Claude Sonnet on every transcript; "summarize the decisions", "pull every action item", "what did Sarah say about churn?"; private · per-session, never stored |
| Live transcription | Yes; real-time streaming model | Live Meeting Notes (beta); streaming Whisper on web |
| Native integrations | OpenAI SDK; Realtime API endpoints | REST API; MCP server (Claude/Cursor); ChatGPT Custom GPT; Obsidian · Mac desktop; Chrome extension |
| Human support | Community forum + docs; API status page; Enterprise tier for ticketed support | Email a human: contact@neugence.ai; same person who built the product; typically same-day reply; no tier-gating |
| Pricing model | Usage-metered API; no caps · no free tier | One-time credit packs: $2 / 200 min · $4 / 600 min · $7 / 2,400 min · $12 / 6,000 min · $29 / 30,000 min; credits never expire · no auto-renew |
What Whipscribe ships that ChatGPT Realtime doesn’t
Realtime is a developer audio API — its job ends at "stream out the bytes." Whipscribe ships the finished workflow: the transcript file, the place to keep it, the AI to talk to it, and a human you can email when something is off. Four pieces, all included on every paid pack, none of them on the Realtime side without you building them yourself.
📁 Files of any size, any format
Drop a 3-hour podcast .mp3, a Zoom .mp4, an iPhone Voice-Memo .m4a, a .wav, a .mov, an .opus — Whipscribe decodes it. Up to 10 hours / 5 GB per file, and you can drag a whole folder in for parallel jobs. No chunking code, no resampler, no "audio format not supported." Realtime accepts a raw audio stream — file ingestion is on you.
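To give a sense of the ingestion work a raw-stream API leaves to you, here is a minimal sketch of the chunking step alone, using Python's standard `wave` module. The 20 ms frame size and both helper names are illustrative assumptions. Resampling, VAD, retries, and the streaming call itself are omitted, and nothing here reflects OpenAI's or Whipscribe's actual code.

```python
import io
import wave
from typing import Iterator


def iter_pcm_frames(wav_bytes: bytes, frame_ms: int = 20) -> Iterator[bytes]:
    """Yield fixed-duration chunks of raw PCM from an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frames_per_chunk = wav.getframerate() * frame_ms // 1000
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV, used here only as sample input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    chunks = list(iter_pcm_frames(make_silent_wav(1.0)))
    print(len(chunks), "chunks of", len(chunks[0]), "bytes")
```

A real pipeline would also need to decode compressed containers (.mp3, .m4a, .mp4) to PCM before this step, which the standard library does not do.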
🔍 A searchable library that’s yours
Every transcript you make is saved to your account, full-text searchable across the whole library — find "the meeting where we discussed pricing" in one query, six months later. Folders, share links, trash with restore, rename, re-export anytime. Realtime returns token deltas; archive + search + share is something you build.
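As a rough idea of what "something you build" means for search, the sketch below is a toy in-memory inverted index over transcript text. The class and its methods are hypothetical and are not Whipscribe's implementation; a production library would also need persistence, ranking, folders, and access control.

```python
from __future__ import annotations

import re
from collections import defaultdict


class TranscriptIndex:
    """Toy full-text index mapping each word to the transcripts containing it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _tokens(text: str) -> list[str]:
        # Lowercase and split on anything that is not a letter, digit, or apostrophe.
        return re.findall(r"[a-z0-9']+", text.lower())

    def add(self, transcript_id: str, text: str) -> None:
        for token in self._tokens(text):
            self._postings[token].add(transcript_id)

    def search(self, query: str) -> set[str]:
        """Return the ids of transcripts that contain every word in the query."""
        tokens = self._tokens(query)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


if __name__ == "__main__":
    index = TranscriptIndex()
    index.add("meeting-01", "We discussed pricing for the third quarter.")
    index.add("meeting-02", "Sarah talked about churn and retention.")
    print(index.search("pricing discussed"))
```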
💬 Claude Sonnet chat on every transcript
Open any transcript and ask Claude — "summarize the key decisions", "pull every action item with timestamps", "what did Sarah say about churn?" — included on every paid pack. Conversations are per-session, never stored on our server, and Anthropic doesn’t train on them. With Realtime you’d wire your own model, your own prompt, your own bill.
📨 A human who answers your email
Email contact@neugence.ai and you reach the same person who built the product — typically same-day reply, no support-tier gate. OpenAI’s Realtime API ships with docs, a forum, and an enterprise contract for ticketed support. On Whipscribe a question about your file, your bill, or a feature request goes straight to a human who can fix it.
Pricing — head to head
| Workload | ChatGPT Realtime Audio | Whipscribe |
|---|---|---|
| 1 hour of transcription / month | ~$1.02 / hour (60 min × $0.017, Realtime-Whisper) | $2.00 PAYG (Starter pack), or $0 if under the daily 30-min free tier |
| 10 hours / month | ~$10.20 | $4 Casual pack (600 minutes · $0.007/min); credits never expire |
| 40 hours / month (active podcaster / journalist) | ~$40.80 (40 × $1.02/hr) | $7 Creator pack (2,400 minutes · $0.003/min), ≈ 5.8× cheaper; or the $12 / 6,000-min pack, ≈ 3.4× cheaper with 3,600 minutes left over |
| 6,000 minutes / month (podcast network · research lab) | ~$102.00 (100 × $1.02/hr) | $12 / 6,000-min pack (effective $0.12/hr · ~8.5× cheaper) |
| 1 hour live translation | ~$2.04 / hour (60 min × $0.034, Realtime-Translate) | $2.00 transcript + paste into Claude / DeepL |
| Voice-agent app (50K interactions / month) | Token-metered; Realtime-2 · $32 / $64 per 1M tokens | Out of scope; we’re a transcription utility, not a voice agent |
| Try without paying | No free tier | 30 min/day anonymous + 60 minutes free on signup |
All numbers from public price pages on 2026-05-08. Realtime-2 voice-agent math depends on conversation length; the line above is illustrative only.
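The transcription rows in the table above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the pack prices and the Realtime-Whisper rate quoted on this page, and assuming each month's workload is covered by a single pack purchase; the function names are hypothetical.

```python
from typing import Optional, Tuple

# (price in USD, minutes) for each one-time credit pack listed on this page.
PACKS = [(2, 200), (4, 600), (7, 2400), (12, 6000), (29, 30000)]
REALTIME_WHISPER_PER_MIN = 0.017


def smallest_covering_pack(minutes: int) -> Optional[Tuple[int, int]]:
    """Return the cheapest single pack that covers `minutes`, or None."""
    covering = [pack for pack in PACKS if pack[1] >= minutes]
    return min(covering, key=lambda pack: pack[0]) if covering else None


def compare(minutes: int) -> dict:
    """Compare one pack purchase with metered Realtime-Whisper billing."""
    realtime_usd = round(minutes * REALTIME_WHISPER_PER_MIN, 2)
    pack = smallest_covering_pack(minutes)
    if pack is None:
        return {"realtime_usd": realtime_usd, "pack_usd": None, "ratio": None}
    return {
        "realtime_usd": realtime_usd,
        "pack_usd": pack[0],
        # How many times more the metered option costs than the pack.
        "ratio": round(realtime_usd / pack[0], 2),
    }


if __name__ == "__main__":
    for minutes in (60, 600, 2400, 6000):
        print(minutes, compare(minutes))
```

A ratio below 1 (as for the 60-minute workload) means the metered option is cheaper for that month taken alone. Because credits never expire, unused minutes from a pack carry over, which this single-month comparison ignores.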
When ChatGPT Realtime Audio is the right call
- You’re building a voice agent that needs GPT-5 reasoning, tool calls, and barge-in. Realtime-2 is genuinely state-of-the-art here.
- You need live speech-to-speech translation in 13 target languages and want a single OpenAI endpoint instead of stitching Whisper + GPT-4o + TTS.
- Your audio is already in OpenAI’s ecosystem (Realtime API, Responses API, Assistants) and a separate vendor adds friction.
When Whipscribe is the better fit
- You want a transcript — TXT, SRT, VTT, DOCX, JSON — not a voice agent.
- You care about privacy: legal, medical, journalism, EU-residency, or anything subject to data-export rules.
- You want to try without signing up — paste a URL or drop a file and read the transcript in 30 seconds.
- You need diarization out of the box, URL input, batch uploads, an MCP server, a ChatGPT Custom GPT, an Obsidian plugin, or a Mac desktop app.
- You want credits that never expire: pick a pack that fits ($2 / 200-min · $4 / 600 · $7 / 2,400 · $12 / 6,000 · $29 / 30,000) and use them today, next month, or a year from now. At the $12 pack the effective rate is $0.12/hr; at the $29 pack it’s $0.058/hr — both well under the $1.02/hr Realtime-Whisper rate.
FAQ
Which one should I pick for my use case?
If you need a transcript file (TXT, SRT, DOCX) from a recording, an interview, a meeting, a podcast, or a YouTube link — Whipscribe is the right tool. Drop the file or paste the URL and read the transcript.
If you’re a developer building a live voice agent that needs GPT-5 reasoning, tool calls, and barge-in — ChatGPT Realtime-2 is the right tool. It’s an API, not a transcription product.
If you need live speech-to-speech translation across 70 → 13 languages with low latency — ChatGPT Realtime-Translate is built for that. Whipscribe transcribes; translation is a separate step.
Do I need to write code to use ChatGPT Realtime Audio?
Yes. All three models are developer APIs — you’ll need an OpenAI account, an API key, and code that opens an audio stream to api.openai.com. There is no web app and no upload form.
Whipscribe has a web app at whipscribe.com — paste a URL or drop a file and you get a transcript in seconds, no code, no account required for the first 30 minutes a day.
Can I transcribe a YouTube video or podcast URL with ChatGPT Realtime?
Not directly. ChatGPT Realtime accepts a raw audio stream — you’d need to download or capture the audio yourself and pipe it in. Whipscribe accepts a URL: paste a YouTube, Vimeo, podcast, or direct media link and the audio is fetched for you.
How much will I actually pay for typical workloads?
Realtime-Whisper is billed at $0.017 per minute, rounded to the nearest second.
- 1 hour / month — ChatGPT Realtime ≈ $1.02 · Whipscribe $2.00 PAYG (or $0 if under the daily 30-min free tier)
- 10 hours / month — ChatGPT Realtime ≈ $10.20 · Whipscribe $4 / 600-minute pack (covers exactly 600 minutes)
- 40 hours / month — ChatGPT Realtime ≈ $40.80 · Whipscribe $7 / 2,400-min pack (≈ 5.8× cheaper)
- 6,000 minutes / month — ChatGPT Realtime ≈ $102 · Whipscribe $12 / 6,000-min pack (≈ 8.5× cheaper)
Realtime-Translate at $0.034/min (≈ $2.04/hr) and Realtime-2 voice agent ($32 / $64 per 1M tokens) bill the same way — usage-metered, no monthly cap, no free tier.
Is my audio private with each tool?
ChatGPT Realtime Audio: every audio stream is sent to OpenAI’s servers in the United States and retained per OpenAI’s data-usage policy. Default API access is not HIPAA-eligible.
Whipscribe: audio is processed on Whipscribe’s own GPU cluster using self-hosted Whisper / whisperX. Audio is never sent to OpenAI or any third-party AI provider. Recordings are not used for training. The full posture is on /security and /privacy.
Does ChatGPT Realtime work for Zoom / Google Meet / Teams transcripts?
There’s no built-in meeting bot — you’d capture the meeting audio yourself, then stream it in. Whipscribe accepts uploaded recordings (mp4, m4a, mp3, wav and many more) and offers Live Meeting Notes in beta for browser-tab capture.
Which is more accurate?
Realtime-Whisper is the same Whisper family Whipscribe runs (whisper / whisperX). On clean speech the two are very close. Differences in the final transcript come from features layered on top: speaker diarization, word-level alignment, punctuation restoration, and language detection — all included on Whipscribe, not on Realtime-Whisper.
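To make the layered features concrete, the sketch below renders diarized, time-aligned segments as SRT subtitles. The `Segment` shape and the speaker labels are assumptions for illustration and do not describe Whipscribe's actual JSON schema; only the SRT timestamp layout (`HH:MM:SS,mmm`) follows the common SubRip convention.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds from the beginning of the recording
    end: float    # seconds from the beginning of the recording
    speaker: str  # diarization label, e.g. "SPEAKER_00" (assumed format)
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"[{seg.speaker}] {seg.text}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    print(to_srt([
        Segment(0.0, 2.5, "SPEAKER_00", "Thanks for joining."),
        Segment(2.5, 5.0, "SPEAKER_01", "Happy to be here."),
    ]))
```

With only streaming token deltas and no aligned timings, the `start` and `end` fields are the part you would have to produce yourself.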
HIPAA / SOC 2 / EU data residency — what are my options?
OpenAI offers HIPAA-eligible access through its Enterprise tier; default API access is not HIPAA-eligible. EU data residency also requires an OpenAI Enterprise contract.
Whipscribe runs on Neugence-owned infrastructure with self-hosted models and no third-party AI on the audio path. See /security for the current posture, certifications status, and how to request a DPA.
Can I use both together?
Yes — they solve different problems. A common pattern: use Whipscribe for the transcript (with diarization, timestamps, exports), then feed the text to GPT-5 / Realtime-2 to build a voice agent on top. The Whipscribe MCP server and the ChatGPT Custom GPT make that handoff one click.
Is there a free way to try either one?
ChatGPT Realtime Audio has no free tier — you pay from the first second.
Whipscribe gives every visitor 30 minutes of transcription per day with no signup, plus 60 minutes free on signup. Credits don’t expire.
Whipscribe is a managed faster-whisper + whisperX service — privacy-first, credit packs from $2 / 200 min PAYG up to $29 / 30,000 min (credits never expire), no API key to try, 99 languages, diarization included.
Cross-references: OpenAI Whisper API (older, $0.006/min) · Deepgram · AssemblyAI · all 27 tools · our security posture · our pricing.