ChatGPT Realtime Audio

Name: ChatGPT Realtime Audio
Price: 0.017 USD per minute (Realtime-Whisper)
Author: OpenAI

by OpenAI · launched 2026-05-08

OpenAI shipped three GPT-Realtime audio models on 2026-05-08:

Realtime-2 — voice agent · GPT-5-class reasoning · $32 / $64 per 1M audio tokens
Realtime-Translate — live translation · 70 → 13 languages · $0.034 / min ≈ $2.04 / hour
Realtime-Whisper — streaming speech-to-text · $0.017 / min ≈ $1.02 / hour

All three are developer APIs requiring an OpenAI account and API key.
This page is the honest comparison vs Whipscribe — last price-check 2026-05-08.

TL;DR

If you're a developer building a voice agent and you want GPT-5 reasoning on the audio path with a single endpoint — Realtime-2 is the right tool, and Whipscribe doesn't compete on that.

If you want a transcript (paste a URL, drop a file, or hit an MCP/REST endpoint), Whipscribe is $0.010/minute (~$0.60/hr) on the $2 / 200-minute Starter pack. That per-minute rate is already below Realtime-Whisper's $0.017/min (~$1.02/hr), although the $2 minimum purchase costs more than a single hour of Realtime-Whisper. Bigger packs collapse the rate fast: $12 buys 6,000 minutes (effective $0.12/hr, ~8.5× cheaper than Realtime-Whisper at the same volume). Every pack is a one-time payment, credits never expire, and there is no subscription. 30 minutes/day free without signup. 99 languages, diarization included, no audio sent to OpenAI servers.

If you care about privacy — Whipscribe runs self-hosted faster-whisper / whisperX with no third-party AI calls on the audio path; ChatGPT Realtime sends every audio stream to OpenAI's US servers, retained per OpenAI's data-usage policy.

The three new models

GPT-Realtime-2

$32 / 1M audio input · $64 / 1M audio output ($0.40 cached input)

Voice agent with GPT-5-class reasoning. Carries multi-turn dialogue, calls tools, executes actions while the user is still talking. The flagship — and the most expensive of the three.
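As a rough sketch of how Realtime-2's token billing adds up, the function below applies the per-1M-token rates quoted above ($32 input, $64 output, $0.40 cached input). The token counts in the example are hypothetical placeholders, and the sketch assumes cached input tokens are counted separately from uncached ones; real counts depend on how OpenAI tokenizes audio and on conversation length.

```python
# Rates in USD per 1M tokens, as quoted on this page (launch pricing, 2026-05-08).
INPUT_PER_M = 32.00
OUTPUT_PER_M = 64.00
CACHED_INPUT_PER_M = 0.40


def realtime2_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one Realtime-2 session from token counts.

    Assumption: `input_tokens` are uncached input tokens only; cached input
    tokens are passed separately and billed at the cached rate.
    """
    return (
        input_tokens * INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
        + cached_input_tokens * CACHED_INPUT_PER_M
    ) / 1_000_000


# Hypothetical session: 20k uncached input, 10k output, 5k cached input tokens.
print(f"${realtime2_cost(20_000, 10_000, 5_000):.4f}")
```

Multiplying a per-session estimate by monthly interaction count gives a first-order budget for the voice-agent row in the pricing table further down.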

GPT-Realtime-Translate

$0.034 / minute (~$2.04/hr)

Live speech-to-speech translation. 70 input languages → 13 output languages, low-latency streaming. Aimed at meeting bots, dubbing, and customer-support overlay use cases.

GPT-Realtime-Whisper

$0.017 / minute (~$1.02/hr)

Streaming speech-to-text. Same Whisper family OpenAI has shipped before, now exposed as a real-time stream so partial transcripts appear as the speaker talks. The closest direct competitor to Whipscribe's transcription API.

Pricing pulled from OpenAI's launch announcement on 2026-05-08. Audio billing is rounded to the nearest second; Realtime-2 token math depends on input length.
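For the two per-minute models, a minimal cost sketch looks like this. It assumes "rounded to the nearest second" means the stream duration is rounded to a whole second and then billed pro rata at the per-minute rate; the dictionary keys are just labels for this example, not OpenAI model identifiers.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-minute rates quoted on this page (USD). Keys are illustrative labels only.
RATES_PER_MIN = {
    "realtime-whisper": Decimal("0.017"),
    "realtime-translate": Decimal("0.034"),
}


def per_minute_cost(model: str, duration_seconds: float) -> Decimal:
    """Round the duration to the nearest whole second, then bill pro rata per minute."""
    seconds = Decimal(str(duration_seconds)).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return RATES_PER_MIN[model] * seconds / Decimal(60)


print(per_minute_cost("realtime-whisper", 3600))    # one hour of streaming STT
print(per_minute_cost("realtime-translate", 3600))  # one hour of live translation
```

`Decimal` is used instead of `float` so that per-second charges sum without binary rounding drift.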

At a glance

ChatGPT Realtime Audio vs Whipscribe

Feature	ChatGPT Realtime Audio (OpenAI · 2026-05-08)	Whipscribe (Neugence · privacy-first)
Product category	Voice AI · streaming STT API (developer-only)	Transcription utility: web app · API · MCP · ChatGPT GPT · Mac desktop
Cheapest transcription rate	Realtime-Whisper: $0.017/min (~$1.02/hr); Realtime-Translate: $0.034/min (~$2.04/hr)	$0.010/min on the $2 Starter pack, dropping fast on bigger packs: $12 = 6,000 min ($0.002/min), $29 = 30,000 min (~$0.001/min); credits never expire, no subscription
Voice-agent / GPT-5 reasoning	Yes: Realtime-2, $32 / $64 per 1M tokens	Not offered; we ship transcripts, bring your own model (Claude, GPT, local)
Try without signing up	No: OpenAI account + API key required	Yes: 30 min/day free, no account, no card
Free tier	None; billed from minute zero	Anonymous 30 min/day + 60 minutes free on signup
Privacy / data residency	Audio sent to OpenAI servers (US); retention per OpenAI data policy; not HIPAA-eligible by default	Self-hosted Whisper on our own GPU cluster; audio never sent to OpenAI; no training on uploads (see /security and /privacy)
Speaker diarization	Not built-in	Included (whisperX + pyannote), no extra fee
Word-level timestamps	Streaming token deltas; no aligned word timings	Word-level alignment; exports to SRT · VTT · JSON · DOCX · TXT
Languages	Realtime-Whisper: 99; Realtime-Translate: 70 → 13	99 (full Whisper coverage, all tiers)
URL input (YouTube, Vimeo, podcast feeds)	No; raw audio stream only	Yes: paste a link and we pull the audio
File uploads (.mp3, .mp4, .m4a, .wav, .mov…)	No; raw audio stream only, so you write the chunking, resampling, VAD, and retry logic	Yes: drag in many files, up to 10 hours / 5 GB per file; parallel jobs; auto-resume on tab close; any container we can decode
Searchable transcript library	None; token deltas in a stream, with no archive, search, folders, or share links (you build all of it)	Full library: every transcript saved, full-text search, folders, share links, trash, rename, delete, re-export anytime, cross-device sync via your account
Chat with the transcript (Claude Sonnet)	No; token feed only, so "summarize my meeting" means your code, your prompt, your model bill	Built-in Claude Sonnet on every transcript ("summarize the decisions", "pull every action item", "what did Sarah say about churn?"); private, per-session, never stored
Live transcription	Yes: real-time streaming model	Live Meeting Notes (beta): streaming Whisper on web
Native integrations	OpenAI SDK · Realtime API endpoints	REST API · MCP server (Claude/Cursor) · ChatGPT Custom GPT · Obsidian · Mac desktop · Chrome extension
Human support	Community forum + docs; API status page; Enterprise tier for ticketed support	Email a human at contact@neugence.ai: the same person who built the product, typically same-day reply, no tier-gating
Pricing model	Usage-metered API; no caps, no free tier	One-time credit packs: $2 / 200 min · $4 / 600 min · $7 / 2,400 min · $12 / 6,000 min · $29 / 30,000 min; credits never expire, no auto-renew

Privacy in plain English. Whipscribe is privacy-first by design: your audio hits our self-hosted Whisper / whisperX cluster and never leaves it for an OpenAI, AssemblyAI, or Deepgram round-trip. We don’t train on uploads. Anonymous transcripts are auto-deleted on a short clock; signed-in transcripts stay in your library, deletable any time. Read the full posture at /security and /privacy. With ChatGPT Realtime Audio, every audio frame is sent to OpenAI’s US servers and held under OpenAI’s data-usage policy — that may be fine for some teams, and a hard blocker for legal, medical, journalism-source, and EU-residency workflows.

What Whipscribe ships that ChatGPT Realtime doesn’t

Realtime is a developer audio API — its job ends at "stream out the bytes." Whipscribe ships the finished workflow: the transcript file, the place to keep it, the AI to talk to it, and a human you can email when something is off. Four pieces, all included on every paid pack, none of them on the Realtime side without you building them yourself.

📁 Files of any size, any format

Drop a 3-hour podcast .mp3, a Zoom .mp4, an iPhone Voice-Memo .m4a, a .wav, a .mov, an .opus — Whipscribe decodes it. Up to 10 hours / 5 GB per file, and you can drag a whole folder in for parallel jobs. No chunking code, no resampler, no "audio format not supported." Realtime accepts a raw audio stream — file ingestion is on you.
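To give a sense of the ingestion work Realtime leaves to you, here is a minimal sketch that slices a WAV file into fixed-duration PCM chunks and wraps each one as a base64 JSON event. The event shape (`input_audio_buffer.append` with a base64 `audio` field) is modeled on OpenAI's Realtime API client events and should be checked against OpenAI's current docs; the 24 kHz demo rate is also an assumption. The sketch only handles 16-bit mono PCM WAV: real code would also transcode other containers, resample, and send the events over a WebSocket with retries.

```python
import base64
import io
import json
import wave
from typing import BinaryIO, Iterator


def wav_to_append_events(f: BinaryIO, chunk_ms: int = 100) -> Iterator[str]:
    """Yield one JSON append event per chunk_ms of audio.

    Assumes 16-bit mono PCM input; no resampling or format conversion is done.
    """
    with wave.open(f, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM WAV")
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        while True:
            pcm = wav.readframes(frames_per_chunk)
            if not pcm:  # end of audio data
                break
            yield json.dumps({
                "type": "input_audio_buffer.append",  # assumed event name
                "audio": base64.b64encode(pcm).decode("ascii"),
            })


def make_silent_wav(seconds: float, rate: int = 24_000) -> io.BytesIO:
    """Build an in-memory 16-bit mono WAV of silence for the demo below."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


events = list(wav_to_append_events(make_silent_wav(1.0)))
print(len(events), "events ready to stream")
```

Everything past this point (the socket, backpressure, reconnects, and stitching partial transcripts back together) is still on your side of the line.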

🔍 A searchable library that’s yours

Every transcript you make is saved to your account, full-text searchable across the whole library — find "the meeting where we discussed pricing" in one query, six months later. Folders, share links, trash with restore, rename, re-export anytime. Realtime returns token deltas; archive + search + share is something you build.
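Roughly what "something you build" means in practice: a minimal sketch that folds streamed transcription events into finished transcripts and runs a naive full-text search over them. The event type names (`conversation.item.input_audio_transcription.delta` and `.completed`) and their `item_id`, `delta`, and `transcript` fields are modeled on OpenAI's Realtime transcription events and are assumptions here; the in-memory dict stands in for whatever database and search index you would actually run.

```python
from collections import defaultdict
from typing import Iterable


def assemble_transcripts(events: Iterable[dict]) -> dict[str, str]:
    """Fold delta events into full transcripts keyed by item_id.

    A `.completed` event, when present, wins over the accumulated deltas.
    """
    partial: dict[str, list[str]] = defaultdict(list)
    final: dict[str, str] = {}
    for ev in events:
        kind = ev.get("type", "")
        if kind.endswith("input_audio_transcription.delta"):
            partial[ev["item_id"]].append(ev["delta"])
        elif kind.endswith("input_audio_transcription.completed"):
            final[ev["item_id"]] = ev["transcript"]
    for item_id, pieces in partial.items():
        final.setdefault(item_id, "".join(pieces))  # only if no .completed seen
    return final


def search(transcripts: dict[str, str], query: str) -> list[str]:
    """Return item_ids whose transcript contains every query term as a substring."""
    terms = query.lower().split()
    return [
        item_id for item_id, text in transcripts.items()
        if all(t in text.lower() for t in terms)
    ]


events = [
    {"type": "conversation.item.input_audio_transcription.delta", "item_id": "a", "delta": "Let's discuss "},
    {"type": "conversation.item.input_audio_transcription.delta", "item_id": "a", "delta": "pricing."},
    {"type": "conversation.item.input_audio_transcription.completed", "item_id": "b", "transcript": "Churn is up."},
]
library = assemble_transcripts(events)
print(search(library, "pricing"))
```

Substring matching is deliberately simple; a real library would add persistence, tokenized indexing, folders, and share links on top.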

💬 Claude Sonnet chat on every transcript

Open any transcript and ask Claude — "summarize the key decisions", "pull every action item with timestamps", "what did Sarah say about churn?" — included on every paid pack. Conversations are per-session, never stored on our server, and Anthropic doesn’t train on them. With Realtime you’d wire your own model, your own prompt, your own bill.

📨 A human who answers your email

Email contact@neugence.ai and you reach the same person who built the product — typically same-day reply, no support-tier gate. OpenAI’s Realtime API ships with docs, a forum, and an enterprise contract for ticketed support. On Whipscribe a question about your file, your bill, or a feature request goes straight to a human who can fix it.

Pricing — head to head

Workload	ChatGPT Realtime Audio	Whipscribe
1 hour of transcription / month	~$1.02 (60 min × $0.017, Realtime-Whisper)	$2.00 PAYG Starter pack, or $0 if under the daily 30-min free tier
10 hours / month	~$10.20	$4 Casual pack (600 minutes · ~$0.007/min); credits never expire
40 hours / month (active podcaster / journalist)	~$40.80 (40 × $1.02/hr)	$7 Creator pack (2,400 minutes · ~$0.003/min, ≈ 5.8× cheaper) or the $12 / 6,000-min pack (≈ 3.4× cheaper)
6,000 minutes / month (podcast network · research lab)	~$102.00 (100 hr × $1.02/hr)	$12 / 6,000-min pack (effective $0.12/hr, ≈ 8.5× cheaper)
1 hour live translation	~$2.04 (60 min × $0.034, Realtime-Translate)	$2.00 transcript, then paste into Claude / DeepL
Voice-agent app (50K interactions / month)	Token-metered: Realtime-2 at $32 / $64 per 1M tokens	Out of scope: we’re a transcription utility, not a voice agent
Try without paying	No free tier	30 min/day anonymous + 60 minutes free on signup

All numbers from public price pages on 2026-05-08. Realtime-2 voice-agent math depends on conversation length; the line above is illustrative only.
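The Whipscribe column above comes down to picking the cheapest one-time pack that covers a month's minutes. A minimal sketch of that arithmetic, using the pack list and Realtime-Whisper rate quoted on this page; it deliberately ignores the free daily minutes and any leftover credits carried into later months.

```python
# (price in USD, minutes) for each one-time pack listed on this page.
PACKS = [(2, 200), (4, 600), (7, 2_400), (12, 6_000), (29, 30_000)]
REALTIME_WHISPER_PER_MIN = 0.017  # USD per minute, as quoted above


def cheapest_pack(minutes: int) -> tuple[int, int]:
    """Return the cheapest single pack whose minutes cover the workload."""
    covering = [p for p in PACKS if p[1] >= minutes]
    if not covering:
        raise ValueError("workload exceeds the largest single pack")
    return min(covering)  # tuples compare by price first


for hours in (1, 10, 40, 100):
    minutes = hours * 60
    price, _ = cheapest_pack(minutes)
    realtime = minutes * REALTIME_WHISPER_PER_MIN
    print(f"{hours:>3} h: Realtime-Whisper ${realtime:.2f} vs pack ${price} ({realtime / price:.1f}x)")
```

At 1 hour the ratio is below 1 because the $2 minimum purchase exceeds one hour of Realtime-Whisper; the gap flips and widens as volume grows.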

When ChatGPT Realtime Audio is the right call

You’re building a voice agent that needs GPT-5 reasoning, tool calls, and barge-in. Realtime-2 is genuinely state-of-the-art here.
You need live speech-to-speech translation in 13 target languages and want a single OpenAI endpoint instead of stitching Whisper + GPT-4o + TTS.
Your stack already lives in OpenAI’s ecosystem (Realtime API, Responses API, Assistants), and a separate vendor would add friction.

When Whipscribe is the better fit

You want a transcript — TXT, SRT, VTT, DOCX, JSON — not a voice agent.
You care about privacy: legal, medical, journalism, EU-residency, or anything subject to data-export rules.
You want to try without signing up — paste a URL or drop a file and read the transcript in 30 seconds.
You need diarization out of the box, URL input, batch uploads, an MCP server, a ChatGPT Custom GPT, an Obsidian plugin, or a Mac desktop app.
You want credits that never expire: pick a pack that fits ($2 / 200-min · $4 / 600 · $7 / 2,400 · $12 / 6,000 · $29 / 30,000) and use them today, next month, or a year from now. At the $12 pack the effective rate is $0.12/hr; at the $29 pack it’s $0.058/hr — both well under the $1.02/hr Realtime-Whisper rate.

FAQ

Which one should I pick for my use case?

If you need a transcript file (TXT, SRT, DOCX) from a recording, an interview, a meeting, a podcast, or a YouTube link — Whipscribe is the right tool. Drop the file or paste the URL and read the transcript.

If you’re a developer building a live voice agent that needs GPT-5 reasoning, tool calls, and barge-in — ChatGPT Realtime-2 is the right tool. It’s an API, not a transcription product.

If you need live speech-to-speech translation across 70 → 13 languages with low latency — ChatGPT Realtime-Translate is built for that. Whipscribe transcribes; translation is a separate step.

Do I need to write code to use ChatGPT Realtime Audio?

Yes. All three models are developer APIs — you’ll need an OpenAI account, an API key, and code that opens an audio stream to api.openai.com. There is no web app and no upload form.

Whipscribe has a web app at whipscribe.com — paste a URL or drop a file and you get a transcript in seconds, no code, no account required for the first 30 minutes a day.

Can I transcribe a YouTube video or podcast URL with ChatGPT Realtime?

Not directly. ChatGPT Realtime accepts a raw audio stream — you’d need to download or capture the audio yourself and pipe it in. Whipscribe accepts a URL: paste a YouTube, Vimeo, podcast, or direct media link and the audio is fetched for you.

How much will I actually pay for typical workloads?

Realtime-Whisper is billed at $0.017 per minute, rounded to the nearest second.

1 hour / month — ChatGPT Realtime ≈ $1.02 · Whipscribe $2.00 PAYG (or $0 if under the daily 30-min free tier)
10 hours / month — ChatGPT Realtime ≈ $10.20 · Whipscribe $4 / 600-minute pack (covers it with room to spare)
40 hours / month — ChatGPT Realtime ≈ $40.80 · Whipscribe $7 / 2,400-min pack (≈ 5.8× cheaper)
6,000 minutes / month — ChatGPT Realtime ≈ $102 · Whipscribe $12 / 6,000-min pack (≈ 8.5× cheaper)

Realtime-Translate at $0.034/min (≈ $2.04/hr) and Realtime-2 voice agent ($32 / $64 per 1M tokens) bill the same way — usage-metered, no monthly cap, no free tier.

Is my audio private with each tool?

ChatGPT Realtime Audio: every audio stream is sent to OpenAI’s servers in the United States and retained per OpenAI’s data-usage policy. Default API access is not HIPAA-eligible.

Whipscribe: audio is processed on Whipscribe’s own GPU cluster using self-hosted Whisper / whisperX. Audio is never sent to OpenAI or any third-party AI provider. Recordings are not used for training. The full posture is on /security and /privacy.

Does ChatGPT Realtime work for Zoom / Google Meet / Teams transcripts?

There’s no built-in meeting bot — you’d capture the meeting audio yourself, then stream it in. Whipscribe accepts uploaded recordings (mp4, m4a, mp3, wav and many more) and offers Live Meeting Notes in beta for browser-tab capture.

Which is more accurate?

Realtime-Whisper is the same Whisper family Whipscribe runs (whisper / whisperX). On clean speech the two are very close. Differences in the final transcript come from features layered on top: speaker diarization, word-level alignment, punctuation restoration, and language detection — all included on Whipscribe, not on Realtime-Whisper.

HIPAA / SOC 2 / EU data residency — what are my options?

OpenAI offers HIPAA via their Enterprise tier; default API access is not HIPAA-eligible. EU residency requires an OpenAI Enterprise contract.

Whipscribe runs on Neugence-owned infrastructure with self-hosted models and no third-party AI on the audio path. See /security for the current posture, certification status, and how to request a DPA.

Can I use both together?

Yes — they solve different problems. A common pattern: use Whipscribe for the transcript (with diarization, timestamps, exports), then feed the text to GPT-5 / Realtime-2 to build a voice agent on top. The Whipscribe MCP server and the ChatGPT Custom GPT make that handoff one click.

Is there a free way to try either one?

ChatGPT Realtime Audio has no free tier — you pay from the first second.

Whipscribe gives every visitor 30 minutes of transcription per day with no signup, plus 60 minutes free on signup. Credits don’t expire.

Whipscribe is a managed faster-whisper + whisperX service — privacy-first, credit packs from $2 / 200 min PAYG up to $29 / 30,000 min (credits never expire), no API key to try, 99 languages, diarization included.


Cross-references: OpenAI Whisper API (older, $0.006/min) · Deepgram · AssemblyAI · all 27 tools · our security posture · our pricing.