DubPilot AI

Dub any video into 100+ languages without opening a video editor.

Built and launched as an Ascent Innovate product, live at dubpilotai.com. Users upload a video or paste a YouTube link. The platform extracts audio, transcribes and diarizes speakers, translates the transcript, regenerates voiceovers with gender-matched neural TTS, and stitches a finished MP4 back together with original background audio preserved. The whole workflow lives in one browser surface, with segment-level editing and a preview-to-unlock billing model built into the schema. We own the product, the roadmap, and the codebase, and we ship and operate it ourselves at production scale.

~7 months
Next.js 15 · React 19 · Node · MongoDB · FFmpeg · AWS · Azure · Stripe · Paddle
Launched Oct 2025

The Brief

Turn dubbing from a six-tool workflow into one browser surface.

The context

Dubbing a single video into another language normally means stitching together five or six tools: a transcription service, a translation service, a TTS vendor, a video editor to cut the new voice against the timeline, and a mixer to preserve background audio. Creators, marketers, and localization teams were spending more time managing the toolchain than producing content. And the moment a translated line read wrong, the only option was to run the whole pipeline again.

The mandate

Build an Ascent Innovate product that takes a raw video or a YouTube link and hands back a finished, dubbed MP4, with the speakers separated, the transcript editable, and a free-to-paid monetization model baked into the job itself. Ship it to production, charge for it, and use it as a proof point that we build consumer SaaS end-to-end.

The bar

Segment-level regeneration so a single bad translation does not invalidate the whole video. Gender-matched voices per speaker, not one TTS voice for everyone. Background audio preserved, not flattened. Dual billing through Stripe and Paddle so pricing works across geographies. And a preview-to-unlock flow that lets users validate output quality on a free sample before paying for the full length.

Ascent product · Consumer SaaS · Video AI · Launched Oct 2025

Languages

100+

Dubbed output across language pairs, speaker by speaker

Regeneration

Per segment

Edit one line of translation without reprocessing the whole video

Monetization

Preview → unlock

Free preview, same job unlocked to full length with no re-upload

How we shipped it

Seven months, from first line of code to a paying product.

What we did

End-to-end dubbing pipeline: upload or YouTube link → FFmpeg audio extraction → AWS Transcribe diarization → pluggable translation → Azure Neural TTS → audio merge with preserved background → final MP4 export with SRT and VTT
Speaker-aware editor: video player paired with segmented transcript, showing original text, translated text, speaker label, and per-side edit plus regenerate controls
Preview-to-unlock billing flow built into the job schema: short previews render fast, and the same job unlocks to full length using the stored S3 reference
Pluggable translation engine routed by env flag across OpenAI, Claude, Azure Translator, and AWS Translate for cost and quality tuning per language
Dual billing surface with Stripe and Paddle webhook handlers, per-second usage metering on the free tier, and a per-job billing ledger for analytics
Cron-driven ops layer: Transcribe polling, temp media cleanup, completion emails, unviewed-transcript nudges, and a daily metrics digest

Our process

Discovery & product shape

Month 1

Defined the product surface, the pricing model, and the dubbing pipeline end-to-end before writing production code. Chose Node + Express + MongoDB for the backend and Next.js 15 + React 19 for the browser surface so a single team could own both sides.

Core dubbing pipeline

Months 2-3

Built the full audio-to-dubbed-MP4 loop on FFmpeg, AWS Transcribe diarization, pluggable translation, and Azure Neural TTS. Background audio is extracted once and re-mixed under the new voices so the finished video keeps its soundscape.
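The extract-once, re-mix-later loop described above can be sketched as two FFmpeg invocations. This is a minimal illustration, not the production code: the function names and file paths are hypothetical, and it assumes a separated background track already exists as its own audio file (how that stem is produced is not covered here). Only standard FFmpeg options are used.

```typescript
// Hypothetical helpers that build argument lists for the ffmpeg CLI.
// Paths and function names are illustrative assumptions.

/** Extract a mono 16 kHz WAV from the source video, suitable for speech-to-text. */
function extractAudioArgs(videoPath: string, wavPath: string): string[] {
  return [
    "-y",            // overwrite the output file without prompting
    "-i", videoPath, // input: original video
    "-vn",           // drop the video stream
    "-ac", "1",      // downmix to one channel
    "-ar", "16000",  // resample to 16 kHz
    wavPath,
  ];
}

/** Mix the dubbed voice track with the background track and remux with the original video. */
function mixAndMuxArgs(
  videoPath: string,
  voicePath: string,
  backgroundPath: string,
  outPath: string
): string[] {
  return [
    "-y",
    "-i", videoPath,      // input 0: original video
    "-i", voicePath,      // input 1: regenerated voiceover
    "-i", backgroundPath, // input 2: preserved background audio (assumed to exist)
    "-filter_complex", "[1:a][2:a]amix=inputs=2:duration=longest[mix]",
    "-map", "0:v",        // keep the original picture
    "-map", "[mix]",      // use the mixed audio as the soundtrack
    "-c:v", "copy",       // do not re-encode video
    "-c:a", "aac",        // encode the mixed audio as AAC
    outPath,
  ];
}
```

Each array can be handed to a process spawner as the arguments to `ffmpeg`. Copying the video stream keeps the final export fast because only the audio is re-encoded.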

Segment editor & speaker workspace

Months 4-5

Shipped the speaker-aware workspace: segmented transcript next to the video player, per-segment regenerate controls, insert/split/delete, speaker reassignment. One bad line of translation is a one-segment re-render, not a full rerun.

Billing, ops, and launch

Months 6-7

Wired dual billing through Stripe and Paddle, added the preview-to-unlock schema, built the cron-driven ops layer for polling and cleanup, and launched dubpilotai.com to paying users.

Services covered

SaaS / MVP Engineering · AI & LLM Systems · Cloud & DevOps · Product Scaling

Under the hood

A stack tuned for heavy media and pluggable AI vendors.

Frontend

Next.js 15 (App Router)
React 19 + TypeScript
Tailwind CSS + MUI

API

Node.js
Express
TypeScript

Database

MongoDB
Mongoose

AI pipeline

AWS Transcribe (diarized STT)
Pluggable translation (OpenAI, Claude, Azure, AWS)
Azure Neural TTS (gender-matched voices)
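Gender-matched voice selection can be pictured as a lookup from target locale and speaker gender to a neural voice, followed by an SSML request body. The sketch below is illustrative only: the voice table is a small example subset, the function names are hypothetical, and it assumes each speaker already carries a gender label (how that label is derived is not described here).

```typescript
type Gender = "male" | "female";

// Illustrative subset of Azure neural voice names, keyed by locale then gender.
const VOICES: Record<string, Record<Gender, string>> = {
  "en-US": { male: "en-US-GuyNeural", female: "en-US-JennyNeural" },
  "es-ES": { male: "es-ES-AlvaroNeural", female: "es-ES-ElviraNeural" },
  "fr-FR": { male: "fr-FR-HenriNeural", female: "fr-FR-DeniseNeural" },
};

/** Pick a voice for a speaker; throws if the locale is not configured. */
function pickVoice(locale: string, gender: Gender): string {
  const byGender = VOICES[locale];
  if (!byGender) {
    throw new Error(`No voice configured for locale ${locale}`);
  }
  return byGender[gender];
}

/** Escape the characters that are significant in XML text and attributes. */
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

/** Build an SSML document for one translated segment. */
function buildSsml(locale: string, gender: Gender, text: string): string {
  const voice = pickVoice(locale, gender);
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${locale}">` +
    `<voice name="${voice}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}
```

Keeping the table as data means a new language pair is a table entry rather than a code change.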

Media pipeline

FFmpeg (audio extraction, mixing, final MP4 encode)
AWS S3 (source video + rendered output storage)
YouTube Data API (URL ingestion)
Deepgram + AssemblyAI (alternate STT engines)

Billing & auth

Stripe + Paddle (dual billing with webhooks)
Per-second usage metering on the free tier
Google OAuth + email verification
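Per-second metering against a free allowance, with one ledger row per job, might look like the sketch below. The field names, the rounding rule (round up to whole seconds), and the allowance figure in the usage note are assumptions for illustration, not the product's actual pricing.

```typescript
// Hypothetical shapes for a usage account and a per-job ledger entry.
interface UsageAccount {
  freeSecondsRemaining: number;
}

interface LedgerEntry {
  jobId: string;
  secondsUsed: number;        // job duration, rounded up to whole seconds
  freeSecondsApplied: number; // portion covered by the free allowance
  billableSeconds: number;    // portion that must be paid for
}

/** Meter one job and return the updated account plus its ledger entry. */
function meterJob(
  account: UsageAccount,
  jobId: string,
  durationSeconds: number
): { account: UsageAccount; entry: LedgerEntry } {
  if (!isFinite(durationSeconds) || durationSeconds < 0) {
    throw new Error("durationSeconds must be a non-negative number");
  }
  const secondsUsed = Math.ceil(durationSeconds);
  const freeSecondsApplied = Math.min(account.freeSecondsRemaining, secondsUsed);
  const billableSeconds = secondsUsed - freeSecondsApplied;
  return {
    account: { freeSecondsRemaining: account.freeSecondsRemaining - freeSecondsApplied },
    entry: { jobId, secondsUsed, freeSecondsApplied, billableSeconds },
  };
}
```

With an assumed 60-second allowance, a 45.2-second job consumes 46 free seconds and bills nothing; a following 30-second job uses the remaining 14 free seconds and bills 16.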

Ops layer

5 scheduled cron jobs
Transcribe polling + temp media cleanup
Completion emails + unviewed-transcript nudges
Daily metrics digest

Deployment pipeline

Deploy

Node.js + Express API • Next.js web app

Browser product paired with a heavy-media server pipeline

Configure

Env-scoped TranslationEngine flag • Per-environment AWS + Azure keys • Stripe + Paddle webhook endpoints

Operate

5 scheduled cron jobs • Transcribe polling + temp media cleanup • Daily metrics digest

Stack summary

Segment-level regeneration

Job model is segmented, not monolithic
Each dubbing job stores per-segment transcript, translation, speaker, and rendered audio. Editing one line triggers a single-segment re-render and a timeline re-stitch, not a full pipeline rerun. Users iterate on translation quality without paying full processing cost again.
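A minimal sketch of what a segmented job record could look like, assuming hypothetical field names and a simple `stale` flag to mark segments whose audio must be regenerated. Editing one translation invalidates only that segment's rendered audio; every other segment keeps its stored render.

```typescript
type RenderStatus = "rendered" | "stale";

// Hypothetical per-segment record: transcript, translation, speaker, rendered audio.
interface Segment {
  id: string;
  speaker: string;
  startSec: number;
  endSec: number;
  originalText: string;
  translatedText: string;
  audioKey: string | null; // storage key of the rendered audio, null until rendered
  status: RenderStatus;
}

interface DubJob {
  id: string;
  targetLanguage: string;
  segments: Segment[];
}

/** Return a copy of the job with one segment's translation replaced and marked stale. */
function editTranslation(job: DubJob, segmentId: string, newText: string): DubJob {
  const exists = job.segments.some((seg) => seg.id === segmentId);
  if (!exists) {
    throw new Error(`Unknown segment ${segmentId}`);
  }
  const segments = job.segments.map((seg): Segment => {
    if (seg.id !== segmentId) {
      return seg; // untouched segments keep their rendered audio
    }
    return { ...seg, translatedText: newText, audioKey: null, status: "stale" };
  });
  return { ...job, segments };
}

/** List the ids of segments that need a TTS re-render before the timeline is re-stitched. */
function segmentsToRender(job: DubJob): string[] {
  return job.segments
    .filter((seg) => seg.status === "stale")
    .map((seg) => seg.id);
}
```

After an edit, only the ids returned by `segmentsToRender` go back through TTS, and the re-stitch reuses the stored audio for everything else.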

Preview-to-unlock billing

Same job, two states
The preview and the full-length unlock share one job record and one S3 source reference (24h expiry). Users validate output quality on a short sample before paying, then unlock the same job to full length with no re-upload. Conversion happens inside the product.
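The two-state job can be sketched as a single record whose state flips from `preview` to `unlocked` while the source reference stays the same. The names and the 30-second preview length are hypothetical, and the sketch assumes the 24h expiry means the stored source stops being usable 24 hours after upload.

```typescript
type JobState = "preview" | "unlocked";

// Hypothetical job record shared by the preview and the full-length render.
interface JobRecord {
  id: string;
  sourceKey: string;        // storage key of the uploaded source video
  sourceUploadedAt: number; // epoch milliseconds
  state: JobState;
  renderSeconds: number;    // how much of the video this job renders
  fullSeconds: number;      // full duration of the source
}

const SOURCE_TTL_MS = 24 * 60 * 60 * 1000; // 24h source retention
const PREVIEW_SECONDS = 30;                // assumed preview length

/** Create a job in the preview state, capped at the preview length. */
function createPreviewJob(id: string, sourceKey: string, fullSeconds: number, now: number): JobRecord {
  return {
    id,
    sourceKey,
    sourceUploadedAt: now,
    state: "preview",
    renderSeconds: Math.min(PREVIEW_SECONDS, fullSeconds),
    fullSeconds,
  };
}

/** Unlock the same job to full length; no new upload, same id and source key. */
function unlockJob(job: JobRecord, paid: boolean, now: number): JobRecord {
  if (!paid) {
    throw new Error("Payment required to unlock");
  }
  if (now - job.sourceUploadedAt > SOURCE_TTL_MS) {
    throw new Error("Source reference expired; a new upload is needed");
  }
  return { ...job, state: "unlocked", renderSeconds: job.fullSeconds };
}
```

Because the unlock only changes fields on the existing record, the paid render can reuse whatever the preview already produced.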

Pluggable translation engine

Config-level translation routing
A TranslationEngine env flag routes translation per language across OpenAI, Claude, Azure Translator, and AWS Translate. Quality and cost are tuned per language pair without touching product code, and adding a new vendor is a config change.
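One way such routing could be wired, as a sketch under assumptions: the environment variable names (`TRANSLATION_ENGINE` and per-language overrides like `TRANSLATION_ENGINE_JA`), the default engine, and the adapter signatures are all hypothetical. The adapters here are synchronous stubs so the example stays self-contained; real vendor calls would be asynchronous.

```typescript
type EngineName = "openai" | "claude" | "azure" | "aws";
type Translator = (text: string, targetLang: string) => string;
type Env = Record<string, string | undefined>;

// Stub adapters standing in for real vendor clients.
const ENGINES: Record<EngineName, Translator> = {
  openai: (text, lang) => `[openai:${lang}] ${text}`,
  claude: (text, lang) => `[claude:${lang}] ${text}`,
  azure: (text, lang) => `[azure:${lang}] ${text}`,
  aws: (text, lang) => `[aws:${lang}] ${text}`,
};

const DEFAULT_ENGINE: EngineName = "openai"; // assumed default

function isEngineName(value: string): value is EngineName {
  return value === "openai" || value === "claude" || value === "azure" || value === "aws";
}

/** Resolve the engine: per-language override first, then the global flag, then the default. */
function resolveEngine(env: Env, targetLang: string): EngineName {
  const langKey = "TRANSLATION_ENGINE_" + targetLang.toUpperCase().replace(/-/g, "_");
  const raw = env[langKey] || env["TRANSLATION_ENGINE"];
  if (!raw) {
    return DEFAULT_ENGINE;
  }
  const name = raw.toLowerCase();
  if (!isEngineName(name)) {
    throw new Error(`Unknown translation engine: ${raw}`);
  }
  return name;
}

/** Translate one segment with whichever engine the environment selects. */
function translateSegment(env: Env, text: string, targetLang: string): string {
  return ENGINES[resolveEngine(env, targetLang)](text, targetLang);
}
```

Passing the environment in as an argument, rather than reading a global, keeps the routing easy to test and makes the per-environment configuration explicit.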

Key integrations

AWS Transcribe · Azure Neural TTS · OpenAI · Claude · Stripe · Paddle · YouTube Data API · FFmpeg

Built-in product surfaces

Upload + YouTube URL ingestion · Segmented transcript editor · Per-segment regenerate · Gender-matched voices per speaker · Background audio preserved · MP4 + SRT + VTT export
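For the subtitle side of the export, the segment timeline maps directly onto SRT and WebVTT cues. The sketch below shows the two formats side by side; the `Cue` shape and function names are hypothetical, and it covers only plain cues without styling or positioning.

```typescript
// Hypothetical cue shape derived from a translated segment.
interface Cue {
  startSec: number;
  endSec: number;
  text: string;
}

/** Left-pad a non-negative integer with zeros to the given width. */
function pad(value: number, width: number): string {
  let s = String(value);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

/** Format seconds as HH:MM:SS<sep>mmm; SRT uses "," and WebVTT uses ".". */
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${msSeparator}${pad(ms, 3)}`;
}

/** Serialize cues as SRT: numbered blocks separated by blank lines. */
function toSrt(cues: Cue[]): string {
  return cues
    .map((cue, i) =>
      `${i + 1}\n${formatTimestamp(cue.startSec, ",")} --> ${formatTimestamp(cue.endSec, ",")}\n${cue.text}\n`
    )
    .join("\n");
}

/** Serialize cues as WebVTT: a WEBVTT header followed by unnumbered cue blocks. */
function toVtt(cues: Cue[]): string {
  const body = cues
    .map((cue) =>
      `${formatTimestamp(cue.startSec, ".")} --> ${formatTimestamp(cue.endSec, ".")}\n${cue.text}\n`
    )
    .join("\n");
  return `WEBVTT\n\n${body}`;
}
```

For example, `formatTimestamp(3661.5, ",")` yields `01:01:01,500`. Since both files are generated from the same cues, an edited segment only needs its cue text replaced before re-export.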

Outcome

A live consumer SaaS at dubpilotai.com, owned by Ascent Innovate.

Shipped a production consumer product at dubpilotai.com over ~7 months, from first commit to paying users.

Replaced a six-tool dubbing workflow with one browser surface that hands back a finished MP4 in 100+ languages.

Built the preview-to-unlock flow so free-to-paid conversion happens inside the product, on the same job record.

Launched Oct 2025 · Ascent Innovate product · Consumer SaaS · Video AI · 100+ languages

Feature highlights

Upload a video or paste a YouTube URL, get back a dubbed MP4 with preserved background audio
AWS Transcribe diarization with gender-matched Azure Neural TTS voices per speaker
Segment-level transcript editor with per-side edit, insert, split, delete, and speaker reassignment
Per-segment regeneration without reprocessing the rest of the video
Pluggable translation engine across OpenAI, Claude, Azure Translator, and AWS Translate
Dual billing through Stripe and Paddle with preview-to-unlock monetization on the same job
Cron-driven ops layer covering Transcribe polling, cleanup, completion emails, and a daily metrics digest

Innovations

Per-segment regeneration

The job model is segmented at the timeline. Editing a single translated line triggers a one-segment re-render and a timeline re-stitch. Users iterate on translation quality one line at a time, at one-segment processing cost.

Preview-to-unlock monetization

Preview and full unlock share one job record and one S3 source reference. Users validate output quality on a free sample, then unlock the same job to full length with no re-upload. The conversion path lives inside the product surface.

Pluggable translation engine

A TranslationEngine env flag routes translation across OpenAI, Claude, Azure Translator, and AWS Translate. Quality and cost are tuned per language pair, and adding a new vendor is a config change rather than a refactor.

Why it matters

DubPilot AI is proof that Ascent Innovate ships consumer SaaS end-to-end. We own the product, the billing, the infrastructure, and the roadmap.
The segmented job model turns translation quality from a batch problem into an iterative one. Teams can refine a dub the way writers refine a draft, one line at a time.
The preview-to-unlock flow keeps the free-to-paid conversion inside the product surface. There's no second funnel to maintain and no re-upload friction between a trial user and a paying one.

Ascent Innovate product

Built, launched, and operated by Ascent Innovate at dubpilotai.com
The same engineering approach we ship to clients, applied to our own consumer product
A live reference implementation for heavy-media AI SaaS at production scale

Who this is for

Product founders building a consumer AI SaaS and looking for a partner who has shipped one themselves.
Platform teams who need heavy-media AI pipelines (transcription, translation, TTS, video stitching) running end-to-end in production.
Localization and creator-economy products where segment-level control and preview-to-unlock billing move the conversion number.