DubPilot AI

Dub any video into 100+ languages without opening a video editor.

Built and launched as an Ascent Innovate product, live at dubpilotai.com. Users upload a video or paste a YouTube link. The platform extracts audio, transcribes and diarizes speakers, translates the transcript, regenerates voiceovers with gender-matched neural TTS, and stitches a finished MP4 back together with original background audio preserved. The whole workflow lives in one browser surface, with segment-level editing and a preview-to-unlock billing model built into the schema. We own the product, the roadmap, and the codebase, and we ship and operate it ourselves at production scale.

~7 months · Next.js 15 · React 19 · Node · MongoDB · FFmpeg · AWS · Azure · Stripe · Paddle · Launched Oct 2025
The Brief

Turn dubbing from a six-tool workflow into one browser surface.

The context

Dubbing a single video into another language normally means stitching together five or six tools: a transcription service, a translation service, a TTS vendor, a video editor to cut the new voice against the timeline, and a mixer to preserve background audio. Creators, marketers, and localization teams were spending more time managing the toolchain than producing content. And the moment a translated line read wrong, the only option was to run the whole pipeline again.

The mandate

Build an Ascent Innovate product that takes a raw video or a YouTube link and hands back a finished, dubbed MP4, with the speakers separated, the transcript editable, and a free-to-paid monetization model baked into the job itself. Ship it to production, charge for it, and use it as a proof point that we build consumer SaaS end-to-end.

The bar

Segment-level regeneration so a single bad translation does not invalidate the whole video. Gender-matched voices per speaker, not one TTS voice for everyone. Background audio preserved, not flattened. Dual billing through Stripe and Paddle so pricing works across geographies. And a preview-to-unlock flow that lets users validate output quality on a free sample before paying for the full length.

Ascent product · Consumer SaaS · Video AI · Launched Oct 2025
Languages
100+
Dubbed output across language pairs, speaker by speaker
Regeneration
Per segment
Edit one line of translation without reprocessing the whole video
Monetization
Preview → unlock
Free preview, same job unlocked to full length with no re-upload
How we shipped it

Seven months, from discovery to a paying product.

What we did

  • End-to-end dubbing pipeline: upload or YouTube link → FFmpeg audio extraction → AWS Transcribe diarization → pluggable translation → Azure Neural TTS → audio merge with preserved background → final MP4 export with SRT and VTT
  • Speaker-aware editor: video player paired with segmented transcript, showing original text, translated text, speaker label, and per-side edit plus regenerate controls
  • Preview-to-unlock billing flow built into the job schema: short previews render fast, the same job unlocks to full length using the stored S3 reference
  • Pluggable translation engine routed by env flag across OpenAI, Claude, Azure Translator, and AWS Translate for cost and quality tuning per language
  • Dual billing surface with Stripe and Paddle webhook handlers, per-second usage metering on the free tier, and a per-job billing ledger for analytics
  • Cron-driven ops layer: Transcribe polling, temp media cleanup, completion emails, unviewed-transcript nudges, and a daily metrics digest
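
The Transcribe polling job in the ops layer can be pictured as a small decision function that runs on every cron tick. This is a minimal sketch, not the production code: the `PendingJob` shape, the `planTick` and `decide` names, and the one-hour timeout are assumptions for illustration. The four status values are the ones AWS Transcribe reports for a transcription job.

```typescript
// Statuses AWS Transcribe reports for a transcription job.
type TranscribeStatus = "QUEUED" | "IN_PROGRESS" | "COMPLETED" | "FAILED";

// Hypothetical shape of a dubbing job that is waiting on transcription.
interface PendingJob {
  jobId: string;
  transcriptionJobName: string;
  submittedAt: number; // epoch milliseconds
}

type PollAction =
  | { kind: "wait"; jobId: string }
  | { kind: "advance"; jobId: string } // hand the job to the translation step
  | { kind: "fail"; jobId: string; reason: string };

// Assumed timeout; the real value is not stated in this case study.
const MAX_WAIT_MS = 60 * 60 * 1000;

// Decide what to do with one job given its latest Transcribe status.
function decide(job: PendingJob, status: TranscribeStatus, now: number): PollAction {
  if (status === "COMPLETED") return { kind: "advance", jobId: job.jobId };
  if (status === "FAILED") {
    return { kind: "fail", jobId: job.jobId, reason: "transcription failed" };
  }
  if (now - job.submittedAt > MAX_WAIT_MS) {
    return { kind: "fail", jobId: job.jobId, reason: "transcription timed out" };
  }
  return { kind: "wait", jobId: job.jobId };
}

// One cron tick. `statuses` maps transcription job names to the status the
// caller fetched from AWS beforehand; jobs with no status yet keep waiting.
function planTick(
  pending: PendingJob[],
  statuses: Record<string, TranscribeStatus | undefined>,
  now: number
): PollAction[] {
  const actions: PollAction[] = [];
  for (const job of pending) {
    const status = statuses[job.transcriptionJobName];
    actions.push(status ? decide(job, status, now) : { kind: "wait", jobId: job.jobId });
  }
  return actions;
}
```

Keeping the decision logic pure, with the AWS lookup done by the caller, makes the tick easy to test without network access.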

Our process

Discovery & product shape
Month 1

Defined the product surface, the pricing model, and the dubbing pipeline end-to-end before writing production code. Chose Node + Express + MongoDB for the backend and Next.js 15 + React 19 for the browser surface so a single team could own both sides.

Core dubbing pipeline
Months 2-3

Built the full audio-to-dubbed-MP4 loop on FFmpeg, AWS Transcribe diarization, pluggable translation, and Azure Neural TTS. Background audio is extracted once and re-mixed under the new voices so the finished video keeps its soundscape.
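
As an illustration of the two FFmpeg steps at either end of that loop, the sketch below builds argument lists for audio extraction and for the final mix. The function names, the 16 kHz mono extraction format, the default background volume, and the choice to copy the video stream instead of re-encoding it are assumptions. It also assumes a separate background track already exists on disk; how the background is isolated from speech is outside this sketch.

```typescript
// Arguments for extracting a mono 16 kHz PCM WAV track from the source video.
// -vn drops the video stream; -ac and -ar set channel count and sample rate.
function extractAudioArgs(videoPath: string, wavPath: string): string[] {
  return ["-y", "-i", videoPath, "-vn", "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", wavPath];
}

// Arguments for mixing the dubbed voice track over the background track and
// muxing the result with the original video stream.
// Inputs: 0 = source video, 1 = dubbed voice, 2 = background audio.
function mixAndMuxArgs(
  videoPath: string,
  voicePath: string,
  backgroundPath: string,
  outPath: string,
  backgroundVolume: number = 0.8 // assumed default, tune per video
): string[] {
  // Note: amix scales its inputs by default, so levels may need tuning.
  const filter =
    `[2:a]volume=${backgroundVolume}[bg];` +
    `[1:a][bg]amix=inputs=2:duration=longest[mix]`;
  return [
    "-y",
    "-i", videoPath,
    "-i", voicePath,
    "-i", backgroundPath,
    "-filter_complex", filter,
    "-map", "0:v",      // keep the original picture
    "-map", "[mix]",    // use the mixed audio
    "-c:v", "copy",     // assumption: no video re-encode needed
    "-c:a", "aac",
    outPath,
  ];
}
```

In a Node service these arrays would typically be passed to `child_process.spawn("ffmpeg", args)`.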

Segment editor & speaker workspace
Months 4-5

Shipped the speaker-aware workspace: segmented transcript next to the video player, per-segment regenerate controls, insert/split/delete, speaker reassignment. One bad line of translation is a one-segment re-render, not a full rerun.

Billing, ops, and launch
Months 6-7

Wired dual billing through Stripe and Paddle, added the preview-to-unlock schema, built the cron-driven ops layer for polling and cleanup, and launched dubpilotai.com to paying users.
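
Per-second metering with a per-job ledger could look roughly like the sketch below. The `UsageAccount` and `LedgerEntry` shapes, the rounding rule, and the split between free and paid seconds are assumptions for illustration, not the production schema.

```typescript
// Hypothetical per-user usage record.
interface UsageAccount {
  userId: string;
  freeSecondsRemaining: number;
}

// Hypothetical per-job ledger row, kept for billing analytics.
interface LedgerEntry {
  jobId: string;
  userId: string;
  secondsBilled: number;
  source: "free_tier" | "paid";
}

// Charge one job against an account. Free seconds are consumed first and
// any remainder is recorded as paid usage. Returns a new account object
// and the ledger rows to persist; the inputs are not mutated.
function meterJob(
  account: UsageAccount,
  jobId: string,
  durationSeconds: number
): { account: UsageAccount; entries: LedgerEntry[] } {
  // Assumption: partial seconds round up to the next whole second.
  const billable = Math.max(0, Math.ceil(durationSeconds));
  const fromFree = Math.min(account.freeSecondsRemaining, billable);
  const fromPaid = billable - fromFree;

  const entries: LedgerEntry[] = [];
  if (fromFree > 0) {
    entries.push({ jobId, userId: account.userId, secondsBilled: fromFree, source: "free_tier" });
  }
  if (fromPaid > 0) {
    entries.push({ jobId, userId: account.userId, secondsBilled: fromPaid, source: "paid" });
  }
  return {
    account: { userId: account.userId, freeSecondsRemaining: account.freeSecondsRemaining - fromFree },
    entries,
  };
}
```

For example, a 90.2-second job against an account with 60 free seconds left produces one 60-second `free_tier` row and one 31-second `paid` row.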

Services covered

SaaS / MVP Engineering · AI & LLM Systems · Cloud & DevOps · Product Scaling
Under the hood

A stack tuned for heavy media and pluggable AI vendors.

Frontend
  • Next.js 15 (App Router)
  • React 19 + TypeScript
  • Tailwind CSS + MUI
API
  • Node.js
  • Express
  • TypeScript
Database
  • MongoDB
  • Mongoose
AI pipeline
  • AWS Transcribe (diarized STT)
  • Pluggable translation (OpenAI, Claude, Azure, AWS)
  • Azure Neural TTS (gender-matched voices)
Media pipeline
  • FFmpeg (audio extraction, mixing, final MP4 encode)
  • AWS S3 (source video + rendered output storage)
  • YouTube Data API (URL ingestion)
  • Deepgram + AssemblyAI (alternate STT engines)
Billing & auth
  • Stripe + Paddle (dual billing with webhooks)
  • Per-second usage metering on the free tier
  • Google OAuth + email verification
Ops layer
  • 5 scheduled cron jobs
  • Transcribe polling + temp media cleanup
  • Completion emails + unviewed-transcript nudges
  • Daily metrics digest

Deployment pipeline

Deploy
Node.js + Express API • Next.js web app
Browser product paired with a heavy-media server pipeline
Configure
Env-scoped TranslationEngine flag • Per-environment AWS + Azure keys • Stripe + Paddle webhook endpoints
Operate
5 scheduled cron jobs • Transcribe polling + temp media cleanup • Daily metrics digest

Stack summary

Segment-level regeneration
  • Job model is segmented, not monolithic
    Each dubbing job stores per-segment transcript, translation, speaker, and rendered audio. Editing one line triggers a single-segment re-render and a timeline re-stitch, not a full pipeline rerun. Users iterate on translation quality without paying full processing cost again.
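
A segmented job of this kind can be sketched as follows. The field names and the convention that a `null` audio key marks a segment as needing a re-render are assumptions for illustration; the actual Mongoose schema is not shown in this case study.

```typescript
// Hypothetical per-segment record.
interface DubSegment {
  id: string;
  speaker: string;
  startMs: number;
  endMs: number;
  originalText: string;
  translatedText: string;
  audioKey: string | null; // null means "needs a TTS render"
}

interface DubJob {
  id: string;
  segments: DubSegment[];
}

// Editing one translated line invalidates only that segment's audio.
function editTranslation(job: DubJob, segmentId: string, text: string): DubJob {
  return {
    id: job.id,
    segments: job.segments.map((s) =>
      s.id === segmentId ? { ...s, translatedText: text, audioKey: null } : s
    ),
  };
}

// The re-render queue is just the segments without audio.
function segmentsToRender(job: DubJob): string[] {
  return job.segments.filter((s) => s.audioKey === null).map((s) => s.id);
}

// Record freshly rendered audio for one segment.
function attachAudio(job: DubJob, segmentId: string, audioKey: string): DubJob {
  return {
    id: job.id,
    segments: job.segments.map((s) => (s.id === segmentId ? { ...s, audioKey } : s)),
  };
}

// The re-stitch places every segment's audio at its timeline offset.
// It refuses to run while any segment is still unrendered.
function stitchPlan(job: DubJob): { audioKey: string; offsetMs: number }[] {
  const ordered = job.segments.slice().sort((a, b) => a.startMs - b.startMs);
  return ordered.map((s) => {
    if (s.audioKey === null) throw new Error(`segment ${s.id} has no rendered audio`);
    return { audioKey: s.audioKey, offsetMs: s.startMs };
  });
}
```

After an edit, only the ids returned by `segmentsToRender` go back through TTS; every other segment reuses its stored audio in the stitch.
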
Preview-to-unlock billing
  • Same job, two states
    The preview and the full-length unlock share one job record and one S3 source reference (24h expiry). Users validate output quality on a short sample before paying, then unlock the same job to full length with no re-upload. Conversion happens inside the product.
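
The two-state job can be illustrated with a small state transition. This is a sketch under assumptions: the field names, the 30-second preview length, and the behavior when the stored source has expired are hypothetical. The 24-hour lifetime of the source reference is taken from the description above.

```typescript
const SOURCE_TTL_MS = 24 * 60 * 60 * 1000; // 24h expiry on the stored source
const PREVIEW_SECONDS = 30;                // assumed preview length

// Hypothetical job record shared by the preview and the full-length render.
interface DubJobRecord {
  id: string;
  sourceKey: string;        // S3 key of the uploaded source video
  sourceExpiresAt: number;  // epoch milliseconds
  durationSeconds: number;  // full length of the source
  state: "preview" | "unlocked";
  renderSeconds: number;    // how much of the video this job may render
}

function createPreview(
  id: string,
  sourceKey: string,
  durationSeconds: number,
  now: number
): DubJobRecord {
  return {
    id,
    sourceKey,
    sourceExpiresAt: now + SOURCE_TTL_MS,
    durationSeconds,
    state: "preview",
    renderSeconds: Math.min(PREVIEW_SECONDS, durationSeconds),
  };
}

type UnlockResult =
  | { ok: true; job: DubJobRecord }
  | { ok: false; reason: "payment_required" | "source_expired" };

// Unlock keeps the same id and sourceKey, so no re-upload is needed.
function unlock(job: DubJobRecord, paymentConfirmed: boolean, now: number): UnlockResult {
  if (!paymentConfirmed) return { ok: false, reason: "payment_required" };
  if (now > job.sourceExpiresAt) return { ok: false, reason: "source_expired" };
  return { ok: true, job: { ...job, state: "unlocked", renderSeconds: job.durationSeconds } };
}
```

Because the unlocked record is the preview record with two fields changed, any edits made during the preview carry over to the full-length render.
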
Pluggable translation engine
  • Config-level translation routing
    A TranslationEngine env flag routes translation per language across OpenAI, Claude, Azure Translator, and AWS Translate. Quality and cost are tuned per language pair without touching product code, and adding a new vendor is a config change.
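
One way such a flag could be parsed and applied is sketched below. The flag syntax (`azure, ja=claude, de=openai`), the `azure` default, and the function names are assumptions for illustration; the case study does not document the real format.

```typescript
const ENGINES = ["openai", "claude", "azure", "aws"] as const;
type EngineName = (typeof ENGINES)[number];

function isEngine(value: string): value is EngineName {
  return (ENGINES as readonly string[]).indexOf(value) !== -1;
}

interface Routing {
  fallback: EngineName;                      // used when no override matches
  byLanguage: Record<string, EngineName>;    // keyed by lowercase language code
}

// Parse a hypothetical flag value such as "azure, ja=claude, de=openai".
// A bare engine name sets the fallback; "lang=engine" adds an override.
function parseRouting(flag: string | undefined): Routing {
  const routing: Routing = { fallback: "azure", byLanguage: {} }; // assumed default
  if (!flag) return routing;
  for (const part of flag.split(",")) {
    const pieces = part.split("=").map((p) => p.trim().toLowerCase());
    if (pieces.length === 1 && isEngine(pieces[0])) {
      routing.fallback = pieces[0];
    } else if (pieces.length === 2 && pieces[0] !== "" && isEngine(pieces[1])) {
      routing.byLanguage[pieces[0]] = pieces[1];
    } else {
      throw new Error(`invalid TranslationEngine entry: "${part.trim()}"`);
    }
  }
  return routing;
}

// Pick the engine for a target language.
function pickEngine(routing: Routing, targetLang: string): EngineName {
  return routing.byLanguage[targetLang.toLowerCase()] || routing.fallback;
}
```

A caller would read the flag once at startup, for example `parseRouting(process.env.TranslationEngine)`, and call `pickEngine` per job. Failing fast on an unknown engine name keeps a typo in the environment from silently routing traffic to the fallback.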

Key integrations

AWS Transcribe · Azure Neural TTS · OpenAI · Claude · Stripe · Paddle · YouTube Data API · FFmpeg

Built-in product surfaces

Upload + YouTube URL ingestion · Segmented transcript editor · Per-segment regenerate · Gender-matched voices per speaker · Background audio preserved · MP4 + SRT + VTT export
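
For the SRT and VTT export, the subtitle files can be generated directly from segment timings. The sketch below shows the two timestamp formats; the `Cue` shape and function names are assumptions for illustration.

```typescript
// Hypothetical subtitle cue derived from a translated segment.
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Left-pad a number with zeros to a fixed width.
function pad(value: number, width: number): string {
  let s = String(value);
  while (s.length < width) s = "0" + s;
  return s;
}

// SRT uses "HH:MM:SS,mmm"; WebVTT uses "HH:MM:SS.mmm".
function formatTimestamp(ms: number, separator: "," | "."): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  return pad(hours, 2) + ":" + pad(minutes, 2) + ":" + pad(seconds, 2) + separator + pad(millis, 3);
}

// SRT: numbered cues separated by blank lines.
function toSrt(cues: Cue[]): string {
  return cues
    .map((cue, i) =>
      `${i + 1}\n${formatTimestamp(cue.startMs, ",")} --> ${formatTimestamp(cue.endMs, ",")}\n${cue.text}\n`
    )
    .join("\n");
}

// WebVTT: a "WEBVTT" header line, then cues separated by blank lines.
function toVtt(cues: Cue[]): string {
  const body = cues
    .map((cue) =>
      `${formatTimestamp(cue.startMs, ".")} --> ${formatTimestamp(cue.endMs, ".")}\n${cue.text}\n`
    )
    .join("\n");
  return "WEBVTT\n\n" + body;
}
```
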
Outcome

A live consumer SaaS at dubpilotai.com, owned by Ascent Innovate.

Shipped a production consumer product at dubpilotai.com over ~7 months, from first commit to paying users.

Replaced a six-tool dubbing workflow with one browser surface that hands back a finished MP4 in 100+ languages.

Built the preview-to-unlock flow so free-to-paid conversion happens inside the product, on the same job record.

Launched Oct 2025 · Ascent Innovate product · Consumer SaaS · Video AI · 100+ languages
Feature highlights
  • Upload a video or paste a YouTube URL, get back a dubbed MP4 with preserved background audio
  • AWS Transcribe diarization with gender-matched Azure Neural TTS voices per speaker
  • Segment-level transcript editor with per-side edit, insert, split, delete, and speaker reassignment
  • Per-segment regeneration without reprocessing the rest of the video
  • Pluggable translation engine across OpenAI, Claude, Azure Translator, and AWS Translate
  • Dual billing through Stripe and Paddle with preview-to-unlock monetization on the same job
  • Cron-driven ops layer covering Transcribe polling, cleanup, completion emails, and a daily metrics digest
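
Gender-matched voice assignment, from the highlights above, can be sketched as a lookup from diarized speaker labels to TTS voices. The sketch assumes each speaker already carries a gender label from an upstream step, which this case study does not describe. The voice catalog is a small hand-picked sample of Azure Neural voice names, and `assignVoices` is a hypothetical helper.

```typescript
type Gender = "female" | "male";

interface Voice {
  name: string;
  locale: string;
  gender: Gender;
}

// Small sample catalog; a real deployment would cover every target locale.
const VOICES: Voice[] = [
  { name: "en-US-JennyNeural", locale: "en-US", gender: "female" },
  { name: "en-US-AriaNeural", locale: "en-US", gender: "female" },
  { name: "en-US-GuyNeural", locale: "en-US", gender: "male" },
  { name: "es-ES-ElviraNeural", locale: "es-ES", gender: "female" },
  { name: "es-ES-AlvaroNeural", locale: "es-ES", gender: "male" },
];

// Hypothetical speaker record; labels follow the "spk_0" style that
// AWS Transcribe uses for diarized speakers.
interface Speaker {
  label: string;
  gender: Gender;
}

// Assign one voice per speaker, matching gender and rotating through the
// available voices so two speakers of the same gender differ when possible.
function assignVoices(
  speakers: Speaker[],
  locale: string,
  catalog: Voice[] = VOICES
): Record<string, string> {
  const used: Record<Gender, number> = { female: 0, male: 0 };
  const result: Record<string, string> = {};
  for (const speaker of speakers) {
    const candidates = catalog.filter(
      (v) => v.locale === locale && v.gender === speaker.gender
    );
    if (candidates.length === 0) {
      throw new Error(`no ${speaker.gender} voice available for ${locale}`);
    }
    result[speaker.label] = candidates[used[speaker.gender] % candidates.length].name;
    used[speaker.gender] += 1;
  }
  return result;
}
```

With this catalog, three en-US speakers labeled female, male, female map to Jenny, Guy, and Aria respectively.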

Innovations

Per-segment regeneration

The job model is segmented at the timeline. Editing a single translated line triggers a one-segment re-render and a timeline re-stitch. Users iterate on translation quality one line at a time, at one-segment processing cost.

Preview-to-unlock monetization

Preview and full unlock share one job record and one S3 source reference. Users validate output quality on a free sample, then unlock the same job to full length with no re-upload. The conversion path lives inside the product surface.

Pluggable translation engine

A TranslationEngine env flag routes translation across OpenAI, Claude, Azure Translator, and AWS Translate. Quality and cost are tuned per language pair, and adding a new vendor is a config change rather than a refactor.

Why it matters
  • DubPilot AI is proof that Ascent Innovate ships consumer SaaS end-to-end. We own the product, the billing, the infrastructure, and the roadmap.
  • The segmented job model turns translation quality from a batch problem into an iterative one. Teams can refine a dub the way writers refine a draft, one line at a time.
  • The preview-to-unlock flow keeps the free-to-paid conversion inside the product surface. There's no second funnel to maintain and no re-upload friction between a trial user and a paying one.
Ascent Innovate product
  • Built, launched, and operated by Ascent Innovate at dubpilotai.com
  • The same codebase that ships to clients, applied to our own consumer product
  • A live reference implementation for heavy-media AI SaaS at production scale
Who this is for
  • Product founders building a consumer AI SaaS and looking for a partner who has shipped one themselves.
  • Platform teams who need heavy-media AI pipelines (transcription, translation, TTS, video stitching) running end-to-end in production.
  • Localization and creator-economy products where segment-level control and preview-to-unlock billing move the conversion number.