Dub any video into 100+ languages without opening a video editor.
Built and launched as an Ascent Innovate product, live at dubpilotai.com. Users upload a video or paste a YouTube link. The platform extracts audio, transcribes and diarizes speakers, translates the transcript, regenerates voiceovers with gender-matched neural TTS, and stitches a finished MP4 back together with original background audio preserved. The whole workflow lives in one browser surface, with segment-level editing and a preview-to-unlock billing model built into the schema. We own the product, the roadmap, and the codebase, and we ship and operate it ourselves at production scale.
Dubbing a single video into another language normally means stitching together five or six tools: a transcription service, a translation service, a TTS vendor, a video editor to cut the new voice against the timeline, and a mixer to preserve background audio. Creators, marketers, and localization teams were spending more time managing the toolchain than producing content. And the moment a translated line read wrong, the only option was to run the whole pipeline again.
Build an Ascent Innovate product that takes a raw video or a YouTube link and hands back a finished, dubbed MP4, with the speakers separated, the transcript editable, and a free-to-paid monetization model baked into the job itself. Ship it to production, charge for it, and use it as a proof point that we build consumer SaaS end-to-end.
Segment-level regeneration so a single bad translation does not invalidate the whole video. Gender-matched voices per speaker, not one TTS voice for everyone. Background audio preserved, not flattened. Dual billing through Stripe and Paddle so pricing works across geographies. And a preview-to-unlock flow that lets users validate output quality on a free sample before paying for the full length.
Defined the product surface, the pricing model, and the dubbing pipeline end-to-end before writing production code. Chose Node + Express + MongoDB for the backend and Next.js 15 + React 19 for the browser surface so a single team could own both sides.
Built the full audio-to-dubbed-MP4 loop on FFmpeg, AWS Transcribe diarization, pluggable translation, and Azure Neural TTS. Background audio is extracted once and re-mixed under the new voices so the finished video keeps its soundscape.
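The two FFmpeg passes described above — pulling audio out for transcription, then mixing the generated voices back over the background bed — can be sketched as argument builders. This is an illustrative sketch only: the function names, filter settings, and sample rates are assumptions, not the production implementation.

```typescript
/** Hypothetical args to strip the audio track out of the source video
 *  for the transcription/diarization step. */
export function extractAudioArgs(videoPath: string, audioOut: string): string[] {
  return [
    "-i", videoPath,
    "-vn",                    // drop the video stream
    "-acodec", "pcm_s16le",   // uncompressed PCM for speech-to-text input
    "-ar", "16000",           // 16 kHz mono suits most STT services
    "-ac", "1",
    audioOut,
  ];
}

/** Hypothetical args to mix the generated voiceover over the preserved
 *  background audio, leaving the video stream untouched. */
export function remixArgs(
  videoPath: string,
  voicePath: string,
  bgPath: string,
  outPath: string
): string[] {
  return [
    "-i", videoPath,
    "-i", voicePath,
    "-i", bgPath,
    // amix blends the new voice (input 1) with the background bed (input 2)
    "-filter_complex", "[1:a][2:a]amix=inputs=2:duration=longest[mixed]",
    "-map", "0:v",            // keep the original video stream
    "-map", "[mixed]",
    "-c:v", "copy",           // no video re-encode; only audio is rebuilt
    outPath,
  ];
}
```

The key detail the prose calls out is that the background bed is a separate input to the final mix rather than being discarded, which is why the finished video keeps its soundscape.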
Shipped the speaker-aware workspace: segmented transcript next to the video player, per-segment regenerate controls, insert/split/delete, speaker reassignment. One bad line of translation is a one-segment re-render, not a full rerun.
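The segment operations above (insert/split/delete, speaker reassignment) imply a timeline-keyed segment model. A minimal sketch of two of those operations follows; the `Segment` shape and all field names are assumptions for illustration, not the actual schema.

```typescript
// Assumed shape of one transcript segment in the editor.
interface Segment {
  id: string;
  speaker: string;   // diarized speaker label, e.g. "spk_0"
  start: number;     // seconds on the source timeline
  end: number;
  text: string;      // translated line shown next to the player
}

/** Split one segment at a timeline point; both halves keep the speaker. */
function splitSegment(segments: Segment[], id: string, at: number): Segment[] {
  return segments.flatMap((s) => {
    if (s.id !== id || at <= s.start || at >= s.end) return [s];
    return [
      { ...s, id: `${s.id}a`, end: at },
      { ...s, id: `${s.id}b`, start: at, text: "" }, // second half starts empty for re-entry
    ];
  });
}

/** Reassign a segment to another speaker so it picks up that speaker's voice. */
function reassignSpeaker(segments: Segment[], id: string, speaker: string): Segment[] {
  return segments.map((s) => (s.id === id ? { ...s, speaker } : s));
}
```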
Wired dual billing through Stripe and Paddle, added the preview-to-unlock schema, built the cron-driven ops layer for polling and cleanup, and launched dubpilotai.com to paying users.
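The cron-driven polling and cleanup mentioned above can be reduced to a predicate over job records: find work that has stopped making progress and sweep it. A hedged sketch follows — the job states, field names, and the 30-minute threshold are all assumptions, not the production values.

```typescript
// Assumed shape of a dubbing job as the ops sweeper sees it.
interface JobRecord {
  id: string;
  status: "queued" | "processing" | "done" | "failed";
  updatedAt: number; // epoch ms of the last state transition
}

// Illustrative threshold: treat 30 minutes with no progress as stuck.
const STALE_AFTER_MS = 30 * 60 * 1000;

/** Pick the jobs a periodic cleanup pass should fail out and reclaim. */
function findStaleJobs(jobs: JobRecord[], now: number): JobRecord[] {
  return jobs.filter(
    (j) => j.status === "processing" && now - j.updatedAt > STALE_AFTER_MS
  );
}
```

A scheduler (cron, or a library wrapper over it) would run this every few minutes, mark the returned jobs failed, and delete their intermediate artifacts.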
Shipped a production consumer product at dubpilotai.com over ~7 months, from first commit to paying users.
Replaced a six-tool dubbing workflow with one browser surface that hands back a finished MP4 in 100+ languages.
Built the preview-to-unlock flow so free-to-paid conversion happens inside the product, on the same job record.
The job model is segmented along the timeline. Editing a single translated line triggers a one-segment re-render and a timeline re-stitch, so users iterate on translation quality one line at a time, at one-segment processing cost.
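The one-segment economics above can be sketched as a regenerate function that sends only the edited segment back through TTS and leaves the rest of the timeline untouched. Everything here — the `DubSegment` shape, the `TtsFn` signature, the S3 key convention — is an assumption for illustration.

```typescript
// Assumed per-segment record on the rendered timeline.
interface DubSegment {
  id: string;
  start: number;
  end: number;
  text: string;      // translated line
  audioKey: string;  // storage key of this segment's rendered clip
}

// Assumed TTS interface: text in, storage key of the new clip out.
type TtsFn = (text: string) => Promise<string>;

/** Regenerate exactly one segment's audio; all other clips are reused as-is.
 *  The caller then re-stitches the timeline from the returned segment list. */
async function regenerateSegment(
  timeline: DubSegment[],
  id: string,
  newText: string,
  tts: TtsFn
): Promise<DubSegment[]> {
  return Promise.all(
    timeline.map(async (s) =>
      s.id === id ? { ...s, text: newText, audioKey: await tts(newText) } : s
    )
  );
}
```

The point the sketch makes concrete: a one-line edit costs one TTS call, not one call per segment.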
Preview and full unlock share one job record and one S3 source reference. Users validate output quality on a free sample, then unlock the same job to full length with no re-upload. The conversion path lives inside the product surface.
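A minimal sketch of that shared job record follows. The field names (`sourceKey`, `previewSeconds`, `unlocked`) are guesses at the schema the text describes; the invariant being illustrated is that unlocking changes entitlement, never the source reference.

```typescript
// Assumed shape of a job that serves both the free preview and the paid render.
interface DubJob {
  id: string;
  sourceKey: string;      // one S3 reference shared by preview and full render
  previewSeconds: number; // how much of the video the free tier processes
  unlocked: boolean;      // flipped by the payment webhook on purchase
}

/** Unlock the same job to full length: no new upload, no new source reference. */
function unlockJob(job: DubJob): DubJob {
  return { ...job, unlocked: true };
}

/** How much of the timeline the current tier is allowed to render. */
function renderWindowSeconds(job: DubJob, fullLengthSeconds: number): number {
  return job.unlocked
    ? fullLengthSeconds
    : Math.min(job.previewSeconds, fullLengthSeconds);
}
```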
A TranslationEngine env flag routes translation across OpenAI, Claude, Azure Translator, and AWS Translate. Quality and cost are tuned per language pair, and adding a new vendor is a config change rather than a refactor.
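That routing can be sketched as a map from the flag value to a translator function. The `TranslationEngine` flag name comes from the text; the vendor entries here are stubs standing in for the real SDK calls, and the default choice is an assumption.

```typescript
// Common interface every vendor adapter satisfies.
type Translate = (text: string, targetLang: string) => Promise<string>;

// Each entry would wrap the real vendor SDK; stubbed here for illustration.
const engines: Record<string, Translate> = {
  openai: async (t, l) => `[openai:${l}] ${t}`,
  claude: async (t, l) => `[claude:${l}] ${t}`,
  azure:  async (t, l) => `[azure:${l}] ${t}`,
  aws:    async (t, l) => `[aws:${l}] ${t}`,
};

/** Resolve the active engine from the environment. Adding a vendor is one
 *  more map entry plus a config change, not a refactor of the call sites. */
function pickEngine(env: Record<string, string | undefined>): Translate {
  const name = (env.TranslationEngine ?? "openai").toLowerCase(); // assumed default
  const engine = engines[name];
  if (!engine) throw new Error(`Unknown TranslationEngine: ${name}`);
  return engine;
}
```

Call sites depend only on the `Translate` signature, which is what lets quality and cost be tuned per language pair by flipping the flag.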