A live video AMA surface where the avatar answers from the protocol's own docs, not the model's imagination.
A browser-based operator platform built for a crypto/DeFi protocol's community team. The host talks on camera; an AI avatar answers in the same tab, grounded in the protocol's knowledge base. Mic audio streams to real-time transcription, transcripts become retrieval-augmented answers, and a Tavus avatar speaks the reply on screen, so the figure on camera reads from the protocol's docs instead of improvising on a live call.
A crypto/DeFi protocol's community team runs live AMAs where the on-camera figure is an AI avatar, not a human spokesperson. Generic model replies weren't good enough: one improvised token claim on a live call would create compliance exposure and a moderation problem in public. The team needed an avatar that answers from the protocol's own documentation, in real time, without a second producer typing answers behind the camera.
Ship a live AMA surface where the host speaks into a mic and the avatar listens, retrieves from the protocol's knowledge base, and speaks a grounded reply back on camera. All in the same browser tab, all operated by a single host.
Real-time transcription fast enough to feel like conversation, retrieval grounded in the protocol's docs before any answer is spoken, and a live video surface the host can run alone without a second producer on the call.
Mapped the live AMA call path, settled on the audio-in / avatar-out browser topology, and chose the STT, RAG, and avatar stack before writing production code.
Built the live loop: browser audio over WebSockets to real-time STT, transcripts into a RAG layer over the protocol's KB, answers returned to a Tavus persona via an OpenAI-compatible webhook.
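A minimal sketch of the browser half of that loop, assuming the MediaRecorder API and a WebSocket STT endpoint; the endpoint URL, the 250 ms chunk interval, and the transcript message shape are illustrative, not the project's actual values.

```ts
// Browser half of the live loop: capture mic audio and stream it to the
// real-time STT service over a WebSocket. Endpoint, chunk interval, and
// transcript shape are assumptions for illustration.
async function streamMicToSTT(wsUrl: string): Promise<() => void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ws = new WebSocket(wsUrl);
  const recorder = new MediaRecorder(stream, { mimeType: "audio/webm;codecs=opus" });

  // Forward each encoded chunk upstream as a binary frame.
  recorder.ondataavailable = (e) => {
    if (e.data.size > 0 && ws.readyState === WebSocket.OPEN) ws.send(e.data);
  };

  // Start emitting a chunk roughly every 250 ms once the socket is live.
  ws.onopen = () => recorder.start(250);

  // Partial and final transcripts come back as JSON text frames.
  ws.onmessage = (e) => {
    const msg = JSON.parse(e.data as string) as { final: boolean; text: string };
    if (msg.final) console.log("transcript:", msg.text);
  };

  // Teardown for the host: stop recording, release the mic, close the socket.
  return () => {
    recorder.stop();
    stream.getTracks().forEach((t) => t.stop());
    ws.close();
  };
}
```

One call per AMA; the returned teardown function would be wired to the end-call control. Binary frames go up, JSON transcripts come back on the same socket, so a single connection carries the whole exchange.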
Shipped the SSE-driven control bus so the host can flip video modes and audio sources mid-call, plus three desktop utilities for system-audio capture, virtual-camera routing, and mic testing inside Zoom / Meet / Teams rehearsals.
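One way the bus could look on the server, sketched with Node's built-in http module; the /events and /mode routes and the event payloads are assumptions, though the mode names match the four the host flips between.

```ts
// Sketch of the SSE control bus: clients hold open /events, and the API
// broadcasts a frame whenever the host changes the scene. Routes and
// payload shapes are illustrative.
import http from "node:http";

type ControlEvent =
  | { type: "video_mode"; mode: "mix" | "avatar" | "cohost" | "follow" }
  | { type: "audio_source"; source: string; enabled: boolean };

const clients = new Set<http.ServerResponse>();

// Push one event to every connected listener as an SSE data frame.
function broadcast(event: ControlEvent): void {
  const frame = `data: ${JSON.stringify(event)}\n\n`;
  for (const res of clients) res.write(frame);
}

http
  .createServer((req, res) => {
    if (req.url === "/events") {
      // SSE handshake: keep the response open and stream frames into it.
      res.writeHead(200, {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
        Connection: "keep-alive",
      });
      clients.add(res);
      req.on("close", () => clients.delete(res));
    } else if (req.url === "/mode" && req.method === "POST") {
      // The host UI posts a mode change; every listener hears it instantly,
      // with no WebRTC renegotiation.
      let body = "";
      req.on("data", (chunk) => (body += chunk));
      req.on("end", () => {
        broadcast({ type: "video_mode", mode: JSON.parse(body).mode });
        res.writeHead(204).end();
      });
    } else {
      res.writeHead(404).end();
    }
  })
  .listen(8080);
```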
JWT auth with admin / marketer role separation, dashboards for AMAs and KB management, and a warmed vector store so the first question in a live call isn't cold. Shipped to the operator team.
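A sketch of how the role split could be enforced, assuming an Express API and the jsonwebtoken package; the "role" claim name and the guarded routes in the usage comment are illustrative.

```ts
// Sketch of the admin / marketer separation, assuming Express and the
// jsonwebtoken package; claim name and routes are illustrative.
import jwt from "jsonwebtoken";
import type { Request, Response, NextFunction } from "express";

type Role = "admin" | "marketer";

function requireRole(...allowed: Role[]) {
  return (req: Request, res: Response, next: NextFunction) => {
    const token = req.headers.authorization?.replace(/^Bearer /, "");
    try {
      const claims = jwt.verify(token ?? "", process.env.JWT_SECRET!) as jwt.JwtPayload;
      if (!allowed.includes(claims.role as Role)) return res.status(403).end();
      next();
    } catch {
      res.status(401).end(); // missing, expired, or forged token
    }
  };
}

// Marketers run AMAs; KB writes stay admin-only (hypothetical routes):
// app.post("/amas", requireRole("admin", "marketer"), createAma);
// app.put("/kb/:docId", requireRole("admin"), updateKbDoc);
```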
Shipped in ~8 weeks from brief to a live AMA call, production-ready by Sep 2025.
Replaced 'either a producer types answers in real time or the avatar goes off-script' with a single grounded pipeline the host operates solo.
One host, one tab: audio in, grounded avatar out, with video modes and audio sources switchable mid-call.
The Tavus persona doesn't call OpenAI directly. It calls an OpenAI-compatible endpoint that retrieves from the protocol's KB first and then composes. The avatar structurally can't go off-script because the script is grounded one hop upstream of it.
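Sketched below under some assumptions: an Express route at the OpenAI-compatible path, a hypothetical ./rag module standing in for the retrieval layer, and gpt-4o-mini as a placeholder upstream model. Tavus sees an ordinary chat/completions response; the grounding happens before the model ever composes.

```ts
// The grounding hop, sketched: the Tavus persona points at this route
// instead of api.openai.com. The ./rag module is a hypothetical stand-in
// for the retrieval layer; gpt-4o-mini is a placeholder model name.
import express from "express";
import OpenAI from "openai";
import { retrieve } from "./rag"; // hypothetical: top-k passages from the KB

const app = express();
app.use(express.json());
const llm = new OpenAI(); // upstream model client, reads OPENAI_API_KEY

app.post("/v1/chat/completions", async (req, res) => {
  const messages = req.body.messages as { role: string; content: string }[];
  const question = messages.filter((m) => m.role === "user").at(-1)?.content ?? "";

  // Ground first: pull the most relevant KB passages for this question.
  const passages: string[] = await retrieve(question, 4);

  // Then compose, with the grounding pinned ahead of the conversation so
  // the model answers from the docs rather than from memory.
  const completion = await llm.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "Answer only from the context below. If it is not covered, say so.\n\n" +
          passages.join("\n---\n"),
      },
      ...messages.map((m) => ({
        role: m.role as "user" | "assistant",
        content: m.content,
      })),
    ],
  });

  // Tavus receives an ordinary OpenAI-shaped completion and speaks it.
  res.json(completion);
});

app.listen(3000);
```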
The KB is embedded once and kept warm in an in-memory JSON vector store next to the API, with Pinecone alongside. The first question in a live AMA doesn't pay cold-start latency.
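A minimal sketch of such a warm store; the file name, record shape, and brute-force top-k search are assumptions about how it could be built.

```ts
// Sketch of the warm store: the KB is embedded once offline, written to
// JSON, and loaded at boot, so the first live question pays no cold start.
// File name and record shape are assumptions.
import { readFileSync } from "node:fs";

interface KbRecord {
  id: string;
  text: string;
  embedding: number[];
}

// Loaded once at process start, before the first AMA question arrives.
const store: KbRecord[] = JSON.parse(readFileSync("kb-embeddings.json", "utf8"));

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k over the in-memory store; fast enough at KB scale,
// with Pinecone available alongside when the corpus outgrows RAM.
function topK(queryEmbedding: number[], k: number): KbRecord[] {
  return [...store]
    .sort((a, b) => cosine(queryEmbedding, b.embedding) - cosine(queryEmbedding, a.embedding))
    .slice(0, k);
}
```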
An SSE control bus lets the host flip video modes (mix, avatar, cohost, follow) and toggle per-source audio mid-call. No second producer on the call, no WebRTC reconnect when the scene changes.
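And the client half, as one sketch of how the host tab might consume the bus: a single EventSource drives the scene, so a mode change is a layout swap rather than a media renegotiation. The event shape mirrors the server sketch above and is equally illustrative.

```ts
// Client half of the bus in the host's tab: one EventSource drives the
// scene, so flipping modes is a layout swap, not a media renegotiation.
// Event shape mirrors the server sketch; selectors are illustrative.
type VideoMode = "mix" | "avatar" | "cohost" | "follow";

const bus = new EventSource("/events");

bus.onmessage = (e: MessageEvent<string>) => {
  const event = JSON.parse(e.data);
  if (event.type === "video_mode") {
    applyVideoMode(event.mode as VideoMode);
  } else if (event.type === "audio_source") {
    setSourceMuted(event.source, !event.enabled);
  }
};

// Re-lay out the existing video elements via CSS; the underlying media
// tracks keep flowing, so nothing reconnects.
function applyVideoMode(mode: VideoMode): void {
  document.body.dataset.videoMode = mode;
}

// Mute or unmute one source's element without touching the others.
function setSourceMuted(sourceId: string, muted: boolean): void {
  const el = document.querySelector<HTMLAudioElement>(`audio[data-source="${sourceId}"]`);
  if (el) el.muted = muted;
}
```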