On-Brand AI Avatar for Live AMAs

A live video AMA surface where the avatar answers from the protocol's own docs, not the model's imagination.

A browser-based operator platform built for a crypto/DeFi protocol's community team. The host talks on camera; an AI avatar answers back in the same tab, grounded in the protocol's knowledge base. Mic audio streams to real-time transcription, transcripts become retrieval-augmented answers, and a Tavus avatar speaks the reply on screen, so the figure on camera is reading the protocol's docs rather than improvising during a live call.

~2 months · Next.js 16 · Node · MongoDB · Tavus · AssemblyAI · OpenAI · Pinecone · Launched Sep 2025
The Brief

Stop the AMA avatar from going off-script.

The context

A crypto/DeFi protocol's community team runs live AMAs where the on-camera figure is an AI avatar, not a human spokesperson. Generic model replies weren't good enough: one improvised token claim on a live call would create compliance exposure and a moderation problem in public. The team needed an avatar that answers from the protocol's own documentation, in real time, without a second producer typing answers behind the camera.

The mandate

Ship a live AMA surface where the host speaks into a mic, the avatar listens, retrieves grounding from the protocol's knowledge base, and speaks a grounded reply back on camera. All in the same browser tab, all operated by a single host.

The bar

Real-time transcription fast enough to feel like conversation, retrieval grounded in the protocol's docs before any answer is spoken, and a live video surface the host can run alone without a second producer on the call.

Live AI avatar · RAG-grounded answers · Real-time STT · Crypto/DeFi
Answer source
100% grounded
Retrieved from the protocol's KB before the avatar speaks
Pipeline
1 tab
audio → STT → RAG → avatar round-trip, end-to-end
Operator headcount
1 host
Live controls replace a second producer on the call
How we shipped it

Two months, from brief to a live AMA call.

What we did

  • Live AMA pipeline: browser mic → real-time STT → RAG → Tavus avatar, all in one tab
  • OpenAI-compatible chat endpoint wrapping RAG, consumed by Tavus Personas via webhook
  • RAG ingestion over the protocol's knowledge base with Pinecone and a warmed in-memory vector store
  • SSE control bus for video modes and per-source audio toggles, without WebRTC reconnects
  • Three companion desktop utilities for system-audio capture, virtual camera, and mic-routing tests
  • Admin and marketer roles behind JWT auth with dashboards for AMAs, KB, scheduler, and events

Our process

Discovery & architecture
Weeks 1–2

Mapped the live AMA call path, settled on the audio-in / avatar-out browser topology, and chose the STT, RAG, and avatar stack before writing production code.

Core live pipeline
Weeks 3–4

Built the live loop: browser audio over WebSockets to real-time STT, transcripts into a RAG layer over the protocol's KB, answers returned to a Tavus persona via an OpenAI-compatible webhook.
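
To make that loop concrete, here is a minimal sketch of the server-side audio bridge, written in TypeScript with the ws package. The endpoint URL, env-var names, message fields, and the handleTranscript hook are illustrative assumptions, not the production code.

```ts
import { WebSocketServer, WebSocket } from "ws";

// Placeholder for the next stage; the real handler feeds the RAG layer.
async function handleTranscript(text: string): Promise<void> {
  console.log("finalized turn:", text);
}

const wss = new WebSocketServer({ port: 8080, path: "/stt" });

wss.on("connection", (browser) => {
  // One upstream STT socket per host connection (URL and auth header are assumed).
  const stt = new WebSocket(process.env.STT_WS_URL!, {
    headers: { Authorization: process.env.STT_API_KEY! },
  });

  // Relay raw PCM chunks from the browser mic straight to the STT service.
  browser.on("message", (chunk) => {
    if (stt.readyState === WebSocket.OPEN) stt.send(chunk);
  });

  // Hand finalized transcript turns to retrieval (field names are illustrative).
  stt.on("message", (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.transcript && msg.end_of_turn) void handleTranscript(msg.transcript);
  });

  // Tear down both legs together so sockets don't leak between questions.
  const closeBoth = () => { browser.close(); stt.close(); };
  browser.on("close", closeBoth);
  stt.on("close", closeBoth);
});
```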

Host controls & companion tools
Weeks 5–6

Shipped the SSE-driven control bus so the host flips video modes and audio sources mid-call, plus three desktop utilities for system-audio capture, virtual-camera routing, and mic testing inside Zoom / Meet / Teams rehearsals.
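
A compressed sketch of that control bus, assuming Express with JSON bodies; the route paths and the event shape are assumptions, but the pattern is the standard one: surfaces hold an open SSE connection, the host posts a change, and it fans out without touching WebRTC.

```ts
import express from "express";

type ControlEvent =
  | { kind: "video-mode"; mode: "mix" | "avatar" | "cohost" | "follow" }
  | { kind: "audio-toggle"; source: string; enabled: boolean };

const app = express();
app.use(express.json());

const subscribers = new Set<express.Response>();

// Live surfaces subscribe here; the connection stays open for the whole call.
app.get("/control/events", (req, res) => {
  res.set({
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  res.flushHeaders();
  subscribers.add(res);
  req.on("close", () => subscribers.delete(res));
});

// The host's controls post here; the change fans out as a server-sent event.
app.post("/control", (req, res) => {
  const frame = `data: ${JSON.stringify(req.body as ControlEvent)}\n\n`;
  for (const sub of subscribers) sub.write(frame);
  res.sendStatus(204);
});

app.listen(3001);
```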

Harden & ship
Weeks 7–8

JWT auth with admin / marketer role separation, dashboards for AMAs and KB management, and a warmed vector store so the first question in a live call isn't cold. Shipped to the operator team.
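
For flavor, a sketch of what the role gate can look like, assuming jsonwebtoken with a role claim in the token payload; the claim name, env var, and example routes are assumptions.

```ts
import jwt from "jsonwebtoken";
import type { Request, Response, NextFunction } from "express";

type Role = "admin" | "marketer";

// Express middleware: verify the bearer token, then check the role claim.
export function requireRole(...allowed: Role[]) {
  return (req: Request, res: Response, next: NextFunction) => {
    const token = req.headers.authorization?.replace(/^Bearer /, "");
    if (!token) return res.status(401).json({ error: "missing token" });
    try {
      const payload = jwt.verify(token, process.env.JWT_SECRET!) as jwt.JwtPayload;
      if (!allowed.includes(payload.role as Role)) {
        return res.status(403).json({ error: "insufficient role" });
      }
      next();
    } catch {
      res.status(401).json({ error: "invalid token" });
    }
  };
}

// Usage sketch: marketers manage AMAs; only admins touch the KB.
// app.get("/api/amas", requireRole("admin", "marketer"), listAmas);
// app.post("/api/kb/ingest", requireRole("admin"), ingestDoc);
```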

Services covered

AI & LLM Systems · SaaS / MVP Engineering · Cloud & DevOps · Product Scaling
Under the hood

A stack chosen for live latency and grounded answers.

Frontend
  • Next.js 16 (App Router)
  • React 19
  • Tailwind v4
API
  • Node.js
  • Express
  • TypeScript
Database
  • MongoDB
  • Mongoose
AI & Avatar
  • Tavus CVI (live AI avatar)
  • OpenAI (embeddings + answer composition)
  • Pinecone + warmed in-memory vector store
Live media
  • AssemblyAI Universal-Streaming (STT)
  • WebSockets (STT bridge)
  • SSE (host control bus)
  • RTMP (stream egress)
Companion utilities
  • Python + Tkinter (desktop utilities)
  • pyvirtualcam (virtual camera bridge)
  • soundcard / sounddevice (WASAPI loopback)
  • Puppeteer (headless browser automation)
Security & runtime
  • JWT auth (admin / marketer roles)
  • Warmed vector store at boot

Deployment pipeline

Deploy
Node.js + Express API • Next.js web app
Browser-side live surface paired with a server-side RAG + webhook stack
Configure
Environment-scoped configs • Per-environment STT and avatar keys
Govern
JWT auth • Admin / marketer role separation • Dashboards for AMAs, KB, scheduler, and events

Stack summary

RAG-grounded at the avatar webhook
  • Retrieved before spoken
    The Tavus persona doesn't call OpenAI directly. It calls an OpenAI-compatible endpoint that retrieves from the protocol's KB first and then composes the reply, so the avatar is structurally unable to go off-script on a live call.
Dual vector layer
  • Warmed in-memory + Pinecone
    The KB is embedded once and kept warm in an in-memory JSON vector store next to the API, with Pinecone alongside, so the first question in a live AMA doesn't pay cold-start latency.
Host-only live controls
  • SSE control bus
    The host flips video modes (mix, avatar, cohost, follow) and toggles per-source audio mid-call over a server-sent-events bus. No WebRTC reconnect when the scene changes, and no second producer needed on the call.

Key integrations

AssemblyAI · OpenAI · Pinecone · Tavus CVI

Built-in safeguards

KB-grounded answers only · Warmed vector store on boot · JWT auth with role separation · Admin + marketer dashboards · One-host live operation · No WebRTC reconnect on scene change
Outcome

A live AMA surface the team runs alone, from one browser tab.

Shipped in ~8 weeks from brief to a live AMA call, production-ready by Sep 2025.

Replaced 'either a producer types answers in real time or the avatar goes off-script' with a single grounded pipeline the host operates solo.

One host, one tab: audio in, grounded avatar out, with video modes and audio sources switchable mid-call.

Shipped Sep 2025 · Live AI avatar · RAG-grounded · Real-time STT · Crypto/DeFi
Feature highlights
  • Live mic-in / avatar-out AMAs in a single browser tab
  • Real-time transcription via AssemblyAI Universal-Streaming
  • RAG-grounded answers over the protocol's KB with Pinecone plus a warmed in-memory vector store
  • Tavus AI avatar wired to an OpenAI-compatible endpoint for on-brand replies
  • Host-side SSE control bus for video modes and audio toggling without WebRTC reconnects
  • Three desktop companion utilities for system-audio capture, virtual camera, and mic routing
  • Admin and marketer dashboards for AMAs, KB, scheduler, and events

Innovations

RAG grounded at the avatar webhook

The Tavus persona doesn't call OpenAI directly. It calls an OpenAI-compatible endpoint that retrieves from the protocol's KB first and then composes the reply. The avatar structurally can't go off-script because the script is grounded one hop upstream of it.
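
In outline, the endpoint speaks the standard chat-completions shape but runs retrieval before any generation. retrieve() and compose() below are hypothetical stand-ins for the KB lookup and the grounded composition step.

```ts
import express from "express";

const app = express();
app.use(express.json());

app.post("/v1/chat/completions", async (req, res) => {
  const messages = req.body.messages as { role: string; content: string }[];
  const question = messages.at(-1)?.content ?? "";

  // 1. Retrieval happens before generation: grounding is not optional.
  const passages = await retrieve(question); // top-k chunks from the protocol's KB

  // 2. Compose strictly from the retrieved passages.
  const answer = await compose(question, passages);

  // 3. Reply in the chat-completions shape the avatar already understands.
  res.json({
    id: `chatcmpl-${Date.now()}`,
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000),
    model: "kb-grounded-ama",
    choices: [
      { index: 0, message: { role: "assistant", content: answer }, finish_reason: "stop" },
    ],
  });
});

// Hypothetical stand-ins for the retrieval and composition stages.
declare function retrieve(q: string): Promise<string[]>;
declare function compose(q: string, passages: string[]): Promise<string>;
```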

Warmed dual vector layer

The KB is embedded once and kept warm in an in-memory JSON vector store next to the API, with Pinecone alongside. The first question in a live AMA doesn't pay cold-start latency.
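
A minimal sketch of the warmed in-memory layer: embeddings load from JSON once at process start, and queries do a brute-force cosine scan. The file path and record shape are assumptions.

```ts
import { readFileSync } from "node:fs";

type Chunk = { id: string; text: string; vector: number[] };

// Loaded once at boot, so the first live question pays no cold start.
const chunks: Chunk[] = JSON.parse(readFileSync("kb-vectors.json", "utf8"));

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Top-k nearest chunks for an already-embedded query vector.
export function topK(queryVector: number[], k = 5): Chunk[] {
  return chunks
    .map((c) => ({ c, score: cosine(queryVector, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.c);
}
```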

One host, live controls

An SSE control bus lets the host flip video modes (mix, avatar, cohost, follow) and toggle per-source audio mid-call. No second producer on the call, no WebRTC reconnect when the scene changes.
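
The browser side of that bus is one EventSource subscription; the surface reacts to control events without renegotiating WebRTC. The handler names here are assumptions.

```ts
// Subscribe once; the stream stays open for the whole call.
const bus = new EventSource("/control/events");

bus.onmessage = (e) => {
  const event = JSON.parse(e.data);
  switch (event.kind) {
    case "video-mode":
      applyVideoMode(event.mode); // reflow the layout: mix | avatar | cohost | follow
      break;
    case "audio-toggle":
      setSourceMuted(event.source, !event.enabled); // mute or unmute one input
      break;
  }
};

// Hypothetical stand-ins for the surface's scene and audio handlers.
declare function applyVideoMode(mode: string): void;
declare function setSourceMuted(source: string, muted: boolean): void;
```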

Why it matters
  • The community team runs live AMAs solo. The host talks, the avatar answers from the docs, the scene changes on the fly, and nobody else has to be on the call.
  • The avatar stays on-brand because its answers are retrieved from the protocol's own knowledge base before they're spoken; no answer reaches the audience without passing through retrieval first.
  • Latency is tuned for live conversation: real-time STT, a warmed vector store, and an OpenAI-compatible avatar webhook. The pipeline is built for a live call, not a chat window.
Shipped internally
  • Running inside the operator team's own platform; no public URL by design
  • Stack became the foundation for follow-on operator modules on the same codebase
Who this is for
  • Crypto, DeFi, and Web3 operator teams running live community programs who can't afford an avatar that improvises claims on camera.
  • Protocol and platform teams who want a one-host AMA surface grounded in their own docs, not a second producer typing answers into a chat window.