On-Brand AI Avatar for Live AMAs

A live video AMA surface where the avatar answers from the protocol's own docs, not the model's imagination.

A browser-based operator platform built for a crypto/DeFi protocol's community team. The host talks on camera; an AI avatar answers back in the same tab, grounded in the protocol's knowledge base. Mic audio streams to real-time transcription, transcripts become retrieval-augmented answers, and a Tavus avatar speaks the reply on screen, so the figure on camera is reading the protocol's docs rather than improvising during a live call.

~2 months · Next.js 16 · Node · MongoDB · Tavus · AssemblyAI · OpenAI · Pinecone · Launched Sep 2025
The Brief

Stop the AMA avatar from going off-script.

The context

A crypto/DeFi protocol's community team runs live AMAs where the on-camera figure is an AI avatar, not a human spokesperson. Generic model replies weren't good enough: one improvised token claim on a live call would create compliance exposure and a moderation problem in public. The team needed an avatar that answers from the protocol's own documentation, in real time, without a second producer typing answers behind the camera.

The mandate

Ship a live AMA surface where the host speaks into a mic, the avatar listens, retrieves grounding from the protocol's knowledge base, and speaks a grounded reply back on camera. All in the same browser tab, all operated by a single host.

The bar

Real-time transcription fast enough to feel like conversation, retrieval grounded in the protocol's docs before any answer is spoken, and a live video surface the host can run alone without a second producer on the call.

Live AI avatar · RAG-grounded answers · Real-time STT · Crypto/DeFi
Answer source
100% grounded
Retrieved from the protocol's KB before the avatar speaks
Pipeline
1 tab
audio → STT → RAG → avatar round-trip, end-to-end
Operator headcount
1 host
Live controls replace a second producer on the call
How we shipped it

Two months, from brief to a live AMA call.

What we did

  • Live AMA pipeline: browser mic → real-time STT → RAG → Tavus avatar, all in one tab
  • OpenAI-compatible chat endpoint wrapping RAG, consumed by Tavus Personas via webhook
  • RAG ingestion over the protocol's knowledge base with Pinecone and a warmed in-memory vector store
  • SSE control bus for video modes and per-source audio toggles, without WebRTC reconnects
  • Three companion desktop utilities for system-audio capture, virtual camera, and mic-routing tests
  • Admin and marketer roles behind JWT auth with dashboards for AMAs, KB, scheduler, and events

Our process

Discovery & architecture
Weeks 1–2

Mapped the live AMA call path, settled on the audio-in / avatar-out browser topology, and chose the STT, RAG, and avatar stack before writing production code.

Core live pipeline
Weeks 3–4

Built the live loop: browser audio over WebSockets to real-time STT, transcripts into a RAG layer over the protocol's KB, answers returned to a Tavus persona via an OpenAI-compatible webhook.
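
To make that loop concrete, here is a minimal sketch of the server-side audio bridge, written in TypeScript with the ws package. The endpoint URL, env-var names, message fields, and the handleTranscript hook are illustrative assumptions, not the production code.

```ts
import { WebSocketServer, WebSocket } from "ws";

// Placeholder for the next stage; the real handler feeds the RAG layer.
async function handleTranscript(text: string): Promise<void> {
  console.log("finalized turn:", text);
}

const wss = new WebSocketServer({ port: 8080, path: "/stt" });

wss.on("connection", (browser) => {
  // One upstream STT socket per host connection (URL and auth header are assumed).
  const stt = new WebSocket(process.env.STT_WS_URL!, {
    headers: { Authorization: process.env.STT_API_KEY! },
  });

  // Relay raw PCM chunks from the browser mic straight to the STT service.
  browser.on("message", (chunk) => {
    if (stt.readyState === WebSocket.OPEN) stt.send(chunk);
  });

  // Hand finalized transcript turns to retrieval (field names are illustrative).
  stt.on("message", (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.transcript && msg.end_of_turn) void handleTranscript(msg.transcript);
  });

  // Tear down both legs together so sockets don't leak between questions.
  const closeBoth = () => { browser.close(); stt.close(); };
  browser.on("close", closeBoth);
  stt.on("close", closeBoth);
});
```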

Host controls & companion tools
Weeks 5–6

Shipped the SSE-driven control bus so the host flips video modes and audio sources mid-call, plus three desktop utilities for system-audio capture, virtual-camera routing, and mic testing inside Zoom / Meet / Teams rehearsals.
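
A compressed sketch of that control bus, assuming Express with JSON bodies; the route paths and the event shape are assumptions, but the pattern is the standard one: surfaces hold an open SSE connection, the host posts a change, and it fans out without touching WebRTC.

```ts
import express from "express";

type ControlEvent =
  | { kind: "video-mode"; mode: "mix" | "avatar" | "cohost" | "follow" }
  | { kind: "audio-toggle"; source: string; enabled: boolean };

const app = express();
app.use(express.json());

const subscribers = new Set<express.Response>();

// Live surfaces subscribe here; the connection stays open for the whole call.
app.get("/control/events", (req, res) => {
  res.set({
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  res.flushHeaders();
  subscribers.add(res);
  req.on("close", () => subscribers.delete(res));
});

// The host's controls post here; the change fans out as a server-sent event.
app.post("/control", (req, res) => {
  const frame = `data: ${JSON.stringify(req.body as ControlEvent)}\n\n`;
  for (const sub of subscribers) sub.write(frame);
  res.sendStatus(204);
});

app.listen(3001);
```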

Harden & ship
Weeks 7–8

JWT auth with admin / marketer role separation, dashboards for AMAs and KB management, and a warmed vector store so the first question in a live call isn't cold. Shipped to the operator team.
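
For flavor, a sketch of what the role gate can look like, assuming jsonwebtoken with a role claim in the token payload; the claim name, env var, and example routes are assumptions.

```ts
import jwt from "jsonwebtoken";
import type { Request, Response, NextFunction } from "express";

type Role = "admin" | "marketer";

// Express middleware: verify the bearer token, then check the role claim.
export function requireRole(...allowed: Role[]) {
  return (req: Request, res: Response, next: NextFunction) => {
    const token = req.headers.authorization?.replace(/^Bearer /, "");
    if (!token) return res.status(401).json({ error: "missing token" });
    try {
      const payload = jwt.verify(token, process.env.JWT_SECRET!) as jwt.JwtPayload;
      if (!allowed.includes(payload.role as Role)) {
        return res.status(403).json({ error: "insufficient role" });
      }
      next();
    } catch {
      res.status(401).json({ error: "invalid token" });
    }
  };
}

// Usage sketch: marketers manage AMAs; only admins touch the KB.
// app.get("/api/amas", requireRole("admin", "marketer"), listAmas);
// app.post("/api/kb/ingest", requireRole("admin"), ingestDoc);
```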

Services covered

AI & LLM Systems · SaaS / MVP Engineering · Cloud & DevOps · Product Scaling
Under the hood

A stack chosen for live latency and grounded answers.

Frontend
  • Next.js 16 (App Router)
  • React 19
  • Tailwind v4
API
  • Node.js
  • Express
  • TypeScript
Database
  • MongoDB
  • Mongoose
AI & Avatar
  • Tavus CVI (live AI avatar)
  • OpenAI (embeddings + answer composition)
  • Pinecone + warmed in-memory vector store
Live media
  • AssemblyAI Universal-Streaming (STT)
  • WebSockets (STT bridge)
  • SSE (host control bus)
  • RTMP (stream egress)
Companion utilities
  • Python + Tkinter (desktop utilities)
  • pyvirtualcam (virtual camera bridge)
  • soundcard / sounddevice (WASAPI loopback)
  • Puppeteer (headless browser automation)
Security & runtime
  • JWT auth (admin / marketer roles)
  • Warmed vector store at boot

Deployment pipeline

Deploy
Node.js + Express API • Next.js web app
Browser-side live surface paired with a server-side RAG + webhook stack
Configure
Environment-scoped configs • Per-environment STT and avatar keys
Govern
JWT auth • Admin / marketer role separation • Dashboards for AMAs, KB, scheduler, and events

Stack summary

RAG-grounded at the avatar webhook
  • Retrieved before spoken
    The Tavus persona doesn't call OpenAI directly. It calls an OpenAI-compatible endpoint that retrieves from the protocol's KB first and then composes the reply, so the avatar is structurally unable to go off-script on a live call.
Dual vector layer
  • Warmed in-memory + Pinecone
    The KB is embedded once and kept warm in an in-memory JSON vector store next to the API, with Pinecone alongside, so the first question in a live AMA doesn't pay cold-start latency.
Host-only live controls
  • SSE control bus
    The host flips video modes (mix, avatar, cohost, follow) and toggles per-source audio mid-call over a server-sent-events bus. No WebRTC reconnect when the scene changes, and no second producer needed on the call.

Key integrations

AssemblyAI · OpenAI · Pinecone · Tavus CVI

Built-in safeguards

KB-grounded answers only · Warmed vector store on boot · JWT auth with role separation · Admin + marketer dashboards · One-host live operation · No WebRTC reconnect on scene change
Outcome

A live AMA surface the team runs alone, from one browser tab.

Shipped in ~8 weeks from brief to a live AMA call, production-ready by Sep 2025.

Replaced 'either a producer types answers in real time or the avatar goes off-script' with a single grounded pipeline the host operates solo.

One host, one tab: audio in, grounded avatar out, with video modes and audio sources switchable mid-call.

Shipped Sep 2025 · Live AI avatar · RAG-grounded · Real-time STT · Crypto/DeFi
Feature highlights
  • Live mic-in / avatar-out AMAs in a single browser tab
  • Real-time transcription via AssemblyAI Universal-Streaming
  • RAG-grounded answers over the protocol's KB with Pinecone plus a warmed in-memory vector store
  • Tavus AI avatar wired to an OpenAI-compatible endpoint for on-brand replies
  • Host-side SSE control bus for video modes and audio toggling without WebRTC reconnects
  • Three desktop companion utilities for system-audio capture, virtual camera, and mic routing
  • Admin and marketer dashboards for AMAs, KB, scheduler, and events

Innovations

RAG grounded at the avatar webhook

The Tavus persona doesn't call OpenAI directly. It calls an OpenAI-compatible endpoint that retrieves from the protocol's KB first and then composes the reply. The avatar structurally can't go off-script because the script is grounded one hop upstream of it.
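
In outline, the endpoint speaks the standard chat-completions shape but runs retrieval before any generation. retrieve() and compose() below are hypothetical stand-ins for the KB lookup and the grounded composition step.

```ts
import express from "express";

const app = express();
app.use(express.json());

app.post("/v1/chat/completions", async (req, res) => {
  const messages = req.body.messages as { role: string; content: string }[];
  const question = messages.at(-1)?.content ?? "";

  // 1. Retrieval happens before generation: grounding is not optional.
  const passages = await retrieve(question); // top-k chunks from the protocol's KB

  // 2. Compose strictly from the retrieved passages.
  const answer = await compose(question, passages);

  // 3. Reply in the chat-completions shape the avatar already understands.
  res.json({
    id: `chatcmpl-${Date.now()}`,
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000),
    model: "kb-grounded-ama",
    choices: [
      { index: 0, message: { role: "assistant", content: answer }, finish_reason: "stop" },
    ],
  });
});

// Hypothetical stand-ins for the retrieval and composition stages.
declare function retrieve(q: string): Promise<string[]>;
declare function compose(q: string, passages: string[]): Promise<string>;
```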

Warmed dual vector layer

The KB is embedded once and kept warm in an in-memory JSON vector store next to the API, with Pinecone alongside. The first question in a live AMA doesn't pay cold-start latency.
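
A minimal sketch of the warmed in-memory layer: embeddings load from JSON once at process start, and queries do a brute-force cosine scan. The file path and record shape are assumptions.

```ts
import { readFileSync } from "node:fs";

type Chunk = { id: string; text: string; vector: number[] };

// Loaded once at boot, so the first live question pays no cold start.
const chunks: Chunk[] = JSON.parse(readFileSync("kb-vectors.json", "utf8"));

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Top-k nearest chunks for an already-embedded query vector.
export function topK(queryVector: number[], k = 5): Chunk[] {
  return chunks
    .map((c) => ({ c, score: cosine(queryVector, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.c);
}
```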

One host, live controls

An SSE control bus lets the host flip video modes (mix, avatar, cohost, follow) and toggle per-source audio mid-call. No second producer on the call, no WebRTC reconnect when the scene changes.
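
The browser side of that bus is one EventSource subscription; the surface reacts to control events without renegotiating WebRTC. The handler names here are assumptions.

```ts
// Subscribe once; the stream stays open for the whole call.
const bus = new EventSource("/control/events");

bus.onmessage = (e) => {
  const event = JSON.parse(e.data);
  switch (event.kind) {
    case "video-mode":
      applyVideoMode(event.mode); // reflow the layout: mix | avatar | cohost | follow
      break;
    case "audio-toggle":
      setSourceMuted(event.source, !event.enabled); // mute or unmute one input
      break;
  }
};

// Hypothetical stand-ins for the surface's scene and audio handlers.
declare function applyVideoMode(mode: string): void;
declare function setSourceMuted(source: string, muted: boolean): void;
```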

Why it matters
  • The community team runs live AMAs solo. The host talks, the avatar answers from the docs, the scene changes on the fly, and nobody else has to be on the call.
  • The avatar stays on-brand because its answers are retrieved from the protocol's own knowledge base before they're spoken; no answer reaches the audience without passing through retrieval first.
  • Latency is tuned for live conversation: real-time STT, a warmed vector store, and an OpenAI-compatible avatar webhook. The pipeline is built for a live call, not a chat window.
Shipped internally
  • Running inside the operator team's own platform; no public URL by design
  • Stack became the foundation for follow-on operator modules on the same codebase
Who this is for
  • Crypto, DeFi, and Web3 operator teams running live community programs who can't afford an avatar that improvises claims on camera.
  • Protocol and platform teams who want a one-host AMA surface grounded in their own docs, not a second producer typing answers into a chat window.