lipzink
voice-agnostic lip-sync for React

Make your avatar talk.

A Notion-style talking avatar and a pure-CSS lip-sync mouth. Build a face, feed it any audio (ElevenLabs, a recording, the mic), and the mouth follows every phoneme. Then drop it into your app.

The demo line is a phonetic pangram: it covers every English sound. “Sample” needs no key; “Make it talk” uses ElevenLabs via /api/tts.

hi.tsx
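import type { ComponentProps } from "react"
import { Avatar } from "@lipzink/avatar"
import "@lipzink/avatar/styles.css"

// "spec" is the JSON you build & copy below.
// Its type is derived from Avatar's own props, so no extra type export is assumed.
export function Hi({ spec }: { spec: ComponentProps<typeof Avatar>["spec"] }) {
  return <Avatar spec={spec} size={96} />
}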

Step 01 · Create

Build your avatar

Click through the parts — or roll the dice. The avatar above updates live. When you like it, copy the spec: a tiny JSON blob you pass to <Avatar spec />.

Hair: 1/58
Glasses: off
Beard: off
Accessory: off
Head: 1/16
Brows: 1/16
Eyes: 1/14
Nose: 1/14
Details: off
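
To make "roll the dice" and "a tiny JSON blob" concrete, here is a minimal sketch of a randomizer and a validating parser. The spec shape is an assumption: the field names (`hair`, `head`, and so on) and the `SketchSpec` type are hypothetical, and only the variant counts come from the builder above. The on/off parts (glasses, beard, accessory, details) are left out for brevity.

```typescript
// Hypothetical spec shape. The real @lipzink/avatar spec format is not shown
// on this page; only the variant counts below are taken from the builder.
type PartName = "hair" | "head" | "brows" | "eyes" | "nose";
type SketchSpec = Record<PartName, number>;

const PART_COUNTS: Record<PartName, number> = {
  hair: 58,
  head: 16,
  brows: 16,
  eyes: 14,
  nose: 14,
};
const PART_NAMES = Object.keys(PART_COUNTS) as PartName[];

// "Roll the dice": pick a 1-based variant for every part.
// `random` is injectable so the result can be made deterministic in tests.
function rollSpec(random: () => number = Math.random): SketchSpec {
  const spec = {} as SketchSpec;
  for (const part of PART_NAMES) {
    spec[part] = 1 + Math.floor(random() * PART_COUNTS[part]);
  }
  return spec;
}

// Parse a copied JSON blob and reject anything outside the known ranges.
function parseSpec(json: string): SketchSpec {
  const raw = JSON.parse(json) as Record<string, unknown>;
  const spec = {} as SketchSpec;
  for (const part of PART_NAMES) {
    const value = raw[part];
    const valid =
      typeof value === "number" &&
      Math.floor(value) === value &&
      value >= 1 &&
      value <= PART_COUNTS[part];
    if (!valid) throw new Error(`Invalid "${part}" in avatar spec`);
    spec[part] = value as number;
  }
  return spec;
}
```

Validating on the way in is a design choice, not something the library is known to do: a spec that travels as JSON can be hand-edited, so a range check catches a bad index before it reaches the renderer.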

Step 02 · Give it a voice

Two lines to talking

The avatar is voice-agnostic: you bring the sound. With ElevenLabs timestamps the mouth lands on every phoneme on time; with any other audio it follows along live. The example below takes the ElevenLabs lane.

talking.tsx
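import { useRef, type ComponentProps } from "react"
import { Avatar, type AvatarVoiceHandle } from "@lipzink/avatar"
import { fetchElevenLabsSpeech } from "@lipzink/mouth"
import "@lipzink/avatar/styles.css"
import "@lipzink/mouth/styles.css"

// Prop type derived from Avatar itself, so the snippet type-checks in strict mode.
function Talking({ spec }: { spec: ComponentProps<typeof Avatar>["spec"] }) {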
  const voice = useRef<AvatarVoiceHandle>(null)

  async function say(text: string) {
    // /api/tts proxies ElevenLabs' /with-timestamps (key stays server-side).
    const { audio, cues } = await fetchElevenLabsSpeech("/api/tts", {
      text,
      voiceId: "21m00Tcm4TlvDq8ikWAM",
    })
    // Scheduled against the audio clock → lands on every phoneme, on time.
    voice.current?.playCues(audio, cues)
  }

  return <Avatar spec={spec} ref={voice} onClick={() => say("Hello there!")} />
}
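
For the curious, "scheduled against the audio clock" can be sketched in a few lines. This is not the library's implementation: the `Cue` type and both helpers are hypothetical, and the `Alignment` fields follow the character-level arrays that ElevenLabs' with-timestamps endpoint returns, as far as this sketch assumes. The idea is to turn timestamps into cues once, then look up the active cue from the current playback time instead of from a separate timer.

```typescript
// Character-level alignment, modeled on the `alignment` object in
// ElevenLabs' with-timestamps response (assumed field names).
interface Alignment {
  characters: string[];
  character_start_times_seconds: number[];
  character_end_times_seconds: number[];
}

// Hypothetical cue type. The cue format that playCues expects is not shown here.
interface Cue {
  char: string;
  start: number; // seconds, inclusive
  end: number;   // seconds, exclusive
}

// Zip the three parallel arrays into one list of timed cues.
function toCues(alignment: Alignment): Cue[] {
  return alignment.characters.map((char, i) => ({
    char,
    start: alignment.character_start_times_seconds[i],
    end: alignment.character_end_times_seconds[i],
  }));
}

// Binary search for the cue active at audio time `t` (in seconds).
// Assumes cues are sorted by start time and do not overlap.
// Returns null before the first cue, after the last one, or inside a gap.
function cueAt(cues: Cue[], t: number): Cue | null {
  let lo = 0;
  let hi = cues.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const cue = cues[mid];
    if (t < cue.start) hi = mid - 1;
    else if (t >= cue.end) lo = mid + 1;
    else return cue;
  }
  return null;
}
```

In a browser, `t` would typically be read from the audio element's `currentTime` on each animation frame. Because the lookup depends only on that clock, a stalled frame or a paused tab cannot make the mouth drift away from the sound.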

13 shapes · every phoneme

Whatever drives the mouth (timestamps, live audio, or the mic) resolves to one of these 13 shapes.
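
As a rough illustration of "resolves to one of these", here is a minimal plain-text mapping. Everything in it is an assumption: the shape names are invented for the sketch (the library's actual 13 shapes are not listed on this page), and a real mapping would work on phonemes rather than on spelling.

```typescript
// Illustrative shape names only; not the library's 13 shapes.
type SketchShape = "rest" | "closed" | "open" | "wide" | "round" | "teeth" | "neutral";

// A deliberately small letter table. Unlisted letters fall back to "neutral".
const CHAR_TO_SHAPE: Partial<Record<string, SketchShape>> = {
  a: "open",
  e: "wide",
  i: "wide",
  o: "round",
  u: "round",
  b: "closed",
  m: "closed",
  p: "closed",
  f: "teeth",
  v: "teeth",
};

// Map one character to a mouth shape.
function shapeForChar(char: string): SketchShape {
  const key = char.toLowerCase();
  if (!/^[a-z]$/.test(key)) return "rest"; // spaces, digits, punctuation
  return CHAR_TO_SHAPE[key] ?? "neutral";
}

// Map a whole string, one shape per character.
function shapesForText(text: string): SketchShape[] {
  return text.split("").map((char) => shapeForChar(char));
}
```

Whatever the source of timing, the output type stays the same: a value from one small, closed set of shapes. That is what lets a single mouth component sit behind several different drivers.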

Use case 02 · Mouth only

Just the mouth

Don’t need the whole character? @lipzink/mouth is a standalone, pure-CSS mouth — no avatar, no assets. Position it over any illustration and tint it to match.

character.tsx
import { TalkingMouth } from "@lipzink/mouth"
import "@lipzink/mouth/styles.css"

// Just the mouth — no avatar, no bundled art. Overlay it on your illustration:
function MyCharacter() {
  return (
    <div style={{ position: "relative" }}>
      <img src="/character.png" alt="" />
      <div style={{ position: "absolute", left: "50%", top: "62%",
                    transform: "translate(-50%,-50%)" }}>
        <TalkingMouth audio="/hello.mp3" scale={1.4} />
      </div>
    </div>
  )
}
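
The `left` and `top` percentages above have to come from somewhere. One way to get them, sketched here with a hypothetical helper (`mouthAnchor` is not part of the library), is to measure the mouth's centre in the source image in pixels and convert it, so the overlay stays put when the image scales.

```typescript
interface Point {
  x: number;
  y: number;
}

interface Size {
  width: number;
  height: number;
}

interface AnchorStyle {
  left: string;
  top: string;
  transform: string;
}

// Convert a mouth centre measured in image pixels into percentage-based
// CSS for an absolutely positioned wrapper. Hypothetical helper.
function mouthAnchor(mouthCenterPx: Point, imageSize: Size): AnchorStyle {
  if (imageSize.width <= 0 || imageSize.height <= 0) {
    throw new Error("Image size must be positive");
  }
  const pct = (value: number, total: number): string =>
    `${((value / total) * 100).toFixed(1)}%`;
  return {
    left: pct(mouthCenterPx.x, imageSize.width),
    top: pct(mouthCenterPx.y, imageSize.height),
    // Shift by half the mouth's own size so its centre sits on the point.
    transform: "translate(-50%,-50%)",
  };
}
```

The returned object can be spread into the wrapper's `style` alongside `position: "absolute"`.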

Use case 03 · Bring your own face

Put a mouth on anything

The mouth is just a positioned CSS element — so it works on a photo, an illustration, even a Renaissance masterpiece. Here she is, finally able to answer the question everyone asks.

portrait.tsx
import { useRef } from "react"
import { TalkingMouth, type TalkingMouthHandle } from "@lipzink/mouth"
import "@lipzink/mouth/styles.css"

function Portrait() {
  const mouth = useRef<TalkingMouthHandle>(null)
  return (
    <div style={{ position: "relative" }}>
      <img src="/mona-lisa.png" alt="Mona Lisa" />
      {/* Position + tint the mouth to match YOUR art */}
      <div style={{ position: "absolute", left: "50%", top: "56%",
                    transform: "translate(-50%,-50%)" }}>
        <TalkingMouth ref={mouth} scale={0.7} cavity="#7a1f1f" tongue="#c75c5c" />
      </div>
      <button onClick={() => mouth.current?.play("/hello.mp3")}>Speak</button>
    </div>
  )
}

Everything else

Built to drop in

Small, typed, and unopinionated about your stack.

Voice-agnostic

Bring any audio — ElevenLabs, OpenAI TTS, a recording, or the live microphone. No TTS is baked in.

On-time lip-sync

Visemes are scheduled against the audio clock using ElevenLabs timestamps, so the mouth lands on each phoneme instead of trailing it.

Three drivers

Scheduled cues, live audio analysis, or a driver you write yourself, all behind one headless useLipsync hook.
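
For a feel of what a live-audio driver might do, here is a minimal sketch under stated assumptions: the thresholds, the three openness levels, and all function names are invented for illustration and say nothing about how the library analyses audio. It measures loudness over a window of samples, smooths it, and picks how far the mouth opens.

```typescript
// Root-mean-square loudness of one window of samples in the range -1..1.
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) {
    sum += samples[i] * samples[i];
  }
  return Math.sqrt(sum / samples.length);
}

// Exponential smoothing so the mouth does not flicker between frames.
// alpha = 1 follows the input exactly; smaller values react more slowly.
function smooth(previous: number, next: number, alpha: number = 0.5): number {
  return previous + alpha * (next - previous);
}

// Hypothetical openness levels and thresholds, chosen only for the sketch.
type Openness = "closed" | "half" | "open";

function opennessFor(amplitude: number): Openness {
  if (amplitude < 0.02) return "closed";
  if (amplitude < 0.15) return "half";
  return "open";
}
```

In a browser, the samples could come from a Web Audio `AnalyserNode` via `getFloatTimeDomainData`, read once per animation frame. Amplitude alone cannot tell an "o" from an "e", which is why a live driver follows along rather than hitting every phoneme.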

13 visemes

Every phoneme maps to a CSS mouth shape. Mappings ship for ElevenLabs, Azure visemes, plain text, and live audio.

Pure-CSS mouth

No canvas, no WebGL. The mouth is a positioned, tintable CSS element you can drop over any illustration.

Notion-style avatar

Hundreds of bundled SVG parts, a randomizer, and a copy-paste spec — the whole character travels as JSON.

Tiny & typed

React 19, full TypeScript, no heavy dependencies. Two packages you can adopt together or apart.

Headless option

useLipsync() hands you { shape, amplitude, status }, so you can build entirely custom visuals on top.
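
A custom visual then comes down to a pure function from that state to whatever you render. The sketch below assumes a lot: the `SketchLipsyncState` type only mirrors the three field names above, `amplitude` is assumed to be normalized to 0..1, and the class names are made up.

```typescript
// Assumed state shape, modeled on the { shape, amplitude, status } fields.
// The real types exported by the library may differ.
interface SketchLipsyncState {
  shape: string;     // current mouth shape identifier
  amplitude: number; // assumed here to be normalized to 0..1
  status: string;    // playback status; its possible values are not documented here
}

interface MouthView {
  className: string;
  style: { transform: string };
  label: string;
}

// Turn lip-sync state into props for any element: a div, an SVG group, a sprite.
function viewFor(state: SketchLipsyncState): MouthView {
  // Clamp so an out-of-range amplitude cannot flip or over-stretch the mouth.
  const level = Math.min(1, Math.max(0, state.amplitude));
  return {
    className: `my-mouth my-mouth--${state.shape}`,
    style: { transform: `scaleY(${(0.2 + 0.8 * level).toFixed(2)})` },
    label: `mouth: ${state.status}`,
  };
}
```

Keeping the mapping free of React makes it easy to test, and the same function can feed a DOM element, a canvas draw call, or a game sprite.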