All examples use the base URL https://api.hedra.com/web-app/public and require an X-API-Key header.
export HEDRA_API_KEY="your_api_key"
Avatar videos are driven by audio: the character in the image lip-syncs and moves in time with the audio you provide. Two models are available:
Model         ID                                    Best for
Hedra Avatar  26f0fc66-152b-40ab-abed-76c43df99bc8  Talking-head videos, lip-sync, up to 10 minutes
Hedra Omnia   ab372b84-432f-44f5-bacc-c2542465f712  Motion avatar videos with full-body movement, up to 8 seconds
Both models require an image and audio input.

Step 1: Upload your audio

The audio determines the video length. For a 10-minute video, provide ~10 minutes of audio.
# Create the asset record
curl -X POST https://api.hedra.com/web-app/public/assets \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $HEDRA_API_KEY" \
  -d '{
    "name": "ten-minute-audio.mp3",
    "type": "audio"
  }'

# Upload the file
curl -X POST https://api.hedra.com/web-app/public/assets/{audio_asset_id}/upload \
  -H "X-API-Key: $HEDRA_API_KEY" \
  -F "file=@/path/to/ten-minute-audio.mp3"
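
The two calls above can be wrapped in small shell functions so the asset ID flows from one to the other. This is a minimal sketch, not an official client: the helper names (upload_url, create_asset, upload_asset) are hypothetical, and the commented usage assumes the create response carries the new asset's ID in a top-level "id" field, which this guide does not confirm.

```shell
# Sketch: create an asset record, then upload a file to it.
BASE_URL="https://api.hedra.com/web-app/public"

# upload_url <asset_id> -> upload endpoint for that asset (pure string helper)
upload_url() {
  printf '%s/assets/%s/upload\n' "$BASE_URL" "$1"
}

# create_asset <name> <type> -> prints the raw JSON response
# Assumption: <name> contains no characters that need JSON escaping.
create_asset() {
  curl -sS -X POST "$BASE_URL/assets" \
    -H "Content-Type: application/json" \
    -H "X-API-Key: $HEDRA_API_KEY" \
    -d "{\"name\": \"$1\", \"type\": \"$2\"}"
}

# upload_asset <asset_id> <path> -> sends the file as multipart form data
upload_asset() {
  curl -sS -X POST "$(upload_url "$1")" \
    -H "X-API-Key: $HEDRA_API_KEY" \
    -F "file=@$2"
}

# Usage (requires jq; the "id" field name is an assumption):
#   response=$(create_asset "ten-minute-audio.mp3" audio)
#   audio_asset_id=$(printf '%s' "$response" | jq -r '.id')
#   upload_asset "$audio_asset_id" /path/to/ten-minute-audio.mp3
```

The same functions work for Step 2 by passing image as the type.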

Step 2: Upload your portrait image

# Create the asset record
curl -X POST https://api.hedra.com/web-app/public/assets \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $HEDRA_API_KEY" \
  -d '{
    "name": "portrait.png",
    "type": "image"
  }'

# Upload the file
curl -X POST https://api.hedra.com/web-app/public/assets/{image_asset_id}/upload \
  -H "X-API-Key: $HEDRA_API_KEY" \
  -F "file=@/path/to/portrait.png"

Step 3: Generate the avatar video

Use the Hedra Avatar model (26f0fc66-152b-40ab-abed-76c43df99bc8):
curl -X POST https://api.hedra.com/web-app/public/generations \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $HEDRA_API_KEY" \
  -d '{
    "type": "video",
    "ai_model_id": "26f0fc66-152b-40ab-abed-76c43df99bc8",
    "start_keyframe_id": "{image_asset_id}",
    "audio_id": "{audio_asset_id}",
    "generated_video_inputs": {
      "text_prompt": "A person speaking to the camera",
      "aspect_ratio": "9:16",
      "resolution": "720p",
      "duration_ms": 600000
    }
  }'
Set duration_ms to 600000 for a 10-minute video (10 minutes = 600 seconds = 600,000 ms). The video duration will match your audio length.
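
Because the video length follows the audio, it can help to derive duration_ms from the audio file rather than typing it by hand. The sketch below is one way to do that, assuming FFmpeg's ffprobe is installed; the function names seconds_to_ms and audio_duration_ms are hypothetical helpers, not part of the Hedra API.

```shell
# seconds_to_ms <seconds> -> whole milliseconds (accepts decimals, rounds)
seconds_to_ms() {
  awk -v s="$1" 'BEGIN { printf "%.0f\n", s * 1000 }'
}

# audio_duration_ms <file> -> duration of the file in milliseconds
# Assumption: ffprobe (part of FFmpeg) is available on PATH.
audio_duration_ms() {
  seconds_to_ms "$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1")"
}

# Usage:
#   seconds_to_ms 600                                # prints 600000
#   audio_duration_ms /path/to/ten-minute-audio.mp3
```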

Using Hedra Omnia

For motion avatar videos with full-body movement, swap the model ID to Hedra Omnia:
curl -X POST https://api.hedra.com/web-app/public/generations \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $HEDRA_API_KEY" \
  -d '{
    "type": "video",
    "ai_model_id": "ab372b84-432f-44f5-bacc-c2542465f712",
    "start_keyframe_id": "{image_asset_id}",
    "audio_id": "{audio_asset_id}",
    "generated_video_inputs": {
      "text_prompt": "A person gesturing expressively while speaking",
      "aspect_ratio": "9:16",
      "resolution": "720p",
      "duration_ms": 8000
    }
  }'
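
Here duration_ms is 8000 because Hedra Omnia is listed with a maximum of 8 seconds, while Hedra Avatar is listed with up to 10 minutes. A small client-side check can catch an over-long request before it is sent. This is a sketch based only on the limits in the model table above; the helper names are hypothetical, and how the API itself responds to an over-long request is not covered here.

```shell
AVATAR_MODEL_ID="26f0fc66-152b-40ab-abed-76c43df99bc8"
OMNIA_MODEL_ID="ab372b84-432f-44f5-bacc-c2542465f712"

# max_duration_ms <model_id> -> documented limit in ms; fails for unknown IDs
max_duration_ms() {
  case "$1" in
    "$AVATAR_MODEL_ID") echo 600000 ;;  # up to 10 minutes
    "$OMNIA_MODEL_ID")  echo 8000 ;;    # up to 8 seconds
    *) return 1 ;;
  esac
}

# check_duration <model_id> <duration_ms>
# Returns 0 if within the limit, 1 if too long, 2 if the model is unknown.
check_duration() {
  local limit
  limit="$(max_duration_ms "$1")" || return 2
  [ "$2" -le "$limit" ]
}

# Usage:
#   check_duration "$OMNIA_MODEL_ID" 8000 && echo "ok to submit"
```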

Inline audio generation

Instead of uploading audio separately, you can generate speech inline by passing audio_generation instead of audio_id. To find a voice_id, list the available voices:
curl https://api.hedra.com/web-app/public/voices \
  -H "X-API-Key: $HEDRA_API_KEY"
See the Generate Audio guide for the full list of voice options, including voice cloning. Then pass the chosen voice_id in audio_generation:
curl -X POST https://api.hedra.com/web-app/public/generations \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $HEDRA_API_KEY" \
  -d '{
    "type": "video",
    "ai_model_id": "26f0fc66-152b-40ab-abed-76c43df99bc8",
    "start_keyframe_id": "{image_asset_id}",
    "audio_generation": {
      "type": "text_to_speech",
      "voice_id": "f412c62f-e94f-41c0-bfc6-97f63289941c",
      "text": "Hello, this is a demo of inline audio generation for avatar videos."
    },
    "generated_video_inputs": {
      "text_prompt": "A person speaking to the camera",
      "aspect_ratio": "9:16",
      "resolution": "540p"
    }
  }'
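
When the spoken text comes from a variable, quotes inside it will break a hand-written JSON body. The sketch below builds the same request body as above with a minimal escaping step. The helpers json_escape and tts_payload are hypothetical, and the escaping only covers backslashes and double quotes; a JSON tool such as jq is the more robust choice when it is available.

```shell
AVATAR_MODEL_ID="26f0fc66-152b-40ab-abed-76c43df99bc8"

# json_escape <string> -> string with backslashes and double quotes escaped
# Limitation: newlines and other control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# tts_payload <image_asset_id> <voice_id> <text> -> JSON request body
tts_payload() {
  cat <<EOF
{
  "type": "video",
  "ai_model_id": "$AVATAR_MODEL_ID",
  "start_keyframe_id": "$1",
  "audio_generation": {
    "type": "text_to_speech",
    "voice_id": "$2",
    "text": "$(json_escape "$3")"
  },
  "generated_video_inputs": {
    "text_prompt": "A person speaking to the camera",
    "aspect_ratio": "9:16",
    "resolution": "540p"
  }
}
EOF
}

# Usage (curl reads the body from stdin with -d @-):
#   tts_payload "$image_asset_id" "$voice_id" 'She said "hello".' |
#     curl -sS -X POST https://api.hedra.com/web-app/public/generations \
#       -H "Content-Type: application/json" \
#       -H "X-API-Key: $HEDRA_API_KEY" \
#       -d @-
```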

Step 4: Poll for completion

Long-form avatar videos take longer to generate. Poll the status endpoint and use the progress (0 to 1) and eta_sec fields in the response to estimate the remaining time:
curl https://api.hedra.com/web-app/public/generations/{generation_id}/status \
  -H "X-API-Key: $HEDRA_API_KEY"
When status is "complete", the response includes an asset_id for the generated video.
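
The status check can be repeated in a loop until the generation finishes. The following is a minimal sketch under stated assumptions: status and asset_id are taken to be top-level string fields in the response, the function names are hypothetical, and the sed-based field extraction stands in for a real JSON parser such as jq. Only the "complete" status is taken from this guide; any other value is treated as "keep waiting" until the attempt limit is reached.

```shell
BASE_URL="https://api.hedra.com/web-app/public"

# fetch_status <generation_id> -> raw status JSON (network call)
fetch_status() {
  curl -sS "$BASE_URL/generations/$1/status" \
    -H "X-API-Key: $HEDRA_API_KEY"
}

# json_string_field <name> -> value of a string field, read from stdin
# Assumption: the value contains no escaped quotes.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# poll_generation <generation_id> [interval_sec] [max_attempts] [fetch_fn]
# Prints the asset_id once status is "complete"; returns 1 on timeout or
# if the fetch command fails.
poll_generation() {
  local id="$1"
  local interval="${2:-10}"
  local max="${3:-360}"
  local fetch="${4:-fetch_status}"
  local attempt=1 body status
  while [ "$attempt" -le "$max" ]; do
    body="$("$fetch" "$id")" || return 1
    status="$(printf '%s' "$body" | json_string_field status)"
    if [ "$status" = "complete" ]; then
      printf '%s' "$body" | json_string_field asset_id
      return 0
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  return 1
}

# Usage:
#   video_asset_id=$(poll_generation "$generation_id" 15 240)
```

The fetch function is passed in as a parameter so the loop can be exercised with a stub before pointing it at the live endpoint.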