Overview

The Voice API manages voice samples and generates voice embeddings for text-to-speech. Upload a voice recording, and the service automatically splits it into samples and creates embeddings via a processing pipeline.

Typical Flow

Upload a voice recording to create a new voice
The recording is automatically split into Opus-encoded samples
An embedding is generated from the samples via RunPod
Retrieve test samples and quality scores to verify the result
Download the embedding zip or individual samples as needed
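
Because splitting and embedding run asynchronously after the upload, a client usually has to wait before retrieving test samples and scores. The following is a minimal polling sketch, not the service's documented client. The status names ("splitting", "embedding") and the `fetch_status` callable are assumptions made for illustration; replace them with whatever the status endpoint actually returns.

```python
import time
from typing import Callable

# Hypothetical status values; the real API may name its stages differently.
IN_PROGRESS = {"splitting", "embedding"}


def wait_until_ready(
    fetch_status: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the voice is no longer being split or embedded.

    fetch_status is any callable that returns the voice's current status,
    for example a thin wrapper around an HTTP GET (not shown here).
    Returns the first status that is not an in-progress stage.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status not in IN_PROGRESS:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"voice still {status!r} after {timeout} seconds")
        sleep(poll_interval)
```

Passing `sleep` and `clock` as parameters keeps the helper easy to test without real delays or network access.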

Features

Automatic processing: Upload an audio file, and splitting and embedding run automatically
Manual sample upload: Create an empty voice and upload individual samples
Re-splitting: Re-process the original recording with custom parameters (sample count, duration)
Quality scoring: Each sample receives a similarity score after embedding
Test samples: TTS-generated test audio to preview voice quality
Sample bundles: Download all samples as a zip archive with metadata
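
One way to use the quality scores is to flag samples that match the voice poorly so they can be reviewed or replaced. The sketch below assumes each sample is a mapping with hypothetical `id` and `similarity` fields, that higher scores mean a closer match, and that 0.75 is a reasonable cutoff. None of these details come from the API itself.

```python
from typing import Iterable, List, Mapping


def low_scoring_samples(
    samples: Iterable[Mapping[str, object]],
    threshold: float = 0.75,
) -> List[object]:
    """Return the ids of samples whose similarity score is below threshold.

    The field names "id" and "similarity" are illustrative assumptions.
    """
    return [s["id"] for s in samples if float(s["similarity"]) < threshold]


# Example input with made-up values
example = [
    {"id": "a", "similarity": 0.91},
    {"id": "b", "similarity": 0.62},
    {"id": "c", "similarity": 0.75},
]
flagged = low_scoring_samples(example)
```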

Important Considerations

Audio conversion: All uploads are automatically converted to Opus at 96 kbps, mono
Processing locks: Voices cannot be modified while splitting or embedding is in progress
Rate limits: Requests are rate-limited per API key
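
Processing locks and rate limits both mean that a valid request can be rejected temporarily, so clients may want to retry with backoff. The sketch below is one possible approach, not the service's prescribed behavior. It assumes the API signals rate limiting with HTTP 429 and a locked voice with HTTP 409. Those status codes, and the `send` callable, are assumptions made for illustration.

```python
import random
import time
from typing import Callable, Tuple

# Assumed status codes: 429 for rate limiting, 409 for a voice that is
# locked while splitting or embedding. Check the API reference for the
# codes the service really returns.
RETRYABLE = {429, 409}


def call_with_backoff(
    send: Callable[[], Tuple[int, object]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> Tuple[int, object]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send performs one request and returns (status_code, body).
    Delays double on each retry, are capped at max_delay, and are scaled
    by a random factor between 0.5 and 1.0 to spread out retries.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if status not in RETRYABLE or attempt >= max_attempts:
            return status, body
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(delay * (0.5 + 0.5 * jitter()))
```

If every attempt is rejected, the helper returns the last response rather than raising, so the caller decides how to report the failure.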
