Overview
The Voice API manages voice samples and generates voice embeddings for text-to-speech. Upload a voice recording, and the service automatically splits it into samples and creates embeddings via a processing pipeline.
Typical Flow
- Upload a voice recording to create a new voice
- The recording is automatically split into Opus-encoded samples
- An embedding is generated from the samples via RunPod
- Retrieve test samples and quality scores to verify the result
- Download the embedding zip or individual samples as needed
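Because splitting and embedding run asynchronously after the upload, a client usually has to wait before it retrieves test samples or downloads anything. The following is a minimal polling sketch, not the API's documented client. The status names `splitting` and `embedding` and the `get_status` callable are assumptions made for illustration; substitute whatever status endpoint and values the API actually exposes.

```python
import time
from typing import Callable

# Hypothetical status values; the real API's names may differ.
IN_PROGRESS = {"splitting", "embedding"}


def wait_until_ready(
    get_status: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports a status outside IN_PROGRESS.

    get_status is any zero-argument callable that returns the voice's
    current status string (for example, a wrapper around an HTTP GET).
    Returns the first status that is not an in-progress one.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status not in IN_PROGRESS:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"voice still {status!r} after {timeout} s")
        sleep(poll_interval)
```

Passing `get_status` and `sleep` as parameters keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned statuses.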
Features
- Automatic processing: Upload an audio file, and splitting and embedding run automatically
- Manual sample upload: Create an empty voice and upload individual samples
- Re-splitting: Re-process the original recording with custom parameters (sample count, duration)
- Quality scoring: Each sample receives a similarity score after embedding
- Test samples: TTS-generated test audio to preview voice quality
- Sample bundles: Download all samples as a zip archive with metadata
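One way to use the per-sample quality scores is to flag weak samples before deciding whether to re-split or replace them. The sketch below assumes each sample is a dictionary with hypothetical `id` and `score` fields and that a higher similarity score is better; the `0.8` threshold is an arbitrary illustration, not a value recommended by the API.

```python
from typing import Iterable


def low_quality_samples(samples: Iterable[dict], threshold: float = 0.8) -> list[str]:
    """Return the IDs of samples scoring below threshold, worst first.

    The "id" and "score" keys are assumed field names for illustration.
    """
    flagged = [s for s in samples if s["score"] < threshold]
    flagged.sort(key=lambda s: s["score"])  # lowest similarity first
    return [s["id"] for s in flagged]
```

For example, given scores of 0.91, 0.42, and 0.77 for samples `a`, `b`, and `c`, the function returns `["b", "c"]`.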
Important Considerations
- Audio conversion: All uploads are automatically converted to Opus at 96 kbps, mono
- Processing locks: Voices cannot be modified while splitting or embedding is in progress
- Rate limits: Requests are rate-limited per API key
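Processing locks and rate limits both mean that a valid request can be rejected temporarily, so clients may want to retry with a growing delay. The sketch below assumes the API signals these conditions with HTTP status 409 (voice locked) and 429 (rate limited); those codes, and the `request` callable that returns a `(status, body)` pair, are assumptions for illustration rather than documented behavior.

```python
import time
from typing import Callable, Tuple

# Assumed status codes: 409 for a locked voice, 429 for a rate limit.
RETRYABLE = {409, 429}


def call_with_backoff(
    request: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call request() until it returns a non-retryable status.

    Waits base_delay, then 2x, 4x, ... between attempts. After
    max_attempts the last response is returned as-is so the caller
    can decide how to handle it.
    """
    for attempt in range(max_attempts):
        status, body = request()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body
```

If the API returns a `Retry-After` header on rate-limited responses, honoring that value would be preferable to a fixed exponential schedule.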