How can I create a custom voice?

Navigate to Assets → Voices in the left menu and click the Clone your voice button, or go directly to the recording page.

Here, you can also set the name and gender of the voice. The gender can influence which avatars can be paired with your new voice.

How should I prepare for the recording?

Schedule a time to record when you won't be interrupted or distracted. Allow enough time to record several takes if necessary. Choose a quiet, distraction-free recording area. Avoid recording in places with a lot of background noise, such as busy streets or near construction sites. Even quieter background noise, such as air conditioning or people chattering, will produce worse results.

Do I need special equipment?

For the best voice quality, we recommend using a high-quality microphone. Even an affordable condenser microphone can significantly improve audio clarity and overall sound quality.

How long should the recording be?

For optimal results, record approximately 5 minutes of audio. A file close to, but not over, the 10MB size limit provides plenty of recorded content while staying within the allowed size.
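As a rough check on how the 5-minute length and the 10MB limit fit together, the sketch below works out the average bitrate a recording can have and still fit. It assumes "10MB" means 10,000,000 bytes (the exact definition is not stated here), and the function names are illustrative only, not part of any product API.

```python
def max_bitrate_kbps(duration_seconds: float, size_limit_bytes: int = 10_000_000) -> float:
    """Average bitrate (kilobits per second) that exactly fills the size limit."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return size_limit_bytes * 8 / duration_seconds / 1000


def pcm_size_bytes(duration_seconds: float, sample_rate_hz: int,
                   bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Approximate size of uncompressed PCM audio, ignoring the small file header."""
    return int(duration_seconds * sample_rate_hz * bytes_per_sample * channels)


five_minutes = 5 * 60
print(round(max_bitrate_kbps(five_minutes), 1))   # 266.7
print(pcm_size_bytes(five_minutes, 44_100))       # 26460000
print(pcm_size_bytes(five_minutes, 16_000))       # 9600000
```

Under these assumptions, a 5-minute file has to average below roughly 267 kbps. An uncompressed 16-bit mono recording at 44.1 kHz would be about 26 MB and too large, while a compressed file or a lower sample rate would fit.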

Will the custom voice have my accent?

No. Our custom voice technology is currently unable to capture or control intonation, accents, and speed.

Can I record in a different language?

Yes, we have scripts available in multiple languages.

On the recording page, after selecting "Record your voice," you can change the script language.

How can I adjust my custom voice?

Stability: Increasing stability helps to keep a consistent quality for the voice in longer scenes, but makes it less expressive. High stability is recommended for most applications to maintain a consistent tone and quality. However, reducing stability can sometimes introduce interesting variations, though it may also lead to less predictable results.

Similarity: A high similarity means the custom voice will sound very much like the original voice, while a low similarity results in a more generic voice that captures fewer distinctive traits of the original.

Style exaggeration: Picks up characteristics of the original recording (expressiveness, monotony, longer pauses, etc.) and exaggerates them when the value is increased. It is generally recommended to keep this setting at 0 to avoid over-exaggeration unless you specifically want a more dramatic effect. Higher exaggeration can also increase latency and may lead to instability with lower-quality input voices.

Can I upload an audio file instead of a recording?

Yes, you can upload a good-quality audio file instead of recording. The size limit is 10MB, and we recommend about 5 minutes of audio.

(Cloning your voice requires a sample. Sample quality is more important than quantity. Noisy samples may result in bad voices. Providing about 5 minutes of audio gives enough to process your voice. You can expect your cloned voice to be ready in 5-10 minutes.)
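If you want to check a file before uploading it, a small script can compare it against the size limit and recommended length given above. The following is a minimal sketch, not an official tool. It reads only WAV files, because Python's standard library can parse them; whether WAV is an accepted upload format is not stated here. The 10,000,000-byte reading of "10MB", the 80% length threshold, and the function name are all assumptions.

```python
import os
import wave

SIZE_LIMIT_BYTES = 10_000_000   # assumption: "10MB" read as 10,000,000 bytes
TARGET_SECONDS = 5 * 60         # recommended sample length
MIN_FRACTION = 0.8              # assumption: warn below 80% of the recommended length


def check_wav_sample(path: str) -> list[str]:
    """Return human-readable problems found in a WAV sample (empty list if none)."""
    problems = []

    size = os.path.getsize(path)
    if size > SIZE_LIMIT_BYTES:
        problems.append(f"file is {size} bytes, over the {SIZE_LIMIT_BYTES}-byte limit")

    # Duration = number of audio frames divided by frames per second.
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
    if duration < TARGET_SECONDS * MIN_FRACTION:
        problems.append(
            f"recording is only {duration:.0f} s; about {TARGET_SECONDS} s is recommended"
        )
    return problems
```

Calling `check_wav_sample("my_voice.wav")` on a hypothetical file returns an empty list when neither check fails. It says nothing about background noise, which still has to be judged by listening.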

Supported languages

🇦🇪 Arabic (UAE)
🇧🇬 Bulgarian
🇨🇳 Chinese
🇭🇷 Croatian
🇨🇿 Czech
🇩🇰 Danish
🇳🇱 Dutch
🇬🇧 English (UK)
🇺🇸 English (US)
🇫🇮 Finnish
🇫🇷 French (France)
🇩🇪 German
🇬🇷 Greek
🇮🇳 Hindi
🇭🇺 Hungarian
🇮🇩 Indonesian
🇮🇹 Italian
🇯🇵 Japanese
🇰🇷 Korean
🇲🇾 Malay
🇳🇴 Norwegian
🇵🇱 Polish
🇵🇹 Portuguese (Portugal)
🇷🇴 Romanian
🇷🇺 Russian
🇸🇰 Slovak
🇪🇸 Spanish (Spain)
🇸🇪 Swedish
🇮🇳 Tamil
🇹🇷 Turkish
🇺🇦 Ukrainian
🇻🇳 Vietnamese

Who has access to my custom voices?

Cloned Voices (also called Custom Voices) are owned by the user who creates them.

Cloned Voices are shared in the Workspace they are created in.

Cloned Voices can be shared with other Workspaces.

Cloned Voices can be unshared from a Workspace.

Cloned Voices must be shared in at least one Workspace at all times.

If a Cloned Voice is shared in a Workspace, members can use that Cloned Voice in videos.