How can I create a custom voice?
Navigate to Assets → Voices in the left menu and click the Clone your voice button, or go directly to the recording page.
Here you can also set the name and gender of the voice. The gender can influence which avatars can be paired with your new voice.
How should I prepare for the recording?
Schedule a time to record when you won't be interrupted or distracted, and allow enough time to record several takes if necessary. Choose a quiet recording area and avoid places with a lot of background noise, such as busy streets or construction sites. Even moderate background noise, such as air conditioning or people chatting, will make the resulting voice worse.
Do I need special equipment?
For the best voice quality, we recommend using a high-quality microphone. Even an affordable condenser microphone can significantly improve audio clarity and overall sound quality.
How long should the recording be?
For optimal results, record an audio file that is approximately 5 minutes long and close to, but not over, the 10MB file size limit. This gives the system enough recorded content while staying within the allowed size.
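To see which audio settings fit 5 minutes into 10MB, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, it assumes that 1MB means 1,000,000 bytes, and it assumes a constant bitrate, which real encoders only approximate.

```python
def max_bitrate_kbps(size_limit_mb: float, duration_s: float) -> float:
    """Highest average bitrate (kbit/s) that fits duration_s seconds into size_limit_mb."""
    size_bits = size_limit_mb * 1_000_000 * 8  # assumption: 1MB = 1,000,000 bytes
    return size_bits / duration_s / 1000


def file_size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate size in MB of a constant-bitrate recording."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


LIMIT_MB = 10            # file size limit stated above
TARGET_SECONDS = 5 * 60  # recommended recording length stated above

# Bitrate ceiling for a 5-minute file under the limit.
print(round(max_bitrate_kbps(LIMIT_MB, TARGET_SECONDS)))      # 267

# Compressed audio at common bitrates.
print(round(file_size_mb(128, TARGET_SECONDS), 1))            # 4.8
print(round(file_size_mb(256, TARGET_SECONDS), 1))            # 9.6

# Uncompressed 16-bit mono audio at 44.1 kHz (705.6 kbit/s).
print(round(file_size_mb(44_100 * 16 / 1000, TARGET_SECONDS), 1))  # 26.5
```

Under these assumptions, a 5-minute recording has to average below roughly 267 kbit/s. A compressed file at 256 kbit/s lands close to the limit, while 5 minutes of uncompressed 44.1 kHz audio would exceed it.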
Will the custom voice have my accent?
No. Our custom voice technology is currently unable to capture or control intonation, accent, or speed.
Can I record in a different language?
Yes, we have scripts available in multiple languages.
On the recording page, after selecting "Record your voice," you can change the script language.
How can I adjust my custom voice?
Stability: Increasing stability helps to keep a consistent quality for the voice in longer scenes, but makes it less expressive. High stability is recommended for most applications to maintain a consistent tone and quality. However, reducing stability can sometimes introduce interesting variations, though it may also lead to less predictable results.
Similarity: A high similarity means the custom voice will sound very much like the original voice, while a low similarity results in a more generic voice that captures fewer distinctive traits of the original.
Style exaggeration: Picks up characteristics of the original recording (expressiveness, monotony, longer pauses, etc.) and exaggerates them when the value is increased. It is generally recommended to keep this setting at 0 to avoid over-exaggeration unless you specifically want a more dramatic effect. Higher exaggeration can also increase latency and may lead to instability with lower-quality input voices.
Can I upload an audio file instead of a recording?
Yes, you can upload a good-quality audio file instead of recording. The size limit is 10MB, and we recommend providing about 5 minutes of audio.
(Cloning your voice requires a sample. Sample quality is more important than quantity. Noisy samples may result in bad voices. Providing about 5 minutes of audio gives enough to process your voice. You can expect your cloned voice to be ready in 5-10 minutes.)
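If you want to check a file before uploading it, a short script can report its size and length. The sketch below uses only the Python standard library and makes several assumptions that are not confirmed by this article: that your sample is an uncompressed WAV file, that the 10MB limit means 10,000,000 bytes, and the helper name `describe_wav`, which is hypothetical.

```python
import os
import wave

MAX_BYTES = 10 * 1_000_000  # assumption: the 10MB limit means 10,000,000 bytes


def describe_wav(path):
    """Return the size, duration, and size-limit status of a WAV file."""
    size_bytes = os.path.getsize(path)
    with wave.open(path, "rb") as wav:
        # Duration is the number of audio frames divided by frames per second.
        duration_s = wav.getnframes() / wav.getframerate()
    return {
        "size_bytes": size_bytes,
        "duration_s": duration_s,
        "within_size_limit": size_bytes <= MAX_BYTES,
    }


# Example usage with a hypothetical file name:
# info = describe_wav("my_voice_sample.wav")
# print(f"{info['duration_s'] / 60:.1f} min, {info['size_bytes'] / 1_000_000:.1f} MB")
```

The `wave` module reads only WAV files, so a compressed format such as MP3 would need a different tool to measure its duration.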
Supported languages
- 🇦🇪 Arabic (UAE)
- 🇧🇬 Bulgarian
- 🇨🇳 Chinese
- 🇭🇷 Croatian
- 🇨🇿 Czech
- 🇩🇰 Danish
- 🇳🇱 Dutch
- 🇬🇧 English (UK)
- 🇺🇸 English (US)
- 🇫🇮 Finnish
- 🇫🇷 French (France)
- 🇩🇪 German
- 🇬🇷 Greek
- 🇮🇳 Hindi
- 🇭🇺 Hungarian
- 🇮🇩 Indonesian
- 🇮🇹 Italian
- 🇯🇵 Japanese
- 🇰🇷 Korean
- 🇲🇾 Malay
- 🇳🇴 Norwegian
- 🇵🇱 Polish
- 🇵🇹 Portuguese (Portugal)
- 🇷🇴 Romanian
- 🇷🇺 Russian
- 🇸🇰 Slovak
- 🇪🇸 Spanish (Spain)
- 🇸🇪 Swedish
- 🇮🇳 Tamil
- 🇹🇷 Turkish
- 🇺🇦 Ukrainian
- 🇻🇳 Vietnamese