This service clones voices in five seconds and converts text into speech across four languages. Users upload a short audio sample, and the system generates a synthetic version that matches the original speaker's tone and cadence. English, Chinese, Japanese, and Korean all work with the cross-language feature, which lets you enter text in one language while keeping the voice characteristics cloned from a sample in another.
Voice similarity hits 70.5% on the free tier. Paid subscribers get 99.5%. That 29-point jump matters for professional work where listeners need to believe the voice belongs to the claimed speaker. The free version also caps input at 30 characters per conversion, while paid plans handle 10,000 characters at once. Content creators working with longer scripts will hit that 30-character wall constantly.
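To make the gap between the two per-conversion caps concrete, here is a minimal Python sketch that counts how many separate conversions a script would need under each limit. The 30 and 10,000 character caps come from the figures above; the 1,500-character script length and the function name are hypothetical, chosen only for illustration. The count is a lower bound, since splitting at word boundaries rather than at exact character offsets would likely require more pieces.

```python
import math

FREE_CAP = 30       # characters per conversion on the free tier (from the text)
PAID_CAP = 10_000   # characters per conversion on paid plans (from the text)


def conversions_needed(script_length: int, cap: int) -> int:
    """Minimum number of conversions needed to cover script_length characters."""
    if script_length <= 0:
        return 0
    return math.ceil(script_length / cap)


script_length = 1_500  # hypothetical script length, roughly a short narration
free_runs = conversions_needed(script_length, FREE_CAP)
paid_runs = conversions_needed(script_length, PAID_CAP)
print(free_runs, paid_runs)  # prints: 50 1
```

Under these assumptions, a script that fits in a single paid conversion takes at least 50 separate conversions on the free tier.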
The service includes pre-made celebrity voices. Donald Trump, Elon Musk, Cristiano Ronaldo, and Emma Watson are available alongside cartoon characters like Rick and Sonic. This removes the cloning step entirely for users who want recognizable voices without recording samples. Whether these celebrity clones match the 99.5% similarity standard isn't specified, but they exist as ready-to-use options.
Beyond basic voice cloning, this system handles emotion control. You can adjust how the synthetic voice delivers text, though the mechanics of this control aren't detailed. Cross-language emotion preservation means a voice cloned from an English sample can supposedly maintain emotional inflection when generating Japanese speech. The Unlimited plan specifically lists this feature, suggesting lower tiers might not include it.
The feature set extends into audio production. AI Podcast creation, AI Cover Song generation, AI Accompaniment, and AI Dubbing all appear in the feature list. Speech-to-text runs the process in reverse, converting audio into written transcripts. AI Denoising cleans up background noise, while AI Voice Replacement swaps one voice for another in existing recordings. High-fidelity voice restoration presumably improves degraded audio, though no technical specifications explain the restoration process.
Free accounts get 500 characters for text-to-speech and 500 seconds for speech-to-text conversions. That's roughly one paragraph of generated speech and eight minutes of transcription. No technical support comes with free access. Pro plans start at $10.90 monthly when billed yearly, jumping to $25.90 for month-to-month billing. You get 2 million characters for TTS, 2 million seconds for STT, commercial usage rights, and priority processing that runs five times faster than standard speed.
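Because the speech-to-text quotas are quoted in seconds, they are easier to compare once converted to larger units. The sketch below is a simple unit conversion in Python using the 500-second free quota and the 2-million-second Pro quota from the text; the helper names are made up for this example.

```python
FREE_STT_SECONDS = 500        # free-tier speech-to-text quota (from the text)
PRO_STT_SECONDS = 2_000_000   # Pro speech-to-text quota (from the text)


def seconds_to_minutes(seconds: float) -> float:
    """Convert a duration in seconds to minutes."""
    return seconds / 60


def seconds_to_hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600


print(f"Free: {seconds_to_minutes(FREE_STT_SECONDS):.1f} minutes")  # Free: 8.3 minutes
print(f"Pro: {seconds_to_hours(PRO_STT_SECONDS):.1f} hours")        # Pro: 555.6 hours
```

The free quota works out to a little over eight minutes of audio, while the Pro quota is on the order of 550 hours per month.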
Unlimited plans cost $26.90 per month when billed yearly or $36.90 month-to-month. Quotas triple to 6 million characters for TTS and 6 million seconds for STT. The cross-language emotion preservation feature appears exclusively here. All paid quotas expire after one month, so unused characters and seconds don't roll over. The service offers 24-hour refunds and claims 50% savings on annual subscriptions compared to monthly billing.
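The 50% savings claim can be checked against the listed prices. The Python sketch below assumes the yearly figures for both plans are per-month prices under annual billing (the text states this for Pro; for Unlimited it is an assumption), and the function name is hypothetical.

```python
def annual_savings_pct(monthly_price: float, annual_billed_monthly: float) -> float:
    """Percentage saved per month by choosing annual billing over month-to-month."""
    return (monthly_price - annual_billed_monthly) / monthly_price * 100


# Prices from the text; treated as per-month figures (an assumption for Unlimited).
pro_savings = annual_savings_pct(25.90, 10.90)
unlimited_savings = annual_savings_pct(36.90, 26.90)

print(f"Pro: {pro_savings:.1f}%")              # Pro: 57.9%
print(f"Unlimited: {unlimited_savings:.1f}%")  # Unlimited: 27.1%
```

Under that reading, annual billing saves about 58% on Pro but only about 27% on Unlimited, so the 50% figure does not describe either plan exactly.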
Over one million users have signed up. That number includes free accounts, so it doesn't indicate how many pay for the service. Priority processing on paid tiers suggests free users face longer wait times, though exact queue lengths aren't disclosed.
Content creators working in multiple languages represent the core audience. Someone producing Japanese podcast episodes from English scripts could clone their voice once and generate content in both languages. Professional creators need the commercial rights that come with paid plans. Businesses dubbing training videos or marketing content across regions would use the cross-language features and emotion controls to maintain brand voice consistency.