Microsoft AI can clone a voice from just three seconds of audio.

Microsoft’s new text-to-speech AI can replicate voices, including tone and pitch, from a brief three-second audio clip. Despite being a complicated system under the hood, VALL-E, described as a “neural codec language model”, is extremely simple to use: it only requires an audio sample and the text to be spoken.

The developers of the programme believe it can be applied to high-quality text-to-speech tasks, including speech editing and audio content production. Microsoft’s system builds on EnCodec, the neural audio codec Meta unveiled in October of the previous year.

VALL-E analyses how someone sounds and, using EnCodec, breaks that information down into discrete components, producing discrete audio codec codes from the text and the acoustic prompt. It then draws on its training data to estimate how that voice would sound if it delivered a different phrase.
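A rough idea of that pipeline is sketched below. This is a minimal illustration rather than Microsoft’s code: it uses Meta’s open-source encodec package to turn a three-second prompt into codec tokens, and a stub vall_e_generate function stands in for the VALL-E model itself (which this sketch simply assumes; here the stub just echoes the prompt tokens so the script runs end to end).

```python
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# Load Meta's EnCodec neural codec (24 kHz model); it converts audio into
# discrete codec tokens, the representation VALL-E is described as predicting.
codec = EncodecModel.encodec_model_24khz()
codec.set_target_bandwidth(6.0)

# Three-second enrollment clip of the target speaker.
wav, sr = torchaudio.load("speaker_prompt_3s.wav")
wav = convert_audio(wav, sr, codec.sample_rate, codec.channels).unsqueeze(0)

with torch.no_grad():
    frames = codec.encode(wav)                            # list of (codes, scale)
prompt_codes = torch.cat([c for c, _ in frames], dim=-1)  # [1, n_codebooks, T]

def vall_e_generate(text: str, prompt_codes: torch.Tensor) -> torch.Tensor:
    # Placeholder for the VALL-E model: a real model would autoregressively
    # predict new codec tokens conditioned on the text and the speaker prompt.
    # Here we simply return the prompt tokens so the sketch is runnable.
    return prompt_codes

text = "This sentence was never spoken by the target speaker."
generated_codes = vall_e_generate(text, prompt_codes)

# Decode the predicted tokens back into a waveform with the same codec.
with torch.no_grad():
    audio = codec.decode([(generated_codes, None)])
torchaudio.save("cloned_voice.wav", audio.squeeze(0), codec.sample_rate)
```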

VALL-E’s speech-synthesis abilities were trained on an audio library assembled by Meta containing 60,000 hours of English speech from more than 7,000 speakers. For a convincing result, the voice in the three-second sample must closely match a voice in the training data.

By altering the random seed used in the generation process, the model can produce variations in voice tone, as shown in the samples Microsoft has published. VALL-E can also reproduce the acoustic environment of the sample audio, for example making a voice sound as though it is speaking over the phone.
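The role of the seed can be illustrated with a toy sketch (again, not VALL-E’s actual sampling code): the same next-token distribution over codec tokens, sampled under two different seeds, yields two different token sequences and therefore two differently inflected takes of the same text.

```python
import torch

# Stand-in for model outputs: 50 generation steps over a 1024-entry codec vocabulary.
logits = torch.randn(1, 50, 1024)
probs = torch.softmax(logits, dim=-1)

def sample_tokens(probs: torch.Tensor, seed: int) -> torch.Tensor:
    # The random seed the article refers to: it fixes the sampling draws.
    torch.manual_seed(seed)
    flat = probs.reshape(-1, probs.shape[-1])
    return torch.multinomial(flat, num_samples=1).reshape(probs.shape[:-1])

take_1 = sample_tokens(probs, seed=0)
take_2 = sample_tokens(probs, seed=1)
print((take_1 != take_2).float().mean())  # most positions differ between the two takes
```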

Although speech-generating software is already widely used by news sites, it typically requires a great deal of input, and the resulting voice lacks a human-like quality, unable to convey expression or inflection. VALL-E is a marked advance because it needs far less input while producing more natural and accurate results. The programme, however, poses significant risks if the model is misused, such as spoofing voice identification or impersonating speakers.