Speech Transcription and Synthesis

Use a pretrained model or third-party APIs for text-to-speech and speech-to-text

Audio Toolbox™ provides examples for small-vocabulary recognition and sound synthesis. Use the wav2vec 2.0 pretrained network to perform general speech-to-text transcription with speech2text. You can download Audio Toolbox extended functionality from File Exchange for text-to-speech and speech-to-text through interfaces to popular third-party APIs. Supported APIs include Google®, IBM® Watson, Microsoft® Azure, and Amazon®.

You can interact with speech-to-text functionality graphically in the Signal Labeler app to quickly label regions of speech.


Signal Labeler: Label signal attributes, regions, and points of interest, and extract features (Since R2019a)


speech2text: Transcribe speech signal to text (Since R2022b)
text2speech: Synthesize speech from text (Since R2022b)
speechClient: Interface with pretrained model or third-party speech service (Since R2022b)