Hi @Murtaza Mohammadi ,
The first step in developing your transcription tool is to gather a dataset of audio recordings in the target language. This dataset should include:
Audio Files: Recordings of spoken language in various contexts (e.g., conversations, speeches).
Transcriptions: Text files that contain the corresponding transcriptions of the audio files in the target alphabet.
You may need to create this dataset manually or find existing resources. Ensure that the audio quality is high and that the recordings cover a diverse range of speakers and dialects. Once you have your audio files, you need to label them. This involves creating a mapping between the audio and its corresponding text. You can use a simple CSV format for this purpose:
audio_file,transcription
audio1.wav,"transcription in target alphabet"
audio2.wav,"another transcription"
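If you store the labels this way, you can load them in MATLAB with readtable and iterate over the entries. This is just a minimal sketch; the file name labels.csv is an assumption for illustration:

```matlab
% Read the label file (assumed here to be named labels.csv)
labels = readtable('labels.csv', 'TextType', 'string');

% Pair each audio file with its transcription
for i = 1:height(labels)
    [audioIn, fs] = audioread(labels.audio_file(i));  % Audio samples and sample rate
    targetText = labels.transcription(i);             % Corresponding transcription
    % ... preprocess audioIn and use targetText as the training label
end
```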
Before you train a model, you must preprocess the audio data. This typically involves:
Resampling: Ensure all audio files are at the same sample rate.
Feature Extraction: Convert audio signals into a format suitable for model training, such as Mel-frequency cepstral coefficients (MFCCs).
Here’s a MATLAB code snippet to extract MFCC features from an audio file:

[audioIn, fs] = audioread('audio1.wav');  % Read audio file
audioIn = resample(audioIn, 16000, fs);   % Resample to 16 kHz
coeffs = mfcc(audioIn, 16000);            % Extract MFCC features
For more information on these functions, please refer to
https://www.mathworks.com/help/matlab/import_export/read-and-get-information-about-audio-files.html
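To train a sequence network, you would typically repeat this feature extraction for every labeled file and collect the results in a cell array, one matrix per recording. A rough sketch (the file names and the 16 kHz target rate follow the example above; 'LogEnergy','Ignore' keeps 13 coefficients per frame so the features match the network input size used below):

```matlab
files = {'audio1.wav', 'audio2.wav'};   % Your labeled audio files
features = cell(numel(files), 1);

for i = 1:numel(files)
    [audioIn, fs] = audioread(files{i});
    audioIn = resample(audioIn, 16000, fs);                   % Resample to 16 kHz
    coeffs = mfcc(audioIn, 16000, 'LogEnergy', 'Ignore');     % 13 coefficients per frame
    features{i} = coeffs';                                    % Transpose: features x time steps
end
```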
For transcription tasks, I would recommend recurrent neural networks (RNNs), which are commonly used for this purpose; convolutional neural networks (CNNs) can also be applied. In particular, consider Long Short-Term Memory (LSTM) networks, which are effective for sequence prediction problems. Here’s a simple example of defining an LSTM network in MATLAB:
layers = [
    sequenceInputLayer(13)                    % Input layer for MFCC features
    lstmLayer(100, 'OutputMode', 'sequence')  % LSTM layer
    fullyConnectedLayer(numClasses)           % Output layer for classes
    softmaxLayer];                            % No classificationLayer needed with trainnet
For more information, please refer to the documentation for lstmLayer.
Once your model is defined, you can train it using the labeled data. Use the trainnet function in MATLAB:
options = trainingOptions('adam', ...
    'MaxEpochs', 100, ...
    'MiniBatchSize', 32, ...
    'Verbose', 0, ...
    'Plots', 'training-progress');
net = trainnet(trainingData, layers, "crossentropy", options);
For more information on trainnet, please refer to
https://www.mathworks.com/help/deeplearning/ref/trainnet.html
After training, evaluate your model's performance on a separate test dataset. Calculate metrics such as accuracy, precision, and recall to assess how well your model transcribes audio.

Once you are satisfied with the model's performance, you can deploy it as a standalone application or integrate it into a larger system. Consider using MATLAB's App Designer to create a user-friendly interface for your transcription tool; for more information, please refer to the App Designer documentation.
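For the evaluation step, a minimal sketch of computing accuracy, precision, and recall from a confusion matrix might look like the following (predLabels and trueLabels are assumed to be categorical vectors you obtain on your test set):

```matlab
% predLabels, trueLabels: categorical vectors of predicted / true classes
C = confusionmat(trueLabels, predLabels);   % Rows: true classes, columns: predicted

accuracy  = sum(diag(C)) / sum(C(:));       % Overall accuracy
precision = diag(C) ./ sum(C, 1)';          % Per-class precision
recall    = diag(C) ./ sum(C, 2);           % Per-class recall
```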
Hope this helps you get started with your project. Please let me know if you have any further questions.