How to find word error rate of spoken sentence for regression based model?

Question

Shilpa Sonawane on 21 Oct 2023


https://in.mathworks.com/matlabcentral/answers/2036661-how-to-find-word-error-rate-of-spoken-sentence-for-regression-based-model

Commented: Shilpa Sonawane on 14 Dec 2023

I am working on visual speech synthesis. I have used the GRID dataset, which consists of short sentences. The model I developed is regression based: it takes a silent (mute) video as input and generates a speech signal. My aim is to find the word error rate of the output speech signal. I don't know how to separate the words in the input and output signals in order to compute the word error rate.

Kindly guide me about this.

1 Comment

Shilpa Sonawane on 26 Oct 2023

Thank you so much


Answer 1

Drew on 25 Oct 2023


https://in.mathworks.com/matlabcentral/answers/2036661-how-to-find-word-error-rate-of-spoken-sentence-for-regression-based-model#answer_1340396

Word Error Rate (WER) is a widely used metric for evaluating Automatic Speech Recognition (ASR) systems. To calculate WER for a visual speech synthesis (VSS) system, you need two word-level transcriptions of each sentence: a reference and a hypothesis. Standard WER alignment is then performed between the two word sequences, and the WER is the number of word substitutions, deletions, and insertions divided by the number of words in the reference. The transcriptions can be obtained in various ways. For example, the reference transcriptions might come from the labels of the visual dataset. The hypothesis transcriptions might come from the VSS system itself (if it has an intermediate representation in words), or from running ASR on the synthesized speech.

Note that WER does not capture all aspects of visual speech synthesis quality. Other evaluations, such as the objective metric perceptual evaluation of speech quality (PESQ) or subjective user studies, can assess the system from different perspectives, including audio-visual synchronization, intelligibility, naturalness, and overall usefulness of the synthesized speech.
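The alignment step described above can be sketched as a word-level edit distance. This is a minimal illustration in Python, not a definitive implementation: the function name `word_error_rate` and the example sentences are hypothetical, the hypothesis string stands in for whatever your ASR system returns for the synthesized audio, and the lowercasing and whitespace splitting are assumed normalization choices. The same dynamic-programming recurrence is straightforward to write in MATLAB.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcription must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a GRID-style reference label and an ASR transcript
# of the synthesized speech in which one word was misrecognized.
reference = "bin blue at f two now"
hypothesis = "bin blue at s two now"
print(f"WER = {word_error_rate(reference, hypothesis):.3f}")
```

For a whole test set, WER is commonly reported by summing the edit errors over all sentences and dividing by the total number of reference words, rather than averaging the per-sentence values.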

If this answer helps you, please remember to accept the answer.

1 Comment

Shilpa Sonawane on 14 Dec 2023

Thank you.
