Wav2Vec2 Word Error Rate is too high for librispeech-clean

#66
by GokseninYuksel - opened

Hello Hugging Face Audio Team,

Thank you for all your great work! I had a quick question regarding the evaluation metrics for a couple of models.

Could you help clarify why there's a significant difference in WER on the librispeech-clean dataset between facebook/wav2vec2-large (12.81 WER) and speechbrain/asr-wav2vec2-librispeech (1.77 WER)?
My hypothesis is that the Facebook model might only be using a linear CTC head, while the SpeechBrain model uses a language model head. Could you confirm if this is the case?
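
For reference, this is the metric I have in mind: a minimal sketch of WER as word-level edit distance divided by the number of reference words. The function name `wer` and the example sentences are my own, and this is not the evaluation script behind either reported number; I am also assuming the figures above are percentages, so the returned fraction would be multiplied by 100.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (before the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word out of six reference words.
score = wer("the cat sat on the mat", "the cat sat on mat")
print(f"{100 * score:.2f}")
```

Text normalization (casing, punctuation) before scoring can also shift the result, so it may matter whether both models were evaluated with the same normalization.
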
Thanks so much.

Kind Regards,
Goksenin Yuksel

GokseninYuksel changed discussion title from Wav2Vec2 Word Error Rate is too low for librispeech-clean to Wav2Vec2 Word Error Rate is too high for librispeech-clean
