Bhilai Institute of Technology - Durg, India
This session introduces Whisper, a speech-to-text model. In brief: Whisper is a speech-to-text model from OpenAI, trained on 680,000 hours of diverse audio data. It uses a Transformer architecture to convert 30-second audio segments into text, and it can identify languages, produce timestamps, and translate non-English speech into English.
The session provides an overview of Whisper, a speech-to-text model developed by OpenAI. Whisper converts spoken language into written text, which makes it useful for a wide range of applications. It is trained on 680,000 hours of diverse audio data collected from the web, which helps it handle many languages and accents. Whisper uses an encoder-decoder Transformer architecture: the encoder processes the audio, and the decoder generates the corresponding text.
The model works by splitting audio into 30-second segments and converting each segment into a log-Mel spectrogram, a time-frequency representation of the sound. The encoder processes this spectrogram, and the decoder generates the text from the encoder's output. Besides transcribing speech, Whisper can identify the language being spoken, provide phrase-level timestamps, transcribe speech in multiple languages, and translate non-English speech into English.
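The preprocessing described above (30-second segments, then a log-Mel spectrogram) can be sketched in a few lines of NumPy. This is an illustrative sketch, not Whisper's actual implementation: the function names are hypothetical, the 16 kHz sample rate, 25 ms window, 10 ms stride, and 80 Mel bands are assumptions based on the original Whisper models, and the final normalization step that Whisper applies is left out.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Hz (assumed input rate)
CHUNK_SECONDS = 30     # segment length described in the text
N_FFT = 400            # 25 ms analysis window (assumed)
HOP = 160              # 10 ms stride between frames (assumed)
N_MELS = 80            # number of Mel bands (assumed)


def split_into_chunks(audio: np.ndarray) -> list:
    """Split a mono waveform into 30-second chunks, zero-padding the last one."""
    chunk_len = SAMPLE_RATE * CHUNK_SECONDS
    chunks = []
    for start in range(0, len(audio), chunk_len):
        chunk = audio[start:start + chunk_len]
        if len(chunk) < chunk_len:
            chunk = np.pad(chunk, (0, chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)


def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SAMPLE_RATE) -> np.ndarray:
    """Triangular Mel filters with shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0, sr / 2, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    filters = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        filters[i] = np.maximum(0.0, np.minimum(rising, falling))
    return filters


def log_mel_spectrogram(chunk: np.ndarray) -> np.ndarray:
    """Return a (N_MELS, n_frames) log-Mel spectrogram for one chunk."""
    n_frames = len(chunk) // HOP
    # Reflect-pad so that each frame is centered on a multiple of HOP.
    padded = np.pad(chunk, (N_FFT // 2, N_FFT // 2), mode="reflect")
    # Row k of idx selects the N_FFT samples belonging to frame k.
    idx = np.arange(N_FFT)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = padded[idx] * np.hanning(N_FFT)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, N_FFT // 2 + 1)
    mel = mel_filterbank() @ power.T                   # (N_MELS, n_frames)
    # Clamp before the log so silent frames do not produce -inf.
    return np.log10(np.maximum(mel, 1e-10))


# Usage: 45 seconds of a 440 Hz tone becomes two padded 30-second chunks.
t = np.arange(SAMPLE_RATE * 45) / SAMPLE_RATE
chunks = split_into_chunks(np.sin(2 * np.pi * 440.0 * t))
spectrogram = log_mel_spectrogram(chunks[0])
print(len(chunks), spectrogram.shape)   # 2 (80, 3000)
```

With these assumed settings, each 30-second segment yields 3,000 frames of 80 Mel values, and an array of that kind is what the encoder would receive.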
This versatility makes Whisper suitable for many applications, from live transcription services to language translation tools, and shows its potential to change how we interact with and understand spoken language.
GDSC BITD