How to Convert Speech to Text in Python
23 Jun
In this tutorial, we’ll learn how to convert speech, either live from a microphone or from an audio file, into text. Speech-to-text is used in many areas, including business, content creation and chatbots. The go-to tool for this in Python is the SpeechRecognition library. It provides a single interface to many recognition engines and APIs, including the Google Web Speech API, Google Cloud Speech API, IBM Speech to Text and more.
A voice application that listens and responds typically involves three stages:
- Automatic speech recognition (ASR): transcribes the audio into text.
- Natural language processing (NLP): derives meaning from the transcribed text.
- Text-to-speech (TTS): converts text back into human-like speech.
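The three stages chain together, with each stage's output feeding the next. The sketch below shows only that wiring. `run_voice_pipeline` and the toy stage functions are hypothetical placeholders and are not part of SpeechRecognition or any other library. In a real application, the ASR stage would be the speech-to-text code from this tutorial.

```python
from typing import Callable


def run_voice_pipeline(
    audio: bytes,
    asr: Callable[[bytes], str],
    nlp: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Chain the three stages: audio -> text -> reply text -> audio."""
    transcript = asr(audio)  # stage 1: speech to text (the focus of this tutorial)
    reply = nlp(transcript)  # stage 2: work out what the text means and how to respond
    return tts(reply)        # stage 3: turn the reply back into speech


# Hypothetical toy stages that only pass strings around, to show the data flow
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_nlp(text: str) -> str:
    return f"You said: {text}"


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


print(run_voice_pipeline(b"hello", toy_asr, toy_nlp, toy_tts))  # b'You said: hello'
```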
Our focus here is the first stage: converting speech to text. We’ll go through it step by step.
Step 1: Install Libraries
Here, we’ll install the libraries our code needs to convert speech or an audio file to text.
The first is the SpeechRecognition library (imported in code as speech_recognition). Install it with the command below:
```
pip install speechrecognition
```
Next is pydub, which is very useful for manipulating audio files. Install it with this command:
```
pip install pydub
```
The last one is PyAudio, which SpeechRecognition needs in order to access the microphone. Install it with the command below:
```
pip install pyaudio
```
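Before moving on, it can help to confirm that everything installed correctly. The sketch below uses only the standard library. It checks whether each module can be found without actually importing it, and whether an `ffmpeg` executable is on the PATH, since pydub needs ffmpeg for formats other than WAV. The helper name `check_setup` is our own.

```python
import importlib.util
import shutil

# Import names differ from pip names: speechrecognition -> speech_recognition, PyAudio -> pyaudio
REQUIRED_MODULES = ("speech_recognition", "pydub", "pyaudio")


def check_setup(modules=REQUIRED_MODULES):
    """Report which modules are importable and whether ffmpeg is on the PATH."""
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # pydub reads and writes WAV natively but calls ffmpeg for other formats (MP3, etc.)
    status["ffmpeg"] = shutil.which("ffmpeg") is not None
    return status


for name, ok in check_setup().items():
    print(f"{name:20} {'OK' if ok else 'missing'}")
```

If any entry reports `missing`, rerun the corresponding `pip install` command in the same environment that runs your script.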
Step 2: Convert Speech to Text
In this step, we use the SpeechRecognition library to access the microphone. We speak, the recognizer records what we say, and recognize_google() sends the recording to the Google Web Speech API, which returns the text. This requires an internet connection. The code is shown below:
```python
# Import the library
import speech_recognition as sr

# The Recognizer does the listening and the recognition
v = sr.Recognizer()
# Use the default microphone as the audio source
with sr.Microphone() as source:
    print("Speak:")
    # Listen until a pause is detected, then keep the recorded audio
    the_audio = v.listen(source)

try:
    # Send the audio to the Google Web Speech API and print the transcript
    print("Your speech was: " + v.recognize_google(the_audio))
except sr.UnknownValueError:
    # The audio was captured but no speech could be recognized
    print("Could not understand audio")
except sr.RequestError as e:
    # The API could not be reached (e.g. no internet connection)
    print(f"Could not request results: {e}")
```
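The basic script works well in a quiet room, but background noise can make listen() keep recording long after you stop speaking. Below is a minimal sketch of a more robust variant that calibrates for noise first. `adjust_for_ambient_noise()` and the `phrase_time_limit` argument of `listen()` are part of SpeechRecognition's `Recognizer`. The function name and the default values (1 second of calibration, a 10-second limit) are our own assumptions.

```python
def transcribe_from_microphone(calibrate_seconds=1.0, phrase_time_limit=10):
    """Record one phrase from the default microphone and return the transcript (or None)."""
    # Imported inside the function so this file can be loaded even where
    # SpeechRecognition (and PyAudio) isn't installed yet
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample the background noise first so listen() uses a suitable energy threshold
        r.adjust_for_ambient_noise(source, duration=calibrate_seconds)
        print("Speak:")
        # Stop recording after phrase_time_limit seconds even if the speaker keeps talking
        audio = r.listen(source, phrase_time_limit=phrase_time_limit)

    try:
        # Send the recording to the Google Web Speech API (needs an internet connection)
        return r.recognize_google(audio)
    except sr.UnknownValueError:
        # Speech was captured but could not be understood
        return None
    except sr.RequestError as e:
        # The API could not be reached or returned an error
        print(f"API request failed: {e}")
        return None
```

Once the libraries are installed, call it with `print(transcribe_from_microphone())`.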
Step 3: Convert Audio Files to Text
In the previous step we transcribed live speech; here we work with an audio file instead. To handle our demo audio, we use the pydub library installed earlier to split the file into smaller chunks. This is mainly useful for long recordings: sending the recognizer short pieces one at a time tends to give more reliable and accurate text than sending the whole file in one request.
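Splitting into fixed-length chunks is simple arithmetic. At a sample rate of, say, 16,000 frames per second, a 7000 ms chunk holds 16,000 × 7 = 112,000 frames. A file therefore needs ceil(total length / chunk length) chunks, the last of which may be shorter. pydub's make_chunks works on the same principle. To illustrate the idea without extra dependencies, here is a minimal sketch using only Python's built-in wave module. `split_wav` is a hypothetical helper, not part of pydub; the tutorial's script itself uses pydub.

```python
import io
import math
import wave
from typing import List


def split_wav(data: bytes, chunk_length_ms: int) -> List[bytes]:
    """Split WAV bytes into consecutive chunks of chunk_length_ms (the last may be shorter)."""
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        # Number of audio frames that fit into one chunk (at least 1 to avoid dividing by zero)
        frames_per_chunk = max(1, int(params.framerate * chunk_length_ms / 1000))
        n_chunks = math.ceil(params.nframes / frames_per_chunk)
        chunks = []
        for _ in range(n_chunks):
            # readframes() continues from the current position in the file
            frames = src.readframes(frames_per_chunk)
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Same channels, sample width and rate as the source; frame count is fixed on close
                dst.setparams(params)
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```

For example, passing the bytes of `hello.wav` (read with `open("hello.wav", "rb")`) and `chunk_length_ms=7000` returns one WAV-formatted `bytes` object per 7-second piece.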
The code is shown below:
```python
# Import the libraries
import os

import speech_recognition as sr
from pydub import AudioSegment
from pydub.utils import make_chunks


def process_audio(filename, chunk_length_ms=7000, out_dir="chunked", txt_path="the_audio.txt"):
    # Create the folder that will hold the chunked audio files (no error if it already exists)
    os.makedirs(out_dir, exist_ok=True)
    # Open the .wav file with pydub and cut it into fixed-length chunks (7 seconds each)
    myaudio = AudioSegment.from_wav(filename)
    chunks = make_chunks(myaudio, chunk_length_ms)
    # One Recognizer instance can be reused for every chunk
    r = sr.Recognizer()
    # Base name for the chunk files, e.g. "hello" for "hello.wav"
    base = os.path.splitext(os.path.basename(filename))[0]
    # Open the text file that collects the transcript; "with" closes it automatically
    with open(txt_path, "w") as txtf:
        for i, chunk in enumerate(chunks):
            # Export each chunk as its own .wav file, e.g. chunked/hello_0.wav;
            # export() returns the open file, so close it right away
            chunk_name = os.path.join(out_dir, f"{base}_{i}.wav")
            print("Exporting", chunk_name)
            chunk.export(chunk_name, format="wav").close()
            # Read the whole chunk: record() reads the entire file, whereas
            # listen() returns as soon as it detects the end of a phrase
            with sr.AudioFile(chunk_name) as source:
                audio = r.record(source)
            try:
                # Send the chunk to the Google Web Speech API and append the result
                text = r.recognize_google(audio)
                txtf.write(text + ". ")
            except sr.UnknownValueError:
                # The recognizer could not make out any speech in this chunk
                print(f"Chunk {i}: could not understand the audio")
            except sr.RequestError as e:
                # The API request failed (e.g. no internet connection)
                print(f"Chunk {i}: could not get a result; check your internet connection ({e})")


# Call the function on our demo file
process_audio("hello.wav")
```
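If you don't need the chunk files on disk, SpeechRecognition can also step through a WAV file by itself. `Recognizer.record()` accepts a `duration` in seconds, and repeated calls on the same open `AudioFile` continue roughly where the previous call stopped. Below is a minimal sketch of that approach as an alternative to the pydub version above. The function name and the 7-second default are our own choices. As with pydub's fixed-length chunks, a chunk boundary can fall in the middle of a word.

```python
import math


def transcribe_wav_in_chunks(path, chunk_seconds=7.0):
    """Transcribe a WAV file chunk by chunk without writing chunk files to disk."""
    # Imported here so this file can be loaded even where the package isn't installed
    import speech_recognition as sr

    r = sr.Recognizer()
    pieces = []
    with sr.AudioFile(path) as source:
        # DURATION is the file length in seconds, available once the file is opened
        n_chunks = math.ceil(source.DURATION / chunk_seconds)
        for i in range(n_chunks):
            # Each record() call reads the next chunk_seconds of audio from the open file
            audio = r.record(source, duration=chunk_seconds)
            if not audio.frame_data:
                break  # nothing left to read
            try:
                pieces.append(r.recognize_google(audio))
            except sr.UnknownValueError:
                print(f"Chunk {i}: could not understand the audio")
            except sr.RequestError as e:
                print(f"Chunk {i}: request failed ({e})")
    return ". ".join(pieces)
```

For example, `print(transcribe_wav_in_chunks("hello.wav"))` prints the joined transcript of the demo file.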
Follow the steps we highlighted above to start converting your speech to text in Python. Thanks for reading.