How to Convert Speech to Text in Python

In this tutorial, we’ll learn how to convert speech, whether spoken into a microphone or stored in an audio file, to text. Speech-to-text is used in many areas, including business, content creation, and bots. The SpeechRecognition library is essential whenever we look into speech-to-text in Python. It supports many speech recognition engines and APIs, including Google Speech Recognition, the Google Cloud Speech API, and IBM Speech to Text.

A speech-enabled application can be broken down into three stages:

Automatic speech recognition (ASR): This transcribes the audio into text.
Natural language processing (NLP): This derives meaning from the transcribed text.
Text-to-speech (TTS): This converts text to human-like speech.

Our primary focus here is the first stage, converting speech to text, which we’ll demonstrate step by step.

Step 1: Install Libraries

Here, we’ll install the libraries our code needs to convert speech or an audio file to text.

The first library to install is the Python SpeechRecognition module. We can install it with the command below:

pip install speechrecognition

The next library to install is Pydub, which is very useful for manipulating audio files. We can install it with this command:

pip install pydub

The last one to install is the PyAudio library, which SpeechRecognition needs in order to read from the microphone. We can install it with the command below:

pip install pyaudio
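
Before moving on, it can help to confirm that the three packages can be imported. The sketch below is one way to do that using only the standard library. The helper name `check_installed` is an illustration, not part of any of these libraries. Note that the import names (`speech_recognition`, `pydub`, `pyaudio`) are what Python looks up, which is why `speech_recognition` has an underscore here.

```python
import importlib.util


def check_installed(module_names):
    """Return a dict mapping each module name to True if Python can find it."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


if __name__ == "__main__":
    # Import names of the packages installed in this step
    results = check_installed(["speech_recognition", "pydub", "pyaudio"])
    for name, found in results.items():
        print(name, "is installed" if found else "is MISSING")
```

If any of the names is reported as missing, rerun the matching pip command in the same environment that runs your script.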

Step 2: Convert Speech to Text

In this step, we’ll use the SpeechRecognition library to access the microphone: we speak, and the recognizer converts what it hears to text. The code is shown below:

# Import the library
import speech_recognition as sr

# The recognizer captures audio and sends it off for transcription
v = sr.Recognizer()

# Use the default microphone as the audio source
with sr.Microphone() as source:
    print("Speak:")
    # Listen until a pause is detected, then keep the captured audio
    the_audio = v.listen(source)

try:
    print("Your speech was: " + v.recognize_google(the_audio))
except sr.UnknownValueError:
    # The audio was received but no words could be recognized
    print("Could not understand audio")
except sr.RequestError as e:
    # The recognition service could not be reached
    print("Could not request results: {0}".format(e))
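
In a noisy room, the recognizer may pick up background sound or wait a long time for silence. The following is a minimal sketch, assuming the same default microphone as above, that calibrates for ambient noise and limits how long it listens. The function name `listen_once` and its default values are illustrative choices. `adjust_for_ambient_noise`, the `timeout` and `phrase_time_limit` arguments to `listen`, and `WaitTimeoutError` come from the SpeechRecognition library.

```python
def listen_once(timeout=5, phrase_time_limit=10):
    """Capture one phrase from the default microphone; return its text or None."""
    # Imported inside the function so this module still loads
    # on a machine where the library is not installed yet
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample background noise for one second to set the energy threshold
        recognizer.adjust_for_ambient_noise(source, duration=1)
        print("Speak:")
        try:
            # Give up if no speech starts within `timeout` seconds, and
            # stop recording a phrase after `phrase_time_limit` seconds
            audio = recognizer.listen(
                source, timeout=timeout, phrase_time_limit=phrase_time_limit
            )
        except sr.WaitTimeoutError:
            print("No speech detected")
            return None

    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        print("Could not understand audio")
    except sr.RequestError as e:
        print("Could not reach the recognition service: {0}".format(e))
    return None
```

Calling `listen_once()` returns the recognized text, or `None` when nothing was heard or recognized.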

Step 3: Convert Audio Files to Text

In the previous step we converted live speech to text; here we will work with an audio file. We’ll use the Pydub library we installed earlier to break the demo audio file into smaller pieces. Splitting a long audio file into short chunks helps the speech recognition library process each piece reliably and return more accurate text.
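
To make the chunking concrete, the sketch below computes where the pieces start and end for a given duration. It is plain Python arithmetic rather than Pydub itself, and `chunk_boundaries` is a hypothetical helper name. It assumes fixed-length chunks with a shorter final chunk for the remainder, and it defaults to 7000 ms, the chunk length used in the code for this step.

```python
import math


def chunk_boundaries(duration_ms, chunk_length_ms=7000):
    """Return (start_ms, end_ms) pairs covering duration_ms in fixed-length chunks."""
    count = math.ceil(duration_ms / chunk_length_ms)
    return [
        (i * chunk_length_ms, min((i + 1) * chunk_length_ms, duration_ms))
        for i in range(count)
    ]


if __name__ == "__main__":
    # A 20-second file gives two full 7-second chunks and a 6-second remainder
    print(chunk_boundaries(20000))
    # prints [(0, 7000), (7000, 14000), (14000, 20000)]
```

A shorter chunk length means more requests to the recognition service, while a longer one raises the chance that a single request fails or comes back incomplete.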

The code that splits the file and transcribes each chunk is shown below:

# Import the libraries
import os
import warnings

import speech_recognition as sr
from pydub import AudioSegment
from pydub.utils import make_chunks

warnings.filterwarnings("ignore")

def process_audio(filename):
    # Create the folder where the audio chunks will be saved
    os.makedirs("chunked", exist_ok=True)
    # Open the audio file with AudioSegment; the file must be in .wav format
    myaudio = AudioSegment.from_wav(filename)
    # Length of each chunk in milliseconds
    chunk_length_ms = 7000
    chunks = make_chunks(myaudio, chunk_length_ms)
    # Create one recognizer and reuse it for every chunk
    r = sr.Recognizer()
    # Open the text file that will receive the transcription
    with open("the_audio.txt", "w+") as txtf:
        # Loop through the chunks produced from the audio file
        for i, chunk in enumerate(chunks):
            # Save each chunk as its own .wav file in the chunked folder
            chunk_name = "./chunked/" + os.path.basename(filename) + "_{0}.wav".format(i)
            print("I am exporting", chunk_name)
            chunk.export(chunk_name, format="wav")
            # Read the whole chunk file so it can be recognized
            with sr.AudioFile(chunk_name) as source:
                audio_listened = r.record(source)
            try:
                # Try to recognize the speech and convert it to text
                rec = r.recognize_google(audio_listened)
                # Write the result of the conversion to the text file
                txtf.write(rec + ". ")
            except sr.UnknownValueError:
                # The library could not understand the audio in this chunk
                print("I don't recognize your audio")
            except sr.RequestError as e:
                # The recognition service could not be reached
                print("Could not get the result. Check your internet connection.")

# Call the function
process_audio("hello.wav")
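
Since the function above expects a .wav file, a quick check of the input before calling it can save a failed run. The sketch below uses Python’s built-in `wave` module, which reads uncompressed PCM WAV files. `wav_info` is an illustrative helper name, and `hello.wav` is the demo file name from this tutorial.

```python
import os
import wave


def wav_info(path):
    """Return channel count, sample rate (Hz), and duration (seconds) of a WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    if os.path.exists("hello.wav"):
        print(wav_info("hello.wav"))
    else:
        print("hello.wav not found")
```

If `wave.open` raises `wave.Error`, the file is probably not an uncompressed WAV and may need converting before it is passed to `process_audio`.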

Follow the steps we highlighted above to start converting your speech to text in Python. Thanks for reading.