In this tutorial, we’ll show you how to convert text in images into a machine-readable format using the Python pytesseract module. pytesseract is a Python wrapper for Google’s Tesseract OCR engine. We will use it to convert the words in an image to a string.

Optical Character Recognition (OCR) is a field of research in pattern recognition, artificial intelligence, and computer vision. Data scientists and software engineers, among others, use it to extract text from images that are known to contain text data.

Installation

To use pytesseract on our machine, we first need to install the package. In this tutorial, we will use the Windows operating system.

You can install the required packages with pip:

pytesseract

pip install pytesseract

Pillow

pip install pillow
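To confirm that both packages installed correctly, a quick check like the following can help. This is only a sketch: `missing_packages` is a hypothetical helper name, not part of either library. Note that Pillow is imported under the name `PIL`, not `pillow`.

```python
import importlib.util


def missing_packages(names):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Pillow is installed as "pillow" but imported as "PIL".
print(missing_packages(["pytesseract", "PIL"]))
```

An empty list means both packages can be imported.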

pip only installs the Python wrapper, so the Tesseract engine itself must be installed separately. pytesseract needs to know where the tesseract.exe binary is located, so copy the installation path while installing Tesseract and keep it for later. The path highlighted in the image is the one we will use in our code.
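A hard-coded path stops working as soon as the script runs on a machine where Tesseract lives somewhere else. As a rough sketch, the helpers below look for the executable on the system PATH first and then fall back to a list of candidate locations. The names `find_tesseract`, `configure_pytesseract`, and `DEFAULT_CANDIDATES` are hypothetical, and the candidate path is an assumption about a typical Windows install, so adjust it to match your own machine.

```python
import os
import shutil

# Assumption: a typical Windows install location; replace with your own path.
DEFAULT_CANDIDATES = [r"C:\Program Files\Tesseract-OCR\tesseract.exe"]


def find_tesseract(candidates=DEFAULT_CANDIDATES):
    """Return a path to the Tesseract executable, or None if it is not found."""
    # Prefer whatever is already available on the system PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise try each candidate location in order.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(candidates=DEFAULT_CANDIDATES):
    """Point pytesseract at the located executable and return its path."""
    import pytesseract  # imported here so find_tesseract works without it

    path = find_tesseract(candidates)
    if path is None:
        raise FileNotFoundError("Tesseract executable not found")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract()` once at the top of a script would then take the place of the hard-coded assignment.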

Sample One

We will convert the image below to text using the pytesseract module:

Code:

# Import the libraries we need
from PIL import Image
import pytesseract

# Tell pytesseract where the Tesseract executable is installed
pytesseract.pytesseract.tesseract_cmd = "C:\\Users\\CNDRO\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe"

# Name of the image shown above
image_path = "brush.png"

# Open the image and store it in an image object
img = Image.open(image_path)

# Extract the text from the image
text = pytesseract.image_to_string(img)

# Display the result without leading or trailing whitespace
print(text.strip())

Output:

Sample Two

If an image contains a lot of text, we can still use the pytesseract module to extract it. We will demonstrate this with the image below:

Code:

# Import the libraries we need
from PIL import Image
import pytesseract

# Tell pytesseract where the Tesseract executable is installed
pytesseract.pytesseract.tesseract_cmd = "C:\\Users\\CNDRO\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe"

# Name of the image shown above
image_path = "behind.png"

# Open the image and store it in an image object
img = Image.open(image_path)

# Extract the text from the image
text = pytesseract.image_to_string(img)

# Display the result without leading or trailing whitespace
print(text.strip())

Output:
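Raw Tesseract output often ends with blank lines and a form-feed page separator (`\x0c`), which is why the result is tidied before printing. For longer passages like the one above, a slightly more thorough cleanup might look like the sketch below. `clean_ocr_text` is a hypothetical helper, not part of pytesseract, and it works on any string.

```python
def clean_ocr_text(raw):
    """Remove form feeds, trailing spaces, and leading/trailing blank lines."""
    # Drop page separators, then trim trailing whitespace on each line.
    lines = [line.rstrip() for line in raw.replace("\x0c", "").splitlines()]
    # Remove blank lines at the start and at the end, keeping inner ones.
    while lines and not lines[0]:
        lines.pop(0)
    while lines and not lines[-1]:
        lines.pop()
    return "\n".join(lines)


print(clean_ocr_text("Hello  \nWorld\n\n\x0c"))
```

Blank lines inside the text are kept, so paragraph breaks recognised by Tesseract survive the cleanup.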
