Image Processing ML Projects | Image to Text | Text to Audio

Image Processing ML Projects using Python: 

In this post, we are going to explore an interesting machine learning problem in image processing. Using a hands-on example, we will convert an image into a text file and then convert that text file into spoken audio. Let's get started.

Problem Statement:

Input Image 

Today's problem statement is to:

a) Convert the input image above, which contains some text, into text and save the recognized string to an output file.

b) Convert the generated text file into an audio file so that we can listen to it.


Overview of Image Processing:

Recognizing text in a given image is a great feature to explore for anyone who wants to start a career in machine learning and deep learning with image processing. Many real-world applications use this functionality, such as Google Lens and CamScanner. These applications go a step further with machine learning and implement OCR (Optical Character Recognition) algorithms to convert scanned images into text.

Libraries we use:

To solve the problem above, we are going to use the following libraries:


pyttsx3 - Python Text to Speech:

pyttsx3 is a Python library that converts text to speech offline, and it is compatible with both Python 2 and Python 3. To install this package on a Windows machine, run the command below in your Anaconda command prompt.

pip install pyttsx3

Note: If you get an error stating No module named win32 (or win32com.client, or win32api), you will also need to install pypiwin32 (pip install pypiwin32).


pytesseract - Python Tesseract:

Python-tesseract (pytesseract) is a Python wrapper for Tesseract, the open-source OCR engine whose development was sponsored by Google. It is the OCR tool we use in Python to recognize text in an image. To install the package, type the command below in your command prompt.

pip install pytesseract

Alternatively, if you are using Anaconda, open the Anaconda prompt and type the line below to install Tesseract.

conda install -c mcs07 tesseract

Note: pytesseract does not provide true Python bindings to Tesseract. Instead, it saves a copy of the image to a temporary location on disk and calls the Tesseract binary to extract the text from it. This means the Tesseract engine itself must also be installed to run the code below. To download it to your local machine, follow the download tesseract for windows link.
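To make the "temporary file plus binary" mechanism concrete, here is a simplified sketch that does the same thing by hand with the standard library. It is not pytesseract's actual code; the helper names build_tesseract_command and ocr_via_cli are hypothetical. It relies only on the documented Tesseract command line form tesseract <image> stdout -l <lang>, where passing stdout as the output base prints the text instead of writing a .txt file.

```python
import subprocess
import tempfile
from pathlib import Path


def build_tesseract_command(image_path, tesseract_cmd="tesseract", lang="eng"):
    """Hypothetical helper: build the CLI call `tesseract <image> stdout -l <lang>`."""
    # "stdout" as the output base tells Tesseract to print the recognized text.
    return [tesseract_cmd, str(image_path), "stdout", "-l", lang]


def ocr_via_cli(image_bytes, suffix=".png", tesseract_cmd="tesseract", lang="eng"):
    """Hypothetical helper: copy the image to a temp file and run the Tesseract binary.

    This roughly mirrors what pytesseract does behind the scenes (simplified).
    """
    with tempfile.TemporaryDirectory() as tmp:
        image_path = Path(tmp) / f"input{suffix}"
        image_path.write_bytes(image_bytes)  # temporary copy on disk
        result = subprocess.run(
            build_tesseract_command(image_path, tesseract_cmd, lang),
            capture_output=True,
            text=True,
            check=True,  # raise if the binary reports an error
        )
    return result.stdout
```

With the engine installed, ocr_via_cli(Path('input_img.jpg').read_bytes(), suffix='.jpg') should return roughly what pytesseract.image_to_string returns for the same file. In practice pytesseract is the more convenient choice, since it also handles PIL images and options for you.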

PIL - Python Imaging Library:

Python Imaging Library, PIL for short, is the Python package that adds image processing capabilities to your Python interpreter. Pillow is the actively maintained fork of PIL, and it is still imported as PIL. If it is not already installed, run the pip command below:

pip install Pillow
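Before opening the notebook, it can help to confirm that everything installed above is actually available. The sketch below is one way to do that with the standard library only; the function name check_setup is hypothetical. Note that Pillow is looked up under its import name PIL, and that Tesseract counts as found only if it is on your PATH. If it was installed elsewhere, that is exactly why Step 4 sets tesseract_cmd explicitly.

```python
import importlib.util
import shutil


def check_setup(modules=("pytesseract", "PIL", "pyttsx3"), binary="tesseract"):
    """Hypothetical helper: report which required pieces are available, without importing them."""
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # pytesseract needs the separate Tesseract engine; look for it on PATH.
    status[binary] = shutil.which(binary) is not None
    return status


for name, ok in check_setup().items():
    print(f"{name:12} {'found' if ok else 'MISSING'}")
```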


Open a fresh Jupyter notebook to start converting our image to text.

Step 1: Import all the required packages

#import required libraries
import pytesseract
from PIL import Image
import pyttsx3

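Step 2: Read the input image and display it to check that it loaded correctly.

input_img = Image.open('input_img.jpg')
#display the image in the notebook (the last expression in a cell is rendered)
input_img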


Step 3: The given image looks clean, but in practice images are often noisy. A common first step in dealing with this is to convert the image to grayscale. Pillow's 'L' mode keeps a single 8-bit brightness channel and discards color information that does not help with recognizing text.

#convert the image to grayscale ('L' mode)
img = input_img.convert('L')
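Grayscale conversion alone does not remove noise. If your scans are speckled, a common extension (not part of the steps in this post) is to follow it with a median filter and a simple threshold. The sketch below assumes Pillow's ImageFilter.MedianFilter and Image.point; the function name preprocess_for_ocr and the threshold of 128 are illustrative choices, not tuned values.

```python
from PIL import Image, ImageFilter


def preprocess_for_ocr(img: Image.Image, threshold: int = 128, denoise: bool = True) -> Image.Image:
    """Hypothetical helper: grayscale -> optional median filter -> black/white threshold."""
    gray = img.convert("L")  # single 8-bit brightness channel, as in Step 3
    if denoise:
        # A 3x3 median filter replaces each pixel with the median of its
        # neighbourhood, which removes isolated specks ("salt and pepper" noise).
        gray = gray.filter(ImageFilter.MedianFilter(size=3))
    # Pixels brighter than the threshold become white (255), the rest black (0).
    return gray.point(lambda p: 255 if p > threshold else 0)
```

You would then pass preprocess_for_ocr(input_img) to OCR instead of img. The median filter works best on small, isolated specks. A 3x3 window can also thin very small text, so compare the OCR output with and without denoise=True.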


Step 4: In this step, we point pytesseract at the installed Tesseract engine and convert the image into a string of text using Tesseract, the Google-sponsored OCR engine (version 4 and later use an LSTM neural network), through its Python wrapper.

#Point pytesseract to the Tesseract executable you installed (adjust the path if needed)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
text_data = pytesseract.image_to_string(img)
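Part (a) of the problem statement also asks us to save the recognized string to an output file, which the snippet above does not do yet. Here is a minimal sketch of that step. The helper name save_ocr_text and the file name output.txt are hypothetical. The strip() call is there because Tesseract's text output usually ends with trailing whitespace, such as a newline and a form-feed page separator.

```python
from pathlib import Path


def save_ocr_text(text, out_path="output.txt"):
    """Hypothetical helper: write OCR output to a UTF-8 text file and return the cleaned text."""
    # Drop leading/trailing whitespace (e.g. a trailing form feed from Tesseract).
    cleaned = text.strip()
    Path(out_path).write_text(cleaned + "\n", encoding="utf-8")
    return cleaned
```

In the notebook this would be text_data = save_ocr_text(text_data, 'output.txt'). The cleaned string is then passed to the audio step.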


Step 5: The final step is to convert the text into audio, which is done with the pyttsx3 engine.
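A minimal sketch of this step, assuming pyttsx3's engine.save_to_file() and engine.runAndWait() methods. The function name text_to_audio, the output name output.mp3, and the optional rate argument (words per minute, set via engine.setProperty('rate', ...)) are illustrative. The engine is imported lazily so that a stand-in object can be passed in for testing.

```python
def text_to_audio(text, out_path="output.mp3", engine=None, rate=None):
    """Hypothetical helper: render text to an audio file with pyttsx3 and return the path."""
    if engine is None:
        import pyttsx3  # imported lazily so a stand-in engine can be passed for testing

        engine = pyttsx3.init()
    if rate is not None:
        engine.setProperty("rate", rate)  # speaking speed in words per minute
    engine.save_to_file(text, out_path)  # queue the "write to file" command
    engine.runAndWait()  # process the queue; the file is written here
    return out_path
```

In the notebook this would be text_to_audio(text_data, 'output.mp3'). pyttsx3 does not encode MP3 itself. The audio format comes from the platform's speech driver, so if a player rejects the file, try a .wav name instead.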


The output is saved as an audio file with an .mp3 extension, and you can play it to hear the converted text.


We can also enhance this code with a translation package from Google to translate the OCR text into a desired language. When using a translator, make sure that the package supports the target language and that pyttsx3 has a voice that can speak text in that language.
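To check that pyttsx3 can speak a given language, you can inspect the installed voices. pyttsx3 lists them via engine.getProperty('voices'), where each voice has id, name and languages attributes, and you select one with engine.setProperty('voice', voice_id). The matching rule below is a simple substring search and is an assumption: drivers report names and languages differently (some as bytes, some leave languages empty), so check what your system actually returns. The name pick_voice is hypothetical.

```python
from types import SimpleNamespace


def pick_voice(voices, hint):
    """Hypothetical helper: return the id of the first voice whose name or languages mention `hint`."""
    hint = hint.lower()
    for voice in voices:
        langs = []
        for lang in getattr(voice, "languages", None) or []:
            # Some drivers report languages as bytes, others as strings.
            langs.append(lang.decode(errors="ignore") if isinstance(lang, bytes) else str(lang))
        text = " ".join([getattr(voice, "name", "") or ""] + langs).lower()
        if hint in text:
            return voice.id
    return None  # no installed voice seems to match


# Stand-ins shaped like pyttsx3 Voice objects (id, name, languages), for illustration only.
demo_voices = [
    SimpleNamespace(id="v1", name="English (United States)", languages=["en_US"]),
    SimpleNamespace(id="v2", name="German", languages=[b"\x05de"]),
]
print(pick_voice(demo_voices, "german"))  # v2

# With a real engine (assumes a matching voice is installed):
# engine = pyttsx3.init()
# voice_id = pick_voice(engine.getProperty("voices"), "german")
# if voice_id:
#     engine.setProperty("voice", voice_id)
```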

Try it yourself: Handwriting Recognition Machine Learning Project

Practice the code above to convert an image into text and then the text into an audio file. Once you have that working, try it with handwritten images in PNG format.

Hope you enjoyed learning this image processing technique. Post your questions about this method in the comments, and share any other methods you know for converting OCR output to audio.

Full Program:
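The steps above can be gathered into one function, sketched below under the same assumptions as before (pytesseract.image_to_string for OCR, pyttsx3's save_to_file/runAndWait for audio). The function name image_to_audio and the default file names are placeholders. The ocr and engine parameters exist only so the pipeline can be exercised without Tesseract or a speech driver.

```python
#import required libraries
from pathlib import Path

from PIL import Image


def image_to_audio(image_path, text_path="output.txt", audio_path="output.mp3",
                   tesseract_cmd=None, ocr=None, engine=None):
    """Hypothetical pipeline: image -> text file -> audio file (Steps 2-5)."""
    if ocr is None:
        import pytesseract

        if tesseract_cmd:  # e.g. the Windows install path used in Step 4
            pytesseract.pytesseract.tesseract_cmd = tesseract_cmd
        ocr = pytesseract.image_to_string
    if engine is None:
        import pyttsx3

        engine = pyttsx3.init()

    # Steps 2-3: read the image and convert it to grayscale ('L' mode).
    with Image.open(image_path) as input_img:
        img = input_img.convert("L")
    # Step 4: recognize the text and save it to the output text file.
    text_data = ocr(img).strip()
    Path(text_path).write_text(text_data + "\n", encoding="utf-8")
    # Step 5: speak the text into an audio file.
    engine.save_to_file(text_data, str(audio_path))
    engine.runAndWait()
    return text_data


# Example (file names are placeholders):
# print(image_to_audio('input_img.jpg',
#                      tesseract_cmd=r'C:\Program Files\Tesseract-OCR\tesseract.exe'))
```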

Happy Learning!!!
