Image Processing Projects using Python:
In this post, we look at an interesting problem that combines machine learning with image processing. We will work through a hands-on example to see how the text in an image can be extracted into a text file and then converted into audio using Python code. Let's get started.
Problem Statement:
Input Image #learntospark
Today's problem statement is to:
a) Extract the text from the above input image and save the resulting string to an output text file.
b) Convert the generated text file into an audio file so that the output can be heard.
Solution:
Overview on Image Processing:
Recognizing text in a given image is a great capability to explore for anyone who wants to start a career in machine learning or deep learning with image processing. Many everyday applications, such as Google Lens and CamScanner, use this functionality. These applications implement an OCR (Optical Character Recognition) algorithm to convert a scanned image into text.
Libraries we use:
To solve the problem given above, we are going to use the following libraries:
pyttsx3:
pip install pyttsx3
Note: If an error pops up stating No module named win32, win32com.client, or win32api, you will additionally need to install pypiwin32.
Pytesseract:
pip install pytesseract
Alternatively, if you are using Anaconda, open the Anaconda Prompt and type the line below to install Tesseract.
conda install -c mcs07 tesseract
Note: This OCR package is a wrapper around the Google-sponsored Tesseract engine and does not provide true bindings to Python. It writes a copy of the image to a temporary location on disk and calls the Tesseract binary to extract the text. The Tesseract engine must therefore be installed separately to run the code below. To download it to your local machine, follow the link: download tesseract for windows
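Since pytesseract only calls the Tesseract binary, it can help to confirm that the binary is reachable before running any OCR. The following is a minimal sketch using only the standard library. The helper name find_tesseract is hypothetical, and the fallback path is simply the Windows location used later in this post, so adjust it to your own installation.

```python
import os
import shutil


def find_tesseract(fallback=r"C:\Program Files\Tesseract-OCR\tesseract.exe"):
    """Return the path to the tesseract binary, or None if it cannot be found."""
    # First look for "tesseract" on the PATH.
    found = shutil.which("tesseract")
    if found:
        return found
    # Otherwise try the fallback location (an assumption; adjust as needed).
    if os.path.isfile(fallback):
        return fallback
    return None


if __name__ == "__main__":
    path = find_tesseract()
    print(path if path else "Tesseract not found; install it first.")
```

If a path is returned, it can be assigned to pytesseract.pytesseract.tesseract_cmd in Step 4 instead of a hard-coded location.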
PIL - Python Imaging Library:
pip install Pillow
Coding:
Step 1: Import all the required packages
#import required libraries
import pytesseract
from PIL import Image
import pyttsx3
Step 2: Read the input image and check the image by printing it.
input_img=Image.open('input_img.jpg')
print(input_img)
input_img
Step 3: The given image looks clean, but in practice images are often noisy. A common first step to reduce this problem is to convert the image to grayscale (Pillow's 'L' mode) before running OCR.
img = input_img.convert('L')
img
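Grayscale conversion on its own removes colour but not speckle noise. If the OCR result is poor, a slightly stronger clean-up can be tried. The sketch below is one possible approach and not part of the original steps: it applies Pillow's median filter and then a fixed threshold. The helper name preprocess_for_ocr and the threshold value of 128 are assumptions for illustration and would need tuning for real images.

```python
from PIL import Image, ImageFilter


def preprocess_for_ocr(image, threshold=128):
    """Return a cleaned-up grayscale copy of a Pillow image for OCR."""
    # Convert to 8-bit grayscale, as in Step 3.
    gray = image.convert("L")
    # A 3x3 median filter suppresses isolated speckles.
    smoothed = gray.filter(ImageFilter.MedianFilter(size=3))
    # Push every pixel to pure black or pure white around the threshold.
    return smoothed.point(lambda p: 255 if p > threshold else 0)
```

It could be used in place of the plain conversion, for example img = preprocess_for_ocr(input_img).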
Step 4: In this step, we point pytesseract to the installed Tesseract engine and convert the scanned image into a string of text using Tesseract, Google's OCR engine, through its Python wrapper.
#Select the tesseract engine that you installed
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
text_data = pytesseract.image_to_string(img)
print(text_data)
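Part (a) of the problem statement also asks for the recognized text to be saved to an output file. A minimal sketch of that step is shown here. The helper name save_text and the file name output.txt are assumptions, and the sample string merely stands in for the text_data returned by pytesseract.

```python
from pathlib import Path


def save_text(text, path="output.txt"):
    """Tidy OCR output and write it to a UTF-8 text file."""
    # Strip surrounding whitespace on each line and drop blank lines.
    lines = [line.strip() for line in text.splitlines()]
    cleaned = "\n".join(line for line in lines if line)
    Path(path).write_text(cleaned + "\n", encoding="utf-8")
    return cleaned


if __name__ == "__main__":
    sample = "  Hello world  \n\n  from OCR \n"  # stand-in for text_data
    print(save_text(sample))
```

With the real OCR result, the call would simply be save_text(text_data).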
Step 5: The final step is to convert the text into audio. This is done with the pyttsx3 engine. The code snippet for this is given below.
engine = pyttsx3.init()
engine.say(text_data)
engine.save_to_file(text_data, 'audio.mp3')
# Process the queued commands; nothing is spoken or saved until this runs
engine.runAndWait()
The output is saved as an mp3 file, which you can play to hear the text converted into audio.
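OCR can return an empty string when no text is detected, and pyttsx3 only processes queued commands once runAndWait() is called. The sketch below wraps the audio step with a guard for empty text and an adjustable speaking rate. The function name text_to_audio_file and the default rate of 150 are assumptions. The encoding of the saved file depends on the speech driver of your platform, so it is worth checking that the output plays as expected.

```python
def text_to_audio_file(text, filename="audio.mp3", rate=150):
    """Save the given text as speech to a file and return the file name."""
    text = text.strip()
    if not text:
        raise ValueError("No text to convert; the OCR result is empty.")

    # Imported here so the guard above works even without a speech driver.
    import pyttsx3

    engine = pyttsx3.init()
    # "rate" is the speaking speed property exposed by pyttsx3.
    engine.setProperty("rate", rate)
    engine.save_to_file(text, filename)
    # Run the queued command so the file is actually written.
    engine.runAndWait()
    return filename
```

A call such as text_to_audio_file(text_data) would then replace the three engine lines in Step 5.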
Enhancements:
Try it yourself: Handwriting Recognition Machine Learning Project
Practice the above code to convert an image into text and then the text into an audio file. Once that works, try it with images of handwritten text in PNG format.
Hope you enjoyed learning this image processing technique. Post a comment if you have any questions about this method. Also, share any other methods you know for converting OCR output to audio.
Full Program:
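Putting Steps 1 to 5 together, a sketch of the complete script could look like the following. The function name image_to_audio, its parameters, and the output file name output.txt are assumptions made for illustration. The third-party imports sit inside the function only so that a missing input image is reported before anything else is loaded.

```python
from pathlib import Path


def image_to_audio(image_path, text_path="output.txt", audio_path="audio.mp3",
                   tesseract_cmd=None):
    """Run OCR on an image, save the text, and save the text as audio."""
    image_path = Path(image_path)
    if not image_path.is_file():
        raise FileNotFoundError(f"Input image not found: {image_path}")

    # Third-party packages from the steps above.
    import pytesseract
    import pyttsx3
    from PIL import Image

    # Optionally point pytesseract at the Tesseract binary (Step 4).
    if tesseract_cmd:
        pytesseract.pytesseract.tesseract_cmd = tesseract_cmd

    # Steps 2 and 3: read the image and convert it to grayscale.
    with Image.open(image_path) as input_img:
        img = input_img.convert("L")

    # Step 4: extract the text and save it to a file.
    text_data = pytesseract.image_to_string(img)
    Path(text_path).write_text(text_data, encoding="utf-8")

    # Step 5: convert the text to audio.
    engine = pyttsx3.init()
    engine.save_to_file(text_data, str(audio_path))
    engine.runAndWait()
    return text_data


if __name__ == "__main__":
    try:
        # On Windows, pass tesseract_cmd if the binary is not on the PATH.
        print(image_to_audio("input_img.jpg"))
    except FileNotFoundError as err:
        print(err)
```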
Happy Learning!!!