I've been working on a simple OCR project for a couple of days. The app is supposed to extract text from an image. The solution I've come up with is:
greyscaling, rotating, removing noise from the image and isolating every single character in the image. So I need some help with a simple algorithm that would let me recognise each character. I only need to recognise the letters A, B, C and D.
The method I'd use would be to make a mask of each letter, scale it to the rectangle you found when cutting the picture, and "score" each letter, by comparing RGB values for example.
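A minimal sketch of that idea, assuming you already have each cropped character and one grayscale/binary mask image per letter as OpenCV Mats (all the file names below are placeholders):

    import org.opencv.core.*;
    import org.opencv.imgcodecs.Imgcodecs;
    import org.opencv.imgproc.Imgproc;

    public class LetterScorer {

        // Returns the letter whose mask differs least from the cropped character.
        static char classify(Mat character, Mat[] masks, char[] labels) {
            char best = '?';
            double bestScore = Double.MAX_VALUE;
            for (int i = 0; i < masks.length; i++) {
                Mat resized = new Mat();
                // Scale the mask to the size of the character crop.
                Imgproc.resize(masks[i], resized, character.size());
                Mat diff = new Mat();
                Core.absdiff(character, resized, diff);
                // Lower total pixel difference = better match.
                double score = Core.sumElems(diff).val[0];
                if (score < bestScore) {
                    bestScore = score;
                    best = labels[i];
                }
            }
            return best;
        }

        public static void main(String[] args) {
            System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
            // Placeholder file names; crops and masks are assumed single-channel.
            Mat crop = Imgcodecs.imread("character.png", Imgcodecs.IMREAD_GRAYSCALE);
            Mat[] masks = {
                Imgcodecs.imread("mask_A.png", Imgcodecs.IMREAD_GRAYSCALE),
                Imgcodecs.imread("mask_B.png", Imgcodecs.IMREAD_GRAYSCALE),
                Imgcodecs.imread("mask_C.png", Imgcodecs.IMREAD_GRAYSCALE),
                Imgcodecs.imread("mask_D.png", Imgcodecs.IMREAD_GRAYSCALE)
            };
            System.out.println("Best match: " + classify(crop, masks, new char[]{'A', 'B', 'C', 'D'}));
        }
    }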
I'll start by saying what I'm doing:
I'll take a photo with a webcam; in this photo there will be an object, always the same object, in a square format with letters inside it. I need to identify those letters. The step of identifying those letters is already done; the problem is the quality of the image coming from the webcam: it won't be the best nor in the best positioning, and the API I'm using to identify those letters requires good positioning and quality.
The reason why I have a square is to help identify where those letters are, so I can 'look for a square' in the image and then do what I've already done to identify the letters. My question is: are there more things I have to do in order to achieve this? Or is it only 'reposition the image, look for the square and then it's done'? If I need to study image processing, there's no problem; I'm here because I don't even know what I have to look for.
I'm developing in Java because of 'school things', so if there's already an API (I've heard of and tried OpenCV, but I don't know what to do with it) it would really help me.
Thanks in advance.
Edit 1: As asked by Springfield762, I took some photos and I'll explain them below.
First let me explain what the photos are: the 'square thing' that will contain the letters isn't done yet, another department is taking care of it, so I had to improvise something here with pens and batteries. The letters will all be made of wood in a nice shape; I had to replace them with some Magicka cards as I don't have them yet, but the cards fit well enough to explain the example. I also made an example of the square (which actually ended up as a rectangle) in Paint, so it's nothing beautiful.
I took 3 photos: one using the light coming from the window, the second using the light of my room and the third using the flash of the webcam. (Sorry about the links, I can't post images nor links; although I'm always here, this is the first time I post a question...)
Window light:
Room light:
Flash:
Square (rectangle) example:
The 'project' of the square you guys can ignore; I did it so that you can understand the images. The reason I took 3 different photos was just to show all the different conditions the webcam might be in. Also, the quality of the Magicka cards isn't a problem, since each card represents one letter, so it'll be easy to 'see' them.
Well, I found most of the answers to this question, and I'll explain them below.
First, it's not a square but a rectangle, and it is still to be made. So I started testing the software using anything that was a rectangle. First I had to 'locate' the rectangle in the frame captured by the camera, then show it in the original image seen by the user. I accomplished that by the following steps (roughly sketched in code after the result image):
Capturing the current frame;
Converting that frame to HSV;
Applying some kind of threshold (using the Core.inRange function, so that I could find a specific colour in the range specified in the function);
Applying Imgproc.findContours to find the contours of the rectangle;
Finally, drawing a rectangle using the points found by findContours.
How it ended: i.imgur.com/wmNVai0.jpg
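Roughly, those steps look like this with the OpenCV Java bindings (I'm assuming 3.x here; the HSV range is just a placeholder you'd tune for the colour of your own rectangle, and I only keep the biggest contour):

    import java.util.ArrayList;
    import java.util.List;
    import org.opencv.core.*;
    import org.opencv.imgproc.Imgproc;
    import org.opencv.videoio.VideoCapture;

    public class RectangleFinder {
        public static void main(String[] args) {
            System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

            VideoCapture camera = new VideoCapture(0);
            Mat frame = new Mat();
            camera.read(frame);                                   // 1. capture the current frame

            Mat hsv = new Mat();
            Imgproc.cvtColor(frame, hsv, Imgproc.COLOR_BGR2HSV);  // 2. convert to HSV

            // 3. threshold a specific colour range (placeholder values)
            Mat mask = new Mat();
            Core.inRange(hsv, new Scalar(100, 100, 100), new Scalar(130, 255, 255), mask);

            // 4. find the contours in the thresholded image
            List<MatOfPoint> contours = new ArrayList<>();
            Imgproc.findContours(mask, contours, new Mat(),
                    Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

            // 5. draw a bounding rectangle around the largest contour found
            double maxArea = 0;
            Rect best = null;
            for (MatOfPoint c : contours) {
                Rect r = Imgproc.boundingRect(c);
                if (r.area() > maxArea) {
                    maxArea = r.area();
                    best = r;
                }
            }
            if (best != null) {
                Imgproc.rectangle(frame, best.tl(), best.br(), new Scalar(0, 255, 0), 2);
            }
            camera.release();
        }
    }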
After that I knew that I could place the rectangle in a way that all the letters inside it would be in a straight line, so I didn't need to care about the positioning of the letters. Now I had to fight with the OCR.
I chose Tesseract as it is open source and seems to be a strong tool (supported by Google, and that's certainly something), then I started to test some images.
In the beginning it was tough and I thought I'd have to train the OCR even more, but the thing is that it has a kind of dictionary that tries to find words listed in that dictionary, and I didn't need that, as I was looking for characters that could appear in a totally random order. I had to turn that dictionary off by adding the following lines to a config file:
load_system_dawg F
load_freq_dawg F
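If you drive Tesseract from Java through Tess4j, the same flags can apparently also be set programmatically; a small sketch (the data path and image name are placeholders, and since these are init-time parameters I'm not 100% sure they take effect this way, which is why I stuck with the config file):

    import java.io.File;
    import net.sourceforge.tess4j.Tesseract;
    import net.sourceforge.tess4j.TesseractException;

    public class NoDictOcr {
        public static void main(String[] args) throws TesseractException {
            Tesseract tesseract = new Tesseract();
            tesseract.setDatapath("tessdata");                   // placeholder path to traineddata files
            tesseract.setTessVariable("load_system_dawg", "F");  // disable the word dictionaries
            tesseract.setTessVariable("load_freq_dawg", "F");
            String text = tesseract.doOCR(new File("letters.png"));  // placeholder input image
            System.out.println(text);
        }
    }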
After that I had to change some things in the image as well (a quick sketch follows the list):
Transform it to grayscale;
Resize it by ~80%.
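Those two tweaks, sketched with the OpenCV Java bindings (the file names and the exact scale factor are placeholders):

    import org.opencv.core.*;
    import org.opencv.imgcodecs.Imgcodecs;
    import org.opencv.imgproc.Imgproc;

    public class Preprocess {
        public static void main(String[] args) {
            System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

            Mat original = Imgcodecs.imread("input.jpg");

            // 1. convert to grayscale
            Mat gray = new Mat();
            Imgproc.cvtColor(original, gray, Imgproc.COLOR_BGR2GRAY);

            // 2. scale the image to ~80% of its original size (factor is a placeholder)
            Mat resized = new Mat();
            Imgproc.resize(gray, resized, new Size(), 0.8, 0.8, Imgproc.INTER_AREA);

            Imgcodecs.imwrite("preprocessed.jpg", resized);
        }
    }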
Original images (I can't post links...):
i.imgur.com/DFqNSYB.jpg
i.imgur.com/2Ntfqy3.jpg
Grayscale:
imgur.com/XUZ9b1Z.jpg
i.imgur.com/yjXMH5Q.jpg
Resized:
i.imgur.com/zgX9bKF.jpg
i.imgur.com/CWPRU3I.jpg
(Sometimes I had problems with resized images and at other moments I didn't; that's something I have to test even more.)
Then I could get some good results, though I'm still wary, as the light of the environment makes a whole difference. I still have to test it more, and mainly I still need the actual base, which I'll post as an edit later.
If I did anything wrong or if anyone wants to correct me, please feel free to say so!
I have the following picture, from which I want to crop out all the letters into new images.
http://imgur.com/N2JqmFi
The result for each letter should look like this, for example:
http://imgur.com/LvjdZh1
I have had trouble achieving this; I have used thresholding, findContours, and many other things. I just cannot seem to cut out the letters, since the image contains a lot of noise.
Can someone please provide some help and information?
If it did not find any contours in your image, rect_min will contain invalid negative numbers, and submat will crash.
You'll have to add some logic that takes care of that.
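Something along these lines, assuming rect_min is the bounding rect you build from the contours before calling submat (I'm only guessing at your surrounding code, so treat this as a sketch):

    import java.util.ArrayList;
    import java.util.List;
    import org.opencv.core.*;
    import org.opencv.imgproc.Imgproc;

    public class SafeCrop {

        // Returns the crop of the largest contour's bounding box, or null if no contour exists.
        static Mat cropLargestContour(Mat binary) {
            List<MatOfPoint> contours = new ArrayList<>();
            Imgproc.findContours(binary, contours, new Mat(),
                    Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

            if (contours.isEmpty()) {
                // nothing found: bail out instead of building an invalid rect and calling submat
                return null;
            }

            Rect rectMin = Imgproc.boundingRect(contours.get(0));
            for (MatOfPoint c : contours) {
                Rect r = Imgproc.boundingRect(c);
                if (r.area() > rectMin.area()) {
                    rectMin = r;
                }
            }

            // make sure the rect really lies inside the image before cropping
            if (rectMin.x < 0 || rectMin.y < 0
                    || rectMin.x + rectMin.width > binary.cols()
                    || rectMin.y + rectMin.height > binary.rows()) {
                return null;
            }
            return binary.submat(rectMin);
        }
    }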
I'm working on an OCR project which reads an image, finds the words in the image and slices them into small pieces, where each piece will contain one word.
Problem:
I want an OCR API (Java, open source recommended) which finds the word edges for me. Is there anything available?
I have already gone through Tesseract (Tess4j) and JavaOCR, but I couldn't find anything in them about finding exact word locations.
Please share your ideas & knowledge...
What exactly do you mean by finding the edges? For example, do you want to find the coordinates of the rectangle that a word fits into, and output those coordinates?
As far as I remember from the JavaOCR code, you can see the part that draws rectangles over your image file and find out where the coordinates of those rectangles come from.
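If you end up going the Tess4j route after all, I believe newer versions expose word-level bounding boxes directly through getWords; a rough sketch (the data path and image name are placeholders):

    import java.awt.Rectangle;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.util.List;
    import javax.imageio.ImageIO;
    import net.sourceforge.tess4j.ITessAPI;
    import net.sourceforge.tess4j.Tesseract;
    import net.sourceforge.tess4j.Word;

    public class WordBoxes {
        public static void main(String[] args) throws Exception {
            Tesseract tesseract = new Tesseract();
            tesseract.setDatapath("tessdata");   // placeholder path to traineddata

            BufferedImage image = ImageIO.read(new File("page.png"));  // placeholder image

            // ask for word-level segmentation; each Word carries its text and bounding box
            List<Word> words = tesseract.getWords(image, ITessAPI.TessPageIteratorLevel.RIL_WORD);
            for (Word w : words) {
                Rectangle box = w.getBoundingBox();
                System.out.println(w.getText() + " -> " + box);
            }
        }
    }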
I have 5 images by default in the program, and I will allow the user to choose an image from the desktop. The program will determine which of the 5 images is the closest one to the user's image.
Can anyone help me and point me toward where to start?
You can try to use a feature extraction algorithm like SIFT, SURF, etc. Then compare the extracted features with your database. You can select the best matching image based on the number of correct matches.
Generally SIFT works fine for 2D objects, like a picture of a label or an advertisement board. Rotation in the 2D plane or scale won't matter if you are using SIFT. SURF is supposed to be an improvement on SIFT, but I do not have much experience with it.
These algorithms are said to be a bit heavy. Anyway, if you are matching just 5 images it won't be much of a problem. (Or you can simply calculate the descriptors (features) of your images beforehand and store them; then at run time all you have to do is get the descriptor of the user image and compare it.) Still, if you are trying to match images of basic shapes like squares and circles, using square detection or circle detection might be more efficient performance-wise.
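A rough sketch of that descriptor-matching idea with the OpenCV Java bindings. I'm using ORB here rather than SIFT/SURF only because it ships with the default build (SIFT/SURF availability depends on how OpenCV was compiled); the file names and the distance threshold are placeholders:

    import org.opencv.core.*;
    import org.opencv.features2d.DescriptorMatcher;
    import org.opencv.features2d.ORB;
    import org.opencv.imgcodecs.Imgcodecs;

    public class ClosestImage {

        // Counts "good" matches between two images; higher = more similar.
        static int countMatches(ORB orb, DescriptorMatcher matcher, Mat a, Mat b) {
            MatOfKeyPoint kpA = new MatOfKeyPoint(), kpB = new MatOfKeyPoint();
            Mat descA = new Mat(), descB = new Mat();
            orb.detectAndCompute(a, new Mat(), kpA, descA);
            orb.detectAndCompute(b, new Mat(), kpB, descB);
            if (descA.empty() || descB.empty()) return 0;

            MatOfDMatch matches = new MatOfDMatch();
            matcher.match(descA, descB, matches);

            // keep only reasonably close matches (threshold is a rough placeholder)
            int good = 0;
            for (DMatch m : matches.toArray()) {
                if (m.distance < 50) good++;
            }
            return good;
        }

        public static void main(String[] args) {
            System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

            ORB orb = ORB.create();
            DescriptorMatcher matcher = DescriptorMatcher.create(DescriptorMatcher.BRUTEFORCE_HAMMING);

            Mat user = Imgcodecs.imread("user.jpg", Imgcodecs.IMREAD_GRAYSCALE);
            String[] candidates = {"img1.jpg", "img2.jpg", "img3.jpg", "img4.jpg", "img5.jpg"};

            String best = null;
            int bestScore = -1;
            for (String name : candidates) {
                Mat candidate = Imgcodecs.imread(name, Imgcodecs.IMREAD_GRAYSCALE);
                int score = countMatches(orb, matcher, user, candidate);
                if (score > bestScore) {
                    bestScore = score;
                    best = name;
                }
            }
            System.out.println("Closest image: " + best + " (" + bestScore + " matches)");
        }
    }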
I am doing my final-year MCA and my topic is image matching using edge detection.
I have an image at hand where the subject is not smiling, and two other images:
A bigger image containing the image at hand as part of it;
The same as the image at hand, with some modification (like smiling).
Now I want to check for presence in the first case, and for a match in the second case.
My approach:
I will find edges for all the given images, to reduce the amount of data to check.
I'm stuck on how to proceed. Any suggestions are extremely appreciated.
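For the edge-extraction step itself, a minimal sketch with the OpenCV Java bindings (the Canny thresholds and file name are placeholders to tune):

    import org.opencv.core.*;
    import org.opencv.imgcodecs.Imgcodecs;
    import org.opencv.imgproc.Imgproc;

    public class EdgeMap {
        public static void main(String[] args) {
            System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

            Mat image = Imgcodecs.imread("subject.jpg", Imgcodecs.IMREAD_GRAYSCALE);

            // light blur first so Canny reacts to real edges rather than noise
            Mat blurred = new Mat();
            Imgproc.GaussianBlur(image, blurred, new Size(5, 5), 0);

            // edge map; the two thresholds are placeholders to tune per image set
            Mat edges = new Mat();
            Imgproc.Canny(blurred, edges, 50, 150);

            Imgcodecs.imwrite("subject_edges.png", edges);
        }
    }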
I have been a Java user for years and I can do virtually anything with it, but... since I found Mathematica about 2 years ago, I have really started to love it. This is the kind of problem I would use Mathematica to solve.
Just take a look at the image processing reference.
Example of the ImageCorrelate function:
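If you stay in Java instead, the closest everyday equivalent I know of is template matching in OpenCV (not ImageCorrelate itself, just the analogous correlation search); a sketch with placeholder file names:

    import org.opencv.core.*;
    import org.opencv.imgcodecs.Imgcodecs;
    import org.opencv.imgproc.Imgproc;

    public class FindSubImage {
        public static void main(String[] args) {
            System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

            Mat scene = Imgcodecs.imread("bigger.jpg", Imgcodecs.IMREAD_GRAYSCALE);
            Mat template = Imgcodecs.imread("face.jpg", Imgcodecs.IMREAD_GRAYSCALE);

            // normalized cross-correlation of the template over the scene
            Mat result = new Mat();
            Imgproc.matchTemplate(scene, template, result, Imgproc.TM_CCOEFF_NORMED);

            // the peak of the correlation map is the best candidate location
            Core.MinMaxLocResult mmr = Core.minMaxLoc(result);
            System.out.println("Best match at " + mmr.maxLoc + " with score " + mmr.maxVal);
        }
    }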
CVOnline is a great source of computer vision algorithms. The section on edge detection can probably point you in the right direction.