Getting handwritten text from images - java

How to extract handwritten text from images, like bank form images, in Java?
I tried Tesseract OCR and GOCR, but neither worked for me. Are there any other ways to extract handwritten text from images in Java with at least 80-90% accuracy?

Question with links to libraries
I don't think Java natively supports this, so you'll have to use a library.
There was a question asking for working libraries, and the general consensus was that you won't get 80-90% recognition from a free, open-source library.
Anyway, you can try this, as it is a wrapper for Tesseract.

Related

Convert string text to GCode

I need to create an app (using Android Studio) that generates CNC code to operate a 3D printer. It takes a String as input.
I've found a couple of libraries in Python and JavaScript that do this, but as I don't have time to translate whole libraries to Java, can you recommend any Java libraries that do it for me? If there are no open-source options, can you recommend a guide to help me develop this converter?
What we ended up doing:
App asks for a String as input;
String is converted to a bitmap and then saved as a .png;
.png is loaded and converted to a .svg file. We used this repo: https://github.com/jankovicsandras/imagetracerandroid
We developed a parser to convert a .svg to g-code.
It worked, but it's not the best solution. We're looking to implement something that runs Python on Android, as there are many Python libraries that already do all the work, but that's how we've done it and it's working for now.
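The last step above (turning traced paths into G-code) can be sketched roughly as follows. This is not the parser from the answer, only a minimal illustration. It assumes the .svg paths have already been flattened into a list of X/Y points in millimetres; the class name `PolylineToGcode`, the method `toGcode`, and the feed rate are hypothetical. Only standard commands are used: G21 (millimetre units), G90 (absolute positioning), G0 (rapid move) and G1 (linear move).

```java
import java.util.List;
import java.util.Locale;

public class PolylineToGcode {

    /**
     * Converts one polyline to G-code. Each element of points is {x, y} in millimetres.
     * Hypothetical helper for illustration; a real parser would also handle curves,
     * multiple paths and pen/extruder control between them.
     */
    public static String toGcode(List<double[]> points, double feedRate) {
        StringBuilder sb = new StringBuilder();
        sb.append("G21\n"); // units: millimetres
        sb.append("G90\n"); // absolute positioning
        for (int i = 0; i < points.size(); i++) {
            double[] p = points.get(i);
            if (i == 0) {
                // Rapid move to the start of the path
                sb.append(String.format(Locale.ROOT, "G0 X%.3f Y%.3f\n", p[0], p[1]));
            } else {
                // Linear move to the next point at the given feed rate
                sb.append(String.format(Locale.ROOT, "G1 X%.3f Y%.3f F%.0f\n", p[0], p[1], feedRate));
            }
        }
        return sb.toString();
    }
}
```

`Locale.ROOT` is passed explicitly so that decimal points are not replaced by commas on devices with a different default locale.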

Java Desktop Capture

I want to continuously capture the entire desktop inside of a java application. As I'm capturing, I'd like to chunk the stream of data into small video files (mp4, WebM) for storage. From my research, it would seem that the Robot Java class and the FFmpeg tool are my best options. However, Robot seems to best-fit the use case of obtaining images, not videos. FFmpeg seems like it may support this, but I've struggled to find definitive documentation. I'm looking to emulate what can be done through Chrome's getUserMedia and desktopCapture APIs along with the MediaStreamRecorder JavaScript library. Does anyone have a suggestion for a similar and elegant solution in Java?
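One possible direction, sketched under assumptions rather than offered as a tested solution: FFmpeg can grab the desktop through a platform-specific input device (for example `x11grab` with input `:0.0` on Linux, or `gdigrab` with input `desktop` on Windows) and split its output with the `segment` muxer. The hypothetical helper below only builds the command line; the class name, method name and parameter values are illustrative, and it assumes an `ffmpeg` binary is on the PATH.

```java
import java.util.ArrayList;
import java.util.List;

public class FfmpegSegmentCommand {

    /**
     * Builds an ffmpeg command that records the screen and writes fixed-length chunks.
     * grabFormat and input are platform-specific (e.g. "x11grab" and ":0.0").
     */
    public static List<String> build(String grabFormat, String input, int fps,
                                     int segmentSeconds, String outputPattern) {
        List<String> cmd = new ArrayList<>();
        cmd.add("ffmpeg");
        cmd.add("-f"); cmd.add(grabFormat);             // screen-capture input device
        cmd.add("-framerate"); cmd.add(String.valueOf(fps));
        cmd.add("-i"); cmd.add(input);
        cmd.add("-f"); cmd.add("segment");              // split output into chunks
        cmd.add("-segment_time"); cmd.add(String.valueOf(segmentSeconds));
        cmd.add("-reset_timestamps"); cmd.add("1");     // each chunk starts at time zero
        cmd.add(outputPattern);                          // e.g. chunk_%03d.mp4
        return cmd;
    }
}
```

The resulting list can be handed to `new ProcessBuilder(cmd).inheritIO().start()` to launch the recording from the Java application.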

Using Java to capture an area of the screen and identify text found there

This question may be beyond the scope of a simple answer here at Stack Overflow, but my hope is that it will help me formulate several more specific questions to get where I need to be.
I want to write a program that searches a buffered image for text and returns it as a string. I don't want to write an entire OCR program, but would rather use a freely available API such as Tesseract. Unfortunately, I've been unable to find a Java API for Tesseract.
I know that the font is Arial and I know its size. I am wondering if that will help.
I've already managed to capture the screen, but I'm not sure how to accomplish the next step of identifying the text found in the image.
The question
How can I implement a simple OCR function in my Java program?
You can use tesjeract or tess4j, both Java wrappers for the Tesseract API. Be sure to rescale your images to 300 DPI, since screenshot resolution (72 or 96 DPI) is generally not adequate for OCR.
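The rescaling step can be done with the standard library before the image is handed to an OCR wrapper. The sketch below assumes the screenshot was captured at a known DPI (96 in the usage example) and scales the pixel dimensions by targetDpi / sourceDpi; the class name `OcrUpscaler`, the method `rescale`, and the choice of bicubic interpolation are illustrative, not something the wrappers require.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class OcrUpscaler {

    /** Scales src so that an image captured at sourceDpi has the pixel density of targetDpi. */
    public static BufferedImage rescale(BufferedImage src, int sourceDpi, int targetDpi) {
        double factor = (double) targetDpi / sourceDpi;
        int w = (int) Math.round(src.getWidth() * factor);
        int h = (int) Math.round(src.getHeight() * factor);
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Bicubic interpolation keeps glyph edges smoother than nearest-neighbour scaling
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

For a 96 DPI capture, `OcrUpscaler.rescale(screenshot, 96, 300)` enlarges each dimension by a factor of 3.125.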
Implementing OCR yourself is complicated, but using an SDK like http://asprise.com/product/ocr/index.php?lang=java is simple.

How to perform: Upload Image > Recognize Text > Make Image Searchable > Store into DB?

I need to know how to perform the procedure described in the title.
You'll upload an image (e.g. a piece of text, an article) and on server-side the text will be recognized via OCR and stored into a database.
Which would be the best programming language for it? It should be a browser application.
I found the ocropus project, but how can I combine it with common web scripting languages like PHP? Is it possible at all? I haven't worked with Python yet.
Or a totally different approach? Java Enterprise?
Let's rock that,
Chris
Maybe you can use this PHP library, which I use to recognize text from images and store the recognized text in a database:
http://www.phpclasses.org/browse/package/2874/download/targz.html
Download the rar package and run example.php and then example1.php to see how it works.
Here is an image upload example:
http://www.reconn.us/content/view/30/51/
Hope this helps.

Extract text from PDF (google app engine)

Is there any free Java library for extracting text from PDF, that is compatible with Google Application Engine?
I've read about PDFJet, but it can't read PDF, can it?
Is there perhaps another way to extract text from PDF? I tried http://www.pdfdownload.org/, but unfortunately it doesn't handle non-English characters correctly.
iText now has a text parsing module (I'm one of the parser authors). See the com.itextpdf.text.pdf.parser.PdfContentReaderTool class for an example of how to use it.
PdfBox does not run on GAE. It uses Java classes that are not allowed.
(GAE only permits these: http://code.google.com/appengine/docs/java/jrewhitelist.html)
I have partially modified a very old version of PdfBox (0.7.3) to be GAE compliant. Now I'm able to extract text from PDF (whole page or rectangular area). I only modified a minimal part of the PDF text extraction and not the whole of PdfBox. :)
The idea was to remove references to java.awt.Rectangle & co. by using my own "rectangle" class.
More info: http://fhtino.blogspot.com/2010/04/pdfbox-text-extration-gae.html
I modified the latest (1.8.0-SNAPSHOT) version to run on Google App Engine. I had to disable one unit test, but it runs fine for simple text extraction.
Following a simple try-fail-fix approach, I had to modify 5 files in total. Pretty doable.
You'll also have to explicitly use a RandomAccessBuffer, like Fabrizio explained.
For the extra lazy, here's the compiled jar, the dependencies for text extraction, and the patch. Note that it might not work for every use case (e.g. rectangle-based extraction). I used it to extract the text of a whole page.
https://docs.google.com/folder/d/0B53n_gP2oU6iVjhOOVBNZHk0a0E/edit
I know there is http://pdfbox.apache.org/index.html
"Apache PDFBox is an open source Java PDF library for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents."
but I've never tested it.
Last month I finished extracting text from PDF files in my project. I used the XPDF tool to get the text and the text coordinates, but I used it in Xcode (Objective-C). The tool is open source, written in C++, and can be used from many languages. However, I don't know whether XPDF will work with your Java setup. Anyway, you can try this tool.
