API to read text from Image file using OCR

I am looking for example code, or the name of an API, for OCR (optical character recognition) in Java that can extract all the text in an image file without comparing it against reference font images, which is what the code below does.
public class OCRTest {
static String STR = "";
public static void main(String[] args) {
OCR l = new OCR(0.70f);
l.loadFontsDirectory(OCRTest.class, new File("fonts"));
l.loadFont(OCRTest.class, new File("fonts", "font_1"));
ImageBinaryGrey i = new ImageBinaryGrey(Capture.load(OCRTest.class, "full.png"));
STR = l.recognize(i, 1285, 654, 1343, 677, "font_1");
System.out.println(STR);
}
}

You can try Tess4j or the JavaCPP Presets for Tesseract. I prefer the latter, as it is easier than the former.
Add the dependency to your pom.xml:
<dependency>
<groupId>org.bytedeco.javacpp-presets</groupId>
<artifactId>tesseract-platform</artifactId>
<version>3.04.01-1.3</version>
</dependency>
And it is simple to code:
import org.bytedeco.javacpp.*;
import static org.bytedeco.javacpp.lept.*;
import static org.bytedeco.javacpp.tesseract.*;
public class BasicExample {
public static void main(String[] args) {
BytePointer outText;
TessBaseAPI api = new TessBaseAPI();
// Initialize tesseract-ocr with English, without specifying tessdata path
if (api.Init(null, "eng") != 0) {
System.err.println("Could not initialize tesseract.");
System.exit(1);
}
// Open input image with leptonica library
PIX image = pixRead(args.length > 0 ? args[0] : "/usr/src/tesseract/testing/phototest.tif");
api.SetImage(image);
// Get OCR result
outText = api.GetUTF8Text();
System.out.println("OCR output:\n" + outText.getString());
// Destroy used object and release memory
api.End();
outText.deallocate();
pixDestroy(image);
}
}
Tess4j is a little more complex, as it requires a specific VC++ redistributable package to be installed.

You can try javaocr on sourceforge: http://javaocr.sourceforge.net/
There is also a great example with an applet which uses Encog: http://www.heatonresearch.com/articles/42/page1.html
That said, OCR is computationally expensive, so if you expect heavy use, you should look for OCR libraries written in C and integrate them with Java.
OCR is hard, so be sure to pin down your requirements before venturing into it.
Tesseract and opencv (with javacv for integration for instance) are common choices. There are also commercial solutions such as ABBYY FineReader Engine and ABBYY Cloud OCR SDK.

An open-source OCR engine is available from Google.
It can be run from the command line (CMD), and you can invoke that command from Java, for example in a web application. See https://www.youtube.com/watch?v=Mjg4yyuqr5E for step-by-step details on running OCR from the command line.
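For the command-line route described above, a minimal sketch of launching the engine from Java with ProcessBuilder might look like the following. It assumes the engine is Tesseract, that its `tesseract` executable is on the PATH, and that the `eng` language data is installed. The class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class CommandLineOcr {

    // Builds "tesseract <image> stdout -l <lang>". Passing "stdout" as the output
    // base makes Tesseract print the recognized text instead of writing a file.
    public static List<String> buildCommand(String imagePath, String language) {
        return Arrays.asList("tesseract", imagePath, "stdout", "-l", language);
    }

    // Runs the command and returns everything it wrote to standard output.
    public static String recognize(String imagePath, String language)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(buildCommand(imagePath, language));
        // Send diagnostics to our own stderr so they do not mix with the text.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // "full.png" is the image name used in the question.
        System.out.println(recognize(args.length > 0 ? args[0] : "full.png", "eng"));
    }
}
```

Starting a process per image is simple but slow, so for heavy use a binding that keeps the engine loaded in-process is probably the better fit.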

Related

Can Eclipse for Java and Eclipse for WEBMethods be on the same hard drive?

I am new to WEBMethods. I have been working on a Java service for a project, and I need to write some plain Java for quick testing: reading in a simple text expression and applying some regular expressions. Nothing fancy on the Java side. But Eclipse is currently set up for WEBMethods, and I need a regular Java mode in Eclipse (if there is such a thing). At home I have the standard Eclipse version and have no trouble writing code, but at work Eclipse has WEBMethods installed (Software AG Designer). I think that if I can write the code in regular Java, I can copy and paste it into the WEBMethods Java service, set up the INPUT and OUTPUT variables, and it should work. But currently I cannot find a way to write plain Java code the way I do on my home computer.
Question: How can I write a regular Java program (classes, packages, etc.) on a machine with WEBMethods installed? Do I have to install another copy of Eclipse on my hard drive? (I tried this a while back, and there was an issue with having more than one copy of Eclipse on the machine.)
Java Web Services Code:
package DssAccessBackup.services.flow;
import com.wm.data.*;
import com.wm.util.Values;
import com.wm.app.b2b.server.Service;
import com.wm.app.b2b.server.ServiceException;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public final class new_javaService_SVC
{
/**
* The primary method for the Java service
*
* @param pipeline
* The IData pipeline
* @throws ServiceException
*/
public static final void new_javaService(IData pipeline)
throws ServiceException {
// pipeline
IDataCursor pipelineCursor = pipeline.getCursor();
String inputFileName = IDataUtil.getString( pipelineCursor, "inputFileName" );
pipelineCursor.destroy();
// pipeline
IDataCursor pipelineCursor_1 = pipeline.getCursor();
IDataUtil.put( pipelineCursor_1, "fileName", "fileName" );
// outDoc
IData outDoc = IDataFactory.create();
IDataUtil.put( pipelineCursor_1, "outDoc", outDoc );
pipelineCursor_1.destroy();
String fileName = new String();
fileName = null;
try {
BufferedReader reader = new BufferedReader(new FileReader("C:\\Users\\itpr13266\\Desktop\\TestFile.txt"));
String line = null;
//Will read through the file until EOF
while ((line = reader.readLine()) != null) {
System.out.println(line);
}
} catch (IOException e) {
System.out.println("Try-Catch Message - " + e.getMessage());
e.printStackTrace();
}
}
// --- <<IS-BEGIN-SHARED-SOURCE-AREA>> ---
// --- <<IS-END-SHARED-SOURCE-AREA>> ---
}

You don't need to install another Eclipse for Java development. WebMethods Designer (v9) comes with Java tooling: just open the Java perspective and use it.
That said, when developing WebMethods Java services you should use the Service Development perspective, because WM Designer handles Java services in a special way that can make importing standard Java files difficult.
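For the quick regular-expression testing mentioned in the question, a plain class with a main method in the Java perspective is probably all that is needed. The sketch below is only an illustration: the class name, the method name and the digit pattern are hypothetical, and the helper deliberately uses no com.wm types so that it could later be pasted into a Java service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexScratch {

    // Hypothetical pattern (runs of digits); replace it with the expression under test.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns every match of the pattern in the given line, in order.
    public static List<String> findMatches(String line) {
        List<String> matches = new ArrayList<String>();
        Matcher matcher = DIGITS.matcher(line);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findMatches("order 42 shipped in 3 boxes")); // prints [42, 3]
    }
}
```

Inside the service, the pipeline code would then only read the input string, call the helper and put the result back.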

There is no problem running multiple instances of Eclipse at the same time as long as they point to different workspaces.
Normally you get a dialog to choose the workspace when Eclipse starts up. If not, check this answer on how to enable that dialog: https://stackoverflow.com/a/8616216/1599890
So if you download, unzip, and set up Eclipse for Java development and point it to a different workspace from the one Software AG Designer uses, you should be good to go.

Add Java SE Classes to Java ME read PDF

I'm writing a Java ME application that uses iText to read PDFs. When I write my code in standard Java with the iText libraries on the classpath, the application runs. However, if I move the code into a Java mobile application with the iText libraries on the classpath, compilation fails with this error:
error: cannot access URL
PdfReader reader = new PdfReader(pdfPath);
class file for java.net.URL not found
I need a workaround to read the PDF file. I've tried adding rt.jar, which is the package that contains java.io, as a library, but it is too big to be compiled. Please help me find a workaround. My code is here:
package PDFreaderpackage;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import com.sun.lwuit.Display;
import com.sun.lwuit.Form;
import com.sun.lwuit.TextArea;
import javax.microedition.midlet.MIDlet;
public class Midlet extends MIDlet {
Form displayForm;
TextArea pdfText;
private String bookcontent;
public static String INPUTFILE = "c:/test.pdf";
public static int pageNumber = 1;
public void startApp() {
Display.init(this);
this.bookcontent = readPDF(INPUTFILE, pageNumber);
pdfText = new TextArea(bookcontent);
displayForm = new Form("Works");
displayForm.addComponent(pdfText);
displayForm.show();
}
public void pauseApp() {
}
public void destroyApp(boolean unconditional) {
}
public String readPDF(String pdfPath, int pageNumber) {
try {
PdfReader reader = new PdfReader(pdfPath);
this.bookcontent = PdfTextExtractor.getTextFromPage(reader, pageNumber);
} catch (Exception e) {
System.out.println(e);
}
return bookcontent;
}
}

These classes aren't available on a mobile device, and Java ME doesn't support Java 5 features. What you are trying to do is somewhat impractical. Codename One allows some more classes thanks to bytecode processing, but even then this isn't close to a complete rt.jar.

If you have the time, you can try to create a Java ME compliant version of iText, but to open a PDF properly the library needs some form of random-access file, because the xref table is located at the end of the file. This sort of file connection is not available in Java ME.
What the library can do instead is load the whole PDF into memory, but whether that works depends heavily on the file size and the memory available on the handset.
You would be better off creating a web service that receives your PDF and returns, for example, PNG images of its pages.
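To illustrate the point about the xref table, here is a small Java SE sketch (so it could run in such a web service, not on the handset) of the first thing a PDF reader does: seek to the end of the file and read the offset after the `startxref` keyword. It is not how iText is implemented. The class name and the 1024-byte tail size are assumptions made for the example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class XrefLocator {

    // Returns the byte offset written after the last "startxref" keyword, which is
    // where a reader has to jump next to find the cross-reference table.
    public static long findStartXref(File pdf) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(pdf, "r")) {
            // The keyword sits near the end of the file, so only the tail is read.
            int tailLength = (int) Math.min(1024, file.length());
            byte[] tail = new byte[tailLength];
            file.seek(file.length() - tailLength);
            file.readFully(tail);

            String text = new String(tail, StandardCharsets.ISO_8859_1);
            int keyword = text.lastIndexOf("startxref");
            if (keyword < 0) {
                throw new IOException("no startxref keyword in the last " + tailLength + " bytes");
            }

            // The decimal offset follows the keyword; collect digits until the first non-digit.
            String rest = text.substring(keyword + "startxref".length()).trim();
            int end = 0;
            while (end < rest.length() && Character.isDigit(rest.charAt(end))) {
                end++;
            }
            if (end == 0) {
                throw new IOException("startxref is not followed by an offset");
            }
            return Long.parseLong(rest.substring(0, end));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java XrefLocator <file.pdf>");
            return;
        }
        System.out.println("xref table starts at byte " + findStartXref(new File(args[0])));
    }
}
```

The two jumps (to the end of the file, then back to the returned offset) are what a forward-only stream cannot do without buffering the whole document.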

Lotus Domino 8.5.2 Java Agent , write Metadata to extracted attachments?

I'm using Lotus Domino server 8.5.2. Using Java scheduled agents, I can extract the attachments of several Lotus Domino documents to the file system (Win32). The idea is that after extraction I need to add some metadata to the files and upload them to another system.
Does anyone know, or can anyone give me a few tips on, how I can write metadata to the extracted files (preferably using Java)? I need to add some keywords, change the author, and so on. I understand Lotus Domino 8.5.2 supports Java 6.
thank you!
Alex.

According to this answer, Java 7 has a native ability to manipulate Windows metadata but Java 6 does not.
It does say that you can use Java Native Access (JNA) to make calls to native DLLs, which means you should be able to use dsofile.dll to manipulate the metadata. Example from here of using JNA to access the "puts" function from msvcrt.dll (couldn't find any examples specific to dsofile.dll):
Interface
package CInterface;
import com.sun.jna.Library;
public interface CInterface extends Library
{
public int puts(String str);
}
Sample class
// JNA Demo. Scriptol.com
package CInterface;
import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Platform;
public class hello
{
public static void main(String[] args)
{
String mytext = "Hello World!";
if (args.length != 1)
{
System.err.println("You can enter your own text between quotes...");
System.err.println("Syntax: java -jar /jna/dist/demo.jar \"myowntext\"");
}
else
mytext = args[0];
// Library is c for unix and msvcrt for windows
String libName = "c";
if (System.getProperty("os.name").contains("Windows"))
{
libName = "msvcrt";
}
// Loading dynamically the library
CInterface demo = (CInterface) Native.loadLibrary(libName, CInterface.class);
demo.puts(mytext);
}
}

Unable to load library 'gsdll32'

I am running the following code to create a BMP image from a PDF using Ghost4j.
I have a command, executed by the Ghostscript interpreter, that generates a BMP image of a page from the PDF.
Code is:
package ghost;
import net.sf.ghost4j.Ghostscript;
import net.sf.ghost4j.GhostscriptException;
public class GhostDemo {
public static void main(String[] a){
Ghostscript gs = Ghostscript.getInstance(); //create gs instance
String[] gsArgs = { /*command string array; an initializer avoids reused indices and null slots*/
"-dUseCropBox", /*use crop box*/
"-dNOPAUSE",
"-dBATCH",
"-dSAFER",
"-r300",
"-sDEVICE=bmp16m",
"-dTextAlphaBits=4",
"-sOutputFile=C:/PagesWorkspace/1/masterData/1.bmp", /*bmp file location with name*/
"C:/MasterWorkspace/pipeline.pdf" /*pdf location with name*/
};
try {
gs.initialize(gsArgs); /*initialise ghost interpreter*/
gs.exit();
} catch (GhostscriptException e) {
e.printStackTrace();
}
}
}
I am getting this exception:
Exception in thread "main" java.lang.UnsatisfiedLinkError: Unable to load library 'gsdll32': The specified module could not be found.
at com.sun.jna.NativeLibrary.loadLibrary(NativeLibrary.java:145)
at com.sun.jna.NativeLibrary.getInstance(NativeLibrary.java:188)
at com.sun.jna.Library$Handler.<init>(Library.java:123)
at com.sun.jna.Native.loadLibrary(Native.java:255)
at com.sun.jna.Native.loadLibrary(Native.java:241)
at net.sf.ghost4j.GhostscriptLibraryLoader.loadLibrary(GhostscriptLibraryLoader.java:36)
at net.sf.ghost4j.GhostscriptLibrary.<clinit>(GhostscriptLibrary.java:32)
at net.sf.ghost4j.Ghostscript.initialize(Ghostscript.java:292)
at ghost.GhostDemo.main(GhostDemo.java:22)
Can anyone tell me why I am getting this exception?

Do you have Ghostscript installed at all?
If yes, which version?
If yes, in which location?
Does it include a file gsdll32.dll?
If not, download the Ghostscript installer for Win32 and run it. After the installation, there should be a file gsdll32.dll in directory %your_install_dir%\gs\gs9.05\bin\
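The checks above can also be done from Java before Ghost4j is initialized. The following is a rough sketch, not part of Ghost4j or JNA. It assumes the DLL is looked up through the `jna.library.path` system property or the directories on PATH, and the class and method names are hypothetical.

```java
import java.io.File;
import java.util.regex.Pattern;

public class LibraryLocator {

    // Returns the first directory in a path-separator-delimited list that contains
    // fileName, or null when no directory in the list contains it.
    public static File findOnPath(String searchPath, String fileName) {
        if (searchPath == null || searchPath.isEmpty()) {
            return null;
        }
        for (String entry : searchPath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.isEmpty()) {
                continue;
            }
            File candidate = new File(entry, fileName);
            if (candidate.isFile()) {
                return candidate.getParentFile();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String dll = "gsdll32.dll";
        // Look in jna.library.path first, then fall back to the PATH variable.
        File hit = findOnPath(System.getProperty("jna.library.path"), dll);
        if (hit == null) {
            hit = findOnPath(System.getenv("PATH"), dll);
        }
        System.out.println(hit == null ? dll + " not found" : dll + " found in " + hit);
    }
}
```

If the file is found and loading still fails, the cause is likely something other than the location, such as a mismatch between the DLL and the JVM.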

Pasting the DLL file into the Eclipse project made my program work!

For the SO community, another thing to check with this error is that you are using 32-bit Java. If your instance of Java is 64-bit, you will get the exact same message:
Unable to load library 'gsdll32': The specified module could not be found.
without any further explanation even if you are pointing to the correct dll.
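A quick way to check which kind of JVM is running is to print its architecture properties. This is a minimal sketch: `sun.arch.data.model` is a HotSpot-specific property that may be missing on other JVMs, so `os.arch` (which describes the JVM, not the operating system) is used as a fallback. The class name is made up.

```java
public class JvmBitness {

    // dataModel is the value of "sun.arch.data.model" ("32", "64", "unknown" or null);
    // osArch is the value of "os.arch" and is only used when dataModel is inconclusive.
    public static boolean is64Bit(String dataModel, String osArch) {
        if ("64".equals(dataModel)) {
            return true;
        }
        if ("32".equals(dataModel)) {
            return false;
        }
        return osArch != null && osArch.contains("64");
    }

    public static void main(String[] args) {
        boolean is64 = is64Bit(System.getProperty("sun.arch.data.model"),
                               System.getProperty("os.arch"));
        System.out.println(is64
                ? "64-bit JVM: a 32-bit DLL such as gsdll32.dll cannot be loaded"
                : "32-bit JVM: a 32-bit DLL such as gsdll32.dll can be loaded");
    }
}
```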

fannj library doesn't work

I'm trying to run a project that uses the fannj library, but I'm getting this error:
Exception in thread "main" java.lang.UnsatisfiedLinkError: Error looking up function 'fann_create_standard_array':
at com.sun.jna.Function.<init>(Function.java:179)
at com.sun.jna.NativeLibrary.getFunction(NativeLibrary.java:347)
at com.sun.jna.NativeLibrary.getFunction(NativeLibrary.java:327)
at com.sun.jna.Native.register(Native.java:1355)
at com.sun.jna.Native.register(Native.java:1032)
at com.googlecode.fannj.Fann.<clinit>(Fann.java:46)
at javaapplication9.JavaApplication9.main(JavaApplication9.java:14)
Java Result: 1
This is what I did:
I put fannfloat.dll in C:\Windows\System32
I added fannj-0.3.jar to the project
I added the newest jna.jar to the project
Here is the code:
public static void main(String[] args) {
System.setProperty("jna.library.path", "C:\\Windows\\System32");
System.loadLibrary("fannfloat");
Fann fann=new Fann("D:\\SunSpots.net");
fann.close();
}
SunSpots.net is a file from the example package. fannfloat.dll: you can get it from here.

The "@8" at the end of _fann_create_standard_array indicates that the library uses the stdcall calling convention, so your library interface needs to extend the StdCallLibrary interface. It will then automatically have the function name mapper applied that converts your plain Java name to the decorated stdcall one.
This is covered in the JNA documentation.
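To make the decoration concrete: on 32-bit Windows a stdcall export is named with a leading underscore, the plain function name, an "@" and the number of bytes the arguments take on the stack. The sketch below only shows that naming rule. It is not JNA's mapper, the class is hypothetical, and it assumes every argument is a 4-byte int or pointer.

```java
public class StdCallNames {

    // Bytes taken on the stack by the arguments of a 32-bit stdcall function.
    // Assumption: each argument is an int or a pointer, 4 bytes each.
    public static int argumentBytes(int argumentCount) {
        return argumentCount * 4;
    }

    // Decorated export name: underscore, plain name, '@', argument bytes.
    public static String decorate(String functionName, int argumentCount) {
        return "_" + functionName + "@" + argumentBytes(argumentCount);
    }

    public static void main(String[] args) {
        // fann_create_standard_array takes two arguments: a layer count and a pointer.
        System.out.println(decorate("fann_create_standard_array", 2));
        // prints _fann_create_standard_array@8
    }
}
```

A lookup for the undecorated name fails against such a DLL, which matches the "Error looking up function" message in the question.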

It was the first time I had to work with FANN, and it took me some time to make it work.
Download FANN 2.2.0. Extract it (in my case to "C:/FANN-2.2.0-Source") and check the path of the fannfloat.dll file. This is the library that we will use later.
Download fannj-0.6.jar from http://code.google.com/p/fannj/downloads/list.
The DLL is compiled for a 32-bit environment, so make sure you have a 32-bit Java installed (even on 64-bit Windows).
I suppose you already have the .net file with your ANN. Write something like this in Java:
public class FannTest {
public static void main(String[] args) {
System.setProperty("jna.library.path", "C:/FANN-2.2.0-Source/bin");
Fann fann = new Fann("C:/MySunSpots.net" );
float[] inputs = new float[]{0.686470295f, 0.749375936f, 0.555167249f, 0.816774838f, 0.767848228f, 0.60908637f};
float[] outputs = fann.run( inputs );
fann.close();
for (float f : outputs) {
System.out.print(f + ",");
}
}
}
