Tesseract image to searchable pdf in java


I am trying to convert an image to a searchable PDF using Tesseract. The command-line option below works fine for me.
I am exploring a similar option in Java, but I am not sure what to pass as arguments. Below is my Java code:
import java.io.File;
import java.util.Arrays;
import java.util.List;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
public class Mask2 {
    public static void main(String[] args) {
        File image = new File("D:\\ML\\Java\\img3.PNG");
        Tesseract tesseract = new Tesseract();
        tesseract.setDatapath("C://Program Files//Tesseract-OCR//tessdata");
        tesseract.setLanguage("eng");
        tesseract.setPageSegMode(1);
        tesseract.setOcrEngineMode(1);
        try {
            // Not sure what to pass in arguments
            tesseract.createDocumentsWithResults()
        } catch (TesseractException e) {
            e.printStackTrace();
        }
    }
}
Any suggestions or solutions would be very helpful.

You can create a list of render formats like this (you can add others):
import java.util.ArrayList;
import java.util.List;
import net.sourceforge.tess4j.ITesseract.RenderedFormat;
List<RenderedFormat> renderFormats = new ArrayList<RenderedFormat>();
renderFormats.add(RenderedFormat.PDF);
Then pass the path of the input file (PDF or image), the path of the output file without an extension, and the render formats you want to use. Tesseract appends the extension itself, so with RenderedFormat.PDF the result should be a/b/c/outputfile.pdf.
tesseract.createDocuments("a/b/c/inputfile.PNG", "a/b/c/outputfile", renderFormats);
Ciao!

Related

Java Creating copy of file before uploading to AWS cloud

I have an image in a directory.
I want to make a copy of that image with a different name without doing harm to the original image in the same directory.
So there will be two identical images in one folder with different names.
I want some basic code, like what I tried:
File source = new File("resources/"+getImage(0));
File dest = new File("resources/");
source.renameTo("resources/"+getImage(0)+);
try {
FileUtils.copyDirectory(source, dest);
} catch (IOException e) {
e.printStackTrace();
}
When I upload the same image to the Amazon server multiple times in automation, the upload starts giving issues.
So we want to upload a mirror copy of the image every time.
Eclipse projects generally have a resources folder. I want to make a copy of the original image every time before we upload, and delete the copy after the upload.
Kindly suggest an approach.

You can just copy the file and use StandardCopyOption.COPY_ATTRIBUTES:
public static final StandardCopyOption COPY_ATTRIBUTES
Copy attributes to the new file.
Files.copy(Paths.get("/path/to/file/and/filename"),
        Paths.get("/path/to/file/and/newfilename"), StandardCopyOption.COPY_ATTRIBUTES);
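As a rough sketch of the copy-then-delete workflow the question asks about, the following copies the image next to the original under a unique name and removes the copy afterwards. The class name UploadCopy, the helper copyWithUniqueName, and the random UUID in the file name are illustrative assumptions, and the actual upload call is left as a placeholder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

class UploadCopy {
    /** Copies source into the same directory under a unique name and returns the new path. */
    static Path copyWithUniqueName(Path source) throws IOException {
        String fileName = source.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        String ext = dot > 0 ? fileName.substring(dot) : "";
        // A random UUID keeps repeated runs from colliding on the same name.
        Path target = source.resolveSibling(base + "-" + UUID.randomUUID() + ext);
        return Files.copy(source, target, StandardCopyOption.COPY_ATTRIBUTES);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: UploadCopy <image-path>");
            return;
        }
        Path copy = copyWithUniqueName(Paths.get(args[0]));
        try {
            // Placeholder: hand the copy to whatever performs the upload.
            System.out.println("Ready to upload: " + copy);
        } finally {
            // Remove the temporary copy; the original file is never touched.
            Files.deleteIfExists(copy);
        }
    }
}
```

The original stays in place throughout, and the finally block removes the copy even if the upload step throws.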

Not a perfect solution, but instead of handling the pop-up box we can force the file path directly into the form. (I have used a date stamp to create new filenames, but different logic could also be used, e.g. appending a random string.)
import org.junit.jupiter.api.Test;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
public class Upload {
    private static final String SRC_RESOURCES_FILE_PATH = System.getProperty("user.dir") + "/src/resources/";
    File s1 = new File(SRC_RESOURCES_FILE_PATH + "Img1.png");
    // The stamp has day resolution, so a second run on the same day produces the same name.
    File s2 = new File(SRC_RESOURCES_FILE_PATH + "Img" + getDateStamp() + ".png");
    @Test
    public void uploadFunction() throws IOException {
        copyFileUsingJava7Files(s1, s2);
    }
    private String getDateStamp() {
        DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
        return dateFormat.format(new Date());
    }
    private static void copyFileUsingJava7Files(File source, File dest) throws IOException {
        // Throws FileAlreadyExistsException if dest already exists.
        Files.copy(source.toPath(), dest.toPath());
    }
}
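Because the "yyyy-MM-dd" stamp only changes once a day, a finer-grained stamp may be needed when the test runs several times per day. A minimal sketch, assuming a millisecond-resolution pattern is acceptable for the file name; the class StampedName and its method are hypothetical names used only for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class StampedName {
    // Date plus time of day down to milliseconds.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmssSSS");

    /** Builds prefix + timestamp + extension, e.g. for "Img" and ".png". */
    static String stampedName(String prefix, String extension, LocalDateTime when) {
        return prefix + when.format(STAMP) + extension;
    }

    public static void main(String[] args) {
        System.out.println(stampedName("Img", ".png", LocalDateTime.now()));
    }
}
```

Two calls within the same millisecond would still collide, so a random suffix remains the safer choice for parallel runs.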

How to Open a PDF at a Named Destination

I need to write a Java program that opens a PDF file at a named destination. The file test.pdf contains the named destination "DestinationX" on page 2. The program opens the PDF file but does not go to the named destination. How do I get to the named destination?
import java.awt.Desktop;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
public class MyLauncher {
    static void openFileAtNamedDest() {
        if (Desktop.isDesktopSupported()) {
            try {
                URI myURI = new URI("file:///C:/test.pdf#nameddest=DestinationX");
                Desktop.getDesktop().browse(myURI);
            } catch (IOException e) {
                e.printStackTrace();
            } catch (URISyntaxException e) {
                e.printStackTrace();
            }
        }
    }
    public static void main(String[] args) {
        openFileAtNamedDest();
    }
}

According to the spec, the format of your URL is correct. The only question is what application you are actually launching via browse(). I think it acts the same way as if you had double-clicked the file's icon on your desktop: it will launch whatever application is registered as the default handler for PDFs.
Acrobat should be able to handle a URL with a named destination, but other PDF viewers may not support it.
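If the default handler ignores the fragment, one option is to start a viewer that understands named destinations directly instead of going through browse(). The sketch below assumes Adobe Reader's /A command-line switch for open parameters; the class name, the helper buildCommand, and the viewer path passed as the first argument are assumptions for illustration, so check them against the local installation.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class NamedDestLauncher {
    /** Builds: <viewer> /A "nameddest=<destination>" <pdf>. */
    static List<String> buildCommand(String viewerExe, String pdfPath, String destination) {
        return Arrays.asList(viewerExe, "/A", "nameddest=" + destination, pdfPath);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            // No viewer given: only show what would be executed.
            System.out.println(buildCommand("<path-to-AcroRd32.exe>", "C:\\test.pdf", "DestinationX"));
            return;
        }
        // args[0] is the full path to the viewer executable on this machine.
        new ProcessBuilder(buildCommand(args[0], "C:\\test.pdf", "DestinationX")).start();
    }
}
```

ProcessBuilder passes each list element as a separate argument, so no manual quoting is needed.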

How to perform OCR on an image containing euro symbol with Tess4J?

I have the following image I want to OCR:
I'm using Tess4J for this and followed these instructions.
This is what I'm trying:
import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.TesseractException;
public class Main {
    public static void main(String[] args) {
        // Perform OCR
        // ===========
        File imageFile = new File("./CroppedSubtotal.png");
        ITesseract instance = new Tesseract(); // JNA Interface Mapping
        try {
            String result = instance.doOCR(imageFile);
            System.out.println("====== Result: " + result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}
When I run this in IntelliJ the console returns the following:
/Library/Java/JavaVirtualMachines/jdk1.7.0_80.jdk ...
====== Result:
Process finished with exit code 0
What can I try to fix this?
Update:
When I run OCR on the images below, it does work.
The euro symbol must be the cause. I've tried adding it to the whitelist, but without success:
instance.setTessVariable("tessedit_char_whitelist", "€0123456789,.");

Tesseract recognizes the Euro sign just fine using the English data pack. Your console may not have been able to display it.
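One way to check whether the console is the problem is to print the result through a UTF-8 stream and also dump its code points, which stay readable even when the glyph cannot be drawn. This is only a sketch: the class OcrResultPrinter and the hard-coded sample string are stand-ins for the value returned by instance.doOCR(imageFile).

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class OcrResultPrinter {
    /** Renders each code point as U+XXXX, separated by spaces. */
    static String toCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Stand-in for the OCR result; \u20AC is the euro sign.
        String result = "\u20AC 12,50";
        // Write UTF-8 bytes regardless of the platform default encoding.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("====== Result: " + result);
        utf8Out.println("Code points: " + toCodePoints(result));
    }
}
```

If U+20AC shows up in the code-point line, the euro sign was recognized and only its display was failing.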

How to setup zxing library on Windows 8 machine?

I have images of codes that I want to decode. How can I use zxing so that I specify the image location and get the decoded text back, and in case the decoding fails (it will for some images, that's the project), it gives me an error.
How can I set up zxing on my Windows machine? I downloaded the jar file, but I don't know where to start. I understand I'll have to write code to read the image and supply it to the library's reader method, but a guide on how to do that would be very helpful.

I was able to do it. Downloaded the source and added the following code. Bit rustic, but gets the work done.
import com.google.zxing.BarcodeFormat;
import com.google.zxing.BinaryBitmap;
import com.google.zxing.ChecksumException;
import com.google.zxing.DecodeHintType;
import com.google.zxing.FormatException;
import com.google.zxing.LuminanceSource;
import com.google.zxing.NotFoundException;
import com.google.zxing.Reader;
import com.google.zxing.Result;
import com.google.zxing.client.j2se.BufferedImageLuminanceSource;
import com.google.zxing.common.HybridBinarizer;
import com.google.zxing.qrcode.QRCodeReader;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Vector;
import javax.imageio.ImageIO;
class qr {
    public static void main(String[] args) {
        Reader xReader = new QRCodeReader();
        BufferedImage dest = null;
        try {
            // The image path is taken from the first command-line argument.
            dest = ImageIO.read(new File(args[0]));
        } catch (IOException e) {
            System.out.println("Cannot load input image");
        }
        if (dest == null) {
            // Reading failed, or ImageIO found no decoder for the file: nothing to decode.
            return;
        }
        LuminanceSource source = new BufferedImageLuminanceSource(dest);
        BinaryBitmap bitmap = new BinaryBitmap(new HybridBinarizer(source));
        // Restrict decoding to QR codes and ask for a more thorough search.
        Vector<BarcodeFormat> barcodeFormats = new Vector<BarcodeFormat>();
        barcodeFormats.add(BarcodeFormat.QR_CODE);
        HashMap<DecodeHintType, Object> decodeHints = new HashMap<DecodeHintType, Object>(3);
        decodeHints.put(DecodeHintType.POSSIBLE_FORMATS, barcodeFormats);
        decodeHints.put(DecodeHintType.TRY_HARDER, Boolean.TRUE);
        try {
            Result result = xReader.decode(bitmap, decodeHints);
            System.out.println("Code Decoded");
            System.out.println(result.getText());
        } catch (NotFoundException e) {
            System.out.println("Decoding Failed");
        } catch (ChecksumException e) {
            System.out.println("Checksum error");
        } catch (FormatException e) {
            System.out.println("Wrong format");
        }
    }
}

The project includes a class called CommandLineRunner which you can simply call from the command line. You can also look at its source to see how it works and reuse it.
There is nothing to install or set up. It's a library. Typically you don't download the jar but declare it as a dependency in your Maven-based project.
If you just want to send an image to decode, use http://zxing.org/w/decode.jspx

How to get the single images of an mp4-Movie in Java

I want to do some image analysis on a video that's stored in .mp4 format. Therefore I need a way to just get the images of this movie in Java.
I googled a lot and found some libraries like jcodec and jaad, but I wasn't able to get things running with these libraries. And as far as I could find, there were no examples that showed my use case.
Can you help me? Do you know any library that can do what I need and runs at least on Win7 64-bit?
Or do you know how to accomplish this with jcodec?
Edit:
As I wrote, I tried it with jcodec. I found out how to get the data of a frame, but not how to get it into something like a BufferedImage. I expect that this data isn't in a simple RGB format but in some compressed format. (Am I right about that?) I don't know how to decode this data.
You can get the data of a frame with jcodec as follows (at least as far as I understand this):
public static void main(String[] args) throws IOException, MP4DemuxerException {
    String path = "videos/video-2011-09-21-20-07-21.mp4";
    MP4Demuxer demuxer1 = new MP4Demuxer(new FileInput(new File(path)));
    DemuxerTrack videoTrack = demuxer1.getVideoTrack();
    Packet firstFrame = videoTrack.getFrames(1);
    byte[] data = firstFrame.getData();
}
I also found the following:
http://code.google.com/p/jcodec/source/browse/trunk/src/test/java/org/jcodec/containers/mp4/DitherTest.java?r=70
But this isn't working (has compile errors) with the downloadable jar-package.

You could use jcodec (https://github.com/jcodec/jcodec). In the following program I am extracting frames from a video.
/*
 * To extract frames from a mp4(avc) video
 *
 */
package avc_frame;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.jcodec.api.FrameGrab;
import org.jcodec.api.JCodecException;
public class Avc_frame {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException, JCodecException {
        long time = System.currentTimeMillis();
        for (int i = 50; i < 57; i++) {
            BufferedImage frame = FrameGrab.getFrame(new File("/Users/jovi/Movies/test.mp4"), i);
            ImageIO.write(frame, "bmp", new File("/Users/jovi/Desktop/frames/frame_" + i + ".bmp"));
        }
        System.out.println("Time Used:" + (System.currentTimeMillis() - time) + " Milliseconds");
    }
}

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.bytedeco.javacpp.opencv_core.IplImage;
import org.bytedeco.javacv.FFmpegFrameGrabber;
import org.bytedeco.javacv.FrameGrabber.Exception;
public class Read {
    public static void main(String[] args) throws IOException, Exception {
        FFmpegFrameGrabber frameGrabber = new FFmpegFrameGrabber("C:/Users/Digilog/Downloads/Test.mp4");
        frameGrabber.start();
        IplImage i;
        try {
            i = frameGrabber.grab();
            BufferedImage bi = i.getBufferedImage();
            ImageIO.write(bi, "png", new File("D:/Img.png"));
            frameGrabber.stop();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
