Why does image rendering through the Ghostscript API take so much time? - java

I'm using Ghostscript to render images from PDFs in Java by shelling out to commands. Now I'm trying to do the rendering through the ghost4j-0.5.0.jar API instead, using the code below, which I took from this website.
The problem is that the rendering process takes more than two minutes to generate one image, although it takes about a second through the command line. The thing is, I'm trying to run everything through Java; I want to stop using ImageMagick and Ghostscript as external tools. Please note that I'm satisfied with Ghostscript and don't want to use any other tool, as it gives me the image quality and sizes I need.
The code I'm using is:
import java.awt.Image;
import java.awt.image.RenderedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;

import javax.imageio.ImageIO;

import org.ghost4j.document.PDFDocument;
import org.ghost4j.renderer.SimpleRenderer;

public class SimpleRendererExample {

    public static void main(String[] args) {
        imageRenderingFromPdf();
    }

    public static void imageRenderingFromPdf() {
        try {
            // load PDF document
            PDFDocument document = new PDFDocument();
            document.load(new File("d:/cur/outputfile.pdf"));
            // create renderer
            SimpleRenderer renderer = new SimpleRenderer();
            // set resolution (in DPI)
            renderer.setResolution(100);
            System.out.println("started");
            // render
            long before = System.currentTimeMillis();
            List<Image> images = renderer.render(document);
            long after = System.currentTimeMillis();
            System.out.println("render " + (after - before) / 1000);
            // write the first image to disk as PNG
            try {
                before = System.currentTimeMillis();
                ImageIO.write((RenderedImage) images.get(0), "png",
                        new File("d:/dd.png"));
                after = System.currentTimeMillis();
                System.out.println("write " + (after - before) / 1000);
            } catch (IOException e) {
                System.out.println("ERROR: " + e.getMessage());
            }
        } catch (Exception e) {
            System.out.println("ERROR: " + e.getMessage());
        }
    }
}

There are a couple of things slowing down the 'rendering' process.
First of all, it's not due to Ghostscript. Ghostscript itself works the same whether it's executed via the command line or the API.
The speed difference is the result of ghost4j's rendering implementation. I just checked the ghost4j source code, and I see that it's a mixture of iText and Ghostscript.
So, this is how the code that you use works:
1. First the PDF document is loaded and parsed by iText.
2. Then a copy of the complete document is made by writing the loaded PDF document back to disk in a new place.
3. Then Ghostscript is initialized.
4. Then Ghostscript loads, parses and renders the document from the new place, a second time.
5. For each page, Ghostscript calls the ghost4j display device callback.
6. For each display device callback, ghost4j takes the rasterized page from memory and stores it to disk.
The end.
The weak parts are iText and the display device callback used. I think speed could be gained by letting Ghostscript take care of storing the rasterized result, instead of doing it manually from Java...
I think now you can see why you noticed the speed difference.
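If you want to experiment with letting Ghostscript store the output itself, a minimal sketch using ghost4j's low-level org.ghost4j.Ghostscript class could look like the following (I'm assuming that API here; the switches mirror the usual gs command line, and the paths are the ones from the question):
import org.ghost4j.Ghostscript;
import org.ghost4j.GhostscriptException;

public class DirectGhostscriptRender {

    public static void main(String[] args) throws GhostscriptException {
        Ghostscript gs = Ghostscript.getInstance();
        // Same switches you would pass on the command line: Ghostscript
        // loads the PDF once and writes one PNG per page itself, so no
        // iText copy and no display device callback are involved.
        String[] gsArgs = {
                "-gs",                          // placeholder for argv[0], ignored
                "-dBATCH", "-dNOPAUSE", "-dSAFER",
                "-r100",                        // resolution in DPI, as in the question
                "-sDEVICE=png16m",
                "-sOutputFile=d:/dd-%03d.png",  // one file per page
                "d:/cur/outputfile.pdf"
        };
        try {
            gs.initialize(gsArgs);
            gs.exit();
        } finally {
            Ghostscript.deleteInstance();
        }
    }
}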

Related

Separate all types of videos into frames (series of images)

Hi, I am working on image processing, where I need to convert all types of videos into a series of frames.
I already tried JCodec, which worked only with .mp4 videos.
The code below shows what I did to grab frames from a video:
try {
    int frameNumber = 0;
    BufferedImage frame = null;
    for (int i = 0; i < 100; i++) {
        frameNumber = i;
        // video from which frames can be retrieved; declare the frame number,
        // returns the numbered frame from the video
        frame = FrameGrab.getFrame(new File("D:\\Traffic.mp4"), frameNumber);
        // write the frame as an image; declare the image format and the file
        // path where the image is to be written
        ImageIO.write(frame, "png", new File("D:\\Frames2\\frame_" + frameNumber + ".png"));
    }
    System.out.println("Finished");
} catch (Exception e) {
    e.printStackTrace();
}
In the above code I am trying to read the Traffic.mp4 video and get the
first 100 frames. This code works fine for all .mp4 videos, but when I
tried .flv and .avi videos it gave me a NullPointerException.
So is there any other Java API I can try which accepts all types of
videos?
You can try using Xuggler, a wrapper of FFmpeg for Java. FFmpeg has support for processing individual frames, so I guess Xuggler won't be far from it; the demo DecodeAndCaptureFrames.java seems to be doing what you want.
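A rough sketch along the lines of that demo, assuming Xuggler's IMediaReader/MediaListenerAdapter API and using hypothetical file paths:
import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import com.xuggle.mediatool.IMediaReader;
import com.xuggle.mediatool.MediaListenerAdapter;
import com.xuggle.mediatool.ToolFactory;
import com.xuggle.mediatool.event.IVideoPictureEvent;

public class CaptureFrames {

    public static void main(String[] args) {
        // FFmpeg under the hood, so .flv and .avi should work too
        IMediaReader reader = ToolFactory.makeReader("D:\\Traffic.flv");
        // have the reader hand decoded pictures to us as BufferedImages
        reader.setBufferedImageTypeToGenerate(BufferedImage.TYPE_3BYTE_BGR);
        reader.addListener(new MediaListenerAdapter() {
            private int frameNumber = 0;

            @Override
            public void onVideoPicture(IVideoPictureEvent event) {
                try {
                    ImageIO.write(event.getImage(), "png",
                            new File("D:\\Frames2\\frame_" + frameNumber++ + ".png"));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        // pump packets until end of stream; the listener sees every frame
        while (reader.readPacket() == null) {
        }
    }
}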
Take a look at the Marvin Framework. It uses JavaCV as an interface to cameras and media files. Check this example showing how to get each video frame from a WMV file for image processing purposes.
In the JavaCV group there is a discussion about supported file formats.

Do any Java OCR tools convert images of text into editable text files?

I'm working on a project that entails photographing text (from any hard copy) and converting that text into a text file. Then I'd like to use that text file to do different things, such as provide hyperlinks to news articles or allow the user to edit the document.
The tool I've tried so far is Java OCR from sourceforge.net, which works fine on the images provided in the package. But when I photograph my own text, it doesn't work at all. Is there some training process I should be implementing? If so, does anybody know how to implement it? Any help will go a long way. Thank you!
I have a Java application where I ended up deciding to use Tesseract OCR, and just call out to it using Runtime.exec(). Perhaps not quite the answer you need, but just in case you'd not considered it.
Edit + code added in response to comment reply
On a Windows installation I think I was able to use an installer, or unzip a ready-made binary.
On a Linux server, I needed to compile Tesseract myself, but it's not too hard if you're used to that kind of thing (gcc); the only gotcha is that there's a dependency on Leptonica, which also needs to be compiled.
// Tesseract can only handle .tif format, so we have to convert it first.
// tmpFile[0] is the temporary .tif input, tmpFile[1] the .txt output file
// (both created elsewhere in the application).
ImageIO.write(ImageIO.read(new java.io.File(file.getPath())), "tif", tmpFile[0]);
// Tesseract appends ".txt" to the output base name itself, so strip it here
String[] tesseractCmd = new String[] { "tesseract", tmpFile[0].getAbsolutePath(),
        StringUtils.removeEnd(tmpFile[1].getAbsolutePath(), ".txt") };
final Process process = Runtime.getRuntime().exec(tesseractCmd);
try {
    int exitValue = process.waitFor();
    if (exitValue == 0) {
        final String extractedText = SearchableTextExtractionUtils.extractPlainText(new FileReader(tmpFile[1]));
        return extractedText;
    }
    throw new SearchableTextExtractionException(exitValue, Arrays.toString(tesseractCmd));
} catch (InterruptedException e) {
    throw new SearchableTextExtractionException(e);
} finally {
    process.destroy();
}

Java Communication Server to print 1 million PDF documents

I have a Java batch job which prints 1 million (1 page) PDF documents.
This batch job runs every 5 days.
For printing 1 million (1 page) PDF documents through a batch job, which method is better?
In this PDF most of the text / paragraphs are the same for all customers; only a few pieces of information are picked dynamically from the database (Customer Id / Name / Due Date / Expiry Date / Amount).
We have tried the following:
1) Jasper Reports
2) iText
But the above 2 methods do not give good performance, as the static text / paragraphs are re-created at runtime for every document.
So I am thinking of an approach like this:
There will be a template with placeholders for the dynamic values (Customer Id / Name / Due Date / Expiry Date / Amount).
There will be a communication server, like OpenOffice, which will hold this template.
Our Java application, deployed on a web server, will fetch a dataset from the database and pass it on to this communication server, where the templates are already open in memory; only the dynamic placeholder values will be changed and the template will be saved, like a "Save As" command.
Is the above approach achievable? If yes, which API / communication server is better?
Here is the Jasper Reports code for reference:
InputStream is = getClass().getResourceAsStream("/jasperreports/reports/" + reportName + ".jasper");
JasperPrint print = JasperFillManager.fillReport(is, parameters, dataSource);
pdf = File.createTempFile("report.pdf", "");
JasperExportManager.exportReportToPdfFile(print, pdf.getPath());
Wow. 1 million PDF files every 5 days.
Even if it takes you just 0.5 seconds to generate a PDF file from beginning to end (a finished file on disk), it will take you nearly 6 full days (500,000 seconds) to generate this amount of PDFs sequentially.
I think any approach that generates a file in a sub-second amount of time is fine (and Jasper Reports can certainly give you this level of performance).
I think you need to think about how to optimise the whole process: you're certainly going to have to use multi-threading, and perhaps even several physical servers, to generate this amount of files in any reasonable amount of time (at least overnight), as sketched below.
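A minimal sketch of the multi-threaded side, assuming a Customer type and a generatePdf() method that wraps whichever engine you choose (both hypothetical here):
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelPdfBatch {

    public void run(List<Customer> customers) throws InterruptedException {
        // one worker per core; PDF generation is mostly CPU-bound
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (final Customer customer : customers) {
            pool.submit(new Runnable() {
                @Override
                public void run() {
                    generatePdf(customer); // engine-specific: Jasper, iText, ...
                }
            });
        }
        pool.shutdown();
        // wait for the whole batch to drain
        pool.awaitTermination(1, TimeUnit.DAYS);
    }

    void generatePdf(Customer customer) {
        // fill the template and write the finished PDF to disk
    }
}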
I will go with PDF forms (this should be "fast"):
// Batch.java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

import javax.annotation.Nonnull;

import com.itextpdf.text.DocumentException;

public final class Batch
{
    private static final String FORM = "pdf-form.pdf";

    public static void main(final String[] args) throws IOException {
        final PdfPrinter printer = new PdfPrinter(FORM);
        final List<Customer> customers = readCustomers();
        for (final Customer customer : customers) {
            try {
                printer.fillAndCreate("pdf-" + customer.getId(), customer);
            } catch (IOException e) {
                // handle exception
            } catch (DocumentException e) {
                // handle exception
            }
        }
        printer.close();
    }

    private static @Nonnull List<Customer> readCustomers() {
        // implement me (stub so the example compiles)
        return Collections.emptyList();
    }

    private Batch() {
        // nothing
    }
}
// PdfPrinter.java
import java.io.Closeable;
import java.io.FileOutputStream;
import java.io.IOException;

import javax.annotation.Nonnull;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.AcroFields;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;

public class PdfPrinter implements Closeable
{
    private final PdfReader reader;

    public PdfPrinter(@Nonnull final String src) throws IOException {
        reader = new PdfReader(src); // <= this reads the form pdf
    }

    @Override
    public void close() {
        reader.close();
    }

    public void fillAndCreate(@Nonnull final String dest, @Nonnull final Customer customer)
            throws IOException, DocumentException {
        final PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest)); // dest = output
        final AcroFields form = stamper.getAcroFields();
        form.setField("customerId", customer.getId());
        form.setField("name", customer.getName());
        // ...
        stamper.close();
    }
}
see also: http://itextpdf.com/examples/iia.php?id=164
As a couple of posters mentioned, 1 million PDF files in 5 days means you are going to have to sustain a rate of over 2 documents per second. This is achievable from a pure document-generation standpoint, but keep in mind that the systems running the queries and compiling the data will also be under a reasonable load. You also haven't said much about the PDFs themselves: a one-page PDF is much easier to generate than a 40-page PDF...
I have seen iText and Docmosis achieve tens of documents per second, so Jasper and other technologies probably could too. I mention Docmosis because it works along the lines of the technique you describe (populating templates loaded into memory). Please note I work for the company that produces Docmosis.
If you haven't already, you will need to consider the hardware/software architecture and run trials with whatever technologies you are evaluating, to make sure you can get the performance you require. Presumably the peak load might be somewhat higher than the average load.
Good luck.

Inconsistent results while reading multiple QR codes from a scanned PDF using the zxing library

I am new to the zxing library and to QR codes. Using zxing 1.7 I have generated QR codes; those QR codes are stuck onto papers, and the papers are later scanned into a PDF. I have written a client program, again using the zxing library, which reads this scanned PDF page by page and shows the QR code text if any QR code is found on a page. I am trying to read multiple QR codes from each page of the scanned PDF.
Though I am able to read some QR codes, the result is inconsistent: some QR codes on a PDF page are read while others are not recognized by my client program. I have gone through other threads on the same topic and modified my code a little, but I am still not able to get a 100% result.
Here is my code snippet to give more idea about what I am exactly doing.
Note: I am using PdfReaderContentParser from the iText PDF library to extract the scanned image of each PDF page, as shown here; a rough sketch of that extraction follows the snippet below.
private void extractBarcodeText(BufferedImage bufferedImage) {
    try {
        Hashtable<DecodeHintType, Object> hints = new Hashtable<DecodeHintType, Object>();
        // TRY_HARDER expects a Boolean; restricting the formats is done
        // through POSSIBLE_FORMATS instead
        hints.put(DecodeHintType.TRY_HARDER, Boolean.TRUE);
        hints.put(DecodeHintType.POSSIBLE_FORMATS, Collections.singletonList(BarcodeFormat.QR_CODE));
        LuminanceSource source = new com.google.zxing.client.j2se.BufferedImageLuminanceSource(bufferedImage);
        BinaryBitmap bitmap = new BinaryBitmap(new GlobalHistogramBinarizer(source));
        List<String> innerTextList = new ArrayList<String>();
        QRCodeMultiReader multiReader = new QRCodeMultiReader();
        Result[] results = multiReader.decodeMultiple(bitmap, hints);
        for (int k = 0; k < results.length; k++) {
            String text = results[k].getText();
            innerTextList.add(text);
            System.out.println("#################### Rendered Text from Image #################" + " " + text);
        }
    } catch (NotFoundException e) {
        e.printStackTrace();
    }
}
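For reference, the page-image extraction mentioned in the note above looks roughly like this (a sketch, assuming iText 5's parser API):
import java.io.IOException;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;

public class ScannedPageExtractor implements RenderListener {

    public void extract(String pdfPath) throws IOException {
        PdfReader reader = new PdfReader(pdfPath);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        for (int page = 1; page <= reader.getNumberOfPages(); page++) {
            // the listener receives every image drawn on the page
            parser.processContent(page, this);
        }
        reader.close();
    }

    @Override
    public void renderImage(ImageRenderInfo info) {
        try {
            // hand the scanned page image to the QR decoding code above
            extractBarcodeText(info.getImage().getBufferedImage());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // unused parts of the RenderListener interface
    @Override public void beginTextBlock() { }
    @Override public void endTextBlock() { }
    @Override public void renderText(TextRenderInfo info) { }

    private void extractBarcodeText(java.awt.image.BufferedImage image) {
        // the method from the snippet above
    }
}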
I have tried many combinations but no luck. Is it due to poor image quality? But then why are some images recognized while others remain a mystery :(
Does anyone know what I should do to overcome this issue? Here is one sample image at the bottom for your reference: the first QR code in it is recognized using the above code, while the second one (HRA) is not.
My guess based on what you've said is that you need to lightly blur or down-sample the image. The large amount of white noise interferes with detection.
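One way to try that is a small hypothetical helper that down-samples the page image before handing it to zxing (bilinear interpolation averages neighbouring pixels, which also acts as a mild blur):
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public final class ImageShrink {

    // e.g. downSample(pageImage, 0.5) before building the LuminanceSource
    public static BufferedImage downSample(BufferedImage src, double factor) {
        int w = (int) (src.getWidth() * factor);
        int h = (int) (src.getHeight() * factor);
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        // bilinear interpolation smooths out single-pixel scanner noise
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return dst;
    }
}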

Is there a 100% Java alternative to ImageIO for reading JPEG files?

We are using Java2D to resize photos uploaded to our website, but we run into an issue (a seemingly old one, cf.: http://forums.sun.com/thread.jspa?threadID=5425569) - a few particular JPEGs raise a CMMException when we try to ImageIO.read() an InputStream containing their binary data:
java.awt.color.CMMException: Invalid image format
at sun.awt.color.CMM.checkStatus(CMM.java:131)
at sun.awt.color.ICC_Transform.<init>(ICC_Transform.java:89)
at java.awt.image.ColorConvertOp.filter(ColorConvertOp.java:516)
at com.sun.imageio.plugins.jpeg.JPEGImageReader.acceptPixels(JPEGImageReader.java:1114)
at com.sun.imageio.plugins.jpeg.JPEGImageReader.readImage(Native Method)
at com.sun.imageio.plugins.jpeg.JPEGImageReader.readInternal(JPEGImageReader.java:1082)
at com.sun.imageio.plugins.jpeg.JPEGImageReader.read(JPEGImageReader.java:897)
at javax.imageio.ImageIO.read(ImageIO.java:1422)
at javax.imageio.ImageIO.read(ImageIO.java:1326)
...
(snipped the remainder of the stack trace, which is our ImageIO.read() call, servlet code and such)
We narrowed it down to photos taken on specific cameras, and I selected a photo that triggers this error: http://img214.imageshack.us/img214/5121/estacaosp.jpg.
We noticed that this only happens with Sun's JVM (on Linux and Mac, just tested it on 1.6.0_20) - a test machine with OpenJDK reads the same photos without a hitch, possibly due to a different implementation of the JPEG reader.
Unfortunately, we are unable to switch JVMs in production, nor to use native-dependent solutions such as ImageMagick ( http://www.imagemagick.org/ ).
Considering that, my question is: Does a replacement for ImageIO's JPEG reader which can handle photos such as the linked one exist? If not, is there another 100% pure Java photo resizing solution which we can use?
Thank you!
One possibly useful library for you could be the Java Advanced Imaging library (JAI).
Using this library can be quite a bit more complicated than using ImageIO, but in a quick test I just ran, it did open and display the problem image file you linked.
import java.awt.BorderLayout;
import java.awt.image.RenderedImage;
import java.awt.image.renderable.ParameterBlock;

import javax.media.jai.InterpolationNearest;
import javax.media.jai.JAI;
import javax.swing.JFrame;
import javax.swing.JScrollPane;

import com.sun.media.jai.widget.DisplayJAI;

public static void main(String[] args) {
    RenderedImage image = JAI.create("fileload", "estacaosp.jpg");
    float scale = 0.5F;
    ParameterBlock pb = new ParameterBlock();
    pb.addSource(image);
    pb.add(scale);          // x scale factor
    pb.add(scale);          // y scale factor
    pb.add(1.0F);           // x translation
    pb.add(1.0F);           // y translation
    pb.add(new InterpolationNearest()); // or InterpolationBilinear()
    image = JAI.create("scale", pb);
    // Create an instance of DisplayJAI.
    DisplayJAI srcdj = new DisplayJAI(image);
    JScrollPane srcScrollPaneImage = new JScrollPane(srcdj);
    // Use a scroll pane inside a frame to display the image
    JFrame frame = new JFrame();
    frame.getContentPane().add(srcScrollPaneImage, BorderLayout.CENTER);
    frame.pack();
    frame.setVisible(true);
}
After running this code the image seems to load fine. It is then resized by 50% using the ParameterBlock.
And finally, if you wish to save the file, you can just call:
String filename2 = "tofile.jpg";
String format = "JPEG";
RenderedOp op = JAI.create("filestore", image, filename2, format);
I hope this helps you out. Best of luck.
Old post, but for future reference:
Inspired by this question and links found here, I've written a JPEGImageReader plugin for ImageIO that supports JPEG images with this kind of "bad" ICC color profile (the "issue" is that the rendering intent in the ICC profile is incompatible with Java's ColorConvertOp). It's plain Java and does not require JAI.
The source code and linked binary builds are freely available from the TwelveMonkeys project on GitHub.
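Using it should require no code changes; with the plugin jar on the classpath, ImageIO picks the reader up through the service loader. A sketch, assuming the plugin is installed:
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

public class ReadWithPlugin {
    public static void main(String[] args) throws IOException {
        // the TwelveMonkeys JPEG reader registers itself via META-INF/services,
        // so the same ImageIO call now handles the problematic profile
        BufferedImage image = ImageIO.read(new File("estacaosp.jpg"));
        System.out.println(image.getWidth() + "x" + image.getHeight());
    }
}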
I faced the same issue. I was reluctant to use JAI as it is outdated, but it looks like it's the shortest solution.
This code converts an InputStream to a BufferedImage, using Sun's ImageIO (fast) or, in the few cases where this problem occurs, JAI:
public static BufferedImage read(InputStream is) throws IOException {
    // the stream must support mark/reset (e.g. a BufferedInputStream),
    // so that it can be re-read if the first attempt fails
    is.mark(Integer.MAX_VALUE);
    try {
        // We try it with ImageIO first
        return ImageIO.read(ImageIO.createImageInputStream(is));
    } catch (CMMException ex) {
        // If we failed...
        // We reset the InputStream (start from the beginning)
        is.reset();
        // And use JAI
        return JAI.create("stream", SeekableStream.wrapInputStream(is, true)).getAsBufferedImage();
    }
}
