Silent Printing of PDF From Within Java

We are looking into silent printing of PDF documents from within Java. The printing will be invoked from the desktop and not through a browser, so we cannot use JavaScript. PDF Renderer is an operational solution, but its rendering quality is not acceptable. iText does not seem to be pluggable into the Java Print Service. There are some commercial Java libraries (jPDFPrint by Qoppa, JPedal, and ICEpdf) which we have not tried out yet.
Does anybody have any experience with PDF silent printing from Java?

Apache PDFBox. It is currently in incubation, but the PDF printing functionality predates that. Internally, it uses the Java Print Service to create a print job, and it also supports silent printing.
Note that it requires Fontbox as well. The current development version (the upcoming 0.8.0 release) includes a graceful fallback for documents with Type 0 fonts. Type 1 fonts are printed correctly; however, in 0.7.3, attempting to print a document with Type 0 fonts results in an exception being thrown.

Maybe I'm misunderstanding, but why not just use the Print Service API directly? The following works for me (assumes you have the PDF document as a byte array named pdfBytes):
// needs: import javax.print.*;
DocFlavor flavor = DocFlavor.BYTE_ARRAY.PDF;
PrintService[] services = PrintServiceLookup.lookupPrintServices(flavor, null);
if (services.length > 0)
{
    DocPrintJob printJob = services[0].createPrintJob();
    Doc document = new SimpleDoc(pdfBytes, flavor, null);
    try {
        // no dialog is shown, so the job is sent silently
        printJob.print(document, null);
    } catch (PrintException e) {
        e.printStackTrace();
    }
}
else
{
    System.out.println("No PDF printer available.");
}
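The snippet above assumes pdfBytes is already in memory. Here is a minimal, self-contained sketch that reads the PDF from disk and checks the %PDF- header before spooling it. The class and method names (PdfBytesPrinter, looksLikePdf, print) are my own illustration, and like the answer it only works if some installed printer advertises DocFlavor.BYTE_ARRAY.PDF:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.print.Doc;
import javax.print.DocFlavor;
import javax.print.DocPrintJob;
import javax.print.PrintException;
import javax.print.PrintService;
import javax.print.PrintServiceLookup;
import javax.print.SimpleDoc;

class PdfBytesPrinter {
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** True if the data starts with the "%PDF-" header a well-formed PDF begins with. */
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Sends the file to the first print service that accepts raw PDF bytes.
     * Returns false if the file is not a PDF or no such service exists.
     */
    static boolean print(Path pdfFile) throws IOException, PrintException {
        byte[] pdfBytes = Files.readAllBytes(pdfFile);
        if (!looksLikePdf(pdfBytes)) {
            return false;
        }
        DocFlavor flavor = DocFlavor.BYTE_ARRAY.PDF;
        PrintService[] services = PrintServiceLookup.lookupPrintServices(flavor, null);
        if (services.length == 0) {
            return false; // no installed printer advertises native PDF support
        }
        DocPrintJob job = services[0].createPrintJob();
        Doc document = new SimpleDoc(pdfBytes, flavor, null);
        job.print(document, null); // no dialog, so nothing is shown to the user
        return true;
    }
}
```

Call PdfBytesPrinter.print(Paths.get("c:/test.pdf")) and treat false as "not a PDF, or no PDF-capable printer".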

This works for me:
public void print() {
    DocFlavor flavor = DocFlavor.INPUT_STREAM.AUTOSENSE;
    PrintService[] services = PrintServiceLookup.lookupPrintServices(flavor, null);
    // Pick the first service whose name contains "my printer" (adjust to your printer's name)
    PrintService myService = null;
    for (PrintService service : services) {
        System.out.println(service.getName());
        if (service.getName().contains("my printer")) {
            myService = service;
            break;
        }
    }
    if (myService == null) {
        System.out.println("No matching printer available.");
        return;
    }
    FileInputStream psStream;
    try {
        psStream = new FileInputStream("c:\\test.pdf");
    } catch (FileNotFoundException ffne) {
        ffne.printStackTrace();
        return;
    }
    DocPrintJob printJob = myService.createPrintJob();
    // The stream is not closed here because print() may return before the service has read it
    Doc document = new SimpleDoc(psStream, flavor, null);
    try {
        printJob.print(document, null);
    } catch (PrintException e) {
        e.printStackTrace();
    }
}
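One caveat with both Print Service snippets: according to the DocPrintJob javadoc, print() may return before the job has actually finished, so it is not obvious when the input stream can be closed. A sketch of a listener that lets you wait for the job, assuming you are happy to treat "data transferred" or "no more events" as done (the class name PrintJobWatcher and the timeout are illustrative):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import javax.print.event.PrintJobAdapter;
import javax.print.event.PrintJobEvent;

/** Lets the caller block until the print service reports that a job has ended. */
class PrintJobWatcher extends PrintJobAdapter {
    private final CountDownLatch finished = new CountDownLatch(1);
    private volatile boolean succeeded;

    @Override public void printJobCompleted(PrintJobEvent e) { finish(true); }
    @Override public void printDataTransferCompleted(PrintJobEvent e) { finish(true); }
    @Override public void printJobNoMoreEvents(PrintJobEvent e) { finish(true); }
    @Override public void printJobFailed(PrintJobEvent e) { finish(false); }
    @Override public void printJobCanceled(PrintJobEvent e) { finish(false); }

    /** Records the first outcome only; later events are ignored. */
    synchronized void finish(boolean ok) {
        if (finished.getCount() > 0) {
            succeeded = ok;
            finished.countDown();
        }
    }

    /** Waits up to the timeout; true only if the job ended without failure or cancellation. */
    boolean await(long timeout, TimeUnit unit) {
        try {
            return finished.await(timeout, unit) && succeeded;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Register it with printJob.addPrintJobListener(watcher) before calling printJob.print(...), then call watcher.await(60, TimeUnit.SECONDS) and close the stream afterwards. Some drivers report very few events, which is why the timeout matters.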

Have a look at www.pdflib.com. It's commercial, but PDFlib Lite is available for free for open source projects. It has bindings for Java.

There is an example using JPedal at http://www.jpedal.org/support_egSP.php
You will need the commercial version of ICEpdf if you want full font support.

I have experience with making Acrobat (Reader or Full) do the printing, but it's anything but silent (it is unattended, though - just depends on how 'silent' the silent requirement is). If there's interest, I can shoot you the native code that makes the required DDE calls.

iText is intended for creating PDF files (per a post I saw from the author), and thus probably isn't what you want.
I've used Qoppa's jPDFPrint quite successfully for exactly this purpose, but it's not cheap. If you can afford it, it's the most robust solution I've found thus far. I've also been very impressed with the level of support; they even generated some custom sample code for me.
I tried PDFBox, but found that it doesn't support the "Shrink to printable area" page scaling that you get with Acrobat. Not everyone will care about this feature, but it's essential for me.

Related

Adding fonts to Apache PDFBox?

Is there a way to add additional font styles to Apache PDFBox?
We're currently trying to rework PDF printing in our system (currently done with PDF-Renderer). I have been looking at various alternatives (PDFBox, JPedal, jPDFPrint).
Our hope is to use a free, GPL-compatible library, and as such we're leaning towards PDFBox. I have been able to write some sample code to print out the PDF, which 'works'. See below:
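PDDocument doc = null;
try {
    doc = PDDocument.load("test.pdf");
    doc.print();
} catch (Exception e) {
    // Come up with better thing to do on fail.
    e.printStackTrace();
} finally {
    // release the document's resources even if printing failed
    if (doc != null) {
        try { doc.close(); } catch (IOException ignored) { }
    }
}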
As I mentioned, this works, but the problem I'm running into is that PDFBox doesn't seem to recognize the fonts used in the PDF and substitutes a different font. As a result, the document looks very odd (spacing and character size are different and look bizarre). I routinely see the following log message, or similar ones:
Apr 16, 2014 2:56:21 PM org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
WARNING: Changing font on < > from < NimbusMono > to the default font
Does anyone know of a way (or a reference) to add a new font type to PDFBox? Or, barring that, how to change the default font?
From what I can tell, PDFBox supports the 14 standard fonts. Unfortunately, NimbusMono is not one of them. Any guidance would be appreciated.
The unreleased 2.0 version supports rendering of embedded fonts. You can get it as a snapshot from
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/
or through "svn checkout http://svn.apache.org/repos/asf/pdfbox/trunk/". The API is slightly different from the 1.8.x versions and might change, so it's best to look at the code examples. A quick way to test whether your file will be rendered properly is to download "pdfbox-app"
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/
and then run the viewer:
java -jar pdfbox-app-2.0.0-20140416.173452-273.jar PDFReader your-file-name.pdf
There's also a print feature.
Good luck!
Update 2016: the 2.0 release is out.
If you have used the 1.8 version, read the migration guide.
I came across this post while trying to solve the same problem. The PDFBox 2.0 API documentation isn't great at the moment.
What you're looking for is the FontFileFinder in Fontbox.
Make sure you're using the full pdfbox-app jar which includes Fontbox.
I've only tried this on Windows but looking at the classes it seems like it supports the other main operating systems.
Here's a simple example class I wrote that writes out a small bit of text in the bottom left corner of a PDF, using a non-standard font.
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.util.List;
import org.apache.fontbox.util.autodetect.FontFileFinder;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
public class TestPDFWrite {
public static void main(String[] args) throws IOException {
FontFileFinder fontFinder = new FontFileFinder();
List<URI> fontURIs = fontFinder.find();
File fontFile = null;
for (URI uri : fontURIs) {
File font = new File(uri);
if (font.getName().equalsIgnoreCase("CHILLER.TTF")) {
fontFile = font;
break; // stop at the first match
}
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(document, page);
contentStream.beginText();
if (fontFile != null) {
contentStream.setFont(PDType0Font.load(document, fontFile), 12);
} else {
contentStream.setFont(PDType1Font.HELVETICA, 12);
}
contentStream.newLineAtOffset(10, 10);
contentStream.showText("Hello World");
contentStream.endText();
contentStream.close();
document.save("C:/Hello World.pdf");
document.close();
}
}
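FontFileFinder does the directory scanning for you. If you'd rather not depend on it just to locate one known font file, a stdlib-only sketch of the same idea follows. The folder list is my assumption of common locations, not necessarily what FontFileFinder searches, and the file name is matched case-insensitively since CHILLER.TTF may be spelled differently on other systems:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

/** Stdlib-only sketch of searching common font folders for a font file by name. */
class FontLocator {
    /** Typical font folders per OS (an assumption, not FontFileFinder's exact list). */
    static List<Path> defaultFontDirs() {
        List<Path> dirs = new ArrayList<>();
        String home = System.getProperty("user.home");
        String os = System.getProperty("os.name", "").toLowerCase();
        if (os.contains("win")) {
            String windir = System.getenv("WINDIR");
            dirs.add(Paths.get(windir != null ? windir : "C:\\Windows", "Fonts"));
        } else if (os.contains("mac")) {
            dirs.add(Paths.get("/Library/Fonts"));
            dirs.add(Paths.get("/System/Library/Fonts"));
            dirs.add(Paths.get(home, "Library", "Fonts"));
        } else {
            dirs.add(Paths.get("/usr/share/fonts"));
            dirs.add(Paths.get("/usr/local/share/fonts"));
            dirs.add(Paths.get(home, ".fonts"));
        }
        return dirs;
    }

    /** Returns the first file (searched recursively) whose name matches, ignoring case, or null. */
    static Path find(List<Path> dirs, String fileName) {
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue;
            }
            try (Stream<Path> files = Files.walk(dir)) {
                Optional<Path> match = files
                        .filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().equalsIgnoreCase(fileName))
                        .findFirst();
                if (match.isPresent()) {
                    return match.get();
                }
            } catch (IOException | UncheckedIOException e) {
                // unreadable folder: skip it and keep searching
            }
        }
        return null;
    }
}
```

The result can be passed as find(...).toFile() to PDType0Font.load(document, fontFile), in place of the loop in the example.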
I ran into a similar problem with PDFBox. PDFs can be printed in a straightforward way using Java's javax.print package. The following code is slightly modified from the API docs for javax.print.
DocFlavor flavor = DocFlavor.INPUT_STREAM.PDF;
PrintRequestAttributeSet aset = new HashPrintRequestAttributeSet();
aset.add(MediaSizeName.NA_LETTER); // US Letter size
PrintService[] pservices = PrintServiceLookup.lookupPrintServices(flavor, aset);
if (pservices.length > 0) {
    DocPrintJob pj = pservices[0].createPrintJob();
    try {
        FileInputStream fis = new FileInputStream("test.pdf");
        Doc doc = new SimpleDoc(fis, flavor, null);
        pj.print(doc, aset);
    } catch (FileNotFoundException | PrintException e) {
        // do something
    }
}
This code assumes that the printer can accept a PDF directly, but it lets you bypass the wonky font issues of the PDFBox 1.8 branch.
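If lookupPrintServices returns an empty array for DocFlavor.INPUT_STREAM.PDF, it usually means no installed driver advertises PDF as a supported flavor. A small diagnostic sketch (the class name PdfCapability is made up) that lists every service and whether it claims to accept application/pdf:

```java
import javax.print.DocFlavor;
import javax.print.PrintService;
import javax.print.PrintServiceLookup;

/** Reports which installed print services accept PDF data without conversion. */
class PdfCapability {
    /** True if any of the flavors has the MIME type application/pdf. */
    static boolean acceptsPdf(DocFlavor[] flavors) {
        for (DocFlavor flavor : flavors) {
            if ("application".equals(flavor.getMediaType())
                    && "pdf".equals(flavor.getMediaSubtype())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // null flavor and null attributes: list every service, unfiltered
        for (PrintService service : PrintServiceLookup.lookupPrintServices(null, null)) {
            System.out.println(service.getName() + " accepts PDF: "
                    + acceptsPdf(service.getSupportedDocFlavors()));
        }
    }
}
```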

Why does image rendering through the Ghostscript API take so much time?

I'm using Ghostscript from Java to render images from PDFs by running commands. Now I'm trying to run Ghostscript for image rendering from PDF using ghost4j-0.5.0.jar, with the code below, which I took from this website.
The problem is that the rendering process takes more than two minutes to generate one image, though it takes about a second through the command line. I'm trying to run everything through Java because I want to stop using ImageMagick and Ghostscript as external tools. Please note that I'm satisfied with Ghostscript and don't want to use any other tool, as it provides the image quality and sizes I need.
The code I'm using is:
import java.awt.Image;
import java.awt.image.RenderedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.imageio.ImageIO;
import org.ghost4j.document.PDFDocument;
import org.ghost4j.renderer.SimpleRenderer;
public class SimpleRendererExample {
    public static void main(String[] args) {
        imageRenderingFromPdf();
    }
    public static void imageRenderingFromPdf() {
        try {
            // load PDF document
            PDFDocument document = new PDFDocument();
            document.load(new File("d:/cur/outputfile.pdf"));
            // create renderer
            SimpleRenderer renderer = new SimpleRenderer();
            // set resolution (in DPI)
            renderer.setResolution(100);
            System.out.println("started");
            // render (timed in whole seconds)
            long before = System.currentTimeMillis();
            List<Image> images = renderer.render(document);
            long after = System.currentTimeMillis();
            System.out.println("render " + (after - before) / 1000);
            // write the first page to disk as PNG
            try {
                before = System.currentTimeMillis();
                ImageIO.write((RenderedImage) images.get(0), "png", new File("d:/dd.png"));
                after = System.currentTimeMillis();
                System.out.println("write " + (after - before) / 1000);
            } catch (IOException e) {
                System.out.println("ERROR: " + e.getMessage());
            }
        } catch (Exception e) {
            System.out.println("ERROR: " + e.getMessage());
        }
    }
}
There are a couple of things slowing down the 'rendering' process.
First of all, it's not due to Ghostscript. Ghostscript itself works the same whether it's executed via the command line or the API.
The speed difference is the result of ghost4j's rendering implementation. I just checked the source code of ghost4j and I see that it's a mixture of iText and Ghostscript.
So, here is how the code that you use works:
First, the PDF document is loaded and parsed by iText.
Then a copy of the complete document is made by writing the loaded PDF document back to disk in a new location.
Then Ghostscript is initialized.
Then Ghostscript loads, parses and renders the document from the new location, a second time.
For each page, Ghostscript calls ghost4j's display device callback.
For each display device callback, ghost4j takes the rasterized page from memory and stores it to disk.
The end.
The weak parts are iText and the display device callback. I think speed could be gained by letting Ghostscript take care of storing the rasterized result instead of doing it manually from Java...
I think now you can see why you noticed the speed difference.
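Following that suggestion, Ghostscript can write the PNG files itself if it's given an output device and a file pattern. The sketch below only builds the argument list, using standard Ghostscript switches; the helper name and paths are illustrative and I haven't timed it:

```java
import java.util.ArrayList;
import java.util.List;

/** Builds Ghostscript arguments that make Ghostscript itself write one PNG per page. */
class GhostscriptPngArgs {
    static String[] build(String pdfPath, String outputPattern, int dpi) {
        List<String> args = new ArrayList<>();
        args.add("gs");                 // argv[0]: program name, not treated as an option
        args.add("-dNOPAUSE");          // don't pause between pages
        args.add("-dBATCH");            // exit after processing the input
        args.add("-dSAFER");            // restrict file operations from inside the PDF
        args.add("-sDEVICE=png16m");    // 24-bit color PNG output device
        args.add("-r" + dpi);           // resolution in DPI
        args.add("-sOutputFile=" + outputPattern); // %d in the pattern becomes the page number
        args.add(pdfPath);
        return args.toArray(new String[0]);
    }
}
```

With ghost4j, an array like this can be passed to the low-level org.ghost4j.Ghostscript wrapper (Ghostscript.getInstance(), then initialize(args) and exit(), and finally Ghostscript.deleteInstance()), which as far as I can tell skips both the iText copy and the display device callbacks. Dropping the first element gives the equivalent gs command line.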

Do any Java OCR tools convert images of text into editable text files?

I'm working on a project that entails photographing text (from any hard copy of text) and converting that text into a text file. Then I'd like to use that text file to do some different things, such as provide hyperlinks to news articles or allow the user to edit the document.
The tool I've tried so far is Java OCR from sourceforge.net, which works fine on the images provided in the package. But when I photograph my own text, it doesn't work at all. Is there some training process I should be implementing? If so, does anybody know how to implement it? Any help will go a long way. Thank you!
I have a Java application where I ended up deciding to use Tesseract OCR and just call out to it using Runtime.exec(). Perhaps not quite the answer you need, but just in case you hadn't considered it.
Edit: code added in response to a comment.
On a Windows installation I think I was able to use an installer, or unzip a ready made binary.
On a Linux server, I needed to compile Tesseract myself, but it's not too hard if you're used to that kind of thing (gcc); the only gotcha is that there's a dependency on Leptonica which also needs to be compiled.
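// Tesseract can only handle .tif format, so we have to convert it
// (the JDK's ImageIO TIFF writer exists from Java 9; older JDKs need an Image I/O plugin)
// tmpFile[0] (.tif) and tmpFile[1] (.txt) are temp files created earlier; StringUtils is Apache Commons Lang;
// SearchableTextExtractionUtils and SearchableTextExtractionException are application-specific helpers
ImageIO.write( ImageIO.read( new java.io.File(file.getPath())), "tif", tmpFile[0]);
// tesseract appends ".txt" to the output base name itself, so strip it here
String[] tesseractCmd = new String[]{"tesseract", tmpFile[0].getAbsolutePath(), StringUtils.removeEnd(tmpFile[1].getAbsolutePath(), ".txt")};
final Process process = Runtime.getRuntime().exec(tesseractCmd);
try {
    int exitValue = process.waitFor();
    if (exitValue == 0) {
        try (FileReader reader = new FileReader(tmpFile[1])) {
            return SearchableTextExtractionUtils.extractPlainText(reader);
        }
    }
    throw new SearchableTextExtractionException(exitValue, Arrays.toString(tesseractCmd));
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the interrupt flag
    throw new SearchableTextExtractionException(e);
} finally {
    process.destroy();
}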
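For reference, here is a variant of the same approach using ProcessBuilder instead of Runtime.exec() and only the standard library. It assumes tesseract is on the PATH and that your build can read the input image directly (builds using Leptonica typically read PNG and JPEG as well as TIFF, but check yours); the class name and temp-file layout are illustrative:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Sketch: run the tesseract CLI on an image and return the recognized text. */
class TesseractCli {
    /** tesseract takes an output *base* name and appends ".txt" itself. */
    static List<String> command(Path image, Path outputBase) {
        return Arrays.asList("tesseract", image.toString(), outputBase.toString());
    }

    static String ocr(Path image) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("ocr");
        Path outputBase = dir.resolve("result");
        Process process = new ProcessBuilder(command(image, outputBase))
                .redirectErrorStream(true)                               // merge stderr into stdout
                .redirectOutput(dir.resolve("tesseract.log").toFile())  // so the pipe never fills up
                .start();
        int exitValue = process.waitFor();
        if (exitValue != 0) {
            throw new IOException("tesseract exited with " + exitValue);
        }
        return new String(Files.readAllBytes(dir.resolve("result.txt")), StandardCharsets.UTF_8);
    }
}
```

The temp directory is not deleted in this sketch; in real code you'd clean it up after reading result.txt.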

Printing to an Epson PictureMate from Java

I'm trying to use the Java printing API to print a JPG to an Epson PictureMate photo printer. I want the print to take up the entire page. The image prints, but it has an eighth of an inch of unprinted space on the right edge. Here's the code I'm using:
public void printImage(File image) throws Exception {
PrintRequestAttributeSet aset = new HashPrintRequestAttributeSet();
aset.add(OrientationRequested.REVERSE_LANDSCAPE);
aset.add(MediaSizeName.JAPANESE_POSTCARD);
DocPrintJob printerJob = printService.createPrintJob();
FileInputStream fis = new FileInputStream(image);
Doc doc = new SimpleDoc(fis, DocFlavor.INPUT_STREAM.JPEG, null);
printerJob.print(doc, aset);
fis.close();
}
I thought the JAPANESE_POSTCARD size was correct, but it seems too small for 4"x6" prints. I also tried setting MediaPrintableArea to 4"x6", but that didn't work either. Any ideas?
From experience, I know there is some (potential) internal wrangling of the Paper after you pass it to the PrintJob; basically, it tries to validate that the paper size and margins can work with the specified printer.
However, you might want to read http://www.jpedal.org/PDFblog/2009/06/java-printing-page-size-problem/ as it has some ideas on how to overcome some of these issues.
How to then apply that to the Print Service API is another question ;)
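For what it's worth, JAPANESE_POSTCARD is 100 x 148 mm, roughly 3.94 x 5.83 in, so it is slightly smaller than a 4 x 6 in sheet, and as far as I know MediaSizeName has no predefined 4x6 constant. A diagnostic sketch (class and method names are illustrative) that prints the physical size Java associates with a media name and the printable areas the driver reports, which can reveal a margin the driver insists on:

```java
import javax.print.DocFlavor;
import javax.print.PrintService;
import javax.print.attribute.HashPrintRequestAttributeSet;
import javax.print.attribute.PrintRequestAttributeSet;
import javax.print.attribute.standard.MediaPrintableArea;
import javax.print.attribute.standard.MediaSize;
import javax.print.attribute.standard.MediaSizeName;

class MediaDiagnostics {
    /** Physical {width, height} of a named medium in inches, or null if Java has no size for it. */
    static float[] sizeInInches(MediaSizeName name) {
        MediaSize size = MediaSize.getMediaSizeForName(name);
        return size == null ? null : size.getSize(MediaSize.INCH);
    }

    /** Prints the printable areas the driver reports for JPEG jobs on the given media. */
    static void dumpPrintableAreas(PrintService service, MediaSizeName media) {
        DocFlavor flavor = DocFlavor.INPUT_STREAM.JPEG;
        PrintRequestAttributeSet aset = new HashPrintRequestAttributeSet();
        aset.add(media);
        Object areas = service.getSupportedAttributeValues(MediaPrintableArea.class, flavor, aset);
        if (areas instanceof MediaPrintableArea[]) {
            for (MediaPrintableArea area : (MediaPrintableArea[]) areas) {
                System.out.println(area.toString(MediaPrintableArea.INCH, "in"));
            }
        } else {
            System.out.println("No printable-area information reported: " + areas);
        }
    }
}
```

Calling dumpPrintableAreas(printService, MediaSizeName.JAPANESE_POSTCARD) with the PictureMate's service should show whether the driver itself is reserving the strip on the right edge.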

Check if a file is an image

I am using JAI and load an image with:
PlanarImage img = JAI.create("fileload", myFilename);
I check before that line if the file exists. But how could I check if the file is a .bmp or a .tiff or an image file?
Does anyone know?
The Image Magick project has facilities to identify image and there's a Java wrapper for Image Magick called JMagick which I think you may want to consider instead of reinventing the wheel:
http://www.jmagick.org
I'm using Image Magick all the time, including its "identify" feature from the command line and it never failed once to identify a picture.
Back in the day, when I absolutely needed that feature and JMagick didn't exist yet, I used to call ImageMagick's identify command from Java via Runtime.exec(), and it worked perfectly.
Now that JMagick exists, this is probably not necessary anymore (but I haven't tried JMagick yet).
Note that it gives much more than just the format, for example:
$ identify tmp3.jpg
tmp3.jpg JPEG 1680x1050 1680x1050+0+0 DirectClass 8-bit 293.582kb
$ identify tmp.png
tmp.png PNG 1012x900 1012x900+0+0 DirectClass 8-bit 475.119kb
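The Runtime.exec() approach mentioned above can be sketched with ProcessBuilder as follows, asking identify for just the format via -format %m instead of parsing the full line. This assumes ImageMagick's identify is on the PATH (ImageMagick 7 also offers it as magick identify); the class name is illustrative:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Sketch: ask ImageMagick's identify for the format of a file, or null if it isn't an image. */
class IdentifyFormat {
    static List<String> command(Path file) {
        // %m is the image format (e.g. JPEG, PNG); "\n" separates frames of multi-frame files
        return Arrays.asList("identify", "-format", "%m\\n", file.toString());
    }

    static String formatOf(Path file) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(file))
                .redirectError(ProcessBuilder.Redirect.DISCARD) // ignore warnings on stderr
                .start();
        String output;
        try (InputStream in = process.getInputStream()) {
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8); // read before waitFor
        }
        if (process.waitFor() != 0) {
            return null; // identify failed: unreadable or not an image
        }
        String firstLine = output.split("\\R", 2)[0].trim();
        return firstLine.isEmpty() ? null : firstLine;
    }
}
```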
Try using the width of the image:
boolean isImage(String imagePath) {
    // ImageIcon loads the image synchronously; the width stays -1 if it could not be decoded
    Image image = new ImageIcon(imagePath).getImage();
    return image.getWidth(null) != -1;
}
If the width is -1, the file is not an image.
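Note that ImageIcon goes through the AWT toolkit decoders, which typically handle GIF, JPEG and PNG only, so BMP and TIFF files may be reported as non-images. A sketch of a similar check that asks ImageIO which reader recognizes the data, which also gives you the format name (ImageIO ships a BMP reader, and a TIFF reader since JDK 9); the class name is illustrative:

```java
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

class ImageFormatSniffer {
    /**
     * Returns the ImageIO format name (e.g. "png", "bmp", "tif") for a File or InputStream,
     * or null if no installed reader recognizes the data.
     */
    static String formatOf(Object input) {
        try (ImageInputStream in = ImageIO.createImageInputStream(input)) {
            if (in == null) {
                return null; // ImageIO can't wrap this kind of input
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                return null; // no reader claims these bytes
            }
            ImageReader reader = readers.next();
            try {
                return reader.getFormatName();
            } finally {
                reader.dispose();
            }
        } catch (IOException e) {
            return null;
        }
    }
}
```

Pass a File, e.g. ImageFormatSniffer.formatOf(new File(myFilename)); a null result means no installed reader recognized it.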
To tell if something is a PNG, I've used the snippet below in Android Java.
public CompressFormat getCompressFormat(Context context, Uri fileUri) throws IOException {
// create input stream
int numRead;
byte[] signature = new byte[8];
byte[] pngIdBytes = { -119, 80, 78, 71, 13, 10, 26, 10 };
InputStream is = null;
try {
ContentResolver resolver = context.getContentResolver();
is = resolver.openInputStream(fileUri);
// if first 8 bytes are PNG then return PNG reader
numRead = is.read(signature);
if (numRead == -1)
throw new IOException("Trying to read from a 0-byte stream");
} finally {
if (is != null)
is.close();
}
if (numRead == 8 && Arrays.equals(signature, pngIdBytes)) {
return CompressFormat.PNG;
}
return null;
}
Many file formats start with an identifying byte sequence (a "magic number").
For example, JPEG files start with FF D8 FF.
You can check for this sequence in your program, but I am not sure whether this works for every file.
For information about these identifying sequences, you can have a look at http://filext.com
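A minimal sketch of such a check for the formats in the question plus a few common ones. TIFF has two signatures depending on byte order ("II*\0" little-endian and "MM\0*" big-endian), so both are checked; as said above, this is a heuristic, and the class name is illustrative:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Guesses an image format from the first bytes of a file ("magic numbers"). */
class MagicNumbers {
    static String detect(byte[] h) {
        if (startsWith(h, 0xFF, 0xD8, 0xFF)) return "jpeg";
        if (startsWith(h, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return "png";
        if (startsWith(h, 0x47, 0x49, 0x46, 0x38)) return "gif";  // "GIF8"
        if (startsWith(h, 0x42, 0x4D)) return "bmp";               // "BM"
        if (startsWith(h, 0x49, 0x49, 0x2A, 0x00)) return "tiff";  // little-endian TIFF
        if (startsWith(h, 0x4D, 0x4D, 0x00, 0x2A)) return "tiff";  // big-endian TIFF
        return null;
    }

    /** Reads at most the first 8 bytes of the file and classifies them. */
    static String detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[8];
            int n = in.readNBytes(header, 0, header.length);
            return detect(Arrays.copyOf(header, n));
        }
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false; // compare as unsigned bytes
        }
        return true;
    }
}
```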
You could use DROID, a tool for file format identification that also offers a Java API, to be used roughly like this:
AnalysisController controller = new AnalysisController();
controller.readSigFile(signatureFileLocation);
controller.addFile(fileToIdentify.getAbsolutePath());
controller.runFileFormatAnalysis();
Iterator<IdentificationFile> it = controller.getFileCollection().getIterator();
Documentation on the API usage is rather sparse, but you can have a look at this working example (the interesting part is in the identifyOneBinary method).
The only (semi-)reliable way to determine the contents of a file is to open it and read the first few characters. Then you can use a set of tests such as implemented in the Unix file command to make an educated guess as to the contents of the file.
Expanding on Birkan's answer, there is a list of 'magic numbers' available here:
http://www.astro.keele.ac.uk/oldusers/rno/Computing/File_magic.html
I just checked a BMP and TIFF file (both just created in Windows XP / Paint), and they appear to be correct:
First two bytes "42 4d" -> BMP
First four bytes "4d 4d 00 2a" -> TIFF
I used VIM to edit the files and then did Tools | Convert to Hex, but you can also use 'od -c' or something similar to check them.
As a complete aside, I was slightly amused when I found out the magic numbers used for compiled Java Classes: 'ca fe ba be' - 'cafe babe' :)
Try using the standard JavaBeans Activation Framework (JAF):
With the JavaBeans Activation Framework standard extension, developers who use Java technology can take advantage of standard services to determine the type of an arbitrary piece of data, encapsulate access to it, discover the operations available on it, and to instantiate the appropriate bean to perform said operation(s). For example, if a browser obtained a JPEG image, this framework would enable the browser to identify that stream of data as a JPEG image, and from that type, the browser could locate and instantiate an object that could manipulate, or view that image.
Another option combines the JDK's URLConnection.guessContentTypeFromStream with Apache Tika's Detector as a fallback:
// image is a byte[]; currentImageType and ImageType are application-specific
if (currentImageType == null) {
    ByteArrayInputStream is = new ByteArrayInputStream(image);
    // First try the JDK's built-in magic-number sniffing (it marks and resets the stream)
    String mimeType = URLConnection.guessContentTypeFromStream(is);
    if (mimeType == null) {
        // Fall back to Apache Tika's detector, which recognizes more formats (PDF, TIFF, ...)
        AutoDetectParser parser = new AutoDetectParser();
        Detector detector = parser.getDetector();
        Metadata md = new Metadata();
        mimeType = detector.detect(is, md).toString();
    }
    // Normalize the MIME type to the short names ImageType expects
    if (mimeType.contains("png")) {
        mimeType = "png";
    } else if (mimeType.contains("jpg") || mimeType.contains("jpeg")) {
        mimeType = "jpg";
    } else if (mimeType.contains("pdf")) {
        mimeType = "pdf";
    } else if (mimeType.contains("tif")) { // also matches "tiff"
        mimeType = "tif";
    }
    currentImageType = ImageType.fromValue(mimeType);
}
