I'm running into a problem printing a Crystal Report in Java: all of the text is replaced with little box characters. The report blob is stored in an Oracle database, and I can preview it in Adobe Reader and see that it is properly formed with actual text. The blob is passed to a Java applet that uses PDFRenderer to print it.
My theory is that the problem lies in the fact that the Crystal Reports we generate use PDF version 1.2. A number of Jasper reports are generated as version 1.4 and these print correctly; only the 1.2 PDFs have this problem.
Does PDFRenderer not support printing this version, or are there additional steps I need to take to print those successfully?
Any help is greatly appreciated.
It's very unlikely that the issue you are seeing is due to the PDF version.
Especially for text content, the PDF spec gets very complex, and it is quite likely that Crystal Reports creates content that either
relies on some strange encoding
uses CID (multibyte) font techniques
and PDFRenderer has a blind spot there.
You could experiment with the settings on the report side regarding the
encoding
font (Type1 / TrueType)
font embedding
and maybe you will find an option that is better suited.
Does PDFRenderer display the PDFs if you use it as a viewer? PDFRenderer does not support later PDF versions (i.e. compressed objects), but 1.2 is fairly straightforward.
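To rule the version theory in or out, it can help to confirm what version each blob actually declares. The following is a minimal sketch, assuming the blob has already been read into a byte array; the class name `PdfHeaderVersion` and the 1024-byte search window are illustrative choices, not part of PDFRenderer or any other library mentioned here.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the version declared in a PDF file header.
final class PdfHeaderVersion {
    // A PDF file begins with a header line such as "%PDF-1.2".
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d\\.\\d)");

    /** Returns the declared version (e.g. "1.2"), or null if no header is found near the start. */
    static String versionOf(byte[] pdfBytes) {
        // Only inspect the beginning of the data; the window size is an assumption.
        int length = Math.min(pdfBytes.length, 1024);
        String head = new String(pdfBytes, 0, length, StandardCharsets.ISO_8859_1);
        Matcher matcher = HEADER.matcher(head);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("PDF header version: " + versionOf(bytes));
    }
}
```

Note that from PDF 1.4 onwards the document catalog may carry a Version entry that overrides the header, so the header alone is only a first indication.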
I am having some trouble with JasperReports. I created a form in iReport that includes two subreports, each of which generates a grid of values (1 or 2 characters long).
The PDF compiled from iReport works fine and looks good, but if I use the same *.jrxml and *.jasper files in my web app, the generated PDF has some minor differences. One big problem is that some cells of the grid are now 2 lines high. Values like "NB" use only one line, but "GS", for example, uses 2 lines.
I have not been able to find the error. Workarounds with a smaller font size or wider cells didn't help.
Make sure the font you are using in the template is available on the JVM generating the report. If the font doesn't exist then a different font will be used. If changing the font isn't an option then you can create a font extension package. Creating a font extension is documented here: JasperReports Font Reference
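A quick way to check this is to ask the JVM which font families it can see and compare them with the ones the template uses. This is a minimal sketch using only the JDK; the class name `FontAvailabilityCheck` is hypothetical, and the family names to check are assumed to be passed on the command line.

```java
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: reports which required font families the JVM cannot see.
final class FontAvailabilityCheck {
    /** Returns the required family names that are absent from the available ones (case-insensitive). */
    static List<String> missingFamilies(List<String> required, List<String> available) {
        List<String> lowerAvailable = new ArrayList<>();
        for (String name : available) {
            lowerAvailable.add(name.toLowerCase(Locale.ROOT));
        }
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!lowerAvailable.contains(name.toLowerCase(Locale.ROOT))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Font families visible to this JVM; run this on the server that generates the report.
        String[] families = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();
        System.out.println("Missing: " + missingFamilies(Arrays.asList(args), Arrays.asList(families)));
    }
}
```

Running it on both the iReport machine and the web server, with the font names from the .jrxml as arguments, should show whether the two environments differ.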
Sounds like you could have a different version of JasperReports in your web application than the one iReport uses. Making the cells wide enough should at least allow the text to fit on one line.
Create a Java desktop test that generates a PDF from the .jrxml and check whether it gives the same result. If it does, the difference comes from how iReport is working; if it doesn't, you know the problem lies in how you are viewing or creating the PDF in the web app.
I've been searching the internet for how to convert an HTML page into a PDF file using Java. I found a lot of pointers but, in short, they don't work or are too difficult to implement. I also downloaded a commercial product, pd4ml; the API is something I'd be happy to work with, except that when I crawled a simple page on Wikipedia I got an out of memory error (with Xmx set to 1024 MB). Some approaches suggest converting HTML -> XHTML -> FO -> PDF. However, I am getting a lot of exceptions from the XHTML-to-FO XSL file, and from reading the documentation, it's not something I have enough time to understand right now.
Here are my questions/concerns.
1. Is there another cohesive API out there that will easily convert HTML to PDF (commercial or not)?
2. Is there a way I can simply capture an HTML page and store it as a single file? This approach would be similar to Internet Explorer's way of saving a web page as a web archive (single file, MHT format).
Any help is appreciated. (BTW, I know this question has been asked repeatedly, but in addition to the original spirit of the question, I'm open to other ways.) Thanks.
Try wkhtmltopdf, which uses WebKit. Another option (the one I'm currently using) is OpenOffice, remote controlled via macros.
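Since wkhtmltopdf is a command-line tool, calling it from Java means starting an external process. The following is a minimal sketch, assuming the `wkhtmltopdf` binary is installed and on the PATH; the class name `WkhtmltopdfRunner` is hypothetical, and only the basic `wkhtmltopdf <input> <output>` invocation is used.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the wkhtmltopdf command-line tool.
final class WkhtmltopdfRunner {
    /** Builds the basic command line: wkhtmltopdf <source URL or HTML file> <output PDF>. */
    static List<String> command(String source, String outputPdf) {
        return Arrays.asList("wkhtmltopdf", source, outputPdf);
    }

    /** Runs the conversion and returns the exit code of the process (0 normally means success). */
    static int convert(String source, String outputPdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(source, outputPdf))
                .inheritIO() // forward the tool's progress and error output to this console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        int exitCode = convert(args[0], args[1]);
        System.out.println("wkhtmltopdf exited with " + exitCode);
    }
}
```

Because the rendering happens in a separate process, the memory used for the page does not count against the JVM heap.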
You may use the iText open source Java lib for that, and read this
or use the YaHPConverter open source Java lib.
or do this with the help of ICEpdf, a popular open source lib
or use pd4ml, but it is not free, only a trial.
or use this, and this is the manual for it.
My 2 cents using opensource tools:
You can capture a screenshot of the HTML page with Selenium or WebDriver and save it to an image file from your Java code. Once you have the image file, you can convert it to PDF, again from your Java code.
EDIT:
It seems you can do all of that in one step using iText HTML to PDF
I am not sure, but you could try
1) Cobra HTML rendering engine http://lobobrowser.org/cobra.jsp
2) HTMLEditorKit -- part of the JDK
3) JWebPane
Use the rendering kit to parse and render the HTML. The rendered output is a Swing component, which iText can use to generate the PDF output.
You can try out Pdfcrowd. It is an easy to use commercial online API with many options and with support for Java.
It can create PDF either from web pages or raw HTML code.
A PDF I generate with JasperReports renders OK on my Mac, but some labels show up wrong on Linux. For example, I have a static label that is not shown completely on Linux (only part of the word appears), but it is on the Mac. Can the OS be somehow related? What is the usual source of this kind of problem?
Missing fonts on your Linux machine may cause such problems. I had the same issue when creating PDFs with iText.
Always embed fonts in generated PDFs! It saves you a lot of hassle...
And note the following slight difference: if you create a PDF with Arial as the font, it looks good on Windows and will very likely use Helvetica on Linux (which is nearly the same font), but since Helvetica has slightly different metrics, your bounding boxes will not fit.
Again: Always embed fonts in generated PDFs!
If you have time you can look at Docmosis; it generates PDFs with the fonts embedded, so cross-platform rendering isn't an issue. Beware, though, that if you generate doc or odf files across platforms, the fonts may differ, which shows up as pagination or layout changes. As has been said above, you need to make sure that the destination has the fonts that were used to generate the document (or embed the fonts if possible).
The closest font to the PDF Helvetica (SansSerif) is MS Arial.ttf. The problem is that it is not available on the Linux machine by default. Copy it to /usr/share/fonts (and update fonts.dir) or put it in the fonts/ directory of your Linux JRE installation (and update fonts.dir). If you do not want to use MS Arial, try GNU FreeFont http://ftp.gnu.org/gnu/freefont/
You can embed the fonts in the PDF as Daniel has advised, but it makes the PDF larger.
Is there any free Java library for extracting text from PDF that is compatible with Google App Engine?
I've read about PDFJet, but it can't read PDF, can it?
Is there perhaps another way to extract text from PDF? I tried http://www.pdfdownload.org/, but unfortunately they don't handle non-English characters correctly.
iText now has a text parsing module (I'm one of the parser authors). See the com.itextpdf.text.pdf.parser.PdfContentReaderTool class for an example of how to use it.
PdfBox does not run on GAE. It uses Java classes that are not allowed there.
(GAE only permits these http://code.google.com/appengine/docs/java/jrewhitelist.html)
I have partially modified a very old version of PdfBox (0.7.3) to be GAE compliant. Now I'm able to extract text from a PDF (whole page or rectangular area). I only modified the minimum needed for PDF text extraction, not the whole of PdfBox. :)
The idea was to remove references to java.awt.Rectangle etc., using my own "rectangle" class.
More info: http://fhtino.blogspot.com/2010/04/pdfbox-text-extration-gae.html
I modified the latest (1.8.0-SNAPSHOT) version to run on Google App Engine. I had to disable one unit test, but it runs fine for simple text extraction.
Following a simple try-fail-fix approach, I had to modify 5 files in total. Pretty doable.
You'll also have to explicitly use a RandomAccessBuffer, like Fabrizio explained.
For the extra lazy, here's the compiled jar, the dependencies for text extraction, and the patch. Note that it might not work for every use case (e.g. rectangle-based extraction). I used it to extract the text of a whole page.
https://docs.google.com/folder/d/0B53n_gP2oU6iVjhOOVBNZHk0a0E/edit
I know there is http://pdfbox.apache.org/index.html
Apache PDFBox is an open source Java PDF library for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents.
but I've never tested it.
Last month I finished extracting text from PDF files in my project. I used the Xpdf tool to get the text and the text coordinates, but I used it from Xcode (Objective-C). The tool is open source, written in C++, and can be used from many languages. However, I don't know whether Xpdf would work with your Java. Anyway, you can try this tool.
I have to internationalize an application, and therefore have to let a user choose a font for a PDF that I will create at runtime. I want to be able to display a list of font choices from the user's system using
GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
This works fine for getting fonts, but then it seems that I have to jump through some hoops to get that font to play nice with FOP, as outlined here. I can follow all the instructions that they've outlined, except I'm not sure how to get the location of fonts in order to run TTFReader.
Does anyone know how to list the path of each font that java gathers with the getAllFonts() command?
Is it a good idea to embed fonts in PDFs? For instance, the Arial Unicode MS TTF file is ~30 MB.
Is this the only way to create internationalized PDFs with FOP and XSL?
There have been major improvements in font configuration with the last release (FOP 0.95), so please just consult the documentation there for a much easier approach:
http://xmlgraphics.apache.org/fop/0.95/fonts.html#basics
The XML font metrics files are no longer necessary.
To your questions:
That's impossible. Java won't give you that information. That's partly why FOP has its own font subsystem.
Yes, it's actually recommended. Please note that FOP supports font subsetting so only the glyphs needed are embedded in the PDF.
Not sure what you mean. The above information, together with an upgrade to the latest release, should make this simple enough.
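Since Java will not report where a font came from, one possible workaround for locating font files is to scan the directories where fonts are usually installed. This is only a sketch under stated assumptions: the class name `FontFileScanner` is hypothetical, the directory list in `main` is a guess at common locations and is not something Java or FOP provides, and only `.ttf` files are considered.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

// Hypothetical helper: finds TrueType font files by scanning directories.
final class FontFileScanner {
    /** True if the file name has a .ttf extension (case-insensitive). */
    static boolean isTrueTypeFile(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".ttf");
    }

    /** Recursively collects .ttf files under dir; returns an empty list if dir is not a directory. */
    static List<Path> findFontFiles(Path dir) {
        List<Path> result = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return result;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> isTrueTypeFile(p.getFileName().toString()))
                 .forEach(result::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed common font locations; adjust for the target systems.
        String[] candidateDirs = {"/usr/share/fonts", "/Library/Fonts", "C:\\Windows\\Fonts"};
        for (String dir : candidateDirs) {
            for (Path fontFile : findFontFiles(Paths.get(dir))) {
                System.out.println(fontFile);
            }
        }
    }
}
```

Each file found can then be loaded with `Font.createFont(Font.TRUETYPE_FONT, file)` to read its family name and match it against the names returned by getAllFonts().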