I am managing a java web application and I want to make a pdf export that contains some labels with a logo on them and some data like “id”.
What is the best way to do that.
I am thinking something like iText (I am not only intresting in html to pdf converter) but I am not sure if I can include images and borders in my tickets.
Every proposal is welcome.
If what you want is create from scratch PDF files that contain images and text, then yes, iText can do that (and much much more). Edit your question with further details about your requirements if a more detailed answer.
Related
How to extract PDF file content in java completely as Text and render as HTML?
Not like extracting just text separately or just images separately, requirement is to display contents of PDF file (as like original file-means including images and tables right at place where it was in original file) as HTML content.
Some how same like the sample in the answer here Convert Word to HTML with Apache POI which extracts contents of MS Doc file into HTML using Apache POI.
Extracting data from a PDF file is fairly simple. There are multiple libraries out there that do it correctly. Extracting data, and preserving its layout, on the other hand (the workflow the OP describes) is a very difficult process. The reason behind it is simple - most* PDF files, don't really have any elements that define structure. When a PDF file, for example, displays a table, it's very easy for humans to see it, and understand this is indeed a table with some data in it. However, in the PDF file itself, this is a collection of vector lines, and some text runs in between. The PDF itself, or the PDF viewer, are not aware that this is a table. Therefore when this data is converted to HTML, we don't know that we need to draw a table, but instead see this as vector art. This is just one example of why this is difficult. There are many others that can be used to illustrate this point.
On the other hand, such a thing exists as "Tagged PDF" (section 10.7). It's a PDF where structure elements are actually defined, and extraction is fairly easy. However tagged PDF files are not as common as we would like, and in most cases you won't be guaranteed to work with one.
There are some tools on the market that use sophisticated logic to infer the structure of an untagged document. Some of them do a better job than others at this. I've worked with Adobe Acrobat, which does a decent job at creating an HTML file. There is also an offering from Datalogics (I work for Datalogics) called PDF Alchemist which converts PDF to HTML. Both of them are commercial solutions.
If you are looking for a free solution, PDFBox does a good job at extracting content from a PDF document. However, it doesn't have the ability to create an HTML file, and this is something that will have to be implemented outside of the library. I'm not aware of any free PDF to HTML solutions that do a good enough job, and I would be willing to recommend.
I am very beginner in this iText. I just want to know how to do basic things with PDF from Java.
How to open a PDF page so that the PDF file is opened in front of me?
How to jump to a page (by page number) so that I see seeked page in front of me?
How to change page size so that I see page size changed in front of me?
I know these things can be done using mouse and keyboard, but I just want to make a program that open PDF files according to parameters.
When I do search about iText, I just found topics that tackle creation/modification issues.
You have misunderstood what iText is for. It's not a library for PDF rendering, you won't be able to visualize a PDF file with iText.
As for your issue, most of the PDF viewers (Adobe Acrobat, Foxit Reader) accept parameters to achieve (partly) what you want.
It's just a matter of finding the proper reader. Remember that seeking advices about libraries, applications, frameworks is considered off-topic on StackOverflow, so your question is likely to be closed.
i've been searching on the internet on how to convert a HTML page into a PDF file using Java. i found a lot of pointers, and in short, they don't work or are too difficult to implement. i also downloaded a commercial product, pdf4ml; the API is something i'd be happy to work with, except that when i crawled a simple page on wikipedia, i get a out of memory error (setting Xmx to 1024 M). in some approaches, they suggest converting HTML -> XHTML -> FO -> PDF. however, i am getting a lot of exceptions for the XHTML-to-FO XLS file; and reading the documentations, it's not something that i have enough time to understand right now.
here are my questions/concerns.
1. is there another cohesive API out there that will easily convert HTML to PDF (commercial or not)?
2. is there a way i can simply capture a HTML page and store it as a single file. this approach would be similar to using internet explorer's way of saving a web page as a web archive (single file, MHT format)?
any help is appreciated. (btw, i know this question has been asked repeatedly, but in addition to the original spirit of the question, i'm opened to other ways). thanks.
Try wkhtmltopdf, which is using WebKit. Another option (I'm using that currently) is using OpenOffice (remote controlled via macros).
you may use iText open source Java lib for that, and read this
or use YaHPConverter open source Java lib.
or do this whith help of icepdf popular open source lib
or use pd4ml, but it not free, only trial.
or use this, and this is man for it.
My 2 cents using opensource tools:
You can use either Capture screenshots with Selenium or WebDriver to save html page's screenshot in an image file from your Java code. And once you have image file you can convert it to pdf again from your Java code.
EDIT:
It seems you can do all that in 1 step using itext Html to Pdf
I am not sure but you could Try
1) cobra html rendering engine http://lobobrowser.org/cobra.jsp
2) htmleditorkit -- part of jdk
3) JWebPane
Use the rendering kit to parse and render html. The rendered out put is a swing component. Swing component can be used by itext to generate pdf file out put
You can try out Pdfcrowd. It is an easy to use commercial online API with many options and with support for Java.
It can create PDF either from web pages or raw HTML code.
I'm using PDFBox to extract the outline (bookmarks) information from PDF files, that's even explained in the same site.
However, I've had problems not extracting but generating the qualified urls (foo.pdf#page=22777&zoom=2,2,777) to open the PDF in those bookmarks. Sometimes PDFBox is not able to find the page in which the bookmark is placed (i.e. the page number, left coordinate or top coordinate are wrong.)
Anyone knows a PDF library capable to do this (preferably in Java)? Thanks.
Best regards,
Alexander.
iText (http://itextpdf.com) might work for you.
I've used it mostly to create PDFs (not so much with parsing already exitingones), but the library is good, and does have objects related to outlines and bookmarks.
I'm developing an web application using Flex and JSP.
I am having some performance issues with displaying multiple PDF files.
I am trying to display about 50-100 PDF files. I know that is a little crazy.
Hence, I made the project to convert PDF files to JPG format and display the JPG files.
I'm wondering if there is a way to decrease the file size of PDF to size of JPG.
Additionally, I would like to seek other way that may improve the performance.
Does anyone know a good way to display many PDF files (that will be mostly just text) for web application? Or, should I just have it display JPG files?
If the PDF files are mostly text you should probably use HTML. Is there something that would prevent you from making regular pages from your PDFs?
You can convert the PDF to rtf text file, use the text from rtf file to populate your HTML page perhaps in a table.
Check out ghostscript lib for doing this conversion.