iText: create Phrase from HTML - java

Is there a way to create a Phrase object from HTML in iText?
I am using iText# 7, but Java examples and iText 5 examples are still welcome.

This is the most straightforward way to create a PDF from HTML with iText 7 (and the pdfHTML add-on):
// IO
File htmlSource = new File("input.html");
File pdfDest = new File("output.pdf");
// pdfHTML specific code
ConverterProperties converterProperties = new ConverterProperties();
HtmlConverter.convertToPdf(new FileInputStream(htmlSource), new FileOutputStream(pdfDest), converterProperties);
You can also convert HTML to a List<IElement> by using another static method of HtmlConverter:
List<IElement> elements = HtmlConverter.convertToElements(new FileInputStream(htmlSource), converterProperties);
Check out the resources at the iText website:
https://developers.itextpdf.com/content/itext-7-examples/itext-7-converting-html-pdf
https://www.youtube.com/watch?v=zlTdttU_XyU&feature=youtu.be

Related

itext7 HTML to PDF element converter in JAVA. Multi fonts created

I am writing code that creates a PDF, and I add some paragraphs produced by the method HtmlConverter.convertToElements.
This is the example:
String font_folder="C:\\FormIT\\formit\\ConvertIT\\Resources\\Fonts";
ConverterProperties properties;
properties = new ConverterProperties();
FontProvider fp = new DefaultFontProvider(true, true, false);
fp.addDirectory(font_folder);
properties.setFontProvider(fp);
…
List<IElement> main_elem = HtmlConverter.convertToElements(html_elem,properties);
hp = (Paragraph) main_elem.get(0);
document.add(hp);
It works well, but the resulting PDF contains duplicate fonts: the font is added to the PDF again for each paragraph created by convertToElements.
It looks as follows in Adobe Reader properties:
The question is: how can I create the paragraphs so that the font is reused, as it is when I add a simple paragraph?

How to use pdf2Dom library in java code to get html doc from pdf?

I am trying to extract tabular data from a PDF, and the first step of my algorithm is to convert the PDF to an HTML document.
How can I convert a PDF to HTML using the pdf2Dom library?
You can convert it like this:
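private void generateHTMLFromPDF(String filename) throws IOException, ParserConfigurationException {
    // try-with-resources closes both the PDF and the writer, even on failure
    try (PDDocument pdf = PDDocument.load(new File(filename));
         Writer output = new PrintWriter("src/output/pdf.html", "utf-8")) {
        // Convert the document to HTML and write it to the output file
        new PDFDomTree().writeText(pdf, output);
    }
}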

How to export data set using pdf template with itext?

In my project, some data sets need to be exported in PDF format.
I learned that iText can help and that PdfPTable can do the work, but it takes a lot of code to handle styles. Using a PDF template saves the time and code spent on styling, but then I can only fill the fixed fields defined in the template.
Can you suggest a way to render the data sets with something like a foreach command? Thanks in advance!
Here is my code using PdfPTable, which does the work but is a little ugly:
PdfPTable pdfTable = createNewPDFTable();
for (int i = 0; i < dataSet.size(); i++) {
    MetaObject metaObject = SystemMetaObject.forObject(dataSet.get(i));
    for (String field : fields) {
        Phrase phrase = new Phrase(
                String.valueOf(metaObject.getValue(field) != null ? metaObject.getValue(field) : ""),
                PDFUtil.createChineseSong(DEFAULT_CELL_FONT_SIZE));
        PdfPCell fieldCell = new PdfPCell(phrase);
        fieldCell.setBorder(Rectangle.NO_BORDER);
        fieldCell.setFixedHeight(DEFAULT_COLUMN_HEIGHT);
        fieldCell.setHorizontalAlignment(Element.ALIGN_CENTER);
        fieldCell.setVerticalAlignment(Element.ALIGN_MIDDLE);
        pdfTable.addCell(fieldCell);
    }
}
Here is some code using a PDF template, copied from the iText examples. The work is unfinished because I haven't found a proper way to show the data set.
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
AcroFields form = stamper.getAcroFields();
form.setField("text_1", "Bruno Lowagie");
form.setFieldProperty("text_1", "setfflags", PdfFormField.FF_READ_ONLY, null);
There is an inconsistency in your question. You write: "PdfPTable can do the work, but it needs much code to deal with styles." However, in your first code snippet, you don't really create your PDFs the way one would expect. Instead of producing a high volume of finished PDFs, you use PdfPTable to create a template. I assume you then use that template to create a high volume of finished PDFs.
If you want to use a template and populate it afterwards, you shouldn't create your form using iText. Create it manually, for instance using Open Office or Libre Office. See for instance the example in chapter 6 of my book (section 6.3.5). Create the template with a tool that has a GUI, then fill out that template many times using iText.
This approach has some down-sides: all the content has to fit the fields you define. All fields have a fixed position on a fixed page.
If "applying styles through code" is a problem, you may want to follow the approach described in the ZUGFeRD book. In that book, we create HTML first: Creating HTML invoices.
Once you have the HTML, then convert the HTML to PDF, and use CSS to apply styles: Creating PDF invoices.
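As a rough illustration of the "create HTML first" step, here is a minimal sketch that loops over a data set and emits an HTML table, with the cell styling moved into CSS. It uses only the JDK. The class name DataSetHtml, the method names, the map-based row type, and the CSS values (including the 20px height) are assumptions made for this example; they come neither from the question nor from the ZUGFeRD book.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: renders rows of field/value pairs as an HTML table. */
final class DataSetHtml {

    /** Escapes the characters that are special in HTML text; null becomes "". */
    static String escape(Object value) {
        if (value == null) {
            return ""; // same fallback as the PdfPTable loop in the question
        }
        return String.valueOf(value)
                .replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds a complete HTML document; the CSS rule stands in for the per-cell styling calls. */
    static String toHtml(List<String> fields, List<Map<String, Object>> dataSet) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><style>")
            // Placeholder styles, chosen only to mirror the cell settings in the question
            .append("td { border: none; height: 20px; text-align: center; vertical-align: middle; }")
            .append("</style></head><body><table>\n");
        for (Map<String, Object> row : dataSet) {
            html.append("<tr>");
            for (String field : fields) {
                html.append("<td>").append(escape(row.get(field))).append("</td>");
            }
            html.append("</tr>\n");
        }
        html.append("</table></body></html>");
        return html.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", "Bruno Lowagie");
        row.put("city", null);
        System.out.println(toHtml(Arrays.asList("name", "city"), Collections.singletonList(row)));
    }
}
```

The resulting string is ordinary HTML, so it can be handed to the HTML to PDF conversion step, and the look of the table can be changed by editing the CSS rather than the Java code.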
This is how we create a ZugferdDocument:
ZugferdDocument pdfDocument = new ZugferdDocument(
    new PdfWriter(fos), ZugferdConformanceLevel.ZUGFeRDComfort,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
pdfDocument.addFileAttachment(
    "ZUGFeRD invoice", dom.toXML(), "ZUGFeRD-invoice.xml",
    PdfName.ApplicationXml, new PdfDictionary(), PdfName.Alternative);
pdfDocument.setTagged();
HtmlConverter.convertToPdf(
    new ByteArrayInputStream(html), pdfDocument, getProperties());
The getProperties() method looks like this:
public ConverterProperties getProperties() {
    if (properties == null) {
        properties = new ConverterProperties()
            .setBaseUri("resources/zugferd/");
    }
    return properties;
}
You can find other examples on how to use HTML to PDF here: pdfHTML add-on (read the introduction).
Note that you are using an old version of iText. The examples I shared are using iText 7. There's a huge difference between iText 5 and iText 7.

pdf conversion Using java library

I want to convert XHTML files to PDF/A format, or PDF files to PDF/A format. Can anyone please suggest which Java library I can use?
Thank you
I will make my example more specific
I have a simple html file xyz.html
<html><body>
hello
<br>
<font style = "Helvetica">hello</font>
<br>
</body></html>
java code :
Document document = new Document(PageSize.A4);
FileOutputStream fout = new FileOutputStream(pdffile);
PdfWriter pdfWriter = PdfWriter.getInstance(document, fout);
pdfWriter.setPDFXConformance(PdfWriter.PDFA1B);
FileReader fr = new FileReader("xyz.html");
document.open();
HashMap<String, Object> Provider = new HashMap<String, Object>();
DefaultFontProvider def = new
Provider.put(HTMLWorker.FONT_PROVIDER, def);
HTMLWorker htmlWorker = new HTMLWorker(document);
htmlWorker.setProviders(Provider);
htmlWorker.parse(fr);
I get the error com.itextpdf.text.pdf.PdfXConformanceException: All the fonts must be embedded. This one isn't: Helvetica
Try Flying Saucer: http://code.google.com/p/flying-saucer/
Check out the iText library, which supports both Java and .NET:
http://itextpdf.com/
A few examples are at the links below:
http://itextpdf.com/book/examples.php
http://www.rgagnon.com/javadetails/java-html-to-pdf-using-itext.html
It is proprietary, but it's a really smart enterprise library with good customer support.
Consider Apache FOP project, it supports conversion of xml files to pdf files.
I work at Expected Behavior, and we've developed a SaaS application called DocRaptor that converts HTML to PDF using Prince XML as our rendering engine. DocRaptor uses HTTP POST requests to generate PDF files, and can be used with Java.
Here's a link to our Java example:
DocRaptor Java example
And a link to DocRaptor's home page:
DocRaptor
DocRaptor IS a subscription based service, but our free plan allows you to create up to 5 documents per month, and we don't embed watermarks or restrict the size of free documents.

Add HTML to iText document in memory using XHTMLrenderer (FlyingSaucer)

I am using iText 2.1.7 to generate a document from a database. One of the fields I need to add is in XHTML format. I can use the HTMLWorker class to generate the HTML but this is a bit limited.
I convert this to XHTML using the following code:
String url = chapterDesc.getString("description").toString(); // get the HTML string from the database
org.w3c.dom.Document doc = XMLResource.load(new ByteArrayInputStream(url.getBytes())).getDocument();
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(doc, null);
ByteArrayOutputStream os = new ByteArrayOutputStream();
renderer.layout();
renderer.createPDF(os);
I want to add this information to the document in memory. Is this possible?
Do I need to use PdfStamper? I believe that this requires the document to be closed? If it is possible I would like to avoid using multiple passes to add these descriptions.
Flying Saucer does not work correctly with any version of iText other than 2.0.8. Also, since you mentioned creating the PDF in memory, are you using JSF, JSP, or servlets? If you are, then you can just send your ByteArrayOutputStream as a response on one of these pages, using something along the lines of:
response.setContentType("application/pdf");
response.setContentLength(os.size());
os.writeTo(response.getOutputStream());
response.flushBuffer();
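The snippet above relies on two ByteArrayOutputStream methods: size() for the content length and writeTo() for the copy. A minimal sketch of the same pattern without the servlet API, where a plain OutputStream stands in for response.getOutputStream(); the class name InMemoryPdfSender and the method send are made up for this illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing how an in-memory PDF buffer is handed to another stream. */
final class InMemoryPdfSender {

    /**
     * Copies the buffered bytes to the target stream and returns the byte count,
     * which is the value the answer passes to setContentLength.
     */
    static int send(ByteArrayOutputStream buffer, OutputStream target) {
        try {
            int length = buffer.size(); // number of bytes written to the buffer so far
            buffer.writeTo(target);     // writes the buffer contents directly to the target
            target.flush();
            return length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        // Placeholder bytes standing in for the output of renderer.createPDF(os)
        byte[] fakePdf = "%PDF-1.4 placeholder".getBytes(StandardCharsets.US_ASCII);
        os.write(fakePdf, 0, fakePdf.length);

        ByteArrayOutputStream fakeResponse = new ByteArrayOutputStream();
        System.out.println(send(os, fakeResponse) + " bytes sent");
    }
}
```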
I know it's been more than two years since you've asked, but I'm facing the same problem. I googled for a solution and apparently there is none anywhere to be found. So I had to develop my own and I thought I might as well share it. Hope it'll be useful to someone.
I tried to use flying saucer as you did, but it didn't work for me. My piece of HTML was just a simple table so I could use iText HTMLWorker to do the parsing.
So first I get a PdfStamper as you suggested.
PdfReader template = new PdfReader(templateFileName);
PdfStamper editablePage = new PdfStamper(template, reportOutStream);
Then I work with the document (fill the fields, insert some images) and after that I need to insert an HTML snippet.
//getting a 'canvas' to add parsed elements
final ColumnText page = new ColumnText(editablePage.getOverContent(pageNumber));
//finding out the page size
final Rectangle pagesize = editablePage.getReader().getPageSize(pageNumber);
//you can define any size here, that will be where your parsed elements will be added
page.setSimpleColumn(0, 0, pagesize.getWidth(), pagesize.getHeight());
If you need simple styling, HTMLWorker can handle some of it:
StyleSheet styles = new StyleSheet();
styles.loadStyle("h1", "color", "#008080");
//parsing
List<Element> parsedTags = HTMLWorker.parseToList(new StringReader(htmlSnippet), styles);
for (Element tag : parsedTags)
{
    page.addElement(tag);
    page.go();
}
These are just some basic ideas of how to do that, hope it helps.
