Error converting HTML to PDF using iText pdfHTML (Java)

I have an HTML file containing scripts, images and charts (using SVG). Following one of the iText examples, I am using the following code to convert the input HTML file into a PDF file:
HtmlConverter.convertToPdf(new FileInputStream(src), new FileOutputStream(dest));
During conversion I get the following errors:
c.i.h.a.i.DefaultHtmlProcessor - No worker found for tag script
c.i.s.r.f.DefaultSvgNodeRendererFactory - Could not find implementation for tag tspan
c.i.h.c.a.u.PaddingApplierUtil - Padding value in percents not supported
How do I resolve these errors?
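Judging by the logger names, these three messages are logged warnings about unsupported input rather than fatal conversion errors: pdfHTML does not execute JavaScript, so script elements are skipped (a chart drawn by a script will not appear in the PDF), the SVG renderer in this version has no implementation for tspan, and percentage padding is ignored. If the scripts are not needed in the PDF, one way to get rid of the first warning is to strip them before conversion. The following is a minimal JDK-only sketch; the class and method names are hypothetical, and the regex assumes that no literal "</script>" appears inside a script's own text (it is not a full HTML parser):

```java
import java.util.regex.Pattern;

public class ScriptStripper {
    // Self-closing <script .../> elements. These are removed first so that the
    // block pattern below cannot treat one as an opening tag and swallow the
    // content up to the next </script>.
    private static final Pattern SCRIPT_SELF_CLOSING =
            Pattern.compile("<script\\b[^>]*/>", Pattern.CASE_INSENSITIVE);

    // <script ...> ... </script>, case-insensitive and across line breaks.
    // Assumption: no "</script>" occurs inside the script text itself.
    private static final Pattern SCRIPT_BLOCK =
            Pattern.compile("<script\\b[^>]*>.*?</script\\s*>",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the HTML with all script elements removed. */
    public static String stripScripts(String html) {
        String withoutSelfClosing = SCRIPT_SELF_CLOSING.matcher(html).replaceAll("");
        return SCRIPT_BLOCK.matcher(withoutSelfClosing).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<html><head><SCRIPT src=\"chart.js\"></SCRIPT></head>"
                + "<body><p>Report</p><script>\nvar x = 1;\n</script></body></html>";
        // prints <html><head></head><body><p>Report</p></body></html>
        System.out.println(stripScripts(html));
    }
}
```

The cleaned string can then be passed to HtmlConverter.convertToPdf(String, OutputStream) instead of the FileInputStream. If the HTML references images by relative paths, the overload that takes a ConverterProperties with a base URI set is likely needed so that pdfHTML can still resolve them.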

Related

InvalidPDFException when viewing a PDF with Chinese content using PrimeFaces DefaultStreamedContent

I am using PrimeFaces 6.0 and Java 1.8.
I use the code below to view the PDF document with DefaultStreamedContent:
<pe:documentViewer locale="en" height="600" value="#{documentMBean.pdfFile}" id="pdfDocViewer" />
The following code sets up the stream for pdfFile:
byte[] documentData = null;
setPdfFile(new DefaultStreamedContent());
documentData =//Getting the byte array from DB
getPdfFile().setStream(new ByteArrayInputStream(documentData));
getPdfFile().setContentType("application/pdf");
getPdfFile().setContentEncoding("UTF-8");
When I upload and store a PDF file that contains Chinese characters, I get the following exception in the PDF viewer:
PDF.js v1.0.21 (build: f954cde) Message: InvalidPDFException
Note: a plain-text PDF that contains only English characters works fine, and I can view it.
I have tried setting different character encodings, such as UTF-8 and UTF-16.
How can I resolve this exception? And where can I find PDF.js for further analysis?
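A PDF is a binary file, so character encodings such as UTF-8 or UTF-16 do not apply to it as a whole, and setting one on the stream should not change the bytes PDF.js receives. PDF.js reports InvalidPDFException when those bytes do not form a valid PDF. One possible cause, which is an assumption and not confirmed by the question, is that the bytes are altered somewhere between upload and viewer, for example by being converted to a String on the way into or out of the database. The sketch below (hypothetical names, JDK only) checks two things about the byte array read from the DB: whether it still starts with the %PDF- header, and whether it would survive a UTF-8 String round trip unchanged:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PdfBytesCheck {
    // Every PDF file starts with "%PDF-" followed by a version, e.g. "%PDF-1.7".
    private static final byte[] PDF_HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** True if the data starts with the PDF header. */
    public static boolean hasPdfHeader(byte[] data) {
        if (data == null || data.length < PDF_HEADER.length) return false;
        return Arrays.equals(Arrays.copyOf(data, PDF_HEADER.length), PDF_HEADER);
    }

    /** True if the bytes are unchanged after decoding to a String and encoding back as UTF-8. */
    public static boolean survivesUtf8RoundTrip(byte[] data) {
        byte[] roundTripped = new String(data, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(data, roundTripped);
    }

    public static void main(String[] args) {
        // PDF writers commonly put a comment of non-ASCII bytes on the second line
        // (0xE2 0xE3 0xCF 0xD3 here); this sequence is not valid UTF-8, so a
        // String round trip replaces it and the file is no longer byte-identical.
        byte[] sample = {'%', 'P', 'D', 'F', '-', '1', '.', '7', '\n', '%',
                (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n'};
        System.out.println("header ok: " + hasPdfHeader(sample));
        System.out.println("survives UTF-8 round trip: " + survivesUtf8RoundTrip(sample));
    }
}
```

If hasPdfHeader passes on the uploaded file but fails on the array read back from the DB, or the stored array differs from the uploaded one, the problem is in the storage path rather than in the viewer.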

How to write data containing HTML tags to a PDF file using the iText library in Java

I have a String that contains some HTML tags and comes from a database. I want to write it to a PDF file with the same styling that its HTML tags express. I tried to use XMLWorkerHelper like this:
String html = "What is the equation of the line passing through the "
        + "point (2,-3) and making an angle of -45<sup>2</sup> with the positive "
        + "X-axis?";
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
        new StringReader(html));
But it only reads the data inside the HTML tag (in this case only "2") and simply ignores the rest of the string. I want the entire String, with its HTML formatting.
With HTMLWorker it works perfectly, but that class is deprecated. How can I achieve this?
I am using the iText 5 library.
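XML Worker parses its input as XML, and the string from the database is only a fragment with no enclosing element, which would explain why only the text inside <sup> survives. Assuming that is the cause here, a common workaround is to wrap the fragment in a root element before calling parseXHtml. The sketch below (hypothetical names, JDK only, no iText calls) wraps the fragment and uses the JDK's DOM parser to confirm that the wrapped string is well-formed and keeps all of the text:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XhtmlFragmentWrapper {
    /** Wraps an HTML fragment in a single root element so it becomes one XML document. */
    public static String wrap(String fragment) {
        return "<html><body><p>" + fragment + "</p></body></html>";
    }

    /** Parses the string as XML and returns all of its text content. */
    public static String textContent(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String html = "What is the equation of the line passing through the "
                + "point (2,-3) and making an angle of -45<sup>2</sup> with the positive "
                + "X-axis?";
        String wrapped = wrap(html);
        // All text survives once the fragment has a root element.
        System.out.println(textContent(wrapped));
        // With iText 5 and XML Worker, the wrapped string is what would be passed on:
        // XMLWorkerHelper.getInstance().parseXHtml(writer, document, new StringReader(wrapped));
    }
}
```

The fragment still has to be well-formed on its own after wrapping: a bare & or an HTML-only entity such as &nbsp; would need to be escaped or replaced first.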

Merging HTML, RTF to Docx using Docx4J

I'm new to Docx4j and I need some advice.
Currently I'm creating a simple (X)HTML document with Java. It contains some information from a database. After creating this HTML, docx4j creates a Word .docx file using a very simple Word template. This works fine.
Now I have to enhance this HTML. One database value contains a byte array which holds an RTF file.
Currently I'm putting this data into HTML as a string.
String content = new String(allbytes,"UTF-8");
html+=content;
In the end, the HTML file looks like this:
<html>
....
<td>
{\rtf1\ansi\deflang1033\ftnbj\uc1\deff1.....
</td>
...
</html>
docx4j now creates a .docx file that shows this RTF as a string, not as imported RTF content.
Of course it does, but I want to see it as imported RTF.
How can I achieve this?
Is there a simple way to do this?
Converting rtf to docx content is outside the scope of docx4j.
You'll need to look for a third-party solution that converts RTF to docx or, failing that, RTF to (X)HTML (see Convert Rtf to HTML).
You could try http://sourceforge.net/projects/rtf2xml/ and then transform the XML to WordML.
Another possibility may be LibreOffice via JODConverter.
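If plain text is enough, the JDK itself can extract the text from the RTF. javax.swing.text.rtf.RTFEditorKit, which is not mentioned in the answer and is a swapped-in technique, reads RTF into a Swing document. It supports only a subset of RTF and drops most formatting, so this is a fallback rather than a real RTF import. A minimal sketch with hypothetical names that would replace the new String(allbytes,"UTF-8") step and escape the result for the <td>:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfToText {
    /** Extracts plain text from RTF bytes with the JDK's RTFEditorKit (formatting is lost). */
    public static String rtfToPlainText(byte[] rtfBytes) throws Exception {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        try (InputStream in = new ByteArrayInputStream(rtfBytes)) {
            kit.read(in, doc, 0); // parse the RTF into the Swing document
        }
        return doc.getText(0, doc.getLength());
    }

    /** Escapes the characters that are special in (X)HTML text content. */
    public static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the byte array read from the database.
        byte[] allbytes = "{\\rtf1\\ansi Hello world\\par}".getBytes(StandardCharsets.US_ASCII);
        String text = rtfToPlainText(allbytes);
        String html = "<td>" + escapeXml(text.trim()) + "</td>";
        System.out.println(html);
    }
}
```

Escaping matters because the XHTML has to stay well-formed for docx4j's importer; any < or & in the RTF text would otherwise break it.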

Extract the first page content from docx file by XML parsing

I need to extract the content of the first page of a docx file and save it as a separate document. Everything on the first page (images, tables, text) must be saved as-is in the new docx file.
What I tried:
I looked into the XML of the unzipped docx file. Since a Word document is reflowable, there is no page break marking the end of each page, so I couldn't find where each page ends in document.xml.
Is there any way to get the XML content of only the first page of the document using a Java XML DOM parser?
Do not write a new parser, there are tons of already existing tools for that (e.g., what if your input changes from XML to binary Word files?).
Use Apache POI, for example, as @JFB suggested.
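document.xml contains no page layout, but two markers can approximate the end of the first page: explicit page breaks (w:br with w:type="page") and w:lastRenderedPageBreak, which Word writes where a page ended the last time it laid out the document. The latter may be missing or stale if the file was produced or edited by another tool, and w:pageBreakBefore in paragraph properties is not handled here. The sketch below (JDK DOM parser, hypothetical names) returns the index of the first block-level element in w:body that contains one of these markers; assuming the markers are present and current, the elements before that index lie on the first page, and the element at that index is split by the break. Copying those elements into a new docx together with their styles, numbering and image relationships is where Apache POI or docx4j still does the real work:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstPageBreakFinder {
    // WordprocessingML main namespace used by word/document.xml.
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /**
     * Returns the index of the first block-level child of w:body (w:p, w:tbl, ...)
     * that contains a page-break marker, or -1 if there is none.
     */
    public static int firstPageBreakBlock(String documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookups below
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(documentXml)));
        Node body = doc.getElementsByTagNameNS(W, "body").item(0);
        if (body == null) throw new IllegalArgumentException("no w:body element");
        int index = 0;
        for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) continue; // skip whitespace text nodes
            if (containsPageBreak((Element) n)) return index;
            index++;
        }
        return -1;
    }

    // True if the element contains w:lastRenderedPageBreak or <w:br w:type="page"/>.
    static boolean containsPageBreak(Element block) {
        if (block.getElementsByTagNameNS(W, "lastRenderedPageBreak").getLength() > 0) {
            return true;
        }
        NodeList breaks = block.getElementsByTagNameNS(W, "br");
        for (int i = 0; i < breaks.getLength(); i++) {
            Element br = (Element) breaks.item(i);
            if ("page".equals(br.getAttributeNS(W, "type"))) return true;
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:document xmlns:w=\"" + W + "\"><w:body>"
                + "<w:p><w:r><w:t>Page one</w:t></w:r></w:p>"
                + "<w:p><w:r><w:br w:type=\"page\"/><w:t>Page two</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        System.out.println(firstPageBreakBlock(xml)); // prints 1
    }
}
```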

docx4j converts docx to HTML in the wrong format

I have some problems with the docx4j samples. I need to convert a file from docx to HTML and back. I'm trying to compile the ConvertInXHTMLDocument.java sample. It creates the HTML file fine, but converting it back to docx throws an exception about missing close tags (META, img, etc.). Has anyone encountered this problem?
XHTMLImporter requires its input to be well-formed XML. So you need to ensure you don't have missing close tags (META, img etc); if you do, run JTidy or similar first.
docx4j's (X)HTML output can either be HTML or XML. From 3.0, the property Convert.Out.HTML.OutputMethodXML will control which.
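Before handing the HTML to XHTMLImporter, one quick way to confirm the well-formedness requirement from the answer is to parse the file with the JDK's own XML parser and report where it fails. This is a minimal sketch with hypothetical names; it only detects problems such as unclosed META or img tags and does not repair them (that is what JTidy or a similar tool is for):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {
    /** Returns null if the input is well-formed XML, otherwise where and why parsing failed. */
    public static String check(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Replace the default handler so fatal errors are not also printed to stderr.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { }
            @Override public void fatalError(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        try {
            builder.parse(new InputSource(new StringReader(xhtml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // HTML-style void elements without a closing slash: not well-formed XML.
        System.out.println(check("<html><head><meta charset=\"utf-8\"></head>"
                + "<body><img src=\"a.png\"></body></html>"));
        // The same document with self-closed elements: prints null.
        System.out.println(check("<html><head><meta charset=\"utf-8\"/></head>"
                + "<body><img src=\"a.png\"/></body></html>"));
    }
}
```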
