Java Render XML Document as PDF - java

I have an XML document currently stored as an in-memory string and want to render it as a PDF. In other words, the PDF content will be an XML document. The XML being rendered by the method is generic: multiple types of XML documents might be sent in.
I'm having a bit of difficulty figuring out how to accomplish this using various Java-based frameworks.
Apache FOP
It appears that this framework requires a specific transformation from the XML elements in the document to FOP entities. Since the method in question must accept generic XML, I don't think this framework fits my requirement.
iText
I've tried rendering a document using a combination of iText/Flying Saucer (org.xhtmlrenderer), and while it does render a PDF, the content only contains space-separated data values and no XML elements or attributes. Using the code and test data below:
File
<?xml version="1.0" encoding="UTF-8"?>
<root>
<elem1>value1</elem1>
<elem2>value2</elem2>
</root>
Code
File inputFile = new File(PdfGenerator.class.getResource("test.xml").getFile());
OutputStream os = new FileOutputStream("c:\\temp\\Sample.pdf");
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(inputFile);
renderer.layout();
renderer.createPDF(os);
os.close();
Results in a PDF that contains the content values value1 value2, but no tags.
My question is
Can someone provide a code snippet for rendering a PDF containing XML content using one of the frameworks above, or is there another framework better suited to my needs?
Edit:
I realize the same question was asked here, but it seems the solution presented requires intimate knowledge of the structure of the incoming XML doc in the css file.

Just for the sake of giving an example using fop - here you have it. For everyone to be able to follow this I'm using the fop command line tool.
The same can easily be performed within Java code and then you don't need to have the xml as a file at any time.
XSLT that produces a PDF
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:template match="/">
<fo:root>
<fo:layout-master-set>
<fo:simple-page-master master-name="content"
page-width="210mm" page-height="297mm" margin="20mm 20mm 20mm 20mm">
<fo:region-body/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="content">
<fo:flow flow-name="xsl-region-body">
<fo:block>
<xsl:apply-templates />
</fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root>
</xsl:template>
<xsl:template match="@*">
<xsl:text> </xsl:text>
<xsl:value-of select="name()" />
<xsl:text>="</xsl:text>
<xsl:value-of select="." />
<xsl:text>"</xsl:text>
</xsl:template>
<xsl:template match="*">
<xsl:param name="indent">0</xsl:param>
<fo:block margin-left="{$indent}">
<xsl:text>&lt;</xsl:text>
<xsl:value-of select="name()" />
<xsl:apply-templates select="@*" />
<xsl:text>&gt;</xsl:text>
<xsl:apply-templates>
<xsl:with-param name="indent" select="$indent+10" />
</xsl:apply-templates>
<xsl:text>&lt;/</xsl:text>
<xsl:value-of select="name()" />
<xsl:text>&gt;</xsl:text>
</fo:block>
</xsl:template>
</xsl:stylesheet>
We call this file xml2pdf.xsl
Short explanation of the code
The template match="/" builds the PDF page structure, except for the xsl:apply-templates row, which hands control to the other templates, or more precisely to the template match="*".
The template match="*" writes the element start and end tags. After the element name it calls xsl:apply-templates select="@*", which in turn invokes the template match="@*" for each attribute in the element (if any). Finally it calls xsl:apply-templates for the child nodes.
The indent parameter gets increased by 10 for each level the template reaches with the select="$indent+10" attribute in the with-param statement.
Using the code
# fop -xsl xml2pdf.xsl -xml sample.xml -pdf result.pdf
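The remark that the same can be done from Java, without the XML ever being a file, can be sketched with the JDK's javax.xml.transform API alone. This is a minimal sketch, not the answer author's code: the class name InMemoryXslt and the small demo stylesheet are hypothetical, and the sketch stops at producing the transformation result as a string. Feeding an XSL-FO result into FOP to get the PDF is not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies a stylesheet to an XML string entirely in memory.
final class InMemoryXslt {

    // Demo stylesheet (an assumption for illustration): prints every element as text,
    // using the same match="*" idea as xml2pdf.xsl but without any XSL-FO markup.
    static final String DEMO_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='*'>"
      + "&lt;<xsl:value-of select='name()'/>&gt;"
      + "<xsl:apply-templates/>"
      + "&lt;/<xsl:value-of select='name()'/>&gt;"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xsl, String xml) {
        try {
            // Compile the stylesheet from a string instead of a file.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            // Read the input XML from a string and collect the result in memory.
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(DEMO_XSL, "<root><elem1>value1</elem1></root>"));
    }
}
```

Passing the text of xml2pdf.xsl as the first argument would yield the XSL-FO document as a string instead.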

This is a solution using iText. Your HTML content is in the request. Note that iText is not free: check its licensing requirements, which have changed in recent years, although it is not very expensive.
public class MyPDFGeneratorService {
public byte[] generatePdf(final XhtmlPDFGenerationRequest request) {
try {
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(this.getDocument(request.getContent()), null);
renderer.layout();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
renderer.createPDF(baos);
return this.toByteArray(baos);
}
catch (Exception e) {
throw new PDFGenerationException(
"Unable to generate PDF.", e);
}
}
private Document getDocument(final String content) {
InputSource is = new InputSource(new BufferedReader(new StringReader(
content)));
return XMLResource.load(is).getDocument();
}
private byte[] toByteArray(final ByteArrayOutputStream baos)
throws IOException {
byte[] bytes = baos.toByteArray();
baos.close();
return bytes;
}
}
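The service above expects XHTML content in the request. One way to make arbitrary XML show up with its tags is to escape it and wrap it in a minimal XHTML page before handing it over. This is an assumption on my part, not part of the answer, and the class and method names below are hypothetical.

```java
// Hypothetical helper: turns any XML string into XHTML that displays the markup literally.
final class XmlToXhtml {

    // Escapes the three characters that would otherwise be interpreted as markup.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps the escaped XML in a <pre> block so line breaks and indentation are kept.
    static String wrap(String xml) {
        return "<html><head><title>XML</title></head><body><pre>"
                + escape(xml)
                + "</pre></body></html>";
    }

    public static void main(String[] args) {
        System.out.println(wrap("<root>\n  <elem1>value1</elem1>\n</root>"));
    }
}
```

The string returned by wrap could then be used as the content passed to generatePdf.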

Try Googling, there are a number of code snippets. For example: http://www.vogella.com/articles/JavaPDF/article.html
I recommend iText rather than FOP, it's faster, less memory-intensive and you have more control over the result.

Related

Restrict element creation in XSLT if value is empty

I wanted to create new element in target XML if and only if the element value of source XML is not empty. I can do this using below code. But, my problem is I have around 5k field to wrap with similar condition. Do we have any better way to handle this?
<xsl:if test="edi:po-num"> <!-- wanted to avoid this for each element -->
<xsl:element name="element">
<xsl:attribute name="name">order_reference_number</xsl:attribute>
<xsl:value-of select="edi:po-num"/>
</xsl:element>
</xsl:if>
java code to transform:
Transformer trans = StylesheetCache.newTransformer(xslFilePath);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
trans.transform(source, new StreamResult(outputStream));
Your options in XSLT 1.0 are limited - XSLT 1.0 code tends to be verbose. But if it's really repetitive, then you could consider writing a meta-stylesheet - an XSLT stylesheet that generates your stylesheet from some higher-level description of what it needs to do.
Note also, your code will be a lot less verbose if you use literal result elements and attribute value templates rather than xsl:element and xsl:attribute.
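As a rough illustration of both suggestions, the repetitive blocks can be generated from a field mapping. The sketch below swaps the XSLT meta-stylesheet for plain Java string generation, which is a different technique from the one named above, and it emits literal result elements instead of xsl:element and xsl:attribute. The class name and the mapping format are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generator: one conditional block per (source path -> target name) entry.
final class ConditionalElementGenerator {

    static String generate(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String sourcePath = field.getKey();   // e.g. edi:po-num
            String targetName = field.getValue(); // e.g. order_reference_number
            sb.append("<xsl:if test=\"").append(sourcePath).append("\">\n")
              // Literal result element with a plain attribute, no xsl:element needed.
              .append("  <element name=\"").append(targetName).append("\">")
              .append("<xsl:value-of select=\"").append(sourcePath).append("\"/>")
              .append("</element>\n")
              .append("</xsl:if>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("edi:po-num", "order_reference_number");
        System.out.print(generate(fields));
    }
}
```

The generated fragment would still have to be placed inside a template of the real stylesheet, with the edi prefix declared there.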

OutOfMemoryError: Java heap space using XSLT transform

I want to transform XML file using XSLT.
I made:
TransformerFactory factory = TransformerFactory.newInstance();
InputStream is =
this.getClass().getResourceAsStream(getPathToXSLTFile());
Source xslt = new StreamSource(is);
Transformer transformer = factory.newTransformer(xslt);
Source text = new StreamSource(new File(getInputFileName()));
transformer.transform(text, new StreamResult(new File(getOutputFileName())));
When the input file has about 10,000,000 lines, I get this error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.sun.org.apache.xml.internal.utils.FastStringBuffer.append(FastStringBuffer.java:682)
at com.sun.org.apache.xml.internal.dtm.ref.sax2dtm.SAX2DTM.characters(SAX2DTM.java:2111)
at com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.characters(SAXImpl.java:863)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.characters(AbstractSAXParser.java:546)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:455)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:421)
at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(XSLTCDTMManager.java:215)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(TransformerImpl.java:556)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:739)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:351)
at ru.magnit.task.utils.AbstractXmlUtil.transformXML(AbstractXmlUtil.java:66)
at ru.magnit.task.EntryPoint.main(EntryPoint.java:72)
In this line:
transformer.transform(text, new StreamResult(new File(getOutputFileName())));
What is the reason for this, and can it be optimized somehow without increasing the size of the heap?
UPDATE:
My XSLT file:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="entries">
<entries>
<xsl:apply-templates/>
</entries>
</xsl:template>
<xsl:template match="entry">
<entry>
<xsl:attribute name="field">
<xsl:apply-templates select="*"/>
</xsl:attribute>
</entry>
</xsl:template>
</xsl:stylesheet>
In general XSLT 1.0 and 2.0 work with a data model which pulls the complete XML input into a tree model to allow full XPath navigation, resulting in a memory usage that increases with the size of the input document.
So if your current document size exhausts the memory, there is not much you can do in general other than increasing the heap space. There might be processor-specific and XSLT-specific optimizations depending on your concrete XSLT code, but you can't avoid the processor first pulling in the complete document. We would need to see your XSLT to tell whether it can be optimized. Profiling a stylesheet can help to identify areas to optimize, though I am not sure whether Xalan supports that. The stack trace may also simply mean that Xalan already runs out of memory while building the DTM (its tree model) for your large input; in that case optimizing the XSLT code obviously does not help, as it is not even executed.
A Java specific way you could attempt is to use https://docs.oracle.com/javase/8/docs/api/javax/xml/transform/sax/SAXTransformerFactory.html instead to create a SAX filter from your stylesheet and chain it with a default Transformer to serialize the result of the filter, I think I have once tried that and found it can consume less memory than the traditional approach with a Transformer.
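The SAX filter chain described above could look roughly like this. It is a minimal sketch using only JDK classes: the class name is hypothetical, strings stand in for the real input and output files, and whether it actually lowers memory use depends on the XSLT processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;

// Hypothetical helper: stylesheet as a SAX filter, serialized by a default Transformer.
final class SaxFilterTransform {

    static String transform(String xsl, String xml) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();

            // The parser that feeds SAX events into the filter.
            SAXParserFactory parserFactory = SAXParserFactory.newInstance();
            parserFactory.setNamespaceAware(true);
            XMLReader reader = parserFactory.newSAXParser().getXMLReader();

            // The stylesheet wrapped as an XMLFilter sitting on top of the parser.
            XMLFilter filter = factory.newXMLFilter(new StreamSource(new StringReader(xsl)));
            filter.setParent(reader);

            // A default (identity) Transformer only serializes what the filter emits.
            Transformer serializer = factory.newTransformer();
            StringWriter out = new StringWriter();
            serializer.transform(
                    new SAXSource(filter, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("SAX filter transformation failed", e);
        }
    }
}
```

For real data the StringReader and StringWriter would be replaced by file streams so that neither the input nor the output is held in a Java string.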
XSLT 3.0 tries to address the memory problem with the new approach of streaming (https://www.w3.org/TR/xslt-30/#streaming-concepts), however so far there is only one implementation with Saxon 9 EE, a commercial product. And in general a stylesheet is not necessarily streamable, instead you have to rewrite it to make it streamable (if that is at all possible, for instance sorting input nodes is not possible with streaming).
For instance, your posted stylesheet converted to XSLT 3.0 to use streaming (no rewrite necessary, only needed to set up the default mode as streamable) is
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
exclude-result-prefixes="xs math"
version="3.0">
<xsl:mode streamable="yes"/>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="entries">
<entries>
<xsl:apply-templates/>
</entries>
</xsl:template>
<xsl:template match="entry">
<entry>
<xsl:attribute name="field">
<xsl:apply-templates select="*"/>
</xsl:attribute>
</entry>
</xsl:template>
</xsl:stylesheet>
and Saxon 9.8 EE and the beta of Exselt assess that as streamable.

XSLT in Java: CDATA section split

I want to replace some items in a huge XML file, and I thought I'll do it with XSLT. I have absolutely no experience with it, so if you think there would be better ways to do this, please tell me.
Anyway, as a first step I just wanted to copy the whole XML over. This is my xsl file:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="no" cdata-section-elements="script"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
The relevant Java code is:
Source xmlInput = new StreamSource(oldProjectStream);
Source xsl = new StreamSource("test.xsl");
Transformer transformer = TransformerFactory.newInstance().newTransformer(xsl);
StreamResult xmlOutput = new StreamResult("output/project.xml");
transformer.transform(xmlInput, xmlOutput);
Most of the output is fine, also the order of the elements is not changed (this could turn out quite important).
The XML contains some Lua code in CDATA sections. At some (seemingly random) points, however, the CDATA section is closed and reopened again. It seems to have to do with brackets in the code, but only rarely: there are about 5 points in a 1.4 MB XML looking like this:
<script><![CDATA[
...
html_encoding["Otilde" ] = string.char(213)
html_encoding["Ouml" ]]]><![CDATA[ = string.char(214)
html_encoding["Oslash" ] = string.char(216)
...
]]></script>
In the original file, the middle line looks just like the other ones. There are thousands of lines where I've put the dots. What's going on here?
The (proprietary) application that should handle the XML isn't able to load it.
It's useful to tell us which XSLT processor you are using.
The serializer has to close and reopen a CDATA section if it encounters "]]>" in the data, because that sequence cannot legally appear in a CDATA section. It shouldn't need to do so under any other circumstances, though the spec probably doesn't disallow it.
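To check that a split like the one in the question does not change the content for a conforming XML parser, the two forms can be parsed and compared. This is a small sketch with the JDK's DOM parser; the class name and the shortened sample strings are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical check: returns the text content of the root element of an XML string.
final class CdataSplitCheck {

    static String textOfRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getTextContent concatenates all text and CDATA children of the element.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String whole = "<script><![CDATA[x[\"Ouml\" ] = 1]]></script>";
        String split = "<script><![CDATA[x[\"Ouml\" ]]]><![CDATA[ = 1]]></script>";
        System.out.println(textOfRoot(whole).equals(textOfRoot(split)));
    }
}
```

If both forms yield the same text, the failure to load is more likely in the consuming application's parser than in the XML itself.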

Transforming XML and preserving Unicode characters with XSLT

My XSLT transformations have been successful for months until I ran across an XML file with Unicode characters (most likely emoji). I need to preserve the Unicode but XSLT is converting it to HTML Entities. I thought that setting the encoding to UTF-8 would solve my problem but I'm still having issues.
Any help appreciated. Code:
private byte[] transform(InputStream stream) throws Exception{
System.setProperty("javax.xml.transform.TransformerFactory", "org.apache.xalan.processor.TransformerFactoryImpl");
Transformer xmlTransformer;
xmlTransformer = (TransformerImpl) TransformerFactory.newInstance().newTransformer(new StreamSource(createXsltStylesheet()));
xmlTransformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(stream,"UTF-8");
Source staxSource = new StAXSource(reader, true);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Writer writer = new OutputStreamWriter(outputStream, "UTF-8");
xmlTransformer.transform(staxSource, new StreamResult(writer));
return outputStream.toByteArray();
}
If I add
xmlTransformer.setOutputProperty(OutputKeys.METHOD, "text");
the Unicode is preserved but the XML is not.
I just ran across this same issue, and after far too long researching it, here's what I've concluded.
Java XSLT processors escape multi-byte UTF-8 characters into HTML entities even if the output mode is XML... if multibyte chars occur in a text() node that's not wrapped in CDATA. If the characters are wrapped in CDATA (for output) the multibyte character will be preserved.
My Problem:
I had an xml file that looked like this, complete with emoji.
<events>
<event>
<id>RANDOMID</id>
<blah>
<blahId>FOOONE</blahId>
</blah>
<blah>
<blahId>FOOTWO</blahId>
</blah>
<eventComment>Did some things. Had some Fun. 👍</eventComment>
</event>
</events>
I started with an XSL stylesheet that looked like this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/TR/xhtml1/strict"
>
<xsl:output method = "xml" version="1.0" encoding = "UTF-8" omit-xml-declaration="no" indent="yes" />
<xsl:template match="/">
<events>
<xsl:for-each select="/events/event">
<event>
<xsl:copy-of select="./*[name() != 'blah']"/>
<xsl:for-each select="./blah">
<blahId><xsl:copy-of select="./blahId/text()"/></blahId>
</xsl:for-each>
</event>
</xsl:for-each>
</events>
</xsl:template>
</xsl:stylesheet>
Running this with a Java Transformer consistently produced &#55357;&#56397; where my emoji should be. Subsequent attempts to parse the resultant Document failed with the following exception message:
org.xml.sax.SAXParseException; lineNumber: y; columnNumber: x; Character reference "&#55357" is an invalid XML character.
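The number in the exception message is easier to place once the emoji is viewed as Java sees it: a single code point stored as two UTF-16 code units (a surrogate pair). The sketch below, with a hypothetical class name, prints those units and the one character reference that would be valid XML.

```java
// Hypothetical demo: how the thumbs-up emoji is represented in a Java String.
final class SurrogateDemo {

    // U+1F44D written with escapes so the source file encoding does not matter.
    static final String THUMBS_UP = "\uD83D\uDC4D";

    // The UTF-16 code units; each half alone is not a legal XML character.
    static int[] utf16Units(String s) {
        int[] units = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            units[i] = s.charAt(i);
        }
        return units;
    }

    // A character reference for the whole code point, which is valid XML.
    static String charRef(String s) {
        return "&#" + s.codePointAt(0) + ";";
    }

    public static void main(String[] args) {
        int[] units = utf16Units(THUMBS_UP);
        System.out.println(units[0] + " " + units[1]); // the two surrogate halves
        System.out.println(charRef(THUMBS_UP));        // reference to the full code point
    }
}
```

A serializer that escapes each half separately produces references to surrogate code units, which an XML parser has to reject.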
HOGWASH!
Testing this with xsltproc on the command line was useless, since xsltproc isn't stupid when it comes to multibyte characters. I got the output I expected.
A SOLUTION
Having the XSLT wrap the eventComment in CDATA by specifying the QName in the xsl:output tag cdata-section-elements attribute will preserve the bytes and works with xsltproc and the java Transformer.
The magic here is the cdata-section-elements attribute of the <xsl:output> tag. https://www.w3.org/TR/xslt#output
I updated my XSL template to be:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/TR/xhtml1/strict"
>
<xsl:output cdata-section-elements="eventComment" method="xml" version="1.0" encoding="UTF-8" omit-xml-declaration="no" indent="yes"/>
<xsl:template match="/">
<events>
<xsl:for-each select="/events/event">
<event>
<xsl:copy-of select="./*[name() != 'blah' and name() != 'eventComment']"/>
<!-- For the cdata-section-elements to resolve that eventComment needs to be preserved as CDATA
(so we don't get java doing stupid things with unicode escapment)
it needs to be explicitly referenced here.
-->
<eventComment><xsl:copy-of select="./eventComment/text()"/></eventComment>
<xsl:for-each select="./blah">
<blahId><xsl:copy-of select="./blahId/text()"/></blahId>
</xsl:for-each>
</event>
</xsl:for-each>
</events>
</xsl:template>
</xsl:stylesheet>
And now my output from both xsltproc and a java Transformer looks like this, and parses happily with java DocumentBuilders.
<?xml version="1.0" encoding="UTF-8"?>
<events xmlns="http://www.w3.org/TR/xhtml1/strict">
<event>
<id xmlns="">RANDOMID</id>
<eventComment><![CDATA[Did some things. Had some Fun. 👍]]></eventComment>
<blahId>FOO</blahId>
<blahId>FOOTOO</blahId>
</event>
</events>
This line is suspicious:
stream = IOUtils.toInputStream(outputStream.toString(),"UTF-8");
You are converting a ByteArrayOutputStream to a String using the default encoding of your platform, which is probably not UTF-8. Change it to
stream = IOUtils.toInputStream(outputStream.toString("UTF-8"),"UTF-8");
or, for better performance, just wrap the byte array in a ByteArrayInputStream :
return new ByteArrayInputStream(outputStream.toByteArray());
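A small self-contained sketch of the two safe variants, assuming the bytes in the ByteArrayOutputStream are UTF-8; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reading back UTF-8 bytes without going through the platform charset.
final class Utf8RoundTrip {

    // Decode explicitly as UTF-8 instead of relying on the default encoding.
    static String asString(ByteArrayOutputStream out) {
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Skip the String entirely and expose the raw bytes as a stream.
    static InputStream asStream(ByteArrayOutputStream out) {
        return new ByteArrayInputStream(out.toByteArray());
    }

    public static void main(String[] args) {
        byte[] emoji = "\uD83D\uDC4D".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(emoji, 0, emoji.length);
        System.out.println(asString(out).equals("\uD83D\uDC4D"));
    }
}
```

The stream variant avoids a decode and re-encode step, which is why it is the cheaper of the two.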
Try serializing the XML to a String using the Apache serializer.
//Serialize DOM
OutputFormat format = new OutputFormat (doc);
// as a String
StringWriter stringOut = new StringWriter ();
XMLSerializer serial = new XMLSerializer (stringOut,
format);
serial.serialize(doc);
// Display the XML
System.out.println(stringOut.toString());
I just solved a similar problem by adding the line below to the original XML document:
document.appendChild(document.createProcessingInstruction(StreamResult.PI_DISABLE_OUTPUT_ESCAPING, ""));
Refer to: Writing emoji to XML file in JAVA
Perhaps a similar setting can be used for the transformer.

Unusual output for XSL transformations

I have an xml document and a style sheet to convert the document into another useful xml.
For the reference the xml document is somewhat like this:
<root>
<element1>value1</element1>
<element2>value2</element2>
<element3>value3</element3>
<element4>..some more levels of data</element4>
</root>
The style sheet looks somewhat like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:include href="errorResponse.xsl"/>
<xsl:template match="root/element4">
<xsl:element name="myRoot">
<xsl:element name="myElement">
<xsl:apply-templates select="./someElement/someOtherElement"/>
</xsl:element>
</xsl:element>
</xsl:template>
The output xml string which I am getting is like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
value1
value2
value3
<myRoot><myElement> some data </myElement></myRoot>
The code snippet which I am using for transformation is this:
InputStream styleSheet = new FileUtil().getFileStream("xsltFileName");
StreamSource xslStream = new StreamSource(styleSheet);
DOMSource in = new DOMSource(inputXMLDoc);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
TransformerFactory transFact = TransformerFactory.newInstance();
transFact.setURIResolver(new XsltURIResolver());
Transformer trans = transFact.newTransformer(xslStream);
trans.transform(in, new StreamResult(baos));
System.out.println(baos.toString()); // displays the above output
However, the output is in an undesired format: I don't want value1, value2, value3. They also cause problems when the newly generated XML is processed further.
I have seen a lot of questions about transformations, and this has been bugging me for a long time. I would appreciate it a lot if someone could point out where I am going wrong.
Also point out if I am following any incorrect conventions during the entire process.
Thanks and regards.
You are getting that output because of the Default Template Rule, which outputs the text nodes. If you don't want those nodes you need to exclude them explicitly by matching them and replacing them with nothing (i.e. an empty template).
Try adding this template to your stylesheet:
<xsl:template match="/">
<xsl:apply-templates select="root/element4"/>
</xsl:template>
It matches the root and discards everything except for root/element4.
What happens here is that the XSLT built-in templates are applied to any node not matched explicitly by a template. The net effect of the built-in templates is to copy any text node (on which they are applied) to the output.
One of the simplest and shortest ways to suppress this unwanted output is to add the following template:
<xsl:template match="text()"/>
which causes any text-node for which this template is selected for execution, not to be copied to the output.
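The effect of the empty text() template can be tried with a short Java program. This is a sketch with a cut-down stylesheet and input modelled on the question, not the asker's real files; the class name and the item element are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo: runs the same stylesheet with and without <xsl:template match="text()"/>.
final class BuiltInTemplateDemo {

    static final String XML =
        "<root><element1>value1</element1><element4><item>data</item></element4></root>";

    static String stylesheet(boolean suppressText) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output omit-xml-declaration='yes'/>"
             + "<xsl:template match='root/element4'>"
             + "<myRoot><xsl:value-of select='item'/></myRoot>"
             + "</xsl:template>"
             // The empty template overrides the built-in rule that copies text nodes.
             + (suppressText ? "<xsl:template match='text()'/>" : "")
             + "</xsl:stylesheet>";
    }

    static String run(boolean suppressText) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet(suppressText))));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without text() template: " + run(false));
        System.out.println("with text() template:    " + run(true));
    }
}
```

Note that xsl:value-of inside the matched template is unaffected, because it reads the string value directly rather than applying templates to the text node.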
