Unusual output for XSL transformations - java

I have an XML document and a stylesheet that converts it into another, more useful XML document.
For reference, the XML document looks roughly like this:
<root>
<element1>value1</element1>
<element2>value2</element2>
<element3>value3</element3>
<element4>..some more levels of data</element4>
</root>
The style sheet looks somewhat like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:include href="errorResponse.xsl"/>
<xsl:template match="root/element4">
<xsl:element name="myRoot">
<xsl:element name="myElement">
<xsl:apply-templates select="./someElement/someOtherElement"/>
</xsl:element>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
The output xml string which I am getting is like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
value1
value2
value3
<myRoot><myElement> some data </myElement></myRoot>
The code snippet which I am using for transformation is this:
InputStream styleSheet = new FileUtil().getFileStream("xsltFileName");
StreamSource xslStream = new StreamSource(styleSheet);
DOMSource in = new DOMSource(inputXMLDoc);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
TransformerFactory transFact = TransformerFactory.newInstance();
transFact.setURIResolver(new XsltURIResolver());
Transformer trans = transFact.newTransformer(xslStream);
trans.transform(in, new StreamResult(baos));
System.out.println(baos.toString()); // displays the above output
However, the output is not in the format I want: I don't want value1, value2 and value3 in it. They also cause problems when the generated XML is processed further.
I have seen a lot of questions about transformations, but this has been bugging me for a long time. I would really appreciate it if someone could point out where I am going wrong.
Please also point out if I am following any incorrect conventions anywhere in the process.
Thanks and regards.

You are getting that output because of XSLT's built-in (default) template rules, which copy text nodes to the output. If you don't want those nodes, you need to exclude them explicitly: either match them and replace them with nothing (i.e. an empty template), or make sure templates are never applied to them.
Try adding this template to your stylesheet:
<xsl:template match="/">
<xsl:apply-templates select="root/element4"/>
</xsl:template>
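It matches the document root and applies templates only to root/element4, so element1, element2 and element3 are never processed and their text never reaches the output.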
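To see the effect, here is a minimal sketch that runs the question's element4 template together with the root template through the JDK's default TransformerFactory. The input document and the someElement/someOtherElement nesting inside element4 are assumptions, and the myRoot/myElement output is written with literal result elements instead of xsl:element, which produces the same result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RootTemplateDemo {

    // Assumed input shaped like the question's document; the nesting inside
    // element4 is made up for this example.
    public static final String XML =
        "<root><element1>value1</element1><element2>value2</element2>"
        + "<element3>value3</element3><element4><someElement>"
        + "<someOtherElement>some data</someOtherElement>"
        + "</someElement></element4></root>";

    // The answer's template for "/" plus the question's element4 template.
    public static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/'>"
        + "<xsl:apply-templates select='root/element4'/>"
        + "</xsl:template>"
        + "<xsl:template match='root/element4'>"
        + "<myRoot><myElement>"
        + "<xsl:apply-templates select='./someElement/someOtherElement'/>"
        + "</myElement></myRoot>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static String transform(String xsl, String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // element1..element3 are never visited, so value1..value3 do not appear.
        System.out.println(transform(XSL, XML));
    }
}
```

Without the match="/" template, the built-in rules also walk through element1 to element3 and copy value1, value2 and value3 into the output, which is exactly what the question shows.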

What happens here is that the XSLT built-in templates are applied to any node not matched explicitly by a template. The net effect of the built-in templates is to copy every text node they are applied to into the output.
One of the simplest and shortest ways to suppress this unwanted output is to add the following template:
<xsl:template match="text()"/>
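This causes any text node for which this template is selected not to be copied to the output. Be aware that it also silences text reached through your own xsl:apply-templates calls (for example the one on ./someElement/someOtherElement), so any text you do want must then be output explicitly, e.g. with xsl:value-of.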
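A small sketch of the empty text() template in action, again using the JDK's default TransformerFactory and an assumed input shaped like the question's. It compares two versions of the element4 template: one that keeps the question's xsl:apply-templates (whose text is now swallowed by the empty template) and one that outputs the text explicitly with xsl:value-of.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class EmptyTextTemplateDemo {

    // Assumed input, shaped like the question's document.
    public static final String XML =
        "<root><element1>value1</element1><element2>value2</element2>"
        + "<element3>value3</element3><element4><someElement>"
        + "<someOtherElement>some data</someOtherElement>"
        + "</someElement></element4></root>";

    // Stylesheet with the empty text() template and the given element4 body.
    static String stylesheet(String element4Body) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:template match='text()'/>"
            + "<xsl:template match='root/element4'>"
            + "<myRoot><myElement>" + element4Body + "</myElement></myRoot>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
    }

    // The question's body: the text under someOtherElement is suppressed too.
    public static final String SILENCED =
        stylesheet("<xsl:apply-templates select='./someElement/someOtherElement'/>");

    // Outputs the wanted text directly, so the empty template never sees it.
    public static final String EXPLICIT =
        stylesheet("<xsl:value-of select='someElement/someOtherElement'/>");

    public static String transform(String xsl) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(SILENCED)); // no value1..3, but no "some data" either
        System.out.println(transform(EXPLICIT)); // only the element4 content
    }
}
```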

Related

Restrict element creation in XSLT if value is empty

I want to create a new element in the target XML if and only if the corresponding element value in the source XML is not empty. I can do this with the code below, but I have around 5,000 fields that need the same condition. Is there a better way to handle this?
<xsl:if test="edi:po-num"> <!-- I want to avoid repeating this for each element -->
<xsl:element name="element">
<xsl:attribute name="name">order_reference_number</xsl:attribute>
<xsl:value-of select="edi:po-num"/>
</xsl:element>
</xsl:if>
java code to transform:
Transformer trans = StylesheetCache.newTransformer(xslFilePath);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
trans.transform(source, new StreamResult(outputStream));
Your options in XSLT 1.0 are limited - XSLT 1.0 code tends to be verbose. But if it's really repetitive, then you could consider writing a meta-stylesheet - an XSLT stylesheet that generates your stylesheet from some higher-level description of what it needs to do.
Note also, your code will be a lot less verbose if you use literal result elements and attribute value templates rather than xsl:element and xsl:attribute.
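As a rough illustration of that advice, the sketch below uses literal result elements and an attribute value template, and also moves the emptiness test into a single select, so each field needs at most one short template instead of an xsl:if. The namespace URI urn:example:edi and the field names other than po-num are hypothetical, whitespace-only values are assumed to count as empty, and the generic rule simply derives the target name from the source name; a real mapping may need explicit templates for most fields. It runs on the JDK's built-in XSLT 1.0 processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FieldMappingDemo {

    // Hypothetical source document: namespace URI and field names are made up.
    public static final String XML =
        "<order xmlns='urn:example:edi'>"
        + "<po-num>PO-123</po-num><ship-date/><carrier-code>UPSN</carrier-code>"
        + "</order>";

    public static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:edi='urn:example:edi' exclude-result-prefixes='edi'>"
        // One emptiness test for every field: blank elements are never selected.
        + "<xsl:template match='/edi:order'>"
        + "<fields><xsl:apply-templates select='edi:*[normalize-space()]'/></fields>"
        + "</xsl:template>"
        // Generic rule (default priority -0.25); name is an attribute value template.
        + "<xsl:template match='edi:*'>"
        + "<element name='{translate(local-name(), \"-\", \"_\")}'>"
        + "<xsl:value-of select='.'/></element>"
        + "</xsl:template>"
        // Field-specific rule (default priority 0) wins over the generic one.
        + "<xsl:template match='edi:po-num'>"
        + "<element name='order_reference_number'><xsl:value-of select='.'/></element>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static String transform() throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // ship-date is empty, so no element is created for it.
        System.out.println(transform());
    }
}
```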

XSLT in Java: CDATA section split

I want to replace some items in a huge XML file, and I thought I'd do it with XSLT. I have absolutely no experience with it, so if you think there are better ways to do this, please tell me.
Anyway, as a first step I just wanted to copy the whole XML over. This is my xsl file:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="no" cdata-section-elements="script"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
The relevant Java code is:
Source xmlInput = new StreamSource(oldProjectStream);
Source xsl = new StreamSource("test.xsl");
Transformer transformer = TransformerFactory.newInstance().newTransformer(xsl);
StreamResult xmlOutput = new StreamResult("output/project.xml");
transformer.transform(xmlInput, xmlOutput);
Most of the output is fine, also the order of the elements is not changed (this could turn out quite important).
The XML contains some Lua code in CDATA sections. At some (seemingly random) points, however, the CDATA section is closed and reopened. It seems to have something to do with brackets in the code, but it happens only rarely: there are about 5 such points in a 1.4 MB XML file, looking like this:
<script><![CDATA[
...
html_encoding["Otilde" ] = string.char(213)
html_encoding["Ouml" ]]]><![CDATA[ = string.char(214)
html_encoding["Oslash" ] = string.char(216)
...
]]></script>
In the original file, the middle line looks just like the other ones. There are thousands of lines where I've put the dots. What's going on here?
The (proprietary) application that should handle the XML isn't able to load it.
It's useful to tell us which XSLT processor you are using.
The serializer has to close and reopen a CDATA section if it encounters "]]>" in the data, because that sequence cannot legally appear in a CDATA section. It shouldn't need to do so under any other circumstances, though the spec probably doesn't disallow it.
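A minimal sketch of the legitimate case, using the question's identity stylesheet with cdata-section-elements="script" on the JDK's default processor: the (hypothetical) Lua line contains ]]>, which the serializer has to split across two CDATA sections. Re-parsing the output shows that the text content is unchanged, so a conforming XML parser sees no difference; only a consumer that looks at the raw CDATA boundaries would notice.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataSplitDemo {

    // The question's identity stylesheet, with script content output as CDATA.
    public static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output omit-xml-declaration='yes' indent='no' cdata-section-elements='script'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical Lua line containing "]]>" (written as ]]&gt; in the XML input).
    public static final String LUA = "if t[i[1]]>0 then print(1) end";
    public static final String XML =
        "<project><script>if t[i[1]]&gt;0 then print(1) end</script></project>";

    public static String transform() throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
        return out.toString();
    }

    // Re-parses the output; getTextContent() joins the adjacent CDATA sections.
    public static String scriptText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("script").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String out = transform();
        System.out.println(out);                         // CDATA split around "]]>"
        System.out.println(scriptText(out).equals(LUA)); // text itself is unchanged
    }
}
```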

Transforming XML and preserving Unicode characters with XSLT

My XSLT transformations had been working for months until I ran across an XML file with Unicode characters (most likely emoji). I need to preserve the Unicode characters, but XSLT is converting them to HTML entities. I thought that setting the encoding to UTF-8 would solve my problem, but I'm still having issues.
Any help appreciated. Code:
private byte[] transform(InputStream stream) throws Exception{
System.setProperty("javax.xml.transform.TransformerFactory", "org.apache.xalan.processor.TransformerFactoryImpl");
Transformer xmlTransformer;
xmlTransformer = (TransformerImpl) TransformerFactory.newInstance().newTransformer(new StreamSource(createXsltStylesheet()));
xmlTransformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(stream,"UTF-8");
Source staxSource = new StAXSource(reader, true);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Writer writer = new OutputStreamWriter(outputStream, "UTF-8");
xmlTransformer.transform(staxSource, new StreamResult(writer));
return outputStream.toByteArray();
}
If I add
xmlTransformer.setOutputProperty(OutputKeys.METHOD, "text");
the Unicode is preserved but the XML is not.
I just ran across this same issue, and after far too long researching it, here's what I've concluded.
Java XSLT processors escape multi-byte characters outside the Basic Multilingual Plane (such as emoji) into character references, one per UTF-16 surrogate, even if the output method is XML, when those characters occur in a text() node that is not wrapped in CDATA. If the characters are wrapped in CDATA on output, the multi-byte character is preserved.
My Problem:
I had an xml file that looked like this, complete with emoji.
<events>
<event>
<id>RANDOMID</id>
<blah>
<blahId>FOOONE</blahId>
</blah>
<blah>
<blahId>FOOTWO</blahId>
</blah>
<eventComment>Did some things. Had some Fun. 👍</eventComment>
</event>
</events>
I started with an XSL stylesheet that looked like this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/TR/xhtml1/strict"
>
<xsl:output method = "xml" version="1.0" encoding = "UTF-8" omit-xml-declaration="no" indent="yes" />
<xsl:template match="/">
<events>
<xsl:for-each select="/events/event">
<event>
<xsl:copy-of select="./*[name() != 'blah']"/>
<xsl:for-each select="./blah">
<blahId><xsl:copy-of select="./blahId/text()"/></blahId>
</xsl:for-each>
</event>
</xsl:for-each>
</events>
</xsl:template>
</xsl:stylesheet>
Running this with a Java Transformer consistently produced &#55357;&#56397; where my emoji should be. Subsequent attempts to parse the resulting Document failed with the following exception message:
org.xml.sax.SAXParseException; lineNumber: y; columnNumber: x; Character reference "&#55357" is an invalid XML character.
HOGWASH!
Testing this with xsltproc on the command line was useless, since xsltproc isn't stupid when it comes to multibyte characters. I got the output I expected.
A SOLUTION
Having the XSLT wrap eventComment in CDATA, by listing its QName in the cdata-section-elements attribute of the xsl:output element, preserves the bytes and works with both xsltproc and the Java Transformer.
The magic here is the cdata-section-elements output property of the <xsl:output> element: https://www.w3.org/TR/xslt#output
I updated my XSL template to be:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/TR/xhtml1/strict"
>
<xsl:output cdata-section-elements="eventComment" method="xml" version="1.0" encoding="UTF-8" omit-xml-declaration="no" indent="yes"/>
<xsl:template match="/">
<events>
<xsl:for-each select="/events/event">
<event>
<xsl:copy-of select="./*[name() != 'blah' and name() != 'eventComment']"/>
<!-- For the cdata-section-elements to resolve that eventComment needs to be preserved as CDATA
(so we don't get Java doing stupid things with Unicode escaping)
it needs to be explicitly referenced here.
-->
<eventComment><xsl:copy-of select="./eventComment/text()"/></eventComment>
<xsl:for-each select="./blah">
<blahId><xsl:copy-of select="./blahId/text()"/></blahId>
</xsl:for-each>
</event>
</xsl:for-each>
</events>
</xsl:template>
</xsl:stylesheet>
And now my output from both xsltproc and a java Transformer looks like this, and parses happily with java DocumentBuilders.
<?xml version="1.0" encoding="UTF-8"?>
<events xmlns="http://www.w3.org/TR/xhtml1/strict">
<event>
<id xmlns="">RANDOMID</id>
<eventComment><![CDATA[Did some things. Had some Fun. 👍]]></eventComment>
<blahId>FOOONE</blahId>
<blahId>FOOTWO</blahId>
</event>
</events>
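A small round-trip check of this approach, written as a sketch: it runs an identity stylesheet with cdata-section-elements="eventComment" and encoding="UTF-8" through the JDK's default TransformerFactory, then parses the bytes back. The single-element input is an assumption, and how the default processor escapes non-BMP characters outside CDATA has varied between JDK versions, so the check only asserts that the comment text survives the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class EmojiCdataDemo {

    // Comment text from the answer; \uD83D\uDC4D is the thumbs-up emoji U+1F44D.
    public static final String COMMENT = "Did some things. Had some Fun. \uD83D\uDC4D";

    // Identity stylesheet that serializes eventComment as CDATA in UTF-8.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' encoding='UTF-8' cdata-section-elements='eventComment'/>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static byte[] transform() throws Exception {
        String xml = "<events><event><eventComment>" + COMMENT
            + "</eventComment></event></events>";
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Writing to a byte stream lets the serializer apply encoding='UTF-8' itself.
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toByteArray();
    }

    // Parses the bytes back and returns the eventComment text.
    public static String reparse(byte[] bytes) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(bytes))
            .getElementsByTagName("eventComment").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = transform();
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
        System.out.println(reparse(bytes).equals(COMMENT));
    }
}
```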
This line is suspicious:
stream = IOUtils.toInputStream(outputStream.toString(),"UTF-8");
You are converting a ByteArrayOutputStream to a String using the default encoding of your platform, which is probably not UTF-8. Change it to
stream = IOUtils.toInputStream(outputStream.toString("UTF-8"),"UTF-8");
or, for better performance, just wrap the byte array in a ByteArrayInputStream :
return new ByteArrayInputStream(outputStream.toByteArray());
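A minimal sketch of both fixes using only JDK classes (no Commons IO): decode the ByteArrayOutputStream with an explicit charset, or pass its bytes on directly. The sample content is an assumption standing in for the transformer's UTF-8 output.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class Utf8BytesDemo {

    // Decode with an explicit charset instead of the platform default.
    public static String decodeUtf8(ByteArrayOutputStream out) throws UnsupportedEncodingException {
        return out.toString("UTF-8");
    }

    // Or skip the String round trip and hand the bytes on unchanged.
    public static InputStream asInputStream(ByteArrayOutputStream out) {
        return new ByteArrayInputStream(out.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Stands in for the transformer's UTF-8 output (contains emoji U+1F44D).
        byte[] data = "<c>Fun. \uD83D\uDC4D</c>".getBytes(StandardCharsets.UTF_8);
        out.write(data, 0, data.length);
        System.out.println(decodeUtf8(out));
        try (InputStream in = asInputStream(out)) {
            System.out.println(in.available() == data.length); // same bytes, nothing re-encoded
        }
    }
}
```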
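Try converting the XML to a String using the Apache (Xerces) serializer. Note that these org.apache.xml.serialize classes are deprecated in recent Xerces releases in favour of DOM Level 3 LSSerializer or JAXP's Transformer.
import java.io.StringWriter;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
// Serialize the DOM
OutputFormat format = new OutputFormat(doc);
// ... into a String
StringWriter stringOut = new StringWriter();
XMLSerializer serial = new XMLSerializer(stringOut, format);
serial.serialize(doc);
// Display the XML
System.out.println(stringOut.toString());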
I just solved a similar problem by adding the following line to the original XML document:
document.appendChild(document.createProcessingInstruction(StreamResult.PI_DISABLE_OUTPUT_ESCAPING, ""));
See: Writing emoji to XML file in JAVA
Perhaps a similar setting can be used for the transformer.

XSLT: extract the last x digit of a sibling node with xpath expression

I am trying to extract the last 4 digits of the value that follows the "RED" key, using XPath.
The source xml looks like:
...
<node2>
<key><![CDATA[RED]]></key>
<value><![CDATA[98472978241908]]></value>
... more key value pairs here...
</node2>
...
When I use the following XPath:
/nodelevelX/nodelevelY/node2/key[text()='RED']/following-sibling::value
I get the full number in the output. Then I tried to extract the digits with this XPath expression:
/nodelevelX/nodelevelY/node2/key[text()='RED']/following-sibling::value/text()[substring(., string-length(.)-4)]
I still get the full number. The substring function does not seem to work.
my xsl header is:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
I think there is a small error, but I cannot see where. I followed many discussions on SO and elsewhere (w3schools) and tried to follow the advice, without success.
UPDATE: The context:
I use the following identity template, which copies all the nodes from my source XML to the destination XML,
and then I apply specific rules for some nodes inside another xsl:template:
<!-- This copy the whole source XML in destination -->
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="node() | @*" />
</xsl:copy>
</xsl:template>
<!-- specific rules for some nodes -->
<xsl:template match="/nodeDetails">
<mynewnode>
<!-- here I take the whole value and it's working -->
<someVal><xsl:value-of select="/nodeDetails/nodeX/key[text()='ANOTHER_KEY']/following-sibling::value" /></someVal>
<!-- FIXME substring does not work now -->
<redVal><xsl:value-of select="/nodeDetails/nodeX/key[text()='RED']/following-sibling::value/text()[substring(.,string-length(.)-4)]" /></redVal>
</mynewnode>
</xsl:template>
And for the transformation I use the following code from a junit class in Java (JDK 6):
@Test
public void transformXml() throws TransformerException {
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(getClass().getResourceAsStream("contract.xsl"));
Transformer transformer = factory.newTransformer(xslt);
Source input = new StreamSource(getClass().getResourceAsStream("source.xml"));
Writer output = new StringWriter();
transformer.transform(input, new StreamResult(output));
System.out.println("output=" + output.toString());
}
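Your current XPath evaluates to a node-set, but what you need is a string. The substring(...) inside the square brackets is only a predicate: it filters the text nodes (any non-empty result counts as true) instead of transforming them, so the whole number comes back. Also, to keep the last 4 characters the start position must be string-length - 3, not - 4. Please try something like this: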
<xsl:variable name="value"
select="/nodelevelX/nodelevelY/node2/key[. = 'RED']
/following-sibling::value[1]" />
<xsl:value-of select="substring($value, string-length($value) - 3)" />
Though to be sure about an answer, I'd need to see the portion of your XSLT where you are trying to output this value.
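The same fix can be checked outside XSLT with the JDK's javax.xml.xpath API (XPath 1.0). This sketch uses a cut-down, made-up version of the question's document (the CDATA-wrapped key/value pairs are kept, and the outer nodelevelX/nodelevelY names are taken from the question's path) and evaluates both the original predicate form and the substring() call on the value.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class LastDigitsDemo {

    // Cut-down input modelled on the question (key/value pairs wrapped in CDATA).
    public static final String XML =
        "<nodelevelX><nodelevelY><node2>"
        + "<key><![CDATA[GREEN]]></key><value><![CDATA[0123456789]]></value>"
        + "<key><![CDATA[RED]]></key><value><![CDATA[98472978241908]]></value>"
        + "</node2></nodelevelY></nodelevelX>";

    // The value element that directly follows the RED key.
    static final String RED_VALUE =
        "/nodelevelX/nodelevelY/node2/key[. = 'RED']/following-sibling::value[1]";

    public static String eval(String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // evaluate(String, InputSource) parses the document and returns a string.
        return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
    }

    // The question's form: substring() is only a filter, so the whole text comes back.
    public static String original() throws XPathExpressionException {
        return eval("/nodelevelX/nodelevelY/node2/key[text()='RED']"
            + "/following-sibling::value/text()[substring(., string-length(.)-4)]");
    }

    // The fix: apply substring() to the value's string, starting at length - 3.
    public static String lastFour() throws XPathExpressionException {
        return eval("substring(" + RED_VALUE + ", string-length(" + RED_VALUE + ") - 3)");
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(original()); // 98472978241908
        System.out.println(lastFour()); // 1908
    }
}
```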
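Use this XPath 2.0 expression (it needs an XSLT 2.0 processor such as Saxon; the TransformerFactory built into the JDK only implements XSLT 1.0, even if the stylesheet declares version="2.0"):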
/nodelevelX/nodelevelY/node2/key[text()='RED']
/following-sibling::*[1][self::value]
/substring(., string-length() -3)
XSLT 2.0 - based verification:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select=
"/nodelevelX/nodelevelY/node2/key[text()='RED']
/following-sibling::*[1][self::value]
/substring(., string-length() -3)"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the following XML document:
<nodelevelX>
<nodelevelY>
<node2>
<key>GREEN</key>
<value>0123456789</value>
<key>RED</key>
<value>98472978241908</value>
<key>BLACK</key>
<value>987654321</value>
</node2>
</nodelevelY>
</nodelevelX>
the XPath expression is evaluated and the result of this evaluation is copied to the output:
1908

How to put String text without converting content to xml file in Java?

I need to put String content into an XML file in Java. I use this kind of code to insert information into the XML:
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File ("file.xml"));
DOMSource source = new DOMSource (doc);
Node cards = doc.getElementsByTagName ("cards").item (0);
Element card = doc.createElement ("card");
cards.appendChild(card);
Element question = doc.createElement("question");
question.appendChild(doc.createTextNode("This <b>is</b> a test."));
card.appendChild (question);
StreamResult result = new StreamResult (new File (file));
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.INDENT, "yes");
tf.transform(source, result);
But the string is written to the XML like this:
<cards>
<card>
<question>This &lt;b&gt;is&lt;/b&gt; a test.</question>
</card>
</cards>
It should be like this:
<cards>
<card>
<question>This <b>is</b> a test.</question>
</card>
</cards>
I tried to use a CDATA section instead:
// I changed this code
question.appendChild(doc.createTextNode("This <b>is</b> a test."));
// to this
question.appendChild(doc.createCDATASection("This <b>is</b> a test."));
This code produces an XML file that looks like this:
<cards>
<card>
<question><![CDATA[This <b>is</b> a test.]]></question>
</card>
</cards>
I hope that somebody can help me put the String content into the XML file exactly as it is.
Thanks in advance!
This would be expected behaviour.
Consider what would happen if the brackets were kept as you wrote them: the end result would essentially be:
<cards>
<card>
<question>
This
<b>
is
</b>
a test.
</question>
</card>
</cards>
Basically, the <b> would become an additional node in the XML tree. Encoding the brackets as &lt; and &gt; ensures that any XML parser will display the brackets as text rather than confuse them with an additional node.
If you really wanted them to display as you say you do, you will need to create elements named b. This will not only be awkward, it will also not display quite as you've written above - it would display as additional nested nodes as I've shown above. So you would need to amend the xml writer to output inline for those tags.
Nasty.
Check this solution: how to unescape XML in java
Maybe you could solve it in this way (code only for <question> tag part):
Element question = doc.createElement("question");
question.appendChild(doc.createTextNode("This "));
Element b = doc.createElement("b");
b.appendChild(doc.createTextNode("is"));
question.appendChild(b);
question.appendChild(doc.createTextNode(" a test."));
card.appendChild(question);
What you are effectively trying to do is to insert XML into the middle of a DOM without parsing it. You can't do this since the DOM APIs don't support it.
You have three choices:
You could serialize the DOM and then insert the String at the appropriate point. The end result may or may not be well-formed XML ... depending on what is in the String that you inserted.
You could create and insert DOM nodes representing the text and the <b>...</b> element. This requires you to know the XML structure of the stuff that you are inserting. @bluish's answer gives an example.
You could wrap the String in some container element, parse it using an XML parser to give a second DOM, find the nodes of interest, and add them to the original DOM. This requires that the String is well-formed XML when wrapped in the container element.
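A sketch of the third choice, assuming the String is well-formed once wrapped: a helper parses it inside a temporary wrapper element (the names appendMarkup and wrapper are made up for this example), imports the resulting nodes into the target document with Document.importNode, and then serializes with an identity Transformer as in the question. An in-memory <cards/> document stands in for file.xml.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class InsertFragmentDemo {

    // Hypothetical helper: parses 'markup' inside a temporary wrapper element and
    // appends deep copies of the resulting nodes to 'target'.
    static void appendMarkup(Element target, String markup, DocumentBuilder db) throws Exception {
        Document tmp = db.parse(new InputSource(new StringReader("<wrapper>" + markup + "</wrapper>")));
        Document owner = target.getOwnerDocument();
        for (Node n = tmp.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            // importNode copies n into the target document; the original stays in tmp.
            target.appendChild(owner.importNode(n, true));
        }
    }

    public static String build() throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader("<cards></cards>")));
        Element card = doc.createElement("card");
        doc.getDocumentElement().appendChild(card);
        Element question = doc.createElement("question");
        appendMarkup(question, "This <b>is</b> a test.", db);
        card.appendChild(question);

        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build()); // <b> is now a real element, not escaped text
    }
}
```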
Or, since you're already using a Transformation, why not go all the way?
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="cards">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
<card>
<question>This <b>is</b> a test.</question>
</card>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
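A sketch of running such a stylesheet from Java with the JDK's default TransformerFactory. It embeds a variant in which the cards template keeps the existing cards element and its children (via xsl:copy and xsl:apply-templates) and then appends the new card; the sample input with one existing card is made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CardsXsltDemo {

    public static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output omit-xml-declaration='yes'/>"
        // Identity template: copies everything not matched more specifically.
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        // Keeps cards and its existing children, then appends the new card.
        + "<xsl:template match='cards'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/>"
        + "<card><question>This <b>is</b> a test.</question></card>"
        + "</xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static String addCard(String xml) throws Exception {
        // Parse into a DOM first, as the question does with file.xml.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Transformer tf = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addCard("<cards><card><question>Old question</question></card></cards>"));
    }
}
```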
