How to get a non-XML output using JDOM XSLTransformer? - java

I have an XML file which I'd like to transform into a non-XML (text) file based on an XSLT file. Both files seem correct, and it works when I test manually, but I'm having a problem doing this programmatically.
I'm using JDOM's XSLTransformer class to apply the XSLT to the XML and it returns it in the format of a JDOM Document. The problem here is that I can't seem to access anything in the Document as it is not a proper XML file and I get a "java.lang.IllegalStateException: Root element not set" error.
Is there a better way within Java to obtain a non-XML file as a result of XSLT?

JDOM's XSLTransformer is a convenience wrapper around javax.xml.transform.Transformer for JDOM input and output.
A JDOM input is easily transformed to text output.
Transformer transformer = TransformerFactory.newInstance().newTransformer(new StreamSource(stylesheet));
JDOMSource in = new JDOMSource(doc);
StringWriter writer = new StringWriter();
StreamResult out = new StreamResult(writer);
transformer.transform(in, out);
return writer.toString();
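The key point is that the output side can be a plain StreamResult, so nothing requires the result to be a well-formed JDOM Document. Below is a minimal JDK-only sketch of the same idea. It swaps JDOMSource for a StreamSource so it runs without JDOM on the classpath, and the class name, stylesheet and input document are hypothetical. The `<xsl:output method="text"/>` declaration is what tells the processor to emit plain text rather than markup.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TextOutputDemo {

    // Hypothetical stylesheet: writes each <item> value on its own line as plain text.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:for-each select=\"items/item\">"
      + "<xsl:value-of select=\".\"/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet and returns the text it produced; no DOM/JDOM result is built.
    static String transformToText(String xml, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter writer = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<items><item>a</item><item>b</item></items>";
        // Prints "a" and "b" on separate lines
        System.out.print(transformToText(xml, XSLT));
    }
}
```

With a JDOM Document as input, only the source changes (a JDOMSource instead of the StreamSource); the StreamResult side stays the same.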

Related

Save API Response as XML file

I'm working on a test project using REST Assured in Java: I use a GET API to retrieve XML content (the content type is "application/atom+xml"), then update this XML and push it using a POST API.
My idea is to save the response body of the GET API in an XML file -> update it -> post it -> delete the XML file.
I'm blocked at the first step, saving the response in an XML file; I tried many methods but couldn't make it work.
Is there a solution for this, or any other ideas?
I tried saving the response as a String and then converting it to XML, but the XML content is too big, so I get many errors, and fixing them would change the format/content.
I also tried using XmlPath to parse the content I want to change without saving it to a file, but couldn't make it work because I save the response to a string and then parse it.
The issue was with the XML format: I had to use a Transformer to pretty-print it and save it as XML:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes" );
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
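Writer out = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(out));
// Write with an explicit UTF-8 charset and close the writer even if writing fails
try (Writer fw = Files.newBufferedWriter(Paths.get("my-file.xml"), StandardCharsets.UTF_8)) {
    fw.write(out.toString());
}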
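If the response body is already available as a String (for example via REST Assured's `asString()`), one way to get it into a file is to parse it into a DOM and let the identity Transformer write it out, so the serializer handles encoding and escaping. A minimal sketch; the class and method names and the sample Atom snippet are hypothetical.

```java
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SaveResponseXml {

    // Parses an XML string and writes it, indented and UTF-8 encoded, to the given file.
    static void saveAsXml(String body, Path target) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // Atom feeds use a default namespace
        Document document = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(body)));

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        // The serializer encodes the bytes itself; try-with-resources closes the file
        try (OutputStream os = Files.newOutputStream(target)) {
            transformer.transform(new DOMSource(document), new StreamResult(os));
        }
    }

    public static void main(String[] args) throws Exception {
        String body = "<feed xmlns=\"http://www.w3.org/2005/Atom\"><entry><title>t</title></entry></feed>";
        Path file = Files.createTempFile("response", ".xml");
        saveAsXml(body, file);
        System.out.println(Files.readString(file));
        Files.delete(file); // the "delete the XML file" step of the workflow
    }
}
```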

Pretty-print and incomplete XML

We have a logging system where we log payloads on demand, for troubleshooting and in non-prod environments. However, due to a column size constraint, we truncate the XML if it is longer than 5000 characters.
The XML is not pretty-print formatted and is a continuous string.
When the XML is truncated, it is hard to format it so that the data is easy to check. Usually I use Java's DocumentBuilderFactory to format complete XML, but that fails on incomplete XML.
I would like to have a solution that can format an incomplete XML instead of throwing an error.
Following the approach Michael Kay outlined in the answer I linked to in a comment (use an identity Transformer with indentation over a StreamSource and catch any parse exception), the code looks like this:
String xml = "<root><section><p>Paragraph 1.</p><p>Paragraph 2."; //"<root><section><p>Paragraph 1.</p><p>Paragraph 2.</p></section></root>";
Transformer identityTransformer = TransformerFactory.newInstance().newTransformer();
identityTransformer.setOutputProperty("indent", "yes");
StringWriter resultWriter = new StringWriter();
StreamResult resultStream = new StreamResult(resultWriter);
try {
    identityTransformer.transform(new StreamSource(new StringReader(xml)), resultStream);
}
catch (TransformerException e) {
    System.out.println(e.getMessageAndLocation());
    System.out.println(resultWriter.toString());
}
and then, at least for that example, it gets as far as the last p element:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<section>
<p>Paragraph 1.</p>
<p
So some information at the end is lost, but up to that incomplete element the code at least breaks the long one-line input into several lines.
Note: I used Saxon 10 HE as the default Transformer, if you use the JRE's one or Xalan you will need to set identityTransformer.setOutputProperty("{http://xml.apache.org/xalan}indent-amount", "2"); as otherwise you get line breaks but no indentation.
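A self-contained variant of the code above for the JRE's built-in Transformer, including the extra indent-amount property the note mentions. It returns whatever output had been written before the parse error instead of printing it inside the catch block; the class and method names are hypothetical, and how much of the truncated tail survives depends on the serializer's buffering.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class LenientPrettyPrint {

    // Indents as much of the input as the parser accepts; on a parse error,
    // returns whatever the serializer had already written instead of throwing.
    static String format(String xml) {
        StringWriter resultWriter = new StringWriter();
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.INDENT, "yes");
            // Per the note above, the JRE's/Xalan's serializer needs this to actually indent
            identity.setOutputProperty("{http://xml.apache.org/xalan}indent-amount", "2");
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(resultWriter));
        } catch (TransformerException e) {
            System.err.println(e.getMessageAndLocation());
        }
        return resultWriter.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("<root><section><p>Paragraph 1.</p><p>Paragraph 2."));
    }
}
```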

Disable automatic ampersand escaping in XML?

Consider:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.newDocument();
Element root = doc.createElement("list");
doc.appendChild(root);
for (CorrectionEntry correction : dictionary) {
    Element elem = doc.createElement("elem");
    elem.setAttribute("from", correction.getEscapedFrom());
    elem.setAttribute("to", correction.getEscapedTo());
    root.appendChild(elem);
}
(then follows the writing of the document into an XML file)
where getEscapedFrom and getEscapedTo return (in my code) something like fink&#xE9; if the originating word is finké, so as to perform a Unicode escape for characters above 127.
The problem is that the final XML has the following line <elem from="finke" to="fink&amp;#xE9;" /> (from is finke, to is finké) where I would like it to be <elem from="finke" to="fink&#xE9;" />.
I've tried, following another response in StackOverflow, to disable escaping of ampersands putting the line doc.appendChild(doc.createProcessingInstruction(StreamResult.PI_DISABLE_OUTPUT_ESCAPING, "&")); after the creation of the doc but without success.
How could I "tell XML" not to escape ampersands? Or, conversely, how could I get "XML" to convert é, or \u00E9, to &#xE9;?
Update
I managed to narrow down the problem: up until the file is written, the node (checked in the debugger) seems to contain the right string. Once I call transformer.transform(domSource, streamResult);, everything goes wild.
DOMSource domSource = new DOMSource(doc);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
StreamResult streamResult = new StreamResult(baos);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(domSource, streamResult);
System.out.println(baos.toString());
The problem seems to be the transformer.
Try setting setOutputProperty("encoding", "us-ascii") on the transformer. That tells the serializer to produce the output using ASCII characters only, which means any non-ASCII character will be escaped. But you can't control whether it will be a decimal or hex escape (unless you use Saxon-PE or higher as your Transformer, in which case there's a serialization option to control this).
It's never a good idea to try to do the serialization "by hand". For at least three reasons: (a) you'll get it wrong (we see a lot of SO questions caused by people producing bad XML this way), (b) you should be working with the tools, not against them, (c) the people who wrote the serializers understand XML better than you do, and they know what's expected of them. You're probably working to requirements written by someone whose understanding of XML is very superficial.
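A minimal sketch of the answer's suggestion, assuming the JDK's built-in Transformer: the attribute values are set as plain, unescaped strings, and the output encoding is US-ASCII, so the serializer itself turns é into a character reference. The class and method names are hypothetical; as the answer warns, the serializer decides on its own whether to use a decimal or hex reference.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AsciiEscapeDemo {

    // Builds <list><elem from=".." to=".."/></list> from unescaped strings and
    // serializes it as US-ASCII, so non-ASCII characters are escaped by the serializer.
    static String serialize(String from, String to) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("list");
        doc.appendChild(root);
        Element elem = doc.createElement("elem");
        elem.setAttribute("from", from); // plain text, no hand-made escapes
        elem.setAttribute("to", to);
        root.appendChild(elem);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(baos));
        return baos.toString(StandardCharsets.US_ASCII.name());
    }

    public static void main(String[] args) throws Exception {
        // Expect a character reference for é in the "to" attribute
        System.out.println(serialize("finke", "fink\u00e9"));
    }
}
```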

Error writing XML Document to file in Java

I am trying to write org.w3c.dom.Document to a file. I get the Document from
String URL = "http://....";
DOMParser parser = new DOMParser();
Document doc = null;
try {
    parser.parse(new InputSource(URL));
    doc = parser.getDocument();
} catch (SAXException | IOException e) {
    e.printStackTrace();
}
Then I write this Document to a file using
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File(file));
transformer.transform(source, result);
While doing this I keep getting the following error
ERROR: 'Namespace for prefix 'xlink' has not been declared.'
What might be wrong? Thanks
I recommend using a different library such as Dom4J rather than trying to fight your way through the built-in XML API in Java. Dom4J is better designed and makes your code much more readable:
Document doc = new SAXReader().read(inputStream);
new XMLWriter(outputStream).write(doc);
None of this mucking around with FactoryFactoryFactoryFactories.
I know this doesn't directly answer your question but hopefully it will help anyway. Dom4j knows how to talk to the Java XML API so you can mix and match them to suit your needs. You can even plug it into Xalan or something similar if you want to use XSLT.
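For comparison, here is the same read-then-write round trip using only the JDK's built-in API. Enabling namespace awareness (JAXP's DocumentBuilderFactory defaults to false) means prefixed attributes such as xlink:href are bound to their declared namespace when parsing; whether that resolves the asker's error is an assumption, since the question does not show the source document. The class name and sample SVG are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RoundTripDemo {

    // Parses XML with namespace awareness enabled and serializes it back to a string.
    static String roundTrip(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // JAXP's default is false
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        StringWriter out = new StringWriter();
        // To write to a file instead, pass new StreamResult(new File(...))
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String svg = "<svg xmlns=\"http://www.w3.org/2000/svg\""
            + " xmlns:xlink=\"http://www.w3.org/1999/xlink\"><use xlink:href=\"#a\"/></svg>";
        System.out.println(roundTrip(svg));
    }
}
```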

How to Preserve the Input's Declared Encoding in the Output of javax.xml.transform.Transformer.transform? (e.g. avoid UTF-16 changing to UTF-8)

Assuming this input XML
<?xml version="1.0" encoding="UTF-16"?>
<test></test>
Running these lines of code:
StreamSource source = new StreamSource(new StringReader(/* the above XML*/));
StringWriter stringWriter = new StringWriter();
StreamResult streamResult = new StreamResult(stringWriter);
TransformerFactory.newInstance().newTransformer().transform(source, streamResult);
return stringWriter.getBuffer().toString();
produces this XML for me:
<?xml version="1.0" encoding="UTF-8"?>
<test></test>
(the declared encoding of UTF-16 is converted to the default UTF-8)
I know I can explicitly ask for UTF-16 output
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
But the question is, how to make the output encoding automatically be the same as the input?
To do this, you'll have to use something more sophisticated than a StreamSource. For example, a StAXSource takes an XMLStreamReader, which has a getCharacterEncodingScheme() method that tells you which encoding the input document declared; you can then set that as the output encoding.
try this:
// Create an XML Stream Reader
XMLStreamReader xmlSR = XMLInputFactory.newInstance()
.createXMLStreamReader(new StringReader(/* the above XML*/));
// Wrap the XML Stream Reader in a StAXSource
StAXSource source = new StAXSource(xmlSR);
// Create a String Writer
StringWriter stringWriter = new StringWriter();
// Create a Stream Result
StreamResult streamResult = new StreamResult(stringWriter);
// Create a transformer
Transformer transformer = TransformerFactory.newInstance().newTransformer();
// Set STANDALONE based on the source stream
transformer.setOutputProperty(OutputKeys.STANDALONE,
xmlSR.isStandalone() ? "yes" : "no");
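// Set ENCODING and VERSION based on the source stream
// (both return null when the input has no XML declaration, so only set them when present)
String encoding = xmlSR.getCharacterEncodingScheme();
if (encoding != null) {
    transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
}
String version = xmlSR.getVersion();
if (version != null) {
    transformer.setOutputProperty(OutputKeys.VERSION, version);
}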
// Transform the source stream to the out stream
transformer.transform(source, streamResult);
// Print the results
return stringWriter.getBuffer().toString();
You need to peek into the stream first.
Appendix F of the XML specification (Autodetection of Character Encodings) gives you an idea how to auto-detect the encoding.
The XSLT processor doesn't actually know what the input encoding is (the XML parser doesn't tell it, because it doesn't need to know). You can set the output encoding using xsl:output, but to make this the same as the input encoding you're going to have to discover the input encoding first, for example by peeking at the source file before parsing it.
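Following the "peek first" suggestion, here is a minimal sketch for the case where the document is already in memory as a String. The helper names are hypothetical, and the regex only reads an explicit encoding declaration at the start of the text; it does not implement the byte-level auto-detection of Appendix F (BOMs, UTF-16 byte patterns), which matters when you start from raw bytes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PeekEncoding {

    // Matches encoding="..." (or '...') inside an XML declaration at the very start of the text.
    private static final Pattern DECL_ENCODING = Pattern.compile(
        "\\A\\uFEFF?<\\?xml[^>]*?\\sencoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    // Returns the declared encoding, or null if the text has no encoding declaration.
    static String declaredEncoding(String xml) {
        Matcher m = DECL_ENCODING.matcher(xml);
        return m.find() ? m.group(2) : null;
    }

    // Identity-transforms the text, reusing the declared encoding for the output if there is one.
    static String copyKeepingEncoding(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        String encoding = declaredEncoding(xml);
        if (encoding != null) {
            transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
        }
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(copyKeepingEncoding("<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n<test></test>"));
    }
}
```").contains("encoding=\"UTF-16\"");
</test>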
