LSSerializer vs Transformer for serializing xml to String - java

I have to turn a org.w3c.dom.Document into a java.lang.String. I have found two possible approaches, one using org.w3c.dom.ls.LSSerializer and the other using a javax.xml.transform.Transformer. I have samples of each below.
Can anyone tell me which method is to be preferred?
public String docToStringUsingLSSerializer(org.w3c.dom.Document doc) throws Exception {
    // newInstance() declares several checked exceptions, hence "throws Exception"
    DOMImplementationRegistry reg = DOMImplementationRegistry.newInstance();
    DOMImplementationLS impl = (DOMImplementationLS) reg.getDOMImplementation("LS");
    LSSerializer serializer = impl.createLSSerializer();
    return serializer.writeToString(doc);
}
public String docToStringUsingTransformer(org.w3c.dom.Document doc) throws TransformerException {
    // A Transformer created without a stylesheet copies its input unchanged
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    StringWriter stw = new StringWriter();
    transformer.transform(new DOMSource(doc), new StreamResult(stw));
    return stw.toString();
}

There are several points to consider:
LSSerializer is usually considered faster than Transformer.
Nevertheless, it depends heavily on the implementation. A Transformer based on SAX will have good performance, and there are different implementations (Xalan, Xerces, ...).
It is very easy to check which is better on your system. Design a simple test case with a complex XML document, run it in a loop thousands of times, wrap that in a timing check (System.currentTimeMillis() or System.nanoTime()), and you have your answer.
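The timing check described above could be sketched roughly as follows. This is a minimal illustration, not a rigorous benchmark: the class name SerializerTiming, the tiny sample document, and the iteration counts are assumptions chosen for the example, and a realistic, complex document should be substituted before drawing conclusions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical helper class; all names here are illustrative.
class SerializerTiming {

    // Parses a tiny sample; substitute a large, realistic document for real measurements.
    static Document sampleDocument() {
        try {
            String xml = "<orders><order><stuff>123456</stuff></order></orders>";
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Same approach as docToStringUsingLSSerializer in the question.
    static String withLSSerializer(Document doc) {
        try {
            DOMImplementationRegistry reg = DOMImplementationRegistry.newInstance();
            DOMImplementationLS impl = (DOMImplementationLS) reg.getDOMImplementation("LS");
            LSSerializer serializer = impl.createLSSerializer();
            return serializer.writeToString(doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Same approach as docToStringUsingTransformer in the question.
    static String withTransformer(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs the task repeatedly and returns the elapsed time in milliseconds.
    static long timeMillis(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        // Warm-up rounds so the JIT compiler has a chance to optimize both paths.
        timeMillis(() -> withLSSerializer(doc), 1_000);
        timeMillis(() -> withTransformer(doc), 1_000);
        System.out.println("LSSerializer: " + timeMillis(() -> withLSSerializer(doc), 10_000) + " ms");
        System.out.println("Transformer:  " + timeMillis(() -> withTransformer(doc), 10_000) + " ms");
    }
}
```

Both methods create their factory on every call, as the question's code does, so the numbers include setup cost. If your application reuses the serializer or transformer, move the creation out of the timed loop.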
Other nice answers include:
Is there a more elegant way to convert an XML Document to a String in Java than this code?
https://stackoverflow.com/questions/1137488/ways-of-producing-xml-in-java


Java edit XML file with DOM

I have hit somewhat of a roadblock.
My goal is to filter out everything except the number.
Here is the xml file
<?xml version="1.0" encoding="utf-8" ?>
<orders>
<order>
<stuff>"Some random information and # 123456"</stuff>
</order>
</orders>
Here is my incomplete code. I don't know how to find it nor how to go about making the change I want.
public static void main(String[] argv) {
    try {
        // Read the file
        File inputFile = new File("C:\\filepath...\\asdf.xml");
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(inputFile);
        // I don't know where to go from there
        NodeList filter = doc.getChildNodes();
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        DOMSource source = new DOMSource(doc);
        StreamResult consoleResult = new StreamResult(System.out);
        transformer.transform(source, consoleResult);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
When you use
Transformer transformer = transformerFactory.newTransformer();
the transformer is an "identity transformer" - it copies the input to the output with no change. In effect you're using the identity transformer here for serialization only, to convert the DOM to lexical XML.
If you want to make actual changes to the XML content, you have two choices: either write Java code to modify the in-memory DOM tree before serialising it, or write XSLT code so your Transformer is doing a real transformation not just an identity transformation. XSLT is almost certainly the better approach except that it involves more of a learning curve.
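The first choice, modifying the in-memory DOM before serializing it, might look like the sketch below. It assumes one particular reading of the question: every stuff element (the name comes from the sample file) should keep only its digits. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; assumes "filter out everything except the number" means "keep only digits".
class KeepDigits {

    static String keepOnlyDigits(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Modify the DOM in memory: strip every non-digit character from each <stuff> element.
            NodeList nodes = doc.getElementsByTagName("stuff");
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                node.setTextContent(node.getTextContent().replaceAll("\\D", ""));
            }

            // Serialize the modified tree with the identity transformer.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the real documents can contain several numbers per element, or digits that are not part of the order number, the regular expression would need to be tightened accordingly.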
I'm not sure exactly what output you want, which makes it difficult to give you working code. The phrase "filter out" is unfortunately ambiguous: when people say "I want to filter out X", they sometimes mean they want to remove X, and sometimes they mean they want to remove everything except X. Also, "removing the number" isn't a complete specification unless we know all the possibilities of what might appear in your document. For example, is the number always preceded by "#", or is that only the case in this one example input? But one approach would be to remove all digits, which you could do with a call on translate(., '0123456789', '').
Note that if you're using XSLT you don't need to construct a DOM first. In fact, it's a waste of time and space. Just supply the lexical XML as input to the transformer, in the form of a StreamSource.
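As a rough sketch of the XSLT route, the following feeds the lexical XML to the transformer as a StreamSource and applies translate(., '0123456789', '') to the text of each stuff element. The stylesheet (an identity template plus one override), the class name, and the method name are assumptions made for illustration, and the sketch implements the "remove all digits" reading mentioned above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper; the stylesheet below is one possible way to apply translate().
class RemoveDigitsXslt {

    // Identity template copies everything; the second template rewrites text inside <stuff>.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output omit-xml-declaration=\"yes\"/>"
          + "<xsl:template match=\"@*|node()\">"
          + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
          + "</xsl:template>"
          + "<xsl:template match=\"stuff/text()\">"
          + "<xsl:value-of select=\"translate(., '0123456789', '')\"/>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    static String removeDigits(String xml) {
        try {
            // No DOM is built here: both stylesheet and input are supplied as StreamSource.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For a file on disk, new StreamSource(new File(...)) can replace the StringReader-based source.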

Usage of compiled XSL transformations

I am producing compiled .class files (translets) from XSL transformation files, using the TransformerFactory implemented by org.apache.xalan.xsltc.trax.TransformerFactoryImpl.
Unfortunately, despite hours of searching, I couldn't find out how to use these translet classes for an XML transformation.
Can you give a code example or point to reference documentation? This document is insufficient and complicated.
Thanks.
A standard transformation in XSLT looks like this:
public void translate(InputStream xmlStream, InputStream styleStream, OutputStream resultStream) throws TransformerException {
    Source source = new StreamSource(xmlStream);
    Source style = new StreamSource(styleStream);
    Result result = new StreamResult(resultStream);
    TransformerFactory tFactory = TransformerFactory.newInstance();
    Transformer t = tFactory.newTransformer(style);
    t.transform(source, result);
}
So if you don't use a TransformerFactory but a ready-made Java class (which is an additional maintenance headache and doesn't give you much better performance, since you can keep your Transformer object after the initial compilation), the same function would look like this:
public void translate(InputStream xmlStream, OutputStream resultStream) {
    Source source = new StreamSource(xmlStream);
    Result result = new StreamResult(resultStream);
    Translet t = new YourTransletClass();
    t.transform(source, result);
}
Your search missed one step: typing the interface specification into Google, where the third link shows the interface definition, which has the same call signature as Transformer. So you can swap a Transformer object for your custom object (or keep your Transformer objects in memory for reuse).
Hope that helps

Java XML Transformer : Empty elements in long notation instead of short

I have created a conversion tool to add some information to an existing XML file.
This is done by using DOM and the Transformer class.
The output file will be processed by third-party software.
This third-party software needs the empty tags in both the input and the output file in long notation (<tag></tag>).
Unfortunately, the Transformer class always changes them to short notation (<tag/>).
Is there a way to prevent this from happening?
I have been searching various sites, but haven't found a solution that really fits my needs.
Please help,
Thanks,
Kind regards,
Maarten
You can transform the DOM to a StAXResult.
For instance,
XMLOutputFactory factory = XMLOutputFactory.newFactory();
XMLStreamWriter writer = factory.createXMLStreamWriter(System.out);
StAXResult result = new StAXResult(writer);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.transform(new DOMSource(doc), result);

Error writing XML Document to file in Java

I am trying to write an org.w3c.dom.Document to a file. I get the Document from
String URL = "http://....";
DOMParser parser = new DOMParser();
Document doc = null;
try {
    parser.parse(new InputSource(URL));
    doc = parser.getDocument();
} catch (SAXException | IOException e) {
    e.printStackTrace();
}
Then I write this Document to a file using
TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new File(file));
transformer.transform(source, result);
While doing this I keep getting the following error
ERROR: 'Namespace for prefix 'xlink' has not been declared.'
What might be wrong? Thanks
I recommend using a different library such as Dom4J rather than trying to fight your way through the built-in XML API in Java. Dom4J is better designed and makes your code much more readable:
Document doc = new SAXReader().read(inputStream);
new XMLWriter(outputStream).write(doc);
None of this mucking around with FactoryFactoryFactoryFactories.
I know this doesn't directly answer your question but hopefully it will help anyway. Dom4j knows how to talk to the Java XML API so you can mix and match them to suit your needs. You can even plug it into Xalan or something similar if you want to use XSLT.

Java: need help with optimizing a part of code

I have some simple code for transforming XML, but it is very time-consuming (I have to repeat it many times). Does anyone have a recommendation for how to optimize this code? Thanks.
EDIT: This is a new version of the code. Unfortunately, I can't reuse the Transformer, since XSLTRule is different in most cases. I'm now reusing the TransformerFactory. I'm not reading from files before this, so I can't use a file-based StreamSource. The largest amount of time is spent on initialization of the Transformer.
private static TransformerFactory tFactory = TransformerFactory.newInstance();
public static String transform(String XML, String XSLTRule) throws TransformerException {
    Source xmlInput = new StreamSource(new StringReader(XML));
    Source xslInput = new StreamSource(new StringReader(XSLTRule));
    Transformer transformer = tFactory.newTransformer(xslInput);
    StringWriter resultWriter = new StringWriter();
    Result result = new StreamResult(resultWriter);
    transformer.transform(xmlInput, result);
    return resultWriter.toString();
}
The first thing you should do is to skip the unnecessary conversion of the XML string to bytes (especially with a hardcoded, potentially incorrect encoding). You can use a StringReader and pass that to the StreamSource constructor. The same for the result: use a StringWriter and avoid the conversion.
Of course, if you call the method after converting your XML from a file (bytes) to a String in the first place (again with a potentially wrong encoding), it would be even better to have the StreamSource read from the file directly.
It seems like you are applying an XSLT to an XML file. To speed things up, you can try compiling the XSLT, for example with XSLTC.
I can only think of a couple of minor things:
The TransformerFactory could be reused.
The Transformer could be reused if it is thread confined, and the XSL input is the same each time.
If you can estimate the output size reasonably accurately, you could create the ByteArrayOutputStream with an initial size hint.
As stated in Michael's answer, you could potentially speed things up by not loading either the input or the output XML entirely into memory yourself, and by making your API stream-based.
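Since the question reports that most of the time goes into initializing the Transformer, one way to apply the reuse suggestions above is to cache the compiled stylesheet (a javax.xml.transform.Templates object) per XSLT string, so that a rule is compiled only the first time it is seen. This is a sketch under assumptions: it only helps if the same rule text recurs, the cache is unbounded, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical variant of the question's transform method with a per-rule cache.
class CachedTransform {

    private static final TransformerFactory FACTORY = TransformerFactory.newInstance();

    // Compiled stylesheets keyed by their source text. Unbounded: add eviction if rules vary widely.
    private static final Map<String, Templates> CACHE = new ConcurrentHashMap<>();

    static String transform(String xml, String xsltRule) {
        // Compile the rule on first use only; later calls reuse the Templates object.
        Templates templates = CACHE.computeIfAbsent(xsltRule, CachedTransform::compile);
        try {
            StringWriter out = new StringWriter();
            // Templates is safe to share; each call gets its own cheap Transformer.
            templates.newTransformer()
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Templates compile(String xsltRule) {
        // TransformerFactory is not guaranteed to be thread-safe, so guard access to it.
        synchronized (FACTORY) {
            try {
                return FACTORY.newTemplates(new StreamSource(new StringReader(xsltRule)));
            } catch (TransformerConfigurationException e) {
                throw new IllegalArgumentException("Invalid XSLT rule", e);
            }
        }
    }

    // Number of distinct rules compiled so far.
    static int cachedRules() {
        return CACHE.size();
    }
}
```

If every rule really is unique, the cache brings no benefit and only costs memory. In that case the compilation cost is unavoidable with this API, and the remaining options are the ones listed above.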
