When I use a colon in a tag name, as in the example below, I get an error (tags without a colon work fine).
package test;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
public class SomeClass{
public StringWriter test() throws XMLStreamException, TransformerConfigurationException, TransformerException {
StringWriter stringOut = new StringWriter();
XMLStreamWriter xmlWriter = XMLOutputFactory.newInstance().createXMLStreamWriter(stringOut);
xmlWriter.writeStartDocument("UTF-8", "1.0");
xmlWriter.writeStartElement("SomeWordHere");
{
xmlWriter.writeStartElement("SomeName:enable");//<--- notice the colon
xmlWriter.writeCharacters("true");
xmlWriter.writeEndElement();
}
xmlWriter.writeEndElement();
xmlWriter.writeEndDocument();
xmlWriter.flush();
xmlWriter.close();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.STANDALONE, "no");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
StringWriter formattedStringWriter = new StringWriter();
transformer.transform(new StreamSource(new StringReader(stringOut.toString())), new StreamResult(formattedStringWriter));
return formattedStringWriter;
}
}
How can I write a tag that still contains the colon without causing an error?
I am trying to emulate the XML output (Collada DAE) produced by the LEGO Stud.io software; it has sections like the one below, containing tag names with colons.
<library_materials>
<material id="material_id_7" name="SOLID-BLUE">
<instance_effect url="#effect_id_7-fx" />
<extra>
<technique profile="eyesight">
<ScratchBump:enable> true </ScratchBump:enable>
<MinScratchStrength:value> 0 </MinScratchStrength:value>
<MaxScratchStrength:value> 0.2 </MaxScratchStrength:value>
<BigScratch:enable> true </BigScratch:enable>
<SmallScratch:enable> true </SmallScratch:enable>
</technique>
</extra>
</material>
</library_materials>
The colon is used for namespaces and, per the "Namespaces in XML" specification, it cannot be used in entity names.
The specification states:
[Definition: A document is namespace-well-formed if it conforms to
this specification. ]
It follows that in a namespace-well-formed document:
All element and attribute names contain either zero or one colon;
No entity names, processing instruction targets, or notation names contain any colons.
One workaround is to declare "SomeName" as a namespace prefix, as suggested in this question: xml schema validation error "prefix is not bound".
On the other hand, the "Extensible Markup Language" specification states that:
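A minimal sketch of that workaround with XMLStreamWriter, assuming a made-up namespace URI (urn:example:some-name) and a hypothetical class name. The prefix is declared on the root element, so the output gains an xmlns:SomeName attribute that the Stud.io sample does not have; whether the consuming tool tolerates that is not something this sketch can confirm.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class PrefixedElementDemo {
    // Hypothetical namespace URI, chosen only for illustration.
    static final String NS = "urn:example:some-name";

    static String build() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("SomeWordHere");
        // Bind the prefix "SomeName" on the root element.
        w.writeNamespace("SomeName", NS);
        // Write <SomeName:enable> as a properly prefixed element.
        w.writeStartElement("SomeName", "enable", NS);
        w.writeCharacters("true");
        w.writeEndElement();
        w.writeEndElement();
        w.flush();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(build());
    }
}
```

Because the prefix is now bound, a namespace-aware parser (such as the one the Transformer uses for a StreamSource) should accept the result.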
Note:
The Namespaces in XML Recommendation [XML Names] assigns a meaning to
names containing colon characters. Therefore, authors should not use
the colon in XML names except for namespace purposes, but XML
processors must accept the colon as a name character.
If you change the parser you can get what you want:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class CreateXmlFileDemo {
public static void main(String[] args) {
try {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.newDocument();
Element rootElement = doc.createElement("SomeName:enable");
doc.appendChild(rootElement);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult consoleResult = new StreamResult(System.out);
transformer.transform(source, consoleResult);
} catch (Exception e) {
e.printStackTrace();
}
}
}
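To illustrate the difference between the two specifications quoted above, here is a small check, offered as a sketch: the same document with an undeclared prefix is parsed once without and once with namespace awareness. The class and method names are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ColonNameCheck {
    // Returns true if the XML parses, false if the parser rejects it.
    static boolean parses(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = "<SomeWordHere><SomeName:enable>true</SomeName:enable></SomeWordHere>";
        System.out.println("plain XML parser accepts it: " + parses(xml, false));
        System.out.println("namespace-aware parser accepts it: " + parses(xml, true));
    }
}
```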
Reference: https://www.w3.org/TR/REC-xml-names/
I have some kind of complex XML data structure. The structure contains different fragments like in the following example:
<data>
<content-part-1>
<h1>Hello <strong>World</strong>. This is some text.</h1>
<h2>.....</h2>
</content-part-1>
....
</data>
The h1 tag within the tag 'content-part-1' is of interest. I want to get the full content of the xml tag 'h1'.
In java I used the javax.xml.parsers.DocumentBuilder and tried something like this:
String my_content="<h1>Hello <strong>World</strong>. This is some text.</h1>";
// parse h1 tag..
DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = documentBuilder.parse(new InputSource(new StringReader(my_content)));
Node node = doc.importNode(doc.getDocumentElement(), true);
if (node != null && node.getNodeName().equals("h1")) {
return node.getTextContent();
}
But the method 'getTextContent()' will return:
Hello World. This is some text.
The "strong" tag is dropped from the result (which is the documented behavior of getTextContent()).
My question is: how can I extract the full content of a single XML node within an org.w3c.dom.Document without any further parsing of the node content?
Although the Java DOM parser can handle mixed content, in this particular case it may be more convenient to use the Jsoup library. With it, the code to extract the h1 element content looks like this:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
String text = "<data>\n"
+ " <content-part1>\n"
+ " <h1>Hello <strong>World</strong>. This is some text.</h1>\n"
+ " <h2></h2>\n"
+ " </content-part1>\n"
+ "</data>";
Document doc = Jsoup.parse(text);
Elements h1Elements = doc.select("h1");
for (Element h1 : h1Elements) {
System.out.println(h1.html());
}
Output in this case will be "Hello <strong>World</strong>. This is some text."
What you probably want is XML generation from some subnode of your document.
So, with a slightly modified nodeToString from an earlier answer to a similar question, I can propose to
generate the text <h1>Hello <strong>World</strong>. This is some text.</h1>. Some extra effort might be needed to get rid of <h1> and </h1>.
package com.github.vtitov.test;
import org.junit.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringReader;
import java.io.StringWriter;
import static org.hamcrest.MatcherAssert.*;
import static org.hamcrest.Matchers.*;
public class XmlTest {
@Test
public void buildXml() throws Exception {
String my_content="<h1>Hello <strong>World</strong>. This is some text.</h1>";
// parse h1 tag..
DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = documentBuilder.parse(new InputSource(new StringReader(my_content)));
Node node = doc.importNode(doc.getDocumentElement(), true);
String h1Content = null;
if (node != null && node.getNodeName().equals("h1")) {
h1Content = nodeToString(node);
}
assertThat("h1", h1Content, equalTo("<h1>Hello <strong>World</strong>. This is some text.</h1>"));
}
private static String nodeToString(Node node) throws TransformerException {
StringWriter sw = new StringWriter();
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.setOutputProperty(OutputKeys.INDENT, "no");
t.transform(new DOMSource(node), new StreamResult(sw));
return sw.toString();
}
}
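For the remaining step of dropping the outer <h1> and </h1>, one possible sketch is to serialize each child node of the element in turn instead of the element itself. The class and method names here are hypothetical, and the output method is set to "xml" explicitly as a precaution.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InnerXmlDemo {
    // Serializes the children of 'parent' (text and elements) without the parent's own tags.
    static String innerXml(Node parent) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.METHOD, "xml");
        StringWriter sw = new StringWriter();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            t.transform(new DOMSource(children.item(i)), new StreamResult(sw));
        }
        return sw.toString();
    }

    // Parses the snippet and returns the inner markup of its root element.
    static String innerXmlOfRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return innerXml(doc.getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(innerXmlOfRoot("<h1>Hello <strong>World</strong>. This is some text.</h1>"));
    }
}
```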
I was following this post: JAXB Marshaller indentation
But I ran into an error:
org.w3c.dom.DOMException: NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
Which actually pertains to the Marshaller I have used when it did:
marshaller.marshal(instance, domResult);
Your comments and opinions are highly appreciated.
Cheers,
Artanis Zeratul
I fixed my problem by tweaking Antonio Maria Sanchez's answer a bit.
Reference: JAXB Marshaller indentation
So here is my answer:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
public class ObjectToXMLWriter {
public static <Type> boolean writeToFileWithXmlTransformer(Type instance
,String fullFileNamePath) throws FileNotFoundException {
boolean isSaved = false;
JAXBContext jaxBContent = null;
Marshaller marshaller = null;
StringWriter stringWriter = new StringWriter();
try {
jaxBContent = JAXBContext.newInstance(instance.getClass());
marshaller = jaxBContent.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.marshal(instance, stringWriter);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.transform(new StreamSource(new StringReader(stringWriter.toString()))
,new StreamResult(new File(fullFileNamePath)));
isSaved = true;
} catch(JAXBException jaxBException) {
System.out.println("JAXBException happened!");
jaxBException.printStackTrace();
} catch(Exception exception) {
System.out.println("Exception happened!");
exception.printStackTrace();
}
return isSaved;
}
}
The critical points to this answer are the following:
marshaller.marshal(instance, stringWriter);
instead of using DOMResult
transformer.transform(new StreamSource(new StringReader(stringWriter.toString()))
,new StreamResult(new File(fullFileNamePath)));
instead of using DOMSource
Cheers,
Artanis Zeratul
I am trying to transform the following XML
<PHONEBOOK>
<PERSON>
<NAME>Ren1</NAME>
<EMAIL>ren1@gmail.com</EMAIL>
<TELEPHONE>999-999-9999</TELEPHONE>
<WEB>www.ren1.com</WEB>
</PERSON>
<PERSON>
<NAME>Ren2</NAME>
<EMAIL>ren2@gmail.com</EMAIL>
<TELEPHONE>999-999-9999</TELEPHONE>
<WEB>www.ren2.com</WEB>
</PERSON>
<PERSON>
<NAME>Ren3</NAME>
<EMAIL>ren3@gmail.com</EMAIL>
<TELEPHONE>999-999-9999</TELEPHONE>
<WEB>www.ren3.com</WEB>
</PERSON>
</PHONEBOOK>
to
<Names><Name>Ren1</Name><Name>Ren2</Name><Name>Ren3</Name></Names>
using DOMSource, DOMResult and XSLT.
XSLT used is as follows
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output omit-xml-declaration="yes" method="xml"></xsl:output>
<xsl:template match="/">
<Names>
<xsl:for-each select="PHONEBOOK/PERSON">
<Name>
<xsl:value-of select="NAME" />
</Name>
</xsl:for-each>
</Names>
</xsl:template>
</xsl:stylesheet>
Java Code used for transformation:
package test1;
import java.io.IOException;
import java.io.StringWriter;
import java.io.ObjectInputStream.GetField;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;
public class Test2 {
public static void main(String[] args) throws TransformerException,
ParserConfigurationException, SAXException, IOException {
// TODO Auto-generated method stub
//Stylesheet
StreamSource stylesource = new StreamSource(
"src/test1/transform_stylesheet1.xsl");
DocumentBuilderFactory docbFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder dBuilder = docbFactory.newDocumentBuilder();
//source XML
Document sourceDoc = dBuilder.parse("src/test1/Sample1.xml");
DOMSource source = new DOMSource(sourceDoc);
TransformerFactory transformerFactory = TransformerFactory
.newInstance();
Transformer transformer = transformerFactory
.newTransformer(stylesource);
Document document = dBuilder.newDocument();
DOMResult result = new DOMResult(document);
transformer.transform(source, result);
Node resultDoc = ((Document) result.getNode()).getDocumentElement();
System.out.println(resultDoc.getChildNodes().getLength());
// print the result
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(resultDoc), new StreamResult(writer));
String str = writer.toString();
System.out.println(str);
}
}
Output of the above is as follows:
3
<Names/>
but I expect:
3
<Names><Name>Ren1</Name><Name>Ren2</Name><Name>Ren3</Name></Names>
I debugged the code and found that 'resultDoc' has the content I expect. Am I missing something when printing the result?
Your problem is that you're using the same transformer for the stylesheet processing and the output. That means, the stylesheet is applied again, but this time to the <Names><Name>Ren1</Name>...</Names> xml. You can imagine that this doesn't give the results you want.
Change your code to:
// print the result
StringWriter writer = new StringWriter();
Transformer transformer2 = transformerFactory.newTransformer();
transformer2.transform(new DOMSource(resultDoc), new StreamResult(writer));
String str = writer.toString();
System.out.println(str);
and it should work.
As @Abel mentions, you can also do the stylesheet processing and the conversion to a String in one go:
StringWriter writer = new StringWriter();
transformer.transform(source, new StreamResult(writer));
String str = writer.toString();
System.out.println(str);
You don't need the DOMResult and DOMSource variables then.
Is there a way to print the XML content without the XML header tag in Java?
For example if I have an XML like this:
<?xml version='1.0' encoding='UTF-8'?>
<rootElement>
<childElement>Text</childElement>
</rootElement>
I just want to print
<rootElement>
<childElement>Text</childElement>
</rootElement>
This is very similar to what I am doing so far:
http://sacrosanctblood.blogspot.com/2008/07/convert-xml-file-to-xml-string-in-java.html
I cannot give out the exact source code but the above link example should give you some idea. Here's that code with imports:
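import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;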
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Text;
public String convertXMLFileToString(String fileName)
{
try{
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
InputStream inputStream = new FileInputStream(new File(fileName));
org.w3c.dom.Document doc = documentBuilderFactory.newDocumentBuilder().parse(inputStream);
StringWriter stw = new StringWriter();
Transformer serializer = TransformerFactory.newInstance().newTransformer();
serializer.transform(new DOMSource(doc), new StreamResult(stw));
return stw.toString();
}
catch (Exception e) {
e.printStackTrace();
}
return null;
}
Transformer serializer = TransformerFactory.newInstance().newTransformer();
serializer.setOutputProperty("omit-xml-declaration", "yes");
serializer.transform(new DOMSource(doc), new StreamResult(stw));
Good old XSL ;).
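Put together as a self-contained sketch (the class and method names are illustrative, and the input is read from a String rather than a file to keep it short), using the OutputKeys constant in place of the literal property name:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class OmitDeclarationDemo {
    // Parses the XML text and serializes it again without the <?xml ...?> header.
    static String withoutDeclaration(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        // OutputKeys.OMIT_XML_DECLARATION is the constant for "omit-xml-declaration".
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter stw = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(stw));
        return stw.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version='1.0' encoding='UTF-8'?>"
                + "<rootElement><childElement>Text</childElement></rootElement>";
        System.out.println(withoutDeclaration(xml));
    }
}
```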
In an open source project I maintain, we have at least three different ways of reading, processing and writing XML files, and I would like to standardise on a single method for ease of maintenance and stability.
Currently all of the project files use XML, from the configuration to the stored data. We're hoping to migrate to a simple database at some point in the future, but will still need to read/write some form of XML files.
The data is stored in an XML format, which we then transform into the final HTML files using an XSLT engine (Saxon).
We currently utilise these methods:
- XMLEventReader/XMLOutputFactory (javax.xml.stream)
- DocumentBuilderFactory (javax.xml.parsers)
- JAXBContext (javax.xml.bind)
Are there any obvious pros and cons to each of these?
Personally, I like the simplicity of DOM (Document Builder), but I'm willing to convert to one of the others if it makes sense in terms of performance or other factors.
Edited to add:
There can be a significant number of files read/written when the project runs, between 100 & 10,000 individual files of around 5Kb each
It depends on what you are doing with the data.
If you are simply performing XSLT transforms on XML files to produce HTML files then you may not need to touch a parser directly:
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
public class Demo {
public static void main(String[] args) throws Exception {
TransformerFactory tf = TransformerFactory.newInstance();
StreamSource xsltTransform = new StreamSource(new File("xslt.xml"));
Transformer transformer = tf.newTransformer(xsltTransform);
StreamSource source = new StreamSource(new File("source.xml"));
StreamResult result = new StreamResult(new File("result.html"));
transformer.transform(source, result);
}
}
If you need to make changes to the input document before you transform it, DOM is a convenient mechanism for doing this:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
public class Demo {
public static void main(String[] args) throws Exception {
TransformerFactory tf = TransformerFactory.newInstance();
StreamSource xsltTransform = new StreamSource(new File("xslt.xml"));
Transformer transformer = tf.newTransformer(xsltTransform);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(new File("source.xml"));
// modify the document
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(new File("result.html"));
transformer.transform(source, result);
}
}
If you prefer a typed model to make changes to the data then JAXB is a perfect fit:
import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.util.JAXBSource;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
public class Demo {
public static void main(String[] args) throws Exception {
TransformerFactory tf = TransformerFactory.newInstance();
StreamSource xsltTransform = new StreamSource(new File("xslt.xml"));
Transformer transformer = tf.newTransformer(xsltTransform);
JAXBContext jc = JAXBContext.newInstance("com.example.model");
Unmarshaller unmarshaller = jc.createUnmarshaller();
Model model = (Model) unmarshaller.unmarshal(new File("source.xml"));
// modify the domain model
JAXBSource source = new JAXBSource(jc, model);
StreamResult result = new StreamResult(new File("result.html"));
transformer.transform(source, result);
}
}
This is a very subjective topic. It primarily depends on how you are going to use the XML and on the size of the XML. If the XML is (always) small enough to be loaded into memory, then you don't have to worry about the memory footprint and can use a DOM parser. If you need to parse through a 150 MB XML file, you may want to think about using SAX, and so on.
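For comparison, a minimal SAX sketch that counts elements with a given name without building a tree in memory. The class name, method name and sample input are illustrative assumptions; a real 150 MB file would be passed as a File or InputStream rather than a String.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCountDemo {
    // Streams through the document and counts start tags whose qualified name matches.
    static int countElements(String xml, String name) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<PHONEBOOK><PERSON/><PERSON/><PERSON/></PHONEBOOK>";
        System.out.println(countElements(xml, "PERSON"));
    }
}
```

The handler only sees one event at a time, so memory use stays roughly constant regardless of document size; the trade-off is that there is no tree to navigate or modify.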