Java - Extract child elements

I am new to Java and need to parse the XML file below:
<foo>
<foo1>
<tag1>1</tag1>
<tag2>2</tag2>
</foo1>
<foo2>
<element1>aaa</element1>
<element2>bbb</element2>
</foo2>
</foo>
I need
<foo1>
<tag1>1</tag1>
<tag2>2</tag2>
</foo1>
<foo2>
<element1>aaa</element1>
<element2>bbb</element2>
</foo2>
as output. I was able to get the values of the nodes, but not the desired output. Please help me out. Thanks.

OK, I think I got it. Here's the code with documentation:
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
// parse input XML file into Document
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse("C://Temp/xx.xml");
// build a formatted XML String
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StreamResult result = new StreamResult(new StringWriter());
transformer.transform(new DOMSource(doc), result);
String xmlString = result.getWriter().toString();
// Split the formatted-XML-String according to new-line
String lines[] = xmlString.split("\\r?\\n");
// rebuild the String, skipping undesired lines
xmlString = "" ;
String newLine = String.format("%n");
for (int i = 2 ; i < lines.length-1 ; i++) {
xmlString += lines[i] + newLine;
}
System.out.println(xmlString);

Since the output you want is not well-formed XML (it has no single root element), I suggest you don't parse the XML file at all. Just read the file line by line and build the desired String, skipping the first and last lines.
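A minimal sketch of that line-based approach follows. It assumes the root element's opening and closing tags each sit on their own line and that the file has no XML declaration line; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class RootLineStripper {
    // Returns the lines between the first and the last one, joined with the
    // platform line separator. With fewer than three lines nothing is left.
    static String dropFirstAndLast(List<String> lines) {
        if (lines.size() < 3) {
            return "";
        }
        return String.join(System.lineSeparator(), lines.subList(1, lines.size() - 1));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<foo>", "<foo1>", "<tag1>1</tag1>", "</foo1>", "</foo>");
        System.out.println(dropFirstAndLast(lines));
    }
}
```

For a real file, the list could come from java.nio.file.Files.readAllLines(path).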

Related

Transformer escapes CR

Consider the following program:
import java.io.StringReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
public class CrDemo {
public static void main(String[] args) throws Exception {
final String xml = "<a>foo&#13;\nbar&#13;\n</a>";
final TransformerFactory tf = TransformerFactory.newInstance();
final Transformer t = tf.newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.setOutputProperty(OutputKeys.INDENT, "no");
t.setOutputProperty(OutputKeys.STANDALONE, "yes");
t.transform(new StreamSource(new StringReader(xml)), new StreamResult(System.out));
}
}
The output looks like this:
<a>foo&#13;
bar&#13;
</a>
Is it possible to prevent the Transformer from escaping CR?

If the input XML contained literal CR characters, they would be removed during parsing. XML parsers normalize line endings to a single NL character; but this doesn't apply if the CR is escaped as &#13;.
So if a text node contains a CR character, the XSLT processor assumes you have worked hard to put it there and that you really want it, and it therefore outputs it in such a way that it will survive round-tripping where the resulting serialized output is re-processed by an XML parser.
Of course, you can get rid of CR characters in your XSLT code, just as you can get rid of any other characters. But it won't happen automatically.
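As a rough sketch of removing CR in the XSLT code: the stylesheet below is an identity transform plus one extra template that deletes CR characters from text nodes with translate(). The class name, method name and stylesheet are illustrative assumptions, not part of the original question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StripCrDemo {
    // Identity transform; the text() template has a higher priority and
    // removes every CR (&#13;) from text nodes before they are written.
    private static final String XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"text()\" priority=\"1\">"
        + "<xsl:value-of select=\"translate(., '&#13;', '')\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String stripCr(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Same input as in the question: escaped CRs followed by newlines.
        System.out.println(stripCr("<a>foo&#13;\nbar&#13;\n</a>"));
    }
}
```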

How to unescape string in XML using Transformer?

I have a function that takes an XML document as a parameter and writes it to a file. The document contains an element such as <tag>"some text & some text": <text> text</tag>, but in the output file it is written as <tag>"some text &amp; some text": &lt;text&gt; text</tag>. I don't want the string to be escaped when it is written to the file.
The function is:
public static void function(Document doc, String fileUri, String randomId){
DOMSource source = new DOMSource(doc,ApplicationConstants.ENC_UTF_8);
FileWriterWithEncoding writer = null;
try {
File file = new File(fileUri+File.separator+randomId+".xml");
if (!new File(fileUri).exists()){
new File(fileUri).mkdirs();
}
writer = new FileWriterWithEncoding(new File(file.toString()),ApplicationConstants.ENC_UTF_8);
StreamResult result = new StreamResult(writer);
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = null;
transformer = transformerFactory.newTransformer();
transformer.setParameter(OutputKeys.INDENT, "yes");
transformer.transform(source, result);
writer.close();
transformer.clearParameters();
}catch (IOException | TransformerException e) {
log.error("convert Exception is :"+ e);
}
}

There are five characters with predefined escapes in XML (" ' < > &). According to the XML grammar, they must be escaped in certain places in XML; please see this question:
What characters do I need to escape in XML documents?
So you can't do much to avoid, for instance, escaping & or < in text content.
You could use CDATA sections if you want to retain "unescaped" content. Please see this question:
Add CDATA to an xml file
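One way to get CDATA sections out of a Transformer is the standard OutputKeys.CDATA_SECTION_ELEMENTS output property, which names the elements whose text should be written as CDATA. The sketch below is only an illustration: the element name "tag" comes from the question, while the class and method names are assumptions.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CdataDemo {
    // Builds <tag>text</tag> in memory and serializes it with the text of
    // every <tag> element wrapped in a CDATA section instead of escaped.
    static String serialize(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element tag = doc.createElement("tag");
            tag.setTextContent(text);
            doc.appendChild(tag);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "tag");

            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("\"some text & some text\": <text> text"));
    }
}
```

The file is still well-formed XML this way, and a parser reading it back sees the same text as with escaping.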

Adding linebreak in xml file before root node

I am trying to add a line break after my comment above the root node in an XML document.
I need something like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!--DO NOT EDIT THIS FILE-->
<projects>
</projects>
But what I was able to get is this (the line break ends up inside the root element, but I need it after the comment):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!--DO NOT EDIT THIS FILE--><projects>
</projects>
I need to add the line break just after my comment. Is there a way to do this?
My code:
import java.io.File;
import java.io.FileInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
public class XMLNewLine {
/**
* @param args
*/
public static void main(String[] args) {
System.out.println("Adding comment..");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
DocumentBuilder db;
try {
Document doc;
StreamResult result;
result = new StreamResult(new File("abc.xml"));
db = dbf.newDocumentBuilder();
doc = db.parse(new FileInputStream(new File("abc.xml")));
Element element = doc.getDocumentElement();
Text lineBreak = doc.createTextNode("\n");
element.appendChild(lineBreak);
Comment comment = doc
.createComment("DO NOT EDIT THIS FILE");
element.getParentNode().insertBefore(comment, element);
doc.getDocumentElement().normalize();
TransformerFactory transformerFactory = TransformerFactory
.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc);
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(source, result);
} catch (Exception e) {
// TODO Auto-generated catch block
}
}
}

You basically want a text node containing a line break after the comment node.
Element docElem = doc.getDocumentElement();
doc.insertBefore(doc.createComment("DO NOT EDIT THIS FILE"), docElem);
doc.insertBefore(doc.createTextNode("\n"), docElem);
EDIT: It seems that appending even whitespace-only text nodes is not allowed at the root node of an org.w3c.dom.Document. This is 100% formally correct, but also unhelpful.
The way comments are rendered in the output of the Transformer is determined by the serializer it uses (there are different serializers for HTML, XML and plain text outputs). In the built-in XML serializer the end of a comment is defined as --> - without a newline.
Since the internals of javax.xml.transform.Transformer are hard-wired, the serializers are not public API and the class is marked as final, overriding that behavior or setting a custom serializer is impossible.
In other words, you are out of luck adding your line break in a clean way.
You can, however, safely add it in a slightly unclean way:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
FileInputStream inputXml = new FileInputStream(new File("input.xml"));
Document doc = db.parse(inputXml);
// add the comment node
doc.insertBefore(doc.createComment("THIS IS A COMMENT"), doc.getDocumentElement());
StringWriter outputXmlStringWriter = new StringWriter();
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
// "xml" + "UTF-8" "include XML declaration" is the default anyway, but let's be explicit
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(new DOMSource(doc), new StreamResult(outputXmlStringWriter));
// now insert our newline into the string & write an UTF-8 file
String outputXmlString = outputXmlStringWriter.toString()
.replaceFirst("<!--", "\n<!--").replaceFirst("-->", "-->\n");
// try-with-resources closes the file even if the write fails
try (FileOutputStream outputXml = new FileOutputStream(new File("output.xml"))) {
outputXml.write(outputXmlString.getBytes("UTF-8"));
}
Doing search-and-replace operations on XML strings is highly discouraged in general, but in this case there is little that can go wrong.

Revisiting this after some time because I had the same issue. I found another solution that does not need to buffer the output in a String:
1) Write only the XML declaration by passing an empty document. This will also append a line break.
2) Write the document content without the XML declaration.
Code:
StreamResult streamResult = new StreamResult(writer);
// output XML declaration with an empty document
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.transform(new DOMSource(), streamResult);
// output the document without XML declaration
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(doc), streamResult);

You can achieve this by not adding the comment node to your document, but instead partially transforming your document. First transform your own XML processing instruction and comment separately, and then the rest of document:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new FileInputStream(new File("abc.xml")));
Result output = new StreamResult(new File("abc.xml"));
Source input = new DOMSource(doc);
// xml processing instruction and comment node
ProcessingInstruction xmlpi = doc.createProcessingInstruction("xml", "version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"");
Comment comment = doc.createComment("DO NOT EDIT THIS FILE");
// first transform the processing instruction and comment
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.transform(new DOMSource(xmlpi), output);
transformer.transform(new DOMSource(comment), output);
// then the document
transformer.transform(input, output);

There is a JDK bug concerning this. It was not fixed (as you would expect) because that would likely cause many problems to users' existing applications.
Adding the following output property fixes this:
transformer.setOutputProperty("http://www.oracle.com/xml/is-standalone", "yes");

Had the same issue.
I solved it by putting the comment inside the root element.
Not exactly the same, but I think acceptable.
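A minimal sketch of that workaround, assuming the document is available as a String; the class and method names are made up for illustration. The comment becomes the first child of the root element, so the serializer never has to write anything between the declaration and the root.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CommentInsideRoot {
    // Parses the XML, inserts a comment as the first child of the root
    // element, and returns the serialized result without a declaration.
    static String addComment(String xml, String commentText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            // insertBefore with a null reference node simply appends
            root.insertBefore(doc.createComment(commentText), root.getFirstChild());

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException
                | IOException | TransformerException e) {
            throw new IllegalStateException("could not add comment", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(addComment("<projects><project/></projects>",
                "DO NOT EDIT THIS FILE"));
    }
}
```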

This is my solution. I write the declaration and the header comment to the writer myself, and then disable the declaration in the transformer like this:
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
All code:
public static String xmlToTree(String xml, String headerComment) {
try (StringReader reader = new StringReader(xml)) {
StreamResult result = new StreamResult(new StringWriter());
result.getWriter().write("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n");
result.getWriter().write(headerComment + "\n");
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
StreamSource source = new StreamSource(reader);
transformer.transform(source, result);
String xmlTree = result.getWriter().toString();
return xmlTree;
} catch (Exception ex) {
ex.printStackTrace();
return null;
}
}

How to unformat xml file

I have a method that returns a String containing formatted XML. The method reads the XML from a file on the server and returns it as a String.
Essentially, what the method currently does is:
private ServletConfig config;
InputStream xmlIn = null ;
xmlIn = config.getServletContext().getResourceAsStream(filename + ".xml") ;
String xml = IOUtils.toString(xmlIn);
IOUtils.closeQuietly(xmlIn);
return xml;
What I need to do is add a new input argument, and based on that value, continue returning the formatted xml, or return unformatted xml.
What I mean with formatted xml is something like:
<xml>
  <root>
    <elements>
      <elem1/>
      <elem2/>
    </elements>
  </root>
</xml>
And what I mean with unformatted xml is something like:
<xml><root><elements><elem1/><elem2/></elements></root></xml>
or:
<xml>
<root>
<elements>
<elem1/>
<elem2/>
</elements>
</root>
</xml>
Is there a simple way to do this?

Strip all newline characters with String xml = IOUtils.toString(xmlIn).replace("\n", ""). Or strip \t instead to keep the line breaks but drop the indentation (assuming the file is indented with tabs).

If you are sure that the formatted XML looks like this:
<xml>
  <root>
    <elements>
      <elem1/>
      <elem2/>
    </elements>
  </root>
</xml>
you can replace group 1 of the regex ^(\s*)< (applied in multiline mode) with "". This way, the text content of the XML is not changed.
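A small sketch of the regex idea in Java, with hypothetical class and method names. It assumes, as stated above, that whitespace between tags is never significant; mixed content with meaningful whitespace would be altered by the single-line variant.

```java
public class XmlUnformat {
    // Removes spaces and tabs in front of a tag at the start of each line.
    // (?m) turns on multiline mode so ^ matches after every line break.
    static String stripIndentation(String xml) {
        return xml.replaceAll("(?m)^[ \\t]+<", "<");
    }

    // Removes all whitespace that sits between two tags, giving one line.
    static String toSingleLine(String xml) {
        return xml.replaceAll(">\\s+<", "><");
    }

    public static void main(String[] args) {
        String formatted = "<xml>\n  <root>\n    <elem1/>\n  </root>\n</xml>";
        System.out.println(stripIndentation(formatted));
        System.out.println(toSingleLine(formatted));
    }
}
```

A boolean argument on the calling method could then choose between returning the file content unchanged and passing it through one of these helpers.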

Use an empty (identity) transformer and set its indent output property, like so:
public static String getStringFromDocument(Document dom, boolean indented) {
String signedContent = null;
try {
StringWriter sw = new StringWriter();
DOMSource domSource = new DOMSource(dom);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer trans = tf.newTransformer();
trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
trans.setOutputProperty(OutputKeys.INDENT, indented ? "yes" : "no");
trans.transform(domSource, new StreamResult(sw));
sw.flush();
signedContent = sw.toString();
} catch (TransformerException e) {
e.printStackTrace();
}
return signedContent;
}
This works for me.
The key is this line:
trans.setOutputProperty(OutputKeys.INDENT, indented ? "yes" : "no");

Try something like the following:
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(
new StreamSource(new StringReader(
"<xsl:stylesheet version=\"1.0\"" +
" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">" +
"<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>" +
" <xsl:strip-space elements=\"*\"/>" +
" <xsl:template match=\"@*|node()\">" +
" <xsl:copy>" +
" <xsl:apply-templates select=\"@*|node()\"/>" +
" </xsl:copy>" +
" </xsl:template>" +
"</xsl:stylesheet>"
))
);
Source source = new StreamSource(new StringReader("xml string here"));
StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);
The source passed to transform() does not have to be a StreamSource; it can also be a DOMSource if you have an in-memory Document, for example because you want to modify the DOM before saving.
DOMSource source = new DOMSource(document);
To read an XML file into a Document object:
File file = new File("c:\\MyXMLFile.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
doc.getDocumentElement().normalize();
Enjoy :)

If you fancy trying your hand with JAXB then the marshaller has a handy property for setting whether to format (use new lines and indent) the output or not.
JAXBContext jc = JAXBContext.newInstance(packageName);
Marshaller m = jc.createMarshaller();
m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
m.marshal(element, outputStream);
Quite an overhead to get to that stage though... perhaps a good option if you already have a solid xsd

You can:
1) remove all runs of consecutive whitespace (but not single whitespace characters) and then replace every >(whitespace)< with ><
This is applicable only if the useful content does not contain multiple consecutive significant whitespace characters.
2) read it into a DOM tree and serialize it with a non-pretty (compact) serializer, for example with dom4j:
SAXReader reader = new SAXReader();
Reader r = new StringReader(data);
Document document = reader.read(r);
OutputFormat format = OutputFormat.createCompactFormat();
StringWriter sw = new StringWriter();
XMLWriter writer = new XMLWriter(sw, format);
writer.write(document);
String string = sw.toString(); // the serialized XML is in the StringWriter, not the XMLWriter
3) use Canonicalization (but you must somehow explain to it that those whitespaces you want to remove are insignificant)

Kotlin.
Indentation usually follows a newline and consists of one or more spaces. So, to put everything on the same line, replace every newline that is followed by one or more spaces with a single space:
xmlTag = xmlTag.replace("(\n +)".toRegex(), " ")

XML Node to String in Java

I came across this Java function that converts an XML node to a Java String representation:
private String nodeToString(Node node) {
StringWriter sw = new StringWriter();
try {
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.setOutputProperty(OutputKeys.INDENT, "yes");
t.transform(new DOMSource(node), new StreamResult(sw));
} catch (TransformerException te) {
System.out.println("nodeToString Transformer Exception");
}
return sw.toString();
}
It looks straightforward: the output string should have no XML declaration and should be indented.
But I wonder what the actual output would be. Suppose I have this XML node:
<p><media type="audio" id="au008093" rights="wbowned">
<title>Bee buzz</title>
</media>Most other kinds of bees live alone instead of in a colony. These bees make
tunnels in wood or in the ground. The queen makes her own nest.</p>
Can I assume that the resulting String after applying the above transformation is:
"media type="audio" id="au008093" rights="wbowned" title Bee buzz title /media"
I want to test it myself, but I have no idea how to represent this XML node in the form this function expects.
I am a bit confused; thanks in advance for your generous help.

Everything important has already been said. I compiled and ran the following code:
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
public class Test {
public static void main(String[] args) throws Exception {
String s =
"<p>" +
" <media type=\"audio\" id=\"au008093\" rights=\"wbowned\">" +
" <title>Bee buzz</title>" +
" </media>" +
" Most other kinds of bees live alone instead of in a colony." +
" These bees make tunnels in wood or in the ground." +
" The queen makes her own nest." +
"</p>";
InputStream is = new ByteArrayInputStream(s.getBytes());
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document d = db.parse(is);
Node rootElement = d.getDocumentElement();
System.out.println(nodeToString(rootElement));
}
private static String nodeToString(Node node) {
StringWriter sw = new StringWriter();
try {
Transformer t = TransformerFactory.newInstance().newTransformer();
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
t.setOutputProperty(OutputKeys.INDENT, "yes");
t.transform(new DOMSource(node), new StreamResult(sw));
} catch (TransformerException te) {
System.out.println("nodeToString Transformer Exception");
}
return sw.toString();
}
}
And it produced the following output:
<p> <media id="au008093" rights="wbowned" type="audio"> <title>Bee buzz</title> </media> Most other kinds of bees live alone instead of in a colony. These bees make tunnels in wood or in the ground. The queen makes her own nest.</p>
You can further tweak it by yourself. Good luck!

You have an XML representation in a DOM tree.
For example, you have opened an XML file and passed it to the DOM parser.
As a result, a DOM tree holding your XML is created in memory.
At that point you can only access the XML data by traversing the DOM tree.
If you need a String representation of the XML held in the DOM tree, you use a transformation.
This is necessary because a DOM node has no method that returns its markup as a String.
So if, for example, the Node you pass to nodeToString is the root element of the XML document, the result is a String containing the original XML data.
The tags will still be there, i.e. you will have a valid XML representation, only this time held in a String variable.
For example:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
Document xmlDoc = parser.parse(file);//file has the xml
String xml = nodeToString(xmlDoc.getDocumentElement());//pass in the root
//xml has the xml info. E.g no xml declaration. Add it
xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>" + xml;//bad to append this way...
System.out.println("XML is:"+xml);
DISCLAIMER: Did not even attempt to compile code. Hopefully you understand what you have to do
