Why is the output of my XML file appending a '/>' - java

Here is my dilemma:
I have an XML document into which I want to insert an animation_sequence element. However, the code instead adds animation_sequence/> (with an opening angle bracket). I can add every other element, just not that one. Why is that? I tried adding the XML here but it wouldn't render. Here is my code:
Element state = testDoc.createElement("state");
state.setTextContent(element);
Element animationState = testDoc.createElement("animation_state");
Element sequence = testDoc.createElement("animation_sequence");
testDoc.getElementsByTagName("animations_list").item(0).appendChild(animationState).appendChild(state);
testDoc.getElementsByTagName("animation_state").item(testDoc.getElementsByTagName("state").getLength() - 1).appendChild(sequence);

The code you have shown us creates nodes in a tree. It doesn't append any angle brackets to anything. Angle brackets only appear when you serialize the tree (convert it to lexical XML). Generally the system takes care of how to serialize the XML, and you don't need to worry when it chooses between different ways of serializing it, because when the XML is parsed the differences won't matter. In particular, an element with no children can be written either as <animation_sequence/> or as <animation_sequence></animation_sequence>; the two forms are equivalent.
Now it could be that the "/>" is a symptom that the tree you have built isn't the tree that you intended to build, but that's a different matter.
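To make the point about serialization concrete, here is a minimal sketch using only the JDK's DOM and Transformer APIs. The class name and the "idle" state value are made up for illustration; the element names come from the question. It builds a small tree in which animation_sequence has no children and then serializes it.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EmptyElementDemo {
    // Builds a tree with an empty <animation_sequence> element and serializes it.
    static String buildAndSerialize() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element list = doc.createElement("animations_list");
            doc.appendChild(list);

            Element animationState = doc.createElement("animation_state");
            list.appendChild(animationState);

            Element state = doc.createElement("state");
            state.setTextContent("idle"); // hypothetical value, not from the question
            animationState.appendChild(state);

            // No children are added, so this element is empty in the tree.
            animationState.appendChild(doc.createElement("animation_sequence"));

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildAndSerialize());
    }
}
```

With the JDK's built-in transformer, the empty element is written in the self-closing form, which is where the "/>" comes from. If the element was meant to have content, the fix is in how the tree is built, not in the serializer.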

Related

How to getChildText() of a node with a namespace, when there are multiple namespaces in the XML?

I want to use getChildText() to get text from a node that is a few levels deep. There are two namespaces in the file. The syntax below does not work and sets textToGet to null.
String textToGet = root.getChildText("ns1:Customer/ns1:Address/ns1:Street/ns2:Streetname");
I know there is an alternative of first getting the Child Element, and then its Text, but I want to use a one-liner.
Also, I would rather not chain getChild(), because some of the elements are not guaranteed to be in the file.
You are not going to be able to make that a one-liner....
Consider using XPaths.... JDOM 2.x should help with that:
XPathExpression<String> xpe = XPathFactory.instance().compile(
        "ns1:Customer/ns1:Address/ns1:Street/ns2:Streetname", Filters.fstring(),
        null, namespace_ns1, namespace_ns2);
String textToGet = xpe.evaluateFirst(root);
(textToGet may be null)
Edit: the XPath expression above actually returns an element. You should add "/text()" to the end of the XPath, or change textToGet to be an Element (and change the filter to match, e.g. Filters.element()).
Rolf
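For comparison, here is a sketch of the same lookup using the JDK's javax.xml.xpath API on a W3C DOM instead of JDOM. This swaps in a different library from the one in the answer. The namespace URIs, the root element name and the class name are assumptions made up for the example.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespacedXPathDemo {
    // Hypothetical prefix-to-URI bindings; only the URIs have to match the document.
    static final Map<String, String> PREFIXES = new HashMap<>();
    static {
        PREFIXES.put("ns1", "urn:example:ns1");
        PREFIXES.put("ns2", "urn:example:ns2");
    }

    static String streetName(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for prefixed XPath steps to match
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    String uri = PREFIXES.get(prefix);
                    return uri != null ? uri : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }          // not needed by XPath
                public Iterator<String> getPrefixes(String uri) { return null; } // not needed by XPath
            });
            // Evaluating to a String yields "" (not null) when nothing matches.
            return xpath.evaluate(
                    "/root/ns1:Customer/ns1:Address/ns1:Street/ns2:Streetname", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Unlike evaluateFirst() in the JDOM version, a missing element yields an empty string here rather than null, so optional elements need no null check.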

Serialize a Document object in Java, while preserving the formatting of arbitrary elements

I am using the function below to convert a DOM Document object into a String in Java.
public static String convertDocumentToString(final Document doc) {
    final DOMImplementationLS domImplementation = (DOMImplementationLS) doc.getImplementation();
    final LSSerializer lsSerializer = domImplementation.createLSSerializer();
    lsSerializer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
    final String xml = lsSerializer.writeToString(doc);
    return xml;
}
This works well most of the time, but there are some specific elements that I don't want formatted (e.g. the screen DocBook element). So I have two questions:
Is there a way to skip certain elements when formatting XML in Java like in the code above?
If not, is there another way to convert a Document to a String while preserving the layout of arbitrary elements?
Note that I have also used the Transformer in the past (see Getting xml string from Document in Java), but that didn't preserve CDATA sections.
Update:
Just so I am clear, I am deserializing and serializing XML in order to create a Document object that can be edited programmatically via a DOM, with the serialization process preferably "pretty printing" the resulting XML (with the exception of some arbitrary elements).
Update 2:
In the end I created a custom function to convert a Node to a String with optional formatting. See the convertNodeToString function at https://sourceforge.net/p/commonclasses/code/110/tree/trunk/src/com/redhat/ecs/commonutils/XMLUtilities.java called like so:
final String exampleXml = FileUtilities.readFileContents(new File("test.xml"));
final ArrayList<String> contentsInlineElements = new ArrayList<String>();
contentsInlineElements.add("title");
contentsInlineElements.add("term");
final ArrayList<String> inlineElements = new ArrayList<String>();
inlineElements.add("prompt");
inlineElements.add("command");
inlineElements.add("firstterm");
inlineElements.add("ulink");
inlineElements.add("guilabel");
inlineElements.add("filename");
inlineElements.add("replaceable");
inlineElements.add("parameter");
inlineElements.add("literal");
inlineElements.add("classname");
inlineElements.add("sgmltag");
inlineElements.add("guibutton");
inlineElements.add("guimenuitem");
inlineElements.add("guimenu");
inlineElements.add("menuchoice");
inlineElements.add("citetitle");
final ArrayList<String> verbatimElements = new ArrayList<String>();
verbatimElements.add("screen");
verbatimElements.add("programlisting");
final Document doc = XMLUtilities.convertStringToDocument(exampleXml);
final String formattedXml = XMLUtilities.convertNodeToString(doc.getDocumentElement(), true, false, false, verbatimElements, inlineElements, contentsInlineElements, true, 1, 0);
Serialization is designed to get data across a transport medium, but not necessarily (or even usually) in a way that is true to the form of the input data, if that form is by definition not carrying any extra information (as is the case with XML documents).
If you need to carry over the design, too, you will have to encode this "meta" information (i. e. the formatting) into the data itself, for example by escaping whitespace etc. Maybe the easiest solution, but one that will keep you from simply "reading" (as in with your eyes) the transport stream, is to encode your formatted data in something like Base64. This will perfectly transport inside an XML wrapper, while at the same time conserving the fidelity of the original input data you fed into the encoder.
On the other side, of course, you will have to decode the data again, before you can go on processing it further.
The short answer: you can't. When you tell the serializer to pretty-print, you're making a statement about the use of inter-element whitespace (ie, it's ignorable).
The longer answer: you can't without modifying the DOM (or a copy of it). IMO the simplest way is the following:
Identify the node that you want to preserve. I'll assume that you have an ID or some other way to select it using XPath.
Call Document.adoptNode() to move that node into a new DOM. I recall having some issues with this method, but that was many years ago. If it doesn't work, use Document.importNode() and explicitly remove the node from the source document. I believe that you can adopt a node as the root of a document, but can't guarantee that.
Insert a text node into the original document, containing unique content. An easy way to generate unique content is UUID.randomUUID().toString().
Convert both documents to strings, pretty-printing one and not pretty-printing the other.
Use String.replace() to insert the not-pretty-printed document into the pretty-printed document.
And, as always, if you're planning to write those strings to a file or other byte-oriented format, you must explicitly encode as UTF-8.
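The placeholder steps above can be sketched roughly as follows. This is a simplified variant, not the exact procedure described: instead of moving the node into a second DOM with adoptNode(), it serializes the element directly with a non-pretty-printing LSSerializer, which accepts any Node. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.UUID;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class SelectivePrettyPrint {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Pretty-prints doc but emits the given element as it is stored in the DOM.
    // Note: this mutates doc, since the element is swapped for a placeholder text node.
    static String prettyPrintExcept(Document doc, Element verbatim) {
        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();

        // 1. Serialize the element without pretty-printing and without an XML declaration.
        LSSerializer raw = ls.createLSSerializer();
        raw.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        String rawXml = raw.writeToString(verbatim);

        // 2. Replace the element with a text node holding a unique token.
        String token = UUID.randomUUID().toString();
        verbatim.getParentNode().replaceChild(doc.createTextNode(token), verbatim);

        // 3. Pretty-print the remaining document.
        LSSerializer pretty = ls.createLSSerializer();
        pretty.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        String prettyXml = pretty.writeToString(doc);

        // 4. Splice the untouched markup back in place of the token.
        return prettyXml.replace(token, rawXml);
    }
}
```

A caller would look up the element to protect (for example the first screen element via getElementsByTagName) and pass it in. Handling several protected elements would need one token per element, which this sketch does not attempt.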
Whitespace is not significant in XML documents other than in CDATA sections, and none of the standard tools is designed to preserve it. Any requirement to the contrary is ill-formed.

org.w3c.dom.Node with Android version less than 2.2

getTextContent() is not a recognized function. getNodeValue() works fine for strings, but anytime I try to parse a number using getNodeValue(), it returns null!
How can I parse a Long from XML using this class?
The root cause of this is that the getTextContent() method is a W3C DOM Level 3 method; see the changes section of the DOM level 3 core spec.
The Node interface has two new attributes, Node.baseURI and Node.textContent. ...
and getTextContent() is the getter for the new attribute.
(Presumably, older versions of Android don't implement the DOM level 3 APIs.)
The behavior of getTextContent() is a bit complicated in the general case; see the spec for the textContent attribute. In the simple case where the target node is an Element with (only) text contents, node.getTextContent() is the same as node.getFirstChild().getNodeValue().
You need to navigate down to the text node. For instance, with something like this:
<val>10000</val>
the parsed XML tree has an Element node for the tag which, in turn, has a child Text node for the "10000". A typical idiom is
valNode.getFirstChild().getNodeValue()
If, as you say, "getNodeValue() works fine for strings", then nothing should prevent you from doing this:
Long l = Long.valueOf(node.getNodeValue());
Note that getNodeValue() always returns a String, which must then be converted manually to a numeric type. (Long.getLong() is not suitable here: it looks up a system property by name rather than parsing its argument.)
Also, are you sure you are parsing the right node (the one that holds the needed long value)?
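Putting the two answers together, a minimal sketch of reading a long without getTextContent() might look like this. The class and method names are illustrative, and it assumes the element contains only a single text node, as in the <val> example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TextNodeLongDemo {
    // Reads the numeric content of the first element with the given tag name.
    static long readLong(String xml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node element = doc.getElementsByTagName(tagName).item(0);

            // For an Element node, getNodeValue() is null by definition;
            // the digits live in the child Text node.
            Node text = element.getFirstChild();
            return Long.parseLong(text.getNodeValue().trim());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readLong("<val>10000</val>", "val"));
    }
}
```

This only uses DOM Level 1/2 calls, so it does not depend on getTextContent() being available.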

XPath getting text from an element after a certain element

So right now if I have something like this:
//div[@class='artist']/p[x]/text()
x can either be 3 or 4, or maybe even a different number. Luckily, if what I am looking for is not in 3, I can just check for null and go on until I find text. The issue is I would rather know I'm going to the right element every time. So I tried this:
div[@class='people']/h3[text()='h3 text']/p/text()
since there will always be a <p> right after <h3>h3 text</h3>. However this never returns anything, and usually results in an error. If I remove /p I will get 'h3 text' returned.
Anyway, how do I get that <p> directly after <h3>?
BTW, I'm using HTMLCleaner in Java for this.
By default when you don't specify an axis you get the child:: axis, which is why the / operator seems to descend the DOM tree child by child. There is an implied child:: after each slash.
In your case you don't want to find a child of the <h3>, you want to find a sibling of it. A sibling is an element at the same nesting level. Specifically, you should use the following-sibling:: axis.
div[@class='people']/h3[text()='h3 text']/following-sibling::p/text()
XPath Axes
Axes are an advanced feature of XPath. They are one of the features that make XPath especially powerful.
You're already familiar with one other axis, though you may not have realized it: the @ symbol is shorthand for attribute::. When you write @href you're really saying attribute::href, as in look for an attribute called "href" instead of a child.
Axes, eh? Shorthand, eh? Tell me more, you say? OK!
. and .. are shorthand for the more verbose self::node() and parent::node(), respectively. You could use the longer forms if you wished.
The // operator you commonly see as //p or body//a has a hidden descendant-or-self::node() between the slashes. //p is shorthand for /descendant-or-self::node()/p.
Anyway, how do I get that <p> directly after <h3>?
Use:
div[@class='people']/h3[text()='h3 text']/following-sibling::p[1]
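As a quick check of the following-sibling expression, here is a small sketch using the JDK's javax.xml.xpath on well-formed markup. It does not use HtmlCleaner, whose XPath support may differ, and the class name and sample markup are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FollowingSiblingDemo {
    // Returns the text of the first <p> that follows the matching <h3> at the same level.
    static String firstParagraphAfterHeading(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(
                    "//div[@class='people']/h3[text()='h3 text']/following-sibling::p[1]/text()",
                    doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html><div class='people'>"
                + "<h3>other</h3><p>skip</p>"
                + "<h3>h3 text</h3><p>first</p><p>second</p>"
                + "</div></html>";
        System.out.println(firstParagraphAfterHeading(xml));
    }
}
```

The [1] predicate matters when several <p> elements follow the heading: without it the node set contains all of them, and with it only the nearest one is selected.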

Handling Empty Nodes Using Java DOM

I have a question concerning XML, Java's use of DOM, and empty nodes. I am currently working on a project wherein I take an XML descriptor file of abstract machines (for text parsing) and parse a series of input strings with them. The actual building and interpretation of these abstract machines is all done and working fine, but I have come across a rather interesting XML requirement. Specifically, I need to be able to turn an empty InputString node into an empty string ("") and still execute my parsing routines. The problem, however, occurs when I attempt to extract this blank node from my XML tree. This causes a null pointer exception and then generally bad things start happening. Here is the offending snippet of XML (Note the first element is empty):
<InputStringList>
<InputString></InputString>
<InputString>000</InputString>
<InputString>111</InputString>
<InputString>01001</InputString>
<InputString>1011011</InputString>
<InputString>1011000</InputString>
<InputString>01010</InputString>
<InputString>1010101110</InputString>
</InputStringList>
I extract my strings from the list using:
//Get input strings to be validated
xmlElement = (Element)xmlMachine.getElementsByTagName(XML_INPUT_STRING_LIST).item(0);
xmlNodeList = xmlElement.getElementsByTagName(XML_INPUT_STRING);
for (int j = 0; j < xmlNodeList.getLength(); j++) {
    //Add input string to list
    if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {
        arrInputStrings.add(xmlNodeList.item(j).getFirstChild().getNodeValue());
    } else {
        arrInputStrings.add("");
    }
}
How should I handle this empty case? I have found a lot of information on removing blank text nodes, but I still actually have to parse the blank nodes as empty strings. Ideally, I would like to avoid using a special character to denote a blank string.
Thank you in advance for your time.
if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {
nodeValue shouldn't be null; it would be firstChild itself that might be null and should be checked for:
Node firstChild = xmlNodeList.item(j).getFirstChild();
arrInputStrings.add(firstChild == null ? "" : firstChild.getNodeValue());
However, note that this still assumes the content is a single text node. If you had an element containing another element, or some text and a CDATA section, just getting the value of the first child isn't enough to read the whole text.
What you really want is the textContent property from DOM Level 3 Core, which will give you all the text inside the element, however contained.
arrInputStrings.add(xmlNodeList.item(j).getTextContent());
This is available in Java 1.5 onwards.
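A minimal, self-contained sketch of the getTextContent() approach, with a made-up class name and the tag name hard-coded in place of the XML_INPUT_STRING constant from the question:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EmptyNodeDemo {
    // Collects the text of every <InputString>, mapping empty elements to "".
    static List<String> inputStrings(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("InputString");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // For an element with no children, getTextContent() returns
                // the empty string, so no null check on getFirstChild() is needed.
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<InputStringList>"
                + "<InputString></InputString>"
                + "<InputString>000</InputString>"
                + "<InputString>111</InputString>"
                + "</InputStringList>";
        System.out.println(inputStrings(xml));
    }
}
```

The empty element comes back as an empty string in the list, so the parsing routines can run on it like any other input.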
You could use a library like jOOX to generally simplify standard DOM manipulation. With jOOX, you'd get the list of strings as such:
List<String> strings = $(xmlMachine).find(XML_INPUT_STRING_LIST)
.find(XML_INPUT_STRING)
.texts();
