unmarshalling <br/> in XML data - java

I have some XML data that I'm trying to unmarshal into Java objects, and one of the elements contains <br/> elements:
<details>
<para>
Line Number One
<br/>
Line Number Two
</para>
</details>
In my Details java object I have:
class Details {
@XmlElement(name="para")
private List<String> paragraphs;
}
The problem is that the only element in the paragraphs list is 'Line Number Two'. Does anyone know how I can deal with this?

You can represent mixed content with @XmlMixed as follows (note that it applies to the content of a class rather than to an element, so you need an additional class):
class Details {
@XmlElement(name="para")
private Para para;
...
}
class Para {
@XmlMixed
@XmlAnyElement
private List<Object> paragraphs;
...
}
The paragraphs property will contain String instances for text lines and Element instances for XML elements.
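To see why the list ends up holding both kinds of object, it can help to look at how a parser sees <para>: a text node, a <br/> element, and another text node. The following is a minimal sketch that uses the JDK's built-in DOM API rather than JAXB, so it needs no extra dependency; the class name MixedContentDemo and the method describePara are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JAXB: lists the children of <para> in document order.
class MixedContentDemo {

    // Returns one entry per child of the first <para>: trimmed text for text nodes,
    // "<name/>" for child elements. Whitespace-only text nodes are skipped.
    static List<String> describePara(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node para = doc.getElementsByTagName("para").item(0);
            NodeList children = para.getChildNodes();
            List<String> parts = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.TEXT_NODE) {
                    String text = child.getNodeValue().trim();
                    if (!text.isEmpty()) {
                        parts.add(text);
                    }
                } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                    parts.add("<" + child.getNodeName() + "/>");
                }
            }
            return parts;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<details><para>Line Number One<br/>Line Number Two</para></details>";
        // prints [Line Number One, <br/>, Line Number Two]
        System.out.println(describePara(xml));
    }
}
```

A List<String> mapping has no slot for the middle <br/> node, which is consistent with the behaviour described in the question; a mixed-content mapping keeps all three parts.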

In that case the XML does not have the shape your mapping expects. Put the entire content of the element inside a CDATA section to avoid this issue. See http://www.w3schools.com/xml/xml_cdata.asp

You could use @XmlAnyElement along with a DomHandler to preserve fragments of the XML document as a String. Below is a link to a complete example demonstrating how to do this:
http://blog.bdoughan.com/2011/04/xmlanyelement-and-non-dom-properties.html

Related

Compare two XML attributes with same name in two different XML files using Java

I have an XML document with both attributes and elements (tags).
I want to know whether an attribute or an element is the better choice for performance.
Could you give an example of the comparison both when the content is in a child element and when it is in an attribute?
My question is: is it possible to compare two attributes with the same name in two different XML files, given that the data will be large?
I want to understand how performance differs depending on whether I model the value as an attribute or as an element.
<A Name="HRMS">
<B BName="IN">
<C Code="0001">
<IN irec="200" />
<OUT orec="230" Number="" Outname=""/>
</C>
<C Code="0004">
<IN irec="209" />
<OUT orec="209" Number="" Outname=""/>
</C>
<C Code="0008">
<IN irec="250" />
<OUT orec="250" Number="" Outname=""/>
</C>
</B>
</A>
Here, I have to compare irec with orec for a particular B name and C code.
It's possible. You need a Java library such as jsoup, which lets you query a parsed document with CSS-style selector expressions, much like jQuery.
jsoup is an HTML parser, but it is lenient enough to parse XML content like this as well.
jsoup example:
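import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String xml = "<root><person name=\"Bob\"><age>20</age></person></root>";
Document root = Jsoup.parse(xml); // parsed with jsoup's HTML parser
System.out.println(root.body().html()); // the XML content as parsed
Elements persons = root.getElementsByTag("person");
Element person = persons.first();
System.out.println("The attribute 'name' of person: " + person.attr("name"));
// Select by attribute value, then print the text of the matching element
System.out.println(persons.select("person[name=Bob]").first().text());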
You can implement the comparison function simply with jsoup.
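For the specific comparison asked about (irec against orec for a given BName and Code), a path expression over attributes is enough. Below is a minimal sketch that uses the JDK's javax.xml.xpath instead of jsoup, a swapped-in technique chosen so that no third-party library is assumed. The names RecComparer and recordsMatch are hypothetical, and the sketch assumes the bName and code values contain no quote characters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: compares IN/@irec with OUT/@orec for one <B>/<C> pair.
class RecComparer {

    static boolean recordsMatch(String xml, String bName, String code) {
        try {
            // Parse once, then run both XPath queries against the same DOM tree.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Assumes bName and code contain no single quotes.
            String base = "/A/B[@BName='" + bName + "']/C[@Code='" + code + "']";
            String irec = xpath.evaluate(base + "/IN/@irec", doc);
            String orec = xpath.evaluate(base + "/OUT/@orec", doc);
            // evaluate() yields an empty string when nothing matches.
            return !irec.isEmpty() && irec.equals(orec);
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not evaluate XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<A Name='HRMS'><B BName='IN'>"
                + "<C Code='0001'><IN irec='200'/><OUT orec='230'/></C>"
                + "<C Code='0004'><IN irec='209'/><OUT orec='209'/></C>"
                + "</B></A>";
        System.out.println(recordsMatch(xml, "IN", "0001")); // prints false
        System.out.println(recordsMatch(xml, "IN", "0004")); // prints true
    }
}
```

Whether the value lives in an attribute or a child element only changes the last step of the path (for example /IN/@irec versus /IN/irec); the sketch makes no claim about relative performance on large files.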

Which annotation should I use to get repeated tag content in JAXB?

Which annotation should I use to store the paragraph element contents in a String array? Since there are no elements inside paragraph, it is not possible to store the content even by creating a separate Paragraph class.
<letter>
<from>Hansel</from>
<to>Gretal</to>
<paragraphs>
<paragraph>
First paragraph text
</paragraph>
<paragraph>
Seconds paragraph text
</paragraph>
</paragraphs>
</letter>
I think you may use @XmlElementWrapper:
http://docs.oracle.com/javase/7/docs/api/javax/xml/bind/annotation/XmlElementWrapper.html.
Like this:
@XmlElementWrapper(name = "paragraphs")
@XmlElement(name = "paragraph")
String[] paragraphs;
See also this for an example:
http://grepcode.com/file/repo1.maven.org/maven2/de.cubeisland/messageextractor-core/2.0.0/de/cubeisland/messageextractor/extractor/java/configuration/Annotation.java#82

JAXB unmarshal - elements with minOccurs 0 which should be combined

I have the following XML file:
<?xml version="1.0" encoding="utf-8"?>
<Paragraph>
<ParaStyleName>headline_red</ParaStyleName>
<TextStyleRanges>
<TextStyleRange>
<CharStyleName>[Ohne]</CharStyleName>
<Contents>
<Content>inhalt</Content>
<Content>test text</Content>
<SpecialCharacter name="HARD_RETURN"/>
<Content> "text here</Content>
<SpecialCharacter name="DOUBLE_QUOTE_LEFT"/>
</Contents>
</TextStyleRange>
</TextStyleRanges>
</Paragraph>
From this XML I need to obtain the Content part like this:
inhalt test text HARD_RETURN "text here DOUBLE_QUOTE_LEFT
The order of the tags inside <Contents> is important to me. The problem is that the number of <Content> and <SpecialCharacter> elements is not fixed, and neither is their position.
Note: I'm using JAXB for this, and I have created model classes for Contents, Content, and SpecialCharacter. Contents has an ArrayList<Content> and an ArrayList<SpecialCharacter> as members, but with two separate lists I can't keep the original order of the tags.
Please help me with a solution for this case.
Thanks!
You are going to need to merge these two lists as follows:
@XmlElements({
@XmlElement(name="Content", type=Content.class),
@XmlElement(name="SpecialCharacter", type=SpecialCharacter.class)
})
public List<Object> getValues() {
return values;
}
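Once both element types land in a single List<Object>, the original order can be recovered by walking the list and branching on the runtime type. The following is a minimal sketch of that step only: the JAXB annotations are left out so it compiles without a JAXB dependency, the list is built by hand to stand in for the unmarshalled result, and the field names value and name are assumptions about the asker's model classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch: flattens a merged, ordered list of Content and SpecialCharacter.
class ContentsFlattener {

    // Stand-in for the asker's Content model class (field name is an assumption).
    static class Content {
        final String value;
        Content(String value) { this.value = value; }
    }

    // Stand-in for the asker's SpecialCharacter model class (field name is an assumption).
    static class SpecialCharacter {
        final String name;
        SpecialCharacter(String name) { this.name = name; }
    }

    // Joins the entries in list order, separated by single spaces.
    static String flatten(List<Object> values) {
        StringJoiner out = new StringJoiner(" ");
        for (Object v : values) {
            if (v instanceof Content) {
                out.add(((Content) v).value.trim());
            } else if (v instanceof SpecialCharacter) {
                out.add(((SpecialCharacter) v).name);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Built by hand here; with JAXB this list would come from getValues().
        List<Object> values = Arrays.<Object>asList(
                new Content("inhalt"),
                new Content("test text"),
                new SpecialCharacter("HARD_RETURN"),
                new Content(" \"text here"),
                new SpecialCharacter("DOUBLE_QUOTE_LEFT"));
        // prints inhalt test text HARD_RETURN "text here DOUBLE_QUOTE_LEFT
        System.out.println(flatten(values));
    }
}
```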

Does the org.dom4j.io.SAXReader.read(Reader reader) method preserve the order of elements and attributes of XML?

My XML file is:
<XYZ>
<A name="one">
<label>I am A one</label>
</A>
<B name="two">
<label>I am B two</label>
</B>
<A name="three">
<label>I am A three</label>
</A>
</XYZ>
My Code is:
String myXmlAsString = //Read the above xml as String
Document document = new SAXReader().read(new StringReader(myXmlAsString ));
List<Element> dataElements = document.selectNodes("/XYZ");
My Question is:
If I read my XML file with the code above, will the dataElements list returned by selectNodes(String xPathExpr) have the same order as the original XML file?
If yes, does this hold true even if the XML is deeply nested and I call selectNodes(String xPathExpr) on any Element object from this document?
XPath does not change the order of elements when returning results, so the elements are in exactly the same order as in your input XML.
Lists are ordered structures, and there is no reason for SAXReader to discard that order.
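The ordering claim can be checked with a small experiment. The sketch below uses the JDK's own DOM and XPath implementation rather than dom4j, so it only illustrates the general behaviour and does not prove anything about dom4j itself; DocumentOrderDemo and childNames are hypothetical names. Note that the expression /XYZ from the question selects only the root element, so the sketch queries /XYZ/* to get its children.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reports the child elements of <XYZ> in the order XPath returns them.
class DocumentOrderDemo {

    static List<String> childNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" matches element children only, so whitespace text nodes are ignored.
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/XYZ/*", doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                names.add(element.getTagName() + ":" + element.getAttribute("name"));
            }
            return names;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not evaluate XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<XYZ>"
                + "<A name='one'><label>I am A one</label></A>"
                + "<B name='two'><label>I am B two</label></B>"
                + "<A name='three'><label>I am A three</label></A>"
                + "</XYZ>";
        // prints [A:one, B:two, A:three]
        System.out.println(childNames(xml));
    }
}
```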

How to remove CDATA from XML in Java and do some conversion?

I am trying to create a Java servlet that will modify an existing XML document.
This is part of my original XML:
<customfieldvalues>
<div id="errorDiv" style="display:none;"/>
<![CDATA[
Vinduer, dører
]]>
</customfieldvalues>
I want to get the following result:
<customfieldvalues>
<div id="errorDiv" style="display:none;"/>
Vinduer, dører
</customfieldvalues>
I iterate over the XML structure with:
Document doc = parseXML(connection.getInputStream());
NodeList descNodes = doc.getElementsByTagName("customfieldvalues");
for (int i=0; i<descNodes.getLength();i++) {
Node node = descNodes.item(i);
// how to ?
}
So, I need to remove CDATA and convert the content.
I saw that I can use this for the conversion.
javax.xml.parsers.DocumentBuilderFactory.setCoalescing API
Specifies that the parser produced by this code will convert CDATA nodes to Text nodes and append it to the adjacent (if any) text node. By default the value of this is set to false.
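A minimal sketch of how that setting might be applied, assuming the parseXML helper from the question can be replaced by a DocumentBuilderFactory configured with setCoalescing(true). The class name CoalescingDemo and the method countCdataChildren are hypothetical; the method only counts CDATA section nodes so the effect of the flag is visible.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: shows the effect of setCoalescing on CDATA sections.
class CoalescingDemo {

    // Counts CDATA section nodes directly under the first <customfieldvalues>.
    static int countCdataChildren(String xml, boolean coalescing) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // true: CDATA sections are converted to (and merged with) plain Text nodes
            factory.setCoalescing(coalescing);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getElementsByTagName("customfieldvalues")
                    .item(0).getChildNodes();
            int count = 0;
            for (int i = 0; i < children.getLength(); i++) {
                if (children.item(i).getNodeType() == Node.CDATA_SECTION_NODE) {
                    count++;
                }
            }
            return count;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<customfieldvalues>"
                + "<div id=\"errorDiv\" style=\"display:none;\"/>"
                + "<![CDATA[ Vinduer, d\u00f8rer ]]>"
                + "</customfieldvalues>";
        System.out.println(countCdataChildren(xml, false)); // CDATA node is kept
        System.out.println(countCdataChildren(xml, true));  // CDATA node became text
    }
}
```

With coalescing enabled the text is still present in the tree, just as an ordinary Text node, so serializing the document afterwards writes it without the CDATA wrapper.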
