Parsing XML that contains XML in elements: can this be done? - java

I have a 'complex item' that is in XML, and a 'workitem' (also in XML) that contains a lot of other information. I would like the workitem to contain a string that holds the complex item's XML.
For example:
<inouts name="ClaimType" type="complex" value="<xml string here>"/>
However, with SAX and other Java parsers I cannot get this line to parse: the parser doesn't like the < or the " characters in the string. I have tried escaping them, and converting the " to '.
Is there any way around this, or will I have to come up with another solution?
Thanks

I think you'll find that the XML you're dealing with won't parse with a lot of parsers since it's invalid. If you have control over the XML, you'll at a bare minimum need to escape the attribute so it's something like:
<inouts name="ClaimType" type="complex" value="&lt;xml string here&gt;" />
Then, once you've extracted the attribute you can possibly re-parse it to treat it as XML.
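As a rough illustration of the escape-then-re-parse idea, here is a minimal sketch using only the JDK's DOM parser. The class name, the helper methods and the embedded <claim> content are made up for the example; only the inouts element and its value attribute come from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EscapedAttributeDemo {

    // Parse an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Read the escaped "value" attribute of the root element and parse it
    // as a second, independent document.
    static Document parseEmbedded(String outerXml) throws Exception {
        Document outer = parse(outerXml);
        // The parser has already turned &lt;, &gt; and &quot; back into <, > and ".
        String embedded = outer.getDocumentElement().getAttribute("value");
        return parse(embedded);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical embedded content: <claim kind="motor"><amount>100</amount></claim>
        String outerXml = "<inouts name=\"ClaimType\" type=\"complex\" "
                + "value=\"&lt;claim kind=&quot;motor&quot;&gt;&lt;amount&gt;100&lt;/amount&gt;&lt;/claim&gt;\"/>";
        Document inner = parseEmbedded(outerXml);
        System.out.println(inner.getDocumentElement().getTagName());
        System.out.println(inner.getDocumentElement().getAttribute("kind"));
    }
}
```

Note that the quotes inside the embedded XML have to be escaped as well (&quot;), since the attribute itself is delimited by double quotes.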
Alternatively, you can take one of the approaches suggested in the other answers (using CDATA sections) with some refactoring of your XML.
If you don't have control over your XML, you could try using the TagSoup library to parse it to see how you go. (Disclaimer: I've only used TagSoup for HTML, I have no idea how it'd go with non-HTML content)
(The TagSoup site actually appears to be down at the moment, but you should be able to find enough documentation on the web, and downloads via the Maven repository.)

Possibly the easiest solution would be to use a CDATA section. You could convert your example to look like this:
<inouts name="ClaimType" type="complex">
<![CDATA[
<xml string here>
]]>
</inouts>
If you have more than one attribute you want to store complex strings for, you could use multiple child elements with different names:
<inouts name="ClaimType" type="complex">
<value1>
<![CDATA[
<xml string here>
]]>
</value1>
<value2>
<![CDATA[
<xml string here>
]]>
</value2>
</inouts>
Or multiple value elements with an identifying id:
<inouts name="ClaimType" type="complex">
<value id="complexString1">
<![CDATA[
<xml string here>
]]>
</value>
<value id="complexString2">
<![CDATA[
<xml string here>
]]>
</value>
</inouts>
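Reading such CDATA children back in Java could look roughly like the sketch below. It uses the JDK's DOM API; the class and method names and the <claim> payloads are illustrative assumptions, not part of the original XML.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CdataValuesDemo {

    // Map each <value id="..."> to the text held in its CDATA section.
    static Map<String, String> readValues(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> values = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("value");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element value = (Element) nodes.item(i);
            // getTextContent() returns the CDATA text without the markers;
            // trim() drops the line breaks around the section.
            values.put(value.getAttribute("id"), value.getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical payloads inside the CDATA sections.
        String xml = "<inouts name=\"ClaimType\" type=\"complex\">\n"
                + "<value id=\"complexString1\">\n<![CDATA[\n<claim><amount>100</amount></claim>\n]]>\n</value>\n"
                + "<value id=\"complexString2\">\n<![CDATA[\n<claim><amount>250</amount></claim>\n]]>\n</value>\n"
                + "</inouts>";
        readValues(xml).forEach((id, text) -> System.out.println(id + " -> " + text));
    }
}
```

Each extracted string can then be handed to a second parser if it needs to be treated as XML. One caveat: the embedded text must not itself contain the sequence ]]>, because that would end the CDATA section early.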

CDATA section or escaping
NB There is a big difference between escaping and encoding, which some other posters have referred to. Be careful of confusing the two.

I'm not sure how it works for attributes; if escaping (< as &lt; and > as &gt;) does not work, then I don't know.
If it were an inner tag: you could use the Xml Any mechanism (never used it myself) or declare it in a CDATA section.

you are http://www.doingitwrong.com/
If inouts/@value really is tree-structured (i.e. XML) then it shouldn't be an attribute, it should be a child element:
<inout name="ClaimType" type="complex">
<value>
<some-arbitrary>
<xml-stuff/>
</some-arbitrary>
</value>
</inout>
If it is not, in fact, guaranteed to be well-formed XML, but just sort of looks like it because you put some pointy brackets in it, then you should ask yourself if there isn't some better way to solve this problem. That failing, use <![CDATA[ as some have already suggested.

Related

How to use xpath in camel when the outermost element has an xmlns attribute?

I am having some trouble using XPath to extract the "Payload" children below with apache-camel. I use the XPath below in my route for both example documents. For the first one it returns SomeElement and SomeOtherElement as expected, but for the second one it does not seem to match anything at all.
xpath("//Payload/*")
This example xml parses just fine.
<Message>
<Payload>
<SomeElement />
<SomeOtherElement />
</Payload>
</Message>
This example xml does not parse.
<Message xmlns="http://www.fake.com/Message/1">
<Payload>
<SomeElement />
<SomeOtherElement />
</Payload>
</Message>
I found a similar question about xml and xpath, but it deals with C# and is not a camel solution.
Any idea how to solve this using apache-camel?
Your second example XML specifies a default namespace, xmlns="http://www.fake.com/Message/1", so your XPath expression will not match, because it specifies no namespace.
See http://camel.apache.org/xpath.html#XPath-Namespaces on how to specify a namespace.
You would need something like
Namespaces ns = new Namespaces("fk", "http://www.fake.com/Message/1");
xpath("//fk:Payload/*", ns)
I'm not familiar with Apache-Camel, this was just a result of some quick googling.
An alternative may be to just change your XPath to something like
xpath("//*[local-name()='Payload']/*")
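To see the effect outside Camel, the following sketch evaluates the three expressions with the JDK's own javax.xml.xpath API. This is not Camel code; the class name, the count helper and the fk prefix binding are assumptions made for the illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceXPathDemo {

    static final String NS = "http://www.fake.com/Message/1";

    // Parse with namespace awareness switched on (it is off by default).
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Count the nodes an expression selects, with the prefix "fk" bound to NS.
    static int count(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "fk".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String uri) {
                return NS.equals(uri) ? "fk" : null;
            }
            @Override
            public Iterator<String> getPrefixes(String uri) {
                return NS.equals(uri)
                        ? Collections.singletonList("fk").iterator()
                        : Collections.<String>emptyIterator();
            }
        });
        NodeList nodes = (NodeList) xpath.evaluate(expression, parse(xml), XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Message xmlns=\"http://www.fake.com/Message/1\">"
                + "<Payload><SomeElement /><SomeOtherElement /></Payload></Message>";
        System.out.println(count(xml, "//Payload/*"));                   // no namespace: no match
        System.out.println(count(xml, "//fk:Payload/*"));                // prefixed: matches
        System.out.println(count(xml, "//*[local-name()='Payload']/*")); // namespace ignored: matches
    }
}
```

The prefix in the expression does not have to match anything in the document; only the namespace URI it is bound to matters.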
Good luck.

Remove <![CDATA[ tag from XML webservice responses

I am using the format below for my web service responses.
<Name>abc</Name>
<Detail>
<RESPONSE>
<Age>20</Age>
<Address>blahblah</Address>
<Mobile>12345</Mobile>
</RESPONSE>
</Detail>
Due to the requirements, I need to return XML-formatted data inside the <Detail></Detail> tag.
In my Java class, I serialize the data to XML using XStream and put it inside the Detail tag.
But when I test using SoapUI, I get an extra <![CDATA[<RESPONSE>..</RESPONSE>]]> inside the Detail tag.
How can I avoid having the CDATA tag in the XML response?
<![CDATA[......]]> tells the parser not to interpret its content as XML markup and to treat it as normal text, which is called character data. So the parser won't look for any XML meaning in it.
As Dave Newton and kshitij said, it will be removed automatically when the response is converted into an object.
If you are not going to parse the response as it is, there is no need to worry about it.
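A small sketch of that point, using the JDK's DOM parser: the same Detail content written once as a CDATA section and once with escaped characters yields the same text after parsing. The Root wrapper element and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataIsTextDemo {

    // Return the character data held by the first <Detail> element.
    static String detailText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("Detail").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Root is a hypothetical wrapper so that each sample is a complete document.
        String withCdata = "<Root><Detail><![CDATA[<RESPONSE><Age>20</Age></RESPONSE>]]></Detail></Root>";
        String escaped = "<Root><Detail>&lt;RESPONSE&gt;&lt;Age&gt;20&lt;/Age&gt;&lt;/RESPONSE&gt;</Detail></Root>";

        // Both forms carry the same text; the CDATA markers are only a
        // serialization detail and are gone once the document is parsed.
        System.out.println(detailText(withCdata));
        System.out.println(detailText(withCdata).equals(detailText(escaped)));
    }
}
```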

XML parsing: Retrieve multiple rows in XML using Digester

While parsing an XML file like the one below, I want to get the list of telephone numbers for one particular id. I am using Digester to do this, but I don't understand how to add the call methods or create-object rules. Can anyone help me with this? My XML file contains thousands of entries of these types:
<?xml version='1.0' encoding='utf-8'?>
<address-book>
<contact type="individual">
<id>50</id>
<city>New York</city>
<province>NY</province>
<postalcode>10013</postalcode>
<country>USA</country>
<address>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
</address>
</contact>
<contact type="business">
<id>52</id>
<city>Zagreb</city>
<province></province>
<postalcode>10000</postalcode>
<country>Croatia</country>
<address>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
</address>
</contact>
Also, how should I stop the parsing once I have found the required id?
Although the question is specific to apache-commons-digester, this can be solved with the libraries already available in the standard XML toolset, namely a SAX parser coupled with an XPath search. If you know what you are searching for, an XPath query can find the data relatively efficiently instead of brute-forcing through it.
Otherwise, if you are traversing the entire data set for indexing or other purposes, I would again recommend a simple SAX parser that loops through the elements (possibly via an //MyElement type of XPath query) and passes each value to a function for indexing or whatever operation is needed. apache-commons-digester may be overly complicated and/or slow for what you need.
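A minimal sketch of the XPath approach with the JDK's javax.xml.xpath API is shown below. The class and method names are made up, and the lookup assumes numeric ids as in the sample. It loads the whole document into memory first, so it does not stop parsing early; for very large files a streaming parser would be the better fit.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TelephoneLookupDemo {

    // Return every telephone number listed under the contact with the given id.
    static List<String> telephonesFor(String xml, String id) throws Exception {
        // Assumption: ids are numeric. This also keeps the value safe to
        // splice into the XPath expression.
        if (!id.matches("\\d+")) {
            throw new IllegalArgumentException("id must be numeric: " + id);
        }
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "/address-book/contact[id='" + id + "']/address/telephone";
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

        List<String> telephones = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            telephones.add(nodes.item(i).getTextContent());
        }
        return telephones;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version='1.0' encoding='utf-8'?>"
                + "<address-book>"
                + "<contact type=\"individual\"><id>50</id><address>"
                + "<telephone>1-212-345-6789</telephone><telephone>1-212-345-6790</telephone>"
                + "</address></contact>"
                + "<contact type=\"business\"><id>52</id><address>"
                + "<telephone>1-212-345-0000</telephone>"
                + "</address></contact>"
                + "</address-book>";
        System.out.println(telephonesFor(xml, "50"));
    }
}
```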

XStream doesn't show CData tags

When I read an XML file with XStream, it doesn't show the <![CDATA[ and ]]> tags.
I'd like XStream to show them.
For example:
This is a part of "test.xml"
<![CDATA[<b>]]>
If I show it in a browser, the browser shows it correctly:
<![CDATA[ <b> ]]>
But when I read and show XML with XStream I see only:
<b>
If I'm not mistaken, each element should have a name and a value (if they are being read in as XppDom objects). I'm guessing what you're looking at is the value. With the CDATA section it might be a little different, because it is unparsed data, so the name may be "!CDATA", or it may not have one at all. In the normal case, if you have <node attr1='val1'> text </node>, then once it is read in, calling .getName() will return "node", .getValue() will return "text", and .getAttribute("attr1") will return "val1".
If you want to print everything with its tags, you could write a method String formatXppDom(XppDom elem) that builds a printable string including the tags.

Doxygen doesn't parse tags normally. It converts tags like <code>, <value> and \s\p into <computeroutput></computeroutput>

I am trying to generate XML from Java source code using Doxygen. Doxygen doesn't parse tags like
<code>, <value> and \s\p correctly. It generates XML with incorrect values.
For example:
The <code>0x0</code> tag is converted into <computeroutput>0x0</computeroutput>.
<para>
<computeroutput>This is code tag</computeroutput>
<value2>test value4</value2> </meta> </meta> <gid>000001</gid> <read>1</read>
</parameter> </component> </algebra>
</para>
The same happens for other tags like <value> and \s\p.
Why does this happen?
Please let me know which other tags produce the same output
and how to resolve it.
"correctly" is a bit of a misnomer when referring to xml, unless it weren't structured correctly, but I think you're referring to the tags.
If you don't like the output from doxygen why not write an xslt to make it whatever you want? I'm sure there are many doxygen.xml --> myflavor.xml transforms out there that you could use as a starting point.
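As a starting point, here is a minimal sketch of such a transform, run from Java with the JDK's javax.xml.transform API. The stylesheet copies everything unchanged except that it renames computeroutput elements. The class name and the choice of code as the target element name are assumptions for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RenameComputerOutputDemo {

    // Identity transform plus one override that renames <computeroutput>.
    static final String XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:template match=\"@*|node()\">"
          + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
          + "</xsl:template>"
          + "<xsl:template match=\"computeroutput\">"
          + "<code><xsl:apply-templates select=\"@*|node()\"/></code>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    // Apply the stylesheet to an XML string and return the result as a string.
    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String doxygenXml = "<para><computeroutput>0x0</computeroutput></para>";
        System.out.println(transform(doxygenXml));
    }
}
```

Further templates can be added in the same way for any other elements whose Doxygen output you want to change.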
