XML parsing: Retrieve multiple rows in XML using Digester - Java

While parsing an XML file like the one below, I want to get the list of telephone numbers for one particular id. I am using Digester to do this, but I do not understand how to add the call-method or object-create rules. Can anyone help me with this? My XML file contains thousands of entries of these types:
<?xml version='1.0' encoding='utf-8'?>
<address-book>
<contact type="individual">
<id>50</id>
<city>New York</city>
<province>NY</province>
<postalcode>10013</postalcode>
<country>USA</country>
<address>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
</address>
</contact>
<contact type="business">
<id>52</id>
<city>Zagreb</city>
<province></province>
<postalcode>10000</postalcode>
<country>Croatia</country>
<address>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
<telephone>1-212-345-6789</telephone>
</address>
</contact>
Also, how should I stop the parsing once I have found the required id?

Although the question is specific to Apache Commons Digester, this can be solved with the standard XML APIs already available in Java, namely a SAX parser or an XPath search. If you know what you are looking for, an XPath query can find the data without hand-written traversal code; note that XPath is evaluated against an in-memory tree, so the whole document is loaded first. If you instead need to traverse the entire data set, for indexing or other purposes, I would recommend a simple SAX parser (or an XPath query of the form //MyElement) that visits the elements and passes each value to a function for indexing or whatever other operation is needed. Digester may be overly complicated and/or slow for what is needed.
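As a rough sketch of the SAX route, the handler below collects the telephone numbers of one contact and aborts the parse as soon as that contact is closed. The names TelephoneFinder and telephonesFor are made up for illustration, and the code assumes that <id> appears before <address> inside each <contact>, as in the sample above. Throwing a SAXException from a callback is a common way to stop a SAX parse early; it is not the only one.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; not part of Digester or any library.
public class TelephoneFinder {

    private static final class Handler extends DefaultHandler {
        private final String targetId;
        private final List<String> phones = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean matched; // true while inside the contact we want
        private boolean done;    // true once that contact has been fully read

        Handler(String targetId) {
            this.targetId = targetId;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0);
            if ("contact".equals(qName)) {
                matched = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element, so accumulate.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            String value = text.toString().trim();
            if ("id".equals(qName)) {
                matched = targetId.equals(value);
            } else if ("telephone".equals(qName) && matched) {
                phones.add(value);
            } else if ("contact".equals(qName) && matched) {
                done = true;
                // Abort: nothing after this contact is of interest.
                throw new SAXException("target contact found, stopping");
            }
            text.setLength(0);
        }
    }

    public static List<String> telephonesFor(InputSource source, String id)
            throws ParserConfigurationException, SAXException, IOException {
        Handler handler = new Handler(id);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (SAXException e) {
            if (!handler.done) {
                throw e; // a real parse error, not our deliberate stop
            }
        }
        return handler.phones;
    }
}
```

Calling TelephoneFinder.telephonesFor(new InputSource(new FileReader("address-book.xml")), "50") would return the numbers for contact 50 (the file name is only an example). If the id is never found, the whole file is read and the list comes back empty.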

Related

Java - Parse XML dynamically and insert elements and values into a stack or deque

I have a requirement to take any XML document, parse it dynamically, and store it in a stack/deque for further processing.
Can someone recommend a good way to parse XML dynamically in Java?
Consider this XML:
<Response>
<Stock>
<RecordID>130</RecordID>
<SegmentLength>0023</SegmentLength>
<Account>
<Number>233342</Number>
<Type>P</Type>
</Account>
</Stock>
<Stock>
<RecordID>030</RecordID>
<SegmentLength>1023</SegmentLength>
<Account>
<Number>255673</Number>
<Type>P</Type>
</Account>
</Stock>
</Response>
How can I write a method that parses this XML dynamically and pushes the elements onto a stack/deque?
I cannot use DOM, as DOM requires me to provide the element tag while parsing. The program should be able to accept any XML and parse it dynamically.

DOM xml parsing difficulties in java

<XML>
<log>
<date>20022014</date>
<time>2323</time>
<schools>
<school name="ahss"/>
<student>shiva</student>
<class>B</class>
</schools>
</log>
<log>...</log>
</XML>
I need to parse this XML format using DOM. I have tried many alternatives, but I couldn't get it to work.
Can anyone lend a hand?

Remove <![CDATA[ tag from XML web service responses

I am using the format below for my web service responses.
<Name>abc</Name>
<Detail>
<RESPONSE>
<Age>20</Age>
<Address>blahblah</Address>
<Mobile>12345</Mobile>
</RESPONSE>
</Detail>
Due to the requirements, I need to return XML-formatted data inside the <Detail></Detail> tag.
In my Java class, I use XStream to format the data as XML and put it inside the Detail tag.
But when I test using SoapUI, I get an extra <![CDATA[<RESPONSE>..</RESPONSE>]]> inside the Detail tag.
How can I avoid having those CDATA tags in the XML response?
<![CDATA[......]]> tells the parser not to interpret its content as XML markup and to treat it as plain text, which is called character data. The parser won't look for any XML meaning in it.
As Dave Newton and kshitij said, it will automatically be removed when the response is converted into an object.
If you are not going to parse the content as XML, there is nothing to worry about.
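To illustrate the point, here is a small sketch using the JDK's DOM parser. The class name CdataDemo and the <Result> wrapper element are invented for the example (the response in the question has no single root element shown). Reading the text of Detail returns the character data without the CDATA markers.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: shows that CDATA markers are not part of the parsed text.
public class CdataDemo {

    public static String detailText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // getTextContent() concatenates the text and CDATA children of the element.
        return doc.getElementsByTagName("Detail").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Result><Name>abc</Name>"
                + "<Detail><![CDATA[<RESPONSE><Age>20</Age></RESPONSE>]]></Detail>"
                + "</Result>";
        System.out.println(detailText(xml));
    }
}
```

The returned string is the inner XML as plain text, so a client that needs it as a tree would have to parse that string a second time.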

How to get data with tag name & their values inside parent tag in xml

I am working in Java. I am parsing an XML file and getting the tag values, which works. My XML file is as follows:
<DOC>
<STUDENT>
<ID>1</ID>
<NAME>DAN</NAME>
<ADDRESS>U.K</ADDRESS>
</STUDENT>
<STUDENT>
<ID>2</ID>
<NAME>JACK</NAME>
<ADDRESS>U.S</ADDRESS>
</STUDENT>
</DOC>
My question: I want to fetch the data inside <DOC>....</DOC> with the tag names as well as the values. That is, I want the data as follows:
"<STUDENT>
<ID>1</ID>
<NAME>DAN</NAME>
<ADDRESS>U.K</ADDRESS>
</STUDENT>
<STUDENT>
<ID>2</ID>
<NAME>JACK</NAME>
<ADDRESS>U.S</ADDRESS>
</STUDENT>"
Please guide me on how to do it.
The most common approaches in Java are to use either a SAX or a DOM parsing library.
If you look them up you should find loads of documentation and tutorials about them.
DOM is normally the easiest to use, as it stores the entire XML in memory and you can then access any tag. However, this is less performant and can be problematic with very large XML documents. SAX requires more work, but it reads the XML and processes each tag as it gets to it.
Both are able to do what you need, though.
Take a look at the SAX parser.
This link might be helpful too: http://www.mkyong.com/java/how-to-read-xml-file-in-java-sax-parser/
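For the DOM route, one possible sketch is to parse the document and then serialize each child element of the root back to a string with the JDK's identity Transformer. The class and method names (InnerXml, innerXml) are hypothetical, and the sketch deliberately skips text nodes, so whitespace between the STUDENT elements is not preserved.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: returns the markup of all child elements of the root element.
public class InnerXml {

    public static String innerXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // An identity transformer copies DOM nodes to text unchanged.
        Transformer copy = TransformerFactory.newInstance().newTransformer();
        copy.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Assumption for this sketch: only element children matter.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                copy.transform(new DOMSource(child), new StreamResult(out));
            }
        }
        return out.toString();
    }
}
```

Passing the DOC document from the question to innerXml should give back the two STUDENT elements with their tags, without the surrounding DOC element.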

Parsing XML that contains XML in elements: can this be done?

I have a 'complex item' that is in XML, and a 'workitem' (also in XML) that contains lots of other info. I would like the workitem to contain a string that holds the complex item's XML.
For example:
<inouts name="ClaimType" type="complex" value="<xml string here>"/>
However, with SAX and other Java parsers I cannot get this line to parse. The parser doesn't like the < or the " characters in the string. I have tried escaping, and converting the " to '.
Is there any way around this at all? Or will I have to come up with another solution?
Thanks
I think you'll find that the XML you're dealing with won't parse with a lot of parsers, since it is not well-formed. If you have control over the XML, you'll at a bare minimum need to escape the attribute value so it's something like:
<inouts name="ClaimType" type="complex" value="&lt;xml string here&gt;" />
Then, once you've extracted the attribute you can possibly re-parse it to treat it as XML.
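A minimal sketch of that escape-then-re-parse idea, assuming you build the outer XML yourself. The class NestedXmlAttribute and its methods are hypothetical; the escaping is done by hand for the five predefined XML entities, and a real application might prefer to let an XML writer do it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: stores an XML string in an attribute and reads it back.
public class NestedXmlAttribute {

    /** Escapes the characters that are not allowed verbatim in an attribute value. */
    public static String escapeAttribute(String s) {
        return s.replace("&", "&amp;") // must come first
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Builds the outer element with the inner XML escaped inside the value attribute. */
    public static String wrap(String innerXml) {
        return "<inouts name=\"ClaimType\" type=\"complex\" value=\""
                + escapeAttribute(innerXml) + "\"/>";
    }

    /** Parses the outer XML, extracts the attribute, and parses that string as XML. */
    public static Document unwrap(String outerXml) throws Exception {
        // The parser un-escapes the attribute value for us.
        String inner = parse(outerXml).getDocumentElement().getAttribute("value");
        return parse(inner);
    }
}
```

One caveat with this design: XML parsers normalize literal line breaks and tabs inside attribute values to spaces, so a multi-line inner document will not round-trip exactly unless those characters are written as character references.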
Alternatively, you can take one of the approaches from the other answers (using CDATA sections) with some refactoring of your XML.
If you don't have control over your XML, you could try using the TagSoup library to parse it and see how you go. (Disclaimer: I've only used TagSoup for HTML; I have no idea how it would cope with non-HTML content.)
(The TagSoup site actually appears to be down at the moment, but you should be able to find enough documentation on the web, and downloads via the Maven repository.)
Possibly the easiest solution would be to use a CDATA section. You could convert your example to look like this:
<inouts name="ClaimType" type="complex">
<![CDATA[
<xml string here>
]]>
</inouts>
If you have more than one attribute you want to store complex strings for, you could use multiple child elements with different names:
<inouts name="ClaimType" type="complex">
<value1>
<![CDATA[
<xml string here>
]]>
</value1>
<value2>
<![CDATA[
<xml string here>
]]>
</value2>
</inouts>
Or multiple value elements with an identifying id:
<inouts name="ClaimType" type="complex">
<value id="complexString1">
<![CDATA[
<xml string here>
]]>
</value>
<value id="complexString2">
<![CDATA[
<xml string here>
]]>
</value>
</inouts>
CDATA section or escaping
NB There is a big difference between escaping and encoding, which some other posters have referred to. Be careful of confusing the two.
I'm not sure how it works for attributes; if escaping (< as &lt; and > as &gt;) does not work, then I don't know.
If it were an inner tag, you could use the XML Any mechanism (never used it myself) or declare it in a CDATA section.
you are http://www.doingitwrong.com/
If inouts/@value really is tree-structured (i.e. XML), then it shouldn't be an attribute; it should be a child element:
<inout name="ClaimType" type="complex">
<value>
<some-arbitrary>
<xml-stuff/>
</some-arbitrary>
</value>
</inout>
If it is not, in fact, guaranteed to be well-formed XML, but just sort of looks like it because you put some pointy brackets in it, then you should ask yourself if there isn't some better way to solve this problem. That failing, use <![CDATA[ as some have already suggested.
