Parsing XML-response and store to a container - java

I'm making HTTP queries to a website, and the response I get is in XML format. I want to make multiple queries, parse the data, and keep the results in an ArrayList or some other container so I can easily access each query's data. I've spent some time playing with SAX for parsing the response. The examples I read used XML like this:
<?xml version="1.0"?>
<company>
<staff>
<firstname>yong</firstname>
<lastname>mook kim</lastname>
<nickname>mkyong</nickname>
<salary>100000</salary>
</staff>
<staff>
<firstname>low</firstname>
<lastname>yin fong</lastname>
<nickname>fong fong</nickname>
<salary>200000</salary>
</staff>
I managed to parse a format like this pretty easily just by following examples on the internet.
But in my case I need to parse data like this:
<?xml version="1.0" encoding="UTF-8"?>
<root response="True">
<movie title="A Good Marriage" year="2014" rated="R" released="03 Oct 2014" runtime="102 min" genre="Thriller" director="Peter Askin" writer="Stephen King (short story)" actors="Joan Allen, Anthony LaPaglia, Stephen Lang, Cara Buono" plot="After 25 years of a good marriage, what will Darcy do once she discovers her husband's sinister secret?" language="English" country="USA" awards="N/A" poster="http://ia.media-imdb.com/images/M/MV5BMTk3MjY2ODgwNl5BMl5BanBnXkFtZTgwMTQ0Mjg0MjE#._V1_SX300.jpg" metascore="43" imdbRating="5.1" imdbVotes="2,016" imdbID="tt2180994" type="movie"/>
</root>
From this response I want to parse all the attributes into some container so they're easy to use. I'm still learning, so maybe someone can point me in the right direction? :) Making the queries is not a problem, but parsing and storing the data is.
EDIT: To be more clear, my problem is that the response from the server isn't in the neat XML format of the first example. As you can see, it looks like this:
<movie title="A Good Marriage" year="2014" rated="R" released="03 Oct 2014" runtime="102 min" genre="Thriller" director="Peter Askin" writer="Stephen King (short story)" actors="Joan Allen, Anthony LaPaglia, Stephen Lang, Cara Buono" plot="After 25 years of a good marriage, what will Darcy do once she discovers her husband's sinister secret?" language="English" country="USA" awards="N/A" poster="http://ia.media-imdb.com/images/M/MV5BMTk3MjY2ODgwNl5BMl5BanBnXkFtZTgwMTQ0Mjg0MjE#._V1_SX300.jpg" metascore="43" imdbRating="5.1" imdbVotes="2,016" imdbID="tt2180994" type="movie"/>
And when I run my code, it doesn't print out anything but when I modify XML a bit manually like this:
<?xml version="1.0" encoding="UTF-8"?>
<root response="True">
<movie> title="Oblivion" year="2013" rated="PG-13" released="19 Apr 2013" runtime="124 min" genre="Action, Adventure, Mystery" director="Joseph Kosinski" writer="Karl Gajdusek (screenplay), Michael Arndt (screenplay), Joseph Kosinski (graphic novel original story)" actors="Tom Cruise, Morgan Freeman, Olga Kurylenko, Andrea Riseborough" plot="A veteran assigned to extract Earth's remaining resources begins to question what he knows about his mission and himself." language="English" country="USA" awards="10 nominations." poster="http://ia.media-imdb.com/images/M/MV5BMTQwMDY0MTA4MF5BMl5BanBnXkFtZTcwNzI3MDgxOQ##._V1_SX300.jpg" metascore="54" imdbRating="7.0" imdbVotes="307,845" imdbID="tt1483013" type="movie"/>
</movie>
</root>
So I added ending tag > for the movie-element and ending tag </movie> to the end, my program prints it like:
Movie : title="Oblivion" year="2013" rated="PG-13" released="19 Apr 2013" runtime="124 min" genre="Action, Adventure, Mystery" director="Joseph Kosinski" writer="Karl Gajdusek (screenplay), Michael Arndt (screenplay), Joseph Kosinski (graphic novel original story)" actors="Tom Cruise, Morgan Freeman, Olga Kurylenko, Andrea Riseborough" plot="A veteran assigned to extract Earth's remaining resources begins to question what he knows about his mission and himself." language="English" country="USA" awards="10 nominations." poster="http://ia.media-imdb.com/images/M/MV5BMTQwMDY0MTA4MF5BMl5BanBnXkFtZTcwNzI3MDgxOQ##._V1_SX300.jpg" metascore="54" imdbRating="7.0" imdbVotes="307,845" imdbID="tt1483013" type="movie"/>
So basically the code I'm using at the moment reads everything between <movie> and </movie>. The problem is that the original response from the server leaves the movie tag open like this: <movie title="Oblivion"... and doesn't have a </movie> tag either.
I've been struggling with this for a long time, hopefully someone understands my confusing explanation! At the moment my parser code looks like this:
public void getXml(){
try {
// obtain and configure a SAX based parser
SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
// obtain object for SAX parser
SAXParser saxParser = saxParserFactory.newSAXParser();
// default handler for SAX handler class
// all three methods are written in handler's body
DefaultHandler defaultHandler = new DefaultHandler(){
String movieTag="close";
// this method is called every time the parser gets an open tag '<'
// identifies which tag is being open at time by assigning an open flag
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if(qName.equalsIgnoreCase("MOVIE")) {
movieTag = "open";
}
}
// prints data stored in between '<' and '>' tags
public void characters(char ch[], int start, int length)
throws SAXException {
if(movieTag.equals("open")) {
System.out.println("Movie : " + new String(ch, start, length));
}
}
// calls by the parser whenever '>' end tag is found in xml
// makes tags flag to 'close'
public void endElement(String uri, String localName, String qName)
throws SAXException {
if(qName.equalsIgnoreCase("MOVIE")) {
movieTag = "close";
}
}
};
// parse the XML specified in the given path and uses supplied
// handler to parse the document
// this calls startElement(), endElement() and character() methods
// accordingly
saxParser.parse("xml/testi.xml", defaultHandler);
} catch (Exception e) {
e.printStackTrace();
}
}
Please anyone, help is greatly appreciated..

You can still use a SAX parser, which you've been learning. You didn't mention which parser you're using. I use Xerces (from Apache.org).
What you might want to do is implement a class that extends DefaultHandler. If you're using Eclipse as your IDE, you can have Eclipse implement stubs for all the methods from DefaultHandler, then add debug output to each of them to get a better feel for what happens.
But the important method is this:
public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException
All your fields (title, year, rated, etc.) will be available in the attributes argument.
For your response, what you'll get is:
- A call to startElement for the <root> element
- A call to startElement for the <movie> element
Plus other calls you don't care about. So once you understand what you're doing, you can delete the methods that are nothing but debug statements, if you want.
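To make the idea above concrete, here is a minimal sketch of a handler that copies the attributes of each <movie> element into a map and keeps the maps in a list. The class and method names (MovieAttributeParser, MovieHandler, parse) are made up for illustration, and the sample XML is a shortened version of the response in the question. It reads from a String, but the same handler should work with whatever stream your HTTP client gives you.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of any library.
class MovieAttributeParser {

    /** Collects the attributes of every <movie> element, one map per movie. */
    static class MovieHandler extends DefaultHandler {
        final List<Map<String, String>> movies = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            if (!"movie".equalsIgnoreCase(qName)) {
                return; // skip <root> and any other element
            }
            Map<String, String> movie = new LinkedHashMap<>();
            for (int i = 0; i < attributes.getLength(); i++) {
                // attribute name -> attribute value, e.g. "title" -> "A Good Marriage"
                movie.put(attributes.getQName(i), attributes.getValue(i));
            }
            movies.add(movie);
        }
    }

    /** Parses one XML response and returns the movies found in it. */
    static List<Map<String, String>> parse(String xml) throws Exception {
        MovieHandler handler = new MovieHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.movies;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root response=\"True\">"
                + "<movie title=\"A Good Marriage\" year=\"2014\" imdbID=\"tt2180994\"/>"
                + "</root>";
        for (Map<String, String> movie : parse(xml)) {
            System.out.println(movie.get("title") + " (" + movie.get("year") + ")");
        }
    }
}
```

For several queries you could call parse once per response and add the results to one big list. Note that characters() is never involved here, because a self-closing <movie .../> element has no text content.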

Related

Java - attributevalue from parents child

I'm currently working on a small weather API (From YR.NO) in Java.
The API is in XML as shown below:
NOTE: The response contains several of these blocks, each with a different time on it.
<time datatype="forecast" from="2016-09-08T21:00:00Z" to="2016-09-08T21:00:00Z">
<location altitude="47" latitude="59.3293235" longitude="18.0685808">
<temperature id="TTT" unit="celsius" value="12.0"/>
<windDirection id="dd" deg="121.8" name="SE"/>
<windSpeed id="ff" mps="2.2" beaufort="2" name="Svak vind"/>
<windGust id="ff_gust" mps="3.7"/>
<humidity value="80.5" unit="percent"/>
<pressure id="pr" unit="hPa" value="1016.0"/>
<cloudiness id="NN" percent="51.6"/>
<fog id="FOG" percent="-0.0"/>
<lowClouds id="LOW" percent="51.2"/>
<mediumClouds id="MEDIUM" percent="0.0"/>
<highClouds id="HIGH" percent="0.8"/>
<dewpointTemperature id="TD" unit="celsius" value="8.8"/>
</location>
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException
{
for (int i = 0; i < attributes.getLength(); i++)
{
String attributeName = attributes.getLocalName(i);
String attributeValue = attributes.getValue(i);
System.out.println(attributes.getLocalName(i) + " : " + attributes.getValue(i));
//if(attributeValue.toLowerCase().indexOf(timeCheck.toLowerCase()) != -1)
// {
// System.out.println("Temperature: " + attributes.getValue(i));
// }
}
With this I can easily display all the names with their values, but I can't really figure out how to control it.
What I have now is a String that saves the user-input like:
String timeCheck = "T"+timeUserInput;
(If I input 15, then timeCheck becomes "T15")
and then I check whether some value contains this input, for example "T21", with this inside the for-loop:
if(attributeValue.toLowerCase().indexOf(timeCheck.toLowerCase()) != -1)
and if that's true I want to print the value of temperature, which in this case is "12.0", but I can't seem to find an easy way of doing it. I can print ALL the temperature values, but I ONLY want the temperature value for the right time.
I tried my best to explain my issue; I only recently started with Java. Thanks in advance! If you need an explanation of anything, just tell me and I'll try.
For this purpose I'd recommend the XPath API; XPath is a language designed specifically for querying XML. The method would look like this:
public String getTemperature(org.w3c.dom.Document doc, String timeCheck)
        throws javax.xml.xpath.XPathExpressionException
{
    javax.xml.xpath.XPathFactory xpathFactory = javax.xml.xpath.XPathFactory.newInstance();
    javax.xml.xpath.XPath xpath = xpathFactory.newXPath();
    // @from and @value select attributes; contains() matches e.g. "T21" inside the from timestamp
    return (String) xpath.evaluate("product/time[contains(@from,'" + timeCheck + "')]/location/temperature/@value", doc, javax.xml.xpath.XPathConstants.STRING);
}
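If you want to try the XPath approach without a live request, a self-contained sketch could look like the following. The class name TemperatureLookup, the <product> wrapper element, and the second <time> block with value 10.5 are assumptions made up for the demo. The expression starts with //time so that it does not depend on what the real root element is called.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML below is a trimmed, partly invented sample.
class TemperatureLookup {

    /** Returns the temperature value for the first <time> whose from attribute contains timeCheck, or "" if none matches. */
    static String temperatureAt(Document doc, String timeCheck) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "//time[contains(@from,'" + timeCheck + "')]"
                + "/location/temperature/@value";
        return xpath.evaluate(expression, doc);
    }

    /** Builds a DOM document from an XML string. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<product>"
                + "<time from=\"2016-09-08T21:00:00Z\" to=\"2016-09-08T21:00:00Z\">"
                + "<location><temperature id=\"TTT\" unit=\"celsius\" value=\"12.0\"/></location>"
                + "</time>"
                + "<time from=\"2016-09-09T00:00:00Z\" to=\"2016-09-09T00:00:00Z\">"
                + "<location><temperature id=\"TTT\" unit=\"celsius\" value=\"10.5\"/></location>"
                + "</time>"
                + "</product>";
        System.out.println(temperatureAt(parse(xml), "T21")); // prints 12.0
    }
}
```

Because timeCheck is concatenated into the expression, this only behaves if the input contains no quote characters, so validate the user input (for example, digits only) before building the string.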

COSM JSON Parser Error on .POST

I'm using jpachube and am running into problems with .POST on createDatastream. I am getting POST error 400, and the following details from COSM's debug tool:
{"title":"JSON Parser Error","errors":"lexical error: invalid char in json text. <? xmlversion=\"1.0\"encoding=\"U"}
My XML request body from the COSM debug tool is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<eeml xmlns="http://www.eeml.org/xsd/005"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="5" xsi:schemaLocation="http://www.eeml.org/xsd/005 http://www.eeml.org/xsd/005/005.xsd"><environment><data id="0">
<tag>CPU</tag>
<current_value>0.0</current_value>
</data>
</environment>
</eeml>
COSM's API documentation for what the xml request body should look like is as follows:
<eeml xmlns="http://www.eeml.org/xsd/0.5.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.5.1" xsi:schemaLocation="http://www.eeml.org/xsd/0.5.1 http://www.eeml.org/xsd/0.5.1/0.5.1.xsd">
<environment>
<data id="23">
<tag>apple</tag>
<tag>jag</tag>
<tag>tag</tag>
<tag>lag</tag>
<current_value>11</current_value>
<max_value>211.0</max_value>
<min_value>7.0</min_value>
<unit type="conversionBasedUnits" symbol="symbol">label</unit>
</data>
</environment>
The only difference I found was the version #, but I made that switch in the code already and got the same error.
I thought v2 of the COSM API was set up so that XML and JSON are interchangeable, but it converts everything to JSON.
The error is coming from this method call in Pachube.java
public boolean createDatastream(int feed, String s) throws PachubeException {
HttpRequest hr = new HttpRequest("http://api.cosm.com/v2/feeds/"
+ feed + "/datastreams/");
hr.setMethod(HttpMethod.POST);
hr.addHeaderItem("X-PachubeApiKey", this.API_KEY);
hr.setBody(s);
HttpResponse g = this.client.send(hr);
if (g.getHeaderItem("Status").equals("HTTP/1.1 201 Created")) {
return true;
} else {
throw new PachubeException(g.getHeaderItem("Status"));
}
}
Any input appreciated.
Day two...
Modified the createDatastream method using input from bjpirt (much thanks). Method looks like this
public boolean createDatastream(int feed, String s) throws PachubeException {
HttpRequest hr = new HttpRequest("http://api.cosm.com/v2/feeds/"
+ feed + "/datastreams.xml");
hr.setMethod(HttpMethod.POST);
hr.addHeaderItem("X-PachubeApiKey", this.API_KEY);
hr.addHeaderItem("Content-Type:", "application/xml");
hr.setBody(s);
HttpResponse g = this.client.send(hr);
if (g.getHeaderItem("Status").equals("HTTP/1.1 201 Created")) {
return true;
} else {
Log.d("create data stream", "prob");
throw new PachubeException(g.getHeaderItem("Status"));
}
}
This throws the following error for .POST on the COSM debug tool (error code 422):
<?xml version="1.0" encoding="UTF-8"?><errors><title>Unprocessable Entity</title> <error>Stream ID has already been taken</error></errors>
So, naturally, I need to get a title on this request. That is done through the toXMLWithWrapper in Data.java
public String toXMLWithWrapper() {
String ret = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<eeml xmlns=\"http://www.eeml.org/xsd/005\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" version=\"5\" xsi:schemaLocation=\"http://www.eeml.org/xsd/005 http://www.eeml.org/xsd/005/005.xsd\"><environment>";
ret = ret + ">\n\t<title>" + "cosm app" + "</title>\n\t";//inserted this line to add title
ret = ret + this.toXML() + "</environment></eeml>";
return ret;
}
And the request body looks like (from COSM debug tool):
<?xml version="1.0" encoding="UTF-8"?>
<eeml xmlns="http://www.eeml.org/xsd/005" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="5" xsi:schemaLocation="http://www.eeml.org/xsd/005 http://www.eeml.org/xsd/005/005.xsd"><environment>
<title>cosm app</title>
<data id="0">
<tag>CPU</tag>
<current_value >0.0</current_value>
</data></environment></eeml>
This comes back as error code 500 (ouch!)
Response body is
<?xml version="1.0" encoding="UTF-8"?><errors><title>Oops, something's broken</title> <error>We've been unable to complete your request due to a problem with our server</error></errors>
Day three...
it was pointed out that there was a problem with the xml (see below). I fixed the typo and I'm back to a 422 error. So, looking more closely at the response body I thought maybe that there was something wrong with the data stream. I delete all of the datastreams in the feed, create a new feed, and I get exactly ONE AWESOME HTTP:/1.1 201 - happy, right? Wrong, after the first .POST I get nothing. When I turn the app off and then back on, I'm back to 422 error and the same response body "Stream ID has already been taken". Yikes!
It seems like the xml may be invalid.
The opening <environment> node seems to be closed twice <environment>>
The 422 is probably because you are trying to POST to an existing feed.
To update a feed you need to send a PUT request.
See the Updating a feed docs
The clue is that the system looks like it's expecting JSON but you're feeding it XML. The default for the v2 API is JSON, so you'll need to make sure that you're including the .xml extension in the URL, e.g.:
https://api.cosm.com/v2/feeds/113/datastreams.xml
Alternatively, you can set a content type header on the request to indicate this:
Content-Type: application/xml

Android - Filtering html tags from XML(Rss/Atom) Feed using Regular Expressions

I am developing a newsreader app for my website http://www.werchelsea.com/ that gets the latest news from the feed http://www.werchelsea.com/feed/atom/. I can successfully fetch the feed and convert it into a string. My main problem is that the feed description contains data with HTML tags like:
<p>It was Raul Meireles who came from the Merseyside to London to complete his move from Liverpool to Chelsea on the dead line day of the summer transfer window last year, when Chelsea failed to sign the highly-rated midfielder, Luka Modric. Chelsea were left with no other choice but to sign the Portuguese midfielder.</p>
<p>Meireles was a regular starter under the management of Villas-Boas, he really enjoyed working under
<a href='http://www.werchelsea.com/2012/09/05/time-to-say-good-bye-to-raul-meireles/303777_153113331443746_1122718871_n/' title='303777_153113331443746_1122718871_n'><img width="150" height="150" src="http://www.werchelsea.com/wp-content/uploads/2012/09/303777_153113331443746_1122718871_n-150x150.jpg" class="attachment-thumbnail" alt="Meireles first training session with Chelsea football club" title="303777_153113331443746_1122718871_n" /></a>
What I tried was replacing all these tags with regular expression but for some reason I am not able to find a correct RE to match all the html tag types. What I used to replace was:
protected String doInBackground(String... arg0) {
String response="";
try{
URL feedwebsite=new URL(feedURL);
SAXParserFactory spf=SAXParserFactory.newInstance();
SAXParser sp = spf.newSAXParser();
XMLHandler feedHandler=new XMLHandler();
XMLReader feedReader=sp.getXMLReader();
feedReader.setContentHandler(feedHandler);
InputSource is=new InputSource(feedwebsite.openStream());
feedReader.parse(is);
response=feedHandler.getParsedFeed().replaceAll("<"+"[0-9a-zA-Z]+"+">","_").replaceAll("</"+"[0-9a-zA-Z]+"+">","-").replaceAll("<"+"[0-9a-zA-Z]+"+"/>",".");
}
catch (Exception e)
{
response="Cannot Connect to the server.Please Check your Wifi/Data Connection.";
e.printStackTrace();
}
return response;
}
Is replacing the tags with a regular expression the correct approach, or is there a better way? Please help.
To match an HTML tag (opening or closing), use this regex:
<[^>]+?>
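As a rough sketch of how that pattern could be applied in Java (the TagStripper class and stripTags method are names invented for this example), a single replaceAll call removes every opening, closing, and self-closing tag:

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are made up for illustration.
class TagStripper {

    // "<", then one or more characters that are not ">", then ">"
    private static final Pattern TAG = Pattern.compile("<[^>]+?>");

    /** Removes anything that looks like an HTML tag and keeps the text in between. */
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>Meireles was a <a href='http://www.werchelsea.com/'>regular starter</a>.</p>";
        System.out.println(stripTags(html)); // prints: Meireles was a regular starter.
    }
}
```

This is only a heuristic: it leaves entities such as &amp; untouched and will misfire if an attribute value contains a ">" character. If the feed content gets more complicated, an HTML parser is probably the safer choice.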

How to Parse XML Response of Http Post in Java

I have an XML response from HTTP POST of this form
<PaymentResponse><TransDate>301111</TransDate><TransTime>011505</TransTime><SourceID>0</SourceID><Balance>0</Balance><ResponseCode>06</ResponseCode><ResponseMessage>Error XXXX</ResponseMessage><ApprovalCode>DECL</ApprovalCode><ApprovalAmount>0</ApprovalAmount></PaymentResponse>
But I am unable to parse it using Sax parser. Kindly help me out.
regards
I am using this code
public void endElement(String uri, String localName, String qName) throws SAXException {
if(localName.equalsIgnoreCase("TransDate"))
{
int tD = Integer.parseInt(currentValue);
tempResponse.setTDate(tD);
}
But every time, localName arrives as an empty string.
As to your specific question: have a look at the 'qName' argument instead. The local name is only populated when the parser runs in namespace-aware mode. qName contains the "qualified" name, i.e. the concatenation of the prefix (if any) and the local name: something like "ns:element" if there is a prefix, or "element" if there is none.
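Here is a small sketch of what matching on qName could look like for the response in the question. PaymentResponseParser and its parse method are hypothetical names, and storing the fields in a map instead of calling setters on tempResponse is a simplification for the demo.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration only.
class PaymentResponseParser {

    /** Returns element name -> text content for each child of <PaymentResponse>. */
    static Map<String, String> parse(String xml) throws Exception {
        final Map<String, String> fields = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                text.setLength(0); // start collecting text for the new element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // qName is used here because localName can be empty without namespace awareness
                if (!"PaymentResponse".equals(qName)) {
                    fields.put(qName, text.toString());
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<PaymentResponse><TransDate>301111</TransDate>"
                + "<ResponseCode>06</ResponseCode></PaymentResponse>";
        System.out.println(parse(xml).get("TransDate")); // prints 301111
    }
}
```

If you would rather keep using localName, calling setNamespaceAware(true) on the SAXParserFactory before creating the parser should populate it.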

Parsing http returned xml with java

So I've tried searching and searching on how to do this but I keep seeing a lot of complicated answers for what I need. I basically am using the Flurry Analytics API to return some xml code from an HTTP request and this is what it returns.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<eventMetrics type="Event" startDate="2011-2-28" eventName="Tip Calculated" endDate="2011-3-1" version="1.0" generatedDate="3/1/11 11:32 AM">
<day uniqueUsers="1" totalSessions="24" totalCount="3" date="2011-02-28"/>
<day uniqueUsers="0" totalSessions="0" totalCount="0" date="2011-03-01"/>
<parameters/>
</eventMetrics>
All I want to get is that totalCount number, which is 3, into a Java int or String. I've looked at the different DOM and SAX methods, and they seem to grab information outside of the tags. Is there some way I can just grab totalCount from within the tag?
Thanks,
Update
I found this url -http://www.androidpeople.com/android-xml-parsing-tutorial-%E2%80%93-using-domparser/
That helped me considering it was in android. But I thank everyone who responded for helping me out. I checked out every answer and it helped out a little bit for getting to understand what's going on. However, now I can't seem to grab the xml from my url because it requires an HTTP post first to then get the xml. When it goes to grab xml from my url it just says file not found.
Update 2
I got it all sorted out reading it in now and getting the xml from Flurry Analytics (for reference if anyone stumbles upon this question)
HTTP request for XML file
totalCount is what we call an attribute. If you're using the org.w3c.dom API, you call getAttribute("totalCount") on the appropriate element.
If you are using an SAX handler, override the startElement callback method to access attributes:
public void startElement (String uri, String name, String qName, Attributes atts)
{
    if ("day".equals(qName)) {
        // the parameter is called atts, so read the attribute from it
        String total = atts.getValue("totalCount");
    }
}
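For the org.w3c.dom route mentioned above, a minimal sketch might look like this. TotalCountReader and totalCounts are invented names, and the XML in main is a shortened copy of the Flurry response from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class TotalCountReader {

    /** Returns the totalCount attribute of every <day> element, in document order. */
    static List<Integer> totalCounts(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList days = doc.getElementsByTagName("day");
        List<Integer> counts = new ArrayList<>();
        for (int i = 0; i < days.getLength(); i++) {
            Element day = (Element) days.item(i);
            counts.add(Integer.parseInt(day.getAttribute("totalCount")));
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<eventMetrics type=\"Event\" eventName=\"Tip Calculated\">"
                + "<day uniqueUsers=\"1\" totalSessions=\"24\" totalCount=\"3\" date=\"2011-02-28\"/>"
                + "<day uniqueUsers=\"0\" totalSessions=\"0\" totalCount=\"0\" date=\"2011-03-01\"/>"
                + "<parameters/>"
                + "</eventMetrics>";
        System.out.println(totalCounts(xml)); // prints [3, 0]
    }
}
```

DOM loads the whole document into memory, which is fine for a small response like this one. For very large responses the SAX handler above is the lighter option.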
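A JDOM example (assuming the JDOM 1.x API in package org.jdom). Note the use of SAXBuilder to load the document.
URL httpSource = new URL("some url string");
// build() is an instance method, so create a SAXBuilder first
Document document = new SAXBuilder().build(httpSource);
// getDescendants returns an Iterator rather than a List
Iterator<?> elements = document.getDescendants(new DayFilter());
while (elements.hasNext()) {
    Element e = (Element) elements.next();
    //do something more useful with it than this
    String total = e.getAttributeValue("totalCount");
}
// accepts only <day> elements, which carry the totalCount attribute
class DayFilter implements Filter {
    public boolean matches (Object obj) {
        return obj instanceof Element && ((Element) obj).getName().equals("day");
    }
}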
I think that the simplest way is to use XPath, below is an example based on vtd-xml.
import com.ximpleware.*;
public class test {
public static void main(String[] args) throws Exception {
String xpathExpr = "/eventMetrics/day/@totalCount";
VTDGen vg = new VTDGen();
int i = -1;
if (vg.parseHttpUrl("http://localhost/test.xml", true)) {
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot();
ap.selectXPath(xpathExpr);
ap.bind(vn);
System.out.println("total count "+(int)ap.evalXPathtoDouble());
}
}
}
