How to get data from a URL XML file? - java

I'm making an Android app where I need to show weather information. I've found this from Yahoo! Weather; it's XML, and I want values such as "day", "low" and "high".
Refer: http://weather.yahooapis.com/forecastrss?w=12718298&u=c
<yweather:forecast day="Sun" date="19 Feb 2012" low="-2" high="3" text="Clear" code="31"/>
(The line can be found at the bottom of the linked feed.)
I have no idea how to do this - please help. Source code, examples and clues will be appreciated.

Here's the solution for future users:
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

InputStream inputXml = null;
try
{
    // open a stream to the feed and parse it into a DOM document
    inputXml = new URL("http://weather.yahooapis.com/forecastrss?w=12718298&u=c").openConnection().getInputStream();
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse(inputXml);

    // grab the first <yweather:forecast> element and read its attributes
    NodeList nodes = doc.getElementsByTagName("yweather:forecast");
    if (nodes.getLength() > 0)
    {
        Element node = (Element) nodes.item(0);
        String strLow = node.getAttribute("low");
        String strHigh = node.getAttribute("high");
        System.out.println("Temperature low: " + strLow);
        System.out.println("Temperature high: " + strHigh);
    }
}
catch (Exception ex)
{
    System.out.println(ex.getMessage());
}
finally
{
    try
    {
        if (inputXml != null)
            inputXml.close();
    }
    catch (IOException ex)
    {
        System.out.println(ex.getMessage());
    }
}

It's been a couple of years since I used XML in Android, but this was quite helpful to me when I started out: anddev.org
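For reference, a minimal SAX sketch for the same feed; the class and handler names are my own, and matching on the prefixed qName works because a default-configured SAX parser is not namespace-aware:
import java.net.URL;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class WeatherSaxSketch {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://weather.yahooapis.com/forecastrss?w=12718298&u=c");
        SAXParserFactory.newInstance().newSAXParser().parse(
                url.openStream(), new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attrs) {
                        // the feed declares the yweather prefix, so match the qName
                        if ("yweather:forecast".equals(qName)) {
                            System.out.println(attrs.getValue("day") + ": low "
                                    + attrs.getValue("low") + ", high "
                                    + attrs.getValue("high"));
                        }
                    }
                });
    }
}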

The link is a feed (which is XML, obviously). There are many feed-reader APIs in Java. So, here you go:
1. Read the feed documentation: http://developer.yahoo.com/weather/
2. Read how to parse/read the feed, e.g. with the Rome library for Java (see the sketch below).
3. Pull out your desired fields.
I guess step 2 is already done for you (easily found on Google): http://www.javahouse.altervista.org/gambino/Articolo_lettura_feed_da_java_en.html
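A minimal sketch with Rome, assuming the older com.sun.syndication packages; note that yweather:forecast is a feed extension, so it surfaces through Rome's foreign-markup API as raw JDOM elements rather than as SyndEntry fields:
import java.net.URL;
import java.util.List;
import org.jdom.Element;
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;

public class WeatherFeedSketch {
    public static void main(String[] args) throws Exception {
        URL feedUrl = new URL("http://weather.yahooapis.com/forecastrss?w=12718298&u=c");
        SyndFeed feed = new SyndFeedInput().build(new XmlReader(feedUrl));
        for (Object o : feed.getEntries()) {
            SyndEntry entry = (SyndEntry) o;
            // extension elements such as yweather:forecast come back as JDOM
            List<?> foreign = (List<?>) entry.getForeignMarkup();
            for (Object fm : foreign) {
                Element e = (Element) fm;
                if ("forecast".equals(e.getName())) {
                    System.out.println(e.getAttributeValue("day") + ": low "
                            + e.getAttributeValue("low") + ", high "
                            + e.getAttributeValue("high"));
                }
            }
        }
    }
}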

Related

HTML body returns empty (most of it) when calling from Jsoup [duplicate]

I have a problem using jsoup. I am trying to fetch a document from a URL that redirects to another URL via a meta refresh. To explain clearly: if I enter the URL http://www.amerisourcebergendrug.com, it automatically redirects to http://www.amerisourcebergendrug.com/abcdrug/ based on the meta refresh URL, but jsoup sticks with http://www.amerisourcebergendrug.com and does not redirect and fetch from http://www.amerisourcebergendrug.com/abcdrug/.
Document doc = Jsoup.connect("http://www.amerisourcebergendrug.com").get();
I have also tried using,
Document doc = Jsoup.connect("http://www.amerisourcebergendrug.com").followRedirects(true).get();
but neither is working.
Any workaround for this?
Update:
The page may use meta refresh redirect methods. The snippet below handles those:
- case insensitive and pretty fault tolerant
- the content attribute is parsed (almost) according to spec
- the first successfully parsed meta refresh is used
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public static void main(String[] args) throws Exception {
    URI uri = URI.create("http://www.amerisourcebergendrug.com");
    Document d = Jsoup.connect(uri.toString()).get();
    for (Element refresh : d.select("html head meta[http-equiv=refresh]")) {
        // content is either "<seconds>;url=<target>" or just "<seconds>"
        Matcher m = Pattern.compile("(?si)\\d+;\\s*url=(.+)|\\d+")
                           .matcher(refresh.attr("content"));
        // follow the first one that is valid
        if (m.matches()) {
            if (m.group(1) != null)
                d = Jsoup.connect(uri.resolve(m.group(1)).toString()).get();
            break;
        }
    }
    System.out.println(d.location()); // prints the final URL
}
Outputs correctly:
http://www.amerisourcebergendrug.com/abcdrug/
Old answer:
Are you sure that it isn't working? For me:
System.out.println(Jsoup.connect("http://www.ibm.com").get().baseUri());
.. outputs http://www.ibm.com/us/en/ correctly..
To get better error handling and to deal with the case-sensitivity problem:
try
{
    Document doc = Jsoup.connect("http://www.ibm.com").get();
    // select() never returns null, so check for emptiness instead
    Elements meta = doc.select("html head meta");
    if (!meta.isEmpty())
    {
        String lvHttpEquiv = meta.attr("http-equiv");
        if (lvHttpEquiv != null && lvHttpEquiv.toLowerCase().contains("refresh"))
        {
            String lvContent = meta.attr("content");
            if (lvContent != null)
            {
                // naive split: assumes content looks like "0; url=http://..."
                // and that the target URL itself contains no '='
                String[] lvContentArray = lvContent.split("=");
                if (lvContentArray.length > 1)
                    doc = Jsoup.connect(lvContentArray[1]).get();
            }
        }
    }
    // get page title
    return doc.title();
}
catch (IOException e)
{
    e.printStackTrace();
}

How to retrieve all the user comments from a site?

I want all the user comments from this site : http://www.consumercomplaints.in/?search=chevrolet
The problem is that the comments are only displayed partially; to see a complete comment I have to click on the title above it, and this process has to be repeated for every comment.
The other problem is that there are many pages of comments.
So I want to store all the complete comments from the site above in an Excel sheet.
Is this possible?
I am thinking of using crawler4j and jericho along with Eclipse.
My code for the visit method:
@Override
public void visit(Page page) {
    String url = page.getWebURL().getURL();
    System.out.println("URL: " + url);
    if (page.getParseData() instanceof HtmlParseData) {
        HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
        String html = htmlParseData.getHtml();
        // Set<WebURL> links = htmlParseData.getOutgoingUrls();
        // String text = htmlParseData.getText();
        try
        {
            // this must be a file path, not a directory;
            // "output.html" is an illustrative filename
            String crawlerOutputPath = "/DA Project/HTML Source/output.html";
            File outputFile = new File(crawlerOutputPath);
            // if the file doesn't exist, create it
            if (!outputFile.exists()) {
                outputFile.createNewFile();
            }
            // true = append to the file; closing the BufferedWriter also
            // closes the underlying FileWriter, so write the html only once
            BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(outputFile, true));
            bufferedWriter.write(html);
            bufferedWriter.close();
        } catch (IOException e)
        {
            System.out.println("IOException : " + e.getMessage());
            e.printStackTrace();
        }
        System.out.println("Html length: " + html.length());
    }
}
Thanks in advance. Any help would be appreciated.
Yes, it is possible.
1. Start crawling on your search site (http://www.consumercomplaints.in/?search=chevrolet).
2. Use the visit method of crawler4j (together with shouldVisit) to follow only the full-comment links and the further result pages.
3. Take the HTML content from crawler4j and hand it to Jericho.
4. Filter out the content you want to store and write it to some kind of .csv or .xls file (I would prefer .csv); see the sketch below.
Hope this helps you.
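As an illustration, here is a minimal sketch of steps 3 and 4 with Jericho, assuming a hypothetical div class "complaint" for the comment bodies; you would need to inspect the site's real markup and adjust the selection accordingly:
import java.io.FileWriter;
import java.io.PrintWriter;
import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.HTMLElementName;
import net.htmlparser.jericho.Source;

public class CommentExtractorSketch {
    // html would come from crawler4j's HtmlParseData.getHtml()
    public static void extract(String html, String csvPath) throws Exception {
        Source source = new Source(html);
        try (PrintWriter csv = new PrintWriter(new FileWriter(csvPath, true))) {
            for (Element div : source.getAllElements(HTMLElementName.DIV)) {
                // "complaint" is a made-up class name - check the real page
                if ("complaint".equals(div.getAttributeValue("class"))) {
                    String text = div.getContent().getTextExtractor().toString();
                    // quote the field so commas inside a comment don't break the CSV
                    csv.println("\"" + text.replace("\"", "\"\"") + "\"");
                }
            }
        }
    }
}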

DOM Parser receiving NullPointerException on pure HTML RSS post

I'm going to try to make this as clear as possible, although I'm not sure I'll succeed.
I've implemented a DOM parser in Android to parse a typical RSS feed, based on some of the code found here. It works fine for almost all of the feeds I've tried, but I just ran into a NullPointerException on the line theString = nchild.item(j).getFirstChild().getNodeValue(); (my code is further down) on a certain post in a certain feed from a Blogger site. I know it's only this post because I rewrote the loop to ignore that single post and the error didn't appear; parsing continued just fine. Looking at the post within the actual RSS feed, it seems this post is written entirely in HTML (as opposed to just standard text), whereas the posts which succeed aren't.
Would this be the cause of the issue, or should I keep looking? And if this is indeed the issue, how would I go about solving it? Is there a way to ignore posts which are written in this way? I've tried looking for alternative examples to compare and try, but it seems that everyone has used the same base code for their tutorials.
The post I'm referring to is just a link, and a couple of lines of coloured text within <div> tags with some different fonts. I'd post it here, but I'm not sure the owner of the feed would want me to (I'll ask and update if able).
My parser:
try {
    // create required instances
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    // parse the xml
    Document doc = db.parse(new InputSource(url.openStream()));
    doc.getDocumentElement().normalize();
    // get all <item> tags
    NodeList nl = doc.getElementsByTagName("item");
    int length = nl.getLength();
    for (int i = 0; i < length; i++) {
        Node currentNode = nl.item(i);
        RSSItem _item = new RSSItem();
        NodeList nchild = currentNode.getChildNodes();
        int clength = nchild.getLength();
        // step by 2 to skip the whitespace text nodes between elements
        for (int j = 1; j < clength; j = j + 2) {
            Node thisNode = nchild.item(j);
            String nodeName = thisNode.getNodeName();
            // the NullPointerException occurs here on the problem post
            String theString = nchild.item(j).getFirstChild().getNodeValue();
            if (theString != null) {
                if ("title".equals(nodeName)) {
                    _item.setTitle(theString);
                } else if ("description".equals(nodeName)) {
                    _item.setDescription(theString);
                } else if ("pubDate".equals(nodeName)) {
                    String formattedDate = theString.replace(" +0000", "");
                    _item.setDate(formattedDate);
                } else if ("author".equals(nodeName)) {
                    _item.setAuthor(theString);
                }
            }
        }
        _feed.addItem(_item);
    }
} catch (Exception e) {
    e.printStackTrace();
}
return _feed;
}
As I mentioned, I changed the code to ignore the (third) post causing the issue:
if (i != 3) {
    if (theString != null) {
        if ("title".equals(nodeName)) {
            _item.setTitle(theString);
        } else if ("description".equals(nodeName)) {
            _item.setDescription(theString);
        } else if ("pubDate".equals(nodeName)) {
            String formattedDate = theString.replace(" +0000", "");
            _item.setDate(formattedDate);
        } else if ("author".equals(nodeName)) {
            _item.setAuthor(theString);
        }
    }
}
This resulted in everything working as desired, just skipping the third post. Any help with this is appreciated; I've been searching for a while with no luck. I'd post my logcat, but it's not very useful after the line I pasted at the start of this question, since the call goes back through an AsyncTask.
Oh, and one way I was thinking of solving it was to parse the description first instead of the title (rewriting the loop, of course) and detect whether that was null before continuing the parse. It'd be quite messy though, so I'm searching for an alternative.
Take a look at the HTML code you are trying to parse. I'm almost sure the third post contains a node with no child; that is, it's empty. For example, this node would throw the exception:
<Element></Element>
So you must not call getNodeValue before checking whether the node has any children:
theString = nchild.item(j).getFirstChild().getNodeValue();
To avoid this, you could do something like:
if (nchild.item(j).getFirstChild() != null) {
    theString = nchild.item(j).getFirstChild().getNodeValue();
    // ... and the rest of your code
}
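If you would rather not rely on the j = j + 2 stride (it assumes whitespace text nodes always alternate with elements), here is a sketch of a more defensive inner loop, reusing the variable names from the question; it is a drop-in fragment, not a complete method:
// iterate over every child and skip anything that isn't a usable element
for (int j = 0; j < clength; j++) {
    Node thisNode = nchild.item(j);
    if (thisNode.getNodeType() != Node.ELEMENT_NODE)
        continue; // skip the whitespace/text nodes between elements
    Node firstChild = thisNode.getFirstChild();
    if (firstChild == null)
        continue; // skip empty elements such as <Element></Element>
    String nodeName = thisNode.getNodeName();
    String theString = firstChild.getNodeValue();
    if (theString != null) {
        if ("title".equals(nodeName)) {
            _item.setTitle(theString);
        } else if ("description".equals(nodeName)) {
            _item.setDescription(theString);
        }
        // ... pubDate and author handled as before
    }
}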

SAXBuilder build throws java.lang.StringIndexOutOfBoundsException

I am parsing this XML:
<Root><Status>1</Status><Message>Get call Successful</Message><StatusCode></StatusCode><Item type = 'all' subtype = '0' ><subItem><rank>0</rank><name>humywe12</name><value>4500</value></subItem></Item></Root>
I am parsing it using this code:
SAXBuilder builder = new SAXBuilder();
Document doc = null;
xml = xml.replaceAll("\t", "");
StringReader r = new StringReader(xml);
try {
    doc = builder.build(r); // <-- here it throws the error
} catch (IOException e) {
    // e.printStackTrace();
    throw e;
} catch (Exception e) {
    // e.printStackTrace();
    throw e;
}
return doc;
}
builder.build(r) throws a StringIndexOutOfBoundsException. Am I doing something wrong?
Update:
OK, I have removed just these attributes, type = 'all' subtype = '0', and now it no longer throws java.lang.StringIndexOutOfBoundsException. Is there a problem with SAXBuilder?
I believe this was a known JDOM bug. See http://www.jdom.org/pipermail/jdom-interest/2000-August/001227.html
You may want to check out one of the latest versions of JDOM (whichever fits within your application).
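For what it's worth, a minimal sketch against the newer JDOM 2 API (org.jdom2 packages); the expectation, which you should verify, is that it handles the attribute spacing without the exception:
import java.io.StringReader;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.input.SAXBuilder;

public class JdomSketch {
    public static void main(String[] args) throws Exception {
        String xml = "<Root><Status>1</Status>"
                + "<Item type = 'all' subtype = '0' ><subItem><rank>0</rank></subItem></Item></Root>";
        SAXBuilder builder = new SAXBuilder();
        Document doc = builder.build(new StringReader(xml));
        Element item = doc.getRootElement().getChild("Item");
        // whitespace around '=' is legal XML, so this should print "all"
        System.out.println(item.getAttributeValue("type"));
    }
}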
Someone else may be able to identify the error for you, but what I would do is start with a very small XML document, say
<Root></Root>
and keep adding to it until I get the error, and then see what in the data caused it.
Note that spaces are in fact allowed between the attribute name and the "=", and between the "=" and the attribute value: the grammar in the XML spec defines Eq ::= S? '=' S?. So the input is well-formed, and the exception points to a parser bug rather than bad XML.
See the spec.

XML Parsing in Android [Java]

The API I need to work with does not support XPath, which is a bit of a headache! :-( lol
The XML I want to parse is a String. My questions:
1. Is there a Java equivalent of PHP's simplexml_load_string, which turns a string into an XML document for parsing?
2. Which is better for parsing, SAX or DOM? I only need to get a couple of values out of the XML, and the structure isn't that deep (3 levels).
Thanks!
Maybe this will help you:
// http://developer.android.com/intl/de/reference/android/content/res/XmlResourceParser.html
import org.xmlpull.v1.XmlPullParserException;

try {
    XmlResourceParser xrp = ctx.getResources().getXml(R.xml.rules);
    while (xrp.getEventType() != XmlResourceParser.END_DOCUMENT) {
        if (xrp.getEventType() == XmlResourceParser.START_TAG) {
            String s = xrp.getName();
            if (s.equals("category")) {
                String catname = xrp.getAttributeValue(null, "name");
                String rule = xrp.getAttributeValue(null, "rule");
            }
        } else if (xrp.getEventType() == XmlResourceParser.END_TAG) {
            ; // nothing to do for end tags here
        } else if (xrp.getEventType() == XmlResourceParser.TEXT) {
            ; // nothing to do for text here
        }
        xrp.next();
    }
    xrp.close();
} catch (XmlPullParserException xppe) {
    Log.e(TAG(), "Failure of .getEventType or .next, probably bad file format: " + xppe);
} catch (IOException ioe) {
    Log.e(TAG(), "Unable to read resource file");
    ioe.printStackTrace();
}
Not sure.
If the XML file/string is small, DOM is a good choice, as it provides more capability. SAX should be used for larger XML files where memory usage and performance are a concern. For the string-loading part of your question, see the sketch below.
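For question 1, a minimal sketch of parsing an XML String into a DOM Document with the standard javax.xml.parsers API, which is roughly what simplexml_load_string does in PHP (the sample XML is made up):
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class StringToDomSketch {
    public static void main(String[] args) throws Exception {
        String xml = "<root><level1><level2>value</level2></level1></root>";
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // wrap the String in an InputSource to feed it to the DOM parser
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        System.out.println(doc.getElementsByTagName("level2").item(0).getTextContent());
    }
}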
