Currently, I can print all the URLs on a page, but not the link text associated with each URL.
For example:
<a class="fbl" href="/preferences?hl=en" jsaction="foot.cst" id="fsettl">Settings</a>
The code prints only "/preferences?hl=en", but not the text of the link, i.e., "Settings".
public static List<String> getLinks(String uriStr) {
    List<String> result = new ArrayList<String>();
    try {
        System.out.println("in the getlinks try");
        // Create a reader on the HTML content
        URL url = new URI(uriStr).toURL();
        URLConnection conn = url.openConnection();
        Reader rd = new InputStreamReader(conn.getInputStream());
        // Parse the HTML
        EditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(rd, doc, 0);
        // Find all the A elements in the HTML document
        HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
        while (it.isValid()) {
            SimpleAttributeSet s = (SimpleAttributeSet) it.getAttributes();
            String link = (String) s.getAttribute(HTML.Attribute.HREF);
            if (link != null) {
                // Add the link to the result list
                System.out.println(link);
                result.add(link);
            }
            it.next();
        }
    } catch (Exception e) {
        // Report fetch or parse failures and return whatever was collected
        e.printStackTrace();
    }
    return result;
}
How would I print the text of each link as well?
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PrintURL {
    public static void main(String[] args) throws Exception {
        // Fetch and parse the page in one step
        Document doc = Jsoup.connect("https://www.google.co.in/").get();
        // Print the href and the visible text of every <a> element
        for (Element element : doc.getElementsByTag("a")) {
            System.out.println("\nLink: " + element.attr("href"));
            System.out.println("Link Name: " + element.text());
        }
    }
}
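If you would rather stay with the Swing parser from the question than add jsoup, the link text can be read back from the document using the iterator's start and end offsets. This is only a minimal sketch under assumptions: the class name LinkText, the method name extract, and the inline sample HTML are made up for illustration.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class LinkText {
    // Returns one "href -> text" entry per <a href=...> run found in the HTML.
    static List<String> extract(Reader html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Avoid a ChangedCharSetException when the page declares a charset.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(html, doc, 0);

        List<String> result = new ArrayList<>();
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A); it.isValid(); it.next()) {
            AttributeSet attrs = it.getAttributes();
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                // The offsets delimit the document text covered by this link.
                int start = it.getStartOffset();
                int end = it.getEndOffset();
                String text = doc.getText(start, end - start).trim();
                result.add(href + " -> " + text);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input modelled on the anchor from the question.
        String sample = "<html><body><a class=\"fbl\" href=\"/preferences?hl=en\" "
                + "id=\"fsettl\">Settings</a></body></html>";
        for (String entry : extract(new StringReader(sample))) {
            System.out.println(entry);
        }
    }
}
```

To use it on a live page, pass the InputStreamReader from the question's URLConnection instead of the StringReader.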
Possibly the terminology is different with HTML than with XML, but here is an HTML document from which attributes are being retrieved. The attributes a1, a2, and a3 are part of the body tag.
<html>
<head>
Hello World
</head>
<body a1="ABC" a2="3974" a3="A1B2"> <------These attributes
<H1>Start Here<H1>
<p>This is the body</p>
</body>
</html>
The following class is used to parse the HTML file above.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.Reader;
import javax.swing.text.AttributeSet;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
public class HTMLParserTest
{
public static void main(String args[]) throws Exception {
Reader reader = new FileReader("C:/Downloads/DeleteMe/Example1.html");
BufferedReader br = new BufferedReader(reader );
HTMLEditorKit htmlKit = new HTMLEditorKit();
HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
HTMLEditorKit.Parser parser = new ParserDelegator();
HTMLEditorKit.ParserCallback callback = htmlDoc.getReader(0);
parser.parse(br, callback, true);
// Parse
ElementIterator iterator = new ElementIterator(htmlDoc);
Element element;
while ((element = iterator.next()) != null)
{
System.out.println("Element : " + element);
AttributeSet attributes = element.getAttributes();
Object name = attributes.getAttribute(StyleConstants.NameAttribute);
if ((name instanceof HTML.Tag))
//&& ((name == HTML.Tag.H1) || (name == HTML.Tag.H2) || (name == HTML.Tag.H3)))
{
// Build up content text as it may be within multiple elements
StringBuffer text = new StringBuffer();
int count = element.getElementCount();
for (int i = 0; i < count; i++) {
Element child = element.getElement(i);
AttributeSet childAttributes = child.getAttributes();
System.out.println("Element : " + child);
System.out.println(" Attribute count : " + childAttributes.getAttributeCount());
System.out.println(" a1 exists : " + childAttributes.isDefined("a1"));
int startOffset = child.getStartOffset();
int endOffset = child.getEndOffset();
int length = endOffset - startOffset;
text.append(htmlDoc.getText(startOffset, length));
}
}
}
System.exit(0);
}
}
The output is here.
Element : BranchElement(html) 0,1
Element : BranchElement(body) 0,1
Attribute count : 1
a1 exists : false <-----expected true here.
Element : BranchElement(body) 0,1
Element : BranchElement(p) 0,1
Attribute count : 3
a1 exists : false
Element : BranchElement(p) 0,1
Element : LeafElement(content) 0,1
Attribute count : 1
a1 exists : false
Element : LeafElement(content) 0,1
The expectation is that the "a1 exists" check should have returned true once, but it did not.
Eventually all 3 (a1, a2, a3) will be searched.
Is the above code the proper implementation or is this not feasible with the HTML parser?
Maybe this will help:
import java.io.*;
import java.net.*;
import java.util.*;
import javax.swing.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
class AttributeHTML
{
public static void main(String[] args)
{
EditorKit kit = new HTMLEditorKit();
Document doc = kit.createDefaultDocument();
// The Document class does not yet handle charset's properly.
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
try
{
// Create a reader on the HTML content.
Reader rd = getReader(args[0]);
// Parse the HTML.
kit.read(rd, doc, 0);
// Iterate through the elements of the HTML document.
ElementIterator it = new ElementIterator(doc);
Element elem = null;
while ( (elem = it.next()) != null )
{
if (elem.getName().equals("body"))
{
AttributeSet as = elem.getAttributes();
Enumeration enum1 = as.getAttributeNames();
while( enum1.hasMoreElements() )
{
Object name = enum1.nextElement();
Object value = as.getAttribute( name );
System.out.println( "\t" + name + " : " + value );
}
}
}
}
catch (Exception e)
{
e.printStackTrace();
}
System.exit(0);
}
// Returns a reader on the HTML data. If 'uri' begins
// with "http:", it's treated as a URL; otherwise,
// it's assumed to be a local filename.
static Reader getReader(String uri)
throws IOException
{
// Retrieve from Internet.
if (uri.startsWith("http:"))
{
URLConnection conn = new URL(uri).openConnection();
return new InputStreamReader(conn.getInputStream());
}
// Retrieve from file.
else
{
return new FileReader(uri);
}
}
}
Test using:
java AttributeHTML yourFile.html
I am not familiar with HTMLEditorKit, but you can achieve a similar result using regex:
public static void main(String[] args) {
String html = "<html>\r\n" +
" <head>\r\n" +
" Hello World\r\n" +
" </head>\r\n" +
" <body a1=\"ABC\" a2=\"3974\" a3=\"A1B2\">\r\n" +
" <H1>Start Here<H1>\r\n" +
" <p>This is the body</p>\r\n" +
" </body>\r\n" +
"</html>";
Pattern regexBodyPattern = Pattern.compile("<body[^>]*>", Pattern.MULTILINE);
Matcher matcher = regexBodyPattern.matcher(html);
while(matcher.find()) {
String bodyTag = matcher.group();
Pattern regexBodyAttrPattern = Pattern.compile("(\\S*)=(\\\"\\w*\\\")", Pattern.MULTILINE);
Matcher attrMatcher = regexBodyAttrPattern.matcher(bodyTag);
while(attrMatcher.find()) {
System.out.println("Key :: "+attrMatcher.group(1)+" , Value "+attrMatcher.group(2));
}
}
}
Output:
Key :: a1 , Value "ABC"
Key :: a2 , Value "3974"
Key :: a3 , Value "A1B2"
To retrieve the attributes, you can provide your own ParserCallback:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML.Tag;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import javax.swing.text.html.parser.ParserDelegator;
public class HTMLParserTest2
{
public static void main(String args[]) throws Exception {
Reader reader = new FileReader("d:/temp/Example.html");
BufferedReader br = new BufferedReader(reader);
System.out.println(HTMLParserTest2.extractTagsAttributes(br));
// output : [title-_implied_=true, body-a1=ABC, body-a2=3974, body-a3=A1B2]
System.exit(0);
}
public static List<String> extractTagsAttributes(Reader r) throws IOException {
final ArrayList<String> list = new ArrayList<String>();
ParserDelegator parserDelegator = new ParserDelegator();
ParserCallback parserCallback = new ParserCallback() {
@Override
public void handleText(final char[] data, final int pos) { }
@Override
public void handleStartTag(Tag tag, MutableAttributeSet attribute, int pos) {
// Record every attribute of every start tag as "tag-name=value"
Enumeration<?> e=attribute.getAttributeNames();
while(e.hasMoreElements()) {
Object name=e.nextElement();
Object value=attribute.getAttribute(name);
list.add(tag.toString() + "-" + name + "=" +value);
}
}
@Override
public void handleEndTag(Tag t, final int pos) { }
@Override
public void handleSimpleTag(Tag t, MutableAttributeSet a, final int pos) { }
@Override
public void handleComment(final char[] data, final int pos) { }
@Override
public void handleError(final java.lang.String errMsg, final int pos) { }
};
parserDelegator.parse(r, parserCallback, true);
return list;
}
}
My java code is:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class celebGrepper {
static class CelebData {
URL link;
String name;
CelebData(URL link, String name) {
this.link=link;
this.name=name;
}
}
public static String grepper(String url) {
URL source;
String data = null;
try {
source = new URL(url);
HttpURLConnection connection = (HttpURLConnection) source.openConnection();
connection.connect();
InputStream is = connection.getInputStream();
/**
* Attempting to fetch an entire line at a time instead of just a character each time!
*/
StringBuilder str = new StringBuilder();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
while((data = br.readLine()) != null)
str.append(data);
data=str.toString();
} catch (IOException e) {
e.printStackTrace();
}
return data;
}
public static ArrayList<CelebData> parser(String html) throws MalformedURLException {
ArrayList<CelebData> list = new ArrayList<CelebData>();
Pattern p = Pattern.compile("<td class=\"image\".*<img src=\"(.*?)\"[\\s\\S]*<td class=\"name\"><a.*?>([\\w\\s]+)<\\/a>");
Matcher m = p.matcher(html);
while(m.find()) {
CelebData current = new CelebData(new URL(m.group(1)),m.group(2));
list.add(current);
}
return list;
}
public static void main(String... args) throws MalformedURLException {
String html = grepper("https://www.forbes.com/celebrities/list/");
System.out.println("RAW Input: "+html);
System.out.println("Start Grepping...");
ArrayList<CelebData> celebList = parser(html);
for(CelebData item: celebList) {
System.out.println("Name:\t\t "+item.name);
System.out.println("Image URL:\t "+item.link+"\n");
}
System.out.println("Grepping Done!");
}
}
It's supposed to fetch the entire HTML content of https://www.forbes.com/celebrities/list/. However, when I compare the actual result below to the original page, I find the entire table that I need is missing! Is it because the page isn't completely loaded when I start getting the bytes from the page via the input stream? Please help me understand.
The Output of the page:
https://jsfiddle.net/e0771aLz/
What can I do to just extract the Image link and the names of the celebs?
I know it's an extremely bad practice to try to parse HTML using regex and is the stuff of nightmares, but on a certain video training course for android, that's exactly what the guy did, and I just wanna follow along since it's just in this one lesson.
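Separately from whether the table is present in the HTML the server returns, the pattern in parser() uses greedy quantifiers (`.*` and `[\s\S]*`). Because grepper() joins all lines into one string, those can swallow everything between the first and last row, yielding a single match at most. Below is a sketch with lazy quantifiers, run against a made-up two-row snippet. The class name, the markup, and the example.com URLs are assumptions for illustration, not the real Forbes page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyCelebRegex {
    // Lazy quantifiers (.*? and [\s\S]*?) stop at the nearest following match,
    // so each table row produces its own match instead of one giant one.
    static final Pattern ROW = Pattern.compile(
            "<td class=\"image\".*?<img src=\"(.*?)\"[\\s\\S]*?"
            + "<td class=\"name\"><a.*?>([\\w\\s]+)</a>");

    // Returns {imageUrl, name} pairs in document order.
    static List<String[]> parse(String html) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            rows.add(new String[] { m.group(1), m.group(2) });
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical markup shaped like what the original pattern expects.
        String html =
                "<tr><td class=\"image\"><img src=\"http://example.com/a.jpg\"></td>"
                + "<td class=\"name\"><a href=\"/a\">Alice Example</a></td></tr>"
                + "<tr><td class=\"image\"><img src=\"http://example.com/b.jpg\"></td>"
                + "<td class=\"name\"><a href=\"/b\">Bob Example</a></td></tr>";
        for (String[] row : parse(html)) {
            System.out.println(row[1] + " : " + row[0]);
        }
    }
}
```

If the fetched HTML really does not contain the table, no pattern will help, so it is worth checking the raw input for `<td class="name">` before tuning the regex.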
I have created a web scraper that fetches share-price market data from the stock exchange website, www.psx.com.pk. That site has a Market Summary hyperlink, and I need to scrape the data behind that link. My program is as follows.
package com.market_summary;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.Locale;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class ComMarket_summary {
boolean writeCSVToConsole = true;
boolean writeCSVToFile = true;
boolean sortTheList = true;
boolean writeToConsole;
boolean writeToFile;
public static Document doc = null;
public static Elements tbodyElements = null;
public static Elements elements = null;
public static Elements tdElements = null;
public static Elements trElement2 = null;
public static String Dcomma = ",";
public static String line = "";
public static ArrayList<Elements> sampleList = new ArrayList<Elements>();
public static void createConnection() throws IOException {
System.setProperty("http.proxyHost", "191.1.1.202");
System.setProperty("http.proxyPort", "8080");
String tempUrl = "http://www.psx.com.pk/index.php";
doc = Jsoup.connect(tempUrl).get();
System.out.println("Successfully Connected");
}
public static void parsingHTML() throws Exception {
    File csv = new File("C:\\market_smry.csv");
    csv.delete();
    // Open the writer once; try-with-resources flushes and closes it
    try (FileWriter sb = new FileWriter(csv, true)) {
        for (Element table : doc.getElementsByTag("table")) {
            for (Element trElement : table.getElementsByTag("tr")) {
                tdElements = trElement.getElementsByTag("td");
                if (trElement.hasClass("marketData")) {
                    // Join the cells of one marketData row with " | "
                    StringBuilder row = new StringBuilder();
                    for (Iterator<Element> it = tdElements.iterator(); it.hasNext();) {
                        row.append(formatData(it.next().text()));
                        if (it.hasNext()) {
                            row.append(" | ");
                        }
                    }
                    System.out.println(row);
                    sb.append(row).append("\r\n");
                }
                sampleList.add(tdElements);
            }
        }
    }
}
private static final SimpleDateFormat FORMATTER_MMM_d_yyyy = new SimpleDateFormat("MMM d, yyyy", Locale.US);
private static final SimpleDateFormat FORMATTER_dd_MMM_yyyy = new SimpleDateFormat("dd-MMM-yyyy", Locale.US);
public static String formatData(String text) {
String tmp = null;
try {
Date d = FORMATTER_MMM_d_yyyy.parse(text);
tmp = FORMATTER_dd_MMM_yyyy.format(d);
} catch (ParseException pe) {
tmp = text;
}
return tmp;
}
public static void main(String[] args) throws IOException, Exception {
createConnection();
parsingHTML();
}
}
The problem is that when I execute this program it should create a .csv file, but no file is created. When I debugged the code, I found that the program never enters the loop, and I don't understand why. When I run the same program against another website with a slightly different page structure, it works fine.
As far as I understand, the data sits inside the #document node, which is a virtual element, so the program can't read it; the other website has no such node. Kindly help me read the data inside the #document element.
Long Story Short
Change your temp url to http://www.psx.com.pk/phps/index1.php
Explanation
There is no table in the document at http://www.psx.com.pk/index.php.
Instead, it shows its content in a frameset with two frames.
One is a dummy with the URL http://www.psx.com.pk/phps/blank.php.
The other is the real page, which shows the actual data; its URL is
http://www.psx.com.pk/phps/index1.php
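To find such frame URLs programmatically rather than by inspecting the page by hand, one option is to list the src of every frame tag and resolve it against the page URL. The sketch below uses the JDK's Swing HTML parser so it needs no extra dependency. The class name FrameFinder and the sample frameset markup are assumptions for illustration; the real page's markup may differ.

```java
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class FrameFinder {
    // Collects the absolute URL of every <frame src=...> in the given HTML.
    static List<String> frameUrls(Reader html, String baseUrl) throws Exception {
        final URI base = URI.create(baseUrl);
        final List<String> urls = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                collect(t, a);
            }

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                collect(t, a);
            }

            // Match on the tag name so the check does not depend on how the
            // parser classifies the tag.
            private void collect(HTML.Tag t, MutableAttributeSet a) {
                Object src = a.getAttribute(HTML.Attribute.SRC);
                if ("frame".equalsIgnoreCase(t.toString()) && src != null) {
                    urls.add(base.resolve(src.toString()).toString());
                }
            }
        };
        new ParserDelegator().parse(html, callback, true);
        return urls;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical frameset shaped like the one described in the answer.
        String sample = "<html><frameset rows=\"0,*\">"
                + "<frame src=\"phps/blank.php\"><frame src=\"phps/index1.php\">"
                + "</frameset></html>";
        for (String url : frameUrls(new StringReader(sample), "http://www.psx.com.pk/index.php")) {
            System.out.println(url);
        }
    }
}
```

Each resulting URL can then be passed to Jsoup.connect(...) in place of tempUrl.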
Scenario/Requirement:
Download an HTML page from some URL.
Download the images referenced in its img tags.
Change the image tags in my saved file so I can open it offline in my browser and still see the images.
I have done the first two steps, but I am having difficulty with the third: the tags do not change. What am I doing wrong?
The job is to open a file, find each img src attribute, and replace it with another value. Can you give me an example?
Code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.*;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import java.awt.image.BufferedImage;
import java.net.URL;
import java.net.URLConnection;
import javax.imageio.ImageIO;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTMLDocument;
public class ExtractAllImages {
static String result_doc = "/home/foo/index.html";
static String home_folder = "/home/foo/";
static String start_webURL = "http://www.oracle.com/";
public static void main(String args[]) throws Exception {
String webUrl = start_webURL;
URL url = new URL(webUrl);
URLConnection connection = url.openConnection();
InputStream is = connection.getInputStream();
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr);
HTMLEditorKit htmlKit = new HTMLEditorKit();
HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
HTMLEditorKit.Parser parser = new ParserDelegator();
HTMLEditorKit.ParserCallback callback = htmlDoc.getReader(0);
parser.parse(br, callback, true);
FileWriter writer = new FileWriter(result_doc);
htmlKit.write(writer, htmlDoc, 0, htmlDoc.getLength());
writer.close();
int number_or_images = 0;
String[] array = new String[4096];
for (HTMLDocument.Iterator iterator = htmlDoc.getIterator(HTML.Tag.IMG); iterator.isValid(); iterator.next()) {
AttributeSet attributes = iterator.getAttributes();
String imgSrc = (String) attributes.getAttribute(HTML.Attribute.SRC);
System.out.println("img_src = " + imgSrc);
if (imgSrc != null && (imgSrc.endsWith(".jpg") || (imgSrc.endsWith(".png")) || (imgSrc.endsWith(".jpeg")) || (imgSrc.endsWith(".bmp")) || (imgSrc.endsWith(".ico")))) {
try {
downloadImage(webUrl, imgSrc);
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
}
array[number_or_images] = imgSrc;
number_or_images++;
///TODO change
}
for(int i =0; i < number_or_images; i++)
{
System.out.println("before = "+array[i]);
// Keep only the part after the last '/', i.e. the file name
array[i] = array[i].substring(array[i].lastIndexOf('/') + 1);
System.out.println("after = " + array[i]);
}
//TODO open file and replace tags
int i =0;
File input = new File(result_doc);
Document doc = Jsoup.parse(input, "UTF-8");
System.out.println( input.canWrite());
for( Element img : doc.select("img[src]") )
{
String s = img.attr("src");
System.out.println(s);
img.attr("src", "/home/foo/"+array[i]); // set attribute 'src' to 'your-source-here'
s = img.attr("src");
System.out.println(s);
++i;
}
}
private static void downloadImage(String url, String imgSrc) throws IOException {
BufferedImage image = null;
try {
if (!(imgSrc.startsWith("http"))) {
url = url + imgSrc;
} else {
url = imgSrc;
}
imgSrc = imgSrc.substring(imgSrc.lastIndexOf("/") + 1);
String imageFormat = null;
imageFormat = imgSrc.substring(imgSrc.lastIndexOf(".") + 1);
String imgPath = null;
imgPath = home_folder + imgSrc + "";
URL imageUrl = new URL(url);
image = ImageIO.read(imageUrl);
if (image != null) {
File file = new File(imgPath);
ImageIO.write(image, imageFormat, file);
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
Solved.
I wasn't saving the changes. The following code needs to be added before "downloadImage()":
int i = 0;
File input = new File(result_doc);
Document doc = Jsoup.parse(input, "UTF-8");
for( Element img : doc.select("img[src]") ) {
img.attr("src",array[i]); // set attribute 'src' to 'your-source-here'
++i;
}
try (BufferedWriter bw = new BufferedWriter(new FileWriter(result_doc))) {
// Write the modified document back to disk; without this the changes are lost
bw.write(doc.outerHtml());
}
catch (Exception ex) {
System.out.println("Program stopped. The problem is " + "\"" +
ex.getMessage()+"\"");
}
You can use jsoup.
Try something like the following:
import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public static void getAllTags(){
try {
File input=new File("H:\\html pages\\index1.html");
Document document=Jsoup.parse(input, "UTF-8");
Document parse=Jsoup.parse(document.html());
Elements body=parse.select("body");
Elements bodyTags=body.select("*");
for (Element element : bodyTags) {
//Do what you want with tag
System.out.println(element.tagName());
}
} catch (Exception e) {
e.printStackTrace();
}
}
If you want to parse the HTML, then try this:
public static void parseHTML(){
try {
File input = new File("H:\\html\\index1.html");
Document document = Jsoup.parse(input, "UTF-8");
Document parse = Jsoup.parse(document.html());
Elements bodyElements = parse.select("div");
Elements elements = bodyElements.select("*");
for (Element element : elements) {
FilterHtml.setHtmlTAG(element.tagName());
FilterHtml.ParseXml();
Elements body = bodyElements.select(FilterHtml.getXmlTAG());
if (body.is(FilterHtml.getXmlTAG())) {
Elements tag = parse.select(FilterHtml.getXmlTAG());
//Do something meaning full with tag
System.out.println(tag.text());
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
Hope this helps. If it does, please mark the answer as accepted.
I am making a Java application that notifies the user of the weather conditions.
I used the Yahoo Weather API, with a link like this:
http://weather.yahooapis.com/forecastrss?w=2502265
All I have to do is change the numeric code in the URL in order to change the city.
That works perfectly, but I am facing two problems now:
First, I want to support many weather forecast sources in my application, not just Yahoo Weather, and I can't find a similar service on any other weather forecast website.
Second, I want to obtain the codes of all the cities in Yahoo Weather, since I won't ask the user to enter a city code; instead, the user enters a city name and I match it with the code.
Here is the Java code that works for me.
The code to return the XML file:
package search;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
public class Process {
public static void main(String[] args) throws IOException {
Display disp = new Display();
Document doc = generateXML("1940345");
disp.getConditions(doc);
}
public static Document generateXML(String code) throws IOException {
String url = null;
String XmlData = null;
// creating the URL
url = "http://weather.yahooapis.com/forecastrss?w=" + code;
URL xmlUrl = new URL(url);
InputStream in = xmlUrl.openStream();
// parsing the XmlUrl
Document doc = parse(in);
return doc;
}
public static Document parse(InputStream is) {
Document doc = null;
DocumentBuilderFactory domFactory;
DocumentBuilder builder;
try {
domFactory = DocumentBuilderFactory.newInstance();
domFactory.setValidating(false);
domFactory.setNamespaceAware(false);
builder = domFactory.newDocumentBuilder();
doc = builder.parse(is);
} catch (Exception ex) {
System.err.println("unable to load XML: " + ex);
}
return doc;
}
}
The code to display the temperature and humidity in that city:
package search;
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class Display {
static void getConditions(Document doc) {
String city = null;
String unit = null;
try {
doc.getDocumentElement().normalize();
NodeList nList = doc.getElementsByTagName("rss");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
NodeList nl = eElement
.getElementsByTagName("yweather:location");
for (int tempr = 0; tempr < nl.getLength(); tempr++) {
Node n = nl.item(tempr);
if (n.getNodeType() == Node.ELEMENT_NODE) {
Element e = (Element) n;
city = e.getAttribute("city");
System.out.println("The City Is : " + city);
}
}
NodeList nl2 = eElement
.getElementsByTagName("yweather:units");
for (int tempr = 0; tempr < nl2.getLength(); tempr++) {
Node n2 = nl2.item(tempr);
if (n2.getNodeType() == Node.ELEMENT_NODE) {
Element e2 = (Element) n2;
unit = e2.getAttribute("temperature");
}
}
NodeList nl3 = eElement
.getElementsByTagName("yweather:condition");
for (int tempr = 0; tempr < nl3.getLength(); tempr++) {
Node n3 = nl3.item(tempr);
if (n3.getNodeType() == Node.ELEMENT_NODE) {
Element e3 = (Element) n3;
System.out.println("The Temperature In " + city
+ " Is : " + e3.getAttribute("temp") + " "
+ unit);
}
}
NodeList nl4 = eElement
.getElementsByTagName("yweather:atmosphere");
for (int tempr = 0; tempr < nl4.getLength(); tempr++) {
Node n4 = nl4.item(tempr);
if (n4.getNodeType() == Node.ELEMENT_NODE) {
Element e4 = (Element) n4;
System.out.println("The Humidity In " + city
+ " Is : " + e4.getAttribute("humidity"));
}
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
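The four nearly identical loops in getConditions all do the same thing: find an element by tag name and read one of its attributes. They could be collapsed into a single helper, sketched below. The class name FeedAttributes, the helper names, and the sample XML (including its values) are assumptions for illustration, not real feed output.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class FeedAttributes {
    // Returns the attribute of the first element with the given tag name,
    // or null when the document contains no such element.
    static String attr(Document doc, String tag, String attribute) {
        NodeList nodes = doc.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return null;
        }
        // getElementsByTagName only returns elements, so the cast is safe.
        return ((Element) nodes.item(0)).getAttribute(attribute);
    }

    // Parses XML text with the same non-validating, non-namespace-aware
    // settings used in the question's parse() method.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        factory.setNamespaceAware(false);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical feed fragment with made-up values.
        String xml = "<rss xmlns:yweather=\"http://example.com/yweather\"><channel>"
                + "<yweather:location city=\"Cairo\"/>"
                + "<yweather:units temperature=\"C\"/>"
                + "<yweather:condition temp=\"30\"/>"
                + "<yweather:atmosphere humidity=\"40\"/>"
                + "</channel></rss>";
        Document doc = parse(xml);
        String city = attr(doc, "yweather:location", "city");
        String unit = attr(doc, "yweather:units", "temperature");
        System.out.println("The Temperature In " + city + " Is : "
                + attr(doc, "yweather:condition", "temp") + " " + unit);
        System.out.println("The Humidity In " + city + " Is : "
                + attr(doc, "yweather:atmosphere", "humidity"));
    }
}
```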
You can use the Metwit weather API by simply passing a latitude and longitude.
If you can call it client-side, you get 200 requests/day (IP-based throttling) with no authentication required. It has worldwide coverage and is JSON- and REST-compliant. You can register for extra API calls for free, and if you still need to call it server-side, the basic plan is pretty cheap.
Full disclosure: I own this API.
Take a look at this discussion. It seems relevant:
https://stackoverflow.com/questions/4876800/is-there-an-international-weather-forecast-api-that-is-not-limited-for-non-comme
Additionally, search for "weather forecast api" on Google. There are plenty of references to APIs that support several weather services.
Here's a list of Weather APIs that are available via the Temboo Java SDK:
https://temboo.com/library/keyword/weather/
You can use YQL (Yahoo Query Language) to find the WOEID by city name, like this:
var lclqry = escape('select * from geo.places where text="OKLAHOMA CITY"')
var lclurl = "http://query.yahooapis.com/v1/public/yql?q=" + lclqry + "&format=json&callback=?";
I know this is an old question, but I found it and, as Sai suggested, I have written Java code that sends a YQL query and retrieves the WOEID number. Then it uses that number to get the weather from yahoo-weather-java-api. It needs the Gson dependency, which you can get by adding it to your Maven dependencies. I hope this helps someone.
EDIT
If there is more than one WOEID for the given town name, then getWeather returns the weather for the town with the first WOEID returned.
CODE
Weather.java:
import com.github.fedy2.weather.YahooWeatherService;
import com.github.fedy2.weather.data.Channel;
import com.github.fedy2.weather.data.unit.DegreeUnit;
import com.google.gson.*;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import javax.xml.bind.JAXBException;
/**
*
* @author robert
*/
public class Weather
{
public Channel getWeather(String townName) throws CantFindWeatherException
{
try
{
String baseUrl = "http://query.yahooapis.com/v1/public/yql?q=";
String query =
"select woeid from geo.places where text=\"" +
townName + "\"";
String fullUrlStr = baseUrl + URLEncoder.encode(query, "UTF-8") +
"&format=json";
URL fullUrl = new URL(fullUrlStr);
ResultObject resultObject = null;
ResultArray resultArray = null;
try (InputStream is = fullUrl.openStream();
InputStreamReader isr = new InputStreamReader(is);
BufferedReader br = new BufferedReader(isr))
{
String result = "";
String read;
while ((read = br.readLine()) != null)
{
result += read;
}
Gson gson = new Gson();
try
{
resultObject = gson.fromJson(result, ResultObject.class);
}
catch (com.google.gson.JsonSyntaxException ex)
{
resultArray = gson.fromJson(result, ResultArray.class);
}
}
Integer woeid = null;
if (resultObject != null)
{
if (resultObject.query.results != null)
{
woeid = resultObject.query.results.place.woeid;
}
}
else if (resultArray != null)
{
woeid = resultArray.query.results.place[0].woeid;
}
if (woeid != null)
{
YahooWeatherService service = new YahooWeatherService();
Channel channel = service.getForecast(woeid.toString(),
DegreeUnit.CELSIUS);
return channel;
}
else
{
throw new CantFindWeatherException();
}
}
catch (IOException | JsonSyntaxException | JAXBException ex)
{
throw new CantFindWeatherException(ex);
}
}
private static class ResultObject
{
public QueryObject query;
}
private static class ResultArray
{
public QueryArray query;
}
private static class QueryObject
{
public int count;
public String created;
public String lang;
public ResultsObject results;
}
private static class QueryArray
{
public int count;
public String created;
public String lang;
public ResultsArray results;
}
private static class ResultsObject
{
public Place place;
}
private static class ResultsArray
{
public Place[] place;
}
private static class Place
{
public int woeid;
}
}
CantFindWeatherException.java:
/**
*
* @author robert
*/
public class CantFindWeatherException extends Exception
{
public CantFindWeatherException()
{
}
public CantFindWeatherException(String message)
{
super(message);
}
public CantFindWeatherException(String message, Throwable cause)
{
super(message, cause);
}
public CantFindWeatherException(Throwable cause)
{
super(cause);
}
}
As for the first question, I've built a website using forecast.io. It's pretty good: a good API and 1000 free calls/day. It uses latitude/longitude to find the weather of a place.
As for the second question, I would resolve what the user types in with the Google Geocoding API. So when they search for "New York", you check whether you already have the relevant coordinates in your database; otherwise, you make an API call to Google Geocoding.
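The "check your database first, otherwise call the geocoder" idea can be sketched as a small cache. This is only an illustration under assumptions: CoordinateCache and Coordinates are hypothetical names, an in-memory map stands in for the database, and the remote lookup is passed in as a function instead of calling any real geocoding API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class CoordinateCache {
    static final class Coordinates {
        final double latitude;
        final double longitude;

        Coordinates(double latitude, double longitude) {
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    // Stand-in for the database table of already-resolved city names.
    private final Map<String, Coordinates> store = new ConcurrentHashMap<>();
    // Stand-in for the remote geocoding call (hypothetical, supplied by the caller).
    private final Function<String, Coordinates> remoteLookup;

    CoordinateCache(Function<String, Coordinates> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // Returns stored coordinates when present; otherwise performs the remote
    // lookup once with the normalised name and stores the result.
    Coordinates resolve(String city) {
        String key = city.trim().toLowerCase(Locale.ROOT);
        return store.computeIfAbsent(key, remoteLookup);
    }

    public static void main(String[] args) {
        CoordinateCache cache = new CoordinateCache(name -> {
            System.out.println("remote lookup for " + name);
            // Made-up coordinates; a real implementation would call the geocoder here.
            return new Coordinates(40.7, -74.0);
        });
        cache.resolve("New York");
        cache.resolve("new york "); // served from the cache, no second lookup
    }
}
```

Normalising the key (trimming and lower-casing) means small differences in what the user types do not trigger extra API calls.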