java - get html from ip address

I have devices that publish an HTML page when you connect to their IP address. For example, if I go to "192.168.1.104" on my computer, I see the HTML page the device publishes. I am trying to scrape this HTML, but I am getting errors, specifically a MalformedURLException at the first line of my method. I found some code for getting HTML and tweaked it for my needs; the method is posted below. Thanks
public String getSbuHtml(String ipToPoll) throws IOException, SocketTimeoutException {
URL url = new URL("http", ipToPoll, -1, "/");
URLConnection con = url.openConnection();
con.setConnectTimeout(1000);
con.setReadTimeout(1000);
Pattern p = Pattern.compile("text/html;\\s+charset=([^\\s]+)\\s*");
Matcher m = p.matcher(con.getContentType());
String charset = m.matches() ? m.group(1) : "ISO-8859-1";
BufferedReader r = new BufferedReader(
new InputStreamReader(con.getInputStream(), charset));
String line = null;
StringBuilder buf = new StringBuilder();
while ((line = r.readLine()) != null) {
buf.append(line).append(System.getProperty("line.separator"));
}
return buf.toString();
}
EDIT: The code above has been changed to construct the URL in a way that works with an IP address. However, when I try to get the contentType from the connection, it is null.

A URL (Uniform Resource Locator) must name the means of network communication (the scheme, e.g. http://) along with the resource to locate (e.g. /index.html). An example of a valid URL is
http://192.168.1.104:8080/app/index.html
A bare 192.168.1.104 is only a host address, not a URL.

You need to add http:// to the front of your String that you pass into the method.
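A minimal sketch of the difference, assuming the device answers plain http on the default port. The class and helper names (DeviceUrlDemo, parsesAsUrl, urlFor) are made up for illustration; only java.net.URL and MalformedURLException come from the JDK.

```java
import java.net.MalformedURLException;
import java.net.URL;

class DeviceUrlDemo {

    // Returns true when the string parses as a URL on its own.
    static boolean parsesAsUrl(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false; // a bare IP has no protocol part, so parsing fails
        }
    }

    // Builds an http URL string from a bare host or IP.
    // Port -1 means "use the protocol default" (80 for http).
    static String urlFor(String host) {
        try {
            return new URL("http", host, -1, "/").toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot build URL for host: " + host, e);
        }
    }

    public static void main(String[] args) {
        String ip = "192.168.1.104";
        System.out.println(parsesAsUrl(ip));                   // bare IP
        System.out.println(parsesAsUrl("http://" + ip + "/")); // with scheme
        System.out.println(urlFor(ip));                        // multi-argument constructor
    }
}
```

Either prefixing "http://" or using the multi-argument constructor should avoid the exception; which one reads better is a matter of taste.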

Create your URL as follows:
URL url = new URL("http", ipToPoll, -1, "/");
And since you're reading a potentially long HTML page I suppose buffering would help here:
BufferedReader r = new BufferedReader(
new InputStreamReader(con.getInputStream(), charset));
String line = null;
StringBuilder buf = new StringBuilder();
while ((line = r.readLine()) != null) {
buf.append(line).append(System.getProperty("line.separator"));
}
return buf.toString();
EDIT: In response to your problem with contentType coming back null.
Before you inspect any headers with getContentType() or retrieve content with getInputStream(), you should actually establish a connection with the URL resource by calling connect():
URL url = new URL("http", ipToPoll, "/"); // -1 removed; assuming port = 80 always
// check your device html page address; change "/" to "/index.html" if required
URLConnection con = url.openConnection();
// set connection properties
con.setConnectTimeout(1000);
con.setReadTimeout(1000);
// establish connection
con.connect();
// get "content-type" header
Pattern p = Pattern.compile("text/html;\\s+charset=([^\\s]+)\\s*");
Matcher m = p.matcher(con.getContentType());
Despite what its name suggests, openConnection() doesn't establish any connection. It just gives you a URLConnection instance on which to set connection properties, such as the connect timeout via setConnectTimeout().
If this is hard to picture, it's analogous to new File(), which merely represents a file path and doesn't create the file (assuming it doesn't already exist) unless you go on to call File.createNewFile() (or pass it to a FileWriter).
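One more thing worth guarding against: getContentType() is documented to return null when the content type is not known, for instance if the device sends no Content-Type header at all, and passing null to Pattern.matcher() throws a NullPointerException. A rough sketch of a null-safe charset lookup follows; the class name, method name and the slightly looser pattern are my own assumptions, not part of the question's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContentTypeCharset {

    // Looser than the pattern in the question: finds "charset=..." anywhere
    // in the header value, ignoring case and optional quotes.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in a Content-Type value, or the fallback
    // when the header is missing (null) or names no charset.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf(null, "ISO-8859-1"));
        System.out.println(charsetOf("text/html", "ISO-8859-1"));
        System.out.println(charsetOf("text/html; charset=UTF-8", "ISO-8859-1"));
    }
}
```

In the method from the question this would be called as charsetOf(con.getContentType(), "ISO-8859-1") before creating the InputStreamReader.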

Related

HTTP URL connection response

I am trying to hit a URL and get the response from my Java code.
I am using URLConnection to get the response and writing it to an HTML file.
When I open this HTML file in a browser after running the Java class, I get only the Google home page, without the search results.
What's wrong with my code? Here it is:
FileWriter fWriter = null;
BufferedWriter writer = null;
URL url = new URL("https://www.google.co.in/?gfe_rd=cr&ei=aS-BVpPGDOiK8Qea4aKIAw&gws_rd=ssl#q=google+post+request+from+java");
byte[] encodedBytes = Base64.encodeBase64("root:pass".getBytes());
String encoding = new String(encodedBytes);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.setRequestProperty("User-Agent", "Mozilla/5.0");
connection.setRequestProperty("Accept-Charset", "UTF-8");
connection.setDoInput(true);
connection.setRequestProperty("Authorization", "Basic " + encoding);
connection.connect();
InputStream content = (InputStream) connection.getInputStream();
BufferedReader in = new BufferedReader(new InputStreamReader(content));
String line;
try {
fWriter = new FileWriter(new File("f:\\fileName.html"));
writer = new BufferedWriter(fWriter);
while ((line = in.readLine()) != null) {
String s = line.toString();
writer.write(s);
}
writer.close();
} catch (Exception e) {
e.printStackTrace();
}
}
The same code worked a couple of days ago, but not now.
The reason is that this URL does not return the search results itself. To see why, you have to understand how Google works: open this URL in your browser and view its source. You will see mostly JavaScript there.
In short, Google uses Ajax requests to process search queries.
To do what you want, you either have to use a headless browser that can execute JavaScript/Ajax (the hard way) or, better, use the Google search API as directed by anand.
Scraping the search page like this is not advised and is bound to fail; you should use the Google search APIs for this kind of work.
Note: Google uses redirection and tokens, so even if you find a clever way to handle them, it is bound to fail in the long run.
Edit:
This is a sample of how you can get the job done reliably using the Google search APIs; please refer to the source for more information.
public static void main(String[] args) throws Exception {
String google = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=";
String search = "stackoverflow";
String charset = "UTF-8";
URL url = new URL(google + URLEncoder.encode(search, charset));
Reader reader = new InputStreamReader(url.openStream(), charset);
GoogleResults results = new Gson().fromJson(reader, GoogleResults.class);
// Show title and URL of 1st result.
System.out.println(results.getResponseData().getResults().get(0).getTitle());
System.out.println(results.getResponseData().getResults().get(0).getUrl());
}

HttpUrlConnection's response omits the word 'http'

I create the URL object using a string like "http://www.example.com/a?s=12". I read the HTML response into the string serverResponse. This string is expected to hold the entire HTML of a page, which has JavaScript and CSS includes. But strangely, the word "http:" is missing from all the URLs present in the response, e.g. in place of "http://example.com" I get "//asd.com". Any ideas?
URL obj = new URL(url);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
con.setRequestMethod("GET");
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
String inputLine;
StringBuffer serverResponse = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
serverResponse.append(inputLine);
System.out.println(inputLine);
}
in.close();
System.out.println(serverResponse);
See here: Protocol-relative URLs
This string is expected to have the entire HTML of a page, which has javascript and CSS includes.
Why? A properly-constructed site will use relative URLs as much as possible. This seems to be one of them. Well done them, or you if it's your work.
But strangely, the word "http:" is missing from all the URLs present in the response, eg in place of "http://example.com" I get "//asd.com". Any ideas?
It's called a protocol-relative URL.
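If you need absolute URLs from the scraped HTML, a protocol-relative reference can be resolved against the URL of the page it came from, and it then inherits that page's scheme. A small sketch with java.net.URI; the class name, helper name and the "/style.css" path are made up for illustration.

```java
import java.net.URI;

class ProtocolRelativeDemo {

    // Resolves a reference found in the HTML against the URL of the page
    // it was found on. A "//host/path" reference keeps its own host and
    // path and takes only the scheme from the page URL.
    static String resolve(String pageUrl, String reference) {
        return URI.create(pageUrl).resolve(reference).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://www.example.com/a?s=12", "//asd.com/style.css"));
        System.out.println(resolve("https://www.example.com/a?s=12", "//asd.com/style.css"));
    }
}
```

So the same "//asd.com/..." reference becomes an http URL on an http page and an https URL on an https page, which is the reason sites write links this way.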

How to set the right charset when working through a proxy server with jsoup?

This code gives me the content, but some Russian characters show up as squares. Does anyone know how to set the UTF-8 or cp1251 charset so that content fetched through the proxy is decoded correctly? Fiddling with the code has not given me any results; getBytes and other methods don't produce a normal result.
URL url = new URL(linkCar);
String your_proxy_host = new String(proxys.getValueAt(xProxy, 1).toString());
int your_proxy_port = Integer.parseInt(proxys.getValueAt(xProxy, 2).toString());
Proxy proxy = null;
System.out.println(proxys.getValueAt(xProxy, 3).toString());
proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(your_proxy_host, your_proxy_port));
HttpURLConnection connection = (HttpURLConnection)url.openConnection(proxy);
connection.setConnectTimeout(16000);
connection.connect();
proxys is the table model that holds the list of proxies.
Also, does anyone know how to connect through a SOCKS proxy?
For UTF-8, try changing the line
BufferedReader buffer_input = new BufferedReader(new InputStreamReader(connection.getInputStream()));
to
BufferedReader buffer_input = new BufferedReader(new InputStreamReader(connection.getInputStream(),"UTF-8"));
You can replace "UTF-8" with another charset name if the page uses a different charset.
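A rough sketch of both points from the question, decoding with an explicit charset and building a SOCKS proxy. The names CharsetReadDemo, readAll and socksProxy are hypothetical, and the proxy address in main is a placeholder. For a cp1251 page you would pass Charset.forName("windows-1251") instead of UTF-8, assuming your JRE ships that charset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetReadDemo {

    // Reads the whole stream, decoding bytes with the given charset
    // instead of the platform default.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Same Proxy class as in the question, with Type.SOCKS instead of Type.HTTP.
    static Proxy socksProxy(String host, int port) {
        return new Proxy(Proxy.Type.SOCKS, new InetSocketAddress(host, port));
    }

    public static void main(String[] args) {
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442"; // a Russian word, written with escapes
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in round-trips the text.
        System.out.println(readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8).equals(text));
        // Decoding with a different charset garbles it.
        System.out.println(readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1).equals(text));

        // Placeholder address; pass the result to url.openConnection(proxy).
        System.out.println(socksProxy("127.0.0.1", 1080).type());
    }
}
```

With the connection from the question, the call would look like readAll(connection.getInputStream(), StandardCharsets.UTF_8).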

How to invoke localhost servlet with URL Connection without specifying full url?

I am invoking a local servlet from a jsp, the servlet simply returns a json string:
URL url = new URL("http://myapp.appspot.com/myservlet");
URLConnection conn = url.openConnection();
conn.setConnectTimeout(5000);
InputStream is = conn.getInputStream();
StringWriter writer = new StringWriter();
IOUtils.copy(is, writer, "UTF-8");
String jsonStr = writer.toString();
Can I do this with a relative path so that it works both locally and on the deployed instance?
You could use JSTL with the tag
<c:import>
Alternatively, for your posted code, you could use
String requestURL = request.getRequestURL().toString();
String servletPath = request.getServletPath();
String serverPath = requestURL.substring(0,requestURL.indexOf(servletPath));
URL url = new URL(serverPath + "/myservlet");
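The substring step above can be tried out without a servlet container by feeding it plain strings. This is only a sketch: the class and method names are made up, and the sample values stand in for what request.getRequestURL() and request.getServletPath() would return for a page at /page.jsp.

```java
class BaseUrlDemo {

    // Cuts the request URL at the servlet path and appends the target path,
    // so the result points at the same server (and context path) the request came in on.
    static String siblingUrl(String requestUrl, String servletPath, String target) {
        int idx = servletPath.isEmpty() ? -1 : requestUrl.indexOf(servletPath);
        if (idx < 0) {
            throw new IllegalArgumentException("Servlet path not found in request URL");
        }
        return requestUrl.substring(0, idx) + target;
    }

    public static void main(String[] args) {
        // Local run (assumed values)
        System.out.println(siblingUrl("http://localhost:8080/page.jsp", "/page.jsp", "/myservlet"));
        // Deployed instance (assumed values)
        System.out.println(siblingUrl("http://myapp.appspot.com/page.jsp", "/page.jsp", "/myservlet"));
    }
}
```

The empty-path check is there because getServletPath() returns an empty string for a servlet mapped to "/*", in which case the cut point cannot be found this way.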
You mean like this?
String urlString = "http://localhost/myservlet/";
localhost is an alias for 127.0.0.1, which is always "the local computer".
ServletRequest.getServerPort() will let you know the port where the user connected.
Depending on where this is happening, what you really might want to use is ServletRequest.getRequestDispatcher(), which bypasses the network layer, and stays inside your servlet container.
You can wrap the HttpServletResponse, send the wrapper through the RequestDispatcher, and then extract the String that was produced with something like this:
http://goo.gl/kRW1b

Google Language detection api replying error code 406

I am trying to use the Google language detection API. Right now I am using the sample from the Google documentation, as follows:
public static String googleLangDetection(String str) throws IOException, JSONException{
String urlStr = "http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=";
// String urlStr = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=Paris%20Hilton";
URL url = new URL(urlStr+str);
URLConnection connection = url.openConnection();
// connection.addRequestProperty("Referer","http://www.hpeprint.com");
String line;
StringBuilder builder = new StringBuilder();
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
while((line = reader.readLine()) != null) {
builder.append(line);
}
JSONObject json = new JSONObject(builder.toString());
for (Iterator iterator = json.keys(); iterator.hasNext();) {
String type = (String) iterator.next();
System.out.println(type);
}
return json.getString("language");
}
But I am getting HTTP error code 406.
I don't understand what the problem is, since the commented-out Google search query works fine.
The resulting language detection URL itself works fine when I open it in Firefox or IE, but it fails in my Java code.
Is there something I am doing wrong?
Thanks in advance
Ashish
As a guess, whatever is being passed in as str contains characters that are invalid in a URL; error code 406 is Not Acceptable, and it looks to be returned when there is a content encoding issue.
After a quick search, it looks like you need to run str through the java.net.URLEncoder class and then append it to the URL.
Found the answer at following link:
HTTP URL Address Encoding in Java
Had to modify the code as follows:
URI uri = new URI("http","ajax.googleapis.com","/ajax/services/language/detect","v=1.0&q="+str,null);
URL url = uri.toURL();
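For comparison, here is a small sketch of the two approaches mentioned in this thread, using the endpoint from the question. The class and method names are made up. As far as I can tell the two are not equivalent: URLEncoder escapes everything that is special in a query value, while the multi-argument URI constructor only quotes characters that are illegal in a query, so a literal "&" or "=" inside str passes through unescaped.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class QueryEncodingDemo {

    static final String ENDPOINT =
            "http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=";

    // Approach 1: form-encode the value, then append it to the URL string.
    static String withUrlEncoder(String text) {
        try {
            return ENDPOINT + URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Approach 2: let the multi-argument URI constructor quote illegal characters.
    static String withUriConstructor(String text) {
        try {
            URI uri = new URI("http", "ajax.googleapis.com",
                    "/ajax/services/language/detect", "v=1.0&q=" + text, null);
            return uri.toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withUrlEncoder("hello world"));
        System.out.println(withUriConstructor("hello world"));
        System.out.println(withUrlEncoder("a&b"));
        System.out.println(withUriConstructor("a&b"));
    }
}
```

If str can contain "&", "=" or "+", the URLEncoder route is probably the safer of the two.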
