HttpURLConnection's response omits the word 'http' - java

I create the URL object using a string like "http://www.example.com/a?s=12". I read the HTML response into the string serverResponse. This string is expected to contain the entire HTML of a page, which has JavaScript and CSS includes. But strangely, "http:" is missing from all the URLs in the response, e.g. in place of "http://example.com" I get "//asd.com". Any ideas?
URL obj = new URL(url);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
con.setRequestMethod("GET");
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
String inputLine;
StringBuilder serverResponse = new StringBuilder();
while ((inputLine = in.readLine()) != null) {
    serverResponse.append(inputLine);
    System.out.println(inputLine);
}
in.close();
System.out.println(serverResponse);

See here: Protocol-relative URLs

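This string is expected to have the entire HTML of a page, which has JavaScript and CSS includes.
Why? A properly constructed site will use relative URLs as much as possible. This seems to be one of them. Well done them, or you if it's your work.
But strangely, the word "http:" is missing from all the URLs present in the response, e.g. in place of "http://example.com" I get "//asd.com". Any ideas?
It's called a protocol-relative URL. The server sends the links that way (HttpURLConnection does not rewrite the response), and a browser resolves them using the scheme of the page they appear on, so on an http:// page "//asd.com" means "http://asd.com".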
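To see how such a link is interpreted from Java, you can resolve it against the URL of the page it came from with java.net.URI, which applies the RFC 2396 relative-resolution rules. A minimal sketch; the path /script.js is a made-up example, and ProtocolRelative/resolve are illustrative names:

```java
import java.net.URI;

public class ProtocolRelative {
    // Resolves a (possibly protocol-relative) link against the URL of the page it came from.
    static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        // "//asd.com/script.js" has an authority but no scheme, so it inherits "http" from the page
        System.out.println(resolve("http://www.example.com/a?s=12", "//asd.com/script.js")); // http://asd.com/script.js
        // The same link found on an https page would resolve to https
        System.out.println(resolve("https://www.example.com/a", "//asd.com/script.js")); // https://asd.com/script.js
    }
}
```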

Related

Locate and extract only particular tag in HTML response Java

I am trying to find the gender of a name by using the website "http://www.gpeters.com/names/baby-names.php". I was able to pass parameters using a GET request and get the HTML page as the response, like the following:
URL url = new URL(
        "http://www.gpeters.com/names/baby-names.php?name=sarah");
HttpURLConnection connection = null;
try {
    // Create connection
    connection = (HttpURLConnection) url.openConnection();
    connection.setRequestMethod("GET");
    connection.setRequestProperty("Content-Type",
            "application/x-www-form-urlencoded");
    connection.setRequestProperty("Content-Language", "en-US");
    connection.setUseCaches(false);
    connection.setDoInput(true);
    connection.setDoOutput(true);
    connection.connect();
    // Get Response
    InputStream is = connection.getInputStream();
    int status = connection.getResponseCode();
    //System.out.println(status);
    BufferedReader rd = new BufferedReader(new InputStreamReader(is));
    String line;
    while ((line = rd.readLine()) != null) {
        System.out.println(line);
    }
    rd.close();
    // program prints whole HTML page as response.
} finally {
    if (connection != null) {
        connection.disconnect();
    }
}
The HTML response has an element like "It's a girl!" where the required result is located. How do I extract only that string and print whether the input parameter is a boy or a girl? Example: sarah is a girl.
Add JTidy to your project. Use it to convert HTML to XML. After that, you can use standard XML tools like JDOM 2 or Jaxen to examine the data.
What you need to do is look at the HTML code and determine a unique path that lets you identify the desired element. There are no simple solutions here, but some pointers:
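For simple lookups you may not need JDOM 2 or Jaxen at all: once JTidy has produced well-formed XHTML, the JDK's built-in DOM parser and javax.xml.xpath can query it. A minimal sketch, assuming the tidied markup is already in a String and that the result sits in an element with a hypothetical id of "result" (inspect the real page to find the actual element); XhtmlLookup and textById are illustrative names:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XhtmlLookup {
    // Returns the trimmed text of the first element whose id attribute equals id,
    // or "" if there is no such element. The "*" node test matches any element,
    // so the lookup does not depend on the element's name.
    static String textById(String xhtml, String id)
            throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // id is concatenated into the expression, so only pass trusted values without quotes
        return xpath.evaluate("//*[@id='" + id + "']", doc).trim();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for JTidy output; the "result" id is hypothetical
        String xhtml = "<html><body><div id=\"result\"> It's a girl! </div></body></html>";
        System.out.println(textById(xhtml, "result")); // It's a girl!
    }
}
```

If the tidied output begins with a DOCTYPE that references an external DTD, the parser may try to fetch that DTD over the network.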
Look for elements with id attributes since they are unique
Look for elements that are rare.
Look for unique texts
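Following the "look for unique texts" pointer, a minimal sketch that skips parsing entirely and searches the raw HTML for the marker text. It assumes the page contains the literal text "It's a boy!" or "It's a girl!"; if the site encodes the apostrophe as an entity such as &#39;, the pattern would need to change. UniqueTextLookup and genderFromHtml are illustrative names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UniqueTextLookup {
    // Assumes the result appears literally as "It's a boy!" or "It's a girl!"
    private static final Pattern RESULT = Pattern.compile("It's a (boy|girl)!");

    // Returns "boy" or "girl", or null if the marker text is not in the page
    static String genderFromHtml(String html) {
        Matcher m = RESULT.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String name = "sarah";
        String html = "<p><b>It's a girl!</b></p>"; // stand-in for the real response
        String gender = genderFromHtml(html);
        if (gender != null) {
            System.out.println(name + " is a " + gender); // sarah is a girl
        }
    }
}
```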

HTML Form Action Java

I have a Java program that I want to use to submit input to an HTML form. If possible, it could just load a URL like this:
.../html_form_action.asp?kill=Kill+Server
But I'm not sure how to load a URL in Java. How would I do this? Or is there a better way to send an action to an HTML form?
Depending on your security, you can make an HTTP call in Java. This is often referred to as a RESTful call. The HttpURLConnection class encapsulates basic GET/POST requests. There is also Apache HttpClient.
Here's how you can use URLConnection to send a simple HTTP request.
// url is the form's action URL as a String, query the URL-encoded parameters (e.g. "kill=Kill+Server")
URL requestUrl = new URL(url + "?" + query);
// set connection properties
URLConnection connection = requestUrl.openConnection();
connection.setRequestProperty("Accept-Charset", "UTF-8");
connection.connect(); // send request
// read response
BufferedReader reader = new BufferedReader(
        new InputStreamReader(connection.getInputStream()));
String line = null;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
reader.close(); // close the response stream
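The URLConnection snippet assumes query is already URL-encoded. A minimal sketch of building it with URLEncoder (the Charset overload needs Java 10+), plus a POST variant for forms declared with method="post". The action URL in main is hypothetical, since the real address is elided in the question; FormQuery, encodeForm, getUrl and postForm are illustrative names:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormQuery {
    // Encodes name/value pairs as application/x-www-form-urlencoded, the format browsers
    // use for form submissions: spaces become '+', reserved characters become %XX.
    static String encodeForm(Map<String, String> fields) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            query.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    // Appends the encoded fields to the form's action URL, as a GET submission would.
    static String getUrl(String action, Map<String, String> fields) {
        return action + "?" + encodeForm(fields);
    }

    // Sends the same encoded fields as a POST body and returns the HTTP status code.
    static int postForm(URL action, Map<String, String> fields) throws IOException {
        HttpURLConnection con = (HttpURLConnection) action.openConnection();
        try {
            con.setRequestMethod("POST");
            con.setDoOutput(true);
            con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            try (OutputStream out = con.getOutputStream()) {
                out.write(encodeForm(fields).getBytes(StandardCharsets.UTF_8));
            }
            return con.getResponseCode();
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("kill", "Kill Server");
        // Hypothetical action URL; substitute the real address of html_form_action.asp
        System.out.println(getUrl("http://example.com/html_form_action.asp", fields));
        // prints http://example.com/html_form_action.asp?kill=Kill+Server
    }
}
```

Which variant to use depends on the form's method attribute; the server may reject a GET for a form it expects to receive by POST.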

java - get html from ip address

I have devices that publish an HTML page when you connect via their IP address. For example, if I go to "192.168.1.104" on my computer, I see the HTML page the device publishes. I am trying to scrape this HTML, but I am getting errors, specifically a MalformedURLException at the first line of my method. I have posted my method below. I found some code for getting HTML and tweaked it for my needs. Thanks
public String getSbuHtml(String ipToPoll) throws IOException, SocketTimeoutException {
    URL url = new URL("http", ipToPoll, -1, "/");
    URLConnection con = url.openConnection();
    con.setConnectTimeout(1000);
    con.setReadTimeout(1000);
    Pattern p = Pattern.compile("text/html;\\s+charset=([^\\s]+)\\s*");
    Matcher m = p.matcher(con.getContentType());
    String charset = m.matches() ? m.group(1) : "ISO-8859-1";
    BufferedReader r = new BufferedReader(
            new InputStreamReader(con.getInputStream(), charset));
    String line = null;
    StringBuilder buf = new StringBuilder();
    while ((line = r.readLine()) != null) {
        buf.append(line).append(System.getProperty("line.separator"));
    }
    return buf.toString();
}
EDIT: The above code has been changed to construct the URL in a way that works with an IP address. However, when I try to get the content type from the connection, it is null.
A URL (Uniform Resource Locator) must include the means of network communication (the scheme, e.g. http://) along with the host, and usually a path to the resource to locate (e.g. index.html). So an example of a valid URL is
http://192.168.1.104:8080/app/index.html
On its own, 192.168.1.104 is not a URL.
You need to add http:// to the front of your String that you pass into the method.
Create your URL as follows:
URL url = new URL("http", ipToPoll, -1, "/");
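That constructor works, but the public URL constructors are deprecated in recent JDKs (from Java 20). A minimal sketch of the same fix built through java.net.URI, which also shows why a bare IP is rejected; DeviceUrl and forDevice are illustrative names:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class DeviceUrl {
    // Builds "http://<ip>/" for a bare IP address such as "192.168.1.104".
    // No port is given, so the http default (80) applies, like port -1 in URL("http", ip, -1, "/").
    static URL forDevice(String ip) throws URISyntaxException, MalformedURLException {
        return new URI("http", ip, "/", null).toURL();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(forDevice("192.168.1.104")); // http://192.168.1.104/

        // Without a scheme the string is only a relative reference, so it cannot become a URL
        try {
            URI.create("192.168.1.104").toURL();
        } catch (IllegalArgumentException e) {
            System.out.println("Not an absolute URL: " + e.getMessage());
        }
    }
}
```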
And since you're reading a potentially long HTML page, I suppose buffering would help here:
BufferedReader r = new BufferedReader(
        new InputStreamReader(con.getInputStream(), charset));
String line = null;
StringBuilder buf = new StringBuilder();
while ((line = r.readLine()) != null) {
    buf.append(line).append(System.lineSeparator());
}
return buf.toString();
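EDIT: In response to your problem with contentType coming back null:
Before you inspect any headers, for example with getContentType(), or retrieve content with getInputStream(), a connection to the URL resource has to be established. You can make that step explicit by calling connect(). (The URLConnection Javadoc notes that operations that depend on being connected implicitly connect if necessary, so if getContentType() still returns null, the device is most likely not sending a Content-Type header at all.)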
URL url = new URL("http", ipToPoll, "/"); // -1 removed; the http default port (80) is used
// check your device's HTML page address; change "/" to "/index.html" if required
URLConnection con = url.openConnection();
// set connection properties
con.setConnectTimeout(1000);
con.setReadTimeout(1000);
// establish connection
con.connect();
// get "content-type" header; it may be null if the device doesn't send one
String contentType = con.getContentType();
Pattern p = Pattern.compile("text/html;\\s*charset=([^\\s]+)\\s*");
Matcher m = p.matcher(contentType != null ? contentType : "");
String charset = m.matches() ? m.group(1) : "ISO-8859-1";
Despite what its name suggests, openConnection() doesn't establish a connection. It just gives you a URLConnection instance that lets you set connection properties, such as the connection timeout with setConnectTimeout().
If you're finding this hard to understand, it may help to know that it's analogous to new File(), which simply represents a file but doesn't create one (assuming it doesn't exist already) unless you go on to call File.createNewFile() (or pass it to a FileWriter).
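In the question's code, p.matcher(con.getContentType()) would throw a NullPointerException when the header is missing, and the pattern only matches when there is whitespace after the semicolon. A minimal null-tolerant sketch of the charset lookup; ContentTypeCharset and charsetOf are illustrative names, and ISO-8859-1 as the fallback follows the question's code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentTypeCharset {
    // Matches charset=UTF-8 or charset="UTF-8", with or without a space after the ';'
    private static final Pattern CHARSET =
            Pattern.compile("charset=\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in a Content-Type header, or the fallback when the
    // header is missing (getContentType() returned null) or names no charset.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8", "ISO-8859-1")); // UTF-8
        System.out.println(charsetOf("text/html;charset=utf-8", "ISO-8859-1"));  // utf-8
        System.out.println(charsetOf(null, "ISO-8859-1"));                       // ISO-8859-1
    }
}
```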

Creating a view in Struts2 to get a view of another application through URL

I need to build a Struts 2 application. In one view of this app, I have to show the view of another application through a provided URL, for example (http://localhost:8080/hudson/).
1. How do I connect to the other application? (Can it be done with Apache HttpClient, or is there another way? Please advise.)
2. If it can be done with Apache HttpClient, how do I render the response object in the Struts 2 framework?
Please help. Many thanks in advance.
You can use the java.net package for this. Example code:
URL urlApi = new URL(requestUrl);
HttpURLConnection httpURLConnection = (HttpURLConnection) urlApi.openConnection();
String requestMethod = "GET"; // GET or POST
httpURLConnection.setRequestMethod(requestMethod);
//in case of HTTP POST, un-comment the following to write the request body
//(writing UTF-8 bytes, since DataOutputStream.writeBytes drops the high byte of each char)
//httpURLConnection.setDoOutput(true);
//try (OutputStream os = httpURLConnection.getOutputStream()) {
//    os.write(body.getBytes(StandardCharsets.UTF_8));
//}
InputStream content = httpURLConnection.getInputStream();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(content));
StringBuilder stringBuilder = new StringBuilder(100);
String line = null;
while ((line = bufferedReader.readLine()) != null) {
    stringBuilder.append(line).append('\n');
}
bufferedReader.close();
String serverResult = stringBuilder.toString();
//now you have a string representation of the server result page
//do what you need
Hope this helps
I have to do the same. I'm using Struts 2, but I did it with a servlet. In struts.xml you have to put … You can get the content of the URL with HttpURLConnection or with Apache HttpClient and write it to the servlet response.
I have this working, but I have problems with the relative links in the HTML, because they get resolved against my domain name (that of the servlet doing the work).
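One common workaround for the relative-link problem (not from the original answers) is to inject a <base href> element into the fetched page so the browser resolves relative links against the original application. A minimal, regex-based sketch; BaseHref and addBaseHref are illustrative names. Note that the browser then loads those resources directly from the base URL, so it must be reachable from the user's machine: http://localhost:8080/hudson/ only works when the browser runs on the same host as Hudson.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BaseHref {
    // Matches the opening <head> tag, with or without attributes, in any letter case
    private static final Pattern HEAD_OPEN =
            Pattern.compile("<head(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    // Inserts <base href="..."> right after <head> so relative links resolve against
    // baseUrl instead of the proxying servlet. Returns the HTML unchanged if there is no <head>.
    static String addBaseHref(String html, String baseUrl) {
        Matcher m = HEAD_OPEN.matcher(html);
        if (!m.find()) {
            return html;
        }
        String base = "<base href=\"" + baseUrl.replace("\"", "&quot;") + "\">";
        return html.substring(0, m.end()) + base + html.substring(m.end());
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Hudson</title></head>"
                + "<body><a href=\"job/x/\">x</a></body></html>";
        System.out.println(addBaseHref(html, "http://localhost:8080/hudson/"));
    }
}
```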

Optimized option for getting text from a web page

I used url.openConnection() to get text from a web page, but I got a time delay in execution when I tried it in loops. I also tried httpUrl.disconnect(), but the change is not that much. Can anyone give me a better option for this? I used the following code:
for (int i = 0; i < 10; i++) {
    URL google = new URL(array[i]); // array of links
    HttpURLConnection yc = (HttpURLConnection) google.openConnection();
    BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        source = source.concat(inputLine);
    }
    in.close();
    yc.disconnect();
}
A couple of issues I can see.
in.readLine() doesn't retain the newline, so when you use concat, all the newlines are lost.
Using concat in a loop like this builds a longer and longer String, copying everything read so far on each call. This gets slower and slower with each line you add.
Instead, you might find IOUtils from Apache Commons IO useful.
URL google = new URL("http://123newyear.com/2011/calendars/");
try (InputStream in = google.openConnection().getInputStream()) {
    String text = IOUtils.toString(in, StandardCharsets.UTF_8); // use text here
}
See Reading Directly from a URL for details on how to get a stream from which you can read the contents of the URL.
Basically, you:
1. Create a URL: URL url = new URL("http://123newyear.com/2011/calendars/");
2. Call openStream() on the URL object:
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
3. Read from the stream (as you did).
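Putting these steps together while addressing both issues raised in the first answer (lost newlines and repeated concat), a minimal plain-JDK sketch might look like this. It assumes UTF-8 for every page, which may not hold for all sites; PageText, readAll and fetchAll are illustrative names, not an existing API:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class PageText {
    // Reads a whole stream into one String, keeping a '\n' after every line and
    // appending to a StringBuilder instead of calling String.concat repeatedly.
    // Closing the reader also closes the underlying stream.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    // Fetches each URL in turn and collects all the text (UTF-8 assumed for every page).
    static String fetchAll(List<URL> urls) throws IOException {
        StringBuilder source = new StringBuilder();
        for (URL url : urls) {
            source.append(readAll(url.openStream(), StandardCharsets.UTF_8));
        }
        return source.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream demo = new ByteArrayInputStream(
                "line 1\nline 2\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(demo, StandardCharsets.UTF_8));
    }
}
```

This removes the quadratic copying, but each request still waits for the network, so network latency will likely still dominate the total time for ten pages.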
