I am trying to connect to a website so that I can extract its HTML contents. My application never connects to the site; it only times out.
Here is my code:
URL url = new URL("www.website.com");
URLConnection connection = url.openConnection();
connection.setConnectTimeout(2000);
connection.setReadTimeout(2000);
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line;
while ((line = reader.readLine()) != null) {
// do stuff with line
}
reader.close();
Any ideas would be greatly appreciated. Thanks!
I believe the URL should be as follows (i.e. you need a protocol):
URL url = new URL("http://www.website.com");
If that doesn't help, post a real SSCCE that demonstrates the problem so we don't have to guess what you are actually doing. As it stands, we can't tell whether you are using your try/catch block correctly or just ignoring exceptions.
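A small sketch of the difference, in case it helps: without a protocol the URL constructor throws a MalformedURLException before any connection is attempted, so a swallowed exception can easily hide the real cause. The class and method names below are only illustrative.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlSchemeCheck {
    /** Returns true if the string can be parsed as a URL, i.e. it names a known protocol. */
    static boolean parses(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            // Reached for "www.website.com" because no protocol is given.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("www.website.com parses: " + parses("www.website.com"));
        System.out.println("http://www.website.com parses: " + parses("http://www.website.com"));
    }
}
```

If the original code catches Exception and ignores it, printing the stack trace in the catch block should show which of the two cases applies.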
Related
Sorry for the very basic question, I am new to Java.
To get data from a URL I use code like this:
URL url = new URL(BaseURL+"login?name=foo");
URLConnection yc = url.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
...
That works perfectly fine. When I now want to continue and send the next command to the server (like ".../getStatus"), do I need to create these objects over and over again, or is there a smarter way?
Thanks!
You have to call openConnection again to get a new URLConnection. HttpURLConnection does internal caching, though: if the HTTP server supports Connection: keep-alive, the underlying connection to the server is reused, so it's not as bad as it might first look. It's just hidden from you.
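A minimal sketch of that pattern, assuming a hypothetical server at http://localhost:8080/ and the two commands mentioned in the question. Each request gets its own URLConnection, and the body is read to the end and closed, which is what allows the JDK to put the underlying socket back into its keep-alive cache. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class SequentialRequests {
    /** Opens a fresh connection for the URL and returns the whole body, assuming UTF-8 text. */
    static String fetch(URL url) throws IOException {
        URLConnection connection = url.openConnection(); // one connection object per request
        connection.setConnectTimeout(2000);
        connection.setReadTimeout(2000);
        // Reading to the end and closing the stream lets the socket be reused.
        try (InputStream in = connection.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        String baseUrl = "http://localhost:8080/"; // hypothetical server, adjust as needed
        System.out.println(fetch(new URL(baseUrl + "login?name=foo")));
        System.out.println(fetch(new URL(baseUrl + "getStatus")));
    }
}
```

Whether the second call actually reuses the socket depends on the server honouring keep-alive; the code is the same either way.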
I looked into the Apache HttpComponents (HttpClient) and it still requires a lot of code. As I don't need cookie-handling (it's only a simple RESTful server giving json-blocks as responses) I'm going for a very simple solution:
public static String readStringFromURL(String requestURL) throws IOException
{
URL u = new URL(requestURL);
try (InputStream in = u.openStream()) {
return new String(in.readAllBytes(), StandardCharsets.UTF_8);
}
}
For me that looks like a perfect solution, but as mentioned, I am new to Java and open (and thankful) for hints...
I am trying to hit a URL and get the response from my Java code.
I am using URLConnection to get the response and writing it to an HTML file.
When I open this HTML file in a browser after running the Java class, I get only the Google home page, not the search results.
What's wrong with my code? Here it is:
FileWriter fWriter = null;
BufferedWriter writer = null;
URL url = new URL("https://www.google.co.in/?gfe_rd=cr&ei=aS-BVpPGDOiK8Qea4aKIAw&gws_rd=ssl#q=google+post+request+from+java");
byte[] encodedBytes = Base64.encodeBase64("root:pass".getBytes());
String encoding = new String(encodedBytes);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.setRequestProperty("User-Agent", "Mozilla/5.0");
connection.setRequestProperty("Accept-Charset", "UTF-8");
connection.setDoInput(true);
connection.setRequestProperty("Authorization", "Basic " + encoding);
connection.connect();
InputStream content = (InputStream) connection.getInputStream();
BufferedReader in = new BufferedReader(new InputStreamReader(content));
String line;
try {
fWriter = new FileWriter(new File("f:\\fileName.html"));
writer = new BufferedWriter(fWriter);
while ((line = in.readLine()) != null) {
String s = line.toString();
writer.write(s);
}
writer.close();
} catch (Exception e) {
e.printStackTrace();
}
}
The same code worked a couple of days ago, but it does not work now.
The reason is that this URL does not itself return search results. You have to understand how Google works to see why. The query is in the fragment (the part after #), which a client never sends to the server. Open the URL in your browser and view its source: you will see only lots of JavaScript.
In short, Google uses Ajax requests to process search queries.
To perform the required task you either have to use a headless browser that can execute JavaScript/Ajax (the hard way) or, better, use the Google search API as directed by anand.
This method of searching is not advised and is bound to fail; you should use the Google search APIs for this kind of work.
Note: Google uses redirection and tokens, so even if you find a clever way to handle it, it is bound to fail in the long run.
Edit:
This is a sample of how you can get the work done reliably using the Google search APIs; please refer to the source for more information.
public static void main(String[] args) throws Exception {
String google = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=";
String search = "stackoverflow";
String charset = "UTF-8";
URL url = new URL(google + URLEncoder.encode(search, charset));
Reader reader = new InputStreamReader(url.openStream(), charset);
GoogleResults results = new Gson().fromJson(reader, GoogleResults.class);
// Show title and URL of 1st result.
System.out.println(results.getResponseData().getResults().get(0).getTitle());
System.out.println(results.getResponseData().getResults().get(0).getUrl());
}
I have been trying to retrieve XML data from the URL http://dbpedia.org/data/Berlin.rdf and write it to a file on disk using the following code snippet.
URL urlObj = new URL("http://dbpedia.org/data/Berlin.rdf");
java.net.HttpURLConnection connection = (HttpURLConnection) urlObj.openConnection();
InputStream reader = new BufferedInputStream(connection.getInputStream());
BufferedReader breader = new BufferedReader(new InputStreamReader(reader));
String line;
BufferedWriter writer = new BufferedWriter(new FileWriter("resource.xml"));
while ((line = breader.readLine()) != null) {
// writes the line to the output file; readLine() strips the line terminator, so add it back
writer.write(line);
writer.newLine();
System.out.println(line);
}
writer.close();
connection.disconnect();
But I get this error: Exception in thread "main" java.io.IOException: Server returned HTTP response code: 502 for URL: http://dbpedia.org/data/Berlin.rdf
What is wrong? How can I fix this? Thanks in advance.
A 502 HTTP error (Bad Gateway) is a server-side error.
If you go to the site (http://dbpedia.org/data/Berlin.rdf), you will see that dbpedia is currently undergoing maintenance. Check back in a couple of hours and try again; your code should work fine.
Update: It's working fine now.
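If you want the program to report this case more clearly instead of failing with an IOException, one option is to check the status code before asking for the input stream. This is only a sketch: the class name and the isServerError helper are made up for illustration, and what to do on a 5xx response (retry, wait, give up) is left open.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class StatusCheck {
    /** True for 5xx codes, which indicate a problem on the server side. */
    static boolean isServerError(int status) {
        return status >= 500 && status <= 599;
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("http://dbpedia.org/data/Berlin.rdf");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        try {
            int status = connection.getResponseCode(); // sends the request and returns the HTTP status
            if (isServerError(status)) {
                System.err.println("Server error " + status + " " + connection.getResponseMessage()
                        + ", try again later");
            } else if (status == HttpURLConnection.HTTP_OK) {
                System.out.println("OK, the body can be read from connection.getInputStream()");
            } else {
                System.err.println("Unexpected status " + status);
            }
        } finally {
            connection.disconnect();
        }
    }
}
```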
I need to build a Struts 2 application. In one view of this app, I have to show the view of another application, fetched through a given URL, for example (http://localhost:8080/hudson/)...
Now:
1. How do I connect to the other application? (Can it be done with Apache HttpURLClient, or is there another way? Please guide.)
2. If it can be done with Apache HttpURLClient, how do I render the Response object in the Struts 2 framework?
Please help. Many thanks in advance.
You can use the java.net package to resolve the issue. Example code:
URL urlApi = new URL(requestUrl);
HttpURLConnection httpURLConnection = (HttpURLConnection) urlApi.openConnection();
httpURLConnection.setRequestMethod(<requestMethod>); // GET or POST
httpURLConnection.setDoOutput(true);
//in case HTTP POST method un-comment following to write request body
//DataOutputStream ds = new DataOutputStream(httpURLConnection.getOutputStream());
//ds.writeBytes(body);
InputStream content = (InputStream) httpURLConnection.getInputStream();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(content));
StringBuilder stringBuilder = new StringBuilder(100);
String line = null;
while ((line = bufferedReader.readLine()) != null) {
stringBuilder.append(line);
}
String serverResult = stringBuilder.toString();
//now you have a string representation of the server result page
//do what you need
Hope this helps
I have to do the same. I'm using Struts 2, but I have done it with a servlet. In struts.xml you have to put you can get the content of the URL with HttpURLConnection or with HttpClient (Apache) and put it in the servlet response.
I have this working, but I have problems with the relative links in the HTML, because they are resolved against my domain name (the name of the servlet that does the work).
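One possible way around the relative-link problem is to rewrite each link to an absolute URL, resolved against the address the page was fetched from, before writing the HTML to the servlet response. The sketch below shows only the resolution step with java.net.URI; finding the links in the HTML is not covered, and the example paths (job/demo/, /static/style.css) are hypothetical.

```java
import java.net.URI;

class LinkResolver {
    /** Resolves a link found in the fetched page against the URL the page came from. */
    static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String page = "http://localhost:8080/hudson/";
        System.out.println(resolve(page, "job/demo/"));            // relative to the page
        System.out.println(resolve(page, "/static/style.css"));    // relative to the host root
        System.out.println(resolve(page, "http://example.com/x")); // already absolute, unchanged
    }
}
```

An alternative with less rewriting might be to insert a base element pointing at the original application into the page head, so that the browser does the resolution itself.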
I used url.openConnection() to get text from a webpage, but execution is noticeably delayed when I do it in a loop.
I also tried httpUrl.disconnect(), but it did not change much.
Can anyone suggest a better option?
I used the following code:
for(int i=0;i<10;i++){
URL google = new URL(array[i]);//array of links
HttpURLConnection yc =(HttpURLConnection)google.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
source=source.concat(inputLine);
}
in.close();
yc.disconnect();
}
A couple of issues I can see.
in.readLine() doesn't retain the newline so when you use concat, all the newlines have been removed.
Using concat in a loop like this builds a longer and longer String. This will get slower and slower with each line you add.
Instead you might find IOUtils (from Apache Commons IO) useful.
URL google = new URL("http://123newyear.com/2011/calendars/");
String text = IOUtils.toString(google.openConnection().getInputStream());
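If adding a library is not an option, the two issues above can also be handled with plain JDK classes. A rough sketch, with made-up class and method names: a StringBuilder avoids copying the whole text on every line, and the newline that readLine() strips is appended again.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class LineCollector {
    /** Reads all lines, keeping a newline after each, without repeated String copying. */
    static String readAllLines(Reader source) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n'); // readLine() strips the line terminator
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for new InputStreamReader(connection.getInputStream()).
        System.out.print(readAllLines(new StringReader("first\nsecond")));
    }
}
```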
See Reading Directly from a URL for details on how to get a stream from which you can read the contents of the URL.
Basically, you:
1. Create a URL object: URL url = new URL("http://123newyear.com/2011/calendars/");
2. Call openStream() on the URL object: BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
3. Read from the stream (like you did).
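Put together, the steps above might look like the following sketch. It assumes the page is UTF-8 text, and the class and method names are only illustrative. Note the http:// prefix: without a protocol the URL constructor throws a MalformedURLException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

class UrlReader {
    /** Opens a stream on the URL and returns its content, assuming it is UTF-8 text. */
    static String read(URL url) throws IOException {
        // try-with-resources closes the reader (and the underlying stream) when done
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            return in.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("http://123newyear.com/2011/calendars/");
        System.out.println(read(url));
    }
}
```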