Download file programmatically - java

I am trying to download a vCalendar file using a Java application, but I can't download it from one specific link.
My code is:
URL uri = new URL("http://codebits.eu/s/calendar.ics");
InputStream in = uri.openStream();
int r = in.read();
while (r != -1) {
    System.out.print((char) r);
    r = in.read();
}
When I try to download from another link it works (e.g. http://www.mysportscal.com/Files_iCal_CSV/iCal_AUTO_2011/f1_2011.ics). Something doesn't allow me to download from that particular URL and I can't figure out why; when I try it in the browser it works.

I'd follow this example. Basically, get the response code for the connection. If it's a redirect (e.g. 301 in this case), retrieve the Location header and attempt to access the file using that URL.
Simplistic Example:
URL uri = new URL("http://codebits.eu/s/calendar.ics");
HttpURLConnection con = (HttpURLConnection)uri.openConnection();
System.out.println(con.getResponseCode());
System.out.println(con.getHeaderField("Location"));
uri = new URL(con.getHeaderField("Location"));
con = (HttpURLConnection)uri.openConnection();
InputStream in = con.getInputStream();

You should check what that link actually provides. For example, it might be a page that has moved, which gives you back an HTTP 301 code. Your browser will automatically know to go and fetch it from the new URL, but your program won't.
You might want to use, for example, Wireshark to sniff the actual traffic when you make the browser request.

I also think there is a redirect. The browser downloads from the SSL-secured https://codebits.eu/s/calendar.ics. Try using an HttpURLConnection; it should follow redirects automatically:
HttpURLConnection con = (HttpURLConnection)uri.openConnection();
InputStream in = con.getInputStream();
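Note, though, that HttpURLConnection will not follow a redirect that changes protocol (http to https), so if that is what is happening here you still need the manual Location handling from the earlier answer. A minimal sketch:
// Follow a single http -> https redirect by hand (sketch, not production code;
// needs java.net.URL, java.net.HttpURLConnection, java.io.InputStream)
URL uri = new URL("http://codebits.eu/s/calendar.ics");
HttpURLConnection con = (HttpURLConnection) uri.openConnection();
int code = con.getResponseCode();
if (code == HttpURLConnection.HTTP_MOVED_PERM || code == HttpURLConnection.HTTP_MOVED_TEMP) {
    // Re-open the connection against the redirect target from the Location header
    con = (HttpURLConnection) new URL(con.getHeaderField("Location")).openConnection();
}
InputStream in = con.getInputStream();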

Related

Taking text from a response web page using Java

I am sending commands to a server using HTTP, and I currently need to parse a response that the server sends back (I am sending the command via the command line, and the server's response appears in my browser).
There are a lot of resources such as this: Saving a web page to a file in Java, that clearly illustrate how to scrape a page such as cnn.com. However, since this is a response page that is only generated when the camera receives a specific command, my attempts to use the method described by Mike Deck (in the link above) have met with failure. (Specifically, when my program requests the page again the server returns a 401 error.)
The response from the server opens a new tab in my browser. Essentially, I need to know how to save the current web page using Java, since reading in a file is probably the simplest way to approach this. Do any of you know how to do this?
TL;DR: How do you save the current web page to a webpage.html or webpage.txt file using Java?
EDIT: I used Base64 from Apache Commons Codec, which solved my 401 authentication issue. However, I am still getting a 400 error when I attempt to open my InputStream (see below). Does this mean a connection isn't being established in the first place?
URL url = new URL("http://" + ipAddress + "/axis-cgi/record/record.cgi?diskid=SD_DISK");
byte[] encodedBytes = Base64.encodeBase64("root:pass".getBytes());
String encoding = new String(encodedBytes);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("POST");
connection.setDoInput(true);
connection.setRequestProperty("Authorization", "Basic " + encoding);
connection.connect();
InputStream content = connection.getInputStream();
BufferedReader in = new BufferedReader(new InputStreamReader(content));
String line;
while ((line = in.readLine()) != null) {
    System.out.println(line);
}
EDIT 2: Changing the request to a GET resolved the issue.
So while scrutinizing my code above, I decided to change
connection.setRequestMethod("POST");
to
connection.setRequestMethod("GET");
This solved my problem. In hindsight, I think the server was not set up to handle the various trappings that come along with a POST request, so it did not recognize it.
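For reference, a minimal sketch of the working request described in the edits: a GET with HTTP Basic authentication. It uses java.util.Base64 (available since Java 8) instead of the Commons Codec class from the snippet above, and the address and credentials are the question's placeholders:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class RecordRequest {
    public static void main(String[] args) throws Exception {
        String ipAddress = "192.0.2.1"; // placeholder camera address
        URL url = new URL("http://" + ipAddress + "/axis-cgi/record/record.cgi?diskid=SD_DISK");
        String encoding = Base64.getEncoder().encodeToString("root:pass".getBytes("UTF-8"));

        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET"); // POST returned a 400 for this endpoint
        connection.setRequestProperty("Authorization", "Basic " + encoding);

        // Print the response body
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}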

403 error in accessing a URL but works fine in browsers

String url = "http://maps.googleapis.com/maps/api/directions/xml?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false";
URL google = new URL(url);
HttpURLConnection con = (HttpURLConnection) google.openConnection();
and when I use a BufferedReader to print the content, I get a 403 error.
The same URL works fine in the browser. Could anyone suggest what is wrong?
The reason it works in a browser but not in Java code is that the browser adds some HTTP headers which your Java code lacks, and the server requires those headers. I've been in the same situation - the URL worked both in Chrome and the Chrome plugin "Simple REST Client", yet didn't work in Java. Adding this line before the getInputStream() solved the problem:
connection.addRequestProperty("User-Agent", "Mozilla/4.0");
...even though I have never used Mozilla. Your situation might require a different header. It might be related to cookies: I was getting text in the error stream advising me to enable cookies.
Note that you might get more information by looking at the error text. Here's my code:
try {
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.addRequestProperty("User-Agent", "Mozilla/4.0");
    InputStream input;
    if (connection.getResponseCode() == 200) // this must be called before 'getErrorStream()' works
        input = connection.getInputStream();
    else
        input = connection.getErrorStream();
    BufferedReader reader = new BufferedReader(new InputStreamReader(input));
    String msg;
    while ((msg = reader.readLine()) != null)
        System.out.println(msg);
} catch (IOException e) {
    System.err.println(e);
}
HTTP 403 is a Forbidden status code. You would have to read the HttpURLConnection.getErrorStream() to see the response from the server (which can tell you why you have been given an HTTP 403), if any.
This code should work fine. If you have been making a number of requests, it is possible that Google is just throttling you. I have seen Google do this before. You can try using a proxy to verify.
Most browsers automatically percent-encode URLs when you enter them, but Java's URL class doesn't. You should encode the query parameter values with URLEncoder.
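For instance, the waypoints value in the question contains a '|' and spaces, which a browser encodes for you. A minimal sketch using the question's own values (needs java.net.URLEncoder and java.net.URL):
// Encode each query parameter value before building the URL
String query = "?origin=" + URLEncoder.encode("Chicago,IL", "UTF-8")
        + "&destination=" + URLEncoder.encode("Los Angeles,CA", "UTF-8")
        + "&waypoints=" + URLEncoder.encode("Joplin,MO|Oklahoma City,OK", "UTF-8")
        + "&sensor=false";
URL google = new URL("http://maps.googleapis.com/maps/api/directions/xml" + query);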
I know this is a bit late, but the easiest way to get the contents of a URL is to use the Apache HttpComponents HttpClient project: http://hc.apache.org/httpcomponents-client-ga/index.html
In my case, the original page (the one with the link) and the target page were not on the same domain.
The difference was in the request headers: the request that produced the 403 Forbidden error included this line:
Referer: http://original-domain/json2tree/ipfs/ipfsList.html
whereas entering the URL directly, which sends no Referer header, did not produce a 403.
I finally figured out how to fix the error: on the original-domain page, add
<meta name="referrer" content="no-referrer" />
This prevents the browser from sending the Referer header, and it works both for links and for Ajax requests.

How can I download comments on a webpage (Android)

Usually I use this code to download a webpage source:
URL myURL = new URL("http://mysite.com/index.html");
StringBuffer all = new StringBuffer("");
URLConnection ucon = myURL.openConnection();
InputStream is = ucon.getInputStream();
BufferedReader page = new BufferedReader(new InputStreamReader(is, "ISO-8859-15"));
String linea; // declaration added so the snippet compiles
while ((linea = page.readLine()) != null) {
    all.append(linea.trim());
}
It works fine over a Wi-Fi connection, where it downloads strings like <!-- it's a comment -->, but when I tried a mobile connection on my phone the comments are not downloaded. Is there a way to include the comments in the downloaded page source?
Thanks for any reply ;)
It is possible that your service provider is compressing the pages on their side to reduce the data sent. I've not heard of this being done for HTML, but it is frequently done for JPEGs, so it's easy to imagine that's what's happening. That kind of compression would be very likely to remove comments.
It would be nice if there were some HTTP convention to tell the stack 'never compress', but (as far as I know) there is not. So you're probably out of luck.
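For what it's worth, HTTP does define a Cache-Control: no-transform directive intended to ask intermediaries not to modify the payload; whether a given carrier proxy actually honors it is another matter, so treat this as something to try rather than a guaranteed fix:
// Reuses the question's myURL; asks intermediaries (e.g. a carrier proxy)
// not to transform the response. Honoring this is up to the proxy.
URLConnection ucon = myURL.openConnection();
ucon.setRequestProperty("Cache-Control", "no-transform");
InputStream is = ucon.getInputStream();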

how do I search a word in a webpage

How do I search for the existence of a word in a web page, given its URL, say "www.microsoft.com"? Do I need to download the web page to perform this search?
You just need to make an HTTP request to the web page and grab all of its content; after that you can search for the words you need in it. The code below might help you do so.
public static void main(String[] args) {
    try {
        // Build request body
        String body =
            "fName=" + URLEncoder.encode("Atli", "UTF-8") +
            "&lName=" + URLEncoder.encode("Þór", "UTF-8");

        // Create connection
        URL url = new URL("http://www.example.com");
        HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
        urlConnection.setRequestMethod("POST");
        urlConnection.setDoInput(true);
        urlConnection.setDoOutput(true);
        urlConnection.setUseCaches(false);
        urlConnection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        urlConnection.setRequestProperty("Content-Length", "" + body.length());

        // Send request (the body must be written before the input stream is
        // opened, otherwise the request is committed without it)
        DataOutputStream outStream = new DataOutputStream(urlConnection.getOutputStream());
        outStream.writeBytes(body);
        outStream.flush();
        outStream.close();

        // Get response
        // - For debugging purposes only!
        BufferedReader inStream = new BufferedReader(
                new InputStreamReader(urlConnection.getInputStream()));
        String buffer;
        while ((buffer = inStream.readLine()) != null) {
            System.out.println(buffer);
        }

        // Close I/O streams
        inStream.close();
    } catch (Exception ex) {
        System.out.println("Exception caught:\n" + ex.toString());
    }
}
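That said, for the question as asked a plain GET and a substring check are enough. A minimal sketch, with the URL and search word as placeholders (needs java.net.URL, java.net.HttpURLConnection and java.io.*):
// Fetch a page with GET and check whether its HTML contains a given word.
// The User-Agent header is optional, but some servers reject requests without one.
public static boolean pageContains(String pageUrl, String word) throws IOException {
    HttpURLConnection con = (HttpURLConnection) new URL(pageUrl).openConnection();
    con.setRequestProperty("User-Agent", "Mozilla/4.0");
    StringBuilder sb = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(con.getInputStream(), "UTF-8"))) {
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append('\n');
        }
    }
    return sb.toString().contains(word);
}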
I know how I would do this in theory: use cURL or some application to download it, store the contents in a variable, then parse it for whatever you need.
Yes, you need to download page content and search inside it for what you want. And if it happens that you want to search the whole microsoft.com website then you should either write your own web crawler, use an existing crawler or use some search engine API like Google's.
Yes, you'll have to download the page, and, to make sure to get the complete content, you'll want to execute scripts and include dynamic content - just like a browser.
We can't "search" something on a remote resource, that is not controlled by us and no webservers offers a "scan my content" method by default.
Most probably you'll want to load the page with a browser engine (webkit or something else) and perform the search on the internal DOM structure of that engine.
If you want to do the search yourself, then obviously you have to download the page.
If you're planning on this approach, I recommend Lucene (unless you want a simple substring search).
Or you could have a web service that does it for you: you request that the web service grep the URL and post back its results.
You could use a search engine's API. I believe Google and Bing (http://msdn.microsoft.com/en-us/library/dd251056.aspx) have ones you can use.

Downloading a web page with Android

I'm downloading a web page then extracting some data out of it, using regex (don't yell at me, I know a proper parser would be better, but this is a very simple machine generated page). This works fine in the emulator, and on my phone when connected by wi-fi, but not on 3G - the string returned is not the same, and I don't get a match. I can imagine it has something to do with packet size or latency, but I can't figure it out.
My code:
public static String getPage(URL url) throws IOException {
    final URLConnection connection = url.openConnection();
    HttpGet httpRequest = null;
    try {
        httpRequest = new HttpGet(url.toURI());
    } catch (URISyntaxException e) {
        e.printStackTrace();
    }
    HttpClient httpclient = new DefaultHttpClient();
    HttpResponse response = (HttpResponse) httpclient.execute(httpRequest);
    HttpEntity entity = response.getEntity();
    BufferedHttpEntity bufHttpEntity = new BufferedHttpEntity(entity);
    InputStream stream = bufHttpEntity.getContent();
    String ct = connection.getContentType();
    final BufferedReader reader;
    if (ct.indexOf("charset=") != -1) {
        ct = ct.substring(ct.indexOf("charset=") + 8);
        reader = new BufferedReader(new InputStreamReader(stream, ct));
    } else {
        reader = new BufferedReader(new InputStreamReader(stream));
    }
    final StringBuilder sb = new StringBuilder();
    String line;
    while ((line = reader.readLine()) != null) {
        sb.append(line);
    }
    stream.close();
    return sb.toString();
}
Is it my poor connection causing this, or is there a bug in there? Either way, how do I solve it?
Update:
The file downloaded over 3G is 201 bytes smaller than the one downloaded over Wi-Fi. While both are obviously the correct page, the 3G version is missing a whole bunch of whitespace, as well as some HTML comments that are present in the original page, which I find a little strange. Does Android fetch pages differently on 3G so as to reduce file size?
The User-Agent (UA) shouldn't change depending on whether you access the web page over 3G or Wi-Fi.
As mentioned before, get rid of the URLConnection; the code is already complete using the HttpClient approach, and you can set the UA using:
httpclient.getParams().setParameter(CoreProtocolPNames.USER_AGENT, userAgent);
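Putting that together, a sketch of an HttpClient-only getPage() with the extra URLConnection dropped and the charset handled by EntityUtils (the UA string here is just an example; needs org.apache.http.* imports matching the classes already used in the question):
// Same method shape as the question's getPage(), but HttpClient-only.
// EntityUtils.toString() picks up the charset from the Content-Type header.
public static String getPage(URL url) throws IOException, URISyntaxException {
    DefaultHttpClient httpclient = new DefaultHttpClient();
    httpclient.getParams().setParameter(CoreProtocolPNames.USER_AGENT, "Mozilla/5.0 (Android)");
    HttpResponse response = httpclient.execute(new HttpGet(url.toURI()));
    return EntityUtils.toString(response.getEntity());
}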
One last thing, and it might be silly, but maybe the web page is dynamic? Is that possible?
Here go some hints, some of them silly, but just in case:
Review your mobile connection: try to open the web browser, surf the web, and make sure it actually works.
I don't know which web page you are trying to access, but take into account that, depending on your phone's User-Agent (UA), the rendered content might be different (some pages are specially designed for mobile phones), or there might even be no content rendered at all. Is it a web page of your own?
Try to access that same web page from Firefox, changing the UA (use the User Agent Switcher extension for Firefox), and review the code returned.
That should be a good starting point for figuring out what your problem is.
Ger
You may want to check if your provider has a transparent proxy in place with 3G.
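A quick way to look for one is to compare the response headers you get over 3G and Wi-Fi; intermediaries often add a Via header and may change Content-Length or Content-Encoding, though whether a particular carrier proxy does so is not guaranteed:
// Print a few headers that transparent proxies commonly add or alter.
// Reuses the question's url; compare the output over 3G vs. Wi-Fi.
HttpURLConnection con = (HttpURLConnection) url.openConnection();
System.out.println("Via: " + con.getHeaderField("Via"));
System.out.println("Content-Length: " + con.getHeaderField("Content-Length"));
System.out.println("Content-Encoding: " + con.getHeaderField("Content-Encoding"));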
