Downloading a web page with Android - java

I'm downloading a web page and then extracting some data out of it using regex (don't yell at me, I know a proper parser would be better, but this is a very simple machine-generated page). This works fine in the emulator, and on my phone when connected by Wi-Fi, but not on 3G: the string returned is not the same, and I don't get a match. I can imagine it has something to do with packet size or latency, but I can't figure it out.
My code:
public static String getPage(URL url) throws IOException {
    final URLConnection connection = url.openConnection();
    HttpGet httpRequest = null;
    try {
        httpRequest = new HttpGet(url.toURI());
    } catch (URISyntaxException e) {
        e.printStackTrace();
    }
    HttpClient httpclient = new DefaultHttpClient();
    HttpResponse response = (HttpResponse) httpclient.execute(httpRequest);
    HttpEntity entity = response.getEntity();
    BufferedHttpEntity bufHttpEntity = new BufferedHttpEntity(entity);
    InputStream stream = bufHttpEntity.getContent();
    String ct = connection.getContentType();
    final BufferedReader reader;
    if (ct.indexOf("charset=") != -1) {
        ct = ct.substring(ct.indexOf("charset=") + 8);
        reader = new BufferedReader(new InputStreamReader(stream, ct));
    } else {
        reader = new BufferedReader(new InputStreamReader(stream));
    }
    final StringBuilder sb = new StringBuilder();
    String line;
    while ((line = reader.readLine()) != null) {
        sb.append(line);
    }
    stream.close();
    return sb.toString();
}
Is it my poor connection causing this, or is there a bug in there? Either way, how do I solve it?
Update:
The file downloaded over 3G is 201 bytes smaller than the one over Wi-Fi. While both are obviously downloading the correct page, the 3G one is missing a whole bunch of whitespace, and also some HTML comments that are present in the original page, which I find a little strange. Does Android fetch pages differently on 3G so as to reduce file size?

The User-Agent (UA) shouldn't change depending on whether you access the web page over 3G or Wi-Fi.
As mentioned before, get rid of the URLConnection; your code is already complete using the HttpClient approach, and you can set the UA with:
httpclient.getParams().setParameter(CoreProtocolPNames.USER_AGENT, userAgent);
One last thing, and it might be silly, but maybe the web page is dynamic? Is that possible?
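Putting those suggestions together, here is an untested sketch of the method using HttpClient only, with the UA pinned so it cannot vary between networks (userAgent is whatever browser string you choose; EntityUtils and CoreProtocolPNames come from the stock org.apache.http packages):
public static String getPage(URL url, String userAgent) throws IOException, URISyntaxException {
    HttpClient httpclient = new DefaultHttpClient();
    // Pin the UA so it is identical on 3G and Wi-Fi.
    httpclient.getParams().setParameter(CoreProtocolPNames.USER_AGENT, userAgent);
    HttpResponse response = httpclient.execute(new HttpGet(url.toURI()));
    // EntityUtils.toString honors the charset declared in the Content-Type
    // header, so the manual charset parsing above is no longer needed.
    return EntityUtils.toString(response.getEntity());
}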

Here are some hints, some of them silly, but just in case:
Review your mobile connection: try to open a web browser, surf the web, and make sure it actually works.
I don't know which web page you are trying to access, but take into account that depending on your phone's User-Agent (UA), the rendered content might be different (some web pages are specially designed for mobile phones), or even no content might be rendered at all. Is it a web page of your own?
Try to access that same web page from Firefox, changing the UA (use the User Agent Switcher extension for Firefox), and review the returned code.
That should be a good starting point for figuring out what your problem is.
Ger

You may want to check whether your provider has a transparent proxy in place on 3G.
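One quick way to check, sketched below, is to dump the response headers on both connections and diff them; a transparent proxy often betrays itself with a Via or X-Cache header, or a changed Content-Length or Content-Encoding (the URL is a placeholder):
// needs: java.net.URL, java.net.HttpURLConnection, java.util.List, java.util.Map
URL url = new URL("http://example.com/");  // placeholder: use your actual page
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
for (Map.Entry<String, List<String>> h : conn.getHeaderFields().entrySet()) {
    System.out.println(h.getKey() + ": " + h.getValue());
}
conn.disconnect();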

Related

Can't access server socket with android app

I want to see the exact headers my Android app is sending while making a web request, so I thought I'd create a simple server app in Java on my local machine and have my Android app make a call to it, then simply dump the request to the console so I could see what the app is sending. However, when I try to connect, the app hangs and stops responding.
I created a simple server that only accepts a connection and prints the data it receives to System.out. The server runs fine, and if I hit it from a web browser on my computer it prints the headers from the browser's request. So I know the server works fine.
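For reference, a minimal sketch of such a server (the port and the line-by-line dumping are assumptions based on the description above):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;

public class HeaderDumpServer {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(9000);
        while (true) {
            Socket client = server.accept();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(client.getInputStream()));
            String line;
            // An HTTP request's headers end at the first empty line.
            while ((line = in.readLine()) != null && line.length() > 0) {
                System.out.println(line);
            }
            client.close();
        }
    }
}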
Here's the code from my app:
URL url = new URL("http://192.168.1.11:9000");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setDoOutput(true);
connection.connect();
PrintWriter writer = new PrintWriter(connection.getOuputStream(), true);
writer.write("hi");
writer.close();
Simple; I only want the headers, after all. I actually started without a POST, using:
URL url = new URL("http://192.168.1.11:9000");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
in.close();
but that doesn't work. The app stops responding on the getInputStream() call. It just stops and won't continue. The server gets no connection request either.
So in all, the app is blocking on the url connection's getInputStream and I can't figure out why.
Now I've searched for a while and found these:
Android app communicating with server via sockets
socket exception socket not connected android
Android embedded browser cant connect to server on LAN
Using java.net.URLConnection to fire and handle HTTP requests
Client Socket cannot connect to running Socket server
But nothing helps. I'm not using localhost like everyone with this problem seems to be, and I've tried using the emulator's 10.0.2.2, but that doesn't work either.
I'm not on a network that restricts anything (I'm at home), and I've tried using the first set of code shown above to send a message to my server, but not even that works (it runs fine, but the server never gets a client; how does that work?).
I tried using both URLConnection and HttpURLConnection, they both have the same problem.
I'm also using the internet permission in my app, so it does have the permission needed.
I'm at a loss at this point. Why can't I make a simple call to my server?
EDIT
I used the exact code from Android's documentation:
private String downloadUrl(String myurl) throws IOException {
    InputStream is = null;
    // Only display the first 500 characters of the retrieved
    // web page content.
    int len = 500;
    try {
        URL url = new URL("http://10.0.2.2:9000");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setReadTimeout(10000 /* milliseconds */);
        conn.setConnectTimeout(15000 /* milliseconds */);
        conn.setRequestMethod("GET");
        conn.setDoInput(true);
        // Starts the query
        conn.connect();
        int response = conn.getResponseCode();
        is = conn.getInputStream();
        // Convert the InputStream into a string
        String contentAsString = readIt(is, len);
        return contentAsString;
        // Makes sure that the InputStream is closed after the app is
        // finished using it.
    } finally {
        if (is != null) {
            is.close();
        }
    }
}
but even that doesn't work. It still hangs, only now on getResponseCode(), and then throws a timeout exception. The server never gets a request, though.
Your address must start with 'http://'. Try again!
I think the root of your issue is that Android is force-closing your app before the connection completes, because I assume you haven't wrapped this in a Loader, AsyncTask, or Thread. I suggest you follow the training guide Google provides, wrapping your call in an AsyncTask and seeing if that corrects the issue.
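A minimal sketch of that wrapping, reusing the downloadUrl(...) method from the edit above (the log tag and URL are placeholders):
// needs: android.os.AsyncTask, android.util.Log, java.io.IOException
private class DownloadTask extends AsyncTask<String, Void, String> {
    @Override
    protected String doInBackground(String... urls) {
        try {
            return downloadUrl(urls[0]);  // runs off the main thread
        } catch (IOException e) {
            return "Unable to retrieve web page: " + e.getMessage();
        }
    }

    @Override
    protected void onPostExecute(String result) {
        Log.d("DownloadTask", result);  // back on the UI thread
    }
}

// usage, e.g. from onCreate():
// new DownloadTask().execute("http://10.0.2.2:9000");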
I have a Java class I use for making HTTP GET requests; I'm guessing it's near identical to the Android code you're using, so below I've dumped the relevant part of the code. I've used this class many times in Java applications (not on Android).
currentUrl = new URL(getUrl);
conn = (HttpURLConnection) currentUrl.openConnection();
conn.setRequestProperty("Cookie", getCookies(currentUrl.getHost()));
conn.setRequestProperty("User-Agent", "robadob.org/crawler");
// Note: the standard HTTP header name is the (misspelled) "Referer".
if (referrer != null) { conn.setRequestProperty("Referrer", referrer); }
conn.setRequestMethod("GET");
conn.connect();
// Get response
String returnPage = "";
String line;
BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
while ((line = rd.readLine()) != null) {
    returnPage += line + "\n";
}
rd.close();
I can't see anything obvious that would be causing your code to fail, but hopefully you can spot something from this. The setRequestProperty calls are me setting headers, so you shouldn't need those.
If that fails, flood your code with System.out calls so you can see which statement it's stalling at.

Taking text from a response web page using Java

I am sending commands to a server using HTTP, and I currently need to parse a response that the server sends back (I am sending the command via the command line, and the server's response appears in my browser).
There are a lot of resources such as this: Saving a web page to a file in Java, that clearly illustrate how to scrape a page such as cnn.com. However, since this is a response page that is only generated when the camera receives a specific command, my attempts to use the method described by Mike Deck (in the link above) have met with failure. (Specifically, when my program requests the page again the server returns a 401 error.)
The response from the server opens a new tab in my browser. Essentially, I need to know how to save the current web page using Java, since reading in a file is probably the simplest way to approach this. Do any of you know how to do this?
TL;DR: How do you save the current web page to a webpage.html or webpage.txt file using Java?
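For the simple, no-authentication case, a minimal sketch of saving a page to a file (Java 7+; the URL is a placeholder, and note this re-fetches the page rather than grabbing what the browser is currently displaying):
// needs: java.net.URL, java.io.InputStream, java.nio.file.Files,
//        java.nio.file.Paths, java.nio.file.StandardCopyOption
URL url = new URL("http://example.com/");  // placeholder
try (InputStream in = url.openStream()) {
    Files.copy(in, Paths.get("webpage.html"), StandardCopyOption.REPLACE_EXISTING);
}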
EDIT: I used Base64 from the Apache commons codec, which solved my 401 authentication issue. However, I am still getting a 400 error when I attempt to connect my InputStream (see below). Does this mean a connection isn't being established in the first place?
URL url = new URL ("http://"+ipAddress+"/axis-cgi/record/record.cgi?diskid=SD_DISK");
byte[] encodedBytes = Base64.encodeBase64("root:pass".getBytes());
String encoding = new String (encodedBytes);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("POST");
connection.setDoInput (true);
connection.setRequestProperty ("Authorization", "Basic " + encoding);
connection.connect();
InputStream content = (InputStream)connection.getInputStream();
BufferedReader in = new BufferedReader (new InputStreamReader (content));
String line;
while ((line = in.readLine()) != null) {
System.out.println(line);
}
EDIT 2: Changing the request to a GET resolved the issue.
So while scrutinizing my code above, I decided to change
connection.setRequestMethod("POST");
to
connection.setRequestMethod("GET");
This solved my problem. In hindsight, I think the server was not recognizing the request because it is not set up to handle the various trappings that come along with POST.
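For reference, here is the working combination as a single untested sketch, using the JDK's built-in java.util.Base64 (Java 8+) in place of Commons Codec (ipAddress and the credentials are placeholders, as above):
// needs: java.net.URL, java.net.HttpURLConnection, java.io.BufferedReader,
//        java.io.InputStreamReader, java.nio.charset.StandardCharsets, java.util.Base64
URL url = new URL("http://" + ipAddress + "/axis-cgi/record/record.cgi?diskid=SD_DISK");
String encoding = Base64.getEncoder()
        .encodeToString("root:pass".getBytes(StandardCharsets.UTF_8));
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");  // the fix: GET, not POST
connection.setRequestProperty("Authorization", "Basic " + encoding);
try (BufferedReader in = new BufferedReader(
        new InputStreamReader(connection.getInputStream()))) {
    String line;
    while ((line = in.readLine()) != null) {
        System.out.println(line);
    }
}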

Download file programmatically

I am trying to download a vCalendar file using a Java application, but I can't download it from one specific link.
My code is:
URL uri = new URL("http://codebits.eu/s/calendar.ics");
InputStream in = uri.openStream();
int r = in.read();
while (r != -1) {
    System.out.print((char) r);
    r = in.read();
}
When I try to download from another link it works (e.g. http://www.mysportscal.com/Files_iCal_CSV/iCal_AUTO_2011/f1_2011.ics). Something doesn't allow me to download from the first one, and I can't figure out why; when I try it in the browser it works.
I'd follow this example. Basically, get the response code for the connection. If it's a redirect (e.g. 301 in this case), retrieve the header location and attempt to access the file using that.
Simplistic Example:
URL uri = new URL("http://codebits.eu/s/calendar.ics");
HttpURLConnection con = (HttpURLConnection)uri.openConnection();
System.out.println(con.getResponseCode());
System.out.println(con.getHeaderField("Location"));
uri = new URL(con.getHeaderField("Location"));
con = (HttpURLConnection)uri.openConnection();
InputStream in = con.getInputStream();
You should check what that link actually provides. For example, it might be a page that has moved, which gives you back an HTTP 301 code. Your browser will automatically know to go and fetch it from the new URL, but your program won't.
You might want to try, for example, Wireshark to sniff the actual traffic when you make the request from the browser.
I too think there is a redirect: the browser downloads from the SSL-secured https://codebits.eu/s/calendar.ics. Try using an HttpURLConnection; it should follow redirects automatically (with the caveat that HttpURLConnection will not follow a redirect that crosses protocols, such as http to https):
HttpURLConnection con = (HttpURLConnection)uri.openConnection();
InputStream in = con.getInputStream();
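Because of that protocol-crossing caveat, you may still have to follow the Location header yourself, as in the earlier example. A sketch of the same idea as a loop, so chains of redirects are handled too:
// needs: java.net.URL, java.net.HttpURLConnection, java.io.InputStream
// Follow 301/302 redirects by hand, since HttpURLConnection will not
// cross from http to https on its own.
URL url = new URL("http://codebits.eu/s/calendar.ics");
HttpURLConnection con = (HttpURLConnection) url.openConnection();
while (con.getResponseCode() == HttpURLConnection.HTTP_MOVED_PERM
        || con.getResponseCode() == HttpURLConnection.HTTP_MOVED_TEMP) {
    url = new URL(con.getHeaderField("Location"));
    con = (HttpURLConnection) url.openConnection();
}
InputStream in = con.getInputStream();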

Post to a site using Java, navigating past the 'i agree' redirect

I am trying to visit a site and have the request processed so that it follows the redirect.
I visit the 'I agree' page, but it doesn't seem to continue past that, and it keeps redirecting me.
Here is my code:
public static void main(String[] args) {
    System.out.println("results");
    //String targetConfirmation18 = "";
    URL url;
    HttpURLConnection connection;
    OutputStreamWriter osw = null;
    BufferedReader br = null;
    String line;
    try {
        url = new URL("");
        //url = new URL(targetConfirmation);
        connection = (HttpURLConnection) url.openConnection();
        connection.setDoInput(true);
        connection.setDoOutput(true);
        osw = new OutputStreamWriter(connection.getOutputStream());
        osw.write("");
        osw.flush();
        br = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        try {
            br.close();
        } catch (IOException ioe) {
            // nothing to see here
        }
    }
}
I suspect that you are violating the Tabcorp Terms of Service. They say:
You may, using an industry-standard web browser, download and view the Content for your personal, non-commercial use only.
and
All rights not expressly granted herein are reserved.
The site sets cookies after you POST to the 18+ URL. You must remember them and submit them with subsequent requests. You can easily figure this out with Firebug.
As a result, you will need a more advanced HTTP client than a plain URL, for example Apache HttpClient, which allows cookie manipulation.
This section of the HttpClient Tutorial specifically covers cookies.
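A minimal untested sketch of that approach: a single DefaultHttpClient instance keeps its cookies across requests, so the session cookie set on the 18+ page carries over to the follow-up POST automatically (the URLs and the form field name are assumptions):
// needs: java.util.ArrayList, java.util.List, org.apache.http.NameValuePair,
//        org.apache.http.client.*, org.apache.http.client.entity.UrlEncodedFormEntity,
//        org.apache.http.client.methods.*, org.apache.http.impl.client.DefaultHttpClient,
//        org.apache.http.message.BasicNameValuePair, org.apache.http.util.EntityUtils
HttpClient client = new DefaultHttpClient();

// 1. GET the page; the server sets the session cookie here.
HttpGet get = new HttpGet("http://example.com/18plus");  // hypothetical URL
EntityUtils.consume(client.execute(get).getEntity());

// 2. POST the "I agree" form with the same client, and thus the same cookies.
HttpPost post = new HttpPost("http://example.com/18plus");  // hypothetical URL
List<NameValuePair> form = new ArrayList<NameValuePair>();
form.add(new BasicNameValuePair("agree", "yes"));  // hypothetical field name
post.setEntity(new UrlEncodedFormEntity(form));
String page = EntityUtils.toString(client.execute(post).getEntity());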
I am pretty sure that your problem here is the HTTP session.
When you browse the site using a browser, the server creates an HTTP session and sends its ID as one of the cookies. The browser then sends the cookies back on each request, so the server can recognize that this is the existing session.
I think the server always redirects you to the 18+ page when the session is unknown.
So, why is the session unknown in your case? Because all your requests are independent. You should do as a browser does. Do not start by posting to the 18+ confirmation page; start with an HTTP GET that will redirect you to that page. Take the cookies from the Set-Cookie response header and send them back using the Cookie request header.
You can also use higher-level tools like the Jakarta HTTP client that do this work for you automatically, but it is a good exercise to implement it yourself. I have tried this technique several times and saw that it also works with the standard HttpUrlConnection.
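A rough sketch of doing it by hand with HttpURLConnection (the URLs are placeholders, and this keeps only the name=value part of each cookie; a real client would also honor attributes like Path and Expires):
// needs: java.net.URL, java.net.HttpURLConnection, java.util.List
HttpURLConnection first = (HttpURLConnection) new URL("http://example.com/").openConnection();
first.connect();
List<String> setCookies = first.getHeaderFields().get("Set-Cookie");

HttpURLConnection next = (HttpURLConnection) new URL("http://example.com/18plus").openConnection();
if (setCookies != null) {
    StringBuilder cookie = new StringBuilder();
    for (String c : setCookies) {
        if (cookie.length() > 0) cookie.append("; ");
        cookie.append(c.split(";", 2)[0]);  // keep only name=value
    }
    next.setRequestProperty("Cookie", cookie.toString());
}
next.connect();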
BTW, I hope this is not your case, but sometimes you also have to mimic the User-Agent and present yourself as one of the known browsers; otherwise some sites redirect you to a page that says your browser is unsupported.
Good luck.

How can I download comments on a webpage (Android)

Usually I use this code to download a web page's source:
URL myURL = new URL("http://mysite.com/index.html");
StringBuffer all = new StringBuffer("");
URLConnection ucon = myURL.openConnection();
InputStream is = ucon.getInputStream();
BufferedReader page = new BufferedReader(new InputStreamReader(is, "ISO-8859-15"));
String linea;
while ((linea = page.readLine()) != null) {
    all.append(linea.trim());
}
It works fine over a Wi-Fi connection and downloads strings like <!-- it's a comment -->, but when I tried a mobile connection on my phone it didn't download the comments. Is there a way to include the comments when downloading the web page source?
Thanks for any reply ;)
It is possible that your service provider is compressing the pages on their side to reduce the data sent. I've not heard of this being done for HTML, but it is frequently done for JPEGs, so it's easy to imagine that's what's happening. This compression would be very likely to remove comments.
The closest thing HTTP has to a 'never transform this' convention is the Cache-Control: no-transform directive from RFC 2616, but carriers are not obliged to honor it, so you may be out of luck.
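If you want to try it anyway, a minimal sketch of sending that directive with your existing code (no guarantee your carrier respects it):
URL myURL = new URL("http://mysite.com/index.html");
URLConnection ucon = myURL.openConnection();
// Ask intermediaries (e.g. a carrier's transparent proxy) not to modify the body.
ucon.setRequestProperty("Cache-Control", "no-transform");
InputStream is = ucon.getInputStream();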
