I am trying to get the source of a webpage using the following code:
public static String getFile(String sUrl) throws ClientProtocolException, IOException {
DefaultHttpClient httpclient = new DefaultHttpClient();
StringBuilder b = new StringBuilder();
// Prepare a request object
HttpGet httpget = new HttpGet(sUrl);
// Execute the request
HttpResponse response = httpclient.execute(httpget);
// Examine the response status
System.out.println(response.getStatusLine());
//status code should be 200
if (response.getStatusLine().getStatusCode() != 200) {
return null;
}
// Get hold of the response entity
HttpEntity entity = response.getEntity();
// If the response does not enclose an entity, there is no need
// to worry about connection release
if (entity != null) {
InputStream instream = entity.getContent();
try {
BufferedReader reader = new BufferedReader(new InputStreamReader(instream));
// do something useful with the response
String s = reader.readLine();
while (s != null) {
b.append(s);
b.append("\n");
s = reader.readLine();
}
} catch (IOException ex) {
// In case of an IOException the connection will be released
// back to the connection manager automatically
throw ex;
} catch (RuntimeException ex) {
// In case of an unexpected exception you may want to abort
// the HTTP request in order to shut down the underlying
// connection and release it back to the connection manager.
httpget.abort();
throw ex;
} finally {
// Closing the input stream will trigger connection release
instream.close();
}
// When HttpClient instance is no longer needed,
// shut down the connection manager to ensure
// immediate deallocation of all system resources
httpclient.getConnectionManager().shutdown();
}
return b.toString();
}
It works fine, but certain symbols like  , - , single quotes etc. are not getting copied correctly.
I save the page source as text/html to Amazon S3 and display it by opening the page stored on the S3 server.
The symbols that I mentioned above are displayed as � .
Is there any solution for this?
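You need to read the content using the page's character encoding; otherwise your system default encoding is used (which, as you have seen, is apparently not the correct one). Note that entity.getContentEncoding() returns the Content-Encoding header (for example gzip), not the charset. The charset comes from the Content-Type header, which EntityUtils.getContentCharSet(entity) extracts (it may return null):
String charset = EntityUtils.getContentCharSet(entity);
BufferedReader reader = new BufferedReader(
new InputStreamReader(instream, charset != null ? charset : "ISO-8859-1"));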
First, you need to specify the encoding that the InputStreamReader uses. The constructor you call falls back to the default encoding of your system.
The encoding may be delivered in the headers. HTTP defaults to ISO-8859-1 (Latin-1), but in practice pages are often Windows-1252 (Windows Latin-1).
String charset = "Windows-1252"; // Can be used as default.
String enc = EntityUtils.getContentCharSet(entity); // charset parameter of the Content-Type header, or null.
if (enc != null) {
charset = enc;
}
BufferedReader reader = new BufferedReader(
new InputStreamReader(instream, charset));
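The charset normally travels as a parameter of the Content-Type header (for example `text/html; charset=UTF-8`). Below is a minimal sketch, using only the JDK, of extracting it from a header value and decoding the body with it. The class name `ContentTypeCharset` and both method names are made up for illustration, and the fallback charset is whatever the caller chooses (Windows-1252 or ISO-8859-1 would match the advice above).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharset {

    // Returns the charset named in a Content-Type value, or the fallback
    // when the header is missing, has no charset parameter, or names an
    // unknown charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for the "charset=" parameter.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Decodes the raw body bytes with the charset from the header.
    static String decode(byte[] body, String contentType, Charset fallback) {
        return new String(body, charsetOf(contentType, fallback));
    }

    public static void main(String[] args) {
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(
            decode(body, "text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
    }
}
```

With HttpClient, the header value would come from something like `entity.getContentType().getValue()`; the sketch only covers the parsing step.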
For HTML entities, Apache Commons Lang has:
String s = ...
s = StringEscapeUtils.unescapeHtml4(s);
I have been digging for quite a while, and I am wondering how to open an HttpClient connection in Java (Android) and then close the socket(s) right away, without seeing CLOSE_WAIT and TIME_WAIT TCP states in network monitoring tools.
What I am doing (I found this solution on Stack Overflow):
String url = "http://example.com/myfile.php";
String result = null;
InputStream is = null;
StringBuilder sb = null;
try {
HttpClient httpclient = new DefaultHttpClient();
HttpPost httppost = new HttpPost(url);
HttpResponse response = httpclient.execute(httppost);
HttpEntity entity = response.getEntity();
is = entity.getContent();
} catch (Exception e) {
Log.e("log_tag", "Error in http connection" + e.toString());
}
// convert response to string
try {
BufferedReader reader = new BufferedReader(new InputStreamReader(
is, "iso-8859-1"), 8);
sb = new StringBuilder();
sb.append(reader.readLine() + "\n");
String line = "0";
while ((line = reader.readLine()) != null) {
sb.append(line + "\n");
}
is.close();
result = sb.toString();
} catch (Exception e) {
}
Toast.makeText(getApplicationContext(), result, Toast.LENGTH_LONG).show();
After I run this code, the PHP file is executed fine and I get the response back in the Toast, but when I analyze my mobile device's network with an external network analyzer tool, I see that the connection(s) stay in CLOSE_WAIT and/or TIME_WAIT for about 1 minute and only then move to the CLOSED state.
The problem is:
I am calling the above function every ~2 to 5 seconds in an infinite loop, which over time results in a huge number of CLOSE_WAITs and TIME_WAITs. These affect the overall performance of my Android app until it gets stuck and becomes useless!
What I want to do (and need your answer on, if possible):
I want to really close the connection right away after I Toast the response message, leaving no open sockets: no TIME_WAIT, no CLOSE_WAIT, no leftovers at all. All communication should close immediately, the moment I run the code that should do so. I don't need the connection anymore until the next iteration of the loop.
How can I accomplish that?
Keep in mind that I don't want the application to halt or degrade over time, since it should run in a service and stay open forever.
I would really appreciate simple code that works after a copy-paste.
I am new to Java and Android and will try to understand the code you write, so please keep it as simple as possible. Thanks a lot!
Question asker.
Try using the HttpURLConnection class. Refer to the following link:
http://android-developers.blogspot.de/2011/09/androids-http-clients.html
In the finally clause, just call disconnect. Example code:
HttpURLConnection locConn = null; // declared outside try so that finally can see it
try {
locConn = (HttpURLConnection) locurl.openConnection();
//URL url = locConn.getURL();
locConn.setRequestProperty("Authorization", basicAuth);
locConn.setRequestMethod("GET");
locConn.setRequestProperty("Content-Type", "application/json");
locConn.setRequestProperty("X-Myauthtoken", userCredentials);
retc = locConn.getResponseCode();
reader = new BufferedReader(new InputStreamReader(locConn.getInputStream()));
String sessionK = null;
readVal = reader.readLine();
if (retc == 200) {
}
} catch (IOException e) {
//handle exception
} finally {
//disconnect here
if (locConn != null) {
locConn.disconnect();
}
}
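A self-contained variant of the same idea, as a sketch rather than a guaranteed fix: it asks the server not to keep the connection alive, closes the reader with try-with-resources, and disconnects in finally. One expectation to set: TIME_WAIT is a normal TCP state on whichever side closes the connection first, so application code cannot make it disappear entirely, whereas CLOSE_WAIT usually means the remote side has closed and the local socket has not been closed yet. The class name `QuickFetch`, the 5-second timeouts, and the UTF-8 decoding are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class QuickFetch {

    // Fetches the body of a URL as text and releases the connection afterwards.
    static String fetch(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Ask for a one-shot connection instead of keep-alive.
        conn.setRequestProperty("Connection", "close");
        conn.setConnectTimeout(5000); // assumed timeout values
        conn.setReadTimeout(5000);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        } finally {
            // Runs after the reader has been closed, on success and on error.
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(fetch(new URL(args[0])));
        }
    }
}
```

On Android the call has to run off the main thread, for example inside the service's worker thread.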
HttpClient httpclient = new HttpClient();
GetMethod httpget = new GetMethod("http://www.myhost.com/");
try {
httpclient.executeMethod(httpget);
Reader reader = new InputStreamReader(httpget.getResponseBodyAsStream(), httpget.getResponseCharSet());
// consume the response entity
} finally {
httpget.releaseConnection();
}
I tried uploading a file to SkyDrive with the REST API in Java.
Here is my code:
public void UploadFile(File upfile) {
if (upload_loc == null) {
getUploadLocation();
}
HttpClient client = new DefaultHttpClient();
client.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION, HttpVersion.HTTP_1_1);
HttpPost post = new HttpPost(upload_loc + "?" + "access_token=" + access_token);
try {
MultipartEntity mpEntity = new MultipartEntity(null,"A300x",null);
ContentBody cbFile = new FileBody(upfile, "multipart/form-data");
mpEntity.addPart("file", cbFile);
post.setEntity(mpEntity);
System.out.println(post.toString());
HttpResponse response = client.execute(post);
BufferedReader rd = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
String line2 = "";
while ((line2 = rd.readLine()) != null) {
System.out.println(line2);
}
} catch (IOException ex) {
Logger.getLogger(Onlab.class.getName()).log(Level.SEVERE, null, ex);
}
client.getConnectionManager().shutdown();
}
But when I try to run it, I get this error:
{
"error": {
"code": "request_body_invalid",
"message": "The request entity body for multipart form-data POST isn't valid. The expected format is:\u000d\u000a--[boundary]\u000d\u000aContent-Disposition: form-data; name=\"file\"; filename=\"[FileName]\"\u000d\u000aContent-Type: application/octet-stream\u000d\u000a[CR][LF]\u000d\u000a[file contents]\u000d\u000a--[boundary]--[CR][LF]"
}
}
My biggest problem is that I can't see the request itself; I couldn't find any usable toString method for it. I tried this forced boundary format, and I also tried the empty constructor.
My file is currently a txt file with some text. I think the boundary is the main problem, or I should be configuring some more parameters. When I inspect the variables in debug mode, everything looks the same as in the MSDN guide.
I'm new to the REST world and, if possible, I want to keep this Apache library with its simple-to-use HttpClient and HttpPost classes.
Thanks in advance, and sorry for my English.
EDIT:
OK, after a long sleep I decided to try the PUT method instead of POST. The code works fine with minimal changes:
public void UploadFile(File upfile) {
if (upload_loc == null) {
getUploadLocation();
}
HttpClient client = new DefaultHttpClient();
client.getParams().setParameter(CoreProtocolPNames.PROTOCOL_VERSION, HttpVersion.HTTP_1_1);
String fname=upfile.getName();
HttpPut put= new HttpPut(upload_loc +"/"+fname+ "?" + "access_token=" + access_token);
try {
FileEntity reqEntity=new FileEntity(upfile);
put.setEntity(reqEntity);
HttpResponse response = client.execute(put);
BufferedReader rd = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
String line2 = "";
while ((line2 = rd.readLine()) != null) {
System.out.println(line2);
}
} catch (IOException ex) {
Logger.getLogger(Onlab.class.getName()).log(Level.SEVERE, null, ex);
}
client.getConnectionManager().shutdown();
}
But there is no answer for the first question yet.
Two quick things:
You should not be using the overloaded MultipartEntity constructor unless you really need to. In this case you are setting the charset to null, which is probably not a good idea. Also, your boundary delimiter is not complex enough.
Your file body content type should reflect the content of the actual file being uploaded. multipart/form-data is normally used for HTML form data, not files. You should change this to 'text/plain', 'image/jpeg', or whatever reflects the true MIME type of the file.
Some great tools for debugging REST requests - REST Console (Chrome), REST Client (Firefox).
Some quick notes on the error message you received: it actually has quite a bit of detail. The service is expecting the following parameters to be set for the file part being sent:
name:
filename:
Content-Type: application/octet-stream
You can have the HTTP client set most of these with this code:
ContentBody cbFile = new FileBody(
upfile,
"yourFileNameHere",
"application/octet-stream",
"UTF-8");
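Since the request itself is hard to inspect, it can help to print what a body in the layout quoted by the error message looks like and compare it with what a debugging proxy or REST console shows. The sketch below only builds that text from the format in the error message; it is not a replacement for MultipartEntity, and the class name `MultipartPreview` and the sample file name are invented for illustration.

```java
class MultipartPreview {

    private static final String CRLF = "\r\n";

    // Builds a single-part multipart/form-data body in the layout the
    // service's error message describes.
    static String build(String boundary, String fileName, String fileContents) {
        return "--" + boundary + CRLF
            + "Content-Disposition: form-data; name=\"file\"; filename=\""
            + fileName + "\"" + CRLF
            + "Content-Type: application/octet-stream" + CRLF
            + CRLF
            + fileContents + CRLF
            + "--" + boundary + "--" + CRLF;
    }

    public static void main(String[] args) {
        // "A300x" is the boundary from the question; the file is made up.
        System.out.print(build("A300x", "notes.txt", "some text"));
    }
}
```

The request's Content-Type header must name the same boundary (multipart/form-data; boundary=A300x in this example), otherwise the server cannot split the parts.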
I'm working with the Yahoo BOSS API. The URL is supposed to return JSON, which I need to store in a string and then parse. http://developer.yahoo.com/java/howto-parseRestJava.html
My question: how can I save the URL response in a string?
DefaultHttpClient httpclient = new DefaultHttpClient();
HttpResponse response = (HttpResponse) httpclient.execute(httpPostRequest);//send a request and receive a response
System.out.println("HTTPResponse received in [" + (System.currentTimeMillis()-t) + "ms]");
HttpEntity entity = response.getEntity();
if (entity != null) {
// Read the content stream
InputStream instream = entity.getContent();
// convert content stream to a String
String resultString= convertStreamToString(instream);
instream.close();
resultString = resultString.substring(1,resultString.length()-1); // remove wrapping "[" and "]"
And here is the function convertStreamToString:
private static String convertStreamToString(InputStream is) {
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
StringBuilder sb = new StringBuilder();
String line;
try {
while ((line = reader.readLine()) != null) {
sb.append(line + "\n");
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
is.close();
} catch (IOException e) {
e.printStackTrace();
}
}
return sb.toString();
}
Technically, you want to wrap an appropriately configured InputStreamReader around the URL InputStream and copy the Reader to a StringWriter (Apache Commons IO has a "copy Reader to String" utility method). However, to determine the correct character set for the InputStreamReader, you need to parse the Content-Type header, in which case you might be better off working with a higher-level library like Apache Commons HttpClient.
Or, you could wrap a JSONTokener around the URL InputStream and parse the JSONObject directly from the JSONTokener (although I'm not entirely sure how the tokener determines the correct character set, so you might be safer using something like HttpClient).
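The Reader-to-StringWriter copy can be sketched with the JDK alone. The charset is passed in explicitly because, as noted, it has to come from the Content-Type header; UTF-8 in the demo is an assumption (JSON is commonly UTF-8). The class and method names are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamText {

    // Copies the whole stream into a String using the given charset.
    // The stream is closed when the copy finishes or fails.
    static String readAll(InputStream in, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringWriter out = new StringWriter();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] json = "{\"name\":\"caf\u00e9\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(
            readAll(new ByteArrayInputStream(json), StandardCharsets.UTF_8));
    }
}
```

Unlike a readLine loop, this keeps the original line endings and does not append a trailing newline.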
I am using HttpClient 4.1 to download a web page. I would like to get a compressed version:
HttpGet request = new HttpGet(url);
request.addHeader("Accept-Encoding", "gzip,deflate");
HttpResponse response = httpClient.execute(request,localContext);
HttpEntity entity = response.getEntity();
response.getFirstHeader("Content-Encoding") shows "Content-Encoding: gzip"
however, entity.getContentEncoding() is null.
If I put:
entity = new GzipDecompressingEntity(entity);
I get:
java.io.IOException: Not in GZIP format
It looks like the resulting page is plain text and not compressed even though "Content-Encoding" header shows it's gzipped.
I have tried this on several URLs (from different websites) but get the same results.
How can I get a compressed version of a web page?
Don't use HttpClient if you don't want your API to handle mundane things like unzipping.
You can use the basic URLConnection class to fetch the compressed stream, as demonstrated by the following code:
public static void main(String[] args) {
try {
URL url = new URL("http://code.jquery.com/jquery-latest.js");
URLConnection con = url.openConnection();
// comment next line if you want to have something readable in your console
con.addRequestProperty("Accept-Encoding", "gzip,deflate");
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
String l;
while ((l=in.readLine())!=null) {
System.out.println(l);
}
} catch (Exception e) {
e.printStackTrace();
}
}
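Once the raw compressed stream is in hand, turning it back into text is a matter of wrapping it in a GZIPInputStream when the Content-Encoding header says gzip. The sketch below uses only the JDK and handles gzip only; the class name `GzipBody`, its helper methods, and the UTF-8 decoding are assumptions for illustration. Wrapping a stream that is not actually gzip data fails with the same "Not in GZIP format" IOException shown in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBody {

    // Wraps the raw stream in a gzip decoder only when the header asks for it.
    static InputStream decoded(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Helper for the demo: gzip-compresses a byte array.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(plain);
        }
        return bytes.toByteArray();
    }

    // Reads a stream to the end and decodes it as UTF-8 (assumed charset).
    static String readText(InputStream in) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            bytes.write(buf, 0, n);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] packed = gzip("<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readText(decoded(new ByteArrayInputStream(packed), "gzip")));
    }
}
```

With URLConnection, the header value would come from `con.getContentEncoding()` and the raw stream from `con.getInputStream()`.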
sock = new Socket("www.google.com", 80);
out = new BufferedOutputStream(sock.getOutputStream());
in = new BufferedInputStream(sock.getInputStream());
When I try to print the content of "in" as below:
BufferedInputStream bin = new BufferedInputStream(in);
int b;
while ( ( b = bin.read() ) != -1 )
{
char c = (char)b;
System.err.print(""+(char)b); //This prints out content that is unreadable.
//Isn't it supposed to print out html tag?
}
If you want to print the content of a web page, you need to speak the HTTP protocol. You do not have to implement it yourself; the best way is to use an existing implementation such as the Java API's HttpURLConnection or Apache's HttpClient.
Here is an example of how to do it with HttpURLConnection:
URL url = new URL("http://www.google.com/");
HttpURLConnection urlc = (HttpURLConnection)url.openConnection();
urlc.setAllowUserInteraction( false );
urlc.setDoInput( true );
urlc.setDoOutput( false );
urlc.setUseCaches( true );
urlc.setRequestMethod("GET");
urlc.connect();
// check that you have received a status code 200 to indicate OK
// get the encoding from the Content-Type header
BufferedReader in = new BufferedReader(new InputStreamReader(urlc.getInputStream()));
String line = null;
while((line = in.readLine()) != null) {
System.out.println(line);
}
// close sockets, handle errors, etc.
As written above, you can save traffic by adding the Accept-Encoding header and checking the Content-Encoding header of the response.
Here is an HttpClient example (it uses the Commons HttpClient 3.x classes HttpClient and GetMethod):
// Create an instance of HttpClient.
HttpClient client = new HttpClient();
// Create a method instance.
GetMethod method = new GetMethod(url);
// Provide a custom retry handler if necessary
method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
new DefaultHttpMethodRetryHandler(3, false));
try {
// Execute the method.
int statusCode = client.executeMethod(method);
if (statusCode != HttpStatus.SC_OK) {
System.err.println("Method failed: " + method.getStatusLine());
}
// Read the response body.
byte[] responseBody = method.getResponseBody();
// Deal with the response.
// Use caution: ensure correct character encoding and is not binary data
System.out.println(new String(responseBody));
} catch (HttpException e) {
System.err.println("Fatal protocol violation: " + e.getMessage());
e.printStackTrace();
} catch (IOException e) {
System.err.println("Fatal transport error: " + e.getMessage());
e.printStackTrace();
} finally {
// Release the connection.
method.releaseConnection();
}
It is very easy to create a String from a stream using the Java 8 Stream API:
new BufferedReader(new InputStreamReader(in)).lines().collect(Collectors.joining("\n"))
In IntelliJ I can even set this as a debug expression.
I guess it works similarly in Eclipse.
If you want to fetch the content of a web page, you should take a look at Apache HttpClient instead of coding this yourself, except for learning purposes or some other really good reason.
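For the learning-purposes case, here is a rough sketch of what speaking HTTP over a raw socket involves: the client has to write a request like the one below to the socket's output stream before anything useful comes back, and the response then starts with a status line and headers before the HTML. The question does not show what was written to `out`, so this is an illustration, not a diagnosis. The class and method names are invented, and the sketch ignores chunked transfer encoding, compression, and charsets.

```java
class RawHttpRequest {

    // Builds the text of a minimal HTTP/1.1 GET request.
    static String get(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
            + "Host: " + host + "\r\n"
            + "Connection: close\r\n" // server closes the socket after replying
            + "\r\n";                  // blank line ends the headers
    }

    // Returns everything after the blank line that separates headers from
    // the body, or an empty string if no such separator is present.
    static String bodyOf(String rawResponse) {
        int split = rawResponse.indexOf("\r\n\r\n");
        return split < 0 ? "" : rawResponse.substring(split + 4);
    }

    public static void main(String[] args) {
        System.out.print(get("www.google.com", "/"));
    }
}
```

In the socket code, the request bytes would be written to `out` and flushed before reading from `in`.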