HttpGet sends request with the wrong encoding - java

I'm trying to get the text response from the following URL:
http://translate.google.cn/translate_a/single?client=t&sl=zh-CN&tl=en&dt=t&tk=265632.142896&q=%E4%BD%A0%E5%A5%BD
The response is the following:
[[["Hello there","你好",,,1]],,"zh-CN"]
(You can verify this response by entering the address into your browser.)
Here is a simplified version of my code that tries to download this text:
import org.apache.http.client.HttpClient;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.DefaultHttpClient;
public class Test {
    public static String downloadString() {
        String url = "http://translate.google.cn/translate_a/single?client=t&sl=zh-CN&tl=en&dt=t&tk=265632.142896&q=%E4%BD%A0%E5%A5%BD";
        HttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(url);
        ResponseHandler<String> handler = new BasicResponseHandler();
        try {
            return client.execute(request, handler);
        } catch (Exception e) {
            return "GET request failed.";
        }
    }
}
When I call Test.downloadString(), I get the following (incorrect) response:
[[["Huan Chai Sunsolt","浣犲ソ",,,0]],,"zh-CN"]
I'm guessing that there is some sort of encoding problem behind the scenes somewhere in the request process (there are six bytes that should be interpreted as two Chinese characters, but are instead interpreted as three Japanese characters), but I can't seem to pinpoint the exact cause. What am I doing wrong in my code?

It's strange, but adding the User-Agent header fixed the problem:
request.addHeader("User-Agent", "Mozilla/5.0 (X11; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0");
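One plausible explanation, which this thread does not confirm, is that without a browser-like User-Agent the server decodes the percent-encoded bytes of q with a legacy Chinese charset such as GBK instead of UTF-8. The sketch below only illustrates that kind of mix-up locally. The class and method names are made up for illustration, and it assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeSketch {
    /** Decodes the UTF-8 bytes of text with a different charset, as a confused server might. */
    static String misdecode(String text, Charset wrong) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrong);
    }

    /** Reverses misdecode, provided every byte sequence was valid in the wrong charset. */
    static String repair(String garbled, Charset wrong) {
        return new String(garbled.getBytes(wrong), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK
        String original = "\u4f60\u597d";     // the two characters sent in the q parameter
        String garbled = misdecode(original, gbk);
        System.out.println(garbled + " -> " + repair(garbled, gbk));
    }
}
```

If the garbled text printed here matches the one in the bad response, that would support the charset theory, but it would not prove what the server actually does.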

Android 6.0 release removes support for the Apache HTTP client. If your app is using this client and targets Android 2.3 (API level 9) or higher, use the HttpURLConnection class instead.
Source: http://developer.android.com/about/versions/marshmallow/android-6.0-changes.html#behavior-apache-http-client
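For anyone moving to HttpURLConnection as suggested, here is a minimal sketch of the same download using only JDK classes. The class name, method names and parameters are hypothetical, and decoding the body as UTF-8 is an assumption; a real client should read the charset from the Content-Type header.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UrlConnectionSketch {
    /** Reads the whole stream and decodes it with the given charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    /** GET with an explicit User-Agent; the body is decoded as UTF-8 (an assumption). */
    static String downloadString(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage would look like UrlConnectionSketch.downloadString(url, "Mozilla/5.0 ..."), called off the main thread on Android.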

Related

Connection Failed: Timeout for Testing API using Java, Apache HTTP Client

I am trying to test an API from Java, using Java 8 and Apache HttpClient 4.5.3. I have tried several approaches, including the java.net classes and Apache HttpClient, but I get the same error every time:
Exception in thread "main" org.apache.http.conn.HttpHostConnectException: Connect to api.github.com:443 [api.github.com/192.30.253.116, api.github.com/192.30.253.117] failed: Connection timed out: connect
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:159)
Every time, the connection times out. But if I open the same URL in a browser, I get a result.
Can someone help me pinpoint the issue? Is it a setup issue or a code issue?
I have tried almost every code sample I could find online. I am a beginner at API testing and do not have in-depth knowledge of the HTTP workflow.
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpUriRequest;
import org.apache.http.impl.client.HttpClientBuilder;
import java.io.IOException;
import java.net.*;
public class API {
    public static void main(String args[]) throws IOException, URISyntaxException {
        HttpUriRequest request = new HttpGet( "https://api.github.com" );
        // When
        HttpResponse response = HttpClientBuilder.create().build().execute( request );
        System.out.println(response.getStatusLine().getStatusCode());
    }
}
Using the java.net package:
import java.io.IOException;
import java.net.*;
public class API {
    public static void main(String args[]) throws IOException, URISyntaxException {
        URL url = new URL("http://maps.googleapis.com/maps/api/geocode/json?address=chicago&sensor=false");
        //URL url = uri.toURL();
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/xml");
        if (conn.getResponseCode() != 200) {
            throw new RuntimeException("HTTP error code : "
                    + conn.getResponseCode());
        }
    }
}
If the same URL works in a browser, there are three likely causes:
1. The server expects headers such as User-Agent. You can set the request headers you need like this:
request.setHeader("User-Agent", "Mozilla");
2. You are in a corporate or otherwise restricted environment and need a proxy to reach external URLs. Your browser may already be set up to use the proxy server. In that case, you need to pass the proxy settings and credentials to the HTTP client API.
Example: https://hc.apache.org/httpcomponents-client-ga/httpclient/examples/org/apache/http/examples/client/ClientProxyAuthentication.java
3. All outgoing requests from your environment are blocked by a firewall or similar. In that case, ask your network admin to allow the connection.
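To check the header and proxy causes without Apache HttpClient, a sketch along these lines may help. It uses only JDK classes. The class name, the method names and the host proxy.example.com are placeholders, and proxy credentials are not covered here.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

class ProxySketch {
    /** Builds an HTTP proxy descriptor; the address is resolved only when connecting. */
    static Proxy httpProxy(String host, int port) {
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    /** Prepares (but does not yet open) a GET through the given proxy. */
    static HttpURLConnection prepare(String url, Proxy proxy) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection(proxy);
        conn.setRequestProperty("User-Agent", "Mozilla/5.0"); // some servers expect a User-Agent
        conn.setConnectTimeout(10_000); // fail after 10 s instead of waiting for the OS default
        conn.setReadTimeout(10_000);
        return conn;
    }
}
```

Calling getResponseCode() on the returned connection performs the actual request. If it succeeds through the proxy but times out with Proxy.NO_PROXY, the proxy is the likely cause.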

Apache Http Client in Android

I'm trying to get the whole HTML of a web page in Android.
In a Java console application I used to do it like this:
DefaultHttpClient httpclient = new DefaultHttpClient();
String busca = "kindle";
HttpGet httpGet = new HttpGet("http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords="+busca);
try {
    ResponseHandler<String> manipulador = new BasicResponseHandler();
    String resposta = httpclient.execute(httpGet,manipulador);
} finally {
    httpGet.releaseConnection();
}
I tried to do the same in my Android application, but it didn't work!
Does this library work on Android?
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.DefaultHttpClient;
Is there a better way to get a page's HTML into a string on Android?
Thanks for the help!
I tried it with another URL and it worked :)
Maybe the HTML of the page I was using was too big to store in a String or to display as text.
You can have a look at this:
HttpClient 4.0.1 - how to release connection?
HttpRequestBase.releaseConnection() was introduced in version 4.2, so it is not available in earlier versions.

Reading over HTTP & (hopefully) automating actions using Java

Is there a way to make GET and PUT calls over HTTP in Java? I also need to automate user input, such as a button click, on the target web page (any web page, not just Yahoo Finance).
I tried using the Apache HTTP client library and couldn't quite crack it:
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
public class Fin {
    /**
     * @param args
     */
    public static void main(String[] args) {
        DefaultHttpClient httpclient = new DefaultHttpClient();
        HttpGet httpget = new HttpGet("http://finance.yahoo.com");
        try {
            HttpResponse response = httpclient.execute(httpget);
            HttpEntity entity = response.getEntity();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            httpget.releaseConnection();
        }
    }
}
I keep getting 'java.net.ConnectException: Connection refused', though I can open the page in a browser.
If you really want to automate browser-based interactions, you could go further and use Watij, which runs a browser via the JVM and is driven through a browser-based API (i.e. you identify the button you want to press and it actually presses it).
Otherwise, a library like the one you've identified will normally work. You have to watch out for client-side JavaScript driving the requests, and you may need to configure proxies and the like (I suspect a missing proxy setting is your problem above).
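For the PUT half of the question, here is a minimal sketch using the JDK's HttpURLConnection rather than the Apache library. The class name, method names and example URL are hypothetical, and sending the body as UTF-8 is an assumption.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class PutSketch {
    /** Configures a PUT request; nothing is sent until send(...) is called. */
    static HttpURLConnection preparePut(String url, String contentType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true); // required before writing a request body
        conn.setRequestProperty("Content-Type", contentType);
        return conn;
    }

    /** Writes the body as UTF-8 and returns the HTTP status code. */
    static int send(HttpURLConnection conn, String body) throws IOException {
        byte[] payload = body.getBytes(StandardCharsets.UTF_8);
        conn.setFixedLengthStreamingMode(payload.length);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload);
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Typical use would be PutSketch.send(PutSketch.preparePut("http://example.com/resource", "text/plain"), "hello"). Behind a proxy, the same "Connection refused" error can still appear until the proxy is configured.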

HTTP authentication with Apache HTTP Components: force sending of challenge

I need to talk to an obscure webserver which requires authentication. If I don't supply credentials, a login form is displayed. However, if I do supply unsolicited Basic Authentication credentials, I get directly to the desired content.
wget supports this directly:
# this fails and downloads a form:
wget https://weird.egg/data.txt --http-user=me --http-password=shhh
# this works and downloads the document:
wget https://weird.egg/data.txt --http-user=me --http-password=shhh --auth-no-challenge
Now my question: How can I make the download in Java using Apache's HTTP Components?
Here's what I got so far. (There's also a proxy in place, and I use -Y on in wget, and I have a matching https_proxy environment variable.)
import org.apache.http.HttpHost;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.conn.params.ConnRoutePNames;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import java.net.URI;
// ...
DefaultHttpClient hc = new DefaultHttpClient();
hc.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, new HttpHost(proxy_name, proxy_port));
URI uri = new URI("https://weird.egg/data.txt");
hc.getCredentialsProvider().setCredentials(new AuthScope(AuthScope.ANY_HOST, AuthScope.ANY_PORT, AuthScope.ANY_REALM, AuthScope.ANY_SCHEME), new UsernamePasswordCredentials("me", "shhh"));
hc.execute(new HttpGet(uri)); // etc
However, I only end up with the login form page, not the actual document. I'm suspecting that the DefaultHttpClient isn't sending the credentials unsolicited, in the way that wget does. Is there a way to make the Java program send the credentials?
Never mind. I solved the problem by not trying to use any library authentication methods, but just brute-forcing the Basic Authentication header into the request:
HttpGet get = new HttpGet(uri);
String basic_auth = new String(Base64.encodeBase64((username + ":" + password).getBytes()));
get.addHeader("Authorization", "Basic " + basic_auth);
hc.execute(get); // etc
(This needs the additional import org.apache.commons.codec.binary.Base64;, but in turn we can remove the credential-related imports.)
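On Java 8 or later, the same header value can be built with the JDK's java.util.Base64, which avoids the commons-codec dependency. This is a sketch: the class and method names are made up, and encoding the credentials as UTF-8 is an assumption (the snippet above uses the platform default charset).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class PreemptiveBasicAuthSketch {
    /** Builds the value of an Authorization header for HTTP Basic authentication. */
    static String basicAuthHeader(String username, String password) {
        String pair = username + ":" + password;
        byte[] bytes = pair.getBytes(StandardCharsets.UTF_8); // explicit charset (assumption)
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }
}
```

With the code above, that would be get.addHeader("Authorization", PreemptiveBasicAuthSketch.basicAuthHeader(username, password)). Because the header goes out on the first request without waiting for a challenge, it should only be used over HTTPS and only for hosts you trust.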

Status code of the method

I am running the following sample program, which uses HttpClient's GET method.
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;
public class TestMethodStatuscode {
    public static void main(String[] args) throws Exception
    {
        HttpClient client = new HttpClient();
        client.getParams().setParameter(HttpMethodParams.USER_AGENT,
                "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)");
        //client.getParams().setCookiePolicy(org.apache.http.client.params.CookiePolicy.BROWSER_COMPATIBILITY);
        GetMethod get = new GetMethod("http://de.mg40.mail.yahoo.com/neo/launch?.rand=80g4u84m26ifl");
        //get_siteurl.getParams().setCookiePolicy(CookiePolicy.BROWSER_COMPATIBILITY);
        client.executeMethod(get);
        System.out.println("Status code: "+get.getStatusCode());
        //System.out.println(get.getResponseBodyAsString());
        get.releaseConnection();
    }
}
Output: Status code: 200
The URL I am trying to fetch is one I see during the login process for a yahoo.de email account (logging in to yahoo.de did not work for me, so I was trying this code). If I start Wireshark (filter: http or (http.request.method == POST or http.request.method == GET)), type this URL into the browser and press Enter, Wireshark shows that the response code for the URL is 302, which means it is redirected.
When I run my program and check in Wireshark, I also see that the request returns 302. So my question is: why does the program print 200 as the status code and not 302?
As per documentation:
GetMethods will follow redirect requests from the http server by default. This behavour can be disabled by calling setFollowRedirects(false).
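You probably have follow-redirects set to true, which is the default. You can check this with the getFollowRedirects() method. If it returns true, the client follows redirects automatically, so getStatusCode() reports the status of the final response (200) rather than the intermediate 302. Call setFollowRedirects(false) to turn that behavior off.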
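The effect is easy to reproduce offline. The sketch below does not use Commons HttpClient. It swaps in the JDK's HttpURLConnection, whose setInstanceFollowRedirects plays the same role, and a throwaway local server from com.sun.net.httpserver. The class name, method names and the /old and /new paths are made up for the demo.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

class RedirectStatusSketch {
    /** Returns the status code of a GET, with or without following redirects. */
    static int statusOf(String url, boolean followRedirects) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setInstanceFollowRedirects(followRedirects);
        conn.setRequestMethod("GET");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Starts a local demo server: /old answers 302 with Location /new, /new answers 200. */
    static HttpServer startDemoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/old", exchange -> {
            exchange.getResponseHeaders().add("Location", "/new");
            exchange.sendResponseHeaders(302, -1); // -1 means no response body
            exchange.close();
        });
        server.createContext("/new", exchange -> {
            exchange.sendResponseHeaders(200, -1);
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

Against this server, the same URL should report the final status when redirects are followed and the 302 itself when they are not, which mirrors the difference between the program's output and the Wireshark capture.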
