Can't download with HttpClient with a different server port - java

I am using Apache's HttpClient via httpclient-fluent-builder to download an HTTP page.
This is the code:
...
try {
response = Http.get("http://fr3.ah.fm:9000/played.html")
.use(client) // use this HttpClient (required)
.charset("windows-1252") // set the encoding... (optional)
.asString();
} catch (IOException e) {
Log.d(TAG, "Error: "+e.toString());
e.printStackTrace();
}
Log.i(TAG, ""+ response );
...
The problem is that I get an org.apache.http.client.ClientProtocolException.
It has something to do with the host:port/URL, because it works with URLs that have no port. I also get the same error with an HTTP helper class other than fluent-builder. The firewall is off.
Logcat: http://pastebin.com/yMMvvdQ3

Thanks to this post, I found out that the cause was the SHOUTcast server:
You should be able to connect to 8000 with your web browser and get the DNAS status page. If, on the other hand, you connect to that port with a media player, it'll return the direct MP3 stream. (Unfortunately, in an incredibly boneheaded piece of design, the way SHOUTcast decides which to respond with is by sniffing your User-Agent header for something beginning with Mozilla, so if you're using an alternative browser or blocking your UA you'll not be able to get the status, and if the stream's down you might just get nothing.)
It drove me crazy, but it's an easy fix. I just added:
.header("User-Agent", "Mozilla/5.0")
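For readers using the built-in java.net.HttpURLConnection instead of httpclient-fluent-builder, the same workaround is a single request property. This is a minimal sketch, assuming a User-Agent of "Mozilla/5.0" is enough to get past SHOUTcast's sniffing; the class and method names are illustrative, not from the original post.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class ShoutcastStatusPage {

    // Hypothetical helper: opens (but does not yet connect) a connection whose
    // User-Agent begins with "Mozilla", so a SHOUTcast server that sniffs the
    // header should serve the HTML status page instead of the MP3 stream.
    public static HttpURLConnection open(String pageUrl) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
        con.setRequestProperty("User-Agent", "Mozilla/5.0");
        return con;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection con = open("http://fr3.ah.fm:9000/played.html");
        // The request is only sent when the response is first asked for.
        System.out.println(con.getResponseCode());
        con.disconnect();
    }
}
```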

Try setting the URI using HttpHost and passing it to HttpClient.execute().
I have not tried it myself.

Related

HttpResponse code not 200

I'm running a simple Java program to get HTTP response codes, but for some reason not all of them are 200. I find this odd because when I check the browser's network tab for certain URLs such as www.reddit.com, the response is 200, but my program returns a different value.
Here is the code:
try{
String urlName = "http://www.reddit.com";
URL url = new URL(urlName);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.connect();
String message = connection.getResponseMessage();
System.out.println("Message: " + message);
int code = connection.getResponseCode();
System.out.println(Integer.toString(code));
}
catch(Exception e){
e.printStackTrace();
}
Lastly, is there any reason to set the request method to GET and to call connect() explicitly? I get the response code whether or not those lines are present, since openConnection() is already called.
Goal: make all valid connections return 200.
You said that you're seeing a 301 for Reddit and a 302 for Facebook. Those status codes mean that you're getting redirected. Your browser's following them; your code isn't.
Java's built-in HTTP support is not great for end-users. I strongly recommend using a better HTTP client library, such as Apache's HttpClient, or Horizon, which is built on top of Apache (for synchronous requests) and Ning (for async).
Full disclosure: I work for HubSpot; Horizon is one of our open-source libraries.
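Staying with the built-in client: HttpURLConnection does follow redirects by default, but it will not follow one that switches protocol (for example from http:// to https://), which is what Reddit's 301 does. A minimal sketch of following such redirects by hand, assuming a cap of five hops is enough; the class name, method names and hop limit are illustrative choices, not part of any library.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class RedirectFollower {

    // Illustrative cap so a redirect loop cannot run forever.
    static final int MAX_REDIRECTS = 5;

    // Status codes that normally come with a Location header to follow.
    public static boolean isRedirect(int code) {
        return code == 301 || code == 302 || code == 303 || code == 307 || code == 308;
    }

    // Resolves a possibly relative Location value against the URL that produced it.
    public static URL resolve(URL current, String location) throws URISyntaxException, IOException {
        return current.toURI().resolve(location).toURL();
    }

    // Follows redirects by hand, including http -> https hops, and returns the final status.
    public static int finalStatus(String start) throws URISyntaxException, IOException {
        URL url = URI.create(start).toURL();
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setInstanceFollowRedirects(false); // we handle every hop ourselves
            int code = con.getResponseCode();
            String location = con.getHeaderField("Location");
            con.disconnect();
            if (!isRedirect(code) || location == null) {
                return code;
            }
            url = resolve(url, location);
        }
        throw new IOException("More than " + MAX_REDIRECTS + " redirects from " + start);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(finalStatus("http://www.reddit.com"));
    }
}
```

Whether the final code is 200 still depends on the server; the sketch only removes the protocol-switch limitation.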
It would help if you posted the status code you actually got.
I ran your code myself and got 301, meaning Moved Permanently.
If you go to http://www.reddit.com yourself, you will see that you get redirected to the HTTPS version of Reddit. Changing this in urlName will fix your problem.
Edit: the same goes for Facebook, as I saw in the comments on your question. Google does not always require HTTPS, so that URL does work.
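On the side question about setRequestMethod and connect(): GET is already HttpURLConnection's default method, and getResponseCode() connects implicitly if connect() was never called, so both lines can be dropped for a plain GET. A small sketch under that reading (the class and method names are illustrative):

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class StatusCheck {

    // Opens a connection without calling setRequestMethod or connect():
    // HttpURLConnection already defaults to GET.
    public static HttpURLConnection open(String url) throws IOException {
        return (HttpURLConnection) URI.create(url).toURL().openConnection();
    }

    // getResponseCode() sends the request itself if the connection is not yet open.
    public static int status(String url) throws IOException {
        HttpURLConnection con = open(url);
        try {
            return con.getResponseCode();
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(status("https://www.reddit.com"));
    }
}
```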

Get the redirected URL of a very specific URL (in Java)

How can I get the redirected URL of http://at.atwola.com/?adlink/5113/1649059/0/2018/AdId=4041444;BnId=872;itime=15692006;impref=13880156912668385284; in Java?
My code (given below) is based on answers to similar questions on Stack Overflow (https://stackoverflow.com/a/5270162/1382251 in particular).
However, it just yields the original URL. I suspect there are other similar cases, so I would like to resolve this one specifically and then apply the solution generally.
String ref = "http://at.atwola.com/?adlink/5113/1649059/0/2018/AdId=4041444;BnId=872;itime=15692006;impref=13880156912668385284;";
try
{
URLConnection con1 = new URL(ref).openConnection();
con1.connect();
InputStream is = con1.getInputStream();
URL url = con1.getURL();
is.close();
String finalPage = url.toString();
if (finalPage.equals(ref))
{
HttpURLConnection con2 = (HttpURLConnection)con1;
con2.setInstanceFollowRedirects(false);
con2.connect();
if (con2.getResponseCode()/100 == 3)
finalPage = con2.getHeaderField("Location");
}
System.out.println(finalPage);
}
catch (Exception error)
{
System.out.println("error");
}
I played a bit with your URL using telnet, wget, and curl, and I noticed that in some cases the server returns 200 OK and sometimes 302 Moved Temporarily. The main difference seems to be the User-Agent request header. Your code works if you add the following before con1.connect():
con1.setRequestProperty("User-Agent","");
That is, with empty User-Agent (or if the header is not present at all), the server issues a redirect. With the Java User-Agent (in my case User-Agent: Java/1.7.0_45) and with the default curl User-Agent (User-Agent: curl/7.32.0) the server responds with 200 OK.
In some cases you might need to also set:
System.setProperty("http.agent", "");
See Setting user agent of a java URLConnection
The server running the site is the Adtech Adserver and apparently it is doing user agent sniffing. There is a long history of user agent sniffing. So it seems that the safest thing to do would be to set the user agent to Mozilla:
con1.setRequestProperty("User-Agent","Mozilla"); //works with your code for your URL
Perhaps the safest option would be to use the User-Agent string of a popular web browser.
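Putting this answer's pieces together: in the question's code, setInstanceFollowRedirects(false) is applied to con1 after it has already connected and followed the redirect, so it cannot change the response that was already received. A sketch that instead opens a fresh connection, disables automatic redirects before anything is sent, sets the "Mozilla" User-Agent and reads the Location header. The class and method names are illustrative, and whether this ad server redirects at all still depends on its own User-Agent sniffing.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class RedirectProbe {

    // Prepares an unconnected request that will NOT follow redirects, so a 3xx
    // response and its Location header stay visible to the caller.
    public static HttpURLConnection prepare(String url, String userAgent) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setInstanceFollowRedirects(false);
        con.setRequestProperty("User-Agent", userAgent);
        return con;
    }

    // Returns the Location header of a 3xx response, or the original URL otherwise.
    public static String redirectTarget(String url) throws IOException {
        HttpURLConnection con = prepare(url, "Mozilla");
        try {
            int code = con.getResponseCode();
            String location = con.getHeaderField("Location");
            return (code / 100 == 3 && location != null) ? location : url;
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(redirectTarget("http://at.atwola.com/?adlink/5113/1649059/0/2018/AdId=4041444;BnId=872;itime=15692006;impref=13880156912668385284;"));
    }
}
```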

403 error in accessing a URL but works fine in browsers

String url = "http://maps.googleapis.com/maps/api/directions/xml?origin=Chicago,IL&destination=Los+Angeles,CA&waypoints=Joplin,MO|Oklahoma+City,OK&sensor=false";
URL google = new URL(url);
HttpURLConnection con = (HttpURLConnection) google.openConnection();
When I use a BufferedReader to print the content, I get a 403 error.
The same URL works fine in the browser. Could anyone suggest what is wrong?
The reason it works in a browser but not in Java code is that the browser sends some HTTP headers that your Java code lacks, and the server requires those headers. I've been in the same situation: the URL worked both in Chrome and in the Chrome plugin "Simple REST Client", yet didn't work in Java. Adding this line before calling getInputStream() solved the problem:
connection.addRequestProperty("User-Agent", "Mozilla/4.0");
...even though I have never used Mozilla. Your situation might require a different header; it might be related to cookies. I was getting text in the error stream advising me to enable cookies.
Note that you might get more information by looking at the error text. Here's my code:
try {
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.addRequestProperty("User-Agent", "Mozilla/4.0");
    // getResponseCode() must be called first: it connects, and getErrorStream() returns null before that
    int code = connection.getResponseCode();
    InputStream input = (code < 400) ? connection.getInputStream() : connection.getErrorStream();
    if (input != null) { // getErrorStream() is null when the server sent no body
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(input))) {
            String msg;
            while ((msg = reader.readLine()) != null)
                System.out.println(msg);
        }
    }
} catch (IOException e) {
    System.err.println(e);
}
HTTP 403 is a Forbidden status code. You would have to read the HttpURLConnection.getErrorStream() to see the response from the server (which can tell you why you have been given a HTTP 403), if any.
This code should work fine. If you have been making a number of requests, it is possible that Google is just throttling you. I have seen Google do this before. You can try using a proxy to verify.
Most browsers automatically encode URLs when you enter them, but Java's URL class doesn't.
You should encode the query-string values with URLEncoder.
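URLEncoder produces application/x-www-form-urlencoded output, so it is meant for each query-parameter name and value rather than the whole URL (encoding the whole URL would also escape the "://" and "?"). A sketch for the directions URL from the question, assuming Java 10+ for the Charset overload of URLEncoder.encode; the class and method names are illustrative.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class DirectionsUrl {

    // Encodes each name and value separately: '|' becomes %7C, ',' becomes %2C
    // and spaces become '+', as in application/x-www-form-urlencoded.
    public static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("origin", "Chicago,IL");
        params.put("destination", "Los Angeles,CA");
        params.put("waypoints", "Joplin,MO|Oklahoma City,OK");
        params.put("sensor", "false");
        System.out.println(build("http://maps.googleapis.com/maps/api/directions/xml", params));
    }
}
```

Note that the values are passed unencoded (a real space in "Los Angeles,CA"), because URLEncoder turns the space into "+" itself.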
I know this is a bit late, but the easiest way to get the contents of a URL is to use the Apache HttpComponents HttpClient project: http://hc.apache.org/httpcomponents-client-ga/index.html
Your original page (the one containing the link) and the linked target page are not on the same domain: original-domain and target-domain.
I found that the difference is in the request headers. With the 403 Forbidden error, the request headers contain this line:
Referer: http://original-domain/json2tree/ipfs/ipfsList.html
When I enter the URL directly, there is no 403 Forbidden, and the request headers do NOT contain that Referer line.
I finally figured out how to fix this error. On your original-domain web page, add:
<meta name="referrer" content="no-referrer" />
This prevents the browser from sending the Referer header, and it works both for links and for Ajax requests.

URLConnection FileNotFoundException for non-standard HTTP port sources

I was trying to use the Apache Ant Get task to get a list of WSDLs generated by another team in our company. They have them hosted on a WebLogic 9.x server on http://....com:7925/services/. I am able to get to the page through a browser, but the Get task gives me a FileNotFoundException when trying to copy the page to a local file to parse. I was still able to get (using the Ant task) a URL on the standard HTTP port 80.
I looked through the Ant source code and narrowed the error down to the URLConnection. It seems as though the URLConnection doesn't recognize that the data is HTTP traffic, since it isn't on the standard port, even though the protocol is specified as HTTP. I sniffed the traffic using Wireshark, and the page loads correctly across the wire, but I still get the FileNotFoundException.
Here's an example where you will see the error (with the URL changed to protect the innocent). The error is thrown on connection.getInputStream();
import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
public class TestGet {
private static URL source;
public static void main(String[] args) {
doGet();
}
public static void doGet() {
try {
source = new URL("http", "test.com", 7925,
"/services/index.html");
URLConnection connection = source.openConnection();
connection.connect();
InputStream is = connection.getInputStream();
} catch (Exception e) {
System.err.println(e.toString());
}
}
}
The response to my HTTP request returned with a status code 404, which resulted in a FileNotFoundException when I called getInputStream(). I still wanted to read the response body, so I had to use a different method: HttpURLConnection#getErrorStream().
Here's a JavaDoc snippet of getErrorStream():
Returns the error stream if the connection failed but the server sent useful data nonetheless. The typical example is when an HTTP server responds with a 404, which will cause a FileNotFoundException to be thrown in connect, but the server sent an HTML help page with suggestions as to what to do.
Usage example:
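import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import org.apache.commons.io.IOUtils; //Apache Commons IO

public static String httpGet(String url) {
    HttpURLConnection con = null;
    InputStream is = null;
    try {
        con = (HttpURLConnection) new URL(url).openConnection();
        con.connect();
        //4xx: client error, 5xx: server error. See: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.
        boolean isError = con.getResponseCode() >= 400;
        //In HTTP error cases, HttpURLConnection only gives you the input stream via #getErrorStream().
        is = isError ? con.getErrorStream() : con.getInputStream();
        if (is == null) {
            return ""; //getErrorStream() returns null when the server sent no body
        }
        //getContentEncoding() returns the Content-Encoding header (e.g. gzip), not a charset,
        //so take the charset from the Content-Type header instead, defaulting to UTF-8.
        String charset = "UTF-8";
        String contentType = con.getContentType();
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                param = param.trim();
                if (param.toLowerCase().startsWith("charset=")) {
                    charset = param.substring("charset=".length()).replace("\"", "");
                }
            }
        }
        return IOUtils.toString(is, charset);
    } catch (Exception e) {
        throw new IllegalStateException(e);
    } finally {
        //Note: Closing the InputStream manually may be unnecessary, depending on the implementation of HttpURLConnection#disconnect(). Sun/Oracle's implementation does close it for you in said method.
        if (is != null) {
            try {
                is.close();
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
        }
        if (con != null) {
            con.disconnect();
        }
    }
}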
This is an old thread, but I had a similar problem and found a solution that is not listed here.
I was receiving the page fine in the browser, but got a 404 when I tried to access it via the HttpURLConnection. The URL I was trying to access contained a port number. When I tried it without the port number I successfully got a dummy page through the HttpURLConnection. So it seemed the non-standard port was the problem.
I started thinking the access was restricted, and in a sense it was. My solution was to tell the server the User-Agent and also to specify the file types I expect. I am trying to read a .json file, so I thought the file type might be a necessary specification as well.
I added these lines and it finally worked:
httpConnection.setRequestProperty("User-Agent","Mozilla/5.0 ( compatible ) ");
httpConnection.setRequestProperty("Accept","*/*");
Check the response code returned by the server.
I know this is an old thread but I found a solution not listed anywhere here.
I was trying to pull data in JSON format from a J2EE servlet on port 8080 but was receiving the file-not-found error. I was able to pull the same JSON data from a PHP server running on port 80.
It turns out that in the servlet, I needed to change doGet to doPost.
Hope this helps somebody.
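You could use OkHttp:
import java.io.IOException;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

OkHttpClient client = new OkHttpClient();

String run(String url) throws IOException {
    Request request = new Request.Builder()
        .url(url)
        .build();
    // Closing the Response releases the underlying connection
    try (Response response = client.newCall(request).execute()) {
        return response.body().string();
    }
}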
I've tried that locally - using the code provided - and I don't get a FileNotFoundException except when the server returns a status 404 response.
Are you sure that you're connecting to the webserver you intend to be connecting to? Is there any chance you're connecting to a different webserver? (I note that the port number in the code doesn't match the port number in the link)
I ran into a similar issue, but the cause seems to be different. Here is the exception trace:
java.io.FileNotFoundException: http://myhost1:8081/test/api?wait=1
at sun.reflect.GeneratedConstructorAccessor2.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
at com.doitnext.loadmonger.HttpExecution.getBody(HttpExecution.java:85)
at com.doitnext.loadmonger.HttpExecution.execute(HttpExecution.java:214)
at com.doitnext.loadmonger.ClientWorker.run(ClientWorker.java:126)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.io.FileNotFoundException: http://myhost1:8081/test/api?wait=1
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
at com.doitnext.loadmonger.HttpExecution.execute(HttpExecution.java:166)
... 2 more
So it would seem that merely getting the response code causes the URL connection to call getInputStream().
I know this is an old thread, but I noticed something here and thought I would put it out there.
As Jessica mentioned, this exception is thrown when using a non-standard port.
It only seems to happen when using DNS, though. If I use an IP address, I can specify the port number and everything works fine.

Implementing Java's getResponseCode() in C?

If it helps, C#'s WebRequest offers something similar. However, I don't want this in Java or .NET; I'm wondering how to implement it in native C/C++ code (for Windows).
For reference:
try {
    URL url = new URL("http://google.ca");
    HttpURLConnection con = (HttpURLConnection) url.openConnection();
    con.connect();
    int code = con.getResponseCode();
    System.out.println(code);
} catch (IOException e) { // also covers MalformedURLException, a subclass of IOException
    System.err.println("Error reading URL: " + e);
}
This prints out 200, meaning "OK".
I understand I probably need to use sockets and send a User-Agent string, but I haven't a clue where to begin. Whenever I learn a new language the first thing I like to do is try porting my code to it, but this one has stumped me.
Any help is appreciated
There is no HTTP support in the standard C library.
So you have two options: use a third-party HTTP library such as libcurl, or handle HTTP yourself:
1. open socket
2. resolve hostname
3. connect to server
4. build HTTP request
5. send request to server
6. receive HTTP response
7. parse response and get response code from it.
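The socket-level steps above map closely onto a few calls. Since the rest of this thread is Java, here is a sketch of the same steps with java.net.Socket rather than C (with Winsock or POSIX sockets the rough equivalents are getaddrinfo, socket, connect, send and recv). It assumes plain HTTP on port 80, sends an HTTP/1.1 request with Connection: close, and only parses the status line; the class name, method names and the User-Agent value are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RawStatus {

    // "Build HTTP request": a minimal HTTP/1.1 GET (the Host header is mandatory in 1.1).
    // The User-Agent value is an arbitrary illustrative string.
    public static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "User-Agent: RawStatus/0.1\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    // "Parse response": the status line looks like "HTTP/1.1 200 OK"; the code is the second token.
    public static int parseStatusCode(String statusLine) {
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Not an HTTP status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    // The Socket constructor resolves the host name, creates the socket and connects.
    public static int fetchStatus(String host, String path) throws IOException {
        try (Socket socket = new Socket(host, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            String statusLine = in.readLine(); // first line of the response
            if (statusLine == null) {
                throw new IOException("Connection closed before a response arrived");
            }
            return parseStatusCode(statusLine);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetchStatus("google.ca", "/"));
    }
}
```

Unlike HttpURLConnection, this does not follow redirects, so for google.ca it may print a 3xx code rather than 200.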
