I'm facing strange behavior when I request my own ping server with the Java HttpComponents client.
Sometimes an UnknownHostException is thrown for no apparent reason.
This exception is mainly thrown after switching networks, for example when I change the default network route (from eth0 to wifi, or from wifi to another mobile NIC).
For information, my wifi connection goes through a mobile access point. (Could this be the reason for my issue?)
I'm running on a Linux OS, in an embedded context with a limited set of Linux commands.
Below is a code example:
HttpGet get = new HttpGet("someUri");
if (networkInterfaceName != null) { // used to ping through specific route
    get.setConfig(RequestConfig.custom()
            .setLocalAddress(networkInterfaceToInetAdress(networkInterfaceName))
            .setConnectionRequestTimeout(15000)
            .setConnectTimeout(15000)
            .setSocketTimeout(15000).build());
}
HttpResponse response = HttpClientBuilder.create().useSystemProperties().build().execute(get);
So, if a specific interface is provided, the local address is set in the RequestConfig; otherwise we use the default route.
I'm currently not able to identify the root cause, so I have tried several things to narrow it down.
First, on each HTTP request I checked the route and /etc/resolv.conf, and everything seems OK. The ping command also succeeds at the moment the UnknownHostException is thrown.
I checked the HttpClient code, and a new client is created on each call, so there is no HttpClient cache in my case. I use the HttpClientBuilder class, and its create method seems to always return a new builder. Also, the CloseableHttpClient is not closed explicitly; could that impact the next call?
I checked the HttpClient API: it allows setting a DnsResolver, but in my case I have no control over the remote/target IP.
I set JVM system properties in code (and not as Java ARGS directly):
java.security.Security.setProperty("networkaddress.cache.ttl", "0");
java.security.Security.setProperty("networkaddress.cache.negative.ttl", "0");
System.setProperty("java.net.preferIPv4Stack", "true");
System.setProperty("sun.net.inetaddr.ttl", "0");
Also, I'm going to use the dnsjava library when the UnknownHostException is thrown, as dig/nslookup are not available on my limited OS.
Any hints about this issue, or other things that I need to check? Maybe it's not a DNS issue but something in the Java socket or connection handling that I missed?
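One way to narrow this down from inside the failing JVM is to log what the JVM's own resolver returns at the moment the request fails. This is only a minimal sketch using java.net.InetAddress; the class name DnsProbe and the resolve helper are made up for illustration. It goes through the same resolver and cache as the failing request, so it shows what the program sees, not what the DNS server would answer.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

class DnsProbe {
    /** Returns the addresses the JVM resolves for host, or an empty list if resolution fails. */
    static List<String> resolve(String host) {
        List<String> addresses = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                addresses.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Same failure the HTTP client reports; the empty list signals it to the caller.
        }
        return addresses;
    }

    public static void main(String[] args) {
        // Host name is a placeholder; pass the real ping server name as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " -> " + resolve(host));
    }
}
```

Calling resolve right after catching the UnknownHostException, and again a few seconds later, would show whether the failure is transient around the route switch.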
UPDATE
I tried running the same logic in another JVM, and the HttpGet requests succeed there, while they keep failing in my original program, which runs in a different JVM.
Thanks for your help.
I'm creating a (well behaved) web spider and I notice that some servers are causing Apache HttpClient to give me a SocketException -- specifically:
java.net.SocketException: Connection reset
The code that causes this is:
// Execute the request
HttpResponse response;
try {
    response = httpclient.execute(httpget); // httpclient is of type HttpClient
} catch (NullPointerException e) {
    return; // deep down, Apache HttpClient sometimes throws a null pointer...
}
For most servers it's just fine. But for others, it immediately throws a SocketException.
Example of site that causes immediate SocketException: http://www.bhphotovideo.com/
Works great (as do most websites): http://www.google.com/
Now, as you can see, www.bhphotovideo.com loads fine in a web browser. It also loads fine when I don't use Apache's HTTP Client. (Code like this:)
HttpURLConnection c = (HttpURLConnection) url.openConnection();
BufferedInputStream in = new BufferedInputStream(c.getInputStream());
Reader r = new InputStreamReader(in);
int i;
while ((i = r.read()) != -1) {
    source.append((char) i);
}
So, why don't I just use this code instead? Well there are some key features in Apache's HTTP Client that I need to use.
Does anyone know what causes some servers to cause this exception?
Research so far:
Problem occurs on my local Mac dev machines AND an AWS EC2 Instance, so it's not a local firewall.
It seems the error isn't caused by the remote machine because the exception doesn't say "by peer"
This Stack Overflow question seems relevant: "java.net.SocketException: Connection reset", but the answers don't show why this would happen only from Apache HTTP Client and not other approaches.
Bonus question: I'm doing a fair amount of crawling with this system. Is there generally a better Java class for this other than Apache HTTP Client? I've found a number of issues (such as the NullPointerException I have to catch in the code above). It seems that HTTPClient is very picky about server communications -- more picky than I'd like for a crawler that can't just break when a server doesn't behave.
Thanks all!
Solution
Honestly, I don't have a perfect solution, but it works, so that's good enough for me.
As pointed out by oleg below, Bixo has created a crawler that customizes HttpClient to be more forgiving to servers. To "get around" the issue more than fix it, I just used SimpleHttpFetcher provided by Bixo here:
(link removed - SO thinks I'm a spammer, so you'll have to google it yourself)
SimpleHttpFetcher fetch = new SimpleHttpFetcher(new UserAgent("botname","contact#yourcompany.com","ENTER URL"));
try {
    FetchedResult result = fetch.fetch("ENTER URL");
    System.out.println(new String(result.getContent()));
} catch (BaseFetchException e) {
    e.printStackTrace();
}
The downside to this solution is that there are a lot of dependencies for Bixo -- so this may not be a good workaround for everyone. However, you can always just work through their use of DefaultHttpClient and see how they instantiated it to get it to work. I decided to use the whole class because it handles some things for me that are helpful, like automatic redirect following (and reporting the final destination URL).
Thanks for the help all.
Edit: TinyBixo
Hi all. So, I loved how Bixo worked, but didn't like that it had so many dependencies (including all of Hadoop). So, I created a vastly simplified Bixo, without all the dependencies. If you're running into the problems above, I would recommend using it (and feel free to make pull requests if you'd like to update it!)
It's available here: https://github.com/juliuss/TinyBixo
First, to answer your question:
The connection reset was caused by a problem on the server side. Most likely the server failed to parse the request or was unable to process it and dropped the connection as a result without returning a valid response. There is likely something in the HTTP requests generated by HttpClient that causes server side logic to fail, probably due to a server side bug. Just because the error message does not say 'by peer' does not mean the connection reset took place on the client side.
A few remarks:
(1) Several popular web crawlers such as bixo http://openbixo.org/ use HttpClient without major issues, but pretty much all of them had to tweak HttpClient behavior to make it more lenient about common HTTP protocol violations. By default HttpClient is rather strict about HTTP protocol compliance.
(2) Why did you not report the NPE problem or any other problem you have been experiencing to the HttpClient project?
These two settings will sometimes help:
client.getParams().setParameter("http.socket.timeout", Integer.valueOf(0));
client.getParams().setParameter("http.connection.stalecheck", Boolean.TRUE);
The first sets the socket timeout to be infinite. The second enables the stale connection check, which tests a pooled connection before it is reused.
Try getting a network trace using Wireshark, and augment that with log4j logging of the HttpClient. That should show why the connection is being reset.
I have set up a local proxy server for request logging but my java code ignores it and connects directly (Windows XP, JDK 1.7). Web browsers work with it. So I wrote test code for discussion that seems to connect directly even if a (bogus) proxy is specified. With the bogus proxy, I would expect connection failure but the code succeeds, connecting directly:
System.setProperty("http.proxyHost", "localhost");
System.setProperty("http.proxyPort", "12345");
System.setProperty("http.nonProxyHosts", "noNonProxyHost.com");
URL url = new URL("http://docs.oracle.com/javase/7/docs/technotes/guides/net/proxies.html");
InputStream in = url.openStream();
System.out.println("Connection via bogus proxy succeeded");
The code is run as standalone Java, no Maven, no applet, no container. I have a direct internet connection.
In your case using java.net.URL(), if the proxy server cannot be reached at http.proxyHost and http.proxyPort then it simply falls back and tries to do a direct connect. If that succeeds, you'll see no exception thrown which is why your code works without error. You should see a pause while it tries to find the proxy though.
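If you want a bogus proxy to actually fail the request instead of being silently bypassed, one option is to pass an explicit java.net.Proxy to URL.openConnection(Proxy). As far as I know, a connection opened this way does not fall back to a direct connection. A minimal sketch; the class and method names are hypothetical, and the 3-second timeouts are arbitrary values picked for the example:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

class ExplicitProxyCheck {
    /** Returns true if the URL answered through the given HTTP proxy, false on any I/O failure. */
    static boolean reachableViaProxy(String url, String proxyHost, int proxyPort) {
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
        try {
            // The proxy is fixed for this connection, so the system properties are not consulted.
            HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection(proxy);
            connection.setConnectTimeout(3000);
            connection.setReadTimeout(3000);
            connection.getResponseCode(); // forces the connection to be made
            connection.disconnect();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Same bogus proxy as in the question: localhost:12345.
        String url = "http://docs.oracle.com/javase/7/docs/technotes/guides/net/proxies.html";
        System.out.println("reachable via proxy: " + reachableViaProxy(url, "localhost", 12345));
    }
}
```

With nothing listening on the proxy port, the call should report a failure rather than succeed over a direct route.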
The sample code below happily fetches the URL and displays it, without error, even when run with bogus proxy settings (-Dhttp.proxyHost=bogus -Dhttp.proxyPort=2345), but it will talk to my local proxy on localhost port 8888 if that is set correctly.
import java.io.*;
import java.net.URL;
import java.util.*;
public class URLClient {
    private static String sUrl = "http://www.apache.org/";
    public static void main(String[] args) {
        try {
            URL url = new URL(sUrl);
            InputStream is = url.openStream();
            // "\\A" matches the start of input, so the scanner returns the whole stream as one token
            java.util.Scanner s = new java.util.Scanner(is).useDelimiter("\\A");
            String output = s.hasNext() ? s.next() : "";
            System.out.println(output);
        } catch (Throwable e) {
            System.err.println("exception");
        }
    }
}
The problem I was originally having with http.proxyHost and http.proxyPort being ignored (Google led me to your question) was that those settings are completely ignored by apache.commons.httpClient because it uses its own sockets, as described here.
http://cephas.net/blog/2007/11/14/java-commons-http-client-and-http-proxies/
I have faced a similar problem recently. First of all, one part of the above answer from Daemon42 explains pretty well why the bogus proxy server didn't lead to a failure of the program:
if the proxy server cannot be reached at http.proxyHost and http.proxyPort then it simply falls back and tries to do a direct connect. If that succeeds, you'll see no exception thrown which is why your code works without error. You should see a pause while it tries to find the proxy though.
Still, your actual question was why the proxy server configured via the operating system is not used by the Java application. As stated in the Oracle documentation (https://docs.oracle.com/javase/8/docs/technotes/guides/net/proxies.html), the system proxy settings are not evaluated by Java by default. To enable that, you have to set the system property "java.net.useSystemProxies" to "true".
You can set that system property on the command line, or you can edit the JRE installation file jre/lib/net.properties; that way you have to change it only once on a given system.
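To check which proxy Java would actually pick for a URL, you can ask the default ProxySelector. The following is a small sketch, not a definitive recipe: the class name SystemProxyDump is made up, and as I understand it the property is only read once, so it should be set before any networking code runs (or passed as -Djava.net.useSystemProxies=true).

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

class SystemProxyDump {
    /** Returns the proxies the default selector proposes for the given URL. */
    static List<Proxy> proxiesFor(String url) {
        return ProxySelector.getDefault().select(URI.create(url));
    }

    public static void main(String[] args) {
        // Assumption: set early enough, this makes the JVM consult the OS proxy settings.
        System.setProperty("java.net.useSystemProxies", "true");
        for (Proxy proxy : proxiesFor("http://docs.oracle.com/")) {
            // A direct connection is reported as a proxy of type DIRECT.
            System.out.println(proxy.type() + " " + proxy.address());
        }
    }
}
```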
I have three JSPs. The first one asks the user for their username. Once the form is submitted, the user is taken to a second JSP where a unique passcode is created for them. How would I go about taking this passcode and passing it to a third JSP using a socket?
You can use java.net.URL and java.net.URLConnection to fire and handle HTTP requests programmatically. They make use of sockets under the covers and this way you don't need to fiddle with low level details about the HTTP protocol. You can pass parameters as query string in the URL.
String url = "http://localhost:8080/context/3rd.jsp?passcode=" + URLEncoder.encode(passcode, "UTF-8");
InputStream input = new URL(url).openStream();
// ... (read it, it contains the response)
This way the passcode request parameter is available in the 3rd JSP by ${param.passcode} or request.getParameter("passcode") the usual way.
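The encoding step matters as soon as the passcode contains characters such as spaces or "&". A minimal sketch of the URL building on its own; the class name PasscodeUrl and the sample passcode are made up for illustration:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class PasscodeUrl {
    /** Appends the passcode as a URL-encoded query parameter to baseUrl. */
    static String build(String baseUrl, String passcode) {
        try {
            return baseUrl + "?passcode=" + URLEncoder.encode(passcode, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Cannot happen in practice: every JVM is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://localhost:8080/context/3rd.jsp?passcode=a+b%26c
        System.out.println(build("http://localhost:8080/context/3rd.jsp", "a b&c"));
    }
}
```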
It is better, however, to just include that 3rd JSP in your 2nd JSP.
request.setAttribute("passcode", passcode);
request.getRequestDispatcher("3rd.jsp").include(request, response);
This way the passcode is available as request attribute in the 3rd JSP by ${passcode} or request.getAttribute("passcode") the usual way.
See also:
Using java.net.URLConnection to fire and handle HTTP requests
Unrelated to the concrete question, this is however a terribly nasty hack and the purpose of this is beyond me. There's a serious design flaw somewhere in your application. Most likely those JSPs are tightly coupled with business logic which actually belongs in normal and reusable Java classes like servlets and/or EJBs and/or JAX-WS/RS which you just import and call in your Java class the usual Java way. JSPs are meant to generate and send HTML, not to act as business services, let alone web services. See also How to avoid Java code in JSP files?
So, you want the username to be submitted from the first JSP to the second, by submitting a form to the second, right?
But, for interaction between the second and third, you want to avoid using the communication mechanisms behind the JSP files and use your own, right?
Well, how you might implement doing this depends on where you're sending your communication from and to. For instance, are they on the same machine, or on different machines?
Generally speaking, you'll need a client-server type of relationship to be set up here. I imagine that you would want your third JSP to act as the server.
What the third JSP will do is sit and wait for a client to try to communicate with it. But before it can do that, you'll first need to bind a port to your application. Ports are allocated by the operating system and are given to requesting processes.
When trying to implement this in Java, you might want to try something like the following:
int port_number = 1080;
ServerSocket server = new ServerSocket(port_number);
In the above example, the ServerSocket is already bound to the specified port 1080. It doesn't have to be 1080 - 1080 is just an example.
Next, you will want to listen and wait for a request to come in. You can implement this step as follows:
Socket request = server.accept(); // blocks until a client connects
The accept() call blocks until a request arrives, so no polling loop is needed (it never returns null). When the request comes in, it returns a new Socket object to handle that request. So, you could call accept() again later on, in a loop, and continue to accept requests, while a child thread handles communication using your newly created request Socket.
But, for your project, I would guess that you don't need to communicate with more than one client at a time, so it's okay if we just simply stop listening once we receive a request, I suppose.
So, now on to the client application. Here, it's a little bit different from what we had with the server. First off, instead of listening on the port and waiting for a request, the client's socket will actively try to connect to a remote host on its port. So, if there is no server listening on that port, then the connection will fail.
So, two things will need to be known:
What's the IP Address of the server?
What port is the server listening in on?
There are short-cuts to getting the connection using the Java Socket class, but I assume that you're going to test this out on the same machine, right? If so, then the client and the server will need two different ports. That's because the OS won't allow two separate processes to share the same port. Once a process binds to the port, no other process is allowed to use it until that process releases it back to the OS. (If you don't bind the client socket explicitly, the OS picks a free local port for it.)
So, to make the two separate JSP's communicate on the same physical machine, you'll need both a local port for your client, and you'll need the server's port number that it's listening in on.
So, let's try the following for the client application:
int local_port = 1079;
int remote_port = 1080;
InetSocketAddress localhost = new InetSocketAddress(local_port);
Socket client = new Socket(); //The client socket is not yet bound to any ports.
client.bind(localhost); //The client socket has just requested the specified port number from the OS and should be bound to it.
String remoteHostsName = "[put something here]";
InetSocketAddress remotehost = new InetSocketAddress(InetAddress.getByName(remoteHostsName), remote_port); //Performs a DNS lookup of the specified remote host and returns an IP address with the allocated port number
client.connect(remotehost); //Connection to the remote server is being made.
That should help you along your way.
A final note should be made here. The client and the server have to be running at the same time. The simplest setup is two separate processes, but they can also share one JVM as long as the server runs on its own thread.
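For a quick test, both ends can be exercised from a single program by putting the server on its own thread. The following is only a sketch of that idea, not the JSP wiring itself: the class name PasscodeSocketDemo, the one-line text protocol, and the use of port 0 (let the OS pick a free port) are choices made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class PasscodeSocketDemo {
    /** Sends passcode to a one-shot loopback server and returns the line the server received. */
    static String sendPasscode(String passcode) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port, so the demo cannot collide with another service.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            String[] received = new String[1];
            Thread serverThread = new Thread(() -> {
                // accept() blocks until the client below connects.
                try (Socket request = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(request.getInputStream(), StandardCharsets.UTF_8))) {
                    received[0] = in.readLine();
                } catch (IOException e) {
                    received[0] = null;
                }
            });
            serverThread.start();
            // The client is not bound explicitly; the OS assigns its local port.
            try (Socket client = new Socket(loopback, server.getLocalPort());
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true)) {
                out.println(passcode);
            }
            serverThread.join(); // wait until the server has read the line
            return received[0];
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("socket demo failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("server received: " + sendPasscode("abc123"));
    }
}
```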
I have to implement a webservice client to a given WSDL file.
I used the SDK's 'wsimport' tool to create Java classes from the WSDL, as well as a class that wraps the webservice's only method (enhanceAddress(auth, param, address)) into a simple Java method. So far, so good. The webservice is functional and returns results correctly. The code looks like this:
try {
    EnhancedAddressList uniservResponse = getWebservicePort().enhanceAddress(m_auth, m_param, uniservAddress);
    // Where the Port^ is the HTTP Soap 1.2 Endpoint
} catch (Throwable e) {
    throw new AddressValidationException("Error during uniserv webservice request.", e);
}
The problem now: I need to get information about the connection and any error that might occur in order to populate various JMX values (such as COUNT_READ_TIMEOUT, COUNT_CONNECT_TIMEOUT, ...).
Unfortunately, the method does not officially throw any exceptions, so in order to get details about a ConnectException, I need to use getCause() on the ClientTransportException that will be thrown.
Even worse: I tried to test the read timeout value, but there is none. I changed the service's location in the WSDL file to post the request to a PHP script that simply waits forever and does not return. Guess what: the web service client does not time out but waits forever as well (I killed the app after 30+ minutes of waiting). That is not an option for my application, as I will eventually run out of TCP connections if some of them get 'stuck'.
The enhanceAddress(auth, param, address) method is not implemented but annotated with javax.jws.* annotations, meaning that I cannot see/change/inspect the code that is actually executed.
Do I have any option other than throwing the whole wsimport/javax.jws stuff away and implementing my own SOAP client?
To set up read and connect timeouts, you can configure the binding parameters when you set up your Service and Port instances:
// "Service" and "Port" stand for the classes generated by wsimport
Service service = new Service();
Port port = service.getPort();
BindingProvider bp = (BindingProvider) port;
bp.getRequestContext().put(
        BindingProvider.ENDPOINT_ADDRESS_PROPERTY,
        "http://localhost:8080/service");
bp.getRequestContext().put(
        BindingProviderProperties.CONNECT_TIMEOUT,
        30); // value is in milliseconds
bp.getRequestContext().put(
        BindingProviderProperties.REQUEST_TIMEOUT,
        30); // value is in milliseconds
Now, whenever you execute a service call via "port", you will get response timeouts and/or connection timeouts if the backend is slow to respond. The values are in milliseconds, like the timeout values of the Socket class.
When these timeouts are exceeded, you will get a timeout exception or a connection exception, and you can add counter code to keep track of how many you get.
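The counter code could look roughly like the sketch below, which walks the getCause() chain of whatever the port call threw. The class name, counter names, and category strings are made up for this example. One caveat: on typical JDKs a SocketTimeoutException is raised for both connect and read timeouts, so telling the two apart would need extra information (for example the exception message), which this sketch does not attempt.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.concurrent.atomic.AtomicLong;

class TimeoutCounters {
    static final AtomicLong COUNT_TIMEOUT = new AtomicLong();
    static final AtomicLong COUNT_CONNECT_FAILED = new AtomicLong();
    static final AtomicLong COUNT_OTHER = new AtomicLong();

    /** Classifies the error by its cause chain, bumps the matching counter, returns the category. */
    static String record(Throwable error) {
        Throwable current = error;
        // The depth limit guards against a (rare) cyclic cause chain.
        for (int depth = 0; current != null && depth < 20; depth++) {
            if (current instanceof SocketTimeoutException) {
                COUNT_TIMEOUT.incrementAndGet();
                return "TIMEOUT";
            }
            if (current instanceof ConnectException) {
                COUNT_CONNECT_FAILED.incrementAndGet();
                return "CONNECT_FAILED";
            }
            current = current.getCause();
        }
        COUNT_OTHER.incrementAndGet();
        return "OTHER";
    }

    public static void main(String[] args) {
        // Simulates a transport exception wrapping a read timeout.
        Throwable wrapped = new RuntimeException("transport error",
                new SocketTimeoutException("Read timed out"));
        System.out.println(record(wrapped) + " count=" + COUNT_TIMEOUT.get());
    }
}
```

In the catch block around the port call you would pass the caught Throwable to record(...) and expose the counters through your JMX bean.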