I'm testing a Diffusion solution in our pre-production environment. The solution gives anonymous clients 10 minutes of free access before they have to authenticate or be disconnected. This works fine in development and early testing, but in pre-production, when one client is disconnected, we see many simultaneous disconnections of other clients without apparent cause. With logging set to FINEST, the log file says:
2016-03-21 11:57:36.557|DEBUG|Diffusion: InboundThreadPool Thread_4||NIOBufferedChannel#52e2a219[connected local=/10.0.4.1:8080 remote=/10.0.1.99:58673] : Closed(UNEXPECTED_ERROR) Unexpected error EOF|com.pushtechnology.diffusion.io.message.MessageChannelException
2016-03-21 11:57:36.558|DEBUG|Diffusion: InboundThreadPool Thread_4||Java Client 50328FF242799CD4-000000000000015A AWAITING_RECONNECTION#10.0.1.99: State changed from CONNECTED to AWAITING_RECONNECTION.|com.pushtechnology.diffusion.clients.impl.ClientImpl
2016-03-21 11:57:36.558|DEBUG|Diffusion: InboundThreadPool Thread_4||Java Client 50328FF242799CD4-000000000000015A AWAITING_RECONNECTION#10.0.1.99: CONNECTION_LOST keeping alive for 60000 ms.|com.pushtechnology.diffusion.clients.impl.ClientImpl
The affected clients are always browsers, not smartphones; often older browsers such as IE9.
I'm guessing that your pre-production environment has a load balancer that is configured to use connection pooling. Versions of IE prior to v10 do not support WebSockets, so those clients will be using XHR long polling. Your smartphone clients will be using WebSockets, so they are unaffected.
The manual has this to say in the section "Considerations when using load balancers":
Do not use connection pooling for connections between the load balancer and the Diffusion server. If multiple client connections are multiplexed through a single server-side connection, this can cause client connections to be prematurely closed.
In Diffusion, a client is associated with a single TCP/HTTP connection for the lifetime of that connection. If a Diffusion server closes a client, the connection is also closed. Diffusion makes no distinction between a single client connection and a multiplexed connection, so when a client sharing a multiplexed connection closes, the connection between the load balancer and Diffusion is closed, and subsequently all of the client-side connections multiplexed through that server-side connection are closed.
To illustrate the problem: when a Diffusion server has a direct connection to each of Alice, Bob and Charlie, closing Bob's connection is straightforward.
When a connection-pooling middlebox (a proxy or load balancer) enters the mix, closing Bob's connection results in disconnection for Alice and Charlie as well.
So, whereas connection pooling is a good idea in front of regular HTTP servers, it is problematic in front of a Diffusion server with an audience of XHR polling clients, because the server loses the ability to disconnect individual clients.
Related
I am trying to set up an HttpClient through the HttpClientBuilder. I also had a look at the HttpClientConnectionManager, and this is where the confusion started.
On the connection manager, or more precisely the PoolingHttpClientConnectionManager, there are methods to:
close expired connections
close idle connections
When is a connection considered expired?
When is it idle?
What happens when a connection from the pool is closed? Is it ensured that connections are recreated when needed?
HTTP is based on TCP, which ensures that packets are sent and received in the correct order and requests retransmission if packets are lost along the way. A TCP connection is started with a handshake consisting of SYN, SYN-ACK and ACK messages, and it is ended with a FIN, FIN-ACK and ACK sequence (there is a good diagram of this on Wikipedia).
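As a minimal Java illustration (host and port are just placeholders), opening and closing a plain socket is what drives those two phases:

```java
import java.net.Socket;

public class TcpHandshakeDemo {
    public static void main(String[] args) throws Exception {
        // The constructor blocks until the three-way handshake
        // (SYN, SYN-ACK, ACK) has completed.
        Socket socket = new Socket("example.com", 80);
        System.out.println("connected: " + socket.isConnected());

        // close() starts the teardown: our side sends FIN and the peer
        // acknowledges and eventually sends its own FIN.
        socket.close();
    }
}
```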
HTTP is a request-response protocol, and opening and closing connections is quite costly, so HTTP/1.1 allows existing connections to be reused. With the header Connection: keep-alive, for example, the client (i.e. the browser) tells the server to keep the connection open. A server can literally have thousands and thousands of connections open at the same time. To avoid draining the server's resources, connections are usually limited in time: via socket timeouts, idle connections or connections with certain problems (broken internet access, ...) are closed by the server automatically after some predefined time.
Plenty of HTTP implementations, such as Apache HttpClient 4.4 and beyond, check the status of a connection only when they are about to use it.
The handling of stale connections was changed in version 4.4. Previously, the code would check every connection by default before re-using it. The code now only checks the connection if the elapsed time since the last use of the connection exceeds the timeout that has been set. The default timeout is set to 2000ms (Source)
If a connection has therefore not been used for some time, the client may not have read the FIN from the server, and may still consider the connection open when it was actually closed by the server some time ago. Such a connection is expired and is usually called half-closed; it may therefore be evicted from the pool.
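For what it's worth, in Apache HttpClient 4.4+ that inactivity threshold can be tuned on the pooling connection manager. A minimal sketch, with the value chosen to mirror the 2000 ms default quoted above:

```java
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class ValidateAfterInactivityExample {
    public static void main(String[] args) {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        // Only re-validate a pooled connection if it has been unused for
        // more than 2 s (the 2000 ms default mentioned above).
        cm.setValidateAfterInactivity(2000);

        CloseableHttpClient client = HttpClients.custom()
                .setConnectionManager(cm)
                .build();
        // ... use the client as usual ...
    }
}
```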
Note that if you send requests including a Connection: close HTTP header, the connection should be closed right after the client has received the response.
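With Apache HttpClient the header can be set per request; a small sketch (the URL is a placeholder):

```java
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class ConnectionCloseExample {
    public static void main(String[] args) throws Exception {
        HttpGet get = new HttpGet("http://example.com/");
        // Ask the server not to keep the connection alive after this exchange.
        get.setHeader("Connection", "close");

        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse response = client.execute(get)) {
            System.out.println(response.getStatusLine());
        }
    }
}
```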
The state of open connections can be checked via netstat, which is present on most modern operating systems. I recently had to check one of our HTTP clients, which was managed through a third-party library that did not propagate the Connection: close header properly and therefore left plenty of half-closed connections behind.
According to: https://hc.apache.org/httpcomponents-client-4.5.x/current/tutorial/html/connmgmt.html#d5e418
HttpClient tries to mitigate the problem by testing whether the connection is 'stale', that is no longer valid because it was closed on the server side, prior to using the connection for executing an HTTP request. The stale connection check is not 100% reliable. The only feasible solution that does not involve a one thread per socket model for idle connections is a dedicated monitor thread used to evict connections that are considered expired due to a long period of inactivity. The monitor thread can periodically call the ClientConnectionManager#closeExpiredConnections() method to close all expired connections and evict closed connections from the pool. It can also optionally call the ClientConnectionManager#closeIdleConnections() method to close all connections that have been idle over a given period of time.
The difference between expired and idle is that an expired connection has been closed on the server side, while an idle connection is not necessarily closed on the server side, it has simply been unused for a period of time. When a connection is closed, it is removed from the pool, which frees a slot; the pool creates a new connection when one is needed.
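A minimal sketch of the kind of monitor thread the documentation describes; the wake-up interval and the 30-second idle timeout are arbitrary:

```java
import java.util.concurrent.TimeUnit;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class IdleConnectionMonitor extends Thread {
    private final PoolingHttpClientConnectionManager connectionManager;
    private volatile boolean shutdown;

    public IdleConnectionMonitor(PoolingHttpClientConnectionManager connectionManager) {
        this.connectionManager = connectionManager;
        setDaemon(true);
    }

    @Override
    public void run() {
        try {
            while (!shutdown) {
                synchronized (this) {
                    wait(5000);
                    // Evict connections already closed on the server side ...
                    connectionManager.closeExpiredConnections();
                    // ... and, optionally, connections unused for over 30 seconds.
                    connectionManager.closeIdleConnections(30, TimeUnit.SECONDS);
                }
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }

    public void shutdown() {
        shutdown = true;
        synchronized (this) {
            notifyAll();
        }
    }
}
```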
In my batch application, when I am sending requests across a network using a web service and Java, after running about 30000 requests and receiving the responses, the program throws a java.net.ConnectException: Connection timed out.
I am using WildFly, along with some Java code in the middle to configure the requests (Strings) before sending them across the network.
After researching the possible reasons, the explanation I found is that a firewall might be blocking my access, which can't be true since 90% of the requests have already gone through.
I've also seen it suggested that I could have overloaded the server, although I'm not sure what that means exactly.
You have filled up the server's listen backlog queue. This is caused by creating new connections faster than the server can accept them. You should look into connection pooling at the client, and handling multiple requests per connection at the server.
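As an illustration only, and assuming the requests are plain HTTP calls for which Apache HttpClient is an option, a single pooled client shared across the whole batch might look like this (the URL, payload and pool limits are made up):

```java
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;

public class PooledBatchClient {
    public static void main(String[] args) throws Exception {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(20);            // at most 20 sockets in total
        cm.setDefaultMaxPerRoute(20);  // at most 20 sockets per target host

        // One shared client for the whole batch, so the 30000 requests reuse
        // a small pool of kept-alive connections instead of opening a new
        // socket (and burning an ephemeral port) for every request.
        try (CloseableHttpClient client = HttpClients.custom()
                .setConnectionManager(cm)
                .build()) {
            for (int i = 0; i < 30000; i++) {
                HttpPost post = new HttpPost("http://example.com/service"); // placeholder
                post.setEntity(new StringEntity("<request id='" + i + "'/>"));
                try (CloseableHttpResponse response = client.execute(post)) {
                    // Fully consuming the entity releases the connection back to the pool.
                    EntityUtils.consume(response.getEntity());
                }
            }
        }
    }
}
```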
I was wondering if storing the IP address of a user at the WebSocket handshake would be a good way to protect my Java EE server against DDoS:
When the server receives an abnormal number of connections, it switches to a 'secure' mode where, if a connection request comes from an IP address that is not known to the server (stored in the database from a previous first-time connection), I can simply refuse that connection.
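Something like the following is what I have in mind; a rough sketch only, assuming the handshake passes through a servlet filter in front of the endpoint (the path and class names are made up):

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.annotation.WebFilter;
import javax.servlet.http.HttpServletResponse;

@WebFilter("/ws/*") // made-up path for the WebSocket endpoint
public class KnownIpFilter implements Filter {
    private static final Set<String> knownIps = ConcurrentHashMap.newKeySet();
    private static final AtomicBoolean secureMode = new AtomicBoolean(false);

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        String ip = request.getRemoteAddr();
        if (secureMode.get() && !knownIps.contains(ip)) {
            // In 'secure' mode, refuse handshakes from addresses we have never seen.
            ((HttpServletResponse) response).sendError(HttpServletResponse.SC_FORBIDDEN);
            return;
        }
        knownIps.add(ip); // in the real scheme this would be backed by the database
        chain.doFilter(request, response);
    }

    @Override
    public void init(FilterConfig filterConfig) {
    }

    @Override
    public void destroy() {
    }
}
```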
Could that help? (My main concern is to protect my WebSocket server as much as possible. I've looked into checking the Origin header, but with no success so far.)
Thanks for the help !
Protection against DDoS must be done at the network level (routing, balancing, switching, etc.). A server cannot do anything if a massive number of requests arrives at it: even if the server quickly dispatches them with errors, the channel is saturated and legitimate requests cannot reach the server, or they reach it with very poor throughput. Besides, a DDoS can be carried out even with ICMP packets, which are not at the TCP/UDP layer but only at the IP layer, so a WebSocket server cannot do much about it.
Protection against DoS has more to do with your logic than with the infrastructure: in essence, it is about an attack vector that allows a single client to hang your server. A practical example would be a malformed WebSocket request that makes the thread dispatching sockets in your server die or get stuck, preventing the app from accepting more connections. To protect your server against DoS, check for these kinds of things.
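As a generic illustration (not your actual server), the usual defence is to make sure a failure while handling one connection can never take down the loop that accepts new ones. A rough sketch:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RobustAcceptLoop {
    public static void main(String[] args) throws IOException {
        ExecutorService workers = Executors.newFixedThreadPool(50);
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                Socket client = server.accept();
                workers.submit(() -> {
                    try (Socket c = client) {
                        handle(c);
                    } catch (Exception e) {
                        // A malformed or hostile request only kills this task,
                        // never the accept loop above.
                    }
                });
            }
        }
    }

    private static void handle(Socket socket) throws IOException {
        // placeholder: parse the request and write a response
    }
}
```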
I have written a Java socket server application which gives me errors if I run it for a long time, say 4-8 hours. Below is the error I get:
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:130)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:282)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:324)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:176)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:153)
at java.io.BufferedReader.readLine(BufferedReader.java:316)
at java.io.BufferedReader.readLine(BufferedReader.java:379)
at LiveRate.processData(LiveRate.java:224)
at LiveRate.mainLiveRate(LiveRate.java:265)
at LiveRate.liveRate(LiveRate.java:126)
at LiveRate.run(LiveRate.java:119)
at java.lang.Thread.run(Thread.java:636)
My socket application reads some values from another TCP/IP server, stores them temporarily, and offers them to other clients. I am not sure if these errors are because of heavy load on the system or because of memory issues. Please help.
It is probably neither (directly) load nor memory related. Instead, it is more likely to be one of the following:
the remote service is shut down / falls over and is restarted on a regular basis,
the remote service has decided to close its end of the connection because it is "idle",
network connectivity is intermittent and you are occasionally encountering an outage or congestion-induced "brownout" that is too long,
you are using NAT or similar, and the port number that was being used for the connection has been reclaimed by the NAT gateway, or
something is enforcing some policy about TCP/IP connections being open for too long.
The bottom line is that your client software needs to be able to cope with lost connections if you want it to run for extended periods of time. This is the way the internet works.
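For example, the read loop could reconnect with a backoff whenever the connection drops. This is only a rough sketch; the host, port and caching logic are placeholders:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Socket;

public class ReconnectingReader implements Runnable {
    private final String host; // placeholder for the upstream server's address
    private final int port;

    public ReconnectingReader(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public void run() {
        long backoffMillis = 1000;
        while (!Thread.currentThread().isInterrupted()) {
            try (Socket socket = new Socket(host, port);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream()))) {
                backoffMillis = 1000; // connected again: reset the backoff
                String line;
                while ((line = in.readLine()) != null) {
                    store(line);
                }
                // readLine() returned null: the peer closed the connection cleanly.
            } catch (Exception e) {
                // Connection reset, timeout, DNS failure, ...: fall through and retry.
            }
            try {
                Thread.sleep(backoffMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
            backoffMillis = Math.min(backoffMillis * 2, 60000);
        }
    }

    private void store(String value) {
        // placeholder for whatever caches the value for other clients
    }
}
```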
I'd say it's because your connection gets reset by your internet provider every 24 hours.
I am using Jersey 1.4, the ApacheHttpClient, and the Apache MultiThreadedHttpConnectionManager class to manage connections. For the HttpConnectionManager, I set staleCheckingEnabled to true, maxConnectionsPerHost to 1000 and maxTotalConnections to 1000. Everything else is default. We are running in Tomcat and making connections out to multiple external hosts using the Jersey client.
I have noticed that after a short period of time I will begin to see sockets in a CLOSE_WAIT state that are associated with the Tomcat process. Some monitoring with tcpdump shows that the external hosts appear to be closing the connection after some time, but it's not getting closed on our end. Usually there is some data in the socket read queue, often 24 bytes. The connections are using HTTPS and the data seems to be encrypted, so I'm not sure what it is.
I have checked to be sure that the ClientRequest objects that get created are closed. The sockets in CLOSE_WAIT do seem to get recycled and we're not running out of any resources, at least at this time. I'm not sure what's happening on the external servers.
My question is, is this normal and should I be concerned?
Thanks,
John
This is likely to be a device such as a firewall, or the remote server, timing out the TCP session. You can analyze packet captures of HTTPS using Wireshark as described on their SSL page:
http://wiki.wireshark.org/SSL
The staleCheckingEnabled flag only issues the check when you actually go to use the connection, so you aren't using network resources (TCP sessions) when they aren't needed.
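If the CLOSE_WAIT sockets do become a problem, you could also evict them proactively instead of waiting for the next use. A rough sketch against the MultiThreadedHttpConnectionManager you are already using; the intervals are arbitrary:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;

public class ConnectionEvictor {
    public static void schedule(final MultiThreadedHttpConnectionManager manager) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                // Close connections that have sat idle for more than 30 s ...
                manager.closeIdleConnections(30000);
                // ... and remove the closed ones from the pool.
                manager.deleteClosedConnections();
            }
        }, 10, 10, TimeUnit.SECONDS);
    }
}
```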