On an intranet, where the network is fast and reliable.
Server A sends many files to server B at the same time over an HTTP service.
The protocol is HTTP/1.1, which uses persistent connections by default.
[update] A connection pool holds 100 connections.
[update] A connection sends one file at a time.
[update] Connections are not closed (persistent connections) and are reused to send the next file.
Each file is 7 KB to 30 KB in size.
Question:
Under these conditions, will persistent connections have better performance than non-persistent connections?
I ask this question because we found that the connections would be blocked for a very long time when uploading files. I suggested using non-persistent connections, since I think they are more stable, but my colleague insists on persistent connections, because he thinks they have better performance.
UPDATE
See the updated question, thank you ~
In HTTP 1.1, your persistent connection allows pipe-lining but not parallelism. (See RFC 2616) That means that if you share your connection among 100 threads and each one sends a file, you will send those 100 files one at a time (in some ordering) and receive responses for each file in the order it was sent. You are not getting any advantage out of sending on 100 threads, because they're just lining up to send and receive one at a time.
You may be able to send faster using multiple connections, because that would allow them to actually run in parallel. But this is dependent on lots of other factors. Depending on your network, setting up and tearing down 100 connections may be slower than pipe-lining through one connection. Also, the server may not appreciate you opening 100 separate connections. Worse, the server may reject you only some of the time, which is a big headache.
I suggest taking the middle road: open, say, 5 persistent connections (using only 5 threads) and send 20 documents down each connection. HTTPClient has a BasicConnPool to do this sort of thing, though it may be too basic for your needs.
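A rough sketch of that middle road with Apache HttpClient 4.x, using a PoolingHttpClientConnectionManager capped at 5 connections shared by 5 worker threads (the upload URL and the outbox directory are placeholders):

    import java.io.File;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.entity.ContentType;
    import org.apache.http.entity.FileEntity;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
    import org.apache.http.util.EntityUtils;

    public class PooledUploader {
        public static void main(String[] args) throws Exception {
            // 5 persistent connections shared by 5 worker threads.
            PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
            cm.setMaxTotal(5);
            cm.setDefaultMaxPerRoute(5);

            ExecutorService workers = Executors.newFixedThreadPool(5);
            try (CloseableHttpClient client = HttpClients.custom().setConnectionManager(cm).build()) {
                File[] files = new File("outbox").listFiles();   // placeholder directory
                if (files == null) return;
                for (File f : files) {
                    workers.submit(() -> {
                        HttpPost post = new HttpPost("http://server-b.example/upload"); // placeholder URL
                        post.setEntity(new FileEntity(f, ContentType.DEFAULT_BINARY));
                        try (CloseableHttpResponse rsp = client.execute(post)) {
                            EntityUtils.consume(rsp.getEntity()); // drain so the connection goes back to the pool
                            System.out.println(f.getName() + " -> " + rsp.getStatusLine());
                        } catch (Exception e) {
                            System.err.println(f.getName() + " failed: " + e);
                        }
                    });
                }
                workers.shutdown();
                workers.awaitTermination(1, TimeUnit.HOURS);
            }
        }
    }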
Related
I've been looking into making a simple Sockets-based game in Java, and read in multiple places that client sockets are destroyed after a single exchange. Is this good practice for continued connections? The server needs to maintain a connection with a client (i.e. not using socket.accept() every time it wants to tell a client about something), but can't wait every time for the client's response. I already have the server/client running in separate threads, but won't destroying the socket after every exchange mean re-acquiring (or failing to re-acquire) a connection to that client? I've seen so many conflicting websites about sockets in Java and how they should be implemented.
There are no hard and fast rules, but it does depend slightly on what data rates you want to achieve.
For example, YouTube is a streaming video service, but the video data is delivered by the client using https to fetch batches of video data. Inefficient, yes, but very easy to program for. There are lots of reasons to use https for an application like YouTube (firewalls, etc.), but ultimate power saving and network performance were not among them. The "proper" way would be to use a protocol like RTP, which uses UDP to deliver small packets of data that can then be rearranged into order; you also have to deal with missing frames at the codec level, etc. Much less network traffic and friendly to bandwidth-constrained network links, but significantly more difficult to deal with when traversing firewalls, in client software, and so on.
So if your game is sending modest amounts of data, the only thing wrong with setting up and tearing down a whole socket connection for every message is the nagging feeling you yourself will have that it is somehow not the most efficient solution.
It sounds, though, like you have a conflict between the need to communicate between client and server and the need to process something else whilst waiting for the communication to complete. Here you're getting into asynchronous I/O territory. To make that easy I strongly suggest you take a look at ZeroMQ; that will make everything a whole lot simpler.
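For a taste of it, a minimal reply socket with the JeroMQ binding (the pure-Java ZeroMQ implementation) might look roughly like this; the port is arbitrary:

    import org.zeromq.SocketType;
    import org.zeromq.ZContext;
    import org.zeromq.ZMQ;

    public class ZmqEchoServer {
        public static void main(String[] args) {
            try (ZContext ctx = new ZContext()) {
                ZMQ.Socket rep = ctx.createSocket(SocketType.REP); // reply socket
                rep.bind("tcp://*:5555");                          // arbitrary port
                while (!Thread.currentThread().isInterrupted()) {
                    byte[] msg = rep.recv(0);                      // blocks until the next request
                    rep.send(msg, 0);                              // echo it straight back
                }
            }
        }
    }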
and read in multiple places that client sockets are destroyed after a single exchange.
Only in the places where that actually happens. There are numerous contexts where it doesn't, the outstanding example being HTTP, where every effort is made to reuse connections.
Is this good practice for continued connections?
The question is a contradiction in terms. A continued connection is a connection that isn't closed. A closed connection can't be continued.
The server needs to maintain a connection with a client (i.e. not using socket.accept() every time it wants to tell a client about something), but can't wait every time for the client's response.
The word you are groping for here is 'session'.
I already have the server/client running in separate threads, but won't destroying the socket after every exchange mean re-acquiring (or failing to re-acquire) a connection to that client?
Yes.
I've seen so many conflicting websites about sockets in Java and how they should be implemented.
You should use a connection pool at the client; a request loop at the server that looks for multiple requests per connection; a client-side facility that closes idle connections after some idle timeout; and a read timeout at the server that closes connections on which no request has been read within the timeout.
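A rough sketch of that server-side request loop, assuming a simple line-oriented protocol and an illustrative 30-second read timeout:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.net.SocketTimeoutException;

    public class RequestLoopServer {
        public static void main(String[] args) throws IOException {
            try (ServerSocket server = new ServerSocket(9000)) {    // placeholder port
                while (true) {
                    Socket client = server.accept();
                    new Thread(() -> serve(client)).start();        // one session per thread
                }
            }
        }

        private static void serve(Socket client) {
            try (Socket s = client;
                 BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                s.setSoTimeout(30_000);                             // read timeout closes idle sessions
                String request;
                while ((request = in.readLine()) != null) {         // many requests per connection
                    out.println("echo: " + request);
                }
            } catch (SocketTimeoutException idle) {
                // no request within the timeout: drop the connection
            } catch (IOException ignored) {
            }
        }
    }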
After spending a few hours reading the HttpClient documentation and source code, I have decided that I should definitely ask for help here.
I have a load balancer server using a round-robin algorithm somewhat like this
                             +---> RESTServer1
    client --> load balancer +---> RESTServer2
                             +---> RESTServer3
Our client is using HttpClient to direct requests to our load balancer server, which in turn round-robins the requests to the corresponding RESTServer.
Now, Apache HttpClient creates, by default, a pool of connections (2 per route by default). These connections are persistent by default, since I am using HTTP 1.1 and my servers are emitting Connection: Keep-Alive headers.
So the problem is that since HttpClient keeps these persistent connections open, they are no longer subject to the round-robin algorithm at the balancer level: they hit the same server every time.
This creates two problems:
I can see that sometimes one or more of the balanced servers are overloaded with traffic, whereas one or more of the other servers are idle; and
even if I take one of my REST servers out of the balancer, it still receives requests while the persistent connections are alive.
This is definitely not the intended behavior.
I suppose I could force a Connection: close header in my responses, or I could run HttpClient without a connection pool or with a NoConnectionReuseStrategy. But the HttpClient documentation states that the idea behind the pool is to improve performance by avoiding having to open a socket and do the whole TCP handshake and related work for every request. So I have to conclude that using a connection pool is beneficial to the performance of my applications.
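For reference, the no-reuse variant I am considering would look roughly like this with Apache HttpClient 4.x (just a sketch of the idea):

    import org.apache.http.impl.NoConnectionReuseStrategy;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    public class NoReuseClientExample {
        public static void main(String[] args) {
            // Every request gets a fresh connection, so each one is balanced
            // independently, at the cost of a TCP handshake per request.
            CloseableHttpClient client = HttpClients.custom()
                    .setConnectionReuseStrategy(NoConnectionReuseStrategy.INSTANCE)
                    .build();
            // ... execute requests with 'client' as usual ...
        }
    }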
So my question is: is there a way to use persistent connections with a load balancer in this way, or am I forced to use non-persistent connections for this scenario?
I want the performance that comes with reusing connections, but I want them properly load-balanced. Any thoughts on how I can configure this scenario with Apache HttpClient, if at all possible?
Your question is perhaps more related to your load balancer configuration and the style of load balancing. There are several ways:
HTTP Redirection
LB acts as a reverse proxy
Pure packet forwarding
In scenarios 1 and 3 you do not have a chance with persistent connections. If your load balancer acts as a reverse proxy, there may be a way to achieve persistent connections with balancing. "Dumb" balancers (as used for SMTP or LDAP) select the target per TCP connection, not per request.
For example, the Apache HTTPd server with the balancer module (see http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html) can dispatch every request, even on a persistent connection, to a different server.
Also check that you are not receiving a session-sticky balancer cookie; in that case the cause would not be the persistent connection but the cookie.
+1 to @mp911de's answer.
One can also make scenarios 1 and 3 work reasonably well by limiting the total time to live of persistent connections to some short period of time, say 15 seconds. That way connections live long enough to be re-used during periods of activity, and are short-lived enough to go away during periods of relative inactivity.
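With Apache HttpClient 4.4+ that time-to-live can be set directly on the builder; a minimal sketch using the 15-second figure from above:

    import java.util.concurrent.TimeUnit;

    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    public class ShortLivedPoolExample {
        public static void main(String[] args) {
            // Connections are still pooled and reused, but retired after 15 seconds,
            // so the balancer gets a chance to re-route subsequent traffic.
            CloseableHttpClient client = HttpClients.custom()
                    .setConnectionTimeToLive(15, TimeUnit.SECONDS)
                    .build();
            // ... execute requests with 'client' as usual ...
        }
    }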
I am creating a web application with a login page, where many users may try to log in at the same time, so I need to handle many requests at once.
I know this is already solved by popular services like Google Talk.
So I have some questions in my mind.
"How many requests can a port handle at a time ?"
How many sockets can I(server) create ? is there any limitations?
For e.g . As we know when we implement client server communication using Socket programming(TCP), we pass 'a port number(unreserved port number)to server for creating a socket .
So I mean to say if 100000 requests came at a single time then what will be approach of port to these all requests.
Is he manitains some queue for all these requests , or he just accepts number of requests as per his limit? if yes what is handling request limit size of port ?
Summary:
I want to know how a server serves multiple requests simultaneously; I don't know anything about it. I only know that we connect to a server via its IP address and port number, that's it.
So I thought there is only one port, and many requests come to that one port from different clients, so how does the server manage all the requests?
This is all I want to know. If you could explain this concept in detail, it would be very helpful. Thanks anyway.
A port doesn't handle requests; it receives packets. Depending on the implementation of the server, these packets may be handled by one or more processes/threads, so in theory this is unlimited. In practice you'll always be limited by bandwidth and processing performance.
If lots of packets arrive at one port and cannot be handled in a timely manner, they will be buffered (by the server, the operating system or the hardware). If those buffers are full, the congestion may be handled by network components (routers, switches) and by the protocols the network traffic is based on. TCP, for example, has methods to avoid or control congestion: http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Congestion_control
This is typically configured in the application/web server you are using. You limit the number of concurrent requests by limiting the number of parallel worker threads you allow the server to spawn to serve requests. If more requests come in than there are available threads to handle them, they will start to queue up. The second thing you typically configure is the socket back-log size. When the back-log is full, the server will start responding with "connection refused" when new requests come in.
Then you'll probably be restricted by the number of file descriptors your OS supports (on *nix) or the number of simultaneous connections your web server supports. The OS maximum on my machine seems to be 75,000.
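As an illustration, with a plain Java socket server those two knobs look roughly like this (200 worker threads and a back-log of 50 are arbitrary example values):

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class BoundedServer {
        public static void main(String[] args) throws IOException {
            ExecutorService workers = Executors.newFixedThreadPool(200); // limit on parallel requests
            try (ServerSocket server = new ServerSocket(8080, 50)) {     // 50 = accept back-log
                while (true) {
                    Socket client = server.accept();
                    workers.submit(() -> handle(client));                // queues when all workers are busy
                }
            }
        }

        private static void handle(Socket client) {
            try (Socket s = client) {
                // read the request and write a response here
            } catch (IOException ignored) {
            }
        }
    }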
100,000 concurrent connections should be easily possible in Java if you use something like Netty.
You need to be able to:
Accept incoming connections fast enough. The NIO framework helps enormously here, which is what Netty uses internally. There is a smallish queue for incoming requests, so you need to be able to handle these faster than the queue can fill up.
Create connections for each client (this implies some memory overhead for things like connection info, buffers etc.) - you may need to tweak your VM settings to have enough free memory for all the connections
See this article from 2009 where they discuss achieving 100,000 concurrent connections with about 20% CPU usage on a quad-core server.
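For illustration, a minimal non-blocking echo server with Netty 4 looks roughly like this (port and handler are placeholders); each connection costs some buffer memory rather than a dedicated thread:

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInboundHandlerAdapter;
    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.EventLoopGroup;
    import io.netty.channel.nio.NioEventLoopGroup;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.channel.socket.nio.NioServerSocketChannel;

    public class NettyEchoServer {
        public static void main(String[] args) throws InterruptedException {
            EventLoopGroup boss = new NioEventLoopGroup(1);   // accepts connections
            EventLoopGroup workers = new NioEventLoopGroup(); // handles I/O, defaults to 2 * cores threads
            try {
                ServerBootstrap bootstrap = new ServerBootstrap()
                        .group(boss, workers)
                        .channel(NioServerSocketChannel.class)
                        .childHandler(new ChannelInitializer<SocketChannel>() {
                            @Override
                            protected void initChannel(SocketChannel ch) {
                                ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                                    @Override
                                    public void channelRead(ChannelHandlerContext ctx, Object msg) {
                                        ctx.writeAndFlush(msg); // echo the bytes back
                                    }
                                });
                            }
                        });
                bootstrap.bind(8080).sync().channel().closeFuture().sync(); // placeholder port
            } finally {
                boss.shutdownGracefully();
                workers.shutdownGracefully();
            }
        }
    }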
I'm developing a Java client/server application in which there will be a great number of servers to which the clients may connect. The problem is that the vast majority of them will probably not be serving at any given time. The client needs to find at least one available server in the list, so it will iterate over it, looking for an available server (when it finds the first one it stops; one is enough).
The problem is that the list will probably be long, tens of thousands of entries, maybe even hundreds of thousands, and it may happen that only 1% of them are connected (i.e. running the server). That's why I need a clever and fast way to know whether a server is connected, without waiting for timeouts and so on. I accept all kinds of suggestions.
I have thought about ordering the server list statistically, so that the servers that are available more often are the first hosts attempted. But this is not enough.
Perhaps multicasting UDP datagrams? The connections between clients and servers are TCP, but perhaps to find a server it's better to do a UDP multicast first and wait for the answers... what do you think? :)
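For illustration, the kind of discovery probe I have in mind would look roughly like this (the multicast group 230.0.0.1 and port 4446 are made up, and the servers would have to join that group and answer):

    import java.net.DatagramPacket;
    import java.net.InetAddress;
    import java.net.MulticastSocket;
    import java.net.SocketTimeoutException;
    import java.nio.charset.StandardCharsets;

    public class DiscoveryProbe {
        public static void main(String[] args) throws Exception {
            InetAddress group = InetAddress.getByName("230.0.0.1"); // made-up group
            int port = 4446;                                        // made-up port

            try (MulticastSocket socket = new MulticastSocket(port)) {
                socket.setSoTimeout(2_000); // don't wait forever for replies

                // Ask "who is alive?" on the multicast group.
                byte[] probe = "WHO_IS_ALIVE".getBytes(StandardCharsets.UTF_8);
                socket.send(new DatagramPacket(probe, probe.length, group, port));

                // The first server to answer wins; its address is then used for the TCP connection.
                byte[] buf = new byte[256];
                DatagramPacket reply = new DatagramPacket(buf, buf.length);
                socket.receive(reply);
                System.out.println("First server alive: " + reply.getAddress());
            } catch (SocketTimeoutException none) {
                System.out.println("No server answered in time.");
            }
        }
    }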
EDIT:
Both the server and client use thread pools.
The server pool handles 200 threads concurrently, and when the pool is full it queues the rest until the queue is 200 runnables long. Then it blocks and stops accepting connections until there is free room in the queue again.
The client has a cached thread pool; it can make as many concurrent requests to the servers as you want (with common sense, obviously...).
This is just an initial thought and would add some overhead, but you could have the servers periodically ping some centralized server that the clients check with. Then, if a server doesn't ping for some set time, it gets removed from the list.
You might want to use a peer-to-peer network.
Have a look at JXTA/JXSE:
http://jxse.kenai.com/index.html
If it is your own code that is running on each of these servers, could you have them send a keep-alive to a central server (which is controlled by you and is guaranteed to be up at all times)? The central server can then maintain an up-to-date list of all servers that are active. The client just needs a copy of this list from the central server and can then start whatever communication it needs.
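A rough sketch of such a central registry, assuming each server sends a small UDP heartbeat datagram to a made-up port 4447 and that 30 seconds of silence means it is down:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class HeartbeatRegistry {
        public static void main(String[] args) throws Exception {
            Map<String, Long> lastSeen = new ConcurrentHashMap<>();
            long staleAfterMs = 30_000; // consider a server dead after 30 s of silence

            // Periodically drop servers that have stopped pinging.
            new Thread(() -> {
                while (true) {
                    long cutoff = System.currentTimeMillis() - staleAfterMs;
                    lastSeen.values().removeIf(t -> t < cutoff);
                    try { Thread.sleep(5_000); } catch (InterruptedException e) { return; }
                }
            }).start();

            // Each heartbeat is a small datagram containing the sender's "host:port".
            try (DatagramSocket socket = new DatagramSocket(4447)) {
                byte[] buf = new byte[128];
                while (true) {
                    DatagramPacket ping = new DatagramPacket(buf, buf.length);
                    socket.receive(ping);
                    String server = new String(ping.getData(), 0, ping.getLength(), StandardCharsets.UTF_8);
                    lastSeen.put(server, System.currentTimeMillis());
                    // Clients would fetch lastSeen.keySet() to get the current live list.
                }
            }
        }
    }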
Sounds like a job for threads. You cannot speed up an individual connection attempt; it takes time to contact a server.
IMHO, the best way is to get a few hundred threads to march through the list of servers. The first one to find a live server wins; then signal the other threads to die out (see the sketch below).
Btw, did you really mean to order the server list "sadistically"? :)
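A sketch of that race using a thread pool: ExecutorService.invokeAny returns the result of the first probe that succeeds and cancels the rest (hosts, port and timeout below are placeholders):

    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class FirstAliveFinder {
        // Returns the first host that accepts a TCP connection; invokeAny cancels
        // the remaining probes as soon as one of them succeeds.
        static String findFirstAlive(List<String> hosts, int port, int timeoutMs) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(Math.min(hosts.size(), 200));
            try {
                List<Callable<String>> probes = new ArrayList<>();
                for (String host : hosts) {
                    probes.add(() -> {
                        try (Socket s = new Socket()) {
                            s.connect(new InetSocketAddress(host, port), timeoutMs);
                            return host; // the first successful connect wins
                        }
                    });
                }
                return pool.invokeAny(probes); // throws if none of the hosts is reachable
            } finally {
                pool.shutdownNow(); // tell the losing probes to stop
            }
        }

        public static void main(String[] args) throws Exception {
            List<String> hosts = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"); // placeholder hosts
            System.out.println("First alive: " + findFirstAlive(hosts, 8080, 1_000));
        }
    }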
I have a typical client-server communication: the client sends data to the server, the server processes it, and returns data to the client. The problem is that the processing can take quite some time, on the order of minutes. There are a few approaches that could be used to solve this.
1. Establish a connection and keep it alive until the operation is finished and the client receives the response.
2. Establish a connection, send data, close the connection. Now the processing takes place, and once it is finished the server could establish a connection to the client to send the data.
3. Establish a connection, send data, close the connection. Processing takes place. The client asks the server every n minutes/seconds if the operation is finished. If the processing is finished, the client fetches the data.
I was wondering which approach would be the best to use. Is there maybe some "de facto" standard for solving this problem? How "expensive" is opening a socket in Java? Solution 1 seems pretty nasty to me, but 2 and 3 could do. The problem with solution 2 is that the server needs to know on which port the client is listening, while solution 3 adds some network overhead.
1. is good enough.
2. will not work in many situations, for example when the client is behind a firewall, NAT, and so on. Servers usually accept incoming connections from everywhere; desktops usually do not.
3. is better than 1 simply because you won't have problems when the connection is lost.
A combination of 1 and 3: make long-waiting connections, with a periodic sleep and reconnect. I mean: connect to the server, wait 30 seconds for data; if no data is received, sleep for 10 seconds, then loop.
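A rough sketch of that wait/sleep/reconnect loop (host, port and timeouts are placeholders, and a single line-oriented response is assumed):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.net.SocketTimeoutException;
    import java.nio.charset.StandardCharsets;

    public class LongWaitClient {
        public static void main(String[] args) throws Exception {
            String host = "server.example"; // placeholder
            int port = 9000;                // placeholder

            while (true) {
                try (Socket socket = new Socket()) {
                    socket.connect(new InetSocketAddress(host, port), 5_000);
                    socket.setSoTimeout(30_000); // wait up to 30 s for the result
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
                    String result = in.readLine();
                    if (result != null) {
                        System.out.println("Result: " + result);
                        return;
                    }
                } catch (SocketTimeoutException noDataYet) {
                    // nothing arrived within 30 s: fall through, pause, reconnect
                }
                Thread.sleep(10_000); // sleep 10 s, then loop
            }
        }
    }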
Opening sockets is sometimes expensive, but not nearly as expensive as your data processing.
I see an immediate problem with option 2. If the client is behind a firewall, he might very well be allowed to connect and make the request, but the server might be prevented from connecting back to the client.
As you say, option 1 looks a bit nasty (not too nasty though, it could work well), so among the options listed I would go for option 3. Perhaps the server could estimate the time left for the processing and hint to the client, in each poll response, when it is about time to check back.
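One way to carry that hint is a status endpoint that answers 202 with a Retry-After header until the result is ready; this is a sketch, not a standard, and the URL is made up:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class PollingClient {
        public static void main(String[] args) throws Exception {
            URL status = new URL("http://server.example/jobs/42/status"); // made-up endpoint

            while (true) {
                HttpURLConnection conn = (HttpURLConnection) status.openConnection();
                if (conn.getResponseCode() == 200) {          // processing finished
                    try (BufferedReader in = new BufferedReader(
                            new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                        System.out.println("Result: " + in.readLine());
                    }
                    return;
                }
                // Otherwise the server hints when to check back; fall back to 10 s.
                int waitSeconds = conn.getHeaderFieldInt("Retry-After", 10);
                conn.disconnect();
                Thread.sleep(waitSeconds * 1000L);
            }
        }
    }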