Best way to maintain a huge number of connections in Java

I'm wondering which is the best solution for maintaining a huge number of small TCP connections in a multi-threaded application without it locking up after some time.
Assume that we have to visit a lot of HTTP web sites (around 200,000, on different domains, servers, etc.) from multiple threads. Which classes are best for making the safest connection (by which I mean the most lock-resistant; not a multi-threading lock, but a TCP connection that will "not react to anything")? Will HttpURLConnection and BufferedReader do the job with the connect and read timeouts set? I saw that when I was using a simple solution:
URL url = new URL( xurl );
BufferedReader in = new BufferedReader( new InputStreamReader( url.openStream() ) );
All threads were locked/dead after 2-3 hours.
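For reference, a variant of the snippet above with explicit timeouts set, which is what the question asks about (a sketch; the 10-second values are arbitrary placeholders):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class TimedFetch {
    // Fetches the response body, or throws IOException if the server is
    // unreachable or stops responding for longer than the configured timeouts.
    static String fetch(String xurl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(xurl).openConnection();
        conn.setConnectTimeout(10_000); // fail if the TCP connect takes more than 10 s
        conn.setReadTimeout(10_000);    // fail if any single read stalls for more than 10 s
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line);
            }
            return sb.toString();
        } finally {
            conn.disconnect();
        }
    }
}
```

Note that the read timeout applies per read() call, not per request, so a server that trickles one byte every few seconds can still hold a thread for a very long time; an overall deadline has to be enforced from outside (for example by cancelling a Future).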
Is it better to have a constant number of threads, say 10, running all the time and taking URLs to request from the main thread, or to create one thread per URL and then kill it somehow if it does not respond after some time? (And how do you kill such a sub-thread?)

Well, if it is going to be an HTTP connection, I really doubt you can cache them, because keeping an HTTP connection alive is not only a client-side matter; it requires server-side support too. Most of the time the server will close the connection after a timeout period (which is configured on the server). So check what maximum timeout is configured on the server side and how long you want to keep the connection cached.

Related

What is the difference between thread per connection vs thread per request?

Can you please explain the two methodologies that have been implemented in various servlet implementations:
Thread per connection
Thread per request
Which of the above two strategies scales better and why?
Thread-per-request scales better than thread-per-connection.
Java threads are rather expensive, typically using about 1 MB of memory each, whether they are active or idle. If you give each connection its own thread, the thread will typically sit idle between successive requests on the connection. Ultimately the framework needs to either stop accepting new connections ('cos it can't create any more threads) or start disconnecting old connections (which leads to connection churn if / when the user wakes up).
An HTTP connection requires significantly fewer resources than a thread stack, although there is a limit of about 64K open connections per IP address, due to the way that TCP/IP works.
By contrast, in the thread-per-request model, the thread is only associated while a request is being processed. That usually means that the service needs fewer threads to handle the same number of users. And since threads use significant resources, that means that the service will be more scalable.
(And note that thread-per-request does not mean that the framework has to close the TCP connection between HTTP requests ...)
Having said that, the thread-per-request model is not ideal when there are long pauses during the processing of each request. (And it is especially non-ideal when the service uses the comet approach which involves keeping the reply stream open for a long time.) To support this, the Servlet 3.0 spec provides an "asynchronous servlet" mechanism which allows a servlet's request method to suspend its association with the current request thread. This releases the thread to go and process another request.
If the web application can be designed to use the "asynchronous" mechanism, it is likely to be more scalable than either thread-per-request or thread-per-connection.
FOLLOWUP
Let's assume a single web page with 1000 images. This results in 1001 HTTP requests. Further, let's assume HTTP persistent connections are used. With the TPR strategy, this will result in 1001 thread pool management operations (TPMO). With the TPC strategy, this will result in 1 TPMO... Now depending on the actual cost of a single TPMO, I can imagine scenarios where TPC may scale better than TPR.
I think there are some things you haven't considered:
The web browser faced with lots of URLs to fetch to complete a page may well open multiple connections.
With TPC and persistent connections, the thread has to wait for the client to receive the response and send the next request. This wait time could be significant if the network latency is high.
The server has no way of knowing when a given (persistent) connection can be closed. If the browser doesn't close it, it could "linger", tying down the TPC thread until the server times out the connection.
The TPMO overheads are not huge, especially when you separate the pool overheads from the context switch overheads. (You need to do that, since TPC is going to incur context switches on a persistent connections; see above.)
My feeling is that these factors are likely to outweigh the TPMO saving of having one thread dedicated to each connection.
HTTP 1.1 - Has support for persistent connections, which means more than one request/response can be received/sent over the same HTTP connection. So to run requests received over the same connection in parallel, a new thread is created for each request.
HTTP 1.0 - In this version only one request was received per connection, and the connection was closed after sending the response. So only one thread was created per connection.
Thread per connection is the concept of reusing the same HTTP connection for multiple requests (keep-alive).
Thread per request will create a thread for each request from a client. The server can create a number of threads as per the requests.
Thread per request will create a thread for each HTTP request the server receives.
Thread per connection will reuse the same HTTP connection for multiple requests (keep-alive), a.k.a. an HTTP persistent connection, but please note that this is supported only from HTTP 1.1.
Thread per request is faster as most web containers use thread pooling.
The maximum number of parallel connections should be set based on the number of cores on your server.
More cores => more parallel threads.
See here how to configure...
Tomcat 6: http://tomcat.apache.org/tomcat-6.0-doc/config/executor.html
Tomcat 7: http://tomcat.apache.org/tomcat-7.0-doc/config/executor.html
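As a rough illustration, a shared executor (thread pool) in Tomcat's server.xml looks something like the following; the attribute names are from the linked Executor docs, while the values here are arbitrary examples, not recommendations:

```xml
<!-- Define a shared thread pool -->
<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
          maxThreads="150" minSpareThreads="4"/>

<!-- Point a connector at the pool via its executor attribute -->
<Connector executor="tomcatThreadPool" port="8080"
           protocol="HTTP/1.1" connectionTimeout="20000"/>
```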
Thread per request should be better, because it reuses threads while some clients may be idle. If you have a lot of concurrent users, they can be served by a smaller number of threads, and having an equal number of threads would be more expensive. There is also one more consideration: we do not know whether the user is still working with the application, so we cannot know when to destroy a thread. With a thread-per-request mechanism, we just use a thread pool.

Handling multiple HTTP connections

I finished coding a Java application that uses 25 different threads; each thread is an infinite loop in which an HTTP request is sent and the JSON object (a small one) that is returned is processed. It is crucial that the time between two requests sent by a specific thread is less than 500 ms. However, I did some benchmarking on my program and that time is well over 1000 ms. So my question is: is there a better way to handle multiple connections other than creating multiple threads?
I am in desperate need for help so I'm thankful for any advice you may have !
PS: I have a decent internet connection ( my ping to the destination server of the requests is about 120ms).
I'd suggest looking at Apache HttpClient:
Specifically, you'll be interested in constructing a client that has a pooling connection manager. You can then leverage the same client.
import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.impl.conn.PoolingClientConnectionManager;

PoolingClientConnectionManager connectionManager = new PoolingClientConnectionManager();
connectionManager.setMaxTotal(number); // cap on concurrent connections across all routes
HttpClient client = new DefaultHttpClient(connectionManager);
Here's a specific example that handles your use-case:
PoolingConnectionManager example

making multiple http request efficiently

I want to make a few million HTTP requests to a web service of the form:
htp://(some ip)//{id}
I have the list of ids with me.
A simple calculation has shown that my Java code will take around 4-5 hours to get the data from the API.
The code is
URL getUrl = new URL("http url");
URLConnection conn = getUrl.openConnection();
BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
StringBuffer sbGet = new StringBuffer();
String getline;
while ((getline = rd.readLine()) != null) {
    sbGet.append(getline);
}
rd.close();
String getResponse = sbGet.toString();
Is there a way to more efficiently make such requests which will take less time
One way is to use an executor service with a fixed thread pool (the size depends on how much load the target HTTP service can handle) and bombard the service with requests in parallel. Each Runnable would basically perform the steps you outlined in your sample code, btw.
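A minimal sketch of that executor-service approach (the pool size is arbitrary, and the Callable body is a placeholder for the URL-reading code from the question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFetcher {
    // Runs all tasks on a bounded pool and collects their results in submission order.
    static List<String> runAll(List<Callable<String>> tasks, int poolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Callable<String> task : tasks) {
                futures.add(pool.submit(task)); // queued; at most poolSize run at once
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that task finishes
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Each Callable would open the connection, read the body, and return it; the pool caps how many requests are in flight at once so the target server is not flooded.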
You need to profile your code before you start optimizing it. Otherwise you may end up optimizing the wrong part. Depending on the results you obtain from profiling consider the following options.
Change the protocol to allow you to batch the requests
Issue multiple requests in parallel (use multiple threads or execute multiple processes in parallel; see this article)
Cache previous results to reduce the number of requests
Compress the request or the response
Persist the HTTP connection
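On the compression point: over HTTP this usually means sending an Accept-Encoding: gzip header and wrapping the response stream in a GZIPInputStream. A self-contained sketch of the underlying round trip, using only java.util.zip (the payload handling is illustrative, not tied to the question's API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipDemo {
    // Compresses a byte array with gzip.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Inflates a gzip-compressed byte array back to the original bytes.
    static byte[] decompress(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        }
    }
}
```

Whether this helps depends on the payload: highly repetitive JSON compresses very well, while small or already-compressed responses may not be worth the CPU cost.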
Is there a way to more efficiently make such requests which will take less time?
Well, you could probably run a small number of requests in parallel, but you are likely to saturate the server. Beyond a certain number of requests per second, the throughput is likely to degrade ...
To get past that limit, you will need to redesign the server and/or the server's web API. For instance:
Changing your web API to allow a client to fetch a number of objects in each request will reduce the request overheads.
Compression could help, but you are trading off network bandwidth for CPU time and/or latency. If you have a fast, end-to-end network then compression might actually slow things down.
Caching helps in general, but probably not in your use-case. (You are requesting each object just once ...)
Using persistent HTTP connections avoids the overhead of creating a new TCP/IP connection for each request, but I don't think you can do this for HTTPS. (And that's a shame, because HTTPS connection establishment is considerably more expensive.)

Why should I use NIO for TCP multiplayer gaming instead of simple sockets (IO) - or: where is the block?

I'm trying to create a simple multiplayer game for Android devices. I have been thinking about the netcode and read now a lot of pages about Sockets. The Android application will only be a client and connect only to one server.
Almost everywhere (here too) you get the recommendation to use NIO or a framework which uses NIO, because of the "blocking".
I'm trying to understand what the problem of a simple socket implementation is, so I created a simple test to try it out:
My main application:
[...]
Socket clientSocket = new Socket( "127.0.0.1", 2593 );
new Thread(new PacketReader(clientSocket)).start();
PrintStream os = new PrintStream( clientSocket.getOutputStream() );
os.println( "kakapipipopo" );
[...]
The PacketReader Thread:
class PacketReader implements Runnable
{
    private final Socket m_Socket;
    private final BufferedReader m_Reader;

    PacketReader(Socket socket) throws IOException
    {
        m_Socket = socket;
        m_Reader = new BufferedReader(new InputStreamReader(socket.getInputStream()));
    }

    public void run()
    {
        char[] buffer = new char[200];
        int count;
        try {
            while ((count = m_Reader.read(buffer, 0, 200)) != -1)
            {
                String message = new String(buffer, 0, count);
                Gdx.app.log("keks", message);
            }
        } catch (IOException e) {
            // connection closed or dropped
        }
    }
}
I couldn't reproduce the blocking problems I was supposed to get. I thought the read() call would block my application so that I couldn't do anything, but everything worked just fine.
I have been thinking: What if I just create a input and output buffer in my application and create two threads which will write and read to the socket from my two buffers? Would this work?
If yes, why does everyone recommend NIO? Somewhere in the normal IO approach a block must happen, but I can't find it.
Are there maybe other benefits of using NIO for Android multiplayer gaming? I thought that NIO seems more complex and therefore maybe less suited for a mobile device, but maybe the simple socket way is worse for a mobile device.
I would be very happy if someone could tell me where the problem happens. I'm not scared of NIO, but at least I would like to find out why I'm using it :D
Greetings
-Thomas
The blocking is that read() will block the current thread until it can read data from the socket's input stream. Thus you need a thread dedicated to that single TCP connection.
What if you have more than 10k client devices connected to your server? Then you need at least 10k threads to handle all the client devices (assuming each device maintains a single TCP connection), whether they are active or not. You have too much context-switch and other multi-threading overhead even if only 100 of them are active.
NIO uses a selector model to handle those clients, meaning you don't need a dedicated thread per TCP connection to receive data. You just select the active connections (those that have already received data) and process only those. You can control how many threads are maintained on the server side.
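A minimal sketch of that selector model using java.nio; the echo logic and buffer size are placeholders (a real game server would decode packets instead), but the single-threaded accept/read dispatch is the point:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SelectorSketch {
    // Accepts connections and echoes whatever each client sends, all on the
    // single thread that calls serveOnce() - no thread per connection.
    private final Selector selector;
    private final ServerSocketChannel server;

    SelectorSketch() throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // pick any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() {
        return server.socket().getLocalPort();
    }

    // One pass over the ready channels; a real server would loop this forever.
    void serveOnce(long timeoutMillis) throws IOException {
        selector.select(timeoutMillis);
        for (SelectionKey key : selector.selectedKeys()) {
            if (key.isAcceptable()) {
                SocketChannel client = server.accept();
                client.configureBlocking(false);
                client.register(selector, SelectionKey.OP_READ);
            } else if (key.isReadable()) {
                SocketChannel client = (SocketChannel) key.channel();
                ByteBuffer buf = ByteBuffer.allocate(256);
                int n = client.read(buf);
                if (n > 0) {
                    buf.flip();
                    client.write(buf); // echo the bytes back
                } else if (n < 0) {
                    key.cancel();      // client closed the connection
                    client.close();
                }
            }
        }
        selector.selectedKeys().clear(); // the ready set must be cleared manually
    }
}
```

One thread can thus service thousands of mostly-idle connections, which is exactly what the 10k-device scenario above needs.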
EDIT
This answer does not exactly answer what the OP asked. For the client side it's fine, because the client is going to connect to just one server. Still, my answer gives some generic idea about blocking vs. non-blocking IO.
I know this answer is coming 3 years later but this is something which might help someone in future.
In a blocking socket model, if data is not available for reading, or if the server is not ready for writing, then the network thread will wait on a request to read from or write to a socket until it either gets or sends the data or times out. In other words, the program may halt at that point for quite some time if it can't proceed. To cancel this out we can create a thread per connection, which can handle requests from each client concurrently. If this approach is chosen, the scalability of the application can suffer, because in a scenario where thousands of clients are connected, having thousands of threads can eat up all the memory of the system; threads are expensive to create and can affect the performance of the application.
In a non-blocking socket model, the request to read or write on a socket returns immediately, whether or not it was successful; in other words, asynchronously. This keeps the network thread busy. It is then our task to decide whether to try again or to consider the read/write operation complete. This creates an event-driven approach to communication, where we create threads only when needed, which leads to a more scalable system.
(The original answer included a diagram contrasting the blocking and non-blocking socket models.)

Keeping persistent connections from a home-made Java server

I've built a simple Java program that works as a server locally.
At the moment it does a few things, such as previews directories, forwards to index.html if directory contains it, sends Last-Modified header and responds properly to a client's If-Modifed-Since request.
What I need to do now is make my program accept persistent connections. It's threaded at the moment so that each connection has its own thread. I want to put my entire thread code within a loop that continues until either a Connection: close header or a specified timeout.
Does anybody have any ideas where to start?
Edit: This is a university project, and has to be done without the use of Frameworks.
I have a main method which loops indefinitely; each time it loops it creates a Socket object, and an HTTPThread object (a class of my own creation) is then created that processes the single request.
I want to allow multiple requests to work within a single connection, making use of the Connection: keep-alive request header. I expect to use a loop in my HTTPThread class; I'm just not sure how to pass the multiple requests in.
Thanks in advance :)
I assume that you are implementing the HTTP protocol code yourself starting with the Socket APIs. And that you are implementing the persistent connections part of the HTTP spec.
You can put the code in the loop as you propose, and use Socket.setSoTimeout to set the timeout on blocking operations, and hence your HTTP timeouts. You don't need to do anything to reuse the streams for your connection ... apart from not closing them.
I would point out that there are much easier ways to implement a web server. There are many existing Java web server frameworks and application servers, or you could repurpose the Apache HTTP protocol stacks.
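To make the Socket.setSoTimeout suggestion concrete, here is a hedged sketch of such a per-connection keep-alive loop. The request handling is simplified to treating each input line as one request and echoing it (a stand-in for real HTTP parsing), and serve and its echo response are illustrative names, not part of any framework:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class PersistentHandler {
    // Serves requests on one connection until the client closes it, sends a
    // "Connection: close" line, or stays idle past the keep-alive timeout.
    // Returns the number of requests served on this connection.
    static int serve(Socket socket, int keepAliveMillis) throws IOException {
        socket.setSoTimeout(keepAliveMillis); // blocking reads fail after this much idle time
        BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
        PrintStream out = new PrintStream(socket.getOutputStream(), true);
        int served = 0;
        try {
            String line;
            while ((line = in.readLine()) != null) { // null means the client closed the connection
                out.println("echo: " + line);        // stand-in for building a real HTTP response
                served++;
                if (line.contains("Connection: close")) {
                    break; // client asked us not to keep the connection alive
                }
            }
        } catch (SocketTimeoutException e) {
            // idle too long: fall through and close, like Apache's KeepAliveTimeout
        } finally {
            socket.close();
        }
        return served;
    }
}
```

The thread and socket stay tied up between requests for up to keepAliveMillis, which is exactly the scalability trade-off discussed below.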
If it should act like a web service: open 2 sockets from the client side, one for requests, one for responses, and keep the sockets and streams open.
You need to define a separator to notify the other side that a transfer is over: a special bit string for a binary protocol, a special character (usually newline) for a text-based protocol (like XML).
If you are really trying to implement your own HTTP server, you should rather make use of a library that already implements the HTTP 1.1 connection keep-alive standard.
Some ideas to get you started:
This wikipedia article describes HTTP 1.1 persistent connections:
http://en.wikipedia.org/wiki/HTTP_persistent_connection
You want to not close the socket immediately, but close it after some period of inactivity (Apache 2.2 uses 5 seconds).
You have two ways to implement:
In your thread, do not close the socket and do not exit the thread; instead put a read timeout on the socket (whatever period you want to support). The read call will block, and if the timeout expires you close the socket; otherwise you read the next request. The downside is that each persistent connection holds both a thread and a socket for up to your max wait period, meaning that the solution doesn't scale because you're holding threads for too long (but it may be fine for the purposes of a school project)!
You can get around the limitation of (1) by maintaining a list of tuples {socket,timestamp}, having a background thread monitor and close connections that timeout, and using NIO to detect a new read on an existing open socket. So after you finish reading the initial request you just exit the thread (returning it to the thread pool). Obviously this is much more complicated but it has the benefit of freeing up request threads.
