I have a network server with a few dozen backend controllers, each of which handles a different kind of user request (e.g., a user clicking something on the website).
Each controller makes network calls to a handful of services to get the data it needs. Each network call takes around 200 ms. These calls are independent of each other, so I want to launch one thread per call and collect the results at the end, maximizing parallelism: 5 network calls in parallel take about 200 ms, whereas 5 in sequence take about 1000 ms.
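For concreteness, here is a minimal sketch of the fan-out/collect pattern I mean; the pool size and the simulated backend call are placeholders, not a decision:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FanOutSketch {
    // Placeholder shared pool; how to size and scope it is exactly my question.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(100);

    // Stand-in for one ~200 ms backend call.
    static String callBackend(String name) {
        try {
            TimeUnit.MILLISECONDS.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name + "-result";
    }

    public static void main(String[] args) {
        // Fan out: each call runs on its own pooled thread...
        CompletableFuture<String> a = CompletableFuture.supplyAsync(() -> callBackend("profile"), POOL);
        CompletableFuture<String> b = CompletableFuture.supplyAsync(() -> callBackend("orders"), POOL);
        CompletableFuture<String> c = CompletableFuture.supplyAsync(() -> callBackend("recommendations"), POOL);
        // ...then collect: the total wait is roughly the slowest call (~200 ms), not the sum.
        System.out.println(a.join() + " " + b.join() + " " + c.join());
        POOL.shutdown();
    }
}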
However, I am unsure of the best way to design the thread management strategy.
Should I have one thread pool with, say, 1000 threads (an arbitrary number for the example), with each controller drawing from that pool?
Should I have no thread pool at all and create new threads in each controller as I need them? This option seems "dumb", but I wonder: how much does creating a thread cost in CPU cycles, compared to waiting for a network response? Presumably very little.
Should I have one thread pool per controller? (That would mean dozens of thread pools, each with around 5 or 6 threads for its specific controller.)
I'm seeking the pros and cons of each strategy, best practices, or an alternative strategy I haven't considered.
I am creating an application in which I have to download thousands of images (~1 MB each) using Java.
I take a list of album URLs in my REST request; each album contains a number of images.
So my request looks something like:
[
"www.abc.xyz/album1",
"www.abc.xyz/album2",
"www.abc.xyz/album3",
"www.abc.xyz/album4",
"www.abc.xyz/album5"
]
Suppose each of these albums has 1000 images; then I need to download 50000 images in parallel.
Right now I have implemented this using parallelStream(), but I feel it can be optimized further.
There are two principal classes, AlbumDownloader and ImageDownloader (both Spring components).
So the main application creates a parallelStream() on the list of albums:
albumData.parallelStream().forEach(ad -> albumDownloader.downloadAlbum(ad));
And a parallelStream() inside AlbumDownloader -> downloadAlbum() method as well:
List<Boolean> downloadStatus = albumData.getImageDownloadData().parallelStream().map(idd -> imageDownloader.downloadImage(idd)).collect(Collectors.toList());
I am thinking about using CompletableFuture with an ExecutorService, but I am not sure what pool size I should use.
Should I create a separate pool for each Album?
ExecutorService executor = Executors.newFixedThreadPool(Math.min(albumData.getImageDownloadData().size(), 1000));
That would create 5 different pools of 1000 threads each, i.e. about 5000 threads, which might degrade performance instead of improving it.
Could you please give me some ideas to make it very, very fast?
I am using Apache Commons IO FileUtils to download the files, by the way, and I have a machine with 12 available CPU cores.
Suppose each of these albums has 1000 images; then I need to download 50000 images in parallel.
It's wrong to think of your application as doing 50000 things in parallel. What you are really trying to do is optimize your throughput: download all of the images in the shortest amount of time.
You should try one fixed-size thread pool and then experiment with the number of threads in the pool until you maximize your throughput; maybe start with double the number of processors. If your application is mostly waiting on the network or on the server, then you may be able to increase the number of threads in the pool, but you wouldn't want to overload the server so that it slows to a crawl, and you wouldn't want to thrash your own application with a huge number of threads.
That would create 5 different pools of 1000 threads each, i.e. about 5000 threads, which might degrade performance instead of improving it.
I see no point in multiple pools unless there are different servers for each album or some other reason why the downloads from each album are different.
The only way to make it "very very fast" is to get a "very very fast" network connection to the server; e.g. co-locate your client with the server that you are downloading from.
Your download speeds are going to be constrained by a number of potential bottlenecks. These include:
The performance of the server; i.e. how fast it can assemble the data to send to you and push it through its network interface.
Per-user request limits imposed by the service.
The end-to-end performance of the network path between your client and the server.
The performance of the machine you are running on in terms of moving data from the network and putting it (I guess) onto your local disk.
The bottleneck could be any of these, or a combination of them.
Throwing thousands of threads at the problem is unlikely to improve things. Indeed, if anything, it is likely to make performance worse. For example:
it could congest your network link, or
it could trigger anti-hogging or anti-DOS defenses in the server you are fetching from.
A better (simple) idea would be to use an ExecutorService with a small bounded worker pool, and submit the downloads to the pool as tasks.
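As a rough illustration (not a definitive implementation), here is what that could look like with one small fixed pool and the Commons IO call the question already uses; the flattened URL list, the pool size, and the target directory are all assumptions to tune:

import java.io.File;
import java.net.URL;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.commons.io.FileUtils;

public class DownloadPoolSketch {
    public static void main(String[] args) throws InterruptedException {
        // Assumption: all image URLs from all albums, flattened into one list.
        List<String> imageUrls = List.of("https://www.abc.xyz/album1/img1.jpg");
        File targetDir = new File("downloads");

        // One small, bounded pool shared by every download task;
        // double the processor count is just a starting point to tune.
        int poolSize = Runtime.getRuntime().availableProcessors() * 2;
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);

        for (String url : imageUrls) {
            pool.submit(() -> {
                try {
                    URL source = new URL(url);
                    File dest = new File(targetDir, new File(source.getPath()).getName());
                    FileUtils.copyURLToFile(source, dest); // same Commons IO call as the question
                } catch (Exception e) {
                    e.printStackTrace(); // a real implementation would record the failure
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}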
Other things:
Try to keep HTTP / HTTPS connections open between downloads from the same server. Some client libraries will do this kind of thing for you (one way to do this is sketched after this list).
If you have to download from a number of different servers, try to balance the load across the servers. Consider implementing per-server queues and trying to balance work so that individual servers don't see "bursts" of activity.
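On connection reuse: if you are on Java 11+, a single shared java.net.http.HttpClient instance pools and reuses connections to the same server by default. A minimal sketch; the helper method and its parameters are illustrative:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class KeepAliveSketch {
    // One shared client: it keeps connections alive and reuses them across
    // requests to the same host, avoiding repeated TCP/TLS handshakes.
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    static void download(String url, Path target) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        CLIENT.send(request, HttpResponse.BodyHandlers.ofFile(target));
    }
}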
I would also advise you to make sure that you have permission to do what you are doing. Companies in the music publishing business have good lawyers. They could make your life unpleasant¹ if they perceive you to be violating their terms and conditions or stealing their intellectual property.
¹ Like blocking your IP address or issuing take-down requests to your service provider.
An Angular 4 application sends a list of records to a Java Spring MVC application deployed in a WebSphere 8 servlet container. The list is inserted into a temp table. After the batch insert, a stored-procedure call is made to do some calculations and return results. Depending on the size of the inserted list, this may take anywhere from 3000 ms (N ~ 500) to 6000 ms (N ~ 1000) to 50,000+ ms (N > 2000).
My approach would be to create chunks of the data and send them to the database for processing simultaneously. After the threads (Futures) return their results, I would aggregate them and return them to the client. In short, I would split one synchronous call into multiple asynchronous processes executed simultaneously, and reply to the client over the same thread that initiated the HTTP call and landed in my controller.
Everything would be fine, and I would not be asking this question, if a more experienced colleague of mine did not strongly disagree with this approach. His reasoning is that this approach is prone to exceptions due to thread interrupts / timeouts / semaphores and so on. He goes as far as saying that multithreading should be avoided within a web container, because it can crash the servlet container if it runs out of threads.
He proposes instead that the browser send multiple AJAX requests and aggregate/present the data in chunks.
Can you please help me understand which approach is better and why?
I would say that your approach is much better.
Threads created by application logic aren't application-container threads; they are limited only by the operating system. Each AJAX request, on the other hand, consumes a thread from the application container. So the second approach reduces throughput and increases the chance of hitting the container's thread limit, while the first one does not. Performance should also be considered: it's much cheaper to create a thread than to send a request over the network, and each network request uses additional resources for authentication/authorization/encryption and so on.
It's definitely harder to write correct multithreaded code, and it is easily prone to errors. However, that shouldn't stop you, because concurrency can significantly increase your performance. It's pretty straightforward to handle interrupts and timeouts using Future, and you certainly don't need semaphores here.
Exposing this logic to the client looks like a breach of encapsulation. Imagine using a REST API that forces you to split your data into chunks and send multiple requests: what chunk size should I use? How do I deal with timeouts/interrupts? How many requests should I send? You face almost the same challenges in both approaches, but it's much easier to deal with them using libraries designed for exactly this, like ExecutorService and Future.
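To make that concrete, here is one possible sketch of the server-side version; the chunk size, pool size, timeout, and the processChunk stand-in are all invented for illustration:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ChunkedProcessingSketch {
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4);

    // Hypothetical: insert one chunk into the temp table, run the
    // procedure, and return its partial result.
    static List<String> processChunk(List<String> chunk) {
        return chunk; // stand-in for the real DB work
    }

    static List<String> process(List<String> records, int chunkSize) throws Exception {
        List<Future<List<String>>> futures = new ArrayList<>();
        for (int i = 0; i < records.size(); i += chunkSize) {
            List<String> chunk = records.subList(i, Math.min(i + chunkSize, records.size()));
            futures.add(POOL.submit(() -> processChunk(chunk)));
        }
        // Aggregate on the request thread; get(timeout) covers the
        // timeout/interrupt concerns without any semaphores.
        List<String> results = new ArrayList<>();
        for (Future<List<String>> f : futures) {
            results.addAll(f.get(30, TimeUnit.SECONDS));
        }
        return results; // returned to the client over the original HTTP thread
    }
}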
I have a List<Object> objectsToProcess. Let's say it contains 1,000,000 items. You then process each item in the list like this:
for (Object object : objectsToProcess) {
    // go to the database and retrieve data
    // process the data
    // save the data
}
My question is: would multithreading improve performance? I would have thought that multiple threads are allocated by the processor by default anyway?
In the described scenario, given that the processing step is time-consuming and that the CPU has more than one core, multithreading will indeed improve performance.
The processor is not the one that allocates threads. The processor provides the resources (execution units / execution contexts, i.e. virtual CPUs) that threads run on. Programs need to create multiple threads themselves in order to utilize multiple CPU cores at the same time.
The two major reasons for multi-threading are:
Making use of multiple CPU cores which would otherwise be unused or at least not contribute to reducing the time it takes to solve a given problem - if the problem can be divided into subproblems which can be processed independently of each other (parallelization possible).
Making the program act and react on multiple things at the same time (e.g. an event dispatch thread vs. a SwingWorker).
There are programming languages and execution environments in which threads are created automatically to process parallelizable problems. Java is not (yet) one of them, but since Java 8 it is well on the way there, and Java 9 may bring even more.
Usually you do not want significantly more threads than the CPU provides cores, for the simple reason that thread switching and thread synchronization are overhead that slows things down.
The package java.util.concurrent provides many classes that help with typical multithreading problems. What you want is an ExecutorService, to which you assign the tasks that should be run and completed in parallel. The class Executors provides factory methods for creating popular types of ExecutorService. If your problem just needs to be solved in parallel, you might go for Executors.newCachedThreadPool(); if you want parallelism bounded by the number of cores, you might go for Executors.newWorkStealingPool().
Your code thus could look like this:
final ExecutorService service = Executors.newWorkStealingPool();
for (final Object object : objectsToProcess) {
    service.submit(() -> {
        // go to the database and retrieve data
        // process the data
        // save the data
    });
}
service.shutdown(); // then await termination if you need to wait for completion
Please note that the sequence in which the objects would be processed is no longer guaranteed if you go for this approach of multithreading.
If your objectsToProcess are something that can provide a parallel stream, you could also do this:
objectsToProcess.parallelStream().forEach(object -> {
    // go to the database and retrieve data
    // process the data
    // save the data
});
This leaves the decisions about how to handle the threads to the JVM, which will often be better than implementing the multithreading ourselves.
Further reading:
http://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html#executing_streams_in_parallel
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/package-summary.html
Depends on where the time is spent.
If you have a load of calculations to do, then allocating work to more threads can help, since each thread may execute on a separate CPU. In such a situation there is no value in having more threads than CPUs. As Corbin says, you have to figure out how to split the work across the threads, and take responsibility for starting the threads, waiting for completion, and aggregating the results.
If, as in your case, you are waiting for a database, then there can be additional value in using threads. A database can serve several requests in parallel (the database server itself is multi-threaded), so instead of coding
for (Object object : objectsToProcess) {
    // go to the database and retrieve data
    // process the data
    // save the data
}
where you wait for each response before issuing the next request, you want several worker threads, each performing:
// go to the database and retrieve data
// process the data
// save the data
Then you get better throughput. The trick, though, is not to have too many worker threads. There are several reasons for that:
Each thread uses some resources: it has its own stack and its own connection to the database. You would not want 10,000 such threads.
Each request uses resources on the server: each connection uses memory, and a database server will only serve so many requests in parallel. There is no benefit in submitting thousands of simultaneous requests if the server can only serve tens of them in parallel. Also, if the database is shared, you probably don't want to saturate it with your requests; you need to be a "good citizen".
Net: you will almost certainly benefit from having a number of worker threads. The number that helps will be determined by factors such as the number of CPUs you have and the ratio between the amount of processing you do and the response time from the DB. You can only really determine that by experiment, so make the number of threads configurable and investigate: start with, say, 5, then 10, and keep an eye on the load on the DB as you increase the number of threads (see the sketch below).
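A small sketch of making that experiment configurable; the property name and default are made up:

// hypothetical property: run with -Dworker.threads=10; default to 5 as suggested above
int workers = Integer.getInteger("worker.threads", 5);
ExecutorService pool = Executors.newFixedThreadPool(workers);
// submit the retrieve/process/save tasks here, measure throughput and DB load,
// then adjust worker.threads and try again
pool.shutdown();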
I've searched the site a bit for help understanding this, but haven't found anything super clear, so I thought I'd post my use case and see if anybody could shed some light.
I have a question about the scaling of JVM threads vs. OS threads when used in Akka for IO operations. From the Akka site:
Akka supports dispatchers for both event-driven lightweight threads, allowing creation of millions threads on a single workstation, and thread-based Actors, where each dispatcher is bound to a dedicated OS thread.
The event-based Actors currently consume ~600 bytes per Actor which means that you can create more than 6.5 million Actors on 4 G RAM.
In this context, can you help me understand how that matters on a workstation with only one processor (for simplicity)? For my example use case, I want to take a list of, say, 1000 users and then query a database (or several) for various information about each user. So if I were to dispatch each of these 'get' tasks to an actor, and that actor is going to do IO, wouldn't that actor block based on the OS thread limit for the workstation?
How does the Akka actor model give me lift in a scenario like this? I know I am probably missing something, as I am not wildly knowledgeable about the inner workings of VM threads vs. OS threads, so if one of the smart folks here could spell it out for me, that would be great.
If I use Futures, don't I need to use await() or get() to block and wait for the reply?
In my use case, regardless of actors, would it end up just 'feeling' like I'm making 1000 sequential database requests?
If code snippets would help me understand this, Java would be preferred, as I am still coming up to speed on Scala syntax; but a nice clear textual explanation of how these millions of threads can interoperate on a single-processor machine while doing database IO would be fine too.
It is really hard to figure out what you are actually asking here, but here are some pointers:
If you are running on a modern JVM, there is typically a one-to-one relationship between Java threads and OS threads. (IIRC, Solaris allows you to do this differently ... but that's the exception.)
The amount of real parallelism you will get using threads, or anything built on top of threads is limited by the number of processors / cores that are available to the application. Beyond that, you will find that not all threads are actually executing at any given instant.
If you have 1000 Actors all trying to access the database "at the same time", then most of them will actually be waiting on the database itself, or on the thread scheduler. Whether this amounts to making 1000 sequential requests (i.e. strict serialization) will depend on the database and the queries / updates that the actors are doing.
The bottom line is that a computer system has hard limits on the resources available for doing stuff; e.g. number of processors, speed of processors, memory bandwidth, disc access times, network bandwidth, etc. You can design an application to be smart about the way it uses available resources, but you can't get it to use more resources than there actually are.
On reading the text that you quoted, it seems to me that it is talking about two different kinds of actors:
Thread-based actors have a one-to-one relationship with threads. There's no way you could have millions of this kind of actor in 4 GB of memory.
Event-based actors work differently. Instead of having a thread at all times, they mostly sit in a queue waiting for an event to happen. When one happens, an event-processing thread grabs the actor from the queue and executes the "action" associated with the event. When the action finishes, the thread moves on to another actor/event pair.
The quoted text is saying that the memory overhead of an event-based actor is ~600 bytes. That figure doesn't include an event thread, because the event thread is shared by many actors.
Now, I'm not an expert on Scala / actors, but it is pretty obvious that there are certain things you should avoid when using event-based actors. For instance, you should probably avoid talking directly to an external database, because that is liable to block the event-processing thread.
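The usual way around that, sketched here in plain Java rather than the Akka API (so the names are illustrative, not Akka's), is to hand blocking calls to a dedicated pool so the event-processing threads never block:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OffloadSketch {
    // Dedicated pool for blocking calls (e.g. JDBC); the event threads never touch it.
    private static final ExecutorService BLOCKING_POOL = Executors.newFixedThreadPool(20);

    static String blockingDbQuery(String userId) {
        return "row-for-" + userId; // stand-in for a blocking JDBC call
    }

    static CompletableFuture<String> fetchUser(String userId) {
        // The caller (an event/actor thread) gets a future immediately;
        // the blocking query runs on the dedicated pool.
        return CompletableFuture.supplyAsync(() -> blockingDbQuery(userId), BLOCKING_POOL);
    }
}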
I think there may be a typo there. I think they meant to say:
Akka supports dispatchers for both event-driven lightweight actors, allowing creation of millions of actors on a single workstation, and thread-based Actors, where each actor is bound to a dedicated OS thread.
The event-driven actors use a thread pool - all of the (potentially millions of) actors share the same pool of threads. I'm not that familiar with Akka actors but generally you would not want to do blocking I/O with event-driven actors, otherwise you could cause starvation.
I am troubled by the following concept:
Most books/docs describe how robust servers are multithreaded, and that the most common approach is to start a new thread to serve each new client, i.e. a thread is dedicated to each new connection. But how is this actually implemented in big systems? If we have a server that accepts requests from 100,000 clients, has it started 100,000 threads? Is that realistic? Aren't there limits on how many threads can run on a server? Additionally, doesn't the overhead of context switching and synchronization degrade performance? Is it implemented as a mix of queues and threads? If so, is the number of queues fixed? Can anybody enlighten me on this, and perhaps give me a good reference that describes these designs?
Thanks!
The common method is to use thread pools. A thread pool is a collection of already created threads. When a new request gets to the server it is assigned a spare thread from the pool. When the request is handled, the thread is returned to the pool.
The number of threads in a pool is configured depending on the characteristics of the application. For example, if you have an application that is CPU-bound, you will not want too many threads, since context switches would decrease performance. On the other hand, if you have a DB- or IO-bound application, you want more threads, since much of their time is spent waiting; more threads will then utilize the CPU better.
Google "thread pools" and you will for sure find much to read about the concept.
Also read up on the SEDA pattern.
In addition to the answers above, I should note that really high-performance servers with many incoming connections try not to spawn a thread per connection, but instead use IO completion ports, select(), and other asynchronous techniques for working with multiple sockets in one thread. And of course special attention must be paid to ensure that problems with one request or one socket won't block the other sockets on the same thread.
Thread management also consumes CPU time, so threads should not be spawned for each connection or each client request.
In most systems a thread pool is used. This is a pool of available threads that wait for incoming requests. The number of threads can grow to a configured maximum number, depending on the number of simultaneous requests that come in and the characteristics of the application.
When a request arrives, an unoccupied thread is taken from the pool and dedicated to handling the request until it finishes. When that happens, the thread is returned to the thread pool to handle another request.
Since there is only a limited number of threads, in most server systems one should attempt to make the lifetime of requests as short as possible. The less time a request needs to execute, the sooner a thread can be reused for a new request.
If requests come in while all threads are occupied, most servers implement a queueing mechanism for requests. Of course the size of the queue is also limited, so when more requests arrive than can be queued, new requests will be denied.
One other reason for having a thread pool instead of starting a thread per request is that starting a new thread is an expensive operation. It's better to start a number of threads beforehand and reuse them than to start new threads all the time.
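Putting those pieces together, here is a minimal sketch of the pattern this answer describes: a fixed pool, a bounded queue, and rejection when both are full. The port and all the sizes are invented for illustration:

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PooledServerSketch {
    public static void main(String[] args) throws IOException {
        // 50 worker threads and a queue of 100 waiting requests; beyond that,
        // new requests are rejected (AbortPolicy is the default).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                50, 50, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100));

        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                Socket client = server.accept();
                // Hand the connection to a pooled thread; the thread goes
                // back to the pool when handle() returns.
                pool.execute(() -> handle(client));
            }
        }
    }

    static void handle(Socket client) {
        try (client) {
            // ... read the request, write a response ...
        } catch (IOException e) {
            // a real server would log this
        }
    }
}

When the queue is full and all 50 workers are busy, execute() throws RejectedExecutionException, which is the "new requests will be denied" behaviour described above.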
To get network servers to handle lots of concurrent connections there are several approaches (mostly divided up in "one thread per connection" and "several connections per thread" categories), take a look at the C10K page, which is a great resource on this topic, discussing and comparing a lot of approaches and linking to further resources on them.
Creating 10k threads is not likely to be efficient in most cases, but it can be done and would work.
If you needed to serve 10k clients at once, doing so on a single machine would be difficult but possible.
Depending on the client side implementation, it may be that the 10,000 clients do not need to maintain an open TCP connection - depending on the purpose, the protocol design can greatly improve the efficiency of implementation.
I think the appropriate solution for high scale systems is probably extremely domain-specific, and if you wanted a suggestion you'd have to explain more about your problem domain.