How to determine max number of threads available in Java-VM? - java

I want to determine the max number of threads I am able to create for my sort algorithm. I want to use java.lang.Runtime for that.
I want to count the current number of threads and stop creating new threads when the limit is reached.

The max number of threads for a JVM is generally somewhere in the thousands. If you're using multiple threads to optimize a computational algorithm, you don't really want more than the number of processors in the system it's run on. Use Runtime.getRuntime().availableProcessors() to find out.
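A minimal sketch of that check, assuming one worker per available processor is the cap you want. The class and method names (ThreadBudget, maySpawn) are made up for illustration; availableProcessors() and the ThreadMXBean thread count are standard JDK calls.

```java
import java.lang.management.ManagementFactory;

class ThreadBudget {
    // Cap suggested above: one worker per available (logical) processor.
    static int maxWorkers() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Number of live threads in this JVM, daemon and non-daemon.
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // Hypothetical policy: stop spawning once we have started maxWorkers() workers.
    static boolean maySpawn(int workersStarted) {
        return workersStarted < maxWorkers();
    }

    public static void main(String[] args) {
        System.out.println("available processors: " + maxWorkers());
        System.out.println("live JVM threads:     " + liveThreads());
    }
}
```

Note that liveThreads() also counts JVM housekeeping threads, so it is better to count the workers you started yourself than to compare the JVM-wide total against the cap.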

I'd suggest you look at the problem from a slightly different angle...
What I mean is that if you're asking
"what's the maximum number of threads I can create and use for task x?"
and you're asking that because you think the more threads you use the better, you could instead ask
"how many threads would I need to complete x most efficiently?"
It's very unlikely that task x will perform better with thousands of threads on a dual-core machine, for example. As a general guide, for CPU-bound tasks (which a sort algorithm probably is), the optimal number of threads is
threads = number of CPUs + 1
see How to find out the optimal amount of threads?
The actual maximum number of threads depends on the OS and JVM, along with how much memory is configured for use within the JVM. Creating a new thread takes some OS (native) memory but not Java heap memory. This means that if you create thousands of threads, you might run out of memory in a way you can't fix by raising -Xmx. In fact, the less Java heap memory you allocate, the more native memory is available and so the more threads you can create.
See this article and the comment to get a feel for how you can work out the max with pen and paper.
Having said all that, if you want to use an unlimited number of threads in your application, you can use Executors.newCachedThreadPool(), which will create a pool of threads, creating as many as are needed. I don't recommend using this type of pool for everyday usage, but it's related to your original question.
Hope that helps.
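For contrast with the cached pool, here is a rough sketch of the "number of CPUs + 1" guideline using a fixed-size pool. The workload (summing a range of integers) and the class name are stand-ins chosen for illustration, not anything from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BoundedPoolDemo {
    // Sums 0..n-1 by splitting the range across a pool of (CPUs + 1) threads.
    static long parallelSum(int n) {
        int threads = Runtime.getRuntime().availableProcessors() + 1;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (n + threads - 1) / threads; // ceiling division
            for (int start = 0; start < n; start += chunk) {
                final int lo = start;
                final int hi = Math.min(n, start + chunk);
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = lo; i < hi; i++) s += i;
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get(); // wait for each chunk
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // the pool never grows past `threads`
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1000));
    }
}
```

However many chunks are submitted, at most CPUs + 1 threads exist; the rest of the work waits in the pool's queue.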

You can see how many threads you have by attaching VisualVM to your process. I recommend version 1.3.2 with all the plugins downloaded.
I'd also recommend using Spring and its Executor pools. It's a great way to be able to configure the size of the thread pool.

Related

Scheduling memory-bound tasks in java

Suppose I have a large batch of memory-bound tasks that are quite independent of one another. To make things concrete, let's say I can allocate 30GB for the heap and that each task requires on average about 3GB of memory at its peak, but with some variability both over time and from task to task. A few tasks here and there might even require 6GB.
In this case, it seems more efficient to try to run 10 (or arguably even more) tasks concurrently, and if / when we bump into the memory limit have the task wait, much the same as we do with other shared resources like I/O, specific memory addresses (which are accessed through locks), etc.
Is it possible to do this in Java? More generally:
What's the best way to handle memory-bound task scheduling in Java?
Some Related Questions and "Close Misses"
This question asks whether it's possible to have threads in Java wait for memory instead of throwing an OOM exception, but the answers seem to focus on why this is a bad idea to begin with, perhaps because the question suggests the number of threads is unreasonable. Also, I guess treating all memory requests as equal can lead to deadlocks. So I want to emphasize that here we are talking about only 10 or so tasks, and the desire to "max out" the memory usage seems like a very natural one. I do not mind wrapping my tasks in some suitable logic that will distinguish their memory requests as having lower priority. I can even accept a solution where I need to identify the class whose instances are filling up the memory and maybe add some suitable counter, but I'd prefer a platform-independent solution that works "out of the box", if there is one.
This question also asks about scheduling memory-bound tasks but seems to presuppose a specific solution framework.
The problem is that within a single JVM you have very little control over how much memory a single thread is going to use, unless you make use of off-heap memory (e.g. using Unsafe or direct memory, as AnatolyG already mentioned). If you have huge array allocations, you could also control these. But we need to know more about the data structures that consume the most memory.
But if you have arbitrary object graphs you don't have much control over, perhaps it is smarter to model the problem using multiple processes. You have 1 intake controller process and then a bunch of worker processes. And for each process you can configure the maximum amount of heap its JVM is allowed to use.
Bumping into memory limits at the OS level can be a huge PITA because it could lead to swapping, and this will make all the threads in the system slow. Or even worse, it could trigger the OOM killer. Make sure you set vm.swappiness to a very low value to prevent premature swapping.
Do you know up front how much memory a task is going to consume? If so, then you could keep track of the total amount of memory being consumed in the system and not admit new tasks until running tasks have completed.
If you don't know the memory limits up front, then you could assume each task will use the maximum, but this can lead to under-utilization of memory.
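The "keep track and hold back new tasks" idea can be sketched with a counting semaphore, assuming each task can state a memory estimate up front. Everything here (MemoryBudget, the 1-permit-per-GB unit) is a hypothetical illustration; it is pure bookkeeping and the JVM does not enforce it, so a task that exceeds its estimate can still cause an OutOfMemoryError.

```java
import java.util.concurrent.Semaphore;

class MemoryBudget {
    private final int totalGb;
    private final Semaphore permits; // assumed unit: one permit == 1 GB of budgeted heap

    MemoryBudget(int totalGb) {
        this.totalGb = totalGb;
        this.permits = new Semaphore(totalGb, true); // fair, so large requests are not starved
    }

    // Blocks until estimatedGb of budget is free, runs the task, then returns the budget.
    void run(int estimatedGb, Runnable task) {
        if (estimatedGb > totalGb) {
            throw new IllegalArgumentException("task can never fit in the budget");
        }
        permits.acquireUninterruptibly(estimatedGb);
        try {
            task.run();
        } finally {
            permits.release(estimatedGb);
        }
    }

    // Budget not currently claimed by running tasks.
    int freeGb() {
        return permits.availablePermits();
    }
}
```

With a 30 GB budget, worker threads would call something like budget.run(3, task) for typical tasks and budget.run(6, task) for the large ones; callers simply wait when the budget is exhausted instead of starting and failing.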

Java: I want to adjust the size of a thread pool. How can I detect either CMS collection starts, or other system-wide factors affecting available CPU?

(The specifics for this question are for a mod for Minecraft. In general, the question deals with resizing a threadpool based on system load and CPU availability).
I am coming from an Objective C background, and Apple's libdispatch (Grand Central Dispatch) for thread scheduling.
The immediate concern I have is trying to reduce the size of the threadpool when a CMS tenured collection is running. The program in question (Minecraft) only works well with CMS collections. A much less immediate concern, but still of interest, is reducing the threadpool size when other programs are demanding significant CPU (specifically, either a screen recorder or a Twitch stream).
In Java, I have just found out about (deep breath):
Executors, which provide access to thread pools (both fixed size, and adjustable size), with cached thread existence (to avoid the overhead of constantly re-creating new threads, or to avoid the worry of coding threads to pause and resume based on workload),
Executor (no s), which is the generic interface for saying "Now it is time to execute this runnable()",
ExecutorService, which manages the threadpools according to Executor,
ThreadPoolExecutor, which is what actually manages the thread pool, and has the ability to say "This is the maximum number of threads to use".
Under normal operation, about 5 times a second, there will be 50 high priority, and 400 low priority operations submitted to the thread pool per user on the server. This is for high-powered machines.
What I want to do is:
Work with less-powerful machines. So, if a computer only has 2 cores, and the main program (two primary threads, plus some minor assistant threads) is already maxing out the CPU, these background tasks will be competing with the main program and the garbage collector. In this case, I don't want to reduce the number of background threads (it will probably stay at 2), but I do want to reduce how much work I schedule. So this is just "How do I detect when the work-load is going up". I suspect that this is just a case of watching the size of the work queue I get when I use Executors.newCachedThreadPool().
But the first problem: I can't find anything to return the size of the work queue! ThreadPoolExecutor() can return the queue, and I can ask that for a size, but newCachedThreadPool() only returns an ExecutorService, which doesn't let me query for size (or rather, I don't see how to).
If I have "enough cores", I want to tell the pool to use more threads. Ideally, enough to keep CPU usage near max. Most of the tasks that I want to run are CPU bound (disk I/O will be the exception, not the rule; concurrency blocking will also be rare). But I don't want to heavily over-schedule threads. How do I determine "enough threads" without going way over the available cores?
If, for example, screen recording (or streaming) activates, CPU core usage by other programs will go up, and then I want to reduce the number of threads; as the number of threads goes down, and queue backlog goes up, I can reduce the number of tasks I add to the queue. But I have no idea how to detect this.
I think that the best advice I / we can give is to not try to "micro-manage" the number of threads in the thread pools. Set it to a sensible size that is proportional to the number of physical cores ... and leave it. By all means provide some static tuning parameters (e.g. in config files), but don't try to make the system tune itself dynamically. (IMO, the chances that dynamic tuning will work better than static are ... pretty slim.)
For "real-time" streaming, your success is going to depend on the overall load and the scheduler's ability to prioritize more than the number of threads. However it is a fact that standard Java SE on a standard OS is not suited to hard real-time, so your real-time performance is liable to deteriorate badly if you push the envelope.
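On the work-queue question: one way to keep access to the queue is to construct the ThreadPoolExecutor yourself rather than going through Executors, so you hold the concrete type and can call getQueue().size(). The sketch below assumes a fixed-size pool with an unbounded LinkedBlockingQueue; the class and method names are invented for illustration. (In OpenJDK a cached pool hands tasks off through a SynchronousQueue, so its queue size would always read 0 anyway.)

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class QueueBacklog {
    // Fixed-size pool built directly, so the caller keeps the ThreadPoolExecutor type.
    static ThreadPoolExecutor newPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    // Tasks accepted but not yet picked up by a worker.
    static int backlog(ThreadPoolExecutor pool) {
        return pool.getQueue().size();
    }

    // Demo: the single worker is held busy, so the other three tasks pile up in the queue.
    static int demo() {
        ThreadPoolExecutor pool = newPool(1);
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocked = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < 4; i++) pool.execute(blocked);
        int seen = backlog(pool); // first task went straight to the worker
        release.countDown();
        pool.shutdown();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("queued tasks: " + demo());
    }
}
```

A growing backlog is a cheap signal to submit less low-priority work. ThreadPoolExecutor also has setCorePoolSize and setMaximumPoolSize if a static tuning parameter needs to be applied after startup.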

java multithreading performance scaling

can you explain this nonsense to me?
i have a method that basically fills up an array with mathematical operations. there's no I/O involved or anything. now, this method takes about 50 seconds to run, and the code is perfectly scalable (theoretically 100%), so i split it up into 4 threads, wait for them to complete, and reassemble the 4 arrays. now, i run the program on a quad core processor, expecting it to take about 15 seconds, and it actually takes 58 seconds. that's right: it takes longer! i see the cpu working 100%, and i know that each thread does 1/4 of the calculations, and creating threads and reassembling the arrays take about 1-2 ms in total.
what's causing such loss of performance? what the hell is the cpu doing all that time?
CODE: http://pastebin.com/cFUgiysw
Threads don't work that way.
Threads are still part of the same process (depending on the OS), so in terms of the operating system - CPU time will be scheduled the same for 4 threads in 1 process as it is for 1 thread in 1 process.
Also, with such a small number of values, you won't see the scalability in the midst of the overhead. Re-assembling the arrays in java will be costly.
Check out things like "Context switching overhead" - things like that always mess you up when you try to map theory to practise :P
I would stick to the single-threaded way :)
~ Dan
http://en.wikipedia.org/wiki/Context_switch
A lot depends on what you are doing and how you are dividing the work. There are many possible causes for this problem.
The most likely cause is that you are already using all the bandwidth between your CPU and main memory with one thread. This can happen if your data set is larger than your CPU cache, especially if you have some random access behaviour. You could consider trying to reuse the original array, rather than taking multiple copies, to reduce cache churn.
Your locking overhead is greater than the performance gain. I suspect you have used very coarse locking, so this shouldn't be an issue.
Starting stopping threads takes too long. As your code is multi second, I doubt this too.
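To illustrate the "reuse the original array" suggestion: each thread can write into its own contiguous slice of one shared array, so there is nothing to copy or reassemble afterwards. This is only a sketch; the pastebin code is not reproduced here, and Math.sqrt(i) is a stand-in for the real per-element calculation.

```java
class SliceFill {
    // Fills out[i] = sqrt(i) using `threads` workers (threads >= 1), each writing
    // a disjoint slice of the single shared array.
    static double[] fill(int n, int threads) {
        double[] out = new double[n];
        Thread[] workers = new Thread[threads];
        int chunk = (n + threads - 1) / threads; // ceiling division
        for (int t = 0; t < threads; t++) {
            final int lo = Math.min(n, t * chunk);
            final int hi = Math.min(n, lo + chunk);
            workers[t] = new Thread(() -> {
                for (int i = lo; i < hi; i++) out[i] = Math.sqrt(i);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join() also makes the worker's writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] a = fill(1_000_000, Runtime.getRuntime().availableProcessors());
        System.out.println(a[a.length - 1]);
    }
}
```

Because the slices are disjoint, no locking is needed. Whether this actually speeds things up still depends on memory bandwidth, as described above.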
There is a cost associated with starting new threads. I don't think it should be as much as 8 seconds, but it depends on how the threads are used. Some threads need to create a copy of the data they handle in order to be thread safe, and that can take some time. This cost is commonly referred to as overhead. If part of the work cannot run in parallel, for instance because it reads the same file or needs access to a shared resource, the threads may have to wait on each other; under suboptimal conditions this can take more time than serial execution. My tip is to look for these non-parallelizable sections and move them out of the threaded part if possible. Also try using fewer threads: 4 threads for 4 CPUs is not always optimal.
Hope it helps.
Unless you are constantly creating and killing threads, the thread overhead shouldn't be a problem. Four threads running simultaneously is no big deal for the scheduler.
As Peter Lawrey suggested the memory bandwidth could be the problem. Your 50-second code is running on a java engine and they both compete for the available memory bandwidth. The java engine needs memory bandwidth to execute your code and your code needs it to do its calculations.
You write "perfectly scalable" which would be the case if your code was compiled. Since it runs on a java engine this is not the case. So the 16% increase in overall time could be seen as the difference between the smoothness of one thread vs the chaos of four colliding over memory accesses.

Java Threads memory explosion

I am fairly new with concurrent programming and I am learning it.
I am implementing a quick sort in Java JDK 7 (Fork Join API) to sort a list of objects (100K).
While using this recursive piece of code without concurrency, I observe no memory explosion; everything is fine.
I just added the code to use it on multiple cores (by extending the class RecursiveAction) and then the memory usage jumped very high until it reached its limits. By doing some profiling I observe a high creation rate of threads, which I think is to be expected.
But is a Java Thread by itself much more memory demanding, or am I missing something here?
Quicksort must require a lot of threads, but not many more than regular objects.
Should I stop creating RecursiveAction threads when I reach a threshold and then just switch to a sequential piece of code (no more threads)?
Thank you very much.
Java threads usually take 256 KB or 512 KB of stack space alone by default (depending on OS, JDK version, etc.).
You're wasting huge resources and speed if you're running more threads than you have processors/cores for a CPU intensive process such as doing quicksort, so try to not run more threads than you have cores.
Yes, switching over to sequential code is a good idea when the unit of work is in the region of ca. 10,000-100,000 operations. This is just a rule of thumb. So, for quick sort, I'd drop out to sequential execution when the size to be sorted is less than say 10-20,000 elements, depending upon the complexity of the comparison operation.
What's the size of the ForkJoinPool? Usually it's set to create the same number of threads as processors, so you shouldn't be seeing too many threads. If you've manually set the parallelism to be high (say, in the hundreds or thousands) then you will see high (virtual) memory use, since each thread allocates space for the stack (256K by default on 32-bit Windows and Linux).
As a rule of thumb for a CPU bound computation, once your number of threads exceeds the number of available cores, adding more threads is not going to speed things up. In fact, it will probably slow you down due to the overheads of creating the threads, the resources tied down by each thread (e.g. the thread stacks), and the cost of synchronizing.
Indeed, even if you had an infinite number of cores, it would not be worth creating threads to do small tasks. Even with thread pools and other clever tricks, if the amount of work to be done in a task is too small the overheads of using a thread will exceed any savings. (It is difficult to predict exactly where that threshold is, and it certainly depends on the nature of the task as well as platform-related factors.)
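The threshold advice can be sketched as a RecursiveAction that forks only while the range is large and falls back to a sequential sort below a cutoff. This is an illustration under assumptions, not the asker's code: it sorts an int[] rather than a list of objects, the cutoff of 10,000 is just the rule of thumb quoted above, and it uses a Hoare-style partition.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ParallelQuickSort extends RecursiveAction {
    static final int THRESHOLD = 10_000; // assumed cutoff; tune by measurement

    private final int[] a;
    private final int lo, hi; // sorts a[lo..hi], both inclusive

    ParallelQuickSort(int[] a, int lo, int hi) {
        this.a = a;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo < THRESHOLD) {
            Arrays.sort(a, lo, hi + 1); // small range: sequential, no new tasks
            return;
        }
        int p = partition(a, lo, hi);
        // Tasks are queued on the pool's existing workers; no thread is created per task.
        invokeAll(new ParallelQuickSort(a, lo, p), new ParallelQuickSort(a, p + 1, hi));
    }

    // Hoare partition around the middle element; returns the split index.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo - 1, j = hi + 1;
        while (true) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) return j;
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    }

    static void sort(int[] a) {
        if (a.length < 2) return;
        ForkJoinPool pool = new ForkJoinPool(); // parallelism defaults to available processors
        try {
            pool.invoke(new ParallelQuickSort(a, 0, a.length - 1));
        } finally {
            pool.shutdown();
        }
    }
}
```

With the default pool size, the number of worker threads stays near the number of processors regardless of how many tasks are forked, which should keep stack memory from ballooning.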
I changed my code and so far I have better results. I invoke the main task in the ForkJoinPool, and inside the tasks I don't create more threads if there are already a lot more active threads than available cores in the ForkJoinPool.
I don't synchronize through the join() method. As a result, a parent thread will die as soon as it has created its offspring. In the main function that invoked the root task, I wait for the tasks to be completed, i.e. until there are no more active threads. It seems to work fine, as the memory stays normal and I gained a lot of time over the same piece of code executed sequentially.
I am going to learn more.
Thank you all !

Threads per Processor

In Java, is there a programmatic way to find out how many concurrent threads are supported by a CPU?
Update
To clarify, I'm not trying to hammer the CPU with threads, and I am aware of the Runtime.getRuntime().availableProcessors() function, which provides me with part of the information I'm looking for.
I want to find out if there's a way to automatically tune the size of thread pool so that:
if I'm running on a 1-year old server, I get 2 threads (1 thread per CPU x an arbitrary multiplier of 2)
if I switch to an Intel i7 quad core two years from now (which supports 2 threads per core), I get 16 threads (2 logical threads per CPU x 4 CPUs x the arbitrary multiplier of 2).
if, instead, I use an eight-core UltraSPARC T2 server (which supports 8 threads per core), I get 128 threads (8 threads per CPU x 8 CPUs x the arbitrary multiplier of 2)
if I deploy the same software on a cluster of 30 different machines, potentially purchased at different years, I don't need to read the CPU specs and set configuration options for every single one of them.
Runtime.availableProcessors returns the number of logical processors (i.e. hardware threads) not physical cores. See CR 5048379.
A single non-hyperthreading CPU core can always run one thread. You can spawn lots of threads and the CPU will switch between them.
The best number depends on the task. If it is a task that will take lots of CPU power and not require any I/O (like calculating pi, prime numbers, etc.) then 1 thread per CPU will probably be best. If the task is more I/O bound, like processing information from disk, then you will probably get better performance by having more than one thread per CPU. In this case the disk access can take place while the CPU is processing information from a previous disk read.
I suggest you do some testing of how performance in your situation scales with number of threads per CPU core and decide based on that. Then, when your application runs, it can check availableProcessors() and decide how many threads it should spawn.
Hyperthreading will make a single core appear to the operating system and all applications, including availableProcessors(), as 2 CPUs, so if your application can use hyperthreading you will get the benefit. If not, then performance will suffer slightly, but probably not enough to make the extra effort of catering for it worthwhile.
There is no standard way to get the number of supported threads per CPU core within Java. Your best bet is to get a Java CPUID utility that gives you the processor information, and then match it against a table you'll have to generate that gives you the threads per core that the processor manages without a "real" context switch.
Each processor, or processor core, can do exactly 1 thing at a time. With hyperthreading, things get a little different, but for the most part that still remains true, which is why my HT machine at work almost never goes above 50%, and even when it's at 100%, it's not processing twice as much at once.
You'll probably just have to do some testing on the common architectures you plan to deploy on to determine how many threads you want to run on each CPU. Just using 1 thread may be too slow if you're waiting for a lot of I/O. Running a lot of threads will slow things down, as the processor will have to switch threads more often, which can be quite costly. I'm not sure if there is any hard-coded limit to how many threads you can run, but I guarantee that your app would come to a crawl from too much thread switching before you reached any kind of hard limit. Ultimately, you should just leave it as an option in the configuration file, so that you can easily tune your app to whatever processor you're running it on.
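A small sketch of that configuration approach, assuming the multiplier comes from a system property. The property name app.threadsPerProcessor and the class name are made up; since availableProcessors() already counts hardware threads, a multiplier of 2 on a hyperthreaded quad core gives the 16 threads from the question's example.

```java
class PoolSizing {
    // Hypothetical property name; any config mechanism would do.
    static final String MULTIPLIER_PROPERTY = "app.threadsPerProcessor";

    // Pool size = logical processors x multiplier, never less than 1.
    static int poolSize(int processors, double multiplier) {
        return Math.max(1, (int) Math.round(processors * multiplier));
    }

    // Reads the multiplier from -Dapp.threadsPerProcessor=..., defaulting to 1.0.
    static int configuredPoolSize() {
        double multiplier = Double.parseDouble(System.getProperty(MULTIPLIER_PROPERTY, "1.0"));
        return poolSize(Runtime.getRuntime().availableProcessors(), multiplier);
    }

    public static void main(String[] args) {
        System.out.println("pool size: " + configuredPoolSize());
    }
}
```

The same jar can then run on every machine in the cluster with one shared multiplier, and the per-machine part is supplied by availableProcessors() at startup.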
A CPU does not normally pose a limit on the number of threads, and I don't think Java itself has a limit on the number of native (kernel) threads it will spawn.
There is a method availableProcessors() in the Runtime class. Is that what you're looking for?
Basics:
Application loaded into memory is a process. A process has at least 1 thread. If you want, you can create as many threads as you want in a process (theoretically). So number of threads depends upon you and the algorithms you use.
If you use thread pools, that means thread pool manages the number of threads because creating a thread consumes resources. Thread pools recycle threads. This means many logical threads can run inside one physical thread one after one.
You don't have to consider the number of threads, it's managed by the thread pool algorithms. Thread pools choose different algorithms for servers and desktop machines (OSes).
Edit1:
You can use explicit threads if you think thread pool doesn't use the resources you have. You can manage the number of threads explicitly in that case.
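The recycling behaviour is easy to observe. In this sketch (names invented for illustration) many small tasks are submitted to a fixed pool and the code records which worker threads actually ran them.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ThreadReuse {
    // Runs `tasks` small jobs on a pool of `poolSize` threads and returns how many
    // distinct worker threads executed them.
    static int distinctWorkers(int tasks, int poolSize) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> names.add(Thread.currentThread().getName()));
        }
        pool.shutdown(); // already-submitted tasks still run to completion
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return names.size();
    }

    public static void main(String[] args) {
        System.out.println("100 tasks ran on " + distinctWorkers(100, 2) + " thread(s)");
    }
}
```

However many tasks are submitted, the count never exceeds the pool size, because each worker thread picks up the next task when it finishes the previous one.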
This is a function of the VM, not the CPU. It has to do with the amount of heap consumed per thread. When you run out of space on the heap, you're done. As with other posters, I suspect your app becomes unusable before this point if you exceed the heap space because of thread count.
See this discussion.
