I want to get the process ID of a Thread to see how much memory it takes.
It depends a lot on the OS and how it manages threads. Theoretically it also depends on how the JVM implements threads, but all modern JVMs implement them as native threads.
On Linux each thread will used to get its own process ID, but most tools hide all but one thread per process (i.e. you don't usually see them unless you explicitly ask for them, ps uses the -m flag for example). This is caused by the fact that the Linux kernel doesn't really make much of a difference between threads and tasks.
Edit: as I just learned this is no longer necessarily the case: you can create a thread with the exact same PID as the parent, in which case the threads will be distinguished by different thread IDs.
However since a thread shares its memory with all other threads in the same process, this doesn't help you find out "how much memory a thread takes", since all threads in a process will use the exact same amount (and they all use the same, so the real used memory is shown_memory_use and not shown_memory_user * number_of_threads).
Threads do not have PIDs, processes do. As such what you're asking is not possible. There is also no reliable way to retrieve your PID from within a Java process (although the first part of the value returned by ManagementFactory.getRuntimeMXBean().getName() usually is the PID).
As the name implies, PID means process ID. Each process can spawn multiple threads, which all share the same PID. Are you sure you don't mean Thread ID?
A feature of thread is that is shares the heap with all other threads. This means that any one thread can potentially use almost all the memory of the process. The only thing which a thread doesn't have access to is the stack or local variables of another thread.
As such it is not useful to try to determine how much memory an individual thread uses. Instead it can be useful to determine how much memory a data structure uses. (Although this can have similar difficulties)
It is worth noting that main memory is relatively cheap. Your situation may be different but a typical new server with 24 GB can cost as little as £1K. You can buy a 96 GB PC for around £2K. Sometimes it is not worth worrying about how much memory you are using until you know it is a problem.
Related
Suppose I have a large batch of memory-bound tasks that are quite independent of one another. To make things concrete, let's say I can allocate 30GB for the heap and that each task requires on average about 3GB of memory at its peak, but with some variability both over time and from task to task. A few tasks here and there might even require 6GB.
In this case, it seems more efficient to try to run 10 (or arguably even more) tasks concurrently, and if / when we bump into the memory limit have the task wait, much the same as we do with other shared resources like I/O, specific memory addresses (which are accessed through locks), etc.
Is it possible do this in Java? More generally
What's the best way to handle memory-bound task scheduling in Java?
Some Related Questions and "Close Misses"
This question asks whether it's possible to have threads in java wait for memory instead of throwing an OOM exception, but the answers seem to focus on why this is a bad idea to begin with - perhaps because the question suggests the number of threads is unreasonable. Also, I guess treating all memory requests as equal can lead to deadlocks. So I want to emphasize that here we are talking about only about 10 tasks, and the desire to "max out" the memory usage seems like a very natural one. I do not mind wrapping my tasks by some suitable logic that will distinguish their memory requests as having lower priority. I can even accept a solution where I need to identify the class whose instances are filling up the memory and maybe add some suitable counter - but I'd prefer a platform-independent solution that works "out of the box", if there is one.
This question also also asks about scheduling memory-bound tasks but seems to presuppose a specific solution framework.
The problem is that within a single JVM you have very little control on how much memory a single thread is going to use; unless you make use of offheap (e.g. using Unsafe or direct memory as AnatolyG already mentioned). If you have huge array allocations, you could also control these. But we need to know more about the data-structures that consume the most memory.
But if you have orbitrary object graphs you don't have much control over, perhaps it smarter to model the problem using multiple processes. You have 1 intake controller process and then a bunch of worker processes. And on each process you can configure the maximum amount of heap a JVM is allowed to use.
Bumping into memory limits on OS level can be a huge PITA because it could lead to swapping and this will makes all the threads in a system slow. Or even worse, OOM-killer. Make sure you set the vm.swappiness to a very low value to prevent premature swapping.
Do you know up front how much memory a process is going to consume? If so, then you could keep track of the maximum amount of memory being consumed in the system and don't allow for new tasks in the system before tasks have completed.
If you don't know up front the memory limits, then you could assume each tasks will use the maximum, but this can lead to under-utilization of memory.
Somewhere I have heard about Thread Affinity and Thread Affinity Executor. But I cannot find a proper reference for it at least in java. Can someone please explain to me what is it all about?
There are two issues. First, it’s preferable that threads have an affinity to a certain CPU (core) to make the most of their CPU-local caches. This must be handled by the operating system. This CPU affinity for threads is often also called “thread affinity”. In case of Java, there is no standard API to get control over this. But there are 3rd party libraries, as mentioned by other answers.
Second, in Java there is the observation that in typical programs objects are thread-affine, i.e. typically used by only one thread most of the time. So it’s the task of the JVM’s optimizer to ensure, that objects affine to one thread are placed close to each other in memory to fit into one CPU’s cache but place objects affine to different threads not too close to each other to avoid that they share a cache line as otherwise two CPUs/Cores have to synchronize them too often.
The ideal situation is that a CPU can work on some objects independently to another CPU working on other objects placed in an unrelated memory region.
Practical examples of optimizations considering Thread Affinity of Java objects are
Thread-Local Allocation Buffers (TLABs)
With TLABs, each object starts its lifetime in a memory region dedicated to the thread which created it. According to the main hypothesis behind generational garbage collectors (“the majority of all objects will die young”), most objects will spent their entire lifetime in such a thread local buffer.
Biased Locking
With Biased Locking, JVMs will perform locking operations with the optimistic assumption that the object will be locked by the same thread only, switching to a more expensive locking implementation only when this assumption does not hold.
#Contended
To address the other end, fields which are known to be accessed by multiple threads, HotSpot/OpenJDK has an annotation, currently not part of a public API, to mark them, to direct the JVM to move these data away from the other, potentially unshared data.
Let me try explaining it. With the rise of multicore processors, message passing between threads & thread pooling, scheduling has become more costlier affair. Why this has become much heavier than before, for that we need to understand the concept of "mechanical sympathy". For details you can go through a blog on it. But in crude words, when threads are distributed across different cores of a processor, when they try to exchange messages; cache miss probability is high. Now coming to your specific question, thread affinity being able to assign specific threads to a particular processor/core. Here is one of the library for java that can be used for it.
The Java Thread Affinity version 1.4 library attempts to get the best of both worlds, by allowing you to reserve a logical thread for critical threads, and reserve a whole core for the most performance sensitive threads. Less critical threads will still run with the benefits of hyper threading. e.g. following code snippet
AffinityLock al = AffinityLock.acquireLock();
try {
// find a cpu on a different socket, otherwise a different core.
AffinityLock readerLock = al.acquireLock(DIFFERENT_SOCKET, DIFFERENT_CORE);
new Thread(new SleepRunnable(readerLock, false), "reader").start();
// find a cpu on the same core, or the same socket, or any free cpu.
AffinityLock writerLock = readerLock.acquireLock(SAME_CORE, SAME_SOCKET, ANY);
new Thread(new SleepRunnable(writerLock, false), "writer").start();
Thread.sleep(200);
} finally {
al.release();
}
// allocate a whole core to the engine so it doesn't have to compete for resources.
al = AffinityLock.acquireCore(false);
new Thread(new SleepRunnable(al, true), "engine").start();
Thread.sleep(200);
System.out.println("\nThe assignment of CPUs is\n" + AffinityLock.dumpLocks());
Thread affinity (or process affinity) describes on which processor cores the thread/process is allowed to run. Normally, this setting is equal to the (logical) CPUs in your system, and there's hardly a reason for changing this, because the operating system then has the best possibilities to schedule your tasks among the available processors.
See i.e. http://msdn.microsoft.com/en-us/library/windows/desktop/ms683213(v=vs.85).aspx for how this works in windows. I don't know whether java offers an API to set these.
In Java Concurrency in Practice, the authors write:
When locking is contended, the losing thread(s) must block. The JVM can implement blocking either via spin-waiting (repeatedly trying to acquire the lock until it succeeds) or by suspending the blocked thread through the operating system. Which is more efficient depends on the relationship between context switch overhead and the time until the lock becomes available; spin-waiting is preferred for short waits and suspension is preferable for long waits. Some JVMs choose between the two adaptively based on profiling data of past wait times, but most just suspend threads waiting for a lock.
When I read this I was quite surprised. Are there any known JVMs implementing blocking either on always spin-waiting or sometimes spin-waiting due to profiling results? It's hard to believe for now.
Here is evidence that JRockit can use spinlocks - http://forums.oracle.com/forums/thread.jspa?threadID=816625&tstart=494
And if you search for "spin" in the JVM options listed here you will see evidence for the use of / support for spinlocks in Hotspot JVMs.
And if you want a current example, look at "src/hotspot/share/runtime/mutex.cpp" in the OpenJDK Java 11 source tree.
What the authors have written is right and it only makes sense. This is true for Linux as well. The rationale of why spin locks are used is because most resources are protected for a fraction of a millisecond. As such, to suspend, push all the contents of the registers onto the stack and relinquish CPU is just way too much overhead and not worth it. Thus even though it just spins in a tight set of instructions, sometimes just wasting time, it is still way more efficient than swapping out.
That being said, with the VM profiling, it would ideally make your processing more efficient. As such, is there are particular case that you always want to suspend? Or maybe always spin-wait?
What are the Light weight and heavy weight threads in terms of Java?
It's related to the amount of "context" associated with a thread, and consequently the amount of time it takes to perform a "context switch".
Heavyweight threads, (usually kernel/os level threads) have a lot of context (hardware registers, kernel stacks, etc). So it takes a lot of time to switch between threads. Heavyweight threads may also have restrictions on them, for example, on some OSes, kernel threads cannot be pre-empted, which means they can't forcibly be switched out until they give up control.
Lightweight threads on the other hand (usually, user space threads) have much less context. (They essentially share the same hardware context), they only need to store the context of the user stack, hence the time taking to switch lightweight threads is much shorter.
On most OSes, any threads you create as a programmer in user space will be lightweight in comparison to the kernel space threads. There is no formal definition of heavyweight and lightweight, it's just more of a comparison between threads with more context and threads with less context. Don't forget that every OS has its own different implementation of threads, and the lines between heavy and light threads are not necessarily clearly defined. In some programming languages and frameworks, when you create a "Thread" you might not even be getting a full thread, you might just be getting some abstraction that hides the real number of threads underneath.
[Some OSes allow threads to share address space, so threads that would usually be quite heavy, are slightly lighter]
Java standard threads are reasonably heavy in comparison to Erlang threads which are very light spawnable processes. Erlang demonstrates a distributed finite state machine.
However as an example, http://kilim.malhar.net/ , a Java extension library based on the Actor model of concurrency, proposes a construct for light weight threads in java. Instead of Thread implementing run(), a Kilim thread implements from the Kilim library using an execute() method. Apparently it shows Java's runtime outperforms Erlang's (atleast in a local environment AFAIK). Java did actually have such things in the original language spec called 'green threads' but subsequent Java versions dropped them in favor of native threads
In most systems Light weight threads are the normal threads you create with the help of library, like p_threads in linux.
While Heavy weight, in some systems, refer to a system process, with its own virtual memory and a more complex structure, like information about the process performance/statistics.
For more information:
http://www.computerworld.com/s/article/66405/Processes_and_Threads
http://msdn.microsoft.com/en-us/library/ms684841(VS.85).aspx
In Java, is there a programmatic way to find out how many concurrent threads are supported by a CPU?
Update
To clarify, I'm not trying to hammer the CPU with threads and I am aware of Runtime.getRuntime().availableProcessors() function, which provides me part of the information I'm looking for.
I want to find out if there's a way to automatically tune the size of thread pool so that:
if I'm running on a 1-year old server, I get 2 threads (1 thread per CPU x an arbitrary multiplier of 2)
if I switch to an Intel i7 quad core two years from now (which supports 2 threads per core), I get 16 threads (2 logical threads per CPU x 4 CPUs x the arbitrary multiplier of 2).
if, instead, I use a eight core Ultrasparc T2 server (which supports 8 threads per core), I get 128 threads (8 threads per CPU x 8 CPUs x the arbitrary multiplier of 2)
if I deploy the same software on a cluster of 30 different machines, potentially purchased at different years, I don't need to read the CPU specs and set configuration options for every single one of them.
Runtime.availableProcessors returns the number of logical processors (i.e. hardware threads) not physical cores. See CR 5048379.
A single non-hyperthreading CPU core can always run one thread. You can spawn lots of threads and the CPU will switch between them.
The best number depends on the task. If it is a task that will take lots of CPU power and not require any I/O (like calculating pi, prime numbers, etc.) then 1 thread per CPU will probably be best. If the task is more I/O bound. like processing information from disk, then you will probably get better performance by having more than one thread per CPU. In this case the disk access can take place while the CPU is processing information from a previous disk read.
I suggest you do some testing of how performance in your situation scales with number of threads per CPU core and decide based on that. Then, when your application runs, it can check availableProcessors() and decide how many threads it should spawn.
Hyperthreading will make the single core appear to the operating system and all applications, including availableProcessors(), as 2 CPUs, so if your application can use hyperthreading you will get the benefit. If not, then performance will suffer slightly but probably not enough to make the extra effort in catering for it worth while.
There is no standard way to get the number of supported threads per CPU core within Java. Your best bet is to get a Java CPUID utility that gives you the processor information, and then match it against a table you'll have to generate that gives you the threads per core that the processor manages without a "real" context switch.
Each processor, or processor core, can do exactly 1 thing at a time. With hyperthreading, things get a little different, but for the most part that still remains true, which is why my HT machine at work almost never goes above 50%, and even when it's at 100%, it's not processing twice as much at once.
You'll probably just have to do some testing on common architectures you plan to deploy on to determine how many threads you want to run on each CPU. Just using 1 thread may be too slow if you're waiting for a lot of I/O. Running a lot of threads will slow things down as the processor will have to switch threads more often, which can be quite costly. I'm not sure if there is any hard-coded limit to how many threads you can run, but I gaurantee that your app would probably come to a crawl from too much thread switching before you reached any kind of hard limit. Ultimately, you should just leave it as an option in the configuration file, so that you can easily tune your app to whatever processor you're running it on.
A CPU does not normally pose a limit on the number of threads, and I don't think Java itself has a limit on the number of native (kernel) threads it will spawn.
There is a method availableProcessors() in the Runtime class. Is that what you're looking for?
Basics:
Application loaded into memory is a process. A process has at least 1 thread. If you want, you can create as many threads as you want in a process (theoretically). So number of threads depends upon you and the algorithms you use.
If you use thread pools, that means thread pool manages the number of threads because creating a thread consumes resources. Thread pools recycle threads. This means many logical threads can run inside one physical thread one after one.
You don't have to consider the number of threads, it's managed by the thread pool algorithms. Thread pools choose different algorithms for servers and desktop machines (OSes).
Edit1:
You can use explicit threads if you think thread pool doesn't use the resources you have. You can manage the number of threads explicitly in that case.
This is a function of the VM, not the CPU. It has to do with the amount of heap consumed per thread. When you run out of space on the heap, you're done. As with other posters, I suspect your app becomes unusable before this point if you exceed the heap space because of thread count.
See this discussion.