Relationship between core and threads [duplicate] - java

Say if I have a processor like this which says # cores = 4, # threads = 4 and without Hyper-threading support.
Does that mean I can run 4 simultaneous program/process (since a core is capable of running only one thread)?
Or does that mean I can run 4 x 4 = 16 program/process simultaneously?
From my digging, if no Hyper-threading, there will be only 1 thread (process) per core. Correct me if I am wrong.

A thread differs from a process. A process can have many threads. A thread is a sequence of commands that have a certain order. A logical core can execute on sequence of commands. The operating system distributes all the threads to all the logical cores available, and if there are more threads than cores, threads are processed in a fast cue, and the core switches from one to another very fast.
It will look like all the threads run simultaneously, when actually the OS distributes CPU time among them.
Having multiple cores gives the advantage that less concurrent threads will be placed on one single core, less switching between threads = greater speed.
Hyper-threading creates 2 logical cores on 1 physical core, and makes switching between threads much faster.

That's basically correct, with the obvious qualifier that most operating systems let you execute far more tasks simultaneously than there are cores or threads, which they accomplish by interleaving the executing of instructions.
A system with hyperthreading generally has twice as many hardware threads as physical cores.

The term thread is generally used as a description of an operating system concept that has the potential to execute independently of other threads. Whether it does so depends on whether it is stuck waiting for some event (disk or screen I/O, message queue), or if there are enough physical CPUs (hyperthreaded or not) to allow it run in the face of other non-waiting threads.
Hyperthreading is a CPU vendor term that means a single core, that can multiplex its attention between two computations. The easy way to think about a hyperthreaded core is as if you had two real CPUs, both slightly slower than what the manufacture says the core can actually do.

Basically this is up to the OS. A thread is a high-level construct holding a instruction pointer, and where the OS places a threads execution on a suitable logical processor. So with 4 cores you can basically execute 4 instructions in parallell. Where as a thread simply contains information about what instructions to execute and the instructions placement in memory.
An application normally uses a single process during execution and the OS switches between processes to give all processes "equal" process time. When an application deploys multiple threads the processes allocates more than one slot for execution but shares memory between threads.
Normally you make a difference between concurrent and parallell execution. Where parallell execution is when you actually physically execute instructions of more than one logical processor and concurrent execution is the the frequent switching of a single logical processor giving the apperence of parallell execution.

Related

Single processes on separate machines vs multithreading on a single machine with cores equal to number of single CPU machines

Let's assume that there is a Java program that creates image thumbnails. All cloud providers offer VMs with 1 up to N vCPU virtual machines nodes (VMs).
What would be the advantages and/or disadvantages or running several single-threaded JVM processes each on a single 1 vCPU node vs running one multi-threaded JVM process on a multi-vCPU machine, with the processes number on single vCPU equal to the threads running in the single JVM process?
To give a more precise example:
4 VM nodes each has a 1-vCPU and running single-threaded JVM process that is processing image (this could be anything else)
VS
1 VM node that has 4-vCPU and running multi-threaded JVM process with exact 4 threads each one is doing the exact same task as the single-threaded process above.
Interesting question.
First, let's assume that the individual threads do not share any Java resources, i.e., don't all synchronize on some lock. They are completely independent.
Second, let's assume for a second that you aren't using virtual CPUs, but actual dedicated servers with either 1 CPU or 4 CPUs.
How would the two approaches compete for resources?
4 x 1 CPU, 1 thread
Each node has its own dedicated IO hardware, so in essence, 4x the IO capacity
Each node only has one process using its own memory and its own garbage collector
Because we only have one thread, we hardly have any context switches
1 x 4 CPU, 4 threads
The 4 threads share the IO hardware, which could lead to contingencies (compared to the other solution), i.e. IO-wait times
The memory is shared, so if the garbage collector encounters stop the world events (they rarely do these days), all threads are blocked. Depending on the GC, this could be a bottleneck
Because we have four threads, we do have context switches all the time. How much this affects performance depends on the JVMs ability to keep all data local to a CPU, i.e., pin threads to a CPU.
Now with virtual CPUs some of these differences may disappear. E.g. IO capacity may be identical, because all four virtual nodes may be on one physical machine, just like the one node scenario using 4 virtual CPUs. Garbage collector contention is probably still lower in the 4 vCPUs scenario. Regarding context switches, the question is probably, whether switching between VMs is as cheap as switching between threads. Again, the ability to pin processes/threads to a given physical CPU may be the deciding factor.
To summarize, I don't think you get around benchmarking this. So my advice is to try to learn as much as possible about your infrastructure (so you know what you are measuring) and then run some experiment.
Also, this is something you haven't asked, but there is a cost attached to running both solutions. Administrative overhead/cost is certainly lower when running the multi-threaded solution. On the other hand, running multiple VMs is more robust, because it's not fatal when one VM dies—you still have 3 left running.

Java threads and number of cores

I just had a quick question on how processors and threads work. According to my current understanding, a core can only perform 1 process at a time. But we are able to produce a thread pool(lets say 30) with a larger number than the number of cores that we posses(lets say 4) and have them run concurrently. How is this possible if we are only have 4 cores? I am also able to run my 30 thread program on my local computer and also continue to perform other activities on my computer such as watch movies or browse the internet.
I have read somewhere that scheduling of threads occurs and that sort of gives the illusion that these 30 threads are running concurrently by the 4 cores. Is this true and if so can someone explain how this works and also recommend some good reading on this?
Thank you in advance for the help.
Processes vs Threads
In days of old, each process had precisely one thread of execution, so processes were scheduled onto cores directly (and in these old days, there was almost only one core to schedule onto). However, in operating systems that support threading (which is almost all moderns OS's), it is threads, not processes that are scheduled. So for the rest of this discussion we will talk exclusively about threads, and you should understand that each running process has one or more threads of execution.
Parallelism vs Concurrency
When two threads are running in parallel, they are both running at the same time. For example, if we have two threads, A and B, then their parallel execution would look like this:
CPU 1: A ------------------------->
CPU 2: B ------------------------->
When two threads are running concurrently, their execution overlaps. Overlapping can happen in one of two ways: either the threads are executing at the same time (i.e. in parallel, as above), or their executions are being interleaved on the processor, like so:
CPU 1: A -----------> B ----------> A -----------> B ---------->
So, for our purposes, parallelism can be thought of as a special case of concurrency*
Scheduling
But we are able to produce a thread pool(lets say 30) with a larger number than the number of cores that we posses(lets say 4) and have them run concurrently. How is this possible if we are only have 4 cores?
In this case, they can run concurrently because the CPU scheduler is giving each one of those 30 threads some share of CPU time. Some threads will be running in parallel (if you have 4 cores, then 4 threads will be running in parallel at any one time), but all 30 threads will be running concurrently. The reason you can then go play games or browse the web is that these new threads are added to the thread pool/queue and also given a share of CPU time.
Logical vs Physical Cores
According to my current understanding, a core can only perform 1 process at a time
This is not quite true. Due to very clever hardware design and pipelining that would be much too long to go into here (plus I don't understand it), it is possible for one physical core to actually be executing two completely different threads of execution at the same time. Chew over that sentence a bit if you need to -- it still blows my mind.
This amazing feat is called simultaneous multi-threading (or popularly Hyper-Threading, although that is a proprietary name for a specific instance of such technology). Thus, we have physical cores, which are the actual hardware CPU cores, and logical cores, which is the number of cores the operating system tells software is available for use. Logical cores are essentially an abstraction. In typical modern Intel CPUs, each physical core acts as two logical cores.
can someone explain how this works and also recommend some good reading on this?
I would recommend Operating System Concepts if you really want to understand how processes, threads, and scheduling all work together.
The precise meanings of the terms parallel and concurrent are hotly debated, even here in our very own stack overflow. What one means by these terms depends a lot on the application domain.
Java do not perform Thread scheduling, it leaves this on Operating System to perform Thread scheduling.
For computationally intensive tasks, It is recommended to have thread pool size equal to number of cores available. But for I/O bound tasks we should have larger number of threads. There are many other variations, if both type of tasks are available and needs CPU time slice.
a core can only perform 1 process at a time
Yes, but they can multitask and create an illusion that they are processing more than one process at a time
How is this possible if we are only have 4 cores? I am also able to
run my 30 thread program on my local computer and also continue to
perform other activities on my computer
This is possible due to multitasking (which is concurrency). Lets say you started 30 threads and OS is also running 50 threads, all 80 threads will share 4 CPU cores by getting CPU time slice one by one (one thread per core at a time). Which means on average each core will run 80/4=20 threads concurrently. And you will feel all threads/processes are running at the same time.
can someone explain how this works
All of this happens at OS level. If you are a programmer then you should not worry about this. But if you are a student of OS then pick any OS book & learn more about Multi-threading at OS level in detail or find some good research paper for depth. One thing you should know that each OS handle these things in different way (but generally concepts are same)
There are some languages like Erlang, which use green threads (or processes), due to which they get the ability to map and schedule threads on their own eliminating OS. So, do some research on green threads as well if you are interested.
Note: You can also research on actors which is another abstraction over threads. Languages like Erlang, Scala etc use actors to accomplish tasks. One thread can have hundred of actors; each actor can perform different task (similar to threads in java).
This is a very vast and active research topic and there are many things to learn.
In short, your understanding of a core is correct. A core can execute 1 thread (aka process) at a time.
However, your program doesn't really run 30 threads at once. Of those 30 threads, only 4 are running at a time, and the other 26 are waiting. The CPU will schedule threads and give each thread a slice of time to run on a core. So the CPU will make all the threads take turns running.
A common misconception:
Having more threads will make my program run faster.
FALSE: Having more threads will NOT always make your program run faster. It just means the CPU has to do more switching, and in fact, having too many threads will make your program run slower because of the overhead caused by switching out all the different processes.

Maximum number of threads than can run concurrently in java on a CPU

Please I got confused about something.
What I know is that the maximum number of threads that can run concurrently on a normal CPU of a modern computer ranges from 8 to 16 threads.
On the other hand, using GPUs thousands of threads can run concurrently without the scheduler interrupting any thread to schedule another one.
On several posts as:
Java virtual machine - maximum number of threads https://community.oracle.com/message/10312772
people are stating that they run thousands of java threads concurrently on normal CPUs.
How could this be ??
And how can I know the maximum number of threads that can run concurrently so that my code adjusts it self dynamically according to the underlying architecture.
Threads aren't tied to or limited by the number of available processors/cores. The operating system scheduler can switch back and forth between any number of threads on a single CPU. This is the meaning of "preemptive multitasking."
Of course, if you have more threads than cores, not all threads will be executing simultaneously. Some will be on hold, waiting for a time slot.
In practice, the number of threads you can have is limited by the scheduler - but that number is usually very high (thousands or more). It will vary from OS to OS and with individual versions.
As far as how many threads are useful from a performance standpoint, as you said it depends on the number of available processors and on whether the task is IO or CPU bound. Experiment to find the optimal number and make it configurable if possible.
There is hardware and software concurrency. The 8 to 16 threads refers to the hardware you have - that is one or more CPUs with hardware to execute 8 to 16 threads parallel to each other. The thousands of threads refers to the number of software threads, the scheduler will have to swap them out so every software thread gets its time slice to run on the hardware.
To get the number of hardware threads you can try Runtime.availableProcessors().
At any given time, a processor will run the number of threads equal to the number of cores contained. This means that on a uniprocessor system, only one thread (or no thread) is being run at any given moment.
However, processors do not run each thread one after another, rather they switch between multiple threads rapidly to simulate concurrent execution. If this weren't the case let alone create multiple threads, you won't even be able to start multiple applications.
A java thread (compared to processor instructions) is a very high level abstraction of a set of instructions for the CPU to process. When it gets down to the processor level, there is no guarantee which threads will run on which core at any given time. But given that processors rapidly switch between these threads, it is theoretically possible to create an infinite amount of threads albeit at the cost of performance.
If you think about it, a modern computer has thousands of threads running at the same time (combining all applications) while only having 1 ~ 16 (typical case) number of cores. Without this task-switching, nothing would ever get done.
If you are optimizing your application, you should consider the amount of threads you need by the work at hand, and not by the underlying architecture. Performance gains from parallelism should be weighted against increasing overheads of thread execution. Since every machine is different, every runtime environment is different, it is impractical to work out some golden thread count (however, a ballpark estimate may be made by benchmarking and looking at number of cores).
While all the other answers have explained how you can theoretically have thousands of threads in your application at the cost of memory and other overheads already well explained here. It is however worth noting that the default concurrencyLevel for the data structures provided in the java.util.concurrent package is 16.
You will come across contention issues if you don't account for the same.
Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention.
Make sure you have set the appropriate concurrencyLevel in case you are running into issues related to concurrency with a higher number of threads.

How many CPU will a multithreaded application take, if runs in multicore processor

A multi-core processor is a single computing component with two or more independent actual central processing units (called "cores"), which are the units that read and execute program instructions.
If a multithreaded application runs on a multi-core processor, how many CPUs will is use? For example, if the machine is capable of dual core execution, then 2 CPUs will be used, if my understanding is correct. Within these two CPUs, multiple threads will be executed and do the context switching.
If a Mulithreaded application runs on multi-core processor, how many CPU it will use, for example if the machine is capable of doing the dualcore, then 2 CPU will be used is my understanding is correct, and within these two CPU multiple thread will be executed and do the context switching.
The JVM really doesn't deal directly with processors. It uses the native thread capabilities of the operating system which uses the processors that are exposed by the operating system and hardware. In Java there is a Runtime.availableProcessors() method but this in a only a few places by the JVM code.
To the JVM or any other application running on a computer, the multiple cores typically seem the same as multiple processors if that's how the OS exposes them. This means that the distinction between physical processors versus multiple cores in a single processor is completely hidden from the Java programmer.
There are single core CPUs then there are CPUs with multiple cores which share certain internal components but the OS sees them and schedules them as multiple processors. Multiple cores are most likely seen to the OS as multiple CPUs -- there is no distinction. Then there are the virtual processors often called hyperthreading which share the same processor core (and the associated processing circuitry) but have multiple execution pipelines. These are also (usually) seen by the OS as multiple processors.
Specifically, in the OP's example, you have a single processor with two cores, in linux cat'ing /proc/cpuinfo will show 2 processors and in Java the Runtime.availableProcessors() will return 2. It will also return 2 if you have 2 physical processors also will most likely if you have a single core with dual hyperthreading pipelines depending on the OS kernel.
As to how many processors the JVM will actually be using, this depends again on the native thread code. This said, if the JVM is running on a single CPU with two cores, and the cores are not in use by other applications or the OS, then the JVM's thread will most likely be scheduled to run on them concurrently.
By default you can utilize all processors. One processor can run virtually as many threads as possible at the same time (virtually means that physically there's always just one thread which is running). How many is possible depends on the operating system resource limitations and the used threading framework.
It doesn't matter from software point of view, if the cores are on one die, and there's one CPU socket with a multi-core CPU, or there are more CPU sockets. The OS and JVM will see the collection of the cores. (This brings in an interesting aspect though: data exchange between such cores which are on the same die and those which are in different sockets are not uniform).
Thread schedulers (talking about both the OS's and the virtual machine's) often tend to shuffle and move threads from one core to another throughout scheduler time. That can hurt performance, there are techniques to tie a thread to a certain core (thread affinity).
How much cpu resources your application (lets assume long running task) will really consume depends on how much percentage you need your cpu. Application can be network, memory, harddisk or cpu bound and a few others.
If the cpu has to wait for any other resource such as memory or network it will remain idle or be assigned to other threads.
Example:
If your application is only cpu bound (won't consume much memory) and you run a long task with as many threads as cores (physical or virtual with hyperthreading) you will get almost 100% usage of the free resources that are not used by other running threads (os, programms).
Depending on the program you can tell in which state your application is from the cpu/memory/network consumption and you can analyse the performance.
It will get use of at most as many CPUs as you have simultaneously busy threads, and possibly as few as one.
From programmer's point of view, a core is a processor. Method Runtime.availableProcessors() shows the number of cores. However, from manufacturer's point of view, multi-core processor is similar to ordinary processor, so they decided to leave the name "processor", probably making a marketing mistake.

Need clarification in Parallel Processing

For example if I use dual core processor and write a java program without using threads. Does it mean the program execution is sequential and it will only use single core among the dual?
For example if I use dual core processor and write a java program using threads and synchronisation. Does it mean the program execution is parallel and it will use all available cores(in this case two cores)?
If my reasoning is totally wrong then , what is the relation between threading,cores and parallelism?
First, the JVM has a number of background threads that will use multiple CPUs and cores even if the user code never forks another thread. The garbage collector for example will run concurrently in another CPU if possible regardless of the user code.
If your user code never forks another thread, the JVM will never run your code concurrently in multiple CPUs. If you do write your program with multiple threads there is no guarantee that it will be run in multiple CPUs but it is certainly more likely. It depends a lot on what else is running on on the OS and how blocked your threads are. If you threads are consuming a lot of CPU cycles and run for any length of time on a modern OS then yes, your program will use both CPUs.
You can verify this on a Linux OS (and other Unixen) by watching to see if your process consumes more than 100% of CPU at any one time. You can also use ps options to show the underlying threads and their CPU usage. See my answer here: Concurrency of posix threads in multiprocessor machine

Categories