Java Scheduler Thread Switching - java

I've read somewhere that in Java scheduler, thread switching happens after execution of certain amount of instructions and not after a certain time (like schedulers used in operating systems). But the references were missing. I wanted to know if this is correct.

Java used to have a feature called GreenThreads, It was removed in 1.3. For all practical purposes we can assume that thread scheduling is directly influenced by the underlying operating systems's process/thread scheduling strategy. In this context, developers need to assume that threads are executed/scheduled randomly and should code/treat them as such.

On Linux the scheduling of Java threads is done using the Completely Fair Scheduler (CFS). For each CPU there is a run-queue and by default there is 24 ms in which all threads on a single-run queue should have a chance to run. So if there are 2 threads, each thread gets 12 ms, if there are 3 tasks, each thread gets 8 ms.
When tasks have a different priority, things chance. And also there is a minimum granularity to prevent reduce context switching overhead of many threads are running.

Related

Web application - multithreading

Tell me, please, is there a guarantee that one thread will perform one task and will not switch to another during the execution of the current one? (I.e. pseudo-parallelism). That is, until this task is completed, the thread will not be released. At the same time, the system has two cores.
Also interested in why Tomcat has 200 threads by default if we have 2 cores (and *2 => 4 threads). What's the difference? With 4 threads, it will not be possible to ensure optimal operation of more than 1000 users...?
You're mixing two contexts: One thread might well be interrupted any time - or just wait for I/O to complete. During this time, the CPU core might execute another thread (or totally different process).
From within the thread, you'll always be in the same context, no switching appears. Other than executions possibly taking more or less time, the thread itself does not notice that it's interrupted. A thread is an abstraction for developing software, so that you're decoupled from context switches on the CPU.
Possible interruptions also explain why Webservers typically allocate a lot more threads for simultaneous executions than a machine has cores.
My expectation is that a webserver's default settings do not quite provide optimal resource allocations, but for most intents and purposes are a reasonable default, from which you might want to divert based on the characteristics of your application. e.g. if you're doing heavy CPU work without any I/O, your optimal setting will differ from an application with heavy I/O and minor CPU usage.
So, for application development purposes: A thread will execute exactly one task - interruption doesn't matter as it's transparent to the executed code. You might want to measure and tune your system and set optimal values rather than the default - but most of the times the webserver's default values are a better starting point than the number of your CPU cores.

If thread scheduler runs one thread at a time then how does java provide multi-threading?

Below is what I have seen on java t point about thread scheduler.
Thread Scheduler in Java
Thread scheduler in java is the part of the JVM that decides which thread should run.
There is no guarantee that which runnable thread will be chosen to run by the thread scheduler.
Only one thread at a time can run in a single process.
The thread scheduler mainly uses preemptive or time slicing scheduling to schedule the threads.
The source you are quoting sounds somehow outdated. In present day JVM implementations, Java uses underlying operating system threads.
In other words: a Java thread is mapped by the JVM to a thread provided by the operating system. See here for example.
In that sense, the "true" multi-threading capabilities of your system very much depend on the JVM implementation/version and the OS type/version; and even the capabilities of the underlying hardware. But rest assured: on a current day system, Java is able to run many threads in "real parallel".
In your context - Java used "green threads" at some point (and then Java would manage the threads) - but that is computer science history (many many years in the past). See here for example.
Thread scheduler in java is the part of the JVM that decides which thread should run.
Close, but... Most JVMs that anybody uses for real work in the last couple of decades use "native threading". That means, that scheduling is the operating system's job. The JVM does not get involved.
There is no guarantee [of] which runnable thread will [next] be chosen to run by the thread scheduler.
True but..., The operating system implements a thread scheduling policy (It may be capable of implementing any of several policies, to be chosen by the system administrator). If you understand the policy, then you could, in principle, know which thread will be scheduled next.
But true, because the Java Language Specification does not say anything about thread scheduling policies, and if you're writing a "run anywhere" Java program, then you should not depend on any particular policy.
Only one thread at a time can run in a single process.
False, but... It was true, maybe thirty years ago. Virtually all computers, tablets, and cell phones these days have more than one processor. At any given instant, there can be one thread running on each processor.
The thread scheduler mainly uses preemptive or time slicing scheduling to schedule the threads.
Yes. The thread scheduler on any given OS doesn't mainly do whatever, it implements a policy. On most computers, tablets, and cell phones the policy choices all will be preemptive. That means that a thread could be be de-scheduled at any point in time (e.g., half way through an i++; statement).
The alternative to preemptive multi-tasking is called cooperative multitasking, in which a thread can lose its turn on the processor only at certain, well defined yield points. That's what Thread.yield() is for, but I don't know where you are ever going to find Java running in a cooperative multitasking environment.
You can run more than one thread in a process at the same time if your machine supports that.
Information you have is very obsolete. What you describe was valid for OS when each process was effectviely a single thread (Windows 3.1, or OLD unix kernels, Macintosh system 7). That time JVM running as a single process had to implement its own thread scheduler and thread management.
Today all common platforms support natively multithreading and by default JVM uses the underlying system implementation

Java Multi Threading in multi cpu cores

Does Java threads runs in parallel on Multi core Processor i.e, runs multiple threads at the same time?
[Parallel processing with Java Threads]
volatile is useful when you want to prevent your resource from being cached by Threads
Multiple threads can run on single CPU (though, one at a time) and can share resources, So volatile is still useful.
JVM does not decide the number of processors to be used. It is the job of OS. JVM has the capability of creating multiple threads and submits them.
Volatile is used to guarantee the data is not being fetched from CPU cache during concurrency.
First thing it's the JVM who spawns the threads but it's the hardware whom JVM depends on. If it has multi core, JVM can run multiple threads at the same time to extract max performance.
Now when it comes to user(You) decide to what extent you want to exploit the CPU resources and you do this through thread pools.(By defining max number of threads can run in parallel) but yet again you stuck up with your hardware configuration.

Java threads and number of cores

I just had a quick question on how processors and threads work. According to my current understanding, a core can only perform 1 process at a time. But we are able to produce a thread pool(lets say 30) with a larger number than the number of cores that we posses(lets say 4) and have them run concurrently. How is this possible if we are only have 4 cores? I am also able to run my 30 thread program on my local computer and also continue to perform other activities on my computer such as watch movies or browse the internet.
I have read somewhere that scheduling of threads occurs and that sort of gives the illusion that these 30 threads are running concurrently by the 4 cores. Is this true and if so can someone explain how this works and also recommend some good reading on this?
Thank you in advance for the help.
Processes vs Threads
In days of old, each process had precisely one thread of execution, so processes were scheduled onto cores directly (and in these old days, there was almost only one core to schedule onto). However, in operating systems that support threading (which is almost all moderns OS's), it is threads, not processes that are scheduled. So for the rest of this discussion we will talk exclusively about threads, and you should understand that each running process has one or more threads of execution.
Parallelism vs Concurrency
When two threads are running in parallel, they are both running at the same time. For example, if we have two threads, A and B, then their parallel execution would look like this:
CPU 1: A ------------------------->
CPU 2: B ------------------------->
When two threads are running concurrently, their execution overlaps. Overlapping can happen in one of two ways: either the threads are executing at the same time (i.e. in parallel, as above), or their executions are being interleaved on the processor, like so:
CPU 1: A -----------> B ----------> A -----------> B ---------->
So, for our purposes, parallelism can be thought of as a special case of concurrency*
Scheduling
But we are able to produce a thread pool(lets say 30) with a larger number than the number of cores that we posses(lets say 4) and have them run concurrently. How is this possible if we are only have 4 cores?
In this case, they can run concurrently because the CPU scheduler is giving each one of those 30 threads some share of CPU time. Some threads will be running in parallel (if you have 4 cores, then 4 threads will be running in parallel at any one time), but all 30 threads will be running concurrently. The reason you can then go play games or browse the web is that these new threads are added to the thread pool/queue and also given a share of CPU time.
Logical vs Physical Cores
According to my current understanding, a core can only perform 1 process at a time
This is not quite true. Due to very clever hardware design and pipelining that would be much too long to go into here (plus I don't understand it), it is possible for one physical core to actually be executing two completely different threads of execution at the same time. Chew over that sentence a bit if you need to -- it still blows my mind.
This amazing feat is called simultaneous multi-threading (or popularly Hyper-Threading, although that is a proprietary name for a specific instance of such technology). Thus, we have physical cores, which are the actual hardware CPU cores, and logical cores, which is the number of cores the operating system tells software is available for use. Logical cores are essentially an abstraction. In typical modern Intel CPUs, each physical core acts as two logical cores.
can someone explain how this works and also recommend some good reading on this?
I would recommend Operating System Concepts if you really want to understand how processes, threads, and scheduling all work together.
The precise meanings of the terms parallel and concurrent are hotly debated, even here in our very own stack overflow. What one means by these terms depends a lot on the application domain.
Java do not perform Thread scheduling, it leaves this on Operating System to perform Thread scheduling.
For computationally intensive tasks, It is recommended to have thread pool size equal to number of cores available. But for I/O bound tasks we should have larger number of threads. There are many other variations, if both type of tasks are available and needs CPU time slice.
a core can only perform 1 process at a time
Yes, but they can multitask and create an illusion that they are processing more than one process at a time
How is this possible if we are only have 4 cores? I am also able to
run my 30 thread program on my local computer and also continue to
perform other activities on my computer
This is possible due to multitasking (which is concurrency). Lets say you started 30 threads and OS is also running 50 threads, all 80 threads will share 4 CPU cores by getting CPU time slice one by one (one thread per core at a time). Which means on average each core will run 80/4=20 threads concurrently. And you will feel all threads/processes are running at the same time.
can someone explain how this works
All of this happens at OS level. If you are a programmer then you should not worry about this. But if you are a student of OS then pick any OS book & learn more about Multi-threading at OS level in detail or find some good research paper for depth. One thing you should know that each OS handle these things in different way (but generally concepts are same)
There are some languages like Erlang, which use green threads (or processes), due to which they get the ability to map and schedule threads on their own eliminating OS. So, do some research on green threads as well if you are interested.
Note: You can also research on actors which is another abstraction over threads. Languages like Erlang, Scala etc use actors to accomplish tasks. One thread can have hundred of actors; each actor can perform different task (similar to threads in java).
This is a very vast and active research topic and there are many things to learn.
In short, your understanding of a core is correct. A core can execute 1 thread (aka process) at a time.
However, your program doesn't really run 30 threads at once. Of those 30 threads, only 4 are running at a time, and the other 26 are waiting. The CPU will schedule threads and give each thread a slice of time to run on a core. So the CPU will make all the threads take turns running.
A common misconception:
Having more threads will make my program run faster.
FALSE: Having more threads will NOT always make your program run faster. It just means the CPU has to do more switching, and in fact, having too many threads will make your program run slower because of the overhead caused by switching out all the different processes.

A question about Thread and Process

I read some tutorial about threads and processes, it is said that the processes be scheduled by operating system kernel, and the threads can be managed and scheduled in a user mode.
I do not understand the saying "threads can be managed and scheduled in a user mode",
for example: the producer and consumer problem? is this a example for "scheduled in a user mode"? or can anyone explain me?
Not sure what tutorial you're looking at, but there are two ways that threads can be scheduled.
The first is user-mode scheduling, which basically mean that one process, using Green threads or perhaps fibers, schedules different threads to run without involving the operating system in its decision. This can be more portable across operating systems, but usually doesn't allow you to take advantage of multiple processors.
The second is kernel scheduling, which means that the various threads are visible to the kernel and are scheduled by it, possibly simultaneously on different processors. This can make thread creation and scheduling more expensive, however.
So it doesn't really depend on the problem that you are trying to solve. User-mode just means that the scheduling of threads happens without involving the operating system. Some early Java versions used Green/user-mode threads, but I believe most now use native/kernel threads.
EDIT:
Coding Horror has a nice overview of the difference between user and kernel mode.
Get a better tutorial? The official Java tutorials are quite good, and contain a lesson on concurrency, that also defines what process and thread mean.
PS: Whether threads are managed/scheduled in user mode is an implementation detail of the Java Virtual Machine that generally need not concern the application programmer.
Scheduled in user mode means you have control over the threads of your software but they are managed by the operating system kernel. So yes, the producer consumer problem is an example you normally handle yourself (but it is not directly related to user mode scheduling) by having two threads, a producer thread and a consumer thread. Both threads access the same shared recource. This resource has to be thread-safe, this means you have to make sure the shared resource does not get corrupted becouse both threads access it at the same time. Thread safety can either be guaranteed by using thread-safe data types or by manually locking or synchronizing your resource.
However, even if you have some control over your threads e.g. starting threads, stopping threads, make threads sleep etc. you do not have full control. The operating system is still managing which threads are allowed cpu time etc.

Categories