I need to process 80 files of info, and I'm doing it in groups of 8 threads. What I would like is to always have 8 threads running (right now I start 8 threads, and after those 8 finish their job, another 8 are created, and so on).
So I would like to know if there is a way to do this:
launch 8 threads.
after 1 thread finishes its job, launch another thread (so all the
time I have 8 threads running, until the job is done)
Why not use a thread pool, and in particular a fixed-size thread pool? Configure your thread pool size to be 8 threads, and then submit all your work items as Runnable/Callable objects. The thread pool will execute these using the 8 configured threads.
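A minimal sketch of that suggestion, assuming the per-file work can be wrapped in a method (processFile and the file names here are hypothetical placeholders, not something from the question). The pool keeps 8 worker threads busy: as soon as one task finishes, that worker picks up the next queued file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FixedPoolDemo {

    // Hypothetical stand-in for the real per-file work.
    static void processFile(String name) {
        // ... read and process the file here ...
    }

    // Submits one task per file to a pool of poolSize threads and
    // returns how many tasks ran to completion.
    static int processAll(List<String> files, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();
        for (String file : files) {
            pool.submit(() -> {
                processFile(file);
                completed.incrementAndGet();
            });
        }
        pool.shutdown(); // no new tasks accepted; queued tasks still run
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                pool.shutdownNow(); // give up after the (arbitrary) timeout
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 80; i++) {
            files.add("file-" + i + ".dat"); // placeholder names
        }
        System.out.println(processAll(files, 8) + " files processed");
    }
}
```

The one-hour timeout is arbitrary; pick whatever bound makes sense for your files.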
So, everyone is quick to jump in and tell you to use a thread pool. Sure, that's the right way to achieve what you want. The question is, is it the right thing to want? It's not as simple as throwing a bunch of threads at the problem and magically everything is solved.
You haven't told us the nature of the processing. Are the jobs I/O bound, or CPU bound (see note 1)? If they are CPU bound, threads beyond the number of cores do nothing for you. If they are I/O bound, the threading might help.
You haven't told us if you have eight cores (or compute units). If you can't guarantee that you'll have that, it might not be best to have eight threads running.
There's a lot to think about. You're increasing the complexity of your solution. Maybe it's getting you what you want, maybe not.
1: Yes, you said you're processing files, but that doesn't tell us enough. Maybe the processing is intensive (think: rendering a video file). Or maybe you're reading the files from a very fast disk (think: SSD or memory-mapped files).
I just had a quick question on how processors and threads work. According to my current understanding, a core can only perform 1 process at a time. But we are able to create a thread pool (let's say 30 threads) with a larger number of threads than the number of cores that we possess (let's say 4) and have them run concurrently. How is this possible if we only have 4 cores? I am also able to run my 30-thread program on my local computer and continue to perform other activities on my computer, such as watching movies or browsing the internet.
I have read somewhere that scheduling of threads occurs and that sort of gives the illusion that these 30 threads are running concurrently by the 4 cores. Is this true and if so can someone explain how this works and also recommend some good reading on this?
Thank you in advance for the help.
Processes vs Threads
In days of old, each process had precisely one thread of execution, so processes were scheduled onto cores directly (and in these old days, there was almost always only one core to schedule onto). However, in operating systems that support threading (which is almost all modern OSes), it is threads, not processes, that are scheduled. So for the rest of this discussion we will talk exclusively about threads, and you should understand that each running process has one or more threads of execution.
Parallelism vs Concurrency
When two threads are running in parallel, they are both running at the same time. For example, if we have two threads, A and B, then their parallel execution would look like this:
CPU 1: A ------------------------->
CPU 2: B ------------------------->
When two threads are running concurrently, their execution overlaps. Overlapping can happen in one of two ways: either the threads are executing at the same time (i.e. in parallel, as above), or their executions are being interleaved on the processor, like so:
CPU 1: A -----------> B ----------> A -----------> B ---------->
So, for our purposes, parallelism can be thought of as a special case of concurrency*
Scheduling
But we are able to create a thread pool (let's say 30 threads) with a larger number of threads than the number of cores that we possess (let's say 4) and have them run concurrently. How is this possible if we only have 4 cores?
In this case, they can run concurrently because the OS scheduler gives each of those 30 threads some share of CPU time. Some threads will be running in parallel (if you have 4 cores, then at most 4 threads are running in parallel at any one time), but all 30 threads are running concurrently. The reason you can then go play games or browse the web is that the threads of those programs are added to the same scheduler queue and are also given a share of CPU time.
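To see this for yourself, here is a small sketch (the class and method names are made up for illustration) that starts 30 threads regardless of how many cores the machine reports. All of them finish, because the scheduler interleaves them on whatever cores exist.

```java
import java.util.concurrent.atomic.AtomicInteger;

class OversubscribedDemo {

    // Starts threadCount threads that each do a little CPU work and
    // returns how many of them ran to completion.
    static int runThreads(int threadCount) {
        AtomicInteger finished = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                long sum = 0;
                for (int n = 1; n <= 1_000_000; n++) {
                    sum += n; // busy work so the thread needs CPU time
                }
                if (sum > 0) {
                    finished.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait for every thread to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return finished.get();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Logical cores reported: " + cores);
        System.out.println("Threads finished: " + runThreads(30));
    }
}
```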
Logical vs Physical Cores
According to my current understanding, a core can only perform 1 process at a time
This is not quite true. Due to very clever hardware design and pipelining that would be much too long to go into here (plus I don't understand it), it is possible for one physical core to actually be executing two completely different threads of execution at the same time. Chew over that sentence a bit if you need to -- it still blows my mind.
This amazing feat is called simultaneous multi-threading (or popularly Hyper-Threading, although that is a proprietary name for a specific instance of such technology). Thus, we have physical cores, which are the actual hardware CPU cores, and logical cores, which is the number of cores the operating system tells software is available for use. Logical cores are essentially an abstraction. In typical modern Intel CPUs, each physical core acts as two logical cores.
can someone explain how this works and also recommend some good reading on this?
I would recommend Operating System Concepts if you really want to understand how processes, threads, and scheduling all work together.
* The precise meanings of the terms parallel and concurrent are hotly debated, even here on our very own Stack Overflow. What one means by these terms depends a lot on the application domain.
Java does not perform thread scheduling itself; it leaves thread scheduling to the operating system.
For computationally intensive tasks, it is recommended to have a thread pool size equal to the number of cores available. For I/O-bound tasks, we should have a larger number of threads. There are many other variations when both types of task are present and compete for CPU time slices.
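As a rough sketch of that rule of thumb: the I/O-bound formula below, threads ≈ cores × (1 + wait time / compute time), is a commonly quoted heuristic that I am assuming here, not something the JVM computes for you, and the ratio has to be measured for your own workload.

```java
class PoolSizing {

    // CPU-bound work: one thread per available core.
    static int cpuBoundPoolSize(int cores) {
        return Math.max(1, cores);
    }

    // I/O-bound work (heuristic, assumed): cores * (1 + wait/compute).
    // waitToComputeRatio is how long a task waits on I/O relative to
    // how long it spends computing.
    static int ioBoundPoolSize(int cores, double waitToComputeRatio) {
        return Math.max(1, (int) Math.round(cores * (1 + waitToComputeRatio)));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU-bound pool size: " + cpuBoundPoolSize(cores));
        System.out.println("I/O-bound pool size (ratio 4): " + ioBoundPoolSize(cores, 4.0));
    }
}
```

For example, with 4 cores and tasks that wait four times as long as they compute, this suggests 4 × (1 + 4) = 20 threads. Treat the result as a starting point for measurement, not a final answer.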
a core can only perform 1 process at a time
Yes, but they can multitask and create an illusion that they are processing more than one process at a time
How is this possible if we only have 4 cores? I am also able to
run my 30 thread program on my local computer and also continue to
perform other activities on my computer
This is possible due to multitasking (which is concurrency). Let's say you started 30 threads and the OS is also running 50 threads. All 80 threads share the 4 CPU cores by taking CPU time slices in turn (one thread per core at a time), which means that on average each core interleaves 80/4 = 20 threads. To you it will feel as if all threads/processes are running at the same time.
can someone explain how this works
All of this happens at the OS level. If you are a programmer, you should not have to worry about it. But if you are a student of operating systems, pick any OS book and learn about multi-threading at the OS level in detail, or find a good research paper for more depth. One thing you should know is that each OS handles these things in a different way (though the general concepts are the same).
There are some languages, like Erlang, that use green threads (or processes), which gives them the ability to map and schedule threads on their own without relying on the OS scheduler. So do some research on green threads as well if you are interested.
Note: You can also research actors, which are another abstraction over threads. Languages like Erlang and Scala use actors to accomplish tasks. One thread can host hundreds of actors; each actor can perform a different task (similar to threads in Java).
This is a very vast and active research topic and there are many things to learn.
In short, your understanding of a core is correct. A core can execute 1 thread (or one single-threaded process) at a time.
However, your program doesn't really run 30 threads at once. Of those 30 threads, only 4 are running at a time, and the other 26 are waiting. The OS scheduler gives each thread a slice of time to run on a core, so all the threads take turns running.
A common misconception:
Having more threads will make my program run faster.
FALSE: Having more threads will NOT always make your program run faster. It just means the CPU has to do more switching, and in fact, having too many threads will make your program run slower because of the overhead of context switching between all the different threads.
I have an application which just uses ExecutorService.newFixedThreadPool(), and everything runs fine on our development machines (multicore Intels mostly, also runs fine on a 6 core AMD). But when we run it on our server (Opteron CPUs, 64 cores total) and the thread pool is limited to, say, 4 threads, sporadically something weird happens and the program starts using 48 cores.
There is nothing but a main thread and this ExecutorService which should be limited to N threads, so there should be no more than N+1(main)+X(some java services) threads, but definitely not 48+.
Any suggestions on what might be causing this behavior are highly appreciated.
I'm not posting any code here, because we were not able to reproduce this in any environment other than this server, and there's nothing special about the code. It's just the fixed thread pool, on which Callables are run in batches (each batch no larger than the size of the thread pool), and the results are collected from Futures before submitting the next batch of tasks.
Looks like you're using a parallel garbage collector. See here: http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2009-January/000718.html
From that answer, it looks like you'll have 40 threads of GC, plus your application threads. So that's probably what's happening.
Check this out: http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html -- in particular set -XX:ParallelGCThreads=n
If it helps ... I had this exact same thing happen to me, except the container monitor killed my process for excess thread usage. Oh HP-UX, how I (don't) miss you.
I have a program which runs (all day) tasks in parallel (no I/O in the task to be executed) so I have used Executors.newFixedThreadPool(poolSize) to implement it.
Initially I set the poolSize to Runtime.getRuntime().availableProcessors(), but I was a bit worried to use all the available cores since there are other processes running on the same PC (32 cores).
In particular I have ten other JVM running the same program (on different input data), so I'm a bit worried that there might be a lot of overhead in terms of threads switching amongst the available cores, which could slow down the overall calculations.
How shall I decide the size of the pool for each program / JVM?
Also, in my PC, there are other processes running all the time (Antivirus, Backup, etc.). Shall I take into account these as well?
Any advice is going to be dependent upon your particular circumstances. 10 JVMs on 32 cores would suggest 3 threads each (ignoring garbage collection threads, timer tasks etc...)
You also have other tasks running. The scheduler will ensure they're running, but do they have to be responsive? More responsive than the JVM? If you're running Linux/Unix then you can also make use of prioritisation (via nice) to ensure particular processes don't hog the CPU.
Finally, you're running 10 JVMs. Will that cause paging? If so, that will be slow and you may be better off running fewer JVMs in order to avoid consuming so much memory.
Just make sure that your key variables are exposed and configurable, and measure various scenarios in order to find the optimal one.
How shall I decide the size of the pool for each program / JVM?
You want the number of threads which will get you close to 99% utilisation and no more.
The simplest way to balance the work is to have a single process running, processing multiple files concurrently and using just one thread pool. You can set up your process as a service if you need to start files via the command line.
If this is impossible for some reason, you will need to guesstimate how much the thread pools should be shrunk by. Try running one process and look at the utilisation. If one process uses, say, 40% of the machine, then ten processes would ask for 400%, i.e. the machine is oversubscribed by a factor of 4, so you might reduce the pool size by a factor of 4.
Unfortunately, this is a hard thing to know, as programs don't typically know what else is or might be going on on the same box.
The "easy" way out is to make the pool size configurable. This allows the user who controls the program/box to decide how many threads to allocate to your program (presumably using their knowledge of the general workload of the box).
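A minimal sketch of a configurable pool size, assuming the value arrives as a JVM system property. The property name app.poolSize is made up for this example, and capping at the core count is my assumption for a CPU-bound program like the one in the question.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ConfigurablePool {

    // Uses the configured value when it is a valid number, clamped to
    // the range 1..cores; otherwise falls back to one thread per core.
    static int resolvePoolSize(String configured, int cores) {
        if (configured == null || configured.trim().isEmpty()) {
            return cores;
        }
        try {
            int requested = Integer.parseInt(configured.trim());
            return Math.max(1, Math.min(requested, cores));
        } catch (NumberFormatException e) {
            return cores; // unparsable setting: use the default
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // "app.poolSize" is a hypothetical property name.
        int size = resolvePoolSize(System.getProperty("app.poolSize"), cores);
        ExecutorService pool = Executors.newFixedThreadPool(size);
        System.out.println("Using a pool of " + size + " threads");
        pool.shutdown();
    }
}
```

With ten JVMs on a 32-core box, each one could then be started with something like java -Dapp.poolSize=3 ... and retuned without recompiling.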
A more complex solution would be to attempt to programmatically determine the current workload of the box and choose the pool size appropriately from that. The efficacy of this solution depends on how accurately you can determine the workload and potentially adapt as it changes over time.
Try grepping the processes, check top/task manager and performance monitors to verify if this implementation is actually affecting your machine.
This article seems to contain interesting info about what you are trying to implement:
http://www.ibm.com/developerworks/library/j-jtp0730/index.html
I have made an OS simulator for a project, for the next part I have to modify the simulator to include 4 CPUs in place of only 1 CPU from the previous part, so that all the processes get done in a faster time.
So I have to add concurrency but I am not sure what the right design pattern is for this kind of thing. I need to know if the following will actually give me a speed up.
CPU extends Thread
//in main
get process1
get process 2
get process 3
get process 4
cpu1.run(process1)
cpu2.run(process2)
cpu3.run(process3)
cpu4.run(process4)
Am I right in assuming that because the CPUs extend Thread they will all run concurrently to finish the 4 processes, or will it be just like running the 4 processes on a single CPU?
From the nature of the question, I assume this is a class project and that your representation of a CPU is relatively simple, for example one that just runs a series of instructions, like the Thread class does. If, however, you are trying to emulate real-world CPUs and microprocessors, we need to know more about the CPU features: scheduling, event handling and other low-level aspects that are normally hidden.
But, in the simple case, the answer is generally yes.
Note, depending on the tasks in those processes and the CPU you run this code on, you may see different behaviors because of how the CPU and JVM are actually implementing threads. But, I think in your case it isn't relevant.
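One caveat if the pseudocode in the question is read literally as Java: calling run() on a Thread subclass executes the body on the calling thread, and only start() creates a new thread. The sketch below uses a hypothetical Cpu class (not the asker's) that records which thread actually executed its work.

```java
class StartVsRun {

    // Hypothetical stand-in for the simulator's CPU class.
    static class Cpu extends Thread {
        volatile String executedOn;

        Cpu(String name) {
            super(name);
        }

        @Override
        public void run() {
            // Record the name of the thread that is really doing the work.
            executedOn = Thread.currentThread().getName();
        }
    }

    // Calls run() directly: the work happens on the caller's thread.
    static String viaRun() {
        Cpu cpu = new Cpu("cpu-1");
        cpu.run();
        return cpu.executedOn;
    }

    // Calls start(): the work happens on the new thread named "cpu-1".
    static String viaStart() {
        Cpu cpu = new Cpu("cpu-1");
        cpu.start();
        try {
            cpu.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return cpu.executedOn;
    }

    public static void main(String[] args) {
        System.out.println("run()   executed on: " + viaRun());
        System.out.println("start() executed on: " + viaStart());
    }
}
```

So four cpuN.run(...) calls in a row would execute one after another on the main thread, with no speed-up. To get concurrency, each CPU needs to be given its process first and then launched with start().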
It depends mainly on 3 things:
The kind of operations involved in the threads. Do they share variables? Do they need synchronization between themselves or are they completely independent?
The environment in which they are executed. You added the Java tag, but it's not clear whether you want to give your simulator the ability to schedule processes on multiple (real) CPUs or just to use more than one core. By the way, if you plan to use multiple real CPUs you have to avoid green threads (Wikipedia).
If you want to use multiple real cores you have to care about the structure of the CPU too. Which kind of cache do they share? And so on..
Do you have any kind of simulated scheduler inside your OS?
Time slicing says that you will have to divide the single CPU among the N threads that you decide to start. There won't be any parallelism.
In your example each CPU is able to run a process concurrently, so if you just have 4 processes you're doing fine.
If you want your program to work also for the case where there are more processes than CPUs you need something more complex. In that case I would recommend you take a look at the Java concurrency framework.
For the simplest solution when you have more than 4 processes that you want to run I would use ExecutorService.newFixedThreadPool(4), and add each process (as a Callable) to the resulting thread-pool using either invokeAll() or submit().
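A minimal sketch of that approach, assuming each simulated process can be expressed as a Callable that returns some result (here just an Integer id, which is my placeholder). invokeAll() blocks until every task has finished and returns the Futures in the same order as the submitted tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SimulatedCpus {

    // Runs all processes on a pool of `cpus` threads and collects the results.
    static List<Integer> runAll(List<Callable<Integer>> processes, int cpus) {
        ExecutorService pool = Executors.newFixedThreadPool(cpus);
        try {
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : pool.invokeAll(processes)) {
                results.add(future.get()); // already complete after invokeAll
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<Integer>> processes = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            final int pid = i;
            processes.add(() -> pid); // placeholder "process" returning its id
        }
        System.out.println("Results: " + runAll(processes, 4));
    }
}
```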
BUT this does not give you concurrency between all running processes (it will only pick up the 5th process when one of the first 4 processes has completed). If you want your program to act as a real multi-threaded OS (where more processes can be active than CPUs available) you'll have to add some sort of scheduler that can assign a part of a process on one of the available CPUs, then (before the first is done) let another process use that same CPU for a part of its work, etc.. So it will have to allow for part of processes to be done, then do part of one or more other processes then let the first do a bit more of its work, etc until all processes are done.
Your simulator then also needs some way to decide when a process can be 'paused' (i.e. put aside by the scheduler to be picked up later)...
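To make the scheduler idea concrete, here is a deliberately simplified round-robin sketch for a single simulated CPU. The Proc class, its abstract "work units" and the fixed quantum are all assumptions for illustration; a real simulator would pause on I/O, priorities and so on.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class RoundRobinSketch {

    // Hypothetical process: a name plus the work units it still needs.
    static final class Proc {
        final String name;
        int remaining;

        Proc(String name, int remaining) {
            this.name = name;
            this.remaining = remaining;
        }
    }

    // Gives each process at most `quantum` units per turn, then moves it
    // to the back of the ready queue. Returns names in completion order.
    static List<String> schedule(List<Proc> procs, int quantum) {
        if (quantum <= 0) {
            throw new IllegalArgumentException("quantum must be positive");
        }
        Deque<Proc> ready = new ArrayDeque<>(procs);
        List<String> finished = new ArrayList<>();
        while (!ready.isEmpty()) {
            Proc p = ready.pollFirst();
            p.remaining -= Math.min(quantum, p.remaining); // run one slice
            if (p.remaining == 0) {
                finished.add(p.name);
            } else {
                ready.addLast(p); // "paused": picked up again later
            }
        }
        return finished;
    }

    public static void main(String[] args) {
        List<Proc> procs = new ArrayList<>();
        procs.add(new Proc("A", 5));
        procs.add(new Proc("B", 2));
        procs.add(new Proc("C", 3));
        System.out.println(schedule(procs, 2)); // prints [B, C, A]
    }
}
```

With 4 simulated CPUs, each CPU thread could pull from a shared ready queue in the same way, provided the queue is thread-safe.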
I am working on a BitTorrent client. The easiest way for me to communicate with the peers is to spawn a new thread for each one of them. But if the user wants to keep connections with a large number of peers, that may cause me to spawn a lot of threads.
Another solution I thought of is to have one thread iterate through the peer objects and run each of them for a period.
I checked other libraries, mostly in Ruby (mine is in Java), and they spawn one thread for each new peer. Do you think spawning one thread per peer will degrade performance if the user sets the number of connections to a high number like 100 or 200?
It shouldn't be a problem unless you're running thousands of threads. I'd look into a compromise, using a threadpool. You can detect the number of CPUs at runtime and decide how many threads to spin up based on that, and then hand out work to the threadpool as it comes along.
You can avoid the problem altogether by using Non-blocking IO (java.nio.*).
I'd recommend using an Executor to keep the number of threads pooled.
Executors.newFixedThreadPool(numberOfThreads);
With this, you can basically add "tasks" to the pool and they will complete as soon as threads become available. This way, you're not exhausting all of the end user's computer's threads while still getting a lot done at the same time. If you set it to something like 16, you'd be pretty safe, though you could always allow the user to change this number if they wanted to.
No.....
Once I had this very same doubt and created a .net app (4 years ago) with 400 threads....
Provided they don't do a lot of work, with a decent machine you should be fine...
A few hundred threads is not a problem for most workstation-class machines, and is simpler to code.
However, if you are interested in pursuing your idea, you can use the non-blocking IO features provided by Java's NIO packages. Jean-Francois Arcand's blog contains a lot of good tips learned from creating the Grizzly connector for Glassfish.
Well, on 32-bit Windows, for example, there is a maximum number of native threads you can create: the number of threads times the per-thread stack size (the default is around 1 MB, or something like that) has to fit into roughly 2 GB of user address space. So with too many connections you might simply run out of virtual address space.
I think a compromise might work: use a thread pool with, e.g., 10 threads (depending on the machine) and distribute the connections evenly among them. Inside each thread, loop through the peers assigned to that thread. And limit the maximum number of connections.
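A rough sketch of that compromise, assuming each peer exposes one unit of work as a Runnable (a placeholder for whatever your real peer object does per iteration). Peers are dealt out round-robin to the workers, and each worker loops over its own share.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PeerPartitioning {

    // Deals the peers out round-robin into workerCount buckets.
    static <T> List<List<T>> partition(List<T> peers, int workerCount) {
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < peers.size(); i++) {
            buckets.get(i % workerCount).add(peers.get(i));
        }
        return buckets;
    }

    // Each worker makes ONE pass over its peers and the method returns the
    // number of peer steps executed. A real client would keep looping
    // until shutdown instead of stopping after one pass.
    static int serviceOnce(List<Runnable> peers, int workerCount) {
        ExecutorService pool = Executors.newFixedThreadPool(workerCount);
        AtomicInteger serviced = new AtomicInteger();
        for (List<Runnable> bucket : partition(peers, workerCount)) {
            pool.execute(() -> {
                for (Runnable peer : bucket) {
                    peer.run();
                    serviced.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return serviced.get();
    }

    public static void main(String[] args) {
        List<Runnable> peers = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            peers.add(() -> { /* talk to one peer for a short while */ });
        }
        System.out.println(serviceOnce(peers, 10) + " peers serviced");
    }
}
```

Note that with blocking sockets one slow peer stalls the others in its bucket, which is where the non-blocking IO suggestion comes in.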
Use a thread pool and you should be safe with a fairly large pool size (100 or so). CPU will not be a problem since this type of application is I/O bound.
You can easily make the pool size configurable and put in a reasonable maximum, just to prevent memory-related issues with all the threads. Of course, that should only occur if all the threads are actually being used.