How to scale threads according to CPU cores? - java

I want to solve a mathematical problem with multiple threads in Java. My problem can be separated into work units that I want to have solved in several threads.
I don't want a fixed number of threads working on it, but instead a number of threads matching the number of CPU cores. My problem is that I couldn't find an easy tutorial on the internet for this; all I found were examples with a fixed number of threads.
How can this be done? Can you provide examples?

You can determine the number of processors available to the Java Virtual Machine by using the static Runtime method availableProcessors. Once you have determined the number of processors available, create that number of threads and split up your work accordingly.
Update: To further clarify, a Thread is just an object in Java, so you can create one just like you would create any other object. So, let's say that you call the above method and find that it returns 2 processors. Now you can write a loop that creates a new Thread, splits off the work for that thread, and starts the thread. Here is some pseudocode to demonstrate what I mean:
int processors = Runtime.getRuntime().availableProcessors();
for (int i = 0; i < processors; i++) {
    Thread yourThread = new AThreadYouCreated();
    // You may need to pass in parameters depending on what work
    // you are doing and how you set up your thread.
    yourThread.start();
}
For more information on creating your own thread, head to this tutorial. Also, you may want to look at Thread Pooling for the creation of the threads.
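To make the splitting concrete, here is a minimal, self-contained sketch of that idea; the array of numbers and the SumWorker class are invented for illustration, not part of the question:

import java.util.ArrayList;
import java.util.List;

public class SplitWork {
    // Hypothetical worker: sums one slice of a shared array.
    static class SumWorker extends Thread {
        private final long[] data;
        private final int from, to;
        volatile long result;

        SumWorker(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        public void run() {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            result = sum;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long[] data = new long[1000000];
        // ... fill data here ...
        int processors = Runtime.getRuntime().availableProcessors();
        int chunk = (data.length + processors - 1) / processors;

        List<SumWorker> workers = new ArrayList<SumWorker>();
        for (int i = 0; i < processors; i++) {
            int from = i * chunk;
            int to = Math.min(from + chunk, data.length);
            SumWorker w = new SumWorker(data, from, to);
            w.start();
            workers.add(w);
        }

        long total = 0;
        for (SumWorker w : workers) {
            w.join();           // wait for this slice to finish
            total += w.result;
        }
        System.out.println("Sum: " + total);
    }
}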

You probably want to look at the java.util.concurrent framework for this stuff too.
Something like:
ExecutorService e = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
// Do work using something like either
e.execute(new Runnable() {
    public void run() {
        // do one task
    }
});
or
Future<String> future = e.submit(new Callable<String>() {
    public String call() throws Exception {
        return null; // compute and return your result here
    }
});
future.get(); // Will block until the result is available
This is a lot nicer than coping with your own thread pools etc.

Option 1:
newWorkStealingPool from Executors
public static ExecutorService newWorkStealingPool()
Creates a work-stealing thread pool using all available processors as its target parallelism level.
With this API, you don't need to pass the number of cores to the ExecutorService.
Implementation of this API, from grepcode:
/**
 * Creates a work-stealing thread pool using all
 * {@link Runtime#availableProcessors available processors}
 * as its target parallelism level.
 * @return the newly created thread pool
 * @see #newWorkStealingPool(int)
 * @since 1.8
 */
public static ExecutorService newWorkStealingPool() {
    return new ForkJoinPool
        (Runtime.getRuntime().availableProcessors(),
         ForkJoinPool.defaultForkJoinWorkerThreadFactory,
         null, true);
}
Option 2:
newFixedThreadPool from Executors, or one of the other newXXX factory methods that return an ExecutorService
public static ExecutorService newFixedThreadPool(int nThreads)
Replace nThreads with Runtime.getRuntime().availableProcessors().
Option 3:
ThreadPoolExecutor
public ThreadPoolExecutor(int corePoolSize,
                          int maximumPoolSize,
                          long keepAliveTime,
                          TimeUnit unit,
                          BlockingQueue<Runnable> workQueue)
Pass Runtime.getRuntime().availableProcessors() as the maximumPoolSize parameter.
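As an illustration of Option 3 (a sketch only; the queue choice and keep-alive value below are assumptions, all classes are from java.util.concurrent):

int cores = Runtime.getRuntime().availableProcessors();
ThreadPoolExecutor executor = new ThreadPoolExecutor(
        cores,                                 // corePoolSize
        cores,                                 // maximumPoolSize
        0L, TimeUnit.MILLISECONDS,             // keepAliveTime (unused when core == max)
        new LinkedBlockingQueue<Runnable>());  // unbounded work queue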

Doug Lea (author of the concurrent package) has this paper which may be relevant:
http://gee.cs.oswego.edu/dl/papers/fj.pdf
The Fork Join framework has been added to Java SE 7. Below are a few more references:
http://www.ibm.com/developerworks/java/library/j-jtp11137/index.html
Article by Brian Goetz
http://www.oracle.com/technetwork/articles/java/fork-join-422606.html
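As a rough, self-contained illustration of the fork/join style those references describe (the array-summing task and the threshold value are invented for this sketch):

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10000;
    private final long[] data;
    private final int from, to;

    public SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                        // run the left half asynchronously
        long rightResult = right.compute(); // compute the right half in this thread
        return left.join() + rightResult;   // wait for the forked half
    }

    public static void main(String[] args) {
        long[] data = new long[1000000];
        // By default the pool's parallelism equals Runtime.availableProcessors()
        ForkJoinPool pool = new ForkJoinPool();
        System.out.println(pool.invoke(new SumTask(data, 0, data.length)));
    }
}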

The standard way is the Runtime.getRuntime().availableProcessors() method.
On most standard CPUs this returns the optimal thread count, which is not necessarily the physical core count (with hyper-threading, for instance, it reports the number of logical processors). Therefore this is what you are looking for.
Example:
ExecutorService service = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
Do NOT forget to shut down the executor service like this (or your program won't exit):
service.shutdown();
Here is just a quick outline of how to set up future-based multithreaded code (off-topic, for illustration):
CompletionService<YourCallableImplementor> completionService =
        new ExecutorCompletionService<YourCallableImplementor>(service);
ArrayList<Future<YourCallableImplementor>> futures = new ArrayList<Future<YourCallableImplementor>>();
for (String computeMe : elementsToCompute) {
    futures.add(completionService.submit(new YourCallableImplementor(computeMe)));
}
Then you need to keep track of how many results you expect and retrieve them like this:
try {
    int received = 0;
    while (received < elementsToCompute.size()) {
        Future<YourCallableImplementor> resultFuture = completionService.take();
        YourCallableImplementor result = resultFuture.get();
        received++;
    }
} finally {
    service.shutdown();
}

On the Runtime class, there is a method called availableProcessors(). You can use that to figure out how many CPUs you have. Since your program is CPU bound, you would probably want to have (at most) one thread per available CPU.

Related

Limiting the Q size on Java's 1.7 ForkJoinPool

We're doing some tasks that require external I/O, and are recursive. To accomplish this, we're prototyping a switch from the old ExecutorService to the ForkJoinPool. Our degree of parallelism will obviously be higher than our number of cores, since we'll spend most of our threads' time in I/O wait. We only have synchronous network APIs, so we don't have any other options here.
In the old ExecutorService, you could reject tasks so that they don't pile up by setting the queue size. In the ForkJoinPool, this doesn't seem possible, and it seems it expands to this value in the Oracle 1.7 implementation.
/**
* Maximum size for submission queue array. Must be a power of two
* less than or equal to 1 << (31 - width of array entry) to
* ensure lack of index wraparound, but is capped at a lower
* value to help users trap runaway computations.
*/
private static final int MAXIMUM_QUEUE_CAPACITY = 1 << 24; // 16M
This is a significantly larger queue than we want. Is there a fork/join pool implementation with the following features?
1) A facility to name threads as they're created. We have a couple of I/O worker pools, and it's useful for debugging to see which pool created and owns the thread.
2) The ability to set the max queue size. We actually want it to be 0 when scheduling the parent tasks. If there is no capacity in the scheduler and the parent task is started, we want it to run in the thread calling submit and attempting to schedule. This will give us an auto-throttle mechanism by slowing the caller down.
Thanks,
Todd
You can use the third constructor:
ForkJoinPool(int parallelism, ForkJoinPool.ForkJoinWorkerThreadFactory factory, Thread.UncaughtExceptionHandler handler, boolean asyncMode)
No. But the nice thing about open-source software is that you can create your own copy and modify it any way you want.
Regarding your first query:
You can implement ForkJoinPool.ForkJoinWorkerThreadFactory as below (note that the ForkJoinPool constructor expects a ForkJoinWorkerThreadFactory, not a plain ThreadFactory):
class SimpleThreadFactory implements ForkJoinPool.ForkJoinWorkerThreadFactory {
    private final String name;
    private final AtomicInteger count = new AtomicInteger();

    public SimpleThreadFactory(String name) {
        this.name = name;
    }

    public ForkJoinWorkerThread newThread(ForkJoinPool pool) {
        ForkJoinWorkerThread thread =
                ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool);
        thread.setName(name + "-" + count.incrementAndGet());
        return thread;
    }
}
And you can pass this factory to the ForkJoinPool constructor:
ForkJoinPool(int parallelism,
             ForkJoinPool.ForkJoinWorkerThreadFactory factory,
             Thread.UncaughtExceptionHandler handler,
             boolean asyncMode)
Regarding your second query:
ForkJoinPool does not provide a way to control the worker queue size.
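If the auto-throttling described in point 2 is the main goal and a plain ExecutorService is acceptable for the parent tasks, one possible sketch (all classes from java.util.concurrent, not a ForkJoinPool solution) is a ThreadPoolExecutor with no queue capacity and the CallerRunsPolicy, which runs a rejected task in the submitting thread and thereby slows the caller down:

int workers = Runtime.getRuntime().availableProcessors();
ThreadPoolExecutor throttled = new ThreadPoolExecutor(
        workers, workers,
        0L, TimeUnit.MILLISECONDS,
        new SynchronousQueue<Runnable>(),            // effectively a zero-length queue
        new ThreadPoolExecutor.CallerRunsPolicy());  // rejected tasks run in the submitting thread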

Weak performance of CyclicBarrier with many threads: Would a tree-like synchronization structure be an alternative?

Our application requires all worker threads to synchronize at a defined point. For this we use a CyclicBarrier, but it does not seem to scale well. With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)
EDIT: Synchronization happens very frequently, in the order of 100k to 1M times.
If synchronization of many threads is "hard", would it help to build a synchronization tree? Thread 1 waits for 2 and 3, which in turn wait for 4+5 and 6+7, respectively, etc.; after finishing, threads 2 and 3 wait for thread 1, threads 4 and 5 wait for thread 2, etc.
      1
     / \
    2   3
   / \ / \
  4  5 6  7
Would such a setup reduce synchronization overhead? I'd appreciate any advice.
See also this featured question: What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?
With more than eight threads, the synchronization overhead seems to outweigh the benefits of multithreading. (However, I cannot support this with measurement data.)
Honestly, there's your problem right there. Figure out a performance benchmark and prove that this is the problem, or risk spending hours / days solving the entirely wrong problem.
You are thinking about the problem in a subtly wrong way that tends to lead to very bad coding. You don't want to wait for threads, you want to wait for work to be completed.
Probably the most efficient way is a shared, waitable counter. When you make new work, increment the counter and signal the counter. When you complete work, decrement the counter. If there is no work to do, wait on the counter. If you drop the counter to zero, check if you can make new work.
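A minimal sketch of one variant of such a counter (the class and method names are made up; this version lets a coordinating thread wait until all outstanding work has completed):

public class WorkCounter {
    private int outstanding = 0;

    // Called when new work is created.
    public synchronized void add() {
        outstanding++;
    }

    // Called when a unit of work has been completed.
    public synchronized void done() {
        outstanding--;
        if (outstanding == 0) {
            notifyAll();
        }
    }

    // Blocks until all outstanding work has been completed.
    public synchronized void awaitAllDone() throws InterruptedException {
        while (outstanding > 0) {
            wait();
        }
    }
}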
If I understand correctly, you're trying to break your solution up into parts and solve them separately, but concurrently, right? Then have your current thread wait for those tasks? You want to use something like a fork/join pattern.
List<CustomThread> threads = new ArrayList<CustomThread>();
for (Something something : somethings) {
    threads.add(new CustomThread(something));
}
for (CustomThread thread : threads) {
    thread.start();
}
for (CustomThread thread : threads) {
    thread.join(); // Blocks until the thread is complete
}
List<Result> results = new ArrayList<Result>();
for (CustomThread thread : threads) {
    results.add(thread.getResult());
}
// do something with results.
In Java 7, there's even further support via a fork/join pool. See ForkJoinPool and its trail, and use Google to find one of many other tutorials.
You can recurse on this concept to get the tree you want, just have the threads you create generate more threads in the exact same way.
Edit: I was under the impression that you wouldn't be creating that many threads, so this is better for your scenario. The example won't be horribly short, but it goes along the same vein as the discussion you're having in the other answer, that you can wait on jobs, not threads.
First, you need a Callable for your sub-jobs that takes an Input and returns a Result:
public class SubJob implements Callable<Result> {
    private final Input input;

    public SubJob(Input input) {
        this.input = input;
    }

    public Result call() {
        // Actually process input here and return a result
        return JobWorker.processInput(input);
    }
}
Then to use it, create an ExecutorService with a fixed-size thread pool. This will limit the number of jobs you're running concurrently so you don't accidentally thread-bomb your system. Here's your main job:
public class MainJob extends Thread {
    // Adjust the pool to the appropriate number of concurrent
    // threads you want running at the same time
    private static final ExecutorService pool = Executors.newFixedThreadPool(30);

    private final List<Input> inputs;

    public MainJob(List<Input> inputs) {
        super("MainJob");
        this.inputs = new ArrayList<Input>(inputs);
    }

    public void run() {
        CompletionService<Result> compService = new ExecutorCompletionService<Result>(pool);
        List<Result> results = new ArrayList<Result>();
        int submittedJobs = inputs.size();
        for (Input input : inputs) {
            // Starts the job when a thread is available
            compService.submit(new SubJob(input));
        }
        try {
            for (int i = 0; i < submittedJobs; i++) {
                // Blocks until a job is completed
                results.add(compService.take().get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
        // Do something with results
    }
}
This will allow you to reuse threads instead of generating a bunch of new ones every time you want to run a job. The completion service will do the blocking while it waits for jobs to complete. Also note that the results list will be in order of completion.
You can also use Executors.newCachedThreadPool, which creates a pool with no upper limit (like using Integer.MAX_VALUE). It will reuse a thread if one is available and create a new one if all the threads in the pool are running a job. This may be desirable later if you start encountering deadlocks (because there are so many jobs waiting in the fixed thread pool that sub-jobs can't run and complete). This will at least limit the number of threads you're creating/destroying.
Lastly, you'll need to shut down the ExecutorService manually, perhaps via a shutdown hook, or the threads it contains will not allow the JVM to terminate.
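For example, one possible way to do that, assuming the static pool field from the sketch above, is to register a JVM shutdown hook:

Runtime.getRuntime().addShutdownHook(new Thread() {
    public void run() {
        pool.shutdown(); // stop accepting new jobs so the worker threads can exit
    }
});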
Hope that helps/makes sense.
If you have a generation task (like the example of processing columns of a matrix) then you may be stuck with a CyclicBarrier. That is to say, if every single piece of work for generation 1 must be done in order to process any work for generation 2, then the best you can do is to wait for that condition to be met.
If there are thousands of tasks in each generation, then it may be better to submit all of those tasks to an ExecutorService (ExecutorService.invokeAll) and simply wait for the results to return before proceeding to the next step. The advantage of doing this is eliminating context switching and the time/memory wasted on allocating hundreds of threads when the number of physical CPUs is limited.
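A sketch of that generation-by-generation pattern with invokeAll (the Result type and the buildTasksForGeneration helper are placeholders for whatever your tasks produce):

ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
List<Callable<Result>> generation = buildTasksForGeneration(); // hypothetical helper
// invokeAll blocks until every task of this generation has finished
List<Future<Result>> finished = pool.invokeAll(generation);
for (Future<Result> f : finished) {
    Result r = f.get(); // get() rethrows any task failure as an ExecutionException
    // accumulate r as input for the next generation
}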
If your tasks are not generational but instead more of a tree-like structure, in which only a subset needs to be complete before the next step can occur on that subset, then you might want to consider a ForkJoinPool, and you don't need Java 7 to do that. You can get a reference implementation for Java 6 from the JSR 166 project that produced the ForkJoinPool library code.
I also have another answer which provides a rough implementation in Java 6:
public class Fib implements Callable<Integer> {
    int n;
    Executor exec;

    Fib(final int n, final Executor exec) {
        this.n = n;
        this.exec = exec;
    }

    /**
     * {@inheritDoc}
     */
    @Override
    public Integer call() throws Exception {
        if (n == 0 || n == 1) {
            return n;
        }
        // Divide the problem
        final Fib n1 = new Fib(n - 1, exec);
        final Fib n2 = new Fib(n - 2, exec);
        // FutureTask only allows run to complete once
        final FutureTask<Integer> n2Task = new FutureTask<Integer>(n2);
        // Ask the Executor for help
        exec.execute(n2Task);
        // Do half the work ourselves
        final int partialResult = n1.call();
        // Do the other half of the work if the Executor hasn't
        n2Task.run();
        // Return the combined result
        return partialResult + n2Task.get();
    }
}
Keep in mind that if you have divided the tasks up too much and the unit of work being done by each thread is too small, there will be negative performance impacts. For example, the above code is a terribly slow way to compute Fibonacci numbers.

Creating User specified number of threads in Android

I'm doing a download application for Android. The downloading part is now successfully implemented and working.
What I need is to download the file in parallel segments. To be more clear, if the user specifies 8 segments, I want to create 8 threads and do the downloading.
So how will I be able to create 8 threads dynamically? Also, as I'm doing this for a phone, how will I be able to keep memory consumption to a minimum?
I have not worked with threads before, so I hope you can help me with this. Thank you for your time! :)
The most efficient way to create a fixed number of threads is to use the ExecutorService:
ExecutorService exec = Executors.newFixedThreadPool(8);
It's basically a fixed-size thread pool that takes a lot of the management burden from the developer.
Edit: So your flow should be something like this:
First, define your thread task class (each thread will execute the call method of its own task):
class ThreadTask implements Callable<Object> {
    public Object call() {
        // execute download
        ...
        return result;
    }
}
If you want to pass any parameters to the tasks, put some private fields in the class above and pass them through a constructor. Also, you can return any type from call, just change the type in the implements Callable<...> part.
When you want to fire off the threads, create the pool and submit the tasks:
ExecutorService exec = Executors.newFixedThreadPool(8);
List<Future<Object>> results = new ArrayList<Future<Object>>();

// submit tasks
for (int i = 0; i < 8; i++) {
    results.add(exec.submit(new ThreadTask()));
}
...
// stop the pool from accepting new tasks
exec.shutdown();

// wait for results
for (Future<Object> result : results) {
    Object obj = result.get();
}
Take a look at ExecutorService, in particular Executors.newFixedThreadPool(int i); this is an excellent way to handle threads in a system-friendly manner.
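As a rough sketch of the segmented-download idea (the URL handling, Range header arithmetic and in-memory buffering here are illustrative assumptions, not a complete downloader):

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Callable;

class SegmentDownloadTask implements Callable<byte[]> {
    private final String url;
    private final long start;
    private final long end; // inclusive byte offsets of this segment

    SegmentDownloadTask(String url, long start, long end) {
        this.url = url;
        this.start = start;
        this.end = end;
    }

    public byte[] call() throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        // Ask the server for just this byte range (the server must support ranges)
        conn.setRequestProperty("Range", "bytes=" + start + "-" + end);
        InputStream in = conn.getInputStream();
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        } finally {
            in.close();
            conn.disconnect();
        }
    }
}

Each of the user-specified segments would then be submitted to the fixed pool as shown above; on a phone it may be lighter on memory to stream each segment straight into a RandomAccessFile at its offset rather than buffering it in a byte array.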

java-Executor Framework

Please look at my following code....
private static final int NTHREDS = 10;
ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
while (rs.next()) {
    webLink = rs.getString(1);
    FirstName = rs.getString(2);
    MiddleName = rs.getString(3);
    Runnable worker = new MyRunnable(webLink, FirstName, MiddleName); // this interface has run method....
    executor.execute(worker);
}
//added
public class MyRunnable implements Runnable {
    MyRunnable(String webLink, String FirstName, String MiddleName) {
        ** Assigning Values...***
    }

    @Override
    public void run() {
        long sum = 0;
        **Calling method to crawl by passing those Values**
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
In this part, if the result set (rs) has 100 records, the executor is creating 100 threads... I need to run this process within 10 threads. I need your help to understand how to get control of the threads. If any thread has completed its task, it should process the next available task from the ResultSet. Is it possible to achieve this using the executor framework?
Thanks...
vijay365
The code you've already posted does this. Your code will not immediately spawn 100 threads. It will spawn 10 threads that consume tasks from a queue containing your Runnables.
From the Executors.newFixedThreadPool Javadocs:
Creates a thread pool that reuses a
fixed set of threads operating off a
shared unbounded queue.
Instead of using a static number of threads (10 in this case) you should determine the number dynamically:
final int NTHREADS = Runtime.getRuntime().availableProcessors();
Also, I don't get why you are calling Thread.sleep?
ResultSet is probably a JDBC query result.
This design is almost certain to be doomed to failure.
The JDBC interface implementations are not thread-safe.
ResultSets are scarce resources that should be closed in the same scope in which they were created. If you pass them around, you're asking for trouble.
Multi-threaded code is hard to write well and even harder to debug if incorrect.
You are almost certainly headed in the wrong direction with this design. I'd bet a large sum of money that you're guilty of premature optimization. You are hoping that multiple threads will make your code faster, but what will happen is ten threads time slicing on one CPU and taking the same time or longer. (Context switching takes time, too.)
A slightly better idea would be to load the ResultSet into an object or collection, close the ResultSet, and then do some multi-threaded processing on that returned object.
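A sketch of that approach (the Record class and the crawl method are placeholders for whatever the rows hold and the work to be done):

List<Record> records = new ArrayList<Record>();
while (rs.next()) {
    records.add(new Record(rs.getString(1), rs.getString(2), rs.getString(3)));
}
rs.close(); // done with JDBC before any threading starts

ExecutorService executor =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
for (final Record record : records) {
    executor.execute(new Runnable() {
        public void run() {
            crawl(record); // placeholder for the actual crawling work
        }
    });
}
executor.shutdown();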
Try executor.submit(worker);

Java: Best way to retrieve timings from multiple threads

We have 1000 threads that hit a web service and time how long the call takes. We wish for each thread to return its own timing result to the main application, so that various statistics can be recorded.
Please note that various tools were considered for this, but for various reasons we need to write our own.
What would be the best way for each thread to return the timing? We have considered two options so far:
1. Once a thread has its timing result, it calls a singleton that provides a synchronised method to write to a file. This ensures that each thread writes to the file in turn (although in an undetermined order, which is fine), and since the call is made after the timing result has been taken, being blocked waiting to write is not really an issue. When all threads have completed, the main application can then read the file to generate the statistics.
2. Using the Executor, Callable and Future interfaces.
Which would be the best way, or are there any other better ways ?
Thanks very much in advance
Paul
Use the latter method.
Your workers implement Callable. You then submit them to a threadpool, and get a Future instance for each.
Then just call get() on the Futures to get the results of the calculations.
import java.util.*;
import java.util.concurrent.*;

public class WebServiceTester {

    public static class Tester implements Callable<Integer> {
        public Integer call() {
            long start = System.currentTimeMillis();
            // Do your test here
            long end = System.currentTimeMillis();
            return (int) (end - start);
        }
    }

    public static void main(String args[]) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1000);
        Set<Future<Integer>> set = new HashSet<Future<Integer>>();
        for (int i = 0; i < 1000; i++) {
            set.add(pool.submit(new Tester()));
        }
        List<Integer> results = new ArrayList<Integer>();
        for (Future<Integer> future : set) {
            results.add(future.get());
        }
        // Manipulate results however you wish....
        pool.shutdown();
    }
}
Another possible solution I can think of would be to use a CountDownLatch (from the java concurrency packages), each thread decrementing it (flagging they are finished), then once all complete (and the CountDownLatch reaches 0) your main thread can happily go through them all, asking them what their time was.
The executor framework can be implemented here. The time processing can be done by the Callable object. The Future can help you identify if the thread has completed processing.
You could pass an ArrayBlockingQueue to the threads to report their results to. You could then have a file writing thread that takes from the queue to write to the file.
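A sketch of that queue-based variant (the poison-pill shutdown and the file handling are illustrative assumptions):

final BlockingQueue<Long> timings = new ArrayBlockingQueue<Long>(1000);
final Long POISON_PILL = Long.MIN_VALUE; // hypothetical sentinel that stops the writer

// Writer thread: drains timings and writes them out (file handling omitted)
Thread writer = new Thread(new Runnable() {
    public void run() {
        try {
            while (true) {
                Long timing = timings.take();
                if (timing.equals(POISON_PILL)) {
                    break;
                }
                // write timing to the file / statistics collector here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
});
writer.start();

// Each worker thread just does: timings.put(elapsedMillis);
// When all workers are done, the main thread does: timings.put(POISON_PILL);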
