I want to launch a lot of tasks to run on a database of +-42Mio records. I want to run this in batches of 5000 records/time (results in 850 tasks).
I also want to limit the number of threads (to 16) java starts to do this for me and I am using the current code to accomplish this task:
ExecutorService executorService = Executors.newFixedThreadPool(16);
for (int j = 1; j < 900 + 1; j++) {
int start = (j - 1) * 5000;
int stop = (j) * 5000- 1;
FetcherRunner runner = new FetcherRunner(routes, start, stop);
executorService.submit(runner);
Thread t = new Thread(runner);
threadsList.add(t);
t.start();
}
Is this the correct way to do this? Particularly as I have the impression that java just fires away all tasks ...(FetcherRunner implements runnable)
The first part using ExecutorService looks good:
...
FetcherRunner runner = new FetcherRunner(routes, start, stop);
executorService.submit(runner);
The part with Thread should not be there, I am assuming you have it there just to show how you had it before?
Update:
Yes, you don't require the code after executorService.submit(runner), that is going to end up spawning a huge number of threads. If your objective is to wait for all submitted tasks to complete after the loop, then you can get a reference to Future when submitting tasks and wait on the Future, something like this:
ExecutorService executorService = Executors.newFixedThreadPool(16);
List<Future<Result>> futures = ..;
for (int j = 1; j < 900+ 1; j++) {
int start = (j - 1) * 5000;
int stop = (j) * 5000- 1;
FetcherRunner runner = new FetcherRunner(routes, start, stop);
futures.add(executorService.submit(runner));
}
for (Future<Result> future:futures){
future.get(); //Do something with the results..
}
Is this the correct way of working?
The first part is correct. But you shouldn't be creating and starting new Thread objects. When you submit the Runnable, the ExecutorService puts it on its queue, and then runs it when a worker thread becomes available.
.... I use the threadlist to detect when all my threads are finished so I can continue processing results.
Well if you do what you are currently doing, you are running each task twice. Worse still, the swarm of manually created threads will all try to run in parallel.
A simple way to make sure that all of the tasks have completed is to call awaitTermination(...) on the ExecutorService. (An orderly shutdown of the executor service will have the same effect ... if you don't intend to use it again.)
The other approach is to create a Future for each FetcherRunner's results, and attempt to get the result after all of the tasks have been submitted. That has the advantage that you can start processing early results before later ones have been produced. (However, if you don't need to ... or can't ... do that, using Futures won't achieve anything.)
You don't need to the part after the call to submit. The code you have that creates a Thread will result in 900 threads being created! Yowza. The ExecutorService has a pool of 16 threads and you can run 16 jobs at once. Any jobs submitted when all 16 threads are busy will be queued. From the docs:
Creates a thread pool that reuses a fixed number of threads operating
off a shared
unbounded queue. At any point, at most nThreads threads will be active processing tasks.
If additional tasks are submitted when all threads are active, they will wait in the
queue until a thread is available. If any thread terminates due to a failure during
execution prior to shutdown, a new one will take its place if needed to execute
subsequent tasks. The threads in the pool will exist until it is explicitly shutdown.
So there is no need for yet another thread. If you need to be notified after a task has finished you can have it call out. Other options are to cache all of the Future's returned from submit, and upon each task being finished you can check to see if all Future's are done. After all Future's are finished you can dispatch another function to run. But it will run ON one of the threads in the ExecutorService.
Changed from your code:
ExecutorService executorService = Executors.newFixedThreadPool(16);
for (int j = 1; j < 900 + 1; j++) {
int start = (j - 1) * 5000;
int stop = (j) * 5000 - 1;
FetcherRunner runner = new FetcherRunner(routes, start, stop);
executorService.submit(runner);
}
The best way would be to use countdownlatch as follows
ExecutorService executorService = Executors.newFixedThreadPool(16);
CountdownLatch latch = new CountdownLatch(900);
FetcherRunner runner = new FetcherRunner(routes, start, stop, latch);
latch.await();
in the FetcherRunner under finally block use latch.countDown(); code after await() will be executed only when all the tasks are completed.
Related
How can I asynchronously execute 20 Runnable tasks(or 1 task 20 times), using 5 CompletableFutures?
That's what I've got:
Runnable task = () -> {
long startTime = System.currentTimeMillis();
Random random = new Random();
while (System.currentTimeMillis() - startTime < 3000) {
DoubleStream.generate(() -> random.nextDouble())
.limit(random.nextInt(100))
.map(n -> Math.cos(n))
.sum();
}
System.out.println("Done");
};
for (int i = 0; i < 4; i++) {
CompletableFuture<Void> future1 = CompletableFuture.runAsync(task);
CompletableFuture<Void> future2 = CompletableFuture.runAsync(task);
CompletableFuture<Void> future3 = CompletableFuture.runAsync(task);
CompletableFuture<Void> future4 = CompletableFuture.runAsync(task);
CompletableFuture<Void> future5 = CompletableFuture.runAsync(task);
future1.get();
future2.get();
future3.get();
future4.get();
future5.get();
}
If I execute this code, I can see that it only runs 3 future.get() asynchronously:
3 and then 2 that's left during 1 for() iteration
So, I would like to do all 20 tasks, as asynchronously as possible
You can use allOf to run several tasks simultaneously as one. First I create a combined of 5 tasks (the same as in your question) but then I added 10 instead (and only loped twice) and got half the execution time.
for (int i = 0; i < 2; i++) {
CompletableFuture<Void> future1 = CompletableFuture.runAsync(task);
CompletableFuture<Void> future2 = CompletableFuture.runAsync(task);
// and so on until ten
CompletableFuture<Void> future10 = CompletableFuture.runAsync(task);
CompletableFuture<Void> combined = CompletableFuture.allOf(future1, future2, future3, future4, future5, future6, future7, future8, future9, future10);
combined.get();
}
The default executor of CompletableFuture is the common pool of the ForkJoinPool, which has a default target parallelism matching the number of CPU cores minus one. So if you have four cores, at most three jobs will get executed asynchronously. Since you are forcing a wait for completion every 5 jobs, you’ll get three parallel executions, followed by two parallel executions in every loop iteration.
If you want to get a particular execution strategy like parallelism of your choice, the best way is to specify a properly configured executor. Then, you should let the executor manage the parallelism instead of waiting in a loop.
ExecutorService pool = Executors.newFixedThreadPool(5);
for (int i = 0; i < 20; i++) {
CompletableFuture.runAsync(task, pool);
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.DAYS); // wait for the completion of all tasks
This allows five parallel jobs, but will let each of the five threads pick up a new job immediately after one completed, instead of waiting for the next loop iteration.
But when you say
So, I would like to do all 20 tasks, as asynchronously as possible
it’s not clear why you are enforcing a wait after scheduling five jobs at all. The maximum parallelism can be achieve via
ExecutorService pool = Executors.newCachedThreadPool();
for (int i = 0; i < 20; i++) {
CompletableFuture.runAsync(task, pool);
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.DAYS); // wait for the completion of all tasks
This may spawn as many threads as jobs, unless one job completes before all have been scheduled, as in this case the worker thread may pick up a new job.
But this logic doesn’t require a CompletableFuture at all. You can also use:
ExecutorService pool = Executors.newCachedThreadPool();
// schedule 20 jobs and return when all completed
pool.invokeAll(Collections.nCopies(20, Executors.callable(task)));
pool.shutdown();
But when your job does not involve I/O or any other kind of waiting resp. releasing the CPU, there is no point in creating more threads than CPU cores. A pool configured to the number of processors is preferable.
ExecutorService pool = Executors.newWorkStealingPool(
Runtime.getRuntime().availableProcessors());
// schedule 20 jobs at return when all completed
pool.invokeAll(Collections.nCopies(20, Executors.callable(task)));
pool.shutdown();
In your special case this likely runs slower as your jobs use the system time to appear running faster when having more threads than cores, but are actually doing less work then. But for ordinary computational task, this will improve the performance.
Set the following system property to the number of threads you want the common fork join pool to use:
java.util.concurrent.ForkJoinPool.common.parallelism
See ForkJoinPool
The reason being that you do not specify your own fork join pool when constructing your completable futures, so it implicitly uses
ForkJoinPool.commonPool()
See CompletableFurure
I have a fixed thread pool for scheduled job, now I just have only one scheduled job:
App.setSes(Executors.newScheduledThreadPool(1));
App.getSes().scheduleAtFixedRate(new SPCPollingTask(), 0, Integer.parseInt(SPCService.INSTANCE.getConfig("scheduleInterval")), TimeUnit.MINUTES);
And another thread pool for asynchronous task:
App.setAes(Executors.newFixedThreadPool(10, new ThreadFactory(){
#Override
public Thread newThread(Runnable r) {
Thread thread = new Thread(r);
thread.setPriority(Thread.MIN_PRIORITY);
return thread;
}
}));
In the SPCPollingTask, I submit asynchronous task to it like this:
if(total > 50){
int i = 51;
long c = System.currentTimeMillis();
do{
App.getAes().submit(new Async1225Task(bdhm, ccxh, accountNo, subAccountNo, beginDate, endDate, i));
i = i + 50;
}while(i <= total);
long n = System.currentTimeMillis();
ARE.getLog().debug("[perf-hint]--submit task to AES takes " + StringUtil.formatLong((n-c), "#") + " ms");
}
By doing this, I expect the scheduled task SPCPollingTask will always finish very quickly, but on my windows PC, it done at almost the end of the whole asynchronous task. I uploaded my code to a linux server and got better result but still not good enough.
How can I make sure my scheduled SPCPollingTask will always do it's job first, then asynchronous tasks can do their job?
I am running these on Tomcat 6 with JDK6.
The easiest way right now that would require the least tampering of your code is to add all the async tasks that generated in your scheduled job in a list, and then submit them all at the end of your scheduled jobs execution.
This is easily done:
At the beginning of your run() method of SPCPollingTask just add
List<Task> tasks = new ArrayList<Task>(80);
Replace all App.getAes().submit('any task goes here'); with
tasks.add('any task goes here');
At the end of the run method
App.getAes().invokeAll(tasks);
Bare in mind that invokeAll is ''blocking'', and so you should consider submitting all tasks in the list using a loop
I am trying to understand the ExecutorService in java. There is not much performance difference when I use 1 thread or 4 threads. I have a quad core CPU and I do not have any other process running.
ExecutorService exService = Executors.newFixedThreadPool(4);
exService.execute(new Test().new RunnableThread());
exService.awaitTermination(25, TimeUnit.SECONDS);
class RunnableThread implements Runnable {
#Override
public void run() {
StopWatch stopWatch = new StopWatch();
stopWatch.start();
long cnt = 0;
for (cnt = 0; cnt < 999999999; cnt++) {
try {
for (long j = 0; j < 20; j++){
x += j;
}
} catch (Exception e) {
e.printStackTrace();
}
}
stopWatch.stop();
System.out.println(stopWatch.getTime());
}
}
If my understanding is right, my task should have close to 4x performance improvement when I say newFixedThreadPool(4) right?
Unfortunately, there is no magic in the allocation of workload to threads.
Every task runs on its own thread. It does not somehow automatically get transformed into concurrent execution paths.
If you have only one task, the remaining three threads will be idle.
Multiple threads only speed up things if you can split your workload into multiple tasks that can run concurrently (and you have to do that splitting yourself).
If my understanding is right, my task should have close to 4x performance improvement when I say newFixedThreadPool(4) right?
Yes, if you're actually running 4 concurrent tasks.
Currently, you have a single task that you are submitting to the executor. Let's say that it takes 10 seconds. Even if you have 4 cores and 4 threads, Java will not be able to parallelize a single task. However, if you submit 4 independent tasks (that have no memory or lock contention), then you will see all of them complete in those 10 seconds that it took the 1 task.
I have 10 threads filling unique codes in 10 tables simultaneously. Each thread filling up million records. After sometimes 7 tables got filled up but the rest 3 are still filling up. I want to indulge the free 7 threads in filling up the tables simultaneously with the running 3 threads can this be done??
String noOfCodes = ""+((Integer.parseInt(totalOfCodes))/10);
ExecutorService executor = Executors.newFixedThreadPool(10);
for (int i = 0; i < 10; i++) {
String threadNo = ""+i;
Runnable worker = new CodeGeneratorDAO(pgmId, digits, points, validity, noOfCodes, product, threadNo);
executor.execute(worker);
resp.setSuccess(true);
}
executor.shutdown();
while (!executor.isTerminated()) {
}
System.out.println("Finished all threads");
A simple solution is to define a Runnable executing a smaller task that your current Runnable. Breaking down the tasks will smooth the overall execution time.
You say that your Runnable "fills up 1000 records", so define your Runnable as filling up 1 record and submit all your 10 * 1000 records to be updated to your ExecutorService:
ExecutorService executor = Executors.newFixedThreadPool(10);
for(Runnable oneRecordRunnable : allRunnables) {
executor.submit(oneRecordRunnable);
}
executor.shutdown();
executor.awaitTermination(1, TimeUnit.HOURS);
As a side note, I replaced your cpu-burning while(true) loop by the awaitTermination method.
I need to run some code for a predefined length of time, when the time is up it needs to stop. Currently I am using a TimerTask to allow the code to execute for a set amount of time but this is causing endless threads to be created by the code and is just simply not efficient. Is there a better alternative?
Current code;
// Calculate the new lines to draw
Timer timer3 = new Timer();
timer3.schedule(new TimerTask(){
public void run(){
ArrayList<String> Coords = new ArrayList<String>();
int x = Float.valueOf(lastFour[0]).intValue();
int y = Float.valueOf(lastFour[1]).intValue();
int x1 = Float.valueOf(lastFour[2]).intValue();
int y1 = Float.valueOf(lastFour[3]).intValue();
//Could be the wrong way round (x1,y1,x,y)?
Coords = CoordFiller.coordFillCalc(x, y, x1, y1);
String newCoOrds = "";
for (int j = 0; j < Coords.size(); j++)
{
newCoOrds += Coords.get(j) + " ";
}
newCoOrds.trim();
ClientStorage.storeAmmendedMotion(newCoOrds);
}
}
,time);
If you are using Java5 or later, consider ScheduledThreadPoolExecutor and Future. With the former, you can schedule tasks to be run after a specified delay, or at specified intervals, thus it takes over the role of Timer, just more reliably.
The Timer facility manages the execution of deferred ("run this task in 100 ms") and periodic ("run this task every 10 ms") tasks. However, Timer has some drawbacks, and ScheduledThreadPoolExecutor should be thought of as its replacement. [...]
A Timer creates only a single thread for executing timer tasks. If a timer task takes too long to run, the timing accuracy of other TimerTasks can suffer. If a recurring TimerTask is scheduled to run every 10 ms and another TimerTask takes 40 ms to run, the recurring task either (depending on whether it was scheduled at fixed rate or fixed delay) gets called four times in rapid succession after the long-running task completes, or "misses" four invocations completely. Scheduled thread pools address this limitation by letting you provide multiple threads for executing deferred and periodic tasks.
Another problem with Timer is that it behaves poorly if a TimerTask throws an unchecked exception. The Timer thread doesn't catch the exception, so an unchecked exception thrown from a TimerTask terminates the timer thread. Timer also doesn't resurrect the thread in this situation; instead, it erroneously assumes the entire Timer was cancelled. In this case, TimerTasks that are already scheduled but not yet executed are never run, and new tasks cannot be scheduled.
From Java Concurrency in Practice, section 6.2.5.
And Futures can be constrained to run at most for the specified time (throwing a TimeoutException if it could not finish in time).
Update
If you don't like the above, you can make the task measure its own execution time, as below:
int totalTime = 50000; // in nanoseconds
long startTime = System.getNanoTime();
boolean toFinish = false;
while (!toFinish)
{
System.out.println("Task!");
...
toFinish = (System.getNanoTime() - startTime >= totalTime);
}
[...] Currently I am using a TimerTask to allow the code to execute for a set amount of time [...]
The timer task will never stop the currently running task. In fact, it's only purpose is to restart the task over and over again.
There is no easy way of solving this without tight cooperation with the executing task. The best way is to let the task monitor it's own execution, and make sure that it returns (terminates) when its time is up.
If by stopping you mean the program has to exit, the solution is to create a thread for your processing and mark it as daemon, start it and in the main thread sleep for the time required, then simply return from the main() method.
Scratch that if by stopping, you mean just to stop the processing.
It should also be noted that generally you only need to create one Timer(). From the code snippet I would guess you are creating multiple Timer() objects.
The time in the schedule method is the time to run at, not how long to run for.
Consider putting a start time before the for loop & putting a break in the for loop if you have exceeded the time limit.
long startedAt = System.currentTimeMillis();
long finishedCorrectly = true;
for (int j = 0; j < Coords.size(); j++) {
newCoOrds += Coords.get(j) + " ";
if ((System.currentTimeMillis() - startedAt) > MAX_TIME_TO_RUN) {
finishedCorrectly = false;
break;
}
}