I am looking for a good solution or probably an API to solve the following problem:
My application does a task in a loop, for example it sends e-mails etc. I need to limit the average rate of messages to for example 100 messages per second or 1000 messages per last minute ...
Now I am looking for an algorithm or an API which does exactly this task.
You can use a ScheduledExecutorService to schedule tasks for a given period of time.
For example, to schedule 100 tasks per second you can say:
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(nThreads);
scheduler.scheduleAtFixedRate(mailSender, 0, 10, TimeUnit.MILLISECONDS);
Obviously, you need to track how many tasks have executed and turn off the scheduler after the job is done.
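A minimal sketch of the bookkeeping mentioned above, assuming a fixed number of sends is known up front. The class name `BoundedRateSender` and the method `runLimited` are made up for illustration; only the `java.util.concurrent` calls are real API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of any library.
final class BoundedRateSender {

    /**
     * Runs task once every periodMillis until it has run total times,
     * then shuts the scheduler down. Returns the number of runs performed.
     */
    static int runLimited(Runnable task, int total, long periodMillis) {
        if (total <= 0) {
            return 0;
        }
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger executed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);

        scheduler.scheduleAtFixedRate(() -> {
            if (executed.get() >= total) {
                return; // job already complete; ignore a tick that fires before shutdown
            }
            try {
                task.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so a failed
                // send is swallowed here and still counted as one attempt.
            }
            if (executed.incrementAndGet() == total) {
                done.countDown();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        try {
            done.await(); // block the caller until the last run has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return executed.get();
    }
}
```

With a period of 10 ms this gives roughly 100 runs per second. The single scheduler thread means a slow task delays the following ones rather than overlapping with them.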
The token bucket algorithm is very easy to implement and use, yet very powerful. You can control the throughput at runtime and queue some requests to handle peaks.
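For illustration, here is one possible token bucket, written as a sketch rather than a reference implementation. The class `TokenBucket` and its method names are assumptions made for this example. Time is passed in explicitly (in nanoseconds) so the behaviour is easy to reason about; the no-argument overload uses `System.nanoTime()`. This version only rejects requests when the bucket is empty; queueing rejected requests is left to the caller.

```java
// Hypothetical class for illustration; not taken from any library.
final class TokenBucket {
    private final double capacity;   // maximum burst size
    private double tokensPerSecond;  // steady-state rate, adjustable at runtime
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double capacity, double tokensPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.tokensPerSecond = tokensPerSecond;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    /** Takes one token if available; returns false if the caller must wait. */
    synchronized boolean tryAcquire(long nowNanos) {
        refill(nowNanos);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    /** Convenience overload; pass System.nanoTime() to the constructor when using it. */
    boolean tryAcquire() {
        return tryAcquire(System.nanoTime());
    }

    /** Changes the rate at runtime; time already elapsed is credited at the old rate. */
    synchronized void setRate(double newTokensPerSecond, long nowNanos) {
        refill(nowNanos);
        this.tokensPerSecond = newTokensPerSecond;
    }

    private void refill(long nowNanos) {
        long elapsed = nowNanos - lastRefillNanos;
        if (elapsed <= 0) {
            return;
        }
        // Add the tokens earned since the last refill, capped at the bucket capacity.
        tokens = Math.min(capacity, tokens + elapsed * tokensPerSecond / 1_000_000_000.0);
        lastRefillNanos = nowNanos;
    }
}
```

For the question's numbers, something like `new TokenBucket(100, 100.0, System.nanoTime())` would allow bursts of up to 100 messages and a sustained 100 messages per second; the sending loop calls `tryAcquire()` before each message and backs off when it returns false.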
The simplest way I can think of is to delay sending each email depending on how many are waiting.
static final ScheduledThreadPoolExecutor service = new ScheduledThreadPoolExecutor(1);
static int ratePerSecond = ...
public static void execute(Runnable run) {
    // Delay each task in proportion to the number of tasks already waiting.
    int delay = 1000 * service.getQueue().size() / ratePerSecond;
    service.schedule(run, delay, TimeUnit.MILLISECONDS);
}
This ensures that the tasks are performed no closer together than the rate allows.
Guava has a RateLimiter class that does exactly that.
Related
I'm trying to schedule a task that requires ~2.25 sec to run every second. Thus I know 3 threads should be enough to handle the load. My code looks like this:
private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
scheduler.scheduleAtFixedRate(new Streamer(), 0, 1000, TimeUnit.MILLISECONDS);
Interestingly, this behaves like scheduleWithFixedDelay with a delay of 0 (threads finish every ~2.25 sec).
I know that scheduleAtFixedRate can be late if a thread runs late. But shouldn't giving the executor a larger thread pool solve this problem?
I could easily circumvent the problem by coding 3 executors to start every 3 seconds, but that would make administration unnecessarily difficult.
Is there an easier solution to this problem?
The scheduledExecutor automatically prevents overlapping of Task executions. So your subsequent executions will be delayed until after the previous one has finished.
Docs:
"If any execution of this task takes longer than its period, then subsequent executions may start late, but will not concurrently execute."
So you need to schedule 3 Tasks with 1) InitDelay 0 secs, 2) InitDelay 1 secs, 3) InitDelay 2 secs and for each of them a Period of 3 secs.
private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
final Streamer sharedStreamer = new Streamer();
scheduler.scheduleAtFixedRate(sharedStreamer, 0, 3, TimeUnit.SECONDS);
scheduler.scheduleAtFixedRate(sharedStreamer, 1, 3, TimeUnit.SECONDS);
scheduler.scheduleAtFixedRate(sharedStreamer, 2, 3, TimeUnit.SECONDS);
Mind that execution with the shared resource could lead to extended execution time, if there are large portions of synchronized code.
You can use threadPool and scheduler to achieve this effect:
Create a fixed thread pool (or another kind that fulfills your needs), let's say with 4 threads - the number of threads should be based on the time a single task takes to execute and the rate at which the tasks are scheduled, basically numberOfThreads = avgExecTimeOfSingleTask * frequency + someMargin;
then create a scheduler thread that, every second (or other desired period), adds a job to the fixed thread pool's queue. The tasks can then overlap.
This way of solving the problem has an additional advantage:
If your tasks start to take longer than 2.25 sec, or you would like to execute them with greater frequency, then all you have to do is add some threads to the thread pool, whereas with the other answer you need to recalculate and reschedule everything. So this approach is clearer and easier to maintain.
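The arrangement described in this answer could be sketched roughly as follows. The class `OverlappingDispatcher` and its parameters are invented for the example, and `Thread.sleep` stands in for the real work. The method returns the highest number of tasks observed running at the same time, which makes the overlap visible.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of any library.
final class OverlappingDispatcher {

    /**
     * A single scheduler thread hands one job to a fixed worker pool every
     * periodMillis, for a total of `submissions` jobs. Returns the maximum
     * number of jobs that were running concurrently.
     */
    static int maxConcurrency(int workers, int submissions, long periodMillis, long taskMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger submitted = new AtomicInteger();
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxRunning = new AtomicInteger();
        CountDownLatch finished = new CountDownLatch(submissions);

        Runnable job = () -> {
            // Record how many jobs are active right now.
            maxRunning.accumulateAndGet(running.incrementAndGet(), Math::max);
            try {
                Thread.sleep(taskMillis); // stand-in for the real work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                running.decrementAndGet();
                finished.countDown();
            }
        };

        // The ticker only submits; it never waits for a job, so jobs may overlap.
        ticker.scheduleAtFixedRate(() -> {
            if (submitted.getAndIncrement() < submissions) {
                pool.execute(job);
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        try {
            finished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            ticker.shutdownNow();
            pool.shutdownNow();
        }
        return maxRunning.get();
    }
}
```

For the question's case (a 2.25 s task every second) the formula above suggests at least 3 workers; with fewer workers the extra jobs simply wait in the pool's queue.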
I'm trying to set up a job that will run every x minutes/seconds/milliseconds/whatever and poll an Amazon SQS queue for messages to process. My question is what the best approach would be for this. Should I create a ScheduledThreadPoolExecutor with x number of threads and schedule a single task with scheduleAtFixedRate method and just run it very often (like 10 ms) so that multiple threads will be used when needed, or, as I am proposing to colleagues, create a ScheduledThreadPoolExecutor with x number of threads and then create multiple scheduled tasks at slightly offset intervals but running less often. This to me sounds like how the STPE was meant to be used.
Typically I use Spring/Quartz for this type of thing but that's out at this point.
So what are your thoughts?
I recommend that you use long polling on SQS, which makes your ReceiveMessage calls behave more like calls to take on a BlockingQueue (which means that you won't need to use a scheduled task to poll from the queue - you just need a single thread that polls in an infinite loop, retrying if the connection times out)
Well, it depends on the frequency of tasks. If you just have to poll at a regular interval and the interval is not very small, then ScheduledThreadPoolExecutor with scheduleAtFixedRate is a good alternative.
Otherwise I recommend using Netty's HashedWheelTimer. Under heavy load it performs better. Akka and Play use it for scheduling. This is because adding a task to an STPE takes O(log(n)), whereas HWT takes O(1).
If you have to use STPE, I recommend a single task at a fixed rate; otherwise it uses excess resources.
Long polling is like a blocking queue only for a max of 20 seconds, after which the call returns. Long polling is sufficient if that is the max delay required between poll cycles. Beyond that you will need a scheduled executor.
The number of threads really depends on how fast you can process the received messages. If you can process the messages really fast you need only a single thread. I have a setup as follows:
A SingleThreadScheduledExecutor with scheduleWithFixedDelay executes 5 mins after the previous completion.
In each execution, messages are retrieved in batches from SQS until there are no more messages to process (remember each batch receives a max of 10 messages).
The messages are processed and then deleted from the queue.
For my scenario a single thread is sufficient. If the backlog is increasing (for example, a network operation is required for each message which may involve waits), you might want to use multiple threads. If one processing node becomes resource constrained you could always start another instance (EC2 perhaps) to add more capacity.
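The setup listed above might look something like the sketch below. To keep it self-contained, an in-memory `Queue<String>` stands in for SQS and `receiveBatch` stands in for a receive call; neither is the AWS SDK API, and all names here (`BatchDrainer`, `receiveBatch`, `drain`, `start`) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch; the queue is a stand-in for SQS, not the real client.
final class BatchDrainer {
    static final int MAX_BATCH = 10; // a single receive returns at most 10 messages

    /** Stand-in for one receive call: removes and returns up to MAX_BATCH messages. */
    static List<String> receiveBatch(Queue<String> queue) {
        List<String> batch = new ArrayList<>();
        String message;
        while (batch.size() < MAX_BATCH && (message = queue.poll()) != null) {
            batch.add(message);
        }
        return batch;
    }

    /** Keeps receiving batches until one comes back empty; returns the number processed. */
    static int drain(Queue<String> queue, Consumer<String> handler) {
        int processed = 0;
        List<String> batch;
        while (!(batch = receiveBatch(queue)).isEmpty()) {
            for (String message : batch) {
                handler.accept(message); // with real SQS, delete the message after this succeeds
                processed++;
            }
        }
        return processed;
    }

    /** Runs drain, then waits `delay` after each completion before the next run. */
    static ScheduledFuture<?> start(ScheduledExecutorService scheduler, Queue<String> queue,
                                    Consumer<String> handler, long delay, TimeUnit unit) {
        return scheduler.scheduleWithFixedDelay(() -> drain(queue, handler), 0, delay, unit);
    }
}
```

A call such as `BatchDrainer.start(Executors.newSingleThreadScheduledExecutor(), queue, handler, 5, TimeUnit.MINUTES)` would mirror the 5-minute delay described above, assuming the queue passed in is thread-safe (for example a `ConcurrentLinkedQueue`).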
I am looking for a load balanced thread pool with no success so far. (Not sure whether load balancing is the correct wording).
Let me explain what I try to achieve.
Part 1:
I have jobs, each with 8 to 10 single tasks. On a 6-core CPU I let 8 threads work on these tasks in parallel, which seems to deliver the best performance. When one task is finished, another one can start. Once all ten tasks are finished, the complete job is done. Usually a job is done in 30 to 60 seconds.
Part two:
Sometimes, unfortunately, a job takes more than two hours. This is correct, due to the amount of data that has to be calculated.
The bad thing is that no other job can start while job1 is running (assuming that all threads have the same duration), because it is using all threads.
My First idea:
Have 12 threads, allow up to three jobs in parallel.
BUT: that means the CPU is not fully utilized when there is only 1 job.
I am looking for a solution that gives full CPU power to job one when there is no other job. But when another job needs to be started while one is already running, I want the CPU power allocated to both jobs. And when a third or fourth job shows up, I want the CPU power allocated fairly to all four jobs.
I appreciate your answers...
thanks in advance
One possibility might be to use a standard ThreadPoolExecutor with a different kind of task queue
public class TaskRunner {
private static class PriorityRunnable implements Runnable,
Comparable<PriorityRunnable> {
private Runnable theRunnable;
private int priority = 0;
public PriorityRunnable(Runnable r, int priority) {
this.theRunnable = r;
this.priority = priority;
}
public int getPriority() {
return priority;
}
public void run() {
theRunnable.run();
}
public int compareTo(PriorityRunnable that) {
return Integer.compare(this.priority, that.priority); // avoids overflow from subtraction
}
}
private BlockingQueue<Runnable> taskQueue = new PriorityBlockingQueue<Runnable>();
private ThreadPoolExecutor exec = new ThreadPoolExecutor(8, 8, 0L,
TimeUnit.MILLISECONDS, taskQueue);
public void runTasks(Runnable... tasks) {
int priority = 0;
Runnable nextTask = taskQueue.peek();
if(nextTask instanceof PriorityRunnable) {
priority = ((PriorityRunnable)nextTask).getPriority() + 1;
}
for(Runnable t : tasks) {
exec.execute(new PriorityRunnable(t, priority));
priority += 100;
}
}
}
The idea here is that when you have a new job you call
taskRunner.runTasks(jobTask1, jobTask2, jobTask3);
and it will queue up the tasks in such a way that they interleave nicely with any existing tasks in the queue (if any). Suppose you have one job queued, whose tasks have priority numbers j1t1=3, j1t2=103, and j1t3=203. In the absence of other jobs, these tasks will execute one after the other as quickly as possible. But if you submit another job with three tasks of its own, these will be assigned priority numbers j2t1=4, j2t2=104 and j2t3=204, meaning the queue now looks like
j1t1, j2t1, j1t2, j2t2, etc.
This is not perfect however, because if all threads are currently working (on tasks from job 1) then the first task of job 2 can't start until one of the job 1 tasks is complete (unless there's some external way for you to detect this and interrupt and re-queue some of job 1's tasks). The easiest way to make things more fair would be to break down the longer-running tasks into smaller segments and queue those as separate tasks - you need to get to a point where each individual job involves more tasks than there are threads in the pool, so that some of the tasks will always start off in the queue rather than being assigned directly to threads (if there are idle threads then exec.execute() passes the task straight to a thread without going through the queue at all).
The easiest thing to do is to oversubscribe your CPU, as Kanaga suggests, but start 8 threads each. There may be some overhead from the competition, but if you get to a single job situation, it will fully utilize the CPU. The OS will handle giving time to each thread.
Your "first idea" would also work. The idle threads wouldn't take resources from 8 working threads if they aren't actually executing a task. This wouldn't distribute the cpu resources as evenly when there are multiple jobs running, though.
Do you have a setup where you can test these different pipelines to see how they're performing for you?
Since your machine has a 6-core CPU, I think it is better to have 6 worker threads for each job thread, so that whenever one thread gets a new job, it starts up to six parallel workers to work on that single job. This ensures the full CPU power is used when there is only one job at a time.
Also, please have a look at the Fork/Join framework in Java 7.
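As a rough illustration of the Fork/Join idea mentioned here, the sketch below splits one large piece of work into small subtasks that a shared pool can spread across cores. Summing an array is only a placeholder workload, and `SumTask`, `parallelSum` and the threshold value are made up for the example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task; the real job's tasks would replace the summing loop.
final class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // below this size, compute directly

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // compute the right half in this thread
    }

    /** Sums the array using the JVM-wide common pool. */
    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }
}
```

Because the work is cut into many small pieces, idle worker threads can steal pieces from busy ones, which is what keeps all cores occupied when only one job is running.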
Also learn about newCachedThreadPool():
Java newCachedThreadPool() versus newFixedThreadPool
I'm looking for a scalable "at" replacement, with high availability. It must support adding and removing jobs at runtime.
Some background:
I have an application where I trigger millions of events, each event occurs just once. I don't need cron like mechanisms (first Sunday of the month, etc), simply date, time and context.
Currently I'm using the Quartz scheduler, and while it is a very good project, it has difficulty handling the amount of events we throw at it, even after a lot of tweaking (sharding, increasing polling interval, etc.), due to the basic locking it performs on the underlying database. Also, it is a bit overkill for us, as basically we have millions of one-time triggers and a relatively small number of jobs.
I'd appreciate any suggestion
If I was facing the same scenario I would do the following...
Set up a JMS queue cluster (e.g. RabbitMQ or ActiveMQ) with queue replication across a few boxes.
Fire all the events at my nice highly-available JMS queue.
Then I would code an agent application that pops the events off the JMS queue as needed. I could run multiple agents on multiple boxes, combined with the correct JMS failover URL etc.
You could also use the same sort of model if your jobs are firing the events...
Fyi, a nicer way of scheduling in core Java is as follows:
ScheduledExecutorService threadPool = Executors.newScheduledThreadPool(sensibleThreadCount);
threadPool.scheduleAtFixedRate(new Runnable() {
@Override
public void run() {
//Pop events from event queue.
//Do some stuff with them.
}
}, initialDelay, period, TimeUnit.X);
Maybe just use a JGroups shared tree with tasks sorted by execution time. Nodes will take the first task and schedule a timer, which will execute at the given time. When a task is removed, its timer can be canceled.
So basically you can use just JGroups and simple java Timers/Executors.
I didn't read it whole, but here is some proof of concept or maybe even solution
how about the java timer?
import java.util.*;
public class MyTask extends TimerTask{
public void run(){
System.out.println( "do it!" );
}
}
and then
Timer timer = new Timer();
MyTask job1 = new MyTask();
// once after 2 seconds
timer.schedule( job1, 2000 );
job1.cancel();
MyTask job2 = new MyTask();
// every 5 seconds, after an initial delay of 1 second
timer.schedule ( job2, 1000, 5000 );
I have a ScheduledExecutorService that gets tasks for periodic execution:
scheduler = Executors.newScheduledThreadPool( what size? );
public addTask(ScheduledFuture<?> myTask, delay, interval) {
myTask = scheduler.scheduleAtFixedRate(new Runnable() {
// doing work here
},
delay,
interval,
TimeUnit.MILLISECONDS );
}
The number of tasks the scheduler gets depends solely on the user of my program. Normally it should be a good idea, afaik, to make the thread pool size equal to the number of CPU threads, so that each CPU or CPU thread executes one task at a time, because this should give the best throughput. But what should I do if the tasks involve I/O (as they do in my program)? The tasks in my program grab data from a server on the internet and save it in a db. That means most of the time they are waiting for the data to come in (aka idle). So what would be the best solution for this problem?
It really depends on the exact context:
How many tasks will be added? (You've said it's up to the user, but do you have any idea? Do you know this before you need to create the pool?)
How long does each of them take?
Will they be doing any intensive work?
If they're all saving to the same database, is there any concurrency issue there? (Perhaps you want to have several threads fetching from different servers and putting items in a queue, but only one thread actually storing data in the database?)
So long as you don't get "behind", how important is the performance anyway?
Ultimately I strongly suspect you'll need to benchmark this yourself - it's impossible to give general guidance without more information, and even with specific numbers it would be mostly guesswork. Hard data is much more useful :)
Note that the argument to newScheduledThreadPool only specifies the number of core threads to keep in the thread pool if threads are idle - so it's going to be doing a certain amount of balancing itself.
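One arrangement raised in the questions above, several threads fetching and a single thread storing, could be sketched as below. This is only an illustration under assumptions: `FetchAndStore` and `run` are invented names, the "fetch" is a string concatenation standing in for a network call, and the "database" is a plain list that only the writer thread touches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class, not part of any library.
final class FetchAndStore {

    /** Fetches from every source on a pool of `fetchers` threads; one writer stores the results. */
    static List<String> run(List<String> sources, int fetchers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ExecutorService fetchPool = Executors.newFixedThreadPool(fetchers);
        List<String> stored = new ArrayList<>(); // touched only by the writer thread

        // Single writer: serializes all "database" access, so no locking is needed there.
        Thread writer = new Thread(() -> {
            try {
                for (int i = 0; i < sources.size(); i++) {
                    stored.add(queue.take()); // blocks until a fetcher delivers a result
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        for (String source : sources) {
            // Stand-in for a slow network fetch; real code would block on I/O here.
            fetchPool.execute(() -> queue.add("data from " + source));
        }
        fetchPool.shutdown();

        try {
            writer.join(); // after join, the writer's changes to `stored` are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return stored;
    }
}
```

Since the fetchers spend most of their time waiting, `fetchers` can reasonably be larger than the number of CPU threads; the right value is something to measure, as the answer says.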