I apologise for the length of this problem, but I thought it important to include sufficient detail given that I'm looking for a suitable approach to my problem, rather than a simple code suggestion!
General description:
I am working on a project that requires tasks to be 'scheduled' at some relative, repeating interval.
These intervals are in terms of some internal time, represented as an integer that is incremented as the program executes (so it is not equal to real time). Each time this happens, the schedule will be interrogated to check for any tasks due to execute at this timestep.
If a task is executed, it should then be rescheduled to run again at a position relative to the current time (e.g. in 5 timesteps). This relative position is simply stored as an integer property of the Task object.
The problem:
I am struggling somewhat to decide how I should structure this, partly because it is a slightly difficult topic to find good search terms for.
As it stands, I am thinking that each time the timer is incremented I need to:
Execute tasks at the '0' position in the schedule
Re-add those tasks to the schedule at their relative position (e.g. a task that repeats every 5 steps will be returned to position 5)
Each group of tasks in the schedule will have its 'time until execution' decremented by one (e.g. a task at position 1 will move to position 0)
Assumptions:
There are a couple of assumptions that may limit the possible solutions I can use:
The interval must be relative, not a specific time, and is defined to be an integer number of steps from the current time
These intervals may take any integer value, i.e. they are not bounded.
Multiple tasks may be scheduled for the same timestep, but their order of execution is not important
All execution should remain in a single thread; multi-threaded solutions are not suitable due to other constraints
The main questions I have are:
How could I design this Schedule to work in an efficient manner? What datatypes/collections may be useful?
Is there another structure/approach I should consider?
Am I wrong to dismiss scheduling frameworks (e.g. Quartz), which appear to work in the 'real' time domain rather than a 'non-real' time domain?
Many thanks for any help. Please feel free to comment if you need further information; I will edit wherever needed!
Well, Quartz is quite a powerful tool, but it has limited configuration possibilities, so if you need specific features you should probably write your own solution.
However, it's a good idea to study the Quartz source code and data structures, because they have successfully dealt with many of the problems you would face, e.g. inter-process synchronization at the database level, running delayed tasks, etc.
I once wrote my own scheduler, adapted to tasks for which Quartz would probably not have been easy to adapt, but once I had learned Quartz I understood how much I could have improved my solution, knowing how it was done in Quartz.
How about this? It uses your own ticks via executeNextInterval():
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
public class Scheduler {
private LinkedList<Interval> intervals = new LinkedList<Scheduler.Interval>();
public void addTask(Runnable task, int position) {
if(position<0){
throw new IllegalArgumentException();
}
while(intervals.size() <= position){
intervals.add(new Interval());
}
Interval interval = intervals.get(position);
interval.add(task);
}
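public void executeNextInterval(){
// Nothing scheduled yet: treat it as an empty tick instead of throwing NoSuchElementException
if (intervals.isEmpty()) {
return;
}
Interval current = intervals.removeFirst();
current.run();
}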
private static class Interval {
private List<Runnable> tasks = new ArrayList<Runnable>();
public void add(Runnable task) {
tasks.add(task);
}
public void run() {
for (Runnable task : tasks) {
task.run();
}
}
}
}
You might want to add some error handling, but it should do the job.
And here are some unit tests for it :)
import org.junit.Assert;
import org.junit.Test;
public class TestScheduler {
private static class Task implements Runnable {
public boolean didRun = false;
public void run() {
didRun = true;
}
}
Runnable fail = new Runnable() {
@Override
public void run() {
Assert.fail();
}
};
@Test
public void queue() {
Scheduler scheduler = new Scheduler();
Task task = new Task();
scheduler.addTask(task, 0);
scheduler.addTask(fail, 1);
Assert.assertFalse(task.didRun);
scheduler.executeNextInterval();
Assert.assertTrue(task.didRun);
}
@Test
public void queueWithGaps() {
Scheduler scheduler = new Scheduler();
scheduler.addTask(fail, 1);
scheduler.executeNextInterval();
}
@Test
public void queueLonger() {
Scheduler scheduler = new Scheduler();
Task task0 = new Task();
scheduler.addTask(task0, 1);
Task task1 = new Task();
scheduler.addTask(task1, 1);
scheduler.addTask(fail, 2);
scheduler.executeNextInterval();
scheduler.executeNextInterval();
Assert.assertTrue(task0.didRun);
Assert.assertTrue(task1.didRun);
}
}
A circular linked list might be the data structure you're looking for. Instead of decrementing fields in each task element, you simply increment the index of the 'current' field in the circular list of tasks. A pseudocode structure might look something like this:
tick():
current = current.next()
for task : current.tasklist():
task.execute()
Any time you schedule a new task, you just add it at the position N ticks ahead of the current tick.
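A minimal Java sketch of the circular-list idea, assuming a fixed number of slots chosen by the caller (the `TickWheel` class and its `size` parameter are hypothetical). It uses an array-backed ring of task lists rather than linked nodes, which behaves the same. Note that a fixed-size ring can only hold delays smaller than its size, so the unbounded intervals from the question would need an extra overflow structure on top of this.

```java
import java.util.ArrayList;
import java.util.List;

class TickWheel {
    private final List<List<Runnable>> slots = new ArrayList<>();
    private int current = 0; // index of the slot for the current tick

    TickWheel(int size) {
        for (int i = 0; i < size; i++) {
            slots.add(new ArrayList<>());
        }
    }

    /** Schedule a task to run 'delay' ticks after the current tick (1 <= delay < size). */
    void schedule(Runnable task, int delay) {
        if (delay < 1 || delay >= slots.size()) {
            throw new IllegalArgumentException("delay out of range: " + delay);
        }
        slots.get((current + delay) % slots.size()).add(task);
    }

    /** Advance one tick, then run every task in the new current slot. */
    void tick() {
        current = (current + 1) % slots.size();
        List<Runnable> due = slots.get(current);
        // Give the slot a fresh list, so a task may reschedule itself while we iterate
        slots.set(current, new ArrayList<>());
        for (Runnable task : due) {
            task.run();
        }
    }
}
```

No per-task counter is ever decremented; only the `current` index moves, which is the point of the circular structure.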
Here are a couple of thoughts:
Keep everything simple. If you don't have millions of tasks, there is no need for an optimized data structure (except pride or the urge for premature optimization).
Avoid relative times. Use an absolute internal tick. When you add a task, set its "run next time" to the current tick value. Add it to the list and sort the list by time.
When looking for tasks, start at the head of the list, pick everything with a time <= current tick, and run it.
Collect all those tasks in another list. After all have run, calculate each "run next time" from the current tick and the increment (so you don't get tasks that loop), add them all back to the list, and sort.
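The steps above could look roughly like this in Java 8+. This is a sketch under assumptions: the `AbsoluteTickScheduler` and `Entry` names are hypothetical, intervals are assumed to be at least 1, and the list is simply re-sorted after each change, in the spirit of "keep everything simple".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class AbsoluteTickScheduler {
    /** A repeating task, its interval, and the absolute tick at which it should next run. */
    static final class Entry {
        final Runnable task;
        final long interval;
        long nextRun;

        Entry(Runnable task, long interval, long nextRun) {
            this.task = task;
            this.interval = interval;
            this.nextRun = nextRun;
        }
    }

    private static final Comparator<Entry> BY_NEXT_RUN = Comparator.comparingLong(e -> e.nextRun);

    private final List<Entry> entries = new ArrayList<>(); // kept sorted by nextRun
    private long tick = 0; // absolute internal time, only ever increases

    long currentTick() {
        return tick;
    }

    /** A new task gets "run next time" = current tick, then the list is re-sorted. */
    void add(Runnable task, long interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        entries.add(new Entry(task, interval, tick));
        entries.sort(BY_NEXT_RUN);
    }

    /** Run everything due at or before the current tick, reschedule it, then advance the clock. */
    void step() {
        // Take due entries off the head of the sorted list into a separate list
        List<Entry> due = new ArrayList<>();
        while (!entries.isEmpty() && entries.get(0).nextRun <= tick) {
            due.add(entries.remove(0));
        }
        for (Entry e : due) {
            e.task.run();
        }
        // Computed from the current tick rather than the old nextRun, so a late task
        // does not fire on several consecutive steps to catch up
        for (Entry e : due) {
            e.nextRun = tick + e.interval;
        }
        entries.addAll(due);
        entries.sort(BY_NEXT_RUN);
        tick++;
    }
}
```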
Take a look at the way DelayQueue uses a PriorityQueue to maintain such an ordered list of events. DelayQueue works using real time and hence can use the variable timed wait methods available in Condition and LockSupport. You could implement something like a SyntheticDelayQueue that behaves in the same way as DelayQueue but uses your own synthetic time service. You would obviously have to replace the timed wait/signalling mechanisms that come for free with the JDK, though, and this might be non-trivial to do efficiently.
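A rough sketch of such a `SyntheticDelayQueue`, assuming a single thread (as in the question), which lets it skip the timed wait/signalling machinery entirely: the caller just polls after each tick. The `SyntheticDelayed` interface and the `LongSupplier` clock are illustrative assumptions, not JDK types; only `PriorityQueue`, `Comparator` and `LongSupplier` are standard (Java 8+).

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.function.LongSupplier;

/** Hypothetical: an element that knows the synthetic tick at which it becomes due. */
interface SyntheticDelayed {
    long dueTick();
}

/**
 * DelayQueue-like ordering over a synthetic clock. Unlike java.util.concurrent.DelayQueue
 * there is no blocking take(); a single-threaded caller polls after advancing the clock.
 */
class SyntheticDelayQueue<E extends SyntheticDelayed> {
    private static final Comparator<SyntheticDelayed> BY_DUE_TICK =
            Comparator.comparingLong(SyntheticDelayed::dueTick);

    private final PriorityQueue<E> heap = new PriorityQueue<>(BY_DUE_TICK);
    private final LongSupplier clock; // supplies the current synthetic tick

    SyntheticDelayQueue(LongSupplier clock) {
        this.clock = clock;
    }

    void offer(E e) {
        heap.offer(e);
    }

    /** Returns the earliest element if it is due at the current tick, otherwise null. */
    E poll() {
        E head = heap.peek();
        if (head == null || head.dueTick() > clock.getAsLong()) {
            return null;
        }
        return heap.poll();
    }

    int size() {
        return heap.size();
    }
}
```

The heap gives O(log n) insertion and removal, so, unlike a plain list, nothing needs re-sorting after each tick.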
If I had to do it, I'd create a simple queue (a linked-list variant). Each node of this queue would contain a dynamic data structure (a simple list, for example) holding all the tasks that need to be done at that step. At each time-step, the process reads the first node of the queue and executes the tasks it finds in that node's list. At the end of each execution it computes the rescheduling and adds the task to the appropriate later node in the queue, creating nodes up to that position if necessary. The first node is then removed, and the second node (now the first) is executed at the next time-step. This system also does not require keeping track of any per-task countdown integers, and all the data structures needed are in the Java standard library. This should solve your problem.
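A sketch of that queue-of-nodes design with repeating tasks, assuming each task carries its own interval of at least 1 step (the `RepeatingQueueScheduler` and `RepeatingTask` names are hypothetical). One small deviation from the description: the first node is removed *before* its tasks run rather than after. This gives the same positions and lets a running task safely schedule further work.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class RepeatingQueueScheduler {
    /** A task plus the number of steps between runs (the "relative position" in the question). */
    static final class RepeatingTask {
        final Runnable action;
        final int interval;

        RepeatingTask(Runnable action, int interval) {
            if (interval < 1) {
                throw new IllegalArgumentException("interval must be >= 1");
            }
            this.action = action;
            this.interval = interval;
        }
    }

    // Node i holds the tasks that run on the (i + 1)-th call to step() from now
    private final LinkedList<List<RepeatingTask>> nodes = new LinkedList<>();

    /** Put a task into the node at the given position, creating empty nodes up to it. */
    void schedule(RepeatingTask task, int position) {
        if (position < 0) {
            throw new IllegalArgumentException("position must be >= 0");
        }
        while (nodes.size() <= position) {
            nodes.add(new ArrayList<>());
        }
        nodes.get(position).add(task);
    }

    /** One time-step: take the first node, run its tasks, and re-queue each one. */
    void step() {
        if (nodes.isEmpty()) {
            return; // nothing scheduled at all
        }
        // Removing the node first shifts every other node one step closer, and lets a
        // running task call schedule() without touching the list being iterated
        List<RepeatingTask> due = nodes.removeFirst();
        for (RepeatingTask t : due) {
            t.action.run();
            // After the removal, index (interval - 1) is exactly 'interval' steps from this one
            schedule(t, t.interval - 1);
        }
    }
}
```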
Use a ScheduledExecutorService. It has everything you need built right in. Here's how simple it is to use:
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Create a single-threaded ScheduledExecutorService
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1); // 1 thread
// Schedule something to run in 10 seconds' time
scheduler.schedule(new Runnable() {
    public void run() {
        // Do something
    }
}, 10, TimeUnit.SECONDS);
// Schedule something to run in 2 hours' time
scheduler.schedule(new Runnable() {
    public void run() {
        // Do something else
    }
}, 2, TimeUnit.HOURS);
// etc. as you need
Related
Is there a Java/Scala collection (List, Array or something similar) that automatically updates its whole content after a given timeout, with the additional feature that the update happens in another thread and the old data remains available while it is being updated?
My use case is enriching a Flink stream with a periodically updated collection whose data I get from PostgreSQL. I do not need to query the database for each message; I do a full data download (SELECT * FROM table) every N minutes. And I don't want processing to pause while the data is being updated.
Viktor,
I think caching is one of the best solutions you can use. But if you are not familiar with caching, the following answer will help you.
Solution
You can use the TimerTask class to run a specific method on a schedule. The following simple example will give you an idea.
ListUpdater (a TimerTask subclass)
import java.util.TimerTask;
public class ListUpdater extends TimerTask{
@Override
public void run() {
// You can write your code to update your list.
System.out.println("Updating list");
}
}
Application Class
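import java.util.Timer;
import java.util.TimerTask;

public class Test {
    public static void main(String[] args) {
        TimerTask task = new ListUpdater();
        Timer timer = new Timer();
        // First run after 1000 ms, then every 6000 ms
        timer.schedule(task, 1000, 6000);
    }
}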
In this example, the timer will print the “Updating list” message every 6 seconds, with a 1 second delay for the first time of execution.
There are a lot of thread-safe collections in Java. You are free to use any of them.
ConcurrentLinkedQueue
ArrayBlockingQueue
LinkedBlockingDeque
LinkedBlockingQueue
PriorityBlockingQueue
SynchronousQueue
DelayQueue
Or to create a brand-new thread-safe list:
List<String> newList = Collections.synchronizedList(new ArrayList<String>()); // String is just an example element type
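A variation on the TimerTask approach that targets the "old data stays available during the update" requirement more directly: rather than mutating one shared list in place, the task builds a complete new list and then swaps a reference to it atomically. This is a sketch only. `RefreshingList` is a hypothetical name, and the `loader` supplier stands in for your own `SELECT * FROM table` code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Rebuilds a list off to the side, then publishes it with a single atomic swap. */
class RefreshingList<T> extends TimerTask {
    private final AtomicReference<List<T>> current =
            new AtomicReference<List<T>>(Collections.<T>emptyList());
    private final Supplier<List<T>> loader; // hypothetical: e.g. runs the SELECT * query

    RefreshingList(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    @Override
    public void run() {
        try {
            // Build the new snapshot completely before publishing it
            List<T> fresh = Collections.unmodifiableList(new ArrayList<T>(loader.get()));
            current.set(fresh);
        } catch (RuntimeException e) {
            // Keep serving the previous snapshot; an uncaught exception would also
            // terminate the Timer's thread and stop all further refreshes
            System.err.println("Reload failed, keeping old data: " + e);
        }
    }

    /** Readers call this per message; it never blocks and never sees a half-built list. */
    List<T> snapshot() {
        return current.get();
    }

    /** Start refreshing immediately and then every periodMillis on a daemon Timer thread. */
    Timer start(long periodMillis) {
        Timer timer = new Timer("list-refresher", true);
        timer.schedule(this, 0, periodMillis);
        return timer;
    }
}
```

A reader that is halfway through an old snapshot keeps using it undisturbed, because each published list is never modified again.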
Using Java 7, I am trying to build a watcher that watches a data store (some collection type) and then returns certain items from it at certain points.
In this case they are timestamps: when a timestamp passes the current time, I want it to be returned to the starting thread. Please see the code below.
@Override
public void run() {
while (!data.isEmpty()) {
for (LocalTime dataTime : data) {
if (new LocalTime().isAfter(dataTime)) {
// return a result but continue running
}
}
}
}
I have read about Futures and Callables, but they seem to stop the thread on a return.
I do not particularly want to return a value, stop the thread and then start another task when using a Callable, unless that is the best way.
What are the best techniques for this? There seems to be a wide range of ways of doing it.
Thanks
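You can put the intermediate results in a BlockingQueue so that they are available to consumer threads as soon as they are produced:
private final LinkedBlockingQueue<Result> results = new LinkedBlockingQueue<Result>();

@Override
public void run() {
    while (!data.isEmpty()) {
        Iterator<LocalTime> it = data.iterator();
        while (it.hasNext()) {
            LocalTime dataTime = it.next();
            if (new LocalTime().isAfter(dataTime)) {
                // Unbounded queue, so offer() always succeeds without blocking
                results.offer(new Result(dataTime)); // Result's constructor is illustrative
                it.remove(); // don't report the same timestamp again
            }
        }
    }
}

public Result takeResult() throws InterruptedException {
    return results.take(); // blocks until a result is available
}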
Consumer threads can simply call the takeResult method to use the intermediate results. The advantage of using a BlockingQueue is that you don't have to reinvent the wheel, since this looks like a typical producer-consumer scenario that can be solved with a blocking data structure.
Note: here, Result can be a POJO that represents the intermediate result.
You are on the right path, assuming proper synchronization is in place and you get all your timestamps on time :) Ideally you should choose a data structure that doesn't require you to scan through all the items. Choose something like a min-heap or a sorted list; then, as you iterate, delete each due element from the data store and put it on a BlockingQueue. Have a thread listening on this queue to proceed further.
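A minimal sketch of the min-heap plus BlockingQueue suggestion. To stay self-contained it assumes timestamps are plain `long` values (e.g. epoch milliseconds) instead of Joda-Time `LocalTime`, and the caller passes in "now" explicitly; `DueTimeWatcher` and `moveDue` are hypothetical names. Because the heap is ordered, each check only looks at the earliest entries instead of scanning everything.

```java
import java.util.PriorityQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class DueTimeWatcher {
    private final PriorityQueue<Long> pending = new PriorityQueue<>(); // min-heap: earliest first
    private final BlockingQueue<Long> due = new LinkedBlockingQueue<>();

    /** Producers register a timestamp to watch. */
    synchronized void add(long timestamp) {
        pending.add(timestamp);
    }

    /**
     * Moves every timestamp at or before 'now' from the heap to the blocking queue.
     * Stops at the first future timestamp, so it never scans the whole collection.
     */
    synchronized int moveDue(long now) {
        int moved = 0;
        while (!pending.isEmpty() && pending.peek() <= now) {
            due.add(pending.poll());
            moved++;
        }
        return moved;
    }

    /** Consumers call take() on this queue and block until something becomes due. */
    BlockingQueue<Long> dueQueue() {
        return due;
    }
}
```

The watcher thread would call `moveDue(System.currentTimeMillis())` periodically (ideally sleeping until the heap's earliest timestamp rather than spinning), while the starting thread blocks on `dueQueue().take()`.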
I wrapped a ThreadPoolExecutor in my own implementation of ExecutorService, just to send it every filesystem-writing task, so that they would be processed sequentially, one by one. (No need to harass this poor disk write head.)
The wrapper comes in handy by:
allowing me to inject this thread pool as a Guice singleton pretty much everywhere I need it
telling me in real-time how much more work there is left to do
This last feature is accomplished by the call to logUtils.writingHeartbeat(int), which logs a message about how many jobs are still in the queue if a "sufficient" amount of time has elapsed since the last log. It works pretty well with regard to writing logs at the desired intervals, but it always tells me there are 0 files remaining to write, which seems fishy given the execution times.
What am I doing wrong?
@Singleton
public class WritersThreadPool implements ExecutorService {
    private final ThreadPoolExecutor innerPool;
    private final LogUtils logUtils;

    @Inject
    public WritersThreadPool(LogUtils logUtils) {
        innerPool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        this.logUtils = logUtils;
    }

    @Override
    public Future<?> submit(final Runnable r) {
        return innerPool.submit(new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                r.run();
                logUtils.writingHeartbeat(innerPool.getQueue().size());
                return null;
            }
        });
    }

    (...) // Other implemented methods with no special behavior.
}
I share @chubbsondubs's opinion that there must be a synchronization issue elsewhere in the code.
My suggestions for verifying this were:
Try logging getTaskCount and getCompletedTaskCount.
This led to your observation that there is indeed only 1 task in the queue at any given time.
Instead of composition, extend ThreadPoolExecutor and use the afterExecute hook. That way you may be able to find out what is synchronizing when it should not be.
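A sketch of the "extend ThreadPoolExecutor and use afterExecute" suggestion. `HeartbeatPool` is a hypothetical name, and the `observedQueueSizes` list stands in for `logUtils.writingHeartbeat(int)` so the behaviour can be checked. `afterExecute(Runnable, Throwable)` is a real protected hook on `ThreadPoolExecutor` that runs on the worker thread right after each task.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class HeartbeatPool extends ThreadPoolExecutor {
    // Queue sizes seen after each task; a stand-in for logUtils.writingHeartbeat(int)
    final List<Integer> observedQueueSizes = new CopyOnWriteArrayList<>();

    HeartbeatPool() {
        // Same configuration as WritersThreadPool: one thread, unbounded queue
        super(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        // Counts only tasks still waiting; the task that just finished is not in the queue
        observedQueueSizes.add(getQueue().size());
    }
}
```

If several tasks are submitted while the single worker is busy, the recorded sizes count down (e.g. 3, 2, 1, 0). So a constant 0 means tasks are not actually piling up in the queue at the moment each one finishes.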
So I think the problem is that you check the queue size sequentially, after the submitted Runnable has fully run. The Runnable has completed its work and only then is the queue size checked, which is going to be empty unless you had exhausted the threads in innerPool. In other words, there has to be something waiting in the queue for it to print anything other than 0. The current job is being run by a thread, so it's not in the queue.
I'm trying to implement an application that schedules tasks based on user input. Users can enter a number of IPs, each with an associated telnet command (a one-to-one relationship), a frequency of execution, and 2 groups (cluster, objectClass).
The user should be able to add/remove IPs, Clusters, commands, etc, at runtime. They should also be able to interrupt the executions.
This application should send the telnet commands to the IPs at the given frequency, wait for a response and save the response in a database. The problem I'm having is making all of this multithreaded: there are at least 60,000 IPs to telnet, and doing it in a single thread would take too much time. One thread should process a group of IPs in the same cluster with the same objectClass.
I've looked at Quartz to schedule the jobs. With Quartz I tried to make a dynamic job that took a list of IPs (with commands), processed them and saved the results to the database. But then I ran into the problem of the different timers that users gave. The examples on the Quartz webpage are incomplete and don't go into much detail.
Then I tried to do it the old-fashioned way, using Java Threads, but I need exception handling and parameter passing, and plain Threads don't provide that. Then I discovered Callables and Executors, but I can't schedule tasks with Callables.
So now I'm stumped. What do I do?
OK, here are some ideas. Take with the requisite grain of salt.
First, create a list of all of the work that you need to do. I assume you have this in tables somewhere and you can make a join that looks like this:
cluster | objectClass | ip-address | command | frequency | last-run-time
this represents all of the work your system needs to do. For the sake of explanation, I'll say frequency can take the form of "1 per day", "1 per hour", "4 per hour", "every minute". This table has one row per (cluster,objectClass,ip-address,command). Assume a different table has a history of runs, with error messages and other things.
Now what you need to do is read that table, and schedule the work. For scheduling use one of these:
ScheduledExecutorService exec = Executors...
When you schedule something, you need to tell it how often to run (easy enough with the frequencies we've given) and an initial delay. If something is to run every minute and it last ran 4 min 30 s ago, the initial delay is zero. If something is to run every hour, then the initial delay is 60 min - 4.5 min = 55.5 min.
ScheduledFuture<?> handle = exec.scheduleAtFixedRate(...);
More complex types of scheduling are why things like Quartz exist, but basically you just need a way to resolve, given (schedule, last-run), the elapsed time until the next execution. If you can do that, then instead of scheduleAtFixedRate(...) you can use schedule(...) and schedule the next run of a task as that task completes.
Anyway, when you schedule something, you'll get a handle back to it:
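The initial-delay arithmetic from the previous paragraph, as a small Java sketch. It assumes times are epoch milliseconds and the frequency has already been converted to a period; `InitialDelay`, `initialDelayMillis` and `scheduleFromLastRun` are hypothetical helpers, while `scheduleAtFixedRate(Runnable, long, long, TimeUnit)` is the standard `ScheduledExecutorService` method.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class InitialDelay {
    private InitialDelay() {
    }

    /** Period minus the time since the last run, clamped at zero if the run is overdue. */
    static long initialDelayMillis(long periodMillis, long lastRunMillis, long nowMillis) {
        long sinceLastRun = nowMillis - lastRunMillis;
        return Math.max(0L, periodMillis - sinceLastRun);
    }

    /** Schedule at a fixed rate, picking up where the task's last run left off. */
    static ScheduledFuture<?> scheduleFromLastRun(ScheduledExecutorService exec, Runnable task,
                                                  long periodMillis, long lastRunMillis) {
        long delay = initialDelayMillis(periodMillis, lastRunMillis, System.currentTimeMillis());
        return exec.scheduleAtFixedRate(task, delay, periodMillis, TimeUnit.MILLISECONDS);
    }
}
```

With a last run 4 min 30 s ago, a one-minute period gives a delay of 0 and a one-hour period gives 3,330,000 ms (55.5 min), matching the examples above.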
ScheduledFuture<?> handle = exec.scheduleAtFixedRate(...);
Hold this handle in something that's accessible. For the sake of argument let's say it's a map by TaskKey. TaskKey is (cluster | objectClass | ip-address | command) together as an object.
Map<TaskKey,ScheduledFuture<?>> tasks = ...;
You can use that handle to cancel and schedule new jobs.
void cancelForCustomer(CustomerId id) {
    List<TaskKey> keys = db.findAllTasksOwnedByCustomer(id);
    for (TaskKey key : keys) {
        ScheduledFuture<?> f = tasks.remove(key); // also drop it from the map
        if (f != null) f.cancel(false); // false: let a run already in progress finish
    }
}
For parameter passing, create an object to represent your work. Create one of these with all the parameters you need.
class HostCheck implements Runnable {
    private final Address host;
    private final String command;
    private final int something;

    public HostCheck(Address host, String command, int something) {
        this.host = host; this.command = command; this.something = something;
    }
    ....
}
For exception handling, localize it all in your object:
class HostCheck implements Runnable {
    ...
    public void run() {
        try {
            check();
            scheduleNextRun(); // optionally, if fixed-rate doesn't work
        } catch (Exception e) {
            db.markFailure(task); // or however.
            // Point is, tell somebody about the failure.
            // You can use this to decide to stop scheduling checks for the host
            // or whatever, but just record the info now and use it to influence
            // future behavior in, er, the future.
        }
    }
}
OK, so up to this point I think we're in pretty good shape. Lots of detail to fill in, but it feels manageable. Now we get to some complexity, and that's the requirement that execution for each "cluster/objectClass" pair is serial.
There are a couple of ways to handle this.
If the number of unique pairs is low, you can just make a Map<ClusterObjectClassPair,ScheduledExecutorService>, making sure to create single-threaded executor services (e.g., Executors.newSingleThreadScheduledExecutor()). So instead of a single scheduling service (exec, above), you have a bunch. Simple enough.
If you need to control the amount of work you attempt concurrently, then you can have each HostCheck acquire a permit before execution. Have some global permit object:
public static final Semaphore permits = new java.util.concurrent.Semaphore(30);
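A sketch of that per-pair map, assuming the pair is identified by two strings. `ClusterObjectClassPair` is the answer's name, but its fields and the `PerPairExecutors` wrapper are hypothetical. The key class needs `equals`/`hashCode`, and `ConcurrentHashMap.computeIfAbsent` (Java 8+) creates each single-threaded executor lazily, exactly once per pair.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

/** Hypothetical key for a (cluster, objectClass) pair; equals/hashCode make it a valid map key. */
final class ClusterObjectClassPair {
    final String cluster;
    final String objectClass;

    ClusterObjectClassPair(String cluster, String objectClass) {
        this.cluster = Objects.requireNonNull(cluster);
        this.objectClass = Objects.requireNonNull(objectClass);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ClusterObjectClassPair)) return false;
        ClusterObjectClassPair other = (ClusterObjectClassPair) o;
        return cluster.equals(other.cluster) && objectClass.equals(other.objectClass);
    }

    @Override
    public int hashCode() {
        return Objects.hash(cluster, objectClass);
    }
}

class PerPairExecutors {
    private final Map<ClusterObjectClassPair, ScheduledExecutorService> executors =
            new ConcurrentHashMap<>();

    /** One single-threaded scheduler per pair, so all work for a pair runs serially. */
    ScheduledExecutorService forPair(ClusterObjectClassPair pair) {
        return executors.computeIfAbsent(pair, p -> Executors.newSingleThreadScheduledExecutor());
    }

    void shutdownAll() {
        for (ScheduledExecutorService exec : executors.values()) {
            exec.shutdown();
        }
    }
}
```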
And then
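class HostCheck implements Runnable {
    ...
    public void run() {
        try {
            permits.acquire();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // e.g. the check was cancelled while waiting
            return;
        }
        try {
            check();
            scheduleNextRun();
        } catch (Exception e) {
            // regular handling
        } finally {
            permits.release();
        }
    }
}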
You only have one thread per ClusterObjectClassPair, which serializes that work, and the permits just limit how many ClusterObjectClassPairs you talk to at a time.
I guess this turned into quite a long answer. Good luck.
We have 1000 threads that hit a web service and time how long the call takes. We wish for each thread to return their own timing result to the main application, so that various statistics can be recorded.
Please note that various tools were considered for this, but for various reasons we need to write our own.
What would be the best way for each thread to return its timing? We have considered two options so far:
1. Once a thread has its timing result, it calls a singleton that provides a synchronized method to write to the file. This ensures that each thread writes to the file in turn (although in an undetermined order, which is fine), and since the call is made after the thread has taken its timing, being blocked while waiting to write is not really an issue. When all threads have completed, the main application can then read the file to generate the statistics.
2. Using the Executor, Callable and Future interfaces
Which would be best, or are there any better ways?
Thanks very much in advance
Paul
Use the latter method.
Your workers implement Callable. You then submit them to a thread pool and get a Future instance for each.
Then just call get() on the Futures to get the results of the calculations.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WebServiceTester {
    public static class Tester implements Callable<Long> {
        public Long call() {
            long start = System.nanoTime();
            // Do your test here
            long end = System.nanoTime();
            return end - start; // elapsed time in nanoseconds
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1000);
        List<Future<Long>> futures = new ArrayList<Future<Long>>();
        for (int i = 0; i < 1000; i++) {
            futures.add(pool.submit(new Tester()));
        }
        // A List rather than a Set, so identical timings are not collapsed
        List<Long> results = new ArrayList<Long>();
        for (Future<Long> future : futures) {
            results.add(future.get()); // blocks until that task has finished
        }
        pool.shutdown(); // let the JVM exit once all work is done
        // Manipulate results however you wish....
    }
}
Another possible solution I can think of would be to use a CountDownLatch (from the java concurrency packages), each thread decrementing it (flagging they are finished), then once all complete (and the CountDownLatch reaches 0) your main thread can happily go through them all, asking them what their time was.
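A sketch of the CountDownLatch variant. `TimedWorker` and `runAll` are hypothetical names, and the web-service call itself is left out. Each worker records its own time and counts down in a `finally` block. Everything a worker does before `countDown()` happens-before `await()` returns in the main thread, so the main thread can safely read the timings afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class TimedWorker implements Runnable {
    private final CountDownLatch done;
    private long elapsedNanos = -1; // -1 until the worker has recorded a time

    TimedWorker(CountDownLatch done) {
        this.done = done;
    }

    @Override
    public void run() {
        try {
            long start = System.nanoTime();
            // call the web service here (omitted in this sketch)
            elapsedNanos = System.nanoTime() - start;
        } finally {
            done.countDown(); // always count down, even if the call throws
        }
    }

    long elapsedNanos() {
        return elapsedNanos;
    }

    /** Start n workers, wait until all have counted down, then ask each for its time. */
    static List<Long> runAll(int n) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(n);
        List<TimedWorker> workers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            TimedWorker w = new TimedWorker(done);
            workers.add(w);
            new Thread(w).start();
        }
        done.await(); // blocks until the count reaches 0
        List<Long> timings = new ArrayList<>();
        for (TimedWorker w : workers) {
            timings.add(w.elapsedNanos());
        }
        return timings;
    }
}
```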
The executor framework can be used here. The timing can be done by the Callable object, and the Future can tell you whether the task has completed.
You could pass an ArrayBlockingQueue to the threads to report their results to. You could then have a file writing thread that takes from the queue to write to the file.
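A sketch of that queue-plus-writer-thread design. `ResultWriter` is a hypothetical name, timings are assumed to be `Long` values, and a sentinel value (`POISON`, also an assumption) tells the writer that no more results are coming. Workers only call `put`/`offer` on the shared `ArrayBlockingQueue`, so the file is written by exactly one thread and no synchronized singleton is needed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;

class ResultWriter implements Runnable {
    static final long POISON = Long.MIN_VALUE; // hypothetical sentinel: "no more results"

    private final BlockingQueue<Long> queue;
    private final Writer out;

    ResultWriter(BlockingQueue<Long> queue, Writer out) {
        this.queue = queue;
        this.out = out;
    }

    @Override
    public void run() {
        try {
            while (true) {
                long timing = queue.take(); // blocks until a worker reports a result
                if (timing == POISON) {
                    break;
                }
                out.write(timing + "\n");
            }
            out.flush();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In real use `out` would be a `BufferedWriter` over the results file, the queue would be created with something like `new ArrayBlockingQueue<Long>(1000)`, and the main thread would put `POISON` after all worker threads have finished.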