TL;DR
How can I model a process in which the duration of one task depends on the finish time of its predecessor task, using the chained-through-time pattern? For more details, please read the following description:
Planning Problem Description
First of all I want to describe my planning problem:
Constraints
A machine has x slots and can produce or cool different types of products in each of its slots.
A machine cannot produce and cool at the same time, but it can produce first and then cool afterwards.
A machine can produce only one type of product at a time, but it can produce different types of products one after the other.
A machine can produce multiple products of the same type at the same time.
A machine can cool different types of products at the same time.
The products are consumed at a target time
Produced products must be cooled until target time
Planning Goals
Minimize production time
Determine the latest time at which the production must be started
Examples of valid processes
The following picture illustrates two valid production processes. The Machines are on the right end of the process because the PlanningEntities are chained through time backwards starting with the target time.
Domain Model
I have modeled the problem with the chained-through-time pattern because the shortest duration of a production job is less than a minute, so I have to plan in seconds. That would probably be too fine-grained for the time grain pattern.
The following code is an excerpt from my domain model. The abstract class JobOrSlot is used to model the chain of jobs anchored by a slot.
@PlanningEntity
public abstract class JobOrSlot {
    @Id
    private UUID id = UUID.randomUUID();
    @InverseRelationShadowVariable(sourceVariableName = "previousJobOrSlot")
    protected Job nextJob;
    public abstract Integer getStartOffset();
}
@PlanningEntity
public class Job extends JobOrSlot {
    @PlanningVariable(valueRangeProviderRefs = { "slotRange", "jobRange" }, graphType = PlanningVariableGraphType.CHAINED)
    private JobOrSlot previousJobOrSlot;
    @AnchorShadowVariable(sourceVariableName = "previousJobOrSlot")
    private Slot slot;
    @ShadowVariable(variableListenerClass = EndTimeUpdatingVariableListener.class, sourceVariableName = "previousJobOrSlot")
    private Integer startOffset;
    @PiggybackShadowVariable(shadowVariableName = "startOffset")
    private Integer endOffset;
    // problem facts
    @Nullable
    private Job predecessor;
    @Nullable
    private Job successor;
    private Duration duration;
    private JobType type; // produce, hold
    // ... unnecessary problem facts, getters and setters omitted
}
The EndTimeUpdatingVariableListener.class iterates through the chains of jobs and calculates the start and end times of the jobs based on their duration and their position in the chain.
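To make the chain walk concrete, here is a minimal plain-Java sketch without any OptaPlanner types. ChainJob and ChainOffsets are hypothetical names, and I am assuming that offsets are seconds counted backwards from the target time, with the slot sitting at offset 0; the real listener may define them differently and would normally only update the part of the chain that changed.

```java
// Hypothetical stand-in for a job in the chain; not the OptaPlanner entity itself.
class ChainJob {
    final String name;
    final int durationSeconds;
    ChainJob next;   // next job further away from the slot, i.e. earlier in time
    int startOffset; // assumed: seconds before the target time at which the job starts
    int endOffset;   // assumed: seconds before the target time at which the job ends

    ChainJob(String name, int durationSeconds) {
        this.name = name;
        this.durationSeconds = durationSeconds;
    }
}

class ChainOffsets {
    // Walks the chain starting at the job next to the slot and assigns offsets.
    // The slot (anchor) is treated as offset 0, which is the target time.
    static void update(ChainJob firstAfterSlot) {
        int offset = 0;
        for (ChainJob job = firstAfterSlot; job != null; job = job.next) {
            job.endOffset = offset;                         // ends where the previous one starts
            job.startOffset = offset + job.durationSeconds; // starts one duration earlier
            offset = job.startOffset;
        }
    }
}
```

For a chain produce A (120 s) -> cool A (300 s) -> Slot, the cooling job gets offsets 300 to 0 and the production job 420 to 300.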
The Problem
The challenge is that the duration of the cooling job is not known before planning starts. It can only be calculated at the moment the production job is placed: it is targetTime - cookingJob.endTime. So placing a production job must change the duration of the cooling job, and I can't find a good way to model this in OptaPlanner. I have evaluated two possible solutions, but maybe you have a better idea.
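As a small worked example of that formula, assuming the end of production and the target time are available as java.time.Instant values (the class name CoolingDuration is made up for illustration):

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: derives the cooling duration instead of storing it as a fixed fact.
class CoolingDuration {
    static Duration between(Instant productionEnd, Instant targetTime) {
        if (productionEnd.isAfter(targetTime)) {
            throw new IllegalArgumentException("production ends after the target time");
        }
        // cooling has to bridge the gap between the end of production and consumption
        return Duration.between(productionEnd, targetTime);
    }
}
```

If production ends at 11:40 and the target time is 12:00, the cooling job has to last 20 minutes; moving the production job earlier makes the cooling job longer by the same amount.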
One solution would be to update the start and end times of the associated cooling job in the EndTimeUpdatingVariableListener every time its production job changes, but this could create effects that are hard to tackle. For example, placing another job between the production job and its cooling job, as in produce A -> someOtherJob -> cool A -> Slot, would create an infinite update loop. This could be tackled by implementing a MoveFilter to forbid such invalid moves, but it's still a pain.
Another solution could be to implement CustomMoves that add or remove cooling jobs from the planning problem. Every time a cooling job is added it gets the current valid duration. Hence the duration of the Entity is fixed during planning.
Both implementations get pretty complex. Is there a better way to model such a planning entity dependency in OptaPlanner?
Related
I would like to know whether the following use-case is intended to be solved with Optaplanner.
Problem description
Problem picture
There is a vehicle that needs to go from Start to End (location to location) over a mesh of nodes.
Nodes are connected with "nodeConnection" and there can be many nodeConnections between two nodes. NodeConnection also contains the price of that route.
I need a solution that tells me list of nodeConnections to take.
Number of NodeConnections that the vehicle will take is unknown. vehicle might go A - B and only take 1 NodeConnection, or go A - D and take 3 NodeConnections ( A-B connection B-C connection and then C-D connection)
What is required
Basically I'd like to give Optaplanner an input of 2 Nodes (locations) and a list of available NodeConnections and get back an "optimal" route for that input.
I'd like to know which route is cheapest, eg. in the provided image it's the one marked with a green line because total trip cost is 150.
What I tried
I thought that I could have a NodeConnection collection as a @PlanningProperty, however that is currently not supported by OptaPlanner.
Then I came across "planning variable not support Collect" and tried to have a RouteNodeConnection as a many-to-many relation between a Route and a NodeConnection, e.g. one route would have N NodeConnections. We do not know how many, though; we just want the "best" route.
The problem at the moment is "telling" OptaPlanner what we want and setting up the Constraints.
VRP repository seemed similar but there are differences:
I only use 1 vehicle, not an entire fleet
no need to go back to the starting point
no need to calculate various distances since NodeConnection prices/distances are already defined
I read about chained @PlanningVariables and it seems that approach must be taken here to create a chain of NodeConnections. However, I am not sure how to set up the constraints and get OptaPlanner to actually return the best route.
Is this something that can be done with OptaPlanner, or might a simpler solution be enough here? I looked at Dijkstra's algorithm to get the shortest path, but then we'd have to figure out how to "weigh" other constraints I will introduce later. OptaPlanner seems a better fit for this problem, especially since, other than price, we're most likely going to have an additional 5 hard and over 10 soft constraints that would be hard to "weigh" in graph algorithms.
Any tips on how to approach this problem are very welcome.
Classes for reference:
public class Node {
private String Name;
}
public class NodeConnection {
private Node StartPosition;
private Node EndPosition;
private BigDecimal Price;
}
public class Route {
private Node Start;
private Node End;
private BigDecimal RoutePrice;
private List<NodeConnection> NodeConnections;
}
// "Many-to-many" relation
public class RouteNodeConnection {
private Route Route;
private NodeConnection NodeConnection;
}
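For comparison, the price-only part of the problem can be sketched with Dijkstra's algorithm on a graph that allows several connections between the same two nodes. This is only a baseline under assumptions: Connection and CheapestRoute are hypothetical names, nodes are plain strings, connections are treated as one-way from start to end position, prices are non-negative, and none of the later hard or soft constraints are modeled.

```java
import java.math.BigDecimal;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical simplified NodeConnection with string node names.
class Connection {
    final String from;
    final String to;
    final BigDecimal price;

    Connection(String from, String to, BigDecimal price) {
        this.from = from;
        this.to = to;
        this.price = price;
    }
}

class CheapestRoute {
    // Returns the connections of the cheapest route, or an empty list
    // if end is unreachable (or equal to start).
    static List<Connection> find(String start, String end, List<Connection> connections) {
        Map<String, List<Connection>> outgoing = new HashMap<>();
        for (Connection c : connections) {
            outgoing.computeIfAbsent(c.from, k -> new ArrayList<>()).add(c);
        }
        Map<String, BigDecimal> best = new HashMap<>();  // cheapest known price per node
        Map<String, Connection> cameBy = new HashMap<>(); // connection used to reach a node
        PriorityQueue<Map.Entry<String, BigDecimal>> queue =
                new PriorityQueue<>(Map.Entry.<String, BigDecimal>comparingByValue());
        best.put(start, BigDecimal.ZERO);
        queue.add(new AbstractMap.SimpleEntry<>(start, BigDecimal.ZERO));

        while (!queue.isEmpty()) {
            Map.Entry<String, BigDecimal> head = queue.poll();
            String node = head.getKey();
            if (head.getValue().compareTo(best.get(node)) > 0) continue; // stale queue entry
            if (node.equals(end)) break;                                 // price of end is final
            for (Connection c : outgoing.getOrDefault(node, Collections.<Connection>emptyList())) {
                BigDecimal candidate = head.getValue().add(c.price);
                BigDecimal known = best.get(c.to);
                if (known == null || candidate.compareTo(known) < 0) {
                    best.put(c.to, candidate);
                    cameBy.put(c.to, c);
                    queue.add(new AbstractMap.SimpleEntry<>(c.to, candidate));
                }
            }
        }
        // Walk backwards from end to start to collect the route.
        LinkedList<Connection> route = new LinkedList<>();
        for (String at = end; cameBy.containsKey(at); at = cameBy.get(at).from) {
            route.addFirst(cameBy.get(at));
        }
        return route;
    }
}
```

With invented prices A-B 100, A-B 60, B-C 50, C-D 40 and A-D 200, the route from A to D uses the 60, 50 and 40 connections for a total of 150. Once constraints that depend on the whole route come into play, a plain shortest-path search like this stops being sufficient.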
I have a scenario where the existing code does the following task (a single Java file and class):
methodCopyDataBasedOnDates(StartDate, EndDate) {
GetDataFromSomewhere(); // Gets data from a Database table
ProcessData(); //Converts to JSON array
PasteSomewhereElse(); //Copies to Amazon S3
}
Now, since this single task takes a long time, I want to use Java multithreading as a solution.
I am planning to use the Executor framework like this:
LinkedBlockingQueue<whatever> allDateRangesQueue; // Contains all date ranges.
ExecutorService executorService = Executors.newFixedThreadPool(20);
while (!allDateRangesQueue.isEmpty())
{
    executorService.execute(new Runnable() {
        @Override
        public void run() {
            // datePair consists of a start date and an end date. E.g. if the input date range is one year, then this data structure consists of date ranges of 1 day each, 364 entries in the queue for the threads to pick up.
            final HashMap<Integer, String> datePair = allDateRangesQueue.poll();
            methodCopyDataBasedOnDates(datePair.get(0), datePair.get(1), threadName);
        }
    });
}
Question: I need a better design in terms of modularity and efficiency. Can someone suggest a design?
Have you had a look at Spring Batch? I think it does exactly what you are trying to do. I will try to find a good example and update this answer.
Found:
https://spring.io/guides/gs/batch-processing/ , https://www.mkyong.com/spring-batch/spring-batch-hello-world-example/
It works by asynchronous chunk processing: you define a reader, a processor and a writer, bind the three together, and they can work in parallel. Assuming, of course, that you can read the data in chunks (pagination), this can become a really powerful tool.
An 'extension' that I have never managed to understand and use, but which may become really useful if you have the time (since it seems to me that it meets your needs), is 'Spring XD'; I am afraid I largely failed to understand it. I have used Spring Batch, and it seems like an elegant, easy way to implement what you are asking for.
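To illustrate the reader / processor / writer split without pulling in Spring Batch, here is a rough sketch using only java.util.concurrent. ChunkPipeline is a hypothetical helper and not Spring Batch's API; it assumes each chunk (e.g. one date range) can be read independently and that the writer is thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper, not part of Spring Batch.
class ChunkPipeline<K, I, O> {
    private final Function<K, List<I>> reader; // loads one chunk, e.g. one date range
    private final Function<I, O> processor;    // converts one item, e.g. row to JSON
    private final Consumer<List<O>> writer;    // stores one processed chunk; must be thread-safe

    ChunkPipeline(Function<K, List<I>> reader, Function<I, O> processor, Consumer<List<O>> writer) {
        this.reader = reader;
        this.processor = processor;
        this.writer = writer;
    }

    // Runs read -> process -> write once per chunk key on a fixed pool and waits for all chunks.
    void runAll(List<K> chunkKeys, int threads) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (K key : chunkKeys) { // keys are handed out here, not polled inside the task
                futures.add(pool.submit(() -> {
                    List<O> out = new ArrayList<>();
                    for (I item : reader.apply(key)) {
                        out.add(processor.apply(item));
                    }
                    writer.accept(out);
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows the first failure as ExecutionException
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping the three steps as separate functions means each can be tested on its own, and swapping the hand-rolled pool for Spring Batch later only touches the wiring.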
My service endpoint receives a list of metrics every minute, along with their timestamps. If a metric passes certain conditions, we need to store it in a cache so that it can be accessed later. The access functions for this service are:
List<Metrics> GetAllInterestingMetrics5Mins();
List<Metrics> GetAllInterestingMetrics10Mins();
List<Metrics> GetAllInterestingMetrics30Mins();
My current solution is to use 3 Guava caches with time-based eviction set to 5, 10 and 30 minutes. When somebody calls one of the above functions, I return all the metrics from the relevant cache.
There are 2 problems with this:
Guava caches start timing for eviction from when the value is put in the cache (or accessed, depending on the setting). A metric can be delayed, though, so its timestamp may be earlier than the time at which it is put in the cache.
I don't like having to create 3 caches when one cache with 30 minutes should suffice; it increases the memory footprint and the complexity of cache handling.
Is there a way to solve these 2 problems in Guava or any other out-of-the-box caching solution?
There is an important difference between caching solutions like Guava and EHCache and what you are trying to implement. The sole purpose of these caches is to act the way getter functions do: they are intended to retrieve a single element by its key and store it for further use, evicting it once it stops being used.
E.g.
@Cacheable
public Object getter(String key){
    ...
}
That's why getting a whole set of objects from the cache feels a little like forcing the cache and the eviction policy to work differently from its original purpose.
What you need, instead of Guava cache (or other caching solutions), is a collection that can be evicted all at once by a timer function. Sadly, Guava doesn't provide that right now. You would still need a timer function provided by the application that would remove all existing elements from the cache.
So, my suggestion would be the following:
Even though it is possible to make Guava behave the way you want, you will find that you are not using the features that make Guava really valuable, and you are "forcing" it to behave differently. So I suggest you forget about the Guava implementation and consider using, for example, a specialization of the AbstractMap class, along with a timer function that evicts its contents every N seconds.
This way you will be able to have all your entries in a single cache and stop worrying about the discrepancies between the timestamp and the time the entry was added to the cache.
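One way this could look, as a sketch only: a single store keyed by the metric's own timestamp, backed by ConcurrentSkipListMap (which extends AbstractMap). TimestampedStore and its method names are hypothetical, and the current time is passed in explicitly so the timer and the tests can control it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical single store that replaces the three caches.
class TimestampedStore<M> {
    // Sorted by the metric's own timestamp, so delayed metrics land in the right place.
    private final ConcurrentSkipListMap<Long, Queue<M>> byTimestamp = new ConcurrentSkipListMap<>();
    private final long retentionMillis;

    TimestampedStore(long retentionMillis) {
        this.retentionMillis = retentionMillis;
    }

    void add(long timestampMillis, M metric) {
        byTimestamp.computeIfAbsent(timestampMillis, t -> new ConcurrentLinkedQueue<>()).add(metric);
    }

    // All metrics whose timestamp lies within the last windowMillis before nowMillis.
    List<M> since(long nowMillis, long windowMillis) {
        List<M> result = new ArrayList<>();
        for (Queue<M> bucket : byTimestamp.tailMap(nowMillis - windowMillis, true).values()) {
            result.addAll(bucket);
        }
        return result;
    }

    // Meant to be called from a timer; drops everything older than the retention period.
    void evictExpired(long nowMillis) {
        byTimestamp.headMap(nowMillis - retentionMillis, false).clear();
    }
}
```

The timer could be a ScheduledExecutorService calling evictExpired(System.currentTimeMillis()) every minute, and the three access functions become since(now, 5 min), since(now, 10 min) and since(now, 30 min) on the same store.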
Regarding Topic 1:
Just a side note: please do not confuse expiry and eviction. Expiry means the entry may no longer be returned by the cache, and it may happen at a specified point in time or after a duration. Eviction is the action that frees resources: the entry is removed from the cache. After expiry, eviction may happen at the same time or later.
None of the common cache products support exact, aka "point in time", expiry. We need that use case very often in our applications, so I spent some effort on supporting it in cache2k.
Here is a blueprint for cache2k:
static class MetricsEntry {
    long nextUpdate;
    List<Metrics> metrics;
}
static class MyEntryExpiryCalculator implements EntryExpiryCalculator<Integer, MetricsEntry> {
    @Override
    public long calculateExpiryTime(Integer _key, MetricsEntry _value, long _fetchTime, CacheEntry _oldEntry) {
        return _value.nextUpdate;
    }
}
Cache createTheCache() {
    Cache<Integer, MetricsEntry> cache =
        CacheBuilder.newCache(Integer.class, MetricsEntry.class)
            .sharpExpiry(true)
            .entryExpiryCalculator(new MyEntryExpiryCalculator())
            .source(new MySource())
            .build();
    return cache;
}
If you have a time reference in the metrics objects, you can use that and you can omit the additional entry class. sharpExpiry(true) instructs cache2k for exact expiry. If you leave this out, the expiry may be a few milliseconds off, but the access time is slightly faster.
Regarding Topic 2:
The straightforward approach would be to use the interval minutes as the cache key.
Here is a cache source (aka cache loader) that strictly returns the metrics of the previous interval:
static class MySource implements CacheSource<Integer, MetricsEntry> {
    @Override
    public MetricsEntry get(Integer interval) {
        MetricsEntry e = new MetricsEntry();
        boolean crossedIntervalEnd;
        do {
            long now = System.currentTimeMillis();
            long intervalMillis = interval * 60_000L; // long arithmetic avoids int overflow
            // round down to the start of the current interval
            long startOfInterval = now - (now % intervalMillis);
            e.metrics = calculateMetrics(startOfInterval, interval);
            e.nextUpdate = startOfInterval + intervalMillis;
            now = System.currentTimeMillis();
            // recalculate if the interval boundary was crossed while calculating
            crossedIntervalEnd = now >= e.nextUpdate;
        } while (crossedIntervalEnd);
        return e;
    }
}
That would return the metrics for 10:00-10:05 if you make the request at, let's say, 10:07.
If you just want to calculate instantly the metrics of the past interval, then it is simpler:
static class MySource implements CacheSource<Integer, MetricsEntry> {
    @Override
    public MetricsEntry get(Integer interval) {
        MetricsEntry e = new MetricsEntry();
        long intervalMillis = interval * 60_000L; // long arithmetic avoids int overflow
        long startOfInterval = System.currentTimeMillis();
        e.metrics = calculateMetrics(startOfInterval, interval);
        e.nextUpdate = startOfInterval + intervalMillis;
        return e;
    }
}
The use of the cache source has an advantage over put(). cache2k is blocking, so if multiple requests come in for one metric, only one metric calculation is started.
If you don't need exact expiry to the millisecond, you can use other caches, too. The thing you need to do is to store the time it takes to calculate the metrics within your cache value and then correct the expiry duration accordingly.
Have a good one!
Have you considered using something like a Deque instead? Just put the metrics in the queue and when you want to retrieve metrics for the last N minutes, just start at the end with the most recent additions and take everything until you find one that's from > N minutes ago. You can evict entries that are too old from the other end in a similar way. (It's not clear to me from your question how the key/value aspect of Cache relates to your problem.)
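A minimal sketch of the Deque idea, assuming metrics arrive roughly in timestamp order (a badly delayed metric would sit out of order and could be missed or evicted late). RecentMetrics and its members are hypothetical names, and the current time is passed in rather than read from the clock.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical deque-backed store; oldest entries at the head, newest at the tail.
class RecentMetrics<M> {
    private static final class Stamped<T> {
        final long timestampMillis;
        final T metric;

        Stamped(long timestampMillis, T metric) {
            this.timestampMillis = timestampMillis;
            this.metric = metric;
        }
    }

    private final Deque<Stamped<M>> deque = new ArrayDeque<>();
    private final long retentionMillis;

    RecentMetrics(long retentionMillis) {
        this.retentionMillis = retentionMillis;
    }

    synchronized void add(long timestampMillis, M metric) {
        deque.addLast(new Stamped<>(timestampMillis, metric));
    }

    // Starts at the newest entry and stops at the first one older than the window.
    synchronized List<M> last(long nowMillis, long windowMillis) {
        List<M> result = new ArrayList<>();
        Iterator<Stamped<M>> newestFirst = deque.descendingIterator();
        while (newestFirst.hasNext()) {
            Stamped<M> s = newestFirst.next();
            if (s.timestampMillis < nowMillis - windowMillis) break;
            result.add(s.metric);
        }
        return result;
    }

    // Drops entries from the old end until the head is within the retention period.
    synchronized void evict(long nowMillis) {
        while (!deque.isEmpty() && deque.peekFirst().timestampMillis < nowMillis - retentionMillis) {
            deque.removeFirst();
        }
    }
}
```

The result of last(...) is ordered newest first, which falls out of walking the deque from its tail.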
I'm trying to implement an application that programs tasks based on some user input. The users can put a number of IPs with telnet commands associated with them (one to one relationship), a frequency of execution, and 2 groups (cluster, objectClass).
The user should be able to add/remove IPs, Clusters, commands, etc, at runtime. They should also be able to interrupt the executions.
This application should be able to send the telnet commands to the IPs, wait for a response and save the response in a database based on the frequency. The problem I'm having is trying to make all of this multithreaded, because there are at least 60,000 IPs to telnet, and doing it in a single thread would take too much time. One thread should process a group of IPs in the same cluster with the same objectClass.
I've looked at Quartz to schedule the jobs. With Quartz I tried to make a dynamic job that took a list of IPs (with commands), processed them and saved the result to database. But then I ran into the problem of the different timers that users gave. The examples on the Quartz webpage are incomplete and don't go too much into detail.
Then I tried to do it the old-fashioned way, using Java Threads, but I need exception handling and parameter passing, and Threads don't do that. Then I discovered Callables and Executors, but I can't schedule tasks with Callables.
So now I'm stumped. What do I do?
OK, here are some ideas. Take with the requisite grain of salt.
First, create a list of all of the work that you need to do. I assume you have this in tables somewhere and you can make a join that looks like this:
cluster | objectClass | ip-address | command | frequency | last-run-time
This represents all of the work your system needs to do. For the sake of explanation, I'll say frequency can take the form of "1 per day", "1 per hour", "4 per hour", "every minute". This table has one row per (cluster, objectClass, ip-address, command). Assume a different table has a history of runs, with error messages and other things.
Now what you need to do is read that table, and schedule the work. For scheduling use one of these:
ScheduledExecutorService exec = Executors...
When you schedule something, you need to tell it how often to run (easy enough with the frequencies we've given), and a delay. If something is to run every minute and it last ran 4 min 30 seconds ago, the initial delay is zero. If something is to run each hour, then the initial delay is (60 min - 4.5 min = 55.5 min).
ScheduledFuture<?> handle = exec.scheduleAtFixedRate(...);
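The initial-delay arithmetic can be written down as a small helper. InitialDelay is a hypothetical name, and the sketch assumes the frequency has already been converted into a java.time.Duration period.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for the initial delay passed to scheduleAtFixedRate.
class InitialDelay {
    static Duration of(Duration period, Instant lastRun, Instant now) {
        Duration elapsed = Duration.between(lastRun, now);
        Duration remaining = period.minus(elapsed);
        // overdue tasks run immediately instead of getting a negative delay
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }
}
```

With a last run 4 min 30 s ago this yields zero for an every-minute task and 55 min 30 s for an hourly one, matching the numbers above.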
More complex types of scheduling are why things like Quartz exist, but basically you just need a way to resolve, given(schedule, last-run) an elapsed time to the next execution. If you can do that, then instead of scheduleAtFixedRate(...) you can use schedule(...) and then schedule the next run of a task as that task completes.
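Here is a rough sketch of the schedule-the-next-run-on-completion variant. SelfRescheduling is a hypothetical wrapper, and nextDelayMillis stands in for whatever function resolves (schedule, last-run) to the time until the next execution.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical wrapper: runs the work, then schedules its own next run.
class SelfRescheduling implements Runnable {
    private final ScheduledExecutorService exec;
    private final Runnable work;
    private final LongSupplier nextDelayMillis; // assumed: computes the delay to the next run

    SelfRescheduling(ScheduledExecutorService exec, Runnable work, LongSupplier nextDelayMillis) {
        this.exec = exec;
        this.work = work;
        this.nextDelayMillis = nextDelayMillis;
    }

    void start() {
        scheduleNext();
    }

    @Override
    public void run() {
        try {
            work.run();
        } finally {
            scheduleNext(); // reschedule even if the work threw
        }
    }

    private void scheduleNext() {
        try {
            exec.schedule(this, nextDelayMillis.getAsLong(), TimeUnit.MILLISECONDS);
        } catch (RejectedExecutionException e) {
            // the executor was shut down, so stop rescheduling
        }
    }
}
```

Because the delay is recomputed after every run, a user changing the frequency at runtime takes effect on the next cycle without cancelling anything.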
Anyway, when you schedule something, you'll get a handle back to it
ScheduledFuture<?> handle = exec.scheduleAtFixedRate(...);
Hold this handle in something that's accessible. For the sake of argument let's say it's a map by TaskKey. TaskKey is (cluster | objectClass | ip-address | command) together as an object.
Map<TaskKey,ScheduledFuture<?>> tasks = ...;
You can use that handle to cancel and schedule new jobs.
void cancelForCustomer(CustomerId id) {
    List<TaskKey> keys = db.findAllTasksOwnedByCustomer(id);
    for (TaskKey key : keys) {
        ScheduledFuture<?> f = tasks.get(key);
        if (f != null) f.cancel(false); // false: let a run that is already in progress finish
    }
}
For parameter passing, create an object to represent your work. Create one of these with all the parameters you need.
class HostCheck implements Runnable {
    private final Address host;
    private final String command;
    private final int something;
    public HostCheck(Address host, String command, int something) {
        this.host = host; this.command = command; this.something = something;
    }
    ....
}
For exception handling, localize that all into your object
class HostCheck implements Runnable {
...
public void run() {
try {
check();
scheduleNextRun(); // optionally, if fixed-rate doesn't work
} catch( Exception e ) {
db.markFailure(task); // or however.
// Point is tell somebody about the failure.
// You can use this to decide to stop scheduling checks for the host
// or whatever, but just record the info now and use it to influence
// future behavior in, er, the future.
}
}
}
OK, so up to this point I think we're in pretty good shape. Lots of detail to fill in but it feels manageable. Now we get to some complexity, and that's the requirement that execution of "cluster/objectClass" pairs are serial.
There are a couple of ways to handle this.
If the number of unique pairs are low, you can just make Map<ClusterObjectClassPair,ScheduledExecutorService>, making sure to create single-threaded executor services (e.g., Executors.newSingleThreadScheduledExecutor()). So instead of a single scheduling service (exec, above), you have a bunch. Simple enough.
If you need to control the amount of work you attempt concurrently, then you can have each HostCheck acquire a permit before execution. Have some global permit object
public static final Semaphore permits = new java.util.concurrent.Semaphore(30);
And then
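class HostCheck implements Runnable {
    ...
    public void run() {
        try {
            permits.acquire(); // blocks until a permit is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and skip this run
            return;
        }
        try {
            check();
            scheduleNextRun();
        } catch (Exception e) {
            // regular handling
        } finally {
            permits.release();
        }
    }
}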
You only have one thread per ClusterObjectClassPair, which serializes that work, and then permits just limit how many ClusterObjectClassPair you can talk to at a time.
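The one-executor-per-pair idea could be sketched like this. ClusterObjectClassPair and PairExecutors are hypothetical classes written for illustration; the only real API used is Executors.newSingleThreadScheduledExecutor() and ConcurrentHashMap.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Hypothetical key; equals and hashCode are required for use as a map key.
final class ClusterObjectClassPair {
    final String cluster;
    final String objectClass;

    ClusterObjectClassPair(String cluster, String objectClass) {
        this.cluster = cluster;
        this.objectClass = objectClass;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ClusterObjectClassPair)) return false;
        ClusterObjectClassPair other = (ClusterObjectClassPair) o;
        return cluster.equals(other.cluster) && objectClass.equals(other.objectClass);
    }

    @Override
    public int hashCode() {
        return Objects.hash(cluster, objectClass);
    }
}

class PairExecutors {
    private final Map<ClusterObjectClassPair, ScheduledExecutorService> executors =
            new ConcurrentHashMap<>();

    // Lazily creates one single-threaded scheduler per pair, which serializes that pair's work.
    ScheduledExecutorService forPair(ClusterObjectClassPair pair) {
        return executors.computeIfAbsent(pair, p -> Executors.newSingleThreadScheduledExecutor());
    }

    void shutdownAll() {
        executors.values().forEach(ScheduledExecutorService::shutdown);
    }
}
```

Each HostCheck is then scheduled on forPair(pair) instead of the single exec from earlier, and the semaphore caps how many pairs are talking to hosts at once.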
I guess this turned into quite a long answer. Good luck.
I apologise for the length of this problem, but I thought it important to include sufficient detail given that I'm looking for a suitable approach to my problem, rather than a simple code suggestion!
General description:
I am working on a project that requires tasks being able to be 'scheduled' at some relative repeating interval.
These intervals are in terms of some internal time, which is represented as an integer that is incremented as the program executes (so not equal to real time). Each time this happens, the schedule will be interrogated to check for any tasks due to execute at this timestep.
If a task is executed, it should then be rescheduled to run again at a position relative to the current time (e.g. in 5 timesteps). This relative position is simply stored as an integer property of the Task object.
The problem:
I am struggling somewhat to decide upon how I should structure this- partly because it is a slightly difficult set of search terms to look for.
As it stands, I am thinking that each time the timer is incremented I need to:
Execute tasks at the '0' position in the schedule
Re-add those tasks to the schedule again at their relative position (e.g. a task that repeats every 5 steps will be returned to the position 5)
Each group of tasks in the schedule will have their 'time until execution' decremented by one (e.g. a task at position 1 will move to position 0)
Assumptions:
There are a couple of assumptions that may limit the possible solutions I can use:
The interval must be relative, not a specific time, and is defined to be an integer number of steps from the current time
These intervals may take any integer value, i.e. they are not bounded.
Multiple tasks may be scheduled for the same timestep, but their order of execution is not important
All execution should remain in a single thread- multi-threaded solutions are not suitable due to other constraints
The main questions I have are:
How could I design this Schedule to work in an efficient manner? What datatypes/collections may be useful?
Is there another structure/approach I should consider?
Am I wrong to dismiss scheduling frameworks (e.g. Quartz), which appear to work more in the 'real' time domain rather than the 'non-real' time domain?
Many thanks for any possible help. Please feel free to comment for further information if necessary, I will edit wherever needed!
Well, Quartz is quite a powerful tool, but it has limited configuration possibilities, so if you need specific features, you should probably write your own solution.
However, it's a good idea to study the Quartz source code and data structures, because they have successfully dealt with many of the problems you would run into, e.g. inter-process synchronization at the database level, running delayed tasks, etc.
I once wrote my own scheduler, adapted to tasks for which Quartz would probably not be easy to adapt, but once I learned Quartz I understood how much I could improve my solutions by knowing how it was done in Quartz.
How about this? It uses your own ticks with executeNextInterval():
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
public class Scheduler {
private LinkedList<Interval> intervals = new LinkedList<Scheduler.Interval>();
public void addTask(Runnable task, int position) {
if(position<0){
throw new IllegalArgumentException();
}
while(intervals.size() <= position){
intervals.add(new Interval());
}
Interval interval = intervals.get(position);
interval.add(task);
}
public void executeNextInterval(){
Interval current = intervals.removeFirst();
current.run();
}
private static class Interval {
private List<Runnable> tasks = new ArrayList<Runnable>();
public void add(Runnable task) {
tasks.add(task);
}
public void run() {
for (Runnable task : tasks) {
task.run();
}
}
}
}
You might want to add some error handling, but it should do your job.
And here are some UnitTests for it :)
import org.junit.Assert;
import org.junit.Test;
public class TestScheduler {
private static class Task implements Runnable {
public boolean didRun = false;
public void run() {
didRun = true;
}
}
Runnable fail = new Runnable() {
@Override
public void run() {
Assert.fail();
}
};
@Test
public void queue() {
Scheduler scheduler = new Scheduler();
Task task = new Task();
scheduler.addTask(task, 0);
scheduler.addTask(fail, 1);
Assert.assertFalse(task.didRun);
scheduler.executeNextInterval();
Assert.assertTrue(task.didRun);
}
@Test
public void queueWithGaps() {
Scheduler scheduler = new Scheduler();
scheduler.addTask(fail, 1);
scheduler.executeNextInterval();
}
@Test
public void queueLonger() {
Scheduler scheduler = new Scheduler();
Task task0 = new Task();
scheduler.addTask(task0, 1);
Task task1 = new Task();
scheduler.addTask(task1, 1);
scheduler.addTask(fail, 2);
scheduler.executeNextInterval();
scheduler.executeNextInterval();
Assert.assertTrue(task0.didRun);
Assert.assertTrue(task1.didRun);
}
}
A circular linked list might be the data structure you're looking for. Instead of decrementing fields in each task element, you simply increment the index of the 'current' field in the circular list of tasks. A pseudocode structure might look something like this:
tick():
current = current.next()
for task : current.tasklist():
task.execute()
Any time you schedule a new task, you just add it at the position N ticks ahead of the current 'tick'.
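A possible Java version of that circular structure, as a sketch: TickWheel is a hypothetical class, and since the question says intervals are unbounded, each entry carries a count of full laps it still has to wait. That lap counter is my addition, not part of the pseudocode above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical circular schedule; single-threaded by design.
class TickWheel {
    private static final class Entry {
        final Runnable task;
        long rounds; // full laps of the wheel still to wait before running

        Entry(Runnable task, long rounds) {
            this.task = task;
            this.rounds = rounds;
        }
    }

    private final List<List<Entry>> slots = new ArrayList<>();
    private int current = 0;

    TickWheel(int size) {
        for (int i = 0; i < size; i++) {
            slots.add(new ArrayList<>());
        }
    }

    // Runs the task on the ticksAhead-th call of tick() from now (ticksAhead >= 1).
    void schedule(Runnable task, long ticksAhead) {
        if (ticksAhead < 1) throw new IllegalArgumentException("ticksAhead must be >= 1");
        int size = slots.size();
        int slot = (int) ((current + ticksAhead) % size);
        long rounds = (ticksAhead - 1) / size;
        slots.get(slot).add(new Entry(task, rounds));
    }

    void tick() {
        current = (current + 1) % slots.size();
        List<Entry> due = new ArrayList<>();
        Iterator<Entry> it = slots.get(current).iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.rounds == 0) {
                due.add(e);
                it.remove();
            } else {
                e.rounds--;
            }
        }
        // Run after iterating, so a task may call schedule() to re-add itself.
        for (Entry e : due) {
            e.task.run();
        }
    }
}
```

A repeating task simply calls schedule(this, interval) at the end of its own run; nothing else in the wheel needs to be decremented per tick apart from the entries in the current slot.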
Here are a couple of thoughts:
Keep everything simple. If you don't have millions of tasks, there is no need for an optimized data structure (except pride or the urge for premature optimization).
Avoid relative times. Use an absolute internal tick. If you add a task, set the "run next time" to the current tick value. Add it to the list, sort the list by time.
When looking for tasks, start at the head of the list and pick everything which has a time <= current tick, run the task.
Collect all those tasks in another list. After all have run, calculate the "run next time" based on the current tick and the increment (so you don't get tasks that loop), add all of them to the list, sort.
Take a look at the way DelayQueue uses a PriorityQueue to maintain such an ordered list of events. DelayQueue works using real time and hence can use the variable timed wait methods available in Condition and LockSupport. You could implement something like a SyntheticDelayQueue that behaves in the same way as DelayQueue but uses your own synthetic time service. You would obviously have to replace the timed wait/signalling mechanisms that come for free with the jdk though and this might be non trivial to do efficiently.
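Since everything runs in a single thread, the timed-wait machinery can be left out entirely, and what remains is a priority queue ordered by an absolute tick. The following sketch assumes that simplification; TickScheduler and its method names are hypothetical.

```java
import java.util.PriorityQueue;

// Hypothetical single-threaded scheduler driven by a synthetic tick counter.
class TickScheduler {
    private static final class Scheduled implements Comparable<Scheduled> {
        final long dueTick;
        final int interval;
        final Runnable task;

        Scheduled(long dueTick, int interval, Runnable task) {
            this.dueTick = dueTick;
            this.interval = interval;
            this.task = task;
        }

        @Override
        public int compareTo(Scheduled other) {
            return Long.compare(dueTick, other.dueTick);
        }
    }

    private final PriorityQueue<Scheduled> queue = new PriorityQueue<>();
    private long now = 0; // absolute internal tick

    // Runs the task every `interval` ticks, first at now + interval.
    void scheduleRepeating(Runnable task, int interval) {
        if (interval < 1) throw new IllegalArgumentException("interval must be >= 1");
        queue.add(new Scheduled(now + interval, interval, task));
    }

    void tick() {
        now++;
        // The head of the queue is always the task with the smallest due tick.
        while (!queue.isEmpty() && queue.peek().dueTick <= now) {
            Scheduled s = queue.poll();
            s.task.run();
            // Re-add relative to the current tick; interval >= 1 keeps it out of this loop.
            queue.add(new Scheduled(now + s.interval, s.interval, s.task));
        }
    }
}
```

Adding and removing are O(log n) and a tick with nothing due costs a single peek, so no per-tick decrementing is needed.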
If I had to do it, I'd create a simple queue (a linked-list variant). This queue would contain a dynamic data structure (a simple list, for example) containing all the tasks that need to be done. At each time interval (or time-step), the process reads the first node of the queue and executes the instructions it finds in the list of that node. At the end of each execution it would compute the rescheduling and add the new execution to another node in the queue, creating nodes up to that position if necessary before storing the instruction within that node. The first node is then removed and the second node (now the first) is executed at the next time-step. This system would also not require any integers to be kept track of, and all the data structures needed are found in the Java language. This should solve your problem.
Use a ScheduledExecutorService. It has everything you need built right in. Here's how simple it is to use:
// Create a single-threaded ScheduledExecutorService
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1); // 1 thread
// Schedule something to run in 10 seconds time
scheduler.schedule(new Runnable() {
public void run() {
// Do something
}}, 10, TimeUnit.SECONDS);
// Schedule something to run in 2 hours time
scheduler.schedule(new Runnable() {
public void run() {
// Do something else
}}, 2, TimeUnit.HOURS);
// etc as you need