Executing dependent tasks in Java

I need to find a way to execute mutually dependent tasks.
The first task has to download a zip file from a remote server.
The second task's goal is to unzip the file downloaded by the first task.
The third task has to process the files extracted from the zip.
So, the third task depends on the second, and the second on the first.
Naturally, if one of the tasks fails, the others shouldn't be executed. Since the first task downloads files from a remote server, there should be a mechanism for restarting the task if the server is not available.
The tasks have to be executed daily.
Any suggestions, patterns or java API?
Regards!

It seems that you do not need to divide them into separate tasks; just do it like this:
process(unzip(download(uri)));

It depends a bit on external requirements. Is there any user involvement? Monitoring? Alerting?...
The simplest approach would obviously be plain methods that check whether the previous step has done what it should:
download() downloads the file to a specified place.
unzip() extracts the file to a specified place if the downloaded file is in place.
process() processes the data if it has been extracted.
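A minimal sketch of those checks (the file locations and the empty helper bodies are placeholder assumptions, not anything from the question):
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DailyImport {

    // assumed locations; adjust to your environment
    private final Path zipFile = Paths.get("data/import.zip");
    private final Path extractDir = Paths.get("data/extracted");

    void download() {
        // fetch the archive from the remote server and store it at zipFile
    }

    void unzip() {
        // only proceed if the download step has produced the archive
        if (!Files.exists(zipFile)) {
            throw new IllegalStateException("Download did not produce " + zipFile);
        }
        // extract the archive into extractDir
    }

    void process() {
        // only proceed if the unzip step has produced the extraction directory
        if (!Files.isDirectory(extractDir)) {
            throw new IllegalStateException("Unzip did not produce " + extractDir);
        }
        // process the extracted files
    }
}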
A more "formal" way of doing it would be to use a workflow engine. Depending on your requirements, you can get anything from engines with fancy UIs to ones that follow formal, standardised XML definitions of the workflow, and anything in between.
http://java-source.net/open-source/workflow-engines

Create one public method to execute the full chain and private methods for the tasks:
public void doIt() {
    if (download() == false) {
        // download failed
    } else if (unzip() == false) {
        // unzip failed
    } else if (process() == false) {
        // processing failed
    }
}
private boolean download() {/* ... */}
private boolean unzip() {/* ... */}
private boolean process() {/* ... */}
So you have an API that guarantees that all steps are executed in the correct sequence and that a step is only executed if certain conditions are met (the above example just illustrates this pattern).
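The question also asks for a restart mechanism when the remote server is unavailable; one simple option is a bounded retry loop inside download(). A rough sketch (the retry count, the back-off and the downloadOnce() helper are made-up placeholders):
private boolean download() {
    final int maxAttempts = 3; // arbitrary retry limit
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            downloadOnce(); // placeholder for the actual transfer from the remote server
            return true;
        } catch (Exception e) {
            // server unreachable or transfer failed; wait before retrying
            try {
                Thread.sleep(60_000L * attempt); // simple linear back-off
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
    return false; // give up after maxAttempts; the caller then skips unzip/process
}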

For daily execution you can use the Quartz Framework.
As the tasks depend on each other, I would recommend evaluating the error codes or exceptions the tasks return, and only continuing if the previous task was successful.
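For illustration, a minimal Quartz 2.x sketch of a daily trigger; PipelineJob is an assumed org.quartz.Job implementation that runs the download/unzip/process chain and stops at the first failure:
import org.quartz.CronScheduleBuilder;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class DailyScheduling {
    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();

        // PipelineJob is a hypothetical Job that runs download -> unzip -> process,
        // aborting on the first failure
        JobDetail job = JobBuilder.newJob(PipelineJob.class)
                .withIdentity("dailyPipeline", "import")
                .build();

        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("dailyPipelineTrigger", "import")
                .withSchedule(CronScheduleBuilder.dailyAtHourAndMinute(2, 0)) // every day at 02:00
                .build();

        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}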

The normal way to perform these tasks is to call each task in order and throw an exception when a failure prevents the following tasks from being performed. Something like:
try {
    download();
    unzip();
    process();
} catch (Exception failed) {
    failed.printStackTrace();
}

I think what you are interested in is some kind of transaction definition.
I.e.
- Define TaskA (e.g. download)
- Define TaskB (e.g. unzip)
- Define TaskC (e.g. process)
Assuming that your intention is to have tasks working independently as well, e.g. only download a file (without also executing TaskB and TaskC), you should define Transaction1 composed of TaskA, TaskB, TaskC and Transaction2 composed of only TaskA.
The semantics, e.g. that for Transaction1 TaskA, TaskB and TaskC should be executed sequentially and all-or-none, can be captured in your transaction definitions.
The definitions can live in XML configuration files, and you can use a framework such as Quartz for scheduling.
A higher-level construct then checks for the transactions and executes them as defined.
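A rough, framework-free sketch of that idea (the Task interface and the SequentialTransaction class are made up for illustration; it only stops at the first failure and does not roll back already completed tasks):
import java.util.Arrays;
import java.util.List;

// Each task reports success or failure; names are illustrative only.
interface Task {
    boolean run();
}

// A "transaction" executes its tasks sequentially and stops at the first failure.
class SequentialTransaction {
    private final List<Task> tasks;

    SequentialTransaction(Task... tasks) {
        this.tasks = Arrays.asList(tasks);
    }

    boolean execute() {
        for (Task task : tasks) {
            if (!task.run()) {
                return false; // abort the remaining tasks
            }
        }
        return true;
    }
}

// Transaction1 = TaskA + TaskB + TaskC, Transaction2 = TaskA only:
// new SequentialTransaction(taskA, taskB, taskC).execute();
// new SequentialTransaction(taskA).execute();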

Dependent task execution made easy with Dexecutor
Disclaimer: I am the owner of the library.
Basically you need the following pattern, using the Dexecutor.addDependency method:
DefaultDexecutor<Integer, Integer> executor = newTaskExecutor();
//Building
executor.addDependency(1, 2);
executor.addDependency(2, 3);
executor.addDependency(3, 4);
executor.addDependency(4, 5);
//Execution
executor.execute(ExecutionConfig.TERMINATING);

Related

Thread safety for method that returns Mono based on mutable attribute in Java

In my Spring Boot application I have a component that is supposed to monitor the health status of another, external system. This component also offers a public method that reactive chains can subscribe to in order to wait for the external system to be up.
@Component
public class ExternalHealthChecker {

    private static final Logger LOG = LoggerFactory.getLogger(ExternalHealthChecker.class);

    private final WebClient externalSystemWebClient = WebClient.builder().build(); // config omitted

    private volatile boolean isUp = true;
    private volatile CompletableFuture<String> completeWhenUp = new CompletableFuture<>();

    @Scheduled(cron = "0/10 * * ? * *")
    private void checkExternalSystemHealth() {
        externalSystemWebClient.get() //
                .uri("/health") //
                .retrieve() //
                .bodyToMono(Void.class) //
                .doOnError(this::handleHealthCheckError) //
                .doOnSuccess(nothing -> this.handleHealthCheckSuccess()) //
                .subscribe(); //
    }

    private void handleHealthCheckError(final Throwable error) {
        if (this.isUp) {
            LOG.error("External System is now DOWN. Health check failed: {}.", error.getMessage());
        }
        this.isUp = false;
    }

    private void handleHealthCheckSuccess() {
        // the status changed from down -> up, which has to complete the future that might be currently waited on
        if (!this.isUp) {
            LOG.warn("External System is now UP again.");
            this.isUp = true;
            this.completeWhenUp.complete("UP");
            this.completeWhenUp = new CompletableFuture<>();
        }
    }

    public Mono<String> waitForExternalSystemUPStatus() {
        if (this.isUp) {
            LOG.info("External System is already UP!");
            return Mono.empty();
        } else {
            LOG.warn("External System is DOWN. Requesting process can now wait for UP status!");
            return Mono.fromFuture(completeWhenUp);
        }
    }
}
The method waitForExternalSystemUPStatus is public and may be called from many different threads. The idea behind this is to provide some of the reactive flux chains in the application with a way of pausing their processing until the external system is up. These chains cannot process their elements when the external system is down.
someFlux
    .doOnNext(record -> LOG.info("Next element"))
    .delayUntil(record -> externalHealthChecker.waitForExternalSystemUPStatus())
    ... // starting processing
The issue here is that I can't really wrap my head around which part of this code needs to be synchronised. I think there should not be an issue with multiple threads calling waitForExternalSystemUPStatus at the same time, as this method does not write anything, so I feel like it does not need to be synchronised. However, the method annotated with @Scheduled will also run on its own thread and will in fact write the value of isUp and also potentially change the reference of completeWhenUp to a new, uncompleted future instance. I have marked these two mutable attributes as volatile because, from reading about this keyword in Java, it feels like it would help guarantee that threads reading these two values see the latest value. However, I am unsure whether I also need to add the synchronized keyword to parts of the code. I am also unsure whether the synchronized keyword plays well with Reactor code; I have a hard time finding information on this. Maybe there is also a way of providing the functionality of the ExternalHealthChecker in a more complete, reactive way, but I cannot think of any.
I'd strongly advise against this approach. The problem with threaded code like this is it becomes immensely difficult to follow & reason about. I think you'd at least need to synchronise the parts of handleHealthCheckSuccess() and waitForExternalSystemUPStatus() that reference your completeWhenUp field otherwise you could have a race hazard on your hands (only one writes to it, but it might be read out-of-order after that write) - but there could well be something else I'm missing, and if so it may show as one of these annoying "one in a million" type bugs that's almost impossible to pin down.
There should be a much more reliable & simple way of achieving this though. Instead of using the Spring scheduler, I'd create a flux when your ExternalHealthChecker component is created as follows:
healthCheckStream = Flux.interval(Duration.ofMinutes(10))
        .flatMap(i ->
                webClient.get().uri("/health")
                        .retrieve()
                        .bodyToMono(String.class)
                        .map(s -> true)
                        .onErrorResume(e -> Mono.just(false)))
        .cache(1);
...where healthCheckStream is a field of type Flux<Boolean>. (Note it doesn't need to be volatile, as you'll never replace it so cross-thread worries don't apply - it's the same stream that will be updated with different results every 10 minutes based on the healthcheck status, whatever thread you'll access it from.)
This essentially creates a stream of healthcheck response values every 10 minutes, always caches the latest response, and turns it into a hot source. This means that the "nothing happens until you subscribe" doesn't apply in this case - the flux will start executing immediately, and any new subscribers that come in on any thread will always get the latest result, be that a pass or a fail. handleHealthCheckSuccess() and handleHealthCheckError(), isUp, and completeWhenUp are then all redundant, they can go - and then your waitForExternalSystemUPStatus() can just become a single line:
return healthCheckStream.filter(x -> x).next();
...then job done, you can call that from anywhere and you'll have a Mono that will only complete when the system is up.

Stopping Quartz job created using MethodInvokingJobDetailFactoryBean

I create a job running a Spring bean class with this code:
MethodInvokingJobDetailFactoryBean jobDetail = new MethodInvokingJobDetailFactoryBean();
Class<?> businessClass = Class.forName(task.getBusinessClassType());
jobDetail.setTargetObject(applicationContext.getBean(businessClass));
jobDetail.setTargetMethod(task.getBusinessMethod());
jobDetail.setName(task.getCode());
jobDetail.setGroup(task.getGroup().getCode());
jobDetail.setConcurrent(false);
Object[] argumentArray = builArgumentArray(task.getBusinessMethodParams());
jobDetail.setArguments(argumentArray);
jobDetail.afterPropertiesSet();

CronTrigger trigger = TriggerBuilder.newTrigger()
        .withIdentity(task.getCode() + "_TRIGGER", task.getGroup().getCode() + "_TRIGGER_GROUP")
        .withSchedule(CronScheduleBuilder.cronSchedule(task.getCronExpression()))
        .build();

dataSchedulazione = scheduler.scheduleJob((JobDetail) jobDetail.getObject(), trigger);
scheduler.start();
Sometimes the task stops responding. If I remove the trigger and the task from the scheduler, the job still remains in
List ob = scheduler.getCurrentlyExecutingJobs();
The state of the trigger is NONE, but the job is still reported by scheduler.getCurrentlyExecutingJobs().
I have tried to implement InterruptableJob in a class that extends MethodInvokingJobDetailFactoryBean.
But when I use
scheduler.interrupt(jobKey);
it says that InterruptableJob is not implemented.
I think this is because the scheduled instance is the one produced by MethodInvokingJobDetailFactoryBean's jobDetail:
`scheduler.scheduleJob((JobDetail) jobDetail.getObject(), trigger);`
This is the code inside the Quartz scheduler:
job = jec.getJobInstance();
if (job instanceof InterruptableJob) {
    ((InterruptableJob) job).interrupt();
    interrupted = true;
} else {
    throw new UnableToInterruptJobException(
            "Job " + jobDetail.getKey() +
            " can not be interrupted, since it does not implement " +
            InterruptableJob.class.getName());
}
Is there another way to kill a single task?
I use Quartz 2.1.7 with Java 1.6 and Java 1.8.
TIA
Andrea
There is no magic way to force the JVM to stop execution of some piece of code.
You can implement different ways to interrupt the job, but the most appropriate way is to implement InterruptableJob.
Implementing this interface is not sufficient; you should implement the job in such a way that it really reacts to such requests.
Example
Suppose your job is processing 1,000,000 records in a database or in a file and it takes a relatively long time, let's say 1 hour. Then one possible implementation is the following: in the interrupt() method you set some flag (member variable) to true, let's name it isInterruptionRequested. In the main logic that processes the 1,000,000 records you regularly check (e.g. every 5 seconds, or after every 100 records or so) whether this flag isInterruptionRequested is set to true; if it is, you exit from the method where you implemented the main logic.
It is important that you don't check the condition too often; otherwise, depending on the logic, checking whether a job interruption was requested may take 80-90% of the CPU, much more than the actual logic :)
Thus, even when you implement the InterruptableJob interface properly, it doesn't mean that the job will be stopped immediately. It is just a hint like "I would like to stop this job when it is possible". When it will be stopped (if at all) depends on how you implement it.
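A minimal sketch of such a cooperatively interruptible job (the record-processing logic is a placeholder):
import org.quartz.InterruptableJob;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.UnableToInterruptJobException;

public class LongRunningJob implements InterruptableJob {

    // volatile so the interrupt request made on the scheduler thread is visible
    // to the thread executing the job
    private volatile boolean isInterruptionRequested = false;

    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        for (int i = 0; i < 1_000_000; i++) {
            processRecord(i); // placeholder for the real work

            // check the flag only every 100 records to keep the overhead low
            if (i % 100 == 0 && isInterruptionRequested) {
                return; // stop gracefully when an interruption was requested
            }
        }
    }

    @Override
    public void interrupt() throws UnableToInterruptJobException {
        isInterruptionRequested = true;
    }

    private void processRecord(int i) {
        // process one record
    }
}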

Monitor progress and intermediate results in Spark

I have a simple Spark task, something like this:
JavaRDD<Solution> solutions = rdd.map(new Solve());
// Select best solution by some criteria
The Solve routine takes some time. For a demo application, I need to get some property of each solution as soon as it is calculated, before the call to rdd.map terminates.
I've tried using accumulators and a SparkListener, overriding the onTaskEnd method, but it seems to be called only at the end of the mapping, not per thread. E.g.:
sparkContext.sc().addSparkListener(new SparkListener() {
    public void onTaskEnd(SparkListenerTaskEnd taskEnd) {
        // do something with taskEnd.taskInfo().accumulables()
    }
});
How can I get an asynchronous message for each map function end?
Spark runs locally or in a standalone cluster mode.
Answers can be in Java or Scala, both are OK.

How to chain Guava futures?

I'm trying to create a small service that accepts a file upload, unzips it and then deletes the uploaded file. Those three steps should be chained as futures. I'm using the Google Guava library.
Workflow is:
A future to download the file; if that operation completes, then a future to unzip the file; if unzipping is done, a future to delete the original uploaded file.
But honestly, it isn't clear to me how I would chain the futures, or even how to create them the Guava way. The documentation is simply terse and unclear. OK, there is a transform method, but no concrete example at all, and the chain method is deprecated.
I miss the RxJava library.
Futures.transform is not fluently chainable like RxJava, but you can still use it to set up Futures that depend on one another. Here is a concrete example:
final ListeningExecutorService service =
        MoreExecutors.listeningDecorator(Executors.newCachedThreadPool());

final ListenableFuture<FileClass> fileFuture = service.submit(() -> fileDownloader.download());

final ListenableFuture<UnzippedFileClass> unzippedFileFuture = Futures.transform(fileFuture,
        // need to cast this lambda
        (Function<FileClass, UnzippedFileClass>) file -> fileUnzipper.unzip(file));

final ListenableFuture<Void> deletedFileFuture = Futures.transform(unzippedFileFuture,
        (Function<UnzippedFileClass, Void>) unzippedFile -> fileDeleter.delete(unzippedFile));

deletedFileFuture.get(); // or however you want to wait for the result
This example assumes fileDownloader.download() returns an instance of FileClass, fileUnzipper.unzip() returns an UnzippedFileClass, etc. If fileDownloader.download() instead returns a ListenableFuture<FileClass>, use AsyncFunction instead of Function.
This example also uses Java 8 lambdas for brevity. If you are not using Java 8, pass in anonymous implementations of Function or AsyncFunction instead:
Futures.transform(fileFuture, new AsyncFunction<FileClass, UnzippedFileClass>() {
    @Override
    public ListenableFuture<UnzippedFileClass> apply(final FileClass input) throws Exception {
        return fileUnzipper.unzip(input);
    }
});
More info on transform here: http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/util/concurrent/Futures.html#transform (scroll or search for "transform" -- deep linking appears to be broken currently)
Guava extends the Future interface with ListenableFuture for this purpose.
Something like this should work:
Runnable downloader, unzipper;
ListeningExecutorService service = MoreExecutors.listeningDecorator(Executors.newCachedThreadPool());
service.submit(downloader).addListener(unzipper, service);
I would include deleting the file in the unzipper, since it is a near instantaneous action, and it would complicate the code to separate it.

serial and parallel run workflow in cq5

I have 2 workflows.
I want to write a workflow of workflows. Thus I want to know how to:
Launch 2 workflows in parallel
Launch the second workflow after the first workflow terminates
You can achieve this by having multiple workflow launchers, though you have to be careful what these workflows do to the workload, e.g. if they would concurrently change the same property.
There are multiple ways to do this:
Either write a property in the last step of the first workflow and have the second workflow be triggered by a launcher when this property is set, or start another workflow from a custom step:
protected void processItem(WorkItem item, WorkflowSession wfSession, WorkflowData workflowData, String config) throws WorkflowException {
    String wfId = "myWorkflowId";
    WorkflowModel model = wfSession.getModel(wfId);
    wfSession.startWorkflow(model, workflowData);

    // optionally terminate the current workflow programmatically
    wfSession.terminateWorkflow(item.getWorkflow());
}
