Quartz job Vs. Thread for immediate one time task - java

Let's say I have some unit of work that needs to get done and I want to do it asynchronously relative to the rest of my application because it can take a long time e.g. 10 seconds to 2 minutes. To accomplish this I'm considering two options:
Schedule a Quartz job with a simple trigger set to fire only once and as soon as possible.
Create a Runnable instance, hand it off to a Thread, and call start() (calling run() directly would execute the work on the current thread).
In the context of the above I have the following questions:
What does using the Quartz job get me that the thread doesn't have?
What does using the Runnable get me that using the Quartz job doesn't?
In terms of best practices, what criteria ought to be used for deciding between Quartz jobs and Runnables for this use case?

With Quartz you get many well-implemented features, such as:
* transaction management of job execution
* job persistence, so you know the status of running jobs
* clustering support
* scheduling control. Even if you only need a simple trigger today, the option is there.
Without it, you have to handle these concerns on your own, and some of them can get complicated.
Starting a new thread:
* is lightweight: no job persistence, no Quartz API, etc.
* lets your application run without an extra dependency (Quartz)
* removes one potential source of errors (Quartz itself)
It depends on what kind of job you want to start, and on whether other features of your application require job scheduling too.
If your only concern is running the work asynchronously, you can just start a thread. If there are other concerns, like clustering, consider Quartz.
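As a rough illustration of the "just start a thread" option, here is a minimal sketch using only the JDK. The class and method names (OneShotThreadDemo, runInBackground) are made up for this example and do not come from the question.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; names are illustrative only.
final class OneShotThreadDemo {

    // Starts the given work on a new thread and returns that thread immediately.
    static Thread runInBackground(Runnable work) {
        Thread worker = new Thread(work, "one-shot-worker");
        // Report failures explicitly instead of relying on the default handler.
        worker.setUncaughtExceptionHandler((t, e) ->
                System.err.println(t.getName() + " failed: " + e));
        worker.start(); // start() spawns the thread; run() would execute on the caller
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicReference<String> ranOn = new AtomicReference<>();
        Thread worker = runInBackground(
                () -> ranOn.set(Thread.currentThread().getName()));
        worker.join(); // only for the demo; a real caller would carry on with other work
        System.out.println("work ran on: " + ranOn.get());
    }
}
```

Note that nothing here survives a JVM restart or reports status anywhere, which is exactly the part Quartz would otherwise cover.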

I would not add Quartz to a project just for this capability, but if I already had Quartz installed and was already using it, then yes, even for a one-off I would use a one-time immediate Quartz job.
The reason is simply consistency. Quartz already manages all of the details of the thread and job process. A single thread is Simple, but we also know from experience that even a single thread can be Not Simple.
Quartz wraps the thread into a high-level concept (the Job), with everything that brings with it.
From a code base point of view you get the consistency of all your jobs having the same semantics, and your developers don't have to "shift gears" "just for a thread". Later, they may "just do a thread" and run into a complexity that Quartz manages painlessly.
The overhead of the abstraction and setup that a Quartz job involves is not significant enough to justify using a bare thread just because "it's lighter weight".
Consistency and commonality are important aspects to a codebase. I would stick to the single abstraction and leverage as much as I can.

If it's a one-time job and there are no additional requirements, like job persistence or scheduling, then you're better off with regular threads.
Quartz jobs are much more robust than regular threads and support scheduling, job persistence, and other things that you probably don't need.
With Runnables and Threads there is nothing to set up.
If you think there might be more jobs than this one (scheduled jobs, delayed jobs, etc.), you have two options. One is Java's standard Executors: set up a thread pool and use it to run your jobs. The other is Spring's TaskExecutor abstraction, which lets you switch between Quartz and Executors when you need to. But that seems like overkill for a one-time gig.
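For the Executors option, a minimal sketch might look like the following. The generateReport method is a hypothetical stand-in for the long-running unit of work, and the pool size of 2 is an arbitrary assumption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; names are illustrative only.
final class ExecutorDemo {

    // Stand-in for the real long-running work.
    static String generateReport(int id) {
        return "report-" + id;
    }

    // Submits the work to the pool and waits (with a timeout) for its result.
    static String runOnPool(ExecutorService pool, int id)
            throws InterruptedException, ExecutionException, TimeoutException {
        Callable<String> task = () -> generateReport(id);
        Future<String> pending = pool.submit(task);
        // The caller could do other work here; get() blocks until the task is done.
        return pending.get(2, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(runOnPool(pool, 7));
        } finally {
            pool.shutdown(); // let queued tasks finish, then release the threads
        }
    }
}
```

Compared with a bare Thread, the Future gives the caller a handle for the result, for failures (via ExecutionException), and for cancellation.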

For an immediate one-time task, threads will be enough.
But there are more capable libraries available, like Quartz and Spring Scheduler.

Related

Framework to execute long running tasks

Are there any frameworks available in java or .NET to execute long running tasks?
This framework should give me the flexibility to plug in my implementation to execute the job and also ability to control the run-time like the number of tasks that execute and load balancing of execution.
I would like to hear different approaches to solve the problem.
In Java, you can try the Java 5 Executor framework, Spring Batch or Quartz, depending on your needs.
In Java:
ThreadPoolExecutor offers you a thread pool and lets you execute tasks in it. It also offers means to determine current active tasks, number of already executed tasks etc.
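A small sketch of the monitoring hooks mentioned above, assuming a fixed pool of two threads. The class name PoolStatsDemo and the no-op tasks are illustrative only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; names are illustrative only.
final class PoolStatsDemo {

    static ThreadPoolExecutor newPool(int threads) {
        return new ThreadPoolExecutor(
                threads, threads,                     // core and maximum pool size
                0L, TimeUnit.MILLISECONDS,            // keep-alive for surplus threads
                new LinkedBlockingQueue<Runnable>()); // extra tasks wait in this queue
    }

    // Runs taskCount no-op tasks and returns the executor's completed-task
    // counter once the pool has terminated.
    static long runAndCount(int threads, int taskCount) throws InterruptedException {
        ThreadPoolExecutor pool = newPool(threads);
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> { });
        }
        pool.shutdown(); // accept no new tasks, finish the queued ones
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("pool did not terminate in time");
        }
        return pool.getCompletedTaskCount();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newPool(2);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 5; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // hold the worker so the stats are visible
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // These getters are documented as approximations while tasks are in flight.
        System.out.println("active=" + pool.getActiveCount()
                + " queued=" + pool.getQueue().size());
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("completed=" + pool.getCompletedTaskCount());
    }
}
```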
Windows Workflow Foundation might work for you in .NET, depending on what exactly you mean by "Long Running Task". It can record the task's state and restore it at a later time, allowing the task to survive a reboot or other interruption.
The Task Parallel Library in .NET 4.0 has facilities for working with parallel, asynchronous and long-running tasks.
See http://msdn.microsoft.com/en-us/library/dd460717.aspx
In .NET, for "long running tasks" it would be Windows Workflow Foundation, and for "ability to control the run-time like the number of tasks that execute and load balancing of execution" it would be the Task Parallel Library. You might need to evaluate your problem statement and see whether either of these frameworks, or a combination of both, can solve your problem.

Quartz in Webapplication

I have a question about scheduling jobs in a web application. If we have to schedule jobs in a web application, we can use either java.util.Timer/TimerTask or Quartz (there are other scheduling mechanisms, but I considered Quartz). I was deciding which one to use when I came across http://oreilly.com/pub/a/java/archive/quartz.html?page=1, whose last line says that using a Timer has a bad effect because it creates a thread that is outside the container's control. The other pages discuss Quartz and its capabilities, but as far as I can tell Quartz also uses threads and/or a thread pool to schedule tasks. My guess is that these threads are also not under the container's control.
Can anybody clarify this for me?
Is it safe to use Quartz in my web applications without creating hanging threads or thread locking issues?
Thanks in advance
Both quartz and the JDK Timer start unmanaged threads that do not have access to Java EE contextual information, that's the biggest issue. In addition, they can use resources without the [application server] knowing about it, exist without an administrator's ability to control their number and resource usage, and impede on the application server's ability to gracefully shutdown or recover resources from failure (see Unmanaged threads).
That said, I haven't run into hanging threads or locking issues (though I guess it depends on what you're doing with them).
If this really is a concern, consider using a JSR-237 Timer and WorkManager implementation (which works with managed threads) like Foo-CommonJ instead of Quartz or the JDK Timer.
Both approaches create unmanaged threads. I use Quartz for scheduling rather than the Java Timer since it offers more flexibility (cron expressions, for example) and is more manageable.
If I had to say it in one line, I would say use Quartz, as it takes care of the low-level scheduling work for you. With Timer you can do everything Quartz does (even make the timer threads keep polling to check whether the web app is still running and exit otherwise), but you have to write that code yourself. With Quartz you get all of this out of the box.
Now the details.
Quartz provides:
1. Job persistence.
2. A managed thread pool, so you create an appropriate number of threads and additional jobs wait until one is free.
3. An initialization servlet to integrate with your web application. When the app shuts down, I think it takes care of closing your threads, but I have not tried it, so I would not comment much on it.
4. RMI-based scheduling, for clustered environments.
There are others as well, but these have been the biggest motivators for people to use Quartz.
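To make the "you have to write that code yourself" point concrete, here is a minimal sketch of a java.util.Timer whose lifecycle is managed by hand. The class name TimerLifecycleDemo is hypothetical. In a web application, stop() would typically be called from a shutdown hook such as ServletContextListener.contextDestroyed, which is left out here so the example needs only the JDK.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; names are illustrative only.
final class TimerLifecycleDemo {

    // Daemon timer thread: it will not keep the JVM alive on its own.
    private final Timer timer = new Timer("cleanup-timer", true);

    // Runs the job immediately and then repeatedly with the given period.
    void start(Runnable job, long periodMillis) {
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    job.run();
                } catch (RuntimeException e) {
                    // An exception escaping run() would terminate the Timer thread.
                    System.err.println("timer job failed: " + e);
                }
            }
        }, 0L, periodMillis);
    }

    // Must be called when the application stops, or the thread lingers.
    void stop() {
        timer.cancel();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch ticks = new CountDownLatch(3);
        TimerLifecycleDemo demo = new TimerLifecycleDemo();
        demo.start(ticks::countDown, 50L);
        boolean ran = ticks.await(5, TimeUnit.SECONDS);
        demo.stop();
        System.out.println("ran three times: " + ran);
    }
}
```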

Is it possible to run a cron job in a web application?

In a java web application (servlets/spring mvc), using tomcat, is it possible to run a cron job type service?
e.g. every 15 minutes, purge the log database.
Can you do this in a way that is container independent, or it has to be run using tomcat or some other container?
Please specify if the method is guaranteed to run at a specific time or one that runs every 15 minutes, but may be reset etc. if the application recycles (that's how it is in .net if you use timers)
As documented in Chapter 23. Scheduling and Thread Pooling, Spring has scheduling support through integration classes for the Timer and the Quartz Scheduler (http://www.quartz-scheduler.org/). For simple needs, I'd recommend going with the JDK Timer.
Note that Java schedulers are usually used to trigger Java business oriented jobs. For sysadmin tasks (like the example you gave), you should really prefer cron and traditional admin tools (bash, etc).
If you're using Spring, you can use the built-in Quartz or Timer hooks. See http://static.springsource.org/spring/docs/2.5.x/reference/scheduling.html
It will be container-specific. You can do it in Java with Quartz or just using Java's scheduling concurrent utils (ScheduledExecutorService) or as an OS-level cron job.
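A minimal sketch of the ScheduledExecutorService route, using only the JDK. The class and method names are hypothetical, and the purge itself is replaced by a print statement. The schedule lives in memory only, so it starts over whenever the application restarts.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names are illustrative only.
final class PeriodicPurgeDemo {

    // Schedules the job to run now and then at a fixed rate.
    static ScheduledFuture<?> scheduleRepeating(ScheduledExecutorService scheduler,
                                                Runnable job, long period, TimeUnit unit) {
        Runnable guarded = () -> {
            try {
                job.run();
            } catch (RuntimeException e) {
                // An exception escaping the task would suppress all later runs.
                System.err.println("job failed: " + e);
            }
        };
        return scheduler.scheduleAtFixedRate(guarded, 0L, period, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        // The question's case would pass (15, TimeUnit.MINUTES);
        // a short period keeps this demo quick.
        scheduleRepeating(scheduler,
                () -> System.out.println("purge #" + runs.incrementAndGet()),
                100L, TimeUnit.MILLISECONDS);
        Thread.sleep(350L);
        scheduler.shutdownNow(); // in a webapp, call this when the application stops
    }
}
```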
Every 15 minutes seems extreme. Generally I'd also advise you only to truncate/delete log files that are no longer being written to (and they're generally rolled over overnight).
Jobs are batch oriented. They are started either by a manual trigger or cron-style (as you seem to want).
Still, I don't see the connection between your webapp and a cron-style job. The only webapp use case I can think of is an HTTP endpoint that triggers a job (but that contradicts your statement about it being 'cron-style').
In general, use a dedicated framework that addresses the batch-job problem area. I can recommend Quartz.

How can an EJB parallelize a long, CPU intensive process?

The application has a CPU intensive long process that currently runs on one server (an EJB method) serially when the client requests it.
It’s theoretically possible (from a conceptual point of view) to split that process in N chunks and execute them in parallel, as long as the output of all parallel jobs can be collected and joined together before sending it back to the client that initiated the process. I’d like to use this parallelization to optimize performance.
How can I implement this parallelization with EJBs? I know that we should not create threads in an EJB method. Instead, we should publish messages (one per job) to be consumed by message driven beans (MDBs). But then it would not be a synchronous call anymore. And being synchronous seems to be a requirement in this case, since I need to collect the output of all jobs before sending it back to the client.
Is there a solution for this?
There are all sorts of ways to do this.
One, you can use an EJB Timer to create a run-once process that will start immediately. This is a good technique to spawn processes in the background. An EJB Timer is associated with a specific Session Bean implementation. You can either add an EJB Timer to every Session Bean that you want to be able to do this, or you can have a single Session Bean that can then call your application logic through some dispatch mechanism.
For me, I pass a serializable blob of parameters along with a class name that meets a specific interface to a generic Session Bean that then executes the class. This way I can easily background most anything.
One caveat about the EJB Timer is that EJB Timers are persistent. Once you create an EJB Timer, it stays in the container until its job is finished or canceled. The gotcha is that if you have a long-running process and the server goes down, the process will be run again when the server restarts. Mind you, this can be a good thing, but only if your process is prepared to be restarted. If you have a simple process iterating through "10,000 items" and the server goes down on item 9,999, when it comes back up you can easily see it simply starting over at item 1. It's all workable, just a caveat to be aware of.
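One way to make such a process "prepared to be restarted" is to record progress as it goes. The sketch below is not EJB code: the Checkpoint interface and its in-memory implementation are hypothetical, and a real version would store the position somewhere durable such as a database row.

```java
import java.util.function.IntConsumer;

// Hypothetical demo class; names are illustrative only.
final class ResumableBatch {

    // Where the batch remembers how far it got.
    interface Checkpoint {
        int load();               // index of the next item to process
        void save(int nextIndex); // record progress after each item
    }

    // In-memory stand-in; it would NOT survive a real server restart.
    static final class InMemoryCheckpoint implements Checkpoint {
        private int next;

        @Override
        public int load() {
            return next;
        }

        @Override
        public void save(int nextIndex) {
            next = nextIndex;
        }
    }

    // Processes items from the saved position up to total, saving after each one.
    static void run(int total, Checkpoint checkpoint, IntConsumer processItem) {
        for (int i = checkpoint.load(); i < total; i++) {
            processItem.accept(i);
            checkpoint.save(i + 1);
        }
    }

    public static void main(String[] args) {
        Checkpoint checkpoint = new InMemoryCheckpoint();
        checkpoint.save(9_998); // pretend an earlier run stopped just before the end
        run(10_000, checkpoint, i -> System.out.println("processing item " + i));
    }
}
```

If a crash lands between processing an item and saving the checkpoint, that item is processed twice after the restart, so the per-item work should be safe to repeat.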
Another way to background something is to use a JMS queue. Put a message on the queue, and the handler runs asynchronously from the rest of your application.
The clever part here, and something I have also done building on the work with the Timer Bean, is that you can control how many "jobs" run at once based on how many MDB instances you configure the system to have.
So, for the specific task of running a process in multiple, parallel chunks, I take the task, break it up into "pieces", and then send each piece on the Message Queue, where the MDBs execute them. If I allow 10 instances of the MDB, I can have 10 "parts" of any task running simultaneously.
This actually works surprisingly well. There's a little overhead in splitting the process up and routing it through the JMS queue, but that's all basically "start up time". Once it gets going, you get a real benefit.
Another benefit of using the Message Queue is you can have your actual long running processes executing on a separate machine, or you can readily create a cluster of machines to handle these processes. Yet, the interface is the same, and the code doesn't know the difference.
I've found that once you've relegated a long-running process to the background, you can pay the price of having less-than-instant access to that process. That is, there's no reason to monitor the executing classes directly just because they share the same memory space; have them publish interesting information and statistics to the database, or JMX, or whatever.
I was easily able to set up a framework that lets tasks run either on the EJB Timer or on the MDB scatter queue. The tasks are the same, and I could monitor their progress, stop them, etc.
You could combine the scatter technique to create several EJB Timer jobs. One of the free advantages of the MDB is it acts as a thread pool which can throttle your jobs (so you don't suddenly saturate your system with too many background processes). You get this "for free" just by leveraging the EJB management features in the container.
Finally, Java EE 6 adds an @Asynchronous annotation for Session Bean methods. I do not know the details of how this works, as I've yet to play with a Java EE 6 container. But I imagine you're probably not going to want to change containers just for this facility.
This particular question has come up on multiple occasions and I will summarize that there are several possible solutions, only 1 of which I would recommend.
Use a WorkManager from the commonj API. It allows for managed threads in a Java EE container and is specifically designed to fit your use case. If you are using WebSphere or WebLogic, these APIs are already available in your server. For others you will have to add a third-party solution yourself.
WorkManager info
Related questions
Why Spawning threads is discouraged
An EJB is ultimately a transactional component for a client-server system providing request/reply semantics. If you find yourself in the position that you need to pigeonhole a long-running transaction within the bounds of a request/reply cycle, then somewhere your system architect(ure) has taken the wrong turn.
The situation you describe is cleanly and correctly handled by an event-based architecture with a messaging back end. The initial event starts the process (which can then be trivially parallelized by having the workers subscribe to the event topic), and the aggregating process itself raises an event on its completion. You can still squeeze this sequence within the bounds of a request/reply cycle, but you will by necessity violate the letter and spirit of the Java EE system architecture specs.
Back to the future: Java EE 7 has a lot more concurrency support via ManagedThreadFactory, ManagedExecutorService, etc. (JSR 236: Concurrency Utilities for Java EE), with which you can create your own "managed" threads. It is no longer taboo in an EE application server that supports it (WildFly?), as long as you use the Managed* APIs.
More details
https://jcp.org/aboutJava/communityprocess/ec-public/materials/2013-01-1516/JSR236-EC-F2F-Jan2013.pdf
http://docs.oracle.com/javaee/7/tutorial/doc/concurrency-utilities002.htm
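Since ManagedExecutorService extends the standard ExecutorService interface, the split-and-join step can be written against ExecutorService and stay synchronous for the caller. The sketch below is an assumption-laden illustration: a plain fixed thread pool stands in for the container-managed executor (which would normally be injected), and summing an array stands in for the CPU-intensive work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; names are illustrative only.
final class ChunkedSum {

    // Splits data into at most `chunks` slices, processes them in parallel,
    // and joins the partial results before returning to the caller.
    static long sumInChunks(ExecutorService executor, int[] data, int chunks)
            throws InterruptedException, ExecutionException {
        if (chunks < 1) {
            throw new IllegalArgumentException("chunks must be at least 1");
        }
        int chunkSize = (data.length + chunks - 1) / chunks; // ceiling division
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int start = 0; start < data.length; start += chunkSize) {
            final int from = start;
            final int to = Math.min(data.length, start + chunkSize);
            tasks.add(() -> {
                long partial = 0;
                for (int i = from; i < to; i++) {
                    partial += data[i];
                }
                return partial;
            });
        }
        long total = 0;
        // invokeAll blocks until every chunk has finished, so the call stays synchronous.
        for (Future<Long> result : executor.invokeAll(tasks)) {
            total += result.get();
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        // Stand-in for a container-managed executor.
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            System.out.println("total = " + sumInChunks(executor, data, 4));
        } finally {
            executor.shutdown();
        }
    }
}
```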
I once participated in a project where EJB transactions ran for up to 5 hours at a time. Aargh!
This same application also had a BEA specialist consultant who approved of starting additional threads from within the transactions. While this is discouraged in the specs and elsewhere, it doesn't automatically result in failure. You need to be aware that your extra threads are outside the container's control, and thus if something goes wrong it's your fault. But if you can ensure that the number of threads started in the worst case doesn't exceed reasonable limits, and that they all terminate cleanly within a reasonable time, then it is quite possible to work like this. In fact, in your case it sounds like pretty much the only solution.
There are some slightly esoteric solutions possible where your EJB app reaches out to another app for a service, which then does the multithreading in itself before returning to the EJB caller. But this is essentially just shifting the problem around.
You may, however, consider a thread pooling solution to keep an upper limit on the number of threads spawned. If you have too many threads your application will behave horribly.
You've analyzed the situation quite well, and no, there is no pattern for this that matches the EJB model.
Creating threads is mainly forbidden because it bypasses the app server's thread management strategy, and also because of transactions.
I worked on a project with similar requirements and I decided to spawn additional threads (going against the spec, then). The operation to parallelize was read-only, so it worked as far as transactions were concerned (the threads would basically have no transaction associated with them). I also knew that I wouldn't spawn too many threads per EJB call, so the number of threads was not an issue. If your threads are supposed to modify data, though, you seriously break the transactional model of the EJB. But if your operation is pure computation, that might be OK.
Hope it helps...

Time triggered job Cron or Quartz?

I already asked a separate question on how to create time triggered event in Java. I was introduced to Quartz.
At the same time, I also googled it, and people are saying cron in Unix is a neat solution.
Which one is better? What are the pros and cons?
Some specification of the system:
* Unix OS
* program written in Java
* I have a task queue with 1000+ entries, for each timestamp, up to 500 tasks might be triggered.
Using cron seems to add another entry point into your application, while Quartz would integrate into it. So you would be forced to deal with some inter-process communication if you wanted to pass some information to/from the process invoked from cron. In Quartz you simply (hehe) run multiple threads.
cron is platform dependent, Quartz is not.
Quartz may allow you to reliably make sure a task is run at the given time or some time after if the server was down for some time. Pure cron wouldn't do it for you (unless you handle it manually).
Quartz has a more flexible language for expressing occurrences (when the tasks should be fired).
Consider the memory footprint. If your single tasks share nothing or little, then it might be better to run them from the operating system as a separate process. If they share a lot of information, it's better to have them as threads within one process.
Not quite sure how you could handle the clustering in the cron approach. Quartz might be used with Terracotta following the scaling out pattern (I haven't tried it, but I believe it's doable).
The plus for cron is that any sysadmin knows how to use it and it's documented in many places. If cron will do the job then it would really be the preferred solution.
