I have a question in scheduling jobs in web application. If we have to schedule jobs in web application we can either use java util Timer/TimerTask or Quartz(there are also other scheduling mechanism, but I considered Quartz). I was considering which one to use, when i hit the site http://oreilly.com/pub/a/java/archive/quartz.html?page=1 which says using timer has a bad effect as it creates a thread that is out of containers control in the last line. The other pages discuss Quartz and its capabilities, but I can read that Quartz also uses thread and/or threadpool to schedule tasks. My guess is that these threads are also not under the containers control
Can anybody clarify this to me
Is it safe to use Quartz in my web applications without creating hanging threads or thread locking issues?
Thanks in advance
Can anybody clarify this to me Is it safe to use Quartz in my web applications without creating hanging threads or thread locking issues?
Both quartz and the JDK Timer start unmanaged threads that do not have access to Java EE contextual information, that's the biggest issue. In addition, they can use resources without the [application server] knowing about it, exist without an administrator's ability to control their number and resource usage, and impede on the application server's ability to gracefully shutdown or recover resources from failure (see Unmanaged threads).
Having that said, I didn't face hanging threads or locking issues (I guess it depends on what you're doing with them though).
If really this is a concern, consider using a JSR-237 Timer and WorkManager implementation (that works with managed thread) like Foo-CommonJ instead of quartz or JDK Timer.
Both approaches created unmanaged threads. I use Quartz for scheduling rather than java Timer since it offer more flexability (cron expressions, for example) and it better managable.
If I have to say in one line, I would say use Quartz as it will take care of managing scheduling related low level work for you. With Timer, you can do everything which quartz does (even make timer threads keep polling to check if web app is running and exit otherwise). But this needs to be done in your code by you. With Quartz all this you get out of the box.
Now details
Quartz provides
1. Job persistence
2. Managed thread pool so you create appropriate number of threads and make the jobs wait after that.
3. Initialization servlet to be integrated with your web application. When app shuts down, I think it takes care of closing your threads, but I have not tried it. So I would not comment much on it.
4. RMI based scheduling, for clustered environments.
There are others as well, but these ones have been the biggest motivators why people use quartz more frequently.
Related
Let's say I have some unit of work that needs to get done and I want to do it asynchronously relative to the rest of my application because it can take a long time e.g. 10 seconds to 2 minutes. To accomplish this I'm considering two options:
Schedule a Quartz job with a simple trigger set to fire only once and as soon as possible.
Create a Runnable instance, hand it off to a Thread, and call run();.
In the context of the above I have the following questions:
What does using the Quartz job get me that the thread doesn't have?
What does using the runable get me that using the quartz job doesn't?
In terms of best practices, what criteria ought be used for deciding between Quartz jobs and runnables for this use case?
With Quartz you have many features "well implemented", like:
transaction mgmt of job execution
job persistence, so that we know the status of the running jobs
clustering supports
scheduling control, even if you just need the simple trigger. But it provides the possibility.
without using it, you have to control them on your own, some issue could be complicated.
Starting new thread:
light weight no job persistence, quartz api etc.
your application runs without extra dependency (quartz)
error source (from quartz) was reduced
It depends on what kind of job do you want to start, and if other features of your application require job scheduling too.
If your concern is just asynchronisation, you can just start a thread. If there were other concerns, like clustering, you may consider to use quartz.
I would not add Quartz to a project just for this capability, but if I already had Quartz installed and was already using it, then, yea, even for a one off I would use a one time immediate Quartz job.
The reason is simply consistency. Quartz already manages all of the details of the thread and job process. A single thread is Simple, but we also know from experience that even a single thread can be Not Simple.
Quartz wraps the thread in to a high level concept (the Job), and all that which it brings with it.
From a code base point of view you get the consistency of all your jobs having the same semantics, your developers don't have to "shift gears" "just for a thread". Later, they may "just do a thread" and run in to a complexity that Quartz manages painlessly.
The overhead of the abstraction and conditions that make a Quartz job are not significant enough to just use a thread in this case because "it's lighter weight".
Consistency and commonality are important aspects to a codebase. I would stick to the single abstraction and leverage as much as I can.
If it's a one-time job and there are no additional requirements, like job persistency, scheduling, etc. then you're better off with regular threads.
Quartz jobs are much more robust than regular threads and support scheduling, job persistence, etc., all the other stuff that you probably don't need.
No need to set anything up with Runnables and Threads
If you think there might be more jobs that this, scheduled jobs, delayed jobs, etc, you have 2 options: go with Java's standard Excecutors. Set up a thread pool and use this to run your jobs. You might also want to use Spring's TaskExecutor abstraction so you can easily switch between Quartz and Executors when you need it. But that seems like an overkill for a one-time gig.
For immediate 1 time task, Threads will be enough.
But there are better plugins available like quartz, Spring Scheduler
What are some workarounds when dealing with third party libraries that don't correctly clean-up threads when the library is shutdown?
Many libraries expose lifecycle methods, either explicitly or implicitly, for the code contained therein. For example, a web application framework exists within a web application context in a servlet container. When the context is created, the framework may start some threads, for various reasons.
Now, taking the example further, when the servlet container or the web application context is shutdown, the web application framework should terminate all these threads. Either the ExecutorServices created by the library should be shutdown, or some other means of stopping those threads should be enacted.
Worst case is that the threads are non-daemon threads. These would actually stop the termination of the Java process. But even if they are daemon threads, allowing threads to continue is probably bad practice. If (back to the example) the servlet container is embedded within other code that other code may continue to run with the threads still chugging away, which may cause problems.
There's no programmatic way exposed to stop these threads, so what can be done?
If you are using such a third party library, what are the ways to force existing threads to be shutdown?
If you are using such a third party library, what are the ways to force existing threads to be shutdown?
In general, there is no safe way to do this that is guaranteed to work in all cases. The closest you are likely to get to a safe solution is to use the ThreadGroup to enumerate all existing Threads, and use Thread.interrupt() to tell each thread to go stop. Of course, there is no guarantee that the threads will pay attention, or that they will shutdown cleanly in response to an interrupt.
IMO, the best strategy is:
if the threads do not need to be shut down cleanly, then pull the plug on the JVM by calling System.exit()
if the threads need to be shut down cleanly to avoid the possibility of damage, then don't use the library ... or fix the problem yourself.
One of the first things I've learned about Java EE development is that I shouldn't spawn my own threads inside a Java EE container. But when I come to think about it, I don't know the reason.
Can you clearly explain why it is discouraged?
I am sure most enterprise applications need some kind of asynchronous jobs like mail daemons, idle sessions, cleanup jobs etc.
So, if indeed one shouldn't spawn threads, what is the correct way to do it when needed?
It is discouraged because all resources within the environment are meant to be managed, and potentially monitored, by the server. Also, much of the context in which a thread is being used is typically attached to the thread of execution itself. If you simply start your own thread (which I believe some servers will not even allow), it cannot access other resources. What this means, is that you cannot get an InitialContext and do JNDI lookups to access other system resources such as JMS Connection Factories and Datasources.
There are ways to do this "correctly", but it is dependent on the platform being used.
The commonj WorkManager is common for WebSphere and WebLogic as well as others
More info here
And here
Also somewhat duplicates this one from this morning
UPDATE: Please note that this question and answer relate to the state of Java EE in 2009, things have improved since then!
For EJBs, it's not only discouraged, it's expressly forbidden by the specification:
An enterprise bean must not use thread
synchronization primitives to
synchronize execution of multiple
instances.
and
The enterprise bean must not attempt
to manage threads. The enterprise
bean must not attempt to start, stop,
suspend, or resume a thread, or to
change a thread’s priority or name.
The enterprise bean must not attempt
to manage thread groups.
The reason is that EJBs are meant to operate in a distributed environment. An EJB might be moved from one machine in a cluster to another. Threads (and sockets and other restricted facilities) are a significant barrier to this portability.
The reason that you shouldn't spawn your own threads is that these won't be managed by the container. The container takes care of a lot of things that a novice developer can find hard to imagine. For example things like thread pooling, clustering, crash recoveries are performed by the container. When you start a thread you may lose some of those. Also the container lets you restart your application without affecting the JVM it runs on. How this would be possible if there are threads out of the container's control?
This the reason that from J2EE 1.4 timer services were introduced. See this article for details.
Concurrency Utilities for Java EE
There is now a standard, and correct way to create threads with the core Java EE API:
JSR 236: Concurrency Utilities for Java™ EE
By using Concurrency Utils, you ensure that your new thread is created, and managed by the container, guaranteeing that all EE services are available.
Examples here
There is no real reason not to do so. I used Quarz with Spring in a webapp without problems. Also the concurrency framework java.util.concurrent may be used. If you implement your own thread handling, set the theads to deamon or use a own deamon thread group for them so the container may unload your webapp any time.
But be careful, the bean scopes session and request do not work in threads spawned! Also other code beased on ThreadLocal does not work out of the box, you need to transfer the values to the spawned threads by yourself.
You can always tell the container to start stuff as part of your deployment descriptors. These can then do whatever maintainance tasks you need to do.
Follow the rules. You will be glad some day you did :)
Threads are prohibited in Java EE containers according to the blueprints. Please refer to the blueprints for more information.
I've never read that it's discouraged, except from the fact that it's not easy to do correctly.
It is fairly low-level programming, and like other low-level techniques you ought to have a good reason. Most concurrency problems can be resolved far more effectively using built-in constructs like thread pools.
One reason I have found if you spawn some threads in you EJB and then you try to have the container unload or update your EJB you are going to run into problems. There is almost always another way to do something where you don't need a Thread so just say NO.
In a java web application (servlets/spring mvc), using tomcat, is it possible to run a cron job type service?
e.g. every 15 minutes, purge the log database.
Can you do this in a way that is container independent, or it has to be run using tomcat or some other container?
Please specify if the method is guaranteed to run at a specific time or one that runs every 15 minutes, but may be reset etc. if the application recycles (that's how it is in .net if you use timers)
As documented in Chapter 23. Scheduling and Thread Pooling, Spring has scheduling support through integration classes for the Timer and the Quartz Scheduler (http://www.quartz-scheduler.org/). For simple needs, I'd recommend to go with the JDK Timer.
Note that Java schedulers are usually used to trigger Java business oriented jobs. For sysadmin tasks (like the example you gave), you should really prefer cron and traditional admin tools (bash, etc).
If you're using Spring, you can use the built-in Quartz or Timer hooks. See http://static.springsource.org/spring/docs/2.5.x/reference/scheduling.html
It will be container-specific. You can do it in Java with Quartz or just using Java's scheduling concurrent utils (ScheduledExecutorService) or as an OS-level cron job.
Every 15 minutes seems extreme. Generally I'd also advise you only to truncate/delete log files that are no longer being written to (and they're generally rolled over overnight).
Jobs are batch oriented. Either by manual trigger or cron-style (as you seem to want).
Still I don't get your relation between webapp and cron-style job? The only webapp use-case I could think of is, that you want to have a HTTP endpoint to trigger a job (but this opposes your statement about being 'cron-style').
Generally use a dedicated framework, which solves the problem-area 'batch-jobs'. I can recommend quartz.
The application has a CPU intensive long process that currently runs on one server (an EJB method) serially when the client requests it.
It’s theoretically possible (from a conceptual point of view) to split that process in N chunks and execute them in parallel, as long as the output of all parallel jobs can be collected and joined together before sending it back to the client that initiated the process. I’d like to use this parallelization to optimize performance.
How can I implement this parallelization with EJBs? I know that we should not create threads in a EJB method. Instead, we should publish messages (one per job) to be consumed by message driven beans (MDBs). But then it would not be a synchronous call anymore. And being synchronous seems to be a requirement in this case since I need to collect the output of all jobs before sending it back to the client.
Is there a solution for this?
There are all sorts of ways to do this.
One, you can use an EJB Timer to create a run-once process that will start immediately. This is a good technique to spawn processes in the background. A EJB Timer is associated with a specific Session Bean implementation. You can either add an EJB Timer to every Session Bean that you want to be able to do this, or you can have a single Session Bean that can then call your application logic through some dispatch mechanism.
For me, I pass a serializable blob of parameters along with a class name that meets a specific interface to a generic Session Bean that then executes the class. This way I can easily background most anything.
One caveat about the EJB Timer is that EJB Timers are persistent. Once you create an EJB Timer is stays in the container until its job is finished or canceled. The gotcha on this is that if you have a long running process, and the server goes down, when it restarts the process will continue and pick back up. Mind this can be a good thing, but only if your process is prepared to be restarted. But if your have a simple process iterating through "10,000 items", if the server goes down on item 9,999, when it comes back up you can easily see it simply starting over at item 1. It's all workable, just a caveat to be aware of.
Another way to background something is you can use a JMS queue. Put a message on the queue, and the handler runs aysnchronously from the rest of your application.
The clever part here, and something I has also done leveraging the work with the Timer Bean, is you can control how many "jobs" will run based on how many MDB instances you configure the system to have.
So, for the specific task of running a process in multiple, parallel chunks, I take the task, break it up in to "pieces", and then send each piece on the Message Queue, where the MDBs execute them. If I allow 10 instances of the MDB, I can have 10 "parts" of any task running simultaneously.
This actually works surprisingly well. There's a little overhead it splitting the process up and routing it through the JMS queue, but that's all basically "start up time". Once it gets going, you get a real benefit.
Another benefit of using the Message Queue is you can have your actual long running processes executing on a separate machine, or you can readily create a cluster of machines to handle these processes. Yet, the interface is the same, and the code doesn't know the difference.
I've found once you've relegated a long running process to the background, you can pay the price of having less-that-instant access to that process. That is, there's no reason to monitor the executing classes themselves directly, just have them publish interesting information and statistic to the database, or JMX, or whatever rather than having something that can monitor the object directly because it shares the same memory space.
I was easily able to set up a framework that lets task run either on the EJB Timer or on the MDB scatter queue, the tasks are the same, and I could monitor their progress, stop them, etc.
You could combine the scatter technique to create several EJB Timer jobs. One of the free advantages of the MDB is it acts as a thread pool which can throttle your jobs (so you don't suddenly saturate your system with too many background processes). You get this "for free" just by leveraging the EJB management features in the container.
Finally, Java EE 6 has a new "asynchronous" (or something) qualifier for Session Bean methods. I do not know the details on how this works, as I've yet to play with a new Java EE 6 container. But I imagine you're probably not going to want to change containers just for this facility.
This particular question has come up on multiple occasions and I will summarize that there are several possible solutions, only 1 of which I would recommend.
Use a WorkManager from the commonj API. It allows for managed threads in a Java EE container and is specifically designed to fit your use case. If you are using WebSphere or WebLogic, these API's are already available in your server. For others your will have to put a third party solution in yourself.
WorkManager info
Related questions
Why Spawning threads is discouraged
An EJB is a ultimately a transactional component for a client-server system providing request/reply semantics. If you find yourself in the position that you need to pigeonhole a long-running transaction within the bounds of a request/reply cycle, then somewhere your system architect(ure) has taken the wrong turn.
The situation you describe is cleanly and correctly handled by an event based architecture with a messaging back end. Initial event initiates the process (which can then be trivially parallelized by having the workers subscribe to the event topic) and the aggregating process itself raises an event on its completion. You can still squeeze these sequence within the bounds of a request/reply cycle, but you will by necessity violate the letter and spirit of the Java EE system architecture specs.
Back to the Future - Java EE 7 has lot more Concurrency support via ManagedThreadFactory, ManagedExecutor service etc (JSR 236: Concurrency Utilities for Java EE) with which you can create your own 'managed'Threads .It is no longer a taboo in EE AS supporting it (Wildfly ?) via usining the ManagedThread* API's
More details
https://jcp.org/aboutJava/communityprocess/ec-public/materials/2013-01-1516/JSR236-EC-F2F-Jan2013.pdf
http://docs.oracle.com/javaee/7/tutorial/doc/concurrency-utilities002.htm
I once participated in a project where EJB transactions ran for up to 5 hours at a time. Aargh!
This same application also had a BEA specialist consultant who approved that they started additional threads from the transactions. While it's disrecommended in the specs and elsewhere, it doesn't automatically result in failure. You need to be aware that your extra threads are outside the container's control and thus if something goes wrong it's your fault. But if you can assure that the number of threads started in the worst case doesn't exceed reasonable limits, and that they all terminate cleanly within reasonable time, then it is quite possible to work like this. In fact, in your case it sounds like the almost-only solution.
There are some slightly esoteric solutions possible where your EJB app reaches out to another app for a service, which then does the multithreading in itself before returning to the EJB caller. But this is essentially just shifting the problem around.
You may, however, consider a thread pooling solution to keep an upper limit on the number of threads spawned. If you have too many threads your application will behave horribly.
You've analyzed the situation quite well, and no, there is not patern for this that match the EJB model.
Creating threads is mainly forbidden because it bypass the app. server thread management strategy and also because of the transactions.
I worked on a project with similar requireements and I decided to spawn additional threads (going against the sepc then). The operation to parallelized was read-only, so it worked regarding the transaction (the thread would basically have not transaction associated to them). I also knew that I wouldn't spawn too many threads per EJB calls, so the number of threads was not an issue. But if your threads are supposed to modify data, then you break the transactional model of the EJB seriously. But if your operation in pure computing, that might be ok.
Hope it helps...