What are some workarounds when dealing with third party libraries that don't correctly clean-up threads when the library is shutdown?
Many libraries expose lifecycle methods, either explicitly or implicitly, for the code contained therein. For example, a web application framework exists within a web application context in a servlet container. When the context is created, the framework may start some threads, for various reasons.
Now, taking the example further, when the servlet container or the web application context is shutdown, the web application framework should terminate all these threads. Either the ExecutorServices created by the library should be shutdown, or some other means of stopping those threads should be enacted.
Worst case is that the threads are non-daemon threads. These would actually stop the termination of the Java process. But even if they are daemon threads, allowing threads to continue is probably bad practice. If (back to the example) the servlet container is embedded within other code that other code may continue to run with the threads still chugging away, which may cause problems.
There's no programmatic way exposed to stop these threads, so what can be done?
If you are using such a third party library, what are the ways to force existing threads to be shutdown?
If you are using such a third party library, what are the ways to force existing threads to be shutdown?
In general, there is no safe way to do this that is guaranteed to work in all cases. The closest you are likely to get to a safe solution is to use the ThreadGroup to enumerate all existing Threads, and use Thread.interrupt() to tell each thread to go stop. Of course, there is no guarantee that the threads will pay attention, or that they will shutdown cleanly in response to an interrupt.
IMO, the best strategy is:
if the threads do not need to be shut down cleanly, then pull the plug on the JVM by calling System.exit()
if the threads need to be shut down cleanly to avoid the possibility of damage, then don't use the library ... or fix the problem yourself.
Related
I am working on a platfor that hosts small Java applications, all of which currently uses a single thread, living inside a Docker engine, consuming data from a Kafka server and logging to a central DB.
Now, I need to put another Java application to this platform. This app at hand uses multithreading relatively heavily, I already tested it inside a Docker container and it works perfectly there, so I'm ready to deploy it on the platform where it would be scaled manually, that is, some human would define the number of containers that would be started, each of them containing an instance of this app.
My Architect has an objection, saying that "In a distributed environment we never use multithreading". So now, I have to refactor my application eliminating any thread related logic from it, making it single threaded. I requested a more detailed reasoning from him, but he yells "If you are not aware of this principle, you have no place near Java".
Is it really a mistake to use a multithreaded Java application in a distributed system - a simple cluster with ten or twenty physical machines, each hosting a number of virtual machines, which then runs Docker containers, with Java applications inside them.
Honestly, I don't see the problem of multithreading inside a container.
Is it really a mistake or somehow "forbidden"?
Thanks.
When you write for example a web application that will run in a Java EE application server, then normally you should not start up your own threads in your web application. The application server will manage threads, and will allocate threads to process incoming requests on the server.
However, there is no hard rule or reason why it is never a good idea to use multi-threading in a distributed environment.
There are advantages to making applications single-threaded: the code will be simpler and you won't have to deal with difficult concurrency issues.
But "in a distributed environment we never use multithreading" is not necessarily always true and "if you are not aware of this principle, you have no place near Java" sounds arrogant and condescending.
I guess he only tells you this as using a single thread eliminates multi threading and data ordering issues.
There is nothing wrong with multithreading though.
Distributed systems usually have tasks that are heavily I/O bound.
If I/O calls are blocking in your system
The only way to achieve concurrency within the process is spawning new threads to do other useful work. (Multi-threading).
The caveat with this approach is that, if they are too many threads
in flight, the operating system will spend too much time context
switching between threads, which is wasteful work.
If I/O calls are Non-Blocking in your system
Then you can avoid the Multi-threading approach and use a single thread to service all your requests. (read about event-loops or Java's Netty Framework or NodeJS)
The upside for single thread approach
The OS does not any wasteful thread context switches.
You will NOT run into any concurrency problems like dead locks or race conditions.
The downside is that
It is often harder to code/think in a non-blocking fashion
You typically end up using more memory in the form of blocking queues.
What? We use RxJava and Spring Reactor pretty heavily in our application and it works pretty fine. You can't work with threads across two JVMs anyway. So just make sure that your logic is working as you expect on a single JVM.
One of the first things I've learned about Java EE development is that I shouldn't spawn my own threads inside a Java EE container. But when I come to think about it, I don't know the reason.
Can you clearly explain why it is discouraged?
I am sure most enterprise applications need some kind of asynchronous jobs like mail daemons, idle sessions, cleanup jobs etc.
So, if indeed one shouldn't spawn threads, what is the correct way to do it when needed?
It is discouraged because all resources within the environment are meant to be managed, and potentially monitored, by the server. Also, much of the context in which a thread is being used is typically attached to the thread of execution itself. If you simply start your own thread (which I believe some servers will not even allow), it cannot access other resources. What this means, is that you cannot get an InitialContext and do JNDI lookups to access other system resources such as JMS Connection Factories and Datasources.
There are ways to do this "correctly", but it is dependent on the platform being used.
The commonj WorkManager is common for WebSphere and WebLogic as well as others
More info here
And here
Also somewhat duplicates this one from this morning
UPDATE: Please note that this question and answer relate to the state of Java EE in 2009, things have improved since then!
For EJBs, it's not only discouraged, it's expressly forbidden by the specification:
An enterprise bean must not use thread
synchronization primitives to
synchronize execution of multiple
instances.
and
The enterprise bean must not attempt
to manage threads. The enterprise
bean must not attempt to start, stop,
suspend, or resume a thread, or to
change a thread’s priority or name.
The enterprise bean must not attempt
to manage thread groups.
The reason is that EJBs are meant to operate in a distributed environment. An EJB might be moved from one machine in a cluster to another. Threads (and sockets and other restricted facilities) are a significant barrier to this portability.
The reason that you shouldn't spawn your own threads is that these won't be managed by the container. The container takes care of a lot of things that a novice developer can find hard to imagine. For example things like thread pooling, clustering, crash recoveries are performed by the container. When you start a thread you may lose some of those. Also the container lets you restart your application without affecting the JVM it runs on. How this would be possible if there are threads out of the container's control?
This the reason that from J2EE 1.4 timer services were introduced. See this article for details.
Concurrency Utilities for Java EE
There is now a standard, and correct way to create threads with the core Java EE API:
JSR 236: Concurrency Utilities for Java™ EE
By using Concurrency Utils, you ensure that your new thread is created, and managed by the container, guaranteeing that all EE services are available.
Examples here
There is no real reason not to do so. I used Quarz with Spring in a webapp without problems. Also the concurrency framework java.util.concurrent may be used. If you implement your own thread handling, set the theads to deamon or use a own deamon thread group for them so the container may unload your webapp any time.
But be careful, the bean scopes session and request do not work in threads spawned! Also other code beased on ThreadLocal does not work out of the box, you need to transfer the values to the spawned threads by yourself.
You can always tell the container to start stuff as part of your deployment descriptors. These can then do whatever maintainance tasks you need to do.
Follow the rules. You will be glad some day you did :)
Threads are prohibited in Java EE containers according to the blueprints. Please refer to the blueprints for more information.
I've never read that it's discouraged, except from the fact that it's not easy to do correctly.
It is fairly low-level programming, and like other low-level techniques you ought to have a good reason. Most concurrency problems can be resolved far more effectively using built-in constructs like thread pools.
One reason I have found if you spawn some threads in you EJB and then you try to have the container unload or update your EJB you are going to run into problems. There is almost always another way to do something where you don't need a Thread so just say NO.
I am currently building a java-servlet-based web application that should offer its service to quite a lot of users (don't ask me how much "a lot" is :-) - I don't know yet).
However, while the application is being used, there might occur some long-taking processing on the serverside.
In order to avoid bad UI responsiveness, I decided to move these processing operations into their own threads.
This means that once a user is logged in, it can happen that 1-10 threads run in the background (per user!).
I once heard that using multiple threads in a web application is a "bad idea".
Is this true and if yes: Why?
Update: I forgot to mention that my application heavily relies on ajax calls. Every user action causes a new ajax call. So, when the main servlet thread is busy, the ajax call takes very long to process. That's why I want to use multiple threads.
It is a bad idea to manually create the threads yourself. This has been discussed a lot here in SO. See this question for example.
Another question discusses alternative solutions.
The "bad idea" isn't multiple threads. Java EE was originally written so multi-threading was in the hands of the app server, so users were discouraged from starting their own threads.
I think what you really want is asynchronous processing for long-running tasks so users won't have to wait for them to finish before proceeding.
You could do that with JMS and stay within the lines in the Java EE coloring book. I think that it's safer to do on your own, now that there are new classes and constructs in the java.util.concurrent package.
It's still not an easy thing to do. Multi-threaded code isn't trivial. But I think it's easier than it used to be in Java.
Part of the problem might be that you're asking that servlet to do too much. Servlets should listen for HTTP request and orchestrate getting a response from other classes, not do all the processing themselves. Perhaps your servlet is telling you that it's time to refactor a bit. This will help your testing, since you'll be able to unit test those asynch classes without having a servlet/JSP engine running.
AJAX calls to services via HTTP need not block. If the service can return a token, a la FedEx, that tells the app when and how to get the response, there's no reason why the service can't process asynchronously. It's an implementation detail for the services that you should hide from clients.
1.
Brilliant Idea.
It's not common, but it's nothing wrong.
If you think asynchronous tasks are needed for better user experiences. Just use it.
2.
You need to be careful with it.
2.1.
Creating and destroying threads add a lot of overhead to your server.
You'd better use a executor, like java.util.concurrent.ThreadPoolExecutor.
2.2.
Don't just use Executors.newFixedThreadPool(). It is for beginners and hides dangerous details.
You need to know the edge behavior of ThreadPoolExecutor, and configure it properly.
How much threads are enough for your task? You need to calculate it out.
What would happen if there is no free theads in your pool? Different configurations can make it wait, cache, or abandon new tasks. What should you expect?
What would happen if a task runs for too long(such as an infinite loop)? There is no real timeout-and-exit mechanism in java. How do you prevent these.
If the application requires it, then I say go ahead and do the background threads, but, since you don't know how many users you will have, you are taking a great risk that you will overwhelm your server. You might consider some alternatives, if they will work in your situation. Can you run the background tasks completely offline, e.g. in a batch job? Can you limit the number of threads that each logged in user will need? How will you get the results of the background threads back to the user?
This is a bad idea for three main reasons:
Excessive number of running threads can kill system resources and cause some strange things such as starvation and priority inversion. Often this can be solved with a thread pool.
User session duration is unpredictable. The user can fire an action and go for a coffee, or he/she might complain for the delay an redo the action. This can cause creation of multiple background jobs, so requires complex control, and when we talk about threads, we never know for sure if we didn't left race conditions or unantecipated scenarios.
Most likely servlets will have some interaction with the threads. Now suppose your application needs to be scaled, so you use a clustered container (after all, you have "a lot" of users). The container can passivate a session and restore it in another node. But your threads will remain in the initial node, so the link between session and threads will be broken. This ends in unexpected exceptions and error 500 - server failure.
I think the best solution is to design your application so that it won't create so many background threads.
But if you insist or really need it, try using Java EE message driven beans (MDBs) and make your servlet invoke it using JMS, like #duffymo said.
The challenge is how to make communication between MDBs and user sessions. Perhaps your servlet can create a JMS queue or topic and send it to MDBs for them to reply, but I don't know if the servlet side of JMS connection can be passivated and restored.
Another forms of communication would be JNDI or an external database or file, but this requires polling, which might be unresponsive or CPU-excessive.
I have a question in scheduling jobs in web application. If we have to schedule jobs in web application we can either use java util Timer/TimerTask or Quartz(there are also other scheduling mechanism, but I considered Quartz). I was considering which one to use, when i hit the site http://oreilly.com/pub/a/java/archive/quartz.html?page=1 which says using timer has a bad effect as it creates a thread that is out of containers control in the last line. The other pages discuss Quartz and its capabilities, but I can read that Quartz also uses thread and/or threadpool to schedule tasks. My guess is that these threads are also not under the containers control
Can anybody clarify this to me
Is it safe to use Quartz in my web applications without creating hanging threads or thread locking issues?
Thanks in advance
Can anybody clarify this to me Is it safe to use Quartz in my web applications without creating hanging threads or thread locking issues?
Both quartz and the JDK Timer start unmanaged threads that do not have access to Java EE contextual information, that's the biggest issue. In addition, they can use resources without the [application server] knowing about it, exist without an administrator's ability to control their number and resource usage, and impede on the application server's ability to gracefully shutdown or recover resources from failure (see Unmanaged threads).
Having that said, I didn't face hanging threads or locking issues (I guess it depends on what you're doing with them though).
If really this is a concern, consider using a JSR-237 Timer and WorkManager implementation (that works with managed thread) like Foo-CommonJ instead of quartz or JDK Timer.
Both approaches created unmanaged threads. I use Quartz for scheduling rather than java Timer since it offer more flexability (cron expressions, for example) and it better managable.
If I have to say in one line, I would say use Quartz as it will take care of managing scheduling related low level work for you. With Timer, you can do everything which quartz does (even make timer threads keep polling to check if web app is running and exit otherwise). But this needs to be done in your code by you. With Quartz all this you get out of the box.
Now details
Quartz provides
1. Job persistence
2. Managed thread pool so you create appropriate number of threads and make the jobs wait after that.
3. Initialization servlet to be integrated with your web application. When app shuts down, I think it takes care of closing your threads, but I have not tried it. So I would not comment much on it.
4. RMI based scheduling, for clustered environments.
There are others as well, but these ones have been the biggest motivators why people use quartz more frequently.
The application has a CPU intensive long process that currently runs on one server (an EJB method) serially when the client requests it.
It’s theoretically possible (from a conceptual point of view) to split that process in N chunks and execute them in parallel, as long as the output of all parallel jobs can be collected and joined together before sending it back to the client that initiated the process. I’d like to use this parallelization to optimize performance.
How can I implement this parallelization with EJBs? I know that we should not create threads in a EJB method. Instead, we should publish messages (one per job) to be consumed by message driven beans (MDBs). But then it would not be a synchronous call anymore. And being synchronous seems to be a requirement in this case since I need to collect the output of all jobs before sending it back to the client.
Is there a solution for this?
There are all sorts of ways to do this.
One, you can use an EJB Timer to create a run-once process that will start immediately. This is a good technique to spawn processes in the background. A EJB Timer is associated with a specific Session Bean implementation. You can either add an EJB Timer to every Session Bean that you want to be able to do this, or you can have a single Session Bean that can then call your application logic through some dispatch mechanism.
For me, I pass a serializable blob of parameters along with a class name that meets a specific interface to a generic Session Bean that then executes the class. This way I can easily background most anything.
One caveat about the EJB Timer is that EJB Timers are persistent. Once you create an EJB Timer is stays in the container until its job is finished or canceled. The gotcha on this is that if you have a long running process, and the server goes down, when it restarts the process will continue and pick back up. Mind this can be a good thing, but only if your process is prepared to be restarted. But if your have a simple process iterating through "10,000 items", if the server goes down on item 9,999, when it comes back up you can easily see it simply starting over at item 1. It's all workable, just a caveat to be aware of.
Another way to background something is you can use a JMS queue. Put a message on the queue, and the handler runs aysnchronously from the rest of your application.
The clever part here, and something I has also done leveraging the work with the Timer Bean, is you can control how many "jobs" will run based on how many MDB instances you configure the system to have.
So, for the specific task of running a process in multiple, parallel chunks, I take the task, break it up in to "pieces", and then send each piece on the Message Queue, where the MDBs execute them. If I allow 10 instances of the MDB, I can have 10 "parts" of any task running simultaneously.
This actually works surprisingly well. There's a little overhead it splitting the process up and routing it through the JMS queue, but that's all basically "start up time". Once it gets going, you get a real benefit.
Another benefit of using the Message Queue is you can have your actual long running processes executing on a separate machine, or you can readily create a cluster of machines to handle these processes. Yet, the interface is the same, and the code doesn't know the difference.
I've found once you've relegated a long running process to the background, you can pay the price of having less-that-instant access to that process. That is, there's no reason to monitor the executing classes themselves directly, just have them publish interesting information and statistic to the database, or JMX, or whatever rather than having something that can monitor the object directly because it shares the same memory space.
I was easily able to set up a framework that lets task run either on the EJB Timer or on the MDB scatter queue, the tasks are the same, and I could monitor their progress, stop them, etc.
You could combine the scatter technique to create several EJB Timer jobs. One of the free advantages of the MDB is it acts as a thread pool which can throttle your jobs (so you don't suddenly saturate your system with too many background processes). You get this "for free" just by leveraging the EJB management features in the container.
Finally, Java EE 6 has a new "asynchronous" (or something) qualifier for Session Bean methods. I do not know the details on how this works, as I've yet to play with a new Java EE 6 container. But I imagine you're probably not going to want to change containers just for this facility.
This particular question has come up on multiple occasions and I will summarize that there are several possible solutions, only 1 of which I would recommend.
Use a WorkManager from the commonj API. It allows for managed threads in a Java EE container and is specifically designed to fit your use case. If you are using WebSphere or WebLogic, these API's are already available in your server. For others your will have to put a third party solution in yourself.
WorkManager info
Related questions
Why Spawning threads is discouraged
An EJB is a ultimately a transactional component for a client-server system providing request/reply semantics. If you find yourself in the position that you need to pigeonhole a long-running transaction within the bounds of a request/reply cycle, then somewhere your system architect(ure) has taken the wrong turn.
The situation you describe is cleanly and correctly handled by an event based architecture with a messaging back end. Initial event initiates the process (which can then be trivially parallelized by having the workers subscribe to the event topic) and the aggregating process itself raises an event on its completion. You can still squeeze these sequence within the bounds of a request/reply cycle, but you will by necessity violate the letter and spirit of the Java EE system architecture specs.
Back to the Future - Java EE 7 has lot more Concurrency support via ManagedThreadFactory, ManagedExecutor service etc (JSR 236: Concurrency Utilities for Java EE) with which you can create your own 'managed'Threads .It is no longer a taboo in EE AS supporting it (Wildfly ?) via usining the ManagedThread* API's
More details
https://jcp.org/aboutJava/communityprocess/ec-public/materials/2013-01-1516/JSR236-EC-F2F-Jan2013.pdf
http://docs.oracle.com/javaee/7/tutorial/doc/concurrency-utilities002.htm
I once participated in a project where EJB transactions ran for up to 5 hours at a time. Aargh!
This same application also had a BEA specialist consultant who approved that they started additional threads from the transactions. While it's disrecommended in the specs and elsewhere, it doesn't automatically result in failure. You need to be aware that your extra threads are outside the container's control and thus if something goes wrong it's your fault. But if you can assure that the number of threads started in the worst case doesn't exceed reasonable limits, and that they all terminate cleanly within reasonable time, then it is quite possible to work like this. In fact, in your case it sounds like the almost-only solution.
There are some slightly esoteric solutions possible where your EJB app reaches out to another app for a service, which then does the multithreading in itself before returning to the EJB caller. But this is essentially just shifting the problem around.
You may, however, consider a thread pooling solution to keep an upper limit on the number of threads spawned. If you have too many threads your application will behave horribly.
You've analyzed the situation quite well, and no, there is not patern for this that match the EJB model.
Creating threads is mainly forbidden because it bypass the app. server thread management strategy and also because of the transactions.
I worked on a project with similar requireements and I decided to spawn additional threads (going against the sepc then). The operation to parallelized was read-only, so it worked regarding the transaction (the thread would basically have not transaction associated to them). I also knew that I wouldn't spawn too many threads per EJB calls, so the number of threads was not an issue. But if your threads are supposed to modify data, then you break the transactional model of the EJB seriously. But if your operation in pure computing, that might be ok.
Hope it helps...