One of the first things I've learned about Java EE development is that I shouldn't spawn my own threads inside a Java EE container. But when I come to think about it, I don't know the reason.
Can you clearly explain why it is discouraged?
I am sure most enterprise applications need some kind of asynchronous jobs like mail daemons, idle sessions, cleanup jobs etc.
So, if indeed one shouldn't spawn threads, what is the correct way to do it when needed?
It is discouraged because all resources within the environment are meant to be managed, and potentially monitored, by the server. Also, much of the context in which a thread is being used is typically attached to the thread of execution itself. If you simply start your own thread (which I believe some servers will not even allow), it does not carry that context and typically cannot access other resources. This means that you cannot get an InitialContext and do JNDI lookups to access other system resources such as JMS Connection Factories and Datasources.
There are ways to do this "correctly", but they depend on the platform being used.
The commonj WorkManager is common for WebSphere and WebLogic, as well as others.
More info here
And here
Also somewhat duplicates this one from this morning
UPDATE: Please note that this question and answer relate to the state of Java EE in 2009, things have improved since then!
For EJBs, it's not only discouraged, it's expressly forbidden by the specification:
An enterprise bean must not use thread synchronization primitives to synchronize execution of multiple instances.
and
The enterprise bean must not attempt to manage threads. The enterprise bean must not attempt to start, stop, suspend, or resume a thread, or to change a thread’s priority or name. The enterprise bean must not attempt to manage thread groups.
The reason is that EJBs are meant to operate in a distributed environment. An EJB might be moved from one machine in a cluster to another. Threads (and sockets and other restricted facilities) are a significant barrier to this portability.
The reason that you shouldn't spawn your own threads is that these won't be managed by the container. The container takes care of a lot of things that a novice developer can find hard to imagine. For example, things like thread pooling, clustering, and crash recovery are performed by the container. When you start a thread you may lose some of those. Also, the container lets you restart your application without affecting the JVM it runs on. How would this be possible if there were threads outside the container's control?
This is the reason that timer services were introduced in J2EE 1.4. See this article for details.
Concurrency Utilities for Java EE
There is now a standard and correct way to create threads with the core Java EE API:
JSR 236: Concurrency Utilities for Java™ EE
By using Concurrency Utils, you ensure that your new thread is created and managed by the container, guaranteeing that all EE services are available.
Examples here
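As a rough sketch of what this can look like in code: `ManagedExecutorService` (from `javax.enterprise.concurrent`) extends the plain `java.util.concurrent.ExecutorService`, so the submitting code can be written against the standard interface. The class `ReportService` and its methods are hypothetical names made up for this example. In a Java EE 7 container you would pass in the executor obtained via `@Resource ManagedExecutorService`; the `main` method below uses an ordinary Java SE pool only so that the sketch runs outside a container.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical service; it only depends on the ExecutorService interface.
class ReportService {
    private final ExecutorService executor;

    // Assumption: inside a Java EE 7 container the caller passes the
    // container-managed executor (e.g. injected with @Resource).
    ReportService(ExecutorService executor) {
        this.executor = executor;
    }

    // Hands the work to the executor instead of calling new Thread(...).start().
    Future<Integer> wordCountAsync(String text) {
        return executor.submit(() -> {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        });
    }

    // Convenience for the demo: submits the task and blocks until it is done.
    static int countWith(ExecutorService executor, String text) {
        try {
            return new ReportService(executor).wordCountAsync(text).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // Stand-in pool for running outside a container; no server manages it.
        ExecutorService standIn = Executors.newSingleThreadExecutor();
        try {
            System.out.println(countWith(standIn, "managed threads only"));
        } finally {
            standIn.shutdown();
        }
    }
}
```

The point of coding against the interface is that the same class works with whatever executor the container hands you, and the application itself never creates a thread.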
There is no real reason not to do so. I used Quartz with Spring in a webapp without problems. Also, the concurrency framework java.util.concurrent may be used. If you implement your own thread handling, set the threads to daemon or use your own daemon thread group for them so the container may unload your webapp at any time.
But be careful, the bean scopes session and request do not work in spawned threads! Also, other code based on ThreadLocal does not work out of the box; you need to transfer the values to the spawned threads yourself.
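To illustrate the ThreadLocal point, here is a minimal plain Java SE sketch. The class `ThreadLocalHandOff` and the `CURRENT_USER` variable are hypothetical stand-ins for whatever per-request state a framework keeps in a ThreadLocal. It shows that a newly spawned (daemon) thread does not see the caller's value unless you copy it over yourself.

```java
import java.util.concurrent.atomic.AtomicReference;

class ThreadLocalHandOff {
    // Stand-in for per-request state that a framework keeps in a ThreadLocal.
    static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Returns what a freshly spawned thread sees when it reads the ThreadLocal.
    static String readInSpawnedThread() {
        AtomicReference<String> seen = new AtomicReference<>();
        runAndWait(() -> seen.set(CURRENT_USER.get()));
        return seen.get();
    }

    // Captures the value on the calling thread and sets it again in the worker.
    static String readWithManualTransfer() {
        final String captured = CURRENT_USER.get(); // read on the caller's thread
        AtomicReference<String> seen = new AtomicReference<>();
        runAndWait(() -> {
            CURRENT_USER.set(captured); // re-establish the value in the worker
            try {
                seen.set(CURRENT_USER.get());
            } finally {
                CURRENT_USER.remove(); // do not leave state behind on the worker
            }
        });
        return seen.get();
    }

    private static void runAndWait(Runnable body) {
        Thread worker = new Thread(body);
        worker.setDaemon(true); // daemon, so it never keeps the JVM alive
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With `CURRENT_USER` set to some value on the calling thread, `readInSpawnedThread()` returns null while `readWithManualTransfer()` returns the caller's value.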
You can always tell the container to start stuff as part of your deployment descriptors. These can then do whatever maintenance tasks you need to do.
Follow the rules. You will be glad some day you did :)
Threads are prohibited in Java EE containers according to the blueprints. Please refer to the blueprints for more information.
I've never read that it's discouraged, apart from the fact that it's not easy to do correctly.
It is fairly low-level programming, and like other low-level techniques you ought to have a good reason. Most concurrency problems can be resolved far more effectively using built-in constructs like thread pools.
One reason I have found: if you spawn some threads in your EJB and then try to have the container unload or update your EJB, you are going to run into problems. There is almost always another way to do something where you don't need a Thread, so just say NO.
Related
I am working on a platform that hosts small Java applications, all of which currently use a single thread, living inside a Docker engine, consuming data from a Kafka server and logging to a central DB.
Now, I need to put another Java application to this platform. This app at hand uses multithreading relatively heavily, I already tested it inside a Docker container and it works perfectly there, so I'm ready to deploy it on the platform where it would be scaled manually, that is, some human would define the number of containers that would be started, each of them containing an instance of this app.
My Architect has an objection, saying that "In a distributed environment we never use multithreading". So now, I have to refactor my application eliminating any thread related logic from it, making it single threaded. I requested a more detailed reasoning from him, but he yells "If you are not aware of this principle, you have no place near Java".
Is it really a mistake to use a multithreaded Java application in a distributed system: a simple cluster with ten or twenty physical machines, each hosting a number of virtual machines, which then run Docker containers with Java applications inside them?
Honestly, I don't see the problem of multithreading inside a container.
Is it really a mistake or somehow "forbidden"?
Thanks.
When you write for example a web application that will run in a Java EE application server, then normally you should not start up your own threads in your web application. The application server will manage threads, and will allocate threads to process incoming requests on the server.
However, there is no hard rule or reason why you should never use multi-threading in a distributed environment.
There are advantages to making applications single-threaded: the code will be simpler and you won't have to deal with difficult concurrency issues.
But "in a distributed environment we never use multithreading" is not necessarily always true and "if you are not aware of this principle, you have no place near Java" sounds arrogant and condescending.
I guess he only tells you this as using a single thread eliminates multi threading and data ordering issues.
There is nothing wrong with multithreading though.
Distributed systems usually have tasks that are heavily I/O bound.
If I/O calls are blocking in your system:
The only way to achieve concurrency within the process is to spawn new threads to do other useful work (multi-threading).
The caveat with this approach is that if there are too many threads in flight, the operating system will spend too much time context switching between threads, which is wasteful work.
If I/O calls are non-blocking in your system:
Then you can avoid the multi-threading approach and use a single thread to service all your requests (read about event loops, Java's Netty framework, or NodeJS).
The upside of the single-thread approach:
The OS does not do any wasteful thread context switches.
You will NOT run into any concurrency problems like deadlocks or race conditions.
The downside is that:
It is often harder to code/think in a non-blocking fashion.
You typically end up using more memory in the form of blocking queues.
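For the blocking case, the usual way to keep the "too many threads in flight" caveat under control is a bounded thread pool. The following is only a sketch under assumptions: `BoundedBlockingCalls` is a made-up class, and `Thread.sleep` stands in for a real blocking I/O call. It runs a number of blocking calls on a fixed-size pool and reports how many were ever in flight at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedBlockingCalls {
    // Runs `calls` simulated blocking calls on at most `poolSize` threads and
    // returns the highest number of calls that were in flight at the same time.
    static int maxInFlight(int poolSize, int calls) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                tasks.add(() -> {
                    int now = inFlight.incrementAndGet();
                    max.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(20); // stand-in for a blocking I/O call
                    } finally {
                        inFlight.decrementAndGet();
                    }
                    return null;
                });
            }
            // invokeAll blocks until every task has finished.
            for (Future<Void> result : pool.invokeAll(tasks)) {
                result.get(); // surfaces any exception thrown by a task
            }
            return max.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `BoundedBlockingCalls.maxInFlight(3, 12)` never reports more than 3 concurrent calls, however many tasks are queued; the remaining tasks simply wait their turn.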
What? We use RxJava and Spring Reactor pretty heavily in our application and it works pretty fine. You can't work with threads across two JVMs anyway. So just make sure that your logic is working as you expect on a single JVM.
I know that spinning off threads in a JavaEE application is a big no-no. However, I have an application that is a perfect candidate for Java's fork/join mechanism. But, since threads are not supposed to be created within an application, is there any way to make use of this within my EJB? I know that on WebSphere applications the async bean functionality provides this. However, my application is deployed on JBoss EAP 6.1, so this is not an option.
Is there a "legal" way to accomplish fork/join within a JavaEE application?
The best answer now is the Concurrency Utils API in the Java EE 7 specification. You have ManagedExecutorService and ManagedThreadFactory. Since these managed threads and managed tasks are controlled by the application server, make sure your fork/join computation uses these resources; that way the threads are contained and not orphaned.
Finally you probably have to write a version of ForkJoinPool that is 'Managed' to get the optimal solution. However it should be possible because one would replace the thread pool executor with the managed version as a first step.
PS: Java EE 8 must resolve this when Java SE 8 is released!
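A minimal sketch of that "first step", assuming the work can be split into independent chunks: submit the chunks with `invokeAll` (the fork) and collect the partial results with `Future.get` (the join). `ChunkedSum` is a hypothetical class written against the plain `ExecutorService` interface, which `ManagedExecutorService` extends, so a container-managed executor could be passed in. This is not a work-stealing ForkJoinPool, just a flat split/collect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class ChunkedSum {
    // Sums `data` by splitting it into chunks that run on the given executor.
    static long sum(ExecutorService executor, int[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Callable<Long>> parts = new ArrayList<>();
        for (int start = 0; start < data.length; start += chunkSize) {
            final int from = start;
            final int to = Math.min(start + chunkSize, data.length);
            parts.add(() -> {
                long partial = 0;
                for (int i = from; i < to; i++) {
                    partial += data[i];
                }
                return partial;
            });
        }
        try {
            long total = 0;
            // "fork": hand all chunks to the executor and wait for completion
            for (Future<Long> part : executor.invokeAll(parts)) {
                total += part.get(); // "join": collect each partial result
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Note that the subtasks here never wait on further subtasks; recursive splitting on a fixed-size executor can starve itself, which is part of what a real ForkJoinPool is designed to avoid.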
JBoss EAP 6.1 actually supports async EJBs as well. But AFAIK async EJBs only really help you when you don't need to wait for the results of the subtasks (eg. you only need the fork part, not the join part).
If you use java.util.concurrent.ForkJoinPool there isn't a legal way of using it in Java EE, and Java EE 7 / JSR-236 does not help (I raised this point with the EG but they couldn't be bothered). ForkJoinPool spawns threads, which is illegal, and ManagedThreadFactory from EE 7 / JSR-236 is not a ForkJoinWorkerThreadFactory.
In JDK 8 there is a default ForkJoinPool which can be configured to not spawn any threads and run everything in the caller thread (probably through the "java.util.concurrent.ForkJoinPool.common.parallelism" system property). This makes it legal but won't give you any parallelism.
On a more general note, fork/join tasks should be compute bound and not IO bound. In theory, spawning threads in Java EE is safe as long as you don't use any Java EE features (and let them terminate when your application undeploys). For example:
transactions
JDBC
JPA (including lazy loading)
security
EJBs
remoting
class loading
JNDI
…
These features also generally make tasks IO bound instead of CPU bound.
And yes the same issues apply to JDK 8 parallel streams.
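For reference, this is roughly the kind of compute-bound fork/join code the answer is talking about, as a plain Java SE sketch (not something the Java EE spec sanctions). `RangeSum` and its threshold are hypothetical. It runs on the JDK 8 common pool, whose parallelism is taken from the system property mentioned above; as far as I know that property is only read when the common pool is first initialised, so it has to be set at JVM start.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums a slice of an int array by recursively splitting it in half.
class RangeSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // below this, sum sequentially

    private final int[] data;
    private final int from;
    private final int to;

    RangeSum(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        RangeSum left = new RangeSum(data, from, mid);
        RangeSum right = new RangeSum(data, mid, to);
        left.fork();                      // schedule the left half asynchronously
        long rightSum = right.compute();  // compute the right half in this thread
        return left.join() + rightSum;    // wait for the left half and combine
    }

    // Runs the computation on the JDK's shared common pool.
    static long sumOnCommonPool(int[] data) {
        return ForkJoinPool.commonPool().invoke(new RangeSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: "
                + ForkJoinPool.getCommonPoolParallelism());
        System.out.println(sumOnCommonPool(new int[] {1, 2, 3}));
    }
}
```

The task touches nothing but an in-memory array, which is the "compute bound, no Java EE features" situation described above.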
You can create a Singleton EJB, which is similar to a "service" (i.e. a JMX service). Within the context of this special service, you can control threading and synchronization. So, you can create a singleton EJB which encapsulates the job execution with the fork/join logic, and your standard EJBs/MDBs can utilize this service.
In JavaEE the container controls the threads. Just imagine what would happen if every programmer decided to create his own threads? In any case the use of threads in the parallel bulk operations for Java8 have been rejected. SEE HERE You can make any assumption you want, yes I wrote the article.
Within a Java EE 5 environment, I need to ensure that some data written by another part of the system exists before I continue processing my own data.
Historically (J2EE time), it was done by putting the data object to be processed into an internal JMS queue after waiting for e.g. 500ms via Thread.sleep.
But this does not feel like the best way to handle that problem, so I have 2 questions:
Is there any problem with using the sleep method within a Java EE context?
What is a reasonable solution for delaying some processing within a Java EE 5 application?
Edit:
I should have mentioned that my processing takes place while handling objects from a JMS queue via an MDB.
And it may be the case that the data I'm waiting for never shows up, so there must be some sort of timeout, after which I can do some special processing with my data.
You can use EJB TimerService feature. Using threads in a managed environment should be avoided.
I agree with #dkaustubh about timers and avoiding thread manipulation in JavaEE.
Another possibility is to use a JMS queue with delayed delivery. Although it is not part of the JavaEE API, most messaging system vendors support it. Check here.
I think it's possible with a more advanced threading approach. Rather than thinking about manual synchronization and thread management, you can always use the Java Concurrent package.
Future can be one of the ways to do this. Please refer to Java Concurrent package.
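A minimal sketch of the Future idea, including the timeout the question asks for. `AwaitWithTimeout` and `getOrFallback` are hypothetical names, and the executor is whatever the environment allows you to use (an assumption; the answer does not say where it would come from in Java EE 5).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class AwaitWithTimeout {
    // Waits up to timeoutMillis for the task's result; returns `fallback`
    // if the result does not arrive in time.
    static <T> T getOrFallback(ExecutorService executor, Callable<T> task,
                               long timeoutMillis, T fallback) {
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the task, we no longer need it
            return fallback;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

The fallback branch is where the "special processing" for data that never shows up would go.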
Use notifications and Object#wait() / Object#notifyAll()
i.e. Multithreaded, the producer notifies the consumer.
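A plain Java SE sketch of that producer/consumer notification, with a timeout on the consumer side. `DataLatch` is a hypothetical helper. Whether blocking a container thread like this is acceptable inside an MDB is a separate question that this sketch does not answer.

```java
class DataLatch {
    private final Object lock = new Object();
    private boolean available; // guarded by lock

    // Producer side: mark the data as written and wake up all waiting consumers.
    void markAvailable() {
        synchronized (lock) {
            available = true;
            lock.notifyAll();
        }
    }

    // Consumer side: returns true if the data became available within
    // timeoutMillis, false on timeout or if the waiting thread was interrupted.
    boolean awaitAvailable(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (lock) {
            // Loop to guard against spurious wakeups and early notifications.
            while (!available) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // timed out
                }
                try {
                    // Never pass 0 here: wait(0) would block indefinitely.
                    lock.wait(Math.max(1L, remainingNanos / 1_000_000L));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}
```

The consumer calls `awaitAvailable(500)` and falls back to its special processing when the method returns false.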
Is it a good idea to use ThreadLocal as a context for data in web application?
That's what it was made for. But take care to remove the ThreadLocal at the end of the context, else you might run into memory leaks, or at least hold unused data for too long.
ThreadLocals are also very fast; you can think of one as a HashMap<Thread,Object> which is always queried with Thread.currentThread().
That depends on the scope of the data. The ThreadLocal will be specific to the request thread, not specific to the user's session (for each request, a different request processing thread may be used). Hence it will be important to remove the data as the request processing is completing (so that it doesn't bleed over into some other user's session when that same thread services their request).
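The cleanup both answers insist on is usually done with try/finally. Here is a minimal sketch, assuming a hypothetical `RequestContext` holder; the single-thread executor stands in for a container's pooled request thread, and the two tasks stand in for two consecutive requests served by it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class RequestContext {
    private static final ThreadLocal<String> USER = new ThreadLocal<>();

    static String currentUser() {
        return USER.get();
    }

    // Binds the user to the current thread for the duration of the handler
    // and always removes it afterwards, even if the handler throws.
    static <T> T runAs(String user, Callable<T> handler) {
        USER.set(user);
        try {
            return handler.call();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            USER.remove();
        }
    }

    // Demo: two "requests" handled one after the other by the same pooled
    // thread; returns the user that the second request observes.
    static String secondRequestSees() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> first = () -> runAs("alice", () -> currentUser());
            Callable<String> second = () -> currentUser();
            pool.submit(first).get();
            return pool.submit(second).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because of the `remove()` in the finally block, the second request sees null rather than the first request's user. In a servlet application the same try/finally would typically live in a filter that wraps the request.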
If you are completing a request/response pair with a single thread, then it works fine in my experience. However, "event driven" webapps are coming into vogue with the rise of ajax and high performance containers. These event driven models allow for a request thread to be returned to their thread pool, e.g. during I/O events, so that the thread is not occupied waiting for an external service call to return. As a result, a single logical request may be serviced by multiple different threads. Event driven architecture, coupled with NIO on the server side can yield highly improved throughput.
With that said, if your application doesn't have this architecture, it seems reasonable to me.
If you're not familiar with this model, take a look at Tomcat 6's "comet" and Jetty 6's Continuations. These are vendor-specific implementations of asynchronous I/O pending official Servlet 3.0 support. Note that Tomcat 7 claims to be fully 3.0 compliant now.
ThreadLocal in a multithreaded program is much the same as a static/global in a non-threaded program. That is to say, use of ThreadLocal is an abomination.
In general I would say no. Use frameworks to do that for you.
In the web tier of a web application, use the session context (or other framework-specific contexts built on top of it) to store data and state beyond request scope.
If you introduce a business layer, it should of course not depend on a specific web context. Spring and Java EE provide solutions for security, transactions and persistence as a context.
If you touch this manually you should be really careful; it can lead to cleanup problems, memory leaks, and strange bugs...
Without having the source code for a Java API, is there any way to know if the API methods create multiple threads? Are there any conventions to follow if you are writing Java APIs and they create multiple threads? This may be a very fundamental question, but it happened to spawn out of a discussion in which the crux question was: "How do you know which Java APIs create threads and which don't?"
One way of determining which libraries create new threads is by disallowing Thread creation and ThreadGroup modification in the SecurityManager. See the java.lang.SecurityManager.checkAccess(Thread) method. By implementing your own SecurityManager, you are able to react to the creation of Threads.
To answer the other question: many libraries create new threads, even if you don't expect it. For example APIs for HTTP communication create Timers for Keep-Alives or session timeouts. Java 2D is creating a signalling thread. Java itself has multiple threads, e.g. the Finalizer thread; the AWT/Swing event dispatcher thread etc.
There's no way to tell. Actually, I don't think you normally would care that much unless you're in some kind of constrained environment. What I've found more relevant is to determine if a method is written with an expectation of being run on a particular thread (the AWT Event dispatch thread, in the case I've seen). There's not a way to do that either, unless the code is using some kind of naming convention, or it's documented.
In my experience, if you are looking at core java, not J2EE, the only time I can think that threads are created in core Java is with Swing.
I haven't seen any example of other threads being created by the core Java APIs, except for the Thread class, of course. :)
But if you are using other libraries, then it may be that they are creating threads. If you don't want to profile, you may want to use AspectJ to log whenever a new thread is created, along with the stack trace of what called it, so you can see what is creating the threads.
UPDATE:
Swing uses 4 threads, according to this post, but he also explains how you can go about killing off the threads, if needed.
http://www.herongyang.com/Swing/jframe_2.html
If you want to see active threads, just fire up the jvisualvm application (located in your $JDK/bin directory) to connect to any local java process. You'll be able to see a multitude of information about the process, including thread names, status, and history. Get more information here.
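If you want the same kind of information from code rather than from a tool, a rough approach is to compare the set of live threads before and after calling into the API you are curious about. This is only a sketch: `ThreadSnapshot` is a made-up helper, it compares threads by name (so threads with duplicate names can be missed), and unrelated threads that happen to start in the meantime will show up too.

```java
import java.util.Set;
import java.util.Timer;
import java.util.TreeSet;

class ThreadSnapshot {
    // Names of all threads that are currently alive in this JVM.
    static Set<String> liveThreadNames() {
        Set<String> names = new TreeSet<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            names.add(t.getName());
        }
        return names;
    }

    // Runs the action and returns the names of threads that are alive
    // afterwards but were not alive before.
    static Set<String> threadsStartedBy(Runnable action) {
        Set<String> before = liveThreadNames();
        action.run();
        Set<String> after = liveThreadNames();
        after.removeAll(before);
        return after;
    }

    public static void main(String[] args) {
        // java.util.Timer starts its worker thread in the constructor.
        Timer[] holder = new Timer[1];
        Set<String> started =
                threadsStartedBy(() -> holder[0] = new Timer("demo-timer", true));
        System.out.println("new threads: " + started);
        holder[0].cancel();
    }
}
```

Threads that a library starts lazily, or that finish before the second snapshot is taken, will not be caught this way, so treat the result as a hint rather than proof.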