Fork/Join for JavaEE app? - java

I know that spinning off threads in a JavaEE application is a big no-no. However, I have an application that is a perfect candidate for Java's fork/join mechanism. But, since threads are not supposed to be created within an application, is there any way to make use of this within my EJB? I know that on WebSphere applications the async bean functionality provides this. However, my application is deployed on JBoss EAP 6.1, so this is not an option.
Is there a "legal" way to accomplish fork/join within a JavaEE application?

The best answer now is the Concurrency Utils API in the Java EE 7 specification. You have ManagedExecutors and ManagedThreadPools. Since these managed threads features are and managed tasks are controlled by the application server then ensure your fork join computation uses these resources then you can ensure that threads are contained and not orphaned.
Finally you probably have to write a version of ForkJoinPool that is 'Managed' to get the optimal solution. However it should be possible because one would replace the thread pool executor with the managed version as a first step.
PS: Java EE 8 must resolve this when Java SE 8 is released!

JBoss EAP 6.1 actually supports async EJBs as well. But AFAIK async EJBs only really help you when you don't need to wait for the results of the subtasks (eg. you only need the fork part, not the join part).
If you use java.util.concurrent.ForkJoinPool there isn't a legal way of using it in Java EE and Java EE 7 / JSR-236 does not help (I raised this point with the EG but they couldn't be bothered). ForkJoinPool spwans threads which is illegal and ManagedThreadFactory from EE 7 / JSR-236 is not a ForkJoinWorkerThreadFactory.
In JDK 8 there is a default ForkJoinPool which can be configured to not spawn any threads and run everything in the caller thread (probably through the "java.util.concurrent.ForkJoinPool.common.parallelism" system property). This makes it legal but won't give you any parallelism.
On a more general note fork join tasks should be compute bound and not IO bound. In theory spawning thread threads in Java EE is safe as long as you don't use any Java EE features (and let them terminate when your application undeploys). For example:
transactions
JDBC
JPA (including lazy loading)
security
EJBs
remoting
class loading
JNDI
…
These features also generally make tasks IO bound instead of CPU bound.
And yes the same issues apply to JDK 8 parallel streams.

You can create a Singleton ejb, which is similar to a "service" (i.e. a JMX service). within the context of this special service, you can control threading and synchronization. so, you can create a singleton ejb which encapsulates the job execution with the fork/join logic, and your standard ejbs/mdbs can utilize this service.

In JavaEE the container controls the threads. Just imagine what would happen if every programmer decided to create his own threads? In any case the use of threads in the parallel bulk operations for Java8 have been rejected. SEE HERE You can make any assumption you want, yes I wrote the article.

Related

How to create threads in Java EE environment?

I have a requirement where I have to persist some data in a table and the persisting may take sometime. Basically I want to persist a log. I don't want the execution to wait till the persisting finishes.
I know I have to use threads to accomplish this task and I know that it is discouraged to create threads in an enterprise application.
So I started reading about worker manager and understood and tried a sample program in websphere application server 8.5.
I used asynchbeans.jar from websphere and now I am bothered that I am writing vendor specific code.
Then I came across commonj work api which is described in oracle java documentation. Now I am thinking to use commonj api from fabric3.
My doubt is, is there a better way to accomplish the same task? An EJB way? Or work manager is good for my requirement?
You have some options:
Asynchronous beans. These are vendor-specific, as you mention.
commonj is just barely not vendor-specific. As far as I know, it was only implemented by IBM WebSphere Application Server and BEA WebLogic. The API was effectively superseded by Concurrency Utilities for Java EE, which is really the best choice.
EJB #Asynchronous methods. Requires using EJBs (unwanted complexity for some).
EJB timers. Requires using EJBs, requires serializable data.
JMS. Probably requires using MDBs to receive the message, requires serializable data.
Actually create threads. The EE specs do not recommend this, but as long as you don't attempt to use EE constructs (lookup("java:..."), JPA, UserTransaction, etc.), then you should be fine.
JavaEE7 has the managed executor, that you can try. You can spawn a task with it, and recieve managed callbacks in a handler. This is part of EE standard and should be platform agnostic.
See JDoc here:
http://docs.oracle.com/javaee/7/api/javax/enterprise/concurrent/ManagedExecutorService.html
If you need to be sure that all your log entries are safely written, then you probably should use JMS with persistent messages. Otherwise you could use #Asynchronous EJB methods.

How to delay processing reasonably in Java EE context?

Within a Java EE 5 environment I have the problem to ensure the existence of some data written by another part before continue processing my own data.
Historically (J2EE time), it was done by putting the data object to be processed into an internal JMS queue after waiting for e.g. 500ms via Thread.sleep.
But this does not feel like the best way to handle that problem, so I have 2 questions:
Is there any problem with using the sleep method within an Java EE context?
What is a reasonable solution to delaying some processing within an Java EE 5 application?
Edit:
I should have mentioned, that my processing takes place while handling objects from a JMS queue via an MDB.
And it may be the case, that the data for which I'm waiting never shows up, so there must be some sort of timeout, after which I can do some special processing with my data.
You can use EJB TimerService feature. Using threads in a managed environment should be avoided.
I agree with #dkaustubh about timers and avoiding threads manipulation in JavaEE.
Another possibility is to use JMS queue with delayed delivery. Although it is not a part of JavaEE API, most of messaging systems vendors supports it. check here.
I think, its possible with some advanced Threading approach. More than thinking on manual synchronizations and thread management, you can always use the Java Concurrent package.
Future can be one of the ways to do this. Please refer to Java Concurrent package.
Use notifications and Object#wait() / Object#notifyAll()
i.e. Multithreaded, the producer notifies the consumer.

Java synchronization between different JVMs

The project I am working on would trigger various asynchronous jobs to do some work. As I look into it more these asynchronous jobs are actually being run as separate JVMs (separate java processes). Does it mean I would not be able to use any of the following if I need to synchronize between these processes:
synchronized methods/blocks
any lock that implements java.util.concurrent.locks
Because it seems to me they are all thread-level?
Does Java provide support for IPC like semaphores between processes?
That's right. You can not use any standard synchronization mechanisms because they are working into one JVM.
Solutions
You can use file locks introduced in java 7.
You can use synchronization via database entities.
One of already implemented solutions like Terracota may be helpful
Re-think your design. If you are beginner in java world try to talk in details with more experienced engineers. Your question shows that IMHO you are just on wrong way.
You can use synchronized keyword, locks, atomic objects, etc. - but they are local to the JVM. So if you have two JVMs running the same program, they can still e.g. run the same synchronized method at the same time - one on each JVM, but not more.
Solutions:
terracotta provides distributed locking
hazelcast as well
you can use manual synchronization on file system or database
I'm using distributed lock provided by Redisson to synchronize work of different JVMs
they are all thread-level?
That's correct, synchronized etc only work within the context of a single process.
Does Java provide support for IPC like semaphores between processes?
One way to implement communication between Java processes is using RMI.
I have implemented a java IPC Lock implementation using files: FileBasedLock and a IPC Semaphore implementation using a shared DB (jdbc): JdbcSemaphore. Both implementations are part of spf4j.
If you have a zookeeper instance take a look at the Zookeeper based Lock recipes from Apache Curator

Why Threads, Sockets are not allowed within a Java EE component? [duplicate]

One of the first things I've learned about Java EE development is that I shouldn't spawn my own threads inside a Java EE container. But when I come to think about it, I don't know the reason.
Can you clearly explain why it is discouraged?
I am sure most enterprise applications need some kind of asynchronous jobs like mail daemons, idle sessions, cleanup jobs etc.
So, if indeed one shouldn't spawn threads, what is the correct way to do it when needed?
It is discouraged because all resources within the environment are meant to be managed, and potentially monitored, by the server. Also, much of the context in which a thread is being used is typically attached to the thread of execution itself. If you simply start your own thread (which I believe some servers will not even allow), it cannot access other resources. What this means, is that you cannot get an InitialContext and do JNDI lookups to access other system resources such as JMS Connection Factories and Datasources.
There are ways to do this "correctly", but it is dependent on the platform being used.
The commonj WorkManager is common for WebSphere and WebLogic as well as others
More info here
And here
Also somewhat duplicates this one from this morning
UPDATE: Please note that this question and answer relate to the state of Java EE in 2009, things have improved since then!
For EJBs, it's not only discouraged, it's expressly forbidden by the specification:
An enterprise bean must not use thread
synchronization primitives to
synchronize execution of multiple
instances.
and
The enterprise bean must not attempt
to manage threads. The enterprise
bean must not attempt to start, stop,
suspend, or resume a thread, or to
change a thread’s priority or name.
The enterprise bean must not attempt
to manage thread groups.
The reason is that EJBs are meant to operate in a distributed environment. An EJB might be moved from one machine in a cluster to another. Threads (and sockets and other restricted facilities) are a significant barrier to this portability.
The reason that you shouldn't spawn your own threads is that these won't be managed by the container. The container takes care of a lot of things that a novice developer can find hard to imagine. For example things like thread pooling, clustering, crash recoveries are performed by the container. When you start a thread you may lose some of those. Also the container lets you restart your application without affecting the JVM it runs on. How this would be possible if there are threads out of the container's control?
This the reason that from J2EE 1.4 timer services were introduced. See this article for details.
Concurrency Utilities for Java EE
There is now a standard, and correct way to create threads with the core Java EE API:
JSR 236: Concurrency Utilities for Java™ EE
By using Concurrency Utils, you ensure that your new thread is created, and managed by the container, guaranteeing that all EE services are available.
Examples here
There is no real reason not to do so. I used Quarz with Spring in a webapp without problems. Also the concurrency framework java.util.concurrent may be used. If you implement your own thread handling, set the theads to deamon or use a own deamon thread group for them so the container may unload your webapp any time.
But be careful, the bean scopes session and request do not work in threads spawned! Also other code beased on ThreadLocal does not work out of the box, you need to transfer the values to the spawned threads by yourself.
You can always tell the container to start stuff as part of your deployment descriptors. These can then do whatever maintainance tasks you need to do.
Follow the rules. You will be glad some day you did :)
Threads are prohibited in Java EE containers according to the blueprints. Please refer to the blueprints for more information.
I've never read that it's discouraged, except from the fact that it's not easy to do correctly.
It is fairly low-level programming, and like other low-level techniques you ought to have a good reason. Most concurrency problems can be resolved far more effectively using built-in constructs like thread pools.
One reason I have found if you spawn some threads in you EJB and then you try to have the container unload or update your EJB you are going to run into problems. There is almost always another way to do something where you don't need a Thread so just say NO.

Quartz in Webapplication

I have a question in scheduling jobs in web application. If we have to schedule jobs in web application we can either use java util Timer/TimerTask or Quartz(there are also other scheduling mechanism, but I considered Quartz). I was considering which one to use, when i hit the site http://oreilly.com/pub/a/java/archive/quartz.html?page=1 which says using timer has a bad effect as it creates a thread that is out of containers control in the last line. The other pages discuss Quartz and its capabilities, but I can read that Quartz also uses thread and/or threadpool to schedule tasks. My guess is that these threads are also not under the containers control
Can anybody clarify this to me
Is it safe to use Quartz in my web applications without creating hanging threads or thread locking issues?
Thanks in advance
Can anybody clarify this to me Is it safe to use Quartz in my web applications without creating hanging threads or thread locking issues?
Both quartz and the JDK Timer start unmanaged threads that do not have access to Java EE contextual information, that's the biggest issue. In addition, they can use resources without the [application server] knowing about it, exist without an administrator's ability to control their number and resource usage, and impede on the application server's ability to gracefully shutdown or recover resources from failure (see Unmanaged threads).
Having that said, I didn't face hanging threads or locking issues (I guess it depends on what you're doing with them though).
If really this is a concern, consider using a JSR-237 Timer and WorkManager implementation (that works with managed thread) like Foo-CommonJ instead of quartz or JDK Timer.
Both approaches created unmanaged threads. I use Quartz for scheduling rather than java Timer since it offer more flexability (cron expressions, for example) and it better managable.
If I have to say in one line, I would say use Quartz as it will take care of managing scheduling related low level work for you. With Timer, you can do everything which quartz does (even make timer threads keep polling to check if web app is running and exit otherwise). But this needs to be done in your code by you. With Quartz all this you get out of the box.
Now details
Quartz provides
1. Job persistence
2. Managed thread pool so you create appropriate number of threads and make the jobs wait after that.
3. Initialization servlet to be integrated with your web application. When app shuts down, I think it takes care of closing your threads, but I have not tried it. So I would not comment much on it.
4. RMI based scheduling, for clustered environments.
There are others as well, but these ones have been the biggest motivators why people use quartz more frequently.

Categories