scheduling tasks on JBoss with clustering

I need to be able to run some scheduled tasks (reports) for an EJB application running on JBoss 4.2.
In my initial implementation I am using a servlet in an associated WAR to read some configuration from a properties file and then reset the scheduled tasks using the Timer Service API. This works but it seems a bit awkward to have the initialization off in a web project. Also I'm not sure if this will work as expected when the app is deployed in a clustered environment.
What is the best practice for accomplishing this type of task? Should I be using something other than the Timer Service, and is there a better way to initialize the timers when the server starts?

Maybe have a look at Quartz Scheduler. Quoting its website:
Quartz is a full-featured, open source job scheduling system that can be integrated with, or used along side virtually any J2EE or J2SE application - from the smallest stand-alone application to the largest e-commerce system. Quartz can be used to create simple or complex schedules for executing tens, hundreds, or even tens-of-thousands of jobs; jobs whose tasks are defined as standard Java components or EJBs. The Quartz Scheduler includes many enterprise-class features, such as JTA transactions and clustering.
I've used it in the past to trigger EJB jobs, and the whole solution worked very well and scaled very well. To use it with EJBs, you'll need the JobStoreCMT to store scheduling information (jobs, triggers and calendars). To tune the resources used for job execution, have a look at the Configure ThreadPool Settings doc. Then, just let the EJB client do its job of load balancing requests over the different instances if the EJBs are deployed on a cluster.
Quartz itself can also be clustered to get both high availability and scalability through fail-over and load balancing if required.
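As a rough sketch of what that configuration might look like, the following builds the Quartz settings for a clustered JobStoreCMT in plain Java, so the result could be handed to Quartz's StdSchedulerFactory(Properties) constructor. The class name ClusteredQuartzConfig, the scheduler name "ReportScheduler" and the data source names are assumptions made for illustration; the two data sources themselves still have to be defined separately (for example through org.quartz.dataSource.NAME.jndiURL entries), and the delegate class depends on your database.

```java
import java.util.Properties;

/** Builds Quartz settings for a clustered job store that takes part in container-managed transactions. */
final class ClusteredQuartzConfig {

    private ClusteredQuartzConfig() {}

    /**
     * @param managedDs    name of a Quartz data source backed by a container-managed (JTA) DataSource
     * @param nonManagedDs name of a second data source whose connections are not container-managed
     * @param threads      number of worker threads this node may use to run jobs
     */
    static Properties build(String managedDs, String nonManagedDs, int threads) {
        Properties p = new Properties();
        // Hypothetical name; it must be identical on every node of the cluster.
        p.setProperty("org.quartz.scheduler.instanceName", "ReportScheduler");
        // Let Quartz generate a unique id for each node.
        p.setProperty("org.quartz.scheduler.instanceId", "AUTO");
        p.setProperty("org.quartz.threadPool.class", "org.quartz.simpl.SimpleThreadPool");
        p.setProperty("org.quartz.threadPool.threadCount", Integer.toString(threads));
        p.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreCMT");
        p.setProperty("org.quartz.jobStore.driverDelegateClass",
                "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        p.setProperty("org.quartz.jobStore.dataSource", managedDs);
        p.setProperty("org.quartz.jobStore.nonManagedTXDataSource", nonManagedDs);
        // Nodes sharing the same tables coordinate so a trigger fires on only one of them.
        p.setProperty("org.quartz.jobStore.isClustered", "true");
        return p;
    }
}
```

The same keys can of course live in a quartz.properties file instead; building them in code is only a convenience here.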
Regarding the properties file you mentioned, I'm not sure exactly what kind of data you need to read, but without a servlet, anything you need to read will have to come from the database.

Related

Spring Boot - Running one specific background job per pod

I'm coming from the PHP/Python/JS world, where it's standard to run multiple instances of a web application as separate processes, and asynchronous tasks such as queue processing as separate scripts.
E.g., in a k8s environment, there would be:
N instances of the web server only, each running in a separate pod
For each queue, a dynamic number of consumers, each in a separate pod
Cron scheduling using the k8s CronJob functionality, leaving the scheduling process to k8s
Such an approach suits the cloud well, because the workload can be scheduled across either a small number of powerful machines or a large number of less powerful ones, and it allows very fine control of autoscaling (based on the number of messages in a specific queue, for example).
Also, there is a clear separation between the developer and DevOps responsibility.
Recently, I tried to replicate the same setup with Java Spring Boot application and failed miserably.
Even though Java frameworks claim to be "cloud native", all the documentation still seems to be built around a monolithic application that handles all consumers and cron scheduling in separate threads.
The obvious answer to this problem is microservices, but that's way out of scope.
What I need is to deploy separate parts of application (like 1 queue listener only) per pod in the cloud yet keep the monolith code architecture.
So, the question is:
How do I design my Spring Boot application so that:
I can run the webserver separately without queue listeners and scheduled jobs
I can run one queue listener per pod in the k8s
I can use k8s cron scheduling instead of App level Spring scheduler?
I found several ways to achieve something like this but I expect there must be some "more or less standard way".
Alternative solutions that came to my mind:
Having a separate module with a separate Application definition, so that each "command" is built separately
Using Spring Profiles to instantiate only specific services according to some environment variables
Implementing a custom command line runner which would parse the command name/queue name and dynamically create the appropriate services (this seems closest to how it's done in "scripting languages")
What I mainly want to achieve with such setup is:
To be able to run the application on a lot of weak hardware instead of one machine with 32 CPU cores
Easier scaling per workload
Removing one layer from an already complex monitoring infrastructure (k8s already allows very fine resource monitoring; application-level task scheduling and parallelism make this much more difficult)
Am I missing something, or is it just not standard to write Java server apps this way?
Thank you!

What I need is to deploy separate parts of application (like 1 queue listener only) per pod in the cloud yet keep the monolith code architecture.
I agree with @jacky-neo's answer in terms of the appropriate architecture/best practice, but that may require you to break up your monolithic application.
To solve this without breaking up your monolithic application, deploy multiple instances of your monolith to Kubernetes each as a separate Deployment. Each deployment can have its own configuration. Then you can utilize feature flags and define the environment variables for each deployment based on the functionality you would like to enable.
In application.properties:
myapp.queue.listener.enabled=${QUEUE_LISTENER_ENABLED:false}
In your Deployment for the queue listener, enable the feature flag:
env:
  - name: 'QUEUE_LISTENER_ENABLED'
    value: 'true'
You would then just need to configure your monolithic application to use this myapp.queue.listener.enabled property and only enable the queue listener when the property is set to true.
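To make the flag's semantics concrete, here is a minimal, framework-free sketch of how the ${QUEUE_LISTENER_ENABLED:false} placeholder behaves: an unset variable means the feature is off. The FeatureFlags class is hypothetical and only illustrates the idea; it also simplifies by treating only the literal "true" (in any case) as enabled.

```java
import java.util.Map;

/** Resolves boolean feature flags from environment-style key/value pairs. */
final class FeatureFlags {

    private FeatureFlags() {}

    /**
     * Mirrors the ${NAME:false} placeholder: a missing variable means "disabled".
     * Simplification: only the literal "true" (any case, surrounding spaces ignored) enables the flag.
     */
    static boolean isEnabled(Map<String, String> env, String name) {
        String raw = env.get(name);
        return raw != null && Boolean.parseBoolean(raw.trim());
    }
}
```

A startup hook could then guard the listener with something like FeatureFlags.isEnabled(System.getenv(), "QUEUE_LISTENER_ENABLED"). Inside Spring Boot itself, the more usual route is to annotate the listener's bean or configuration class with @ConditionalOnProperty on myapp.queue.listener.enabled, so the bean is never created when the flag is off.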
Similarly, you could also apply this logic to the Spring profile to only run certain features in your app based on the profile defined in your ConfigMap.
This Baeldung article explains the process I'm presenting here in detail.
For the scheduled task, just set up a CronJob using a curl container which can invoke the service you want to perform the work.
Edit
Another option based on your comments below -- split the shared logic out into a shared module (using Gradle or Maven), and have two other runnable modules like web and listener that depend on the shared module. This will allow you to keep your shared logic in the same repository, and keep you from having to build/maintain an extra library which you would like to avoid.
This would be a good step in the right direction, and it would lend itself well to breaking the app into smaller pieces later down the road.
Here's some additional info about multi-module Spring Boot projects using Maven or Gradle.

In my experience, I would resolve these issues as follows. I hope this is what you want.
I can run the webserver separately without queue listeners and scheduled jobs
Develop a Spring Boot app to do it and deploy it as service-A in Kubernetes. In this app, you use spring-mvc to define the controller or REST controller that receives requests. Then use a Kubernetes NodePort service or define an ingress gateway to make the service accessible from outside the Kubernetes cluster. If you use sessions, you should store them in Redis or a similar shared place so that multiple instances of the service (pods) can share the same session values.
I can run one queue listener per pod in the k8s
Develop a new Spring Boot app to do it and deploy it as service-B in Kubernetes. This service only processes queue messages from RabbitMQ or another broker, which can be sent from service-A or another source. In most cases it should not be accessible from outside the Kubernetes cluster.
I can use k8s cron scheduling instead of App level Spring scheduler?
I would define a new Spring Boot app that uses the Spring scheduler, deployed as service-C in Kubernetes. It will have only one instance and will not be scaled. It will invoke a service-A method at the scheduled time, and it will not be accessible from outside the Kubernetes cluster. But if you prefer a Kubernetes CronJob, you can just write a shell script that uses service-A's DNS name in Kubernetes to call its REST endpoint.
The above three services can each be configured with different resources such as CPU and memory usage.

I do not get the essence of your post.
You want to have an application with "monolithic code architecture".
And then deploy it to several pods, but only parts of the application are actually running.
Why don't you separate the parts you want to be special to be applications in their own right?
Perhaps this is because I come from a Java background and haven't deployed monolithic scripting apps.

WebSphere application server scheduler or java scheduler for insert

I am working on an application which is deployed on WebSphere Application Server 8.0. This application inserts records into one table and obtains the data source via a JNDI lookup.
I need to create a batch job which will read data from the above table and insert it into another table at a fixed interval. It will be deployed on the same WAS server and use the same JNDI lookup for the data source.
I read on the internet that WebSphere Application Server scheduling is an option and is done using EJBs and session beans.
I also read about the JDK's ScheduledThreadPoolExecutor. I could create a WAR containing a ScheduledThreadPoolExecutor implementation and deploy it on WAS for this.
I tried to find the difference between these two in terms of usage, complexity, performance and maintainability but could not.
Please help me decide which approach is better for creating the scheduler for the insert batch jobs, and why. If the WAS scheduler is better, please provide a link that explains how to create and deploy one.
Thanks!

The major differences between the WAS Scheduler and the Java SE ScheduledThreadPoolExecutor are that the WAS Scheduler is transactional (task execution can roll back or commit), persistent (tasks are stored in a database), and can coordinate across members of a cluster (tasks can be scheduled from any member but run on only one). ScheduledThreadPoolExecutor is a much lighter-weight approach because it has none of these capabilities and does all of its scheduling within a single JVM. Task executions neither roll back nor retry, and tasks are not kept externally in a database in case the server goes down. Note that WebSphere Application Server also has the CommonJ TimerManager (and AlarmManager via WorkManager), which are more similar to what you get with ScheduledThreadPoolExecutor, if that is what you want. In that case, the application server still manages the threads and ensures that the context of the scheduling thread is available on the thread of execution. Hope this helps with your decision.
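For comparison, the lightweight option might look roughly like the sketch below, which uses only the JDK. The class name InMemoryBatchScheduler and the copyBatch parameter are hypothetical; the Runnable stands in for the code that reads from one table and inserts into the other. Nothing here is persisted or coordinated across a cluster, and the worker thread is created by the application rather than managed by the server, which is exactly the trade-off described above.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Runs a batch task repeatedly inside a single JVM; the schedule is lost when the JVM stops. */
final class InMemoryBatchScheduler {

    // One worker thread, so two runs of the batch never overlap.
    private final ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);

    /** Starts the task immediately, then waits the given delay after each run before the next. */
    ScheduledFuture<?> start(Runnable copyBatch, long delay, TimeUnit unit) {
        Runnable guarded = () -> {
            try {
                copyBatch.run();
            } catch (RuntimeException e) {
                // An exception escaping the task would silently cancel all future runs,
                // so log it and carry on. There is no rollback and no automatic retry.
                System.err.println("Batch run failed: " + e);
            }
        };
        return executor.scheduleWithFixedDelay(guarded, 0, delay, unit);
    }

    /** Stops scheduling and interrupts a run that is in progress. */
    void stop() {
        executor.shutdownNow();
    }
}
```

In a WAR, start would typically be called from a ServletContextListener's contextInitialized method and stop from contextDestroyed, so the thread does not outlive the application.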

How can I coordinate a single ejb timer deployed to multiple servers within my WebLogic cluster?

So, I have a web client and an EJB timer, deployed separately.
The workflow is as follows:
1) User accesses client.
2) User requests an action to take place which is known to be long-running, so we write the request to run this process in a database table.
3) TimerOne is checking this table every few seconds to see if there are any waiting tasks, so it finds the user's request and runs the task.
My problem is that in some environments in which our application is run, we are taking advantage of server clustering. When we do this, both the client and the EJB timer are deployed to each server in the cluster.
It is okay for the client to be deployed to multiple servers, as it helps with workload; however, having the timer run on multiple servers is an issue. When the user requests for a long-running task to be run, both timers grab the task at the same time from the database and start running it. As the long-running jobs usually write to the database, this scenario leads to collisions, among other issues.
My goal is to be able to deploy my EJB timer to both servers, but for there to be some state maintained across the cluster which can be used by the timers to decide whether they should pick up the task or if one of the other instances has already picked it up.
I tried using the database for this and tried file storage, but these are either too slow, or I could not come up with a bullet-proof workflow for synchronization.
Does anyone know of a good way to handle this problem? Is it even possible?
The solution should be able to run on a clustered WebLogic domain, a non-clustered WebLogic domain, a clustered Glassfish domain, and a non-clustered Glassfish domain.
I am open to changing the way this is done, if there is another, more elegant solution.
Thanks for any ideas!

Yes, this is possible with clustered timers or a WebLogic Singleton Service (and has been asked a number of times here already). See the following:
Clustered timers:
https://blogs.oracle.com/muraliveligeti/entry/ejb_timer_ejb
http://shaoxiongyang.blogspot.com/2010/10/how-to-use-ejb-3-timer-in-weblogic-10.html
http://java.sys-con.com/node/43944
Singleton Services:
https://blogs.oracle.com/jamesbayer/entry/a_simple_job_scheduler_example
http://developsimpler.blogspot.com/2012/03/weblogic-clusters-and-singleton-service.html
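Independently of the container features linked above, the underlying idea of "only one instance may pick up a task" can be illustrated with an atomic compare-and-set claim. This is a swapped-in illustration, not what the linked articles describe: the TaskClaims class is hypothetical, and an in-memory map stands in for the shared task table. Against a real database, the analogous step would be a conditional UPDATE that only matches rows still in the waiting state, followed by a check of the affected-row count.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Models a shared task table in which each waiting task can be claimed by exactly one worker. */
final class TaskClaims {

    enum Status { WAITING, RUNNING }

    // Stand-in for the database table; key = task id, value = current status.
    private final ConcurrentMap<Long, Status> tasks = new ConcurrentHashMap<>();

    /** Records a new task in the waiting state. */
    void submit(long taskId) {
        tasks.put(taskId, Status.WAITING);
    }

    /**
     * Atomically moves the task from WAITING to RUNNING.
     * Returns true for exactly one caller per waiting task; every other caller,
     * and any caller asking about an unknown task, gets false and must skip it.
     */
    boolean tryClaim(long taskId) {
        return tasks.replace(taskId, Status.WAITING, Status.RUNNING);
    }
}
```

Each timer would call tryClaim before starting work and simply move on when it returns false.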

I am open to changing the way this is done, if there is another, more elegant solution.
I know that your question is about an EJB Timer, but keep the following in mind:
In my opinion, your requirement would benefit from asynchronous processing.
In earlier Java EE versions, one way to meet this kind of requirement was to use JMS, which allows you to send a message that is processed later by a business layer component. The other possibility was the one you have described, which requires an EJB Timer. I think both were workarounds that filled a gap in the EE specification.
Since Java EE 6, you can define asynchronous services, which allow you to make asynchronous calls without resorting to features that were designed for other purposes.

Executing external services using EJB Timer Service

I have a scenario to ask regarding utilizing the EJB Timer Service.
Use case as follows:
The system should be able to schedule a task that will poll our Subversion repository for file changes using a particular timestamp.
The idea is that whenever the scheduled task runs, it will execute a command against a particular svn repository.
For this particular purpose, I will not call any external process but will use the 'pure' java way of using the SVNKit java library http://svnkit.com/
My only concern is this:
Is it a good idea to use the EJB Timer Service to execute a task that calls external processes? My approach uses a 'pure' Java library, but in other scenarios the timer logic might call a batch file, command line, or external executable directly.
I worry about the effects of server memory use/performance etc.
Is this a good idea?
The other option I'm considering is to create a 'desktop' application on the server, using a client technology such as SWT/Swing, that does the polling and contains the logic. But this would mean that I need to manage two applications: the 'desktop' app that polls and the 'web' user interface that I will create in GlassFish.
I am leaning towards doing everything in the app server of my choice, which is GlassFish.
I have used EJB Timers before, but only to make calls against the database, never to call an external service. Since this scenario came up, I'm asking here to gather thoughts from those who have experience doing this.
Any thoughts?

In theory, EJBs aren't supposed to depend on external I/O since it interferes with the container/server's management of bean instances, threads, etc.
In practice, this should work if you take precautions. For example:
isolate the function to its own EJB (i.e., a stateless session bean that only handles these timers) to avoid instance pooling issues
use timeouts while waiting for commands, so that a hung process cannot tie up all server threads
ensure that you don't schedule timers so that you have multiple OS commands run simultaneously
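The timeout precaution could be sketched as follows with the JDK's ProcessBuilder. The BoundedCommand class is a hypothetical helper, not part of any EJB API, and it assumes Java 9 or later for Redirect.DISCARD. The child's output is discarded so that a full pipe buffer cannot block it; redirect to a file instead if the output matters.

```java
import java.io.IOException;
import java.util.List;
import java.util.OptionalInt;
import java.util.concurrent.TimeUnit;

/** Runs an OS command but never waits longer than the given timeout. */
final class BoundedCommand {

    private BoundedCommand() {}

    /**
     * Returns the exit code, or an empty OptionalInt if the command
     * was still running at the timeout and had to be killed.
     */
    static OptionalInt run(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)                        // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)  // Java 9+: drop the output
                .start();
        if (!process.waitFor(timeout, unit)) {
            process.destroyForcibly();  // do not leave the hung process behind
            process.waitFor();          // wait for the kill to take effect
            return OptionalInt.empty();
        }
        return OptionalInt.of(process.exitValue());
    }
}
```

The timer callback can then treat an empty result as a failed run instead of blocking a server thread indefinitely.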
Keep in mind that EJB 3.0 timers are persistent (vs EJB 3.1 timers, which have the option of being non-persistent), which means:
They can run on any server in a cluster. If you have multiple machines in your cluster, you need to ensure that they are all capable of running the command.
They survive server restarts. If you schedule a timer to run but the server crashes before it can, it will run when the server restarts. This can cause particular problems for interval timers (all missed timers will fire repeatedly) and if you don't carefully manage existing timers (you can easily create redundant timers).

Job queuing library/software for Java

The premise is this: For asynchronous job processing I have a homemade framework that:
Stores jobs in database
Has a simple Java API to create more jobs and processors for them
Processing can be embedded in a web application or can run by itself on different machines for scaling out
Web UI for monitoring the queue and canceling queue items
I would like to replace this with a ready-made library, because I would expect more robustness from one and I don't want to maintain this myself. I've been researching the issue and figured you could use JMS for something similar. But I would still have to build a simple Java API, figure out a runtime to host the processing when I want to scale out, and build a monitoring UI. I feel like the only benefit I would get from JMS is not having to do the database work.
Is there something similar to this that is ready made?
UPDATE
Basically this is the setup I would want to do:
Web application runs in a Servlet container or Application Server
Web application uses a client API to create jobs
X number of machines process those jobs
Monitor and manage jobs from a UI

You can use Quartz:
http://www.quartz-scheduler.org/

Check out Spring Batch.
Link to the Spring Batch website: http://projects.spring.io/spring-batch/
