We have a requirement to run many async background processes that access DBs, Kafka queues, etc. As of now, we are using Spring Batch with Tomcat (exploded WAR) for this. However, we are facing certain issues which I'm unable to solve using Spring Batch. I was thinking of other frameworks to use, but couldn't find any that solves all my problems.
It would be great to know if there exists a framework which solves the following problems:
Since Spring Batch runs inside one Tomcat container (1 java process), any small update in any job/step will result in restarting the Tomcat server. This results in hard-stopping of all running jobs, resulting in incomplete/stale data.
WHAT I WANT: Bundle all the jars and run each job as a separate process. The framework should store the PID and should be able to manage (stop/force-kill) the job on demand. This way, when we want to update a JAR, the existing process won't be hindered (however, we should be able to stop the existing process from UI), and no other job (running or not) will also be touched.
I have looked at hot-update of JARs in Tomcat, but I'm skeptical whether to use such a mechanism in production.
Sub-question: Will OSGI integrate with Spring Batch? If so, is it possible to run each job as a separate container with all JARs embedded in it?
Spring Batch doesn't have a master-slave architecture.
WHAT I WANT: There should be a master, where the list of jobs is specified. There should be slave machines (workers), which are declared to the master in a configuration file. There should be a scheduler in the master which, when a job needs to start, assigns the job to a slave (possibly load-balanced, but not necessarily), and the slave should update the DB. The master should be able to send data to and receive data from the slaves (start/stop/kill any job, give me an update on running jobs, etc.) so that it can be displayed on a UI.
This way, in case I have a high load, I should be able to just add machines into the cluster and modify the master configuration file and the load should get balanced right away.
Spring Batch doesn't have an in-built alerting mechanism in case of job stall/failure.
WHAT I WANT: I should be able to set up alerts for jobs in case of failure. If necessary, a job should have a timeout, so that it can notify the user (probably via email) or be force-stopped when it crosses a specified threshold.
Maybe Vert.x can do the trick.
Since Spring Batch runs inside one Tomcat container (1 java process), any small update in any job/step will result in restarting the Tomcat server. This results in hard-stopping of all running jobs, resulting in incomplete/stale data.
Vert.x lets you build microservices. Each Vert.x instance can communicate with the other instances. If you stop one, the others keep working, unless they depend on it (e.g. if you stop the master, the slaves will fail).
Vert.x is not an application server.
There's no monolithic Vert.x instance into which you deploy applications.
You just run your apps wherever you want to.
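On the "run each job as a separate process, keep its PID, stop or force-kill it on demand" part of the question: independent of Vert.x, the JDK can do the bookkeeping itself. This is only a rough sketch under my own assumptions. JobProcessManager and its method names are made up for illustration, and a real setup would persist the PIDs somewhere instead of keeping them in memory.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs each job as its own OS process and tracks it by name.
class JobProcessManager {
    private final Map<String, Process> running = new ConcurrentHashMap<>();

    // Starts the command as a child process and returns its PID.
    long start(String jobName, List<String> command) throws IOException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        running.put(jobName, process);
        return process.pid();
    }

    // Asks the job to stop; force-kills it if it is still alive after the grace period.
    // Returns false if no job with that name is tracked.
    boolean stop(String jobName, long graceMillis) throws InterruptedException {
        Process process = running.remove(jobName);
        if (process == null) {
            return false;
        }
        process.destroy();
        if (!process.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
            process.destroyForcibly().waitFor();
        }
        return true;
    }

    boolean isAlive(String jobName) {
        Process process = running.get(jobName);
        return process != null && process.isAlive();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the current JVM's launcher stands in for a real "java -jar job.jar".
        String javaBin = System.getProperty("java.home") + "/bin/java";
        JobProcessManager manager = new JobProcessManager();
        long pid = manager.start("demo-job", List.of(javaBin, "-version"));
        System.out.println("started demo-job with pid " + pid);
        manager.stop("demo-job", 2_000);
    }
}
```

Because every job is its own process, replacing one job's JAR does not touch the processes that are already running.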
Spring Batch doesn't have a master-slave architecture
Since Vert.x is event-driven, you can easily build a master-slave architecture. For example, handle the HTTP requests in one Vert.x instance and dispatch them to several other instances depending on the nature of the request.
Spring Batch doesn't have an in-built alerting mechanism in case of job stall/failure.
In Vert.x, you can set a timeout for each message and handle the failure.
Sending with timeouts
When sending a message with a reply handler you can specify a timeout in the DeliveryOptions.
If a reply is not received within that time, the reply handler will be called with a failure.
The default timeout is 30 seconds.
Send Failures
Message sends can fail for other reasons, including:
There are no handlers available to send the message to
The recipient has explicitly failed the message using fail
In all cases the reply handler will be called with the specific failure.
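To make the "timeout, then get called back with a failure" idea concrete, here is a minimal sketch using only the JDK. It is not the Vert.x API (there the timeout goes into DeliveryOptions as quoted above); JobWatchdog, runWithTimeout and the alert callback are names I made up. The callback is where an email notification would go.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

// Hypothetical watchdog: runs a job with a deadline and reports stalls and failures.
class JobWatchdog {

    // Returns "COMPLETED", "TIMED_OUT" or "FAILED"; the alert callback is invoked
    // for the last two cases.
    static String runWithTimeout(Callable<?> job, long timeoutMillis, Consumer<String> alert)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(job);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return "COMPLETED";
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the job thread
            alert.accept("job exceeded " + timeoutMillis + " ms and was cancelled");
            return "TIMED_OUT";
        } catch (ExecutionException e) {
            alert.accept("job failed: " + e.getCause());
            return "FAILED";
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A job that sleeps far longer than its 200 ms budget.
        String status = runWithTimeout(() -> {
            Thread.sleep(5_000);
            return null;
        }, 200, System.err::println);
        System.out.println(status);
    }
}
```

Note that cancelling only interrupts the thread; a job that ignores interrupts has to be killed at the process level.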
EDIT: There are other frameworks for building microservices in Java. Dropwizard is one of them, but I can't say much more about it.
Related
Currently, I have an idea about building a centralized batch job management system (I temporarily call it batch service).
We own a microservice system, and the batch jobs are scattered across the services (including Oracle's batch jobs). So I intend to set up a batch job management system.
But there is one problem that in microservices there are many databases, so I want the manipulation of data to be done by other services, and batch service only does the following things: setting, scheduling, checking status state, log, start, stop, retry.
My idea is to use a message broker (Kafka, RabbitMQ, ...) to pass job requests from the batch service to the other services. But I have not yet come up with a way to stop jobs or to save their logs on the batch service.
Is this idea feasible, and if so, can you give me some advice on deployment technologies? (We are deploying with Spring Boot at the moment.)
Thanks for taking the time to read ^^.
I'm coming from the PHP/Python/JS environment where it's a standard to run multiple instances of web application as separate processes and asynchronous tasks like queue processing as separate scripts.
eg. in the k8s environment, there would be
N instances of web server only, each running in separate pod
For each queue, dynamic number of consumers, each in separate pod
Cron scheduling using k8s crontab functionality, leaving the scheduling process to k8s
Such an approach fits the cloud well: the workload can be scheduled across either a small number of powerful machines or a lot of less powerful ones, and it allows very fine control of auto scaling (based on the number of messages in a specific queue, for example).
Also, there is a clear separation between the developer and DevOps responsibility.
Recently, I tried to replicate the same setup with Java Spring Boot application and failed miserably.
Even though Java frameworks say that they are "cloud native", it seems like all the documentation is still built around monolith application, which handles all consumers and cron scheduling in separate threads.
Clear answer to this problem is microservices but that's way out of scope.
What I need is to deploy separate parts of application (like 1 queue listener only) per pod in the cloud yet keep the monolith code architecture.
So, the question is:
How do I design my Spring Boot application so that:
I can run the webserver separately without queue listeners and scheduled jobs
I can run one queue listener per pod in the k8s
I can use k8s cron scheduling instead of App level Spring scheduler?
I found several ways to achieve something like this but I expect there must be some "more or less standard way".
Alternative solutions that came to my mind:
Having separate module with separate Application definition so that each "command" is built separately
Using Spring Profiles to instantiate specific services only according to some environment variables
Implement custom command line runner which would parse command name/queue name and dynamically create appropriate services (this seems to be the most similar approach to the way how it's done in "scripting languages")
What I mainly want to achieve with such setup is:
To be able to run the application on a lot of weak hardware instead of having 1 machine with 32 CPU cores
Easier scaling per workload
Removing one layer from already complex monitoring infrastructure (k8s already allows very fine resource monitoring, application level task scheduling and parallelism makes this way more difficult)
Am I missing something, or is it just not standard to write Java server apps this way?
Thank you!
What I need is to deploy separate parts of application (like 1 queue listener only) per pod in the cloud yet keep the monolith code architecture.
I agree with #jacky-neo's answer in terms of the appropriate architecture/best practice, but that may require you to break up your monolithic application.
To solve this without breaking up your monolithic application, deploy multiple instances of your monolith to Kubernetes each as a separate Deployment. Each deployment can have its own configuration. Then you can utilize feature flags and define the environment variables for each deployment based on the functionality you would like to enable.
In application.properties:
myapp.queue.listener.enabled=${QUEUE_LISTENER_ENABLED:false}
In your Deployment for the queue listener, enable the feature flag:
env:
  - name: 'QUEUE_LISTENER_ENABLED'
    value: 'true'
You would then just need to configure your monolithic application to use this myapp.queue.listener.enabled property and only enable the queue listener when the property is set to true.
Similarly, you could also apply this logic to the Spring profile to only run certain features in your app based on the profile defined in your ConfigMap.
This Baeldung article explains the process I'm presenting here in detail.
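In Spring Boot this gating is typically done with @ConditionalOnProperty on the listener bean. To show just the decision logic without any Spring dependency, here is a rough plain-JDK sketch. FeatureFlags, componentsToStart and the WEB_ENABLED variable are my own hypothetical names; only QUEUE_LISTENER_ENABLED comes from the config above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical startup gate: decides which parts of the monolith run in this pod.
class FeatureFlags {

    // Mirrors ${QUEUE_LISTENER_ENABLED:false}: a missing variable means "disabled".
    static boolean isEnabled(Map<String, String> env, String name) {
        return Boolean.parseBoolean(env.getOrDefault(name, "false").trim());
    }

    // Returns the names of the components this instance should start.
    static List<String> componentsToStart(Map<String, String> env) {
        List<String> components = new ArrayList<>();
        if (isEnabled(env, "WEB_ENABLED")) { // WEB_ENABLED is an assumed flag name
            components.add("web");
        }
        if (isEnabled(env, "QUEUE_LISTENER_ENABLED")) {
            components.add("queue-listener");
        }
        return components;
    }

    public static void main(String[] args) {
        // Each Deployment sets different environment variables for the same image.
        System.out.println("starting: " + componentsToStart(System.getenv()));
    }
}
```

The same image is deployed several times; only the environment of each Deployment differs.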
For the scheduled task, just set up a CronJob using a curl container which can invoke the service you want to perform the work.
Edit
Another option based on your comments below -- split the shared logic out into a shared module (using Gradle or Maven), and have two other runnable modules like web and listener that depend on the shared module. This will allow you to keep your shared logic in the same repository, and keep you from having to build/maintain an extra library which you would like to avoid.
This would be a good step in the right direction, and it would lend well to breaking the app into smaller pieces later down the road.
Here's some additional info about multi-module Spring Boot projects using Maven or Gradle.
In my experience, I would resolve these issues as below. Hope it is what you want.
I can run the webserver separately without queue listeners and scheduled jobs
Develop a Spring Boot app to do it and deploy it as service-A in Kubernetes. In this app, you use spring-mvc to define the controller or REST controller to receive requests. Then use a Kubernetes NodePort or define an ingress gateway to make the service accessible from outside the Kubernetes cluster. If you use sessions, you should save them in Redis or a similar shared place so that multiple instances of the service (pods) can share the same session value.
I can run one queue listener per pod in the k8s
Develop a new Spring Boot app to do it and deploy it as service-B in Kubernetes. This service only processes queue messages from RabbitMQ or similar, which can be sent from service-A or another source. In most cases it should not be accessible from outside the Kubernetes cluster.
I can use k8s cron scheduling instead of App level Spring scheduler?
Personally, I would define a new Spring Boot app with spring-scheduler, called service-C in Kubernetes. It will have only one instance and will not be scaled. It will invoke a service-A method at the scheduled time, and it will not be accessible from outside the Kubernetes cluster. But if you prefer a Kubernetes CronJob, you can just write a bash script that uses service-A's DNS name in Kubernetes to call its REST endpoint.
The above three services can each be configured with different resources such as CPU and memory usage.
I do not get the essence of your post.
You want to have an application with "monolithic code architecture".
And then deploy it to several pods, but only parts of the application are actually running.
Why don't you separate the parts you want to be special to be applications in their own right?
Perhaps this is because I come from a Java background and haven't deployed monolithic scripting apps.
I'm trying to wrap my head around Spring Batch, and while many tutorials show great examples of code, I feel like I'm missing how the "spring batch engine" works.
Scenario 1 - On user creation, create user at external service.
Web request
CreateLocalUser()
launch job CreateExternalUser()
CreateExternalUser() can fail because of many reasons, so we want to be able to retry and log errors, which Spring Batch can do for us. Also it's a decoupled process that has nothing to do with the creation of our local user.
Where does the job run? Will it run in the same thread as the web request, which means the end user will have to wait for the job to finish before getting http status 200?
Imagine I have a Web server and a Batch server. I want all jobs to run on the Batch server, but the jobs themselves can be initiated from the Web server. Can Spring Batch do this? Do I need some kind of queue that I can write to from the Web server and consume from the Batch server, where the actual job will begin?
Scenario 2 - Process lines in huge file, start new job for each line
Read lines in huge file (1.000.000 lines)
Start new job for each line using input parameters from the file.
Processing the 1.000.000 lines is quick and the 1.000.000 new jobs will more or less be started at the same time. Where do these run? Do they run async to the initial job? Will my server be able to handle running all these more or less at the same time?
Additional question:
Is it possible to query jobs based on a job input parameter? E.g. in Scenario 1, I want to show the CreateExternalUser job status / error when viewing my local user with Id 1234 on my web page. The CreateExternalUser job has the input parameter userId: 1234
You have a few questions here so let's go through them one at a time:
Where does the job run? Will it run in the same thread as the web request, which means the end user will have to wait for the job to finish before getting http status 200?
That depends on your configuration. If you use the defaults, then yes. The job would run in the same thread and the user would be forced to wait until the job completes in order to get the 200. This obviously isn't a good idea...
Which is why Spring Batch's SimpleJobLauncher allows you to inject a TaskExecutor. By configuring your JobLauncher to use an async TaskExecutor implementation (ThreadPoolTaskExecutor for example), the job would be executed in a different thread, allowing the controller's processing to complete.
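To illustrate what the async TaskExecutor buys you, here is a rough JDK-only analogue. It is not Spring Batch code: AsyncLauncher and its methods are made-up names, and the ExecutorService plays the role that the injected TaskExecutor plays in the launcher. The point is that launch() hands back an id immediately while the job keeps running on another thread.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical launcher: submits the job to a pool and returns without waiting.
class AsyncLauncher {
    private final ExecutorService executor = Executors.newFixedThreadPool(2);
    private final Map<Long, Future<?>> executions = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    // Returns an execution id right away; the job runs on a pool thread.
    long launch(Callable<?> job) {
        long id = ids.incrementAndGet();
        executions.put(id, executor.submit(job));
        return id;
    }

    boolean isDone(long id) {
        Future<?> future = executions.get(id);
        return future != null && future.isDone();
    }

    // Waits up to timeoutMillis; true if the job finished (successfully or not).
    boolean awaitDone(long id, long timeoutMillis) throws InterruptedException {
        Future<?> future = executions.get(id);
        if (future == null) {
            return false;
        }
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (ExecutionException e) {
            return true; // finished, but with an error
        }
    }

    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AsyncLauncher launcher = new AsyncLauncher();
        long id = launcher.launch(() -> {
            Thread.sleep(500); // stands in for the slow external call
            return null;
        });
        System.out.println("controller can respond now, execution id = " + id);
        launcher.awaitDone(id, 5_000);
        launcher.shutdown();
    }
}
```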
Obviously this is all within a single JVM, which brings us to your next question.
I want all jobs to run on the Batch server, but the jobs themselves can be initiated from the Web server. Can Spring Batch do this? Do I need some kind of queue that I can write to from the Web server and consume from the Batch server, where the actual job will begin?
Spring Batch contains a module called Spring Batch Integration. This module provides various capabilities including using messages to launch Spring Batch Jobs. You can use this to have a remote "batch" server that you can communicate with from the web server. The communication mechanism is Spring Integration channels so any messaging option backed by SI would be supported (JMS, AMQP, REST, etc).
Scenario 2 - Process lines in huge file, start new job for each line
This scenario makes me think you're going down the wrong path for your design. Can you post a new question that elaborates on this use case?
Additional question: Is it possible to query Jobs based on a job input parameter
Job parameters are used to identify JobInstances and are fundamental to job identification. Because of this, yes, you can identify individual job runs based on the parameters.
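As a toy model of why that works (this is not the Spring Batch API; in Spring Batch itself the lookup would go through the job repository, e.g. JobRepository.getLastJobExecution(jobName, jobParameters)): if a run is keyed by job name plus parameters, looking up the status for userId 1234 is just a lookup with an equal parameter set. JobStatusIndex and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory index: job name + parameters identify one job instance.
class JobStatusIndex {
    private final Map<String, Map<Map<String, String>, String>> byJob = new ConcurrentHashMap<>();

    // Records the latest status for the instance identified by name + parameters.
    void record(String jobName, Map<String, String> parameters, String status) {
        byJob.computeIfAbsent(jobName, name -> new ConcurrentHashMap<>())
             .put(Map.copyOf(parameters), status);
    }

    // Finds the status for exactly this job name and parameter set, if any.
    Optional<String> find(String jobName, Map<String, String> parameters) {
        Map<Map<String, String>, String> instances = byJob.getOrDefault(jobName, Map.of());
        return Optional.ofNullable(instances.get(parameters));
    }

    public static void main(String[] args) {
        JobStatusIndex index = new JobStatusIndex();
        index.record("CreateExternalUser", Map.of("userId", "1234"), "FAILED");
        System.out.println(index.find("CreateExternalUser", Map.of("userId", "1234")));
    }
}
```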
So, I have a web client and an EJB timer, deployed separately.
The workflow is as follows:
1) User accesses client.
2) User requests an action to take place which is known to be long-running, so we write the request to run this process in a database table.
3) TimerOne is checking this table every few seconds to see if there are any waiting tasks, so it finds the user's request and runs the task.
My problem is that in some environments in which our application is run, we are taking advantage of server clustering. When we do this, both the client and the EJB timer are deployed to each server in the cluster.
It is okay for the client to be deployed to multiple servers, as it helps with workload; however, having the timer run on multiple servers is an issue. When the user requests for a long-running task to be run, both timers grab the task at the same time from the database and start running it. As the long-running jobs usually write to the database, this scenario leads to collisions, among other issues.
My goal is to be able to deploy my EJB timer to both servers, but for there to be some state maintained across the cluster which can be used by the timers to decide whether they should pick up the task or if one of the other instances has already picked it up.
I tried using the database for this and tried file storage, but these are either too slow, or I could not come up with a bullet-proof workflow for synchronization.
Does anyone know of a good way to handle this problem? Is it even possible?
The solution should be able to run on a clustered WebLogic domain, a non-clustered WebLogic domain, a clustered Glassfish domain, and a non-clustered Glassfish domain.
I am open to changing the way this is done, if there is another, more elegant solution.
Thanks for any ideas!
Yes, this is possible with clustered timers or a WebLogic Singleton Service (and has been asked a number of times here already). See the following:
Clustered timers:
https://blogs.oracle.com/muraliveligeti/entry/ejb_timer_ejb
http://shaoxiongyang.blogspot.com/2010/10/how-to-use-ejb-3-timer-in-weblogic-10.html
http://java.sys-con.com/node/43944
Singleton Services:
https://blogs.oracle.com/jamesbayer/entry/a_simple_job_scheduler_example
http://developsimpler.blogspot.com/2012/03/weblogic-clusters-and-singleton-service.html
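Whichever mechanism you pick, the underlying idea is that exactly one instance wins an atomic "claim" on each task. A toy, single-JVM sketch of that idea follows. The ConcurrentHashMap is only a stand-in for cluster-wide shared state (which the container's clustered timer or singleton service would provide for you), and TaskClaims, tryClaim and ownerOf are made-up names.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical claim table: only the first node to claim a task gets to run it.
class TaskClaims {
    private final ConcurrentHashMap<Long, String> owners = new ConcurrentHashMap<>();

    // Atomic check-and-set: returns true only for the first claimant of taskId.
    boolean tryClaim(long taskId, String nodeId) {
        return owners.putIfAbsent(taskId, nodeId) == null;
    }

    String ownerOf(long taskId) {
        return owners.get(taskId);
    }

    public static void main(String[] args) {
        TaskClaims claims = new TaskClaims();
        // Two timers fire at about the same time for the same task.
        System.out.println("server-1 runs task 42: " + claims.tryClaim(42L, "server-1"));
        System.out.println("server-2 runs task 42: " + claims.tryClaim(42L, "server-2"));
    }
}
```

The important property is that the check and the write happen as one atomic step, so two timers can never both see the task as unclaimed.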
I am open to changing the way this is done, if there is another, more elegant solution.
I know that your question is about an EJB Timer, but keep in mind the following:
In my opinion, your requirement calls for asynchronous processing.
In earlier Java EE versions, one way to meet this kind of requirement was JMS, which lets you send a message that is processed later by a business-layer component. The other possibility was the one you describe, which requires an EJB Timer. I think both were workarounds that filled a gap in the EE specification.
Since Java EE 6, you can define asynchronous services, which let you make asynchronous calls without resorting to features that were designed for other purposes.
Currently I have a Java (and a half ported python version) app that runs in the background that has a queue of jobs (currently read out of a mysql database) which handles thread sleep/waking to share resources based on the job priority and running time. There is a front end php script that posts jobs to the database which are polled by the system every time interval.
This approach is somewhat inefficient (though nicer than dealing with locking issues on a job file), and I can't help but wonder if there is some way to simplify it.
My thought was that the Java app (and/or the Python app) would set up an HTTP service (Jetty?) with a web interface that pushes jobs directly to the queue, without the middleman. Apache is serving other PHP sites, so this would have to run in tandem.
I'm really after some other input, as I'd prefer a background service that is always running, plus a very simple web interface that doesn't waste too many resources. Having cron execute the jobs was painful: some jobs run for 20+ hours, so every new PHP (no threading) or Java call had to check whether a service was already running with outstanding jobs and add to it instead of starting a new service.
Thanks for your input.
Deploy a JSP using Tomcat (or similar) that allows the user to post job requests to a job scheduler web service using a webpage. On the backend, use Quartz Scheduler to manage your jobs and just have your web service add jobs to the Quartz queue.
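Just to show the shape of "the web layer pushes straight into a queue and a long-running service takes jobs by priority", here is a bare-bones JDK sketch. It is not Quartz (which adds triggers, persistence and more on top), and JobQueue, submit and takeNext are names I invented for illustration.

```java
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical in-process job queue: higher priority jobs are taken first.
class JobQueue {

    static final class Job implements Comparable<Job> {
        final String name;
        final int priority;

        Job(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }

        @Override
        public int compareTo(Job other) {
            // Reverse order so that a larger priority value is served earlier.
            return Integer.compare(other.priority, this.priority);
        }
    }

    private final PriorityBlockingQueue<Job> queue = new PriorityBlockingQueue<>();

    // Called by the web layer: no database polling involved.
    void submit(String name, int priority) {
        queue.put(new Job(name, priority));
    }

    // Called by the background worker: blocks until a job is available.
    String takeNext() throws InterruptedException {
        return queue.take().name;
    }

    public static void main(String[] args) throws Exception {
        JobQueue jobs = new JobQueue();
        jobs.submit("nightly-report", 1);
        jobs.submit("urgent-reindex", 10);
        System.out.println("next job: " + jobs.takeNext());
    }
}
```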