Spring Boot - Running one specific background job per pod

Spring Boot - Running one specific background job per pod - java

I'm coming from the PHP/Python/JS environment where it's a standard to run multiple instances of web application as separate processes and asynchronous tasks like queue processing as separate scripts.
eg. in the k8s environment, there would be
N instances of web server only, each running in separate pod
For each queue, dynamic number of consumers, each in separate pod
Cron scheduling using k8s crontab functionality, leaving the scheduling process to k8s
Such approach matches well the cloud nature where the workload can be scheduled across both smaller number of powerful machines and lot of less powerful machines and allows very fine control of auto scaling (based on the number of messages in specific queue for example).
Also, there is a clear separation between the developer and DevOps responsibility.
Recently, I tried to replicate the same setup with Java Spring Boot application and failed miserably.
Even though Java frameworks say that they are "cloud native", it seems like all the documentation is still built around monolith application, which handles all consumers and cron scheduling in separate threads.
Clear answer to this problem is microservices but that's way out of scope.
What I need is to deploy separate parts of application (like 1 queue listener only) per pod in the cloud yet keep the monolith code architecture.
So, the question is:
How do I design my Spring Boot application so that:
I can run the webserver separately without queue listeners and scheduled jobs
I can run one queue listener per pod in the k8s
I can use k8s cron scheduling instead of App level Spring scheduler?
I found several ways to achieve something like this but I expect there must be some "more or less standard way".
Alternative solutions that came to my mind:
Having separate module with separate Application definition so that each "command" is built separately
Using Spring Profiles to instantiate specific services only according to some environment variables
Implement custom command line runner which would parse command name/queue name and dynamically create appropriate services (this seems to be the most similar approach to the way how it's done in "scripting languages")
What I mainly want to achieve with such setup is:
To be able to run the application on lot of weak HW instead of having 1 machine with 32 cpu cores
Easier scaling per workload
Removing one layer from already complex monitoring infrastructure (k8s already allows very fine resource monitoring, application level task scheduling and parallelism makes this way more difficult)
Do I miss something or is it just that it's not standard to write Java server apps this way?
Thank you!

What I need is to deploy separate parts of application (like 1 queue listener only) per pod in the cloud yet keep the monolith code architecture.
I agree with #jacky-neo's answer in terms of the appropriate architecture/best practice, but that may require you to break up your monolithic application.
To solve this without breaking up your monolithic application, deploy multiple instances of your monolith to Kubernetes each as a separate Deployment. Each deployment can have its own configuration. Then you can utilize feature flags and define the environment variables for each deployment based on the functionality you would like to enable.
In application.properties:
myapp.queue.listener.enabled=${QUEUE_LISTENER_ENABLED:false}
In your Deployment for the queue listener, enable the feature flag:
env:
- name: 'QUEUE_LISTENER_ENABLED'
value: 'true'
You would then just need to configure your monolithic application to use this myapp.queue.listener.enabled property and only enable the queue listener when the property is set to true.
Similarly, you could also apply this logic to the Spring profile to only run certain features in your app based on the profile defined in your ConfigMap.
This Baeldung article explains the process I'm presenting here in detail.
For the scheduled task, just set up a CronJob using a curl container which can invoke the service you want to perform the work.
Edit
Another option based on your comments below -- split the shared logic out into a shared module (using Gradle or Maven), and have two other runnable modules like web and listener that depend on the shared module. This will allow you to keep your shared logic in the same repository, and keep you from having to build/maintain an extra library which you would like to avoid.
This would be a good step in the right direction, and it would lend well to breaking the app into smaller pieces later down the road.
Here's some additional info about multi-module Spring Boot projects using Maven or Gradle.

According to my expierence, I will resolve these issue as below. Hope it is what you want.
I can run the webserver separately without queue listeners and
scheduled jobs
Develop a Spring Boot app to do it and deploy it as service-A in Kubernetes. In this app, you use spring-mvc to define the controller or REST controller to receive requests. Then use the Kubernetes Nodeport or define ingress-gateway to make the service accessible from outside the Kubernetes cluster. If you use session, you should save it into Redis or a similar shared place so that more instances of the service (pod) can share same session value.
I can run one queue listener per pod in the k8s
Develop a new Spring Boot app to do it and deploy it as service-B in Kubernetes. This service only processes queue messages from RabbitMQ or others, which can be sent from service-A or another source. In most times it should not be accessed from outside the Kubernetes cluster.
I can use k8s cron scheduling instead of App level Spring scheduler?
In my opinion, I like to define a new Spring Boot app with spring-scheduler called service-C in Kubernetes. It will have only one instance and will not be scaled. Then, it will invoke service-A method at the scheduled time. It will not be accessible from outside the Kubernetes cluster. But if you like Kubernetes CronJob, you can just write a bash shell using service-A's dns name in Kubernetes to access its REST endpoint.
The above three services can each be configured with different resources such as CPU and memory usage.

I do not get the essence of your post.
You want to have an application with "monolithic code architecture".
And then deploy it to several pods, but only parts of the application are actually running.
Why don't you separate the parts you want to be special to be applications in their own right?
Perhaps this is because I come from a Java background and haven't deployed monolithic scripting apps.

Related

Combination of Spring Cloud and Orchestration Tools Like Docker Swarm and Kubernetes

I have a cloud-native application, which is implemented using Spring Cloud Netflix.
So, in my application, I'm using Eureka service discovery to manage all instances of different services of the application. When each service instance wants to talk to another one, it uses Eureka to fetch the required information about the target service (IP and port for example).
The service orchestration can also be achieved using tools like Docker Swarm and Kubernetes, and it looks there are some overlaps between what Eureka does and what Docker Swarm and Kubernetes can do.
For example, Imagine I create a service in Docker Swarm with 5 instances. So, swarm insures that those 5 instances are always up and running. Additionally, each services of the application is sending a periodic heartbeat to the Eureka internally, to show that it's still alive. It seems we have two layers of health check here, one for Docker and another inside the Spring Cloud itself.
Or for example, you can expose a port for a service across the entire swarm, which eliminates some of the needs to have a service discovery (the ports are always apparent). Another example could be load balancing performed by the routing mesh inside the docker, and the load balancing happening internally by Ribbon component or Eureka itself. In this case, having a hardware load balancer, leads us to a 3-layered load balancing functionality.
So, I want to know is it rational to use these tools together? It seems using a combination of these technologies increases the complexity of the application very much and may be redundant.
Thank you for reading!

If you already have the application working then there's presumably more effort and risk in removing the netflix components than keeping them. There's an argument that if you could remove e.g. eureka then you wouldn't need to maintain it and it would be one less thing to upgrade. But that might not justify the effort and it also depends on whether you are using it for anything that might not be fulfilled by the orchestration tool.
For example, if you're connecting to services that are not set up as load-balanced ('headless services') then you might want ribbon within your services. (You could do this using tools in the spring cloud kubernetes incubator project or its fabric8 equivalent.) Another situation to be mindful of is when you're connecting to external services (i.e. services outside the kubernetes cluster) - then you might want to add load-balancing or rate limiting and ribbon/hystrix would be an option. It will depend on how nuanced your requirements for load-balancing or rate-limiting are.
You've asked specifically about netflix but it's worth stating clearly that spring cloud includes other components and not just netflix ones. And that there's other areas of overlap where you would need to make choices.
I've focused on Kubernetes rather than docker swarm partly because that's what I know best and partly because that's what I believe to be the current direction of travel for the industry - on this you should note that kubernetes is available within docker EE. I guess you've read many comparison articles but https://hackernoon.com/a-kubernetes-guide-for-docker-swarm-users-c14c8aa266cc might be particularly interesting to you.

You are correct in that it does seem redundant. From personal observations, I think that each layer of that architecture should handle load balancing in its' own specific way. It ends up giving you a lot more flexibility for not much more cost. If you want to take advantage of client side load balancing and any failover features, it makes sense to have Eureka. The major benefit is that if you don't want to take advantage of all of the features, you don't have to.
The container orchestration level load balancing has a place for any applications or services that do not conform to your service discovery piece that resides at the application level (Eureka).
The hardware load balancer provides another level that allows for load balancing outside of your container orchestrator.
The specific use case that I ran into was on AWS for a Kubernetes cluster with Traefik and Eureka with Spring Cloud.

Yes, you are correct. We have a similar Spring Cloud Netflix application deployed on Oracle cloud platform and Predix Cloud Foundry. If you use multiple Kubernetes clusters then you have to use Ribbon load balancing because you have multiple instance for services.
I cannot tell you which is better Kubernetes or Docker Swarm. We use Kubernetes for service orchestration as it provides more flexibility.

Microservice architectute and high availability and scalability for scheduler

i'm trying to make a microservice architecture using spring cloud, for that I use config server, eureka and etc, also I exploit the docker to deploy my services. I use several machines for that. For redundancy and load balancing i'm gonna deploy one copy of each services into each machine, but i face a problem: some of these services must be working in one copy at the same time (e.g. monitoring of something which is executed by cron expression) That is to say I don't want to have several monitorings components to be run at same time, instead they have to be set up on each machine by rotation. (e.g. as here http://www.quartz-scheduler.org/documentation/quartz-..)
How could i do that the best way? What should i use for that?
thanks

Background job Framework Needed in Production

We have a requirement, where we have to run many async background processes which accesses DBs, Kafka queues, etc. As of now, we are using Spring Batch with Tomcat (exploded WAR) for the same. However, we are facing certain issues which I'm unable to solve using Spring Batch. I was thinking of other frameworks to use, but couldn't find any that solves all my problems.
It would be great to know if there exists a framework which solves the following problems:
Since Spring Batch runs inside one Tomcat container (1 java process), any small update in any job/step will result in restarting the Tomcat server. This results in hard-stopping of all running jobs, resulting in incomplete/stale data.
WHAT I WANT: Bundle all the jars and run each job as a separate process. The framework should store the PID and should be able to manage (stop/force-kill) the job on demand. This way, when we want to update a JAR, the existing process won't be hindered (however, we should be able to stop the existing process from UI), and no other job (running or not) will also be touched.
I have looked at hot-update of JARs in Tomcat, but I'm skeptical whether to use such a mechanism in production.
Sub-question: Will OSGI integrate with Spring Batch? If so, is it possible to run each job as a separate container with all JARs embedded in it?
Spring batch doesn't have a master-slave architecture.
WHAT I WANT: There should be a master, where the list of jobs are specified. There should be slave machines (workers), which are specified to master in a configuration file. There should exist a scheduler in the master, which when needed to start a job, should assign a slave a job (possibly load-balanced, but not necessary) and the slave should update the DB. The master should be able to send and receive data from the slaves (start/stop/kill any job, give me update of running jobs, etc.) so that it can be displayed on a UI.
This way, in case I have a high load, I should be able to just add machines into the cluster and modify the master configuration file and the load should get balanced right away.
Spring batch doesn't have an in-built alerting mechanism in case of job stall/failure.
WHAT I WANT: I should be able to set up alerts for jobs in case of failure. If necessary, a job should have a timeout where it should able to notify the user (via email probably) or should force stop the job when the job crosses a specified threshold.

Maybe vertx can do the trick.
Since Spring Batch runs inside one Tomcat container (1 java process), any small update in any job/step will result in restarting the Tomcat server. This results in hard-stopping of all running jobs, resulting in incomplete/stale data.
Vertx allows you to build microservices. Each vertx instance is able to communicate with other instances. If you stop one, the others can still work (if there are not dependant, eg if you stop master, slaves will fail)
Vert.x is not an application server.
There's no monolithic Vert.x instance into which you deploy applications.
You just run your apps wherever you want to.
Spring batch doesn't have a master-slave architecture
Since vertx is even driven, you can easily create a master slave architecture. For example handle the http request in an vertx instance and dispatch them between severals other instances depending on the nature of the request.
Spring batch doesn't have an in-built alerting mechanism in case of job stall/failure.
In vertx, you can set a timeout for each message and handle failure.
Sending with timeouts
When sending a message with a reply handler you can specify a timeout in the DeliveryOptions.
If a reply is not received within that time, the reply handler will be called with a failure.
The default timeout is 30 seconds.
Send Failures
Message sends can fail for other reasons, including:
There are no handlers available to send the message to
The recipient has explicitly failed the message using fail
In all cases the reply handler will be called with the specific failure.
EDIT There are other frameworks to do microservices in java. Dropwizard is one of them, but I can't talk much more about it.

Scaling Spring on Heroku

I was wondering if someone could point me to a good tutorial or blog post on writing a spring application that can be all run in a single process for integration testing locally but when deployed will deploy different subsystems into different processes/dynos on heroku.
For example, I have services for User management, Job processing, etc. all in my web application. I want to run it just as a web application locally. But when I deploy to heroku I want to deploy just the stateless web front end to TWO dynos and then have worker dynos that I can select different services to run on. I may decide to group 2 of these services into one process or decide that each should run in its own process. Obviously when the services run in their own process they will need to transparently add some kind of transport like REST or RabbitMQ or AKKA or some such.
Any pointers on where to start looking to learn how to do this? Or am I thinking about this incorrectly and you'd like to suggest a different approach? I need to figure out how to setup the application and also how to construct maven and intellij to achieve this.
Thanks.

I can't point you to a prefabricated article or post, but I can share the direction I started down to solve a similar problem. Essentially, the proposed approach was similar to yours - put specific services with potentially long-running logic in worker dynos and pass messages via Jesque (Java port of Resque) on a RedisToGo instance (Heroku add-on). I never got the separate web vs. worker Spring contexts fully ironed out (moved on to other priorities) but the gist of it was 1) web tier app context would be configured to post messages and 2) worker app context configured to consume.
That said, I used foreman locally to simulate the Heroku environment to debug scaling (foreman start --formation="web=2" + Apache mod_proxy_http). Big Spring gotcha when you scale to 2+ dynos - make sure you are using Redis or Memcache for session storage when using webapp-runner. Spring uses HttpSession by default to store the security context... no session affinity or native Tomcat session replication.
Final caveat - in our case, none of our worker processing needed to be reflected to the end user. That said, we were using Pusher for other features (also a Heroku add-on). If you need to update the user when an async task completes, I recommend looking at it.

Job queuing library/software for Java

The premise is this: For asynchronous job processing I have a homemade framework that:
Stores jobs in database
Has a simple java api to create more jobs and processors for them
Processing can be embedded in a web application or can run by itself in on different machines for scaling out
Web UI for monitoring the queue and canceling queue items
I would like to replace this with some ready made library because I would expect more robustness from those and I don't want to maintain this. I've been researching the issue and figured you could use JMS for something similar. But I would still have to build a simple java API, figure out a runtime where I would put the processing when I want to scale out and build a monitoring UI. I feel like the only thing I would benefit from JMS is that I would not have to do is the database stuff.
Is there something similar to this that is ready made?
UPDATE
Basically this is the setup I would want to do:
Web application runs in a Servlet container or Application Server
Web application uses a client api to create jobs
X amount of machines process those jobs
Monitor and manage jobs from an UI

You can use Quartz:
http://www.quartz-scheduler.org/

Check out Spring Batch.
Link to sprint batch website: http://projects.spring.io/spring-batch/

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.