I want to use Vert.x 3 in cluster mode with Hazelcast, with Java. I have two types of verticles:
a verticle that handles HTTP requests (a simple HTTP server); this type of verticle should run on each node
a verticle with (non-local) event bus consumers that holds some data (I have N parts of data and each verticle holds one part; I would like to run each part in HA mode with only one instance of each, so there are N verticles of this type in the cluster).
A verticle of type one would communicate with verticles of type two.
I also have a fat jar with all the code.
And I have a few questions about it.
How should I do this?
How do I run the cluster?
Do I run the same jar on each node, or do I need to do something else?
How do I run each type of verticle?
How do I guarantee that only one instance of a type-two verticle runs in the cluster?
Will I lose event bus messages?
Is this the correct way to use Vert.x for this task?
There are several questions here; I'll try to answer them all.
How should I do this?
The easiest way, imho, is to have a single fat jar per verticle type, and each verticle should have a dependency on the Hazelcast cluster manager:
<dependency>
  <groupId>io.vertx</groupId>
  <artifactId>vertx-hazelcast</artifactId>
  <version>3.3.2</version>
</dependency>
And in your shade plugin configuration, specify the manifest attributes:
Main-Class: io.vertx.core.Launcher
Main-Verticle: vertx.bc.service.Main
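For reference, here is a minimal sketch of a shade plugin configuration that sets those attributes (the plugin version is illustrative; the verticle class name is the one from above):
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <manifestEntries>
              <Main-Class>io.vertx.core.Launcher</Main-Class>
              <Main-Verticle>vertx.bc.service.Main</Main-Verticle>
            </manifestEntries>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>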
How do I run the cluster?
Now you can run each fat jar as:
java -jar verticle1.jar -cluster
java -jar verticle2.jar -cluster
They should form a HZ cluster and be up and running. You can deploy on the same machine or across several machines; as long as your network supports multicast, the default config will work for you. If you have special needs, you will need to customize your HZ config.
How do I guarantee that only one instance of a type-two verticle runs in the cluster?
You can't. It is a distributed system and the network should be considered unreliable, so there can be no assumption that you always know how many nodes of each type are running. To solve this you need monitoring tools. BTW, this is not Vert.x specific but applies to any distributed system/microservice architecture.
Will I lose event bus messages?
Only if there are no consumers for a specific address at the time of sending; those messages will be lost. This relates to the previous question: to reduce the chance of message loss you should deploy more instances of a specific verticle, and the deployment should be across several machines to reduce the chance of a network split.
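To make the request/reply pattern concrete, here is a minimal sketch of the two verticle types talking over the clustered event bus (the address name data.partition.1 and the lookup logic are illustrative):
import io.vertx.core.AbstractVerticle;

// Type-two verticle: holds one partition of the data and serves it
// over the (clustered) event bus.
public class DataPartitionVerticle extends AbstractVerticle {
  @Override
  public void start() {
    vertx.eventBus().consumer("data.partition.1", msg -> {
      // Look up the requested key in this verticle's partition (omitted).
      msg.reply("value-for-" + msg.body());
    });
  }
}

// Type-one verticle: HTTP server that queries a partition verticle.
public class HttpServerVerticle extends AbstractVerticle {
  @Override
  public void start() {
    vertx.createHttpServer().requestHandler(req -> {
      vertx.eventBus().<String>send("data.partition.1", req.getParam("key"), reply -> {
        if (reply.succeeded()) {
          req.response().end(reply.result().body());
        } else {
          // No consumer was registered (or the send timed out): the message is lost.
          req.response().setStatusCode(503).end();
        }
      });
    }).listen(8080);
  }
}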
Is this the correct way to use Vert.x for this task?
If you're using HA with only one instance, this should work fine for the consumer verticles. However, note that if the web server dies for some reason and respawns on another host, it will not give you what you're looking for, since the HTTP server has "moved" from host1 to hostN. This means that all your web clients will now get a "Cannot connect to host" error, since your application entry point is now using a different IP address.
Related
I'm coming from the PHP/Python/JS environment, where it's standard to run multiple instances of a web application as separate processes and asynchronous tasks like queue processing as separate scripts.
E.g. in a k8s environment, there would be:
N instances of the web server only, each running in a separate pod
For each queue, a dynamic number of consumers, each in a separate pod
Cron scheduling using the k8s CronJob functionality, leaving the scheduling process to k8s
Such an approach matches the cloud nature well, where the workload can be scheduled across either a smaller number of powerful machines or a lot of less powerful ones, and it allows very fine control of autoscaling (based on the number of messages in a specific queue, for example).
Also, there is a clear separation between the developer and DevOps responsibility.
Recently, I tried to replicate the same setup with a Java Spring Boot application and failed miserably.
Even though Java frameworks claim to be "cloud native", it seems like all the documentation is still built around a monolithic application that handles all consumers and cron scheduling in separate threads.
The clear answer to this problem is microservices, but that's way out of scope.
What I need is to deploy separate parts of application (like 1 queue listener only) per pod in the cloud yet keep the monolith code architecture.
So, the question is:
How do I design my Spring Boot application so that:
I can run the webserver separately without queue listeners and scheduled jobs
I can run one queue listener per pod in the k8s
I can use k8s cron scheduling instead of the app-level Spring scheduler?
I found several ways to achieve something like this but I expect there must be some "more or less standard way".
Alternative solutions that came to my mind:
Having a separate module with a separate Application definition, so that each "command" is built separately
Using Spring Profiles to instantiate specific services only according to some environment variables
Implementing a custom command line runner that parses the command name/queue name and dynamically creates the appropriate services (this seems to be the approach closest to how it's done in "scripting languages")
What I mainly want to achieve with such setup is:
To be able to run the application on a lot of weak hardware instead of having one machine with 32 CPU cores
Easier scaling per workload
Removing one layer from an already complex monitoring infrastructure (k8s already allows very fine resource monitoring; application-level task scheduling and parallelism makes this much more difficult)
Am I missing something, or is it just not standard to write Java server apps this way?
Thank you!
What I need is to deploy separate parts of application (like 1 queue listener only) per pod in the cloud yet keep the monolith code architecture.
I agree with #jacky-neo's answer in terms of the appropriate architecture/best practice, but that may require you to break up your monolithic application.
To solve this without breaking up your monolithic application, deploy multiple instances of your monolith to Kubernetes, each as a separate Deployment. Each Deployment can have its own configuration. Then you can utilize feature flags and define the environment variables for each Deployment based on the functionality you would like to enable.
In application.properties:
myapp.queue.listener.enabled=${QUEUE_LISTENER_ENABLED:false}
In your Deployment for the queue listener, enable the feature flag:
env:
- name: 'QUEUE_LISTENER_ENABLED'
value: 'true'
You would then just need to configure your monolithic application to use this myapp.queue.listener.enabled property and only enable the queue listener when the property is set to true.
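On the Spring side, a minimal sketch of such a flag, assuming the listener beans are grouped in their own configuration class:
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Configuration;

// Listener beans are only created when the flag from application.properties
// (set via QUEUE_LISTENER_ENABLED in the Deployment) resolves to true.
@Configuration
@ConditionalOnProperty(name = "myapp.queue.listener.enabled", havingValue = "true")
public class QueueListenerConfiguration {
    // @Bean definitions for the queue listener go here.
}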
Similarly, you could also apply this logic to the Spring profile to only run certain features in your app based on the profile defined in your ConfigMap.
This Baeldung article explains the process I'm presenting here in detail.
For the scheduled task, just set up a CronJob using a curl container that invokes the service endpoint that performs the work.
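A minimal sketch of such a CronJob, assuming an in-cluster service name myapp-web and a hypothetical trigger endpoint (use batch/v1beta1 on Kubernetes versions before 1.21):
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: trigger
              image: curlimages/curl
              # POST to the web service's REST endpoint to kick off the work.
              args: ["-X", "POST", "http://myapp-web/internal/jobs/nightly-report"]
          restartPolicy: OnFailure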
Edit
Another option based on your comments below -- split the shared logic out into a shared module (using Gradle or Maven), and have two other runnable modules like web and listener that depend on the shared module. This will allow you to keep your shared logic in the same repository, and keep you from having to build/maintain an extra library which you would like to avoid.
This would be a good step in the right direction, and it would lend well to breaking the app into smaller pieces later down the road.
Here's some additional info about multi-module Spring Boot projects using Maven or Gradle.
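For a rough idea, the parent build file in a Maven multi-module setup would just aggregate the modules (module names are illustrative):
<!-- Parent pom.xml (packaging "pom"): one shared library module
     plus two separately runnable Spring Boot modules. -->
<modules>
  <module>shared</module>
  <module>web</module>
  <module>listener</module>
</modules>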
In my experience, I would resolve these issues as below. Hope it is what you want.
I can run the webserver separately without queue listeners and scheduled jobs
Develop a Spring Boot app for this and deploy it as service-A in Kubernetes. In this app, use spring-mvc to define the controller or REST controller that receives requests. Then use a Kubernetes NodePort, or define an ingress gateway, to make the service accessible from outside the Kubernetes cluster. If you use sessions, you should save them in Redis or a similar shared place, so that multiple instances of the service (pods) can share the same session state.
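For the shared-session part, a minimal sketch using Spring Session backed by Redis (assuming the spring-session-data-redis dependency is on the classpath; property names are for Spring Boot 2.x and the host name is illustrative):
# Store HTTP sessions in Redis so every pod of service-A sees the same session.
spring.session.store-type=redis
spring.redis.host=redis-service
spring.redis.port=6379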
I can run one queue listener per pod in the k8s
Develop a new Spring Boot app for this and deploy it as service-B in Kubernetes. This service only processes queue messages from RabbitMQ or similar, which can be sent from service-A or another source. Most of the time it should not be accessible from outside the Kubernetes cluster.
I can use k8s cron scheduling instead of the app-level Spring scheduler?
In my opinion, I would define a new Spring Boot app with spring-scheduler, called service-C, in Kubernetes. It would have only one instance and would not be scaled. It would invoke service-A's methods at the scheduled times, and it would not be accessible from outside the Kubernetes cluster. But if you prefer the Kubernetes CronJob, you can just write a shell script that uses service-A's DNS name in Kubernetes to access its REST endpoint.
The above three services can each be configured with different resources, such as CPU and memory usage.
I do not get the essence of your post.
You want to have an application with "monolithic code architecture".
And then deploy it to several pods, but only parts of the application are actually running.
Why don't you separate the parts you want to be special into applications in their own right?
Perhaps this is because I come from a Java background and haven't deployed monolithic scripting apps.
I'm a Java developer, aware of AWS, and reasonably good with Hazelcast on its own.
I have two AWS EC2 instances running and would like to run Hazelcast as an in-memory cluster between the nodes. I followed the link to make the required changes, except for the taskdef.json configuration in the Task Definition.
I read some documentation but couldn't understand what exactly a task definition is, and why it's needed.
How do I know if it's already created? And if I create one now, would my production environment get disrupted?
The whole reason for the EC2 discovery is to resolve the issue with non-static IP addresses. The EC2 plugin performs a describe-instances call and pulls the IP address from the JSON.
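A minimal sketch of the relevant hazelcast.xml section for EC2 discovery in Hazelcast 3.x (key, region, and security group values are placeholders; newer versions can also use an IAM role instead of keys):
<hazelcast>
  <network>
    <join>
      <!-- Multicast doesn't work on EC2, so disable it and use AWS discovery. -->
      <multicast enabled="false"/>
      <aws enabled="true">
        <access-key>my-access-key</access-key>
        <secret-key>my-secret-key</secret-key>
        <region>us-east-1</region>
        <security-group-name>hazelcast-sg</security-group-name>
      </aws>
    </join>
  </network>
</hazelcast>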
We have configured a Storm cluster with one Nimbus server and three supervisors, and published three topologies which do different calculations, as follows:
Topology1: Reads raw data from MongoDB, does some calculations, and stores the result back.
Topology2: Reads the result of topology1, does some calculations, and publishes the results to a queue.
Topology3: Consumes the output of topology2 from the queue, calls a REST service, gets the reply from the REST service, updates the result in a MongoDB collection, and finally sends an email.
As a newbie to Storm, I'm looking for expert advice on the following questions:
Is there a way to externalize all configuration, for example in a config.json, that can be referred to by all topologies?
Currently the configuration to connect to MongoDB, MySQL, MQ, and the REST URLs is hard-coded in Java files. It is not good practice to customize source files for each customer.
We want to log at each stage [spouts and bolts]. Where do we post/store the log4j.xml so that it can be used by the cluster?
Is it right to execute a blocking call, like a REST call, from a bolt?
Any help would be much appreciated.
Since each topology is just a Java program, simply pass the configuration into the Java jar, or pass a path to a file. The topology can read the file at startup and pass any configuration to components as it instantiates them.
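A minimal sketch of that approach at submission time (package names match Storm 0.x; in Storm 1.x+ use org.apache.storm instead of backtype.storm; the file path and keys are illustrative):
import java.io.FileInputStream;
import java.util.Properties;

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class TopologyRunner {
  public static void main(String[] args) throws Exception {
    // Load an external config file passed on the command line, e.g.:
    // storm jar topo.jar TopologyRunner /etc/myapp/config.properties
    Properties props = new Properties();
    props.load(new FileInputStream(args[0]));

    // Copy the entries into the Storm Config so every spout/bolt can read
    // them from the topology configuration it receives in prepare().
    Config conf = new Config();
    for (String key : props.stringPropertyNames()) {
      conf.put(key, props.getProperty(key));
    }

    TopologyBuilder builder = new TopologyBuilder();
    // builder.setSpout(...) / builder.setBolt(...) as in your topologies.

    StormSubmitter.submitTopology("topology1", conf, builder.createTopology());
  }
}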
Storm uses slf4j out of the box, and it should be easy to use within your topology as such. If you use the default configuration, you should be able to see logs either through the UI, or dumped to disk. If you can't find them, there are a number of guides to help, e.g. http://www.saurabhsaxena.net/how-to-find-storm-worker-log-directory/.
With Storm, you have the flexibility to push concurrency out to the component level and get multiple executors by instantiating multiple bolts. This is likely the simplest approach, and I'd advise you to start there and only later introduce the complexity of an executor inside your topology for asynchronously making HTTP calls.
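For example, giving the REST-calling bolt several executors is a one-line change when wiring the topology (QueueSpout and RestCallBolt are hypothetical stand-ins for your own components):
import backtype.storm.topology.TopologyBuilder; // org.apache.storm in 1.x+

// Four executors for the bolt that makes the blocking REST call, so up to
// four calls can be in flight at once without any async machinery in the bolt.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("topology2-results", new QueueSpout(), 1);
builder.setBolt("rest-caller", new RestCallBolt(), 4)
       .shuffleGrouping("topology2-results");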
See http://storm.apache.org/documentation/Understanding-the-parallelism-of-a-Storm-topology.html for the canonical overview of parallelism in storm. Start simple, and then tune as necessary, as with anything.
We have a requirement where we have to run many async background processes which access DBs, Kafka queues, etc. As of now, we are using Spring Batch with Tomcat (exploded WAR) for this. However, we are facing certain issues which I'm unable to solve using Spring Batch. I was thinking of other frameworks to use, but couldn't find any that solves all my problems.
It would be great to know if there exists a framework which solves the following problems:
Since Spring Batch runs inside one Tomcat container (1 Java process), any small update in any job/step will result in restarting the Tomcat server. This results in hard-stopping of all running jobs, resulting in incomplete/stale data.
WHAT I WANT: Bundle all the jars and run each job as a separate process. The framework should store the PID and should be able to manage (stop/force-kill) the job on demand. This way, when we want to update a JAR, the existing process won't be hindered (though we should be able to stop the existing process from the UI), and no other job (running or not) will be touched.
I have looked at hot-update of JARs in Tomcat, but I'm skeptical whether to use such a mechanism in production.
Sub-question: Will OSGi integrate with Spring Batch? If so, is it possible to run each job as a separate container with all JARs embedded in it?
Spring Batch doesn't have a master-slave architecture.
WHAT I WANT: There should be a master where the list of jobs is specified. There should be slave machines (workers), which are specified to the master in a configuration file. There should be a scheduler in the master which, when a job needs to start, assigns it to a slave (possibly load-balanced, but not necessarily), and the slave should update the DB. The master should be able to send data to and receive data from the slaves (start/stop/kill any job, give me an update on running jobs, etc.) so that it can be displayed on a UI.
This way, when I have a high load, I should be able to just add machines to the cluster, modify the master configuration file, and have the load balanced right away.
Spring Batch doesn't have a built-in alerting mechanism in case of job stall/failure.
WHAT I WANT: I should be able to set up alerts for jobs in case of failure. If necessary, a job should have a timeout, where it should be able to notify the user (probably via email) or force-stop the job when it crosses a specified threshold.
Maybe Vert.x can do the trick.
Since Spring Batch runs inside one Tomcat container (1 Java process), any small update in any job/step will result in restarting the Tomcat server. This results in hard-stopping of all running jobs, resulting in incomplete/stale data.
Vert.x allows you to build microservices. Each Vert.x instance is able to communicate with other instances. If you stop one, the others can still work (as long as they are not dependent on it; e.g. if you stop the master, the slaves will fail).
Vert.x is not an application server.
There's no monolithic Vert.x instance into which you deploy applications.
You just run your apps wherever you want to.
Spring Batch doesn't have a master-slave architecture.
Since Vert.x is event driven, you can easily create a master-slave architecture. For example, handle the HTTP requests in one Vert.x instance and dispatch them among several other instances depending on the nature of the request.
Spring Batch doesn't have a built-in alerting mechanism in case of job stall/failure.
In Vert.x, you can set a timeout for each message and handle failures.
Sending with timeouts
When sending a message with a reply handler you can specify a timeout in the DeliveryOptions.
If a reply is not received within that time, the reply handler will be called with a failure.
The default timeout is 30 seconds.
Send Failures
Message sends can fail for other reasons, including:
There are no handlers available to send the message to
The recipient has explicitly failed the message using fail
In all cases the reply handler will be called with the specific failure.
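As a minimal sketch of that API (the address, payload, and alerting logic are illustrative; this runs inside a verticle, which provides the vertx field):
import io.vertx.core.eventbus.DeliveryOptions;

// Fail the reply handler if no reply arrives within 5 seconds.
DeliveryOptions options = new DeliveryOptions().setSendTimeout(5000);
vertx.eventBus().send("jobs.start", "job-42", options, reply -> {
  if (reply.failed()) {
    // Timeout, no handler, or an explicit fail(): trigger an alert/email here.
  }
});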
EDIT: There are other frameworks for doing microservices in Java. Dropwizard is one of them, but I can't say much more about it.
We are using a core Java app with no web server, which is multi-threaded. We have a requirement where our app is to be made highly available in the customer's environment.
All the transactions in our app are mostly ActiveMQ (Java Message Service over TCP connections) based, i.e. we communicate with other apps using message queues. We also have HTTP connections.
For high availability of ActiveMQ, we have implemented it in a master/slave configuration (active/passive).
For high availability of our app (active/active), we thought of deploying two instances of the app which would consume the messages in parallel,
but this implementation would rule out our internal feature of retaining messages: we acknowledge a message from the ActiveMQ queue only once it has been processed.
Hence, having two instances running might result in duplicate processing of the corresponding message.
Please advise us on how to make our app highly available.
Would a load balancer in place solve our issue? Also,
should we have to convert our core Java app into services?
Thanks in advance.
Whenever you want high availability of an app, AND IF the high availability of your app directly depends on the high availability of ActiveMQ, then what you should really be doing is having a single instance of the app and multiple instances of ActiveMQ.
What this does is: even if one instance of ActiveMQ goes down, another might take over (a typical master-slave configuration) and the app will function as expected.
This topology will also not result in duplicate message processing, because at any point in time only one ActiveMQ instance will be associated with your app.
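On the app side, it is the ActiveMQ failover transport that lets the single app instance follow whichever broker is currently active (broker host names are illustrative):
import org.apache.activemq.ActiveMQConnectionFactory;

// The failover transport connects to whichever broker of the master/slave
// pair is currently active and reconnects transparently on failover.
ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory(
    "failover:(tcp://broker1:61616,tcp://broker2:61616)");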
For load balancing you can have a look here (if it suits your requirements):
https://docs.oracle.com/cd/E19823-01/819-0215/loadb.html
Hope this helps!
Good luck!