Apache Storm integration with Spring framework - java

I'm new to Apache Storm. I'm currently working on a legacy project that involves some stream processing using Apache Storm, and I want to integrate the project with Spring. I found a couple of comments (Storm and Spring 4 integration, http://mail-archives.apache.org/mod_mbox/storm-user/201605.mbox/%3CCAMwbCdz7myeBs+Z2mZDxWgqBPfjcq-tynOz_+pmPrmY6umfUxA#mail.gmail.com%3E) saying that there are concerns with doing that. Can someone explain to me how to do such an integration, or why it is impossible?

Fair warning: I haven't used Spring with Storm, so this is based solely on my knowledge of Storm and on having used Spring on non-Storm projects, i.e. this is really just guesswork.
I think you can use Spring with Storm, but there are some caveats you should be aware of. Whether Spring is still worth using given these caveats is up to you.
Unlike e.g. a Spring MVC application, Spring will not be responsible for object instantiation or application flow. Storm doesn't know about Spring, and when you run your topology it will be Storm that calls your bolt/spout methods. This means you have to be aware that some parts of your application will be called outside the Spring context.
Here's my guess at where you could use Spring during different phases of a topology deployment.
When you set up your topology and submit it (all your code up to and including StormSubmitter.submitTopology), you can most likely use Spring just as you would in any standalone Java application, e.g. you could start your application as in this example and put all your submission and wiring code in Main.start. All bolt/spout constructors run in this phase, so you can use autowiring here if you like. You have to ensure that your spouts and bolts are serializable, though.
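As a minimal sketch of that submission phase, assuming Storm 1.x+ package names and hypothetical AppConfig, MySpout and MyBolt classes (none of these names are from the original post):

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;

public final class Main {
    public static void main(String[] args) throws Exception {
        // Wiring happens here, in a plain Spring context on the submitting machine.
        try (AnnotationConfigApplicationContext ctx =
                 new AnnotationConfigApplicationContext(AppConfig.class)) {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("my-spout", ctx.getBean(MySpout.class));
            builder.setBolt("my-bolt", ctx.getBean(MyBolt.class)).shuffleGrouping("my-spout");

            // From here on, Storm serializes the spout/bolt instances and ships them
            // to the workers; the Spring context does not travel with them.
            StormSubmitter.submitTopology("my-topology", new Config(), builder.createTopology());
        }
    }
}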
After topology submission, Storm will serialize your spouts and bolts (and any non-transient fields in these objects), and send them to the supervisor machines, where they will be deserialized. At this point if you need a context available in the worker, you could create one in a worker hook (added to the topology via TopologyBuilder.addWorkerHook), and expose it through a static method on the hook (which is a little ugly, but I don't see any other way to make it available to other parts of the code).
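A rough sketch of that worker-hook idea; WorkerConfig is a hypothetical configuration class, and the exact start signature varies slightly between Storm versions:

import java.util.Map;
import org.apache.storm.hooks.BaseWorkerHook;
import org.apache.storm.task.WorkerTopologyContext;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;

public class SpringWorkerHook extends BaseWorkerHook {
    private static volatile ConfigurableApplicationContext context;

    @Override
    public void start(Map stormConf, WorkerTopologyContext workerContext) {
        // Runs once per worker JVM, after the topology has been deserialized.
        context = new AnnotationConfigApplicationContext(WorkerConfig.class);
    }

    @Override
    public void shutdown() {
        context.close();
    }

    // The admittedly ugly static accessor mentioned above.
    public static ConfigurableApplicationContext getContext() {
        return context;
    }
}

You would register it with builder.addWorkerHook(new SpringWorkerHook()) before submitting the topology.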
Just to reiterate, if you decide to run a Spring context inside your workers, you have to be aware that the spout/bolt methods will be invoked by Storm outside the Spring context.

Related

Spring Boot - Running one specific background job per pod

I'm coming from the PHP/Python/JS environment, where it's standard to run multiple instances of a web application as separate processes, and asynchronous tasks like queue processing as separate scripts.
E.g. in a k8s environment, there would be:
N instances of the web server only, each running in a separate pod
For each queue, a dynamic number of consumers, each in a separate pod
Cron scheduling using the k8s CronJob functionality, leaving the scheduling process to k8s
Such an approach fits the cloud well, where the workload can be scheduled across either a small number of powerful machines or a lot of less powerful ones, and it allows very fine-grained control of autoscaling (based on the number of messages in a specific queue, for example).
Also, there is a clear separation between the developer and DevOps responsibility.
Recently, I tried to replicate the same setup with a Java Spring Boot application and failed miserably.
Even though Java frameworks claim to be "cloud native", it seems like all the documentation is still built around a monolithic application that handles all consumers and cron scheduling in separate threads.
The obvious answer to this problem is microservices, but that's way out of scope.
What I need is to deploy separate parts of the application (like one queue listener only) per pod in the cloud, yet keep the monolithic code architecture.
So, the question is:
How do I design my Spring Boot application so that:
I can run the webserver separately without queue listeners and scheduled jobs
I can run one queue listener per pod in the k8s
I can use k8s cron scheduling instead of App level Spring scheduler?
I found several ways to achieve something like this but I expect there must be some "more or less standard way".
Alternative solutions that came to my mind:
Having a separate module with its own Application definition so that each "command" is built separately
Using Spring Profiles to instantiate specific services only, according to some environment variables
Implementing a custom command-line runner which would parse the command/queue name and dynamically create the appropriate services (this seems closest to the way it's done in "scripting languages")
What I mainly want to achieve with such setup is:
To be able to run the application on a lot of weak hardware instead of one machine with 32 CPU cores
Easier scaling per workload
Removing one layer from an already complex monitoring infrastructure (k8s already allows very fine resource monitoring; application-level task scheduling and parallelism make this much more difficult)
Am I missing something, or is it just not standard to write Java server apps this way?
Thank you!
What I need is to deploy separate parts of the application (like one queue listener only) per pod in the cloud, yet keep the monolithic code architecture.
I agree with @jacky-neo's answer in terms of the appropriate architecture/best practice, but that may require you to break up your monolithic application.
To solve this without breaking up your monolithic application, deploy multiple instances of your monolith to Kubernetes, each as a separate Deployment. Each Deployment can have its own configuration. Then you can utilize feature flags and define the environment variables for each Deployment based on the functionality you would like to enable.
In application.properties:
myapp.queue.listener.enabled=${QUEUE_LISTENER_ENABLED:false}
In your Deployment for the queue listener, enable the feature flag:
env:
  - name: QUEUE_LISTENER_ENABLED
    value: "true"
You would then just need to configure your monolithic application to use this myapp.queue.listener.enabled property and only enable the queue listener when the property is set to true.
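One way to wire that up is a conditional bean definition; a minimal sketch, where QueueListener stands in for your actual consumer class:

import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class QueueListenerConfig {

    // Only instantiated in pods where QUEUE_LISTENER_ENABLED=true.
    @Bean
    @ConditionalOnProperty(name = "myapp.queue.listener.enabled", havingValue = "true")
    public QueueListener queueListener() {
        return new QueueListener();
    }
}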
Similarly, you could also apply this logic to the Spring profile to only run certain features in your app based on the profile defined in your ConfigMap.
This Baeldung article explains the process I'm presenting here in detail.
For the scheduled task, just set up a CronJob that uses a curl container to invoke the service endpoint that performs the work.
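For example, something along these lines (the name, schedule and endpoint are illustrative; older clusters use batch/v1beta1 instead of batch/v1):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: trigger
              image: curlimages/curl
              args: ["-X", "POST", "http://myapp:8080/internal/jobs/nightly-report"]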
Edit
Another option based on your comments below -- split the shared logic out into a shared module (using Gradle or Maven), and have two other runnable modules like web and listener that depend on the shared module. This will allow you to keep your shared logic in the same repository, and keep you from having to build/maintain an extra library which you would like to avoid.
This would be a good step in the right direction, and it would lend well to breaking the app into smaller pieces later down the road.
Here's some additional info about multi-module Spring Boot projects using Maven or Gradle.
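The layout might look roughly like this (module names are just examples):

my-app/
  shared/    -> domain model and business services (plain library module)
  web/       -> runnable Spring Boot app, depends on shared, serves HTTP
  listener/  -> runnable Spring Boot app, depends on shared, consumes the queue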
From my experience, I would resolve these issues as below. Hope it is what you want.
I can run the webserver separately without queue listeners and scheduled jobs
Develop a Spring Boot app to do this and deploy it as service-A in Kubernetes. In this app, use spring-mvc to define the controllers or REST controllers that receive requests. Then use a Kubernetes NodePort or define an ingress gateway to make the service accessible from outside the Kubernetes cluster. If you use sessions, store them in Redis or a similar shared place so that multiple instances (pods) of the service can share session state.
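If you go the Redis route with Spring Session, most of it is configuration; a sketch of the relevant Spring Boot properties (names vary a little across Boot versions, and the host is illustrative):

spring.session.store-type=redis
spring.redis.host=redis-host
spring.redis.port=6379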
I can run one queue listener per pod in the k8s
Develop a new Spring Boot app for this and deploy it as service-B in Kubernetes. This service only processes queue messages from RabbitMQ or another broker, which can be sent from service-A or some other source. In most cases it should not be accessible from outside the Kubernetes cluster.
I can use k8s cron scheduling instead of App level Spring scheduler?
In my opinion, I would define a new Spring Boot app with spring-scheduler, called service-C, in Kubernetes. It would have only one instance and would not be scaled. It would invoke service-A's methods at the scheduled times and would not be accessible from outside the Kubernetes cluster. But if you prefer a Kubernetes CronJob, you can just write a shell script that uses service-A's DNS name in Kubernetes to access its REST endpoint.
The above three services can each be configured with different resources such as CPU and memory usage.
I do not get the essence of your post.
You want to have an application with "monolithic code architecture".
And then deploy it to several pods, but only parts of the application are actually running.
Why don't you split the parts you want to run separately into applications in their own right?
Perhaps this is because I come from a Java background and haven't deployed monolithic scripting apps.

Which of these approaches is better for breaking Spring Boot applications into three tiers?

We are trying to split a big monolithic J2EE application into a set of modules to provide unified business logic for all web application clients.
Our goal is to have smaller and more specialized business modules that can be used in any combination by many distinct web applications. This way we expect the applications to become easier to maintain individually (compared to the heavily coupled monolithic one we have now).
We are planning to arrange it like the following diagram:
Where Web Apps call module services on the upper layer to handle business logic through method calls (RMI was the intended protocol but we are open to other options).
J2EE makes it easy to arrange services on three tiers like this through EJB remote beans.
But we are also trying to set up these modules as Spring Boot applications (using the Spring Boot JPA starter) because it makes development much more convenient.
My initial plan was to implement the modules with Spring Boot and expose a thin layer of EJB beans as proxies to the Spring beans containing the actual service implementations. The EJB beans would have Spring beans injected using the SpringBeanAutowiringInterceptor class, as in the Spring 4 documentation.
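For reference, the Spring 4-era arrangement looked roughly like this (the OrderService names are illustrative; by default the interceptor locates the shared Spring context via a beanRefContext.xml):

import javax.ejb.Remote;
import javax.ejb.Stateless;
import javax.interceptor.Interceptors;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.ejb.interceptor.SpringBeanAutowiringInterceptor;

// Thin EJB facade; the interceptor autowires its fields from the Spring context.
@Stateless
@Remote(OrderServiceRemote.class)
@Interceptors(SpringBeanAutowiringInterceptor.class)
public class OrderServiceBean implements OrderServiceRemote {

    @Autowired
    private OrderService delegate; // the real Spring-managed implementation

    @Override
    public OrderDto placeOrder(OrderDto order) {
        return delegate.placeOrder(order);
    }
}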
But since this kind of integration was removed in Spring 5, I don't know how to proceed.
The Spring team also considers "EJB an effectively deprecated technology now", as stated in the issue above. But for my use case I can't find a good enough alternative.
So far I have thought of the following options:
Using a custom interceptor as the issue suggests. But it looks like reinventing a discarded old wheel (if that analogy makes any sense).
Spring remoting is an alternative, but it's challenging to make it work with WildFly's JNDI configuration for RMI, and attempting it feels like re-implementing EJB remoting.
Spring Integration I think will add too much complexity and overhead for this simple task.
Message based integration (JMS, MQTT, etc...) may not fit well because of the synchronous nature of what we are trying to achieve.
REST API calls would add a lot of boilerplate code (serialization and deserialization of objects to JSON, endpoint configuration and so on) and also add some overhead because of the HTTP handling.
Our goals, in this order of priority, are:
Make this integration as solid and fail-proof as possible.
Make calls to business logic from the web layer to the service layer as simple as possible.
Have as little overhead as possible.
With all that said which of those five options mentioned (or maybe another one I haven't considered) would be the best way and why?

Scheduled Jobs in Spring using Akka

I am trying to determine the best way to handle long-running batch jobs in Spring MVC. I came across Akka in my search as a non-blocking framework for async processing, which is preferred because I don't want the batch processing to eat up all the threads from the thread pool.
Essentially what I will be doing is have a job that needs to run on some set schedule that will go out and call various web services, process the data, and persist it.
I have seen some code examples of using it with Spring, but I've never seen it used with a cron-type scheduler. It always seems to use a fixed time period.
I'm not sure if this is even the best approach to handling large scale batch processing within Spring. Any suggestions or links to good Akka Spring resources are welcome.
I would suggest looking into the Spring Integration and Spring Batch projects. The first allows you to configure chains of services using EIPs (enterprise integration patterns). We used it in our project to fetch files from FTP, deserialize and process them, import them into a DB, send emails if required, etc., all on a schedule. The second is more straightforward and basically provides a framework for working on rows of data. Both are configurable with Quartz and integrate nicely into a Spring MVC project.
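As a side note, plain Spring scheduling also accepts cron expressions, so a fixed period is not the only option; a minimal sketch, with ReportJob as a placeholder:

import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Configuration
@EnableScheduling
class SchedulingConfig { }

@Component
class ReportJob {
    // Cron fields: second, minute, hour, day of month, month, day of week.
    @Scheduled(cron = "0 0 2 * * MON-FRI")
    void run() {
        // call the web services, process the data, persist it
    }
}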

Camel - Integrating with Existing Application

I currently work on a trading application that does not use camel.
It essentially takes in trades, does some processing and sends the details to an external system.
We now have a need to integrate with three new systems, using FTP for two of them and JMS for the third.
I would like to use Camel in my application for these integrations. I have read a good chunk of Camel in Action, but I was unclear on how we could kick off our Camel routes.
Essentially, we don't want to modify any part of the existing application too drastically, as it's working well in production.
In the existing application, we generate a Trade Value Object, and it's from this object that I want to kick off our Camel integration.
I don't have a database table or JMS queue from which I can kick off the route.
I had a quick look at the chapter on bean routing and remoting in the Camel in Action book, but I wanted to get people's advice before proceeding with any steps.
What would be the best approach for this integration?
Thanks
Damien
You can use Camel's POJO Producing feature, which allows you to send a message to a Camel endpoint from a Java bean. If you have no need of JMS or a DB, you can use a "direct:", "seda:" or "vm:" endpoint as the <from> part of your route.
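A sketch of what that might look like; TradeHandler, TradeValueObject and the direct:newTrade endpoint are illustrative names (in Camel 3+ the annotation takes the URI directly rather than via uri =):

import org.apache.camel.Produce;
import org.apache.camel.ProducerTemplate;
import org.apache.camel.builder.RouteBuilder;

// A Spring-managed bean in the existing application; hands the trade to Camel.
public class TradeHandler {

    @Produce(uri = "direct:newTrade")
    private ProducerTemplate producer;

    public void onTrade(TradeValueObject trade) {
        producer.sendBody(trade);
    }
}

// The route picks the object up and forwards it to the new systems.
public class TradeRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("direct:newTrade")
            .to("jms:queue:trades"); // or ftp: endpoints for the other two systems
    }
}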
POJO producing, as Konstantin V. Salikhov stated. However, you need to be sure you have a Spring application and are scanning your beans with Spring, or wire them up yourself.
"If a bean is defined in Spring XML or scanned using the Spring component scanning mechanism and a <camelContext> is used or a CamelBeanPostProcessor then we process a number of Camel annotations to do various things such as injecting resources or producing, consuming or routing messages."
If this approach would add too many changes to your application, you could use a ProducerTemplate and just invoke a direct endpoint (or SEDA, for that matter).
The choice of protocol here might be important. The direct protocol is a safe choice, since the overhead is simply a method call. Also, exceptions propagate well through direct endpoints, as do transactions. Since SEDA endpoints are asynchronous (like JMS) but do not feature persistence, there is a slight chance of losing in-flight data in case of a crash. This might or might not be an issue. However, under high load, the SEDA protocol stages better and gives your application better resistance to load peaks.
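The ProducerTemplate variant is equally small; a sketch, again assuming the hypothetical direct:newTrade endpoint:

// Create once from the CamelContext and reuse it; templates are thread-safe.
ProducerTemplate template = camelContext.createProducerTemplate();
template.sendBody("direct:newTrade", trade);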

BlazeDS and Java class in WAR file

Hi, I have a Java class which has been deployed as a WAR web application on a BlazeDS/Spring server sitting on JBoss.
Apart from the Flex application which will access the WAR file, I also need to start some server-side processes which will initiate BlazeDS "pushes" to the Flex client via AMF messaging.
What is the best way to implement this server side process?
- Should it just be a class with a main() method in the WAR file which gets called from the command line? Can this be done? I'm not sure you can run a class in a WAR file from the command line.
- Should it just be a class with a main() method in a JAR file which gets called from the command line?
I'm not sure what the standard practice is here. The key is that the process needs to be started on the BlazeDS server to push data out (not on the Flex client).
Any help would be appreciated
Mike
First off, are you using the latest Spring/BlazeDS integration? If not, I highly recommend checking it out here. It can greatly simplify setting up message destinations for push messaging. It also will allow you to use JMS and Spring Integration message destinations as well as integrate Spring Security if you so choose.
Now to answer your question. What are the life-cycle requirements for your data push service? Do you want to be able to control the parameters of this data push (i.e., starting and stopping it, frequency, etc.) from other classes? Creating this service using Spring will allow you to inject it into other beans for control as you so desire.
I currently have a similar use case in which I use a BlazeDS message destination to "push" telemetry data to a client browser. I set up a "service" class that is instantiated by Spring as a singleton instance.
If you do not need external control of this singleton, I suggest you use an annotated @PostConstruct or "init" method for creating a Thread and starting it with an anonymous Runnable representing your main loop. If your service needs to push data at a predefined frequency, you might consider a java.util.concurrent.ScheduledExecutorService.
Either way, you will also need to set up an annotated @PreDestroy or "destroy" method that will execute just before the singleton instance is destroyed. This will allow you to insert code to safely stop the loop's Thread or ScheduledFuture and clean up any necessary resources before the Spring container is shut down.
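A sketch of that lifecycle pattern; TelemetryPushService, the push body and the one-second period are all illustrative:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import org.springframework.stereotype.Service;

@Service
public class TelemetryPushService {

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> task;

    @PostConstruct
    public void start() {
        task = scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                // push data to the BlazeDS message destination here
            }
        }, 0, 1, TimeUnit.SECONDS);
    }

    @PreDestroy
    public void stop() {
        // Stop the loop and release resources before the context shuts down.
        if (task != null) task.cancel(false);
        scheduler.shutdown();
    }
}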
If you want further interaction with your service, you can manipulate it from other classes (such as web controllers, etc.) using a service interface. Have your class implement this interface and inject your class into other classes through it. For a more daring solution, you might consider using dm Server or another OSGi container and creating an OSGi service.
Please let me know if you need further help regarding this process or if there are specific details that I can illuminate further.
Marshall, you're a star - thanks for that!
I am using the Spring @PostConstruct and this is working a treat. It appears that the Monitoring class is getting instantiated by Spring automatically and then the @PostConstruct method is being called.
I also had to include the following in the Spring config file to get this to work:
xmlns:context="http://www.springframework.org/schema/context"
xsi:schemaLocation="http://www.springframework.org/schema/context
    http://www.springframework.org/schema/context/spring-context-2.5.xsd"
Within the @PostConstruct method I have implemented a simple java.util.Timer which pushes data to the Flex client at regular intervals. (I still need to set it up as a singleton via Spring - I'm a bit of a Spring newbie!)
Does the ScheduledExecutorService offer any benefits above the Timer class for my purposes?
Once again thanks
Regards
Michael
