Spring Cloud Data Flow - Spring Batch Remote Chunking - java

How do I do Spring Batch Remote Chunking within Spring Cloud Data Flow Server?
As I understand it, Remote Partitioning in Spring Batch can be done within Spring Cloud Data Flow Server using the DeployerPartitionHandler.
But how do we implement Remote Chunking inside SCDF?

There is nothing special about running a remote chunking job on SCDF. All you need to do is run both the master and the workers as Task applications.
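To make that concrete, below is a minimal sketch of the master side, assuming Spring Batch 4.3+ with spring-batch-integration on the classpath (earlier 4.x versions use the equivalent Master-named builders) and a message broker such as RabbitMQ behind the two channels. The bean names, the `ItemReader`, and the channel wiring are assumptions for illustration; the worker side is built the same way with `RemoteChunkingWorkerBuilder`.

```java
// Hedged sketch: the manager step of a remote chunking job, packaged as a
// Spring Cloud Task so SCDF can launch it. Channel beans and the reader
// are placeholders; they would be bound to a broker in a real setup.
@Configuration
@EnableTask
@EnableBatchIntegration
public class ManagerConfiguration {

    @Autowired
    private RemoteChunkingManagerStepBuilderFactory managerStepBuilderFactory;

    @Bean
    public TaskletStep managerStep(ItemReader<String> itemReader,
                                   MessageChannel requests,
                                   PollableChannel replies) {
        return this.managerStepBuilderFactory.get("managerStep")
                .chunk(100)              // items per chunk sent to a worker
                .reader(itemReader)
                .outputChannel(requests) // chunk requests go out to workers
                .inputChannel(replies)   // chunk replies come back
                .build();
    }
}
```

Register both the manager and the worker applications as `task` applications in SCDF and launch them; SCDF itself needs no remote-chunking-specific configuration.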

Related

Cron job from application or external server

I am working on an application that is deployed in PCF and uses a caching mechanism. We want to implement a batch job that will purge the data from the cache region.
I would like some suggestions on the following:
Should I include this batch job in the same application, so that it uses the same server to run the batch jobs?
Or should I create a new server for running these batch jobs?
I mainly want to understand how the performance of the current application would be impacted if we run the batch job from the same server, and the advantages and disadvantages of each approach.
TIA

Spring Batch without Spring Cloud Data Flow

I have a Spring Boot application that uses Spring Batch. I now want to implement an admin panel to see all job statuses. For this, Spring has "spring-batch-admin", but I see that it was deprecated a long time ago:
The functionality of Spring Batch Admin has been mostly duplicated and expanded upon via Spring Cloud Data Flow and we encourage all users to migrate to that going forward.
But then Spring Cloud Data Flow says:
Pipelines consist of Spring Boot apps, built using the Spring Cloud
Stream or Spring Cloud Task microservice frameworks
So in order to use this functionality, do I really need to convert my Spring Boot app to a microservice? Isn't this overkill just to see some batch statuses? Also, I cannot install Docker on my production server (for various reasons). Can I still use Spring Cloud Data Flow without Docker?
Yes, the Spring Boot batch application should be wrapped as a Spring Cloud Task, which should not be too complicated.
If Docker does not suit your needs, you can run Spring Cloud Data Flow locally: https://docs.spring.io/spring-cloud-dataflow/docs/current/reference/htmlsingle/#getting-started-local-deploying-spring-cloud-dataflow
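Wrapping the existing batch app as a task is typically just the spring-cloud-starter-task dependency plus one annotation; a minimal, hypothetical sketch (class name assumed):

```java
// Hedged sketch: the existing Spring Boot batch application, wrapped as a
// Spring Cloud Task so SCDF can launch it and record job/task executions.
// Nothing else in the batch configuration needs to change.
@SpringBootApplication
@EnableTask
@EnableBatchProcessing
public class BatchTaskApplication {

    public static void main(String[] args) {
        SpringApplication.run(BatchTaskApplication.class, args);
    }
}
```

Once registered in SCDF as a task application, its job executions show up in the SCDF dashboard, which replaces what spring-batch-admin used to provide.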

Overhead in Spring Batch

I have a RESTful microservice that uses Spring Boot and Spring Batch. When I run it locally it uses an H2 database. When I send requests to the microservice, I see that Spring Batch serializes the job's execution context, and a profiling tool shows the microservice spends too much time on this serialization.
Table used: BATCH_JOB_EXECUTION_CONTEXT.
(Call graph screenshot omitted.)
Do you have any idea how the serialization of the context can be disabled?
Thank you
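For context: this serialization cannot be fully disabled, because the job repository always persists the execution context to BATCH_JOB_EXECUTION_CONTEXT. What you can do is keep the context small and plug a cheaper serializer into the job repository. A hedged sketch, assuming a custom `JobRepositoryFactoryBean` is acceptable in your configuration: in Spring Batch 3 the default serializer was XStream-based, and swapping in the Jackson one (the default from Spring Batch 4 onward) is a common way to cut this cost.

```java
// Hedged sketch: the execution-context serializer is pluggable via
// JobRepositoryFactoryBean.setSerializer(). A lighter serializer (or a
// nearly-empty execution context) reduces the time spent writing
// BATCH_JOB_EXECUTION_CONTEXT; it cannot be switched off outright.
@Bean
public JobRepositoryFactoryBean jobRepository(DataSource dataSource,
                                              PlatformTransactionManager transactionManager) {
    JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
    factory.setDataSource(dataSource);
    factory.setTransactionManager(transactionManager);
    factory.setSerializer(new Jackson2ExecutionContextStringSerializer());
    return factory;
}
```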

Scalable spring batch job on kubernetes

I am developing an ETL batch application using Spring Batch. My ETL process takes data from a pagination-based REST API and loads it into Google BigQuery. I would like to deploy this batch application in a Kubernetes cluster and want to exploit pod scalability. I understand Spring Batch supports both horizontal and vertical scaling. I have a few questions:
1) How do I deploy this ETL app on Kubernetes so that it creates pods on demand using remote chunking / remote partitioning?
2) I am assuming there would be a main master pod and different worker pods provisioned based on load. Is that correct?
3) There is also a Kubernetes batch API available. Should I use the Kubernetes batch API or the Spring Cloud feature? Which option is better?
I have used Spring Boot with Spring Batch and Spring Cloud Task to do something similar to what you want to do. Maybe it will help you.
The way it works is like this: I have a manager app that deploys pods on Kubernetes with my master application. The master application does some work and then starts the remote partitioning deploying several other pods with "workers".
Trying to answer your questions:
1) You can create a Docker image of an application that has a Spring Batch job. Let's call it the master application.
The application that deploys the master application could use a TaskLauncher or an AppDeployer from Spring Cloud Deployer Kubernetes.
2) Correct. In this case you could use remote partitioning. Each partition would be another Docker image with a job. This would be your worker.
An example of remote partitioning can be found here.
3) In my case I used Spring Batch and managed to do everything I needed. The only problems I have now are with upscaling and downscaling my cluster. Since my workers are not stateful, I'm experiencing some problems when instances are removed from the cluster. If you don't need to upscale or downscale your cluster, you are good to go.
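The master-deploys-workers pattern described above can be sketched with Spring Cloud Task's `DeployerPartitionHandler` and the Kubernetes `TaskLauncher`; the image URI, step name, and worker count below are assumptions for illustration (and newer spring-cloud-task versions have a constructor variant that also takes a `TaskRepository`):

```java
// Hedged sketch: the master's partitioned step hands each partition to
// DeployerPartitionHandler, which asks the Kubernetes TaskLauncher to
// start one worker pod per partition from the given Docker image.
@Bean
public PartitionHandler partitionHandler(TaskLauncher taskLauncher,
                                         JobExplorer jobExplorer) {
    Resource workerImage =
            new DockerResource("myregistry/etl-worker:latest"); // assumed image
    DeployerPartitionHandler handler =
            new DeployerPartitionHandler(taskLauncher, jobExplorer,
                                         workerImage, "workerStep");
    handler.setMaxWorkers(4); // cap on concurrently running worker pods
    return handler;
}
```

Each worker pod runs the same job, executes only its assigned `workerStep` partition, and exits, which is what makes the on-demand pod creation in question 1) work.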

spring boot application in cluster

I am developing a Spring Boot application.
Spring Boot builds the application as a .jar file.
I want to cluster this particular application across different servers. Let's say I build the jar file and run the project; it should then run in cluster mode on a number of defined servers and be able to serve end users' needs.
My jar will reside on only one server, but it will be clustered across a number of servers. When an end user calls a web service from my Spring Boot app, they will never know where it is being served from.
The reason behind clustering is that if any of the servers goes down in the future, the end user will still be able to access the web services from another server. But I don't know how to make it clustered.
Can anyone please give me some insight on this?
If you want to have it clustered, you just run your Spring Boot application on multiple servers (of course, the JAR must be present on those servers, otherwise you can't run it). You would then place a load balancer in front of the application servers to distribute the load.
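To make the load balancer's role concrete, here is a toy round-robin selector in plain Java (hypothetical names, illustration only); in a real deployment this job is done by NGINX, HAProxy, or a cloud load balancer rather than hand-rolled code:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of what the load balancer does: spread successive requests
// over identical, stateless application instances in round-robin order.
public class RoundRobinBalancer {
    private final List<String> instances;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinBalancer(List<String> instances) {
        this.instances = instances;
    }

    // Pick the instance that should serve the next request.
    public String pick() {
        int i = Math.floorMod(next.getAndIncrement(), instances.size());
        return instances.get(i);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(
                List.of("app-server-1", "app-server-2", "app-server-3"));
        for (int r = 0; r < 6; r++) {
            System.out.println("request " + r + " -> " + lb.pick());
        }
    }
}
```

Because the instances are identical copies of the same jar, the end user never knows (and never needs to know) which server answered, which is exactly the transparency the question asks for.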
If all the services you are going to expose are stateless, you only need a load balancer in front of your nodes, e.g. Apache or NGINX. If your services are stateful (they store any state: sessions, or data in a DB), then you have to use a distributed cache or an in-memory data grid:
For sessions, you can use the Spring Session project, which can use Redis to store sessions.
For data stored in a DB, you need to cluster the DB itself, and you can use a distributed cache above your DB layer, such as Hazelcast.
Look into Spring Cloud; it builds on some Netflix open-source software, along with Amazon's, to create 12-factor apps for microservices.
Ideally you would need a load balancer and a service registry; these can help you run multiple instances of Spring Boot. I believe you have to add a dependency called Eureka.
Check the below link
Spring cloud
You can deploy it on Cloud Foundry and use the autoscale function to increase your application instances.