I am working on an application that is deployed in PCF, and we are using a caching mechanism. We want to implement a batch job that will purge the data from the cache region.
I want to get some suggestions on the following:
Should I include this batch job in the same application so that it uses the same server to run the batch jobs?
Or should I create a new server for running these batch jobs?
I mainly want to understand how the performance of the current application would be impacted if we run the batch job from the same server, and what the advantages and disadvantages of each approach are.
TIA
I am new to Flink and Kubernetes. I am planning to create a Flink streaming job that streams data from a filesystem to Kafka.
I have the Flink job jar, which is working fine (tested locally). Now I am trying to host this job in Kubernetes, and I would like to use EKS on AWS.
I have read through the official Flink documentation on how to set up a Flink cluster:
https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/deployment/kubernetes.html
I tried to set it up locally using Minikube, brought up a session cluster, and submitted the job, which is working fine.
My questions:
1) Out of the two options, job cluster and session cluster: since the job is a streaming job that should keep monitoring the filesystem and stream any new files to the destination, can I use a job cluster in this case? As per the documentation, a job cluster executes the job and terminates once it is completed; if the job monitors a folder, does it ever complete?
2) I have a Maven project that builds the Flink jar. I would like to know the ideal way to spin up a session/job cluster using this jar in production. What is the normal CI/CD process? Should I build a session cluster initially and submit jobs whenever needed, or spin up a job cluster with the built jar?
First off, the link that you provided is for Flink 1.5. If you are starting fresh, I'd recommend using Flink 1.9 or the upcoming 1.10.
For your questions:
1) A job with a file monitor never terminates. It cannot know when no more files will arrive, so you have to cancel it manually. A job cluster is fine for that.
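To illustrate the point, here is a minimal sketch of such a never-terminating file-to-Kafka job. The input directory, broker address, and topic name are placeholders; the key part is FileProcessingMode.PROCESS_CONTINUOUSLY, which keeps the source re-scanning the directory so the job never finishes on its own:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class FileToKafkaJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        String inputDir = "/data/input";  // placeholder path on the mounted filesystem

        // PROCESS_CONTINUOUSLY keeps re-scanning the directory for new files,
        // so this source (and therefore the job) never reaches a finished state on its own.
        DataStream<String> lines = env.readFile(
                new TextInputFormat(new Path(inputDir)),
                inputDir,
                FileProcessingMode.PROCESS_CONTINUOUSLY,
                60_000L);  // re-scan interval in milliseconds

        Properties producerProps = new Properties();
        producerProps.setProperty("bootstrap.servers", "kafka:9092");  // placeholder broker

        lines.addSink(new FlinkKafkaProducer<>(
                "output-topic",               // placeholder topic
                new SimpleStringSchema(),
                producerProps));

        env.execute("file-to-kafka");
    }
}
```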
2) There is no clear answer to that, and it's also not Flink-specific. Everyone has a different solution with different drawbacks.
I'd aim for a semi-automatic approach, where everything is automated but you need to explicitly press a deploy button (and not just git push). Often, these CI/CD pipelines deploy to a test cluster first and run a smoke test before allowing a deploy to production.
If you are starting completely fresh, you could check out AWS CodeDeploy. However, I have had good experiences with GitLab and its runners on AWS.
The normal process would be something like:
build
integration/e2e tests on build machine (dockerized)
deploy on test cluster/preprod cluster
run smoke tests
deploy on prod
I have also seen processes that go straight to prod and invest the time in better monitoring and fast rollbacks instead of a preprod cluster and smoke tests. Whether that's viable usually depends on how business-critical the process is and how expensive reprocessing would be.
How to do Spring Batch Remote Chunking within Spring Cloud Data Flow Server?
In my understanding, remote partitioning in Spring Batch can be done within Spring Cloud Data Flow Server using DeployerPartitionHandler.
But how do we implement remote chunking inside SCDF?
There is nothing special about running a remote chunking job on SCDF. All you need to do is run both the master and the workers as Task applications.
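To make that concrete, here is a minimal sketch of what those two Task applications might look like, assuming Spring Batch 4.2+ with spring-batch-integration on the classpath (in 4.1 the factory is named RemoteChunkingMasterStepBuilderFactory). The reader/processor/writer beans, channel names, and chunk size are placeholders; in a real deployment the requests/replies channels would be bridged to middleware such as RabbitMQ or Kafka through Spring Integration adapters, and each configuration class would live in its own Spring Boot app registered as a Task in SCDF:

```java
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.step.tasklet.TaskletStep;
import org.springframework.batch.integration.chunk.RemoteChunkingManagerStepBuilderFactory;
import org.springframework.batch.integration.chunk.RemoteChunkingWorkerBuilder;
import org.springframework.batch.integration.config.annotation.EnableBatchIntegration;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.integration.channel.QueueChannel;
import org.springframework.integration.dsl.IntegrationFlow;

// Master Task application: reads items and ships chunks to the workers.
@Configuration
@EnableBatchProcessing
@EnableBatchIntegration
class MasterConfiguration {

    @Autowired
    private RemoteChunkingManagerStepBuilderFactory managerStepBuilderFactory;

    @Bean
    public DirectChannel requests() { return new DirectChannel(); }  // bridge to middleware in practice

    @Bean
    public QueueChannel replies() { return new QueueChannel(); }     // bridge to middleware in practice

    @Bean
    public TaskletStep masterStep(ItemReader<String> itemReader) {   // placeholder reader bean
        return this.managerStepBuilderFactory.get("masterStep")
                .chunk(100)
                .reader(itemReader)
                .outputChannel(requests())   // chunk requests sent to the workers
                .inputChannel(replies())     // chunk replies received from the workers
                .build();
    }
}

// Worker Task application: processes and writes the chunks it receives.
@Configuration
@EnableBatchProcessing
@EnableBatchIntegration
class WorkerConfiguration {

    @Autowired
    private RemoteChunkingWorkerBuilder<String, String> workerBuilder;

    @Bean
    public IntegrationFlow workerFlow(ItemProcessor<String, String> itemProcessor,  // placeholder beans
                                      ItemWriter<String> itemWriter) {
        return this.workerBuilder
                .itemProcessor(itemProcessor)
                .itemWriter(itemWriter)
                .inputChannel(new DirectChannel())   // requests received from the master
                .outputChannel(new DirectChannel())  // replies sent back to the master
                .build();
    }
}
```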
I am developing an ETL batch application using Spring Batch. My ETL process takes data from a pagination-based REST API and loads it into Google BigQuery. I would like to deploy this batch application in a Kubernetes cluster and want to exploit the pod scalability feature. I understand Spring Batch supports both horizontal and vertical scaling. I have a few questions:
1) How do I deploy this ETL app on Kubernetes so that it creates pods on demand using remote chunking / remote partitioning?
2) I am assuming there would be a main master pod and different slave pods provisioned based on the load. Is that correct?
3) There is also a Kubernetes batch API available. Should I use the Kubernetes batch API or the Spring Cloud feature? Which option is better?
I have used Spring Boot with Spring Batch and Spring Cloud Task to do something similar to what you want to do. Maybe it will help you.
The way it works is like this: I have a manager app that deploys pods on Kubernetes with my master application. The master application does some work and then starts the remote partitioning, deploying several other pods with "workers".
Trying to answer your questions:
1) You can create a Docker image of an application that has a Spring Batch job. Let's call it the master application.
The application that deploys the master application could use a TaskLauncher or an AppDeployer from Spring Cloud Deployer Kubernetes.
2) Correct. In this case you could use remote partitioning. Each partition would be another Docker image with a Job; this would be your worker (see the configuration sketch after this answer).
An example of remote partitioning can be found here.
3) In my case I used Spring Batch and managed to do everything I needed. The only problems I have now are with upscaling and downscaling my cluster. Since my workers are not stateful, I'm experiencing some problems when instances are removed from the cluster. If you don't need to upscale or downscale your cluster, you are good to go.
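To sketch what the master side of that remote partitioning setup might look like, here is a rough outline using Spring Cloud Task's DeployerPartitionHandler on top of the Kubernetes TaskLauncher mentioned in 1). The worker image coordinates, step names, partitioner logic, and worker count are placeholders, and the DeployerPartitionHandler constructor differs slightly between Spring Cloud Task versions, so treat this as a starting point rather than a drop-in configuration:

```java
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.partition.PartitionHandler;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.cloud.deployer.spi.task.TaskLauncher;
import org.springframework.cloud.task.batch.partition.DeployerPartitionHandler;
import org.springframework.cloud.task.batch.partition.SimpleEnvironmentVariablesProvider;
import org.springframework.cloud.task.configuration.EnableTask;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.env.Environment;
import org.springframework.core.io.Resource;
import org.springframework.core.io.ResourceLoader;

// Master application: partitions the work and launches one worker pod per partition.
@Configuration
@EnableTask
@EnableBatchProcessing
class MasterJobConfiguration {

    @Bean
    public PartitionHandler partitionHandler(TaskLauncher taskLauncher,
                                             JobExplorer jobExplorer,
                                             ResourceLoader resourceLoader,
                                             Environment environment) {
        // Docker image of the worker application (placeholder coordinates); resolving a
        // "docker://" URI assumes the Spring Cloud Deployer resource support is on the classpath.
        Resource workerImage = resourceLoader.getResource("docker://myorg/etl-worker:latest");

        DeployerPartitionHandler handler =
                new DeployerPartitionHandler(taskLauncher, jobExplorer, workerImage, "workerStep");
        handler.setEnvironmentVariablesProvider(new SimpleEnvironmentVariablesProvider(environment));
        handler.setMaxWorkers(4);                 // upper bound on worker pods running in parallel
        handler.setApplicationName("etl-worker");
        return handler;
    }

    @Bean
    public Partitioner partitioner() {
        // Toy partitioner: one partition per page range of the source REST API (placeholder logic).
        return gridSize -> {
            Map<String, ExecutionContext> partitions = new HashMap<>();
            for (int i = 0; i < gridSize; i++) {
                ExecutionContext context = new ExecutionContext();
                context.putInt("page", i);
                partitions.put("partition" + i, context);
            }
            return partitions;
        };
    }

    @Bean
    public Step masterStep(StepBuilderFactory steps, PartitionHandler partitionHandler) {
        return steps.get("masterStep")
                .partitioner("workerStep", partitioner())
                .partitionHandler(partitionHandler)
                .build();
    }

    @Bean
    public Job etlJob(JobBuilderFactory jobs, Step masterStep) {
        return jobs.get("etlJob").start(masterStep).build();
    }
}
```

On the worker side, the worker image would use Spring Cloud Task's DeployerStepExecutionHandler to look up the step execution assigned to it and run "workerStep".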
I am using Spring Batch local partitioning to process my job. In local partitioning, multiple slaves are created in the same instance, i.e. within the same job. How is remote partitioning different from local partitioning? What I am assuming is that in remote partitioning each slave is executed on a different machine. Is my understanding correct? If it is, how do I start the slaves on different machines without using Cloud Foundry? I have seen Michael Minella's talk on remote partitioning: https://www.youtube.com/watch?v=CYTj5YT7CZU. I am curious to know how remote partitioning works without using Cloud Foundry. How can I start slaves on different machines?
While that video uses CloudFoundry, the premise of how it works applies off CloudFoundry as well. In that video I launch multiple JVM processes (web apps in that case). Some are configured as slaves, so they listen for work. The other is configured as a master, and it's the one I use to do the actual launching of the job.
Off of CloudFoundry, this would be no different from deploying WAR files onto Tomcat instances on multiple servers. You could also use Spring Boot to package executable jar files that run your Spring applications in a web container. In fact, the code for that video (which is available on GitHub here: https://github.com/mminella/Spring-Batch-Talk-2.0) can be used in the same way it was on CF. The only change you'd need to make is to not use the CF-specific connection factories and to use traditional configuration for your services.
In the end, the deployment model is the same off CloudFoundry or on. You launch multiple JVM processes on multiple machines (connected by middleware of your choice) and Spring Batch handles the rest.
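For a rough idea of what the master and slave configurations look like in code, newer Spring Batch versions (4.2+, with spring-batch-integration and @EnableBatchIntegration) expose builder factories for remote partitioning. The sketch below is an outline under those assumptions rather than the code from the talk: the channel beans are placeholders you would bridge to your middleware (JMS, AMQP, etc.), and the worker tasklet is a stand-in for real work:

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.integration.config.annotation.EnableBatchIntegration;
import org.springframework.batch.integration.partition.RemotePartitioningManagerStepBuilderFactory;
import org.springframework.batch.integration.partition.RemotePartitioningWorkerStepBuilderFactory;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.channel.DirectChannel;
import org.springframework.integration.channel.QueueChannel;

// Master JVM: creates the partitions and sends step execution requests to the slaves.
@Configuration
@EnableBatchProcessing
@EnableBatchIntegration
class MasterPartitionConfiguration {

    @Autowired
    private RemotePartitioningManagerStepBuilderFactory managerStepBuilderFactory;

    @Bean
    public DirectChannel requests() { return new DirectChannel(); }  // bridge to JMS/AMQP in practice

    @Bean
    public QueueChannel replies() { return new QueueChannel(); }     // bridge to JMS/AMQP in practice

    @Bean
    public Step masterStep(Partitioner partitioner) {                // placeholder partitioner bean
        return this.managerStepBuilderFactory.get("masterStep")
                .partitioner("workerStep", partitioner)
                .gridSize(4)
                .outputChannel(requests())   // partition requests sent to the slaves
                .inputChannel(replies())     // replies received from the slaves
                .build();
    }
}

// Slave JVM(s): listen for work on the requests channel and execute their assigned partition.
@Configuration
@EnableBatchProcessing
@EnableBatchIntegration
class SlavePartitionConfiguration {

    @Autowired
    private RemotePartitioningWorkerStepBuilderFactory workerStepBuilderFactory;

    @Bean
    public Step workerStep() {
        return this.workerStepBuilderFactory.get("workerStep")
                .inputChannel(new DirectChannel())   // requests received from the master
                .outputChannel(new DirectChannel())  // replies sent back to the master
                .tasklet((contribution, chunkContext) -> {
                    // Stand-in for the real work against this partition's slice of data.
                    return RepeatStatus.FINISHED;
                })
                .build();
    }
}
```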
Does anyone have any experience with executing Spring Batch jobs from a web UI? Currently I have written a few jobs for copying data from CSV files to a DB table; they run fine from the command prompt and in a JUnit test. But now these jobs have to be executed through the web, and JSF is being used as the front controller framework. Any suggestions about best practices in this case would be very helpful.
Thanks!
Spring Batch Admin is a deployable web frontend for your Spring Batch jobs. If all you want is a simple UI for administrators instead of a shell script, take this approach:
http://static.springsource.org/spring-batch-admin/getting-started.html
If you're looking for a way to integrate the job trigger mechanism with your existing application, look at this implementation using Spring's JobLauncher, which can be invoked from a controller or servlet (a minimal sketch follows the link below):
http://docs.spring.io/spring-batch/trunk/reference/html/configureJob.html#runningJobsFromWebContainer
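If you go the JobLauncher route, the web-side code can be quite small. Below is a minimal sketch using Spring MVC annotations for brevity (the same jobLauncher.run(...) call can be made from a JSF managed bean); the job bean, request mapping, and parameter names are illustrative, and for web use you would normally configure the JobLauncher with an asynchronous TaskExecutor so the HTTP request does not block until the job finishes:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
public class JobLauncherController {

    @Autowired
    private JobLauncher jobLauncher;   // configure with an async TaskExecutor for web use

    @Autowired
    private Job csvCopyJob;            // placeholder: your CSV-to-DB job bean

    @RequestMapping("/jobs/csv-copy")
    @ResponseBody
    public String launch(@RequestParam("file") String fileName) throws Exception {
        // Unique parameters so the same job can be re-launched as a new JobInstance.
        JobParameters params = new JobParametersBuilder()
                .addString("input.file", fileName)
                .addLong("launch.time", System.currentTimeMillis())
                .toJobParameters();

        JobExecution execution = jobLauncher.run(csvCopyJob, params);
        return execution.getStatus().toString();
    }
}
```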