Currently, I have an idea about building a centralized batch job management system (I temporarily call it batch service).
We run a microservice system, and the batch jobs are scattered across the services (including Oracle batch jobs), so I intend to set up a batch job management system.
The problem is that there are many databases across the microservices, so I want the data manipulation to stay in the other services. The batch service would only handle the following: configuration, scheduling, status checks, logging, start, stop, and retry.
My idea is to use a message broker (Kafka, RabbitMQ, ...) to pass job requests from the batch service to the other services. However, I have not yet found a way to stop jobs from the batch service or to store their logs there.
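As a rough sketch of how such a flow could look (not a definitive design): the batch service publishes START and STOP commands, and the owning service reports status and log lines back on a second channel. Everything here is an assumption made for illustration: the in-memory `Broker` stands in for Kafka or RabbitMQ topics, and the names `JobCommand`, `JobEvent`, `OwningService`, and `BatchService` are hypothetical. With a real broker, STOP would more likely arrive on a separate control topic and set a flag that the running job checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical message shapes; a real system would serialize them (JSON, Avro, ...).
enum CommandType { START, STOP }
record JobCommand(String jobId, CommandType type) {}
record JobEvent(String jobId, String status, String logLine) {}

// In-memory stand-in for two broker topics: commands out, events back.
class Broker {
    final BlockingQueue<JobCommand> commands = new LinkedBlockingQueue<>();
    final BlockingQueue<JobEvent> events = new LinkedBlockingQueue<>();
}

// Runs inside the service that owns the data.
class OwningService {
    private final Broker broker;

    OwningService(Broker broker) { this.broker = broker; }

    // Takes the next command, if any; only START triggers work here.
    void handleNextCommand() {
        JobCommand cmd = broker.commands.poll();
        if (cmd != null && cmd.type() == CommandType.START) {
            runJob(cmd.jobId(), 3);
        }
    }

    // Works in chunks and looks for a STOP command between chunks.
    private void runJob(String jobId, int chunks) {
        report(jobId, "STARTED", "job started");
        for (int i = 1; i <= chunks; i++) {
            if (stopRequested(jobId)) {
                report(jobId, "STOPPED", "stopped before chunk " + i);
                return;
            }
            // ... the service updates its own database here ...
            report(jobId, "RUNNING", "chunk " + i + " done");
        }
        report(jobId, "COMPLETED", "job finished");
    }

    // Removes a pending STOP for this job and reports whether one was found.
    private boolean stopRequested(String jobId) {
        return broker.commands.removeIf(
                c -> c.type() == CommandType.STOP && c.jobId().equals(jobId));
    }

    private void report(String jobId, String status, String logLine) {
        broker.events.add(new JobEvent(jobId, status, logLine));
    }
}

// The central batch service: sends commands, keeps the latest status and the logs.
class BatchService {
    private final Broker broker;
    private final Map<String, String> status = new ConcurrentHashMap<>();
    private final Map<String, List<String>> logs = new ConcurrentHashMap<>();

    BatchService(Broker broker) { this.broker = broker; }

    void requestStart(String jobId) { broker.commands.add(new JobCommand(jobId, CommandType.START)); }

    void requestStop(String jobId) { broker.commands.add(new JobCommand(jobId, CommandType.STOP)); }

    // Consumes everything the services have reported so far.
    void drainEvents() {
        JobEvent e;
        while ((e = broker.events.poll()) != null) {
            status.put(e.jobId(), e.status());
            logs.computeIfAbsent(e.jobId(), k -> new ArrayList<>()).add(e.logLine());
        }
    }

    String statusOf(String jobId) { return status.getOrDefault(jobId, "UNKNOWN"); }

    List<String> logOf(String jobId) { return logs.getOrDefault(jobId, List.of()); }
}
```

The point of the sketch is that the batch service never touches another service's database: stopping is cooperative (the job checks for a stop request between chunks), and logs reach the batch service only as events.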
Is this idea feasible? If so, can you give me some advice on deployment technologies? (We are deploying with Spring Boot at the moment.)
Thanks for taking the time to read ^^.
Related
I have a requirement to create around 10 Spring Batch jobs, each consisting of a reader and a writer. All readers read data from different Oracle DBs and write into a different Oracle DB (the source and destination servers are different). The jobs are implemented using Spring Boot, and all 10+ jobs will be packaged into a single JAR file. So far, so good.
Now the client also wants a UI to monitor job status and act as a job organizer. I went through the Spring Cloud Data Flow server documentation for the UI requirement, but I'm not sure whether it will serve the purpose. Is there any alternative for monitoring job status and for stopping and starting jobs on demand from the UI?
Also, how could I separate the 10+ jobs inside a single JAR in the Spring Cloud Data Flow server, if it's the only option for a UI?
Thanks in advance.
I don't have enough reputation to add a comment, so I am posting this as an answer, although I know that sharing a reference link is not the proper way to answer.
This might help you:
spring-batch-job-monitoring-with-angular-front-end-real-time-progress-bar
Observability of Spring Batch jobs comes from the data that the framework persists in a relational database: job instances, executions, timestamps, read counts, write counts, and so on.
You have different ways to exploit this data: a SQL client, JMX, the Spring Batch API (JobExplorer, JobOperator), or Spring Batch Admin (deprecated in favor of the Spring Cloud Data Flow server).
Data Flow is an orchestrator that lets you execute data pipelines made of streams and tasks (finite, short-lived, monitored services). For your jobs, you could wrap each job in a task and create a multi-task pipeline. Data Flow gives you the status of each execution.
You can also expose your monitoring data by pushing it as metrics to InfluxDB, for instance.
We have a requirement to run many async background processes that access DBs, Kafka queues, etc. As of now, we are using Spring Batch with Tomcat (exploded WAR) for this. However, we are facing certain issues that I'm unable to solve using Spring Batch. I was thinking of using another framework, but couldn't find one that solves all my problems.
It would be great to know if there exists a framework which solves the following problems:
Since Spring Batch runs inside one Tomcat container (1 java process), any small update in any job/step will result in restarting the Tomcat server. This results in hard-stopping of all running jobs, resulting in incomplete/stale data.
WHAT I WANT: Bundle all the JARs and run each job as a separate process. The framework should store the PID and be able to manage (stop/force-kill) the job on demand. This way, when we want to update a JAR, the existing process won't be disturbed (although we should be able to stop it from the UI), and no other job (running or not) will be touched.
I have looked at hot-update of JARs in Tomcat, but I'm skeptical whether to use such a mechanism in production.
Sub-question: Will OSGi integrate with Spring Batch? If so, is it possible to run each job as a separate container with all JARs embedded in it?
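To make the one-process-per-job requirement concrete, here is a minimal sketch using only the JDK process API (`ProcessBuilder`, `Process.pid()` from Java 9 on, `destroy()`, `destroyForcibly()`). The class name `JobProcessManager` and its methods are hypothetical and not part of Spring Batch or any framework; a real manager would also persist the PIDs so that it survives its own restart.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

class JobProcessManager {
    // Job name -> OS process running that job.
    private final Map<String, Process> running = new ConcurrentHashMap<>();

    // Launches the job as its own OS process (e.g. "java -jar my-job.jar") and returns its PID.
    long start(String jobName, List<String> command) {
        try {
            Process process = new ProcessBuilder(command).inheritIO().start();
            running.put(jobName, process);
            return process.pid();
        } catch (IOException e) {
            throw new UncheckedIOException("could not start " + jobName, e);
        }
    }

    // Asks the process to terminate; force-kills it if it is still alive after the grace period.
    // Returns false if no process is tracked under that name.
    boolean stop(String jobName, long graceSeconds) {
        Process process = running.remove(jobName);
        if (process == null) {
            return false;
        }
        process.destroy();
        try {
            if (!process.waitFor(graceSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                process.waitFor();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            process.destroyForcibly();
        }
        return true;
    }

    boolean isRunning(String jobName) {
        Process process = running.get(jobName);
        return process != null && process.isAlive();
    }
}
```

Because each job lives in its own JVM, replacing one job's JAR only affects the next launch of that job; processes that are already running keep going until they finish or are stopped explicitly.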
Spring Batch doesn't have a master-slave architecture.
WHAT I WANT: There should be a master, where the list of jobs is specified. There should be slave machines (workers), which are declared to the master in a configuration file. The master should contain a scheduler which, when a job needs to start, assigns it to a slave (possibly load-balanced, but not necessarily), and the slave should update the DB. The master should be able to send data to and receive data from the slaves (start/stop/kill any job, report on running jobs, etc.) so that it can be displayed on a UI.
This way, in case I have a high load, I should be able to just add machines into the cluster and modify the master configuration file and the load should get balanced right away.
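The assignment part of that wish list can be sketched in a few lines. This is only an illustration under stated assumptions: the `Master` class is hypothetical, workers are plain strings instead of remote machines, and "load" simply means the number of jobs currently assigned. It picks the least-loaded worker, so a newly added machine starts receiving jobs right away.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class Master {
    // Worker name -> number of jobs currently assigned to it (insertion order is kept).
    private final Map<String, Integer> load = new LinkedHashMap<>();
    // Job name -> worker it was assigned to.
    private final Map<String, String> assignment = new LinkedHashMap<>();

    // In the setup described above, this would be driven by the master's configuration file.
    void addWorker(String worker) {
        load.putIfAbsent(worker, 0);
    }

    // Assigns the job to the least-loaded worker; ties go to the worker registered first.
    String assign(String job) {
        if (load.isEmpty()) {
            throw new IllegalStateException("no workers registered");
        }
        String chosen = null;
        for (Map.Entry<String, Integer> entry : load.entrySet()) {
            if (chosen == null || entry.getValue() < load.get(chosen)) {
                chosen = entry.getKey();
            }
        }
        load.merge(chosen, 1, Integer::sum);
        assignment.put(job, chosen);
        return chosen;
    }

    // Called when a worker reports that the job has finished or was killed.
    void complete(String job) {
        String worker = assignment.remove(job);
        if (worker != null) {
            load.merge(worker, -1, Integer::sum);
        }
    }

    String workerOf(String job) {
        return assignment.get(job);
    }
}
```

The transport between master and workers (start/stop/kill commands, status updates) is deliberately left out; it could be a message broker or plain HTTP.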
Spring Batch doesn't have a built-in alerting mechanism for job stalls or failures.
WHAT I WANT: I should be able to set up alerts for jobs in case of failure. If necessary, a job should have a timeout, after which it either notifies the user (probably via email) or is force-stopped once it crosses a specified threshold.
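The timeout-and-alert behaviour described in the last item can be approximated with the JDK alone. The sketch below is an assumption-laden illustration, not a Spring Batch feature: `JobWatchdog` is a hypothetical name, the alert is just a callback (it could send an email), and "force stop" is a thread interrupt, so it only works for jobs that react to interruption.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

class JobWatchdog {
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Consumer<String> alert; // hypothetical hook, e.g. an email sender

    JobWatchdog(Consumer<String> alert) {
        this.alert = alert;
    }

    // Runs the job and waits up to timeoutMillis for it.
    // Returns true on success; on failure or timeout it raises an alert and returns false.
    boolean run(String jobName, Runnable job, long timeoutMillis) {
        Future<?> future = pool.submit(job);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the job thread; the job must honor interruption
            alert.accept(jobName + " exceeded " + timeoutMillis + " ms and was stopped");
        } catch (ExecutionException e) {
            alert.accept(jobName + " failed: " + e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
        }
        return false;
    }

    void shutdown() {
        pool.shutdownNow();
    }
}
```

A job that ignores interrupts cannot be stopped this way; for a hard kill, running the job as a separate OS process is the more reliable option.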
Maybe Vert.x can do the trick.
Since Spring Batch runs inside one Tomcat container (1 java process), any small update in any job/step will result in restarting the Tomcat server. This results in hard-stopping of all running jobs, resulting in incomplete/stale data.
Vert.x allows you to build microservices. Each Vert.x instance can communicate with the other instances. If you stop one, the others can keep working, as long as they do not depend on it (e.g. if you stop the master, the slaves will fail).
Vert.x is not an application server.
There's no monolithic Vert.x instance into which you deploy applications.
You just run your apps wherever you want to.
Spring batch doesn't have a master-slave architecture
Since Vert.x is event-driven, you can easily create a master-slave architecture. For example, handle the HTTP requests in one Vert.x instance and dispatch them to several other instances depending on the nature of the request.
Spring batch doesn't have an in-built alerting mechanism in case of job stall/failure.
In Vert.x, you can set a timeout for each message and handle failures.
Sending with timeouts
When sending a message with a reply handler you can specify a timeout in the DeliveryOptions.
If a reply is not received within that time, the reply handler will be called with a failure.
The default timeout is 30 seconds.
Send Failures
Message sends can fail for other reasons, including:
There are no handlers available to send the message to
The recipient has explicitly failed the message using fail
In all cases the reply handler will be called with the specific failure.
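The behaviour quoted above (a reply handler that receives either the reply or a failure, including a timeout) can be illustrated without Vert.x. The sketch below is a plain-JDK stand-in built on `CompletableFuture.orTimeout` (Java 9+); it is not the Vert.x API, and `RequestReply.send` is a hypothetical helper.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;

class RequestReply {
    // pendingReply is completed by whoever answers the message.
    // The returned future always completes normally with a description of the outcome:
    // either the reply body or the type of failure (timeout, explicit failure, ...).
    static CompletableFuture<String> send(CompletableFuture<String> pendingReply, long timeoutMillis) {
        return pendingReply
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                .handle((body, failure) -> failure == null
                        ? "reply: " + body
                        : "failed: " + unwrap(failure).getClass().getSimpleName());
    }

    // CompletableFuture may wrap the original exception in a CompletionException.
    private static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }
}
```

As in the quoted description, the caller handles every outcome in one place: a reply that arrives in time, a reply that never arrives, and a message the recipient explicitly failed.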
EDIT: There are other frameworks for building microservices in Java. Dropwizard is one of them, but I can't say much more about it.
I am trying to determine the best way to handle long-running batch jobs in Spring MVC. In my searching I came across Akka, a non-blocking framework for async processing, which is attractive because I don't want the batch processing to eat up all the threads in the thread pool.
Essentially what I will be doing is have a job that needs to run on some set schedule that will go out and call various web services, process the data, and persist it.
I have seen some code examples that use it with Spring, but I've never seen it used with a cron-type scheduler. It always seems to use a fixed time period.
I'm not sure if this is even the best approach to handling large scale batch processing within Spring. Any suggestions or links to good Akka Spring resources are welcome.
I would suggest looking into the Spring Integration and Spring Batch projects. The first one lets you configure chains of services using EIP (Enterprise Integration Patterns). We used it in our project to fetch files from FTP, deserialize and process them, import them into the DB, send emails if required, etc., all on a schedule. The second one is more straightforward and basically provides a framework for working on rows of data. Both can be scheduled with Quartz and integrate nicely into a Spring MVC project.
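To show what "run at a wall-clock time on a thread that is not a request thread" boils down to, here is a framework-free sketch using only the JDK. It is not Quartz and not Spring's scheduling support; `DailyScheduler` is a hypothetical class, and it covers only the simplest cron-like case (once a day at a given local time).

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class DailyScheduler {
    // Dedicated thread, so the batch work never borrows threads from the web request pool.
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    // Time from 'now' until the next occurrence of 'runAt' (today if still ahead, else tomorrow).
    static Duration delayUntilNext(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    // Schedules the job for the next 'runAt' and then every 24 hours.
    // Note: a fixed 24-hour period drifts by an hour across daylight-saving changes.
    void scheduleDaily(LocalTime runAt, Runnable job) {
        long initialDelay = delayUntilNext(LocalDateTime.now(), runAt).toMillis();
        scheduler.scheduleAtFixedRate(job, initialDelay, TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

For full cron expressions (weekdays only, several times a day, and so on), a scheduler such as Quartz is the better fit than hand-rolled delay arithmetic.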
I was wondering if someone could point me to a good tutorial or blog post on writing a Spring application that runs entirely in a single process for local integration testing but, when deployed, puts different subsystems into different processes/dynos on Heroku.
For example, I have services for user management, job processing, etc., all in my web application. I want to run it as a plain web application locally. But when I deploy to Heroku, I want to deploy just the stateless web front end to TWO dynos and then have worker dynos on which I can choose which services to run. I may decide to group two of these services into one process or decide that each should run in its own process. Obviously, when the services run in their own processes they will need some kind of transport added transparently, such as REST, RabbitMQ, or Akka.
Any pointers on where to start looking to learn how to do this? Or am I thinking about this incorrectly, and would you suggest a different approach? I need to figure out how to set up the application and also how to configure Maven and IntelliJ to achieve this.
Thanks.
I can't point you to a prefabricated article or post, but I can share the direction I started down to solve a similar problem. Essentially, the proposed approach was similar to yours - put specific services with potentially long-running logic in worker dynos and pass messages via Jesque (Java port of Resque) on a RedisToGo instance (Heroku add-on). I never got the separate web vs. worker Spring contexts fully ironed out (moved on to other priorities) but the gist of it was 1) web tier app context would be configured to post messages and 2) worker app context configured to consume.
That said, I used foreman locally to simulate the Heroku environment to debug scaling (foreman start --formation="web=2" + Apache mod_proxy_http). Big Spring gotcha when you scale to 2+ dynos - make sure you are using Redis or Memcache for session storage when using webapp-runner. Spring uses HttpSession by default to store the security context... no session affinity or native Tomcat session replication.
Final caveat - in our case, none of our worker processing needed to be reflected to the end user. That said, we were using Pusher for other features (also a Heroku add-on). If you need to update the user when an async task completes, I recommend looking at it.
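The web vs. worker split described in this answer can be reduced to one interface with two implementations, chosen at startup. The sketch below is an illustration under assumptions: the in-memory queue stands in for Redis/Jesque or RabbitMQ, and the names `JobSubmitter`, `InProcessSubmitter`, `QueueSubmitter`, `Wiring`, and the `"split"` mode value are all hypothetical. In a Spring application the same choice would typically be made with profiles or separate application contexts.

```java
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// What the web tier calls; it does not know where the work actually runs.
interface JobSubmitter {
    void submit(String payload);
}

// Local/integration-test mode: the job runs in the same process, right away.
class InProcessSubmitter implements JobSubmitter {
    private final Consumer<String> handler;

    InProcessSubmitter(Consumer<String> handler) {
        this.handler = handler;
    }

    @Override
    public void submit(String payload) {
        handler.accept(payload);
    }
}

// Deployed mode: the web process only enqueues; a worker process consumes later.
class QueueSubmitter implements JobSubmitter {
    private final BlockingQueue<String> queue;

    QueueSubmitter(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public void submit(String payload) {
        queue.add(payload);
    }
}

class Wiring {
    // 'mode' would come from configuration, e.g. an environment variable set per process type.
    static JobSubmitter forMode(String mode, Consumer<String> handler, BlockingQueue<String> queue) {
        return "split".equals(mode) ? new QueueSubmitter(queue) : new InProcessSubmitter(handler);
    }
}
```

The service code stays identical in both modes; only the wiring decides whether a call is executed locally or turned into a message for a worker.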
In my Java-based application, I need a job that reads data from a set of tables and inserts it into another table. In my first design, I created an Oracle job and scheduled it to run frequently.
Unfortunately, when the job fails, there is not enough information available about the root cause of the failure. In addition, deploying the system to many instances has made the work harder.
As an alternative, I am trying to move the job into my application server, as a WebLogic job. Is this a good design or not?
Having moved my jobs into the application server, I have seen the following advantages:
Tracking the job failure is easier.
Non-DBA users can easily read the application server logs and fix the issues. (Many users do not have access to the DB in production.)
The logic of the job has moved from my data access layer into my business logic layer, which fits accepted design patterns better.