Currently I have a Java app (with a half-ported Python version) that runs in the background, holds a queue of jobs (currently read out of a MySQL database), and handles thread sleeping/waking to share resources based on job priority and running time. A front-end PHP script posts jobs to the database, which the service polls at a fixed interval.
This is somewhat inefficient (though nicer than the locking issues of a job file), but I can't help wondering whether there is some way to simplify it.
My thought was for the Java app (and/or Python app) to set up an HTTP service (Jetty?) with a web interface that pushes jobs directly to the queue, cutting out the middleman. Apache is serving other PHP sites, so this would have to run in tandem.
I'm really after some other input, as I'd prefer a background service that is always running. Having cron execute jobs was painful: some jobs run for 20+ hours, so adding new ones meant new PHP (no threading) or Java calls that had to check whether a service with outstanding jobs was already running to add to, rather than starting a new one. At the same time, I'd like a very simple web interface without too much resource wastage.
Thanks for your input.
Deploy a JSP on Tomcat (or similar) that lets the user post job requests to a job-scheduler web service. On the backend, use Quartz Scheduler to manage your jobs, and have the web service simply add jobs to the Quartz queue.
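A minimal sketch of the Quartz half of that setup, assuming Quartz 2.x; ReportJob and the "payload" data key are hypothetical placeholders:

    import org.quartz.*;
    import org.quartz.impl.StdSchedulerFactory;

    public class JobSubmitter {
        private final Scheduler scheduler;

        public JobSubmitter() throws SchedulerException {
            scheduler = StdSchedulerFactory.getDefaultScheduler();
            scheduler.start();
        }

        // Called by the web service when a user posts a job request.
        public void submit(String jobName, String payload) throws SchedulerException {
            JobDetail job = JobBuilder.newJob(ReportJob.class)
                    .withIdentity(jobName, "user-jobs")
                    .usingJobData("payload", payload)
                    .build();
            Trigger trigger = TriggerBuilder.newTrigger().startNow().build();
            scheduler.scheduleJob(job, trigger);
        }
    }

    // Hypothetical job class; Quartz instantiates it and runs execute() on a pool thread.
    class ReportJob implements Job {
        @Override
        public void execute(JobExecutionContext ctx) {
            String payload = ctx.getJobDetail().getJobDataMap().getString("payload");
            // ... do the long-running work here ...
        }
    }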
Related
I have a frontend web tool that interacts with a REST API written in Django. Each API call takes a long time to process and is also CPU/GPU intensive, so I am planning to run only one call at a time and put the rest in a queue. I am not sure whether Celery with Redis would be helpful here, or whether I should stick with a job-queue approach on the Java side.
The tool would be used by multiple users, each with their own jobs, so I need to put the jobs in a queue to be processed one by one asynchronously. Would Celery be helpful here?
We have a requirement to run many async background processes that access DBs, Kafka queues, etc. As of now we are using Spring Batch with Tomcat (exploded WAR) for this. However, we are facing certain issues that I'm unable to solve with Spring Batch. I've been looking at other frameworks, but couldn't find one that solves all of my problems.
It would be great to know if there exists a framework which solves the following problems:
Since Spring Batch runs inside one Tomcat container (one Java process), any small update to any job/step requires restarting the Tomcat server. This hard-stops all running jobs, leaving incomplete/stale data.
WHAT I WANT: Bundle all the jars and run each job as a separate process. The framework should store the PID and should be able to manage (stop/force-kill) the job on demand. This way, when we want to update a JAR, the existing process won't be hindered (however, we should be able to stop the existing process from UI), and no other job (running or not) will also be touched.
I have looked at hot-update of JARs in Tomcat, but I'm skeptical whether to use such a mechanism in production.
Sub-question: Will OSGi integrate with Spring Batch? If so, is it possible to run each job as a separate container with all JARs embedded in it?
Spring Batch doesn't have a master-slave architecture.
WHAT I WANT: There should be a master, where the list of jobs is specified. There should be slave machines (workers), which are specified to the master in a configuration file. There should be a scheduler in the master which, when a job needs to start, assigns it to a slave (possibly load-balanced, but not necessarily), and the slave should update the DB. The master should be able to send data to and receive data from the slaves (start/stop/kill any job, give me an update on running jobs, etc.) so that it can be displayed on a UI.
This way, in case I have a high load, I should be able to just add machines into the cluster and modify the master configuration file and the load should get balanced right away.
Spring Batch doesn't have a built-in alerting mechanism in case of job stall/failure.
WHAT I WANT: I should be able to set up alerts for jobs in case of failure. If necessary, a job should have a timeout, so that it can notify the user (probably via email) or be force-stopped when it crosses the specified threshold.
Maybe Vert.x can do the trick.
Since Spring Batch runs inside one Tomcat container (one Java process), any small update to any job/step requires restarting the Tomcat server. This hard-stops all running jobs, leaving incomplete/stale data.
Vert.x lets you build microservices. Each Vert.x instance can communicate with the other instances, and if you stop one, the others can keep working (provided they are not dependent on it; e.g. if you stop the master, the slaves will fail).
Vert.x is not an application server.
There's no monolithic Vert.x instance into which you deploy applications.
You just run your apps wherever you want to.
Spring Batch doesn't have a master-slave architecture.
Since Vert.x is event driven, you can easily create a master-slave architecture: for example, handle the HTTP requests in one Vert.x instance and dispatch them among several other instances depending on the nature of the request.
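A rough sketch of that idea against the Vert.x 3.x API; the event-bus address and the job parameter are made up:

    import io.vertx.core.AbstractVerticle;

    // Master: accepts HTTP requests and dispatches jobs over the event bus.
    class MasterVerticle extends AbstractVerticle {
        @Override
        public void start() {
            vertx.createHttpServer()
                .requestHandler(req -> {
                    String job = req.getParam("job");            // hypothetical request parameter
                    vertx.eventBus().send("jobs.dispatch", job); // point-to-point: round-robins among consumers
                    req.response().end("queued");
                })
                .listen(8080);
        }
    }

    // Slave/worker: deploy as many instances (or machines) as needed.
    class WorkerVerticle extends AbstractVerticle {
        @Override
        public void start() {
            vertx.eventBus().consumer("jobs.dispatch", msg -> {
                // ... process msg.body() depending on the nature of the request ...
                msg.reply("done");
            });
        }
    }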
Spring Batch doesn't have a built-in alerting mechanism in case of job stall/failure.
In Vert.x, you can set a timeout for each message and handle failures.
Sending with timeouts
When sending a message with a reply handler you can specify a timeout in the DeliveryOptions.
If a reply is not received within that time, the reply handler will be called with a failure.
The default timeout is 30 seconds.
Send Failures
Message sends can fail for other reasons, including:
There are no handlers available to send the message to
The recipient has explicitly failed the message using fail
In all cases the reply handler will be called with the specific failure.
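A short sketch of that timeout handling, again against the Vert.x 3.x API with a made-up address:

    import io.vertx.core.Vertx;
    import io.vertx.core.eventbus.DeliveryOptions;

    public class TimeoutExample {
        public static void main(String[] args) {
            Vertx vertx = Vertx.vertx();
            // Fail the reply handler after 5 seconds instead of the 30-second default.
            DeliveryOptions options = new DeliveryOptions().setSendTimeout(5000);

            vertx.eventBus().send("jobs.dispatch", "payload", options, reply -> {
                if (reply.succeeded()) {
                    System.out.println("Acknowledged: " + reply.result().body());
                } else {
                    // Fires on timeout, when no handler is registered,
                    // or after an explicit msg.fail(...)
                    System.err.println("Send failed: " + reply.cause().getMessage());
                }
            });
        }
    }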
EDIT: There are other frameworks for building microservices in Java. Dropwizard is one of them, but I can't say much more about it.
I am a new user of Spark. I have a web service that allows a user to ask the server to perform a complex data analysis by reading from a database and pushing the results back to the database. I have moved those analyses into various Spark applications. Currently I use spark-submit to deploy these applications.
However, I am curious: when my web server (written in Java) receives a user request, what is considered the "best practice" way to initiate the corresponding Spark application? Spark's documentation seems to suggest using "spark-submit", but I would rather not pipe the command out to a terminal to perform this action. I saw an alternative, Spark-JobServer, which provides a RESTful interface to do exactly this, but my Spark applications are written in either Java or R, which does not seem to interface well with Spark-JobServer.
Is there another best practice for kicking off a Spark application from a (Java) web server and waiting for a status result indicating whether the job succeeded or failed?
Any ideas of what other people are doing to accomplish this would be very helpful! Thanks!
I've had a similar requirement. Here's what I did:
To submit apps, I use the hidden Spark REST Submission API: http://arturmkrtchyan.com/apache-spark-hidden-rest-api
Using this same API you can query the status of a driver or kill your job later.
There's also another hidden UI Json API: http://[master-node]:[master-ui-port]/json/ which exposes all information available on the master UI in JSON format.
Using "Submission API" I submit a driver and using the "Master UI API" I wait until my Driver and App state are RUNNING
The web server can also act as the Spark driver. So it would have a SparkContext instance and contain the code for working with RDDs.
The advantage of this is that the Spark executors are long-lived. You save time by not having to start/stop them all the time. You can cache RDDs between operations.
A disadvantage is that since the executors are running all the time, they take up memory that other processes in the cluster could otherwise use. Another is that you cannot have more than one instance of the web server, since a Spark application cannot have more than one SparkContext.
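A minimal sketch of the embedded-driver setup using Spark's Java API; the cluster URL and the analysis itself are placeholders:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SparkService {
        // One long-lived context for the whole web server; Spark allows only one per application.
        private final JavaSparkContext sc;

        public SparkService() {
            SparkConf conf = new SparkConf()
                    .setAppName("web-analytics")
                    .setMaster("spark://master-node:7077"); // placeholder cluster URL
            sc = new JavaSparkContext(conf);
        }

        // Invoked from a request handler; executors stay warm between calls.
        public long countLines(String path) {
            return sc.textFile(path).count();
        }
    }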
We are using Spark Job-Server and it works fine with Java as well: just build a JAR of the Java code and wrap it with Scala so it works with Spark Job-Server.
I have a scenario question about using the EJB Timer Service.
Use case as follows:
The system should be able to schedule a task that polls our Subversion repository for file changes since a particular timestamp.
The idea is that whenever the scheduled task runs, it executes a command against a particular SVN repository.
For this particular purpose I will not call any external process but will do it the 'pure' Java way, using the SVNKit Java library (http://svnkit.com/).
My only concern is this:
Is it a good idea to use the EJB Timer Service to execute tasks that call external processes? My approach is the 'pure' Java way, but I'm also thinking of other scenarios, such as calling a batch file/command line/external executable directly from the timer service logic.
I worry about the effects on server memory use/performance, etc.
Is this a good idea?
My other thought is to create a 'desktop' application on the server, using client-side technology such as SWT/Swing, that does the polling, and to code the logic there. But that would mean managing two applications: the 'desktop' app that polls and the 'web' user interface that I will create in Glassfish.
I am leaning towards doing everything in the app server of my choice, which is Glassfish.
I have used the EJB Timer Service before, but only for calls against the database, without calling any external service. This scenario just came up, so I'm raising the question here to gather thoughts from those who have experience doing this.
Any thoughts?
In theory, EJBs aren't supposed to depend on external I/O since it interferes with the container/server's management of bean instances, threads, etc.
In practice, this should work if you take precautions. For example:
isolate the function to its own EJB (i.e., a stateless session bean that only handles these timers) to avoid instance pooling issues
use timeouts while waiting for commands, to keep hung processes from tying up all server threads
ensure that you don't schedule timers such that multiple OS commands run simultaneously
Keep in mind that EJB 3.0 timers are persistent (vs EJB 3.1 timers, which have the option of being non-persistent), which means:
They can run on any server in a cluster. If you have multiple machines in your cluster, you need to ensure that they are all capable of running the command.
They survive server restarts. If you schedule a timer to run but the server crashes before it can, it will run when the server restarts. This can cause particular problems for interval timers (all missed timers will fire repeatedly) and if you don't carefully manage existing timers (you can easily create redundant timers).
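As an illustration of those precautions, a sketch of a dedicated singleton bean using a non-persistent EJB 3.1 calendar timer; the polling interval is arbitrary and the SVNKit call is stubbed out:

    import javax.ejb.Lock;
    import javax.ejb.LockType;
    import javax.ejb.Schedule;
    import javax.ejb.Singleton;

    @Singleton
    public class SvnPollingTimer {

        // Non-persistent: missed executions are not replayed after a restart.
        @Schedule(minute = "*/15", hour = "*", persistent = false)
        @Lock(LockType.WRITE) // serialize executions so polls never overlap
        public void pollRepository() {
            // ... call SVNKit here to check the repository for changes
            //     since the last recorded timestamp ...
        }
    }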
The premise is this: For asynchronous job processing I have a homemade framework that:
Stores jobs in database
Has a simple java api to create more jobs and processors for them
Processing can be embedded in a web application or can run by itself on different machines for scaling out
Web UI for monitoring the queue and canceling queue items
I would like to replace this with some ready-made library, because I would expect more robustness from one and I don't want to maintain this. I've been researching the issue and figured you could use JMS for something similar. But I would still have to build a simple Java API, figure out a runtime to host the processing when I want to scale out, and build a monitoring UI. I feel like the only benefit I would get from JMS is not having to do the database stuff.
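For reference, a rough sketch of what the JMS half of such a setup might look like, assuming the JMS 2.0 API and a hypothetical "jobs" queue:

    import javax.jms.ConnectionFactory;
    import javax.jms.JMSContext;

    public class JobQueueJms {

        // Producer side: the web application enqueues a job description.
        public static void enqueue(ConnectionFactory factory, String jobPayload) {
            try (JMSContext ctx = factory.createContext()) {
                ctx.createProducer().send(ctx.createQueue("jobs"), jobPayload);
            }
        }

        // Consumer side: runs on each processing machine; add machines to scale out.
        public static void startWorker(ConnectionFactory factory) {
            JMSContext ctx = factory.createContext();
            ctx.createConsumer(ctx.createQueue("jobs")).setMessageListener(msg -> {
                // ... look up and run the processor for this job ...
            });
        }
    }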
Is there something similar to this that is ready made?
UPDATE
Basically this is the setup I would want to do:
Web application runs in a Servlet container or Application Server
Web application uses a client api to create jobs
X amount of machines process those jobs
Monitor and manage jobs from a UI
You can use Quartz:
http://www.quartz-scheduler.org/
Check out Spring Batch.
Link to the Spring Batch website: http://projects.spring.io/spring-batch/
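For a taste of it, a minimal Spring Batch job in Java config (Spring Batch 4 style; the job/step names and the tasklet body are placeholders):

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.repeat.RepeatStatus;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;

    @Configuration
    @EnableBatchProcessing
    public class JobConfig {

        @Bean
        public Job processJob(JobBuilderFactory jobs, StepBuilderFactory steps) {
            Step step = steps.get("processStep")
                    .tasklet((contribution, chunkContext) -> {
                        // ... do one unit of work here ...
                        return RepeatStatus.FINISHED;
                    })
                    .build();
            return jobs.get("processJob").start(step).build();
        }
    }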