Background tasks in Axis2 - Tomcat stack

Background tasks in Axis2 - Tomcat stack - java

I am working on a Java server side application that needs to provide a SOAP service. For this, we are using Axis2 and deploy in a Tomcat 6 installation.
We have the following issue: we need to run a couple of background threads; one to periodically query another web service for changes in provided data and a second one to monitor and consume data in an MQ.
My question is, what is the best, Java EE, practice to run these background tasks? Should we just run those as background threads that we'll somehow need to tell Tomcat to run at startup? Is there a better way than spawning threads from the web app container?
The system is not large enough to argue breaking it to smaller parts (e.g. run the background tasks in a system deamon with the webservice part being a separate stateless component querying that system deamon). For the same reason we do not have the option to run within a full app server like JBoss (would that make any difference?).
Thanks!
UPDATE:
On a supplementary question, if we just spawned new threads for these tasks (and assuming that this is not common practice), would Tomcat (or Axis) be made more unstable or have any other issues?

I would suggest to use quartz-scheduler for such kind of things. It's simpler than to threads itself and of course more flexible to use. There are interceptors during the start of Tomcat or Axis2 so you can start the scheduler there.

Related

Launch spring batch jobs from control-m

I developed jobs in Spring Batch to replace a data loading process that was previously done using bash scripts.
My company's scheduler of choice is control-m. The old bash scripts were triggered from control-m on file arrival using a file watcher.
For reasons beyond my control, we still need to use control-m. Using Spring Boot or any other framework tied to a webserver is not a possibility.
The safe approach seems to be to package the spring batch application as a jar and trigger from control-m the job using "java -jar", but this doesn't seem the right way considering we have 20+ jobs.
I was wondering if it's possible to trigger the app once (like a deamon) and communicate with it using JMS or any other approach. In this way we wouldn't need to spawn multiple jvms (considering jobs might run simultaneously).
I'm open to different solutions. Feel free to teach me the best way to solve this use case.

The safe approach seems to be to package the spring batch application as a jar and trigger from control-m the job using "java -jar", but this doesn't seem the right way considering we have 20+ jobs.
IMO, running jobs on demand is the way to go, because it is the most efficient way to use resources. Having a JVM running 24/7 and make it run a batch job once a while is a waste of resource as the JVM will be idle between run schedules. If your concern is the packaging side of things, you can package all jobs in a single jar and use the spring boot property spring.batch.job.names at launch time to specify which job to run.
I was wondering if it's possible to trigger the app once (like a deamon) and communicate with it using JMS or any other approach. In this way we wouldn't need to spawn multiple jvms (considering jobs might run simultaneously).
I would recommend to expose a REST endpoint in your JVM and implement a controller that launches batch jobs on demand. You can find an example in the Running Jobs from within a Web Container section. In this case, the job name and its parameters could passed in as request parameters.
Another way is to use a combination between Spring Batch and Spring Integration to launch jobs using JMS requests (JobLaunchRequest). This approach is explained in details with code examples in the Launching Batch Jobs through Messages.

In addition to the helpful answer from Mahmoud, Control-M isn't great with daemons. Sure, you can launch a daemon but anything running for a substantial length of time (i.e. into several weeks and beyond) is prone to error and you can often end up with daemons that are running that Control-M is no longer "aware" of (e.g. if the system has issues that cause the Control-M Agent to assume the daemon job has failed and then launches another one).
When I had no other methods available I used to add daemons as batch jobs in Control-M but there was an overhead in additional jobs that checked for multiple daemons, did the stop/starts and various housekeeping tasks. Best avoided if possible.

Ruby on rails with jRuby

I'm working on a Ruby on Rails app, currently hosted on Heroku.
We have about 5 web dynos and about 2 worker process running on average. But because we're using adeptscale these can change a lot, and the cost is increasing from month to month.
We're thinking about changing the process and the infrastructure (using our own, off of amazon/google etc). And also because of the performance, access to java libraries and other gains we're planning to go with jRuby.
I haven't got much experience with jRuby at all, but I do have Java experience. So I have a few questions:
Question intro: Since rails philosophy/approach differs from Javas, i.e ruby webserver uses far less memory but can only process one request at a time, and so having multiple servers sort of compensates the inability to process multiple requests.
If we go with jRuby (and have our rails project packaged as a war file and deployed to any servlet container i.e Tomcat or Jboss(more than just container)), will we be able to process multiple requests then?
Question intro: Currently we got some application logic running in the workers(instead of blocking the webserver, and not being able to serve other clients/browser clients). i.e when users submit some form and then our app needs to contact the 3rd party service to return the response, we simply let the worker do the workload of getting back from the 3rd party service and updating the ui (which reports waiting status) via websockets that the 3rd party service returned x/y or whatever status.
If we switch to jRuby, how will we achieve the similar logic? I mean do we go with the java code which has some kind of thread pool of workers and then free workers do the workload of contacting the 3rd party service etc? How would we go about this if we decide to go with jRuby?

1) You can serve multiple requests at a time in jruby with nearly any container, but you can also serve multiple requests at a time with mri-ruby. You only have to have a threadsafe app (config.threadsafe! is default in rails4). Different rack servers have different approaches to serve multiple requests at a time. For example unicorn uses multiple processes while passenger or puma go for a multi-threaded approach.
In my experience jruby containers like jboss or tomcat are more complicated to configure properly. But there are things like tourquebox, trinidad that help you with this. But you can even still go for some of the ruby servers (e.g. puma) that dont use c extensions.
2) If I understand you correctly you are looking for some background-processing library? You can use sidekiq or resque with ruby or jruby (while jruby will be faster in general, and its easier to debug memory leaks). You can even use ruby for your rack servers and jruby for your workers (can even be run in parallel with things like rvm/rbenv)
In general I would only go for the jruby option if you know what you are doing and need better performance for your app servers or if you want to speed up your worker servers. If I was you I would probably stay in the ruby world and use puma for your app and sidekiq as a background service. Both are very elegant and need not so much configuration.

Yes, JRuby uses Java threads and is really multithreaded. And I can say that it's really good in integration with Java, even using classes for JNI.
I can recommend next servers (some have already been mentioned):
puma (https://github.com/puma/puma)
any servlet container (even IBM WebSphere Application Server!) - just use warbler (https://github.com/jruby/warbler)
The 'simplest' way to run application on servlet container is make .war with warbler. Usually resulting .war file includes all dependencies and JRuby interpreter, so resulting file usually is 30 Mb. But I think that it is not so easy to setup warbler, then I wouldn't recommend this way if you don't really need to run Rails in enterprise Java environment.
And I would just remind that Rails opens DB connection for any request, then default size of DB connection pool of 5 isn't enough - don't forget to increase it before load testing :) (e.g. default thread pool for puma is 16, IBM WAS is 50, Tomcat - 200 threads).
I agree with smallbutton.com that puma is good choice. Finally, with puma you can switch between JRuby and other interpreter almost easy (in my experience there is one difference - gem's names)

Quartz Scheduler - to run in Tomcat or application jar?

We have a web application that receives incoming data via RESTful web services running on Jersey/Tomcat/Apache/PostgreSQL. Separately from this web-service application, we have a number of repeating and scheduled tasks that need to be carried out. For example, purging different types of data at different intervals, pulling data from external systems on varying schedules, and generating reports on specified days and times.
So, after reading up on Quartz Scheduler, I see that it seems like a great fit.
My question is: should I design my Quartz-based scheduling application to run in Tomcat (via QuartzInitializerListener), or build it into a standalone application to run as a linux daemon (e.g., via Apache Commons Daemon or the Tanuk Java Service Wrapper).
On the one hand, it strikes me as counterintuitive to use Tomcat to host an application that is not geared towards receiving http calls. On the other hand, I haven't used Apache Commons Daemon or the Java Service Wrapper before, so maybe running inside Tomcat is the path of least resistance.
Are there any significant benefits or dangers with either approach that I should be aware of? Our core modules already take care of data access, logging, etc., so I don't see that those services are much of a factor either way.
Our scheduling will be data driven, so our Quartz-based scheduler will read the relevant data from PostgreSQL. However, if we run the scheduling application within Tomcat, is it possible/reasonable to send messages to our application via http calls to Tomcat? Finally, fwiw, since our jobs will be driven by our existing application data, I don't see any need for the Quartz JDBCJobStore.

To run a Java standalone application as linux daemon, simply end the java-command with an & -sign so that it runs in the background and put it in an Upstart-script for example.
As for the design: in this case I would go for whatever is easier to maintain. And it looks like running an app in Tomcat is already familiar. One benefit that comes to mind is that configuration files (for the database for example) can be shared/re-used so that only one set of configuration files needs to be maintained.
However, if you think the scheduled tasks can have a significant impact on resource usage, then you might want to run the tasks on a separate (virtual) machine. Since the timing of the tasks is data driven, it is hard to predict the exact load. E.g. it could happen that all the different tasks are executed at the same time (worst case/highest load scenario). Also consider the complexity of the software for the scheduled tasks and the related risk of nasty bugs: if you think there is a low chance of nasty bugs, then running the tasks in Tomcat next to the web-service is a good option, if not, run the tasks as a separate application. Lastly, consider the infrastructure in general: production line systems (providing (a continuous flow of) data processing critical to business) should be separate from non-production line systems. E.g. if the reports are created an hour later than usual and the business is largely unaffected, then this is non-production line. But if the web-service goes down and business is (immediatly) affected, then this is production line. Purging data and pulling updates is a bit gray: depends on what happens if these tasks are not performed, or later.

Decouple web services from other backend heavy computing service in Java

Background of the web application:
I am using java/spring-mvc/tomcat to provide my web service as well as exposing my restful API to mobile clients. I am happy with everything on the web surface right now. The problem is that my application has a really heavy computing process at its core, which invokes a separate Java program to process the images and return computed data back to the web service.
It sometime eats up lots of my EC2 instance memory, or causes an exception that shuts down my Tomcat7 server.
Question:
Right now everything is running under same tomcat7 container, and I am seeking a solution to decouple those two so that I can install them in different server, perhaps find a high memory server for computing program alone.
What are the options out there that allow me to decouple them and improve scalability and stability?
Update:
I can invoke computing engine programmatically or from command line.
Update2:
I have done some researches based on the answer. When I read on another post about What exactly is Apache Camel?, I feel I should probably learn a little more about EIP patterns. Hopefully, it is not overkill.
Solution based on suggestion
After reading through the EIP concept, camel in action, activemq, I finally come up with a solution. It might not be elegant, but it's working. Suggestion and comments would be appreciated!
I wrote a queue router based on apache-camel , connecting to activemq broker and running as standalone program in one server. The computing engine running in standalone container and the router is responsible to process jms requestor from my spring container in web server. Later on I just need to config load balance for computing engine from camel if further intensive computing is needed.

The one which are pointing right now is adding more hardware. You need to think through if this solves your problem. Eg: If you are using a 32 bit JVM there are limitations on how much heap size you can specify. If you are lucky to have a 64 bit JVM them then you will have a bigger room for memory. But there is always the possibility of using too much CPU where your application becomes unresponsive.
I prefer breaking the compute intensive tasks into jobs and work them out in a seperate JVM. Persist your jobs in a datastore/JMS so that they do not get lost. Be careful if you are doing DB updates from those jobs to avoid any locking.

If I understand correctly, it seems you need a load balancer.
Have a load balancer to route to one of multiple instances of your webservice/compute engine. You can achieve this using an esb, routing engine, clustered, master-slave, distributed-cache etc most of them interrelated.
And you can also spin up additional nodes realtime on EC2 based on load.
Else, if the task can be broken, then delegate it to multiple nodes/services. You will need some orchestration mechanism.
There are open source solutions that can address 1 and 2 above.

Does the backend work synchronously? I mean, when the mobile clients requests something do they have to wait for the backend to do a lot of processing?
If yes, you can grow horizontally, putting more worker nodes (backend webapps) and a front Nginx or any balancer. It's the fastest way.
Do you have reutilizable data? if yes, you can use something like memcached.
Hope it helps, if you give us more information I'm pretty sure that we will provide better advice.

quarts or simple pojo

I'm writing an java based app (not web app) and it should be able to run standalone without any container the task it carries are below:
windows scheduler fires off either quartz or simple POKO
pick up file(s) during midnight
import the data into DB
move the files over from original destination to another drive
Now, the dilemma I'm having is I've been reading around and it appears quartz need web container to function.
Is that correct AND what would be most simple and durable solution?

According your question: Quartz does not need a web container, it can be run in any java application. See Quartz Quickstart Guide for how to configure Quartz.
If you use Quartz the windows scheduler shouldn't be necessary, but this implies that your java application is running constantly.
I think Quartz has the advantage, that you can configure your application in one place and do not need to consider os specific scheduling. Further more Quartz makes you independent of the os specific scheduling mechanism.
But: All this advantages are not relevant if your application is not running all the time.
On the other hand if you want it to be a fire and forget like application, that runs, does its work and then quits again, you will be on the safe side to delegate the task of scheduling to the operation system your application runs on.
So, for this specific context I think using the operation system's scheduling mechanism is the better option.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.