I have a shell script which I'd like to trigger from a J2EE web app.
The script does lots of things - processing, FTPing, etc - it's a legacy thing.
It takes a long time to run.
I'm wondering what is the best approach to this. I want a user to be able to click on a link, trigger the script, and display a message to the user saying that the script has started. I'd like the HTTP request/response cycle to be instantaneous, irrespective of the fact that my script takes a long time to run.
I can think of three options:
Spawn a new thread during the processing of the user's click. However, I don't think this is compliant with the J2EE spec.
Send some output down the HTTP response stream and commit it before triggering the script. This gives the illusion that the HTTP request/response cycle has finished, but actually the thread processing the request is still sat there waiting for the shell script to finish. So I've basically hijacked the container's HTTP processing thread for my own purpose.
Create a wrapper script which starts my main script in the background. This would let the request/response cycle finish normally in the container.
All the above would be using a servlet and Runtime.getRuntime().exec().
This is running on Solaris using Oracle's OC4J app server, on Java 1.4.2.
Please does anyone have any opinions on which is the least hacky solution and why?
Or does anyone have a better approach? We've got Quartz available, but we don't want to have to reimplement the shell script as a Java process.
Thanks.
You mentioned Quartz so let's go for an option #4 (which is IMO the best of course):
Use Quartz Scheduler and a org.quartz.jobs.NativeJob
PS: The biggest problem may be to find documentation and this is the best source I've been able to find: How to use NativeJob?
I'd go with option 3, especially if you don't actually need to know when the script finishes (or have some other way of finding out other than waiting for the process to end).
Option 1 wastes a thread that's just going to be sitting around waiting for the script to finish. Option 2 seems like a bad idea. I wouldn't hijack servlet container threads.
Is it necessary for your application to evaluate output from the script you are starting, or is this a simple fire-and-forget job? If it's not required, you can 'abuse' the fact that Runtime.getRuntime().exec() will return immediately with the process continuing to run in the background. If you actually wanted to wait for the script/process to finish, you would have to invoke waitFor() on the Process object returned by exec().
If the process you are starting writes anything to stdout or stderr, be sure to redirect these to either log files or /dev/null. Otherwise the process will block after a while, because stdout and stderr are exposed through the Process object as InputStreams with limited buffering capabilities, and nothing is draining them.
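As a rough sketch of that fire-and-forget approach (the class name DetachedScript and the log path handling are made up for illustration, and it assumes /bin/sh is available, as it would be on Solaris), the redirect can be done by the shell itself so the script never writes into the pipes owned by the JVM:

```java
import java.io.IOException;

public class DetachedScript {
    /**
     * Starts the command in the background via /bin/sh and returns as soon as
     * the shell has been spawned. stdout and stderr of the command go to
     * logPath, so the command can never block on a full pipe buffer.
     * Assumption: logPath contains no single quotes.
     */
    public static Process launch(String command, String logPath) throws IOException {
        // The parentheses make the redirect apply to the whole command line.
        String line = "( " + command + " ) > '" + logPath + "' 2>&1 &";
        Process shell = Runtime.getRuntime().exec(new String[] {"/bin/sh", "-c", line});
        // The shell itself produces no output and reads no input, so release its pipes.
        shell.getOutputStream().close();
        shell.getInputStream().close();
        shell.getErrorStream().close();
        return shell;
    }
}
```

The returned Process is only the short-lived shell, not the script, so calling waitFor() on it tells you the script was launched, not that it finished.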
My approach to this would probably be something like the following:
Set up an ExecutorService within the servlet to perform the actual execution.
Create an implementation of Callable with an appropriate return type, that wraps the actual script execution (using Runtime.exec()) to translate Java input variables to shell script arguments, and the script output to an appropriate Java object.
When a request comes in, create an appropriate Callable object, submit it to the executor service and put the resulting Future somewhere persistent (e.g. user's session, or UID-keyed map returning the key to the user for later lookups, depending on requirements). Then immediately send an HTTP response to the user implying that the script was started OK (including the lookup key if required).
Add some mechanism for the user to poll the progress of their task, returning either a "still running" response, a "failed" response or a "succeeded + result" response depending on the state of the Future that you just looked up.
It's a bit handwavy but depending on how your webapp is structured you can probably fit these general components in somewhere.
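To make the steps above a little less handwavy, here is a minimal sketch of the UID-keyed map variant. ScriptJobs and its method names are invented for this example, and java.util.concurrent needs Java 5 or later (the lambdas and diamond used here need Java 8), so it would not run as-is on 1.4.2:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ScriptJobs {
    private final ExecutorService executor = Executors.newFixedThreadPool(2);
    private final Map<String, Future<String>> jobs = new ConcurrentHashMap<>();

    /** Queues the task and returns a key the client can poll with later. */
    public String submit(Callable<String> task) {
        String key = UUID.randomUUID().toString();
        jobs.put(key, executor.submit(task));
        return key;
    }

    /** Non-blocking status check, suitable for a polling request. */
    public String status(String key) throws InterruptedException {
        Future<String> future = jobs.get(key);
        if (future == null) {
            return "unknown";
        }
        if (!future.isDone()) {
            return "still running";
        }
        try {
            return "succeeded: " + future.get(); // returns at once, the task is done
        } catch (ExecutionException e) {
            return "failed: " + e.getCause();
        }
    }

    public void shutdown() {
        executor.shutdown();
    }
}
```

The servlet handling the click would call submit() with a Callable that wraps Runtime.exec() and return the key in the response; the polling request would call status() with that key.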
If your HTTP response / the user does not need to see the output of the script, or be aware of when the script completes, then your best option is to launch the script via some sort of wrapper script as you mention, so that it runs outside of the servlet container environment altogether. This means you can absolve yourself from needing to manage threads within the container, or hijacking a thread as you mention, etc.
Only if the user needs to be informed of when the script completes and/or monitor the script's output would I consider options 1 or 2.
For the second option, you can use a servlet, and after you've responded to the HTTP request, you can use java.lang.Runtime.exec() to execute your script. I'd also recommend that you look at http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-traps.html for some of the problems and pitfalls of using it.
The most robust solution for asynchronous backend processes is using a message queue IMO. Recently I implemented this using a Spring-embedded ActiveMQ broker, and rigging up a producing and a consuming bean. When a job needs to be started, my code calls the producer, which puts a message on the queue. The consumer is subscribed to the queue and gets kicked into action by the message in a separate thread. This approach neatly separates the UI from the queueing mechanism (via the producer), and from the asynchronous process (handled by the consumer).
Note this was a Java 5, Spring-configured environment running on a Tomcat server on developer machines, and deployed to Weblogic on the test/production machines.
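The shape of that producer/consumer split can be sketched without a broker. The example below is not ActiveMQ or JMS: it uses an in-JVM BlockingQueue as a stand-in, the JobQueue class is invented for illustration, and it is written against Java 8 rather than the Java 5 setup described above.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

public class JobQueue {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Thread consumer;

    /** The handler plays the role of the consuming bean. */
    public JobQueue(Consumer<String> handler) {
        consumer = new Thread(() -> {
            try {
                while (true) {
                    handler.accept(queue.take()); // blocks until a message arrives
                }
            } catch (InterruptedException e) {
                // stop() was called: let the consumer thread end
            }
        }, "job-consumer");
        consumer.setDaemon(true);
        consumer.start();
    }

    /** Producer side: returns immediately, the work happens on the consumer thread. */
    public void send(String message) {
        queue.add(message);
    }

    public void stop() {
        consumer.interrupt();
    }
}
```

Unlike a real broker, this loses queued messages when the JVM stops and does not span machines, which is a large part of what the message-queue approach buys you.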
Your problem stems from the fact that you are trying to go against the 'single response per request' model in J2EE, and have the end-user's page dynamically update as the backend task executes.
Unless you want to go down the route of introducing an Ajax-based solution, you will have to force the rendered page in the user's browser to 'poll' the server for information periodically, until the back-end task completes.
This can be achieved by:
When the J2EE container receives the request, spawn a thread which takes a reference to the session object (which will be used to write the output of your script)
Initialize the response servlet to write an html page which will contain a Javascript function to reload the page from the server at regular intervals (every 10 seconds or so).
On each request, poll the session object to display the output stored by the spawned thread in step 1
[clean-up logic can be added to delete the stored content from the session once the thread completes if needed; you can also set additional flags in the session to mark state transitions in the execution of your script]
This is one way to achieve what you want. It isn't the most elegant of all approaches, but that is essentially because you need to update your page content asynchronously from the server within a request/response model.
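A minimal sketch of the object the spawned thread could write to is below. ScriptOutput is a made-up name; the idea would be to store one instance in the session in step 1 and read it on each reload. It assumes ProcessBuilder (Java 5+) and lambdas (Java 8+) are available.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ScriptOutput {
    private final StringBuffer text = new StringBuffer(); // thread-safe appends
    private volatile boolean finished;

    /** Starts the command plus a background thread that copies its output here. */
    public static ScriptOutput start(String... command) throws IOException {
        // Merge stderr into stdout so a single reader drains both.
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        ScriptOutput out = new ScriptOutput();
        Thread reader = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    out.text.append(line).append('\n');
                }
            } catch (IOException e) {
                out.text.append("[read error: ").append(e.getMessage()).append("]\n");
            } finally {
                out.finished = true; // the flag the polling page checks
            }
        });
        reader.setDaemon(true);
        reader.start();
        return out;
    }

    /** What the script has printed up to now. */
    public String soFar() {
        return text.toString();
    }

    public boolean isFinished() {
        return finished;
    }
}
```

Each page reload would render soFar() and stop refreshing once isFinished() returns true.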
There are other ways to achieve this, but it really depends on how inflexible your constraints are. I have heard of Direct Web Remoting (although I haven't played with it yet), might be worth taking a look at Developing Applications using Reverse-Ajax
It's my first SO question so be patient with me :)
I'm trying to create a service that:
Receives HTTP GET requests containing a URL to query
For a single GET request the service extracts the URL
Queries a local DB about the URL
If a result was found in the DB it will return it to the client and if not it will need to query some external services (that may take relatively long time to respond)
Return the result of the URL to the client
I'm running this on a virtual machine and Tomcat7 with spring.
I'll apologize in advance and mention that I'm pretty new to Tomcat
Anyway, I'm expecting a lot of concurrent GET requests to this service (hundreds of thousands of simultaneous requests)
What I'm basically trying to achieve is to make this service as scalable as possible (and if that's not possible then at least a service that can handle hundreds of thousands of simultaneous requests)
I've been reading A LOT about asynchronous requests handling in services and especially in Tomcat but I have some things that are still unclear to me:
From the official Tomcat website it seems that Tomcat contains a number of acceptor threads and a number of worker threads.
If so, why should I use AsyncContext? What's the benefit of releasing one of Tomcat's worker threads and occupying a different thread in my application to do the exact same actions? (there's still 1 active thread in the system)
Somewhat similar to the first question but are there any benefits for creating the AsyncContext and using it with a different thread? (a thread from a thread pool created in my application)
Regarding the same issue, I've seen here that I can also return a Callable or a DeferredResult and process it with either one of Tomcat's threads or with one of my own threads. Are there any benefits for returning a Callable or using a DeferredResult over just processing the AsyncContext from the requests?
Also, if I decide to return a Callable, from what thread pool does Tomcat get the thread to process my Callable? Are the threads being used here the same worker threads from Tomcat that I previously mentioned? If so, what benefits do I get from releasing one Tomcat worker thread and using a different one instead?
I've seen from Oracle's documentation that I can pass AsyncContext a Runnable object that will be processed concurrently. Where do the threads used to execute this Runnable come from? Do I have any control over it? Also, are there any benefits to passing the AsyncContext a Runnable over just passing the AsyncContext to one of my threads?
I apologize for asking so many questions regarding the same things, but my colleagues and I have been arguing over these things for over a week without any concrete answer.
I have 1 more general question:
What do you think is the best way to make the service I described scalable? (putting aside adding more machines at the moment) Could you post any examples or references for the proposed solution?
I'd post more of the links I've been looking at but my current reputation doesn't allow it.
I'll be grateful for any understandable references or for concrete examples and I'll obviously be happy to clarify on any relevant issue
Cheers!
There are a lot of questions packed into this, but I'll try to address some of them.
Asynchronous I/O is a good thing, especially on servers that serve large volumes of requests - it allows you to use fewer threads to process more requests. In the case of a proxy such as you are writing, you really want your HTTP client (that makes the requests to foreign URLs) to be asynchronous as well, so that neither processing the request nor receiving the remote response involves blocking I/O.
That said, you may have a harder time doing this stuff with Tomcat or Java EE servers in general, which have had asynchronous I/O bolted onto them as an afterthought, than using a framework like Netty that is asynchronous from the ground up. As the author of a framework which builds on top of Netty, I'm a bit biased.
To demonstrate how little code you'd need to do what you describe, I wrote a small server that does what you describe here in 3 Java source files and put it on github - it builds a standalone JAR you can run with java -jar to try it out, and I tried to comment it clearly.
What it comes down to is, networked applications spend most of their time waiting for I/O to happen. In the case of a proxy in particular, with traditional, threaded I/O, you would get a request, and the thread that received the request would be responsible for answering it synchronously. That means, if it has to make a network request to another server, that thread is blocked waiting for the answer to come from the remote server, so it can't be used for anything else. So, if you have 10 threads, and all of them are waiting on responses, your server can't answer any more requests until one of them finishes and frees up a thread.
With asynchronous I/O, you get a callback when some I/O completes. In other words, instead of standing still until the OS flushes your data to the socket and out the network card, your code simply gets a friendly tap on the shoulder when there is something to do (like a response arriving from your proxy request). While your code is waiting for that HTTP request to complete, the thread that sent the proxy request is free to be used to handle another request. That means one thread can do a little work on one request, do a little work on another, and another, and eventually finish the first request. Since threads are a finite resource provided by your operating system, this allows you to do a lot more with a lot less hardware.
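To put a number on that, here is a toy illustration using only the JDK (Java 8+). It is not Netty and does no real networking: a timer on a single-threaded scheduler stands in for the remote server, and OneThreadManyRequests is a made-up name. The point is that one thread can have many "requests" in flight at once because it never blocks waiting for any of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class OneThreadManyRequests {
    /** Simulates count remote calls of delayMs each and returns the replies in order. */
    public static List<String> run(int count, long delayMs) throws Exception {
        // One thread plays the part of the event loop.
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int id = i;
                CompletableFuture<String> reply = new CompletableFuture<>();
                // "Send" the request: nothing blocks, the timer stands in for the network.
                loop.schedule(() -> reply.complete("reply-" + id), delayMs, TimeUnit.MILLISECONDS);
                // The callback: the tap on the shoulder when the reply arrives.
                pending.add(reply.thenApply(String::toUpperCase));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> f : pending) {
                results.add(f.get(10, TimeUnit.SECONDS));
            }
            return results;
        } finally {
            loop.shutdown();
        }
    }
}
```

With 50 simulated calls of 200 ms each, a thread that blocked on every call would need about 10 seconds; here all the waits overlap, so the whole batch should finish in roughly the time of one call.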
As to Callable vs. DeferredResult, using a Callable just moves around when the work happens (the Callable gets executed later, on some thread or other, but is still expected to return a result synchronously); DeferredResult sounds more like what you'd need, since that allows your code to go off and do whatever work it wants, and then set the result (triggering completion of the response) whenever it has something to set.
Honestly, I think if you want to implement this really efficiently, you'd be better off staying away from the Java EE stack - so much of it has baked in assumptions that I/O is synchronous that trying to do async stuff with it is swimming upstream (for example, JDBC has synchronous I/O baked into its bones - if you really want this to scale and you want to use an SQL database, you'd be better off with something like this ).
For another example of using Netty for this sort of thing, see the tiny-maven-proxy project - the code is less pretty, but it shows an example of doing an HTTP proxy where the response body is fed to the client chunk-by-chunk, as it arrives - so you never actually pull the full response body into memory, meaning even requests with huge responses won't run the proxy out of memory. Tiny-maven-proxy also caches on the filesystem. I didn't do those things in the demo because it would have made the code more complicated.
I have a jsp/servlet based web app.
I have a button "Clean Up" which calls a servlet, and the request goes through to a DAO class. The DAO class performs different DB activities, like moving data from the master table to a backup table, then deleting data from the master table, etc.
As of now this activity is Synchronous and user needs to wait until a response is sent.
I want to implement the same scenario as an asynchronous task, with the user just getting a message such as
" Clean Up Activities Triggered"
What could be the best/easiest way to perform this task. I cannot use scheduler.
My Container is TomCat.
The simplest, though different, solution for this could be to use some AJAX behavior on the client side. There are lots of simple/powerful frameworks (JS files) to help you achieve AJAX in your page. Using AJAX, you just submit the request asynchronously and display the client-side message "Clean Up Activities Triggered" while the request is being processed on the server side. If the user waits, the server process will return and display a "success" message; otherwise the user is free to navigate to other pages or perform other actions.
ExecutorService is the most robust solution. Creating a simple thread is enough as well. However, the bigger problem is synchronization. Use a Semaphore to make sure two users aren't cleaning up simultaneously.
We did this for our project once and it worked pretty well.
We sent the 200 OK to the user as long as there were no issues processing the request, and we used the Java ExecutorService to do the cleanup.
In case something went wrong, we notified the user separately.
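A minimal sketch of that combination, assuming the DAO work is handed in as a Runnable (CleanupService and trigger() are names invented for this example):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class CleanupService {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Semaphore permit = new Semaphore(1); // at most one clean-up at a time

    /**
     * Returns true if the clean-up was started, false if one is already running.
     * Either way the caller gets its answer immediately.
     */
    public boolean trigger(Runnable cleanup) {
        if (!permit.tryAcquire()) {
            return false;
        }
        try {
            executor.execute(() -> {
                try {
                    cleanup.run();
                } finally {
                    // Released on the worker thread; Semaphore has no owner, so this is allowed.
                    permit.release();
                }
            });
        } catch (RuntimeException e) {
            permit.release(); // e.g. RejectedExecutionException after shutdown()
            throw e;
        }
        return true;
    }

    public void shutdown() {
        executor.shutdown();
    }
}
```

The servlet can then answer "Clean Up Activities Triggered" when trigger() returns true and something like "already in progress" when it returns false. In a webapp the instance would typically be created and shut down from a ServletContextListener so the worker thread does not outlive the application.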
I have a servlet that takes a couple of minutes to process and return its response. It is running in a somewhat restricted environment (Amazon Elastic Beanstalk). In this environment, there is a 60 second limit on request times and that is not configurable.
What are my options here? I thought of having the servlet start a thread and have the browser poll with AJAX, but I have seen so many people recommend against servlets starting threads for various reasons.
Another solution would be to have a thread start and end in the application's context listener, but I have many different servlets in the app that perform various functions, all of which have the same issue. A single thread running in the background would not really help.
Any suggestions?
Edit: With a little bit of more research in SO, I found that an Executor
is what I need.
See BalusC's answer here
See skaffman's answer here
Yes, it is not best practice to start threads programmatically inside a servlet container. But this restriction is not so strict. IMHO you can do it if you really need to. But if you go for such a solution, implement it step by step.
First just check that it works. Open a new thread to process your long request. While it is being processed, send some kind of "keep-alive" from the "main" thread of your servlet. When processing is done, send the response to the client.
Probably a better and more scalable solution is to use messaging (e.g. JMS) for asynchronous processing of long requests. When a request is received, the servlet should just create a JMS message, enqueue it and immediately return. The other side (which implements MessageListener) should process the message and put the result into an outgoing queue. The client should request the result from this queue. This is the clean solution, and it will work in clustered and multi-machine environments, but it requires more effort.
So, your choice should depend on your requirements, resources and time.
The best way to address this is using the Executor (see the update in my question). I used this in my project and it has worked seamlessly.
If I write the Comet push in PHP but run this code on a Java server via Quercus, will that solve the one-process-per-request problem that Apache had, and scale well with lots of users using my chat?
Yes, Quercus solves the one process per request Apache bottleneck. However, you need to understand the possible bottlenecks of the JVM. In my opinion, though, you should write the service or app in C/C++ using something like libevent, in Erlang, in Google Go, or simply as a Java servlet simply for portability's sake.
Well, Quercus runs on the (J)VM so it can run with other code that can start threads. But why do you need threads to do chat? You simply set the timeout on a vanilla PHP request to 0 (no timeout) and wait for there to be something to send back to the user.
That something else will be in response to someone else's request (ie A says "hello" which interrupts B's wait for something to happen). That doesn't require multithreading.
Also you could keep using Apache/PHP and do the above and instead connect to a Java (or other) service via something like XML RPC, which could wait forever. That server could run multiple threads or do whatever it needs to.
We have a string processing service (C++, uses stdin/stdout for input/output) that has different layouts. Each layout runs separately (eventually they will run on separate machines) and takes time to load, which is why it must keep running after the first run.
I must implement a system with client that will ask the master server to connect it to a relevant slave server which actually runs the relevant layout service. The slave server will communicate the data passed from the client to the service, and when finished will become available on the master server for other clients.
The question is what is the best way to go about implementing the servers? Should I keep an open connection between slave/master until the process is complete to notify the master that the connection is over or keep some sort of var in a synchronized function to check that?
Any other important inputs (or other designs) I have overlooked are also very welcomed, Thanx!
Assuming you can't replace the C++ stuff, here is how I would do it off the top of my head.
I would setup one master server. That server would run a process that accepts requests (probably by HTTP, so it'd be a webservice) and I would have it read the request, parse out what it is, and then call the correct slave. Basically it acts as a proxy. Once it receives the response from the slave it forwards it back to the caller. The simplicity here means that if you start getting more of one type of request, you can set up additional servers for that and round-robin requests to them.
The slaves would be webservices that open the C++ program and forward input and retrieve output. That's all it would do.
I wouldn't bother keeping open connections (except between the slave and the C++ program based on your description). Just using a web request for this stuff will keep the connection between the master and the slave open during the process, but it shouldn't be a problem. This way you don't need to worry about this detail.
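For the slave side, a rough sketch of a wrapper that keeps the C++ process alive between requests might look like the following. LayoutWorker is a made-up name, and the protocol is an assumption the question doesn't state: one line of input produces exactly one line of output, and the service flushes stdout after each reply.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class LayoutWorker implements AutoCloseable {
    private final Process process;
    private final BufferedWriter toService;
    private final BufferedReader fromService;

    /** Starts the service once; the same process then serves every request. */
    public LayoutWorker(String... command) throws IOException {
        process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // keep stderr from filling a pipe
                .start();
        toService = new BufferedWriter(new OutputStreamWriter(process.getOutputStream()));
        fromService = new BufferedReader(new InputStreamReader(process.getInputStream()));
    }

    /** Sends one line and waits for one line back; synchronized so requests never interleave. */
    public synchronized String request(String line) throws IOException {
        toService.write(line);
        toService.newLine();
        toService.flush();
        String reply = fromService.readLine();
        if (reply == null) {
            throw new IOException("service exited");
        }
        return reply;
    }

    @Override
    public void close() throws IOException {
        toService.close(); // the service sees EOF on stdin
        process.destroy();
    }
}
```

The slave's web service handler would hold one LayoutWorker per layout and call request() for each incoming call; because request() is synchronized, the slave is naturally "busy" for the duration of a call, which is the state the master needs to track.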
Now if I were you I would seriously look at reimplementing the C++ code in Java or calling it via JNI or something. If you can avoid it, I think avoiding the Java wrapper around C++ thing would be a good design goal. The Java could do whatever expensive process it is during start up once, and then hold things ready in memory like the C++ code does.
I hope this helps.
Depending on your scalability needs, you may want to take a look at the Java NIO package. This will give you a starting point to build a scalable, non-blocking server implementation.
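As a starting point only, here is a small single-threaded echo server built on java.nio selectors. NioEchoServer is a made-up name and echoing bytes back stands in for whatever the real master/slave protocol would be; a real server would also need write buffering for slow clients.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedSelectorException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class NioEchoServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;

    /** Binds to localhost; port 0 asks the OS for any free port. */
    public NioEchoServer(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.bind(new InetSocketAddress("127.0.0.1", port));
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() throws IOException {
        return ((InetSocketAddress) server.getLocalAddress()).getPort();
    }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(4096);
        try {
            while (selector.isOpen()) {
                selector.select(); // the only place this thread waits
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        echo((SocketChannel) key.channel(), buf);
                    }
                }
            }
        } catch (IOException | ClosedSelectorException e) {
            // stop() was called or the selector failed: end the loop
        }
    }

    private void echo(SocketChannel client, ByteBuffer buf) {
        try {
            buf.clear();
            if (client.read(buf) < 0) { // client closed the connection
                client.close();
                return;
            }
            buf.flip();
            while (buf.hasRemaining()) {
                client.write(buf); // simplification: spins if the client reads slowly
            }
        } catch (IOException e) {
            try { client.close(); } catch (IOException ignored) { }
        }
    }

    public void stop() throws IOException {
        selector.close(); // also wakes up a blocked select()
        server.close();
    }
}
```

One thread services every connection here, so the number of clients is not tied to the number of threads.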