Creating thread for background task servlet - java

The scenario of my problem is:
In my servlet I get a large amount of data from somewhere (not relevant). I have to iterate over all this data and put it in an array, convert it to a JSON object and send it to the client side for viewing. If I do this in a single response it takes a very long time to display the results. Hence, I need to do multithreading.
The created thread needs to keep on adding data to the list while the main thread whenever it gets a request (requests for data keep on coming periodically) sends the present available list.
For instance, on the first request the response sent is: 1 2 3
On the second request: 4 5 6, and so on.
Now I come to the actual problem: I don't know how to do multithreading in a servlet. I have looked through numerous resources and examples, but they have only confused me further. Some examples create threads right in doGet, which I think is very wrong. Some create them in the init() method, but I don't know how I can pass parameters to the thread and get results from it if it is declared in the init method (it cannot be a global variable). Then there are examples of ServletContextListener, but I haven't found anything useful or that makes sense.
Can anyone please guide me to a reliable source, or just give me some sort of pseudocode, to get a solution to my problem? It would be extremely helpful if the answers are in the context of the aforementioned scenario.
Thanks

The created thread needs to keep on adding data to the list while the main thread whenever it gets a request (requests for data keep on coming periodically) sends the present available list.
If I understood you correctly, you want to fetch some data as a background service and have it ready for clients once they request it (sounds like harvesting data).
Well, creating threads in web apps, or in general in anything that runs in a managed environment, is different: a thread you create on your own, without tying it to the container's lifecycle, can cause a memory leak (for example, it keeps running after the application is undeployed).
One good solution is to have a thread pool (either obtained from the container via the context/JNDI, or created manually).
AND it MUST be created in a manageable manner, so that you control it through the environment's lifecycle events.
A ServletContextListener is your friend. Write a context listener class like this:
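package dudes;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

public class dear_daemon implements ServletContextListener, Runnable {
    private ExecutorService the_pool;
    private Thread the_evil;
    private volatile boolean eviling = true;
    /* invoked once, when the context (web app) is initialized */
    public void contextInitialized(ServletContextEvent sce) {
        the_pool = Executors.newFixedThreadPool(4); // pool size is an assumption
        the_evil = new Thread(this, "dear_daemon");
        the_evil.start();
    }
    /* invoked once, when the context is being destroyed */
    public void contextDestroyed(ServletContextEvent sce) {
        eviling = false;          // stop the evil (this) thread first...
        the_evil.interrupt();
        try { the_evil.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        the_pool.shutdownNow();   // ...then destroy the thread pool
    }
    public void run() {
        while (eviling) {
            /* submit Runnable instances that do the data fetching to the_pool */
            try { TimeUnit.SECONDS.sleep(1); } catch (InterruptedException e) { return; } // interval is an assumption
        }
    }
}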
And register the listener in web.xml
<listener>
<listener-class>dudes.dear_daemon</listener-class>
</listener>
Have a class (a Runnable) that does the data fetching, and let the evil thread submit instances of it to the pool; each instance uses one pool thread.
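To connect this to the question's scenario (first request gets 1 2 3, the next gets 4 5 6), here is a minimal sketch of the shared store and the fetching Runnable. It assumes the fetched items are integers and that each client request should receive only what was added since the previous request; it uses a ConcurrentLinkedQueue instead of a plain list so no explicit locking is needed. FetchedDataBuffer and DataFetchTask are hypothetical names. The listener could create one buffer and store it with ServletContext.setAttribute(...), and a servlet would read it back with getAttribute(...), which answers the "how do I pass data to and from the thread" part without a global variable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Shared, thread-safe store: the background fetcher appends, request threads drain.
class FetchedDataBuffer {
    private final ConcurrentLinkedQueue<Integer> items = new ConcurrentLinkedQueue<>();

    // Called by the fetching thread(s) for every new item.
    void add(int item) {
        items.add(item);
    }

    // Called by the servlet on each client request: returns everything fetched
    // since the previous call and removes it from the buffer.
    List<Integer> drainAvailable() {
        List<Integer> batch = new ArrayList<>();
        Integer next;
        while ((next = items.poll()) != null) {
            batch.add(next);
        }
        return batch;
    }
}

// Hypothetical fetch task: here it just "fetches" the numbers from..to.
class DataFetchTask implements Runnable {
    private final FetchedDataBuffer buffer;
    private final int from;
    private final int to;

    DataFetchTask(FetchedDataBuffer buffer, int from, int to) {
        this.buffer = buffer;
        this.from = from;
        this.to = to;
    }

    @Override
    public void run() {
        for (int i = from; i <= to && !Thread.currentThread().isInterrupted(); i++) {
            buffer.add(i); // replace with reads from the real data source
        }
    }
}
```

In doGet, the servlet would call drainAvailable() and serialize the returned list to JSON; an empty list simply means nothing new has been fetched yet.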
The ServletContextListener helps you start up and shut down correctly in response to the container's init and halt events. Doing the same thing with the servlet's init() is possible, but then make sure you also handle halting in the servlet's destroy() method.
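A slightly more careful way to "destroy the thread pool" in contextDestroyed (or destroy()) is the two-phase shutdown pattern shown in the ExecutorService Javadoc: stop accepting work, give running tasks a grace period, then interrupt whatever is left. This is a sketch; the PoolShutdown name and the choice of timeout are assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class PoolShutdown {
    private PoolShutdown() {}

    // Two-phase shutdown: stop accepting work, wait for running tasks,
    // then interrupt whatever is still running. Returns true if the pool terminated.
    static boolean stop(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // no new tasks accepted; queued tasks still run
        try {
            if (!pool.awaitTermination(timeout, unit)) {
                pool.shutdownNow(); // interrupt running tasks, drop queued ones
                return pool.awaitTermination(timeout, unit);
            }
            return true;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

For this to work, the fetching tasks must react to interruption (for example by checking Thread.currentThread().isInterrupted() or by letting blocking calls throw InterruptedException).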
If you go the multithreaded way, make sure everything is thread-safe, since you have one shared place to store data (a list).
If any synchronization is needed (for example, to keep the fetched data in order), make sure you do it right, or you will face deadlocks or low-performance code.
If any IO is needed to get the data (probably), note that Java IO is blocking, so set appropriate read and connection timeouts, or switch to NIO if you can handle its complexity.
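If the data comes from an HTTP source, the JDK's HttpURLConnection lets you set both timeouts before connecting. A minimal sketch, assuming an HTTP data source; the TimedFetch name and the URL used in the usage note are placeholders.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class TimedFetch {
    private TimedFetch() {}

    // Creates (but does not yet connect) an HTTP connection with explicit timeouts,
    // so a stalled remote server cannot block the fetching thread forever.
    static HttpURLConnection open(String url, int connectMillis, int readMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(connectMillis); // max time to establish the connection
        conn.setReadTimeout(readMillis);       // max time to wait for data once connected
        return conn;
    }
}
```

When either limit is exceeded, the blocking call throws java.net.SocketTimeoutException (an IOException), which the fetch task can catch, log, and retry on its next round.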
If applying these changes makes the environment too complex and you would like an alternative, you can simply move the data fetching out of the web profile and run it as an external daemon service or application, which passes the fetched data to the server context by calling one of your CGI scripts/servlets.
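For the external-daemon alternative, the standalone application only needs to POST each batch to a servlet. A sketch using java.net.http (Java 11+), assuming the receiving servlet accepts a JSON body; the endpoint URL and the HarvestPusher name are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Runs outside the web container; pushes harvested data to a servlet over HTTP.
final class HarvestPusher {
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();
    private final URI endpoint; // hypothetical servlet URL

    HarvestPusher(URI endpoint) {
        this.endpoint = endpoint;
    }

    HttpRequest buildRequest(String json) {
        return HttpRequest.newBuilder(endpoint)
                .timeout(Duration.ofSeconds(10))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Sends one batch and returns the HTTP status code the servlet answered with.
    int push(String json) throws IOException, InterruptedException {
        HttpResponse<Void> resp = client.send(buildRequest(json), HttpResponse.BodyHandlers.discarding());
        return resp.statusCode();
    }
}
```

The servlet side then only has to read the request body in doPost and store it, which keeps all long-running threads out of the web container.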

Related

If multiple requests are handled by a server running a single servlet, where do we need to take care of synchronization?

From "How does a single servlet handle multiple requests from client side" I have learned how multiple requests are handled. But then again, why do we need synchronization if all requests are handled separately?
Can you give a real-life example of how shared state works and how a servlet can depend on it? I am not much interested in code; I am looking for an explanation with an example from any portal application. For example, if there is a login page, how is it accessed by n users concurrently?
If more than one request is handled by the server (what I read is that the server makes a thread pool of n threads to serve the requests, and I guess each thread will have its own set of parameters to maintain the session), is there any chance that two or more threads (that is, two or more requests) can collide with each other?
Synchronization is required when multiple threads modify a shared resource.
So, when all your servlets are independent of each other, you don't worry about the fact that they run in parallel.
But if they work on "shared state" somehow (for example, by reading/writing values in some sort of centralized data store), then you have to make sure that things don't go wrong. Of course, the layer at which, and the form in which, you provide the necessary synchronization depends on your exact setup.
Yes, my answer is pretty generic; but so is your question.
Synchronization in Java is only needed if the shared object is mutable. If your shared object is read-only or immutable, you don't need synchronization, even with multiple threads running. The same applies to what the threads do with an object: if all the threads only read its values, you don't need synchronization.
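For the login-page example from the question, an immutable settings object is a typical piece of shared state that needs no locking. This is a hypothetical sketch (LoginSettings and its fields are invented for illustration): all fields are final, there are no setters, and the mutable input map is copied, so once constructed the object can be read by any number of request threads.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Immutable: all fields final, no setters, defensive copy of the mutable input.
// One instance can be read by any number of request threads without locking.
final class LoginSettings {
    private final int maxAttempts;
    private final Map<String, String> messages;

    LoginSettings(int maxAttempts, Map<String, String> messages) {
        this.maxAttempts = maxAttempts;
        // Copy first, so later changes to the caller's map cannot leak in.
        this.messages = Collections.unmodifiableMap(new LinkedHashMap<>(messages));
    }

    int maxAttempts() {
        return maxAttempts;
    }

    String message(String key) {
        return messages.getOrDefault(key, "");
    }
}
```

By contrast, a per-user failed-login counter kept in a servlet field would be mutable shared state and would need synchronization.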
Basically, if your servlet application is multi-threaded, then data associated with the servlet instance will not be thread-safe. The common example given in many textbooks is a hit counter stored as a private field, e.g.:
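import java.io.IOException;
import javax.servlet.GenericServlet;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class YourServlet extends GenericServlet {
    private int counter; // one field, shared by every request thread
    @Override
    public void service(ServletRequest req, ServletResponse res) throws ServletException, IOException {
        //this is not thread safe
        counter++;
    }
}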
This is because the service method of the single Servlet instance is called by multiple threads, one per incoming HTTP request. The increment operator first has to read the current value, add one, and then write the value back. Another thread doing the same operation concurrently may increment the value after the first thread has read it but before the first thread has written it back, resulting in a lost write.
So in this case you should use synchronisation or, even better, the AtomicInteger class, which has been part of Java's concurrency utilities since Java 1.5.
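Both fixes side by side, as a sketch (HitCounters is a hypothetical name): AtomicInteger.incrementAndGet does the read-add-write as one atomic step, and a synchronized method lets only one thread run the increment at a time. Either one keeps every increment under concurrent access.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Two thread-safe replacements for the unsafe "counter++" servlet field.
class HitCounters {
    private final AtomicInteger atomicHits = new AtomicInteger();
    private int lockedHits; // guarded by "this"

    // Lock-free: incrementAndGet performs read-add-write as one atomic step.
    int hitAtomic() {
        return atomicHits.incrementAndGet();
    }

    // Lock-based: only one thread at a time can run the increment.
    synchronized int hitLocked() {
        return ++lockedHits;
    }

    int atomicCount() {
        return atomicHits.get();
    }

    synchronized int lockedCount() {
        return lockedHits;
    }
}
```

In YourServlet, the field would become `private final AtomicInteger counter = new AtomicInteger();` and the body of service() would call `counter.incrementAndGet();`.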

At what point can I trust that data has been saved to mysql?

After I create a new object 'Order', I would like to get its generated ID and put it on an AMQP queue, so that a worker can do other stuff with it. The worker takes the generated ID (message) and looks up the order, but complains that no record exists, even though I just created one. I am trying to figure out whether to: wait some amount of time after I call .persist() before I put the message (generated ID) on the queue (which I don't think is a good idea); have the worker loop over and over until MySQL DOES return a record (which I don't like either); or find a point where I can put the message on the queue once I know the data is safe in MySQL (this sounds best). I'm thinking that it needs to be done outside of any @Transactional method.
The worker that is going to read the data back out of mysql is part of a different system on a different server. So when can I tell the worker that the data is in mysql so that it can get started with its task?
Is it true that once the @Transactional method finishes, the data has been written to MySQL? I am having trouble understanding this.
Thanks a million in advance.
Is it true that once the @Transactional method finishes, the data has been written to MySQL? I am having trouble understanding this.
So first, as Kayamann and Ralf wrote in the comments, it is guaranteed that the data is stored and available to other processes once the transaction commits (ends).
@Transactional methods are easy to understand. When a method is annotated with @Transactional, the container (the application that actually invokes that method) begins a transaction before the method is invoked, and automatically commits it on success or rolls it back on error.
So if we have
@Transactional
public void modify() {
    doSomething();
}
And when you call it somewhere in the code (or it is invoked by the container, e.g. due to some bindings), the actual flow is as follows:
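EntityTransaction tx = entityManager.getTransaction();
tx.begin();
object.modify();   // on an exception the container calls tx.rollback() instead of commit()
tx.commit();       // only after this returns can other connections see the data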
It is that simple. This approach means that transactions are container-managed.
As for your situation: to let your external system know that the transaction has completed, you have to either use a message queue (which you are already using), with a message saying that the transaction for some ID is complete and processing can start, or use a different technology, REST for example.
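The key point is to publish the ID only after commit() has returned. The simplest way is to call the @Transactional method from a non-transactional caller and send the message after it returns. Frameworks can also queue work to run after commit; if the project uses Spring, TransactionSynchronizationManager.registerSynchronization(...) with an afterCommit() callback, or an @TransactionalEventListener (whose default phase is AFTER_COMMIT), does this. The sketch below is a hypothetical toy model of that "after commit" idea, not a real framework API, showing that the message is sent on commit and never sent on rollback.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not a real framework API) of "publish only after commit":
// work registered with afterCommit() runs only if commit() is reached.
final class ToyTransaction {
    private final List<Runnable> afterCommitActions = new ArrayList<>();
    private boolean finished;

    void afterCommit(Runnable action) {
        if (finished) {
            throw new IllegalStateException("transaction already finished");
        }
        afterCommitActions.add(action);
    }

    void commit() {
        finished = true;
        // ... the real database commit would happen here ...
        afterCommitActions.forEach(Runnable::run); // e.g. put the order ID on the AMQP queue
    }

    void rollback() {
        finished = true;
        afterCommitActions.clear(); // nothing is published for rolled-back data
    }
}
```

With this ordering, the worker can never receive an ID whose row is not yet visible in MySQL, so no waiting or polling loop is needed.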
Remote systems can signal each other about various events via queues and REST services, so there is no difference.

Multithread GAE servlets to handle concurrent users

I'd like to multithread my GAE servlets so that the same servlet on the same instance can handle up to 10 (on frontend instance I believe the max # threads is 10) concurrent requests from different users at the same time, timeslicing between each of them.
public class MyServlet extends HttpServlet {
    private Executor executor;
    @Override
    public void doGet(HttpServletRequest request, HttpServletResponse response) {
        if(executor == null) {
            ThreadFactory threadFactory = ThreadManager.currentRequestThreadFactory();
            executor = Executors.newCachedThreadPool(threadFactory);
        }
        MyResult result = executor.submit(new MyTask(request));
        writeResponseAndReturn(response, result);
    }
}
So basically when GAE starts up, the first time it gets a request to this servlet, an Executor is created and then saved. Then each new servlet request uses that executor to spawn a new thread. Obviously everything inside MyTask must be thread-safe.
What I'm concerned about is whether or not this truly does what I'm hoping it does. That is, does this code create a non-blocking servlet that can handle multiple requests from multiple users at the same time? If not, why and what do I need to do to fix it? And, in general, is there anything else that a GAE maestro can spot that is dead wrong? Thanks in advance.
I don't think your code would work.
The doGet method is running in threads managed by the servlet container. When a request comes in, a servlet thread is occupied, and it will not be released until doGet method return. In your code, the executor.submit would return a Future object. To get the actual result you need to invoke get method on the Future object, and it would block until the MyTask finishes its task. Only after that, doGet method returns and new requests can kick in.
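The blocking behaviour described above can be reproduced with plain java.util.concurrent, independent of GAE. In this sketch (SubmitThenGet is a hypothetical name), the calling thread stands in for the servlet's request thread: the work runs on a pool thread, but get() keeps the caller waiting until it is done, so the request thread is occupied exactly as long as if it had done the work itself.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class SubmitThenGet {
    private SubmitThenGet() {}

    // Mirrors the question's doGet: hand the work to a pool, then wait for it.
    // The calling (request) thread is blocked inside get() the whole time,
    // so nothing is gained over calling task.call() directly.
    static <T> T runAndWait(ExecutorService pool, Callable<T> task)
            throws InterruptedException, ExecutionException {
        Future<T> future = pool.submit(task);
        return future.get(); // blocks until the task has finished
    }
}
```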
I am not familiar with GAE, but according to their docs, you can declare your servlet as thread-safe and then the container will dispatch multiple requests to each web server in parallel:
<!-- in appengine-web.xml -->
<threadsafe>true</threadsafe>
You implicitly asked two questions, so let me answer both:
1. How can I get my AppEngine Instance to handle multiple concurrent requests?
You really only need to do two things:
Add the statement <threadsafe>true</threadsafe> to your appengine-web.xml file, which you can find in the war\WEB-INF folder.
Make sure that the code inside all your request handlers is actually thread-safe, i.e. use only local variables in your doGet(...), doPost(...), etc. methods or make sure you synchronize all access to class or global variables.
This will tell the AppEngine instance server framework that your code is thread-safe and that you are allowing it to call all of your request handlers multiple times in different threads to handle several requests at the same time. Note: AFAIK, it is not possible to set this on a per-servlet basis. So, ALL your servlets need to be thread-safe!
So, in essence, the executor-code you posted is already included in the server code of each AppEngine instance, and actually calls your doGet(...) method from inside the run method of a separate thread that AppEngine creates (or reuses) for each request. Basically doGet() already is your MyTask().
The relevant part of the Docs is here (although it doesn't really say much): https://developers.google.com/appengine/docs/java/config/appconfig#Using_Concurrent_Requests
2. Is the posted code useful for this (or any other) purpose?
AppEngine in its current form does not allow you to create and use your own threads to accept requests. It only allows you to create threads inside your doGet(...) handler, using the currentRequestThreadFactory() method you mentioned, but only to do parallel processing for this one request and not to accept a second one in parallel (this happens outside doGet()).
The name currentRequestThreadFactory() might be a little misleading here. It does not mean that it will return the current Factory of RequestThreads, i.e. threads that handle requests. It means that it returns a Factory that can create Threads inside the currentRequest. So, unfortunately it is actually not even allowed to use the returned ThreadFactory beyond the scope of the current doGet() execution, like you are suggesting by creating an Executor based on it and keeping it around in a class variable.
For frontend instances, any threads you create inside a doGet() call will get terminated immediately when your doGet() method returns. For backend instances, you are allowed to create threads that keep running, but since you are not allowed to open server sockets for accepting requests inside these threads, these will still not allow you to manage the request handling yourself.
You can find more details on what you can and cannot do inside an appengine servlet here:
The Java Servlet Environment - The Sandbox (specifically the Threads section)
For completeness, let's see how your code can be made "legal":
The following should work, but it won't make a difference in terms of your code being able to handle multiple requests in parallel. That will be determined solely by the <threadsafe>true</threadsafe> setting in your appengine-web.xml. So, technically, this code is just really inefficient and splits an essentially linear program flow across two threads. But here it is anyway:
public class MyServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException {
        ThreadFactory threadFactory = ThreadManager.currentRequestThreadFactory();
        ExecutorService executor = Executors.newCachedThreadPool(threadFactory);
        try {
            Future<MyResult> result = executor.submit(new MyTask(request)); // Fires off request handling in a separate thread
            writeResponse(response, result.get()); // Waits for thread to complete and builds response. After that, doGet() returns
        } catch (InterruptedException | ExecutionException e) {
            throw new IOException(e);
        } finally {
            executor.shutdown(); // request-scoped threads must not outlive doGet()
        }
    }
}
Since you are already inside a separate thread that is specific to the request you are currently handling, you should definitely save yourself the "thread inside a thread" and simply do this instead:
public class MyServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest request, HttpServletResponse response) {
        writeResponse(response, new MyTask(request).call()); // Delegate request handling to MyTask object in current thread and write out returned response
    }
}
Or, even better, just move the code from MyTask.call() into the doGet() method. ;)
Aside - Regarding the limit of 10 simultaneous servlet threads you mentioned:
This is a (temporary) design-decision that allows Google to control the load on their servers more easily (specifically the memory use of servlets).
You can find more discussion on those issues here:
Issue 7927: Allow configurable limit of concurrent requests per instance
Dynamic Backend Instance Scaling
If your bill shoots up due to increased latency, you may not be refunded the charges incurred
This topic has been bugging the heck out of me, too, since I am a strong believer in ultra-lean servlet code, so my usual servlets could easily handle hundreds, if not thousands, of concurrent requests. Having to pay for more instances due to this arbitrary limit of 10 threads per instance is a little annoying to me to say the least. But reading over the links I posted above, it sounds like they are aware of this and are working on a better solution. So, let's see what announcements Google I/O 2013 will bring in May... :)
I second the assessments of ericson and Markus A.
If however, for some reason (or for some other scenario) you want to follow the path that uses your code snippet as a starting point, I'd suggest that you change your executor definition to:
private static Executor executor;
so that it is shared (static) across all instances of the servlet.

Java Web Service background process to update service data

I have a simple Tomcat 7 Server where I want to implement a Java Web Service which offers some data I can get via my mobile phone.
The point is I want the data on the server being updated every once in a while. So I need a "background process" which updates the data.
I first tried to start a new thread in the constructor of my binding implementation class (which implements only my own Service - not a HttpServlet or so) like
public NewBindingImpl(){
Thread informationFetcher = new InformationFetcher();
informationFetcher.start();
}
But I didn't consider that this class gets created every time someone uses the service. Furthermore, this would update the data only at the moment I ask for it. But how could I update it, let's say, every two hours or so?
Hopefully someone here has an idea. Is that even possible for a "simple" web service?
Thank you very much,
Tobias
EDIT: ----
Maybe it helps to know that I tried this very basic tutorial here:
http://www.elearning.witnut.com/230/java-web-service-creation-using-top-development-approach/
Why not initialise the thread when the servlet's init() method is called? You can shut it down when the corresponding destroy() method is called. The thread will be bound to the lifecycle of the servlet, and since init() is only called once, you won't have to worry about multiple instances.
Here's a brief tutorial on the init() method usage.
Since you want something running every two hours, check out the Timer class. For more complex scenarios Quartz is a serious contender.
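A minimal sketch of the init()/destroy() plus Timer approach. PeriodicUpdater is a hypothetical helper: you would call start(PeriodicUpdater.everyTwoHours()) from init() and stop() from destroy(), and the Runnable passed in stands for whatever code actually refreshes the service data. java.util.concurrent.ScheduledExecutorService.scheduleAtFixedRate is the usual alternative to Timer and handles task exceptions more gracefully.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: start() from the servlet's init(), stop() from destroy().
final class PeriodicUpdater {
    private final Runnable refresh; // the actual data update, e.g. the fetching code
    private Timer timer;

    PeriodicUpdater(Runnable refresh) {
        this.refresh = refresh;
    }

    synchronized void start(long periodMillis) {
        timer = new Timer("data-refresh", true); // daemon thread, never blocks JVM shutdown
        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                try {
                    refresh.run();
                } catch (RuntimeException e) {
                    // an uncaught exception would kill the Timer thread for good
                    e.printStackTrace();
                }
            }
        }, 0, periodMillis); // first run immediately, then every periodMillis
    }

    synchronized void stop() {
        if (timer != null) {
            timer.cancel(); // discards scheduled runs; a run already in progress finishes
            timer = null;
        }
    }

    static long everyTwoHours() {
        return TimeUnit.HOURS.toMillis(2);
    }
}
```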

Is a thread guaranteed for the entire request handled by a servlet?

I am running into a situation where I use a static ThreadLocal variable to hold a bean that collects various metric values from different classes during the lifecycle of the request. In a filter I create the bean and set it in the thread-local variable, and I remove it from the thread-local variable in the same filter after the request has been processed. What I am running into is that the bean contains values from other requests! The only explanation I can see is that the thread is shared to process multiple requests at the same time. Hence the question in the title.
While one thread will generally process a single request (speaking of Tomcat, for sure), the thread may process multiple requests over time, but not without finishing the current request, unless you use include/forward and the like.
I'd VERY strongly recommend that you store your bean as an attribute of the request (setAttribute()) and use it for profiling. If you can't pass the request to the various methods... well, you are stuck with the ThreadLocal [which is not such a bad solution].
Alternatively you can post the code how you install/remove the threadLocal bean.
Keep in mind that you have to do some management of that bean as well (it will not be available outside the request).
Edit: forgot to ask: do you use try/finally calling doFilter(...)?
The code should look like this:
installBean();
try{
chain.doFilter(req, resp);
}finally{
Bean b = deinstallBean();
useTheMetrics(b);
//potentially, process exception, etc
}
It could also be that your filter is not always called in the sequence you expect it to be. Threads are reused to process multiple requests one after another, so if the removal of the value in the ThreadLocal does not happen, it will still be there when the thread processes its next request.
Yes, you can assume that a single thread will process each request.
Use a finally block to clear (set to null) the ThreadLocal in the filter after processing the rest of the chain. That will prevent data from previous requests from being mingled with the current request.
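A self-contained sketch of the same idea, with hypothetical names (RequestMetrics, withFreshMetrics). withFreshMetrics plays the role of the filter around chain.doFilter(...). It uses ThreadLocal.remove(), which drops the entry entirely and is the usual alternative to setting it to null. Because container threads are pooled, a missing cleanup shows up exactly as described in the question: the next request on the same thread sees the previous request's bean.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical per-request metrics holder, mirroring the filter in the question.
final class RequestMetrics {
    private static final ThreadLocal<Map<String, Long>> CURRENT = new ThreadLocal<>();

    private RequestMetrics() {}

    // What the filter does around chain.doFilter(...): install, run, always clean up.
    static <T> T withFreshMetrics(Supplier<T> requestHandling) {
        CURRENT.set(new HashMap<>());
        try {
            return requestHandling.get();
        } finally {
            CURRENT.remove(); // runs even if the handler throws
        }
    }

    // Called from anywhere further down the call chain on the same thread.
    static void record(String name, long value) {
        Map<String, Long> m = CURRENT.get();
        if (m != null) {
            m.put(name, value);
        }
    }

    // Copy of the current thread's metrics (empty if none are installed).
    static Map<String, Long> snapshot() {
        Map<String, Long> m = CURRENT.get();
        return m == null ? new HashMap<>() : new HashMap<>(m);
    }
}
```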
