Thread Safe Servlet With a Static String - java

I have reviewed a sample chat server built with Node.js and Socket.IO at http://ahoj.io/nodejs-and-websocket-simple-chat-tutorial. In that sample, a simple history variable on the server is used to store the chat history. Since Node.js is single-threaded, everything works fine. (You can ignore the Node.js example if you are not interested in Node.js; I will explain the problem in Java below.)
Consider the servlet below, which gets a message String from the request and appends it to a history String. This code could be the core of a chat server: it takes user messages from requests and adds them to a shared history String that other clients can read.
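public class ChatServlet implements Servlet {
    private static String history = "";
    // init(), destroy(), getServletConfig() and getServletInfo() omitted for brevity
    public void service(ServletRequest request, ServletResponse response)
            throws ServletException, IOException {
        history = history.concat(request.getParameter("message"));
    }
}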
Theoretically, this code is not thread-safe, as it uses a global static variable (see How do servlets work? Instantiation, sessions, shared variables and multithreading).
However, I have tested the code above with JMeter using lots of concurrent requests, and the history string always stores all the messages (no client message is lost or overwritten). Nothing went wrong!
I have not worked with threads much, so I wonder whether I am missing something here. Is the above code thread-safe, and can it be trusted?

As others have confirmed, this is indeed not thread-safe, in that it cannot be trusted. Some quirk of the JVM implementation may make this servlet appear to work, but there is no guarantee that it will work on another JVM, or even on the same JVM at another time.
To add to the variety of proposed implementations, here's one with AtomicReference:
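private static final AtomicReference<String> history = new AtomicReference<>("");
public void service(ServletRequest request, ServletResponse response) {
    // updateAndGet retries the concatenation if another thread updated history concurrently
    history.updateAndGet(h -> h.concat(request.getParameter("message")));
}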
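You can confirm that the AtomicReference version drops nothing by running it outside a container. The following is a minimal sketch in which a parallel stream stands in for concurrent servlet requests. The class `AtomicHistory` and the method `appendConcurrently` are illustrative names; they are not part of the answer.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.IntStream;

class AtomicHistory {
    // Shared chat history. updateAndGet re-applies the function whenever its
    // compare-and-set loses a race, so no concatenation is ever lost.
    private static final AtomicReference<String> history = new AtomicReference<>("");

    static void append(String message) {
        history.updateAndGet(h -> h.concat(message));
    }

    static String read() {
        return history.get();
    }

    // Simulates `requests` concurrent "requests", each appending one character.
    static void appendConcurrently(int requests) {
        IntStream.range(0, requests).parallel().forEach(i -> append("x"));
    }
}
```

After `AtomicHistory.appendConcurrently(10_000)`, `read().length()` is 10000 on every run. Because the update function may run more than once under contention, it must be free of side effects.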

No, it's not. Thread-safety bugs can be difficult to trigger: maybe your program will miss one message in a billion, or maybe, by coincidence, it will never miss one. If the code were thread-safe, though, losing a message would be guaranteed never to happen.
You could simply use a synchronized block to ensure that only one thread accesses history at a time, like this:
synchronized(ChatServlet.class) {
history = history.concat(request.getParameter("message"));
}
This means: lock ChatServlet.class, add the message to the history, then unlock ChatServlet.class.
You can never have two threads lock the same object at the same time - if they try, one of them will proceed, and the rest will wait around for the first one to unlock the object (and then another one will proceed, and the rest will wait for it to unlock the object, and so on).
Also make sure to only read history inside a synchronized(ChatServlet.class) block - otherwise, it's not guaranteed that the reading thread will see the latest updates.
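The locking pattern described above, with reads also going through the lock, can be sketched as a standalone class. This is an illustration only: `SyncHistory` takes the place of the servlet, and a parallel stream simulates concurrent requests.

```java
import java.util.stream.IntStream;

class SyncHistory {
    private static String history = "";

    // Writers lock the class object, the same idea as synchronized(ChatServlet.class):
    // only one thread at a time can read, concatenate and write the field.
    static void append(String message) {
        synchronized (SyncHistory.class) {
            history = history.concat(message);
        }
    }

    // Readers take the same lock, which guarantees they see the latest update.
    static String read() {
        synchronized (SyncHistory.class) {
            return history;
        }
    }

    // Simulates `requests` concurrent "requests", each appending one character.
    static void appendConcurrently(int requests) {
        IntStream.range(0, requests).parallel().forEach(i -> append("x"));
    }
}
```

Because both append() and read() hold the SyncHistory.class monitor, every read sees all appends that completed before it.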

It isn't thread-safe. Code that isn't thread-safe isn't guaranteed to fail, but it's not guaranteed to work either.

Related

If multiple requests are handled by a server running a single servlet, where do we need to take care of synchronization?

I found out how multiple requests are handled from How does a single servlet handle multiple requests from client side. But that raises another question: why do we need synchronization if all requests are handled separately?
Can you give a real-life example of how shared state works and how a servlet can depend on it? I am not much interested in code. I am looking for an explanation with an example from any portal application. For instance, if there is a login page, how is it accessed by n users concurrently?
The server may handle more than one request at a time. From what I read, it creates a pool of n threads to serve requests, and I guess each thread has its own set of parameters to maintain the session. So is there any chance that two or more threads (that is, two or more requests) can collide with each other?
Synchronization is required when multiple threads modify a shared resource.
So, when all your servlets are independent of each other, you don't need to worry about the fact that they run in parallel.
But if they work on "shared state" somehow (for example, by reading and writing values in some sort of centralized data store), then you have to make sure that things don't go wrong. Of course, where and how you provide the necessary synchronization depends on your exact setup.
Yes, my answer is pretty generic; but so is your question.
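Here is a concrete version of the login-page case. One hypothetical piece of shared state is a per-user failed-login counter that the application keeps in memory (this example is an assumption and does not come from the question). Request parameters and local variables belong to a single request thread, but every thread touches this map, so it needs a thread-safe structure. The sketch below uses ConcurrentHashMap.merge, which performs the read-modify-write for a key atomically. `FailedLogins` and `simulate` are illustrative names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

class FailedLogins {
    // Shared by every request thread, unlike request parameters or local variables.
    private final ConcurrentHashMap<String, Integer> attempts = new ConcurrentHashMap<>();

    // Atomically increments this user's counter (inserting 1 if absent) and returns it.
    int recordFailure(String user) {
        return attempts.merge(user, 1, Integer::sum);
    }

    int failures(String user) {
        return attempts.getOrDefault(user, 0);
    }

    // Simulates `n` concurrent failed logins for the same user.
    static FailedLogins simulate(String user, int n) {
        FailedLogins logins = new FailedLogins();
        IntStream.range(0, n).parallel().forEach(i -> logins.recordFailure(user));
        return logins;
    }
}
```

With a plain HashMap and a get-then-put sequence, two threads could both read the same count and one failure would be lost. With merge, the count always equals the number of calls.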
Synchronization in Java is only needed if the shared object is mutable. If your shared object is read-only or immutable, you don't need synchronization, even with multiple threads. The same applies to what the threads do with an object: if all threads only read values, you don't need synchronization.
Basically, a servlet container usually runs a single servlet instance on many threads at once. Data stored in the servlet's fields is therefore shared between requests and is not thread-safe by default. The common example given in many textbooks is a hit counter stored as a private field, e.g.:
public class YourServlet implements Servlet {
    private int counter;
    public void service(ServletRequest req, ServletResponse res) {
        // this is not thread-safe
        counter++;
    }
}
This is because the service method of a single servlet instance is invoked concurrently by multiple threads, one per incoming HTTP request. The unary increment operator first reads the current value, adds one, and then writes the value back. Another thread doing the same operation concurrently may read the value after the first thread has read it but before the first thread writes it back. One of the two increments is then lost.
So in this case you should use synchronisation or, even better, the AtomicInteger class, which has been part of java.util.concurrent since Java 1.5.
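The following is a minimal sketch of the AtomicInteger version of the hit counter. It is written as a plain class so it runs without a servlet container. `HitCounter` and `simulate` are illustrative names, and the parallel stream stands in for concurrent requests.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class HitCounter {
    // incrementAndGet performs the read, add and write as one atomic step,
    // so concurrent increments cannot overwrite each other.
    private final AtomicInteger counter = new AtomicInteger();

    int hit() {
        return counter.incrementAndGet();
    }

    int count() {
        return counter.get();
    }

    // Simulates `requests` concurrent requests, each counting one hit.
    static int simulate(int requests) {
        HitCounter c = new HitCounter();
        IntStream.range(0, requests).parallel().forEach(i -> c.hit());
        return c.count();
    }
}
```

`HitCounter.simulate(100_000)` returns 100000 on every run. The same loop over a plain `counter++` on an int field can come up short, though how often depends on the machine.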

Java Servlet - Observer Pattern causing null Response object

I have a Java HttpServlet. This servlet contains a set of objects that make use of the observer pattern in order to return data through the servlet's Response object. Here is a simplified version of my doGet() method in the HttpServlet:
protected void doGet(final HttpServletRequest request, final HttpServletResponse response)
        throws ServletException, IOException {
    MyProcess process = new MyProcess();
    // The following method spawns a few threads, so I use a listener to receive a completion event.
    process.performAsynchronousMethod(request, new MyListener() {
        public void processComplete(MyData data) { // MyData stands in for the actual result type
            response.getWriter().print(data.toString());
        }
    });
}
As the example shows, I have a process that I execute, which spawns a variety of threads in order to produce a final dataset. This process can take anywhere from seconds to a minute. My problem is, it appears that as the doGet() method completes, the response object becomes null. When processComplete() is called, the response object will be null - thus preventing me from writing any data out.
It appears as if the servlet is closing the connection as soon as the asynchronous method is called.
Is there a better way to implement this type of servlet when using the observer pattern for asynchronous tasks? Should I do this in another way?
The servlet response is sent back to the client when the doGet() method returns. The container won't wait for your asynchronous call to finish. You will need to find a way to block until all your asynchronous tasks have completed, and only then allow doGet() to return.
The answers to this question should point you in the right direction.
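One way to block until the asynchronous work is done is a CountDownLatch that the listener counts down. The sketch below uses a hypothetical `Listener` interface and a `performAsynchronousMethod` stand-in, which mirror MyListener and MyProcess from the question. It returns the result as a String instead of writing to a real HttpServletResponse, so it runs without a container.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class BlockingCompletion {
    // Hypothetical stand-in for the question's MyListener.
    interface Listener {
        void processComplete(String data);
    }

    // Hypothetical stand-in for MyProcess: completes on a different thread.
    static void performAsynchronousMethod(String input, Listener listener) {
        new Thread(() -> listener.processComplete(input.toUpperCase())).start();
    }

    // What doGet() would do: start the work, then wait (bounded by a timeout)
    // for the listener to fire, and only then produce the response.
    static String handle(String input, long timeoutSeconds) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<String> result = new AtomicReference<>();
        performAsynchronousMethod(input, data -> {
            result.set(data);
            done.countDown();
        });
        if (!done.await(timeoutSeconds, TimeUnit.SECONDS)) {
            throw new IllegalStateException("timed out waiting for the process");
        }
        return result.get(); // in doGet(): response.getWriter().print(...)
    }
}
```

In a servlet, handle() would be the body of doGet(), and the returned value would be written to response.getWriter() before returning. The timeout keeps a stuck process from holding the request thread forever.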
Something else to watch out for: you have no guarantee that the threads will write to the response writer one after another. The various print operations may overlap and garble the output (this may not matter to you, depending on what the data is and how it will be used).
You could try the asynchronous servlets introduced in the Servlet 3.0 specification. Not all web servers support them, only some modern ones. Also, the server will then hold the socket connection open for the whole time, so you should know how many clients could be connected simultaneously. Not all hardware and operating systems can handle a lot of open connections.
The web client will also be waiting and could time out. You should also consider that the socket connection could be dropped and the client would never get the result (e.g. some proxy servers break long-running connections). So you should allow a "resume" operation.
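If several threads really must write to the same writer, a simple guard is to hold the writer's monitor for each complete record. The pieces of one record then cannot interleave with those of another. This sketch makes two assumptions: a StringWriter stands in for response.getWriter(), and `writeRecord` is a hypothetical helper. The order of the records is still arbitrary; only each record stays intact.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.stream.IntStream;

class SerializedWrites {
    // Each record is written with several print calls; holding the writer's
    // monitor for the whole record keeps other threads from interleaving.
    static void writeRecord(PrintWriter out, int id) {
        synchronized (out) {
            out.print("record ");
            out.print(id);
            out.println(" done");
        }
    }

    // Writes `records` records from concurrent threads and returns the text.
    static String run(int records) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        IntStream.range(0, records).parallel().forEach(id -> writeRecord(out, id));
        out.flush();
        return buffer.toString();
    }
}
```

An often simpler alternative is to collect all results first and have the request thread alone write them out before doGet() returns.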

Multithread GAE servlets to handle concurrent users

I'd like to multithread my GAE servlets so that the same servlet on the same instance can handle up to 10 concurrent requests from different users at the same time (on a frontend instance I believe the maximum number of threads is 10), timeslicing between them.
public class MyServlet extends HttpServlet {
    private ExecutorService executor;
    @Override
    public void doGet(HttpServletRequest request, HttpServletResponse response) {
        if(executor == null) {
            ThreadFactory threadFactory = ThreadManager.currentRequestThreadFactory();
            executor = Executors.newCachedThreadPool(threadFactory);
        }
        MyResult result = executor.submit(new MyTask(request));
        writeResponseAndReturn(response, result);
    }
}
So basically when GAE starts up, the first time it gets a request to this servlet, an Executor is created and then saved. Then each new servlet request uses that executor to spawn a new thread. Obviously everything inside MyTask must be thread-safe.
What I'm concerned about is whether or not this truly does what I'm hoping it does. That is, does this code create a non-blocking servlet that can handle multiple requests from multiple users at the same time? If not, why and what do I need to do to fix it? And, in general, is there anything else that a GAE maestro can spot that is dead wrong? Thanks in advance.
I don't think your code would work.
The doGet method runs in threads managed by the servlet container. When a request comes in, it occupies a servlet thread, and that thread is not released until the doGet method returns. In your code, executor.submit returns a Future object. To get the actual result, you need to call get() on the Future, which blocks until MyTask finishes. Only after that does doGet return and free the thread for new requests.
I am not familiar with GAE, but according to their docs, you can declare your servlet as thread-safe and then the container will dispatch multiple requests to each web server in parallel:
<!-- in appengine-web.xml -->
<threadsafe>true</threadsafe>
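You can observe the blocking behaviour described in this answer with the plain JDK; no App Engine APIs are involved. In this sketch, `SubmitThenGet` and the sleep duration are illustrative. The calling thread plays the role of doGet(): it submits a task and then sits in Future.get() until the task finishes. Offloading the work therefore frees nothing for the request thread.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SubmitThenGet {
    // Returns how many milliseconds the calling ("doGet") thread spent
    // waiting in get() for a task that sleeps for `taskMillis`.
    static long waitedMillis(long taskMillis) throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            Future<String> result = executor.submit(() -> {
                Thread.sleep(taskMillis);
                return "done";
            });
            String value = result.get(); // blocks until the task finishes
            long waited = (System.nanoTime() - start) / 1_000_000;
            if (!"done".equals(value)) throw new IllegalStateException(value);
            return waited;
        } finally {
            executor.shutdown();
        }
    }
}
```

`waitedMillis(200)` returns roughly 200 or slightly more. The request thread stays busy for the full duration either way.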
You implicitly asked two questions, so let me answer both:
1. How can I get my AppEngine Instance to handle multiple concurrent requests?
You really only need to do two things:
Add the statement <threadsafe>true</threadsafe> to your appengine-web.xml file, which you can find in the war\WEB-INF folder.
Make sure that the code inside all your request handlers is actually thread-safe, i.e. use only local variables in your doGet(...), doPost(...), etc. methods or make sure you synchronize all access to class or global variables.
This tells the AppEngine instance server framework that your code is thread-safe, and that it may call your request handlers multiple times in different threads to handle several requests at the same time. Note: AFAIK, it is not possible to set this on a per-servlet basis. So ALL your servlets need to be thread-safe!
So, in essence, the executor-code you posted is already included in the server code of each AppEngine instance, and actually calls your doGet(...) method from inside the run method of a separate thread that AppEngine creates (or reuses) for each request. Basically doGet() already is your MyTask().
The relevant part of the Docs is here (although it doesn't really say much): https://developers.google.com/appengine/docs/java/config/appconfig#Using_Concurrent_Requests
2. Is the posted code useful for this (or any other) purpose?
AppEngine in its current form does not allow you to create and use your own threads to accept requests. It only allows you to create threads inside your doGet(...) handler, using the currentRequestThreadFactory() method you mentioned, but only to do parallel processing for this one request and not to accept a second one in parallel (this happens outside doGet()).
The name currentRequestThreadFactory() might be a little misleading here. It does not mean that it will return the current Factory of RequestThreads, i.e. threads that handle requests. It means that it returns a Factory that can create Threads inside the currentRequest. So, unfortunately it is actually not even allowed to use the returned ThreadFactory beyond the scope of the current doGet() execution, like you are suggesting by creating an Executor based on it and keeping it around in a class variable.
For frontend instances, any threads you create inside a doGet() call will get terminated immediately when your doGet() method returns. For backend instances, you are allowed to create threads that keep running, but since you are not allowed to open server sockets for accepting requests inside these threads, these will still not allow you to manage the request handling yourself.
You can find more details on what you can and cannot do inside an appengine servlet here:
The Java Servlet Environment - The Sandbox (specifically the Threads section)
For completeness, let's see how your code can be made "legal":
The following should work, but it won't make any difference to your code's ability to handle multiple requests in parallel. That is determined solely by the <threadsafe>true</threadsafe> setting in your appengine-web.xml. So, technically, this code is just really inefficient and splits an essentially linear program flow across two threads. But here it is anyway:
public class MyServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException {
        ThreadFactory threadFactory = ThreadManager.currentRequestThreadFactory();
        ExecutorService executor = Executors.newCachedThreadPool(threadFactory);
        Future<MyResult> result = executor.submit(new MyTask(request)); // Fires off request handling in a separate thread
        try {
            writeResponse(response, result.get()); // Waits for thread to complete and builds response. After that, doGet() returns
        } catch (InterruptedException | ExecutionException e) {
            throw new ServletException(e);
        } finally {
            executor.shutdown();
        }
    }
}
Since you are already inside a separate thread that is specific to the request you are currently handling, you should definitely save yourself the "thread inside a thread" and simply do this instead:
public class MyServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest request, HttpServletResponse response) {
        writeResponse(response, new MyTask(request).call()); // Delegate request handling to MyTask object in current thread and write out returned response
    }
}
Or, even better, just move the code from MyTask.call() into the doGet() method. ;)
Aside - Regarding the limit of 10 simultaneous servlet threads you mentioned:
This is a (temporary) design-decision that allows Google to control the load on their servers more easily (specifically the memory use of servlets).
You can find more discussion on those issues here:
Issue 7927: Allow configurable limit of concurrent requests per instance
Dynamic Backend Instance Scaling
If your bill shoots up due to increased latency, you may not be refunded the charges incurred
This topic has been bugging the heck out of me, too. I am a strong believer in ultra-lean servlet code, so my usual servlets could easily handle hundreds, if not thousands, of concurrent requests. Having to pay for more instances because of this arbitrary limit of 10 threads per instance is a little annoying, to say the least. But from the links I posted above, it sounds like they are aware of this and are working on a better solution. So, let's see what announcements Google I/O 2013 will bring in May... :)
I second the assessments of ericson and Markus A.
If, however, for some reason (or in some other scenario) you want to follow the path that uses your code snippet as a starting point, I'd suggest that you change your executor definition to:
private static Executor executor;
so that it is shared across all instances of the servlet class.
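The `if (executor == null)` check in the question is itself a race: two first requests arriving together can each create an executor. If you keep a shared executor in some other scenario, one standard alternative is the initialization-on-demand holder idiom, sketched below. The sketch uses a plain `Executors.newCachedThreadPool()`, not App Engine's request thread factory, which an earlier answer says must not outlive the request. `SharedExecutor` is an illustrative name.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SharedExecutor {
    // Holder class: the JVM initializes it (and therefore the pool) exactly once,
    // on first access, and class initialization is thread-safe, so no null
    // check or explicit locking is needed.
    private static class Holder {
        static final ExecutorService EXECUTOR = Executors.newCachedThreadPool();
    }

    static ExecutorService get() {
        return Holder.EXECUTOR;
    }
}
```

Every caller, on any thread, gets the same pool. Remember to shut it down when the application stops, e.g. from the servlet's destroy() method.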

Safe multithreaded file creation in java

I have a webapp that sometimes needs to download some bytes from a URL, package them up, and send them back to the requester. The downloaded bytes are stored for a little while so they can be reused if the same URL needs to be downloaded again. I am trying to figure out how best to prevent threads from downloading the same URL at the same time when requests for it arrive simultaneously. I was thinking of creating a class like the one below to prevent the same URL from being downloaded concurrently. If a URL cannot be locked, the thread waits until it is unlocked, and then downloads it only if it still does not exist.
import java.util.HashMap;
import java.util.Map;

public class URLDownloader
{
    private final Map<String, String> activeThreads = new HashMap<>();

    public synchronized void lockURL(String url, String threadID) throws UnableToLockURLException
    {
        if (!activeThreads.containsKey(url))
            activeThreads.put(url, threadID);
        else
            throw new UnableToLockURLException();
    }

    public synchronized boolean unlockURL(String url, String threadID)
    {
        // removes the lock only if it is held by the passed-in thread
        return activeThreads.remove(url, threadID);
    }

    public synchronized boolean isURLStillLocked(String url)
    {
        return activeThreads.containsKey(url);
    }
}
Does anyone have a better solution for this? Does my solution seem valid? Are there any open source components out there that already do this very well that I can leverage?
Thanks
I would suggest keeping a ConcurrentHashSet<String>, visible to all your threads, to track your unique URLs. This construct doesn't exist directly in the Java library, but it can easily be constructed from a ConcurrentHashMap like so: Collections.newSetFromMap(new ConcurrentHashMap<String,Boolean>())
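Such a set can act as a "claim" on a URL. Set.add on the concurrent set returns true for exactly one caller, so only that caller starts the download. The names `UrlClaims`, `tryClaim` and `release` are hypothetical, and the sketch does not handle the waiting side (callers that lose the claim still need to get the winner's result).

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class UrlClaims {
    // Thread-safe set built from a ConcurrentHashMap, as described above.
    private final Set<String> inProgress =
            Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    // add() returns true for exactly one caller per URL while it is claimed.
    boolean tryClaim(String url) {
        return inProgress.add(url);
    }

    // Called by the winner once the download has finished.
    void release(String url) {
        inProgress.remove(url);
    }

    // Counts how many of `callers` concurrent callers win the claim for one URL.
    static int winners(String url, int callers) {
        UrlClaims claims = new UrlClaims();
        AtomicInteger wins = new AtomicInteger();
        IntStream.range(0, callers).parallel().forEach(i -> {
            if (claims.tryClaim(url)) wins.incrementAndGet();
        });
        return wins.get();
    }
}
```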
It sounds like you don't need a lock, since if there are multiple requests to download the same URL, the point is to download it only once.
Also, I think it would make more sense in terms of encapsulation to put the check for a stored URL / routine to store new URLs in the URLDownloader class, rather than in the calling classes. Your threads can simply call e.g. fetchURL(), and let URLDownloader handle the specifics.
So, you can implement this in two ways. If you don't have a constant stream of download requests, the simpler way is to have only one URLDownloader thread running, and to make its fetchURL method synchronized, so that you only download one URL at a time. Otherwise, keep the pending download requests in a central LinkedHashSet<String>, which preserves order and ignores repeats.
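A different technique, not the one described in this answer, is to keep one future per URL in a ConcurrentHashMap. The first request for a URL downloads it, and concurrent requests for the same URL wait for that result instead of downloading again. This minimal sketch assumes a hypothetical `fetcher` function that does the actual HTTP work. It moves the fetchURL() logic into the downloader class, as suggested above, and does not implement expiry of stored downloads.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.IntStream;

class CachingDownloader {
    // One future per URL; every thread asking for the same URL shares it.
    private final ConcurrentMap<String, CompletableFuture<byte[]>> downloads = new ConcurrentHashMap<>();
    // Hypothetical hook that performs the real HTTP download.
    private final Function<String, byte[]> fetcher;

    CachingDownloader(Function<String, byte[]> fetcher) {
        this.fetcher = fetcher;
    }

    byte[] fetchURL(String url) {
        CompletableFuture<byte[]> mine = new CompletableFuture<>();
        CompletableFuture<byte[]> existing = downloads.putIfAbsent(url, mine);
        if (existing == null) {
            // This thread won the race: it alone performs the download.
            try {
                mine.complete(fetcher.apply(url));
            } catch (RuntimeException e) {
                downloads.remove(url, mine); // let a later request retry
                mine.completeExceptionally(e);
            }
            existing = mine;
        }
        // Everyone else waits for the winner's result instead of downloading again.
        return existing.join();
    }

    // Counts how many real downloads happen when `callers` threads request one URL.
    static int downloadsFor(int callers) {
        AtomicInteger calls = new AtomicInteger();
        CachingDownloader d = new CachingDownloader(url -> {
            calls.incrementAndGet();
            return url.getBytes(StandardCharsets.UTF_8);
        });
        IntStream.range(0, callers).parallel().forEach(i -> d.fetchURL("http://example.com/file"));
        return calls.get();
    }
}
```

`downloadsFor(1000)` returns 1: all thousand callers get the same bytes, but the fetcher runs only once. To make entries expire after "a little while", you could remove them from the map on a schedule. Failed downloads are already removed, so a later request can retry.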

String and concurrency in Java

This may be a related question: Java assignment issues - Is this atomic?
I have the same class as the OP there, which acts on a mutable string reference. But the setter is rarely called (basically, this string is part of a server configuration that only reloads when forced to).
public class Test {
private String s;
public void setS(String str){
s = str;
}
public String getS(){
return s;
}
}
Multiple threads will be pounding this variable to read its value. What is the best way to make it 'safe' without incurring the performance degradation of declaring it volatile?
I am currently heading in the direction of ReadWriteLock, but as far as I understand, ReadWrite locks do not make it safe from thread caching unless some synchronisation happens? Which means I've come full circle back to: I may as well just use the volatile keyword?
Is my understanding correct? Is there nothing that can 'notify' other threads about an update to a variable in main memory manually such that they can update their local cache just once on a full moon?
volatile seems overkill here, given that the server application is designed to run for months without a restart. By then it would have served a few million reads. I'm thinking I might as well just make the String static final and not allow it to mutate without a complete application and JVM restart.
Reads and writes to references are atomic. Problems arise when you attempt a read followed by a write (an update), or when you need to guarantee that after a write, all threads see the change on their next read. However, only you can say what your requirements are.
When you use volatile, it requires that a cache-coherent copy be read or written. This doesn't require a copy to be made to or from main memory, as the caches communicate among themselves, even between sockets. There is a performance impact, but it doesn't mean the caches are bypassed.
Even if the access did go all the way to main memory, you could still do millions of accesses per second.
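For a rarely reloaded setting like the one in the question, the volatile version takes only a few lines. In this minimal sketch, `ServerConfig`, `get` and `reload` are illustrative names. The write in reload() is guaranteed to become visible to threads that call get() afterwards, and the read path takes no lock.

```java
class ServerConfig {
    // volatile: a write by the reloading thread is visible to every later
    // read by request threads, with no locking on the read path.
    private static volatile String value = "initial";

    static String get() {
        return value;
    }

    static void reload(String newValue) {
        value = newValue; // a single reference write is atomic
    }
}
```

A reader that spins on get() is guaranteed to eventually see the reloaded value. Without volatile, the JIT would be allowed to hoist the read out of the loop, so the reader might never see it.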
Why a mutable String? Why not a Config class with a simple static String? When the config is updated, you change this static reference. That is an atomic operation, so reading threads will never see a half-written value. You then have no synchronization and no locking penalties (though without volatile, readers are not guaranteed to see the new reference promptly).
To notify the clients of this server, you can use the observer pattern: whoever is interested in server updates can register for your event, and the server delivers the notification. This shouldn't become a bottleneck since, as you mentioned, reloads are infrequent.
Now, to make this thread-safe, you can have a separate thread handle updates of the server state. In your getter, check the state. If it is 'Updating', wait for the update to complete (for example, by sleeping). Once the update thread is done, it changes the state from 'Updating' to 'Updated'. When you wake up, check the state again: if it is still 'Updating', go back to sleep; otherwise, start servicing the request.
This approach will add an extra if in your code but then it will enable you to reload the cache without forcing application restart.
Also, this shouldn't be a bottleneck, as server updates are infrequent.
Hope this makes some sense.
To avoid the volatile keyword, you could add a "memory barrier" method to your Test class that is called only very rarely, for example:
public synchronized void sync() {
}
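This forces the thread to re-read the field value from main memory. (In Java Memory Model terms: acquiring the monitor in sync() after setS() has released it creates a happens-before relationship, so the reading thread is guaranteed to see the value set there.)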
Also, you would have to change the setter to
public synchronized void setS(String str){
s = str;
}
The synchronized keyword forces the setting thread to make its write visible (conceptually, to flush it to main memory) when it releases the monitor at the end of setS().
See here for a detailed explanation of synchronization and memory barriers.
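Combining the two parts of this answer, the class from the question would look roughly like the sketch below. It is renamed `SyncedConfig` here to avoid clashing with other classes named Test. The guarantee only holds for a reader that calls sync() after setS() has returned. Reads of getS() between an update and that reader's next sync() may still see the old value.

```java
class SyncedConfig {
    private String s;

    // The assignment happens while holding this object's monitor; releasing
    // the monitor at the end of the method publishes the write.
    public synchronized void setS(String str) {
        s = str;
    }

    // Empty "memory barrier": acquiring the same monitor after setS() has
    // released it creates a happens-before edge, so later plain reads by
    // this thread are guaranteed to see that write.
    public synchronized void sync() {
    }

    // Unsynchronized read: cheap, but only guaranteed to be current as of
    // the last sync() performed by this thread.
    public String getS() {
        return s;
    }
}
```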
