ThreadLocals hard to use - java

I'm using ThreadLocal variables (through Clojure's vars, but the following is the same for plain ThreadLocals in Java) and very often run into the issue that I can't be sure that a certain code path will be taken on the same thread or on another thread. For code under my control this is obviously not too big a problem, but for polymorphic third party code there's sometimes not even a way to statically determine whether it's safe to assume single threaded execution.
I tend to think this is an inherent issue with ThreadLocals, but I'd like to hear some advice on how to use them safely.

Then don't use ThreadLocals! They are specifically for when you want a variable that's associated with a Thread, as if there were a Map<Thread,T>.
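Here is a minimal sketch of the Map<Thread,T> analogy. The class and thread names are made up for illustration. Two threads set the same static ThreadLocal to different values, and each one reads back only its own value. The calling thread never set a value, so it sees null.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrates the Map<Thread, T> analogy: one static ThreadLocal, a separate value per thread.
final class PerThreadDemo {
    static final ThreadLocal<String> NAME = new ThreadLocal<>();

    // Starts threads "A" and "B"; each stores and reads back its own value.
    // Returns what each thread observed, plus whether the calling thread sees null.
    static Map<String, String> run() {
        Map<String, String> seen = new ConcurrentHashMap<>();
        Runnable worker = () -> {
            String me = Thread.currentThread().getName();
            NAME.set("value-of-" + me);
            seen.put(me, NAME.get()); // reads this thread's value, never the other one's
            NAME.remove();            // tidy up before the thread ends
        };
        Thread a = new Thread(worker, "A");
        Thread b = new Thread(worker, "B");
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // The calling thread never set NAME, so it has no value of its own.
        seen.put("caller-sees-null", String.valueOf(NAME.get() == null));
        return seen;
    }
}
```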

The typical use case (as far as I know) for a ThreadLocal is in a web application framework. An HTTP filter obtains a database connection on an incoming request, and stores the connection in a static ThreadLocal. All subsequent controllers needing the connection can easily obtain it from the framework using a static call. When the response is returned, the same filter releases the connection again.
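Below is a minimal sketch of that filter pattern. The names ConnectionHolder and FakeConnection are hypothetical. The servlet API and a real pool are left out so the code runs on its own, and FakeConnection stands in for java.sql.Connection. The key part is the try/finally that calls remove() on the way out, whether the chain returns normally or throws.

```java
import java.util.function.Supplier;

// Hypothetical holder: a static ThreadLocal that "controllers" read via current().
final class ConnectionHolder {
    private static final ThreadLocal<FakeConnection> CURRENT = new ThreadLocal<>();

    // What a controller deeper in the chain would call.
    static FakeConnection current() {
        FakeConnection c = CURRENT.get();
        if (c == null) throw new IllegalStateException("no connection bound to this thread");
        return c;
    }

    static boolean isBound() {
        return CURRENT.get() != null;
    }

    // What the filter does: bind a connection, run the rest of the chain, always release.
    static <T> T filter(Supplier<FakeConnection> pool, Supplier<T> chain) {
        FakeConnection c = pool.get();
        CURRENT.set(c);
        try {
            return chain.get();
        } finally {
            CURRENT.remove(); // never leave it behind on a pooled thread
            c.close();        // "release" the connection
        }
    }
}

// Stand-in for java.sql.Connection so the sketch needs no database.
final class FakeConnection implements AutoCloseable {
    private final String id;
    boolean closed;

    FakeConnection(String id) { this.id = id; }

    String id() { return id; }

    @Override
    public void close() { closed = true; }
}
```

Once filter(...) returns or throws, the thread no longer carries the connection, so the next request served by that pooled thread starts clean.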

Related

Thread local behaviour in spring boot

As we know, Tomcat has a default pool of about 200 threads, and Jetty also has a default thread count in its thread pool. So if we set something in a ThreadLocal per request, will it stay in the thread for its whole lifetime, or will Tomcat clear the ThreadLocal after each request?
If we set something in userContext in a filter do we need to clear it every time the filter exits?
Or will the web server create a new thread every time, if we don't have a thread pool configuration?
public static final ThreadLocal<UserContextDto> userContext = new ThreadLocal<>();
Yes, you need to clear ThreadLocal. Tomcat won't clear ThreadLocals.
No, a new thread is not created every time. A thread from the pool is used to serve a request and is returned to the pool once the request is complete.
This not only applies to Tomcat, it applies to Jetty and Undertow as well. Thread creation for every request is expensive in terms of both resources and time.
No, Tomcat will not clear ThreadLocals that your code creates, which means they will remain and could pollute subsequent requests.
So whenever you set one, make sure you clear it before the request (or whatever unit of work set it) exits.
It should also be noted that subsequent requests, even for the identical URL, could well be executed on a totally different thread, so ThreadLocals are not a mechanism for saving state between requests. For that, something like session beans (or the HTTP session) could be used.
If you put something in a ThreadLocal in a thread that is not 100% under your control (i.e. one in which you are invoked from other code, as when handling an HTTP request), you need to clear whatever you set before you leave your code.
A try/finally structure is a good way to do that.
A thread pool can't do it for you, because the Java API does not provide a way to clear a thread's ThreadLocal variables (which is arguably a shortcoming of the Java API).
Not doing so risks a memory leak, although it's bounded by the size of the thread pool if you have one.
Once the same thread gets assigned again to the code that knows about the ThreadLocal, you'll see the old value from the previous request if you didn't remove it. It's not good to depend on that. It could lead to hard to trace bugs, security holes, etc.
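The stale-value problem can be reproduced with a single-thread executor standing in for the container's pool. The class name and the shape of a "request" are hypothetical. The first call sets a value and forgets to remove it. The second call lands on the same reused thread and sees the leftover value.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;

final class StaleThreadLocalDemo {
    static final ThreadLocal<String> USER = new ThreadLocal<>();

    // Simulates one "request" on a pooled thread. Returns whatever a previous
    // request left in USER. If cleanUp is false, the value set here leaks
    // into the next request that runs on the same thread.
    static String handle(ExecutorService pool, String user, boolean cleanUp) {
        try {
            return pool.submit(() -> {
                String leftover = USER.get(); // state from an earlier request, if any
                if (user != null) USER.set(user);
                try {
                    return leftover;
                } finally {
                    if (cleanUp) USER.remove();
                }
            }).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

With one worker thread, a request that skips remove() hands its user to whichever request runs next. The try/finally in the handler is what stops the leak.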

Threading and Concurrency Within A Servlet

I have a web application that retrieves a (large) list of results from the database, then needs to pare down the list by looking at each result, and throwing out "invalid" ones. The parameters that make a result "invalid" are dynamic, and we cannot pass the work on to the database.
So, one idea is to create a thread pool and ExecutorService and check these results concurrently. But I keep seeing people saying "Oh, the spec prohibits spawning threads in a servlet" or "that's just a bad idea".
So, my question: what am I supposed to do? I'm in a servlet 2.5 container, so all the asynchronous goodies in the 3.0 spec are unavailable to me. Writing a separate service that I communicate with via JMS seems like overkill.
Looking for expert advice here.
Jason
Nonsense.
The JEE spec has lots of "should nots" and "thou shalt nots". The Servlet spec, on the other hand, has none of that. The Servlet spec is much more wild west. It really doesn't dive into the actual operational aspects like the JEE spec does.
I've yet to see a JEE container (either a pure servlet container à la Tomcat/Jetty, or a full-blown one à la GlassFish/JBoss) that actually prevented me from firing off a thread on my own. WebSphere might; it's supposed to be rather notorious, but I've not used WebSphere.
If the concept of creating unruly, self-managed threads makes you itch, then the full JEE containers internally have a formal "WorkManager" that can be used to peel threads off of. They just all expose them in different ways. That's the more "by the book-ish" mechanism for getting a thread.
But, frankly, I wouldn't bother. You'll likely have more success using the Executors out of the standard class library. If you saturate your system with too many threads and everything gets out of hand, well, that's on you. Don't Do That(tm).
As to whether an async solution is even appropriate, I'll punt on that. It's not clear from your post whether it is or not. But your question was about threads and Servlets.
Just Do It. Be aware it "may not be portable", do it right (use an Executor), take responsibility for it, and the container won't be the wiser, nor care.
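Here is a sketch of what "do it right (use an Executor)" might look like for the filtering problem in the question. It assumes the validity check can be expressed as a plain Predicate. It also assumes the pool is created once (for example in a ServletContextListener or Servlet.init()) and closed on undeploy, instead of one thread being created per request. ResultValidator is a hypothetical name. Whether this pays off depends on how expensive each check is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical helper: one shared, fixed-size pool reused across requests.
final class ResultValidator<T> implements AutoCloseable {
    private final ExecutorService pool;

    ResultValidator(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Splits the results into chunks, checks each chunk on the pool, and keeps the
    // valid entries in their original order. The calling (request) thread blocks
    // until every chunk is done.
    List<T> keepValid(List<T> results, Predicate<? super T> isValid, int chunkSize) {
        List<Future<List<T>>> futures = new ArrayList<>();
        for (int start = 0; start < results.size(); start += chunkSize) {
            List<T> chunk = results.subList(start, Math.min(start + chunkSize, results.size()));
            futures.add(pool.submit(() -> {
                List<T> kept = new ArrayList<>();
                for (T r : chunk) {
                    if (isValid.test(r)) kept.add(r);
                }
                return kept;
            }));
        }
        List<T> valid = new ArrayList<>();
        try {
            for (Future<List<T>> f : futures) valid.addAll(f.get()); // in submission order
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while validating", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("validation failed", e.getCause());
        }
        return valid;
    }

    // Call on undeploy so the pool's threads don't outlive the application.
    @Override
    public void close() {
        pool.shutdown();
    }
}
```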
Doesn't look like concurrency will help you much here. Unless it's very expensive to check each entry, making that check concurrent won't speed things up. Your bottleneck is passing the result set through the database connection, and you couldn't multithread that even if you weren't working on a servlet.
There's nothing to stop you from using a ThreadPool from your Servlet; the challenge comes in getting the results. If the Servlet invocation expects some result from the Task it submitted to the ThreadPool, you will end up blocking while the ThreadPool work finishes, so that you can compose a response to the doGet/doPut invocation.
If, on the other hand, you devise your service such that a doPut, for example, submits a Task to a ThreadPool but gets back a "handle" or some other unique identifier of the Task returning that to the client, then the client can "poll" the handle through some doGet API to see if the task is done. When the task is done, the client can get the results.
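Here is a sketch of the handle-and-poll design, with the servlet wiring omitted. JobRegistry, submit and poll are hypothetical names. A random UUID string serves as the handle. poll returns an empty Optional while the task is still running (or the handle is unknown), and the result once the task has finished.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical registry shared by the "doPut" (submit) and "doGet" (poll) handlers.
final class JobRegistry<R> implements AutoCloseable {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<String, Future<R>> jobs = new ConcurrentHashMap<>();

    // doPut side: start the task and return an opaque handle to the client.
    String submit(Callable<R> task) {
        String handle = UUID.randomUUID().toString();
        jobs.put(handle, pool.submit(task));
        return handle;
    }

    // doGet side: empty while running or unknown; otherwise the result,
    // after which the entry is forgotten.
    Optional<R> poll(String handle) {
        Future<R> f = jobs.get(handle);
        if (f == null || !f.isDone()) return Optional.empty();
        jobs.remove(handle);
        try {
            return Optional.ofNullable(f.get()); // does not block: the task is done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        }
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

The request threads never wait on the work. A doGet that finds the task unfinished simply tells the client to try again later.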
It's completely fine and appropriate. I have done countless work with Servlets that use thread pools on different containers without any problems whatsoever.
EJB containers (like JBoss) tend to warn against spawning threads, but this is because EJB guarantees that an instance of a Bean is only called by one thread, and some of the facilities rely on this and thus you could mess that up by using your own threads. In Servlet there is no such reliance and hence nothing you can mess up this way.
Even in EJB containers, you can use thread pools and be fine as long as you don't interact with EJB facilities (for example, call EJBs) from your own threads.
The thing to watch out for with servlet/threads is that member variables of the servlet need to be thread safe.
Technically nothing stops you from using a thread pool in your servlet to do some post-processing, but you could shoot yourself in the foot: if you create a static thread pool with, say, 20 threads and 50 clients access your servlet concurrently, 30 clients will be waiting (for however long your post-processing takes).
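The queueing effect shows up in a scaled-down sketch. A fixed pool of 2 threads (standing in for 20) receives 5 blocking jobs (standing in for 50), and the 3 extra jobs wait in the pool's queue. The pool is configured the way Executors.newFixedThreadPool configures one (core size equals max size, unbounded LinkedBlockingQueue), but it is built directly so that its queue can be inspected. PoolSaturationDemo is a hypothetical name.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolSaturationDemo {
    // Returns how many jobs are queued while every worker thread is busy.
    static int queuedWhileBusy(int threads, int jobs) {
        // Same configuration that Executors.newFixedThreadPool(threads) uses.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < jobs; i++) {
                // Simulated slow post-processing: each job holds its thread until released.
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // All threads are occupied, so the remaining jobs sit in the queue.
            return pool.getQueue().size();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```

Each queued job corresponds to a client whose request is held up until a pool thread frees up.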

Concurrency : Handling multiple submits in a web application

This was a recent interview question put to my friend:
How would you handle a situation where users enter some data on the screen and, let's say, 5 of them click the Submit button at *the SAME time*?
(By same time, the interviewer insisted that they were the same down to the nanosecond.)
My answer was to make the method that handles the request synchronized, so that only one request can acquire the lock on the method at a given time.
But the interviewer kept insisting there was a "better way" to handle it.
Another approach is to handle locking at the database level, but I don't think it is "better".
Are there any other approaches? This seems to be a fairly common problem.
If you have only one network card, you can only have one request coming down it at once. ;)
The answer he is probably looking for is something like:
Make the servlet stateless so they can be executed concurrently.
Use components which allow thread safe concurrent access like Atomic* or Concurrent*
Use locks only where you absolutely have to.
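Here is a sketch of the first two points, using hypothetical names. The handler keeps no per-request data in fields. Its only shared state is a ConcurrentHashMap, whose merge() updates a single key atomically. Submits released at the same instant are therefore all counted without a synchronized method. This only covers state inside a single JVM.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical submit handler: no per-request data in fields; the only shared
// state is a ConcurrentHashMap, so no synchronized method is needed.
final class SubmitCounter {
    private final Map<String, Integer> countsByForm = new ConcurrentHashMap<>();

    void onSubmit(String formId) {
        countsByForm.merge(formId, 1, Integer::sum); // atomic read-modify-write for this key
    }

    int count(String formId) {
        return countsByForm.getOrDefault(formId, 0);
    }

    // Releases `users` submits at the same instant and returns how many were counted.
    static int simulate(int users) {
        SubmitCounter counter = new SubmitCounter();
        ExecutorService pool = Executors.newFixedThreadPool(users);
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(users);
        for (int i = 0; i < users; i++) {
            pool.execute(() -> {
                try {
                    start.await(); // every thread waits for the same "click"
                    counter.onSubmit("order-form");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        start.countDown(); // everyone submits now
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return counter.count("order-form");
    }
}
```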
What I prefer to do is to make the service so fast it can respond before the next request can come in. ;) Though I don't have the overhead of Java EE or databases to worry about.
Does it matter that they click at the same time e.g. are they both updating the same record on a database?
A synchronized method will not cut it, especially if it's a webapp distributed amongst multiple JVMs. Also the synchronized method may block, but then the other threads would just fire after the first completes and you'd have lost writes.
So locking at database level seems to be the option here i.e. if the record has been updated, report an error back to the users whose updates were serviced after the first.
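Database-level optimistic locking is commonly done with a version column, for example `UPDATE ... SET data = ?, version = version + 1 WHERE id = ? AND version = ?`, where zero updated rows means someone else updated the record first. The sketch below is an in-memory analogue of that check using AtomicReference.compareAndSet. It is not database code, and VersionedRecord and its methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// In-memory analogue of a versioned row: an update only succeeds if the
// record is still exactly the snapshot the caller read.
final class VersionedRecord {
    static final class Snapshot {
        final long version;
        final String data;

        Snapshot(long version, String data) {
            this.version = version;
            this.data = data;
        }
    }

    private final AtomicReference<Snapshot> current;

    VersionedRecord(String initial) {
        current = new AtomicReference<>(new Snapshot(0, initial));
    }

    Snapshot read() {
        return current.get();
    }

    // Returns false if someone else updated the record after `seen` was read;
    // the caller should report that conflict to the user instead of overwriting.
    boolean update(Snapshot seen, String newData) {
        return current.compareAndSet(seen, new Snapshot(seen.version + 1, newData));
    }
}
```

Two users who read the same version both try to update it. The first update succeeds, and the second one gets false and can be shown an error, so no write is silently lost.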
You do not have to worry about this, as the web server runs each request in its own thread and manages it.
But if you have a shared resource, such as a log file, then you need to control concurrent access to it, for example with a lock, both within a request and across requests.

Is it safe to pass around the context to multiple threads?

I'm implementing a service that does REST calls for multiple applications. The results of certain REST calls should be stored in a content provider.
I'm currently trying to use multiple threads that would do the HTTP request, parse the result, and store the data in a content provider. In order to do this, I must pass the Context to each of the threads. I'm not sure if this is a good idea, because I do not know whether the Context is OK to pass to multiple threads given its size, thread safety, etc. I'm thinking that I'm only passing a reference to the Context object to each thread, so maybe it's not heavy to pass it around?
Yes, this is fine. I don't believe that explicit synchronization is required, but many of the interesting things you can do with a Context must happen on the UI thread.
For this reason it is usually wise to do your HTTP request inside an AsyncTask, which will arrange to have your implementation of onPreExecute and onPostExecute run on the UI thread, as well as provide a nice interface for cancellation.
Java passes object references by value, so handing the Context to a thread only copies a reference; it's not "heavyweight".
However, you'll need to be careful that your access to members of Context is synchronized appropriately or else you will have thread safety issues.

java methods and race condition in a jsp/servlets application

Suppose that I have a method called doSomething() and I want to use this method in a multithreaded application (each servlet inherits from HttpServlet). I'm wondering whether a race condition can occur in the following cases:
doSomething() is not a static method and it writes values to a database.
doSomething() is a static method but it does not write values to a database.
What I have noticed is that many methods in my application may lead to race conditions or dirty reads/writes. For example, I have a poll system, and for each vote a method changes a single cell value for that poll, as follows:
[poll_id | poll_data ]
[1 | {choice_1 : 10, choice_2 : 20}]
Will the JSP/Servlet app solve these issues by itself, or do I have to solve them myself?
Thanks.
It depends on how doSomething() is implemented and what it actually does. I assume writing to the database uses JDBC connections, which are not threadsafe. The preferred way of doing that would be to create ThreadLocal JDBC connections.
As for the second case, it depends on what is going on in the method. If it doesn't access any shared, mutable state then there isn't a problem. If it does, you probably will need to lock appropriately, which may involve adding locks to every other access to those variables.
(Be aware that just marking these methods as synchronized does not by itself fix concurrency bugs. If doSomething() incremented a value on a shared object, then every access to that variable needs to be synchronized on the same lock, since i++ is not an atomic operation. If it is something as simple as incrementing a counter, you could use AtomicInteger.incrementAndGet().)
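The i++ point can be demonstrated with a small sketch (the class name is hypothetical). Several threads increment both a plain int field and an AtomicInteger. The atomic count always equals threads * iterations. The plain count is frequently lower because concurrent read-modify-write updates get lost, although how often that happens depends on timing and hardware.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class CounterRace {
    private int plain;                                   // plain++ is read, add, write: not atomic
    private final AtomicInteger atomic = new AtomicInteger();

    // Each of `threads` workers increments both counters `perThread` times.
    // Returns {plain count, atomic count}.
    static int[] run(int threads, int perThread) {
        CounterRace c = new CounterRace();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    c.plain++;                  // racy: concurrent updates can be lost
                    c.atomic.incrementAndGet(); // atomic: never loses an update
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { c.plain, c.atomic.get() };
    }
}
```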
The Servlet API certainly does not magically make concurrency a non-issue for you.
When writing to a database, it depends on the concurrency strategy in your persistence layer. Pessimistic locking, optimistic locking, last-in-wins? There's way more going on when you 'write to a database' that you need to decide how you're going to handle. What is it you want to have happen when two people click the button at the same time?
Making doSomething static doesn't seem to have too much bearing on the issue. What's happening in there is the relevant part. Is it modifying static variables? Then yes, there could be race conditions.
The servlet api will not do anything for you to make your concurrency problems disappear. Things like using the synchronized keyword on your servlets are a bad idea because you are basically forcing your threads to be processed one at a time and it ruins your ability to respond quickly to multiple users.
If you use Spring or EJB3, either one will provide thread-local database connections and the ability to specify transactions. You should definitely check out one of those.
Case 1: your servlet uses some code that accesses a database. Databases have locking mechanisms that you should exploit, for two important reasons. First, the database itself might be used by other applications that read and write that data, so it's not enough for your app to handle contention with itself. Second, your own application may be deployed to a scaled, clustered web container, where multiple copies of your code are executing on separate machines.
So, there are many standard patterns for dealing with locks in databases, you may need to read up on Pessimistic and Optimistic Locking.
The servlet API and JDBC connection pooling give you some helpful guarantees, so that you can write your servlet code without Java synchronisation provided your variables are in method scope. Conceptually you have:
Start transaction (perhaps implicit, perhaps on entry to an EJB)
Get connection to DB (gets you a connection from the pool, associated with your transaction)
Read/write/update code
Close connection (actually keeps it for your thread until your transaction commits)
Commit (again, maybe implicitly)
So your only real issue is dealing with any contention in the DB. All of the above tends to be done rather more nicely using things such as JPA these days, but under the covers that's more or less what's happening.
Case 2: a static method presumably implies that you now keep everything in an in-memory structure. This (barring remote invocation of some sort) implies a single JVM, with you managing your own locking. Should your JVM or machine crash, I guess you lose your data. If you care about your data, then using a DB is probably better.
Or, how about a completely different approach: the servlet simply records the "vote" by writing a message to a persistent JMS queue. Some other process picks up the votes from the queue and adds them up. You won't give immediate feedback to the voter this way, but you decouple the user's experience from the actual processing, which in similar scenarios can be quite complex.
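Here is an in-memory sketch of that decoupling. It assumes a LinkedBlockingQueue as a stand-in for the persistent JMS queue, so unlike JMS, any queued votes are lost if the JVM dies. A single consumer thread adds the votes up. VoteQueue and recordVote are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the JMS design: producers enqueue, one consumer tallies.
final class VoteQueue implements AutoCloseable {
    private static final String STOP = "\u0000stop"; // poison pill that ends the consumer
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Map<String, Integer> tally = new ConcurrentHashMap<>();
    private final Thread consumer = new Thread(this::consume, "vote-consumer");

    VoteQueue() {
        consumer.start();
    }

    // What the servlet would call: enqueue the vote and return immediately.
    void recordVote(String choice) {
        queue.add(choice);
    }

    // Runs on the consumer thread: takes votes one at a time and adds them up.
    private void consume() {
        try {
            for (String vote = queue.take(); !STOP.equals(vote); vote = queue.take()) {
                tally.merge(vote, 1, Integer::sum);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    int count(String choice) {
        return tally.getOrDefault(choice, 0);
    }

    // Lets the consumer drain everything queued so far, then waits for it to stop.
    @Override
    public void close() {
        queue.add(STOP);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because only the consumer thread updates the tallies, the request threads never contend over the poll's counts. They only hand off a message.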
I think that the best solution for your problem is to use something like the "synchronized" keyword and wait/notify!
