Under the IBM JVM we have hit an issue where multiple threads call Class.getAnnotation at the same time on different objects (but with the same annotation). The threads deadlock waiting on a monitor inside a Hashtable, which the IBM JVM uses as a cache for annotations. The weirdest thing is that the thread holding this monitor is put into the 'waiting on condition' state right inside Hashtable.get, making all other threads wait indefinitely.
IBM support stated that the implementation of Class.getAnnotation is not thread safe.
Comparing with other JVM implementations (for example, OpenJDK), we see that they implement Class methods in a thread-safe manner. The IBM JVM is closed source; IBM does publish some source code together with the JVM, but it's not enough to make a clear judgment on whether their implementation of Class is thread safe or not.
The Class documentation doesn't clearly state whether its methods are thread safe or not. So is it a safe assumption to treat Class methods (getAnnotation in particular) as thread safe, or must we use synchronized blocks in a multi-threaded environment?
How do popular frameworks (e.g. Hibernate) mitigate this problem? We haven't found any synchronization in the Hibernate code that uses the getAnnotation method.
Your problem might be related to a bug fixed in version 8 of Oracle Java.
One thread calls isAnnotationPresent on an annotated class where the
annotation is not yet initialised for its defining classloader. This
will result in a call on AnnotationType.getInstance, locking the class
object for sun.reflect.annotation.AnnotationType. getInstance will
result in a Class.initAnnotationsIfNecessary for that annotation,
trying to acquire a lock on the class object of that annotation.
In the meanwhile, another thread has requested Class.getAnnotations
for that annotation(!). Since getAnnotations locks the class object it
was requested on, the first thread can't lock it when it runs into
Class.initAnnotationsIfNecessary for that annotation. But the thread
holding the lock will try to acquire the lock for the class object of
sun.reflect.annotation.AnnotationType in AnnotationType.getInstance
which is hold by the first thread, thus resulting in the deadlock.
JDK-7122142 : (ann) Race condition between isAnnotationPresent and getAnnotations
Well, there is no specified behavior, so normally the correct way to deal with it would be to say “if no behavior is specified, assume no safety guarantees”.
But…
The problem here is that if these methods are not thread-safe, the specification lacks documentation of how to achieve thread-safety correctly. Recall that instances of java.lang.Class are visible across all threads of the entire application, or even across multiple applications if your JVM hosts multiple apps/applets/servlets/beans/etc.
So unlike classes you instantiate for your own use, where you can control access to the instances, you can’t preclude other threads from calling the same methods on a particular java.lang.Class instance. Even if we accept the very awkward idea of relying on some kind of convention for accessing such a global resource (e.g. saying “the caller has to do synchronized(x.class)”), the bigger problem is that no such convention exists (or it isn’t documented, which comes down to the same thing).
So in this special case, where no caller’s responsibility is documented and none can be established without such documentation, IBM is in charge of telling programmers how they think these methods should be used correctly when they are implemented in a non-thread-safe manner.
There is an alternative interpretation I want to add: all the information java.lang.Class offers is of a static, constant nature. This class reflects what has been invariably compiled into the class, and it has no methods to alter any state. So maybe there’s no additional thread-safety documentation because all information is to be considered immutable and hence naturally thread-safe.
Rather, the fact that under the hood some information is loaded on demand is an undocumented implementation detail that the programmer does not need to be aware of. So if JRE developers decide to implement lazy creation for efficiency, they must maintain the immutable-like behavior, i.e. thread safety.
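As a stopgap on a JVM where the lookup itself is suspect, one option is to route your own annotation lookups through a single lock and cache the results. The following is only a sketch: AnnotationLookup, Marker and Annotated are hypothetical names, it has not been tested against the IBM JVM, and it protects only callers that go through the helper (frameworks calling Class.getAnnotation directly are not covered).

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of any JVM or framework.
final class AnnotationLookup {
    private static final Object LOCK = new Object();
    // Key is the pair (inspected class, annotation type). Note that the
    // cache holds strong references to the classes it has seen.
    private static final Map<List<Class<?>>, Optional<Annotation>> CACHE =
            new ConcurrentHashMap<>();

    private AnnotationLookup() {}

    static <A extends Annotation> A find(Class<?> target, Class<A> type) {
        List<Class<?>> key = Arrays.asList(target, type);
        Optional<Annotation> cached = CACHE.get(key);
        if (cached == null) {
            synchronized (LOCK) {
                // Only one thread at a time reaches Class.getAnnotation here.
                cached = Optional.ofNullable(target.getAnnotation(type));
            }
            // Two threads may both compute the value; the results are equal.
            CACHE.put(key, cached);
        }
        return type.cast(cached.orElse(null));
    }
}

// Hypothetical annotation and annotated class, used only for the demo.
@Retention(RetentionPolicy.RUNTIME)
@interface Marker { String value(); }

@Marker("demo")
class Annotated {}
```

A call such as AnnotationLookup.find(Annotated.class, Marker.class) then replaces a direct Annotated.class.getAnnotation(Marker.class).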
This may be a stupid question. However, I would like to know if I have something like this - rdd.mapPartitions(func). Should the logic in func be threadsafe?
Thanks
The short answer is no, it does not have to be thread safe.
The reason for this is that Spark divides the data between partitions. It then creates a task for each partition, and the function you write runs within that specific partition as a single-threaded operation (i.e. no other thread accesses the same data).
That said, you have to make sure you do not create thread "unsafety" manually by accessing resources which are not the RDD data. For example, if you create a static object and access that, it might cause issues, as multiple tasks might run in the same executor (JVM) and access it as well. You shouldn't be doing something like that to begin with unless you know exactly what you are doing...
Any function passed to the mapPartitions (or any other action or transformation) has to be thread safe. Spark on JVM (this is not necessarily true for guest languages) uses executor threads and doesn't guarantee any isolation between individual tasks.
This is particularly important when you use resources which are not initialized in the function, but passed with the closure like for example objects initialized in the main function, but referenced in the function.
It goes without saying you should not modify any of the arguments unless it is explicitly allowed.
When you do "rdd.mapPartitions(func)", the func may actually execute in a different JVM! Threads have no meaning across JVMs.
If you are running in local mode and using global state or thread-unsafe functions, the job might work as expected, but the behaviour is not defined or supported.
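The two answers can be reconciled with a small sketch. It uses plain Java only (no Spark classes), with a thread pool standing in for executor threads in one JVM. PartitionDemo, sumPartition and RECORDS_SEEN are hypothetical names. State local to the function needs no locking, while anything shared across tasks has to be thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Plain-Java stand-in for executor threads; this is not Spark code.
final class PartitionDemo {
    // Shared by every task in this JVM, so it must be thread-safe.
    static final AtomicLong RECORDS_SEEN = new AtomicLong();

    // Hypothetical per-partition function.
    static long sumPartition(List<Integer> partition) {
        long sum = 0; // local variable: confined to the calling thread
        for (int value : partition) {
            sum += value;
            RECORDS_SEEN.incrementAndGet(); // shared state: atomic update
        }
        return sum;
    }

    // Runs one task per partition on a small pool and adds up the results.
    static long run(List<List<Integer>> partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (List<Integer> p : partitions) {
                results.add(pool.submit(() -> sumPartition(p)));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Replacing the AtomicLong with a plain static long would make the record count unreliable as soon as two partitions run at the same time, even though each partition's own sum stays correct.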
Recently I started reading 'Java 7 Concurrency Cookbook', and in the section "Creating and running a daemon thread" I found code where the main thread creates one instance of ArrayDeque and shares its reference with three producers and one consumer. The producers call deque.addFirst(event) and the consumer calls deque.getLast().
But JavaDoc of ArrayDeque clearly states that:
Array deques are not thread-safe; in the absence of external synchronization, they do not support concurrent access by multiple threads.
So I wonder whether it is a mistake or I just don't understand something?
Array deques are not thread safe, meaning you have to provide external synchronization.
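However, the reason it appears to work is, as Holger said:
You are using addFirst(e), which is an insert method and does change the underlying data structure.
You are using getLast(), which is an examine method and does not change the underlying data structure.
That is why it seems to work, but this is not guaranteed: without external synchronization the consumer may still see stale or inconsistent state. If you had used removeLast() instead of getLast(), both sides would be modifying the deque and failures would be far more likely. Note that ArrayDeque does not promise a ConcurrentModificationException in that case; only its iterators are fail-fast, and only on a best-effort basis.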
Hope this clears everything up. Cheers
The documentation clearly states that unless you provide external synchronization, ArrayDeque gives you no thread safety, unlike Vector, which synchronizes internally.
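If the producers and the consumer really must share one deque, a thread-safe implementation avoids the question altogether. Below is a minimal sketch using java.util.concurrent.LinkedBlockingDeque in place of ArrayDeque. DequeDemo and the event strings are hypothetical and not taken from the book.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

final class DequeDemo {
    // Three producers add at the head while the caller consumes from the tail.
    static int run(int perProducer) {
        BlockingDeque<String> deque = new LinkedBlockingDeque<>();
        int producerCount = 3;
        for (int i = 0; i < producerCount; i++) {
            final int id = i;
            new Thread(() -> {
                for (int n = 0; n < perProducer; n++) {
                    deque.addFirst("event-" + id + "-" + n);
                }
            }).start();
        }
        int consumed = 0;
        try {
            while (consumed < producerCount * perProducer) {
                deque.takeLast(); // blocks until an event is available
                consumed++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }
}
```

The alternative is to keep the ArrayDeque and guard every access, reads included, with the same lock.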
In Martin Odersky's talk (http://youtu.be/9PkxE_L_LMo), at the 49th minute he talks about the problem that "the compiler is not re-entrant" because of static data.
I have a basic understanding of what the term "reentrancy" means in Java (for example, if I invoke a synchronized method recursively then I won't get into a deadlock), but I still don't understand what Martin is talking about.
Why is it not possible to run two compilers in the same JVM if the code is written as shown at 49th minute of the talk?
What kind of reentrancy is he talking about?
I assume that when he refers to reentrancy at the 49th minute of his talk, he does not mean the kind of reentrancy that allows recursive synchronized method invocations in Java without deadlock. Am I right? I am not sure.
Is he simply referring to the fact that when one has mutable static data accessed from several threads then the program will work incorrectly due to race conditions?
Please enlighten me!
What kind of reentrancy is he talking about?
He's talking about the one computer science definition for re-entrant. Besides the wikipedia article, this IBM developerWorks article states it clearly:
A reentrant function is one that can be used by more than one task
concurrently without fear of data corruption. Conversely, a
non-reentrant function is one that cannot be shared by more than one
task unless mutual exclusion to the function is ensured either by
using a semaphore or by disabling interrupts during critical sections
of code. A reentrant function can be interrupted at any time and
resumed at a later time without loss of data. Reentrant functions
either use local variables or protect their data when global variables
are used.
Being that static variables are the object-oriented version of global variables, Odersky is talking about a compiler that doesn't protect its global variables.
Is he simply refering to the fact that when on [sic] has mutable static data
accessed from several threads then the program will work incorrectly
due to race conditions?
Essentially, yes. The compiler may not function correctly when invoked concurrently because it would mix information about multiple programs, resulting in possible corruption.
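A toy illustration of the difference, assuming nothing about the real compiler's code: StaticCompiler and InstanceCompiler are hypothetical classes that only collect symbol names. With static state, two compilations in the same JVM write into one table. With instance state, each run owns its table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy "compiler" with global state.
final class StaticCompiler {
    // One list per JVM: every compilation writes into it.
    static final List<String> SYMBOLS = new ArrayList<>();

    static void declare(String name) {
        SYMBOLS.add(name);
    }
}

// Hypothetical toy "compiler" with per-run state.
final class InstanceCompiler {
    // One list per compiler instance: runs cannot see each other's symbols.
    private final List<String> symbols = new ArrayList<>();

    void declare(String name) {
        symbols.add(name);
    }

    List<String> symbols() {
        return symbols;
    }
}
```

If program A declares "A.foo" and program B declares "B.bar", StaticCompiler.SYMBOLS ends up holding both, whereas two InstanceCompiler objects each hold only their own symbol. The static version mixes programs even on a single thread; concurrent use adds data races on top, since ArrayList is not thread-safe.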
I guess JAXB calls the zero-arg constructor and then starts filling the non-volatile fields and adding items to the lists.
In my own code, immediately after the unmarshalling, the generated beans are handed off to some worker threads via an add method, but not through the constructor or any other way that would trigger the memory model to flush and refetch the data to and from the shared area.
Is this safe? Or does JAXB do some magic trick behind the scenes? I can't think of any way in the Java programming language that could enforce everything being visible to all threads. Does the user of JAXB-generated beans have to worry about fields maybe not being visibly set in a concurrent setup?
Edit: Why are there so many downvotes? Nobody was yet able to explain how JAXB ensures this seemingly impossible task.
I won't bother to investigate the various "facts" in your question, I'll just paraphrase:
"Without references it ain't true!"
That said, anyone dealing with threads in Java these days will have to actually try to avoid establishing happens-before and happens-after relationships inadvertently. Any use of a volatile variable, a synchronized block, a Lock object or an atomic variable is bound to establish such a relationship. That immediately pulls in blocking queues, synchronized hash maps and a whole lot of other bits and pieces.
How are you so certain that the JAXB implementation actually manages to do the wrong thing?
That said, while objects obtained from JAXB are about as safe as any Java object once JAXB is done with them, the marshalling/unmarshalling methods themselves are not thread-safe. I believe that you do not have to worry unless:
Your threads share JAXB handler objects.
You are passing objects between your threads without synchronization: A decidedly unhealthy practice, regardless of where those objects came from...
EDIT:
Now that you have edited your question we can provide a more concrete answer:
JAXB-generated objects are as thread-safe as any other Java object, which is not at all. A direct constructor call offers no thread-safety on its own either. Without an established happens-before relationship, another thread that obtains the reference may observe a partially initialized object.
There are ways, namely via the use of final fields and immutable objects, to avoid this pitfall, but it is quite hard to get right, especially with JAXB, and it does not actually solve the issue of propagating the correct object reference so that all threads are looking at the same object.
Bottom line: it is up to you to move data among your threads safely, via the use of proper synchronization methods. Do not assume anything about the underlying implementation, except for what is clearly documented. Even then, it's better to play it safe and code defensively - it usually results in more clear interactions between the threads anyway. If at a later stage a profiler indicates a performance issue, then you should start thinking about fine-tuning your synchronization code.
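One common way to move such a bean to a worker safely is to hand it over through a java.util.concurrent.BlockingQueue, whose documentation guarantees that actions before placing an element happen-before actions after another thread removes it. The sketch below uses a hypothetical OrderBean as a stand-in for a JAXB-generated class; no JAXB API is involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Stand-in for a JAXB-generated bean: mutable, non-volatile fields.
final class OrderBean {
    String id;
    List<String> items = new ArrayList<>();
}

final class HandOffDemo {
    static String process() {
        BlockingQueue<OrderBean> queue = new LinkedBlockingQueue<>();
        String[] seen = new String[1];
        Thread worker = new Thread(() -> {
            try {
                // take() happens-after the put() of the same element.
                OrderBean bean = queue.take();
                seen[0] = bean.id + ":" + bean.items;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        // Imagine the unmarshaller filled the bean like this.
        OrderBean bean = new OrderBean();
        bean.id = "42";
        bean.items.add("widget");
        try {
            queue.put(bean); // publishes all writes made before this call
            worker.join();   // makes the worker's write to seen[0] visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }
}
```

The bean must not be modified by the producing thread after the put, otherwise the hand-off no longer covers those later writes.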
Suppose that I have a method called doSomething() and I want to use this method in a multithreaded application (each servlet inherits from HttpServlet). I'm wondering whether a race condition can occur in the following cases:
doSomething() is not a static method and it writes values to a database.
doSomething() is a static method but it does not write values to a database.
What I have noticed is that many methods in my application may lead to a race condition or a dirty read/write. For example, I have a poll system, and for each voting operation a certain method will change a single cell value for that poll, as follows:
[poll_id | poll_data ]
[1 | {choice_1 : 10, choice_2 : 20}]
Will the JSP/Servlets app solve these issues by itself, or do I have to solve all of that myself?
Thanks.
It depends on how doSomething() is implemented and what it actually does. I assume writing to the database uses JDBC connections, which are not threadsafe. The preferred way of doing that would be to create ThreadLocal JDBC connections.
As for the second case, it depends on what is going on in the method. If it doesn't access any shared, mutable state then there isn't a problem. If it does, you probably will need to lock appropriately, which may involve adding locks to every other access to those variables.
(Be aware that just marking these methods as synchronized does not by itself fix every concurrency bug. If doSomething() increments a value on a shared object, then all accesses to that variable need to be synchronized, since i++ is not an atomic operation. If it is something as simple as incrementing a counter, you could use AtomicInteger.incrementAndGet().)
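A minimal sketch of the counter case, with a hypothetical VoteCounter class: several threads increment one AtomicInteger, and no update is lost.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class VoteCounter {
    private final AtomicInteger votes = new AtomicInteger();

    // Atomic read-modify-write, unlike votes++ on a plain int field.
    int vote() {
        return votes.incrementAndGet();
    }

    int total() {
        return votes.get();
    }

    // Starts the given number of threads, each casting votesPerThread votes.
    static int simulate(int threads, int votesPerThread) {
        VoteCounter counter = new VoteCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < votesPerThread; n++) {
                    counter.vote();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.total();
    }
}
```

This only covers state held in one JVM's memory. Votes stored in a database still need the database's own concurrency control.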
The Servlet API certainly does not magically make concurrency a non-issue for you.
When writing to a database, it depends on the concurrency strategy in your persistence layer. Pessimistic locking, optimistic locking, last-in-wins? There's way more going on when you 'write to a database' that you need to decide how you're going to handle. What is it you want to have happen when two people click the button at the same time?
Making doSomething static doesn't seem to have too much bearing on the issue. What's happening in there is the relevant part. Is it modifying static variables? Then yes, there could be race conditions.
The Servlet API will not do anything to make your concurrency problems disappear. Things like using the synchronized keyword on your servlets are a bad idea, because you are basically forcing your requests to be processed one at a time, which ruins your ability to respond quickly to multiple users.
If you use Spring or EJB3, either one will provide threadlocal database connections and the ability to specify transactions. You should definitely check out one of those.
Case 1, your servlet uses some code that accesses a database. Databases have locking mechanisms that you should exploit. Two important reasons for this: the database itself might be used from other applications that read and write that data, it's not enough for your app to deal with contending with itself. And: your own application may be deployed to a scaled, clustered web container, where multiple copies of your code are executing on separate machines.
So, there are many standard patterns for dealing with locks in databases, you may need to read up on Pessimistic and Optimistic Locking.
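The optimistic idea can be sketched in memory: read the row together with its version, compute the update, and write it back only if the version is unchanged, retrying otherwise. In a real database this check is typically a version column tested in the UPDATE's WHERE clause. Here an AtomicReference stands in for the row, and PollRow and OptimisticPoll are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable snapshot of one poll row; 'version' plays the role of a
// version column in a real table.
final class PollRow {
    final int version;
    final int choice1;
    final int choice2;

    PollRow(int version, int choice1, int choice2) {
        this.version = version;
        this.choice1 = choice1;
        this.choice2 = choice2;
    }
}

final class OptimisticPoll {
    // Starts from the example data in the question: 10 and 20 votes.
    private final AtomicReference<PollRow> row =
            new AtomicReference<>(new PollRow(0, 10, 20));

    void voteForChoice1() {
        while (true) {
            PollRow current = row.get();
            PollRow updated = new PollRow(
                    current.version + 1, current.choice1 + 1, current.choice2);
            // Succeeds only if nobody replaced the row since we read it.
            if (row.compareAndSet(current, updated)) {
                return;
            }
            // Another voter won the race: re-read and try again.
        }
    }

    static PollRow simulate(int threads, int votesPerThread) {
        OptimisticPoll poll = new OptimisticPoll();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < votesPerThread; n++) {
                    poll.voteForChoice1();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return poll.row.get();
    }
}
```

Pessimistic locking takes the opposite approach: it locks the row before reading it, so that no retry is needed.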
The Servlet API and JDBC connection pooling give you some helpful guarantees, so you can write your servlet code without Java synchronisation provided your variables are in method scope. In concept you have:
Start transaction (perhaps implicit, perhaps on entry to an EJB)
Get connection to DB (gets you a connection from the pool, associated with your transaction)
read/write/update code
Close connection (actually keeps it for your thread until your transaction commits)
Commit (again, maybe implicitly)
So your only real issue is dealing with any contention in the DB. All of the above tends to be done rather more nicely using things such as JPA these days, but under the covers that's more or less what's happening.
Case 2: a static method. This presumably implies that you now keep everything in a memory structure. That (barring remote invocation of some sort) implies a single JVM and you managing your own locking. Should your JVM or machine crash, I guess you lose your data. If you care about your data, then using a DB is probably better.
Or how about a completely different approach: the servlet simply records the "vote" by writing a message to a persistent JMS queue. Have some other process pick up the votes from the queue and add them up. You won't give immediate feedback to the voter this way, but you decouple the user's experience from the actual (and, in similar scenarios, quite complex) processing.
I think that the best solution for your problem is to use something like the "synchronized" keyword and wait/notify!