Is JAXB safe for concurrent access (how is it done) - java

I guess JAXB calls the zero-arg constructor and then starts filling the non-volatile fields and adding elements to the lists.
In my own code, immediately after the unmarshalling, the generated beans are handed off to some worker threads through an add method, but not through a constructor or any other mechanism that would make the memory model flush the data to, and refetch it from, shared memory.
Is this safe? Or does JAXB do some magic trick behind the scenes? I can't think of any way in the java programming language that could enforce everything being visible for all threads. Does the user of JAXB generated beans have to worry about fields maybe not being visibly set in a concurrent setup?
Edit: Why are there so many downvotes? Nobody was yet able to explain how JAXB ensures this seemingly impossible task.

I won't bother to investigate the various "facts" in your question, I'll just paraphrase:
"Without references it ain't true!"
That said, anyone dealing with threads in Java these days will have to actually try to avoid establishing happens-before and happens-after relationships inadvertently. Any use of a volatile variable, a synchronized block, a Lock object or an atomic variable is bound to establish such a relationship. That immediately pulls in blocking queues, synchronized hash maps and a whole lot of other bits and pieces.
How are you so certain that the JAXB implementation actually manages to do the wrong thing?
That said, while objects obtained from JAXB are about as safe as any Java object once JAXB is done with them, the marshalling/unmarshalling methods themselves are not thread-safe. I believe that you do not have to worry unless:
Your threads share JAXB handler objects.
You are passing objects between your threads without synchronization: A decidedly unhealthy practice, regardless of where those objects came from...
EDIT:
Now that you have edited your question we can provide a more concrete answer:
JAXB-generated objects are as thread-safe as any other Java object, which is not at all. A direct constructor call offers no thread-safety on its own either. Without an established happens-before relationship, another thread that obtains the reference is free to see a partially initialized object, even though new has already returned in the creating thread.
There are ways, namely via the use of final fields and immutable objects, to avoid this pitfall, but it is quite hard to get right, especially with JAXB, and it does not actually solve the issue of propagating the correct object reference so that all threads are looking at the same object.
Bottom line: it is up to you to move data among your threads safely, via the use of proper synchronization methods. Do not assume anything about the underlying implementation, except for what is clearly documented. Even then, it's better to play it safe and code defensively - it usually results in more clear interactions between the threads anyway. If at a later stage a profiler indicates a performance issue, then you should start thinking about fine-tuning your synchronization code.
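As a minimal sketch of what "moving data safely" can look like: the class OrderBean below is a hypothetical stand-in for a JAXB-generated bean (plain, non-volatile fields), and the hand-off goes through a BlockingQueue. The java.util.concurrent collections document that actions before placing an object into the collection happen-before actions after its removal in another thread, so the worker sees the fully populated bean. The names and the single-worker setup are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a JAXB-generated bean: plain, non-volatile fields.
class OrderBean {
    String id;
    List<String> items = new ArrayList<>();
}

class SafeHandoff {
    // put() in one thread happens-before the matching take() in another.
    private static final BlockingQueue<OrderBean> QUEUE = new LinkedBlockingQueue<>();

    static int handOff() {
        // In real code this bean would be the result of unmarshalling.
        OrderBean bean = new OrderBean();
        bean.id = "A-1";
        bean.items.add("widget");
        bean.items.add("gadget");

        int[] seen = new int[1];
        Thread worker = new Thread(() -> {
            try {
                OrderBean received = QUEUE.take();
                seen[0] = received.items.size();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            QUEUE.put(bean);
            worker.join(); // join() makes the worker's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(handOff());
    }
}
```

The bean must not be modified by the producer after it has been put on the queue; the guarantee only covers writes made before the hand-off.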

Are java.lang.Class methods thread safe?

Under the IBM JVM we have faced an issue when multiple threads try to call Class.getAnnotation at the same time on different objects (but with the same annotation). Threads start to deadlock waiting on a monitor inside a Hashtable, which is used as a cache for annotations in the IBM JVM. The weirdest thing is that the thread holding this monitor is put into the 'waiting on condition' state right inside Hashtable.get, making all other threads wait indefinitely.
IBM support stated that the implementation of Class.getAnnotation is not thread safe.
Comparing to other JVM implementations (for example, OpenJDK), we see that they implement Class methods in a thread-safe manner. The IBM JVM is closed source; they do publish some source code together with their JVM, but it's not enough to make a clear judgment on whether their implementation of Class is thread safe or not.
The Class documentation doesn't clearly state whether its methods are thread safe or not. So is it a safe assumption to treat Class methods (getAnnotation in particular) as thread safe, or must we use synchronized blocks in a multi-threaded environment?
How do popular frameworks (e.g. Hibernate) mitigate this problem? We haven't found any use of synchronization in Hibernate code that calls the getAnnotation method.
Your problem might be related to a bug fixed in version 8 of Oracle Java.
One thread calls isAnnotationPresent on an annotated class where the
annotation is not yet initialised for its defining classloader. This
will result in a call on AnnotationType.getInstance, locking the class
object for sun.reflect.annotation.AnnotationType. getInstance will
result in a Class.initAnnotationsIfNecessary for that annotation,
trying to acquire a lock on the class object of that annotation.
In the meanwhile, another thread has requested Class.getAnnotations
for that annotation(!). Since getAnnotations locks the class object it
was requested on, the first thread can't lock it when it runs into
Class.initAnnotationsIfNecessary for that annotation. But the thread
holding the lock will try to acquire the lock for the class object of
sun.reflect.annotation.AnnotationType in AnnotationType.getInstance
which is hold by the first thread, thus resulting in the deadlock.
JDK-7122142 : (ann) Race condition between isAnnotationPresent and getAnnotations
Well, there is no specified behavior, so normally the correct way to deal with it would be to say “if no behavior is specified, assume no safety guarantees”.
But…
The problem here is that if these methods are not thread-safe, the specification lacks a documentation of how to achieve thread-safety correctly here. Recall that instances of java.lang.Class are visible across all threads of the entire application or even within multiple applications if your JVM hosts multiple apps/applets/servlets/beans/etc.
So unlike classes you instantiate for your own use where you can control access to these instances, you can’t preclude other threads from accessing the same methods of a particular java.lang.Class instance. So even if we engage with the very awkward concept of relying on some kind of convention for accessing such a global resource (e.g. like saying “the caller has to do synchronized(x.class)”), the problem here is, even bigger, that no such convention exists (well, or isn’t documented which comes down to the same).
So in this special case, where no caller's responsibility is documented and none can be established without such documentation, IBM is in charge of telling programmers how they think these methods should be used correctly when they are implemented in a non-thread-safe manner.
There is an alternative interpretation I want to add: all information that java.lang.Class offers is of a static, constant nature. This class reflects what has been invariably compiled into the class, and it has no methods to alter any state. So maybe there's no additional thread-safety documentation because all information is to be considered immutable and hence naturally thread-safe.
Rather, the fact that under the hood some information is loaded on demand is an undocumented implementation detail that the programmer does not need to be aware of. So if JRE developers decide to implement lazy creation for efficiency, they must maintain the immutable-like behavior, which means thread safety.
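If you nevertheless decide not to trust a particular JVM, one defensive workaround is to route your own lookups through a single lock. The sketch below is only an illustration under that assumption: GuardedAnnotations, Marker and Sample are made-up names, and the helper protects only calls that go through it. Frameworks that call getAnnotation directly bypass the lock, which is exactly the limitation described above.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical annotation and annotated class, used only for the demo.
@Retention(RetentionPolicy.RUNTIME)
@interface Marker {}

@Marker
class Sample {}

class GuardedAnnotations {
    // One private lock serializes every lookup made through this helper.
    private static final Object LOCK = new Object();

    static <A extends Annotation> A find(Class<?> type, Class<A> annotationType) {
        synchronized (LOCK) {
            return type.getAnnotation(annotationType);
        }
    }
}
```

Usage would be GuardedAnnotations.find(Sample.class, Marker.class) instead of Sample.class.getAnnotation(Marker.class).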

Java avoid race condition WITHOUT synchronized/lock

To avoid race conditions, we can synchronize the methods that write and read the shared variables, locking those variables against other threads.
My question is whether there are other (better) ways to avoid race conditions. Locks make the program slow.
What I found are:
using Atomic classes, if there is only one shared variable.
using an immutable container for multiple shared variables and declaring the reference to this container as volatile. (I found this method in the book "Java Concurrency in Practice".)
I'm not sure whether they perform faster than the synchronized way. Are there any other, better methods?
Thanks
Avoid state.
Make your application as stateless as it is possible.
Each thread (sequence of actions) should take a context in the beginning and use this context passing it from method to method as a parameter.
When this technique does not solve all your problems, use the Event-Driven mechanism (+Messaging Queue).
When your code has to share something with other components it throws event (message) to some kind of bus (topic, queue, whatever).
Components can register listeners to listen for events and react appropriately.
In this case there are no race conditions (except inserting events to the queue). If you are using ready-to-use queue and not coding it yourself it should be efficient enough.
Also, take a look at the Actors model.
Atomics are indeed more efficient than classic locks due to their non-blocking behavior i.e. a thread waiting to access the memory location will not be context switched, which saves a lot of time.
Probably the best guideline when synchronization is needed is to see how you can reduce the critical section size as much as possible. General ideas include:
Use read-write locks instead of full locks when only a part of the threads need to write.
Find ways to restructure code in order to reduce the size of critical sections.
Use atomics when updating a single variable.
Note that some algorithms and data structures that traditionally need locks have lock-free versions (they are more complicated however).
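To make the "non-blocking" point concrete, here is a small sketch of a lock-free update built on compareAndSet. MaxTracker is a made-up example class, not something from the question: it keeps the largest value offered by any thread and simply retries when another thread got in first, so no thread is ever parked waiting for a lock.

```java
import java.util.concurrent.atomic.AtomicInteger;

class MaxTracker {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    // Lock-free update: retry until our value is installed or is no longer needed.
    void offer(int candidate) {
        while (true) {
            int current = max.get();
            if (candidate <= current) {
                return; // someone already stored a value at least as large
            }
            if (max.compareAndSet(current, candidate)) {
                return; // we won the race
            }
            // CAS failed because another thread changed max; loop and re-check
        }
    }

    int get() {
        return max.get();
    }

    // Four threads offer the values 0..3999 between them.
    static int demo() {
        MaxTracker tracker = new MaxTracker();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int base = t * 1000;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    tracker.offer(base + i);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return tracker.get();
    }
}
```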
Well, first off, Atomic classes are not free of synchronization: they rely on volatile semantics and hardware compare-and-swap operations, so you still pay for coordination between threads, just as you would if you did it yourself by hand.
Second, immutability works great for multi-threading: you no longer need monitor locks and such, but that's because you can only read your immutables, you can't modify them.
You can't get rid of synchronized/volatile if you want to avoid race conditions in a multithreaded Java program (i.e. if multiple threads can read AND WRITE the same data). Your best bet, if you want better performance, is to avoid at least some of the built-in thread-safe classes, which do a more generic kind of locking, and write your own implementation that is more closely tied to your context and thus might allow you to use more granular synchronization and lock acquisition.
Check out this implementation of BlockingCache done by the Ehcache guys;
http://www.massapi.com/source/ehcache-2.4.3/src/net/sf/ehcache/constructs/blocking/BlockingCache.java.html
One of the alternatives is to make shared objects immutable. Check out this post for more details.
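A rough sketch of the immutable-holder idea (the same pattern the question attributes to "Java Concurrency in Practice"): Range and RangeHolder are hypothetical names. Several related values live in one immutable object, and a single volatile reference publishes the whole object at once, so readers never see a half-updated pair.

```java
// Immutable value object: both fields are final and set once in the constructor.
final class Range {
    final int lower;
    final int upper;

    Range(int lower, int upper) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower > upper");
        }
        this.lower = lower;
        this.upper = upper;
    }
}

class RangeHolder {
    // The volatile write publishes a completely constructed Range to all readers.
    private volatile Range range = new Range(0, 10);

    Range current() {
        return range;
    }

    // Assumption: the new range does not depend on the old one. A read-modify-write
    // by several writers would still need a lock or an AtomicReference CAS loop.
    void update(int lower, int upper) {
        range = new Range(lower, upper);
    }
}
```

Readers should call current() once and use the returned object for both fields, rather than calling it twice.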
You can perform up to 50 million lock/unlocks per second. If you want this to be more efficient, I suggest using more coarse-grained locking, i.e. don't lock every little thing, but have locks for larger objects. Once you have many more locks than threads, you are already unlikely to have contention, and having even more locks may just add overhead.

I'm pretty sure finalize is still bad news on later JVMs--is there an alternative?

I would like to implement an ORM-style system that can save updates to POJOs when they are no longer reachable by the caller.
I thought the reference classes could do it, but they seem to only enqueue the reference after the referent has been cleared (I was hoping it happened when the object became eligible for collection), so once enqueued the .get() method will always return null.
I could use a finalizer, but last time I checked those were questionable (not guaranteed to run promptly, or to run at all). I believe a combination of finalizers and a shutdown hook (Runtime.addShutdownHook()) would work, but that's getting into fairly swampy territory.
Is there another path I'm not thinking of besides the obligatory "Just have the caller call .save() when he's done"?
Are you just trying to avoid having to call save() on every POJO that you modify?
This can be done reliably using a persistence session object, like this:
Open a new session object.
Load objects via the session object. The session object maintains references to all the objects it loads.
Make any changes to the loaded objects. It is not necessary to call a save method on updated objects.
Close the session object. The session saves all of its objects. It might even be fancy enough to keep a copy of clean loaded data, compare all of its objects to the clean data, and save only the ones that have been modified.
And if you don't want to pass session objects through your code, you can take things a step further with the Unit of Work pattern, associating a session object to the current thread:
Start a unit of work. This creates a session object behind the scenes and associates it with the current thread.
Load objects. Whenever an object is loaded, your ORM automatically associates it with a session object based on the current thread.
Make any changes to the loaded objects. It is not necessary to call a save method on updated objects.
Complete the unit of work. This closes the session object, saving all the objects.
This fixes several problems with a reachability based solution:
You are not relying on nondeterministic garbage collections, which may have a long time between runs, or not run at all.
All objects modified in one operation are saved together. If you rely on reachability, different objects modified in the same operation can become unreachable at different times, meaning your modifications can be saved to the database in bits-and-pieces.
Rollback is much easier - just give your session object a rollback() method. With a reachability solution, you would need to remember to call rollback() on every modified POJO if an operation fails, which is really the same as your original problem.
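The unit-of-work steps described above could be sketched roughly as follows. Everything here is hypothetical: Customer is a made-up POJO, Session.STORE is an in-memory map standing in for the database, and the session saves every loaded object on close rather than comparing against clean copies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical POJO managed by the session.
class Customer {
    final int id;
    String name;

    Customer(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

class Session {
    // In-memory stand-in for a database table (id -> name).
    static final Map<Integer, String> STORE = Collections.synchronizedMap(new HashMap<>());

    // The session keeps a reference to everything it loads.
    private final List<Customer> loaded = new ArrayList<>();

    Customer load(int id) {
        Customer customer = new Customer(id, STORE.get(id));
        loaded.add(customer);
        return customer;
    }

    // Saves all loaded objects; no dirty checking in this sketch.
    void close() {
        for (Customer customer : loaded) {
            STORE.put(customer.id, customer.name);
        }
        loaded.clear();
    }

    // Discards pending changes by forgetting the loaded objects.
    void rollback() {
        loaded.clear();
    }
}

class UnitOfWork {
    // One session per thread, so callers never pass the session around.
    private static final ThreadLocal<Session> CURRENT = new ThreadLocal<>();

    static void start() {
        CURRENT.set(new Session());
    }

    static Customer load(int id) {
        return current().load(id);
    }

    static void complete() {
        current().close();
        CURRENT.remove();
    }

    static void rollback() {
        current().rollback();
        CURRENT.remove();
    }

    private static Session current() {
        Session session = CURRENT.get();
        if (session == null) {
            throw new IllegalStateException("no unit of work started on this thread");
        }
        return session;
    }
}
```

A caller would do UnitOfWork.start(), load and modify objects, and then call UnitOfWork.complete() (or rollback() on failure), typically in a try/finally block.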
Perhaps see http://nhibernate.info/doc/patternsandpractices/nhibernate-and-the-unit-of-work-pattern.html or research the Unit of Work pattern and emulate some of those ideas.
Use the Observer pattern to build a ClearanceManager and some Destroyables.
IDestroyable is an interface used for the observers; it contains the method public void destroy().
The ClearanceManager is the subject of the Observer pattern. Maybe use a Singleton here to ensure you have just one ClearanceManager object in your application.
Use a Set internally inside the ClearanceManager (not a List, to ensure objects can be added only once).
Support an addDestroyable(IDestroyable destroyable) method (and maybe a removeDestroyable one).
At runtime, the classes for which you need some destructor emulation can register themselves with the ClearanceManager: ClearanceManager.getInstance().addDestroyable(this);
The ClearanceManager has a doClearance() method, which should just be called at the end of the main method. It iterates through the private Set and calls destroy() on every IDestroyable object.
Doing it this way you can emulate destructors without using them, because when using destructors you lose control over the existence of objects that may still be needed.
When you override finalize, you do not know when it is called.
Maybe, if you do not want to call doClearance() in your main method, you can use here, but just here, a real destructor, finalize(). Because there are references in the ClearanceManager to the needed objects, they will not be destroyed first. But if there are cross references, better not to use finalize; use doClearance() and have fun with it :)
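A compact sketch of the ClearanceManager described above, assuming an eagerly created singleton and a LinkedHashSet so that destroy() is called in registration order (the ordering is my assumption, the description only asks for a Set):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Observer interface: anything that needs destructor-like cleanup.
interface IDestroyable {
    void destroy();
}

// Subject of the Observer pattern, implemented as a singleton.
final class ClearanceManager {
    private static final ClearanceManager INSTANCE = new ClearanceManager();

    // A Set, so the same object cannot be registered twice.
    private final Set<IDestroyable> destroyables = new LinkedHashSet<>();

    private ClearanceManager() {}

    static ClearanceManager getInstance() {
        return INSTANCE;
    }

    synchronized void addDestroyable(IDestroyable destroyable) {
        destroyables.add(destroyable);
    }

    synchronized void removeDestroyable(IDestroyable destroyable) {
        destroyables.remove(destroyable);
    }

    // Call once at the end of main(): destroys every registered object.
    synchronized void doClearance() {
        for (IDestroyable destroyable : destroyables) {
            destroyable.destroy();
        }
        destroyables.clear();
    }
}
```

Note that the manager holds strong references, so registered objects stay alive until doClearance() or removeDestroyable() is called.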
I think you are barking up the wrong tree here.
All of Java's finalizer and Reference mechanisms based on reachability depend on the garbage collector to determine whether the respective objects are reachable. So if you use any of the Reference mechanisms for some kind of finalization, you run into much the same issues that make finalize a bad idea.
It is technically possible to implement your own mechanisms for doing reachability; e.g. by implementing your own application-specific reference counting. However, it is likely to be expensive, fragile, and make your code look horrible. (Reference counting in Java is likely to be messier and more fragile than in C++, because you can't overload reference assignment operators to ensure that reference counts are adjusted transparently. So every reference assignment needs to be wrapped in a method call.) So I'd say that doing your own reachability analysis is a bad idea.
So, to be practical you need to either:
rethink your design so that you don't do things based on reachability, or
live with the consequences of using finalize.
The first option is clearly the best, IMO.
Maybe you can subclass PhantomReference and store in it the data needed for the final actions.
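A sketch of what that might look like, with made-up names (SaveOnCollect, pendingState). The stored data has to be a copy that does not reference the POJO itself, otherwise the POJO never becomes unreachable, and the reference objects must be kept strongly reachable somewhere until they are processed. This still depends on the garbage collector's timing, so the caveats about reachability-based designs apply unchanged.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class SaveOnCollect extends PhantomReference<Object> {
    // Copy of the data needed for the final action; must not reference the referent.
    final String pendingState;

    SaveOnCollect(Object referent, String pendingState, ReferenceQueue<Object> queue) {
        super(referent, queue);
        this.pendingState = pendingState;
    }

    // Processes references the collector has enqueued so far; returns how many were handled.
    static int drain(ReferenceQueue<Object> queue) {
        int processed = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            if (ref instanceof SaveOnCollect) {
                // Placeholder for the real save action.
                System.out.println("saving " + ((SaveOnCollect) ref).pendingState);
                processed++;
            }
        }
        return processed;
    }
}
```

Because get() on a phantom reference always returns null, the payload field is the only way to reach the data after the object is gone.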

if multiple threads are updating the same variable, what should be done so each thread updates the variable correctly?

If multiple threads are updating the same variable, what should I do so each thread updates the variable correctly?
Any help would be greatly appreciated
There are several options:
1) Using no synchronization at all
This can only work if the data is of primitive type (not long/double), and you don't care about reading stale values (which is unlikely)
2) Declaring the field as volatile
This will guarantee that stale values are never read. It also works fine for objects (assuming the objects aren't changed after creation), because of the happens-before guarantees of volatile variables (see "Java Memory Model"). Note that volatile alone does not make compound updates such as count++ atomic.
3) Using java.util.concurrent.AtomicLong, AtomicInteger etc
They are all thread safe, and support special operations like atomic incrementation and atomic compare-and-set operations.
4) Protecting reads and writes with the same lock
This approach provides mutual exclusion, which allows defining a large atomic operation, where multiple data members are manipulated as a single operation.
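To illustrate options 3 and 4 side by side, here is a small sketch. Stats is a hypothetical class: hits is a single counter updated with AtomicLong, while count and sum must change together and are therefore guarded by one lock for both reads and writes.

```java
import java.util.concurrent.atomic.AtomicLong;

class Stats {
    // Option 3: a single variable updated atomically, no lock needed.
    private final AtomicLong hits = new AtomicLong();

    // Option 4: two fields that must always change together.
    private final Object lock = new Object();
    private long count;
    private long sum;

    void hit() {
        hits.incrementAndGet();
    }

    void record(long value) {
        synchronized (lock) {
            count++;
            sum += value;
        }
    }

    // Reads use the same lock as writes, so count and sum are always consistent.
    long[] read() {
        synchronized (lock) {
            return new long[] {hits.get(), count, sum};
        }
    }

    // Four threads, 10,000 updates each.
    static long[] demo() {
        Stats stats = new Stats();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    stats.hit();
                    stats.record(2);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return stats.read();
    }
}
```

With a plain long and count++ instead, some of the 40,000 updates would typically be lost.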
This is a major problem with multi-threaded applications, and spans more than I could really cover in an answer, so I'll point you to some resources.
http://download.oracle.com/javase/tutorial/essential/concurrency/sync.html
http://www.vogella.de/articles/JavaConcurrency/article.html#concurrencyjava_synchronized
Essentially, you use the synchronized keyword to guard the code that accesses a variable with a lock. This makes sure that the piece of code is run by only one thread at a time. You can also synchronize on the same object in multiple places.
Additionally, you need to look out for several pitfalls, such as Deadlock.
http://tutorials.jenkov.com/java-concurrency/deadlock.html
Errors caused by misuse of locks are often very difficult to debug and track down, because they aren't very consistent. So, you always need to be careful that you put all of your locks in the correct location.
You should implement locking on the variable in question.
Eg.
http://download.oracle.com/javase/tutorial/essential/concurrency/newlocks.html

How can I perform a threadsafe point-in-time snapshot of a key-value map in Java?

In my application, I have a key-value map that serves as a central repository for storing data that is used to return to a defined state after a crash or restart (checkpointing).
The application is multithreaded and several threads may put key-value pairs into that map. One thread is responsible for regularly creating a checkpoint, i.e. serializing the map to persistent storage.
While the checkpoint is being written, the map should remain unchanged. It's rather easy to avoid new items being added, but what about other threads changing members of "their" objects inside the map?
I could have a single object whose monitor is seized when the checkpointing starts and wrap all write access to any member of the map, and members thereof, in blocks synchronizing on that object. This seems very error-prone and tedious to me.
I could also make the map private to the checkpointer and only put copies of the submitted objects in it. But then I would have to ensure that the copies are deep copies, and the data in the map would no longer be updated automatically: on every change to the submitted objects, the submitters would have to re-submit them. This seems like a lot of overhead and is also error-prone, as I have to remember to put resubmit code in all the right places.
What's an elegant and reliable way to solve this?
what about other threads changing members of "their" objects inside the map
Here you have a problem :) and it cannot be solved by any kind of Map...
One solution would be to allow only immutable objects in your Map, but this may be impossible for you.
Otherwise you have to share a lock with all threads that may change data referenced by your map and block them all during your snapshot; but this is a stop-the-world approach...
pgras is right that immutability would fix things, but that would also be tough. You could just lock the whole thing but that could be a performance problem. I can think of two good ideas.
First is to use a ReadWriteLock (which requires 1.5 or newer). Since your checkpoint can acquire the read lock it can be assured things are safe, but when no one is reading performance should be pretty good. This is still a pretty coarse lock, so you may also want to do #2...
Second is to break things up. Each area of the program could keep its own map (the map for GUI stuff, the map for user settings, the map for hardware settings, whatever). Each one would have a lock on it and things would go about as usual. When it came time to checkpoint, the checkpointer would grab ALL the locks (so things are consistent) and then do its job. The catch here is that you have to define an order for the locks to be grabbed in (say alphabetical), otherwise you'll end up with deadlocks.
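The second idea could look roughly like this. AreaStore and Checkpointer are made-up names; areas are kept in a TreeMap so that iterating them yields the alphabetical lock order, and I assume all areas are registered before the worker threads start. The copies are shallow, so this only gives a true snapshot if the values are immutable (Strings here) or are deep-copied.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

// One map per program area, each guarded by its own lock.
class AreaStore {
    final ReentrantLock lock = new ReentrantLock();
    final Map<String, String> data = new HashMap<>();

    void put(String key, String value) {
        lock.lock();
        try {
            data.put(key, value);
        } finally {
            lock.unlock();
        }
    }
}

class Checkpointer {
    // TreeMap iterates in alphabetical order of area name: this is the fixed lock order.
    private final TreeMap<String, AreaStore> areas = new TreeMap<>();

    // Assumed to be called during start-up, before any worker threads run.
    void register(String name, AreaStore store) {
        areas.put(name, store);
    }

    Map<String, Map<String, String>> snapshot() {
        List<AreaStore> locked = new ArrayList<>();
        try {
            // Grab ALL locks, always in the same order.
            for (AreaStore store : areas.values()) {
                store.lock.lock();
                locked.add(store);
            }
            Map<String, Map<String, String>> copy = new TreeMap<>();
            for (Map.Entry<String, AreaStore> entry : areas.entrySet()) {
                copy.put(entry.getKey(), new HashMap<>(entry.getValue().data));
            }
            return copy;
        } finally {
            // Release in reverse order, even if copying failed.
            for (int i = locked.size() - 1; i >= 0; i--) {
                locked.get(i).lock.unlock();
            }
        }
    }
}
```

The ordering matters as soon as any other code path also takes more than one of these locks; it must use the same alphabetical order.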
If the maps are orthogonal to each other (updates to one don't require updates to another to be consistent) then the easiest thing may be to push the updates to a central "backup" map in the checkpointer, not unlike something you described.
My biggest question to you would be, how much of a problem is this (performance wise)? Are updates very frequent, or are they rare? That would help to advise on something since my last idea (previous paragraph) could be slow, but it's easy and may not matter.
There is a fantastic book called Java Concurrency in Practice which is basically the Java threading bible. It discusses how to figure out this kind of stuff and strategies to avoid problems or make solving them easier. If you are going to be doing more threading, it's a very useful read.
Actually, if your key values are orthogonal to each other, then things are really easy. The ConcurrentMap interface (there are implementations such as ConcurrentHashMap) would solve your problems, since it can apply changes atomically, so readers wouldn't see inconsistent data. But if you have any two (or more) keys that must be updated at the same time, this won't cover you.
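For the orthogonal-keys case, a minimal sketch of an atomic per-key update might look like this. Progress and the key name "processed" are invented for the example; merge() performs the read-modify-write for one key atomically on a ConcurrentHashMap, and the values are immutable Integers, so a reader sees either the old or the new value for that key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class Progress {
    // Four threads each add 1,000 to the same key.
    static int demo() {
        ConcurrentMap<String, Integer> state = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    // Atomic per-key update: no increments are lost.
                    state.merge("processed", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return state.get("processed");
    }
}
```

Copying or iterating such a map while writers are active is only weakly consistent across keys, which is the limitation mentioned above for keys that must change together.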
I hope this helps. Threading access to shared data structures is complex stuff.