Java multi-threading & Safe Publication [closed] - java

After reading "Java Concurrency in Practice" and "OSGi in Practice" I found a specific subject very interesting: safe publication. The following is from JCIP:
To publish an object safely, both the reference to the object and the object's state must be made visible to other threads at the same time. A properly constructed object can be safely published by:
Initializing an object reference from a static initializer.
Storing a reference to it into a volatile field.
Storing a reference to it into a final field.
Storing a reference to it into a field that is properly guarded by a (synchronized) lock.
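The idioms above can be sketched in code. This is an illustrative example (the class names are made up, not from JCIP): a holder published through a plain static field versus a volatile one.

```java
// Deliberately non-final field, so publication safety depends entirely
// on HOW the Holder reference itself is published.
class Holder {
    int value;
    Holder(int value) { this.value = value; }
}

class Publisher {
    static Holder unsafeInstance;            // plain field: publication is NOT safe
    static volatile Holder safeInstance;     // volatile field: publication IS safe
    static final Holder EAGER = new Holder(42); // static initializer: also safe

    static void publish() {
        // Another thread may (in theory) see a non-null unsafeInstance
        // whose value field still reads 0, because the reference store can
        // become visible before the constructor's writes.
        unsafeInstance = new Holder(42);
        // The volatile write guarantees that any thread seeing the
        // reference also sees the fully constructed object: 42 or null, never 0.
        safeInstance = new Holder(42);
    }
}
```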
My first question: how many Java developers are aware of this problem?
How many real-world Java applications really follow this, AND is this really a real problem? I have a feeling that 99% of the JVM implementations out there are not that "evil", i.e. in practice it is (almost) impossible for a thread to see stale data just because the reference does not follow the "safe publication" idiom above.

Proportionally, it's probably fair to say that very few programmers sufficiently understand synchronization and concurrency. Who knows how many server applications there are out there right now managing financial transactions, medical records, police records, telephony and so on that are full of synchronization bugs and essentially work by accident, or very, very occasionally fail (ever heard of anybody getting a phantom phone call added to their telephone bill?) for reasons that are never really looked into or gotten to the bottom of.
Object publication is a particular problem because it's often overlooked, and it's a place where it's quite reasonable for compilers to make optimisations that could result in unexpected behaviour if you don't know about it: in the JIT-compiled code, storing a pointer, then incrementing it and storing the data is a very reasonable thing to do. You might think it's "evil", but at a low level, it's really how you'd expect the JVM spec to be. (Incidentally, I've heard of real-life programs running in JRockit suffering from this problem; it's not purely theoretical.)
If you know that your application has synchronization bugs but isn't misbehaving in your current JVM on your current hardware, then (a) congratulations; and (b), now is the time to start "walking calmly towards the fire exit", fixing your code and educating your programmers before you need to upgrade too many components.

"is this really a real problem?"
Yes absolutely. Even the most trivial web application has to confront issues surrounding concurrency. Servlets are accessed by multiple threads, for example.
The other issue is that threading and concurrency are very hard to handle correctly. It is almost too hard. That is why we are seeing trends emerge like transactional memory, and languages like Clojure that hopefully make concurrency easier to deal with. But we have a way to go before these become mainstream. Thus we have to do the best with what we have. Reading JCiP is a very good start.

Firstly, "safe publication" is not really an idiom (IMO). It comes straight from the language.
There have been cases of problems with unsafe publication, with use of NIO for instance.
Most Java code is very badly written. Threaded code is obviously more difficult than average line-of-business code.

It's not a matter of being "evil". It is a real problem, and will become much more apparent with the rise of multicore architectures in the coming years. I have seen very real production bugs due to improper synchronization. And to answer your other question, I would say that very few programmers are aware of the issue, even among otherwise "good" developers.

I would say very few programmers are aware of this issue. When was the last time you saw a code example that used the volatile keyword? Most of the other conditions mentioned, however, I just took for granted as best practices.
If a developer completely neglects those conditions, they will quickly encounter multi-threading errors.

My experience (short-term contracting and consulting in lots of different kinds of environments) agrees with this intuition. I've never seen an entire system clearly architected to manage this problem carefully (well, I've also almost never seen an entire system clearly architected). I've worked with very, very few developers with a good knowledge of threading issues.
Especially with web apps, you can often get away with this, or at least seem to get away with it. If you have Spring-based instantiation managing your object creation and stateless servlets, you can often pretend that there's no such thing as synchronization, and this is sort of where lots of applications end up. Eventually someone starts putting some shared state where it doesn't belong, and 3 months later someone notices some weird intermittent errors. This is often "good enough" for many people (as long as you're not writing banking transactions).
How many Java developers are aware of this problem? Hard to say, as it depends heavily on where you work.


What are memory fences used for in Java?

Whilst trying to understand how SubmissionPublisher (source code in OpenJDK 10, Javadoc), a new class added to Java SE in version 9, has been implemented, I stumbled across a few API calls to VarHandle I wasn't previously aware of:
fullFence, acquireFence, releaseFence, loadLoadFence and storeStoreFence.
After doing some research, especially regarding the concept of memory barriers/fences (I have heard of them previously, yes; but never used them, thus was quite unfamiliar with their semantics), I think I have a basic understanding of what they are for. Nonetheless, as my questions might arise from a misconception, I want to ensure that I got it right in the first place:
Memory barriers are reordering constraints regarding reading and writing operations.
Memory barriers can be categorized into two main categories: unidirectional and bidirectional memory barriers, depending on whether they set constraints on either reads or writes or both.
C++ supports a variety of memory barriers; however, these do not match up with those provided by VarHandle. Some of the memory barriers available in VarHandle do, though, provide ordering effects that are compatible with their corresponding C++ memory barriers.
#fullFence is compatible with atomic_thread_fence(memory_order_seq_cst)
#acquireFence is compatible with atomic_thread_fence(memory_order_acquire)
#releaseFence is compatible with atomic_thread_fence(memory_order_release)
#loadLoadFence and #storeStoreFence have no compatible C++ counterpart
The word compatible seems to be really important here, since the semantics clearly differ when it comes to the details. For instance, all C++ barriers are bidirectional, whereas Java's barriers aren't (necessarily).
Most memory barriers also have synchronization effects. Those depend especially upon the barrier type used and on previously-executed barrier instructions in other threads. As the full implications a barrier instruction has are hardware-specific, I'll stick with the higher-level (C++) barriers. In C++, for instance, changes made prior to a release barrier instruction are visible to a thread executing an acquire barrier instruction.
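To make the release/acquire pairing concrete, here is a minimal sketch (the class and field names are my own invention) of hand-rolled publication using the VarHandle fences in question instead of a volatile field:

```java
import java.lang.invoke.VarHandle;

// A release fence on the writer side paired with an acquire fence on the
// reader side: if the reader observes ready == true, it is guaranteed to
// observe the write to data that preceded the release fence.
class FencedPublish {
    static int data;
    static boolean ready;

    static void writer() {
        data = 42;
        VarHandle.releaseFence();  // earlier writes may not reorder past this
        ready = true;
    }

    static Integer reader() {
        boolean r = ready;
        VarHandle.acquireFence();  // later reads may not reorder before this
        return r ? data : null;    // if r is true, data is guaranteed to be 42
    }
}
```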
Are my assumptions correct? If so, my resulting questions are:
Do the memory barriers available in VarHandle cause any kind of memory synchronization?
Regardless of whether they cause memory synchronization or not, what might reordering constraints be useful for in Java? The Java Memory Model already gives some very strong guarantees regarding ordering when volatile fields, locks, or VarHandle operations like #compareAndSet are involved.
In case you're looking for an example: the aforementioned BufferedSubscription, an inner class of SubmissionPublisher (source linked above), establishes a full fence in line 1079, in the function growAndAdd. However, it is unclear to me what it is there for.
This is mainly a non-answer, really (initially I wanted to make it a comment, but as you can see, it's far too long). It's just that I questioned this myself a lot, did a lot of reading and research, and at this point in time I can safely say: this is complicated. I even wrote multiple tests with jcstress to figure out how they really work (while looking at the assembly code generated), and while some of them somehow made sense, the subject in general is by no means easy.
The very first thing you need to understand:
The Java Language Specification (JLS) does not mention barriers anywhere. This, for Java, would be an implementation detail: it really acts in terms of happens-before semantics. To be able to properly specify these according to the JMM (Java Memory Model), the JMM would have to change quite a lot.
This is work in progress.
Second, if you really want to scratch the surface here, this is the very first thing to watch. The talk is incredible. My favorite part is when Herb Sutter raises his 5 fingers and says, "This is how many people can really and correctly work with these." That should give you a hint of the complexity involved. Nevertheless, there are some trivial examples that are easy to grasp (like a counter updated by multiple threads that does not care about other memory guarantees, but only cares that it is itself incremented correctly).
Another example is when (in java) you want a volatile flag to control threads to stop/start. You know, the classical:
volatile boolean stop = false; // one thread writes, one thread reads this
If you work with Java, you would know that without volatile this code is broken (you can read about why double-checked locking is broken without it, for example). But did you also know that for some people who write high-performance code this is too much? A volatile read/write also guarantees sequential consistency; that implies some strong guarantees, and some people want a weaker version of this.
A thread-safe flag, but not volatile? Yes, exactly: VarHandle::set/getOpaque.
And why might someone need that? Not everyone is interested in all the ordering guarantees that are piggy-backed onto a volatile.
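As a sketch of what that looks like in practice (the Worker class here is made up for illustration), a stop flag accessed in opaque mode: the write is guaranteed to become visible eventually, but without volatile's full ordering guarantees.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class Worker {
    private boolean stop;  // accessed only through the VarHandle below
    private static final VarHandle STOP;
    static {
        try {
            STOP = MethodHandles.lookup()
                    .findVarHandle(Worker.class, "stop", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void requestStop()      { STOP.setOpaque(this, true); }
    boolean stopRequested() { return (boolean) STOP.getOpaque(this); }
}
```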
Let's see how we can achieve this in Java. First of all, such exotic things already existed in the API: AtomicInteger::lazySet. This is unspecified in the Java Memory Model and has no clear definition; still, people used it (LMAX, afaik, or this for more reading). IMHO, AtomicInteger::lazySet is VarHandle::releaseFence (or VarHandle::storeStoreFence).
Let's try to answer why someone needs these.
The JMM has basically two ways to access a field: plain and volatile (which guarantees sequential consistency). All these methods that you mention are there to bring something in between these two: release/acquire semantics. There are cases, I guess, where people actually need this.
An even further relaxation beyond release/acquire is opaque, which I am still trying to fully understand.
Thus, bottom line (your understanding is fairly correct, btw): if you plan to use this in Java, they have no specification at the moment; do it at your own risk. If you do want to understand them, their C++ equivalent modes are the place to start.

Does having a large method name increase latency?

Do longer method names have an increased latency when called? Is this a noticeable effect after a long period of calls?
No, method names have nothing to do with performance. And by the way, the JVM doesn't use the name to invoke the method; it uses a symbolic reference, which points to the method's entry in the constant pool.
invokestatic #2 //Method fn:()V
So even the byte-code doesn't get bloated with lengthy method names.
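A quick way to convince yourself: two methods with identical bodies but very different name lengths (the names below are purely illustrative). Both are invoked through constant-pool references of the same shape, so the name length cannot affect call latency.

```java
class Names {
    static int f(int x) {
        return x * 2;
    }

    // Same body, absurdly long name: compiles to the same bytecode shape,
    // invoked via a symbolic reference just like f.
    static int aVeryLongAndThoroughlyDescriptiveMethodNameIndeed(int x) {
        return x * 2;
    }
}
```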
No. The length of the method name doesn't make a difference.
The fact that you're using a method does, though. The more methods, the more overhead. That's the price we pay for cleaner and more maintainable code.
Just for the sake of it: be careful when diving into "performance" from this "microscopic" point of view.
Of course it is a good thing to understand the effects of using different approaches in your source code (to avoid "stupid mistakes").
But efficiency and performance are something that receives "too much priority" too often. Typically, when you focus on creating a clean, maintainable design, the outcome will have "good enough performance". If not: well, a clean, maintainable design is easier to change than a messed-up design anyway.
Keep in mind:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." I agree with this. It's usually not worth spending a lot of time micro-optimizing code before it's obvious where the performance bottlenecks are. But, conversely, when designing software at a system level, performance issues should always be considered from the beginning. A good software developer will do this automatically, having developed a feel for where performance issues will cause problems. An inexperienced developer will not bother, misguidedly believing that a bit of fine tuning at a later stage will fix any problems.
What Hoare and Knuth are really saying is that software engineers should worry about other issues (such as good algorithm design and good implementations of those algorithms) before they worry about micro-optimizations such as how many CPU cycles a particular statement consumes.
from here

Why does sharing a static variable between threads reduce performance?

I asked a question here and someone left a comment saying that the problem is I'm sharing a static variable.
Why is that a problem?
Sharing a static variable in and of itself should have no adverse effect on performance. Global data is common in all programs, starting with the JVM and OS constructs.
Mutable shared data is a different story as the mutation of shared data can lead to both performance issues (cache misses at the very least) and correctness issues which are a pain and are often solved using locks, which lead to potentially other performance issues.
The wiki static variable looks like a pretty substantial part of your program. Not knowing anything about what it's doing or how it's coded, I would guess that it does locking in order to keep a consistent state. If most of your threads spend their time blocked waiting to acquire access to this same object, that would explain why you're not seeing any gain from using multiple threads.
For threads to make a difference to the performance of your program, they have to be reasonably independent, and not all locking on the same thing. The more locking they have to do, the less gain you will see. So try to split out the work so that as much as possible can be done independently. For instance, if there are work items that can be gathered independently, then you might be better off having multiple threads go find the work items and feed them to a queue, from which a dedicated thread pulls them off and feeds them to the wiki object.
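That suggestion can be sketched with a BlockingQueue (the class and method names here are illustrative, not from the original question): several producers gather items independently, and only the queue itself is a point of contention.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class WorkPipeline {
    // Spawns 'producers' threads that each add 'itemsPerProducer' work items
    // to a shared queue, then drains the queue on a single consumer thread
    // (here, the caller) and returns the total item count.
    static int run(int producers, int itemsPerProducer) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        Thread[] workers = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            workers[p] = new Thread(() -> {
                for (int i = 0; i < itemsPerProducer; i++) queue.add(1);
            });
            workers[p].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Dedicated consumer drains the queue; here it just counts the items.
        int total = 0;
        Integer item;
        while ((item = queue.poll()) != null) total += item;
        return total;
    }
}
```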

NDK vs JAVA performance [closed]

Does anybody have an estimate of how much faster C code via the NDK would be than Java code for the same calculations (if at all)?
Let's say I am doing X calculations (the same calculations) in Y seconds in Java code.
How many X calculations could I do in the same Y seconds through C code in the NDK? 1.2x? 2.7x? Any guess?
Let's say that the calculation is B = L/A + C/D (the same one for all X calculations).
EDIT:
Why am I asking this?
Because I am considering moving my Java camera-frame processing to C code, for the opportunity of bigger resolutions.
Since no one else wants to touch this topic, since it's not considered serious to try to answer it, I will have a go:
Java compiles to bytecode, and the bytecode compiles to native code by the JIT.
C compiles directly to native code.
The difference is really the extra compile step, and in theory Java should do a better job than your C compiler. Here's why:
Java can insert statistics calculations into the generated native code, and then after a while regenerate it, optimized against the actual runtime paths taken in your code!
That last point sounds awesome; Java does, however, come with some tradeoffs:
It needs GC runs to clean out memory
It may not JIT code at all
The GC copies live objects and discards all dead ones. Since the GC does not need to do anything for the dead objects, only for the live ones, GC in theory is faster than the normal malloc/free loop for objects.
However, one thing is forgotten by most Java advocates: nothing says that you have to malloc/free every object instance when coding C. You can reuse memory; you can malloc memory blocks and free memory blocks containing thousands of temporary objects in one go.
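The same reuse trick is possible in Java, for what it's worth: a tiny object pool that recycles instances instead of letting each one become garbage. This is a minimal single-threaded sketch (real pools need bounds and a thread-safety strategy):

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

class Pool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    Pool(Supplier<T> factory) { this.factory = factory; }

    // Reuse a previously released instance if one is available,
    // otherwise allocate a fresh one.
    T acquire() {
        T t = free.poll();
        return t != null ? t : factory.get();
    }

    // Return an instance to the pool instead of dropping it for the GC.
    void release(T t) { free.push(t); }
}
```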
With big heaps in Java, GC time increases, adding stall time. In some software stall times during a GC cleanup cycle are totally OK; in others they cause fatal errors. Try keeping your software responding within a defined number of milliseconds while a GC happens, and you will see what I'm talking about.
In some extreme cases, the JIT may also choose not to compile the code at all. This happens when a compiled method would be too big (8K, if I remember correctly). A non-JITed method has a runtime penalty in the range of 20000% (200 times slower, that is; at least it was at our customer's site). The JIT is also turned off when the JVM's code cache starts to get full (if you keep loading new classes into the JVM over and over again, this can happen; it also happened at a customer site).
At one point, JIT statistics also reduced concurrency on one 128-core machine to basically single-core performance.
In Java the JIT has a specific amount of time to compile the bytecode to native code. It is not OK to spend all CPU resources on the JIT, since it runs in parallel with the code doing the actual work of your program. In C the compiler can run as long as it needs to spit out what it thinks is the most optimized code it can. It has no impact on execution time, whereas in Java it does.
What I'm saying is really this:
Java gives you more, but it's not always up to you how it performs.
C gives you less, but it's up to you how it performs.
So to answer your question:
No, selecting C over Java will not make your program faster.
If you only stick to simple math over a preallocated buffer, the Java and C compilers should spit out about the same code.
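For the record, this is the kind of tight numeric loop the answer refers to: the question's formula B = L/A + C/D evaluated over preallocated primitive arrays, with no allocation inside the loop. The JIT compiles loops like this to machine code very close to what a C compiler would emit.

```java
class Kernel {
    // Computes b[i] = l[i]/a[i] + c[i]/d[i] for every element.
    // All arrays are preallocated by the caller; nothing is allocated here.
    static void compute(double[] b, double[] l, double[] a,
                        double[] c, double[] d) {
        for (int i = 0; i < b.length; i++) {
            b[i] = l[i] / a[i] + c[i] / d[i];
        }
    }
}
```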
You will probably not get a clear answer from anyone. The question is far more complex than it looks.
It is no problem to put the same number of polys out in OpenGL, be it with the NDK or the SDK. After all, it's just the same OpenGL calls. The time to render the polys (in a batch) exceeds the time of the function-call overhead by orders of magnitude. So it is usually completely negligible.
But as soon as an application gets more complex and performs some serious calculations (AI, scene graph management, culling, image processing, number crunching, etc.), the native version will usually be much faster.
And there is another thing, beside the points above: the fundamental problem that there currently is no JIT compilation.
The current Dalvik VM with its compiler seems to be very basic, without doing any optimizations, not even the most basic ones!
There is this (very good) video: Google I/O 2009 - Writing Real-Time Games for Android
After I had seen it, it was clear to me that I would definitely use C++ with the NDK.
For example: he talks about the overhead of function calls: "Don't use function calls."
... So yeah, we are back before 1970, talking about the cost of structured programming and the performance advantage of using only global vars and gotos.
Garbage collection is a real problem for games. So you will spend a lot of your time thinking about how you can avoid it. Even formatting a string will create new objects. So there are tips like: don't show the FPS!
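The classic workaround for "formatting a string creates new objects" is to reuse one StringBuilder per frame instead of concatenating strings, which allocates a fresh String (and builder) every time. A sketch, with illustrative names:

```java
class FpsDisplay {
    // One builder reused across frames; no per-frame allocation.
    private final StringBuilder line = new StringBuilder(32);

    CharSequence format(int fps) {
        line.setLength(0);                  // reset without allocating
        line.append("FPS: ").append(fps);   // appends primitives, no boxing
        return line;                        // caller must consume before next call
    }
}
```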
Seriously, if you know C++, it is probably easier to manage your memory with new and delete than to tweak your architecture to reduce/avoid garbage collections.
It seems like if you want to program a non-trivial real-time game, you lose all the advantages of Java. Don't use getters and setters, don't use function calls. Avoid any abstraction, etc. SERIOUSLY?
But back to your question: the performance advantage of the NDK vs the SDK can be anything from 0 to 10000%. It all depends.

Should I inline long code in a loop, or move it into a separate method?

Assume I have a loop (any while or for) like this:
loop {
    // a long block of code
}
From the point of view of time complexity, should I split this code into parts, write a function outside the loop, and call that function repeatedly?
I read something about functions long ago: that calling a function repeatedly takes more time or memory, or something like that; I don't remember it exactly. Can you also provide some good references about things like this (time complexity, coding style)?
Can you also provide some reference book or tutorial about heap memory, overheads, etc. which affect the performance of a program?
The performance difference is probably very minimal in this case. I would concentrate on clarity rather than performance until you identify this portion of your code to be a serious bottleneck.
It really does depend on what kind of code you're running in the loop, however. If you're just doing a tiny mathematical operation that isn't going to take any CPU time, but you're doing it a few hundred thousand times, then inlining the calculation might make sense. Anything more expensive than that, though, and performance shouldn't be an issue.
There is an overhead to calling a function.
So if the "long code" is fast compared to this overhead (and your application cares about performance), then you should definitely avoid the overhead.
However, if the performance is not noticeably worse, it's better to make the code more readable by using a function (or better, multiple functions).
Rule one of performance optimisation: measure it.
Personally, I go for readable code first and then optimise it IF NECESSARY. Usually, it isn't necessary :-)
See the first line in CHAPTER 3 - Measurement Is Everything
"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil." - Donald Knuth
In this case, the difference in performance will probably be minimal between the two solutions, so writing clearer code is the way to do it.
There really isn't a simple "tutorial" on performance; it is a very complex subject, and one that even seasoned veterans often don't fully understand. Anyway, to give you more of an idea of what the overhead of "calling" a function is: basically what you are doing is "freezing" the state of your function (in Java there are no "functions" per se; they are all called methods), calling the method, then "unfreezing", restoring where your method was before.
The "freezing" essentially consists of pushing state information (where you were in the method, what the values of the variables were, etc.) onto the stack; "unfreezing" consists of popping the saved state off the stack and updating the control structures to where they were before you called the function. Naturally, memory operations are far from free, but the VM is pretty good at keeping the performance impact to an absolute minimum.
Now keep in mind that Java is almost entirely heap-based; the only things that really have to get pushed onto the stack are the values of pointers (small), your place in the program (again small), whatever primitives you have local to your method, and a tiny bit of control information, nothing else. Furthermore, although you cannot explicitly inline in Java (though I'm sure there are bytecode editors out there that essentially let you do that), most VMs, including the most popular HotSpot VM, will do this automatically for you. http://java.sun.com/developer/technicalArticles/Networking/HotSpot/inlining.html
So the bottom line is pretty much zero performance impact. If you want to verify this for yourself, you can always run benchmarking and profiling tools; they should be able to confirm it for you.
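The two shapes the question asks about look like this (a sketch with made-up method names): the loop body written inline by hand versus extracted into a method. HotSpot will typically inline the small method automatically, so both versions end up compiling to essentially the same machine code.

```java
class LoopShapes {
    // Version 1: the "long code" written directly inside the loop.
    static long inlined(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i * 2L + 1;
        return sum;
    }

    // Version 2: the loop body extracted into a separate method.
    static long body(int i) { return i * 2L + 1; }

    static long extracted(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += body(i); // JIT inlines this call
        return sum;
    }
}
```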
From an execution-speed point of view it shouldn't matter, and if you still believe this is a bottleneck, it is easy to measure.
From a development performance perspective, it is a good idea to keep the code short. I would vote for turning the loop contents into one (or more) properly named methods.
Forget it! You can't gain any performance by doing the JIT's job for it. Let the JIT inline it for you. Keep methods short for readability and also for performance, as the JIT works better with short methods.
There are micro-optimizations which may help you gain some performance, but don't even think about them. I suggest the following rules:
Write clean code using appropriate objects and algorithms for readability and for performance.
In case the program is too slow, profile and identify the critical parts.
Think about improving them using better objects and algorithms.
As a last resort, you may also consider micro-optimizations.
