I was reading through the SparseArray class in android, and came across the following method:
public void removeAt(int index) {
    if (mValues[index] != DELETED) {
        mValues[index] = DELETED;
        mGarbage = true;
    }
}
Clearly this could just as well have been written:
public void removeAt(int index) {
    if (mValues[index] != DELETED) {
        mValues[index] = DELETED;
        if (!mGarbage)
            mGarbage = true;
    }
}

Or

public void removeAt(int index) {
    mValues[index] = DELETED;
    mGarbage = true;
}
It would seem the Android developers believed the array lookup mValues[index] was faster than an array write, but that the variable lookup wasn't faster than a variable write.
Is this really true? Does it depend on the VM, or does it hold in compiled languages generally too?
Certainly the right-hand side version is not equivalent - because then mGarbage is set to true whether or not the value has changed.
The left-hand side is equivalent to the original, but it's pointless.
Basically I think you've missed the side-effect of checking whether or not the existing value was already DELETED: it allows mGarbage to be set to true only if the method has actually had an effect. That has nothing to do with the performance of reading from the array.
It depends a lot on the VM and I'd guess that this specific code is tuned for the Dalvik VM (or it's just whatever Apache Harmony happened to implement).
One thing to remember is that a write always implies some cost related to caching and cross-thread interaction (i.e. you might need memory barriers for it to work correctly), while a read is much easier to do.
The assumption is probably true, although it will depend a lot on the processor and JVM implementation.
The general reason is less to do with arrays vs. variables but more to do with memory access patterns:
mGarbage is very likely to be locally cached if it's a field of the current object, either in a register or in L1 cache. You probably just read the object into cache a few cycles earlier in order to do something like a virtual method lookup. There won't be much difference between a read and a write when something is locally cached.
mValues[index] is an array lookup that is less likely to be locally cached (particularly if the array is large or only gets accessed sporadically). Reads from non-local caches will usually be faster than writes because of locking / memory contention issues so it makes sense to do a read only if you can get away with it. This effect becomes stronger the more cores you have in your machine and the more concurrency you have in your code.
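To make the idiom concrete, here is a minimal sketch (the class and field are hypothetical, not from SparseArray) of the check-before-write pattern the answers above describe: the cheap read skips the store, and the cache-line invalidation it can trigger, in the common no-op case.

class Flags {
    private volatile boolean dirty;

    void markDirty() {
        // read first: usually satisfied from a local cache line
        if (!dirty) {
            // write only when it actually changes something; an
            // unconditional write would dirty the cache line on every call
            dirty = true;
        }
    }
}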
Related
In the following piece of code we make a call listType.getDescription() twice:
for (ListType listType : this.listTypeManager.getSelectableListTypes())
{
    if (listType.getDescription() != null)
    {
        children.add(new SelectItem(listType.getId(), listType.getDescription()));
    }
}
I would tend to refactor the code to use a single variable:
for (ListType listType : this.listTypeManager.getSelectableListTypes())
{
    String description = listType.getDescription();
    if (description != null)
    {
        children.add(new SelectItem(listType.getId(), description));
    }
}
My understanding is that the JVM somehow optimizes for the original code, especially nested calls like children.add(new SelectItem(listType.getId(), listType.getDescription()));.
Comparing the two options, which one is the preferred method and why? That is in terms of memory footprint, performance, readability/ease, and others that don't come to my mind right now.
When does the latter snippet become more advantageous than the former? That is, is there some (approximate) number of listType.getDescription() calls at which a temporary local variable becomes desirable, given that listType.getDescription() always requires some stack operations to store the this object?
I'd nearly always prefer the local variable solution.
Memory footprint
A single local variable costs 4 or 8 bytes. It's a reference and there's no recursion, so let's ignore it.
Performance
If this is a simple getter, the JVM can effectively memoize it itself, so there's no difference. If it's an expensive call which can't be optimized, memoizing it manually makes it faster.
Readability
Follow the DRY principle. In your case it hardly matters, as the local variable name is about as long as the method call, but for anything more complicated it helps readability: you don't have to find the ten differences between two near-identical expressions. If you know they're the same, make it clear by using the local variable.
Correctness
Imagine your SelectItem does not accept nulls and your program is multithreaded. The value of listType.getDescription() can change in the meantime and you're toast.
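A sketch of that hazard, using the loop body from the question and assuming the description can be mutated concurrently: the getter is evaluated twice, and the second call can observe a different value than the one that passed the null check.

if (listType.getDescription() != null) {          // sees a non-null value
    // ...another thread sets the description to null here...
    children.add(new SelectItem(listType.getId(),
        listType.getDescription()));              // second call may now return null
}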
Debugging
Having a local variable containing an interesting value is an advantage.
The only thing you win by omitting the local variable is saving one line. So I'd do it only in cases when it really doesn't matter (see the sketch after this list):
very short expression
no possible concurrent modification
simple private final getter
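For instance, a minimal case (with hypothetical names) matching all three bullets, where inlining the repeated call is harmless:

if (point.getX() > 0) {
    // trivial final getter, no concurrent modification possible
    draw(point.getX(), point.getY());
}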
I think way number two is definitely better because it improves the readability and maintainability of your code, which is the most important thing here. This kind of micro-optimization won't really help you unless you're writing an application where every millisecond is important.
I'm not sure either is preferred. What I would prefer is clearly readable code over performant code, especially when the performance gain is negligible. In this case I suspect there's next to no noticeable difference (especially given the JVM's optimisations and code-rewriting capabilities).
In the context of imperative languages, the value returned by a function call cannot be memoized automatically (see http://en.m.wikipedia.org/wiki/Memoization) because there is no guarantee that the function has no side effects. Accordingly, your strategy does indeed avoid a function call, at the expense of allocating a temporary variable to store a reference to the value returned by the call.
In addition to being slightly more efficient (which does not really matter unless the function is called many times in a loop), I would opt for your style due to better code readability.
I agree on everything. About the readability I'd like to add something:
I see lots of programmers doing things like:
if (item.getFirst().getSecond().getThird().getForth() == 1 ||
    item.getFirst().getSecond().getThird().getForth() == 2 ||
    item.getFirst().getSecond().getThird().getForth() == 3)

Or even worse:

item.getFirst().getSecond().getThird().setForth(
    item2.getFirst().getSecond().getThird().getForth());
If you are calling the same chain of getters several times, please use an intermediate variable. It's just much easier to read and debug.
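For instance, a sketch of that refactoring (the intermediate type isn't shown in the example, so var, available since Java 10, stands in for it):

var third = item.getFirst().getSecond().getThird();
if (third.getForth() == 1 || third.getForth() == 2 || third.getForth() == 3) {
    // ...
}
third.setForth(item2.getFirst().getSecond().getThird().getForth());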
I would agree with the local variable approach for readability only if the local variable's name is self-documenting. Calling it "description" wouldn't be enough (which description?). Calling it "selectableListTypeDescription" would make it clear. I would also suggest that the loop variable in the for-each should be named "selectableListType" (especially if the "listTypeManager" has accessors for other ListTypes).
The other reason would be if there's no guarantee this is single-threaded or your list is immutable.
What is the difference between getVolatile vs getAcquire when using e.g. AtomicInteger?
PS: those are related to
The source of a synchronizes-with edge is called a release, and the
destination is called an acquire.
from https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.3
It all comes back to how we want our code optimised, in terms of reordering: the compiler might reorder instructions to optimise. getAcquire ensures that the instructions following it will not be executed before it. Those instructions might be reordered among themselves, but they will always be executed after the getAcquire.
This works in combination with setRelease (for VarHandle) where setRelease ensures that what happens before it is not reordered to happen after it.
Example:
Thread 1:

x = 1;  // plain writes to shared fields
y = 2;
z = 3;
A.setRelease(this, 10);

The assignments of x, y and z will happen before A.setRelease but might be reordered among themselves.

Thread 2:

if ((int) A.getAcquire(this) == 10) {
    // we know that x is 1, y is 2 and z is 3
}
This is nice for concurrent programs where you don't have to push volatility onto everything but just need some instructions to be executed before others.
For getVolatile, the variable is treated just like any volatile variable in Java. No reordering or optimisation is happening.
This video is worth watching to understand what are called the "memory ordering modes": plain, opaque, release/acquire and volatile.
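As a self-contained sketch of the release/acquire example above (the class and field names are made up for illustration; only setRelease/getAcquire come from the VarHandle API):

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class Message {
    int x, y, z;   // plain fields, published by the release store
    int ready;     // accessed only through the VarHandle below
    static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(Message.class, "ready", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish() {                 // thread 1
        x = 1; y = 2; z = 3;         // may be reordered among themselves...
        READY.setRelease(this, 10);  // ...but all complete before this store
    }

    void consume() {                 // thread 2
        if ((int) READY.getAcquire(this) == 10) {
            // guaranteed to observe x == 1, y == 2, z == 3
            System.out.println(x + " " + y + " " + z);
        }
    }
}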
One of the key differences between acquire/release and volatile (sequential consistency) can be demonstrated using Dekker's algorithm.
public void lock(int t) {
    int other = 1 - t;
    flag[t] = 1;
    while (flag[other] == 1) {
        if (turn == other) {
            flag[t] = 0;
            while (turn == other);
            flag[t] = 1;
        }
    }
}

public void unlock(int t) {
    turn = 1 - t;
    flag[t] = 0;
}
So let's assume that the writing of the flag is done using a release store and the loading of the flag is done using an acquire load; then we get the following ordering guarantees:
.. other loads/stores
[StoreStore][LoadStore]
flag[t]=1 // release-store
flag[other] // acquire-load
[LoadLoad][LoadStore]
.. other loads/stores
The problem is that the earlier write to flag[t] can be reordered with the later load of flag[other], and the consequence is that two threads could end up in the critical section at the same time. The reason the earlier store and the later load to a different address can be reordered is twofold:
the compiler could reorder it.
modern CPUs have store buffers to hide memory latency. Since the stores are going to be made eventually anyway, there is no point in letting the CPU stall on a cache write miss.
To prevent this from happening a stronger memory model is needed. In this case we need sequential consistency since we do not want any reordering to take place. This can be realized by adding a [StoreLoad] between the store and the load.
.. other loads/stores
[StoreStore][LoadStore]
flag[t]=1 // release-store
[StoreLoad]
flag[other] // acquire-load
[LoadLoad][LoadStore]
.. other loads/stores
Which side this is done on depends on the ISA; e.g. on x86 it is typically done on the writing side, e.g. using an MFENCE (there are others, like XCHG, which has an implicit lock, or a LOCK ADDL of 0 to the stack pointer, as the JVM typically does).
On ARM it is done on the reading side: instead of using a weaker load like an LDAPR, an LDAR is needed, which causes the LDAR to wait until the STLRs have been drained from the store buffer.
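In Java source terms, one way to get those sequentially consistent accesses is a sketch like the following, my adaptation of the lock above using AtomicIntegerArray, whose set/get are volatile accesses and therefore carry the [StoreLoad] ordering:

import java.util.concurrent.atomic.AtomicIntegerArray;

class DekkerLock {
    private final AtomicIntegerArray flag = new AtomicIntegerArray(2);
    private volatile int turn;

    public void lock(int t) {
        int other = 1 - t;
        flag.set(t, 1);                 // volatile store
        while (flag.get(other) == 1) {  // volatile load; cannot float above the store
            if (turn == other) {
                flag.set(t, 0);
                while (turn == other);  // spin until it is our turn
                flag.set(t, 1);
            }
        }
    }

    public void unlock(int t) {
        turn = 1 - t;
        flag.set(t, 0);
    }
}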
For a good read, check the following link:
https://shipilev.net/blog/2014/on-the-fence-with-dependencies/
Java's present memory model guarantees that if the only reference to an object "George" is stored into a final field of some other object "Joe", and neither George nor Joe has ever been seen by any other thread, all operations upon George which were performed before the store will be seen by all threads as having been performed before the store. This works out very nicely in cases where it makes sense to store into a final field a reference to an object which will never be mutated afterward.
Is there any efficient way of achieving such semantics in cases where an object of mutable type is supposed to be lazily created (sometime after the owning object's constructor has finished execution)? Consider the fairly simple class ArrayThing, which encapsulates an immutable array but offers a method (three versions with the same nominal purpose) to return the sum of all elements prior to a specified one. For purposes of this example, assume that many instances will be constructed without ever using that method, but on instances where that method is used, it will be used a lot; consequently, it's not worthwhile to precompute the sums when every instance of ArrayThing is constructed, but it is worthwhile to cache them.
class ArrayThing {
    final int[] mainArray;

    ArrayThing(int[] initialContents) {
        mainArray = (int[]) initialContents.clone();
    }

    public int getElementAt(int index) {
        return mainArray[index];
    }

    int[] makeNewSumsArray() {
        int[] temp = new int[mainArray.length + 1];
        int sum = 0;
        for (int i = 0; i < mainArray.length; i++) {
            temp[i] = sum;
            sum += mainArray[i];
        }
        temp[mainArray.length] = sum;
        return temp;
    }

    // Unsafe version (a thread could be seen as setting sumOfPrevElements1
    // before it's seen as populating the array).
    int[] sumOfPrevElements1;

    public int getSumOfElementsBefore_v1(int index) {
        int[] localElements = sumOfPrevElements1;
        if (localElements == null) {
            localElements = makeNewSumsArray();
            sumOfPrevElements1 = localElements;
        }
        return localElements[index];
    }

    static class Holder {
        public final int[] it;
        public Holder(int[] dat) { it = dat; }
    }

    // Safe version, but slower to read (adds another level of indirection);
    // no thread can possibly see a write to sumOfPrevElements2
    // before the final field and the underlying array have been written.
    Holder sumOfPrevElements2;

    public int getSumOfElementsBefore_v2(int index) {
        Holder localElements = sumOfPrevElements2;
        if (localElements == null) {
            localElements = new Holder(makeNewSumsArray());
            sumOfPrevElements2 = localElements;
        }
        return localElements.it[index];
    }

    // Safe version, I think, with no penalty on reading speed.
    // Before storing the reference to the new array, however, it
    // creates a temporary object which is almost immediately
    // discarded; that seems rather hokey.
    int[] sumOfPrevElements3;

    public int getSumOfElementsBefore_v3(int index) {
        int[] localElements = sumOfPrevElements3;
        if (localElements == null) {
            localElements = (new Holder(makeNewSumsArray())).it;
            sumOfPrevElements3 = localElements;
        }
        return localElements[index];
    }
}
As with the String#hashCode() method, it is possible that two or more threads might see that a computation hasn't been performed, decide to perform it, and store the result. Since all threads would end up producing identical results, that wouldn't be an issue. With getSumOfElementsBefore_v1(), however, there is a different problem: Java could re-order program execution so the array reference gets written to sumOfPrevElements1 before all the elements of the array have been written. Another thread which called getSumOfElementsBefore() at that moment could see that the array wasn't null, and then proceed to read an array element which hadn't yet been written. Oops.
From what I understand, getSumOfElementsBefore_v2() fixes that problem, since storing a reference to the array in the final field Holder#it would establish a "happens-before" relationship with regard to the array element writes. Unfortunately, that version of the code would need to create and maintain an extra heap object, and would require that every attempt to access the sum-of-elements array go through an extra level of indirection.
I think getSumOfElementsBefore_v3() would be cheaper but still safe. The JVM guarantees that all actions which were done to a new object before a reference is stored into a final field will be visible to all threads by the time any thread can see that reference. Thus, even if other threads don't use Holder#it directly, the fact that they are using a reference which was copied from that field would establish that they can't see the reference until after everything that was done before the store has actually happened.
Even though the latter method limits the overhead (versus the unsafe method) to the times when the new array is created (rather than adding overhead to every read), it still seems rather ugly to create a new object purely for the purpose of writing and reading back a final field. Making the array field volatile would achieve legitimate semantics, but would add memory-system overhead every time the field is read (a volatile qualifier would require that the code notice if the field has been written in another thread, but that's overkill for this application; what's necessary is merely that any thread which does see that the field has been written also sees all writes which occurred to the array identified thereby before the reference was stored). Is there any way to achieve similar semantics without having to either create and abandon a superfluous temporary object, or add additional overhead every time the field is read?
Your third version does not work. The guarantees made for a properly constructed object stored in a final instance field apply to reads of that final field only. Since the other threads don't read that final field, no guarantee is made.
Most notably, the fact that the initialization of the array has to be completed before the array reference is stored in the final Holder.it variable does not say anything about when the sumOfPrevElements3 variable will be written (as seen by other threads). In practice, a JVM might optimize away the entire Holder instance creation as it has no side-effects, thus the resulting code behaves like an ordinary unsafe publication of an int[] array.
For using the final field publication guarantee you have to publish the Holder instance containing the final field; there is no way around it.
But if that additional instance annoys you, you should really consider using a simple volatile variable. After all, you are only making assumptions about the cost of that volatile variable; in other words, you are thinking about premature optimization.
After all, detecting a change made by another thread doesn't have to be expensive; e.g. on x86 it doesn't even need an access to main memory, as the hardware has cache coherence. It's also possible that an optimizer detects that you never write to the variable again once it becomes non-null, enabling almost all optimizations possible for ordinary fields once a non-null reference has been read.
So the conclusion is as always: measure, don’t guess. And start optimizing only once you found a real bottleneck.
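For instance, a minimal JMH sketch of such a measurement (JMH is the usual harness for JVM micro-benchmarks; the class and field names here are made up for illustration):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class VolatileReadBench {
    volatile int[] volatileSums = new int[1024];
    int[] plainSums = new int[1024];

    @Benchmark
    public int volatileRead() {
        return volatileSums[42]; // volatile field read + array access
    }

    @Benchmark
    public int plainRead() {
        return plainSums[42];    // ordinary field read + array access
    }
}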
I think your second and third examples do work (sort of; as you say, the reference itself might not be noticed by another thread, which might then re-assign the array. That's a lot of extra work!).
But those examples are based on a faulty premise: it is not true that a volatile field requires the reader to "notice" the change. In fact, volatile and final fields perform exactly the same operation. The read operation of a volatile or a final has no overhead on most CPU architectures. I believe on a write volatile has a tiny amount of extra overhead.
So I would just use volatile here, and not worry about your supposed "optimizations". The difference in speed, if any, is going to be extremely slight, and I'm talking like an extra 4 bytes written with a bus-lock, if that. And your "optimized" code is pretty god-awful to read.
As a minor pedantic note, it is not true that final fields require you to have the sole reference to an object to make it immutable and thread-safe. The spec only requires you to prevent changes to the object. Having the sole reference to an object is one way to prevent changes, sure. But objects that are already immutable (like java.lang.String, for example) can be shared without problems.
In summary: Premature Optimization is the Root of All Evil. Lose the tricky nonsense and just write a simple array update with assignment to a volatile.
volatile int[] sumOfPrevElements;

public int getSumOfElementsBefore(int index) {
    if (sumOfPrevElements != null) return sumOfPrevElements[index];
    sumOfPrevElements = makeNewSumsArray();
    return sumOfPrevElements[index];
}
I have a bottleneck method which attempts to add points (as x-y pairs) to a HashSet. The common case is that the set already contains the point, in which case nothing happens. Should I use a separate point for adding from the one I use for checking whether the set already contains it? It seems this would allow the JVM to allocate the checking-point on the stack, so in the common case no heap allocation would be required.
Ex. I'm considering changing
HashSet<Point> set;

public void addPoint(int x, int y) {
    if (set.add(new Point(x, y))) {
        // Do some stuff
    }
}
to
HashSet<Point> set;

public void addPoint(int x, int y) {
    if (!set.contains(new Point(x, y))) {
        set.add(new Point(x, y));
        // Do some stuff
    }
}
Is there a profiler which will tell me whether objects are allocated on heap or stack?
EDIT: To clarify why I think the second might be faster: in the first case the object may or may not be added to the collection, so it potentially escapes and cannot be optimized away. In the second case, the first object allocated is clearly non-escaping, so it can be optimized by the JVM and put on the stack. The second allocation only occurs in the rare case where the point isn't already contained.
Marko Topolnik properly answered your question; the space allocated for the first new Point may or may not be immediately freed and it is probably foolish to bank on it happening. But I want to expand on why you're currently in a deep state of sin:
You're trying to optimise this the wrong way.
You've identified object creation to be the bottleneck here. I'm going to assume that you're right about this. You're hoping that, if you create fewer objects, the code will run faster. That might be true, but it will never run very fast as you've designed it.
Every object in Java has a pretty fat header (16 bytes; an 8-byte "mark word" full of bit fields and an 8-byte pointer to the class type) and, depending on what's happened in your program thus far, possibly another pretty fat trailer. Your HashSet isn't storing just the contents of your objects; it's storing pointers to those fat-headers-followed-by-contents. (Actually, it's storing pointers to Entry classes that themselves store pointers to Points. Two levels of indirection there.)
A HashSet lookup, then, figures out which bucket it needs to look at and then chases one pointer per thing in the bucket to do the comparison. (As one great big chain in series.) There probably aren't very many of these objects, but they almost certainly aren't stored close together, making your cache angry. Note that object allocation in Java is extremely cheap (you just bump a pointer), and that this pointer chasing is quite probably a bigger source of slowness than the allocation.
Java doesn't provide any abstraction like C++'s templates, so the only real way to make this fast and still provide the Set abstraction is to copy HashSet's code, change all of the data structures to represent your objects inline, modify the methods to work with the new data structures, and, if you're still worried, make copies of the relevant methods that take a list of parameters corresponding to object contents (i.e. contains(int, int)) that do the right thing without constructing a new object.
This approach is error-prone and time-consuming, but it's unfortunately often necessary when working on Java projects where performance matters. Take a look at the Trove library Marko mentioned and see if you can use it instead; Trove did exactly this for the primitive types.
With that out of the way, a monomorphic call site is one where only one method is called. Hotspot aggressively inlines calls from monomorphic call sites. You'll notice that HashSet.contains punts to HashMap.containsKey. You'd better pray for HashMap.containsKey to be inlined since you need the hashCode call and equals calls inside to be monomorphic. You can verify that your code is being compiled nicely by using the -XX:+PrintAssembly option and poring over the output, but it's probably not---and even if it is, it's probably still slow because of what a HashSet is.
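For reference, PrintAssembly is a diagnostic option, so it needs unlocking (and the hsdis disassembler library installed), along the lines of:

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly YourBenchmark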
As soon as you have written new Point(x,y), you are creating a new object. It may happen not to be placed on the heap, but that's just a bet you can lose. For example, the contains call should be inlined for the escape analysis to work, or at least it should be a monomorphic call site. All this means that you are optimizing against a quite erratic performance model.
If you want to avoid allocation the solid way, you can use Trove library's TLongHashSet and have your (int,int) pairs encoded as single long values.
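A hedged sketch of that encoding (assuming Trove 3's gnu.trove.set.hash.TLongHashSet; the packing helper is my own):

import gnu.trove.set.hash.TLongHashSet;

class PointSet {
    private final TLongHashSet set = new TLongHashSet();

    // pack the two 32-bit coordinates into one primitive long;
    // mask y so sign extension can't clobber the high half
    private static long key(int x, int y) {
        return ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    public void addPoint(int x, int y) {
        if (set.add(key(x, y))) { // add() returns false if already present
            // Do some stuff
        }
    }
}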
Java exposes the CAS operation through its atomic classes, e.g.
AtomicInteger.compareAndSet(expected,update)
When expected == update, are these calls a no-op or do they still have the memory consistency effects of a volatile read+write (as is the case when expected != update)?
I went through the native code and there does not appear to be any difference in behaviour when the values are equal; specifically, in this case the equivalence check is on the integer value, not on a reference.
It will run through the same logic, with the same memory consistency effects, as in the event expected != update.
One note is that there will always be at least a volatile load on the field's location, so at the very least you will have a volatile read of the backing int field.
AFAIK there is no check for expected == update and no change in behaviour. To do so could add cycles if the hardware didn't do it already, which I suspect it doesn't.
I wouldn't write code which depends on side effects of calling CAS in any case.
You could add the check yourself if you think it is likely, of course.
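A hedged sketch of adding that check yourself (the helper name is made up): skip the CAS when no change would be made, at the cost of weakening the memory effects in the no-op case to a volatile read.

import java.util.concurrent.atomic.AtomicInteger;

class Cas {
    // Avoids the CAS (and its read-modify-write cost) when expected == update
    // and the value is already as desired; note the no-op path then performs
    // only a volatile read, not a volatile write.
    static boolean casIfDifferent(AtomicInteger a, int expected, int update) {
        if (expected == update && a.get() == expected) {
            return true;
        }
        return a.compareAndSet(expected, update);
    }
}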