DataflowAnomalyAnalysis: Found
'DD'-anomaly for variable 'variable'
(lines 'n1'-'n2').
DataflowAnomalyAnalysis: Found
'DU'-anomaly for variable 'variable'
(lines 'n1'-'n2').
DD and DU sound familiar...I want to say in things like testing and analysis relating to weakest pre and post conditions, but I don't remember the specifics.
NullAssignment: Assigning an Object to
null is a code smell. Consider
refactoring.
Wouldn't setting an object to null assist in garbage collection, if the object is a local object (not used outside of the method)? Or is that a myth?
MethodArgumentCouldBeFinal: Parameter
'param' is not assigned and could be
declared final
LocalVariableCouldBeFinal: Local
variable 'variable' could be declared
final
Are there any advantages to using final parameters and variables?
LooseCoupling: Avoid using
implementation types like
'LinkedList'; use the interface
instead
If I know that I specifically need a LinkedList, why would I not use one to make my intentions explicitly clear to future developers? It's one thing to return the class that's highest up the class path that makes sense, but why would I not declare my variables to be of the strictest sense?
AvoidSynchronizedAtMethodLevel: Use
block level rather than method level
synchronization
What advantages does block-level synchronization have over method-level synchronization?
AvoidUsingShortType: Do not use the
short type
My first languages were C and C++, but in the Java world, why should I not use the type that best describes my data?
DD and DU anomalies (if I remember correctly—I use FindBugs and the messages are a little different) refer to assigning a value to a local variable that is never read, usually because it is reassigned another value before ever being read. A typical case would be initializing some variable with null when it is declared. Don't declare the variable until it's needed.
Assigning null to a local variable in order to "assist" the garbage collector is a myth. PMD is letting you know this is just counter-productive clutter.
Specifying final on a local variable should be very useful to an optimizer, but I don't have any concrete examples of current JITs taking advantage of this hint. I have found it useful in reasoning about the correctness of my own code.
Specifying interfaces in terms of… well, interfaces is a great design practice. You can easily change implementations of the collection without impacting the caller at all. That's what interfaces are all about.
I can't think of many cases where a caller would require a LinkedList, since it doesn't expose any API that isn't declared by some interface. If the client relies on that API, it's available through the correct interface.
Block level synchronization allows the critical section to be smaller, which allows as much work to be done concurrently as possible. Perhaps more importantly, it allows the use of a lock object that is privately controlled by the enclosing object. This way, you can guarantee that no deadlock can occur. Using the instance itself as a lock, anyone can synchronize on it incorrectly, causing deadlock.
Operands of type short are promoted to int in any operations. This rule is letting you know that this promotion is occurring, and you might as well use an int. However, using the short type can save memory, so if it is an instance member, I'd probably ignore that rule.
DataflowAnomalyAnalysis: Found
'DD'-anomaly for variable 'variable'
(lines 'n1'-'n2').
DataflowAnomalyAnalysis: Found
'DU'-anomaly for variable 'variable'
(lines 'n1'-'n2').
No idea.
NullAssignment: Assigning an Object to
null is a code smell. Consider
refactoring.
Wouldn't setting an object to null assist in garbage collection, if the object is a local object (not used outside of the method)? Or is that a myth?
Objects in local methods are marked to be garbage collected once the method returns. Setting them to null won't do any difference.
Since it would make less experience developers what is that null assignment all about it may be considered a code smell.
MethodArgumentCouldBeFinal: Parameter
'param' is not assigned and could be
declared final
LocalVariableCouldBeFinal: Local
variable 'variable' could be declared
final
Are there any advantages to using final parameters and variables?
It make clearer that the value won't change during the lifecycle of the object.
Also, if by any chance someone try to assign a value, the compiler will prevent this coding error at compile type.
consider this:
public void businessRule( SomeImportantArgument important ) {
if( important.xyz() ){
doXyz();
}
// some fuzzy logic here
important = new NotSoImportant();
// add for/if's/while etc
if( important.abc() ){ // <-- bug
burnTheHouse();
}
}
Suppose that you're assigned to solve some mysterious bug that from time to time burns the house.
You know what wast the parameter used, what you don't understand is WHY the burnTHeHouse method is invoked if the conditions are not met ( according to your findings )
It make take you a while to findout that at some point in the middle, somone change the reference, and that you are using other object.
Using final help to prevent this kind of things.
LooseCoupling: Avoid using
implementation types like
'LinkedList'; use the interface
instead
If I know that I specifically need a LinkedList, why would I not use one to make my intentions explicitly clear to future developers? It's one thing to return the class that's highest up the class path that makes sense, but why would I not declare my variables to be of the strictest sense?
There is no difference, in this case. I would think that since you are not using LinkedList specific functionality the suggestion is fair.
Today, LinkedList could make sense, but by using an interface you help your self ( or others ) to change it easily when it wont.
For small, personal projects this may not make sense at all, but since you're using an analyzer already, I guess you care about the code quality already.
Also, helps less experienced developer to create good habits. [ I'm not saying you're one but the analyzer does not know you ;) ]
AvoidSynchronizedAtMethodLevel: Use
block level rather than method level
synchronization
What advantages does block-level synchronization have over method-level synchronization?
The smaller the synchronized section the better. That's it.
Also, if you synchronize at the method level you'll block the whole object. When you synchronize at block level, you just synchronize that specific section, in some situations that's what you need.
AvoidUsingShortType: Do not use the
short type
My first languages were C and C++, but in the Java world, why should I not use the type that best describes my data?
I've never heard of this, and I agree with you :) I've never use short though.
My guess is that by not using it, you'll been helping your self to upgrade to int seamlessly.
Code smells are more oriented to code quality than performance optimizations. So the advice are given for less experienced programmers and to avoid pitfalls, than to improve program speed.
This way, you could save a lot of time and frustrations when trying to change the code to fit a better design.
If it the advise doesn't make sense, just ignore them, remember, you are the developer at charge, and the tool is just that a tool. If something goes wrong, you can't blame the tool, right?
Just a note on the final question.
Putting "final" on a variable results in it only be assignable once. This does not necessarily mean that it is easier to write, but it most certainly means that it is easier to read for a future maintainer.
Please consider these points:
any variable with a final can be immediately classified in "will not change value while watching".
by implication it means that if all variables which will not change are marked with final, then the variables NOT marked with final actually WILL change.
This means that you can see already when reading through the definition part which variables to look out for, as they may change value during the code, and the maintainer can spend his/her efforts better as the code is more readable.
Wouldn't setting an object to null
assist in garbage collection, if the
object is a local object (not used
outside of the method)? Or is that a
myth?
The only thing it does is make it possible for the object to be GCd before the method's end, which is rarely ever necessary.
Are there any advantages to using final parameters and variables?
It makes the code somewhat clearer since you don't have to worry about the value being changed somwhere when you analyze the code. More often then not you don't need or want to change a variable's value once it's set anyway.
If I know that I specifically need a
LinkedList, why would I not use one to
make my intentions explicitly clear to
future developers?
Can you think of any reason why you would specifically need a
LinkedList?
It's one thing to
return the class that's highest up the
class path that makes sense, but why
would I not declare my variables to be
of the strictest sense?
I don't care much about local variables or fields, but if you declare a method parameter of type LinkedList, I will hunt you down and hurt you, because it makes it impossible for me to use things like Arrays.asList() and Collections.emptyList().
What advantages does block-level synchronization have over method-level synchronization?
The biggest one is that it enables you to use a dedicated monitor object so that only those critical sections are mutually exclusive that need to be, rather than everything using the same monitor.
in the Java world, why should I not
use the type that best describes my
data?
Because types smaller than int are automtically promoted to int for all calculations and you have to cast down to assign anything to them. This leads to cluttered code and quite a lot of confustion (especially when autoboxing is involved).
AvoidUsingShortType: Do not use the short type
List item
short is 16 bit, 2's compliment in java
a short mathmatical operaion with anything in the Integer family outside of another short will require a runtime sign extension conversion to the larger size. operating against a floating point requires sign extension and a non-trivial conversion to IEEE-754.
can't find proof, but with a 32 bit or 64 bit register, you're no longer saving on 'processor instructions' at the bytecode level. You're parking a compact car in a a semi-trailer's parking spot as far as the processor register is concerned.
If your are optimizing your project at the byte code level, wow. just wow. ;P
I agree on the design side of ignoring this pmd warning, just weigh accurately describing your object with a 'short' versus the incurred performance conversions.
in my opinion, the incurred performance hits are miniscule on most machines. ignore the error.
What advantages does block-level
synchronization have over method-level
synchronization?
Synchronize a method is like do a synchronize(getClass()) block, and blocks all the class.
Maybe you don't want that
Related
This question already has answers here:
java how expensive is a method call
(12 answers)
Closed 6 years ago.
I have a for loop which runs 4096 times and it should be as fast as possible. Performance is really important here. Currently I use getter methods inside the loop which just return values or objects from fields which don't change while the loop is in progress.
Example:
for (;;) {
doSomething(example.getValue());
}
Is there any overhead using getters? Is it faster using the following way?
Example:
Object object = example.getValue();
for (;;) {
doSomething(object);
}
If yes, is that also true for accessing public fields like example.value?
Edit: I don't use System.out.println() inside the loop.
Edit: Some fields are not final. No fields are volatile and no method (getter) is synchronized.
As Rogério answered, getting the object reference outside the loop (Object object = example.getValue();) will likely be faster (or will at least never be slower) than calling the getter inside the loop because
in the "worst" case, example.getValue() might actually do some very computationally-expensive stuff in the background despite that getter methods are supposed to be "trivial". By assigning a reference once and re-using it, you do this expensive computation only once.
in the "best" case, example.getValue() does something trivial such as return value; and so assigning it inside the loop would be no more expensive than outside the loop after the JIT compiler inlines the code.
However, more important is the difference in semantics between the two and its possible effects in a multi-threaded environment: If the state of the object example changes in a way which causes example.getValue() to return references to different objects, it is possible that, in each iteration, the method doSomething(Object object) will actually operate on a different instance of Object by directly calling doSomething(example.getValue());. On the other hand, by calling a getter outside the loop and setting a reference to the returned instance (Object object = example.getValue();), doSomething(object); will operate on object n times for n iterations.
This difference in semantics can cause behavior in a multi-threaded environment to be radically different from that in a single-threaded environment. Moreover, this need not be an actual "in-memory" multi-threading issue: If example.getValue() depends on e.g. database/HDD/network resources, it is possible that this data changes during execution of the loop, making it possible that a different object is returned even if the Java application itself is single-threaded. For this reason, it is best to consider what you actually want to accomplish with your loop and to then choose the option which best reflects the intended behavior.
It depends on the getter.
If it's a simple getter, the JIT will in-line it to a direct field access anyway, so there won't be a measurable difference. From a style point of view, use the getter - it's less code.
If the getter is accessing a volatile field, there's an extra memory access hit as the value can't be kept in the register, however the hit is very small.
If the getter is synchronized, then using a local variable will be measurably faster as locks don't need to be obtained and released every call, but the loop code will use the potentially stale value of the field at the time the getter was called.
You should prefer a local variable outside the loop, for the following reasons:
It tends to make the code easier to read/understand, by avoiding nested method calls like doSomething(example.getValue()) in a single line of code, and by allowing the code to give a better, more specific, name to the value returned by the getter method.
Not all getter methods are trivial (ie, they sometimes do some potentially expensive work), but developers often don't notice it, assuming a given method is trivial and inexpensive when it really isn't. In such cases, the code may take a significant performance hit without the developer realizing it. Extraction into a local variable tends to avoid this issue.
It's very easy to worry about performance much more than is necessary. I know the feeling. Some things to consider:
4096 is not much, so unless this has to complete in an extremely short time don't worry about performance so much.
If there is anything else remotely expensive going on in this loop, the getter won't matter.
Premature optimisation is the root of all evil. Focus on making your code correct and clear first. Then measure and profile it and narrow down the most expensive thing, and take care of that. Improve the actual algorithm if possible.
Regarding your question, I don't know exactly what the JIT does, but unless it can prove with certainty that example.getValue() or example.value doesn't change in the loop (which is hard to do unless the field is final and the getter is trivial) then there is logically no way it can avoid calling the getter repeatedly in the former sample since that would risk changing the behaviour of the program. The repeated calls are certainly some nonzero amount of extra work.
Having said all that, create the local variable outside the loop, whether or not it's faster, because it's clearer. Maybe that surprises you, but good code is not always the shortest. Expressing intent and other information is extremely important. In this case the local variable outside the loop makes it obvious to anyone reading the code that the argument to doSomething doesn't change (especially if you make it final) which is useful to know. Otherwise they might have to do some extra digging to make sure they know how the program behaves.
If you need to run it as fast as possible, you should not use System.out.println in critical sections.
Concerning getter: There is slight overhead for using getter, but you should not bother about it. Java does have getter and setter optimization in JIT compiler. So eventually they will be replaced by native code.
This Java tutorial
says that an immutable object cannot change its state after creation.
java.lang.String has a field
/** Cache the hash code for the string */
private int hash; // Default to 0
which is initialized on the first call of the hashCode() method, so it changes after creation:
String s = new String(new char[] {' '});
Field hash = s.getClass().getDeclaredField("hash");
hash.setAccessible(true);
System.out.println(hash.get(s));
s.hashCode();
System.out.println(hash.get(s));
output
0
32
Is it correct to call String immutable?
A better definition would be not that the object does not change, but that it cannot be observed to have been changed. It's behavior will never change: .substring(x,y) will always return the same thing for that string ditto for equals and all the other methods.
That variable is calculated the first time you call .hashcode() and is cached for further calls. This is basically what they call "memoization" in functional programming languages.
Reflection isn't really a tool for "programming" but rather for meta-programming (ie programming programs for generating programs) so it doesn't really count. It's the equivalent of changing a constant's value using a memory debugger.
The term "Immutable" is vague enough to not allow for a precise definition.
I suggest reading Kinds of Immutability from Eric Lippert's blog. Although it's technically a C# article, it's quite relevant to the question posed. In particular:
Observational immutability:
Suppose you’ve got an object which has the property that every time
you call a method on it, look at a field, etc, you get the same
result. From the point of view of the caller such an object would be
immutable. However you could imagine that behind the scenes the object
was doing lazy initialization, memoizing results of function calls in
a hash table, etc. The “guts” of the object might be entirely mutable.
What does it matter? Truly deeply immutable objects never change their
internal state at all, and are therefore inherently threadsafe. An
object which is mutable behind the scenes might still need to have
complicated threading code in order to protect its internal mutable
state from corruption should the object be called on two threads “at
the same time”.
Once created, all the methods on a String instance (called with the same parameters) will always provide the same result. You cannot change its behavoiur (with any public method), so it will always represent the same entity. Also it is final and cannot be subclassed, so it is guaranteed that all instances will behave like this.
Therefore from public view the object is considered immutable. The internal state does not really matter in this case.
Yes it is correct to call them immutable.
While it is true that you can reach in and modify private ... and final ... variables of a class, it is an unnecessary and incredibly unwise thing to do on a String object. It is generally assumed that nobody is going to be crazy enough do it.
From a security standpoint, the reflection calls needed to modify the state of a String all perform security checks. Unless you've miss-implement your sandbox, the calls will be blocked for non-trusted code. So you should have to worry about this as a way that untrusted code can break sandbox security.
It is also worth noting that the JLS states that using reflection to change final, may break things (e.g. in multi-threading) or may not have any effect.
From the viewpoint of a developer who is using reflection, it is not correct to call String immutable. There are actual Java developers using reflection to write real software every day. Dismissing reflection as a "hack" is preposterous. However, from the viewpoint of a developer who is not using reflection, it is correct to call String immutable. Whether or not it is valid to assume that String is immutable depends on context.
Immutability is an abstract concept and therefore cannot apply in an absolute sense to anything with a physical form (see the ship of Theseus). Programming language constructs like objects, variables, and methods exist physically as bits in a storage medium. Data degradation is a physical process which happens to all storage media, so no data can ever be said to be truly immutable. In addition, it is almost always possible in practice to subvert the programming language features intended to prevent the mutation of a particular datum. In contrast, the number 3 is 3, has always been 3, and will always be 3.
As applied to program data, immutability should be considered a useful assumption rather than a fundamental property. For example, if one assumes that a String is immutable, one may cache its hash code for reuse and avoid the cost of ever recomputing its hash code again later. Virtually all non-trivial software relies on assumptions that certain data will not mutate for certain durations of time. Software developers generally assume that the code segment of a program will not change while it is executing, unless they are writing self-modifying code. Understanding what assumptions are valid in a particular context is an important aspect of software development.
It can not be modified from outside and it is a final class, so it can not be subclassed and made mutable. Theese are two requirments for immutability. Reflection is considered as a hack, its not a normal way of development.
A class can be immutable while still having mutable fields, as long as it doesn't provide access to its mutable fields.
It's immutable by design. If you use Reflection (getting the declared Field and resetting its accessibility), you are circumventing its design.
Reflection will allow you to change the contents of any private field. Is it therefore correct to call any object in Java immutable?
Immutability refers to changes that are either initiated by or perceivable by the application.
In the case of string, the fact that a particular implementation chooses to lazily calculate the hashcode is not perceptible to the application. I would go a step further, and say that an internal variable that is incremented by the object -- but never exposed and never used in any other way -- would also be acceptable in an "immutable" object.
Yes it is correct. When you modified a String like you do in your example, a new String is created but the older one maintain its value.
"The Java language automatically initializes fields of objects, in contrast to local variables of methods that the programmers are responsible for initializing. Given what you know of intra- and inter-procedural data flow analysis, explain why the language designers may have made these design choices."
Its obvious to me that its to prevent a bug. However, what exactly is that bug?
Would it be to condense the possible control flow of some given method?
Could someone go into greater detail on this? I'd really appreciate the help.
It's really easy to do intra-procedural data flow, so it's really easy to check whether a field has been initialized and give warnings if it hasn't (one can write a simplistic decidable algorithm, e.g. make sure all branches of an if initialize a variable, and if one branch doesn't, fail, even if the branch is unreachable).
It's really hard to do inter-procedural data flow, so it's really hard to check whether a field of an object has ever been initialized anywhere in the code (you quickly get into undecidable territory for any reasonable approximation).
Thus Java does the former and gives compile-time errors when it detects uninitialized local variables, but doesn't do the latter and initializes an object's fields to their defaults.
It is not always the case that they are initialized. Objects can be instantiated without invoking any constructor by using reflections in combination with the class sun.misc.Unsafe or ObjectInputStream to access these classes private native methods or directly through JNI. These are intended for the purpose of object serialization/deserialization, and expect the fields to be populated by the deserializer. As for why the designers would have chosen to eliminate direct access to these methods(ie. without reflections) it stands to reason that pointers still left in memory could be used for stack-smashing or return-to-lib-c attacks. Clearing memory allocated for these "automatically" for most programs reduces the security risk as well as reducing the chance for errors. Also note that an attempt to read a local variable that has not been initialized results in a compile error for much the same reason
Before I ask my question can I please ask not to get a lecture about optimising for no reason.
Consider the following questions purely academic.
I've been thinking about the efficiency of accesses between root (ie often used and often accessing each other) classes in Java, but this applies to most OO languages/compilers. The fastest way (I'm guessing) that you could access something in Java would be a static final reference. Theoretically, since that reference is available during loading, a good JIT compiler would remove the need to do any reference lookup to access the variable and point any accesses to that variable straight to a constant address. Perhaps for security reasons it doesn't work that way anyway, but bear with me...
Say I've decided that there are some order of operations problems or some arguments to pass at startup that means I can't have a static final reference, even if I were to go to the trouble of having each class construct the other as is recommended to get Java classes to have static final references to each other. Another reason I might not want to do this would be... oh, say, just for example, that I was providing platform specific implementations of some of these classes. ;-)
Now I'm left with two obvious choices. I can have my classes know about each other with a static reference (on some system hub class), which is set after constructing all classes (during which I mandate that they cannot access each other yet, thus doing away with order of operations problems at least during construction). On the other hand, the classes could have instance final references to each other, were I now to decide that sorting out the order of operations was important or could be made the responsibility of the person passing the args - or more to the point, providing platform specific implementations of these classes we want to have referencing each other.
A static variable means you don't have to look up the location of the variable wrt to the class it belongs to, saving you one operation. A final variable means you don't have to look up the value at all but it does have to belong to your class, so you save 'one operation'. OK I know I'm really handwaving now!
Then something else occurred to me: I could have static final stub classes, kind of like a wacky interface where each call was relegated to an 'impl' which can just extend the stub. The performance hit then would be the double function call required to run the functions and possibly I guess you can't declare your methods final anymore. I hypothesised that perhaps those could be inlined if they were appropriately declared, then gave up as I realised I would then have to think about whether or not the references to the 'impl's could be made static, or final, or...
So which of the three would turn out fastest? :-)
Any other thoughts on lowering frequent-access overheads or even other ways of hinting performance to the JIT compiler?
UPDATE: After running several hours of test of various things and reading http://www.ibm.com/developerworks/java/library/j-jtp02225.html I've found that most things you would normally look at when tuning e.g. C++ go out the window completely with the JIT compiler. I've seen it run 30 seconds of calculations once, twice, and on the third (and subsequent) runs decide "Hey, you aren't reading the result of that calculation, so I'm not running it!".
FWIW you can test data structures and I was able to develop an arraylist implementation that was more performant for my needs using a microbenchmark. The access patterns must have been random enough to keep the compiler guessing, but it still worked out how to better implement a generic-ified growing array with my simpler and more tuned code.
As far as the test here was concerned, I simply could not get a benchmark result! My simple test of calling a function and reading a variable from a final vs non-final object reference revealed more about the JIT than the JVM's access patterns. Unbelievably, calling the same function on the same object at different places in the method changes the time taken by a factor of FOUR!
As the guy in the IBM article says, the only way to test an optimisation is in-situ.
Thanks to everyone who pointed me along the way.
Its worth noting that static fields are stored in a special per-class object which contains the static fields for that class. Using static fields instead of object fields are unlikely to be any faster.
See the update, I answered my own question by doing some benchmarking, and found that there are far greater gains in unexpected areas and that performance for simple operations like referencing members is comparable on most modern systems where performance is limited more by memory bandwidth than CPU cycles.
Assuming you found a way to reliably profile your application, keep in mind that it will all go out the window should you switch to another jdk impl (IBM to Sun to OpenJDK etc), or even upgrade version on your existing JVM.
The reason you are having trouble, and would likely have different results with different JVM impls lies in the Java spec - is explicitly states that it does not define optimizations and leaves it to each implementation to optimize (or not) in any way so long as execution behavior is unchanged by the optimization.
Why are people so emphatic about making every variable within a class "final"? I don't believe that there is any true benefit to adding final to private local variables, or really to use final for anything other than constants and passing variables into anonymous inner classes.
I'm not looking to start any sort of flame war, I just honestly want to know why this is so important to some people. Am I missing something?
Intent. Other people modifying your code won't change values they aren't supposed to change.
Compiler optimizations can be made if the compiler knows a field's value will never change.
Also, if EVERY variable in a class is final (as you refer to in your post), then you have an immutable class (as long as you don't expose references to mutable properties) which is an excellent way to achieve thread-safety.
The downside, is that
annoy it is hard
annoy to read
annoy code or anything
annoy else when it all
annoy starts in the
annoy same way
Other than the obvious usage for creating constants and preventing subclassing/overriding, it is a personal preference in most cases since many believe the benefits of "showing programmer intent" are outweighed by the actual code readability. Many prefer a little less verbosity.
As for optimisations, that is a poor reason for using it (meaningless in many cases). It is the worst form of micro optimisation and in the days of JIT serves no purpose.
I would suggest to use it if you prefer, don't if you that is what you prefer. Since it will all come down to religious arguments in many cases, don't worry about it.
It marks that I'm not expecting that value to change, which is free documentation. The practice is because it clearly communicates the intent of that variable and forces the compiler to verify that. Beyond that, it allows the compiler to make optimizations.
It's important because immutability is important particularly when dealing with a shared memory model. If something is immutable then it's thread safe, that makes it good enough an argument to follow as a best practice.
http://www.artima.com/intv/blochP.html
One benefit for concurrent programming which hasn't been mentioned yet:
Final fields are guaranteed to be initialized when the execution of the constructor is completed.
A project I'm currently working on is setup in a way that whenever one presses "save" in Eclipse, the final modifier is added to every variable or field that is not changed in the code. And it hasn't yet hurt anybody.
There are many good reasons to use final, as noted elsewhere. One place where it is not worth it, IMO, is on parameters to a method. Strictly speaking, the keyword adds value here, but the value is not high enough to withstand the ugly syntax. I'd prefer to express that kind of information through unit tests.
I think use of final over values that are inner to a class is an overkill unless the class is probably going to be inherited. The only advantage is around the compiler optimizations, which surely may benefit.