Is there any performance penalty for the following code snippet?
for (int i=0; i<someValue; i++)
{
Object o = someList.get(i);
o.doSomething;
}
Or does this code actually make more sense?
Object o;
for (int i=0; i<someValue; i++)
{
o = someList.get(i);
o.doSomething;
}
If in byte code these two are totally equivalent then obviously the first method looks better in terms of style, but I want to make sure this is the case.
In today's compilers, no. I declare objects in the smallest scope I can, because it's a lot more readable for the next guy.
To quote Knuth, who may be quoting Hoare:
Premature optimization is the root of all evil.
Whether the compiler will produce marginally faster code by defining the variable outside the loop is debatable, and I imagine it won't. I would guess it'll produce identical bytecode.
Compare this with the number of errors you'll likely prevent by correctly-scoping your variable using in-loop declaration...
There's no performance penalty for declaring the Object o within the loop.
The compiler generates very similar bytecode and makes the correct optimizations.
See the article Myth - Defining loop variables inside the loop is bad for performance for a similar example.
You can disassemble the code with javap -c and check what the compiler actually emits. On my setup (java 1.5/mac compiled with eclipse), the bytecode for the loop is identical.
The first code is better as it restricts scope of o variable to the for block. From a performance perspective, it might not have any effects in Java, but it might have in lower level compilers. They might put the variable in a register if you do the first.
In fact, some people might think that if the compiler is dumb, the second snippet is better in terms of performance. This is what some instructor told me at the college and I laughed at him for this suggestion! Basically, compilers allocate memory on the stack for the local variables of a method just once at the start of the method (by adjusting the stack pointer) and release it at the end of method (again by adjusting the stack pointer, assuming it's not C++ or it doesn't have any destructors to be called). So all stack-based local variables in a method are allocated at once, no matter where they are declared and how much memory they require. Actually, if the compiler is dumb, there is no difference in terms of performance, but if it's smart enough, the first code can actually be better as it'll help the compiler understand the scope and the lifetime of the variable! By the way, if it's really smart, there should no absolutely no difference in performance as it infers the actual scope.
Construction of a object using new is totally different from just declaring it, of course.
I think readability is more important that performance and from a readability standpoint, the first code is definitely better.
I've got to admit I don't know java. But are these two equivalent? Are the object lifetimes the same? In the first example, I assume (not knowing java) that o will be eligible for garbage collection immediately the loop terminates.
But in the second example surely o won't be eligible for garbage collection until the outer scope (not shown) is exited?
Don't prematurely optimize. Better than either of these is:
for(Object o : someList) {
o.doSomething();
}
because it eliminates boilerplate and clarifies intent.
Unless you are working on embedded systems, in which case all bets are off. Otherwise, don't try to outsmart the JVM.
I've always thought that most compilers these days are smart enough to do the latter option. Assuming that's the case, I would say the first one does look nicer as well. If the loop gets very large, there's no need to look all around for where o is declared.
These have different semantics. Which is more meaningful?
Reusing an object for "performance reasons" is often wrong.
The question is what does the object "mean"? WHy are you creating it? What does it represent? Objects must parallel real-world things. Things are created, undergo state changes, and report their states for reasons.
What are those reasons? How does your object model and reflect those reasons?
To get at the heart of this question... [Note that non-JVM implementations may do things differently if allowed by the JLS...]
First, keep in mind that the local variable "o" in the example is a pointer, not an actual object.
All local variables are allocated on the runtime stack in 4-byte slots. doubles and longs require two slots; other primitives and pointers take one. (Even booleans take a full slot)
A fixed runtime-stack size must be created for each method invocation. This size is determined by the maximum local variable "slots" needed at any given spot in the method.
In the above example, both versions of the code require the same maximum number of local variables for the method.
In both cases, the same bytecode will be generated, updating the same slot in the runtime stack.
In other words, no performance penalty at all.
HOWEVER, depending on the rest of the code in the method, the "declaration outside the loop" version might actually require a larger runtime stack allocation. For example, compare
for (...) { Object o = ... }
for (...) { Object o = ... }
with
Object o;
for (...) { /* loop 1 */ }
for (...) { Object x =...; }
In the first example, both loops require the same runtime stack allocation.
In the second example, because "o" lives past the loop, "x" requires an additional runtime stack slot.
Hope this helps,
-- Scott
In both cases the type info for the object o is determined at compile time.In the second instance, o is seen as being global to the for loop and in the first instance, the clever Java compiler knows that o will have to be available for as long as the loop lasts and hence will optimise the code in such a way that there wont be any respecification of o's type in each iteration.
Hence, in both cases, specification of o's type will be done once which means the only performance difference would be in the scope of o. Obviously, a narrower scope always enhances performance, therefore to answer your question: no, there is no performance penalty for the first code snip; actually, this code snip is more optimised than the second.
In the second snip, o is being given unnecessary scope which, besides being a performance issue, can be also a security issue.
The first makes far more sense. It keeps the variable in the scope that it is used in. and prevents values assigned in one iteration being used in a later iteration, this is more defensive.
The former is sometimes said to be more efficient but any reasonable compiler should be able to optimise it to be exactly the same as the latter.
As someone who maintains more code than writes code.
Version 1 is much preferred - keeping scope as local as possible is essential for understanding. Its also easier to refactor this sort of code.
As discussed above - I doubt this would make any difference in efficiency. In fact I would argue that if the scope is more local a compiler may be able to do more with it!
When using multiple threads (if your doing 50+) then i found this to be a very effective way of handling ghost thread problems:
Object one;
Object two;
Object three;
Object four;
Object five;
try{
for (int i=0; i<someValue; i++)
{
o = someList.get(i);
o.doSomething;
}
}catch(e){
e.printstacktrace
}
finally{
one = null;
two = null;
three = null;
four = null;
five = null;
System.gc();
}
The answer depends partly on what the constructor does and what happens with the object after the loop, since that determines to a large extent how the code is optimized.
If the object is large or complex, absolutely declare it outside the loop. Otherwise, the people telling you not to prematurely optimize are right.
I've actually in front of me a code which looks like this:
for (int i = offset; i < offset + length; i++) {
char append = (char) (data[i] & 0xFF);
buffer.append(append);
}
...
for (int i = offset; i < offset + length; i++) {
char append = (char) (data[i] & 0xFF);
buffer.append(append);
}
...
for (int i = offset; i < offset + length; i++) {
char append = (char) (data[i] & 0xFF);
buffer.append(append);
}
So, relying on compiler abilities, I can assume there would be only one stack allocation for i and one for append. Then everything would be fine except the duplicated code.
As a side note, java applications are known to be slow. I never tried to do profiling in java but I guess the performance hit comes mostly from memory allocation management.
Related
In the following piece of code we make a call listType.getDescription() twice:
for (ListType listType: this.listTypeManager.getSelectableListTypes())
{
if (listType.getDescription() != null)
{
children.add(new SelectItem( listType.getId() , listType.getDescription()));
}
}
I would tend to refactor the code to use a single variable:
for (ListType listType: this.listTypeManager.getSelectableListTypes())
{
String description = listType.getDescription();
if (description != null)
{
children.add(new SelectItem(listType.getId() ,description));
}
}
My understanding is the JVM is somehow optimized for the original code and especially nesting calls like children.add(new SelectItem(listType.getId(), listType.getDescription()));.
Comparing the two options, which one is the preferred method and why? That is in terms of memory footprint, performance, readability/ease, and others that don't come to my mind right now.
When does the latter code snippet become more advantageous over the former, that is, is there any (approximate) number of listType.getDescription() calls when using a temp local variable becomes more desirable, as listType.getDescription() always requires some stack operations to store the this object?
I'd nearly always prefer the local variable solution.
Memory footprint
A single local variable costs 4 or 8 bytes. It's a reference and there's no recursion, so let's ignore it.
Performance
If this is a simple getter, the JVM can memoize it itself, so there's no difference. If it's a expensive call which can't be optimized, memoizing manually makes it faster.
Readability
Follow the DRY principle. In your case it hardly matters as the local variable name is character-wise as about as long as the method call, but for anything more complicated, it's readability as you don't have to find the 10 differences between the two expressions. If you know they're the same, so make it clear using the local variable.
Correctness
Imagine your SelectItem does not accept nulls and your program is multithreaded. The value of listType.getDescription() can change in the meantime and you're toasted.
Debugging
Having a local variable containing an interesting value is an advantage.
The only thing to win by omitting the local variable is saving one line. So I'd do it only in cases when it really doesn't matter:
very short expression
no possible concurrent modification
simple private final getter
I think the way number two is definitely better because it improves readability and maintainability of your code which is the most important thing here. This kind of micro-optimization won't really help you in anything unless you writing an application where every millisecond is important.
I'm not sure either is preferred. What I would prefer is clearly readable code over performant code, especially when that performance gain is negligible. In this case I suspect there's next to no noticeable difference (especially given the JVM's optimisations and code-rewriting capabilities)
In the context of imperative languages, the value returned by a function call cannot be memoized (See http://en.m.wikipedia.org/wiki/Memoization) because there is no guarantee that the function has no side effect. Accordingly, your strategy does indeed avoid a function call at the expense of allocating a temporary variable to store a reference to the value returned by the function call.
In addition to being slightly more efficient (which does not really matter unless the function is called many times in a loop), I would opt for your style due to better code readability.
I agree on everything. About the readability I'd like to add something:
I see lots of programmers doing things like:
if (item.getFirst().getSecond().getThird().getForth() == 1 ||
item.getFirst().getSecond().getThird().getForth() == 2 ||
item.getFirst().getSecond().getThird().getForth() == 3)
Or even worse:
item.getFirst().getSecond().getThird().setForth(item2.getFirst().getSecond().getThird().getForth())
If you are calling the same chain of 10 getters several times, please, use an intermediate variable. It's just much easier to read and debug
I would agree with the local variable approach for readability only if the local variable's name is self-documenting. Calling it "description" wouldn't be enough (which description?). Calling it "selectableListTypeDescription" would make it clear. I would throw in that the incremented variable in the for loop should be named "selectableListType" (especially if the "listTypeManager" has accessors for other ListTypes).
The other reason would be if there's no guarantee this is single-threaded or your list is immutable.
While I was thinking over the memory usage of various types, I started to become a bit confused of how Java utilizes memory for integers when passed to a method.
Say, I had the following code:
public static void main (String[] args){
int i = 4;
addUp(i);
}
public static int addUp(int i){
if(i == 0) return 0;
else return addUp(i - 1);
}
In this following example, I am wondering if my following logic was correct:
I have made a memory initially for integer i = 4. Then I pass it to a method. However, since primitives are not pointed in Java, in the addUp(i == 4), I create another integer i = 4. Then afterwards, there is another addUp(i == 3), addUp(i == 2), addUp(i == 1), addUp(i == 0) in which each time, since the value is not pointed, a new i value is allocated in the memory.
Then for a single "int i" value, I have used 6 integer value memories.
However, if I were to always pass it through an array:
public static void main (String[] args){
int[] i = {4};
// int tempI = i[0];
addUp(i);
}
public static int addUp(int[] i){
if(i[0] == 0) return 0;
else return addUp(i[0] = i[0] - 1);
}
- Since I create an integer array of size 1 and then pass that to addUp which will again be passed for addUp(i[0] == 3), addUp(i[0] == 2), addUp(i[0] == 1), addUp(i[0] == 0), I have only had to use 1 integer array memory space and hence far more cost efficient. In addition, if I were to make a int value beforehand to store the initial value of i[0], I still have my "original" value.
Then this leads me to the question, why do people pass primitives like int in Java methods? Isn't it far more memory efficient to just pass the array values of those primitives? Or is the first example somehow still just O(1) memory?
And on top of this question, I just wonder the memory differences of using int[] and int especially for a size of 1. Thank you in advance. I was simply wondering being more memory efficient with Java and this came to my head.
Thanks for all the answers! I'm just now quickly wondering if I were to "analyze" big-oh memory of each code, would they both be considered O(1) or would that be wrong to assume?
What you are missing here: the int values in your example go on the stack, not on the heap.
And it is much less overhead to deal with fixed size primitive values existing on the stack - compared to objects on the heap!
In other words: using a "pointer" means that you have to create a new object on the heap. All objects live on the heap; there is no stack for arrays! And objects becomes subject to garbage collection immediately after you stopped using them. Stacks on the other hand come and go as you invoke methods!
Beyond that: keep in mind that the abstractions that programming languages provide to us are created to help us writing code that is easy to read, understand and maintain. Your approach is basically to do some sort of fine tuning that leads to more complicated code. And that is not how Java solves such problems.
Meaning: with Java, the real "performance magic" happens at runtime, when the just-in-time compiler kicks in! You see, the JIT can inline calls to small methods when the method is invoked "often enough". And then it becomes even more important to keep data "close" together. As in: when data lives on the heap, you might have to access memory to get a value. Whereas items living on the stack - might still be "close" (as in: in the processor cache). So your little idea to optimize memory usage could actually slow down program execution by orders of magnitude. Because even today, there are orders of magnitude between accessing the processor cache and reading main memory.
Long story short: avoid getting into such "micro-tuning" for either performance or memory usage: the JVM is optimized for the "normal, typical" use cases. Your attempts to introduce clever work-arounds can therefore easily result in "less good" results.
So - when you worry about performance: do what everybody else is doing. And if you one really care - then learn how the JVM works. As it turns out that even my knowledge is slightly outdated - as the comments imply that a JIT can inline objects on the stack. In that sense: focus on writing clean, elegant code that solves the problem in straight forward way!
Finally: this is subject to change at some point. There are ideas to introduce true value value objects to java. Which basically live on the stack, not the heap. But don't expect that to happen before Java10. Or 11. Or ... (I think this would be relevant here).
Several things:
First thing will be splitting hairs, but when you pass an int in java you are allocating 4 bytes onto the stack, and when you pass an array (because it is a reference) you are actually allocating 8 bytes (assuming an x64 architecture) onto the stack, plus the additional 4 bytes that store the int into the heap.
More importantly, the data that lives in the array is allocated into the heap, whereas the reference to the array itself is allocated onto the stack, when passing an integer there is no heap allocation required the primitive is only allocated into the stack. Over time reducing the heap allocations will mean that the garbage collector will have fewer things to clean up. Whereas the cleanup of stack-frames is trivial and doesn't require additional processing.
However, this is all moot (imho) because in practice when you have complicated collections of variables and objects you are likely going to end up grouping them together into a class. In general, you should be writing to promote readability and maintainability rather than trying to squeeze every last drop of performance out of the JVM. The JVM is pretty quick as it is, and there is always Moore's Law as a backstop.
It would be difficult to analyze the the Big-O for each because in order to get a true picture you would have to factor in the behavior of the garbage collector and that behavior is highly dependent on both the JVM itself and any runtime (JIT) optimizations that the JVM has made to your code.
Please remember Donald Knuth's wise words that "premature optimization is the root of all evil"
Write code that avoids micro-tuning, code that promotes readability and maintainability will fare better over the long run.
If your assumption is that arguments passed to functions necessarily consume memory (which is false by the way), then in your second example that passes an array, a copy of the reference to the array is made. That reference may actually be larger than an int, it's unlikely to be smaller.
Whether these methods take O(1) or O(N) depends on the compiler. (Here N is the value of i or i[0], depending.) If the compiler uses tail-recursion optimization then the stack space for the parameters, local variables, and return address can be reused and the implementation will then be O(1) for space. Absent tail-recursion optimization the space complexity is the same as the time complexity, O(N).
Basically tail-recursion optimization amounts (in this case) to the compiler rewriting your code as
public static int addUp(int i){
while(i != 0) i = i-1 ;
return 0;
}
or
public static int addUp(int[] i){
while(i[0] != 0) i[0] = i[0] - 1 ;
return 0 ;
}
A good optimizer might further optimize away the loops.
As far as I know, no Java compilers implement tail-recursion optimization at present, but there is no technical reason that it can't be done in many cases.
Actually, when you pass an array as a parameter to a method - a reference to this array is passed under the hood. The array itself is stored on the heap. And the reference can be 4 or 8 bytes in size (depending on CPU architecture, JVM implementation, etc.; even more, JLS doesn't say anything about how big a reference is in memory).
On the other hand, primitive int value always consumes only 4 bytes and resides on the stack.
When you pass an array, the content of the array may be modified by the method that receives the array. When you pass int primitives, those primitives may not be modified by the method that receives them. That's why sometimes you may use primitives and sometimes arrays.
Also in general, in Java programming you tend to favor readability and let this kind of memory optimizations be done by the JIT compiler.
The int array reference actually takes up more space in the stack frames than an int primitive (8 bytes vs 4). You're actually using more space.
But I think the primary reason people prefer the first way is because it's clearer and more legible.
People actually do do things a lot closer to the second when more ints are involved.
Lets say I have the following code:
private Rule getRuleFromResult(Fact result){
Rule output=null;
for (int i = 0; i < rules.size(); i++) {
if(rules.get(i).getRuleSize()==1){output=rules.get(i);return output;}
if(rules.get(i).getResultFact().getFactName().equals(result.getFactName())) output=rules.get(i);
}
return output;
}
Is it better to leave it as it is or to change it as follows:
private Rule getRuleFromResult(Fact result){
Rule output=null;
Rule current==null;
for (int i = 0; i < rules.size(); i++) {
current=rules.get(i);
if(current.getRuleSize()==1){return current;}
if(current.getResultFact().getFactName().equals(result.getFactName())) output=rules.get(i);
}
return output;
}
When executing, program goes each time through rules.get(i) as if it was the first time, and I think it, that in much more advanced example (let's say as in the second if) it takes more time and slows execution. Am I right?
Edit: To answer few comments at once: I know that in this particular example time gain will be super tiny, but it was just to get the general idea. I noticed I tend to have very long lines object.get.set.change.compareTo... etc and many of them repeat. In scope of whole code that time gain can be significant.
Your instinct is correct--saving intermediate results in a variable rather than re-invoking a method multiple times is faster. Often the performance difference will be too small to measure, but there's an even better reason to do this--clarity. By saving the value into a variable, you make it clear that you are intending to use the same value everywhere; if you re-invoke the method multiple times, it's unclear if you are doing so because you are expecting it to return different results on different invocations. (For instance, list.size() will return a different result if you've added items to list in between calls.) Additionally, using an intermediate variable gives you an opportunity to name the value, which can make the intention of the code clearer.
The only different between the two codes, is that in the first you may call twice rules.get(i) if the value is different one one.
So the second version is a little bit faster in general, but you will not feel any difference if the list is not bit.
It depends on the type of the data structure that "rules" object is. If it is a list then yes the second one is much faster as it does not need to search for rules(i) through rules.get(i). If it is a data type that allows you to know immediately rules.get(i) ( like an array) then it is the same..
In general yes it's probably a tiny bit faster (nano seconds I guess), if called the first time. Later on it will be probably be improved by the JIT compiler either way.
But what you are doing is so called premature optimization. Usually should not think about things that only provide a insignificant performance improvement.
What is more important is the readability to maintain the code later on.
You could even do more premature optimization like saving the length in a local variable, which is done by the for each loop internally. But again in 99% of cases it doesn't make sense to do it.
I have a basic doubt regarding object creation in java.Suppose i have two classes as follows
Class B{
public int value=100;
}
Class A{
public B getB(){
return new B();
}
public void accessValue(){
//accessing the value without storing object B
System.out.println("value is :"+getB().value);
//accessing the value by storing object B in variable b
B b=getB();
System.out.println("value is :"+b.value);
}
}
My question is,does storing the object and accessing the value make any difference in terms of memory or both are same?
They are both equivalent, since you are instantiating B both times. The first way is just a shorter version of the second.
Following piece of code is using an anonymous object. which can't be reused later in code.
//accessing the value without storing object B
System.out.println("value is :"+getB().value);
Below code uses the object by assigning it to a reference.
//accessing the value by storing object B in variable b
B b=getB();
System.out.println("value is :"+b.value);
Memory and performance wise it's NOT much difference except that in later version stack frame has an extra pointer.
It is the same. This way: B b=getB(); just keeps your code more readable. Keep in mind, that object must be stored somewhere in memory anyway.
If you never reuse the B-object after this part, the first option with an anonymous object is probably neater:
the second option would need an additional store/load command (as Hot Licks mentioned) if it isn't optimized by the compiler
possibly first storing the object in a variable creates slight overhead for the garbage collector as opposed to an anonymous object, but that's more of a "look into that" than a definitive statement of me
If you do want to access a B a second time, storing one in its own variable is faster.
EDIT: ah, both points already mentioned above while I was typing.
You will not be able to say the difference without looking at the generated machine code. It could be that the JIT puts the local variable "b" onto the stack. More likely however that the JIT will optimize b away. Depends on the JRE and JIT you are using. In any case, the difference is minor and only significant in extremely special cases.
Actually there is no difference in the second instance you are just giving the new object reference to b.
So code wise you cannot achieve the println if you use version 1, as you dont have any reference as you have in the second case unless you keep creating new object for every method call.
In that case the difference, if any, would not be worth mentioning. In the second case an extra bytecode or two would probably be generated, if the compiler didn't optimize them away, but any decent JIT would almost certainly optimize the two cases to the identical machine code.
And, in any event, the cost of an extra store/load would be inconsequential for 99.9% of applications (and swamped in this case by the new operation).
Details: If you look at the bytecodes, in the first case the getB method is called and returns a value on the top of the stack. Then a getfield references value and places that on the top of the stack. Then a StringBuilder append is done to begin building the println parameter list.
In the second case there is an extra astore and aload (pointer store/load) after the getB invocation, and the setup for the StringBuilder is stuck between the two, so that side-effects occur in the order specified in the source. If there were no side-effects to worry about the compiler might have chosen to do the very slightly more efficient dupe and astore sequence. In any case, a decent JIT would recognize that b is never used again and optimize away the store.
I'm cleaning up Java code for someone who starts their functions by declaring all variables up top, and initializing them to null/0/whatever, as opposed to declaring them as they're needed later on.
What are the specific guidelines for this? Are there optimization reasons for one way or the other, or is one way just good practice? Are there any cases where it's acceptable to deviate from whatever the proper way of doing it is?
Declare variables as close to the first spot that you use them as possible. It's not really anything to do with efficiency, but makes your code much more readable. The closer a variable is declared to where it is used, the less scrolling/searching you have to do when reading the code later. Declaring variables closer to the first spot they're used will also naturally narrow their scope.
The proper way is to declare variables exactly when they are first used and minimize their scope in order to make the code easier to understand.
Declaring variables at the top of functions is a holdover from C (where it was required), and has absolutely no advantages (variable scope exists only in the source code, in the byte code all local variables exist in sequence on the stack anyway). Just don't do it, ever.
Some people may try to defend the practice by claiming that it is "neater", but any need to "organize" code within a method is usually a strong indication that the method is simply too long.
From the Java Code Conventions, Chapter 6 on Declarations:
6.3 Placement
Put declarations only at the beginning
of blocks. (A block is any code
surrounded by curly braces "{" and
"}".) Don't wait to declare variables
until their first use; it can confuse
the unwary programmer and hamper code
portability within the scope.
void myMethod() {
int int1 = 0; // beginning of method block
if (condition) {
int int2 = 0; // beginning of "if" block
...
}
}
The one exception to the rule is
indexes of for loops, which in Java
can be declared in the for statement:
for (int i = 0; i < maxLoops; i++) { ... }
Avoid local declarations that hide
declarations at higher levels. For
example, do not declare the same
variable name in an inner block:
int count;
...
myMethod() {
if (condition) {
int count = 0; // AVOID!
...
}
...
}
If you have a kabillion variables used in various isolated places down inside the body of a function, your function is too big.
If your function is a comfortably understandable size, there's no difference between "all up front" and "just as needed".
The only not-up-front variable would be in the body of a for statement.
for( Iterator i= someObject.iterator(); i.hasNext(); )
From Google Java Style Guide:
4.8.2.2 Declared when needed
Local variables are not habitually declared at the start of their
containing block or block-like construct. Instead, local variables are
declared close to the point they are first used (within reason), to
minimize their scope. Local variable declarations typically have
initializers, or are initialized immediately after declaration.
Well, I'd follow what Google does, on a superficial level it might seem that declaring all variables at the top of the method/function would be "neater", it's quite apparent that it'd be beneficial to declare variables as necessary. It's subjective though, whatever feels intuitive to you.
I've found that declaring them as-needed results in fewer mistakes than declaring them at the beginning. I've also found that declaring them at the minimum scope possible to also prevent mistakes.
When I looked at the byte-code generated by the location of the declaration few years ago, I found they were more-or-less identical. There were ocassionally differences depending on when they were assigned. Even something like:
for(Object o : list) {
Object temp = ...; //was not "redeclared" every loop iteration
}
vs
Object temp;
for(Object o : list) {
temp = ...; //nearly identical bytecoode, if not exactly identical.
}
Came out more or less identical
I am doing this very same thing at the moment. All of the variables in the code that I am reworking are declared at the top of the function. I've seen as I've been looking through this that several variables are declared but NEVER used or they are declared and operations are being done with them (ie parsing a String and then setting a Calendar object with the date/time values from the string) but then the resulting Calendar object is NEVER used.
I am going through and cleaning these up by taking the declarations from the top and moving them down in the function to a spot closer to where it is used.
Defining variable in a wider scope than needed hinders understandability quite a bit. Limited scope signals that this variable has meaning for only this small block of code and you can not think about when reading further. This is a pretty important issue because of the tiny short-term working memory that the brain has (it said that on average you can keep track of only 7 things). One less thing to keep track of is significant.
Similarly you really should try to avoid variables in the literal sense. Try to assign all things once, and declare them final so this is known to the reader. Not having to keep track whether something changes or not really cuts the cognitive load.
Principle: Place local variable declarations as close to their first use as possible, and NOT simply at the top of a method. Consider this example:
/** Return true iff s is a blah or a blub. */
public boolean checkB(String s) {
// Return true if s is a blah
... code to return true if s is a blah ...
// Return true if s is a blub. */
int helpblub= s.length() + 1;
... rest of code to return true is s is a blah.
return false;
}
Here, local variable helpblub is placed where it is necessary, in the code to test whether s is a blub. It is part of the code that implements "Return true is s is a blub".
It makes absolutely no logical sense to put the declaration of helpblub as the first statement of the method. The poor reader would wonder, why is that variable there? What is it for?
I think it is actually objectively provable that the declare-at-the-top style is more error-prone.
If you mutate-test code in either style by moving lines around at random (to simulate a merge gone bad or someone unthinkingly cut+pasting), then the declare-at-the-top style has a greater chance of compiling while functionally wrong.
I don't think declare-at-the-top has any corresponding advantage that doesn't come down to personal preference.
So assuming you want to write reliable code, learn to prefer doing just-in-time declaration.
Its a matter of readability and personal preference rather than performance. The compiler does not care and will generate the same code anyway.
I've seen people declare at the top and at the bottom of functions. I prefer the top, where I can see them quickly. It's a matter of choice and preference.