Undefined behaviour

Undefined behaviour - java

I'm running some simulations in C++ and I've run into a weird problem. I have the following function, which returns a vector of doubles:
vector<double> processSimulation(int Q){
//do things
vector<double> output;
output.push_back(mean);
output.push_back(variance);
return output;
}
In the main, I have the following:
//define Q
vector<double>::iterator it = processSimulation(Q).begin();
double mean = *it;
double variance = *(it+1);
The problem is that I get a wrong number for the mean (something with e-305) and a correct number for the variance.
I tried to explain this behaviour myself, and I think it was probably caused by undefined behaviour, since the iterator points into the vector returned by the function, which is now out of scope and no longer exists. Am I correct?
I was probably just lucky that the variance came out correct, as it could also have been wrong.
I changed the code to
vector<double> output = processSimulation(Q);
vector<double>::iterator it = output.begin();
//same as before
and it works just fine, which strengthens my hypothesis.
Also, I noticed something odd in the debugger: while trying to figure out what was happening (before fixing the code), I inspected the values of mean and variance in the debugger and they were BOTH wrong. However, when I ran the program, only the mean was wrong (I tried this many times and it was always the same: both wrong while debugging, mean wrong and variance correct while running). What's happening here?
Java question: this problem is really bugging me, because in Java I often shortened things by not storing returned objects in new variables and instead calling methods directly on the result of the function that returns the object (as in this example). However, I've never run into any problem. Have I always been doing this inadvertently (and luckily)? Or is it just that Java has no such behaviour, since functions that appear to return objects actually return pointers (references) to them, and the actual objects always live on the heap (and are garbage-collected when nothing refers to them anymore)?
Hope you can clarify my doubts!

This is a very common mistake people make when they get lazy about chaining calls on rvalues instead of storing the result in a local variable.
vector<double>::iterator it = processSimulation(Q).begin();
In the above, your processSimulation(Q) call returns a vector<double> by value. That return value is a temporary: you obtain an iterator to its beginning and store the iterator, but the temporary vector itself is destroyed at the end of the full expression (at the semicolon). That leaves a dangling iterator.
And now you start using it. Remember, the iterator itself still holds a value, but it points into an object that no longer exists:
double mean = *it; // undefined behaviour
double variance = *(it+1); // undefined behaviour
Think of it as being a little bit like this:
vector<double>::iterator it;
{
vector<double> result = processSimulation(Q);
it = result.begin();
}
double mean = *it; // boom
When you change the code to store the return value in a local variable, the behaviour will be defined, provided the vector stays in scope for the entire time you are using the iterator.
And so this is correct (excepting the C++ style-related comments on your question):
vector<double> output = processSimulation(Q);
vector<double>::iterator it = output.begin();
double mean = *it;
double variance = *(it+1);
But you could just as easily have ditched the iterator and used the subscript operator:
double mean = output[0];
double variance = output[1];
You might want to consider returning your own struct that encapsulates this information, rather than a vector. Or at the very least switch to using std::pair<double, double>.
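On the Java half of the question: a Java method that "returns an object" really returns a reference to an object on the heap, and that object stays alive as long as anything still refers to it. The sketch below uses a hypothetical Java counterpart of processSimulation that returns a List<Double> filled with placeholder values. The iterator taken directly from the call result refers back to the list (ArrayList's iterator is an inner object of the list), so the list stays reachable and nothing can dangle.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ChainingDemo {
    // Hypothetical Java counterpart of processSimulation: returns a reference
    // to a newly allocated list on the heap (placeholder values, not real statistics).
    static List<Double> processSimulation(int q) {
        List<Double> output = new ArrayList<>();
        output.add(q * 0.5);  // stand-in for "mean"
        output.add(q * 0.25); // stand-in for "variance"
        return output;
    }

    static double[] readViaChainedIterator(int q) {
        // The iterator keeps a reference to the list, so the list cannot be
        // garbage-collected while the iterator is still in use.
        Iterator<Double> it = processSimulation(q).iterator();
        double mean = it.next();
        double variance = it.next();
        return new double[] { mean, variance };
    }

    public static void main(String[] args) {
        double[] result = readViaChainedIterator(4);
        System.out.println(result[0] + " " + result[1]); // prints "2.0 1.0"
    }
}
```

This is why the chained style has never bitten you in Java: there is no counterpart to the C++ temporary whose lifetime ends at the semicolon.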

Related

In a Java method, why do we need to initialize a variable if we want to return it?

Was there any reason why the designers of Java felt that local variables should not be given a default value? Seriously, if instance variables can be given a default value, then why can't we do the same for local variables?
And it also leads to problems as explained in this comment to a blog post:
Well this rule is most frustrating when trying to close a resource in a finally block. If I instantiate the resource inside a try, but try to close it within the finally, I get this error. If I move the instantiation outside the try, I get another error stating that a it must be within a try.
Very frustrating.

Local variables are declared mostly to do some calculation. So it's the programmer's decision to set the value of the variable and it should not take a default value.
If the programmer forgets to initialize a local variable and it silently takes a default value, the output could be some unexpected value. So for local variables, the compiler requires the programmer to assign a value before the variable is read, to avoid the use of undefined values.

The "problem" you link to seems to be describing this situation:
SomeObject so;
try {
// Do some work here ...
so = new SomeObject();
so.DoUsefulThings();
} finally {
so.CleanUp(); // Compiler error here
}
The commenter's complaint is that the compiler balks at the line in the finally section, claiming that so might be uninitialized. The comment then mentions another way of writing the code, probably something like this:
// Do some work here ...
SomeObject so = new SomeObject();
try {
so.DoUsefulThings();
} finally {
so.CleanUp();
}
The commenter is unhappy with that solution because the compiler then says that the code "must be within a try." I guess that means some of the code may raise an exception that isn't handled anymore. I'm not sure. Neither version of my code handles any exceptions, so anything exception-related in the first version should work the same in the second.
Anyway, this second version of code is the correct way to write it. In the first version, the compiler's error message was correct. The so variable might be uninitialized. In particular, if the SomeObject constructor fails, so will not be initialized, and so it will be an error to attempt to call so.CleanUp. Always enter the try section after you have acquired the resource that the finally section finalizes.
The try-finally block after the so initialization is there only to protect the SomeObject instance, to make sure it gets cleaned up no matter what else happens. If there are other things that need to run, but they aren't related to whether the SomeObject instance was properly allocated, then they should go in another try-finally block, probably one that wraps the one I've shown.
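A minimal sketch of that structure, assuming a hypothetical SomeObject whose doUsefulThings() may throw (method names are lowercased to follow Java conventions, and the log list exists only to make the order of events visible): the resource is acquired before the inner try, so cleanUp() only ever runs on a constructed object, while the unrelated work gets its own outer try-finally. The surrounding catch is there only so the demo can return its log.

```java
import java.util.ArrayList;
import java.util.List;

public class TryFinallyDemo {
    // Hypothetical resource used only for illustration.
    static class SomeObject {
        private final List<String> log;

        SomeObject(List<String> log) {
            this.log = log;
            log.add("acquired");
        }

        void doUsefulThings(boolean fail) {
            log.add("working");
            if (fail) {
                throw new IllegalStateException("work failed");
            }
        }

        void cleanUp() {
            log.add("cleaned up");
        }
    }

    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try {
            try {
                log.add("other work");               // work unrelated to the resource
                SomeObject so = new SomeObject(log); // acquire first...
                try {                                // ...then enter the try
                    so.doUsefulThings(fail);
                } finally {
                    so.cleanUp(); // so is definitely assigned here
                }
            } finally {
                log.add("outer cleanup"); // cleanup for the unrelated work
            }
        } catch (IllegalStateException e) {
            log.add("caught: " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false));
        System.out.println(run(true));
    }
}
```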
Requiring variables to be assigned manually before use does not lead to real problems. It only leads to minor hassles, but your code will be better for it. You'll have variables with more limited scope, and try-finally blocks that don't try to protect too much.
If local variables had default values, then so in the first example would have been null. That wouldn't really have solved anything. Instead of getting a compile-time error in the finally block, you'd have a NullPointerException lurking there that might hide whatever other exception could occur in the "Do some work here" section of the code. (Or do exceptions in finally sections automatically chain to the previous exception? I don't remember. Even so, you'd have an extra exception in the way of the real one.)
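On the open question at the end: in Java, an exception thrown from a finally block does not chain to the pending one. It replaces it, and the original exception is discarded (the Java Language Specification describes this for try-finally in §14.20.2). The sketch below uses a made-up work() method standing in for the first version of the code with so left at null, and checks that the original exception is neither the cause nor a suppressed exception of the one that escapes:

```java
public class FinallyReplacesDemo {
    // Stands in for the first version of the code with 'so' defaulted to null:
    // the body fails, and then the finally block fails too.
    static void work() {
        String so = null;
        try {
            throw new IllegalStateException("real problem from the try body");
        } finally {
            so.length(); // NullPointerException thrown while the first exception is pending
        }
    }

    // Runs work() and returns whatever exception escapes from it.
    static Throwable escapedException() {
        try {
            work();
        } catch (RuntimeException e) {
            return e;
        }
        return null;
    }

    public static void main(String[] args) {
        Throwable t = escapedException();
        // The IllegalStateException is gone: it is neither the cause nor suppressed.
        System.out.println(t.getClass().getSimpleName()
                + ", cause=" + t.getCause()
                + ", suppressed=" + t.getSuppressed().length);
        // prints "NullPointerException, cause=null, suppressed=0"
    }
}
```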

Moreover, in the example below, an exception may be thrown inside the SomeObject constructor, in which case 'so' would never be assigned (it would be null if local variables defaulted to null), and the call to CleanUp would throw a NullPointerException:
SomeObject so;
try {
// Do some work here ...
so = new SomeObject();
so.DoUsefulThings();
} finally {
so.CleanUp(); // Compiler error here
}
What I tend to do is this:
SomeObject so = null;
try {
// Do some work here ...
so = new SomeObject();
so.DoUsefulThings();
} finally {
if (so != null) {
so.CleanUp(); // safe
}
}

The actual answer to your question is that method variables are allocated by simply adding a number to the stack pointer. Zeroing them would be an extra step. Fields, on the other hand, are placed in initialized memory on the heap.
Why not take the extra step? Take a step back: nobody has mentioned that the "warning" in this case is a Very Good Thing.
You should never initialize your variable to zero or null on the first pass (when you are first coding it). Either assign it to the actual value or don't assign it at all because if you don't then Java can tell you when you really screw up. Take Electric Monk's answer as a great example. In the first case, it's actually amazingly useful that it's telling you that if the try() fails because SomeObject's constructor threw an exception, then you would end up with an NPE in the finally. If the constructor can't throw an exception, it shouldn't be in the try.
This warning is an awesome multi-path bad-programmer checker that has saved me from doing stupid stuff, since it checks every path and makes sure that if you used the variable in some path, then you initialized it in every path that led up to it. I now never explicitly initialize variables until I determine that it is the correct thing to do.
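A small sketch of the path-by-path check described above (the sign method is made up for illustration): it compiles only because every branch assigns label before it is read. Removing the final else would make javac reject the method with an error along the lines of "variable label might not have been initialized", rather than silently using a default.

```java
public class DefiniteAssignmentDemo {
    static String sign(int n) {
        String label; // deliberately not initialized
        if (n < 0) {
            label = "negative";
        } else if (n == 0) {
            label = "zero";
        } else {
            label = "positive";
        }
        // Compiles because label is definitely assigned on every path above.
        return label;
    }

    public static void main(String[] args) {
        System.out.println(sign(-3) + " " + sign(0) + " " + sign(7)); // prints "negative zero positive"
    }
}
```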
On top of that, isn't it better to explicitly say "int size=0" rather than "int size" and make the next programmer go figure out that you intend it to be zero?
On the flip side I can't come up with a single valid reason to have the compiler initialize all uninitialized variables to 0.

Notice that final instance/member variables don't get a usable default value. Because they are final and can't be changed later in the program, Java doesn't let you rely on a default for them and forces the programmer to initialize them.
On the other hand, non-final member variables can be changed later, so they are given a default value instead of being left uninitialised. Local variables have a much narrower scope, and the compiler knows exactly where they are used; hence, forcing the programmer to initialize them makes sense.

For me, the reason comes down to this: the purpose of local variables is different from the purpose of instance variables. Local variables are there to be used as part of a calculation; instance variables are there to contain state. If you use a local variable without assigning it a value, that's almost certainly a logic error.
That said, I could totally get behind requiring that instance variables always be explicitly initialized; the error would occur on any constructor that allows an instance variable to remain uninitialized (e.g., not initialized at declaration and not in the constructor). But that's not the decision Gosling et al. took in the early '90s, so here we are. (And I'm not saying they made the wrong call.)
I could not get behind defaulting local variables, though. Yes, we shouldn't rely on compilers to double-check our logic, and one doesn't, but it's still handy when the compiler catches one out. :-)

I think the primary purpose was to maintain similarity with C/C++. However, the Java compiler detects the use of uninitialized local variables and reports it as an error, which reduces the problem to a minimum. From a performance perspective, it's a little faster to let you declare uninitialized variables, since the compiler does not have to emit an assignment, even if you overwrite the value of the variable in the next statement.

It is more efficient not to initialize variables, and in the case of local variables it is safe to do so, because initialization can be tracked by the compiler.
In cases where you need a variable to be initialized you can always do it yourself, so it is not a problem.

The idea behind local variables is they only exist inside the limited scope for which they are needed. As such, there should be little reason for uncertainty as to the value, or at least, where that value is coming from. I could imagine many errors arising from having a default value for local variables.
For example, consider the following simple code... (N.B. let us assume for demonstration purposes that local variables are assigned a default value, as specified, if not explicitly initialized)
System.out.println("Enter grade");
int grade = new Scanner(System.in).nextInt(); // I won't bother with exception handling here, to cut down on lines.
char letterGrade; // Let us assume the default value for a char is '\0'
if (grade >= 90)
letterGrade = 'A';
else if (grade >= 80)
letterGrade = 'B';
else if (grade >= 70)
letterGrade = 'C';
else if (grade >= 60)
letterGrade = 'D';
else
letterGrade = 'F';
System.out.println("Your grade is " + letterGrade);
When all is said and done, assuming the compiler assigned a default value of '\0' to letterGrade, this code as written would work properly. However, what if we forgot the else statement?
A test run of our code might result in the following:
Enter grade
43
Your grade is
This outcome, while to be expected, surely was not the coder's intent. Indeed, in probably the vast majority of cases (or at least a significant number thereof), the default value wouldn't be the desired value, so in most cases the default value would result in an error. It makes more sense to force the coder to assign an initial value to a local variable before using it, since the debugging grief caused by forgetting the = 1 in for(int i = 1; i < 10; i++) far outweighs the convenience of not having to include the = 0 in for(int i; i < 10; i++).
It is true that try-catch-finally blocks could get a little messy (but it isn't actually a catch-22 as the quote seems to suggest), when for example an object throws a checked exception in its constructor, yet for one reason or another, something must be done to this object at the end of the block in finally. A perfect example of this is when dealing with resources, which must be closed.
One way to handle this in the past might be like so...
Scanner s = null; // Declared and initialized to null outside the block. This gives us the needed scope, and an initial value.
try {
s = new Scanner(new FileInputStream(new File("filename.txt")));
int someInt = s.nextInt();
} catch (InputMismatchException e) {
System.out.println("Some error message");
} catch (IOException e) {
System.out.println("different error message");
} finally {
if (s != null) // In case exception during initialization prevents assignment of new non-null value to s.
s.close();
}
However, as of Java 7, this finally block is no longer necessary using try-with-resources, like so.
try (Scanner s = new Scanner(new FileInputStream(new File("filename.txt")))) {
...
...
} catch(IOException e) {
System.out.println("different error message");
}
That said, as the name suggests, this only works with resources, that is, objects that implement AutoCloseable.
And while the former example is a bit yucky, this perhaps speaks more to the way try-catch-finally or these classes are implemented than it speaks about local variables and how they are implemented.
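To see what try-with-resources does for any AutoCloseable, here is a sketch with a hypothetical Tracked resource that records when it is closed: close() runs even when the body throws, the resources are closed in the reverse order of their declaration, and both are already closed by the time the catch clause runs.

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesDemo {
    // Hypothetical resource that records when it is closed.
    static class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() { // narrows the throws clause: no checked exception
            log.add("closed " + name);
        }
    }

    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try (Tracked first = new Tracked("first", log);
             Tracked second = new Tracked("second", log)) {
            log.add("body");
            if (fail) {
                throw new IllegalStateException("body failed");
            }
        } catch (IllegalStateException e) {
            // Both resources were closed before this catch clause runs.
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints "[body, closed second, closed first]"
        System.out.println(run(true));  // prints "[body, closed second, closed first, caught]"
    }
}
```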
It is true that fields are initialized to a default value, but this is a bit different. When you say, for example, int[] arr = new int[10];, as soon as you've initialized this array, the object exists in memory at a given location. Let's assume for a moment that there were no default values, and that instead the initial value were whatever series of 1s and 0s happened to be in that memory location at the moment. This could lead to non-deterministic behavior in a number of cases.
Suppose we have...
int[] arr = new int[10];
if(arr[0] == 0)
System.out.println("Same.");
else
System.out.println("Not same.");
It would be perfectly possible that Same. might be displayed in one run and Not same. might be displayed in another. The problem could become even more grievous once you start talking reference variables.
String[] s = new String[5];
According to definition, each element of s should point to a String (or is null). However, if the initial value is whatever series of 0s and 1s happens to occur at this memory location, not only is there no guarantee you'll get the same results each time, but there's also no guarantee that the object s[0] points to (assuming it points to anything meaningful) even is a String (perhaps it's a Rabbit, :p)! This lack of concern for type would fly in the face of pretty much everything that makes Java Java. So while having default values for local variables could be seen as optional at best, having default values for instance variables is closer to a necessity.
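The defaults described above can be observed directly. In this small sketch (the class and field names are illustrative), every element of a new int array is 0, every element of a new String array is null, and unassigned fields take their type's default:

```java
public class DefaultValuesDemo {
    int count;    // field: defaults to 0
    String name;  // field: defaults to null
    boolean flag; // field: defaults to false

    static String describe() {
        int[] arr = new int[10];    // every element starts as 0
        String[] s = new String[5]; // every element starts as null
        DefaultValuesDemo d = new DefaultValuesDemo();
        return arr[9] + " " + s[4] + " " + d.count + " " + d.name + " " + d.flag;
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "0 null 0 null false"
    }
}
```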

Flip this around and ask: why are fields initialised to default values? If the Java compiler required you to initialise fields yourself instead of using their default values, that would be more efficient because there would be no need to zero out memory before you used it. So it would be a sensible language design if all variables were treated like local variables in this regard.
The reason is not because it's more difficult to check this for fields than for local variables. The Java compiler already knows how to check whether a field is definitely initialised by a constructor, because it has to check this for final fields. So it would be little extra work for the compiler to apply the same logic to other fields to ensure they are definitely assigned in the constructor.
The reason is that, even for final fields where the compiler proves that the field is definitely assigned in the constructor, its value before assignment can still be visible from other code:
class A {
final int x;
A() {
this.x = calculate();
}
int calculate() {
System.out.println(this.x);
return 1;
}
}
In this code, the constructor definitely assigns to this.x, but even so, the field's default initial value of 0 is visible in the calculate method at the point where this.x is printed. If the field wasn't zeroed out before the constructor was invoked, then the calculate method would be able to observe the contents of uninitialised memory, which would be non-deterministic behaviour and have potential security concerns.
The alternative would be to forbid the method call calculate() at this point in the code where the field isn't yet definitely assigned. But that would be inconvenient; it is useful to be able to call methods from the constructor like this. The convenience of being able to do that is worth more than the tiny performance cost of zeroing out the memory for the fields before invoking the constructor.
Note that this reasoning does not apply to local variables, because a method's uninitialised local variables are not visible to other methods; they are local.

Eclipse even gives you warnings of uninitialized variables, so it becomes quite obvious anyway. Personally I think it's a good thing that this is the default behaviour, otherwise your application may use unexpected values, and instead of the compiler throwing an error it won't do anything (but perhaps give a warning) and then you'll be scratching your head as to why certain things don't quite behave the way they should.

Instance variables have default values, but local variables do not. Local variables live inside methods, and their main purpose is to carry out operations or calculations, so it is not a good idea to give them default values. Otherwise, tracking down the cause of unexpected results would be hard and time-consuming.

Local variables are stored on the stack, while instance variables are stored on the heap, so there is a chance that a previous value left on the stack would be read instead of a default value, as happens on the heap.
For that reason, the JVM doesn't allow you to use a local variable without initializing it.

The memory stack for methods is created at execution time, and the order of method calls is decided at execution time.
Some methods may never be called at all, so allocating their local variables when the object is instantiated would be a complete waste of memory. Also, instance variables remain in memory for the whole lifecycle of an object, whereas local variables disappear as soon as their stack frame is popped, at which point the objects they referred to may become eligible for garbage collection.
So, allocating memory for the variables of methods that might never be called, or that, even if called, would not need to stay in memory for the lifetime of the object, would be illogical and a waste of memory.

The answer is that instance variables can be initialized in the class constructor or in any class method, whereas a local variable can only be given its value inside the method that defines it.

I can think of the following two reasons:
As most of the answers said, by putting the constraint of initialising the local variable, it is ensured that the local variable gets assigned a value as the programmer wants and ensures the expected results are computed.
Instance variables can be hidden by declaring local variables (same name) - to ensure the expected behaviour, local variables are forced to be initialised to a value (I would totally avoid this, though).
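A minimal sketch of the hiding mentioned in the second point (the names are illustrative): a local variable with the same name as a field shadows it inside the method, and the field is then reachable only through this. An uninitialized local of the same name does not fall back to the field's value; the compiler rejects reading it.

```java
public class ShadowingDemo {
    private int total = 100; // instance variable

    int localTotal() {
        int total = 5; // this local shadows the field inside this method
        return total;  // reads the local: 5
    }

    int fieldTotal() {
        int total = 5; // same shadowing local
        // 'this.total' bypasses the local and reads the field: 100.
        // Declaring 'int total;' without a value and then returning 'total'
        // would NOT fall back to the field; it is a compile-time error.
        return this.total;
    }

    public static void main(String[] args) {
        ShadowingDemo d = new ShadowingDemo();
        System.out.println(d.localTotal() + " " + d.fieldTotal()); // prints "5 100"
    }
}
```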

Iterating and counting across class variables in Java

I have a class that contains variables of multiple types; most of them (about 30) are double:
String something;
double x;
double y;
double z;
...
I want to iterate over the doubles, but also keep them declared in this "classic" way rather than in an array, because derived classes use most of them. The part I'm having trouble with is how to iterate across all the double variables, count how many of them are non-zero, and then pick one of them at random. There will be thousands of instances of this class, and as I said, there are classes that extend this one. So I am working on a solution, preferably something like this pseudocode:
nonzeros = 0
foreach doubleVarInClass variable
{
if (variable != 0)
nonzeros++;
}
if (nonzeros < parameter)
{
randomDoubleVarInClass = random.next(...);
}
One solution I was considering was to keep all the variables in a HashMap, but then I would have to rewrite all the classes that use this one, and I'm not sure how it would affect performance, since it will be used very intensively all the time. Should I be worried about performance and perhaps try something with plain arrays instead? I'd like to at least keep the variable names. I thought about an array of references to these variables, so I could keep them declared this way, but I'm not sure that's possible, given that Java passes by value.
Also maybe there is some structure that keeps info about how many of those are non zero or have efficient function for it?
Thank you for any info that could solve my problem :)

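I suggest using reflection for this. Suppose you have an instance of your class named o:
int nonzeros = 0;
// Requires import java.lang.reflect.Field. getDouble throws the checked
// IllegalAccessException, so the enclosing method must catch or declare it.
for (Field f : o.getClass().getDeclaredFields()) {
    f.setAccessible(true);
    if (f.getType().equals(Double.TYPE) && f.getDouble(o) != 0.0) {
        nonzeros++;
    }
}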
NOTE: Java reflection may well be a bad idea performance-wise, so you should test it from that point of view first. On the other hand, it lets you do these checks without any changes to your class definition. Reflection performance in Java 6 is a little better than in older versions, but you should still measure it for your own use case and environment.
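A self-contained sketch that extends this to the rest of the question, under a few assumptions: Base and Derived are hypothetical stand-ins for the real classes, static fields are skipped, and the loop walks up the superclass chain because getDeclaredFields() returns only the fields declared by that exact class, not inherited ones (which matters here because derived classes are involved). One of the non-zero fields is then picked with java.util.Random. As the note above says, measure the cost before using this on a hot path.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class NonZeroFieldsDemo {
    // Hypothetical classes standing in for the real hierarchy.
    static class Base {
        String something = "label";
        double x = 1.5;
        double y = 0.0;
    }

    static class Derived extends Base {
        double z = -2.0;
    }

    // Collects every non-static double field (declared or inherited) whose value is non-zero.
    static List<Field> nonZeroDoubleFields(Object o) throws IllegalAccessException {
        List<Field> result = new ArrayList<>();
        for (Class<?> c = o.getClass(); c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.getType() != double.class || Modifier.isStatic(f.getModifiers())) {
                    continue;
                }
                f.setAccessible(true);
                if (f.getDouble(o) != 0.0) {
                    result.add(f);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IllegalAccessException {
        Derived d = new Derived();
        List<Field> nonZero = nonZeroDoubleFields(d);
        System.out.println("non-zero doubles: " + nonZero.size()); // x and z
        if (!nonZero.isEmpty()) {
            Field picked = nonZero.get(new Random().nextInt(nonZero.size()));
            System.out.println(picked.getName() + " = " + picked.getDouble(d));
        }
    }
}
```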

Implementation of eval() parser and 2d array in Java

I am really stuck on two Java-related issues, one simple and one more difficult.
Regarding the creation of a 2D array, I initialize a table like this:
private String [][] table_of_classifiers = null;
and then, within a function, I fill its contents like this:
String [][] table_of_classifiers = {
{"x1","x","x","x","x"},
{"x2","x","x","x","x"},
{"x3","x","x","x","x"},
{"x4","x","x","x","x"},
{"x5","x","x","x","x"},
{"x6","x","x","x","x"},
};
But as you can guess, the second table overwrites (locally) the first one, which is of course not what I want. What am I doing wrong? Note that the dimensions of the table are not known from the beginning.

Regarding the creation of a 2D array, I initialize a table like this:
private String [][] table_of_classifiers = null;
Not really. This is the declaration and initialization of a variable that can point to a "2d array", "table" or more exact an "array of arrays" of Strings.
Unless you work with that fact that the variable can/will be null, initializing it to null is usually a bad idea, because you need to do extra work to check for null. Examples:
String[][] a;
// ...
String b = a[0][0];
This won't compile unless a has definitely been assigned in the meantime. That is a good thing, because it helps you avoid a potential bug.
String[][] a = null;
// ...
String b = a[0][0];
This, however, will compile, and if you forget to actually assign a real array to the variable, the program will "crash" with a NullPointerException, or you will need additional code to check for null.
I fill its contents like this:
String [][] table_of_classifiers = {
{"x1","x","x","x","x"},
{"x2","x","x","x","x"},
{"x3","x","x","x","x"},
{"x4","x","x","x","x"},
{"x5","x","x","x","x"},
{"x6","x","x","x","x"},
};
You are not "filling" anything here. For something to be filled it must exist first, but you haven't created anything yet.
Here you are declaring a second variable with the same name, which is only possible if you are in a different scope than the first one, and in that case you are "hiding" ("shadowing") the original variable if it was accessible from this new scope.
But as you can guess the second table overwrites (locally) the first
one, that is of course not what I want to do. What I do wrong?
Which "first" table? There was no first table until now, only a first variable. The others have shown you what you need to do to assign the "table" to the original variable, by not using the "declaration" String[][] at the beginning of the line.
Otherwise it's impossible to say what you are "doing wrong" because you haven't really explained what you are attempting to do.
Note that the dimension of the table is not known from the beginning.
It's not? Then how/why are you using an array literal? Array literals are for creating arrays of a fixed size with fixed initial contents.
What exactly do you mean by "the beginning"? Isn't the size known when you are programming (at compile time) or when the program starts (at run time)?
If you get the size of the array during run time you can create a normal array with new:
int a = ...;
int b = ...; // Get the sizes from somewhere, e.g, user input
String[][] table_of_classifiers = new String[a][b];
// Now you have an "empty" table
If the size "changes" during run time, then, depending on what you are actually attempting to do, an array is the wrong tool and you should use a List implementation such as ArrayList instead.
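If the number of rows is only known while the program runs, a list of rows is one option. A minimal sketch, assuming each row is a String[] like those in the question; the class and method names are hypothetical. The field is assigned once and never redeclared, and toArray() converts back to a String[][] when an array is required:

```java
import java.util.ArrayList;
import java.util.List;

public class DynamicTableDemo {
    // Rows can be appended at run time; the field is never redeclared.
    private final List<String[]> tableOfClassifiers = new ArrayList<>();

    void addClassifier(String... row) {
        tableOfClassifiers.add(row);
    }

    int rowCount() {
        return tableOfClassifiers.size();
    }

    String cell(int row, int column) {
        return tableOfClassifiers.get(row)[column];
    }

    // Snapshot as a plain 2D array for code that needs String[][].
    String[][] toArray() {
        return tableOfClassifiers.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        DynamicTableDemo demo = new DynamicTableDemo();
        for (int i = 1; i <= 6; i++) {
            demo.addClassifier("x" + i, "x", "x", "x", "x");
        }
        System.out.println(demo.rowCount() + " rows, first cell " + demo.cell(0, 0));
        // prints "6 rows, first cell x1"
    }
}
```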
Regarding "eval": as the others say, Java is a compiled language, which makes "eval" basically impossible. There is "reflection", or the use of Class objects, to achieve what you are hinting at, but you really need to explain in much more detail what you are trying to achieve before anyone can help you with that.
However, reflection and Class objects are a complicated matter, and considering you are obviously struggling with the most basic Java concepts, you have a long way to go until you will be able to do what you want to do.

Just do:
class Foo {
private String [][] table_of_classifiers = null;
void bar() {
table_of_classifiers = new String[][] {
{"x1","x","x","x","x"},
{"x2","x","x","x","x"},
{"x3","x","x","x","x"},
{"x4","x","x","x","x"},
{"x5","x","x","x","x"},
{"x6","x","x","x","x"},
};
}
}
Java doesn't have eval (because it's a compiled language), but it does have reflection. It's almost certainly not the best approach to whatever it is that you want to do, though.

Regarding your first problem: to assign to table_of_classifiers without redeclaring it, write:
table_of_classifiers = new String[][] {
{"x1","x","x","x","x"},
{"x2","x","x","x","x"},
{"x3","x","x","x","x"},
{"x4","x","x","x","x"},
{"x5","x","x","x","x"},
{"x6","x","x","x","x"},
};
Regarding eval . . . the problem is that the run-time doesn't have the names of scoped local variables, and although it can get the names of instance variables, it has to do that within the context of an object. It's possible to address these sorts of issues, but it's non-trivial, and will involve major compromises. I think you have to thoroughly understand how scoping works and how reflection works before you start figuring out what features eval will support, because otherwise you'll just be disappointed at all the requirements you give it that turn out to be impossible.

Java instance variables vs. local variables

I'm in my first programming class in high school. We're doing our end of the first semester project.
This project only involves one class, but many methods. My question is about best practice with instance variables and local variables. It seems that it would be much easier for me to code using almost only instance variables. But I'm not sure if this is how I should be doing it or if I should be using local variables more (I would just have to have methods take in the values of local variables a lot more).
Another reason is that I'll often want a method to return two or three values, which is of course not possible directly. So it just seems easier to use instance variables and never have to worry, since they are visible throughout the class.

I haven't seen anyone discuss this, so I'll throw in more food for thought. The short answer/advice is: don't use instance variables instead of local variables just because they seem to make returning values easier. You will make your code very, very hard to work with if you don't use local variables and instance variables appropriately, and you will produce serious bugs that are really hard to track down. If you want to understand what I mean by serious bugs, and what they might look like, read on.
Let's try to use only instance variables, as you suggest, to write two functions. I'll create a very simple class:
import java.util.ArrayList;
import java.util.List;

public class BadIdea {
    public enum Color { GREEN, RED, BLUE, PURPLE }
    public Color[] map = new Color[] {
        Color.GREEN,
        Color.GREEN,
        Color.RED,
        Color.BLUE,
        Color.PURPLE,
        Color.RED,
        Color.PURPLE };
    List<Integer> indexes = new ArrayList<Integer>();
    public int counter = 0;
    public int index = 0;
    public void findColor( Color value ) {
        indexes.clear();
        for( index = 0; index < map.length; index++ ) {
            if( map[index] == value ) {
                indexes.add( index );
                counter++;
            }
        }
    }
    public void findOppositeColors( Color value ) {
        indexes.clear();
        for( index = 0; index < map.length; index++ ) {
            if( map[index] != value ) {
                indexes.add( index );
                counter++;
            }
        }
    }
}
This is a silly program, I know, but we can use it to illustrate why using instance variables for things like this is a tremendously bad idea. The biggest thing you'll find is that those methods use all of the instance variables we have, and they modify indexes, counter, and index every time they are called. The first problem you'll find is that calling those methods one after the other can modify the answers from prior runs. For example, if you wrote the following code:
BadIdea idea = new BadIdea();
idea.findColor( Color.RED );
idea.findColor( Color.GREEN ); // whoops we just lost the results from finding all Color.RED
Since findColor uses instance variables to track returned values we can only return one result at a time. Let's try and save off a reference to those results before we call it again:
BadIdea idea = new BadIdea();
idea.findColor( Color.RED );
List<Integer> redPositions = idea.indexes;
int redCount = idea.counter;
idea.findColor( Color.GREEN ); // this causes red positions to be lost! (idea.indexes.clear() empties the same list)
List<Integer> greenPositions = idea.indexes;
int greenCount = idea.counter;
In this second example we saved the red positions on the 3rd line, but the same thing happened! Why did we lose them? Because idea.indexes was cleared rather than reallocated, so there can only be one answer in use at a time. You have to completely finish using that result before calling the method again; once you call it again, the results are cleared and you lose everything. To fix this, you have to allocate a new result each time so the red and green answers are separate. So let's copy our answers into new lists:
BadIdea idea = new BadIdea();
idea.findColor( Color.RED );
List<Integer> redPositions = new ArrayList<Integer>( idea.indexes ); // copy: List has no public clone()
int redCount = idea.counter;
idea.findColor( Color.GREEN );
List<Integer> greenPositions = new ArrayList<Integer>( idea.indexes );
int greenCount = idea.counter;
OK, finally we have two separate results for red and green. But we had to know a lot about how BadIdea works internally before the program worked, didn't we? We have to remember to copy the results every time we call it to make sure they don't get clobbered. Why is the caller forced to remember these details? Wouldn't it be easier if we didn't have to do that?
Also notice that the caller has to use local variables to remember the results. So while you avoided local variables in the methods of BadIdea, the caller has to use them instead. What did you really accomplish? You just moved the problem to the caller, forcing them to do more work. And the rule you pushed onto the caller is not easy to follow, because there are so many exceptions to it.
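Why did the saved red positions disappear in the second example? Because `List<Integer> redPositions = idea.indexes;` copies the reference, not the list, so both names point at one object and `clear()` empties it for both. A minimal sketch of that aliasing, where the hypothetical `shared` list and `fill()` method stand in for `idea.indexes` and `findColor()`:

```java
import java.util.ArrayList;
import java.util.List;

class AliasingDemo {
    // Stand-in for BadIdea.indexes: one list reused by every call.
    static final List<Integer> shared = new ArrayList<Integer>();

    // Stand-in for findColor: clears the shared list, then refills it.
    static void fill(int... positions) {
        shared.clear();
        for (int p : positions) {
            shared.add(p);
        }
    }

    public static void main(String[] args) {
        fill(0, 2);
        List<Integer> redPositions = shared;   // copies the reference, not the list
        fill(1);
        List<Integer> greenPositions = shared;
        System.out.println(redPositions);                   // [1]: the red results are gone
        System.out.println(redPositions == greenPositions); // true: same object
    }
}
```

Copying with `new ArrayList<Integer>(...)` breaks the alias, which is exactly the bookkeeping the caller should not have to know about.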
Now let's try doing that with two different methods. Notice how I've been "smart" and I reused those same instance variables to "save memory" and kept the code compact. ;-)
BadIdea idea = new BadIdea();
idea.findColor( Color.RED );
List<Integer> redPositions = idea.indexes;
int redCount = idea.counter;
idea.findOppositeColors( Color.RED ); // this causes red positions to be lost again!!
List<Integer> greenPositions = idea.indexes;
int greenCount = idea.counter;
The same thing happened! Damn, but I was being so "smart", saving memory so the code uses fewer resources!!! This is the real peril of using instance variables like this: calling the methods is now order dependent. If I change the order of the method calls, the results are different even though I haven't really changed the underlying state of BadIdea. I didn't change the contents of the map. So why does the program yield different results when I call the methods in a different order?
idea.findColor( Color.RED );
idea.findOppositeColors( Color.RED );
Produces a different result than if I swapped those two methods:
idea.findOppositeColors( Color.RED );
idea.findColor( Color.RED );
These kinds of errors are really hard to track down, especially when those lines aren't right next to each other. You can completely break your program just by adding a new call anywhere between those two lines and get wildly different results. Sure, when we're dealing with a small number of lines it's easy to spot errors. But in a larger program you can waste days trying to reproduce them even though the data in the program hasn't changed.
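The order dependence is easy to reproduce. This sketch assumes a hypothetical three-value `Color` enum and a `StatefulFinder` class that mirrors BadIdea's `indexes` and `counter` instance variables (with `index` replaced by a loop variable); it is not the original class:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real Color type.
enum Color { RED, GREEN, BLUE }

// Mirrors BadIdea's design: results live in shared instance variables.
class StatefulFinder {
    final Color[] map;
    final List<Integer> indexes = new ArrayList<Integer>();
    int counter; // never reset by either method

    StatefulFinder(Color... map) {
        this.map = map;
    }

    void findColor(Color value) {
        indexes.clear(); // wipes whatever the previous call produced
        for (int i = 0; i < map.length; i++) {
            if (map[i] == value) {
                indexes.add(i);
                counter++;
            }
        }
    }

    void findOppositeColors(Color value) {
        indexes.clear();
        for (int i = 0; i < map.length; i++) {
            if (map[i] != value) {
                indexes.add(i);
                counter++;
            }
        }
    }
}
```

With the map RED, GREEN, RED, calling `findColor(RED)` then `findOppositeColors(RED)` leaves `indexes` as [1], while the reverse order leaves [0, 2]. Calling `findColor(RED)` twice leaves `counter` at 4, because nothing in these methods resets it.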
And this only looks at single-threaded problems. If BadIdea were used in a multi-threaded situation, the errors could get really bizarre. What happens if findColor() and findOppositeColors() are called at the same time? Crash, all your hair falls out, death, space and time collapse into a singularity and the universe is swallowed up? Probably at least two of those. Threads are probably above your head right now, but hopefully we can steer you away from bad practices now so that when you do get to threads they don't cause you real heartache.
Did you notice how careful you had to be when calling the methods? They overwrote each other's results, they shared memory unpredictably, you had to remember the details of how they worked on the inside to make them work on the outside, changing the order of the calls produced very big changes in the following lines, and the whole thing could only work in a single thread. Doing things like this produces really brittle code that seems to fall apart whenever you touch it, and the practices I showed contributed directly to that brittleness.
While this might look like encapsulation, it is the exact opposite, because the technical details of how you wrote it have to be known to the caller. The caller has to write their code in a very particular way to make it work, and they can't do that without knowing the technical details of your code. This is often called a Leaky Abstraction: the class is supposed to hide its technical details behind an abstraction/interface, but the details leak out and force the caller to change their behavior. Every solution has some degree of leakiness, but techniques like the ones above guarantee that, whatever problem you are trying to solve, the result will be terribly leaky. So let's look at the GoodIdea now.
Let's rewrite using local variables:
public class GoodIdea {
...
public List<Integer> findColor( Color value ) {
List<Integer> results = new ArrayList<Integer>();
for( int i = 0; i < map.length; i++ ) {
if( map[i] == value ) {
results.add( i );
}
}
return results;
}
public List<Integer> findOppositeColors( Color value ) {
List<Integer> results = new ArrayList<Integer>();
for( int i = 0; i < map.length; i++ ) {
if( map[i] != value ) {
results.add( i );
}
}
return results;
}
}
This fixes every problem we discussed above. I know I'm not keeping track of counter or returning it, but if I did, I could create a new class and return that instead of the List. Sometimes I use the following object to return multiple results quickly:
public class Pair<K,T> {
public K first;
public T second;
public Pair( K first, T second ) {
this.first = first;
this.second = second;
}
}
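Putting the GoodIdea methods and the Pair class together gives a compilable sketch. The `Color` enum and the `findColorWithCount` method are hypothetical additions: the enum just gives the map something concrete to hold, and the method shows Pair carrying the count that BadIdea kept in `counter`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real Color type.
enum Color { RED, GREEN, BLUE }

// Same shape as the Pair class above, repeated so this sketch compiles on its own.
class Pair<K, T> {
    public K first;
    public T second;
    public Pair(K first, T second) {
        this.first = first;
        this.second = second;
    }
}

class GoodIdea {
    private final Color[] map;

    GoodIdea(Color... map) {
        this.map = map.clone(); // defensive copy of the caller's array
    }

    // Every call builds its own result list in a local variable.
    List<Integer> findColor(Color value) {
        List<Integer> results = new ArrayList<Integer>();
        for (int i = 0; i < map.length; i++) {
            if (map[i] == value) {
                results.add(i);
            }
        }
        return results;
    }

    List<Integer> findOppositeColors(Color value) {
        List<Integer> results = new ArrayList<Integer>();
        for (int i = 0; i < map.length; i++) {
            if (map[i] != value) {
                results.add(i);
            }
        }
        return results;
    }

    // Hypothetical: positions plus their count, returned together.
    Pair<List<Integer>, Integer> findColorWithCount(Color value) {
        List<Integer> positions = findColor(value);
        return new Pair<List<Integer>, Integer>(positions, positions.size());
    }
}
```

Because each call allocates a fresh list, `findColor(RED)` returns [0, 2] for the map RED, GREEN, RED no matter what was called before it, and the caller never has to copy anything.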
Long answer, but a very important topic.

Use instance variables when it's a core concept of your class. If you're iterating, recursing or doing some processing, then use local variables.
When you need to use two (or more) variables in the same places, it's time to create a new class with those attributes (and appropriate means to set them). This will make your code cleaner and help you think about problems (each class is a new term in your vocabulary).
One variable may be made a class when it is a core concept. For example real-world identifiers: these could be represented as Strings, but often, if you encapsulate them into their own object they suddenly start "attracting" functionality (validation, association to other objects, etc.)
Also (not entirely related) is object consistency - an object is able to ensure that its state makes sense. Setting one property may alter another. It also makes it far easier to alter your program to be thread-safe later (if required).

Local variables internal to methods are always preferred, since you want to keep each variable's scope as small as possible. But if more than one method needs to access a variable, then it's going to have to be an instance variable.
Local variables are more like intermediate values used to reach a result or compute something on the fly. Instance variables are more like attributes of a class, like your age or name.

The easy way: if the variable must be shared by more than one method, use instance variable, otherwise use local variable.
However, good practice is to use as many local variables as possible. Why? For a simple project with only one class, there is no difference. For a project that includes a lot of classes, there is a big difference. Instance variables represent the state of your class. The more instance variables your class has, the more states it can be in, so the more complex the class is, the harder it is to maintain, and the more error-prone your project might be. So good practice is to use local variables as much as possible to keep the state of the class as simple as possible.

Short story: if and only if a variable needs to be accessed by more than one method (or from outside the class), make it an instance variable. If you need it only locally, in a single method, it should be a local variable.
Instance variables are more costly than local variables.
Keep in mind: instance variables are initialized to default values while local variables are not.
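A small sketch of that difference (the class and member names are hypothetical). Fields get their default values without any constructor code, while a local variable must be definitely assigned before it is read, or the program does not compile:

```java
class Defaults {
    int count;      // field: defaults to 0
    String name;    // field: defaults to null
    boolean done;   // field: defaults to false

    int localExample() {
        int local;          // local: no default value
        // return local;    // would not compile: variable local might not have been initialized
        local = 42;         // must be assigned before it is read
        return local;
    }
}
```

So `new Defaults()` has `count == 0`, `name == null` and `done == false`, while the commented-out line is rejected by the compiler.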

Declare variables to be scoped as narrowly as possible. Declare local variables first. If this isn't sufficient, use instance variables. If this isn't sufficient, use class (static) variables.
If you need to return more than one value, return a composite structure, like an array or an object.
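For example, a method computing a mean and a variance can return both in one array. A sketch, where the class and method names and the choice of population variance (dividing by n rather than n - 1) are assumptions:

```java
class Stats {
    // Returns {mean, variance}; the variance divides by n (population variance).
    static double[] meanAndVariance(double[] xs) {
        if (xs.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        double mean = sum / xs.length;
        double squares = 0;
        for (double x : xs) {
            squares += (x - mean) * (x - mean);
        }
        return new double[] { mean, squares / xs.length };
    }
}
```

For {1, 2, 3, 4} it returns {2.5, 1.25}. The drawback is that callers must remember that index 0 is the mean and index 1 the variance; a small class with named fields avoids that.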

Try to think about your problem in terms of objects. Each class represents a different type of object. Instance variables are the pieces of data that a class needs to remember in order to work, either with itself or with other objects. Local variables should be used just for intermediate calculations, data that you don't need to keep once you leave the method.

Try not to return more than one value from your methods in the first place. If you can't avoid it, and in some cases you really can't, I would recommend encapsulating the values in a class. Only as a last resort would I recommend changing another variable inside your class (an instance variable). The problem with the instance-variable approach is that it increases side effects: for example, you call method A in your program and it modifies some instance variables. Over time, that leads to increased complexity in your code, and maintenance becomes harder and harder.
When I have to use instance variables, I try to make them final and initialize them in the class constructors, so side effects are minimized. This programming style (minimizing state changes in your application) should lead to better code that is easier to maintain.
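A minimal sketch of that style, using a hypothetical `Temperature` class: every field is `final` and set once in the constructor, and a "change" produces a new object instead of mutating the old one:

```java
// Hypothetical example class: all state is final and assigned once.
final class Temperature {
    private final String sensorId;
    private final double celsius;

    Temperature(String sensorId, double celsius) {
        if (sensorId == null) {
            throw new IllegalArgumentException("sensorId must not be null");
        }
        this.sensorId = sensorId;
        this.celsius = celsius;
    }

    String sensorId() { return sensorId; }

    double celsius() { return celsius; }

    // Returns a new object; this instance is never modified.
    Temperature withCelsius(double newCelsius) {
        return new Temperature(sensorId, newCelsius);
    }
}
```

Since no method can change an existing instance, calling one method can never silently alter what another method sees.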

Generally variables should have minimal scope.
Unfortunately, in order to build classes with minimized variable scope, one often needs to do a lot of method parameter passing.
But if you follow that advice all the time, perfectly minimizing variable scope, you may end up with a lot of redundancy and inflexible methods, with all the required objects passed in and out of methods.
Picture a code base with thousands of methods like this:
private ClassThatHoldsReturnInfo foo(OneReallyBigClassThatHoldsCertainThings big,
AnotherClassThatDoesLittle little) {
LocalClassObjectJustUsedHere here;
...
}
private ClassThatHoldsReturnInfo bar(OneMediumSizedClassThatHoldsCertainThings medium,
AnotherClassThatDoesLittle little) {
...
}
And, on the other hand, imagine a code base with lots of instance variables like this:
private OneReallyBigClassThatHoldsCertainThings big;
private OneMediumSizedClassThatHoldsCertainThings medium;
private AnotherClassThatDoesLittle little;
private ClassThatHoldsReturnInfo ret;
private void foo() {
LocalClassObjectJustUsedHere here;
....
}
private void bar() {
....
}
As code grows, the first way may minimize variable scope best, but it can easily lead to a lot of method parameters being passed around. The code will usually be more verbose, and this can add complexity as one refactors all these methods.
Using more instance variables can reduce the complexity of passing lots of method parameters around and can give methods flexibility when you are frequently reorganizing them for clarity. But it creates more object state that you have to maintain. Generally the advice is to do the former and refrain from the latter.
However, very often, and it may depend on the person, one can more easily manage state complexity compared with the thousands of extra object references of the first case. One may notice this when business logic within methods increases and organization needs to change to keep order and clarity.
Not only that. When you reorganize your methods to keep clarity and make lots of method parameter changes in the process, you end up with lots of version control diffs which is not so good for stable production quality code. There is a balance. One way causes one kind of complexity. The other way causes another kind of complexity.
Use the way that works best for you. You will find that balance over time.
I think this young programmer has some insightful first impressions for low maintenance code.

Use instance variables when
If two functions in the class need the same value, then make it an instance variable
or
If the state is not expected to change, make it an instance variable. For example: immutable object, DTO, LinkedList, those with final variables
or
If it is an underlying data on whom actions are performed. For example: final in arr[] in the PriorityQueue.java source code file
or
Even if it is used only once and its state is expected to change, make it an instance variable if it is used by a function whose parameter list should be empty. For example: HTTPCookie.java Line: 860, the hashcode() function uses the 'path' variable.
Similarly, use a local variable when none of these conditions match, specifically if the role of the variable ends when the method's stack frame is popped. For example: Comparator.compare(o1, o2);
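As a sketch of that last case, here is a hypothetical comparator that orders strings by length. `compare` needs nothing beyond its parameters, and its intermediate values are locals that disappear when the call returns, so one instance can be shared freely:

```java
import java.util.Comparator;

// Stateless: no instance variables, only parameters and locals.
class ByLength implements Comparator<String> {
    @Override
    public int compare(String o1, String o2) {
        int len1 = o1.length(); // local: only meaningful during this call
        int len2 = o2.length();
        return Integer.compare(len1, len2);
    }
}
```

Sorting ["ccc", "a", "bb"] with `Collections.sort(words, new ByLength())` gives [a, bb, ccc].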

Declare an object inside or outside a loop?

Is there any performance penalty for the following code snippet?
for (int i=0; i<someValue; i++)
{
Object o = someList.get(i);
o.doSomething();
}
Or does this code actually make more sense?
Object o;
for (int i=0; i<someValue; i++)
{
o = someList.get(i);
o.doSomething();
}
If in byte code these two are totally equivalent then obviously the first method looks better in terms of style, but I want to make sure this is the case.

In today's compilers, no. I declare objects in the smallest scope I can, because it's a lot more readable for the next guy.

To quote Knuth, who may be quoting Hoare:
Premature optimization is the root of all evil.
Whether the compiler will produce marginally faster code by defining the variable outside the loop is debatable, and I imagine it won't. I would guess it'll produce identical bytecode.
Compare this with the number of errors you'll likely prevent by correctly-scoping your variable using in-loop declaration...

There's no performance penalty for declaring the Object o within the loop.
The compiler generates very similar bytecode and makes the correct optimizations.
See the article Myth - Defining loop variables inside the loop is bad for performance for a similar example.

You can disassemble the code with javap -c and check what the compiler actually emits. On my setup (java 1.5/mac compiled with eclipse), the bytecode for the loop is identical.

The first code is better as it restricts scope of o variable to the for block. From a performance perspective, it might not have any effects in Java, but it might have in lower level compilers. They might put the variable in a register if you do the first.
In fact, some people might think that if the compiler is dumb, the second snippet is better in terms of performance. This is what an instructor told me at college, and I laughed at him for this suggestion! Basically, compilers allocate memory on the stack for the local variables of a method just once at the start of the method (by adjusting the stack pointer) and release it at the end of the method (again by adjusting the stack pointer, assuming it's not C++ or there aren't any destructors to be called). So all stack-based local variables in a method are allocated at once, no matter where they are declared and how much memory they require. Actually, if the compiler is dumb, there is no difference in terms of performance, but if it's smart enough, the first code can actually be better, as it helps the compiler understand the scope and lifetime of the variable! By the way, if it's really smart, there should be absolutely no difference in performance, as it infers the actual scope.
Constructing an object using new is, of course, totally different from just declaring it.
I think readability is more important than performance, and from a readability standpoint the first code is definitely better.

I've got to admit I don't know Java. But are these two equivalent? Are the object lifetimes the same? In the first example, I assume (not knowing Java) that o will be eligible for garbage collection as soon as the loop terminates.
But in the second example surely o won't be eligible for garbage collection until the outer scope (not shown) is exited?

Don't prematurely optimize. Better than either of these is:
for(Object o : someList) {
o.doSomething();
}
because it eliminates boilerplate and clarifies intent.
Unless you are working on embedded systems, in which case all bets are off. Otherwise, don't try to outsmart the JVM.
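A compilable version of the for-each loop above. The question's `Object` has no `doSomething()` method, so a hypothetical `Task` interface stands in for the element type:

```java
import java.util.List;

// Hypothetical element type standing in for whatever someList holds.
interface Task {
    void doSomething();
}

class ForEachExample {
    // 'o' is scoped to the loop body, and there is no index to get wrong.
    static void runAll(List<? extends Task> someList) {
        for (Task o : someList) {
            o.doSomething();
        }
    }
}
```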

I've always thought that most compilers these days are smart enough to do the latter option. Assuming that's the case, I would say the first one does look nicer as well. If the loop gets very large, there's no need to look all around for where o is declared.

These have different semantics. Which is more meaningful?
Reusing an object for "performance reasons" is often wrong.
The question is what does the object "mean"? Why are you creating it? What does it represent? Objects must parallel real-world things. Things are created, undergo state changes, and report their states for reasons.
What are those reasons? How does your object model reflect those reasons?

To get at the heart of this question... [Note that non-JVM implementations may do things differently if allowed by the JLS...]
First, keep in mind that the local variable "o" in the example is a pointer, not an actual object.
All local variables are allocated on the runtime stack in 4-byte slots. doubles and longs require two slots; other primitives and pointers take one. (Even booleans take a full slot)
A fixed runtime-stack size must be created for each method invocation. This size is determined by the maximum local variable "slots" needed at any given spot in the method.
In the above example, both versions of the code require the same maximum number of local variables for the method.
In both cases, the same bytecode will be generated, updating the same slot in the runtime stack.
In other words, no performance penalty at all.
HOWEVER, depending on the rest of the code in the method, the "declaration outside the loop" version might actually require a larger runtime stack allocation. For example, compare
for (...) { Object o = ... }
for (...) { Object o = ... }
with
Object o;
for (...) { /* loop 1 */ }
for (...) { Object x =...; }
In the first example, both loops require the same runtime stack allocation.
In the second example, because "o" lives past the loop, "x" requires an additional runtime stack slot.
Hope this helps,
-- Scott

In both cases the type info for the object o is determined at compile time. In the second instance, o is seen as being global to the for loop, and in the first instance, the clever Java compiler knows that o will have to be available for as long as the loop lasts and hence will optimise the code in such a way that there won't be any respecification of o's type in each iteration.
Hence, in both cases, specification of o's type will be done once which means the only performance difference would be in the scope of o. Obviously, a narrower scope always enhances performance, therefore to answer your question: no, there is no performance penalty for the first code snip; actually, this code snip is more optimised than the second.
In the second snip, o is being given unnecessary scope which, besides being a performance issue, can be also a security issue.

The first makes far more sense. It keeps the variable in the scope that it is used in and prevents values assigned in one iteration from being used in a later iteration, which is more defensive.
The second is sometimes said to be more efficient, but any reasonable compiler should be able to optimise the first to be exactly the same.
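To illustrate the defensive point, here is a sketch (class and method names are hypothetical) where a value from one iteration leaks into the next when the variable is declared outside the loop:

```java
import java.util.ArrayList;
import java.util.List;

class LoopScopeDemo {
    // Declared outside: once initialized, 'label' silently keeps the previous iteration's value.
    static List<String> labelsOuter(int[] values) {
        List<String> out = new ArrayList<String>();
        String label = "none";
        for (int v : values) {
            if (v > 0) {
                label = "positive";
            }
            out.add(label); // bug: for v <= 0 this reuses the last value
        }
        return out;
    }

    // Declared inside: definite assignment forces every path to set 'label'
    // before it is read, so a forgotten branch would not compile.
    static List<String> labelsInner(int[] values) {
        List<String> out = new ArrayList<String>();
        for (int v : values) {
            String label;
            if (v > 0) {
                label = "positive";
            } else {
                label = "non-positive";
            }
            out.add(label);
        }
        return out;
    }
}
```

For the input {1, -1}, `labelsOuter` returns [positive, positive] while `labelsInner` returns [positive, non-positive].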

As someone who maintains more code than I write:
Version 1 is much preferred. Keeping scope as local as possible is essential for understanding, and it's also easier to refactor this sort of code.
As discussed above - I doubt this would make any difference in efficiency. In fact I would argue that if the scope is more local a compiler may be able to do more with it!

When using multiple threads (if you're running 50+), I found this to be a very effective way of handling ghost thread problems:
Object one;
Object two;
Object three;
Object four;
Object five;
try{
for (int i=0; i<someValue; i++)
{
o = someList.get(i);
o.doSomething();
}
}catch(Exception e){
e.printStackTrace();
}
finally{
one = null;
two = null;
three = null;
four = null;
five = null;
System.gc();
}

The answer depends partly on what the constructor does and what happens with the object after the loop, since that determines to a large extent how the code is optimized.
If the object is large or complex, absolutely declare it outside the loop. Otherwise, the people telling you not to prematurely optimize are right.

I actually have code in front of me that looks like this:
for (int i = offset; i < offset + length; i++) {
char append = (char) (data[i] & 0xFF);
buffer.append(append);
}
...
for (int i = offset; i < offset + length; i++) {
char append = (char) (data[i] & 0xFF);
buffer.append(append);
}
...
for (int i = offset; i < offset + length; i++) {
char append = (char) (data[i] & 0xFF);
buffer.append(append);
}
So, relying on the compiler's abilities, I can assume there would be only one stack allocation for i and one for append. Then everything would be fine except for the duplicated code.
As a side note, Java applications are known to be slow. I never tried profiling in Java, but I guess the performance hit comes mostly from memory allocation management.
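The duplication itself can be removed with a small helper method, and the local `append` still stays scoped to the loop body. This sketch assumes `buffer` is a `StringBuilder` (the snippet doesn't show its type), and the names `Latin1` and `appendBytes` are hypothetical:

```java
class Latin1 {
    // Widens each byte to a char in the range 0..255 and appends it,
    // replacing the three copy-pasted loops.
    static void appendBytes(StringBuilder buffer, byte[] data, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            char append = (char) (data[i] & 0xFF); // mask avoids sign extension
            buffer.append(append);
        }
    }
}
```

Each of the three loops then becomes a single call such as `Latin1.appendBytes(buffer, data, offset, length);`.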

