For years, I've been using named blocks to limit the scope of temporary variables. I've never seen this done anywhere else, which makes me wonder if this is a bad idea. Especially since the Eclipse IDE flags these as warnings by default.
I've used this to good effect, I think, in my own code. But since it is un-idiomatic to the point where good programmers will distrust it when they see it, I really have two ways to go from here:
avoid doing it, or
promote it, with the hope that it will become an idiom.
Example (within a larger method):
final Date nextTuesday;
initNextTuesday: {
GregorianCalendar cal = new GregorianCalendar();
... // About 5-10 lines of setting the calendar fields
nextTuesday = cal.getTime();
}
Here I'm using a GregorianCalendar just to initialize a date, and I want to make sure that I don't accidentally reuse it.
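A minimal, runnable sketch of the pattern might look like the following. The calendar logic (step forward one day at a time until a Tuesday is reached) is only an assumed stand-in for the elided 5-10 lines, and the class and method names are hypothetical.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

class NamedBlockSketch {
    // Hypothetical rule: the first Tuesday strictly after the given date.
    static Date nextTuesdayAfter(Date start) {
        final Date nextTuesday;
        initNextTuesday: {
            // cal exists only inside this labeled block.
            GregorianCalendar cal = new GregorianCalendar();
            cal.setTime(start);
            do {
                cal.add(Calendar.DAY_OF_MONTH, 1);
            } while (cal.get(Calendar.DAY_OF_WEEK) != Calendar.TUESDAY);
            nextTuesday = cal.getTime();
        }
        // cal is out of scope here; referring to it would be a compile error.
        return nextTuesday;
    }

    public static void main(String[] args) {
        System.out.println(nextTuesdayAfter(new Date()));
    }
}
```

The final variable is assigned exactly once inside the block, so the compiler still treats it as definitely assigned afterwards.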
Some people have commented that you don't actually need to name the block. While that's true, a raw block looks even more like a bug, as the intent is unclear. Furthermore, naming something encourages you to think about the intention of the block. The goal here is to identify distinct sections of code, not to give every temporary variable its own scope.
Many people have commented that it's best to go straight to small methods. I agree that this should be your first instinct. However, there may be several mitigating factors:
To even consider a named block, the code should be short, one-off code that will never be called elsewhere.
A named block is a quick way to organize an oversized method without creating a one-off method with a dozen parameters. This is especially true when a class is in flux, and the inputs are likely to change from version to version.
Creating a new method encourages its reuse, which may be ill-advised if the use cases aren't well-established. A named block is easier (psychologically, at least) to throw away.
Especially for unit tests, you may need to define a dozen different objects for one-off assertions, and they are just different enough that you can't (yet) find a way to consolidate them into a small number of methods, nor can you think of a way to distinguish them with names that aren't a mile long.
Advantages of using the named scope:
Can't accidentally reuse temporary variables
Limited scope gives garbage collector and JIT compiler more information about programmer intent
Block name provides a comment on a block of code, which I find more readable than open-ended comments
Makes it easier to refactor code out of a big method into little methods, or vice versa, since the named block is easier to separate than unstructured code.
Disadvantages:
Not idiomatic: programmers who haven't seen this use of named blocks (i.e. everyone but me) assume it's buggy, since they can't find references to the block name. (Just like Eclipse does.) And getting something to become idiomatic is an uphill battle.
It can be used as an excuse for bad programming habits, such as:
Making huge, monolithic methods where several small methods would be more legible.
Layers of indentation too deep to read easily.
Note: I've edited this question extensively, based on some thoughtful responses. Thanks!
I'd just go straight for refactoring into smaller methods. If a method is big enough that it needs breaking up like this, it really needs breaking up into multiple methods if at all possible.
While limiting scope is nice, this isn't really what named blocks are for. It's unidiomatic, which is very rarely a good thing.
If this were bad, why would it be a feature of the language? It has a purpose, and you've found it.
I often write code exactly as in your example. When you want to initialize a variable, and there's a little calculation that needs doing to work out what that should be, and that involves a couple of variables... then you don't want those variables hanging around for the entire scope of your function, then a little scope to contain the initialization works great.
Mini scopes are an easy way to break code into 'paragraphs'. If you split into methods you can make the code harder to navigate when those methods don't get called from anywhere else and have a serial kind of order in which they need to be executed.
It's always a balance, but if you think it's going to be easiest to maintain and it actually adds value to a future reader of your code if its all inline, then go for it.
There are no hard and fast rules. I get a little fed up sometimes with co-workers who excessively put everything into its own method or class or file, and this becomes a nightmare to navigate. There's a nice balance somewhere!
Sometimes I use unnamed blocks to isolate mutable things needed to prepare some immutable thing. Instead of having a label I put the Block under the immutable variable declaration.
final String example;
{
final StringBuilder sb = new StringBuilder();
for(int i = 0; i < 100; i++)
sb.append(i);
example = sb.toString();
}
When I find some other use for the block, or just think that it's in the way, I turn it into a method.
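As a rough illustration of that progression, the sketch below shows the same initialization first as an unnamed block and then extracted into a method. The class and method names are hypothetical.

```java
class BlockToMethodSketch {
    // Inline version: sb is visible only inside the unnamed block.
    static String inlineBlock() {
        final String example;
        {
            final StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100; i++) {
                sb.append(i);
            }
            example = sb.toString();
        }
        return example;
    }

    // Extracted version: the block body becomes a small private-style helper.
    static String buildExample() {
        final StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(inlineBlock().equals(buildExample()));
    }
}
```

Because the block already has a single output and no dependence on the surrounding locals, the move to a method is mechanical.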
Using blocks to limit scope is a good technique in my book.
But since you're using the label to do the work of a comment, why not just use an actual comment instead? This would remove the confusion about the unreferenced label.
This is the first time I've seen someone else using blocks. Whew! I thought I was the only one. I know that I didn't invent it; I remember reading about it somewhere, possibly back in my previous C++ world.
I don't use the labels, though and just comment what I'm doing.
I don't agree with all the guys who are asking you to extract it into a method. Most of the things we do in such blocks aren't really reusable. It makes sense in a big initialization AND YES, I've used blocks to prevent COPY/PASTE errors.
If you have 5-10 lines of code that can safely be put into a block like that, the same code could just as well be extracted into a method.
This might seem like it's only a semantic difference, but at least with extracting into a method then you would gain the benefit of the ability of re-use.
Just because they exist doesn't mean they should be used. Most of the advantages gained from using named blocks are better gained by using a new private method.
You won't be able to use the temporary variables declared in the new method
The GC and JIT Compiler will glean the same info by using a new method
Using a descriptive name for the new method (using "private Date initNextTuesday()" in your case) will allow for the self commenting code advantage
No need to refactor code when you have already "pre-factored" it
In addition to these benefits, you also get code reuse benefits and it will shorten your long methods.
I'd use a block with a comment rather than adding a label there.
When I see a label, I can't assume that nothing else is referencing the block.
If I change the behavior of the block, then the label name may not be appropriate any more. But I can't just reach out and change it: I'll have to look through the rest of the method to determine what label is calling out to the block. At which point I'll figure out that it's an unreferenced label.
Using a comment is clearer in this instance, because it describes the behavior of the block without imposing any extra work on the part of the maintainer.
It's a good technique in my book. Managing large numbers of throwaway methods is evil and the reasons you're providing for naming the blocks are good.
What does the generated bytecode look like? That'd be my only hesitation. I suspect it strips away the block name and might even benefit from greater optimizations. But you'd have to check.
Sorry for resurrecting this, but I didn't see anyone mention what I consider to be a very important point. Let's look at your example:
final Date nextTuesday;
initNextTuesday: {
GregorianCalendar cal = new GregorianCalendar();
... // About 5-10 lines of setting the calendar fields
nextTuesday = cal.getTime();
}
Including this initialization logic here makes it easier to understand if you're reading the file from top to bottom and care about every line. But think about how you read code. Do you start reading from the top of a file and continue to the bottom? Of course not! The only time you would ever do that is during a code review. Instead, you probably have a starting point based on previous knowledge, a stack trace, etc. Then you drill further down/up through the execution path until you find what you're looking for. Optimize for reading based on execution path, not code reviews.
Does the person reading the code that uses nextTuesday really want to read about how it's initialized? I would argue that the only information that they need is that there's a Date corresponding to next Tuesday. All of this information is contained in its declaration. This is a perfect example of code that should be broken into a private method, because it isn't necessary to understand the logic that the reader cares about.
final Date nextTuesday;
initNextTuesday: {
GregorianCalendar cal = new GregorianCalendar();
//1
//2
//3
//4
//5
nextTuesday = cal.getTime();
}
vs:
final Date nextTuesday = getNextTuesday();
Which would you rather read on your way through a module?
Named blocks are also useful with break, which acts as a civilized form of goto.
class Break {
public static void main(String args[]) {
boolean t = true;
first: {
second: {
third: {
System.out.println("Before the break.");
if (t)
break second; // break out of second block
System.out.println("This won't execute");
}
System.out.println("This won't execute");
}
System.out.println("This is after second block.");
}
}
}
Using break to exit from nested loops
class BreakLoop4 {
public static void main(String args[]) {
outer: for (int i = 0; i < 3; i++) {
System.out.print("Pass " + i + ": ");
for (int j = 0; j < 100; j++) {
if (j == 10)
break outer; // exit both loops
System.out.print(j + " ");
}
System.out.println("This will not print");
}
System.out.println("Loops complete.");
}
}
I have done this in some of my C# code. I didn't know you could name the blocks, though; I'll have to try that and see if it works in C# too.
I think the scope block can be a nice idea, because you can encapsulate code specific to something within a block of code, where you might not want to split it out into its own function.
As for the disadvantage of nesting them, I see that as more of a fault of a programmer not of scope blocks themselves.
Named scopes are technically ok here, it's just they aren't used in this way very often. Therefore, when someone else comes to maintain your code in the future it may not be immediately obvious why they are there. IMHO a private helper method would be a better choice...
I love the idea of using block to limit var scope.
So many times I have been confused by short-lived variables given a large scope when they should go away immediately after use. A long method with many non-final variables makes it difficult to reason about the coder's intention, especially when comments are rare. Much of the logic I see in methods looks like this:
Type foo(args..){
declare ret
...
make temp vars to add information on ret
...
make some more temp vars to add info on ret. not much related to above code. but previously declared vars are still alive
...
return ret
}
if vars can have smaller scope than the entire method body, I can quickly forget most of them (good thing).
Also I agree that too many or too few private things leads to spaghetti code.
Actually, what I was looking for was something like a nested method in functional languages, and it seems its cousin in Java is a {BLOCK} (inner classes and lambda expressions are not meant for this).
However, I would prefer to use an unnamed block, since a label may mislead people into looking for a reference to it, and I can explain the code better with a commented block.
As for using a private method, I would consider it the next step after using blocks.
public Void traverseQuickestRoute() { // Void return-type from interface
    findShortCutThroughWoods()
        .map(WoodsShortCut::getTerrainDifficulty)
        .ifPresent(this::walkThroughForestPath); // return in this case
    if (isBikePresent()) {
        return cycleQuickestRoute();
    }
    ....
}
Is there a way to exit the method at the ifPresent?
In case it is not possible, for other people with similar use-cases: I see two alternatives
Optional<MappedRoute> woodsShortCut = findShortCutThroughWoods();
if (woodsShortCut.isPresent()) {
    TerrainDifficulty terrainDifficulty = woodsShortCut.get().getTerrainDifficulty();
    return walkThroughForestPath(terrainDifficulty);
}
This feels more ugly than it needs to be and combines if/else with functional programming.
A chain of orElseGet(...) throughout the method does not look as nice, but is also a possibility.
return is a control statement. Neither lambdas (arrow notation) nor method refs (WoodsShortCut::getTerrainDifficulty) support the idea of control statements that move control to outside of themselves.
Thus, the answer is a rather trivial: Nope.
You have to think of the stream 'pipeline' as the thing you're working on. So, the question could be said differently: Can I instead change this code so that I can modify how this one pipeline operation works (everything starting at findShortCut() to the semicolon at the end of all the method invokes you do on the stream/optional), and then make this one pipeline operation the whole method.
Thus, the answer is: orElseGet is probably it.
Disappointing, perhaps. 'functional' does not strike me as the right answer here. The problem is, there are things for/if/while loops can do that 'functional' cannot do. So, if you are faced with a problem that is simpler to tackle using 'a thing that for/if/while is good at but functional is bad at', then it is probably a better plan to just use for/if/while then.
The core things lambdas can't do concern transparency. Lambdas are non-transparent with regard to these three:
Checked exception throwing. try { list.forEach(x -> { throw new IOException(); }); } catch (IOException e) {} isn't legal even though your human brain can trivially tell it should be fine.
(Mutable) local variables. int x = 5; list.forEach(y -> x += y); does not work. Often there are ways around this (list.stream().mapToInt(Integer::intValue).sum() in this example), but not always.
Control flow. list.forEach(y -> {if (y < 0) return y;}); does not work.
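To make two of these limits concrete, here is a small sketch of the usual workarounds: a stream reduction in place of mutating a local, and a plain loop where control has to leave the enclosing method. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class LambdaLimitsSketch {
    // Mutable local: "x += y" inside a lambda does not compile,
    // so the sum is expressed as a reduction instead.
    static int sumPlusFive(List<Integer> list) {
        return 5 + list.stream().mapToInt(Integer::intValue).sum();
    }

    // Control flow: a plain loop can return from the enclosing method,
    // which a lambda passed to forEach cannot do.
    static Integer firstNegative(List<Integer> list) {
        for (Integer y : list) {
            if (y < 0) {
                return y;
            }
        }
        return null; // no negative element found
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, -2, 3);
        System.out.println(sumPlusFive(values));
        System.out.println(firstNegative(values));
    }
}
```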
So, keep in mind, you really have only 2 options:
Continually retrain yourself to not think in terms of such control flow. You find orElseGet 'not as nice'. I concur, but if you really want to blanket apply functional to as many places as you can possibly apply it, the whole notion of control flow out of a lambda needs not be your go-to plan, and you definitely can't keep thinking 'this code is not particularly nice because it would be simpler if I could control flow out', you're going to be depressed all day programming in this style. The day you never even think about it anymore is the day you have succeeded in retraining yourself to 'think more functional', so to speak.
Stop thinking that 'functional is always better'. Given that there are so many situations where their downsides are so significant, perhaps it is not a good idea to pre-suppose that the lambda/methodref based solution must somehow be superior. Apply what seems correct. That should often be "Actually just a plain old for loop is fine. Better than fine; it's the right, most elegant[1] answer here".
[1] "This code is elegant" is, of course, a non-falsifiable statement. It's like saying "The Mona Lisa is a pretty painting". You can't make a logical argument to prove this and it is insanity to try. "This code is elegant" boils down to saying "I think it is prettier", it cannot boil down to an objective fact. That also means in team situations there's no point in debating such things. Either everybody gets to decide what 'elegant' is (hold a poll, maybe?), or you install a dictator that decrees what elegance is. If you want to fix that and have meaningful debate, the term 'elegant' needs to be defined in terms of objective, falsifiable statements. I would posit that things like:
in face of expectable future change requests, this style is easier to modify
A casual glance at code leaves a first impression. Whichever style has the property that this first impression is accurate - is better (in other words, code that confuses or misleads the casual glancer is bad). Said even more differently: Code that really needs comments to avoid confusion is worse than code that is self-evident.
this code looks familiar to a wide array of java programmers
this code consists of fewer AST nodes (the more accurate form of 'fewer lines = better')
this code has simpler semantic hierarchy (i.e. fewer indents)
Those are the kinds of things that should define 'elegance'. Under almost all of those definitions, 'an if statement' is as good or better in this specific case!
For example:
public Void traverseQuickestRoute() {
return findShortCutThroughWoods()
.map(WoodsShortCut::getTerrainDifficulty)
.map(this::walkThroughForestPath)
.orElseGet(() -> { if (isBikePresent()) { return cycleQuickestRoute(); } return null; /* remaining fallbacks are elided in the question */ });
}
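A self-contained sketch of the same map/orElseGet shape is below. It returns a String rather than Void, and the class, method, and result strings are hypothetical. One caveat worth knowing: Optional.map treats a null result from the mapping function as empty, so with a Void-returning walkThroughForestPath the orElseGet branch would run even when the shortcut exists.

```java
import java.util.Optional;

class RouteSketch {
    // The whole method is one pipeline: map the shortcut if present,
    // otherwise fall back to the alternatives inside orElseGet.
    static String traverse(Optional<String> shortCutDifficulty, boolean bikePresent) {
        return shortCutDifficulty
                .map(difficulty -> "walk:" + difficulty)
                .orElseGet(() -> bikePresent ? "cycle" : "walk the long way");
    }

    public static void main(String[] args) {
        System.out.println(traverse(Optional.of("easy"), true));
        System.out.println(traverse(Optional.empty(), true));
        System.out.println(traverse(Optional.empty(), false));
    }
}
```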
There is Optional#ifPresentOrElse, which takes an extra Runnable for the else case. It has been available since Java 9.
public Void traverseQuickestRoute() { // Void return-type from interface
findShortCutThroughWoods()
.map(WoodsShortCut::getTerrainDifficulty)
.ifPresentOrElse(this::walkThroughForestPath,
this::alternative);
return null;
}
private void alternative() {
    if (isBikePresent()) {
        cycleQuickestRoute(); // result discarded: this method returns void
        return;
    }
    ...
}
I would split the method as above. Though for short code () -> { ... } might be readable.
Let's say I have the following code:
private Rule getRuleFromResult(Fact result){
Rule output=null;
for (int i = 0; i < rules.size(); i++) {
if(rules.get(i).getRuleSize()==1){output=rules.get(i);return output;}
if(rules.get(i).getResultFact().getFactName().equals(result.getFactName())) output=rules.get(i);
}
return output;
}
Is it better to leave it as it is or to change it as follows:
private Rule getRuleFromResult(Fact result) {
    Rule output = null;
    Rule current = null;
    for (int i = 0; i < rules.size(); i++) {
        current = rules.get(i); // look the rule up once per iteration
        if (current.getRuleSize() == 1) { return current; }
        if (current.getResultFact().getFactName().equals(result.getFactName())) output = current;
    }
    return output;
}
When executing, the program evaluates rules.get(i) afresh each time it appears, and I think that in a more expensive case (say, the chain in the second if) this takes more time and slows execution. Am I right?
Edit: To answer few comments at once: I know that in this particular example time gain will be super tiny, but it was just to get the general idea. I noticed I tend to have very long lines object.get.set.change.compareTo... etc and many of them repeat. In scope of whole code that time gain can be significant.
Your instinct is correct--saving intermediate results in a variable rather than re-invoking a method multiple times is faster. Often the performance difference will be too small to measure, but there's an even better reason to do this--clarity. By saving the value into a variable, you make it clear that you are intending to use the same value everywhere; if you re-invoke the method multiple times, it's unclear if you are doing so because you are expecting it to return different results on different invocations. (For instance, list.size() will return a different result if you've added items to list in between calls.) Additionally, using an intermediate variable gives you an opportunity to name the value, which can make the intention of the code clearer.
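As a sketch of the "name the value once" idea, an enhanced for loop hands you the current element directly, so there is nothing to re-fetch. The Rule class here is a simplified, hypothetical stand-in for the one in the question (a size and a result fact name only).

```java
import java.util.Arrays;
import java.util.List;

class RuleLookupSketch {
    // Simplified stand-in for the question's Rule type (assumption).
    static final class Rule {
        final int size;
        final String resultFactName;

        Rule(int size, String resultFactName) {
            this.size = size;
            this.resultFactName = resultFactName;
        }
    }

    static Rule getRuleFromResult(List<Rule> rules, String factName) {
        Rule output = null;
        for (Rule current : rules) { // each element is fetched exactly once
            if (current.size == 1) {
                return current;
            }
            if (current.resultFactName.equals(factName)) {
                output = current;
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(new Rule(2, "a"), new Rule(3, "b"));
        System.out.println(getRuleFromResult(rules, "b").size);
    }
}
```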
The only difference between the two versions is that the first may call rules.get(i) several times per iteration when the rule size is not one.
So the second version is a little bit faster in general, but you will not feel any difference if the list is not big.
It depends on the type of data structure that rules is. If it is a linked list (such as LinkedList), then yes, the second version can be much faster, because each rules.get(i) has to walk the list to reach element i. If it is a structure with direct indexed access (such as an array or ArrayList), the two are practically the same.
In general yes it's probably a tiny bit faster (nano seconds I guess), if called the first time. Later on it will be probably be improved by the JIT compiler either way.
But what you are doing is so-called premature optimization. You usually should not think about things that provide only an insignificant performance improvement.
What is more important is the readability to maintain the code later on.
You could even do more premature optimization like saving the length in a local variable, which is done by the for each loop internally. But again in 99% of cases it doesn't make sense to do it.
I just read this thread Critical loop containing many "if" whose output is constant : How to save on condition tests?
and this one Constant embedded for loop condition optimization in C++ with gcc which are exactly what I would like to do in Java.
I have some if conditions that are evaluated many times; the conditions are composed of attributes defined at initialization, which won't change.
Will javac optimize the bytecode by removing the unused branches of the conditions, avoiding the time spent testing them?
Do I have to declare the attributes as final, or is that useless?
Thanks for your help,
Aurélien
Java compile time optimization is pretty lacking. If you can use a switch statement it can probably do some trivial optimizations. If the number of attributes is very large then a HashMap is going to be your best bet.
I'll close by saying that this sort of thing is very very rarely a bottleneck and trying to prematurely optimize it is counterproductive. If your code is, in fact, called a lot then the JIT optimizer will do its best to make your code run faster. Just say what you want to happen and only worry about the "how" when you find that's actually worth the time to optimize it.
In OO languages, the solution is to use delegation or the command pattern instead of if/else forests.
So your attributes need to implement a common interface like IAttribute which has a method run() (or make all attributes implement Runnable).
Now you can simply call the method without any decisions in the loop:
for(....) {
attr.run();
}
It's a bit more complex if you can't add methods to your attributes. My solution in this case is using enums and an EnumMap which contains the runnables. Access to an EnumMap is almost like an array access (i.e. O(1)).
for(....) {
map.get(attr).run();
}
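A minimal sketch of the EnumMap variant could look like this. The Attribute enum and the counter-based actions are hypothetical; the point is only that the loop body contains a lookup and a call, with no if/else.

```java
import java.util.EnumMap;
import java.util.Map;

class EnumDispatchSketch {
    // Hypothetical attribute type; real code would use its own enum.
    enum Attribute { INCREMENT, DECREMENT }

    static int run(Attribute[] attrs) {
        int[] counter = {0}; // array cell so the lambdas can mutate it
        Map<Attribute, Runnable> actions = new EnumMap<>(Attribute.class);
        actions.put(Attribute.INCREMENT, () -> counter[0]++);
        actions.put(Attribute.DECREMENT, () -> counter[0]--);

        for (Attribute attr : attrs) {
            actions.get(attr).run(); // no decisions inside the loop
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(run(new Attribute[] {
                Attribute.INCREMENT, Attribute.INCREMENT, Attribute.DECREMENT }));
    }
}
```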
I don't know about Java specifics regarding this, but you might want to look into a technique called Memoization which would allow you to look up results for a function in a table instead of calling the function. Effectively, memoization makes your program "remember" results of a function for a given input.
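In Java, one simple way to sketch memoization is a HashMap keyed by the input, filled through Map.computeIfAbsent. The squaring function and the computations counter below are placeholders for a genuinely expensive calculation.

```java
import java.util.HashMap;
import java.util.Map;

class MemoSketch {
    private final Map<Integer, Long> cache = new HashMap<>();
    int computations = 0; // counts how often the "expensive" work really runs

    long square(int n) {
        // The mapping function runs only when n is not cached yet.
        return cache.computeIfAbsent(n, k -> {
            computations++;
            return (long) k * k;
        });
    }

    public static void main(String[] args) {
        MemoSketch memo = new MemoSketch();
        memo.square(12);
        memo.square(12);
        System.out.println(memo.computations);
    }
}
```

This only pays off when the function is pure and the same inputs recur; for a cheap condition test the map lookup may well cost more than the test itself.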
Try replacing the if with runtime polymorphism. No, that's not as strange as you think.
If, for example you have this:
for (int i=0; i < BIG_NUMBER; i++) {
if (calculateSomeCondition()) {
frobnicate(someValue);
} else {
defrobnicate(someValue);
}
}
then replace it with this (Function taken from Guava, but can be replaced with any other fitting interface):
Function<X, Y> f; // Guava's Function takes an input type and a result type; X and Y are placeholders
if (calculateSomeCondition()) {
    f = new Frobnicator();
} else {
    f = new Defrobnicator();
}
for (int i = 0; i < BIG_NUMBER; i++) {
    f.apply(someValue);
}
Method calls are pretty highly optimized on most modern JVMs even (or especially) if there are only a few possible call targets.
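For a version that compiles on its own, the sketch below swaps Guava's Function for java.util.function.IntUnaryOperator from the standard library. The increment and decrement steps are hypothetical stand-ins for frobnicate and defrobnicate.

```java
import java.util.function.IntUnaryOperator;

class HoistedConditionSketch {
    static int process(boolean condition, int iterations) {
        // The condition is tested once, outside the loop.
        IntUnaryOperator step = condition ? v -> v + 1 : v -> v - 1;

        int value = 0;
        for (int i = 0; i < iterations; i++) {
            value = step.applyAsInt(value); // no branch on the condition here
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(process(true, 10));
        System.out.println(process(false, 3));
    }
}
```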
Some people consider multiple return statements as bad programming style. While this is true for larger methods, I'm not sure if it is acceptable for short ones. But there is another question: Should else explicitly be written, if there is a return statement in the previous if?
Implicit else:
private String resolveViewName(Viewable viewable) {
if(viewable.isTemplateNameAbsolute())
return viewable.getTemplateName();
return uriInfo.getMatchedResources().get(0).getClass().toString();
}
Explicit else:
private String resolveViewName(Viewable viewable) {
if(viewable.isTemplateNameAbsolute())
return viewable.getTemplateName();
else
return uriInfo.getMatchedResources().get(0).getClass().toString();
}
Technically else is not necessary here, but it makes the intent more obvious.
And perhaps the cleanest approach with a single return:
private String resolveViewName(Viewable viewable) {
String templateName;
if(viewable.isTemplateNameAbsolute())
templateName = viewable.getTemplateName();
else
templateName = uriInfo.getMatchedResources().get(0).getClass().toString();
return templateName;
}
Which one would you prefer? Other suggestions?
Other obvious suggestion: use the conditional operator.
private String resolveViewName(Viewable viewable) {
return viewable.isTemplateNameAbsolute()
? viewable.getTemplateName()
: uriInfo.getMatchedResources().get(0).getClass().toString();
}
For cases where this isn't viable, I'm almost certainly inconsistent. I wouldn't worry too much about it, to be honest - it's not the kind of thing where the readability is likely to be significantly affected either way, and it's unlikely to introduce bugs.
(On the other hand, I would suggest using braces for all if blocks, even single statement ones.)
I prefer the cleanest approach, with a single return. To me the code is readable, maintainable and not confusing. If tomorrow you need to add some lines to the if or else block, it is easy.
1.) code should never be clever.
The "single point of exit" dogma comes from the days of Structured Programming.
In its day, structured programming was a GOOD THING, especially as an alternative to the GOTO ridden spaghetti code that was prevalent in 1960's and 1970's vintage Fortran and Cobol code. But with the popularity of languages such as Pascal, C and so on with their richer range of control structures, Structured Programming has been assimilated into mainstream programming, and certain dogmatic aspects have fallen out of favor. In particular, most developers are happy to have multiple exits from a loop or method ... provided that it makes the code easier to understand.
My personal feeling is that in this particular case, the symmetry of the second alternative makes it easiest to understand, but the first alternative is almost as readable. The last alternative strikes me as unnecessarily verbose, and the least readable.
But @Jon Skeet pointed out that there is a far more significant stylistic issue with your code; i.e. the absence of { } blocks around the 'then' and 'else' statements. To me the code should really be written like this:
private String resolveViewName(Viewable viewable) {
if (viewable.isTemplateNameAbsolute()) {
return viewable.getTemplateName();
} else {
return uriInfo.getMatchedResources().get(0).getClass().toString();
}
}
This is not just an issue of code prettiness. There is actually a serious point to always using blocks. Consider this:
String result = "Hello";
if (i < 10)
    result = "Goodbye";
    if (j > 10)
        result = "Hello again";
At first glance, it looks like result will be "Hello again" only if i is less than 10 and j is greater than 10. In fact, that is a misreading - we've been fooled by incorrect indentation. But if the code had been written with { } 's around the then parts, it would be clear that the indentation was wrong; e.g.
String result = "Hello";
if (i < 10) {
    result = "Goodbye";
}
    if (j > 10) {
        result = "Hello again";
    }
As you see, the first } stands out like a sore thumb and tells us not to trust the indentation as a visual cue to what the code means.
I usually prefer the first option since it's the shortest.
And I think that any decent programmer should realize how it works without me having to write the else or using a single return at the end.
Plus there are cases in long methods where you might need to do something like
if(!isValid(input)) { return null; }// or 0, or false, or whatever
// a lot of code here working with input
I find it's even clearer done like this for these types of methods.
Depends on the intention. If the first return is a quick bail-out, then I'd go without the else; if OTOH it's more like a "return either this or that" scenario, then I'd use else. Also, I prefer an early return statement over endlessly nested if statements or variables that exist for the sole purpose of remembering a return value. If your logic were slightly complex, or even as it is now, I'd consider putting the two ways of generating the return value into dedicated functions, and use an if / else to call either.
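The distinction can be sketched with two small methods; the names and return values are invented for illustration. The first uses a guard clause with no else, the second treats both branches as equals.

```java
class ReturnStyleSketch {
    // Quick bail-out: the guard returns early, the main path follows without else.
    static int parseOrMinusOne(String input) {
        if (input == null || input.isEmpty()) {
            return -1;
        }
        return Integer.parseInt(input);
    }

    // "Either this or that": both outcomes carry equal weight, so else is spelled out.
    static String parity(int n) {
        if (n % 2 == 0) {
            return "even";
        } else {
            return "odd";
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrMinusOne("42"));
        System.out.println(parity(3));
    }
}
```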
I prefer multiple returns in an if-else structure when the size of both statements is about equal, the code looks more balanced that way. For short expressions I use the ternary operator. If the code for one test is much shorter or is an exceptional case, I might use a single if with the rest of the code remaining in the method body.
I try to avoid modifying variables as much as possible, because I think that makes the code much harder to follow than multiple exits from a method.
Keep the lingo consistent and readable for the lowest-common-denominator programmer who might have to revisit the code in the future.
It's only a few extra letters to type the else, and it makes no difference to anything but legibility.
I prefer the first one.
Or... you can use if return else return for equally important bifurcations, and if return return for special cases.
When you have assertions (if p==null return null) then the first one is the most clear by far. If you have equally weighted options... I find it fine to use the explicit else.
It's completely a matter of personal preference - I've literally gone through phases of doing all 4 of those options (including the one Jon Skeet posted) - none of them are wrong, and I've never experienced any drawbacks as a result of using any of them.
The stuff about only one return statement dates from the 1970s when Dijkstra and Wirth were sorting out structured programming. They applied it with great success to control structures, which have now settled down according to their prescription of one entry and one exit. Fortran used to have multiple entries to a subroutine (or possibly function, sorry, about 35 years since I wrote any), and this is a feature I've never missed, indeed I don't think I ever used it.
I've never actually encountered this 'rule' as applied to methods outside academia, and I really can't see the point. You basically have to obfuscate your code considerably to obey the rule, with extra variables and so on, and there's no way you can convince me that's a good idea. Curiously enough, if you write it the natural way as per your first option the compiler usually generates the code according to the rule anyway ... so you can argue that the rule is being obeyed: just not by you ;-)
Sure, people have a lot to say about programming style, but don't be so concerned about something relatively trivial to your program's purpose.
Personally, I like to go without the else. If anybody is going through your code, chances are high he won't be too confused without the else.
I prefer the second option because to me it is the quickest to read.
I would avoid the third option because it doesn't add clarity or efficiency.
The first option is fine too, but at least I would put a blank line between the first bit (the if and its indented return) and the second return statement.
In the end, it comes down to personal preference (as so many things in programming style).
Considering multiple return statements "bad style" is a long, long discredited fallacy. They can make the code far cleaner and more maintainable than explicit return value variables. Especially in larger methods.
In your example, I'd consider the second option the cleanest because the symmetrical structure of the code reflects its semantics, and it's shorter and avoids the unnecessary variable.
I'm cleaning up Java code for someone who starts their functions by declaring all variables up top, and initializing them to null/0/whatever, as opposed to declaring them as they're needed later on.
What are the specific guidelines for this? Are there optimization reasons for one way or the other, or is one way just good practice? Are there any cases where it's acceptable to deviate from whatever the proper way of doing it is?
Declare variables as close to the first spot that you use them as possible. It's not really anything to do with efficiency, but makes your code much more readable. The closer a variable is declared to where it is used, the less scrolling/searching you have to do when reading the code later. Declaring variables closer to the first spot they're used will also naturally narrow their scope.
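As a rough illustration, here is the same hypothetical method written both ways. The names (`DeclarationPlacement`, `summarizeTopDeclared`, `summarizeNearUse`) and the logic are assumptions invented for this sketch, not code from the question.

```java
import java.util.List;

// Hypothetical example: both methods return the same result.
final class DeclarationPlacement {
    // Declare-at-top style: every local is visible (and mutable) for the whole method.
    static String summarizeTopDeclared(List<Integer> values) {
        int sum = 0;
        double average = 0.0;
        String label = null;
        for (int v : values) {
            sum += v;
        }
        average = values.isEmpty() ? 0.0 : (double) sum / values.size();
        label = average >= 10.0 ? "high" : "low";
        return label + ":" + sum;
    }

    // Declare-near-use style: each local appears where it first gets a meaningful value.
    static String summarizeNearUse(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        double average = values.isEmpty() ? 0.0 : (double) sum / values.size();
        String label = average >= 10.0 ? "high" : "low";
        return label + ":" + sum;
    }
}
```

In the second version there are no placeholder initializers such as `null` or `0.0`, and a reader never has to scroll up to find out what `average` or `label` is.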
The proper way is to declare variables exactly when they are first used and minimize their scope in order to make the code easier to understand.
Declaring variables at the top of functions is a holdover from C (where older standards required declarations at the start of a block), and has absolutely no advantages (variable scope exists only in the source code; in the byte code all local variables exist in sequence on the stack anyway). Just don't do it, ever.
Some people may try to defend the practice by claiming that it is "neater", but any need to "organize" code within a method is usually a strong indication that the method is simply too long.
From the Java Code Conventions, Chapter 6 on Declarations:
6.3 Placement
Put declarations only at the beginning of blocks. (A block is any code surrounded by curly braces "{" and "}".) Don't wait to declare variables until their first use; it can confuse the unwary programmer and hamper code portability within the scope.
void myMethod() {
    int int1 = 0; // beginning of method block
    if (condition) {
        int int2 = 0; // beginning of "if" block
        ...
    }
}
The one exception to the rule is indexes of for loops, which in Java can be declared in the for statement:
for (int i = 0; i < maxLoops; i++) { ... }
Avoid local declarations that hide declarations at higher levels. For example, do not declare the same variable name in an inner block:
int count;
...
myMethod() {
    if (condition) {
        int count = 0; // AVOID!
        ...
    }
    ...
}
If you have a kabillion variables used in various isolated places down inside the body of a function, your function is too big.
If your function is a comfortably understandable size, there's no difference between "all up front" and "just as needed".
The only not-up-front variable would be in the body of a for statement.
for( Iterator i= someObject.iterator(); i.hasNext(); )
From Google Java Style Guide:
4.8.2.2 Declared when needed
Local variables are not habitually declared at the start of their containing block or block-like construct. Instead, local variables are declared close to the point they are first used (within reason), to minimize their scope. Local variable declarations typically have initializers, or are initialized immediately after declaration.
Well, I'd follow what Google does. On a superficial level it might seem that declaring all variables at the top of the method/function would be "neater", but it's quite apparent that it's beneficial to declare variables as needed. It's subjective though; go with whatever feels intuitive to you.
I've found that declaring them as-needed results in fewer mistakes than declaring them at the beginning. I've also found that declaring them at the minimum scope possible helps prevent mistakes.
When I looked at the byte-code generated for different declaration locations a few years ago, I found it was more-or-less identical. There were occasionally differences depending on when the variables were assigned. Even something like:
for(Object o : list) {
    Object temp = ...; //was not "redeclared" every loop iteration
}
vs
Object temp;
for(Object o : list) {
    temp = ...; //nearly identical bytecode, if not exactly identical.
}
came out more or less identical.
I am doing this very same thing at the moment. All of the variables in the code that I am reworking are declared at the top of the function. As I've been looking through it, I've seen that several variables are declared but NEVER used, or they are declared and operations are done with them (e.g. parsing a String and then setting a Calendar object with the date/time values from the string) but then the resulting Calendar object is NEVER used.
I am going through and cleaning these up by taking the declarations from the top and moving them down in the function to a spot closer to where they are used.
Defining a variable in a wider scope than needed hinders understandability quite a bit. Limited scope signals that the variable has meaning only for this small block of code, so you need not think about it when reading further. This is a pretty important issue because of the tiny short-term working memory that the brain has (it is said that on average you can keep track of only about 7 things). One less thing to keep track of is significant.
Similarly, you really should try to avoid variables in the literal sense. Try to assign all things once, and declare them final so this is known to the reader. Not having to keep track of whether something changes or not really cuts the cognitive load.
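A minimal sketch of the assign-once idea follows. The class, method and thresholds are hypothetical, chosen only to show a `final` local that is assigned on every branch but never reassigned.

```java
// Hypothetical example: names and thresholds are illustrative only.
final class AssignOnce {
    static String shippingLabel(int weightGrams) {
        // Declared final without an initializer: the compiler checks that it is
        // assigned exactly once on every path and rejects any later reassignment.
        final String tier;
        if (weightGrams <= 500) {
            tier = "letter";
        } else if (weightGrams <= 2000) {
            tier = "parcel";
        } else {
            tier = "freight";
        }

        // Integer arithmetic that rounds up to whole kilograms.
        final int kilosRoundedUp = (weightGrams + 999) / 1000;
        return tier + "/" + kilosRoundedUp + "kg";
    }
}
```

Once a reader has seen where `tier` gets its value, they know it cannot change anywhere further down.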
Principle: Place local variable declarations as close to their first use as possible, and NOT simply at the top of a method. Consider this example:
/** Return true iff s is a blah or a blub. */
public boolean checkB(String s) {
    // Return true if s is a blah
    ... code to return true if s is a blah ...
    // Return true if s is a blub
    int helpblub = s.length() + 1;
    ... rest of code to return true if s is a blub ...
    return false;
}
Here, local variable helpblub is placed where it is necessary, in the code to test whether s is a blub. It is part of the code that implements "Return true if s is a blub".
It makes absolutely no logical sense to put the declaration of helpblub as the first statement of the method. The poor reader would wonder, why is that variable there? What is it for?
I think it is actually objectively provable that the declare-at-the-top style is more error-prone.
If you mutate-test code in either style by moving lines around at random (to simulate a merge gone bad or someone unthinkingly cut+pasting), then the declare-at-the-top style has a greater chance of compiling while functionally wrong.
I don't think declare-at-the-top has any corresponding advantage that doesn't come down to personal preference.
So assuming you want to write reliable code, learn to prefer doing just-in-time declaration.
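Here is a small sketch of the kind of accident I mean. The class and method names are invented for illustration; the first method simulates a line that was moved above the computation it depends on.

```java
// Hypothetical example: simulates one misplaced line in each style.
final class MisplacedLine {
    // Declare-at-top: 'total' already exists (initialised to 0), so reading it
    // too early still compiles and silently yields the wrong answer.
    static int priceWithTaxTopDeclared(int net) {
        int tax = 0;
        int total = 0;
        int result = 0;

        result = total;      // moved too far up by mistake: compiles, but total is still 0
        tax = net / 10;
        total = net + tax;
        return result;
    }

    // Just-in-time: each variable is declared where it gets its value.
    static int priceWithTaxJustInTime(int net) {
        int tax = net / 10;
        int total = net + tax;
        // Moving this line above "int total = ..." would not compile,
        // because 'total' would not be declared yet.
        int result = total;
        return result;
    }
}
```

With the declare-at-top version the mistake only shows up at run time (if at all); with just-in-time declarations the same misplacement is caught by the compiler.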
It's a matter of readability and personal preference rather than performance. The compiler does not care and will generate the same code anyway.
I've seen people declare at the top and at the bottom of functions. I prefer the top, where I can see them quickly. It's a matter of choice and preference.