Just a quickie about Java, and let me say this first: I'm not asking because of performance optimization, just curious about the behind-the-scenes stuff in general. Since I found nothing on this, I assume they are equivalent in every respect, but just to be sure:
In a recursive function foo(int i), or I guess in general in a function call, is there a difference between foo(++i) and foo(i + 1)? I know the result is the same, but I thought the one with ++i might do some extra work in the background because it is an assignment? So maybe it creates some (anonymous) temporary = i in the background which gets incremented and then referenced?
Or does the compiler just optimize ++i away to i + 1?
My understanding is a bit fuzzy, as you can see, so help would be appreciated.
EDIT to clarify:
At first this was about a recursive function, e.g.
public static int foo(int i) {
return (i >= 4) ? 0
: i + foo(++i);
}
The "functions in general"-part came in the middle of writing the question and, as remarked, makes this ambiguous etc. Hope this clarifies everything.
If the answer is not about semantics but about performance at the machine level, after the IR has been optimized and translated into machine instructions, I would say: "no, unless measured and proven otherwise."
It's very unlikely that there will be a performance difference between f(++i) and f(i + 1) after all the optimizations are made, assuming your code is such that you can actually consider these as alternatives (i.e. the state of i ceases to be relevant after the function call).
It comes down to basic hardware and compiler design: the low cost of instructions on a value already held in a register, and how easy it is for even a semi-competent compiler (and I'd think Java's JIT would at least be that) to optimize this code down to the same machine code. It's in the very basic nature of a compiler to recognize unnecessary side effects and eliminate them outright (this is actually one of the reasons why artificial micro-benchmarks can be misleading unless they're written very carefully in a way that prevents the optimizer from skipping certain code entirely). Among the easiest side effects to eliminate is a case like this, where we're incrementing a variable i but not depending on the state change afterwards.
It seems unlikely to have any real impact on performance. Of course the ultimate way to know is to look at the final resulting machine code (not the bytecode IR, but the actual final machine code), or to measure and profile it. But I'd be quite shocked, to say the least, if one were faster than the other, and it would tend to make me think that the compiler is doing a pretty poor job in either instruction selection or register allocation.
That said, if one actually were (remote chance) faster than the other, I think you have it backwards. ++i would likely require less work, since it can just increment a value in a register. ++i is a unary operation on one operand, and that works well with registers, which are mutable. i + 1 is an expression that requires i to be treated as immutable and would call for a second register (but only in a really horrible toy-compiler scenario that didn't optimize anything; though in that case we're also compiling this expression in the context of a function call and have to consider things like stack spills, so even a toy compiler might make these roughly equivalent).
There is no difference in the execution of foo(++i) and foo(i + 1): the value of the parameter passed to foo is the same.
However, in the former case, the local variable i is increased by one; in the latter case, it retains its original value.
So if you make the same method call twice, there is a difference:
foo(++i);
foo(++i); // Invoke with i 1 greater than before.
is different from
foo(i + 1);
foo(i + 1); // Invoke with the same argument
As such, in general, the compiler cannot optimize anything away, because they have different semantics.
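Concretely, with the values spelled out (foo here is any method that takes an int; this is just a sketch):
int i = 0;
foo(++i);   // foo is called with 1; i is now 1
foo(++i);   // foo is called with 2; i is now 2

int j = 0;
foo(j + 1); // foo is called with 1; j is still 0
foo(j + 1); // foo is called with 1 again; j is still 0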
Besides the other comments about the value of i after using the prefix increment, there is a small difference in the generated bytecode. See the simplified results below.
foo(++i) compiled to
iinc
iload_1
invokestatic
foo(i + 1) compiled to
iload_1
iconst_1
iadd
invokestatic
Yes, ++i consists of two operations: addition and assignment. Since you don't need i later, i + 1 is better to use here than ++i.
If i is used after foo(++i), the value will have been changed
If you process i after foo(i + 1), the value will not have been changed
void foo(int i)
{
//process stuff A
foo(++i); // i has been incremented
//process stuff B: i is now one greater here
}
void bar(int i)
{
//process stuff A
bar(i + 1); // i keeps its original value
//process stuff B: i is unchanged here
}
Depending on what is in //process stuff B, you may either want to use i + 1 and have the value of i stay the same, or ++i and have i be incremented.
Related
In the following piece of code we make a call listType.getDescription() twice:
for (ListType listType: this.listTypeManager.getSelectableListTypes())
{
if (listType.getDescription() != null)
{
children.add(new SelectItem(listType.getId(), listType.getDescription()));
}
}
I would tend to refactor the code to use a single variable:
for (ListType listType: this.listTypeManager.getSelectableListTypes())
{
String description = listType.getDescription();
if (description != null)
{
children.add(new SelectItem(listType.getId(), description));
}
}
My understanding is that the JVM is somehow optimized for the original code, especially for nested calls like children.add(new SelectItem(listType.getId(), listType.getDescription()));.
Comparing the two options, which one is the preferred method and why? That is in terms of memory footprint, performance, readability/ease, and others that don't come to my mind right now.
When does the latter code snippet become more advantageous than the former? That is, is there any (approximate) number of listType.getDescription() calls at which using a temp local variable becomes more desirable, given that listType.getDescription() always requires some stack operations to store the this object?
I'd nearly always prefer the local variable solution.
Memory footprint
A single local variable costs 4 or 8 bytes. It's a reference and there's no recursion, so let's ignore it.
Performance
If this is a simple getter, the JVM can memoize it itself, so there's no difference. If it's an expensive call which can't be optimized, memoizing it manually makes it faster.
Readability
Follow the DRY principle. In your case it hardly matters, as the local variable name is about as long, character-wise, as the method call; but for anything more complicated, it's about readability: you don't have to spot the 10 differences between the two expressions. If you know they're the same, make that clear by using the local variable.
Correctness
Imagine your SelectItem does not accept nulls and your program is multithreaded. The value of listType.getDescription() can change in the meantime and you're toasted.
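A sketch of how that failure can happen with the original loop body (assuming another thread can change the description concurrently):
if (listType.getDescription() != null) {
    // another thread may set the description to null right here...
    // ...so this second call can now return null and SelectItem blows up
    children.add(new SelectItem(listType.getId(), listType.getDescription()));
}
With the local variable, the null check and the value passed to SelectItem are guaranteed to refer to the same snapshot.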
Debugging
Having a local variable containing an interesting value is an advantage.
The only thing you gain by omitting the local variable is saving one line. So I'd do it only in cases where it really doesn't matter:
very short expression
no possible concurrent modification
simple private final getter
I think option number two is definitely better, because it improves the readability and maintainability of your code, which is the most important thing here. This kind of micro-optimization won't really help you unless you're writing an application where every millisecond is important.
I'm not sure either is preferred. What I would prefer is clearly readable code over performant code, especially when that performance gain is negligible. In this case I suspect there's next to no noticeable difference (especially given the JVM's optimisations and code-rewriting capabilities).
In the context of imperative languages, the value returned by a function call cannot be memoized (See http://en.m.wikipedia.org/wiki/Memoization) because there is no guarantee that the function has no side effect. Accordingly, your strategy does indeed avoid a function call at the expense of allocating a temporary variable to store a reference to the value returned by the function call.
In addition to being slightly more efficient (which does not really matter unless the function is called many times in a loop), I would opt for your style due to better code readability.
I agree on everything. About the readability I'd like to add something:
I see lots of programmers doing things like:
if (item.getFirst().getSecond().getThird().getForth() == 1 ||
item.getFirst().getSecond().getThird().getForth() == 2 ||
item.getFirst().getSecond().getThird().getForth() == 3)
Or even worse:
item.getFirst().getSecond().getThird().setForth(item2.getFirst().getSecond().getThird().getForth())
If you are calling the same chain of 10 getters several times, please use an intermediate variable. It's just much easier to read and debug.
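For example, the first snippet with an intermediate variable (the chained getter names are the same made-up ones as above; the Third type name is assumed):
Third third = item.getFirst().getSecond().getThird();
if (third.getForth() == 1 ||
    third.getForth() == 2 ||
    third.getForth() == 3) {
    // ...
}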
I would agree with the local variable approach for readability only if the local variable's name is self-documenting. Calling it "description" wouldn't be enough (which description?). Calling it "selectableListTypeDescription" would make it clear. I would add that the loop variable in the for loop should be named "selectableListType" (especially if the "listTypeManager" has accessors for other ListTypes).
The other reason would be if there's no guarantee this is single-threaded or your list is immutable.
int y=3;
int z=(--y) + (y=10);
When executed in C, the value of z evaluates to 20, but when the same expression is executed in Java, z evaluates to 12.
Can anyone explain why this is happening and what is the difference?
when executed in C language the value of z evaluates to 20
No it does not. This is undefined behavior, so z could get any value. Including 20. The program could also theoretically do anything, since the standard does not say what the program should do when encountering undefined behavior. Read more here: Undefined, unspecified and implementation-defined behavior
As a rule of thumb, never modify a variable twice in the same expression.
It's not a good duplicate, but this will explain things a bit deeper. The reason for undefined behavior here is sequence points. Why are these constructs using pre and post-increment undefined behavior?
In C, when it comes to arithmetic operators, like + and /, the order of evaluation of the operands is not specified in the standard, so if the evaluation of those has side effects, your program becomes unpredictable. Here is an example:
#include <stdio.h>

int foo(void)
{
printf("foo()\n");
return 0;
}
int bar(void)
{
printf("bar()\n");
return 0;
}
int main(void)
{
int x = foo() + bar();
}
What will this program print? Well, we don't know. I'm not entirely sure if this snippet invokes undefined behavior or not, but regardless, the output is not predictable. I asked a question, Is it undefined behavior to use functions with side effects in an unspecified order?, about exactly that, so I'll update this answer later.
Some other operators have a specified order of evaluation (left to right), like || and &&, and this feature is used for short-circuiting. For instance, if we use the above example functions and write foo() && bar(), only the foo() function will be executed.
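The same short-circuit rule exists in Java; a minimal sketch (these boolean-returning foo/bar are just stand-ins for the C functions above):
public class ShortCircuit {
    static boolean foo() { System.out.println("foo()"); return false; }
    static boolean bar() { System.out.println("bar()"); return true; }

    public static void main(String[] args) {
        // Prints only "foo()": foo() is false, so bar() is never evaluated.
        boolean result = foo() && bar();
        System.out.println(result); // false
    }
}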
I'm not very proficient in Java, but for completeness, I want to mention that Java basically does not have undefined or unspecified behavior except in very special situations. Almost everything in Java is well defined. For more details, read rzwitserloot's answer.
There are 3 parts to this answer:
How this works in C (unspecified behaviour)
How this works in Java (the spec is clear on how this should be evaluated)
Why is there a difference.
For #1, you should read #klutt's fantastic answer.
For #2 and #3, you should read this answer.
How does it work in java?
Unlike C's, java's language specification is far more precise. For example, C doesn't even tell you how many bits the data type int is supposed to have, whereas the java lang spec does: 32 bits, even on 64-bit processors and a 64-bit java implementation.
The java spec clearly says that x+y is to be evaluated left-to-right (vs. C's 'in any order you please, compiler'), thus, first --y is evaluated which is clearly 2 (with the side-effect of making y 2), and then y=10 is evaluated which is clearly 10 (with the side effect of making y 10), and then 2+10 is evaluated which is clearly 12.
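Spelled out as separate statements, this is (a sketch of) what that defined left-to-right evaluation amounts to:
int y = 3;
int left = --y;       // evaluated first: left == 2, y is now 2
int right = (y = 10); // evaluated second: right == 10, y is now 10
int z = left + right; // 12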
Obviously, a language like java is just better; after all, undefined behaviour is pretty much a bug by definition. So what was wrong with the C lang spec writers that they introduced this crazy stuff?
The answer is: performance.
In C, your source code is turned into machine code by the compiler, and the machine code is then interpreted by the CPU. A 2-step model.
In java, your source code is turned into bytecode by the compiler, the bytecode is then turned into machine code by the runtime, and the machine code is then interpreted by the CPU. A 3-step model.
If you want to introduce optimizations, you don't control what the CPU does, so for C there is only 1 step where it can be done: Compilation.
So C (the language) is designed to give lots of freedom to C compilers to attempt to produce optimized machine code. This is a cost/benefit scenario: At the cost of having a ton of 'undefined behaviour' in the lang spec, you get the benefit of better optimizing compilers.
In java, you get a second step, and that's where java does its optimizations: At runtime. java.exe does it to class files; javac.exe is quite 'stupid' and optimizes almost nothing. This is on purpose; at runtime you can do a better job (for example, you can use some bookkeeping to track which of two branches is more commonly taken and thus branch predict better than a C app ever could) - it also means that cost/benefit analysis now results in: The lang spec should be clear as day.
So java code never has undefined behaviour?
Not so. Java has a memory model which includes a ton of undefined behaviour:
class X { int a, b; }
X instance = new X();
new Thread() { public void run() {
int a = instance.a;
int b = instance.b;
instance.a = 5;
instance.b = 6;
System.out.print(a);
System.out.print(b);
}}.start();
new Thread() { public void run() {
int a = instance.a;
int b = instance.b;
instance.a = 1;
instance.b = 2;
System.out.print(a);
System.out.print(b);
}}.start();
is undefined in java. It may print 0056, 0012, 0010, 0002, 5600, 0600, and many many more possibilities. Something like 5000 (which it could legally print) is hard to imagine: How can the read of a 'work' but the read of b then fail?
For the exact same reason your C code produces arbitrary answers:
Optimization.
The cost/benefit of 'hardcoding' in the spec exactly how this code should behave would have a large cost to it: you'd take away most of the room for optimization. So java paid the cost and now has a langspec that is ambiguous whenever you modify/read the same fields from different threads without establishing so-called 'happens-before' relationships using e.g. synchronized.
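A minimal sketch of one way to establish such a happens-before relationship for the example above (just one option; volatile fields or java.util.concurrent would work too):
class X {
    private int a, b;

    synchronized void set(int a, int b) { this.a = a; this.b = b; }

    synchronized int[] snapshot() { return new int[] { a, b }; }
}
With every access going through the same lock, each thread sees a consistent pair of values instead of an arbitrary interleaving.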
When executed in C language the value of z evaluates to 20
That is not true. The compiler you use happens to evaluate it to 20. Another one can evaluate it in a completely different way: https://godbolt.org/z/GcPsKh
This kind of behaviour is called Undefined Behaviour.
In your expression you have two problems.
The order of evaluation (except for the logical operators) is not specified in C (it is Unspecified Behaviour)
In this expression there is also a problem with sequence points (Undefined Behaviour)
In section 12.3.3., "Unrealistic Sampling of Code Paths" the Java Concurrency In Practice book says:
In some cases, the JVM
may make optimizations based on assumptions that may only be true temporarily, and later back them out by invalidating the compiled code if they become untrue
I cannot understand the above statement.
What are these JVM assumptions?
How does the JVM know whether the assumptions are true or untrue?
If the assumptions are untrue, does it influence the correctness of my data?
The statement that you quoted has a footnote which gives an example:
For example, the JVM can use monomorphic call transformation to convert a virtual method call to a direct method call if no classes currently loaded override that method, but it invalidates the compiled code if a class is subsequently loaded that overrides the method.
The details are very, very, very complex here, so the following is an extremely oversimplified example.
Imagine you have an interface:
interface Adder { int add(int x); }
The method is supposed to add a value to x and return the result. Now imagine that there is a program that uses an implementation of this interface:
class OneAdder implements Adder {
public int add(int x) {
return x+1;
}
}
class Example {
void run() {
OneAdder a1 = new OneAdder();
int result = compute(a1);
System.out.println(result);
}
private int compute(Adder a) {
int sum = 0;
for (int i=0; i<100; i++) {
sum = a.add(sum);
}
return sum;
}
}
In this example, the JVM could do certain optimizations. A very low-level one is that it could avoid using a vtable for calling the add method, because there is only one implementation of this method in the given program. But it could even go further, and inline this only method, so that the compute method essentially becomes this:
private int compute(Adder a) {
int sum = 0;
for (int i=0; i<100; i++) {
sum += 1;
}
return sum;
}
and in principle, even this
private int compute(Adder a) {
return 100;
}
But the JVM can also load classes at runtime. So there may be a case where this optimization has already been done, and later, the JVM loads a class like this:
class TwoAdder implements Adder {
public int add(int x) {
return x+2;
}
}
Now, the optimization that has been done to the compute method may become "invalid", because it's not clear whether it is called with a OneAdder or a TwoAdder. In this case, the optimization has to be undone.
This should answer 1. of your question.
Regarding 2.: The JVM keeps track of all the optimizations that have been done, of course. It knows that it has inlined the add method based on the assumption that there is only one implementation of this method. When it finds another implementation of this method, it has to undo the optimization.
Regarding 3.: The optimizations are done when the assumptions are true. When they become untrue, the optimization is undone. So this does not affect the correctness of your program.
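If you want to watch this happen, you can run with the HotSpot flag -XX:+PrintCompilation; methods whose compiled code is invalidated show up in the log as "made not entrant", for example (YourMainClass is a placeholder for whatever class has your main method):
java -XX:+PrintCompilation YourMainClass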
Update:
Again, the example above was very simplified, referring to the footnote that was given in the book. For further information about the optimization techniques of the JVM, you may refer to https://wiki.openjdk.java.net/display/HotSpot/PerformanceTechniques . Specifically, the speculative (profile-based) techniques can probably be considered to be mostly based on "assumptions" - namely, on assumptions that are made based on the profiling data that has been collected so far.
Taking the quoted text in context, this section of the book is actually talking about the importance of using realistic test data (inputs) when you do performance testing.
Your questions:
What are these JVM assumptions?
I think the text is talking about two things:
On the one hand, it seems to be talking about optimizing based on the measurement of code paths. For example whether the "then" or "else" branch of an if statement is more likely to be executed. This can indeed result in generation of different code and is susceptible to producing sub-optimal code if the initial measurements are incorrect.
On the other hand, it also seems to be talking about optimizations that may turn out to be invalid. For example, at a certain point in time, there may be only one implementation of a given interface method that has been loaded by the JVM. On seeing this, the optimizer may decide to simplify the calling sequence to avoid polymorphic method dispatching. (The term used in the book for this is "monomorphic call transformation".) A bit later, a second implementation may be loaded, causing the optimizer to back out that optimization.
The first of these cases only affects performance.
The second of these would affect correctness (as well as performance) if the optimizer didn't back out the optimization. But the optimizer does do that. So it only affects performance. (The methods containing the affected calls need to be re-optimized, and that affects overall performance.)
How does the JVM know whether the assumptions are true or untrue?
In the first case, it doesn't.
In the second case, the problem is noticed when the JVM loads the 2nd method, and sees a flag on (say) the interface method that says that the optimizer has assumed that it is effectively a final method. On seeing this, the loader triggers the "back out" before any damage is done.
If the assumptions are untrue, does it influence the correctness of my data?
No it doesn't. Not in either case.
But the takeaway from the section is that the nature of your test data can influence performance measurements. And it is not simply a matter of size. The test data also needs to cause the application to behave the same way (take similar code paths) as it would behave in "real life".
Let's say I have the following code:
private Rule getRuleFromResult(Fact result){
Rule output=null;
for (int i = 0; i < rules.size(); i++) {
if(rules.get(i).getRuleSize()==1){output=rules.get(i);return output;}
if(rules.get(i).getResultFact().getFactName().equals(result.getFactName())) output=rules.get(i);
}
return output;
}
Is it better to leave it as it is or to change it as follows:
private Rule getRuleFromResult(Fact result){
Rule output=null;
Rule current = null;
for (int i = 0; i < rules.size(); i++) {
current=rules.get(i);
if(current.getRuleSize()==1){return current;}
if(current.getResultFact().getFactName().equals(result.getFactName())) output=current;
}
return output;
}
When executing, the program goes through rules.get(i) each time as if it were the first call, and I think that in a much more advanced example (say, as in the second if) this takes more time and slows execution. Am I right?
Edit: To answer a few comments at once: I know that in this particular example the time gain will be super tiny, but it was just to get the general idea across. I've noticed I tend to have very long lines like object.get.set.change.compareTo... etc., and many of them repeat. Across the whole codebase that time gain can be significant.
Your instinct is correct--saving intermediate results in a variable rather than re-invoking a method multiple times is faster. Often the performance difference will be too small to measure, but there's an even better reason to do this--clarity. By saving the value into a variable, you make it clear that you are intending to use the same value everywhere; if you re-invoke the method multiple times, it's unclear if you are doing so because you are expecting it to return different results on different invocations. (For instance, list.size() will return a different result if you've added items to list in between calls.) Additionally, using an intermediate variable gives you an opportunity to name the value, which can make the intention of the code clearer.
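A small sketch of the case where re-invoking really is intended to pick up a new value:
import java.util.ArrayList;
import java.util.List;

class SizeChanges {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        list.add("a");
        int before = list.size(); // 1
        list.add("b");
        int after = list.size();  // 2 -- size() is called again on purpose, expecting a different value
        System.out.println(before + " " + after);
    }
}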
The only difference between the two versions is that in the first one you may call rules.get(i) twice in the same iteration (when getRuleSize() is different from one).
So the second version is a little bit faster in general, but you will not feel any difference if the list is not big.
It depends on the type of data structure that the "rules" object is. If it is a list that has to search for element i on every rules.get(i) call (a linked list, say), then yes, the second one is much faster. If it is a data type that gives you rules.get(i) immediately (like an array), then it is the same.
In general, yes, it's probably a tiny bit faster (nanoseconds, I guess), at least when called the first time. Later on it will probably be improved by the JIT compiler either way.
But what you are doing is so-called premature optimization. You usually should not think about things that only provide an insignificant performance improvement.
What is more important is the readability to maintain the code later on.
You could even do more premature optimization, like saving the length in a local variable, which is done by the for-each loop internally. But again, in 99% of cases it doesn't make sense to do it.
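Purely for illustration, the hand-hoisted version would look something like this (a generic sketch, not the question's Rule code):
import java.util.List;

class HoistedLength {
    static void printAll(List<String> items) {
        int size = items.size(); // hoisted once instead of being re-evaluated every iteration
        for (int i = 0; i < size; i++) {
            System.out.println(items.get(i));
        }
    }
}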
Is there any performance penalty for the following code snippet?
for (int i=0; i<someValue; i++)
{
Object o = someList.get(i);
o.doSomething();
}
Or does this code actually make more sense?
Object o;
for (int i=0; i<someValue; i++)
{
o = someList.get(i);
o.doSomething();
}
If in bytecode these two are totally equivalent, then obviously the first method looks better in terms of style, but I want to make sure this is the case.
In today's compilers, no. I declare objects in the smallest scope I can, because it's a lot more readable for the next guy.
To quote Knuth, who may be quoting Hoare:
Premature optimization is the root of all evil.
Whether the compiler will produce marginally faster code by defining the variable outside the loop is debatable, and I imagine it won't. I would guess it'll produce identical bytecode.
Compare this with the number of errors you'll likely prevent by correctly-scoping your variable using in-loop declaration...
There's no performance penalty for declaring the Object o within the loop.
The compiler generates very similar bytecode and makes the correct optimizations.
See the article Myth - Defining loop variables inside the loop is bad for performance for a similar example.
You can disassemble the code with javap -c and check what the compiler actually emits. On my setup (java 1.5/mac compiled with eclipse), the bytecode for the loop is identical.
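For example (assuming the compiled class file is Example.class in the current directory):
javap -c Example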
The first code is better, as it restricts the scope of the o variable to the for block. From a performance perspective, it might not have any effect in Java, but it might with lower-level compilers, which might put the variable in a register if you use the first form.
In fact, some people might think that if the compiler is dumb, the second snippet is better in terms of performance. This is what some instructor told me at college, and I laughed at him for this suggestion! Basically, compilers allocate memory on the stack for the local variables of a method just once at the start of the method (by adjusting the stack pointer) and release it at the end of the method (again by adjusting the stack pointer, assuming it's not C++, or at least that there are no destructors to be called). So all stack-based local variables in a method are allocated at once, no matter where they are declared and how much memory they require. If the compiler is dumb, there is no difference in terms of performance, but if it's smart enough, the first code can actually be better, as it helps the compiler understand the scope and the lifetime of the variable! And if it's really smart, there should be absolutely no difference in performance, as it infers the actual scope either way.
Constructing an object using new is totally different from just declaring it, of course.
I think readability is more important than performance, and from a readability standpoint, the first code is definitely better.
I've got to admit I don't know Java. But are these two equivalent? Are the object lifetimes the same? In the first example, I assume (not knowing Java) that o will be eligible for garbage collection as soon as the loop terminates.
But in the second example, surely o won't be eligible for garbage collection until the outer scope (not shown) is exited?
Don't prematurely optimize. Better than either of these is:
for(Object o : someList) {
o.doSomething();
}
because it eliminates boilerplate and clarifies intent.
Unless you are working on embedded systems, in which case all bets are off. Otherwise, don't try to outsmart the JVM.
I've always thought that most compilers these days are smart enough to do the latter option. Assuming that's the case, I would say the first one does look nicer as well. If the loop gets very large, there's no need to look all around for where o is declared.
These have different semantics. Which is more meaningful?
Reusing an object for "performance reasons" is often wrong.
The question is what does the object "mean"? Why are you creating it? What does it represent? Objects must parallel real-world things. Things are created, undergo state changes, and report their states for reasons.
What are those reasons? How does your object model and reflect those reasons?
To get at the heart of this question... [Note that non-JVM implementations may do things differently if allowed by the JLS...]
First, keep in mind that the local variable "o" in the example is a pointer, not an actual object.
All local variables are allocated on the runtime stack in 4-byte slots. doubles and longs require two slots; other primitives and pointers take one. (Even booleans take a full slot)
A fixed runtime-stack size must be created for each method invocation. This size is determined by the maximum local variable "slots" needed at any given spot in the method.
In the above example, both versions of the code require the same maximum number of local variables for the method.
In both cases, the same bytecode will be generated, updating the same slot in the runtime stack.
In other words, no performance penalty at all.
HOWEVER, depending on the rest of the code in the method, the "declaration outside the loop" version might actually require a larger runtime stack allocation. For example, compare
for (...) { Object o = ... }
for (...) { Object o = ... }
with
Object o;
for (...) { /* loop 1 */ }
for (...) { Object x =...; }
In the first example, both loops require the same runtime stack allocation.
In the second example, because "o" lives past the loop, "x" requires an additional runtime stack slot.
Hope this helps,
-- Scott
In both cases the type info for the object o is determined at compile time. In the second instance, o is seen as being global to the for loop, and in the first instance, the clever Java compiler knows that o will have to be available for as long as the loop lasts and hence will optimise the code in such a way that there won't be any respecification of o's type in each iteration.
Hence, in both cases, specification of o's type will be done once, which means the only performance difference would be in the scope of o. Obviously, a narrower scope always enhances performance, therefore to answer your question: no, there is no performance penalty for the first code snippet; in fact, this code snippet is more optimised than the second.
In the second snippet, o is being given unnecessary scope which, besides being a performance issue, can also be a security issue.
The first makes far more sense. It keeps the variable in the scope that it is used in, and it prevents values assigned in one iteration from being used in a later iteration; this is more defensive.
The former is sometimes said to be more efficient but any reasonable compiler should be able to optimise it to be exactly the same as the latter.
As someone who maintains more code than they write: version 1 is much preferred - keeping scope as local as possible is essential for understanding. It's also easier to refactor this sort of code.
As discussed above, I doubt this would make any difference in efficiency. In fact, I would argue that if the scope is more local, a compiler may be able to do more with it!
When using multiple threads (if you're doing 50+), I found this to be a very effective way of handling ghost-thread problems:
Object one;
Object two;
Object three;
Object four;
Object five;
try {
for (int i = 0; i < someValue; i++)
{
Object o = someList.get(i);
o.doSomething();
}
} catch (Exception e) {
e.printStackTrace();
} finally {
one = null;
two = null;
three = null;
four = null;
five = null;
System.gc();
}
The answer depends partly on what the constructor does and what happens with the object after the loop, since that determines to a large extent how the code is optimized.
If the object is large or complex, absolutely declare it outside the loop. Otherwise, the people telling you not to prematurely optimize are right.
I actually have in front of me some code which looks like this:
for (int i = offset; i < offset + length; i++) {
char append = (char) (data[i] & 0xFF);
buffer.append(append);
}
...
for (int i = offset; i < offset + length; i++) {
char append = (char) (data[i] & 0xFF);
buffer.append(append);
}
...
for (int i = offset; i < offset + length; i++) {
char append = (char) (data[i] & 0xFF);
buffer.append(append);
}
So, relying on the compiler's abilities, I can assume there would be only one stack allocation for i and one for append. Then everything would be fine, except for the duplicated code.
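A sketch of factoring the duplicated loop out into a helper (assuming buffer is a StringBuilder and data is a byte[], as the 0xFF masking suggests; the helper name is made up):
private static void appendBytes(StringBuilder buffer, byte[] data, int offset, int length) {
    // Each byte is masked to 0..255 and appended as a char, exactly as in the original loops.
    for (int i = offset; i < offset + length; i++) {
        buffer.append((char) (data[i] & 0xFF));
    }
}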
As a side note, Java applications are known to be slow. I've never tried profiling in Java, but I guess the performance hit comes mostly from memory allocation management.