java performance front : ref in for loop


Which code is better from a performance point of view? I think the second, because creating a reference inside the for loop is not good.
May I know your opinion?
// First Code
for (int i = 0; i < array.size(); i++) {
SipSession abc = (SipSession) array1.get(i);
}
// Second Code
SipSession abc = null;
for (int i = 0; i < array.size(); i++) {
abc = (SipSession) array1.get(i);
}

You should only choose on performance grounds after you've profiled your code and established that this is the bottleneck.
Until you've done that, choose whichever version you think is clearer and easier to maintain.
I would always choose the first version except when I need the last SipSession reference to outlive the loop.
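A minimal sketch of the one case described above, where the reference has to outlive the loop. It uses a List<String> in place of the question's SipSession list, and the class and method names are made up for illustration.

```java
import java.util.List;

class LoopScope {
    // Reference declared inside the loop: it exists only for one iteration.
    static int countNonEmpty(List<String> sessions) {
        int count = 0;
        for (int i = 0; i < sessions.size(); i++) {
            String s = sessions.get(i); // scoped to this iteration
            if (!s.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Reference declared outside the loop: needed because its last value
    // is still used after the loop ends.
    static String lastNonEmpty(List<String> sessions) {
        String last = null;
        for (int i = 0; i < sessions.size(); i++) {
            String s = sessions.get(i);
            if (!s.isEmpty()) {
                last = s; // survives the loop
            }
        }
        return last;
    }
}
```

Only lastNonEmpty needs the wider scope; in countNonEmpty the in-loop declaration keeps the variable from leaking past the loop.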

Ultimately it will make no difference. The JIT will optimize that code away to exactly the same thing.
The only difference is the scope, of course.

I don't think there's much performance difference between the two. The only major difference is the scope of the SipSession reference. But you should try profiling if you care that much.

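In your first code the JIT will simply drop the reference variable, because it is never used within its scope (javac still emits the store; removing it is left to the JIT).
It will effectively be optimized to the following (apart from the cast check, which stays unless the JIT can prove the element type):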
for(int i=0;i<array.size();i++){
array1.get(i);
}
Depending on what the get method does, the whole loop may be removed during optimization.
If the order the elements are accessed is not important you can also:
for (int i = array.size()-1; 0 <= i ;) {
SipSession abc = (SipSession) array1.get(i--);
}
This would call array.size() only once instead of in each loop iteration.
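If the order does matter, a common alternative (not from the answer above) is to read size() once into a second loop variable and iterate forwards. A small sketch comparing both forms, with String elements standing in for the question's list and hypothetical method names; it assumes the list is not modified inside the loop, otherwise the cached size goes stale.

```java
import java.util.List;

class CachedSizeLoop {
    // Forward iteration with size() read once into n; keeps the original order.
    static int totalLength(List<String> items) {
        int total = 0;
        for (int i = 0, n = items.size(); i < n; i++) {
            total += items.get(i).length();
        }
        return total;
    }

    // Reverse iteration as in the answer: size() is read once, in the initializer.
    static int totalLengthReversed(List<String> items) {
        int total = 0;
        for (int i = items.size() - 1; 0 <= i; ) {
            total += items.get(i--).length();
        }
        return total;
    }
}
```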

This would be a micro-optimization, and it's better to spend effort on other kinds of optimization than to do this without proof that it is the bottleneck, which is not the case here.

Never try to optimize without profiling. The JIT compiler does the heavy lifting so you don't have to.
That aside, your array seems to be a raw List instead of a generic List<SipSession>. Generics won't necessarily optimize your code, but they make it much easier to understand and maintain. Your simple loop could be rewritten as:
List<SipSession> array;
for(SipSession abc : array){
// Stuff
}

Related

The same computation inside a loop on a constant

In an interface I have the following:
public static byte[] and0xFFArray(byte[] array) {
for (int i = 0; i < array.length; i++) {
array[i] = (byte) (array[i] & 0xFF);
}
return array;
}
In another class I am calling the following:
while(true){
...
if (isBeforeTerminator(htmlInput, ParserI.and0xFFArray("포토".getBytes("UTF-8")), '<')) {
...
}
...
}
My question is, will the resulting array from the String constant be computed once during compilation, or will it be computed every time the loop iterates?
Edit: I just noticed that the method doesn't make sense, but it doesn't affect the question.

I assume that you're referring to the result of
ParserI.and0xFFArray("포토".getBytes("UTF-8"))
Unless you explicitly cache/store the results somewhere, it'll be computed every time you call it.
You may want to consider something like:
byte[] parserI = ParserI.and0xFFArray("포토".getBytes("UTF-8"));
while (true) {
...
if (isBeforeTerminator(htmlInput, parserI, '<'))
...
To understand why compilers don't do this automatically, keep in mind that you can't write a general algorithm to detect whether a particular method will always return the same value; you'd quickly run into things like the Halting Problem, so anything you write to do that would be massively complicated and still wouldn't work a good percentage of the time. You'd also have to understand a fair amount about when a method will be called in order to work out a reasonable caching strategy. For example, is it worth keeping the cache after the loop ends? You'd have to understand a fair amount about the program structure to know for sure.
It is possible that an optimizer could recognize that the results of a method are constant under certain limited circumstances (and I'm not sure to what extent Java optimizers actually implement that), but you certainly can't count on it in the general case. javac will not do it: the bytecode it produces still contains the call on every iteration, so any such optimization would have to come from the JIT at runtime. I highly doubt that it's being as smart as you'd like here, for the reasons listed above. It's better to do the caching explicitly yourself, as shown above.
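A minimal sketch of the explicit caching suggested above, storing the array in a static final field so it is computed once when the class is initialized. The method body is copied from the question; the call counter, the scan method and the class name are hypothetical and exist only to make the single computation visible. As the question's edit notes, (byte) (b & 0xFF) gives back the same byte, so the method is effectively a no-op here.

```java
import java.nio.charset.StandardCharsets;

class HoistedConstant {
    // Declared before TERMINATOR so it is initialized first.
    static int computeCalls = 0;

    // Same body as the question's and0xFFArray, plus a call counter.
    static byte[] and0xFFArray(byte[] array) {
        computeCalls++;
        for (int i = 0; i < array.length; i++) {
            array[i] = (byte) (array[i] & 0xFF);
        }
        return array;
    }

    // The question's string, written with Unicode escapes, encoded once at class
    // initialization. It is a shared mutable array, so callers must not modify it.
    static final byte[] TERMINATOR = and0xFFArray("\uD3EC\uD1A0".getBytes(StandardCharsets.UTF_8));

    // Stand-in for the question's while loop: every iteration reuses the cached array.
    static int scan(int iterations) {
        int seen = 0;
        for (int i = 0; i < iterations; i++) {
            seen += TERMINATOR.length;
        }
        return seen;
    }
}
```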

Java, optimal calling of objects and methods

Let's say I have the following code:
private Rule getRuleFromResult(Fact result){
Rule output=null;
for (int i = 0; i < rules.size(); i++) {
if(rules.get(i).getRuleSize()==1){output=rules.get(i);return output;}
if(rules.get(i).getResultFact().getFactName().equals(result.getFactName())) output=rules.get(i);
}
return output;
}
Is it better to leave it as it is or to change it as follows:
private Rule getRuleFromResult(Fact result){
Rule output=null;
Rule current=null;
for (int i = 0; i < rules.size(); i++) {
current=rules.get(i);
if(current.getRuleSize()==1){return current;}
if(current.getResultFact().getFactName().equals(result.getFactName())) output=current;
}
return output;
}
When executing, the program evaluates rules.get(i) each time as if it were the first time, and I think that in a more complex example (say, the second if) this takes more time and slows execution. Am I right?
Edit: To answer a few comments at once: I know that in this particular example the time gain will be tiny, but it was just to get the general idea. I noticed I tend to write very long chains like object.get.set.change.compareTo..., and many of them repeat. Across the whole code base, that time gain could be significant.

Your instinct is correct: saving intermediate results in a variable rather than re-invoking a method multiple times is faster. Often the performance difference will be too small to measure, but there's an even better reason to do this: clarity. By saving the value into a variable, you make it clear that you are intending to use the same value everywhere; if you re-invoke the method multiple times, it's unclear if you are doing so because you are expecting it to return different results on different invocations. (For instance, list.size() will return a different result if you've added items to list in between calls.) Additionally, using an intermediate variable gives you an opportunity to name the value, which can make the intention of the code clearer.
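A sketch of the refactoring the question asks about, with each rule fetched once and given a name. Fact and Rule here are hypothetical minimal stand-ins for the question's classes (only the getters the method uses), and the loop uses the enhanced for, which assumes rules is a List<Rule>.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal stand-ins for the question's classes.
class Fact {
    private final String factName;
    Fact(String factName) { this.factName = factName; }
    String getFactName() { return factName; }
}

class Rule {
    private final int ruleSize;
    private final Fact resultFact;
    Rule(int ruleSize, Fact resultFact) {
        this.ruleSize = ruleSize;
        this.resultFact = resultFact;
    }
    int getRuleSize() { return ruleSize; }
    Fact getResultFact() { return resultFact; }
}

class RuleLookup {
    private final List<Rule> rules = new ArrayList<>();

    void add(Rule rule) { rules.add(rule); }

    // Each rule is fetched once and named; the wanted fact name is read once.
    Rule getRuleFromResult(Fact result) {
        String wanted = result.getFactName();
        Rule output = null;
        for (Rule current : rules) {
            if (current.getRuleSize() == 1) {
                return current;
            }
            if (current.getResultFact().getFactName().equals(wanted)) {
                output = current; // keeps the last match, like the original
            }
        }
        return output;
    }
}
```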

The only difference between the two versions is that the first may call rules.get(i) two or three times per iteration (once per check, plus once more to store or return the match), while the second calls it once.
So the second version is a little bit faster in general, but you will not notice any difference unless the list is big.

It depends on the type of data structure the rules object is. If it is a linked list (e.g. LinkedList, whose get(i) has to walk the list to reach element i), then yes, the second one is much faster, because it calls rules.get(i) once per iteration instead of several times. If it gives you element i immediately (like an array or an ArrayList), the difference is negligible.
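The standard library exposes this distinction through the java.util.RandomAccess marker interface: ArrayList implements it, LinkedList does not. A sketch (class and method names are hypothetical) that only indexes with get(i) when the list advertises fast random access and otherwise walks it once:

```java
import java.util.List;
import java.util.RandomAccess;

class IndexedAccess {
    // get(i) is constant time on an ArrayList but walks the nodes of a LinkedList,
    // so an index-based loop over a LinkedList costs O(n^2) in total.
    static long sum(List<Integer> values) {
        long total = 0;
        if (values instanceof RandomAccess) {
            // Fast indexed access (ArrayList, Arrays.asList, ...): size() read once.
            for (int i = 0, n = values.size(); i < n; i++) {
                total += values.get(i);
            }
        } else {
            // Sequential lists: a single pass with the iterator behind for-each.
            for (int v : values) {
                total += v;
            }
        }
        return total;
    }
}
```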

In general, yes, it's probably a tiny bit faster (nanoseconds, I guess), at least when first called. Later on it will probably be optimized by the JIT compiler either way.
But what you are doing is so-called premature optimization. Usually you should not worry about things that only provide an insignificant performance improvement.
What is more important is readability, so the code can be maintained later on.
You could go even further with premature optimization, like saving the length in a local variable, which the enhanced for loop does internally for arrays. But again, in 99% of cases it doesn't make sense to do it.

return in for loop or outside loop

Today, someone pointed out to me a bad use of the return keyword in Java. I had written a simple for loop to check whether something is in an array. Supposing array is an array of length n, this was my code:
for (int i = 0; i < array.length; ++i) {
if (array[i] == valueToFind) {
return true;
}
}
return false;
Now someone told me that this is not very good programming because I use the return statement inside a loop and this would cause garbage collection to malfunction. Therefore, better code would be:
int i = 0;
while (i < array.length && array[i] != valueToFind) {
++i;
}
return i != array.length;
The problem is that I can't come up with a proper explanation of why the first for loop isn't good practice. Can somebody give me an explanation?

Now someone told me that this is not very good programming because I use the return statement inside a loop and this would cause garbage collection to malfunction.
That's incorrect, and suggests you should treat other advice from that person with a degree of skepticism.
The mantra of "only have one return statement" (or more generally, only one exit point) is important in languages where you have to manage all resources yourself - that way you can make sure you put all your cleanup code in one place.
It's much less useful in Java: as soon as you know that you should return (and what the return value should be), just return. That way it's simpler to read - you don't have to take in any of the rest of the method to work out what else is going to happen (other than finally blocks).
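Both versions from the question, side by side, as a small self-contained sketch (the class and method names are hypothetical). They return the same result for every input, and neither affects garbage collection; the early return simply stops as soon as the answer is known.

```java
class Contains {
    // Early return: exit the method as soon as the value is found.
    static boolean containsEarlyReturn(int[] array, int valueToFind) {
        for (int i = 0; i < array.length; ++i) {
            if (array[i] == valueToFind) {
                return true;
            }
        }
        return false;
    }

    // Single exit point, as the questioner was told to write it.
    static boolean containsSingleExit(int[] array, int valueToFind) {
        int i = 0;
        while (i < array.length && array[i] != valueToFind) {
            ++i;
        }
        return i != array.length;
    }
}
```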

Now someone told me that this is not very good programming because I use the return statement inside a loop and this would cause garbage collection to malfunction.
That's a bunch of rubbish. Everything inside the method would be cleaned up unless there were other references to it in the class or elsewhere (a reason why encapsulation is important). As a rule of thumb, it's generally better to use one return statement simply because it is easier to figure out where the method will exit.
Personally, I would write:
boolean retVal = false;
for(int i=0; i<array.length; ++i){
if(array[i]==valueToFind) {
retVal = true;
break; //Break immediately helps if you are looking through a big array
}
}
return retVal;

There have been methodologies in all languages advocating a single return statement in any function. However impossible that may be in certain code, some people do strive for it. It may end up making your code more complex (as in more lines of code) but, on the other hand, somewhat easier to follow (as in logic flow).
This will not mess up garbage collection in any way.
If you want to follow his advice, the better way to do it is to set a boolean flag:
boolean flag = false;
for(int i=0; i<array.length; ++i){
if(array[i] == valueToFind) {
flag = true;
break;
}
}
return flag;

Some people argue that a method should have a single point of exit (e.g., only one return). Personally, I think that trying to stick to that rule produces code that's harder to read. In your example, as soon as you find what you were looking for, return it immediately, it's clear and it's efficient.
Quoting the C2 wiki:
The original significance of having a single entry and single exit for a function is that it was part of the original definition of StructuredProgramming as opposed to undisciplined goto SpaghettiCode, and allowed a clean mathematical analysis on that basis.
Now that structured programming has long since won the day, no one particularly cares about that anymore, and the rest of the page is largely about best practices and aesthetics and such, not about mathematical analysis of structured programming constructs.

The code is valid (i.e, will compile and execute) in both cases.
One of my lecturers at Uni told us that it is not desirable to have continue or return statements in any loop, for or while. The reason is that when examining the code, it is not immediately clear whether the full length of the loop will be executed or whether the return or continue will take effect.
See "Why is continue inside a loop a bad idea?" for an example.
The key point to keep in mind is that for simple scenarios like this it doesn't (IMO) matter but when you have complex logic determining the return value, the code is 'generally' more readable if you have a single return statement instead of several.
With regards to the Garbage Collection - I have no idea why this would be an issue.

Since there is no issue with GC, I prefer this:
for(int i=0; i<array.length; ++i){
if(array[i] == valueToFind)
return true;
}
return false;

Critical loop containing many “if” whose output is constant : How to save on condition tests ? For Java ;)

I just read this thread Critical loop containing many "if" whose output is constant : How to save on condition tests?
and this one Constant embedded for loop condition optimization in C++ with gcc which are exactly what I would like to do in Java.
I have some if conditions that are evaluated many times; the conditions are composed of attributes defined at initialization that won't change.
Will javac optimize the bytecode by removing the unused branches of the conditions, avoiding the time spent testing them?
Do I have to define the attributes as final, or is it useless?
Thanks for your help,
Aurélien

Java compile time optimization is pretty lacking. If you can use a switch statement it can probably do some trivial optimizations. If the number of attributes is very large then a HashMap is going to be your best bet.
I'll close by saying that this sort of thing is very very rarely a bottleneck and trying to prematurely optimize it is counterproductive. If your code is, in fact, called a lot then the JIT optimizer will do its best to make your code run faster. Just say what you want to happen and only worry about the "how" when you find that's actually worth the time to optimize it.
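On the final part of the question: javac only removes a branch when the condition is a compile-time constant expression, such as a static final boolean initialized with a literal (the conditional-compilation idiom the Java Language Specification discusses in its section on unreachable statements). Attributes set at initialization from runtime data are not compile-time constants, so javac keeps those tests and any removal is left to the JIT. A sketch with hypothetical names (TRACE, FAST_PATH, slowPath and the fastPath system property); running javap -c on the compiled class shows whether the traced branch is present.

```java
class ConstantBranches {
    // Compile-time constant: static final primitive initialized with a literal.
    // javac omits the body of if (TRACE) { ... } from the bytecode when TRACE is false.
    static final boolean TRACE = false;

    // Fixed at class initialization but computed at runtime, so NOT a compile-time
    // constant: javac keeps this test, and only the JIT can fold it.
    static final boolean FAST_PATH =
            Boolean.parseBoolean(System.getProperty("fastPath", "true"));

    static int process(int[] data) {
        int acc = 0;
        for (int x : data) {
            if (TRACE) {
                System.out.println("x = " + x); // not emitted by javac
            }
            acc += FAST_PATH ? x : slowPath(x);
        }
        return acc;
    }

    // Hypothetical slower alternative that gives the same result.
    static int slowPath(int x) {
        return Math.addExact(x, 0);
    }
}
```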

In OO languages, the solution is to use delegation or the command pattern instead of if/else forests.
So your attributes need to implement a common interface like IAttribute which has a method run() (or make all attributes implement Runnable).
Now you can simply call the method without any decisions in the loop:
for(....) {
attr.run();
}
It's a bit more complex if you can't add methods to your attributes. My solution in this case is using enums and an EnumMap which contains the runnables. Access to an EnumMap is almost like an array access (i.e. O(1)).
for(....) {
map.get(attr).run();
}
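A runnable sketch of the EnumMap variant. The Mode enum, its values, the counters and runLoop are hypothetical stand-ins for the question's attributes and actions; EnumMap and Runnable are the standard java.util and java.lang types.

```java
import java.util.EnumMap;
import java.util.Map;

class EnumDispatch {
    // Hypothetical attribute values, fixed at initialization in the question's setting.
    enum Mode { COMPRESS, ENCRYPT, LOG }

    static int compressed = 0;
    static int encrypted = 0;
    static int logged = 0;

    // EnumMap is backed by an array indexed by the enum constant's ordinal.
    static final Map<Mode, Runnable> ACTIONS = new EnumMap<>(Mode.class);

    static {
        ACTIONS.put(Mode.COMPRESS, () -> compressed++);
        ACTIONS.put(Mode.ENCRYPT, () -> encrypted++);
        ACTIONS.put(Mode.LOG, () -> logged++);
    }

    static void runLoop(Mode attr, int iterations) {
        for (int i = 0; i < iterations; i++) {
            ACTIONS.get(attr).run(); // one map lookup and call, no if/else chain
        }
    }
}
```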

I don't know about Java specifics regarding this, but you might want to look into a technique called Memoization which would allow you to look up results for a function in a table instead of calling the function. Effectively, memoization makes your program "remember" results of a function for a given input.
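In Java this is commonly done with a HashMap and Map.computeIfAbsent. A minimal, single-threaded sketch; slowSquare is a hypothetical stand-in for an expensive function, and the technique only applies when the result depends solely on the arguments. For concurrent use, a ConcurrentHashMap would be the usual substitute.

```java
import java.util.HashMap;
import java.util.Map;

class Memo {
    static int evaluations = 0; // how many times the real computation ran
    private static final Map<Integer, Long> CACHE = new HashMap<>();

    // Hypothetical expensive, pure function: the result depends only on n.
    private static long slowSquare(int n) {
        evaluations++;
        return (long) n * n;
    }

    // Returns the cached result if present, otherwise computes and stores it.
    static long square(int n) {
        return CACHE.computeIfAbsent(n, Memo::slowSquare);
    }
}
```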

Try replacing the if with runtime polymorphism. No, that's not as strange as you think.
If, for example you have this:
for (int i=0; i < BIG_NUMBER; i++) {
if (calculateSomeCondition()) {
frobnicate(someValue);
} else {
defrobnicate(someValue);
}
}
then replace it with this (Function taken from Guava, but can be replaced with any other fitting interface):
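Function<X, Void> f; // Guava's Function<F, T> has an input and an output type; Void because only the side effect matters
if (calculateSomeCondition()) {
f = new Frobnicator();
} else {
f = new Defrobnicator();
}
for (int i = 0; i < BIG_NUMBER; i++) {
f.apply(someValue);
}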
Method calls are pretty highly optimized on most modern JVMs even (or especially) if there are only a few possible call targets.
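The same idea without the Guava dependency, swapping in java.util.function.IntConsumer for Guava's Function since only the side effect matters. frobnicate, defrobnicate, run and the counters are hypothetical; the point is that the condition is evaluated once and the loop body is a single interface call.

```java
import java.util.function.IntConsumer;

class HoistedBranch {
    static int frobbed = 0;
    static int defrobbed = 0;

    // Hypothetical actions standing in for frobnicate/defrobnicate.
    static void frobnicate(int v) { frobbed += v; }
    static void defrobnicate(int v) { defrobbed += v; }

    static void run(boolean condition, int someValue, int bigNumber) {
        // The condition is evaluated once, before the loop...
        IntConsumer f = condition ? HoistedBranch::frobnicate : HoistedBranch::defrobnicate;
        // ...so the loop body is a single interface call with no branch.
        for (int i = 0; i < bigNumber; i++) {
            f.accept(someValue);
        }
    }
}
```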

Declare an object inside or outside a loop?

Is there any performance penalty for the following code snippet?
for (int i=0; i<someValue; i++)
{
Object o = someList.get(i);
o.doSomething();
}
Or does this code actually make more sense?
Object o;
for (int i=0; i<someValue; i++)
{
o = someList.get(i);
o.doSomething();
}
If in byte code these two are totally equivalent then obviously the first method looks better in terms of style, but I want to make sure this is the case.

In today's compilers, no. I declare objects in the smallest scope I can, because it's a lot more readable for the next guy.

To quote Knuth, who may be quoting Hoare:
Premature optimization is the root of all evil.
Whether the compiler will produce marginally faster code by defining the variable outside the loop is debatable, and I imagine it won't. I would guess it'll produce identical bytecode.
Compare this with the number of errors you'll likely prevent by correctly scoping your variable with an in-loop declaration...

There's no performance penalty for declaring the Object o within the loop.
The compiler generates very similar bytecode and makes the correct optimizations.
See the article Myth - Defining loop variables inside the loop is bad for performance for a similar example.

You can disassemble the code with javap -c and check what the compiler actually emits. On my setup (java 1.5/mac compiled with eclipse), the bytecode for the loop is identical.

The first code is better as it restricts scope of o variable to the for block. From a performance perspective, it might not have any effects in Java, but it might have in lower level compilers. They might put the variable in a register if you do the first.
In fact, some people might think that if the compiler is dumb, the second snippet is better in terms of performance. This is what an instructor told me in college, and I laughed at him for this suggestion! Basically, compilers allocate memory on the stack for the local variables of a method just once at the start of the method (by adjusting the stack pointer) and release it at the end of the method (again by adjusting the stack pointer, assuming it's not C++ or there are no destructors to call). So all stack-based local variables in a method are allocated at once, no matter where they are declared and how much memory they require. Actually, if the compiler is dumb, there is no difference in terms of performance, but if it's smart enough, the first code can actually be better, as it helps the compiler understand the scope and lifetime of the variable! By the way, if it's really smart, there should be absolutely no difference in performance, as it infers the actual scope.
Construction of a object using new is totally different from just declaring it, of course.
I think readability is more important than performance, and from a readability standpoint, the first code is definitely better.

I've got to admit I don't know java. But are these two equivalent? Are the object lifetimes the same? In the first example, I assume (not knowing java) that o will be eligible for garbage collection immediately the loop terminates.
But in the second example surely o won't be eligible for garbage collection until the outer scope (not shown) is exited?

Don't prematurely optimize. Better than either of these is:
for(Object o : someList) {
o.doSomething();
}
because it eliminates boilerplate and clarifies intent.
Unless you are working on embedded systems, in which case all bets are off. Otherwise, don't try to outsmart the JVM.

I've always thought that most compilers these days are smart enough to do the latter option. Assuming that's the case, I would say the first one does look nicer as well. If the loop gets very large, there's no need to look all around for where o is declared.

These have different semantics. Which is more meaningful?
Reusing an object for "performance reasons" is often wrong.
The question is what does the object "mean"? Why are you creating it? What does it represent? Objects must parallel real-world things. Things are created, undergo state changes, and report their states for reasons.
What are those reasons? How does your object model reflect those reasons?

To get at the heart of this question... [Note that non-JVM implementations may do things differently if allowed by the JLS...]
First, keep in mind that the local variable "o" in the example is a pointer, not an actual object.
All local variables are allocated on the runtime stack in 4-byte slots. doubles and longs require two slots; other primitives and pointers take one. (Even booleans take a full slot)
A fixed runtime-stack size must be created for each method invocation. This size is determined by the maximum local variable "slots" needed at any given spot in the method.
In the above example, both versions of the code require the same maximum number of local variables for the method.
In both cases, the same bytecode will be generated, updating the same slot in the runtime stack.
In other words, no performance penalty at all.
HOWEVER, depending on the rest of the code in the method, the "declaration outside the loop" version might actually require a larger runtime stack allocation. For example, compare
for (...) { Object o = ... }
for (...) { Object o = ... }
with
Object o;
for (...) { /* loop 1 */ }
for (...) { Object x =...; }
In the first example, both loops require the same runtime stack allocation.
In the second example, because "o" lives past the loop, "x" requires an additional runtime stack slot.
Hope this helps,
-- Scott

In both cases the type info for the object o is determined at compile time. In the second instance, o is seen as being global to the for loop, and in the first instance, the clever Java compiler knows that o will have to be available for as long as the loop lasts and hence will optimise the code in such a way that there won't be any respecification of o's type in each iteration.
Hence, in both cases, specification of o's type will be done once which means the only performance difference would be in the scope of o. Obviously, a narrower scope always enhances performance, therefore to answer your question: no, there is no performance penalty for the first code snip; actually, this code snip is more optimised than the second.
In the second snip, o is being given unnecessary scope which, besides being a performance issue, can be also a security issue.

The first makes far more sense. It keeps the variable in the scope it is used in, and prevents values assigned in one iteration from being used in a later iteration, which is more defensive.
The second is sometimes said to be more efficient, but any reasonable compiler should be able to optimise the first to be exactly the same.
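One concrete place where the in-loop declaration matters in Java is lambda capture: a lambda can only capture a local variable that is effectively final, and a variable declared inside the loop body is a fresh one on each iteration. A small sketch with hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class CaptureScope {
    static List<Supplier<String>> greeters(List<String> names) {
        List<Supplier<String>> out = new ArrayList<>();
        // Declaring `String name;` here and assigning it in the loop would not compile
        // below: a lambda may only capture effectively final locals.
        for (int i = 0; i < names.size(); i++) {
            String name = names.get(i); // a new, effectively final variable per iteration
            out.add(() -> "hello " + name); // each lambda keeps its own value
        }
        return out;
    }
}
```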

As someone who maintains more code than I write:
Version 1 is much preferred; keeping scope as local as possible is essential for understanding. It's also easier to refactor this sort of code.
As discussed above - I doubt this would make any difference in efficiency. In fact I would argue that if the scope is more local a compiler may be able to do more with it!

When using multiple threads (if you're running 50+), I found this to be a very effective way of handling ghost thread problems:
Object one;
Object two;
Object three;
Object four;
Object five;
try{
for (int i=0; i<someValue; i++)
{
Object o = someList.get(i);
o.doSomething();
}
}catch(Exception e){
e.printStackTrace();
}
finally{
one = null;
two = null;
three = null;
four = null;
five = null;
System.gc();
}

The answer depends partly on what the constructor does and what happens with the object after the loop, since that determines to a large extent how the code is optimized.
If the object is large or complex, absolutely declare it outside the loop. Otherwise, the people telling you not to prematurely optimize are right.

I actually have code in front of me that looks like this:
for (int i = offset; i < offset + length; i++) {
char append = (char) (data[i] & 0xFF);
buffer.append(append);
}
...
for (int i = offset; i < offset + length; i++) {
char append = (char) (data[i] & 0xFF);
buffer.append(append);
}
...
for (int i = offset; i < offset + length; i++) {
char append = (char) (data[i] & 0xFF);
buffer.append(append);
}
So, relying on compiler abilities, I can assume there would be only one stack allocation for i and one for append. Then everything would be fine except the duplicated code.
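The duplication itself can be removed by moving the loop into a helper method, which also keeps i and append local to it. A sketch assuming buffer is a StringBuilder (the original doesn't show its type) and using a hypothetical method name:

```java
class Latin1Append {
    // Replaces the three copy-pasted loops; i and append are locals of this method.
    static void appendLatin1(StringBuilder buffer, byte[] data, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            char append = (char) (data[i] & 0xFF); // byte as unsigned 0..255, i.e. ISO-8859-1
            buffer.append(append);
        }
    }
}
```

Each of the three call sites then becomes appendLatin1(buffer, data, offset, length).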
As a side note, Java applications are known to be slow. I have never profiled Java code, but I guess the performance hit comes mostly from memory allocation management.
