I had an argument with my friend regarding this.
Consider the snippet below:
for (int i = 0; i < someList.size(); i++) {
//some logic
}
Here someList.size() is evaluated on every iteration, so it is often recommended to compute the size once, before the loop.
Now, what happens when I use an enhanced for loop like this:
for(SpecialBean bean: someBean.getSpecialList()) {
//some logic
}
Is it necessary to move someBean.getSpecialList() to outside the loop?
How many times will someBean.getSpecialList() execute if I were to retain the 2nd snippet as it is?
Repeated calls to list.size() won't result in any performance penalty. The JIT compiler will most probably inline it and even if it doesn't, it will still be quite inexpensive because it just involves reading the value of a field.
A much more serious problem with your first example is that the loop body will have to call list.get(i), and for a LinkedList, accessing the i-th element has O(i) cost with a significant constant factor due to pointer chasing, which translates to data-dependent loads at the CPU level. The CPU's prefetcher can't optimize this access pattern.
This means that the overall computational complexity will be O(n^2) when applied to a LinkedList.
Your second example compiles to iteration via Iterator and will evaluate someBean.getSpecialList().iterator() only once. The cost of iterator.next() is constant in all cases.
From Item 46 in Effective Java by Joshua Bloch :
The for-each loop, introduced in release 1.5, gets rid of the clutter
and the opportunity for error by hiding the iterator or index
variable completely. The resulting idiom applies equally to
collections and arrays:
// The preferred idiom for iterating over collections and arrays
for (Element e : elements) {
    doSomething(e);
}
When you see the colon (:), read it as “in.” Thus, the loop above reads as “for each element e in elements.” Note
that there is no performance penalty for using the for-each loop, even
for arrays. In fact, it may offer a slight performance advantage over
an ordinary for loop in some circumstances, as it computes the limit
of the array index only once. While you can do this by hand (Item 45),
programmers don’t always do so.
See also: Is there a performance difference between a for loop and a for-each loop?
An alternative to the first snippet would be:
for (int i = 0, l = someList.size(); i < l; i++) {
//some logic
}
With regard to the for-each loop, the call to getSpecialList() will be made only once (you could verify this by adding some debugging/logging inside the method).
As the enhanced for loop uses an Iterator taken from the Iterable, it wouldn't be possible or sensible to execute someBean.getSpecialList() more than once. Moving it outside the loop will not change the performance of the loop, but you could do it if it improves readability.
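The logging suggestion above can be tried with a small sketch like the following. SomeBean here is a hypothetical stand-in for the bean in the question, and the getterCalls counter is added purely for illustration; the real class may look quite different.

```java
import java.util.Arrays;
import java.util.List;

class ForEachGetterDemo {
    // Runs an enhanced for loop over the getter and reports how often the getter ran.
    static int getterCallsDuringForEach() {
        SomeBean someBean = new SomeBean();
        for (String s : someBean.getSpecialList()) {
            // some logic; the list has three elements, so this body runs three times
        }
        return someBean.getterCalls;
    }

    public static void main(String[] args) {
        // prints "getSpecialList() calls: 1"
        System.out.println("getSpecialList() calls: " + getterCallsDuringForEach());
    }
}

// Hypothetical stand-in for the question's someBean (assumption, not the asker's class).
class SomeBean {
    private final List<String> specialList = Arrays.asList("a", "b", "c");
    int getterCalls = 0;

    List<String> getSpecialList() {
        getterCalls++; // record every evaluation of the getter
        return specialList;
    }
}
```

The loop body runs three times, yet the counter ends at 1, which is consistent with the getter being evaluated once in the loop header.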
Note: iterating by index can be faster for random-access collections such as ArrayList, because it doesn't create an Iterator, but it is slower for lists that don't support efficient random access, such as LinkedList.
The for-each variant is equivalent to the following:
for (Iterator i = c.iterator(); i.hasNext(); ) {
doSomething((Element) i.next());
}
From Item 46: Prefer for-each loops to traditional for loops, in Effective Java:
for-each loop provides compelling advantages over the traditional for loop in clarity and bug prevention, with no performance penalty. You should use it wherever you can.
So my first guess was wrong: there is no penalty for calling a method in the header of a for-each loop.
Related
I am doing some performance optimization for my Java application and I am confused about using a temporary variable to remove the method invocation from the loop termination condition. Here is my situation:
Vector myVector = new Vector();
// some code
for (int i=0;i<myVector.size();i++){
//some code here;
}
I want to use
int tmp = myVector.size();
for(int i=0;i<tmp;i++){
//some code here
}
What would be the negative impact of using the second scenario? My application is pretty large and I am not sure when and where myVector is being updated.
This change will not have any noticeable impact on performance, neither positive nor negative. So you should not make it unless there is a compelling reason to do so.
Regarding your question
What could be the negative impact of using the second scenario?
you should be aware that the two implementations may behave differently in a multi-threaded environment. In the first case, changes to the vector made by any other thread will be taken into account and may affect how many times the loop runs. In the second case, the number of iterations is computed once and will not change later (even if the size of the vector changes). However, changing the contents of a vector while iterating over it with either loop is dangerous and should be avoided if possible.
BTW: The benchmark that was linked in the comment from @geoand is as flawed as a microbenchmark can be. It does not tell you anything.
Well, it is stated in Performance Tips that:
So, you should use the enhanced for loop by default, but consider a
hand-written counted loop for performance-critical ArrayList
iteration.
But looking at the iosched 2013 application, which is considered an example for most developers, in ScheduleUpdaterService.java in particular, I can see the following:
void processPendingScheduleUpdates() {
    try {
        // Operate on a local copy of the schedule update list so as not to block
        // the main thread adding to this list
        List<Intent> scheduleUpdates = new ArrayList<Intent>();
        synchronized (mScheduleUpdates) {
            scheduleUpdates.addAll(mScheduleUpdates);
            mScheduleUpdates.clear();
        }
        SyncHelper syncHelper = new SyncHelper(this);
        for (Intent updateIntent : scheduleUpdates) {
            String sessionId = updateIntent.getStringExtra(EXTRA_SESSION_ID);
            boolean inSchedule = updateIntent.getBooleanExtra(EXTRA_IN_SCHEDULE, false);
            LOGI(TAG, "addOrRemoveSessionFromSchedule:"
                    + " sessionId=" + sessionId
                    + " inSchedule=" + inSchedule);
            syncHelper.addOrRemoveSessionFromSchedule(this, sessionId, inSchedule);
        }
    } catch (IOException e) {
        // TODO: do something useful here, like revert the changes locally in the
        // content provider to maintain client/server sync
        LOGE(TAG, "Error processing schedule update", e);
    }
}
Please notice that scheduleUpdates is iterated with an enhanced for loop, although it is suggested to avoid this kind of iteration for ArrayList.
Is that because this part of the application is not considered performance-critical, or am I misunderstanding something? Thanks a lot.
Yes. Readability and maintainability are much more important than performance in 99.999% of cases. The performance tip says: "you should use the enhanced for loop by default".
So, unless you have a performance problem, and you've proven that transforming the foreach loop to a counted loop solves this performance problem, or at least significantly improves the situation, you should favor readability and maintainability, and thus the foreach loop.
You need to look at the context of the code. If you had tuned all object creation out of the loop and this was used heavily, switching to indexed loop can make a difference.
In your case, you clearly have a much more expensive operation creating a log string. Optimising this would make much, much more difference. (possibly 10 - 1000x more)
Yes, there is a small performance loss via the iterator. A regular for loop that uses this iterator will also encounter such losses, while iterating manually by index would be slightly faster. However, the difference is extremely small (on the order of nanoseconds, as seen here) and is negligible.
Optimizations should be made elsewhere, such as limiting object creation and destruction, and/or other expensive operations.
I agree that the foreach loop reduces typing and is good for readability.
A little background: I work on low-latency application development and receive 1 million packets per second to process. I iterate through those packets and send the information on to their listeners. I was using a foreach loop to iterate through the set of listeners.
While profiling, I found that a lot of Iterator objects were being created to execute the foreach loop. After converting the foreach loop to an index-based for loop, I observed a huge drop in the number of objects created, which reduced the number of GCs and increased application throughput.
Edit: (Sorry for the confusion; making this question clearer.)
For example, I have a list of listeners (fixed size) and I loop through it a million times a second. Is foreach overkill in Java?
Example:
for(String s:listOfListeners)
{
// logic
}
compared to
for (int i=0;i<listOfListeners.size();i++)
{
// logic
}
Profiled screenshot for the code
for (int cnt = 0; cnt < 1_000_000; cnt++)
{
for (String string : list_of_listeners)
{
//No Code here
}
}
EDIT: Answering the vastly different question of:
For example, I have a list of listeners (fixed size) and I loop through it a million times a second. Is foreach overkill in Java?
That depends - does your profiling actually show that the extra allocations are significant? The Java allocator and garbage collector can do a lot of work per second.
To put it another way, your steps should be:
1. Set performance goals alongside your functional requirements
2. Write the simplest code you can to achieve your functional requirements
3. Measure whether that code meets the performance goals
4. If it doesn't:
   - Profile to work out where to optimize
   - Make a change
   - Run the tests again to see whether they make a significant difference in your meaningful metrics (number of objects allocated probably isn't a meaningful metric; number of listeners you can handle probably is)
   - Go back to step 3.
Maybe in your case, the enhanced for loop is significant. I wouldn't assume that it is though - nor would I assume that the creation of a million objects per second is significant. I would measure the meaningful metrics before and after... and make sure you have concrete performance goals before you do anything else, as otherwise you won't know when to stop micro-optimizing.
Size of list is around a million objects streaming in.
So you're creating one iterator object, but you're executing your loop body a million times.
While profiling, I found that a lot of Iterator objects were being created to execute the foreach loop.
No. Only a single iterator object should be created per execution of the loop. As per the JLS:
The enhanced for statement is equivalent to a basic for statement of the form:
for (I #i = Expression.iterator(); #i.hasNext(); ) {
    VariableModifiers_opt TargetType Identifier =
        (TargetType) #i.next();
    Statement
}
As you can see, that calls the iterator() method once, and then calls hasNext() and next() on it on each iteration.
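To make that translation concrete, here is a minimal sketch that counts the calls made by an enhanced for loop and by the hand-written Iterator loop it corresponds to. CountingRange is a hypothetical Iterable invented for this illustration; it is not part of the JDK or of the question's code.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

class DesugaredForEachDemo {
    // Returns {iterator() calls, hasNext() calls, next() calls} for an enhanced for loop.
    static int[] countsForEnhancedFor(int n) {
        CountingRange range = new CountingRange(n);
        for (Integer value : range) {
            // body intentionally empty; only the calls made by the loop header matter
        }
        return new int[] {range.iteratorCalls, range.hasNextCalls, range.nextCalls};
    }

    // Same counts for the explicit loop that mirrors the JLS translation.
    static int[] countsForExplicitIterator(int n) {
        CountingRange range = new CountingRange(n);
        for (Iterator<Integer> it = range.iterator(); it.hasNext(); ) {
            Integer value = it.next();
        }
        return new int[] {range.iteratorCalls, range.hasNextCalls, range.nextCalls};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(countsForEnhancedFor(3)));      // prints [1, 4, 3]
        System.out.println(Arrays.toString(countsForExplicitIterator(3))); // prints [1, 4, 3]
    }
}

// Hypothetical Iterable over 0..n-1 that records how it is used (illustration only).
class CountingRange implements Iterable<Integer> {
    private final int n;
    int iteratorCalls, hasNextCalls, nextCalls;

    CountingRange(int n) {
        this.n = n;
    }

    @Override
    public Iterator<Integer> iterator() {
        iteratorCalls++;
        return new Iterator<Integer>() {
            private int cursor = 0;

            @Override
            public boolean hasNext() {
                hasNextCalls++;
                return cursor < n;
            }

            @Override
            public Integer next() {
                nextCalls++;
                if (cursor >= n) {
                    throw new NoSuchElementException();
                }
                return cursor++;
            }
        };
    }
}
```

For three elements both loops report one iterator() call, four hasNext() calls (the last one returns false) and three next() calls.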
Do you think that extra object allocation will actually hurt your performance significantly?
How much do you value readability over performance? I take the approach of using the enhanced for loop wherever it helps readability, until it proves to be a performance problem - and my personal experience is that it's never hurt performance significantly in anything I've written. That's not to say that would be true for all applications, but the default position should be to only use the less readable code after proving it will improve things significantly.
The "foreach" loop creates just one Iterator object, while the second loop creates none. If you are executing many, many separate loops that execute just a few times each, then yes, "foreach" may be unnecessarily expensive. Otherwise, this is micro-optimizing.
EDIT: The question has changed so much since I wrote my answer that I'm not sure what I'm answering at the moment.
Looking up stuff with list.get(i) can actually be a lot slower if it's a linked list, since for each lookup, it has to traverse the list, while the iterator remembers the position.
Example:
list.get(0) will get the first element
list.get(1) will first get the first element to find pointer to the next
list.get(2) will first get the first element, then go to the second and then to the third
etc.
So to do a full loop, you're actually looping over elements in this manner:
0
0->1
0->1->2
0->1->2->3
etc.
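The traversal pattern above can be counted with a rough sketch. HopCountingList is a hypothetical, simplified singly linked list written for this illustration; java.util.LinkedList is doubly linked and starts from whichever end is nearer, so its real cost is lower by a constant factor, but the growth is still quadratic.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class LinkedTraversalDemo {
    // Pointer hops needed to visit every element with get(i).
    static long hopsForIndexedLoop(int n) {
        HopCountingList list = HopCountingList.ofSize(n);
        for (int i = 0; i < n; i++) {
            list.get(i);
        }
        return list.hops;
    }

    // Pointer hops needed to visit every element with an iterator.
    static long hopsForIteratorLoop(int n) {
        HopCountingList list = HopCountingList.ofSize(n);
        for (int value : list) {
            // body intentionally empty
        }
        return list.hops;
    }

    public static void main(String[] args) {
        // prints "indexed: 499500 hops, iterator: 1000 hops"
        System.out.println("indexed: " + hopsForIndexedLoop(1000)
                + " hops, iterator: " + hopsForIteratorLoop(1000) + " hops");
    }
}

// Hypothetical singly linked list that counts every next-pointer it follows.
class HopCountingList implements Iterable<Integer> {
    private static final class Node {
        final int value;
        Node next;

        Node(int value) {
            this.value = value;
        }
    }

    private Node head;
    long hops;

    // Builds the list 0..n-1 by prepending, so construction itself costs no hops.
    static HopCountingList ofSize(int n) {
        HopCountingList list = new HopCountingList();
        for (int v = n - 1; v >= 0; v--) {
            Node node = new Node(v);
            node.next = list.head;
            list.head = node;
        }
        return list;
    }

    // Walks from the head every time; no bounds checking in this sketch.
    int get(int index) {
        Node node = head;
        for (int i = 0; i < index; i++) {
            node = node.next;
            hops++;
        }
        return node.value;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private Node cursor = head; // the iterator remembers its position

            @Override
            public boolean hasNext() {
                return cursor != null;
            }

            @Override
            public Integer next() {
                if (cursor == null) {
                    throw new NoSuchElementException();
                }
                int value = cursor.value;
                cursor = cursor.next; // exactly one hop per element
                hops++;
                return value;
            }
        };
    }
}
```

The indexed loop needs 0 + 1 + ... + (n - 1) = n(n - 1)/2 hops, while the iterator needs n.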
I do not think you should worry about the effectiveness here.
Most of time is consumed by your actual application logic (in this case - by what you do inside the loop).
So, I would not worry about the price you pay for convenience here.
Is this comparison even fair? You are comparing using an Iterator vs. using get(index).
Furthermore, each loop would only create one additional Iterator. Unless the Iterator is itself inefficient for some reason, you should see comparable performance.
I just read the thread 'Critical loop containing many "if" whose output is constant: How to save on condition tests?'
and the thread 'Constant embedded for loop condition optimization in C++ with gcc', which cover exactly what I would like to do in Java.
I have some if conditions that are evaluated many times; the conditions are composed of attributes that are defined at initialization and never change.
Will javac optimize the bytecode by removing the unused branches of the conditions, so that no time is spent testing them?
Do I have to define the attributes as final, or is that useless?
Thanks for your help,
Aurélien
Java compile time optimization is pretty lacking. If you can use a switch statement it can probably do some trivial optimizations. If the number of attributes is very large then a HashMap is going to be your best bet.
I'll close by saying that this sort of thing is very very rarely a bottleneck and trying to prematurely optimize it is counterproductive. If your code is, in fact, called a lot then the JIT optimizer will do its best to make your code run faster. Just say what you want to happen and only worry about the "how" when you find that's actually worth the time to optimize it.
In OO languages, the solution is to use delegation or the command pattern instead of if/else forests.
So your attributes need to implement a common interface like IAttribute which has a method run() (or make all attributes implement Runnable).
Now you can simply call the method without any decisions in the loop:
for(....) {
attr.run();
}
It's a bit more complex if you can't add methods to your attributes. My solution in this case is using enums and an EnumMap which contains the runnables. Access to an EnumMap is almost like an array access (i.e. O(1)).
for(....) {
map.get(attr).run();
}
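A minimal sketch of the EnumMap variant might look like this. The Attr enum, its constants and the actions attached to them are all hypothetical, chosen only to make the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class EnumMapDispatchDemo {
    // Hypothetical attribute kinds; real attributes would come from the application.
    enum Attr { UPPER, LOWER, SKIP }

    static List<String> process(List<Attr> attrs, String input) {
        List<String> out = new ArrayList<>();

        // The decisions are made once, while the map is built.
        Map<Attr, Runnable> actions = new EnumMap<>(Attr.class);
        actions.put(Attr.UPPER, () -> out.add(input.toUpperCase(Locale.ROOT)));
        actions.put(Attr.LOWER, () -> out.add(input.toLowerCase(Locale.ROOT)));
        actions.put(Attr.SKIP, () -> { });

        for (Attr attr : attrs) {
            actions.get(attr).run(); // no if/else inside the loop
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [JAVA, java]
        System.out.println(process(Arrays.asList(Attr.UPPER, Attr.SKIP, Attr.LOWER), "Java"));
    }
}
```

Whether this is actually faster than a short if/else chain depends on the JIT and should be measured; the main gain is usually that the branching logic lives in one place.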
I don't know about Java specifics regarding this, but you might want to look into a technique called Memoization which would allow you to look up results for a function in a table instead of calling the function. Effectively, memoization makes your program "remember" results of a function for a given input.
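In Java, one common way to sketch memoization is a HashMap in front of the function. The memoize helper and the squaring function below are hypothetical names used for illustration, and this version is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class MemoizationDemo {
    // Wraps fn so that each distinct input is computed at most once (not thread-safe).
    static <K, V> Function<K, V> memoize(Function<K, V> fn) {
        Map<K, V> cache = new HashMap<>();
        return key -> cache.computeIfAbsent(key, fn);
    }

    // Applies a memoized function to every input and reports how many real computations ran.
    static int computationsFor(int[] inputs) {
        int[] computations = {0};
        Function<Integer, Long> square = memoize(x -> {
            computations[0]++; // stands in for an expensive calculation
            return x.longValue() * x;
        });
        for (int input : inputs) {
            square.apply(input);
        }
        return computations[0];
    }

    public static void main(String[] args) {
        // Five lookups, but only two distinct inputs: prints 2
        System.out.println(computationsFor(new int[] {3, 3, 4, 3, 4}));
    }
}
```

This only pays off when the function is expensive compared with a hash lookup and the same inputs recur.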
Try replacing the if with runtime polymorphism. No, that's not as strange as you think.
If, for example you have this:
for (int i=0; i < BIG_NUMBER; i++) {
if (calculateSomeCondition()) {
frobnicate(someValue);
} else {
defrobnicate(someValue);
}
}
then replace it with this (Function taken from Guava, but can be replaced with any other fitting interface):
Function<X, Y> f;
if (calculateSomeCondition()) {
    f = new Frobnicator();
} else {
    f = new Defrobnicator();
}
for (int i = 0; i < BIG_NUMBER; i++) {
    f.apply(someValue);
}
Method calls are pretty highly optimized on most modern JVMs even (or especially) if there are only a few possible call targets.
I'm just curious: is there a difference in speed and performance between these two loop implementations? Assume that the size() method returns the length of the array, collection, or object that handles a group of elements (it's actually from the XOM API).
Implementation 1:
int size = someArray.size();
for (int i = 0; i < size; i++) {
// do stuff here
}
Implementation 2:
for (int i = 0; i < someArray.size(); i++) {
// do stuff here
}
From a performance point of view, there is little difference. This is because a loop can be optimized so that the size() lookup is inlined, resulting in very little performance difference.
The main difference is if the size changes while looping. The first case will try to iterate a fixed number of times. In the second case, the number of iterations will depend on the final size().
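A small single-threaded sketch of that difference, using an ArrayList that shrinks while it is being looped over. The method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SizeChangeDemo {
    // Re-reads size() on every iteration, so the loop adapts as elements are removed.
    static List<Integer> removeEvensLiveSize(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i) % 2 == 0) {
                list.remove(i); // remove(int index)
                i--;            // stay on the same index after the shift
            }
        }
        return list;
    }

    // Caches size() up front; reports whether the loop ran past the end of the list.
    static boolean cachedSizeThrows(List<Integer> input) {
        List<Integer> list = new ArrayList<>(input);
        int size = list.size();
        try {
            for (int i = 0; i < size; i++) {
                if (list.get(i) % 2 == 0) {
                    list.remove(i);
                    i--;
                }
            }
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true; // the cached size no longer matches the list
        }
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5, 6);
        System.out.println(removeEvensLiveSize(input)); // prints [1, 3, 5]
        System.out.println(cachedSizeThrows(input));    // prints true
    }
}
```

So the two forms are only interchangeable when the size cannot change during the loop.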
The 1st snippet is bound to execute faster, since it calls size() only once. The 2nd snippet calls size() N times. Depending on the implementation, that might impose a significant penalty, especially if the compiler finds it hard to inline the method or the size() method does more than return a non-volatile field.
I'd have rewritten it like for(int i=0, s=someCollection.size(); i<s; i++)
Note: arrays don't have size() method.
Yes, there is a difference. In the first loop, the size() method is only called once. In the second one, it's called at each iteration.
If the iteration modifies the size of the collection (which is very very uncommon), the second one is needed. In most cases, you should prefer the first one, but limit the scope of the size variable :
for (int i = 0, size = someArray.size(); i < size; i++) {
// ...
}
But most of the time, you should prefer the foreach syntax anyway :
for (Foo foo : collection) {
// ...
}
which will iterate over the array or collection efficiently, even for a LinkedList for example, where indexed access is not optimal.
Don't worry about it, JVM optimization is very aggressive these days.
Use the 2nd form, for it's more readable, and most likely as fast. Premature optimization yada yada.
And when you do need to improve speed, always profile first, don't guess.
It is extremely unlikely that caching size() in a local variable could benefit your app noticeably. If it does, you must be doing simple operations over a huge dataset. You shouldn't use ArrayList at all in that case.
It may be worth noting that this construct:
for (String s : getStringsList()) {
//...
}
invokes getStringsList() only once and then operates on an iterator behind the scenes. So it is safe to perform lengthy operations or change some state inside getStringsList().
Always move out of the loop anything that can be done outside it, such as method calls, assignments, or condition tests whose result does not change.
A method call costs more than the equivalent code without the call, and repeating the call on every iteration adds overhead to your application.
Move such method calls out of the loop, even if this requires rewriting the code.
Benefits:
Unless the compiler optimizes it, the loop condition is evaluated on every iteration of the loop.
If the condition value is not going to change, the code will execute faster if the method call is moved out of the loop.
Note:
If the method returns a value that will not change during the loop, store that value in a temporary variable before the loop.
For example, store the size in a temporary variable size outside the loop and use that variable as the loop termination condition.
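To see how often the condition is evaluated, here is a minimal sketch with a hypothetical CountingBag whose size() counts its own invocations. It counts source-level calls only; it says nothing about what the JIT does with them or how much they cost.

```java
class LoopConditionDemo {
    // Hypothetical container used only for this illustration.
    static final class CountingBag {
        private final int size;
        int sizeCalls;

        CountingBag(int size) {
            this.size = size;
        }

        int size() {
            sizeCalls++;
            return size;
        }
    }

    // size() sits in the loop condition, so it runs before every iteration and once more to exit.
    static int sizeCallsInCondition(int n) {
        CountingBag bag = new CountingBag(n);
        for (int i = 0; i < bag.size(); i++) {
            // some logic
        }
        return bag.sizeCalls;
    }

    // size() is stored in a loop-scoped temporary, so it runs exactly once.
    static int sizeCallsHoisted(int n) {
        CountingBag bag = new CountingBag(n);
        for (int i = 0, size = bag.size(); i < size; i++) {
            // some logic
        }
        return bag.sizeCalls;
    }

    public static void main(String[] args) {
        System.out.println(sizeCallsInCondition(5)); // prints 6
        System.out.println(sizeCallsHoisted(5));     // prints 1
    }
}
```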