Is the Java foreach loop overkill for repeated execution?

I agree that the foreach loop reduces typing and is good for readability.
A little background: I work on low-latency application development and receive 1 million packets to process per second. I iterate through a million packets and send this information across to their listeners. I was using a foreach loop to iterate through the set of listeners.
While profiling, I found that a lot of Iterator objects are created to execute the foreach loop. After converting the foreach loop to an index-based for loop, I observed a huge drop in the number of objects created, which reduced the number of GCs and increased application throughput.
Edit: (Sorry for the confusion; making this question clearer.)
For example, I have a list of listeners (fixed size) and I loop through it a million times a second. Is foreach overkill in Java?
Example:
for(String s:listOfListeners)
{
// logic
}
compared to
for (int i=0;i<listOfListeners.size();i++)
{
// logic
}
Profiler screenshot for the following code:
for (int cnt = 0; cnt < 1_000_000; cnt++)
{
for (String string : list_of_listeners)
{
//No Code here
}
}

EDIT: Answering the vastly different question of:
For example, I have a list of listeners (fixed size) and I loop through it a million times a second. Is foreach overkill in Java?
That depends - does your profiling actually show that the extra allocations are significant? The Java allocator and garbage collector can do a lot of work per second.
To put it another way, your steps should be:
1. Set performance goals alongside your functional requirements.
2. Write the simplest code you can to achieve your functional requirements.
3. Measure whether that code meets the performance goals.
4. If it doesn't:
   a. Profile to work out where to optimize.
   b. Make a change.
   c. Run the tests again to see whether they make a significant difference in your meaningful metrics (number of objects allocated probably isn't a meaningful metric; number of listeners you can handle probably is).
   d. Go back to step 3.
Maybe in your case, the enhanced for loop is significant. I wouldn't assume that it is though - nor would I assume that the creation of a million objects per second is significant. I would measure the meaningful metrics before and after... and make sure you have concrete performance goals before you do anything else, as otherwise you won't know when to stop micro-optimizing.
The size of the list is around a million objects streaming in.
So you're creating one iterator object, but you're executing your loop body a million times.
While profiling, I found that a lot of Iterator objects are created to execute the foreach loop.
Nope. Only a single iterator object should be created per loop. As per the JLS:
The enhanced for statement is equivalent to a basic for statement of the form:
for (I #i = Expression.iterator(); #i.hasNext(); ) {
VariableModifiers_opt TargetType Identifier =
    (TargetType) #i.next();
Statement
}
As you can see, that calls the iterator() method once, and then calls hasNext() and next() on it on each iteration.
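A minimal sketch of that translation, assuming a hypothetical CountingIterable wrapper that counts calls to iterator(). The enhanced for loop and the hand-written iterator loop visit the same elements, and iterator() is invoked once per loop, not once per element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical Iterable wrapper that counts how often iterator() is requested.
class CountingIterable<T> implements Iterable<T> {
    private final List<T> delegate;
    int iteratorCalls = 0;

    CountingIterable(List<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public Iterator<T> iterator() {
        iteratorCalls++;
        return delegate.iterator();
    }
}

class DesugarDemo {
    // Enhanced for loop: iterator() once, then hasNext()/next() per element.
    static List<String> enhanced(CountingIterable<String> items) {
        List<String> seen = new ArrayList<>();
        for (String s : items) {
            seen.add(s);
        }
        return seen;
    }

    // Hand-written equivalent of the JLS translation.
    static List<String> desugared(CountingIterable<String> items) {
        List<String> seen = new ArrayList<>();
        for (Iterator<String> it = items.iterator(); it.hasNext(); ) {
            String s = it.next();
            seen.add(s);
        }
        return seen;
    }

    public static void main(String[] args) {
        CountingIterable<String> listeners = new CountingIterable<>(Arrays.asList("a", "b", "c"));
        System.out.println(enhanced(listeners));     // [a, b, c]
        System.out.println(desugared(listeners));    // [a, b, c]
        System.out.println(listeners.iteratorCalls); // 2: one per loop, not one per element
    }
}
```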
Do you think that extra object allocation will actually hurt your performance significantly?
How much do you value readability over performance? I take the approach of using the enhanced for loop wherever it helps readability, until it proves to be a performance problem - and my personal experience is that it's never hurt performance significantly in anything I've written. That's not to say that would be true for all applications, but the default position should be to only use the less readable code after proving it will improve things significantly.

The "foreach" loop creates just one Iterator object, while the second loop creates none. If you are executing many, many separate loops that execute just a few times each, then yes, "foreach" may be unnecessarily expensive. Otherwise, this is micro-optimizing.

EDIT: The question has changed so much since I wrote my answer that I'm not sure what I'm answering at the moment.
Looking up stuff with list.get(i) can actually be a lot slower if it's a linked list, since for each lookup, it has to traverse the list, while the iterator remembers the position.
Example:
list.get(0) will get the first element
list.get(1) will first get the first element to find the pointer to the next
list.get(2) will first get the first element, then go to the second and then to the third
etc.
So to do a full loop, you're actually looping over elements in this manner:
0
0->1
0->1->2
0->1->2->3
etc.
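To make that access pattern concrete, here is a sketch using a hypothetical hand-rolled singly linked list (HopCountingList, not java.util.LinkedList, which walks from whichever end is nearer) that counts pointer hops. Indexed access over n elements costs 0 + 1 + ... + (n - 1) hops, while a single walk, which is what an iterator does, costs about n.

```java
// Hypothetical minimal singly linked list that counts how many "next" pointers are followed.
class HopCountingList {
    private static final class Node {
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    private Node head;
    private Node tail;
    private int size;
    long hops = 0;

    void add(int value) {
        Node n = new Node(value);
        if (head == null) {
            head = n;
        } else {
            tail.next = n;
        }
        tail = n;
        size++;
    }

    int size() { return size; }

    // Like list.get(i): always starts at the head and follows i pointers.
    int get(int index) {
        Node cur = head;
        for (int k = 0; k < index; k++) {
            cur = cur.next;
            hops++;
        }
        return cur.value;
    }

    // Indexed loop: total hops = 0 + 1 + ... + (size - 1).
    long sumByIndex() {
        long sum = 0;
        for (int i = 0; i < size(); i++) {
            sum += get(i);
        }
        return sum;
    }

    // Single walk that remembers its position, as an iterator does: one hop per element.
    long sumByWalk() {
        long sum = 0;
        Node cur = head;
        while (cur != null) {
            sum += cur.value;
            cur = cur.next;
            hops++;
        }
        return sum;
    }

    public static void main(String[] args) {
        HopCountingList list = new HopCountingList();
        for (int v = 1; v <= 4; v++) {
            list.add(v);
        }
        long byIndex = list.sumByIndex();
        System.out.println(byIndex + " via get(i), hops = " + list.hops); // 10 via get(i), hops = 6
        list.hops = 0;
        long byWalk = list.sumByWalk();
        System.out.println(byWalk + " via walk, hops = " + list.hops);    // 10 via walk, hops = 4
    }
}
```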

I do not think you should worry about efficiency here.
Most of the time is consumed by your actual application logic (in this case, by what you do inside the loop).
So, I would not worry about the price you pay for convenience here.

Is this comparison even fair? You are comparing using an Iterator vs. using get(index).
Furthermore, each loop would only create one additional Iterator. Unless the Iterator is itself inefficient for some reason, you should see comparable performance.

Related

Java, optimal calling of objects and methods

Let's say I have the following code:
private Rule getRuleFromResult(Fact result){
Rule output=null;
for (int i = 0; i < rules.size(); i++) {
if(rules.get(i).getRuleSize()==1){output=rules.get(i);return output;}
if(rules.get(i).getResultFact().getFactName().equals(result.getFactName())) output=rules.get(i);
}
return output;
}
Is it better to leave it as it is or to change it as follows:
private Rule getRuleFromResult(Fact result){
Rule output=null;
Rule current = null;
for (int i = 0; i < rules.size(); i++) {
current=rules.get(i);
if(current.getRuleSize()==1){return current;}
if(current.getResultFact().getFactName().equals(result.getFactName())) output=current;
}
return output;
}
When executing, the program evaluates rules.get(i) each time as if it were the first time, and I think that in a more advanced example (say, the second if) this takes more time and slows execution. Am I right?
Edit: To answer a few comments at once: I know that in this particular example the time gain will be tiny, but it was just to get the general idea. I noticed I tend to write very long chains like object.get.set.change.compareTo... and many of them repeat. In the scope of the whole code, that time gain could be significant.
Your instinct is correct: saving intermediate results in a variable rather than re-invoking a method multiple times is faster. Often the performance difference will be too small to measure, but there's an even better reason to do this: clarity. By saving the value into a variable, you make it clear that you are intending to use the same value everywhere; if you re-invoke the method multiple times, it's unclear if you are doing so because you are expecting it to return different results on different invocations. (For instance, list.size() will return a different result if you've added items to list in between calls.) Additionally, using an intermediate variable gives you an opportunity to name the value, which can make the intention of the code clearer.
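A sketch of the refactored method with each intermediate result saved once. Rule, Fact, and their getters are hypothetical stand-ins for the asker's classes, kept just large enough to compile. It also hoists the loop-invariant result.getFactName() and uses an enhanced for loop, which fetches each element exactly once whatever the List implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the asker's types.
class Fact {
    private final String factName;
    Fact(String factName) { this.factName = factName; }
    String getFactName() { return factName; }
}

class Rule {
    private final int ruleSize;
    private final Fact resultFact;
    Rule(int ruleSize, Fact resultFact) {
        this.ruleSize = ruleSize;
        this.resultFact = resultFact;
    }
    int getRuleSize() { return ruleSize; }
    Fact getResultFact() { return resultFact; }
}

class RuleLookup {
    private final List<Rule> rules = new ArrayList<>();

    void addRule(Rule r) { rules.add(r); }

    Rule getRuleFromResult(Fact result) {
        String wantedName = result.getFactName(); // does not change inside the loop
        Rule output = null;
        for (Rule current : rules) {              // each element fetched once
            if (current.getRuleSize() == 1) {
                return current;                   // same early exit as the original
            }
            if (current.getResultFact().getFactName().equals(wantedName)) {
                output = current;                 // keeps the last match, as the original does
            }
        }
        return output;
    }

    public static void main(String[] args) {
        RuleLookup lookup = new RuleLookup();
        lookup.addRule(new Rule(2, new Fact("a")));
        lookup.addRule(new Rule(3, new Fact("b")));
        Rule match = lookup.getRuleFromResult(new Fact("b"));
        System.out.println(match.getRuleSize()); // 3
    }
}
```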
The only difference between the two versions is that the first may call rules.get(i) more than once per iteration when the rule size is not one.
So the second version is a little bit faster in general, but you will not feel any difference unless the list is big.
It depends on the type of data structure the "rules" object is. If it is a linked list, then yes, the second one is much faster, as it does not need to search for element i again through rules.get(i). If it is a data structure with immediate access to rules.get(i) (like an array), then it is the same.
In general, yes, it's probably a tiny bit faster (nanoseconds, I guess) when the code first runs. Later on, it will probably be optimized by the JIT compiler either way.
But what you are doing is so-called premature optimization. Usually you should not worry about things that only provide an insignificant performance improvement.
What is more important is the readability to maintain the code later on.
You could go even further with premature optimization, like saving the length in a local variable, which the for-each loop does internally for arrays. But again, in 99% of cases it doesn't make sense to do it.

Java: Reusing vs Reallocating reference to container object?

tl;dr: In Java, which is better: reusing a container object, or creating a new object every time and letting the garbage collector do the work?
I am dealing with a huge amount of data in Java, where I frequently have the following type of code structure:
Version 1:
for(...){//outer loop
HashSet<Integer> test = new HashSet<>(); //Some container
for(...){
//Inner loop working on the above container Data Structure
}
//More operation on the container defined above
}//Outer loop ends
Here I allocate new memory on every iteration of the outer loop and do some operations in the inner/outer loop before allocating empty memory again.
Now I am concerned about memory leaks in Java. I know that Java has a fairly good garbage collector, but instead of relying on that, should I modify my code as follows?
Version 2:
HashSet<Integer> test = null;
for(...){//outer loop
if(test == null){
test = new HashSet<>(); //Some container
}else{
test.clear();
}
for(...){
//Inner loop working on the above container Data Structure
}
//More operation on the container defined above
}//Outer loop ends
I have three questions:
Which will perform better, or is there no definitive answer?
Will the second version have higher time complexity? In other words, is the clear() function O(1) or O(n)? I didn't find anything in the Javadocs.
This pattern is quite common; which version is the recommended one?
In my opinion, it's better to use the first approach. Note that HashSet.clear never shrinks the hash table. Thus if the first iteration of the outer loop adds many elements to the set, the hash table will become quite big, and on subsequent iterations it won't shrink, even if much less space is necessary.
Also, the first version makes further refactoring easier: you may later want to put the whole inner loop into a separate method. Using the first version, you can just move it together with the HashSet.
Finally, note that for garbage collection it's usually easier to manage short-lived objects. If your HashSet is long-lived, it may be moved to the old generation and removed only during a full GC.
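A minimal sketch of Version 1 after that refactoring. It assumes the outer loop walks hypothetical groups of integers and that the per-iteration work is counting distinct values. The helper names countDistinct and distinctCounts are illustrative only. The HashSet lives entirely inside the extracted method, so it stays short-lived and never escapes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ContainerScopeDemo {
    // Extracted inner loop: the container is created, used, and dropped inside this method.
    static int countDistinct(List<Integer> group) {
        Set<Integer> seen = new HashSet<>();
        for (Integer value : group) {
            seen.add(value);
        }
        return seen.size();
    }

    static List<Integer> distinctCounts(List<List<Integer>> groups) {
        List<Integer> counts = new ArrayList<>();
        for (List<Integer> group : groups) {   // outer loop
            counts.add(countDistinct(group));  // fresh set per iteration (Version 1)
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<Integer>> groups = Arrays.asList(
                Arrays.asList(1, 2, 2, 3),
                Arrays.asList(5, 5),
                Arrays.asList(7));
        System.out.println(distinctCounts(groups)); // [3, 1, 1]
    }
}
```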
I think it's simpler to create a new HashSet each time, and it's likely to be less prone to refactoring errors later on. Unless you have a good reason to reuse the HashSet (garbage collection pauses are an issue for you, and profiling shows this part of the code is the cause), I would keep things as simple as possible and stick to Version 1. Focus on maintainability; premature optimization should be avoided.
I would recommend sticking to the first variant. The main reason is keeping the scope of your HashSet variable as small as possible. This way you ensure that it will be eligible for garbage collection after the iteration has ended. Widening its scope may cause other problems: the reference can later be used to change the state of the object.
Also, most modern Java compilers will produce the same byte code if you are creating the instance inside or outside the loop.
Which one is faster? The answer could vary depending on various factors.
Version 1 advantages:
Branch prediction at the processor level might make this faster.
The scope of the instance is limited to the outer loop. If the reference doesn't escape, the JIT may be able to optimize it more aggressively. The GC's job will probably be easier.
Version 2:
Less time spent creating new containers (frankly, this is not much).
clear() is O(n), where n is the capacity of the internal hash table.
An escaped reference might prevent the JIT from making some optimizations.
Which one to choose? Measure the performance of both versions several times. Then, if you find a significant difference, change your code; if not, don't do anything :)
Version 2 is better.
It may take a little more time, but memory performance will be good.
It depends.
Recycling objects can be useful in tight loops to eliminate GC pressure, especially when the object is too large for the young generation or the loop runs long enough for it to be tenured.
But in your particular example it may not help much, because a HashSet still contains node objects, which will be created on insertion and become eligible for GC on clearing.
On the other hand, if you put so many items into the set that its internal Object[] array has to be resized multiple times and becomes too large for the young generation then it might be useful to recycle the set. But in that case you should be pre-sizing the set anyway.
Additionally, objects that only live for the duration of a code block may be eligible for object decomposition (scalar replacement) or stack allocation via escape analysis. The shorter their lifetime and the less complex the code paths touching those objects, the more likely it is for EA to succeed.
In the end it doesn't matter much though until this method actually becomes an allocation hotspot in your application, in which case it would show up in profiler results and you could act accordingly.
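Pre-sizing uses the HashSet(int initialCapacity) constructor. A sketch, assuming the expected element count is known up front and the default load factor of 0.75 is in use. Choosing a capacity of at least expected / 0.75 means the table should not need to grow while it is filled. The helper name capacityFor is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class PresizedSetDemo {
    // Smallest initial capacity that holds `expected` elements under the default 0.75 load factor.
    static int capacityFor(int expected) {
        return (int) Math.ceil(expected / 0.75);
    }

    static Set<Integer> fill(int expected) {
        // Sized once up front, so it should not need to resize while being filled.
        Set<Integer> set = new HashSet<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            set.add(i);
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(100));   // 134
        System.out.println(fill(1_000).size()); // 1000
    }
}
```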

Disadvantage of using a tmp variable while traversing a for loop

I am doing some performance optimization for my Java application, and I am confused about using a tmp variable to remove the method invocation from the loop termination condition. Here is my situation:
Vector myVector = new Vector();
// some code
for (int i=0;i<myVector.size();i++){
//some code here;
}
I want to use
int tmp = myVector.size();
for(int i=0;i<tmp;i++){
//some code here
}
What would be the negative impact of using the second scenario? My application is pretty large and I am not sure when and where myVector is being updated.
This change will not have any noticeable impact on performance, positive or negative. So you should not change this unless there is a profound reason to do so.
Regarding your question
What could be the negative impact of using the second scenario?
you should be aware that both implementations may behave differently in a multi-threaded environment. In the first case, changes to the vector made by any other thread will be taken into account and may affect how many times the loop is run. In the second case, the number of loop iterations is computed once and will not change later (even if the size of the vector changes). However, changing the contents of a vector while iterating over it with either loop is dangerous and should be avoided if possible.
BTW: The benchmark that was linked in the comment from @geoand is as flawed as a microbenchmark can be. It does not tell you anything.
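The behavioral difference is easiest to see in a single-threaded sketch where the loop body itself shrinks the vector, standing in for a modification made by another thread (the class and method names are hypothetical). With size() re-read on each test, the loop stops early. With the size cached in tmp, Vector.get throws ArrayIndexOutOfBoundsException once the index passes the new end.

```java
import java.util.Arrays;
import java.util.Vector;

class CachedSizeDemo {
    // Re-reads size() on every test, so it sees elements removed during the loop.
    static int iterationsRereadingSize(Vector<String> v) {
        int iterations = 0;
        for (int i = 0; i < v.size(); i++) {
            v.get(i);
            v.remove(v.size() - 1);   // simulate the vector shrinking mid-loop
            iterations++;
        }
        return iterations;
    }

    // Caches size() once, so it keeps going after the vector has shrunk.
    static int iterationsCachingSize(Vector<String> v) {
        int iterations = 0;
        int tmp = v.size();
        for (int i = 0; i < tmp; i++) {
            v.get(i);                 // throws once i >= v.size()
            v.remove(v.size() - 1);
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(iterationsRereadingSize(new Vector<>(Arrays.asList("a", "b", "c", "d")))); // 2
        try {
            iterationsCachingSize(new Vector<>(Arrays.asList("a", "b", "c", "d")));
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("cached size ran past the end");
        }
    }
}
```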

Enhanced for loop performance

I had an argument with my friend regarding this.
Consider the snippet below:
for (int i = 0; i < someList.size(); i++) {
//some logic
}
Here someList.size() will be executed on every iteration, so it is recommended to move this size calculation outside (before) the loop.
Now what happens when I use an enhanced for loop like this:
for(SpecialBean bean: someBean.getSpecialList()) {
//some logic
}
Is it necessary to move someBean.getSpecialList() to outside the loop?
How many times will someBean.getSpecialList() execute if I were to retain the 2nd snippet as it is?
Repeated calls to list.size() won't result in any performance penalty. The JIT compiler will most probably inline it and even if it doesn't, it will still be quite inexpensive because it just involves reading the value of a field.
A much more serious problem with your first example is that the loop body will have to involve list.get(i), and for a LinkedList, accessing the i-th element has O(i) cost with a quite significant constant factor due to pointer chasing, which translates to data-dependent loads at the CPU level. The CPU's prefetcher can't optimize this access pattern.
This means that the overall computational complexity will be O(n^2) when applied to a LinkedList.
Your second example compiles to iteration via Iterator and will evaluate someBean.getSpecialList().iterator() only once. The cost of iterator.next() is constant in all cases.
From Item 46 in Effective Java by Joshua Bloch:
The for-each loop, introduced in release 1.5, gets rid of the clutter and the opportunity for error by hiding the iterator or index variable completely. The resulting idiom applies equally to collections and arrays:
// The preferred idiom for iterating over collections and arrays
for (Element e : elements) {
    doSomething(e);
}
When you see the colon (:), read it as “in.” Thus, the loop above reads as “for each element e in elements.” Note that there is no performance penalty for using the for-each loop, even for arrays. In fact, it may offer a slight performance advantage over an ordinary for loop in some circumstances, as it computes the limit of the array index only once. While you can do this by hand (Item 45), programmers don’t always do so.
See also: Is there a performance difference between a for loop and a for-each loop?
An alternative to the first snippet would be:
for (int i = 0, l = someList.size(); i < l; i++) {
//some logic
}
With regard to the for..each loop, the call to getSpecialList() will only be made once (you could verify this by adding some debugging/logging inside the method).
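A sketch of that check, with hypothetical SomeBean and SpecialBean classes. The counter inside getSpecialList() plays the role of the debugging/logging statement, and it ends up at 1 no matter how many elements the loop visits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical element type from the question.
class SpecialBean {
    final String name;
    SpecialBean(String name) { this.name = name; }
}

// Hypothetical owner whose getter counts its own invocations.
class SomeBean {
    private final List<SpecialBean> specialList =
            Arrays.asList(new SpecialBean("x"), new SpecialBean("y"), new SpecialBean("z"));
    int getterCalls = 0;

    List<SpecialBean> getSpecialList() {
        getterCalls++; // stands in for logging inside the method
        return specialList;
    }
}

class GetterCallDemo {
    static List<String> names(SomeBean someBean) {
        List<String> out = new ArrayList<>();
        for (SpecialBean bean : someBean.getSpecialList()) { // expression evaluated once
            out.add(bean.name);
        }
        return out;
    }

    public static void main(String[] args) {
        SomeBean someBean = new SomeBean();
        System.out.println(names(someBean));      // [x, y, z]
        System.out.println(someBean.getterCalls); // 1
    }
}
```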
As the extended loop uses an Iterator taken from the Iterable, it wouldn't be possible or sensible to execute someBean.getSpecialList() more than once. Moving it outside the loop will not change the performance of the loop, but you could do it if it improves readability.
Note: iterating by index can be faster for random-access collections, e.g. ArrayList, as it doesn't create an Iterator, but slower for indexed collections that don't support random access, e.g. LinkedList.
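The JDK marks lists with fast indexed access using the java.util.RandomAccess marker interface (ArrayList implements it, LinkedList does not), so a helper can pick the loop style at run time. A sketch, with sum as a hypothetical example operation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class LoopStyleDemo {
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // e.g. ArrayList: get(i) is O(1) and no Iterator is allocated
            for (int i = 0, n = list.size(); i < n; i++) {
                total += list.get(i);
            }
        } else {
            // e.g. LinkedList: the iterator keeps its position, avoiding O(n^2) get(i) calls
            for (int value : list) {
                total += value;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);
        System.out.println(sum(new ArrayList<>(values)));  // 10
        System.out.println(sum(new LinkedList<>(values))); // 10
    }
}
```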
The for-each variation is equivalent to the following:
for (Iterator i = c.iterator(); i.hasNext(); ) {
doSomething((Element) i.next());
}
From Item 46 of Effective Java, “Prefer for-each loops to traditional for loops”:
The for-each loop provides compelling advantages over the traditional for loop in clarity and bug prevention, with no performance penalty. You should use it wherever you can.
So my first guess was wrong: there is no penalty for calling a method in the for-each loop's expression, because it is evaluated only once.

Critical loop containing many “if” whose output is constant: How to save on condition tests? For Java ;)

I just read this thread Critical loop containing many "if" whose output is constant : How to save on condition tests?
and this one Constant embedded for loop condition optimization in C++ with gcc which are exactly what I would like to do in Java.
I have some if conditions that are evaluated many times; the conditions are composed of attributes defined at initialization that won't change.
Will javac optimize the bytecode by removing the unused branches of the conditions, avoiding the time spent testing them?
Do I have to define the attributes as final, or is it useless?
Thanks for your help,
Aurélien
Java compile time optimization is pretty lacking. If you can use a switch statement it can probably do some trivial optimizations. If the number of attributes is very large then a HashMap is going to be your best bet.
I'll close by saying that this sort of thing is very very rarely a bottleneck and trying to prematurely optimize it is counterproductive. If your code is, in fact, called a lot then the JIT optimizer will do its best to make your code run faster. Just say what you want to happen and only worry about the "how" when you find that's actually worth the time to optimize it.
In OO languages, the solution is to use delegation or the command pattern instead of if/else forests.
So your attributes need to implement a common interface like IAttribute which has a method run() (or make all attributes implement Runnable).
Now you can simply call the method without any decisions in the loop:
for(....) {
attr.run();
}
It's a bit more complex if you can't add methods to your attributes. My solution in this case is using enums and an EnumMap which contains the runnables. Access to an EnumMap is almost like an array access (i.e. O(1)).
for(....) {
map.get(attr).run();
}
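A runnable sketch of the EnumMap variant. The Attr enum, its constants, and the logged actions are hypothetical placeholders. The map is built once before the loop, so the loop body performs a lookup and a call instead of a chain of if tests.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class EnumDispatchDemo {
    // Hypothetical attribute kinds, fixed at initialization.
    enum Attr { FROB, DEFROB, NOOP }

    static List<String> run(Attr attr, int times) {
        List<String> log = new ArrayList<>();
        // Built once, before the hot loop: one Runnable per attribute value.
        Map<Attr, Runnable> map = new EnumMap<>(Attr.class);
        map.put(Attr.FROB, () -> log.add("frob"));
        map.put(Attr.DEFROB, () -> log.add("defrob"));
        map.put(Attr.NOOP, () -> { });

        for (int i = 0; i < times; i++) {
            map.get(attr).run(); // array-like lookup, no if/else chain
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(Attr.FROB, 3)); // [frob, frob, frob]
        System.out.println(run(Attr.NOOP, 3)); // []
    }
}
```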
I don't know about Java specifics regarding this, but you might want to look into a technique called Memoization which would allow you to look up results for a function in a table instead of calling the function. Effectively, memoization makes your program "remember" results of a function for a given input.
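Java has no built-in memoization, but a HashMap with computeIfAbsent gives a simple version. A sketch, assuming the function is pure (same input, same output), is not called recursively through the cache, and is used from a single thread (HashMap is not thread-safe). expensiveCheck is a hypothetical placeholder, and the call counter only exists to show the cache working.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Wraps a function and remembers its result for each input it has seen.
class Memoizer<K, V> implements Function<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> fn;

    Memoizer(Function<K, V> fn) {
        this.fn = fn;
    }

    @Override
    public V apply(K key) {
        // Compute on first request, then serve later requests from the table.
        return cache.computeIfAbsent(key, fn);
    }
}

class MemoDemo {
    static int calls = 0;

    // Hypothetical stand-in for a costly, pure condition.
    static boolean expensiveCheck(int x) {
        calls++;
        return x % 3 == 0;
    }

    public static void main(String[] args) {
        Memoizer<Integer, Boolean> isMultipleOfThree = new Memoizer<Integer, Boolean>(MemoDemo::expensiveCheck);
        for (int i = 0; i < 1_000; i++) {
            isMultipleOfThree.apply(i % 10); // only 10 distinct inputs
        }
        System.out.println(calls); // 10
    }
}
```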
Try replacing the if with runtime polymorphism. No, that's not as strange as you think.
If, for example you have this:
for (int i=0; i < BIG_NUMBER; i++) {
if (calculateSomeCondition()) {
frobnicate(someValue);
} else {
defrobnicate(someValue);
}
}
then replace it with this (Function taken from Guava, but can be replaced with any other fitting interface):
Function<X, Void> f;
if (calculateSomeCondition()) {
f = new Frobnicator();
} else {
f = new Defrobnicator();
}
for (int i = 0; i < BIG_NUMBER; i++) {
f.apply(someValue);
}
Method calls are pretty highly optimized on most modern JVMs even (or especially) if there are only a few possible call targets.
