Performance-critical ArrayList iteration (iosched example) - java

Well, it is stated in Performance Tips that:
So, you should use the enhanced for loop by default, but consider a
hand-written counted loop for performance-critical ArrayList
iteration.
But looking at the iosched 2013 application, which is held up as an example for many developers, and at ScheduleUpdaterService.java in particular, I can see the following:
void processPendingScheduleUpdates() {
    try {
        // Operate on a local copy of the schedule update list so as not to block
        // the main thread adding to this list
        List<Intent> scheduleUpdates = new ArrayList<Intent>();
        synchronized (mScheduleUpdates) {
            scheduleUpdates.addAll(mScheduleUpdates);
            mScheduleUpdates.clear();
        }

        SyncHelper syncHelper = new SyncHelper(this);
        for (Intent updateIntent : scheduleUpdates) {
            String sessionId = updateIntent.getStringExtra(EXTRA_SESSION_ID);
            boolean inSchedule = updateIntent.getBooleanExtra(EXTRA_IN_SCHEDULE, false);
            LOGI(TAG, "addOrRemoveSessionFromSchedule:"
                    + " sessionId=" + sessionId
                    + " inSchedule=" + inSchedule);
            syncHelper.addOrRemoveSessionFromSchedule(this, sessionId, inSchedule);
        }
    } catch (IOException e) {
        // TODO: do something useful here, like revert the changes locally in the
        // content provider to maintain client/server sync
        LOGE(TAG, "Error processing schedule update", e);
    }
}
Notice that there is an enhanced for loop iterating over scheduleUpdates, even though the tip suggests avoiding that kind of iteration for an ArrayList.
Is that because this part of the application is not considered performance-critical, or am I misunderstanding something? Thanks a lot.

Yes. Readability and maintainability are much more important than performance in 99.999% of cases. The performance tip itself says: "you should use the enhanced for loop by default".
So, unless you have a performance problem, and you've proven that transforming the foreach loop into a counted loop solves that performance problem, or at least significantly improves the situation, you should favor readability and maintainability, and thus the foreach loop.

You need to look at the context of the code. If you had tuned all object creation out of the loop and this was used heavily, switching to an indexed loop could make a difference.
In your case, you clearly have a much more expensive operation: building the log string. Optimising that would make much, much more difference (possibly 10-1000x more).

Yes, there is a small performance cost to going through the iterator. A regular for loop that uses an iterator pays the same cost, while indexing manually would be slightly faster. However, the difference is extremely small (on the order of nanoseconds, as seen here) and is negligible.
Optimizations should be made elsewhere, such as limiting object creation and destruction, and/or other expensive operations.
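For reference, a minimal sketch (made-up names) of the two iteration styles the Android performance tip contrasts; the counted loop avoids allocating an Iterator and re-reading size() on every pass, which is all the tip is really about:

import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; the class and method names are made up.
public class LoopStyles {

    // Enhanced for loop: clearer, but each execution of the loop allocates one Iterator.
    static int sumEnhanced(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Hand-written counted loop: no Iterator, size() read once; this is the form
    // the tip suggests for performance-critical ArrayList iteration.
    static int sumCounted(ArrayList<Integer> values) {
        int sum = 0;
        final int count = values.size();
        for (int i = 0; i < count; i++) {
            sum += values.get(i);
        }
        return sum;
    }
}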

Related

Java, optimal calling of objects and methods

Let's say I have the following code:
private Rule getRuleFromResult(Fact result) {
    Rule output = null;
    for (int i = 0; i < rules.size(); i++) {
        if (rules.get(i).getRuleSize() == 1) { output = rules.get(i); return output; }
        if (rules.get(i).getResultFact().getFactName().equals(result.getFactName())) output = rules.get(i);
    }
    return output;
}
Is it better to leave it as it is or to change it as follows:
private Rule getRuleFromResult(Fact result) {
    Rule output = null;
    Rule current = null;
    for (int i = 0; i < rules.size(); i++) {
        current = rules.get(i);
        if (current.getRuleSize() == 1) { return current; }
        if (current.getResultFact().getFactName().equals(result.getFactName())) output = current;
    }
    return output;
}
When executing, the program goes through rules.get(i) each time as if it were the first time, and I think that in a more involved example (say, the second if) this takes more time and slows execution. Am I right?
Edit: To answer a few comments at once: I know that in this particular example the time gain will be tiny, but it was just to get the general idea across. I've noticed I tend to write very long chains like object.get().set().change().compareTo()... and many of them repeat. Across the whole code base that time gain could be significant.
Your instinct is correct--saving intermediate results in a variable rather than re-invoking a method multiple times is faster. Often the performance difference will be too small to measure, but there's an even better reason to do this--clarity. By saving the value into a variable, you make it clear that you are intending to use the same value everywhere; if you re-invoke the method multiple times, it's unclear if you are doing so because you are expecting it to return different results on different invocations. (For instance, list.size() will return a different result if you've added items to list in between calls.) Additionally, using an intermediate variable gives you an opportunity to name the value, which can make the intention of the code clearer.
The only difference between the two versions is that the first may call rules.get(i) twice for the same index.
So the second version is a little faster in general, but you will not feel any difference unless the list is big.
It depends on what kind of data structure rules is. If it is a linked-style list, then yes, the second version is much faster, because each rules.get(i) has to walk through the list to find the element. If it is a structure with direct random access (like an array), it is essentially the same.
In general, yes, it's probably a tiny bit faster (nanoseconds, I would guess), at least the first time it's called. Later on it will probably be optimised by the JIT compiler either way.
But what you are doing is so-called premature optimization. You usually should not think about things that only provide an insignificant performance improvement.
What is more important is readability, so the code can be maintained later on.
You could even do more premature optimization, like saving the length in a local variable, which is what the for-each loop does internally. But again, in 99% of cases it doesn't make sense to do it.
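Purely as an illustration of that last point, here is what hoisting both the element and the size might look like applied to the question's method (rules, Rule, and Fact are the names from the question; whether this measurably helps is exactly what you would have to profile):

// Hypothetical sketch only: hoists rules.get(i) and rules.size() out of the
// repeated expressions. Behaviour is unchanged; any speedup depends on the JIT.
private Rule getRuleFromResult(Fact result) {
    Rule output = null;
    final int size = rules.size();      // read once instead of on every iteration
    for (int i = 0; i < size; i++) {
        Rule current = rules.get(i);    // single lookup per iteration
        if (current.getRuleSize() == 1) {
            return current;
        }
        if (current.getResultFact().getFactName().equals(result.getFactName())) {
            output = current;
        }
    }
    return output;
}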

Disadvantage of using a tmp variable while traversing a for loop

I am doing some performance optimization in my Java application and I am confused about using a tmp variable to remove the method invocation from the loop termination condition. Here is my situation:
Vector myVector = new Vector();
// some code
for (int i = 0; i < myVector.size(); i++) {
    // some code here;
}
I want to use
int tmp = myVector.size();
for (int i = 0; i < tmp; i++) {
    // some code here
}
What would be the negative impact of using the second scenario? My application is pretty large and I am not sure when and where myVector is being updated.
This change will not have any noticeable impact on performance, neither positive nor negative. So you should not make this change unless there is a compelling reason to do so.
Regarding your question
What would be the negative impact of using the second scenario?
you should be aware that both implementations may behave differently in a multi-threaded environment. In the first case, changes to the vector made by any other thread will be taken into account, and may affect how many times the loop runs. In the second case, the number of iterations is computed once and will not change later, even if the size of the vector changes. However, changing the contents of a vector while iterating over it with either of the two loops is dangerous and should be avoided if possible.
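A contrived sketch of that difference, with the loop body itself growing the vector to stand in for another thread (all names here are made up):

import java.util.Vector;

// Illustrative sketch only.
public class SizeHoistingDemo {
    public static void main(String[] args) {
        Vector<Integer> myVector = new Vector<Integer>();
        for (int i = 0; i < 3; i++) myVector.add(i);

        // Version 1: size() is re-read every iteration, so the element appended
        // mid-loop is also visited -> prints 0, 1, 2, 99.
        for (int i = 0; i < myVector.size(); i++) {
            if (i == 0) myVector.add(99);
            System.out.println(myVector.get(i));
        }

        myVector.setSize(3); // back to the original three elements

        // Version 2: the bound is captured once, so later growth is ignored
        // -> prints 0, 1, 2. (If the vector shrank instead, get(i) could throw
        // ArrayIndexOutOfBoundsException for a stale index.)
        int tmp = myVector.size();
        for (int i = 0; i < tmp; i++) {
            if (i == 0) myVector.add(99);
            System.out.println(myVector.get(i));
        }
    }
}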
BTW: The benchmark that was linked in the comment from @geoand is as flawed as a microbenchmark can be. It does not tell you anything.

Multithreaded library exposing unsafe ArrayList

I am using a shared library in Java that returns an ArrayList; as I iterate over it, a ConcurrentModificationException could be thrown, and I am looking for a 100% (?) guarantee of being safe. I was thinking of something like the code below and I'd appreciate any input.
data_list is the ArrayList<> returned from the multithreaded (MT) library.
boolean pass = true;
ArrayList<Something> local = new ArrayList<Something>(256);
for (int spin = 0; spin < 10; ++spin)
{
    try {
        local.addAll(data_list);
    }
    catch (java.util.ConcurrentModificationException ce) {
        pass = false;
    }
    finally {
        if (pass) break;
        pass = true;
    }
}
Assuming variable pass is true, how should I operate on local?
There is no safe way to do this. You should not catch ConcurrentModificationException.
The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
Some collections, like HashMap, can even enter an infinite loop when used this way. Here's an explanation of how it happens.
You should not do this. There is no correct way to do this.
Either you misunderstand how the library works, or you need to switch out your library with one written by a competent developer.
What library are you using?
You don't define exactly what you mean by safe, and don't specify what kind of modifications are being performed to the list, but in many cases it may be acceptable to iterate over it manually by index, i.e.
for (int index = 0; index < data_list.size(); index++) {
    local.add(data_list.get(index));
}
The way I see it, there are four possible kinds of modification, with varying degrees of acceptability:
- New items could be appended. This solution should work appropriately for this case, as long as the list does not grow enough to trigger a backing array expansion (and as this should happen with exponentially decreasing frequency, retrying if it occurs should be guaranteed to succeed eventually).
- Existing items may be modified. This solution may not present a consistent view of the contents of the list at any given time, but it would be guaranteed to provide a usable list that is representative of items that have been in the list, which may be acceptable depending on your definition of "safe".
- Items may be removed. There is a small chance this solution would fail with an IndexOutOfBoundsException, and the same caveat as for items being modified would apply with regards to consistency.
- Items may be inserted into the middle of the list. The same caveat as items being modified would apply, and there would also be a danger of getting duplicated values. The problems with backing array expansion from the appending case would also apply.
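Pulling those cases together, a hedged sketch of the index-based copy with a bounded retry for the removal case could look like the following (generic here, with data_list from the question as the source; there is still no consistency guarantee, only the "usable snapshot" described above):

import java.util.ArrayList;
import java.util.List;

class SnapshotHelper {
    // Sketch only: copy by index, retrying a few times if a concurrent removal
    // makes an index go stale. It yields a "usable" list of elements that were
    // in the source at some point, nothing stronger.
    static <T> List<T> snapshotByIndex(List<T> source) {
        for (int attempt = 0; attempt < 10; attempt++) {
            List<T> copy = new ArrayList<T>(source.size());
            try {
                for (int index = 0; index < source.size(); index++) {
                    copy.add(source.get(index));
                }
                return copy;
            } catch (IndexOutOfBoundsException e) {
                // An element was removed mid-copy; start over.
            }
        }
        throw new IllegalStateException("List kept changing; gave up after 10 attempts");
    }
}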
You've got a bad situation here, but I think your solution is as sound as possible. The new ArrayList should go in the loop so you start fresh after each failure. Actually, the best thing might be to make your "try" line look like:
local = new ArrayList<Something>( data_list );
You don't want your ArrayList to have to expand itself because that will take time when you're trying to grab the data before the list changes. This should set the size, create it, and fill it with the least wasted effort.
You might need to catch things other than ConcurrentModificationException. You'll probably learn which ones the hard way. Or just catch Throwable.
If you want to go to extremes, run the code inside the for loop in its own thread so that if it does hang you can kill it and restart it. That's going to take some work.
I think this will work, if you let "spin" get large enough.
I don't have any fundamental changes, but I think that code could be simplified a bit:
ArrayList<Something> local = new ArrayList<Something>(256);
for (int spin = 0; spin < 10; ++spin)
{
    try {
        local.addAll(data_list);
        break;
    }
    catch (java.util.ConcurrentModificationException ce) {}
}

Is Java foreach loop an overkill for repeated execution

I agree the foreach loop reduces typing and is good for readability.
A little background: I work on low-latency application development and receive a million packets to process per second, iterating through those packets and passing the information on to their listeners. I was using a foreach loop to iterate through the set of listeners.
While profiling I noticed that a lot of Iterator objects are created to execute the foreach loops. Converting the foreach loop to an index-based loop, I observed a huge drop in the number of objects created, thereby reducing the number of GCs and increasing application throughput.
Edit: (Sorry for the confusion, making this question clearer)
For example, I have a list of listeners (fixed size) and I run this for loop a million times a second. Is foreach overkill in Java?
Example:
for (String s : listOfListeners)
{
    // logic
}
compared to
for (int i = 0; i < listOfListeners.size(); i++)
{
    // logic
}
Profiled screenshot for the code
for (int cnt = 0; cnt < 1_000_000; cnt++)
{
    for (String string : list_of_listeners)
    {
        // No code here
    }
}
EDIT: Answering the vastly different question of:
For example i have list of listeners(fixed size) and i loop through this forloop a million times a second. Is foreach an overkill in java?
That depends - does your profiling actually show that the extra allocations are significant? The Java allocator and garbage collector can do a lot of work per second.
To put it another way, your steps should be:
1. Set performance goals alongside your functional requirements.
2. Write the simplest code you can to achieve your functional requirements.
3. Measure whether that code meets the performance goals.
4. If it doesn't:
   - Profile to work out where to optimize
   - Make a change
   - Run the tests again to see whether they make a significant difference in your meaningful metrics (the number of objects allocated probably isn't a meaningful metric; the number of listeners you can handle probably is)
   - Go back to step 3.
Maybe in your case, the enhanced for loop is significant. I wouldn't assume that it is though - nor would I assume that the creation of a million objects per second is significant. I would measure the meaningful metrics before and after... and make sure you have concrete performance goals before you do anything else, as otherwise you won't know when to stop micro-optimizing.
Size of list is around a million objects streaming in.
So you're creating one iterator object, but you're executing your loop body a million times.
Doing profiling i figured there are a lot of Iterator objects created to execute foreach loop.
Nope: only a single iterator object should be created per execution of the loop. As per the JLS:
The enhanced for statement is equivalent to a basic for statement of the form:
for (I #i = Expression.iterator(); #i.hasNext(); ) {
    VariableModifiers_opt TargetType Identifier =
        (TargetType) #i.next();
    Statement
}
As you can see, that calls the iterator() method once, and then calls hasNext() and next() on it on each iteration.
Do you think that extra object allocation will actually hurt your performance significantly?
How much do you value readability over performance? I take the approach of using the enhanced for loop wherever it helps readability, until it proves to be a performance problem - and my personal experience is that it's never hurt performance significantly in anything I've written. That's not to say that would be true for all applications, but the default position should be to only use the less readable code after proving it will improve things significantly.
The "foreach" loop creates just one Iterator object, while the second loop creates none. If you are executing many, many separate loops that execute just a few times each, then yes, "foreach" may be unnecessarily expensive. Otherwise, this is micro-optimizing.
EDIT: The question has changed so much since I wrote my answer that I'm not sure what I'm answering at the moment.
Looking up stuff with list.get(i) can actually be a lot slower if it's a linked list, since for each lookup, it has to traverse the list, while the iterator remembers the position.
Example:
list.get(0) will get the first element
list.get(1) will first get the first element to find the pointer to the next
list.get(2) will first get the first element, then go to the second and then to the third
etc.
So to do a full loop, you're actually looping over elements in this manner:
0
0->1
0->1->2
0->1->2->3
etc.
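To make that concrete, a small sketch (hypothetical names) of both traversal styles over a LinkedList; the iterator version is O(n) overall, while the indexed version degrades to O(n^2) because every get(i) walks the list again:

import java.util.LinkedList;

// Illustrative sketch only; names are made up.
public class LinkedListTraversal {
    // Iterator-based (what foreach desugars to): the iterator keeps its
    // position, so the whole traversal is linear.
    static long sumWithIterator(LinkedList<Long> list) {
        long sum = 0;
        for (long value : list) {
            sum += value;
        }
        return sum;
    }

    // Index-based: each get(i) starts from one end of the list and walks to
    // index i, so the whole traversal is quadratic.
    static long sumWithIndex(LinkedList<Long> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }
}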
I do not think you should worry about the efficiency here.
Most of the time is consumed by your actual application logic (in this case, by what you do inside the loop).
So, I would not worry about the price you pay for convenience here.
Is this comparison even fair? You are comparing using an Iterator vs. using get(index).
Furthermore, each loop would only create one additional Iterator. Unless the Iterator is itself inefficient for some reason, you should see comparable performance.

Critical loop containing many “if” whose output is constant : How to save on condition tests ? For Java ;)

I just read this thread Critical loop containing many "if" whose output is constant : How to save on condition tests?
and this one Constant embedded for loop condition optimization in C++ with gcc which are exactly what I would like to do in Java.
I have some if conditions that are evaluated many times; the conditions are composed of attributes defined at initialization which won't change.
Will javac optimize the bytecode by removing the unused branches of the conditions, so no time is spent testing them?
Do I have to define the attributes as final, or is that useless?
Thanks for your help,
Aurélien
Java compile time optimization is pretty lacking. If you can use a switch statement it can probably do some trivial optimizations. If the number of attributes is very large then a HashMap is going to be your best bet.
I'll close by saying that this sort of thing is very very rarely a bottleneck and trying to prematurely optimize it is counterproductive. If your code is, in fact, called a lot then the JIT optimizer will do its best to make your code run faster. Just say what you want to happen and only worry about the "how" when you find that's actually worth the time to optimize it.
In OO languages, the solution is to use delegation or the command pattern instead of if/else forests.
So your attributes need to implement a common interface like IAttribute which has a method run() (or make all attributes implement Runnable).
Now you can simply call the method without any decisions in the loop:
for (....) {
    attr.run();
}
It's a bit more complex if you can't add methods to your attributes. My solution in this case is using enums and an EnumMap which contains the runnables. Access to an EnumMap is almost like an array access (i.e. O(1)).
for (....) {
    map.get(attr).run();
}
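For illustration, a self-contained sketch of that EnumMap dispatch (Attr and its handlers are made-up names; lambdas are used for brevity, but anonymous Runnable classes would work the same way):

import java.util.EnumMap;

public class EnumDispatch {
    // Made-up attribute type standing in for the attributes from the question.
    enum Attr { FOO, BAR }

    public static void main(String[] args) {
        // Build the dispatch table once, outside the hot loop.
        EnumMap<Attr, Runnable> map = new EnumMap<Attr, Runnable>(Attr.class);
        map.put(Attr.FOO, () -> System.out.println("handle FOO"));
        map.put(Attr.BAR, () -> System.out.println("handle BAR"));

        Attr[] attrs = { Attr.FOO, Attr.BAR, Attr.FOO };
        for (Attr attr : attrs) {
            map.get(attr).run();    // O(1) lookup, no if/else chain in the loop
        }
    }
}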
I don't know about Java specifics regarding this, but you might want to look into a technique called Memoization which would allow you to look up results for a function in a table instead of calling the function. Effectively, memoization makes your program "remember" results of a function for a given input.
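As a rough sketch of memoization in Java (all names hypothetical): cache the result of a pure, expensive check in a map keyed by its input, and only compute it on a miss:

import java.util.HashMap;
import java.util.Map;

public class Memoizer {
    private final Map<Integer, Boolean> cache = new HashMap<Integer, Boolean>();

    // Hypothetical expensive, pure predicate: its result depends only on its input.
    private boolean expensiveCondition(int attribute) {
        return attribute % 7 == 0; // stand-in for the real test
    }

    // Look the result up first; compute and remember it only on a cache miss.
    boolean memoizedCondition(int attribute) {
        return cache.computeIfAbsent(attribute, this::expensiveCondition);
    }
}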
Try replacing the if with runtime polymorphism. No, that's not as strange as you think.
If, for example you have this:
for (int i = 0; i < BIG_NUMBER; i++) {
    if (calculateSomeCondition()) {
        frobnicate(someValue);
    } else {
        defrobnicate(someValue);
    }
}
then replace it with this (Function taken from Guava, but can be replaced with any other fitting interface):
Function<X, Y> f;
if (calculateSomeCondition()) {
    f = new Frobnicator();
} else {
    f = new Defrobnicator();
}

for (int i = 0; i < BIG_NUMBER; i++) {
    f.apply(someValue);
}
Method calls are pretty highly optimized on most modern JVMs even (or especially) if there are only a few possible call targets.
