String indexed collection in Java

String indexed collection in Java - java

Using Java, assuming v1.6.
I have a collection where the unique index is a string and the non-unique value is an int.
I need to perform thousands of lookups against this collection as fast as possible.
I currently am using a HashMap<String, Integer> but I am worried that the boxing/unboxing of the Integer to int is making this slower.
I had thought of using an ArrayList<String> coupled with an int[].
i.e. Instead of:
int value = (int) HashMap<String, Integer>.get("key");
I could do
int value = int[ArrayList<String>.indexOf("key")];
Any thoughts? Is there a faster way to do this?
p.s. I will only build the collection once and maybe modify it once but each time I will know the size so I can use String[] instead of ArrayList but not sure there's a faster way to replicate indexOf...

Unboxing is fast - no allocations are required. Boxing is a potentially slower, as it needs to allocate a new object (unless it uses a pooled one).
Are you sure you've really got a problem though? Don't complicate your code until you've actually proved that this is a significant hit. I very much doubt that it is.
There are collection libraries available for primitive types, but I would stick to the normal HashMap from the JRE until you've profiled and checked that this is causing a problem. If it really only is thousands of lookups, I very much doubt it'll be a problem at all. Likewise if you're lookup-based rather than addition-based (i.e. you fetch more often than you add) then the boxing cost won't be particularly significant, just unboxing, which is cheap.
I would suggest using intValue() rather than the cast to convert the value to an int though - it makes it clearer (IMO) what's going on.
EDIT: To answer the question in the comment, HashMap.get(key) will be faster than ArrayList.indexOf(key) when the collection is large enough. If you've actually only got five items, the list may well be faster. I assume that's not really the case though.
If you really, really don't want the boxing/unboxing, try Trove (TObjectHashMap). There's also COLT to consider, but I couldn't find the right type in there.

Any performance gain that you get from not having to box/unbox is significanlty erased by the for loop that you need to go with the indexOf method.
Use the HashMap. Also you don't need the (int) cast, the compiler will take care of it for you.
The array thing would be ok with a small number of items in the array, but then so is the HashMap...
The only way you could make it fast to look up in an array (and this is not a real suggestion as it has too many issues) is if you use the hashCode of the String to work with as the index into the array - don't even think about doing that though! (I only mention it because you might find something via google that talks about it... if they don't explain why it is bad don't read any more about it!)

I would guess that the HashMap would give a much faster lookup, but I think this needs some benchmarking to answer correctly.
EDIT: Furthermore, There is no boxing involved, merely unboxing of the already-stored objects, which should be pretty fast, since no object allocation is done in that step. So, I don't think this would give you any more speed, but you should run benchmarks nonetheless.

I think scanning your ArrayList to find the match for your "key" is going to be much slower than your boxing/unboxing concerns.

Since you say it is indeed a bottleneck, I'll suggest Primitive Collections for Java; in particular, the ObjectKeyIntMap looks like exactly what you want.

If the cost of building the map once and once only doesn't matter, you might want to look at perfect hashing, for example Bob Jenkins' code.

One slight problem here: You can have duplicate elements in a List. If you really want to do it the second way, consider using a Set instead.
Having said that, have you done a performance test on the two to see if one is faster than the other?
Edit: Of course, the most popular Set type (HashSet) is itself backed by a HashMap, so switching to a set may not be such a wise change after all.

List.indexOf will do a linear scan of the list - O(n) typically. A binary search will do the job in O(log n). A hash table will do it in O(1).
Having large numbers of Integer objects in memory could be a problem. But then the same is true for Strings (both the String and char[]). You could do you own custom DB-style implementation, but I suggest benchmarking first.

The map access does not do unboxing for the lookup, only the later access to the result makes it slow.
I suggest to introduce a small wrapper with a getter for the int, such as SimpleInt. It holds the int without conversion. The constructor is not expensive and overall is is cheaper than an Integer.
public SimpleInt
{
private final int data;
public SimpleInt(int i)
{
data = i;
}
// getter here
....
}

Related

Is there a reason to use a mapping of string => index into a vector, instead of string => object?

If I have a collection of objects that I'd like to be able to look up by name, I could of course use a { string => object } map.
Is there ever a reason to use a vector of objects along with a { string => index into this vector } companion map, instead?
I've seen a number of developers do this over the years, and I've largely dismissed it as an indication that the developer is unfamiliar with maps, or is otherwise confused. But in recent days, I've started second-guessing myself, and I'm concerned I might be missing a potential optimization or something, though I can't for the life of me figure out what that could optimize.

There is one reason I can think of:
Besides looking up object by name, sometimes you also want to iterate through all the objects as efficient as possible. Using a map + vector can achieve this. You pay a very small penalty for accessing the vector via index, but you could gain a big performance improvement by iterating a vector rather than a map (because vector is in continuous memory and more cache friendly).
Of course you can do similar thing using boost::multiindex, but that has some restrictions on the object itself.

I can think of at least a couple reasons:
For unrelated reasons you need to retain insertion order.
You want to have multiple maps pointing into the vector (different indexes).
Not all items in the vector need to be pointed to by a string.

There's no optimization. If you think about it, it might actually decrease performance (albeit by a few micronanoseconds). This is because the vector-based "solution" would require an extra step to find the object in the vector, while the non-vector-based solution doesn't have to do that.

Perform arithmetic on number array without iterating

For example, I would like to do something like the following in java:
int[] numbers = {1,2,3,4,5};
int[] result = numbers*2;
//result now equals {2,4,6,8,10};
Is this possible to do without iterating through the array? Would I need to use a different data type, such as ArrayList? The current iterating step is taking up some time, and I'm hoping something like this would help.

No, you can't multiply each item in an array without iterating through the entire array. As pointed out in the comments, even if you could use the * operator in such a way the implementation would still have to touch each item in the array.
Further, a different data type would have to do the same thing.

I think a different answer from the obvious may be beneficial to others who have the same problem and don't mind a layer of complexity (or two).
In Haskell, there is something known as "Lazy Evaluation", where you could do something like multiply an infinitely large array by two, and Haskell would "do" that. When you accessed the array, it would try to evaluate everything as needed. In Java, we have no such luxury, but we can emulate this behavior in a controllable manner.
You will need to create or extend your own List class and add some new functions. You would need functions for each mathematical operation you wanted to support. I have examples below.
LazyList ll = new LazyList();
// Add a couple million objects
ll.multiplyList(2);
The internal implementation of this would be to create a Queue that stores all the primitive operations you need to perform, so that order of operations is preserved. Now, each time an element is read, you perform all operations in the Queue before returning the result. This means that reads are very slow (depending on the number of operations performed), but we at least get the desired result.
If you find yourself iterating through the whole array each time, it may be useful to de-queue at the end instead of preserving the original values.
If you find that you are making random accesses, I would preserve the original values and returned modified results when called.
If you need to update entries, you will need to decide what that means. Are you replacing a value there, or are you replacing a value after the operations were performed? Depending on your answer, you may need to run backwards through the queue to get a "pre-operations" value to replace an older value. The reasoning is that on the next read of that same object, the operations would be applied again and then the value would be restored to what you intended to replace in the list.
There may be other nuances with this solution, and again the way you implement it would be entirely different depending on your needs and how you access this (sequentially or randomly), but it should be a good start.

With the introduction of Java 8 this task can be done using streams.
private long tileSize(int[] sizes) {
return IntStream.of(sizes).reduce(1, (x, y) -> x * y);
}

No it isn't. If your collection is really big and you want to do it faster you can try to operates on elements in 2 or more threads, but you have to take care of synchronization(use synchronized collection) or divide your collection to 2(or more) collections and in each thread operate on one collection. I'm not sure wheather it will be faster than just iterating through the array - it depends on size of your collection and on what you want to do with each element. If you want to use this solution you will have wheather is it faster in your case - it might be slower and definitely it will be much more complicated.
Generally - if it's not critical part of code and time of execution isn't too long i would leave it as it is now.

Java /Android what is faster than ArrayList?

What is faster than ArrayList<String> in Java ? I have a list which is of an undefined length. (sometimes 4 items, sometimes 100).
What is the FASTEST way to add and get it from any list ? arrayList.add(string) and get() are very slow.
Is there a better way for this? (string s[] and then copyArray are the slowest?)

Faster for what?
"basically arraylist.add(string) and get() is very slow." - based on what evidence? And compared to what? (No need for the word 'basically' here - it's a high tech "um".) I doubt that ArrayList is the issue with your app. Profiling your code is the only way to tell whether or not you're just guessing and grasping at straws.
Even an algorithm that's O(n^2) is likely to be adequate if the data set is small.
You have to understand the Big-Oh behavior of different data structures to answer this question. Adding to the end of an ArrayList is pretty fast, unless you have to resize it. Adding in the middle may take longer.
LinkedList will be faster to add in the middle, but you'll have to iterate to get to a particular element.

Both add() to end of list and get() should run in O(1). And since length is undefined, you can't use a fixed length array. You can't do any better I'm afraid.
add(int index, E element) takes linear time for worst case though if that's why you think it's slow. If that is the case, either use Hashtable (insertion takes constant time) or TreeMap (insertion takes logarithmic time).

100 items is not very many. Your bottleneck is elsewhere.

Take a look the Jodd Utilities. They have some collections that implement ArrayList but on primatives (jodd/util/collection/), such as IntArrayList. So if you're creating a ArrayList of int, float, double, etc.. it will be faster and consume less memory.
Even faster than that is what they call a FastBuffer, which excels at add() and can provide a get() at O(1).
The classes have little interdependency, so it's easy to just drop in the class you need into your code.

You can use javolution library. http://javolution.org
http://javolution.org/target/site/apidocs/javolution/util/FastList.html
ist much faster than arraylist ;)

Try to use hashtable it is much faster

Are there reasons to prefer Arrays over ArrayLists? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Benefits of arrays
Hey there,
are there any reasons to prefer Arrays (MyObject[]) over ArrayLists (List<MyObject>)? The only left place to use Arrays is for primitive data types (int, boolean, etc.). However I have no reasonable explanation for this, it just makes the code a little bit slimmer.
In general I use List in order to maintain a better flexibility. But are there reasons left to use real Arrays?
I would like to know,
best regards

I prefer to use Arrays over ArrayLists whenever I know I am only going to work with a fixed number of elements. My reasons are mostly subjective, but I'm listing them here anyway:
Using Collection classes for primitives is appreciably slower since they have to use autoboxing and wrappers.
I prefer the more straightforward [] syntax for accessing elements over ArrayList's get(). This really becomes more important when I need multidimensional arrays.
ArrayLists usually allocate about twice the memory you need now in advance so that you can append items very fast. So there is wastage if you are never going to add any more items.
(Possibly related to the previous point) I think ArrayList accesses are slower than plain arrays in general. The ArrayList implementation uses an underlying array, but all accesses have to go through the get(), set(), remove(), etc. methods which means it goes through more code than a simple array access. But I have not actually tested the difference so I may be wrong.
Having said that, I think the choice actually depends on what you need it for. If you need a fixed number of elements or if you are going to use multiple dimensions, I would suggest a plain array. But if you need a simple random access list and are going to be making a lot of inserts and removals to it, it just makes a lot more sense to use an Arraylist

Generally arrays have their problems, e.g. type safety:
Integer[] ints = new Integer[10];
Number[] nums = ints; //that shouldn't be allowed
nums[3] = Double.valueOf[3.14]; //Ouch!
They don't play well with collections, either. So generelly you should prefer Collections over arrays. There are just a few things where arrays may be more convenient. As you already say primitive types would be a reason (although you could consider using collection-like libs like Trove). If the array is hidden in an object and doesn't need to change its size, it's OK to use arrays, especially if you need all performance you can get (say 3D and 4D Vectors and Matrices for 3D graphics). Another reason for using arrays may be if your API has lots of varargs methods.
BTW: There is a cute trick using an array if you need mutable variables for anonymous classes:
public void f() {
final int[] a = new int[1];
new Thread(new Runnable() {
public void run() {
while(true) {
System.out.println(a[0]++);
}
}
}).start();
}
Note that you can't do this with an int variable, as it must be final.

I think that the main difference of an array and a list is, that an array has a fixed length. Once it's full, it's full. ArrayLists have a flexible length and do use arrays to be implemented. When the arrayList is out of capacity, the data gets copied to another array with a larger capacity (that's what I was taught once).
An array can still be used, if you have your data length fixed. Because arrays are pretty primitive, they don't have much methods to call and all. The advantage of using these arrays is not so big anymore, because the arrayLists are just good wrappers for what you want in Java or any other language.
I think you can even set a fixed capacity to arraylists nowadays, so even that advantage collapses.
So is there any reason to prefer them? No probably not, but it does make sure that you have just a little more space in your memory, because of the basic functionality. The arraylist is a pretty big wrapper and has a lot of flexibility, what you do not always want.

one for you:
sorting List (via j.u.Collections) are first transformed to [], then sorted (which clones the [] once again for the merge sort) and then put back to List.
You do understand that ArrayList has a backing Object[] under the cover.
Back in the day there was a case ArrayList.get was not inlined by -client hotspot compiler but now I think that's fixed. Thus, performance issue using ArrayList compared to Object[] is not so tough, the case cast to the appropriate type still costs a few clocks (but it should be 99.99% of the times predicted by the CPU); accessing the elements of the ArrayList may cost one more cache-miss more and so (or the 1st access mostly)
So it does depend what you do w/ your code in the end.
Edit
Forgot you can have atomic access to the elements of the array (i.e. CAS stuff), one impl is j.u.c.atomic.AtomicReferenceArray. It's not the most practical ones since it doesn't allow CAS of Objec[][] but Unsafe comes to the rescue.

Java Performance - ArrayLists versus Arrays for lots of fast reads

I have a program where I need to make 100,000 to 1,000,000 random-access reads to a List-like object in as little time as possible (as in milliseconds) for a cellular automata-like program. I think the update algorithm I'm using is already optimized (keeps track of active cells efficiently, etc). The Lists do need to change size, but that performance is not as important. So I am wondering if the performance from using Arrays instead of ArrayLists is enough to make a difference when dealing with that many reads in such short spans of time. Currently, I'm using ArrayLists.
Edit:
I forgot to mention: I'm just storing integers, so another factor is using the Integer wrapper class (in the case of ArrayLists) versus ints (in the case of arrays). Does anyone know if using ArrayList will actually require 3 pointer look ups (one for the ArrayList, one for the underlying array, and one for the Integer->int) where as the array would only require 1 (array address+offset to the specific int)? Would HotSpot optimize the extra look ups away? How significant are those extra look ups?
Edit2:
Also, I forgot to mention I need to do random access writes as well (writes, not insertions).

Now that you've mentioned that your arrays are actually arrays of primitive types, consider using the collection-of-primitive-type classes in the Trove library.
#viking reports significant (ten-fold!) speedup using Trove in his application - see comments. The flip-side is that Trove collection types are not type compatible with Java's standard collection APIs. So Trove (or similar libraries) won't be the answer in all cases.

Try both, but measure.
Most likely you could hack something together to make the inner loop use arrays without changing all that much code. My suspicion is that HotSpot will already inline the method calls and you will see no performance gain.
Also, try Java 6 update 14 and use -XX:+DoEscapeAnalysis

ArrayLists are slower than Arrays, but most people consider the difference to be minor. In your case could matter though, since you're dealing with hundreds of thousands of them.
By the way, duplicate: Array or List in Java. Which is faster?

I would go with Kevin's advise.
Stay with the lists first and measure your performance if your programm is to slow compare it to a version with an array. If that gives you a measurable performance boost go with the arrays, if not stay with the lists because they will make your life much much easier.

There will be an overhead from using an ArrayList instead of an array, but it is very likely to be small. In fact, the useful bit of data in the ArrayList can be stored in registers, although you will probably use more (List size for instance).
You mention in your edit that you are using wrapper objects. These do make a huge difference. If you are typically using the same value repeatedly, then a sensible cache policy may be useful (Integer.valueOf gives the same results for -128 to 128). For primitives, primitive arrays usually win comfortably.
As a refinement, you might want to make sure the adjacent cells tend to be adjacent in the array (you can do better than rows of columns with a space filling curve).

One possibility would be to re-implement ArrayList (it's not that hard), but expose the backing array via a lock/release call cycle. This gets you convenience for your writes, but exposes the array for a large series of read/write operations that you know in advance won't impact the array size. If the list is locked, add/delete is not allowed - just get/set.
for example:
SomeObj[] directArray = myArrayList.lockArray();
try{
// myArrayList.add(), delete() would throw an illegal state exception
for (int i = 0; i < 50000; i++){
directArray[i] += 1;
}
} finally {
myArrayList.unlockArray();
}
This approach continues to encapsulate the array growth/etc... behaviors of ArrayList.

Java uses double indirection for its objects so they can be moved about in memory and have its references still be valid, this means every reference lookup is equivalent to two pointer lookups. These extra lookups cannot be optimised away completely.
Perhaps even worse is your cache performance will be terrible. Accessing values in cache is goings to be many times faster than accessing values in main memory. (perhaps 10x) If you have an int[] you know the values will be consecutive in memory and thus load into cache readily. However, for Integer[] the Integers individual objects can appear randomly across your memory and are much more likely to be cache misses. Also Integer use 24 bytes which means they are much less likely to fit into your caches than 4 byte values.
If you update an Integer, this often results in a new object created which is many orders of magnitude than updating an int value.

If you're creating the list once, and doing thousands of reads from it, the overhead from ArrayList may well be slight enough to ignore. If you're creating thousands of lists, go with the standard array. Object creation in a loop quickly goes quadratic, simply because of all the overhead of instantiating the member variables, calling the constructors up the inheritance chain, etc.
Because of this -- and to answer your second question -- stick with standard ints rather than the Integer class. Profile both and you'll quickly (or, rather, slowly) see why.

If you're not going to be doing a lot more than reads from this structure, then go ahead and use an array as that would be faster when read by index.
However, consider how you're going to get the data in there, and if sorting, inserting, deleting, etc, are a concern at all. If so, you may want to consider other collection based structures.

Primitives are much (much much) faster. Always. Even with JIT escape analysis, etc. Skip wrapping things in java.lang.Integer. Also, skip the array bounds check most ArrayList implementations do on get(int). Most JIT's can recognize simple loop patterns and remove the loop, but there isn't much reason to much with it if you're worried about performance.
You don't have to code primitive access yourself - I'd bet you could cut over to using IntArrayList from the COLT library - see http://acs.lbl.gov/~hoschek/colt/ - "Colt provides a set of Open Source Libraries for High Performance Scientific and Technical Computing in Java") - in a few minutes of refactoring.

The options are:
1. To use an array
2. To use the ArrayList which internally uses an array
It is obvious the ArrayList introduces some overhead (look into ArrayList source code). For the 99% of the use cases this overhead can be easily ignored. However if you implement time sensitive algorithms and do tens of millions of reads from a list by index then using bare arrays instead of lists should bring noticeable time savings. USE COMMON SENSE.
Please take a look here: http://robaustin.wikidot.com/how-does-the-performance-of-arraylist-compare-to-array I would personally tweak the test to avoid compiler optimizations, e.g. I would change "j = " into "j += " with the subsequent use of "j" after the loop.

An Array will be faster simply because at a minimum it skips a function call (i.e. get(i)).
If you have a static size, then Arrays are your friend.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.