What is faster than ArrayList<String> in Java? I have a list of undefined length (sometimes 4 items, sometimes 100).
What is the FASTEST way to add to and get from any list? arrayList.add(string) and get() are very slow.
Is there a better way to do this? (Or are a raw String s[] and copying arrays by hand the slowest option?)
Faster for what?
"basically arraylist.add(string) and get() is very slow." - based on what evidence? And compared to what? (No need for the word 'basically' here - it's a high tech "um".) I doubt that ArrayList is the issue with your app. Profiling your code is the only way to tell whether or not you're just guessing and grasping at straws.
Even an algorithm that's O(n^2) is likely to be adequate if the data set is small.
You have to understand the Big-Oh behavior of different data structures to answer this question. Adding to the end of an ArrayList is pretty fast, unless you have to resize it. Adding in the middle may take longer.
LinkedList will be faster to add in the middle, but you'll have to iterate to get to a particular element.
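If the add cost you are seeing really is resizing, here is a minimal sketch (the class name and sizes are my own, purely illustrative) of presizing the list, which skips the grow-and-copy steps that happen each time the backing array fills up:

import java.util.ArrayList;
import java.util.List;

public class PresizeDemo {
    public static void main(String[] args) {
        int expected = 100; // hypothetical upper bound on the list size
        // Presizing avoids the intermediate array copies during growth.
        List<String> list = new ArrayList<String>(expected);
        for (int i = 0; i < expected; i++) {
            list.add("item" + i); // amortized O(1) append
        }
        System.out.println(list.get(42)); // O(1) random access
    }
}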
Both add() to the end of the list and get() run in amortized O(1). And since the length is undefined, you can't use a fixed-length array. You can't do much better, I'm afraid.
add(int index, E element) takes linear time in the worst case, though, if that's why you think it's slow. If that is the case, use either a Hashtable (insertion takes constant time) or a TreeMap (insertion takes logarithmic time).
100 items is not very many. Your bottleneck is elsewhere.
Take a look at the Jodd utilities. They have some collections that work like ArrayList but hold primitives (jodd/util/collection/), such as IntArrayList. So if you're creating a list of int, float, double, etc., it will be faster and consume less memory.
Even faster than that is what they call a FastBuffer, which excels at add() and can provide get() in O(1).
The classes have little interdependency, so it's easy to just drop in the class you need into your code.
You can use the Javolution library. http://javolution.org
http://javolution.org/target/site/apidocs/javolution/util/FastList.html
It is much faster than ArrayList ;)
Try using a Hashtable; it is much faster.
Related
Does anyone know of something similar to ArrayList that is better geared to handling really large amounts of data as quickly as possible?
I've got a program with a really large ArrayList that's getting choked up when it tries to explore or modify the ArrayList.
Presumably when you do:
//i is an int;
arrayList.remove(i);
The code behind the scenes runs something like:
public T remove(int i) {
    // Let's say ArrayList stores its data in a T[] array called "contents".
    T output = contents[i];
    @SuppressWarnings("unchecked")
    T[] overwrite = (T[]) new Object[contents.length - 1];
    // (Yes, I know generic arrays can't be created directly. Bear with me here...)
    for (int x = 0; x < i; x++) {
        overwrite[x] = contents[x];
    }
    for (int x = i + 1; x < contents.length; x++) {
        overwrite[x - 1] = contents[x];
    }
    contents = overwrite;
    return output;
}
When the size of the ArrayList is a couple million units or so, all those cycles rearranging the positions of items in the array would take a lot of time.
I've tried to alleviate this problem by creating my own custom ArrayList subclass which segments its data storage into smaller ArrayLists. Any process that requires the ArrayList to scan its data for a specific item spawns a new search thread for each of the smaller ArrayLists within (to take advantage of my multiple CPU cores).
But this system doesn't work: when the thread calling the search holds a lock on an item in any of the ArrayLists, it can block those separate search threads from completing their search, which in turn locks up the original thread that called the search, essentially deadlocking the whole program.
I really need some kind of data storage class oriented to containing and manipulating large amounts of objects as quickly as the PC is capable.
Any ideas?
I really need some kind of data storage class oriented to containing and manipulating large amounts of objects as quickly as the PC is capable.
The answer depends a lot on what sort of data you are talking about and the specific operations you need. You use the word "explore" without defining it.
If you are talking about looking up a record then nothing beats a HashMap – ConcurrentHashMap for threaded operation. If you are talking about keeping in order, especially when dealing with threads, then I'd recommend a ConcurrentSkipListMap which has O(logN) lookup, insert, remove, etc..
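As a minimal sketch of that recommendation (the keys and values here are invented):

import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListMapDemo {
    public static void main(String[] args) {
        ConcurrentSkipListMap<String, Integer> map =
                new ConcurrentSkipListMap<String, Integer>();
        map.put("banana", 2);  // O(log N) insert
        map.put("apple", 1);
        map.put("cherry", 3);
        map.remove("banana");  // O(log N) remove
        // Iteration is in ascending key order, and is safe even
        // while other threads are modifying the map.
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}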
You may also want to consider using multiple collections. You need to be careful that the collections don't get out of sync, which can be especially challenging with threads, but that might be faster depending on the various operations you are making.
When the size of the ArrayList is a couple million units or so, all those cycles rearranging the positions of items in the array would take a lot of time.
As mentioned ConcurrentSkipListMap is O(logN) for rearranging an item. i.e. remove and add with new position.
The [ArrayList.remove(i)] code behind the scenes runs something like: ...
Well, not really. You can look at the code in the JDK, right? ArrayList uses System.arraycopy(...) for these sorts of operations. That may not be efficient enough for your case, but it is a single optimized O(N) copy, not the allocate-and-loop routine you sketched.
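For illustration, a simplified sketch of what that shifting looks like (my paraphrase of the idea, not the verbatim OpenJDK source):

import java.util.Arrays;

public class RemoveSketch {
    // Roughly what ArrayList.remove(int) does: one System.arraycopy
    // shifts the tail left; no new backing array is allocated.
    static void removeAt(Object[] contents, int size, int index) {
        int numMoved = size - index - 1;
        if (numMoved > 0) {
            System.arraycopy(contents, index + 1, contents, index, numMoved);
        }
        contents[size - 1] = null; // clear the vacated slot for the GC
    }

    public static void main(String[] args) {
        Object[] a = {"a", "b", "c", "d"};
        removeAt(a, 4, 1);
        System.out.println(Arrays.toString(a)); // [a, c, d, null]
    }
}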
One example of a good use for a linked list is where the list elements are very large, i.e. large enough that only one or two can fit in the CPU cache at the same time. At that point the advantage that contiguous-block containers like vectors or arrays have for iteration is more or less nullified, and a performance advantage may be possible if many insertions and removals are occurring in real time.
ref: Under what circumstances are linked lists useful?
ref : https://coderanch.com/t/508171/java/Collection-datastructure-large-data
Different collection types have different time complexities for various operations. Typical complexities are O(1), O(N), and O(log(N)). To choose a collection, you first need to decide which operation you use most often, and avoid collections that have O(N) complexity for that operation. Here you often use ArrayList.remove(i), which is O(N). Even worse, you use remove(i) and not remove(element). If removing a known element were the only frequent operation, LinkedList could help: removing through a ListIterator already positioned at the element is O(1). But LinkedList.remove(i) and LinkedList.remove(element) are also O(N), since they must first traverse to the position.
I doubt that a List with remove(i) complexity of O(1) can be implemented. The best achievable is O(log(N)), which is still definitely better than O(N). The Java standard library has no such implementation; try searching for "binary indexed tree" to find one.
But the first thing I would do is to review the algorithm and try to get rid of List.remove(i) operation.
I am using Java. I want to add to the start of an array. Would it be more efficient to move all the elements up one space in the array, leaving one spot for a new element to be added at index 0, or to just use an ArrayList?
I am aware an ArrayList will move the values for me, but I have heard that ArrayLists are very inefficient; is this true?
Are there any other APIs that will do this efficiently?
Apart from the method call overhead and some small maintenance cost, ArrayList is no more inefficient than copying array elements yourself. Some implementations of ArrayList may even be faster at moving data, by allowing the list to start somewhere else in the backing array than at index 0, as ArrayDeque does.
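A minimal sketch of the ArrayDeque approach mentioned above (the element values are invented):

import java.util.ArrayDeque;
import java.util.Deque;

public class AddFirstDemo {
    public static void main(String[] args) {
        Deque<String> deque = new ArrayDeque<String>();
        // addFirst is amortized O(1): the backing array is circular,
        // so nothing needs to be shifted.
        deque.addFirst("third");
        deque.addFirst("second");
        deque.addFirst("first");
        System.out.println(deque); // [first, second, third]
    }
}

The trade-off is that ArrayDeque offers no get(int index), so it only helps when you consume elements by iteration or from the ends.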
Neither would be efficient, because each insertion at the beginning needs to move everything you've added so far. This means that inserting N elements takes O(N^2) time, which is rather inefficient.
LinkedList<T>s are better for situations when you need to insert at the beginning of the list. However, they have memory overhead, and do not allow fast lookup based on the index.
If you do not need to use your list until after all elements have been inserted, you may be better off inserting elements at the back of the list, and then reversing the list before starting to use it.
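A tiny sketch of that add-then-reverse trick (the values are invented):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReverseTrickDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<Integer>();
        for (int i = 0; i < 5; i++) {
            list.add(i);           // O(1) amortized append
        }
        Collections.reverse(list); // one O(N) pass at the end
        System.out.println(list);  // [4, 3, 2, 1, 0]
    }
}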
ArrayList also uses an array internally to store the data, but Sun/Oracle implemented the shift efficiently: adding an item at index 0 moves the items starting from index 1 with a single System.arraycopy. So prefer ArrayList for simpler code; but if you can devise a better algorithm, go for a raw array.
If you add at the first index very frequently, it will be very expensive, as the list needs to shift every element from index 1 to the end of the array each time, and to resize itself when a new element no longer fits.
LinkedLists provide better performance in such cases, but they do not implement random-access behaviour.
ArrayList provides enough performance for normal usage, and what's even more important - they are safe. So you don't need to worry about getting out-of-bounds, null-pointers etc.
To make it "faster" you can, for example, get rid of ArrayList's checking capacity etc., but then you are making your code unsafe, which means you must be sure you are setting the right parameters, because if not you will be getting IndexOutOfBounds etc.
You can read a very interesting post about Trove - using primitive collections for performance, for more information.
But 99 times out of 100, there is no real need. Remember and repeat after me:
Premature optimization is the root of all evil.
Besides, I really recommend checking out the JDK source code yourself. You can learn a lot and, obviously, see how it's made.
I need a sorted list in a scenario dominated by iteration (compared to insert/remove, not random get at all). For this reason I thought about using a skip list compared to a tree (the iterator should be faster).
The problem is that Java 6 only has a concurrent implementation of a skip list, so I was wondering whether it makes sense to use it in a non-concurrent scenario or whether the overhead makes it the wrong decision.
As far as I know, the ConcurrentSkipList* classes are basically lock-free implementations based on CAS, so they should not carry (much) overhead, but I wanted to hear somebody else's opinion.
EDIT:
After some micro-benchmarking (running iteration multiple times over different-sized TreeSets, LinkedLists, ConcurrentSkipLists and ArrayLists), it appears there's quite an overhead. ConcurrentSkipList stores its elements in a linked list internally, so the only reason it would be slower on iteration than a LinkedList is the aforementioned overhead.
If thread-safety's not required I'd say to skip package java.util.concurrent altogether.
What's interesting is that sometimes ConcurrentSkipList is slower than TreeSet on the same input and I haven't sorted out yet why.
I mean, have you seen the source code for ConcurrentSkipListMap? :-) I always have to smile when I look at it. It's 3000 lines of some of the most insane, scary, and at the same time beautiful code I've ever seen in Java. (Kudos to Doug Lea and co. for getting all the concurrency utils integrated so nicely with the collections framework!) Having said that, on modern CPUs the code and algorithmic complexity won't even matter so much. What usually makes more difference is having the data to iterate co-located in memory, so that the CPU cache can do its job better.
So in the end I'll wrap ArrayList with a new addSorted() method that does a sorted insert into the ArrayList.
Sounds good. If you really need to squeeze every drop of performance out of iteration you could also try iterating a raw array directly. Repopulate it upon each change, e.g. by calling TreeSet.toArray() or generating it then sorting it in-place using Arrays.sort(T[], Comparator<? super T>). But the gain could be tiny (or even nothing if the JIT does its job well) so it might not be worth the inconvenience.
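For what it's worth, a sketch of that snapshot idea, assuming the elements live in a TreeSet between changes:

import java.util.TreeSet;

public class SnapshotIterationDemo {
    public static void main(String[] args) {
        TreeSet<String> set = new TreeSet<String>();
        set.add("pear");
        set.add("apple");
        set.add("plum");

        // Rebuild the snapshot only when the set changes; toArray()
        // already yields sorted order, so no extra Arrays.sort is needed
        // (sorting in-place only matters if the source is unsorted).
        String[] snapshot = set.toArray(new String[0]);
        for (String s : snapshot) { // iterate the raw array
            System.out.println(s);
        }
    }
}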
As measured using Open JDK 6 on typical production hardware my company uses, you can expect all add and query operations on a skip-list map to take roughly double the time as the same operation on a tree map.
examples:
63 usec vs 31 usec to create and add 200 entries.
145 ns vs 77 ns for get() on that 200-element map.
And the ratio doesn't change all that much for smaller and larger sizes.
(The code for this benchmark will eventually be shared so you can review it and run it yourself; sorry we're not ready to do that yet.)
Well, you can use a lot of other structures to build a skip list. It lives in the concurrent package because concurrent data structures are a lot more complicated, and because a concurrent skip list costs less than mimicking a skip list with other concurrent data structures.
In a single-threaded world things are different: you can use a sorted set, a binary tree, or a custom data structure that would perform better than a concurrent skip list.
The complexity of iterating a tree or a skip list will always be O(n). But if you instead use a linked list or an array list, you have a problem with insertion: inserting an item in the right position (to keep the list sorted) has O(n) complexity, instead of O(log n) for a binary tree or a skip list.
You can iterate over TreeMap.keySet() to obtain all inserted keys in order, and it will not be that slow.
There is also the TreeSet class, which is probably what you need, but since it is just a wrapper around TreeMap, using TreeMap directly would be slightly faster.
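A tiny sketch of that in-order iteration (the keys are invented):

import java.util.TreeMap;

public class TreeMapOrderDemo {
    public static void main(String[] args) {
        TreeMap<Integer, String> map = new TreeMap<Integer, String>();
        map.put(3, "c");
        map.put(1, "a");
        map.put(2, "b");
        // keySet() iterates in ascending key order.
        for (Integer key : map.keySet()) {
            System.out.println(key); // prints 1, 2, 3
        }
    }
}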
Without concurrency, it is usually more efficient to use a balanced binary search tree. In Java, this would be a TreeMap.
Skip lists are generally reserved for concurrent programming because of their ease of implementation and their speed in multithreaded applications.
You seem to have a good grasp of the trade-off here, so I doubt anyone can give you a definitive, principled answer. Fortunately, this is pretty straightforward to test.
I started by creating a simple Iterator<String> that loops indefinitely over a finite list of randomly generated strings. (That is: on initialization, it generates an array _strings of a random strings, each of length b, drawn from a pool of c distinct characters. The first call to next() returns _strings[0], the next call returns _strings[1] … the (a+1)th call returns _strings[0] again.) The strings returned by this iterator were what I used in all calls to SortedSet<String>.add(...) and SortedSet<String>.remove(...).
I then wrote a test method that accepts an empty SortedSet<String> and loops d times. On each iteration, it adds e elements, then removes f elements, then iterates over the entire set. (As a sanity check, it keeps track of the set's size by using the return values of add() and remove(), and when it iterates over the entire set, it makes sure it finds the expected number of elements. Mostly I did that just so there would be something in the body of the loop.)
I don't think I need to explain what my main(...) method does. :-)
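(For concreteness, here is a minimal sketch of the harness just described. It is not the exact code I ran; the parameter values in main are invented, and a real benchmark would need warm-up and multiple runs.)

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

public class SortedSetBench {
    // Cycles forever over a fixed pool of a random strings, each of
    // length b, drawn from c distinct characters.
    static Iterator<String> randomStrings(int a, int b, int c, long seed) {
        Random rnd = new Random(seed);
        final String[] pool = new String[a];
        for (int i = 0; i < a; i++) {
            StringBuilder sb = new StringBuilder(b);
            for (int j = 0; j < b; j++) {
                sb.append((char) ('a' + rnd.nextInt(c)));
            }
            pool[i] = sb.toString();
        }
        return new Iterator<String>() {
            private int next = 0;
            public boolean hasNext() { return true; }
            public String next() { return pool[next++ % pool.length]; }
            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    // Loops d times; each iteration adds e elements, removes f, then
    // walks the whole set, sanity-checking the tracked size.
    static void run(SortedSet<String> set, Iterator<String> in,
                    int d, int e, int f) {
        int size = 0;
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < e; j++) if (set.add(in.next())) size++;
            for (int j = 0; j < f; j++) if (set.remove(in.next())) size--;
            int seen = 0;
            for (String s : set) seen++;
            if (seen != size) throw new AssertionError("lost count");
        }
    }

    public static void main(String[] args) {
        int a = 5000, b = 6, c = 40, d = 1000, e = 4, f = 2;
        List<SortedSet<String>> sets = new ArrayList<SortedSet<String>>();
        sets.add(new TreeSet<String>());
        sets.add(new ConcurrentSkipListSet<String>());
        for (SortedSet<String> set : sets) {
            long t0 = System.nanoTime();
            run(set, randomStrings(a, b, c, 42L), d, e, f);
            System.out.printf("%s: %.1f ms%n",
                    set.getClass().getSimpleName(),
                    (System.nanoTime() - t0) / 1e6);
        }
    }
}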
I tried various values for the various parameters, and I found that sometimes ConcurrentSkipListSet<String> performed better, and sometimes TreeSet<String> did, but the difference was never much more than twofold. In general, ConcurrentSkipListSet<String> performed better when:
a, b, and/or c were relatively large. (I mean, within the ranges I tested. My a's ranged from 1000 to 10000, my b's from 3 to 10, my c's from 10 to 80. Overall, the resulting set-sizes ranged from around 450 to exactly 10000, with modes of 666 and 6666 because I usually used e=2f.) This suggests that ConcurrentSkipListSet<String> copes somewhat better than TreeSet<String> with larger sets, and/or with more-expensive string-comparisons. Trying specific values designed to tease apart these two factors, I got the impression that ConcurrentSkipListSet<String> coped noticeably better than TreeSet<String> with larger sets, and slightly less well with more-expensive string-comparisons. (That's basically what you'd expect; TreeSet<String>'s binary-tree approach aims to do the absolute minimum possible number of comparisons.)
e and f were small; that is, when I called add(...)s and remove(...)s only a small number of times per iteration. (This is exactly what you predicted.) The exact turn-over point depended on a, b, and c, but to a first approximation, ConcurrentSkipListSet<String> performed better when e + f was less than around 10, and TreeSet<String> performed better when e + f was more than around 20.
Of course, this was on a machine that may look nothing like yours, using a JDK that may look nothing like yours, and using very artificial data that might look nothing like yours. I'd recommend that you run your own tests. Since Tree* and ConcurrentSkipList* both implement Sorted*, you should have no difficulty trying your code both ways and seeing what you find.
For what I know ConcurrentSkipList* are basically lock-free implementations based on CAS, so they should not carry (much) overhead, […]
My understanding is that this will depend on the machine. On some systems a lock-free implementation may not be possible, in which case these classes will have to use locks. (But since you're not actually multi-threading, even locks may not be all that expensive. Synchronization has overhead, of course, but its main cost is lock contention and forced single-threading. That isn't an issue for you. Again, I think you'll just have to test and see how the two versions perform.)
As noted, a skip list has a lot of overhead compared to a TreeMap, and the TreeMap iterator isn't well suited to your use case because it just repeatedly calls the successor() method, which turns out to be very slow.
So one alternative that will be significantly faster than the previous two is to write your own TreeMap iterator. Actually, I would dump TreeMap altogether, since 3000 lines of code is bulkier than you probably need, and just write a clean AVL tree implementation with the methods you need. The basic AVL logic is just a few hundred lines of code in any language; then add the iterator that works best in your case.
I am using a Vector of objects. My issue is that removal from the Vector is an expensive operation (O(n^2)). What would be a replacement for Vector in Java? In my use case, additions and removals happen extensively.
(I am a C++ person and don't know much Java.)
Well, the Vector class shouldn't be used. There are many containers available in Java. A few of them:
ArrayList is good for random access, but is bad for inserting or removing from the middle of the list.
LinkedList is bad for random access, but is fairly good for iterating and for adding/removing elements in the middle of the container.
You can use ArrayList instead of vector in Java.
Check out this article:
http://www.javaworld.com/javaworld/javaqa/2001-06/03-qa-0622-vector.html
LinkedList can add/remove items at either end in O(1).
First of all, Vector's removal time complexity is O(n), not O(n^2). If you want a more performant class, consider LinkedList: removing an element you have already reached through an iterator takes constant time.
Maybe a list is not the ideal data structure for your use case. Would you be better off using a HashSet, if the ordering of elements is not important?
Actually, the difference between Vector and ArrayList is that Vector is synchronized whereas ArrayList is not. Generally, you don't need synchronization and thus you'd use ArrayList (much like StringBuffer <-> StringBuilder).
The replacement mostly depends on how you intend to use the collection.
Adding objects to an ArrayList is quite fast: when more space is required, the backing array's size is normally doubled, and if you know the size requirements in advance, even better.
Removing from an ArrayList is O(n) but iteration and random access are fast.
If you have frequent add or remove operations and otherwise iterate over the list, a LinkedList would be fine.
You could even consider using a LinkedHashMap which allows fast access as well as preserves the order of insertion.
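A quick sketch of the LinkedHashMap option (the keys and values are invented):

import java.util.LinkedHashMap;
import java.util.Map;

public class LinkedHashMapDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<String, Integer>();
        map.put("first", 1);
        map.put("second", 2);
        map.put("third", 3);
        map.remove("second"); // O(1) removal by key
        // Iteration preserves insertion order: first, third
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}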
I think Vector uses System.arraycopy, whose complexity is O(n^2).
It is correct that Vector uses System.arraycopy to move the elements. However, the System.arraycopy() call copies at most Vector.size() elements, and hence is O(N), where N is the vector's size.
Hence O(N^2) is incorrect for a single insertion / removal.
In fact, if you want better than O(N) insertion and deletion, you will need to use some kind of linked list type with a cursor abstraction that allows insertion and deletion at "the current position". Even then you only get better than O(N) if you can do the insertions / deletions in the right order; i.e. not random order.
FWIW, the closest thing the Java List APIs provide to such a cursor is ListIterator, which does support add and remove at the current position; it is somewhat awkward to use, and only efficient in certain circumstances / implementations.
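A minimal sketch of O(1)-per-step removal through such a cursor while traversing a LinkedList:

import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class CursorRemovalDemo {
    public static void main(String[] args) {
        List<Integer> list = new LinkedList<Integer>();
        for (int i = 0; i < 10; i++) list.add(i);

        // Each remove() is O(1) because the iterator is already
        // positioned at the node; a naive list.remove(i) would
        // re-traverse from the head every time.
        for (ListIterator<Integer> it = list.listIterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        System.out.println(list); // [1, 3, 5, 7, 9]
    }
}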
Thanks to everyone for their contributions, which helped me to solve this problem. I used a circular queue implemented with the help of a Vector.
Using Java, assuming v1.6.
I have a collection where the unique index is a string and the non-unique value is an int.
I need to perform thousands of lookups against this collection as fast as possible.
I currently am using a HashMap<String, Integer> but I am worried that the boxing/unboxing of the Integer to int is making this slower.
I had thought of using an ArrayList<String> coupled with an int[].
i.e. Instead of:
int value = (int) hashMap.get("key"); // hashMap is the HashMap<String, Integer>
I could do:
int value = values[keys.indexOf("key")]; // keys is the ArrayList<String>, values the int[]
Any thoughts? Is there a faster way to do this?
p.s. I will only build the collection once and maybe modify it once but each time I will know the size so I can use String[] instead of ArrayList but not sure there's a faster way to replicate indexOf...
Unboxing is fast - no allocations are required. Boxing is potentially slower, as it needs to allocate a new object (unless it uses a pooled one).
Are you sure you've really got a problem though? Don't complicate your code until you've actually proved that this is a significant hit. I very much doubt that it is.
There are collection libraries available for primitive types, but I would stick to the normal HashMap from the JRE until you've profiled and checked that this is causing a problem. If it really only is thousands of lookups, I very much doubt it'll be a problem at all. Likewise if you're lookup-based rather than addition-based (i.e. you fetch more often than you add) then the boxing cost won't be particularly significant, just unboxing, which is cheap.
I would suggest using intValue() rather than the cast to convert the value to an int though - it makes it clearer (IMO) what's going on.
EDIT: To answer the question in the comment, HashMap.get(key) will be faster than ArrayList.indexOf(key) when the collection is large enough. If you've actually only got five items, the list may well be faster. I assume that's not really the case though.
If you really, really don't want the boxing/unboxing, try Trove (TObjectHashMap). There's also COLT to consider, but I couldn't find the right type in there.
Any performance gain that you get from not having to box/unbox is significantly erased by the linear scan that the indexOf method needs to perform.
Use the HashMap. Also you don't need the (int) cast, the compiler will take care of it for you.
The array thing would be ok with a small number of items in the array, but then so is the HashMap...
The only way you could make lookup in an array fast (and this is not a real suggestion, as it has too many issues) is to use the hashCode of the String as the index into the array. Don't even think about doing that, though! (I only mention it because you might find something via Google that talks about it... if they don't explain why it is bad, don't read any more about it!)
I would guess that the HashMap would give a much faster lookup, but I think this needs some benchmarking to answer correctly.
EDIT: Furthermore, there is no boxing involved, merely unboxing of the already-stored objects, which should be pretty fast, since no object allocation is done in that step. So I don't think this would give you any more speed, but you should run benchmarks nonetheless.
I think scanning your ArrayList to find the match for your "key" is going to be much slower than your boxing/unboxing concerns.
Since you say it is indeed a bottleneck, I'll suggest Primitive Collections for Java; in particular, the ObjectKeyIntMap looks like exactly what you want.
If the cost of building the map once and once only doesn't matter, you might want to look at perfect hashing, for example Bob Jenkins' code.
One slight problem here: You can have duplicate elements in a List. If you really want to do it the second way, consider using a Set instead.
Having said that, have you done a performance test on the two to see if one is faster than the other?
Edit: Of course, the most popular Set type (HashSet) is itself backed by a HashMap, so switching to a set may not be such a wise change after all.
List.indexOf will do a linear scan of the list - O(n) typically. A binary search will do the job in O(log n). A hash table will do it in O(1).
Having large numbers of Integer objects in memory could be a problem. But then the same is true for Strings (both the String and its char[]). You could do your own custom DB-style implementation, but I suggest benchmarking first.
The map access does not do any unboxing for the lookup; only the later access to the result makes it slow.
I suggest introducing a small wrapper with a getter for the int, such as a SimpleInt class. It holds the int without conversion. The constructor is not expensive, and overall it is cheaper than an Integer.
public class SimpleInt
{
    private final int data;

    public SimpleInt(int i)
    {
        data = i;
    }

    public int getData()
    {
        return data;
    }
}
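Hypothetical usage, assuming the getter is named getData() as above (my name for it; the original elided the getter):

import java.util.HashMap;
import java.util.Map;

public class SimpleIntUsage {
    public static void main(String[] args) {
        Map<String, SimpleInt> map = new HashMap<String, SimpleInt>();
        map.put("key", new SimpleInt(42));
        // No unboxing: the int is read straight out of the wrapper.
        int value = map.get("key").getData();
        System.out.println(value);
    }
}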