Most Lightweight Java Collection - java

If I am going to create a Java Collection, and only want to fill it with elements, and then iterate through it (without knowing the necessary size beforehand), i.e. all I need is Collection<E>.add(E) and Collection<E>.iterator(), which concrete class should I choose? Is there any advantage to using a Set rather than a List, for example? Which one would have the least overhead?

which concrete class should I choose?
I would probably just go with an ArrayList or a LinkedList. Both support the add and iterator methods, and neighter of them have any considerable overhead.
Is there any advantage to using a Set rather than a List, for example?
No, I wouldn't say so. (Unless you rely on the order of the elements, in which case you must use a List, or want to disallow duplicates, in which case you should use a Set.)
(I don't see how any Set implementation could beat a list implementation for add / iterator methods, so I'd probably go with a List even if I don't care about order.)
Which one would have the least overhead?
Sounds like micro benchmarking here, but if I'd be forced to guess, I'd say ArrayList (or perhaps LinkedList in coner cases where ArrayLists need to reallocate memory often :-)

Do not go with a Set. Sets and Lists differ according to their purpose, that you should always consider when choosing the right Collection
a List is there for maintaining elements in the order you added them; and if you insert the same element twice it will be kept twice
a Set is there for holding one specific element exactly once (uniqueness); order is only relevant for specific implementations (like TreeSet), but still elements that are 'the same' would not be added twice

Set is only meaningful if you want to sort your objects and to make sure no duplicate element is 'registered'. Else, an ArrayList is just fine.
However, if you want to add elements while iterating too, an ArrayBlockingQueue is better.

Here are some key points which can help you to choose your collection according to your requirement -
List(ArrayList or LinkedList)
Allowed duplicate values.
Insertion order preserved.
Set
Not allowed duplicate values.
Insertion order is not preserved.
So according to your requirement List seems to be a suitable choice.
Now Between ArrayList and LinkedList -
ArrayList is a random access list. Use if your frequent operation is the retrieval of elements.
LinkedList is the best option if you want to add or remove elements from the list.

Related

The right data structure for random acces and no duplicates

I need to store unique objects in some datastructure, but also I need access by index like in ArrayList or plain array. Maybe there is some elegant way to do this without using convertions from set to array, iteration through all elemnts, checking value while adding to ArrayList and others.
I would be grateful for any help and advices.
You should have a look at ListOrderedSet from Apache Commons Collections:
Decorates another Set to ensure that the order of addition is retained and used by the iterator.
If an object is added to the set for a second time, it will remain in the original position in the iteration. The order can be observed from the set via the iterator or toArray methods.
The ListOrderedSet also has various useful direct methods. These include many from List, such as get(int), remove(int) and indexOf(int). An unmodifiable List view of the set can be obtained via asList().
Make your own class containing a HashSet<T> and an ArrayList<T>. For the add/append operation, if the element is not already in the set, append it to the list and add it to the HashSet. You'll use about twice as much memory as a normal ArrayList, but you'll get O(1) random access and contains operations.

Java random insert collection

I am considering using a Java collection that would work best with random insertions. I will be inserting a lot and only read the collection once at the end.
My desired functionality is adding an element at a specified index, anywhere between <0, current_length>. Which collection would be the most efficient to use?
Useful link for your reference:
http://www.coderfriendly.com/wp-content/uploads/2009/05/java_collections_v2.pdf
Not entirely sure how you will be reading the information post input (and how important it is to you). Hashmap or ArrayList would make sense depending on what you are looking to do. Also not sure if you are looking for something thread safe or not.
Hope it helps.
The inefficiency of using List is endemic to the problem. Every time you add something, every subsequent element will have to be re-indexed - as the javadoc states:
Shifts the element currently at that position (if any) and any
subsequent elements to the right (adds one to their indices).
From your question/comments, it would appear that you have a bunch of Objects, and you're sorting them as you go. I'd suggest a more efficient solution to this problem would be to write a Comparator (or make your object implement Comparable), and then use Collections.sort(list, comparator) (or Collections.sort(list)).
You might suggest that your Objects are being sorted on the basis of other variables. In which case, you could create an extension of the Object, with those other variables as fields and extending Comparable, and with a method like getOriginal(). You add these wrapped objects to your list, sort, and then iterate through the list, adding the original objects (from getOriginal()) to a new list.
For info on the sorting algorithm of collections - see this SO question

Efficient java data structure to delete and retrieve information?

I have a situation where I have need a data structure that I can add strings to. This data structure is very large.
The specific qualities I need it have are:
get(index)
delete a certain number of entries that were added initially when the limit exceeds.(LIFO)
I've tried using an ArrayList but the delete operation is o(n) and for a linkedList the traverse or get() operation will be o(n).
What other options do I have?
circular buffer - one thats implemented with an array under the hood.
LinkedHashSet might be of interest. It is effectively a HashSet but it also maintains a LinkedList to allow a predictable iteration order - and therefore can also be used as a FIFO queue, with the nice added benefit that it can't contain duplicate entries.
Because it is a HashSet too, searches (as opposed to scans) can be O(1) if they can match on equals()
You can have a look at this question and this too.

Randomly getting elements in a HashMap or HashSet without looping

I have roughly 420,000 elements that I need to store easily in a Set or List of some kind. The restrictions though is that I need to be able to pick a random element and that it needs to be fast.
Initially I used an ArrayList and a LinkedList, however with that many elements it was very slow. When I profiled it, I saw that the equals() method in the object I was storing was called roughly 21 million times in a very short period of time.
Next I tried a HashSet. What I gain in performance I loose in functionality: I can't pick a random element. HashSet is backed by a HashMap which is backed by an array of HashMap.Entry objects. However when I attempted to expose them I was hindered by the crazy private and package-private visibility of the entire Java Collections Framework (even copying and pasting the class didn't work, the JCF is very "Use what we have or roll your own").
What is the best way to randomly select an element stored in a HashSet or HashMap? Due to the size of the collection I would prefer not to use looping.
IMPORTANT EDIT: I forgot a really important detail: exactly how I use the Collection. I populate the entire Collection at the begging of the table. During the program I pick and remove a random element, then pick and remove a few more known elements, then repeat. The constant lookup and changing is what causes the slowness
There's no reason why an ArrayList or a LinkedList would need to call equals()... although you don't want a LinkedList here as you want quick random access by index.
An ArrayList should be ideal - create it with an appropriate capacity, add all the items to it, and then you can just repeatedly pick a random number in the appropriate range, and call get(index) to get the relevant value.
HashMap and HashSet simply aren't suitable for this.
If ALL you need to do is get a large collection of values and pick a random one, then ArrayList is (literally) perfect for your needs. You won't get significantly faster (unless you went directly to primitive array, where you lose benefits of abstraction.)
If this is too slow for you, it's because you're using other operations as well. If you update your question with ALL the operations the collection must service, you'll get a better answer.
If you don't call contains() (which will call equals() many times), you can use ArrayList.get(randomNumber) and that will be O(1)
You can't do it with a HashMap - it stores the objects internally in an array, where the index = hashcode for the object. Even if you had that table, you'd need to guess which buckets contain objects. So a HashMap is not an option for random access.
Assuming that equals() calls are because you sort out duplicates with contains(), you may want to keep both a HashSet (for quick if-already-present lookup) and an ArrayList (for quick random access). Or, if operations don't interleave, build a HashSet first, then extract its data with toArray() or transform it into ArrayList with constructor of the latter.
If your problems are due to remove() call on ArrayList, don't use it and instead:
if you remove not the last element, just replace (with set()) the removed element with the last;
shrink the list size by 1.
This will of course screw up element order, but apparently you don't need it, judging by description. Or did you omit another important detail?

Complexity of processing a collection's values

I need to store a growing large number of objects in a collection. While performing actions of each object of the collection, I regularly need to check whether an object is already stored. If an object is not stored yet I will add it to the end of the collection. I process each object iteratively while doing the checks.
Objects already processed should not be removed from the collection because I do not want put them back to processing when I stumble upon them again.
As a result I do not know what collection may fit best. HashSet has a constant time "contains" method but a List has faster methods to iterate over its elements, right ?
What would be the wiser choice ? Would it be relevant to keep two different structures at a time containing the same nodes, a HashSet for the checks and a LinkedList for the processing ?
As a result I do not know what collection may fit best. HashSet has a constant time "contains" method but a List has faster methods to iterate over its elements, right ?
How about a LinkedHashSet?
Hash table and linked list implementation of the Set interface, with predictable iteration order. This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order)
1) Use ArrayList, not LinkedList. LinkedLists consume a lot of memory, and it's slower on iteration than ArrayList.
2) I'd suggest to use two data structures. E.g. for the sake of you being unable to add to a collection wile iterating through it (ConcurrentModificationException)
Well, it seems you are interested in two views on your collection.
A queue like view, adding things to the end and inspecting them at the front.
A contains check
All those operations are well supported in different kinds of heaps, e.g. java.util.PriorityQueue

Categories