I need to store unique objects in some datastructure, but also I need access by index like in ArrayList or plain array. Maybe there is some elegant way to do this without using convertions from set to array, iteration through all elemnts, checking value while adding to ArrayList and others.
I would be grateful for any help and advices.
You should have a look at ListOrderedSet from Apache Commons Collections:
Decorates another Set to ensure that the order of addition is retained and used by the iterator.
If an object is added to the set for a second time, it will remain in the original position in the iteration. The order can be observed from the set via the iterator or toArray methods.
The ListOrderedSet also has various useful direct methods. These include many from List, such as get(int), remove(int) and indexOf(int). An unmodifiable List view of the set can be obtained via asList().
Make your own class containing a HashSet<T> and an ArrayList<T>. For the add/append operation, if the element is not already in the set, append it to the list and add it to the HashSet. You'll use about twice as much memory as a normal ArrayList, but you'll get O(1) random access and contains operations.
Related
List interface allows us to get the object using get() method at an index.
How can we obtain the object at the particular index in set interface like LinkedHashSet
Set is unordered. There isn't the concept of index.
Therefore, if you want to get a particular element, you are forced to loop over it and break as soon as you find element you wanted.
http://docs.oracle.com/javase/7/docs/api/java/util/Set.html
and here: http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashSet.html
But a set is only used to check if something is in the list, not where it is.
Short answer is, it is not possible. However, you can get an array that has all datum from the Set you are using and then accessing it via an index. This has to do with the abstraction provided by Set which is different from List.
A Set is simply a collection that does not allow duplicates (no comments on ordering), but a List is a collection that implies ordering, so each value has an associated index.
You can't. There is no indexed acces for a set since it is not ordered.
I am considering using a Java collection that would work best with random insertions. I will be inserting a lot and only read the collection once at the end.
My desired functionality is adding an element at a specified index, anywhere between <0, current_length>. Which collection would be the most efficient to use?
Useful link for your reference:
http://www.coderfriendly.com/wp-content/uploads/2009/05/java_collections_v2.pdf
Not entirely sure how you will be reading the information post input (and how important it is to you). Hashmap or ArrayList would make sense depending on what you are looking to do. Also not sure if you are looking for something thread safe or not.
Hope it helps.
The inefficiency of using List is endemic to the problem. Every time you add something, every subsequent element will have to be re-indexed - as the javadoc states:
Shifts the element currently at that position (if any) and any
subsequent elements to the right (adds one to their indices).
From your question/comments, it would appear that you have a bunch of Objects, and you're sorting them as you go. I'd suggest a more efficient solution to this problem would be to write a Comparator (or make your object implement Comparable), and then use Collections.sort(list, comparator) (or Collections.sort(list)).
You might suggest that your Objects are being sorted on the basis of other variables. In which case, you could create an extension of the Object, with those other variables as fields and extending Comparable, and with a method like getOriginal(). You add these wrapped objects to your list, sort, and then iterate through the list, adding the original objects (from getOriginal()) to a new list.
For info on the sorting algorithm of collections - see this SO question
I have roughly 420,000 elements that I need to store easily in a Set or List of some kind. The restrictions though is that I need to be able to pick a random element and that it needs to be fast.
Initially I used an ArrayList and a LinkedList, however with that many elements it was very slow. When I profiled it, I saw that the equals() method in the object I was storing was called roughly 21 million times in a very short period of time.
Next I tried a HashSet. What I gain in performance I loose in functionality: I can't pick a random element. HashSet is backed by a HashMap which is backed by an array of HashMap.Entry objects. However when I attempted to expose them I was hindered by the crazy private and package-private visibility of the entire Java Collections Framework (even copying and pasting the class didn't work, the JCF is very "Use what we have or roll your own").
What is the best way to randomly select an element stored in a HashSet or HashMap? Due to the size of the collection I would prefer not to use looping.
IMPORTANT EDIT: I forgot a really important detail: exactly how I use the Collection. I populate the entire Collection at the begging of the table. During the program I pick and remove a random element, then pick and remove a few more known elements, then repeat. The constant lookup and changing is what causes the slowness
There's no reason why an ArrayList or a LinkedList would need to call equals()... although you don't want a LinkedList here as you want quick random access by index.
An ArrayList should be ideal - create it with an appropriate capacity, add all the items to it, and then you can just repeatedly pick a random number in the appropriate range, and call get(index) to get the relevant value.
HashMap and HashSet simply aren't suitable for this.
If ALL you need to do is get a large collection of values and pick a random one, then ArrayList is (literally) perfect for your needs. You won't get significantly faster (unless you went directly to primitive array, where you lose benefits of abstraction.)
If this is too slow for you, it's because you're using other operations as well. If you update your question with ALL the operations the collection must service, you'll get a better answer.
If you don't call contains() (which will call equals() many times), you can use ArrayList.get(randomNumber) and that will be O(1)
You can't do it with a HashMap - it stores the objects internally in an array, where the index = hashcode for the object. Even if you had that table, you'd need to guess which buckets contain objects. So a HashMap is not an option for random access.
Assuming that equals() calls are because you sort out duplicates with contains(), you may want to keep both a HashSet (for quick if-already-present lookup) and an ArrayList (for quick random access). Or, if operations don't interleave, build a HashSet first, then extract its data with toArray() or transform it into ArrayList with constructor of the latter.
If your problems are due to remove() call on ArrayList, don't use it and instead:
if you remove not the last element, just replace (with set()) the removed element with the last;
shrink the list size by 1.
This will of course screw up element order, but apparently you don't need it, judging by description. Or did you omit another important detail?
If I am going to create a Java Collection, and only want to fill it with elements, and then iterate through it (without knowing the necessary size beforehand), i.e. all I need is Collection<E>.add(E) and Collection<E>.iterator(), which concrete class should I choose? Is there any advantage to using a Set rather than a List, for example? Which one would have the least overhead?
which concrete class should I choose?
I would probably just go with an ArrayList or a LinkedList. Both support the add and iterator methods, and neighter of them have any considerable overhead.
Is there any advantage to using a Set rather than a List, for example?
No, I wouldn't say so. (Unless you rely on the order of the elements, in which case you must use a List, or want to disallow duplicates, in which case you should use a Set.)
(I don't see how any Set implementation could beat a list implementation for add / iterator methods, so I'd probably go with a List even if I don't care about order.)
Which one would have the least overhead?
Sounds like micro benchmarking here, but if I'd be forced to guess, I'd say ArrayList (or perhaps LinkedList in coner cases where ArrayLists need to reallocate memory often :-)
Do not go with a Set. Sets and Lists differ according to their purpose, that you should always consider when choosing the right Collection
a List is there for maintaining elements in the order you added them; and if you insert the same element twice it will be kept twice
a Set is there for holding one specific element exactly once (uniqueness); order is only relevant for specific implementations (like TreeSet), but still elements that are 'the same' would not be added twice
Set is only meaningful if you want to sort your objects and to make sure no duplicate element is 'registered'. Else, an ArrayList is just fine.
However, if you want to add elements while iterating too, an ArrayBlockingQueue is better.
Here are some key points which can help you to choose your collection according to your requirement -
List(ArrayList or LinkedList)
Allowed duplicate values.
Insertion order preserved.
Set
Not allowed duplicate values.
Insertion order is not preserved.
So according to your requirement List seems to be a suitable choice.
Now Between ArrayList and LinkedList -
ArrayList is a random access list. Use if your frequent operation is the retrieval of elements.
LinkedList is the best option if you want to add or remove elements from the list.
What changes to be done in ArrayList to make it behave like a Set (means it should not accept any duplicate values).
There are many ways to accomplish this. Here are a two:
Store the elements of the ArrayList in random order. When inserting a new value, do a linear scan over the elements and see if the element you're adding already exists. If so, don't add it. Otherwise, append it to the elements.
Enforce that the elements of the ArrayList always be stored in sorted order. To insert a new element, do a binary search to find where that element should be placed, and if the element already exists don't insert it. Otherwise, insert it at the given position.
However, you shouldn't be doing this. These approaches are very slow compared to HashSet or TreeSet, which are specialized data structures optimized to handle this efficiently.
Create your own implementation , implement java.util.List
override add(), addAll() , make use of contains()
As the others said, it's unclear why you need this.
Maybe LinkedHashSet is what you need?
http://download.oracle.com/javase/6/docs/api/java/util/LinkedHashSet.html
Other than already said, you could have a look at java.util.concurrent.CopyOnWriteArraySet. If you leave the "CopyOnWrite" part away, you have your ArraySet.