Collection with no duplicates and in random order in Java

It looks like I can use neither an ArrayList nor a Set:
Set<>: I can avoid duplicates using a set, but there is no shuffle option; Collections.shuffle(List<?> list) only accepts a List.
ArrayList<>: I can use shuffle to randomise the list, but duplicates are allowed.
I could use a Set and convert it into an ArrayList (or the other way around) to avoid the duplicates. Alternatively, I could loop through the set to randomise the items. But I am looking for something more efficient.

You can maintain two separate collections, an ArrayList and a HashSet, and reject insertion of any item which is present in the HashSet.
If you are concerned with encapsulation, wrap the two collections in a meta-object that implements List, and carefully document that insertions of duplicate elements will be rejected, even if the general contract of List doesn't prescribe so.
As for the cost of this solution, I believe the time overhead compared to a plain ArrayList would be negligible: the HashSet operations involved, lookup and insertion, cost amortized O(1). On the other hand, your memory usage will roughly double (or more, depending on the HashSet load factor).
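A minimal sketch of the two-collection idea, assuming the elements have consistent equals()/hashCode(). The class name UniqueShuffleList and its methods are hypothetical; it deliberately does not implement the full List interface, only the operations the question needs (add without duplicates, shuffle, read the order).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper: the ArrayList holds the (shuffleable) order,
// the HashSet only rejects duplicates in expected O(1).
class UniqueShuffleList<T> {
    private final List<T> items = new ArrayList<>();
    private final Set<T> seen = new HashSet<>();

    /** Adds the element unless an equal one is already present; returns true if added. */
    boolean add(T item) {
        if (!seen.add(item)) {
            return false; // duplicate rejected
        }
        items.add(item);
        return true;
    }

    boolean contains(T item) {
        return seen.contains(item);
    }

    /** Shuffles the backing list in place using the given source of randomness. */
    void shuffle(Random rnd) {
        Collections.shuffle(items, rnd);
    }

    int size() {
        return items.size();
    }

    /** Read-only view of the current order. */
    List<T> view() {
        return Collections.unmodifiableList(items);
    }
}
```

Removal would have to update both collections, and ArrayList.remove(Object) is still O(n), so this layout suits add-then-shuffle workloads best.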

As far as I know, sets such as HashSet don't have an order you can rearrange, so you cannot shuffle the items of a set in place. For removing duplicates from a list I found this: How do I remove repeated elements from ArrayList?.

With the least amount of code and most elegance you can do something like:
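import java.util.*;

public void testFoo() {
    Set<Integer> s = new TreeSet<Integer>();
    s.add(2);
    s.add(1);
    s.add(3);
    // Shuffle a copy: shuffling Arrays.asList(s.toArray()) would shuffle a temporary list that is thrown away.
    List<Integer> shuffled = new ArrayList<Integer>(s);
    Collections.shuffle(shuffled);
}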
But this is not very efficient. You could instead use an array and a hash function to put the elements in the desired spots in the array, checking whether they are already there before putting them. This works in (expected) O(n) time, so it's very good, but it needs a little more code and some attention to the hash function.

You can use a Map to avoid duplicates, then copy Map.entrySet() (or the key set) into an ArrayList and shuffle that list.

You could actually use an "ordered set", e.g. TreeSet. To get a random order, don't insert the actual item but a wrapper with some random weight, and use a corresponding comparator. Re-shuffling, however, would require updating all wrapper weights.
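A sketch of the random-weight idea, under two assumptions worth stating: since the TreeSet compares wrappers rather than items, it cannot detect duplicate items by itself, so a HashSet is kept alongside it; and two wrappers with the same random weight would compare as equal and one would be dropped, so a sequence number breaks ties. The names RandomOrderSet and Weighted are hypothetical.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: each item is wrapped with a random weight and the TreeSet
// orders wrappers by weight, so iteration order is random.
class RandomOrderSet<T> implements Iterable<T> {
    private static final class Weighted<E> {
        final E item;
        final double weight;
        final long seq; // tie-breaker so equal weights never collapse two wrappers

        Weighted(E item, double weight, long seq) {
            this.item = item;
            this.weight = weight;
            this.seq = seq;
        }
    }

    private final Comparator<Weighted<T>> byWeight =
            Comparator.<Weighted<T>>comparingDouble(w -> w.weight).thenComparingLong(w -> w.seq);
    private TreeSet<Weighted<T>> ordered = new TreeSet<>(byWeight);
    // The TreeSet compares wrappers, not items, so duplicates are tracked separately.
    private final Set<T> members = new HashSet<>();
    private final Random rnd;
    private long nextSeq = 0;

    RandomOrderSet(Random rnd) {
        this.rnd = rnd;
    }

    boolean add(T item) {
        if (!members.add(item)) {
            return false; // duplicate rejected
        }
        ordered.add(new Weighted<>(item, rnd.nextDouble(), nextSeq++));
        return true;
    }

    /** Re-shuffling assigns a fresh weight to every element: O(n log n). */
    void reshuffle() {
        TreeSet<Weighted<T>> fresh = new TreeSet<>(byWeight);
        for (Weighted<T> w : ordered) {
            fresh.add(new Weighted<>(w.item, rnd.nextDouble(), nextSeq++));
        }
        ordered = fresh;
    }

    int size() {
        return members.size();
    }

    @Override
    public Iterator<T> iterator() {
        Iterator<Weighted<T>> it = ordered.iterator();
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public T next() {
                return it.next().item;
            }
        };
    }
}
```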

Related

Java: What collection type should I use for this case?

What I need:
Fastest put/remove; this is used a lot.
Iteration, also used frequently.
Holds an object, e.g. Player. remove should be O(1), so maybe a HashMap?
No duplicate keys
direct get() is never used; I mainly iterate to retrieve data.
I don't worry about memory, I just want the fastest speed possible even if it's at the cost of memory.

For iteration, nothing is faster than a plain old array. Entries are stored sequentially in memory, so the JVM can get to the next entry simply by adding the length of one entry to its address.
Arrays are typically a bit of a hassle to deal with compared to maps or lists (e.g. no dictionary-style lookups, fixed length). However, in your case I think it makes sense to go with a one- or two-dimensional array, since the length of the array will not change and dictionary-style lookups are not needed.

So if I understand you correctly you want to have a two-dimensional grid that holds information of which, if any, player is in specific tiles? To me it doesn't sound like you should be removing, or adding things to the grid. I would simply use a two-dimensional array that holds type Player or something similar. Then if no player is in a tile you can set that position to null, or some static value like Player.none() or Tile.empty() or however you'd want to implement it. Either way, a simple two-dimensional array should work fine. :)
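A minimal sketch of the two-dimensional array approach, using null for an empty tile. The Player and Grid classes here are hypothetical stand-ins for the asker's own types, and the grid size is an assumption.

```java
// Hypothetical Player type standing in for the asker's class.
final class Player {
    final String name;

    Player(String name) {
        this.name = name;
    }
}

// A fixed-size grid: each cell holds the Player on that tile, or null for an empty tile.
final class Grid {
    private final Player[][] tiles;

    Grid(int width, int height) {
        tiles = new Player[width][height]; // every cell starts as null (empty)
    }

    void place(int x, int y, Player p) { tiles[x][y] = p; }

    void clear(int x, int y) { tiles[x][y] = null; }

    Player at(int x, int y) { return tiles[x][y]; }

    /** Counts occupied tiles with a plain nested loop over the arrays. */
    int occupied() {
        int count = 0;
        for (Player[] column : tiles) {
            for (Player p : column) {
                if (p != null) {
                    count++;
                }
            }
        }
        return count;
    }
}
```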

The best Collection for your case is a LinkedList. Linked lists allow fast iteration, and fast removal and addition at any place in the list. For example, if you use an ArrayList and want to insert something at index i, you have to move all the elements from i to the end one entry to the right; the same happens when you remove. In a linked list, once you are positioned at that place (e.g. with a ListIterator), you can add and remove in constant time, although reaching index i by position still takes O(i) steps.
Since you need two dimensions, you can use linked lists inside of linked lists:
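import java.util.LinkedList;
import java.util.List;

// Tile is the asker's own class. LinkedList has no capacity constructor, so none is passed.
List<List<Tile>> players = new LinkedList<List<Tile>>();
for (int i = 0; i < 20; ++i) {
    List<Tile> tiles = new LinkedList<Tile>();
    for (int j = 0; j < 20; ++j) {
        tiles.add(new Tile());
    }
    players.add(tiles);
}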

Use a map of sets: it gives expected O(1) vertex lookup and amortized O(1) edge insertion and deletion.
HashMap<VertexT, HashSet<EdgeT>> incidenceMap;
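A sketch of the map-of-sets structure, assuming undirected edges that are recorded under both endpoints, and vertex/edge types with consistent equals()/hashCode(). The class IncidenceGraph and its method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Each vertex maps to the set of edges incident to it.
class IncidenceGraph<VertexT, EdgeT> {
    private final Map<VertexT, Set<EdgeT>> incidenceMap = new HashMap<>();

    void addVertex(VertexT v) {
        incidenceMap.computeIfAbsent(v, k -> new HashSet<>());
    }

    /** Records edge e as incident to both endpoints (expected O(1)). */
    void addEdge(VertexT a, VertexT b, EdgeT e) {
        incidenceMap.computeIfAbsent(a, k -> new HashSet<>()).add(e);
        incidenceMap.computeIfAbsent(b, k -> new HashSet<>()).add(e);
    }

    /** Removes edge e from both endpoints (expected O(1)). */
    void removeEdge(VertexT a, VertexT b, EdgeT e) {
        Set<EdgeT> ofA = incidenceMap.get(a);
        if (ofA != null) {
            ofA.remove(e);
        }
        Set<EdgeT> ofB = incidenceMap.get(b);
        if (ofB != null) {
            ofB.remove(e);
        }
    }

    /** Read-only view of the edges touching v; empty for unknown vertices. */
    Set<EdgeT> edgesOf(VertexT v) {
        return Collections.unmodifiableSet(incidenceMap.getOrDefault(v, Collections.emptySet()));
    }
}
```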

There is no simple one-size-fits-all solution to this.
For example, if you only want to append, iterate and use Iterator.remove(), there are two obvious options, ArrayList and LinkedList:
ArrayList uses less memory, but Iterator.remove() is O(N)
LinkedList uses more memory, but Iterator.remove() is O(1)
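To illustrate the Iterator.remove() case, here is a small sketch (the class and method names are hypothetical) that removes elements during a single pass. On a LinkedList each it.remove() just unlinks the current node; on an ArrayList the same call shifts every later element.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class RemoveWhileIterating {
    /** Removes every negative element in a single pass using Iterator.remove(). */
    static void removeNegatives(List<Integer> values) {
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) {
            if (it.next() < 0) {
                it.remove(); // safe during iteration: no ConcurrentModificationException
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> values = new LinkedList<>(Arrays.asList(3, -1, 4, -1, 5));
        removeNegatives(values);
        System.out.println(values); // [3, 4, 5]
    }
}
```

On Java 8+, values.removeIf(v -> v < 0) does the same job, and ArrayList overrides removeIf to compact the array in one pass, which narrows the gap for this particular pattern.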
If you also want fast lookup (e.g. Collection.contains tests) or removal using Collection.remove, then HashSet is going to be better ... if the collections are likely to be large. A HashSet won't allow you to put an object into the collection multiple times, but that could be an advantage. It also uses more memory than either ArrayList or LinkedList.
If you were more specific on the properties required, and what you are optimizing for (speed, memory use, both?) then we could give you better advice.

The requirement of not allowing duplicates is effectively adding a requirement for efficient get().
Your options are either hash-based structures or O(log N) sort-based ones. Most likely, hashing will be faster, unless for whatever reason calling hashCode() + equals() once is much slower than calling compareTo() log(N) times. This could be the case, for instance, if you're dealing with very long strings. log(N) is not very much, by the way: log2(1,000,000,000) ≈ 30.
If you want to use a hash-based data structure, then HashSet is your friend. Make sure that Player has a good, fast implementation of hashCode(). If you know the number of entries ahead of time, specify the HashSet size (ceil(N/load_factor) + 1; the default load factor is 0.75).
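A sketch of the hash-based option, assuming Players are identified by a unique int id (the Player fields and the PlayerSets helper are hypothetical). The presizing follows the ceil(N/load_factor) + 1 rule with the default 0.75 load factor, so N insertions should not trigger a resize.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical Player whose equality and hash are based only on an int id.
final class Player {
    private final int id;
    private final String name;

    Player(int id, String name) {
        this.id = id;
        this.name = name;
    }

    String name() {
        return name;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Player && ((Player) o).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id); // cheap; equal ids give equal hashes
    }
}

class PlayerSets {
    /** Presizes the set so expectedSize insertions fit under the default 0.75 load factor. */
    static Set<Player> newPlayerSet(int expectedSize) {
        int capacity = (int) Math.ceil(expectedSize / 0.75) + 1;
        return new HashSet<>(capacity);
    }
}
```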
If you want to use a sort-based structure, implement an efficient Player.compareTo(). Your choices are TreeSet or a skip list. They're pretty comparable in terms of characteristics. TreeSet is nice in that it's available out of the box in the JDK, whereas the only skip list in the JDK is the concurrent one (ConcurrentSkipListSet). A TreeSet (a red-black tree) rebalances as you add data, while a skip list stays balanced probabilistically; both take some time per insertion, and I don't know how to predict which will be better.
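A sketch of the sort-based option, assuming Players are ordered by a unique int id (a hypothetical field). compareTo() is kept consistent with equals(), which both sorted sets rely on to decide what counts as a duplicate. The factory class SortedPlayerSets is hypothetical; TreeSet and ConcurrentSkipListSet are the JDK classes.

```java
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical Player ordered by a unique id.
final class Player implements Comparable<Player> {
    final int id;

    Player(int id) {
        this.id = id;
    }

    @Override
    public int compareTo(Player other) {
        return Integer.compare(id, other.id);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Player && ((Player) o).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}

class SortedPlayerSets {
    /** Single-threaded choice: a red-black tree with O(log N) add/remove/contains. */
    static NavigableSet<Player> treeSet() {
        return new TreeSet<>();
    }

    /** Thread-safe choice: the JDK's skip list, with expected O(log N) operations. */
    static NavigableSet<Player> skipListSet() {
        return new ConcurrentSkipListSet<>();
    }
}
```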

Java HashSet vs Array Performance

I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are (and the number won't change), and was wondering whether Array would have a notable performance advantage over HashSet for storing/retrieving said elements.
On paper, Array guarantees constant time insertion (since I know the size ahead of time) and retrieval, but the code for HashSet looks much cleaner and adds some flexibility, so I'm wondering if I'm losing anything performance-wise using it, at least, theoretically.

It depends on your data:
HashSet gives you an O(1) contains() method but doesn't preserve order.
ArrayList contains() is O(n) but you can control the order of the entries.
Array: if you need to insert anything in between, the worst case is O(n), since you have to move the data down to make room for the insertion. For a Set, you can use a SortedSet such as TreeSet, which inserts in O(log n) and offers more flexible operations.
I believe Set is more flexible.

The choice greatly depends on what you want to do with it.
If it is what you mentioned in your question:
I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are
If that is all you need to do, then you need neither of them: Collection has a size() method that tells you how many elements the collection holds.
If by "collection of objects" you don't mean an actual collection, and you need to choose a collection type to store your objects for further processing, then you need to know that different kinds of collections have different capabilities and characteristics.
First, I believe that for a fair comparison you should consider ArrayList instead of an array, so you don't need to deal with reallocation yourself.
Then it becomes a choice between ArrayList and HashSet, which is quite straightforward:
Do you need a List or a Set? They serve different purposes: Lists give you indexed access, and iteration follows index order, while Sets are mainly for keeping a distinct set of data and, given their nature, don't offer indexed access.
After you decide between List and Set, it is a choice of implementation: for Lists you normally choose between ArrayList and LinkedList, while for Sets you choose between HashSet and TreeSet.
All of this depends on what you want to do with the collection; they perform differently for different operations.
For example, indexed access is O(1) in an ArrayList and O(n) in a HashSet (where it isn't meaningful anyway); just for your interest, it is also O(n) in a LinkedList and in a TreeSet.
Adding a new element is amortized O(1) for both ArrayList and HashSet. Inserting in the middle is O(n) for ArrayList, while it doesn't make sense for a HashSet. Both occasionally have to reallocate, which costs O(n) (HashSet is normally slower at it, because every entry has to be redistributed into the new bucket array).
To find if certain element exists in the collection, ArrayList is O(n) and HashSet is O(1).
There are still lots of operations you can do, so it is quite meaningless to discuss for performance without knowing what you want to do.

Theoretically, and as the SCJP6 study guide says :D
arrays are faster than collections, and as said, most collections are built on arrays (Maps are not Collections, but they are part of the Collections Framework).
If you guarantee that the number of your elements won't change, why get stuck with objects built on objects (collections built on arrays) when you can use the root objects (arrays) directly?
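A sketch of the plain-array approach for the question's case, under the assumption that the unique integer IDs are dense (0 to n-1) and n is fixed. Item and ItemTable are hypothetical names.

```java
// Hypothetical element type carrying its unique id.
final class Item {
    final int id;
    final String label;

    Item(int id, String label) {
        this.id = id;
        this.label = label;
    }
}

// With dense ids and a fixed count, the id itself is the array index:
// O(1) store and lookup, no hashing, and no boxing of the key.
final class ItemTable {
    private final Item[] byId;

    ItemTable(int n) {
        byId = new Item[n];
    }

    void put(Item item) {
        byId[item.id] = item;
    }

    Item get(int id) {
        return byId[id];
    }

    boolean contains(int id) {
        return id >= 0 && id < byId.length && byId[id] != null;
    }
}
```

If the IDs are sparse or very large, the array wastes memory and a HashMap<Integer, Item> becomes the safer choice.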

It looks like you want a HashMap that maps IDs to counts. In particular:
HashMap<Integer, Integer> counts = new HashMap<Integer, Integer>();
// merge() starts a missing ID at 1 instead of unboxing the null that get() would return
counts.merge(uniqueID, 1, Integer::sum);
This way, you get amortized O(1) adds, contains and retrievals. Essentially, an array with unique IDs associated with each object IS a HashMap. By using the HashMap, you get the added bonus of not having to manage the size of the array, not having to map the keys to an array index yourself, AND constant access time.

Searching LinkedHashMap, faster method than sequential?

I am wondering if there is a more efficient method for getting objects out of my LinkedHashMap with timestamps greater than a specified time. I.e. something better than the following:
Iterator<Foo> it = foo_map.values().iterator();
Foo foo;
while(it.hasNext()){
foo = it.next();
if(foo.get_timestamp() < minStamp) continue;
break;
}
In my implementation, each of my objects has essentially three values: an "id", a "timestamp", and "data". The objects are inserted in order of their timestamps, so when I iterate over the set, I get ordered results (as guaranteed by the LinkedHashMap contract). The map is keyed on the object's id, so I can quickly look them up by id.
When I look them up by a timestamp condition, however, I only get an iterator over sorted results. This is an improvement over a generic HashMap, but I still need to iterate sequentially over much of the range until I find the first entry with a timestamp higher than the specified one.
Since the results are already sorted, is there any algorithm I can pass the iterator (or collection) to that can search it faster than sequentially? If I went with a TreeMap as an alternative, would it offer overall speed advantages, or is it doing essentially the same thing in the background? Since the collection is already sorted by insertion order, I'm thinking TreeMap has a lot of overhead I don't need?

There is no faster way ... if you just use a LinkedHashMap.
If you want faster access, you need to use a different data structure. For example, a TreeSet with an appropriate comparator might be a better solution for this aspect of your problem. If your TreeSet is ordered by date, then calling tailSet with an appropriate dummy value can give you all elements greater than or equal to a given date.
Since the results are already sorted, is there any algorithm I can pass the iterator (or collection to), that can search it faster than sequential?
Not for a LinkedHashMap.
However, if the ordered list was an ArrayList instead, then you could use "binary search" on the list ... provided that you could lock it to prevent concurrent modifications while you are searching. (Actually, concurrency is a potential issue to consider no matter how you implement this ... including your current linear search.)
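A sketch of the binary search on an ArrayList sorted by timestamp, assuming no concurrent modification during the search. It is written as a hand-rolled lower bound because Collections.binarySearch needs a key of the element type and, when several elements compare equal, does not guarantee which one it finds. Foo here is a hypothetical stand-in for the asker's class.

```java
import java.util.List;

class TimestampSearch {
    // Hypothetical stand-in for the asker's Foo (id, timestamp, data).
    static final class Foo {
        final long id;
        final long timestamp;

        Foo(long id, long timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }
    }

    /**
     * Lower-bound binary search: index of the first element whose timestamp is >= minStamp,
     * or list.size() if there is none. Assumes the list is sorted by timestamp. O(log N).
     */
    static int firstAtOrAfter(List<Foo> list, long minStamp) {
        int lo = 0;
        int hi = list.size(); // search window is [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (list.get(mid).timestamp < minStamp) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

The matching entries are then list.subList(firstAtOrAfter(list, minStamp), list.size()).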
If you want to keep the ability to do id lookups, then you need two data structures; e.g. a TreeSet and a HashMap which share their element objects. A TreeSet will probably be more efficient than trying to maintain an ArrayList in order assuming that there are random insertions and/or random deletions.
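A sketch of keeping two structures that share their element objects: a HashMap for id lookup and a TreeSet for timestamp range queries. The assumptions: ids are unique, timestamps may repeat (so ties are broken by id to keep distinct objects distinct), and access is single-threaded. FooIndex and its methods are hypothetical.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

class FooIndex {
    // Hypothetical stand-in for the asker's Foo.
    static final class Foo {
        final long id;
        final long timestamp;
        final String data;

        Foo(long id, long timestamp, String data) {
            this.id = id;
            this.timestamp = timestamp;
            this.data = data;
        }
    }

    // Ties on timestamp are broken by id so distinct Foos never compare as equal.
    private static final Comparator<Foo> BY_TIME =
            Comparator.comparingLong((Foo f) -> f.timestamp).thenComparingLong(f -> f.id);

    private final Map<Long, Foo> byId = new HashMap<>();
    private final NavigableSet<Foo> byTime = new TreeSet<>(BY_TIME);

    void put(Foo foo) {
        Foo old = byId.put(foo.id, foo);
        if (old != null) {
            byTime.remove(old); // keep both structures in sync
        }
        byTime.add(foo);
    }

    Foo get(long id) {
        return byId.get(id);
    }

    void remove(long id) {
        Foo old = byId.remove(id);
        if (old != null) {
            byTime.remove(old);
        }
    }

    /** View of all entries with timestamp >= minStamp, in timestamp order (O(log N) to locate). */
    NavigableSet<Foo> atOrAfter(long minStamp) {
        // Dummy probe that sorts at or before every real entry with this timestamp.
        Foo probe = new Foo(Long.MIN_VALUE, minStamp, null);
        return byTime.tailSet(probe, true);
    }
}
```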

Randomly getting elements in a HashMap or HashSet without looping

I have roughly 420,000 elements that I need to store easily in a Set or List of some kind. The restrictions though is that I need to be able to pick a random element and that it needs to be fast.
Initially I used an ArrayList and a LinkedList, however with that many elements it was very slow. When I profiled it, I saw that the equals() method in the object I was storing was called roughly 21 million times in a very short period of time.
Next I tried a HashSet. What I gain in performance I lose in functionality: I can't pick a random element. HashSet is backed by a HashMap, which is backed by an array of HashMap.Entry objects. However, when I attempted to expose them I was hindered by the crazy private and package-private visibility of the entire Java Collections Framework (even copying and pasting the class didn't work; the JCF is very "use what we have or roll your own").
What is the best way to randomly select an element stored in a HashSet or HashMap? Due to the size of the collection I would prefer not to use looping.
IMPORTANT EDIT: I forgot a really important detail: exactly how I use the Collection. I populate the entire Collection at the beginning. During the program I pick and remove a random element, then pick and remove a few more known elements, then repeat. The constant lookup and changing is what causes the slowness.

There's no reason why an ArrayList or a LinkedList would need to call equals()... although you don't want a LinkedList here as you want quick random access by index.
An ArrayList should be ideal - create it with an appropriate capacity, add all the items to it, and then you can just repeatedly pick a random number in the appropriate range, and call get(index) to get the relevant value.
HashMap and HashSet simply aren't suitable for this.

If ALL you need to do is get a large collection of values and pick a random one, then ArrayList is (literally) perfect for your needs. You won't get significantly faster (unless you went directly to primitive array, where you lose benefits of abstraction.)
If this is too slow for you, it's because you're using other operations as well. If you update your question with ALL the operations the collection must service, you'll get a better answer.

If you don't call contains() (which will call equals() many times), you can use ArrayList.get(randomNumber) and that will be O(1)
You can't do it with a HashMap: it stores the objects internally in an array, where the index is derived from the object's hash code. Even if you had access to that table, you'd need to guess which buckets contain objects, since many of them are empty. So a HashMap is not an option for random access.

Assuming that the equals() calls come from filtering out duplicates with contains(), you may want to keep both a HashSet (for a quick already-present check) and an ArrayList (for quick random access). Or, if the operations don't interleave, build a HashSet first, then extract its data with toArray() or convert it into an ArrayList with the ArrayList(Collection) constructor.
If your problems are due to the remove() call on ArrayList, don't use it; instead:
if the element to remove is not the last one, overwrite it (with set()) with the last element;
then remove the last element, shrinking the list size by 1.
This will of course screw up element order, but apparently you don't need it, judging by description. Or did you omit another important detail?
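A sketch combining the ideas above for the asker's pick-and-remove loop: an ArrayList for O(1) random access plus a HashMap from each element to its current index, so both random removal and removal of a known element use the swap-with-last trick in O(1). The class name RandomPool is hypothetical, and the elements are assumed to have consistent equals()/hashCode(). Element order is not preserved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class RandomPool<T> {
    private final List<T> items = new ArrayList<>();
    private final Map<T, Integer> indexOf = new HashMap<>();
    private final Random rnd;

    RandomPool(Random rnd) {
        this.rnd = rnd;
    }

    /** Adds the item unless it is already present (expected O(1)). */
    boolean add(T item) {
        if (indexOf.containsKey(item)) {
            return false; // no duplicates
        }
        indexOf.put(item, items.size());
        items.add(item);
        return true;
    }

    /** Removes a known item in expected O(1) by moving the last element into its slot. */
    boolean remove(T item) {
        Integer boxed = indexOf.remove(item);
        if (boxed == null) {
            return false;
        }
        int idx = boxed;
        int lastIdx = items.size() - 1;
        T last = items.remove(lastIdx); // removing the tail shifts nothing
        if (idx != lastIdx) {
            items.set(idx, last); // fill the hole with the former last element
            indexOf.put(last, idx);
        }
        return true;
    }

    /** Picks and removes a uniformly random element; the pool must not be empty. */
    T removeRandom() {
        T chosen = items.get(rnd.nextInt(items.size()));
        remove(chosen);
        return chosen;
    }

    int size() {
        return items.size();
    }
}
```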

Java - PriorityQueue vs sorted LinkedList

Which implementation is less "heavy": PriorityQueue or a sorted LinkedList (using a Comparator)?
I want to have all the items sorted. Insertion will be very frequent, and occasionally I will have to run through the whole list to perform some operations.

A LinkedList is the worst choice. Either use an ArrayList (or, more generally, a RandomAccess implementor), or PriorityQueue. If you do use a list, sort it only before iterating over its contents, not after every insert.
One thing to note is that the PriorityQueue iterator does not provide the elements in order; you'll actually have to remove the elements (empty the queue) to iterate over its elements in order.
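A small sketch of the two ways to get a PriorityQueue's contents in order (the helper names are hypothetical): draining it with poll(), which empties the queue, or copying the elements and sorting the copy with the queue's own comparator, which leaves the queue intact. List.sort(null) falls back to natural ordering, matching a queue built without a comparator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class PriorityQueueOrder {
    /** Drains the queue with poll(): elements come out in priority order, but the queue ends up empty. */
    static <T> List<T> drainInOrder(PriorityQueue<T> queue) {
        List<T> out = new ArrayList<>(queue.size());
        T next;
        while ((next = queue.poll()) != null) {
            out.add(next);
        }
        return out;
    }

    /** Non-destructive alternative: copy the elements and sort the copy with the queue's ordering. */
    static <T> List<T> sortedSnapshot(PriorityQueue<T> queue) {
        List<T> copy = new ArrayList<>(queue);
        copy.sort(queue.comparator()); // a null comparator means natural ordering
        return copy;
    }
}
```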

You should implement both and then do performance testing on actual data to see which works best in your specific circumstances.

I have made a small benchmark on this issue. If you want your list to be sorted after the end of all insertions, then there is almost no difference between PriorityQueue and LinkedList (LinkedList is a bit better, 5 to 10 percent quicker on my machine); however, if you use an ArrayList, sorting will be almost 2 times quicker than with PriorityQueue.
In my benchmark, for the lists I measured the time from the beginning of filling them with values until the end of sorting. For PriorityQueue, I measured from the beginning of filling until the end of polling all elements (because elements come out of a PriorityQueue in order only as you remove them, as mentioned in erickson's answer).

Adding an object to the priority queue is O(log n), and the same goes for each poll. If you are doing inserts frequently on very large queues, this could impact performance. Appending to the end of an ArrayList is amortized constant time, so on the whole all those inserts will go faster on the ArrayList than on the priority queue.
If you need to grab ALL the elements in sorted order, Collections.sort will work in about O(n log n) time in total, whereas each poll from the priority queue is O(log n), so if you grab all n things from the queue that will again be O(n log n).
The use case where the priority queue wins is when you need to know the biggest value in the queue at any given time. To do that with the ArrayList you have to sort (or at least scan) the whole list each time you want to know the biggest, but the priority queue always knows its head. Note that Java's PriorityQueue keeps the smallest element at the head by default; you get the biggest by constructing it with a reversed comparator.
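A sketch of the "always know the biggest value" case, using Collections.reverseOrder() to turn Java's default min-heap into a max-heap. RunningMax is a hypothetical name.

```java
import java.util.Collections;
import java.util.PriorityQueue;

class RunningMax {
    // PriorityQueue is a min-heap by default; reverseOrder() makes the head the largest element.
    private final PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());

    void add(int value) {
        heap.add(value); // O(log n)
    }

    Integer currentMax() {
        return heap.peek(); // O(1); null when empty
    }

    Integer removeMax() {
        return heap.poll(); // O(log n)
    }
}
```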

If you use a LinkedList, you would need to re-sort the items each time you added one, and since inserts are frequent, I wouldn't use a LinkedList. So in this case, I would use a PriorityQueue. If you will only be adding unique elements to the list, I recommend using a SortedSet (one implementation is TreeSet).

There is a fundamental difference between the two data structures and they are not as easily interchangeable as you might think.
According to the PriorityQueue documentation:
The Iterator provided in method iterator() is not guaranteed to traverse the elements of the priority queue in any particular order.
Use an ArrayList and call Collections.sort() on it only before iterating the list.
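A sketch of "sort only before iterating": inserts are plain appends, and the list is sorted at most once per batch of inserts, when someone iterates. LazilySortedList is a hypothetical wrapper, not a JDK class, and it assumes single-threaded use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

class LazilySortedList<T> implements Iterable<T> {
    private final List<T> items = new ArrayList<>();
    private final Comparator<? super T> order;
    private boolean dirty = false;

    LazilySortedList(Comparator<? super T> order) {
        this.order = order;
    }

    void add(T item) {
        items.add(item); // amortized O(1); no sorting here
        dirty = true;
    }

    @Override
    public Iterator<T> iterator() {
        if (dirty) {
            items.sort(order); // O(n log n), paid once per batch of inserts
            dirty = false;
        }
        return Collections.unmodifiableList(items).iterator();
    }
}
```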

The issue with PriorityQueue is that you have to empty the queue to get the elements in order. If that is what you want then it is a fine choice. Otherwise you could use an ArrayList that you sort only when you need the sorted result or, if the items are distinct (relative to the comparator), a TreeSet. Both TreeSet and ArrayList are not very 'heavy' in terms of space; which is faster depends on the use case.
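The "distinct relative to the comparator" caveat matters because a TreeSet treats any two elements the comparator calls equal as duplicates. A small sketch (TreeSetDistinctness is a hypothetical name) using strings ordered by length: "pear" is silently rejected because it has the same length as "kiwi", while adding a tie-breaking secondary key keeps both.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.TreeSet;

class TreeSetDistinctness {
    /** Orders by length only, so words of equal length count as duplicates. */
    static TreeSet<String> byLengthOnly(String... words) {
        TreeSet<String> set = new TreeSet<String>(Comparator.comparingInt(String::length));
        Collections.addAll(set, words); // a later word with an already-present length is rejected
        return set;
    }

    /** Breaks length ties alphabetically, so distinct words stay distinct. */
    static TreeSet<String> byLengthThenText(String... words) {
        TreeSet<String> set = new TreeSet<String>(
                Comparator.comparingInt(String::length).thenComparing(Comparator.<String>naturalOrder()));
        Collections.addAll(set, words);
        return set;
    }
}
```

For example, byLengthOnly("kiwi", "pear", "banana") prints as [kiwi, banana], while byLengthThenText with the same words prints as [kiwi, pear, banana].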

Do you need it sorted at all times? If that's the case, you might want to go with something like a tree-set (or other SortedSet with a fast lookup).
If you only need it sorted occasionally, go with a linked list and sort it when you need access. Let it be unsorted when you don't need access.

java.util.PriorityQueue is "An unbounded priority queue based on a priority heap". The heap data structure makes much more sense than a linked list.

I can see two options, which one is better depends on whether you need to be able to have duplicate items.
If you don't need to maintain duplicate items in your list, I would use a SortedSet (probably a TreeSet).
If you need to maintain duplicate items, I would go with a LinkedList and insert new items into the list in the correct order.
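A sketch of inserting into a LinkedList "in the correct order" with a ListIterator, so duplicates are kept. Finding the spot is an O(n) walk, but the insertion itself only links in a node. SortedLinkedInsert is a hypothetical name.

```java
import java.util.Comparator;
import java.util.LinkedList;
import java.util.ListIterator;

class SortedLinkedInsert {
    /** Inserts item so the list stays sorted; equal items go after the existing ones. */
    static <T> void insertSorted(LinkedList<T> list, T item, Comparator<? super T> order) {
        ListIterator<T> it = list.listIterator();
        while (it.hasNext()) {
            if (order.compare(it.next(), item) > 0) {
                it.previous(); // step back so add() lands before the larger element
                break;
            }
        }
        it.add(item); // O(1) once positioned; appends if no larger element was found
    }
}
```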
The PriorityQueue doesn't really fit unless you want to remove the items whenever you do operations.
Going along with the others: use profiling to make sure you're picking the correct solution for your particular problem.

IMHO: we don't need PriorityQueue if we have LinkedList. I can sort a queue with LinkedList faster than with PriorityQueue, e.g.:
Queue<String> list = new PriorityQueue<String>();
list.add("1");
list.add("3");
list.add("2");
// poll() hands elements back in priority order
while (!list.isEmpty()) {
    String s = list.poll();
    System.out.println(s);
}
I believe this code is slow, because each time I add an element the queue has to restore its heap order. But if I use the following code:
LinkedList<String> list = new LinkedList<String>();
list.add("1");
list.add("3");
list.add("2");
Collections.sort(list); // sorts once, before the list is consumed as a queue
while (!list.isEmpty()) {
    String s = list.poll();
    System.out.println(s);
}
I think this code sorts only once, so it will be faster than using PriorityQueue. So I can sort my LinkedList once, before using it, in any case, and it will work faster. And even if sorting took the same time, I don't really need PriorityQueue; we really don't need this class.