Sortable Java collection without duplicates

I am looking for a Java collection class that holds no duplicates and can be sorted after initialization, many times, using a Comparator. Is there a cleaner solution than writing code that wraps, for example, an ArrayList and prevents it from adding an object whose value already exists?
Edit 1:
I should add some explanation about sorting. I need to sort this set of values many times with different comparators (a variety of implementations).

Use a Set! The common implementations are HashSet and TreeSet. The latter keeps its elements in sorted order, since it implements SortedSet.

Set interface ----> SortedSet interface ----> TreeSet class
Set interface ----> HashSet class
Set interface ----> LinkedHashSet class
You can use TreeSet. It discards duplicates, and because it implements the SortedSet interface it keeps the elements you add in sorted order.
SortedSet<Integer> s = new TreeSet<>();
s.add(12);
s.add(12);
s.add(1);
s.add(56);
s.add(6);
s.add(47);
s.add(1);
System.out.println(s);
Output
[1, 6, 12, 47, 56]

Use a Set for unique elements. Collections.sort() only accepts a List, so to sort the contents, copy the Set into a List first and sort that copy.
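Because Collections.sort() and List.sort() work on lists, one way to sort the same unique elements many times with different comparators is to keep them in a Set and sort a fresh copy each time. This is only a minimal sketch; the class name SortedViews and the helper sortedCopy are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the JDK.
class SortedViews {
    // Copies the set into a new list and sorts the copy; the set itself is untouched.
    static <T> List<T> sortedCopy(Set<T> set, Comparator<? super T> order) {
        List<T> copy = new ArrayList<>(set);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        // The second "fig" is dropped by the set.
        Set<String> words = new LinkedHashSet<>(Arrays.asList("pear", "fig", "apple", "fig"));
        // prints [apple, fig, pear]
        System.out.println(sortedCopy(words, Comparator.<String>naturalOrder()));
        // prints [fig, pear, apple]
        System.out.println(sortedCopy(words, Comparator.comparing(String::length)));
    }
}
```

The set guarantees uniqueness, and each call can use a different Comparator without disturbing the others.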

This is a set.
usage:
Collection<String> collection = new HashSet<>(); // String is only an example element type

It might be best to extend a standard collection or implement one from scratch. Eg:
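class SetList<E> extends ArrayList<E> {
// Rejects elements that are already present, as Set.add() does.
// The overrides must be public, because ArrayList declares them public.
@Override
public boolean add(E e) {
if (contains(e)) {
return false;
} else {
super.add(e);
return true;
}
}
// The remaining insertion methods need the same duplicate check.
public void add(int index, E e) { .. }
public boolean addAll(..) {..}
public boolean addAll(..) {..}
}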
And then you've got Collections.sort as previously stated. I would want to double-check everything though -- I can just imagine library methods making false assumptions about SetList because it extends ArrayList, leading to disaster. Read the javadocs of ArrayList, List, and Collection for a start, and really consider doing one from scratch.
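As a rough illustration of the "from scratch" route, the sketch below uses composition instead of extending ArrayList, so no inherited method can slip a duplicate in. The class name UniqueSortableList and its small API are assumptions made for this example, not an existing library type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical class: a list without duplicates that can be re-sorted at will.
class UniqueSortableList<E> implements Iterable<E> {
    private final List<E> items = new ArrayList<>(); // holds the current order
    private final Set<E> seen = new HashSet<>();     // gives fast duplicate checks

    // Returns false and changes nothing if an equal element is already stored.
    boolean add(E e) {
        if (!seen.add(e)) {
            return false;
        }
        items.add(e);
        return true;
    }

    // Re-sorts in place; can be called repeatedly with different comparators.
    void sort(Comparator<? super E> order) {
        items.sort(order);
    }

    E get(int index) {
        return items.get(index);
    }

    int size() {
        return items.size();
    }

    // Read-only iteration, so callers cannot bypass the duplicate check.
    @Override
    public Iterator<E> iterator() {
        return Collections.unmodifiableList(items).iterator();
    }
}
```

Only the operations that are actually needed are exposed, which avoids the risk of library code making ArrayList-specific assumptions.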

Related

Internally HashSet is using HashMap only but why we are going for hashSet instead of hashMap?

The internal implementation of HashSet:
public class HashSet<E>
extends AbstractSet<E>
implements Set<E>, Cloneable, java.io.Serializable
{
private transient HashMap<E,Object> map;
private static final Object PRESENT = new Object();
//constructors
public HashSet() {
map = new HashMap<>();
}
public HashSet(int initialCapacity) {
map = new HashMap<>(initialCapacity);
}
public HashSet(int initialCapacity, float loadFactor) {
map = new HashMap<>(initialCapacity, loadFactor);
}
public HashSet(Collection<? extends E> c) {
map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
addAll(c);
}
//add method
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
}
Internally, HashSet uses nothing but a HashMap, and performance-wise HashMap is faster than HashSet.
So why don't we use HashMap directly instead of HashSet?

Because HashSet is a different kind of collection: it holds single objects rather than key-value pairs. To make a HashMap work like a HashSet, we would have to supply an artificial value object everywhere, like
HashMap<MyItem, Object> set;
and then instead of e.g. set.add(new MyItem()) write something like set.put(new MyItem(), null), which makes no sense and can cause serious issues (when the type of the value changes, when you need to serialize, etc.).
Moreover, the internal implementation is not something you should rely on: it could change in a future Java version (it probably won't) to use some other mechanism underneath. What matters is the Set interface and the fact that HashSet implements it.
See also: What is the difference between Lists, ArrayLists, Maps, Hashmaps, Collections etc..?
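To make the contrast concrete, here is a small sketch of what "using a HashMap as a set" looks like next to the Set interface. The class and method names are invented for the example; Collections.newSetFromMap is the JDK's own way to get a Set view over a map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class.
class SetVersusMap {
    // Emulates Set.add() on a map: every caller has to supply a dummy value.
    static boolean addViaMap(Map<String, Boolean> map, String key) {
        return map.put(key, Boolean.TRUE) == null;
    }

    // The JDK can hide the dummy value behind the Set interface for us.
    static Set<String> mapBackedSet() {
        return Collections.newSetFromMap(new HashMap<String, Boolean>());
    }

    public static void main(String[] args) {
        Map<String, Boolean> map = new HashMap<>();
        System.out.println(addViaMap(map, "a")); // true, newly added
        System.out.println(addViaMap(map, "a")); // false, already present

        Set<String> set = mapBackedSet();
        System.out.println(set.add("a")); // true
        System.out.println(set.add("a")); // false
    }
}
```

Both versions behave the same, but only the Set version states the intent and keeps the dummy value out of the calling code.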

As pointed out by @m.antkowicz, although HashSet internally uses a HashMap, there is no guarantee that it always will.
Another major reason:
In large projects, interfaces are generally defined independently of implementations.
If a business interface expects a Set (or even a Collection), it declares its methods in terms of Set (or Collection).
The interface does not care about the underlying implementation; it assumes the expected behavior will be maintained.
Any concrete implementation of the interface must declare the method signature exactly in order to override it,
so it will also use Set (or Collection).
Also, different Set implementations are backed by different maps:
ConcurrentSkipListSet uses a ConcurrentNavigableMap
HashSet uses a HashMap
So a specific map type is hard to use in interface contracts.
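A small sketch of that point follows. TagSource, its two implementations, and Report are hypothetical names; the contract mentions only Set, so each implementation is free to pick a different backing structure.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical business interface: it promises a Set and nothing more.
interface TagSource {
    Set<String> tags();
}

// Backed by a HashMap internally.
class HashTagSource implements TagSource {
    @Override
    public Set<String> tags() {
        return new HashSet<>(Arrays.asList("b", "a"));
    }
}

// Backed by a ConcurrentNavigableMap internally.
class SortedTagSource implements TagSource {
    @Override
    public Set<String> tags() {
        return new ConcurrentSkipListSet<>(Arrays.asList("b", "a"));
    }
}

// Client code depends only on the interface.
class Report {
    static int countTags(TagSource source) {
        return source.tags().size();
    }
}
```

The caller in Report never learns which map sits underneath, which is exactly why the map type does not belong in the signature.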

Advantages of HashSet over ArrayList and vice versa

I have a doubt regarding data structures in Java. While solving a typical hashing problem in Java, I was using the HashSet data structure, which worked fine until there were duplicate objects (objects with equal contents). Since HashSet does not support inserting duplicates, my logic was failing.
I replaced the HashSet with a plain ArrayList, since the HashSet methods I used, such as .add(), .contains() and .remove(), are supported by both, and my logic then worked perfectly.
But does this necessarily mean ArrayList is the logical choice over HashSet when duplicates are involved? There should be some time complexity advantages of HashSet over ArrayList, right? Can someone please provide some insight?
EDIT: What would be the ideal data structure for hashing when duplicates are involved? I mean when the duplicates should not be ignored and should be inserted.

It's not clear what you mean by a "hashing problem," but maybe you're looking for a multiset. From the Guava docs:
A collection that supports order-independent equality, like Set, but may have duplicate elements. A multiset is also sometimes called a bag.
Elements of a multiset that are equal to one another are referred to as occurrences of the same single element. The total number of occurrences of an element in a multiset is called the count of that element (the terms "frequency" and "multiplicity" are equivalent, but not used in this API).
No such thing exists in the JDK.

When you use a HashMap it replaces the original value with the new duplicate.
When you use a HashSet, subsequent duplicates are ignored (not inserted).
When you use an ArrayList, it simply adds the duplicate to the end of the list.
It all depends on your requirements.
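The three behaviors can be checked with a few lines. This is just an illustrative sketch; the class name DuplicateHandling is made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class.
class DuplicateHandling {
    // HashMap: the second put replaces the value stored under the same key.
    static Map<String, Integer> mapDemo() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("a", 2);
        return map;
    }

    // HashSet: the second add is ignored.
    static Set<String> setDemo() {
        Set<String> set = new HashSet<>();
        set.add("a");
        set.add("a");
        return set;
    }

    // ArrayList: the duplicate is appended.
    static List<String> listDemo() {
        List<String> list = new ArrayList<>();
        list.add("a");
        list.add("a");
        return list;
    }

    public static void main(String[] args) {
        System.out.println(mapDemo());  // prints {a=2}
        System.out.println(setDemo());  // prints [a]
        System.out.println(listDemo()); // prints [a, a]
    }
}
```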

ArrayList is not the logical choice if you don't want duplicates. Different tools for different use cases.
You would use a Set in areas where duplicates wouldn't make sense, for example, a set of students. A List allows duplicates.

If you specifically need a HashSet that handles duplicates, a HashMap will be able to do the job. If you just need a count of the number of objects added (with quick lookup etc.), a HashMap<T, Integer> will be ideal, where T is the type of your object. If you actually need to keep references to the duplicate objects you've added, go with HashMap<T, List<T>>. That way you can look up an object with HashMap's .containsKey(T t) and iterate through all of the objects equal to it in the resulting list. So for example, you could create this class:
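import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
public class HashSetWithDuplicates<T> {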
private HashMap<T, List<T>> entries;
private int size;
public HashSetWithDuplicates(){
entries = new HashMap<>();
size = 0;
}
public HashSetWithDuplicates(Collection<? extends T> col){
this();
for(T t : col){
add(t);
}
}
public boolean contains(T t){
return entries.containsKey(t);
}
public List<T> get(T t){
return entries.get(t);
}
public void add(T t){
if (!contains(t)) entries.put(t, new ArrayList<>());
entries.get(t).add(t);
size++;
}
public void remove(T t){
if (!contains(t)) return;
entries.get(t).remove(t);
if(entries.get(t).isEmpty()) entries.remove(t);
size--;
}
public int size(){
return size;
}
public boolean isEmpty(){
return size() == 0;
}
}
Add functionality to your needs.
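For the simpler case mentioned above, where only the number of occurrences matters, a counting map is enough. The following is a minimal sketch under that assumption; CountingBag is a hypothetical name, and Map.merge requires Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class: a bag (multiset) that stores one counter per distinct element.
class CountingBag<T> {
    private final Map<T, Integer> counts = new HashMap<>();
    private int size;

    // Adds one occurrence: starts at 1 or increments the existing count.
    void add(T t) {
        counts.merge(t, 1, Integer::sum);
        size++;
    }

    // Removes one occurrence; returns false if the element is absent.
    boolean remove(T t) {
        Integer current = counts.get(t);
        if (current == null) {
            return false;
        }
        if (current == 1) {
            counts.remove(t);
        } else {
            counts.put(t, current - 1);
        }
        size--;
        return true;
    }

    boolean contains(T t) {
        return counts.containsKey(t);
    }

    // Number of occurrences of t, 0 if it was never added.
    int count(T t) {
        return counts.getOrDefault(t, 0);
    }

    // Total number of occurrences across all elements.
    int size() {
        return size;
    }
}
```

Lookup, insertion and removal stay hash-based, so the time complexity advantage over an ArrayList is kept even though duplicates are counted.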

How can I randomize the iteration sequence of a Set?

I need to use the Set collection.
Each time I start a jvm to run the program, I want to iterate through the items in the Set in a randomly decided sequence.
The iteration sequence has nothing to do with the sequence in which I placed them in the Set, right?
So, what to do? How can I randomize the iteration sequence in a Set?
Here is my method, and it does not randomize.
public static <T> void shuffle(Set<T> set) {
List<T> shuffleMe = new ArrayList<T>(set);
Collections.shuffle(shuffleMe);
set.clear();
set.addAll(shuffleMe);
}

What you need is a RandomizingIterator
A Set is unordered, so randomizing an unordered Collection doesn't make any logical sense.
A sorted Set is ordered by a Comparator or by the elements' compareTo() method, which fixes the order; shuffling it has no meaning.
Copying the Set into a List lets you shuffle the contents of the List, and a custom RandomizingIterator can then iterate across the Set in that shuffled order.
Example Implementation :
Link to Gist on GitHub - TestRandomizingIterator.java
import org.junit.Test;
import javax.annotation.Nonnull;
import java.util.*;
public class TestRandomzingIterator
{
@Test
public void testRandomIteration()
{
final Set<String> set = new HashSet<String>()
{
/** Every call to iterator() will give a possibly unique iteration order, or not */
@Nonnull
@Override
public Iterator<String> iterator()
{
return new RandomizingIterator<String>(super.iterator());
}
class RandomizingIterator<T> implements Iterator<T>
{
final Iterator<T> iterator;
private RandomizingIterator(@Nonnull final Iterator<T> iterator)
{
List<T> list = new ArrayList<T>();
while(iterator.hasNext())
{
list.add(iterator.next());
}
Collections.shuffle(list);
this.iterator = list.iterator();
}
@Override
public boolean hasNext()
{
return this.iterator.hasNext();
}
@Override
public T next()
{
return this.iterator.next();
}
/**
* Modifying this makes no logical sense, so for simplicity sake, this implementation is Immutable.
* It could be done, but with added complexity.
*/
@Override
public void remove()
{
throw new UnsupportedOperationException("TestRandomzingIterator.RandomizingIterator.remove");
}
}
};
set.addAll(Arrays.asList("A", "B", "C"));
final Iterator<String> iterator = set.iterator();
while (iterator.hasNext())
{
System.out.println(iterator.next());
}
}
}
Notes:
This is a straw-man example, but the intention is clear: use a custom Iterator to get custom iteration.
You can't get the normal iteration behavior back, but that doesn't seem to be a problem for your use case.
Passing super.iterator() to the facade is important; passing this to .addAll() or to the List constructor would call the overridden iterator() again, recurse, and end in a StackOverflowError.
HashSet may appear to be ordered, but it isn't guaranteed to stay that way. The order depends on the hashCode of the objects, and adding a single object may reorder the contents. The contract of the Set interface leaves the order undefined, and HashSet in particular is nothing more than a facade over a backing Map.keySet().
There are other, supposedly more lightweight but much more complex solutions that use the original Iterator and try to keep track of what has already been seen. Those solutions aren't improvements over this technique unless the data is excessively large, and then you are probably looking at on-disk structures anyway.

You could copy the contents of the Set into a List, shuffle the List, then return a new LinkedHashSet populated from the shuffled list. Nice thing about LinkedHashSet is that its iterators return elements in the order they were inserted.
public static <T> Set<T> newShuffledSet(Collection<T> collection) {
List<T> shuffleMe = new ArrayList<T>(collection);
Collections.shuffle(shuffleMe);
return new LinkedHashSet<T>(shuffleMe);
}
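To see that the LinkedHashSet really keeps the shuffled order, the variant below takes an explicit Random. The extra parameter and the class name ShuffledSetDemo are assumptions added for the example; pass new Random() to get a different order on each JVM run, or a seeded Random for a reproducible one.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical demo class.
class ShuffledSetDemo {
    // Same idea as newShuffledSet above, with the source of randomness passed in.
    static <T> Set<T> newShuffledSet(Collection<T> collection, Random random) {
        List<T> shuffleMe = new ArrayList<>(collection);
        Collections.shuffle(shuffleMe, random);
        // LinkedHashSet iterates in insertion order, i.e. the shuffled order.
        return new LinkedHashSet<>(shuffleMe);
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        Collections.addAll(items, "A", "B", "C", "D", "E");
        // Unseeded Random: the printed order can differ from run to run.
        System.out.println(newShuffledSet(items, new Random()));
    }
}
```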

According to the docs for java.util.Set:
The elements are returned in no particular order (unless this set is an instance of some class that provides a guarantee).
When you insert the elements there is no guarantee about the order they will be returned to you. If you want that behavior you will need to use a data structure which supports stable iteration order, e.g. List.

Internally, HashSet arranges its elements in buckets, AFAIR according to their hashCode() value. So you should use other classes, like a SortedSet with a custom comparator. But remember that the whole idea of a Set is to find elements quickly, which is why it arranges elements internally. So you have to keep the comparison "stable". Maybe you don't need a set after shuffling?

T[] toArray(T[] a) implementation

I am creating a SortedList class that implements List.
If I understand correctly, the method toArray(T[] a) takes an array of objects as a parameter and returns a sorted array of these objects.
In the java documentation we can read that if the Collection length is greater than the sortedList, a new array is created with the good size, and if the collection length is smaller than the sortedList, the object following the last object of the collection is set to null.
The project I am working on does not let me use null values in the sorted list, so I am implementing the method differently, using a new sortedList and the toArray() method:
public <T> T[] toArray(T[] a)
{
SortedList sort = new SortedList();
for(Object o : a)
{
sort.add(o);
}
return (T[])sort.toArray();
}
Would this be a good way to implement this method or should I expect errors using it like that?
Thank you for your time.

First a recommendation:
If you want SortedList to implement the List interface, it's a good idea to extend AbstractList instead of implementing List directly. AbstractList has already defined many of the necessary methods, including the one you're having problems with. Most List-implementations in the Java platform libraries also extend AbstractList.
If you still want to implement List directly, here is what the method is supposed to do:
Let a be the specified array.
If a is large enough, fill it with the elements from your SortedList (in the correct order) without caring about what was previously in a.
If there's room to spare in a after filling it, set a[size()] = null. Then the user will know where the list ends, unless the list contains null-elements.
If the list doesn't fit in a, create a new array with the same runtime type as a and the same size as the list, and fill the new one instead.
Return the array you filled. If you filled a, return a. If you made a new array, return the new array.
There are two reasons why this method is useful:
The array will not necessarily be of type Object, but of a type T decided by the user (as long as the type is valid).
The user may want to save memory and re-use an array instead of allocating more memory to make a new one.
Here is how the Java Docs describe the method.
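The steps above can be sketched as follows. ToArrayDemo is a hypothetical stand-in for the list class, holding its elements in a plain Object[]; the method body follows the same pattern ArrayList uses.

```java
import java.util.Arrays;

// Hypothetical container used only to illustrate the toArray(T[]) contract.
class ToArrayDemo {
    private final Object[] elements; // assumed to be kept in list order
    private final int size;

    ToArrayDemo(Object... elements) {
        this.elements = elements.clone();
        this.size = elements.length;
    }

    @SuppressWarnings("unchecked")
    <T> T[] toArray(T[] a) {
        if (a.length < size) {
            // Too small: allocate a new array with the runtime type of a.
            return (T[]) Arrays.copyOf(elements, size, a.getClass());
        }
        // Large enough: overwrite whatever a held before.
        System.arraycopy(elements, 0, a, 0, size);
        if (a.length > size) {
            a[size] = null; // marks the end of the list for the caller
        }
        return a;
    }
}
```

Note that the null written to a[size] goes into the caller's array, not into the list, so a list that forbids null elements can still honor the contract.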

If you are implementing a "SortedList" class, it's probably in your best interest to maintain a sorted list internally, rather than relying on the toArray() method to sort them on the way out. In other words, users of the class may not use the toArray() method, but may instead use listIterator() to return an Iterator that is supposed to iterate over the elements of the list in the proper order.
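One way to keep the list sorted internally is to insert each new element at the position a binary search reports. This is a minimal sketch, assuming the elements are Comparable and the backing list is already sorted; SortedInsert and insertSorted are made-up names.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper class.
class SortedInsert {
    // Inserts value so that the list stays sorted; equal elements are kept.
    static <T extends Comparable<? super T>> void insertSorted(List<T> sorted, T value) {
        int pos = Collections.binarySearch(sorted, value);
        if (pos < 0) {
            // Not found: binarySearch returns -(insertionPoint) - 1.
            pos = -(pos + 1);
        }
        sorted.add(pos, value);
    }
}
```

With this in place, iterators and toArray() can simply return the elements in their stored order.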

Are you sure you need to implement List? It is often sufficient just to implement Iterable and Iterator.
public class SortedList<S extends Comparable<S>> implements Iterable<S>, Iterator<S> {
private final Iterator<S> i;
// Iterator version.
public SortedList(Iterator<S> iter, Comparator<S> compare) {
// Roll the whole lot into a TreeSet to sort it.
Set<S> sorted = new TreeSet<S>(compare);
while (iter.hasNext()) {
sorted.add(iter.next());
}
// Use the TreeSet iterator.
i = sorted.iterator();
}
// Provide a default simple comparator.
public SortedList(Iterator<S> iter) {
this(iter, new Comparator<S>() {
public int compare(S p1, S p2) {
return p1.compareTo(p2);
}
});
}
// Also available from an Iterable.
public SortedList(Iterable<S> iter, Comparator<S> compare) {
this(iter.iterator(), compare);
}
// Also available from an Iterable.
public SortedList(Iterable<S> iter) {
this(iter.iterator());
}
// Give them the iterator directly.
public Iterator<S> iterator() {
return i;
}
// Proxy.
public boolean hasNext() {
return i.hasNext();
}
// Proxy.
public S next() {
return i.next();
}
// Proxy.
public void remove() {
i.remove();
}
}
You can then do stuff like:
for ( String s : new SortedList<String>(list) )
which is usually all that you want because TreeSet provides the sortedness for you. Note that TreeSet drops elements that compare as equal, so duplicates in the input are lost.

List with fast contains

I wonder if there's a List implementation allowing fast contains. I'm working with quite a long List and I can't switch to Set since I need the access by index. I can ignore the performance, which may be acceptable now and may or may not be acceptable in the future. I can create a HashSet and do all modifying operations on both, but doing it manually is quite boring and error prone.
I know that it's impossible to have a class working like both List and Set (because of the different equals semantics), but I wonder if there's a List implementing RandomAccess and employing a HashSet to speed up contains.

I know that it's impossible to have a class working like both List and Set
Have you tried LinkedHashSet? Technically it's a set but it preserves order which might be just enough for you. However access by index is linear and not built-in.
Another approach would be to wrap the List in a custom decorator that both delegates to the List and maintains an internal Set for faster contains.

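You can wrap a List and a HashSet to combine the best of both worlds:
import java.util.AbstractSequentialList;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.ListIterator;
import java.util.RandomAccess;
import java.util.Set;
public class FastContainsList<T> extends AbstractSequentialList<T> implements RandomAccess {
// Extends AbstractSequentialList because that class builds everything on listIterator(int) and size().
// Caveat: the set bookkeeping assumes the list never holds the same element twice.
private final List<T> list = new ArrayList<T>();
private final Set<T> set = new HashSet<T>();
@Override
public int size() {
return list.size();
}
@Override
public boolean contains(Object o) { // what it's about
return set.contains(o);
}
@Override
public ListIterator<T> listIterator(int i) {
return new ConIterator(list.listIterator(i));
}
/* Keeps the set in sync with every change made through the iterator. */
private class ConIterator implements ListIterator<T> {
private T obj; // element last returned by next() or previous()
private final ListIterator<T> it;
private ConIterator(ListIterator<T> it) {
this.it = it;
}
public T next() {
return obj = it.next();
}
public T previous() {
return obj = it.previous();
}
public void remove() {
it.remove(); // remove from both
set.remove(obj);
}
public void set(T t) {
it.set(t);
set.remove(obj);
set.add(obj = t);
}
public void add(T t) {
it.add(t);
set.add(t);
}
// Pure forwarding to the wrapped iterator.
public boolean hasNext() {
return it.hasNext();
}
public boolean hasPrevious() {
return it.hasPrevious();
}
public int nextIndex() {
return it.nextIndex();
}
public int previousIndex() {
return it.previousIndex();
}
}
}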

What about a BiMap<Integer, MyClass>? This can be found in Guava
BiMap<Integer, MyClass> map = HashBiMap.create();
//store by index
map.put(1, myObj1);
//get by index
MyClass retrievedObj = map.get(1);
//check if in map
if ( map.containsValue(retrievedObj) ) {
//...
}
I know it doesn't implement the List interface. The major limitation here is that insertion and removal are not provided in the traditional List sense; but you didn't specifically say whether those were important to you.

Apache Commons TreeList should do the trick:
http://commons.apache.org/proper/commons-collections/javadocs/api-release/org/apache/commons/collections4/list/TreeList.html

You can maintain a List and Set. This will give you fast indexed lookup and contains (with a small overhead)
BTW: If your list is small, e.g. 10 entries, it may not make any difference to use a plain List.

I think a class that wraps a HashMap and a List (as you said in your post) is probably the best bet for fast contains and indexed access.
