My use case is this: I need to maintain a collection of unique items. I may frequently need to add items to, or remove items from, the collection at a given index (which I currently store as a member of the item, but I am open to changing that), and while doing so I need to update the indexes of the other items.
I can't decide which Java collection would suit my needs best. HashSet and SortedSet both guarantee uniqueness, but I am not sure how the index part can be handled.
According to the question and comments, you have the following fundamental requirements for the collection:
1. The elements in the collection must be unique.
2. The collection must maintain the elements in an order specified by the user.
3. The collection elements must have unique indexes representing each element's current position.
4. The indexes must adjust as elements are inserted and deleted.
There is no (single) Java SE collection type that satisfies requirement 1 together with 2, 3 or 4.
The only Java SE collection type that supports and can maintain an arbitrary ordering is List, so you need to start with that. Something like this for instance:
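import java.util.ArrayList;

public class MyList<E> extends ArrayList<E> {
    ...
    @Override
    public void add(int pos, E e) {
        if (this.contains(e)) {
            // IllegalArgumentException stands in for the placeholder SomeException
            throw new IllegalArgumentException("already in collection");
        }
        super.add(pos, e); // calling this.add(pos, e) here would recurse forever
    }
}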
Note that HashMap<Integer, E> is a possible alternative, but adjusting the indexes as elements are added and removed is complicated. (Comparing the performance characteristics is not straightforward, but the chances are that it won't matter in your use-case.)
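A minimal runnable sketch of the ArrayList-based approach, assuming IllegalArgumentException as the duplicate error (SomeException in the answer is a placeholder). UniqueListDemo and UniqueList are hypothetical names. Only the positional add(int, E) is guarded here; add(E), set and addAll would need the same check in a real implementation, and contains() is a linear scan, so each insert costs O(n).

```java
import java.util.ArrayList;

public class UniqueListDemo {
    // Hypothetical name; same idea as MyList: reject duplicates on positional insert.
    public static class UniqueList<E> extends ArrayList<E> {
        @Override
        public void add(int pos, E e) {
            if (contains(e)) { // linear scan over the list
                throw new IllegalArgumentException("already in collection: " + e);
            }
            super.add(pos, e);
        }
    }

    public static void main(String[] args) {
        UniqueList<String> list = new UniqueList<>();
        list.add(0, "a");
        list.add(1, "c");
        list.add(1, "b");                        // goes between "a" and "c"
        System.out.println(list);                // [a, b, c]
        System.out.println(list.indexOf("c"));   // 2
        list.remove("a");                        // later elements shift left
        System.out.println(list.indexOf("c"));   // 1: the index adjusted itself
        try {
            list.add(0, "b");                    // duplicate
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage()); // already in collection: b
        }
    }
}
```

The index of each element is simply its position in the list, so requirements 3 and 4 come for free from ArrayList's shifting behaviour.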
My question is: why does an iterator work on a set?
Here is my example code:
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class Staticex {
    public static void main(String[] args) {
        Set<Integer> set = new HashSet<>();
        set.add(1);
        set.add(2);
        set.add(3);
        set.add(4);
        set.add(5);
        Iterator<Integer> iter = set.iterator();
        while (iter.hasNext()) {
            System.out.println(iter.next());
        }
    }
}
I understand that a set is unordered, in contrast to a List.
So how can I get the values one by one through an iterator?
Is the iterator turning the set into something like a list, i.e. an ordered data structure?
How can an Iterator be used on a Set?
The way you are using it.
How can I get the values one by one through an iterator?
Your code is already doing that.
Is the iterator turning the set into something like a list, i.e. an ordered data structure?
No.
The thing you are missing is what "unordered" means. It means that the order in which the set's elements are returned is not predictable [1] and is not specified in the javadocs. However, each element will be returned once and (since the elements of a set are unique!) only once during the iteration.
[1] Actually, this is not strictly true. If you have enough information about the element class, the element values, how they were created and how/when they were added to the HashSet, AND you analyze the specific HashSet implementation, it is possible that you CAN predict the iteration order. For example, if you create a HashSet<Integer> and add 1, 2, 3, 4, ... to it, you will see a clear (and repeatable) pattern when you iterate over the elements. This is partly due to the way Integer.hashCode() is specified.
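A small sketch of the footnote's point. Integer.hashCode() is specified to return the primitive int value, but the iteration order of a HashSet is an implementation detail, so the sketch only prints that order rather than claiming what it is. The only thing it relies on is the guaranteed part: every element is visited exactly once. HashSetOrderDemo and iterationOrder are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class HashSetOrderDemo {
    // Collects the elements in whatever order the set's iterator returns them.
    public static List<Integer> iterationOrder(Set<Integer> set) {
        List<Integer> seen = new ArrayList<>();
        Iterator<Integer> it = set.iterator();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        Set<Integer> set = new HashSet<>();
        for (int v : new int[] {5, 3, 1, 4, 2}) {
            set.add(v);
        }
        // Specified by the Integer javadoc: the hash code is the int value itself.
        System.out.println(Integer.valueOf(42).hashCode()); // prints 42
        // Not specified: small Integers often come back in ascending order
        // regardless of insertion order, but code must not rely on that.
        System.out.println(iterationOrder(set));
    }
}
```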
Referring to the documentation, we see that:
Iterator<E> iterator()
Returns an iterator over the elements in this collection. There are no guarantees concerning the order in which the elements are returned (unless this collection is an instance of some class that provides a guarantee).
Since there are no guarantees about the order in which the iterator returns elements, there is no problem applying an iterator to a Set, which is unordered.
Further, the iterator does not change the Set into a List.
Set is unordered in a logical sense. When you have a bag of things, there isn't a sense of order when they are inside the bag. But when you take each thing out of the bag, one at a time, you end up with some order. And like the other answer has mentioned, you cannot rely on that order since it is purely accidental.
I understand that a set is unordered, in contrast to a List
This is not necessarily true. SortedSet is a subinterface of Set. As the name implies, instances of this interface are ordered in some fashion. For example, TreeSets are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used. Also, the main distinction between Set and List is that List allows for duplicate objects to be contained, whereas Set does not.
Now, if you are talking specifically about HashSet, then you are correct about being unordered.
I think your confusion comes from asking yourself "why is the printout showing the numbers in numeric (insertion) order?" The full answer is somewhat complicated at your level of familiarity, but the printing order is a consequence of inserting Integers, whose hash codes are simply their numeric values. Although there is no guarantee about the order in which the elements of a HashSet are returned during iteration, HashSet is backed by a hash table, and in the current OpenJDK implementation small Integer hash codes land in consecutive buckets, which the iterator visits in ascending order. In fact, if you change the insertion order of those same values, the numbers will most likely still be printed in the same numeric order. Remember, though, that none of this is guaranteed. It may not hold, for instance, if you change the set elements to String objects.
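To see which orderings are actually guaranteed, here is a sketch that puts the same String elements into three Set implementations. Only the LinkedHashSet line (insertion order) and the TreeSet line (natural, i.e. alphabetical, order) are guaranteed; the HashSet line may print in any order. SetOrderDemo and INPUT are illustrative names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SetOrderDemo {
    public static final List<String> INPUT = Arrays.asList("pear", "apple", "fig", "banana");

    public static void main(String[] args) {
        Set<String> hash = new HashSet<>(INPUT);         // order unspecified
        Set<String> linked = new LinkedHashSet<>(INPUT); // insertion order
        Set<String> tree = new TreeSet<>(INPUT);         // natural ordering
        System.out.println(hash);
        System.out.println(linked); // [pear, apple, fig, banana]
        System.out.println(tree);   // [apple, banana, fig, pear]
    }
}
```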
This is a very general question; I'll try to be as clear as I can. Let's say I have some collection of objects; for simplicity, make them integers. Now I want to make a class that represents these integers as some data structure. In this class I want to implement:
a sort function, which sorts the collection according to some defined sorting logic
the iterable interface, where the Iterator traverses in insertion order
How could I make it so that, even if I add integers in unsorted order, e.g.
someCollection.add(1);
someCollection.add(3);
someCollection.add(2);
and then call
someCollection.sort(someSortingLogic);
The iterator still traverses in insertion order, after the collection is sorted. Is there a particular data structure I could use for this purpose, or would it be a case of manually tracking which elements are inserted in which order, or something else I can't think of?
Many thanks!
Generally, to solve a problem like this, you maintain two indexes to the values. Perhaps one of those indexes contains the actual values, perhaps both indexes contain the actual values, or perhaps the actual values are stored elsewhere.
Then when you want to walk the sorted order, you use the sorted index to the values, and when you want the insertion order, you use the insert index to the values.
An index can be as simple as an array containing the values. Naturally, you can't store two different values into one spot in an array, so a simple solution is to wrap two arrays in an Object, such that calling the Object's sort() method sorts one array, while leaving the insertion order array untouched.
Fancier data structures leverage fancier techniques, but they all basically boil down to maintaining two orders, the insertion order AND the sort order.
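import java.util.Arrays;
public class SomeCollection {
    private int[] insertArray = new int[8]; // values in insertion order
    private int[] sortArray = new int[8];   // the same values, kept sorted
    private int insertIndex = 0;
    private int sortIndex = 0;
    public void add(int value) {
        insertArray = expandIfNeeded(insertArray, insertIndex);
        insertArray[insertIndex++] = value;
        sortArray = expandIfNeeded(sortArray, sortIndex);
        sortArray[sortIndex++] = value;
        Arrays.sort(sortArray, 0, sortIndex); // sort only the filled part
    }
    // Doubles the capacity when the array is full
    private static int[] expandIfNeeded(int[] a, int used) {
        return used < a.length ? a : Arrays.copyOf(a, a.length * 2);
    }
    ...
}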
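A self-contained sketch of the two-index idea from the question's point of view, using two ArrayLists instead of raw arrays for brevity (a simplification, not the answer's exact code). TwoOrderCollection, sort(Comparator) and sortedView() are hypothetical names. Iterating the object always follows insertion order, while sort only reorders the second list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class TwoOrderCollection implements Iterable<Integer> {
    private final List<Integer> insertOrder = new ArrayList<>(); // never reordered
    private final List<Integer> sortOrder = new ArrayList<>();   // reordered by sort()

    public void add(int value) {
        insertOrder.add(value);
        sortOrder.add(value);
    }

    // Sorts only the second index; the insertion-order index is untouched.
    public void sort(Comparator<? super Integer> logic) {
        sortOrder.sort(logic);
    }

    // Read-only view of the values in the sort order.
    public List<Integer> sortedView() {
        return Collections.unmodifiableList(sortOrder);
    }

    // The Iterable contract: always traverse in insertion order.
    @Override
    public Iterator<Integer> iterator() {
        return Collections.unmodifiableList(insertOrder).iterator();
    }

    public static void main(String[] args) {
        TwoOrderCollection c = new TwoOrderCollection();
        c.add(1);
        c.add(3);
        c.add(2);
        c.sort(Comparator.reverseOrder());
        for (int v : c) {
            System.out.print(v + " ");      // prints 1 3 2 (insertion order)
        }
        System.out.println();
        System.out.println(c.sortedView()); // [3, 2, 1]
    }
}
```

In this sketch, values added after the last sort() are appended to the end of sortedView() until sort() is called again; keeping the comparator and re-sorting on every add would be one way to change that.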
I'm not sure you've shown us enough code to give you a good answer, but if you have a class that looks a bit like this:
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
public class Hand implements Iterable<Card>
{
    private List<Card> cards = new ArrayList<>();
    // Returns an iterator over the cards in the order they were added
    @Override
    public Iterator<Card> iterator()
    {
        return cards.iterator();
    }
    // Rest of code omitted
Then you can implement a sortedIterator(...) method like this:
// Returns iterator for sorted ordering by Comparator c
public Iterator<Card> sortedIterator(Comparator<? super Card> c)
{
return cards.stream().sorted(c).iterator();
}
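To try both iterators end to end, here is a self-contained sketch. The Card class (a rank and a suit) is a hypothetical stand-in for the asker's class, which was not shown, and sorting by rank is just an example Comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class HandDemo {
    // Hypothetical stand-in for the asker's Card class.
    public static class Card {
        public final int rank;
        public final String suit;
        public Card(int rank, String suit) { this.rank = rank; this.suit = suit; }
        @Override public String toString() { return rank + suit; }
    }

    public static class Hand implements Iterable<Card> {
        private final List<Card> cards = new ArrayList<>();

        public void add(Card c) { cards.add(c); }

        // Iterates in the order the cards were added
        @Override
        public Iterator<Card> iterator() { return cards.iterator(); }

        // Iterates a sorted stream; the underlying list keeps its order
        public Iterator<Card> sortedIterator(Comparator<? super Card> c) {
            return cards.stream().sorted(c).iterator();
        }
    }

    public static void main(String[] args) {
        Hand hand = new Hand();
        hand.add(new Card(9, "H"));
        hand.add(new Card(2, "S"));
        hand.add(new Card(5, "D"));
        Comparator<Card> byRank = Comparator.comparingInt(k -> k.rank);
        Iterator<Card> sorted = hand.sortedIterator(byRank);
        while (sorted.hasNext()) {
            System.out.print(sorted.next() + " "); // 2S 5D 9H
        }
        System.out.println();
        for (Card card : hand) {
            System.out.print(card + " ");          // 9H 2S 5D
        }
        System.out.println();
    }
}
```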
If you show us some more code for what you have written, there may be better solutions.
I have a sorted array of objects, ordered by some object attribute. (Sorting is done on a List using Collections.sort() with a custom Comparator, followed by a call to toArray().)
Duplicate instances of SomeObject are not allowed ("duplicate" here depends on multiple attribute values in SomeObject), but multiple instances of SomeObject may have the same value for attribute1, which is used for sorting.
public class SomeObject {
    public int attribute1; // int assumed; the original omitted the types
    public int attribute2;
}
List<SomeObject> list = ...
Collections.sort(list, new Comparator<SomeObject>() {
    @Override
    public int compare(SomeObject v1, SomeObject v2) {
        if (v1.attribute1 > v2.attribute1) {
            return 1;
        } else if (v1.attribute1 < v2.attribute1) {
            return -1;
        } else {
            return 0;
        }
    }
});
SomeObject[] array = list.toArray(new SomeObject[0]);
How can I efficiently check whether an object with a certain attribute value is in that array, while also being able to "mark" objects already found in a previous lookup (e.g. simply by removing them from the array; already found objects don't need to be accessed later)?
Without the latter requirement, one could do an Arrays.binarySearch() with a custom Comparator. But that obviously doesn't work once objects that have already been found must be removed.
Use a TreeSet (or TreeMultiset).
You can initialize it with your comparator; it sorts itself; look-up and removal are in logarithmic time.
You can also check for existence and remove in one step, because remove returns a boolean.
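A sketch of the TreeSet approach, with one caveat: TreeSet treats two elements as duplicates whenever the comparator returns 0, so a comparator on attribute1 alone would silently drop objects that share attribute1. The sketch breaks ties on attribute2 (an assumption standing in for whatever fields actually define a duplicate) and looks elements up with a probe object. The int field types and the class names are likewise assumptions.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class TreeSetLookupDemo {
    public static class SomeObject {
        public final int attribute1;
        public final int attribute2;
        public SomeObject(int attribute1, int attribute2) {
            this.attribute1 = attribute1;
            this.attribute2 = attribute2;
        }
    }

    // Sort by attribute1, then attribute2, so distinct objects never compare as 0.
    public static final Comparator<SomeObject> ORDER =
            Comparator.<SomeObject>comparingInt(o -> o.attribute1)
                      .thenComparingInt(o -> o.attribute2);

    public static void main(String[] args) {
        TreeSet<SomeObject> set = new TreeSet<>(ORDER);
        set.add(new SomeObject(1, 10));
        set.add(new SomeObject(1, 20)); // same attribute1, still kept
        set.add(new SomeObject(2, 10));
        System.out.println(set.size()); // 3

        // remove() searches in O(log n) and reports whether the element was present.
        SomeObject probe = new SomeObject(1, 20);
        System.out.println(set.remove(probe)); // true: found, and "marked" by removal
        System.out.println(set.remove(probe)); // false: it was already found earlier
        System.out.println(set.size());        // 2
    }
}
```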
Building on Arian's answer, you can also use TreeBag from Apache Commons' Collections library. This is backed by a TreeMap, and maintains a count for repeated elements.
If you want you can put all the elements into some sort of linked list whose nodes are also connected in a heap form when you sort them. That way, finding an element would be log n and you can still delete the nodes in place.
As an exercise, I am developing a simple MySortedSet in Java that implements the SortedSet interface. It is backed by a plain array, E[] array.
I have several questions regarding that:
This is the class (I am not posting the entire code, only the relevant parts):
public class MySortedSet<E> implements SortedSet<E>, Iterator<E> {
    private E[] array;
    private Comparator<? super E> _comparator;
    private int size = 0;
    private int capacity;

    @SuppressWarnings("unchecked")
    public MySortedSet() {
        this.capacity = 10;
        this.array = (E[]) new Object[this.capacity];
        // this.array = Array.newInstance(Class<E> var,int size);
        // We have to get Class<E> from outside caller.
    }
}
Question 3: The code guide explains the constructor as follows (since this is a sorted set, the elements are assumed to be sorted):
If this constructor is used to create the sorted set, it is assumed that the elements are ordered using their natural ordering (i.e., E implements Comparable).
There are two constructors: one takes no parameters and the other accepts a Comparator<? super E>.
@SuppressWarnings("unchecked")
public MySortedSet() {
    this.capacity = 10;
    this.array = (E[]) new Object[this.capacity];
    // this.array = Array.newInstance(Class<E> var,int size);
    // We have to get Class<E> from outside caller.
}
public MySortedSet(Comparator<? super E> comparator) {
    this(); // also allocate the backing array
    this._comparator = comparator;
}
If no comparator is passed in, natural ordering should be used, but I am not sure how to accomplish that, since I need a Comparator in some way to access a compare method. Could you recommend a way to do this, so that I can call it when comparing elements in the sort method?
For those who would like to see the whole code, please refer to this address: http://codepaste.net/4ucvsw. It is not perfect, but I am working on it.
Question 4: The code guide says:
Because the fastest way to search for a particular item is to use a binary search, you must ensure that the items in the collection are in sorted order at all times. This won't be as hard to achieve as it may seem. When you insert a new item, you can assume the array you are inserting it into is already sorted. So all you need to do is find the position in the array where the new item belongs, shift everything greater than the new item one slot to the right, and insert the new item. This is called insertion sort.
Here is the logic for the sort method. I need to use binary search to find where a new item belongs, so that I can insert it there and move the other items one slot to the right.
But I am not sure how binary search will work for me here, since I am not sure what I need to find: I am given the item to add, not an item to find. What I had in mind instead is to compare each element with the one I need to add; when I find the last smaller item and the first greater item, I take the first greater item's index, shift the items from that index one slot to the right, and add the new item at that index.
Question 3:
The fact is that every sorted collection created without a Comparator works with an ordering defined implicitly by E itself, so every class used to instantiate the type parameter E should implement Comparable<E>. The compare method you are looking for, for natural ordering, is
int compareTo(E other)
which must be implemented by the classes you are going to use with your data structure. Since your work is not about defining the classes to be used with your collection, only the collection itself, what you should do is:
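public class MySortedSet<E> ... {
    private Comparator<? super E> _comparator;
    @SuppressWarnings("unchecked")
    public int innerCompare(E e1, E e2) {
        if (_comparator != null)
            return _comparator.compare(e1, e2);
        else
            // E is unbounded, so natural ordering needs a cast to Comparable
            return ((Comparable<? super E>) e1).compareTo(e2);
    }
    ...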
This way you use the custom comparator when one is provided, and the natural ordering otherwise.
Comparable and Comparator follow the same principle, but the first one, as the name states, is attached to the data class, so it defines that class's natural ordering. The latter, instead, lets you define a custom ordering for elements that their natural ordering would sort differently.
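A compilable sketch of this fallback outside the full set class. Because E is unbounded, the natural-ordering branch needs an unchecked cast to Comparable, which is also where a non-Comparable element turns into a ClassCastException at runtime. OrderingHelper is a hypothetical name; "apple" vs "fig" is chosen because the two orderings disagree on it.

```java
import java.util.Comparator;

public class OrderingHelper<E> {
    private final Comparator<? super E> comparator; // null means natural ordering

    public OrderingHelper(Comparator<? super E> comparator) {
        this.comparator = comparator;
    }

    public OrderingHelper() {
        this(null);
    }

    @SuppressWarnings("unchecked")
    public int innerCompare(E e1, E e2) {
        if (comparator != null) {
            return comparator.compare(e1, e2);
        }
        // Natural ordering: throws ClassCastException if e1 is not Comparable
        return ((Comparable<? super E>) e1).compareTo(e2);
    }

    public static void main(String[] args) {
        Comparator<String> lengthOrder = Comparator.comparingInt(String::length);
        OrderingHelper<String> natural = new OrderingHelper<>();
        OrderingHelper<String> byLength = new OrderingHelper<>(lengthOrder);
        System.out.println(natural.innerCompare("apple", "fig") < 0);  // true: alphabetical
        System.out.println(byLength.innerCompare("apple", "fig") > 0); // true: 5 letters vs 3
    }
}
```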
Question 4:
What it means is that, assuming you start with a sorted array, you only need to keep this invariant valid after every insertion, and then you can use binary search when looking for items.
You just need to focus on placing the item you have to add at the correct index. The part of the statement about finding elements should be read as follows:
If you take care to keep your array sorted, which can be done by ensuring that every element you add is placed in the right position (e.g. with insertion sort), then you can apply binary search to the array when checking whether an element is contained in the set.
This works because, if the array is sorted, looking at the middle element of a section of the array always tells you which direction to go to find out whether another element is contained in the array.
E.g.:
1, 2, 6, 11, 21, 30, 45
You need to check for 2: take the element at index size()/2 = 3, which is 11. Since you already know that the array is sorted and 2 < 11, you can do the same thing recursively on the left half, and so on.
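The halving step above, written as a plain iterative binary search over the example array. This is a sketch with int values; for objects, the two comparisons would go through compareTo or a Comparator instead. BinarySearchDemo is an illustrative name.

```java
public class BinarySearchDemo {
    // Returns the index of key in the sorted array a, or -1 if it is absent.
    public static int indexOf(int[] a, int key) {
        int lo = 0;
        int hi = a.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;  // middle of the current section
            if (a[mid] < key) {
                lo = mid + 1;           // key can only be in the right half
            } else if (a[mid] > key) {
                hi = mid - 1;           // key can only be in the left half
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 6, 11, 21, 30, 45};
        System.out.println(indexOf(a, 2)); // 1 (first probe is a[3] = 11)
        System.out.println(indexOf(a, 7)); // -1
    }
}
```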
Answer 3:
"Natural ordering" means that the elements must implement Comparable so that they can be compared without a Comparator. The tricky part about giving the caller a choice between Comparable elements and a Comparator is that you can't use compile-time checking to make sure they fulfilled at least one of these requirements.
Java's TreeSet similarly exposes a constructor that takes a Comparator and others that don't. TreeSet's contract is essentially to throw a ClassCastException if you try to insert an element that isn't Comparable when you didn't provide the TreeSet with a Comparator at creation time. Whether you want to use this strategy or another one is up to you.
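A quick way to see TreeSet's runtime check. Plain is a hypothetical element type that does not implement Comparable; the sketch adds two elements inside the try block so it does not depend on whether the exception is raised on the first insert or only once a comparison is needed.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class TreeSetContractDemo {
    // Hypothetical element type that does not implement Comparable.
    public static class Plain {
        public final int id;
        public Plain(int id) { this.id = id; }
    }

    // Returns true if adding two Plain elements fails with ClassCastException.
    public static boolean rejectsPlain(TreeSet<Plain> set) {
        try {
            set.add(new Plain(1));
            set.add(new Plain(2));
            return false;
        } catch (ClassCastException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        // No Comparator: TreeSet falls back to natural ordering and the cast fails.
        System.out.println(rejectsPlain(new TreeSet<>()));     // true
        // With a Comparator, non-Comparable elements are accepted.
        Comparator<Plain> byId = Comparator.comparingInt(p -> p.id);
        System.out.println(rejectsPlain(new TreeSet<>(byId))); // false
    }
}
```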
Answer 4:
Based on the quote's strategy, you should be able to use Arrays.binarySearch(Object[], Object) for this exact purpose. From that method's documentation:
Returns: index of the search key, if it is contained in the array; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the array: the index of the first element greater than the key, or a.length if all elements in the array are less than the specified key. Note that this guarantees that the return value will be >= 0 if and only if the key is found.
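A sketch of insertion built on that return value, with the asker's fields (array, size, comparator) pulled into a standalone class. SortedArraySet and its grow-by-doubling policy are hypothetical. It uses the range overload Arrays.binarySearch(a, fromIndex, toIndex, key, c) so only the filled part of the backing array is searched; per its javadoc, a null Comparator means natural ordering, which also covers the no-Comparator constructor.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SortedArraySet<E> {
    private Object[] array = new Object[10];
    private int size = 0;
    private final Comparator<? super E> comparator; // null = natural ordering

    public SortedArraySet(Comparator<? super E> comparator) {
        this.comparator = comparator;
    }

    @SuppressWarnings("unchecked")
    public boolean add(E e) {
        // Search only the filled part [0, size) of the backing array.
        int pos = Arrays.binarySearch((E[]) array, 0, size, e, comparator);
        if (pos >= 0) {
            return false;                  // already present: keep elements unique
        }
        int insertionPoint = -pos - 1;     // decode (-(insertion point) - 1)
        if (size == array.length) {
            array = Arrays.copyOf(array, size * 2);
        }
        // Shift everything greater than e one slot to the right, then insert.
        System.arraycopy(array, insertionPoint, array, insertionPoint + 1, size - insertionPoint);
        array[insertionPoint] = e;
        size++;
        return true;
    }

    @Override
    public String toString() {
        return Arrays.toString(Arrays.copyOf(array, size));
    }

    public static void main(String[] args) {
        SortedArraySet<Integer> set = new SortedArraySet<>(null);
        for (int v : new int[] {30, 10, 20, 10}) {
            set.add(v);
        }
        System.out.println(set); // [10, 20, 30]
    }
}
```

This answers the "I am given the item to add, not the item to find" concern: searching for the new item itself either finds a duplicate or tells you exactly where it belongs.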
I've been using HashMaps since I started programming in Java again, without really understanding these Collections things.
Honestly, I am not really sure whether using HashMaps everywhere would be best for me or for production code. Up until now it didn't matter to me, as long as I could get at the data the way I did in PHP (yes, I admit it, whatever negative thing you are thinking right now), where $this_is_array['this_is_a_string_index'] makes it so convenient to recall an array of variables.
So now I have been working with Java for more than 3 months, came across the interfaces I specified above, and wondered why there are so many of these things (not to mention Vector, AbstractList {oh well, the list goes on...}).
I mean how are they different from each other?
And more importantly, what is the best Interface to use in my case?
The API is pretty clear about the differences and/or relations between them:
Collection
The root interface in the collection hierarchy. A collection represents a group of objects, known as its elements. Some collections allow duplicate elements and others do not. Some are ordered and others unordered.
http://download.oracle.com/javase/6/docs/api/java/util/Collection.html
List
An ordered collection (also known as a sequence). The user of this interface has precise control over where in the list each element is inserted. The user can access elements by their integer index (position in the list), and search for elements in the list.
http://download.oracle.com/javase/6/docs/api/java/util/List.html
Set
A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models the mathematical set abstraction.
http://download.oracle.com/javase/6/docs/api/java/util/Set.html
Map
An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.
http://download.oracle.com/javase/6/docs/api/java/util/Map.html
Is there anything in particular you find confusing about the above? If so, please edit your original question. Thanks.
A short summary of common java collections:
'Map': A 'Map' is a container that lets you store key=>value pairs. This enables fast searches using the key to get to its associated value. The two classic implementations in the java.util package are 'HashMap' and 'TreeMap'. The former is implemented as a hash table, while the latter is implemented as a balanced binary search tree (which also keeps the keys sorted).
'Set': A 'Set' is a container that holds only unique elements. Inserting the same value multiple times still results in the 'Set' holding only one instance of it. It also provides fast operations to search, remove, add, merge and compute the intersection of two sets. Like 'Map', it has two classic implementations, 'HashSet' and 'TreeSet'.
'List': The 'List' interface is implemented by the 'Vector', 'ArrayList' and 'LinkedList' classes. A 'List' is basically a collection of elements that preserves their relative order. You can add/remove elements and access individual elements at any given position. Unlike a 'Map', 'List' items are indexed by an int that is their position in the 'List' (the first element being at position 0 and the last at 'List.size()'-1). 'Vector' and 'ArrayList' are implemented using an array, while 'LinkedList', as the name implies, uses a linked list. One thing to note is that, unlike PHP's associative arrays (which are more like a Map), an array in Java and many other languages actually represents a contiguous block of memory. The elements of an array are laid out side by side in adjacent "slots", so to speak. This gives very fast lookup and write times, much faster than associative arrays, which are implemented using more complex data structures. But, unlike associative arrays, arrays can't be indexed by anything other than the numeric positions within the array.
To get a really good idea of what each collection is good for and of their performance characteristics, I would recommend getting a good understanding of data structures like arrays, linked lists, binary search trees and hash tables, as well as stacks and queues. There is really no substitute for learning this if you want to be an effective programmer in any language.
You can also read the Java Collections trail to get you started.
In Brief (and only looking at interfaces):
List - a list of values, something like a "resizable array"
Set - a container that does not allow duplicates
Map - a collection of key/value pairs
A Map vs a List.
In a Map, you have key/value pairs. To access a value you need to know the key. There is a relationship between the key and the value that persists and is not arbitrary; they are related somehow. Example: a person's DNA is unique (the key) and maps to the person's name (the value), or a person's SSN (the key) maps to the person's name (the value). There is a strong relationship.
In a List, all you have are values (a person's name), and to access one you need to know its position in the list (its index). But there is no permanent relationship between a value and its index; the position is arbitrary and changes as elements are inserted or removed.
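A short sketch of the contrast, closest to the PHP habit of $array['string_key'] from the question. The keys and names are made-up sample data. The map lookup stays tied to its key, while a list index simply shifts when an earlier element is removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapVsListDemo {
    public static void main(String[] args) {
        // Key -> value: the key identifies the value (like PHP's $arr['key']).
        Map<String, String> nameBySsn = new HashMap<>();
        nameBySsn.put("ssn-001", "Alice");
        nameBySsn.put("ssn-002", "Bob");
        System.out.println(nameBySsn.get("ssn-002"));  // Bob

        // Values only: the index is just the current position.
        List<String> names = new ArrayList<>(Arrays.asList("Alice", "Bob", "Carol"));
        System.out.println(names.indexOf("Carol"));    // 2
        names.remove("Alice");                         // everything after shifts left
        System.out.println(names.indexOf("Carol"));    // 1
        System.out.println(nameBySsn.get("ssn-002"));  // still Bob
    }
}
```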
■ List — An ordered collection of elements that allows duplicate entries
Concrete Classes:
ArrayList — Standard resizable list.
LinkedList — Can easily add/remove from beginning or end.
Vector — Older thread-safe version of ArrayList.
Stack — Older last-in, first-out class.
■ Set — Does not allow duplicates
Concrete Classes:
HashSet — Uses hashCode() to find unordered elements.
TreeSet — Sorted and navigable. Does not allow null values (with natural ordering).
■ Queue — Orders elements for processing
Concrete Classes:
LinkedList — Can easily add/remove from beginning or end.
ArrayDeque — First-in, first-out or last-in, first-out. Does not allow null values.
■ Map — Maps unique keys to values
Concrete Classes:
HashMap — Uses hashCode() to find keys.
TreeMap — Sorted map. Does not allow null keys (with natural ordering).
Hashtable — Older, synchronized version of HashMap. Does not allow null keys or values.
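A sketch that checks the null-related notes in this list against the JDK. TreeSet and TreeMap are created without a Comparator here, which is the case where nulls are rejected; a null-tolerant Comparator could change that. NullPolicyDemo and rejectsNull are illustrative names.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.TreeMap;
import java.util.TreeSet;

public class NullPolicyDemo {
    // Runs an action and reports whether it threw NullPointerException.
    public static boolean rejectsNull(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsNull(() -> new HashSet<String>().add(null)));             // false
        System.out.println(rejectsNull(() -> new TreeSet<String>().add(null)));             // true
        System.out.println(rejectsNull(() -> new ArrayDeque<String>().add(null)));          // true
        System.out.println(rejectsNull(() -> new HashMap<String, String>().put(null, "v"))); // false
        System.out.println(rejectsNull(() -> new TreeMap<String, String>().put(null, "v"))); // true
        System.out.println(rejectsNull(() -> new Hashtable<String, String>().put("k", null))); // true
    }
}
```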
That is a question that ultimately has a very complex answer--there are entire college classes dedicated to data structures. The short answer is that they all have trade-offs in memory usage and the speed of various operations.
What would be really healthy is some time with a nice book on data structures--I can almost guarantee that your code will improve significantly if you get a nice understanding of data structures.
That said, I can give you some quick, temporary advice from my experience with Java. For most simple internal things, ArrayList is generally preferred. For passing collections of data about, simple arrays are generally used. HashMap is only really used for cases when there is some logical reason to have special keys corresponding to values--I haven't seen anyone use them as a general data structure for everything. Other structures are more complicated and tend to be used in special cases.
As you already know, they are containers for objects. Reading their respective APIs will help you understand their differences.
Since others have described what are their differences about their usage, I will point you to this link which describes complexity of various data structures.
This list is programming language agnostic, and, as always, real world implementations will vary.
It is useful to understand complexity of various operations for each of these structures, since in the real world, it will matter if you're constantly searching for an object in your 1,000,000 element linked list that's not sorted. Performance will not be optimal.
List Vs Set Vs Map
1) Duplicates: List allows duplicate elements. Any number of duplicate elements can be inserted into a list without affecting the existing values and their indexes.
Set doesn't allow duplicates. Set and all of the classes that implement the Set interface must have unique elements.
Map stores its elements as key-value pairs. Map doesn't allow duplicate keys, but it does allow duplicate values.
2) Null values: List allows any number of null values.
Set allows at most one null value.
Map can have at most one null key and any number of null values (some implementations, such as TreeMap and Hashtable, are stricter).
3) Order: List and all of its implementation classes maintain the insertion order.
Set doesn't maintain any order in general; still, a few of its classes do. For example, LinkedHashSet maintains the elements in insertion order.
Like Set, Map doesn't store its elements in any particular order in general, but a few of its classes do. For example, TreeMap sorts the map in ascending order of its keys, and LinkedHashMap keeps the elements in insertion order, the order in which they were added to the LinkedHashMap.
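A sketch of the ordering point for maps: the same keys, inserted in the same order, into HashMap, LinkedHashMap and TreeMap. Only the LinkedHashMap (insertion order) and TreeMap (ascending keys) outputs are guaranteed. MapOrderDemo and fill are illustrative names.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {
    // Puts the same keys, in the same order, into the given map and returns it.
    public static Map<String, Integer> fill(Map<String, Integer> map) {
        String[] keys = {"pear", "apple", "fig"};
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], i);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(fill(new HashMap<>()).keySet());       // order unspecified
        System.out.println(fill(new LinkedHashMap<>()).keySet()); // [pear, apple, fig]
        System.out.println(fill(new TreeMap<>()).keySet());       // [apple, fig, pear]
    }
}
```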
Difference between Set, List and Map in Java -
Set, List and Map are three important interfaces of the Java Collections Framework, and the difference between Set, List and Map is one of the most frequently asked Java Collections interview questions. Sometimes the question is asked as "When should you use List, Set and Map in Java?" Clearly, the interviewer wants to know whether you are familiar with the fundamentals of the Java Collections Framework. To decide when to use List, Set or Map, you need to know what these interfaces are and what functionality they provide. List in Java provides an ordered, indexed collection that may contain duplicates. Set provides an unordered collection of unique objects, i.e. Set doesn't allow duplicates, while Map provides a data structure based on key-value pairs (and on hashing, in the case of HashMap). All three of List, Set and Map are interfaces, and many concrete implementations of them are available in the Collections API. ArrayList and LinkedList are the two most popular List implementations, while LinkedHashSet, TreeSet and HashSet are frequently used Set implementations. In this article we will look at the differences between Map, Set and List in Java and learn when to use each of them.
Set vs List vs Map in Java
As I said, Set, List and Map are interfaces, which define core contracts; e.g. the Set contract says that it cannot contain duplicates. Based on our knowledge of List, Set and Map, let's compare them on different metrics.
Duplicate Objects
The main difference between the List and Set interfaces in Java is that List allows duplicates while Set doesn't. All implementations of Set honor this contract. A Map holds two objects per entry, a key and a value; it may contain duplicate values, but keys are always unique. See here for more differences between the List and Set data structures in Java.
Order
Another key difference between List and Set is that List is an ordered collection: the List contract maintains the insertion order of elements. Set is an unordered collection; you get no guarantee about the order in which elements are stored, though some Set implementations, e.g. LinkedHashSet, do maintain order. Also, SortedSet and SortedMap implementations such as TreeSet and TreeMap maintain a sorting order imposed by a Comparator or by Comparable.
Null elements
List allows null elements, and you can have many null objects in a List because it also allows duplicates. Set allows just one null element, since no duplicates are permitted, while in a Map you can have null values and at most one null key. It is worth noting that Hashtable doesn't allow null keys or values, whereas HashMap allows null values and one null key. This is also a main difference between these two popular implementations of the Map interface, aka HashMap vs Hashtable.
Popular implementation
The most popular implementations of the List interface in Java are ArrayList, LinkedList and Vector. ArrayList is more general-purpose and provides random access by index, while LinkedList is better suited to frequently adding and removing elements. Vector is the synchronized counterpart of ArrayList. On the other hand, the most popular implementations of the Set interface are HashSet, LinkedHashSet and TreeSet. The first is a general-purpose Set backed by a HashMap; see how HashSet works internally in Java for more details. It doesn't provide any ordering guarantee, but LinkedHashSet provides ordering along with the uniqueness offered by the Set interface. The third implementation, TreeSet, also implements the SortedSet interface, so it keeps elements in the sorted order specified by the compare() or compareTo() method. Finally, the most popular implementations of the Map interface are HashMap, LinkedHashMap, Hashtable and TreeMap. HashMap is the non-synchronized, general-purpose Map implementation, while Hashtable is its synchronized counterpart; neither provides an ordering guarantee, which is what LinkedHashMap adds. Just like TreeSet, TreeMap is a sorted data structure and keeps its keys in sorted order.