As a sample, I am developing a simple MySortedSet in Java which implements the SortedSet interface. It is backed by a simple E[] array.
I have several questions regarding that.
This is the class (I am not posting the entire code, only the relevant parts):
public class MySortedSet<E> implements SortedSet<E>, Iterator<E> {
    private E[] array;
    private Comparator<? super E> _comparator;
    private int size = 0;
    private int capacity;

    @SuppressWarnings("unchecked")
    public MySortedSet() {
        this.capacity = 10;
        this.array = (E[]) new Object[this.capacity];
        // this.array = Array.newInstance(Class<E> var, int size);
        // We would have to get Class<E> from the outside caller.
    }
}
Question 3: There is an explanation for the constructor; since this is a sorted set, it is assumed that the elements are ordered:
If this constructor is used to create the sorted set, it is assumed
that the elements are ordered using their natural ordering (i.e., E
implements Comparable).
It has two different constructors: one is parameterless and the other accepts a Comparator<? super E>.
@SuppressWarnings("unchecked")
public MySortedSet() {
    this.capacity = 10;
    this.array = (E[]) new Object[this.capacity];
    // this.array = Array.newInstance(Class<E> var, int size);
    // We would have to get Class<E> from the outside caller.
}

public MySortedSet(Comparator<? super E> comparator) {
    this();  // the backing array must be allocated here as well
    this._comparator = comparator;
}
If a comparator is not passed in, natural ordering should be used, but I am not really sure how to accomplish that, since I need a Comparator of some kind to access a compare method. Could you recommend a way to call it, so that I can use it when comparing elements in my sort method?
For those who would like to see the whole code, please refer to this address: http://codepaste.net/4ucvsw. It is not perfect, but I am working on it.
Question 4: The code guide says:
Because the fastest way to search for a particular item is to use a
binary search, you must ensure that the items in the collection are in
sorted order at all times. This won't be as hard to achieve as it may
seem. When you insert a new item, you can assume the array you are
inserting it into is already sorted. So all you need to do is find the
position in the array where the new item belongs, shift everything
greater than the new item one slot to the right, and insert the new
item. This is called insertion sort.
Here is the sort method logic. I need to do a binary search to find where a new item belongs, so that I can insert the new item in that place and move the others one slot to the right.
But I am not really sure how binary search will work for me here, since I am not sure what I need to find: I am given the item to add, not an item to find. What I think instead is to compare each element with the one I need to add; when I find the last smaller and the first greater item, I take the first greater item's index, shift the elements from there to the right, and add the new item at that index.
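That scan-shift-insert plan is sound. A minimal sketch of it under the assumption of an int array (the method name is illustrative; resizing and the set's duplicate check are omitted):

```java
import java.util.Arrays;

public class LinearInsertDemo {
    // Inserts 'value' into the first 'size' slots of 'a', keeping them sorted.
    // Returns the new size. Assumes 'a' has at least size + 1 slots.
    static int insertSorted(int[] a, int size, int value) {
        int pos = 0;
        while (pos < size && a[pos] < value) {
            pos++;                                         // find the first element >= value
        }
        System.arraycopy(a, pos, a, pos + 1, size - pos);  // shift the tail one slot right
        a[pos] = value;
        return size + 1;
    }

    public static void main(String[] args) {
        int[] a = new int[5];
        int size = 0;
        for (int v : new int[]{21, 3, 11}) {
            size = insertSorted(a, size, v);
        }
        System.out.println(Arrays.toString(Arrays.copyOf(a, size)));  // [3, 11, 21]
    }
}
```

The linear scan can later be swapped for a binary search without changing the shift-and-insert part.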
Question 3:
Every collection works with a predefined comparator, implicitly defined on E: every class used to instantiate the type parameter E should implement Comparable<E>. The compare method you are looking for, for natural ordering, is the method
int compareTo(E other)
that must be implemented by the classes you are going to use with your data structure. Since your task is not defining the classes to be used with your collection, but the collection itself, what you are going to do is have:
public class MySortedSet<E> ... {
    private Comparator<? super E> _comparator;

    @SuppressWarnings("unchecked")
    public int innerCompare(E e1, E e2) {
        if (_comparator != null)
            return _comparator.compare(e1, e2);
        else
            // relies on E implementing Comparable; throws ClassCastException otherwise
            return ((Comparable<? super E>) e1).compareTo(e2);
    }
    ...
This way you will use the custom comparator when one is provided, and the natural ordering otherwise.
Both Comparable and Comparator follow the same principle, but the former, as the name states, is attached to the data class itself, so it defines that class's natural ordering. The latter is used when you want a custom way to sort elements that would otherwise be ordered differently by their natural ordering.
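To illustrate the difference, here is a small sketch contrasting String's natural (Comparable-based) ordering with a custom length-based Comparator:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class OrderingDemo {
    public static void main(String[] args) {
        List<String> words = new ArrayList<>(List.of("pear", "fig", "banana"));

        // Natural ordering: String implements Comparable<String>,
        // so no Comparator is needed -> alphabetical order.
        Collections.sort(words);
        System.out.println(words);  // [banana, fig, pear]

        // Custom ordering: a Comparator overrides the natural order,
        // here sorting by length instead.
        words.sort(Comparator.comparingInt(String::length));
        System.out.println(words);  // [fig, pear, banana]
    }
}
```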
Question 4:
What it means is that, under the assumption of a sorted array, you just have to keep this invariant valid after every insertion, and then you are allowed to use binary search when looking for items.
You must focus only on placing each element (the item you have to add) at the correct index. The part of the statement related to finding elements must be interpreted in the following way:
if you take care of keeping your array sorted, which can be done by ensuring that every element you add is placed in the right position (e.g., with insertion sort), then you can apply binary search to the array when checking whether an element is contained in the set.
This is true because, if the array is sorted, looking at the middle element of any section of the array will always point you in the right direction to see whether another element is contained in the list.
E.g.:
1, 2, 6, 11, 21, 30, 45
You need to check for 2. You take the element at index size()/2 = 3, which is 11. Since you already know the array is sorted and 2 < 11, you can recurse on the left half, and so on.
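That recursive halving can be sketched as follows (the method name is illustrative):

```java
public class BinarySearchDemo {
    // Returns the index of 'key' in the sorted array, or -1 if absent.
    static int search(int[] sorted, int key, int lo, int hi) {
        if (lo > hi) {
            return -1;                 // empty range: key is not present
        }
        int mid = lo + (hi - lo) / 2;  // avoids int overflow of (lo + hi)
        if (key == sorted[mid]) {
            return mid;
        } else if (key < sorted[mid]) {
            return search(sorted, key, lo, mid - 1);  // recurse on the left half
        } else {
            return search(sorted, key, mid + 1, hi);  // recurse on the right half
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 6, 11, 21, 30, 45};
        System.out.println(search(a, 2, 0, a.length - 1));  // 1
        System.out.println(search(a, 7, 0, a.length - 1));  // -1
    }
}
```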
Answer 3:
"Natural ordering" means that the elements must implement Comparable so that they can be compared without the use of a Comparator. The tricky part about giving the caller a choice between Comparable elements and a Comparator is that you can't do compile-time checking to make sure they have fulfilled at least one of these requirements.
Java's TreeSet similarly exposes a constructor that takes a Comparator and others that don't. TreeSet's contract is essentially to throw a ClassCastException if you try to insert an element that isn't a Comparable if you didn't provide the TreeSet with a Comparator when you created it. Whether you want to use this strategy or another one is up to you.
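A quick sketch of that contract in action, using a hypothetical non-Comparable Point class:

```java
import java.util.TreeSet;

public class TreeSetContractDemo {
    // A class that does NOT implement Comparable.
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    public static void main(String[] args) {
        // No Comparator supplied: TreeSet falls back to natural ordering,
        // so inserting a non-Comparable element fails at runtime.
        TreeSet<Point> bad = new TreeSet<>();
        try {
            bad.add(new Point(1, 2));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: Point is not Comparable");
        }

        // With a Comparator, the same element type works fine.
        TreeSet<Point> good = new TreeSet<>((a, b) -> Integer.compare(a.x, b.x));
        good.add(new Point(1, 2));
        System.out.println(good.size());  // 1
    }
}
```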
Answer 4:
Based on the quote's strategy, you should be able to use Arrays.binarySearch(Object[], Object) for this exact purpose. From that method's documentation:
Returns: index of the search key, if it is contained in the array;
otherwise, (-(insertion point) - 1). The insertion point is defined as
the point at which the key would be inserted into the array: the index
of the first element greater than the key, or a.length if all elements
in the array are less than the specified key. Note that this
guarantees that the return value will be >= 0 if and only if the key
is found.
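A sketch of decoding that negative return value into an insertion index for an array-backed set (fixed-size array with one spare slot; growth is omitted):

```java
import java.util.Arrays;

public class InsertionPointDemo {
    public static void main(String[] args) {
        Integer[] a = {10, 20, 40, 50, null};  // one free slot at the end
        int size = 4;

        int idx = Arrays.binarySearch(a, 0, size, 30);
        if (idx >= 0) {
            System.out.println("already present; set semantics: skip");
        } else {
            int insertAt = -idx - 1;  // decode the encoded insertion point
            // shift everything from insertAt one slot to the right
            System.arraycopy(a, insertAt, a, insertAt + 1, size - insertAt);
            a[insertAt] = 30;
            size++;
        }
        System.out.println(Arrays.toString(a));  // [10, 20, 30, 40, 50]
    }
}
```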
Related
My use case is one where I need to maintain a collection of unique items. I may need to frequently add or remove items from the collection at an index (which I currently store as a member of the item, but I am open to modification), and while doing that I need to update the indexes of the items.
I am not able to decide which Java collection would suit my needs best. HashSet and SortedSet both guarantee uniqueness, but I am not sure how the index part can be taken care of.
According to the question + comments, you have the following fundamental requirements for the collection:
The elements in the collection must be unique.
The collection must maintain the elements in an order specified by the user.
The collection elements must have unique indexes representing the element's current position.
The indexes must adjust as elements are inserted and deleted.
There is no single Java SE collection type that satisfies all of requirements 1 through 4.
The only Java SE collection type that supports and can maintain an arbitrary ordering is List, so you need to start with that. Something like this for instance:
public class MyList<E> extends ArrayList<E> {
    ...
    @Override
    public void add(int pos, E e) {
        if (this.contains(e)) {
            throw new SomeException("already in collection");
        }
        super.add(pos, e);  // must call super, or this method would recurse forever
    }
}
Note that HashMap<Integer, E> is a possible alternative, but adjusting the indexes as elements are added and removed is complicated. (Comparing the performance characteristics is not straightforward, but the chances are that it won't matter in your use-case.)
So, in some question I was required to implement the following:
Data structure of fixed size (n=10), that is always ordered (descending, not that it matters), thread safe, and supports random access.
My solution was: using a TreeSet, whenever adding an element, if there are already n elements, remove the smallest element (if the new element is bigger than it) and add the new element. Otherwise, just add the new element.
When accessing a random index, use the TreeSet iterator to iterate until the required index.
I don't like this solution so much. So I thought of another solution:
Using an ArrayList constructed with an initial capacity of n: whenever trying to add an element, do a Collections.binarySearch() for the element and, if it doesn't exist, insert it at the index derived from binarySearch's return value. If after adding the element the list length is bigger than n (equal to n+1, actually), remove the smallest element (which is at the end of the list). This way we get O(log n) for add (same as the TreeSet in the previous solution) while random access is O(1). The only thing I don't like about it is that add() at an arbitrary index in the middle of the list requires shifting all the elements after it (which works well for small n, but maybe not for big n).
For both solutions I use ReentrantReadWriteLock - acquire writeLock() for add and readLock() for the get() / read operations.
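The second solution can be sketched roughly like this (the class and method names are my own, not from the question):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Fixed-capacity, always-sorted (descending), thread-safe list with O(1) random access.
public class BoundedSortedList<E extends Comparable<E>> {
    private final int capacity;
    private final List<E> items;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public BoundedSortedList(int capacity) {
        this.capacity = capacity;
        this.items = new ArrayList<>(capacity);
    }

    public boolean add(E e) {
        lock.writeLock().lock();
        try {
            // binarySearch requires the list order and comparator to agree:
            // the list is kept descending, so search with reverseOrder().
            int idx = Collections.binarySearch(items, e, Collections.reverseOrder());
            if (idx >= 0) {
                return false;                    // already present: set semantics
            }
            int insertAt = -idx - 1;             // decode the insertion point
            items.add(insertAt, e);              // shifts the tail one slot right
            if (items.size() > capacity) {
                items.remove(items.size() - 1);  // drop the smallest element
            }
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public E get(int index) {
        lock.readLock().lock();
        try {
            return items.get(index);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

One corner case to note: when the structure is full and the new element is smaller than everything present, this sketch inserts it and immediately drops it again; a real implementation would check for that first.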
Is there a better solution?
Collections.synchronizedList(List l) wraps the passed list in a thread-safe List.
You can implement the Comparable interface in your element class and override the compareTo() method so that it orders elements in descending order as you add them to the ArrayList<>(), or you can go for a Comparator and override its compare() method when sorting.
Among the collection interfaces, only List implementations support RandomAccess.
List<Employee> list = Collections.synchronizedList(new ArrayList<Employee>(10));
And if you want to prevent the same item from being added to the ArrayList twice, use a Comparator that returns 0 when the new item (the one you want to add) and an already-added item are equal, and handle the return value: if it is 0, don't add the item; otherwise, add it.
I hope I could give you some hints.
A HW question
You are given a list, L , and a another list, P, containing integers sorted in ascending order. The operation printLots(L,P) will print the elements in L that are in positions specified by P. For instance, if P=1,3,4,6, the elements in positions 1,3,4, and 6 in L are printed. Write the procedure printLots(L,P). The code you provide should be the java method itself (not pseudocode), the containing class is not necessary.
You may use only the public Collection https://docs.oracle.com/javase/8/docs/api/java/util/Collection.html methods that are inherited by lists L and P. You may not use methods that are only in List. Anything that the Collection methods return is fair game, so you might think about how to use iterators.
Hi all, the above is part of a problem. I am very confused about what it means, as I am not very familiar with Collection and lists. Does this requirement mean that I cannot use methods such as get() and instead have to use an Iterator?
Here is my code:
public static void printLots(List<Integer> L, List<Integer> P) {
    int size = L.size();
    for (int i = 0; i < P.size(); i++) {
        int pos = P.get(i);
        if (pos < size) {  // a valid 0-based index into L
            int val = L.get(pos);
            System.out.println(val);
        } else {
            System.out.println("It has exceeded the number of elements in L");
        }
    }
}
No, you cannot use List.get(idx).
I think they are trying to get you to think about a more "interesting" way of doing it. Collection does technically define a toArray method that you could use, but it's probably against the spirit of the problem you have been set.
Collection also has a contains method, so you could iterate over your first list, increment a counter, and see if the second contains that index each time. If so, print it. It's a very inefficient way, but it would satisfy the problem.
I would suggest changing the declaration of your parameters to Collection<Integer> instead, that way the compiler will prevent you from accidentally using methods that exist in List
I'd suspect that you're right and .get(int) is not allowed.
Since positions in P are given in ascending order, it's easy to use an iterator and keep a counter of the element currently "iterated at". The counter is also strictly ascending.
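A sketch of that single-pass approach, assuming 1-based positions as in the problem's example and using only Collection-level methods (iterator() and for-each):

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

public class PrintLots {
    public static void printLots(Collection<Integer> L, Collection<Integer> P) {
        Iterator<Integer> itL = L.iterator();
        int pos = 0;                 // number of elements of L consumed so far
        for (int wanted : P) {       // P is ascending, so we never need to rewind itL
            Integer val = null;
            while (pos < wanted && itL.hasNext()) {
                val = itL.next();
                pos++;
            }
            if (pos == wanted) {
                System.out.println(val);
            } else {
                System.out.println("Position " + wanted + " exceeds the size of L");
                break;               // later positions are even larger
            }
        }
    }

    public static void main(String[] args) {
        printLots(List.of(10, 20, 30, 40, 50, 60), List.of(1, 3, 4, 6));
        // prints 10, 30, 40, 60
    }
}
```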
A "cheat" would be to use .toArray() and use index access in it :)
Can anyone explain why subList() doesn't behave like the subSet() method and throws a ConcurrentModificationException, while subSet() does not? Both methods create a backed collection, so presumably the subList() designers relied on the original list staying unmodified, but wouldn't it be better if all backed collections had the same behavior (like subSet())?
public class ConcurrentModificationException {
    public static void main(String[] args) {
        String[] array = {"Java", "Python", "Pearl", "Ada", "Javascript", "Go", "Clojure"};
        subListEx(array);
        subSetEx(array);
    }

    private static void subListEx(String[] array) {
        List<String> l = new ArrayList<String>(Arrays.asList(array));
        List<String> l2 = l.subList(2, 4);
        System.out.println(l.getClass().getName());
        // l.add("Ruby");  // ConcurrentModificationException
        // l.remove(2);    // ConcurrentModificationException
        l2.remove("Ada");  // OK: modifying through the sublist is allowed
        for (String s : l) { System.out.print(s + ", "); }
        System.out.println();
        for (String s : l2) { System.out.print(s + ", "); }
    }

    private static void subSetEx(String[] array) {
        SortedSet<String> s1 = new TreeSet<String>(Arrays.asList(array));
        SortedSet<String> s2 = s1.subSet("Java", "Python");
        s1.remove("Ada");  // OK: the subSet view stays consistent
        for (String s : s1) { System.out.print(s + ", "); }
        System.out.println();
        for (String s : s2) { System.out.print(s + ", "); }
    }
}
Thanks in advance!
It's already clear that the behaviour is as documented. But I think your main question is why the behaviour differs between ArrayList and TreeSet. Well, it has to do with how data is stored internally in the two collections.
An ArrayList internally uses an array to store the data, which is resized as the size of the ArrayList grows. Now, when you create a subList of a given list, the original list together with the specified indices is associated with the subList. So any structural change to the original list (one that perturbs the indexing of the underlying array) makes the indices stored as part of the subList meaningless. That is why structural changes are not allowed in the case of the ArrayList#subList method.
The subList method returns you an instance of an inner class named SubList inside the ArrayList class, which looks like:
private class SubList extends AbstractList<E> implements RandomAccess {
private final AbstractList<E> parent;
private final int parentOffset;
private final int offset;
int size;
SubList(AbstractList<E> parent,
int offset, int fromIndex, int toIndex) {
this.parent = parent;
this.parentOffset = fromIndex;
this.offset = offset + fromIndex;
this.size = toIndex - fromIndex;
this.modCount = ArrayList.this.modCount;
}
As you can see, the SubList contains a reference to the original list, and parentOffset is nothing but the starting index of the subList you are creating. Now, modifying the original list may change the value at fromIndex in the original list, but not inside the SubList. In that case, parentOffset in the SubList class and fromIndex in the original list will point to different array elements. It is also possible that at some point the original array becomes short enough to invalidate the index stored in the SubList and put it out of range. This is certainly not desirable, so the semantics of the returned subList are considered undefined after such structural changes to the original list.
On the other hand, a TreeSet stores its data internally in a TreeMap. As there is no concept of indices in a Map, there is no issue of indices breaking. A Map is nothing but a mapping of key-value pairs. Creating a subSet involves creating a subMap which is backed by the original Map. Modifying the original Set just requires the corresponding key-value mapping to be invalidated, thus propagating the change to the subMap created for the subSet.
The contract for List.subList(int, int) covers this. Here are the relevant parts, emphasis mine.
The returned list is backed by this list, so non-structural changes in
the returned list are reflected in this list, and vice-versa.
...
The semantics of the list returned by this method become undefined if the
backing list (i.e., this list) is structurally modified in any way
other than via the returned list. (Structural modifications are those
that change the size of this list, or otherwise perturb it in such a
fashion that iterations in progress may yield incorrect results.)
In your sample, you are making structural changes to the backing list, thus the results are undefined.
The Javadoc for both are very clear on this:
From ArrayList.sublist()
The semantics of the list returned by this method become undefined if the backing list (i.e., this list) is structurally modified in any way other than via the returned list. (Structural modifications are those that change the size of this list, or otherwise perturb it in such a fashion that iterations in progress may yield incorrect results.)
So while you're not guaranteed to receive a ConcurrentModificationException, it's not out of the question. The behavior is undefined if you modify the backing list.
Whereas...
From TreeSet.subSet()
Returns a view of the portion of this set whose elements range from fromElement, inclusive, to toElement, exclusive. (If fromElement and toElement are equal, the returned set is empty.) The returned set is backed by this set, so changes in the returned set are reflected in this set, and vice-versa. The returned set supports all optional set operations that this set supports.
There is no cautioning here and modifications to the backing set are fine and pose no issue.
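The difference can be demonstrated directly. Note that the specification only says the sublist's semantics become undefined; it is the JDK's ArrayList implementation that happens to fail fast here:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

public class SubListUndefinedDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        List<String> sub = list.subList(1, 3);  // view of [b, c]

        list.add("e");  // structural change to the backing list, not via the view

        try {
            sub.size();  // any subsequent use of the view may fail
            System.out.println("no exception (allowed: the behavior is undefined)");
        } catch (ConcurrentModificationException e) {
            System.out.println("ConcurrentModificationException");
        }
    }
}
```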
The documentation is very clear: aList.subList() returns a view of the list. The returned object is backed by the original list, or so you should assume.
The documentation is also very clear regarding Sets:
The iterators returned by this class's iterator method are fail-fast: if the set is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException.
Why is it easier to produce the issue with Lists than with Sets? Because when you delete an element from a Set, the effect is very clear: the element is not in the set anymore, both for the set and for any subset. The same goes for insert operations. But when you delete an element from a List, does that mean the sublists adjust their indices (and possibly their size) accordingly, or do they preserve their size by taking in elements from the right or the left? And in the second case, what if the backing list becomes too short? The behavior is too complex to define, and also too limiting for implementations.
Example: a list is [a,b,c,d] and a sublist is list.subList(1,3) ([b,c]). a is removed from list. The effect on list is clear. But does the sublist stay [b,c] (changing its range to (0,2)), or become [c,d] (preserving its range (1,3))?
Example: a list is [a,b,c,d] and a sublist is list.subList(1,3) ([b,c]). b is removed from list. Again, the effect on list is clear. But does the sublist become [c] (1,2), [a,c] (0,2), or [c,d] (1,3)?
Given a sorted array of objects, where the order is based on some object attribute (sorting is done via a List, using Collections.sort() with a custom Comparator and then calling toArray()).
Duplicate instances of SomeObject are not allowed ("duplicate" here depends on multiple attribute values in SomeObject), but it is possible that multiple instances of SomeObject have the same value for attribute1, which is used for sorting.
public class SomeObject {
    public int attribute1;  // assuming a numeric type, since it is compared with < and > below
    public int attribute2;
}

List<SomeObject> list = ...
Collections.sort(list, new Comparator<SomeObject>() {
    @Override
    public int compare(SomeObject v1, SomeObject v2) {
        if (v1.attribute1 > v2.attribute1) {
            return 1;
        } else if (v1.attribute1 < v2.attribute1) {
            return -1;
        } else {
            return 0;
        }
    }
});
SomeObject[] array = list.toArray(new SomeObject[0]);
How can I efficiently check whether an object with a certain attribute value is in that array, while also being able to "mark" objects found in a previous lookup (e.g., simply by removing them from the array; objects already found don't need to be accessed later)?
Without the latter requirement, one could use Arrays.binarySearch() with a custom Comparator. But obviously that doesn't work when one wants to remove objects that were already found.
Use a TreeSet (or TreeMultiset).
You can initialize it with your comparator; it sorts itself; look-up and removal are in logarithmic time.
You can also check for existence and remove in one step, because remove returns a boolean.
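A minimal sketch of that approach; the Item record and its fields are illustrative stand-ins for SomeObject, with a second field to break ties between distinct objects sharing attribute1:

```java
import java.util.Comparator;
import java.util.TreeSet;

public class LookupAndMark {
    // Illustrative stand-in for SomeObject; 'id' breaks ties so that
    // distinct objects with equal attribute1 can coexist in the set.
    record Item(int attribute1, int id) {}

    public static void main(String[] args) {
        TreeSet<Item> set = new TreeSet<>(
            Comparator.comparingInt(Item::attribute1).thenComparingInt(Item::id));
        set.add(new Item(5, 1));
        set.add(new Item(5, 2));  // same attribute1, different object: kept
        set.add(new Item(9, 3));

        // Check-and-mark in one step: remove() reports whether the item was present.
        boolean found = set.remove(new Item(5, 1));
        System.out.println(found);       // true: it was there, and is now "marked"
        System.out.println(set.size());  // 2
        System.out.println(set.remove(new Item(5, 1)));  // false: already removed
    }
}
```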
Building on Arian's answer, you can also use TreeBag from Apache Commons' Collections library. This is backed by a TreeMap, and maintains a count for repeated elements.
If you want, you can put all the elements into some sort of linked list whose nodes are also connected in heap form when you sort them. That way, finding an element would be O(log n) and you can still delete nodes in place.