I have a sorted array of objects, ordered by some attribute of the object. (Sorting is done on a List using Collections.sort() with a custom Comparator, and then calling toArray().)
Duplicate instances of SomeObject are not allowed ("duplicates" in this regard depends on multiple attribute values of SomeObject), but multiple instances of SomeObject may have the same value for attribute1, which is used for sorting.
public class SomeObject {
    public int attribute1; // numeric type assumed, since it is compared with < and >
    public int attribute2; // type assumed for illustration
}

List<SomeObject> list = ...
Collections.sort(list, new Comparator<SomeObject>() {
    @Override
    public int compare(SomeObject v1, SomeObject v2) {
        // Orders by attribute1 only; same result as the original if/else chain
        return Integer.compare(v1.attribute1, v2.attribute1);
    }
});
SomeObject[] array = list.toArray(new SomeObject[0]);
How can I efficiently check whether an object with a certain attribute value is in that array, while also being able to "mark" objects already found by a previous lookup (e.g., simply by removing them from the array; objects already found don't need to be accessed later)?
Without the latter requirement, one could do an Arrays.binarySearch() with a custom Comparator. But obviously that doesn't work when one wants to remove the objects already found.
Use a TreeSet (or Guava's TreeMultiset).
You can initialize it with your comparator; it sorts itself; look-up and removal are in logarithmic time.
You can also check for existence and remove in one step, because remove returns a boolean.
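A minimal sketch of the TreeSet approach, assuming int attributes and a hypothetical helper named takeByAttribute1. One caveat matters for this question: a TreeSet treats elements that compare as 0 as duplicates, so a comparator on attribute1 alone would silently drop distinct objects that share attribute1. The sketch therefore breaks ties on attribute2 (assumed here to be enough to tell non-duplicate objects apart; chain more thenComparing calls if more attributes define identity), and finds "any object with this attribute1" via ceiling() and a probe object.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class TreeSetLookup {
    // Hypothetical stand-in for SomeObject; int fields are an assumption.
    static final class SomeObject {
        final int attribute1;
        final int attribute2;
        SomeObject(int attribute1, int attribute2) {
            this.attribute1 = attribute1;
            this.attribute2 = attribute2;
        }
    }

    // Order by attribute1, then by attribute2 as a tie-breaker, so distinct
    // objects sharing attribute1 are not collapsed into one set element.
    static final Comparator<SomeObject> ORDER =
        Comparator.<SomeObject>comparingInt(o -> o.attribute1)
                  .thenComparingInt(o -> o.attribute2);

    // Finds and removes one object whose attribute1 equals key, in O(log n).
    static SomeObject takeByAttribute1(TreeSet<SomeObject> set, int key) {
        // Least element >= (key, MIN_VALUE): the first one with attribute1 >= key.
        SomeObject candidate = set.ceiling(new SomeObject(key, Integer.MIN_VALUE));
        if (candidate == null || candidate.attribute1 != key) {
            return null; // no remaining object with this attribute1
        }
        set.remove(candidate); // "mark" it as found by removing it
        return candidate;
    }

    public static void main(String[] args) {
        TreeSet<SomeObject> set = new TreeSet<>(ORDER);
        set.add(new SomeObject(5, 1));
        set.add(new SomeObject(5, 2));
        set.add(new SomeObject(7, 1));
        System.out.println(set.size());                          // 3
        System.out.println(takeByAttribute1(set, 5).attribute2); // 1
        System.out.println(takeByAttribute1(set, 5).attribute2); // 2
        System.out.println(takeByAttribute1(set, 5));            // null
        System.out.println(set.size());                          // 1
    }
}
```

Both the lookup and the removal are logarithmic, so repeated "find and mark" operations stay cheap as the set shrinks.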
Building on Arian's answer, you can also use TreeBag from the Apache Commons Collections library. It is backed by a TreeMap and maintains a count for repeated elements.
If you want, you can put all the elements into some sort of linked list whose nodes are also linked as a balanced binary search tree when you sort them. That way, finding an element would be O(log n) and you can still delete the nodes in place.
This is a very general question; I'll try to be as clear as I can. Let's say I have some collection of objects; for simplicity, make them integers. Now I want to make a class which represents these integers as some data structure. In this class I want to implement:
a sort function, which sorts the collection according to some defined sorting logic
the iterable interface, where the Iterator traverses in insertion order
How could I make it so that, even if I add integers in unsorted order, e.g.
someCollection.add(1);
someCollection.add(3);
someCollection.add(2);
and then call
someCollection.sort(someSortingLogic);
the iterator still traverses in insertion order after the collection is sorted? Is there a particular data structure I could use for this purpose, or would it be a case of manually tracking which elements were inserted in which order, or something else I can't think of?
Many thanks!
Generally, to solve a problem like this, you maintain two indexes to the values. Perhaps one of those indexes contains the actual values, perhaps both indexes contain the actual values, or perhaps the actual values are stored elsewhere.
Then when you want to walk the sorted order, you use the sorted index to the values, and when you want the insertion order, you use the insert index to the values.
An index can be as simple as an array containing the values. Naturally, you can't store two different values into one spot in an array, so a simple solution is to wrap two arrays in an Object, such that calling the Object's sort() method sorts one array, while leaving the insertion order array untouched.
Fancier data structures leverage fancier techniques, but they all basically boil down to maintaining two orders, the insertion order AND the sort order.
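import java.util.Arrays;

public class SomeCollection {
    private int[] insertArray = new int[10]; // values in insertion order
    private int[] sortArray = new int[10];   // the same values, kept sorted
    private int insertIndex = 0;
    private int sortIndex = 0;

    public void add(int value) {
        insertArray = expandIfNeeded(insertArray);
        insertArray[insertIndex++] = value;
        sortArray = expandIfNeeded(sortArray);
        sortArray[sortIndex++] = value;
        // Sort only the filled prefix, not the unused zero slots
        Arrays.sort(sortArray, 0, sortIndex);
    }
    ...
}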
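A sketch of the same two-index idea for generic values, assuming two ArrayLists in place of the raw arrays (the class name DualOrderCollection and the sortedView() method are hypothetical). Here the sorted index is only reordered when sort() is called, matching the question, instead of after every add:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class DualOrderCollection<T> implements Iterable<T> {
    private final List<T> insertionOrder = new ArrayList<>(); // never reordered
    private final List<T> sortedOrder = new ArrayList<>();    // reordered by sort()

    public void add(T value) {
        insertionOrder.add(value);
        sortedOrder.add(value);
    }

    // Sorts only the sorted index; the insertion-order index is untouched.
    public void sort(Comparator<? super T> c) {
        sortedOrder.sort(c);
    }

    // Iterates in insertion order, regardless of any sort() calls.
    @Override
    public Iterator<T> iterator() {
        return Collections.unmodifiableList(insertionOrder).iterator();
    }

    // Read-only view of the values in the most recent sorted order.
    public List<T> sortedView() {
        return Collections.unmodifiableList(sortedOrder);
    }

    public static void main(String[] args) {
        DualOrderCollection<Integer> c = new DualOrderCollection<>();
        c.add(1);
        c.add(3);
        c.add(2);
        c.sort(Comparator.naturalOrder());
        for (int x : c) {
            System.out.print(x + " "); // 1 3 2
        }
        System.out.println();
        System.out.println(c.sortedView()); // [1, 2, 3]
    }
}
```

The unmodifiable wrappers keep callers from changing one index without the other, which is the main way such a class gets out of sync.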
I'm not sure you've shown us enough code to give you a good answer, but if you have a class that looks a bit like this:
public class Hand implements Iterable<Card>
{
    private List<Card> cards = new ArrayList<>();

    // Returns iterator for the insertion order of cards
    @Override
    public Iterator<Card> iterator()
    {
        return cards.iterator();
    }

    // Rest of code omitted
Then you can implement a sortedIterator(...) method like this:
// Returns iterator for sorted ordering by Comparator c
public Iterator<Card> sortedIterator(Comparator<? super Card> c)
{
return cards.stream().sorted(c).iterator();
}
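A runnable version of this answer, with a hypothetical minimal Card class (just an int rank) standing in for the asker's type. The backing list is never reordered; sortedIterator() sorts a copy on every call, which costs O(n log n) per call:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class HandDemo {
    // Hypothetical minimal Card type, only for this sketch.
    static final class Card {
        final int rank;
        Card(int rank) { this.rank = rank; }
        @Override public String toString() { return "Card(" + rank + ")"; }
    }

    static final class Hand implements Iterable<Card> {
        private final List<Card> cards = new ArrayList<>();

        void add(Card c) { cards.add(c); }

        // Insertion order.
        @Override
        public Iterator<Card> iterator() { return cards.iterator(); }

        // Sorted copy on each call; the backing list keeps insertion order.
        public Iterator<Card> sortedIterator(Comparator<? super Card> c) {
            return cards.stream().sorted(c).iterator();
        }
    }

    public static void main(String[] args) {
        Hand hand = new Hand();
        hand.add(new Card(9));
        hand.add(new Card(2));
        hand.add(new Card(5));
        Iterator<Card> sorted = hand.sortedIterator(Comparator.comparingInt((Card card) -> card.rank));
        while (sorted.hasNext()) {
            System.out.print(sorted.next() + " "); // Card(2) Card(5) Card(9)
        }
        System.out.println();
        for (Card card : hand) {
            System.out.print(card + " "); // Card(9) Card(2) Card(5)
        }
        System.out.println();
    }
}
```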
If you show us some more code for what you have written, there may be better solutions.
I am curious why, in the Java collections library, HashMap has a method that searches for the existence of a particular value, containsValue(Object value), returning a boolean, but no method to get the value object by value directly, the way you can by providing a key to the get(Object key) method. I know that the purpose of HashMap is to access values via their keys, but in exceptional cases one may want to retrieve via the value object, so why is there no getValue(Object value) method? I ask because the algorithm that containsValue() uses to search for the value is faster than my custom search (see below). Also, is there a better way to accomplish this search using HashMap in Java 7?
Code Snippet:
// Custom Search
MyCustomer cust = null; // holds the match, if any
MyCustomer findCust = new MyCustomer(50000, "Joe Bloggs", "London");
for (MyCustomer value : hashMap.values()) {
    if (value.equals(findCust)) { // found
        cust = value;
        break;
    }
}
The basic assumption of the collections framework is that if two objects are .equals, they are interchangeable in every way. Given that assumption, there's no reason to get out the value from a Map, because you already have one that is equals and interchangeable. As far as the Collections Framework is concerned, these two methods are fully equivalent:
for (V value : map.values()) {
if (value.equals(myValue)) {
return value;
}
}
and
if (map.containsValue(myValue)) {
return myValue;
}
This assumption is built into the Collections Framework in many places, and this is one of many examples.
hashMap.values().contains(findCust)
You will need equals (and, by contract, hashCode) on MyCustomer based on your "business rules" (for example, are two customers with the same "id" but different other values "equal"?). Obviously you are already doing that, because you are using equals. Note that values().contains() still scans the values linearly, just like containsValue().
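To illustrate what "business rules" equality means here, a sketch with a hypothetical MyCustomer whose fields (id, name, city) are guessed from the constructor call in the question, and which assumes that customers with the same id are equal. The map keys ("c1") are made up for the example:

```java
import java.util.HashMap;
import java.util.Map;

public class CustomerLookup {
    // Hypothetical reconstruction of MyCustomer; field names are assumptions.
    static final class MyCustomer {
        final int id;
        final String name;
        final String city;

        MyCustomer(int id, String name, String city) {
            this.id = id;
            this.name = name;
            this.city = city;
        }

        // Assumed business rule: two customers are equal iff their ids match.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof MyCustomer)) return false;
            return id == ((MyCustomer) o).id;
        }

        // Must be consistent with equals, so it also uses only the id.
        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    public static void main(String[] args) {
        Map<String, MyCustomer> hashMap = new HashMap<>();
        hashMap.put("c1", new MyCustomer(50000, "Joe Bloggs", "London"));
        MyCustomer findCust = new MyCustomer(50000, "J. Bloggs", "Leeds");
        // Both calls scan the values and compare with MyCustomer.equals.
        System.out.println(hashMap.values().contains(findCust)); // true
        System.out.println(hashMap.containsValue(findCust));     // true
    }
}
```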
HashMap is designed to provide near constant-time lookups using the hashCode() and equals() of the key you used to put a value into the map.
If you look at the internal structure of HashMap, it's essentially an array. Each slot is called a bucket; the bucket index for an entry is computed from the key's hashCode() and the current array length. Once the bucket is found, the entry is stored at that index. If another entry is already stored there, the entries form a linked list (a chain of nodes) holding all entries whose keys map to the same bucket but are not equal().
Since Java 8, such a chain is converted into a balanced (red-black) tree of nodes, similar to the structure behind TreeMap, when it grows past a threshold (TREEIFY_THRESHOLD, 8) and the table has at least 64 buckets; this improves worst-case lookup performance.
Coming to your question, containsValue() basically iterates over all the buckets in the array and again through all the elements in the linked list of each bucket
// iterate through buckets
for (int i = 0; i < table.length; ++i) {
// iterate through each element in linked list at each bucket
for (Node<K,V> e = table[i]; e != null; e = e.next) {
if ((v = e.value) == value ||
(value != null && value.equals(v)))
return true;
}
}
HashMap.values() returns a Collection view whose iterator traverses every entry in the HashMap, giving access to the value object on each iteration.
containsValue() is used when you want to do something if some value is already in the map, but you don't need that value to proceed with your flow. It is merely a convenience method: with values() you create a Collection view object and an iterator object to iterate over them, while containsValue() just runs two nested for loops. I think the reason for not having a getValue() is to encourage the purpose HashMap is intended for: near constant-time lookups using the hashCode() and equals() of some key.
values() is used when you need to iterate over all the values. This is different from calling map.get(key) in a loop, because you don't have to hash the key, find the bucket and then find the element in that bucket's linked list on each iteration; you just loop in the natural order, the way the elements are laid out in the array.
If you do this value lookup very often, you lose the advantage of the constant-time lookups offered by HashMap, because every value lookup scans all entries. If you're only going to skim through the values searching for some value, I suggest you use an ArrayList. If there are many elements in that list and you need to search for arbitrary values often, sort the list and use binary search.
I need a collection class which has both: quick index and hash access.
Currently I use an ArrayList. It has good indexed access, but its contains method is slow (it scans the whole list). HashSet has a fast contains implementation but no indexed access. Which collection has both? Probably something from Apache?
Or should I create my own collection class which has both: an ArrayList for indexed access and a HashSet for the contains check?
Just for clarification: I need both get(int index) and contains(Object o).
If indexed access performance is not a problem, the closest match is LinkedHashSet, whose API describes it as:
Hash table and linked list implementation of the Set interface, with predictable iteration order.
LinkedHashSet has no get(int index), so indexed access means iterating, but at least I don't think the performance will be worse than that of a LinkedList. Otherwise I see no alternative to your ArrayList + HashSet solution.
If you only traverse the elements from start to finish, I think this might satisfy your needs: LinkedHashSet
If you need random access via the index as well as hash access, and no one else has a better suggestion, I guess you can make your own collection that does both.
Do it like this: combine a hash structure with a list to get the best of both worlds :)
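import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DataStructure {
    Map<Integer, Integer> hash = new HashMap<Integer, Integer>();
    List<Integer> list = new ArrayList<Integer>();

    public void add(Integer i) {
        hash.put(i, i);
        list.add(i);
    }

    public Integer get(int index) {
        return list.get(index);
    }

    public boolean contains(Integer i) {
        return hash.containsKey(i); // hashed lookup instead of a list scan
    }
    ...
} //used Integers to make it simpler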
So you keep each object in a HashMap/HashSet as well as in an ArrayList.
Then:
to check contains, call the hashed structure's contains method;
to get an object by index, use the list to return the value.
Just make sure that both collections stay in sync, and take care of updates and deletions in both data structures.
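A generic variant of the same idea, sketched with an ArrayList and a HashSet (the class name IndexedHashList is hypothetical). Unlike the snippet above, it rejects duplicates in add(), which keeps the list and the set consistent; that set-like behavior is an assumption about what the asker wants:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IndexedHashList<E> {
    private final List<E> list = new ArrayList<>(); // O(1) get(int)
    private final Set<E> set = new HashSet<>();     // expected O(1) contains(Object)

    // Adds e only if it is not already present, so both structures agree.
    public boolean add(E e) {
        if (!set.add(e)) {
            return false;
        }
        list.add(e);
        return true;
    }

    public E get(int index) {
        return list.get(index);
    }

    public boolean contains(Object o) {
        return set.contains(o);
    }

    // Removal must update both structures; the list side is O(n).
    public boolean remove(Object o) {
        if (!set.remove(o)) {
            return false;
        }
        list.remove(o);
        return true;
    }

    public int size() {
        return list.size();
    }

    public static void main(String[] args) {
        IndexedHashList<String> c = new IndexedHashList<>();
        c.add("a");
        c.add("b");
        c.add("a");                          // rejected as a duplicate
        System.out.println(c.size());        // 2
        System.out.println(c.get(1));        // b
        System.out.println(c.contains("a")); // true
        c.remove("a");
        System.out.println(c.get(0));        // b
    }
}
```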
I don't know the exact lookup times, but maybe you could use some implementation of the Map interface. You could store your objects with map.put(objectHash, obj).
Then you could verify that you have a certain object with:
boolean contained = map.containsValue(obj);
And you can use the hash to lookup an object in the map:
MyObject object = map.get(objectHash);
The only downside is that you would need to know your hashes for this lookup call, which may not be possible in your implementation. Also note that containsValue() on a HashMap scans every entry, so that check takes linear time.
As an exercise, I am developing a simple MySortedSet in Java which implements the SortedSet interface. It is backed by a simple array, E[] array.
I have several questions regarding that:
This is the class (I am not posting the entire code, only the relevant parts):
public class MySortedSet<E> implements SortedSet<E>, Iterator<E> {
private E[] array;
private Comparator<? super E> _comparator;
private int size = 0;
private int capacity;
@SuppressWarnings("unchecked")
public MySortedSet() {
this.capacity = 10;
this.array = (E[]) new Object[this.capacity];
// this.array = Array.newInstance(Class<E> var,int size);
// We have to get Class<E> from outside caller.
}
}
Question 3: There is an explanation for the constructor; since this is a sorted set, it is assumed that the elements are sorted:
If this constructor is used to create the sorted set, it is assumed
that the elements are ordered using their natural ordering (i.e., E
implements Comparable).
It has two different constructors: one is parameterless and the other accepts a Comparator<? super E>.
@SuppressWarnings("unchecked")
public MySortedSet() {
    this.capacity = 10;
    this.array = (E[]) new Object[this.capacity];
    // this.array = Array.newInstance(Class<E> var,int size);
    // We have to get Class<E> from outside caller.
}

public MySortedSet(Comparator<? super E> comparator) {
    this(); // also allocate the backing array
    this._comparator = comparator;
}
If no comparator is passed in, natural ordering should be used, but I am not really sure how to accomplish that, as I need a Comparator in some way to access the compare method. Could you recommend a way to do this, so that I can use it when comparing elements in my sort method?
For anyone who would like to see the whole code, please refer to http://codepaste.net/4ucvsw. It is not perfect, but I am working on it.
Question 4: The code guide says:
Because the fastest way to search for a particular item is to use a
binary search, you must ensure that the items in the collection are in
sorted order at all times. This won't be as hard to achieve as it may
seem. When you insert a new item, you can assume the array you are
inserting it into is already sorted. So all you need to do is find the
position in the array where the new item belongs, shift everything
greater than the new item one slot to the right, and insert the new
item. This is called insertion sort.
This is the logic my sort method has to follow. I need to use binary search to find where a new item belongs, so I can insert the new item there and move the others one slot to the right.
But I am not really sure how binary search will work for me here, since I am not sure what I need to find: I am given the item to add, not an item to find. Instead, my idea is to compare each element with the one I need to add; when I find the last smaller and the first greater item, I take the first greater item's index, move the elements from there one slot to the right, and add the new item at that index.
Question 3:
The fact is that every collection works with a predefined comparator, which is implicitly defined on E: every class that will be used as the type argument for E should implement Comparable<E>. The compare method you are looking for, for natural ordering, is the method
int compareTo(E other)
which must be implemented by the classes that you are going to use with your data structure. Since your work is not about defining the classes used with your collection but only the collection itself, what you need is something like:
public class MySortedSet<E> ... {
    private Comparator<? super E> _comparator;

    @SuppressWarnings("unchecked")
    public int innerCompare(E e1, E e2)
    {
        if (_comparator != null)
            return _comparator.compare(e1, e2);
        else
            // Natural ordering: throws ClassCastException if E is not Comparable
            return ((Comparable<? super E>) e1).compareTo(e2);
    }
    ...
So that you'll use a custom comparator when provided, the natural one otherwise.
Comparable and Comparator follow the same principle, but Comparable, as the name suggests, is implemented by the data class itself, so it defines the class's natural ordering. A Comparator, instead, is a separate object that lets you define a custom way to sort elements, different from their natural ordering.
Question 4:
What it means is that, assuming you start with a sorted array, you just have to keep that invariant true after every insertion, and then you can use binary search when looking for items.
You only need to focus on placing each element you add at the correct index. The part of the statement related to finding elements must be interpreted in the following way:
if you take care of keeping your array sorted, that can be done by ensuring that every element you add is placed in right position (eg with insertion sort), then you can apply binary search on the array when looking if an element is contained in the set.
This is true because, if the array is sorted, you can be sure that looking at the middle element of a section of the array will always point you to the right direction to see if another element is indeed contained in the list.
EG:
1, 2, 6, 11, 21, 30, 45
You need to check for 2: you can take the element at index size()/2 = 3, which is 11. Since you already know that the array is sorted and 2 < 11, you can just do the same thing recursively on the left half, and so on.
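The halving step described above, written out as an iterative binary search over the example array (indexOf is a hypothetical helper name; in practice Arrays.binarySearch does the same job):

```java
public class BinarySearchDemo {
    // Returns the index of key in a sorted int array, or -1 if absent.
    // Each step halves the remaining range [lo, hi].
    static int indexOf(int[] sorted, int key) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // stays correct even if lo + hi overflows
            if (sorted[mid] < key) {
                lo = mid + 1;          // key can only be in the right half
            } else if (sorted[mid] > key) {
                hi = mid - 1;          // key can only be in the left half
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 6, 11, 21, 30, 45};
        System.out.println(indexOf(a, 2)); // 1 (mid 3 holds 11, then mid 1 holds 2)
        System.out.println(indexOf(a, 7)); // -1
    }
}
```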
Answer 3:
"Natural ordering" means that the elements must implement Comparable so that they can be compared without the use of a Comparator. The tricky part about giving the caller a choice between Comparable elements and a Comparator is that you can't do compile time checking to make sure they fulfilled at least one of these requirements.
Java's TreeSet similarly exposes a constructor that takes a Comparator and others that don't. TreeSet's contract is essentially that, if you didn't provide a Comparator when you created it, inserting an element that isn't Comparable throws a ClassCastException. Whether you want to use this strategy or another one is up to you.
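A small check of that contract (class and method names are hypothetical). On JDK 7 and later, TreeSet type-checks even the very first insertion, so an empty TreeSet without a Comparator already rejects a non-Comparable element:

```java
import java.util.TreeSet;

public class TreeSetContractDemo {
    // Deliberately does NOT implement Comparable.
    static final class Plain {}

    // True if a TreeSet created without a Comparator accepts the element.
    static boolean acceptsWithoutComparator(Object element) {
        TreeSet<Object> set = new TreeSet<>(); // relies on natural ordering
        try {
            set.add(element);
            return true;
        } catch (ClassCastException e) {
            return false; // element is not Comparable
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsWithoutComparator("text"));      // true
        System.out.println(acceptsWithoutComparator(new Plain())); // false
    }
}
```

The same strategy carries over to MySortedSet: do the check lazily at comparison time and let the ClassCastException surface.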
Answer 4:
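Based on the quote's strategy, you should be able to use Arrays.binarySearch(Object[], Object) for this exact purpose (or, since your backing array is only partly filled, the overload that also takes fromIndex and toIndex). From that method's documentation: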
Returns: index of the search key, if it is contained in the array;
otherwise, (-(insertion point) - 1). The insertion point is defined as
the point at which the key would be inserted into the array: the index
of the first element greater than the key, or a.length if all elements
in the array are less than the specified key. Note that this
guarantees that the return value will be >= 0 if and only if the key
is found.
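A stripped-down sketch of how that return value drives insertion, not a full SortedSet (the class name ArraySortedSetSketch is hypothetical). It uses the overload Arrays.binarySearch(T[] a, int fromIndex, int toIndex, T key, Comparator<? super T> c), which searches only the filled part of the array and treats a null comparator as "use natural ordering", so it also covers Question 3:

```java
import java.util.Arrays;
import java.util.Comparator;

public class ArraySortedSetSketch<E> {
    private E[] array;
    private int size = 0;
    private final Comparator<? super E> comparator; // null means natural ordering

    public ArraySortedSetSketch() {
        this(null);
    }

    @SuppressWarnings("unchecked")
    public ArraySortedSetSketch(Comparator<? super E> comparator) {
        this.comparator = comparator;
        this.array = (E[]) new Object[10];
    }

    // Searches only [0, size); the unused tail holds nulls. With a null
    // comparator this falls back to Comparable and may throw ClassCastException.
    private int search(E e) {
        return Arrays.binarySearch(array, 0, size, e, comparator);
    }

    public boolean add(E e) {
        int i = search(e);
        if (i >= 0) {
            return false; // already present; a set holds no duplicates
        }
        int insertionPoint = -(i + 1); // decode (-(insertion point) - 1)
        if (size == array.length) {
            array = Arrays.copyOf(array, size * 2); // grow the backing array
        }
        // Shift every greater element one slot to the right, then insert.
        System.arraycopy(array, insertionPoint, array, insertionPoint + 1, size - insertionPoint);
        array[insertionPoint] = e;
        size++;
        return true;
    }

    public boolean contains(E e) {
        return search(e) >= 0;
    }

    public E get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return array[index];
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        ArraySortedSetSketch<Integer> set = new ArraySortedSetSketch<>();
        for (int x : new int[] {21, 1, 45, 6, 11, 2, 30, 6}) {
            set.add(x);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < set.size(); i++) {
            sb.append(set.get(i)).append(' ');
        }
        System.out.println(sb.toString().trim()); // 1 2 6 11 21 30 45
        System.out.println(set.contains(11) + " " + set.contains(12)); // true false
    }
}
```

Finding the slot is O(log n), but the shift is still O(n) per insertion, which is the cost the guide's "insertion sort" wording accepts.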
I want to create a large (~300,000 entries) List of self-defined objects of the class Drug.
Every Drug has an ID and I want to be able to search the Drugs in logarithmic time via that ID.
What kind of List do I have to use?
How do I declare that it should be searchable via the ID?
The various implementations of the Map interface should do what you want.
Just remember to override both hashCode() and equals() in your Drug class if you plan to use Drug objects as HashMap keys.
public class Drug implements Comparable<Drug> {
public int compareTo(Drug o) {
return this.id.compareTo(o.getId());
}
}
Then in your List you can use binarySearch
List<Drug> drugList;       // list of all drugs
Drug drugToSearchFor;      // the drug that you want to search for, containing the id

// Sort once before searching (not before every search)
Collections.sort(drugList);
int index = Collections.binarySearch(drugList, drugToSearchFor);
return index >= 0;
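A self-contained version of the binary-search approach, assuming a String id (the original doesn't show its type) and a hypothetical findById helper. Sorting happens once; each lookup on the ArrayList is then O(log n):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DrugSearch {
    // Hypothetical Drug with a String id; only the id takes part in ordering.
    static final class Drug implements Comparable<Drug> {
        private final String id;
        private final String name;

        Drug(String id, String name) {
            this.id = id;
            this.name = name;
        }

        String getId() { return id; }
        String getName() { return name; }

        @Override
        public int compareTo(Drug o) {
            return id.compareTo(o.getId());
        }
    }

    // Returns the drug with the given id, or null; the list must be sorted by id.
    static Drug findById(List<Drug> sortedDrugs, String id) {
        int index = Collections.binarySearch(sortedDrugs, new Drug(id, null)); // probe object
        return index >= 0 ? sortedDrugs.get(index) : null;
    }

    public static void main(String[] args) {
        List<Drug> drugs = new ArrayList<>();
        drugs.add(new Drug("D300", "C"));
        drugs.add(new Drug("D100", "A"));
        drugs.add(new Drug("D200", "B"));
        Collections.sort(drugs); // once, not before every search
        System.out.println(findById(drugs, "D200").getName()); // B
        System.out.println(findById(drugs, "D999"));           // null
    }
}
```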
Wouldn't you use TreeMap instead of List using the ID as your Key?
If searching by a key is important for you, then you probably need to use a Map and not a List. From the Java Collections Trail:
The three general-purpose Map
implementations are HashMap, TreeMap
and LinkedHashMap. If you need
SortedMap operations or key-ordered
Collection-view iteration, use
TreeMap; if you want maximum speed and
don't care about iteration order, use
HashMap; if you want near-HashMap
performance and insertion-order
iteration, use LinkedHashMap.
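For comparison, a sketch of keying by ID in a Map, again with a hypothetical Drug that has a String id. TreeMap gives O(log n) lookups with keys kept in order; HashMap gives expected O(1) lookups without ordering. Because the ID is the key, Drug itself needs no compareTo(), equals() or hashCode() for this:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class DrugMapSketch {
    // Hypothetical Drug; the String key does the lookup work.
    static final class Drug {
        final String id;
        final String name;

        Drug(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    public static void main(String[] args) {
        Map<String, Drug> byIdSorted = new TreeMap<>(); // O(log n) get, keys in order
        Map<String, Drug> byIdHashed = new HashMap<>(); // expected O(1) get, no order
        Drug[] drugs = {new Drug("D300", "C"), new Drug("D100", "A"), new Drug("D200", "B")};
        for (Drug d : drugs) {
            byIdSorted.put(d.id, d);
            byIdHashed.put(d.id, d);
        }
        System.out.println(byIdSorted.get("D200").name); // B
        System.out.println(byIdHashed.get("D100").name); // A
        System.out.println(byIdSorted.keySet());         // [D100, D200, D300]
    }
}
```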
Due to the high number of entries, you might consider using a database instead of holding everything in memory.
If you still want to keep it in memory, you might have a look at B-trees.
You could use any list, and as long as it is sorted you can use a binary search.
But I would use a HashMap, which looks up keys in expected O(1) time.
I know this is pretty redundant, but as everybody said, isn't this exactly the case for a Map?