Sorted set with fast lookup (as fast as HashSet?) - java

I'm looking for some kind of set data structure that can meet both of these requirements:
Sorted
O(1) for lookup
This is what I got so far, but I really hope there's an existing, less-awkward, data structure out there.
/**
* This MUST support both
* (1) Looking up by A - O(1)
* (2) Iteration by sorted Foo<A, B>
*/
public class MySet<Foo<A, B>> extends TreeSet<Foo<A, B>>
{
private Map<A, Foo<A, B>> temp = new HashMap<A, Foo<A, B>>();
public Foo<A, B> getNode(A a)
{
return temp.get(a);
}
@Override
public boolean add(Foo<A, B> foo)
{
temp.put(foo.getA(), foo);
return super.add(foo);
}
}
And my Foo class looks like this:
public class Foo<A, B>
{
private A a; //Can NEVER be null
private B b; //Can NEVER be null
//... constructor and stuff omitted
public int compareTo(Foo<A, B> that)
{
if (this.equals(that))
return 0;
//Compare by a first
int ret = this.a.compareTo(that.a);
if (ret != 0)
return ret;
//Compare by b
return this.b.compareTo(that.b);
}
public boolean equals(Object obj)
{
if (!(obj instanceof Foo))
return false;
Foo rhs = (Foo) obj;
return this.a.equals(rhs.a) && this.b.equals(rhs.b);
}
}
UPDATE:
Here's a use case for my set:
MySet<Foo<SomeA, SomeB>> mySet = getTheData(); //getTheData() returns a set with a bunch of Foo objects
SomeA a = getA(); //getA() returns some instance of SomeA that I'm interested in
I want to be able to check the set and RETRIEVE a Foo object (if exists) such that Foo.getA() == a;
mySet.getNode(a);

You can get it by using some additional space. So you need a HashSet. Additionally, each element will point to the next value in the sort order. Let's say you have keys 1, 3, 5, 10 and you are using linear probing.
value array = [3, 5, null, null, 10, 1];
pointer array = [1, 4, null, null, null, 0];
So the value array contains the values. The hash function decides where the value goes. so in the above example, h(1) = 5 (1 goes to index 5), h(3) = 0, h(5) = 1, and h(10) = 4. The indexes 2, 3 have null (open spaces for future elements). The pointer array says which element follows the current element in the sorting order. So let's say we are doing set.contains(3), it will result in computing h(3) (which will yield 0), and we know that the element exists in the set. If we want the next element in the set of elements according to sort order, we look at the value in the pointer array. So for value 3 (which is at position 0 in the value array), we get the next element in the sort order by looking up the index in the pointer array (pointer_array[0], which is 1), and then looking up value_array[1], which is 5.
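The walk described above can be sketched as follows, using the same two arrays. The `head` parameter (the slot holding the smallest key, here index 5) is an assumption added for illustration; the answer does not say how the start of the chain is tracked.

```java
import java.util.ArrayList;
import java.util.List;

class PointerArrayDemo {
    /**
     * Follows the "next in sort order" pointers starting at slot `head`.
     * `head` is a hypothetical extra field: the index of the smallest key.
     */
    static List<Integer> inSortedOrder(Integer[] values, Integer[] next, int head) {
        List<Integer> out = new ArrayList<>();
        Integer i = head;
        while (i != null) {
            out.add(values[i]); // emit the key stored in this slot
            i = next[i];        // jump to the slot of the next larger key
        }
        return out;
    }

    public static void main(String[] args) {
        Integer[] values = {3, 5, null, null, 10, 1};
        Integer[] next = {1, 4, null, null, null, 0};
        System.out.println(inSortedOrder(values, next, 5)); // [1, 3, 5, 10]
    }
}
```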
This is a common design. Java's LinkedHashMap, which is often used to build an LRU cache, is implemented as a hash map plus a doubly linked list of its entries. The list is kept in insertion order by default, or in access order if the map is constructed with accessOrder set to true.
In your case, keeping the pointer array up to date on insertion is slow, because finding the predecessor in sort order requires a linear scan. If the set is not read-only, you can use the following approach instead.
In your data structure, keep both a hash set and a balanced binary search tree (an AVL tree, a red-black tree, or similar). A contains test is O(1) on the hash set. Enumerating in sorted order is an in-order traversal of the tree, which takes linear time. Every insert goes into both the tree and the hash set, and every delete removes from both, so inserts and deletes become O(log n).
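A minimal sketch of that combination, using a HashMap for lookup and a TreeSet (a red-black tree) for ordering. The class name `IndexedSortedSet` and its methods are made up for this example, and the sketch assumes the comparator never treats two elements with different keys as equal.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical wrapper: expected O(1) lookup by key, sorted iteration by comparator. */
class IndexedSortedSet<K, V> implements Iterable<V> {
    private final Map<K, V> byKey = new HashMap<>();
    private final TreeSet<V> sorted;
    private final Function<V, K> keyOf;

    IndexedSortedSet(Function<V, K> keyOf, Comparator<? super V> order) {
        this.keyOf = keyOf;
        this.sorted = new TreeSet<>(order);
    }

    /** O(log n). Replaces any element previously stored under the same key. */
    void add(V value) {
        V old = byKey.put(keyOf.apply(value), value);
        if (old != null) {
            sorted.remove(old);
        }
        sorted.add(value);
    }

    /** Expected O(1). Returns null if no element has this key. */
    V get(K key) {
        return byKey.get(key);
    }

    /** O(log n). Removes the element from both structures. */
    boolean remove(K key) {
        V old = byKey.remove(key);
        return old != null && sorted.remove(old);
    }

    /** Read-only iteration in comparator order, so the two structures cannot drift apart. */
    @Override
    public Iterator<V> iterator() {
        return Collections.unmodifiableSortedSet(sorted).iterator();
    }

    public static void main(String[] args) {
        IndexedSortedSet<Character, String> set =
                new IndexedSortedSet<>(s -> s.charAt(0), Comparator.<String>naturalOrder());
        set.add("banana");
        set.add("apple");
        set.add("cherry");
        System.out.println(set.get('b')); // banana
        for (String s : set) {
            System.out.println(s);        // apple, banana, cherry
        }
    }
}
```

Composition is used here rather than extending TreeSet, so callers cannot bypass the map through inherited methods such as remove or addAll.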

I think you should try the Multimap classes in the Google Guava library.
They are pretty simple to use, too:
Map<Salesperson, List<Sale>> map = new HashMap<Salesperson, List<Sale>>();
public void makeSale(Salesperson salesPerson, Sale sale) {
List<Sale> sales = map.get(salesPerson);
if (sales == null) {
sales = new ArrayList<Sale>();
map.put(salesPerson, sales);
}
sales.add(sale);
}
can be replaced by,
Multimap<Salesperson, Sale> multimap = ArrayListMultimap.create();
public void makeSale(Salesperson salesPerson, Sale sale) {
multimap.put(salesPerson, sale);
}
But you have to be careful here: a multimap keeps every entry added under the same key, unlike a HashMap, which replaces the existing value for an equal key with the latest one.
The Google Guava libraries feature a lot of other data structures with different functionality. You can find information about them on the Guava wiki.
Hope this was helpful.

Related

TreeSet Comparator failed to remove duplicates in some cases?

I have the following comparator for my TreeSet:
public class Obj {
public int id;
public String value;
public Obj(int id, String value) {
this.id = id;
this.value = value;
}
public String toString() {
return "(" + id + value + ")";
}
}
Obj obja = new Obj(1, "a");
Obj objb = new Obj(1, "b");
Obj objc = new Obj(2, "c");
Obj objd = new Obj(2, "a");
Set<Obj> set = new TreeSet<>((a, b) -> {
System.out.println("Comparing " + a + " and " + b);
int result = a.value.compareTo(b.value);
if (a.id == b.id) {
return 0;
}
return result == 0 ? Integer.compare(a.id, b.id) : result;
});
set.addAll(Arrays.asList(obja, objb, objc, objd));
System.out.println(set);
It prints out [(1a), (2c)], which removed the duplicates.
But when I changed the last Integer.compare to Integer.compare(b.id, a.id) (i.e. switched the positions of a and b), it prints out [(2a), (1a), (2c)]. Clearly the same id 2 appeared twice.
How do you fix the comparator to always remove the duplicates based on ids and sort the ordered set based on value (ascending) then id (descending)?
You're asking:
How do you fix the comparator to always remove the duplicates based on ids and sort the ordered set based on value (ascending) then id (descending)?
You want the comparator to
remove duplicates based on Obj.id
sort the set by Obj.value and Obj.id
Requirement 1) results in
Function<Obj, Integer> byId = o -> o.id;
Set<Obj> setById = new TreeSet<>(Comparator.comparing(byId));
Requirement 2) results in
Function<Obj, String> byValue = o -> o.value;
Comparator<Obj> sortingComparator = Comparator.comparing(byValue).thenComparing(Comparator.comparing(byId).reversed());
Set<Obj> setByValueAndId = new TreeSet<>(sortingComparator);
Let's have a look at the Javadoc of TreeSet. It says:
Note that the ordering maintained by a set [...] must be consistent with equals if it is to
correctly implement the Set interface. This is so
because the Set interface is defined in terms of the equals operation,
but a TreeSet instance performs all element comparisons using its
compareTo (or compare) method, so two elements that are deemed equal
by this method are, from the standpoint of the set, equal.
The set will be ordered according to the comparator but its elements are also compared for equality using the comparator.
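A small illustration of that point, not taken from the question's code: with String.CASE_INSENSITIVE_ORDER as the comparator, a TreeSet treats "a" and "A" as the same element even though equals says they differ. The class and method names are made up for this sketch.

```java
import java.util.Set;
import java.util.TreeSet;

class ComparatorEqualityDemo {
    /** Counts how many of the given strings survive in a case-insensitive TreeSet. */
    static int distinctIgnoringCase(String... values) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String v : values) {
            set.add(v); // rejected whenever compare(v, existing) == 0
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("a".equals("A"));                    // false
        System.out.println(distinctIgnoringCase("a", "A", "b")); // 2
    }
}
```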
As far as I can see, there is no way to define a single Comparator which satisfies both requirements. Since a TreeSet is first and foremost a Set, requirement 1) has to be met by its comparator. To achieve requirement 2), you can create a second TreeSet:
Set<Obj> setByValueAndId = new TreeSet<>(sortingComparator);
setByValueAndId.addAll(setById);
Or if you don't need the set itself but to process the elements in the desired order you can use a Stream:
Consumer<Obj> consumer = <your consumer>;
setById.stream().sorted(sortingComparator).forEach(consumer);
BTW:
While it's possible to sort the elements of a Stream according to a given Comparator there is no distinct method taking a Comparator to remove duplicates according to it.
EDIT:
You have two different tasks: 1. duplicate removal, 2. sorting. One Comparator cannot solve both tasks. So what alternatives are there?
You can override equals and hashCode on Obj. Then a HashSet or a Stream can be used to remove duplicates.
For the sorting you still need a Comparator (as shown above). Implementing Comparable just for sorting would result in an ordering which is not "consistent with equals" according to Comparable JavaDoc.
Since a Stream can solve both tasks, it would be my choice. First we override hashCode and equals to identify duplicates by id:
public int hashCode() {
return Integer.hashCode(id);
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Obj other = (Obj) obj;
if (id != other.id)
return false;
return true;
}
Now we can use a Stream:
// instantiating one additional Obj and reusing those from the question
Obj obj3a = new Obj(3, "a");
// reusing sortingComparator from the code above
Set<Obj> set = Stream.of(obja, objb, objc, objd, obj3a)
.distinct()
.sorted(sortingComparator)
.collect(Collectors.toCollection(LinkedHashSet::new));
System.out.println(set); // [(3a), (1a), (2c)]
The returned LinkedHashSet has the semantics of a Set but it also preserved the ordering of sortingComparator.
EDIT (answering the questions from comments)
Q: Why didn't it finish the job correctly?
See for yourself. Change the last line of your Comparator as follows:
int r = result == 0 ? Integer.compare(a.id, b.id) : result;
System.out.println(String.format("a: %s / b: %s / result: %s -> %s", a.id, b.id, result, r));
return r;
Run the code once and then switch the operands of Integer.compare. The switch results in a different comparing path. The difference is when (2a) and (1a) are compared.
In the first run (2a) is greater than (1a) so it's compared with the next entry (2c). This results in equality - a duplicate is found.
In the second run (2a) is smaller than (1a). Thus (2a) would be compared as next with a previous entry. But (1a) is already the smallest entry and there is no previous one. Hence no duplicate is found for (2a) and it's added to the set.
Q: You said one comparator can't finish two tasks, my 1st comparators in fact did both tasks correctly.
Yes - but only for the given example. Add Obj obj3a to the set as I did and run your code. The returned sorted set is:
[(1a), (3a), (2c)]
This violates your requirement to sort for equal values descending by id. Now it's ascending by id. Run my code and it returns the right order, as shown above.
When I was struggling with a Comparator some time ago, I got the following comment: "... it’s a great exercise, demonstrating how tricky manual comparator implementations can be ..." (source)

PriorityQueue in Java does not order descending with custom comparator [duplicate]

This question already has answers here:
Iterating through PriorityQueue doesn't yield ordered results
(3 answers)
Closed 5 years ago.
I'm implementing a sample order book (in Exchange domain) and I'm implementing the buy and the sell sides using PriorityQueue in Java.
Buy side should be descending and sell side should be ascending.
PriorityQueue<ArrayList<Order>> bookSide;
Each side consists of price points, and each point has a list of Orders.
My buy side works fine.
This is my sell side. I want this to be ordered descending.
sellSide = new PriorityQueue<ArrayList<Order>>(new Comparator<ArrayList<Order>>() {
@Override
public int compare(ArrayList<Order> arg0, ArrayList<Order> arg1) {
// below two conditions are highly unlikely to happen
// as the elements are added to the list before the list is
// added to the queue.
if (arg0.size() == 0) {
return -1;
}
if (arg1.size() == 0) {
return -1;
}
// all the elements in a list have a similar price
Order o1 = arg0.get(0);
Order o2 = arg1.get(0);
int r = (int) (o1.getPrice() - o2.getPrice());
return r;
}
});
I add 100,100,101 and 99.
When 101 is added, it correctly adds 101 below 100 (list of 100). But when I add 99, it destroys the order and becomes 99,101,100.
I have no clue what is wrong.
Please help me out.
EDIT
This is how I add the elements to the lists. The price is a long.
ArrayList<Order> pricePoint = sidePoints.get(price);
if (pricePoint == null) {
pricePoint = new ArrayList<>();
pricePoint.add(order); // I want the list to be non-empty when adding to queue
bookSide.add(pricePoint);
} else {
pricePoint.add(order);
}
It seems there's a misunderstanding about how PriorityQueue works.
Let's try to clear that up.
But when I add 99, it destroys the order and becomes 99,101,100.
First, an important reminder from the Javadoc of PriorityQueue:
An unbounded priority queue based on a priority heap.
The key term here is heap.
In a heap, elements are not ordered in a sequence.
A heap is a tree-like structure,
where every node is consistently ordered compared to every other node below it.
In other words,
there are no guarantees whatsoever with respect to ordering of nodes at the same level.
A heap in ascending order (min heap) will guarantee that the top element is the smallest.
After you pop the top element,
the next top element will be the smallest of the remaining elements.
And so on.
If you want a list of sorted elements,
you have to build it by popping from the heap one by one.
Alternatively,
you can use just a list,
and sort it using Collections.sort.
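To make the difference concrete, here is a small sketch that uses plain Long prices instead of the question's lists of orders (a simplification for illustration). Printing the queue shows the internal heap layout, while repeated poll() calls yield the elements in priority order. The helper name `drainInOrder` is made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class HeapOrderDemo {
    /** Polls a copy of the queue until it is empty; poll() always returns the current head. */
    static <T> List<T> drainInOrder(PriorityQueue<T> queue) {
        PriorityQueue<T> copy = new PriorityQueue<>(queue); // copy keeps the same ordering
        List<T> result = new ArrayList<>();
        while (!copy.isEmpty()) {
            result.add(copy.poll());
        }
        return result;
    }

    public static void main(String[] args) {
        PriorityQueue<Long> prices = new PriorityQueue<>();
        prices.add(100L);
        prices.add(101L);
        prices.add(99L);
        System.out.println(prices);               // [99, 101, 100]: heap layout, not sorted
        System.out.println(drainInOrder(prices)); // [99, 100, 101]
    }
}
```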
As an aside,
and as others pointed out in comments,
the implementation of the compare method violates the contract of the Comparator interface:
when precisely one of a and b is empty,
both compare(a, b) and compare(b, a) return -1,
which implies that a < b and b < a,
which breaks logic.
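The fix is easy; I also simplified the rest of the implementation a bit. Since the price is a long, the final comparison uses Long.compare, which also avoids the overflow risk of casting a subtraction to int:
@Override
public int compare(ArrayList<Order> arg0, ArrayList<Order> arg1) {
if (arg0.isEmpty()) {
// two empty lists compare as equal; an empty list sorts before a non-empty one
return arg1.isEmpty() ? 0 : -1;
}
if (arg1.isEmpty()) {
return 1;
}
return Long.compare(arg0.get(0).getPrice(), arg1.get(0).getPrice());
}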

Java - Array index out of range - Vector?

I'm testing a method that adds hash pairs to linked lists stored inside a vector. However, I'm running into an IndexOutOfBounds exception and I'm having trouble understanding where the problem is.
import java.util.*;
class HashPair<K, E> {
K key;
E element;
}
public class Test4<K, E> {
private Vector<LinkedList<HashPair<K, E>>> table;
public Test4(int tableSize) {
if (tableSize <= 0)
throw new IllegalArgumentException("Table Size must be positive");
table = new Vector<LinkedList<HashPair<K, E>>>(tableSize);
}
public E put(K key, E element) {
if (key == null || element == null)
throw new NullPointerException("Key or element is null");
int i = hash(key);
LinkedList<HashPair<K, E>> onelist = table.get(i);
ListIterator<HashPair<K, E>> cursor = onelist.listIterator();
HashPair<K, E> pair;
E answer = null;
while (cursor.hasNext()) {
pair = cursor.next();
if (pair.key.equals(key)) {
answer = pair.element;
pair.element = element;
return answer;
}
}
pair = new HashPair<K, E>();
pair.key = key;
pair.element = element;
onelist.addFirst(pair);
return answer;
}
private int hash(K key) {
return Math.abs(key.hashCode() % table.capacity());
}
public static void main(String[] args) {
Test4<Integer, Integer> obj = new Test4<Integer, Integer>(10);
obj.put(0, 10);
}
}
The stack trace says that the problem is here:
LinkedList<HashPair<K, E>> onelist = table.get(i);
As I understand it, I'm trying to get the table entry at index i, which is a hash value generated by the hash(K key) method. In my main method I set the key to 0 as an example, so why is the index out of range?
Here is the exception
Exception in thread "main" 0java.lang.ArrayIndexOutOfBoundsException:
Array index out of range: 0
at java.util.Vector.get(Vector.java:748)
at Test4.put(Test4.java:24)
at Test4.main(Test4.java:55)
The problem here is that you are considering the capacity of a vector to be the number of elements in the vector. This is not what capacity of a collection represents.
The capacity of a collection in the standard Java libraries is the size of the internal array used by that collection. The number of elements in the collection, however, is represented by size.
Whenever an element is added to/removed from such a collection, the size property is modified. This does not affect the capacity of the collection unless the internal array needs to be resized.
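The distinction is easy to check directly. The following sketch (class and method names are made up for illustration) creates a Vector with an initial capacity, reports capacity() and size(), and confirms that get(0) fails while the vector is still empty.

```java
import java.util.Vector;

class CapacityVsSizeDemo {
    /** Returns {capacity, size} of a Vector built with the given capacity after `adds` insertions. */
    static int[] capacityAndSize(int initialCapacity, int adds) {
        Vector<Integer> v = new Vector<>(initialCapacity);
        for (int i = 0; i < adds; i++) {
            v.add(i);
        }
        return new int[] { v.capacity(), v.size() };
    }

    /** True if get(0) throws on a freshly constructed Vector, as in the question. */
    static boolean getOnEmptyThrows(int initialCapacity) {
        try {
            new Vector<Integer>(initialCapacity).get(0);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] fresh = capacityAndSize(10, 0);
        System.out.println(fresh[0] + " " + fresh[1]); // 10 0
        System.out.println(getOnEmptyThrows(10));      // true
    }
}
```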
The solution: modify hash() to the following:
private int hash(K key) {
return Math.abs(key.hashCode() % table.size());
}
And make sure that the table vector contains at least one element before calling hash and table.get.
I presume that you are creating an implementation of a HashMap with buckets. If you are, then ponder this: How can you go about storing a value in a bucket if there aren't any buckets? You need to have at least one bucket before trying to get a bucket.
It seems your code is failing at line 24 of Test4.java (line 748 in the stack trace belongs to Vector.java itself), which is:
LinkedList<HashPair<K, E>> onelist = table.get(i);
The description Array index out of range: 0 means you're trying to get an object at slot '0', when there is no such slot available at the time. In short: your vector is empty. And by looking at your code, the reason becomes pretty evident. The only treatment this Vector called table receives before Test4.put() is called gets down to this at line 15:
table = new Vector<LinkedList<HashPair<K, E>>>(tableSize);
So, yes, you're properly creating an object and initializing a variable, you are even specifying an initial capacity, but you never added anything to your brand new Vector, and both lists and vectors have to be filled manually first. Keep in mind that this "capacity" only refers to how many elements the Vector can hold without resizing the array it uses internally. It gives me the impression you are trying to create a class whose objects behave like HashMaps, but I can't wrap my mind around the need for a Vector of LinkedLists of HashPairs when a single collection of HashPairs should be enough unless... wait, what is that hash() method doing? Oh... ohh... oh, I see what you did there.
So, right, the solution. As your Vector is properly created but empty, you need to fill it with whatever it is supposed to hold. In this case, it holds LinkedLists of HashPairs, so let's fill it with just enough of them to match the capacity you set through the constructor. This modification to the constructor should do the trick:
public Test4(int tableSize) {
if (tableSize <= 0)
throw new IllegalArgumentException("Table Size must be positive");
table = new Vector<LinkedList<HashPair<K, E>>>(tableSize);
//Prepare the fast lookup table (at least that's what I think it could be called)
for (int i = 0; i < tableSize; i++) {
table.add(new LinkedList<HashPair<K, E>>());
}
}
And that's pretty much it. I even tested it here just to be sure it worked fine after my patch.
Hope this helps you.
PS: Splitting your structure in n pieces to speedup search/store? I like the idea.

TreeSet not Re-Ordering Itself after Values in it are Changed

I have a TreeSet in Java which orders a custom datatype called 'Node'. However, the TreeSet does not order itself properly.
A link to my complete implementation is at the bottom. Let me explain my problem in detail first.
Here I have a custom datatype 'Node'. It has two fields: a character and a number associated with it. Something like:
pair<char,int> node;
in C++.
Here is the code:
class Node
{
char c;int s;
Node(char x,int y)
{
c=x;s=y;
}
}
I implemented a Comparator for this TreeSet which orders the elements in descending order of 's'. So, for example, if we have two nodes, (a,2) and (b,5), they will be ordered as (b,5);(a,2). However, if their 's' values are the same, the lexicographically smaller character comes first.
TreeSet<Node> ts=new TreeSet<Node>(new Comparator<Node>()
{
@Override
public int compare(Node o1, Node o2)
{
if(o2.s!=o1.s)
{
return o2.s-o1.s;
}
return (o1.c-o2.c);
}
});
Now I input the values as follows:
ts.add(new Node('a',3));
ts.add(new Node('b',2));
ts.add(new Node('c',1));
Now here is the operation I want to perform:
Pick the element at the head of the TreeSet. Call it a node. Print it
Check for 's' or the number associated with this node. Reduce it by 1
If s becomes 0, remove the node altogether from the TreeSet
Keep repeating these steps until the TreeSet is empty
Here is my implementation:
while(ts.size()!=0)
{
Node node=ts.first();
System.out.println(node.c+" "+node.s);//Printing the character and the number associated with it
if(node.s==1)
{
ts.pollFirst();//removing redundant node as reducing it by 1 will become 0
}
else
{
ts.first().s--;//Reducing it's size
}
}
Well, here is my problem:
Current Output:    Expected Output:
a 3                a 3
a 2                a 2
a 1                b 2    //In this iteration, b has the largest number
b 2                a 1
b 1                b 1
c 1                c 1
Why is the TreeSet not reordering itself and behaving more like a List? How do I rectify this?
I've gotten used to the idea of a 'Comparator' quite recently so please excuse me for a few goof ups in the implementation if any.
Here is the complete implementation: http://ideone.com/As35FO
The TreeSet won't reorder itself when you change the value of the elements. You have to take the elements out and re-add them if you want them to remain ordered:
Node first = ts.pollFirst();
first.s--;
ts.add(first);
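Applied to the loop from the question, that idea might look like the sketch below. The Node class and comparator mirror the question; the `drain` helper and the surrounding class are made up for illustration. The node is removed before its sort key changes and is re-added only while s is still positive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class ReorderDemo {
    static class Node {
        final char c;
        int s;
        Node(char c, int s) { this.c = c; this.s = s; }
    }

    /** Same ordering as the question: s descending, then c ascending. */
    static TreeSet<Node> sample() {
        TreeSet<Node> ts = new TreeSet<>(
                Comparator.comparingInt((Node n) -> n.s).reversed()
                          .thenComparingInt(n -> n.c));
        ts.add(new Node('a', 3));
        ts.add(new Node('b', 2));
        ts.add(new Node('c', 1));
        return ts;
    }

    /** Repeatedly takes the head, records it, decrements s, and re-inserts it if s > 0. */
    static List<String> drain(TreeSet<Node> ts) {
        List<String> out = new ArrayList<>();
        while (!ts.isEmpty()) {
            Node node = ts.pollFirst(); // remove BEFORE changing the sort key
            out.add(node.c + " " + node.s);
            node.s--;
            if (node.s > 0) {
                ts.add(node);           // re-insert so the tree places it by its new key
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(sample())); // [a 3, a 2, b 2, a 1, b 1, c 1]
    }
}
```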

Efficient search for not empty intersection (Java)

I have a method that returns an integer value or integer range (initial..final) and I want to know if values are all disjoint.
Is there a more efficient solution than the following one:
ArrayList<Integer> list = new ArrayList<Integer>();
// For single value
int value;
if(!list.contains(value))
list.add(value);
else
error("",null);
// Range
int initialValue,finalValue;
for(int i = initialValue; i <= finalValue; i++){
if(!list.contains(i))
list.add(i);
else
error("",null);
}
Finding a value (contains) in HashSet is a constant-time operation (O(1)) on average, which is better than a List, where contains is linear (O(n)). So, if your lists are large enough, it may be worthwhile to replace your first line with:
HashSet<Integer> list = new HashSet<Integer>();
The reason for this is that to find a value in an (unsorted) list, you need to check every index in the list until you find the one you want or run out of indexes to check. On average you'll check half the list before finding a value if the value is in the list, or the whole list if it's not. For a hash table, you generate an index from the value you want to find, then you check that one index (it's possible you need to check more than one, but it should be uncommon in a well-designed hash table).
Also, if you use a Set, you get a guarantee that each value is unique, so if you try to add a value that already exists, add will return false. You can use that to slightly simplify the code (note: This will not work if you use a List, because add always returns true on a List):
HashSet<Integer> list = new HashSet<Integer>();
int value;
if(!list.add(value))
error("",null);
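Putting the single-value and range cases together, a sketch along these lines would work. The class name `SeenValues` and its methods are made up for illustration, and like the original loop it still adds the non-overlapping part of a range that turns out to overlap.

```java
import java.util.HashSet;
import java.util.Set;

class SeenValues {
    private final Set<Integer> seen = new HashSet<>();

    /** Returns true if the value was not seen before. */
    boolean addValue(int value) {
        return seen.add(value);
    }

    /** Returns true only if no value in [from, to] was seen before; adds the whole range either way. */
    boolean addRange(int from, int to) {
        boolean disjoint = true;
        // long counter so the loop terminates even when to == Integer.MAX_VALUE
        for (long i = from; i <= to; i++) {
            disjoint &= seen.add((int) i); // add returns false on a duplicate
        }
        return disjoint;
    }

    public static void main(String[] args) {
        SeenValues values = new SeenValues();
        System.out.println(values.addValue(5));     // true
        System.out.println(values.addRange(1, 3));  // true
        System.out.println(values.addRange(3, 6));  // false: 3 and 5 were already present
    }
}
```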
Problems involving ranges often lend themselves to the use of a tree. Here's a way to do that using TreeSet:
public class DisjointChecker {
private final NavigableSet<Integer> integers = new TreeSet<Integer>();
public boolean check(int value) {
return integers.add(value);
}
public boolean check(int from, int to) {
NavigableSet<Integer> range = integers.subSet(from, true, to, true);
if (range.isEmpty()) {
addRange(from, to);
return true;
}
else {
return false;
}
}
private void addRange(int from, int to) {
for (int i = from; i <= to; ++i) {
integers.add(i);
}
}
}
Here, rather than calling an error handler, the check methods return a boolean indicating whether the arguments were disjoint from all previous arguments. The semantics of the range version differ from the original code: if the range is not disjoint, none of its elements are added, whereas in the original, any elements below the first non-disjoint element are added.
A few points may deserve elaboration:
Set::add returns a boolean indicating whether the addition modified the set; we can use that as the return value from the method.
NavigableSet is an obscure but standard subinterface of SortedSet which is sadly neglected, although you could actually use a plain SortedSet here with only minor modifications.
The NavigableSet::subSet method (like SortedSet::subSet) returns a lightweight view on the underlying set which is restricted to a given range. This provides a very efficient way to query the tree for any overlap with the whole range in one operation.
The addRange method here is very simple, and runs in O(m log n) when adding m items to a checker which has seen n items previously. It would be possible to make a version which ran in O(m) by writing an implementation of SortedSet which described a range of integers and then using Set::addAll, because TreeSet's implementation of this contains a special case for adding other SortedSets in linear time. The code for that special set implementation is very simple, but involves a lot of boilerplate, so I leave it as an exercise for the reader!
