Java: Retrieving an element from a HashSet

Why can't I retrieve an element from a HashSet?
Consider my HashSet containing a collection of MyHashObjects with their hashCode() and equals() methods overridden correctly. I was hoping to construct a MyHashObject myself and set the properties that feed the hash code to certain values.
I can query the HashSet to see if there are "equivalent" objects in the set using the contains() method. So even though contains() returns true for the two objects, they may not be the same instance (== may be false).
How come there isn't any get() method that works the way contains() does?
What is the thinking behind this API decision?

If you know what element you want to retrieve, then you already have the element. The only question for a Set to answer, given an element, is whether it contains() it or not.
If you want to iterate over the elements, just use Set.iterator().
It sounds like what you're trying to do is designate a canonical element for an equivalence class of elements. You can use a Map<MyObject,MyObject> to do this. See this Stack Overflow question or this one for a discussion.
If you are really determined to find an element that .equals() your original element with the constraint that you must use the HashSet, I think you're stuck with iterating over it and checking equals() yourself. The API doesn't let you grab something by its hash code. So you could do:
MyObject findIfPresent(MyObject source, HashSet<MyObject> set)
{
    if (set.contains(source)) {
        for (MyObject obj : set) {
            if (obj.equals(source))
                return obj;
        }
    }
    return null;
}
It is brute-force and O(n) ugly, but if that's what you need to do...

You can use HashMap<MyHashObject, MyHashObject> instead of HashSet<MyHashObject>.
Calling containsKey() with your "reconstructed" MyHashObject first uses hashCode() to locate the bucket and then, if an entry with the same hash code is found, uses equals() to compare your "reconstructed" object against the original. At that point you can retrieve the original using get().
Complexity is O(1) on average, but the downside is that you have to override both the equals() and hashCode() methods.
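To make the idea concrete, here is a minimal sketch of the key-equals-value map. The Item class and the CanonicalStore wrapper are hypothetical names made up for illustration; the only assumption is that equality depends on the id field alone.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical element type: equality is based on id only.
final class Item {
    final String id;
    final String label; // deliberately not part of equals/hashCode

    Item(String id, String label) {
        this.id = id;
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Item && Objects.equals(id, ((Item) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}

// Hypothetical wrapper: every element is stored as both key and value.
final class CanonicalStore {
    private final Map<Item, Item> items = new HashMap<>();

    // Keeps the first instance seen for each equivalence class.
    void add(Item item) {
        items.putIfAbsent(item, item);
    }

    // Returns the stored instance that equals the probe, or null if absent.
    Item get(Item probe) {
        return items.get(probe);
    }
}
```

Calling get() with a freshly constructed Item that has the same id hands back the instance that was stored originally, which is exactly what HashSet cannot do.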

It sounds like you're essentially trying to use the hash code as a key in a map (which is what HashSets do behind the scenes). You could just do it explicitly, by declaring HashMap<Integer, MyHashObject>.
There is no get for HashSets because typically the object you would supply to the get method as a parameter is the same object you would get back.

If you know the order of elements in your Set, you can retrieve them by converting the Set to an Array. Something like this:
Set mySet = MyStorageObject.getMyStringSet();
Object[] myArr = mySet.toArray();
String value1 = myArr[0].toString();
String value2 = myArr[1].toString();

The need to get a reference to the object that is contained inside a Set is common. It can be achieved in two ways:
Use HashSet as you wanted, then:
public Object getObjectReference(HashSet<Xobject> set, Xobject obj) {
if (set.contains(obj)) {
for (Xobject o : set) {
if (obj.equals(o))
return o;
}
}
return null;
}
For this approach to work, you need to override both hashCode() and equals(Object o) methods
In the worst scenario we have O(n)
Second approach is to use TreeSet
public Object getObjectReference(TreeSet<Xobject> set, Xobject obj) {
if (set.contains(obj)) {
return set.floor(obj);
}
return null;
}
This approach gives O(log(n)), which is more efficient.
You don't need to override hashCode() for this approach, but you have to implement the Comparable interface (define compareTo(Object o)).

One of the easiest ways is to convert to Array:
for(int i = 0; i < set.size(); i++) {
System.out.println(set.toArray()[i]);
}

Suppose I know for sure that, in my application, the object is never used for lookups in any list or hash-based data structure, and that equals() is not called anywhere except indirectly by the hash-based data structure while adding. Is it advisable to update the existing object in the set from within the equals() method? See the code below. If I add this bean to a HashSet, I can do group aggregation on the matching object by key (id). This way I can implement aggregation functions such as sum, max, and min as well. If this is not advisable, please share your thoughts.
public class MyBean {
String id,
name;
double amountSpent;
@Override
public int hashCode() {
return id.hashCode();
}
@Override
public boolean equals(Object obj) {
if(obj!=null && obj instanceof MyBean ) {
MyBean tmpObj = (MyBean) obj;
if(tmpObj.id!=null && tmpObj.id.equals(this.id)) {
tmpObj.amountSpent += this.amountSpent;
return true;
}
}
return false;
}
}

First of all, convert your set to an array. Then, get the item by indexing the array.
Set uniqueItem = new HashSet();
uniqueItem.add("0");
uniqueItem.add("1");
uniqueItem.add("0");
Object[] arrayItem = uniqueItem.toArray();
for(int i = 0; i < uniqueItem.size(); i++) {
System.out.println("Item " + i + " " + arrayItem[i].toString());
}

If you could use List as a data structure to store your data, instead of using Map to store the result in the value of the Map, you can use following snippet and store the result in the same object.
Here is a Node class:
private class Node {
public int row, col, distance;
public Node(int row, int col, int distance) {
this.row = row;
this.col = col;
this.distance = distance;
}
public boolean equals(Object o) {
return (o instanceof Node &&
row == ((Node) o).row &&
col == ((Node) o).col);
}
}
If you store your result in distance variable and the items in the list are checked based on their coordinates, you can use the following to change the distance to a new one with the help of lastIndexOf method as long as you only need to store one element for each data:
List<Node> nodeList;
nodeList = new ArrayList<>(Arrays.asList(new Node(1, 2, 1), new Node(3, 4, 5)));
Node tempNode = new Node(1, 2, 10);
if(nodeList.contains(tempNode))
nodeList.get(nodeList.lastIndexOf(tempNode)).distance += tempNode.distance;
It is basically reimplementing Set whose items can be accessed and changed.

If you want to have a reference to the real object using the same performance as HashSet, I think the best way is to use HashMap.
Example (in Kotlin, but similar in Java) of finding an object, changing some field in it if it exists, or adding it in case it doesn't exist:
val map = HashMap<DbData, DbData>()
val dbData = map[objectToFind]
if(dbData!=null){
++dbData.someIntField
}
else {
map[objectToFind] = objectToFind
}

Related

Best way to traverse and find an object field from a list

I have a list of custom objects and I want to find an object by a given id (a field in the custom object). While coding this I found two solutions that differ in how the fields are compared.
1.
private Product getProduct(String productId,List<Product> productList){
for (int i = 0; i < productList.size(); i++) {
if (productId.equals(productList.get(i).getId())) {
return productList.get(i);
}
}
return null;
}
2.
private Product getProduct(String productId,List<Product> productList){
for (int i = 0; i < productList.size(); i++) {
if (productList.get(i).getId().equals(productId)) {
return productList.get(i);
}
}
return null;
}
The difference is in the if condition. I want to know which one is better and why, and when to use the first method versus the second.
Since equals() is required by Java to be symmetric, there is no difference between the two snippets.
Both snippets are sub-optimal, in that they iterate by numeric index, and retrieve productList.get(i) twice before returning it. Iterating by index is especially dangerous, because passing a LinkedList<Product> will slow down your search considerably.
A better approach is to use a for-each form of the loop:
for (Product p : productList) {
if (p.getId().equals(productId)) {
return p;
}
}
return null;
The concern in both of your implementations is the possibility of calling .equals on a null value.
If you can guarantee that neither of them is null, then they are equivalent.
If you are using Java 8, a stream may be a better choice.
private Product getProduct(String productId, List<Product> productList) {
    return productList.stream()
            .filter(p -> productId.equals(p.getId()))
            .findFirst()
            .orElse(null);
}
When you are sure the product ids are never null, it doesn't really matter.
But in general it's always good to program in a defensive way, so for example prefer using
"SomeString".equals(aString)
instead of
aString.equals("SomeString")
since you know "SomeString" is never null.
Or use
Objects.equals(object1, object2)
when both objects might be null.
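A small sketch of how the two defensive forms behave with null inputs. The class and method names here are made up for illustration; Objects.equals is the standard java.util.Objects method.

```java
import java.util.Objects;

final class NullSafeEquals {
    // The literal is the receiver, so a null argument cannot cause a NullPointerException.
    static boolean matchesLiteral(String s) {
        return "SomeString".equals(s);
    }

    // Objects.equals returns true when both arguments are null
    // and false when exactly one of them is null.
    static boolean bothNullable(String a, String b) {
        return Objects.equals(a, b);
    }
}
```

With these helpers, matchesLiteral(null) simply returns false, whereas calling aString.equals("SomeString") on a null aString would throw.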
The first one invokes equals on the parameter productId, while the second one invokes equals on the current list element from productList. The result is the same because equals is symmetric:
for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
You can also use a stream for this, so you don't have to care about implementation details (furthermore, Objects#equals(Object, Object) is null-safe):
Product p = productList.stream()
        .filter(e -> Objects.equals(e.getId(), productId))
        .findFirst()
        .orElse(null);
Have a look at this question for further information.

How to implement efficient hash cons with java HashSet

I am trying to implement a hash cons in java, comparable to what String.intern does for strings. I.e., I want a class to store all distinct values of a data type T in a set and provide an T intern(T t) method that checks whether t is already in the set. If so, the instance in the set is returned, otherwise t is added to the set and returned. The reason is that the resulting values can be compared using reference equality since two equal values returned from intern will for sure also be the same instance.
Of course, the most obvious candidate data structure for a hash cons is java.util.HashSet<T>. However, it seems that its interface is flawed and does not allow efficient insertion, because there is no method to retrieve an element that is already in the set or insert one if it is not in there.
An algorithm using HashSet would look like this:
class HashCons<T>{
HashSet<T> set = new HashSet<>();
public T intern(T t){
if(set.contains(t)) {
return ???; // <----- PROBLEM
} else {
set.add(t); // <--- Inefficient, second hash lookup
return t;
}
}
}
As you see, the problem is twofold:
This solution would be inefficient since I would access the hash table twice, once for contains and once for add. But okay, this may not be a too big performance hit since the correct bucket will be in the cache after the contains, so add will not trigger a cache miss and thus be quite fast.
I cannot retrieve an element already in the set (see line flagged PROBLEM). There is just no method to retrieve the element in the set. So it is just not possible to implement this.
Am I missing something here? Or is it really impossible to build a usual hash cons with java.util.HashSet?
I don't think it's possible using HashSet. You could use some kind of Map instead and use your value as both key and value. The java.util.concurrent.ConcurrentMap also happens to possess the quite convenient method
putIfAbsent(K key, V value)
that returns the value if it is already existent. However, I don't know about the performance of this method (compared to checking "manually" on non-concurrent implementations of Map).
Here is how you would do it using a HashMap:
class HashCons<T>{
Map<T,T> map = new HashMap<T,T>();
public T intern(T t){
if (!map.containsKey(t))
map.put(t,t);
return map.get(t);
}
}
I think the reason why it is not possible with HashSet is quite simple: to the set, if contains(t) is fulfilled, it means that the given t equals one of the t' in the set. There is no reason for being able to return it (as you already have it).
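For what it's worth, since Java 8 the Map interface itself declares putIfAbsent, so the containsKey/put/get sequence can be collapsed into one call. This is only a sketch; the Interner name is made up, and I have not measured it against the three-call version.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interner: one map call per intern() instead of containsKey + put + get.
final class Interner<T> {
    private final Map<T, T> pool = new HashMap<>();

    T intern(T t) {
        // putIfAbsent returns the value already mapped to an equal key,
        // or null if t was just inserted.
        T existing = pool.putIfAbsent(t, t);
        return existing != null ? existing : t;
    }
}
```

Two equal but distinct instances passed to intern() come back as the same reference, so callers can compare the results with ==. Like the HashMap version above, this is not thread-safe; a ConcurrentHashMap could be swapped in if that matters.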
Well, HashSet is implemented as a wrapper around HashMap in OpenJDK, so you won't save any memory compared to the solution suggested by aRestless.
Here is a 10-minute sketch of a custom open-addressing table:
class HashCons<T> {
T[] table;
int size;
int sizeLimit;
HashCons(int expectedSize) {
init(Math.max(Integer.highestOneBit(expectedSize * 2) * 2, 16));
}
private void init(int capacity) {
table = (T[]) new Object[capacity];
size = 0;
sizeLimit = (int) (capacity * 2L / 3);
}
T cons(@Nonnull T key) {
int mask = table.length - 1;
int i = key.hashCode() & mask;
do {
if (table[i] == null) break;
if (key.equals(table[i])) return table[i];
i = (i + 1) & mask;
} while (true);
table[i] = key;
if (++size > sizeLimit) rehash();
return key;
}
private void rehash() {
T[] table = this.table;
if (table.length == (1 << 30))
throw new IllegalStateException("HashCons is full");
init(table.length << 1);
for (T key : table) {
if (key != null) cons(key);
}
}
}

How to get object from Map with different key?

I have a map that stores my ClassA as a key and some exception as a value.
I also have a list that contains ClassB objects.
ClassA has a field X (Long), and ClassB has a field Y (ClassY), which in turn has a field X (String).
Now I need to find the map entry where ClassA.getX == ClassB.getY.getX.
But the problem is that I can search the map only by key, and the key object must be a ClassA. Otherwise it returns null.
Here is my iteration:
list = listModelNewSc;
for (int i = 0; i < exceptionMap.size(); i++) {
for (int k = 0; k < list.getSize(); k++) {
if (((ClassA) exceptionMap.get(i)).getX() == Long
.parseLong((((CLassB) list.getElementAt(k)).getY().getX()))) {
Listitem itemAtIndex = list.getItemAtIndex(i);
if (itemAtIndex != null) {
System.out.print("FOUND");
}
}
}
}
The only way to do this with the setup you described is to iterate through all the keys in the map until you find the one you want.
Alternatively, you could have a second map with ClassA.getX as the key (mapping to the same value).
Essentially it's a trade off, the first solution is slower but uses less memory, the second solution is faster but uses more memory (you have two copies of the map).
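A rough sketch of the second-map idea. KeyA is a stand-in for your ClassA (I am assuming it exposes getX() returning a long), and SecondaryIndex is a made-up helper name.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for ClassA; only the getX() accessor matters here.
final class KeyA {
    private final long x;

    KeyA(long x) {
        this.x = x;
    }

    long getX() {
        return x;
    }
}

final class SecondaryIndex {
    // Builds a second map keyed by ClassA.getX() that points at the same values.
    // If several keys share the same X, the last one visited wins.
    static <V> Map<Long, V> indexByX(Map<KeyA, V> source) {
        Map<Long, V> byX = new HashMap<>();
        for (Map.Entry<KeyA, V> entry : source.entrySet()) {
            byX.put(entry.getKey().getX(), entry.getValue());
        }
        return byX;
    }
}
```

With the index built once, each ClassB lookup becomes a single get(Long.parseLong(classB.getY().getX())) instead of a scan over all keys. The index has to be rebuilt or updated whenever the original map changes.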
Iterate over keyset or entryset.
A solution, haven't tried but tell me if it works :)
As far as I know when you call Map.get(...) it uses the keys' equals(o) and hashCode() methods. As a solution to your problem you could override these methods of ClassA in a kinda "wrong" way, like this:
@Override
public int hashCode() {
// If you know that the code of a key instance has no logic value anywhere else, give constant value here and in ClassB
return 99827;
}
@Override
public boolean equals(Object o) {
// This will (maybe) make Map.get(ClassB) work with ClassA as a key
if (o instanceof ClassB) {
ClassB cb = (ClassB) o;
return this.getX() == cb.getY().getX();
} else if (o instanceof ClassA) {
// ...
}
return false;
}
So then you'll just do exceptionMap.get(classB) and hopefully get the exception.

What is the fastest way to compare two sets in Java?

I am trying to optimize a piece of code which compares elements of list.
Eg.
public void compare(Set<Record> firstSet, Set<Record> secondSet){
for(Record firstRecord : firstSet){
for(Record secondRecord : secondSet){
// comparing logic
}
}
}
Please take into account that the number of records in sets will be high.
Thanks
Shekhar
firstSet.equals(secondSet)
It really depends on what you want to do in the comparison logic... ie what happens if you find an element in one set not in the other? Your method has a void return type so I assume you'll do the necessary work in this method.
More fine-grained control if you need it:
if (!firstSet.containsAll(secondSet)) {
// do something if needs be
}
if (!secondSet.containsAll(firstSet)) {
// do something if needs be
}
If you need the elements that are in one set but not the other, compute the differences.
EDIT: set.removeAll(otherSet) returns a boolean, not a set. To use removeAll(), you'll have to copy the set then use it.
Set<Record> one = new HashSet<>(firstSet);
Set<Record> two = new HashSet<>(secondSet);
one.removeAll(secondSet);
two.removeAll(firstSet);
If the contents of one and two are both empty, then you know that the two sets were equal. If not, then you've got the elements that made the sets unequal.
You mentioned that the number of records might be high. If the underlying implementation is a HashSet then the fetching of each record is done in O(1) time, so you can't really get much better than that. TreeSet is O(log n).
If you simply want to know if the sets are equal, the equals method on AbstractSet is implemented roughly as below:
public boolean equals(Object o) {
if (o == this)
return true;
if (!(o instanceof Set))
return false;
Collection c = (Collection) o;
if (c.size() != size())
return false;
return containsAll(c);
}
Note how it optimizes the common cases where:
the two objects are the same
the other object is not a set at all, and
the two sets' sizes are different.
After that, containsAll(...) will return false as soon as it finds an element in the other set that is not also in this set. But if all elements are present in both sets, it will need to test all of them.
The worst case performance therefore occurs when the two sets are equal but not the same objects. That cost is typically O(N) or O(NlogN) depending on the implementation of this.containsAll(c).
And you get close-to-worst case performance if the sets are large and only differ in a tiny percentage of the elements.
UPDATE
If you are willing to invest time in a custom set implementation, there is an approach that can improve the "almost the same" case.
The idea is that you need to pre-calculate and cache a hash for the entire set so that you could get the set's current hashcode value in O(1). Then you can compare the hashcode for the two sets as an acceleration.
How could you implement a hashcode like that? Well if the set hashcode was:
zero for an empty set, and
the XOR of all of the element hashcodes for a non-empty set,
then you could cheaply update the set's cached hashcode each time you added or removed an element. In both cases, you simply XOR the element's hashcode with the current set hashcode.
Of course, this assumes that element hashcodes are stable while the elements are members of sets. It also assumes that the element classes hashcode function gives a good spread. That is because when the two set hashcodes are the same you still have to fall back to the O(N) comparison of all elements.
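Here is a minimal sketch of that cached XOR hash, written as a wrapper around HashSet rather than a full Set implementation. The class name and the sameElements method are made up for illustration.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical wrapper that keeps the XOR of all element hash codes up to date.
final class XorHashedSet<E> {
    private final Set<E> elements = new HashSet<>();
    private int cachedHash = 0; // zero for an empty set

    boolean add(E e) {
        boolean added = elements.add(e);
        if (added) {
            cachedHash ^= Objects.hashCode(e); // XOR the new element in
        }
        return added;
    }

    boolean remove(E e) {
        boolean removed = elements.remove(e);
        if (removed) {
            cachedHash ^= Objects.hashCode(e); // XORing again cancels it out
        }
        return removed;
    }

    int cachedHash() {
        return cachedHash;
    }

    // Cheap O(1) rejections first; the O(N) comparison only runs when
    // both the sizes and the cached hashes agree.
    boolean sameElements(XorHashedSet<E> other) {
        if (elements.size() != other.elements.size() || cachedHash != other.cachedHash) {
            return false;
        }
        return elements.equals(other.elements);
    }
}
```

The XOR is order-independent, so two sets built in different insertion orders end up with the same cached value. As noted above, this only works if element hash codes do not change while the elements are in the set.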
You could take this idea a bit further ... at least in theory.
WARNING - This is highly speculative. A "thought experiment" if you like.
Suppose that your set element class has a method to return a cryptographic checksum for the element. Now implement the set's checksum by XORing the checksums returned for the elements.
What does this buy us?
Well, if we assume that nothing underhand is going on, the probability that any two unequal set elements have the same N-bit checksum is 2^-N. And the probability that two unequal sets have the same N-bit checksum is also 2^-N. So my idea is that you can implement equals as:
public boolean equals(Object o) {
if (o == this)
return true;
if (!(o instanceof Set))
return false;
Collection c = (Collection) o;
if (c.size() != size())
return false;
return checksums.equals(c.checksums);
}
Under the assumptions above, this will only give you the wrong answer with probability 2^-N. If you make N large enough (e.g. 512 bits) the probability of a wrong answer becomes negligible (e.g. roughly 10^-154).
The downside is that computing the crypto checksums for elements is very expensive, especially as the number of bits increases. So you really need an effective mechanism for memoizing the checksums. And that could be problematic.
And the other downside is that a non-zero probability of error may be unacceptable no matter how small the probability is. (But if that is the case ... how do you deal with the case where a cosmic ray flips a critical bit? Or if it simultaneously flips the same bit in two instances of a redundant system?)
There is a method in Guava Sets which can help here:
public static <E> boolean equals(Set<? extends E> set1, Set<? extends E> set2){
return Sets.symmetricDifference(set1,set2).isEmpty();
}
There's an O(N) solution for very specific cases where:
the sets are both sorted
both sorted in the same order
The following code assumes that both sets are ordered by the records' natural ordering (Comparable). A similar method could be based on a Comparator.
public class SortedSetComparitor <Foo extends Comparable<Foo>>
implements Comparator<SortedSet<Foo>> {
@Override
public int compare( SortedSet<Foo> arg0, SortedSet<Foo> arg1 ) {
Iterator<Foo> otherRecords = arg1.iterator();
for (Foo thisRecord : arg0) {
// Shorter sets sort first.
if (!otherRecords.hasNext()) return 1;
int comparison = thisRecord.compareTo(otherRecords.next());
if (comparison != 0) return comparison;
}
// Shorter sets sort first
if (otherRecords.hasNext()) return -1;
else return 0;
}
}
You have the following solution from https://www.mkyong.com/java/java-how-to-compare-two-sets/
public static boolean equals(Set<?> set1, Set<?> set2){
if(set1 == null || set2 ==null){
return false;
}
if(set1.size() != set2.size()){
return false;
}
return set1.containsAll(set2);
}
Or if you prefer to use a single return statement:
public static boolean equals(Set<?> set1, Set<?> set2){
return set1 != null
&& set2 != null
&& set1.size() == set2.size()
&& set1.containsAll(set2);
}
If you are using Guava library it's possible to do:
SetView<Record> added = Sets.difference(secondSet, firstSet);
SetView<Record> removed = Sets.difference(firstSet, secondSet);
And then make a conclusion based on these.
I would put the secondSet in a HashMap before the comparison. This way you reduce the lookup time in the second collection to O(1). Like this:
HashMap<Integer, Record> hm = new HashMap<Integer, Record>(secondSet.size());
int i = 0;
for (Record secondRecord : secondSet) {
    hm.put(i, secondRecord);
    i++;
}
for (Record firstRecord : firstSet) {
    for (int j = 0; j < secondSet.size(); j++) {
        // use hm.get(j) for comparison
    }
}
public boolean equals(Object o) {
if (o == this)
return true;
if (!(o instanceof Set))
return false;
Set<String> a = this;
Set<String> b = (Set<String>) o;
Set<String> thedifference_a_b = new HashSet<String>(a);
thedifference_a_b.removeAll(b);
if(thedifference_a_b.isEmpty() == false) return false;
Set<String> thedifference_b_a = new HashSet<String>(b);
thedifference_b_a.removeAll(a);
if(thedifference_b_a.isEmpty() == false) return false;
return true;
}
I think a method reference to the equals method can be used. We assume that the element type has its own properly implemented equals method. Here is a plain and simple example:
Set<String> set = new HashSet<>();
set.addAll(Arrays.asList("leo","bale","hanks"));
Set<String> set2 = new HashSet<>();
set2.addAll(Arrays.asList("hanks","leo","bale"));
Predicate<Set> pred = set::equals;
boolean result = pred.test(set2);
System.out.println(result); // true

Duplicates in Arraylist, comparing various fields java

I have code that should return an ArrayList with the duplicates of an ArrayList,
but it seems it's not working. I am comparing all items in the array...
public ArrayList<ObjectList> duplicates(ArrayList<ObjectList> someObjectsList) {
ArrayList<ObjectList> ret = new ArrayList<ObjectList>();
for ( ObjectList aSomeObjectsList: someObjectsList) {
String field1 = aSomeObjectsList.get1();
String field2 = aSomeObjectsList.get2();
String field3 = aSomeObjectsList.get3();
String field4 = aSomeObjectsList.get4();
for (ObjectList someObject : ret) {
if (
field1.trim().equals(someObject.get1())&&
field2.trim().equals(someObject.get2())&&
field3.trim().equals(someObject.get3())&&
field4.trim().equals(someObject.get4())
){
ret.add(aSomeObjectsList);
}
}
}
return ret;
}
But I guess I am doing something wrong, because it doesn't return anything, and I know it has duplicates under this four-field criterion.
Thanks in advance
for (Object someObject : ret) {
if (
field1.trim().equals(someObject.get1())&&
field2.trim().equals(someObject.get2())&&
field3.trim().equals(someObject.get3())&&
field4.trim().equals(someObject.get4())
){
ret.add(aSomeObjectsList);
}
}
The above loop never executes its body, because ret is empty (its size is zero) when the loop starts.
Here you go,
public Set<ObjectList> duplicates(ArrayList<ObjectList> someObjectsList) {
Set<ObjectList> originals = new HashSet<ObjectList>();
Set<ObjectList> duplicates = new HashSet<ObjectList>();
for ( ObjectList aSomeObjectsList: someObjectsList) {
boolean added = originals.add(aSomeObjectsList);
if(!added){
duplicates.add(aSomeObjectsList);
}
}
return duplicates;
}
This would work, provided your ObjectList class has a correct implementation of the hashCode() and equals() methods.
Disclaimer: This implementation will not tell you how many times a particular object was duplicated in the provided list. It will just tell you that a particular object was duplicated. I assumed that that was your real intention. If you want to count how many times each object occurs, you have to modify the code accordingly.
Hint/Suggestion: You should override the equals() method and place your field equality check in there instead, once and for all.
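If you do need the counts, one possible modification is to tally occurrences in a map and keep only the entries seen more than once. This is a sketch with made-up names (DuplicateCounter, countDuplicates), and it relies on the same equals()/hashCode() requirement as above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DuplicateCounter {
    // Returns each element that occurs at least twice, mapped to its occurrence count.
    static <T> Map<T, Integer> countDuplicates(List<T> items) {
        Map<T, Integer> counts = new HashMap<>();
        for (T item : items) {
            counts.merge(item, 1, Integer::sum); // start at 1, then add 1 per repeat
        }
        counts.values().removeIf(n -> n < 2); // drop elements that occur only once
        return counts;
    }
}
```

The whole thing is a single pass over the list plus one pass over the distinct elements, so it stays roughly linear in the list size.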
This shouldn't compile - if aSomeObjectsList is an Object then it doesn't have methods get1(), get2(), etc.
Your logic won't work because you aren't checking each element in your input List against the other elements in the input List; rather, you're trying to check the return List.
Also, this is not a really efficient way to check for duplicates in a collection. A better way would be to use a HashMap, where you could check set membership in roughly constant time. If you have to use a List, then sort it first (assuming your objects have a natural ordering) and check adjacent members for equality.
Barring those two, just use List.contains().
Here's a way you can do this. I have defined a basic class ObjectList that shows a way to implement equals and hashCode. Note that this assumes that all the internal variables are non-null. If these variables can contain null then you will need to check for that when computing the equals/hashCode. Also, the objects in this class must also themselves properly implement equals/hashCode.
public class ObjectList {
private int h;
private Object obj1;
private Object obj2;
private Object obj3;
private Object obj4;
@Override
public boolean equals(final Object o) {
if (!(o instanceof ObjectList))
return false;
final ObjectList that = (ObjectList) o;
return that.obj1.equals(obj1) && that.obj2.equals(obj2)
&& that.obj3.equals(obj3) && that.obj4.equals(obj4);
}
@Override
public int hashCode() {
// caches the hashcode since it could be costly to recompute every time
// but this assumes that your object is essentially immutable
// (which it should be if you are using equals/hashCode. If this is not
// true and you want to just temporarily use this when doing the duplicate
// test, move the h variable definition from the object level to this method
// and remove this if statement.
if (h != 0)
return h;
h = obj1.hashCode();
h = h * 31 + obj2.hashCode();
h = h * 31 + obj3.hashCode();
h = h * 31 + obj4.hashCode();
return h;
}
}
public Collection<ObjectList> duplicates(
final Collection<ObjectList> someObjectsList) {
final Set<ObjectList> unique = new HashSet<ObjectList>(someObjectsList);
final ArrayList<ObjectList> ret = new ArrayList<ObjectList>(someObjectsList);
for (final ObjectList o : unique) {
ret.remove(o);
}
// The ret list now contains the duplicate instances; instances
// with more than two occurrences will occur multiple times still in
// this list.
return ret;
// If you want a list of unique duplicate instances then, comment out the above
// return and uncomment this one.
// return new HashSet<ObjectList>(ret);
}
Using Collection<ObjectList> is better, if you can do that, for both the parameter and returned value so you can vary the implementations (ArrayList, Set, etc).
