How does HashSet check for duplicate elements?

Kindly look into my code :
HashSet<A> set = new HashSet<A>();
for (int i = 0; i < 10; i++)
set.add(new A());
System.out.println(set.contains(new A()));
Class A is defined as :
class A {
public boolean equals(Object o) {
return true;
}
public int hashCode() {
return (int) (Math.random()%100);
}
}
If HashSet uses a HashMap internally, why is the output true?
Different hash codes should mean different bucket locations.
So how can checking for new A() return true?
Also, if I always return 1 from hashCode(), the output is true, which seems fine.

The reason is your hashCode function:
(int) (Math.random()%100);
always returns 0, so all A elements have the same hash code and land in the same bucket in the HashSet. Because your equals always returns true, contains returns true as soon as it finds any A in that bucket (which is always the case here).
Math.random() returns a double in the range [0.0, 1.0), so taking it modulo 100 leaves the value unchanged, and the cast to int then truncates it to 0.
You probably meant to use * instead of % to get random numbers between 0 and 99:
(int) (Math.random() * 100);
That produces the varying hash codes you expected, although a hashCode that changes between calls breaks the hashCode contract, so lookups become unreliable.
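To make the point above concrete, here is a minimal sketch that reuses the hash expression from the question. The class and method names (RandomModuloDemo, questionHash, containsFreshInstance) are made up for this illustration and are not part of the original code.

```java
import java.util.HashSet;
import java.util.Set;

public class RandomModuloDemo {
    // The hash expression from the question. Math.random() is in [0.0, 1.0),
    // so "% 100" leaves it unchanged and the int cast truncates it to 0.
    static int questionHash() {
        return (int) (Math.random() % 100);
    }

    // Same shape as the question's class A: equals() is always true.
    static class A {
        @Override
        public boolean equals(Object o) {
            return true;
        }

        @Override
        public int hashCode() {
            return questionHash();
        }
    }

    // Adds ten instances and then probes the set with a fresh one.
    static boolean containsFreshInstance() {
        Set<A> set = new HashSet<>();
        for (int i = 0; i < 10; i++) {
            set.add(new A());
        }
        return set.contains(new A());
    }

    public static void main(String[] args) {
        System.out.println(questionHash());          // prints 0
        System.out.println(containsFreshInstance()); // prints true
    }
}
```

Since every instance hashes to 0 and equals everything, the set ends up holding a single element, and any fresh A is reported as contained.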

HashSet uses equals() on all objects with the same hash bucket to determine contains(). Because equals() is always true, it doesn't matter which bucket the new A matches, but all objects will be in the same bucket because (int)(Math.random() % 100) is always 0.
Try changing your hash to:
(int)(Math.random() * 100)
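The claim that equals() is only consulted for objects with a matching hash can be checked with a small sketch. FixedHash is a hypothetical variant of class A whose hash code is passed in by the caller, so the outcome is deterministic rather than random.

```java
import java.util.HashSet;
import java.util.Set;

public class BucketLookupDemo {
    // Hypothetical variant of class A: equals() is always true, but each
    // instance carries a fixed hash code chosen by the caller.
    static class FixedHash {
        private final int hash;

        FixedHash(int hash) {
            this.hash = hash;
        }

        @Override
        public boolean equals(Object o) {
            return true;
        }

        @Override
        public int hashCode() {
            return hash;
        }
    }

    // Stores one element and probes the set with another element.
    static boolean lookup(int storedHash, int probeHash) {
        Set<FixedHash> set = new HashSet<>();
        set.add(new FixedHash(storedHash));
        return set.contains(new FixedHash(probeHash));
    }

    public static void main(String[] args) {
        System.out.println(lookup(1, 1)); // true: same hash, equals() is consulted
        System.out.println(lookup(1, 2)); // false: different hash, equals() is never reached
    }
}
```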

Bad Hash Function [duplicate]

The accepted answer in Best implementation for hashCode method gives a seemingly good method for finding Hash Codes. But I'm new to Hash Codes, so I don't quite know what to do.
For 1), does it matter what nonzero value I choose? Is 1 just as good as other numbers such as the prime 31?
For 2), do I add each value to c? What if I have two fields that are both a long, int, double, etc?
Did I interpret it right in this class:
public class MyClass {
long a, b, c; // these are the only fields
//some code and methods
public int hashCode(){
return 37 * (37 * ((int) (a ^ (a >>> 32))) + (int) (b ^ (b >>> 32)))
+ (int) (c ^ (c >>> 32));
}
}

The value is not important; it can be whatever you want. Prime numbers tend to result in a better distribution of the hashCode values, so they are preferred.
You do not necessarily have to add them; you are free to implement whatever algorithm you want, as long as it fulfills the hashCode contract:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
There are some algorithms that are considered poor hashCode implementations, and simply adding the attribute values is one of them. The reason is that if you have a class with two fields, Integer a and Integer b, and your hashCode() just sums these values, then the distribution of the hashCode values depends heavily on the values your instances store. For example, if most values of a and b are between 0 and 10, then the hashCode values will be between 0 and 20. This implies that if you store instances of this class in, e.g., a HashMap, numerous instances will be stored in the same bucket (because instances with different a and b values but the same sum share a hash code). This hurts the performance of operations on the map, because a lookup compares the elements in the bucket using equals().
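The effect described above can be counted directly. The following sketch enumerates every (a, b) pair with both values in 0..10 and counts how many distinct hash codes each scheme produces. The method names are invented for this example, and 31 is just the usual multiplier, not a requirement.

```java
import java.util.HashSet;
import java.util.Set;

public class SumHashDemo {
    // The weak scheme from the paragraph above: just add the fields.
    static int sumHash(int a, int b) {
        return a + b;
    }

    // A multiplier-based scheme in the style of the generated hashCode.
    static int multiplierHash(int a, int b) {
        return 31 * a + b;
    }

    // Counts distinct hash codes over all 121 pairs with a, b in 0..10.
    static int distinctHashes(boolean useSum) {
        Set<Integer> seen = new HashSet<>();
        for (int a = 0; a <= 10; a++) {
            for (int b = 0; b <= 10; b++) {
                seen.add(useSum ? sumHash(a, b) : multiplierHash(a, b));
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctHashes(true));  // 21 distinct codes for 121 pairs
        System.out.println(distinctHashes(false)); // 121 distinct codes for 121 pairs
    }
}
```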
Regarding the algorithm, it looks fine, it is very similar to the one that Eclipse generates, but it is using a different prime number, 31 not 37:
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + (int) (a ^ (a >>> 32));
result = prime * result + (int) (b ^ (b >>> 32));
result = prime * result + (int) (c ^ (c >>> 32));
return result;
}

A well-behaved hashcode method already exists for long values - don't reinvent the wheel:
int hashCode = Long.hashCode((a * 31 + b) * 31 + c); // Java 8+
int hashCode = Long.valueOf((a * 31 + b) * 31 + c).hashCode(); // Java <8
Multiplying by a prime number (usually 31 in JDK classes) and accumulating the sum is a common method of creating a "unique" number from several numbers.
The hashCode() method of Long keeps the result properly distributed across the int range, making the hash "well behaved" (basically pseudo random).
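As a rough check of what Long.hashCode does, the sketch below compares it with the manual fold used in the question, (int) (value ^ (value >>> 32)), which is how the Long documentation defines the result. The names LongHashDemo, manualFold and combined are made up here. Note that combining the fields first and folding once gives different numbers from folding each field separately, which is fine because any consistent scheme satisfies the contract.

```java
public class LongHashDemo {
    // The fold written out by hand in the question's hashCode.
    static int manualFold(long value) {
        return (int) (value ^ (value >>> 32));
    }

    // The one-liner from the answer above (Java 8+).
    static int combined(long a, long b, long c) {
        return Long.hashCode((a * 31 + b) * 31 + c);
    }

    public static void main(String[] args) {
        long sample = 0x1234_5678_9ABC_DEF0L;
        System.out.println(manualFold(sample) == Long.hashCode(sample)); // true
        System.out.println(combined(1, 2, 3)); // (1 * 31 + 2) * 31 + 3 = 1026
    }
}
```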

Workaround when hashcode crosses the integer boundary

I have a POJO with ~450 fields and I'm trying to compare instances of this POJO using hashCode. I've generated the overridden hashCode() method with Eclipse. In quite a few cases the generated hash code crosses the integer boundary. As a result, it's getting difficult to perform the comparison. What's the workaround?
The hashCode() method is as follows:
public int hashCode()
{
final int prime = 31;
int result = 1;
result = prime * result + ((stringOne == null) ? 0 : stringOne.hashCode());
result = prime * result + intOne;
result = prime * result + Arrays.hashCode(someArray);
result = prime * result + ((stringTwo == null) ? 0 : stringTwo.hashCode());
result = prime * result + intTwo;
result = prime * result + intThree;
result = prime * result + ((stringThree == null) ? 0 : stringThree.hashCode());
result = prime * result + ((stringFour == null) ? 0 : stringFour.hashCode());
result = prime * result + ((stringFive == null) ? 0 : stringFive.hashCode());
result = prime * result + ((objectOne == null) ? 0 : objectOne.hashCode());
result = prime * result + ((objectTwo == null) ? 0 : objectTwo.hashCode());
return result;
}

Integer overflow is a normal part of hashCode() calculations. It is not a problem.
For example, the hashCode() of a String is often negative.
System.out.println("The hashCode() of this String is negative".hashCode());
If a hashCode() calculation can overflow, obviously that can mean that unequal Objects can have the same hashCode, but this can happen without overflow. For example, both of these print true.
System.out.println("Aa".hashCode() == "BB".hashCode());
System.out.println(new HashSet<>(Arrays.asList(1, 2)).hashCode() == Collections.singleton(3).hashCode());
The only requirement is that equal objects should have the same hashCode. There is no requirement that different objects should have different hashCodes.
hashCode() and equals() should also be quick. You can improve the performance of equals() by comparing the fields most likely to be different first and returning early. You can't do this with hashCode() because the calculation must involve all the relevant fields. If your class has 450 fields, you may want to consider caching the result of hashCode() or, better, refactoring your class into smaller units.
The other thing to consider is whether you need to override these methods at all. It is only absolutely necessary if the objects are going to be used as keys in a hash-based container, such as HashMap.
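The caching idea mentioned above could look roughly like this for an immutable class. Wide is a hypothetical two-field stand-in for the 450-field POJO, and the computations counter exists only so the demo can show that the hash is calculated once. Caching is only safe when the fields used by equals() never change after construction.

```java
import java.util.Objects;

public class CachedHashDemo {
    // Hypothetical immutable stand-in for the large POJO.
    static final class Wide {
        private final String stringOne;
        private final int intOne;
        private int cachedHash; // 0 means "not computed yet"
        int computations = 0;   // demo-only counter

        Wide(String stringOne, int intOne) {
            this.stringOne = stringOne;
            this.intOne = intOne;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Wide)) return false;
            Wide other = (Wide) o;
            return intOne == other.intOne && Objects.equals(stringOne, other.stringOne);
        }

        @Override
        public int hashCode() {
            int h = cachedHash;
            if (h == 0) {
                // Overflow in this arithmetic simply wraps around, which is fine.
                // If the real hash happens to be 0 it is recomputed on every call.
                computations++;
                h = 31 * (31 + Objects.hashCode(stringOne)) + intOne;
                cachedHash = h;
            }
            return h;
        }
    }

    public static void main(String[] args) {
        Wide w = new Wide("abc", 7);
        System.out.println(w.hashCode() == w.hashCode()); // true
        System.out.println(w.computations);               // 1
    }
}
```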

The workaround is to use a different method to compute the hashcode. For instance, you could xor the hashcodes of your 450 fields (btw: wow!), but without knowing more about your object it's hard to say whether that would be a good approach for your particular case.
Ideally, since hashcodes are used for hashing, objects that are not equal should also with high probability produce different hashcodes.

Why is BigDecimal natural ordering inconsistent with equals?

From the Javadoc for BigDecimal:
Note: care should be exercised if BigDecimal objects are used as keys in a SortedMap or elements in a SortedSet since BigDecimal's natural ordering is inconsistent with equals.
For example, if you create a HashSet and add new BigDecimal("1.0") and new BigDecimal("1.00") to it, the set will contain two elements (because the values have different scales, so are non-equal according to equals and hashCode), but if you do the same thing with a TreeSet, the set will contain only one element, because the values compare as equal when you use compareTo.
Is there any specific reason behind this inconsistency?

From the OpenJDK implementation of BigDecimal:
/**
* Compares this {@code BigDecimal} with the specified
* {@code Object} for equality. Unlike {@link
* #compareTo(BigDecimal) compareTo}, this method considers two
* {@code BigDecimal} objects equal only if they are equal in
* value and scale (thus 2.0 is not equal to 2.00 when compared by
* this method).
*
* @param x {@code Object} to which this {@code BigDecimal} is
* to be compared.
* @return {@code true} if and only if the specified {@code Object} is a
* {@code BigDecimal} whose value and scale are equal to this
* {@code BigDecimal}'s.
* @see #compareTo(java.math.BigDecimal)
* @see #hashCode
*/
@Override
public boolean equals(Object x) {
if (!(x instanceof BigDecimal))
return false;
BigDecimal xDec = (BigDecimal) x;
if (x == this)
return true;
if (scale != xDec.scale)
return false;
long s = this.intCompact;
long xs = xDec.intCompact;
if (s != INFLATED) {
if (xs == INFLATED)
xs = compactValFor(xDec.intVal);
return xs == s;
} else if (xs != INFLATED)
return xs == compactValFor(this.intVal);
return this.inflate().equals(xDec.inflate());
}
More from the implementation:
* <p>Since the same numerical value can have different
* representations (with different scales), the rules of arithmetic
* and rounding must specify both the numerical result and the scale
* used in the result's representation.
Which is why the implementation of equals takes scale into consideration. The constructor that takes a string as a parameter is implemented like this:
public BigDecimal(String val) {
this(val.toCharArray(), 0, val.length());
}
where the third parameter is the number of characters to parse. The scale is derived during parsing from the number of digits after the decimal point, which is why the strings 1.0 and 1.00 create different BigDecimals (with different scales).
From Effective Java By Joshua Bloch:
The final paragraph of the compareTo contract, which is a strong
suggestion rather than a true provision, simply states that the
equality test imposed by the compareTo method should generally return
the same results as the equals method. If this provision is obeyed,
the ordering imposed by the compareTo method is said to be consistent
with equals. If it’s violated, the ordering is said to be inconsistent
with equals. A class whose compareTo method imposes an order that is
inconsistent with equals will still work, but sorted collections
containing elements of the class may not obey the general contract of
the appropriate collection interfaces (Collection, Set, or Map). This
is because the general contracts for these interfaces are defined in
terms of the equals method, but sorted collections use the equality
test imposed by compareTo in place of equals. It is not a catastrophe
if this happens, but it’s something to be aware of.

The behaviour seems reasonable in the context of arithmetic precision where trailing zeros are significant figures and 1.0 does not carry the same meaning as 1.00. Making them unequal seems to be a reasonable choice.
However from a comparison perspective neither of the two is greater or less than the other and the Comparable interface requires a total order (i.e. each BigDecimal must be comparable with any other BigDecimal). The only reasonable option here was to define a total order such that the compareTo method would consider the two numbers equal.
Note that inconsistency between equals and compareTo is not a problem as long as it's documented. It is even sometimes exactly what one needs.
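The difference between the two notions of equality can be observed with a short sketch. The class and method names are invented for this example. The last method shows one possible way to get scale-insensitive equality for non-zero values, by normalizing with stripTrailingZeros() before comparing, which is an option rather than something the answers above prescribe.

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class BigDecimalSetDemo {
    // HashSet relies on equals()/hashCode(), which take the scale into account.
    static int hashSetSize() {
        Set<BigDecimal> set = new HashSet<>();
        set.add(new BigDecimal("1.0"));
        set.add(new BigDecimal("1.00"));
        return set.size();
    }

    // TreeSet relies on compareTo(), which ignores the scale.
    static int treeSetSize() {
        Set<BigDecimal> set = new TreeSet<>();
        set.add(new BigDecimal("1.0"));
        set.add(new BigDecimal("1.00"));
        return set.size();
    }

    // Normalizing both operands makes equals() agree with compareTo() here.
    static boolean equalAfterStripping() {
        BigDecimal a = new BigDecimal("1.0").stripTrailingZeros();
        BigDecimal b = new BigDecimal("1.00").stripTrailingZeros();
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(hashSetSize());         // 2
        System.out.println(treeSetSize());         // 1
        System.out.println(equalAfterStripping()); // true
    }
}
```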

BigDecimal works by having two numbers, an integer and a scale. The integer is the "number" and the scale is the number of digits to the right of the decimal place. Basically a base 10 floating point number.
When you say "1.0" and "1.00" these are technically different values in BigDecimal notation:
1.0
integer: 10
scale: 1
precision: 2
= 10 x 10 ^ -1
1.00
integer: 100
scale: 2
precision: 3
= 100 x 10 ^ -2
In scientific notation you wouldn't do either of those, it should be 1 x 10 ^ 0 or just 1, but BigDecimal allows it.
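The integer, scale and precision values listed above can be read back from a BigDecimal with its unscaledValue(), scale() and precision() methods. The following is only a small sketch to confirm the numbers. The describe helper is a name made up for this example.

```java
import java.math.BigDecimal;

public class BigDecimalPartsDemo {
    // Formats the three components discussed above for one value.
    static String describe(String text) {
        BigDecimal value = new BigDecimal(text);
        return "integer: " + value.unscaledValue()
                + ", scale: " + value.scale()
                + ", precision: " + value.precision();
    }

    public static void main(String[] args) {
        System.out.println(describe("1.0"));  // integer: 10, scale: 1, precision: 2
        System.out.println(describe("1.00")); // integer: 100, scale: 2, precision: 3
    }
}
```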
In compareTo the scale is ignored and they are evaluated as ordinary numbers, 1 == 1. In equals the integer and scale values are compared, 10 != 100 and 1 != 2. Apart from the shortcut for the very same instance, equals treats each BigDecimal as a particular representation of a number (integer plus scale) rather than as the number itself.
I would liken it to this:
// same number, different types
float floatOne = 1.0f;
double doubleOne = 1.0;
// true: 1 == 1
System.out.println( (double)floatOne == doubleOne );
// also compare a Float to a Double
Float boxFloat = floatOne;
Double boxDouble = doubleOne;
// false: a Float is never equal to a Double, even for the same number
System.out.println( boxFloat.equals(boxDouble) );
// BigDecimal should behave essentially the same way
BigDecimal bdOne1 = new BigDecimal("1.0");
BigDecimal bdOne2 = new BigDecimal("1.00");
// true: 1 == 1 (compareTo returns 0)
System.out.println( bdOne1.compareTo(bdOne2) == 0 );
// false: 10 != 100 and 1 != 2 ensuring 2 digits != 3 digits
System.out.println( bdOne1.equals(bdOne2) );
Because BigDecimal allows for a specific "precision", comparing both the integer and the scale is more or less the same as comparing both the number and the precision.
There is a semi-caveat to that regarding BigDecimal's precision() method, which always returns 1 if the BigDecimal is 0. In that case compareTo reports the values as equal and precision() matches, yet equals still returns false. But 0 * 10 ^ -1 should not equal 0 * 10 ^ -2 because the former is a 2 digit number 0.0 and the latter is a 3 digit number 0.00. The equals method is comparing both the value and the number of digits.
I suppose it is weird that BigDecimal allows trailing zeroes but this is basically necessary. Doing a mathematical operation like "1.1" + "1.01" requires a conversion but "1.10" + "1.01" doesn't.
So compareTo compares BigDecimals as numbers and equals compares BigDecimals as BigDecimals.
If the comparison is unwanted, use a List or array where this doesn't matter. HashSet and TreeSet are of course designed specifically for holding unique elements.

The answer is pretty short: the equals() method compares objects, while compareTo() compares values. With BigDecimal, different objects can represent the same value. That's why equals() might return false while compareTo() returns 0.
equal objects => equal values
equal values =/> equal objects
An object is just a computer representation of some real-world value. For example, the same picture might be represented in GIF and JPEG formats. That's very much like BigDecimal, where the same value might have distinct representations.

How to check if an array has two different pairs of matching values?

public static boolean hasTwoPair(int[] arrayOfInts){
}
The method is supposed to return true if it can find two different pairs of matching int values. So if the array was {2,2,4,7,7}, it should return true because it has two 2s and two 7s.
It only applies to different pair values though. If it was {2,2,2,2,5}, it would return false because they are not different pair values.
EDIT: This is what I have so far for the body of the method:
boolean pairFound = false;
int pairValue;
for(int s = 0; s < arrayOfInts.length - 1; s++){
pairValue = arrayOfInts[s];
for(int c = s + 1; c < arrayOfInts.length; c++){
if(pairValue == arrayOfInts[c])
pairFound = true;
}
}
return false; //placeholder
I'm not sure where to go from here.

Since you haven't actually tried any code, I'll give a description of how to solve this problem, but no actual code.
Start with a boolean like pairFound that's initialized to false and changed to true when you find your first pair. Additionally, you'll need an int (pairValue) to keep track of the value of the first pair found (if you found one).
Iterate through, looking for a pair. If you find a pair, and pairFound is false, set pairFound to true, and set pairValue to the value of your first found pair. Now keep iterating through.
If you find a pair and pairFound is true and the pair is != pairValue, then return true;. If you iterate through everything and haven't returned true yet, then you can return false.
Based on your updated question, you're pretty close.
boolean pairFound = false;
int pairValue = Integer.MIN_VALUE;
//or some value that arrayOfInts will never have based on context
for(int s = 0; s < arrayOfInts.length - 1; s++){
if(arrayOfInts[s] == pairValue) {
continue;
}
for(int c = s + 1; c < arrayOfInts.length; c++){
if(arrayOfInts[s] == arrayOfInts[c]) {
if(arrayOfInts[s] != pairValue) {
if(pairFound) {
return true;
}
pairValue = arrayOfInts[s];
pairFound = true;
break;
}
}
}
}
return false;

This task asks you to build a list of counts:
Create a data structure (say, a Map<Integer,Integer>) containing a count for each number from the array.
Go through the map, and count the number of entries with the count of two and above.
If you counted two or more items, return true; otherwise, return false.
The counts for your first example would look like this:
V #
- -
2 - 2
4 - 1
7 - 2
You have two items (2 and 7) with counts of 2, so return true.
The counts for your second example look like this:
V #
- -
2 - 4
5 - 1
There is only one item with a count of 2 or above, so return false.
If you use a HashMap, this algorithm produces an answer in O(n).
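The counting approach described above might be written as follows. This is one possible sketch rather than the answerer's own code. The class name CountingPairs is an assumption, while the method name and parameter come from the question.

```java
import java.util.HashMap;
import java.util.Map;

public class CountingPairs {
    static boolean hasTwoPair(int[] arrayOfInts) {
        // Step 1: count how often each value occurs.
        Map<Integer, Integer> counts = new HashMap<>();
        for (int value : arrayOfInts) {
            counts.merge(value, 1, Integer::sum);
        }
        // Step 2: count the values that occur at least twice.
        int valuesWithPair = 0;
        for (int count : counts.values()) {
            if (count >= 2) {
                valuesWithPair++;
            }
        }
        // Step 3: two or more such values means two different pairs.
        return valuesWithPair >= 2;
    }

    public static void main(String[] args) {
        System.out.println(hasTwoPair(new int[] {2, 2, 4, 7, 7})); // true
        System.out.println(hasTwoPair(new int[] {2, 2, 2, 2, 5})); // false
    }
}
```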

Sort the array. Then look for the first two consecutive values that match. If found, skip to the first entry that is different from the matched pair and repeat. If you reach the end of the array before either the first or second search succeeds, return false.
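A sketch of the sort-based approach, under the assumption that the caller's array should not be reordered (hence the clone). The class name SortedPairs is made up for this example.

```java
import java.util.Arrays;

public class SortedPairs {
    static boolean hasTwoPair(int[] arrayOfInts) {
        int[] sorted = arrayOfInts.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int pairs = 0;
        int i = 0;
        while (i < sorted.length - 1) {
            if (sorted[i] == sorted[i + 1]) {
                pairs++;
                if (pairs == 2) {
                    return true;
                }
                // Skip the rest of this run so the same value is not counted twice.
                int value = sorted[i];
                while (i < sorted.length && sorted[i] == value) {
                    i++;
                }
            } else {
                i++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasTwoPair(new int[] {7, 2, 4, 2, 7})); // true
        System.out.println(hasTwoPair(new int[] {2, 2, 2, 2, 5})); // false
    }
}
```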

You don't need to sort the array; that gives O(n*log(n)) complexity. Your current solution is even worse, since it has O(n^2) complexity. You also don't need a HashMap: a Set and an Integer are enough.
For each array element do:
  Check: is the element already in the set?
    If not, put it into the set.
    If yes, check whether it's equal to your last found Integer pair (yes, use a boxed Integer, not a primitive int here, to be able to initialize it with null):
      If it's equal, continue.
      If it's not equal:
        If the last found pair is null, set it to the element's value.
        Otherwise you're done and you have at least 2 different pairs.
If you iterated over the list without reaching the last condition, you don't have 2 different pairs.
As for the Set implementation I would recommend a HashSet
There is more to say here though. If you implement it like this you make no assumption about integers and indexable arrays. All you need is a sequence of elements that implement equals and hashCode. So you could make the signature of your method much more generic:
public static <E> boolean hasTwoPair(Iterable<E> iterable)
But since arrays do not implement the Iterable interface, every client of this method would have to convert array arguments to an Iterable with Arrays.asList() (which works for object arrays such as Integer[], not for a primitive int[]). If that's too inconvenient you can provide another method:
public static <T> boolean hasTwoPair(T[] array)
In general it's a good idea to use the most generic type possible when you design APIs (also see Items 40 and 52 in "Effective Java, Second Edition").
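Putting the steps and the two signatures together, a possible implementation is sketched below. The class name GenericPairs is an assumption. One deliberate deviation from the description: a boolean flag records whether a first pair has been found, instead of using null as the marker, so that a null element in the input is not mistaken for "no pair yet".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class GenericPairs {
    public static <E> boolean hasTwoPair(Iterable<E> iterable) {
        Set<E> seen = new HashSet<>();
        E firstPair = null;
        boolean havePair = false;
        for (E element : iterable) {
            if (seen.add(element)) {
                continue; // first occurrence of this element
            }
            if (!havePair) {
                // First repeated value: remember it.
                firstPair = element;
                havePair = true;
            } else if (!Objects.equals(firstPair, element)) {
                return true; // a second, different repeated value
            }
        }
        return false;
    }

    // Convenience overload for object arrays such as Integer[] or String[].
    public static <T> boolean hasTwoPair(T[] array) {
        return hasTwoPair(Arrays.asList(array));
    }

    public static void main(String[] args) {
        System.out.println(hasTwoPair(Arrays.asList(2, 2, 4, 7, 7)));     // true
        System.out.println(hasTwoPair(new Integer[] {2, 2, 2, 2, 5}));    // false
        System.out.println(hasTwoPair(new String[] {"a", "b", "a", "b"})); // true
    }
}
```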

Searching for a different instance of the same exact object, within a set

I have a set, in the World class, for an object called collidable:
Set<Collidable> collidables = new HashSet<Collidable>();
While trying to develop a collision detection system (for a ball), I made two for loops, for X and Y.
cboxX = (int) Math.floor(position.x - RADIUS);
cboxY = (int) Math.floor(position.y - RADIUS);
cboxW = Math.abs((int) Math.ceil(nextPosition.x + RADIUS) - (int) Math.floor(position.x - RADIUS));
cboxH = Math.abs((int) Math.ceil(nextPosition.y + RADIUS) - (int) Math.floor(position.y - RADIUS));
for (int x = cboxX; x <= cboxW + cboxX - 1; x++)
{
for (int y = cboxY; y <= cboxH + cboxY; y++)
{
}
}
Everything is good here. However, inside the for loop, I am trying to check for collidables with x and y parameters, but due to the fact that I am creating a new instance of a collidable (albeit with the exact same parameters as one that was previously generated), it will always turn up false:
world.collidables.add(new Block(new Vector2(x, y)));
System.out.println(world.collidables.contains(new Block(new Vector2(x, y)))); //returns false
However, if I use the same instance of block, it will turn up true:
Block b = new Block(new Vector2(x, y));
world.collidables.add(b);
System.out.println(world.collidables.contains(b)); //returns true
This is unacceptable however, as the entire reason for having two for loops was to not have to iterate over every collidable, every update.
What I'm asking is, does anyone know of a way to get whether a collidable is at the location I am specifying, without having to iterate over the entire set?

You need to provide your own implementation of two methods:
int hashCode()
boolean equals(Object o)
These methods should be implemented such that c1.equals(c2) returns true if and only if the attributes (vectors) of the two instances are equal, and such that equal instances return the same hashCode(). In other words, hashCode() must be consistent with equals(Object o) in the sense stated by the documentation:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
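As an illustration of the advice above, here is a sketch with value-based equals() and hashCode(). The Vector2 and Block classes below are simplified stand-ins written for this example (two float coordinates, one position field), not the asker's actual classes, and the set is typed as Set<Block> rather than Set<Collidable> to keep the example short. If Vector2 comes from a library, check whether it already defines both methods.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class CollidableLookupDemo {
    // Hypothetical minimal vector with value-based equality.
    static final class Vector2 {
        final float x;
        final float y;

        Vector2(float x, float y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Vector2)) return false;
            Vector2 other = (Vector2) o;
            return Float.compare(x, other.x) == 0 && Float.compare(y, other.y) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Hypothetical block whose identity is its position.
    static final class Block {
        final Vector2 position;

        Block(Vector2 position) {
            this.position = position;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Block)) return false;
            return position.equals(((Block) o).position);
        }

        @Override
        public int hashCode() {
            return position.hashCode();
        }
    }

    // Adds one block, then looks it up with a different but equal instance.
    static boolean containsEqualBlock(float x, float y) {
        Set<Block> collidables = new HashSet<>();
        collidables.add(new Block(new Vector2(x, y)));
        return collidables.contains(new Block(new Vector2(x, y)));
    }

    public static void main(String[] args) {
        System.out.println(containsEqualBlock(3, 4)); // true
    }
}
```

With these overrides in place, contains() no longer depends on passing the very same instance, so the lookup inside the two for loops can stay a constant-time set query on average.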
