String manipulated through reflection and its impact on equals method - java

It prints true for both of the following print statements in the sample code. I understand that this follows from the logic of the equals method of the String class:
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
...
}
But I am unable to figure out how their hashcode remains unchanged. Does the condition this == anObject have any relationship with the hashCode method of the String class? If yes, then how are they equal?
Please help me understand this.
It is true that the value of a string can be modified through reflection (at which point it loses its immutability). But in this case the hashcode remains unchanged. Why?
import java.lang.reflect.Field;
public class StringHacker {
public static void main(String[] args) throws Exception {
String myMonth = "January";
char[] yourMonth = {'M', 'a', 'y'};
Field value = String.class.getDeclaredField("value");
value.setAccessible(true);
value.set(myMonth, yourMonth);
System.out.println(myMonth.equals("January"));
System.out.println(myMonth.equals("May"));
}
}
The output is:
true
true

But in this case the hashcode remains unchanged. Why?
The answer is that String::hashCode caches its result in a private field. So if you do this:
String s = /* create string */;
int hash = s.hashCode();
/* use reflection to mutate string */
int hash2 = s.hashCode();
you will find that hash and hash2 are the same value. This is just one more reason why it is a bad idea to use reflection to mutate strings.
(But if you read the code for String you can see how hashCode is implemented and then use reflection to clear the cached hashcode value.)
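To make the caching effect concrete without touching String's internals, here is a minimal sketch. CachedHashText is a hypothetical stand-in class, not String's actual code; it only assumes the same idea of storing the hash in a private field on first use, and its overwrite method simulates the reflective value.set call.

```java
import java.util.Arrays;

// Hypothetical stand-in for String: caches its hash code on first use.
final class CachedHashText {
    private char[] value;   // mutable here only so the reflection hack can be simulated
    private int hash;       // 0 means "not computed yet" (an assumption of this sketch)

    CachedHashText(String s) {
        this.value = s.toCharArray();
    }

    // Simulates value.set(myMonth, yourMonth): contents change, the cache does not.
    void overwrite(char[] newValue) {
        this.value = newValue;
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h;   // remember the result for later calls
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof CachedHashText
                && Arrays.equals(value, ((CachedHashText) o).value);
    }
}

class CachedHashDemo {
    public static void main(String[] args) {
        CachedHashText month = new CachedHashText("January");
        int before = month.hashCode();                 // computed and cached
        month.overwrite(new char[] {'M', 'a', 'y'});
        int after = month.hashCode();                  // stale cached value comes back
        System.out.println(before == after);                                // true
        System.out.println(after == new CachedHashText("May").hashCode());  // false
        System.out.println(month.equals(new CachedHashText("May")));        // true
    }
}
```

The last two lines show the broken state: the object now equals "May" but still reports the hash computed for "January".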

The hashcode does not change because String is an immutable class.
That means by contract its value will not change. As the same value must always have the same hashcode, there is no need ever to change the hashcode. Even worse, an object with a hashcode changing over time may get you in big trouble, e.g. when dealing with Set and Map.
An object must not change its hashcode!
If you alter a string's value via reflection you're actively breaking the contract and thus causing undefined, chaotic and possibly catastrophic behaviour.

You mention hashcode in your question, but never call it, display its value, or compare hashcode values. So to answer your question:
Does the condition, this == anObject has any relationship with the hashCode method of String class?
The answer is an emphatic "no" (other, of course, than the obvious case that two references to the same object will obviously be calling the same method and get returned the same result). Likewise, hashcode() is also not called/considered by the equals() method.
So let's consider ==, equals(), and hashcode(), and how these play out in your example. Firstly, though, it must be mentioned that you are using reflection in a way that it was never intended to be used. There are situations where calling value.set(object, value) is valid and necessary - but changing the value of an immutable class like "String" is not one of them. Upshot is that it's not surprising to get weird results by doing things like that.
Let's start by restating that every object (such as a String) lives at its own location in the computer's memory. For example, consider code like :
String myName = "Fred";
String yourName = "Fred";
String databaseName = fetchNameFromDatabase(); // returns "Fred"
boolean mineIsYours = (myName == yourName); // true
boolean mineIsDatabases = (myName == databaseName); // false
boolean mineEqualsDatabases = myName.equals(databaseName); // true
All 3 Strings will have the same value "Fred" - but there's a neat trick. When the Java compiler compiles the program, it will load all hard-coded strings into the .class file. Since Strings are immutable, it saves some space by creating unique values in a "String pool" - so for my example, "Fred" will only be created ONCE, and myName and yourName will both be pointing to the SAME instance in memory - hence mineIsYours will be true.
Strings created dynamically (eg read from the database) would not use this String pool, so will be different instances even though they may have the same value - hence the importance to test equality using equals() rather than ==.
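A small runnable sketch of the same comparison follows. The fetchNameFromDatabase method here is a hypothetical stand-in that simply builds the string at run time with new String(...), which is enough to get a separate instance.

```java
class InternDemo {
    // Hypothetical stand-in for a database call: creates a new String object at run time.
    static String fetchNameFromDatabase() {
        return new String("Fred");
    }

    public static void main(String[] args) {
        String myName = "Fred";
        String yourName = "Fred";
        String databaseName = fetchNameFromDatabase();

        System.out.println(myName == yourName);               // true: same pooled literal
        System.out.println(myName == databaseName);           // false: different instance
        System.out.println(myName.equals(databaseName));      // true: same characters
        System.out.println(myName == databaseName.intern());  // true: intern() returns the pooled instance
    }
}
```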
Can you now see what's happening in your program ? Let's look at a few specific lines :
String myMonth = "January";
"January" is a hard-coded constant, so it's put in the String pool, and myMonth is pointing to the location of that instance in memory.
value.set(myMonth, yourMonth);
The value of myMonth - ie, the value of that instance in memory that myMonth is pointing to - is changed to be "May".
System.out.println(myMonth.equals("January"));
Calls "equals" on myMonth, passing in the instance of the hard-coded String that the Java compiler put into the String pool for "January". However, this instance is THE SAME INSTANCE that myMonth was initialised to (remember my variable mineIsYours was true) !! Yes, THE SAME INSTANCE that you changed the value of to be "May".
So, when you changed myMonth's instance value in the String pool from "January" to "May", you didn't just change it for that one myMonth variable, but for EVERY hard-coded "January" value in the program !
System.out.println(myMonth.equals("May"));
The value of the instance that myMonth is pointing to has been changed to "May", so this is true.
So where is hashCode() used in all this? As I mentioned earlier, it isn't. Not at all.
From your question, I'm wondering: is your understanding that two objects are equal if their hashcodes match? If so, no - not at all. There is NO - repeat NO - requirement that hashcodes be unique, which means the idea of "equal if hashcodes match" obviously fails.
The purpose of hashCode() is to give a wide spread of values for different instances of the class. This is used in structures like HashMap, etc. to put objects into different "buckets" for quick retrieval.
The implicit connection between equals() and hashCode() is that:
1) where one is created (or rather, overridden), then the other should be as well, and
2) the hashCode() calculation should use only fields that equals() also compares (ideally the exact same fields), so that equal objects always produce the same hashcode.
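As a quick illustration that matching hashcodes do not imply equality, the strings "Aa" and "BB" are a well-known pair with the same String hash. The sketch below (class and method names are only for illustration) also shows that a HashMap still keeps them apart, because it falls back to equals() inside a bucket.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Builds a map keyed by two different strings that share one hash code.
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true: both are 2112
        System.out.println("Aa".equals("BB"));                  // false
        Map<String, Integer> map = build();
        System.out.println(map.size());                         // 2
        System.out.println(map.get("BB"));                      // 2
    }
}
```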

Related

Is == guaranteed to return correct result? [duplicate]

From what I understand, the == operator in Java compares references (an int) of objects.
This value is what the default implementation of hashCode method in Object returns.
The hashCode method has an implementation note:
As far as is reasonably practical, the hashCode method defined
by class Object returns distinct integers for distinct objects.
reasonably practical: This means that, no matter how small, there is a real possibility that two distinct objects can have equal hashCode or reference value.
So, if I compare two different objects (that don't override hashCode and equals) using ==, it's a real possibility that the result can be true (?). The default implementation of equals does a == check:
public class Test {
public static void main(String[] args) {
var t1 = new Test();
var t2 = new Test();
System.out.println(t1.hashCode() + ":" + t2.hashCode()); // 2055281021:1554547125 (Could've been 1554547125:1554547125 ?)
System.out.println(t1 == t2); // false (Could've been true ?)
System.out.println(t1.equals(t2)); // false (Could've been true ?)
}
}
Why is it that equals and hashCode are overridden only in certain situations, while the rest of the time (many library classes such as Thread) the default implementation is relied on for equality checks when it's not guaranteed to return the correct result?
And how can someone extremely risk-averse make sure the above false positive never occurs? If the class has at least one non-static field, one can override hashCode and equals. But what if this is not the case (like the Test class above)?
Can you please explain what I am missing here?
Edit 1:
Adding an API note for hashCode (taken from Silvio's answer):
This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language
Okay, there's a lot of questions here, so let's try to break it down.
From what I understand, the == operator in Java compares references (an int) of objects.
This value is what the default implementation of hashCode method in Object returns.
== in Java compares references, yes. Those references are not necessarily compatible with int. On many common architectures, int will probably coincide with most of the observable space that a reference can occupy, but that's not true in general.
In particular:
int is a signed type. That means half of its values are negative. Pointers are generally unsigned.
Even if we ignore the sign problems, int is a 32-bit type. Most modern computers are 64-bit, which means the address space would fit better in a 64-bit integer (i.e. a Java long). So only a small fraction of addresses can even be stored in int.
Second, hashCode is not required to have anything to do with the pointer itself. From the hashCode docs you referenced already
(This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
A Java implementation is free to choose whatever hashCode it wants. Maybe you're running on some bizarre embedded hardware and it makes sense to use some additional flag variable in the computation. hashCode should not be assumed to be the pointer.
Why is that equals and hashCode are overridden in certain situations only and rest of the time (many library classes such as Thread) depend on default implementation for equality check when it's not guaranteed to return correct result?
What is your definition of "correct" here? The guarantees demanded by the Java specification can be summarized from the docs
The equals method implements an equivalence relation on non-null object references:
It is reflexive: for any non-null reference value x, x.equals(x) should return true.
It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
For any non-null reference value x, x.equals(null) should return false.
...
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
The default equals implementation clearly satisfies the basic requirements above, and the default hashCode is guaranteed by the standard to be the same for two equal objects.
We override equals when we have a better notion of equality. For instance, two strings should be considered equal if they have the same characters, even if they are distinct objects in memory, and two array lists should be equal if their elements are equal pointwise. But for something like Thread, what would that even mean? When should two arbitrary threads be equal? The default suffices well enough, because we'd gain nothing from overriding it anyway.
If the class has at least one non-static field, one can override hashCode and equals
What does equality have to do with the number of non-static fields? I can override the two just fine. Watch.
public final class MySimpleClass {
public boolean equals(Object other) {
return (other != null) && (other instanceof MySimpleClass);
}
public int hashCode() {
return 42;
}
}
That's a perfectly valid, conformant implementation of equality and hashing for MySimpleClass. In particular, since there's only one meaningfully distinct value of this class, I'd argue that's a good implementation of the two methods. No non-static fields required.
== always returns false if you compare 2 different objects and always returns true if you compare an object to itself.
But it is not guaranteed, that 2 different objects return different hash codes. That's because hashCode() returns int and there's only about 4 billion distinct ints. The number of objects in your code is constrained by the size of heap only.
So, because there can be more than 4 billion distinct objects, their hash codes can sometimes be the same.
As for equals, it works like == by default, but it can be overridden, so == can return false when equals returns true. (The reverse should not happen: a reflexive equals returns true whenever == does.)
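A minimal sketch of that difference, using two hypothetical classes: Plain inherits equals from Object, while Name overrides it to compare contents.

```java
// Inherits equals() and hashCode() from Object, so equality is identity.
class Plain { }

// Overrides equals() and hashCode() to compare the wrapped text instead.
final class Name {
    private final String text;

    Name(String text) {
        this.text = text;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Name && text.equals(((Name) o).text);
    }

    @Override
    public int hashCode() {
        return text.hashCode();
    }
}

class IdentityVsEqualsDemo {
    public static void main(String[] args) {
        Plain p1 = new Plain();
        Plain p2 = new Plain();
        System.out.println(p1 == p2);        // false: two distinct objects
        System.out.println(p1.equals(p2));   // false: default equals is ==
        System.out.println(p1.equals(p1));   // true

        Name n1 = new Name("x");
        Name n2 = new Name("x");
        System.out.println(n1 == n2);        // false: still two distinct objects
        System.out.println(n1.equals(n2));   // true: overridden equals compares text
    }
}
```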
equals and hashCode have an unenforceable-at-compile-time contract between them (which itself is different than the == operator).
Fundamentally speaking, an object should override hashCode such that a.equals(b) implies a.hashCode() == b.hashCode(). The converse is not required: two objects with equal hash codes need not be equal.
The == operator compares primitive values numerically and compares object references by identity, which is why the same instance of an object compared against itself (a == a) will return true, with some caveats given to Strings and string interning.
Because the contract between equals and hashCode is unenforceable, suggesting that == will always return a "correct" result depends on your definition of "correct".
For instance:
It's correct that a square is a parallelogram; it's not correct that any given square is the same as any given parallelogram.
It's correct that a dictionary is a book; it's not correct that any given book is a dictionary.
It's correct that a car has wheels; it's not correct that any given car has any given number of wheels.
Also, remember that hashCode is only 32 bits, so there's always going to be the chance of a collision between two unrelated objects (which is where having equals pick up the slack is beneficial here).
In this context, you can only trust == based on the constraints and conditions the individual object has, and what business rules make sense for equality comparisons given a hash code, and nothing further. If your business rules require a deviation between how equals and hashCode behave, then you have to keep that context in mind when comparing through those methods.
Firstly, == checks the memory reference. The JVM does this using pointers internally, so two distinct objects are never == because they live at different memory addresses. The comparison is on the reference itself (a 32- or 64-bit value), not on a hash code.
For your second question: if you need a hash code implementation but the class has no fields, use System.identityHashCode(). It returns zero for a null reference and, for any other object, the same value the default Object.hashCode() would return, which stays the same for that object but is not guaranteed to be unique.
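A short sketch of System.identityHashCode follows; NoFields is a hypothetical field-less class used only for the demonstration.

```java
// Hypothetical class with no fields and no overrides.
final class NoFields { }

class IdentityHashDemo {
    public static void main(String[] args) {
        NoFields a = new NoFields();

        System.out.println(System.identityHashCode(null) == 0);            // true
        // Without an override, hashCode() is the identity hash code.
        System.out.println(System.identityHashCode(a) == a.hashCode());    // true
        // The value is stable for the lifetime of the object.
        System.out.println(System.identityHashCode(a) == System.identityHashCode(a)); // true
    }
}
```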
From what I understand, the == operator in Java compares references (an int) of objects.
On a 64-bit architecture, a reference will need 8 bytes. This is a long, not an int.
This value is what the default implementation of hashCode method in Object returns.
When you cast a long to an int, you will lose information. This is why the default implementation of hashCode() can return equal hashes for different objects.

How this hashCode method works?

So I did not fully understand how hashCode overriding works, so I searched for a tutorial on hashCode overriding. I found a tutorial where I learned the concept of a hashCode: equal objects must have the same hashCode (but that does not mean that different objects must have different hashCodes). What I did not understand is its implementation of the hashCode:
@Override
public int hashCode() {
int hash = 7;
hash = 31 * hash + Objects.hashCode(this.myShirtColor);
return hash;
}
What I do not understand here is: what will Objects.hashCode(this.myShirtColor) give?
myShirtColor is a String.
Ok, first you need to understand two things: String immutability and the String pool.
String immutability
it means that the content of the String Object can't be changed, once
it is created.
String Pool
The Java string constant pool is an area in heap memory where Java
stores literal string values. The heap is an area of memory used for
run-time operations. When a new variable is created and given a value,
Java checks to see if that exact value exists in the pool.
Let's sum up both in an example. Assuming String strOne = "abc"; the value abc is created once (String immutability) and stored in the string pool on the heap (String pool). So what if I declare another string, String strTwo = "abc";, in the same JVM? The pool is checked for abc; since it already exists there, the existing object is reused.
That means: strOne == strTwo is true, since both refer to the same object.
Ok, back to hashCode: you can now see that two objects which, in your case, have the same shirtColor will have the same hashcode.
Take shirtColor = "blue" for example.
You did not mention your class, but assuming it is a Shirt class: once you have an object shirt1 with a "blue" color, the string value blue is stored in the pool. If you create another Shirt object with color "blue", e.g. shirt2, the value is fetched from the string pool and is the very same object as shirt1's shirtColor. If you then call hashCode on both objects, the result is the same. (Strictly, String.hashCode() is computed from the characters, so two equal strings give the same hash code even when they are distinct objects.)
Objects.hashCode is a simple function that returns 0 if its argument is null and otherwise returns the argument's hashCode().
https://docs.oracle.com/javase/7/docs/api/java/util/Objects.html#hashCode(java.lang.Object)
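Putting this together, here is a small sketch built around the hashCode from the question. The Shirt class, its constructor and its equals method are assumptions added for illustration, since the question does not show them.

```java
import java.util.Objects;

// Hypothetical class around the hashCode() from the question.
final class Shirt {
    private final String myShirtColor;

    Shirt(String myShirtColor) {
        this.myShirtColor = myShirtColor;
    }

    @Override
    public int hashCode() {
        int hash = 7;
        // Objects.hashCode returns 0 for null, otherwise myShirtColor.hashCode().
        hash = 31 * hash + Objects.hashCode(this.myShirtColor);
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Shirt
                && Objects.equals(myShirtColor, ((Shirt) o).myShirtColor);
    }
}

class ShirtDemo {
    public static void main(String[] args) {
        Shirt pooled = new Shirt("blue");
        Shirt copied = new Shirt(new String("blue"));   // distinct String object, same characters

        System.out.println(pooled.hashCode() == copied.hashCode()); // true
        System.out.println(new Shirt(null).hashCode());             // 217, i.e. 31 * 7 + 0
    }
}
```

The first line prints true even though the two color strings are different objects, because the hash is computed from the characters.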

String interning and HashSet in java

I have read about string interning, in which String literals are reused, whereas String object created using new aren't reused. This can be seen below when I print true and false for their equality. To be specific, (p1==p2)!=p3, So there are two objects, one pointed by p1 and p2 and another by p3. However, when I add them to HashSet, all considered same. I was expecting a.size() to return 2, but it returned 1. Why is this so?
package collections;
import java.util.HashSet;
public class Col {
public static void main(String[] args) {
method1();
}
public static void method1()
{
HashSet a = new HashSet();
String p1 = "Person1";
String p2 = "Person1";
String p3 = new String("Person1");
if(p1 == p2)
System.out.println(true);
else
System.out.println(false);
if(p1 == p3)
System.out.println(true);
else
System.out.println(false);
a.add(p1);
a.add(p2);
a.add(p3);
System.out.println(a.size());
}
}
Output
true
false
1
HashSet uses equality to keep a unique set of values, not identity (i.e., if two objects are equals to each other, but not ==, a HashSet will only keep one of them).
You can implement a set that uses identity instead of equality by using the JDK's IdentityHashMap with a dummy value shared between all keys, in a similar way that HashSet is based on HashMap.
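One way to sketch such an identity-based set is Collections.newSetFromMap over an IdentityHashMap, which handles the shared dummy value internally. The class and method names below are only for illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

class IdentitySetDemo {
    // Returns {size of equality-based set, size of identity-based set}.
    static int[] sizes() {
        String p1 = "Person1";
        String p2 = "Person1";
        String p3 = new String("Person1");

        Set<String> byEquals = new HashSet<>();
        Set<String> byIdentity =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());

        for (String p : new String[] {p1, p2, p3}) {
            byEquals.add(p);
            byIdentity.add(p);
        }
        return new int[] {byEquals.size(), byIdentity.size()};
    }

    public static void main(String[] args) {
        int[] sizes = sizes();
        System.out.println(sizes[0]); // 1: all three are equal
        System.out.println(sizes[1]); // 2: p1 and p2 are one object, p3 is another
    }
}
```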
I have read about string interning, in which String literals are reused, whereas String object created using new aren't reused. This can be seen below when I print true and false for their equality. To be specific, (p1==p2)!=p3, So there are two objects, one pointed by p1 and p2 and another by p3. However, when I add them to HashSet, all considered same. I was expecting a.size() to return 2, but it returned 1.
This is right only if you compare the Strings using ==; the result is different when comparing with the equals() method. (If in doubt, you can test it.)
When adding to a HashSet, the comparison method used is equals(), as is proper for objects. And so p1, p2 and p3 are all equal.
If you test using equals() instead, the output will be true, true, 1 rather than true, false, 1.
p1 and p2 are string literals, and they point to the same object because of the string pool. So when we compare them using ==, they match.
p3 is a separately created String object, so == compares its reference with p1's and gives false.
HashSet's add method calls HashMap's put method internally. HashMap's put method uses the hashCode and equals methods to place the key in the map. String implements hashCode and equals and returns the same hashCode for the same value. A HashSet contains only unique values, so it stores only one.
This is one of those cases where I would recommend learning how to use javap to understand how your code is compiled but let me try to explain what is going on under the hood.
When Java compiles that class, it creates instructions for building what is called the constant pool for that class. That constant pool will hold a reference to a string with the value "Person1". The compiled logic will also say p1 and p2's value should be set to the constant pool's reference to that string (the address in memory that it lives in). Calling p1==p2 will return true because they literally have the same exact value. When you call String p3 = new String("Person1"); you are telling Java to create a new string in a different place in memory which is merely a copy of the original one and then set p3's value as a reference to the place in memory that the new string object lives in. So if you call p1 == p3 it will return false because what you are saying is "does p1's location in memory equal p3's location in memory?"
As others have pointed out, if you called p1.equals(p3) it returns true because .equals compares the string values instead of the references. And a HashSet will see them all as the same because it uses .hashCode together with .equals, and String's hashCode, like its equals, is computed from the string value.
Hopefully that clears up some of the confusion!

Check equality of a ENUM value with a String in JAVA

I know the correct way to do this is Days.MONDAY.name().equals(day). But I wonder why Days.MONDAY.equals(day) fails when both print MONDAY.
I know I'm missing something about equals() and toString(). I want to know clearly what it is.
String day = "MONDAY";
System.out.println("main().Days.MONDAY : " + Days.MONDAY); // Prints MONDAY
System.out.println("main().day : " + day);// Prints MONDAY
System.out.println("main().Days.MONDAY.equals(day) : " + Days.MONDAY.equals(day)); // Why is this false when below is OK.
System.out.println("main().Days.MONDAY.toString().equals(day) : " + Days.MONDAY.toString().equals(day));// This is true
System.out.println("main().Days.MONDAY.name().equals(day) : " + Days.MONDAY.name().equals(day)); // This is true and I know this is the correct way
Edit: This is the enum.
enum Days{
MONDAY,TUESDAY,WEDNESDAY,THURSDAY,FRIDAY,SATURDAY,SUNDAY;
}
The equals method of an Enum compares the static instances of the Enum, because every reference to an enum constant points to the same object instance.
So the equals method of the Enum does not compare the name or toString; it compares the instances.
String day="MONDAY";
The above line creates a String object in the constant pool, whereas
public enum Days{
MONDAY <-- Created in HEAP
}
Now Coming to
Days.MONDAY.equals(day) --> Why False ?
equals() method of Enum compares the instances of the Enum not the data as String#equals() does !!
Days.MONDAY.toString().equals(day) --> Why true ?
because it is the String#equals() method, which is overridden to compare contents !!
Look at is-it-ok-to-use-on-enums-in-java. Based on this, Java's implementation of equals on Enum simply performs ==. Since the Enum and the String day in your example are not the same object, it returns false.
The methods of the class Object have a strictly defined contract.
One of those methods is the Object.equals() method - here is its documentation.
To be able to maintain the symmetry requirement, it is practically impossible to return true in any implementation of equals() unless the two objects being compared are of the same class. equals() is supposed to represent some sort of equivalence between their properties, but objects which are not of the same class do not have the same properties.
Do not confuse the Days object Days.MONDAY with the string returned from Days.MONDAY.toString(). Its toString() just returns a string that represents it, and two strings are objects that can be equal. But Days.MONDAY.toString() is not the object itself (try Days.MONDAY.equals(Days.MONDAY.toString()) and you'll get false here, too)!
When you send an object to the print() or println() methods of System.out or any other PrintStream or PrintWriter, print() will take that object's toString() value and print it. This is the reason why they "print the same thing". It is not actually the MONDAY object that's being printed (it's hard to define "printing an object"), it's the string "MONDAY" that's returned from its toString() method.
All this would hold true even if Days was not an enum but some other object that is not a string, though in the particular case of an enum, its equals() method is indeed a comparison of references rather than attributes.
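The comparisons discussed above can be sketched as follows. The isMonday helper is a hypothetical name; Days.valueOf is the standard method every enum gets, shown here as another way to go from the String to the constant before comparing.

```java
enum Days { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY }

class EnumCompareDemo {
    // Hypothetical helper: compares the enum constant's name with a String.
    static boolean isMonday(String day) {
        return Days.MONDAY.name().equals(day);
    }

    public static void main(String[] args) {
        String day = "MONDAY";

        System.out.println(Days.MONDAY.equals(day));             // false: an enum is never equal to a String
        System.out.println(Days.MONDAY.toString().equals(day));  // true: String compared with String
        System.out.println(isMonday(day));                       // true
        System.out.println(Days.valueOf(day) == Days.MONDAY);    // true: same enum instance
    }
}
```

Note that Days.valueOf throws IllegalArgumentException for a string that is not a constant name, so name().equals(day) is the safer choice for untrusted input.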

equals and hashCode

I am running into a question about equals and hashCode contracts:
here it is
Given:
class SortOf {
String name;
int bal;
String code;
short rate;
public int hashCode() {
return (code.length() * bal);
}
public boolean equals(Object o) {
// insert code here
}
}
Which of the following will fulfill the equals() and hashCode() contracts for this
class? (Choose all that apply.)
Correct Answer
C:
return ((SortOf)o).code.length() * ((SortOf)o).bal == this.code.length() *
this.bal;
D:
return ((SortOf)o).code.length() * ((SortOf)o).bal * ((SortOf)o).rate ==
this.code.length() * this.bal * this.rate;
I have a question about the last choice D, say if the two objects
A: code.length=10, bal=10, rate = 100
B: code.length=10, bal=100, rate = 10
Then using the equals() method in D, we get A.equals(B) evaluating to true right? But then they get a different hashCode because they have different balances? Is it that I misunderstood the concept somewhere? Can someone clarify this for me?
You're right - D would be inappropriate because of this.
More generally, hashCode and equals should basically take the same fields into account, in the same way. This is a very strange equals implementation to start with, of course - you should normally be checking for equality between each of the fields involved. In a few cases fields may be inter-related in a way which would allow for multiplication etc, but I wouldn't expect that to involve a string length...
One important point which often confuses people is that it is valid for unequal objects to have the same hash code; it's the case you highlighted (equal objects having different hash codes) which is unacceptable.
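The counterexample from the question can be checked directly. The sketch below copies the class with option D filled in; the class name SortOfD, the constructor and the instanceof guard are additions for illustration and are not part of the exam question.

```java
// The SortOf class from the question with option D as its equals() body.
class SortOfD {
    String name;
    int bal;
    String code;
    short rate;

    SortOfD(String code, int bal, short rate) {
        this.code = code;
        this.bal = bal;
        this.rate = rate;
    }

    @Override
    public int hashCode() {
        return (code.length() * bal);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SortOfD)) {
            return false;   // guard added for this sketch
        }
        SortOfD other = (SortOfD) o;
        return other.code.length() * other.bal * other.rate
                == this.code.length() * this.bal * this.rate;
    }
}

class SortOfDemo {
    public static void main(String[] args) {
        SortOfD a = new SortOfD("0123456789", 10, (short) 100);   // code.length() = 10
        SortOfD b = new SortOfD("0123456789", 100, (short) 10);

        System.out.println(a.equals(b));     // true: 10 * 10 * 100 == 10 * 100 * 10
        System.out.println(a.hashCode());    // 100
        System.out.println(b.hashCode());    // 1000
    }
}
```

Equal objects with different hash codes: exactly the combination the contract forbids.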
You have to check at least all the fields used by .hashCode(), so that objects which are equal have the same hash. But you can check more fields in equals; it's totally fine to have different objects with the same hash. It seems you're doing SCJP 1.6? This topic is well covered in the SCJP 1.6 book by Kathy Sierra and Bert Bates.
Note: that's why it's legitimate to implement a useful .equals() while returning a constant value from .hashCode().
It's all about fulfilling the contract (as far as this question is concerned). Different implementations (of hashCode and equals) have different limitations and advantages, so it's up to the developer to check that.
but then they get different hashCode because they have a different balance?
Exactly! But that's why you should choose option C. The question wants to test your grasp of fulfilling the contract, not which hashCode would be better for the scenario.
More clarification:
The thing you need to check always is :
Your hashCode() implementation should use the same instance variables as the ones used in the equals() method.
Here these instance variables are : code.length() and bal used in hashCode() and hence you are limited to use these same variables in equals() as well. (Unless you can edit the hashCode() implementation and add rate to it)
The hashCode() method returns an integer for a given object; it need not be unique. This integer is used to determine the bucket location when the object is stored in a hash-table-based data structure such as HashMap. By default, Object's hashCode() returns an integer that is typically derived from the memory address where the object is stored.
The equals() method, as the name suggests, is used to verify the equality of two objects. The default implementation simply compares the two object references.
Equal objects must have equal hash codes.
equals() must define an equivalence relation. If the objects are not modified, it must keep returning the same value. o.equals(null) must always return false.
hashCode() must also be consistent: if the object is not modified in terms of equals(), it must keep returning the same value.
The relation between the two methods is:
whenever a.equals(b) is true, a.hashCode() must be the same as b.hashCode().
refer to: https://howtodoinjava.com/interview-questions/core-java-interview-questions-series-part-1/
In general, you should always override one if you override the other in a class. If you don't, you might find yourself getting into trouble when that class is used in hashmaps/hashtables, etc.
