String interning and HashSet in Java

I have read about string interning, in which String literals are reused, whereas String objects created with new aren't. This can be seen below, where I print true and false for their equality: p1 == p2 is true but p1 == p3 is false, so there are two objects, one referenced by p1 and p2 and another by p3. However, when I add them to a HashSet, they are all considered the same. I was expecting a.size() to return 2, but it returned 1. Why is this so?
package collections;
import java.util.HashSet;
public class Col {
    public static void main(String[] args) {
        method1();
    }
    public static void method1() {
        HashSet<String> a = new HashSet<>();
        String p1 = "Person1";
        String p2 = "Person1";
        String p3 = new String("Person1");
        if (p1 == p2)
            System.out.println(true);
        else
            System.out.println(false);
        if (p1 == p3)
            System.out.println(true);
        else
            System.out.println(false);
        a.add(p1);
        a.add(p2);
        a.add(p3);
        System.out.println(a.size());
    }
}
Output
true
false
1

HashSet uses equality, not identity, to keep a unique set of values (i.e., if two objects are equal to each other according to equals(), but not ==, a HashSet will keep only one of them).
You can implement a set that uses identity instead of equality by using the JDK's IdentityHashMap with a dummy value shared between all keys, in the same way that HashSet is built on HashMap.
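As a rough sketch of the identity-based set described above, the JDK's Collections.newSetFromMap can wrap an IdentityHashMap so that the dummy value is handled for you. The class and method names (IdentitySetDemo, sizeAfterAdding) are made up for this illustration.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

class IdentitySetDemo {
    // Adds the three strings from the question to the given set and returns its size.
    static int sizeAfterAdding(Set<String> set) {
        String p1 = "Person1";
        String p2 = "Person1";             // same pooled literal as p1
        String p3 = new String("Person1"); // distinct object with equal contents
        set.add(p1);
        set.add(p2);
        set.add(p3);
        return set.size();
    }

    public static void main(String[] args) {
        // HashSet compares elements with hashCode()/equals(): prints 1.
        System.out.println(sizeAfterAdding(new HashSet<String>()));

        // A set backed by IdentityHashMap compares elements with ==: prints 2.
        Set<String> identitySet =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        System.out.println(sizeAfterAdding(identitySet));
    }
}
```

An identity-based set deliberately breaks the usual Set contract, so it is best kept for special cases such as tracking visited object instances.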

I have read about string interning, in which String literals are reused, whereas String object created using new aren't reused. This can be seen below when I print true and false for their equality. To be specific, (p1==p2)!=p3, So there are two objects, one pointed by p1 and p2 and another by p3. However, when I add them to HashSet, all considered same. I was expecting a.size() to return 2, but it returned 1.
That is right only if you compare the Strings using ==; the result is different when comparing with the equals() method. (If in doubt, you can test it.)
When adding to a HashSet, the comparison method used is equals(), as is proper for objects. And so p1, p2 and p3 are all equal.
If you test using equals() instead, the output is true, true, 1 instead of true, false, 1.

p1 and p2 are string literals, and they point to the same object because of the string pool. So when we compare them using ==, they match.
p3 is a separate String object created with new, so comparing it with ==, which compares references, gives false.
HashSet's add method calls HashMap's put method internally. HashMap's put method uses the hashCode and equals methods to place the key in the HashMap. String implements hashCode and equals and provides the same hash code for the same value. A HashSet contains only unique values, so it stores only one of them.
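One way to watch this happen is to look at the boolean that add() returns: it is true only when the set did not already contain an equal element. The sketch below uses a made-up class name (AddResultDemo) and the same three strings as the question.

```java
import java.util.HashSet;
import java.util.Set;

class AddResultDemo {
    // Returns what add() reported for p1, p2 and p3, in that order.
    static boolean[] addResults() {
        Set<String> set = new HashSet<>();
        String p1 = "Person1";
        String p2 = "Person1";
        String p3 = new String("Person1");
        // Array initializers are evaluated left to right.
        return new boolean[] { set.add(p1), set.add(p2), set.add(p3) };
    }

    public static void main(String[] args) {
        boolean[] r = addResults();
        // Only the first add changes the set: prints "true false false".
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```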

This is one of those cases where I would recommend learning how to use javap to understand how your code is compiled but let me try to explain what is going on under the hood.
When Java compiles that class, it creates instructions for building what is called the constant pool for that class. That constant pool will hold a reference to a string with the value "Person1". The compiled code also sets p1 and p2 to the constant pool's reference to that string (the address in memory where it lives). p1 == p2 returns true because the two variables hold exactly the same reference. When you call String p3 = new String("Person1"); you are telling Java to create a new string in a different place in memory, which is merely a copy of the original one, and then set p3 to a reference to that new string object. So p1 == p3 returns false, because what you are asking is "does p1's location in memory equal p3's location in memory?"
As others have pointed out, p1.equals(p3) returns true because .equals compares the string values instead of the references. A HashSet sees all three as the same element because it first uses .hashCode, which like .equals is computed from the string value, to find the bucket, and then uses .equals to confirm the match.
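The difference between the pooled literal and the copy can be sketched with String.intern(), which returns the pooled instance for a given value. The class name InternDemo is only for illustration.

```java
class InternDemo {
    // Compares a pooled literal with a copy created via new.
    static boolean[] compare() {
        String p1 = "Person1";             // reference to the pooled literal
        String p3 = new String("Person1"); // separate object on the heap
        return new boolean[] {
            p1 == p3,          // false: different objects
            p1.equals(p3),     // true: same characters
            p1 == p3.intern()  // true: intern() returns the pooled instance
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // false true true
    }
}
```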
Hopefully that clears up some of the confusion!

Related

Does newBuilder() from Protobuf generated classes create a new Java object?

Does newBuilder() create a new Java object? That doesn't seem to be the case in my quick test: calling .hashCode() on 2 different objects returns the same hash code.
import com.mydomain.proto.users.api.User;
...
User a = User.newBuilder().setUserUuid("1111111111").build();
User b = User.newBuilder().setUserUuid("1111111111").build();
System.out.println("a hashcode: " + a.hashCode());
System.out.println("b hashcode: " + b.hashCode());
// assertNotEquals fails.
assertNotEquals(a.hashCode(), b.hashCode());
Printing them shows that the hash codes are the same, though I was expecting a new Java object.
a hashcode: 611667980
b hashcode: 611667980
Note, we are using this
'com.google.protobuf:protobuf-java-util:3.12.0'
The implication between identity (==) and hash code only goes one way.
Two references to the same object, or two objects with the same contents, should have the same hash code.
Two objects sharing the same hash code are not necessarily the same object.
In order to test this, check
System.out.println(a==b);
It will print false, because they are different Objects, even if they share the same hashcode.
According to the source code for Message:
equals() first checks whether the objects are indeed the same (==). If not, it checks whether their fields and properties contain the same values.
hashCode() returns the same value if two objects contain the same properties and fields, regardless of whether they're the same object (other == this) or not, just like Strings do.
This matches the contract of Object's hashCode(): objects that are equal according to equals() must return the same hash code.
Example with Strings
String s1 = new String("Yepp"); //hashcode = 2752044
String s2 = new String("Yepp"); //hashcode = 2752044
String s3 = s1; //hashcode = 2752044
All share the same hashcode. But there are two different Objects here:
s1/s3 and s2.
System.out.println(s1==s2); // false
System.out.println(s1==s3); // true
System.out.println(s1.hashCode()==s2.hashCode()); //true
That's why comparing two strings with the == operator is a common mistake. It does not check the values, but the object references.
If the class overrides equals (as String does), it will tell you they're equal objects. Equal as twins may be, but twins are still two.
System.out.println(s1.equals(s2)); // true
Summary: Yes, it's creating new objects. The values inside don't change that, and sharing the same hash code only means they both have the same contents (and that their hash code algorithm works well).
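The same effect can be reproduced without protobuf. The sketch below uses a hypothetical UserValue class as a stand-in for a generated message (it is not the protobuf API) and assumes a value-based equals() and hashCode().

```java
import java.util.Objects;

class ValueHashDemo {
    // Hypothetical stand-in for a generated message class; not the protobuf API.
    static final class UserValue {
        private final String userUuid;

        UserValue(String userUuid) {
            this.userUuid = userUuid;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;                   // same object
            if (!(o instanceof UserValue)) return false;
            return userUuid.equals(((UserValue) o).userUuid); // same contents
        }

        @Override
        public int hashCode() {
            return Objects.hash(userUuid); // derived from contents only
        }
    }

    static boolean[] compare() {
        UserValue a = new UserValue("1111111111");
        UserValue b = new UserValue("1111111111");
        return new boolean[] { a == b, a.equals(b), a.hashCode() == b.hashCode() };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        // Two distinct objects, equal contents, equal hash codes: false true true
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```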

String manipulated through reflection and its impact on equals method

It prints true for both of the print statements in the sample code. I understand that this follows from the logic of the equals method of the String class:
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
...
}
But I am unable to figure out how their hash code remains unchanged. Does the condition this == anObject have any relationship with the hashCode method of the String class? If yes, then how are they equal?
Please help me understand this.
It is true that the value of a string can be modified through reflection (at which point it loses its immutability). But in this case the hash code remains unchanged. Why?
import java.lang.reflect.Field;
public class StringHacker {
    public static void main(String[] args) throws Exception {
        String myMonth = "January";
        char[] yourMonth = {'M', 'a', 'y'};
        Field value = String.class.getDeclaredField("value");
        value.setAccessible(true);
        value.set(myMonth, yourMonth);
        System.out.println(myMonth.equals("January"));
        System.out.println(myMonth.equals("May"));
    }
}
The output is:
true
true
But in this case the hashcode remains unchanged. Why?
The answer is that String::hashCode caches its result in a private field. So if you do this:
String s = /* create string */;
int hash = s.hashCode();
/* use reflection to mutate string */
int hash2 = s.hashCode();
you will find that hash and hash2 are the same value. This is just one more reason why it is a bad idea to use reflection to mutate strings.
(But if you read the code for String you can see how hashCode is implemented and then use reflection to clear the cached hashcode value.)
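The caching effect can be imitated without touching java.lang.String. The sketch below uses a hypothetical CachedText class that copies the idea of computing the hash once and storing it in a field; the field is left mutable on purpose to simulate the reflection hack. It is an illustration of the pattern, not the JDK's actual code.

```java
import java.util.Arrays;

class CachedHashDemo {
    // Hypothetical class that copies the caching idea; it is not java.lang.String.
    static final class CachedText {
        char[] value;     // deliberately mutable to simulate the reflection hack
        private int hash; // 0 means "not computed yet"

        CachedText(String s) {
            value = s.toCharArray();
        }

        @Override
        public int hashCode() {
            if (hash == 0 && value.length > 0) {
                hash = Arrays.hashCode(value); // computed once, then reused
            }
            return hash;
        }
    }

    // Returns true if the hash code stays the same after the contents change.
    static boolean hashSurvivesMutation() {
        CachedText t = new CachedText("January");
        int before = t.hashCode();     // computes and caches the hash
        t.value = "May".toCharArray(); // contents change behind the cache's back
        int after = t.hashCode();      // returns the stale cached value
        return before == after;
    }

    public static void main(String[] args) {
        System.out.println(hashSurvivesMutation()); // true
    }
}
```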
The hashcode does not change because String is an immutable class.
That means by contract its value will not change. As the same value must always have the same hashcode, there is no need ever to change the hashcode. Even worse, an object with a hashcode changing over time may get you in big trouble, e.g. when dealing with Set and Map.
An object must not change its hashcode!
If you alter a string's value via reflection you're actively breaking the contract and thus causing undefined, chaotic and possibly catastrophic behaviour.
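The trouble with Set and Map mentioned above can be shown with an element whose hash code legitimately depends on mutable contents. This sketch uses an ArrayList as the element (the class name MutatedKeyDemo is made up); after the list is modified, the set can no longer find it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MutatedKeyDemo {
    // Returns { found before mutation, found after mutation, set still has one element }.
    static boolean[] lookups() {
        Set<List<Integer>> set = new HashSet<>();
        List<Integer> key = new ArrayList<>();
        key.add(1);
        set.add(key);                            // stored under the hash of [1]
        boolean foundBefore = set.contains(key);
        key.add(2);                              // hash code is now that of [1, 2]
        boolean foundAfter = set.contains(key);  // looks in the wrong bucket
        return new boolean[] { foundBefore, foundAfter, set.size() == 1 };
    }

    public static void main(String[] args) {
        boolean[] r = lookups();
        // The element is still in the set but can no longer be found: true false true
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```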
You mention the hash code in your question, but never call hashCode(), display its value, or compare hash code values. So to answer your question:
Does the condition, this == anObject has any relationship with the hashCode method of String class?
The answer is an emphatic "no" (other, of course, than the obvious case that two references to the same object call the same method and get the same result). Likewise, hashCode() is not called or considered by the equals() method.
So let's consider ==, equals(), and hashcode(), and how these play out in your example. Firstly, though, it must be mentioned that you are using reflection in a way that it was never intended to be used. There are situations where calling value.set(object, value) is valid and necessary - but changing the value of an immutable class like "String" is not one of them. Upshot is that it's not surprising to get weird results by doing things like that.
Let's start by restating that every object (such as a String) lives at its own location in the computer's memory. For example, consider code like :
String myName = "Fred";
String yourName = "Fred";
String databaseName = fetchNameFromDatabase(); // returns "Fred"
boolean mineIsYours = (myName == yourName); // true
boolean mineIsDatabases = (myName == databaseName); // false
boolean mineEqualsDatabases = myName.equals(databaseName); // true
All 3 Strings will have the same value "Fred" - but there's a neat trick. When the Java compiler compiles the program, it will load all hard-coded strings into the .class file. Since Strings are immutable, it saves some space by creating unique values in a "String pool" - so for my example, "Fred" will only be created ONCE, and myName and yourName will both be pointing to the SAME instance in memory - hence mineIsYours will be true.
Strings created dynamically (eg read from the database) would not use this String pool, so will be different instances even though they may have the same value - hence the importance to test equality using equals() rather than ==.
Can you now see what's happening in your program ? Let's look at a few specific lines :
String myMonth = "January";
"January" is a hard-coded constant, so it's put in the String pool, and myMonth is pointing to the location of that instance in memory.
value.set(myMonth, yourMonth);
The value of myMonth - ie, the value of that instance in memory that myMonth is pointing to - is changed to be "May".
System.out.println(myMonth.equals("January"));
Calls "equals" on myMonth, passing in the instance of the hard-coded String that the Java compiler put into the String pool for "January". However, this instance is THE SAME INSTANCE that myMonth was initialised to (remember my variable mineIsYours was true) !! Yes, THE SAME INSTANCE that you changed the value of to be "May".
So, when you changed myMonth's instance value in the String pool from "January" to "May", you didn't just change it for that one myMonth variable, but for EVERY hard-coded "January" value in the program !
System.out.println(myMonth.equals("May"));
The value of the instance that myMonth is pointing to has been changed to "May", so this is true.
So where is hashcode() used in all this ? As I mentioned earlier, it isn't. Not at all.
From your question, I'm wondering : is your understanding that two objects are equal if their hashcodes match ? If so, no - not at all. There is NO - repeat NO - requirement that hashcodes be unique, which means the idea of "equal if hashcodes match" obviously fails.
The purpose of hashcode() is to give a wide spread of values for different instances of the class. This is used in structures like HashMap,etc to put objects into different "buckets" for quick retrieval.
The implicit connection between equals() and hashCode() is that:
1) where one is overridden, the other should be as well, and
2) the hashCode() calculation should be based on the same fields as equals() (at the very least, it must not use a field that equals() ignores), so that equal objects always produce equal hash codes.
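A small illustration that matching hash codes do not imply equality: the strings "Aa" and "BB" are a well-known pair with the same String hash code. The class name CollisionDemo is made up for this sketch.

```java
import java.util.HashSet;
import java.util.Set;

class CollisionDemo {
    static boolean sameHash() {
        return "Aa".hashCode() == "BB".hashCode(); // both are 2112
    }

    static boolean equalStrings() {
        return "Aa".equals("BB");
    }

    // Both strings land in the same bucket, but equals() keeps them apart.
    static int setSize() {
        Set<String> set = new HashSet<>();
        set.add("Aa");
        set.add("BB");
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sameHash());     // true
        System.out.println(equalStrings()); // false
        System.out.println(setSize());      // 2
    }
}
```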

How and what does "==" operator in java check?

I have been stuck on this for a while now. I have two objects which, as far as I can tell, are the same. I have overridden the hashCode method to produce equal hash codes for both, but when I compare them for equality using "==" or Object's equals (which also uses "=="), it returns false. The output below shows the situation:
1)HashCode-->-626561382 AND 2)HashCode--->-626561382
1)IdentityHashCode-->19640463
2)IdentityHashCode-->22330755
1)Bean1=beans.OrdersBean@daa76e9a AND 2)Bean2=beans.OrdersBean@daa76e9a
Check MySelf for(==)-->false
Check Object's Equals()-->false
Could you please explain why this is happening?
The == operator compares references (memory locations) of objects in Java.
If you want to compare objects by value, use .equals():
if(obj1.equals(obj2)){
}
To compare two objects for equal value you need to override the equals method.
The == operator as others have mentioned compares references (i.e. is it the exact same object)
Explanation
If you take the example of identical twins Ben and Adam, using an == would return false when comparing the two since Ben is not Adam (even though they look the same), if you use .equals and the comparison is based on looks then this would return true.
In Java, == is used to compare references. To evaluate whether two objects are equivalent, use equals.
Note: If you have to compare custom objects, consider overriding equals in your class according to your equivalence criteria.
Override the .equals method inherited from Object; it is intended for "deeper" comparisons, whereas == only checks whether two references refer to the same instance (so that updates made through one are visible through the other).
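The situation in the question, where only hashCode() is overridden, can be sketched as follows. OrderBean and its orderId field are hypothetical stand-ins for the asker's bean, which is not shown in the question.

```java
class HashOnlyDemo {
    // Hypothetical bean that overrides hashCode() but not equals().
    static final class OrderBean {
        final int orderId;

        OrderBean(int orderId) {
            this.orderId = orderId;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(orderId);
        }
        // equals() is inherited from Object, which compares with ==.
    }

    static boolean[] compare() {
        OrderBean a = new OrderBean(7);
        OrderBean b = new OrderBean(7);
        return new boolean[] { a.hashCode() == b.hashCode(), a == b, a.equals(b) };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        // Equal hash codes, yet neither == nor equals() is true: true false false
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

Equal hash codes never make == or the inherited equals() return true; only an equals() override can make two distinct instances compare as equal.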
1. Using '==' :
When you want to check if two reference variables are referring to a same object you should use == operator in java. For example - (Assume there is a class called Person)
Person person1 = new Person();
Person person2 = person1;
System.out.println(person1 == person2); // true
Here as we have used new only once, only one object is getting created in the heap memory and we are assigning it to a reference variable -> person1. In the second statement we are assigning person1 to person2. So actually there is only one object in memory but both person1 and person2 are referring to the same object.
[In short we can say that, similar to primitives, == compares the value in the variable which in case of reference variables is memory address of the actual object].
2. Using '.equals()' :
When you want to check if the two objects are meaningfully equal then use .equals() method. For example -
Person person1 = new Person();
Person person2 = new Person();
System.out.println(person1.equals(person2)); // false
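Here we create two entirely different objects. Assuming Person does not override equals(), the inherited Object.equals() compares references, so the call returns false. For the two objects to count as meaningfully equal, Person has to override equals().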
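A minimal sketch of a Person class that does define meaningful equality is shown below. The name field is an assumption made for this example, since the original Person class is not shown.

```java
import java.util.Objects;

class PersonEqualsDemo {
    // Hypothetical Person with a single name field, compared by value.
    static final class Person {
        private final String name;

        Person(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            return Objects.equals(name, ((Person) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(name); // keep consistent with equals()
        }
    }

    static boolean[] compare() {
        Person person1 = new Person("Ann");
        Person person2 = new Person("Ann");
        return new boolean[] { person1 == person2, person1.equals(person2) };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0] + " " + r[1]); // false true
    }
}
```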

Java hashCode doubt

I have this program:
import java.util.*;
public class test {
    private String s;
    public test(String s) { this.s = s; }
    public static void main(String[] args) {
        HashSet<Object> hs = new HashSet<Object>();
        test ws1 = new test("foo");
        test ws2 = new test("foo");
        String s1 = new String("foo");
        String s2 = new String("foo");
        hs.add(ws1);
        hs.add(ws2);
        hs.add(s1);
        hs.add(s2); // removing this line also gives same output.
        System.out.println(hs.size());
    }
}
Note that this is not a homework. We were asked this question on our quiz earlier today. I know the answers but trying to understand why it is so.
The above program gives 3 as output.
Can anyone please explain why that is?
I think (not sure):
The java.lang.String class overrides the hashCode method from java.lang.Object. So the String objects with value "foo" will be treated as duplicates. The test class does not override the hashCode method and ends up using the java.lang.Object version and this version always returns a different hashcode for every object, so the two test objects being added are treated as different.
In this case it's not about hashCode() but about the equals() method. A HashSet is still a Set, which has the semantics of not allowing duplicates. Duplicates are detected using the equals() method, which in the case of String returns true.
However, your test class does not define equals(), so it uses the default implementation from Object, which returns true only when both references point to the same instance.
The hashCode() method is not used to decide whether objects should be treated as the same, but as a way to distribute them in collections based on hash functions. It's entirely possible for this method to return the same value for two objects while equals() returns false.
P.S. The hashCode implementation of Object doesn't guarantee uniqueness of values. It's easy to check using a simple loop.
The hash code is used to narrow down the search. When we try to insert a key into a HashMap, it first checks whether an entry with the same hash code is already present, and if so it then calls equals(). If the two keys are equal, the HashMap does not add the new key; instead it replaces the old value with the new one.
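The replace-on-equal-key behaviour can be observed through the return value of put(), which is the previous value for that key (or null if there was none). The class name PutReplaceDemo is made up for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

class PutReplaceDemo {
    // Returns { result of first put, result of second put, final value, map size }.
    static String[] run() {
        Map<String, String> map = new HashMap<>();
        String k1 = new String("foo");
        String k2 = new String("foo");      // different object, equal to k1
        String first = map.put(k1, "old");  // no previous mapping: null
        String second = map.put(k2, "new"); // equal key found: returns "old"
        return new String[] { first, second, map.get("foo"), String.valueOf(map.size()) };
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2] + " " + r[3]); // null old new 1
    }
}
```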
In fact, it is not about overriding hashCode(); it is about the equals method. A Set does not allow duplicates. A duplicate is an object that is logically equal to one already in the set.
For verifying you can try with
System.out.println(ws1.equals(ws2));
System.out.println(s1.equals(s2));
If the objects are equal, only one will be accepted by a set.
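To see the quiz result change, here is a sketch of the same program with a wrapper class that overrides both equals() and hashCode(). The class is renamed to Word for this illustration; everything else follows the question's code.

```java
import java.util.HashSet;
import java.util.Set;

class QuizFixDemo {
    // Same idea as the question's test class, but with value-based equality.
    static final class Word {
        private final String s;

        Word(String s) {
            this.s = s;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Word && s.equals(((Word) o).s);
        }

        @Override
        public int hashCode() {
            return s.hashCode();
        }
    }

    static int sizeWithOverrides() {
        Set<Object> hs = new HashSet<>();
        hs.add(new Word("foo"));
        hs.add(new Word("foo"));   // equal to the first Word: rejected
        hs.add(new String("foo"));
        hs.add(new String("foo")); // equal to the first String: rejected
        return hs.size();
    }

    public static void main(String[] args) {
        // One Word and one String remain, so this prints 2 instead of 3.
        System.out.println(sizeWithOverrides());
    }
}
```

The Word and the String share a hash code here, yet they stay separate elements because neither equals() accepts the other type.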
Below are a few (well, quite many) bullets regarding equals and hashCode from my preparation for the SCJP.
Hope it helps:
equals(), hashCode(), and toString() are public.
Override toString() so that System.out.println() or other methods can see something useful, like your object's state.
Use == to determine if two reference variables refer to the same object.
Use equals() to determine if two objects are meaningfully equivalent.
If you don't override equals(), your objects won't be useful hashing keys.
If you don't override equals(), different objects can't be considered equal.
Strings and wrappers override equals() and make good hashing keys.
When overriding equals(), use the instanceof operator to be sure you're evaluating an appropriate class.
When overriding equals(), compare the objects' significant attributes.
Highlights of the equals() contract:
a. Reflexive: x.equals(x) is true.
b. Symmetric: If x.equals(y) is true, then y.equals(x) must be true.
c. Transitive: If x.equals(y) is true, and y.equals(z) is true, then x.equals(z) is true.
d. Consistent: Multiple calls to x.equals(y) will return the same result.
e. Null: If x is not null, then x.equals(null) is false.
f. If x.equals(y) is true, then x.hashCode() == y.hashCode() is true.
If you override equals(), override hashCode().
HashMap, HashSet, Hashtable, LinkedHashMap, & LinkedHashSet use hashing.
An appropriate hashCode() override sticks to the hashCode() contract.
An efficient hashCode() override distributes keys evenly across its buckets.
An overridden equals() must be at least as precise as its hashCode() mate.
To reiterate: if two objects are equal, their hashcodes must be equal.
It's legal for a hashCode() method to return the same value for all instances (although in practice it's very inefficient).
In addition if you implement equals and hashcode the transient fields (if any) must be treated properly.
Apache Commons has nice implementations in EqualsBuilder and HashCodeBuilder. They are available in Commons Lang
http://commons.apache.org/lang/
I use them whenever I need to implement equals and hashCode.

Example of ==, equals and hashcode in java

Given this:
String s1= new String("abc");
String s2= new String("abc");
String s3 ="abc";
System.out.println(s1==s3);
System.out.println(s1==s2);
System.out.println(s1.equals(s2));
System.out.println(s1.equals(s3));
System.out.println(s1.hashCode());
System.out.println(s2.hashCode());
System.out.println(s3.hashCode());
Output is:
false
false
true
true
96354
96354
96354
Here == is giving false for each object but the hashcode for each String object is same. Why is it so?
== compares the identity of objects (that is, whether both references point to the same object), not their content, whereas .equals compares content (at least for String).
String a = new String("aa");
String b = new String("aa");
a and b are pointing to different objects.
Notice also that if objects are equal then their hash codes must be the same, but if hash codes are the same, it doesn't mean that the objects are equal.
The hashCode contract says that if o1.equals(o2), then o1.hashCode() == o2.hashCode(). It doesn't specify anything about the hash codes of unequal objects. You could have a method like
public int hashCode()
{
return 42;
}
and it'd fulfill the contract. It's just expected that the hash code be related to the value of the object, in order to make hash tables work more efficiently.
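To back up the point that a constant hash code is legal, the sketch below puts objects with hashCode() fixed at 42 into a HashSet. The Name class is hypothetical. The set still works correctly because equals() resolves every collision; it just loses the performance benefit of hashing.

```java
import java.util.HashSet;
import java.util.Set;

class ConstantHashDemo {
    // Hypothetical class whose hash code is the same for every instance.
    static final class Name {
        final String value;

        Name(String value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Name && value.equals(((Name) o).value);
        }

        @Override
        public int hashCode() {
            return 42; // legal, but every element ends up in the same bucket
        }
    }

    static int distinctCount() {
        Set<Name> set = new HashSet<>();
        set.add(new Name("a"));
        set.add(new Name("b"));
        set.add(new Name("c"));
        set.add(new Name("a")); // duplicate by equals(): rejected
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // 3
    }
}
```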
Now, as for why your == doesn't work: == always compares two objects by reference. That is, o1 == o2 is true only if o1 and o2 are the exact same object. That's rarely what you want; you usually want to see if o1.equals(o2) instead.
When you use ==, you are comparing if two variables hold reference to the same Object. In other words s1 == s2 is like asking: are the s1 and s2 variables referring to the same String object? And that's not true, even when both String objects have the same "abc" value.
When you use equals(), you are comparing the value of both objects. Both objects may not be the same, but their value (in this case "abc") is the same, so it returns true.
How do you define whether an object is equal to another? That's up to you. In this case the String object already defines this for you, but for example if you define a Person object, how do you know if a person P1 is equal to P2? You do that by overriding equals() and hashCode().
== tells you whether the two variable references point at the same object in memory, nothing more. equals() and hashCode() both look at the contents of the object and each uses its own algorithm for calculation.
