java: memoizing construction through hash function

I have an X object whose constructor takes in 4 integer fields. To calculate its hash code, I simply put them in an array and use Arrays.hashCode.
Currently the constructor is private and I have a static creator method. I'd like to memoize construction so that whenever the creator method is called with 4 integer parameters that have been called before, I can return the same object as last time. [Ideally without having to create another X object to compare with.]
Originally I tried a HashSet, but that required me to create a new X to check whether the set contains an equal object, never mind the fact that I can't 'get' an element out of a HashSet.
My next idea is to use a Hashtable which maps:
the hashCode of the int array of the 4 fields --> object. I'm not sure why, but that doesn't feel right. It feels like I'm doing too much work. Isn't the point of a hashCode to be a sort of mapping to a bunch of objects which calculate to the same hashCode?
I appreciate your advice.

The purpose of a hash code is generally to narrow down the location in which to look for a particular object. Put another way, the idea is that if two objects have the same hash code, they are "very likely" to be equal.
Now, how likely "very likely" is depends essentially on the width (number of bits) and quality of the hash code. In the case of Java, with 32-bit hash codes, this "very likely" still generally means "not near enough to 100% that you can do away with an actual comparison of the object data". So as well as implementing hashCode(), you need to implement equals() on an object that is used as the key in a Java Map (HashMap etc.).
Or put another way: your implementation is essentially correct, even though it looks like you're doing a lot of work. The upshot is that if what you are looking for is a performance improvement, you may as well just create a new object each time. But if functionally you require that there never exists more than one object with a given set of values, then your implementation is essentially correct.
Things you could do in principle:
if you had a large number of ints, then for the hashCode(), just form the hash code from a 'sample' of a couple of them -- the idea is to 'narrow down the choices' or make it 'fairly but not 100% likely' that equal hash code will mean equal object-- your equals() has to go through and check them anyway, so there's little point in cycling through all values in both hashCode() and equals();
potentially, you can use a stronger hash code, so that you literally assume that equal hash codes mean equal objects. In effect, you cycle through all of the values once in the hash code function and don't have an equals function at all. In practice this means using at least a strong-ish 64 bit hash code. It's probably not worth it for the case you mention. But if you want to understand a little about how it would work, I would point you to a tutorial I wrote on the advanced use of hash codes in Java.

If the 4 integers during construction mean the resulting object will be exactly the same, then use those as the key, not their hash. Notice I'm not using your full Object as the key, just the 4 integer values. The MyObjectSpecification below will be a tiny object.
public class MyObjectSpecification {
private final int i1, i2, i3, i4;
public MyObjectSpecification(int i1, int i2, int i3, int i4) {
this.i1 = i1;
this.i2 = i2;
this.i3 = i3;
this.i4 = i4;
}
public boolean equals(Object o) {
// ...
}
public int hashCode() {
// ...
}
}
public class MyObject {
private static final Map<MyObjectSpecification, MyObject> myObjects
= new ConcurrentHashMap<MyObjectSpecification, MyObject>();
private MyObject(MyObjectSpecification spec) {
// ...
}
public static MyObject getMyObject(int i1, int i2, int i3, int i4) {
MyObjectSpecification spec = new MyObjectSpecification(i1, i2, i3, i4);
// computeIfAbsent checks and inserts atomically, so two threads
// cannot end up creating two objects for the same specification
return myObjects.computeIfAbsent(spec, MyObject::new);
}
}
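The equals() and hashCode() bodies are elided above. As a minimal sketch of how the whole thing could fit together, assuming a hypothetical class name MemoizedX and a hypothetical nested Key class (neither is from the question), the following keeps the constructor private and returns the same instance for the same four integers:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class MemoizedX {
    // Hypothetical key type: holds only the four ints and defines value equality
    private static final class Key {
        final int a, b, c, d;

        Key(int a, int b, int c, int d) {
            this.a = a; this.b = b; this.c = c; this.d = d;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return a == k.a && b == k.b && c == k.c && d == k.d;
        }

        @Override
        public int hashCode() {
            int h = a;
            h = 31 * h + b;
            h = 31 * h + c;
            h = 31 * h + d;
            return h;
        }
    }

    private static final Map<Key, MemoizedX> CACHE = new ConcurrentHashMap<>();

    final int a, b, c, d;

    private MemoizedX(Key k) {
        a = k.a; b = k.b; c = k.c; d = k.d;
    }

    static MemoizedX of(int a, int b, int c, int d) {
        // The map compares keys with equals(), so hash collisions are harmless
        return CACHE.computeIfAbsent(new Key(a, b, c, d), MemoizedX::new);
    }
}
```

Each call still allocates one small Key, but never a second full object for values that were seen before.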

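Not sure how you plan to use the Hashtable, but I think the code below would do the job. Be aware that it keys the table by the hash code alone, so two different sets of integers whose hash codes happen to collide would be handed the same object: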
private static Hashtable<Integer, MyObject> objectInstances =
new Hashtable<Integer, MyObject>();
public static MyObject instance(int i1, int i2, int i3, int i4){
int hashKey = Arrays.hashCode(new int[]{i1, i2,i3,i4});
//get the object from hashtable
MyObject myObject = objectInstances.get(hashKey);
//if object was not already created, create now and put in the hashtable
if(myObject == null){
myObject = new MyObject(i1,i2,i3,i4);
objectInstances.put(hashKey, myObject);
}
return myObject;
}
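To see why keying by the hash code alone can go wrong, here is a small sketch (the class and method names are made up for illustration). Arrays.hashCode folds the elements with a factor of 31, so the tuples (0,0,1,0) and (0,0,0,31) produce the same int, and a table keyed by that int cannot tell them apart:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class HashKeyCollisionDemo {
    // Same key computation as in the snippet above
    static int key(int i1, int i2, int i3, int i4) {
        return Arrays.hashCode(new int[] {i1, i2, i3, i4});
    }

    // Stores a value for the first tuple, then looks up the second, different tuple
    static String lookupSecondAfterStoringFirst() {
        Map<Integer, String> cache = new HashMap<>();
        cache.put(key(0, 0, 1, 0), "object for (0,0,1,0)");
        return cache.get(key(0, 0, 0, 31));
    }
}
```

The lookup for (0,0,0,31) returns the entry stored for (0,0,1,0), which is why the key should carry the four values themselves and rely on equals().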

Related

Creating or tracking classes with the same variable values

Is there any way to create a class that gives identical references to class objects with the same valued variables, or otherwise any way to induce behaviour where I can easily track objects with the same variables? To elaborate, I am trying to implement the following code:
A Position class:
public final class Position {
private final int x, y, layer;
...
with relevant functions, and I am trying to track Entities with a
private Map<Position, List<Entity>> positionMap;
in another class. The problem is that to my understanding, once I stick a Position object as a key, I cannot create another Position object at the same "coordinates" and access the value held at the previous Position. Is there any easy workaround or else some kind of commonly accepted solution to this kind of problem? At the moment, my solution is to loop through each Position held in the keys and compare with .equals(), which defeats the point of a Map.

I think you misunderstand.
Here, trivially:
public final class Position {
private final int x, y, layer;
// very important!
public int hashCode() {
// proper impl that mixes all 3 variables into it
}
public boolean equals(Object other) {
// proper impl
}
}
Then:
Map<Position, List<Entity>> map = new HashMap<>();
Position p1 = new Position(1, 2, 5);
Position p2 = new Position(1, 2, 5);
System.out.println(p1 == p2); // false, as expected.
map.put(p1, List.of());
System.out.println(map.get(p2)); // prints an empty list, NOT null!
map.put(p2, List.of());
System.out.println(map.size()); // 1, not 2!
You don't have to loop through the keys and compare them with equals yourself; the map implementation already does the comparing for you. The reason HashMap is much faster than a literal scan is that it also uses hashCode(): the rules of the hashCode method state that any 2 objects whose hash codes differ cannot be equal (note that the reverse is not true; 2 objects with identical hash codes need not be equal). Hence, HashMap first calls hashCode to reduce the search space to a tiny bucket compared to the size of the whole map, and only needs to scan the keys in that bucket. Usually a bucket contains just 1 object, so that's great.
hashCode is not a key - This works:
public final class Position {
private final int x, y, layer;
// very important!
public int hashCode() {
return 1; // this is a really bad impl
}
public boolean equals(Object other) {
// proper impl
}
}
You can put the above Position objects in a HashMap if you must, and it'll work correctly. It won't be very fast: because all Position objects have the same hashCode, it's 100% collisions, so the HashMap needs to loop through all keys for all operations. But it'll work. In the same vein, given only the hashCode of a Position, you cannot do a lookup. HashMap simply doesn't allow that, and it wouldn't know what to return even if it did, because more than one key can have that hashCode. (You should avoid collisions because they hurt performance, but a handful of them is no problem.)
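For completeness, a minimal sketch of what the "proper impl" placeholders might look like. The PositionIndex class is hypothetical, and String stands in for the Entity type, which the question does not show:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class Position {
    private final int x, y, layer;

    Position(int x, int y, int layer) {
        this.x = x; this.y = y; this.layer = layer;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Position)) return false;
        Position p = (Position) other;
        return x == p.x && y == p.y && layer == p.layer;
    }

    @Override
    public int hashCode() {
        // Mixes all three fields, matching what equals() compares
        return Objects.hash(x, y, layer);
    }
}

final class PositionIndex {
    // String is a stand-in for Entity
    private final Map<Position, List<String>> positionMap = new HashMap<>();

    void add(Position p, String entity) {
        positionMap.computeIfAbsent(p, k -> new ArrayList<>()).add(entity);
    }

    List<String> at(Position p) {
        return positionMap.getOrDefault(p, List.of());
    }
}
```

A freshly created Position with the same coordinates finds the list stored under the earlier one, with no manual loop over the keys.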

I think you can implement the hashcode like this:
// hashcode
public int hashCode() {
String a = "" + this.x + "," + this.y + "," + this.layer;
return a.hashCode();
}

Uses of hashcode in Java apart from hashing collections [duplicate]

In Java, obj.hashCode() returns some value. What is the use of this hash code in programming?

hashCode() is used for bucketing in hash implementations like HashMap, Hashtable, HashSet, etc.
The value received from hashCode() is used to compute the bucket number for storing elements of the set/map. The bucket number identifies where the element is stored inside the set/map.
When you do contains(), it takes the hash code of the element and looks in the bucket that the hash code points to. If more than 1 element is found in the same bucket (multiple objects can have the same hash code), it uses the equals() method to evaluate whether the objects are equal, and then decides whether contains() is true or false, or whether the element can be added to the set.
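As a rough sketch of how a hash code can be turned into a bucket number, the method below spreads the high bits downward and masks the result to the table size. This is similar to what OpenJDK's HashMap does, simplified for illustration; other implementations may differ, and the class and method names are made up:

```java
final class BucketSketch {
    // tableSize is assumed to be a power of two, so (tableSize - 1) is a bit mask
    static int bucketIndex(Object key, int tableSize) {
        int h = key.hashCode();
        int spread = h ^ (h >>> 16);      // let the high bits influence the low bits
        return spread & (tableSize - 1);  // always in the range 0..tableSize-1
    }
}
```

Two keys with the same hash code, such as the strings "Aa" and "BB", always land in the same bucket, and equals() is what tells them apart there.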

From the Javadoc:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)

hashCode() is a function that takes an object and outputs a numeric value. The hashcode for an object is always the same if the object doesn't change.
Classes like HashMap, Hashtable, HashSet, etc. that need to store objects use the hash code modulo the size of their internal array to choose the "memory position" (i.e. array position) in which to store the object.
There are some cases where collisions may occur (two objects end up with the same hashcode), and that, of course, needs to be solved carefully.

The value returned by hashCode() is the object's hash code, which is the object's memory address in hexadecimal.
By definition, if two objects are equal, their hash code must also be equal. If you override the equals() method, you change the way two objects are equated and Object's implementation of hashCode() is no longer valid. Therefore, if you override the equals() method, you must also override the hashCode() method as well.
This answer is from the java SE 8 official tutorial documentation
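A minimal sketch of that rule, using a made-up Money class: hashCode() is built from the same fields that equals() compares, so a HashSet can find an equal instance.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class Money {
    private final String currency;
    private final long cents;

    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Money)) return false;
        Money m = (Money) o;
        return cents == m.cents && currency.equals(m.currency);
    }

    // Uses exactly the fields equals() compares, so equal objects hash alike
    @Override
    public int hashCode() {
        return Objects.hash(currency, cents);
    }

    // Looks up a second, equal instance in a set that holds the first
    static boolean demo() {
        Set<Money> seen = new HashSet<>();
        seen.add(new Money("EUR", 150));
        return seen.contains(new Money("EUR", 150));
    }
}
```

If the hashCode() override were removed, the two equal Money instances would most likely get different identity-based hash codes and the lookup would usually fail.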

A hashcode is a number generated from any object.
This is what allows objects to be stored/retrieved quickly in a Hashtable.
Imagine the following simple example:
On the table in front of you, you have nine boxes, each marked with a number 1 to 9. You also have a pile of wildly different objects to store in these boxes, but once they are in there you need to be able to find them as quickly as possible.
What you need is a way of instantly deciding which box you have put each object in. It works like an index. You decide to find the cabbage, so you look up which box the cabbage is in, then go straight to that box to get it.
Now imagine that you don't want to bother with the index, you want to be able to find out immediately from the object which box it lives in.
In the example, let's use a really simple way of doing this - the number of letters in the name of the object. So the cabbage goes in box 7, the pea goes in box 3, the rocket in box 6, the banjo in box 5 and so on.
What about the rhinoceros, though? It has 10 characters, so we'll change our algorithm a little and "wrap around" so that 10-letter objects go in box 1, 11 letters in box 2 and so on. That should cover any object.
Sometimes a box will have more than one object in it, but if you are looking for a rocket, it's still much quicker to compare a peanut and a rocket, than to check a whole pile of cabbages, peas, banjos, and rhinoceroses.
That's a hash code: a way of getting a number from an object so it can be stored in a Hashtable. In Java, a hash code can be any integer, and each object type is responsible for generating its own. Look up the "hashCode" method of Object.
Source - here
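The box analogy translates almost directly into code. This is only a sketch of the analogy with made-up names; it assumes non-empty names and is not how a real hash table computes its buckets:

```java
final class Boxes {
    static final int BOX_COUNT = 9;

    // Boxes are numbered 1..9; names longer than 9 letters wrap around to box 1
    static int boxFor(String name) {
        return (name.length() - 1) % BOX_COUNT + 1;
    }
}
```

With this rule, "rocket" and "peanut" share box 6, so finding the rocket means comparing it only against the peanut rather than against everything on the table.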

Although the hash code has nothing to do with your business logic, you have to take care of it in most cases, because when your object is put into a hash-based container (HashSet, HashMap...), the container uses the element's hash code to store and retrieve it.

hashCode() returns an integer code for an object. Unless a class overrides it, the JVM generates the value for each object.
We use hashCode() in hashing-based data structures such as Hashtable and HashMap.
The advantage of hashCode() is that it makes searching easy: when we search for an object, its hash code narrows down where to look.
But we can't say hashCode() is the address of an object, and the code is not guaranteed to be unique; distinct objects may share a hash code.
That is why hashing is one of the most popular search techniques today.

One of the uses of hashCode() is building a caching mechanism.
Look at this example:
class Point
{
public int x, y;
public Point(int x, int y)
{
this.x = x;
this.y = y;
}
@Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Point point = (Point) o;
if (x != point.x) return false;
return y == point.y;
}
@Override
public int hashCode()
{
int result = x;
result = 31 * result + y;
return result;
}
}
class Line
{
public Point start, end;
public Line(Point start, Point end)
{
this.start = start;
this.end = end;
}
@Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Line line = (Line) o;
if (!start.equals(line.start)) return false;
return end.equals(line.end);
}
@Override
public int hashCode()
{
int result = start.hashCode();
result = 31 * result + end.hashCode();
return result;
}
}
class LineToPointAdapter implements Iterable<Point>
{
private static int count = 0;
private static Map<Integer, List<Point>> cache = new HashMap<>();
private int hash;
public LineToPointAdapter(Line line)
{
hash = line.hashCode();
if (cache.get(hash) != null) return; // we already have it
System.out.println(
String.format("%d: Generating points for line [%d,%d]-[%d,%d] (no caching)",
++count, line.start.x, line.start.y, line.end.x, line.end.y));
}

How to efficiently store a set of tuples/pairs in Java

I need to perform a check if the combination of a long value and an integer value were already seen before in a very performance-critical part of an application. Both values can become quite large, at least the long will use more than MAX_INT values in some cases.
Currently I have a very simple implementation using a Set<Pair<Integer, Long>>, however this will require too many allocations, because even when the object is already in the set, something like seen.add(Pair.of(i, l)) to add/check existence would allocate the Pair for each call.
Is there a better way in Java (without libraries like Guava, Trove or Apache Commons), to do this check with minimal allocations and in good O(?)?
Two ints would be easy because I could combine them into one long in the Set, but the long cannot be avoided here.
Any suggestions?

Here are two possibilities.
One thing in both of the following suggestions is to store a bunch of pairs together as triple ints in an int[]. The first int would be the int and the next two ints would be the upper and lower half of the long.
If you didn't mind a 33% extra space disadvantage in exchange for an addressing speed advantage, you could use a long[] instead and store the int and long in separate indexes.
You'd never call an equals method. You'd just compare the three ints with three other ints, which would be very fast. You'd never call a compareTo method. You'd just do a custom lexicographic comparison of the three ints, which would be very fast.
B* tree
If memory usage is the ultimate concern, you can make a B* tree using an int[][] or an ArrayList<int[]>. B* trees are relatively quick and fairly compact.
There are also other types of B-trees that might be more appropriate to your particular use case.
Custom hash set
You can also implement a custom hash set with a custom, fast-calculated hash function (perhaps XOR the int and the upper and lower halves of the long together, which will be very fast) rather than relying on the hashCode method.
You'd have to figure out how to implement the int[] buckets to best suit the performance of your application. For example, how do you want to convert your custom hash code into a bucket number? Do you want to rebucket everything when the buckets start getting too many elements? And so on.
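As one possible shape for such a custom hash set, here is a minimal sketch using open addressing with linear probing over parallel primitive arrays. The class name, the mixing function, and the 50% load factor are assumptions for illustration; removal is not supported and the capacity is not guarded against overflow:

```java
final class IntLongSet {
    private int[] ints = new int[16];
    private long[] longs = new long[16];
    private boolean[] used = new boolean[16];
    private int size;

    // Returns true if the pair was not present before (like Set.add)
    boolean add(int i, long l) {
        if ((size + 1) * 2 > used.length) grow(); // keep the table at most half full
        int slot = findSlot(i, l);
        if (used[slot]) return false;
        ints[slot] = i;
        longs[slot] = l;
        used[slot] = true;
        size++;
        return true;
    }

    boolean contains(int i, long l) {
        return used[findSlot(i, l)];
    }

    int size() {
        return size;
    }

    // Linear probing: returns the slot holding the pair, or the first free slot
    private int findSlot(int i, long l) {
        int mask = used.length - 1; // table length is always a power of two
        int slot = mix(i, l) & mask;
        while (used[slot] && (ints[slot] != i || longs[slot] != l)) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    private static int mix(int i, long l) {
        int h = 31 * i + (int) (l ^ (l >>> 32));
        return h ^ (h >>> 16);
    }

    // Doubles the table and reinserts every stored pair
    private void grow() {
        int[] oldInts = ints;
        long[] oldLongs = longs;
        boolean[] oldUsed = used;
        int cap = oldUsed.length * 2;
        ints = new int[cap];
        longs = new long[cap];
        used = new boolean[cap];
        size = 0;
        for (int s = 0; s < oldUsed.length; s++) {
            if (oldUsed[s]) add(oldInts[s], oldLongs[s]);
        }
    }
}
```

Apart from the occasional resize, add and contains allocate nothing, and no equals() or hashCode() method is ever called.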

How about creating a class that holds two primitives instead? You would save at least 24 bytes just for the headers of Integer and Long in a 64-bit JVM.
Under these conditions you are looking for a pairing function, that is, a way to generate a unique number from 2 numbers. The Wikipedia page on pairing functions has a very good (and simple) example of one such possibility.
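For the two-int case mentioned in the question, the simplest pairing is plain bit packing, sketched below with made-up names. Note the limitation: an int plus a long carries 96 bits of information, so it cannot be packed into a single long without losing uniqueness.

```java
final class IntPair {
    // Upper 32 bits hold hi, lower 32 bits hold lo
    static long pack(int hi, int lo) {
        return ((long) hi << 32) | (lo & 0xFFFFFFFFL);
    }

    static int high(long packed) {
        return (int) (packed >>> 32);
    }

    static int low(long packed) {
        return (int) packed;
    }
}
```

The mask on lo matters: without it, a negative lo would sign-extend and overwrite the upper half.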

How about
class Pair {
int v1;
long v2;
@Override
public boolean equals(Object o) {
if (!(o instanceof Pair)) return false; // also covers null
Pair other = (Pair) o;
return v1 == other.v1 && v2 == other.v2;
}
@Override
public int hashCode() {
return 31 * (31 + Integer.hashCode(v1)) + Long.hashCode(v2);
}
}
class Store {
// initial capacity should be tweaked
private static final Set<Pair> store = new HashSet<>(100*1024);
// withInitial creates the per-thread Pair lazily, so no explicit init() call is needed
private static final ThreadLocal<Pair> threadPairUsedForContains = ThreadLocal.withInitial(Pair::new);
boolean contains(int v1, long v2) { // zero allocation contains()
Pair pair = threadPairUsedForContains.get();
pair.v1 = v1;
pair.v2 = v2;
return store.contains(pair);
}
void add(int v1, long v2) {
Pair pair = new Pair();
pair.v1 = v1;
pair.v2 = v2;
store.add(pair);
}
}

In JVM heap can there be more than one object with the same hash code?

As per the title, can there be more than one object on the heap with the same hash code?

Yes.
public class MyObject {
@Override
public int hashCode() {
return 42;
}
public static void main(String[] args) {
MyObject obj1 = new MyObject();
MyObject obj2 = new MyObject(); // Ta-da!
}
}
For a less flippant answer, consider the hashCode Javadocs:
The general contract of hashCode is:
... (snip) ...
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

Yes, since you can have as many objects with the same hashCode as you want. For example, the following code, which avoids String interning, shows this:
String foo = new String("dfa");
String bar = new String("dfa");
assert foo != bar; // passes: two distinct objects (references)
assert foo.hashCode() == bar.hashCode(); // passes: equal strings have equal hash codes

Trivial proof:
hashCode returns a 32 bit integer.
Allocate 2^32+1 Objects. (Probably going to need a 64-bit VM and a lot of RAM for this! ;-) )
Now your hashCode method, no matter how clever, must have a collision.

About hash codes: yes, they are nearly, but not really, unique. :) How nearly unique they are depends on the implementation.
But if we talk about the JVM, we must first of all say which kind of hash code is meant.
If you mean the result of the hashCode() method, which is used by HashMap for example, then the answer is: it depends on your implementation and the number of objects in your JVM.
How collisions are handled in a self-implemented hashCode() method is up to you.
If you mean the result of System.identityHashCode( obj ), then it's a little bit different. This method doesn't call your hashCode() method. Its result isn't unique either, but it's nearly unique, like the results of many other hash functions. :)
public class MyObject {
@Override
public int hashCode() {
return 42;
}
public static void main(String[] args) {
MyObject obj1 = new MyObject();
MyObject obj2 = new MyObject(); // Ta-da!
final int obj1Hash = System.identityHashCode( obj1 );
final int obj2Hash = System.identityHashCode( obj2 );
if( obj1Hash == obj2Hash ) throw new IllegalStateException();
}
}
In this example you will most likely get different hashes. In most cases they are different, but they are not guaranteed to be unique...
Best regards!

Of course, and obviously you can write:
class MyClass{
...
public int hashCode(){
return 1;
}
...
}
in which case all instances of MyClass will have the same hash code.

Yes, hashcode is a standard algorithm that tries to avoid duplicates ('collisions') but doesn't guarantee it.
Moreover, it's overrideable, so you could write your own implementation yielding the same hashcode for every object; as to why you would want to do that I have no answer however. :-)

You can, but it's not generally a good idea. The example mentioned several times above:
public int hashCode(){
return 1;
}
is perfectly valid under the specification for hashCode(). However, doing this turns HashMap into a Linked List, which significantly degrades performance. So you generally want to implement hashCode to return values as unique as you can get it.
As a practical matter, though, collisions can occur with many implementations. Take this for example:
public class OrderedPair{
private int x;
private int y;
public int hashCode(){
int prime = 31;
int result=x;
result =result*prime+y;
return result;
}
public boolean equals(Object o){...}
}
This is a pretty standard implementation of hashCode() (in fact, this is pretty close to the output which is automatically generated in IDEA and Eclipse), but it can have many collisions: x=1,y=0 and x=0,y=31 both hash to 31, for starters. The idea of a well-written hashCode() implementation is to have few enough collisions that your performance is not unduly affected.
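A quick way to check such collisions is to evaluate the formula directly. The helper below is a hypothetical stand-alone copy of the x*31+y computation used by OrderedPair.hashCode():

```java
final class CollisionExamples {
    // Same arithmetic as OrderedPair.hashCode(): result = x * prime + y with prime = 31
    static int pairHash(int x, int y) {
        return x * 31 + y;
    }
}
```

Here pairHash(1, 0) and pairHash(0, 31) are both 31. The same effect shows up in the JDK itself: the strings "Aa" and "BB" both have hash code 2112, because String.hashCode() uses the same multiply-by-31 scheme.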

Yes you certainly can have more than one object with the same hashcode. However, usually this doesn't cause problems because the java.util.* data structures that use the object hashcode use it as a key into a "bucket" that stores all objects returning the same hash.

Object a = new Integer(1);
Object b = new Integer(1);
System.out.printf(" Is the same object? = %s\n",(a == b ));
System.out.printf(" Have the same hashCode? = %s\n",( a.hashCode() == b.hashCode() ));
Prints:
Is the same object? = false
Have the same hashCode? = true

In a 32-bit environment, I doubt any JVM would return the same 'identity hash code' for different objects, but in 64-bit that is certainly a possibility; the likelihood of a collision is still very small given the limited memory we have now.
