JPA: not overriding equals() and hashCode() in the entities?

After reading this article, I'm leaning toward not overriding equals() and hashCode() at all.
In the summary of that article, concerning the "no eq/hC at all" column, the only consequence is that I couldn't do comparison operations like:
contains() on a List of detached entities, or
comparing the same entities from different sessions
and expect the correct result.
But I'm still in doubt and would like to hear about your experiences: is it bad practice to skip equals and hashCode altogether, and what other consequences am I not yet aware of?
One more point of information: I'm leaning towards using List collections over Set, and my assumption is that I don't really need to override hashCode and equals when storing entities in a List.

Read this very nice article on the subject: Don't Let Hibernate Steal Your Identity.
The conclusion of the article goes like this:
Object identity is deceptively hard to implement correctly when
objects are persisted to a database. However, the problems stem
entirely from allowing objects to exist without an id before they are
saved. We can solve these problems by taking the responsibility of
assigning object IDs away from object-relational mapping frameworks
such as Hibernate. Instead, object IDs can be assigned as soon as the
object is instantiated. This makes object identity simple and
error-free, and reduces the amount of code needed in the domain model.
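
A minimal sketch of that recipe, assuming a UUID is assigned in the constructor (the class name BaseEntity and the String-typed id are made-up choices for illustration):

import java.util.UUID;

public abstract class BaseEntity {
    // Assigned as soon as the object is instantiated, not by the ORM on save.
    // Not final so that a persistence framework can overwrite it when loading.
    private String id = UUID.randomUUID().toString();

    public String getId() {
        return id;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof BaseEntity)) return false;
        return id.equals(((BaseEntity) obj).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}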

whether it is a bad practice to skip equals and hashCode altogether
Yes. You should always override your equals and hashCode. Period. The reason is that these methods are already present in your class, inherited from Object. It turns out that this implementation is generic, and nearly 100% of the time it's the wrong implementation for your own objects. So by skipping equals/hashCode you are in fact providing a wrong implementation and will (in the best case scenario) confuse whoever uses these classes. That may be your colleagues, or it may be some framework you are using (which can lead to unpredictable and hard-to-debug issues).
There's no reason not to implement these methods. Most IDEs provide a generator for equals/hashCode; you just need to tell the IDE which fields form your business key.
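As a hypothetical illustration of how the inherited implementation confuses collection code (Person is a made-up class that does not override equals/hashCode):

import java.util.ArrayList;
import java.util.List;

class Person {
    private final String ssn; // the business key
    Person(String ssn) { this.ssn = ssn; }
    // equals/hashCode deliberately not overridden
}

public class DefaultEqualsDemo {
    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("123-45-6789"));
        // Prints false: Object.equals compares references, so a logically
        // identical Person is not "contained" in the list.
        System.out.println(people.contains(new Person("123-45-6789")));
    }
}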

You drew the exact opposite conclusion from what that article was trying to convey.
Hibernate relies heavily on equals being implemented properly. It will malfunction if it isn't.
In fact, almost everything does, including the standard Java collections.
The default implementation does not work when using persistence. You should always implement both equals and hashCode. There's a simple rule on how to do it, too:
For entities, use the key of the object.
For value objects, use the values.
Always make sure the values you use in your equals/hashcode are immutable. If you pass these out (like in a getter), preferably pass them out in an immutable form.
This advice will improve your life :)
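As an illustration of the value-object half of that rule, a minimal sketch (Money is a made-up example; both fields are immutable and equality is based on the values):

import java.math.BigDecimal;
import java.util.Objects;

public final class Money {
    private final BigDecimal amount;
    private final String currency;

    public Money(BigDecimal amount, String currency) {
        this.amount = amount;
        this.currency = currency;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Money)) return false;
        Money other = (Money) o;
        // Note: BigDecimal.equals is scale-sensitive, so 1.0 and 1.00 differ here.
        return amount.equals(other.amount) && currency.equals(other.currency);
    }

    @Override
    public int hashCode() {
        return Objects.hash(amount, currency);
    }
}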

Related

Check whether a Java Object has been modified

I would like to use a clean/automatic way to check if a Java Object has been modified.
My specific problem is the following:
In my Java application, I use the XStream library to deserialize XML to Java objects, and then the user can modify or change them. I'd like a way to check whether these objects in memory differ at some point from the serialized ones, so I can inform the user and ask whether they want to save the changes (i.e. serialize using XStream) or not.
In my application there are many objects, and they are quite complex.
Please consider that I don't use databases in my application, so I'm not interested in solutions like using Hibernate.
Two approaches:
Implement a hashCode for your objects, and compare the hash code of the in-memory objects against the hash code of the serialized objects to see whether they've been changed; a rough sketch follows after this list. This has a low impact on your class design, but the comparison cost grows with the number of objects. Note that two objects might return the same hash code, but a good hashing implementation makes this very unlikely. If you are concerned about this, implement and use your own equals() method as well.
Have your objects implement the Observer pattern and have each setter method, or any other method that modifies the object, notify the observer when it's called. Performance will be better for large numbers of objects (as long as they aren't changing constantly), but it requires you to introduce Observer code into possibly lightweight classes. Java provides the java.util.Observable/Observer utilities, but you'll still need to do most of the work.
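A rough sketch of the first approach, assuming XStream's toXML is used to produce the serialized form to hash (DirtyChecker and its methods are made up for illustration):

import com.thoughtworks.xstream.XStream;

public class DirtyChecker {
    private final XStream xstream = new XStream();
    private int snapshotHash;

    // Call right after deserializing, to remember the "clean" state.
    public void remember(Object obj) {
        snapshotHash = xstream.toXML(obj).hashCode();
    }

    // True if the in-memory object no longer hashes like the snapshot.
    // A hash collision could mask a change, but that is very unlikely.
    public boolean isModified(Object obj) {
        return xstream.toXML(obj).hashCode() != snapshotHash;
    }
}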
You can store a version field in the object: whenever the object changes, it should increment its version field. You can then compare this version field with the one in the serialized object.
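A minimal sketch of this idea (all names are hypothetical):

public class Document {
    private String title;
    private long version; // incremented on every change

    public void setTitle(String title) {
        this.title = title;
        version++; // every mutating method must bump the version
    }

    public long getVersion() {
        return version;
    }
}

Comparing getVersion() of the in-memory object with that of the serialized copy then tells you whether it was modified.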

Using UUID as business key in equals/hashCode methods

I am working on a new application and I need some help on how to implement the equals method and the hashCode method. I have been reading many questions already asked here on SO, and I have also read several blog posts that have led me to this question.
A little technical information first: I am using JPA (EclipseLink) and the application is for Java EE.
From what I have read, you should use immutable values for hashCode and equals, but since the fields in the class are usually modifiable, you can't use them. Nor can you use the primary key (JPA), because you won't have one before the entity has been persisted. So what I am thinking about is using a UUID, both for equals and hashCode, but I have never done that before, so I wonder whether somebody thinks this is bad (and why?) and what the possible downsides are (apart from the tiny, tiny, tiny chance of getting the same ID). Using a UUID and assigning it in the constructor will give all objects a business ID from the very start. And I will make it immutable and save it to the database.
Is this approach bad?
IMO the UUID will work just fine, and I would recommend doing so.
I can't find any drawbacks to this approach, since the possibility of hitting two identical values is vanishingly small.
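A sketch of what the question describes, assuming standard JPA annotations (the Customer entity is made up):

import java.util.UUID;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Customer {
    @Id
    @GeneratedValue
    private Long id; // the JPA primary key, not used for equality

    // The business key: generated at instantiation; JPA overwrites it with
    // the stored value when loading, and updatable = false keeps it immutable.
    @Column(nullable = false, unique = true, updatable = false)
    private String uuid = UUID.randomUUID().toString();

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Customer)) return false;
        return uuid.equals(((Customer) o).uuid);
    }

    @Override
    public int hashCode() {
        return uuid.hashCode();
    }
}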

Issues with using objects as Map keys with Java

Given an object we will call Loc that simply holds two int member values, I believe I need to come up with a mechanism to generate a hash code for the object. What I tried below doesn't work, as it uses object references, and two references will be different despite having the same member variables.
Map<Loc,String> mapTest = new HashMap<Loc,String>();
mapTest.put(new Loc(1,2), "String 1");
mapTest.put(new Loc(0,1), "String 2");
mapTest.put(new Loc(2,2), "String 3");
System.out.println("Should be String 2 " + mapTest.get(new Loc(0,1)));
After some reading, it seems I need to roll my own hashCode for this object and use that hash code as the key. Just wanted to confirm that I am on the right track here; if someone could guide me to simple implementations to look at, that would be excellent.
Thanks
Yes, you need to override equals() and hashCode(), and they need to behave consistently (that is, equal objects must have the same hash code). No, you do not use the hash code directly; the Map uses it.
Yes, you're on the right track.
See articles like this for more details.
There are a lot of different ways to implement a hashcode, you'll probably just want to combine the hashcodes of each integer primitive.
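For example, a Loc along these lines makes the map lookup above behave as expected (a sketch, assuming the two members are named x and y):

public class Loc {
    private final int x;
    private final int y;

    public Loc(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Loc)) return false;
        Loc other = (Loc) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        // Combines the two primitives; equal Locs get equal hash codes.
        return 31 * x + y;
    }
}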
Writing correct equals and hashCode methods can be tricky, and the consequences of getting it wrong can be subtle and annoying. If you are able to, I would use the Apache commons-lang library and take advantage of the HashCodeBuilder and EqualsBuilder classes. They will make it much easier to get the implementations right. The benefit of using these classes is that it is much harder to get the boilerplate wrong; they hide the visual noise these methods tend to create, and they make it harder for someone to come along later and mess it up. Of course, another alternative is to let your IDE generate those methods for you, which works but just creates more of the noisy code vomit Java is known for.
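A sketch of the same Loc class written with those builders (the imports below are from commons-lang3; in the older commons-lang the package is org.apache.commons.lang.builder):

import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;

public class Loc {
    private final int x;
    private final int y;

    public Loc(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Loc)) return false;
        Loc other = (Loc) o;
        return new EqualsBuilder()
                .append(x, other.x)
                .append(y, other.y)
                .isEquals();
    }

    @Override
    public int hashCode() {
        // The two seed numbers just need to be non-zero and odd.
        return new HashCodeBuilder(17, 37)
                .append(x)
                .append(y)
                .toHashCode();
    }
}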
If you want to use your type as a key type in a map, it's essential that it provides sane implementations of equals and hashCode. Fortunately, you don't have to write these implementations manually. Eclipse (and I guess other IDEs as well) can generate this boilerplate for you. Or you can even use Project Lombok for that.
Ideally the object to be used as a key in a map should be immutable. This can save you from many bugs led to by the equality issues in the context of mutation.
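With Project Lombok the same Loc collapses to an annotation; a sketch (Lombok generates equals and hashCode from the instance fields at compile time):

import lombok.EqualsAndHashCode;

@EqualsAndHashCode
public class Loc {
    private final int x;
    private final int y;

    public Loc(int x, int y) {
        this.x = x;
        this.y = y;
    }
}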
You need to implement both hashCode() and equals(). Joshua Bloch's Effective Java should be the definitive source on the "how" part of your question, and I'm not sure if it's okay to reproduce it here, so I'll just refer you to it.

What is the reason behind Enum.hashCode()?

The method hashCode() in class Enum is final and defined as super.hashCode(), which means it returns a number based on the address of the instance, which is a random number from the programmer's point of view.
Defining it e.g. as ordinal() ^ getClass().getName().hashCode() would be deterministic across different JVMs. It would even work a bit better, since the least significant bits would "change as much as possible"; e.g., for an enum containing up to 16 elements and a HashMap of size 16, there'd be no collisions for sure (sure, using an EnumMap is better, but sometimes it's not possible, e.g. there's no ConcurrentEnumMap). With the current definition you have no such guarantee, have you?
Summary of the answers
Using Object.hashCode() compares to a nicer hashCode like the one above as follows:
PROS
simplicity
CONS
speed
more collisions (for any size of a HashMap)
non-determinism, which propagates to other objects, making them unusable for:
    deterministic simulations
    ETag computation
    hunting down bugs that depend e.g. on a HashSet iteration order
I'd personally prefer the nicer hashCode, but IMHO no reason weighs much, except maybe speed.
UPDATE
I was curious about the speed and wrote a benchmark with surprising results. For the price of a single field per class you can get a deterministic hash code which is nearly four times faster. Storing the hash code in a field of each instance would be even faster, although only negligibly.
The explanation for why the standard hash code is not much faster is that it can't be the object's address, as objects get moved by the GC.
UPDATE 2
There are some strange things going on with hashCode performance in general. Even once I understand them, there's still the open question of why System.identityHashCode (reading from the object header) is way slower than accessing a normal object field.
The only reason for using Object's hashCode() and for making it final I can imagine, is to make me ask this question.
First of all, you should not rely on such mechanisms for sharing objects between JVMs. That's simply not a supported use case. When you serialize / deserialize you should rely on your own comparison mechanisms or only "compare" the results against objects within your own JVM.
The reason for letting an enum's hashCode be implemented as Object's hash code (based on identity) is that, within one JVM, there will be only one instance of each enum constant. This is enough to ensure that such an implementation makes sense and is correct.
You could argue: "Hey, String and the wrappers for the primitives (Long, Integer, ...) all have well-defined, deterministic specifications of hashCode! Why don't the enums have one?" Well, to begin with, you can have several distinct String references representing the same string, which means that using super.hashCode would be an error, so these classes necessarily need their own hashCode implementations. For these core classes it made sense to give them well-defined, deterministic hashCodes.
Why did they choose to solve it like this?
Well, look at the requirements of the hashCode implementation. The main concern is to make sure that each object should return a distinct hash code (unless it is equal to another object). The identity-based approach is super efficient and guarantees this, while your suggestion does not. This requirement is apparently stronger than any "convenience bonus" about easing up on serialization etc.
I think that the reason they made it final is to avoid developers shooting themselves in the foot by rewriting a suboptimal (or even incorrect) hashCode.
Regarding the chosen implementation: it's not stable across JVMs, but it's very fast, avoids collisions, and doesn't need an additional field in the enum. Given the normally small number of instances of an enum class, and the speed of the equals method, I wouldn't be surprised if the HashMap lookup time were bigger with your algorithm than with the current one, due to its additional complexity.
I asked the same question, because I did not see this one: why does Enum's hashCode() refer to the Object hashCode() implementation instead of the ordinal() function?
I encountered it as a problem when defining my own hash function for an object that relies on an enum's hashCode as one of its components. When checking values in a Set of such objects, I expected them to come back in the same order on every run, since I define the hashCode myself, so I expected the elements to fall onto the same nodes of the tree. But since the hashCode returned by the enum changes from run to run, this assumption was wrong, and the test could fail once in a while.
So, when I figured out the problem, I started using ordinal instead. I am not sure everyone writing a hashCode for their object realizes this.
So basically, you can't define your own deterministic hashCode while relying on an enum's hashCode; you need to use ordinal instead.
P.S. This was too big for a comment :)
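A sketch of that workaround: mixing ordinal() into your own hashCode instead of the enum's identity-based one (Color and Pixel are made-up names):

public class Pixel {
    enum Color { RED, GREEN, BLUE }

    private final Color color;
    private final int x;

    Pixel(Color color, int x) {
        this.color = color;
        this.x = x;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Pixel)) return false;
        Pixel other = (Pixel) o;
        return color == other.color && x == other.x;
    }

    @Override
    public int hashCode() {
        // color.ordinal() is stable across JVM runs; color.hashCode() is not.
        return 31 * color.ordinal() + x;
    }
}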
The JVM enforces that for an enum constant, only one object will exist in memory. There is no way you could end up with two different instance objects of the same enum constant within a single VM, not with reflection, nor across the network via serialization/deserialization.
That being said, since it is the only object representing this constant, it doesn't matter that its hash code is its address, as no other object can occupy the same address space at the same time. It is guaranteed to be unique and "deterministic" (in the sense that within the same VM, in memory, the constant will always have the same reference, whatever it happens to be).
There is no requirement for hash codes to be deterministic between JVMs and no advantage gained if they were. If you are relying on this fact you are using them wrong.
As only one instance of each enum value exists, Object.hashCode() is guaranteed never to collide, is good code reuse, and is very fast.
If equality is defined by identity, then Object.hashCode() will always give the best performance.
The determinism of other hash codes is just a side effect of their implementation. As their equality is usually defined by field values, mixing in non-deterministic values would be a waste of time.
As long as we can't send an enum object1 to a different JVM, I see no reason for putting such a requirement on enums (and objects in general).
1 I thought it was clear enough - an object is an instance of a class. A serialized object is a sequence of bytes, usually stored in a byte array. I was talking about an object.
One more reason it is implemented like this, I could imagine, is the requirement for hashCode() and equals() to be consistent, and the design goal that enums should be simple to use and compile-time constant (to use them as "case" constants). This also makes it legal to compare enum instances with "==", and you simply wouldn't want "equals" to behave differently from "==" for enums. This again ties hashCode to the default Object.hashCode() reference-based behavior.
As said before, I also don't expect equals() and hashCode() to consider two enum constants from different JVMs as being equal. When talking about serialization: for instance fields typed as enums, the default binary serializer in Java has a special behaviour that serializes only the name of the constant, and on deserialization the reference to the corresponding enum value in the deserializing JVM is re-created. JAXB and other XML-based serialization mechanisms work in a similar way. So: just don't worry.

Should I be concerned about this compareTo/equals/hashCode implementation?

I'm in the middle of QA'ing a bunch of code and have found several instances where the developer has a DTO which implements Comparable. This DTO has 7 or 8 fields in it. The compareTo method has been implemented on just one field:
private DateMidnight field1; // from the Joda date/time library

public int compareTo(SomeObject o) {
    if (o == null) {
        return -1;
    }
    return field1.compareTo(o.getField1());
}
Similarly the equals method is overridden and basically boils down to:
return field1.equals(o.getField1());
and finally the hashcode method implementation is:
return field1.hashCode();
field1 should never be null and will be unique across these objects (i.e. we shouldn't get two objects with the same field1).
So the implementations are consistent, which is good, but should I be concerned that only one field is used? Is this unusual? Is it likely to cause problems or confuse other developers? I'm thinking of the scenario where a list of these objects is passed around and another developer uses a Map or Set of some sort and gets unusual behaviour from these objects. Any thoughts appreciated. Thanks!
I suspect that this is a case of "first use wins" - someone needed to sort a collection of these objects or put them in a hash map, and they only cared about the date. The easiest way of implementing that was to override equals/hashCode and implement Comparable<T> in the way you've said.
For specialist sorting, a better approach would be to implement Comparator<T> in a different class... but Java doesn't have any equivalent class for equality testing, unfortunately. I consider it a major weakness in the Java collections, to be honest.
Assuming this really isn't "the one natural and obvious comparison", it certainly smells in terms of design... and should be very carefully documented.
Strictly speaking, this violates the Comparable spec:
http://download.oracle.com/javase/6/docs/api/java/lang/Comparable.html
Note that null is not an instance of any class, and e.compareTo(null) should throw a NullPointerException even though e.equals(null) returns false.
Similarly, it looks like the equals method will throw an NPE on equals(null) instead of returning false (unless, of course, you "boiled" out the null-handling code).
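A version that sticks to both contracts could look like this (a sketch based on the snippets above):

public int compareTo(SomeObject o) {
    // Dereferencing o throws NullPointerException for null,
    // which is what the Comparable spec requires.
    return field1.compareTo(o.getField1());
}

@Override
public boolean equals(Object obj) {
    if (this == obj) return true;
    if (!(obj instanceof SomeObject)) return false; // also covers null
    return field1.equals(((SomeObject) obj).getField1());
}

@Override
public int hashCode() {
    return field1.hashCode();
}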
Is it likely to cause problems or confuse other developers?
Possibly, possibly not. It really depends on how large your project is and how widespread/"reusable"/long-lived your object source code is expected to be used:
Small/short-lived/limited use == probably not a problem.
Large/long-lived/widespread use == counter-intuitive implementation may cause future problems
You shouldn't be concerned with it if field1 is really unique. If it's not, you may have problems. Anyway, my advice is to write some unit tests; they should reveal the truth.
I don't think you need to be concerned. The contract between the three methods is kept and it's consistent.
Whether it's correct from a business logic point of view is a different question.
If, e.g., field1 maps to a primary key in the database, it's perfectly valid. If field1 is the "firstname" of a person, I would be concerned.
