I would like to use a clean/automatic way to check if a Java Object has been modified.
My specific problem is the following:
In my Java application, I use the XStream library to deserialize XML into Java objects, which the user can then modify. I'd like a way to check whether these objects in memory differ at some point from the serialized ones, so I can inform the user and ask whether they want to save the changes (i.e. serialize using XStream) or not.
My application has many objects, and they are quite complex.
Please note that I don't use a database in my application, so I'm not interested in solutions such as Hibernate.
Two approaches:
Implement hashCode() for your objects, and compare the hash codes of the in-memory objects against the hash codes of the serialized objects to see if they've changed. This has a low impact on your class design, but every check has to rehash every object, so the cost grows as the number of objects increases. Note that two different objects might return the same hash code, but a good hashing implementation will make this very unlikely. If you are concerned about this, implement and use your own equals() method.
Have your objects implement the Observer pattern and have each setter method, or any other method that modifies the object, notify the observer when it's called. Performance will be better for large numbers of objects (as long as they aren't changing constantly), but it requires you to introduce Observer code into possibly lightweight classes. Java provides the java.util.Observer interface and the java.util.Observable class (both deprecated since Java 9), but you'll still need to do most of the work.
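A minimal sketch of the observer approach, using java.beans.PropertyChangeSupport from the JDK rather than a hand-written observer. The Person and DirtyTracker classes are hypothetical names made up for this illustration. It also assumes that, with XStream, the PropertyChangeSupport field would probably need to be transient and re-created after deserialization.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Hypothetical model class; the name and its single field are illustrative only.
class Person {
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private String name;

    Person(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    void setName(String newName) {
        String oldName = this.name;
        this.name = newName;
        // PropertyChangeSupport fires no event when old and new values are equal and non-null.
        changes.firePropertyChange("name", oldName, newName);
    }

    void addPropertyChangeListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener(listener);
    }
}

// Hypothetical tracker: remembers whether any watched object changed since the last save.
class DirtyTracker {
    private boolean dirty;

    void watch(Person person) {
        person.addPropertyChangeListener(event -> dirty = true);
    }

    boolean isDirty() {
        return dirty;
    }

    void markSaved() {
        dirty = false;
    }
}
```

After deserializing, you would call watch() on every object, check isDirty() before closing, and call markSaved() after each successful save.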
You can store a version field in the object. Whenever the object changes, it should increment its version field; you can then compare that value with the version field of the serialized object.
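The version-field idea could look roughly like this. VersionedNote is a hypothetical class, and instead of re-reading the serialized object this sketch assumes it is enough to remember the version that was current at the last load or save.

```java
// Hypothetical class illustrating a version counter; names are not from the question.
class VersionedNote {
    private String text = "";
    private long version;      // incremented on every mutation
    private long savedVersion; // value of 'version' at the last load or save

    String getText() {
        return text;
    }

    void setText(String newText) {
        this.text = newText;
        version++; // every mutating method must bump the counter
    }

    boolean isModified() {
        return version != savedVersion;
    }

    void markSaved() {
        savedVersion = version;
    }
}
```

Note that this counts every write as a change, even when the new value equals the old one or when the user later restores the original value.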
I have a simple pojo class with 20+ fields and performance-sensitive piece of code. To reduce allocations I reuse the same instance of the class.
How to clean-up the object the most performant (unsafe) way?
If I'm not mistaken, field data is stored as a contiguous sequence of bytes, so I expect there must be something as fast as System.arraycopy.
The class itself is part of a stable API and cannot be modified.
Calling 20 simple setters is a cheap operation. It takes just a few nanoseconds. I'm pretty sure this is not worth optimizing.
Setting fields one by one in a straightforward way is already well optimized. In theory, it is possible to clear an object a little faster with SIMD instructions, but there is no way to do that in Java.
There is a method Unsafe.setMemory, but it works only for primitive arrays. This limitation is understandable: it's not valid to clear an object with reference fields in a bulk operation, because different GCs might need to track updates to reference fields individually.
If you look at the Arrays.fill implementation, it uses a simple loop that stores elements one by one, and the method is not even a JVM intrinsic, for the reasons above.
As already mentioned, the reasoning behind it is highly questionable.
We don't want to use reflection, since it's known to be quite slow. Would you consider using a Java bytecode manipulation library (Javassist, Byte Buddy, ...), which would let you modify the POJO class and add a method that directly sets all the fields to null?
Assuming you want to nullify all fields at once always, then the easiest way would be to have an internal object that holds all the values. Have your getters and setters be light wrappers around the fields of that object. Then instead of nullifying the fields individually, just refresh the inner object with a new instance where the fields start as null.
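A rough sketch of the inner-holder idea. ReusableRecord and its two fields are hypothetical. Keep in mind that clear() allocates a new holder each time, which may work against the question's goal of reducing allocations.

```java
// Hypothetical class: all resettable state lives in one private holder object.
class ReusableRecord {
    private static final class State {
        String name;   // starts as null
        Integer count; // starts as null
    }

    private State state = new State();

    String getName() {
        return state.name;
    }

    void setName(String name) {
        state.name = name;
    }

    Integer getCount() {
        return state.count;
    }

    void setCount(Integer count) {
        state.count = count;
    }

    // Resets every field at once by swapping in a fresh holder.
    void clear() {
        state = new State();
    }
}
```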
Assuming we have an object inside an object, inside another object, what is the best way to retrieve the value of a private variable outside the two objects?
The simplest way seems to be to do something like this:
object1.object2.object3.getvalue();
Is this acceptable? Or would it be better to call a method which calls a method, which calls a method?
The second option seems unnecessarily laborious, considering you would basically be having the same method created in 3 different classes.
Use a getter to retrieve each nested object, e.g.:
Object obj = object1.getObject2().getObject3();
It depends on your definition of "acceptable". It may be acceptable in your case. It is hard to tell without proper context.
However, there are some things you may want to consider, level by level:
1. Use of getters
Although getters of this kind are still far from satisfactory, they are better than direct property access.
That is, instead of accessing object1.object2 by direct field access, provide Object2 getObject2() in Object1, so that the code looks like:
object1.getObject2().getObject3().getValue()
2. Null handling
When we chain property navigation like this, a common problem is that null is returned at some level, which makes object1.getObject2().getObject3().getValue() throw a NullPointerException.
If you are using Java 8 or later, consider returning Optional<>. For example, the getter for object2 in Object1 should look like Optional<Object2> getObject2().
With such change, your code can be made null-safe by something like:
Value value = object1.getObject2()
        .flatMap(Object2::getObject3)
        .map(Object3::getValue)
        .orElse(Value.emptyValue());
3. Law of Demeter
In order to make a more loosely-coupled design, you may want to provide access to that value in API of Object1, instead of exposing multiple levels of indirection. Hence:
Value value = object1.getFooValue();
Internally, getFooValue() retrieves the value from Object3. (Of course, Object2 may also want to offer something similar.) Keep using Optional<> if it fits your needs.
4. Getter is evil
Always remember that you should try to avoid exposing the internal representation of your object. Your objects should provide meaningful behavior instead of simply acting as value objects for you to get or set data. It is hard to give an example here, but ask yourself: what do you need the value for? Would that action be more appropriately provided by the object itself?
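Points 2 and 3 above can be sketched together as follows. The class names mirror the Object1/Object2/Object3 placeholders used in this answer; a plain String stands in for the Value type, and the constructors are assumptions added so the example compiles on its own.

```java
import java.util.Optional;

class Object3 {
    private final String value;

    Object3(String value) {
        this.value = value;
    }

    String getValue() {
        return value;
    }
}

class Object2 {
    private final Object3 object3; // may be null

    Object2(Object3 object3) {
        this.object3 = object3;
    }

    // Null handling: callers get an Optional instead of a possibly-null reference.
    Optional<Object3> getObject3() {
        return Optional.ofNullable(object3);
    }
}

class Object1 {
    private final Object2 object2; // may be null

    Object1(Object2 object2) {
        this.object2 = object2;
    }

    Optional<Object2> getObject2() {
        return Optional.ofNullable(object2);
    }

    // Law of Demeter: callers ask Object1 directly and never see Object2 or Object3.
    Optional<String> getFooValue() {
        return getObject2()
                .flatMap(Object2::getObject3)
                .map(Object3::getValue);
    }
}
```

A caller can then write object1.getFooValue().orElse("") without knowing how deep the value is nested.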
The best way is to not think of your objects as data stores. A class should be defined to have some work to do, some cluster of related responsibilities. In order to perform that work to fulfill those responsibilities some internal data may be kept, and some nested objects contained. Serving out data should not be the goal of your objects, generally speaking.
Encapsulation
The whole idea of encapsulation in object-oriented programming is to not expose that internal data and nested objects. Instead publish the various available chores by declaring methods on your higher/outer object. Encapsulation frees you to change those internals without breaking the outside calling code – avoiding fragility is the goal.
For example, an Invoice object can contain a collection of LineItem objects. In turn each LineItem object contains other objects for product, quantity, price, extended cost, taxability, tax rate, tax amount, and line cost. If you want to know the total amount of sales tax added across the items, instead of asking the Invoice for the LineItem, and then asking the LineItem for TaxAmount object, define this chore as a method on Invoice, getTotalTaxAmount. Let that method figure out (and keep to itself!) how to go through the contained objects to collect the relevant information.
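As a small illustration of the Invoice example, here is one way getTotalTaxAmount could be written. The LineItem fields, the constructor, and the use of BigDecimal are assumptions made for this sketch, and the nested objects are reduced to three values.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Simplified line item: price, quantity and tax rate only (an assumption for brevity).
class LineItem {
    private final BigDecimal price;
    private final int quantity;
    private final BigDecimal taxRate; // e.g. 0.10 for 10 %

    LineItem(BigDecimal price, int quantity, BigDecimal taxRate) {
        this.price = price;
        this.quantity = quantity;
        this.taxRate = taxRate;
    }

    BigDecimal taxAmount() {
        return price.multiply(BigDecimal.valueOf(quantity)).multiply(taxRate);
    }
}

class Invoice {
    // The list is never handed out, so callers cannot depend on it.
    private final List<LineItem> items = new ArrayList<>();

    void addItem(LineItem item) {
        items.add(item);
    }

    // The invoice walks its own contents; callers only see the result.
    BigDecimal getTotalTaxAmount() {
        BigDecimal total = BigDecimal.ZERO;
        for (LineItem item : items) {
            total = total.add(item.taxAmount());
        }
        return total;
    }
}
```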
If you absolutely must expose that nested data, again define a method at the highest level that returns a copy of the desired data or a collection of the desired objects (probably copies of those objects). Again, the goal is to avoid exposing the objects within objects within objects.
Then, within that highest method, as the correct Answer by Raaga stated, define a getter that calls a getter.
Getter Methods versus Direct Member Access
In a very simple data structure you could access the objects directly, but it is generally better to use getter methods. Again, the reason is encapsulation: having a getter method gives you the flexibility to redefine the implementation details of the stored data.
For example, presently you could store the "Sex" variable as a String with values of "F" or "M". But later you may decide to take advantage of Java's nifty enum feature. So you replace those single-character "F" and "M" strings with enum instances Sex.FEMALE and Sex.MALE. Having a getter provides a level of insulation, so the Strings can be replaced internally with enums. The getter method continues to return a String, internally translating the enum to the "F" or "M" String to be returned. This way you can restructure your class without breaking the outside objects that depend on it.
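The refactoring described above might end up looking like this. Patient is a hypothetical owner class, and the code() and fromCode() helpers are names invented for the sketch.

```java
enum Sex {
    FEMALE("F"),
    MALE("M");

    private final String code;

    Sex(String code) {
        this.code = code;
    }

    String code() {
        return code;
    }

    // Translates the legacy single-character String into the enum.
    static Sex fromCode(String code) {
        for (Sex sex : values()) {
            if (sex.code.equals(code)) {
                return sex;
            }
        }
        throw new IllegalArgumentException("Unknown sex code: " + code);
    }
}

// Hypothetical class whose public API still speaks Strings.
class Patient {
    private Sex sex; // internal representation changed from String to enum

    Patient(String sexCode) {
        this.sex = Sex.fromCode(sexCode);
    }

    // Unchanged signature: outside callers still receive "F" or "M".
    String getSex() {
        return sex.code();
    }

    void setSex(String sexCode) {
        this.sex = Sex.fromCode(sexCode);
    }
}
```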
object1.object2.object3.getvalue();
This chaining looks incorrect. Chaining in such a scenario normally takes the form object1.someMethod().someOtherMethod(), or, as suggested in another answer, uses getters: object1.getObject2().getObject3().
I hope it helps.
What you described may be the simplest way (if object2 and object3 are accessible), but it is definitely not the way to go. As Raaga pointed out, getters are a much better way to retrieve members of a class, and these members should then be private or protected to prevent errors.
If you can do
object1.object2.object3.getvalue();
you can also do something like
object1.object2 = null;
which is most likely not what you want to allow. This is one of the basic concepts of object oriented programming. Classes should handle their implementation details / secrets and not directly offer them to the outside! This is what getters/setters are for.
This way you have more control over the access and what can be done and what can't. If you should only be able to retrieve object2 from object1 but not be able to change it, you can only offer a getter and no setter.
If you should also be able to change it, it is still better to use a setter, because it gives you more control: you can add checks in the setter to prevent cases like my example, where object2 is set to null.
And in case you worry that calling a method is less efficient than accessing a member directly, you can rely on Java to optimize the method call internally so that it is no slower than direct access.
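For instance, a setter can reject the null assignment shown earlier. Holder and Part are hypothetical stand-ins for the classes of object1 and object2.

```java
import java.util.Objects;

// Hypothetical stand-in for the class of object2.
class Part {
}

// Hypothetical stand-in for the class of object1.
class Holder {
    private Part part; // private: outside code cannot assign it directly

    Holder(Part part) {
        setPart(part);
    }

    Part getPart() {
        return part;
    }

    // The setter is the single place where the invariant "part is never null" is enforced.
    void setPart(Part part) {
        this.part = Objects.requireNonNull(part, "part must not be null");
    }
}
```

If the member should be read-only from the outside, simply leave out setPart (or make it private) and keep the getter.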
What could be a practical use of the IdentityHashMap introduced in Java 5?
Have a look at the Java Docs :-)
A typical use of this class is topology-preserving object graph
transformations, such as serialization or deep-copying. To perform
such a transformation, a program must maintain a "node table" that
keeps track of all the object references that have already been
processed. The node table must not equate distinct objects even if
they happen to be equal. Another typical use of this class is to
maintain proxy objects. For example, a debugging facility might wish
to maintain a proxy object for each object in the program being
debugged.
On a side note: it's available since version 1.4, not Java 5 or 6...
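To make the "node table" from the quoted Javadoc concrete, here is a small deep-copy sketch. Node and GraphCopier are hypothetical classes. Node deliberately overrides equals() so that two distinct nodes can be equal, which is exactly the case where an ordinary HashMap would merge them.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical graph node; equality is based on the label only.
class Node {
    final String label;
    final List<Node> neighbours = new ArrayList<>();

    Node(String label) {
        this.label = label;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Node && ((Node) other).label.equals(label);
    }

    @Override
    public int hashCode() {
        return label.hashCode();
    }
}

class GraphCopier {
    // Node table: maps each original node (by reference) to its copy.
    private final Map<Node, Node> copies = new IdentityHashMap<>();

    Node copy(Node original) {
        Node existing = copies.get(original);
        if (existing != null) {
            return existing; // already processed: reuse the copy, which also ends cycles
        }
        Node clone = new Node(original.label);
        copies.put(original, clone); // register before recursing so cycles terminate
        for (Node neighbour : original.neighbours) {
            clone.neighbours.add(copy(neighbour));
        }
        return clone;
    }
}
```

With two equal-but-distinct nodes that point at each other, the copy again contains two distinct nodes pointing at each other, so the topology is preserved.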
For adding dynamic fields to objects.
Some languages directly support dynamic fields: anybody can add any field to any object at any time.
This is handy when you want to associate information with objects that the object's designer could not foresee.
Java doesn't have real dynamic fields. We can simulate them by using an identity map to associate an object with some piece of information.
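WeakHashMap is often suggested for this purpose because it doesn't add an additional strong reference to the object, which brings it closer to the dynamic field concept. Note, however, that WeakHashMap compares keys with equals() and hashCode() rather than ==, so it behaves like an identity map only for keys that don't override equals().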
Concurrency is the remaining problem. If two threads access the same dynamic field of two different objects, there shouldn't be any dependency between the two threads. We can solve this with some kind of concurrent weak hash map, although the performance isn't ideal compared to normal field access.
Consider java.lang.ThreadLocal, which adds a dynamic field to threads, and java.lang.ClassValue, which adds a dynamic field to classes. They aren't strictly necessary, since we could achieve the same thing with concurrent weak maps; they exist for performance reasons. The JDK can "hack" into Thread/Class to support faster lookup.
When serializing mutable objects, you want to keep track of the objects you have already serialized and their reference ids. You cannot use equality for this, because you cannot trust mutable objects to use identity checks in equals() or to stay unchanged. For example, Date is mutable and its equals() compares contents.
It is rarely used. It implements the Map interface, but it is only appropriate in the rare cases where reference-equality semantics are required.
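The Date case can be demonstrated in a few lines. ReferenceIdDemo and countIds are hypothetical names; the method simply hands out one id per key the map considers new.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class ReferenceIdDemo {
    // Assigns the next free id to every object the map has not seen yet
    // and returns how many distinct objects the map ended up tracking.
    static int countIds(Map<Object, Integer> ids, Object... objects) {
        for (Object object : objects) {
            ids.putIfAbsent(object, ids.size());
        }
        return ids.size();
    }

    public static void main(String[] args) {
        Date a = new Date(0L);
        Date b = new Date(0L); // equal to a, but a different object

        // HashMap uses equals(): a and b collapse into one entry.
        System.out.println(countIds(new HashMap<>(), a, b, a));         // prints 1

        // IdentityHashMap uses ==: a and b are tracked separately.
        System.out.println(countIds(new IdentityHashMap<>(), a, b, a)); // prints 2
    }
}
```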
After reading this article, I'm leaning toward not overriding equals() and hashCode() at all.
In the summary of that article, under the "no eq/hC at all" column, the only consequence is that I couldn't perform comparison operations such as:
calling contains() on a List for detached entities, or
comparing the same entities from different sessions
and expect the correct result.
But I'm still in doubt, and would like to hear your experience on whether it is bad practice to skip equals() and hashCode() altogether, and what other consequences there are that I don't know about yet.
One more point of information: I'm leaning towards using List collections over Set, and my assumption is that I don't really need to override hashCode() and equals() when storing entities in a List.
Read this very nice article on the subject: Don't Let Hibernate Steal Your Identity.
The conclusion of the article goes like this:
Object identity is deceptively hard to implement correctly when
objects are persisted to a database. However, the problems stem
entirely from allowing objects to exist without an id before they are
saved. We can solve these problems by taking the responsibility of
assigning object IDs away from object-relational mapping frameworks
such as Hibernate. Instead, object IDs can be assigned as soon as the
object is instantiated. This makes object identity simple and
error-free, and reduces the amount of code needed in the domain model.
whether it is a bad practice to skip equals and hashCode altogether
Yes. You should always override equals and hashCode. Period. The reason is that these methods are already present in your class, implemented in Object. That implementation is generic, and nearly 100% of the time it is the wrong implementation for your own objects. So by skipping equals/hashCode you are in fact providing a wrong implementation, and will (in the best case) confuse whoever uses these classes. That may be your colleagues, or it may be some framework you are using (which can lead to unpredictable and hard-to-debug issues).
There's no reason not to implement these methods. Most IDEs provide a generator for equals/hashCode. You just need to tell the IDE what your business key is.
You got the exact opposite conclusion from that article of what it was trying to convey.
Hibernate relies heavily on equals being implemented properly. It will malfunction if it isn't.
In fact, almost everything does, including the standard Java collections.
The default implementation does not work when using persistence. You should always implement both equals and hashCode. There's a simple rule for how to do it, too:
For entities, use the key of the object.
For value objects, use the values.
Always make sure the values you use in your equals/hashCode are immutable. If you pass these out (as in a getter), preferably pass them out in an immutable form.
This advice will improve your life :)
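The rule above could be applied like this. Customer and Money are hypothetical classes; assigning a UUID key in the constructor follows the article's suggestion of giving objects an id as soon as they are instantiated, and is an assumption rather than the only way to obtain a key.

```java
import java.util.Objects;
import java.util.UUID;

// Entity: identity is the immutable key, not the current field values.
final class Customer {
    private final UUID id = UUID.randomUUID(); // assigned at instantiation
    private String name;

    Customer(String name) {
        this.name = name;
    }

    void setName(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Customer && id.equals(((Customer) other).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

// Value object: two instances with the same values are interchangeable.
final class Money {
    private final String currency;
    private final long cents;

    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Money)) {
            return false;
        }
        Money that = (Money) other;
        return cents == that.cents && currency.equals(that.currency);
    }

    @Override
    public int hashCode() {
        return Objects.hash(currency, cents);
    }
}
```

Because the fields used in equals/hashCode are final, an object's hash code cannot change while it sits in a HashSet or HashMap.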
I want to write an object into a stream (or byte array) with its transient attributes to be able to reconstruct it in another VM. I don't want to modify its attributes because that object is a part of legacy application.
Standard Java serialization mechanism doesn't help. What other options do I have?
Update:
The reason I'm asking the question is that I want to modify an existing Spring application. It called a bean's method in-process earlier but now I want to move the bean on a separate machine and use Spring remoting through HTTP invoker. And I have a problem with parameters that have transient fields that need to be passed to this method but not needed to be serialized in other parts of the app.
Hmm - if an attribute is marked as transient, that means exactly that it's not meant to be considered part of the object's persistent state, e.g. for serialization. The fact that you want to do this at all is a code smell, and the correct solution is to stop those fields being transient.
Let's say though that for whatever reason you can't modify the target classes themselves. My first thought was that you could customise the serialisation by implementing readObject() and writeObject() methods, but that would also require changes to the target class.
In that case, you'll need to work with some kind of reflection-based or metadata-based API in order to do this. There are many libraries that will convert objects to and from XML or JSON or DB rows, etc. Your best bet would be to use one of these to convert the object to and from "hydrated" form (and likely you'll need to customise them, as any sane serialiser will ignore transient fields). Which one to pick depends on your current software stack, and your precise requirements.
I assume you cannot change the legacy code. In this case I think you will have to resort to going over the object fields with reflection and DataOutputStream.
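A minimal sketch of that reflection approach, under heavy assumptions: LegacyBean is a made-up stand-in for the legacy class, only int and non-null String fields declared directly on the class are handled, and the receiving side is assumed to already have an instance to populate. A real solution would need to cover superclasses, nested objects, nulls and the remaining field types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical legacy class with one transient field.
class LegacyBean {
    private String name;
    private transient int cachedSize;

    LegacyBean(String name, int cachedSize) {
        this.name = name;
        this.cachedSize = cachedSize;
    }

    String getName() { return name; }
    int getCachedSize() { return cachedSize; }
}

class TransientAwareCodec {
    // Writes every instance field, transient or not, as (field name, value) pairs.
    static byte[] write(Object source) throws IOException, IllegalAccessException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Field field : source.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue;
                }
                field.setAccessible(true);
                out.writeUTF(field.getName());
                if (field.getType() == int.class) {
                    out.writeInt(field.getInt(source));
                } else if (field.getType() == String.class) {
                    out.writeUTF((String) field.get(source)); // assumes non-null
                } else {
                    throw new UnsupportedOperationException("Unsupported type: " + field.getType());
                }
            }
        }
        return bytes.toByteArray();
    }

    // Reads the pairs back and assigns them to the matching fields of 'target'.
    static void read(byte[] data, Object target) throws IOException, ReflectiveOperationException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                Field field = target.getClass().getDeclaredField(in.readUTF());
                field.setAccessible(true);
                if (field.getType() == int.class) {
                    field.setInt(target, in.readInt());
                } else {
                    field.set(target, in.readUTF());
                }
            }
        }
    }
}
```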
transient variables are supposed to be those that aren't serializable or are easily recalculated.
My first suggestion is to look for methods on this object to recalculate the transient fields.