Why serialVersionUID is not enforced by Java

serialVersionUID seems to me to have a very loose specification. Anybody developing a class that implements the Serializable interface can burn their hands with InvalidClassException. Also, I saw a fellow dev using int as the data type rather than long.
So, my question is: why did the Java designers and developers not do something more concrete, such as requiring that whenever we implement Serializable we also implement a method that sets serialVersionUID, like the hashCode method?
So, what am I missing? It does not seem like a double-edged sword to me.
Update 1:
My bad, I gave a wrong example, comparing a class-level thing with an instance-level thing. I have seen many production issues across different companies because of this. But my general idea was: can't it be made more strict by the compiler in some way?

Keep in mind the requirements:
the bytes making up the ID must be present within the stream of bytes representing a serialized object
unless you invent something "on top", the only way to get there is by using a field
the ID must be identical for all objects of a class, thus the source of the ID should be static
Now step back:
there is no polymorphism for static methods
in that sense, it doesn't make a difference if you use a static field or call a static method to acquire the ID bytes when serializing an object
but you are going to write those bytes into the stream anyway
Conclusion: using a static field addresses all the above points.
If you used a method, you would still have to create that field. Let that sink in: when using a method, the implementation must call the method and write the result as a field into the byte stream - and when reading the byte stream, that number needs to be consumed, as it can't be mapped to a "real" field in the class. Compare that to: there is a static field that gets written into the byte stream and read from there.
So - from a technical point of view, the natural way to integrate such an ID is by asking for a static field on the class.
Of course you could have invented something else. Modern day java might have used annotations. But such usage of annotations wasn't around when Java was invented (and this kind of serialization is in Java since day 1).
And then: modern day java doesn't use "byte stream" oriented serialization in the first place.
Or, as the answer from Stephen suggests, you could compute that ID based on the current content of the class. But again: this technology was invented 20 years ago. Back then, computing the ID might have cost you 1, 5, 10 seconds. Compare that to the effort of reading a static field.
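To make the "the ID bytes end up in the stream" point concrete, here is a minimal sketch. The class names (UidInStreamDemo, Point) and the UID value are made up for illustration; the sketch only assumes that the default mechanism writes the declared serialVersionUID as 8 big-endian bytes into the class description.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

class UidInStreamDemo {
    // Hypothetical class, used only for this illustration.
    static class Point implements Serializable {
        private static final long serialVersionUID = 0x1122334455667788L;
        int x = 1;
        int y = 2;
    }

    // Serializes an object with the default mechanism and returns the raw bytes.
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns true if the 8 big-endian bytes of uid occur somewhere in data.
    static boolean containsUid(byte[] data, long uid) {
        byte[] needle = ByteBuffer.allocate(Long.BYTES).putLong(uid).array();
        for (int i = 0; i + needle.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Point());
        System.out.println(containsUid(data, 0x1122334455667788L));
    }
}
```

The static field itself is never written as instance data; the value travels once, as part of the class description that precedes the object's fields.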

So, my question why the Java designers and developers did not do something more concrete like whenever we implement serializable we have to implement a method setting serialVersionUID like the hashcode method.
If I understand you correctly, you are saying that there should be a method to calculate the serialVersion. I can see problems with this:
How would an application program calculate a decent serial version? Note that it must be calculated from type information ... not the values of fields.
Calculating a serial version at run time (repeatedly) would be expensive.
If we have lazy programmers who don't use the "serialver" tool to calculate a proper version number, the same lazy programmers are likely to implement this (hypothetical) method to return a bogus number ... and we are back where we started.
There are use-cases where NOT changing the serial version ID is intentional.
No. The real solution to the problem of lazy programmers is code reviews.
Or ... maybe ... tooling to either regenerate the serialVersion constants, or flag them as suspicious.
Note that tools can't pick all of the cases where there is going to be a problem. However, neither can a code-reviewer. And code-reviewers get tired, have to be paid, etc.
(Finally, it might be a good idea if certain IDEs didn't offer setting the constant to -1 as a quick-fix ...)
But my general idea was cant it be made more strict by compiler in any way.
It is not a compiler thing. It is a runtime behavior of a specific set of library classes. The classes use Unsafe ... but the compiler is not aware of the significance of the serialVersionUID variable.
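A small sketch of that runtime lookup, using java.io.ObjectStreamClass, which is the library class that resolves the ID. The demo classes (Declared, Undeclared) are hypothetical. When no constant is declared, the library computes a default value from the class details at run time.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class RuntimeUidDemo {
    // Hypothetical class that declares its own version constant.
    static class Declared implements Serializable {
        private static final long serialVersionUID = 42L;
        int value;
    }

    // Hypothetical class without a constant; the library computes a default.
    static class Undeclared implements Serializable {
        int value;
    }

    // Asks the serialization library which UID it would use for the type.
    // ObjectStreamClass.lookup returns null for types that are not serializable.
    static long uidOf(Class<?> type) {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) {
            throw new IllegalArgumentException(type + " is not serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(Declared.class));   // the declared constant
        System.out.println(uidOf(Undeclared.class)); // a computed default
    }
}
```

The compiler plays no part in either lookup; it only sees an ordinary static field.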

Why the Java designers and developers did not [make us] have to
implement a method setting serialVersionUID
The serialVersionUID is data related to the class. It's not data related to a single instance of the class.
If we had an interface such as:
interface MySerializable
{
    long getSerialVersionUID();
}
then we could implement that like this:
class Foo implements MySerializable
{
    private String myField;
    //...
    public long getSerialVersionUID()
    {
        // The result depends on instance state, which is exactly the problem.
        if(myField.equals("hello"))
        {
            return 1L;
        }
        else
        {
            return 2L;
        }
    }
}
This doesn't make sense. The version cannot depend on the instance.
Now that we have annotations, a good solution in my eyes would be to write a @Serializable annotation. I prefer this because anything related to serialization is really just metadata. Adding an additional field completely unrelated to the class's behaviour just muddies the water.
@Serializable(ID = "1")
class Foo
{
    //...
}
Of course, annotations are a more recent addition so this wasn't an option when serialVersionUID was devised.
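For what it's worth, here is a rough sketch of how such an annotation could be declared and read back through reflection. The annotation name SerialVersion and the helper versionOf are invented for this example; nothing like them exists in the JDK, and the built-in serialization would not look at them.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

class AnnotationVersionDemo {
    // Hypothetical annotation; it must be retained at run time to be readable.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface SerialVersion {
        long value();
    }

    // The version is attached to the class as metadata, not as a field.
    @SerialVersion(1L)
    static class Foo {
        //...
    }

    // Reads the version from the class; fails if the annotation is missing.
    static long versionOf(Class<?> type) {
        SerialVersion version = type.getAnnotation(SerialVersion.class);
        if (version == null) {
            throw new IllegalArgumentException(type + " declares no @SerialVersion");
        }
        return version.value();
    }

    public static void main(String[] args) {
        System.out.println(versionOf(Foo.class));
    }
}
```

Because the value lives on the class, it cannot vary per instance, which avoids the problem of the getSerialVersionUID() method shown earlier.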

Related

Not serializable class with strings only [duplicate]

We work heavily with serialization and having to specify Serializable tag on every object we use is kind of a burden. Especially when it's a 3rd-party class that we can't really change.
The question is: since Serializable is an empty interface and Java provides robust serialization once you add implements Serializable - why didn't they make everything serializable and that's it?
What am I missing?

Serialization is fraught with pitfalls. Automatic serialization support of this form makes the class internals part of the public API (which is why javadoc gives you the persisted forms of classes).
For long-term persistence, the class must be able to decode this form, which restricts the changes you can make to class design. This breaks encapsulation.
Serialization can also lead to security problems. By being able to serialize any object it has a reference to, a class can access data it would not normally be able to (by parsing the resultant byte data).
There are other issues, such as the serialized form of inner classes not being well defined.
Making all classes serializable would exacerbate these problems. Check out Effective Java Second Edition, in particular Item 74: Implement Serializable judiciously.

I think both the Java and .NET people got it wrong this time around. It would have been better to make everything serializable by default and only require marking those classes that can't be safely serialized.
For example, in Smalltalk (a language created in the 70s) every object is serializable by default. I have no idea why this is not the case in Java, considering that the vast majority of objects are safe to serialize and just a few of them aren't.
Marking an object as serializable (with an interface) doesn't magically make that object serializable. It was serializable all along; it's just that now you have expressed something that the system could have found out on its own. So I see no really good reason for serialization being the way it is now.
I think it was either a poor decision made by designers or serialization was an afterthought, or the platform was never ready to do serialization by default on all objects safely and consistently.

Not everything is genuinely serializable. Take a network socket connection, for example. You could serialize the data/state of your socket object, but the essence of an active connection would be lost.

The main role of Serializable in Java is to actually make, by default, all other objects nonserializable. Serialization is a very dangerous mechanism, especially in its default implementation. Hence, like friendship in C++, it is off by default, even if it costs a little to make things serializable.
Serialization adds constraints and potential problems since structure compatibility is not ensured. It is good that it is off by default.
I have to admit that I have seen very few nontrivial classes where standard serialization does what I want it to, especially in the case of complex data structures. So the effort you'd spend making the class properly serializable dwarfs the cost of adding the interface.

For some classes, especially those that represent something more physical like a File, a Socket, a Thread, or a DB connection, it makes absolutely no sense to serialize instances. For many others, Serialization may be problematic because it destroys uniqueness constraints or simply forces you to deal with instances of different versions of a class, which you may not want to.
Arguably, it might have been better to make everything Serializable by default and make classes non-serializable through a keyword or marker interface - but then, those who should use that option probably would not think about it. The way it is, if you need to implement Serializable, you'll be told so by an Exception.
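A minimal sketch of being "told so by an Exception": writing an object whose class does not implement Serializable fails with java.io.NotSerializableException. The classes Plain and Marked are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class NotSerializableDemo {
    // Hypothetical class without the marker interface.
    static class Plain {
        int value = 1;
    }

    // Same content, but explicitly marked as serializable.
    static class Marked implements Serializable {
        int value = 1;
    }

    // Returns true if the default mechanism accepts the object,
    // false if it rejects it with NotSerializableException.
    static boolean canWrite(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canWrite(new Plain()));
        System.out.println(canWrite(new Marked()));
    }
}
```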

I think the thought was to make sure you, as the programmer, know that your object may be serialized.

Apparently everything was serializable in some preliminary designs, but because of security and correctness concerns the final design ended up as we all know.
Source: Why must classes implement Serializable in order to be written to an ObjectOutputStream?.

By having to state explicitly that instances of a certain class are Serializable, the language forces you to think about whether you should allow that. For simple value objects serialization is trivial, but in more complex cases you need to really think things through.
By just relying on the standard serialization support of the JVM you expose yourself to all kinds of nasty versioning issues.
Uniqueness, references to 'real' resources, timers and lots of other types of artifacts are NOT candidates for serialization.

Read this to understand the Serializable interface, why we should make only a few classes Serializable, and where to use the transient keyword when we want to exclude some fields from being stored.
http://www.codingeek.com/java/io/object-streams-serialization-deserialization-java-example-serializable-interface/
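As a quick illustration of the transient keyword, here is a sketch with a made-up Session class that holds a Thread (which is not serializable). The transient field is skipped when writing and comes back as null when reading.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
    // Hypothetical class mixing plain data with a runtime-only resource.
    static class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        String user;
        transient Thread worker; // excluded from the serialized form

        Session(String user, Thread worker) {
            this.user = user;
            this.worker = worker;
        }
    }

    // Writes the session to a byte array and reads it back.
    static Session roundTrip(Session original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Session) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Session copy = roundTrip(new Session("alice", new Thread(() -> { })));
        System.out.println(copy.user);
        System.out.println(copy.worker == null);
    }
}
```

Without the transient modifier, writing the Session would fail, because Thread does not implement Serializable.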

Well, my answer is that this is for no good reason. And from your comments I can see that you've already learned that. Other languages happily try serializing everything that doesn't jump on a tree after you've counted to 10. An Object should default to be serializable.
So, what you basically need to do is read all the properties of your 3rd-party class yourself. Or, if that's an option for you: decompile, put the damn keyword there, and recompile.

There are some things in Java that simply cannot be serialized because they are runtime specific. Things like streams, threads, runtime, etc. and even some GUI classes (which are connected to the underlying OS) cannot be serialized.

While I agree with the points made in other answers here, the real problem is with deserialisation: If the class definition changes then there's a real risk the deserialisation won't work. Never modifying existing fields is a pretty major commitment for the author of a library to make! Maintaining API compatibility is enough of a chore as it is.

A class that needs to be persisted to a file or other medium has to implement the Serializable interface so that the JVM allows its objects to be serialized.
Why is the Object class not serializable, so that no class would need to implement the interface? After all, the JVM serializes an object only when I use ObjectOutputStream, which means the control is still in my hands.
The reason the Object class is not serializable by default is that class versioning is the major issue. Therefore each class that is interested in serialization has to be marked as Serializable explicitly and should provide a version number, serialVersionUID.
If serialVersionUID is not declared, the runtime computes a default one from the class details, which can give unexpected results when deserializing the object; the JVM throws InvalidClassException if the serialVersionUID in the stream doesn't match that of the local class. Therefore every such class has to implement the Serializable interface and should declare serialVersionUID to make sure the class present at both ends is compatible.

How should Number be extended for classes without a no-args constructor?

I have implemented a few Java classes which extend the abstract java.lang.Number class. I have no immediate need for serializing objects of these classes. However, I do want to provide the rest of the Number contract for these classes which represent "numbers." The trouble is that java.lang.Number implements Serializable. As such, my classes are supposed to provide public default (i.e. no-args) constructors -- my IDE complains, but the compiler will still compile my classes. Fine, but providing public default constructors for "immutable" objects requires providing a default value when the constructor is invoked for any reason other than serialization -- ignore for the moment that these classes return objects from static factory methods and expose no public constructors now. Well, zero is a fine default in many cases, but natural numbers -- i.e. positive integers -- do not include zero in their domain and no single number is any more "special" than any other...O.K...."one" is always "special"...
Etc., etc., etc....
I did look into how BigDecimal handles Number and Serializable in an effort to determine the "right" way to address this question. However, both the JavaDoc and the source code I have been able to examine reveal that BigDecimal does not provide a "no-args" constructor despite extending Number. Realizing that:
Just because Sun Microsystems/Oracle implemented it that way doesn't make it "right."
I am back to the basic question:
What is the "right" way to extend java.lang.Number? If providing a "no-args" constructor is just another Java convention following the rule:
It's not a law, just a good idea...
Is the best answer to avoid the warts by ignoring the "convention?" If so, how can I satisfy an IDE -- Intellij, in particular -- and any Java-to-other-language-or-environment translator which might choose to be more strict than the Java compiler when Serializable raises its ugly head?

Well, there's always good ol 'NaN' -- Not a number. If you can represent it, that is.

My opinion is that one could forget about being compatible with Java's built-in serialization after looking at the benchmarks. It's 8x slower than textual Jackson and seems to be simply outdated.

You can also make the empty constructor protected; this way serialization still works. What Sun did for BigDecimal was to provide readObject and writeObject methods; see here for further details.
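As a side note, the default mechanism only requires an accessible no-arg constructor on the nearest non-serializable superclass (here Object, since Number itself is Serializable), which is why BigDecimal gets away without one. Below is a sketch with a made-up Natural class that has only a private constructor and a static factory, and still survives a round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class NaturalNumberDemo {
    // Hypothetical immutable number type without a no-arg constructor.
    static final class Natural extends Number {
        private static final long serialVersionUID = 1L;
        private final long value;

        private Natural(long value) {
            this.value = value;
        }

        static Natural of(long value) {
            if (value < 1) {
                throw new IllegalArgumentException("natural numbers start at 1");
            }
            return new Natural(value);
        }

        @Override public int intValue() { return (int) value; }
        @Override public long longValue() { return value; }
        @Override public float floatValue() { return value; }
        @Override public double doubleValue() { return value; }
    }

    // Writes the number to a byte array and reads it back.
    static Natural roundTrip(Natural n) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(n);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Natural) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Natural.of(7)).longValue());
    }
}
```

Note that deserialization bypasses the factory, so the "value >= 1" check is not re-run; a readObject or readResolve method would be the place to validate if that matters.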

Is it OK to create a Constants class with large number of static fields?

Let's say I have a constants class containing 200+ static fields :
class AnimalConstants {
    static final int AARDVARK = 1;
    static final int ANTELOPE = 2;
    static final int BEAR = 3;
    ...
    ...
    static final int ZEBRA = 200;
}
Can anyone explain whether using such classes has any negative impact on performance and memory?
Would it be better or worse if the class is changed to an interface (e.g. SwingConstants) and being implemented by some classes?
Would it be better or worse if I implement the constants as an Enum class?

I don't think the impact is performance or memory.
I think it has to do with things like readability of code, keeping constants close to where they're used, and fundamental understanding of your problem.
I prefer to declare constants closer to where they're used. I believe they're easier to understand that way.
I would be surprised if the real constants that you claim number 200+ are truly related. If they are, they belong together in one place. If not, I'd say that they should be broken into smaller pieces and declared closer to where they're used.
I'll bet there's more context than your contrived example that would change responses if known.
Sure, enums are great. But see my other comments first.

Of course an enum implementation is more ponderous than a bunch of int constants, but by using an enum:
you don't need to hardcode actual values of Animals (in your case) that can change later
you don't need to hardcode total number of Animals and you can simply iterate through all animals
methods with parameter of this enum will be understood correctly (foo(Animal animal) is better than foo(int animal))
you can add additional functionality to your enum values later, e.g. internal value isMammal
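The points above could look roughly like this. It is only a sketch: the isMammal flag follows the suggestion in the list, CROCODILE is an invented entry added to have a non-mammal, and the helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class AnimalEnumDemo {
    enum Animal {
        AARDVARK(true),
        ANTELOPE(true),
        BEAR(true),
        CROCODILE(false), // hypothetical entry, added only to show a non-mammal
        ZEBRA(true);

        private final boolean mammal;

        Animal(boolean mammal) {
            this.mammal = mammal;
        }

        boolean isMammal() {
            return mammal;
        }
    }

    // Iterates over all constants without hardcoding how many there are.
    static List<Animal> mammals() {
        List<Animal> result = new ArrayList<>();
        for (Animal animal : Animal.values()) {
            if (animal.isMammal()) {
                result.add(animal);
            }
        }
        return result;
    }

    // A typed parameter: callers cannot pass an arbitrary int by mistake.
    static String describe(Animal animal) {
        return animal + (animal.isMammal() ? " is a mammal" : " is not a mammal");
    }

    public static void main(String[] args) {
        System.out.println(mammals());
        System.out.println(describe(Animal.CROCODILE));
    }
}
```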

Would it be better or worse if the class is changed to an interface (e.g. SwingConstants) and being implemented by some classes?
--> That would be the Constant Interface pattern. If you use an interface for constants and it is implemented by classes, then, when you are developing an API, you are exposing implementation details. The wiki link above explains this very well.
For both approaches (interface or class), I would suggest using a final class, creating the constants there, and using static imports for the constants wherever necessary.
Would it be better or worse if I implement the constants as an Enum class?
--> With Enums, this would be the best approach.

Changing any value that has already been compiled into another class may require a full build.
Addendum: See Is it possible to disable javac's inlining of static final variables? for a more thorough examination.

Yes it is okay to create a large number of constants. It is hard to discuss negative impact because we don't know any alternatives because we don't have your functional requirements.
But be assured that the compiler is written to work well with code written by humans. Having a bunch of fields is probably going to be okay.
I feel that constants can be very nice: they can be used in switch statements (int constants always could; String constants since JDK 7), you can compare them with == and the variable name can be informative.
Can an enum be even better? Yes, it can. Explore the features of enums and see if anything appeals to you.

For your kind of values (animal types) I suggest you use an enum instead of a class. With that number of values, performance shouldn't be a problem, as you're only using int primitives. A problem could occur if the values were objects, which need more memory to maintain their structure. I hope this clears up your doubt. (Sorry for the poor English, I'm a little rusty.)

How do I generate the source code to create an object I'm debugging?

Typical scenario for me:
The legacy code I work on has a bug that only a client in production is having
I attach a debugger and figure out how to reproduce the issue on their system given their input. But, I don't know why the error is happening yet.
Now I want to write an automated test on my local system to try and reproduce then fix the bug
That last step is really hard. The input can be very complex and have a lot of data to it. Creating the input by hand (eg: P p = new P(); p.setX("x"); p.setY("x"); imagine doing this 1000 times to create the object) is very tedious and error prone. In fact you may notice there's a typo in the example I just gave.
Is there an automated way to take a field from a break point in my debugger and generate source code that would create that object, populated the same way?
The only thing I've come up with is to serialize this input (using Xstream, for example). I can save that to a file and read it back in in an automated test. This has a major problem: If the class changes in certain ways (eg: a field/getter/setter name is renamed), I won't be able to deserialize the object anymore. In other words, the tests are extremely fragile.

Java standard serialisation is well known to be not very useful when objects change their version (content, naming of fields). It's fine for quick demo projects.
More suitable for your needs is an approach where your objects support your own (binary) custom serialisation:
This is not difficult: use DataOutputStream to write out all fields of an object. But now introduce versioning by first writing out a versionId. Objects that have only one version write out versionId 1. That way, when you later have to introduce a change in your objects (remove fields, add fields), you can raise the version number.
Such an ICustomSerializable will then first read the version number from the input stream in its readObject() method and, depending on the version ID, call readVersionV1() or e.g. readVersionV2().
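import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// ICustomSerializable.java
public interface ICustomSerializable {
    void writeObject(DataOutputStream dos) throws IOException;
    Object readObject(DataInputStream dis) throws IOException;
}

// Foo.java
public class Foo {
    public static final int VERSION_V1 = 1;
    public static final int VERSION_V2 = 2;
    public static final int CURRENT_VERSION = VERSION_V2;

    private int version;
    private int fooNumber;
    private double fooDouble;

    public void writeObject(DataOutputStream dos) throws IOException {
        // Write the version first so that a reader knows which layout follows.
        dos.writeInt(this.version);
        if (version == VERSION_V1) {
            writeVersionV1(dos);
        } else if (version == VERSION_V2) {
            writeVersionV2(dos);
        } else {
            throw new IllegalStateException("unknown version: " + this.version);
        }
    }

    public void writeVersionV1(DataOutputStream dos) throws IOException {
        dos.writeInt(this.fooNumber);
        dos.writeDouble(this.fooDouble);
    }

    public void writeVersionV2(DataOutputStream dos) throws IOException {
        // Placeholder: the V2 layout is not shown here; it would write the V2 fields.
        writeVersionV1(dos);
    }
}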
Further getters and setters, and a constructor that initialises the version to CURRENT_VERSION, are needed.
This kind of serialisation is safe under refactoring as long as you also change or add the appropriate read and write versions. For complex objects that use classes from external libraries not under your control, it can be more work, but strings and lists are easily serialized.
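To show both directions of the version-first idea, here is a compact sketch with the read side included. It rests on assumptions: the layouts are invented (V1 stores fooNumber and fooDouble, V2 additionally stores a hypothetical fooLabel), and the helper names write and read are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class VersionedFooDemo {
    static final int VERSION_V1 = 1;
    static final int VERSION_V2 = 2;

    static final class Foo {
        final int fooNumber;
        final double fooDouble;
        final String fooLabel; // hypothetical field, assumed to be added in V2

        Foo(int fooNumber, double fooDouble, String fooLabel) {
            this.fooNumber = fooNumber;
            this.fooDouble = fooDouble;
            this.fooLabel = fooLabel;
        }
    }

    // Writes the version number first, then the fields of that layout.
    static byte[] write(Foo foo, int version) {
        if (version != VERSION_V1 && version != VERSION_V2) {
            throw new IllegalArgumentException("unknown version: " + version);
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bytes)) {
            dos.writeInt(version);
            dos.writeInt(foo.fooNumber);
            dos.writeDouble(foo.fooDouble);
            if (version == VERSION_V2) {
                dos.writeUTF(foo.fooLabel);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the version number first and dispatches on it.
    static Foo read(byte[] data) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = dis.readInt();
            switch (version) {
                case VERSION_V1:
                    // Old layout: the label did not exist yet, so use a default.
                    return new Foo(dis.readInt(), dis.readDouble(), "");
                case VERSION_V2:
                    return new Foo(dis.readInt(), dis.readDouble(), dis.readUTF());
                default:
                    throw new IllegalStateException("unknown version: " + version);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Foo old = read(write(new Foo(3, 1.5, "ignored"), VERSION_V1));
        System.out.println(old.fooNumber + " " + old.fooDouble + " '" + old.fooLabel + "'");
    }
}
```

Data written in the old layout stays readable after the class gains a field, which is the property the test fixtures in the question need.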

I think what you want to do is store the "state", and then restore that in your test to ensure the bug stays fixed.
Short answer: There is afaik no such general code generation tool, but as long as several constraints are kept, writing such a tool is small work.
Long Comment:
There are constraints under which that can work. If everything is just beans with getters and setters for all the fields you need, then generating code for this is not so difficult. And yes, that would be safe against renaming if you refactor the generated code along with the normal code. If setters are missing, then this approach will not work. And that is only one example of why this is no general solution.
Refactoring can also, for example, move fields to other classes. How do you want to introduce the values from the other fields of that class? How can you later know whether your saved state, altered that way, still reflects the critical data? Or worse, imagine the refactoring gives the same field a different meaning than before.
The nature of the bug itself is also a constraint. Imagine for example that the bug happened because a field/method had a particular name. If a refactoring now changes the name, the bug will not appear anymore regardless of your state.
Those are just arbitrary examples that may have nothing to do with your real-life cases. But this is a case-by-case decision, not a general strategy. Anyway, if you know your code, the bug and your refactorings are all well-behaved enough for this, then making such a tool is done in less than a day, probably much less.
With xstream you would partially get this as well, but you would have to change the xml yourself. If you used for example db4o you would have to tell it that this and that field has now this and that name.

When do I have to change the serialVersionUID?

I know that I can use serialVersionUID to control the version of classes. And I read that I can then add or remove fields and the class will still be compatible, it will just use default values.
When must I change the serialVersionUID?

The value of the serialVersionUID field should ideally be changed when incompatible changes are made to the structure of the class. The complete list of incompatible changes is given in the Java Object Serialization Specification.
To expand further, incompatible changes to a class will prevent the deserialization mechanism from creating an instance of the object, because there is information in the stream that does not map to the current class definition.

The frequently-repeated mantra about changing the serialVersionUID every time you change the class is complete and utter nonsense. See this Sun article which they republished on their site and which was migrated to the Oracle Technology Network after the acquisition.
You should change the serialVersionUID only when you deliberately want to break compatibility with all existing serializations, or when your changes to the class are so radical that you have no choice - in which case you should really think several times about what it is that you are actually doing.
In all other cases you should bust your boiler trying to use custom readObject()/writeObject() and/or writeReplace()/readResolve() methods and/or a serialPersistentFields declaration so that you can continue to read objects from those existing serializations. Once you break that you are in for a major headache, indeed nightmare.

If you don't specify a serialVersionUID field in your Serializable classes, the serialization runtime will compute one for you -- essentially it's a hash of the class name, interface names, methods, and fields of the class. Methods can be altered at any time, though, so if you need to change how a stored class is deserialized, you can provide your own readObject method. If you do specify the serialVersionUID field in your code, though, that value is used even if you do make incompatible changes, which can result in an exception at runtime -- your IDE or compiler won't give you a warning. (EDIT -- thanks EJP) IDEs such as Eclipse can insert the computed UID for you, if you want to easily check how certain changes affect it.
If you make changes often, keep an old version of the disk file around to test deserialization with. You can write unit tests to try and read in the old file, and see if it works or if it's totally incompatible.
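Such a check could be sketched as below. The Settings class and the helper names are hypothetical. In a real test the bytes would come from the old file you kept around (for example via Files.readAllBytes); here they are produced in memory so the example stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class OldStreamCheckDemo {
    // Hypothetical class whose stored form should stay readable.
    static class Settings implements Serializable {
        private static final long serialVersionUID = 1L;
        String theme = "dark";
    }

    // Stand-in for "the old file": serializes an object into a byte array.
    static byte[] serialize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns true if the bytes can still be read as an instance of expectedType.
    // InvalidClassException (UID mismatch, incompatible change) is an IOException.
    static boolean canDeserialize(byte[] data, Class<?> expectedType) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return expectedType.isInstance(in.readObject());
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canDeserialize(serialize(new Settings()), Settings.class));
        System.out.println(canDeserialize(new byte[] {1, 2, 3, 4}, Settings.class));
    }
}
```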
One caveat, I've personally experienced the pain that is working with Serializable classes originally intended for long-term storage that were improperly designed. For example, storing GUI elements on disk rather than creating them when needed. Ask yourself if Serializable is really the best way to save your data.

For the sake of completeness, here's a list of changes that break the compatibility of Java serialization according to the Java 8 spec:
Deleting fields
Moving classes up or down the hierarchy
Changing a nonstatic field to static or a nontransient field to transient
Changing the declared type of a primitive field
Changing the writeObject or readObject method so that it no longer writes or reads the default field data or changing it so that it attempts to write it or read it when the previous version did not.
Changing a class from Serializable to Externalizable or vice versa
Changing a class from a non-enum type to an enum type or vice versa
Removing either Serializable or Externalizable
Adding the writeReplace or readResolve method to a class

You can set serialVersionUID to the same value for the life of the class. (Not always a good idea.) Note: you can implement your own serialization version checking strategy with readObject/writeObject if you need this and leave the UID unchanged.
The only time you MUST change it is if you have already serialized some data to a file and you want to read it. If it has changed for any reason you MUST set the serialVersionUID to the version in the file to have any hope of being able to read the data.

To declare your own serialVersionUID in Java, put this in the serializable class:
@Serial
private static final long serialVersionUID = desired_number;
