The Java language benefited greatly from the addition of enums; but unfortunately they don't work well when sending serialized objects between systems that are on different code levels.
Example: assume that you have two systems A and B. They both start off with the same code level, but at some point they start to receive code updates at different points in time. Now assume that there is some
public enum Whatever { FIRST; }
And there are other objects that keep references to constants of that enum. Those objects are serialized and sent from A to B or vice versa. Now consider that B has a newer version of Whatever
public enum Whatever { FIRST, SECOND }
Then:
class SomethingElse implements Serializable {
    private final Whatever theWhatever;

    SomethingElse(Whatever theWhatever) {
        this.theWhatever = theWhatever;
    }
}
gets instantiated ...
SomethingElse something = new SomethingElse(Whatever.SECOND);
and then serialized and sent over to A (for example, as the result of some RMI call). This is bad, because now there will be an error during deserialization on A: A knows the Whatever enum class, but in a version that doesn't have SECOND.
We found this out the hard way; and now I am very anxious about using enums in situations that would actually be "perfect for enums", simply because I know that I can't easily extend an existing enum later on.
Now I am wondering: are there (good) strategies to avoid such compatibility issues with enums? Or do I really have to go back to "pre-enum" times and not use enums, relying instead on a solution where I use plain strings all over the place?
Update: please note that using serialVersionUID doesn't help here at all. That thing only helps in making an incompatible change "more obvious". But the point is: I don't care why deserialization fails, because I have to prevent it from happening. And I am also not in a position to change the way we serialize our objects. We are doing RMI, and we are serializing to binary; I have no means to change that.
As @Jesper mentioned in the comments, I would recommend something like JSON for your inter-service communication. This will give you more control over how unknown enum values are handled.
For example, using the always awesome Jackson you can use the deserialization features READ_UNKNOWN_ENUM_VALUES_AS_NULL or READ_UNKNOWN_ENUM_VALUES_USING_DEFAULT_VALUE. Both will allow your application logic to handle unknown enum values as you see fit.
Example (straight from the Jackson docs, with imports added):
import com.fasterxml.jackson.annotation.JsonEnumDefaultValue;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

enum MyEnum { A, B, @JsonEnumDefaultValue UNKNOWN }
...
final ObjectMapper mapper = new ObjectMapper();
mapper.enable(DeserializationFeature.READ_UNKNOWN_ENUM_VALUES_USING_DEFAULT_VALUE);
MyEnum value = mapper.readValue("\"foo\"", MyEnum.class);
assertSame(MyEnum.UNKNOWN, value);
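For comparison, a minimal sketch of the READ_UNKNOWN_ENUM_VALUES_AS_NULL variant (my own illustration, not from the Jackson docs):
final ObjectMapper lenientMapper = new ObjectMapper();
lenientMapper.enable(DeserializationFeature.READ_UNKNOWN_ENUM_VALUES_AS_NULL);
MyEnum nullValue = lenientMapper.readValue("\"foo\"", MyEnum.class);
assertNull(nullValue); // unknown constants are mapped to null instead of throwing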
After going back and forth between different solutions, I settled on a solution based on the suggestion from @GuiSim: one can build a class that wraps an enum value (sketched below). This class can
do custom deserialization; thus I can ensure there won't be exceptions during the deserialization process
provide simple methods like isValid() and getEnumValue(): the first one tells you if the enum deserialization actually worked; and the second one returns the deserialized enum (or throws an exception)
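A minimal sketch of such a wrapper, assuming the enum is serialized by its constant name (the class name EnumHolder and the lenient lookup are my own illustration, not the exact production code):
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

public final class EnumHolder<E extends Enum<E>> implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Class<E> enumClass;
    private final String name;   // serialized form: just the constant's name
    private transient E value;   // resolved on deserialization; null if unknown

    public EnumHolder(E value) {
        this.enumClass = value.getDeclaringClass();
        this.name = value.name();
        this.value = value;
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        try {
            value = Enum.valueOf(enumClass, name); // may fail on older code levels
        } catch (IllegalArgumentException unknownConstant) {
            value = null; // tolerate the unknown constant instead of failing deserialization
        }
    }

    public boolean isValid() {
        return value != null;
    }

    public E getEnumValue() {
        if (value == null) {
            throw new IllegalStateException("Unknown enum constant: " + name);
        }
        return value;
    }
}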
Related
I have a data class which uses a builder to create the object and stores the data in a buffer in serialized form. I am planning to change the class to add and remove some fields. Some systems will use the current version of the class (with all fields) to create data, while others will use the newer version (with fields removed/added). I am trying to see what the best way to do this is so that the change is backwards compatible (without breaking any consumer).
I have a couple of suggestions on how to do this, but I am having a difficult time picking one over the other.
Requirements:
The data stored has to be in binary.
The length of the serialized record is the same in both versions
Existing code
public class A implements Comparable<A>, Serializable {
    private final ByteBuffer buffer;

    public static final class Builder {
        private Header header; // header with version
        private long creationTime;
        private SomeObject someObject;   // this is removed in the next version
        private OtherObject otherObject; // this is added in the next version

        public Builder() { }

        // bunch of getters/setters for the fields

        public A build() { return new A(this); }
    }

    private A(Builder b) {
        // build the object and put it into the buffer
        validate();
    }

    private void validate() { /* validates the object */ }

    public A(ByteBuffer buf) {
        this.buffer = buf;
        validate();
    }

    public A(String encodedString) {
        this(ByteBuffer.wrap(encodedString.getBytes()));
    }

    // consumers use this to get creationTime for object A
    public long getCreationTime() {
        return buffer.getLong(OFFSET_CREATION_DATE);
    }
}
Solution 1: add the new fields in the builder and use the version in the header to decide which fields to use at build time (in the build method) to create the object. The problem with this approach is that all the methods will be visible to consumers at compile time, and unless they test their code they can't tell whether the objects they build are valid. It will be difficult to reason about which fields are required for which version at build time.
Solution 2: add a new builder to the class with the fields that you want. There will be fields that duplicate those in the existing builder.
Consumers can then use whichever builder they want. This seems cleaner because the builders will be completely independent. The problem with this approach is that since we are adding and removing fields, the fields will be at different offsets, so the getters will have to change to use an offset based on the version type. This is also problematic for future versions, because then we will have this gigantic class with lots of builders and version-specific logic in the getters.
Solution 3: create a new class (let's say B) that extends A and has its own builder. This way the code is more modular. The problem is that now there will need to be some logic somewhere to differentiate the versions and know which constructor to call. For example, if a consumer is passing base64 to get an object A, it will need to figure out which version it is.
String encodedString = "some String form of A";
A a = new A(encodedString);
Is there a recommended way to code these data classes with builder patterns to make them both future- and backwards-compatible?
Approach 2 combined with approach 1, plus a correct binary representation, is the answer. As for choosing a correct format for your binary representation, the simplest thing would be to pick JSON. Make concrete builders for the V1 and V2 objects and use the byte buffer to construct them. Each builder/factory would be interested only in the fields it recognizes. You may consider using a version field; if a builder/factory attempts to deserialize the wrong version, an exception may be thrown. Each concrete builder/factory will build only objects of the version it recognizes.
Subclassing is unnecessary in my opinion. You can separate the builder/factory class from the object class. See "StreamSerializer" from Hazelcast as an example: a class completely external to the entity, dedicated only to doing the marshalling.
Using a proper format will fix your problem with offsets from approach 2. If you must have it in binary form, then a workaround would be to use a flat format where the record size is bigger than necessary and you have reserved free space for changes. In the old COBOL days this is how they did it; I don't recommend you do that, though. Use JSON :) it is the simplest, if maybe not the most efficient option. You can also check out Protocol Buffers: https://developers.google.com/protocol-buffers/
Depending on what layout you choose for serialization, when demarshalling you may configure a chain of responsibility that attempts to deserialize a stream portion; a sketch follows below. When you deprecate a version, its marshaller is simply removed/deactivated from the chain.
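A minimal sketch of that chain-of-responsibility idea (the VersionedFactory interface, the factory names, and the header-peeking logic are my own illustration, not from Hazelcast or the original post):
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical per-version factory: each implementation recognizes exactly one version.
interface VersionedFactory {
    boolean supports(int version);
    A fromBuffer(ByteBuffer buf);
}

class ChainedDeserializer {
    private final List<VersionedFactory> chain;

    ChainedDeserializer(List<VersionedFactory> chain) {
        this.chain = chain;
    }

    A deserialize(ByteBuffer buf) {
        int version = buf.getInt(0); // assumes the version is the first header field
        return chain.stream()
                .filter(factory -> factory.supports(version))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("Unsupported version: " + version))
                .fromBuffer(buf);
    }
}

// Usage (V1Factory and V2Factory are hypothetical implementations):
// A a = new ChainedDeserializer(List.of(new V1Factory(), new V2Factory())).deserialize(buf);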
I have noticed that many of the library classes ("ArrayList", "String", even the exceptions) have a serialVersionUID. Why have they been made like this? What's the practical use of doing that? FYI, I am familiar with the concept of serialization; please point out the practical purpose of it.
For your reference, here is the serialVersionUID for ClassCastException:
public class ClassCastException extends RuntimeException {
private static final long serialVersionUID = -9223365651070458532L;
Where is the state of these objects going to persist? And from where will their state be retrieved?
I am currently working on a project where we are making REST controllers whose input and output parameters will be JSON. We are creating simple POJOs for the i/p and o/p parameters. I have seen people making those POJOs serializable. What's the point in doing that?
But I haven't seen out.readObject or out.writeObject, which are used to read and write the state of an object. Will the POJO's state persist just by making it serializable? If yes, where will it be stored?
If you want the full story, read the spec: Java Object Serialization Specification.
[...] many of the library classes ("ArrayList", "String", even the exceptions) have a serialVersionUID. Why have they been made like this?
To support backwards compatibility when reading objects that were written in an older version of the class. See Stream Unique Identifiers.
Where is the state of these objects going to persist?
Wherever you decide. See Writing to an Object Stream.
And from where will their state be retrieved?
Wherever you put it. See Reading from an Object Stream.
[...] input and output parameters will be JSON. [...] I have seen people making those POJOs serializable. What's the point in doing that?
None. JSON is not using Java serialization. Java serialization creates a binary stream. JSON creates text.
Will the POJO's state persist just by making it serializable? If yes, where will it be stored?
No, see above.
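If you did want plain Java serialization, here is a minimal sketch of deciding both places yourself (the file name state.bin is arbitrary):
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

public class PersistDemo {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // You decide where the state persists: here, a file on disk.
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("state.bin"))) {
            out.writeObject(new ArrayList<>(List.of("a", "b")));
        }
        // And you decide where it is retrieved from: the same file.
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("state.bin"))) {
            Object restored = in.readObject();
            System.out.println(restored); // prints [a, b]
        }
    }
}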
I have quite a complex object structure (with a bunch of primitive fields and object references) and want to test all fields except a few of them. As an example:
ComplexObject actual = generateMagically("someInput");
ComplexObject expected = ActualFunction.instance.workMagically(actual);
// we want to be sure that workMagically() would create a new ComplexObject
// with some fields are different than "actual" object.
// assertThat(actual, samePropertyValuesAs(expected)); would check all fields.
// what I want is actually; - notice that "fieldName1" and "fieldName2" are
// primitives belong to ComplexObject
assertThat(actual, samePropertyValuesExceptAs(expected, "fieldName1", "fieldName2"));
Since I don't want to check all fields manually, I believe there must be a way to write that test elegantly. Any ideas?
Cheers.
You should have a look at Shazamcrest, a great Hamcrest extension that offers what you need.
assertThat(actual, sameBeanAs(expected).ignoring("fieldName1").ignoring("fieldName2"));
See https://github.com/shazam/shazamcrest#ignoring-fields
Just pass the list of properties to ignore as the 2nd parameter to samePropertyValuesAs.
Hamcrest matcher API
public static <B> Matcher<B> samePropertyValuesAs(B expectedBean, String... ignoredProperties)
e.g.
samePropertyValuesAs(salesRecord,"id")
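Applied to the question's example (the ignoredProperties overload is available in Hamcrest 2.x):
assertThat(actual, samePropertyValuesAs(expected, "fieldName1", "fieldName2"));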
In general I see two solutions, if you can modify ComplexObject yourself.
You could introduce an interface that represents the properties of ComplexObject that are being changed by ActualFunction. Then you can test that all properties of that new interface have changed; see the sketch below. This would require that ComplexObject implements that new interface.
Another approach would be to replace the properties of ComplexObject that are changed by ActualFunction with a new property of a new type that contains all those properties. A better design would then be to let ActualFunction return an instance of the new type.
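A rough sketch of the first idea (the interface name and properties are illustrative only, borrowed from the question's fieldName1/fieldName2):
// Hypothetical interface grouping exactly the properties that ActualFunction changes.
interface MagicallyChanged {
    int getFieldName1();
    String getFieldName2();
}

// ComplexObject implements MagicallyChanged, so the test can target just these:
// assertThat(actual.getFieldName1(), not(equalTo(expected.getFieldName1())));
// assertThat(actual.getFieldName2(), not(equalTo(expected.getFieldName2())));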
The last time I had similar requirements, I came to the conclusion that manually writing both the code and the tests asserting that some values are updated is inherently fragile and error-prone.
I externalized the fields into a bag object and generated the Java source files for both the bag class itself and the copier at compile time. This way you can test actual code (the generator) and have the actual definition of the domain in exactly one place, so the copy code can't be out of date.
The language used to describe the properties can be anything you are comfortable with, from JSON Schema to XML to Java itself (a Java example follows; the custom annotations are to be consumed by the generator):
// Marker annotation consumed by the source generator
@interface Prop { }

public class MyBag {
    @Prop public int oh;
    @Prop public String yeah;
}
How does a serialization tool (e.g. Hessian) deserialize a class of a different version with the same serialVersionUID? In most cases, it can skip unknown (not found in the class loader) fields and stay compatible. But last time, I tried appending a new field of type Map<String, Object> and putting some unknown object into the map, and it threw a ClassNotFoundException.
Why can't it skip the map like the other fields?
Is this a problem with the tool's implementation or with the serialization mechanism?
This would depend on the tool itself. serialVersionUID is intended for use by Java's built-in serializer (ObjectOutputStream) which, as best I can tell from reading the Hessian source, is not used by Hessian.
For Hessian specifically, the best source I can find which mentions these kinds of changes is this email:
At least for Hessian, it's best to think of versioning as a set of types of changes that can be handled.
Specifically, Hessian can manage the following kinds of changes:
1) If you add or drop a field, the side that doesn't understand the field will ignore it.
2) Some field type changes are possible, if Hessian can convert (e.g. int to long).
3) There's some flexibility on map (bean) types, depending on how much information Hessian has (which is a reason to prefer concrete types).
So, if the sender sends an untyped map {"field1", 10} and the target is known to be MyValue { int field1; }, then Hessian can map the fields.
But it cannot manage things like:
1) Field name changes (the data will be dropped).
2) Class name changes where the target is underdefined, like Object field1. If you send a MyValue2 as the new field1, when the previous version was MyValue1, Hessian can't make that automatic transition. (But as with #3 above, a "MyValue2 field1" would give Hessian enough information to translate.)
3) Class splits, e.g. creating a subclass and pushing some fields into it.
4) Map-to-list or list-to-map changes.
Basically, I don't think Hessian intends to support unknown types in maps.
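To illustrate the "prefer concrete types" point, a sketch with invented class names (not from the Hessian docs):
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Fragile: the map values are underdefined (Object), so the receiver must
// have every value's class on its classpath, or deserialization fails.
class PayloadLoose implements Serializable {
    Map<String, Object> extras = new HashMap<>();
}

// Safer: a concrete value type gives Hessian enough information to convert.
class PayloadTyped implements Serializable {
    Map<String, String> extras = new HashMap<>();
}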
If a similar question is already posted on Stack Overflow, please just post the link.
What is the need to implement the Serializable interface (which has no methods) for objects which are to be serialized?
The Java API says that if it's not implemented, a java.io.NotSerializableException will be thrown.
That's because of the following code in ObjectOutputStream.java
private void writeObject0(Object obj, boolean unshared) throws IOException {
    ...
    } else if (cl.isArray()) {
        writeArray(obj, desc, unshared);
    } else if (obj instanceof Serializable) {
        writeOrdinaryObject(obj, desc, unshared);
    } else {
        throw new NotSerializableException(cl.getName());
    }
    ...
}
But my question is: why is it necessary to implement Serializable and thereby inform Java/the JVM that a class can be serialized? (Is it only to avoid the exception?)
If that is the case, and we write similar functionality that writes objects to streams without checking whether the class is an instance of Serializable, will objects of a class not implementing Serializable be serialized?
Any help is appreciated.
It's a good question. Serializable is known as a marker interface, and can be viewed as a tag on a class to identify it as having certain capabilities or behaviours. E.g. you can use it to identify classes that you want to serialise but that don't have a serialVersionUID defined (which may be an error).
Note that the commonly used serialisation library XStream (and others) don't require Serializable to be implemented.
It is needed so that the JVM can know whether or not a class can be safely serialized. Some things (database connections for example) contain state or connections to external resources that cannot really be serialized.
Also, you'll want to make sure that you put a serialVersionUID member in every serializable class to ensure that serialized objects can be de-serialized after a code change or recompile:
// Set to some arbitrary number.
// Change if the definition/structure of the class changes.
private static final long serialVersionUID = 1;
Serialization allows you to save objects directly to binary files without having to convert them to text, write the string out, and then create a new object and parse the string input when reading back in. The primary purpose is to allow you to save objects with all their data to a binary file. I found it extremely useful when working with linked lists containing lots of objects of the same type that I needed to save and open again.
The reason is that not all classes can be serialized. Examples:
I/O stuff: InputStream, HTTP connections, channels. They depend on objects created outside the scope of the Java VM and there is no simple way to restore them.
OS resources like windows, images, etc.
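A common way to handle such fields, sketched here with invented names, is to mark them transient so the rest of the object can still be serialized:
import java.io.InputStream;
import java.io.Serializable;

class Download implements Serializable {
    private static final long serialVersionUID = 1L;

    private String url;                 // plain data: serialized normally
    private transient InputStream body; // OS/I-O resource: skipped by serialization

    // After deserialization 'body' is null and must be re-opened, e.g. from 'url'.
}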