Writing data classes with builder pattern - java

I have a data class that uses a builder to create the object and stores the data in a buffer in serialized form. I am planning to change the class to add and remove some fields. Some systems will create data with the current version of the class (all fields) and others with the newer version (fields removed/added). What is the best way to do this so that it stays backward compatible (without breaking any consumer)?
I have a couple of ideas on how to do this, but I am having a difficult time picking one over the others.
Requirements:
The data stored has to be in binary.
The length of the serialized record is the same in both versions.
Existing code
public class A implements Comparable<A>, Serializable {
    private final ByteBuffer buffer;

    public static final class Builder {
        private Header header; // header with version
        private long creationTime;
        private SomeObject someObject;   // this is removed in next version
        private OtherObject otherObject; // this is added in next version

        public Builder() { }

        // bunch of getters/setters for fields

        public A build() { return new A(this); }
    }

    private A(Builder b) {
        // build the object and put it into the buffer
        validate();
    }

    private void validate() { /* validates the object */ }

    public A(ByteBuffer buf) {
        this.buffer = buf;
        validate();
    }

    public A(String encodedString) {
        // consumers pass the record as a Base64 string
        this(ByteBuffer.wrap(Base64.getDecoder().decode(encodedString)));
    }

    // consumers use this to get creationTime for object A
    public long getCreationTime() {
        return buffer.getLong(OFFSET_CREATION_DATE);
    }
}
Solution 1: Add the new fields to the builder and use the version in the header to decide, at build time (in the build method), which fields to use to create the object. The problem with this approach is that all the methods exist at compile time for consumers, and unless they test their code, every object will look valid. So it will be difficult to reason about which fields are required for which version at build time.
Solution 2: Add a new builder to the class with the fields that you want. It will duplicate some fields of the existing builder.
Consumers can then use the builder that they want. This seems cleaner because the builders are completely independent. The problem with this approach is that, since we are adding and removing fields, fields will sit at different offsets, so the getters will have to pick an offset based on versionType. This is also problematic for future versions, because we will end up with a gigantic class with lots of builders and per-version logic in the getters.
Solution 3: Create a new class (let's say B) that extends A and has its own builder. This way the code is more modular. The problem is that there will now need to be some logic somewhere to tell the versions apart and know which constructor to call. For example, if a consumer is passing Base64 to get an object A, it will need to figure out which version it is.
String encodedString = "someString form of A";
A a = new A(encodedString);
Is there a recommended way to code these data classes with the builder pattern so that they are both forward and backward compatible?

Approach 2 combined with approach 1, plus the right binary representation, is the answer. Start by choosing the right format for your binary representation; the simplest thing would be to pick JSON. Make concrete builders for V1 and V2 objects and use the byte buffer to construct them. Each builder/factory would be interested only in the fields it recognizes. Consider using a version field, so that an exception can be thrown if a builder/factory attempts to deserialize the wrong version. Each concrete builder/factory builds only objects of the version it recognizes.
Subclassing is unnecessary in my opinion. You can separate the builder/factory class from the object class. See "StreamSerializer" from Hazelcast as an example: a class completely external to the entity, dedicated only to marshaling.
Using a proper format will fix the offset problem from approach 2. If you must have it in binary form, then a workaround would be to use a flat format where the record size is bigger than necessary, so that you have reserved free space for changes. This is how it was done in the old COBOL days. I don't recommend that, though. Use JSON :) It is the simplest, though maybe not the most efficient. You can also check Protocol Buffers: https://developers.google.com/protocol-buffers/
Depending on the layout you choose for serialization, when unmarshaling you may configure a chain of responsibility in which each link attempts to deserialize a portion of the stream. When you deprecate a version, its marshaler is simply removed/deactivated from the chain.
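If the format really has to stay binary and fixed-length, the version field plus per-version decoders described above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class name VersionedRecordSketch, the 24-byte record length, the field offsets, and the stand-in fields someValue and otherValue are not taken from the question.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: names, offsets and the record length are illustrative only.
final class VersionedRecordSketch {
    static final int RECORD_LENGTH = 24; // same length for every version (spare bytes reserved)
    static final int OFFSET_VERSION = 0; // one-byte version header that never moves

    // V1 layout: [version][someValue: long][creationTime: long][padding]
    static final int V1_OFFSET_SOME_VALUE = 1;
    static final int V1_OFFSET_CREATION_TIME = 9;
    // V2 layout: [version][creationTime: long][otherValue: int][padding]
    static final int V2_OFFSET_CREATION_TIME = 1;
    static final int V2_OFFSET_OTHER_VALUE = 9;

    /** Version-independent view handed to consumers. */
    static final class Record {
        final int version;
        final long creationTime;

        Record(int version, long creationTime) {
            this.version = version;
            this.creationTime = creationTime;
        }
    }

    static ByteBuffer encodeV1(long someValue, long creationTime) {
        ByteBuffer buf = ByteBuffer.allocate(RECORD_LENGTH);
        buf.put(OFFSET_VERSION, (byte) 1);
        buf.putLong(V1_OFFSET_SOME_VALUE, someValue);
        buf.putLong(V1_OFFSET_CREATION_TIME, creationTime);
        return buf;
    }

    static ByteBuffer encodeV2(long creationTime, int otherValue) {
        ByteBuffer buf = ByteBuffer.allocate(RECORD_LENGTH);
        buf.put(OFFSET_VERSION, (byte) 2);
        buf.putLong(V2_OFFSET_CREATION_TIME, creationTime);
        buf.putInt(V2_OFFSET_OTHER_VALUE, otherValue);
        return buf;
    }

    // One decoder per supported version; retiring a version means removing its entry.
    private static final Map<Byte, Function<ByteBuffer, Record>> DECODERS = new HashMap<>();
    static {
        DECODERS.put((byte) 1, buf -> new Record(1, buf.getLong(V1_OFFSET_CREATION_TIME)));
        DECODERS.put((byte) 2, buf -> new Record(2, buf.getLong(V2_OFFSET_CREATION_TIME)));
    }

    /** Reads the version byte first, then delegates to the matching decoder. */
    static Record decode(ByteBuffer buf) {
        if (buf.capacity() != RECORD_LENGTH) {
            throw new IllegalArgumentException("Unexpected record length: " + buf.capacity());
        }
        byte version = buf.get(OFFSET_VERSION);
        Function<ByteBuffer, Record> decoder = DECODERS.get(version);
        if (decoder == null) {
            throw new IllegalArgumentException("Unsupported record version: " + version);
        }
        return decoder.apply(buf);
    }
}
```

With this shape a consumer only ever calls decode(...) and never has to know which builder produced the bytes, which is the dispatch problem raised under Solution 3. The per-version offsets live next to the code that uses them instead of inside shared getters.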


How to get XStream to map old deleted field to a new one of a different type?

We've made changes to our object model, and hence the XML, such that some boolean fields are no longer used and are superseded by enums.
For example, say the object had
@XStreamAlias("showABC")
private boolean showABC;
and that's now superseded by
@XStreamAlias("showOption")
private ShowOptions showOption;
I'd like to remove the defunct booleans but need to be able to read old XML that still contains them, as the new field needs to be initialised based on the value of the old.
Currently we just leave the old fields behind and mark them @Deprecated so they can still be read, and have readResolve() do the initialisation, e.g.
if (showABC == false) {
showOption = ShowOptions.NONE;
}
Leaving these fields around is ugly as hell though, polluting the XML!
Is there a way to delete the fields and create a converter to do the right thing? I don't want to create a converter that manually reads all the fields as there's LOTS of them.
Currently leaning towards another ugly approach of extending AbstractReflectionConverter and munging about in doUnmarshal() so I can have handleUnknownField() do something more useful.
But is there a better solution? Perhaps through the use of Mappers, although I can't find any details in the docs/tutorials, only by going straight to the source.
UPDATE
Whilst looking at the use of Mappers in AbstractReflectionConverter I thought I might get lucky by making the old field an alias of the new and having a converter that can deal with the boolean and the enum:
xstream.aliasField("showABC", MyModel.class, "showOption");
xstream.registerLocalConverter(MyModel.class, "ShowOptions", new ShowConverter());
But wouldn't you know, this throws a DuplicateFieldException so this approach would need munging to allow overwriting (as of course we've got intermediate XML with both the old and new in it o_O); fortunately new fields are declared after the old so ordering is ok..
I'd have thought this kind of conversion requirement isn't uncommon?!

Are there good alternatives for serializing enums in Java?

The Java language benefited much from adding enums to it; but unfortunately they don't work well when sending serialized objects between systems that have different code levels.
Example: assume that you have two systems A and B. They both start off with the same code levels, but at some point they start to see code updates at different points in time. Now assume that there is some
public enum Whatever { FIRST; }
And there are other objects that keep references to constants of that enum. Those objects are serialized and sent from A to B or vice versa. Now consider that B has a newer version of Whatever
public enum Whatever { FIRST, SECOND }
Then:
class SomethingElse implements Serializable { ...
private final Whatever theWhatever;
SomethingElse(Whatever theWhatever) {
this.theWhatever = theWhatever; ..
gets instantiated ...
SomethingElse somethin = new SomethingElse(Whatever.SECOND);
and then serialized and sent over to A (for example as result of some RMI call). Which is bad, because now there will be an error during deserialization on A: A knows the Whatever enum class, but in a version that doesn't have SECOND.
We figured this out the hard way; and now I am very hesitant to use enums for situations that would actually be "perfect for enums", simply because I know that I can't easily extend an existing enum later on.
Now I am wondering: are there (good) strategies to avoid such compatibility issues with enums? Or do I really have to go back to "pre-enum" times and, instead of enums, rely on a solution where I use plain strings all over the place?
Update: please note that using the serialVersionUID doesn't help here at all. That thing only helps you in making an incompatible change "more obvious". But the point is: I don't care why deserialization fails, because I have to keep it from failing at all. And I am also not in a position to change the way we serialize our objects. We are doing RMI, and we are serializing to binary; I have no means to change that.
As @Jesper mentioned in the comments, I would recommend something like JSON for your inter-service communication. This will allow you to have more control over how unknown enum values are handled.
For example, using the always awesome Jackson you can use the Deserialization Features READ_UNKNOWN_ENUM_VALUES_AS_NULL or READ_UNKNOWN_ENUM_VALUES_USING_DEFAULT_VALUE. Both will allow your application logic to handle unknown enum values as you see fit.
Example (straight from the Jackson doc)
enum MyEnum { A, B, @JsonEnumDefaultValue UNKNOWN }
...
final ObjectMapper mapper = new ObjectMapper();
mapper.enable(DeserializationFeature.READ_UNKNOWN_ENUM_VALUES_USING_DEFAULT_VALUE);
MyEnum value = mapper.readValue("\"foo\"", MyEnum.class);
assertSame(MyEnum.UNKNOWN, value);
After going back and forth between different solutions, I settled on one based on the suggestion from @GuiSim: one can build a class that contains an enum value. This class can
- do custom deserialization, so I can make sure there won't be exceptions during the deserialization process
- provide simple methods like isValid() and getEnumValue(): the first tells you whether the enum deserialization actually worked, and the second returns the deserialized enum (or throws an exception)
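A minimal sketch of such a wrapper, under the assumption that it is acceptable to send the constant's name as a String instead of the enum itself. The names EnumHolderSketch and WhateverHolder are made up for illustration, and the String constructor exists only to simulate a sender whose enum already has SECOND.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch of the "class that contains an enum value" idea.
final class EnumHolderSketch {

    // The receiver's (older) view of the enum: it only knows FIRST.
    enum Whatever { FIRST }

    /** Serializes the constant's name, so an unknown name cannot break deserialization. */
    static final class WhateverHolder implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String constantName;

        WhateverHolder(Whatever value) {
            this(value.name());
        }

        // Assumption: only here to simulate a sender that has a newer enum.
        WhateverHolder(String constantName) {
            this.constantName = constantName;
        }

        /** True if the receiver's enum knows the constant that was sent. */
        boolean isValid() {
            try {
                Whatever.valueOf(constantName);
                return true;
            } catch (IllegalArgumentException e) {
                return false;
            }
        }

        /** Returns the constant, or throws IllegalArgumentException if it is unknown here. */
        Whatever getEnumValue() {
            return Whatever.valueOf(constantName);
        }
    }

    /** Standard Java serialization round trip, as RMI would do between two systems. */
    static WhateverHolder roundTrip(WhateverHolder holder) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(holder);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (WhateverHolder) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because only a String travels over the wire, the binary RMI serialization stays as it is; the receiver decides what to do when isValid() returns false.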

Hamcrest - Elegant way to test complex object with samePropertyValuesAs

I have quite a complex object structure (with a bunch of primitive fields and object references) and want to test all fields except a few of them. As an example:
ComplexObject actual = generateMagically("someInput");
ComplexObject expected = ActualFunction.instance.workMagically(actual);
// we want to be sure that workMagically() would create a new ComplexObject
// with some fields are different than "actual" object.
// assertThat(actual, samePropertyValuesAs(expected)); would check all fields.
// what I want is actually; - notice that "fieldName1" and "fieldName2" are
// primitives belong to ComplexObject
assertThat(actual, samePropertyValuesExceptAs(expected, "fieldName1", "fieldName2"));
Since I don't want to check all fields manually, I believe there must be a way to write that test elegantly. Any ideas?
Cheers.
You should have a look at shazamcrest, a great Hamcrest extension that offers what you need.
assertThat(actual, sameBeanAs(expected).ignoring("fieldName1").ignoring("fieldName2"));
See https://github.com/shazam/shazamcrest#ignoring-fields
Just pass the list of properties to ignore as 2nd parameter to samePropertyValuesAs.
Hamcrest matcher API
public static <B> Matcher<B> samePropertyValuesAs(B expectedBean, String... ignoredProperties)
e.g.
samePropertyValuesAs(salesRecord,"id")
In general I see two solutions if ComplexObject can be modified by yourself.
You could introduce an interface that represents the properties of ComplexObject that are being changed by ActualFunction. Then you can test that all properties of that new interface have changed. This would require that ComplexObject implements that new interface.
Another approach would be to replace the properties of ComplexObject that are changed by ActualFunction with a new property of a new type that contains all those properties. A better design would then be to let ActualFunction return an instance of the new type.
Last time I had a similar requirement, I came to the conclusion that manually writing both code and tests to assert that some values are updated is inherently fragile and error-prone.
I externalized the fields in a bag object and generated the Java source files for both the bag class itself and the copier at compile time. This way you can test actual code (the generator) and have the actual definition of the domain in exactly one place, so the copy code can't be out of date.
The language used to describe the properties can be anything you are comfortable with, from JSON Schema to XML to Java itself (a Java example follows; the custom annotations are to be consumed by the generator)
public class MyBag {
    @Prop public int oh;
    @Prop public String yeah;
}

Implementing my own serialization in java

How can I implement serialization on my own? Meaning, I don't want my class to implement Serializable; I want to implement serialization myself, so that without implementing Serializable I can transfer objects over the network or write them to a file and later retrieve them in the same state. I want to do this because I want to learn and explore.
Serialization is the process of translating the structure of an object into another format that can easily be transferred across a network or stored in a file. Java serializes objects into a binary format. This is not necessary if bandwidth/disk space is not a problem. You can simply encode your objects as XML:
// Code is for illustration purposes only, I haven't compiled it!!!
public class Person {
    private String name;
    private int age;
    // ...

    public String serializeToXml() {
        StringBuilder xml = new StringBuilder();
        xml.append("<person>");
        xml.append("<attribute name=\"age\" type=\"int\">").append(age);
        xml.append("</attribute>");
        xml.append("<attribute name=\"name\" type=\"string\">").append(name);
        xml.append("</attribute>");
        xml.append("</person>");
        return xml.toString();
    }
}
Now you can get an object's XML representation and "serialize" it to a file or a network connection. A program written in any language that can parse XML can "deserialize" this object into its own data structure.
If you need a more compact representation, you can think of binary encoding:
// A naive binary serializer (a method of Person).
// Requires: java.io.ByteArrayOutputStream, java.nio.ByteBuffer,
// java.nio.charset.StandardCharsets.
public byte[] serializeToBytes() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    // Object name and number of attributes.
    writeString("Person", bytes);
    bytes.write(2); // number of attributes (a single byte)
    // Serialize age
    writeString("age", bytes);
    bytes.write(1); // type = 1 (i.e., int)
    writeString(Integer.toString(age), bytes);
    // Serialize name
    writeString("name", bytes);
    bytes.write(2); // type = 2 (i.e., string)
    writeString(name, bytes);
    return bytes.toByteArray();
}

// Writes the 4-byte length of the UTF-8 encoded string, then the bytes themselves.
private static void writeString(String s, ByteArrayOutputStream bytes) {
    byte[] data = s.getBytes(StandardCharsets.UTF_8);
    byte[] length = ByteBuffer.allocate(4).putInt(data.length).array();
    bytes.write(length, 0, length.length);
    bytes.write(data, 0, data.length);
}
To learn about a more compact binary serialization scheme, see the Java implementation of Google Protocol Buffers.
You can use Externalizable and implement your own serialization mechanism. One of the difficult aspects of serialization is versioning so this can be a challenging exercise to implement. You can also look at protobuf and Avro as binary serialization formats.
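As a rough illustration of the Externalizable route: the class writes and reads its own fields, and a leading version byte gives later revisions something to branch on. The names ExternalizableSketch, Person and FORMAT_VERSION are hypothetical, and the single-version check is only one possible policy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical sketch: a class that controls its own wire format via Externalizable.
final class ExternalizableSketch {

    public static final class Person implements Externalizable {
        private static final byte FORMAT_VERSION = 1; // assumption: bumped when the layout changes
        private String name;
        private int age;

        // Externalizable requires a public no-arg constructor for deserialization.
        public Person() { }

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeByte(FORMAT_VERSION);
            out.writeUTF(name);
            out.writeInt(age);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            byte version = in.readByte();
            if (version != FORMAT_VERSION) {
                throw new IOException("Unsupported format version: " + version);
            }
            name = in.readUTF();
            age = in.readInt();
        }

        String name() { return name; }
        int age() { return age; }
    }

    /** Writes the object to a byte array and reads it back. */
    static Person roundTrip(Person person) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(person);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Person) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This still goes through ObjectOutputStream, so it is a halfway house: the stream framing is Java's, but the field layout is yours.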
You start with reflection. Get the object's class and the declared fields of its class and all superclasses. Then obtain the value of each field and write it to the dump.
When deserializing, just reverse the process: get the class name from your serialized form, instantiate an object and set its fields according to the dump.
That's the simplistic approach if you just want to learn. There are many issues that can come up if you want to do it "for real":
Versioning. What if one end of the application is running new version, but the other end has an older class definition with some fields missing or renamed?
Overwriting default behavior. What if some object is more complex and cannot be recreated on a simple field-by-field basis?
Recreating dependencies between objects, including cyclic ones.
... and probably many more.
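The simplistic reflection approach described above might be sketched like this. It dumps into an in-memory Map rather than a byte stream to stay short, and it assumes the class has a no-arg constructor. ReflectionDumpSketch, Base and Sample are made-up names, and none of the listed issues (versioning, cycles, custom behaviour) are handled.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical learning sketch: field-by-field dump and restore via reflection.
final class ReflectionDumpSketch {

    /** Collects every non-static, non-transient field of the class and its superclasses. */
    static Map<String, Object> dump(Object obj) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Class<?> c = obj.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int mods = f.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                    continue;
                }
                f.setAccessible(true); // needed for private fields
                try {
                    // Key includes the declaring class so shadowed fields do not collide.
                    out.put(c.getName() + "." + f.getName(), f.get(obj));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return out;
    }

    /** Reverses dump(): instantiates via the no-arg constructor, then sets each field. */
    static <T> T restore(Class<T> type, Map<String, Object> dump) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T obj = ctor.newInstance();
            for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    String key = c.getName() + "." + f.getName();
                    if (!dump.containsKey(key)) {
                        continue;
                    }
                    f.setAccessible(true);
                    f.set(obj, dump.get(key));
                }
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Small example hierarchy used to exercise the superclass walk.
    static class Base {
        private long id;
        Base() { }
        Base(long id) { this.id = id; }
        long id() { return id; }
    }

    static final class Sample extends Base {
        private String name;
        private int age;
        private Sample() { }
        Sample(String name, int age, long id) {
            super(id);
            this.name = name;
            this.age = age;
        }
        String name() { return name; }
        int age() { return age; }
    }
}
```

Note that restore(...) runs the no-arg constructor, which is exactly where this approach differs from what the built-in mechanism does for Serializable classes.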
Get the Java source code and understand how serialization is implemented. I did this some months ago, and now have a serialization that uses only 16% of the space and 20% of the time of "normal" serialization, at the cost of assuming that the classes that wrote the serialized data have not changed. I use this for client-server serialization, where I can rely on this assumption.
As a supplement to @Konrad Garus' answer: there is one issue that is a show-stopper for a full reimplementation of Java serialization.
When you deserialize an object, you need to use one of the object's class's constructors to recreate an instance. But which constructor should you use? If there is a no-args constructor, you could conceivably use that. However, the no-args constructor (or indeed any constructor) might do something with the object in addition to creating it. For example, it might send a notification to something else that a new instance has been created ... passing the instance that isn't yet completely deserialized.
In fact, it is really difficult to replicate what the standard Java deserialization code does. What it does is this:
1. It determines the class to be created.
2. It creates an instance of the class without calling any of its constructors.
3. It uses reflection to fill in the instance's fields, including private fields, with objects and values reconstructed from the serialization.
The problem is that step 2 involves some "black magic" that a normal Java class is not permitted to do.
(If you want to understand the gory details, read the serialization spec and take a look at the implementation in the OpenJDK codebase.)
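The constructor-skipping behaviour is easy to observe with the built-in mechanism. In the sketch below (class and field names are made up), a counter is incremented in the only constructor of a Serializable class; after a round trip through ObjectOutputStream/ObjectInputStream there is a second instance, but the counter has not moved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo: standard deserialization does not run the Serializable class's constructor.
final class NoConstructorOnDeserializeSketch {

    static final class Tracked implements Serializable {
        private static final long serialVersionUID = 1L;
        static int constructorCalls = 0; // static, so not part of the serialized state
        final String label;

        Tracked(String label) {
            this.label = label;
            constructorCalls++; // side effect we can watch for
        }
    }

    /** Serializes and deserializes the object with the standard Java mechanism. */
    static Tracked roundTrip(Tracked original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Tracked) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Even the final field label is populated on the copy, which an ordinary class could not do without going through a constructor or reflection.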

Why is it required to mark a class as Serializable?

If a similar question is already posted on Stack Overflow, please just post the link.
Why is it necessary to implement the Serializable interface (which has no methods) for objects that are to be serialized?
The Java API says -
- If it's not implemented then it will throw java.io.NotSerializableException.
That's because of the following code in ObjectOutputStream.java
............................
writeObject0(Object obj, boolean unshared){
.............
} else if (cl.isArray()) {
writeArray(obj, desc, unshared);
} else if (obj instanceof Serializable) {
writeOrdinaryObject(obj, desc, unshared);
} else {
throw new NotSerializableException(cl.getName());
}
................
But my question is why it's necessary to implement Serializable and thereby tell Java/the JVM that a class can be serialized. (Is it only to avoid the exception?)
If that is the case, and we write similar functionality that writes objects to streams without checking whether the class is an instanceof Serializable, will objects of a class not implementing Serializable be serialized?
Any help is appreciated.
It's a good question. Serializable is known as a marker interface, and can be viewed as a tag on a class that identifies it as having certain capabilities or behaviours. E.g. you can use it to identify classes that you want to serialise but that don't have serialVersionUID defined (which may be an error).
Note that the commonly used serialisation library XStream (and others) don't require Serializable to be defined.
It is needed so that the JVM can know whether or not a class can be safely serialized. Some things (database connections for example) contain state or connections to external resources that cannot really be serialized.
Also, you'll want to make sure that you put a serialVersionUID member in every serializable class to ensure that serialized objects can be de-serialized after a code change or recompile:
// Set to some arbitrary number.
// Change if the definition/structure of the class changes.
private static final long serialVersionUID = 1L;
The serialization allows you to save objects directly to binary files without having to parse them to text, write the string out and then create a new object, and parse the string inputs when reading back in. The primary purpose is to allow you to save objects with all their data to a binary file. I found it to be extremely useful when having to work with linked lists containing lots of objects of the same type and I needed to save them and open them.
The reason is that not all classes can be serialized. Examples:
I/O stuff: InputStream, HTTP connections, channels. They depend on objects created outside the scope of the Java VM and there is no simple way to restore them.
OS resources like windows, images, etc.
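The marker check from the question can be observed directly. In this small sketch (all names are made up), two otherwise identical classes are written with ObjectOutputStream; only the one that implements Serializable gets through, and the other triggers the NotSerializableException branch quoted from writeObject0.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical demo of the Serializable marker check in ObjectOutputStream.
final class MarkerInterfaceSketch {

    // Same shape, but no marker interface.
    static final class Plain {
        int value = 1;
    }

    // Opted in to serialization via the marker interface.
    static final class Marked implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 1;
    }

    /** Returns true if standard Java serialization accepts the object. */
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false; // the class (or one of its fields) lacks the marker
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Here canSerialize(new Marked()) succeeds while canSerialize(new Plain()) does not, even though both hold nothing but an int.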
