Data Model Evolution

When writing code, I keep running into requirements to change data models (e.g. adding, changing, or removing data members of a class). When these data models are part of an interface, it is difficult to change them without breaking existing client code. So I am wondering whether there are best practices for designing interfaces and data models so as to minimize the impact of such evolution.
The closest thing I can find on Google is data contract versioning, but that seems to be a .NET-specific topic. I am wondering whether the same practices apply in the Java world, or whether there is a different or more generic way to deal with data model evolution.
Thanks

There are some tools that can help; have a look at LiquiBase.
This article on developerWorks gives a good overview.

There are no easy answers to this in either the Java or data modeling domains.
Some changes are upwards compatible; e.g. addition of new methods, optional fields, subclasses and so on.
Some changes are not compatible, but can be handled using a simple transformation; e.g. the addition of a mandatory field could be supported by a transformation that adds an extra constructor argument.
Some changes unavoidably require major programmer intervention.
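As a minimal sketch of the "simple transformation" case, assume a hypothetical Car class whose version 2 adds a mandatory color field. The new canonical constructor takes the extra argument, while the old signature is kept (deprecated) and supplies an explicit default, so existing callers still compile and link. The default value "unknown" is an assumption; use whatever is meaningful in your domain.

```java
import java.util.Objects;

/** Hypothetical data model: version 2 adds a mandatory 'color' field. */
class Car {
    private final int numSeats;
    private final int numDoors;
    private final String color; // new mandatory field in version 2

    /** New canonical constructor: code written against version 2 must supply a color. */
    Car(int numSeats, int numDoors, String color) {
        this.numSeats = numSeats;
        this.numDoors = numDoors;
        this.color = Objects.requireNonNull(color, "color");
    }

    /**
     * Version 1 constructor, kept so existing client code still compiles and links.
     * The default "unknown" is an assumed placeholder value.
     */
    @Deprecated
    Car(int numSeats, int numDoors) {
        this(numSeats, numDoors, "unknown");
    }

    int numSeats() { return numSeats; }
    int numDoors() { return numDoors; }
    String color() { return color; }
}
```

Keeping the old signature preserves both source and binary compatibility for existing callers, and the deprecation warning nudges them toward the new constructor.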
Another point to note is that the problem gets a lot harder when the data corresponding to the data models is persistent, and cannot be thrown away when the data model changes. This is referred to as the "schema evolution" problem, and I believe that it has been proven that there is no general solution.
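One common mitigation for persistent data (not a general solution, as noted above) is to store a schema version with each record and upgrade old records step by step when they are read. The sketch below assumes records are loaded as simple string maps; the field names and both upgrade steps are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Upgrades persisted records (modelled as string maps) to the current schema version. */
class RecordMigrator {
    static final int CURRENT_VERSION = 3;

    // Index i holds the step that upgrades a record from version i + 1 to version i + 2.
    private static final List<UnaryOperator<Map<String, String>>> STEPS = List.of(
        // v1 -> v2: hypothetical new mandatory field "color" gets a default
        r -> { r.putIfAbsent("color", "unknown"); return r; },
        // v2 -> v3: hypothetical rename of "doors" to "numDoors"
        r -> { if (r.containsKey("doors")) r.put("numDoors", r.remove("doors")); return r; }
    );

    static Map<String, String> upgrade(Map<String, String> stored) {
        Map<String, String> upgraded = new HashMap<>(stored); // never mutate the caller's map
        int version = Integer.parseInt(upgraded.getOrDefault("version", "1"));
        if (version > CURRENT_VERSION) {
            throw new IllegalArgumentException("record is newer than this code: v" + version);
        }
        for (int v = version; v < CURRENT_VERSION; v++) {
            upgraded = STEPS.get(v - 1).apply(upgraded);
        }
        upgraded.put("version", Integer.toString(CURRENT_VERSION));
        return upgraded;
    }
}
```

Each step only has to know about two adjacent versions, which keeps the migrations small; the hard cases (the "major programmer intervention" ones) still have to be written by hand.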

Related

How does MicroStream (de)serialization work?

I was wondering how the serialization of MicroStream works in detail.
Since it is described as "Super-Fast", it has to rely on code generation, right? Or is it based on reflection?
How would it perform compared to Protobuf serialization, which relies on code generation that reads directly from the Java fields and writes them into a ByteBuffer, and vice versa?
Using reflection would drastically decrease performance when serializing objects on a huge scale, wouldn't it?
I'm looking for a fast way to transmit and persist objects for a multiplayer-game and every millisecond counts. :)
Thanks in advance!
PS: Since I don't have enough reputation, I cannot create the "microstream" tag. https://microstream.one/

I am the lead developer of MicroStream.
(This is not an alias account. I really just created it. I've been reading StackOverflow for 10 years or so, but never had a reason to create an account. Until now.)
On every initialization, MicroStream analyzes the current runtime's versions of all required entity and value type classes and derives optimized metadata from them.
The same is done when encountering a class at runtime that was unknown so far.
The analysis is done via reflection, but since it is only done once for every handled class, the reflection performance cost is negligible.
The actual storing and loading, or serialization and deserialization, is done via optimized framework code based on the created metadata.
If a class layout changes, the type analysis creates a mapping from the field layout that the class's instances are stored in to that of the current class.
This happens automatically if possible (unambiguous changes or via some configurable heuristics), otherwise via a user-provided mapping. Performance stays the same, since the JVM does not care whether it (simplified speaking) copies a loaded value #3 to position #3 or to position #5. It's all in the metadata.
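The idea of mapping a stored field layout onto the current class layout can be illustrated with a small sketch. This is not MicroStream's code or API; it assumes layouts are described by field names and resolves the mapping purely by name, which corresponds to the "unambiguous changes" case (a rename would need a user-provided mapping). `LayoutMapping` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration: maps stored value positions to current field positions by name. */
class LayoutMapping {
    /**
     * Returns an array where result[i] is the current position of the field stored at position i,
     * or -1 if that field no longer exists in the current class (its stored value is dropped).
     */
    static int[] map(List<String> storedFields, List<String> currentFields) {
        Map<String, Integer> currentIndex = new HashMap<>();
        for (int i = 0; i < currentFields.size(); i++) {
            currentIndex.put(currentFields.get(i), i);
        }
        int[] mapping = new int[storedFields.size()];
        for (int i = 0; i < storedFields.size(); i++) {
            mapping[i] = currentIndex.getOrDefault(storedFields.get(i), -1);
        }
        return mapping;
    }

    /** Copies loaded values into the current layout; fields new in the current class stay null. */
    static Object[] apply(int[] mapping, Object[] storedValues, int currentFieldCount) {
        Object[] current = new Object[currentFieldCount];
        for (int i = 0; i < mapping.length; i++) {
            if (mapping[i] >= 0) {
                current[mapping[i]] = storedValues[i];
            }
        }
        return current;
    }
}
```

The mapping is computed once per class; after that, every load is a metadata-driven copy, which is why it makes no difference whether value #3 lands at position #3 or #5.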
ByteBuffers are used, more precisely direct ByteBuffers, but only as an anchor for off-heap memory to work on via direct "Unsafe" low-level operations. If you are not familiar with "Unsafe" operations, a short and simple notion is: "It's as direct and fast as C++ code.". You can do anything you want very fast and close to memory, but you are also responsible for everything. For more details, google "sun.misc.Unsafe".
No code is generated. No byte code hacking, tacit replacement of instances by proxies or similar monkey business is used. On the technical level, it's just a Java library (including "Unsafe" usage), but with a lot of properly devised logic.
As a side note: reflection is not as slow as it is commonly considered to be. Not any more. It used to be, but it was optimized considerably in some past Java version(s).
It's only slow if every operation has to do all the class analysis, field lookups, etc. anew (which an awful lot of frameworks seem to do because they are just badly written). If the fields are collected (set accessible, etc.) once and then cached, reflection is actually surprisingly fast.
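A minimal sketch of "collect the fields once, then cache them", using only standard reflection. `CachedFieldReader` is a hypothetical name, `ClassValue` is just one reasonable choice of per-class cache, and only the class's own declared fields are read (superclass fields are omitted for brevity).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

/** Reads instance field values, doing the reflective lookup only once per class. */
class CachedFieldReader {
    // ClassValue computes the field array the first time a class is seen and caches it afterwards.
    private static final ClassValue<Field[]> FIELDS = new ClassValue<Field[]>() {
        @Override
        protected Field[] computeValue(Class<?> type) {
            List<Field> result = new ArrayList<>();
            for (Field f : type.getDeclaredFields()) { // declared fields only, no superclass fields
                if (!Modifier.isStatic(f.getModifiers())) {
                    f.setAccessible(true); // done once per field, not on every read
                    result.add(f);
                }
            }
            return result.toArray(new Field[0]);
        }
    };

    static Object[] read(Object instance) {
        Field[] fields = FIELDS.get(instance.getClass()); // cached after the first call
        Object[] values = new Object[fields.length];
        try {
            for (int i = 0; i < fields.length; i++) {
                values[i] = fields[i].get(instance);
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return values;
    }
}
```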
Regarding the comparison to Protobuf-Serialization:
I can't say anything specific about it since I haven't used Protocol Buffers and I don't know how it works internally.
As usual with complex technologies, a truly meaningful comparison might be pretty difficult to do since different technologies have different optimization priorities and limitations.
Most serialization approaches give up referential consistency and only store "data" (i.e. if two objects reference a third, deserialization will create TWO instances of that third object.
Like this: A->C<-B ==serialization==> A->C1 B->C2.
This basically breaks/ruins/destroys object graphs and makes serialization of cyclic graphs impossible, since it creates an endlessly cascading replication. See JSON serialization, for example. Funny stuff.)
Even Brian Goetz' draft for a Java "Serialization 2.0" includes that limitation (see "Limitations" at http://cr.openjdk.java.net/~briangoetz/amber/serialization.html) (and another one which breaks the separation of concerns).
MicroStream does not have that limitation. It handles arbitrary object graphs properly without ruining their references.
Keeping referential consistency intact is by no means "trying to do too much", as he writes. It is "doing it properly". One just has to know how to do it properly. And it is even rather trivial if done correctly.
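The usual way to keep references intact is to remember which objects have already been written and emit a back-reference (a handle) instead of a second copy. Java's built-in serialization happens to do this within a single stream, so it makes a convenient demonstration of the principle; this says nothing about how MicroStream implements it. The `Node` class and `SharedRefDemo` helper are hypothetical.

```java
import java.io.*;

/** Hypothetical graph node; 'ref' may point to a shared node or form a cycle. */
class Node implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    Node ref;

    Node(String name) { this.name = name; }
}

class SharedRefDemo {
    /** Writes all roots in one stream and reads them back. */
    static Node[] roundTrip(Node... roots) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                // A single writeObject call: an object seen before is written as a handle,
                // so shared nodes stay shared and cycles terminate.
                out.writeObject(roots);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Node[]) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the A->C<-B example, the copies of A and B still point at one and the same C after the round trip, and a cycle such as C->A comes back as a cycle rather than an endless cascade.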
So, depending on how many limitations Protobuf-Serialization has ("pacts with the devil"), it might be hardly or even not at all comparable to MicroStream in general.
Of course, you can always create some performance comparison tests for your particular requirements and see which technology suits you best. Just make sure you are aware of the limitations a certain technology imposes on you (ruined referential consistency, forbidden types, required annotations, required default constructor / getters / setters, etc.).
MicroStream has none*.
(*) within reason: Serializing/storing system-internals (e.g. Thread) or non-entities (like lambdas or proxy instances) is, while technically possible, intentionally excluded.

DRY Principle: Angular2/Typescript and Java back end object duplication

I'm a Java developer but I've recently begun learning Angular2/Typescript. I've worked with Angular 1.x before so I'm not a complete noob :)
While working through a POC with a RESTful Spring Boot back end and Angular2 front end I noticed myself duplicating model objects on both sides a lot e.g.
Java Object
public class Car {
    private Double numSeats;
    private Double numDoors;
    .....
}
Now, in the interest of TypeScript being strongly typed, I'd create a similar object within my front-end project:
export interface CarVO {
    numSeats: number;
    numDoors: number;
}
I'm duplicating the work and constantly violating the DRY (Don't Repeat Yourself) principle here.
I'm wondering whether there is a better way of going about this. I was thinking about code generation tools like JSweet, but I'm interested to hear if anyone else has come across the same issue and how they approached it.

There are two schools of thought on whether this is a violation of the DRY principle. If you're really, really sure that there's a natural mapping you would always apply to bind json in each language, then you could say that it is duplicate work; which is (at least part of) the thinking behind IDL-type languages in technologies like CORBA (but I'm showing my age).
OTOH maybe each system (the server, the client, an alternate client if anyone were to write one) should be free to independently define the internal representations of objects that is best suited to that system (given its language, what it plans to do, etc.).
In your example, the typescript certainly doesn't contain all of the information needed to define the Java "equivalent". ('number' could map to a lot of things; and the typescript says nothing about access modifiers...) Of course you can narrow that down by adopting conventions, but my point is it's not self-evident that there'd be a 1-to-1 mapping.
Maybe one language handles references more gracefully than another. Maybe one can't deal with circular references but the other can. Maybe one has reason to prefer a more flat view of the object. Maybe a lot of things.
All of that said, it certainly is true that if you modify the json structure of an object, and you're maintaining each system's internal representation independently, then you likely have to make code changes in multiple places to accommodate that single underlying change. And pragmatically, if that can be avoided it's a good thing.
So if you can come up with a code generator that processes the more expressive language's representation to create a representation for the less expressive language, and maybe at least use that by default, you may find it's not a bad thing for your project.
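As a rough sketch of such a generator, assuming the Java class is the more expressive source of truth, reflection can emit a TypeScript interface. `TsInterfaceGenerator` is hypothetical and its type mapping is deliberately simplistic (numbers, strings, booleans, everything else `any`); real tools also handle generics, collections, nullability, and naming conventions.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Hypothetical, minimal Java-to-TypeScript interface generator based on reflection. */
class TsInterfaceGenerator {
    static String generate(Class<?> type) {
        StringBuilder ts = new StringBuilder("export interface " + type.getSimpleName() + " {\n");
        for (Field f : type.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // only instance state becomes part of the JSON shape
            }
            ts.append("  ").append(f.getName()).append(": ").append(tsType(f.getType())).append(";\n");
        }
        return ts.append("}\n").toString();
    }

    private static String tsType(Class<?> c) {
        if (c == String.class || c == char.class || c == Character.class) return "string";
        if (c == boolean.class || c == Boolean.class) return "boolean";
        if (c.isPrimitive() || Number.class.isAssignableFrom(c)) return "number";
        return "any"; // nested objects, collections, enums etc. are left to a real tool
    }
}
```

Running something like this in the back-end build and writing the output into the front-end project gives a single source of truth; the trade-off, as discussed above, is that the TypeScript side gives up the freedom to model things its own way.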

Can I (easily) use a third-party library to handle serialization for Java RMI?

I very much like the simplicity of calling remote methods via Java's RMI, but the verbosity of its serialization format is a major buzz kill (Yes, I have benchmarked, thanks). It seems that the architects at Sun did the obvious right thing when designing the RPC (speaking loosely) component, but pulled an epic fail when it came to implementing serialization.
Conversely, it seems the architects of Thrift, Avro, Kryo (especially), protocol buffers (not so much), etc. generally did the obvious right thing when designing their serialization formats, but either do not provide an RPC mechanism, provide one that is needlessly convoluted (or immature), or provide one that is more geared toward data transfer than invoking remote methods (perfectly fine for many purposes, but not what I'm looking for).
So, the obvious question: How can I use RMI's method-invocation loveliness but employ one of the above libraries for the wire protocol? Is this possible without a lot of work? Am I evaluating one of the aforementioned libraries too harshly (N.B. I very much dislike code generation, in general; I dislike unnecessary annotations somewhat, and XML configuration quite a bit more; any sort of "beans" make me cringe--I don't need the weight; ideally, I'm looking to just implement an interface for my remote objects, as with RMI).

Once upon a time, I had the same requirement. I changed the RMI methods' argument and return types to byte[].
I serialized objects to byte arrays with my preferred serializer, then called my modified RMI methods.
As you mentioned, Java serialization is too verbose, so 5 years ago I implemented a space-efficient serialization algorithm. It saves a lot of space if you are sending a very complex object graph. Recently, I had to port this serialization implementation to GWT, because GWT serialization in Dev mode is incredibly slow.
As an example, take this RMI method:
public void saveEmployee(Employee emp) throws RemoteException {
    // business code
}
You would change it like this:
public void saveEmployee(byte[] empBytes) throws RemoteException {
    // YourPreferredSerializer and its factory are placeholders for the library of your choice
    YourPreferredSerializer serializer = YourPreferredSerializerFactory.createSerializer();
    Employee emp = (Employee) serializer.deserialize(empBytes);
    // business code
}
EDIT:
You should check out MessagePack. It looks promising.
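A slightly fuller, runnable sketch of the byte[] approach, with the serializer hidden behind a small interface so it can be swapped. `Codec`, `JdkCodec`, `Employee`, `EmployeeService`, and `EmployeeServiceImpl` are hypothetical names; the JDK-serialization codec is only a placeholder so the sketch runs without dependencies, and you would replace it with Kryo, MessagePack, or whatever you prefer.

```java
import java.io.*;
import java.rmi.Remote;
import java.rmi.RemoteException;

/** Hypothetical pluggable codec; a Kryo- or MessagePack-backed implementation would go here. */
interface Codec {
    byte[] encode(Object value);
    Object decode(byte[] bytes);
}

/** Placeholder codec using JDK serialization, only so the sketch runs without extra libraries. */
class JdkCodec implements Codec {
    public byte[] encode(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public Object decode(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

class Employee implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    Employee(String name) { this.name = name; }
}

/** The remote interface only sees opaque byte arrays, so RMI never serializes Employee itself. */
interface EmployeeService extends Remote {
    void saveEmployee(byte[] employeeBytes) throws RemoteException;
}

class EmployeeServiceImpl implements EmployeeService {
    private final Codec codec;
    Employee lastSaved; // stands in for the business code

    EmployeeServiceImpl(Codec codec) { this.codec = codec; }

    @Override
    public void saveEmployee(byte[] employeeBytes) {
        Employee emp = (Employee) codec.decode(employeeBytes);
        lastSaved = emp;
    }
}
```

On the server, the implementation would be exported as usual (e.g. with `UnicastRemoteObject.exportObject(service, 0)`) and bound in a registry; the client encodes with the same codec before calling. RMI still serializes the byte[] argument, but a byte array carries very little overhead.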

I don't think there is a way to re-wire RMI itself, but specific replacement projects might allow it; I am thinking specifically of Dirmi. Its project owners might also be interested in helping with this (Brian, its author, is a very competent software engineer from Amazon.com).
Another interesting project is Protostuff. Its author is building an RPC framework too (I think), but even without that, it supports an impressive range of data formats, and does so very efficiently (as per https://github.com/eishay/jvm-serializers/wiki/).
Btw, I personally think the biggest mistake most projects (like PB and Avro) have made is not keeping the RPC and serialization aspects properly separate.
So ability to do RPC using a pluggable data format or serialization providers seems like a good idea to me.

writeReplace() and readResolve() are probably the best combination for doing so. Mighty powerful in the right hands.
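A minimal sketch of how that combination can route an object through a different serializer: writeReplace() hands the stream a proxy that carries the object's bytes in your own format, and the proxy's readResolve() turns those bytes back into the real object. The hand-rolled ByteBuffer encoding stands in for a third-party library; `Person`, its nested `Proxy`, and `ProxyDemo` are hypothetical.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical domain class whose serialized form is delegated to a proxy. */
class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final int id;

    Person(String name, int id) {
        this.name = name;
        this.id = id;
    }

    // Called by ObjectOutputStream: the proxy is written instead of this object.
    private Object writeReplace() {
        return new Proxy(encode(this));
    }

    // Rejects streams that contain a raw Person rather than the proxy.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("Proxy required");
    }

    // Hand-rolled encoding standing in for a third-party serializer (Kryo, MessagePack, ...).
    static byte[] encode(Person p) {
        byte[] nameBytes = p.name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + nameBytes.length)
                .putInt(p.id).putInt(nameBytes.length).put(nameBytes)
                .array();
    }

    static Person decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int id = buf.getInt();
        byte[] nameBytes = new byte[buf.getInt()];
        buf.get(nameBytes);
        return new Person(new String(nameBytes, StandardCharsets.UTF_8), id);
    }

    private static final class Proxy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final byte[] data;

        Proxy(byte[] data) { this.data = data; }

        // Called by ObjectInputStream: the proxy is replaced by the decoded Person.
        private Object readResolve() {
            return decode(data);
        }
    }
}

class ProxyDemo {
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because RMI marshals arguments with standard object streams, a class set up this way travels through plain RMI calls without changing the remote interface. The proxy's own class descriptor is still written, so the size gain depends on how compact your chosen format is.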

Java serialization is only verbose where it describes the classes and fields it's serializing. Overall, the format is as "self-describing" as XML. You can actually override this and replace it with something else; that is what the writeClassDescriptor and readClassDescriptor methods are for. Dirmi overrides these methods, and so it is able to use standard object serialization with less wire overhead.
The way it works is related to how its sessions work. Both endpoints may have different versions of the object, so simply throwing away the class descriptors won't work. Instead, additional data is exchanged (in the background) so that the serialized descriptor is replaced with a session-specific identifier. Upon seeing the identifier, a lookup table is examined to find the descriptor object. Because the data is exchanged in the background, there's a brief "warm up period" after a session is created and every time an object type is written for the first time.
Dirmi has no way to replace the wire format at this time.
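A much simpler sketch of the writeClassDescriptor/readClassDescriptor idea. This is not how Dirmi does it: Dirmi negotiates identifiers per session in the background, whereas this sketch assumes both ends share a fixed, pre-agreed table of classes and identical class versions (the reader substitutes its own local descriptor), which is exactly the simplification warned about above. All class names here are hypothetical, and the table must contain only Serializable classes.

```java
import java.io.*;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical value class used to compare stream sizes. */
class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

/** Writes a small id instead of the full descriptor for classes in a table both sides share. */
class CompactObjectOutputStream extends ObjectOutputStream {
    private final Map<String, Integer> ids = new HashMap<>();

    CompactObjectOutputStream(OutputStream out, List<Class<?>> sharedTable) throws IOException {
        super(out);
        for (int i = 0; i < sharedTable.size(); i++) {
            ids.put(sharedTable.get(i).getName(), i);
        }
    }

    @Override
    protected void writeClassDescriptor(ObjectStreamClass desc) throws IOException {
        Integer id = ids.get(desc.getName());
        if (id == null) {
            writeInt(-1); // marker: the full standard descriptor follows
            super.writeClassDescriptor(desc);
        } else {
            writeInt(id); // known class: 4 bytes instead of name, UID and field list
        }
    }
}

/** Reverses the above; ASSUMES the reader's class versions match the writer's. */
class CompactObjectInputStream extends ObjectInputStream {
    private final List<Class<?>> table;

    CompactObjectInputStream(InputStream in, List<Class<?>> sharedTable) throws IOException {
        super(in);
        this.table = sharedTable;
    }

    @Override
    protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFoundException {
        int id = readInt();
        // For known ids, the local descriptor stands in for the one that was never sent.
        return id < 0 ? super.readClassDescriptor() : ObjectStreamClass.lookup(table.get(id));
    }
}

class CompactDemo {
    /** Uses the compact stream if a table is given, otherwise the standard one. */
    static byte[] write(Object value, List<Class<?>> table) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = table == null
                ? new ObjectOutputStream(bytes)
                : new CompactObjectOutputStream(bytes, table)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object read(byte[] data, List<Class<?>> table) {
        try (ObjectInputStream in = new CompactObjectInputStream(new ByteArrayInputStream(data), table)) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```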

Collection as a metaphor for real world containers

I find modeling physical containers using collections very intuitive. I override/delegate add methods with added capacity constraints based on physical attributes such as volume of added elements, sort based on physical attributes, locate elements by using maps of position to element and so on.
However, when I read the documentation of collection classes, I get the impression that it's not the intended use, that it's just a mathematical construct and a bounded queue is just meant to be constrained by the number of elements and so forth.
Indeed, I think that unless I'm able to model this collection coherently, I should perhaps not expose this class as a collection but only delegate to it internally. Opinions?

Many structures in software development do not have a physical counterpart. In fact, some structures and algorithms are quite abstract, and do not model objects directly in the physical world. So just because an object does not serve as a suitable model for physical objects in the real world does not necessarily mean it cannot be used effectively to solve problems within a computer program.

Indeed, I think that unless I'm able to model this collection coherently, I should perhaps not expose this class as a collection but only delegate to it internally. Opinions?
Firstly, you don't want to get too hung up with the modeling side of software engineering. UML style models (usually) serve primarily as a way of organizing and expressing the developer's high level ideas about how an application should be implemented. There is no need to have a strict one-to-one relationship between the classes in the model and the implementation classes in the application code.
Second, you don't want to get too hung up about modeling "real world" (i.e. physical) objects and their behavior. Most of the "objects" that are used in a typical application have no real connection with the real world. For example, a "folder" or "directory" is really little more than an analogy of the physical objects with the same names. There's typically no need for the computer concept to be constrained by the physical behavior of the real world objects.
Finally, there are a number of software engineering reasons why it is a bad idea to have your Java domain classes extend the standard collection types. For example:
The collections have a generic behavior that it is typically not appropriate to expose in a domain object. For instance, you typically don't want components of a domain object to be added and removed willy-nilly.
By extending a collection type, you are implicitly giving permission for some part of your application to treat domain objects as just lists or sets or whatever.
By extending collection classes, you would be hard-wiring implementation details into your domain APIs. For example, you would need to decide between extending ArrayList or LinkedList, and changing your mind would result (at least) in a binary API incompatibility ... and possibly worse.

Not entirely sure that I've understood you correctly. I gather that you want to know if you should expose the collection (subclassing) or wrap it (have a private field).
As Robert says, it really depends on the case. It's pretty much your choice. Nonetheless I'd say that in many cases the better choice is to not expose the collection because the constraints define the object you are modelling and are not fully congruent with the underlying collection. In short: users of your object shouldn't need to know that they are dealing with a collection unless it is really a collection with some speciality e.g. has all properties of a collection but allows only a certain number of objects.
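A minimal sketch of the "wrap it" option for a physical container: the capacity rule lives in the domain class, the list is a private implementation detail, and callers get only a read-only view, so they cannot bypass the constraint. `Crate` and `Item` are hypothetical, and volume units are left to the domain.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical item with a physical volume (units are up to the domain, e.g. litres). */
final class Item {
    final String name;
    final double volume;

    Item(String name, double volume) {
        this.name = name;
        this.volume = volume;
    }
}

/** A physical container that uses a list internally but does not expose itself as a Collection. */
final class Crate {
    private final double capacity;
    private final List<Item> items = new ArrayList<>(); // implementation detail, free to change
    private double usedVolume;

    Crate(double capacity) { this.capacity = capacity; }

    /** Adds the item only if it fits; returns false otherwise. */
    boolean tryAdd(Item item) {
        if (usedVolume + item.volume > capacity) {
            return false;
        }
        items.add(item);
        usedVolume += item.volume;
        return true;
    }

    double freeVolume() { return capacity - usedVolume; }

    /** Read-only view: callers can iterate, but cannot add or remove items directly. */
    List<Item> contents() { return Collections.unmodifiableList(items); }
}
```

Switching the internal ArrayList to another structure later would not change the public API, which addresses the binary-compatibility concern about extending a concrete collection class.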

How easily customizable are SAP industry-specific solutions?

First of all, I have a very superficial knowledge of SAP. According to my understanding, they provide a number of industry-specific solutions. The concept seems very interesting, and I work on something similar for the banking industry. The biggest challenge we face is how to adapt our products for different clients. Many concepts are quite similar across enterprises, but there are always some client-specific requirements that have to be resolved through configuration and customization. Often this requires reimplementing and developing customer-specific features.
I wonder how efficient SAP products are in this sense. How much effort has to be spent in order to adapt the product so it satisfies specific customer needs? What are the mechanisms used (configuration, programming etc.)? How would this compare to developing a custom solution from scratch? Are they capable of leveraging and promoting best practices?

Disclaimer: I'm talking about the ABAP-based part of SAP software only.
Disclaimer 2, regarding PATRY's response: HR is quite a bit different from the rest of the SAP/ABAP world. I do feel rather competent as a general-purpose ABAP developer, but HR programming is so far off my personal beacon that I've never even tried to understand what they're doing there. %-|
According to my understanding, they provide a number of industry specific solutions.
They do - but be careful when comparing your own programs to these solutions. For example, IS-H (SAP for Healthcare) started off as an extension of the SD (Sales & Distribution) system, but has become very much more since then. While you could technically use all of the techniques they use for their IS, you really should ask a competent technical consultant before you do - there are an awful lot of pits to avoid.
The concept seems very interesting and I work on something similar for banking industry.
Note that a SAP for Banking IS already exists. See here for the documentation.
The biggest challenge we face is how to adapt our products for different clients.
I'd rather rephrase this as "The biggest challenge is to know where the product is likely to be adapted and to structurally prepare the product for adaption." The adaption techniques are well researched and easily employed once you know where the customer is likely to deviate from your idea of the perfect solution.
How much effort has to be spent in order to adapt the product so it satisfies specific customer needs?
That obviously depends on the deviation of the customer's needs from the standard path - but that won't help you. With a SAP-based system, you always have three choices. You can try to customize the system within its limits. Customizing basically means tweaking settings (think configuration tables, tens of thousands of them) and adding stuff (program fragments, forms, ...) in places that are intended to do so. Technology - see below.
Sometimes customizing isn't enough - you can develop things additionally. A very frequent requirement is some additional reporting tool. With the SAP system, you get the entire development environment delivered - the very same tools that all the standard applications were written with. Your programs can peacefully coexist with the standard programs and even use common routines and data. Of course you can really screw things up, but show me a real programming environment where you can't.
The third option is to modify the standard implementations. Modifications are like a really sharp two-edged kitchen knife - you might be able to cook really cool things in half of the time required by others, but you might hurt yourself really badly if you don't know what you're doing. Even if you don't really intend to modify the standard programs, it's very comforting to know that you could and that you have full access to the coding.
(Note that this is about the application programs only - you have no chance whatsoever to tweak the kernel, but fortunately, that's rarely necessary.)
What are the mechanisms used (configuration, programming etc)?
Configuration is mostly about configuration tables with more or less sophisticated dialog applications. For the programming part of customizing, there's the extension framework; see http://help.sap.com/saphelp_nw70ehp1/helpdata/en/35/f9934257a5c86ae10000000a155106/frameset.htm for details. It's basically a controlled version of dependency injection. As a solution developer, you have to anticipate the extension points, define the interface that has to be implemented by the customer code, and then embed the call in your code. As a project developer, you have to create an implementation that adheres to the interface and activate it. The basic runtime system takes care of gluing the two programs together; you don't have to worry about that.
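To make the division of labor concrete, here is a Java analogy of the pattern described (the solution developer defines an interface and a call site, the project developer supplies and activates an implementation, and a runtime glues them together). This is not SAP's enhancement framework or any ABAP API; `DiscountExtension`, `ExtensionRegistry`, and `PricingService` are hypothetical, and the registry merely stands in for what the SAP runtime does for you.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Extension point defined by the solution developer (hypothetical example). */
interface DiscountExtension {
    double adjust(String customerId, double standardDiscount);
}

/** Stand-in for the runtime that connects extension points to activated implementations. */
final class ExtensionRegistry {
    private static final Map<Class<?>, Object> ACTIVE = new ConcurrentHashMap<>();

    /** What the project developer does: provide an implementation and activate it. */
    static <T> void activate(Class<T> point, T implementation) {
        ACTIVE.put(point, implementation);
    }

    /** Returns the activated implementation, or the given default if none is active. */
    static <T> T lookup(Class<T> point, T fallback) {
        Object impl = ACTIVE.get(point);
        return impl == null ? fallback : point.cast(impl);
    }
}

/** Standard code with an anticipated extension point embedded in it. */
final class PricingService {
    double discountFor(String customerId) {
        double standard = 0.05; // standard behavior shipped with the product
        DiscountExtension ext = ExtensionRegistry.lookup(DiscountExtension.class, (id, d) -> d);
        return ext.adjust(customerId, standard);
    }
}
```

The standard code never changes when a customer activates an implementation; only the places the solution developer anticipated can be adapted this way, which is why anticipating them well matters so much.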
How would this compare to developing custom solution from scratch?
IMHO this depends on how much of the solution is the same for all customers and how much of it has to be adapted. It's really hard to be more specific without knowing more about what you want to do.

I can only speak for the Human Resource component, but this is a component where there is a lot of difference between customers, based on a common need.
First, most of the time you set the value for a group, and then associate the object (person, location...) with a group depending on one or two values. This is akin to an indirection, and allows for great flexibility, as you can change the association for a given location without changing the others. In a few cases, there is a three-level indirection.
Second, there is a lot of customization that is nearly programming. Payroll and administrative operations are prime examples of this. In the latter case, you get a table with the operation (hiring, for example), the event (creation, modification...), a code for the action (I for test, F to call a function, O for a standard operation) and a text field describing the parameters of a function ("C P0001, begda, endda" to create a structure P0001 with default values).
Third, you can also use such a table to indicate a function or class (ABAP-OO), that will be dynamically called. You get a developer to create this function or class, and then indicate this in the table. This is a method to replace a functionality by another one, or extend it. This is used extensively in the ESS/MSS.
Lastly, there are also extension points or files that you can modify. This is nearly the same as the previous one, except that you don't need to register the change: the file is always used (ZXPADU01/02 for HR infotype modifications).
Hope this helps.
Guillaume PATRY
