Storing historical data with Java and Hibernate

This is a problem about historical data handling.
Suppose you have a class MyClass like the following one:
class MyClass {
    String field1;
    Integer field2;
    Long field3;
    String getField1() {...}
    void setField1(String ...) {...}
    ...
}
Now, suppose I need to make MyClass able to store and retrieve old data. What's the best way to do this?
The classes must also be persisted through Hibernate, with at most two tables per "entity": either a single table, or one table for the "continuity" class (the one which represents the entity as it evolves over time) and another table for the historical data (as suggested here).
Please note that I have to be able to assign an arbitrary valid time to the values of the fields.
The class should have an interface like:
class MyClass {
    // how to store the fields????
    String getField1At(Instant i) {...}
    void setField1At(Instant i, String ...) {...}
    ...
}
I'm currently using the JTemporal library. It has a TemporalAttribute<T> class, which is like a map: you can call myAttr.get(Instant i) to get the version of myAttr at Instant i. I know how to persist a TemporalAttribute in a table with Hibernate (it's simple: I persist the SortedMap used by the TemporalAttribute, and I get a table with the start and end valid times and the value of the attribute).
The real problem is that here we have multiple attributes.
I have a solution in mind but it's not clear, and I'd like to hear your ideas.
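To make the map-like behaviour described above concrete, here is a minimal sketch that uses only the JDK. It is not the JTemporal API: the names TemporalValue and TemporalMyClass are hypothetical, java.time.Instant stands in for JTemporal's Instant, and each value is assumed to be valid from its key until the next key. How to map several such attributes onto at most two tables is left open.

```java
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a temporal attribute; NOT the JTemporal API.
class TemporalValue<T> {
    // Key: the instant from which the value is valid (until the next key).
    private final TreeMap<Instant, T> history = new TreeMap<>();

    void setAt(Instant validFrom, T value) {
        history.put(validFrom, value);
    }

    // Returns the value valid at the given instant, or null if none was set yet.
    T getAt(Instant when) {
        Map.Entry<Instant, T> entry = history.floorEntry(when);
        return entry == null ? null : entry.getValue();
    }
}

// One temporal value per field, mirroring the interface sketched in the question.
class TemporalMyClass {
    private final TemporalValue<String> field1 = new TemporalValue<>();
    private final TemporalValue<Integer> field2 = new TemporalValue<>();

    String getField1At(Instant i) { return field1.getAt(i); }
    void setField1At(Instant i, String value) { field1.setAt(i, value); }
    Integer getField2At(Instant i) { return field2.getAt(i); }
    void setField2At(Instant i, Integer value) { field2.setAt(i, value); }
}

class TemporalDemo {
    public static void main(String[] args) {
        TemporalMyClass obj = new TemporalMyClass();
        obj.setField1At(Instant.parse("2020-01-01T00:00:00Z"), "old");
        obj.setField1At(Instant.parse("2021-01-01T00:00:00Z"), "new");
        System.out.println(obj.getField1At(Instant.parse("2020-06-01T00:00:00Z"))); // old
        System.out.println(obj.getField1At(Instant.parse("2022-01-01T00:00:00Z"))); // new
    }
}
```

In this sketch every field keeps its own history, so two fields can change at different valid times without affecting each other.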

Your project reminds me of Hibernate Envers.
The Envers project aims to enable easy auditing of persistent classes. All that you have to do is annotate your persistent class or some of its properties, that you want to audit, with @Audited. For each audited entity, a table will be created, which will hold the history of changes made to the entity. You can then retrieve and query historical data without much effort.
With Envers you can:
choose what you want to audit (on a per-attribute basis)
make your own Revision Entity (that stores information such as revision number, author, timestamp...)
Using Hibernate Envers for this decouples entities and revision data (in the database and in your code).

You can do something like this simply by adding a version number to your domain class. I did something like this where the id was a composite of a db-assigned number and the version number, but I would advise against that. Use a normal surrogate key and, if you really want, make the [id, version] tuple a natural key.
You can actually version entire object graphs that way, just by ensuring that the version number is the same for all elements of the graph. You can then easily go back to any previous version.
You should write a lot of service tests to ensure the integrity of the code that manages the version.
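The version-number idea above could be sketched as follows. This is an in-memory illustration only, not a Hibernate mapping: VersionedRecord, VersionedStore and their fields are hypothetical names, and a plain list stands in for the table.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical row: a surrogate key plus an (entityId, version) natural key.
final class VersionedRecord {
    final long surrogateId; // normal surrogate primary key
    final long entityId;    // identifies the entity that evolves over time
    final int version;      // increases by one with every change
    final String field1;

    VersionedRecord(long surrogateId, long entityId, int version, String field1) {
        this.surrogateId = surrogateId;
        this.entityId = entityId;
        this.version = version;
        this.field1 = field1;
    }
}

// In-memory stand-in for the table; a real application would go through Hibernate.
class VersionedStore {
    private final List<VersionedRecord> rows = new ArrayList<>();
    private long nextSurrogateId = 1;

    // Appends a new version instead of overwriting the previous row.
    VersionedRecord save(long entityId, String field1) {
        int nextVersion = latest(entityId).map(r -> r.version + 1).orElse(1);
        VersionedRecord row = new VersionedRecord(nextSurrogateId++, entityId, nextVersion, field1);
        rows.add(row);
        return row;
    }

    Optional<VersionedRecord> latest(long entityId) {
        return rows.stream()
                .filter(r -> r.entityId == entityId)
                .max(Comparator.comparingInt((VersionedRecord r) -> r.version));
    }

    Optional<VersionedRecord> find(long entityId, int version) {
        return rows.stream()
                .filter(r -> r.entityId == entityId && r.version == version)
                .findFirst();
    }
}

class VersionedDemo {
    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.save(7L, "first value");
        store.save(7L, "second value");
        System.out.println(store.find(7L, 1).get().field1);  // first value
        System.out.println(store.latest(7L).get().field1);   // second value
    }
}
```

Going back to a previous state is then a lookup by (entityId, version); the surrogate key is never reused across versions.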

Related

DDD valueObject and database schema

To end the year 2014, I have what I think is a simple question.
I would like to use "DDD" a bit more, and I'm currently experimenting with various use cases to learn more about it.
My current use case is the following:
We have a new database schema that uses a classic pattern in our company: modeling our nomenclature tables as "id / code / label". I think it's a pretty classic case when using Hibernate, for example.
But in the OO world things get "complicated" for something this simple when using an API like JDBC or QueryDSL. I need to fetch an object by its code, retrieve its id or load the full object, and then set it as a one-to-one relation in another object.
I am wondering:
This kind of nomenclature can be an enum (or a class with String constants, depending on the developer). In DDD terms, it is my ValueObject.
The id / code / label in the database is not i18n friendly (it's not a prerequisite), so I don't see its advantages, except when the table can be updated dynamically and the use case is "pick something in a combobox loaded from this table and build a relation with another object". But that's all, because if you have business rules that must be applied, you need to know the new code, etc.
My questions are:
Do you often use the id / code / label pattern in your database model?
How do you model your nomenclature data? (Country is perhaps not the best example :) but no matter what, how do you model it? Without thinking much I would say a database table for country, but what about a status such as "valid, waiting validation, rejected"?)
Do you model your value objects using this pattern?
Or do you use lots of enums and only store their toString (or ordinal) in the database?
In the Java OO world, I currently think that it is easier to manipulate enums than objects loaded from the database. I need to build repositories to load them, for example, and it would be so simple to use them as enums. I'm looking for some reassurance here, or perhaps I am missing something obvious?
Thanks
See you in 2015!
Update 1:
We can create a "Budget"; the first one is marked as "Initial" and the next ones are marked as "Corrective" (with an increment). For example, we can have a list of budgets: "Initial budget", "Corrective budget #1", "Corrective budget #2".
For this we have this database design: a Budget table and a Version Budget table, with a foreign key between the two. The Version Budget table only contains an ID, a CODE and a LABEL.
Personally, I would like to remove this table. I don't see the advantages of this structure. From the OO perspective, when I'm creating a budget I can query the database to see whether I need to create an Initial or a Corrective budget (using a count query), then set the right enum on my new budget. But with the current design I need to query the database using the CODE that I want, select the ID and set the ID. So yes, it's really database oriented. Where is the DDD part? A ValueObject is something that describes or quantifies something. In my case that seems to fit: a Version describes the current status of my Budget. I can compare two versions just by checking their code, and they don't have a lifecycle (I don't want one for this in particular).
How do you handle this type of use case?
It's only a simple example, because I found that if you ask a database admin, he would surely say that it all seems good: using primary keys, modeling relations, enforcing constraints, using foreign keys and avoiding data duplication.
Thanks again to Mike and Doctor for their comments.

I will hook into your country example. In most cases, country will be a value object. Nothing will reference a country entity and need to know that it is still the same country when its values change. In fact, the country could be represented as an enum, plus some nasty resource lookup functions that translate the ISO3 code into a useful display text. What we do is define it as a value object class with iso3, displayname and some other static information. Out of this value object we then define a kind of "power enum" (I still miss a standard term here). The class implementing the country value object gets a private constructor, static properties for each of its values (one per country), and explicit cast operators from and to int. Now you can treat it just like a normal enum of your programming language. The advantage over a normal enum, besides having more property fields, is that it can also have methods (query methods, of course, which don't change the state of the object). You can even use polymorphism (some countries behaving differently from others). You could also load the content of the enum from a database table (without the statics then, and with a static lookupByIso3 method instead).
You could do this with some other "enum-like" value objects, too. Imagine currencies (they could have conversion methods that are implemented polymorphically). The handling of the daily exchange rates is a different topic, though.
If the set of values is not fixed (for example another value object candidate like a postal address), then it is not a value object enum but a standard value object that can be instantiated with the values you want.
To decide whether you can live with something as a value object, ask the following question: do you want copy semantics or reference semantics? If you ever change a property of the object, should all places where you used it update too, or should they stay as they are? If the latter, then the "changed" object is a new and different value object. Another question is whether you need to track changes to an object, recognizing that it remains the "same" despite its changing values (in which case it is an entity rather than a value object). And if you have a value object where you only want specific instances to exist, it is the kind of enum described above.
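A rough Java rendering of the "power enum" described above might look like the sketch below. The two countries are sample data, the label() method is a hypothetical query method, and the cast operators to and from int are left out because Java has no equivalent. Note that a plain Java enum can also carry fields and methods; the class-based form is shown here only because it mirrors the description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical enum-like value object: only the predefined instances can exist.
final class Country {
    // Declared first so it is initialized before the constants below register themselves.
    private static final Map<String, Country> BY_ISO3 = new LinkedHashMap<>();

    static final Country DEU = new Country("DEU", "Germany");
    static final Country FRA = new Country("FRA", "France");

    private final String iso3;
    private final String displayName;

    // Private constructor: no instances beyond the static ones.
    private Country(String iso3, String displayName) {
        this.iso3 = iso3;
        this.displayName = displayName;
        BY_ISO3.put(iso3, this);
    }

    String iso3() { return iso3; }
    String displayName() { return displayName; }

    // A query method: it reads state but never changes it.
    String label() { return displayName + " (" + iso3 + ")"; }

    static Country lookupByIso3(String iso3) {
        Country country = BY_ISO3.get(iso3);
        if (country == null) {
            throw new IllegalArgumentException("Unknown ISO3 code: " + iso3);
        }
        return country;
    }
}

class CountryDemo {
    public static void main(String[] args) {
        System.out.println(Country.lookupByIso3("FRA").label()); // France (FRA)
        System.out.println(Country.lookupByIso3("DEU") == Country.DEU); // true
    }
}
```

Because only the static instances exist, comparing two countries by reference is enough, just as with a regular enum.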
Does that somehow help you?

JPA inheritance alternative for SINGLE_TABLE?

AppEngine only supports "TABLE_PER_CLASS" and "MAPPED_SUPERCLASS" for JPA inheritance.
Unfortunately "JOINED" and especially "SINGLE_TABLE" are not supported.
I'm wondering what the best way is to implement an alternative to SINGLE_TABLE.
My only requirements are:
1) Have separate classes like AbstractEmployee, InternalEmployee, ExternalEmployee.
2) Being able to run a query over all employees, thus resulting in both InternalEmployee and ExternalEmployee instances.
The only thing I can think of is using a 'big' Employee object containing all fields?
Any other ideas?
PS: vote for proper "SINGLE_TABLE" support via http://code.google.com/p/googleappengine/issues/detail?id=8366

You could in theory use @Embedded and @Embeddable to group related fields into an object. So you would have a class that looks something like this:
import javax.persistence.Embedded;
import javax.persistence.Entity;

@Entity
public class Employee {
    // all the common employee fields go here

    // the discriminator column on the Employee class lets you be specific in your queries
    private Integer type;

    @Embedded
    private Internal internal; // @Embeddable class that has the fields that are internal

    @Embedded
    private External external; // @Embeddable class that has the fields that are external

    // equals & hashCode that compare based on the discriminator type and other fields
}

What AppEngine supports and doesn't support is misleading there. AppEngine uses a property store, so any Kind can have any properties. Consequently, in principle, a Kind can contain InternalEmployee and ExternalEmployee "instances". The only thing that AppEngine JPA actually does is store all fields of a class in a single Kind object. That doesn't preclude having subtypes stored in the same Kind (with extra properties for the subtype-specific fields), which is the equivalent of "single-table".
PS, raising some issue on "AppEngine" as a whole won't get any response (look at the rest of issues in there ;-) ), bearing in mind the code affected here is in its own project at http://code.google.com/p/datanucleus-appengine and has its own issue tracker

XStream - treat objects with the same identifier as one

I am working on a legacy system where XStream is being used to serialize objects in order to keep two databases in sync. A new object is first stored in one database, then the stored object is serialized and sent to be stored in the other database.
Up until recently, the structure of the object in question was like this:
public class Project {
List<Milestone> milestones;
[...]
}
But, after changes to the requirements, the structure is supposed to be like this:
public class Project {
List<Goal> goals;
}
public class Goal {
List<Milestone> milestones;
}
In order to keep the milestones of legacy data, which knew nothing about goals, the final structure of project was this:
public class Project {
List<Goal> goals;
List<Milestone> milestones;
}
So, there are two paths from a Project to a Milestone: one directly and one through a Goal. The problem occurs when this structure is deserialized and stored. When XStream deserializes it, the Milestone objects connected to the Project directly become different objects from the ones connected through Goals, even though they have the same id.
As long as Hibernate's Session#merge() was used to persist this object, this was no problem, since merge() doesn't care about object identity as long as the db identifiers are the same.
But I can no longer use merge() for this purpose, and have to rely on Session#save() instead. And save() DOES care about object identity! So now I get an org.hibernate.NonUniqueObjectException when trying to store the deserialized object.
I figure the least intrusive way to solve this is, if it's possible, to make XStream create 1 object per database id. But is this possible?

After some consideration, it is apparent to me that the problem is not XStream, as it has mechanisms for object references. The problem is another nifty "feature" of the project I'm working on: it has 2 versions of each domain class, one for communication with Hibernate, and one for "logic use" (don't ask me why...). In the conversion between these two versions (which basically moves values from one object to another), objects are new-ed uncritically, resulting in the same "Hibernate-object" being transformed into multiple "Java-objects". So I can't really blame XStream for not understanding that these should be the same :)
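One possible way to avoid new-ing duplicates during such a conversion is to keep a map from database id to the already converted object. The sketch below is an assumption about how that could look, not what the project actually did: MilestoneEntity, MilestoneModel and MilestoneConverter are hypothetical names, reduced to an id and a name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical persistence-side class (the "Hibernate-object").
class MilestoneEntity {
    final long id;
    final String name;

    MilestoneEntity(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical logic-side class (the "Java-object").
class MilestoneModel {
    final long id;
    final String name;

    MilestoneModel(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Converts entities to models, creating at most one model per database id.
class MilestoneConverter {
    private final Map<Long, MilestoneModel> modelsById = new HashMap<>();

    MilestoneModel toModel(MilestoneEntity entity) {
        // Reuse the model created earlier for this id instead of new-ing another one.
        return modelsById.computeIfAbsent(entity.id, id -> new MilestoneModel(id, entity.name));
    }

    List<MilestoneModel> toModels(List<MilestoneEntity> entities) {
        List<MilestoneModel> result = new ArrayList<>();
        for (MilestoneEntity entity : entities) {
            result.add(toModel(entity));
        }
        return result;
    }
}

class ConverterDemo {
    public static void main(String[] args) {
        MilestoneConverter converter = new MilestoneConverter();
        MilestoneModel viaProject = converter.toModel(new MilestoneEntity(1L, "Beta"));
        MilestoneModel viaGoal = converter.toModel(new MilestoneEntity(1L, "Beta"));
        System.out.println(viaProject == viaGoal); // true
    }
}
```

One converter instance would be used per object graph, so that both paths from a Project to a Milestone end at the same object.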

How to create a Java(6) Hibernate(3.6) Entity or other construct to create a unique combination of a string + int

I'm working on a desktop application in Java6 using H2 as the db and Hibernate 3.6.
Because of a construct with a third-party library involving JNI and some interesting decisions made a priori, I am unable to pass around long identifiers in their index code, and can only pass int. These indexes are generated quickly and repeatedly(not my choice), and get handed around via callbacks. However, I can split my expected dataset along the lines of a string value, and keep my id size at int without blowing out my id's. To this end, I'm keeping a long value as pk on the core object, and then using that as a one-to-one into another table, where it maps the int id back to the core entity, which when combined with the string, is unique.
So I've considered embedded compound keys and such in hibernate, but what I REALLY want is to just have this "extra" id that is unique within the context of the extra string key, but not necessarily universally unique.
So something like (not adding extraneous code/annotations):
@Entity
public class Foo {
    ...
    @Id
    public Long getId() {...}
    ...
    @OneToOne
    @PrimaryKeyJoinColumn
    public ExtraKey getExtra() {...}
}
@Entity
public class ExtraKey {
    ...
    @Id
    public Long getFooId() {...}
    ...
    public Integer getExtraId() {...}
    ...
    public String getMagicString() {...}
}
In that case, I could really even remove the magicString, and just have the fooId -> extraId mapping in the table, and then have the extraId + magicString be in another table where magicString is unique. However, I want Hibernate to allow the creation of new magicStrings at whim (app requirement), ideally one per row in a table, and then have Hibernate just update the extraId associated with that magicString via incrementation or another strategy.
Perusing all of the Hibernate manuals and trying a few tests on my own in a separate environment has not quite yielded what I want (dynamically created, named and sequential ids, basically), so I was hoping for SO's input. It's entirely possible I'll have to hand-code all of it myself in the db with sequences, or by splitting a long and doing logic on the upper and lower halves, but I'd really rather not, as I might have to maintain this code someday (really likely).
Edit/Addendum
As a sneaky way of getting around this, I'm just adding the extraId to the Foo object(ditching the extraKey class), and generating it from another object singleton, that at load time, does a group by select over the backing Foo table, returning magicKey, and the max(extraId). When I create a new Foo, I ask that object(multithread safe) to hand me the next extraId for the given magicKey and push that into Foo, and store it, thus updating my effective extraId for each magicKey on next app reload without an extra table. It costs me one group by query on the first request for a new extraId, which is suboptimal, but it's fast enough for what I need, simple enough to maintain in the future, and all contained in an external class, so I COULD replace it in one place if I ever come up with something more clever. I do dislike having the extra "special query" in my dao for this purpose, but it's easy enough to remove in the future, and well-documented.
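The allocator described above might be sketched like this. ExtraIdAllocator is a hypothetical name, and the constructor argument stands in for the result of the "group by magicKey, max(extraId)" query, which is not shown. Integer overflow is not handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-key id allocator, safe to call from several threads.
class ExtraIdAllocator {
    private final ConcurrentHashMap<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    // The map stands in for the rows of the group-by query: magicKey -> max(extraId).
    ExtraIdAllocator(Map<String, Integer> maxExtraIdByMagicKey) {
        maxExtraIdByMagicKey.forEach((key, max) -> counters.put(key, new AtomicInteger(max)));
    }

    // Returns the next extraId for the key; keys not seen before start at 1.
    int nextExtraId(String magicKey) {
        return counters.computeIfAbsent(magicKey, k -> new AtomicInteger(0)).incrementAndGet();
    }
}

class AllocatorDemo {
    public static void main(String[] args) {
        Map<String, Integer> seed = new HashMap<>();
        seed.put("alpha", 41);
        ExtraIdAllocator allocator = new ExtraIdAllocator(seed);
        System.out.println(allocator.nextExtraId("alpha")); // 42
        System.out.println(allocator.nextExtraId("beta"));  // 1
    }
}
```

The extraId is only unique within its magicKey, which matches the requirement that the pair, not the int alone, identifies a row.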

Maybe I still haven't understood your problem properly, but I think you could consider using Hibernate's hilo algorithm. It generates identifiers that are unique for the whole database, based on a table that Hibernate creates and manages. More details here:
http://docs.jboss.org/hibernate/core/3.5/reference/en/html/mapping.html#mapping-declaration-id

Persisting data suited for enums

Most projects have some sort of data that are essentially static between releases and well-suited for use as an enum, like statuses, transaction types, error codes, etc. For example's sake, I'll just use a common status enum:
public enum Status {
    ACTIVE(10, "Active"),
    EXPIRED(11, "Expired");
    /* other statuses... */
    /* constructors, getters, etc. */
}
I'd like to know what others do in terms of persistence regarding data like these. I see a few options, each of which has some obvious advantages and disadvantages:
Persist the possible statuses in a status table and keep all of the possible status domain objects cached for use throughout the application
Only use an enum and don't persist the list of available statuses, creating a data consistency holy war between me and my DBA
Persist the statuses and maintain an enum in the code, but don't tie them together, creating duplicated data
My preference is the second option, although my DBA claims that our end users might want to access the raw data to generate reports, and not persisting the statuses would lead to an incomplete data model (counter-argument: this could be solved with documentation).
Is there a convention that most people use here? What are people's experiences with each, and are there other alternatives?
Edit:
After thinking about it for a while, my real persistence struggle comes with handling the id values that are tied to the statuses in the database. These values would be inserted as default data when installing the application. At this point they'd have ids that are usable as foreign keys in other tables. I feel like my code needs to know about these ids so that I can easily retrieve the status objects and assign them to other objects. What do I do about this? I could add another field, like "code", to look stuff up by, or just look up statuses by name, which is icky.

We store enum values using some explicit string or character value in the database. Then to go from database value back to enum we write a static method on the enum class to iterate and find the right one.
If you expect a lot of enum values, you could create a static mapping HashMap<String,MyEnum> to translate quickly.
Don't store the actual enum name (i.e. "ACTIVE" in your example), because a developer can easily rename it while refactoring, which would break the mapping to the stored values.
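A minimal sketch of the explicit database value plus static lookup map described above. The codes "A" and "E" and the names dbCode and fromDbCode are made up for illustration; they are not taken from the question.

```java
import java.util.HashMap;
import java.util.Map;

enum Status {
    ACTIVE("A"),   // "A" and "E" are hypothetical database codes
    EXPIRED("E");

    // Built once, after all constants exist, so lookups do not have to iterate.
    private static final Map<String, Status> BY_DB_CODE = new HashMap<>();
    static {
        for (Status status : values()) {
            BY_DB_CODE.put(status.dbCode, status);
        }
    }

    private final String dbCode;

    Status(String dbCode) {
        this.dbCode = dbCode;
    }

    // The value written to the database; independent of the constant's name.
    String dbCode() {
        return dbCode;
    }

    static Status fromDbCode(String code) {
        Status status = BY_DB_CODE.get(code);
        if (status == null) {
            throw new IllegalArgumentException("Unknown status code: " + code);
        }
        return status;
    }
}

class StatusDemo {
    public static void main(String[] args) {
        System.out.println(Status.fromDbCode("E"));  // EXPIRED
        System.out.println(Status.ACTIVE.dbCode());  // A
    }
}
```

Renaming ACTIVE to something else leaves the stored "A" untouched, which is the point of keeping the code separate from the name.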

I'm using a blend of the three approaches you have documented...
Use the database as the authoritative source for the Enum values. Store the values in a 'code' table of some sort. Each time you build, generate a class file for the Enum to be included in your project.
This way, if the enum changes value in the database, your code will be properly invalidated and you will receive appropriate compile errors from your Continuous Integration server. You have a strongly typed binding to your enumerated values in the database, and you don't have to worry about manually syncing the values between code and the data.

Joshua Bloch gives an excellent explanation of enums and how to use them in his book "Effective Java, Second Edition" (p.147)
There you can find all sorts of tricks for defining your enums, persisting them, and quickly mapping them between the database and your code (p.154).
During a talk at Jazoon 2007, Bloch gave the following reasons to use an extra attribute to map enums to DB fields and back: An enum is a constant but code isn't. To make sure that a developer editing the source can't accidentally break the DB mapping by reordering the enums or renaming them, you should add a specific attribute (like "dbName") to the enum and use that to map it.
Enums have an intrinsic id (the ordinal, which is used in the switch() statement), but this id changes when you change the order of elements (for example by sorting them or by adding elements in the middle).
So the best solution is to add a toDB() and a fromDB() method and an additional field. I suggest using short, readable strings for this new field, so you can decode a database dump without having to look up the enums.

While I am not familiar with the idea of "attributes" in Java (and I don't know what language you're using), I've generally used the idea of a code table (or domain specific tables) and I've attributed my enum values with more specific data, such as human readable strings (for instance, if my enum value is NewStudent, I would attribute it with "New Student" as a display value). I then use Reflection to examine the data in the database and insert or update records in order to bring them in line with my code, using the actual enum value as the key ID.

What I have used on several occasions is to define the enum in the code and a storage representation in the persistence layer (DB, file, etc.), and then have conversion methods to map them to each other. These conversion methods need only be used when reading from or writing to the persistent store, and the application can use the type-safe enums everywhere. In the conversion methods I used switch statements to do the mapping. This also makes it possible to throw an exception if a new or unknown state is to be converted (usually because either the app or the data is newer than the other and new or additional states have been declared).
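The switch-based conversion methods described above could look roughly like this. TransactionType, its constants and the storage codes "DB" and "CR" are hypothetical and only serve the example.

```java
// Hypothetical enum used only inside the application code.
enum TransactionType { DEBIT, CREDIT }

// Conversion methods, used only when reading from or writing to the store.
final class TransactionTypeMapper {
    private TransactionTypeMapper() {}

    static String toStorage(TransactionType type) {
        switch (type) {
            case DEBIT:
                return "DB";
            case CREDIT:
                return "CR";
            default:
                // Reached when a constant is added to the enum but not to this switch.
                throw new IllegalStateException("No storage code for " + type);
        }
    }

    static TransactionType fromStorage(String code) {
        switch (code) {
            case "DB":
                return TransactionType.DEBIT;
            case "CR":
                return TransactionType.CREDIT;
            default:
                // Reached when the data contains a state this version of the app does not know.
                throw new IllegalArgumentException("Unknown storage code: " + code);
        }
    }
}

class MapperDemo {
    public static void main(String[] args) {
        String stored = TransactionTypeMapper.toStorage(TransactionType.CREDIT);
        System.out.println(stored);                                    // CR
        System.out.println(TransactionTypeMapper.fromStorage(stored)); // CREDIT
    }
}
```

Both default branches fail loudly, so a mismatch between application and data is noticed at the conversion boundary rather than later.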

If there's at least a minor chance that the list of values will need to be updated, then it's option 1. Otherwise, it's option 3.

Well we don't have a DBA to answer to, so our preference is for option 2).
We simply save the Enum value into the database, and when we are loading data out of the database and into our Domain Objects, we just cast the integer value to the enum type.
This avoids any of the synchronisation headaches with options 1) and 3). The list is defined once - in the code.
However, we have a policy that nobody else accesses the database directly; they must come through our web services to access any data. So this is why it works well for us.

In your database, the primary key of this "domain" table doesn't have to be a number. Just use a varchar pk and a description column (for the purposes your DBA is concerned about). If you need to guarantee the ordering of your values without relying on the alphabetical sort, just add a numeric column named "order" or "sequence".
In your code, create a static class with constants whose name (camel-cased or not) maps to the description and whose value maps to the pk. If you need more than this, create a class with the necessary structure and comparison operators and use instances of it as the value of the constants.
If you do this too much, build a script to generate the instantiation / declaration code.
