Pragmatic Programmer: avoid data source duplication by using a map? - java

In the Pragmatic Programmer book, in the chapter “Data source duplication”, the authors state:
Many data sources allow you to introspect on their data schema. This can be used to remove much of the duplication between them and your code. Rather than manually creating the code to contain this stored data, you can generate the containers directly from the schema. Many persistence frameworks will do this heavy lifting for you.
So far so good. We can achieve this easily by connecting our IDE to the DB and letting it create our entities for us.
Then it continues:
There’s another option, and one we often prefer. Rather than writing code that represents external data in a fixed structure (an instance of a struct or class, for example), just stick it into a key/value data structure (your language might call it a map, hash, dictionary, or even object). On its own this is risky .... we recommend adding a second layer to this solution: a simple table-driven validation suite that verifies that the map you’ve created contains at least the data you need. Your API documentation tool might be able to generate this.
The idea, if I got it right, is to avoid having an entity to represent the table in the DB (so as to avoid duplication of knowledge) and to use a map instead, so that if we add a new column to the schema we don’t need to update our representation of that schema (i.e. the entity) in our application as well.
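For concreteness, this is how I picture the map-based reading: a minimal JDBC sketch of my own (the helper class is hypothetical, not from the book):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Each row becomes a Map keyed by column name, so a new column shows up
// automatically without touching any entity class.
public class MapRowReader {

    public static List<Map<String, Object>> readAll(Connection conn, String table) throws Exception {
        List<Map<String, Object>> rows = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT * FROM " + table)) {
            ResultSetMetaData meta = rs.getMetaData();
            while (rs.next()) {
                Map<String, Object> row = new LinkedHashMap<>();
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    row.put(meta.getColumnLabel(i), rs.getObject(i));
                }
                rows.add(row);
            }
        }
        return rows;
    }
}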
Then comes the part that is not clear to me: the authors talk about an autogenerated “table-driven validation suite that verifies that the map you’ve created contains at least the data you need”.
Does anyone know what an implementation of this concept would look like?
The closest thing I could find on Google about this topic is this question on Stack Overflow, but the answers skip the second part.

I think it really depends on the language you’re using and on the data you need to read. For Java, if you’re mapping the raw data to a Map, what you could do is use validators (e.g. Hibernate Validator or Spring validators) to define your own custom annotations and enforce that the schema’s constraints are respected when creating the in-memory representation (e.g. if you’re reading a user table with id as the primary key, the map must then contain the id key with a valid value).
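For example, a minimal table-driven validation suite in plain Java (no validator framework) might look like the sketch below; the Rule record, the rules table, and the user fields are illustrative assumptions, not a prescribed API:

import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Table-driven validation: each row of the table names a required key and a
// check on its value. The table itself could be generated from the DB schema.
public final class MapValidator {

    record Rule(String key, Predicate<Object> check, String description) {}

    // Hypothetical rules for a "user" table with id as primary key.
    private static final List<Rule> USER_RULES = List.of(
        new Rule("id",    v -> v instanceof Long l && l > 0, "positive numeric id"),
        new Rule("email", v -> v instanceof String s && s.contains("@"), "email address"),
        new Rule("name",  v -> v instanceof String s && !s.isBlank(), "non-blank name")
    );

    // Verifies the map contains at least the data we need; throws otherwise.
    public static void validateUser(Map<String, Object> row) {
        for (Rule rule : USER_RULES) {
            Object value = row.get(rule.key());
            if (value == null || !rule.check().test(value)) {
                throw new IllegalStateException(
                    "field '" + rule.key() + "': expected " + rule.description());
            }
        }
    }

    public static void main(String[] args) {
        validateUser(Map.of("id", 42L, "email", "ada@example.com", "name", "Ada"));
        System.out.println("row is valid");
    }
}

Since each rule is just a data row, an autogenerated suite would simply emit one Rule per column from the schema or the API documentation.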

Related

Using same Entity classes in different spring data repositories

I'm trying to put together a project in which I have to persist some entity classes using different Spring Data repositories (GemFire, JPA, MongoDB, etc.). As the data that needs to go into these repositories is more or less the same, I was wondering if I can use the same entity class for all of them, to save me from converting from one object to another.
I got it working for GemFire and JPA, but the entity class is already starting to look a bit weird:
@Id // spring-data-gemfire
@javax.persistence.Id // jpa
@GeneratedValue
private Long id;
So far I can see following options:
Create separate, interface-based entity (domain) classes; trying to re-use the same class looks like a bit of premature optimization.
Externalize the XML-based mapping for JPA; I'm not sure if the GemFire and MongoDB mappings can be externalized.
Use different concrete entity classes and use some copy constructor/converter for the conversion.
I've been literally hitting my head against the wall to find the best approach. Any response is much appreciated. Thanks!
If by weird you mean your application domain objects/entity classes are starting to accumulate many different, but separate, (mapping) annotations (some even semantically the same, e.g. SD Commons' o.s.data.annotation.Id and JPA's @javax.persistence.Id) for the different data stores in which those entities will be persisted, then I suppose that is understandable.
The annotation pollution only increases as the number of representations for your entities increases. For example, think Jackson annotations for JSON mapping, or JAXB for XML, etc. Pretty soon, you have more meta-data than actual data. :-)
However, it is more a matter of preference, convenience, simplicity, really.
Some developers are purists and like to externalize everything. Others like to keep information (meta-data) close to the code using it. Certain patterns have even emerged to address these types of concerns... DTOs, Bounded Contexts (see Fowler's BoundedContext, which has a strong correlation to DDD and Microservices).
Personally, I use the following rules when designing and applying architectural principles/decisions in my code, especially when introducing something new:
Simplicity
Consistency
DRY
Test
Refactor
(along with a few others as well... good OOD, SoC, SOLID, Design Patterns, etc).
In that order too. If something starts getting too complex, refactor and simplify it. Be consistent in what you do by following/using patterns, conventions; familiarity is 1 key to consistency. But, don't keep repeating yourself either.
At the end of the day, it is really about maintaining the application. Will someone else who picks up where you left off be able to understand the organization and logic quickly, and be able to maintain it... simplicity is king. It does not mean it is so simple it is not viable or valuable. Even complex things can be simple if organized properly. However, breaking things apart and introducing abstractions can have hidden costs (see closing thoughts).
To more concretely answer (a few of) your questions...
I am not certain about MongoDB, but (Spring Data) GemFire does not have an external mapping. Minimally, @Region (on the entity class) and @Id are required, along with @PersistenceConstructor if your entity class has more than 1 constructor. For example:
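A minimal sketch of those annotations (the package locations vary across Spring Data GemFire versions, and the "Users" region and fields here are hypothetical):

import org.springframework.data.annotation.Id;
import org.springframework.data.annotation.PersistenceConstructor;
import org.springframework.data.gemfire.mapping.Region;

@Region("Users")
public class User {

    @Id
    private Long id;

    private String name;

    // Needed because there is more than 1 constructor: it tells the mapping
    // framework which constructor to use when materializing objects.
    @PersistenceConstructor
    public User(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    public User(String name) {
        this(null, name);
    }
}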
This sounds sneakingly like DTOs. Personally, I think Bounded Contexts are a better, more natural model of the application's data, since the domain model should not be unduly tied to any persistent store or external representation (e.g. JSON, XML, etc). The application domain model is the 1 true state of the application and it should model the concept it represents in a natural way, not superficially to satisfy some representation or persistent store (hence the mapping/conversion).
Anyway, try not to beat yourself up too much. It is all about managing complexity. Try to let yourself just do and use testing and other feedback loops to find an answer that is right for your application. You'll know.
Hope this helps.

Correct modeling of historical records in a database

In my application I have a set of objects which stay alive during the whole application lifecycle, and I need to create a historical database of them.
These objects are instances of a hierarchy of Java / Scala classes annotated with Hibernate annotations, which I use in my application to load them at startup. Luckily all the classes already contain a timestamp, which means that I do not need to change the object model to be able to create historical records.
What is the most suitable approach:
Use Hibernate without annotations, providing external XML mappings which are the same as the annotation-based ones except for the primary key (which is now a composite key consisting of the previous primary key + the timestamp).
Use other classes for historical records (this sounds very complicated, as I have a hierarchy of classes and not a single class, and I would have to subclass my HistoricalRecordClass for every type of record, as I want to be able to build it back). Still use Hibernate.
Use a completely different approach (please note, I do not like ORMs; using one is just a matter of convenience).
Some considerations:
The goal of storing historical records is that the user, through a single GUI, can access both the real-time values of certain data and their historical values, just by specifying a date.
How do you intend to use the historical records? The easiest solution would be to serialize them as JSON and log them to a file.
I've never combined Hibernate XML mappings with Hibernate annotations, but if it works, it sounds more attractive than carrying two parallel object models.
If you need to be able to recreate the application state at any point in time, then you're more or less stuck with writing them to a database (because of the fast random access). You could cheat and have a "history" table that has a composite key of id + timestamp + type, then a "json" field where you just marshal the thing down and save it. That would help with a) carrying one history table instead of a bunch of clone tables, and b) give you some flexibility if the schema changes (i.e. leverage the open schema nature of JSON)
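A rough sketch of that cheat, assuming Jackson for the JSON marshalling and plain JDBC (the table and column names are made up):

import com.fasterxml.jackson.databind.ObjectMapper;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.time.Instant;

// One generic history table with a composite key (id, timestamp, type) and a
// JSON column holding the whole record, instead of a clone table per entity.
public class HistoryWriter {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public void append(Connection conn, long entityId, Object entity) throws Exception {
        String sql = "INSERT INTO history (entity_id, recorded_at, entity_type, json) "
                   + "VALUES (?, ?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, entityId);
            ps.setTimestamp(2, Timestamp.from(Instant.now()));
            ps.setString(3, entity.getClass().getName());
            ps.setString(4, MAPPER.writeValueAsString(entity)); // marshal the whole record
            ps.executeUpdate();
        }
    }
}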
But since it's archive data with a different usage pattern (you're just reading/writing the records whole), I'd think about some other means of storing it than with the same strict schema as the live data.
It's a nice application of the "write once" paradigm... do you have Hadoop available? ;)

ORMLite - force read objects to have same identity

I'm reading a hierarchy of objects with ORMLite. It is shaped like a tree: parents have a @ForeignCollection of 0+ children, and every child refers to its parent with @DatabaseField(foreign = true). I'm reading and saving the whole hierarchy at once.
As I'm new to ORM in general, and to ORMLite as well, I didn't know that when objects with the same ID in the database are read, they won't be created as the very same object with the same identity, but as several duplicates having the same ID. Meaning I'm now facing the problem that (let's say "->" stands for "refers to") A -> B -> C != C -> B -> A.
I was thinking to solve the problem by manually reading them through the provided DAOs and putting them together by their ID, ensuring that objects with the same ID have the same identity.
Is there an ORMLite-native way of solving this? If yes, what is it; if not, what are common ways of solving this problem? Is this a general problem of ORM? Does it have a name (I'd like to learn more about it)?
Edit:
My hierarchy is such that one building contains several floors, where each floor knows its building, and each floor contains several zones, where every zone knows its floor.
Is this a general problem of ORM? Does it have a name (I'd like to learn more about it)?
It is a general pattern for ORMs and is called “Identity Map”: within a session, no matter where in your code you got a mapped object from the ORM, there will be only one object representing a specific row in the db (i.e. having its primary key).
I love this pattern: you can retrieve something from the db in one part of your code, even make modifications to it, store that object in an instance variable, etc... And in another part of the code, if you get hold of an object for the same “db row” (by whatever means: you got it passed as an argument, you made a bulk query to the db, you created a “new” mapped object with the primary key set to the same value and added it to the session), you will end up with the same object. Even the modifications from before (including unflushed ones) will be there.
(Adding a mapped object to the session may fail because of this, and depending on the ORM and programming language, this add may give you another object back as “the same”.)
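A tiny sketch of the idea, outside any particular ORM (the session-scoped cache and the loader hook are illustrative, not ORMLite API):

import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Identity Map: one canonical object per (type, primary key) within a session.
public class IdentityMap {

    private record Key(Class<?> type, Object id) {}

    private final Map<Key, Object> cache = new HashMap<>();

    // Returns the already-loaded instance for this type/id if present;
    // otherwise loads it (e.g. via a DAO) and remembers it for next time.
    @SuppressWarnings("unchecked")
    public <T> T get(Class<T> type, Object id, Supplier<T> loader) {
        return (T) cache.computeIfAbsent(new Key(type, id), k -> loader.get());
    }
}

Every read would go through such a map first, so A -> B -> C and C -> B -> A resolve to the same instances.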
Unfortunately there is no ORMLite-native way of solving this problem. More complex ORM systems (such as Hibernate) have caching layers which are there specifically for this reason. ORMLite does not have a cache layer, so it doesn't know that it just returned an object with the same id "recently". Here's the documentation on Hibernate caching:
http://docs.jboss.org/hibernate/core/3.3/reference/en/html/performance.html
However, ORMLite is designed to be Lite and cache layers violate that designation IMO. About the only [unfortunate] solution that I see to your issue in ORMLite is to do what you are doing -- rebuilding the object tree based on the ids. If you give more details about your hierarchy we may be able to help more specifically.
So after thinking about your case a bit more, @riwi, it occurred to me that if you have a Building that contains a collection of Floors, there is no reason why the Building object on each of the Floors in the collection cannot be set with the parent Building object. Duh. ORMLite has all of the information it needs to make this happen. I implemented this behavior and it was released in version 4.24.
Edit:
As of ORMLite version 4.26 we added an initial take on an object-cache that can support the requested features asked for. Here are the docs:
http://ormlite.com/docs/object-cache

XML vs. object trees

In my current project (an order management system built from scratch), we are handling orders in the form of XML documents, which are saved in a relational database.
I would outline the requirements like this:
Selecting various details from anywhere in the order
Updating / enriching data (e.g. from the CRM system)
Keeping a record of the changes (invalidating old data, inserting new values)
Details of orders should be easily selectable by SQL queries (for 2nd level support)
What we did:
The serialization is done with proprietary code, disassembling the order into tables like customer, address, phone_number, order_position etc.
Whenever an order is processed a bit further (e.g. due to an incoming event), it is read completely from the database and assembled back into an XML document.
Selection of data is done via XPath (scattered all over the code).
Most updates are done directly in the database (the order will then be reloaded for the next step).
The problems we face:
The order structure (XSD) evolves with every release. Therefore the XPaths and the custom persistence often break and produce bugs.
We ended up with a mixture of working with the document and working with the database (because the persistence layer cannot persist the changes in the document).
Performance is not really an issue (yet), since it is an offline system and orders are often intentionally delayed by days.
I do not expect free consultancy here, but I am a little confused as to how the approach could be improved (next time, basically).
What would you think is a good solution for handling these requirements?
Would working with an object graph, something like JXPath or OGNL, plus an OR mapper be a better approach? Or using the XML support of, e.g., the Oracle database?
If your schema changes often, I would advise against using any kind of object-mapping. You'd keep changing boilerplate code just for the heck of it.
Instead, use the declarative schema definition to validate data changes and access.
Consider an order as a single datum, expressed as an XML document.
Use a document-oriented store like MongoDB, Cassandra or one of the many XML databases to manipulate the document directly. Don't bother with cutting it into pieces to store it in a relational db.
Making the data accessible via reporting tools in a relational database might be considered secondary. A simple map-reduce job on a MongoDB, for example, could populate the required order details into a relational database whenever required, separating the two use cases quite naturally.
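As a rough illustration with the MongoDB Java driver (the connection string, database, and field names are invented for the example):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import java.util.List;

// The whole order is one document: nested structures stay nested, and no
// proprietary disassembly into relational tables is needed.
public class OrderStore {

    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> orders =
                client.getDatabase("oms").getCollection("orders");

            Document order = new Document("orderId", 1001)
                .append("customer", new Document("name", "ACME").append("crmId", "C-42"))
                .append("positions", List.of(
                    new Document("sku", "X-1").append("qty", 3)));
            orders.insertOne(order);

            // Selecting details from anywhere in the order, without XPath.
            Document found = orders.find(new Document("customer.crmId", "C-42")).first();
            System.out.println(found.toJson());
        }
    }
}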
The standard Java EE approach is to represent your data as POJOs and use JPA for the database access and JAXB to convert the objects to/from XML.
JPA
Object-to-Relational standard
Supported by all the application server vendors.
Multiple implementations available: EclipseLink, Hibernate, etc.
Powerful query language, JPQL (which is very similar to SQL)
Handles query optimization for you.
JAXB
Object-to-XML standard
Supported by all the application server vendors.
Multiple implementations available: EclipseLink MOXy, Metro, Apache JaxMe, etc.
Example
http://bdoughan.blogspot.com/2010/08/creating-restful-web-service-part-15.html
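As a rough sketch, a single annotated POJO can serve both standards (javax-era APIs shown; the Order fields are illustrative):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

// JPA maps this class to a table; JAXB maps it to/from an XML document.
@Entity
@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class Order {

    @Id
    @GeneratedValue
    private Long id;

    private String customerName;

    // Both JPA and JAXB require a no-arg constructor.
    public Order() {}

    // getters/setters omitted for brevity
}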

Where do you put your dictionary data?

Let's say I have a set of Countries in my application. I expect this data to change, but not very often. In other words, I do not look at this set as operational data (I would not provide CRUD operations for Country, for example).
That said I have to store this data somewhere. I see two ways to do that:
Database-driven. Create and populate a Country table. Provide some sort of DAO to access it (findById()?). This way client code will have to know the id of a country (which can also be a name or an ISO code). On the application side I will have a Country class.
Application-driven. Create an enum where I can list all the countries known to my system. It will be stored in the DB as well, but the difference is that now client code does not need a lookup method (findById, findByName, etc.) and does not hardcode ids, names, or ISO codes; it references a particular country directly.
I lean towards the second solution for several reasons. How do you do this?
Is it correct to call this 'dictionary data'?
Addendum: One of the main problems here is that if I have a lookup method like findByName("Czechoslovakia"), then after 1992 this will return nothing. I do not know how the client code will react to it (after all, it sort of expects to always get a Country back, because, well, it is dictionary data). It gets even worse if I have something like findById(ID_CZ). It will be really hard to find all these dependencies.
If I will remove Country.Czechoslovakia from my enum, I will force myself to take care of any dependency on Czechoslovakia.
In some applications I've worked on there has been a single 'Enum' table in the database that contained all of this type of data. It simply consisted of two columns: EnumName and Value, and would be populated like this:
"Country", "Germany"
"Country", "United Kingdom"
"Country", "United States"
"Fruit", "Apple"
"Fruit", "Banana"
"Fruit", "Orange"
This was then read in and cached at the beginning of the application execution. The advantages being that we weren't using dozens of database tables for each distinct enumeration type; and we didn't have to recompile anything if we needed to alter the data.
This could easily be extended to include extra columns, e.g. to specify a default sort order or alternative IDs.
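A sketch of the load-and-cache step described above (the column names follow the example; error handling is elided):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Reads the single two-column Enum table once at startup and serves the
// values from memory afterwards.
public class EnumCache {

    private final Map<String, List<String>> valuesByEnumName = new HashMap<>();

    public void load(Connection conn) throws Exception {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT EnumName, Value FROM Enum")) {
            while (rs.next()) {
                valuesByEnumName
                    .computeIfAbsent(rs.getString("EnumName"), k -> new ArrayList<>())
                    .add(rs.getString("Value"));
            }
        }
    }

    // e.g. valuesFor("Country") -> ["Germany", "United Kingdom", "United States"]
    public List<String> valuesFor(String enumName) {
        return valuesByEnumName.getOrDefault(enumName, List.of());
    }
}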
This won't help you, but it depends...
- What are you going to do with those countries?
- Will you store them in other tables in the DB / what will happen to existing data if you add new countries / will other applications access those data?
- Are you going to translate the country names into several languages?
- Will the business logic of your application depend on the chosen country?
- Do you need a Country class?
etc...
Without more information, I would start with an enum with a few countries and refactor depending on my needs...
If it's not going to change very often and you can afford to bring the application down to apply updates, I'd place it in a Java enumeration and write my own methods for findById(), findByName() and so on.
Advantages:
Fast - no DB access for invariant data (or caching requirement);
Simple;
Plays nice with refactoring tools.
Disadvantages:
Need to bring down the application to update.
If you place the data in its own jarfile, updating is as simple as updating the jar and restarting the application.
The hardcoding concern can be made to go away either by consumers storing a value of the enumeration itself, or by referencing the ISO code, which is unlikely to change for countries...
If you're worried about keeping this enumeration "in synch" with the database, write an integration test that checks exactly that and run it regularly (eg: on your CI machine).
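A minimal sketch of that enum-with-lookup-methods approach (the countries and ISO codes are just examples):

import java.util.Optional;

public enum Country {
    GERMANY("DE", "Germany"),
    UNITED_KINGDOM("GB", "United Kingdom"),
    UNITED_STATES("US", "United States");

    private final String isoCode;
    private final String displayName;

    Country(String isoCode, String displayName) {
        this.isoCode = isoCode;
        this.displayName = displayName;
    }

    // Returning Optional forces callers to handle the "Czechoslovakia after
    // 1992" case from the question explicitly.
    public static Optional<Country> findByIsoCode(String code) {
        for (Country c : values()) {
            if (c.isoCode.equals(code)) return Optional.of(c);
        }
        return Optional.empty();
    }

    public static Optional<Country> findByName(String name) {
        for (Country c : values()) {
            if (c.displayName.equalsIgnoreCase(name)) return Optional.of(c);
        }
        return Optional.empty();
    }
}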
Personally, I've always gone for the database approach, mostly because I'm already storing other information in the database so writing another DAO is easy.
But another approach might be to store it in a properties file in the jar? I've never done it that way in Java, but it seems to be common in iPhone development (something I'm currently learning).
I'd probably have a text file embedded into my jar. I'd load it into memory on start-up (or on first use.) At that point:
It's easy to change (even by someone with no programming knowledge)
It's easy to update even without full redeployment - put just the text file somewhere on the class path
No database access required
EDIT: Okay, if you need to refer to the particular country data from code, then either:
Use the enum approach, which will always mean redeployment
Use the above approach, but keep an enum of country IDs and then have a unit test to make sure that each ID is mapped in the text file. That means you could change the rest of the data without redeployment, and a non-technical person can still update the data without seeing scary code everywhere.
Ultimately it's a case of balancing pros and cons - if the advantages above aren't relevant for you (e.g. there'll always be a coder on hand, and deployment isn't an issue) then an enum makes sense.
One of the advantages of using a database table is that you can put foreign key constraints in. That way your referential integrity will always be intact. No need to run the integration tests DanVinton suggested for enums; it will never get out of sync.
I also wouldn't try making a general enum table as saw-lau suggested, mainly because you lose clean foreign key constraints, which are the main advantage of having them in the DB in the first place (might as well stick them in a text file). Databases are good at handling lots of tables. Prefix the table names with "ENUM_" if you want to distinguish them in some fashion.
The app can always load them into a Map as start-up time or when triggered by a reload event.
EDIT: From comments, "Of course I will use foreign key constraints in my DB. But it can be done with or without using enums on app side"
Ah, I missed that bit while reading the second bullet point in your question. However, I still say it is better to load them into a Map, mainly based on DRY. Otherwise, when whoever has to maintain it comes to add a new country, they're surely going to update one place but not the other, and be scratching their heads until they figure out that they needed to update it in two different places. A case of premature optimisation. The performance benefit would be minimal, at the cost of less maintainable code, IMHO.
I'd start off doing the easiest thing possible: an enum. When it comes to the point that countries change almost as frequently as my code, then I'd make the table external so that it can be updated without a rebuild. But note that when you make it external, you open up a whole can of UI, testing, and documentation worms.
