Where do you put your dictionary data?


Let's say I have a set of Countries in my application. I expect this data to change, but not very often. In other words, I do not regard this set as operational data (I would not provide CRUD operations for Country, for example).
That said I have to store this data somewhere. I see two ways to do that:
Database driven. Create and populate a Country table. Provide some sort of DAO to access it (findById()?). This way client code will have to know the ID of a country (which can also be a name or ISO code). On the application side I will have a class Country.
Application driven. Create an Enum where I can list all the Countries known to my system. It will be stored in the DB as well, but the difference is that client code no longer needs a lookup method (findById, findByName, etc.) or hardcoded IDs, names or ISO codes. It will reference a particular country directly.
I lean towards second solution for several reasons. How do you do this?
Is it correct to call this 'dictionary data'?
Addendum: One of the main problems here is that if I have a lookup method like findByName("Czechoslovakia"), then after 1992 it will return nothing. I do not know how the client code will react to that (after all, it more or less expects to always get a Country back, because, well, it is dictionary data). It gets even worse if I have something like findById(ID_CZ). It will be really hard to find all these dependencies.
If I remove Country.Czechoslovakia from my enum, I force myself to take care of every dependency on Czechoslovakia, because code that references it no longer compiles.

In some applications I've worked on there has been a single 'Enum' table in the database that contained all of this type of data. It simply consisted of two columns: EnumName and Value, and would be populated like this:
"Country", "Germany"
"Country", "United Kingdom"
"Country", "United States"
"Fruit", "Apple"
"Fruit", "Banana"
"Fruit", "Orange"
This was then read in and cached at the beginning of the application execution. The advantages were that we didn't need a separate database table for each of dozens of enumeration types, and we didn't have to recompile anything if we needed to alter the data.
This could easily be extended to include extra columns, e.g. to specify a default sort order or alternative IDs.
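A minimal sketch of the read-once-and-cache idea described above, assuming the rows have already been fetched from the two-column table. The class name EnumCache and its methods are hypothetical and only illustrate the shape of such a cache:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory cache for a two-column (EnumName, Value) table.
final class EnumCache {
    private final Map<String, List<String>> valuesByEnumName = new LinkedHashMap<>();

    // Each row is {enumName, value}, as read once at application start-up.
    EnumCache(List<String[]> rows) {
        for (String[] row : rows) {
            valuesByEnumName.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
    }

    // Returns the values of one enumeration in load order, or an empty list if unknown.
    List<String> valuesOf(String enumName) {
        List<String> values = valuesByEnumName.get(enumName);
        return values == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(values);
    }

    boolean contains(String enumName, String value) {
        return valuesOf(enumName).contains(value);
    }
}
```

With the sample rows above, valuesOf("Fruit") would return Apple, Banana and Orange in that order. Extra columns such as a sort order would turn the String value into a small value class.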

This won't help you, but it depends...
- What are you going to do with those countries?
Will you store them in other tables in the DB? What will happen to existing data if you add new countries? Will other applications access that data?
- Are you going to translate the country names into several languages?
- Will the business logic of your application depend on the chosen country?
- Do you need a Country class?
etc.
Without more information I would start with an Enum with a few countries and refactor depending on my needs.

If it's not going to change very often and you can afford to bring the application down to apply updates, I'd place it in a Java enumeration and write my own methods for findById(), findByName() and so on.
Advantages:
Fast - no DB access for invariant data (or caching requirement);
Simple;
Plays nice with refactoring tools.
Disadvantages:
Need to bring down the application to update.
If you place the data in its own jarfile, updating is as simple as updating the jar and restarting the application.
The hardcoding concern can be made to go away either by consumers storing a value of the enumeration itself, or by referencing the ISO code which is unlikely to change for countries...
If you're worried about keeping this enumeration in sync with the database, write an integration test that checks exactly that and run it regularly (e.g. on your CI machine).
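As a rough sketch of what such an enumeration with its own lookup methods might look like: the three constants are sample data only, and the method names findByIsoCode and findByName are assumptions in the spirit of the findById()/findByName() mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative subset only; a real list would contain every country the system knows.
enum Country {
    GERMANY("DE", "Germany"),
    UNITED_KINGDOM("GB", "United Kingdom"),
    UNITED_STATES("US", "United States");

    // Index built once when the enum is loaded, so lookups need no DB access.
    private static final Map<String, Country> BY_ISO_CODE = new HashMap<>();
    static {
        for (Country c : values()) {
            BY_ISO_CODE.put(c.isoCode, c);
        }
    }

    private final String isoCode;
    private final String displayName;

    Country(String isoCode, String displayName) {
        this.isoCode = isoCode;
        this.displayName = displayName;
    }

    String isoCode() { return isoCode; }
    String displayName() { return displayName; }

    // Optional makes the "no such country" case explicit for callers.
    static Optional<Country> findByIsoCode(String isoCode) {
        return Optional.ofNullable(BY_ISO_CODE.get(isoCode));
    }

    static Optional<Country> findByName(String name) {
        for (Country c : values()) {
            if (c.displayName.equalsIgnoreCase(name)) {
                return Optional.of(c);
            }
        }
        return Optional.empty();
    }
}
```

Consumers can then store Country.GERMANY directly, or persist the ISO code and resolve it with findByIsoCode.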

Personally, I've always gone for the database approach, mostly because I'm already storing other information in the database so writing another DAO is easy.
But another approach might be to store it in a properties file in the jar? I've never done it that way in Java, but it seems to be common in iPhone development (something I'm currently learning).

I'd probably have a text file embedded into my jar. I'd load it into memory on start-up (or on first use.) At that point:
It's easy to change (even by someone with no programming knowledge)
It's easy to update even without full redeployment - put just the text file somewhere on the class path
No database access required
EDIT: Okay, if you need to refer to the particular country data from code, then either:
Use the enum approach, which will always mean redeployment
Use the above approach, but keep an enum of country IDs and then have a unit test to make sure that each ID is mapped in the text file. That means you could change the rest of the data without redeployment, and a non-technical person can still update the data without seeing scary code everywhere.
Ultimately it's a case of balancing pros and cons - if the advantages above aren't relevant for you (e.g. there'll always be a coder on hand, and deployment isn't an issue) then an enum makes sense.
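The second variant (text file plus an enum of IDs) could be sketched roughly like this. CountryId, CountryNames and the properties-style file format are assumptions for illustration; the unit test mentioned above would simply assert that unmappedIds() is empty.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical IDs that code is allowed to reference directly.
enum CountryId { DE, GB, US }

// Country data loaded from a text file in key=value format (assumed format).
final class CountryNames {
    private final Properties names = new Properties();

    // The reader would normally wrap a classpath resource; any Reader works here.
    CountryNames(Reader source) {
        try {
            names.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the name from the text file, or null if the ID is not mapped.
    String nameOf(CountryId id) {
        return names.getProperty(id.name());
    }

    // Enum IDs that have no entry in the text file; a unit test expects none.
    List<CountryId> unmappedIds() {
        List<CountryId> missing = new ArrayList<>();
        for (CountryId id : CountryId.values()) {
            if (names.getProperty(id.name()) == null) {
                missing.add(id);
            }
        }
        return missing;
    }
}
```

Only the IDs require a redeployment when they change; the names and any other per-country data stay editable in the text file.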

One of the advantages of using a database table is you can put foreign key constraints in. That way your referential integrity will always be intact. No need to run integration tests as DanVinton suggested for enums, it will never get out of sync.
I also wouldn't try making a general enum table as saw-lau suggested, mainly because you lose clean foreign key constraints, which is the main advantage of having them in the DB in the first place (you might as well stick them in a text file). Databases are good at handling lots of tables. Prefix the table names with "ENUM_" if you want to distinguish them in some fashion.
The app can always load them into a Map at start-up time or when triggered by a reload event.
EDIT: From comments, "Of course I will use foreign key constraints in my DB. But it can be done with or without using enums on app side"
Ah, I missed that bit while reading the second bullet point in your question. However I still say it is better to load them into a Map, mainly based on DRY. Otherwise, when whoever has to maintain it comes to add a new country, they're surely going to update in one place but not the other, and be scratching their heads until they figure out that they needed to update it in two different places. A case of premature optimisation. The performance benefit would be minimal, at the cost of less maintainable code, IMHO.

I'd start off doing the easiest thing possible - an enum. When it comes to the point that countries change almost as frequently as my code, then I'd make the table external so that it can be updated without a rebuild. But note when you make it external you add a whole can of UI, testing and documentation worms.

Related

Create aggregate root in the context of another aggregate root

I'm currently struggling with the creation of instances in the DDD context.
I have read and searched a lot, and sometimes thought I had found the answer, only to realize that it doesn't feel right while programming it.
This is my situation:
I have two aggregate roots, Scenario and Step. I made those ARs because they encapsulate related elements of the domain and each AR should be in a consistent state.
Multiple Steps can exist in the context of a Scenario. They cannot exist on their own.
The "name/natural id" of each Step in the context of its Scenario has to be unique. Changes in Scenario do not automatically influence its Steps and vice versa (e.g. Step doesn't care if Scenario changes some descriptions or images).
Different Steps of a Scenario can be used, edited, etc. at the same time.
At the moment, each Step holds a reference to its Scenario by the corresponding natural identifier. The Scenario class doesn't know anything about its Steps, so it does not hold a collection of Step references.
How would I create a new Step for a given Scenario?
Should I load the Scenario and call something like createNewStep(...) on it? That would not enforce the uniqueness constraint (which is in fact a business constraint and not a technical one), because Scenario doesn't know about its Steps. I would probably have to go with some kind of "disconnected domain model" then, or pass a repository or service to the method to perform the checks.
Should I use a domain service that enforces the constraint, queries the repository, and finally creates and returns the Step?
Should Scenario simply know about its Steps? I think I would like to avoid this one, since it would create an ugly-to-maintain bidirectional relationship.
One could imagine other use cases, such as a Step being classified by options that are provided by the specific Scenario. In this case, and if there were no constraints regarding the "collection" of Steps, I would probably go with the first "solution". Then again: if the classification is changed afterwards, access to the Scenario would be necessary to check for the allowed classifications. That brings me to a possible 4th solution:
Using some kind of "combination" of the possible solutions. Would it be a good idea to create the domain service (accessing everything needed) and pass it as an argument to the method that needs it? The method would then call the service where needed, and the "domain logic" stays in the entity/model.
Thank you in advance!
I'll just edit instead of copy paste answering ;)
Thank you all for your responses! :)
Pushing the steps back into the scenario would lead to some pretty big objects, which I'm trying to avoid (the currently running application really suffers from this). It seems to be pretty much like the Scrum example in Vaughn's "Effective Aggregate Design", where he uses domain services to get smaller aggregates (I really don't know why I'm so uncertain about using domain services). Looks like I'll have to use domain services or split the aggregates up into "StepName" and "StepDetails" as suggested.

For background, you should read what Greg Young has to say about set validation (via WaybackMachine). In particular, you really need to evaluate, in the context of your solution, what is the business impact of having a failure?
Accept the failure and escalate is by far your easiest option. In what follows, I assume that the business impact of the failure is large, so we need to prevent it from happening.
The "name/natural id" of each Step in the context of its Scenario has to be unique
That's a classic set validation concern.
The first thing to do is challenge the assumptions in your model
Is your model the book of record for "name"? If your model isn't the authority, you have to be very cautious about introducing constraints. Understanding the boundaries of your model's authority is really important.
Is there an invariant that couples the name of a step to any other part of its state? Aggregate design discipline says that two pieces of state coupled by an invariant need to be in the same aggregate, but it's silent about properties that don't participate in an invariant.
Is it reasonable to reject a name change while accepting other changes to a step? This is really a variation of the previous -- can tasks be split into two different commands (one involving name, one not) that can succeed or fail independently?
In short, the invariant may be telling you that "step name", as a piece of state, belongs in the scenario aggregate rather than in the step aggregate.
If you think about the problem from the perspective of a relational model, we're looking at a tuple (scenarioId, name, stepId), and the constraint says that (scenarioId, name) form a unique key. That's a hint that step name belongs to the scenario. In code, that signature looks like a scenario data structure that includes a Map<StepName, StepId>.
That won't necessarily solve all of your problems of course, but it is a step toward aligning the model with your actual business.
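To make the idea concrete, here is a minimal sketch in which the scenario aggregate owns a map from step name to step id and enforces uniqueness itself. Plain Strings stand in for the name and id value types, and the method names registerStep, renameStep and isNameTaken are assumptions, not part of the question's model:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the scenario aggregate owns the name-to-step index.
final class Scenario {
    private final String scenarioId;
    private final Map<String, String> stepIdsByName = new HashMap<>();

    Scenario(String scenarioId) {
        this.scenarioId = scenarioId;
    }

    // Claims a name for a step; rejects a name already used by a different step.
    void registerStep(String stepName, String stepId) {
        String existing = stepIdsByName.putIfAbsent(stepName, stepId);
        if (existing != null && !existing.equals(stepId)) {
            throw new IllegalStateException(
                    "Step name already used in scenario " + scenarioId + ": " + stepName);
        }
    }

    // Renaming is a command on the scenario, independent of other step edits.
    void renameStep(String oldName, String newName) {
        String stepId = stepIdsByName.get(oldName);
        if (stepId == null) {
            throw new IllegalArgumentException("Unknown step name: " + oldName);
        }
        if (oldName.equals(newName)) {
            return;
        }
        registerStep(newName, stepId); // throws if newName belongs to another step
        stepIdsByName.remove(oldName);
    }

    boolean isNameTaken(String stepName) {
        return stepIdsByName.containsKey(stepName);
    }
}
```

The Step aggregate keeps everything else, so concurrent edits to different steps still do not contend with each other; only name changes go through the Scenario.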
When that doesn't work...
The "real" answer is to move the step entity back into the scenario aggregate. One way to think about it is this -- all of the entities taken together form "the model" that we are keeping consistent. The aggregates aren't part of the business, per se; they are artificial, independent subdivisions within the model -- we identify and isolate aggregates as a performance optimization; we can perform concurrent edits, and evaluate the validity of a command while loading a much smaller data set.
If the failures make the performance optimization too expensive, you take it out. So you can see that we have an estimate, of sorts, for what it means that the business impact is "large"; it needs to be bigger than the savings we get from using aggregates on the happy path.
Another possibility is to shift where you enforce the invariant. Relational databases are really really good at set validation. So maybe the right answer is to split the enforcement concern: put the invariant into your schema as a constraint, and ignore that constraint in code.
This isn't ideal for a number of reasons -- you've effectively "hidden" the constraint, you've introduced a constraint on the kind of data store that you use for your aggregates, you've introduced a constraint that requires that you store your step aggregates in the same database as the scenario they belong to, and so on. If you squint, you'll see that this is really just the "make the step entities part of the scenario" solution, but in disguise.
But keep in mind: part of the point of domain-driven-design is that we can push back on the business when the code is telling us that the business model itself is wrong. Where's the cost benefit analysis?
Here's the thing about uniqueness constraints: the model enforces uniqueness, not correctness. Imagine a data race, two different commands that each claim the same "name" for a different step in the scenario -- perhaps caused by a data entry error. The model, presumably, can't tell which command is "right", so it's going to make some arbitrary guess (most likely, first command wins). If the model guesses wrong, it has effectively blocked the client that provided correct data!
In cases where the model is the authority, uniqueness constraints can make sense -- the SeatMap aggregate can enforce the constraint that only one ticket can be assigned to a seat at any given time, because it is the authority for assignment.

Should I create Entity with "natural ID" or should I always use Long as ID in each entity

During my career I have seen two different designs for modelling business objects in the DB:
Always use Long as the ID for an entity.
Choose the most suitable ID type for each entity.
And now we have a "Resource" entity which we can download from another service. Each resource contains a natural ID: an email (email is just an example; we can imagine other situations where we should use a String). I want to use it as the primary ID in the database, but my workmates want to create an additional property, a Long id. I am not sure why I should create this additional property. Of course, the DB model is simpler because all entities have the same structure, but I prefer to use a String id.
What do you think, guys, which model is better and why?

First of all, I am not sure an email can serve as a "Resource"'s unique natural ID, since that would mean creating a new email address for every new resource, and a resource cannot read emails. But I know of such cases, so it may be right.
So to the question:
Impacts
Numeric IDs are generally faster to look up. But since Strings are pretty fast as well (when using appropriate indices), that might be enough for most applications out there.
Numeric IDs use less space (which is usually the least of your problems).
String IDs are usually preferred where heterogeneous systems are involved (which you can see in your example: the service provides the "Resource"s with String IDs). One reason is that they are easier to debug, e.g. the user might see at a glance which object is referred to; another is that Strings are the lowest common denominator of virtually any system (encoding problems will still be there, though ^^).
If you have to do lots of manual jobs in a database, you can type numbers faster, since Strings tend to be longer.
If you use natural IDs, it is pretty common for the ID to consist of more than one column. This makes SQL statements longer and more error-prone, just as it makes object-relational mapper configurations longer and more error-prone.
You usually have some unique identifier (like an email) that might nevertheless change over time (people marry ^^). In those cases it is quite common to add an artificial ID as well (have both).
In your case you do not have a choice (?) other than to use that String ID to communicate with this service, so you must at least have this as well.
So now for my own opinion: I think that as a developer you have less work and fewer problems with numeric IDs, though debugging is a little harder. As a database administrator, if you have only one column it does not matter whether it's a String or a Long, since it does not complicate joins. As long as the String is immutable, i.e. never changes, you are all right. If it can change, it will definitely give you lots of headaches as an administrator (and the stupid developer won't care a bit ^^). If it might change over time, use numeric IDs.

I agree in principle that you should use natural IDs where possible, though in this case email is potentially not a good candidate. Natural IDs should be immutable, i.e. they should never change. If there is any probability that the system would need to change/disassociate an email from a resource, you're essentially changing the identity of the record.
If it were me, and there were no other potential natural IDs, I would use a unique number. In this case it does not add any unnecessary complexity, and it leaves the design open for future changes to requirements around the email property.
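A small sketch of the "have both" design: identity is carried by a surrogate number, while the email stays a unique but changeable attribute. The field and method names are assumptions, and persistence mapping is left out on purpose:

```java
import java.util.Objects;

// Sketch: identity lives in the surrogate id, so the email may change freely.
final class Resource {
    private final long id;   // surrogate key, assigned once (e.g. by a DB sequence)
    private String email;    // natural identifier from the external service

    Resource(long id, String email) {
        this.id = id;
        this.email = Objects.requireNonNull(email);
    }

    long id() { return id; }
    String email() { return email; }

    // Changing the email does not change which record this object represents.
    void changeEmail(String newEmail) {
        this.email = Objects.requireNonNull(newEmail);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Resource && ((Resource) other).id == id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}
```

Because equals and hashCode depend only on the id, a Resource stays findable in hash-based collections and foreign keys stay valid after an email change. A unique constraint on the email column would still be needed to talk to the external service.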

Michal,
Email and similar character data columns may not be the right choice for an ID, as case sensitivity depends on the database implementation and/or the collation being used. Do you want user@server.com and USER@SERVER.COM to give you the same result? Whether that is possible depends on your choice of database/OS/collation. When you have character-based IDs, you silently push these concerns from the application to database administration and collation config.
This might be fine, as it is only a one-time activity and your DB admin can set it up for you, but more often than not you have to maintain separate DB scripts for different OSes and databases.
In my view, there is no rule of thumb for this and you have to make the best judgement depending on the situation.

In addition to the arguments already mentioned, you can search for "surrogate key" or visit the Wikipedia page on this topic, http://en.wikipedia.org/wiki/Surrogate_key, which lists many pros and cons.

How to maintain/generate tables in Hibernate for multi-user purpose?

I'm working on a project using Play Framework that requires me to create a multi-user application. I've a central panel where we add a certain workshop for a team. Thing is, I don't know if this is the best way, but I want to generate the tables like
team1_tablename
team1_secondtable..
Then, when a request comes in through the virtual host (e.g. http://teamawesome.workshop.com), I would need to route the query to THAT team's tables.
The problem is not generating the tables, but working with the models. All the workshops are going to have the same generic tables. In the model I would have to state the table name, etc. If this were PHP with Doctrine, I would have a template create the models after creating the workshop team1; in Java, even if I generate them, I would have to compile them too, which requires more research.
My question is more Hibernate oriented, before jumping the gun here and giving up on possible solutions. I'm all ears.
I've thought of using NamedQueries. I don't know if I misread, but I read in a Hibernate book that you could run a query and add the result to a generic model, so I could use that model to hold all my results...
If there are any doubts let me know, thanks (note this is not a multi database question, just using different sets of tables with unique prefixes)

I wonder if you could use one single set of tables, but have something like TEAM_ID as a foreign key in each table.
You would need one single TEAM table, where TEAM_ID will be the primary key. This will get migrated to tables and become part of foreign keys.
For instance, if you have a Player entity that has a collection of HighScores, then in the DB the Player table will have a TEAM_ID (foreign key from the Team table) and the HighScores table will have a composite foreign key (Player_id, Team_id) coming from the Player table.
So, bottom line, I am suggesting a logical partitioning of your database rather than a physical one (as you initially considered).
Hope this makes sense, it definitely needs more thought, but if you think it's an interesting idea, I can think it through in more detail.
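As a very rough illustration of logical partitioning, here is an in-memory stand-in for a single PLAYER table with a TEAM_ID column. Player and PlayerTable are hypothetical names; in a real application the filter would be a WHERE clause issued through Hibernate rather than a loop.

```java
import java.util.ArrayList;
import java.util.List;

// One row of the single, shared PLAYER table.
final class Player {
    final long playerId;
    final long teamId;   // foreign key to the single TEAM table
    final String name;

    Player(long playerId, long teamId, String name) {
        this.playerId = playerId;
        this.teamId = teamId;
        this.name = name;
    }
}

// In-memory stand-in for the table; every read is scoped by team.
final class PlayerTable {
    private final List<Player> rows = new ArrayList<>();

    void insert(Player player) {
        rows.add(player);
    }

    // Same intent as: SELECT * FROM PLAYER WHERE TEAM_ID = ?
    List<Player> findByTeam(long teamId) {
        List<Player> result = new ArrayList<>();
        for (Player p : rows) {
            if (p.teamId == teamId) {
                result.add(p);
            }
        }
        return result;
    }
}
```

All teams share one set of entity classes and one set of tables; the team only appears as a parameter on each query.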

I am familiar with Hibernate and another web framework, here is how I would handle it:
I would create a single set of tables for one team that would address all my needs. Then I would:
Using DB2: Create a schema for each team, copying the set of tables into each schema.
Using MySQL: Create a new database for each team, copying the set of tables into each one.
Note: A 'database' in MySQL is more like a schema in other databases. (Sorry I'd rather keep things too simple than miss the point)
Now you can set up a separate hibernate.cfg.xml file for each connection (this isn't exactly the best way, but perhaps the best to start with because it's so easy). There you can specify the connection parameters, including the schema/db. Your entity, let's say it's called "team", will then use the "team" table wherever it is connected.
To get started very quickly, when a user logs on create a user object in their session.
The user object will have a Hibernate SessionFactory which will be used for all database requests built from the correct hibernate.cfg.xml file as determined by parsing the URL used in the login.
Once the above is working... There are some serious efficiency concerns to address. That being that each logged on user is creating a SessionFactory... Maybe it isn't an issue if there isn't a lot of concurrent use but you probably want to look into Spring at that point and use a connection pool per team. This way there is one Session factory per team and there is no major object creation when a user signs in.
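The "one factory per team" idea can be sketched without Hibernate on the classpath. In the sketch below the type parameter F stands in for Hibernate's SessionFactory, and PerTeamRegistry, forTeam and teamFromHost are hypothetical names; how the factory is actually built from a per-team configuration is deliberately left to the supplied function.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Caches one expensive factory object per team instead of one per logged-in user.
final class PerTeamRegistry<F> {
    private final Map<String, F> factoriesByTeam = new ConcurrentHashMap<>();
    private final Function<String, F> builder;

    // builder creates the factory for a team, e.g. from that team's config file.
    PerTeamRegistry(Function<String, F> builder) {
        this.builder = builder;
    }

    // Builds the factory on first use for a team and reuses it afterwards.
    F forTeam(String teamName) {
        return factoriesByTeam.computeIfAbsent(teamName, builder);
    }

    // Extracts "teamawesome" from a host such as "teamawesome.workshop.com".
    static String teamFromHost(String host) {
        int dot = host.indexOf('.');
        return dot < 0 ? host : host.substring(0, dot);
    }
}
```

A user's session would then hold only the team name, and each request would ask the registry for the shared factory.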
The benefit of this solution is that it should be easier to create new sets of tables, because each table set lives in its own world. There will only be one set of entity classes, as opposed to one for every combination of team and table. The database schema stays rather simple, not being complicated by adding team names and then the required constraints. If the teams require data ownership and privacy, it will be rather easy to move the database to a different location.
The down side is that if the model needs to be changed for a team it must be done for each team (as opposed to a single table set using teamName as a foreign key).

The idea of using different tables for each team (despite what successful apps may use it) is honestly quite naïve, and has serious pitfalls when you take maintenance into account...
Just think what you will be forced to do if you discover you need a new table or even just an index... you'll end up needing to write DDL scripts as templates and to use some (custom) software to run them on all the teams...
As mentioned in the other answers (Quaternion's and Octav's), I think you have two viable options:
Bring the "team" into your data model
Split the data in different databases/schemas
To choose the option that works best for you, you must decide if the "team" is really something you can partition your dataset into, or if it is really one more entity you want to bring into your datamodel.
You may have noticed that I'm using "splitting" here instead of "partitioning" - that's because the latter term is generally used by DBAs to indicate what we could call "sharding" - "splitting" is intended to be a stronger term.
Splitting is only viable if:
entities in different partitions do not ever need to reference each other
no query will ever need to access data from different partitions (this applies to queries used for reporting too)
As you might well see, splitting in this sense is not very attractive (maybe it could be ok now, but what when you find yourself wanting to add new features?), so my advice is to go for the "the Team is an entity" solution.
Also note that maintaining a set of databases/schemas is actually harder than maintaining a single (albeit maybe a bit more complex) database... again, think of what steps you should take to add an index in a production system...
The only downside of the single-database solution manifests if you end up having multiple front-ends (maybe due to customizations for particular customers): changes to a shared database have the potential to affect all the applications using it, so you may need to coordinate upgrades to the different webapps to minimize risks (note, however, that in most cases you'll be able to change the database without breaking compatibility).

It's a little frustrating to get no information and just shoot in the dark. Nevertheless, now that I have started, I'll try to finish.
I think you could do the job with the following solution:
Write a PlayPlugin and make sure it adds the team to the request args of every request. Then write your own NamingStrategy. In the NamingStrategy you can read request.args and put the team into the table name. Depending on how you add it (Team_ or Team.), you get either your preferred solution or one based on schemas. It sounds as though you already have a DB schema, so it would probably be best to stay with those tables and not migrate.
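The name-building part of such a naming strategy could look roughly like the helper below. It is plain Java and does not implement any Hibernate interface; TeamTableNames and its methods are hypothetical, and the validation rule is an assumption added because the team name ultimately comes from the request.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Builds per-team table names; meant to be called from a custom naming strategy.
final class TeamTableNames {
    // Only simple identifiers are accepted, since the team comes from the request.
    private static final Pattern SAFE_TEAM = Pattern.compile("[a-z][a-z0-9]*");

    private TeamTableNames() {
    }

    // "team1" + "tablename" -> "team1_tablename" (the prefix variant).
    static String prefixed(String team, String baseTableName) {
        return normalize(team) + "_" + baseTableName;
    }

    // "team1" + "tablename" -> "team1.tablename" (the schema variant).
    static String schemaQualified(String team, String baseTableName) {
        return normalize(team) + "." + baseTableName;
    }

    private static String normalize(String team) {
        String normalized = team.toLowerCase(Locale.ROOT);
        if (!SAFE_TEAM.matcher(normalized).matches()) {
            throw new IllegalArgumentException("Unsupported team name: " + team);
        }
        return normalized;
    }
}
```

Whether the Team_ or the Team. form is used then becomes a one-line choice inside the strategy.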
Next time, please make your question more abstract and provide some information, such as how many tables there are, whether team is an entity, and how many records a table has (max, avg, min). How stable is your table model? These are all questions that help to give a clear recommendation with arguments.

You can try the vhost module, but it does not seem to be well maintained. I think the idea of putting the team name into the table name is really weird. Postgres and Oracle have schemas for that, so you would use myTeam.myTable. But then you must do the persistence yourself.
Another approach would be different databases, but again Play doesn't give you good support for that. I would try one of these:
Run a separate Play server for each team, if you don't have too many teams.
Put a reference to a Team table in every model. Then you can use Hibernate filters or add the team manually as an additional parameter to each query. Of course this costs some performance; you can address that with Oracle partitions.

Dynamic Typed Table/Model in Java EE?

Usually with Java EE when we create Model, we define the fields and types of fields through XML or annotation before compilation time. Is there a way to change those in runtime? Or better, is it possible to create a new Model based on the user's input during the runtime? Such that the number of columns and types of fields are dynamic (determined at runtime)?
Help is much appreciated. Thank you.
I felt the need to clarify myself.
Yes, I meant database modeling, when talking about Model.
As for the use cases, I want to provide a means for users to define and create their own tables. Infinite flexibility is not required. However some degree of freedom has to be there: e.g. the users can define what fields are needed to describe their product.

You sound like you want to be able to change both objects and schema according to user input at runtime. This sounds like a chaotic recipe for disaster to me. I've never seen it done.
I have seen general schemas that incorporate foreign key relationships to generic tables of name/value pairs, but these tend to become infinitely flexible abstractions that can neither be easily understood nor get out of their own way when it comes to performance.
I'm betting that your users really don't want infinite flexibility. I'd caution you against taking this direction. Better to get your real use cases straight.
Anything is possible, of course. My direct experience tells me that it's a bad idea that your users will hate if you can pull it off. Best of luck.

I worked on a system where we had such facilities. To stay efficient, we would generate/alter the table dynamically for the customer schema. We also needed to embed a meta-model (the model of the model) to process information in the entities dynamically.
Option 1: With custom tables, you have full flexibility, but it also increases the complexity significantly, notably the update/migration of existing data. Here is a list of things you will need to consider:
What if the type of a column changes?
What if a column is added? Is there a default value?
What if a column is removed? Can I discard the existing information?
How to manage renaming of a column?
How to make things portable across databases?
How to make it efficient at database-level (e.g. indexes) ?
How to manage a human error (e.g. user removes a column then changes its mind)?
How to manage migration (script, deployment, etc.) when new version of the system is installed at customer site?
How to have this while using an ORM?
Option 2: A lightweight alternative is to add a few "spare" columns of different types to the business tables (e.g. "USER_DATE_1", "USER_DATE_2", etc.). I've seen that a few times. It will make your DBA scream and is not really considered good practice, but it can at least make a few things easier (e.g. migration scripts, ORM integration).
Option 3: Another option is to store everything in a table with a structure property/data. But then it's really a disaster for database performance. Anything that is not completely trivial will require many joins. And the DBA will scream even more.
Option 4: It is a mix of options 2 and 3. Core tables are fixed, but a table with property/data can be used to somehow extend them.
In summary: think twice before you go this way. It can be done, but has a significant impact on the design and maintenance of the application.
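On the Java side, option 4 might be sketched like this: the core fields stay strongly typed, and user-defined fields live in a key/value structure that would map to a property/data table. ExtensibleProduct and its methods are hypothetical names, and all extension values are kept as Strings to keep the sketch short.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Core columns are fixed; user-defined fields live in a property/data structure.
final class ExtensibleProduct {
    private final long id;
    private final String name; // fixed, strongly typed core column
    private final Map<String, String> extras = new LinkedHashMap<>();

    ExtensibleProduct(long id, String name) {
        this.id = id;
        this.name = name;
    }

    long id() { return id; }
    String name() { return name; }

    // Each entry would be one row (product id, property, data) in the extension table.
    void setExtra(String property, String value) {
        extras.put(property, value);
    }

    // Returns null when the user never defined this property for the product.
    String extra(String property) {
        return extras.get(property);
    }

    Map<String, String> extras() {
        return Collections.unmodifiableMap(extras);
    }
}
```

The trade-off named above is visible here: the extension values have no compile-time type, so validation and conversion have to be driven by a meta-model at runtime.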

This is somehow possible using meta-modeling techniques:
tables for table / column / types at the database level
key/value structures at the Java level
But this has obvious limitations (lack of strongly typed objects) and can IMHO quickly get very complicated (I'm not even sure how to deal with relations). I wouldn't use this approach to define domain objects entirely, but only to extend existing ones (products, articles, etc).
If I remember well, this is what some e-commerce solutions (e.g. BroadVision) were doing.

I think I have found a good answer myself. Those new NoSQL databases (HBase, Cassandra) seem to be exactly what I was looking for. Thanks everyone for your answers.

hibernate workflow

I'm trying to write a program with Hibernate. My domain is now complete and I'm writing the database.
I got confused about what to do. Should I
define my SQL tables in classes and let Hibernate create them,
or create the tables in the database, reverse engineer them and let Hibernate generate my classes?
I heard the first option from someone and read about the second on the NetBeans site.
Does any one know which approach is correct?

It depends on how you best conceptualize the program you are writing. When I am designing my system I usually think in terms of entities and their relationships to each other, so I start with my business objects, then write my Hibernate mappings and let Hibernate create the database.
Other people think better in terms of database tables, in which case that approach is best for them. So you have to decide which one works for you based on your experience.

I believe you can do either, so it's down to preference.
Personally, I write the lot by hand. While Hibernate does a reasonable job of creating a database for you it doesn't do it as well as I can do myself. I'd assume the same goes for the Java classes it produces although I've never used that feature.
With regards to the generated classes (if you went the class generation route) I'm betting every field has a getter/setter whether fields should be read only or not (did somebody say thread safety and mutability) and that you can't add behavior because it gets overridden if you regenerate the classes.

Definitely write the java objects and then add the persistence and let hibernate generate the tables.
If you go the other way you lose the benefit of OOD and all that good stuff.

I'm in favor of writing Java first. It can be a personal preference though.
If you analyse your domain, you will probably find that there is some duplication.
For example, the audit columns (user creator and editor, time created and edited) are often common to most tables.
The id is often a common field.
Look at your domain to see your duplication.
The duplication is an opportunity to reuse.
You could use inheritance, or composition.
Advantages :
less time: you will have far fewer things to write,
logical: the same logical field is written once (it would otherwise be many similar fields),
reuse: in the client code for your entities, you can write reusable code. For example, if all your entities have the same id field called ident because of their superclass, client code can make the generic call object.getIdent() without having to find out the exact class of the object, so it will be more reusable.
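A minimal sketch of the inheritance variant: the shared ident and audit fields sit in one superclass, and client code works against that superclass. BaseEntity, Customer, Invoice and EntityLogging are made-up names, and the persistence mapping (for example JPA's @MappedSuperclass on the base class) is omitted so that the sketch compiles without Hibernate.

```java
import java.time.Instant;

// Fields common to most tables are written once here.
abstract class BaseEntity {
    private Long ident;
    private Instant createdAt;

    Long getIdent() { return ident; }
    void setIdent(Long ident) { this.ident = ident; }

    Instant getCreatedAt() { return createdAt; }
    void setCreatedAt(Instant createdAt) { this.createdAt = createdAt; }
}

// Concrete entities only add what is specific to them.
class Customer extends BaseEntity {
    String name;
}

class Invoice extends BaseEntity {
    String number;
}

// Client code that works for every entity without knowing its exact class.
final class EntityLogging {
    private EntityLogging() {
    }

    static String describe(BaseEntity entity) {
        return entity.getClass().getSimpleName() + "#" + entity.getIdent();
    }
}
```

Composition would work similarly, with the audit fields grouped into a small value class that each entity holds.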
