Changing entity classes during development for App Engine - Java

My problem is pretty much obvious:
I had a class named TaskDescription which corresponded to a kind (table) in the App Engine Datastore.
Then I renamed it to TaskContent, and all stored data now seems to be lost (because the class name, which is the so-called kind name, is part of the path to the stored data).
I realize that almost the same problem will occur every time I rename a field in any entity class.
Nothing is perfect, and I doubt it's possible to get every entity absolutely right from scratch and never change it afterwards.
So, how to deal with this issue?

Two ways:
Leave the entities as they are and change the mapping: in Objectify you can set the kind-to-class mapping explicitly, e.g. @Entity(name="EntityName").
Change the entities in the Datastore: since the Datastore is schemaless, this cannot be done with a simple command. For a kind name change you need to create a new entity, copy all properties, and delete the old entity. People usually use MapReduce jobs to perform this kind of Datastore maintenance. There's a MapReduce library for Java.
For field name changes, Objectify has explicit support for migrating schemas (there are no schemas, but that's what they call it).
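As a hedged sketch of both points, assuming Objectify's annotations (the field names below are hypothetical):

    import com.googlecode.objectify.annotation.AlsoLoad;
    import com.googlecode.objectify.annotation.Entity;
    import com.googlecode.objectify.annotation.Id;

    // Renamed class, but the kind stays "TaskDescription" so existing data is still found.
    @Entity(name = "TaskDescription")
    public class TaskContent {

        @Id
        private Long id;

        // Field renamed from "description" to "content": @AlsoLoad lets Objectify
        // populate the new field from the old property name when loading old entities.
        @AlsoLoad("description")
        private String content;
    }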

Related

How to use prefixed table names with JPA

I'm using Java Play to build a web application and I'm having some trouble persisting data using Ebean and JPA annotations.
I've searched a lot and I'm a little confused; I think this is something that should be common. I want to have tables named like this:
company1_users
company2_users
companyN_users
but I want to have only one entity named User. Is it possible to have this schema?
I've searched about using one entity with multiple tables, about using table name prefixes, and about using dynamic table names, but I found nothing helpful.
Any ideas?
As @GlennLane said in a comment, creating separate tables isn't a good idea; instead, use a field as a company id/discriminator. You will save yourself a lot of headaches, and you won't need to update your code each time a new company joins.
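A minimal sketch of that single-table, discriminator-field approach (entity and column names are hypothetical; plain JPA annotations, which Ebean also understands):

    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.Table;

    // One "users" table for all companies; each row carries the company it belongs to.
    @Entity
    @Table(name = "users")
    public class User {

        @Id
        @GeneratedValue
        private Long id;

        // Discriminator: which company this user belongs to.
        @Column(name = "company_id", nullable = false)
        private Long companyId;

        private String name;

        // Getters/setters omitted for brevity.
    }

Queries then always filter on companyId instead of switching tables.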
If you really want separate tables, create a separate model for each company; at least then you won't mix them up in the code.

Exception while persisting a large object graph using JPA/Hibernate

We are creating a new web application backed by JPA to replace an old web application. As part of the migration we are converting the old application's database to a new, more sophisticated, JPA-managed database.
So I've written a 'script' that converts the old database to a set of JPA entities and subsequently saves them. It works like this:
Create an order of conversion based on the dependencies of the domain models
For each entity
Execute database query to legacy DB
Store new object for each obtained table row in a list in memory
Iterate over generated lists in the same order as the conversion, and persist each entity.
Now, the first two steps work well. Upon persisting, however, I get an exception. The exception occurs when one entity has a relation to another entity. For example, suppose one of our entities is a Book and another is a Chapter defining a @ManyToOne(optional=false) relation to Book. Upon persisting the Chapter, it throws the exception java.lang.IllegalStateException: org.hibernate.TransientPropertyValueException: Not-null property references a transient value - transient instance must be saved before current operation: models.Chapter.book -> models.Book.
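For reference, a minimal sketch of the mapping described in this example (getters/setters mostly omitted; the exact annotations beyond @ManyToOne(optional = false) are assumptions):

    // Book.java
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;

    @Entity
    public class Book {
        @Id
        @GeneratedValue
        private Long id;

        public Long getId() { return id; }
    }

    // Chapter.java
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.ManyToOne;

    @Entity
    public class Chapter {
        @Id
        @GeneratedValue
        private Long id;

        // Non-optional: the referenced Book must be set and must not be
        // transient (i.e. it must already be persisted) when the Chapter is flushed.
        @ManyToOne(optional = false)
        private Book book;

        public void setBook(Book book) { this.book = book; }
    }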
Of course, this indicates that something is wrong with the state of the book: it seems it is either not set or has not yet been persisted. However, I can verify that the Book is set properly in the conversion of the Chapter, and I can also verify that all entities of type Book are persisted by the EntityManager before the entities of type Chapter get persisted. Obviously, my JPA provider does not behave as expected and does not truly persist my Book objects for some reason.
What solution would allow me to save the entire graph of objects that I have converted to the database? I use Hibernate as my JPA provider and I also use Spring 3.1 for injection of dependencies and EntityManagers.
EDIT 1: Some additional info: I've again verified that entityManager.persist() is called on each of the book objects before entityManager.persist() is called on the chapters. However, the id of the book object remains null, meaning it is not properly persisted. The database also remains empty, despite not using transactions.
EDIT 2: Because I don't think it's clear from the text above: the Book and Chapter story is just an example. It happens for any entity that references another entity. This makes it seem as if I'm not using JPA/Hibernate properly as opposed to not setting the values of my entities properly.
EDIT 3: The core issue seems to be that despite persisting Book properly, having all the right annotations, book.getId() remains null. Basically, Hibernate is not setting the ids on my entities after persisting them, leading to problems when I need to use those entities later.
I once battled with such an error from Hibernate myself. It turned out that it was a combination of a cycle in the object graph and the cascade settings that caused the problem.
It has been a while, so the following might not be 100% accurate, but maybe it is enough information to track down your problem:
Hibernate wants to insert the chapter and realizes it needs to insert the book first.
It wants to insert the book and realizes it needs to insert another entity first (e.g. the publisher).
It inserts the publisher and performs the cascades defined on the publisher (e.g. authors).
An author has, e.g., a reference to his latestBook. Because Hibernate internally already marked the book as processed (in step 2), you would now get an exception stating that author.book references a transient instance.
To find out if this is your problem you can enable full hibernate debugging and follow the path hibernate is taking through your object graph.
I've found the answer thanks to the discussion I've had with user1888440.
The crux of the answer was that the Spring @Transactional annotation was nonfunctional in my application. This meant that nothing Hibernate did occurred in the context of a transaction, which meant that Hibernate would not set ids after persisting, which in turn meant that all conversions would break down.
The reason why @Transactional did not work is probably because of a fact I did not mention: this script is part of a Play 2.0 (actually 2.1) app and is thus built using SBT. SBT doesn't use a normal Java setup to build an application, but instead uses the Scala compiler to compile Java as well. My guess is that the Scala compiler did not work well with the AspectJ that Spring requires to make @Transactional work.
Instead, I performed all of the database work involved in this conversion within a programmatically defined Spring transaction (section 11.6). Now everything behaves as expected.
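As a hedged sketch, the programmatic-transaction approach with Spring's TransactionTemplate can look roughly like this (the wiring of the transaction manager and the conversion calls are assumptions):

    import org.springframework.transaction.PlatformTransactionManager;
    import org.springframework.transaction.TransactionStatus;
    import org.springframework.transaction.support.TransactionCallbackWithoutResult;
    import org.springframework.transaction.support.TransactionTemplate;

    public class ConversionRunner {

        private final TransactionTemplate transactionTemplate;

        public ConversionRunner(PlatformTransactionManager transactionManager) {
            this.transactionTemplate = new TransactionTemplate(transactionManager);
        }

        public void runConversion() {
            // All persists happen inside one programmatic transaction, so Hibernate
            // flushes, assigns ids, and actually writes to the database.
            transactionTemplate.execute(new TransactionCallbackWithoutResult() {
                @Override
                protected void doInTransactionWithoutResult(TransactionStatus status) {
                    // persistBooks();    // hypothetical: persist parents first
                    // persistChapters(); // hypothetical: then children that reference them
                }
            });
        }
    }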
Check the unsaved-value for your primary key/object ID in your hbm files. If ID creation is automated by the Hibernate framework and you are setting the ID somewhere yourself, it will throw this error. By default the unsaved-value is 0, so if you set the ID to 0 you will see this error.
Sounds like you are forgetting to assign a Book to each Chapter before persisting it. Even if you have persisted the Book, it needs to be assigned to the book property of the Chapter instance before you can persist the Chapter. This is because you have specified the relationship as non-optional: book can never be null.
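In code terms, a minimal sketch of the ordering this answer describes (assumes the Book/Chapter mapping from the question and an EntityManager with an active transaction):

    import javax.persistence.EntityManager;

    public class ChapterPersistExample {

        static void saveBookAndChapter(EntityManager em) {
            Book book = new Book();
            em.persist(book);        // persist the parent first

            Chapter chapter = new Chapter();
            chapter.setBook(book);   // the non-optional book property must be set
            em.persist(chapter);     // then persist the child that references it
        }
    }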

Correct modeling historical records in a database

In my application I have a set of objects that stay alive during the whole application lifecycle, and I need to create a historical database of them.
These objects are instances of a hierarchy of Java / Scala classes annotated with Hibernate annotations, which I use in my application to load them at startup. Luckily all the classes already contain a timestamp, which means that I do not need to change the object model to be able to create historical records.
What is the most suitable approach?
Use Hibernate without annotations, providing external XML mappings that are the same as the annotation-based ones except for the primary key (which is now a composite key consisting of the previous primary key plus the timestamp).
Use other classes for historical records (this sounds very complicated, as I have a hierarchy of classes rather than a single class, and I would have to subclass my HistoricalRecordClass for every type of record I want to be able to rebuild). Still use Hibernate.
Use a completely different approach (please note I do not like ORMs; it is just a matter of convenience).
Some considerations:
The goal of storing historical records is that the user, through a single GUI, might access both the real-time values of certain data or the historical value, just by specifying a date.
How do you intend to use the historical records? The easiest solution would be to serialize them as JSON and log them to a file.
I've never combined Hibernate XML mappings with Hibernate annotations, but if it works, it sounds more attractive than carrying two parallel object models.
If you need to be able to recreate the application state at any point in time, then you're more or less stuck with writing them to a database (because of the fast random access). You could cheat and have a "history" table that has a composite key of id + timestamp + type, then a "json" field where you just marshal the thing down and save it. That would help with a) carrying one history table instead of a bunch of clone tables, and b) give you some flexibility if the schema changes (i.e. leverage the open schema nature of JSON)
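A minimal sketch of that single history table as a JPA entity (the class, id fields, and column choices are hypothetical):

    import java.io.Serializable;
    import javax.persistence.Embeddable;
    import javax.persistence.EmbeddedId;
    import javax.persistence.Entity;
    import javax.persistence.Lob;
    import javax.persistence.Table;

    @Entity
    @Table(name = "history")
    public class HistoryRecord {

        @Embeddable
        public static class HistoryId implements Serializable {
            private Long entityId;      // id of the live object
            private String entityType;  // which class the snapshot belongs to
            private long timestamp;     // when the snapshot was taken
            // equals/hashCode omitted for brevity
        }

        @EmbeddedId
        private HistoryId id;

        // The whole object marshalled to JSON; schema changes don't break old rows.
        @Lob
        private String json;
    }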
But since it's archive data with a different usage pattern (you're just reading/writing the records whole), I'd think about some other means of storing it than with the same strict schema as the live data.
It's a nice application of the "write once" paradigm... do you have Hadoop available? ;)

How to maintain/generate tables in Hibernate for multi-user purpose?

I'm working on a project using Play Framework that requires me to create a multi-user application. I have a central panel where we add a workshop for a team. The thing is, I don't know if this is the best way, but I want to generate tables like
team1_tablename
team1_secondtable..
Then, when a request comes in via the virtual host (e.g. http://teamawesome.workshop.com), I would need to route the query to that team's tables.
The problem is not generating the tables but working with the models. All the workshops are going to have the same generic tables. In the model I would have to state the table name, etc. If this were PHP with Doctrine, I would have a template create the tables after creating the workshop team1, but in Java, even if I generated the classes, I would have to compile them too, which requires me to do more research.
My question is more Hibernate-oriented, before I jump the gun here and give up on possible solutions. I'm all ears.
I've thought of using NamedQueries. I don't know if I misread, but I read in a Hibernate book that you could run a query and then map the result onto a generic model, so I could use that model to hold all my results...
If anything is unclear let me know, thanks (note this is not a multi-database question, just about using different sets of tables with unique prefixes).
I wonder if you could use one single set of tables, but have something like TEAM_ID as a foreign key in each table.
You would need a single TEAM table, where TEAM_ID is the primary key. This key then gets propagated to the other tables as part of their foreign keys.
For instance, if you have a Player entity with a collection of HighScores, then in the DB the Player table will have a TEAM_ID (foreign key to the Team table) and the HighScores table will have a composite foreign key (Player_id, Team_id) coming from the Player table.
So, bottom line, I am suggesting a logical partitioning of your database rather than a physical one (as you considered initially).
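A hedged sketch of that logical partitioning in JPA terms (entity names follow the example above; the annotations are assumptions):

    // Team.java
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;

    @Entity
    public class Team {
        @Id
        @GeneratedValue
        private Long id;

        private String name;
    }

    // Player.java
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.JoinColumn;
    import javax.persistence.ManyToOne;

    @Entity
    public class Player {
        @Id
        @GeneratedValue
        private Long id;

        // Every player row carries its team instead of living in a team-specific table.
        @ManyToOne(optional = false)
        @JoinColumn(name = "team_id")
        private Team team;
    }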
Hope this makes sense, it definitely needs more thought, but if you think it's an interesting idea, I can think it through in more detail.
I am familiar with Hibernate and another web framework, here is how I would handle it:
I would create a single set of tables for one team that would address all my needs. Then I would:
Using DB2: create a schema for each team, copying the set of tables into each schema.
Using MySQL: create a new database for each team, copying the set of tables into each one.
Note: a 'database' in MySQL is more like a schema in other databases. (Sorry, I'd rather keep things too simple than miss the point.)
Now you can set up a separate hibernate.cfg.xml file for each connection (this isn't exactly the best way, but perhaps the best way to start because it's so easy). In it you can specify the connection parameters... including the schema/db. Now your entity table, let's say it's called "team", will use the "team" table wherever it is connected...
To get started very quickly, when a user logs on create a user object in their session.
The user object will have a Hibernate SessionFactory which will be used for all database requests built from the correct hibernate.cfg.xml file as determined by parsing the URL used in the login.
Once the above is working... there are some serious efficiency concerns to address, namely that each logged-on user creates a SessionFactory... Maybe it isn't an issue if there isn't a lot of concurrent use, but you probably want to look into Spring at that point and use a connection pool per team. This way there is one SessionFactory per team and there is no major object creation when a user signs in.
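A rough sketch of the one-SessionFactory-per-team idea using the classic Hibernate Configuration API (the per-team config file names and the cache are assumptions):

    import java.util.HashMap;
    import java.util.Map;

    import org.hibernate.SessionFactory;
    import org.hibernate.cfg.Configuration;

    public class TeamSessionFactories {

        // One SessionFactory per team, built lazily and cached so it is not
        // recreated on every login.
        private static final Map<String, SessionFactory> FACTORIES = new HashMap<String, SessionFactory>();

        public static synchronized SessionFactory forTeam(String team) {
            SessionFactory factory = FACTORIES.get(team);
            if (factory == null) {
                factory = new Configuration()
                        .configure("hibernate-" + team + ".cfg.xml") // e.g. hibernate-teamawesome.cfg.xml
                        .buildSessionFactory();
                FACTORIES.put(team, factory);
            }
            return factory;
        }
    }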
The benefits of this solution are that it should be easier to create new sets of tables, because each table set lives in its own world. There will only be one set of entity classes, as opposed to one for every combination of team and table. The database schema stays rather simple, not being complicated by team names and the constraints they would require. If the teams require data ownership and privacy, it will be rather easy to move a database to a different location.
The down side is that if the model needs to be changed for a team it must be done for each team (as opposed to a single table set using teamName as a foreign key).
The idea of using different tables for each team (even if some successful apps do use it) is honestly quite naïve and has serious pitfalls once you take maintenance into account...
Just think what you will be forced to do if you discover you need a new table or even just an index... you'll end up needing to write DDL scripts as templates and use some (custom) software to run them against all the teams...
As mentioned in the other answers (Quaternion's and Octav's), I think you have two viable options:
Bring the "team" into your data model
Split the data in different databases/schemas
To choose the option that works best for you, you must decide if the "team" is really something you can partition your dataset into, or if it is really one more entity you want to bring into your datamodel.
You may have noticed that I'm using "splitting" here instead of "partitioning" - that's because the latter term is generally used by DBAs to indicate what we could call "sharding" - "splitting" is intended to be a stronger term.
Splitting is only viable if:
entities in different partitions do not ever need to reference each other
no query will ever need to access data from different partitions (this applies to queries used for reporting too)
As you can probably see, splitting in this sense is not very attractive (maybe it could be OK now, but what about when you find yourself wanting to add new features?), so my advice is to go for the "Team is an entity" solution.
Also note that maintaining a set of databases/schemas is actually harder than maintaining a single (albeit maybe a bit more complex) database... again, think of what steps you should take to add an index in a production system...
The only downside of the single-database solution shows up if you end up having multiple front-ends (maybe due to customizations for particular customers): changes to a shared database have the potential to affect all the applications using it, so you may need to coordinate upgrades to the different webapps to minimize risk (note, however, that in most cases you'll be able to change the database without breaking compatibility).
It's a little frustrating to get no further information and to have to shoot in the dark. Nevertheless, now that I have started the work, I'll try to finish.
I think you could do the job with the following solution:
Write a PlayPlugin and make sure that, for every request, you add the team to the request args. Then write your own NamingStrategy. In the NamingStrategy you can read request.args and put the team into your table name. Depending on how you add it, Team_ or Team., you end up with your preferred solution or with something schema-based. It sounds like you already have a db schema, so it would probably be best to stay with these tables and not migrate.
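A rough sketch of such a NamingStrategy (Hibernate 3/4 style). Note that the naming strategy is consulted when the mapping is built, so in practice the team is effectively fixed per configuration/SessionFactory; the thread-local holder below is a hypothetical stand-in for however the plugin exposes the current team:

    import org.hibernate.cfg.ImprovedNamingStrategy;

    // Prefixes every table name with the current team, e.g. "users" -> "teamawesome_users".
    public class TeamNamingStrategy extends ImprovedNamingStrategy {

        // Hypothetical holder, filled by the PlayPlugin/filter from request.args.
        public static final ThreadLocal<String> CURRENT_TEAM = new ThreadLocal<String>();

        @Override
        public String classToTableName(String className) {
            return prefix(super.classToTableName(className));
        }

        @Override
        public String tableName(String tableName) {
            return prefix(super.tableName(tableName));
        }

        private String prefix(String name) {
            String team = CURRENT_TEAM.get();
            return (team == null) ? name : team + "_" + name;
        }
    }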
Next time, please make your request more concrete and provide information like how many tables there are, whether a team is an entity, and how many records a table has (max, avg, min). How stable is your table model? These are all questions that help to give a clear recommendation with arguments.
You can try the vhost module, but it does not seem very well maintained. And I think the idea of putting the name of the team into the table name is really weird. Postgres and Oracle have schemas for that, so you would use myTeam.myTable. But then you must handle the persistence yourself.
Another approach would be different databases, but again you don't have good support from Play. I would try one of these:
Run a separate Play server for each team, if you don't have too many teams.
Put a reference to a Team table in every model. Then you can use Hibernate filters (see the sketch below) or add the team manually as an additional parameter to each query. Of course this will cost some performance; you can address that with Oracle partitions.
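A hedged sketch of the Hibernate-filter variant (the entity, filter, and column names are hypothetical):

    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;

    import org.hibernate.annotations.Filter;
    import org.hibernate.annotations.FilterDef;
    import org.hibernate.annotations.ParamDef;

    // A team-scoped entity: every row carries its team id, and the filter
    // restricts all queries on this entity to a single team.
    @Entity
    @FilterDef(name = "teamFilter", parameters = @ParamDef(name = "teamId", type = "long"))
    @Filter(name = "teamFilter", condition = "team_id = :teamId")
    public class Workshop {

        @Id
        @GeneratedValue
        private Long id;

        // Reference to the Team table suggested above, kept as a plain column
        // to keep the sketch self-contained.
        @Column(name = "team_id", nullable = false)
        private Long teamId;
    }

Before querying, the filter would be enabled per session, along the lines of session.enableFilter("teamFilter").setParameter("teamId", currentTeamId).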

Where do you put your dictionary data?

Let's say I have a set of Countries in my application. I expect this data to change, but not very often. In other words, I do not look at this set as operational data (I would not provide CRUD operations for Country, for example).
That said I have to store this data somewhere. I see two ways to do that:
Database driven. Create and populate a Country table. Provide some sort of DAO to access it (findById()?). This way client code will have to know the ID of a country (which could also be a name or ISO code). On the application side I will have a class Country.
Application driven. Create an enum where I can list all the countries known to my system. It will be stored in the DB as well, but the difference is that client code no longer needs a lookup method (findById, findByName, etc.) or hardcoded IDs, names, or ISO codes. It will reference a particular country directly.
I lean towards the second solution for several reasons. How do you do this?
Is it correct to call this 'dictionary data'?
Addendum: One of the main problems here is that if I have a lookup method like findByName("Czechoslovakia"), then after 1992 this will return nothing. I do not know how the client code will react to it (after all, it sort of expects to always get a Country back, because, well, it is dictionary data). It gets even worse if I have something like findById(ID_CZ). It will be really hard to find all these dependencies.
If I will remove Country.Czechoslovakia from my enum, I will force myself to take care of any dependency on Czechoslovakia.
In some applications I've worked on there has been a single 'Enum' table in the database that contained all of this type of data. It simply consisted of two columns: EnumName and Value, and would be populated like this:
"Country", "Germany"
"Country", "United Kingdom"
"Country", "United States"
"Fruit", "Apple"
"Fruit", "Banana"
"Fruit", "Orange"
This was then read in and cached at the beginning of the application execution. The advantages being that we weren't using dozens of database tables for each distinct enumeration type; and we didn't have to recompile anything if we needed to alter the data.
This could easily be extended to include extra columns, e.g. to specify a default sort order or alternative IDs.
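A minimal sketch of reading such a table into an in-memory cache at startup (plain JDBC; the table and column names follow the example above):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class EnumCache {

        // e.g. "Country" -> ["Germany", "United Kingdom", "United States"]
        private final Map<String, List<String>> valuesByEnumName = new HashMap<String, List<String>>();

        public void load(Connection connection) throws SQLException {
            Statement statement = connection.createStatement();
            try {
                ResultSet rs = statement.executeQuery("SELECT EnumName, Value FROM Enum");
                while (rs.next()) {
                    String enumName = rs.getString("EnumName");
                    List<String> values = valuesByEnumName.get(enumName);
                    if (values == null) {
                        values = new ArrayList<String>();
                        valuesByEnumName.put(enumName, values);
                    }
                    values.add(rs.getString("Value"));
                }
            } finally {
                statement.close();
            }
        }

        public List<String> valuesOf(String enumName) {
            return valuesByEnumName.get(enumName);
        }
    }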
This won't help you, but it depends...
- What are you going to do with those countries?
- Will you store them in other tables in the DB? What will happen with existing data if you add new countries? Will other applications access that data?
- Are you going to translate the country names into several languages?
- Will the business logic of your application depend on the chosen country?
- Do you need a Country class?
etc...
Without more information I would start with an enum with a few countries and refactor depending on my needs...
If it's not going to change very often and you can afford to bring the application down to apply updates, I'd place it in a Java enumeration and write my own methods for findById(), findByName() and so on.
Advantages:
Fast - no DB access for invariant data (or caching requirement);
Simple;
Plays nice with refactoring tools.
Disadvantages:
Need to bring down the application to update.
If you place the data in its own jarfile, updating is as simple as updating the jar and restarting the application.
The hardcoding concern can be made to go away either by consumers storing a value of the enumeration itself, or by referencing the ISO code which is unlikely to change for countries...
If you're worried about keeping this enumeration "in synch" with the database, write an integration test that checks exactly that and run it regularly (eg: on your CI machine).
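A minimal sketch of that enumeration approach with hand-written lookups (the countries and method names are illustrative):

    public enum Country {

        GERMANY("DE", "Germany"),
        UNITED_KINGDOM("GB", "United Kingdom"),
        UNITED_STATES("US", "United States");

        private final String isoCode;
        private final String displayName;

        Country(String isoCode, String displayName) {
            this.isoCode = isoCode;
            this.displayName = displayName;
        }

        public String isoCode() { return isoCode; }
        public String displayName() { return displayName; }

        // Hand-written lookups instead of a DAO.
        public static Country findByIsoCode(String isoCode) {
            for (Country c : values()) {
                if (c.isoCode.equals(isoCode)) {
                    return c;
                }
            }
            return null;
        }

        public static Country findByName(String name) {
            for (Country c : values()) {
                if (c.displayName.equalsIgnoreCase(name)) {
                    return c;
                }
            }
            return null;
        }
    }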
Personally, I've always gone for the database approach, mostly because I'm already storing other information in the database so writing another DAO is easy.
But another approach might be to store it in a properties file in the jar? I've never done it that way in Java, but it seems to be common in iPhone development (something I'm currently learning).
I'd probably have a text file embedded in my jar. I'd load it into memory on start-up (or on first use). At that point:
It's easy to change (even by someone with no programming knowledge)
It's easy to update even without full redeployment - just put the text file somewhere on the classpath
No database access required
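A minimal sketch of loading such a text file from the classpath at startup (the file name countries.txt is an assumption):

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;

    public class CountryList {

        // Reads one country per line from a text file on the classpath,
        // e.g. countries.txt packaged in the jar (or overridden earlier on the classpath).
        public static List<String> load() throws IOException {
            InputStream in = CountryList.class.getResourceAsStream("/countries.txt");
            if (in == null) {
                throw new IOException("countries.txt not found on the classpath");
            }
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
            try {
                List<String> countries = new ArrayList<String>();
                String line;
                while ((line = reader.readLine()) != null) {
                    if (!line.trim().isEmpty()) {
                        countries.add(line.trim());
                    }
                }
                return countries;
            } finally {
                reader.close();
            }
        }
    }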
EDIT: Okay, if you need to refer to the particular country data from code, then either:
Use the enum approach, which will always mean redeployment
Use the above approach, but keep an enum of country IDs and then have a unit test to make sure that each ID is mapped in the text file. That means you could change the rest of the data without redeployment, and a non-technical person can still update the data without seeing scary code everywhere.
Ultimately it's a case of balancing pros and cons - if the advantages above aren't relevant for you (e.g. there'll always be a coder on hand, and deployment isn't an issue) then an enum makes sense.
One of the advantages of using a database table is that you can put foreign key constraints in. That way your referential integrity will always be intact. There is no need to run integration tests as DanVinton suggested for enums; it will never get out of sync.
I also wouldn't try making a general enum table as saw-lau suggested, mainly because you lose clean foreign key constraints, which are the main advantage of having the data in the DB in the first place (you might as well stick it in a text file). Databases are good at handling lots of tables. Prefix the table names with "ENUM_" if you want to distinguish them in some fashion.
The app can always load them into a Map at start-up time or when triggered by a reload event.
EDIT: From comments, "Of course I will use foreign key constraints in my DB. But it can be done with or without using enums on app side"
Ah, I missed that bit while reading the second bullet point in your question. However I still say it is better to load them into a Map, mainly based on DRY. Otherwise, when whoever has to maintain it comes to add a new country, they're surely going to update in one place but not the other, and be scratching their heads until they figure out that they needed to update it in two different places. A case of premature optimisation. The performance benefit would be minimal, at the cost of less maintainable code, IMHO.
I'd start off doing the easiest thing possible - an enum. When it comes to the point that countries change almost as frequently as my code, then I'd make the table external so that it can be updated without a rebuild. But note when you make it external you add a whole can of UI, testing and documentation worms.
