Why do most Hibernate applications use sequences for ID generation? - java

Why do most Hibernate applications use sequences for ID generation?
Why not use the default GenerationType.AUTO in the @GeneratedValue annotation?
P.S. In my professional career I see everybody using sequences, but I don't understand why they bother with the harder-to-deploy solution (there is always a "create sequence" SQL command in the deployment instructions).

I see several reasons:
The most widely used database in enterprise apps is probably Oracle, and Oracle doesn't have auto-generated IDs, only sequences.
Sequences allow having the ID before inserting a new row rather than after. This is easier to use and more efficient, because you can batch insert statements at the end of the transaction while still having the IDs defined in the middle of the transaction.
Sequences allow using hi/lo algorithms (which is the default with Hibernate's sequence generation), and thus make only one DB call for several inserts, increasing performance (see the sketch after this list).
AUTO varies between databases, whereas a sequence always uses the same strategy.
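For illustration, a typical sequence-backed mapping looks roughly like this (the generator and sequence names are made up for the example; the allocationSize is what lets Hibernate hand out a block of IDs per database round trip, i.e. the hi/lo-style optimization mentioned above):

import javax.persistence.*;

@Entity
public class Invoice {

    @Id
    @SequenceGenerator(name = "invoice_seq_gen",      // logical generator name used below
                       sequenceName = "invoice_seq",  // the sequence created by the deployment script
                       allocationSize = 50)           // IDs reserved per database call
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "invoice_seq_gen")
    private Long id;

    public Long getId() { return id; }
}

With an allocationSize of 50, Hibernate roughly hits the sequence once per 50 inserts instead of once per row, which is where the batching benefit comes from.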

From the excellent book Pro JPA 2: Mastering the Java Persistence API by Mike Keith and Merrick Schincariol.
From Chapter 4: Object Relational Mapping, section Identifier Generation.
[...] If an application does not care what kind of generation is used by the provider but wants generation to occur, it can specify a strategy of AUTO.
There is a catch to using AUTO, though. The provider gets to pick its own strategy to store the identifiers, but it needs to have some kind of persistent resource in order to do so. For example, if it chooses a table-based strategy, it needs to create a table; if it chooses a sequence-based strategy, it needs to create a sequence. The provider can't always rely on the database connection that it obtains from the server to have permissions to create a table in the database. This is normally a privileged operation that is often restricted to the DBA. There will need to be some kind of creation phase or schema generation to cause the resource to be created before the AUTO strategy is able to function.
The AUTO mode is really a generation strategy for development or prototyping. It works well as a means of getting you up and running more quickly when the database schema is being generated. In any other situation, it would be better to use one of the other generation strategies discussed in the later sections [...]

At least for Oracle: one reason is to be able to track the number of objects in a table (for which a table-specific sequence works, as long as no objects are deleted from the table). Using GenerationType.AUTO uses a global sequence number, which results in gaps in the ID numbers when there is more than one table in the database.

There are different considerations for choosing an identity generator; the most important ones are performance and portability, but clustering and data migration might also be considerations.
In practice, in the latest Hibernate versions (if not all of them) the SEQUENCE strategy is actually a sequence-based hi/lo and not a pure sequence, as most people assume.
You can read a pretty detailed post about identity generation strategies on my blog: here
Eyal

Related

JPA performance optimization or alternatives

We are currently in a project with a high demand on performance when it comes to reads from the database.
We are currently using JPA (EclipseLink implementation), currently just because it provides convenient database access and column mapping.
For our queries we are using highly specific SQL queries. We are also using one database (SAP HANA, in-memory), so a language abstraction is not required. The database access is pretty fast, our current bottleneck really is the application server, especially the persistence layer.
The result sets often do not correspond to entities, because they are assembled from the query context. For us, there is no point in using an @Id field like the following, because we don't have fields that are unique (only combinations, and defining an IdClass is too much overhead).
@Entity
public class Item {
    @Id
    public String myField; // the field type was missing in the original; String is only an assumption
    // other fields...
}
This seems to be enforced by JPA if I want to run a typed native query. Is that assumption true? Currently we haven't found a way around the ID mapping.
Are these findings valid?
If not, how can we make our use of JPA more performant (there is significant latency compared to plain JDBC), also without defining an @Id (because it is useless in our case) for result types?
If yes, is there another Java library that provides just a minimal layer on top of JDBC without too much latency, but is more convenient to use than plain JDBC (with column mapping and all that good stuff)?
Thanks!
Use case: We would like to stream historic GPS sensor data from the database. Besides just transforming it to JSON, we also do some transformations/validations, which is why we actually need to build objects. So what we are basically looking for is a convenient way of mapping the fields of SELECT statements to attributes. I hope that makes sense.
There are many articles and blog posts about improving EclipseLink/JPA performance that you might look into, such as EclipseLink Performance, JPA Performance Tuning and Optimizing the EclipseLink Application.
In the end, though, it all depends very much on your specific use case and any future use cases you may have. JPA is designed to make reading and writing on top of JDBC easier and more maintainable, and it adds many performance benefits such as caching. If all you are using it for is to read raw data, though, the extra layer might be overhead that isn't adding any value. There isn't much point in having JPA build entities from the result sets, maintain the cache and watch for changes, only for your application to ignore it all and grab the raw data.
I do not understand why you would have an Item table with a single myField. How is it used by the application, and how does it relate to other tables and potential entities?
Such a construct is not the normal use case for relational databases and ORMs, but there are still ways around it in JPA. The data could be used in element collections by other entities, or even left unmapped, with native SQL queries passed straight through to the JDBC layer. EclipseLink itself has many mapping types and options above and beyond JPA that might be used, depending on your use cases.
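If entity identity really adds no value, one option, assuming JPA 2.1 or later (EclipseLink 2.5+), is to map a native query onto a plain value class via a constructor result mapping, so no @Id is needed on the result type. All names below are invented for the example, and the mapping can alternatively be declared in orm.xml instead of on a carrier entity:

import javax.persistence.*;
import java.util.List;

// Plain value class - not an entity, no @Id required.
public class GpsSample {
    public final double lat;
    public final double lon;
    public GpsSample(double lat, double lon) { this.lat = lat; this.lon = lon; }
}

// The result-set mapping must be declared on some entity class (or in orm.xml).
@Entity
@SqlResultSetMapping(
    name = "GpsSampleMapping",
    classes = @ConstructorResult(
        targetClass = GpsSample.class,
        columns = {
            @ColumnResult(name = "lat", type = Double.class),
            @ColumnResult(name = "lon", type = Double.class)
        }))
class AnyMappedEntity {
    @Id Long id;
}

// Usage (em and deviceId assumed available): the native SQL is passed through unchanged,
// and each row is turned into a GpsSample via its constructor.
List<GpsSample> samples = em
    .createNativeQuery("SELECT lat, lon FROM gps_data WHERE device_id = ?", "GpsSampleMapping")
    .setParameter(1, deviceId)
    .getResultList();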

Create Table from generated classes

I have generated the representations for my database-model in jooq.
Can I use this to recreate the database?
createTable(org.jooq.Table<?> table) wants me to specify the columns.
Ideally, when the schema changes, I would just update the jOOQ representation, and when another user installs it, it would automatically create the right schema.
There is a pending feature request for this kind of functionality:
https://github.com/jOOQ/jOOQ/issues/3160
But as of jOOQ 3.7, this isn't yet possible. The main challenge is the fact that the generated schema meta information will always lack important pieces (e.g. vendor-specific storage clauses), so this kind of functionality is good for simple databases or test databases at best.
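For a simple or test schema, a rough workaround along these lines may be enough (a sketch only; it will not reproduce constraints, defaults or vendor-specific storage clauses, and it assumes a jOOQ version where CreateTableColumnStep.columns(...) is available; the generated Tables reference is hypothetical):

import org.jooq.DSLContext;
import org.jooq.Table;
import org.jooq.impl.DSL;

// 'connection' is an existing java.sql.Connection to the target database
DSLContext ctx = DSL.using(connection);

// a table from the generated meta model, e.g. com.example.generated.Tables.AUTHOR
Table<?> table = com.example.generated.Tables.AUTHOR;

ctx.createTable(table)
   .columns(table.fields())   // replay the generated column definitions
   .execute();                // keys, indexes etc. would still have to be added separately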

JPA 2: Benefits of @TableGenerator

What are the benefits of the @TableGenerator technique for generating primary keys?
Why do we use this technique, and how do we fetch data using the third table that is used to store the sequence name and generator value?
From the link: http://en.wikibooks.org/wiki/Java_Persistence/Identity_and_Sequencing#Table_sequencing
There are several strategies for generating unique ids. Some strategies are database agnostic and others make use of built-in databases support.
JPA provides support for several strategies for id generation defined through the GenerationType enum values: TABLE, SEQUENCE and IDENTITY.
The choice of which sequence strategy to use is important as it affects performance, concurrency and portability.
So the choice of using table generators frees you from using database-specific features. This makes it easy to migrate the database to some other DB provider later on.
So the decision should be made based on whether you want to migrate database providers later on, how much performance you are willing to sacrifice for that, etc.
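For reference, a minimal table-generator mapping looks something like this (the table and column names are only examples; the "third table" mentioned in the question is the id_gen table below):

import javax.persistence.*;

@Entity
public class Customer {

    @Id
    @TableGenerator(name = "customer_gen",
                    table = "id_gen",              // the table holding the generator rows
                    pkColumnName = "gen_name",     // column storing the generator name
                    valueColumnName = "gen_value", // column storing the last allocated value
                    pkColumnValue = "customer_id", // the row used for this entity
                    allocationSize = 50)           // IDs reserved per database access
    @GeneratedValue(strategy = GenerationType.TABLE, generator = "customer_gen")
    private Long id;
}

Because the generator state lives in an ordinary table, this works on any database, at the cost of extra reads and updates against that table.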

How to maintain/generate tables in Hibernate for multi-user purpose?

I'm working on a project using the Play Framework that requires me to create a multi-user application. I have a central panel where we add a certain workshop for a team. Thing is, I don't know if this is the best way, but I want to generate the tables like
team1_tablename
team1_secondtable..
Then when a certain request hits using the virtual host (e.g. http://teamawesome.workshop.com) I would need to route the query to THAT particular table.
The problem is not generating the tables, but working with the models. All the workshops are going to have the same generic tables. In the model I would have to state the table, etc., but then, if this were PHP with Doctrine, I would have a template create them after creating the workshop team1; in Java, even if I generate them, I would have to compile them too, which requires me to do more research.
My question is more Hibernate oriented before jumping the gun here and giving up on possible solutions. I'm all ears
I've thought of using NamedQueries. I don't know if I misread, but I read in a Hibernate book that you could run a query and then add the result to a generic model, so that I could then use that model to hold all my results...
If there are any doubts let me know, thanks (note: this is not a multi-database question, just about using different sets of tables with unique prefixes).
I wonder if you could use one single set of tables, but have something like TEAM_ID as a foreign key in each table.
You would need one single TEAM table, where TEAM_ID is the primary key. This will get migrated into the other tables and become part of their foreign keys.
For instance, if you have a Player entity with a collection of HighScores, then in the DB the Player table will have a TEAM_ID (foreign key from the Team table) and the HighScores table will have a composite foreign key (Player_id, Team_id) coming from the Player table.
So, bottom line, I am suggesting a logical partitioning of your database rather than a physical one (as you considered initially).
Hope this makes sense, it definitely needs more thought, but if you think it's an interesting idea, I can think it through in more detail.
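A minimal sketch of that idea (simplified to a plain player foreign key on HighScore rather than the full composite key; all names are illustrative):

import javax.persistence.*;
import java.util.List;

@Entity
class Team {
    @Id @GeneratedValue Long id;
    String name;
}

@Entity
class Player {
    @Id @GeneratedValue Long id;

    @ManyToOne(optional = false)
    @JoinColumn(name = "TEAM_ID")        // every player row belongs to exactly one team
    Team team;

    @OneToMany(mappedBy = "player")
    List<HighScore> highScores;
}

@Entity
class HighScore {
    @Id @GeneratedValue Long id;

    @ManyToOne
    @JoinColumn(name = "PLAYER_ID")
    Player player;
}

Every query then simply adds a "team = :team" condition (or a Hibernate filter, as mentioned further down) instead of switching tables.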
I am familiar with Hibernate and another web framework, here is how I would handle it:
I would create a single set of tables for one team that would address all my needs. Then I would:
Using DB2: create a schema for each team, copying the set of tables into each schema.
Using MySQL: create a new database for each team, copying the set of tables into each one.
Note: a 'database' in MySQL is more like a schema in other databases. (Sorry, I'd rather keep things too simple than miss the point.)
Now you can set up a separate hibernate.cfg.xml file for each connection (this isn't exactly the best way, but it's perhaps the best way to start because it's so easy). Now you can specify the connection parameters... including the schema/db. Now your entity table, let's say it's called "team", will use the "team" table wherever it is connected...
To get started very quickly, when a user logs on create a user object in their session.
The user object will have a Hibernate SessionFactory which will be used for all database requests built from the correct hibernate.cfg.xml file as determined by parsing the URL used in the login.
Once the above is working... there are some serious efficiency concerns to address, namely that each logged-on user creates a SessionFactory... Maybe it isn't an issue if there isn't a lot of concurrent use, but you probably want to look into Spring at that point and use a connection pool per team. This way there is one SessionFactory per team and there is no major object creation when a user signs in.
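A rough sketch of the one-SessionFactory-per-team idea (the config file naming convention and the shared cache are assumptions, not a drop-in solution):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class TeamSessionFactories {

    // one factory per team, created lazily and shared by all users of that team
    private static final Map<String, SessionFactory> FACTORIES = new ConcurrentHashMap<>();

    public static SessionFactory forTeam(String team) {
        return FACTORIES.computeIfAbsent(team, t ->
            new Configuration()
                .configure(t + ".hibernate.cfg.xml") // e.g. teamawesome.hibernate.cfg.xml on the classpath
                .buildSessionFactory());
    }
}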
The benefit of this solution is that it should be easier to create new sets of tables, because each table set lives in its own world. There will only be one set of entity classes, as opposed to one per combination of team and table. The database schema stays rather simple, not being complicated by adding team names and then the required constraints. If the teams require data ownership and privacy, it will be rather easy to move a database to a different location.
The downside is that if the model needs to be changed for a team, it must be done for each team (as opposed to a single table set using teamName as a foreign key).
The idea of using different tables for each team (regardless of which successful apps may use it) is honestly quite naïve, and has serious pitfalls once you take maintenance into account...
Just think about what you will be forced to do if you discover you need a new table or even just an index... you'll end up having to write DDL scripts as templates and use some (custom) software to run them against all the teams...
As mentioned in the other answers (Quaternion's and Octav's), I think you have two viable options:
Bring the "team" into your data model
Split the data in different databases/schemas
To choose the option that works best for you, you must decide if the "team" is really something you can partition your dataset into, or if it is really one more entity you want to bring into your datamodel.
You may have noticed that I'm using "splitting" here instead of "partitioning" - that's because the latter term is generally used by DBAs to indicate what we could call "sharding" - "splitting" is intended to be a stronger term.
Splitting is only viable if:
entities in different partitions do not ever need to reference each other
no query will ever need to access data from different partitions (this applies to queries used for reporting too)
As you might well see, splitting in this sense is not very attractive (maybe it could be ok now, but what when you find yourself wanting to add new features?), so my advice is to go for the "the Team is an entity" solution.
Also note that maintaining a set of databases/schemas is actually harder than maintaining a single (albeit maybe a bit more complex) database... again, think of what steps you should take to add an index in a production system...
The only downside of the single-database solution manifests if you end up having multiple front-ends (maybe due to customizations for particular customers): changes to a shared database have the potential to affect all the applications using it, so you may need to coordinate upgrades to the different webapps to minimize risks (note, however, that in most cases you'll be able to change the database without breaking compatibility).
It's a little frustrating to get no further information and to have to shoot in the dark. Nevertheless, now that I have started, I will try to finish.
I think you could do the job with the following solution:
Write a PlayPlugin and make sure you add the team to the request args for every request. Then write your own NamingStrategy. In the NamingStrategy you can read request.args and put the team into the table name. Depending on how you add it (Team_ or Team.) you get either the prefix solution or something schema-based. It sounds like you already have a DB schema, so it would probably be best to stay with these tables and not migrate.
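A sketch of such a NamingStrategy, using the classic (pre-Hibernate 5) org.hibernate.cfg.NamingStrategy API; how the current team is obtained from the request is deliberately left as a placeholder:

import org.hibernate.cfg.ImprovedNamingStrategy;

public class TeamNamingStrategy extends ImprovedNamingStrategy {

    @Override
    public String tableName(String tableName) {
        // prefix every resolved table name with the current team, e.g. "team1_workshop"
        return currentTeam() + "_" + super.tableName(tableName);
    }

    private String currentTeam() {
        // placeholder: look up the team for the current request here,
        // e.g. from a ThreadLocal populated by the PlayPlugin from request.args
        return "team1";
    }
}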
Next time, please provide more background in your question, such as how many tables there are, whether a team is an entity, and how many records a table has (max, avg, min). How stable is your table model? These are all questions that help to give a clear recommendation with arguments.
You can try the vhost module, but it doesn't seem very well maintained. I also think the idea of putting the name of the team into the table name is really weird. Postgres and Oracle have schemas for that, so you would use myTeam.myTable, but then you must handle the persistence yourself.
Another approach would be different databases, but again you don't have good support from Play. I would try this:
Run a separate Play server for each team, if you don't have too many teams.
Put a reference to a Team table in every model. Then you can use Hibernate filters, or add it manually as an additional parameter to each query. Of course this costs some performance; you can address that with Oracle partitions.
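A hedged sketch of the Hibernate filter approach (filter, parameter and column names are invented):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import org.hibernate.annotations.Filter;
import org.hibernate.annotations.FilterDef;
import org.hibernate.annotations.ParamDef;

@Entity
@FilterDef(name = "teamFilter", parameters = @ParamDef(name = "teamId", type = "long"))
@Filter(name = "teamFilter", condition = "team_id = :teamId")
public class Workshop {
    @Id private Long id;
    @ManyToOne private Team team;   // Team is the owning entity suggested above
}

// enabled once per Hibernate session, typically right after opening it:
// session.enableFilter("teamFilter").setParameter("teamId", currentTeamId);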

Migrating from hand-written persistence layer to ORM

We are currently evaluating options for migrating from hand-written persistence layer to ORM.
We have a bunch of legacy persistent objects (~200) that implement a simple interface like this:
interface JDBC {
public long getId();
public void setId(long id);
public void retrieve();
public void setDataSource(DataSource ds);
}
When retrieve() is called, the object populates itself by issuing hand-written SQL queries against the provided connection, using the ID it received in the setter (this is usually the only parameter to the query). It manages its statements, result sets, etc. itself. Some of the objects have special flavors of the retrieve() method, like retrieveByName(); in this case a different SQL statement is issued.
Queries can be quite complex; we often join several tables to populate the sets representing relations to other objects, and sometimes join queries are issued on demand in a specific getter (lazy loading). So basically, we have implemented most of an ORM's functionality manually.
The reason for that was performance. We have very strong requirements for speed, and back in 2005 (when this code was written) performance tests showed that none of the mainstream ORMs was as fast as hand-written SQL.
The problems we are facing now that make us think of ORM are:
Most of the paths in this code are well tested and stable. However, some rarely used code is prone to result-set and connection leaks that are very hard to detect.
We are currently squeezing out some additional performance by adding caching to our persistence layer, and it's a huge pain to maintain the cached objects manually in this setup.
Support of this code when DB schema changes is a big problem.
I am looking for advice on what the best alternative could be for us. As far as I know, ORMs have advanced in the last 5 years, so it might be that there is now one that offers acceptable performance. As I see this issue, we need to address these points:
Find some way to reuse at least some of the written SQL to express mappings
Have the possibility to issue native SQL queries without the necessity to manually decompose their results (i.e. avoid manual rs.getInt(42) as they are very sensitive to schema changes)
Add a non-intrusive caching layer
Keep the performance figures.
Is there any ORM framework you could recommend with regards to that?
UPDATE: To give a feeling of what kind of performance figures we are talking about:
The backend database is TimesTen, in-memory database that runs on the same machine as the JVM
We found that changing rs.getInt("column1") to rs.getInt(42) brings a performance increase we consider significant.
If you want a standard persistence layer that lets you issue native SQL queries, consider using iBATIS. It's a fairly thin mapping between your objects and SQL. http://ibatis.apache.org/
For caching and lazy joins, Hibernate might be a better choice. I haven't used iBATIS for these purposes.
Hibernate provides a lot of flexibility in allowing you to specify certain defaults for lazy loading as you traverse your object graph, yet it also lets you pre-fetch data with SQL or HQL queries to your heart's content when you need more predictable load times. However, the conversion effort will be complicated for you, as Hibernate has a fairly high bar to entry in terms of learning and configuration. Annotations made this easier for me.
Two benefits you didn't mention about switching to a standard framework:
(1) running down bugs becomes easier when you have a wealth of sites and forums out there to support you.
(2) new hires are cheaper, easier and faster.
Good luck in addressing your performance and usability issues. The tradeoffs you point out are very common. Sorry if I evangelized.
For the bulk of your queries, I'd go with Hibernate. It's widely used, well documented, and generally performant. You can drop down to hand-written SQL if Hibernate isn't producing efficient enough queries. Hibernate gives you a lot of control in specifying the table names and columns that the domain objects map to, and in most cases you can retrofit it to an existing schema.
Find some way to reuse at least some of the written SQL to express mappings
The mappings are expressed in JPA using annotations. You can use the existing SQL as a guide when creating JPQL queries.
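For example, an existing hand-written SELECT over a legacy table typically translates into a mapping along these lines (table and column names are invented for the example):

import javax.persistence.*;

@Entity
@Table(name = "LEGACY_ORDER")            // keep the existing table name
public class Order {

    @Id
    @Column(name = "ORDER_ID")
    private long id;

    @Column(name = "CUSTOMER_NAME")
    private String customerName;

    @ManyToOne(fetch = FetchType.LAZY)   // lazy, like the hand-written on-demand joins
    @JoinColumn(name = "PARENT_ID")
    private Order parent;
}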
Add a non-intrusive caching layer
Caching in Hibernate is automatic and transparent, unless you specifically choose to get involved. You can mark entities as read-only, or evict them from the cache, and control when changes are flushed to the database (inside a transaction, of course; automatic use of batching improves performance when network latency is a concern).
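As a hedged illustration of the non-intrusive part, marking reference data as cacheable is usually just a per-class annotation plus provider configuration (the Hibernate-specific @Cache annotation is shown; a second-level cache provider such as Ehcache still has to be configured separately):

import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY)  // reference data that never changes
public class Country {
    @Id private Long id;
    private String name;
}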
Have the possibility to issue native SQL queries without the necessity to manually decompose their results (i.e. avoid manual rs.getInt(42) as they are very sensitive to schema changes)
Hibernate allows you to write SQL and have it mapped to your entities. You don't deal with the ResultSet directly - Hibernate takes care of the deconstruction into your entity. See Chapter 16, Native SQL, in the Hibernate manual.
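A minimal sketch of what that looks like (the entity, alias and SQL are placeholders; session is an open Hibernate Session):

// Hibernate native SQL, mapped straight onto an entity - no manual ResultSet handling
List<Item> items = session
    .createSQLQuery("SELECT {i.*} FROM ITEM i WHERE i.PRICE > :minPrice")
    .addEntity("i", Item.class)          // map the 'i' alias onto the Item entity
    .setParameter("minPrice", 100)
    .list();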
Support of this code when DB schema changes is a big problem.
Managing schema changes can still be a pain, since you now effectively have two schemata - the database schema and the JPA mapping (an object schema). If you choose to let Hibernate generate the DB schema and move your data to that, you are no longer directly responsible for what goes into the database, and you are then faced with managing automatic changes to a machine-generated schema. There are tools that can assist, such as dbmigrate and liquibase, but it's no walk in the park. Conversely, if you are managing the DB schema by hand, then you will have to carefully recraft your entities, JPA annotations and queries to accommodate the schema changes. Adding columns and new entities is relatively trivial, but more complex changes, such as changing a single property to a collection of properties, or restructuring an object hierarchy, will involve considerably more extensive changes. There is no easy way out of this - either the DB or Hibernate is the "master" that decides the schema, and when one changes, the other must follow. The code changes aren't so bad - in my experience, it's migrating the data that's difficult. But this is a basic issue with databases and will be present in any solution you choose.
So, to sum up, I'd go with hibernate, and use the JPA interface.
I've recently drilled through a bunch of Java ORMs and didn't come up with anything much better than Hibernate. Hibernate's performance may get you there and satisfy your performance goals.
Lots of people think that moving to Hibernate will make everything so awesome, but it's really just moving a set of problems from JDBC queries into Hibernate tuning. Read a bunch of books or (better) hire a "Hibernate guy" to come in and help.
During your refactor, I'd recommend using JPA so you can unplug and re-plug a new persistence provider when the Next Big Thing comes along (or you move to Oracle).
Do you really need to migrate? What's forcing you to move? Is there some REAL need here, or is someone just inventing work (an 'astronaut architect')?
I agree with the answers above, though - if you HAVE to move, Hibernate or iBATIS are good choices, iBATIS especially if you want to stay 'closer' to the SQL.
If you need more performance: drop the database (for online work) and handle persistence directly. Adding caching is not going to help you with a TimesTen DB; it just adds an extra copy (slowing you down).
You might want to take a look at GemFire.
There is a lot of good advice already in here that I won't repeat. The only thing I didn't see suggested that might work for you is caching reference data in memory.
I have done quite a bit of this in the past, and it does save a lot of time. If you have a large number of fairly static reference tables, load them all into memory at startup time and refresh them every couple of minutes. That way you're not hitting the DB over and over again for data that never changes.
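A bare-bones sketch of that pattern (all names are illustrative; the actual database query is left as a placeholder):

import java.util.Collections;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ReferenceDataCache {

    // hypothetical reference table: status id -> status name
    private volatile Map<Long, String> statusCodes = Collections.emptyMap();

    public ReferenceDataCache() {
        reload();                                              // load once at startup
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::reload, 2, 2, TimeUnit.MINUTES);
    }

    public String statusName(long id) {
        return statusCodes.get(id);                            // served from memory, no DB hit
    }

    private void reload() {
        // re-query the reference table here and swap the new map in atomically
        statusCodes = loadStatusCodesFromDb();
    }

    private Map<Long, String> loadStatusCodesFromDb() {
        return Collections.emptyMap();                         // placeholder for the real query
    }
}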
