Hibernate + MySQL Best practices for reporting data

I am creating a webapp in Spring Boot (Spring + Hibernate + MySQL).
I have already created all the CRUD operations for my app's data, and now I need to process the data and create reports.
Given the complexity of these reports, I will create some summary or pre-processed tables. This way, I can trigger the report creation once and then fetch the results efficiently.
My question is whether I should build all the reports in Java or in stored procedures in MySQL.
Pros of doing it in Java:
More logging
More control over the structures (entities, maps, lists, etc.)
Catching exceptions
Portability if I ever change my DB engine (it probably won't happen, but you never know)
Cons of doing it in Java:
Maybe memory usage?
Any thoughts on this?
Thanks!

Java, though both are possible. It depends on what is most important, what skills are available for maintenance, and the cost of that maintenance. Stored procedures are usually very fast, but availability and performance also depend on the exact database you use. You will need specialized skills, and then you have it all working on that specific database only.
Hibernate comes with a dialect written for every major database to get the best performance out of the persistence layer. It is not as fast as a stored procedure, but it comes pretty close. With Spring Data on top of that, most of the difficulty is gone. Maintenance will not cost that much, and people who know Spring Data are easier to find than specialists in any particular database vendor's tooling.
You can still create various "difficult" queries easily with HQL, so no blocker there. But Hibernate comes with more possibilities: you can have your caching handled by Ehcache, and with Hibernate Envers you will have your auditing done in no time. That's the nice thing about this framework: it's widely used, and many free-to-use Maven dependencies are there for the taking. And if in the future you want to change your database, you can do it by changing about three parameters in your application.properties file when using Spring Data.
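For illustration, enabling Envers auditing is a one-annotation affair; the Invoice entity below is hypothetical, and the hibernate-envers dependency is assumed to be on the classpath:

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import org.hibernate.envers.Audited;

// @Audited makes Envers keep a full revision history of this entity
// in companion audit tables - no extra code needed.
@Entity
@Audited
public class Invoice {

    @Id
    @GeneratedValue
    private Long id;

    private String customerName;
    private double total;

    // getters and setters omitted for brevity
}
```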
You can play with some annotations and see what performs better. For example, there is the @Inheritance annotation, with which you can have several classes end up in the same table or split them across more tables. There is also @MappedSuperclass, with which you can define one JpaObject holding the id that all your entities extend. If you want some more JPA tricks, maybe check this post with my answer on how to use a superclass and a general repository.
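A minimal sketch of that superclass idea; the class names are made up:

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.MappedSuperclass;

// No table is created for JpaObject itself; entities extending it
// simply inherit the id mapping.
@MappedSuperclass
public abstract class JpaObject {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    protected Long id;

    public Long getId() {
        return id;
    }
}

// Hypothetical entity inheriting the id from the superclass.
@Entity
class SummaryReport extends JpaObject {
    private String name;
}
```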

Given the complexity of these reports, I will create some summary or pre-processed tables. This way, I can trigger the report creation once and then fetch the results efficiently.
My first thought is: is this required? It seems like adding complexity to the application that perhaps isn't needed. Premature optimisation and all that. Try writing the reports in SQL and running an execution plan. If it's good enough, you have less code to maintain and no added batch jobs to administer. Consider load testing with e.g. JMeter or Gatling to see how it holds up under stress.
Consider using Querydsl or jOOQ for reporting. Both provide a database abstraction layer and a fluent API for querying databases, which delivers the benefits listed in the "Pros of doing it in Java" section of the question and may be better suited to the problem. The blog post "jOOQ vs. Hibernate: When to Choose Which" is well worth a read.
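For a flavor of the jOOQ style, here is a hedged sketch of an aggregate report query; the connection details, table, and columns are hypothetical, and a real project would normally use jOOQ's generated classes instead of these string-based field()/table() lookups:

```java
import static org.jooq.impl.DSL.field;
import static org.jooq.impl.DSL.sum;
import static org.jooq.impl.DSL.table;
import static org.jooq.impl.DSL.using;

import java.sql.Connection;
import java.sql.DriverManager;
import org.jooq.DSLContext;
import org.jooq.Result;
import org.jooq.SQLDialect;

public class ReportQuery {

    public static void main(String[] args) throws Exception {
        // Hypothetical connection details.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/app", "user", "password")) {
            DSLContext ctx = using(conn, SQLDialect.MYSQL);

            // Total per customer, expressed as a type-safe fluent query.
            Result<?> result = ctx
                    .select(field("customer_id"), sum(field("total", Double.class)))
                    .from(table("orders"))
                    .groupBy(field("customer_id"))
                    .fetch();

            System.out.println(result); // jOOQ pretty-prints the result set
        }
    }
}
```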

Related

Is there a mature Java Workflow Engine for BPM backed by NoSQL?

I am researching how to build a general application or microservice to enable building workflow-centric applications. I have done some research on frameworks (see below), and the most promising candidates share a hard reliance on RDBMSes to store workflow and process state, combined with JPA-annotated entities. In my opinion, this damages the possibility of designing a general, data-driven workflow microservice. It seems that a truly general workflow system could be built on NoSQL solutions like MongoDB or Cassandra by storing data objects and rules in JSON or XML. These would allow executing code to enforce types or schemas while using one or two simple Java objects to retrieve and save entities. As I see it, this could enable a single application to be deployed as a Controller for different domains' Model-View pairs without modification (admittedly given a very clever interface).
I have tried to find a workflow engine/BPM framework that supports NoSQL backends. The closest I have found is Activiti-Neo4j, which appears to be an abandoned project providing a connector between Activiti and Neo4j.
Is there a Java workflow engine/BPM framework that supports NoSQL backends and generalizes data objects without requiring specific POJO entities?
If I were to give up on my ideal, magically general solution, I would probably choose a framework like jBPM or Activiti, since they have great feature sets and are mature. In trying to find other candidates, I have found a veritable graveyard of abandoned projects, like this one on Java-Source.net.
Yes, Temporal Workflow has pluggable persistence and runs on Cassandra as well as on SQL databases. It has been tested with up to 100 Cassandra nodes and can support tens of thousands of events per second and hundreds of millions of open workflows.
It allows you to model your workflow logic as plain old Java classes and ensures that the code is fully fault tolerant and durable across all sorts of failures. This includes local variables and threads.
See this presentation, which goes into more detail about the programming model.
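A minimal sketch of that programming model using Temporal's Java SDK; the workflow interface and method are hypothetical:

```java
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

// The workflow is declared as an ordinary Java interface; Temporal
// durably persists its execution state, so the implementation survives
// process restarts and infrastructure failures.
@WorkflowInterface
public interface OrderWorkflow {

    @WorkflowMethod
    void processOrder(String orderId);
}
```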
I think the reason why workflow engines are often based on an RDBMS is not the database schema but rather the need for a transaction-safe data store.
Transactional robustness is an important factor for workflow engines, especially for long-running or nested transactions, which are typical of complex workflows.
So maybe this is one reason why most engines (like Activiti) did not focus on a data-driven approach. (I am not talking about data replication here, which is covered by NoSQL databases in most cases.)
If you take a look at the Imixs-Workflow project, you will find a different approach based on Java Enterprise. This engine uses a generic data object which can consume any kind of serializable data values. The problem of data retrieval is solved with Lucene search technology: each object is translated into a virtual document with name/value pairs for each item. This makes it easy to search through the processed business data as well as to query structured workflow data, such as status information or process owners. So this is one possible solution.
Apart from that, you always have the option to store your business data in a NoSQL database. This is independent of the workflow data of a running process instance, as long as you link the two objects together.
Going back to the aspect of transactional robustness, it's a good idea to store the reference to your NoSQL data storage in the process instance, which is transaction-aware. Take also a look here.
So the only problem you can run into is the fact that it's very hard to synchronize a transaction context from EJB/JPA to an 'external' NoSQL database. For example: what will you do when your data was successfully saved in your NoSQL data storage (e.g. Cassandra), but the transaction of the workflow engine fails and a rollback is triggered?
The designers of the Activiti project have also been aware of the problem you describe, but knew it would take quite a rewrite to implement such flexibility, which, arguably, should have been designed into the project from the beginning. As you'll see in the link provided below, the problem has been a lack of interfaces against which to code implementations other than the relational-database one. With version 6 they went ahead, ripped off the bandaid, and refactored the framework with a set of interfaces for which different implementations (think Neo4j, MongoDB, or whatever other persistence technology you fancy) can be written and plugged in.
In the linked article below, they provide some code examples for a simple in-memory implementation of the aforementioned interfaces. It looks pretty cool and sounds like it may be precisely what you're looking for.
https://www.javacodegeeks.com/2015/09/pluggable-persistence-in-activiti-6.html

Given this (see inside) situation, is it worth it to use Hibernate?

We are about to create a standard Java project, which is actually a batch process that runs at the console.
Every "batch" uses only select statements on multiple tables from different DBs, but we'll be doing thousands of selects.
I'm not really familiar with the whole of Hibernate, but is it worth using it in this situation?
Have you taken a look at Spring Batch:
Spring Batch is a lightweight, comprehensive batch framework designed
to enable the development of robust batch applications vital for the
daily operations of enterprise systems. Spring Batch builds upon the
productivity, POJO-based development approach, and general ease of use
capabilities people have come to know from the Spring Framework, while
making it easy for developers to access and leverage more advanced
enterprise services when necessary.
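If Spring Batch fits, the skeleton of such a job is small. A minimal sketch, assuming Spring Batch 5's builder API; the job name, step name, and tasklet body are hypothetical:

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

// Minimal single-step job: one tasklet that would run the selects.
@Configuration
public class ReportJobConfig {

    @Bean
    public Step reportStep(JobRepository jobRepository,
                           PlatformTransactionManager txManager) {
        return new StepBuilder("reportStep", jobRepository)
                .tasklet((contribution, chunkContext) -> {
                    // ... run the select statements here ...
                    return RepeatStatus.FINISHED;
                }, txManager)
                .build();
    }

    @Bean
    public Job reportJob(JobRepository jobRepository, Step reportStep) {
        return new JobBuilder("reportJob", jobRepository)
                .start(reportStep)
                .build();
    }
}
```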
Not necessarily. From your description, your DB operations are quite simple, so why not just use JDBC directly, or a simple library such as the Spring JDBC template (http://docs.spring.io/spring/docs/3.0.x/spring-framework-reference/html/jdbc.html)?
There's no need to pull in a huge dependency like Hibernate, in my opinion. The time spent learning and configuring Hibernate is uncertain, so why not just focus on the main project requirements and keep it simple at the beginning?
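For illustration, a minimal sketch of the JdbcTemplate approach; the table and column names are made up:

```java
import java.util.List;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;

public class BatchSelects {

    private final JdbcTemplate jdbc;

    public BatchSelects(DataSource dataSource) {
        // JdbcTemplate takes care of acquiring, using, and closing the
        // connection; no try/catch/finally boilerplate in your code.
        this.jdbc = new JdbcTemplate(dataSource);
    }

    public List<Map<String, Object>> ordersSince(java.sql.Date cutoff) {
        return jdbc.queryForList(
                "SELECT id, total FROM orders WHERE created_at >= ?", cutoff);
    }
}
```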
If you will perform selects on many different tables and/or need to manipulate the data, the easiest way is to retrieve rows from the database as instances of objects, so I strongly recommend Hibernate.

How to change JPA 2.0 SQL/JPQL queries dynamically in production

I have a problem with the architecture of JPA 2.0/ORM.
In our production system (and, I believe, in a lot of systems) we need the ability to change SQL queries dynamically, because slow queries and query bugs often surface only in production (heavy load and heavy data).
As a result, we used stored procedures and called them from iBatis.
As I see the pattern, the best practice is to separate the DB layer from the application layer, so I can tell my DBA to fix buggy stored procedures/indexes in production without deploying a new application (distributed system, long deployment times).
In ORM/JPA 2.0 the named queries are defined in the code, which forces the programmer to catch all the DB problems in development/QA - very bad!
I saw in the API that the framework gives an option to define native queries - but in the books/tutorials the best practice is to use named queries...
After reading the Hibernate/JPA 2.0 spec to see if there is a solution for this problem, I understand that there is no solution...?
It looks very weird to me that I need to hard-code the queries in the application layer...
Defining the queries in an XML descriptor and loading the XML via a hot-deploy patch is also very bad and non-standard!
Do you have a design pattern/solution?
Thank you all!!!
Uri.
I'd advise that you do unit and performance testing before you deploy. You shouldn't be finding out about buggy or slow queries at that late juncture.
JPA/ORM is not like iBatis, as you are finding out.
"Buggy" sounds like a lack of thorough unit testing.
"Slow" sounds like your DBAs aren't checking the SQL generated by JPA. EXPLAIN PLAN for all of it. You might have indexing or schema issues on the database side that no amount of playing with JPA will fix.
Patterns aren't the solution.
You should be able to have your entities read data from views as well as tables. Then the SQL for the views can be altered on the fly.
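A hedged sketch of that idea: map a read-only entity onto a database view, so the DBA can redefine the view's SQL without an application deployment. The view and column names here are hypothetical:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;
import org.hibernate.annotations.Immutable;

// Maps onto a database view (CREATE VIEW sales_report AS SELECT ...).
// @Immutable tells Hibernate never to issue updates against it, so the
// view's defining query can be changed in the database at any time.
@Entity
@Immutable
@Table(name = "sales_report")
public class SalesReport {

    @Id
    private Long id;

    private String region;
    private double total;
}
```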

Java - JDBC alternatives [closed]

This is just a theoretical question.
I use JDBC with my Java applications for database work (select, insert, update, delete, or whatever).
I manually write Java classes to hold data from DB tables (attribute = DB column), then run queries (ResultSet) and fill those classes with data. I am not sure if this is the right way.
But I've read a lot about JDO and other persistence solutions.
Can someone please recommend the best JDBC alternatives, based on their experience?
I would also like to know the advantages of JDO over JDBC (in simple words).
I've been able to google a lot of this stuff, but first-hand opinions are always best.
Thanks
The story of database persistence in Java is already long and full of twists and turns:
JDBC is the low level API that everybody uses at the end to talk to a database. But without using a higher level API, you have to do all the grunt work yourself (writing SQL queries, mapping results to objects, etc).
EJB 1.0 CMP Entity Beans were a first attempt at a higher level API and were successfully adopted by the big Java EE providers (BEA, IBM) but not by users. Entity Beans were too complex and had too much overhead (read: poor performance). FAIL!
EJB 2.0 CMP tried to reduce some of the complexity of Entity Beans with the introduction of local interfaces, but the majority of the complexity remained. EJB 2.0 also lacked portability (because the object-relational mappings were not part of the spec, the deployment descriptors were proprietary). FAIL!
Then came JDO, a datastore-agnostic standard for object persistence (it can be used with an RDBMS, OODBMS, XML, Excel, LDAP). But while there are several open-source implementations, and while JDO was adopted by small independent vendors (mostly OODBMS vendors hoping that JDO users would later switch from their RDBMS datastore to an OODBMS - which obviously never happened), it failed at being adopted by the big Java EE players and by users (because of weaving, which was a pain at development time and scared some customers; a weird query API; and being, in fact, too abstract). So while the standard itself is not dead, I consider it a failure. FAIL!
And indeed, despite the existence of two standards, proprietary APIs like TopLink (an old player) or Hibernate were preferred by users over EJB CMP and JDO for object-to-relational-database persistence (competition between standards, the unclear positioning of JDO, the earlier failure of CMP, and bad marketing bear part of the responsibility here, I believe), and Hibernate actually became the de facto standard in this field (it's a great open source framework). SUCCESS!
Then Sun realized they had to simplify things (and, more generally, the whole Java EE), and they did so in Java EE 5 with JPA, the Java Persistence API, which is part of EJB 3.0 and is the new standard for object to relational database persistence. JPA unifies the EJB 2 CMP, JDO, Hibernate, and TopLink APIs/products and seems to succeed where EJB CMP and JDO failed (ease of use and adoption). SUCCESS!
To summarize, Java's standard for database persistence is JPA, and it should be preferred over other proprietary APIs (using Hibernate's implementation of JPA is fine, but use the JPA API) unless an ORM is not what you need. It provides a higher level API than JDBC and is meant to save you a lot of manual work (this is simplified, but that's the idea).
If you want to write SQL yourself, and don't want an ORM, you can still benefit from some frameworks which hides all the tedious connection handling (try-catch-finally). Eventually you will forget to close a connection...
One such framework that is quite easy to use is Spring JdbcTemplate.
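As an illustration, the manual ResultSet-to-class copying described in the question shrinks to a row mapper. A minimal sketch, assuming a hypothetical users table with id and name columns and a recent JDK for the record syntax:

```java
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class UserDao {

    private final JdbcTemplate jdbc;

    public UserDao(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    public List<User> findAll() {
        // The lambda replaces the hand-written "copy the ResultSet into
        // the bean" loop; connections are opened and closed for you.
        return jdbc.query(
                "SELECT id, name FROM users",
                (rs, rowNum) -> new User(rs.getLong("id"), rs.getString("name")));
    }

    // Simple value holder matching the hypothetical users table.
    public record User(long id, String name) {}
}
```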
I can recommend Hibernate. It is widely used (and for good reasons), and the fact that the Java Persistence API specification was led by the main designer of Hibernate guarantees that it will be around for the foreseeable future :-) If portability and vendor neutrality are important to you, you may use it via JPA, so in the future you can easily switch to another JPA implementation.
Lacking personal experience with JDO, I can't really compare the two. However, the benefits of Hibernate (or ORM in general) at first sight seem to be pretty much the same as what is listed on the JDO page. To me the most important points are:
DB neutrality: Hibernate supports several SQL dialects in the background; switching between DBs is as easy as changing a single line in your configuration
performance: lazy fetching by default, and a lot more optimizations going on under the hood, which you would otherwise need to handle manually with JDBC
you can focus on your domain model and OO design instead of lower level DB issues (but you can of course fine-tune DML and DDL if you wish so)
One potential drawback (of ORM tools in general) is that they are not that suitable for batch processing. If you need to update 1 million rows in your table, ORM with default settings will never perform as well as a JDBC batch update or a stored procedure. Hibernate can incorporate stored procedures, though, and it supports batch processing to some extent (I am not familiar with that yet, so I can't really say whether it is up to the task compared to JDBC - but judging from what I know so far, probably yes). So if your app requires some batch processing but mostly deals with individual entities, Hibernate can still work. If it is predominantly doing batch processing, maybe JDBC is a better choice.
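For contrast, here is roughly what the plain-JDBC batch update mentioned above looks like; the table and statement are hypothetical:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public final class BatchUpdateExample {

    // Applies one UPDATE per row but ships them to the server in a batch,
    // which is where JDBC typically outperforms a default ORM setup.
    static void discontinue(Connection conn, long[] productIds) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE products SET active = 0 WHERE id = ?")) {
            for (long id : productIds) {
                ps.setLong(1, id);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```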
Hibernate requires that you have an object model to map your schema to. If you're still thinking only in terms of relational schemas and SQL, perhaps Hibernate is not for you.
You have to be willing to accept the SQL that Hibernate will generate for you. If you think you can do better with hand-coded SQL, perhaps Hibernate is not for you.
Another alternative is iBatis. If JDBC is raw SQL, and Hibernate is ORM, iBatis can be thought of as something between the two. It gives you more control over the SQL that's executed.
JDO builds on JDBC technology, and Hibernate likewise still requires JDBC. JDBC is Java's fundamental specification for database connectivity.
This means JDBC gives you greater control, but it requires more plumbing code.
JDO provides higher abstractions and less plumbing code, because a lot of the complexity is hidden.
If you are asking this question, I am guessing you are not familiar with JDBC. I think a basic understanding of JDBC is required in order to use JDO, Hibernate, or any other higher abstraction tool effectively. Otherwise, you may encounter scenarios where the ORM tool exhibits behavior you do not understand.
Sun's Java tutorial provides decent introductory material that walks you through JDBC: http://java.sun.com/docs/books/tutorial/jdbc/.
Have a look at MyBatis. Often overlooked, but great for read-only complex queries that use proprietary features of your DBMS.
http://www.mybatis.org
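A minimal sketch of a MyBatis mapper under that approach; the mapper, query, and result class are hypothetical, and the point is that the SQL stays fully in your hands:

```java
import java.util.List;
import org.apache.ibatis.annotations.Param;
import org.apache.ibatis.annotations.Select;

public interface ReportMapper {

    // The SQL is written by hand; MyBatis only maps result columns onto
    // the properties of the (hypothetical) SalesRow result class, which
    // would be a plain POJO with getters/setters matching the columns.
    @Select("SELECT id, region, total FROM sales_report WHERE region = #{region}")
    List<SalesRow> findByRegion(@Param("region") String region);
}
```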
Here is how it goes with Java persistence. You have just learnt Java and now you want to persist some records, so you learn JDBC. You are happy that you can now save your data to a database. Then you decide to write a bit bigger application. You realize that it has become tedious to try, catch, open connections, close connections, and transfer data from the ResultSet to your bean... So you think there must be an easier way. In Java there is always an alternative. So you do some googling, and in a short while you discover ORM, and most likely Hibernate. You are so excited that you now don't have to think about connections. Your tables are created automatically. You are able to move very fast. Then you decide to undertake a really big project. Initially you move very fast and you have all the CRUD operations in place. The requirements keep coming, and then one day you are cornered: you try to save, but it's not cascading to the object's children. Some things don't work as explained in the books you have read. You don't know what to do, because you didn't write the Hibernate libraries. You wish you had written the SQL yourself. It's now time to rethink... As you mature, you realize that the best way to interact with the database is through SQL. You also realize that some tools get you started very fast but can't keep you going for long. This is my story. I am now a very happy iBatis user.
Ebean ORM is another alternative http://ebean-orm.github.io/
Ebean uses JPA annotations for mapping, but it is architected to be sessionless. This means that you don't have the attached/detached concepts and you don't persist/merge/flush - you simply save() your beans.
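A rough sketch of that sessionless style, assuming a recent Ebean version where io.ebean.DB is the entry point; the entity is hypothetical:

```java
import io.ebean.DB;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Customer {

    @Id
    @GeneratedValue
    Long id;

    String name;

    public static void demo() {
        Customer c = new Customer();
        c.name = "Alice";
        DB.save(c); // insert - no session, no attach/detach, no flush
        c.name = "Bob";
        DB.save(c); // update of the same bean
    }
}
```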
I'd expect Ebean to be much simpler to use than Hibernate, JPA, or JDO.
So if you are looking for a powerful alternative approach to JDO or JPA, you could have a look at Ebean.
JPA/Hibernate is a popular choice for ORM. It can provide you with just about every ORM feature that you need. The learning curve can be steep for those with basic ORM needs.
There are lots of alternatives to JPA that provide ORM with less complexity for developers with basic ORM requirements. Search SourceForge for ORM, for example:
http://sourceforge.net/directory/language:java/?q=ORM
I am partial to my solution, Sormula: sourceforge or bitbucket. Sormula was designed to minimize complexity while providing basic ORM.
Hibernate, surely. It's popular; there is even a .NET version.
Also, Hibernate can be easily integrated with the Spring Framework.
And it will fit most developers' needs.
A new and exciting alternative is GORM, the ORM implementation from Grails. It can now be used standalone.
Under the hood it uses Hibernate, but gives you a nice layer on top with cool dynamic finders etc.
All these different abstraction layers eventually use JDBC. The whole idea is to automate some of the tedious and error prone work much in the same way that compilers automate a lot of the tedious work in writing programs (resizing a data structure - no problem, just recompile).
Note, however, that in order for these to work there are assumptions that you will need to adhere to. These are usually reasonable and quite easy to work with, especially if you start with the Java side as opposed to have to work with existing database tables.
JDO is the convergence of the various projects in a single Sun standard and the one I would suggest you learn. For implementation, choose the one your favorite IDE suggests in its various wizards.
There is also Torque (http://db.apache.org/torque/), which I personally prefer because it's simpler and does exactly what I need.
With Torque I can define a database schema for MySQL (well, I use PostgreSQL, but MySQL is supported too), and Torque can then read the database and generate Java classes for each table. With Torque you can then query the database and get back Java objects of the correct type.
It supports WHERE clauses (either via a Criteria object, or you can write the SQL yourself) and joins.
It also supports foreign keys, so if you have a User table and a House table, where a user can own 0 or more houses, there will be a getHouses() method on the user object which gives you the list of House objects the user owns.
To get a first look at the kind of code you can write, take a look at http://db.apache.org/torque/releases/torque-3.3/tutorial/step5.html, which contains examples showing how to load/save/query data with Torque. (All the classes used in the examples are auto-generated from the database definition.)
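To give a flavor of that generated-class style, a sketch along these lines (User and UserPeer stand in for whatever classes Torque generates from your schema; treat the names and signatures as assumptions):

```java
import java.util.List;
import org.apache.torque.util.Criteria;

public class TorqueExample {

    // User and UserPeer are placeholders for the classes Torque
    // generates from the database definition.
    public static void printHouseOwners() throws Exception {
        Criteria criteria = new Criteria();
        criteria.add(UserPeer.FIRST_NAME, "Alice");

        List users = UserPeer.doSelect(criteria); // raw List in Torque 3.x
        for (Object o : users) {
            User user = (User) o;
            // getHouses() is the generated foreign-key helper mentioned above.
            System.out.println(user.getFirstName() + " owns "
                    + user.getHouses().size() + " houses");
        }
    }
}
```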
I recommend using Hibernate; it's a really fantastic way of connecting to the database. There were a few issues early on, but it has since become more stable.
It uses ORM-based mapping, reduces the time you spend writing queries to an extent, and allows you to change databases with minimal effort.
If you require any video-based tutorials, please let me know; I can upload them to my server and send you the link.
Use Hibernate as a standalone JAR file and distribute it to your different web apps. That is by far the best solution out there. You have to design your classes, interfaces, and enums around an abstract DAO pattern. As long as you have correct entities and mappings, you will only need to work with objects (entities) and not HQL.

Hibernate/JPA DB Schema Generation Best Practices

I just wanted to hear the opinion of Hibernate experts about DB schema generation best practices for Hibernate/JPA based projects. Especially:
What strategy should be used when the project has just started? Is it recommended to let Hibernate automatically generate the schema in this phase, or is it better to create the database tables manually from the earliest phases of the project?
Assuming that throughout the project the schema has been generated using Hibernate, is it better to disable automatic schema generation and create the database schema manually just before the system is released into production?
And after the system has been released into production, what is the best practice for maintaining the entity classes and the DB schema (e.g. adding/renaming/updating columns, renaming tables, etc.)?
It's always recommended to generate the schema manually, preferably with a tool supporting database schema revisions, such as the great Liquibase. Generating the schema from the entities is great in theory, but is fragile in practice and causes lots of problems in the long run (trust me on this).
In production it's always best to have a manually generated and reviewed schema.
You make an update to an entity and create a matching update script (revision) to bring your database schema in line with the entity change. You can create a custom solution (I've written a few) or use something more popular like Liquibase (it even supports rollbacks of schema changes). If you're using a build tool such as Maven or Ant, it's recommended to plug the DB schema update utility into the build process so that fresh builds stay in sync with the schema.
Although disputable, I'd say that the answer to all three questions is: let Hibernate automatically generate the tables in the schema.
I haven't had any problems with that so far. You might need to clean some fields up manually from time to time, but this is no headache compared to separately keeping track of DDL scripts - i.e., managing their revisions and synchronizing them with entity changes (and vice versa).
For deploying to production - an obvious tip - first make sure everything is generated OK in the test environment, then deploy to production.
Manually, because:
The same database may be used by different applications, and not all of them will use Hibernate or even Java. The database schema should not be dictated by the ORM; it should be designed around the data and business requirements.
The datatypes chosen by Hibernate might not be best suited for the application.
As mentioned in an earlier comment, changes to the entities would require manual intervention if data loss is not acceptable.
Things such as additional properties (the generic term, not Java properties) on join tables work wonderfully in an RDBMS but are somewhat complex and inefficient to use in an ORM. Mapping in the ORM-to-RDBMS direction might create tables that are not efficient. In theory, it is possible to build the exact same join table using Hibernate-generated code, but it would require some special care while writing the entities.
I would use automatic generation for standalone applications, or for databases that are accessed via the same ORM layer, and also if the app needs to be ported to different databases. It saves a lot of time by not requiring one to write and maintain DB-vendor-specific DDL scripts.
Like Bozhidar said, don't let Hibernate create and update the database schema.
Let your application create and update the database schema instead.
For Java, the best tool to do this is Flyway. You create one or more SQL files with DDL statements describing your database schema, and Flyway executes them. For more information, look at the Flyway site.
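A sketch of triggering Flyway programmatically, assuming a recent Flyway version and hypothetical connection details; by default it picks up versioned scripts such as V1__init.sql from db/migration on the classpath:

```java
import org.flywaydb.core.Flyway;

public class MigrateDatabase {

    public static void main(String[] args) {
        // Point Flyway at the database; migrations live on the classpath
        // under db/migration by default (V1__init.sql, V2__add_index.sql, ...).
        Flyway flyway = Flyway.configure()
                .dataSource("jdbc:mysql://localhost/app", "user", "password")
                .load();

        // Applies all pending migrations in version order.
        flyway.migrate();
    }
}
```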
I believe that a lot of what is being discussed or argued here also comes down to whether you are more comfortable with the code-first or the database-first approach.
Personally, I am more inclined to go for the latter and, with a nod to the Single Responsibility Principle (SRP), I prefer having a DB specialist handle the DB and an application specialist handle the application, rather than having the application handle the DB. Additionally, I am of the opinion that taking too many shortcuts works fine at the beginning but creates unmanageable problems as things grow/evolve.
