JBoss envers and huge audit tables - java

I am auditing my Java EE application with JBoss Envers, and the nature of my application causes the audit table to grow very fast. The historic data is queried infrequently and access time is not really an issue, apart from the data from the last week. This data IS queried frequently and access needs to be fast. Ideally, I would split the data and distribute it over two tables, with the older data in compressed format.
Unfortunately, Envers does not allow spreading data over multiple tables as far as I can tell from the docs.
Does somebody have any idea what would be the best way to achieve this (if possible while still using Envers)?

For the first time, StackOverflow does not know the answer to a question!
I posted the same question in the JBoss forum and Adam's answer was very helpful:
Hello,
not sure which version of Envers you are using, but maybe you can try using the ValidityAuditStrategy (present from 3.6).
Also, see: http://opensource.atlassian.com/projects/hibernate/browse/HHH-5371
Adam
Backlink to the forum entry: http://community.jboss.org/message/579047
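For anyone landing here later, a minimal sketch of how the strategy Adam mentions is usually switched on. The helper class and method names are made up for illustration. The two property keys are the Envers settings as I know them from the 3.6 to 5.x documentation, so check them against the version you actually run before relying on this.

```java
import java.util.Properties;

final class EnversAuditConfig {

    // Hypothetical helper: collects the Hibernate settings that switch Envers
    // from its default audit strategy to ValidityAuditStrategy.
    static Properties validityAuditProperties() {
        Properties props = new Properties();
        // Stores an end revision on each audit row, so the "current" rows can be
        // found without a subquery over the whole audit table.
        props.setProperty("org.hibernate.envers.audit_strategy",
                "org.hibernate.envers.strategy.ValidityAuditStrategy");
        // Assumption: also store the end revision timestamp, which gives the
        // database a date column to partition the audit table on.
        props.setProperty("org.hibernate.envers.audit_strategy_validity_store_revend_timestamp",
                "true");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be copied into persistence.xml or hibernate.cfg.xml.
        validityAuditProperties().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The same keys can go straight into persistence.xml as properties. Envers still writes to a single audit table per entity; any split between recent and old rows would have to come from the database's own partitioning.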

Related

Hibernate + MySQL Best practices for reporting data

I am creating a webapp in Spring Boot (Spring + Hibernate + MySQL).
I have already created all the CRUD operations for the data of my app, and now I need to process the data and create reports.
As per the complexity of these reports, I will create some summary or pre-processed tables. This way, I can trigger the reports creation once, and then get them efficiently.
My question is whether I should build all the reports in Java or in stored procedures in MySQL.
Pros of doing it in Java:
More logging
More control of the structures (entities, maps, list, etc)
Catching exceptions
Portability if I change my DB engine (it is unlikely to happen, but you never know)
Cons of doing it in Java:
Maybe memory?
Any thoughts on this?
Thanks!
Java, though both are possible. It depends on what is most important, what skills are available for maintenance, and what that maintenance costs. Stored procedures are usually very fast, but availability and performance also depend on which database you use. You will need special skills, and the result is tied to that specific database.
Hibernate comes with a dialect written for each supported database to get the best performance out of the persistence layer. It’s not as fast as a stored procedure, but it comes pretty close. With Spring Data on top of that, most of the difficulty is gone. Maintenance will not cost that much, and people who know Spring Data are easier to find than specialists for a particular database vendor.
You can still write various “difficult” queries easily with HQL, so nothing is blocked there. But Hibernate offers more. You can have your caching done by Ehcache, and with Hibernate Envers you will have your audit done in no time. That’s the nice thing about this framework. It’s widely used and many free Maven dependencies are there for the taking. And if in the future you want to change your database, you can do it by changing about three parameters in your application.properties file when using Spring Data.
You can play with some annotations and see what performs better. For example, the @Inheritance annotation lets you store several classes in the same table or split them across multiple tables. There is also @MappedSuperclass, with which you can have one JpaObject holding the id that all your entities extend. If you want some more tricks on JPA, maybe check this post with my answer on how to use a superclass and a general repository.
As per the complexity of these reports, I will create some summary or
pre-processed tables. This way, I can trigger the reports creation
once, and then get them efficiently.
My first thought is: is this required? It seems like adding complexity to the application that perhaps isn't needed. Premature optimisation and all that. Try writing the reports in SQL and looking at the execution plan. If it's good enough, you have less code to maintain and no added batch jobs to administer. Consider load testing using e.g. JMeter or Gatling to see how it holds up under stress.
Consider using Querydsl or jOOQ for reporting. Both provide a database abstraction layer and a fluent API for querying databases, which deliver the benefits listed in the "Pros of doing it in Java" section of the question and may be more suited to the problem. The blog post "jOOQ vs. Hibernate: When to Choose Which" is well worth a read.
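To make the "build the report in Java" option from the question concrete, here is a rough sketch of collapsing detail rows into the kind of totals a summary table would hold. The Order class, its fields, and the method name are all invented for illustration; in a real application the rows would come from a Hibernate or Spring Data query rather than a hard-coded list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class ReportSketch {

    // Hypothetical row type standing in for an entity loaded through Hibernate.
    static final class Order {
        final String customer;
        final long amountCents;

        Order(String customer, long amountCents) {
            this.customer = customer;
            this.amountCents = amountCents;
        }
    }

    // Sums the order amounts per customer; the TreeMap keeps customers sorted by name.
    static Map<String, Long> totalPerCustomer(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
                order -> order.customer,
                TreeMap::new,
                Collectors.summingLong(order -> order.amountCents)));
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("alice", 1500L),
                new Order("bob", 700L),
                new Order("alice", 500L));
        System.out.println(totalPerCustomer(orders)); // prints {alice=2000, bob=700}
    }
}
```

This also shows where the "maybe memory?" concern comes from: every detail row has to be loaded into the JVM before it is aggregated, whereas a GROUP BY in SQL or a stored procedure does the same work inside the database.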

Java H2 Database Framework

I know this is a newb question, but that's what I am so here goes.
I am writing an application in Java that runs a lot of H2 database queries. So far I have written methods that pull the data I need from the database with queries, because this is the only way I know how.
My question is, is there an easier way to get data from my database that would be more efficient and less work? From my research, Spring does something like this, but if it does, I have been unable to find good information on how to implement it.
Thanks,
I would say there is an even better approach called the Java Persistence API. It will make your code ORM agnostic and provide some flexibility.
JPA 2.0 is quite rich and will satisfy all your needs. So I do not think you should use Hibernate directly; instead you should try to use JPA where you can. Please note, Hibernate is a JPA 2.0 provider.
Please see the following example Creating Good DAOs with Hibernate 3.5 and JPA 2.0 Annotations
There are many options. As ShyJ wrote, Spring Data JPA is one. Many people use Hibernate. There are other libraries you could use, for example SimpleORM.
But I wonder if "which one is better" is the right type of question for StackOverflow. There are many ways to do it "right", and many things to consider.
I am also using H2 rather heavily in a large environment. My advice is to use JPA, and particularly Hibernate, as it is one of the most popular implementations.
What you want to avoid is writing native SQL, because if you ever change databases you will run into numerous problems with it. JPA solves this by defining JPQL, which is like SQL but works on any supported database.
Another great benefit of Hibernate is the possibility of using the L2 cache, which can speed up your application drastically.
The last benefit is perhaps the most relevant to you: it may take you slightly longer to set up, but once it's there, it is much easier to work with the database from pure Java.

Hibernate -- Load a record off arbitrary field

I've had the opportunity to rework a good deal of old, poorly maintained perl scripts from a department library into a newer Java design, which hopefully should be more maintainable. Originally, this library did a number of things relating to our Active Directory instance, including things like looking for and reporting on new users, keeping track of which users we knew about, etc.
The next functionality to replicate is the ability to store simple user information in a database -- things like names, employee IDs and account names, nothing too complex. Because I generally don't enjoy JDBC, and I had the opportunity to expand my horizons a bit, I decided to poke at Hibernate. I know it's very likely overkill for what I'm doing with this application, but I figured that it was a good learning opportunity.
The issue that I have is fairly simple. I've got creating new persistent objects down, that's no sweat. Where I hit a speed bump is in retrieving those objects from the database using Hibernate. I can load the class by its built in ID, but I don't see an option to load on anything else, and needless to say, there isn't an option to save the database's user ID into AD itself. I'm wondering if someone can provide a bit of insight on how to load already-seen users from the database without the User ID; a tutorial or link would be fine. I've tried reading the Hibernate documentation itself, but it's massive, and the vast majority doesn't apply to what I'm actually doing.
Thanks.
Your best bet is to read section 10.4 of the Hibernate reference guide on HQL queries. Although you can use the Hibernate Criteria API to formulate queries, HQL is probably easier to grasp IMHO. In a nutshell, you formulate queries through the Hibernate Session, using the persistent object's attributes as restriction criteria.

Suggestion for annotation only ORM framework (Java)

I'm working on a medium-sized project in Java (GWT to be precise), and I'm still in the process of deciding what ORM to use.
I just refuse to write SQL queries unless utterly and completely necessary (not the case :D)
I want to use ONLY annotations, no XML configuring [except database location, username, etc], and I DON'T want to create any tables or define them. I want this to be done by the framework completely.
Call me lazy, but I like Java/GWT programming, not creating tables and coping with that sort of things, and it's a plus in my assignment if I actually use an ORM :D
I've considered so far:
Hibernate with annotations: I've found little documentation on getting started from the ground up with this. I've found few examples and the like. It's as if they didn't actually want you to use 100% annotations.
DataNucleus:
JDO: It seems interesting. I'd never heard of DataNucleus up until this week, but it seems extremely mature, and I actually discovered it because Google uses it in GWT, so that's a good sign. I also like the fact that they mentioned I don't need to define any tables or columns, though I think Hibernate can achieve this as well. I actually enjoyed reading through their documentation (though I haven't finished yet), something quite opposite to Hibernate.
JPA: I'm not totally sure if DataNucleus/JPA can work with annotation-only configuration, though I might need to take a deeper look into the documentation.
As you might guess, I'm quite inclined to JDO... but it'd be nice to hear what people who've used it have to say vs the other alternatives, and if I'm missing some very important point here.
Edit 1: I know I'll need to XML the database location/usr/pwd, I meant I don't want to use an XML to configure the mapping or database schema.
JPA (1 and 2) is pretty much XML free, depending on how it's packaged. You most certainly don't need it for the schema. It also supports annotations for details when the tables are generated.
The only issue with these is that while they can create a database, they're a DB MAPPING tool, not a DB DEFINITION tool. Specifically, most won't allow you to create the arbitrary indexes that you may well need to get the DB tuned properly to your queries.
But other than that, JPA should fill your needs, and it has a lot of implementations (Hibernate is just one implementation).
This is self-publicizing, but I've been working for a while on a simple Java ORM package called ORMLite. I wanted something much less complicated than Hibernate but without writing SQL directly. It's completely annotation based, can create (and destroy) tables, and currently supports MySQL, Postgres, Derby, and H2. Adding other databases would be simple if I had access to a server.
http://ormlite.com/
It has a pretty flexible QueryBuilder and table paging. Joining is, however, not supported.

What's the best way to persist data in a Java Desktop Application?

I have a large tree of Java Objects in my Desktop Application and am trying to decide on the best way of persisting them as a file to the file system.
Some thoughts I've had were:
Roll my own serializer using DataOutputStream: This would give me the greatest control of what was in the file, but at the cost of micromanaging it.
Straight old Serialization using ObjectOutputStream and its various related classes: I'm not sold on it though since I find the data brittle. Changing any object's structure breaks the serialized instances of it. So I'm locked in to what seems to be a horrible versioning nightmare.
XML Serialization: It's not as brittle, but it's significantly slower than straight-out serialization. It can be transformed outside of my program.
JavaDB: I'd considered this since I'm comfortable writing JDBC applications. The difference here is that the database instance would only persist while the file was being opened or saved. It's not pretty but... it does lend itself to migrating to a central server architecture if the need arises later and it introduces the possibility of querying the data model in a simpler way.
I'm curious to see what other people think. And I'm hoping that I've missed some obvious, and simpler approach than the ones above.
Here are some more options culled from the answers below:
An Object Database - Has significantly less infrastructure than ORM approaches and performs faster than an XML approach. thanks aku
I would go for your final option, JavaDB (Sun's distribution of Derby), and use an object-relational layer like Hibernate or iBatis. Using the first three approaches means you are going to spend more time building a database engine than developing application features.
Have a look at Hibernate as a simpler way to interface to a database.
In my experience, you're probably better off using an embedded database. SQL, while less than perfect, is usually much easier than designing a file format that performs well and is reliable.
I haven't used JavaDB, but I've had good luck with H2 and SQLite. SQLite is a C library, which means a little more work in terms of deployment. However, it has the benefit of storing the entire database in a single, cross-platform file. Basically, it is a pre-packaged, generic file format. SQLite has been so useful that I've even started using it instead of text files in scripts.
Be careful using Hibernate if you're working with a small persistence problem. It adds a lot of complexity and library overhead. Hibernate is really nice if you're working with a large number of tables, but it will probably be cumbersome if you only need a few tables.
db4objects might be the best choice
XStream from codehaus.org
XML serialization/deserialization largely without coding.
You can use annotations to tweak it.
Working well in two projects where I work.
See my users group presentation at http://cjugaustralia.org/?p=61
I think it depends on what you need. Let's see the options:
1) Discarded immediately! I won't even justify it. :)
2) If you need a simple, quick, one-method persistence, stick with it. It will persist the complete data graph as it is! Beware of how long you'll be maintaining the persisted objects. As you pointed out yourself, versioning can be a problem.
3) Slower than (2), needs extra code and can be edited by the user. I would only use it if the data is supposed to be used by a client in another language.
4) If you need to query your data in any way, stick with the DB solution.
Well, I think you had already answered your question :)
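For reference, a small sketch of option 2 (plain Java serialization) as a round trip through a byte array. The Node class and the helper names are invented for illustration. The explicit serialVersionUID is the usual mitigation for the versioning worry raised in the question: without it, the JVM computes an ID from the class structure, so almost any change makes old files unreadable, while a fixed ID lets compatible changes such as an added field still deserialize.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class SerializationSketch {

    // Hypothetical tree node standing in for the application's object tree.
    static final class Node implements Serializable {
        // Fixed ID: old data stays readable after compatible class changes.
        private static final long serialVersionUID = 1L;

        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Serializes the whole graph reachable from root into a byte array.
    static byte[] toBytes(Node root) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuilds an independent copy of the graph from the serialized bytes.
    static Node fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Node) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Node class missing on the classpath", e);
        }
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        root.children.add(new Node("child"));
        Node copy = fromBytes(toBytes(root));
        System.out.println(copy.name + " has " + copy.children.size() + " child(ren)");
    }
}
```

To persist to the file system, the ByteArrayOutputStream would be swapped for a FileOutputStream. Incompatible changes (renaming the class, changing a field's type) still break old files, which is the maintenance cost answer (2) above warns about.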
