Auditing using Data tables vs Separate Audit tables - java

I am in the process of designing a new Java application which has very strict requirements for auditing. A bit of context:
I have a complex entity with multiple nested one-to-many relationships. If any of the fields changes, I need to consider it a new version of the object, and all of this needs to be audited as well. Now I have two options:
1.) Do not do any update operation; just insert a new entity whenever anything changes. This would require me to create all the related objects (even if they have not changed), as I do not want to hold references to any previous-version objects. My data tables become my audit tables as well.
OR
2.) Always do an update operation and maintain the auditing information in separate tables. That would add some more complexity in terms of implementation.
I would like to know if there is a good vs bad practice for any of these two approaches.
Thanks,
-csn

What should define your choice are the insert/update/read patterns for both the "live" data and the audits.
Most commonly these patterns are very different for the two kinds.
- Concerning "live" data, it depends a lot on your application, but I can imagine you have significant inserts, significant updates, and lots of reads. Live data also requires transactionality and has many relationships between tables for which you need to keep consistency. It might require fast and complex search, with indexes on many columns.
- Audits have lots of inserts, almost no updates, and few reads. Reads don't require complex search (e.g. you only consult audits and sort them by date) or indexes on many columns.
So with increased load and data size you will probably need to split the data and optimize tables for your use cases.
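To make option 1 concrete, here is a minimal sketch of an insert-only versioned entity in JPA. The ContractVersion/LineItem names and the businessId/revision pair are hypothetical, purely for illustration: every change writes a brand-new row (and re-inserts the children) instead of updating in place, so the data tables double as the audit trail.
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import javax.persistence.*;

@Entity
public class ContractVersion {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;                 // surrogate key: one row per version

    private Long businessId;         // stable identifier shared by all versions
    private int revision;            // incremented on every change

    @Temporal(TemporalType.TIMESTAMP)
    private Date createdAt;          // when this version was written

    // Children are re-inserted together with each new version, so no row
    // ever references an object from a previous version. (LineItem entity
    // omitted for brevity.)
    @OneToMany(cascade = CascadeType.ALL)
    @JoinColumn(name = "contract_version_id")
    private List<LineItem> lineItems = new ArrayList<>();

    // getters/setters omitted
}
Reading the latest version is then a query for the max revision per businessId, and the full audit history falls out of a simple ORDER BY revision.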

Related

JPA performance optimization or alternatives

We are currently in a project with a high demand on performance when it comes to reads from the database.
We are currently using JPA (the EclipseLink implementation), mainly because it provides convenient database access and column mapping.
For our queries we are using highly specific SQL queries. We are also using one database (SAP HANA, in-memory), so a language abstraction is not required. The database access is pretty fast, our current bottleneck really is the application server, especially the persistence layer.
The result sets often also do not contain entities, because the objects are assembled from the query context. For us, there is no point in using an @Id field like the following, because we don't have fields that are unique (only combinations are, but defining an @IdClass is too much overhead).
@Entity
public class Item {
    @Id
    public String myField; // field type assumed; the snippet is illustrative
    // other fields...
}
This seems to be enforced by JPA if I want to run a typed native query. Is that assumption true? Currently we haven't found a way around the ID mapping.
Are these findings valid?
If not, how can we make our use of JPA more performant (there is significant latency compared to plain JDBC), ideally without defining an @Id (because it is useless in our case) for result types?
If yes, is there another Java library that provides just a minimal layer on top of JDBC, without too much latency, and is more convenient to use than plain JDBC (with column mapping and all that good stuff)?
Thanks!
Use case: We would like to stream historic GPS sensor data from the database. Besides just transforming this to JSON, we also do some transformations/validations. That's why we actually need to build objects. So what we are basically looking for is a convenient way of mapping the fields of select statements to attributes. I hope that makes sense.
There are many articles and blogs about improving EclipseLink/JPA performance that you might look into, such as EclipseLink Performance, JPA Performance Tuning and Optimizing the EclipseLink Application.
In the end, though, it all depends very much on your specific use case and any future use cases you may have. JPA is designed to make reading and writing on top of JDBC easier and more maintainable, and it adds many performance benefits such as caching. If all you are using it for is to read raw data, though, the extra layer might be overhead that isn't adding any value. There isn't much point in having JPA build entities from the result sets, maintain the cache and watch for changes, only for your application to ignore it all and grab the raw data.
I do not understand why you would have an Item table with a single myField. How is it used by the application and how does it relate to other tables and potential entities?
Such a construct is not the normal use case for relational databases and ORMs, but there are still ways around it in JPA. The data could be used in element collections by other entities, or simply left unmapped, with native SQL queries passed straight through to the JDBC layer. EclipseLink itself has many mapping types and options above and beyond JPA that might be used depending on your use cases.
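As a sketch of the "unmapped + native SQL" route: run the native query without any entity mapping and copy the columns into a plain value class by hand. The GpsPoint class and the gps_data table/columns below are assumptions based on the use case described in the question.
import java.util.ArrayList;
import java.util.List;

// A plain value class: no @Entity, no @Id required.
public class GpsPoint {
    public String deviceId;
    public java.sql.Timestamp recordedAt;
    public double lat;
    public double lon;
}

// Native SQL goes straight through to JDBC; rows come back as Object[].
@SuppressWarnings("unchecked")
List<Object[]> rows = em.createNativeQuery(
        "SELECT device_id, recorded_at, lat, lon FROM gps_data WHERE device_id = ?")
    .setParameter(1, deviceId)
    .getResultList();

List<GpsPoint> points = new ArrayList<>(rows.size());
for (Object[] row : rows) {
    GpsPoint p = new GpsPoint();
    p.deviceId   = (String) row[0];
    p.recordedAt = (java.sql.Timestamp) row[1];
    p.lat        = ((Number) row[2]).doubleValue();
    p.lon        = ((Number) row[3]).doubleValue();
    points.add(p);
}
The per-row cast code is the price of skipping the ORM; a thin layer like Spring's JdbcTemplate with a RowMapper wraps exactly this loop if you want the column mapping without the JPA machinery.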

Converting database data into objects

I'm building my first real Java application, and I'm confused about following good object-oriented design practices.
My application allows a yoga instructor to design a yoga session. There is a MySQL database with five tables:
a table of yoga poses,
a table for the warmup section of the session,
a table for the work section of the session,
a table for the restore section of the session,
and finally a table containing the various yoga sessions the user has created.
The sessions are composed of sections, and the sections are composed of poses. I purposefully chose to use a database in this design rather than lists or arrays for the purpose of learning how to integrate a database with Java.
Here is my question:
Is it really necessary to make objects out of the poses, sections and sessions?
They are merely data, and they don't have behavior of any kind. The real objects in my application are the windows and tools the user is using to assemble these poses and sections into yoga sessions. It just seems that my code will become unnecessarily inflated and complicated if I force each pose and section and session to be an object. I realize that I may be asking for opinions here, and that is generally discouraged on this forum. But my intent is to understand and follow object-oriented design in what seems to me to be a murky area.
To answer your question: yes, create objects for each. The main principle of object-oriented programming is having different "types" of objects and building behavior (methods) around them.
I suggest you look at the concept of ORM. ORM is the process of mapping objects to their persistent representation, i.e. database tables.
If you go for plain JDBC, you may need to write a lot of native SQL and code to extract individual column values from your tables. As the complexity of your schema increases, these queries get difficult to maintain.
With an ORM, you can write simple Java code to load and save data through the persistence layer.
You can go through this thread to look at the advantages of ORM.
Hibernate is a great framework available for Java which does ORM.
1) If you use Hibernate or another ORM, you must use POJOs. There's no other way.
2) If you use JDBC directly, for example Spring JDBC, you could use maps (SimpleJdbcTemplate returns maps from many of its methods), but reading a POJO field is much quicker and less error-prone than reading a field from a map. Maps also take up more memory.
3) If you are using JSF, you need POJOs with getters and setters; maps could theoretically be used for read-only access, but the syntax gets obscure.
To sum up, there's no good alternative to POJOs in Java for storing data. In some cases you can use maps, but only when the data structures are dynamic, or as a temporary solution.
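To make the comparison concrete, here is a sketch of a POJO plus the plain-JDBC mapping loop it would be used in. The Pose class and the yoga_pose table/columns are hypothetical names based on the question; dataSource is assumed to be an already-configured javax.sql.DataSource.
import java.sql.*;
import java.util.ArrayList;
import java.util.List;

public class Pose {
    private int id;
    private String name;
    private int durationSeconds;
    // getters and setters assumed, omitted for brevity
}

List<Pose> poses = new ArrayList<>();
try (Connection con = dataSource.getConnection();
     PreparedStatement ps = con.prepareStatement(
         "SELECT id, name, duration_seconds FROM yoga_pose");
     ResultSet rs = ps.executeQuery()) {
    while (rs.next()) {
        Pose pose = new Pose();
        pose.setId(rs.getInt("id"));
        pose.setName(rs.getString("name"));
        pose.setDurationSeconds(rs.getInt("duration_seconds"));
        poses.add(pose);
    }
} catch (SQLException e) {
    throw new RuntimeException(e);
}
A typo in a map key would only fail at runtime; with the POJO the compiler checks every field access, which is the quicker-and-less-error-prone point made above.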

Custom hibernate entity persister

I am in the process of performance testing/optimizing a project that maps
a document <--> Java object tree <--> MySQL database
The document, Java classes, database schema and logic for mapping are orchestrated with HyperJaxb3. The ORM piece of it is JPA provided by Hibernate.
There are about 50 different entities and obviously lots of relationships between them. A major feature of the application is to load the documents and then reorganize the data into new documents; all the pieces of each incoming document eventually get sent out in one outgoing document. While I would prefer not to be living in the relational world, the transactional semantics are a very good fit for this application - there is a lot of money and government regulation involved, so we need to make sure everything gets delivered exactly once.
Functionally, everything is going well and performance is decent (after a fair amount of tweaking). Each document is made up of a few thousand entities which end up creating a few thousand rows in the database. The documents vary in size, and insert performance is pretty much proportional to the number of rows that need to be inserted (no surprise there).
I see the potential for a significant optimization, and this is where my question lies.
Each document is mapped to a tree of entities. The "leaf" half of the tree contains lots of detailed information that is not used in the decisions for how to generate the outgoing documents. In other words, I don't need to be able to query/filter by the contents of many of the tables.
I would like to map the appropriate entity sub-trees to blobs, and thus save the overhead of inserting/updating/indexing the majority of the rows I am currently handling the usual way.
It seems that my best bet is to implement a custom EntityPersister and associate it with the appropriate entities. Is this the right way to go? The hibernate docs are not bad, but it is a fairly complex class that needs to be implemented and I am left with lots of questions after looking at the javadoc. Can you point me to a concrete, yet simple example that I can use as a starting point?
Any thoughts about another way to approach this optimization?
I've run into the same problem with storing large amounts of binary data. The solution I found works best is a denormalization of the object model. For example, I create a master record, and then I create a second object that holds the binary data. On the master, use a @OneToOne mapping to the secondary object, but mark the association as lazy. Now the data will only be loaded if you need it.
The one thing that might slow you down is the outer join that Hibernate performs for all objects of this type. To avoid it, you can mark the association as mandatory. But if the database doesn't give you a huge performance hit, I suggest you leave it alone. I found that Hibernate has a tendency to load the binary data immediately if I tried to use a regular join.
Finally, if you need to retrieve a lot of the binary data in a single SQL call, use an HQL fetch join. For example: from Article a join fetch a.data, where a.data is the one-to-one relationship to the binary holder. The HQL compiler will see this as an instruction to get all the data in a single SQL call.
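Here is a minimal sketch of that pattern, reusing the Article/data names from the example above; the field names and the cascade and optional settings are assumptions.
import javax.persistence.*;

@Entity
public class Article {
    @Id @GeneratedValue
    private Long id;
    private String title;

    // Lazy, so the blob is only fetched on demand; optional = false marks
    // the association mandatory, which helps Hibernate avoid the outer join.
    @OneToOne(fetch = FetchType.LAZY, optional = false, cascade = CascadeType.ALL)
    private ArticleData data;
}

@Entity
public class ArticleData {
    @Id @GeneratedValue
    private Long id;

    @Lob
    private byte[] payload;   // the binary data lives in its own table
}

// When a use case really needs the blobs in bulk:
// List<Article> articles =
//     session.createQuery("from Article a join fetch a.data").list();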
HTH

Strategies for performance optimizations on an inherited EJB3 application

I was asked to have a look at a legacy EJB3 application with significant performance problems. The original author is not available anymore, so all I've got is the source code and some user comments regarding the unacceptable performance. My personal EJB3 skills are pretty basic; I can read and understand the annotated code, but that's all so far.
The server has a database, several EJB3 beans (JPA) and a few stateless beans just to allow CRUD on 4..5 domain objects for remote clients. The client itself is a Java application. Only a few clients are connected to the server in parallel. From the user comments I learned that
the client/server app performed well in a LAN
the app was practically unusable on a WAN (1MBit or more) because read and update operations took much too long (up to several minutes)
I've seen one potential problem - on all EJB, all relations have been defined with the fetching strategy FetchType.EAGER. Would that explain the performance issues for read operations, is it advisable to start tuning with the fetching strategies?
But that would not explain performance issues on update operations, or would it? Update is handled by an EntityManager, the client just passes the domain object to the manager bean and persisting is done with nothing but manager.persist(obj). Maybe the domain objects that are sent to the server are just too big (maybe a side effect of the EAGER strategy).
So my actual theory is that too many bytes are sent over a rather slow network and I should look at reducing the size of result sets.
From your experience, what are the typical and most common coding errors that lead to performance issues on CRUD operations, where should I start investigating/optimizing?
On all EJB, all relations have been defined with the fetching strategy FetchType.EAGER. Would that explain the performance issues for read operations?
Depending on the relations between classes, you might be fetching much more (the whole database?) than you actually want when retrieving entities.
is it advisable to start tuning with the fetching strategies?
I can't say that making all relations EAGER is a very standard approach. In my experience, you usually keep them lazy and use "fetch joins" (a type of join that also fetches an association) when you want to eagerly load an association for a given use case (see the sketch at the end of this answer).
But that would not explain performance issues on update operations, or would it?
It could. I mean, if the app is retrieving a big fat object graph when reading and then sending the same fat object graph back to update just the root entity, there might be a performance penalty. But it's kinda weird that the code is using em.persist(Object) to update entities.
From your experience, what are the typical and most common coding errors that lead to performance issues on CRUD operations, where should I start investigating/optimizing?
The obvious ones include:
Retrieving more data than required
N+1 requests problems (bad fetching strategy)
Poorly written JPQL queries
Inappropriate inheritance strategies
Unnecessary database hits (i.e. lack of caching)
I would start with writing some integration tests or functional tests before touching anything to guarantee you won't change the functional behavior. Then, I would activate SQL logging and start to look at the generated SQL for the major use cases and work on the above points.
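To illustrate the lazy-by-default plus fetch-join style mentioned above, here is a sketch with hypothetical Customer and PurchaseOrder entities (the names and fields are assumptions, not from the application in question).
import java.util.List;
import javax.persistence.*;

@Entity
public class Customer {
    @Id @GeneratedValue
    private Long id;

    // Lazy is already the default for @OneToMany; spelled out for clarity.
    // (PurchaseOrder entity omitted for brevity.)
    @OneToMany(mappedBy = "customer", fetch = FetchType.LAZY)
    private List<PurchaseOrder> orders;
}

// Eagerly load the association only in the use case that needs it:
List<Customer> customers = em.createQuery(
        "SELECT DISTINCT c FROM Customer c JOIN FETCH c.orders",
        Customer.class)
    .getResultList();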
From a DBA's position.
From your experience, what are the typical and most common coding errors that lead to performance issues on CRUD operations, where should I start investigating/optimizing?
- Turn off caching.
- Enable SQL logging. EJB3/Hibernate generates a lot of extremely stupid queries by default. Now you see what I mean. (A sketch of these switches follows this list.)
- Change FetchType.EAGER to FetchType.LAZY.
- Say "no" to big business logic between em.find and em.persist.
- Use Ehcache: http://ehcache.org/
- Turn on the entity cache.
- If you can, make primary keys immutable (@Column(updatable = false, ...)).
- Turn on the query cache.
- Never ever use Hibernate if you want big performance: http://www.google.com/search?q=hibernate+sucks
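As a sketch of the logging and cache switches above, assuming Hibernate as the JPA provider (the property names are Hibernate-specific; other providers use different ones):
import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

Map<String, String> props = new HashMap<>();
props.put("hibernate.show_sql", "true");                      // log generated SQL
props.put("hibernate.format_sql", "true");
props.put("hibernate.cache.use_second_level_cache", "true");  // entity cache
props.put("hibernate.cache.use_query_cache", "true");         // query cache

EntityManagerFactory emf =
    Persistence.createEntityManagerFactory("myPersistenceUnit", props);
Note that entities still need to be marked cacheable (e.g. @Cacheable), and a cache provider such as Ehcache must be configured, for the second-level cache to do anything.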
In my case a similar performance problem didn't depend on the fetch strategy. Or let's say it was not really possible to change the business logic behind the existing fetch strategies. In my case the solution was simply adding indices.
When your JPA object model has a lot of relationships (@OneToOne, @OneToMany, ...), you will typically use JPQL statements with a lot of joins. This can result in complex SQL translations. When you take a look at the data model (generated by JPA), you will notice that there are no indices on any of your table columns.
For example, if you have a Customer and an Address object with a @OneToOne relationship, everything will look fine at first glance. Customer and Address have a foreign key. But if you run selections like this
Select c from Customer as c where c.address.zip='8888'
you should pay attention to the column 'zip' in the table ADDRESS. JPA will not create such an index for you during deployment. So in my case I was able to speed up database performance simply by adding indices.
The SQL statement in your database looks like this:
ALTER TABLE `mydatabase`.`ADDRESS` ADD INDEX `zip_index`(`IZIP`);
In the question, and in the other answers, I'm hearing a lot of "might"s and "maybe"s.
First find out what's going on. If you haven't done that, we're all just poking in the dark.
I'm no expert on this kind of system, but this method works on any language or OS.
When you find out what's making it take too long, why don't you summarize it here?
I'm especially interested to know if it was something that might have been guessed.

Filter through model or dao?

How should one go about filtering a series of domain objects according to user-defined criteria? Should the filtering methods be in the model or should they be in the DAO?
If you want to use your model objects only (or mainly) as data containers, you should put the filtering into the DAOs you are already using. It is good practice to make sure the user-defined criteria are database independent (so pass your own filter object instead of, e.g., a plain Hibernate Criteria; query by example may work great, too).
Then your DAO methods can look like this:
public interface BeanDao
{
    List<Bean> findAllByFilter(BeanFilter filter);
}
The choice of whether to retrieve a greater number of objects and then filter or simply to retrieve the correct objects in the first place depends on the underlying data. Most applications will use a combination of the two.
The things I would consider are:
Network bandwidth & Memory requirements
If after filtering there are a small number of results, but a significantly larger number of results before filtering, then it could be a waste of bandwidth and memory resources to do the filtering in code.
Query speed
Filtering the results in the database can be more expensive than doing the logic in code - disk vs memory. Indexes are required to make it worthwhile.
Maintainability
Creating new queries can be time-consuming. It will definitely mean writing some SQL and revisiting various phases of testing. It may require modifying the db schema, such as adding an index to speed up the query.
When solving this problem in Java, it might be worth considering the visitor pattern. I often use two interfaces, SingleMatcher and MultipleMatcher, to filter a collection of objects. Implementations of these can be combined to create new user-defined criteria. Another advantage is that once you have the matchers users are likely to want, you won't have to go back through testing to create new criteria. A sketch of the idea follows.
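The SingleMatcher/MultipleMatcher names come from the answer above; the interface shape and the AndMatcher combinator below are an assumed sketch of how such matchers might look and compose.
import java.util.List;

public interface SingleMatcher<T> {
    boolean matches(T candidate);
}

// Combines matchers so user-defined criteria can be built from simple parts.
public final class AndMatcher<T> implements SingleMatcher<T> {
    private final List<SingleMatcher<T>> parts;

    public AndMatcher(List<SingleMatcher<T>> parts) {
        this.parts = parts;
    }

    @Override
    public boolean matches(T candidate) {
        for (SingleMatcher<T> m : parts) {
            if (!m.matches(candidate)) {
                return false;
            }
        }
        return true;
    }
}

// Filtering a collection with a combined matcher:
// List<Bean> filtered = beans.stream()
//         .filter(matcher::matches)
//         .collect(java.util.stream.Collectors.toList());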
