Persisting data for 10 entities takes 12 sec - Java

I'm using EclipseLink with Derby DB or MaxDB. When I did performance testing, persisting data for 10 entities with 1000 records each took 12.9 sec: my own code takes 0.9 sec, and the commit of the entity manager using JPA takes about 12 sec.
1. Is it OK that 10,000 records take 12 seconds?
2. I read that there is an option to use
<property name="eclipselink.jdbc.batch-writing" value="JDBC" />
What is the drawback of using it? How does the logging work with it?
3. What about using a thread for the commit, is that OK?

It's OK if it's sufficiently fast for you. Only you know that. You could compare it with code written by hand using JDBC, but don't forget to also take into account the maintainability and correctness of the code and the time it takes to write and test it. Hardware is cheap; developers are not. Note that the use case you tested (inserting lots of rows into lots of tables) is not a very frequent use case in most typical applications, and it is not well suited to JPA, which is typically used to implement short transactions (buying a book on Amazon, adding a message to a blog, things like that).
No idea.
JPA entity managers are not thread-safe, and the current transaction is typically associated with the current thread. You can't start a transaction in one thread and commit it in another.

You should definitely enable batch writing (and confirm that your database/driver supports it).
See http://java-persistence-performance.blogspot.com/2011/06/how-to-improve-jpa-performance-by-1825.html
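For illustration, a minimal sketch of enabling EclipseLink batch writing programmatically; the persistence-unit name "demo-pu", the Order entity and the concrete property values are placeholders, not something from the question:

import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class BatchWriteDemo {

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<String, String>();
        // Group INSERTs into JDBC batches instead of sending one statement per row.
        props.put("eclipselink.jdbc.batch-writing", "JDBC");
        props.put("eclipselink.jdbc.batch-writing.size", "1000");
        // Logging is configured independently; FINE logs every SQL statement and adds overhead.
        props.put("eclipselink.logging.level", "FINE");

        EntityManagerFactory emf = Persistence.createEntityManagerFactory("demo-pu", props);
        EntityManager em = emf.createEntityManager();
        em.getTransaction().begin();
        for (int i = 0; i < 1000; i++) {
            em.persist(new Order(i));      // Order is a hypothetical entity
        }
        em.getTransaction().commit();      // statements are flushed and batched here
        em.close();
        emf.close();
    }
}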

Related

How to handle row lock contention at application level

I have 2 applications (Spring + Hibernate with Boot) using the same Oracle database (11g). Both apps hit one specific table constantly, and there is a huge number of hits on this table. We can see row lock contention exceptions in the DB logs, and the applications have to be restarted each time we get these or when a deadlock-like situation arises.
We are using the JPA EntityManager in both applications.
I need help with this issue.
According to this link:
http://www.dba-oracle.com/t_enq_tx_row_lock_contention.htm
this error occurs because a transaction is waiting for another transaction to commit or roll back. That behavior is correct from the database's point of view and from a data-consistency standpoint, but if availability/fulfillment is a concern for you, you might need a workaround, such as:
1. Keep separate tables for each application and update the main table with the data offline (but you will sacrifice data consistency).
2. Use a separate thread to log and retry unsuccessful transactions (a sketch follows below).
3. Accept the availability issue (latency) if consistency is the bigger concern.
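Point 2 in code form, roughly; this is only a sketch, and the Counter entity, its fields, the exception handling and the backoff policy are all invented for illustration:

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.PersistenceException;

public class RetryingUpdater {

    private final EntityManagerFactory emf;

    public RetryingUpdater(EntityManagerFactory emf) {
        this.emf = emf;
    }

    // Counter stands in for whatever row both applications fight over.
    public void incrementWithRetry(long counterId, int maxAttempts) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            EntityManager em = emf.createEntityManager();
            try {
                em.getTransaction().begin();
                Counter c = em.find(Counter.class, counterId);
                c.setValue(c.getValue() + 1);
                em.getTransaction().commit();
                return;                                    // success
            } catch (PersistenceException e) {
                if (em.getTransaction().isActive()) {
                    em.getTransaction().rollback();
                }
                Thread.sleep(100L * attempt);              // log the failure and back off
            } finally {
                em.close();
            }
        }
        throw new IllegalStateException("Update still failing after " + maxAttempts + " attempts");
    }
}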
There are also some general tips to consider:
1. Keep transactions minimal: think about every step included in the transaction and whether it is really mandatory or can be moved outside.
2. Tune transaction demarcation: you might find transactions kept open for a long time for no reason other than bad coding.
3. Don't do read operations inside write transactions.
4. Avoid an extended persistence context (stay stateless) whenever possible.
5. You might choose a non-JTA transactional data source for reporting and read-only queries.
6. Check the lock types you are using and, where your case allows it, avoid anything but OPTIMISTIC (a minimal sketch follows this answer).
But finally, you'll agree with me that we shouldn't blame the database for blocking two transactions that modify the same row.
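A minimal sketch of point 6, assuming a hypothetical Ticket entity; the only important part is the @Version field:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Ticket {

    @Id
    private Long id;

    private String status;

    // With a version column, concurrent writers don't queue on row locks:
    // the losing commit fails fast with an OptimisticLockException and can be
    // retried, e.g. with a loop like the one sketched earlier in this answer.
    @Version
    private long version;

    // getters and setters omitted
}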

Using the ehcache blocking decorator with Hibernate

I'm using ehcache with hibernate and I'd like to use the blocking or SelfPopulating cache to avoid the issues presented in http://ehcache.org/documentation/constructs-0_5.html#mozTocId722946
An expensive operation is required, say rendering a large web page, which takes 30 seconds. The page is not considered stale until it is 5 minutes old. The page is hit very heavily and will be hit an average of 20 times per minute each 5 minutes.
Do I have to do this programmatically as http://ehcache.org/documentation/cache_decorators.html suggests or is there a declarative (in xml) way to do so?
thanks a lot
There is no way to do this in ehcache.xml, since you must register the decorated cache with the CacheManager before the cache config is read.
So you must use the code mentioned in the docs, and you must run it before you do anything with Hibernate. A simple way to do this is the hibernate.cache.provider_class property, which tells Hibernate which factory to use for its cache. Have a look at the source of an existing implementation; it should give you an idea of what you need to do.
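A rough sketch of that programmatic decoration, assuming the Ehcache 2.x-era API and a made-up cache name; the point is that this must run before Hibernate builds its SessionFactory:

import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Ehcache;
import net.sf.ehcache.constructs.blocking.BlockingCache;

public class CacheDecoration {

    // Call this from your cache provider / bootstrap code, before Hibernate starts.
    public static void decorate(CacheManager manager) {
        // "org.example.RenderedPage" is an invented cache/region name.
        Ehcache plain = manager.getEhcache("org.example.RenderedPage");
        BlockingCache blocking = new BlockingCache(plain);
        manager.replaceCacheWithDecoratedCache(plain, blocking);
    }
}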

Restrict Postgres access from Java clients by using a Java program on a server

Perhaps this question is not very clear, but I didn't find better words for the heading, which briefly describes the problem I'd like to deal with.
I want to restrict access from a Java desktop application to Postgres.
The background:
Suppose you have two apps running and the first application has to do some complex calculations based on data in the DB. To nail down the immutability of the data in the DB, I'd like to lock the DB against insert, update and delete operations. On the client side I think it's impossible to handle this behaviour satisfactorily, so I thought about using a small Java app on the server side that works like a proxy: it passes CRUD (Create Read Update Delete) operations through until it gets a command to lock. After a lock it rejects all CUD operations until it gets an unlock command from the locking client or a timeout is reached.
Questions:
What do you think about this approach?
Is it possible to lock a Database while using such an approach?
Would you prefer Java SE or Java EE as server-side java app?
Thanks in advance.
Why not use transactions in your operations? The database has features to maintain data integrity itself, rather than resorting to a brute-force operation such as a total database lock.
The locking mechanism you describe sounds like it would be a pain for the users. Are the users initiating the lock, or is the software doing it itself? If it's the users, you can expect some problems when Bob hits lock and then goes to lunch for 2 hours, forgetting to unlock the database first...
Indeed... there are a few proper ways to deal with this problem.
Just lock the tables in your code. Postgresql has commands for locking entire tables that you could run from your client application
Pick a transaction isolation level that doesn't have the problem of reading data that was committed after your txn started (BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ).
Of these, by far the most efficient is to use repeatable read as your isolation level. Postgres supports this quite efficiently, and it will give you a consistent view of the data without such heavy locking of the db.
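A minimal JDBC sketch of the second option (connection URL, credentials and the bookings table are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SnapshotCalculation {

    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/appdb", "app", "secret");
        try {
            con.setAutoCommit(false);
            // Every query in this transaction sees one consistent snapshot,
            // without blocking writers in other sessions.
            con.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);

            Statement st = con.createStatement();
            // Option 1 from above would instead be something like:
            // st.execute("LOCK TABLE bookings IN SHARE MODE");
            ResultSet rs = st.executeQuery("SELECT SUM(amount) FROM bookings");
            // ... run the whole calculation inside this one transaction ...
            rs.close();
            st.close();
            con.commit();
        } finally {
            con.close();
        }
    }
}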
Yeah, I thought about transactions, but in this case I can't use them. I'm sorry I didn't mention it exactly. So assume the following simple case:
A calculation closes one area of responsibility. After the calculation a new one is opened, and new inserts are dedicated to it. But while the calculation is running, inserts, updates and deletes are not allowed on the data of the (currently calculated) area of responsibility. Moreover, a delete is strictly prohibited because the data has to be archived.
So IMO the use of transactions doesn't fit this requirement. Or did I miss something?
PS (off topic) #jsight: I recently read that internally Postgres maps "repeatable read" to "serializable", so using "repeatable read" gets you more restriction than you would perhaps expect.

When can/should you go whole hog with the ORM approach?

It seems to me that introducing an ORM tool is supposed to make your architecture cleaner, but for efficiency I've found myself bypassing it and iterating over a JDBC ResultSet on occasion. This leads to an uncoordinated tangle of artifacts instead of a cleaner architecture.
Is this because I'm applying the tool in an invalid context, or is it deeper than that?
When can/should you go whole hog with the ORM approach?
Any insight would be greatly appreciated.
A little background:
In my environment I have about 50 client computers and 1 reasonably powerful SQL Server.
I have a desktop application in which all 50 clients are accessing the data at all times.
The project's Data Model has gone through a number of reorganizations for various reasons including clarity, efficiency, etc.
My Data Model's history
JDBC calls directly
DAO + POJOs without relations between the POJOs (basically wrapping the JDBC).
Added Relations between POJOs implementing Lazy Loading, but just hiding the inter-DAO calls
Jumped onto the Hibernate bandwagon after seeing how "simple" it made data access (it made inter POJO relations trivial) and because it could decrease the number of round trips to the database when working with many related entities.
Since it was a desktop application, keeping Sessions open long-term was a nightmare, so it ended up causing a whole lot of issues.
Stepped back to a partial DAO/Hibernate approach that allows me to make direct JDBC calls behind the DAO curtain while at the same time using Hibernate.
Hibernate makes more sense when your application works on object graphs that are persisted in the RDBMS. If, instead, your application logic works on a 2-D matrix of data, fetching it via direct JDBC works better. Although Hibernate is written on top of JDBC, it has capabilities that might be non-trivial to implement in JDBC. For example:
Say the user views a row in the UI and changes some of the values, and you want to fire an update query for only those columns that actually changed.
To avoid getting into deadlocks you need to maintain a global order for the SQL statements in a transaction. Getting this right in JDBC might not be easy.
Easily setting up optimistic locking. When you use JDBC, you need to remember to add it to every update query (a small sketch follows at the end of this answer).
Batch updates, lazy materialization of collections, etc. might also be non-trivial to implement in JDBC.
(I say "might be non-trivial" because it can of course be done, and you might be a super hacker :)
Hibernate lets you fire your own SQL queries also, in case you need to.
Hope this helps you to decide.
PS: Keeping the Session open on a remote desktop client and running into trouble is really not Hibernate's problem - you would run into the same issue if you keep the Connection to the DB open for long.
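To make the first and third points a bit more concrete, here is a sketch with a made-up Customer entity; @DynamicUpdate is the Hibernate 4.1+ spelling (older versions expressed the same thing via @org.hibernate.annotations.Entity(dynamicUpdate = true)):

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Version;
import org.hibernate.annotations.DynamicUpdate;

@Entity
@DynamicUpdate          // UPDATE statements include only the columns that actually changed
public class Customer {

    @Id
    private Long id;

    private String name;
    private String email;

    @Version            // optimistic locking without hand-written version checks
    private int version;
}

And when a flat 2-D result really is all you need, you can still drop to SQL through the standard API, e.g. em.createNativeQuery("SELECT ...").getResultList(), rather than bypassing the ORM with raw JDBC.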

Batch insert using JPA/Toplink

I have a web application that receives messages through an HTTP interface, e.g.:
http://server/application?source=123&destination=234&text=hello
This request contains the ID of the sender, the ID of the recipient and the text of the message.
This message should be processed like:
finding the matching User object for both the source and the destination from the database
creating a tree of objects: a Message that contains a field for the message text and two User objects for the source and the destination
persisting this tree to a database.
The tree will be loaded by other applications that I can't touch.
I use Oracle as the backing database and JPA with Toplink for the database handling tasks. If possible, I'd stay with these.
Without much optimization I can achieve ~30 requests/sec throughput in my environment. That's not much; I'd need ~300 requests/sec. So I measured where the performance bottleneck is and found that the calls to em.persist() take most of the time. If I simply comment out that line, the throughput goes well over 1000 requests/sec.
I wrote a small test application that used plain JDBC calls to persist 1 million messages to the same database. I used batching, meaning I did 100 inserts, then a commit, and repeated until all the records were in the database. I measured ~500 requests/sec throughput in that scenario, which would meet my needs.
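For reference, the JDBC baseline described above looks roughly like this (a sketch; the messages table, its columns and the Msg holder are invented):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class JdbcBaseline {

    // Msg is a hypothetical holder for source id, destination id and text.
    public static class Msg {
        long sourceId;
        long destId;
        String text;
    }

    public static void insertMessages(Connection con, List<Msg> msgs) throws SQLException {
        con.setAutoCommit(false);
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO messages (source_id, dest_id, text) VALUES (?, ?, ?)");
        int n = 0;
        for (Msg m : msgs) {
            ps.setLong(1, m.sourceId);
            ps.setLong(2, m.destId);
            ps.setString(3, m.text);
            ps.addBatch();
            if (++n % 100 == 0) {          // 100 inserts, then a commit
                ps.executeBatch();
                con.commit();
            }
        }
        ps.executeBatch();                 // flush the remainder
        con.commit();
        ps.close();
    }
}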
It is clear that I need to optimize insert performance here. However as I mentioned earlier I would like to keep using JPA and Toplink for this, not pure JDBC.
Do you know a way to create batch inserts with JPA and Toplink? Can you recommend any other technique for improving JPA persist performance?
ADDITIONAL INFO:
"requests/sec" means here: total number of requests / total time from beginning of test to last record written to database.
I tried to make the calls to em.persist() asynchronous by creating an in-memory queue between the servlet layer and the persister. It helped performance greatly; however, the queue grew really fast, and as the application will receive ~200 requests/second continuously, it is not an acceptable solution for me.
In this decoupled approach I collected requests for 100 ms and called em.persist() on all collected items before committing the transaction. The EntityManagerFactory is cached between transactions.
You should decouple from the JPA interface and use the bare TopLink API. You can probably chuck the objects you're persisting into a UnitOfWork and commit the UnitOfWork on your own schedule (sync or async). Note that one of the costs of em.persist() is the implicit clone it makes of the whole object graph. TopLink will work rather better if you uow.registerObject() your two user objects yourself, saving it the identity tests it otherwise has to do. So you'll end up with something like:
UnitOfWork uow = sess.acquireUnitOfWork();
for (Job job : batch) {                     // Job/batch are placeholders for your work items
    // registerObject returns working copies (clones); modify those, not the originals
    Thingy thingyClone = (Thingy) uow.registerObject(new Thingy());
    User user1Clone = (User) uow.registerObject(user1);
    User user2Clone = (User) uow.registerObject(user2);
    thingyClone.setUsers(user1Clone, user2Clone);
}
uow.commit();
This is very old school TopLink btw ;)
Note that the batching will help a lot, because batch writing, and especially batch writing with parameter binding, will kick in, and for this simple example that will probably have a very large impact on your performance.
Another thing to look for is your sequence preallocation size. A lot of the time spent writing objects in TopLink is actually spent reading sequencing information from the database, especially with the small defaults (I would probably use several hundred or even more as my sequence size).
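A hedged sketch of the sequence-size point in JPA mapping terms (the entity, generator and sequence names are made up; the native TopLink sequencing API can be tuned to the same effect):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class Message {

    @Id
    // allocationSize = 500 means one round trip to MSG_SEQ hands out 500 ids;
    // keep the database sequence's INCREMENT BY in sync with this value.
    @SequenceGenerator(name = "msg_seq", sequenceName = "MSG_SEQ", allocationSize = 500)
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "msg_seq")
    private Long id;

    private String text;
}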
What is your measure of "requests/sec"? In other words, what happens for the 31st request? What resource is being blocked? If it is the front-end/servlet/web portion, can you run em.persist() in another thread and return immediately?
Also, are you creating transactions each time? Are you creating EntityManagerFactory objects with each request?
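Not part of either answer, but to make the "persist in another thread, reuse the factory" questions concrete, here is a minimal sketch (the persistence-unit name is a placeholder, and error handling and back-pressure are omitted):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class AsyncPersister {

    // One factory for the whole application; creating it per request is very expensive.
    private static final EntityManagerFactory EMF =
            Persistence.createEntityManagerFactory("demo-pu");   // placeholder unit name

    // EntityManagers are not thread-safe, so each task gets its own.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public void persistLater(final Object entity) {
        worker.submit(new Runnable() {
            public void run() {
                EntityManager em = EMF.createEntityManager();
                try {
                    em.getTransaction().begin();
                    em.persist(entity);
                    em.getTransaction().commit();
                } finally {
                    em.close();
                }
            }
        });
    }
}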
