Hibernate transaction commit extremely slow when the session caches huge numbers of objects - java

I have two tables, A (25k rows) and B (2.2m rows). I use a Hibernate session to load all of that data (each row maps to one object), then update only one row in A and one row in B in a single transaction. Hibernate's behaviour is strange: the commit takes about 1.5 seconds to return, yet the database log shows that the UPDATE statements themselves take only a few milliseconds. Hibernate spends most of the time before it even flushes the SQL to the database.
So I used JProfiler to find out what it is doing:
There are no clues about how the time is being consumed. Since the database executes the UPDATE very quickly, the commit cannot be blocked by the database; and if Hibernate were doing heavy computation, JProfiler should have recorded it as CPU time.
What is Hibernate doing here? Why is the commit so slow?

If you have loaded over 2 million objects into Hibernate's first-level cache, you should not be surprised that things are a bit slow. The time is most likely spent dirty-checking all of those objects, looking for changes to flush. If you know you no longer need an object, you can evict it from the session. That will reduce memory consumption and speed up the eventual commit. Just take care not to evict objects that are actually needed, or you will create nasty bugs!
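For example, a minimal sketch (the A and B entities are the ones from the question; the setter and the id parameters are hypothetical) that evicts the bulk-loaded objects so that only the two modified rows are dirty-checked at commit:

import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class EvictExample {
    static void updateOneRowEach(SessionFactory sessionFactory, long aId, long bId) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            // Stream through B doing whatever read-only work is needed,
            // evicting each object so it no longer sits in the first-level cache.
            List<B> allB = session.createQuery("from B", B.class).list();
            for (B b : allB) {
                // ... read-only processing ...
                session.evict(b); // removes b from dirty-checking at flush time
            }

            // Only the objects you actually modify stay managed.
            A a = session.get(A.class, aId);
            a.setUpdatedFlag(true); // hypothetical setter

            B b = session.get(B.class, bId);
            b.setUpdatedFlag(true); // hypothetical setter

            tx.commit(); // the flush now only has a handful of objects to inspect
        } finally {
            session.close();
        }
    }
}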

Related

Improving Get operation performance

I am running some comparison tests (Ignite vs Cassandra) to see how to improve the performance of the 'get' operation. The data is fairly straightforward: a simple Employee object (10-odd fields), stored as a BinaryObject in the cache as
IgniteCache<String, BinaryObject> empCache;
The cache is configured with the following (sketched in code below):
Write Sync Mode - FULL_SYNC, Atomicity - TRANSACTIONAL, Backups - 1 & Persistence - Enabled
Cluster config,
3 server + 1 client node.
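A minimal Java sketch of the cache configuration described above (the cache name is hypothetical; persistence itself is enabled or disabled on the data region, not on the cache):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.cache.CacheAtomicityMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class EmpCacheConfig {
    static IgniteCache<String, BinaryObject> createEmpCache(Ignite ignite) {
        // Mirrors the settings listed above; "empCache" is a hypothetical name.
        CacheConfiguration<String, Object> cfg = new CacheConfiguration<>("empCache");
        cfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
        cfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
        cfg.setBackups(1);
        // Persistence is controlled per data region in DataStorageConfiguration.
        IgniteCache<String, BinaryObject> empCache =
                ignite.getOrCreateCache(cfg).withKeepBinary();
        return empCache;
    }
}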
The client has multiple threads (configurable) making concurrent get calls.
For about 500k requests I am getting a throughput of about 1500/sec, even though all of the data is off-heap and the cache hit percentage is 100%. Interestingly, with Cassandra I am getting similar performance, using the key cache and a limited row cache.
I am leaving the defaults for most of the data configuration. For this test I turned persistence off; ideally for gets it shouldn't really matter. The performance is the same.
Data Regions Configured:
[19:35:58] ^-- default [initSize=256.0 MiB, maxSize=14.1 GiB, persistence=false]
Topology snapshot [ver=4, locNode=038f99b3, servers=3, clients=1, state=ACTIVE, CPUs=40, offheap=42.0GB, heap=63.0GB]
Frankly, I was expecting Ignite gets to be pretty fast given that all the data is in cache, at least judging by this test: https://www.gridgain.com/resources/blog/apacher-ignitetm-and-apacher-cassandratm-benchmarks-power-in-memory-computing
I am planning to run one more test tomorrow with persistence off and a near cache (on heap) configured, to see if it helps (one way to set that up is sketched below).
Let me know if you guys see any obvious configurations that should be set.
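A sketch of the near-cache idea mentioned above, assuming the hypothetical "empCache" name from the earlier sketch; on the client node the existing cache is fronted with an on-heap near cache:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.configuration.NearCacheConfiguration;

public class NearCacheSetup {
    static IgniteCache<String, BinaryObject> withNearCache(Ignite clientIgnite) {
        // Default near-cache settings; eviction policy and size can be tuned on nearCfg.
        NearCacheConfiguration<String, BinaryObject> nearCfg = new NearCacheConfiguration<>();
        return clientIgnite.<String, BinaryObject>createNearCache("empCache", nearCfg)
                           .withKeepBinary();
    }
}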

Data Change Notification performance impact

I built a simple application that monitors an Oracle DB on a single table. I tried to test the performance impact of an enabled subscription and was unpleasantly surprised: the degradation is about 2x when inserting about 10,000 records, each in a standalone transaction.
without subscription 10k insert ~ 30 sec
with subscription ROWID granularity 10k insert ~ 60 sec
If I set:
OracleConnection.DCN_NOTIFY_ROWIDS, "false"
OracleConnection.DCN_QUERY_CHANGE_NOTIFICATION, "false"
then all the degradation vanishes, but I need the details of the updates.
I removed all extra processing from the client side, so this is purely subscription overhead.
I am wondering: is it inherently this expensive, or can I tune it somehow?
Database change notification has an overhead during commit. This can't be tuned. Note that this feature is designed for read-mostly tables that are worth being cached on the client/mid-tier. One trick might be to unregister your app during batch inserts.
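A rough illustration of that trick (the table, columns and re-registration details are hypothetical; the registration calls are the standard oracle.jdbc DCN methods): drop the registration for the duration of the batch, then re-create it afterwards.

import java.sql.PreparedStatement;
import java.util.Properties;
import oracle.jdbc.OracleConnection;
import oracle.jdbc.dcn.DatabaseChangeRegistration;

public class DcnBatchTrick {
    static void bulkInsertWithoutNotifications(OracleConnection conn,
                                               DatabaseChangeRegistration existing) throws Exception {
        // Drop the registration so each commit does not pay the DCN overhead.
        conn.unregisterDatabaseChangeNotification(existing);

        conn.setAutoCommit(false);
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO MY_TABLE(ID, VAL) VALUES(?, ?)")) { // hypothetical table
            for (int i = 0; i < 10_000; i++) {
                ps.setInt(1, i);
                ps.setString(2, "value-" + i);
                ps.executeUpdate();
                conn.commit(); // one standalone transaction per row, as in the test above
            }
        }

        // Re-register afterwards with the same options as before (ROWID details kept).
        Properties opts = new Properties();
        opts.setProperty(OracleConnection.DCN_NOTIFY_ROWIDS, "true");
        DatabaseChangeRegistration dcr = conn.registerDatabaseChangeNotification(opts);
        // re-attach your listeners and re-run the registering query here (not shown)
    }
}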

Why establishing DB connection within foreachPartition and parallelizing it causing "ORA-00060: deadlock"?

I have a simple Spark job that maps, computes, and writes results into an Oracle DB. I'm having a problem when writing the results to the DB.
After reducing the results by key, I call the foreachPartition action to establish a connection and write the results into the DB. If I set the parallelism to 1, it works fine. But when I change the parallelism of the reducer to 2 or more, it only writes partial results. When I checked the log files, I saw this error:
java.sql.BatchUpdateException: ORA-00060: deadlock detected while waiting for resource
How could I resolve this issue?
Oracle is only going to deadlock when you have multiple writes to the same row (or a configuration that doesn't support your level of concurrency, but I find that unlikely with only 2 parallel writers).
To get significant benefit from parallelism you need to divide up your work so that your two separate writers don't update the same rows.
This could mean an additional Spark Job to divide up your updates based on the rows they affect before parallelizing the DB writes.
If organizing your writes perfectly to avoid contention is not practical you could add increased parallelization (and get more granularity) and then retry jobs that fail due to deadlocks. This is a band-aid for the real problem which is contention. If you have a high number of deadlocks performance will be much worse with parallelization than it would be just running serially.
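A sketch of that idea in Java (the RDD, table, columns, JDBC URL and partition count are hypothetical): hash-partition by key before foreachPartition, so every key, and therefore every target row, is written by exactly one partition and no two writers contend for the same row.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Iterator;

import org.apache.spark.HashPartitioner;
import org.apache.spark.api.java.JavaPairRDD;

import scala.Tuple2;

public class PartitionedWrite {
    // results is the reduced-by-key RDD; jdbcUrl/user/pass are placeholders.
    static void write(JavaPairRDD<String, Long> results, String jdbcUrl, String user, String pass) {
        results
            // Hash-partition by key so each key lands in exactly one partition.
            .partitionBy(new HashPartitioner(4))
            .foreachPartition((Iterator<Tuple2<String, Long>> rows) -> {
                try (Connection conn = DriverManager.getConnection(jdbcUrl, user, pass);
                     PreparedStatement ps = conn.prepareStatement(
                         "UPDATE RESULTS SET TOTAL = ? WHERE ID = ?")) { // hypothetical table
                    conn.setAutoCommit(false);
                    while (rows.hasNext()) {
                        Tuple2<String, Long> row = rows.next();
                        ps.setLong(1, row._2());
                        ps.setString(2, row._1());
                        ps.addBatch();
                    }
                    ps.executeBatch();
                    conn.commit();
                }
            });
    }
}

This assumes the Spark key uniquely determines the row being updated; if several keys map to the same row, you still need to group them onto one partition first.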

Can ehCache return old cached value when cache is refreshing

I am using Spring framework and Java CXF services.
Several clients access a fixed list of data fetched from the database. Since fetching from the database is an expensive call and the DB doesn't change frequently, I thought it would be good to cache the values.
DB refresh -> 15 times a day at indefinite intervals
Cache refresh -> Every 15 minutes.
Cache loading by call to DB takes 15 seconds.
Now, refreshing the cache with a call to the DB takes 15 seconds. If clients want to access the data within those 15 seconds, I am OK with serving the previous data from the cache; stale and outdated data can be tolerated here instead of waiting 15 seconds (there is a delta function which brings in the data added after the time the cache was loaded, and it is very inexpensive). Is there a way in Ehcache to return the old cached data while the new data is being loaded, with the cache refreshing at its regular 15-minute interval?
A consistent copy of the cache on local disk provides many possibilities for business requirements, such as working with different datasets according to time-based needs or moving datasets around to different locations. It can range from a simple key-value persistence mechanism with fast read performance, to an operational store with in-memory speeds during operation for both reads and writes.
Ehcache has a RestartStore which provides fast restartability and options for cache persistence. The RestartStore implements an on-disk mirror of the in-memory cache. After any restart, data that was last in the cache will automatically load from disk into the RestartStore, and from there the data will be available to the cache.
Cache persistence on disk is configured by adding the <persistence> sub-element to a cache configuration. The sub-element has two attributes: strategy and synchronousWrites.
<cache>
  <persistence strategy="localRestartable|localTempSwap|none|distributed" synchronousWrites="false|true"/>
</cache>
For more information, see the Ehcache documentation.
If you use a "cache aside" pattern and your application code is responsible for reloading the cached data before it expires, you will observe exactly the behaviour you desire, as long as the cache holds the mapping.
If you use a "cache through" pattern with a cache loader, then there is no way to obtain such behaviour: while a mapping is being reloaded, the key is locked, which blocks other threads trying to get it at the same time.
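A minimal sketch of the cache-aside approach with the Ehcache 2.x API, assuming a hypothetical cache named "fixedListCache" (configured as eternal so entries never expire on their own), a single hypothetical key, and a dbLoader standing in for the 15-second database call. A scheduled task reloads the entry every 15 minutes, and readers keep seeing the previous value until the new one is put:

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class CacheAsideRefresher {
    private static final String KEY = "fixedList"; // hypothetical cache key

    // dbLoader stands in for the expensive 15-second DB call.
    static void start(CacheManager cacheManager, Supplier<List<?>> dbLoader) {
        // "fixedListCache" is a hypothetical cache configured with eternal="true",
        // so the old entry is never removed on its own.
        Cache cache = cacheManager.getCache("fixedListCache");

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            List<?> fresh = dbLoader.get();     // takes ~15 s; readers are not blocked
            cache.put(new Element(KEY, fresh)); // swap in the new value once it is ready
        }, 0, 15, TimeUnit.MINUTES);
    }

    static List<?> read(Cache cache) {
        Element e = cache.get(KEY);             // returns the previous value during a reload
        return e == null ? null : (List<?>) e.getObjectValue();
    }
}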

Java MySQL JDBC Memory Leak

Ok, so I have this program with many (~300) threads, each of which communicates with a central database. I create a global connection to the DB, and then each thread goes about its business creating statements and executing them.
Somewhere along the way, I have a massive memory leak. After analyzing the heap dump, I see that the com.mysql.jdbc.JDBC4Connection object is 70 MB, because it has 800,000 items in "openStatements" (a hash map). Somewhere it's not properly closing the statements that I create, but I cannot for the life of me figure out where (every single time I open one, I close it as well). Any ideas why this might be occurring?
I had exactly the same problem. I needed to keep 1 connection active for 3 threads, and at the same time every thread had to execute a lot of statements (on the order of 100k). I was very careful and closed every statement and every result set in a try ... finally block, so even if the code failed in some way, the statement and the result set were always closed. After running the code for 8 hours I was surprised to find that memory usage went from the initial 35 MB to 500 MB. I generated a memory dump and analyzed it with the MAT analyzer from Eclipse. It turned out that one com.mysql.jdbc.JDBC4Connection object was taking 445 MB of memory, keeping alive some openStatements objects which in turn kept alive around 135k hashmap entries, probably from all the result sets. So it seems that even if you close all your statements and result sets, if you do not close the connection, it keeps references to them and the garbage collector can't free the resources.
My solution: after a long search I found this statement from the guys at MySQL:
"A quick test is to add "dontTrackOpenResources=true" to your JDBC URL. If the memory leak
goes away, some code path in your application isn't closing statements and result sets."
Here is the link: http://bugs.mysql.com/bug.php?id=5022. So I tried that and guess what? After 8 hours, memory usage was around 40 MB for the same database operations.
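For reference, a minimal sketch of what that looks like (host, schema and credentials are placeholders; only the dontTrackOpenResources flag matters here):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class LeakTestConnection {
    static Connection open() throws SQLException {
        // Placeholder URL and credentials; the flag is appended as a URL parameter.
        return DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb?dontTrackOpenResources=true",
                "user", "password");
    }
}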
Maybe a connection pool would be advisable, but if that's not an option, this is the next best thing I came across.
Note that unless MySQL says otherwise, JDBC Connections are NOT thread safe. You CANNOT share them across threads unless you use a connection pool. In addition, as pointed out, you should use try/finally to guarantee that all statements, result sets, and connections are closed.
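A sketch of that pattern (the query, table and pool wiring are hypothetical): each worker thread borrows its own connection from a pooled DataSource, and try-with-resources guarantees the ResultSet, Statement and Connection are closed even when a query throws, so nothing accumulates in openStatements.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import javax.sql.DataSource;

public class Worker implements Runnable {
    private final DataSource pool; // a pooled DataSource shared by all workers

    Worker(DataSource pool) {
        this.pool = pool;
    }

    @Override
    public void run() {
        String sql = "SELECT VAL FROM MY_TABLE WHERE ID = ?"; // hypothetical query
        try (Connection conn = pool.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, 42);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // ... use rs.getString("VAL") ...
                }
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }
}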
Once upon a time, whenever my code saw "server went away," it opened a new DB connection. If the error happened in the right (wrong!) place, I was left with some non-free()d orphan memory hanging around. Could something like this account for what you are seeing? How are you handling errors?
Without seeing your code (which I'm sure is massive), you should really consider some sort of more formal connection pooling mechanism, such as the Apache Commons pool/DBCP framework, Spring's JDBC framework, or others. IMHO, this is a much simpler approach, since someone else has already figured out how to effectively manage these types of situations.
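As one illustration of that suggestion (Commons DBCP is just one of the options mentioned; URL, credentials and sizing are placeholders), a pooled DataSource the workers above could share instead of a single global connection:

import javax.sql.DataSource;
import org.apache.commons.dbcp2.BasicDataSource;

public class PoolSetup {
    static DataSource createPool() {
        // Placeholder URL/credentials; size the pool to the database's capacity,
        // not to the thread count (300 threads can share far fewer connections).
        BasicDataSource ds = new BasicDataSource();
        ds.setUrl("jdbc:mysql://localhost:3306/mydb");
        ds.setUsername("user");
        ds.setPassword("password");
        ds.setMaxTotal(32); // upper bound on pooled connections
        ds.setMaxIdle(8);
        return ds;
    }
}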
