How to tune the performance of an HSQLDB/Hibernate app (Java)

I have an open source Java application that uses Hibernate and HSQLDB for persistence. In all my toy tests, things run fast and everything is good. I have a client who has been running the software for several months continuously and their database has grown significantly over that time, and performance has dropped gradually. It finally occurred to me that the database could be the problem. As far as I can tell from log statements, all of the computation in the server happens quickly, so this is consistent with the hypothesis that the DB might be at fault.
I know how to do normal profiling of a program to figure out where the hot spots are and what is taking up significant amounts of time. But all the profilers I know of monitor execution time within the program and don't give you any help with calls to external resources. What tools do people use to profile programs that make external DB calls, to find out where to optimize performance?
A little blind searching around has already found a few hot spots. I noticed a call where I was enumerating all the objects of a particular class in order to find out whether there were any. A one-line change to the Criteria query (.setMaxResults(1)) changed that call from half a second to virtually instantaneous. I also see places where I ask the DB the same question many times within a single transaction. I haven't figured out how to cache the answer yet, but what I really want is a tool to help me look for these kinds of things more systematically.
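For the repeated questions within one transaction, one low-tech option is a small memo object that lives exactly as long as the transaction. The sketch below assumes a hypothetical TxMemo class (not a Hibernate API) that you create at the start of each transaction and throw away at commit or rollback; it runs without Hibernate on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical helper: remembers answers to repeated read-only questions
 * for the lifetime of one unit of work (e.g. one Hibernate transaction).
 * Create one per transaction and discard it at commit/rollback so stale
 * answers never leak into the next transaction.
 */
final class TxMemo {
    private final Map<String, Object> answers = new HashMap<>();
    private int misses = 0;

    /** Returns the remembered answer for key, running loader only the first time. */
    @SuppressWarnings("unchecked")
    <T> T get(String key, Supplier<T> loader) {
        if (answers.containsKey(key)) {
            return (T) answers.get(key);
        }
        T value = loader.get();   // e.g. runs the real query
        misses++;
        answers.put(key, value);  // null answers are remembered too
        return value;
    }

    /** Number of times a loader actually ran. */
    int misses() {
        return misses;
    }

    /** Forget everything, e.g. after this transaction writes data that changes the answers. */
    void clear() {
        answers.clear();
    }
}
```

Inside a transaction it might be called as memo.get("anyWidgets", () -> !session.createCriteria(Widget.class).setMaxResults(1).list().isEmpty()), where Widget is a hypothetical entity. Invalidation after writes is left to the caller here; Hibernate's own query cache is designed to handle that invalidation itself.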

Unfortunately, as far as I know, there is no tool for that.
But there are some things you might want to check:
Are you using eager loading instead of lazy loading? From your description, it really looks like you are not using lazy loading...
Have you turned on and properly configured your second-level cache, including the query cache? Hibernate's caching mechanism is extremely powerful and flexible.
Have you considered using Hibernate Search? Depending on your queries, a Hibernate Search full-text index on top of Apache Lucene can speed them up, since its indexing system is so powerful.

How much data are you storing in HSQLDB? I don't think it performs well when managing large sets of data, since it's just storing everything in files...

There was once a tool called IronGrid/IronEye/IronTrackSql that did exactly what you are looking for. Unfortunately, they went out of business. They did open source their product at the last minute, but I have not been able to find source or a binary for quite some time.
I have been using YourKit for profiling lately, partly because you can have it profile SQL time to find your most-called and longest-running statements. It is not as detailed as IronGrid was, but it does give you valuable information. In my latest database/Hibernate tuning session, the problem turned out to be Hibernate and how and when it was doing eager vs. lazy loading; the fix was adding some judicious overrides of the default when selecting large numbers of items.

Lots to report on here. I have some results, and am still looking for good answers.
I've found a couple of tools that help:
VisualVM (with BTrace, or the built-in Trace) claims to help with tracing, but I haven't been able to find any tool that shows timing on method calls.
YourKit is reputed to be useful; I've asked for an open source license.
The most useful thing I found is Hibernate's built-in statistics. If you set hibernate.generate_statistics to true in your properties, you can call sessionFactory.getStatistics() and see detailed statistics on which objects have been stored and retrieved and what effects the caches are having. I found one of the answers I wanted in the QueryStatistics, which report, for each compiled query, the cache hits and misses, the number of times the query has run, how many rows were returned, and the average, max and min execution times. These timings made it abundantly clear where the time was going.
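To attack the worst problems rather than the most obvious ones, one approach is to rank queries by total time (execution count times average time) instead of by the slowest single run. A minimal sketch follows; QueryStat and WorstQueries are hypothetical names, and the values would be copied from Hibernate's Statistics.getQueries() and getQueryStatistics(hql), whose QueryStatistics exposes getExecutionCount() and getExecutionAvgTime(). That copying step is left out so the sketch runs without Hibernate.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical snapshot of one query's numbers, as reported by Hibernate's QueryStatistics. */
final class QueryStat {
    final String hql;
    final long executionCount;
    final long avgTimeMs;

    QueryStat(String hql, long executionCount, long avgTimeMs) {
        this.hql = hql;
        this.executionCount = executionCount;
        this.avgTimeMs = avgTimeMs;
    }

    /** Approximate total time spent in this query: count * average. */
    long totalTimeMs() {
        return executionCount * avgTimeMs;
    }
}

final class WorstQueries {
    /** Returns a copy sorted so the query costing the most total time comes first. */
    static List<QueryStat> rank(List<QueryStat> stats) {
        List<QueryStat> sorted = new ArrayList<>(stats);
        sorted.sort((a, b) -> Long.compare(b.totalTimeMs(), a.totalTimeMs()));
        return sorted;
    }
}
```

Ranked this way, a query averaging 2 ms that runs 10,000 times (20 s in total) comes out ahead of one that takes 500 ms but runs once.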
I then did some reading on caching. Razenha's suggestion was right on. [I'll mark his answer as right for now.] I added hibernate.cache.use_query_cache true to my properties, and added query.setCacheable(true); to most of my queries. I also added <cache usage="read-write"/> to a few of my .hbm.xml files. Now most of my statistics are showing a vast predominance of cache hits, and the performance is vastly better.
I'd still like some tools to help me trace execution timing so I can attack the worst problems rather than the most obvious, but this is a big help. Maybe one of the tracing tools above will turn out to help.

In Terracotta 3.1, you can monitor all of those statistics in real time using the Terracotta Developer Console. You can see historical graphs for cache statistics, and see the Hibernate statistics or cache statistics cluster-wide or on a per-node basis.
Terracotta is open source. More details and downloads are at Terracotta for Hibernate.

Related

Stuck on a Spring performance issue

I am kind of stuck on an issue. I have a Spring + Hibernate app which, for the last few days, has been behaving really strangely.
Normally, even in Debug mode, it bootstrapped in around 15 seconds.
A couple of days ago, without any significant errors or problems being shown, it started running two, if not three, times as slowly.
I thought maybe the problem was in Tomcat, but even when I ran the series of unit tests I've written, they were amazingly slow (we are talking 8+ seconds per test, plus 20 seconds for the initial context bootstrapping). I use a local PostgreSQL database for the tests, which is normally not so bad (around one second per test).
I am stuck. I know that the last thing I did was to add support for @Transactional on my @Controller classes. Could this be the reason? I doubt it, because when I deployed the transactional modifications to the stable server and restarted it, it ran just as fast as before.
I am completely stuck.
You are aware that, based on your description, we can only guess. Obviously, if you are using continuous integration, the build times of subsequent builds will reveal which code change could be the reason. If you identify the code, diagnosing will be much simpler.
Chances are it wasn't really caused by a code change but by an environment change, e.g.:
Spring fetches some external resource on bootstrap (XML schemas, DTDs, etc.)
there is some extra network overhead (router/switch reconfiguration, firewall)
The database is fragmented or somehow slow; try vacuuming it.
Investigating:
Capture a few stack traces during bootstrap. No need for full-blown profiling; I bet there is a single blocking/waiting operation slowing things down.
Enable every logging statement you can. Every. Investigate which pieces of your application are slow.
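The stack-trace suggestion can be done from outside the JVM by running the JDK's jstack <pid> a few times during startup. If you'd rather sample from inside the application, a rough sketch is below; StackSampler is a hypothetical helper that counts the top frame of every thread across several samples, on the assumption that a single blocking call will dominate the counts.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-process sampler: counts the top stack frame of every live thread. */
final class StackSampler {
    static Map<String, Integer> sample(int samples, long intervalMs) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                StackTraceElement[] stack = e.getValue();
                if (stack.length == 0) {
                    continue; // some system threads report no Java frames
                }
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                counts.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop early
                break;
            }
        }
        return counts;
    }
}
```

Started on a daemon thread early in bootstrap, StackSampler.sample(50, 200) would collect roughly ten seconds of samples; a frame that appears in most of them (a socket read, a lock wait, a schema download) is the first suspect.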
My guess is that your DB is the root cause for this.
Probably you have some query that is taking longer than it used to due to data size growth, schema changes, or the like.
Here are some tips:
Try to check the startup logs to see what seems to be slower.
Can you try to start an older version of the app, and see if it's faster?
Can you see which tests are slower now?
Maybe profiling will help you gain more information. There is a nice tool shipped with the JDK called jvisualvm (you can find it in the bin folder of your JDK installation). It's pretty much self-explanatory: you can connect to your application and start sampling.

Caching web applications in Java

What is the best way to add caching to a Java web application that uses MySQL, to improve performance?
What are the best techniques to do it?
Is it better to do this at the application level or the database level?
I'm new to this, so sorry if I'm wrong.
Well, there are performance techniques at both the database level and the application level.
At the database level, here are a few suggestions:
Query optimization
Index creation on frequently queried data.
Some ORM layers, like Hibernate, also provide mechanisms to cache results at the first and second levels.
At the application level we have many options; a few of them are:
1. Ehcache
2. Memcached
3. JCS
Here is a complete list of Java-based caching frameworks:
java-caching-system
and some googling will help you find many other options.
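As a rough illustration of what an application-level cache does, here is a size-bounded LRU cache built on java.util.LinkedHashMap in access order, used cache-aside style (check the cache, otherwise load and store). LruCache is a hypothetical name; the sketch is not thread-safe, and libraries such as Ehcache add expiry, statistics and concurrency on top of this idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical size-bounded LRU cache used in a cache-aside style. Not thread-safe. */
final class LruCache<K, V> {
    private final Map<K, V> map;

    LruCache(final int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently used
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict the LRU entry once over capacity
            }
        };
    }

    /** Cache-aside read: returns the cached value, or loads, stores and returns it. */
    V get(K key, Function<? super K, ? extends V> loader) {
        V value = map.get(key); // a hit also marks the entry as recently used
        if (value == null) {
            value = loader.apply(key); // e.g. the MySQL query
            if (value != null) {
                map.put(key, value);
            }
        }
        return value;
    }

    int size() {
        return map.size();
    }

    boolean contains(K key) {
        return map.containsKey(key); // does not change the access order
    }
}
```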
For the UI layer there are lots of areas for improvement, such as:
Proper use of HTTP headers
Fewer server hits
How JavaScript files are loaded
How CSS files are loaded
Use of CDN servers
Yahoo has a very good guide on this: YSlow from Yahoo. If you are at an early stage of development, I would suggest not going for these yet, as they lead to premature optimization and can cause many problems.
Have a look at the links below, which could help you.
Article1
Article2
Frameworks exist for this purpose, and Ehcache is one of them. You can read up on how to use it here: ehcache
Unfortunately, the question is way too broad (there are books on the topic, so it literally falls under the FAQ definition of off-topic), and thus is likely to be closed soon.
In brief, there are plenty of Java caching solutions, including for example Guava and ehcache.
The three best techniques would be:
Profile
Profile
Profile
First, before changing anything; second, to make sure your changes have an effect; third, in production, to make sure your changes actually work in real life.
As for the levels: both. The decision depends on the profiling data.

JDBC Profiler for JBoss / Distributed Applications

I'm trying to eliminate a slow database being the cause of some performance issues for a distributed application I'm supporting. I've done local profiling of various facets of the application and performance monitoring of the server itself, leading me to suspect that the database is at least partially responsible for the poor performance.
Currently I'm using JBoss for the back-end (using a Hibernate / JDBC layer to connect to the database), but I only have source access to some of the code.
I've found Elvyx, but this project seems to have been abandoned in 2008. Is there a newer JDBC profiler available - what's the current 'de facto' standard for profiling a database in a distributed app?
Alternatively, can anyone suggest a better / alternative approach?
Try using YourKit, it supports a reasonable degree of JDBC profiling:
You can view executed SQL after you capture a CPU snapshot.
You can also enable JDBC probes and view many things live, such as timings, stack traces, threads and SQL statements (see the attached screenshot from my colleague's computer).
I don't want this to sound like an ad for YourKit, but get yourself a trial license and give it a go.
I too would recommend AppDynamics. The 'Lite' version was more than adequate for my purposes.
Our Java EE application runs within JBoss and we knew we had performance bottlenecks in certain areas. By quickly and easily (and I do mean easily) installing and spinning up AppDynamics, then running some load through the application, we were able to see straight away exactly where our performance hits were located. On the clean and concise dashboard we were able to drill right down the stack to see which class needed some improvement.
Highly recommended. Definitely check it out. I hear the 'Pro' version is even better.
If you are trying to hunt down (or at least confirm) issues related to a slow database, IMO using the profiling tools provided by the database would be a good starting point.
We had done something along these lines previously by profiling JDBC calls (noting the timings) and comparing them against the time required to execute the same query directly on the database itself. This gave a pretty good idea of exactly how much time was spent in JDBC making the DB call and getting back the result.
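The JDBC-timing half of that comparison can be approximated without a commercial tool by wrapping JDBC objects in a java.lang.reflect.Proxy that accumulates the time spent in each method. The sketch below is a generic, hypothetical TimingProxy; it assumes you control where the objects are created (for example conn.createStatement()), which may not hold for code you only have binaries for.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: wraps any interface and sums wall-clock nanos per method name. */
final class TimingProxy {
    static <T> T wrap(Class<T> iface, T target, Map<String, AtomicLong> nanosByMethod) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the real exception, e.g. SQLException
            } finally {
                nanosByMethod.computeIfAbsent(method.getName(), k -> new AtomicLong())
                             .addAndGet(System.nanoTime() - start);
            }
        };
        return iface.cast(Proxy.newProxyInstance(
                TimingProxy.class.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}
```

For JDBC it could be used as Statement st = TimingProxy.wrap(Statement.class, conn.createStatement(), timings), with timings a ConcurrentHashMap; the accumulated time for executeQuery can then be compared with the database's own timing of the same SQL.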
dynaTrace supports SQL call introspection and measures how long each sql call took. The field is called application performance management in general.
Is there a particular use case that you feel is slow, or do you feel DB responsiveness is slow in general?
If it's a particular use case, I would suggest going for tools like AppDynamics or GlassBox.
If it's in general, starting from the DB is a better approach.
I am assuming you have already done the analysis on the distributed application side regarding connection metrics, and on the DB server OS side regarding open sockets and permissible I/O.
Arcturus Applicare does support JDBC profiling on JBoss (and other Java app servers). You can view all SQL statements with min, max and avg stats, aggregated across all servers in your environment or for individual servers.
With full profiling enabled, you will be able to see the execution trace for each and every request/transaction processed by the server, including SQL statements with execution parameters, making it pretty easy to detect expensive SQL and exactly where it is being executed.

Am I crazy? Switching an established product from HSQLDB to Apache Derby

I have an established software product that uses HSQLDB as its internal settings database. Customer projects are stored in this database. Over the years, HSQLDB has served us reasonably well, but it has some stability/corruption issues that we've had to code circles around, and even then, we can't seem to protect ourselves from them completely.
I'm considering changing internal databases. Doing this would be fairly painful from a development perspective, but corrupted databases (and lost data) are not fun to explain to customers.
So my question is: Does anyone have enough experience to weigh in on the long-term stability of Apache Derby? I found a post via Google complaining that Derby was unstable, but it was from 2006 so I'd entertain the idea that it has been improved in the last 4 years. Or, is there another pure Java embedded (in-process) database that I could use (commercial or open-source). Performance isn't very important to me. Stability is king. Data integrity across power loss, good BLOB support, and hot-backups are all a must.
Please don't suggest something that isn't a SQL-based relational database. I'm trying to retrofit an existing product, not start from scratch, thanks.
For each database engine there is a certain risk of corruption. I am the main author of the H2 database, and I also got reports about broken databases. Testing can reduce the probability of bugs, but unfortunately it's almost impossible to guarantee some software is 'bug free'.
As for the three Java databases HSQLDB, Apache Derby, and H2, I can't really say which one is the most stable. I can only speak about H2. I think for most operations, H2 is now stable. There are many test cases that specifically test for databases getting corrupted. This includes automated tests on power loss (using a Christmas light timer). With power failure tests I found that stability also depends on the file system: sometimes I got 'CRC error' messages, meaning the operating system couldn't read the file (it was Windows). In that case, there is not much you can do.
For mission-critical data, in any case, I wouldn't rely on the software being stable. It's very important to create backups regularly, and to test them. Some databases have multiple ways to create backups; H2, for example, has an online backup feature and a feature to write a SQL script file. An alternative is to use replication or clustering. H2 supports a simple cluster mode, and I believe Derby supports replication.
I ran Derby 24/7 as the internal database supporting a build automation and test management system for 4 years. It was used by a worldwide team, and never crashed, lost data, or corrupted my records. The only reason we stopped using it is because our company was bought by another and a higher-level decision was handed down. Derby is solid, reliable, and well worth your consideration.
This search shows 215 posts in HSQLDB Users mailing list containing the string "corrupt".
http://search.gmane.org/?query=corrupt&author=&group=gmane.comp.java.hsqldb.user&sort=date&DEFAULTOP=and&xP=Zcorrupt&xFILTERS=Gcomp.java.hsqldb.user---A
This search shows 264 posts in Derby Users mailing list containing the same string.
http://search.gmane.org/?query=corrupt&author=&group=gmane.comp.apache.db.derby.user&sort=date&DEFAULTOP=and&xP=Zcorrupt&xFILTERS=Gcomp.apache.db.derby.user---A
This one shows 1003 posts in Derby Dev mailing list with the same string
http://search.gmane.org/?query=corrupt&author=&group=gmane.comp.apache.db.derby.devel&sort=date&DEFAULTOP=and&xP=Zcorrupt&xFILTERS=Gcomp.apache.db.derby.devel---A
A look at some of the posts shows that possible or real cases of database corruption happen despite all the best efforts of database developers.
HSQLDB has had its own share of database corruption issues but has improved over the years. In the latest versions precautions and fixes have been introduced to prevent all the issues that were reported in the last few years.
The new lob storage feature however, turned out to have a logic bug that results in the lobs being "forgotten" after an update. This is being fixed right now, with more extensive tests to support the fix.
Users like CarlG have helped a lot over the years in the bug fixing efforts of both Derby and HSQLDB.
Fred Toussi, HSQLDB Project
Does anyone have enough experience to weigh in on the long-term stability of Apache Derby? (...)
Derby, ex IBM Cloudscape (and now also distributed by Sun as JavaDB), is an ACID-compliant database that can handle a lot of concurrent users, running embedded or in server mode, and is known to be robust and production ready. It is not as fast as HSQLDB (Derby uses durable operations), but it's robust. Still, you should run your own tests against it.
See also
François Orsini's blog
I have been using Apache Derby since 2009 in many of my projects, some of them with 24/7 operation and many millions of rows.
Never ever had a single event of data corruption. Rock solid and fast.
I keep choosing it as my RDBMS of choice, unless a good reason not to pops out.
Try looking into H2. It was created by the guy who originally made HSQLDB but built from scratch so doesn't use any HSQLDB code. Not sure how its stability compares to HSQL since I haven't used HSQL in ages and I'm only using H2 for short-lived databases currently. I personally found H2 to be easier to get going than Derby but maybe that's because H2 has a cheat sheet web page.
It might be possible to re-code to use an abstraction layer and then run tests to compare H2 and Derby with the issues you have found.
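A first step toward such an abstraction layer could be keeping the engine choice in one place, so the same tests can be pointed at each embedded database. The enum below is a hypothetical sketch; the URL patterns are the usual embedded file-mode forms for HSQLDB, H2 and Derby, the matching driver jar still has to be on the classpath, and SQL dialect differences (including Hibernate's dialect setting, if used) are not handled here.

```java
import java.util.Locale;

/** Hypothetical single switch point for the embedded database engine. */
enum EmbeddedEngine {
    HSQLDB("jdbc:hsqldb:file:%s"),
    H2("jdbc:h2:file:%s"),
    DERBY("jdbc:derby:%s;create=true");

    private final String template;

    EmbeddedEngine(String template) {
        this.template = template;
    }

    /** Builds the embedded-mode JDBC URL for a database stored at dbPath. */
    String url(String dbPath) {
        return String.format(template, dbPath);
    }

    /** Picks the engine from a setting (e.g. a system property); defaults to HSQLDB. */
    static EmbeddedEngine fromSetting(String value) {
        return value == null ? HSQLDB : valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}
```

The rest of the code would then only ever call DriverManager.getConnection(engine.url(path)), so running the existing test suite against H2 or Derby becomes a configuration change.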
On the project management side of the fence, does your roadmap have a major version coming up? That might be a rather appropriate time to rip out the guts this way, and I wouldn't say you were crazy, because it could potentially remove lots of hard-to-manage workarounds. If you wanted to make the change where it could affect live systems without plenty of warning and backups in place, then you may be crazy.
With regard to HSQLDB, one thing that it doesn't have as a project that SQLite has is the documentation of a robust testing suite and online documentation of assiduous ACID compliance.
I don't mean to take anything away from HSQLDB. It's meant to serve as an alternative to MySQL, not to fopen() as SQLite is intended. One can say that the scope of HSQLDB (all the Java RDBMSs, really) is much more ambitious. Fredt and his group have accomplished an extraordinary achievement with HSQLDB. Even so, doing the Google search "Is HSQLDB ACID compliant" doesn't leave an early adopter feeling as confident as one feels after reading about the testing harnesses on the SQLite website.
At http://sqlite.org/transactional.html
"SQLite is Transactional
A transactional database is one in which all changes and queries appear to be Atomic, Consistent, Isolated, and Durable (ACID). SQLite implements serializable transactions that are atomic, consistent, isolated, and durable, even if the transaction is interrupted by a program crash, an operating system crash, or a power failure to the computer.
We here restate and amplify the previous sentence for emphasis: All changes within a single transaction in SQLite either occur completely or not at all, even if the act of writing the change out to the disk is interrupted by
a program crash,
an operating system crash, or
a power failure.
The claim of the previous paragraph is extensively checked in the SQLite regression test suite using a special test harness that simulates the effects on a database file of operating system crashes and power failures."
At http://sqlite.org/testing.html
"1.0 Introduction
The reliability and robustness of SQLite is achieved in part by thorough and careful testing.
As of version 3.7.14, the SQLite library consists of approximately 81.3 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in other words, lines of code excluding blank lines and comments.) By comparison, the project has 1124 times as much test code and test scripts - 91421.1 KSLOC.
1.1 Executive Summary
Three independently developed test harnesses
100% branch test coverage in an as-deployed configuration
Millions and millions of test cases
Out-of-memory tests
I/O error tests
Crash and power loss tests
Fuzz tests
Boundary value tests
Disabled optimization tests
Regression tests
Malformed database tests
Extensive use of assert() and run-time checks
Valgrind analysis
Signed-integer overflow checks"
Give SQLite a try if you're looking for something self-contained (no server involved). This is what backs Android's database API, and it is highly stable.

db4o experiences?

I'm currently trying out db4o (the java version) and I pretty much like what I see. But I cannot help wondering how it does perform in a real live (web-)environment. Does anyone have any experiences (good or bad) to share about running db4o?
We run the db4o .NET version in a large client/server project.
Our experience is that you can potentially get much better performance than with typical relational databases.
However, you really have to tweak your objects to get this kind of performance. For example, if you've got a list containing a lot of objects, DB4O activation of these lists is slow. There are a number of ways to get around this problem, for example, by inverting the relationship.
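Inverting the relationship means each child holds a reference to its parent instead of the parent holding a large list, so activating the parent no longer pulls in every child. A plain-Java sketch of the data model follows, with hypothetical Customer and Order classes and a linear scan standing in for a db4o query on the (ideally indexed) customer field.

```java
import java.util.List;
import java.util.stream.Collectors;

// Before: the parent owns a huge list, so activating a Customer activates every Order.
//   class Customer { List<Order> orders; }
// After ("inverted"): each child points at its parent; the parent holds no list.
final class Customer {
    final String name;

    Customer(String name) {
        this.name = name;
    }
}

final class Order {
    final Customer customer; // child -> parent reference replaces the parent -> children list
    final int amount;

    Order(Customer customer, int amount) {
        this.customer = customer;
        this.amount = amount;
    }
}

final class Orders {
    /** Stand-in for a database query constrained on Order.customer (a linear scan here). */
    static List<Order> forCustomer(List<Order> allOrders, Customer c) {
        return allOrders.stream()
                .filter(o -> o.customer == c)
                .collect(Collectors.toList());
    }
}
```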
Another pain is activation. When you retrieve or delete an object from DB4O, by default it will activate the whole object tree. For example, loading a Foo will load Foo.Bar.Baz.Bat, etc until there's nothing left to load. While this is nice from a programming standpoint, performance will slow down the more nesting in your objects. To improve performance, you can tell DB4O how many levels deep to activate. This is time-consuming to do if you've got a lot of objects.
Another area of pain was text searching. DB4O's text searching is far, far slower than SQL full-text indexing. (They'll tell you this outright on their site.) The good news is, it's easy to set up a text searching engine on top of DB4O. On our project, we've hooked up Lucene.NET to index the text fields we want.
Some APIs don't seem to work, such as the GetField APIs useful in applying database upgrades. (For example, if you've renamed a property and want to upgrade your existing objects in the database, you need to use these "reflection" APIs to find objects in the database.) Other APIs, such as the [Index] attribute, don't work in the stable 6.4 version, and you must instead specify indexes using Configure().Index("someField"), which is not strongly typed.
We've witnessed performance degrade as the database grows. We have a 1GB database right now and things are still fast, but not nearly as fast as when we started with a tiny database.
We've found another issue where Db4O.GetByID will close the database if the ID no longer exists in the database.
We've found the Native Query syntax (the most natural, language-integrated syntax for queries) is far, far slower than the less-friendly SODA queries. So instead of typing:
// C# syntax for "Find all MyFoos with Bar == 23".
// (Note the Java syntax is more verbose using the Predicate class.)
IList<MyFoo> results = db4o.Query<MyFoo>(input => input.Bar == 23);
Instead of that nice query code, you have to write an ugly SODA query, which is string-based and not strongly typed.
For .NET folks, they've recently introduced a LINQ-to-DB4O provider, which provides the best syntax yet. However, it remains to be seen whether performance will be up to par with the ugly SODA queries.
DB4O support has been decent: we've talked to them on the phone a number of times and have received helpful info. Their user forums are next to worthless, however: almost all questions go unanswered. Their JIRA bug tracker receives a lot of attention, so if you've got a nagging bug, file it on JIRA and it will often get fixed. (We've had 2 bugs that have been fixed, and another one that got patched in a half-assed way.)
If all this hasn't scared you off, let me say that we're very happy with DB4O, despite the problems we've encountered. The performance we've got has blown away some O/RM frameworks we tried. I recommend it.
Update, July 2015: Keep in mind that this answer was written back in 2008. While I appreciate the upvotes, the world has changed since then, and this information may not be as reliable as it was when it was written.
Most native queries can and are efficiently converted into SODA queries behind the scenes so that should not make a difference. Using NQ is of course preferred as you remain in the realms of strong typed language. If you have problems getting NQ to use indexes please feel free to post your problem to the db4o forums and we'll try to help you out.
Goran
Main problem I've encountered with it is reporting. There just doesn't seem to be any way to run efficient reports against a db4o data source.
Judah, it sounds like you are not using transparent activation, which is a feature of the latest production version (7.4)? Perhaps if you specified the version you are using as there may be other issues which are now resolved in the latest version?
