I'm trying to rule out a slow database as the cause of some performance issues in a distributed application I'm supporting. I've done local profiling of various facets of the application and performance monitoring of the server itself, leading me to suspect that the database is at least partially responsible for the poor performance.
Currently I'm using JBoss for the back-end (using a Hibernate / JDBC layer to connect to the database), but I only have source access to some of the code.
I've found Elvyx, but this project seems to have been abandoned in 2008. Is there a newer JDBC profiler available - what's the current 'de facto' standard for profiling a database in a distributed app?
Alternatively, can anyone suggest a better / alternative approach?
Try YourKit; it supports a reasonable degree of JDBC profiling:
You can view executed SQL after you capture a CPU snapshot.
You can also enable JDBC probes and view many things live, such as timings, stack traces, threads and SQL statements.
I don't want this to sound like an ad for YourKit, but get yourself a trial license and give it a go.
I too would recommend AppDynamics. The 'Lite' version was more than adequate for my purposes.
Our Java EE application runs within JBoss and we knew we had performance bottlenecks in certain areas. By quickly and easily (and I do mean easily) installing and spinning up AppDynamics, then running some load through the application, we were able to see straight away exactly where our performance hits were located. On the clean and concise dashboard we were able to drill right down the stack to see which class needed some improvement.
Highly recommended. Definitely check it out. I hear the 'Pro' version is even better.
If you are trying to hunt down (or at least confirm) issues related to a slow database, IMO using the profiling tools provided by the database would be a good starting point.
We had done something along these lines previously by profiling JDBC calls (noting the timings) and comparing them against the time required to execute the same query "on" the database itself. This gave a pretty good idea of how much time was actually spent by the JDBC layer making the call and getting the result back.
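As a rough illustration of the JDBC side of that comparison, a minimal sketch (the URL, credentials and query are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class JdbcTiming {
        public static void main(String[] args) throws Exception {
            // URL, credentials and query are placeholders; substitute your own.
            try (Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/APPDB", "user", "password")) {
                String sql = "SELECT * FROM orders WHERE customer_id = ?";
                long start = System.nanoTime();
                try (PreparedStatement ps = con.prepareStatement(sql)) {
                    ps.setLong(1, 42L);
                    int rows = 0;
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            rows++; // iterate fully so fetch time is included, not just execute time
                        }
                    }
                    System.out.println(rows + " rows in "
                            + (System.nanoTime() - start) / 1_000_000 + " ms via JDBC");
                }
            }
        }
    }

Running the same statement in the database's own client with timing enabled then shows how much of that total is the database itself versus driver and network overhead.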
dynaTrace supports SQL call introspection and measures how long each SQL call takes. The field in general is called application performance management.
Is there a particular use case that feels slow, or do you feel DB responsiveness is slow in general?
If it's a specific use case, I would suggest going for tools like AppDynamics or GlassBox.
If it's general, starting from the DB is the better approach.
I am assuming you have already analysed connection metrics on the distributed application side, and socket limits and permissible I/O at the DB server OS level.
Arcturus Applicare supports JDBC profiling on JBoss (and other Java app servers). You can view all SQLs with min, max and avg stats, aggregated across all servers in your environment or per server.
With full profiling enabled, you can see the execution trace for each and every request/transaction processed by the server, including SQLs with execution parameters, making it pretty easy to detect expensive SQLs and where exactly they are being executed.
We have installed 2 instances of the same application in the same datacenter. Both instances use the same Oracle DB, but we are observing performance issues in one of them. In AppDynamics we can see that the response time of one application is much higher than the other's.
Is it possible to intentionally prioritise/configure the DB in such a way? If yes, where should I look in the database?
Any idea why this is happening? I am totally clueless here.
In theory, yes: if Resource Manager has been enabled, different Resource Manager plans could have such an impact, but experience shows that this feature is seldom used.
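If you want to rule that out quickly, a minimal sketch of the check over JDBC (assuming a user with SELECT access to v$parameter; URL and credentials are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class RsrcPlanCheck {
        public static void main(String[] args) throws Exception {
            // URL and credentials are placeholders; the user needs access to v$parameter.
            try (Connection con = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/APPDB", "user", "password");
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SELECT value FROM v$parameter WHERE name = 'resource_manager_plan'")) {
                String plan = rs.next() ? rs.getString(1) : null;
                System.out.println(plan == null || plan.isEmpty()
                        ? "No Resource Manager plan active" : "Active plan: " + plan);
            }
        }
    }

Run it against both instances' connections; if neither shows an active plan, Resource Manager is not your culprit.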
In practice this kind of difference can have many causes:
different SQL statements run
data is different
database statistics differences
different database configuration
different hardware
etc.
The first thing to look at, on the database side, is something like a Statspack report (or AWR if licensing allows) to compare database configuration and activity between the two instances.
And don't forget that application performance is not only database performance: it also depends on the application server, the network and the front-end.
At present I have a set of benchmark tests for recording the speed at which a Java application connects, submits and returns data from various RDBMSs housed on various server platforms. The application uses a simple algorithm for recording the time taken by each test. The application itself is a simple Java interface for a user to specify the tests; this seemed easier than hard-coding each test or using an IDE to perform each test (bear in mind that with the combinations of RDBMS, server OS and client OS there are in the region of several hundred individual tests). I would like to further my findings by recording the CPU and memory usage during these tests on the client side where the application resides. I could hard-code the algorithm for doing so in my application (my preference) or use third-party software for monitoring this (bear in mind it would need to be suitable for cross-platform use: Windows 7, Solaris and Ubuntu).
So my question is: how could I record CPU and memory usage during a test, either by hard-coding it in my Java application or by using third-party software? If you believe a third-party tool is the solution, please mention the actual product and how this can be done.
Thank you to all who take the time to answer.
Check out VisualVM. It has a lot of features.
I used VisualVM and it helped a lot in tracking down memory leaks.
Here is a video showing the most important VisualVM features.
There are plenty of commercial products for this. JProbe is my favorite these days, but I'm also using YourKit. In the free arena, Eclipse has TPTP -- the "Test and Performance Tools Platform" -- but it seems to be a rare person who can actually get the darn thing to work. It never works for me.
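If you'd rather hard-code the measurement, as the question prefers, a minimal sketch using the JDK's MX beans (com.sun.management.OperatingSystemMXBean is a Sun/Oracle JDK extension that works on Windows, Solaris and Linux; getProcessCpuLoad() needs JDK 7+, older JDKs only expose getProcessCpuTime()):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    public class UsageSampler {
        public static void main(String[] args) throws InterruptedException {
            // Sun/Oracle JDK extension interface; not part of the Java standard,
            // but available on Windows, Solaris and Linux JDKs.
            com.sun.management.OperatingSystemMXBean os =
                    (com.sun.management.OperatingSystemMXBean)
                            ManagementFactory.getOperatingSystemMXBean();
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();

            for (int i = 0; i < 10; i++) {           // sample while the benchmark runs
                double cpu = os.getProcessCpuLoad();  // 0.0-1.0; JDK 7+, may be < 0 on the first call
                MemoryUsage heap = mem.getHeapMemoryUsage();
                System.out.printf("cpu=%.1f%% heapUsedMB=%d%n",
                        cpu * 100, heap.getUsed() / (1024 * 1024));
                Thread.sleep(1000);                   // 1 s sampling interval
            }
        }
    }

Run the sampler in a background thread (or a separate process) alongside each benchmark and write the samples next to your timing results.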
I've got an app running on a grid of uniform Java processes (potentially on different physical machines). I'd like to collect CPU usage statistics from a single run of this app. I've gone over profiling tools looking for an option to collect this data automatically, but failed to find one in NetBeans, TPTP, jvisualvm, YourKit, etc.
Maybe I'm looking at this the wrong way?
What I was thinking is:
run the processes on the grid with some special setup that allows them to dump profiling info
run my app as usual - it will push tasks to the grid, the processes will execute the tasks and publish profiling info
use some tool to collect and analyze the profiling results
but I can't find anything even remotely similar to this.
Any thoughts, experience, suggestions?
Thank you!
If you have enabled remote JMX access and you are using the Sun JDK 1.6, try jvisualvm, which has an option for remote JMX connections. I haven't used it for profiling CPU in a distributed environment, though.
Note: for CPU profiling, your application should be running on Sun JDK 1.6 or above.
Have a look at these links:
JVisualVM
JVisualVM - Working with Remote Applications
Get heap dump from a remote application in Java using JVisualVM
Unable to profile JBoss 5 using jvisualvm
http://www.taranfx.com/java-visualvm
I have used CA Introscope for this type of monitoring. It uses instrumentation to collect metrics over time. For example, it can be configured to give you a view of all nodes and their performance over time; from that node view, you can drill down to the method level to figure out where your bottlenecks are.
Yes, it provides CPU utilization.
It's a commercial $$$ tool, but it's a great tool for collecting, monitoring and interrogating performance data.
If you look at something like Zabbix (though there are tons of other monitoring tools), it can gather data from a Java app via JMX. If you enable JMX in your app and allow it to be queried externally (via TCP/IP), you get access to a lot of the HotSpot internals (free memory etc.) as well as thread stacks, and you can have these values graphed. It does need configuration, but I don't think what you're looking for can be done with a one-line script.
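As a minimal sketch of that approach (host names and ports are placeholders, and the JVM flags in the comment are the usual insecure dev-only setup):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class GridPoller {
        public static void main(String[] args) throws Exception {
            // Each grid JVM must be started with remote JMX enabled, e.g.
            //   -Dcom.sun.management.jmxremote.port=9010
            //   -Dcom.sun.management.jmxremote.authenticate=false  (dev only!)
            //   -Dcom.sun.management.jmxremote.ssl=false           (dev only!)
            String[] nodes = {"node1:9010", "node2:9010"};  // placeholder host:port pairs
            for (String node : nodes) {
                JMXServiceURL url = new JMXServiceURL(
                        "service:jmx:rmi:///jndi/rmi://" + node + "/jmxrmi");
                try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
                    MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
                    MemoryMXBean mem = ManagementFactory.newPlatformMXBeanProxy(
                            mbsc, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
                    System.out.println(node + " heapUsedMB="
                            + mem.getHeapMemoryUsage().getUsed() / (1024 * 1024));
                }
            }
        }
    }

Poll on a timer during the run and you have a crude but automatic per-node collection, which a tool like Zabbix essentially does for you with graphing on top.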
Just to add that the profiling information on each node usually contains timestamps.
To match these timestamps, all machines should have very nearly the same time (10 ms delta maximum), so the cluster nodes should synchronize with a single network time (NTP) server.
You can use a JMX library, e.g. jmxterm, and wrap it in some code to connect to multiple hosts and poll them for changes. If you are a bit familiar with Python, look at my simple script here for some inspiration: http://rostislav-matl.blogspot.com/2011/02/monitoring-tomcat-with-jmxterm.html
http://www.hyperic.com/products/open-source-systems-monitoring
I never tried the other tools mentioned in the other answers. I was more than satisfied with Hyperic.
It exposes a web services API as well, which you can use to write your own analysis tools.
If you know the critical paths you want to analyse, I would suggest timestamping your process in key places and combining the logs yourself. This is likely to be a useful addition to your profiling, can be used in production, and may be even more useful as a result. (It is for my project.)
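A minimal sketch of that kind of instrumentation (all names are illustrative); the task id is what lets you merge and correlate the per-node logs afterwards:

    import java.util.logging.Logger;

    public class StepTimer {
        private static final Logger LOG = Logger.getLogger("timing");

        // Illustrative helper: run a step and log its wall-clock time with a
        // task id, so per-node logs can later be merged and sorted by task.
        static void timed(String taskId, String step, Runnable work) {
            long t0 = System.nanoTime();
            work.run();
            long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
            LOG.info("task=" + taskId + " step=" + step + " elapsedMs=" + elapsedMs);
        }

        public static void main(String[] args) {
            timed("task-42", "compute", () -> { /* the work being measured */ });
        }
    }

With NTP-synchronized clocks (as noted above), the timestamps in the combined logs are directly comparable across nodes.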
I have used YourKit to monitor a number of processes at once. It can show you what is happening in each in real time and collect the results when all is finished.
I don't know if it provides a combined view of what is happening.
I was looking for something similar and found Hyperic.
The claim is that the tool can monitor most common applications and systems, gather all the information and present it in a convenient fashion.
To be honest this is still on my todo list, so I can't say whether it will do the job. Anyway, it seems impressive.
I have an established software product that uses HSQLDB as its internal settings database. Customer projects are stored in this database. Over the years, HSQLDB has served us reasonably well, but it has some stability/corruption issues that we've had to code circles around, and even then, we can't seem to protect ourselves from them completely.
I'm considering changing internal databases. Doing this would be fairly painful from a development perspective, but corrupted databases (and lost data) are not fun to explain to customers.
So my question is: Does anyone have enough experience to weigh in on the long-term stability of Apache Derby? I found a post via Google complaining that Derby was unstable, but it was from 2006 so I'd entertain the idea that it has been improved in the last 4 years. Or, is there another pure Java embedded (in-process) database that I could use (commercial or open-source). Performance isn't very important to me. Stability is king. Data integrity across power loss, good BLOB support, and hot-backups are all a must.
Please don't suggest something that isn't a SQL-based relational database. I'm trying to retrofit an existing product, not start from scratch, thanks.
For each database engine there is a certain risk of corruption. I am the main author of the H2 database, and I have also received reports about broken databases. Testing can reduce the probability of bugs, but unfortunately it's almost impossible to guarantee that software is 'bug free'.
As for the three Java databases HSQLDB, Apache Derby and H2, I can't really say which one is the most stable; I can only speak about H2. I think for most operations H2 is now stable. There are many test cases that specifically test for databases getting corrupted, including automated power-loss tests (using a Christmas light timer). With the power failure tests I found that stability also depends on the file system: sometimes I got 'CRC error' messages meaning the operating system couldn't read the file (this was on Windows). In that case, there is not much you can do.
For mission-critical data I wouldn't rely on the software being stable in any case. It's very important to create backups regularly, and to test them. Some databases have multiple ways to create backups. H2, for example, has an online backup feature and a feature to write out a SQL script file. An alternative is to use replication or clustering. H2 supports a simple cluster mode; I believe Derby supports replication.
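For illustration, both H2 backup options can be driven over plain JDBC; a minimal sketch, with the database and file paths as placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class H2Backup {
        public static void main(String[] args) throws Exception {
            // Embedded H2 database; the path is a placeholder.
            try (Connection con = DriverManager.getConnection("jdbc:h2:./data/settings");
                 Statement st = con.createStatement()) {
                st.execute("BACKUP TO 'settings-backup.zip'"); // online binary backup
                st.execute("SCRIPT TO 'settings-dump.sql'");   // portable SQL dump
            }
        }
    }

Schedule something like this from within the application and restore-test the output periodically; a backup you have never restored is not really a backup.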
I ran Derby 24/7 as the internal database supporting a build automation and test management system for 4 years. It was used by a worldwide team, and never crashed, lost data, or corrupted my records. The only reason we stopped using it is because our company was bought by another and a higher-level decision was handed down. Derby is solid, reliable, and well worth your consideration.
This search shows 215 posts in HSQLDB Users mailing list containing the string "corrupt".
http://search.gmane.org/?query=corrupt&author=&group=gmane.comp.java.hsqldb.user&sort=date&DEFAULTOP=and&xP=Zcorrupt&xFILTERS=Gcomp.java.hsqldb.user---A
This search shows 264 posts in Derby Users mailing list containing the same string.
http://search.gmane.org/?query=corrupt&author=&group=gmane.comp.apache.db.derby.user&sort=date&DEFAULTOP=and&xP=Zcorrupt&xFILTERS=Gcomp.apache.db.derby.user---A
This one shows 1003 posts in Derby Dev mailing list with the same string
http://search.gmane.org/?query=corrupt&author=&group=gmane.comp.apache.db.derby.devel&sort=date&DEFAULTOP=and&xP=Zcorrupt&xFILTERS=Gcomp.apache.db.derby.devel---A
A look at some of the posts shows possible or real cases of database corruption happening despite the best efforts of the database developers.
HSQLDB has had its own share of database corruption issues but has improved over the years. In the latest versions precautions and fixes have been introduced to prevent all the issues that were reported in the last few years.
The new LOB storage feature, however, turned out to have a logic bug that results in the LOBs being "forgotten" after an update. This is being fixed right now, with more extensive tests to support the fix.
Users like CarlG have helped a lot over the years in the bug fixing efforts of both Derby and HSQLDB.
Fred Toussi, HSQLDB Project
Does anyone have enough experience to weigh in on the long-term stability of Apache Derby? (...)
Derby, ex IBM Cloudscape (and now also distributed by Sun as JavaDB), is an ACID-compliant database that can stand a lot of concurrent users, running embedded or in server mode, and is known to be robust and production-ready. It is not as fast as HSQLDB (Derby uses durable operations), but it's robust. Still, you should run your own tests against it.
See also
François Orsini's blog
I have been using Apache Derby since 2009 in many of my projects, some of them with 24/7 operation and many millions of rows.
Never ever had a single event of data corruption. Rock solid and fast.
I keep choosing it as my RDBMS of choice, unless a good reason not to pops out.
Try looking into H2. It was created by the guy who originally made HSQLDB, but built from scratch, so it doesn't use any HSQLDB code. I'm not sure how its stability compares to HSQLDB's, since I haven't used HSQLDB in ages and I'm only using H2 for short-lived databases at the moment. I personally found H2 easier to get going than Derby, but maybe that's because H2 has a cheat sheet web page.
It might be possible to re-code to use an abstraction layer and then run tests comparing H2 and Derby against the issues you have found.
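Since both engines speak plain JDBC, such an abstraction can be as thin as choosing a driver and URL; a hypothetical sketch with placeholder paths:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class EmbeddedDbFactory {
        // Hypothetical switch point: both engines are reached through the same
        // JDBC API, so only the URL (and the driver jar) differs.
        static Connection open(String engine) throws SQLException {
            if ("h2".equals(engine)) {
                return DriverManager.getConnection("jdbc:h2:./data/settings");
            }
            return DriverManager.getConnection("jdbc:derby:data/settings;create=true");
        }
    }

With that in place, the same corruption and power-loss tests can be run against both engines before committing to either.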
On the project management side of the fence, does your roadmap have a major version coming up? That might be an appropriate time to rip out the guts this way, and I wouldn't say you were crazy, because it could potentially remove lots of hard-to-manage workarounds. If you wanted to make the change where it could affect live systems, without plenty of warning and backups in place, then you may be crazy.
With regard to HSQLDB, one thing it lacks as a project, and that SQLite has, is online documentation of a robust test suite and of assiduous ACID compliance.
I don't mean to take anything away from HSQLDB. It's meant to serve as an alternative to MySQL, not to fopen() as SQLite is intended. One could say that the scope of HSQLDB (of all the Java RDBMSs, really) is much more ambitious. Fredt and his group have accomplished an extraordinary achievement with HSQLDB. Even so, doing the Google search "Is HSQLDB ACID compliant" doesn't leave an early adopter feeling as confident as one feels after reading about the test harnesses on the SQLite website.
At http://sqlite.org/transactional.html
"SQLite is Transactional
A transactional database is one in which all changes and queries appear to be Atomic, Consistent, Isolated, and Durable (ACID). SQLite implements serializable transactions that are atomic, consistent, isolated, and durable, even if the transaction is interrupted by a program crash, an operating system crash, or a power failure to the computer.
We here restate and amplify the previous sentence for emphasis: All changes within a single transaction in SQLite either occur completely or not at all, even if the act of writing the change out to the disk is interrupted by
a program crash,
an operating system crash, or
a power failure.
The claim of the previous paragraph is extensively checked in the SQLite regression test suite using a special test harness that simulates the effects on a database file of operating system crashes and power failures."
At http://sqlite.org/testing.html
"1.0 Introduction
The reliability and robustness of SQLite is achieved in part by thorough and careful testing.
As of version 3.7.14, the SQLite library consists of approximately 81.3 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in other words, lines of code excluding blank lines and comments.) By comparison, the project has 1124 times as much test code and test scripts - 91421.1 KSLOC.
1.1 Executive Summary
Three independently developed test harnesses
100% branch test coverage in an as-deployed configuration
Millions and millions of test cases
Out-of-memory tests
I/O error tests
Crash and power loss tests
Fuzz tests
Boundary value tests
Disabled optimization tests
Regression tests
Malformed database tests
Extensive use of assert() and run-time checks
Valgrind analysis
Signed-integer overflow checks"
Give SQLite a try if you're looking for something self-contained (no server involved). It is what backs Android's database API, and it is highly stable.
I have an open source Java application that uses Hibernate and HSQLDB for persistence. In all my toy tests, things run fast and everything is good. I have a client who has been running the software for several months continuously and their database has grown significantly over that time, and performance has dropped gradually. It finally occurred to me that the database could be the problem. As far as I can tell from log statements, all of the computation in the server happens quickly, so this is consistent with the hypothesis that the DB might be at fault.
I know how to do normal profiling of a program to figure out where hot spots are and what is taking up significant amounts of time. But all the profilers I know of monitor execution time within the program and don't give you any help about calls to external resources. What tools do people use to profile programs that are using external db calls to find out where to optimize performance?
A little blind searching around has already found a few hot spots. I noticed a call where I was enumerating all the objects of a particular class just to find out whether there were any; a one-line change to the criteria [.setMaxResults(1)] changed that call from half a second to virtually instantaneous. I also see places where I ask the same question of the db many times within a single transaction. I haven't figured out how to cache the answer yet, but what I really want is a tool to help me look for these kinds of things more systematically.
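For illustration, that one-line change looks roughly like this (classic Hibernate Criteria API; the helper wrapper is illustrative):

    import org.hibernate.Criteria;
    import org.hibernate.Session;

    public class ExistenceCheck {
        // Sketch of the one-line fix described above. entityClass stands in
        // for whatever mapped class is being checked for existence.
        static boolean anyExist(Session session, Class<?> entityClass) {
            Criteria crit = session.createCriteria(entityClass);
            crit.setMaxResults(1);           // fetch at most one row, not all of them
            return !crit.list().isEmpty();   // returns after the first row
        }
    }
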
Unfortunately, as far as I know, there is no tool for that.
But there are some things you might want to check:
Are you using eager loading instead of lazy loading? From the description of your problem, it really looks like you are not using lazy loading...
Have you turned on and properly configured your second-level cache, including the query cache? Hibernate's caching mechanism is extremely powerful and flexible; see the sketch after this list.
Have you considered using Hibernate Search? Depending on your queries, a Hibernate Search full-text index on top of Apache Lucene may speed them up (since its indexing system is so powerful).
How much data are you storing in HSQLDB? I don't think it performs well when managing large data sets, since it just stores everything in files...
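As a minimal sketch of the cache setup mentioned above (the property names are the standard Hibernate ones; everything else, including the assumption of an existing hibernate.cfg.xml, is illustrative):

    import org.hibernate.SessionFactory;
    import org.hibernate.cfg.Configuration;

    public class CachingConfig {
        static SessionFactory build() {
            Configuration cfg = new Configuration().configure(); // reads hibernate.cfg.xml
            cfg.setProperty("hibernate.cache.use_second_level_cache", "true");
            cfg.setProperty("hibernate.cache.use_query_cache", "true");
            // A cache provider (EhCache, Infinispan, ...) must also be configured,
            // and each query must opt in with query.setCacheable(true).
            return cfg.buildSessionFactory();
        }
    }
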
There was once a tool called IronGrid/IronEye/IronTrackSql that did exactly what you are looking for. Unfortunately, they went out of business. They did open source their product at the last minute, but I have not been able to find source or a binary for quite some time.
I have been using YourKit for profiling lately, partly because you can have it profile SQL time to find your most-called statements and longest-running statements. It is not as detailed as IronGrid was, but it does give you valuable information. In my latest database/Hibernate tuning session, the problem turned out to be Hibernate and how and when it was doing eager vs. lazy loading; the fix was adding some judicious overrides of the defaults when selecting large numbers of items.
Lots to report on here. I have some results, and am still looking for good answers.
I've found a couple of tools that help:
VisualVM (with BTrace, or the built-in Trace) claims to help with tracing, but I haven't been able to find any tool that shows timings on method calls.
YourKit is reputed to be useful; I've asked for an open source license.
The most useful thing I found is Hibernate's built-in statistics. If you set
hibernate.generate_statistics to true in your properties, you can call sessionFactory.getStatistics() and see detailed statistics on what objects have been stored and retrieved and what effect the caches are having. I found one of the answers I wanted in the query statistics, which report, for each compiled query, the cache hits and misses, the number of times the query has run, how many rows were returned, and the average, max and min execution times. These timings made it abundantly clear where the time was going.
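A minimal sketch of reading those statistics (the property and API names are the standard Hibernate ones; the dump format is illustrative):

    import org.hibernate.SessionFactory;
    import org.hibernate.stat.QueryStatistics;
    import org.hibernate.stat.Statistics;

    public class StatsDump {
        // Requires hibernate.generate_statistics=true in the properties.
        static void dump(SessionFactory sessionFactory) {
            Statistics stats = sessionFactory.getStatistics();
            for (String query : stats.getQueries()) {
                QueryStatistics qs = stats.getQueryStatistics(query);
                System.out.println(query
                        + " count=" + qs.getExecutionCount()
                        + " rows=" + qs.getExecutionRowCount()
                        + " avgMs=" + qs.getExecutionAvgTime()
                        + " cacheHits=" + qs.getCacheHitCount()
                        + " cacheMisses=" + qs.getCacheMissCount());
            }
        }
    }
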
I then did some reading on caching. Razenha's suggestion was right on. [I'll mark his answer as right for now.] I added hibernate.cache.use_query_cache true to my properties, and added query.setCacheable(true); to most of my queries. I also added <cache usage="read-write"/> to a few of my .hbm.xml files. Now most of my statistics are showing a vast predominance of cache hits, and the performance is vastly better.
I'd still like some tools to help me trace execution timing so I can attack the worst problems rather than the most obvious, but this is a big help. Maybe one of the tracing tools above will turn out to help.
In Terracotta 3.1 you can monitor all of those statistics in real time using the Terracotta Developer Console. You can see historical graphs for cache statistics, and view the Hibernate statistics or cache statistics cluster-wide or on a per-node basis.
Terracotta is open source. More details and download is at Terracotta for Hibernate.