I've been told by my company's support team that some versions of java have a significant performance impact when we turn on -verbose:gc. However I can't figure out if this is the case or not.
Was this logging slow(ish) at some point, and when did it stop?
The reason I ask is that there's some hesitation about applying this to a production environment to investigate potential memory leaks (and whether we can stop doing periodic restarts of the system...).
Specifically, I'm talking about Java 1.4.2, which I think introduced the argument, and I'd like to know up to which update release (if any) the slowdown applies.
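For reference, the kind of logging we'd be enabling is roughly the following (gc.log is just an example file name, and I'm assuming the -XX flags are available in our 1.4.2 build):

java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log ...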
I know you asked about the impact of verbose:gc (Amir is correct), but based on the comments I see you are investigating a memory leak.
Is it possible for you to get a histogram of your environment? Verbose GC will only show you that there is a memory leak, not where the memory is sitting.
You mention Java 1.4.2; is that your current version? If you are using 1.5 or higher you can use
jmap -histo <pid> > file.txt
This will give you a breakdown of all the objects in memory. It will freeze your JVM for a time that depends on the amount of memory in the system (2 GB can freeze it for a minute or so, even on good hardware), so test this on a development system first. I know you don't want to impact your production environment, but this is a necessary evil to find the source of the problem. Do a capture right before the periodic restart to lessen your impact.
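If the production JVM turns out to be Java 6 or later (an assumption on my part), you can also ask for live objects only, which forces a full GC first so the histogram excludes unreachable garbage:

jmap -histo:live <pid> > file.txt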
I suggest that you do the following:
Write some benchmark that is likely to stress the garbage collection (create large linked data structures with weak references, etc.); see the sketch at the end of this answer.
Install a copy of the same version of the JVM as you are using in production on some test box.
Run the benchmark with various GC logging settings, including the settings that you want to run in production, measuring the performance impact on the benchmark.
If you do this right, it will give you some solid evidence about what the likely performance impact will be for your production server.
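As a rough illustration of the first step, here is a minimal sketch of the kind of benchmark I mean. The class name and sizes are arbitrary, and it deliberately sticks to 1.4-era APIs (no generics) so it runs on the same JVM version as production:

import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class GcStressBenchmark {

    // A small linked node so the collector has to chase pointers.
    static final class Node {
        final byte[] payload = new byte[1024];
        Node next;
    }

    public static void main(String[] args) {
        List refs = new ArrayList(); // holds WeakReferences to the list heads
        long start = System.currentTimeMillis();
        for (int i = 0; i < 200; i++) {
            // Build a linked list of ~10,000 nodes (roughly 10 MB), then keep only a
            // weak reference, so the whole structure becomes garbage on the next collection.
            Node head = null;
            for (int j = 0; j < 10000; j++) {
                Node n = new Node();
                n.next = head;
                head = n;
            }
            refs.add(new WeakReference(head));
        }
        int live = 0;
        for (int i = 0; i < refs.size(); i++) {
            if (((WeakReference) refs.get(i)).get() != null) {
                live++;
            }
        }
        System.out.println("Elapsed ms: " + (System.currentTimeMillis() - start)
                + ", weak refs still live: " + live);
    }
}

Run it once with no GC logging and once with the logging settings you intend to use in production, and compare the elapsed times.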
I had an old application, a JAR file, that went through some enhancements. Basically some parts of the code had to be modified along with modifying some of the logic.
Comparing the OLD version against the NEW version, the NEW version is about 2X slower than the old one.
I'm trying to narrow down what's causing the slowdown, but I'm finding myself measuring the time for certain for-loops using System.out.println with System.currentTimeMillis(). This is really getting very tedious.
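For context, the sort of ad-hoc measurement I keep sprinkling around looks roughly like this (records and process() are just placeholders for my real code):

long start = System.currentTimeMillis();
for (int i = 0; i < records.size(); i++) {
    process(records.get(i)); // the loop body I suspect is slow
}
System.out.println("loop took " + (System.currentTimeMillis() - start) + " ms");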
Is there a Java performance tool that will help me in figuring out why the NEW JAR is about 2X slower than the old one?
Thanks in advance.
JProfiler has the capability to compare CPU snapshots. Record the execution for the old and the new JAR file and save snapshots (if the JVM exits at the end, configure a "JVM exit" trigger that saves a snapshot).
Then open the snapshot comparison window with "Session->Compare Snapshots in New Window" and add the two snapshots. A hot spots comparison will look like this (a view filter is set in this case):
It will immediately show you which methods are responsible for the increase in execution time.
Another way to analyze the differences in execution time is the call tree comparison which will look like this:
Disclaimer: My company develops JProfiler.
You should use a profiler. This will show you which methods are taking the most time (and what is calling them), without you having to guess which ones to measure.
Java comes with a built-in profiler called hprof, but see also:
https://stackoverflow.com/questions/14762/please-recommend-a-java-profiler
5 things you didn't know about ... Java performance monitoring
The JConsole and VisualVM tools
Depending on how long-running the process is, I'd think about Visual VM 1.3.3. If you download all the plugins, you'll be able to see heap, threads, objects, etc. That ought to help, and it won't cost a dime.
I believe it assumes the Oracle/Sun JVM.
A profiler like YourKit, or something to measure performance reliably like Hyperic's Sigar, is a good candidate for your case. Have a look at those tools.
The former will find bottlenecks in your code and/or memory leaks (not all of them), whereas the latter is an API with which you can measure performance reliably, since Oracle's JVM & OpenJDK have no way of getting performance metrics reliably/consistently/accurately (like CPU wall-clock time or CPU time spent by the application, memory usage, application threads, etc.).
By default, Java provides packages for these things.
For example:
java.lang.management.ManagementFactory
java.lang.management.ThreadMXBean
but depending on your case they may or may not be adequate (keep in mind they are OK for most cases unless we are talking about something critical).
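For example, here is a minimal sketch of reading per-thread CPU time through those classes (it assumes the JVM supports thread CPU time measurement, which most HotSpot builds do):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCpuTimes {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported()) {
            System.out.println("Thread CPU time not supported on this JVM");
            return;
        }
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if the thread has already died
            if (cpuNanos >= 0) {
                System.out.println("thread " + id + ": " + (cpuNanos / 1000000) + " ms CPU");
            }
        }
    }
}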
I'm new to Java profiling, and I'd like to ask about it.
Does it make sense to profile an application on the server, where I only have the console?
Is there any console profiler that makes sense?
Or should I profile the application only on localhost?
VisualVM is able to connect to a remote host. The profiling works the same way as it does locally. It has been part of the JDK since JDK 6 update 7.
When you do profiling you should generally try to reproduce the production environment as closely as possible. Differences in hardware (# of cores, memory, etc) and software (OS, JVM version) can make your profiling results as unique as the runtime environment.
For example, what looks like a CPU bottleneck worth optimizing on your local machine might disappear entirely or turn into a disk bottleneck on your production server depending on the differences in the CPU.
All modern profilers will allow you to attach to a remotely running JVM, so you don't need to worry about only having console access.
What profiler you decide to use will depend on your needs and preferences. Certain profilers will show you "hotspots" where your code is spending the majority of the time and these are often good candidates for optimization.
I prefer to use JProfiler for its extensive features and good performance. I previously used YourKit but switched to JProfiler for its memory and thread profiling features.
"Does it make sense to profile an application on the server, where I only have the console?"
Fortunately, that does not matter, since Java has always (well, for a long time, anyway) supported remote profiling, i.e. the Profiler can run on a different machine than the JVM being profiled and get its data via the network.
All Java profilers that I've ever seen support this, including visualvm, which comes with recent JDKs (in the bin directory).
There's a "quick and dirty" but effective way to find performance problems in Java.
This is a language-agnostic explanation of why it works.
Note that the JDK comes with a built-in profiler, HPROF. HPROF is a bit simplistic, but will find many problems. It is activated simply by invoking the JVM with parameter -agentlib:hprof; it will then automatically run as long as your JVM is running. It collects data until the JVM terminates, then dumps it into a file on the server, which you can analyze.
See e.g. http://java.sun.com/developer/technicalArticles/Programming/HPROF.html for a good introduction. A nice graphical analyzer for HPROF's results is PerfAnal: http://java.sun.com/developer/technicalArticles/Programming/perfanal/
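For example, CPU sampling with HPROF can be turned on like this (MyApp is a placeholder for your main class, and the option values are just reasonable defaults):

java -agentlib:hprof=cpu=samples,interval=10,depth=10 MyApp

By default the results end up in java.hprof.txt in the working directory when the VM exits.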
We have a low latency trading system (feed handlers, analytics, order entry) written in Java. It uses TCP and UDP extensively, it does not use Infiniband or other non-standard networking.
Can anyone comment on the tradeoffs of various OSes or OS configurations to deploy this system? While throughput is obviously important to keep up with modern price feeds, latency is our #1 priority.
Solaris seems like a natural candidate since they created Java; should I use Sparc or x64 processors?
I've heard good things about RHEL and SLERT; are those the right versions of Linux to use in our benchmarking?
Has anyone tested Windows against the above OSes? Or is it assumed to not keep up?
I'd like to leave the Java vs C++ debate for a different thread.
Vendors love this kind of benchmark. You have code, right?
IBM, Sun/Oracle, HP will all love to run your app on their gear to demonstrate their advantages.
Make them do this. If you have code, make the vendors run a demonstration on their gear to show which is best for your needs.
It's easy, painless, free, and factual. The final decision will be easy and obvious. And you will know how to install and tune to maximize performance.
What I hate doing is predicting this kind of thing before the code is written. Too many customers have asked for a H/W and OS recommendation before we've finished identifying all the use cases. Asking for that kind of precognition is simple craziness.
But you have code. You can produce test cases that exercise your code. That's perfect.
For a trading environment, in addition to low latency you are probably concerned about consistency as well, so focusing on reducing the impact of GC pauses as much as possible may well give you more benefit than different OS choices.
The G1 garbage collector in recent versions of Sun's HotSpot VM improves stop-the-world pauses a lot, in a similar way to the JRockit VM.
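(At the time of writing G1 is still experimental, so, assuming a recent Java 6 update, it has to be enabled explicitly:)

-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC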
For real performance guarantees, though, Azul Systems' version of the HotSpot compiler on their Java appliance delivers the lowest guaranteed pauses available; it also scales to a massive size: hundreds of GB of heap and hundreds of cores.
I'd discount Real-Time Java: although you'd get guarantees of response, you'd sacrifice throughput to get those guarantees.
However, if you're planning on using your trading system in an environment where every microsecond counts, you're really going to have to live with the lack of consistency you will get from the current generation of VMs; none of them (except real-time) guarantees low-microsecond GC pauses. Of course, at this level you're going to run into the same issues from OS activity (process pre-emption, interrupt handling, page faults, etc.). In this case one of the real-time variants of Linux is going to help you.
I wouldn't rule out Windows just because it's Windows. My experience over the last few years has been that the Windows versions of the Sun JVM were usually the most mature performance-wise, in contrast to Linux or Solaris x86 on the same hardware. The JVM for Solaris SPARC may be good too, but I guess with Windows on x86 you'll get more power for less money.
I would strongly recommend that you look into an operating system you already have experience with. Solaris, for example, is a strange beast if you only know Linux.
I would also strongly recommend using a platform actually supported by Sun, as this will make it much easier to get professional assistance when you REALLY, REALLY need it.
http://java.sun.com/javase/6/webnotes/install/system-configurations.html
I'd probably worry about garbage collection causing latency well before the operating system; have you looked into tuning that at all?
If I were willing to spend the time to trial different OSs, I'd try Solaris 10 and NetBSD, and probably a Linux variant for good measure.
I'd experiment with 32- vs 64-bit architectures; 64-bit will give you a larger heap address space, but object references are larger, so you pay for it in memory footprint and cache efficiency.
I'm assuming you've profiled your application and know where the bottlenecks are; by the comment about GC, you've done that. In that case, your application shouldn't be CPU-bound, and chip architecture shouldn't be a primary concern.
I don't think managed code environments and real-time processing go together very well. If you really care about latency, remove the layer imposed by the managed code. This is not a Java vs C++ argument, but a Java/C#/... vs C/C++/FORTRAN/... argument, and I believe that is a valid design discussion to have.
And yes, I do mean FORTRAN, we run a number of near real-time systems with a FORTRAN foundation.
One way to manage latency is to have several JVMs dividing the work, each with a smaller heap, so that a stop-the-world garbage collection isn't as time-consuming when it happens and affects fewer processes.
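For example (hypothetical heap sizes and jar name), each worker JVM can be pinned to a small, fixed heap:

java -Xms512m -Xmx512m -jar worker.jar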
Another approach is to load up a cluster of JVMs with enough memory and allocate the processes to ensure there won't be a stop-the-world garbage collection during the hours you care about latency (if this isn't a 24/7 app), and restart the JVMs during off hours.
You should also look at other JVM implementations as a possibility (such as JRockit). Of course, whether any of them is appropriate depends entirely on your specific application.
If any of the above matters to your approach, it will affect the choice of OS. For example, if you go with another JVM implementation, that might limit OS choices, and if you go with clustering or otherwise running several JVMs for the application, that might require some better underlying OS tools to manage effectively, further influencing the OS choice.
The choice of operating system or configuration matters far less than the availability of faster network fabrics.
Look at 10GigE with TOE NICs, or the faster option of 4X QDR (40 Gb/s) InfiniBand, but with IPoIB presenting a standard Ethernet interface and routing.
Has anyone used the new Java 1.6 JDK tool, VisualVM, to profile a production application and how does the application perform while being profiled?
The documentation says that it is designed for both production and development use, but based on previous profiling experience with other profiling tools, I am hesitant.
While I haven't personally used VisualVM, I saw this blog post just today that might have some useful information for you. He talks about profiling a production app using it.
I tried it on a dev box and found that when I turned off profiling it would shut Tomcat down unexpectedly. I'd be very cautious about rolling this out to production; can you simulate load in a staging environment instead? It's not as good as the real thing, but it probably won't get you fired if it goes wrong...
I've used VisualVM before to profile something running locally. A big win was that I could just start it up and connect it to the running JVM. It's easier to use than other profiling tools I've tried, and it didn't seem to have as much overhead.
I think it does sampling. The overhead on a CPU intensive application didn't seem significant. I didn't measure anything (I was interested in how my app performed, not how the tool performed), but it definitely didn't have the factor of 10 slowdown I'm used to seeing from profiling.
For just monitoring your application, running VisualVM remotely should not slow it down much. If the system is not on the edge of collapsing, I still haven't seen any problems. It's basically just reading out information from the coarse-grained built-in instrumentation of the JVM. If you start profiling, however, you'll have the same issues as with other profilers, basically because they all work almost the same way, often using the support in the JVM.
Many people have problems with running VisualVM remotely, due to firewall issues, but you can even run Visual VM remotely over ssh, with some system properties set.
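For reference, the usual system properties for an unauthenticated JMX connection look roughly like this (the port number is arbitrary; don't expose it outside a trusted network, or tunnel it over ssh as mentioned):

-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=3333 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false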
It is possible to remote connect to your server from a different computer using VisualVM. You just need to right click on the "Remote" node and say "Add Remote Host."
This would at least eliminate the VisualVM overhead (if there is any) from impacting performance while it is running.
This may not eliminate all performance concerns, especially in Production environments, but it will help a little.
I've used the NetBeans profiler, which uses the same underpinnings as VisualVM.
I was working with an older version of WebLogic, which meant using the 1.5 JVM, so I couldn't do a dynamic attach. The application I was profiling had several thousand classes, and my workstation was pretty much unusable while the profiler instrumented them all. Once instrumentation was complete, the system was sluggish but not completely unusable. The amount of slowdown really depends on what you need to capture. The basic CPU metrics are pretty lightweight. Profiling memory allocation slows things down a lot.
I would not use it on a production system. Aside from the potential for slowdown, I eventually ran out of PermGen space because the profiler reinstruments and reloads classes when you change settings. (This may be fixed in the 1.6 agent, I don't know)
I've been using VisualVM a lot since before it was included in the JDK. It has a negligible impact on the performance of the system. I've never noticed it cause a performance problem, but then again, our Java server had enough headroom at the time to support a little extra load. If your server is running at a level that is completely maxed out and can't handle VisualVM running, then I would say it's more likely that you need to buy another server. Any production server should have some memory headroom, otherwise what you have is a disaster just waiting to happen.
I have used VVM (VavaVoom?) quite extensively; it works like a charm in the light mode, i.e. no profiling, just getting the basic data from the VM. But once you start profiling and there are many classes, there is considerable slowdown. I wouldn't profile in a production environment even if you had a 128-core board with 2 TB of memory, purely because the reloading and re-defining of classes is tricky, and the server classloaders are another matter; they vary from one server implementation to another, and interfering with them in production is not a very good idea.
I am calling a vendor's Java API, and on some servers it appears that the JVM goes into a low priority polling loop after logging into the API (CPU at 100% usage). The same app on other servers does not exhibit this behavior. This happens on WebSphere and Tomcat. The environment is tricky to set up so it is difficult to try to do something like profiling within Eclipse.
Is there a way to profile (or some other method of inspecting) an existing Java app running in Tomcat to find out what methods are being executed while it's in this spinwait kind of state? The app is only executing one method when it gets in this state (vendor's method). Vendor can't replicate the behavior (of course).
Update:
Using JConsole I was able to determine who was running and what they were doing. It took me a few hours to then figure out why it was doing it. The problem ended up being that the vendor's API jar that was being used did not exactly match the database configuration it was using. It was defaulting to having tracing and performance monitoring enabled on the servers that had the slight mismatch in configuration. I used a different jar and all is well.
So thanks, Joshua, for your answer. JConsole was extremely easy to setup and use to monitor an existing application.
@Cringe - I did some experimenting with some of the options you suggested. I had some problems with getting JProfiler set up; it looks good (but pricey). Going forward I went ahead and added the Eclipse Profiler plugin and I'll be looking over the different open source profilers to compare functionality.
If you are using Java 5 or later, you can connect to your application using jconsole to view all running threads. jstack also will do a stack dump. I think this should still work even inside a container like Tomcat.
Both of these tools are included with JDK5 and later (I assume the process needs to be at least Java 5, though I could be wrong)
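For example, assuming the JDK's bin directory is on your path and <pid> is the Tomcat process id, a stack dump from the command line looks like:

jstack <pid> > threads.txt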
Update:
It's also worth noting that starting with JDK 1.6 update 7 there is now a bundled profiler called VisualVM which can be launched with 'jvisualvm'. It looks like it is a java.net project, so additional info may be available at that page. I haven't used this yet but it looks useful for more serious analysis.
Hope that helps
Facing the same problem, I used the YourKit profiler. Its loader doesn't activate unless you actually connect to it (though it does open a port to listen for connections). The profiler itself has a nice "amount of time spent in each method" view while working in its less obtrusive mode.
Another way is to detect CPU load (via JNI, so you'd need an external library for this) in a "watchdog" thread with highest priority and start logging all threads when the CPU is high enough for a long enough time. You might find this article enlightening.
If it's for professional purpose and you have some money to spend, try to get your hands on JProfiler. If you just want to get some insights, try out the Eclipse Profiler Plugin. I used it several times, but I don't know the current state.
A new(?) project from the Eclipse project itself is available too: http://www.eclipse.org/tptp/ (see this article). Never used it, so I can't tell if it is worth the effort.
There's also a very good list of open source profilers available at http://www.manageability.org/blog/stuff/open-source-profilers-for-java
If JConsole can't be used you can
press CTRL+BREAK under Windows
send kill -3 <process id> under Linux
to get a full Thread Dump. This doesn't affect performance and can always be run in production.
JRockit Mission Control Latency Analyzer.
The Latency Analyzer that comes with JRockit shows you what the JVM is "doing" when it's not doing anything. In the latest version you can see latencies for:
Java wait/blocked/sleep/parked.
File I/O
Network I/O
Memory allocation
GC pauses
JVM latencies, e.g. code generation and class loading
Thread suspension
The tool will give you the stack trace when the latency occurred. You can view the latency data in many different ways (aggregated traces, as a histogram, in a thread graph etc.). The tool also allows you to see transitions between threads, for instance when one thread notifies another.
(Screenshot: latency graph, http://blogs.oracle.com/hirt/WindowsLiveWriter/The.0LatencyAnalyserMigratedfromtheoldBE_7246/latency_graph_2.png)
The overhead is negligible and unlike many other tools it can be used in a production environment.
This blog post gives you a brief introduction and the program can be downloaded here.
It's free to use for development!
Use a profiler. Yes they cost money, and using them can occasionally be a bit awkward, but they do provide you with a great deal more real evidence rather than guesswork.
Human beings are universally bad at guessing where performance bottlenecks are. It just seems to be something our brains aren't built to do very well. It may seem obvious, and you may have great ideas about what the problem is, but the real world often turns out to be doing something different. Optimising the wrong part of the code means, at best, lots of work for minimal benefit. More often it makes things slower, and sometimes it breaks things entirely. So before you make any changes for the sake of optimisation, you should always have real evidence from a profiler or other accurate tool.
As mentioned, JProfiler and YourKit are both fairly good and not prohibitively expensive. Last time I looked, they both had free demos too.
For completeness' sake: even though my company more or less standardizes on Eclipse, we use NetBeans (6 and up) with its included, free profiler on a daily basis. It works better than the Eclipse TPTP plugin (last checked 3 months ago) and for us it removes any need for a commercial profiler such as JProfiler, which is excellent but fast becoming unnecessary.
VisualVM is essentially the NetBeans profiler as a standalone tool. I tried TPTP for Eclipse, but VisualVM seems like a much nicer option!