Server option for java.exe - java

What is the difference between the server and client HotSpot VMs? Is there any reason to switch a production environment to -server? Is there any performance boost? Please share your practical experience. This relates to Oracle UCM 10g.

Yes, there can be a huge performance boost in some cases. When benchmarking my Protocol Buffers implementation, I was comparing it against the Java implementation - and I was really pleased, until I switched on -server... and saw the Java performance double. I don't know the details of everything it does, but basically it lets the JIT work harder, as it expects the code to be running for longer.
I wouldn't expect that to be the case in every application of course, but it can make a big difference. Of course, it won't have much effect unless your application is already CPU-bound on the JVM. I have no experience with Oracle UCM, so couldn't say how much effect it will have on your specific use. Have you already performed appropriate analysis of where the bottleneck in your system is?

The server VM collects stats for a longer time than the client VM before converting Java bytecode to native code. A bit more here: http://java.sun.com/j2se/1.3/docs/guide/performance/hotspot.html#server
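Before comparing numbers, it helps to confirm which VM the process actually got. A minimal sketch, assuming nothing beyond the standard `java.vm.name` and `java.vm.version` system properties (the class name `VmInfo` is made up for illustration, and the exact strings printed vary by vendor and version):

```java
// Prints the name and version of the VM this process is running on,
// which on HotSpot typically indicates the client or server variant.
final class VmInfo {

    // Builds a one-line description from standard system properties.
    static String describe() {
        String name = System.getProperty("java.vm.name", "unknown");
        String version = System.getProperty("java.vm.version", "unknown");
        return name + " (" + version + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once as `java -server VmInfo` and once as `java -client VmInfo` (where the JVM build ships a client VM) should show whether the flag made any difference on your installation.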

Related

Is there a way to achieve JIT performance without JIT overhead?

Is there a way to achieve JIT performance while removing JIT overhead? Preferably by compiling a class file to a native image.
I have investigated GCJ, but even for a simple program, GCJ output's performance is much worse than Java JIT.
You could try Excelsior.
http://www.excelsior-usa.com/jet.html
I've had good experiences with this in the past (but it was a long time ago)
There have been in the past a number of "static" compilers for Java, but I don't know that any are currently available. To the best of my knowledge the last one in use was the "Java Transformer" for the IBM iSeries "Classic JVM", but that JVM was deprecated in favor of the J9 JVM.
The "Java Transformer" did quite well, but, as others have noted, it could not take advantage of all of the info that a JITC has available at runtime (though it did manage to take advantage of some of the runtime info).
(And it should be noted that "JITC overhead" is really minimal. Compilation occurs pretty quickly and efficiently in most cases. The problem is that compilation doesn't even start until the interpreter has run long enough to collect statistics and trigger the JITC.)
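If you want to see how much time the JIT compiler itself consumes, rather than guess, the standard `java.lang.management.CompilationMXBean` reports accumulated compilation time. A rough sketch, assuming a made-up workload (`work`) and a made-up class name (`JitTime`); some JVMs do not support this measurement, in which case the helper returns -1:

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

final class JitTime {

    // Hypothetical workload: just something hot enough to attract the JIT.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (i % 7) * (long) i;
        }
        return sum;
    }

    // Total time spent in JIT compilation so far, in milliseconds,
    // or -1 if this JVM has no JIT or does not monitor compilation time.
    static long jitMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        long before = jitMillis();
        long result = 0;
        for (int round = 0; round < 20000; round++) {
            result += work(1000);
        }
        long after = jitMillis();
        System.out.println("result = " + result);
        System.out.println("JIT time before: " + before + " ms, after: " + after + " ms");
    }
}
```

The difference between the two readings is only an approximation of the compile cost for this loop, since the JVM also compiles its own library code in the background.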
The simplest solution is often to warm up your code on startup. If you have a server-based application, the cost of startup isn't as important as the cost when the service is used. In this situation you can warm up all the critical code by calling it 10K - 20K times, which triggers compilation of that code.
This can take less than a second in simple cases so has very little impact on startup and means you are using compiled code when the service is used.
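A warm-up pass along those lines might look like the sketch below. `handle` is a hypothetical stand-in for whatever your critical request path is, and the 20,000 iteration count is simply the upper end of the range mentioned above; the real compile threshold depends on the JVM and its flags.

```java
final class Warmup {

    // Hypothetical stand-in for a critical request handler.
    static int handle(String request) {
        int hash = 17;
        for (int i = 0; i < request.length(); i++) {
            hash = 31 * hash + request.charAt(i);
        }
        return hash;
    }

    // Calls the critical path repeatedly so the JIT has a chance to
    // compile it before real traffic arrives. The accumulated value is
    // returned so the calls cannot be optimized away as dead code.
    static int warmUp(int iterations) {
        int sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += handle("warmup-" + i);
        }
        return sink;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int sink = warmUp(20000);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println("warm-up took " + elapsedMs + " ms (sink=" + sink + ")");
        // ... start accepting real requests here ...
    }
}
```

Warm-up is only as good as its inputs: if the warm-up requests exercise different branches than production traffic does, the JIT may optimize for the wrong case and have to recompile later.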
If you have a client based application you usually have a lot of processing power for just one user in which case the cost of the background JIT is less important.
The moral of the story: check that you have a problem to solve before diving into a solution. Very often questions on Stack Overflow are about problems which either a) have already been solved or b) are not a significant problem in the first place.
Measuring the extent of your problem or performance is the best guide as to what matters and what doesn't. If you don't measure, you are just guessing. (Even if you have ten-plus years of experience performance tuning Java systems.)
I have just found my answer here:
Why is Java faster when using a JIT vs. compiling to machine code?
Quote from top answer:
This means that you cannot write a AOT compiler which covers ALL Java
programs as there is information available only at runtime about the
characteristics of the program.
I'd recommend you to find the root cause of inferior performance of your Java code before trying out AOT compilation or rewriting any portions in C++.
Head over to http://www.javaperformancetuning.com/ for tons of information and links.

high performance rtsp server

I want to implement a high-performance RTSP server to handle VOD requests. It only handles signaling requests; it does not need to stream the media files. I have completed a version written in Java, based on the MINA networking framework, and its performance does not seem very high.
As far as I know, high-performance SIP servers (e.g. VoIP servers) are written in C (e.g. OpenSIPS, Kamailio). Should I use C or C++ for my project to get a significant performance improvement?
BTW, I found an explanation by the author of OpenSER of why it is written in C:
"On the other hand, it is the garbage collector that can cause lots of troubles when developing SIP applications in Java. A heavily loaded server written in Java stops working when the garbage collector is cleaning the memory. The delay caused by the garbage collector can be even more than 10 seconds. Such delays are unacceptable"
Is that still true nowadays, and does it mean that I should use C too?
There are a huge number of variables here; language may not be the determining factor. Trustin Lee, the author of MINA, later created Netty, which offers very high performance indeed. Lee himself says that MINA has "relatively poor performance" because some of its more complex features are too tightly bound to the core. So you might look at Netty before completely rewriting everything.
If you're using Oracle's JVM, you're using an extremely optimized runtime system that identifies hotspots in the code (hence the name "HotSpot") and aggressively optimizes them at runtime. It's been a long time since you could say, ipso facto, that Java code would run more slowly than C code. Well-written, optimized C code probably out-performs equivalent Java code in certain select tasks, but a generalization from there is probably no longer appropriate, and of course your code has to take on several of the burdens that the JVM shoulders for you with Java. Also note that there are several things you can do to tune the JVM's garbage collector, for instance to prefer consistency and short pauses over footprint and long pauses.
Obviously C has several strengths (being close to the machine is sometimes exactly what you want), as does explicit memory management for certain tasks.
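Before tuning the collector (or blaming it), it is worth checking how much time it actually takes in your process. The sketch below uses the standard `GarbageCollectorMXBean` API to print collection counts and accumulated collection time; the class name `GcReport` and the throwaway allocation loop are made up for illustration. Collector names and numbers depend on the JVM and on which GC flags are in effect (for example a pause-time goal such as `-XX:MaxGCPauseMillis`, on collectors that honor it).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcReport {

    // One line per collector: name, number of collections, total time.
    static List<String> report() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms total");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Allocate some short-lived garbage so there is something to report.
        for (int i = 0; i < 200000; i++) {
            byte[] junk = new byte[1024];
            junk[0] = (byte) i;
        }
        for (String line : report()) {
            System.out.println(line);
        }
    }
}
```

These totals say nothing about the length of individual pauses; for that, GC logging on the JVM command line is the usual tool.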
Have you compared your rtsp server with Wowza?
Wowza is also written in Java. If your RTSP server performs worse than Wowza, I believe you can improve it without changing language. If Wowza performs about the same as your server, that suggests Java cannot satisfy your performance requirements, and you may want to consider C/C++ instead.
I built my own RtspServer in C# and have no problem streaming to hundreds of clients.
http://net7mma.codeplex.com/
Code Project article # http://www.codeproject.com/Articles/507218/Managed-Media-Aggregation-using-Rtsp-and-Rtp
You are more than welcome to adopt / reference the design! (Apache 2 License)

Does .Net have a different environment for client and server?

Recently I heard the statement below. Can someone please elaborate on it?
With client-side applications, Java has better performance than .Net. The reason is that the .Net environment on the server side (IIS?) is different from its client side, while Java uses the same environment at both ends. Since framework performance is optimized mainly on the server side, .Net on the client side is not as good as .Net on the server side or Java.
Update: I believe he also mentioned the difference between clients (XP, VISTA) and servers (Windows 2008 server) with respect to .Net
In client operating systems you get a concurrent garbage collector. It is slower in absolute time, but it appears to the user to be faster because they get shorter pauses.
In server operating systems you get a serial garbage collector. It is faster overall, but has to pause applications longer.
This is old information, I don't know if it is still true.
EDIT: Java also has a client and server modes. Unlike .NET this isn't tied to the OS, you instead pass it as a command line parameter.
Edit 2: From MSDN Magazine in Dec. 2000
On a multiprocessor system running the server version of the execution engine (MSCorSvr.dll), the managed heap is split into several sections, one per CPU. When a collection is initiated, the collector has one thread per CPU; all threads collect their own sections simultaneously. The workstation version of the execution engine (MSCorWks.dll) doesn't support this feature.
http://msdn.microsoft.com/en-us/magazine/bb985011.aspx
Again, this is old information and may have changed.
This makes absolutely no sense.
.NET is not a server side or a client side framework. There are pieces that you use on the server side or on the client side but it's all part of the same beast.
Aside from whether it's correct or not, most (99.9999%) of people who make an unqualified statement like Y performs better than X for some ambiguous and unmeasurable task are, as Carlin would say, embarrassingly full of s***.
The .NET CLR (Common Language Runtime) is the same on the server and on the client side. The .NET CLR works conceptually like the Java VM.
There are Client Profiles for .NET 3.5 and later that only provide a subset of the .NET API suitable for client apps, but this is just offered as a convenience to reduce the .NET footprint. Any supported OS can install the full .NET version.
I can only guess that the statement is a result of misunderstanding what a Client Profile is.
I've never been able to prove to myself that in general terms, Java is faster than .NET. I've run a few of my own benchmarks that indicate quite the opposite, but even then, I'm not willing to make such a blanket statement.
I can say that in pure code execution, .NET executed faster than Java on the same machine, at least the last time I bothered to test, about 2 years ago. Code written in C# incidentally executes a little faster than VB.NET because C# doesn't have all the type checking that VB.NET does.
The algorithm I used to test was basically a string parser that took a string which was an arithmetic expression, transformed it into reverse polish notation, then determined the answer (stuff taught in many schools). Even doing my best to optimize the code in Java, I could never get it as fast as even the VB.NET code. Differences were around 10% as I recall.
That said, I've not benchmarked GC or other aspects and never have been able to dig up good unbiased benchmarks that actually test either in a real-ish system. Usually you get someone trying to prove why their religion is better and they ignore any other view point. I'm sure there's some aspects of Java where they have better algorithms that will nullify the raw code execution speed.
In short, when people make statements like that, ask them to back it up. If they can't or rely on 'everyone knows it', don't bet the farm on their statements.

Why is the JVM slow to start?

What exactly makes the JVM (in particular, Sun's implementation) slow to get running compared to other runtimes like CPython? My impression was that it mainly has to do with a boatload of libraries getting loaded whether they're needed or not, but that seems like something that shouldn't take 10 years to fix.
Come to think of it, how does the JVM start time compare to the CLR on Windows? How about Mono's CLR?
UPDATE: I'm particularly concerned with the use case of small utilities chained together as is common in Unix. Is Java now suitable for this style? Whatever startup overhead Java incurs, does it add up for every Java process, or does the overhead only really manifest for the first process?
Here is what Wikipedia has to say on the issue (with some references).
It appears that most of the time is taken just loading data (classes) from disk (i.e. startup time is I/O bound).
Just to note some solutions:
There are two mechanisms that allow the JVM to start faster.
The first one is the class data sharing mechanism, which Sun's JVM has supported since Java 5 (only with the HotSpot Client VM, and only with the serial garbage collector, as far as I know).
To activate it, set the -Xshare:auto or -Xshare:on JVM option (IBM JVMs use -Xshareclasses instead).
To read more about the feature you may visit:
Class data sharing
The second mechanism is a Java Quick Starter. It allows to preload classes during OS startup, see:
Java Quick Starter for more details.
Running a trivial Java app with the 1.6 (Java 6) client JVM seems instantaneous on my machine. Sun has attempted to tune the client JVM for faster startup (and the client JVM is the default), so if you don't need lots of extra jar files, then startup should be speedy.
If you are using Sun's HotSpot for x86_64 (64-bit), note that the current implementation only works in server mode, which spends more effort on optimizing the code it compiles and is slower to start. The 32-bit version also supports client mode, which optimizes less aggressively but has faster start-up times.
See for instance:
http://en.wikipedia.org/wiki/64-bit#32_vs_64_bit
http://java.sun.com/docs/hotspot/HotSpotFAQ.html#64bit_compilers
That being said, at least on my machine (Linux x86_64 with 64bit kernel), the 32bit HotSpot version supports both client and server mode (via the -client and -server flags), but defaults to server mode, while the 64bit version only supports server mode.
It really depends on what you are doing during start-up. If you run a Hello World application, it takes 0.15 seconds on my machine.
However, Java is better suited to running as a client or a server/service, which means the startup time isn't as important as the connection time (about 0.025 ms) or the round-trip response time (<< 0.001 ms).
There are a number of reasons:
lots of jars to load
verification (making sure code doesn't do evil things)
JIT (just in time compilation) overhead
I'm not sure about the CLR, but I think it is often faster because it caches a native version of assemblies for next time (so it doesn't need to JIT). CPython starts faster because it is an interpreter, and IIRC, doesn't do JIT.
In addition to things already mentioned (loading classes, esp. from compressed JARs); running in interpreted mode before HotSpot compiles commonly-used bytecode; and HotSpot compilation overhead, there is also quite a bit of one-time initialization done by JDK classes themselves.
Many optimizations are done in favor of longer-running systems where startup speed is less of a concern.
And as to unix style pipelining: you certainly do NOT want to start and re-start JVM multiple times. That is not going to be efficient. Rather chaining of tools should happen within JVM. This can not be easily intermixed with non-Java Unix tools, except by starting such tools from within JVM.
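To make "chaining within the JVM" concrete, here is a rough sketch in which each stage is a function over a stream of lines, loosely mirroring `grep java | sort | uniq`. Everything here (the class name `InProcessPipeline` and the stage helpers) is hypothetical illustration, not an existing library, and it assumes Java 8 or later for streams and lambdas.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class InProcessPipeline {

    // Each stage plays the role of one small Unix tool.
    static Function<Stream<String>, Stream<String>> grep(String needle) {
        return lines -> lines.filter(line -> line.contains(needle));
    }

    static Function<Stream<String>, Stream<String>> sort() {
        return lines -> lines.sorted();
    }

    // distinct() removes all duplicates, not only adjacent ones; after a
    // sort stage the result is the same as what uniq would produce.
    static Function<Stream<String>, Stream<String>> uniq() {
        return lines -> lines.distinct();
    }

    // Composes the stages once and runs them inside a single JVM,
    // so the startup cost is paid only one time.
    static List<String> run(List<String> input) {
        return grep("java").andThen(sort()).andThen(uniq())
                .apply(input.stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "java -server", "python", "java -client", "java -server");
        for (String line : run(input)) {
            System.out.println(line);
        }
    }
}
```

With the sample input, this prints `java -client` followed by `java -server`.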
All VMs with a rich type system, such as Java or the CLR, will not be instantaneous when compared to less rich systems such as those found in C or C++. This is largely because a lot is happening in the VM: a lot of classes get initialized and are required by a running system. Snapshots of an initialized system do help, but it still costs time to load that image back into memory.
A simple hello-world-style one-line class with a main still requires a lot to be loaded and initialized. Verifying the class requires a lot of dependency checking and validation, all of which costs time and many CPU instructions. A C program, on the other hand, does none of this; it amounts to a few instructions followed by a call to the print function.

Does Java save its runtime optimizations?

My professor did an informal benchmark on a little program and the Java times were: 1.7 seconds for the first run, and 0.8 seconds for the runs thereafter.
Is this due entirely to the loading of the runtime environment into the operating environment ?
OR
Is it influenced by Java's optimizing the code and storing the results of those optimizations (sorry, I don't know the technical term for that)?
Okay, I found where I read that. This is all from "Learning Java" (O'Reilly 2005):
The problem with a traditional JIT compilation is that optimizing code takes time. So a JIT compiler can produce decent results but may suffer a significant latency when the application starts up. This is generally not a problem for long-running server-side applications but is a serious problem for client-side software and applications run on smaller devices with limited capabilities. To address this, Sun's compiler technology, called HotSpot, uses a trick called adaptive compilation. If you look at what programs actually spend their time doing, it turns out that they spend almost all their time executing a relatively small part of the code again and again. The chunk of code that is executed repeatedly may be only a small fraction of the total program, but its behavior determines the program's overall performance. Adaptive compilation also allows the Java runtime to take advantage of new kinds of optimizations that simply can't be done in a statically compiled language, hence the claim that Java code can run faster than C/C++ in some cases.
To take advantage of this fact, HotSpot starts out as a normal Java bytecode interpreter, but with a difference: it measures (profiles) the code as it is executing to see what parts are being executed repeatedly. Once it knows which parts of the code are crucial to performance, HotSpot compiles those sections into optimal native machine code. Since it compiles only a small portion of the program into machine code, it can afford to take the time necessary to optimize those portions. The rest of the program may not need to be compiled at all—just interpreted—saving memory and time. In fact, Sun's default Java VM can run in one of two modes: client and server, which tell it whether to emphasize quick startup time and memory conservation or flat out performance.
A natural question to ask at this point is, Why throw away all this good profiling information each time an application shuts down? Well, Sun has partially broached this topic with the release of Java 5.0 through the use of shared, read-only classes that are stored persistently in an optimized form. This significantly reduces both the startup time and overhead of running many Java applications on a given machine. The technology for doing this is complex, but the idea is simple: optimize the parts of the program that need to go fast, and don't worry about the rest.
I'm kind of wondering how far Sun has gotten with it since Java 5.0.
I'm not aware of any virtual machine in widespread use that saves statistical usage data between program invocations -- but it certainly is an interesting possibility for future research.
What you're seeing is almost certainly due to disk caching.
I agree that it's likely the result of disk caching.
FYI, the IBM Java 6 VM does contain an ahead-of-time compiler (AOT). The code isn't quite as optimized as what the JIT would produce, but it is stored across VMs, I believe in some sort of persistent shared memory. Its primary benefit is to improve startup performance. The IBM VM by default JITs a method after it's been called 1000 times. If it knows that a method is going to be called 1000 times just during the VM startup (think a commonly-used method like java.lang.String.equals(...) ), then it's beneficial for it to store that in the AOT cache so that it never has to waste time compiling at runtime.
I agree that the performance difference seen by the poster is most likely caused by disk latency bringing the JRE into memory. The Just In Time compiler (JIT) would not have an impact on performance of a little application.
Java 1.6u10 (http://download.java.net/jdk6/) touches the runtime JARs in a background process (even if Java isn't running) to keep the data in the disk cache. This significantly decreases startup times (Which is a huge benefit to desktop apps, but probably of marginal value to server side apps).
On large, long running applications, the JIT makes a big difference over time - but the amount of time required for the JIT to accumulate sufficient statistics to kick in and optimize (5-10 seconds) is very, very short compared to the overall life of the application (most run for months and months). While storing and restoring the JIT results is an interesting academic exercise, the practical improvement is not very large (Which is why the JIT team has been more focused on things like GC strategies for minimizing memory cache misses, etc...).
The pre-compilation of the runtime classes does help desktop applications quite a bit (as does the aforementioned 6u10 disk cache pre-loading).
You should describe how your Benchmark was done. Especially at which point you start to measure the time.
If you include the JVM startup time (which is useful for Benchmarking the User experience but not so useful to optimize Java code) then it might be a filesystem caching effect or it can be caused by a feature called "Java Class Data Sharing":
For Sun:
http://java.sun.com/j2se/1.5.0/docs/guide/vm/class-data-sharing.html
This is an option where the JVM saves a prepared image of the runtime classes to a file, to allow quicker loading (and sharing) of those at the next start. You can control this with -Xshare:on or -Xshare:off with a Sun JVM. The default is -Xshare:auto, which will load the shared classes image if present, and if not present it will write it at first startup if the directory is writable.
With IBM Java 5 this is BTW even more powerful:
http://www.ibm.com/developerworks/java/library/j-ibmjava4/
I don't know of any mainstream JVM which is saving JIT statistics.
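To separate JVM startup from the time your own code takes, you can read the JVM's uptime at the top of `main` and time the work with `System.nanoTime()`. A small sketch, with a made-up class name (`StartupVsWork`) and a trivial stand-in workload; the uptime is measured from when the JVM considers itself started, so it may not include all of the process launch and disk loading that a wall-clock measurement from the shell would show:

```java
import java.lang.management.ManagementFactory;

final class StartupVsWork {

    // Hypothetical stand-in for the program being benchmarked.
    static long work() {
        long sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += i;
        }
        return sum;
    }

    // Milliseconds since the JVM started, as reported by the runtime itself.
    static long jvmUptimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    public static void main(String[] args) {
        long uptimeAtMain = jvmUptimeMillis(); // startup cost before our code ran
        long t0 = System.nanoTime();
        long result = work();
        long workMicros = (System.nanoTime() - t0) / 1000L;
        System.out.println("JVM uptime when main started: " + uptimeAtMain + " ms");
        System.out.println("work() took " + workMicros + " us (result=" + result + ")");
    }
}
```

Running this twice in a row and comparing the two lines separately should show whether a first-run penalty comes from startup (disk caching, class loading) or from the code itself.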
A JVM (details vary between implementations) starts out interpreting the bytecode. Once it detects that a piece of code runs often enough, it JIT-compiles that code to native machine code so it runs faster.
