We have a low latency trading system (feed handlers, analytics, order entry) written in Java. It uses TCP and UDP extensively, it does not use Infiniband or other non-standard networking.
Can anyone comment on the tradeoffs of various OSes or OS configurations to deploy this system? While throughput is obviously important to keep up with modern price feeds, latency is our #1 priority.
Solaris seems like a natural candidate since Sun created Java; should I use SPARC or x64 processors?
I've heard good things about RHEL and SLERT; are those the right versions of Linux to use in our benchmarking?
Has anyone tested Windows against the above OSes? Or is it assumed to not keep up?
I'd like to leave the Java vs C++ debate for a different thread.
Vendors love this kind of benchmark. You have code, right?
IBM, Sun/Oracle, HP will all love to run your app on their gear to demonstrate their advantages.
Make them do this. If you have code, make the vendors run a demonstration on their gear to show which is best for your needs.
It's easy, painless, free, and factual. The final decision will be easy and obvious. And you will know how to install and tune to maximize performance.
What I hate doing is predicting this kind of thing before the code is written. Too many customers have asked for a H/W and OS recommendation before we've finished identifying all the use cases. Asking for that kind of precognition is simple craziness.
But you have code. You can produce test cases that exercise your code. That's perfect.
For a trading environment, you are probably concerned about consistency as well as raw latency, so focusing on reducing the impact of GC pauses as much as possible may well give you more benefit than different OS choices.
The G1 garbage collector in recent versions of Sun's HotSpot VM improves stop-the-world pauses a lot, in a similar way to the JRockit VM.
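For reference, enabling G1 on the Java 6 update releases where it ships as an experimental collector looks roughly like this (the heap sizes and jar name are just placeholders):

    java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -Xms4g -Xmx4g -jar feedhandler.jar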
For real performance guarantees, though, Azul Systems' version of the HotSpot compiler on their Java appliance delivers the lowest guaranteed pauses available, and it scales to a massive size: hundreds of GB of heap and hundreds of cores.
I'd discount Real-Time Java: although you'd get response-time guarantees, you'd sacrifice throughput to get them.
However, if you're planning on using your trading system in an environment where every microsecond counts, you're really going to have to live with the lack of consistency you will get from the current generation of VMs; none of them (except the real-time ones) guarantees low-microsecond GC pauses. Of course, at this level you're going to run into the same issues from OS activity (process pre-emption, interrupt handling, page faults, etc.). In this case one of the real-time variants of Linux is going to help you.
I wouldn't rule out Windows just because it's Windows. My experience over the last few years has been that the Windows version of the Sun JVM was usually the most mature, performance-wise, compared with Linux or Solaris x86 on the same hardware. The JVM for Solaris SPARC may be good too, but I guess with Windows on x86 you'll get more power for less money.
I would strongly recommend that you look into an operating system you already have experience with; Solaris, for example, is a strange beast if you only know Linux.
I would also strongly recommend using a platform actually supported by Sun, as this will make it much easier to get professional assistance when you REALLY, REALLY need it.
http://java.sun.com/javase/6/webnotes/install/system-configurations.html
I'd probably worry about garbage collection causing latency well before the operating system; have you looked into tuning that at all?
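Before swapping collectors or operating systems, it is worth measuring how long the pauses actually are; a minimal starting point (the jar name is a placeholder) is to turn on GC logging and look at the stop times:

    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
         -XX:+PrintGCApplicationStoppedTime -jar orderentry.jar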
If I were willing to spend the time to trial different OSs, I'd try Solaris 10 and NetBSD, and probably a Linux variant for good measure.
I'd experiment with 32- vs 64-bit architectures; 64-bit gives you a larger heap address space... but object references are wider, so each one costs more memory bandwidth and cache space.
I'm assuming you've profiled your application and know where the bottlenecks are; by the comment about GC, you've done that. In that case, your application shouldn't be CPU-bound, and chip architecture shouldn't be a primary concern.
I don't think managed code environments and real-time processing go together very well. If you really care about latency, remove the layer imposed by the managed code. This is not a Java vs C++ argument, but a Java/C#/... vs C/C++/FORTRAN/... argument, and I believe that is a valid design discussion to have.
And yes, I do mean FORTRAN, we run a number of near real-time systems with a FORTRAN foundation.
One way to manage latency is to have several JVMs dividing the work, each with a smaller heap, so that a stop-the-world garbage collection isn't as time-consuming when it happens and affects fewer processes.
Another approach is to load up a cluster of JVMs with enough memory and allocate the processes so that there won't be a stop-the-world garbage collection during the hours you care about latency (if this isn't a 24/7 app), and restart the JVMs during off hours.
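As a sketch of the first idea, you might run two smaller, fixed-size heaps instead of one big one (the sizes and jar names are purely illustrative; pinning -Xms to -Xmx also avoids heap-resize pauses):

    java -Xms2g -Xmx2g -jar feedhandler-a.jar
    java -Xms2g -Xmx2g -jar feedhandler-b.jar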
You should also look at other JVM implementations as a possibility (such as JRockit). Of course, whether any of them is appropriate depends entirely on your specific application.
If any of the above matters to your approach, it will affect the choice of OS. For example, if you go with another JVM implementation, that might limit OS choices, and if you go with clustering or otherwise running several JVMs for the application, that might require better underlying OS tools to manage effectively, further influencing the OS choice.
The choice of operating system or OS configuration matters far less than the availability of faster network fabrics.
Look at 10GigE with TOE (TCP offload engine) NICs, or the faster option of 4X QDR (40 Gb/s) InfiniBand with IPoIB presenting a standard Ethernet interface and routing.
The (relatively) new built-in performance monitor/profiler for Java is Mission Control. The Oracle docs advertise that it can be used in production without incurring a significant performance hit (less than 2%):
The tool chain [Mission Control + Flight Recorder] enables developers and administrators to collect and analyze data from Java applications running locally or deployed in production environments.
I have used jvisualvm (VisualVM) for many years now, but never in a production environment, because of the oft-repeated warning that it does incur a performance overhead.
So I ask: What is so different between Mission Control (and its Flight Recorder) and VisualVM that allows MC/FR to not hinder performance? Or do they not include certain features/capabilities that VisualVM delivers?
The main performance difference in the method profiling is that MC/JFR uses sampling, and only samples a few threads per sampling interval. It uses a similar approach to AsyncGetCallTrace (see, for example, http://psy-lob-saw.blogspot.com/2016/06/the-pros-and-cons-of-agct.html).
Since I work on MC/JFR, I'm not as familiar with how VisualVM does its sampling profiling, but I believe it does not use the same method.
MC/JFR has its data-gathering engine deeply integrated into the HotSpot JVM, whereas VisualVM uses external APIs/MXBeans. This also helps JFR keep the performance overhead down.
Generally, JFR is designed to find the hot spots rather than to gather data that is 100% complete but might slow your application down and change its actual behavior. This goes for both the method and allocation sampling, as well as for information about latency events (wait/sleep/block), where only events above a certain threshold are recorded. I'm less familiar with how this compares to VisualVM.
Other than that, the two tools have different feature sets, neither of which is a superset of the other.
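For context, starting a recording from the command line on an Oracle JDK that bundles JFR (7u40 or later) looks like this; the duration and file name are arbitrary:

    java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder \
         -XX:StartFlightRecording=duration=60s,filename=myrecording.jfr -jar yourapp.jar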
I am doing web crawling in Java on a server with 32 virtual processors. How can I make full use of these processors? I've seen suggestions about multi-threaded programming, but I wonder how that could ensure all processors are taken advantage of, since we can do multi-threaded programming on a single-processor machine as well.
There is no simple answer to this ... except that the way to ensure all processors are used is to use multi-threading the right way. (Note: that is a circular answer!)
Basically, the way to get effective use of multiple processors is to:
ensure that there is work that can be done in parallel, and
reduce / eliminate contention points that force one thread to wait while another thread does something.
This is difficult enough when you are doing simple computation. For a web crawler, you've got the additional problems that the threads will be competing for network and (possibly) remote server bandwidth, and they will typically be attempting to put their results into a shared data structure or database.
That's about all that can be said at this level of generality ...
And as @veer correctly points out, you can't "ensure" it.
... but using a load of threads will surely be quicker wall-time-wise because all the miserable network latency will happen in parallel ...
Actually, if you go overboard, a load of threads can reduce throughput because of contention. Just throwing lots of threads at the problem is rarely a good idea.
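To make the "bounded number of threads" point concrete, here is a minimal sketch of a fixed-size fetch pool; the seed URLs and pool size are made up, and there is no queueing of discovered links, politeness handling, or retry logic:

    import java.io.InputStream;
    import java.net.URL;
    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class CrawlerSketch {
        public static void main(String[] args) throws InterruptedException {
            List<String> seeds = Arrays.asList(
                    "http://example.com/", "http://example.org/");   // placeholder URLs

            // A fixed pool bounds contention; sizing it well above the core count
            // is reasonable because the threads spend most of their time blocked
            // on network I/O rather than burning CPU.
            ExecutorService pool = Executors.newFixedThreadPool(64);

            for (final String url : seeds) {
                pool.submit(new Runnable() {
                    public void run() {
                        try {
                            InputStream in = new URL(url).openStream();
                            byte[] buf = new byte[8192];
                            long total = 0;
                            for (int n; (n = in.read(buf)) != -1; ) {
                                total += n;          // parse/store the page here
                            }
                            in.close();
                            System.out.println(url + ": " + total + " bytes");
                        } catch (Exception e) {
                            System.err.println(url + ": " + e);
                        }
                    }
                });
            }

            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }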
A computer or a program is only as fast as the slowest link in its processing chain; just increasing CPU capacity is not going to guarantee a drastic performance gain. Leaving aside other issues like cache size, RAM, etc., there are two basic approaches to your question about how to take advantage of all your processors:
[1] Using a JIT (just-in-time) compiled runtime such as Java/.NET. I don't know much about Java, but the .NET JIT is definitely designed to take advantage of all the available processors on the machine. In fact, this very feature makes a JIT stand out against statically compiled languages like C/C++: because the JIT "knows" it is sitting on 32 processors, it is in a much better position to take advantage of them than a program statically compiled on some other machine (provided you have written robust multi-threaded code for it!).
[2] Programming in C/C++. This is the classic approach. If you compile your code on the same 32-CPU machine and take proper care in your program with memory management, pointer handling, etc., the C/C++ program will be the most optimal and will perform better than its CLR/JVM counterpart (as it runs without the extra overhead of a garbage collector or a VM).
But keep in mind that writing robust code is much easier in .NET/Java than in C/C++, so if you are not a "hard-core" programmer, I would suggest going with the former approach. Also remember to handle your multiple threads with care, for example by locking when multiple threads try to change the same variables; however, excessive or careless locking can make your code deadlock or stall.
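A tiny, contrived Java illustration of that point -- the same counter incremented safely in two different ways, via a synchronized method and via an AtomicInteger:

    import java.util.concurrent.atomic.AtomicInteger;

    public class SharedCounter {
        // Without synchronization, concurrent ++ on this field would lose updates.
        static int plainCount = 0;
        static final AtomicInteger atomicCount = new AtomicInteger();

        static synchronized void incrementPlain() {
            plainCount++;   // safe only because every caller takes the same lock
        }

        public static void main(String[] args) throws InterruptedException {
            Runnable work = new Runnable() {
                public void run() {
                    for (int i = 0; i < 100000; i++) {
                        incrementPlain();
                        atomicCount.incrementAndGet();   // lock-free compare-and-swap
                    }
                }
            };
            Thread a = new Thread(work), b = new Thread(work);
            a.start(); b.start();
            a.join(); b.join();
            // Both print 200000; remove the synchronized keyword and plainCount usually won't.
            System.out.println(plainCount + " / " + atomicCount.get());
        }
    }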
Processor management is handled natively by the virtual machine you are using, i.e., the JVM. Have a look at the Java HotSpot VM Options to tune it if you are using the HotSpot VM. If you are using a third-party VM, your provider may help you tune it for your requirements.
How well the application itself performs depends largely on your design.
If you would like to monitor your threads and memory usage to optimize your application, you can use any of the VM monitoring tools available today. The Java virtual machine (JVM) has built-in instrumentation that enables you to monitor and manage it using JMX.
For details, you can check Platform Monitoring and Management using JMX. For third-party VMs you will have to contact the vendor, I guess.
I want to implement a high-performance RTSP server to handle VOD requests -- it only handles the signaling; it does not need to stream the media files. I have written a version in Java based on the MINA networking framework, and the performance does not seem to be very high.
As far as I know, high-performance SIP servers (e.g. VoIP servers) are written in C (e.g. OpenSIPS, Kamailio). Should I use C or C++ for my project to get a significant performance improvement?
BTW, I found this explanation from its author of why OpenSER is written in C:
"On the other hand, it is the garbage collector that can cause lots of troubles when developing SIP applications in Java. Aheavily loaded server written in Java stopsworking when the garbage collector is cleaning the memory. The delay caused by the garbage collector can be even more than 10 seconds. Such delays are unacceptable"
Is that still true nowadays, meaning that I should use C too?
There are a huge number of variables here, language may not be the determining factor. Trustin Lee, the author of MINA, later created Netty, which offers very high performance indeed. Lee himself says that MINA has "relatively poor performance" as a result of the complexity of some of the features it offers being too tightly bound to the core. So you might look at Netty before completely rewriting everything.
If you're using Oracle's JVM, you're using an extremely optimized runtime system that identifies hotspots in the code (hence the name "HotSpot") and aggressively optimizes them at runtime. It's been a long time since you could say, ipso facto, that Java code would run more slowly than C code. Well-written, optimized C code probably out-performs equivalent Java code in certain select tasks, but a generalization from there is probably no longer appropriate, and of course your code has to take on several of the burdens that the JVM shoulders for you with Java. Also note that there are several things you can do to tune the JVM's garbage collector, for instance to prefer consistency and short pauses over footprint and long pauses.
Obviously C has several strengths (being close to the machine is sometimes exactly what you want), as does explicit memory management for certain tasks.
Have you compared your rtsp server with Wowza?
Wowza is also written in Java. If your RTSP server has lower performance than Wowza, I believe you could improve it without changing language. Otherwise, if Wowza performs about the same as your server, that suggests Java cannot satisfy the performance requirements, and maybe you should consider using C/C++ instead.
I built my own RtspServer in C# and have no problem streaming to hundreds of clients.
http://net7mma.codeplex.com/
Code Project article # http://www.codeproject.com/Articles/507218/Managed-Media-Aggregation-using-Rtsp-and-Rtp
You are more than welcome to adopt / reference the design! (Apache 2 License)
I'm beginning to work on a game server written in Java. Granted, Java is not the best solution for server development, but it is the language I am most comfortable with.
I'm trying to design this game server to handle a lot of connections, and I'm not sure what to focus more on: storing data in memory or keeping memory usage down and using more raw CPU power?
It certainly depends on what your game is like. I recently benchmarked my server at around 20 MB of RAM with 5,000 clients. You may keep much more state data than I do, and it could be a concern.
Your serious issue with many connections is setting up the sockets to handle them properly; certain styles of socket handling get badly bogged down or break at around 1024 connections, etc. I'm not sure how much optimization you can do in Java.
Look at this link for what I'm talking about.
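For what it's worth, the standard Java answer to the "one thread per socket falls over" problem is the NIO Selector API; a bare-bones non-blocking accept/echo loop (the port is arbitrary and there is no real protocol handling) looks roughly like this:

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    public class NioServerSketch {
        public static void main(String[] args) throws Exception {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.configureBlocking(false);
            server.socket().bind(new InetSocketAddress(9000));   // arbitrary port
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buf = ByteBuffer.allocate(4096);
            while (true) {
                selector.select();                               // one thread, many sockets
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buf.clear();
                        int n = client.read(buf);
                        if (n == -1) {                           // peer closed the connection
                            key.cancel();
                            client.close();
                        } else {
                            buf.flip();
                            client.write(buf);                   // echo back; game logic goes here
                        }
                    }
                }
            }
        }
    }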
And, good luck! Also, switch as soon as possible to a language offering comparable features to Java but without the awful drawbacks (meaning Objective-C or C#). I'm not just "slamming Java", but the problem you're going to hit when you try to do performant things is that Java abstracts you too far from the operating system, and you will not be able to take advantage of OS-level optimizations when you need them.
I wouldn't suggest you design the server for far more than you really need to. If you suddenly find you have 10,000s of clients, you can re-design the system.
I would start with a basic server e.g. i5 with 8 GB of memory for less than £500, or an i7 with 24 GB for less than £1000.
As your number of connections grows you are likely to run out of bandwidth before you run out of resources unless you use a cloud solution.
BTW: you can implement a high-frequency trading system with less than 100 microseconds of latency in Java. I haven't heard of any Objective-C high-frequency trading systems. C# might perform as well or better on Windows, but I prefer Linux for server systems.
Java - or at least Sun's Hotspot JVM - has long had a reputation for having a very large memory footprint. What exactly is it about the JVM that gives it this reputation? I'd be interested in a detailed breakdown: how much memory goes to the runtime (the JIT? the GC/memory management? the classloader?) anything related to "auxiliary" APIs like JNI/JVMTI? the standard libraries? (which parts get how much?) any other major components?
I realize that this may not be straightforward to answer without a concrete application plus VM configuration, so just to narrow things down at least somewhat: I'm primarily interested in default/typical VM configurations, and in a baseline console "Hello world" app as well as any real-world desktop or server app. (I'm suspecting that a substantial part of the JVM's footprint is largely independent of the app itself, and it is in this part that I'd like to zoom in, ideally.)
I have a couple of other closely related questions:
Other similar technologies, such as .NET/Mono, don't exhibit nearly the same footprint. Why is that?
I've read somewhere on the intarwebs that a large portion of the footprint is due simply to the size of the standard libraries. If that is the case, then why is so much of the standard library loaded up front?
Are there any efforts (JSRs, whatever) to tame the memory footprint? The closest thing I've come across is a project to reduce the on-disk footprint of the JVM.
I'm sure that the footprint has varied over the past decade or so with every new version of Java. Are there any specific numbers/charts chronicling precisely how much the JVM's footprint has changed?
Some initiatives:
Since Java 1.5, class data sharing can be used;
Java 6 update 14 brought in compressed oops, which reduces the footprint of 64-bit JVMs with under 4 GB of heap.
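As an invocation sketch, both can be turned on explicitly; the class-data archive has to be dumped once before -Xshare:on will work, and the jar name is a placeholder:

    java -Xshare:dump
    java -Xshare:on -XX:+UseCompressedOops -jar yourapp.jar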
We have some server-side apps which do nothing but bridge multicast traffic (i.e. they have no permanent state). They all run with about 2.3-2.5 MB of heap on a 32-bit Java 6 (Linux) JRE.
Is this a big footprint? I could easily have a thousand of these on a typical server-class machine (from a memory perspective), although that would be a bit pointless from a threading perspective!
That said, there is the Jigsaw project to modularize the VM (the libraries, I believe), which is coming in Java 7; this will help those who want smaller footprints.
I realize that this doesn't really answer your question but it is relevant nonetheless! What sort of applications are you designing where you are finding that memory footprint is an issue?
One factor is Java's long history: it started in 1995 and is now at version 6. Keeping backwards compatibility while adding features inevitably inflates its footprint.