We have in-house a 3rd party Java application on a ridiculously hefty Linux box that runs a scheduling algorithm. The application runs far too slowly for the load we need. We do not have the code, and the vendor won't be making any changes to the application for monetary reasons, so I can't improve the code. The application is single-threaded and its design does not lend itself to parallelization (so I can't split the load between 2 boxes).
What can I, either software or hardware wise, do to improve performance of the application?
Get on the newest version of Java (newer versions tend to have performance improvements)
Give Java more memory to work with (benchmark to see if this makes any difference; see the example flags after this list)
Measure what it's doing with top. Upgrade whatever it's having problems with (more memory, faster CPU, SSD). Some CPUs are better at single-threaded workloads than others (read: don't run this on a Bulldozer; something with Turbo Boost might be helpful).
Play with other experimental JVM options (benchmark to see if this makes any difference)
Remove any other applications running on this machine (benchmark to see if there's any benefit -- no sense wasting money if it doesn't help)
Pay the vendor to make it faster or give you the code (i.e., give them a monetary reason to fix this)
Find an alternative
Write your own alternative
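As a concrete starting point for the memory and JVM-option suggestions above, here is a minimal sketch of the kind of invocation to benchmark (scheduler.jar and the heap sizes are placeholders for your setup):

```
# Fix the heap size up front so the JVM doesn't spend time resizing it,
# and benchmark each change in isolation.
java -Xms8g -Xmx8g -jar scheduler.jar

# List every tunable flag the JVM supports, to find candidates to experiment with:
java -XX:+PrintFlagsFinal -version
```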
1) You can improve the hardware that the application runs on. Do this by looking at what resources the application is using. Is it maxing out CPU, or using all the memory (or both)? If so, you can add more CPU power or RAM accordingly.
2) Is there a way you can cache the results from the application? Can you ever avoid using it? (See the sketch below.)
Otherwise, there really isn't much more you can do. If it becomes a bigger problem, you might have to write your own scheduling algorithm, or better yet, find a better vendor.
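On the caching point: if the scheduler produces the same output for the same input, even a crude memoization layer in front of it can pay off. A minimal sketch, assuming you already invoke the application through a wrapper of your own (runScheduler and the key derivation are hypothetical):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ScheduleCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // inputHash: a hash of the scheduling input; runScheduler: your existing
    // call into the vendor application (both hypothetical here)
    public String schedule(String inputHash, String input) {
        return cache.computeIfAbsent(inputHash, k -> runScheduler(input));
    }

    private String runScheduler(String input) {
        // invoke the vendor application and capture its output
        throw new UnsupportedOperationException("wire up to the real application");
    }
}
```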
Can you preprocess the input so the application has less work to do?
For example, perhaps the first thing the application does is sort the list of jobs to be scheduled using a merge sort. If you pre-sort the list, the application's sort will have far less work to do. You might be able to sort the list faster than the application can - use many cores, do it ahead of time, etc.
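As a sketch of the pre-sorting idea (Job and its deadline field are made-up stand-ins for whatever the scheduler actually consumes):

```java
import java.util.Arrays;
import java.util.Comparator;

public class PreSort {
    // Job is a hypothetical stand-in for whatever the scheduler consumes
    static class Job {
        final long deadlineMillis;
        Job(long deadlineMillis) { this.deadlineMillis = deadlineMillis; }
        long getDeadlineMillis() { return deadlineMillis; }
    }

    public static void main(String[] args) {
        Job[] jobs = loadJobs(); // from your real input source

        // Arrays.parallelSort (Java 8+) runs on the common fork/join pool,
        // so the sort uses all cores even though the vendor application
        // that consumes the result is single-threaded.
        Arrays.parallelSort(jobs, Comparator.comparingLong(Job::getDeadlineMillis));

        writeJobs(jobs); // hand the pre-sorted list back to the application
    }

    static Job[] loadJobs() { return new Job[0]; }  // stub for illustration
    static void writeJobs(Job[] jobs) {}            // stub for illustration
}
```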
Run it on a faster computer. This is probably the cheapest solution of the lot.
I am doing web crawling on a server with 32 virtual processors using Java. How can I make full use of these processors? I've seen some suggestions on multi-threaded programming, but I wonder how that could ensure all processors would be taken advantage of, since we can do multi-threaded programming on a single-processor machine as well.
There is no simple answer to this ... except the way to ensure all processors are used is to use multi-threading the right way. (Note: that is a circular answer!)
Basically, the way to get effective use of multiple processors is to:
ensure that there is work that can be done in parallel, and
reduce / eliminate contention points that force one thread to wait while another thread does something.
This is difficult enough when you are doing simple computation. For a web crawler, you've got the additional problem that the threads will be competing for network and (possibly) remote server bandwidth, and they will typically be attempting to put their results into a shared data structure or database.
That's about all that can be said at this level of generality ...
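Still, as a minimal sketch of the usual starting point - a fixed pool sized from the hardware (the fetch/handle methods are placeholders):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Crawler {
    public static void main(String[] args) {
        // One thread per hardware thread is a starting point; for an
        // IO-bound crawler you would typically benchmark larger pools.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (String url : seedUrls()) {
            pool.submit(() -> handle(fetch(url)));
        }

        pool.shutdown();
    }

    // placeholders for the real crawling logic
    static java.util.List<String> seedUrls() { return java.util.List.of(); }
    static String fetch(String url) { return ""; }
    static void handle(String page) {}
}
```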
And as #veer correctly points out, you can't "ensure" it.
... but using a load of threads will surely be quicker wall-time-wise because all the miserable network latency will happen in parallel ...
Actually, if you go overboard, a load of threads can reduce throughput because of contention. Just throwing lots of threads at the problem is rarely a good idea.
A computer or a program is only as fast as the slowest link in its processing chain. Just increasing the CPU capacity is not going to guarantee a drastic performance gain. Leaving aside other issues like your cache size, RAM, etc., there are two basic approaches to your question about how to take advantage of all your processors:
[1] Using a JIT (just-in-time) compiler/interpreter technology such as Java/.NET. I don't know much about Java, but the .NET jitter is definitely designed to take advantage of all the available processors on the machine. In fact, this very feature makes a jitter stand out against static language compilers like C/C++: because the jitter "knows" that it is sitting on 32 processors, it is in a much better position to take advantage of them than a program statically compiled on some other machine (provided you have written robust multi-threaded code for it!).
[2] Programming in C/C++. This is the classic approach. If you compile your code on the same machine with 32 CPUs, and take proper care in your program with things like memory management and pointer handling, the C/C++ program will be highly optimized and will perform better than its CLR/JVM counterpart (as it runs without the extra overhead of a garbage collector or a VM).
But keep in mind that writing robust code is much easier in .NET/Java than in C/C++. So, if you are not a "hard-core" programmer, I would suggest going with the former approach. Also remember to handle your multiple threads with care, for example by locking variables when multiple threads try to change the same variables. However, excessive locking can make your code hang or deadlock, so lock judiciously.
Processor management is implemented natively by the virtual machine you are using, i.e., the JVM. You can have a look at the Java HotSpot VM Options page to optimize your machine if you are using the Java HotSpot VM. If you are using a third-party VM, then your provider may help you with tuning it for your requirements.
How the application itself performs, though, depends largely on your design.
If you would like to monitor your threads and memory usage to optimize your application, you can use any VM monitoring tools available to date. The Java virtual machine (JVM) has built-in instrumentation that enables you to monitor and manage it using JMX.
For details you can check Platform Monitoring and management using JMX. For third party VMs you have to contact the vendor I guess.
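For example, these standard system properties expose the JVM to JConsole/VisualVM over JMX (the port is arbitrary; don't disable authentication outside a trusted network):

```
java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=9010 \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false \
     -jar yourapp.jar
```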
I had an old application, a JAR file, that went through some enhancements. Basically some parts of the code had to be modified along with modifying some of the logic.
Comparing the OLD version against the NEW version, the NEW version is about 2X slower than the old one.
I'm trying to narrow down what's causing the slowdown, and I find myself measuring the time of certain for-loops using System.out.println with System.currentTimeMillis(). This is getting really tedious.
Is there a Java performance tool that will help me in figuring out why the NEW JAR is about 2X slower than the old one?
Thanks in advance.
JProfiler has the capability to compare CPU snapshots. Record the execution for the old and the new JAR file and save snapshots (if the JVM exits at the end, configure a "JVM exit" trigger that saves a snapshot).
Then open the snapshot comparison window with "Session->Compare Snapshots in New Window" and add the two snapshots. The hot spots comparison (optionally with a view filter set) will immediately show you which methods are responsible for the increase in execution time. Another way to analyze the differences in execution time is the call tree comparison.
Disclaimer: My company develops JProfiler.
You should use a profiler. This will show you which methods are taking the most time (and what is calling them), without you having to guess which ones to measure.
Java comes with a built-in profiler called hprof, but see also:
https://stackoverflow.com/questions/14762/please-recommend-a-java-profiler
5 things you didn't know about ... Java performance monitoring
The JConsole and VisualVM tools
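If you want to try hprof first, the classic invocation is a sampling CPU profile; output lands in java.hprof.txt by default (yourapp.jar is a placeholder):

```
java -agentlib:hprof=cpu=samples,interval=20,depth=10 -jar yourapp.jar
```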
Depending on how long-running the process is, I'd think about VisualVM 1.3.3. If you download all the plugins, you'll be able to see heap, threads, objects, etc. That ought to help, and it won't cost a dime.
I believe it assumes the Oracle/Sun JVM.
A profiler tool like YourKit, or something to measure performance reliably like Hyperic's Sigar, is a good candidate for your case. Have a look at those tools.
The former will find bottlenecks in your code and/or memory leaks (though not all of them), whereas the latter is an API you can use to measure performance reliably, since Oracle's JVM & OpenJDK have no built-in way of getting performance metrics reliably/consistently/accurately (like CPU wall-clock time, CPU time spent by the application, memory usage, application threads, etc.).
By default, Java provides packages for these things.
For example:
java.lang.management.ManagementFactory
java.lang.management.ThreadMXBean
but depending on your case they may or may not be adequate (keep in mind they are OK for most cases unless we are talking about something critical).
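A minimal sketch of what those built-in beans give you:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

public class JvmMetrics {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

        if (threads.isCurrentThreadCpuTimeSupported()) {
            // CPU time consumed by this thread, in nanoseconds
            System.out.println("thread cpu ns: " + threads.getCurrentThreadCpuTime());
        }
        System.out.println("live threads:  " + threads.getThreadCount());
        System.out.println("heap used:     " + memory.getHeapMemoryUsage().getUsed());
    }
}
```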
I have a legacy in-house business application running in one JVM, and there are many performance issues with it, specifically around heap usage and concurrent threads. At its core it's a scheduling application: the user schedules a task from the front end and, when the time arrives, the task gets fired. All the code is home-grown; we are not using any third-party scheduler. My goal is to improve the performance of the application, and there are some options I can try, like using a scheduling mechanism such as Quartz, or distributing the application across different JVMs. The challenge is that I've never been exposed to this kind of re-architecting situation, so I'm not sure where to start. I know SO is not the right place to ask this type of question, but I'm not sure how else to approach it, and any help/suggestions would be highly appreciated.
From reading your post I don't get the impression that you've really grasped what the underlying cause of your performance problems is. The first step in addressing any such problem should be to identify the cause before proposing a solution. I'd begin by asking some pretty high-level questions.
How many concurrent tasks/threads do you currently execute?
Are the jobs CPU or IO bound?
What software stack is the app running on?
What hardware is the app running on?
By distributing the application across multiple JVMs you will invariably add complexity, which is fine, provided it's a valid and required solution.
I suggest you exercise the application with a realistic workload, so that the server is busy, and profile it to find CPU, memory, and resource bottlenecks.
IMHO: separating into multiple JVMs might be an option if you are using more than 1-8 GB of heap AND full GC times are an issue. If you are using much less than that, it's unlikely to help.
DON'T jump to any conclusions about which solution to use until you have a very good understanding of the problem, or you can end up spending a lot of time optimising the wrong things and possibly making it worse.
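That said, if profiling does point at the home-grown scheduler and Quartz turns out to be a good fit, the basic API is small. A minimal sketch (assuming Quartz 2.x; MyTask is a placeholder for your real work):

```java
import org.quartz.*;
import org.quartz.impl.StdSchedulerFactory;

public class QuartzSketch {
    // placeholder for whatever the home-grown tasks actually do
    public static class MyTask implements Job {
        @Override
        public void execute(JobExecutionContext ctx) {
            System.out.println("task fired");
        }
    }

    public static void main(String[] args) throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start();

        JobDetail job = JobBuilder.newJob(MyTask.class)
                .withIdentity("myTask", "userJobs")
                .build();

        // fire once at a user-chosen time, like the existing front end does
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("myTrigger", "userJobs")
                .startNow()
                .build();

        scheduler.scheduleJob(job, trigger);
    }
}
```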
I'm designing a Java based web-app and I need a key-value store. Berkeley DB seems fitting enough for me, but there appear to be TWO Berkeley DBs to choose from: Berkeley DB Core, which is implemented in C, and Berkeley DB Java Edition, which is implemented in pure Java.
The question is, how do I choose which one to use? With web apps, scalability and performance are quite important (who knows, maybe my idea will become the next YouTube), and I couldn't easily find any meaningful benchmarks between the two. I have yet to familiarize myself with Core's Java API, but I find it hard to believe that it could be much worse than Java Edition's, which seems quite nice.
If some other key-value store would be much better, feel free to recommend that too. I'm storing smallish binary blobs, and keys probably will be hashes of the data, or some other unique id.
I have quite a bit of experience using both BDB-JE and BDB-core with Java. Deciding which one to use is quite simple: If you want concurrency, use BDB-JE. If you want scalability, use BDB-core.
BDB-JE breaks down performance-wise with large databases due to its file format and its reliance on Java garbage collection to clean up evicted cache entries. Expect long garbage collection pauses or spend a lot of time tuning magic GC settings. The file format has issues too, because the background cleaner threads have to spend a lot of time cleaning up garbage created by early cache evictions. If your database fits in RAM, BDB-JE works quite well.
BDB-core relies on a page-locking strategy, and highly concurrent applications experience a lot of deadlocks. If you can randomly order operations it reduces the deadlock potential, but it never eliminates it. Because BDB-core stores data in a more traditional way, it scales to super large sizes with predictable and expected performance degradation. Because its cache is not managed by a garbage collector, it can be quite large and not cause any pauses.
If you derive a common interface to these, and have a suitable set of unit tests, you should be able to swap between the two trivially at a later date (perhaps when you really need to make a decision based on hard facts that are not available right now)
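For instance, a minimal sketch of such a common interface (the names are made up; each Berkeley DB edition would get its own implementation behind it):

```java
// A deliberately tiny abstraction over a key-value store, so BDB-JE and
// BDB-core implementations can be swapped behind it. Names are illustrative.
public interface KeyValueStore extends AutoCloseable {
    void put(byte[] key, byte[] value);
    byte[] get(byte[] key);        // null if absent
    void delete(byte[] key);
    @Override void close();
}
```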
I faced the same problem and decided to go with the Java edition, mainly because of its portability (I need something that would run even on mobile devices). There is also the Direct Persistence Layer (DPL) API, and the fact that the whole db is a single jar makes its deployment fairly simple.
The recent version 4 brought in high availability and performance improvements. There is also the fact that long-running Java applications can be optimized to the point where they surpass native C applications' performance in some scenarios.
It's a natural fit for any Java application - desktop or web.
A while ago I had the same question; after doing some benchmarks I found that hash mode in the native edition is much faster and more storage efficient than anything the Java edition has to offer, so I decided to go with the native implementation.
I suggest you do your own benchmarks for the storage capacities you expect and decide if the Java edition is fast enough.
If it is, or if performance is not a big issue for you (it's critical for me), just go with the Java edition. Otherwise go with the native one (assuming you see the same performance boost for your own use case).
btw:
my benchmark tested the speed of querying random keys out of 20,000,000 records, where the key is a string and the value is an int (4 bytes).
I saw that inserts (populating the benchmark) were much faster with the native version, and queries were twice as fast.
(This is not necessarily a Java shortcoming: the Java edition I tested was not the same version as the native one - 4.0 vs 4.8 IIRC.)
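For anyone repeating this kind of measurement, here is a minimal sketch of a populate-then-random-read loop against BDB-JE (the path, record counts, and key format are placeholders; time the two phases separately):

```java
import com.sleepycat.je.*;

import java.io.File;
import java.nio.ByteBuffer;
import java.util.Random;

public class BdbReadBench {
    public static void main(String[] args) throws Exception {
        // the environment directory must already exist
        EnvironmentConfig envCfg = new EnvironmentConfig();
        envCfg.setAllowCreate(true);
        Environment env = new Environment(new File("/tmp/bdb-bench"), envCfg);

        DatabaseConfig dbCfg = new DatabaseConfig();
        dbCfg.setAllowCreate(true);
        Database db = env.openDatabase(null, "bench", dbCfg);

        int records = 1_000_000; // scale toward your real 20M data set
        // populate - time this phase separately from the reads
        for (int i = 0; i < records; i++) {
            db.put(null,
                   new DatabaseEntry(("key-" + i).getBytes("UTF-8")),
                   new DatabaseEntry(ByteBuffer.allocate(4).putInt(i).array()));
        }

        // random reads
        Random rnd = new Random(42);
        DatabaseEntry value = new DatabaseEntry();
        long start = System.nanoTime();
        int hits = 0;
        for (int i = 0; i < 100_000; i++) {
            DatabaseEntry key =
                new DatabaseEntry(("key-" + rnd.nextInt(records)).getBytes("UTF-8"));
            if (db.get(null, key, value, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
                hits++;
            }
        }
        System.out.printf("100000 random gets, %d hits, %.1f ms%n",
                          hits, (System.nanoTime() - start) / 1e6);

        db.close();
        env.close();
    }
}
```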
I decided to go with the Java Edition, simply because it's possible to embed the database runtime within the same deployable. This was an important feature for my setup. I haven't benchmarked Core against JE, but I have seen great performance compared with other key-value stores that I tested when first evaluating database stores.
If you're creating a web-application though, then concurrency might be very important to you in the long run.
What's a good method for assigning work to a set of remote machines? Consider an example where the task is very CPU and RAM intensive, but doesn't actually process a large dataset. The language of choice would be Java. I was thinking Hadoop would be a good option, but the dataset passed between remote machines is fairly small, and Hadoop seems to focus mainly on the distribution of data rather than distribution of work.
What are some good technologies that can help?
EDIT: I'm mainly interested in load balancing. There will be a series of jobs with a small (< 3MB) dataset, but significant processing and memory needs.
MPI would probably be a good choice; there's even a Java implementation.
MPI may be part of your answer, but looking at the question, I'm not sure if it addresses the portion of the problem you care about.
MPI provides a communication layer between processing components. It is low level, requiring you to do a fair amount of work, but from what I saw in an introductory presentation, it also comes with some common matrix-manipulation functions.
In your question, you seem to be more interested in the load balancing/job processing aspects of the problem. If that really is your focus, maybe a small program hosted in a servlet or an RMI server might be sufficient. Let each program go to the server for its next unit of work and then submit the results back (you might even be able to use a database/file share, but pay attention to locking issues). In other words, a pull mechanism versus a push mechanism.
This approach is fairly simple to implement and gives you the advantage of scaling up by just running more distributed clients. Load balancing isn't too important if you intend to allow your process to take full control of the machine. You can experiment with running multiple clients on a machine with multiple cores to see if you can improve overall throughput for the node. A multi-threaded client would be more efficient, but can increase complexity depending on the structure of the code you are using to solve the problem.
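A minimal sketch of the pull-style worker, assuming a current JDK with java.net.http and hypothetical /next-job and /result endpoints (an RMI or servlet variant works the same way):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PullWorker {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        while (true) {
            // ask the coordinator for the next unit of work
            HttpResponse<String> job = http.send(
                    HttpRequest.newBuilder(URI.create("http://coordinator:8080/next-job")).build(),
                    HttpResponse.BodyHandlers.ofString());
            if (job.statusCode() == 204) break; // no work left

            String result = process(job.body()); // the CPU/RAM-heavy part

            // submit the result back
            http.send(HttpRequest.newBuilder(URI.create("http://coordinator:8080/result"))
                            .POST(HttpRequest.BodyPublishers.ofString(result))
                            .build(),
                    HttpResponse.BodyHandlers.discarding());
        }
    }

    static String process(String job) { return job; } // placeholder
}
```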