I have observed that my Java application runs much faster on an AMD processor than on an Intel CPU.
For example, my JBoss starts in about 30 seconds on a 3 GHz AMD processor and needs about 60 seconds on a 3 GHz Intel processor with identical disk, RAM and OS.
Has anyone else made this observation? Why is this so?
It depends on the CPU generation as well - clock speed is not everything.
If you set up e.g. an Intel Pentium 4 and an AMD Phenom with the same clock speed, you'll see a large difference in favour of the Phenom.
Update: If you're really curious, use a profiler and post the results.
Other considerations:
Size of processor on-board cache
Bus speed of your motherboard
Cache size of your hard drive
Hard drive RPM and read speed
Bottom line: Unless your configurations are identical apart from the chips, and you are trying to assess the performance of a particular technology, you're really comparing apples to oranges.
Are they both running the same architecture? Or is the AMD running a 64-bit OS?
Remember that startup time isn't everything; a 60s startup time probably isn't that bad if the application runs as fast AFTER it's started up.
I've seen the 64-bit JDK work much faster than the 32-bit one on the same processor. So maybe that's the case.
EDIT: http://java.sun.com/docs/hotspot/HotSpotFAQ.html#64bit_performance. Sorry, I guess I'm wrong.
Related
Intro
So far, I have been working on a piece of software which I am now testing to see the benefit of concurrency. I am testing the same software using two different systems:
System 1: 2 x Intel(R) Xeon(R) CPU E5-2665 @ 2.40GHz with a total of 16 cores, 64GB of RAM, running Scientific Linux 6.1 and Java SE Runtime Environment (build 1.7.0_11-b21).
System 2: Lenovo ThinkPad T410 with an Intel i5 processor @ 2.67GHz with 4 cores, 4GB of RAM, running Windows 7 64-bit and Java SE Runtime Environment (build 1.7.0_11-b21).
Details: The program simulates patients with type 1 diabetes. It does some import (read from csv), some numerical computations(Dopri54 + newton) and some export (Write to csv).
I have exclusive rights to the server, so there should be no noise at all.
Results
These are my results:
Now, as you can see, system 1 is just as fast as system 2 despite being a pretty powerful machine. I have no idea why this is the case, and I am confident that the system is the same. The number of threads ranges from 10 to 100.
Question:
Why do the two runs have similar execution times despite system 1 being significantly more powerful than system 2?
UPDATE!
Now, I just thought a bit about what you guys said about it being an I/O or memory issue. So, I thought that if I could reduce the file size it would speed up the program, right? I managed to reduce the import file size by a factor of 5; however, there was no performance improvement at all. Do you guys still think it is the same problem?
As you write .csv files, it is possible that the bottleneck is not your computation power, but the write rate of your hard disk.
Almost certainly this means that either CPU time is not the bottleneck for this application, or that something about it is making it resistant to effective parallelization, or both.
For example if reading the data from disk is actually the limiting factor then faster disks are what matters, not faster processors.
If it's running out of memory then that will be a bigger bottleneck.
If it takes more time to spawn each thread than to do the actual processing inside the thread, then thread overhead will dominate.
etc.
In this sort of optimization work metrics are king. You need real hard solid numbers for how long things are taking, and where in your program you are losing that time. Only then can you see where to focus your efforts and see if they are effective.
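A minimal sketch of how such numbers could be collected, assuming Java 8 or later and a program that splits cleanly into import, compute and export phases. The class `PhaseTimer` and the `placeholderWork` method are hypothetical stand-ins, not part of the asker's code; the real phases would be passed in instead. The timer is meant to be driven from a single thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class PhaseTimer {
    // Accumulated wall-clock nanoseconds per phase, kept in first-seen order.
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs the work, adds its elapsed time to the named phase, returns its result. */
    public <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total nanoseconds recorded for a phase, or 0 if it never ran. */
    public long nanos(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L);
    }

    /** One line per phase with the accumulated time in milliseconds. */
    public String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanosByPhase.entrySet()) {
            sb.append(String.format("%-8s %8.1f ms%n", e.getKey(), e.getValue() / 1e6));
        }
        return sb.toString();
    }

    // Hypothetical stand-in for a real phase: some floating-point work.
    static double placeholderWork(int n) {
        double s = 0;
        for (int i = 1; i <= n; i++) {
            s += Math.sqrt(i);
        }
        return s;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        timer.time("import", () -> placeholderWork(1_000_000));
        timer.time("compute", () -> placeholderWork(20_000_000));
        timer.time("export", () -> placeholderWork(1_000_000));
        System.out.print(timer.report());
    }
}
```

If the import and export phases dominate the report on both machines, extra cores will not help much; if compute dominates but does not shrink with more threads, the parallelization itself is the place to look.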
I have performance issues in Isabelle (i.e., the recent version Isabelle2013-2).
I use Isabelle/JEdit, based on the new interface.
Previously I already had some trouble with performance, but now it is worse: I sometimes have to wait up to 10 seconds to enter the right. The performance issues get worse over time, to the point where I have to restart Isabelle after an hour or so.
My suspicion is that I can configure Isabelle better or apply some tricks that improve the performance.
Hardware:
recent CPU, it's an Intel i7 quad-core (mobile laptop chip), 16GB RAM, fast SSD.
Software:
64bit arch linux (kernel 3.12.5-1-ARCH)
no 32bit compatibility libraries
my java version is:
java version "1.7.0_45"
OpenJDK Runtime Environment (IcedTea 2.4.3) (ArchLinux build 7.u45_2.4.3-1-x86_64)
My theory file is 125KB in size. The whole theory I am working on is in one file, and at the moment I would really like to keep it as just one file.
Symptoms:
Isabelle displays only about 900 MB in the lower right corner of the UI. I have 16GB RAM, should I configure Java to use more RAM? Sometimes a single process consumes 600% of the CPU, i.e., 6 of the cores that the Linux kernel sees.
Tricks I use:
One trick is that I insert *) at a line below the code I am working on. This leads to a syntax error and the below code is not checked. The second trick is that I went to the timing panel, and all proofs that took longer than 0.2 seconds I commented out and replaced with sorry.
The two most recent Isabelle versions are really great improvements!
Any suggestions or tricks to how I can improve the performance of Isabelle?
A few general hints on performance tuning:
One needs to distinguish Isabelle/ML (i.e. the underlying Poly/ML runtime) versus Isabelle/Scala (i.e. the underlying JVM).
Isabelle/ML: Intel CPUs like i7 have hyperthreading, which virtually doubles the number of cores. On smaller mobile machines it is usually better to restrict the nominal number of cores to half of that. See the "threads" option in Isabelle/jEdit / Plugin Options / Isabelle / General. When running on batteries you might even go further below.
Isabelle/ML: Using x86 (32bit) Poly/ML generally improves performance. This is only relevant to Linux, because that platform usually lacks x86 libraries that other platforms provide routinely. There is rarely any benefit to fall back on bulky x86_64. Poly/ML 5.5.x is very good at working in the constant space of 32bit mode.
Isabelle/Scala: JVM performance can be improved by using native x86_64 (which is the default) and providing generous stack and heap parameters.
The main Isabelle application bundle bootstraps the JVM with some options that are hard-wired in a certain place, which can be edited nonetheless:
Linux: Isabelle2013-2/Isabelle2013-2.run
Windows: Isabelle2013-2/Isabelle2013-2.ini
Mac OS X: Isabelle2013-2.app/Contents/Info.plist
For example, the maximum heap size can be changed from -Xmx1024m to -Xmx4096m.
The isabelle jedit command-line tool is configured via the Isabelle settings environment. See also $ISABELLE_HOME/src/Tools/etc/settings for some examples of JEDIT_JAVA_OPTIONS, which can be copied to $ISABELLE_HOME_USER/etc/settings and adapted accordingly. It is also possible to monitor JVM performance via jconsole to get an idea if that is actually a source of problems.
Isabelle/Scala: Isabelle bundles a certain JVM, which is assumed here by default. Eliminating the Java version as a variable is important to regain some sanity; otherwise you never know what you get. Are you sure that your OpenJDK is actually used here? It is unlikely, unless you have edited some Isabelle settings.
A further source of performance problems on Linux is graphics. Java/AWT is known to be much slower on X11 than on Windows and Mac OS X. Using the quasi-native GTK look-and-feel on Linux degrades graphics performance even further.
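To see what a heap option such as -Xmx4096m actually does, a small standalone program can print the heap limits its own JVM was started with. This is only an illustrative sketch: `HeapReport` is a hypothetical class and it reports on the JVM it runs in, not on the JVM of a running Isabelle/jEdit session (for that, jconsole remains the tool mentioned above).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapReport {
    /** Converts a byte count to whole mebibytes. */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Summarises the current heap usage of the JVM running this code. */
    public static String describe() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the JVM reports no defined maximum
        String maxText = max < 0 ? "undefined" : toMiB(max) + " MiB";
        return "heap used=" + toMiB(heap.getUsed()) + " MiB"
                + ", committed=" + toMiB(heap.getCommitted()) + " MiB"
                + ", max=" + maxText;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once as `java HeapReport` and once as `java -Xmx4096m HeapReport` should show the maximum changing. The reported maximum may be somewhat below the -Xmx value, depending on the garbage collector.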
So I sent a friend a copy of my implementation of Conway's Game of Life. When he received it, he complained that my application [on the super-sampled grid size, with 0 delay] was barely getting 1 generation/second. I responded that on my computer, with approximately the same amount of filled grid spaces, I was getting around 38 generations/second. I couldn't attribute this disparity to different monitor sizes, as mine was more or less the same size, and the grids were therefore similarly sized. I ran my program from the jar as well, for consistency's sake.
Here's the kicker: His computer is running an AMD Phenom II X6 1090T processor @ 3.2 GHz (6 cores), with 8 GB of RAM. My computer is running an i7-4700MQ @ 2.4 GHz (quad-core), with 8 GB of RAM. He also has an Nvidia GeForce GT 440, vs. my Intel integrated graphics.
It is beyond me how my computer can so profoundly outperform his, despite being inferior in every statistic. Does anyone know what could cause this? I am guessing it's something to do with the differences in processor architecture, but I'm no expert. Below is a link to the GitHub page for my project, in case you want to compile and test it yourself.
https://github.com/JoeAzar/CGOL-v1.3.2/tree/master
Does anyone know what could cause this? I am guessing it's something to do with the differences in processor architecture, but I'm no expert.
Well I'm only guessing too, but it could be things like:
Different operating systems
Different versions / releases of Java
32 bit versus 64 bit issues, at the JVM or OS level.
Different JVM parameters; e.g. those that affect heap size.
Differences in paging disk speed / latency (if the application is paging).
Other stuff running on the machine (e.g. resource hogging AV software)
It could also be processor architecture related ... as you postulated ... though I'd put that well down the list of possible reasons.
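Most of the items on that list can be checked by running the same small program on both machines and comparing the output. A minimal sketch, using only standard system properties and the runtime MXBean; `JvmEnvironment` is a hypothetical class name, and `sun.arch.data.model` is a HotSpot-specific property that may be absent (printed as null) on other JVMs.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmEnvironment {
    /** Collects OS, JVM, bitness, heap and CPU facts for the running JVM. */
    public static String describe() {
        StringBuilder sb = new StringBuilder();
        String[] keys = {
            "os.name", "os.version", "os.arch",
            "java.version", "java.vm.name",
            "sun.arch.data.model" // HotSpot-specific: "32" or "64", may be null elsewhere
        };
        for (String key : keys) {
            sb.append(key).append('=').append(System.getProperty(key)).append('\n');
        }
        Runtime rt = Runtime.getRuntime();
        sb.append("availableProcessors=").append(rt.availableProcessors()).append('\n');
        sb.append("maxMemoryMiB=").append(rt.maxMemory() / (1024L * 1024L)).append('\n');
        // Flags passed to the JVM on startup, e.g. -Xmx or -server.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        sb.append("jvmArgs=").append(jvmArgs).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

It does not cover paging or other software running on the machine; those need the operating system's own monitoring tools.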
I recently ran into a deployment issue with a call to Mac.getInstance("HmacSHA1").
It can take up to 10 minutes to execute that single call on this specific server, whilst on other machines its execution is instant.
CPU usage also spikes during the call.
Here are some details on the server:
OS: CentOS 5.6 Final (kernel 2.6.35.8-16, i686);
JVM: Sun's JDK 1.6.0_25 (32bit);
CPU: Intel Core2 Duo CPU (E8400 @ 3.00GHz);
Mem: 2GB of RAM;
Dedicated physical server.
Any clues on what might be the problem here?
I suspect you're low on system entropy for secure random numbers. See this page to check: Check available entropy in Linux. And this question has answers to consider: How to solve performance problem with Java SecureRandom? In particular this Java option should help you:
-Djava.security.egd=file:/dev/./urandom
It's much faster, but slightly less secure.
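One way to test this suspicion is to time the suspect calls in isolation, once with and once without that option. The sketch below is an assumption about how to measure, not a confirmed diagnosis: `HmacStartupCheck` is a hypothetical class, and it times the first use of SecureRandom separately from the HmacSHA1 call so the two can be told apart.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class HmacStartupCheck {
    /** Computes an HMAC-SHA1 tag; wraps checked crypto exceptions. */
    public static byte[] hmacSha1(byte[] key, byte[] message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(key, "HmacSHA1"));
            return mac.doFinal(message);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 unavailable or key rejected", e);
        }
    }

    /** Milliseconds elapsed since a System.nanoTime() reading. */
    public static long elapsedMillis(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000L;
    }

    public static void main(String[] args) {
        // First use of SecureRandom triggers seeding, which may block on low entropy.
        long t0 = System.nanoTime();
        byte[] probe = new byte[16];
        new SecureRandom().nextBytes(probe);
        System.out.println("SecureRandom first bytes: " + elapsedMillis(t0) + " ms");

        // Time provider lookup plus one HMAC computation (arbitrary test key and message).
        long t1 = System.nanoTime();
        byte[] tag = hmacSha1(new byte[] {1, 2, 3}, new byte[] {4, 5, 6});
        System.out.println("HmacSHA1 (" + tag.length + " bytes): " + elapsedMillis(t1) + " ms");
    }
}
```

If the first timing drops sharply when the program is started with -Djava.security.egd=file:/dev/./urandom, entropy starvation is a likely cause. If only the second timing is slow, the delay lies elsewhere, for example in provider loading.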
Hi, I'm trying to test my Java app on Solaris SPARC and I'm getting some weird behavior. I'm not looking for flame wars. I'm just curious to know what is happening or what is wrong...
I'm running the same JAR on Intel and on the T1000. On the Windows machine I'm able to get 100% CPU utilisation (Performance Monitor), while on the Solaris machine I can only get 25% (prstat).
The application is a custom server app I wrote that uses netty as the network framework.
On the Windows machine I'm able to reach just above 200 requests/responses a second, including full business logic and access to outside 3rd parties, while on the Solaris machine I get about 150 requests/responses at only 25% CPU.
One can only imagine how many more requests/responses I could get out of the SPARC if I could make it use its full power.
The servers are...
Windows 2003 SP2 64-bit, 8GB, 2.39GHz Intel 4 core
Solaris 10.5 64-bit, 8GB, 1GHz 6 core
Both use JDK 1.6u21.
Any ideas?
The T1000 uses a multi-core CPU, which means that the CPU can run multiple threads simultaneously. If the CPU is at 100% utilization, it means that all cores are running at 100%. If your application uses fewer threads than the number of cores, then it cannot use all the cores, and therefore cannot use 100% of the CPU.
Without any code, it's hard to help out. Some ideas:
Profile the Java app on both systems, and see where the difference is. You might be surprised. Because the T1 CPU lacks out-of-order execution, you might see performance lacking in strange areas.
As Erick Robertson says, try bumping up the number of threads to the number of virtual cores reported via prstat, NOT the number of regular cores. The T1000 uses UltraSparc T1 processors, which make heavy use of thread-level parallelism.
Also, note that you're using the latest-gen Intel processors and old Sun ones. I highly recommend reading Developing and Tuning Applications on UltraSPARC T1 Chip Multithreading Systems and Maximizing Application Performance on Chip Multithreading (CMT) Architectures, both by Sun.
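The suggestion about matching the thread count to the virtual cores can be sketched as follows. This is a generic illustration, not the asker's netty setup: `SaturateCores` and `busyWork` are hypothetical, and the sketch assumes that `Runtime.availableProcessors()` reports the hardware threads visible to the JVM, which is normally the case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SaturateCores {
    /** Hypothetical CPU-bound task: integer arithmetic only, no I/O. */
    public static long busyWork(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += (i * 31L) ^ (acc >>> 3);
        }
        return acc;
    }

    /** Runs tasks on a pool with one worker per hardware thread; returns tasks completed. */
    public static int runOnAllHardwareThreads(int tasksPerThread, int iterations) {
        int hardwareThreads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(hardwareThreads);
        try {
            Callable<Long> task = () -> busyWork(iterations);
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < hardwareThreads * tasksPerThread; i++) {
                futures.add(pool.submit(task));
            }
            int completed = 0;
            for (Future<Long> f : futures) {
                f.get(); // wait for each task to finish
                completed++;
            }
            return completed;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int done = runOnAllHardwareThreads(4, 50_000_000);
        System.out.println("completed " + done + " tasks on "
                + Runtime.getRuntime().availableProcessors() + " hardware threads");
    }
}
```

Watching prstat while this runs gives a baseline: if a pure CPU-bound pool of this size reaches close to 100% but the server app does not, the app's own thread configuration or blocking I/O is the more likely limit.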
This is quite an old question now, but we ran across similar issues.
An important fact to notice is that the Sun T1000 is based on the UltraSPARC T1 processor, which has only a single FPU shared by its 8 cores.
So if your application does a lot of, or even some, floating-point calculation, then this might become an issue, as the FPU will become the bottleneck.