Hi, I'm trying to test my Java app on Solaris SPARC and I'm getting some weird behavior. I'm not looking for flame wars; I'm just curious to know what is happening or what is wrong...
I'm running the same JAR on Intel and on the T1000. On the Windows machine I'm able to get 100% CPU utilisation (per Performance Monitor), while on the Solaris machine I can only get 25% (per prstat).
The application is a custom server app I wrote that uses netty as the network framework.
On the Windows machine I'm able to reach just above 200 requests/responses a second, including full business logic and access to outside third parties, while on the Solaris machine I get about 150 requests/responses at only 25% CPU.
One can only imagine how many more requests/responses I could get out of the SPARC if I can make it use its full power.
The servers are...
Windows 2003 SP2 x64, 8GB RAM, 2.39GHz Intel 4-core
Solaris 10.5 64-bit, 8GB RAM, 1GHz 6-core
Both are running JDK 1.6u21.
Any ideas?
The T1000 uses a multi-core CPU, which means the CPU can run multiple threads simultaneously. If the CPU is at 100% utilization, all cores are running at 100%. If your application uses fewer threads than the number of cores, it cannot occupy all the cores, and therefore cannot use 100% of the CPU.
Without any code, it's hard to help out. Some ideas:
Profile the Java app on both systems, and see where the difference is. You might be surprised. Because the T1 CPU lacks out-of-order execution, you might see performance lacking in strange areas.
As Erick Robertson says, try bumping up the number of threads to the number of virtual cores reported via prstat, NOT the number of regular cores. The T1000 uses UltraSparc T1 processors, which make heavy use of thread-level parallelism.
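To illustrate the point, here's a minimal sketch (not Netty-specific; the fixed-size executor is just illustrative) of sizing a worker pool from the hardware thread count instead of a value tuned on the Intel box:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolSizing {
        public static void main(String[] args) {
            // On the T1000, availableProcessors() reports hardware threads
            // (cores x threads per core, e.g. 24 on a 6-core T1), not the
            // physical core count.
            int hwThreads = Runtime.getRuntime().availableProcessors();
            System.out.println("Hardware threads: " + hwThreads);

            // Size the worker pool to the hardware thread count rather
            // than a number hard-coded for the 4-core Intel machine.
            ExecutorService workers = Executors.newFixedThreadPool(hwThreads);
            // ... submit request-handling tasks to 'workers' ...
            workers.shutdown();
        }
    }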
Also, note that you're using the latest-gen Intel processors and old Sun ones. I highly recommend reading Developing and Tuning Applications on UltraSPARC T1 Chip Multithreading Systems and Maximizing Application Performance on Chip Multithreading (CMT) Architectures, both by Sun.
This is quite an old question now, but we ran across similar issues.
An important fact to note is that the Sun T1000 is based on the UltraSPARC T1 processor, which has only a single FPU shared by all 8 cores.
So if your application does a lot of (or even some) floating-point calculation, this can become an issue, as the FPU will become the bottleneck.
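A crude way to check whether this applies to you is to compare integer and floating-point throughput on both boxes. This is only an indicative micro-benchmark, not a proper harness:

    public class FpuCheck {
        public static void main(String[] args) {
            // Time a pure integer loop...
            long t0 = System.nanoTime();
            long acc = 1;
            for (int i = 1; i < 100000000; i++) acc = acc * 31 + i;
            long intNs = System.nanoTime() - t0;

            // ...against an equivalent floating-point loop. On a T1 the
            // double loop is expected to lag far behind, since all cores
            // funnel FP work through the one shared FPU.
            t0 = System.nanoTime();
            double facc = 1.0;
            for (int i = 1; i < 100000000; i++) facc = facc * 1.0000001 + i;
            long fpNs = System.nanoTime() - t0;

            // Print the accumulators so the JIT cannot eliminate the loops.
            System.out.println(acc + " " + facc);
            System.out.printf("int: %d ms, double: %d ms%n",
                    intNs / 1000000, fpNs / 1000000);
        }
    }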
Intro
So far, I have been working on a piece of software which I am now testing to see the benefit of concurrency. I am testing the same software using two different systems:
System 1: 2 x Intel(R) Xeon(R) CPU E5-2665 @ 2.40GHz with a total of 16 cores, 64GB of RAM, running Scientific Linux 6.1 and Java SE Runtime Environment (build 1.7.0_11-b21).
System 2: Lenovo ThinkPad T410 with an Intel i5 processor @ 2.67GHz with 4 cores, 4GB of RAM, running Windows 7 64-bit and Java SE Runtime Environment (build 1.7.0_11-b21).
Details: The program simulates patients with type 1 diabetes. It does some import (reading from CSV), some numerical computation (Dopri54 + Newton) and some export (writing to CSV).
I have exclusive rights to the server, so there should be no noise at all.
Results
These are my results:
Now, as you can see, system 1 is just as fast as system 2, despite being a pretty powerful machine. I have no idea why this is the case, and I am confident that the software is the same. The number of threads goes from 10 to 100.
Question:
Why do the two runs have similar execution times despite system 1 being significantly more powerful than system 2?
UPDATE!
Now, I've thought a bit about what you guys said about it being an I/O or memory issue. So I thought that if I could reduce the file size, it would speed up the program, right? I managed to reduce the import file size by a factor of 5; however, there was no performance improvement at all. Do you guys still think it is the same problem?
As you write .csv files, it is possible that the bottleneck is not your computation power, but the writing rate of your hard disk.
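If that turns out to be the case, one cheap thing to verify is that the CSV output is buffered, so many small writes are batched into few large ones. A minimal sketch (the file name and row format are made up for illustration):

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.IOException;

    public class CsvExport {
        public static void main(String[] args) throws IOException {
            // A BufferedWriter batches many small writes into fewer,
            // larger system calls, which often matters far more here
            // than raw CPU speed.
            BufferedWriter out = new BufferedWriter(
                    new FileWriter("results.csv"), 1 << 16); // 64 KB buffer
            try {
                for (int row = 0; row < 1000; row++) {
                    out.write(row + "," + Math.sqrt(row));
                    out.newLine();
                }
            } finally {
                out.close(); // flushes the remaining buffer
            }
        }
    }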
Almost certainly this means that either CPU time is not the bottleneck for this application, or that something about it is making it resistant to effective parallelization, or both.
For example if reading the data from disk is actually the limiting factor then faster disks are what matters, not faster processors.
If it's running out of memory, then that will be the bigger bottleneck.
If it takes more time to spawn each thread than to do the actual processing inside the thread, parallelization will hurt rather than help.
etc.
In this sort of optimization work, metrics are king. You need real, hard numbers for how long things take and where in your program you are losing that time. Only then can you see where to focus your efforts and whether they are effective.
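As a starting point, here is a minimal sketch that times each phase separately; the three method names are placeholders for the import/compute/export steps described above:

    public class PhaseTiming {
        public static void main(String[] args) {
            // Time each phase separately so you can see whether import,
            // computation, or export dominates on each machine.
            long t0 = System.nanoTime();
            importCsv();            // placeholder for the real CSV import
            long t1 = System.nanoTime();
            simulatePatients();     // placeholder for Dopri54 + Newton
            long t2 = System.nanoTime();
            exportCsv();            // placeholder for the real CSV export
            long t3 = System.nanoTime();

            System.out.printf("import: %d ms, compute: %d ms, export: %d ms%n",
                    (t1 - t0) / 1000000, (t2 - t1) / 1000000, (t3 - t2) / 1000000);
        }

        private static void importCsv() { /* ... */ }
        private static void simulatePatients() { /* ... */ }
        private static void exportCsv() { /* ... */ }
    }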
I have developed a Java application that normally runs on Linux. It's a POJO application with Swing. Performance is reasonably good.
Now I've tried to run it on Windows XP with 2GB RAM, on a machine of similar or greater power, and performance is much worse. I observe that it uses 100% CPU.
For example:
A process that creates a very heavy window with many components: 5 seconds on Linux, 12 on Windows.
A process that runs a heavy query against a PostgreSQL DB (the server and the JDBC driver are the same): 23 seconds on Linux, 43 on Windows.
I tried also with a virtualized Windows machine with similar features, and the result is significantly better!
Is it normal? What parameters can I assign to improve performance?
Unless you are comparing Linux and Windows XP on the same machine, it is very hard to say what the difference is. It could be that while the CPU is faster, the graphics card and disk subsystem are slower.
Java passes all of this I/O and graphics activity to the underlying OS, so the only thing you can do differently is to do less work, or to work more efficiently. This is likely to make both systems faster, as there is nothing particular to one OS that you can tune.
Try running Java Visual VM (which is distributed as part of the JDK): attach to your application, then use the CPU Profiler to determine precisely where all that CPU time is going.
There may be subtle differences in the behavior of JRE parts (Swing comes to mind), where the JRE responds very unforgivingly to bad practice (like doing things from the wrong thread in Swing).
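For reference, the correct Swing pattern looks like the sketch below (the window contents are made up); building or updating components from any thread other than the Event Dispatch Thread is the classic mistake that works tolerably on one platform and crawls or glitches on another:

    import javax.swing.JFrame;
    import javax.swing.JLabel;
    import javax.swing.SwingUtilities;

    public class EdtExample {
        public static void main(String[] args) {
            // All Swing component creation and updates belong on the
            // Event Dispatch Thread, scheduled via invokeLater.
            SwingUtilities.invokeLater(new Runnable() {
                public void run() {
                    JFrame frame = new JFrame("Correct EDT usage");
                    frame.add(new JLabel("Built on the Event Dispatch Thread"));
                    frame.pack();
                    frame.setVisible(true);
                }
            });
        }
    }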
Since you have no clues, I would try profiling the same use case in both environments and see if any significant differences turn up where the time is spent. This will hopefully reveal a hint.
Edit: And ensure that you do not run Windows with brakes on (aka. Antivirus and other 'useful' software that can kill system performance).
On many forums I found that people use Solaris for their Java applications.
I'm interested in what the main advantages of such a combination are.
My first assumption is that Solaris is very fast.
I also found out that on Solaris it is possible to match Java threads one-to-one with kernel threads; as I understand it, this again results in very fast thread creation.
Please correct me if I'm wrong and are there any other main points?
What Solaris gives you (as its Software not hardware) over Linux or Windows is greater system manageability and low level tracing like DTrace.
What you appear to be asking about is having more threads running concurrently, which is a feature of the hardware. If you run Solaris x86, Linux or Windows on the same hardware, you will have the same number of logical threads. However, some SPARC processors that run Solaris have lots of logical threads (32 or more) running concurrently, which reduces overhead if you actually need that many threads.
The SPARC T3 (http://en.wikipedia.org/wiki/SPARC_T3) supports up to 512 logical threads across 16 cores. This can really improve performance where you have a need for so many threads, e.g. when using many blocking IO connections (see the sketch below).
However, if you need only one to six critical threads (and a bunch of non-critical threads), a plain x64 processor will be much faster, and cheaper, as it is designed to handle fewer threads faster and is mass-produced on a larger scale.
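The classic workload that benefits from all those hardware threads is a thread-per-connection blocking server. A minimal sketch (the port and echo logic are purely illustrative):

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class BlockingEchoServer {
        public static void main(String[] args) throws IOException {
            ServerSocket server = new ServerSocket(8080);
            while (true) {
                final Socket client = server.accept();
                // One thread per connection: each thread spends most of
                // its time blocked on I/O, which is exactly the workload
                // a CMT chip with hundreds of hardware threads absorbs
                // cheaply.
                new Thread(new Runnable() {
                    public void run() {
                        try {
                            int b;
                            while ((b = client.getInputStream().read()) != -1) {
                                client.getOutputStream().write(b);
                            }
                        } catch (IOException ignored) {
                        } finally {
                            try { client.close(); } catch (IOException e) { }
                        }
                    }
                }).start();
            }
        }
    }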
We use Solaris for java applications at my workplace. I do not know about any exact performance advantage, but the reasons we decided to use Solaris were:
Solaris Service Management Facility (http://www.oreillynet.com/pub/a/sysadmin/2006/04/13/using-solaris-smf.html)
Ability to copy the entire zone backup to another box in case of HW failure.
We run application servers such as WebLogic, and it helps that SMF starts them back up if they crash for any reason. Also, we back up our zones at regular intervals, and from what I hear, a zone can be moved to another machine in case of HW failure, bringing the application back to normal.
We have developed an application using Java 6 on 32-bit Windows (dual core & 3GB RAM).
If we install it on a 64-bit Windows OS, will it perform better because of the resource advantages of 64-bit (same OS, different bitness)? The 64-bit machine has a quad-core processor and more than 4GB of RAM. Is there any difference in the JVM between 32-bit and 64-bit?
Thank you in advance for your feedback.
Extra info
I am building a Security Information and Event Management (SIEM) system for log management.
We have 4 important parts:
Collector - to collect logs from devices/systems,
Aggregator - to aggregate the syslog into metadata for reporting,
Real-Time Monitoring - to display real-time analysis reports/charts and a dashboard that must update every second,
GUI - a Struts2 app that runs the web GUI, log analytics, backup and other things.
So far the most CPU and memory are used by: 1. Collector, 2. Real-Time Monitoring, 3. Aggregator.
Right now, on 32-bit, the collector can receive up to 2000 logs per second; beyond that it crashes with a heap out-of-memory error. So we use the Tanuki Software wrapper to split the memory usage and automatically restart the collector service once heap exhaustion is detected.
Our objective is to increase throughput from 2000 logs per second to the maximum possible by using the advantages of 64-bit.
For GC we let Java handle it automatically; what matters most is that we can process more logs per second without any problem.
Switching to a 64-bit JVM doesn't guarantee any performance differences. You will, however see a huge difference in the amount of RAM that can be allocated. On 32-bit Windows, the maximum amount of RAM that could be allocated for the heap maxed out at around 1.6 GB.
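You can check what ceiling the JVM actually grants with a quick sketch (run it with, e.g., -Xmx6g on the 64-bit JVM to see the difference):

    public class HeapCheck {
        public static void main(String[] args) {
            // maxMemory() reports the heap ceiling (-Xmx) the JVM will
            // use; on 32-bit Windows this tops out around 1.6 GB no
            // matter what you request, while a 64-bit JVM can go far
            // higher.
            long maxBytes = Runtime.getRuntime().maxMemory();
            System.out.printf("Max heap: %.1f GB%n",
                    maxBytes / (1024.0 * 1024 * 1024));
        }
    }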
If you see a lot of swapping with your application on the 32-bit machine, then switching to the 64-bit machine and adding sufficient RAM is likely to improve your performance. You might also be able to make design choices that favor faster, but more memory hungry algorithms where such choices exist.
As of this writing, you will probably not see significant difference between running your app on a 32-bit JVM and a 64-bit JVM on the exact same hardware. Eventually, support for 32-bit operating systems and JVMs will probably be discontinued, but that's a different concern than performance.
I strongly recommend you start out by profiling your app first to see where your performance hot spots are.
It's a common misconception that 64-bit automatically means better performance than 32-bit. See e.g. this JVM faq and this MS Windows 7 FAQ.
It really depends on the nature of your application and where your performance bottlenecks are.
If you have relatively un-tuned garbage collection, and your application is latency sensitive (i.e. must respond to a user request such as an http request quickly), adding more memory can actually worsen your GC pauses.
Is your application multi-threaded, as most web servers are? If so, going from 2 to 4 cores will very likely help if you don't have significant locking / contention issues.
If you look into GC tuning, you might want to try parallel GC on the 4 core cpu. This can significantly reduce GC pause times while incurring some extra overhead. For a latency sensitive app I worked on this was definitely worth it.
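To see what GC is actually costing you, you can read the standard management beans after a run. A minimal sketch (enable the collector under test with, e.g., -XX:+UseParallelGC):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcStats {
        public static void main(String[] args) {
            // ... run your workload here, then dump cumulative GC cost ...
            for (GarbageCollectorMXBean gc :
                    ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.printf("%s: %d collections, %d ms total%n",
                        gc.getName(), gc.getCollectionCount(),
                        gc.getCollectionTime());
            }
        }
    }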
Please feel free to reply with more info; we could use some context on your app, its workload, in-memory working set, etc.
I made the observation that my java application is running much faster when executed on an AMD processor in contrast to an Intel CPU.
For example, my JBoss starts in about 30 seconds on a 3GHz AMD processor but needs about 60 seconds on a 3GHz Intel processor with identical disk, RAM and OS.
Has anyone else made this observation? Why is this so?
It depends on the CPU generation as well - clock speed is not everything.
If you set up e.g. an Intel Pentium 4 and an AMD Phenom with the same clock speed, you'll see a large difference in favour of the Phenom.
Update: If you're really curious, use a profiler and post the results.
Other considerations:
Size of processor on-board cache
Bus speed of your motherboard
Cache size of your hard drive
Hard drive RPM and read speed
Bottom line: unless your configurations are identical besides the chips, and you are trying to assess the performance of a particular technology, you're really comparing apples to oranges.
Are they both running the same architecture? Or is the AMD running a 64-bit OS?
Remember that startup time isn't everything; a 60s startup time probably isn't that bad if the application runs as fast AFTER it's started up.
I've seen 64-bit JDK work much faster than 32-bit one on the same processor. So maybe that's the case.
EDIT: http://java.sun.com/docs/hotspot/HotSpotFAQ.html#64bit_performance. Sorry, I guess I'm wrong.