Strange performance issues of Java VM

Please take a look at this code:
import java.util.Arrays;
import java.util.Calendar;

public static void main(String[] args) {
    String[] array = new String[10000000];
    Arrays.fill(array, "Test");
    long startNoSize;
    long finishNoSize;
    long startSize;
    long finishSize;
    for (int called = 0; called < 6; called++) {
        startNoSize = Calendar.getInstance().getTimeInMillis();
        for (int i = 0; i < array.length; i++) {
            array[i] = String.valueOf(i);
        }
        finishNoSize = Calendar.getInstance().getTimeInMillis();
        System.out.println(finishNoSize - startNoSize);
    }
    System.out.println("Length saved");
    int length = array.length;
    for (int called = 0; called < 6; called++) {
        startSize = Calendar.getInstance().getTimeInMillis();
        for (int i = 0; i < length; i++) {
            array[i] = String.valueOf(i);
        }
        finishSize = Calendar.getInstance().getTimeInMillis();
        System.out.println(finishSize - startSize);
    }
}
The execution result differs from run to run, but a strange pattern can be observed:
6510
4604
8805
6070
5128
8961
Length saved
6117
5194
8814
6380
8893
3982
Generally, there are three results, roughly 6, 4, and 8 seconds, and they repeat in the same order.
Does anyone know why this happens?
UPDATE
After some playing with the -Xms and -Xmx JVM options, the following results were observed:
The total heap size must be at least 1024m for this code, otherwise there will be an OutOfMemoryError. The -Xms option influences the execution time of the for blocks: it varies between 10 seconds for -Xms16m and 4 seconds for -Xms256m.
The question is: why does the initial heap size affect every iteration, and not only the first one?
Thank you in advance.

Microbenchmarking in Java is not trivial. A lot of things happen in the background when we run a Java program, garbage collection being a prime example. There might also be a context switch from your Java process to another process. IMO, there is no definite explanation for why there is a sequence in the seemingly random times generated.

This is not entirely unexpected. There are all sorts of factors that could be affecting your numbers.
See: How do I write a correct micro-benchmark in Java?
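As a sketch of the minimum precautions (hypothetical class and numbers; a harness like JMH does this properly): warm the code up before timing it so JIT compilation happens first, and consume the result so the JIT cannot eliminate the work as dead code.

```java
public class BenchSketch {
    // some work whose result we can check: sum of 0..n-1
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        long sink = 0;
        // warm-up: let the JIT compile work() before we time it
        for (int i = 0; i < 10_000; i++) sink += work(1_000);
        long start = System.nanoTime();
        for (int i = 0; i < 1_000; i++) sink += work(1_000);
        long elapsed = System.nanoTime() - start;
        // print the sink so the JIT cannot discard the computation
        System.out.println("elapsed ns: " + elapsed + " (sink=" + sink + ")");
    }
}
```

Even this only mitigates the most basic pitfalls; GC pauses and OS scheduling still add noise, which is why repeated runs and a real benchmarking harness are preferable.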

Related

Performance difference between Java direct array index access vs. for loop access

I was experimenting with predicates. I tried to implement a predicate for serialization issues in distributed systems. I wrote a simple example where the test function just returns true. While measuring the overhead, I stumbled upon this interesting problem: accessing the array in a for loop is 10 times slower than accessing it directly.
class Test {
    public boolean test(Object o) { return true; }
}

long count = 1000000000l;
Test[] test = new Test[3];
test[0] = new Test();
test[1] = new Test();
test[2] = new Test();
long milliseconds = System.currentTimeMillis();
for (int i = 0; i < count; i++) {
    boolean result = true;
    Object object = new Object();
    for (int j = 0; j < test.length; j++) {
        result = result && test[j].test(object);
    }
}
System.out.println((System.currentTimeMillis() - milliseconds));
However, the following code is almost 10 times faster. What can be the reason?
milliseconds = System.currentTimeMillis();
for (int i = 0; i < count; i++) {
    Object object = new Object();
    boolean result = test[0].test(object) && test[1].test(object) && test[2].test(object);
}
System.out.println((System.currentTimeMillis() - milliseconds));
Benchmark results on my i5.
4567 msec for for loop access
297 msec for direct access
Due to the predictable result of test(Object o), the compiler is able to optimize the second piece of code quite effectively. The inner loop in the first piece of code makes this optimization impossible.
Compare the result with the following Test class:
static class Test {
    public boolean test(Object o) {
        return Math.random() > 0.5;
    }
}
... and the loops:
long count = 100000000l;
Test[] test = new Test[3];
test[0] = new Test();
test[1] = new Test();
test[2] = new Test();
long milliseconds = System.currentTimeMillis();
for (int i = 0; i < count; i++) {
    boolean result = true;
    Object object = new Object();
    for (int j = 0; j < test.length; j++) {
        result = result && test[j].test(object);
    }
}
System.out.println((System.currentTimeMillis() - milliseconds));
milliseconds = System.currentTimeMillis();
for (int i = 0; i < count; i++) {
    Object object = new Object();
    boolean result = test[0].test(object) && test[1].test(object) && test[2].test(object);
}
System.out.println((System.currentTimeMillis() - milliseconds));
Now both loops require almost the same time:
run:
3759
3368
BUILD SUCCESSFUL (total time: 7 seconds)
p.s.: check out this article for more about JIT compiler optimizations.
You are committing almost every basic mistake you can make with a microbenchmark.
You don't ensure the code cannot be optimized away by actually using the result of the calculation.
Your two code branches have subtly but decidedly different logic (as pointed out, variant two will always short-circuit). The second case is also easier for the JIT to optimize because test() returns a constant.
You did not warm up the code, so JIT compilation time is included somewhere in the measured execution time.
Your testing code does not account for the execution order of the test cases influencing the results. It's not fair to run case 1 and then case 2 with the same data and objects: by the time case 2 runs, the JIT will have optimized the test method and collected runtime statistics about its behavior (at the expense of case 1's execution time).
If a loop-header evaluation takes one unit of time, then in the first solution the loop headers cost 3N units, while with direct access they cost only N.
Besides the loop-header overhead, the first solution evaluates three && conditions per iteration, while the second evaluates only two.
And last but not least, boolean short-circuit evaluation causes your second, faster example to stop testing the conditions "prematurely": the entire expression evaluates to false as soon as one && operand is false.
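Putting those points together, a corrected sketch of the first case might look like this (hypothetical class and method names; a real harness such as JMH would still be preferable): the benchmarked code lives in its own method, is warmed up before timing, and returns a value that is actually used.

```java
public class PredicateBench {
    static class Test {
        public boolean test(Object o) { return true; }
    }

    static final Test[] tests = { new Test(), new Test(), new Test() };

    // the benchmarked case in its own method, so the JIT compiles it separately
    static int loopAccess(long count) {
        int trues = 0;
        Object object = new Object();
        for (long i = 0; i < count; i++) {
            boolean result = true;
            for (int j = 0; j < tests.length; j++) {
                result = result && tests[j].test(object);
            }
            if (result) trues++; // the result is consumed, not discarded
        }
        return trues;
    }

    public static void main(String[] args) {
        // warm-up runs before the timed run, so JIT time is not measured
        for (int w = 0; w < 5; w++) loopAccess(100_000);
        long start = System.nanoTime();
        int r = loopAccess(1_000_000);
        System.out.println((System.nanoTime() - start) / 1_000_000 + " ms, result=" + r);
    }
}
```

Returning and printing `r` is the key difference from the original: the loop can no longer be eliminated as dead code, so both variants measure real work.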

Working of size() method of list in java

I have 2 pieces of code and first part is here
int count = myArrayList.size();
for (int a = 0; a < count; a++) {
    // any calculation
}
Second part of the code is
for (int a = 0; a < myArrayList.size(); a++) {
    // any calculation
}
In both pieces I am iterating over myArrayList (an ArrayList). In the first piece I calculate the size once and then iterate, meaning the size method is called only once. In the second piece, on every iteration the size is recalculated and then compared. Isn't that a longer process? Yet I have seen many examples in many places that call size() on every iteration.
My questions:
Isn't it a long process? (talking about the second piece)
Which is best practice, the first or the second?
Which is the more efficient way to iterate?
How does myArrayList.size() work, i.e. how does it calculate the size?
EDIT:
To test this I wrote programs and measured the time. The code is:
ArrayList<Integer> myArrayList = new ArrayList<>();
for (int a = 0; a < 1000; a++) {
    myArrayList.add(a);
}
long startTime = System.nanoTime();
for (int a = 0; a < myArrayList.size(); a++) {
    // any calculation
}
long lastTime = System.nanoTime();
long result = lastTime - startTime;
and the result is = 34490 nanoseconds
On the other hand:
ArrayList<Integer> myArrayList = new ArrayList<>();
for (int a = 0; a < 1000; a++) {
    myArrayList.add(a);
}
long startTime = System.nanoTime();
int count = myArrayList.size();
for (int a = 0; a < count; a++) {
}
long endTime = System.nanoTime();
long result = endTime - startTime;
and the result is = 11394 nanoseconds
So calling size() on every iteration takes much more time than not calling it on every iteration. Is this the right way to measure the time?
No. The call is not a "long running" process; the JVM can make method calls quickly.
Either is acceptable. Prefer the one that's easier to read. Adding a local variable with a meaningful name can make the code easier to read.
You might prefer the for-each loop, but for readability1. There is no appreciable efficiency difference between your options (or the for-each).
The ArrayList implementation keeps an internal count (in the OpenJDK implementation, and probably others, that is the size field) and manages the internal array that backs the List.
1 See also The Developer Insight Series, Part 1: Write Dumb Code
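To illustrate the last point, here is a simplified sketch (not the actual OpenJDK source) of how an ArrayList-like class tracks its size: size() is a plain field read, not a recount of the elements, so calling it in a loop condition costs essentially nothing.

```java
// Simplified sketch of an ArrayList-like class; the real OpenJDK ArrayList
// is more elaborate, but size() works the same way.
class SimpleList<E> {
    private Object[] elements = new Object[10];
    private int size = 0; // maintained on every add

    public void add(E e) {
        if (size == elements.length) {
            // grow the backing array when full
            elements = java.util.Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = e;
    }

    public int size() {
        return size; // O(1): just returns the counter
    }
}
```

Since both loop forms therefore do essentially the same work, readability is the deciding factor.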

All java threads are running on a single core, ultimately taking too much time to execute

This is my homework problem:
I have to do matrix multiplication. My code should create a thread for each element in the resultant matrix, i.e., if the resultant matrix is m×n, there should be m*n threads.
(http://iit.qau.edu.pk/books/OS_8th_Edition.pdf page 179)
Yes, I finished it on my own, and it runs fine on all my test cases. Unfortunately, I ended up with only 70% credit :(
This is my code and test cases.
Matrix Multiplication.zip
When I met my professor about my marks, he told me that my code takes too long to execute on larger matrices.
I argued that this is expected behavior, since larger data obviously takes more time. However, he disagreed with me.
I attached my code and test cases. My code takes 3 hours; according to my professor, it should take only 5 minutes.
I have tried to figure it out for the last couple of days, but I couldn't find the exact cause :(
Outline of my code:
ExecutorService executor = Executors.newFixedThreadPool(threadCount); // create thread pool with fixed threads
int mRRowLen = matrix1.length;    // result rows length
int mRColLen = matrix2[0].length; // result columns length
mResult = new long[mRRowLen][mRColLen];
for (int i = 0; i < mRRowLen; i++) {     // rows from m1
    for (int j = 0; j < mRColLen; j++) { // columns from m2
        Runnable r = new MatrixMultipliactionThread(matrix1ColLen, i, j, matrix1, matrix2);
        executor.execute(r);
    }
}
executor.shutdown();
while (!executor.isTerminated()) {
    // wait until it gets shut down
}
The run method:
public void run() {
    for (int k = 0; k < m1ColLength; k++) { // columns from m1
        Matrix.mResult[i][j] += matrix1[i][k] * matrix2[k][j];
    }
}
Thanks in advance.
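As an aside on the outline above: the busy-wait on executor.isTerminated() burns a CPU core doing nothing. The usual pattern is a blocking wait with awaitTermination (a sketch under the same executor setup; the task bodies here are placeholders):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownSketch {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 8; i++) {
            final int task = i;
            executor.execute(() -> System.out.println("task " + task));
        }
        executor.shutdown(); // stop accepting new tasks
        // block until all submitted tasks finish, instead of spinning on isTerminated()
        if (!executor.awaitTermination(1, TimeUnit.HOURS)) {
            executor.shutdownNow(); // give up after the timeout
        }
    }
}
```

This doesn't change the total work done, but it frees the main thread's core for the pool while the tasks run.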
Ok, I downloaded your zip and ran the program. Your problem isn't in the matrix multiplication at all. The advice in the comments about reducing the number of threads is still valid; however, as it stands, the multiplication happens very quickly.
The actual problem is in your writeToAFile method; all the single-threaded CPU utilization you are seeing is actually happening in there, after the multiplication is already complete.
The way you're appending your strings:
fileOutputString = fileOutputString + resultMatrix[i][j]
creates thousands of new String objects which then immediately become garbage; this is very inefficient. You should be using a StringBuilder instead, something like this:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < resultMatrix.length; i++) {
    for (int j = 0; j < resultMatrix[i].length; j++) {
        sb.append(resultMatrix[i][j]);
        if (j != resultMatrix[i].length - 1) sb.append(",");
    }
    sb.append("\n");
}
String fileOutputString = sb.toString();
That code executes in a fraction of a second.
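The difference is easy to demonstrate in isolation (a sketch; the count is arbitrary): repeated + concatenation copies the whole accumulated string on every step, so its cost grows quadratically, while StringBuilder appends into a growable buffer in amortized constant time.

```java
public class ConcatDemo {
    // quadratic: each step copies everything accumulated so far
    static String byConcat(int n) {
        String s = "";
        for (int i = 0; i < n; i++) s = s + i + ",";
        return s;
    }

    // linear: appends into a growable buffer
    static String byBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append(i).append(',');
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 20_000;
        long t0 = System.nanoTime();
        String a = byConcat(n);
        long t1 = System.nanoTime();
        String b = byBuilder(n);
        long t2 = System.nanoTime();
        System.out.println("concat:  " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("builder: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("equal results: " + a.equals(b));
    }
}
```

Both methods produce the identical string; only the amount of intermediate garbage and copying differs, which is exactly what made the original writeToAFile slow.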

getting execution time using getCurrentThreadUserTime()

I'm trying to measure the execution time of a loop, which is a simple Add Matrices.
here's my code:
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// get integers m and n from the user before this
long start, end, time;
int[][] a = new int[m][n];
int[][] b = new int[m][n];
int[][] c = new int[m][n];
start = getUserTime();
for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
        c[i][j] = a[i][j] + b[i][j];
    }
}
end = getUserTime();
time = end - start;

/** Get user time in nanoseconds. */
public long getUserTime() {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    return bean.isCurrentThreadCpuTimeSupported() ?
            bean.getCurrentThreadUserTime() : 0L;
}
The problem is that it sometimes returns 0, for example when I input 1000 as m and n, which means two 1000x1000 matrices are being added. Sometimes it returns 0 and sometimes 15 ms (both keep repeating).
I don't know whether to believe 15 ms or 0, and there is a big difference between them.
I know the accuracy is OS dependent and not truly nanosecond-accurate, but 15 milliseconds is way too much to be an accuracy problem.
EDIT: the goal of this code is to measure CPU performance on the loop, so if possible I want the effect of compiler optimization, OS context switching, etc. to be minimal.
many thanks.
You should use System.nanoTime(). (API Here)
From the documentation:
This method can only be used to measure elapsed time and is not
related to any other notion of system or wall-clock time. The value
returned represents nanoseconds since some fixed but arbitrary origin
time (perhaps in the future, so values may be negative). The same
origin is used by all invocations of this method in an instance of a
Java virtual machine; other virtual machine instances are likely to
use a different origin.
So nanoTime() is fine for measuring your execution time, because every measurement uses the same origin and nanosecond units.
Set the start time to the current nano time.
start = System.nanoTime();
At the end of the loop set the end time to the current nano time
end = System.nanoTime();
To find the difference, which is the time it took to execute, just subtract like you do.
To make it easy, you can just change getUserTime() to return System.nanoTime().
Example:
// get integers m and n from the user before this
long start, end, time;
int[][] a = new int[m][n];
int[][] b = new int[m][n];
int[][] c = new int[m][n];
start = getUserTime();
for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
        c[i][j] = a[i][j] + b[i][j];
    }
}
end = getUserTime();
// You could use Math.abs() here to handle the situation where
// the values could be negative
time = end - start;

/** Get user time in nanoseconds. */
public long getUserTime() {
    return System.nanoTime();
}

Why is there such a difference in time between "long var = System.nanoTime()" and "System.nanoTime()"?

I made a benchmark like this:
for (int i = 0; i < 1000 * 1000; ++i) {
    long var = System.nanoTime();
}
It takes 41 ms on my computer with JDK 6.0. The following code costs only 1 ms!!!
for (int i = 0; i < 1000 * 1000; ++i) {
    System.nanoTime();
}
I thought the time might be spent on the long var assignment, so I made a test like this:
for (int i = 0; i < 1000 * 1000; ++i) {
    long var = i;
}
It also only costs 1 ms!!!
So why is the first code block so slow?
I'm Chinese. Sorry for my poor English!
It really depends on how you run your benchmark. You most likely get <1ms runs because the JVM is not really running your code: it has determined that the code is not used and skips it:
for (int i = 0; i < 1000 * 1000; ++i) {
    long var = i;
}
is equivalent to
//for (int i = 0; i < 1000 * 1000; ++i) {
//    long var = i;
//}
and the JVM is probably running the second version.
You should read up on how to write a correct microbenchmark in Java or use a benchmarking library such as Caliper.
It takes time for the JIT to detect your code doesn't do anything useful. The more complicated the code, the longer it takes (or it might not detect it at all)
In the second and third cases, it can replace the code with nothing (I suspect you can make it 100x longer and it won't run any longer)
Another possibility is that you are running all three tests in the same method. When the first loop iterates more than 10,000 times, the whole method is compiled in the background, such that when the second and third loops run they have been removed.
A simple way to test this is to change the order of the loops or, better, to place each loop in its own method to prevent this.
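A sketch of that isolation (hypothetical names): each variant goes in its own method, both are warmed up before any timed run, and the value escapes into a field so the JIT cannot prove the nanoTime() calls are unused.

```java
public class NanoTimeBench {
    // a visible sink so the JIT cannot prove the loop results are unused
    static long sink;

    static void assignedVariant(int n) {
        for (int i = 0; i < n; i++) {
            long var = System.nanoTime();
            sink = var; // the value escapes, so the call must run
        }
    }

    static void bareCallVariant(int n) {
        for (int i = 0; i < n; i++) {
            sink = System.nanoTime(); // result used, call must execute
        }
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // warm up both methods before timing either
        assignedVariant(n);
        bareCallVariant(n);
        long t0 = System.nanoTime();
        assignedVariant(n);
        long t1 = System.nanoTime();
        bareCallVariant(n);
        long t2 = System.nanoTime();
        System.out.println("assigned:  " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("bare call: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

With the calls forced to execute in both variants, the two loops should take comparable time, confirming that the original discrepancy came from dead-code elimination rather than from the local variable assignment.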
