Inline Code is Slower Than Function Calls / Static Functions in Java - java

I've been running some tests to see how inlining function code (explicitly writing function algorithms in the code itself) affects performance. I wrote a simple byte array to integer code and then wrapped it in a function, called it statically from another class, and called it statically from the class itself. The code is as follows:
public class FunctionCallSpeed {
public static final int numIter = 50000000;
public static void main (String [] args) {
byte [] n = new byte[4];
long start;
System.out.println("Function from Static Class =================");
start = System.nanoTime();
for (int i = 0; i < numIter; i++) {
StaticClass.toInt(n);
}
System.out.println("Elapsed time: " + (double)(System.nanoTime() - start) / 1000000000 + "s");
System.out.println("Function from Class ========================");
start = System.nanoTime();
for (int i = 0; i < numIter; i++) {
toInt(n);
}
System.out.println("Elapsed time: " + (double)(System.nanoTime() - start) / 1000000000 + "s");
int actual = 0;
int len = n.length;
System.out.println("Inline Function ============================");
start = System.nanoTime();
for (int i = 0; i < numIter; i++) {
for (int j = 0; j < len; j++) {
actual += n[len - 1 - j] << 8 * j;
}
}
System.out.println("Elapsed time: " + (double)(System.nanoTime() - start) / 1000000000 + "s");
}
public static int toInt(byte [] num) {
int actual = 0;
int len = num.length;
for (int i = 0; i < len; i++) {
actual += num[len - 1 - i] << 8 * i;
}
return actual;
}
}
The results are as follows:
Function from Static Class =================
Elapsed time: 0.096559931s
Function from Class ========================
Elapsed time: 0.015741711s
Inline Function ============================
Elapsed time: 0.837626286s
Is there something weird going on with the bytecode? I've looked at the bytecode myself, but I'm not very familiar and I can't make heads or tails of it.
EDIT
I added assert statements to read the outputs and then randomized the bytes read and the benchmark now behaves the way I thought it would. Thanks to Tomasz Nurkiewicz, who pointed me to the microbenchmark article. The resulting code is thus:
public class FunctionCallSpeed {
public static final int numIter = 50000000;
public static void main (String [] args) {
byte [] n;
long start, end;
int checker, calc;
end = 0;
System.out.println("Function from Object =================");
for (int i = 0; i < numIter; i++) {
checker = (int)(Math.random() * 65535);
n = toByte(checker);
start = System.nanoTime();
calc = StaticClass.toInt(n);
end += System.nanoTime() - start;
assert calc == checker;
}
System.out.println("Elapsed time: " + (double)end / 1000000000 + "s");
end = 0;
System.out.println("Function from Class ==================");
start = System.nanoTime();
for (int i = 0; i < numIter; i++) {
checker = (int)(Math.random() * 65535);
n = toByte(checker);
start = System.nanoTime();
calc = toInt(n);
end += System.nanoTime() - start;
assert calc == checker;
}
System.out.println("Elapsed time: " + (double)end / 1000000000 + "s");
int len = 4;
end = 0;
System.out.println("Inline Function ======================");
start = System.nanoTime();
for (int i = 0; i < numIter; i++) {
calc = 0;
checker = (int)(Math.random() * 65535);
n = toByte(checker);
start = System.nanoTime();
for (int j = 0; j < len; j++) {
calc += n[len - 1 - j] << 8 * j;
}
end += System.nanoTime() - start;
assert calc == checker;
}
System.out.println("Elapsed time: " + (double)(System.nanoTime() - start) / 1000000000 + "s");
}
public static byte [] toByte(int val) {
byte [] n = new byte[4];
for (int i = 0; i < 4; i++) {
n[i] = (byte)((val >> 8 * i) & 0xFF);
}
return n;
}
public static int toInt(byte [] num) {
int actual = 0;
int len = num.length;
for (int i = 0; i < len; i++) {
actual += num[len - 1 - i] << 8 * i;
}
return actual;
}
}
Results:
Function from Static Class =================
Elapsed time: 9.276437031s
Function from Class ========================
Elapsed time: 9.225660708s
Inline Function ============================
Elapsed time: 5.9512E-5s

It's always hard to make a guarantee of what the JIT is doing, but if I had to guess, it noticed the return value of the function was never being used, and optimized a lot of it out.
If you actually use the return value of your function I bet it changes the speed.

I ported your test case to caliper:
import com.google.caliper.SimpleBenchmark;
public class ToInt extends SimpleBenchmark {
private byte[] n;
private int total;
#Override
protected void setUp() throws Exception {
n = new byte[4];
}
public int timeStaticClass(int reps) {
for (int i = 0; i < reps; i++) {
total += StaticClass.toInt(n);
}
return total;
}
public int timeFromClass(int reps) {
for (int i = 0; i < reps; i++) {
total += toInt(n);
}
return total;
}
public int timeInline(int reps) {
for (int i = 0; i < reps; i++) {
int actual = 0;
int len = n.length;
for (int i1 = 0; i1 < len; i1++) {
actual += n[len - 1 - i1] << 8 * i1;
}
total += actual;
}
return total;
}
public static int toInt(byte[] num) {
int actual = 0;
int len = num.length;
for (int i = 0; i < len; i++) {
actual += num[len - 1 - i] << 8 * i;
}
return actual;
}
}
class StaticClass {
public static int toInt(byte[] num) {
int actual = 0;
int len = num.length;
for (int i = 0; i < len; i++) {
actual += num[len - 1 - i] << 8 * i;
}
return actual;
}
}
And indeed seems like inlined version is the slowest while two static versions are almost the same (as expected):
The reasons are hard to imagine. I can think of two factors:
JVM is better in performing micro-optimizations when code blocks are as small and simple to reason about as possible. When the function is inlined, the whole code becomes more complex and JVM gives up. With smaller toInt() function it JIT is more clever
cache locality - somehow JVM performs better with two small chunks of code (loop and method) rather than one bigger

You have several problems, but the main one is that you are testing one iteration of one optimised code. That is sure to give you mixed results. I suggest running the test for 2 seconds, ignoring the first 10,000 iterations or so.
If the result of a loop is not kept, the entire loop can be discarded after some random interval.
Breaking each test into a separate method
public class FunctionCallSpeed {
public static final int numIter = 50000000;
private static int dontOptimiseAway;
public static void main(String[] args) {
byte[] n = new byte[4];
for (int i = 0; i < 10; i++) {
test1(n);
test2(n);
test3(n);
System.out.println();
}
}
private static void test1(byte[] n) {
System.out.print("from Static Class: ");
long start = System.nanoTime();
for (int i = 0; i < numIter; i++) {
dontOptimiseAway = FunctionCallSpeed.toInt(n);
}
System.out.print((System.nanoTime() - start) / numIter + "ns ");
}
private static void test2(byte[] n) {
long start;
System.out.print("from Class: ");
start = System.nanoTime();
for (int i = 0; i < numIter; i++) {
dontOptimiseAway = toInt(n);
}
System.out.print((System.nanoTime() - start) / numIter + "ns ");
}
private static void test3(byte[] n) {
long start;
int actual = 0;
int len = n.length;
System.out.print("Inlined: ");
start = System.nanoTime();
for (int i = 0; i < numIter; i++) {
for (int j = 0; j < len; j++) {
actual += n[len - 1 - j] << 8 * j;
}
dontOptimiseAway = actual;
}
System.out.print((System.nanoTime() - start) / numIter + "ns ");
}
public static int toInt(byte[] num) {
int actual = 0;
int len = num.length;
for (int i = 0; i < len; i++) {
actual += num[len - 1 - i] << 8 * i;
}
return actual;
}
}
prints
from Class: 7ns Inlined: 11ns from Static Class: 9ns
from Class: 6ns Inlined: 8ns from Static Class: 8ns
from Class: 6ns Inlined: 9ns from Static Class: 6ns
This suggest that when the inner loop is optimised separately it is slightly more efficient.
However if I use an optimised conversion of bytes to int
public static int toInt(byte[] num) {
return num[0] + (num[1] << 8) + (num[2] << 16) + (num[3] << 24);
}
all the tests report
from Static Class: 0ns from Class: 0ns Inlined: 0ns
from Static Class: 0ns from Class: 0ns Inlined: 0ns
from Static Class: 0ns from Class: 0ns Inlined: 0ns
as its realised the test doesn't do anything useful. ;)

Your test is flawed. The second test is having the benefit of the first test already being run. You need to run each test case in its own JVM invocation.

Related

Why is running time of the same algorithm different in doubling ratio test?

I am analyzing brute force Three Sum algorithm. Let's say the running time of this algorithm is T(N)=aN^3. What I am doing is that I am running this ThreeSum.java program with 8Kints.txt and using that running time to calculate constant a. After calculating a I am guessing what the running time of 16Kints.txt is. Here is my ThreeSum.java file:
public class ThreeSum {
public static int count(int[] a) {
// Count triples that sum to 0.
int N = a.length;
int cnt = 0;
for (int i = 0; i < N; i++)
for (int j = i + 1; j < N; j++)
for (int k = j + 1; k < N; k++)
if (a[i] + a[j] + a[k] == 0)
cnt++;
return cnt;
}
public static void main(String[] args) {
In in = new In(args[0]);
int[] a = in.readAllInts();
Stopwatch timer = new Stopwatch();
int count = count(a);
StdOut.println("elapsed time = " + timer.elapsedTime());
StdOut.println(count);
}
}
When I run like this:
$ java ThreeSum 8Kints.txt
I get this:
elapsed time = 165.335
And now in doubling ratio experiment where I use the same method inside another client and run this client with multiple files as arguments and wanna try to compare the running time of 8Kints.txt with above method but I get different result actually faster result. Here is my DoublingRatio.java client:
public class DoublingRatio {
public static double timeTrial(int[] a) {
Stopwatch timer = new Stopwatch();
int cnt = ThreeSum.count(a);
return timer.elapsedTime();
}
public static void main(String[] args) {
In in;
int[][] inputs = new int[args.length][];
for (int i = 0; i < args.length; i++) {
in = new In(args[i]);
inputs[i] = in.readAllInts();
}
double prev = timeTrial(inputs[0]);
for (int i = 1; i < args.length; i++) {
double time = timeTrial(inputs[i]);
StdOut.printf("%6d %7.3f ", inputs[i].length, time);
StdOut.printf("%5.1f\n", time / prev);
prev = time;
}
}
}
When I run this like:
$ java DoublingRatio 1Kints.txt 2Kints.txt 4Kints.txt 8Kints.txt 16Kints.txt 32Kints.txt
I get faster reuslt and I wonder why:
N sec ratio
2000 2.631 7.8
4000 4.467 1.7
8000 34.626 7.8
I know it is something that has to do with Java not the algorithm? Does java optimizes some things under the hood.

Rolling hash: my codes fails with modulo for long string

I'm trying to solve https://leetcode.com/problems/longest-repeating-substring/
I want to use rolling hash to match strings.
However, my codes don't seem to work when I deal with modulo.
For a string with all same characters, the maximum length of repeating substring should be string.length - 1.
public class Main {
public static void main(String[] args) {
String str = "bbbbbbbbbbbbbbbbbbb";
System.out.println(str.length() - 1);
Solution s = new Solution();
System.out.println(s.longestRepeatingSubstring(str));
}
}
class Solution {
public int longestRepeatingSubstring(String S) {
HashSet<Long> h = new HashSet();
long mod = (long)1e7 + 7;
for(int i = S.length() - 1; i >0; i--){
h = new HashSet();
long c = 0;
int j = 0;
for(; j < i; j ++){
c = (c*26 % mod + S.charAt(j) - 'a')% mod;
}
h.add(c);
for(; j < S.length(); j++){
c -= (S.charAt(j - i ) - 'a') * Math.pow(26,i-1)% mod;
c = (c*26 % mod + S.charAt(j) - 'a')% mod;
if(h.contains(c)){
return i;
}
h.add(c);
}
}
return 0;
}
}
Playground for my codes: https://leetcode.com/playground/F4HkxbFQ
We cannot see your original link, we need a password.
The usage of modulo seems to be really complex.
Why not try something like this
class Scratch {
// "static void main" must be defined in a public class.
public static void main(String[] args) {
String str = "bbaaabbbbccbbbbbbzzzbbbbb";
System.out.println(str.length() - 1);
Solution s = new Solution();
System.out.println(s.longestRepeatingSubstring(str));
}
static class Solution {
public int longestRepeatingSubstring(String s) {
int max = -1;
int currentLength = 1;
char[] array = s.toCharArray();
for (int index = 1; index < array.length; index++) {
if (array[index - 1] == array[index]) {
currentLength++;
max = Math.max(max, currentLength);
} else {
currentLength = 1;
}
}
return max;
}
}
}

Why does reducing the number of loops not speed up the program?

I have a program that does a lot of matrix multiplication. I thought I'd speed it up by reducing the number of loops in the code to see how much faster it would be (I'll try a matrix math library later). It turns out it's not faster at all. I've been able to replicate the problem with some example code. My guess was that testOne() would be faster than testTwo() because it doesn't create any new arrays and because it has a third as many loops. On my machine, its takes twice as long to run:
Duration for testOne with 5000 epochs: 657, loopCount: 64000000
Duration for testTwo with 5000 epochs: 365, loopCount: 192000000
My guess is that multOne() is slower than multTwo() because in multOne() the CPU is not writing to sequential memory addresses like it is in multTwo(). Does that sound right? Any explanations would be appreciated.
import java.util.Random;
public class ArrayTest {
double[] arrayOne;
double[] arrayTwo;
double[] arrayThree;
double[][] matrix;
double[] input;
int loopCount;
int rows;
int columns;
public ArrayTest(int rows, int columns) {
this.rows = rows;
this.columns = columns;
this.loopCount = 0;
arrayOne = new double[rows];
arrayTwo = new double[rows];
arrayThree = new double[rows];
matrix = new double[rows][columns];
Random random = new Random();
for (int i = 0; i < rows; i++) {
for (int j = 0; j < columns; j++) {
matrix[i][j] = random.nextDouble();
}
}
}
public void testOne(double[] input, int epochs) {
this.input = input;
this.loopCount = 0;
long start = System.currentTimeMillis();
long duration;
for (int i = 0; i < epochs; i++) {
multOne();
}
duration = System.currentTimeMillis() - start;
System.out.println("Duration for testOne with " + epochs + " epochs: " + duration + ", loopCount: " + loopCount);
}
public void multOne() {
for (int i = 0; i < rows; i++) {
for (int j = 0; j < columns; j++) {
arrayOne[i] += matrix[i][j] * arrayOne[i] * input[j];
arrayTwo[i] += matrix[i][j] * arrayTwo[i] * input[j];
arrayThree[i] += matrix[i][j] * arrayThree[i] * input[j];
loopCount++;
}
}
}
public void testTwo(double[] input, int epochs) {
this.loopCount = 0;
long start = System.currentTimeMillis();
long duration;
for (int i = 0; i < epochs; i++) {
arrayOne = multTwo(matrix, arrayOne, input);
arrayTwo = multTwo(matrix, arrayTwo, input);
arrayThree = multTwo(matrix, arrayThree, input);
}
duration = System.currentTimeMillis() - start;
System.out.println("Duration for testTwo with " + epochs + " epochs: " + duration + ", loopCount: " + loopCount);
}
public double[] multTwo(double[][] matrix, double[] array, double[] input) {
double[] newArray = new double[rows];
for (int i = 0; i < rows; i++) {
for (int j = 0; j < columns; j++) {
newArray[i] += matrix[i][j] * array[i] * input[j];
loopCount++;
}
}
return newArray;
}
public static void main(String[] args) {
int rows = 100;
int columns = 128;
ArrayTest arrayTest = new ArrayTest(rows, columns);
Random random = new Random();
double[] input = new double[columns];
for (int i = 0; i < columns; i++) {
input[i] = random.nextDouble();
}
arrayTest.testOne(input, 5000);
arrayTest.testTwo(input, 5000);
}
}
There is a simple reason why your tests take different time: they don't do the same thing. Since the two loops you compare are not functionally identical, the number of iterations is not a good metric to look at.
testOne takes longer than testTwo because:
In multOne you update arrayOne[i] in place, during each iteration
of the j loop. This means for each iteration of the inner loop j
you are using a new value of arrayOne[i], computed in the
previous iteration. This creates a loop carried dependency, which is
harder to optimise for the compiler, because you require the output
of the operation matrix[i][j] * arrayOne[i] * input[j] on the next
CPU clock cycle. This is not really possible with floating point
operations, which have a latency of a few clock cycles usually, so
it results in stalls, therefore reduced performance.
In testTwo you
update arrayOne only once per each iteration of the epoch, and
since there are no carried dependecies, the loop can be vectorised
efficiently, which results in better cache and arithmetic
performance.

Java 2D array fill - innocent optimization caused terrible slowdown

I've tried to optimize a filling of square two-dimensional Java array with sums of indices at each element by computing each sum once for two elements, opposite relative to the main diagonal. But instead of speedup or, at least, comparable performance, I've got 23(!) times slower code.
My code:
#State(Scope.Benchmark)
#BenchmarkMode(Mode.AverageTime)
#OperationsPerInvocation(ArrayFill.N * ArrayFill.N)
#OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ArrayFill {
public static final int N = 8189;
public int[][] g;
#Setup
public void setup() { g = new int[N][N]; }
#GenerateMicroBenchmark
public int simple(ArrayFill state) {
int[][] g = state.g;
for(int i = 0; i < g.length; i++) {
for(int j = 0; j < g[i].length; j++) {
g[i][j] = i + j;
}
}
return g[g.length - 1][g[g.length - 1].length - 1];
}
#GenerateMicroBenchmark
public int optimized(ArrayFill state) {
int[][] g = state.g;
for(int i = 0; i < g.length; i++) {
for(int j = 0; j <= i; j++) {
g[j][i] = g[i][j] = i + j;
}
}
return g[g.length - 1][g[g.length - 1].length - 1];
}
}
Benchmark results:
Benchmark Mode Mean Mean error Units
ArrayFill.simple avgt 0.907 0.008 ns/op
ArrayFill.optimized avgt 21.188 0.049 ns/op
The question:
How could so tremendous performance drop be explained?
P. S. Java version is 1.8.0-ea-b124, 64-bit 3.2 GHz AMD processor, benchmarks were executed in a single thread.
A side note: Your "optimized" version mightn't be faster at all, even when we leave all possible problems aside. There are multiple resources in a modern CPU and saturating one of them may stop you from any improvements. What I mean: The speed may be memory-bound, and trying to write twice as fast may in one iteration may change nothing at all.
I can see three possible reasons:
Your access pattern may enforce bound checks. In the "simple" loop they can be obviously eliminated, in the "optimized" only if the array is a square. It is, but this information is available only outside of the method (moreover a different piece of code could change it!).
The memory locality in your "optimized" loop is bad. It accesses essentially random memory locations as there's nothing like a 2D array in Java (only an array of arrays for which new int[N][N] is a shortcut). When iterating column-wise, you use only a single int from each loaded cacheline, i.e., 4 bytes out of 64.
The memory prefetcher can have a problem with your access pattern. The array with its 8189 * 8189 * 4 bytes is too big to fit in any cache. Modern CPUs have a prefetcher allowing to load a cache line during in advance, when it spots a regular access pattern. The capabilities of the prefetchers vary a lot. This might be irrelevant here, as you're only writing, but I'm not sure if it's possible to write into a cache-line which hasn't been fetched.
I guess the memory locality is the main culprit:
I added a method "reversed" which works juts like simple, but with
g[j][i] = i + j;
instead of
g[i][j] = i + j;
This "innocuous" change is a performance desaster:
Benchmark Mode Samples Mean Mean error Units
o.o.j.s.ArrayFillBenchmark.optimized avgt 20 10.484 0.048 ns/op
o.o.j.s.ArrayFillBenchmark.reversed avgt 20 20.989 0.294 ns/op
o.o.j.s.ArrayFillBenchmark.simple avgt 20 0.693 0.003 ns/op
I wrote version that works faster than "simple". But, I don't know why it is faster (. Here is the code:
class A {
public static void main(String[] args) {
int n = 8009;
long st, en;
// one
int gg[][] = new int[n][n];
st = System.nanoTime();
for(int i = 0; i < n; i++) {
for(int j = 0; j < n; j++) {
gg[i][j] = i + j;
}
}
en = System.nanoTime();
System.out.println("\nOne time " + (en - st)/1000000.d + " msc");
// two
int g[][] = new int[n][n];
st = System.nanoTime();
int odd = (n%2), l=n-odd;
for(int i = 0; i < l; ++i) {
int t0, t1;
int a0[] = g[t0 = i];
int a1[] = g[t1 = ++i];
for(int j = 0; j < n; ++j) {
a0[j] = t0 + j;
a1[j] = t1 + j;
}
}
if(odd != 0)
{
int i = n-1;
int a[] = g[i];
for(int j = 0; j < n; ++j) {
a[j] = i + j;
}
}
en = System.nanoTime();
System.out.println("\nOptimized time " + (en - st)/1000000.d + " msc");
int r = g[0][0]
// + gg[0][0]
;
System.out.println("\nZZZZ = " + r);
}
}
The results are:
One time 165.177848 msc
Optimized time 99.536178 msc
ZZZZ = 0
Can someone explain me we why it is faster?
http://www.learn-java-tutorial.com/Arrays.cfm#Multidimensional-Arrays-in-Memory
Picture: http://www.learn-java-tutorial.com/images/4715/Arrays03.gif
int[][] === array of arrays of values
int[] === array of values
class A {
public static void main(String[] args) {
int n = 5000;
int g[][] = new int[n][n];
long st, en;
// one
st = System.nanoTime();
for(int i = 0; i < n; i++) {
for(int j = 0; j < n; j++) {
g[i][j] = 10;
}
}
en = System.nanoTime();
System.out.println("\nTwo time " + (en - st)/1000000.d + " msc");
// two
st = System.nanoTime();
for(int i = 0; i < n; i++) {
g[i][i] = 20;
for(int j = 0; j < i; j++) {
g[j][i] = g[i][j] = 20;
}
}
en = System.nanoTime();
System.out.println("\nTwo time " + (en - st)/1000000.d + " msc");
// 3
int arrLen = n*n;
int[] arr = new int[arrLen];
st = System.nanoTime();
for(int i : arr) {
arr[i] = 30;
}
en = System.nanoTime();
System.out.println("\n3 time " + (en - st)/1000000.d + " msc");
// 4
st = System.nanoTime();
int i, j;
for(i = 0; i < n; i++) {
for(j = 0; j < n; j++) {
arr[i*n+j] = 40;
}
}
en = System.nanoTime();
System.out.println("\n4 time " + (en - st)/1000000.d + " msc");
}
}
Two time 71.998012 msc
Two time 551.664166 msc
3 time 63.74851 msc
4 time 57.215167 msc
P.S. I'am not a java spec =)
I see, you allocated a new array for the second run, but still, did you try changing the order of "unoptimized" and "optimized" runs? – fikto
I changed order of them and optimized it a little bit:
class A {
public static void main(String[] args) {
int n = 8009;
double q1, q2;
long st, en;
// two
int g[][] = new int[n][n];
st = System.nanoTime();
int odd = (n%2), l=n-odd;
for(int i = 0; i < l; ++i) {
int t0, t1;
int a0[] = g[t0 = i];
int a1[] = g[t1 = ++i];
for(int j = 0; j < n; ++j, ++t0, ++t1) {
a0[j] = t0;
a1[j] = t1;
}
}
if(odd != 0)
{
int i = n-1;
int a[] = g[i];
for(int j = 0; j < n; ++j, ++i) {
a[j] = i;
}
}
en = System.nanoTime();
System.out.println("Optimized time " + (q1=(en - st)/1000000.d) + " msc");
// one
int gg[][] = new int[n][n];
st = System.nanoTime();
for(int i = 0; i < n; i++) {
for(int j = 0; j < n; j++) {
gg[i][j] = i + j;
}
}
en = System.nanoTime();
System.out.println("One time " + (q2=(en - st)/1000000.d) + " msc");
System.out.println("1 - T1/T2 = " + (1 - q1/q2));
}
}
And results are:
Optimized time 99.360293 msc
One time 162.23607 msc
1 - T1/T2 = 0.3875573231033026

printf always shows 0 processor time as output

I wanted to compare two different ways of testing odd or even and I thought of testing which is faster so I tried using the clock() function and clock_t variables.
Nothing seemed to work. I searched a lot on the web and modified my code based on answers I found on stackoverflow, but still nothing.
This is my code:
#include<stdio.h>
#include<stdlib.h>
#include<time.h>
#include<stdint.h>
clock_t startm, stopm;
#define START if ( (startm = clock()) == -1) {printf("Error calling clock");exit(1);}
#define STOP if ( (stopm = clock()) == -1) {printf("Error calling clock");exit(1);}
#define PRINTTIME printf( "%ju ticks used by the processor.", (uintmax_t)(stopm-startm));
#define COUNT 18446744073709551600
#define STEP COUNT/100
int timetest(void){
unsigned long long int i = 0, y =0 , x = 76546546545541; // x = a random big odd number
clock_t startTime,stopTime;
printf("\nstarting bitwise method :\n");
START;
for(i = 0 ; i < COUNT ; i++){
if(x&1) y=1;
}
STOP;
printf("\n");
PRINTTIME;
y=0;
printf("\nstarting mul-div method :\n");
START;
for(i = 0; i < COUNT ; i++){
if(((x/2)*2) != x ) y=1;
}
STOP;
printf("\n");
PRINTTIME;
printf("\n\n");
return 0;
}
I'm always getting 0 ticks used by the processor. as the output.
Any help would be highly appreciated.
edit :
iv had enough of compiler issues.
created a java version of the above program. gives me answers. though its for the java platform.
public class test {
private final static int count = 500000000;
private final static long num = 55465465465465L;
private final static int loops = 25;
private long runTime;
private long result;
private long bitArr[] = new long[loops];
private long mulDivArr[] = new long[loops];
private double meanVal;
private void bitwiser() {
for (int i = 0; i < count; i++) {
result = num & 1;
}
}
private void muldiv() {
for (int i = 0; i < count; i++) {
result = (num / 2) * 2;
}
}
public test() {
// run loops and gather info
for (int i = 0; i < loops; i++) {
runTime = System.currentTimeMillis();
bitwiser();
runTime = System.currentTimeMillis() - runTime;
bitArr[i] = runTime;
runTime = System.currentTimeMillis();
muldiv();
runTime = System.currentTimeMillis() - runTime;
mulDivArr[i] = runTime;
}
// calculate stats
meanVal = stats.mean(bitArr);
System.out.println("bitwise time : " + meanVal);
meanVal = stats.mean(mulDivArr);
System.out.println("muldiv time : " + meanVal);
}
public static void main(String[] args) {
new test();
}
}
final class stats {
private stats() {
// empty
}
public static double mean(long[] a) {
if (a.length == 0)
return Double.NaN;
long sum = sum(a);
return (double) sum / a.length;
}
public static long sum(long[] a) {
long sum = 0L;
for (int i = 0; i < a.length; i++) {
sum += a[i];
}
return sum;
}
}
output (in millisecs) :
bitwise time : 1109.52
muldiv time : 1108.16
on average , bitwise seems to be a tad slower than muldiv.
This:
#define COUNT 18446744073709551600
will overflow, you must append ULL to make the literal have type unsigned long long.

Categories