Although there might be similar questions (such as A), their answers do not solve my problem.
I am using Android Studio 1.5.1 targeting Android API 18 (before Android KitKat 4.4, so I’m dealing with Dalvik, not ART runtime).
I have a modified Android build that adds memory overhead (designed by its author for purposes outside the scope of this question) to every variable that is used. For example, if we declare an integer variable, it is stored in 8 bytes (64 bits) instead of 4 bytes (32 bits). This modification is completely transparent to apps, which run on the modified Android without any problem.
I need to measure the execution-time overhead this introduces, for example when variables are used.
Here is what I have done so far, but it does not seem to work: the overhead variable (at the end of //Method #1 in the code below) is inconsistent; sometimes it is negative, sometimes positive, sometimes zero. Ideally, it should always (or at least most of the time) be positive.
long start, end, time1, time2, overhead;
int total;  // accumulator for the sums below
//Baseline
start = System.nanoTime();
total=0; total+=1; total+=2; total+=3; total+=4; total+=5; total+=6;
total+=7; total+=8; total+=9;
end = System.nanoTime();
System.out.println("********************* The sum is " + total);
time1 = end - start;
System.out.println("********************* start=" + start + " end=" + end + " time=" + time1);
//Method #1
start = System.nanoTime();
total = (a0() + a1() + a2() + a3() + a4() + a5() + a6() + a7() + a8() + a9());
end = System.nanoTime();
System.out.println("********************* The sum is " + total);
time2 = end - start;
System.out.println("********************* start=" + start + " end=" + end + " time=" + time2);
overhead = time2 - time1;
System.out.println("********************* overhead=" + overhead );
}
private int a0()
{
return 0;
}
private int a1()
{
return 1;
}
private int a2()
{
return 2;
}
private int a3()
{
return 3;
}
private int a4()
{
return 4;
}
private int a5()
{
return 5;
}
private int a6()
{
return 6;
}
private int a7()
{
return 7;
}
private int a8()
{
return 8;
}
private int a9()
{
return 9;
}
My question is:
In Android, how can I measure this execution-time overhead programmatically?
What you are describing is simply experimental error.
the overhead variable is inconsistent, sometime it is negative,
positive, or zero. In the ideal solution, it should be always (or at
least most of the time) positive.
I don't have an exact solution for your problem on Android, but when I have done experimental testing in other contexts, I typically run multiple iterations and then divide by the number of iterations to get an average.
Here is some pseudocode:
int N = 10000;
startTimer();
for (int i = 0; i < N; i++) {
runExperiment();
}
stopTimer();
double averageRuntime = timer / N;
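On Android you could apply the same idea with System.nanoTime(); here is a minimal sketch (the iteration count is arbitrary, and runExperiment() stands for whatever block you want to time):

int iterations = 100000;            // arbitrary; increase until the timings stabilise
long start = System.nanoTime();
for (int i = 0; i < iterations; i++) {
    runExperiment();                // the code block under test
}
long elapsed = System.nanoTime() - start;
double averageNanos = (double) elapsed / iterations;
System.out.println("average runtime per iteration = " + averageNanos + " ns");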
The problem is that the code you are trying to time executes faster than the resolution of System.nanoTime(). Try doing your additions in a loop, for example:
for (int i = 0; i < 1000; i++) {
total += i;
}
Increase the loop count (1000) until you start getting reasonable elapsed times.
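A rough sketch of what that could look like (the loop count is arbitrary; raise it until the elapsed time is well above the timer's resolution):

int iterations = 1000000;           // arbitrary; increase until elapsed times look reasonable
long total = 0;
long start = System.nanoTime();
for (int i = 0; i < iterations; i++) {
    total += i;
}
long elapsed = System.nanoTime() - start;
// Printing total keeps the JIT from discarding the loop as dead code.
System.out.println("total=" + total + ", elapsed=" + elapsed + " ns, per add="
        + ((double) elapsed / iterations) + " ns");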
Summary
I am doing a million floating-point divisions in each of several threads; from the programmer's point of view these threads share nothing, i.e. no explicit locks are involved.
Following are the perf numbers when I ran java -jar /tmp/my-exps-1.0-SNAPSHOT.jar 1000000 100 on machines having 8, 16 and 32 vcores (n/2 cores with 2 threads per core).
1000000 is the number of floating-point divisions each thread performs, and 100 is the number of threads.
All the processors belonged to the same family - Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz.
Using htop, I saw 100% usage on all cores while the program was running.
====================================
runtime.availableProcessors() = 8
=====runBenchmark() FINISHED in 3156601 millis ======== average time to complete one thread = 31566.01
====================================
runtime.availableProcessors() = 16
=====runBenchmark() FINISHED in 3297807 millis ======== average time to complete one thread = 32978.07
====================================
runtime.availableProcessors() = 32
=====runBenchmark() FINISHED in 3448590 millis ======== average time to complete one thread = 34485.9
====================================
Expectation
I expected it to scale linearly with the number of cores, i.e. execution time should decrease in proportion to the increase in CPU cores. However, the numbers above show the opposite: the time actually increases slightly as the number of vcores grows.
I am not sure, but here are some guesses:
Context-switch time increases with the number of cores?
Some implicit locks are taken on the data, and they become more expensive as contention increases with the number of cores.
Code Details
Thread
private static class BenchmarkThread implements Callable<BenchmarkResult> {
private final long numOperationsPerThread;
private BenchmarkThread(long numOperationsPerThread) {
this.numOperationsPerThread = numOperationsPerThread;
}
@Override
public BenchmarkResult call() {
double sum = 0;
long start = System.currentTimeMillis();
for (long i = 0; i < numOperationsPerThread; i++) {
double numerator = RANDOM.nextDouble();
double denominator = RANDOM.nextDouble();
double result = numerator / denominator;
sum += result;
}
long end = System.currentTimeMillis();
return new BenchmarkResult(Thread.currentThread().getName(),
(end - start),
sum);
}
}
Driver (yes, resultFuture.get() blocks, but we are not counting that time; we are summing the individual thread times: timeToComplete += benchmarkResult.timeToCompleteMillis)
Complete runnable example (edited - see EDIT 1 below)
public class VerticalScalingExp {
private final long numOperationsPerThread;
private final int numThreads;
private final List<Future<BenchmarkResult>> benchMarkResultsFuture;
private final ExecutorService executorService;
public VerticalScalingExp(long numOperationsPerThread, int numThreads) {
this.numOperationsPerThread = numOperationsPerThread;
this.numThreads = numThreads;
this.benchMarkResultsFuture = new ArrayList<>(numThreads);
this.executorService = Executors.newFixedThreadPool(numThreads);
}
public static void main(String[] args) throws Exception {
long numOperationsPerThread;
int numThreads;
if (args.length != 2) {
numOperationsPerThread = 1000000;
numThreads = 50;
} else {
numOperationsPerThread = Long.parseLong(args[0]);
numThreads = Integer.parseInt(args[1]);
}
new VerticalScalingExp(numOperationsPerThread, numThreads).runBenchmark();
}
private void runBenchmark() throws Exception {
try {
System.out.println("[START]====VerticalScalingExp.runBenchmark====" );
System.out.println("numOperationsPerThread = " + numOperationsPerThread + ", numThreads = " + numThreads);
Runtime runtime = Runtime.getRuntime();
System.out.println("runtime.maxMemory() = " + runtime.maxMemory());
System.out.println("runtime.freeMemory() = " + runtime.freeMemory());
System.out.println("runtime.availableProcessors() = " + runtime.availableProcessors());
long timeToComplete = 0;
for (int i = 0; i < numThreads; i++) {
benchMarkResultsFuture.add(executorService.submit(new BenchmarkThread(numOperationsPerThread)));
}
for (Future<BenchmarkResult> resultFuture : benchMarkResultsFuture) {
BenchmarkResult benchmarkResult = resultFuture.get();
System.out.println("resultFuture.get() = " + benchmarkResult);
timeToComplete += benchmarkResult.timeToCompleteMillis;
}
double avg = (double) timeToComplete / numThreads;
System.out.println("=====runBenchmark() FINISHED in " + timeToComplete +
" millis ======== average time to complete one thread = " + avg );
} finally {
executorService.shutdown();
}
}
private static class BenchmarkThread implements Callable<BenchmarkResult> {
private final long numOperationsPerThread;
private BenchmarkThread(long numOperationsPerThread) {
this.numOperationsPerThread = numOperationsPerThread;
}
@Override
public BenchmarkResult call() {
double sum = 0;
long start = System.currentTimeMillis();
ThreadLocalRandom random = ThreadLocalRandom.current();
for (long i = 0; i < numOperationsPerThread; i++) {
double numerator = random.nextDouble();
double denominator = random.nextDouble();
double result = numerator / denominator;
sum += result;
}
long end = System.currentTimeMillis();
return new BenchmarkResult(Thread.currentThread().getName(),
(end - start),
sum);
}
}
private static class BenchmarkResult {
private final String threadName;
private final long timeToCompleteMillis;
private final double sum;
public BenchmarkResult(String threadName, long timeToCompleteMillis, double sum) {
this.threadName = threadName;
this.timeToCompleteMillis = timeToCompleteMillis;
this.sum = sum;
}
@Override
public String toString() {
return "BenchmarkResult{" +
"threadName='" + threadName + '\'' +
", timeToCompleteMillis=" + timeToCompleteMillis +
", sum =" + sum +
'}';
}
}
}
Questions
I found this blog - https://www.theserverside.com/feature/Why-Java-Applications-Fail-to-Scale-Linearly - which explains very nicely the data collisions that occur as the number of cores increases. Is the code above suffering from the same problem? (But in my code I am not sharing anything among the threads?) If yes, at what level are these collisions happening: the heap? CPU caches? Something else?
Is this a commonly observed pattern (performance does not scale linearly with CPU cores)?
What can be done to make it scale as expected?
Apologies for the long post :) Thanks :)
EDIT 1:
As suggested in the comments and the answer, I tried using ThreadLocalRandom.
Performance improves a lot compared to the previous runs, but it still gets worse as the number of cores increases.
====================================
runtime.availableProcessors() = 8
=====runBenchmark() FINISHED in 1683 millis ======== average time to complete one thread = 16.83
====================================
runtime.availableProcessors() = 16
=====runBenchmark() FINISHED in 6622 millis ======== average time to complete one thread = 66.22
====================================
runtime.availableProcessors() = 32
=====runBenchmark() FINISHED in 19924 millis ======== average time to complete one thread = 199.24
====================================
This could be the cause of the problem:
private static final Random RANDOM = new Random();
This single instance is shared, and therefore contended, between all threads.
Try a ThreadLocalRandom instead.
Also, I would use a more reliable benchmarking approach like JMH.
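For reference, a minimal JMH benchmark for this workload might look roughly like the sketch below (the class and method names are illustrative, and it assumes the JMH dependency and annotation processor are already configured in the build):

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class DivisionBenchmark {

    @Benchmark
    public double divide() {
        // ThreadLocalRandom avoids the contention caused by a shared java.util.Random.
        ThreadLocalRandom random = ThreadLocalRandom.current();
        return random.nextDouble() / random.nextDouble();
    }
}

You can then vary the benchmark's thread count (JMH's -t option) to see how throughput changes with the number of threads.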
I don't think you're measuring what you think you're measuring. You have 100 tasks and you measure how much time each task takes to finish. Suppose each takes 2 sec; if we execute them one after another, the summed time is 2 sec * 100.
Now suppose you run them in 8 threads on 8 cores. This doesn't (ideally) change the amount of time each task takes, so each task still takes 2 sec, and the summed time is again 2 sec * 100. But the overall execution time changes: it is (2 sec * 100) / 8, because that summed time is now spread across 8 cores instead of 1.
So what you need to measure is the total time it takes for the program to run. Just measure it in the runBenchmark() method:
private void runBenchmark() throws Exception {
try {
long started = System.nanoTime();
for (int i = 0; i < numThreads; i++)
benchMarkResultsFuture.add(executorService.submit(new BenchmarkThread(numOperationsPerThread)));
for (Future<BenchmarkResult> resultFuture : benchMarkResultsFuture)
resultFuture.get();
long timeToComplete = (System.nanoTime() - started) / 1000; // nanoseconds -> microseconds
System.out.println("=====runBenchmark() FINISHED in " + timeToComplete + " micros");
} finally {
executorService.shutdown();
}
}
I solved Project Euler problem #14 (https://projecteuler.net/problem=14) in Java, but when I run it in PowerShell, it stops iterating at exactly i = 113383 every time. I rewrote the solution in Python, and it works perfectly fine, albeit slowly. According to my (identical) Python solution, the answer is that the number that produces the longest chain is 837799 and the chain is 524 operations long.
Why does the Java solution not finish the for loop? Is there some kind of limit in Java on how long it can stay in a loop? I cannot come up with any other explanation. The Java code is below; I wrote the System.out.println(i) there just to see what is going on.
class ProjectEuler14 {
public static void main(String[] args) {
int largestNumber = 1;
int largestChain = 1;
int currentNumber;
int chainLength;
for (int i = 2; i < 1000000; i++) {
System.out.println(i);
currentNumber = i;
chainLength = 0;
while (currentNumber != 1) {
if (currentNumber % 2 == 0) currentNumber /= 2;
else currentNumber = 3 * currentNumber + 1;
chainLength++;
}
if (chainLength > largestChain) {
largestChain = chainLength;
largestNumber = i;
}
}
System.out.println("\n\nThe number under million that produces the "
+ "longest chain is " + largestNumber +
" and the chain's length is " + largestChain);
}
}
It's not the for loop. It's the while loop. The condition currentNumber != 1 is always true; forever.
In Java, an int is specifically defined as an integral number between -2^31 and +2^31 - 1, inclusive, and operations 'roll over'. Try it:
int x = 2147483647; // Integer.MAX_VALUE, i.e. 2^31 - 1
x++;
System.out.println(x);
This prints a large negative number (in fact, precisely -2^31).
It's happening in your algorithm, and that's why it never finishes.
A trivial solution is to 'upgrade' to longs; they are just as fast, really (yay 64-bit processors!) and use 64 bits, thus giving them a range of -2^63 to +2^63-1.
Python silently scales its numbers up (at the cost of speed); Java makes different choices (and, for crypto and other purposes, that rollover behaviour is in fact desired).
If you want to go even further, you can always use BigInteger, which grows as much as you need forever (becoming slower and taking more memory as it goes).
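For illustration, a minimal sketch of the chain computation with BigInteger (variable names follow the question's code; requires import java.math.BigInteger):

BigInteger current = BigInteger.valueOf(i);
BigInteger three = BigInteger.valueOf(3);
int chainLength = 0;
while (!current.equals(BigInteger.ONE)) {
    if (current.testBit(0)) {                     // lowest bit set -> odd
        current = current.multiply(three).add(BigInteger.ONE);
    } else {
        current = current.shiftRight(1);          // divide by 2
    }
    chainLength++;
}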
To detect that rollover occurred: the 3* operation would then result in a number that is lower than the original, and you can check for that:
replace:
else currentNumber = 3 * currentNumber + 1;
with:
else {
int newNumber = currentNumber * 3 + 1;
if (newNumber < currentNumber) throw new IllegalStateException("Overflow has occurred; 3 * " + currentNumber + " + 1 exceeds int's capacity.");
currentNumber = newNumber;
}
and rerun it. You'll see your app nicely explain itself.
The currentNumber is exceeding the size of an int; use long instead.
You have an int overflow problem.
Change int to long:
long largestNumber = 1;
long largestChain = 1;
long currentNumber;
long chainLength;
for (int i = 2; i < 1000000; i++) {
//System.out.println(i);
currentNumber = i;
chainLength = 0;
while (currentNumber != 1) {
//System.out.println("# = " + currentNumber);
if (currentNumber % 2 == 0) {
currentNumber /= 2;
} else {
currentNumber = (3 * currentNumber) +1 ;
}
chainLength++;
}
// System.out.println("################################ " + i);
if (chainLength > largestChain) {
largestChain = chainLength;
largestNumber = i;
}
}
System.out.println("\n\nThe number under million that produces the "
+ "longest chain is " + largestNumber
+ " and the chain's length is " + largestChain);
I have a program, the PrimeNumbers class, which displays whether x is prime or not; x is the number being analyzed.
The program also times how long it takes to find the answer. x is so big that it takes 9 seconds. How could the program run faster using more threads? I am having a hard time figuring out how to use threads in this situation.
public class PrimeNumbers {
private static int x = 2147483647;
public static boolean prime= true;
public static void main(String[]args){
long start, end, elapsetime;
start= System.currentTimeMillis();
for(int y=2; y<x; y++){
if(x % y == 0){
prime=false;
System.out.println(y);
break;
}
}
end = System.currentTimeMillis();
elapsetime = end - start;
System.out.println("Prime: " + prime);
System.out.println(elapsetime+ " mill sec " + (elapsetime / 1000
+ " seconds."));
}
}
I'm going to ignore whether you've got the most efficient approach and focus on how your current code could be faster with more threads.
You currently iterate through all the numbers from 2 -> x and perform a simple test. A way to improve performance might be to split this task into Z chunks and start Z threads to perform the tests in parallel.
E.g. if you had two threads, you would have one thread examine 2 -> x/2 and the other examine x/2 + 1 -> x. Each thread should break from its testing if a global (and probably volatile) flag is set to true, which would indicate the other thread has disproved the prime.
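As a rough sketch of that idea with two threads (class and field names are illustrative; the ranges mirror the split described above):

public class ParallelPrimeCheck {
    private static final int X = 2147483647;      // the number to test
    // volatile so a divisor found by one thread is immediately visible to the other
    private static volatile boolean composite = false;

    public static void main(String[] args) throws InterruptedException {
        Thread low  = new Thread(() -> scan(2, X / 2));
        Thread high = new Thread(() -> scan(X / 2 + 1, X));
        low.start();
        high.start();
        low.join();
        high.join();
        System.out.println("Prime: " + !composite);
    }

    private static void scan(int from, int to) {
        for (int y = from; y < to && !composite; y++) {
            if (X % y == 0) {
                composite = true;                 // disproves the prime; the other thread stops too
                break;
            }
        }
    }
}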
Your primality test is very inefficient: you're looping over every number less than x. How can you improve it? This link should be helpful.
A good algorithm would be the AKS test, or Sieve of Eratosthenes. The code below implements one of the algorithms from the wiki article, which is much more efficient than the test you posted.
public static boolean isPrime(long n) {
    // http://en.wikipedia.org/wiki/Primality_test
    if (n <= 3) return n > 1;
    if (n % 2 == 0 || n % 3 == 0) return false;
    for (long i = 5; i * i <= n; i += 6) {   // long i avoids overflow of i * i for large n
        if (n % i == 0 || n % (i + 2) == 0) return false;
    }
    return true;
}
If you are interested in a better algorithm, Munyari has already suggested one.
Setting that aside, the following example shows how you can execute an algorithm in parallel (even if the algorithm itself is naive).
We need a class that implements the Callable interface (similar to Runnable). It receives its part of the job and computes it.
public class PrimeChecker implements Callable<Boolean> {
private final long numberToCheck;
private final long start;
private final long end;
public PrimeChecker(long numberToCheck, long start, long end) {
this.numberToCheck = numberToCheck;
this.start = start;
if (end >= numberToCheck) {
this.end = numberToCheck - 1;
}else{
this.end = end;
}
System.out.println("A PrimeChecker with start " + start + " and end " + end + " values to check number "
+ numberToCheck);
}
@Override
public Boolean call() throws Exception {
boolean prime = true;
long current = start;
if (current != 2 && (current % 2 == 0)) {
current = current + 1;
}
for (; current < end; current = current + 2) {
if (numberToCheck % current == 0) {
prime = false;
System.out.println("The number " + numberToCheck + " is divisable with " + current);
return prime;
}
}
return prime;
}
}
It simply starts from a number and checks whether the given number numberToCheck is divisible by it, continuing until it reaches end.
In the Main class we create multiple PrimeChecker jobs and execute them in parallel. For this we use Java's ExecutorService, which creates a thread pool for us. We then divide the job across multiple PrimeCheckers and finally execute them with ExecutorService's invokeAll method. This gives us a list of Futures containing the result of each job that was executed in parallel.
public class Main {
public static boolean prime= true;
public static void main(String[] args) throws InterruptedException, ExecutionException {
long startTime = System.currentTimeMillis();
long numberToCheck = 5333334345L;
int numberOfThreads = 10;
System.out.println("Checking if the number " + numberToCheck + " ...");
ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
List<PrimeChecker> primeCheckers = new ArrayList<PrimeChecker>();
long partOfNumber = (long) Math.ceil((double)numberToCheck/ numberOfThreads);
long start = 2 ;
long end = 0;
for(int i = 0; i < numberOfThreads; i++){
end = end + partOfNumber;
primeCheckers.add(new PrimeChecker(numberToCheck, start, end));
start = end+1;
}
List<Future<Boolean>> futures = executor.invokeAll(primeCheckers);
for(Future<Boolean> future : futures){
prime = future.get();
if(prime == false){
break;
}
}
System.out.println("The number " + numberToCheck + " is " + (prime ? "a prime" :"NOT !!!!!!!!!!!!!!!!!!!! a prime") + " number");
long endTime = System.currentTimeMillis();
long elapsetime = endTime - startTime;
System.out.println(elapsetime + " milliseconds");
System.exit(0);
}
}
You can try it with different numbers of threads (see numberOfThreads variable) to see the difference.
I hope this example helps you understand multithreading better. (Be careful: it is only a tiny part of the whole threading topic.)
If you do not need to implement the prime check yourself, I would propose using the JDK API (BigInteger.isProbablePrime). You can control the certainty depending on your needs; in the example it is 1 - (1/2)^100.
public static void main(String[] args) {
BigInteger mersenne = new BigInteger("2").pow(521).add(BigInteger.ONE.negate());
System.out.println("digits of the number: " + mersenne.toString().length());
long start = System.currentTimeMillis();
final int certainty = 100;
boolean isPrime = mersenne.isProbablePrime(certainty);
System.out.println("elapsed millis: " + (System.currentTimeMillis() - start));
System.out.println("isPrime : " + isPrime);
}
Edit:
Here is an optimised version of the proposed example.
public class PrimeNumbers {
private static int x = 2147483647;
public static boolean prime= true;
public static void main(String[]args){
long start, end, elapsetime;
int divisor = 1;
start= System.currentTimeMillis();
if (x % 2 == 0) {
prime = false;
divisor = 2;
} else {
// - you can use an increment of two
// because you don't need to check
// for a divisor which is a multiple
// of two
// - you don't need to check for any divisor
// which is greater than x/2
for(int y=3; y < x/2; y += 2){
if(x % y == 0){
prime=false;
divisor = y;
break;
}
}
}
end = System.currentTimeMillis();
System.out.println("Prime: " + prime);
if (!prime) {
System.out.println("divisible by: " + divisor);
}
elapsetime = end - start;
System.out.println(elapsetime+ " mill sec " + (elapsetime / 1000
+ " seconds."));
}
}
Could you please check my work and help me understand the System.currentTimeMillis() function? I understand that it takes a snapshot of my computer's time, and when I call it again it takes another snapshot; the difference between the two gives the run time. I'm just not sure I'm implementing it properly, as the times for my iterative function and my recursive function are almost always identical, or at most 1 ms apart. I'm also a little confused about whether my start time is taken again before the iterative call starts, or whether my 'iterative time' is really the iterative time plus the recursive time. Should my total iterative time be endTimeIter - endTimeRecur? Any help is appreciated.
public class FibTest{
public static void main (String[] args){
long startTime = System.currentTimeMillis();
int n = 40;
System.out.println("The 40th Fibonacci number per my recursive function is: " + fibRecur(n));
long endTimeRecur = System.currentTimeMillis();
long totalTimeRecur = endTimeRecur - startTime;
System.out.println("The 40th Fibonacci number per my recursive function is: " + fibIter(n));
long endTimeIter = System.currentTimeMillis();
long totalTimeIter = endTimeIter - startTime;
System.out.println("The time it took to find Fib(40) with my recursive method was: " + totalTimeRecur);
System.out.println("The time it took to find Fib(40) with my iterative method was: " + totalTimeIter);
}
public static int fibRecur(int n){
if (n < 3) return 1;
return fibRecur(n-2) + fibRecur(n-1);
}
public static int fibIter(int n){
int fib1 = 1;
int fib2 = 1;
int i, result = 0;
for (i = 2; i < n; i++ ){
result = fib1 + fib2;
fib1 = fib2;
fib2 = result;
}
return result;
}
}
That's one way the time difference should be measured:
long time = System.currentTimeMillis();
methodA();
System.out.println(System.currentTimeMillis() - time);
time = System.currentTimeMillis();
methodB();
System.out.println(System.currentTimeMillis() - time);
In addition to Amir's answer:
One bug in your program is that you print
System.out.println("The 40th Fibonacci number per my recursive function is: " + fibIter(n));
I think what you want to say is:
System.out.println("The 40th Fibonacci number per my iterative function is: " + fibIter(n));
I have recently learned that objects can be placed on the stack or on the heap, and that where they are placed is determined by escape analysis (see: Declaring multiple arrays with 64 elements 1000 times faster than declaring array of 65 elements).
In the following example I think the object "test" is placed on the heap, making the runtime a lot longer:
public static void main(String args[]) {
double start = System.nanoTime();
long job = 100000000;// 100 million
int total = 0;
for (long i = 0; i < job; i++) {
int j = 0;
double[] test = new double[63];
test[0] =1;
total += test[0];
while (true) {
if (j == 0)
break;
j--;
}
test[0] = 10; // this makes a really big difference
}
double end = System.nanoTime();
System.out.println("Total runtime = " + (end - start) / 1000000 + " ms" + " total ="+ total);
}
If either the while loop or the test[0] = 10; statement is removed, the test object is placed on the stack (I infer this from the fact that the garbage collector is not invoked in that case, whereas it is when both are present; also, the runtime is 350 ms instead of 6803 ms).
My question is: why is the test object placed on the heap if I change/access its contents after the while loop?
test is a local reference variable in your main method. All local variables are stored on the stack.
Also the runtime is 350 ms instead of 6803 ms
I think it is not about stack vs. heap but about optimization. I'm not sure exactly how Java's JIT optimization works, but similar code in C/C++ after optimization would look like this:
public static void main(String args[]) {
double start = System.nanoTime();
long job = 100000000;// 100 million
int total = 100000000;
double end = System.nanoTime();
System.out.println("Total runtime = " + (end - start) / 1000000 + " ms" + " total ="+ total);
}
Maybe it is because you refer to test:
test[0]=10;
which prevents the for loop from being 'removed'.