Hi,
I have a question about this class, which measures the average running time of a method for each value of "n".
The method I want to benchmark, GrahamVersion.grahamScan, has T(n) = O(n log n).
My code:
import java.awt.Point;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class NewClass1 {
    public static void main(String[] args) {
        List<Point> randList = new ArrayList<Point>();
        for (int n = 100; n <= 500; n += 200) {
            Random rand = new Random();
            for (int i = 1; i <= n; i++) {
                Point point = new Point(rand.nextInt(10), rand.nextInt(10));
                randList.add(point);
            }
            get(randList);
        }
    }

    public static void get(List<Point> list) {
        long time = 0;
        for (int i = 0; i < 10; i++) { // run 10 times to match the division below
            long t = System.currentTimeMillis();
            GrahamVersion.grahamScan(list);
            long t0 = System.currentTimeMillis();
            time = time + t0 - t;
        }
        System.out.println((double) time / 10);
    }
}
and it will print:
1.5
1.6
0.0
Is the average time OK? For n = 500 it prints 0.0, while for n = 300 it prints 1.6.
A number of things are, or may be, causing these "strange" results.
First, your benchmarking is not taking account of the need to "warm up" the JVM. You should put a big loop around the benchmark code and run it a number of times until the numbers seem to stabilize. For example:
public static void main(String[] args) {
    while (true) {
        List<Point> randList = new ArrayList<Point>();
        for (int n = 100; n <= 500; n += 200) {
            ...
        }
    }
}
(By running the benchmark in a loop like this, you give the JVM a chance to load and compile the code classes to native code, so that your results are not distorted by the overheads of class loading, JIT compilation and so on.)
Second, you should be printing the results with greater precision.
Third, you should be looking at more than just 3 datapoints.
Finally, you may have fallen into the trap of assuming that big O allows you to predict behavior for small values of N. It doesn't: it only tells you what happens as N tends to infinity, and even then it only gives you an upper bound on performance.
You need to run the test for at least 2 seconds before you will get reproducible results. Your test runs so fast that you can't measure it with currentTimeMillis(); I suggest using System.nanoTime() after you have run the test for 2 seconds.
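Putting those points together, a minimal sketch of the warm-up-then-measure pattern might look like this. The method under test, the input, and the warm-up counts are all placeholders; substitute your own grahamScan call and point list:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class Bench {
    // Placeholder for the method under test, e.g. GrahamVersion.grahamScan(list).
    static long workUnderTest(List<Integer> list) {
        long sum = 0;
        for (int v : list) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        Random rand = new Random(42);
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) input.add(rand.nextInt(1000));

        // Warm-up: give the JIT a chance to compile the hot code before measuring.
        for (int i = 0; i < 10_000; i++) workUnderTest(input);

        // Measure: repeat until at least ~2 seconds have elapsed, then average.
        long start = System.nanoTime();
        long iterations = 0;
        while (System.nanoTime() - start < 2_000_000_000L) {
            workUnderTest(input);
            iterations++;
        }
        double nsPerCall = (System.nanoTime() - start) / (double) iterations;
        System.out.printf("%.1f ns/call over %d iterations%n", nsPerCall, iterations);
    }
}
```

With this structure, an input that is "too fast to measure" in milliseconds still accumulates enough iterations to produce a stable per-call average.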
Here is my code; it is just a basic introduction to parallel computing that was done in class. There was supposed to be a speedup of around 2 for numbers being added up in a random array. All but about four people were getting the correct results. To note, I am using a 2020 MacBook Air with a 1.2 GHz quad-core i7.
import java.util.Arrays;

public class TestProgram
{
    public static double addSequential(double[] values) // method that adds the numbers in an array
    {
        double sum = 0;
        for (int i = 0; i < values.length; i++)
            sum += values[i];
        return sum;
    }

    public static double addParallel(double[] values) // method that adds the numbers using two parallel tasks
    {
        int mid = values.length / 2; // calculates a mid point
        SumArrayTask left = new SumArrayTask(0, mid, values);
        SumArrayTask right = new SumArrayTask(mid, values.length, values);
        left.fork();
        right.compute();
        left.join();
        return left.getResult() + right.getResult();
    }

    public static void main(String[] args)
    {
        double[] arr = new double[10];
        for (int i = 0; i < arr.length; i++) // create an array with 10 RANDOM values 0-100
            arr[i] = Math.floor(100 * Math.random()); // Math.random() picks a random number between 0-1, so we multiply by 100
        System.out.println(Arrays.toString(arr));
        long start, sequentialTime, parallelTime;
        start = System.nanoTime();
        System.out.println("Result (sequential): " + addSequential(arr)); // prints all elements of the array added up
        System.out.println("Time: " + (sequentialTime = System.nanoTime() - start) + " ns"); // prints how many nanoseconds the processing takes
        start = System.nanoTime();
        System.out.println("Result (parallel): " + addParallel(arr)); // prints all elements of the array added up in parallel
        System.out.println("Time: " + (parallelTime = System.nanoTime() - start) + " ns"); // prints how many nanoseconds the parallel processing takes
        System.out.println("Speedup: " + sequentialTime / parallelTime);
    }
}
import java.util.concurrent.RecursiveAction;

public class SumArrayTask extends RecursiveAction
{
    private int start;
    private int end;
    private double[] data;
    private double result;

    public SumArrayTask(int startIndex, int endIndex, double[] arr)
    {
        start = startIndex;
        end = endIndex;
        data = arr;
    }

    public double getResult() // getter method for result
    {
        return result;
    }

    @Override
    protected void compute()
    {
        double sum = 0;
        for (int i = start; i < end; i++)
            sum += data[i];
        result = sum;
    }
}
My result:
I was expecting a speedup of around 2. I've had others try it, and they get completely different results on their PCs. I am completely unsure whether it has something to do with my setup or with the code itself. I appreciate any help.
First of all, your way of "benchmarking" will always give misleading results:
You do I/O (System.out.println()) within the benchmarked code. This alone takes much longer than adding ten numbers.
You do not execute the code multiple times. The first executions in Java are always slower than later ones, due to the "learning phase" of the HotSpot compiler.
Seeing that a simple "add ten doubles" task seemingly takes more than 100,000 clock cycles could already have alarmed you that your measuring must be wrong. Ten additions should not take more than maybe 100 cycles or so.
Now let's talk about parallel execution. There is a cost to creating and managing a thread (or letting the java.util.concurrent package do it for you), and this can be quite high. So, although each parallel task will probably (*) consume less time than the full loop, the management time for the threads will outweigh that by far in your case.
So, in general, only think about parallel execution for code that takes seconds, not microseconds.
(*) It's not even clear that the half-array loops will take less time than the full-array loop: there are more variables involved, which makes it harder for the HotSpot compiler to apply aggressive optimizations such as loop unrolling.
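To illustrate the first two points, here is a sketch that keeps all println calls outside the timed region and repeats the measurement many times after a warm-up pass. The array contents and repeat counts are arbitrary choices for the sketch:

```java
public class AddBench {
    // Same sequential sum as in the question.
    static double addSequential(double[] values) {
        double sum = 0;
        for (int i = 0; i < values.length; i++) sum += values[i];
        return sum;
    }

    public static void main(String[] args) {
        double[] arr = new double[10];
        for (int i = 0; i < arr.length; i++) arr[i] = Math.floor(100 * Math.random());

        // Warm-up runs so the JIT has compiled addSequential before we time it.
        double sink = 0;
        for (int i = 0; i < 1_000_000; i++) sink += addSequential(arr);

        // Time many calls and report the average; no I/O inside the timed region.
        int reps = 1_000_000;
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) sink += addSequential(arr);
        long elapsed = System.nanoTime() - start;

        System.out.println("sink=" + sink); // keep the result live so the loop isn't optimized away
        System.out.println("avg " + (elapsed / (double) reps) + " ns/call");
    }
}
```

Measured this way, adding ten doubles comes out in the expected range of a handful of nanoseconds, rather than the hundreds of microseconds the original measurement suggested.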
Say I want to go through a loop a billion times how could I optimize the loop to get my results faster?
As an example:
int var = 0; // counter; needs to be declared before the loop
double randompoint;
for (long count = 0; count < 1000000000; count++) {
    randompoint = (Math.random() * 1) + 0; // generate a random point
    if (randompoint <= .75) {
        var++;
    }
}
I was reading up on vectorization, but I'm not quite sure how to go about it. Any ideas?
Since Java is cross-platform, you pretty much have to rely on the JIT to vectorize. In your case it can't, since each iteration depends heavily on the previous one (due to how the RNG works).
However, there are two other major ways to improve your computation.
The first is that this work is very amenable to parallelization. The technical term is embarrassingly parallel. This means that multithreading can give a near-linear speedup over the number of cores.
The second is that Math.random() is written to be thread-safe, which also means that it's slow, because it needs to use atomic operations. That safety isn't needed here, so we can skip the overhead by using a non-thread-safe RNG.
I haven't written much Java since 1.5, but here's a dumb implementation:
import java.util.*;
import java.util.concurrent.*;

class Foo implements Runnable {
    private long count;
    private double threshold;
    private long result;

    public Foo(long count, double threshold) {
        this.count = count;
        this.threshold = threshold;
    }

    public void run() {
        ThreadLocalRandom rand = ThreadLocalRandom.current();
        for (long l = 0; l < count; l++) {
            if (rand.nextDouble() < threshold)
                result++;
        }
    }

    public static void main(String[] args) throws Exception {
        long count = 1000000000;
        double threshold = 0.75;
        int cores = Runtime.getRuntime().availableProcessors();
        long sum = 0;

        List<Foo> list = new ArrayList<Foo>();
        List<Thread> threads = new ArrayList<Thread>();
        for (int i = 0; i < cores; i++) {
            // TODO: account for count % cores != 0
            Foo t = new Foo(count / cores, threshold);
            list.add(t);
            Thread thread = new Thread(t);
            thread.start();
            threads.add(thread);
        }
        for (Thread t : threads) t.join();
        for (Foo f : list) sum += f.result;
        System.out.println(sum);
    }
}
You can also optimize and inline the random generator, to avoid going via doubles. Here it is with code taken from the ThreadLocalRandom docs:
public void run() {
    long seed = new Random().nextLong();
    long limit = (long) ((1L << 48) * threshold);
    for (long i = 0; i < count; i++) { // count is a long, so the loop variable must be too
        seed = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
        if (seed < limit) ++result;
    }
}
However, the best approach is to work smarter, not harder. As the number of events increases, the probability tends towards a normal distribution. This means that for your huge range, you can randomly generate a number with such a distribution and scale it:
import java.util.Random;

class StayInSchool {
    public static void main(String[] args) {
        System.out.println(coinToss(1000000000, 0.75));
    }

    static long coinToss(long iterations, double threshold) {
        double mean = threshold * iterations;
        double stdDev = Math.sqrt(threshold * (1 - threshold) * iterations);
        double p = new Random().nextGaussian();
        return (long) (p * stdDev + mean);
    }
}
Here are the timings on my 4 core system (including VM startup) for these approaches:
Your baseline: 20.9s
Single threaded ThreadLocalRandom: 6.51s
Single threaded optimized random: 1.75s
Multithreaded ThreadLocalRandom: 1.67s
Multithreaded optimized random: 0.89s
Generating a gaussian: 0.14s
I'm having trouble with a multithreaded Java program.
The program computes the sum of an array of integers by splitting the array into slices, summing each slice in its own thread, and then adding up the partial sums.
The problem is that computing time does not decrease as the number of threads increases (I know there is a limit beyond which adding threads makes computing slower). I expected to see execution time decrease before that limit is reached (the benefit of parallel execution). I use the variable fake in the run method to make the times "readable".
public class MainClass {
    private final int MAX_THREAD = 8;
    private final int ARRAY_SIZE = 1000000;
    private int[] array;
    private SimpleThread[] threads;
    private int numThread = 1;
    private int[] sum;
    private int start = 0;
    private int totalSum = 0;
    long begin, end;
    int fake;

    MainClass() {
        fillArray();
        for (int i = 0; i < MAX_THREAD; i++) {
            threads = new SimpleThread[numThread];
            sum = new int[numThread];
            begin = System.currentTimeMillis();
            for (int j = 0; j < numThread; j++) {
                threads[j] = new SimpleThread(start, ARRAY_SIZE / numThread, j);
                threads[j].start();
                start += ARRAY_SIZE / numThread;
            }
            for (int k = 0; k < numThread; k++) {
                try {
                    threads[k].join();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
            end = System.currentTimeMillis();
            for (int g = 0; g < numThread; g++) {
                totalSum += sum[g];
            }
            System.out.printf("Result with %d thread-- Sum = %d Time = %d\n", numThread, totalSum, end - begin);
            numThread++;
            start = 0;
            totalSum = 0;
        }
    }

    public static void main(String args[]) {
        new MainClass();
    }

    private void fillArray() {
        array = new int[ARRAY_SIZE];
        for (int i = 0; i < ARRAY_SIZE; i++)
            array[i] = 1;
    }

    private class SimpleThread extends Thread {
        int start;
        int size;
        int index;

        public SimpleThread(int start, int size, int sumIndex) {
            this.start = start;
            this.size = size;
            this.index = sumIndex;
        }

        public void run() {
            for (int i = start; i < start + size; i++)
                sum[index] += array[i];
            for (long i = 0; i < 1000000000; i++) {
                fake++;
            }
        }
    }
}
Unexpected Result Screenshot
As a general rule, you won't get a speedup from multi-threading if the "work" performed by each thread is less than the overheads of using the threads.
One of the overheads is the cost of starting a new thread. This is surprisingly high. Each time you start a thread the JVM needs to perform syscalls to allocate the thread stack memory segment and the "red zone" memory segment, and initialize them. (The default thread stack size is typically 500KB or 1MB.) Then there are further syscalls to create the native thread and schedule it.
In this example, you have 1,000,000 elements to sum and you divide this work among N threads. As N increases, the amount of work performed by each thread decreases.
It is not hard to see that the time taken to sum 1,000,000 elements is going to be less than the time needed to start 4 threads ... just based on counting the memory read and write operations. Then you need to take into account that the child threads are created one at a time by the parent thread.
If you do the analysis completely, it is clear that there is a point at which adding more threads actually slows down the computation, even if you have enough cores to run all threads in parallel. And your benchmarking seems to suggest1 that that point is at around 2 threads.
By the way, there is a second reason why you may not get as much speedup as you expect for a benchmark like this one. The "work" that each thread is doing is basically scanning a large array. Reading and writing arrays will generate requests to the memory system. Ideally, these requests will be satisfied by the (fast) on-chip memory caches. However, if you try to read / write an array that is larger than the memory cache, then many / most of those requests turn into (slow) main memory requests. Worse still, if you have N cores all doing this then you can find that the number of main memory requests is too much for the memory system to keep up .... and the threads slow down.
The bottom line is that multi-threading does not automatically make an application faster, and it certainly won't if you do it the wrong way.
In your example:
the amount of work per thread is too small compared with the overheads of creating and starting threads, and
memory bandwidth effects are likely to be a problem even if you can "factor out" the thread creation overheads
1 - I don't understand the point of the "fake" computation. It probably invalidates the benchmark, though it is possible that the JIT compiler optimizes it away.
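To get a feel for how high the thread startup overhead described above actually is, a rough sketch like the following can be used (the thread count is arbitrary, and the number printed will vary widely by machine and JVM):

```java
public class ThreadStartCost {
    // Time creating, starting and immediately joining n trivial threads;
    // returns the average nanoseconds per create/start/join cycle.
    static long measure(int n) throws InterruptedException {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> { });
            t.start();
            t.join();
        }
        return (System.nanoTime() - start) / n;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(measure(1000) + " ns per thread create/start/join");
    }
}
```

Comparing that number against the time needed to sum a slice of the array makes it easy to see when the per-thread work is too small to pay for the thread itself.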
Why is the sum sometimes wrong?
Because ARRAY_SIZE/numThread can have a fractional part (e.g. 1000000/3 = 333333.333...), which gets rounded down. The start variable therefore skips some elements, so the sum may be less than 1000000, depending on the divisor.
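One way to avoid losing elements to that rounding, assuming the same start/size slicing scheme as in the question, is to let the last slice absorb the remainder:

```java
public class RangeSplit {
    // Returns {start, size} pairs covering [0, total) across numThread slices;
    // the last slice absorbs the remainder of total / numThread.
    static int[][] split(int total, int numThread) {
        int[][] ranges = new int[numThread][2];
        int base = total / numThread;
        int start = 0;
        for (int j = 0; j < numThread; j++) {
            int size = (j == numThread - 1) ? total - start : base;
            ranges[j][0] = start;
            ranges[j][1] = size;
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        int covered = 0;
        for (int[] r : split(1000000, 3)) covered += r[1];
        System.out.println(covered); // prints 1000000: no elements are skipped
    }
}
```

With this split, 1000000 elements over 3 threads gives slices of 333333, 333333 and 333334, so the total is always exact.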
Why does the time taken increase as the number of threads increases?
Because in the run function of each thread you do this:
for(long i = 0; i < 1000000000; i++) {
fake++;
}
which relates to this statement in your question, one I do not understand:
I use the variable fake in run method to make time "readable".
Whatever its purpose, it makes every thread increment your fake variable 1000000000 times, and that work dominates the measured time.
As a side note, for what you're trying to do there is the Fork/Join-Framework. It allows you easily split tasks recursively and implements an algorithm which will distribute your workload automatically.
There is a guide available here; its example is very similar to your case, which boils down to a RecursiveTask like this:
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.RecursiveTask;

class Adder extends RecursiveTask<Integer>
{
    private int[] toAdd;
    private int from;
    private int to;

    /** Add the numbers in the given array */
    public Adder(int[] toAdd)
    {
        this(toAdd, 0, toAdd.length);
    }

    /** Add the numbers in the given array between the given indices;
        internal constructor to split work */
    private Adder(int[] toAdd, int fromIndex, int upToIndex)
    {
        this.toAdd = toAdd;
        this.from = fromIndex;
        this.to = upToIndex;
    }

    /** This is the work method */
    @Override
    protected Integer compute()
    {
        int amount = to - from;
        int result = 0;
        if (amount < 500)
        {
            // base case: add ints and return the result
            for (int i = from; i < to; i++)
            {
                result += toAdd[i];
            }
        }
        else
        {
            // array too large: split it into two parts and distribute the actual adding
            int newEndIndex = from + (amount / 2);
            Collection<Adder> invokeAll = invokeAll(Arrays.asList(
                    new Adder(toAdd, from, newEndIndex),
                    new Adder(toAdd, newEndIndex, to)));
            for (Adder a : invokeAll)
            {
                result += a.join();
            }
        }
        return result;
    }
}
To actually run this, you can use
RecursiveTask<Integer> adder = new Adder(fillArray(ARRAY_LENGTH));
int result = ForkJoinPool.commonPool().invoke(adder);
Starting threads is heavyweight, and you'll only see a benefit from it on large tasks that don't compete for the same resources (neither of which applies here).
I wrote a small program to find the first 5 Taxicab numbers (so far only 6 are known) by checking each integer from 2 to 5E+15. The definition of Taxicab numbers is here.
However, my program took 8 minutes just to reach 3E+7. Since Taxicab(3) is in the order of 8E+7, I hesitate to let it run any further without optimizing it first.
I'm using NetBeans 8 on Ubuntu 16.10 on an HP 8560w (i7-2600QM quad core, 16 GB RAM). However, Java only uses one core, for a maximum of 25% of total CPU power, even when given very high priority. How do I fix this?
public class Ramanujan
{
    public static void main(String[] args)
    {
        long limit;
        //limit = 20;
        limit = 500000000000000000L;
        int order = 1;
        for (long testCase = 2; testCase < limit; testCase++)
        {
            if (isTaxicab(testCase, order))
            {
                System.out.printf("Taxicab(%d) = %d*****************************\n",
                        order, testCase);
                order++;
            }
            else
            {
                if (testCase % 0x186a0 == 0) // prints every 100000 iterations to track progress
                {
                    System.out.printf("%d \n", testCase);
                }
            }
        }
    }

    public static boolean isTaxicab(long testCase, int order)
    {
        int way = 0; // number of ways that testCase can be expressed as a sum of two cubes
        long i = 1;
        long iUpperBound = (long) (1 + Math.cbrt(testCase / 2));
        // If testCase = i*i*i + j*j*j and i <= j,
        // then i*i*i can't be > testCase/2;
        // no need to test beyond that.
        while (i < iUpperBound)
        {
            if (isSumOfTwoCubes(testCase, i))
            {
                way++;
            }
            i++;
        }
        return (way >= order);
    }

    public static boolean isSumOfTwoCubes(long testCase, long i)
    {
        boolean isSum = false;
        long jLowerBound = (long) Math.cbrt(testCase - i*i*i);
        for (long j = jLowerBound; j < jLowerBound + 2; j++)
        {
            long sumCubes = i*i*i + j*j*j;
            if (sumCubes == testCase)
            {
                isSum = true;
                break;
            }
        }
        return isSum;
    }
}
The program itself will only ever use one core until you parallelize it.
You need to learn how to use Threads.
Your problem is embarrassingly parallel. Parallelizing too much (i.e. creating too many threads) will be detrimental because each thread creates an overhead, so you need to be careful regarding exactly how you parallelize.
If it was up to me, I would initialize a list of worker threads where each thread effectively performs isTaxicab() and simply assign a single testCase to each worker as it becomes available.
You would want to code such that you can easily experiment with the number of workers.
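A rough sketch of that worker-pool idea, using an ExecutorService with a configurable number of workers. The question's isTaxicab is replaced here by a trivial stand-in predicate so the sketch is self-contained; any pure long-to-boolean test can be plugged in:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelSearch {
    // Stand-in for the question's isTaxicab(testCase, order): here we just
    // count multiples of 1729 so the sketch can be checked independently.
    static boolean predicate(long n) {
        return n % 1729 == 0;
    }

    static long countMatches(long from, long to, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicLong matches = new AtomicLong();
        long chunk = Math.max(1, (to - from) / workers);
        for (long lo = from; lo < to; lo += chunk) {
            final long start = lo, end = Math.min(lo + chunk, to);
            pool.execute(() -> {
                long local = 0;
                for (long n = start; n < end; n++) if (predicate(n)) local++;
                matches.addAndGet(local); // one atomic update per chunk, not per number
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return matches.get();
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(countMatches(2, 1_000_000, cores));
    }
}
```

Note one caveat for the taxicab search specifically: the original program increments order as results are found, so results from different ranges would need to be collected and sorted before deciding which is Taxicab(1), Taxicab(2), and so on; the sketch above only shows the work distribution.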
I have the following program. I was just kind of messing around with other stuff when I noticed something unusual. The line "y = 3;" seems to have an effect on how fast the previous block of code can be run. When the line is commented out, the first half of the code runs around ten times slower than the second half. However, when the line is uncommented, both halves run at the same speed. Interestingly, the line in question should not be doing anything, as at that point the value of y is already 3.
EDIT:
I added the line "System.out.println(y)" right above "y = 3" and it prints 3; that's why I think it's 3. And I'm measuring based on the output of the program: the two lines it prints are the two runtimes, and the Timer code at the bottom shows clearly how I am measuring time.
/**
 * @author lpreams
 */
public class Misc {
    public static void main(String[] args) {
        new Misc().run();
    }

    public void run() {
        Timer t = new Timer();

        t.start();
        int y = Integer.MIN_VALUE;
        for (int j = 0; j < Integer.MAX_VALUE; ++j) {
            for (int i = 0; i < Integer.MAX_VALUE; ++i) {
                ++y;
            }
        }
        t.stop();
        System.out.println(t.getElapsedTime());
        t.reset();

        //y = 3;

        t.start();
        for (int j = 0; j < Integer.MAX_VALUE; ++j) {
            for (int i = 0; i < Integer.MAX_VALUE; ++i) {
                ++y;
            }
        }
        t.stop();
        System.out.println(t.getElapsedTime());
    }

    private static class Timer {
        private long startTime = 0;
        private long stopTime = 0;
        private long elapsed = 0;

        public void start() {
            this.startTime = System.nanoTime() / 1000000;
        }

        public void stop() {
            this.stopTime = System.nanoTime() / 1000000;
            elapsed += stopTime - startTime;
        }

        public long getElapsedTime() {
            return elapsed;
        }

        public void reset() {
            elapsed = 0;
        }
    }
}
I am running this code in Eclipse on OS X 10.9.2, with the latest version of Java. My machine is a MacBook Pro with a 2.4 GHz Core 2 Duo and 8 GB of RAM.
Any results that you get from this micro-benchmark are suspect. You are not taking account of JVM warmup effects.
Having said that, if we can assume that the effect is real I would put it down to the JIT optimizer being unable to detect that the first loop body can be optimized away ... when the y = 3 assignment is there. You are running into a case where adding a bit more "complexity" is inhibiting an optimization. It happens.
(The value being assigned is immaterial. This is all to do with code generation by the JIT compiler, which happens before anything has actually computed the value you predict will be 3. The mere presence of the assignment can influence the JIT compiler's behaviour.)
This is potentially a JIT optimisation. If you run with the following VM argument:
-Djava.compiler=NONE
(based on this Stack Overflow question: how to make sure no jvm and compiler optimization occurs)
you can prevent this, and you should see the same result for both halves. I ran with this argument and got almost exactly the same processing time.