I have a little question about Java optimization.
I have this code:
import java.util.LinkedList;
import java.util.List;

public class Test {
    public static void main(String[] args) {
        int testCount = 1_000_000;
        test(testCount);
        test(testCount);
        test(testCount);
    }

    public static void test(int test) {
        List<Integer> list = new LinkedList<>();
        long start = System.currentTimeMillis();
        for (int i = 0; i < test; i++) {
            list.add(0, i);
        }
        long finish = System.currentTimeMillis();
        System.out.println("time " + (finish - start));
    }
}
Each run of this test takes much less time than the previous one:
time 2443
time 924
time 143
Could you help me understand why this happens?
Java has a kind of start-up phase: the code only gets fast after it has been running for a short period of time. That's why the first call of the function lasts the longest. If you perform several more calls, you will see that the execution time becomes much more stable after the first few iterations.

You are experiencing JVM warm-up: the JIT compiler kicking in and applying various performance optimizations, including inlining.
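For illustration, a minimal sketch (my own, not from the question) that adds explicit warm-up calls before the runs you actually care about; after the first few iterations the printed times should stabilize:

public static void main(String[] args) {
    int testCount = 1_000_000;
    // Warm-up: give the JIT time to profile and compile test()
    // before taking the measurements we care about.
    for (int i = 0; i < 10; i++) {
        test(testCount);
    }
    // These runs should now report much more stable times.
    test(testCount);
    test(testCount);
    test(testCount);
}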
Why do tests in Java get faster each time they run?
You can't really say "tests in Java get faster each time", because that isn't generally true; just run the test more times (e.g. see this demo) and the times will fluctuate. So you can't ask for the why of an incorrect statement.

The execution time of a Java program depends on many other conditions, e.g. the state of the CPU, RAM, OS, and so on. The execution time of a specific piece of code may well differ from run to run, but we can't say it improves with each execution.
Related
I am working on a small game project and want to track time in order to process physics. After scrolling through different approaches, I first decided to use Java's Instant and Duration classes and have now switched over to Guava's Stopwatch implementation; however, in my snippet both approaches show a big gap at the second call of runtime.elapsed(). That doesn't seem like a big problem in the long run, but why does it happen?

I have tried running the code below both directly and as a Thread, on Windows and on Linux (Ubuntu 18.04), and the result stays the same: the exact values differ, but the gap occurs. I am using the IntelliJ IDEA environment with JDK 11.
Snippet from Main:
public static void main(String[] args) {
    MassObject[] planets = {
        new Spaceship(10, 0, 6378000)
    };
    planets[0].run();
}
This is part of my class MassObject extends Thread:
public void run() {
    // I am using StringBuilder to eliminate flushing delays.
    StringBuilder output = new StringBuilder();
    Stopwatch runtime = Stopwatch.createStarted();
    // massObjectList = static List<MassObject>;
    for (MassObject b : massObjectList) {
        if (b != this) calculateGravity(this, b);
    }
    for (int i = 0; i < 10; i++) {
        output.append(runtime.elapsed().getNano()).append("\n");
    }
    System.out.println(output);
}
Stdout:
30700
1807000
1808900
1811600
1812400
1813300
1830200
1833200
1834500
1835500
Thanks for your help.
You're calling Duration.getNano() on the Duration returned by elapsed(), which isn't what you want.
The internal representation of a Duration is a number of seconds plus a nano offset for whatever additional fraction of a whole second there is in the duration. Duration.getNano() returns that nano offset, and should almost never be called unless you're also calling Duration.getSeconds().
The method you probably want to be calling is toNanos(), which converts the whole duration to a number of nanoseconds.
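For example (my own illustration of the difference, plus the corresponding fix to the loop from the snippet above):

Duration d = Duration.ofMillis(1500);
d.getSeconds(); // 1          (whole seconds)
d.getNano();    // 500000000  (nanos within the current second only)
d.toNanos();    // 1500000000 (the whole duration in nanoseconds)

// The loop from the question then becomes:
for (int i = 0; i < 10; i++) {
    output.append(runtime.elapsed().toNanos()).append("\n");
}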
Edit: In this case that doesn't explain what you're seeing, because the nano offsets being printed do appear to all fall within the same second; but it's still the case that you shouldn't be using getNano().
The actual issue is probably some combination of classloading or extra work that has to happen during the first call, and/or JIT improving performance of future calls (though I don't think looping 10 times is necessarily enough that you'd see much of any change from JIT).
Does anyone have a fairly effective way of running a function repeatedly at a precise and accurate interval in milliseconds? I have tried to accomplish this with the code below, which is meant to run a function called wave() once a second for 30 seconds:
startTime = System.nanoTime();
wholeTime = System.nanoTime();
while (loop) {
    if (startTime >= time2) {
        startTime = System.nanoTime();
        wave();
        sec++;
    }
    if (sec == 30) {
        loop = false;
        endTime = System.nanoTime();
        System.out.println(wholeTime - System.nanoTime());
    }
}
This code did not work, and I am wondering why, and whether there is a better approach to the problem. Any ideas on how to fix the code above, or other ways of accomplishing this, are welcome. Thank you for your help!
Simpler:
long start = System.currentTimeMillis(); // not very accurate
while (System.currentTimeMillis() - start < 30000) {
    wave();
    // count something
}
You can use a Timer+TimerTask: https://docs.oracle.com/javase/7/docs/api/java/util/Timer.html
https://docs.oracle.com/javase/7/docs/api/java/util/TimerTask.html
http://bioportal.weizmann.ac.il/course/prog2/tutorial/essential/threads/timer.html
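For illustration, a minimal sketch (my own; the one-second period and the 30-run cutoff come from the question, and wave() is assumed to exist):

import java.util.Timer;
import java.util.TimerTask;

public class WaveScheduler {
    public static void main(String[] args) {
        Timer timer = new Timer();
        timer.scheduleAtFixedRate(new TimerTask() {
            int count = 0;

            @Override
            public void run() {
                wave();               // the function from the question
                if (++count == 30) {  // stop after 30 executions
                    timer.cancel();
                }
            }
        }, 0, 1000);                  // start immediately, repeat every 1000 ms
    }

    static void wave() { /* ... */ }
}

scheduleAtFixedRate() schedules each run relative to the initial execution time, so individual delays don't accumulate over the 30 seconds.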
You may use Thread.sleep():
public static void main(String[] args) throws InterruptedException {
    int count = 30;
    long start = System.currentTimeMillis();
    for (int i = 0; i < count; i++) {
        wave();
        // How many milliseconds are left until the end of the current second?
        // Computing the target from the original start time prevents drift
        // from accumulating across iterations.
        long sleep = start + (i + 1) * 1000 - System.currentTimeMillis();
        if (sleep > 0) // condition might be false if wave() runs longer than a second
            Thread.sleep(sleep);
    }
}
Does anyone have a fairly effective way of running a function repeatedly at a precise and accurate interval in milliseconds?

There is no way to do this kind of thing reliably and accurately in standard Java. The problem is that there is no way to guarantee that your thread will run when you want it to run. For example:
your thread could be suspended to allow the GC to run
your thread could be preempted to allow another thread in your application to run
your thread could be suspended by the OS while pages used by the JVM are fetched back from disk.
You can only get reliable behavior for this kind of code if you run on a hard real-time OS with a real-time Java implementation.
Note that this is not an issue with clock accuracy. The real problem is that the scheduler does not give you the kind of guarantees you need. For instance, none of the "sleep until X" functionality in a JVM can guarantee that your thread will wake up at time X exactly ... for any useful meaning of "exactly".
The other answers suggest various ways to do this, but beware that they are not (and cannot be) reliable and accurate in all circumstances, or even on a typical machine that is running other things alongside your application.
Recently, I was writing a plugin in Java and found that retrieving an element (using get()) from a HashMap for the first time is very slow. Originally, I wanted to ask a question about that and found this one (no answers though). With further experiments, however, I noticed that this phenomenon also happens with ArrayList, and in fact with every method.
Here is the code:
public class Test {
    public static void main(String[] args) {
        long startTime, stopTime;

        // Method 1
        System.out.println("Test 1:");
        for (int i = 0; i < 20; ++i) {
            startTime = System.nanoTime();
            testMethod1();
            stopTime = System.nanoTime();
            System.out.println((stopTime - startTime) + "ns");
        }

        // Method 2
        System.out.println("Test 2:");
        for (int i = 0; i < 20; ++i) {
            startTime = System.nanoTime();
            testMethod2();
            stopTime = System.nanoTime();
            System.out.println((stopTime - startTime) + "ns");
        }
    }

    public static void testMethod1() {
        // Do nothing
    }

    public static void testMethod2() {
        // Do nothing
    }
}
Snippet: Test Snippet
The output would be like this:
Test 1:
2485ns
505ns
453ns
603ns
362ns
414ns
424ns
488ns
325ns
426ns
618ns
794ns
389ns
686ns
464ns
375ns
354ns
442ns
404ns
450ns
Test 2:
3248ns
700ns
538ns
531ns
351ns
444ns
321ns
424ns
523ns
488ns
487ns
491ns
551ns
497ns
480ns
465ns
477ns
453ns
727ns
504ns
I ran the code a few times and the results are about the same. The first call can be even longer (>8000 ns) on my computer (Windows 8.1, Oracle Java 8u25).

Apparently, the first call is usually slower than the following calls (though some later calls may also be longer in random cases).
Update:
I tried to learn some JMH and wrote a test program.
Code w/ sample output: Code
I don't know whether it's a proper benchmark (if the program has problems, tell me), but I found that the first warm-up iteration takes more time (I use two warm-up iterations in case the warm-ups affect the results). I believe the first warm-up iteration contains the first call and is therefore slower. So this phenomenon exists, if the test is proper.
So why does it happen?
You're calling System.nanoTime() inside a loop. Those calls are not free, so in addition to the time taken by an empty method you're actually measuring the time it takes to exit nanoTime call #1 and enter nanoTime call #2.

To make things worse, you're doing that on Windows, where nanoTime performs worse than on other platforms.
Regarding JMH: I don't think it's much help in this situation. It's designed to measure by averaging many iterations, to avoid dead-code elimination, account for JIT warm-up, avoid ordering dependence, and so on; and as far as I know it simply uses nanoTime under the hood too.
Its design goals pretty much aim for the opposite of what you're trying to measure.
You are measuring something. But that something might be several cache misses, nanoTime call overhead, some JVM internals (class loading? some kind of lazy initialization in the interpreter?), or, most likely, a combination thereof.
The point is that your measurement can't really be taken at face value. Even if there is a certain cost for calling a method for the first time, the time you're measuring only provides an upper bound for that.
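For illustration, here is a minimal sketch (my own, not from the question) that measures back-to-back System.nanoTime() calls; the delta is the measurement overhead that is baked into every timing of this style:

public class NanoTimeOverhead {
    public static void main(String[] args) {
        // The gap between two adjacent nanoTime() calls is pure overhead:
        // nothing timed with this pattern can appear faster than this floor.
        for (int i = 0; i < 20; i++) {
            long t1 = System.nanoTime();
            long t2 = System.nanoTime();
            System.out.println((t2 - t1) + "ns");
        }
    }
}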
This kind of behaviour is often caused by the compiler or the runtime environment, which starts to optimize the execution after the first iteration. Additionally, class loading can have an effect (though I guess that is not the case in your example, as all classes have been loaded by the first loop at the latest).
See this thread for a similar problem.
Please keep in mind this kind of behaviour is often dependent on the environment/OS it's running on.
import java.util.ArrayList;
import java.util.List;

public class HowFastMulticoreProgramming {
    public static void main(String[] args) {
        // Produce data
        List<String> data = new ArrayList<String>();
        for (int i = 0; i < 10000; i++) {
            data.add("" + i);
        }

        /* Style Java 1.4 */
        long beforeStartJDK14 = System.currentTimeMillis();
        for (int i = 0; i < data.size(); i++) {
            System.out.println(data.get(i));
        }
        long afterPrintJDK14 = System.currentTimeMillis();

        /* Style Java 1.5 */
        long beforeStartJDK15 = System.currentTimeMillis();
        for (String s : data) {
            System.out.println(s);
        }
        long afterPrintJDK15 = System.currentTimeMillis();

        /* Style Java 1.8 */
        long beforeStartJDK18 = System.currentTimeMillis();
        data.parallelStream().forEach(string -> System.out.println(string));
        long afterPrintJDK18 = System.currentTimeMillis();

        System.out.println("Milis Need JDK 1.4 : " + (afterPrintJDK14 - beforeStartJDK14));
        System.out.println("Milis Need JDK 1.5 : " + (afterPrintJDK15 - beforeStartJDK15));
        System.out.println("Milis Need JDK 1.8 : " + (afterPrintJDK18 - beforeStartJDK18));
    }
}
I have three styles of printing a List (based on JDK version), but every style needs time to complete. In fact, the JDK 8 style with lambdas needs far more time than any other style. How come?
This is what I get from running this code:
Time Milis Need JDK 1.4 : 85
Time Milis Need JDK 1.5 : 76
Time Milis Need JDK 1.8 : 939
I hope somebody can answer this question.
This comparison is completely meaningless.
First, the first two variants are completely dominated by I/O time. Any loop over anything whatsoever that does output will usually be. The way you iterate has an effect that is probably in the noise. I/O is slow.
But it is not quite as slow as what you're doing in the third variant. In the third variant, you use parallelStream(), which invokes the fork/join machinery of Java 8. You're spawning multiple threads (probably as many as you have CPU cores). You're distributing the tasks of writing the list elements over these threads. You're then writing to the same stream from each of these threads, which serializes their operation; i.e., after you went through all the work of creating the threads and distributing the tasks, you're still only doing one thing at a time, and on top of that you're now incurring massive synchronization overhead.
If you want to do an interesting comparison, you need to transform data into some other data, and you need to do non-trivial (but not synchronized) work on each item, so that the task management overhead doesn't swamp the computation time.
In the meantime, try using stream() instead of parallelStream(). That should get the time down to roughly the time of the other two variants. That doesn't make it any more meaningful though.
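For instance, the sequential variant is a one-call change (using the same data list as in the question):

// Sequential stream: no thread pool, no cross-thread synchronization on System.out.
data.stream().forEach(string -> System.out.println(string));
// or, equivalently: data.forEach(System.out::println);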
Disclaimer: You are doing micro benchmarks, and micro benchmarks are hard to do. I'm sure my slightly changed code below has enough problems by itself.
A parallelStream() needs some startup time, and you have the overhead that multiple threads bring with them.

Another problem is that you are doing System.out.println() for each item; it's I/O, so you are measuring a whole lot besides the iteration. That's especially a problem when you are accessing one stream (System.out) from multiple threads.

If you delete your print statements, the JVM will probably just skip the loops, which is why I instead add every element to a sum. That should be quite fast, and it won't get optimized away.
When running the following example with a list size of 100000000 (takes about one minute to create), I get these results:
Milis Need JDK 1.4 : 190
Milis Need JDK 1.5 : 681
Milis Need JDK 1.8 : 198
My code:
@Test
public void testIterationSpeed() {
    List<Integer> data = new ArrayList<>();
    for (int i = 0; i < 100000000; i++) {
        data.add(i);
    }

    /* Style Java 1.4 */
    long dummySum = 0;
    long beforeStartJDK14 = System.currentTimeMillis();
    for (int i = 0; i < data.size(); i++) {
        dummySum += data.get(i);
    }
    long afterPrintJDK14 = System.currentTimeMillis();

    /* Style Java 1.5 */
    dummySum = 0;
    long beforeStartJDK15 = System.currentTimeMillis();
    for (Integer i : data) {
        dummySum += i;
    }
    long afterPrintJDK15 = System.currentTimeMillis();

    /* Java 1.8 */
    long beforeStartJDK18 = System.currentTimeMillis();
    data.parallelStream().mapToLong(i -> i).sum();
    long afterPrintJDK18 = System.currentTimeMillis();

    System.out.println("Milis Need JDK 1.4 : " + (afterPrintJDK14 - beforeStartJDK14));
    System.out.println("Milis Need JDK 1.5 : " + (afterPrintJDK15 - beforeStartJDK15));
    System.out.println("Milis Need JDK 1.8 : " + (afterPrintJDK18 - beforeStartJDK18));
}
Note that if you decrease the list size, the overhead of the parallelStream will dominate; this effect is related to Amdahl's Law. Also, the process of creating the sum is different from that of the other loops, so it's not a perfect benchmark.

What's interesting is that the for-each loop is slower than the indexed for loop in this case.
I am getting different execution times if I interchange the HashMap and HashSet blocks. The execution time is always higher for whichever one appears first (either HashMap or HashSet). I am not sure about the reason behind this. Any help is appreciated.
Execution 1 - HashMap first, then HashSet:
Time taken map add: 2071ms,
Time taken set add: 794ms

Execution 2 - HashSet first, then HashMap:
Time taken set add: 2147ms,
Time taken map add: 781ms
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class Test {
    private static Random secureRandom = new SecureRandom();

    public static void main(String[] args) {
        int testnumber = 1000000;

        // HashMap
        long starttimemap = System.currentTimeMillis();
        Map<String, String> hashmap = new HashMap<String, String>();
        for (int i = 0; i < testnumber; i++) {
            hashmap.put(Long.toHexString(secureRandom.nextLong()), "true");
        }
        long endtimemap = System.currentTimeMillis();
        System.out.println("Time taken map add: " + (endtimemap - starttimemap) + "ms");

        // HashSet
        long starttimeset = System.currentTimeMillis();
        Set<String> hashset = new HashSet<String>();
        for (int i = 0; i < testnumber; i++) {
            hashset.add(Long.toHexString(secureRandom.nextLong()));
        }
        long endtimeset = System.currentTimeMillis();
        System.out.println("Time taken set add: " + (endtimeset - starttimeset) + "ms");
    }
}
The reason is the way the JVM works. The JIT compiler needs some time to kick in because it decides which code to compile based on execution count.
So, it's totally natural that the second pass is faster, because the JIT already compiled a lot of Java code to native code.
If you start the program using the -Xint option (which disables the JIT), both runs should be roughly equal in execution time.
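For instance, assuming the class above is compiled as Test, running it in pure interpreted mode looks like this:

java -Xint Test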
One likely reason is that you're not warming up the JIT before performing the benchmarks.
Basically, Java executes bytecode (which is somewhat slower) for a while before figuring out what's used often enough to justify JIT compiling it into native machine code (which is faster). As such, whatever happens first will often be slower.
Run both things a bunch of times before starting the real benchmarks to give it a chance to JIT the relevant code.
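A minimal sketch of that idea (my own; fillMap() and fillSet() are hypothetical helpers wrapping the two loops from the question):

// Warm-up: run both loops several times so the JIT compiles the hot paths.
for (int i = 0; i < 5; i++) {
    fillMap(); // hypothetical helper containing the HashMap loop
    fillSet(); // hypothetical helper containing the HashSet loop
}

// Only now take the measurement you care about.
long start = System.currentTimeMillis();
fillMap();
System.out.println("Time taken map add: " + (System.currentTimeMillis() - start) + "ms");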
You are not getting different execution times, you are getting the same execution times. Regardless of whether you use HashMap or HashSet you get the same time for the first loop and the same time for the second. The difference between the first and second has been explained already, it’s due to the JVM’s optimizations. It’s not surprising that it doesn’t matter whether you use HashMap or HashSet as HashSet uses a HashMap internally. You are executing the same code all the time.
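For reference, HashSet.add() really is just a thin wrapper around a HashMap (simplified from the OpenJDK sources):

// Inside java.util.HashSet (simplified):
// private transient HashMap<E,Object> map;
// private static final Object PRESENT = new Object();

public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}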