Java 8 speed test by simply printing a list - java

import java.util.ArrayList;
import java.util.List;

public class HowFastMulticoreProgramming {
    public static void main(String[] args) {
        // Produce data
        List<String> data = new ArrayList<String>();
        for (int i = 0; i < 10000; i++) {
            data.add("" + i);
        }

        /* Style Java 1.4 */
        long beforeStartJDK14 = System.currentTimeMillis();
        for (int i = 0; i < data.size(); i++) {
            System.out.println(data.get(i));
        }
        long afterPrintJDK14 = System.currentTimeMillis();

        /* Style Java 1.5 */
        long beforeStartJDK15 = System.currentTimeMillis();
        for (String s : data) {
            System.out.println(s);
        }
        long afterPrintJDK15 = System.currentTimeMillis();

        /* Style Java 1.8 */
        long beforeStartJDK18 = System.currentTimeMillis();
        data.parallelStream().forEach(string -> System.out.println(string));
        long afterPrintJDK18 = System.currentTimeMillis();

        System.out.println("Millis Need JDK 1.4 : " + (afterPrintJDK14 - beforeStartJDK14));
        System.out.println("Millis Need JDK 1.5 : " + (afterPrintJDK15 - beforeStartJDK15));
        System.out.println("Millis Need JDK 1.8 : " + (afterPrintJDK18 - beforeStartJDK18));
    }
}
I have 3 styles of printing a List (based on JDK version), and every style needs time to complete. But the JDK 8 style with lambdas needs far more time than any of the others.
How come?
This is what I get from running this code:
Millis Need JDK 1.4 : 85
Millis Need JDK 1.5 : 76
Millis Need JDK 1.8 : 939
I hope somebody can answer this question.

This comparison is completely meaningless.
First, the first two variants are completely dominated by I/O time; any loop that does output usually is. The effect of how you iterate is probably lost in the noise. I/O is slow.
But it is not quite as slow as what you're doing in the third variant. In the third variant, you use parallelStream(), which invokes the fork/join machinery of Java 8. You're spawning multiple threads (probably as many as you have CPU cores). You're distributing the tasks to write the list elements over these threads. You're then writing to the same stream from each of these threads, which serializes their operation, i.e. after you went through all the work of creating the threads and distributing tasks, you're still only doing one thing at a time, plus you're now also incurring massive synchronization overhead.
If you want to do an interesting comparison, you need to transform data into some other data, and you need to do non-trivial (but not synchronized) work on each item, so that the task management overhead doesn't swamp the computation time.
In the meantime, try using stream() instead of parallelStream(). That should get the time down to roughly the time of the other two variants. That doesn't make it any more meaningful though.
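For illustration, a minimal sketch of that change, using the same data list as in the question:

data.stream().forEach(s -> System.out.println(s));

Or, if you also want to cut the per-element I/O cost, build the output once and write it with a single call:

System.out.println(String.join(System.lineSeparator(), data));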

Disclaimer: You are doing microbenchmarks, and microbenchmarks are hard to do. I'm sure my slightly changed code below has enough problems of its own.
A parallelStream() needs some startup time, and you have the overhead that multiple threads bring with them.
Another problem is that you are doing System.out.println() for each item - it's I/O, so you are measuring a whole lot besides the iteration. That's especially a problem when you are accessing one stream (System.out) from multiple threads.
If you delete your print statements, the JVM will probably just skip the loops; that's why I'm adding every element to a sum instead. That should be quite fast, and it won't get optimized away.
When running the following example with a list size of 100000000 (takes about one minute to create), I get these results:
Millis Need JDK 1.4 : 190
Millis Need JDK 1.5 : 681
Millis Need JDK 1.8 : 198
My code:
@Test
public void testIterationSpeed() {
    List<Integer> data = new ArrayList<>();
    for (int i = 0; i < 100000000; i++) {
        data.add(i);
    }

    /* Style Java 1.4 */
    long dummySum = 0;
    long beforeStartJDK14 = System.currentTimeMillis();
    for (int i = 0; i < data.size(); i++) {
        dummySum += data.get(i);
    }
    long afterPrintJDK14 = System.currentTimeMillis();

    /* Style Java 1.5 */
    dummySum = 0;
    long beforeStartJDK15 = System.currentTimeMillis();
    for (Integer i : data) {
        dummySum += i;
    }
    long afterPrintJDK15 = System.currentTimeMillis();

    /* Java 1.8 */
    long beforeStartJDK18 = System.currentTimeMillis();
    data.parallelStream().mapToLong(i -> i).sum();
    long afterPrintJDK18 = System.currentTimeMillis();

    System.out.println("Millis Need JDK 1.4 : " + (afterPrintJDK14 - beforeStartJDK14));
    System.out.println("Millis Need JDK 1.5 : " + (afterPrintJDK15 - beforeStartJDK15));
    System.out.println("Millis Need JDK 1.8 : " + (afterPrintJDK18 - beforeStartJDK18));
}
Note that if you decrease the list size, the overhead of the parallelStream will dominate - an effect described by Amdahl's Law. And the process of creating the sum is different from that in the other loops, so it's not a perfect benchmark.
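For reference, Amdahl's Law says that if only a fraction p of the work can be parallelized, the speedup on n cores is bounded by

speedup(n) = 1 / ((1 - p) + p / n)

so when the fixed setup cost of the parallel stream makes the serial fraction (1 - p) large, hardly any speedup is possible no matter how many cores you have.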
What's interesting is that the for-each loop is slower than the indexed for loop in this case.

Related

Why is a particular Guava Stopwatch.elapsed() call much later than others? (output in post)

I am working on a small game project and want to track time in order to process physics. After looking through different approaches, I first decided to use Java's Instant and Duration classes and have now switched over to Guava's Stopwatch implementation. However, in my snippet, both approaches show a big gap at the second call of runtime.elapsed(). That doesn't seem like a big problem in the long run, but why does it happen?
I have tried running the code below both directly and as a Thread, on Windows and on Linux (Ubuntu 18.04), and the result stays the same - the exact values differ, but the gap occurs. I am using the IntelliJ IDEA environment with JDK 11.
Snippet from Main:
public static void main(String[] args) {
    MassObject[] planets = {
        new Spaceship(10, 0, 6378000)
    };
    planets[0].run();
}
This is part of my class MassObject extends Thread:
public void run() {
    // I am using StringBuilder to eliminate flushing delays.
    StringBuilder output = new StringBuilder();
    Stopwatch runtime = Stopwatch.createStarted();
    // massObjectList = static List<MassObject>;
    for (MassObject b : massObjectList) {
        if (b != this) calculateGravity(this, b);
    }
    for (int i = 0; i < 10; i++) {
        output.append(runtime.elapsed().getNano()).append("\n");
    }
    System.out.println(output);
}
Stdout:
30700
1807000
1808900
1811600
1812400
1813300
1830200
1833200
1834500
1835500
Thanks for your help.
You're calling Duration.getNano() on the Duration returned by elapsed(), which isn't what you want.
The internal representation of a Duration is a number of seconds plus a nano offset for whatever additional fraction of a whole second there is in the duration. Duration.getNano() returns that nano offset, and should almost never be called unless you're also calling Duration.getSeconds().
The method you probably want to be calling is toNanos(), which converts the whole duration to a number of nanoseconds.
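A short sketch of the difference, assuming a Stopwatch named runtime as in the question (elapsed() returns a java.time.Duration here):

Duration d = runtime.elapsed(); // say, 2 seconds and 5 milliseconds
d.getSeconds(); // 2          - the whole-second part only
d.getNano();    // 5000000    - only the fractional part, in nanoseconds
d.toNanos();    // 2005000000 - the entire duration converted to nanoseconds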
Edit: In this case that doesn't explain what you're seeing because it does appear that the nano offsets being printed are probably all within the same second, but it's still the case that you shouldn't be using getNano().
The actual issue is probably some combination of classloading or extra work that has to happen during the first call, and/or JIT improving performance of future calls (though I don't think looping 10 times is necessarily enough that you'd see much of any change from JIT).

Java: Why is calling a method for the first time slower?

Recently, I was writing a plugin in Java and found that retrieving an element (using get()) from a HashMap for the first time is very slow. Originally, I wanted to ask a question about that and found this (no answers though). With further experiments, however, I noticed that this phenomenon happens with ArrayList too, and in fact with all methods.
Here is the code:
public class Test {
    public static void main(String[] args) {
        long startTime, stopTime;
        // Method 1
        System.out.println("Test 1:");
        for (int i = 0; i < 20; ++i) {
            startTime = System.nanoTime();
            testMethod1();
            stopTime = System.nanoTime();
            System.out.println((stopTime - startTime) + "ns");
        }
        // Method 2
        System.out.println("Test 2:");
        for (int i = 0; i < 20; ++i) {
            startTime = System.nanoTime();
            testMethod2();
            stopTime = System.nanoTime();
            System.out.println((stopTime - startTime) + "ns");
        }
    }

    public static void testMethod1() {
        // Do nothing
    }

    public static void testMethod2() {
        // Do nothing
    }
}
Snippet: Test Snippet
The output would be like this:
Test 1:
2485ns
505ns
453ns
603ns
362ns
414ns
424ns
488ns
325ns
426ns
618ns
794ns
389ns
686ns
464ns
375ns
354ns
442ns
404ns
450ns
Test 2:
3248ns
700ns
538ns
531ns
351ns
444ns
321ns
424ns
523ns
488ns
487ns
491ns
551ns
497ns
480ns
465ns
477ns
453ns
727ns
504ns
I ran the code a few times and the results are about the same. The first call would be even longer (>8000 ns) on my computer (Windows 8.1, Oracle Java 8u25).
Apparently, the first call is usually slower than the following calls (some later calls may also take longer in random cases).
Update:
I tried to learn some JMH and wrote a test program.
Code w/ sample output: Code
I don't know whether it's a proper benchmark (if the program has some problems, tell me), but I found that the first warm-up iterations spend more time (I use two warm-up iterations in case the warm-ups affect the results). I think the first warm-up iteration contains the first call, which would make it slower. So this phenomenon exists, if the test is proper.
So why does it happen?
You're calling System.nanoTime() inside a loop. Those calls are not free, so in addition to the time taken by an empty method you're actually measuring the time it takes to exit from nanoTime call #1 and to enter nanoTime call #2.
To make things worse, you're doing that on Windows, where nanoTime performs worse than on other platforms.
Regarding JMH: I don't think it's much help in this situation. It's designed to measure by averaging many iterations, to avoid dead code elimination, account for JIT warmup, avoid ordering dependence, ... and as far as I know it simply uses nanoTime under the hood too.
Its design goals pretty much aim for the opposite of what you're trying to measure.
You are measuring something. But that something might be several cache misses, nanotime call overhead, some JVM internals (class loading? some kind of lazy initialization in the interpreter?), ... probably a combination thereof.
The point is that your measurement can't really be taken at face value. Even if there is a certain cost for calling a method for the first time, the time you're measuring only provides an upper bound for that.
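One way to see part of that overhead directly - a minimal sketch, not a rigorous benchmark - is to time back-to-back nanoTime() calls with nothing in between:

public class NanoTimeOverhead {
    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            long t1 = System.nanoTime();
            long t2 = System.nanoTime();
            // Any difference here is pure measurement overhead.
            System.out.println((t2 - t1) + "ns");
        }
    }
}

Whatever this prints is roughly the floor below which nanoTime-bracketed measurements can't be trusted.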
This kind of behaviour is often caused by the compiler or the runtime environment, which starts to optimize the execution after the first iterations. Additionally, class loading can have an effect (I guess this is not the case in your example code, as all classes are loaded during the first loop at the latest).
See this thread for a similar problem.
Please keep in mind that this kind of behaviour often depends on the environment/OS it's running on.

Why do tests in Java go faster each time?

I have a little question about Java optimization.
I have this code:
import java.util.LinkedList;
import java.util.List;

public class Test {
    public static void main(String[] args) {
        int testCount = 1_000_000;
        test(testCount);
        test(testCount);
        test(testCount);
    }

    public static void test(int test) {
        List<Integer> list = new LinkedList<>();
        long start = System.currentTimeMillis();
        for (int i = 0; i < test; i++) {
            list.add(0, i);
        }
        long finish = System.currentTimeMillis();
        System.out.println("time " + (finish - start));
    }
}
Each successive run of this test takes much less time than the previous one:
time 2443
time 924
time 143
Could you help me understand why this happens?
The problem is actually that Java has a kind of start-up phase. The code only gets fast after a short period of time; that's why the first function call lasts the longest. If you perform several more calls, you will see that the execution time becomes more stable after the first few iterations.
You are experiencing the JVM warming up and various performance optimizations, including inlining, kicking in.
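If you want to watch the warm-up happen, HotSpot can log JIT activity (the flag is standard, though the exact output format varies by JVM version):

java -XX:+PrintCompilation Test

You should see methods from test() and LinkedList being compiled, and often recompiled at higher optimization tiers, during the first iterations.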
Why do tests in Java go faster each time?
You can't even say that tests in Java go faster each time, because it isn't generally true - just run the test more times (i.e. see this demo) - so you can't ask for the "why" of an incorrect statement.
The execution time of a Java program depends on many other conditions, i.e. the current load on the CPU, RAM, the OS, etc., and of course the execution time of a specific piece of code may differ each run, but we can't say it gets better with every execution.

Different execution time for HashMap and HashSet based on the order of execution?

I am getting different execution times if I interchange the HashMap and the HashSet. The execution time is always higher for whichever appears first (either HashMap or HashSet). I am not sure about the reason behind this. Any help appreciated.
Execution 1 - HashMap first, then HashSet:
Time taken map add: 2071ms,
Time taken set add: 794ms
Execution 2 - HashSet first, then HashMap:
Time taken set add: 2147ms,
Time taken map add: 781ms
private static Random secureRandom = new SecureRandom();

public static void main(String[] args)
{
    int testnumber = 1000000;

    // HashMap
    long starttimemap = System.currentTimeMillis();
    Map<String, String> hashmap = new HashMap<String, String>();
    for (int i = 0; i < testnumber; i++)
    {
        hashmap.put(Long.toHexString(secureRandom.nextLong()), "true");
    }
    long endtimemap = System.currentTimeMillis();
    System.out.println("Time taken map add: " + (endtimemap - starttimemap) + "ms");

    // HashSet
    long starttimeset = System.currentTimeMillis();
    Set<String> hashset = new HashSet<String>();
    for (int i = 0; i < testnumber; i++)
    {
        hashset.add(Long.toHexString(secureRandom.nextLong()));
    }
    long endtimeset = System.currentTimeMillis();
    System.out.println("Time taken set add: " + (endtimeset - starttimeset) + "ms");
}
The reason is the way the JVM works. The JIT compiler needs some time to kick in because it decides which code to compile based on execution count.
So, it's totally natural that the second pass is faster, because the JIT already compiled a lot of Java code to native code.
If you start the program using the -Xint option (which disables the JIT), both runs should be roughly equal in execution time.
One likely reason is that you're not warming up the JIT before performing the benchmarks.
Basically, Java executes bytecode (which is somewhat slower) for a while before figuring out what's used often enough to justify JIT compiling it into native machine code (which is faster). As such, whatever happens first will often be slower.
Run both things a bunch of times before starting the real benchmarks to give it a chance to JIT the relevant code.
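A sketch of that warm-up idea (the runBothLoops() helper is hypothetical and stands for the two timed loops in the question):

// Warm-up: let the JIT compile the hot paths before measuring.
for (int warmup = 0; warmup < 10; warmup++) {
    runBothLoops(); // hypothetical helper wrapping the HashMap and HashSet loops
}
// Only the runs after warm-up should be timed and reported.
runBothLoops();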
You are not getting different execution times, you are getting the same execution times. Regardless of whether you use HashMap or HashSet you get the same time for the first loop and the same time for the second. The difference between the first and second has been explained already, it’s due to the JVM’s optimizations. It’s not surprising that it doesn’t matter whether you use HashMap or HashSet as HashSet uses a HashMap internally. You are executing the same code all the time.
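That delegation is easy to see; the idea, paraphrased from the OpenJDK sources (details elided), is roughly:

public class HashSet<E> {
    private HashMap<E, Object> map = new HashMap<>();
    private static final Object PRESENT = new Object();

    public boolean add(E e) {
        // add() is just put() on the backing HashMap
        return map.put(e, PRESENT) == null;
    }
}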

Why is using Java threads not much faster?

I have the following program to remove even numbers from a string vector. When the vector size grows large, it can take a long time, so I thought of threads, but using 10 threads is not faster than one thread. My PC has 6 cores and 12 threads. Why?
import java.util.*;

public class Test_Threads
{
    static boolean Use_Threads_To_Remove_Duplicates(Vector<String> Good_Email_Address_Vector, Vector<String> To_Be_Removed_Email_Address_Vector)
    {
        boolean Removed_Duplicates = false;
        int Threads_Count = 10, Delay = 5, Average_Size_For_Each_Thread = Good_Email_Address_Vector.size() / Threads_Count;
        Remove_Duplicate_From_Vector_Thread[] RDFVT = new Remove_Duplicate_From_Vector_Thread[Threads_Count];
        Remove_Duplicate_From_Vector_Thread.To_Be_Removed_Email_Address_Vector = To_Be_Removed_Email_Address_Vector;
        for (int i = 0; i < Threads_Count; i++)
        {
            // Copy this thread's slice of the list into its own Vector
            Vector<String> Target_Vector = new Vector<String>();
            if (i < Threads_Count - 1)
                for (int j = i * Average_Size_For_Each_Thread; j < (i + 1) * Average_Size_For_Each_Thread; j++)
                    Target_Vector.add(Good_Email_Address_Vector.elementAt(j));
            else
                for (int j = i * Average_Size_For_Each_Thread; j < Good_Email_Address_Vector.size(); j++)
                    Target_Vector.add(Good_Email_Address_Vector.elementAt(j));
            RDFVT[i] = new Remove_Duplicate_From_Vector_Thread(Target_Vector, Delay);
        }
        // Wait for all threads to finish
        try { for (int i = 0; i < Threads_Count; i++) RDFVT[i].Remover_Thread.join(); }
        catch (Exception e) { e.printStackTrace(); }
        for (int i = 0; i < Threads_Count; i++) if (RDFVT[i].Changed) Removed_Duplicates = true;
        if (Removed_Duplicates) // Collect results
        {
            Good_Email_Address_Vector.clear();
            for (int i = 0; i < Threads_Count; i++) Good_Email_Address_Vector.addAll(RDFVT[i].Target_Vector);
        }
        return Removed_Duplicates;
    }

    public static void out(String message) { System.out.print(message); }
    public static void Out(String message) { System.out.println(message); }

    public static void main(String[] args)
    {
        long start = System.currentTimeMillis();
        Vector<String> Good_Email_Address_Vector = new Vector<String>(), To_Be_Removed_Email_Address_Vector = new Vector<String>();
        for (int i = 0; i < 1000; i++) Good_Email_Address_Vector.add(i + "");
        Out(Good_Email_Address_Vector.toString());
        for (int i = 0; i < 1500000; i++) To_Be_Removed_Email_Address_Vector.add(i * 2 + "");
        Out("=============================");
        Use_Threads_To_Remove_Duplicates(Good_Email_Address_Vector, To_Be_Removed_Email_Address_Vector); // [ Approach 1 : Use 10 threads ]
        // Good_Email_Address_Vector.removeAll(To_Be_Removed_Email_Address_Vector); // [ Approach 2 : just one thread ]
        Out(Good_Email_Address_Vector.toString());
        long end = System.currentTimeMillis();
        Out("Time taken for execution is " + (end - start));
    }
}

class Remove_Duplicate_From_Vector_Thread
{
    static Vector<String> To_Be_Removed_Email_Address_Vector;
    Vector<String> Target_Vector;
    Thread Remover_Thread;
    boolean Changed = false;

    public Remove_Duplicate_From_Vector_Thread(final Vector<String> Target_Vector, final int Delay)
    {
        this.Target_Vector = Target_Vector;
        Remover_Thread = new Thread(new Runnable()
        {
            public void run()
            {
                try
                {
                    Thread.sleep(Delay);
                    Changed = Target_Vector.removeAll(To_Be_Removed_Email_Address_Vector);
                }
                catch (InterruptedException e) { e.printStackTrace(); }
            }
        });
        Remover_Thread.start();
    }
}
class Remove_Duplicate_From_Vector_Thread
{
static Vector<String> To_Be_Removed_Email_Address_Vector;
Vector<String> Target_Vector;
Thread Remover_Thread;
boolean Changed=false;
public Remove_Duplicate_From_Vector_Thread(final Vector<String> Target_Vector,final int Delay)
{
this.Target_Vector=Target_Vector;
Remover_Thread=new Thread(new Runnable()
{
public void run()
{
try
{
Thread.sleep(Delay);
Changed=Target_Vector.removeAll(To_Be_Removed_Email_Address_Vector);
}
catch (InterruptedException e) { e.printStackTrace(); }
finally { }
}
});
Remover_Thread.start();
}
}
In my program you can try "[ Approach 1 : Use 10 threads ]" or "[ Approach 2 : just one thread ]"; there isn't much difference speed-wise. I expected it to be several times faster. Why?
The simple answer is that your threads are all trying to access a single vector calling synchronized methods. The synchronized modifier on those methods ensures that only one thread can be executing any of the methods on that object at any given time. So a significant part of the parallel part of the computation involves waiting for other threads.
The other problem is that for an O(N) input list, you have an O(N) setup ... population of the Target_Vector objects ... that is done in one thread. Plus the overheads of thread creation.
All of this adds up to not much speedup.
You should get a significant speedup (with multiple threads) if you used a single ConcurrentHashMap instead of a single Good_Email_Address_Vector object that gets split into multiple Target_Vector objects:
the remove operation is O(1) not O(n),
reduced copying,
the data structure provides better multi-threaded performance due to better handling of contention, and
you don't need to jump through hoops to avoid ConcurrentModificationException.
In addition, the To_Be_Removed_Email_Address_Vector object should be replaced with an unsynchronized List, and List.subList(...) should be used to create views that can be passed to the threads.
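A rough sketch of that design (illustrative names; ConcurrentHashMap.newKeySet() requires Java 8, and error handling is omitted):

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentRemovalSketch {
    public static void main(String[] args) throws InterruptedException {
        // Concurrent set of "good" addresses: O(1) removal, safe for many threads.
        Set<String> good = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < 1000; i++) good.add(i + "");

        // Unsynchronized, effectively read-only list of addresses to remove.
        List<String> toRemove = new ArrayList<>();
        for (int i = 0; i < 1500000; i++) toRemove.add(i * 2 + "");

        int threadCount = 10;
        int chunk = toRemove.size() / threadCount;
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            // subList() gives each thread a view of its slice; nothing is copied.
            List<String> slice = toRemove.subList(t * chunk,
                    t == threadCount - 1 ? toRemove.size() : (t + 1) * chunk);
            threads[t] = new Thread(() -> {
                for (String s : slice) good.remove(s); // O(1) per removal, thread-safe
            });
            threads[t].start();
        }
        for (Thread thread : threads) thread.join();
        System.out.println(good.size() + " addresses remain");
    }
}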
In short, you are better off throwing away your current code and starting again. And please use sensible identifier names that follow the Java coding conventions, and wrap your code at column ~80 so that people can read it!
Vector Synchronization Creates Contention
You've split up the vector to be modified, which avoids some contention. But multiple threads are accessing the static Vector To_Be_Removed_Email_Address_Vector, so much contention still remains (all Vector methods are synchronized).
Use an unsynchronized data structure for the shared, read-only information so that there is no contention between threads. On my machine, running your test with ArrayList in place of Vector cut the execution time in half.
Even without contention, thread-safe structures are slower, so don't use them when only a single thread has access to an object. Additionally, Vector has been largely obsolete since Java 5. Avoid it unless you have to inter-operate with a legacy API you can't alter.
Choose a Suitable Data Structure
A list data structure is going to provide poor performance for this task. Since email addresses are likely to be unique, a set should be a suitable replacement, and removeAll() will run much faster on large sets. Using HashSet in place of the original Vector cut execution time on my (8 core) machine from over 5 seconds to around 3 milliseconds. Roughly half of this improvement is due to using the right data structure for the job.
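A sketch of that swap, using the same sizes as the question (single-threaded; with a HashSet argument, removeAll() costs a hash probe per element instead of a linear scan):

import java.util.HashSet;
import java.util.Set;

public class HashSetRemoval {
    public static void main(String[] args) {
        Set<String> good = new HashSet<>();
        for (int i = 0; i < 1000; i++) good.add(i + "");

        Set<String> toRemove = new HashSet<>();
        for (int i = 0; i < 1500000; i++) toRemove.add(i * 2 + "");

        long start = System.currentTimeMillis();
        good.removeAll(toRemove); // iterates the smaller set, probes the larger one
        System.out.println("Removal took " + (System.currentTimeMillis() - start)
                + "ms, " + good.size() + " entries remain");
    }
}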
Concurrent Structures Are a Bad Fit
Using a concurrent data structure is relatively slow, and doesn't simplify the code, so I don't recommend it.
Using a more up-to-date concurrent data structure is much faster than contending for a Vector, but the concurrency overhead of these data structures is still much higher than single-threaded structures. For example, running the original code on my machine took more than five seconds, while a ConcurrentSkipListSet took half a second, and a ConcurrentHashMap took one eighth of a second. But remember, when each thread had its own HashSet to update, the total time was just 3 milliseconds.
Even when all threads are updating a single concurrent data structure, the code needed to partition the workload is very similar to that used to create a separate Vector for each thread in the original code. From a readability and maintenance standpoint, all of these solutions have equivalent complexity.
If you had a situation where "bad" email addresses were being added to the set asynchronously, and you wanted readers of the "good" list to see those updates auto-magically, a concurrent set would be a good choice. But, with the current design of the API, where consumers of the "good" list explicitly call a blocking filter method to update the list, a concurrent data structure may be the wrong choice.
All your threads are working on the same vector. Your access to the vector is serialized (i.e. only one thread can access it at a time) so using multiple threads is likely to be the same speed at best, but more likely to be much slower.
Multiple threads work much faster when you have independent tasks to perform.
In this case, the fastest option is likely to be to create a new List which contains all the elements you want to retain, and to replace the original with it, in one thread. This will be faster than using a concurrent collection with multiple threads.
For comparison, this is what you can do with one thread. As the collection is fairly small, the JVM doesn't warm up in just one run, so there are multiple dummy runs which are not printed.
public static void main(String... args) throws IOException, InterruptedException, ParseException {
    for (int n = -50; n < 5; n++) {
        List<String> allIds = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) allIds.add(String.valueOf(i));

        long start = System.nanoTime();
        List<String> oddIds = new ArrayList<String>();
        for (String id : allIds) {
            if ((id.charAt(id.length() - 1) % 2) != 0)
                oddIds.add(id);
        }
        long time = System.nanoTime() - start;
        if (n >= 0)
            System.out.println("Time taken to filter " + allIds.size() + " entries was " + time / 1000 + " micro-seconds");
    }
}
prints
Time taken to filter 1000 entries was 136 micro-seconds
Time taken to filter 1000 entries was 141 micro-seconds
Time taken to filter 1000 entries was 136 micro-seconds
Time taken to filter 1000 entries was 137 micro-seconds
Time taken to filter 1000 entries was 138 micro-seconds
