Out of curiosity, I measured the performance between static block and static method initializer. First, I implemented the above mentioned methods in two separate java classes, like so:
First:
class Dummy {
static java.util.List<Integer> lista = new java.util.ArrayList<Integer>();
static {
for(int i=0; i < 1000000; ++i) {
lista.add(new Integer(i));
}
}
}
public class First {
public static void main(String[] args) {
long st = System.currentTimeMillis();
Dummy d = new Dummy();
long end = System.currentTimeMillis() - st;
System.out.println(end);
}
}
Second:
class Muddy {
static java.util.List<Integer> lista = new java.util.ArrayList<Integer>();
public static void initList() {
for(int i=0; i < 1000000; ++i) {
lista.add(new Integer(i));
}
}
}
public class Second {
public static void main(String[] args) {
long st = System.currentTimeMillis();
Muddy.initList();
Muddy m = new Muddy();
long end = System.currentTimeMillis() - st;
System.out.println(end);
}
}
Then I executed this little batch script to measure it 100 times and put the values in a file. batchFile.bat First Second dum.res.txt
After that, I wrote this piece of code to calculate mean value and standard deviation of Dummy's and Muddy's measured values.
This is the result that I've got:
First size: 100 Second size: 100
First Sum: 132 Std. deviation: 13
Second Sum: 112 Std. deviation: 9
And it is similar on my other machines...every time I test it.
Now I'm wondering, why is it so? I checked the bytecode and Second.class has one instruction more (call to static initList()) between calls to System.currentTimeMillis().
They both do the same thing, but why is the First one slower? I can't really reason it out just by looking at the bytecode, since this was my first time touching javap; I don't understand bytecode yet.
I think that the reason why the static block version is slower than the static method version could be due to the different JIT optimization that they get ...
See this interesting article for more interesting information : Java Secret: Are static blocks interpreted?
Here's my guess as to the reason for this:
The initialization you are doing is creating enough objects that it is causing one or more garbage collections.
When the initialization is called from the static block, it is done during the class initialization rather than during simple method execution. During class initialization, the garbage detector may have a little more work to do (because the execution stack is longer, for example) than during simple method execution, even though the contents of the heap are almost the same.
To test this, you could try adding -Xms200m or something to your java commands; this should eliminate the need to garbage collect during the initialization you are doing.
Related
In the tutorial of java multi-threading, it gives an exmaple of Memory Consistency Errors. But I can not reproduce it. Is there any other method to simulate Memory Consistency Errors?
The example provided in the tutorial:
Suppose a simple int field is defined and initialized:
int counter = 0;
The counter field is shared between two threads, A and B. Suppose thread A increments counter:
counter++;
Then, shortly afterwards, thread B prints out counter:
System.out.println(counter);
If the two statements had been executed in the same thread, it would be safe to assume that the value printed out would be "1". But if the two statements are executed in separate threads, the value printed out might well be "0", because there's no guarantee that thread A's change to counter will be visible to thread B — unless the programmer has established a happens-before relationship between these two statements.
I answered a question a while ago about a bug in Java 5. Why doesn't volatile in java 5+ ensure visibility from another thread?
Given this piece of code:
public class Test {
volatile static private int a;
static private int b;
public static void main(String [] args) throws Exception {
for (int i = 0; i < 100; i++) {
new Thread() {
#Override
public void run() {
int tt = b; // makes the jvm cache the value of b
while (a==0) {
}
if (b == 0) {
System.out.println("error");
}
}
}.start();
}
b = 1;
a = 1;
}
}
The volatile store of a happens after the normal store of b. So when the thread runs and sees a != 0, because of the rules defined in the JMM, we must see b == 1.
The bug in the JRE allowed the thread to make it to the error line and was subsequently resolved. This definitely would fail if you don't have a defined as volatile.
This might reproduce the problem, at least on my computer, I can reproduce it after some loops.
Suppose you have a Counter class:
class Holder {
boolean flag = false;
long modifyTime = Long.MAX_VALUE;
}
Let thread_A set flag as true, and save the time into
modifyTime.
Let another thread, let's say thread_B, read the Counter's flag. If thread_B still get false even when it is later than modifyTime, then we can say we have reproduced the problem.
Example code
class Holder {
boolean flag = false;
long modifyTime = Long.MAX_VALUE;
}
public class App {
public static void main(String[] args) {
while (!test());
}
private static boolean test() {
final Holder holder = new Holder();
new Thread(new Runnable() {
#Override
public void run() {
try {
Thread.sleep(10);
holder.flag = true;
holder.modifyTime = System.currentTimeMillis();
} catch (Exception e) {
e.printStackTrace();
}
}
}).start();
long lastCheckStartTime = 0L;
long lastCheckFailTime = 0L;
while (true) {
lastCheckStartTime = System.currentTimeMillis();
if (holder.flag) {
break;
} else {
lastCheckFailTime = System.currentTimeMillis();
System.out.println(lastCheckFailTime);
}
}
if (lastCheckFailTime > holder.modifyTime
&& lastCheckStartTime > holder.modifyTime) {
System.out.println("last check fail time " + lastCheckFailTime);
System.out.println("modify time " + holder.modifyTime);
return true;
} else {
return false;
}
}
}
Result
last check time 1565285999497
modify time 1565285999494
This means thread_B get false from Counter's flag filed at time 1565285999497, even thread_A has set it as true at time 1565285999494(3 milli seconds ealier).
The example used is too bad to demonstrate the memory consistency issue. Making it work will require brittle reasoning and complicated coding. Yet you may not be able to see the results. Multi-threading issues occur due to unlucky timing. If someone wants to increase the chances of observing issue, we need to increase chances of unlucky timing.
Following program achieves it.
public class ConsistencyIssue {
static int counter = 0;
public static void main(String[] args) throws InterruptedException {
Thread thread1 = new Thread(new Increment(), "Thread-1");
Thread thread2 = new Thread(new Increment(), "Thread-2");
thread1.start();
thread2.start();
thread1.join();
thread2.join();
System.out.println(counter);
}
private static class Increment implements Runnable{
#Override
public void run() {
for(int i = 1; i <= 10000; i++)
counter++;
}
}
}
Execution 1 output: 10963,
Execution 2 output: 14552
Final count should have been 20000, but it is less than that. Reason is count++ is multi step operation,
1. read count
2. increment count
3. store it
two threads may read say count 1 at once, increment it to 2. and write out 2. But if it was a serial execution it should have been 1++ -> 2++ -> 3.
We need a way to make all 3 steps atomic. i.e to be executed by only one thread at a time.
Solution 1: Synchronized
Surround the increment with Synchronized. Since counter is static variable you need to use class level synchronization
#Override
public void run() {
for (int i = 1; i <= 10000; i++)
synchronized (ConsistencyIssue.class) {
counter++;
}
}
Now it outputs: 20000
Solution 2: AtomicInteger
public class ConsistencyIssue {
static AtomicInteger counter = new AtomicInteger(0);
public static void main(String[] args) throws InterruptedException {
Thread thread1 = new Thread(new Increment(), "Thread-1");
Thread thread2 = new Thread(new Increment(), "Thread-2");
thread1.start();
thread2.start();
thread1.join();
thread2.join();
System.out.println(counter.get());
}
private static class Increment implements Runnable {
#Override
public void run() {
for (int i = 1; i <= 10000; i++)
counter.incrementAndGet();
}
}
}
We can do with semaphores, explicit locking too. but for this simple code AtomicInteger is enough
Sometimes when I try to reproduce some real concurrency problems, I use the debugger.
Make a breakpoint on the print and a breakpoint on the increment and run the whole thing.
Releasing the breakpoints in different sequences gives different results.
Maybe to simple but it worked for me.
Please have another look at how the example is introduced in your source.
The key to avoiding memory consistency errors is understanding the happens-before relationship. This relationship is simply a guarantee that memory writes by one specific statement are visible to another specific statement. To see this, consider the following example.
This example illustrates the fact that multi-threading is not deterministic, in the sense that you get no guarantee about the order in which operations of different threads will be executed, which might result in different observations across several runs. But it does not illustrate a memory consistency error!
To understand what a memory consistency error is, you need to first get an insight about memory consistency. The simplest model of memory consistency has been introduced by Lamport in 1979. Here is the original definition.
The result of any execution is the same as if the operations of all the processes were executed in some sequential order and the operations of each individual process appear in this sequence in the order specified by its program
Now, consider this example multi-threaded program, please have a look at this image from a more recent research paper about sequential consistency. It illustrates what a real memory consistency error might look like.
To finally answer your question, please note the following points:
A memory consistency error always depends on the underlying memory model (A particular programming languages may allow more behaviours for optimization purposes). What's the best memory model is still an open research question.
The example given above gives an example of sequential consistency violation, but there is no guarantee that you can observe it with your favorite programming language, for two reasons: it depends on the programming language exact memory model, and due to undeterminism, you have no way to force a particular incorrect execution.
Memory models are a wide topic. To get more information, you can for example have a look at Torsten Hoefler and Markus Püschel course at ETH Zürich, from which I understood most of these concepts.
Sources
Leslie Lamport. How to Make a Multiprocessor Computer That Correctly Executes Multiprocessor Programs, 1979
Wei-Yu Chen, Arvind Krishnamurthy, Katherine Yelick, Polynomial-Time Algorithms for Enforcing Sequential Consistency in SPMD Programs with Arrays, 2003
Design of Parallel and High-Performance Computing course, ETH Zürich
I have an ArrayList around 3483 object that ready to merge in database.
When I use lambda expressions - it takes 85122 ms.
But for(Obj o : list) takes only 25 ms. Why Java8 takes 3404 times more time ?
List<CIBSubjectData> list1 = .....
list1.forEach(data ->
merge(data)
);
for (CIBSubjectData data : list1) {
merge(data);
}
I belive you are not using a proper microbenchmark setting. You are comparing the warmup of the bytecode instrumentation framework (ASM which is used to generated the lambda bytecode at runtime) + lambda execution time with the execution time of the loop.
Check this answer for performance-difference-between-java-8-lambdas-and-anonymous-inner-classes and the linked document. The linked document has a deep insight about the processing under the hood.
edit To provide a small snippet to demonstrate the above.
public class Warmup {
static int dummy;
static void merge(String s) {
dummy += s.length();
dummy++;
dummy -= s.length();
}
public static void main(String[] args) throws IOException {
List<String> list1 = new ArrayList<>();
Random rand = new Random(1);
for (int i = 0; i < 100_000; i++) {
list1.add(Long.toString(rand.nextLong()));
}
// this will boostrap the bytecode instrumentation
// Stream.of("foo".toCharArray()).forEach(System.out::println);
long start = System.nanoTime();
list1.forEach(data -> merge(data));
long end = System.nanoTime();
System.out.printf("duration: %d%n", end - start);
System.out.println(dummy);
}
}
If you run the code as it is posted the printed duration on my machine is
duration: 71694425
If you uncomment the line Stream.of(... (which is only there to use the the bytecode instrumentation framework the first time) the printed duration is
duration: 7516086
Which is only around 10% of the initial run.
note Only to be explicit. Don't use benchmarks like the above. Have a look at jmh for such a requirement.
class testx
{
public testx()
{
long startTime = System.nanoTime();
System.out.println((System.nanoTime() - startTime));
}
public static void main(String args[])
{
new testx();
new testx();
new testx();
}
}
I always get results similar to this 7806 660 517. Why the first call takes 10 times more time than other ones?
Because the JVM loads a bunch o' classes for the first time at that point. Once that first System.nanoTime() returns, you have already loaded System.class and testx.class, but once System.out.println comes into the picture, I suspect a lot of I/O classes get loaded up, and that takes some time.
In any event, this is not a good benchmarking technique; you should really be warming up the JIT by running something for ~10000 iterations before you start measuring it. Alternately (and preferably), use a pre-built benchmarking tool like Caliper.
It is definitely as Louis Wasserman, It takes longer on its first round as it has to load all the necessary System classes, you can get around this by calling a blank println() before creating new instances of the class, because look what happens when we do this:
public class testx
{
public testx()
{
long startTime = System.nanoTime();
System.out.println((System.nanoTime() - startTime));
}
public static void main(String args[])
{
//loads all System.* classes before calling constructor to decrease time it takes
System.out.println();
new testx();
new testx();
new testx();
}
}
output:
405 0 405
where as your initial code outputted:
7293 0 405
In the following code snippet, Foo1 is a class that increments a counter every time the method bar() is called. Foo2 does the same thing but with one additional level of indirection.
I would expect Foo1 to be faster than Foo2, however in practice, Foo2 is consistently 40% faster than Foo1. How is the JVM optimizing the code such that Foo2 runs faster than Foo1?
Some Details
The test was executed with java -server CompositionTest.
Running the test with java -client CompositionTest produces the expected results, that Foo2 is slower than Foo1.
Switching the order of the loops does not make a difference.
The results were verified with java6 on both sun and openjdk's JVMs.
The Code
public class CompositionTest {
private static interface DoesBar {
public void bar();
public int count();
public void count(int c);
}
private static final class Foo1 implements DoesBar {
private int count = 0;
public final void bar() { ++count; }
public int count() { return count; }
public void count(int c) { count = c; }
}
private static final class Foo2 implements DoesBar {
private DoesBar bar;
public Foo2(DoesBar bar) { this.bar = bar; }
public final void bar() { bar.bar(); }
public int count() { return bar.count(); }
public void count(int c) { bar.count(c); }
}
public static void main(String[] args) {
long time = 0;
DoesBar bar = null;
int reps = 100000000;
for (int loop = 0; loop < 10; loop++) {
bar = new Foo1();
bar.count(0);
int i = reps;
time = System.nanoTime();
while (i-- > 0) bar.bar();
time = System.nanoTime() - time;
if (reps != bar.count())
throw new Error("reps != bar.count()");
}
System.out.println("Foo1 time: " + time);
for (int loop = 0; loop < 10; loop++) {
bar = new Foo2(new Foo1());
bar.count(0);
int i = reps;
time = System.nanoTime();
while (i-- > 0) bar.bar();
time = System.nanoTime() - time;
if (reps != bar.count())
throw new Error("reps != bar.count()");
}
System.out.println("Foo2 time: " + time);
}
}
Your microbench mark is meaningless. On my computer the code runs in about 8ms for each loop... To have any meaningful number a benchmark should probably run for at least a second.
When run both for around a second (hint, you need more than Integer.MAX_VALUE repetitions) I find that the run times of both are identical.
The likely explanation for this is that the JIT compiler has noticed that your indirection is meaningless and optimised it out (or at least inlined the method calls) such that the code executed in both loops is identical.
It can do this because it knows bar in Foo2 is effectively final, it also know that the argument to the Foo2 constructor is always going to be a Foo1 (at least in our little test). That way it knows the exact code path when Foo2.bar is called. It also knows that this loop is going to run a lot of times (in fact it knows exactly how many times the loop will execute) -- so it seems like a good idea to inline the code.
I have no idea if that is precisely what it does, but these are all logical observations that the JIT could me making about the code. Perhaps in the future some JIT compilers might even optimise the entire while loop and simply set count to reps, but that seems somewhat unlikely.
Trying to predict performance on modern languages is not very productive.
The JVM is constantly modified to increase performance of common, readable structures which, in contrast, makes uncommon, awkward code slower.
Just write your code as clearly as you can--then if you really identify a point where your code is actually identified as too slow to pass written specifications, you may have to hand-tweak some areas--but this will probably involve large, simple ideas like object caches, tweaking JVM options and eliminating truly stupid/wrong code (Wrong data structures can be HUGE, I once changed an ArrayList to a LinkedList and reduced an operation from 10 minutes to 5 seconds, multi-threading a ping operation that discovered a class-B network took an operation from 8+ hours to minutes).
This code should produce even and uneven output because there is no synchronized on any methods. Yet the output on my JVM is always even. I am really confused as this example comes straight out of Doug Lea.
public class TestMethod implements Runnable {
private int index = 0;
public void testThisMethod() {
index++;
index++;
System.out.println(Thread.currentThread().toString() + " "
+ index );
}
public void run() {
while(true) {
this.testThisMethod();
}
}
public static void main(String args[]) {
int i = 0;
TestMethod method = new TestMethod();
while(i < 20) {
new Thread(method).start();
i++;
}
}
}
Output
Thread[Thread-8,5,main] 135134
Thread[Thread-8,5,main] 135136
Thread[Thread-8,5,main] 135138
Thread[Thread-8,5,main] 135140
Thread[Thread-8,5,main] 135142
Thread[Thread-8,5,main] 135144
I tried with volatile and got the following (with an if to print only if odd):
Thread[Thread-12,5,main] 122229779
Thread[Thread-12,5,main] 122229781
Thread[Thread-12,5,main] 122229783
Thread[Thread-12,5,main] 122229785
Thread[Thread-12,5,main] 122229787
Answer to comments:
the index is infact shared, because we have one TestMethod instance but many Threads that call testThisMethod() on the one TestMethod that we have.
Code (no changes besides the mentioned above):
public class TestMethod implements Runnable {
volatile private int index = 0;
public void testThisMethod() {
index++;
index++;
if(index % 2 != 0){
System.out.println(Thread.currentThread().toString() + " "
+ index );
}
}
public void run() {
while(true) {
this.testThisMethod();
}
}
public static void main(String args[]) {
int i = 0;
TestMethod method = new TestMethod();
while(i < 20) {
new Thread(method).start();
i++;
}
}
}
First off all: as others have noted there's no guarantee at all, that your threads do get interrupted between the two increment operations.
Note that printing to System.out pretty likely forces some kind of synchronization on your threads, so your threads are pretty likely to have just started a time slice when they return from that, so they will probably complete the two incrementation operations and then wait for the shared resource for System.out.
Try replacing the System.out.println() with something like this:
int snapshot = index;
if (snapshot % 2 != 0) {
System.out.println("Oh noes! " + snapshot);
}
You don't know that. The point of automatic scheduling is that it makes no guarantees. It might treat two threads that run the same code completely different. Or completely the same. Or completely the same for an hour and then suddenly different...
The point is, even if you fix the problems mentioned in the other answers, you still cannot rely on things coming out a particular way; you must always be prepared for any possible interleaving that the Java memory and threading model allows, and that includes the possibility that the println always happens after an even number of increments, even if that seems unlikely to you on the face of it.
The result is exactly as I would expect. index is being incremented twice between outputs, and there is no interaction between threads.
To turn the question around - why would you expect odd outputs?
EDIT: Whoops. I wrongly assumed a new runnable was being created per Thread, and therefore there was a distinct index per thread, rather than shared. Disturbing how such a flawed answer got 3 upvotes though...
You have not marked index as volatile. This means that the compiler is allowed to optimize accesses to it, and it probably merges your 2 increments to one addition.
You get the output of the very first thread you start, because this thread loops and gives no chance to other threads to run.
So you should Thread.sleep() or (not recommended) Thread.yield() in the loop.