How does Java/Scala synchronized affect performance?

I am using Scala and a timing function to time a method.
def timing[T](f: => T): T = {
  val start = System.currentTimeMillis()
  val result = f
  val end = System.currentTimeMillis()
  // print time here
  result
}
I have a fun() and use the following to time it.
(0 to 10000).map { _ =>
  timing(fun())
}
It takes 8 ms on average.
Then I use the following and time it again:
(0 to 10000).map { _ =>
  timing(fun())
  Singleton.synchronized(Singleton.i += 1)
  Thread.sleep(50)
  Singleton.synchronized(Singleton.i -= 1)
}
object Singleton {
  var i = 0
}
Now the timing shows that fun() takes 30 ms on average. Very few samples are still around 8 ms; most of them are around 30~35 ms.
But the timed code is entirely outside the synchronized blocks. How does this happen? How does synchronization introduce this overhead?

Related

How to improve the performance for System.currentTimeMillis();?

System.currentTimeMillis() is a system method in Java.
When it is invoked serially, there seem to be no performance issues.
But when it is invoked concurrently from many threads, a performance problem shows up clearly, because the native method depends on the OS clock source. How can its performance be improved in Java? Refreshing a cached millisecond value at a fixed rate is not usable in my case.
An example:
int parallelism = 32;
for(int i=0;i< parallelism ;i++){
new Thread(() -> {
for(;;){
// Focus here, how can i measure the logic efficiently
long begin = System.currentTimeMillis();
// Here may be the logic:
// Define empty block here means 0ms elapsed
long elapsed = (System.currentTimeMillis() - begin);
if(elapsed >= 5){
System.err.println("Elapsed: "+elapsed+" ms.");
}
}
}).start();
}
Thread.sleep(Integer.MAX_VALUE); // Just avoid process exit

Reason for the low performance: https://pzemtsov.github.io/2017/07/23/the-slow-currenttimemillis.html
Another solution (not usable for me): https://programmer.group/5e85bd0cc8b52.html
I will post my own solution later.

Try using System.nanoTime() instead of System.currentTimeMillis().
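A minimal sketch of that suggestion, assuming the goal is only to measure how long a piece of logic takes rather than to read the wall-clock time. The class and method names (ElapsedTimer, elapsedNanos) are made up for illustration; System.nanoTime() and TimeUnit are standard JDK APIs. Whether nanoTime() is actually cheaper than currentTimeMillis() on a given OS clock source is something to measure, not assume.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the original question or answer.
final class ElapsedTimer {

    // Runs the task and returns how long it took, in nanoseconds.
    // System.nanoTime() is meant for measuring elapsed time and is not
    // related to wall-clock time, so only differences are meaningful.
    static long elapsedNanos(Runnable task) {
        long begin = System.nanoTime();
        task.run();
        return System.nanoTime() - begin;
    }

    public static void main(String[] args) {
        long nanos = elapsedNanos(() -> {
            // the logic to be measured goes here
        });
        long millis = TimeUnit.NANOSECONDS.toMillis(nanos);
        if (millis >= 5) {
            System.err.println("Elapsed: " + millis + " ms.");
        }
    }
}
```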

Does test order affect performance result?

I wrote two blocks of time-measurement code. The printed value of t1 is always much bigger than t2.
Block 1 and block 2 do exactly the same thing. If I write block 2 before block 1, then the printed value of t2 is much smaller than t1.
I wonder why this happens.
@Test
fun test(){
val list = (1..100000).toList()
//block 1
var t1 = System.nanoTime()
list.filter { it % 7 == 0 }
t1 = System.nanoTime() - t1
//block 2
var t2 = System.nanoTime()
list.filter { it % 7 == 0 }
t2 = System.nanoTime() - t2
//print
println(t1)
println(t2)
}

What you are experiencing is called warm-up. The first executions of code in Kotlin (and other JVM-based languages) are often substantially slower than the average. This warm-up period is caused by lazy class loading and just-in-time compilation.
There are a few ways to measure performance more reliably. One of them is to run a warm-up manually before the test itself is executed. An even more reliable method is to use a specialized library such as JMH.
Example of a manual warm-up:
// warmup
for (i in 1..9999) {
val list = (1..100000).toList()
list.filter { it % 7 == 0 }
}
// rest of the test
As a side note, Kotlin has built-in functions which you can use instead of calculating the time difference manually: measureTimeMillis and measureNanoTime.
They are used like this:
val time = measureNanoTime {
list.filter { it % 7 == 0 }
}
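The same warm-up idea can be sketched in Java. This is only an illustration under assumptions: the class name WarmupDemo, the helper names and the warm-up count of 2000 are arbitrary choices, and a harness such as JMH remains the more reliable option mentioned above.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class; names and run counts are illustrative only.
final class WarmupDemo {

    // The work under test: keep the multiples of 7, as in the question.
    static List<Integer> multiplesOfSeven(List<Integer> list) {
        return list.stream().filter(x -> x % 7 == 0).collect(Collectors.toList());
    }

    // Runs the work warmupRuns times untimed, then times one more run.
    static long timeAfterWarmup(List<Integer> list, int warmupRuns) {
        int sink = 0; // consume results so the work is not trivially dead code
        for (int i = 0; i < warmupRuns; i++) {
            sink += multiplesOfSeven(list).size();
        }
        long start = System.nanoTime();
        sink += multiplesOfSeven(list).size();
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink); // never true here; keeps sink observable
        }
        return elapsed;
    }

    public static void main(String[] args) {
        List<Integer> list =
                IntStream.rangeClosed(1, 100_000).boxed().collect(Collectors.toList());
        System.out.println("cold: " + timeAfterWarmup(list, 0) + " ns");
        System.out.println("warm: " + timeAfterWarmup(list, 2_000) + " ns");
    }
}
```

On a typical JVM the "warm" figure would be expected to be smaller than the "cold" one, but the actual numbers depend on the machine and JVM version.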

Java factorial calculation with thread pool

I managed to calculate a factorial with two threads without a pool. I have two factorial classes named Factorial1 and Factorial2, both extending the Thread class. Let's say I want to calculate the value of 160000!. In Factorial1's run() method I do the multiplication in a for loop from i=2 to i=80000, and in Factorial2's from i=80001 to 160000. After that, I return both values and multiply them in the main method. When I compare the execution times, the result is much better (5000 milliseconds) than the non-threaded calculation (15000 milliseconds), even with only two threads.
Now I want to write cleaner and better code, because I saw how effective threads are for the factorial calculation. But when I use a thread pool to calculate the factorial, the parallel calculation always takes more time than the non-threaded calculation (nearly 16000 milliseconds). My code looks like this:
for(int i=2; i<= Calculate; i++)
{
myPool.execute(new Multiplication(result, i));
}
The run() method in the Multiplication class:
public void run()
{
s1.Multiply(s2); // s1 and s2 are instances of my Number class
// their fields hold BigInteger values
}
The Multiply() method in the Number class:
public void Multiply(int number)
{
area.lock(); // result is going wrong without lock
Number temp = new Number(number);
value = value.multiply(temp.value); // value is a BigInteger
area.unlock();
}
In my opinion this lock may kill all the advantage of using threads, because it seems that the threads do nothing but the multiplication. But without it, I can't even calculate the correct result. Let's say I want to calculate 10!, so thread1 calculates 10*9*8*7*6 and thread2 calculates 5*4*3*2*1. Is that the approach I'm looking for? Is it even possible with a thread pool? Of course the execution time must be less than that of the normal calculation...
I appreciate all your help and suggestions.
EDIT: - My own solution to the problem -
public class MyMultiplication implements Runnable
{
public static BigInteger subResult1;
public static BigInteger subResult2;
int thread1StopsAt;
int thread2StopsAt;
long threadId;
static boolean idIsSet=false;
public MyMultiplication(BigInteger n1, int n2) // First Thread
{
MyMultiplication.subResult1 = n1;
this.thread1StopsAt = n2/2;
thread2StopsAt = n2;
}
public MyMultiplication(int n2,BigInteger n1) // Second Thread
{
MyMultiplication.subResult2 = n1;
this.thread2StopsAt = n2;
thread1StopsAt = n2/2;
}
@Override
public void run()
{
if(idIsSet==false)
{
threadId = Thread.currentThread().getId();
idIsSet=true;
}
if(Thread.currentThread().getId() == threadId)
{
for(int i=2; i<=thread1StopsAt; i++)
{
subResult1 = subResult1.multiply(BigInteger.valueOf(i));
}
}
else
{
for(int i=thread1StopsAt+1; i<= thread2StopsAt; i++)
{
subResult2 = subResult2.multiply(BigInteger.valueOf(i));
}
}
}
}
public class JavaApplication3
{
public static void main(String[] args) throws InterruptedException
{
int calculate=160000;
long start = System.nanoTime();
BigInteger num = BigInteger.valueOf(1);
for (int i = 2; i <= calculate; i++)
{
num = num.multiply(BigInteger.valueOf(i));
}
long end = System.nanoTime();
double time = (end-start)/1000000.0;
System.out.println("Without threads: \t" +
String.format("%.2f",time) + " milliseconds");
System.out.println("without threads Result: " + num);
BigInteger num1 = BigInteger.valueOf(1);
BigInteger num2 = BigInteger.valueOf(1);
ExecutorService myPool = Executors.newFixedThreadPool(2);
start = System.nanoTime();
myPool.execute(new MyMultiplication(num1,calculate));
Thread.sleep(100);
myPool.execute(new MyMultiplication(calculate,num2));
myPool.shutdown();
while(!myPool.isTerminated()) {} // waiting threads to end
end = System.nanoTime();
time = (end-start)/1000000.0;
System.out.println("With threads: \t" +String.format("%.2f",time)
+ " milliseconds");
BigInteger result =
MyMultiplication.subResult1.
multiply(MyMultiplication.subResult2);
System.out.println("With threads Result: " + result);
System.out.println(MyMultiplication.subResult1);
System.out.println(MyMultiplication.subResult2);
}
}
Input: 160000!
Execution time without threads: 15000 milliseconds
Execution time with 2 threads: 4500 milliseconds
Thanks for the ideas and suggestions.

You can calculate 160000! concurrently without a lock by splitting the range into disjoint chunks, as you described: 2..80000 and 80001..160000.
You can also achieve this with the Java Stream API:
IntStream.rangeClosed(1, 160000).parallel()
.mapToObj(val -> BigInteger.valueOf(val))
.reduce(BigInteger.ONE, BigInteger::multiply);
It does exactly what you are trying to do. It splits the whole range into chunks, uses a thread pool and computes the partial results. Afterwards it combines the partial results into a single result.
So why bother doing it yourself? Just to practice clean coding?
On my real 4-core machine the computation in a for loop took 8 times longer than the parallel stream.
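To check such a claim on your own machine, a small comparison could look like the sketch below. The class name ParallelFactorial and the choice of n = 20000 are assumptions made for illustration; the stream pipeline is the one shown above. One plausible contributor to a speed-up larger than the core count is that the parallel reduction multiplies partial products of similar size instead of repeatedly multiplying one huge number by a small one, but that is a guess rather than something measured here.

```java
import java.math.BigInteger;
import java.util.stream.IntStream;

// Hypothetical comparison harness; not part of the original answer.
final class ParallelFactorial {

    // Plain loop, as in the question's non-threaded version.
    static BigInteger sequential(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // Parallel stream version from the answer above.
    static BigInteger parallel(int n) {
        return IntStream.rangeClosed(1, n).parallel()
                .mapToObj(val -> BigInteger.valueOf(val))
                .reduce(BigInteger.ONE, BigInteger::multiply);
    }

    public static void main(String[] args) {
        int n = 20_000;
        long t0 = System.nanoTime();
        BigInteger a = sequential(n);
        long t1 = System.nanoTime();
        BigInteger b = parallel(n);
        long t2 = System.nanoTime();
        System.out.println("results equal: " + a.equals(b));
        System.out.println("sequential: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("parallel:   " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both runs share one JVM here, so the warm-up caveats that apply to any micro-benchmark apply to these timings as well.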

Threads have to run independently to run fast. Dependencies such as locks, synchronized parts of your code or some system calls lead to sleeping threads that are waiting for access to some resource.
In your case you should minimize the time a thread spends inside the lock. Maybe I am wrong, but it seems like you create a thread for each number. So for 1000! you spawn 1000 threads. All of them try to get the lock on area and are not able to calculate anything, because one thread has acquired the lock and all the other threads have to wait until it is released again. So the threads effectively run serially, which is as fast as your non-threaded example plus the extra time for locking and unlocking, thread management and so on. Oh, and the CPU's context switching makes it even worse.
Your first attempt, splitting the factorial across two threads, is the better one. Each thread can calculate its own result, and only when they are done do the threads have to communicate with each other. So they are independent most of the time.
Now you have to generalize this solution. To reduce context switching you only want as many threads as your CPU has cores (maybe slightly fewer because of your OS). Every thread gets a range of numbers and calculates their product. After this it locks the overall result and multiplies its own result into it.
This should improve the performance of your program.
Update: You asked for additional advice:
You said you have two classes, Factorial1 and Factorial2. Probably their ranges are hard-coded. You only need one class which takes the range as constructor arguments. This class implements Runnable, so it has a run method which multiplies all values in that range.
In your main method you can do something like this:
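int n = 160_000;
int threads = 2;
ExecutorService executor = Executors.newFixedThreadPool(threads);
for (int i = 0; i < threads; i++) {
    // Assumes Factorial treats the range as half-open: start inclusive, end exclusive.
    // Also assumes n is divisible by threads; otherwise the last values are not covered.
    int start = i * (n/threads) + 1;
    int end = (i + 1) * (n/threads) + 1;
    executor.execute(new Factorial(start, end));
}
executor.shutdown();
executor.awaitTermination(1, TimeUnit.DAYS);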
Now each thread has calculated its own result, but the overall result is still missing. This can be solved with a BigInteger that is visible to the Factorial class (like a static BigInteger result; in the main class) together with a lock. In the run method of Factorial you can compute the overall result by acquiring the lock and multiplying:
Main.lock.lock();
try {
    Main.result = Main.result.multiply(value);
} finally {
    Main.lock.unlock();
}
Some additional advice for the future: this isn't really clean, because Factorial needs to know about your main class, so it depends on it. But ExecutorService.submit returns a Future<T> object which can be used to receive the result of the task. Using this Future you don't need locks. But this needs some extra work, so just try to get this running for now ;-)
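The Future-based variant hinted at above could look roughly like the following sketch. Everything here is illustrative rather than the answerer's code: the class name FutureFactorial, the helper product and the range-splitting formula are assumptions. It uses Callable tasks and ExecutorService.invokeAll so that no shared result field and no lock are needed.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one task per thread, results collected through Futures.
final class FutureFactorial {

    // Product of all integers in [from, to], both inclusive; 1 if the range is empty.
    static BigInteger product(int from, int to) {
        BigInteger result = BigInteger.ONE;
        for (int i = from; i <= to; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    static BigInteger factorial(int n, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<BigInteger>> tasks = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                // long arithmetic avoids int overflow; the last range always ends
                // at n, so n does not have to be divisible by the thread count.
                int from = (int) ((long) n * i / threads) + 1;
                int to = (int) ((long) n * (i + 1) / threads);
                tasks.add(() -> product(from, to));
            }
            BigInteger result = BigInteger.ONE;
            // invokeAll blocks until every task has completed.
            for (Future<BigInteger> part : executor.invokeAll(tasks)) {
                result = result.multiply(part.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(factorial(20, threads));
    }
}
```

The partial products are combined on the calling thread, so the only synchronization is the implicit wait inside invokeAll and Future.get.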

In addition to my Java Stream API solution, here is another solution which uses a self-managed thread pool, as you requested:
public static final int CHUNK_SIZE = 10000;
public static BigInteger fac(int max) {
ExecutorService executor = newCachedThreadPool();
try {
return rangeClosed(0, (max - 1) / CHUNK_SIZE)
.mapToObj(val -> executor.submit(() -> prod(leftBound(val), rightBound(val, max))))
.map(future -> valueOf(future))
.reduce(BigInteger.ONE, BigInteger::multiply);
} finally {
executor.shutdown();
}
}
private static int leftBound(int chunkNo) {
return chunkNo * CHUNK_SIZE + 1;
}
private static int rightBound(int chunkNo, int max) {
return Math.min((chunkNo + 1) * CHUNK_SIZE, max);
}
private static BigInteger valueOf(Future<BigInteger> future) {
try {
return future.get();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
private static BigInteger prod(int min, int max) {
BigInteger res = BigInteger.valueOf(min);
for (int val = min + 1; val <= max; val++) {
res = res.multiply(BigInteger.valueOf(val));
}
return res;
}

Java calculations that takes X amount of time

This is just a hypothetical question, but it could be a way to get around an issue I have been having.
Imagine you want to run a calculation function not for its answer but for the time it takes. So instead of finding out what a + b is, you want to keep performing some calculation while time < x seconds.
Look at this pseudocode:
public static void performCalculationsForTime(int seconds)
{
// Get start time
long millisStart = System.currentTimeMillis();
// Perform calculation to find the 1000th digit of PI
// Check if the given amount of seconds have passed since millisStart
// If number of seconds have not passed, redo the 1000th PI digit calculation
// At this point the time has passed, return the function.
}
Now I know that I am a horrible, despicable person for using precious CPU cycles simply to make time pass, but what I am wondering is:
A) Is this possible and would JVM start complaining about non-responsiveness?
B) If it is possible, what calculations would be best to try to perform?
Update - Answer:
Based on the answers and comments, the answer seems to be: "Yes, this is possible. But only if it is not done on the Android main UI thread, because the user's GUI will become unresponsive and an ANR will be raised after 5 seconds."

A) Is this possible and would JVM start complaining about non-responsiveness?
It is possible, and if you run it in the background, neither JVM nor Dalvik will complain.
B) If it is possible, what calculations would be best to try to perform?
If the objective is just to run any calculation for x seconds, keep adding to a sum until the required time has been reached. Off the top of my head, something like:
public static void performCalculationsForTime(int seconds)
{
    // Get start time and compute the time at which to stop
    long millisStart = System.currentTimeMillis();
    long requiredEndTime = millisStart + seconds * 1000L;
    double sum = 0;
    while (System.currentTimeMillis() < requiredEndTime) {
        sum = sum + 0.1;
    }
}

You can, and the JVM won't complain, as long as your code is not part of some complex system that actually tracks thread execution time.
long startTime = System.currentTimeMillis();
while (System.currentTimeMillis() - startTime < 100000) {
    // do something
}
Or even a for loop that checks the time only every 1000 iterations.
for (int i = 0; ; i++) {
    // stop once the time budget has been used up
    if (i % 1000 == 0 && System.currentTimeMillis() - startTime >= 100000)
        break;
    // do something
}
As for your second question, the answer is probably to calculate some value that can always be improved upon, like your PI digits example.
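Putting the pieces from both answers together, a deadline-based loop on a background thread might be sketched as follows. The names BusyWork and spinFor are hypothetical, and using System.nanoTime() instead of currentTimeMillis() is a substitution made here because only elapsed time matters.

```java
// Hypothetical sketch; not taken from either answer.
final class BusyWork {

    // Spins on a cheap calculation until at least millis milliseconds have elapsed.
    // Returns the number of iterations so the work has an observable result.
    static long spinFor(long millis) {
        long deadline = System.nanoTime() + millis * 1_000_000L;
        long iterations = 0;
        double sum = 0;
        // Compare by subtraction, as the System.nanoTime() Javadoc recommends,
        // because raw nanoTime values may overflow.
        while (System.nanoTime() - deadline < 0) {
            sum += 0.1;
            iterations++;
        }
        if (sum < 0) {
            System.out.println(sum); // never true; keeps sum from being dead code
        }
        return iterations;
    }

    public static void main(String[] args) {
        // Run on a worker thread so that a UI thread, if any, stays responsive.
        Thread worker = new Thread(
                () -> System.out.println(spinFor(200) + " iterations"));
        worker.start();
    }
}
```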

Java Reflection Performance Issue

I know there are a lot of topics about reflection performance.
Even the official Java docs say that reflection is slower, but I have this code:
public class ReflectionTest {
public static void main(String[] args) throws Exception {
Object object = new Object();
Class<Object> c = Object.class;
int loops = 100000;
long start = System.currentTimeMillis();
Object s;
for (int i = 0; i < loops; i++) {
s = object.toString();
System.out.println(s);
}
long regularCalls = System.currentTimeMillis() - start;
java.lang.reflect.Method method = c.getMethod("toString");
start = System.currentTimeMillis();
for (int i = 0; i < loops; i++) {
s = method.invoke(object);
System.out.println(s);
}
long reflectiveCalls = System.currentTimeMillis() - start;
start = System.currentTimeMillis();
for (int i = 0; i < loops; i++) {
method = c.getMethod("toString");
s = method.invoke(object);
System.out.println(s);
}
long reflectiveLookup = System.currentTimeMillis() - start;
System.out.println(loops + " regular method calls:" + regularCalls
+ " milliseconds.");
System.out.println(loops + " reflective method calls without lookup:"
+ reflectiveCalls+ " milliseconds.");
System.out.println(loops + " reflective method calls with lookup:"
+ reflectiveLookup + " milliseconds.");
}
}
I don't think this is a valid benchmark, but it should at least show some difference.
I ran it expecting the plain reflective calls to be a bit slower than the regular ones.
But it prints this:
100000 regular method calls:1129 milliseconds.
100000 reflective method calls without lookup:910 milliseconds.
100000 reflective method calls with lookup:994 milliseconds.
As a note, I first ran it without all the System.out.println calls, then realized that some JVM optimizations were probably just making it run faster, so I added the println calls to see whether reflection was still faster.
The results without the println calls are:
100000 regular method calls:68 milliseconds.
100000 reflective method calls without lookup:48 milliseconds.
100000 reflective method calls with lookup:168 milliseconds.
I have read online that the same test run on old JVMs shows reflective calls without lookup to be two times slower than regular calls, and that the difference shrinks with newer updates.
Can anyone run it and tell me whether I'm wrong, or at least show me what has changed since then that makes it faster?
Following the advice given, I ran each loop separately and the results are (without the println calls):
100000 regular method calls:70 milliseconds.
100000 reflective method calls without lookup:120 milliseconds.
100000 reflective method calls with lookup:129 milliseconds.

Never performance test different bits of code in the same "run". The JVM has various optimisations which mean that, even though the end result is the same, how the internals are performed may differ. In more concrete terms, during your test the JVM may have noticed that you are calling Object.toString a lot and started to inline the method calls to Object.toString. It may have started to perform loop unrolling. Or there could have been a garbage collection in the first loop but not in the second or third loops.
To get a more meaningful, but still not totally accurate, picture you should separate your test into three separate programs.
The results on my computer (with no printing and 1,000,000 runs each):
All three loops run in the same program
1000000 regular method calls: 490 milliseconds.
1000000 reflective method calls without lookup: 393 milliseconds.
1000000 reflective method calls with lookup: 978 milliseconds.
Loops run in separate programs
1000000 regular method calls: 475 milliseconds.
1000000 reflective method calls without lookup: 555 milliseconds.
1000000 reflective method calls with lookup: 1160 milliseconds.

There's an article by Brian Goetz on microbenchmarks that's worth reading. It looks like you're not doing anything to warm up the JVM (meaning give it a chance to do whatever inlining or other optimizations it's going to do) before doing your measurements, so it's likely the non-reflective test is still not warmed-up yet, and that could skew your numbers.

When you have multiple long-running loops, the first loop can trigger compilation of the method, so the later loops run optimised code from the start. However, that optimisation can be sub-optimal, as it has no runtime information for those loops. The toString call is relatively expensive and could be taking longer than the reflection calls.
You don't need separate programs to avoid a loop being optimised because of an earlier loop. You can run them in different methods.
The results I get are:
Average regular method calls:2 ns.
Average reflective method calls without lookup:10 ns.
Average reflective method calls with lookup:240 ns.
The code
import java.lang.reflect.Method;
public class ReflectionTest {
public static void main(String[] args) throws Exception {
int loops = 1000 * 1000;
Object object = new Object();
long start = System.nanoTime();
Object s;
testMethodCall(object, loops);
long regularCalls = System.nanoTime() - start;
java.lang.reflect.Method method = Object.class.getMethod("getClass");
method.setAccessible(true);
start = System.nanoTime();
testInvoke(object, loops, method);
long reflectiveCalls = System.nanoTime() - start;
start = System.nanoTime();
testGetMethodInvoke(object, loops);
long reflectiveLookup = System.nanoTime() - start;
System.out.println("Average regular method calls:"
+ regularCalls / loops + " ns.");
System.out.println("Average reflective method calls without lookup:"
+ reflectiveCalls / loops + " ns.");
System.out.println("Average reflective method calls with lookup:"
+ reflectiveLookup / loops + " ns.");
}
private static Object testMethodCall(Object object, int loops) {
Object s = null;
for (int i = 0; i < loops; i++) {
s = object.getClass();
}
return s;
}
private static Object testInvoke(Object object, int loops, Method method) throws Exception {
Object s = null;
for (int i = 0; i < loops; i++) {
s = method.invoke(object);
}
return s;
}
private static Object testGetMethodInvoke(Object object, int loops) throws Exception {
Method method;
Object s = null;
for (int i = 0; i < loops; i++) {
method = Object.class.getMethod("getClass");
s = method.invoke(object);
}
return s;
}
}

Micro-benchmarks like this are never going to be accurate at all - as the VM "warms up" it'll inline bits of code and optimise bits of code as it goes along, so the same thing executed 2 minutes into a program could vastly outperform it right at the start.
In terms of what's happening here, my guess is that the first "normal" method call block warms it up, so the reflective blocks (and indeed all subsequent calls) would be faster. The only overhead added through reflectively calling a method that I can see is looking up the pointer to that method, which is a nanosecond-scale operation anyway and would be easily cached by the JVM. The rest would be on how the VM is warmed up, which it is by the time you reach the reflective calls.
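The numbers posted in the other answers (roughly 10 ns per call without lookup versus 240 ns with lookup) suggest that, whatever the JVM caches internally, it is worth keeping the Method object around yourself instead of calling getMethod each time. A minimal sketch of that pattern, with the hypothetical names CachedLookup and reflectiveToString:

```java
import java.lang.reflect.Method;

// Hypothetical sketch of caching a reflective lookup in a static field.
final class CachedLookup {

    // Looked up once when the class is initialised, then reused for every call.
    private static final Method TO_STRING = lookup();

    private static Method lookup() {
        try {
            return Object.class.getMethod("toString");
        } catch (NoSuchMethodException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Invokes toString() reflectively through the cached Method object.
    static String reflectiveToString(Object target) {
        try {
            return (String) TO_STRING.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object o = new Object();
        // Direct and reflective calls should produce the same string.
        System.out.println(o.toString().equals(reflectiveToString(o)));
    }
}
```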

There is no inherent reason why reflective call should be slower than a normal call. JVM can optimize them into the same thing.
Practically, human resources are limited, and they had to optimize normal calls first. As time passes by they can work on optimizing reflective calls; especially when reflection becomes more and more popular.

I have been writing my own micro-benchmark, without loops, and with System.nanoTime():
public static void main(String[] args) throws NoSuchMethodException, IllegalArgumentException, IllegalAccessException, InvocationTargetException
{
Object obj = new Object();
Class<Object> objClass = Object.class;
String s;
long start = System.nanoTime();
s = obj.toString();
long directInvokeEnd = System.nanoTime();
System.out.println(s);
long methodLookupStart = System.nanoTime();
java.lang.reflect.Method method = objClass.getMethod("toString");
long methodLookupEnd = System.nanoTime();
s = (String) (method.invoke(obj));
long reflectInvokeEnd = System.nanoTime();
System.out.println(s);
System.out.println(directInvokeEnd - start);
System.out.println(methodLookupEnd - methodLookupStart);
System.out.println(reflectInvokeEnd - methodLookupEnd);
}
I have been executing that in Eclipse on my machine a dozen times, and the results vary quite a bit, but here is what I typically get:
the direct method invocation clocks at 40-50 microseconds
method lookup clocks at 150-200 microseconds
reflective invocation with the method variable clocks at 250-310 microseconds.
Now, do not forget the caveats on micro-benchmarks described in Nathan's reply (there are certainly a lot of flaws in this micro-benchmark), and trust the documentation if it says that reflection is a LOT slower than direct invocation.

It strikes me that you have placed a "System.out.println(s)" call inside your inner benchmark loop.
Since performing IO is bound to be slow, it actually "swallows up" your benchmark and the overhead of the invoke becomes negligible.
Try removing the "println()" call and running code like this, I'm sure you'd be surprised by the result (some of the silly calculations are needed to avoid the compiler optimizing away the calls altogether):
public class Experius
{
public static void main(String[] args) throws Exception
{
Experius a = new Experius();
int count = 10000000;
int v = 0;
long tm = System.currentTimeMillis();
for ( int i = 0; i < count; ++i )
{
v = a.something(i + v);
++v;
}
tm = System.currentTimeMillis() - tm;
System.out.println("Time: " + tm);
tm = System.currentTimeMillis();
Method method = Experius.class.getMethod("something", Integer.TYPE);
for ( int i = 0; i < count; ++i )
{
Object o = method.invoke(a, i + v);
++v;
}
tm = System.currentTimeMillis() - tm;
System.out.println("Time: " + tm);
}
public int something(int n)
{
return n + 5;
}
}
-- TR

Even if you look up the method in both cases (i.e. before the 2nd and the 3rd loop), the first lookup takes far less time than the second, when it should have been the other way around, and on my machine it takes less time than a regular method call.
Nevertheless, if I use the 2nd loop with the method lookup and the System.out.println statement, I get this:
regular call : 740 ms
look up(2nd loop) : 640 ms
look up ( 3rd loop) : 800 ms
Without System.out.println statement, I get:
regular call : 78 ms
look up (2nd) : 37 ms
look up (3rd ) : 112 ms
