I was trying to compare Java 8's parallelStream() (method 1) with creating threads manually (method 2).
I measured the time taken by each and found a huge difference: method 2 (~700 ms) is far faster than method 1 (~20 s).
Method 1: (list has about 100 entries)
list.parallelStream()
.forEach(ele -> {
//Do something.
});
Method 2:
for (int i = 0; i < 100; i++) {
Runnable task = () -> {
//Do something.
};
Thread thread = new Thread(task);
thread.start();
}
NOTE: Do something is an expensive operation like hitting a Database.
I added System.out.println() messages to both. I found that method 1(parallelStream) appeared to be executing sequentially while in method 2 the messages were printed very fast.
Can anyone explain what is happening.
Most likely you are doing something wrong but it's not clear what.
for (int i = 0; i < 3; i++) {
long start = System.currentTimeMillis();
IntStream.range(0, 100).parallel()
.forEach(ele -> {
try {
Thread.sleep(100);
} catch (InterruptedException ignored) {
}
});
long time = System.currentTimeMillis() - start;
System.out.printf("Took %,d ms to perform 100 tasks of 100 ms on %d processors%n",
time, Runtime.getRuntime().availableProcessors());
}
prints
Took 475 ms to perform 100 tasks of 100 ms on 32 processors
Took 401 ms to perform 100 tasks of 100 ms on 32 processors
Took 401 ms to perform 100 tasks of 100 ms on 32 processors
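One thing worth ruling out in the original measurement: Method 2 as posted starts its threads but never waits for them, so a timer stopped right after the loop would only measure thread start-up, not the work. The sketch below shows one way a like-for-like timing could look. The names `JoinedTiming`, `timeThreads` and `sleepQuietly` are made up for illustration, and the 100 ms sleep is only a stand-in for the database call.

```java
import java.util.ArrayList;
import java.util.List;

final class JoinedTiming {
    /** Starts one thread per task, waits for all of them, and returns the elapsed milliseconds. */
    static long timeThreads(int tasks, Runnable work) {
        List<Thread> threads = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            Thread thread = new Thread(work);
            threads.add(thread);
            thread.start();
        }
        // Without this join loop the clock would stop while the work is still running.
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Stand-in for an expensive blocking call (assumption: the real work blocks on I/O). */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long ms = timeThreads(100, () -> sleepQuietly(100));
        System.out.printf("100 tasks of 100 ms took %,d ms with one thread each%n", ms);
    }
}
```

If the joined timing is still much lower than the parallel stream's, the gap may come from the stream running on the shared common fork/join pool, which has roughly one worker per core, while Method 2 uses 100 threads for 100 blocking calls.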
System.currentTimeMillis() is a system method in Java.
When it is invoked serially, there seem to be no performance issues.
But when it is invoked concurrently from many threads, a performance problem shows up clearly, because the native method depends on the OS clock source. How can its performance be improved in Java? Refreshing a cached millisecond value at a fixed rate is not usable for me.
An example:
int parallelism = 32;
for(int i=0;i< parallelism ;i++){
new Thread(() -> {
for(;;){
// Focus here, how can i measure the logic efficiently
long begin = System.currentTimeMillis();
// Here may be the logic:
// Define empty block here means 0ms elapsed
long elapsed = (System.currentTimeMillis() - begin);
if(elapsed >= 5){
System.err.println("Elapsed: "+elapsed+" ms.");
}
}
}).start();
}
Thread.sleep(Integer.MAX_VALUE); // Just avoid process exit
Reason for the low performance: https://pzemtsov.github.io/2017/07/23/the-slow-currenttimemillis.html
(Unusable) Another solution: https://programmer.group/5e85bd0cc8b52.html
I will post my own solution later.
Try using System.nanoTime() instead of System.currentTimeMillis().
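A minimal sketch of what that measurement could look like. `ElapsedTimer` and `elapsedNanos` are hypothetical names. Whether System.nanoTime() is actually cheaper than System.currentTimeMillis() under heavy concurrency depends on the platform and its clock source, so treat this as a usage example rather than a guaranteed fix.

```java
final class ElapsedTimer {
    /** Runs the task once and returns the elapsed time in nanoseconds. */
    static long elapsedNanos(Runnable task) {
        long begin = System.nanoTime();
        task.run();
        // nanoTime values are only meaningful as differences, never as wall-clock time.
        return System.nanoTime() - begin;
    }

    public static void main(String[] args) {
        long nanos = elapsedNanos(() -> {
            // the logic under measurement goes here
        });
        if (nanos >= 5_000_000L) { // same 5 ms threshold as in the question
            System.err.printf("Elapsed: %.3f ms%n", nanos / 1_000_000.0);
        }
    }
}
```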
I have a generator that produces events for Flink CEP; its code is given below. I am using Thread.sleep(), and I have read that Java cannot sleep for less than 1 millisecond even if we use System.nanoTime(). The code for the generator is:
public class RR_interval_Gen extends RichParallelSourceFunction<RRIntervalStreamEvent> {
Integer InputRate ; // events/second
Integer Sleeptime ;
Integer NumberOfEvents;
public RR_interval_Gen(Integer inputRate, Integer numberOfEvents ) {
this.InputRate = inputRate;
Sleeptime = 1000 / InputRate;
NumberOfEvents = numberOfEvents;
}
@Override
public void run(SourceContext<RRIntervalStreamEvent> sourceContext) throws Exception {
long currentTime;
Random random = new Random();
int RRInterval;
int Sensor_id;
for(int i = 1 ; i <= NumberOfEvents ; i++) {
Sensor_id = 2;
currentTime = System.currentTimeMillis();
// int randomNum = rand.nextInt((max - min) + 1) + min;
RRInterval = 10 + random.nextInt((20-10)+ 1);
RRIntervalStreamEvent stream = new RRIntervalStreamEvent(Sensor_id,currentTime,RRInterval);
synchronized (sourceContext.getCheckpointLock())
{
sourceContext.collect(stream);
}
Thread.sleep(Sleeptime);
}
}
@Override
public void cancel() {
}
}
I will specify my requirement here in simple words.
I want generator class to generate events, let's say an ECG stream at 1200 Hz. This generator will accept parameters like input rate and total time for which we have to generate the stream.
So far so good. The issue is that I need to send more than 1000 events per second. How can I do this with a generator function that produces values from U[10,20]?
Also, please let me know if I am using the wrong way to generate x events per second in the code above:
Sleeptime = 1000 / InputRate;
Thanks in advance
The minimum sleep time is roughly 10 ms on Windows and about 1 ms on Linux and macOS, as described in the following quote:
The granularity of sleep is generally bound by the thread scheduler's
interrupt period. In Linux, this interrupt period is generally 1ms in
recent kernels. In Windows, the scheduler's interrupt period is
normally around 10 or 15 milliseconds
Through my research, I learned that using nanosecond sleeps in Java will not help, as the limitation is at the OS level. If you want to send data at an arrival rate above 1000 events/second in a controlled way, it can be done on a real-time operating system (RTOS), which can sleep for less than a millisecond. I have come up with another way of doing it, but with this solution the interarrival times will not be evenly spaced.
Let's say you want an arrival rate of 3000 events/second: you can write a for loop that sends 3 events and then sleeps for 1 ms. The 3 tuples in each batch will arrive almost back to back, but the overall rate is met. This may be a crude solution, but it works.
Please let me know if there is some better solution to this.
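A possible variation on the batching idea, as a sketch only: instead of a fixed batch size, compute from the elapsed time how many events should have been sent so far and emit the deficit before each 1 ms sleep. This also handles rates that are not a multiple of 1000 (such as 1200 Hz) and catches up when a sleep overshoots. `PacedEmitter`, `emitAtRate` and the `sink` callback are hypothetical names; in the Flink source the sink would be the code that calls sourceContext.collect.

```java
import java.util.function.LongConsumer;

final class PacedEmitter {
    /**
     * Emits totalEvents events at roughly eventsPerSecond, in small bursts
     * separated by 1 ms sleeps. Returns the number of events actually emitted.
     */
    static long emitAtRate(int eventsPerSecond, long totalEvents, LongConsumer sink) {
        long start = System.nanoTime();
        long emitted = 0;
        while (emitted < totalEvents) {
            long elapsedNanos = System.nanoTime() - start;
            // Number of events that should exist by now; "+ 1" sends the first one at time zero.
            // (The multiplication stays within long range for runs of many hours at kHz rates.)
            long due = Math.min(totalEvents, elapsedNanos * eventsPerSecond / 1_000_000_000L + 1);
            while (emitted < due) {
                sink.accept(emitted);
                emitted++;
            }
            if (emitted < totalEvents) {
                try {
                    Thread.sleep(1); // may take longer than 1 ms; the deficit logic catches up
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        long begin = System.nanoTime();
        long sent = emitAtRate(1200, 1200, sequence -> { /* collect the event here */ });
        System.out.printf("Sent %d events in %d ms%n", sent, (System.nanoTime() - begin) / 1_000_000);
    }
}
```

The interarrival times are still bursty, as with the fixed-batch loop; only the average rate is controlled.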
I wrote an application which reads all lines in text files and measures the time taken. I'm wondering what the time of a whole block of threads will be.
For example if I start 2 threads at the same time:
for (int i = 0; i < 2; i++) {
t[i] = new Threads(args[j], 2);
j++;
}
try {
Thread.sleep(500);
} catch (InterruptedException e) {
e.printStackTrace();
}
System.out.println("TIME for block 1 of threads; "
+ (max(new long[]{t[0].getTime(),t[1].getTime()})));
I wait for them to finish processing the files and then read the operation times (via getTime). Is it correct to assume that the time for the block of threads is the maximum of the per-thread times? I think so, because all other threads will already have finished by the time the slowest thread stops.
Or maybe should I think in another way?
It's dangerous to reason about execution order when you have multiple threads. For example, if you run your code on a single-core CPU, the threads will not really run in parallel but will take turns on the one core, so the total run time for both threads is roughly the sum of each thread's run time, not the maximum of both.
Fortunately, there is a very easy way to just measure this if you use an ExecutorService instead of using Threads directly (by the way, this is generally good advice):
// 1. init executor
int numberOfThreads = 2; // or any other number
int numberOfTasks = numberOfThreads; // is this true in your case?
ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
long startTime = System.currentTimeMillis();
// 2. execute tasks in parallel using executor
for(int i = 0; i < numberOfTasks; i++) {
executor.execute(new Task()); // Task is your implementation of Runnable
}
// 3. initiate shutdown and wait until all tasks are finished
executor.shutdown();
executor.awaitTermination(1, TimeUnit.MINUTES); // we won't wait forever
// 4. measure time
long delta = System.currentTimeMillis() - startTime;
Now, delta holds the total running time of your tasks. You can play around with numberOfThreads to see if more or fewer threads give different results.
Important note: do not share a single Reader or InputStream between threads. Interleaved reads from several threads will not give each thread a consistent view of the file, so every task should open its own.
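As a rough illustration of that note, the sketch below gives every task its own reader and uses invokeAll to wait for all of them. `PerTaskReaders`, `countLines` and the `tempFile` helper are invented for this example, and counting lines is only a placeholder for whatever the real tasks do with the files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerTaskReaders {
    /** Counts the lines of each file in a separate task; no reader is shared between tasks. */
    static List<Long> countLines(List<Path> files) {
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, files.size()));
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (Path file : files) {
                tasks.add(() -> {
                    // The reader is opened and closed inside the task that uses it.
                    try (BufferedReader reader = Files.newBufferedReader(file)) {
                        long lines = 0;
                        while (reader.readLine() != null) {
                            lines++;
                        }
                        return lines;
                    }
                });
            }
            List<Long> counts = new ArrayList<>();
            for (Future<Long> result : executor.invokeAll(tasks)) { // blocks until all tasks are done
                counts.add(result.get());
            }
            return counts;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while reading files", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a reading task failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    /** Hypothetical demo helper: writes the given lines to a temporary file. */
    static Path tempFile(String... lines) {
        try {
            Path path = Files.createTempFile("lines", ".txt");
            Files.write(path, Arrays.asList(lines));
            path.toFile().deleteOnExit();
            return path;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Path> files = Arrays.asList(tempFile("a", "b", "c"), tempFile("x"));
        System.out.println("Line counts: " + countLines(files));
    }
}
```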
As far as I know, you can use the System class's static methods.
Call one at the start of the block and again at the end, then subtract the earlier value from the later one.
These are:
System.currentTimeMillis(); // The current wall-clock time, in milliseconds.
or
System.nanoTime(); // The value of a high-resolution time source, in nanoseconds; only differences are meaningful.
For example:
Start of block
long startTime = System.currentTimeMillis();
End of block
long elapsedTime = System.currentTimeMillis() - startTime;
This gives you the elapsed time of the block.
I am trying to understand the ExecutorService in Java. There is not much performance difference whether I use 1 thread or 4 threads. I have a quad-core CPU and no other processes running.
ExecutorService exService = Executors.newFixedThreadPool(4);
exService.execute(new Test().new RunnableThread());
exService.awaitTermination(25, TimeUnit.SECONDS);
class RunnableThread implements Runnable {
@Override
public void run() {
StopWatch stopWatch = new StopWatch();
stopWatch.start();
long cnt = 0;
for (cnt = 0; cnt < 999999999; cnt++) {
try {
for (long j = 0; j < 20; j++){
x += j;
}
} catch (Exception e) {
e.printStackTrace();
}
}
stopWatch.stop();
System.out.println(stopWatch.getTime());
}
}
If my understanding is right, my task should have close to 4x performance improvement when I say newFixedThreadPool(4) right?
Unfortunately, there is no magic in the allocation of workload to threads.
Every task runs on its own thread. It does not somehow automatically get transformed into concurrent execution paths.
If you have only one task, the remaining three threads will be idle.
Multiple threads only speed up things if you can split your workload into multiple tasks that can run concurrently (and you have to do that splitting yourself).
If my understanding is right, my task should have close to 4x performance improvement when I say newFixedThreadPool(4) right?
Yes, if you're actually running 4 concurrent tasks.
Currently, you have a single task that you are submitting to the executor. Let's say that it takes 10 seconds. Even if you have 4 cores and 4 threads, Java will not be able to parallelize a single task. However, if you submit 4 independent tasks (that have no memory or lock contention), then you will see all of them complete in those 10 seconds that it took the 1 task.
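To make the splitting concrete, here is a sketch that divides one loop into independent chunks and submits each chunk as its own task. `SplitWork` and `parallelSum` are illustrative names, and summing a range is only a placeholder workload, not the loop from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SplitWork {
    /** Sums the numbers 0..n-1 by splitting the range into independent chunks, one task each. */
    static long parallelSum(long n, int tasks) {
        ExecutorService executor = Executors.newFixedThreadPool(tasks);
        try {
            long chunk = (n + tasks - 1) / tasks; // round up so the last chunk covers the remainder
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final long from = Math.min(n, t * chunk);
                final long to = Math.min(n, from + chunk);
                Callable<Long> part = () -> {
                    long sum = 0; // each task only touches its own local state
                    for (long i = from; i < to; i++) {
                        sum += i;
                    }
                    return sum;
                };
                parts.add(executor.submit(part));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for that chunk to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for chunks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a chunk failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1_000_000L, 4));
    }
}
```

With a single task the pool size makes no difference; with four chunks like these, a four-thread pool has something to run on each core.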
I'm writing an implementation of the conjugate-gradient method.
I use Java multithreading for matrix back-substitution.
Synchronization is done using CyclicBarrier and CountDownLatch.
Why does it take so much time to synchronize the threads?
Are there other ways to do it?
Code snippet:
private void syncThreads() {
// barrier.await();
try {
barrier.await();
} catch (InterruptedException e) {
} catch (BrokenBarrierException e) {
}
}
You need to ensure that each thread spends more time doing useful work than it costs in overhead to pass a task to another thread.
Here is an example of where the overhead of passing a task to another thread far outweighs the benefits of using multiple threads.
final double[] results = new double[10*1000*1000];
{
long start = System.nanoTime();
// using a plain loop.
for(int i=0;i<results.length;i++) {
results[i] = (double) i * i;
}
long time = System.nanoTime() - start;
System.out.printf("With one thread it took %.1f ns per square%n", (double) time / results.length);
}
{
ExecutorService ex = Executors.newFixedThreadPool(4);
long start = System.nanoTime();
// submitting one tiny task per element.
for(int i=0;i<results.length;i++) {
final int i2 = i;
ex.execute(new Runnable() {
@Override
public void run() {
results[i2] = (double) i2 * i2; // cast first to avoid int overflow
}
});
}
ex.shutdown();
ex.awaitTermination(1, TimeUnit.MINUTES);
long time = System.nanoTime() - start;
System.out.printf("With four threads it took %.1f ns per square%n", (double) time / results.length);
}
prints
With one thread it took 1.4 ns per square
With four threads it took 715.6 ns per square
Using multiple threads is much worse.
However, if you increase the amount of work each thread does and split the array into one block per thread, the picture changes:
final double[] results = new double[10 * 1000 * 1000];
{
long start = System.nanoTime();
// using a plain loop.
for (int i = 0; i < results.length; i++) {
results[i] = Math.pow(i, 1.5);
}
long time = System.nanoTime() - start;
System.out.printf("With one thread it took %.1f ns per pow 1.5%n", (double) time / results.length);
}
{
int threads = 4;
ExecutorService ex = Executors.newFixedThreadPool(threads);
long start = System.nanoTime();
int blockSize = results.length / threads;
// one task per block of the array.
for (int i = 0; i < threads; i++) {
final int istart = i * blockSize;
final int iend = (i + 1) * blockSize;
ex.execute(new Runnable() {
@Override
public void run() {
for (int i = istart; i < iend; i++)
results[i] = Math.pow(i, 1.5);
}
});
}
ex.shutdown();
ex.awaitTermination(1, TimeUnit.MINUTES);
long time = System.nanoTime() - start;
System.out.printf("With four threads it took %.1f ns per pow 1.5%n", (double) time / results.length);
}
prints
With one thread it took 287.6 ns per pow 1.5
With four threads it took 77.3 ns per pow 1.5
That's an almost 4x improvement.
How many threads are being used in total? That is likely the source of your problem. Using multiple threads will only really give a performance boost if one of the following holds:
Each task in the thread does some sort of blocking, for example waiting on I/O. Using multiple threads in this case enables that blocking time to be used by other threads.
Or you have multiple cores. If you have 4 cores or 4 CPUs, you can run 4 tasks (or 4 threads) simultaneously.
It sounds like you are not blocking in the threads, so my guess is you are using too many threads. If you are, for example, using 10 different threads to do the work at the same time but only have 2 cores, that would likely be slower than running all of the tasks in sequence. Generally, start with a number of threads equal to your number of cores/CPUs. Increase the thread count slowly, gauging the performance each time. This will give you the optimal thread count to use.
Perhaps you could try re-implementing your code using the fork/join framework from JDK 7 and see how it performs?
By default it creates a thread pool with as many threads as you have cores in your system. If you choose a reasonable threshold for dividing your work into smaller chunks, this will probably execute much more efficiently.
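A minimal fork/join sketch under stated assumptions: the workload is filling an array with Math.pow(i, 1.5), chosen only as a stand-in for the real computation, and the names `PowFill`, `fill` and the threshold of 100,000 elements are arbitrary. The right threshold would need to be found by measurement.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

final class PowFill extends RecursiveAction {
    private static final int THRESHOLD = 100_000; // assumed cut-off; tune by measuring

    private final double[] results;
    private final int from;
    private final int to;

    PowFill(double[] results, int from, int to) {
        this.results = results;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: do the work directly in this thread.
            for (int i = from; i < to; i++) {
                results[i] = Math.pow(i, 1.5);
            }
            return;
        }
        // Otherwise split the range in two and let the pool run both halves.
        int mid = (from + to) >>> 1;
        invokeAll(new PowFill(results, from, mid), new PowFill(results, mid, to));
    }

    /** Fills a new array of the given size using a pool with default parallelism. */
    static double[] fill(int size) {
        double[] results = new double[size];
        ForkJoinPool pool = new ForkJoinPool(); // parallelism defaults to the number of available processors
        try {
            pool.invoke(new PowFill(results, 0, size));
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        double[] results = fill(10 * 1000 * 1000);
        long time = System.nanoTime() - start;
        System.out.printf("Fork/join took %.1f ns per pow 1.5%n", (double) time / results.length);
    }
}
```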
You are most likely aware of this, but in case you aren't, please read up on Amdahl's Law. It gives the relationship between expected speedup of a program by using parallelism and the sequential segments of the program.
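For reference, Amdahl's Law says that if a fraction p of the run time can be parallelized over n workers, the best possible speedup is 1 / ((1 - p) + p / n). The small sketch below just evaluates that formula; `Amdahl` and `speedup` are illustrative names, and the 95% figure in main is an example value, not a measurement of your code.

```java
final class Amdahl {
    /** Upper bound on speedup: 1 / ((1 - p) + p / n), with p the parallel fraction and n the workers. */
    static double speedup(double parallelFraction, int workers) {
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / workers);
    }

    public static void main(String[] args) {
        double p = 0.95; // assumed: 95% of the run time parallelizes perfectly
        for (int n : new int[] {1, 2, 4, 8, 16}) {
            System.out.printf("%2d workers -> at most %.2fx%n", n, speedup(p, n));
        }
    }
}
```

Even a small sequential part, such as time spent waiting at a barrier, caps the achievable speedup well below the number of cores.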
Synchronizing across cores is much slower than in a single-core environment. See if you can limit the JVM to 1 core (see this blog post),
or you can use an ExecutorService and call invokeAll to run the parallel tasks.
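One way the invokeAll suggestion could look, as a sketch: each phase of the algorithm is a list of tasks, and invokeAll returns only when every task in that phase has finished, so it plays the role of the barrier. `PhasedWork` and `run` are hypothetical names, and incrementing a slot stands in for one worker's share of a back-substitution step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PhasedWork {
    /** Runs the given number of phases; in each phase every worker adds 1 to its own slot. */
    static long[] run(int workers, int phases) {
        long[] slots = new long[workers];
        ExecutorService executor = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Void>> phase = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int slot = w;
                phase.add(() -> {
                    slots[slot]++; // each task writes only to its own slot
                    return null;
                });
            }
            for (int p = 0; p < phases; p++) {
                // Acts as the barrier: the next phase starts only after all tasks of this one are done.
                executor.invokeAll(phase);
            }
            return slots;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted between phases", e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run(4, 3)));
    }
}
```

This does not make synchronization free. It only helps if each phase does enough work per task to outweigh the hand-off cost.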