java threads vs java processes performance degradation - java

Here I want to focus on a specific application where I see degradation (no need for a general discussion about the relative speed of threads versus processes).
I have an MPI application in Java that solves a problem using an iterative method. A schematic view of the application is below; let's call it MyProcess(n), where "n" is the number of processes:
double[] myArray = new double[M*K];
for (int iter = 0; iter < iterationCount; ++iter) {
    // some communication between processes
    // main loop
    for (M)
        for (K) {
            // linear sequence of arithmetical instructions
        }
    // some communication between processes
}
To improve performance I decided to use Java threads (let's call it MyThreads(n)). The code is almost the same: myArray becomes a matrix, where each row holds the array for the corresponding thread.
double[][] myArray = new double[threadNumber][M*K];

public void run() {
    for (int iter = 0; iter < iterationCount; ++iter) {
        // some synchronization primitives
        // main loop
        for (M)
            for (K) {
                // linear sequence of arithmetical instructions
                counter++;
            }
        // some synchronization primitives
    }
}
Threads are created and started using Executors.newFixedThreadPool(threadNumber).
The problem is that while MyProcess(n) gives adequate performance (n in [1,8]), MyThreads(n) degrades badly (on my system by a factor of n).
Hardware: Intel(R) Xeon(R) CPU X5355(2 processors, 4 cores on each)
Java version: 1.5 (using the d32 option).
At first I thought the threads got different workloads, but no: the "counter" variable shows that the number of iterations in different runs of MyThreads(n) (n in [1,8]) is identical.
And it isn't a synchronization problem, because I temporarily commented out all synchronization primitives.
Any suggestions/ideas would be appreciated.
Thanks.

There are two issues I see in your piece of code.
First, a caching problem. Since you try to do this with multiple threads/processes, I'd assume your M * K is a large number; then when you do
double[][] myArray = new double[threadNumber][M*K];
you are essentially creating an array of threadNumber references, each pointing to a double array of size M*K. The interesting point is that those threadNumber arrays are not necessarily allocated in the same block of memory; they can be placed anywhere inside the JVM heap. As a result, when multiple threads run, you might get a lot of cache misses and end up re-reading memory many times, which eventually slows down your program.
If the above is the root cause, you can try enlarging your JVM heap size and then doing
double[] myArray = new double[threadNumber * M * K];
and have each thread operate on a different segment of the same array. You should see better performance.
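A sketch of that flattened layout, assuming each thread owns a disjoint segment (the real arithmetic is replaced by a placeholder write; class and variable names here are illustrative):

```java
public class FlatArrayDemo {
    public static void main(String[] args) {
        int threadNumber = 4, M = 3, K = 2;
        // one contiguous block instead of threadNumber separate arrays
        double[] myArray = new double[threadNumber * M * K];
        int segment = M * K;
        for (int t = 0; t < threadNumber; t++) {
            int base = t * segment; // thread t owns indices [base, base + segment)
            for (int i = 0; i < segment; i++) {
                myArray[base + i] = t; // stand-in for the real arithmetic
            }
        }
        System.out.println(myArray[0 * segment]);     // first element of thread 0's segment
        System.out.println(myArray[3 * segment + 5]); // last element of thread 3's segment
    }
}
```

Note that adjacent segments in one array can still produce false sharing at segment boundaries, so whether this helps depends on the access pattern.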
Second, a synchronization issue. Note that a double (or any primitive) array is NOT volatile, so a result written by one thread isn't guaranteed to be visible to other threads. If you use synchronized blocks, that resolves the issue, since a side effect of synchronization is to ensure visibility across threads. If not, whenever you read and write the array, make sure you use Unsafe.putXXXVolatile() and Unsafe.getXXXVolatile() so that you get volatile semantics on array elements.
To take this further, Unsafe can also be used to allocate a contiguous block of memory to hold your data structure and achieve better performance. In your case, though, I think (1) alone should do the trick.

Related

Trying to understand shared variables in java threads

I have the following code:
class thread_creation extends Thread {
    int t;

    thread_creation(int x) {
        t = x;
    }

    public void run() {
        increment();
    }

    public void increment() {
        for (int i = 0; i < 10; i++) {
            t++;
            System.out.println(t);
        }
    }
}

public class test {
    public static void main(String[] args) {
        int i = 0;
        thread_creation t1 = new thread_creation(i);
        thread_creation t2 = new thread_creation(i);
        t1.start();
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        t2.start();
    }
}
When I run it , I get :
1
2
3
4
5
6
7
8
9
10
1
2
3
4
5
6
7
8
9
10
Why am I getting this output? According to my understanding, the variable i is shared between the two threads. So the first thread executes and increments i 10 times, and hence i should equal 10. The second thread starts after the first one because of the sleep statement and, since i is shared, should start with i=10 and increment it 10 times to reach i=20. But that is not what the output shows, so why?
You seem to think that int t; in thread_creation is a shared variable. I'm afraid you are mistaken. Each instance has its own t variable, so the two threads are updating distinct counters.
The output you are seeing reflects that.
This is the nub of your question:
How do I pass a shared variable then ?
Actually, you can't1. Strictly a shared variable is actually a variable belonging to a shared object. You cannot pass a variable per se. Java does not allow passing of variables. This is what "Java does not support call-by-reference" really means. You can't pass or return a variable or the address of a variable in any method call. (Or in any other way.)
In Java you pass and return values: either primitives, or references to objects. The values may read from a variable by the call's parameter expression or assigned to a variable after the call's return. But you are not passing the variable. A variable and its value / contents are different things.
So the only way to implement a shared counter is to implement it as a shared counter object.
Note that "variable" and "object" mean different things, both in Java and in other programming languages. You should NOT use the two terms interchangeably. For example, when I declare this in Java:
String s = "Hello";
the s variable is not a String object. It is a variable that contains a reference to the String object. Other variables may contain references to the same String object as well. The distinction is even more stark when the objects are mutable. (String is not mutable ... in Java.)
Here are the two (IMO) best ways to implement a shared counter object.
You could create a custom Java Counter class with a count variable, a get method, and methods for incrementing and decrementing the counter. The class needs to implement these methods in a thread-safe and atomic way; e.g. by using synchronized methods or blocks2.
You could just use an AtomicInteger instance. That takes care of atomicity and thread-safety ... to the extent that it is possible with this kind of API.
The latter approach is simpler and likely more efficient ... unless you need to do something special each time the counter changes.
(It is conceivable that you could implement a shared counter other ways, but that is too much detail for this answer.)
1 - I realize that I just said the same thing more than 3 times. But as the Bellman says in "The Hunting of the Snark": "What I tell you three times is true."
2 - If the counter is not implemented using synchronized or an equivalent mutual exclusion mechanism with the appropriate happens before semantics, you are liable to see Heisenbugs; e.g. race conditions and memory visibility problems.
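A minimal sketch of the first option, a custom counter class shared via a single object (the names Counter and CounterDemo are illustrative, not from the question):

```java
// A minimal thread-safe counter: option 1 from the answer above.
class Counter {
    private int count;
    public synchronized void increment() { count++; }
    public synchronized int get() { return count; }
}

public class CounterDemo {
    public static void main(String[] args) throws InterruptedException {
        Counter shared = new Counter(); // both threads share this one object
        Runnable task = () -> {
            for (int i = 0; i < 10; i++) shared.increment();
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join(); // join() also establishes happens-before,
        t2.join(); // so the final get() is guaranteed to see all increments
        System.out.println(shared.get()); // prints 20
    }
}
```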
Two crucial things you're missing. Both individually explain this behaviour: you can 'fix' either one and you'll still see this; you'd have to fix both to see 1-20.
Java is pass-by-value
When you pass i, you pass a copy of it. In fact, in java, all parameters to methods are always copies. Hence, when the thread does t++, it has absolutely no effect whatsoever on your i. You can trivially test this, and you don't need to mess with threads to see it:
public static void main(String[] args) {
int i = 0;
add5(i);
System.out.println(i); // prints 0!!
}
static void add5(int i) {
i = i + 5;
}
Note that all non-primitives are references. That means: A copy of the reference is passed. It's like passing the address of a house and not the house itself. If I have an address book, and I hand you a scanned copy of a page that contains the address to my summer home, you can still drive over there and toss a brick through the window, and I'll 'see' that when I go follow my copy of the address. So, when you pass e.g. a list and the method you passed the list to runs list.add("foo"), you DO see that. You may think: AHA! That means java does not pass a copy, it passed the real list! Not so. Java passed a copy of a street address (A reference). The method I handed that copy to decided to drive over there and act - that you can see.
In other words, =, ++, that sort of thing? That is done to the copy. . is java for 'drive to the address and enter the house'. Anything you 'do' with . is visible to the caller, = and ++ and such are not.
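A small demonstration of that distinction between '.' (acting on the shared object, visible to the caller) and '=' (rebinding the local copy, invisible to the caller); the names here are made up for the demo:

```java
import java.util.ArrayList;
import java.util.List;

public class PassByValueDemo {
    static void mutate(List<String> list) {
        list.add("foo");          // "drives to the address": the caller sees this
    }

    static void reassign(List<String> list) {
        list = new ArrayList<>(); // rebinds the local copy: the caller sees nothing
        list.add("bar");
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        mutate(names);
        reassign(names);
        System.out.println(names); // prints [foo]
    }
}
```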
Fixing the code to avoid the pass-by-value problem
Change your code to:
class thread_creation extends Thread {
    static int t; // now it's shared!

    public void run() {
        increment();
    }

    public void increment() {
        for (int i = 0; i < 10; i++) {
            t++;
            // System.out.println(t);
        }
    }
}

public class test {
    public static void main(String[] args) throws Exception {
        thread_creation t1 = new thread_creation();
        thread_creation t2 = new thread_creation();
        t1.start();
        Thread.sleep(500);
        t2.start();
        Thread.sleep(500);
        System.out.println(thread_creation.t);
    }
}
Note that I commented out the print line. I did that intentionally - see below. If you run the above code, you'd think you see 20, but depending on your hardware, the OS, the song playing on your mp3 playing app, which websites you have open, and the phase of the moon, it may be less than 20. So what's going on there? Enter the...
The evil coin.
The relevant spec here is the JMM (The Java Memory Model). This spec explains precisely what a JVM must do, and therefore, what a JVM is free not to do, especially when it comes to how memory is actually managed.
The crucial aspect is the following:
Any effects (updates to fields, such as that t field) may or may not be observable, JVM's choice. There's no guarantee that anything you do is visible to anything else... unless there exists a Happens-Before/Happens-After relationship: Any 2 statements with such a relationship have the property that the JVM guarantees that you cannot observe the lack of the update done by the HB line from the HA line.
HB/HA can be established in various ways:
The 'natural' way: Anything that is 'before' something else and runs in the same thread has an HB/HA relationship. In other words, if you do in one thread x++; System.out.println(x); then you can't observe that the x++ hasn't happened yet. It's stated like this so that if you're not observing, you get no guarantees, which gives the JVM the freedom to optimize. For example, given x++;y++; and that's all you do, the JVM is free to re-order that and increment y before x. Or not. There are no guarantees, a JVM can do whatever it wants.
synchronized. The moment of 'exiting' a synchronized (x) {} block has HB to the HA of another thread 'entering' the top of any synchronized block on the same object, if it enters later.
volatile - but note that with volatile it's basically impossible to control which access came first. Still, one of them did come first, and any interaction with a volatile field is HB relative to another thread accessing the same field later.
thread starting. thread.start() is HB relative to the first line of the run() of that thread.
thread joining. The return of thread.join() is HA relative to the last line of that thread's run(). (Note that thread.yield() gives no such guarantee.)
There are a few more exotic ways to establish HB/HA but that's pretty much it.
Crucially, in your code there is no HB/HA between any of the statements that modify or print t!
In other words, the JVM is free to run it all in such a way that the effects of various t++ statements run by one thread aren't observed by another thread.
What the.. WHY????
Because of efficiency. Your memory banks on your CPU are, relative to how fast CPUs are, oceans away from the CPU core. Fetching or writing to core memory from a CPU takes an incredibly long time - your CPU is twiddling its thumbs for a very long time while it waits for the memory controller to get the job done. It could be running hundreds of instructions in that time.
So, CPU cores do not write to memory AT ALL. Instead they work with caches: They have an on-core cache page, and the only interaction with your main memory banks (which are shared by CPU cores) is 'load in an entire cache page' and 'write an entire cache page'. That cache page is then effectively a 'local copy' that only that core can see and interact with (but can do so very very quickly, as that IS very close to the core, unlike the main memory banks), and then once the algorithm is done it can flush that page back to main memory.
The JVM needs to be free to use this. Had the JVM actually worked like you want (that anything any thread does is instantly observable by all others), then anything that any line does must first wait 500 cycles to load the relevant page, then wait another 500 cycles to write it back. All java apps would literally be 1000x slower than they could be.
This in passing also explains that actual synchronizing is really slow. Nothing java can do about that, it is a fundamental limitation of our modern multi-core CPUs.
So, evil coin?
Note that the JVM does not guarantee that the CPU must neccessarily work with this cache stuff, nor does it make any promises about when cache pages are flushed. It merely limits the guarantees so that JVMs can be efficiently written on CPUs that work like that.
That means that any read or write to any field any java code ever does can best be thought of as follows:
The JVM first flips a coin. On heads, it uses a local cached copy. On tails, it copies over the value from some other thread's cached copy instead.
The coin is evil: It is not reliably a 50/50 arrangement. It is entirely plausible that throughout developing a feature and testing it, the coin lands tails every time it is flipped. It remains flipping tails 100% of the time for the first week that you deployed it. And then just when that big potential customer comes in and you're demoing your app, the coin, being an evil, evil coin, starts flipping heads a few times and breaking your app.
The correct conclusion is that the coin will mess with you and that you cannot unit test against it. The only way to win the game is to ensure that the coin is never flipped.
You do this by never touching a field from multiple threads unless it is constant (final, or simply never changes), or if all access to it (both reads and writes) has clearly established HB/HA between all threads.
This is hard to do. That's why the vast majority of apps don't do it at all. Instead, they:
Talk between threads using a database, which has vastly more advanced synchronization primitives: Transactions.
Talk using a message bus such as RabbitMQ or similar.
Use stuff from the java.util.concurrent package such as a Latch, ForkJoin, ConcurrentMap, or AtomicInteger. These are easier to use (specifically: It is a lot harder to write code for these abstractions that is buggy but where the bug cannot be observed or tested for on the machine of the developer that wrote it, it'll only blow up much later in production. But not impossible, of course).
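As an illustration of that last bullet, an AtomicInteger combined with a CountDownLatch (both from java.util.concurrent) replaces the Thread.sleep(500) guesswork with an actual completion signal; the class name LatchDemo is made up for this sketch:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class LatchDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(2); // wait for 2 workers
        Runnable task = () -> {
            for (int i = 0; i < 10; i++) counter.incrementAndGet();
            done.countDown(); // signal this worker is finished
        };
        new Thread(task).start();
        new Thread(task).start();
        done.await(); // happens-before: all increments are visible after this
        System.out.println(counter.get()); // prints 20, every time
    }
}
```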
Let's fix it!
volatile doesn't 'fix' ++. x++; is 'read x, increment by 1, write result to x' and volatile doesn't make that atomic, so we cannot use this. We can either replace t++ with:
synchronized (thread_creation.class) {
    t++;
}
Which works fine but is really slow (and you shouldn't lock on publicly visible stuff if you can help it, so make a custom object to lock on, but you get the gist hopefully), or, better, dig into that j.u.c package for something that seems useful. And so there is! AtomicInteger!
import java.util.concurrent.atomic.AtomicInteger;

class thread_creation extends Thread {
    static AtomicInteger t = new AtomicInteger();

    public void run() {
        increment();
    }

    public void increment() {
        for (int i = 0; i < 10; i++) {
            t.incrementAndGet();
        }
    }
}

public class test {
    public static void main(String[] args) throws Exception {
        thread_creation t1 = new thread_creation();
        thread_creation t2 = new thread_creation();
        t1.start();
        Thread.sleep(500);
        t2.start();
        Thread.sleep(500);
        System.out.println(thread_creation.t.get());
    }
}
That code will print 20. Every time (unless those threads take longer than 500 msec, which technically could happen, but is rather unlikely of course).
Why did you comment out the print statement?
That HB/HA stuff can sneak up on you: when you call code you did not write, such as System.out.println, who knows what kind of HB/HA relationships are in that code? The javadoc isn't specific about that kind of thing; it won't tell you. It turns out that on most OSes and JVM implementations, interaction with standard out, such as System.out.println, causes synchronization; either the JVM does it, or the OS does. Thus, introducing print statements 'to test stuff' doesn't work - it makes it impossible to observe the race conditions your code does have. Similarly, involving debuggers is a great way to make that coin go really evil on you and flip juuust so that you can't tell your code is buggy.
That is why I commented it out: with it in, I bet that on almost all hardware you end up seeing 20, even though the JVM doesn't guarantee it and that first version is broken. Even if on your particular machine, on this day, with this phase of the moon, it seems to reliably print 20 every single time you run it.

Performance of Concurrent Program Degrading with Increase in Threads?

I have been trying to run the code below on a quad-core computer, and the average running times with N threads in the ExecutorService over 100 iterations are as follows:
1 thread = 78404.95
2 threads = 174995.14
4 threads = 144230.23
But according to what I have studied, 2*(no. of cores) threads should give the optimal result, which is clearly not the case in my program, which bizarrely gives the best time for a single thread.
Code :
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TestHashSet {
    public static void main(String argv[]) {
        Set<Integer> S = Collections.newSetFromMap(new ConcurrentHashMap<Integer, Boolean>());
        S.add(1);
        S.add(2);
        S.add(3);
        S.add(4);
        S.add(5);
        long startTime = System.nanoTime();
        ExecutorService executor = Executors.newFixedThreadPool(8);
        int Nb = 0;
        for (int i = 0; i < 10; i++) {
            User runnable = new User(S);
            executor.execute(runnable);
            Nb = Thread.getAllStackTraces().keySet().size();
        }
        executor.shutdown();
        try {
            executor.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        long endTime = System.nanoTime();
        System.out.println(0.001 * (endTime - startTime) + " And " + Nb);
    }
}

class User implements Runnable {
    Set<Integer> S;

    User(Set<Integer> S) {
        this.S = S;
    }

    @Override
    public void run() {
        Set<Integer> t = Collections.newSetFromMap(new ConcurrentHashMap<Integer, Boolean>());
        for (int i = 0; i < 10; i++) {
            t.add(i + 5);
        }
        S.retainAll(t);
        Set<Integer> t2 = Collections.newSetFromMap(new ConcurrentHashMap<Integer, Boolean>());
        for (int i = 0; i < 10; i++) {
            t2.add(i);
        }
        S.addAll(t);
        /*
        ConcurrentHashSet<Integer> D = new ConcurrentHashSet<Integer>();
        for (int i = 0; i < 10; i++) {
            D.add(i + 3);
        }
        S.difference(D);
        */
    }
}
Update: If I increase the number of queries per thread to 1000, the 4-threaded version performs better than the single-threaded one. I think the overhead was higher than the runtime when I used only about 4 queries per thread; as the number of queries increased, the runtime came to dominate the overhead. Thanks.
But 5 threads are supposed to increase the performance..?
That's what >>you<< suppose. But in fact, there are no guarantees that adding threads will increase performance.
But according to what I have studied 2*(no of cores) of threads should give optimal result ...
If you read that somewhere, then you either misread it or it is plain wrong.
The reality is that the number of threads for optimal performance is highly dependent on the nature of your application, and also on the hardware you are running on.
Based on a cursory reading of your code, it appears that this is a benchmark to test how well Java deals with multi-threaded access and updates to a shared set (S). Each thread is doing some operations on a thread-confined set, then either adding or removing all entries in the thread-confined set to the shared set.
The problem is that the addAll and retainAll calls are likely to be concurrency bottlenecks. A set based on ConcurrentHashMap will give better concurrent performance for point access / update than one based on HashMap. However, addAll and retainAll perform N such operations, on the same entries that the other threads are operating on. Given the nature of this pattern of operations, you are likely to get significant contention within the different regions of the ConcurrentHashMap. That is likely to lead to one thread blocking another ... and a slowdown.
Update: If I increase the number of queries per thread, the 4-threaded version performs better than the single-threaded one. I think the overhead was higher than the runtime when I used only about 4 queries per thread; as the number of queries increased, the runtime came to dominate the overhead.
I assume that you mean that you are increasing the number of hash map entries. This is likely to reduce the average contention, given the way that ConcurrentHashMap works. (The class divides the map into regions, and arranges that operations involving entries in different regions incur the minimum possible contention overheads. By increasing the number of distinct entries, you are reducing the probability that two simultaneous operations will lead to contention.)
So returning to the "2 x no of threads" factoid.
I suspect that the sources you have been reading don't actually say that that gives you optimal performance. I suspect that they really say that:
"2 x no of threads" is a good starting point ... and you need to tune it for your application / problem / hardware, and/or
don't go above "2 x no of threads" for a compute intensive task ... because it is unlikely to help.
In your example, it is most likely that the main source of the contention is in the updates to the shared set / map ... and the overheads of ensuring that they happen atomically.
You can also get contention at a lower level; i.e. contention for memory bandwidth (RAM read/write) and memory cache contention. Whether that happens will depend on the specs of the hardware you are running on ...
The final thing to note is that your benchmark is flawed in that it does not allow for various VM warmup effects ... such as JIT compilation. The fact that your 2 thread times are more than double the 1 thread times points to that issue.
There are other questionable aspects about your benchmarking:
The amount of work done by the run() method is too small.
This benchmark does not appear to be representative of a real-world use-case. Measuring speed-up in a totally fictitious (nonsense) algorithm is not going to give you any clues about how a real algorithm is likely to perform when you scale the thread count.
Running the tests on a 4 core machine means that you probably wouldn't have enough data points to draw scientifically meaningful conclusions ... assuming that the benchmark was sound.
Having said that, the 2 to 4 thread slowdown that you seem to be seeing is not unexpected ... to me.
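To illustrate the warmup point above, here is a minimal hand-rolled pattern (a proper harness such as JMH is preferable; the work() method is a stand-in workload invented for this sketch):

```java
public class WarmupDemo {
    // Stand-in workload: sums 0..99_999.
    static long work() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        // Warm-up phase: run the method enough times that the JIT
        // compiles it before we start the clock.
        for (int i = 0; i < 1_000; i++) work();

        long start = System.nanoTime();
        long result = work();
        long elapsed = System.nanoTime() - start;
        // The timed run now measures compiled code, not interpretation.
        System.out.println("result=" + result + " ns=" + elapsed);
    }
}
```

Without the warm-up loop, the first timed iterations include interpretation and JIT compilation cost, which is exactly the effect that inflates the 2-thread numbers above.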

Arrays.sort and Arrays.parallelSort function behavior

I have the following code ,
import java.util.Arrays;
public class ParellelStream {
public static void main(String args[]){
Double dbl[] = new Double[1000000];
for(int i=0; i<dbl.length;i++){
dbl[i]=Math.random();
}
long start = System.currentTimeMillis();
Arrays.parallelSort(dbl);
System.out.println("time taken :"+((System.currentTimeMillis())-start));
}
}
When I run this code it takes approximately 700 to 800 ms, but when I replace Arrays.parallelSort with Arrays.sort it takes 500 to 600 ms. I have read about the Arrays.parallelSort and Arrays.sort methods, and the documentation says that Arrays.parallelSort gives poor performance when the dataset is small, but here I am using an array of 1000000 elements. What could be the reason for parallelSort's poor performance? I am using Java 8.
The parallelSort function will use a thread for each CPU core you have on your machine. Specifically, parallelSort runs tasks on the ForkJoin common thread pool. If you only have one core you will not see an improvement over the single-threaded sort.
If you have multiple cores, there is some upfront cost associated with creating the new threads, which means that for relatively small arrays you are not going to see linear performance gains.
The compare function for comparing doubles is not an expensive function. I think that in this case 1000000 elements can be safely considered small and the benefits of using multiple threads is outweighed by the upfront costs of creating those threads. Since the upfront costs will be fixed you should see a performance gain with larger arrays.
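To make the fixed-cost argument concrete, here is a rough comparison sketch on a primitive double[] (a naive one-shot timing with no warmup, so treat the printed times as indicative only; the seed 42 is arbitrary):

```java
import java.util.Arrays;
import java.util.Random;

public class SortCompare {
    public static void main(String[] args) {
        int n = 1_000_000;
        double[] a = new Random(42).doubles(n).toArray();
        double[] b = a.clone();

        long t0 = System.nanoTime();
        Arrays.sort(a);          // single-threaded dual-pivot quicksort
        long serial = System.nanoTime() - t0;

        t0 = System.nanoTime();
        Arrays.parallelSort(b);  // fork/join parallel merge sort
        long parallel = System.nanoTime() - t0;

        System.out.println("serial ms:   " + serial / 1_000_000);
        System.out.println("parallel ms: " + parallel / 1_000_000);
        System.out.println(Arrays.equals(a, b)); // same sorted result: true
    }
}
```

Note also that the question's code sorts a boxed Double[]; a primitive double[] avoids boxing and the associated pointer-chasing during comparisons, which affects both variants.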
I read about the Arrays.parallelSort and Arrays.sort method which says
that Arrays.parellelSort gives poor performance when dataset are small
but here I am using array of 1000000 elements.
This is not the only thing to take into consideration. It also depends a lot on your machine (how your CPU handles multi-threading, etc.).
Here a quote from the Parallelism part of The Java Tutorials
Note that parallelism is not automatically faster than performing
operations serially, although it can be if you have enough data and
processor cores [...] it is still your responsibility to determine if
your application is suitable for parallelism.
You might also want to have a look at the code of java.util.ArraysParallelSortHelpers for a better understanding of the algorithm.
Note that the parallelSort method uses the ForkJoinPool introduced in Java 7 to take advantage of all the processors of your computer, as stated in the javadoc:
A ForkJoinPool is constructed with a given target parallelism level;
by default, equal to the number of available processors.
Note that if the length of the array is less than 1 << 13 (8192), the array will be sorted using the appropriate sequential Arrays.sort method.
See also
Fork/Join

Is Java slow when creating Objects?

In my current project (an OpenGL voxel engine) I have a serious issue when generating models. I have a very object-oriented structure, meaning that even single parameters of my vertices are objects. This way I am creating about 75,000 objects for 750 voxels in about 5 seconds. Is Java really this slow when allocating new objects, or am I missing a big failure somewhere in my code?
Very big question. Generally speaking, it depends on the object's class definition and on the amount of work required to construct the object.
Some issues to watch for:
avoid finalize methods,
tune memory and GC settings to avoid excessive GC activity,
avoid doing big work in the constructor,
do not make synchronization calls during object construction,
use weak references where appropriate.
Addressing these issues solved my problem.
See also http://oreilly.com/catalog/javapt/chapter/ch04.html
Finally, let me suggest the (somewhat deprecated) Object Pool pattern, i.e. reusing objects.
Concluding: no, generally speaking, Java object creation is not slow.
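A minimal sketch of such a pool; the Voxel class and its buffer size are hypothetical stand-ins for whatever is expensive to construct:

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Object-pool sketch: reuse costly objects instead of reallocating them.
public class PoolDemo {
    static class Voxel {
        final double[] data = new double[64]; // expensive buffer, kept across reuses
        void reset() { Arrays.fill(data, 0.0); }
    }

    private final ArrayDeque<Voxel> pool = new ArrayDeque<>();

    Voxel acquire() {
        Voxel v = pool.poll();
        return (v != null) ? v : new Voxel(); // reuse if available, else allocate
    }

    void release(Voxel v) {
        v.reset();    // clear state before returning it to the pool
        pool.push(v);
    }

    public static void main(String[] args) {
        PoolDemo demo = new PoolDemo();
        Voxel a = demo.acquire();
        demo.release(a);
        Voxel b = demo.acquire();
        System.out.println(a == b); // true: the same instance was reused
    }
}
```

Note this sketch is single-threaded; a pool shared between threads would need a concurrent deque or locking.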
Of course it isn't. The following code allocates 10 million objects and stores them in an array. On my 5 year old notebook, it completes in 1.4 seconds.
import java.math.BigDecimal;
import java.util.Arrays;

public class Test {
    public static void main(String[] args) {
        Object[] o = new Object[10_000_000];
        long start = System.nanoTime();
        for (int i = 0; i < o.length; i++) {
            o[i] = new Object();
        }
        long end = System.nanoTime();
        System.out.println(Arrays.hashCode(o));
        System.out.println(new BigDecimal(end - start).movePointLeft(9));
    }
}
... and that's even though this benchmark is quite naive in that it doesn't trigger just in time compilation of the code under test before starting the timer.
Simply creating 75,000 objects should not take 5 seconds. Take a look at the work your constructor is doing. What else are you doing during this time besides creating the objects? Have you tried timing the code to pinpoint where delays occur?
Objects will be slower than primitives, and they will also consume considerably more memory - so it's possible you are going overboard on them. It's hard to say without seeing more details.
75000 objects will not take a long time to create though, try this:
List<Integer> numbers = new ArrayList<Integer>();
for (int i = 0; i < 75000; i++) {
    numbers.add(i); // note this autoboxes, creating an Integer object beyond the cached first 128 values
}
System.out.println(numbers.size());
http://www.tryjava8.com/app/snippets/52d070b1e4b004716da5cb4f
Total time taken less than a second.
When I put the number up to 7,500,000 it finally took a second...
The new operator in Java is very fast compared to the common approach in languages without automatic memory management (e.g. new is usually faster than C's malloc, because in a garbage-collected heap allocation is typically just a pointer bump rather than a free-list search).
Although the new operator can still be a bottleneck, it is certainly not the problem in your case. Creating 75K objects should be WAY faster than 5 seconds.
I had the same issue with creating new objects.
My object's constructor allocates a single three-dimensional 64x64x64 array and nothing more, and FPS fell to a quarter of its previous value.
I solved this issue by reusing an old object and resetting its state (by the way, that reset method reallocates the array without losing performance).
If I instead move the array allocation into a separate method and call it after creating the object, speed does not increase to an acceptable value.
The object in question is created in the main game loop.

Concurrent HashMap iterator:How safe is it for Threading?

I used a ConcurrentHashMap for creating a matrix. Its indices range up to 100k. I have created 40 threads. Each thread accesses elements of the matrix, modifies them, and writes them back, like this:
ConcurrentHashMap<Integer, ArrayList<Double>> matrix =
    new ConcurrentHashMap<Integer, ArrayList<Double>>(25);
for (Entry<Integer, ArrayList<Double>> entry : matrix.entrySet())
    upDateEntriesOfValue(entry.getValue());
I did not find this thread-safe: values frequently come back as null and my program crashes. Is there another way to make it thread-safe, or is this already thread-safe and the bug is somewhere else? One thing: my program does not crash in single-threaded mode.
The iterator is indeed thread-safe for the ConcurrentHashMap.
But what is not thread-safe in your code is the ArrayList<Double> you update! Your problems most likely come from that data structure.
You may want to use a concurrent data structure adapted to you needs.
Using a map for a matrix is really inefficient, and the way you have used it won't even support sparse matrices particularly well.
I suggest you use a double[][] where you lock each row (or column, if that is better). If the matrix is small enough you may be better off using only one CPU, as this can save you quite a bit of overhead.
I would suggest you create no more threads than you have cores. For CPU-intensive tasks, using more threads can be slower, not faster.
Matrix is 100k*50 at max
EDIT: Depending on the operation you are performing, I would ensure you have the shorter dimension first, so you can process each long dimension in a different thread efficiently.
e.g. (assuming an executorService has already been created):
double[][] matrix = new double[50][100 * 1000];
for (int i = 0; i < matrix.length; i++) {
    final double[] line = matrix[i];
    executorService.submit(new Runnable() {
        public void run() {
            synchronized (line) {
                processOneLine(line);
            }
        }
    });
}
This allows all your threads to run concurrently because they don't share any data structures. They can also access each double efficiently, because the values are contiguous in memory and stored as compactly as possible: 100K doubles use about 800KB, whereas a List<Double> uses about 2800KB, and each boxed value can be placed anywhere in memory, which makes your cache work much harder.
Thanks, but in fact I have 80 cores in total.
To use 80 cores efficiently, you might want to break the longer lines in two or four so you can keep all the cores busy, or find a way to perform more than one operation at a time.
The ConcurrentHashMap will be thread-safe for accesses into the map, but the lists served out need to be thread-safe too: if multiple threads can operate on the same List instances concurrently, use a thread-safe list when modifying them.
In your case, operations on the ConcurrentHashMap itself are thread-safe, but once a thread reaches the ArrayList, that is not synchronized, so multiple threads can access it simultaneously, which makes it non-thread-safe. You can either use a synchronized block wherever you modify the list, or wrap the list itself.
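A sketch of that suggestion, wrapping each value with Collections.synchronizedList (the single-row map here is a simplified stand-in for the asker's 100k-row matrix):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class SafeMatrixDemo {
    public static void main(String[] args) throws InterruptedException {
        // The map itself is concurrent, and each value is wrapped so that
        // the per-row lists are also safe to mutate from several threads.
        ConcurrentHashMap<Integer, List<Double>> matrix = new ConcurrentHashMap<>();
        matrix.put(0, Collections.synchronizedList(new ArrayList<>()));

        Runnable writer = () -> {
            for (int i = 0; i < 1000; i++) matrix.get(0).add(1.0);
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(matrix.get(0).size()); // prints 2000, never less
    }
}
```

With a plain ArrayList, the two writers would race inside add() and the final size would be unpredictable (or the program might crash).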