Suppose I have an array of data: can two threads safely write to different indexes of the same array concurrently? I'm concerned about write speed, and I want to synchronize only the 'get index to write at' step, not the actual writing.
My code is written so that I can assume two threads will never get the same index.
For two different indexes in an array the same rules apply as for two separate variables.
The chapter "Threads and Locks" in the Java Language Specification starts by stating:
17.4.1 Shared Variables
[...]
All instance fields, static fields and array elements are stored in heap memory. In this chapter, we use the term variable to refer to both fields and array elements.
This means that you can safely write to two different indexes concurrently. However, you need to synchronize a write/read to the same index if you want to make sure the consumer thread sees the last value written by the producer thread.
Modifying two different variables in two different threads is safe. Modifying two different elements in an array can be compared to modifying two different variables at different memory addresses, at least as far as the OS is concerned. So yes, it is safe.
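A minimal sketch of the pattern from the question, assuming int data and a fixed number of worker threads (the class and method names are hypothetical): only the index hand-out goes through an AtomicInteger, the element writes themselves are plain, and Thread.join() is what makes those writes visible to the thread that reads the array afterwards.

```java
import java.util.concurrent.atomic.AtomicInteger;

class DisjointWrites {
    // Hypothetical helper: several threads fill one array, each writing only indexes it claimed.
    static int[] fill(int size, int threadCount) {
        int[] data = new int[size];
        AtomicInteger nextIndex = new AtomicInteger();   // the only synchronized, shared state
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            workers[t] = new Thread(() -> {
                int i;
                // getAndIncrement hands out each index exactly once across all threads
                while ((i = nextIndex.getAndIncrement()) < size) {
                    data[i] = i * 2;                      // plain write: no other thread writes index i
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();                                 // join() happens-before the caller's reads
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return data;
    }
}
```

Reading the array from another thread while the workers are still running would need its own synchronization; the join() only covers reads made after it returns.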
Well, yes, it's technically true, but there are so many caveats to this answer that it makes me very worried to tell you yes. While you can write to two different locations in an array, you can't do much else without running into concurrency issues. The real question is: what are you going to do next, once you can do this?
If you had counter variables that moved as threads wrote to different locations, you could run into concurrency issues. As your array fills up, two threads could try to write to the same location. If you had a reader of the array that could read the same location that's being written to, you'd have concurrency issues. Besides, writing doesn't accomplish anything if you never plan on reading the data back, so I think you'd have concurrency problems when you add the reader (which will have to lock your writers out). Then there's the question: if you don't move where the threads write to, what keeps them from writing over the data? And if you never move the head of where each thread writes, why are you using an array? Just give them individual Latches or variables of their own to write to, and really keep them separated.
Without a full picture of your intentions saying "yes" might lead you into peril without thinking about why you are doing what you are doing.
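The last suggestion can be sketched like this, assuming each writer produces a single long value (the class name and the values are made up for illustration): every writer gets a slot of its own, and the reader waits on a CountDownLatch before reading anything. The guarantee relied on is that actions before countDown() happen-before a successful return from await().

```java
import java.util.concurrent.CountDownLatch;

class LatchedReader {
    // Hypothetical helper: each writer fills only its own slot; the reader waits for all of them.
    static long sumAfterWriters(int writerCount) {
        long[] slots = new long[writerCount];               // one slot per writer, never shared
        CountDownLatch done = new CountDownLatch(writerCount);
        for (int w = 0; w < writerCount; w++) {
            final int slot = w;
            new Thread(() -> {
                slots[slot] = (slot + 1) * 10L;             // write only to this thread's own slot
                done.countDown();                           // countDown() happens-before await() returns
            }).start();
        }
        try {
            done.await();                                   // reader is kept out until every writer is done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long sum = 0;
        for (long value : slots) {
            sum += value;
        }
        return sum;
    }
}
```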
Have a look at CopyOnWriteArrayList, provided you are okay with an ArrayList. From its documentation:
A thread-safe variant of ArrayList in which all mutative operations
(add, set, and so on) are implemented by making a fresh copy of the
underlying array.
This is ordinarily too costly, but may be more efficient than alternatives
when traversal operations vastly outnumber mutations, and is useful when
you cannot or don't want to synchronize traversals, yet need to preclude
interference among concurrent threads.
And
An instance of CopyOnWriteArrayList behaves as a List implementation that
allows multiple concurrent reads, and for reads to occur concurrently with a
write. The way it does this is to make a brand new copy of the list every time
it is altered.
Reads do not block, and effectively pay only the cost of a volatile read;
Writes do not block reads (or vice versa), but only one write can occur at once.
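A small sketch of the snapshot behaviour described in those quotes (single-threaded so the result is deterministic; the class name is hypothetical, and List.of assumes Java 9+): an iterator keeps walking the array that existed when it was created, so modifying the list mid-iteration neither throws ConcurrentModificationException nor changes what the loop sees.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CowSnapshot {
    // Returns {elements seen by the loop, final list size} when the list is modified mid-iteration.
    static int[] iterateAndAdd() {
        List<Integer> list = new CopyOnWriteArrayList<>(List.of(1, 2, 3));
        int seen = 0;
        for (Integer x : list) {   // the iterator walks the array that existed when the loop began
            list.add(x * 10);      // each add copies the whole array; no ConcurrentModificationException
            seen++;
        }
        return new int[] {seen, list.size()};
    }
}
```

Here the loop sees 3 elements while the list ends up with 6. The same snapshot semantics are what let concurrent readers iterate without locking, at the cost of one full copy per write.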
Related
I am trying to code a processor-intensive task, so I would like to use multithreading and share the calculation between the available processor cores.
Let's say I have thousands of iterations and all iterations have two phases:
1. Some worker threads scan through hundreds of thousands of options, reading data from a shared array (or some other data structure); the data is not modified.
2. One thread collects the results from all the worker threads (while they wait) and modifies the shared array.
The phases run in sequence, so there is no overlap (no concurrent writing and reading of the data). My problem is: how can I be sure that the data (cache) is updated for the worker threads before the next Phase 1 starts?
I am assuming that when people speak about cache or caching in this context, they mean the processor cache (correct me if I'm wrong).
As I understand it, volatile can be used for non-reference types only, and there is no point in using synchronized, because the worker threads would block each other while reading (there can be thousands of reads when processing an option).
What else can I use in this case?
Right now I have a few ideas, but I have no idea how costly they are (most probably very):
create new worker threads for each iteration
in a synchronized block, make a copy of the array (can be up to 195kB in size) for each thread before a new iteration begins
I read about ReentrantReadWriteLock, but I can't understand how it is related to caching. Can acquiring a read lock force the reader's cache to update?
The thing I was searching for was mentioned in the "Java Tutorial on Concurrency"; I just had to look deeper. In this case it was the AtomicIntegerArray class. Unfortunately, it is not efficient enough for my needs. I ran some tests; maybe they are worth sharing.
I approximated the cost of different memory access methods by running them many times, averaging the elapsed times, and breaking everything down to one average read or write.
I used an integer array of 50000 elements and repeated every test method 100 times, then averaged the results. The read tests perform 50000 random(ish) reads. The results show the approximate time of one read/write access. This can't be considered an exact measurement, but I believe it gives a good sense of the time costs of the different access methods. However, on different processors or with different parameters, these results may be completely different due to different cache sizes and clock speeds.
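For reference, a sketch of basic AtomicIntegerArray usage (the helper names are hypothetical; this is not the benchmark code behind the timings that follow). incrementAndGet is an atomic read-modify-write of one element, set() is a volatile write, and lazySet() is a weaker, release-ordered write that other threads may observe later.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

class AtomicArrayDemo {
    // Hypothetical helper: threadCount threads each increment every element once.
    static AtomicIntegerArray incrementAll(int size, int threadCount) {
        AtomicIntegerArray counts = new AtomicIntegerArray(size);
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < size; i++) {
                    counts.incrementAndGet(i);   // atomic read-modify-write of element i
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }

    // set() is a volatile write; lazySet() has release semantics and may become visible to others later.
    static int setThenLazySet(AtomicIntegerArray a) {
        a.set(0, 7);
        a.lazySet(0, 42);
        return a.get(0);   // volatile read; a thread always sees its own most recent write
    }
}
```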
So the results are:
1. Fill time with set is: 15.922673ns
2. Fill time with lazySet is: 4.5303152ns
3. Atomic read time is: 9.146553ns
4. Synchronized read time is: 57.858261399999996ns
5. Single threaded fill time is: 0.2879112ns
6. Single threaded read time is: 0.3152002ns
7. Immutable copy time is: 0.2920892ns
8. Immutable read time is: 0.650578ns
Points 1 and 2 show the write results on an AtomicIntegerArray with sequential writes. I read in an article about the good efficiency of the lazySet() method, so I wanted to test it. It usually outperforms the set() method by about 4 times; however, different array sizes show different results.
Points 3 and 4 show the difference between "atomic" access and synchronized access (a synchronized getter) to one item of the array, via random(ish) reads by four different threads simultaneously. This clearly indicates the benefit of "atomic" access.
Since the first four values looked shockingly high, I really wanted to measure the access times without multithreading, so I got the results in points 5 and 6. I tried to copy and modify the methods from the previous tests to keep the code as close as possible. Of course, there may be optimizations I can't control.
Then, just out of curiosity, I came up with points 7 and 8, which imitate immutable access. Here one thread creates the array (by sequential writes) and passes its reference to another thread, which does the random(ish) read accesses on it.
The results vary heavily if the parameters are changed, such as the size of the array or the number of methods running.
The conclusion:
If an algorithm is extremely memory intensive (lots of reads from the same small array, interrupted by short calculations - which is my case), multithreading can slow down the calculation instead of speeding it up. But if it has many many reads, compared to the size of the array, it may be helpful to use an immutable copy of the array, and use multiple threads.
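A minimal sketch of the "immutable copy, multiple threads" idea, assuming an int[] source and a simple sum as the work (names are hypothetical): the copy is made once before any reader starts and is never written again, so the readers need no locking. Thread.start() happens-before the started thread's actions, which is what makes the copy visible to them.

```java
import java.util.Arrays;

class ImmutableShare {
    // Hypothetical helper: copy once, then let several threads read the copy without locks.
    static long parallelSum(int[] source, int threadCount) {
        final int[] snapshot = Arrays.copyOf(source, source.length);  // never written after this line
        long[] partial = new long[threadCount];                        // one result slot per thread
        int chunk = (snapshot.length + threadCount - 1) / threadCount;
        Thread[] readers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            readers[t] = new Thread(() -> {
                int from = id * chunk;
                int to = Math.min(from + chunk, snapshot.length);
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += snapshot[i];                                // read-only access, no locking
                }
                partial[id] = sum;                                     // each thread writes only its own slot
            });
            readers[t].start();   // start() happens-before the reader's actions, so it sees the copy
        }
        try {
            for (Thread r : readers) {
                r.join();                                              // makes partial[] visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return Arrays.stream(partial).sum();
    }
}
```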
I have a program that calculates Pi from the Chudnovsky formula. It's written in Java and it uses a shared Vector that is used to save intermediate calculations like factorials and powers that include the index of the element.
However, I believe that since it's a synchronized Vector (thread-safe by default), only one thread at a time can read or write to it. So when we have lots of threads, instead of an increasing speedup, we see the computation time level off and stay constant.
Is there anything that I can do to circumvent that? What to do when there are too many threads reading/writing to the same shared memory?
When the access pattern is lots of reads and occasional writes, you can protect an unsynchronized data structure with a ReentrantReadWriteLock. It allows multiple readers, but only a single writer.
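A sketch of this, assuming the shared data is a List&lt;Long&gt; of intermediate results (the class and method names are hypothetical). Per the java.util.concurrent.locks.Lock documentation, lock implementations must provide the same memory synchronization semantics as the built-in monitor lock, so a reader that acquires the read lock sees values appended under the write lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadMostlyCache {
    private final List<Long> values = new ArrayList<>();           // not thread-safe on its own
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Any number of threads may hold the read lock at the same time.
    Long getOrNull(int index) {
        lock.readLock().lock();
        try {
            return index < values.size() ? values.get(index) : null;
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive: no readers and no other writer while it is held.
    void append(long value) {
        lock.writeLock().lock();
        try {
            values.add(value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return values.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```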
Depending on your implementation, you might also benefit from using a ConcurrentHashMap.
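For example, a memoized factorial table could be a ConcurrentHashMap keyed by index (a sketch; the class name and BigInteger values are assumptions, not the original program's code). computeIfAbsent applies the mapping function at most once per key, and plain get() calls generally do not block. The mapping function must not modify the map itself, so the value is computed iteratively rather than through a recursive computeIfAbsent call.

```java
import java.math.BigInteger;
import java.util.concurrent.ConcurrentHashMap;

class FactorialMemo {
    private static final ConcurrentHashMap<Integer, BigInteger> CACHE = new ConcurrentHashMap<>();

    // Computes n! once per n; later calls for the same n return the cached value.
    static BigInteger factorial(int n) {
        return CACHE.computeIfAbsent(n, k -> {
            BigInteger result = BigInteger.ONE;
            for (int i = 2; i <= k; i++) {
                result = result.multiply(BigInteger.valueOf(i));
            }
            return result;   // iterative: the function must not modify CACHE while it runs
        });
    }
}
```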
You might be able to cheat a bit and use either an AtomicIntegerArray or an AtomicReferenceArray of Futures/CompletionStages.
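One way the AtomicReferenceArray-of-futures idea could look, as a hedged sketch (the class name and the IntFunction parameter are hypothetical): the first thread to compare-and-set a CompletableFuture into slot i computes that slot's value, and every other caller simply waits on the same future, so each value is computed once and no global lock is held.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.IntFunction;

class FutureSlots<T> {
    private final AtomicReferenceArray<CompletableFuture<T>> slots;
    private final IntFunction<T> compute;   // hypothetical per-index computation, e.g. a factorial

    FutureSlots(int size, IntFunction<T> compute) {
        this.slots = new AtomicReferenceArray<>(size);
        this.compute = compute;
    }

    // The first thread to install a future for index i computes the value; later callers reuse it.
    T get(int i) {
        CompletableFuture<T> future = slots.get(i);
        if (future == null) {
            CompletableFuture<T> mine = new CompletableFuture<>();
            if (slots.compareAndSet(i, null, mine)) {
                try {
                    mine.complete(compute.apply(i));      // only the CAS winner runs the computation
                } catch (RuntimeException e) {
                    mine.completeExceptionally(e);        // waiting threads see the failure too
                }
                future = mine;
            } else {
                future = slots.get(i);                    // another thread won the race; use its future
            }
        }
        return future.join();                             // waits only until this slot is ready
    }
}
```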
Store the results of each thread in a stack. One thread collects results from every thread and adds them together. Of course the stack should not be empty.
If you want multiple threads to work on factorials why not create a thread or two that produce a list of factorial results. Other threads can just look up results if needed.
Instead of sharing the same memory, you can have multiple threads each keep their own results in a stack. Eventually (or occasionally), add them all up with one thread!
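A sketch of the "per-thread results, combined by one thread" idea, using an ExecutorService and Futures in place of per-thread stacks (a swapped-in mechanism; the sum-of-squares workload and names are hypothetical). Each task accumulates into a local variable, so nothing is shared until the single combining loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartialResults {
    // Hypothetical example: sum of k*k for k = 1..n, split into one range per task.
    static long sumOfSquares(int n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (n + threads - 1) / threads;
            for (int t = 0; t < threads; t++) {
                final long from = (long) t * chunk + 1;
                final long to = Math.min(n, (long) (t + 1) * chunk);
                parts.add(pool.submit(() -> {
                    long local = 0;                       // private to this task: nothing shared
                    for (long k = from; k <= to; k++) {
                        local += k * k;
                    }
                    return local;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();                      // one thread combines the partial results
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```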
If you need high throughput, you can consider using Disruptor and RingBuffer.
At a crude level you can think of a Disruptor as a multicast graph of queues where producers put objects on it that are sent to all the consumers for parallel consumption through separate downstream queues. When you look inside you see that this network of queues is really a single data structure - a ring buffer.
Each producer and consumer has a sequence counter to indicate which slot in the buffer it's currently working on. Each producer/consumer writes its own sequence counter but can read the others' sequence counters.
A few useful links:
https://lmax-exchange.github.io/disruptor
http://martinfowler.com/articles/lmax.html
https://softwareengineering.stackexchange.com/questions/244826/can-someone-explain-in-simple-terms-what-is-the-disruptor-pattern
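This is not the Disruptor library or its API; it is a toy single-producer/single-consumer ring buffer (all names hypothetical) written only to illustrate the sequence-counter idea described above. Each side writes only its own counter and reads the other's: the producer's volatile counter write publishes a filled slot, and the consumer's counter write hands the slot back.

```java
import java.util.OptionalLong;
import java.util.concurrent.atomic.AtomicLong;

class SpscRing {
    private final long[] buffer;
    private final int mask;                                 // capacity is a power of two
    private final AtomicLong writeSeq = new AtomicLong();  // written only by the producer
    private final AtomicLong readSeq = new AtomicLong();   // written only by the consumer

    SpscRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        buffer = new long[capacity];
        mask = capacity - 1;
    }

    // Producer side: reads the consumer's counter so it never overwrites an unread slot.
    boolean offer(long value) {
        long w = writeSeq.get();
        if (w - readSeq.get() == buffer.length) {
            return false;                                   // buffer is full
        }
        buffer[(int) (w & mask)] = value;
        writeSeq.set(w + 1);                                // volatile write publishes the slot
        return true;
    }

    // Consumer side: reads the producer's counter to learn which slots are filled.
    OptionalLong poll() {
        long r = readSeq.get();
        if (r == writeSeq.get()) {
            return OptionalLong.empty();                    // nothing published yet
        }
        long value = buffer[(int) (r & mask)];
        readSeq.set(r + 1);                                 // hands the slot back to the producer
        return OptionalLong.of(value);
    }

    // Demo: one producer thread, with the calling thread as the single consumer.
    static long transfer(int count, int capacity) {
        SpscRing ring = new SpscRing(capacity);
        Thread producer = new Thread(() -> {
            for (long v = 1; v <= count; v++) {
                while (!ring.offer(v)) {
                    Thread.onSpinWait();                    // busy-wait while the buffer is full
                }
            }
        });
        producer.start();
        long sum = 0;
        int received = 0;
        while (received < count) {
            OptionalLong next = ring.poll();
            if (next.isPresent()) {
                sum += next.getAsLong();
                received++;
            } else {
                Thread.onSpinWait();
            }
        }
        return sum;
    }
}
```

It is only safe with exactly one producer thread and one consumer thread; Thread.onSpinWait() assumes Java 9+.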
Under what circumstances do you need to synchronize an array?
My thoughts are, do you need to synchronize for access? Say two threads access the array at the same time, is that going to crash?
What if one edits, while one is reading? (separate values, and the same in different circumstances)
Both editing different things?
Or does the JVM simply not crash on arrays when you don't synchronize?
Under what circumstances do you need to synchronize an array?
It's sort of a case of: you either always need to or never need to. Like @EJP said, he's never done it, because there's almost always a better data structure than an array anyway (edit: there are lots of good use cases for arrays, but they're almost always used in isolation, e.g. inside an ArrayList). But if you insist on sharing arrays between threads: array elements aren't volatile, so because of possible caching you can get inconsistencies and corrupt data without using synchronized.
My thoughts are, do you need to synchronize for access? Say two threads access the array at the same time, is that going to crash?
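Crash, no, but your data could be inconsistent, and even more so if the elements are 64-bit values (long or double) on a 32-bit architecture, where the JLS allows a single non-volatile write to be split into two 32-bit halves.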
What if one edits, while one is reading? (separate values, and the same in different circumstances)
Please don't. Wrapping your head around the Java memory model is hard enough. If you haven't established that a read or a write happened-before another read or write, the ultimate sequencing is undefined.
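To illustrate what "establishing happens-before" can look like for an array without locking it, a sketch (names hypothetical): the writer fills the array with plain writes and then sets a volatile flag; a reader that observes the flag as true is guaranteed to see the earlier element writes. This covers a single hand-off only; writes made after the flag is set are not covered by it.

```java
class VolatilePublish {
    private final int[] data = new int[4];   // plain, non-volatile array elements
    private volatile boolean ready;           // volatile flag guarding the array contents

    void write() {
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;                  // ordinary writes...
        }
        ready = true;                         // ...published by this volatile write
    }

    int readSum() {
        while (!ready) {                      // volatile read; seeing true makes the writes above visible
            Thread.onSpinWait();
        }
        int sum = 0;
        for (int value : data) {
            sum += value;
        }
        return sum;
    }

    // Demo: one thread writes, the calling thread reads.
    static int demo() {
        VolatilePublish shared = new VolatilePublish();
        new Thread(shared::write).start();
        return shared.readSum();
    }
}
```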
This is a difficult question because it touches on a lot of Concurrency topics.
First I'd start with, http://docs.oracle.com/javase/tutorial/essential/concurrency/sync.html
Threads communicate primarily by sharing access to fields and the objects reference fields refer to. This form of communication is extremely efficient, but makes two kinds of errors possible: thread interference and memory consistency errors. The tool needed to prevent these errors is synchronization.
A. Thread Interference describes how errors are introduced when multiple threads access shared data.
B. Memory Consistency Errors describes errors that result from inconsistent views of shared memory.
So, to answer the main question directly: you synchronize an array mainly when you believe it may be accessed in a way that introduces thread interference or memory consistency errors.
You end up with what's called a Race Condition. Whether that crashes your application or not depends on your application.
So if you do not synchronize access to an array that is shared between multiple threads, you run the risk of threads interleaving modifications to the array (i.e. thread interference), or of threads reading inconsistent data from the array (i.e. memory consistency errors).
The solution is typically to synchronize access to the array, or to use a collection built for concurrency, such as those described at https://docs.oracle.com/javase/tutorial/essential/concurrency/collections.html
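A minimal sketch of synchronizing on the array itself (names hypothetical). counters[i]++ is a read, an add and a write; without the lock, two threads can interleave those steps and lose updates, which is exactly the thread interference described above. The join() calls make the final values visible to the caller.

```java
class SynchronizedArray {
    // Hypothetical helper: threads increment every element of one shared array `rounds` times.
    static int[] incrementAll(int size, int threads, int rounds) {
        int[] counters = new int[size];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    for (int i = 0; i < size; i++) {
                        synchronized (counters) {   // makes the read-modify-write of counters[i] atomic
                            counters[i]++;
                        }
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();                           // join() makes the final counts visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counters;
    }
}
```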
I have many byte arrays of size 4096 (16x16x16), and I want to edit them from many threads at the same time. There is a small chance that any element will be written by more than one thread at the same time, and it is almost impossible that more than 3 threads will be accessing any one element at the same time (write or read).
But the whole array can be accessed by many threads at the same time.
Can this cause any problems? If yes, then how to fix/avoid them?
I know that reading should be safe, and I have heard about some problems with writing.
The code needs to be fast (real-time stuff), so I can't synchronize it, and I can't use any ArrayList, because that would cause memory problems. (There will be around 1000-20000 (or even more) arrays like that.)
Every time someone says real time in the same sentence as Java, it piques my interest, because real time has a specific meaning that most people don't understand (Oracle/Sun have a real-time JVM available for purchase).
But I digress: array reads and writes are atomic, and therefore thread-safe (with the exception of non-volatile long and double elements, whose writes the JLS allows to be split into two halves). Two threads writing to the array at the same time cannot leave an element half-written, because a single write cannot be broken down into anything smaller (which would allow the scheduler to interrupt it halfway through). Just be careful: for example, don't read a value, do some math, and then write it back to the array while expecting the value at that index to have stayed the same.
So in short there is nothing stopping you from doing this as long as your logic around it is also thread safe.
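For the "read, do some math, write back" case on a byte[], the JDK has no AtomicByteArray, but (assuming Java 9+) a VarHandle over byte[] elements supports volatile reads and compareAndSet. A sketch with hypothetical names: the retry loop makes the read-modify-write safe without locking the whole array.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class ByteArrayCas {
    // VarHandle over byte[] elements (Java 9+); supports volatile reads and compareAndSet.
    private static final VarHandle BYTES = MethodHandles.arrayElementVarHandle(byte[].class);

    // Atomically adds delta to array[index] using a compare-and-set retry loop.
    static byte addAndGet(byte[] array, int index, byte delta) {
        while (true) {
            byte current = (byte) BYTES.getVolatile(array, index);
            byte next = (byte) (current + delta);          // wraps on overflow, like ordinary byte math
            if (BYTES.compareAndSet(array, index, current, next)) {
                return next;
            }
            // another thread changed the element between the read and the CAS: retry
        }
    }

    // Demo: several threads increment element 0 of the same array.
    static byte concurrentIncrements(int threads, int perThread) {
        byte[] cell = new byte[16];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    addAndGet(cell, 0, (byte) 1);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();                                  // join() makes the final value visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cell[0];
    }
}
```

A failed compareAndSet only costs a retry, so under the low contention described in the question this should rarely loop more than once.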
I have a computationally demanding task that I am going to spread out over multiple threads. Each thread will require access to the identical, large String[][]. I am wondering whether the fact that they all need to access the same String[][] on the heap will hurt performance. Would it be better to make copies of this String[][] for each thread to access individually (even though they all only need to access the identical instance of this String[][])?
Note that, for String[][] someArray = new String[100][1000000]; (for example), it is improbable that at any single point in time they will be calling the same someArray[i] at the same time. Generally each thread will be using a different i at any given point in time. However sometimes i will be the same across threads (mostly by chance).
Each thread will be read-only on someArray.
If you are only reading the values then it shouldn't be a problem. It doesn't even matter if they are reading the same 'i'. The problems come when you start writing to shared memory...
EDIT: Removed confusing synchronization comment.
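A sketch of sharing one read-only String[][] across threads without copies, assuming the table is fully built before any task is submitted (the counting task and names are hypothetical). The java.util.concurrent documentation guarantees that actions before submitting a task to an ExecutorService happen-before the task runs, so every task sees the finished table; overlapping reads of the same row are harmless.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedReadOnly {
    // Hypothetical task: count occurrences of target in one shared String[][], without copying it.
    static long countMatches(String[][] table, String target, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (String[] row : table) {
                tasks.add(() -> {                             // every task reads the same shared instance
                    long n = 0;
                    for (String s : row) {
                        if (target.equals(s)) {
                            n++;
                        }
                    }
                    return n;
                });
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) {   // submission happens-before task execution
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```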