Under what circumstances do you need to synchronize an array in Java?

Under what circumstances do you need to synchronize an array?
My thoughts are, do you need to synchronize for access? Say two threads access the array at the same time, is that going to crash?
What if one edits, while one is reading? (separate values, and the same in different circumstances)
Both editing different things?
Or does the JVM simply never crash when you access arrays without synchronization?

Under what circumstances do you need to synchronize an array?
It's almost a case of either you always need to or you never need to. As @EJP said, he's never done it because there's almost always a better data structure than a bare array anyway (edit: there are lots of good use cases for arrays, but they're almost always used in isolation, e.g. inside ArrayList). But if you insist on sharing arrays between threads, remember that array elements aren't volatile, so without synchronized (or some other happens-before mechanism) caching can leave threads with stale, inconsistent views of the data.
My thoughts are, do you need to synchronize for access? Say two threads access the array at the same time, is that going to crash?
Crash, no, but your data could be inconsistent, and extra inconsistent if the elements are 64-bit values (long or double) on a 32-bit architecture, where a single unsynchronized write can be torn into two separate 32-bit halves.
What if one edits, while one is reading? (separate values, and the same in different circumstances)
Please don't. Wrapping your head around the Java memory model is hard enough. If you haven't established that a read or a write happened-before another read or write, the ultimate sequencing is undefined.
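To make that concrete, here is a minimal sketch (not from the answer) of guarding a shared array with a common lock: every read and write goes through the same monitor, which establishes the happens-before edges the answer is talking about.

```java
// Minimal sketch: all access to the shared array goes through one monitor,
// so a get() is guaranteed to see the value written by the last set().
public class SyncedArray {
    private final int[] values = new int[16];

    public synchronized void set(int i, int v) {
        values[i] = v;             // write under the lock
    }

    public synchronized int get(int i) {
        return values[i];          // read under the same lock
    }
}
```

If a single lock is too coarse, java.util.concurrent.atomic.AtomicIntegerArray (and its long/reference siblings) gives volatile-like semantics per element.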

This is a difficult question because it touches on a lot of Concurrency topics.
First I'd start with, http://docs.oracle.com/javase/tutorial/essential/concurrency/sync.html
Threads communicate primarily by sharing access to fields and the objects reference fields refer to. This form of communication is extremely efficient, but makes two kinds of errors possible: thread interference and memory consistency errors. The tool needed to prevent these errors is synchronization.
A. Thread Interference describes how errors are introduced when multiple threads access shared data.
B. Memory Consistency Errors describes errors that result from inconsistent views of shared memory.
So to answer the main question directly: you synchronize an array when you believe it may be accessed in a way that introduces thread interference or memory consistency errors.
Otherwise you end up with what's called a race condition. Whether that crashes your application or not depends on your application.
So if you do not synchronize access to an array that is shared between multiple threads, you run the chance of threads interleaving modifications to the array (i.e. thread interference), or of threads reading inconsistent data from it (i.e. memory consistency errors).
The solution is typically to synchronize the array, or to use a Collection built for concurrency, such as those described at https://docs.oracle.com/javase/tutorial/essential/concurrency/collections.html
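As a hedged illustration of thread interference (not from the answer): arr[0]++ is a read-modify-write, not an atomic step, so two unsynchronized threads incrementing the same slot will lose updates.

```java
// Illustrative only: arr[0]++ compiles to read, add, write, so two threads
// can interleave and lose increments. Expected 2000000; a typical run
// prints less. AtomicIntegerArray or a synchronized block fixes it.
public class LostUpdates {
    public static void main(String[] args) throws InterruptedException {
        int[] arr = new int[1];
        Runnable work = () -> {
            for (int i = 0; i < 1_000_000; i++) arr[0]++; // racy read-modify-write
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join();  b.join();
        System.out.println(arr[0]); // often < 2000000
    }
}
```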

Related

Why define the Java memory model?

Java's multithreaded code is ultimately mapped to operating system threads for execution.
Are operating system threads not thread safe?
Why do we need the Java memory model to ensure thread safety? Why define the Java memory model at all?
I hope someone can answer this question. I have looked up a lot of information on the Internet and still do not understand!
The material on the web is all about atomicity, visibility, and ordering, using the cache coherence model as an example, but I don't think it really answers the question.
Thank you very much!
The operating system thread is not thread safe (that statement does not make a lot of sense, but basically, the operating system does not ensure that the intended atomicity of your code is respected).
The problem is that whether two data items are related and therefore need to be synchronized is only really understood by your application.
For example, imagine you are defining a ListOfIntegers class which contains an int array and count of the number of items used in the array. These two data items are related and the way they are updated needs to be co-ordinated in order to ensure that if the object is accessed by two different threads they are always updated in a consistent manner, even if the threads update them simultaneously. Only your application knows how these data items are related. The operating system doesn't know. They are just two pieces of memory as far as it is concerned. That is why you have to implement the thread safety (by using synchronized or carefully arranging how the fields are updated).
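As a rough sketch of the ListOfIntegers example described above (the class shape is assumed for illustration, not code from the answer): the array and the count are related, so both are updated under one lock to keep them consistent.

```java
// Sketch: items and count are related data, so they are only ever updated
// together, under the same monitor. No thread can observe a half-applied
// state where count was incremented but the slot not yet written (or vice versa).
public class ListOfIntegers {
    private final int[] items = new int[1024];
    private int count = 0;   // number of slots in use; related to items

    public synchronized void add(int value) {
        items[count] = value;    // both updates happen atomically together
        count++;
    }

    public synchronized int get(int index) {
        if (index >= count) throw new IndexOutOfBoundsException();
        return items[index];
    }
}
```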
The Java "memory model" is pretty close to the hardware model. There is a stack for primitives and objects are allocated on the heap. Synchronization is provided to allow the programmer to lock access to shared data on the heap. In addition, there are rules that the optimizer must follow so that the opimisations don't defeat the synchronisations put in place.
Every programming language that takes concurrency seriously needs a memory model - and here is why.
The memory model is the crux of the concurrency semantics of shared-memory systems. It defines the possible values that a read operation is allowed to return for any given set of write operations performed by a concurrent program, thereby defining the basic semantics of shared variables. In other words, the memory model specifies the set of allowed outputs of a program's read and write operations, and constrains an implementation to produce only (but at least one) such allowed executions. The memory model may and often does allow executions where the outcome cannot be inferred from the order in which read and write operations occur in the program. It is impossible to meaningfully reason about a program or any part of the programming language implementation without an unambiguous memory model. The memory model defines the possible outcomes of a concurrent program's read and write operations. Conversely, the memory model also defines which instruction reorderings may be permitted, either by the processor, the memory system, or the compiler.
This is an excerpt from the paper Memory Models for C/C++ Programmers which I have co-authored. Even though a large part of it is dedicated to the C++ memory model, it also covers more general areas -
starting with the reason why we need a memory model in the first place, explaining the (intuitive) sequential consistent model, and finally the weaker memory models provided by current hardware in x86 and ARM/POWER CPUs.
The Java Memory Model answers the following question: what should happen when multiple threads modify the same memory location?
And the answer the memory model gives is:
If a program has no data races, then all executions of the program will appear to be sequentially consistent.
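To make the data-race-free guarantee concrete, here is a classic illustrative example (not taken from the cited paper): the program below has a data race, so the JMM permits an outcome that no sequentially consistent interleaving could produce.

```java
// Illustrative data race: under sequential consistency at least one of
// r1, r2 must end up 1, but because x and y are plain (non-volatile) fields
// the JMM also allows r1 == 0 && r2 == 0. Declaring x and y volatile
// removes the race and restores the intuitive outcomes.
public class StoreBufferingDemo {
    static int x, y;        // racy: plain, non-volatile fields
    static int r1, r2;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> { x = 1; r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; r2 = x; });
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println("r1=" + r1 + ", r2=" + r2);
    }
}
```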
There is a great paper by Sarita V. Adve, Hans-J. Boehm about why the Java and C++ Memory Model are designed the way they are: Memory Models: A Case For Rethinking Parallel Languages and Hardware
From the paper:
We have been repeatedly surprised at how difficult it is to formalize the seemingly simple and fundamental property of “what value a read should return in a multithreaded program.”
Memory models, which describe the semantics of shared variables, are crucial to both correct multithreaded applications and the entire underlying implementation stack. It is difficult to teach multithreaded programming without clarity on memory models.
After much prior confusion, major programming languages are converging on a model that guarantees simple interleaving-based semantics for "data-race-free" programs and most hardware vendors have committed to support this model.

Why does sharing a static variable between threads reduce performance?

I asked a question here and someone left a comment saying that the problem is I'm sharing a static variable.
Why is that a problem?
Sharing a static variable in and of itself should have no adverse effect on performance. Global data is common in all programs, starting with the JVM and OS constructs.
Mutable shared data is a different story as the mutation of shared data can lead to both performance issues (cache misses at the very least) and correctness issues which are a pain and are often solved using locks, which lead to potentially other performance issues.
The wiki static variable looks like a pretty substantial part of your program. Not knowing anything about what it's doing or how it's coded, I would guess that it does locking in order to keep a consistent state. If most of your threads spend their time blocked waiting to acquire access to this same object, that would explain why you're not seeing any gain from using multiple threads.
For threads to make a difference to the performance of your program they have to be reasonably independent, and not all locking on the same thing. The more locking they have to do, the less gain you will see. So try to split out the work so as much can be done independently as possible. For instance if there are work items that can be gathered independently, then you might be better off by having multiple threads go find the work items, then feed them to a queue that a dedicated thread can use to pull work items off the queue and feed them to the wiki object.
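A hedged sketch of that queue-based split (class and method names are illustrative, since the original code isn't shown): producers gather work independently, and a single dedicated consumer is the only thread that ever touches the shared object.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class WorkFeeder {
    static final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    public static void main(String[] args) {
        // Producer threads: find work items independently (no shared state).
        for (int i = 0; i < 4; i++) {
            final int id = i;
            new Thread(() -> queue.add("item from producer " + id)).start();
        }
        // Single consumer: the only thread that updates the shared object,
        // so the shared object needs no locking at all.
        new Thread(() -> {
            try {
                while (true) {
                    String item = queue.take();   // blocks until work arrives
                    applyToWiki(item);            // hypothetical update method
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }

    static void applyToWiki(String item) { /* update shared state here */ }
}
```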

Performance hit if each thread has to access identical array

I have a computationally demanding task that I am going to spread out over multiple threads. Each thread will require access to the identical, large String[][]. I am wondering whether the fact that they all need to access the same String[][] on heap will hit performance? Would it be better to make copies of this String[][] for each thread to access individually (even though they all only need to access the identical instance of this String[][])?
Note that, for String[][] someArray = new String[100][1000000]; (for example), it is improbable that the threads will be reading the same someArray[i] at the same time. Generally each thread will be using a different i at any given point; occasionally i will coincide across threads (mostly by chance).
Each thread will be read-only on someArray.
If you are only reading the values then it shouldn't be a problem. It doesn't even matter if they are reading the same 'i'. The problems come when you start writing to shared memory...
EDIT: Removed confusing synchronization comment.
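For completeness, a minimal sketch of the read-only case (assuming the array is fully populated before the worker threads start): Thread.start() establishes a happens-before edge, so the workers are guaranteed to see the finished array without any locking.

```java
public class SharedReadOnlyArray {
    public static void main(String[] args) {
        String[][] someArray = new String[100][10];
        for (int i = 0; i < someArray.length; i++)
            for (int j = 0; j < someArray[i].length; j++)
                someArray[i][j] = i + ":" + j;   // all writes happen here

        for (int t = 0; t < 4; t++) {
            final int id = t;
            new Thread(() -> {
                // Read-only access: safe without locks, because start()
                // happens-after the array was fully built.
                System.out.println(id + " read " + someArray[id][0]);
            }).start();
        }
    }
}
```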

Java: Using ConcurrentHashMap as a lock manager

I'm writing a highly concurrent application, needing access to a large fine-grained set of shared resources. I'm currently writing a global lock manager to organize this. I'm wondering if I can piggyback off the standard ConcurrentHashMap and use that to handle the locking? I'm thinking of a system like the following:
A single global ConcurrentHashMap object contains a mapping between the unique string id of each resource and the unique id of the thread currently using that resource (so the map entry itself acts as the lock protecting the resource)
Tune the concurrency factor to reflect the need for a high level of concurrency
Locks are acquired using the atomic conditional replace(K key, V oldValue, V newValue) method in the hashmap
To prevent lock contention when locking multiple resources, locks must be acquired in alphabetical order
Are there any major issues with the setup? How will the performance be?
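In code, the scheme I have in mind would look roughly like this (a sketch only; identifiers are illustrative, not my actual implementation):

```java
import java.util.concurrent.ConcurrentHashMap;

public class MapLockManager {
    private static final String FREE = "FREE";   // sentinel: nobody holds it
    private final ConcurrentHashMap<String, String> owners = new ConcurrentHashMap<>();

    public void register(String resourceId) {
        owners.putIfAbsent(resourceId, FREE);
    }

    // For multiple resources, call this in alphabetical order of resourceId
    // to avoid deadlock, as described above.
    public void acquire(String resourceId) {
        String me = String.valueOf(Thread.currentThread().getId());
        // Atomic conditional replace: succeeds only if the resource is free.
        while (!owners.replace(resourceId, FREE, me)) {
            Thread.onSpinWait();                 // busy-wait; no sleep/wake-up
        }
    }

    public void release(String resourceId) {
        String me = String.valueOf(Thread.currentThread().getId());
        owners.replace(resourceId, me, FREE);    // only the owner can free it
    }
}
```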
I know this is probably going to be much slower and more memory-heavy than a properly written locking system, but I'd rather not spend days trying to write my own, especially given that I probably won't be able to match Java's professionally-written concurrency code implementing the map.
Also, I've never used ConcurrentHashMap in a high-load situation, so I'm interested in the following:
How well will this scale to large numbers of elements? (I'm looking at ~1,000,000 being a good cap. If I reach beyond that I'd be willing to rewrite this more efficiently)
The documentation states that re-sizing is "relatively" slow. Just how slow is it? I'll probably have to re-size the map once every minute or so. Is this going to be problematic with the size of map I'm looking at?
Edit: Thanks Holger for pointing out that HashMaps shouldn't have that big of an issue with scaling
Also, is there a better/more standard method out there? I can't find any places where a system like this is used, so I'm guessing that either I'm not seeing a major flaw, or there's something else.
Edit:
The application I'm writing is a network service, handling a variable number of requests. I'm using the Grizzly project to balance the requests among multiple threads.
Each request uses a small number of the shared resources (~30), so in general I do not expect a great deal of contention. The requests usually finish working with the resources in under 500ms. Thus, I'd be fine with a bit of blocking/continuous polling, as the requests aren't extremely time-sensitive and contention should be minimal.
In general, seeing that a proper solution would be quite similar to how ConcurrentHashMap works behind the scenes, I'm wondering if I can safely use that as a shortcut instead of writing/debugging/testing my own version.
The re-sizing issue is not relevant as you already told an estimate of the number of elements in your question. So you can give a ConcurrentHashMap an initial capacity large enough to avoid any rehashing.
The performance will not depend on the number of elements, that’s the main goal of hashing, but the number of concurrent threads.
The main problem is that you don’t have a plan of how to handle failed locks. Unless you want to poll until locking succeeds (which is not recommended) you need a way of putting a thread to sleep which implies that the thread currently owning the lock has to wake up a sleeping thread on release if one exists. So you end up requiring conventional Lock features a ConcurrentHashMap does not offer.
Creating a Lock per element (as you said ~1,000,000) would not be a solution.
A solution would look a bit like the ConcurrentHashMap works internally. Given a certain concurrency level, i.e. the number of threads you might have (rounded up), you create that number of Locks (which would be a far smaller number than 1,000,000).
Now you assign each element one of the Locks. A simple assignment would be based on the element’s hashCode, assuming it is stable. Then locking an element means locking the assigned Lock which gives you up to the configured concurrency level if all currently locked elements are mapped to different Locks.
This might imply that threads locking different elements block each other if the elements are mapped to the same Lock, but with a predictable likelihood. You can try fine-tuning the concurrency level (as said, use a number higher than the number of threads) to find the best trade-off.
A big advantage of this approach is that you do not need to maintain a data structure that depends on the number of elements. Afaik, the new parallel ClassLoader uses a similar technique.
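In code, the described striping approach could look roughly like this (a minimal sketch; class and method names are illustrative): a fixed pool of locks sized to the expected concurrency level, with each element mapped to a stripe by its hashCode, so memory use is independent of the number of elements.

```java
import java.util.concurrent.locks.ReentrantLock;

public class StripedLocks {
    private final ReentrantLock[] stripes;

    public StripedLocks(int concurrencyLevel) {
        stripes = new ReentrantLock[concurrencyLevel];
        for (int i = 0; i < stripes.length; i++)
            stripes[i] = new ReentrantLock();
    }

    private ReentrantLock stripeFor(Object key) {
        // Map the element's hash onto a stripe index.
        return stripes[Math.abs(key.hashCode() % stripes.length)];
    }

    public void withLock(Object key, Runnable action) {
        ReentrantLock lock = stripeFor(key);
        lock.lock();                 // blocks (sleeps) instead of polling
        try {
            action.run();
        } finally {
            lock.unlock();
        }
    }
}
```

Two elements mapped to the same stripe will block each other even though they are logically independent, which is exactly the predictable-likelihood trade-off described above.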

Java concurrency - writing to different indexes of the same array

Suppose I have an array of data. Can 2 threads safely write to different indexes of the same array concurrently? I'm concerned about write speed, and I want to synchronize only the 'get index to write at' step, not the actual writing.
I am writing code that lets me assume 2 threads will not get the same index.
For two different indexes in an array the same rules apply as for two separate variables.
The Chapter "Threads and Locks" in the Java Language Specification starts by stating:
17.4.1 Shared Variables
[...]
All instance fields, static fields and array elements are stored in heap memory. In this chapter, we use the term variable to refer to both fields and array elements.
This means that you can safely write to two different indexes concurrently. However you need to synchronize a write/read to the same index if you want to make sure the consumer thread sees the last value written by the producer thread.
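A minimal sketch of that same-index case (illustrative, not from the JLS): a volatile flag orders the plain array write before the reader's access, so the consumer is guaranteed to see the producer's value.

```java
public class SameIndexHandoff {
    static final int[] data = new int[8];
    static volatile boolean published = false;

    public static void main(String[] args) {
        new Thread(() -> {
            data[0] = 42;          // plain write to the shared index
            published = true;      // volatile write: publishes data[0]
        }).start();

        new Thread(() -> {
            while (!published) { } // volatile read: waits for publication
            System.out.println(data[0]); // guaranteed to print 42
        }).start();
    }
}
```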
Modifying two different variables in two different threads is safe. Modifying two different elements in an array can be compared to modifying two different variables at different memory addresses, at least as far as the OS is concerned. So yes, it is safe.
Well, yes, it's technically true, but there are so many caveats that I'm wary of simply telling you yes. While you can write to two different locations in an array, you can't do much else without running into concurrency issues. The real question is: what are you going to do next, assuming you can do this?
If you had counter variables that moved as threads wrote to different locations, you could run into concurrency issues: as your array fills up, two threads could try to write to the same location. If you ever add a reader that could read a location while it's being written to, you'll have concurrency issues there as well. And writing doesn't accomplish anything if you never read the data back, so I think you'd hit concurrency problems as soon as you add the reader (which will have to lock your writers out). Then there's the question: if the threads' write positions never move, what keeps them from overwriting their own data? And if the write positions never move, why use a shared array at all? Just give each thread its own variable or latch to write to, and keep them truly separated.
Without a full picture of your intentions saying "yes" might lead you into peril without thinking about why you are doing what you are doing.
Have a look at CopyOnWriteArrayList, provided you are okay with an ArrayList. From its documentation:
A thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.
This is ordinarily too costly, but may be more efficient than alternatives when traversal operations vastly outnumber mutations, and is useful when you cannot or don't want to synchronize traversals, yet need to preclude interference among concurrent threads.
And
An instance of CopyOnWriteArrayList behaves as a List implementation that allows multiple concurrent reads, and for reads to occur concurrently with a write. The way it does this is to make a brand new copy of the list every time it is altered.
Reads do not block, and effectively pay only the cost of a volatile read;
Writes do not block reads (or vice versa), but only one write can occur at once.
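A short usage sketch (illustrative): iterators over a CopyOnWriteArrayList read a stable snapshot, so reads never block and never throw ConcurrentModificationException, while each write copies the whole backing array.

```java
import java.util.concurrent.CopyOnWriteArrayList;

public class CowalDemo {
    public static void main(String[] args) throws InterruptedException {
        CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<>();
        list.add("a");
        list.add("b");

        Thread reader = new Thread(() -> {
            // The iterator sees the snapshot taken when it was created,
            // even if writers mutate the list concurrently.
            for (String s : list) {
                System.out.println("read: " + s);
            }
        });
        Thread writer = new Thread(() -> list.add("c")); // copies the array

        reader.start();
        writer.start();
        reader.join();
        writer.join();
        System.out.println("final: " + list);
    }
}
```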
