Java Concurrency: should I synchronize all List and Maps? - java

So I have a SomeTask class which extends Thread, and it has Map and List fields. What would be the behavior when you don't do Collections.synchronizedXXX and you have multiple thread of SomeTask running?
Once a Map is called from the database (I am using Object Database to directly store POJO), would I need to synchronized the Map object returned from this database as well?
Map SomeTasksOwnMap = Collections.synchronizedMap(MapReturnedFromDatabase);

Collections.synchronizedXXX is required when 2 or more Threads are accessing the same Map/List.
If your task doesn't access other tasks Map/List, then there is no need to synchronize them.
Example.
Task 1 builds a list of numbers divisible exactly by 2.
Task 2 builds a list of numbers divisible exactly by 3.
These two tasks have individual lists that do not require synchronization.
Example require synchronization.
Task 1 and 2 both calculate numbers and store them in a shared list.
To answer the questions: "What would be the behavior when you don't", you could lose one of the writes if it was timed that both threads wanted to write to index 'x'.
You may also have a null value in the list as the size of the array was increased before the write to the location was done.
Basically you would have an inconsistent view.

No. There is nothing in your question that suggests synchronization is required, because as far as I can tell each thread reads only data within itself: You only need synchronization when threads access data in other threads.
As an aside, having SomeTask extends Thread is a poor design - it should extends Runnable, then use new Thread(new SomeTask()).start().

... should I synchronize all List and Maps?
No you shouldn't. Synchronizing things that don't need it is a waste of resources. And for things that do need synchronization, you need to do it the right way. (And the synchronizedXxx wrappers are not always the right way.)
First, you need to identify the data structures that are going to be visible to multiple threads. Data structures that are provably thread confined don't need synchronizing at all.
Second, you need to examine the way that the data structures are used to see if a synchronizedXxx wrapper is sufficient. For instance, these wrappers don't synchronize iteration, and you can get into trouble if one thread changes a collection while another one is iterating it.
Finally, you need to think about whether the synchronized data structures are heavily used by different threads. The synchronzedXxx wrappers can result in a performance bottleneck if the data structure is heavily used. If this is the case, you should consider using one of the ConcurrentYyyy classes instead.

Related

Java making a copy of a list thread safe?

I have multiple threads running in a Java application, and all of them need to access the same list. Only one thread, however, actually needs to insert/delete/change the list, and the others just need to access it.
In the other threads, I want to make a copy of the list for me to use whenever I need to read it, but is there any way to do this in a thread safe way? If I had a method to copy the list element by element and the list changed, that would mess it up wouldn't it?
EDIT:
The list will not be deleted from very often, so would it work if I just copied it normally and caught exceptions? If the list grew in the middle of the copy and I missed it, it wouldn't really make a difference to functionality
You can use CopyOnWriteArrayList for your purpose.
CopyOnWriteArrayList is a concurrent Collection class introduced in Java 5 Concurrency API along with its popular cousin ConcurrentHashMap in Java.
As name suggest CopyOnWriteArrayList creates copy of underlying
ArrayList with every mutation operation e.g. add or set. Normally
CopyOnWriteArrayList is very expensive because it involves costly
Array copy with every write operation but its very efficient if you
have a List where Iteration outnumber mutation e.g. you mostly need to
iterate the ArrayList and don't modify it too often.
With this collection, you shouldn't create a new instance every time. You should have only one object of this and it will work.
Hmm, so I think that what are you looking for is called CopyOnWriteArrayList.
CopyOnWriteArrayList - A thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.
Ref: CopyOnWriteArrayList
You can use CopyOnWriteArrayList which is thread safe ,but it create new one on every update process.
Or you can use readWriteLock so when update use no one can read while multiple thread can read simultaneously .
I decided to solve this by having a separate thread that handles the thread, with BlockingQueue for the other threads to submit to if they want to write to the list, or get from the list. If they wanted to write to the list it would submit an object with the content that they wanted to write, and if they wanted to read, it would submit a Future that the thread with the list would populate
Depending on your particular usage, you might benefit from one of these:
If you don't really need random access, use ConcurrentLinkedQueue. No explicit synchronization required.
If you don't need random access for writes but only need it for reads, use ConcurrentLinkedQueue for writes and copy it to a list from time to time if changes were made to the queue (in a separate thread), give this list to "readers". Does not require explicit synchronization; gives a "weakly consistent" read view.
Since your writes come from one thread, the previous could work with 2 lists (e.g. the writing thread will copy it to the "reading view" from time to time). However, be aware that if you use an ArrayList implementation and require random access for writes then you are looking at constant copies of memory regions, not good even in the absence of excessive synchronization. This option requires synchronization for the duration of copying.
Use a map instead, ConcurrentHashMap if you don't care about ordering and want O(1) performance or ConcurrentSkipListMap if you do need ordering and are ok with O(logN) performance. No explicit synchronization required.
Use Collections.synchronizedList().
Example :
Collections.synchronizedList(new ArrayList<YourClassNameHere>())

Efficient multithreaded array building in Java

I have many threads adding result-like objects to an array, and would like to improve the performance of this area by removing synchronization.
To do this, I would like for each thread to instead post their results to a ThreadLocal array - then once processing is complete, I can combine the arrays for the following phase. Unfortunately, for this purpose ThreadLocal has a glaring issue: I cannot combine the collections at the end, as no thread has access the collection of another.
I can work around this by additionally adding each ThreadLocal array to a list next to the ThreadLocal as they are created, so I have all the lists available later on (this will require synchronization but only needs to happen once for each thread), however in order to avoid a memory leak I will have to somehow get all the threads to return at the end to clean up their ThreadLocal cache... I would much rather the simple process of adding a result be transparent, and not require any follow up work beyond simply adding the result.
Is there a programming pattern or existing ThreadLocal-like object which can solve this issue?
You're right, ThreadLocal objects are designed to be only accessible to the current thread. If you want to communicate across threads you cannot use ThreadLocal and should use a thread-safe data structure instead, such as ConcurrentHashMap or ConcurrentLinkedQueue.
For the use case you're describing it would be easy enough to share a ConcurrentLinkedQueue between your threads and have them all write to the queue as needed. Once they're all done (Thread.join() will wait for them to finish) you can read the queue into whatever other data structure you need.

Is there a Java data structure that is thread-safe for parallel threads writing to different parts of an array of fixed size?

This is what I'm trying to implement:
A (singleton) array of fixed size (say 1000 elements)
A pool of threads writing smaller (<=100) element blocks to that array in parallel
We are guaranteed that total writes by all threads in the pool will write <1000 elements, so we never have to grow the array.
The order of writes doesn't matter but they have to be contiguous, e.g Thread1 populates array indexes 0-49, Thread 3 indexes 50-149, Thread 2 indexes 149-200
Is there a thread-safe data structure to achieve this?
Clearly, I would need to synchronize the "index manager" which allocates where in the array indexes a given thread needs to write. But is there a Java data structure for the array itself that can be used for this, without worrying about thread safety?
You should be able to use an AtomicReferenceArray. You can safely update indexes or atomically update with compareAndSet (though it appears you wont need that).
Editing to address akhil_mittal's question.
Let's switch the train of thought from updating an array to updating individual fields. If you were to update a field in a class the write will occur without word tearing, it won't be the case that the write will be some bits from one thread and some bits from another thread. The same is true for array indexes.
However, if you were to update a field in a class by multiple threads, the write from one thread may not be immediately visible to another thread. That is because the write may be buffered on a processor cache and eventually flushed to the other processors. The same is true for an array write to a particular index. It will be eventually visible but does not guarantee a happens-before ordering.
do we still need to concern about thread safety
You would need to worry about thread-safety the same way you would need to worry about thread-safety for a non-volatile field. It turns out that DVK may not need to worry about the writes being immediately visible.
The point of this answer is to explain that array writes are not necessarily thread-safe and using an AtomicReferenceArray can protect you from delayed writes.
Your question has been answered already by others so I'll just add examples:
Adding to an array by different threads is the way parallel sort works.
Creating arrays with the Fork/Join framework does so by the work-threads writing to different parts of the array.
Go ahead and do it, you're fine.

Accessing one array with multiple threads but either only reading or only writing

I'm wondering if there could be any problems while accessing one array with multiple threads but either only reading or only writing.
When the threads write to the array it wouldn't matter in which order they write and even if they write to the same entry all threads would write the same value.
For example, if I want to find prime numbers via the Sieve of Eratosthenes:
I create an array of consecutive numbers and set all multiples of prime numbers to 0 using multiple threads.
It wouldn't matter if the thread which strikes off the multiples of two and the thread which strikes off the multiples of 5 set the entry of the number 20 to 0 at the same time or one before or after the other.
So it's not an question of the qualitiy or consistency of the data, but of the technical possibility to do it wihout facing any java errors.
I'm assuming you mean 'without synchronization controls'. The short answer is no.
Synchronization is used for 2 reasons:
Mutual exclusion of data
communication between threads
Your setup indicates that the first reason isn't really a problem in your case. The algorithm effectively separates the data out so that multiple worker threads won't be using the same data.
However, in order for changes done in one thread to become visible to another thread, you must use synchronization. Without synchronization, the JVM makes no guarantee as to the ordering of writes. Updates that one thread makes may be visible in another thread at any time later, or even never. See Effective Java Item #66, and maybe look at the Java Concurrency in Practice book.
I don't think it would work since eventually you need to read the variables (to output them, save to disk, etc.). And the read has to be synchronized in order to guarantee correct interthread operation ordering. Remember that without synchronization java only guarantees intrathread operation ordering.
Now, you can say that you don't want to read them at all in anyway, but if that is the case, java can just optimize throwing away the whole code.

Java: Large collection and concurrent threads

I am facing this issue:
I have lots of threads (1024) who access one large collection - Vector.
Question:
is it possible to do something about it which would allow me to do concurrent actions on it without having to synchronize everything (since that takes time)? What I mean, is something like Mysql database works, you don't have to worry about synchronizing and thread-safe issues. Is there some collection alike that in Java? Thanks
Vector is a very old Java class - predates the Collections API. It synchronizes on every operation, so you're not going to have any luck trying to speed it up.
You should consider reworking your code to use something like ConcurrentHashMap or a LinkedBlockingQueue, which are highly optimized for concurrent access.
Failing that, you mention that you'd like performance and access semantics similar to a database - why not use a dedicated database or a message queue? They are likely to implement it better than you ever will, and it's less code for you to write!
[edit] Given your comment:
all what thread does is adding elements to vector
(only if num of elements in vector = 0) &
removing elements from vector. (if vector size > 0)
it sounds very much like you should be using something much more like a queue than a list! A bounded queue with size 1 will give you these semantics - although I'd question why you can't add elements if there is already something there. When you've got thousands of threads this seems like a very inefficient design.
Well first off, this design doesn't sound right. It sounds like you need to think about using a proper database rather than an simple data structure, even if this means just using something like an in-memory instance of HypersonicDB.
However, if you insist on doing things this way, then the java.util.concurrent package has a number of highly concurrent, non-locking data structures. One of them might suit your purpose (e.g. ConcurrentHashMap, if you can use a Map rather than a List)
Looks like you are implementing the producer consumer pattern, you should google "producer consumer java" or have a look at the BlockingQueue interface
I agree with skaffman about looking at java.util.concurrent.
ConcurrentHashMap is very scalable. However, the size() call on it returns only an approximation. So e.g. your app will occasionally be adding elements to it even if !(num of elements in vector = 0).
If you want to strictly enforce the condition you gave, there is no other way than to synchronize.
Instead of having tons of context switches, I guess you could let your users thread post a callable on a queue and have only one thread dealing with the mutation. This will eliminate the need for synchronization on the collection. The user threads can wait on Future.get().
Just an idea.
If you do not want to change your data structure and have only infrequent writes, you might also use one or many ReentrantReadWriteLock to synchronize access. Then many threads can read at the same time, but when a thread wants to write all reads are blocked until the write is done.
But you should check whether the used data structure is appropriate for the task, or whether another of the many java.util or java.util.concurrent classes is more appropriate. java.util.Vector is synchronized, by the way.

Categories