CopyOnWriteArrayList vs AtomicArrays in Java

I know that CopyOnWriteArrayList, rather than locking the list, creates a separate copy of the list for every modification, so multiple threads can access it at the same time while data consistency is still maintained because of the cloning. AtomicArrays, on the other hand, will use a lock, so only a single thread can access one at a time.
So CopyOnWriteArrayList will be helpful if we have more read operations.
Have I got it right so far, and is there any more to it?
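One part of the premise is worth checking against the JDK documentation: the java.util.concurrent.atomic package describes its classes (including AtomicIntegerArray, AtomicLongArray and AtomicReferenceArray) as supporting lock-free thread-safe programming. They update individual elements in place with atomic operations, without copying the array, and any number of threads may read and write at once. The sketch below is only an illustration; the class and method names are made up for this example.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical demo class, not part of any library.
class AtomicArrayDemo {

    // Several threads increment the same slot at once. No array copy is made:
    // each incrementAndGet is a single atomic update of one element.
    static int incrementSharedSlot(int threads, int incrementsPerThread) {
        AtomicIntegerArray counters = new AtomicIntegerArray(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counters.incrementAndGet(0);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counters.get(0);
    }

    public static void main(String[] args) {
        // No increment is lost, even though all threads hit the same element.
        System.out.println(incrementSharedSlot(4, 10_000));
    }
}
```

So the choice is less about "copying vs locking" and more about shape: an atomic array is a fixed-length array with atomic per-element updates, while CopyOnWriteArrayList is a resizable List whose writers copy the backing array and whose readers never block.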

Related

CopyOnWriteArrayList suitable only for iterations and not random access reads

I came across the comment below on this question thread:
Because the CopyOnWriteArrayList is for safe traversals. The cost of using it is duplicating the underlying array of references for each modification and possibly retaining multiple copies for threads iterating over stale versions of the structure. A ReadWriteLock would allow multiple readers and still let the occasional writer perform the necessary modifications
I have just started learning about CopyOnWriteArrayList. Can someone please elaborate on what the above statement means? How does a random-access read, instead of iteration, make the ReadWriteLock a better option?
When you use an iterator to traverse a CopyOnWriteArrayList, you get a snapshot of the list as it was when you called iterator(). Later modifications do not affect your snapshot, so you always loop through the data as it was at the time of that call.
In a random-access loop, each get(i) reads from the current copy of the list. If a modification happens in between, later reads see the modified list, so the loop can observe an inconsistent mix of old and new state (for example, an index that was valid when size() was called may no longer exist). A ReadWriteLock helps here by keeping writers out for the duration of the traversal.
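The difference can be seen even in a single thread. This is a minimal sketch with made-up class and method names; the modifications made "in between" stand in for what a writer thread could do at the same point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class, not part of any library.
class SnapshotIterationDemo {

    // The iterator keeps reading the array that existed when iterator() was called.
    static List<String> iterateAfterModification() {
        CopyOnWriteArrayList<String> list =
                new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c"));
        Iterator<String> snapshot = list.iterator();
        list.add("d");     // each write installs a new backing array
        list.remove("a");
        List<String> seen = new ArrayList<>();
        snapshot.forEachRemaining(seen::add);
        return seen;       // still the original three elements
    }

    // size() and get(i) are separate calls, each against the current array.
    static boolean indexedReadFailsAfterShrink() {
        CopyOnWriteArrayList<String> list =
                new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c"));
        int size = list.size();
        list.remove(size - 1); // stands in for a writer running in between
        try {
            list.get(size - 1);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;       // the index was valid a moment ago, but not now
        }
    }
}
```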

Java making a copy of a list thread safe?

I have multiple threads running in a Java application, and all of them need to access the same list. Only one thread, however, actually needs to insert/delete/change the list, and the others just need to access it.
In the other threads, I want to make a copy of the list for me to use whenever I need to read it, but is there any way to do this in a thread safe way? If I had a method to copy the list element by element and the list changed, that would mess it up wouldn't it?
EDIT:
The list will not be deleted from very often, so would it work if I just copied it normally and caught exceptions? If the list grew in the middle of the copy and I missed it, it wouldn't really make a difference to functionality
You can use CopyOnWriteArrayList for your purpose.
CopyOnWriteArrayList is a concurrent Collection class introduced in the Java 5 Concurrency API along with its popular cousin ConcurrentHashMap.
As the name suggests, CopyOnWriteArrayList creates a copy of the underlying array with every mutation operation, e.g. add or set. Writes to a CopyOnWriteArrayList are therefore expensive, because each one involves a costly array copy, but it is very efficient if you have a List where iteration outnumbers mutation, e.g. you mostly need to iterate the list and don't modify it too often.
With this collection, you shouldn't create a new instance every time. You should have only one object of this and it will work.
Hmm, so I think that what you are looking for is called CopyOnWriteArrayList.
CopyOnWriteArrayList - A thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.
Ref: CopyOnWriteArrayList
You can use CopyOnWriteArrayList, which is thread safe, but it creates a new copy of the underlying array on every update.
Or you can use a ReadWriteLock, so that no one can read while an update is in progress, while multiple threads can still read simultaneously.
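A minimal sketch of the ReadWriteLock option, assuming a plain ArrayList guarded by a ReentrantReadWriteLock. The wrapper class ReadWriteLockedList is hypothetical, not a JDK class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper: many readers may hold the read lock together,
// while a writer needs the write lock exclusively.
class ReadWriteLockedList<E> {
    private final List<E> items = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    void add(E item) {
        lock.writeLock().lock();
        try {
            items.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }

    E get(int index) {
        lock.readLock().lock();
        try {
            return items.get(index);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Copies the list under the read lock, so the copy is internally consistent.
    List<E> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(items);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The snapshot() method is what a reader thread would call to get its own copy; unlike CopyOnWriteArrayList, the copying cost is paid by readers on demand rather than by the writer on every update.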
I decided to solve this by having a separate thread that handles the list, with a BlockingQueue for the other threads to submit to if they want to write to the list or get from the list. If they wanted to write to the list, they would submit an object with the content that they wanted to write, and if they wanted to read, they would submit a Future that the thread with the list would populate.
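A rough sketch of that single-owner-thread design might look like the following. All names are hypothetical, and as a simplification the requests are queued as Runnable commands rather than dedicated request objects; CompletableFuture plays the role of the Future that the owner thread populates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical class: one thread owns the list, everyone else talks to it
// through a queue, so the list itself needs no synchronization.
class ListOwner<E> {
    private final BlockingQueue<Runnable> requests = new LinkedBlockingQueue<>();
    private final List<E> list = new ArrayList<>(); // touched only by the owner thread
    private final Thread owner;

    ListOwner() {
        owner = new Thread(() -> {
            try {
                while (true) {
                    requests.take().run(); // requests are handled one at a time, in order
                }
            } catch (InterruptedException e) {
                // interrupted by shutdown(): stop serving requests
            }
        });
        owner.setDaemon(true);
        owner.start();
    }

    // Enqueues a write; the caller does not wait for it to be applied.
    void write(E item) {
        requests.add(() -> list.add(item));
    }

    // Enqueues a read; the owner thread completes the future with a copy.
    CompletableFuture<List<E>> read() {
        CompletableFuture<List<E>> result = new CompletableFuture<>();
        requests.add(() -> result.complete(new ArrayList<>(list)));
        return result;
    }

    void shutdown() {
        owner.interrupt();
    }
}
```

A caller would use it as owner.write("x") followed by owner.read().join(); because the queue is FIFO, a read submitted after a write from the same thread sees that write.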
Depending on your particular usage, you might benefit from one of these:
If you don't really need random access, use ConcurrentLinkedQueue. No explicit synchronization required.
If you don't need random access for writes but only need it for reads, use ConcurrentLinkedQueue for writes and copy it to a list from time to time if changes were made to the queue (in a separate thread), give this list to "readers". Does not require explicit synchronization; gives a "weakly consistent" read view.
Since your writes come from one thread, the previous could work with 2 lists (e.g. the writing thread will copy it to the "reading view" from time to time). However, be aware that if you use an ArrayList implementation and require random access for writes then you are looking at constant copies of memory regions, not good even in the absence of excessive synchronization. This option requires synchronization for the duration of copying.
Use a map instead, ConcurrentHashMap if you don't care about ordering and want O(1) performance or ConcurrentSkipListMap if you do need ordering and are ok with O(logN) performance. No explicit synchronization required.
Use Collections.synchronizedList().
Example :
Collections.synchronizedList(new ArrayList<YourClassNameHere>())

Efficient multithreaded array building in Java

I have many threads adding result-like objects to an array, and would like to improve the performance of this area by removing synchronization.
To do this, I would like each thread to instead post its results to a ThreadLocal array; then, once processing is complete, I can combine the arrays for the following phase. Unfortunately, for this purpose ThreadLocal has a glaring issue: I cannot combine the collections at the end, as no thread has access to the collection of another.
I can work around this by additionally adding each ThreadLocal array to a list next to the ThreadLocal as they are created, so I have all the lists available later on (this will require synchronization but only needs to happen once for each thread), however in order to avoid a memory leak I will have to somehow get all the threads to return at the end to clean up their ThreadLocal cache... I would much rather the simple process of adding a result be transparent, and not require any follow up work beyond simply adding the result.
Is there a programming pattern or existing ThreadLocal-like object which can solve this issue?
You're right, ThreadLocal objects are designed to be only accessible to the current thread. If you want to communicate across threads you cannot use ThreadLocal and should use a thread-safe data structure instead, such as ConcurrentHashMap or ConcurrentLinkedQueue.
For the use case you're describing it would be easy enough to share a ConcurrentLinkedQueue between your threads and have them all write to the queue as needed. Once they're all done (Thread.join() will wait for them to finish) you can read the queue into whatever other data structure you need.
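As a sketch of that suggestion (class and method names are invented for illustration, and the integer "results" are placeholders for the real result objects):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical demo class, not part of any library.
class SharedQueueCollector {

    static List<Integer> collect(int threads, int resultsPerThread) {
        Queue<Integer> results = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < resultsPerThread; i++) {
                    // Adding a result needs no explicit synchronization.
                    results.add(id * resultsPerThread + i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait until every producer is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        // All writers have finished, so the queue can be drained into a plain list.
        return new ArrayList<>(results);
    }
}
```

The order of elements in the returned list depends on thread scheduling; only the set of results is deterministic.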

Java Concurrency: should I synchronize all List and Maps?

So I have a SomeTask class which extends Thread, and it has Map and List fields. What would be the behavior when you don't use Collections.synchronizedXXX and you have multiple threads of SomeTask running?
Once a Map is retrieved from the database (I am using an object database to store POJOs directly), would I need to synchronize the Map object returned from this database as well?
Map SomeTasksOwnMap = Collections.synchronizedMap(MapReturnedFromDatabase);
Collections.synchronizedXXX is required when 2 or more Threads are accessing the same Map/List.
If your task doesn't access other tasks Map/List, then there is no need to synchronize them.
Example.
Task 1 builds a list of numbers divisible exactly by 2.
Task 2 builds a list of numbers divisible exactly by 3.
These two tasks have individual lists that do not require synchronization.
Example that requires synchronization.
Task 1 and 2 both calculate numbers and store them in a shared list.
To answer the questions: "What would be the behavior when you don't", you could lose one of the writes if it was timed that both threads wanted to write to index 'x'.
You may also have a null value in the list as the size of the array was increased before the write to the location was done.
Basically you would have an inconsistent view.
No. There is nothing in your question that suggests synchronization is required, because as far as I can tell each thread reads only data within itself: You only need synchronization when threads access data in other threads.
As an aside, having SomeTask extend Thread is a poor design - it should implement Runnable, then use new Thread(new SomeTask()).start().
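To make the "each task has its own list" case concrete, here is a small sketch along the lines of the divisible-by-2 and divisible-by-3 example, written with Runnable. The class names and the upper limit are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical task: its list is written only by the thread that runs it,
// so a plain ArrayList is enough.
class MultiplesTask implements Runnable {
    private final int divisor;
    private final int limit;
    private final List<Integer> multiples = new ArrayList<>();

    MultiplesTask(int divisor, int limit) {
        this.divisor = divisor;
        this.limit = limit;
    }

    @Override
    public void run() {
        for (int n = divisor; n <= limit; n += divisor) {
            multiples.add(n);
        }
    }

    // Intended to be read only after the worker thread has been joined.
    List<Integer> result() {
        return multiples;
    }
}

class ThreadConfinementDemo {
    static List<List<Integer>> runBoth() {
        MultiplesTask byTwo = new MultiplesTask(2, 12);
        MultiplesTask byThree = new MultiplesTask(3, 12);
        Thread first = new Thread(byTwo);
        Thread second = new Thread(byThree);
        first.start();
        second.start();
        try {
            first.join();  // join() makes the tasks' writes visible to this thread
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return Arrays.asList(byTwo.result(), byThree.result());
    }
}
```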
... should I synchronize all List and Maps?
No you shouldn't. Synchronizing things that don't need it is a waste of resources. And for things that do need synchronization, you need to do it the right way. (And the synchronizedXxx wrappers are not always the right way.)
First, you need to identify the data structures that are going to be visible to multiple threads. Data structures that are provably thread confined don't need synchronizing at all.
Second, you need to examine the way that the data structures are used to see if a synchronizedXxx wrapper is sufficient. For instance, these wrappers don't synchronize iteration, and you can get into trouble if one thread changes a collection while another one is iterating it.
Finally, you need to think about whether the synchronized data structures are heavily used by different threads. The synchronizedXxx wrappers can result in a performance bottleneck if the data structure is heavily used. If this is the case, you should consider using one of the ConcurrentYyyy classes instead.
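The iteration caveat from the second point can be sketched as follows. The Collections.synchronizedList documentation requires callers to synchronize on the returned list while traversing it; the demo class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of any library.
class SynchronizedIterationDemo {

    // Individual calls such as add() are synchronized by the wrapper,
    // but a traversal spans many calls and must be locked by the caller.
    static long sum(List<Integer> syncList) {
        long total = 0;
        synchronized (syncList) {
            for (int value : syncList) {
                total += value;
            }
        }
        return total;
    }

    static long sumWhileWriting(int count) {
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        Thread writer = new Thread(() -> {
            for (int i = 1; i <= count; i++) {
                shared.add(i);
            }
        });
        writer.start();
        while (writer.isAlive()) {
            sum(shared); // partial sums taken while the writer is still adding
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return sum(shared); // all elements are present once the writer is done
    }
}
```

Without the synchronized block in sum(), the concurrent add() calls could make the iteration fail with a ConcurrentModificationException.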

Java: create a middleman class between multiple threads accessing a non synchronized class?

So taking cues from this question Multithreaded access to file
My scenario is that I have a spreadsheet component in which multiple threads will access and write to each workbook. The component itself is not thread safe, so am I correct in thinking that while one thread is writing to it, the other threads need to be blocked until the first one has finished writing? How would I go about achieving this when I am dealing with a non-thread-safe class? Put the writing method in a synchronized block?
Another concern this raises: what if one thread is busy writing long rows of data to its respective workbook? The other threads would have to stop dead in their tracks until the first one is finished, and this is not desirable.
Instead, I imagine a scenario where the threads run without blocking each other, and the data is written to the spreadsheet by another middleman class which buffers and flushes the data onto the spreadsheet component without causing multiple threads to "wait" until their writing process is complete.
Basically each thread does two things on its own: 1) performs some long-running processing of data from its respective source, and 2) writes the processed data to the spreadsheet. I am seeking a concurrent solution where 1) faces no "waiting" due to 2).
The best solution really depends on the types of operations that you're performing on the spreadsheet. For example, if one thread needs to read the value written by another thread, then it's probably necessary to lock either the whole spreadsheet or at least specific rows at a time. Since the spreadsheet itself isn't thread safe, you're correct that you'll need to do your own synchronization.
If it's important to serialize all access (which hurts performance, as it gets rid of parallelism), consider using a thread-safe queue, where each thread adds an object to the queue that represents the operation it wants to perform. Then you can have a worker thread pull items off of the queue (again, in a thread-safe manner, since the queue is thread-safe) and perform the operation.
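One way to sketch the queue-plus-worker idea is a single-thread ExecutorService, which is a task queue with exactly one worker thread behind it. This is an assumption about how it could be wired up, not the only way: Sheet is a made-up stand-in for the non-thread-safe spreadsheet component, and SheetWriter is the hypothetical middleman.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the non-thread-safe spreadsheet component.
class Sheet {
    private final List<String> rows = new ArrayList<>();
    void appendRow(String row) { rows.add(row); }
    int rowCount() { return rows.size(); }
}

// Hypothetical middleman: only its single worker thread ever touches the sheet.
class SheetWriter {
    private final Sheet sheet = new Sheet();
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    // Returns immediately; the row is written later by the worker thread.
    void submitRow(String row) {
        writer.execute(() -> sheet.appendRow(row));
    }

    // Runs on the worker thread too, after all previously submitted rows.
    int rowCount() {
        return CompletableFuture.supplyAsync(sheet::rowCount, writer).join();
    }

    void close() {
        writer.shutdown();
    }
}

class SheetWriterDemo {
    static int run(int producers, int rowsPerProducer) {
        SheetWriter middleman = new SheetWriter();
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            final int id = p;
            Thread t = new Thread(() -> {
                for (int r = 0; r < rowsPerProducer; r++) {
                    middleman.submitRow("producer " + id + ", row " + r);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        int count = middleman.rowCount();
        middleman.close();
        return count;
    }
}
```

The producer threads never wait for a write to finish; they only pay the cost of enqueueing a task. The trade-off is that the queue is unbounded here, so a slow spreadsheet combined with fast producers would let pending rows pile up in memory.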
There may be room here to parallelize the queue workers, since they can communicate with each other, and do some row-based locking amongst themselves. For example, if the first operation is to read rows 1-4 and write to row 5, and the 2nd operation is to read from rows 6-10 and write to row 11, then these should be able to execute in parallel. But be careful here, since it may depend on the underlying structure of the spreadsheet, which you say isn't thread safe. Reads are probably fine to perform in parallel nonetheless.
While non-trivial, synchronizing access to a queue is the basic readers-writers problem, and while you have to make sure to avoid starvation as well as deadlocks, it's a lot easier to think about than random-access to a spreadsheet.
That said, the best solution would be to use a thread-safe spreadsheet, or only use one thread to ever access it. Why not use a database-backed spreadsheet and then have multiple threads reading/writing the database at once?
