I have many threads adding result-like objects to an array, and would like to improve the performance of this area by removing synchronization.
To do this, I would like each thread to post its results to a ThreadLocal array instead; once processing is complete, I can combine the arrays for the following phase. Unfortunately, ThreadLocal has a glaring issue for this purpose: I cannot combine the collections at the end, because no thread has access to the collection of another.
I can work around this by additionally adding each ThreadLocal array to a shared list as it is created, so that all the lists are available later on (this requires synchronization, but only once per thread). However, to avoid a memory leak I would then have to somehow get all the threads to come back at the end to clean up their ThreadLocal caches... I would much rather keep the simple process of adding a result transparent, with no follow-up work required beyond adding the result.
Is there a programming pattern or existing ThreadLocal-like object which can solve this issue?
You're right, ThreadLocal objects are designed to be only accessible to the current thread. If you want to communicate across threads you cannot use ThreadLocal and should use a thread-safe data structure instead, such as ConcurrentHashMap or ConcurrentLinkedQueue.
For the use case you're describing it would be easy enough to share a ConcurrentLinkedQueue between your threads and have them all write to the queue as needed. Once they're all done (Thread.join() will wait for them to finish) you can read the queue into whatever other data structure you need.
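A minimal sketch of that approach (the String result type and the worker count are just placeholders for whatever your code actually uses):
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class GatherResults {
    public static void main(String[] args) throws InterruptedException {
        // All workers append to the same lock-free queue.
        Queue<String> results = new ConcurrentLinkedQueue<>();

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    results.add("result-" + id + "-" + j); // no explicit synchronization needed
                }
            });
            t.start();
            workers.add(t);
        }

        // Wait for every worker to finish before reading the queue.
        for (Thread t : workers) {
            t.join();
        }

        // Now it is safe to move the results into whatever structure the next phase needs.
        List<String> combined = new ArrayList<>(results);
        System.out.println("collected " + combined.size() + " results");
    }
}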
Related
I have multiple threads running in a Java application, and all of them need to access the same list. Only one thread, however, actually needs to insert/delete/change the list, and the others just need to access it.
In the other threads, I want to make a copy of the list for me to use whenever I need to read it, but is there any way to do this in a thread safe way? If I had a method to copy the list element by element and the list changed, that would mess it up wouldn't it?
EDIT:
The list will not be deleted from very often, so would it work if I just copied it normally and caught exceptions? If the list grew in the middle of the copy and I missed it, it wouldn't really make a difference to functionality
You can use CopyOnWriteArrayList for your purpose.
CopyOnWriteArrayList is a concurrent collection class introduced in the Java 5 concurrency API, along with its popular cousin ConcurrentHashMap.
As the name suggests, CopyOnWriteArrayList creates a copy of the underlying array on every mutation operation, e.g. add or set. That makes it expensive, because every write involves a costly array copy, but it is very efficient when iteration far outnumbers mutation, i.e. you mostly iterate the list and rarely modify it.
With this collection you shouldn't create a new instance every time; keep a single shared instance and have every thread use that one object.
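A minimal sketch of that usage, assuming a single shared instance and String elements as a stand-in for your real type:
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SharedListDemo {
    // One shared instance; the writer and all readers use this same object.
    private static final List<String> shared = new CopyOnWriteArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100; i++) {
                shared.add("item-" + i); // each add copies the backing array
            }
        });

        Thread reader = new Thread(() -> {
            // Iteration works on a snapshot; it never throws ConcurrentModificationException.
            for (String s : shared) {
                System.out.println(s);
            }
        });

        writer.start();
        reader.start();
        writer.join();
        reader.join();
    }
}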
Hmm, so I think that what you are looking for is called CopyOnWriteArrayList.
CopyOnWriteArrayList - A thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.
Ref: CopyOnWriteArrayList
You can use CopyOnWriteArrayList, which is thread safe, but it creates a new copy of the underlying array on every update.
Or you can use a ReadWriteLock, so that while an update is in progress no one can read, but multiple threads can read simultaneously.
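If you go the lock route, a minimal ReentrantReadWriteLock sketch might look like this (the class and method names are made up for illustration):
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadMostlyList<T> {
    private final List<T> list = new ArrayList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Only the writer thread calls this; it blocks all readers for the duration.
    public void add(T item) {
        lock.writeLock().lock();
        try {
            list.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Many reader threads can hold the read lock at the same time.
    public List<T> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(list);
        } finally {
            lock.readLock().unlock();
        }
    }
}
Reads proceed concurrently, while each write briefly excludes everything else.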
I decided to solve this by having a separate thread own the list, with a BlockingQueue that the other threads submit to when they want to write to or read from the list. To write, a thread submits an object carrying the content it wants written; to read, it submits a Future that the thread owning the list then populates.
Depending on your particular usage, you might benefit from one of these:
If you don't really need random access, use ConcurrentLinkedQueue. No explicit synchronization required.
If you don't need random access for writes but only for reads, use ConcurrentLinkedQueue for writes and, whenever the queue has changed, copy it to a list (in a separate thread) and hand that list to the readers. This requires no explicit synchronization and gives a "weakly consistent" read view (see the sketch after this list).
Since your writes come from one thread, the previous option can work with two lists (e.g. the writing thread copies the data to the "reading view" from time to time). Be aware, though, that if you use an ArrayList and need to insert or remove at arbitrary positions, you are looking at constant copies of memory regions, which is not good even in the absence of excessive synchronization. This option requires synchronization for the duration of the copy.
Use a map instead, ConcurrentHashMap if you don't care about ordering and want O(1) performance or ConcurrentSkipListMap if you do need ordering and are ok with O(logN) performance. No explicit synchronization required.
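Here is the sketch referred to in the second option above: a rough, hedged illustration that assumes a single writing thread and publishes immutable snapshots to readers through a volatile field (all names are invented for the example):
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class PublishedReadView {
    private final Queue<String> writes = new ConcurrentLinkedQueue<>();
    // Readers only ever see fully built, immutable snapshots through this volatile field.
    private volatile List<String> readView = Collections.emptyList();

    // Called by the single writing thread.
    public void add(String item) {
        writes.add(item);
    }

    // Called periodically (by the writing thread or a timer) to refresh the read view.
    public void publishSnapshot() {
        readView = Collections.unmodifiableList(new ArrayList<>(writes));
    }

    // Called by any reader thread; no locking, weakly consistent.
    public List<String> read() {
        return readView;
    }
}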
Use Collections.synchronizedList().
Example:
Collections.synchronizedList(new ArrayList<YourClassNameHere>())
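One caveat: the wrapper only synchronizes individual method calls, and the Javadoc requires you to synchronize on the list manually while iterating over it. A minimal sketch:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SynchronizedListDemo {
    public static void main(String[] args) {
        List<String> list = Collections.synchronizedList(new ArrayList<>());
        list.add("a");
        list.add("b");

        // Individual calls like add/get are already synchronized by the wrapper,
        // but iteration spans many calls, so it must be guarded manually.
        synchronized (list) {
            for (String s : list) {
                System.out.println(s);
            }
        }
    }
}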
I have an Actor that - in its very essence - maintains a list of objects. It has three basic operations, an add, update and a remove (where sometimes the remove is called from the add method, but that aside), and works with a single collection. Obviously, that backing list is accessed concurrently, with add and remove calls interleaving each other constantly.
My first version used a ListBuffer, but I read somewhere it's not meant for concurrent access. I haven't gotten concurrent access exceptions, but I did note that finding & removing objects from it does not always work, possibly due to concurrency.
I was halfway rewriting it to use a var List, but removing items from Scala's default immutable List is a bit of a pain - and I doubt it's suitable for concurrent access.
So, basic question: What collection type should I use in a concurrent access situation, and how is it used?
(Perhaps secondary: Is an Actor actually a multithreaded entity, or is that just my wrong conception and does it process messages one at a time in a single thread?)
(Tertiary: In Scala, what collection type is best for inserts and random access (delete / update)?)
Edit: To the kind responders: Excuse my late reply, I'm making a nasty habit out of dumping a question on SO or mailing lists, then moving on to the next problem, forgetting the original one for the moment.
Take a look at the scala.collection.mutable.Synchronized* traits/classes.
The idea is that you mix the Synchronized traits into regular mutable collections to get synchronized versions of them.
For example:
import scala.collection.mutable._
val syncSet = new HashSet[Int] with SynchronizedSet[Int]
val syncArray = new ArrayBuffer[Int] with SynchronizedBuffer[Int]
You don't need to synchronize the actor's state. The whole point of actors is to avoid tricky, error-prone and hard-to-debug concurrent programming.
The actor model ensures that an actor consumes messages one by one and that you will never have two threads consuming messages for the same actor.
Scala's immutable collections are suitable for concurrent usage.
As for actors, a couple of things are guaranteed, as explained in the Akka documentation:
the actor send rule: the send of a message to an actor happens before the receive of that message by the same actor.
the actor subsequent processing rule: processing of one message happens before processing of the next message by the same actor.
You are not guaranteed that the same thread processes the next message, but you are guaranteed that the current message will finish processing before the next one starts, and also that at any given time, only one thread is executing the receive method.
So that takes care of a given Actor's persistent state. With regard to shared data, the best approach as I understand it is to use immutable data structures and lean on the Actor model as much as possible. That is, "do not communicate by sharing memory; share memory by communicating."
What collection type should I use in a concurrent access situation, and how is it used?
See #hbatista's answer.
Is an Actor actually a multithreaded entity, or is that just my wrong conception and does it process messages one at a time in a single thread?
The second (though the thread on which messages are processed may change, so don't store anything in thread-local data). That's how the actor can maintain invariants on its state.
To avoid race conditions, we can synchronize the write and access methods on the shared variables, so that other threads are locked out of them.
My question is whether there are other (better) ways to avoid race conditions, since locking makes the program slow.
What I found are:
using Atomic classes, if there is only one shared variable.
using an immutable container for multiple shared variables and declaring the container field volatile (I found this method in the book "Java Concurrency in Practice").
I'm not sure whether these perform faster than the synchronized approach; are there any other, better methods?
thanks
Avoid state.
Make your application as stateless as possible.
Each thread (sequence of actions) should take a context at the beginning and pass that context from method to method as a parameter.
When this technique does not solve all your problems, use an event-driven mechanism (plus a message queue).
When your code has to share something with other components, it publishes an event (message) to some kind of bus (topic, queue, whatever).
Components can register listeners to listen for events and react appropriately.
In this case there are no race conditions (except for inserting events into the queue). If you use a ready-made queue rather than coding one yourself, it should be efficient enough.
Also, take a look at the Actors model.
Atomics are indeed more efficient than classic locks due to their non-blocking behavior, i.e. a thread contending for the memory location is not context-switched, which saves a lot of time.
Probably the best guideline when synchronization is needed is to see how you can reduce the critical section size as much as possible. General ideas include:
Use read-write locks instead of full locks when only a part of the threads need to write.
Find ways to restructure code in order to reduce the size of critical sections.
Use atomics when updating a single variable (see the sketch after this list).
Note that some algorithms and data structures that traditionally need locks have lock-free versions (they are more complicated however).
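As a hedged illustration of the atomics bullet - and of the immutable-holder idea mentioned in the question - here is a minimal sketch; the Range class and the method names are invented for the example:
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

public class LockFreeUpdates {
    // Single shared counter: no lock, just an atomic increment.
    private final AtomicLong hits = new AtomicLong();

    public void recordHit() {
        hits.incrementAndGet();
    }

    // Several related values grouped in one immutable object,
    // swapped in atomically with a compare-and-set loop.
    private static final class Range {
        final int lower, upper;
        Range(int lower, int upper) { this.lower = lower; this.upper = upper; }
    }

    private final AtomicReference<Range> range = new AtomicReference<>(new Range(0, 0));

    public void setUpper(int newUpper) {
        while (true) {
            Range current = range.get();
            Range updated = new Range(current.lower, newUpper);
            if (range.compareAndSet(current, updated)) {
                return; // only succeeds if no other thread changed it in between
            }
        }
    }
}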
Well, first off, the Atomic classes are not free: they rely on volatile semantics and low-level compare-and-swap operations, so there is still a memory-synchronization cost, much as there is when you coordinate by hand.
Second, immutability works great for multi-threading; you no longer need monitor locks and such, but that's because you can only read your immutable objects, you can't modify them.
You can't get rid of synchronized/volatile if you want to avoid race conditions in a multithreaded Java program (i.e. if multiple threads can read AND WRITE the same data). Your best bet, if you want better performance, is to avoid at least some of the built-in thread-safe classes, which do a rather generic kind of locking, and write your own implementation that is more tied to your context and thus may allow more granular synchronization and lock acquisition.
Check out this implementation of BlockingCache done by the Ehcache guys;
http://www.massapi.com/source/ehcache-2.4.3/src/net/sf/ehcache/constructs/blocking/BlockingCache.java.html
One of the alternatives is to make shared objects immutable. Check out this post for more details.
You can perform up to 50 million lock/unlock operations per second. If you want this to be more efficient, I suggest coarser-grained locking, i.e. don't lock every little thing, but have locks for larger objects. Once you have many more locks than threads, contention is already unlikely, and adding more locks just adds overhead.
I am facing this issue:
I have lots of threads (1024) that all access one large collection - a Vector.
Question:
is it possible to do something that would allow me to perform concurrent operations on it without having to synchronize everything (since that takes time)? What I mean is something like the way a MySQL database works: you don't have to worry about synchronization and thread-safety issues. Is there a collection like that in Java? Thanks
Vector is a very old Java class - it predates the Collections API. It synchronizes on every operation, so you're not going to have any luck trying to speed it up.
You should consider reworking your code to use something like ConcurrentHashMap or a LinkedBlockingQueue, which are highly optimized for concurrent access.
Failing that, you mention that you'd like performance and access semantics similar to a database - why not use a dedicated database or a message queue? They are likely to implement it better than you ever will, and it's less code for you to write!
[edit] Given your comment:
all what thread does is adding elements to vector
(only if num of elements in vector = 0) &
removing elements from vector. (if vector size > 0)
it sounds very much like you should be using something much more like a queue than a list! A bounded queue with size 1 will give you these semantics - although I'd question why you can't add elements if there is already something there. When you've got thousands of threads this seems like a very inefficient design.
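For example, an ArrayBlockingQueue with capacity 1 gives exactly the "add only when empty, remove only when non-empty" behavior, with threads blocking instead of busy-waiting (the String payload is a placeholder):
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SingleSlot {
    public static void main(String[] args) throws InterruptedException {
        // Capacity 1: put() blocks while the slot is full, take() blocks while it is empty.
        BlockingQueue<String> slot = new ArrayBlockingQueue<>(1);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    slot.put("element-" + i); // waits until the queue is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    System.out.println(slot.take()); // waits until an element is available
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
    }
}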
Well, first off, this design doesn't sound right. It sounds like you need to think about using a proper database rather than a simple data structure, even if that just means an in-memory instance of HypersonicDB.
However, if you insist on doing things this way, then the java.util.concurrent package has a number of highly concurrent, non-locking data structures. One of them might suit your purpose (e.g. ConcurrentHashMap, if you can use a Map rather than a List)
Looks like you are implementing the producer-consumer pattern; you should google "producer consumer java" or have a look at the BlockingQueue interface.
I agree with skaffman about looking at java.util.concurrent.
ConcurrentHashMap is very scalable. However, the size() call on it returns only an approximation. So e.g. your app will occasionally be adding elements to it even if !(num of elements in vector = 0).
If you want to strictly enforce the condition you gave, there is no other way than to synchronize.
Instead of having tons of context switches, I guess you could have your user threads post a Callable on a queue and have only one thread dealing with the mutation. This eliminates the need for synchronization on the collection; the user threads can wait on Future.get().
Just an idea.
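A single-threaded executor gives you roughly that pattern out of the box: every task is queued and run by one thread, and callers block on the returned Future when they need the answer. A minimal sketch, with invented names and a String list standing in for your collection:
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SingleMutator {
    private final List<String> data = new ArrayList<>();
    // Only this one thread ever touches 'data', so the list itself needs no synchronization.
    private final ExecutorService mutator = Executors.newSingleThreadExecutor();

    public Future<?> add(String item) {
        return mutator.submit(() -> data.add(item));
    }

    public Future<String> get(int index) {
        return mutator.submit(() -> data.get(index));
    }

    public void shutdown() {
        mutator.shutdown();
    }
}
Callers that need a value simply call Future.get(), which blocks until the single mutator thread has processed the request.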
If you do not want to change your data structure and have only infrequent writes, you might also use one or more ReentrantReadWriteLocks to synchronize access. Then many threads can read at the same time, but when a thread wants to write, all reads are blocked until the write is done.
But you should check whether the used data structure is appropriate for the task, or whether another of the many java.util or java.util.concurrent classes is more appropriate. java.util.Vector is synchronized, by the way.
In my application, I have a key-value map that serves as a central repository for storing data that is used to return to a defined state after a crash or restart (checkpointing).
The application is multithreaded and several threads may put key-value pairs into that map. One thread is responsible for regularly creating a checkpoint, i.e. serializing the map to persistent storage.
While the checkpoint is being written, the map should remain unchanged. It's rather easy to avoid new items being added, but what about other threads changing members of "their" objects inside the map?
I could have a single object whose monitor is seized when checkpointing starts, and wrap all write access to any member of the map (and members thereof) in blocks synchronizing on that object. This seems very error-prone and tedious to me.
I could also make the map private to the checkpointer and only put copies of the submitted objects into it. But then I would have to ensure the copies are deep copies, and the data in the map would no longer be updated automatically: on every change to a submitted object, the submitter would have to re-submit it. This seems like a lot of overhead and is also error-prone, since I would have to remember to put resubmit code in all the right places.
What's an elegant and reliable way to solve this?
what about other threads changing members of "their" objects inside the map
Here you have a problem :) and it cannot be solved by any kind of Map...
One solution would be to allow only immutable objects in your Map, but this may be impossible for you.
Otherwise you have to share a lock with all threads that may change data referenced by your map and block them all during your snapshot; but this is a stop-the-world approach...
pgras is right that immutability would fix things, but that would also be tough. You could just lock the whole thing but that could be a performance problem. I can think of two good ideas.
First is to use a ReadWriteLock (which requires Java 1.5 or newer). Since the checkpointer can acquire one side of the lock while the mutating threads use the other, the checkpoint can be sure things are safe, yet performance should be pretty good while no checkpoint is running. This is still a pretty coarse lock, so you may also want to do #2...
Second is to break things up. Each area of the program could keep its own map (the map for GUI stuff, the map for user settings, the map for hardware settings, whatever). Each one would have a lock on it and things would go on as usual. When it comes time to checkpoint, the checkpointer grabs ALL the locks (so things are consistent) and then does its job. The catch is that you have to define an order in which the locks are grabbed (say, alphabetical), otherwise you'll end up with deadlocks.
If the maps are orthogonal to each other (updates to one don't require updates to another to be consistent) then the easiest thing may be to push the updates to a central "backup" map in the checkpointer, not unlike something you described.
My biggest question to you would be: how much of a problem is this, performance-wise? Are updates very frequent, or are they rare? That would help in advising you, since my last idea (previous paragraph) could be slow, but it's easy and may not matter.
There is a fantastic book called Java Concurrency in Practice which is basically the Java threading bible. It discusses how to figure out this kind of stuff and strategies to avoid problems or make solving them easier. If you are going to be doing more threading, it's a very useful read.
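To make the ReadWriteLock idea above concrete, here is a minimal, hedged sketch. In this arrangement the mutating threads share the lock's read side, so they can keep updating concurrently while no checkpoint is running, and the checkpointer takes the exclusive write side for the duration of serialization; the effect for the checkpoint is a map that is quiescent while it is written out. All names are invented:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CheckpointedStore {
    private final Map<String, String> state = new ConcurrentHashMap<>();
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();

    // Called by the many worker threads; they can run concurrently with each other,
    // but not while a checkpoint holds the exclusive lock.
    public void put(String key, String value) {
        checkpointLock.readLock().lock();
        try {
            state.put(key, value);
        } finally {
            checkpointLock.readLock().unlock();
        }
    }

    // Called by the single checkpointing thread; blocks out all mutation while it runs.
    public void checkpoint() {
        checkpointLock.writeLock().lock();
        try {
            // serialize 'state' to persistent storage here
            System.out.println("checkpointing " + state.size() + " entries");
        } finally {
            checkpointLock.writeLock().unlock();
        }
    }
}
Note that this only freezes puts into the map; if value objects are mutated in place, those mutations would also need to go through a read-lock-guarded method for the checkpoint to see a truly consistent snapshot.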
Actually, if your key-value pairs are orthogonal to each other, then things are really easy. The ConcurrentMap interface (with implementations such as ConcurrentHashMap) solves your problem, since it can apply changes atomically, so readers won't see inconsistent data. But if you have any two (or more) keys that must be updated at the same time, this won't cover you.
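For orthogonal key-value pairs, the atomic per-key methods are what give you that consistency. A tiny sketch using the Java 8 merge method, which ConcurrentHashMap implements atomically (names invented):
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PerKeyUpdates {
    private final ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();

    // Each call updates a single key atomically; readers never see a half-applied change.
    public void increment(String key) {
        counts.merge(key, 1, Integer::sum);
    }

    public int get(String key) {
        return counts.getOrDefault(key, 0);
    }
}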
I hope this helps. Threading access to shared data structures is complex stuff.