Using Java Concurrent Collections in Scala [duplicate]

I have an Actor that, in its very essence, maintains a list of objects. It has three basic operations: add, update and remove (where sometimes the remove is called from the add method, but that aside), and it works with a single collection. Obviously, that backing list is accessed concurrently, with add and remove calls constantly interleaving.
My first version used a ListBuffer, but I read somewhere that it's not meant for concurrent access. I haven't gotten concurrent access exceptions, but I did notice that finding and removing objects from it does not always work, possibly due to concurrency.
I was halfway through rewriting it to use a var holding an immutable List, but removing items from Scala's default immutable List is a bit of a pain, and I doubt it's suitable for concurrent access anyway.
So, basic question: What collection type should I use in a concurrent access situation, and how is it used?
(Perhaps secondary: is an Actor actually a multithreaded entity, or is that a misconception on my part, and does it actually process messages one at a time on a single thread?)
(Tertiary: In Scala, what collection type is best for inserts and random access (delete / update)?)
Edit: To the kind responders: Excuse my late reply, I'm making a nasty habit out of dumping a question on SO or mailing lists, then moving on to the next problem, forgetting the original one for the moment.

Take a look at the scala.collection.mutable.Synchronized* traits/classes.
The idea is that you mix the Synchronized* traits into regular mutable collections to get synchronized versions of them.
For example:
import scala.collection.mutable._
val syncSet = new HashSet[Int] with SynchronizedSet[Int]
val syncArray = new ArrayBuffer[Int] with SynchronizedBuffer[Int]

You don't need to synchronize the actor's state. The whole point of actors is to avoid tricky, error-prone and hard-to-debug concurrent programming.
The actor model ensures that the actor consumes messages one by one and that you will never have two threads consuming messages for the same actor.

Scala's immutable collections are suitable for concurrent usage, since once constructed they can never be modified.
As for actors, a couple of things are guaranteed, as explained in the Akka documentation:
the actor send rule: the send of a message to an actor happens-before the receive of that message by the same actor.
the actor subsequent processing rule: processing of one message happens-before processing of the next message by the same actor.
You are not guaranteed that the same thread processes the next message, but you are guaranteed that the current message will finish processing before the next one starts, and also that at any given time, only one thread is executing the receive method.
So that takes care of a given Actor's persistent state. With regard to shared data, the best approach as I understand it is to use immutable data structures and lean on the Actor model as much as possible. That is, "do not communicate by sharing memory; share memory by communicating."
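For readers more at home on the Java side, here is a minimal hand-rolled sketch of that philosophy in plain Java: one owner thread holds the mutable list and everyone else only sends it messages through a queue. The class and method names are purely illustrative; this is not Akka's API.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// One "owner" thread holds the mutable list; other threads only send messages.
class ListOwner implements Runnable {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private final List<String> items = new ArrayList<>(); // touched by the owner thread only

    // Callable from any thread: communicate, don't share.
    void add(String item) {
        mailbox.offer(item);
    }

    @Override
    public void run() {
        try {
            while (true) {
                String next = mailbox.take(); // messages are handled one at a time, in order
                items.add(next);              // no lock needed: single writer
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop when interrupted
        }
    }
}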

What collection type should I use in a concurrent access situation, and how is it used?
See @hbatista's answer.
Is an Actor actually a multithreaded entity, or is that just my wrong conception and does it process messages one at a time in a single thread?
The second (though the thread on which messages are processed may change, so don't store anything in thread-local data). That's how the actor can maintain invariants on its state.

Related

Is it a bad practice to use a BlockingObservable in this context?

I have a use case where I'm calling four separate downstream endpoints, and they can all be called in parallel. After every call has completed, I return a container object from the lambda function, whose only purpose is to hold the raw responses from the downstream calls. From there, the container object is transformed into the model required by the consumer.
Here's the structure of the code, roughly speaking:
Observable.zip(o1, o2, o3, o4,
    (resp1, resp2, resp3, resp4) ->
        new RawResponseContainer(resp1, resp2, resp3, resp4)
).toBlocking().first();
Is there a better way to do this? I 100% need every observable to complete; otherwise, the transformation of the consumer model will be incomplete. While I suppose I could transform each individual response from each observable "on the fly", rather than waiting to transform every response at once, I still need every call to finish before the transformation's done.
I've read it's a bad practice to ever use toBlocking() when using rx (aside from 'legacy' apps), so any help's appreciated.
This is not an answer, just a comment:
You are essentially asking a sequential vs. parallel processing question. What you are doing is sequential processing (by blocking); what is recommended is parallel. Which is better depends entirely on the context. In your case you need all the responses, so even in the parallel model all of them have to complete successfully. If even one fails, the entire processing is for naught: in the parallel model, all the work done for the other calls goes to waste; in the sequential model, you simply get an error partway through. If you can live with the latency that sequential processing brings, stay with it. Sequential processing is, in general, the less complicated implementation.
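For completeness, a sketch of the non-blocking variant of the same pipeline: keep the zip, push the transformation downstream, and subscribe instead of blocking. This reuses o1..o4 and RawResponseContainer from the question; toConsumerModel, handleResult and handleFailure are hypothetical placeholders, not from the original code.
// Same zip as before, but the combined result is consumed asynchronously
// instead of blocking the calling thread.
Observable.zip(o1, o2, o3, o4,
        (resp1, resp2, resp3, resp4) ->
            new RawResponseContainer(resp1, resp2, resp3, resp4))
    .map(container -> toConsumerModel(container))   // hypothetical mapper to the consumer model
    .subscribe(
        model -> handleResult(model),   // runs once all four calls have completed
        error -> handleFailure(error)); // runs if any of the calls failed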

Efficient multithreaded array building in Java

I have many threads adding result-like objects to an array, and would like to improve the performance of this area by removing synchronization.
To do this, I would like each thread to instead post its results to a ThreadLocal array; then, once processing is complete, I can combine the arrays for the following phase. Unfortunately, for this purpose ThreadLocal has a glaring issue: I cannot combine the collections at the end, as no thread has access to the collection of another.
I can work around this by additionally adding each ThreadLocal array to a shared list as it is created, so I have all the lists available later on (this requires synchronization, but only once per thread). However, to avoid a memory leak I would have to somehow get all the threads to return at the end and clean up their ThreadLocal cache... I would much rather that the simple process of adding a result be transparent and not require any follow-up work beyond simply adding the result.
Is there a programming pattern or existing ThreadLocal-like object which can solve this issue?
You're right, ThreadLocal objects are designed to be only accessible to the current thread. If you want to communicate across threads you cannot use ThreadLocal and should use a thread-safe data structure instead, such as ConcurrentHashMap or ConcurrentLinkedQueue.
For the use case you're describing it would be easy enough to share a ConcurrentLinkedQueue between your threads and have them all write to the queue as needed. Once they're all done (Thread.join() will wait for them to finish) you can read the queue into whatever other data structure you need.
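A rough sketch of that approach (the worker count and the string results are just placeholders):
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class GatherResults {
    public static void main(String[] args) throws InterruptedException {
        Queue<String> results = new ConcurrentLinkedQueue<>(); // safe for many concurrent writers

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            final int id = i;
            Thread t = new Thread(() -> results.add("result from worker " + id));
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join(); // wait for every worker to finish
        }

        // Single-threaded from here on: combine into whatever structure the next phase needs.
        List<String> combined = new ArrayList<>(results);
        System.out.println(combined.size() + " results collected");
    }
}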

Non blocking strategy for executing a pair of operations atomically in Java

Let's say I have a Set and a Queue. I want to check whether the set contains(element) and, if not, add(element) to the queue. I want to do the two steps atomically.
One obvious way is to use synchronized blocks or Lock.lock()/unlock() methods. Under thread contention these will cause context switches. Is there any simple design strategy for achieving this in a non-blocking manner, maybe using some atomic constructs?
I don't think you can rely on any mechanism, except the ones you pointed out yourself, simply because you're operating on two structures.
There's decent support for concurrent/atomic operations on one data structure (like "put if not exists" in a ConcurrentHashMap), but for a sequence of operations, you're stuck with either a lock or a synchronized block.
For some operations you can employ what is called a "safe sequence", where concurrent operations may overlap without conflicting. For instance, you might be able to add a member to a set (in theory) without the need to synchronize, since two threads simultaneously adding the same member do not conceptually conflict with each other.
But to query one object and then conditionally operate on a different object is a much more complicated scenario. If your sequence was to query the set, then conditionally insert the member into the set and into the queue, the query and first insert could be replaced with a "compare and swap" operation that syncs without stalling (except perhaps at the memory access level), and then one could insert the member into the queue based on the success of the first operation, only needing to synchronize the queue insert itself. However, this sequence leaves the scenario where another thread could fail the insert and still not find the member in the queue.
Since the contention case is the relevant one, you should look at "spin locks". They do not give up the CPU but spin on a flag, expecting the flag to become free very soon.
Note however that real spin locks are seldom useful in Java because the normal Lock is quite good. See this blog where someone had first implemented a spinlock in Java only to find that after some corrections (i.e. after making the test correct) spin locks are on par with the standard stuff.
You can use java.util.concurrent.ConcurrentHashMap to get the semantics you want. It has a putIfAbsent that does an atomic insert. You essentially try to add an element to the map, and if that succeeds, you know the thread that performed the insert is the only one that did, so you can then put the item in the queue safely. The other significant point here is that operations on a ConcurrentMap ensure "happens-before" semantics.
ConcurrentMap<Element, Boolean> set = new ConcurrentHashMap<Element, Boolean>();
Queue<Element> queue = ...;

void maybeAddToQueue(Element e) {
    // putIfAbsent returns null only for the one thread that actually inserted the key
    if (set.putIfAbsent(e, true) == null) {
        queue.offer(e);
    }
}
Note, the actual value type (Boolean) of the map is unimportant here.
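On Java 8 and later, a slightly tidier variant of the same idea uses a concurrent set view instead of a map with dummy values; Set.add then returns true only for the thread that actually inserted the element. Element is carried over from the snippet above, and the queue choice here is just an example:
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

Set<Element> seen = ConcurrentHashMap.newKeySet();           // backed by a ConcurrentHashMap
Queue<Element> queue = new ConcurrentLinkedQueue<Element>();

void maybeAddToQueue(Element e) {
    if (seen.add(e)) {   // atomic: true only for the first thread to add e
        queue.offer(e);
    }
}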

Queue implementation with blocked 'take()' but with eviction policy

Is there a queue implementation that blocks on take() but is bounded by a maximum size? When the queue reaches the given maximum size, instead of blocking put(), it should remove the head element and insert the new one. So put() is never blocked, but take() is.
One use case: if I have a very slow consumer, the system should not crash (run out of memory); instead, old messages should be dropped, and I do not want to block the producer.
An example of this would be a stock trading system. When you get a spike in trade/quote data and you haven't consumed it yet, you want to automatically throw away the old trades/quotes.
There currently isn't a thread-safe queue in Java that will do what you are looking for. However, there is BlockingDeque (a double-ended blocking queue) around which you can write a wrapper, taking from the head and the tail as you see fit.
This interface, like BlockingQueue, is thread-safe.
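A minimal sketch of such a wrapper, assuming LinkedBlockingDeque as the backing deque (the class name EvictingBlockingQueue is made up for illustration):
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

public class EvictingBlockingQueue<E> {
    private final BlockingDeque<E> deque;

    public EvictingBlockingQueue(int capacity) {
        this.deque = new LinkedBlockingDeque<E>(capacity);
    }

    // Never blocks: while the deque is full, drop the oldest element and retry.
    public void put(E e) {
        while (!deque.offerLast(e)) {
            deque.pollFirst(); // evict the head (oldest); the evicted element is simply discarded
        }
    }

    // Blocks until an element is available.
    public E take() throws InterruptedException {
        return deque.takeFirst();
    }
}

Under heavy contention several producers may each evict an element before one of them succeeds, but for the "drop old data" semantics described above that is usually acceptable.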
Several saturation strategies are provided by ThreadPoolExecutor; search for "AbortPolicy" in its javadoc. You can also implement your own policy if you want. Perhaps one of the Discard policies is close to what you want; personally I think CallerRunsPolicy is what you want in most cases.
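To make that concrete, here is a sketch with a bounded work queue and the built-in DiscardOldestPolicy, which silently drops the oldest queued task when the queue fills up. The pool size, queue capacity and the processQuote call are arbitrary placeholders:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

ThreadPoolExecutor consumerPool = new ThreadPoolExecutor(
        1, 1,                                          // a single consumer thread
        0L, TimeUnit.MILLISECONDS,
        new ArrayBlockingQueue<Runnable>(10000),       // bounded work queue
        new ThreadPoolExecutor.DiscardOldestPolicy()); // when full, drop the oldest pending task

// Producers never block; with a slow consumer, old work is discarded instead.
consumerPool.execute(() -> processQuote(quote));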
I think using these is a better solution, but if you absolutely want to implement it at the queue level, I'd probably do it by composition: wrap a LinkedList or something similar and guard it with the synchronized keyword.
EDIT: (some clarifications...)
An "Executor" is basically a thread pool combined with a blocking queue. It is the recommended way to implement a producer/consumer pattern in Java. The authors of these libraries provide several strategies to cope with issues like the one you mention. If you are interested, here is another approach that specifically addresses the OOME issue (the source is framework-specific and can't be used as is).

Java: Large collection and concurrent threads

I am facing this issue:
I have lots of threads (1024) that access one large collection - a Vector.
Question:
Is it possible to do something that would allow me to perform concurrent operations on it without having to synchronize everything (since that takes time)? What I mean is something like the way a MySQL database works: you don't have to worry about synchronization and thread-safety issues. Is there a collection like that in Java? Thanks
Vector is a very old Java class - predates the Collections API. It synchronizes on every operation, so you're not going to have any luck trying to speed it up.
You should consider reworking your code to use something like ConcurrentHashMap or a LinkedBlockingQueue, which are highly optimized for concurrent access.
Failing that, you mention that you'd like performance and access semantics similar to a database - why not use a dedicated database or a message queue? They are likely to implement it better than you ever will, and it's less code for you to write!
[edit] Given your comment:
all what thread does is adding elements to vector
(only if num of elements in vector = 0) &
removing elements from vector. (if vector size > 0)
it sounds very much like you should be using something much more like a queue than a list! A bounded queue with size 1 will give you these semantics - although I'd question why you can't add elements if there is already something there. When you've got thousands of threads this seems like a very inefficient design.
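For reference, a bounded queue of capacity 1 gives exactly those "add only if empty / remove only if non-empty" semantics; a sketch (the element type is a placeholder):
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class SingleSlot {
    private final BlockingQueue<String> slot = new ArrayBlockingQueue<String>(1);

    // "Add only if empty": put() blocks while the single slot is occupied.
    void add(String element) throws InterruptedException {
        slot.put(element);
    }

    // "Remove only if non-empty": take() blocks while the slot is empty.
    String remove() throws InterruptedException {
        return slot.take();
    }
}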
Well, first off, this design doesn't sound right. It sounds like you need to think about using a proper database rather than a simple data structure, even if that just means using something like an in-memory instance of HypersonicDB.
However, if you insist on doing things this way, then the java.util.concurrent package has a number of highly concurrent, non-locking data structures. One of them might suit your purpose (e.g. ConcurrentHashMap, if you can use a Map rather than a List)
Looks like you are implementing the producer-consumer pattern; you should google "producer consumer java" or have a look at the BlockingQueue interface.
I agree with skaffman about looking at java.util.concurrent.
ConcurrentHashMap is very scalable. However, the size() call on it returns only an approximation. So e.g. your app will occasionally be adding elements to it even if !(num of elements in vector = 0).
If you want to strictly enforce the condition you gave, there is no other way than to synchronize.
Instead of having tons of context switches, I guess you could let your user threads post a Callable on a queue and have only one thread deal with the mutation. This eliminates the need for synchronization on the collection. The user threads can wait on Future.get().
Just an idea.
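Sketched with a single-threaded executor (the list, the element type and the return value are placeholders):
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SingleWriter {
    private final List<String> data = new ArrayList<>(); // only ever touched by the mutator thread
    private final ExecutorService mutator = Executors.newSingleThreadExecutor();

    // Any thread may call this; the mutation itself runs on the single mutator thread.
    Future<Integer> add(String element) {
        return mutator.submit(() -> {
            data.add(element);
            return data.size();
        });
    }
}

// Caller side: Future.get() blocks until the mutator thread has applied the change.
// int sizeAfterAdd = writer.add("x").get();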
If you do not want to change your data structure and have only infrequent writes, you might also use one or more ReentrantReadWriteLocks to synchronize access. Then many threads can read at the same time, but when a thread wants to write, all reads are blocked until the write is done.
But you should check whether the used data structure is appropriate for the task, or whether another of the many java.util or java.util.concurrent classes is more appropriate. java.util.Vector is synchronized, by the way.
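A sketch of the read/write-lock approach around a plain list (the class and field names are illustrative):
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedList<E> {
    private final List<E> list = new ArrayList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Many threads may read concurrently.
    E get(int index) {
        lock.readLock().lock();
        try {
            return list.get(index);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers are exclusive: readers wait while a write is in progress.
    void add(E element) {
        lock.writeLock().lock();
        try {
            list.add(element);
        } finally {
            lock.writeLock().unlock();
        }
    }
}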
