Java: Large collection and concurrent threads

I am facing this issue:
I have lots of threads (1024) that access one large collection, a Vector.
Question:
Is it possible to do something that would allow concurrent operations on it without having to synchronize everything (since that takes time)? What I mean is something like how a MySQL database works: you don't have to worry about synchronization and thread-safety issues. Is there a collection like that in Java? Thanks

Vector is a very old Java class - predates the Collections API. It synchronizes on every operation, so you're not going to have any luck trying to speed it up.
You should consider reworking your code to use something like ConcurrentHashMap or a LinkedBlockingQueue, which are highly optimized for concurrent access.
Failing that, you mention that you'd like performance and access semantics similar to a database - why not use a dedicated database or a message queue? They are likely to implement it better than you ever will, and it's less code for you to write!
[edit] Given your comment:
all what thread does is adding elements to vector
(only if num of elements in vector = 0) &
removing elements from vector. (if vector size > 0)
it sounds very much like you should be using something much more like a queue than a list! A bounded queue with size 1 will give you these semantics - although I'd question why you can't add elements if there is already something there. When you've got thousands of threads this seems like a very inefficient design.
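A minimal sketch of the "bounded queue with size 1" idea, assuming the rule really is "add only when empty, remove only when non-empty". The class name SingleSlot and its method names are hypothetical; only ArrayBlockingQueue and its offer/poll methods come from the JDK.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical wrapper, for illustration only.
final class SingleSlot<T> {
    // Capacity 1: the queue is either empty or full, never anything else.
    private final BlockingQueue<T> slot = new ArrayBlockingQueue<>(1);

    /** Adds the item only if the slot is empty; returns false otherwise. */
    boolean tryAdd(T item) {
        return slot.offer(item); // non-blocking, atomic check-and-insert
    }

    /** Removes and returns the item, or null if the slot is empty. */
    T tryRemove() {
        return slot.poll(); // non-blocking, atomic check-and-remove
    }
}
```

Because offer and poll each perform the check and the mutation as one atomic step, callers do not need a separate size() test or their own synchronized block.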

Well, first off, this design doesn't sound right. It sounds like you need to think about using a proper database rather than a simple data structure, even if this means just using something like an in-memory instance of HypersonicDB.
However, if you insist on doing things this way, then the java.util.concurrent package has a number of highly concurrent, non-locking data structures. One of them might suit your purpose (e.g. ConcurrentHashMap, if you can use a Map rather than a List).

It looks like you are implementing the producer-consumer pattern. You should google "producer consumer java" or have a look at the BlockingQueue interface.
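For reference, a producer-consumer pair built on BlockingQueue might look roughly like the sketch below. The class name, the POISON end-of-stream marker and the queue capacity of 16 are assumptions made for the example, not anything from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class ProducerConsumerDemo {
    private static final int POISON = -1; // hypothetical end-of-stream marker

    /** Produces 0..count-1 on one thread and returns what the consumer thread received. */
    static List<Integer> run(int count) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(16);
        List<Integer> received = new ArrayList<>(); // touched only by the consumer

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    queue.put(i); // blocks while the queue is full
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                int item;
                while ((item = queue.take()) != POISON) { // blocks while the queue is empty
                    received.add(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join(); // after join, the consumer's writes are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }
}
```

The queue does all the waiting and signalling, so neither thread needs synchronized, wait or notify.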

I agree with skaffman about looking at java.util.concurrent.
ConcurrentHashMap is very scalable. However, the size() call on it returns only an approximation. So e.g. your app will occasionally be adding elements to it even if !(num of elements in vector = 0).
If you want to strictly enforce the condition you gave, there is no other way than to synchronize.
Instead of having tons of context switches, I guess you could let your user threads post a Callable on a queue and have only one thread dealing with the mutation. This will eliminate the need for synchronization on the collection. The user threads can wait on Future.get().
Just an idea.
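One way to sketch that idea is with a single-thread executor that owns the list. The class SingleWriterList and its methods are hypothetical names; the check-then-act condition from the question runs as one task, so no other task can interleave with it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SingleWriterList {
    // Only the executor's single thread ever touches this list.
    private final List<String> items = new ArrayList<>();
    private final ExecutorService owner = Executors.newSingleThreadExecutor();

    /** Adds the item only if the list is empty; check and add run as one task. */
    Future<Boolean> addIfEmpty(String item) {
        return owner.submit(() -> items.isEmpty() && items.add(item));
    }

    /** Removes and returns the first item, or null if the list is empty. */
    Future<String> removeFirst() {
        return owner.submit(() -> items.isEmpty() ? null : items.remove(0));
    }

    void shutdown() {
        owner.shutdown();
    }

    /** Convenience helper: waits for a result without checked exceptions. */
    static <T> T await(Future<T> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

A user thread would call something like SingleWriterList.await(list.addIfEmpty("x")) and block until the owner thread has processed the request.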

If you do not want to change your data structure and have only infrequent writes, you might also use one or more ReentrantReadWriteLock instances to synchronize access. Then many threads can read at the same time, but when a thread wants to write, all reads are blocked until the write is done.
But you should check whether the used data structure is appropriate for the task, or whether another of the many java.util or java.util.concurrent classes is more appropriate. java.util.Vector is synchronized, by the way.
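As a rough illustration of the read-write lock approach, here is a sketch that guards a plain ArrayList. The class ReadMostlyList is a hypothetical name; it assumes reads greatly outnumber writes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class ReadMostlyList<T> {
    private final List<T> items = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Many threads may hold the read lock at the same time. */
    T get(int index) {
        lock.readLock().lock();
        try {
            return items.get(index);
        } finally {
            lock.readLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return items.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** The write lock is exclusive: it waits for readers and blocks new ones. */
    void add(T item) {
        lock.writeLock().lock();
        try {
            items.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Note that a caller who does size() followed by get() still has a check-then-act gap between the two calls; compound operations need to be exposed as single methods that hold the lock throughout.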

Related

Efficient multithreaded array building in Java

I have many threads adding result-like objects to an array, and would like to improve the performance of this area by removing synchronization.
To do this, I would like each thread to instead post its results to a ThreadLocal array. Then, once processing is complete, I can combine the arrays for the following phase. Unfortunately, for this purpose ThreadLocal has a glaring issue: I cannot combine the collections at the end, as no thread has access to the collection of another.
I can work around this by additionally adding each ThreadLocal array to a list next to the ThreadLocal as they are created, so I have all the lists available later on (this will require synchronization but only needs to happen once for each thread), however in order to avoid a memory leak I will have to somehow get all the threads to return at the end to clean up their ThreadLocal cache... I would much rather the simple process of adding a result be transparent, and not require any follow up work beyond simply adding the result.
Is there a programming pattern or existing ThreadLocal-like object which can solve this issue?
You're right, ThreadLocal objects are designed to be only accessible to the current thread. If you want to communicate across threads you cannot use ThreadLocal and should use a thread-safe data structure instead, such as ConcurrentHashMap or ConcurrentLinkedQueue.
For the use case you're describing it would be easy enough to share a ConcurrentLinkedQueue between your threads and have them all write to the queue as needed. Once they're all done (Thread.join() will wait for them to finish) you can read the queue into whatever other data structure you need.
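A sketch of that approach, assuming the results are integers and the workers are plain threads (the class name ResultCollector and the way results are computed are made up for the example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class ResultCollector {
    /** Starts the workers, waits for them, and returns all results as a list. */
    static List<Integer> collect(int threads, int perThread) {
        Queue<Integer> results = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();

        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    results.add(id * perThread + i); // lock-free, thread-safe append
                }
            });
            workers.add(worker);
            worker.start();
        }

        for (Thread worker : workers) {
            try {
                worker.join(); // wait until this worker has posted everything
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // All writers are done, so a plain copy is safe here.
        return new ArrayList<>(results);
    }
}
```

There is nothing to clean up afterwards: once the queue goes out of scope it is garbage-collected like any other object, which avoids the ThreadLocal leak concern.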

Thread safe queue in Java

I want to implement a queue, that is hit by multiple threads.
This queue is in a singleton class.
Now, a simple solution is to synchronize this? I assume it would need this as standard?
However, I want to prioritize writing to it.
So, write is high priority, read is low priority.
Is this possible?
Ideally writing by multiple threads without synchronizing would be great, if possible.
Why do you want to avoid synchronizing? It's possible to write "lock-free" structures, but it's quite tricky and easy to get wrong.
If I were you, I'd use ArrayBlockingQueue or ConcurrentLinkedQueue (or one of the other structures from java.util.concurrent) and make your life easy!
Oh, and I missed the bit about prioritising writes over reads. You can do that with the ReentrantReadWriteLock class. Then you don't need a thread-safe queue; you just lock externally, taking the read lock or the write lock depending on whether you're reading or writing.

Most efficient way to make a data structure thread-safe (Java)

I have a shared Map data structure that needs to be thread-safe. Is synchronized the most efficient way to read or add to the Map?
Thanks!
Edit: The data structure is a non-updatable cache, i.e. once it fills up it does not update the cache. So lots of writes initially with some reads then it is mostly reads
"Most efficient" is relative, of course, and depends on your specific situation. But consider something like ConcurrentHashMap if you expect there to be many threads working with the map simultaneously; it's thread safe but still allows concurrent access, unlike Hashtable or Collections.synchronizedMap().
That depends on how you use it in the app.
If you're doing lots of reads and writes on it, a ConcurrentHashMap is possibly the best choice. If it's mostly reading, use a common Map wrapped inside a collection that uses a ReadWriteLock
(since writes would not be common, you'd get faster access and would lock exclusively only when writing).
Collections.synchronizedMap() is possibly the worst case, since it just gives you a wrapper with all methods synchronized. Avoid it if you can.
For your specific use case (non-updatable cache), a copy on write map will outperform both a synchronized map and ConcurrentHashMap.
See: https://labs.atlassian.com/wiki/display/CONCURRENT/CopyOnWriteMap as one example (I believe apache also has a copy on write map implementation).
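To show the general copy-on-write idea (this is not the Atlassian class or its API, just a minimal hand-rolled sketch under a hypothetical name), readers can go through a volatile reference to an immutable snapshot while writers replace the snapshot wholesale:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of copy-on-write; not a library class.
final class CopyOnWriteCache<K, V> {
    // Readers see an immutable snapshot through a volatile reference: no lock on reads.
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    V get(K key) {
        return snapshot.get(key);
    }

    /** Writers copy the whole map, modify the copy, then publish it. */
    synchronized void put(K key, V value) {
        Map<K, V> copy = new HashMap<>(snapshot);
        copy.put(key, value);
        snapshot = Collections.unmodifiableMap(copy);
    }
}
```

Every put copies the whole map, so this only pays off when writes are rare or stop entirely after an initial fill, which is the situation described in the question.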
Synchronized methods or collections will certainly work. It's not the most efficient approach, but it is simple to implement and you won't notice the overhead unless you are accessing the structure millions of times per second.
A better idea though might be to use a ConcurrentHashMap - this was designed for concurrency from the start and should perform better in a highly concurrent situation.
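For the non-updatable cache in the question, a ConcurrentHashMap version could be as small as the sketch below. The class name WriteOnceCache is hypothetical; putIfAbsent is the standard ConcurrentMap method, and it is what makes "first write wins" atomic.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class WriteOnceCache<K, V> {
    private final ConcurrentMap<K, V> entries = new ConcurrentHashMap<>();

    /** Stores the value only if the key is absent; returns the value now cached. */
    V putOnce(K key, V value) {
        V existing = entries.putIfAbsent(key, value); // atomic check-and-insert
        return existing != null ? existing : value;
    }

    /** Reads do not block, even while other threads are writing. */
    V get(K key) {
        return entries.get(key);
    }
}
```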

Java avoid race condition WITHOUT synchronized/lock

In order to avoid race condition, we can synchronize the write and access methods on the shared variables, to lock these variables to other threads.
My question is if there are other (better) ways to avoid race condition? Lock make the program slow.
What I found are:
using Atomic classes, if there is only one shared variable.
using an immutable container for multiple shared variables and declaring this container object as volatile. (I found this method in the book "Java Concurrency in Practice")
I'm not sure if they perform faster than the synchronized way. Are there any other, better methods?
thanks
Avoid state.
Make your application as stateless as it is possible.
Each thread (sequence of actions) should take a context in the beginning and use this context passing it from method to method as a parameter.
When this technique does not solve all your problems, use the Event-Driven mechanism (+Messaging Queue).
When your code has to share something with other components it throws event (message) to some kind of bus (topic, queue, whatever).
Components can register listeners to listen for events and react appropriately.
In this case there are no race conditions (except inserting events to the queue). If you are using ready-to-use queue and not coding it yourself it should be efficient enough.
Also, take a look at the Actors model.
Atomics are indeed more efficient than classic locks due to their non-blocking behavior i.e. a thread waiting to access the memory location will not be context switched, which saves a lot of time.
Probably the best guideline when synchronization is needed is to see how you can reduce the critical section size as much as possible. General ideas include:
Use read-write locks instead of full locks when only a part of the threads need to write.
Find ways to restructure code in order to reduce the size of critical sections.
Use atomics when updating a single variable.
Note that some algorithms and data structures that traditionally need locks have lock-free versions (they are more complicated however).
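As an illustration of the "use atomics when updating a single variable" point, here is a small sketch of a shared counter updated by several threads without any lock. The class and method names are made up for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class AtomicCounterDemo {
    /** Increments one shared counter from several threads and returns the final value. */
    static int countWithThreads(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];

        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.incrementAndGet(); // atomic read-modify-write, no lock held
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }
}
```

With a plain int and counter++ the same code would lose updates; with AtomicInteger the result is always threads * incrementsPerThread.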
Well, first off, the Atomic classes still rely on low-level synchronization under the hood (volatile fields plus compare-and-swap operations), so they are not free of coordination cost.
Second, immutability works great for multi-threading: you no longer need monitor locks and such, but that's because you can only read your immutables; you can't modify them.
You can't get rid of synchronized/volatile if you want to avoid race conditions in a multithreaded Java program (i.e. if multiple threads can read AND WRITE the same data). Your best bet, if you want better performance, is to avoid at least some of the built-in thread-safe classes, which do a more generic kind of locking, and make your own implementation that is more closely tied to your context and thus might allow you to use more granular synchronization and lock acquisition.
Check out this implementation of BlockingCache done by the Ehcache guys;
http://www.massapi.com/source/ehcache-2.4.3/src/net/sf/ehcache/constructs/blocking/BlockingCache.java.html
One of the alternatives is to make shared objects immutable. Check out this post for more details.
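A sketch of the immutable-object approach combined with a volatile reference, along the lines of what the question describes. The Range and RangeHolder classes are hypothetical; the point is that two related values are always replaced together, so a reader can never see a half-updated pair.

```java
// Immutable value object: both fields are final and set once in the constructor.
final class Range {
    final int lower;
    final int upper;

    Range(int lower, int upper) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower must not exceed upper");
        }
        this.lower = lower;
        this.upper = upper;
    }
}

final class RangeHolder {
    // volatile guarantees that readers see the most recently published Range.
    private volatile Range current = new Range(0, 0);

    Range get() {
        return current;
    }

    /** Replaces both bounds in one step by publishing a new immutable object. */
    void set(int lower, int upper) {
        current = new Range(lower, upper);
    }
}
```

This covers independent overwrites. If the new value depends on the old one (read, compute, write back), a volatile field alone is not enough and something like AtomicReference.compareAndSet or a lock is still needed.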
You can perform up to 50 million lock/unlocks per second. If you want this to be more efficient, I suggest using more coarse-grained locking, i.e. don't lock every little thing, but have locks for larger objects. Once you have many more locks than threads, you are less likely to have contention, and having more locks may just add overhead.

ConcurrentLinkedQueue Questions

Could you please clarify if we need to use explicit synchronization or locks for using ConcurrentLinkedQueue? I am specifically interested in knowing if sync calls are needed for following ConcurrentLinkedQueue methods.
add
clear
size
Possibly size is the only method which might require explicit sync, since it's not an atomic method, but the ConcurrentLinkedQueue Javadoc says:
"Beware that, unlike in most collections, the size method is NOT a constant-time operation. Because of the asynchronous nature of these queues, determining the current number of elements requires a traversal of the elements."
which makes me believe that, though the size call may be slow, it doesn't require any explicit sync call.
Thanks in advance ...
clear() is not an atomic operation (it is implemented in the AbstractQueue class). As the Javadoc and source say: "This implementation repeatedly invokes poll until it returns null." poll is atomic, but if you call offer while clear() is ongoing, you will add something in the middle of clearing, and clear() will delete it...
If you will use clear() you should use LinkedBlockingQueue instead of ConcurrentLinkedQueue.
You don't need any explicit synchronization or locks. As the docs state, it is a thread-safe collection. This means each of these methods is correctly atomic (though, as you point out, size() may be slow).
You should not and do not need to use explicit locking on any of those methods.
Yeah, you do not need to use explicit synchronization because this is a thread-safe collection. Any concurrent access is allowed without worry.
It is unnecessary to synchronize to preserve the internal structure of the queue. However it may be necessary to linearise other invariants of your structure.
For instance size() is fairly meaningless in any shared mutable container. All it can ever tell you is something about what it was the last time you asked, not what it is now, unless you stop the world and prevent concurrent modification. It is only useful for indicative monitoring purposes, you should never use it in your algorithm.
Similarly, clear() doesn't really mean much without some kind of external intervention. Clear what? The things that are in it at the time you call clear? In a concurrent structure answering that is a difficult if not impossible question.
So, you are better off using it as a simple thread-safe queue (only offering and polling) and steering clear of the others unless you externally lock.
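To make the "only offering and polling" advice concrete, here is a sketch of draining a queue with poll() instead of relying on size() or clear(). The helper class QueueDrain is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class QueueDrain {
    /** Removes elements one at a time until poll() reports the queue is empty. */
    static <T> List<T> drain(Queue<T> queue) {
        List<T> drained = new ArrayList<>();
        T item;
        while ((item = queue.poll()) != null) { // each poll() is atomic on its own
            drained.add(item);
        }
        return drained;
    }
}
```

Unlike clear(), this hands every removed element back to the caller, so nothing offered concurrently is silently discarded: it is either in the returned list or still in the queue. It still gives no guarantee that the queue is empty by the time the method returns.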
