What I need is a fairly complex data structure with the following requirements:
It should support concurrent reads/writes without any excessive locking (like java.util.concurrent.ConcurrentHashMap)
It should have capacity limit and block once the limit is reached (just like BlockingQueue implementations)
It should have an efficient search mechanism, like Map/HashSet do: given the ID of an object, I need to be able to find it without a sequential scan.
It should be possible to evict elements on timeout, for instance: if an entry was put into this structure more than X minutes ago, it should be automatically removed.
Of course, there's always a chance to implement it on my own, but I'd prefer to find something existing, optimized and well-tested.
The closest thing I've found is Guava's cache, but it seems to be missing requirement #2. Any ideas on known implementations of this?
You could write a simple BlockingCache, which wraps an existing Guava Cache and checks capacity on put operations, so the put would look something like this:
public V put(K key, V value) throws InterruptedException {
    // naive busy-wait: poll until the wrapped cache drops below capacity
    while (size() >= capacity) {
        Thread.sleep(100);
    }
    // assumes innerCache exposes a ConcurrentMap-style put that returns the previous value
    return innerCache.put(key, value);
}
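A less busy-wait-ish variant is possible as well. The following is a rough sketch, not a drop-in implementation: the class and method names are made up, and it assumes Guava's CacheBuilder/RemovalListener API. The idea is to tie a Semaphore to the capacity and give permits back from a removal listener, which also covers entries removed by timeout-based expiry.
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalListener;
import com.google.common.cache.RemovalNotification;

import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: block producers with a Semaphore sized to the capacity and
// release a permit whenever Guava removes an entry (explicitly, by replacement, or by expiry).
class BlockingCache<K, V> {

    private final Semaphore permits;
    private final Cache<K, V> innerCache;

    BlockingCache(int capacity, long expireAfterWriteMinutes) {
        this.permits = new Semaphore(capacity);
        this.innerCache = CacheBuilder.newBuilder()
                .expireAfterWrite(expireAfterWriteMinutes, TimeUnit.MINUTES)
                .removalListener(new RemovalListener<K, V>() {
                    @Override
                    public void onRemoval(RemovalNotification<K, V> notification) {
                        permits.release();
                    }
                })
                .build();
    }

    public void put(K key, V value) throws InterruptedException {
        permits.acquire();            // blocks while the cache is at capacity
        innerCache.put(key, value);   // replacing an existing key triggers a removal, giving the permit back
    }

    public V getIfPresent(K key) {
        return innerCache.getIfPresent(key);
    }

    // Guava cleans up expired entries lazily, so permits held by expired entries
    // may only come back after subsequent reads or an explicit cleanUp() call.
    public void cleanUp() {
        innerCache.cleanUp();
    }
}
Calling cleanUp() periodically (or on a timer) keeps the permit count roughly in sync with the live entry count when the cache sits idle.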
I don't know why I can't get my head around this question.
I see examples on the internet where people talk about using striped locking to synchronize a map. The idea is to use multiple locks to allow for more concurrency, while retaining correctness. But is an approach like this actually correct?
I believe that the idea is to have one lock per bucket, the same thing that ConcurrentHashMap does under the hood, but how can one achieve such a feat given that the mapping key -> bucket is an internal Map implementation detail, and it is not actually possible to reproduce it from outside the Map?
Take this example:
public void concurrentMethod(String key) {
    Lock lock = stripedLock.get(key);   // e.g. Guava's Striped<Lock>
    lock.lock();
    try {
        // do work on map.get(key)
    } finally {
        lock.unlock();
    }
}
there's no guarantee that stripedLock.get(key) will return the same lock for two keys that will end up in the same bucket. So, as far as I can tell, this is not correct and the synchronization doesn't actually work.
Is my reasoning wrong here? Can an approach like this lead to correct synchronization?
I wonder whether there is a nicer (or just another) approach to get the count of all items that enter the terminal operation of a stream instead of the following:
Stream<T> stream = ... // given as parameter
AtomicLong count = new AtomicLong();
stream.filter(...).map(...)
.peek(t -> count.incrementAndGet())
where count.get() gives me the actual count of the processed items at that stage.
I deliberately skipped the terminal operation as that might change between .forEach, .reduce or .collect.
I do know .count already, but it seems to work well only if I exchange a .forEach with a .map and use the .count as terminal operation instead. But it seems to me as if .map is then misused.
What I don't really like with the above solution: if a filter is added after it, it just counts the elements at that specific stage, but not the ones that are going into the terminal operation.
The other approach that comes to my mind is to collect the filtered and mapped values into a list, operate on that, and just call list.size() to get the count. However, this will not work if collecting the stream leads to an error, whereas with the above solution I could have a count of all items processed so far, provided an appropriate try/catch is in place. That, however, isn't a hard requirement.
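To make the intended placement concrete, here is a minimal sketch: keep the peek as the last intermediate operation so the counter reflects exactly what reaches the terminal operation. The sample data and the collect() terminal operation are only for demonstration.
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;
import java.util.stream.Stream;

Stream<String> stream = Stream.of("a", "bb", "ccc", "dddd");
AtomicLong count = new AtomicLong();
List<Integer> lengths = stream
        .filter(s -> s.length() > 1)
        .map(String::length)
        .peek(t -> count.incrementAndGet())   // counts exactly the elements reaching collect()
        .collect(Collectors.toList());
// lengths is [2, 3, 4] and count.get() == 3 here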
It seems you already have the cleanest solution, IMO: peek before the terminal operation. The only reason I could think this is needed is for debugging purposes - and if that is the case, then peek was designed exactly for that. Wrapping the Stream for this and providing separate implementations is way too much - besides the huge amount of time and later support for everything that gets added to Streams.
As for the part about what if another filter is added: well, provide a code comment (lots of us do that) and a few test cases that would otherwise fail, for example.
Just my $0.02.
The simplest workable idea is to use an identity mapping and, while doing so, count the invocations of the mapping routine:
stream.map(object -> { counter.incrementAndGet(); return object; });
Since this lambda can be reused, and you can replace any lambda with an object, you can create a counter object like this:
class StreamCounter<T> implements Function<T, T> {
    private int counter = 0;

    @Override
    public T apply(T object) { counter++; return object; }

    public int get() { return counter; }
}
So using:
StreamCounter<String> myCounter = new ...;
stream.map(myCounter)...
int count = myCounter.get();
Since, again, the map invocation is just another point of reuse, the map method can be provided by extending Stream and wrapping the ordinary stream.
This way you can create something like:
AtomicLong myValue = new AtomicLong();
...
convert(stream).measure(myValue).map(...).measure(mySecondValue).filter(...).measure(myThirdValue).toList(...);
This way you can simply have your own Stream wrapper that transparently wraps every stream in its own version (with negligible performance or memory overhead) and measures the cardinality at any such measure point.
This is often done when analyzing the complexity of algorithms or when creating map/reduce solutions. If you extend your stream implementation to take not an AtomicLong instance for counting but only the name of the measure point, the implementation can hold an unlimited number of measure points while providing a flexible way to print a report.
Such an implementation can remember the concrete sequence of stream methods along with the position of each measure point and brings outputs like:
list -> (32k)map -> (32k)filter -> (5k)map -> avg().
Such a stream implementation is written once, can be used for testing but also for reporting.
Built into an everyday implementation, it gives you the possibility to gather statistics for certain processing steps and allows for dynamic optimization by using a different permutation of operations; a query optimizer would be an example of this.
So in your case the best approach would be to reuse a StreamCounter first and, depending on the frequency of use, the number of counters and your affinity for the DRY principle, eventually implement a more sophisticated solution later on.
PS: StreamCounter uses an int value and is not thread-safe so in a parallel stream setup one would replace the int with an AtomicInteger instance.
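For completeness, a minimal thread-safe variant of the StreamCounter sketched above (the class name is made up), using an AtomicInteger so it can also be used with parallel streams:
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Thread-safe counter usable with stream.map(...) in parallel pipelines.
class ConcurrentStreamCounter<T> implements Function<T, T> {
    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public T apply(T object) {
        counter.incrementAndGet();
        return object;
    }

    public int get() {
        return counter.get();
    }
}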
I'm using CacheBuilder and LocalCache from the Guava library, but have some performance issues: p99.9 latency is around 300-400 ms for getAllPresent.
Latency for requests almost doubles between p99 and p99.9 (p99 is around 150 ms).
The following configuration is used:
refreshAfterWrite of 120 seconds, max size of 2e6, expiration of 24 hours, and an initial capacity of 1e6. No removalListener and no expireAfterWrite are used. concurrencyLevel is 256 (I tried different values). The machine has 12 cores.
While cache is in use it has between 8e5 to 1.2e6 entries.
The usage pattern is getAllPresent for around 3k keys (at p99.9), at around 100 qps.
The key is a complex object; for hashCode, the Objects.hash method is used with all fields supplied. I tried different hash functions to make sure the distribution is uniform (murmur3 showed similar results), so the problem is not collisions.
Any pointers on how to tune it to be more performant?
I would say it is typical in Java for the 99th percentile to be double the 90th percentile, and for the 99.9th percentile to be double the 99th. If you see this pattern, you will need to reduce the cost of the operation overall to reduce the latency, i.e. it is unlikely there are quick wins that will help you.
NOTE: when you have a large cache and scan across it you can expect every entry to involve at least one or two L3 cache misses. This is going to be expensive. For a small cache which fits in your CPU cache this will be many times faster.
I would use a profiler to reduce the CPU and memory allocation cost of this operation, or change how you call the cache to do what you need; this will also bring down the 99.9th percentile.
On varying request times / "Request times doubles between p99 and p99.9"
That might simply be an occasional GC during the getAllPresent call. To really investigate this you should do a stripped down benchmark which tracks the GC activity (just the counters).
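For example, one way to track GC activity in such a benchmark is to read the JVM's standard GC MXBeans before and after the measured calls; a small sketch:
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sum collection counts and times across all collectors; sample this before and
// after the measured getAllPresent calls and compare the deltas.
static long[] gcCounters() {
    long count = 0, timeMillis = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
        count += gc.getCollectionCount();
        timeMillis += gc.getCollectionTime();
    }
    return new long[] { count, timeMillis };
}
A jump in the deltas around the slow requests points at GC rather than the cache itself.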
Another source of trouble may be lock contention. I am missing the exact access pattern in your problem statement: how many requests are done in parallel, and how does the key space overlap? Guava partitions the cache hash tables internally and uses the concurrencyLevel as a hint. The read access is not completely lock-free, since the LRU list needs to be updated. Accessing the same key from different threads is a source of lock contention. Here is an (outdated) evaluation on nitro cache performance showing this effect. (Update: the Guava cache has some strategy to avoid the locks on read; this needs further investigation.)
On how to get (15 times?) faster
The most costly thing when you access the cache is the eviction algorithm updating its data structure. However, your maximum cache size (2E6) is above the maximum experienced size (1.2E6). This means no eviction will take place, because the capacity limit is never reached, so all the updating of the LRU list in the Guava cache is pointless. I have benchmarked the cache runtime for Google Guava, EHCache, Infinispan and different eviction strategies at cache2k benchmarks; see the "runtime comparison for hits". Benchmarks for multi-threaded accesses are still missing; they will show up during August.
From my understanding there is no option to change or switch off the eviction strategy in Guava Cache (can anybody second this?).
Within cache2k I am experimenting with alternative eviction strategies which allow lock-free read access. Within your scenario, you could simply select "random eviction", and I would expect a speedup of about a factor of 15. BTW: the cache2k cache also prints out hash table statistics and a quality metric for your hashCode() implementation; see the notes on cache2k statistics.
It should be possible to do a quick evaluation. Here are some code snippets to get you started quickly:
<dependency>
    <groupId>org.cache2k</groupId>
    <artifactId>cache2k-core</artifactId>
    <version>0.19.1</version>
</dependency>
<dependency>
    <groupId>org.cache2k</groupId>
    <artifactId>cache2k-api</artifactId>
    <version>0.19.1</version>
</dependency>
Remark: The cache implementations are not exposed in the API module, which is why we need the core module in compile scope. Cache initialization:
// optional data source (similar to CacheLoader)
CacheSource<Integer, String> source =
    new CacheSource<Integer, String>() {
        public String get(Integer o) {
            return o + "hello";
        }
    };

Cache<Integer, String> cache =
    CacheBuilder.newCache(Integer.class, String.class)
        .implementation(RandomCache.class)
        .maxSize(3000000)
        .expiryMillis(120 * 1000)
        /* optional, if cache should do the refresh itself
        .source(source)
        .backgroundRefresh(true)
        */
        .build();
You can experiment with other eviction algorithms by altering the implementation option.
getAllPresent is not available in cache2k, but you can code it yourself:
public Map<Integer, String> getAllPresent(Iterator<Integer> it) {
    HashMap<Integer, String> hash = new HashMap<>();
    while (it.hasNext()) {
        int k = it.next();
        String v = cache.peek(k);
        if (v != null) {
            hash.put(k, v);
        }
    }
    return hash;
}
In cache2k, cache.peek() returns a mapped element without invoking the cache source, which is exactly the intended semantics of getAllPresent. Building up the hash map actually produces a lot of GC load. The use of bulk operations like getAll or getAllPresent should be a careful decision. Since the access times in cache2k are similar to hash table access times, bulk operations will probably not speed things up.
A note on getAllPresent()
Within cache2k there is a JSR107-compatible getAll() method which serves about the same purpose. From an API designer's standpoint these methods are evil, since they contradict the idea that the cache controls the resources. Just go with cache.get() or cache.peek(). If there is a CacheSource (aka CacheLoader), use cache.prefetch(keys) to tell the cache that you want to work with these keys next... Sorry, a little off-topic.
EDIT: I've reorganized this question to reflect the new information that has since become available.
This question is based on the responses to a question by Viliam concerning Guava Maps' use of lazy eviction: Laziness of eviction in Guava's maps
Please read this question and its response first, but essentially the conclusion is that Guava maps do not asynchronously calculate and enforce eviction. Given the following map:
ConcurrentMap<String, MyObject> cache = new MapMaker()
.expireAfterAccess(10, TimeUnit.MINUTES)
.makeMap();
Once ten minutes have passed following access to an entry, it will still not be evicted until the map is "touched" again. Known ways to do this include the usual accessors: get(), put(), and containsKey().
The first part of my question [solved]: what other calls cause the map to be "touched"? Specifically, does anyone know if size() falls into this category?
The reason for wondering this is that I've implemented a scheduled task to occasionally nudge the Guava map I'm using for caching, using this simple method:
public static void nudgeEviction() {
cache.containsKey("");
}
However I'm also using cache.size() to programmatically report the number of objects contained in the map, as a way to confirm this strategy is working. But I haven't been able to see a difference from these reports, and now I'm wondering if size() also causes eviction to take place.
Answer: So Mark has pointed out that in release 9, eviction is invoked only by the get(), put(), and replace() methods, which would explain why I wasn't seeing an effect for containsKey(). This will apparently change with the next version of guava which is set for release soon, but unfortunately my project's release is set sooner.
This puts me in an interesting predicament. Normally I could still touch the map by calling get(""), but I'm actually using a computing map:
ConcurrentMap<String, MyObject> cache = new MapMaker()
.expireAfterAccess(10, TimeUnit.MINUTES)
.makeComputingMap(loadFunction);
where loadFunction loads the MyObject corresponding to the key from a database. It's starting to look like I have no easy way of forcing eviction until r10. But even being able to reliably force eviction is put into doubt by the second part of my question:
The second part of my question [solved]: In reaction to one of the responses to the linked question, does touching the map reliably evict all expired entries? In the linked answer, Niraj Tolia indicates otherwise, saying eviction is potentially only processed in batches, which would mean multiple calls to touch the map might be needed to ensure all expired objects were evicted. He did not elaborate; however, this seems related to the map being split into segments based on concurrency level. Assuming I used r10, in which containsKey("") does invoke eviction, would this then be for the entire map, or only for one of the segments?
Answer: maaartinus has addressed this part of the question:
Beware that containsKey and other reading methods only run postReadCleanup, which does nothing except on every 64th invocation (see DRAIN_THRESHOLD). Moreover, it looks like all cleanup methods work with a single Segment only.
So it looks like calling containsKey("") wouldn't be a viable fix, even in r10. This reduces my question to the title: How can I reliably force eviction to occur?
Note: Part of the reason my web app is noticeably affected by this issue is that when I implemented caching I decided to use multiple maps - one for each class of my data objects. So with this issue there is the possibility that one area of code is executed, causing a bunch of Foo objects to be cached, and then the Foo cache isn't touched again for a long time so it doesn't evict anything. Meanwhile Bar and Baz objects are being cached from other areas of code, and memory is being eaten. I'm setting a maximum size on these maps, but this is a flimsy safeguard at best (I'm assuming its effect is immediate - still need to confirm this).
UPDATE 1: Thanks to Darren for linking the relevant issues - they now have my votes. So it looks like a resolution is in the pipeline, but seems unlikely to be in r10. In the meantime, my question remains.
UPDATE 2: At this point I'm just waiting for a Guava team member to give feedback on the hack maaartinus and I put together (see answers below).
LAST UPDATE: feedback received!
I just added the method Cache.cleanUp() to Guava. Once you migrate from MapMaker to CacheBuilder you can use that to force eviction.
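For example, a sketch of how this could be wired up (assuming a Guava version where CacheBuilder and Cache.cleanUp() are available; the scheduling details and MyObject, the question's value type, are placeholders):
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final Cache<String, MyObject> cache = CacheBuilder.newBuilder()
        .expireAfterAccess(10, TimeUnit.MINUTES)
        .build();

// Periodically purge expired entries even if the cache is otherwise idle.
ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
cleaner.scheduleAtFixedRate(new Runnable() {
    @Override
    public void run() {
        cache.cleanUp();
    }
}, 1, 1, TimeUnit.MINUTES);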
I was wondering about the same issue you described in the first part of your question. From what I can tell from looking at the source code for Guava's CustomConcurrentHashMap (release 9), it appears that entries are evicted on the get(), put(), and replace() methods. The containsKey() method does not appear to invoke eviction. I'm not 100% sure because I took a quick pass at the code.
Update:
I also found a more recent version of CustomConcurrentHashMap in Guava's git repository and it looks like containsKey() has been updated to invoke eviction.
Both release 9 and the latest version I just found do not invoke eviction when size() is called.
Update 2:
I recently noticed that Guava r10 (yet to be released) has a new class called CacheBuilder. Basically this class is a forked version of the MapMaker but with caching in mind. The documentation suggests that it will support some of the eviction requirements you are looking for.
I reviewed the updated code in r10's version of the CustomConcurrentHashMap and found what looks like a scheduled map cleaner. Unfortunately, that code appears unfinished at this point but r10 looks more and more promising each day.
Beware that containsKey and other reading methods only run postReadCleanup, which does nothing except on every 64th invocation (see DRAIN_THRESHOLD). Moreover, it looks like all cleanup methods work with a single Segment only.
The easiest way to enforce eviction seems to be to put some dummy object into each segment. For this to work, you'd need to analyze CustomConcurrentHashMap.hash(Object), which is surely not a good idea, as this method may change at any time. Moreover, depending on the key class it may be hard to find a key with a hashCode ensuring it lands in a given segment.
You could use reads instead, but would have to repeat them 64 times per segment. Here, it'd be easy to find a key with an appropriate hashCode, since any object is allowed as an argument.
Maybe you could hack into the CustomConcurrentHashMap source code instead; it could be as trivial as
public void runCleanup() {
    final Segment<K, V>[] segments = this.segments;
    for (int i = 0; i < segments.length; ++i) {
        segments[i].runCleanup();
    }
}
but I wouldn't do it without a lot of testing and/or an OK by a guava team member.
Yep, we've gone back and forth a few times on whether these cleanup tasks should be done on a background thread (or pool), or should be done on user threads. If they were done on a background thread, this would eventually happen automatically; as it is, it'll only happen as each segment gets used. We're still trying to come up with the right approach here - I wouldn't be surprised to see this change in some future release, but I also can't promise anything or even make a credible guess as to how it will change. Still, you've presented a reasonable use case for some kind of background or user-triggered cleanup.
Your hack is reasonable, as long as you keep in mind that it's a hack, and liable to break (possibly in subtle ways) in future releases. As you can see in the source, Segment.runCleanup() calls runLockedCleanup and runUnlockedCleanup: runLockedCleanup() will have no effect if it can't lock the segment, but if it can't lock the segment it's because some other thread has the segment locked, and that other thread can be expected to call runLockedCleanup as part of its operation.
Also, in r10, there's CacheBuilder/Cache, analogous to MapMaker/Map. Cache is the preferred approach for many current users of makeComputingMap. It uses a separate CustomConcurrentHashMap, in the common.cache package; depending on your needs, you may want your GuavaEvictionHacker to work with both. (The mechanism is the same, but they're different Classes and therefore different Methods.)
I'm not a big fan of hacking into or forking external code until absolutely necessary. This problem occurs in part due to an early decision for MapMaker to fork ConcurrentHashMap, thereby dragging in a lot of complexity that could have been deferred until after the algorithms were worked out. By patching above MapMaker, the code is robust to library changes so that you can remove your workaround on your own schedule.
An easy approach is to use a priority queue of weak-reference tasks and a dedicated thread. This has the drawback of creating many stale no-op tasks, which can become excessive due to the O(lg n) insertion penalty. It works reasonably well for small, less frequently used caches. It was the original approach taken by MapMaker, and it's simple to write your own decorator.
A more robust choice is to mirror the lock amortization model with a single expiration queue. The head of the queue can be volatile so that a read can always peek to determine if it has expired. This allows all reads to trigger an expiration and an optional clean-up thread to check regularly.
By far the simplest is to use #concurrencyLevel(1) to force MapMaker to use a single segment. This reduces the write concurrency, but most caches are read heavy so the loss is minimal. The original hack to nudge the map with a dummy key would then work fine. This would be my preferred approach, but the other two options are okay if you have high write loads.
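A rough sketch of that suggestion follows. loadFunction and MyObject come from the question; the dummy key and DUMMY_VALUE used for the nudge are hypothetical placeholders, and which calls actually trigger cleanup depends on the release, as discussed above.
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;

import com.google.common.collect.MapMaker;

class SingleSegmentCache {

    private final ConcurrentMap<String, MyObject> cache = new MapMaker()
            .concurrencyLevel(1)                        // one segment: one nudge covers the whole map
            .expireAfterAccess(10, TimeUnit.MINUTES)
            .makeComputingMap(loadFunction);            // loadFunction as in the question

    /** Called from a scheduled task; per the discussion above, writes (put/replace) trigger cleanup in r9. */
    void nudgeEviction() {
        cache.put("dummy-nudge-key", DUMMY_VALUE);      // DUMMY_VALUE: hypothetical placeholder MyObject
        cache.remove("dummy-nudge-key");
    }
}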
I don't know if it is appropriate for your use case, but since your main concern about the lack of background cache eviction seems to be memory consumption, I would have thought that using softValues() on the MapMaker, to allow the garbage collector to reclaim entries from the cache when a low-memory situation occurs, could easily be the solution for you. I have used this on a subscription server (ATOM) where entries are served through a Guava cache using SoftReferences for values.
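For illustration, the map from the question with soft values enabled; a sketch against the MapMaker API used elsewhere in this question, with MyObject being the question's value type:
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;

import com.google.common.collect.MapMaker;

// Same configuration as in the question, plus soft values so the GC can reclaim
// cached values under memory pressure.
ConcurrentMap<String, MyObject> cache = new MapMaker()
        .expireAfterAccess(10, TimeUnit.MINUTES)
        .softValues()
        .makeMap();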
Based on maaartinus's answer, I came up with the following code which uses reflection rather than directly modifying the source (if you find this useful please upvote his answer!). While it will come with a performance penalty for using reflection, the difference should be negligible since I'll run it about once every 20 minutes for each caching Map (I'm also caching the dynamic lookups in the static block, which will help). I have done some initial testing and it appears to work as intended:
public class GuavaEvictionHacker {

    //Class objects necessary for reflection on Guava classes - see Guava docs for info
    private static final Class<?> computingMapAdapterClass;
    private static final Class<?> nullConcurrentMapClass;
    private static final Class<?> nullComputingConcurrentMapClass;
    private static final Class<?> customConcurrentHashMapClass;
    private static final Class<?> computingConcurrentHashMapClass;
    private static final Class<?> segmentClass;

    //MapMaker$ComputingMapAdapter#cache points to the wrapped CustomConcurrentHashMap
    private static final Field cacheField;

    //CustomConcurrentHashMap#segments points to the array of Segments (map partitions)
    private static final Field segmentsField;

    //CustomConcurrentHashMap$Segment#runCleanup() enforces eviction on the calling Segment
    private static final Method runCleanupMethod;

    static {
        try {
            //look up Classes
            computingMapAdapterClass = Class.forName("com.google.common.collect.MapMaker$ComputingMapAdapter");
            nullConcurrentMapClass = Class.forName("com.google.common.collect.MapMaker$NullConcurrentMap");
            nullComputingConcurrentMapClass = Class.forName("com.google.common.collect.MapMaker$NullComputingConcurrentMap");
            customConcurrentHashMapClass = Class.forName("com.google.common.collect.CustomConcurrentHashMap");
            computingConcurrentHashMapClass = Class.forName("com.google.common.collect.ComputingConcurrentHashMap");
            segmentClass = Class.forName("com.google.common.collect.CustomConcurrentHashMap$Segment");

            //look up Fields and set accessible
            cacheField = computingMapAdapterClass.getDeclaredField("cache");
            segmentsField = customConcurrentHashMapClass.getDeclaredField("segments");
            cacheField.setAccessible(true);
            segmentsField.setAccessible(true);

            //look up the cleanup Method and set accessible
            runCleanupMethod = segmentClass.getDeclaredMethod("runCleanup");
            runCleanupMethod.setAccessible(true);
        }
        catch (ClassNotFoundException cnfe) {
            throw new RuntimeException("ClassNotFoundException thrown in GuavaEvictionHacker static initialization block.", cnfe);
        }
        catch (NoSuchFieldException nsfe) {
            throw new RuntimeException("NoSuchFieldException thrown in GuavaEvictionHacker static initialization block.", nsfe);
        }
        catch (NoSuchMethodException nsme) {
            throw new RuntimeException("NoSuchMethodException thrown in GuavaEvictionHacker static initialization block.", nsme);
        }
    }

    /**
     * Forces eviction to take place on the provided Guava Map. The Map must be an instance
     * of either {@code CustomConcurrentHashMap} or {@code MapMaker$ComputingMapAdapter}.
     *
     * @param guavaMap the Guava Map to force eviction on.
     */
    public static void forceEvictionOnGuavaMap(ConcurrentMap<?, ?> guavaMap) {
        try {
            //we need to get the CustomConcurrentHashMap instance
            Object customConcurrentHashMap;

            //get the type of what was passed in
            Class<?> guavaMapClass = guavaMap.getClass();

            //if it's a CustomConcurrentHashMap we have what we need
            if (guavaMapClass == customConcurrentHashMapClass) {
                customConcurrentHashMap = guavaMap;
            }
            //if it's a NullConcurrentMap (auto-evictor), return early
            else if (guavaMapClass == nullConcurrentMapClass) {
                return;
            }
            //if it's a computing map we need to pull the instance from the adapter's "cache" field
            else if (guavaMapClass == computingMapAdapterClass) {
                customConcurrentHashMap = cacheField.get(guavaMap);

                //get the type of what we pulled out
                Class<?> innerCacheClass = customConcurrentHashMap.getClass();

                //if it's a NullComputingConcurrentMap (auto-evictor), return early
                if (innerCacheClass == nullComputingConcurrentMapClass) {
                    return;
                }
                //otherwise make sure it's a ComputingConcurrentHashMap - error if it isn't
                else if (innerCacheClass != computingConcurrentHashMapClass) {
                    throw new IllegalArgumentException("Provided ComputingMapAdapter's inner cache was an unexpected type: " + innerCacheClass);
                }
            }
            //error for anything else passed in
            else {
                throw new IllegalArgumentException("Provided ConcurrentMap was not an expected Guava Map: " + guavaMapClass);
            }

            //pull the array of Segments out of the CustomConcurrentHashMap instance
            Object[] segments = (Object[]) segmentsField.get(customConcurrentHashMap);

            //loop over them and invoke the cleanup method on each one
            for (Object segment : segments) {
                runCleanupMethod.invoke(segment);
            }
        }
        catch (IllegalAccessException iae) {
            throw new RuntimeException(iae);
        }
        catch (InvocationTargetException ite) {
            throw new RuntimeException(ite.getCause());
        }
    }
}
I'm looking for feedback on whether this approach is advisable as a stopgap until the issue is resolved in a Guava release, particularly from members of the Guava team when they get a minute.
EDIT: updated the solution to allow for auto-evicting maps (NullConcurrentMap or NullComputingConcurrentMap residing in a ComputingMapAdapter). This turned out to be necessary in my case, since I'm calling this method on all of my maps and a few of them are auto-evictors.
We all know when using Collections.synchronizedXXX (e.g. synchronizedSet()) we get a synchronized "view" of the underlying collection.
However, the documentation of these wrapper generation methods states that we have to explicitly synchronize on the collection when iterating over the collection using an iterator.
Which option do you choose to solve this problem?
I can only see the following approaches:
Do it as the documentation states: synchronize on the collection
Clone the collection before calling iterator()
Use a collection whose iterator is thread-safe (I am only aware of CopyOnWriteArrayList/Set)
And as a bonus question: when using a synchronized view - is the use of foreach/Iterable thread-safe?
You've already answered your bonus question really: no, using an enhanced for loop isn't safe - because it uses an iterator.
As for which is the most appropriate approach - it really depends on your context:
Are writes very infrequent? If so, CopyOnWriteArrayList may be most appropriate.
Is the collection reasonably small, and the iteration quick? (i.e. you're not doing much work in the loop) If so, synchronizing may well be fine - especially if this doesn't happen too often (i.e. you won't have much contention for the collection).
If you're doing a lot of work and don't want to block other threads working at the same time, the hit of cloning the collection may well be acceptable.
Depends on your access model. If you have low concurrency and frequent writes, 1 will have the best performance. If you have high concurrency with infrequent writes, 3 will have the best performance. Option 2 is going to perform badly in almost all cases.
foreach calls iterator(), so exactly the same things apply.
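For reference, a minimal sketch of option 1, the pattern the Collections.synchronizedXXX javadoc asks for (the method name here is made up):
import java.util.Set;

void iterateSafely(Set<String> syncSet) {
    // Hold the collection's own monitor for the whole iteration, as the javadoc requires;
    // keep the per-element work short to limit contention with writers.
    synchronized (syncSet) {
        for (String s : syncSet) {
            // do per-element work here
        }
    }
}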
You could use one of the newer collections added in Java 5.0 which support concurrent access while iterating. Another approach is to take a copy using toArray which is thread safe (during the copy).
Collection<String> words = ...
// enhanced for loop over an array.
for(String word: words.toArray(new String[0])) {
}
I might be totally off with your requirements, but if you are not aware of them, check out google-collections with "Favor immutability" in mind.
I suggest dropping Collections.synchronizedXXX and handle all locking uniformly in the client code. The basic collections don't support the sort of compound operations useful in threaded code, and even if you use java.util.concurrent.* the code is more difficult. I suggest keeping as much code as possible thread-agnostic. Keep difficult and error-prone thread-safe (if we are very lucky) code to a minimum.
All three of your options will work. Choosing the right one for your situation will depend on what your situation is.
CopyOnWriteArrayList will work if you want a list implementation and you don't mind the underlying storage being copied every time you write. This is pretty good for performance as long as you don't have very big collections.
ConcurrentHashMap or "ConcurrentHashSet" (using Collections.newSetFromMap) will work if you need a Map or Set interface, obviously you don't get random access this way. One great! thing about these two is that they will work well with large data sets - when mutated they just copy little bits of the underlying data storage.
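For example, the "ConcurrentHashSet" mentioned above can be created with the standard JDK helper; a small sketch:
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// A Set view backed by a ConcurrentHashMap: safe for concurrent reads and writes.
Set<String> concurrentSet =
        Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());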
It does depend on the result one needs to achieve. Cloning/copying/toArray(), new ArrayList(..) and the like obtain a snapshot and do not lock the collection.
Using synchronized(collection) and iterating through it ensures that there are no modifications by the end of the iteration, i.e. it effectively locks the collection.
Side note: toArray() is usually preferred, with some exceptions when it internally needs to create a temporary ArrayList. Also please note that anything but toArray() should be wrapped in synchronized(collection) as well, provided you are using Collections.synchronizedXXX.
This question is rather old (sorry, I am a bit late) but I still want to add my answer.
I would choose your second option (i.e. clone the collection before calling iterator()), but with a major twist.
Assuming you want to iterate using an iterator, you do not have to copy the collection before calling .iterator() and thereby sort of negate (I am using the term "negate" loosely) the idea of the iterator pattern; instead, you could write a "ThreadSafeIterator".
It would work on the same premise, copying the collection, but without letting the iterating class know that you did just that. Such an iterator might look like this:
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Queue;

class ThreadSafeIterator<T> implements Iterator<T> {

    private final Queue<T> clients;
    private T currentElement;
    private final Collection<T> source;

    ThreadSafeIterator(final Collection<T> collection) {
        // snapshot the source collection; iteration happens over the copy
        this.clients = new LinkedList<>(collection);
        this.source = collection;
    }

    @Override
    public boolean hasNext() {
        return clients.peek() != null;
    }

    @Override
    public T next() {
        currentElement = clients.poll();
        return currentElement;
    }

    @Override
    public void remove() {
        // write back to the original collection under its own lock
        synchronized (source) {
            source.remove(currentElement);
        }
    }
}
Taking this a step further, you might use the Semaphore class to ensure thread-safety or something similar. But take the remove method with a grain of salt.
The point is that by using such an iterator, no one, neither the iterating nor the iterated class (is that a real word?), has to worry about thread safety.