We've recently had a discussion at work about whether we need to use ConcurrentHashMap or whether we can simply use a regular HashMap in our multithreaded environment. There are two arguments for HashMap: it is faster than ConcurrentHashMap, so we should use it if possible; and ConcurrentModificationException apparently only appears when you iterate over the map while it is being modified, so "if we only PUT and GET from the map, what is the problem with the regular HashMap?" was the argument.
I thought that concurrent PUT actions or concurrent PUT and READ could lead to exceptions, so I put together a test to show this. The test is simple: create 10 threads, each of which writes the same 1000 key-value pairs into the map again and again for 5 seconds, then print the resulting map.
The results were quite confusing actually:
Length:1299
Errors recorded: 0
I thought each key-value pair was unique in a HashMap, but looking through the map, I can find multiple Key-Value pairs that are identical. I expected either some kind of exception or corrupted keys or values, but I did not expect this. How does this occur?
Here's the code I used, for reference:
public class ConcurrentErrorTest
{
static final long runtime = 5000;
static final AtomicInteger errCount = new AtomicInteger();
static final int count = 10;
public static void main(String[] args) throws InterruptedException
{
List<Thread> threads = new LinkedList<>();
final Map<String, Integer> map = getMap();
for (int i = 0; i < count; i++)
{
Thread t = getThread(map);
threads.add(t);
t.start();
}
for (int i = 0; i < count; i++)
{
threads.get(i).join(runtime + 1000);
}
for (String s : map.keySet())
{
System.out.println(s + " " + map.get(s));
}
System.out.println("Length:" + map.size());
System.out.println("Errors recorded: " + errCount.get());
}
private static Map<String, Integer> getMap()
{
Map<String, Integer> map = new HashMap<>();
return map;
}
private static Map<String, Integer> getConcMap()
{
Map<String, Integer> map = new ConcurrentHashMap<>();
return map;
}
private static Thread getThread(final Map<String, Integer> map)
{
return new Thread(new Runnable() {
@Override
public void run()
{
long start = System.currentTimeMillis();
long now = start;
while (now - start < runtime)
{
try
{
for (int i = 0; i < 1000; i++)
map.put("i=" + i, i);
now = System.currentTimeMillis();
}
catch (Exception e)
{
System.out.println("P - Error occured: " + e.toString());
errCount.incrementAndGet();
}
}
}
});
}
}
What you're faced with seems to be a TOCTTOU class problem. (Yes, this kind of bug happens so often, it's got its own name. :))
When you insert an entry into a map, at least the following two things need to happen:
Check whether the key already exists.
If the check returned true, update the existing entry, if it didn't, add a new one.
If these two don't happen atomically (as they would in a correctly synchronized map implementation), then several threads can come to the conclusion that the key doesn't exist yet in step 1, but by the time they reach step 2, that isn't true any more. So multiple threads will happily insert an entry with the same key.
Please note that this isn't the only problem that can happen, and depending on the implementation and your luck with visibility, you can get all kinds of different and unexpected failures.
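To check that the duplicates come from the unsynchronized map and not from the test itself, the same workload can be run against a ConcurrentHashMap, whose put is atomic per key. This is only a minimal sketch: the class name `ConcurrentPutDemo`, its `run` method and the fixed number of 100 rounds (instead of the 5-second loop) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class, not part of the original test.
class ConcurrentPutDemo {

    // Starts `threads` writers that each put the same `keys` key-value pairs
    // 100 times, waits for all of them, and returns the resulting map.
    static Map<String, Integer> run(int threads, int keys) {
        final Map<String, Integer> map = new ConcurrentHashMap<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int round = 0; round < 100; round++) {
                    for (int i = 0; i < keys; i++) {
                        map.put("i=" + i, i);
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = run(10, 1000);
        System.out.println("Length:" + map.size()); // prints "Length:1000"
    }
}
```

With the concurrent map, every key is present exactly once no matter how the threads interleave, so the reported length matches the number of distinct keys.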
In a multithreaded environment, you should always use ConcurrentHashMap if you are going to perform any operation other than get.
Most of the time you won't get an exception, but you can easily end up with corrupt data, because a thread may be working with a stale, locally cached view of the map's internals.
Without synchronization, each thread can see its own outdated picture of the map while performing a put: when several threads check whether a key exists, they can all find that it doesn't, and each of them then inserts the entry.
tldr: How can I search for an entry in multiple (read-only) Java HashMaps at the same time?
The long version:
I have several dictionaries of various sizes stored as HashMap< String, String >. Once they are read in, they are never to be changed (strictly read-only).
I want to check whether and which dictionary had stored an entry with my key.
My code was originally looking for a key like this:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
if (map.containsKey(key))
return new DictionaryEntry(map.get(key), i);
}
return null;
}
Then it got a little more complicated: my search string could contain typos, or was a variant of the stored entry. Like, if the stored key was "banana", it is possible that I'd look up "bannana" or "a banana", but still would like the entry for "banana" returned. Using the Levenshtein-Distance, I now loop through all dictionaries and each entry in them:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
for (Map.Entry<String, String> entry : map.entrySet()) {
// Calculate Levenshtein distance, store closest match etc.
}
}
// return closest match or null.
}
So far everything works as it should and I'm getting the entry I want. Unfortunately I have to look up around 7000 strings, in five dictionaries of various sizes (~ 30 - 70k entries) and it takes a while. From my processing output I have the strong impression my lookup dominates overall runtime.
My first idea to improve runtime was to search all dictionaries in parallel. Since none of the dictionaries is to be changed and no more than one thread is accessing a dictionary at the same time, I don't see any safety concerns.
The question is just: how do I do this? I have never used multithreading before. My search only came up with Concurrent HashMaps (but to my understanding, I don't need this) and the Runnable-class, where I'd have to put my processing into the method run(). I think I could rewrite my current class to fit into Runnable, but I was wondering if there is maybe a simpler method to do this (or how can I do it simply with Runnable, right now my limited understanding thinks I have to restructure a lot).
Since I was asked to share the Levenshtein-Logic: It's really nothing fancy, but here you go:
private int _maxLSDistance = 10;
public Map.Entry getClosestMatch(String key) {
Map.Entry _closestMatch = null;
int _lsDistance = Integer.MAX_VALUE; // distance of the closest match found so far
if (key == null) {
return null;
}
for (Map.Entry entry : _dictionary.entrySet()) {
// Perfect match
if (entry.getKey().equals(key)) {
return entry;
}
// Similar match
else {
int dist = StringUtils.getLevenshteinDistance((String) entry.getKey(), key);
// If "dist" is smaller than threshold and smaller than distance of already stored entry
if (dist < _maxLSDistance) {
if (_closestMatch == null || dist < _lsDistance) {
_closestMatch = entry;
_lsDistance = dist;
}
}
}
}
return _closestMatch;
}
To use multithreading in your case, you could do something like this:
The "monitor" class, which basically stores the results and coordinates the threads:
public class Results {
private int nrOfDictionaries = 4; //
private ArrayList<String> results = new ArrayList<String>();
public void prepare() {
nrOfDictionaries = 4;
results = new ArrayList<String>();
}
public synchronized void oneDictionaryFinished() {
nrOfDictionaries--;
System.out.println("one dictionary finished");
notifyAll();
}
public synchronized boolean isReady() throws InterruptedException {
while (nrOfDictionaries != 0) {
wait();
}
return true;
}
public synchronized void addResult(String result) {
results.add(result);
}
public ArrayList<String> getAllResults() {
return results;
}
}
The thread itself, which can be set up to search a specific dictionary:
public class ThreadDictionarySearch extends Thread {
// the actual dictionary
private String dictionary;
private Results results;
public ThreadDictionarySearch(Results results, String dictionary) {
this.dictionary = dictionary;
this.results = results;
}
@Override
public void run() {
for (int i = 0; i < 4; i++) {
// search dictionary;
results.addResult("result of " + dictionary);
System.out.println("adding result from " + dictionary);
}
results.oneDictionaryFinished();
}
}
And the main method for demonstration:
public static void main(String[] args) throws Exception {
Results results = new Results();
ThreadDictionarySearch threadA = new ThreadDictionarySearch(results, "dictionary A");
ThreadDictionarySearch threadB = new ThreadDictionarySearch(results, "dictionary B");
ThreadDictionarySearch threadC = new ThreadDictionarySearch(results, "dictionary C");
ThreadDictionarySearch threadD = new ThreadDictionarySearch(results, "dictionary D");
threadA.start();
threadB.start();
threadC.start();
threadD.start();
// isReady() blocks here until all dictionaries are searched,
// because Results calls wait() while the search is not finished
if (results.isReady()) {
for (String string : results.getAllResults()) {
System.out.println("RESULT: " + string);
}
}
}
I think the easiest would be to use a stream over the entry set:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
map.entrySet().parallelStream().forEach( (entry) ->
{
// Calculate Levenshtein distance, store closest match etc.
}
);
}
// return closest match or null.
}
Provided you are using Java 8, of course. You could also wrap the outer loop in an IntStream. And you could use Stream.reduce directly to get the entry with the smallest distance.
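The idea of reducing a parallel stream to the closest entry could look roughly like the sketch below. It uses `min` (a special case of `reduce`) and a hand-written Levenshtein function so that it is self-contained; the class `ClosestKeyFinder` and its method names are hypothetical, and in the question's code `StringUtils.getLevenshteinDistance` would take the place of `levenshtein`.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper class; the names are illustrative only.
class ClosestKeyFinder {

    // Plain dynamic-programming Levenshtein distance (two-row variant).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the key with the smallest distance below maxDistance, if any.
    static Optional<String> closestKey(Map<String, String> dictionary,
                                       String query, int maxDistance) {
        return dictionary.keySet().parallelStream()
                // score every key once: (key, distance to the query)
                .<Map.Entry<String, Integer>>map(k ->
                        new AbstractMap.SimpleImmutableEntry<String, Integer>(
                                k, levenshtein(k, query)))
                .filter(scored -> scored.getValue() < maxDistance)
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}
```

Because the stream only reads the map and the reduction keeps no shared mutable state, no extra synchronization is needed for a read-only dictionary.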
Maybe try thread pools:
ExecutorService es = Executors.newFixedThreadPool(_numDictionaries);
for (int i = 0; i < _numDictionaries; i++) {
//prepare a Runnable implementation that contains a logic of your search
es.submit(prepared_runnable);
}
I believe you may also try to find a quick estimate of strings that completely do not match (i.e. significant difference in length), and use it to finish your logic ASAP, moving to next candidate.
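A thread pool combined with the cheap length check might be sketched as follows. Everything here is an assumption for illustration: the class `ParallelDictionarySearch`, the `Match` holder, and the `distance` parameter, which stands in for whatever distance function is used (for example `StringUtils.getLevenshteinDistance` from the question). The pre-filter relies on the fact that an edit distance can never be smaller than the difference in length of the two strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntBiFunction;

// Hypothetical sketch: one task per dictionary, results merged at the end.
class ParallelDictionarySearch {

    // Matched key, its distance to the query, and the dictionary it came from.
    static final class Match {
        final String key;
        final int distance;
        final int dictionaryIndex;

        Match(String key, int distance, int dictionaryIndex) {
            this.key = key;
            this.distance = distance;
            this.dictionaryIndex = dictionaryIndex;
        }
    }

    // Searches all dictionaries in parallel; returns the best match or null.
    static Match search(List<Map<String, String>> dictionaries, String query,
                        int maxDistance, ToIntBiFunction<String, String> distance) {
        if (dictionaries.isEmpty()) {
            return null;
        }
        ExecutorService pool = Executors.newFixedThreadPool(dictionaries.size());
        try {
            List<Callable<Match>> tasks = new ArrayList<>();
            for (int i = 0; i < dictionaries.size(); i++) {
                final int index = i;
                final Map<String, String> dictionary = dictionaries.get(i);
                tasks.add(() -> searchOne(dictionary, index, query, maxDistance, distance));
            }
            Match best = null;
            // invokeAll blocks until every task has finished
            for (Future<Match> future : pool.invokeAll(tasks)) {
                Match m = future.get();
                if (m != null && (best == null || m.distance < best.distance)) {
                    best = m;
                }
            }
            return best;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("search task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    private static Match searchOne(Map<String, String> dictionary, int index, String query,
                                   int maxDistance, ToIntBiFunction<String, String> distance) {
        Match best = null;
        for (String candidate : dictionary.keySet()) {
            // Cheap pre-filter: skip keys whose length alone rules them out
            if (Math.abs(candidate.length() - query.length()) >= maxDistance) {
                continue;
            }
            int d = distance.applyAsInt(candidate, query);
            if (d < maxDistance && (best == null || d < best.distance)) {
                best = new Match(candidate, d, index);
            }
        }
        return best;
    }
}
```

In a real program the pool would normally be created once and reused for all 7000 lookups rather than per call; it is created inside `search` here only to keep the sketch short.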
I strongly doubt that HashMaps are a suitable solution here, especially if you want fuzzy matching and stop words. You should use a proper full-text search solution like Elasticsearch or Apache Solr, or at least an embeddable engine like Apache Lucene.
That being said, you can use a poor man's version: Create an array of your maps and a SortedMap, iterate over the array, take the keys of the current HashMap and store them in the SortedMap with the index of their HashMap. To retrieve a key, you first search in the SortedMap for said key, get the respective HashMap from the array using the index position and lookup the key in only one HashMap. Should be fast enough without the need for multiple threads to dig through the HashMaps. However, you could make the code below into a runnable and you can have multiple lookups in parallel.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
public class Search {
public static void main(String[] arg) {
if (arg.length == 0) {
System.out.println("Must give a search word!");
System.exit(1);
}
String searchString = arg[0].toLowerCase();
/*
* Populating our HashMaps.
*/
HashMap<String, String> english = new HashMap<String, String>();
english.put("banana", "fruit");
english.put("tomato", "vegetable");
HashMap<String, String> german = new HashMap<String, String>();
german.put("Banane", "Frucht");
german.put("Tomate", "Gemüse");
/*
* Now we create our ArrayList of HashMaps for fast retrieval
*/
List<HashMap<String, String>> maps = new ArrayList<HashMap<String, String>>();
maps.add(english);
maps.add(german);
/*
* This is our index
*/
SortedMap<String, Integer> index = new TreeMap<String, Integer>(String.CASE_INSENSITIVE_ORDER);
/*
* Populating the index:
*/
for (int i = 0; i < maps.size(); i++) {
// We iterate through our HashMaps...
HashMap<String, String> currentMap = maps.get(i);
for (String key : currentMap.keySet()) {
/* ...and populate our index with lowercase versions of the keys,
* referencing the array from which the key originates.
*/
index.put(key.toLowerCase(), i);
}
}
// In case our index contains our search string...
if (index.containsKey(searchString)) {
/*
* ... we find out in which map of the ones stored in maps
* the word in the index originated from.
*/
Integer mapIndex = index.get(searchString);
/*
* Next, we look up said map.
*/
HashMap<String, String> origin = maps.get(mapIndex);
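/*
* Last, we retrieve the value from the origin map.
* Note: the index stores lowercased keys, so this lookup only succeeds
* when the original key is itself lowercase ("banana", but not "Banane").
*/
String result = origin.get(searchString);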
/*
* The above steps can be shortened to
* String result = maps.get(index.get(searchString).intValue()).get(searchString);
*/
System.out.println(result);
} else {
System.out.println("\"" + searchString + "\" is not in the index!");
}
}
}
Please note that this is a rather naive implementation only provided for illustration purposes. It doesn't address several problems (you can't have duplicate index entries, for example).
With this solution, you are basically trading startup speed for query speed.
Since your concern is a faster response, I would suggest dividing the work between threads.
Say you have 5 dictionaries: you could give three dictionaries to one thread and let another thread take care of the remaining two.
Whichever thread finds the match first can then stop the other thread.
You may need some extra logic to divide the work, but that won't noticeably affect your running time.
You may also need a few more changes in your code to find the closest match:
for (Map.Entry entry : _dictionary.entrySet()) {
You are iterating over the entrySet, but you don't seem to use the values, and it seems that getting the entry set is a bit expensive. Since you are only interested in the keys of that map, I would suggest iterating over the keySet instead:
for (String key : _dictionary.keySet()) {
For more details on the performance of maps, please read the linked documentation (Map performances):
Iteration over the collection-views of a LinkedHashMap requires time proportional to the size of the map, regardless of its capacity. Iteration over a HashMap is likely to be more expensive, requiring time proportional to its capacity.
I'm trying to multi thread an import job, but running into a problem where it's causing duplicate data. I need to keep my map outside of the loop so all my threads can update and read from it, but I can't do this without it being final and with it being final I can't update the map. Currently I need to put my Map object in the run method, but the problem comes when the values are not initially in the database and each thread creates a new one. This results in duplicate data in the database. Does anybody know how to do some sort of call back to update my map outside?
ExecutorService executorService = Executors.newFixedThreadPool(10);
final Map<Integer, Object> map = new HashMap<>();
map.putAll(populate from database);
for (int i = 0; i < 10; i++) {
executorService.execute(new Runnable() {
public void run() {
while ((line = br.readLine()) != null) {
if(map.containsKey(123)) {
//read map object
session.update(object);
} else {
map.put(123,someObject);
session.save(object);
}
if(rowCount % 250 == 0)
tx.commit();
}
}
});
}
executorService.shutdown();
You need to use some synchronization technique.
The problematic part is when different threads try to put data into the map.
Example:
Thread 1 checks whether there is an object with key 123 in the map. Before thread 1 adds the new object to the map, thread 2 runs and also checks whether there is an object with key 123. Both threads then add object 123 to the map, which causes the duplicates.
You can read more about synchronization here
http://docs.oracle.com/javase/tutorial/essential/concurrency/sync.html
Based on your problem description, it appears that you want a map where the data is consistent and you always see the latest up-to-date data without missing any updates.
In this case, make your map a Collections.synchronizedMap(). This ensures that every individual read and write on the map is synchronized, so each lookup sees the latest data and each write has exclusive access to the map. Note that a containsKey() check followed by a put() is still two separate calls; to make that pair atomic, wrap both in a synchronized block on the map.
Refer to this SO discussion for a difference between the concurrency techniques used with maps.
Also, one more thing: defining a Map as final does not mean you cannot modify the map. You can definitely add and remove elements. What you cannot do, however, is change the variable to point to another map. This is illustrated by the simple code snippet below:
private final Map<Integer, String> testMap = Collections.synchronizedMap(new HashMap<Integer,String>());
testMap.put(1,"Tom"); //OK
testMap.remove(1); //OK
testMap = new HashMap<Integer,String>(); //ERROR!! Cannot modify a variable with the final modifier
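A synchronized map makes each single call atomic, but a check followed by an insert needs one lock around both calls. The sketch below shows one way this might look; the class `SynchronizedCheckThenPut` and its methods are hypothetical names, and the lock used is the map returned by `Collections.synchronizedMap`, which is the object the javadoc tells callers to synchronize on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of an atomic check-then-put on a synchronized map.
class SynchronizedCheckThenPut {

    private final Map<Integer, String> map =
            Collections.synchronizedMap(new HashMap<Integer, String>());

    // Returns true if this call inserted the value, false if the key existed.
    boolean insertIfMissing(Integer key, String value) {
        synchronized (map) { // same lock the wrapper uses for its own methods
            if (map.containsKey(key)) {
                return false;
            }
            map.put(key, value);
            return true;
        }
    }

    // Lets `threads` threads race to insert key 123 and counts the winners.
    static int countWinners(int threads) {
        final SynchronizedCheckThenPut store = new SynchronizedCheckThenPut();
        final AtomicInteger winners = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final String value = "thread-" + t;
            Thread worker = new Thread(() -> {
                if (store.insertIfMissing(123, value)) {
                    winners.incrementAndGet();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return winners.get();
    }
}
```

Only one thread can ever see the key as missing, so only one of them would go on to call something like `session.save(...)`; all others would take the update path.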
I would suggest the following solution
Use ConcurrentHashMap
Don't use update and commit inside your crawling threads
Trigger save and commit when your map reaches a critical size in a separate thread.
Pseudocode sample:
final Object lock = new Object();
...
executorService.execute(new Runnable() {
public void run() {
...
synchronized(lock){
if(concurrentMap.size() > 250){
// pseudocode: pass the collected values to the saver and remove them from the map
saveInASeparateThread(concurrentMap.values().removeAll());
}
}
}
});
The following logic resolves my issue. The code below isn't tested.
ExecutorService executorService = Executors.newFixedThreadPool(10);
final Map<Integer, Object> map = new ConcurrentHashMap<>();
map.putAll(myObjectList);
List<Future<String>> futures = new ArrayList<>();
for (int i = 0; i < 10; i++) {
final int thread = i;
Future<String> future = executorService.submit(new Callable<String>() {
public String call() {
List<MyObject> list;
CSVReader reader = new CSVReader(new InputStreamReader(csvFile.getStream()));
list = bean.parse(strategy, reader);
int listSize = list.size();
int rowCount = 0;
for(MyObject myObject : list) {
rowCount++;
Integer key = myObject.getId();
if(map.putIfAbsent(key, myObject) == null) {
session.save(myObject);
} else {
myObject = map.get(key);
//Do something
session.update(myObject);
}
if(rowCount % 250 == 0 || rowCount == listSize) {
tx.flush();
tx.clear();
}
}
tx.commit();
return "Thread " + thread + " completed.";
});
futures.add(future);
}
for(Future future : futures) {
System.out.println(future.get());
}
executorService.shutdown();
I have a game where every X seconds it will write changed values in memory back to my DB. These values are stored in containers(HashMaps and ArrayLists) when the data they hold is edited.
For simplicity, let's pretend I have only 1 container to write to the DB:
public static HashMap<String, String> dbEntitiesDeletesBacklog = new HashMap<String, String>();
My DB writing loop:
Timer dbUpdateJob = new Timer();
dbUpdateJob.schedule(new TimerTask() {
public void run() {
long startTime = System.nanoTime();
boolean updateEntitiesTableSuccess = UpdateEntitiesTable();
if (!updateEntitiesTableSuccess){
try {
conn.rollback();
} catch (SQLException e) {
e.printStackTrace();
logger.fatal(e.getMessage());
System.exit(1);
}
} else { //everything saved to DB - commit time
try {
conn.commit();
} catch (SQLException e) {
e.printStackTrace();
logger.fatal(e.getMessage());
System.exit(1);
}
}
logger.debug("Time to save to DB: " + (System.nanoTime() - startTime) / 1000000 + " milliseconds");
}
}, 0, 10000); //TODO:: figure out the perfect saving delay
My update method:
private boolean UpdateEntitiesTable() {
Iterator<Entry<String, String>> it = dbEntitiesDeletesBacklog.entrySet().iterator();
while (it.hasNext()) {
Entry<String, String> pairs = it.next();
String tmpEntityId = pairs.getKey();
int deletedSuccess = UPDATE("DELETE" +
" FROM " + DB_NAME + ".entities" +
" WHERE entity_id=(?)", new String[]{tmpEntityId});
if (deletedSuccess != 1) {
logger.error("Entity " + tmpEntityId + " was unable to be deleted.");
return false;
}
it.remove(); // already removes the entry from dbEntitiesDeletesBacklog
}
return true;
}
Do I need to create some sort of locking mechanism while 'saving to DB' for the dbEntitiesDeletesBacklog HashMap and other containers not included in this excerpt? I would think I need to, because it creates its iterator, then loops. What if something is added after the iterator is created and before it's done looping through the entries? I'm sorry this is more of a process question and less of a code help question (since I included so much sample code), but I wanted to make sure it was easy to understand what I am trying to do and asking.
Same question for my other containers which I use like so:
public static ArrayList<String> dbCharacterDeletesBacklog = new ArrayList<String>();
private boolean DeleteCharactersFromDB() {
for (String deleteWho : dbCharacterDeletesBacklog){
int deleteSuccess = MyDBSyncher.UPDATE("DELETE FROM " + DB_NAME + ".characters" +
" WHERE name=(?)",
new String[]{deleteWho});
if (deleteSuccess != 1) {
logger.error("Character(deleteSuccess): " + deleteSuccess);
return false;
}
}
dbCharacterDeletesBacklog.clear();
return true;
}
Thanks so much, as always, for any help on this. It is greatly appreciated!!
At the very least, you need a synchronized map (via Collections.synchronizedMap) if you are accessing your map concurrently, otherwise you may experience non deterministic behaviour.
Further than that, as you suggest, you also need to lock your map during iteration. From the javadoc for Collections.synchronizedMap() the suggestion is:
It is imperative that the user manually synchronize on the returned
map when iterating over any of its collection views:
Map m = Collections.synchronizedMap(new HashMap());
...
Set s = m.keySet(); // Needn't be in synchronized block
...
synchronized(m) { // Synchronizing on m, not s!
Iterator i = s.iterator(); // Must be in synchronized block
while (i.hasNext())
foo(i.next());
}
Failure to follow this advice may result in non-deterministic
behavior.
Alternatively, use a ConcurrentHashMap instead of a regular HashMap to avoid requiring synchronization during iteration. For a game, this is likely a better option since you avoid locking your collection for a long period of time.
Possibly even better, consider rotating through new collections such that every time you update the database you grab the collection and replace it with a new empty one where all new updates are written to, avoiding locking the collection while the database writes are occurring. The collections in this case would be managed by some container to allow this grab and replace to be thread safe. Note: You cannot expose the underlying collection to modifying code in this case, since you need to keep its reference strictly private for the swap to be effective (and not introduce any race conditions).
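The grab-and-replace idea could be sketched with a small container like the one below. The class name `SwappableBacklog` and its methods are hypothetical; the point is only that the map reference never leaves the class except through `drain()`, and that the lock is held for the swap itself, not for the database writes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical container that owns the backlog map and swaps it atomically.
class SwappableBacklog {

    // Never exposed directly; only reachable through the synchronized methods.
    private Map<String, String> current = new HashMap<>();

    // Called by game code whenever a change has to be written to the DB later.
    synchronized void put(String key, String value) {
        current.put(key, value);
    }

    // Called by the DB job: hands back everything recorded so far and
    // installs a fresh, empty map for subsequent updates.
    synchronized Map<String, String> drain() {
        Map<String, String> full = current;
        current = new HashMap<>();
        return full;
    }
}
```

The timer task would then call `drain()` once per run and iterate over the returned map without any further locking, because no other thread holds a reference to it any more.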
Here is a sample of what I will be using. I am posting it here in the hope that it will help someone else with a similar issue.
public class MyDBSyncher {
public static boolean running = false;
public static HashMap<String, String> dbEntitiesInsertsBacklog_A = new HashMap<String, String>();
public static HashMap<String, String> dbEntitiesInsertsBacklog_B = new HashMap<String, String>();
public MyDBSyncher(){
Timer dbUpdateJob = new Timer();
dbUpdateJob.schedule(new TimerTask() {
public void run() {
running = true;
boolean updateEntitiesTableSuccess = UpdateEntitiesTable();
running = false;
}
}, 0, 10000); //TODO:: figure out the perfect saving delay
}
public HashMap<String, String> getInsertableEntitiesHashMap(){
if (running){
return dbEntitiesInsertsBacklog_B;
} else {
return dbEntitiesInsertsBacklog_A;
}
}
private boolean UpdateEntitiesTable() {
Iterator<Entry<String, String>> it2 = getInsertableEntitiesHashMap().entrySet().iterator();
while (it2.hasNext()) {
Entry<String, String> pairs = it2.next();
String tmpEntityId = pairs.getKey();
//some DB updates here
it2.remove();
getInsertableEntitiesHashMap().remove(tmpEntityId);
}
return true;
}
}
I am aggregating multiple values for keys in a multi-threaded environment. The keys are not known in advance. I thought I would do something like this:
class Aggregator {
protected ConcurrentHashMap<String, List<String>> entries =
new ConcurrentHashMap<String, List<String>>();
public Aggregator() {}
public void record(String key, String value) {
List<String> newList =
Collections.synchronizedList(new ArrayList<String>());
List<String> existingList = entries.putIfAbsent(key, newList);
List<String> values = existingList == null ? newList : existingList;
values.add(value);
}
}
The problem I see is that every time this method runs, I need to create a new instance of an ArrayList, which I then throw away (in most cases). This seems like unjustified abuse of the garbage collector. Is there a better, thread-safe way of initializing this kind of a structure without having to synchronize the record method? I am somewhat surprised by the decision to have the putIfAbsent method not return the newly-created element, and by the lack of a way to defer instantiation unless it is called for (so to speak).
Java 8 introduced an API to cater for this exact problem, making a 1-line solution:
public void record(String key, String value) {
entries.computeIfAbsent(key, k -> Collections.synchronizedList(new ArrayList<String>())).add(value);
}
For Java 7:
public void record(String key, String value) {
List<String> values = entries.get(key);
if (values == null) {
entries.putIfAbsent(key, Collections.synchronizedList(new ArrayList<String>()));
// At this point, there will definitely be a list for the key.
// We don't know or care which thread's new object is in there, so:
values = entries.get(key);
}
values.add(value);
}
This is the standard code pattern when populating a ConcurrentHashMap.
The special method putIfAbsent(K, V) will either put your value object in or, if another thread got there before you, ignore your value object. Either way, after the call to putIfAbsent(K, V), get(key) is guaranteed to be consistent between threads, and therefore the above code is thread-safe.
The only wasted overhead is if some other thread adds a new entry at the same time for the same key: You may end up throwing away the newly created value, but that only happens if there is not already an entry and there's a race that your thread loses, which would typically be rare.
As of Java-8 you can create Multi Maps using the following pattern:
public void record(String key, String value) {
entries.computeIfAbsent(key,
k -> Collections.synchronizedList(new ArrayList<String>()))
.add(value);
}
The ConcurrentHashMap documentation (not the general contract) specifies that the ArrayList will only be created once for each key, at the slight initial cost of delaying updates while the ArrayList is being created for a new key:
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html#computeIfAbsent-K-java.util.function.Function-
In the end, I implemented a slight modification of @Bohemian's answer. His proposed solution overwrites the values variable with the putIfAbsent call, which creates the same problem I had before. The code that seems to work looks like this:
public void record(String key, String value) {
List<String> values = entries.get(key);
if (values == null) {
values = Collections.synchronizedList(new ArrayList<String>());
List<String> values2 = entries.putIfAbsent(key, values);
if (values2 != null)
values = values2;
}
values.add(value);
}
It's not as elegant as I'd like, but it's better than the original that creates a new ArrayList instance at every call.
Created two versions based on Gene's answer
public static <K,V> void putIfAbsetMultiValue(ConcurrentHashMap<K,List<V>> entries, K key, V value) {
List<V> values = entries.get(key);
if (values == null) {
values = Collections.synchronizedList(new ArrayList<V>());
List<V> values2 = entries.putIfAbsent(key, values);
if (values2 != null)
values = values2;
}
values.add(value);
}
public static <K,V> void putIfAbsetMultiValueSet(ConcurrentMap<K,Set<V>> entries, K key, V value) {
Set<V> values = entries.get(key);
if (values == null) {
values = Collections.synchronizedSet(new HashSet<V>());
Set<V> values2 = entries.putIfAbsent(key, values);
if (values2 != null)
values = values2;
}
values.add(value);
}
It works well
This is a problem I was also looking for an answer to. The method putIfAbsent does not actually solve the extra object creation problem; it just makes sure that one of those objects doesn't replace another. But race conditions among threads can still cause multiple objects to be instantiated. I could find 3 solutions for this problem (and I would follow this order of preference):
1- If you are on Java 8, the best way to achieve this is probably the new computeIfAbsent method of ConcurrentMap. You just need to give it a computation function which will be executed synchronously (at least for the ConcurrentHashMap implementation). Example:
private final ConcurrentMap<String, List<String>> entries =
new ConcurrentHashMap<String, List<String>>();
public void method1(String key, String value) {
entries.computeIfAbsent(key, s -> Collections.synchronizedList(new ArrayList<String>())) // the list itself must be thread-safe too
.add(value);
}
This is from the javadoc of ConcurrentHashMap.computeIfAbsent:
If the specified key is not already associated with a value, attempts
to compute its value using the given mapping function and enters it
into this map unless null. The entire method invocation is performed
atomically, so the function is applied at most once per key. Some
attempted update operations on this map by other threads may be
blocked while computation is in progress, so the computation should be
short and simple, and must not attempt to update any other mappings of
this map.
2- If you cannot use Java 8, you can use Guava's LoadingCache, which is thread-safe. You define a load function to it (just like the compute function above), and you can be sure that it'll be called synchronously. Example:
private final LoadingCache<String, List<String>> entries = CacheBuilder.newBuilder()
.build(new CacheLoader<String, List<String>>() {
@Override
public List<String> load(String s) throws Exception {
return Collections.synchronizedList(new ArrayList<String>()); // add() is called from several threads
}
});
public void method2(String key, String value) {
entries.getUnchecked(key).add(value);
}
3- If you cannot use Guava either, you can always synchronise manually and use double-checked locking. Example:
private final ConcurrentMap<String, List<String>> entries =
new ConcurrentHashMap<String, List<String>>();
public void method3(String key, String value) {
List<String> existing = entries.get(key);
if (existing != null) {
existing.add(value);
} else {
synchronized (entries) {
List<String> existingSynchronized = entries.get(key);
if (existingSynchronized != null) {
existingSynchronized.add(value);
} else {
List<String> newList = Collections.synchronizedList(new ArrayList<String>()); // later add() calls happen outside the lock
newList.add(value);
entries.put(key, newList);
}
}
}
}
I made an example implementation of all those 3 methods and additionally, the non-synchronized method, which causes extra object creation: http://pastebin.com/qZ4DUjTr
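The "at most once per key" guarantee quoted from the ConcurrentHashMap javadoc above can be observed with a small experiment. This is only a sketch under assumptions: the class `ComputeIfAbsentOnce`, the fixed key "key" and the counter are made up for illustration, and the list is wrapped in `Collections.synchronizedList` because `add` runs outside the map's atomic section.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical experiment: many threads race on computeIfAbsent for one key.
class ComputeIfAbsentOnce {

    // Returns {number of times the mapping function ran, size of the shared list}.
    static int[] run(int threads) {
        final ConcurrentHashMap<String, List<String>> entries = new ConcurrentHashMap<>();
        final AtomicInteger factoryCalls = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() ->
                    entries.computeIfAbsent("key", k -> {
                        factoryCalls.incrementAndGet(); // counts list creations
                        return Collections.synchronizedList(new ArrayList<String>());
                    }).add("value"));
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return new int[] { factoryCalls.get(), entries.get("key").size() };
    }
}
```

With 8 threads, `run(8)` reports one list creation and 8 stored values: the mapping function ran once, and every thread appended to the same list.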
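The memory (and GC) cost of creating empty ArrayLists was addressed in Java 7 update 40 (1.7.0_40): since then, an empty ArrayList does not allocate its backing array until the first element is added. So don't worry about creating empty ArrayLists.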
Reference : http://javarevisited.blogspot.com.tr/2014/07/java-optimization-empty-arraylist-and-Hashmap-cost-less-memory-jdk-17040-update.html
The approach with putIfAbsent has the fastest execution time; it is from 2 to 50 times faster than the "lambda" approach in environments with high contention. The lambda isn't the reason behind this "power loss"; the issue is the compulsory synchronisation inside computeIfAbsent prior to the Java 9 optimisations.
The benchmark:
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
public class ConcurrentHashMapTest {
private final static int numberOfRuns = 1000000;
private final static int numberOfThreads = Runtime.getRuntime().availableProcessors();
private final static int keysSize = 10;
private final static String[] strings = new String[keysSize];
static {
for (int n = 0; n < keysSize; n++) {
strings[n] = "" + (char) ('A' + n);
}
}
public static void main(String[] args) throws InterruptedException {
for (int n = 0; n < 20; n++) {
testPutIfAbsent();
testComputeIfAbsentLamda();
}
}
private static void testPutIfAbsent() throws InterruptedException {
final AtomicLong totalTime = new AtomicLong();
final ConcurrentHashMap<String, AtomicInteger> map = new ConcurrentHashMap<String, AtomicInteger>();
final Random random = new Random();
ExecutorService executorService = Executors.newFixedThreadPool(numberOfThreads);
for (int i = 0; i < numberOfThreads; i++) {
executorService.execute(new Runnable() {
@Override
public void run() {
long start, end;
for (int n = 0; n < numberOfRuns; n++) {
String s = strings[random.nextInt(strings.length)];
start = System.nanoTime();
AtomicInteger count = map.get(s);
if (count == null) {
count = new AtomicInteger(0);
AtomicInteger prevCount = map.putIfAbsent(s, count);
if (prevCount != null) {
count = prevCount;
}
}
count.incrementAndGet();
end = System.nanoTime();
totalTime.addAndGet(end - start);
}
}
});
}
executorService.shutdown();
executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
System.out.println("Test " + Thread.currentThread().getStackTrace()[1].getMethodName()
+ " average time per run: " + (double) totalTime.get() / numberOfThreads / numberOfRuns + " ns");
}
private static void testComputeIfAbsentLamda() throws InterruptedException {
final AtomicLong totalTime = new AtomicLong();
final ConcurrentHashMap<String, AtomicInteger> map = new ConcurrentHashMap<String, AtomicInteger>();
final Random random = new Random();
ExecutorService executorService = Executors.newFixedThreadPool(numberOfThreads);
for (int i = 0; i < numberOfThreads; i++) {
executorService.execute(new Runnable() {
@Override
public void run() {
long start, end;
for (int n = 0; n < numberOfRuns; n++) {
String s = strings[random.nextInt(strings.length)];
start = System.nanoTime();
AtomicInteger count = map.computeIfAbsent(s, (k) -> new AtomicInteger(0));
count.incrementAndGet();
end = System.nanoTime();
totalTime.addAndGet(end - start);
}
}
});
}
executorService.shutdown();
executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
System.out.println("Test " + Thread.currentThread().getStackTrace()[1].getMethodName()
+ " average time per run: " + (double) totalTime.get() / numberOfThreads / numberOfRuns + " ns");
}
}
The results:
Test testPutIfAbsent average time per run: 115.756501 ns
Test testComputeIfAbsentLamda average time per run: 276.9667055 ns
Test testPutIfAbsent average time per run: 134.2332435 ns
Test testComputeIfAbsentLamda average time per run: 223.222063625 ns
Test testPutIfAbsent average time per run: 119.968893625 ns
Test testComputeIfAbsentLamda average time per run: 216.707419875 ns
Test testPutIfAbsent average time per run: 116.173902375 ns
Test testComputeIfAbsentLamda average time per run: 215.632467375 ns
Test testPutIfAbsent average time per run: 112.21422775 ns
Test testComputeIfAbsentLamda average time per run: 210.29563725 ns
Test testPutIfAbsent average time per run: 120.50643475 ns
Test testComputeIfAbsentLamda average time per run: 200.79536475 ns
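A possible middle ground, sketched here on the assumption that most calls hit keys that already exist: try a plain get first and fall back to computeIfAbsent only on a miss. The names CounterSketch, increment and get are hypothetical, and this variant was not part of the benchmark above, so no timing is claimed for it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of the benchmark above.
class CounterSketch {
    private final ConcurrentHashMap<String, AtomicInteger> map = new ConcurrentHashMap<>();

    // Non-blocking read first; computeIfAbsent only runs when the key is missing.
    int increment(String key) {
        AtomicInteger count = map.get(key);
        if (count == null) {
            count = map.computeIfAbsent(key, k -> new AtomicInteger(0));
        }
        return count.incrementAndGet();
    }

    // Current count for a key, or 0 if the key was never incremented.
    int get(String key) {
        AtomicInteger count = map.get(key);
        return count == null ? 0 : count.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CounterSketch counter = new CounterSketch();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter.increment("A");
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(counter.get("A")); // prints 200000: no increment is lost
    }
}
```

The only change from the plain computeIfAbsent version is the initial get, so the result is the same; only the path taken for existing keys differs.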
I have a Java HashMap called statusCountMap.
Calling size() results in 30.
But if I count the entries manually, it's 31
This is in one of my TestNG unit tests. These results below are from Eclipse's Display window (type code -> highlight -> hit Display Result of Evaluating Selected Text).
statusCountMap.size()
(int) 30
statusCountMap.keySet().size()
(int) 30
statusCountMap.values().size()
(int) 30
statusCountMap
(java.util.HashMap) {40534-INACTIVE=2, 40526-INACTIVE=1, 40528-INACTIVE=1, 40492-INACTIVE=3, 40492-TOTAL=4, 40513-TOTAL=6, 40532-DRAFT=4, 40524-TOTAL=7, 40526-DRAFT=2, 40528-ACTIVE=1, 40524-DRAFT=2, 40515-ACTIVE=1, 40513-DRAFT=4, 40534-DRAFT=1, 40514-TOTAL=3, 40529-DRAFT=4, 40515-TOTAL=3, 40492-ACTIVE=1, 40528-TOTAL=4, 40514-DRAFT=2, 40526-TOTAL=3, 40524-INACTIVE=2, 40515-DRAFT=2, 40514-ACTIVE=1, 40534-TOTAL=3, 40513-ACTIVE=2, 40528-DRAFT=2, 40532-TOTAL=4, 40524-ACTIVE=3, 40529-ACTIVE=1, 40529-TOTAL=5}
statusCountMap.entrySet().size()
(int) 30
What gives? Has anyone experienced this?
I'm pretty sure statusCountMap is not being modified at this point.
There are 2 methods (let's call them methodA and methodB) that modify statusCountMap concurrently, by repeatedly calling incrementCountInMap.
private void incrementCountInMap(Map map, Long id, String qualifier) {
String key = id + "-" + qualifier;
if (map.get(key) == null) {
map.put(key, 0);
}
synchronized (map) {
map.put(key, map.get(key).intValue() + 1);
}
}
methodD is where I'm getting the issue. methodD is annotated with TestNG's @Test(dependsOnMethods = { "methodA", "methodB" }), so when methodD is executing, statusCountMap is pretty much static already.
I'm mentioning this because it might be a bug in TestNG.
I'm using Sun JDK 1.6.0_24. TestNG is testng-5.9-jdk15.jar
Hmmm ... after rereading my post, could it be the concurrent execution of map.get(key) == null and map.put(key, 0) outside the synchronized block that's causing this issue?
I believe you can achieve this if you modify a key after it is added to a HashMap.
However, in your case it appears to be just a case of modifying the same map from two threads without proper synchronization. E.g. if thread A calls map.put(key, 0) while thread B calls map.put(key2, 0), the result can be a size of 1 or 2. If you do the same with remove, you can end up with a size larger than it should be.
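The first point, modifying a key after it has been added, can be reproduced in isolation. This is a minimal sketch with a hypothetical Key class whose hashCode and equals depend on a mutable field; nothing here comes from the question's code.

```java
import java.util.HashMap;
import java.util.Map;

class MutableKeyDemo {
    // Hypothetical key type: hashCode/equals depend on a mutable field.
    static final class Key {
        int id;

        Key(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override
        public int hashCode() {
            return id;
        }

        @Override
        public String toString() {
            return "key" + id;
        }
    }

    static Map<Key, String> build() {
        Map<Key, String> map = new HashMap<>();
        Key first = new Key(1);
        map.put(first, "a");      // stored in the bucket chosen for hash 1
        first.id = 2;             // mutated in place; the entry stays in its old bucket
        map.put(new Key(2), "b"); // searched under hash 2, so the old entry is not found
        return map;
    }

    public static void main(String[] args) {
        Map<Key, String> map = build();
        System.out.println(map.size()); // prints 2
        System.out.println(map);        // two entries whose keys both print as key2
    }
}
```

Note that size() still agrees with the number of printed entries here, so a mutated key alone produces duplicate-looking keys but not a mismatch between size() and the entry count.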
Hmmm ... after rereading my post, could it be the concurrent execution of map.get(key) == null and map.put(key, 0) outside the synchronized block that's causing this issue?
In a word ... yes.
HashMap is not thread-safe. Therefore, if there is any point where two threads could update a HashMap without proper synchronization, the map could get into an inconsistent state. And even if one of the threads is only reading, that thread could see an inconsistent state for the map.
The correct way to write that method is:
private void incrementCountInMap(Map<String, Integer> map, Long id, String qualifier) {
String key = id + "-" + qualifier;
synchronized (map) {
Integer count = map.get(key);
map.put(key, count == null ? 1 : count + 1);
}
}
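If locking the whole map on every increment ever becomes a bottleneck, one alternative is a ConcurrentHashMap holding AtomicInteger values, filled via putIfAbsent (both available since Java 5, so usable on JDK 1.6). This is only a sketch: the StatusCounter class and its method names are hypothetical, and the key format simply mirrors the question's.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical replacement for the synchronized HashMap counter.
class StatusCounter {
    private final ConcurrentMap<String, AtomicInteger> counts =
            new ConcurrentHashMap<String, AtomicInteger>();

    void increment(Long id, String qualifier) {
        String key = id + "-" + qualifier;
        AtomicInteger count = counts.get(key);
        if (count == null) {
            AtomicInteger fresh = new AtomicInteger(0);
            // putIfAbsent returns the existing value if another thread won the race
            count = counts.putIfAbsent(key, fresh);
            if (count == null) {
                count = fresh;
            }
        }
        count.incrementAndGet();
    }

    int get(Long id, String qualifier) {
        AtomicInteger count = counts.get(id + "-" + qualifier);
        return count == null ? 0 : count.get();
    }

    int size() {
        return counts.size();
    }

    public static void main(String[] args) throws InterruptedException {
        final StatusCounter counter = new StatusCounter();
        Runnable task = new Runnable() {
            public void run() {
                for (int i = 0; i < 1000; i++) {
                    counter.increment(Long.valueOf(40520L + (i % 10)), "TOTAL");
                }
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(counter.size());                // prints 10
        System.out.println(counter.get(40520L, "TOTAL")); // prints 200
    }
}
```

The design choice is that the map only ever sees atomic inserts, while the counting itself happens inside the AtomicInteger, so no external lock is needed.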
If you are using the default initial capacity of 16 and accessing the map in a non-thread-safe manner, you're ripe for an inconsistent state. Size is a field in the map that gets updated as each item is entered (size++). This is because the map itself is an array of linked lists, and the array's length is not indicative of the number of items it contains, so the size cannot simply be read off it. Once the map reaches a percentage (the load factor) of its current capacity, it has to resize itself to accommodate more items. If a rogue thread is attempting to add items while the map is resizing, who knows what state the map will end up in.
The problem is that the first map.put(..) isn't synchronized. Either synchronize it, or use Collections.synchronizedMap(..). Test case:
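To make the failure mode concrete, here is a deliberately simplified, single-threaded model of how an unsynchronized check-then-insert can leave two entries for the same key. It is not HashMap's real implementation: ToyBucket, findKey, append and simulateRace are hypothetical names, and the two "threads" are just hand-ordered steps.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one hash bucket plus a separately tracked size counter.
class ToyBucket {
    private final List<String[]> entries = new ArrayList<>(); // {key, value} pairs
    private int size;                                          // updated on insert, like a size++ field

    boolean findKey(String key) {
        for (String[] e : entries) {
            if (e[0].equals(key)) {
                return true;
            }
        }
        return false;
    }

    void append(String key, String value) {
        entries.add(new String[] {key, value});
        size++;
    }

    int size() {
        return size;
    }

    int countKey(String key) {
        int n = 0;
        for (String[] e : entries) {
            if (e[0].equals(key)) {
                n++;
            }
        }
        return n;
    }

    // Replays one unlucky interleaving of two put("k1", ...) calls.
    static ToyBucket simulateRace() {
        ToyBucket bucket = new ToyBucket();
        boolean seenByA = bucket.findKey("k1"); // "thread A" checks: absent
        boolean seenByB = bucket.findKey("k1"); // "thread B" checks before A inserts: absent
        if (!seenByA) {
            bucket.append("k1", "0");           // A inserts
        }
        if (!seenByB) {
            bucket.append("k1", "0");           // B inserts the same key again
        }
        return bucket;
    }

    public static void main(String[] args) {
        ToyBucket bucket = simulateRace();
        System.out.println("size=" + bucket.size() + " copies of k1=" + bucket.countKey("k1"));
        // prints size=2 copies of k1=2
    }
}
```

With real threads the size++ itself is also a non-atomic read-modify-write, so the counter can additionally drift away from the number of entries actually stored.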
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
public class Test {
public static void main(String... args) throws InterruptedException {
final Random random = new Random();
final int max = 10;
for (int j = 0; j < 100000; j++) {
// final Map<String, Integer> map = Collections.synchronizedMap(new HashMap<String, Integer>());
final HashMap<String, Integer> map = new HashMap<String, Integer>();
Thread t = new Thread() {
public void run() {
for (int i = 0; i < 100; i++) {
incrementCountInMap(map, random.nextInt(max));
}
}
};
t.start();
for (int i = 0; i < 100; i++) {
incrementCountInMap(map, random.nextInt(max));
}
t.join();
if (map.size() != max) {
System.out.println("size: " + map.size() + " entries: " + map);
}
}
}
static void incrementCountInMap(Map<String, Integer> map, int id) {
String key = "k" + id;
if (map.get(key) == null) {
map.put(key, 0);
}
synchronized (map) {
map.put(key, map.get(key).intValue() + 1);
}
}
}
Some results I get:
size: 11 entries: {k3=24, k4=20, k5=16, k6=30, k7=16, k8=18, k9=11, k0=18, k1=16, k1=13, k2=18}
size: 11 entries: {k3=18, k4=19, k5=21, k6=20, k7=18, k8=26, k9=20, k0=16, k1=25, k2=15}
size: 11 entries: {k3=25, k4=20, k5=27, k6=15, k7=17, k8=17, k9=24, k0=21, k1=16, k1=1, k2=17}
size: 11 entries: {k3=13, k4=21, k5=18, k6=21, k7=13, k8=17, k9=25, k0=20, k1=23, k2=28}
size: 11 entries: {k3=21, k4=25, k5=19, k6=12, k7=17, k8=14, k9=23, k0=24, k1=26, k2=18}
size: 9 entries: {k3=13, k4=17, k5=23, k6=24, k7=18, k8=19, k9=28, k0=21, k1=17, k2=20}
size: 9 entries: {k3=15, k4=24, k5=21, k6=18, k7=21, k8=30, k9=20, k0=17, k1=15, k2=19}
size: 11 entries: {k3=15, k4=13, k5=21, k6=21, k7=15, k8=19, k9=23, k0=30, k1=15, k2=27}
size: 11 entries: {k3=29, k4=15, k5=19, k6=19, k7=15, k8=23, k9=14, k0=31, k1=18, k2=12}
size: 11 entries: {k3=17, k4=18, k5=20, k6=11, k6=13, k7=20, k8=22, k9=30, k0=12, k1=21, k2=16}