Intersecting ranges when using RangeMap - java

I came across the following problem:
You are maintaining a trading platform for a hedge fund. Traders in your hedge fund execute trading strategies throughout the day.
For the sake of simplicity let's assume that each trading strategy makes i pounds/minute for as long as it's running. i can be negative.
At the end of the day you have a log file that looks as follows:
timestamp_start_1, timestamp_end_1, i_1
timestamp_start_2, timestamp_end_2, i_2
timestamp_start_3, timestamp_end_3, i_3
Each line represents when the strategy started executing, when it stopped, and the income produced as a rate.
Write some code to return the time during the day when the hedge fund was making the highest amount of money per minute.
Examples:
Input:
(1, 13, 400)
(10, 20, 100)
Result :
(10,13) 500
Input:
(12,14, 400)
(10,20,100)
Result:
(12,14) 500
Input:
(10, 20, 400)
(21,25,100)
Result:
(10,20) 400
I have been trying to use Guava's RangeMap to solve it, but there is no apparent way to intersect the overlapping intervals.
For example:
private static void method(Record[] array){
RangeMap<Integer, Integer> rangeMap = TreeRangeMap.create();
for (Record record : array) {
rangeMap.put(Range.closed(record.startTime, record.endTime), record.profitRate);
}
System.out.println(rangeMap);
}
public static void main(String[] args) {
Record[] array = {new Record(1,13,400), new Record(10,20,100)};
method(array);
}
And the map looks like:
[[1..10)=400, [10..20]=100]
Is there any way to override the overlap behavior or any other data structure that can be used to solve the problem?

Use RangeMap.subRangeMap(Range) to identify all the existing range entries intersecting with some other particular range, which will allow you to filter out the intersections.
This might look something like:
void add(RangeMap<Integer, Integer> existing, Range<Integer> range, int add) {
List<Map.Entry<Range<Integer>, Integer>> overlaps = new ArrayList<>(
existing.subRangeMap(range).asMapOfRanges().entrySet());
existing.put(range, add);
for (Map.Entry<Range<Integer>, Integer> overlap : overlaps) {
existing.put(overlap.getKey(), overlap.getValue() + add);
}
}

As mentioned in the comments: A RangeMap is probably not suitable for this, because the ranges of a RangeMap have to be disjoint.
One general approach was mentioned in the comments: combine all the ranges and generate all disjoint ranges from them. For example, given these ranges
 |------------|       :400
          |----------|:100
one could compute all sub-ranges that are implied by their intersections
 |--------|           :400
          |---|       :500
              |------|:100
Where the range in the middle would obviously be the solution in this case.
But in general, the problem statement leaves some points open. For example, it is not entirely clear whether multiple ranges may have the same start time and/or the same end time. Whether the records are "ordered" in any way may also be relevant for possible optimizations.
But regardless of that, one generic approach could be as follows:
Compute a mapping from all start times to the records that start there
Compute a mapping from all end times to the records that end there
Walk through all start- and end times (in order), keeping track of the accumulated profit rate for the current interval
(Yes, this is basically the generation of the disjoint sets from the comments. But it does not "construct" a data structure holding this information. It just uses this information to compute the maximum, on the fly).
An implementation could look like this:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
class Record
{
int startTime;
int endTime;
int profitRate;
public Record(int startTime, int endTime, int profitRate)
{
this.startTime = startTime;
this.endTime = endTime;
this.profitRate = profitRate;
}
@Override
public String toString()
{
return "(" + startTime + "..." + endTime + ", " + profitRate + ")";
}
}
public class MaxRangeFinder
{
public static void main(String[] args)
{
test01();
test02();
}
private static void test01()
{
System.out.println("Test case 01:");
Record[] records = {
new Record(1,13,400),
new Record(10,20,100),
};
Record max = computeMax(Arrays.asList(records));
printNicely(Arrays.asList(records), max);
}
private static void test02()
{
System.out.println("Test case 02:");
Record[] records = {
new Record(1,5,100),
new Record(2,6,200),
new Record(3,4,50),
new Record(3,4,25),
new Record(5,8,200),
};
Record max = computeMax(Arrays.asList(records));
printNicely(Arrays.asList(records), max);
}
private static Record computeMax(Collection<? extends Record> records)
{
// Create mappings from the start times to all records that start
// there, and from the end times to the records that end there
Map<Integer, List<Record>> recordsByStartTime =
new LinkedHashMap<Integer, List<Record>>();
for (Record record : records)
{
recordsByStartTime.computeIfAbsent(record.startTime,
t -> new ArrayList<Record>()).add(record);
}
Map<Integer, List<Record>> recordsByEndTime =
new LinkedHashMap<Integer, List<Record>>();
for (Record record : records)
{
recordsByEndTime.computeIfAbsent(record.endTime,
t -> new ArrayList<Record>()).add(record);
}
// Collect all times where a record starts or ends
Set<Integer> eventTimes = new TreeSet<Integer>();
eventTimes.addAll(recordsByStartTime.keySet());
eventTimes.addAll(recordsByEndTime.keySet());
// Walk over all events, keeping track of the
// starting and ending records
int accumulatedProfitRate = 0;
int maxAccumulatedProfitRate = Integer.MIN_VALUE;
int maxAccumulatedProfitStartTime = 0;
int maxAccumulatedProfitEndTime = 0;
for (Integer eventTime : eventTimes)
{
int previousAccumulatedProfitRate = accumulatedProfitRate;
// Add the profit rate of the starting records
List<Record> startingRecords = Optional
.ofNullable(recordsByStartTime.get(eventTime))
.orElse(Collections.emptyList());
for (Record startingRecord : startingRecords)
{
accumulatedProfitRate += startingRecord.profitRate;
}
// Subtract the profit rate of the ending records
List<Record> endingRecords = Optional
.ofNullable(recordsByEndTime.get(eventTime))
.orElse(Collections.emptyList());
for (Record endingRecord : endingRecords)
{
accumulatedProfitRate -= endingRecord.profitRate;
}
// Update the information about the maximum, if necessary
if (accumulatedProfitRate > maxAccumulatedProfitRate)
{
maxAccumulatedProfitRate = accumulatedProfitRate;
maxAccumulatedProfitStartTime = eventTime;
maxAccumulatedProfitEndTime = eventTime;
}
if (previousAccumulatedProfitRate == maxAccumulatedProfitRate &&
accumulatedProfitRate < previousAccumulatedProfitRate)
{
maxAccumulatedProfitEndTime = eventTime;
}
}
return new Record(
maxAccumulatedProfitStartTime,
maxAccumulatedProfitEndTime,
maxAccumulatedProfitRate);
}
private static void printNicely(
Collection<? extends Record> records,
Record max)
{
StringBuilder sb = new StringBuilder();
int maxEndTime = Collections.max(records,
(r0, r1) -> Integer.compare(r0.endTime, r1.endTime)).endTime;
for (Record record : records)
{
sb.append("     ") // five spaces, so the rows line up with the "Max: " prefix
.append(createString(record, maxEndTime))
.append("\n");
}
sb.append("Max: ").append(createString(max, maxEndTime));
System.out.println(sb.toString());
}
private static String createString(Record record, int maxEndTime)
{
StringBuilder sb = new StringBuilder();
int i = 0;
while (i < record.startTime)
{
sb.append(" ");
i++;
}
sb.append("|");
while (i < record.endTime)
{
sb.append("-");
i++;
}
sb.append("|");
while (i < maxEndTime)
{
sb.append(" ");
i++;
}
sb.append(":").append(record.profitRate);
return sb.toString();
}
}
The output for the two test cases given in the code is
Test case 01:
      |------------|       :400
               |----------|:100
Max:           |---|       :500
Test case 02:
      |----|   :100
       |----|  :200
        |-|    :50
        |-|    :25
          |---|:200
Max:      |-|  :400
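The same sweep can also be written more compactly by storing only the net change of the rate at each timestamp. The following is a minimal sketch, not part of the answer above: the class name DeltaSweep, the method maxInterval and the {start, end, rate} array layout are made up for illustration, and it assumes that a strategy ending at time t and another one starting at t do not overlap.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

class DeltaSweep {
    /** Returns {start, end, rate} of the first interval with the highest combined rate. */
    static int[] maxInterval(int[][] records) {
        // Net change of the combined rate at each timestamp:
        // +rate where a strategy starts, -rate where it ends.
        TreeMap<Integer, Integer> deltas = new TreeMap<>();
        for (int[] r : records) {
            deltas.merge(r[0], r[2], Integer::sum);
            deltas.merge(r[1], -r[2], Integer::sum);
        }
        int rate = 0;
        int bestRate = Integer.MIN_VALUE;
        int bestStart = 0;
        int bestEnd = 0;
        Integer previousTime = null;
        for (Map.Entry<Integer, Integer> event : deltas.entrySet()) {
            // 'rate' is constant between previousTime and the current event time
            if (previousTime != null && rate > bestRate) {
                bestRate = rate;
                bestStart = previousTime;
                bestEnd = event.getKey();
            }
            rate += event.getValue();
            previousTime = event.getKey();
        }
        return new int[] {bestStart, bestEnd, bestRate};
    }

    public static void main(String[] args) {
        int[][] records = {{1, 13, 400}, {10, 20, 100}};
        System.out.println(Arrays.toString(maxInterval(records))); // prints [10, 13, 500]
    }
}
```

For the second test case above this yields 5, 6 and 400, which agrees with the output of MaxRangeFinder. If several intervals share the maximum rate, only the first one is reported.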

Related

Find most occurring String within ArrayList

I am kind of new to Java and wanted to ask how I can find the most occurring String within an ArrayList.
I have two classes.
One holds game activities and the winner for those games.
The other class is supposed to be the Event that holds all the games.
For the event class, I have to find the person with the most game activities won. I made an ArrayList in the events class that holds all the winners from the games. Now I need to find the name that occurs most often in the ArrayList and output the String.
private ArrayList<Game> games;
public String getEventWinner(){
ArrayList<String> winner;
winner = new ArrayList<String>();
for(Game game : games)
{
winner.add(game.getWinner());
}
return eventWinner;
}
That gets me all the winners from the games in an ArrayList, but now I do not know how to proceed and couldn't find any answer online. Could somebody lead me in the right direction?
Calculate the frequencies of the strings into a map, find the entry with max value, and return it.
With Stream API this could look like this:
public static String getEventWinner(List<Game> games) {
return games.stream()
.map(Game::getWinner) // Stream<String>
.collect(Collectors.groupingBy(
winner -> winner, LinkedHashMap::new,
Collectors.summingInt(x -> 1)
)) // build the frequency map
.entrySet().stream()
.max(Map.Entry.comparingByValue()) // Optional<Map.Entry<String, Integer>>
.map(Map.Entry::getKey) // the key - winner name
.orElse(null);
}
Here LinkedHashMap acts as a tie breaker: it preserves insertion order, so among people with the same number of wins the one who appeared earlier in the game list is selected as the winner.
Test:
System.out.println(getEventWinner(Arrays.asList(
new Game("John"), new Game("Zach"), new Game("Zach"),
new Game("Chad"), new Game("John"), new Game("Jack")
)));
// output John
If the winner should instead be the person who reached the maximum number of wins first, another loop-based solution should be used:
public static String getEventWinnerFirst(List<Game> games) {
Map<String, Integer> winMap = new HashMap<>();
String resultWinner = null;
int max = -1;
for (Game game : games) {
String winner = game.getWinner();
int tempSum = winMap.merge(winner, 1, Integer::sum);
if (max < tempSum) {
resultWinner = winner;
max = tempSum;
}
}
return resultWinner;
}
For the same input data, the output will be Zach because he reached two wins earlier in the list than John.
Update 2
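It may be possible to find the earliest achiever of the max result using Stream, but a temporary map needs to be created to store the number of wins. Note that Map.entry requires Java 9 or later, and that the stateful lambda is only reliable on a sequential stream:
public static String getEventWinner(List<Game> games) {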
Map<String, Integer> tmp = new HashMap<>();
return games.stream()
.map(Game::getWinner) // Stream<String>
.map(winner -> Map.entry(winner, tmp.merge(winner, 1, Integer::sum))) // store sum in tmp map
.collect(Collectors.maxBy(Map.Entry.comparingByValue()))
.map(Map.Entry::getKey) // the key - winner name
.orElse(null);
}
Stream APIs are great but can be a bit cryptic for the uninitiated. I find that non streaming code is often more readable.
Here is a solution in that spirit:
List<String> winners = Arrays.asList("a","b","c","c","b","b");
Map<String,Integer> entries = new HashMap<>();
String mostFrequent=null;
int mostFrequentCount=0;
for (String winner : winners) {
Integer count = entries.get(winner);
if (count == null) count = 0;
count = count + 1; // number of wins including the current one
entries.put(winner, count);
if (count>=mostFrequentCount)
{
mostFrequentCount=count;
mostFrequent=winner;
}
}
System.out.println("Most frequent = " + mostFrequent + " # of wins = " + mostFrequentCount );

Sorting Arraylist using Bubblesort in parallel with fork/join pool

What I am trying to do is sort an ArrayList using a Fork/Join pool. The algorithm that I use to sort doesn't matter: I just chose a random one here. What matters is how I am supposed to use the RecursiveTask with the fork/join pool so that the ArrayList keeps on splitting until its size reaches a certain number (like 1000); then it should perform the sorting and join back into one ArrayList.
Here is my code:
assignment5
public class assignment5 {
ArrayList<Integer> numbers = new ArrayList<>();
//lets say this arraylist is full with random numbers
public void run(){
Instant start = Instant.now();
MyRecursiveTask myRecursiveTask = new MyRecursiveTask(numbers);
ArrayList<Integer> mergedResult = ForkJoinPool.invoke(myRecursiveTask);
Instant end = Instant.now();
Duration duration = Duration.between(start, end);
System.out.println("Seconds: " + duration.getSeconds());
}
}
In assignment5, I get an error on
ForkJoinPool.invoke(myRecursiveTask);
It says: non-static method invoke cannot be referenced from a static context
MyRecursiveTask
private List<Integer> numbers;
protected List<Integer> compute() {
//numbers here is the same as in assignment 5
//if work is above threshold, break tasks up into smaller tasks
if(this.numbers.size() > 1000) {
System.out.println("Splitting workLoad : " + this.numbers.size());
List<MyRecursiveTask> subtasks = new ArrayList<MyRecursiveTask>();
subtasks.addAll(createSubtasks());
for(MyRecursiveTask subtask : subtasks){
subtask.fork();
}
for(MyRecursiveTask subtask : subtasks) {
subtask.join();
}
return numbers;
} else {
System.out.println("Doing workLoad myself: " + this.numbers.size());
bubbleSort(numbers);
}
return numbers;
}
private List<MyRecursiveTask> createSubtasks() {
List<MyRecursiveTask> subtasks = new ArrayList<MyRecursiveTask>();
List<Integer> list1 = numbers.subList(0,numbers.size()/2);
List<Integer> list2 = numbers.subList(numbers.size()/2, numbers.size());
MyRecursiveTask subtask1 = new MyRecursiveTask(list1);
MyRecursiveTask subtask2 = new MyRecursiveTask(list2);
subtasks.add(subtask1);
subtasks.add(subtask2);
return subtasks;
}
public void bubbleSort(List<Integer> numbers){
//bubble sort alg here
}
}
ForkJoinPool.invoke() is not a static method, but you are calling it as if it were.
You need to create an instance of ForkJoinPool and call invoke on that instance.
Also, because compute() returns a List<Integer>, you need an explicit cast to assign the result to an ArrayList (or you can declare mergedResult as List<Integer> and drop the cast).
Replace the line that gives you the error with these and it will compile:
ForkJoinPool forkJoinPool = new ForkJoinPool();
ArrayList<Integer> mergedResult = (ArrayList<Integer>) forkJoinPool.invoke(myRecursiveTask);
Hope I helped!
PS: If you haven't done it already, I invite you to read the comments of the other users, you may find them helpful.
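Putting the pieces together, a complete task could look roughly like the sketch below. It is an illustration under assumptions, not the asker's code: the class name ParallelSortTask, the merge helper and the sort entry point are hypothetical, Collections.sort stands in for the bubble sort, and the sorted halves are merged into a new list because the compute() method in the question never combines the sub-results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ParallelSortTask extends RecursiveTask<List<Integer>> {
    private static final int THRESHOLD = 1000; // same cut-off as in the question
    private final List<Integer> numbers;

    ParallelSortTask(List<Integer> numbers) {
        this.numbers = numbers;
    }

    @Override
    protected List<Integer> compute() {
        if (numbers.size() <= THRESHOLD) {
            // Small enough: sort a private copy directly
            // (stand-in for the bubble sort of the question).
            List<Integer> copy = new ArrayList<>(numbers);
            Collections.sort(copy);
            return copy;
        }
        int mid = numbers.size() / 2;
        ParallelSortTask left = new ParallelSortTask(numbers.subList(0, mid));
        ParallelSortTask right = new ParallelSortTask(numbers.subList(mid, numbers.size()));
        left.fork();                                  // schedule the left half asynchronously
        List<Integer> rightSorted = right.compute();  // sort the right half in this thread
        List<Integer> leftSorted = left.join();       // wait for the left half
        return merge(leftSorted, rightSorted);
    }

    /** Merges two sorted lists into a new sorted list. */
    private static List<Integer> merge(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    /** Hypothetical entry point: invoke() is called on a pool instance, not statically. */
    static List<Integer> sort(List<Integer> input) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new ParallelSortTask(input));
        } finally {
            pool.shutdown();
        }
    }
}
```

The result is declared as List<Integer>, so no cast is needed. The input list is only read, never modified, which is what makes sharing its subList views between tasks safe in this sketch.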

Getting the sum and max from a list using stream

Hi I have a List where the data looks like this
[{"month":"April","day":"Friday","count":5},
{"month":"April","day":"Monday","count":6},
{"month":"April","day":"Saturday","count":2},
{"month":"April","day":"Sunday","count":1},
{"month":"April","day":"Thursday","count":7},
{"month":"April","day":"Tuesday","count":8},
{"month":"April","day":"Wednesday","count":10},
{"month":"March","day":"Friday","count":3},
{"month":"March","day":"Monday","count":2},
{"month":"March","day":"Saturday","count":15},
{"month":"March","day":"Sunday","count":11},
{"month":"March","day":"Thursday","count":4},
{"month":"March","day":"Tuesday","count":20},
{"month":"March","day":"Wednesday","count":7},
{"month":"May","day":"Friday","count":2},
{"month":"May","day":"Monday","count":0},
{"month":"May","day":"Saturday","count":7},
{"month":"May","day":"Sunday","count":4},
{"month":"May","day":"Thursday","count":8},
{"month":"May","day":"Tuesday","count":3},
{"month":"May","day":"Wednesday","count":6}]
My object class is
String month;
String day;
Integer count;
What I want to get by using stream is sum of count grouped by month and the day with max count for that month.
so end result will look something like
April, Wednesday, 39
March, Tuesday, 62
May, Thursday , 30
I have been trying to use stream and grouping by but no luck. Any help is appreciated. Thanks
EDIT
Map<String, Integer> totalMap = transactions.stream().collect(Collectors.groupingBy(MonthlyTransaction::getMonth, Collectors.summingInt(MonthlyTransaction::getCount)));
Map<String, String> maxMap = transactions.stream().collect(Collectors.groupingBy(MonthlyTransaction::getMonth)).values().stream().toMap(Object::getDay, Collextions.max(Object::getCount);
obviously the maxMap method is wrong but I do not know how to write it.
If you want to find both the sum of counts per month and the day with the max count per month in a single pass, I think you need a custom collector.
First, let's create a holder class where to store the results:
public class Statistics {
private final String dayWithMaxCount;
private final long totalCount;
public Statistics(String dayWithMaxCount, long totalCount) {
this.dayWithMaxCount = dayWithMaxCount;
this.totalCount = totalCount;
}
// TODO getters and toString
}
Then, create this method, which returns a collector that accumulates both the sum of counts and the max count, along with the day in which that max was found:
public static Collector<MonthlyTransaction, ?, Statistics> withStatistics() {
class Acc {
long sum = 0;
long maxCount = Long.MIN_VALUE;
String dayWithMaxCount;
void accumulate(MonthlyTransaction transaction) {
sum += transaction.getCount();
if (transaction.getCount() > maxCount) {
maxCount = transaction.getCount();
dayWithMaxCount = transaction.getDay();
}
}
Acc merge(Acc another) {
sum += another.sum;
if (another.maxCount > maxCount) {
maxCount = another.maxCount;
dayWithMaxCount = another.dayWithMaxCount;
}
return this;
}
Statistics finish() {
return new Statistics(dayWithMaxCount, sum);
}
}
return Collector.of(Acc::new, Acc::accumulate, Acc::merge, Acc::finish);
}
This uses the local class Acc to accumulate and merge partial results. The finish method returns an instance of the Statistics class, which holds the final results. At the end, I'm using Collector.of to create a collector based on the methods of the Acc class.
Finally, you can use the method and class defined above as follows:
Map<String, Statistics> statisticsByMonth = transactions.stream()
.collect(Collectors.groupingBy(MonthlyTransaction::getMonth, withStatistics()));
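For comparison, here is a sketch of an alternative that avoids a custom collector by first collecting each month's transactions into a list and then summarizing that list. It is not the single-pass collector described above: each group is traversed twice. The class MonthlyStatsSketch, the helper describe and the "day, total" string format are assumptions made for this example, and MonthlyTransaction is a minimal stand-in for the class in the question.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MonthlyStatsSketch {
    /** Minimal stand-in for the MonthlyTransaction class of the question. */
    static final class MonthlyTransaction {
        private final String month;
        private final String day;
        private final int count;

        MonthlyTransaction(String month, String day, int count) {
            this.month = month;
            this.day = day;
            this.count = count;
        }

        String getMonth() { return month; }
        String getDay() { return day; }
        int getCount() { return count; }
    }

    /** Formats one month's group as "dayWithMaxCount, totalCount". */
    static String describe(List<MonthlyTransaction> group) {
        int total = group.stream().mapToInt(MonthlyTransaction::getCount).sum();
        String maxDay = group.stream()
                .max(Comparator.comparingInt(MonthlyTransaction::getCount))
                .map(MonthlyTransaction::getDay)
                .orElse("");
        return maxDay + ", " + total;
    }

    /** Groups by month, then reduces each group to its summary string. */
    static Map<String, String> summarize(List<MonthlyTransaction> transactions) {
        return transactions.stream().collect(Collectors.groupingBy(
                MonthlyTransaction::getMonth,
                Collectors.collectingAndThen(Collectors.toList(), MonthlyStatsSketch::describe)));
    }

    public static void main(String[] args) {
        List<MonthlyTransaction> sample = Arrays.asList(
                new MonthlyTransaction("April", "Friday", 5),
                new MonthlyTransaction("April", "Wednesday", 10),
                new MonthlyTransaction("March", "Monday", 2),
                new MonthlyTransaction("March", "Tuesday", 20));
        System.out.println(summarize(sample));
    }
}
```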
I did this in two steps instead of trying to write one stream to achieve the result:
//First get the total of counts grouping by month
Map<String, Integer> totalMap = transactions.stream()
.collect(Collectors.groupingBy(MonthlyTransaction::getMonth, Collectors.summingInt(MonthlyTransaction::getCount)));
List<MonthlyTransaction> finalStat = new ArrayList<>();
//iterate over the total count map
totalMap.entrySet().stream().forEach(entry -> {
//Using the Stream filter to mimic a group by
MonthlyTransaction maxStat = transactions.stream()
.filter(t -> t.getMonth().equals(entry.getKey()))
//getting the item with the max count for the month
.max(Comparator.comparing(MonthlyTransaction::getCount)).get();
//Setting the count to the total value from the map as the max count value is not a requirement.
maxStat.setCount(entry.getValue());
//add the item to the list
finalStat.add(maxStat);
});
This may not be the best approach to the problem but this gives me the exact result. Thanks to everyone who had a look at it and tried to help.

Search multiple HashMaps at the same time

tldr: How can I search for an entry in multiple (read-only) Java HashMaps at the same time?
The long version:
I have several dictionaries of various sizes stored as HashMap< String, String >. Once they are read in, they are never to be changed (strictly read-only).
I want to check whether and which dictionary had stored an entry with my key.
My code was originally looking for a key like this:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
if (map.containsKey(key))
return new DictionaryEntry(map.get(key), i);
}
return null;
}
Then it got a little more complicated: my search string could contain typos, or was a variant of the stored entry. Like, if the stored key was "banana", it is possible that I'd look up "bannana" or "a banana", but still would like the entry for "banana" returned. Using the Levenshtein-Distance, I now loop through all dictionaries and each entry in them:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
for (Map.Entry<String, String> entry : map.entrySet()) {
// Calculate Levenshtein distance, store closest match etc.
}
}
// return closest match or null.
}
So far everything works as it should and I'm getting the entry I want. Unfortunately I have to look up around 7000 strings, in five dictionaries of various sizes (~ 30 - 70k entries) and it takes a while. From my processing output I have the strong impression my lookup dominates overall runtime.
My first idea to improve runtime was to search all dictionaries in parallel. Since none of the dictionaries is to be changed and no more than one thread is accessing a dictionary at the same time, I don't see any safety concerns.
The question is just: how do I do this? I have never used multithreading before. My search only came up with Concurrent HashMaps (but to my understanding, I don't need this) and the Runnable-class, where I'd have to put my processing into the method run(). I think I could rewrite my current class to fit into Runnable, but I was wondering if there is maybe a simpler method to do this (or how can I do it simply with Runnable, right now my limited understanding thinks I have to restructure a lot).
Since I was asked to share the Levenshtein-Logic: It's really nothing fancy, but here you go:
private int _maxLSDistance = 10;
public Map.Entry getClosestMatch(String key) {
Map.Entry _closestMatch = null;
int _lsDistance = Integer.MAX_VALUE; // distance of the closest match found so far
if (key == null) {
return null;
}
for (Map.Entry entry : _dictionary.entrySet()) {
// Perfect match
if (entry.getKey().equals(key)) {
return entry;
}
// Similar match
else {
int dist = StringUtils.getLevenshteinDistance((String) entry.getKey(), key);
// If "dist" is smaller than threshold and smaller than distance of already stored entry
if (dist < _maxLSDistance) {
if (_closestMatch == null || dist < _lsDistance) {
_closestMatch = entry;
_lsDistance = dist;
}
}
}
}
return _closestMatch;
}
To use multi-threading in your case, you could do something like this:
The "monitor" class, which basically stores the results and coordinates the threads:
public class Results {
private int nrOfDictionaries = 4; // number of dictionaries still being searched
private ArrayList<String> results = new ArrayList<String>();
public void prepare() {
nrOfDictionaries = 4;
results = new ArrayList<String>();
}
public synchronized void oneDictionaryFinished() {
nrOfDictionaries--;
System.out.println("one dictionary finished");
notifyAll();
}
public synchronized boolean isReady() throws InterruptedException {
while (nrOfDictionaries != 0) {
wait();
}
return true;
}
public synchronized void addResult(String result) {
results.add(result);
}
public ArrayList<String> getAllResults() {
return results;
}
}
The thread itself, which can be set to search a specific dictionary:
public class ThreadDictionarySearch extends Thread {
// the actual dictionary
private String dictionary;
private Results results;
public ThreadDictionarySearch(Results results, String dictionary) {
this.dictionary = dictionary;
this.results = results;
}
@Override
public void run() {
for (int i = 0; i < 4; i++) {
// search dictionary;
results.addResult("result of " + dictionary);
System.out.println("adding result from " + dictionary);
}
results.oneDictionaryFinished();
}
}
And the main method for demonstration:
public static void main(String[] args) throws Exception {
Results results = new Results();
ThreadDictionarySearch threadA = new ThreadDictionarySearch(results, "dictionary A");
ThreadDictionarySearch threadB = new ThreadDictionarySearch(results, "dictionary B");
ThreadDictionarySearch threadC = new ThreadDictionarySearch(results, "dictionary C");
ThreadDictionarySearch threadD = new ThreadDictionarySearch(results, "dictionary D");
threadA.start();
threadB.start();
threadC.start();
threadD.start();
if (results.isReady()) {
// it stays here until all dictionaries are searched
// because in "Results" it's told to wait() while not finished;
for (String string : results.getAllResults()) {
System.out.println("RESULT: " + string);
}
}
}
I think the easiest would be to use a stream over the entry set:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
map.entrySet().parallelStream().forEach( (entry) ->
{
// Calculate Levenshtein distance, store closest match etc.
}
);
}
// return closest match or null.
}
Provided you are using Java 8, of course. You could also wrap the outer loop into an IntStream, and you could use Stream.reduce directly to get the entry with the smallest distance.
Maybe try thread pools:
ExecutorService es = Executors.newFixedThreadPool(_numDictionaries);
for (int i = 0; i < _numDictionaries; i++) {
//prepare a Runnable implementation that contains a logic of your search
es.submit(prepared_runnable);
}
I believe you may also try to find a quick estimate of strings that completely do not match (i.e. significant difference in length), and use it to finish your logic ASAP, moving to next candidate.
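The thread-pool idea can be fleshed out as in the following sketch, which submits one task per dictionary and keeps the closest key across all of them. Everything named here is an assumption for illustration: the class ParallelDictionarySearch, the Match holder and the closest method are hypothetical, and a small hand-written Levenshtein routine replaces StringUtils.getLevenshteinDistance so that the example has no external dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelDictionarySearch {
    /** Hypothetical result holder: matched key, its distance, and the dictionary index. */
    static final class Match {
        final String key;
        final int distance;
        final int dictionaryIndex;

        Match(String key, int distance, int dictionaryIndex) {
            this.key = key;
            this.distance = distance;
            this.dictionaryIndex = dictionaryIndex;
        }
    }

    /** Plain dynamic-programming Levenshtein distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Searches each dictionary in its own task; returns the overall closest key or null. */
    static Match closest(List<Map<String, String>> dictionaries, String query, int maxDistance) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, dictionaries.size()));
        try {
            List<Callable<Match>> tasks = new ArrayList<>();
            for (int d = 0; d < dictionaries.size(); d++) {
                final int index = d;
                tasks.add(() -> {
                    Match best = null;
                    // The maps are only read here, never modified.
                    for (String key : dictionaries.get(index).keySet()) {
                        int dist = levenshtein(key, query);
                        if (dist < maxDistance && (best == null || dist < best.distance)) {
                            best = new Match(key, dist, index);
                        }
                    }
                    return best;
                });
            }
            Match overall = null;
            for (Future<Match> future : pool.invokeAll(tasks)) { // blocks until all tasks finish
                Match m = future.get();
                if (m != null && (overall == null || m.distance < overall.distance)) {
                    overall = m;
                }
            }
            return overall;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("dictionary search failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

For repeated lookups it would probably be better to create the pool once and reuse it, rather than creating a new one per query as this sketch does for brevity.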
I have strong doubts that HashMaps are a suitable solution here, especially if you want some fuzzy matching and stop words. You should use a proper full-text search solution like Elasticsearch or Apache Solr, or at least an available engine like Apache Lucene.
That being said, you can use a poor man's version: Create an array of your maps and a SortedMap, iterate over the array, take the keys of the current HashMap and store them in the SortedMap with the index of their HashMap. To retrieve a key, you first search in the SortedMap for said key, get the respective HashMap from the array using the index position and lookup the key in only one HashMap. Should be fast enough without the need for multiple threads to dig through the HashMaps. However, you could make the code below into a runnable and you can have multiple lookups in parallel.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
public class Search {
public static void main(String[] arg) {
if (arg.length == 0) {
System.out.println("Must give a search word!");
System.exit(1);
}
String searchString = arg[0].toLowerCase();
/*
* Populating our HashMaps.
*/
HashMap<String, String> english = new HashMap<String, String>();
english.put("banana", "fruit");
english.put("tomato", "vegetable");
HashMap<String, String> german = new HashMap<String, String>();
// Keys are stored in lowercase so that the lowercased search string matches them
german.put("banane", "Frucht");
german.put("tomate", "Gemüse");
/*
* Now we create our ArrayList of HashMaps for fast retrieval
*/
List<HashMap<String, String>> maps = new ArrayList<HashMap<String, String>>();
maps.add(english);
maps.add(german);
/*
* This is our index
*/
SortedMap<String, Integer> index = new TreeMap<String, Integer>(String.CASE_INSENSITIVE_ORDER);
/*
* Populating the index:
*/
for (int i = 0; i < maps.size(); i++) {
// We iterate through our HashMaps...
HashMap<String, String> currentMap = maps.get(i);
for (String key : currentMap.keySet()) {
/* ...and populate our index with lowercase versions of the keys,
* referencing the array from which the key originates.
*/
index.put(key.toLowerCase(), i);
}
}
// In case our index contains our search string...
if (index.containsKey(searchString)) {
/*
* ... we find out in which map of the ones stored in maps
* the word in the index originated from.
*/
Integer mapIndex = index.get(searchString);
/*
* Next, we look up said map.
*/
HashMap<String, String> origin = maps.get(mapIndex);
/*
* Last, we retrieve the value from the origin map
*/
String result = origin.get(searchString);
/*
* The above steps can be shortened to
* String result = maps.get(index.get(searchString).intValue()).get(searchString);
*/
System.out.println(result);
} else {
System.out.println("\"" + searchString + "\" is not in the index!");
}
}
}
Please note that this is a rather naive implementation only provided for illustration purposes. It doesn't address several problems (you can't have duplicate index entries, for example).
With this solution, you are basically trading startup speed for query speed.
Okay!
Since your concern is a faster response, I would suggest dividing the work between threads.
Say you have five dictionaries: you could give three dictionaries to one thread and let another thread take care of the remaining two.
Whichever thread finds the match first then halts or terminates the other thread.
You may need some extra logic to divide the work, but that won't affect your performance.
You may also need a few more changes in your code to get your close match:
for (Map.Entry entry : _dictionary.entrySet()) {
You are iterating over the entry set without using the values, and getting the entry set seems a bit expensive. Since you are not really interested in the values in that map, I would suggest just using keySet:
for (String key : _dictionary.keySet()) {
For more details on the performance of maps, please read this link: Map performances
Iteration over the collection-views of a LinkedHashMap requires time proportional to the size of the map, regardless of its capacity. Iteration over a HashMap is likely to be more expensive, requiring time proportional to its capacity.

ConcurrentHashMap: avoid extra object creation with "putIfAbsent"?

I am aggregating multiple values for keys in a multi-threaded environment. The keys are not known in advance. I thought I would do something like this:
class Aggregator {
protected ConcurrentHashMap<String, List<String>> entries =
new ConcurrentHashMap<String, List<String>>();
public Aggregator() {}
public void record(String key, String value) {
List<String> newList =
Collections.synchronizedList(new ArrayList<String>());
List<String> existingList = entries.putIfAbsent(key, newList);
List<String> values = existingList == null ? newList : existingList;
values.add(value);
}
}
The problem I see is that every time this method runs, I need to create a new instance of an ArrayList, which I then throw away (in most cases). This seems like unjustified abuse of the garbage collector. Is there a better, thread-safe way of initializing this kind of a structure without having to synchronize the record method? I am somewhat surprised by the decision to have the putIfAbsent method not return the newly-created element, and by the lack of a way to defer instantiation unless it is called for (so to speak).
Java 8 introduced an API to cater for this exact problem, making a 1-line solution:
public void record(String key, String value) {
entries.computeIfAbsent(key, k -> Collections.synchronizedList(new ArrayList<String>())).add(value);
}
For Java 7:
public void record(String key, String value) {
List<String> values = entries.get(key);
if (values == null) {
entries.putIfAbsent(key, Collections.synchronizedList(new ArrayList<String>()));
// At this point, there will definitely be a list for the key.
// We don't know or care which thread's new object is in there, so:
values = entries.get(key);
}
values.add(value);
}
This is the standard code pattern when populating a ConcurrentHashMap.
The special method putIfAbsent(K, V) will either put your value object in or, if another thread got there before you, ignore your value object. Either way, after the call to putIfAbsent(K, V), get(key) is guaranteed to be consistent between threads, and therefore the above code is thread-safe.
The only wasted overhead is if some other thread adds a new entry at the same time for the same key: You may end up throwing away the newly created value, but that only happens if there is not already an entry and there's a race that your thread loses, which would typically be rare.
As of Java-8 you can create Multi Maps using the following pattern:
public void record(String key, String value) {
entries.computeIfAbsent(key,
k -> Collections.synchronizedList(new ArrayList<String>()))
.add(value);
}
The ConcurrentHashMap documentation (not the general contract) specifies that the ArrayList will only be created once for each key, at the slight initial cost of delaying updates while the ArrayList is being created for a new key:
http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html#computeIfAbsent-K-java.util.function.Function-
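A rough way to see the "created once per key" behaviour is to count how often the mapping function runs when several threads hit the same missing key. This is only a sketch under assumptions: the class ComputeOnceDemo and its run method are hypothetical names, and the thread count is arbitrary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class ComputeOnceDemo {
    // Starts `threads` threads that each record one value under the same key.
    // Returns {times the mapping function ran, final size of the list}.
    static int[] run(int threads) {
        ConcurrentHashMap<String, List<String>> entries = new ConcurrentHashMap<>();
        AtomicInteger created = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                try {
                    start.await(); // release all threads at once to provoke a race
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                entries.computeIfAbsent("key", k -> {
                    created.incrementAndGet();
                    return Collections.synchronizedList(new ArrayList<String>());
                }).add("value");
            });
            workers.add(worker);
            worker.start();
        }
        start.countDown();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new int[] { created.get(), entries.get("key").size() };
    }

    public static void main(String[] args) {
        int[] result = run(8);
        System.out.println("lists created: " + result[0] + ", values recorded: " + result[1]);
    }
}
```

With ConcurrentHashMap the first number should be 1 regardless of the thread count, while the second equals the number of threads.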
In the end, I implemented a slight modification of @Bohemian's answer. His proposed solution overwrites the values variable with the putIfAbsent call, which creates the same problem I had before. The code that seems to work looks like this:
public void record(String key, String value) {
List<String> values = entries.get(key);
if (values == null) {
values = Collections.synchronizedList(new ArrayList<String>());
List<String> values2 = entries.putIfAbsent(key, values);
if (values2 != null)
values = values2;
}
values.add(value);
}
It's not as elegant as I'd like, but it's better than the original that creates a new ArrayList instance at every call.
Created two versions based on Gene's answer
public static <K,V> void putIfAbsetMultiValue(ConcurrentHashMap<K,List<V>> entries, K key, V value) {
List<V> values = entries.get(key);
if (values == null) {
values = Collections.synchronizedList(new ArrayList<V>());
List<V> values2 = entries.putIfAbsent(key, values);
if (values2 != null)
values = values2;
}
values.add(value);
}
public static <K,V> void putIfAbsetMultiValueSet(ConcurrentMap<K,Set<V>> entries, K key, V value) {
Set<V> values = entries.get(key);
if (values == null) {
values = Collections.synchronizedSet(new HashSet<V>());
Set<V> values2 = entries.putIfAbsent(key, values);
if (values2 != null)
values = values2;
}
values.add(value);
}
It works well
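If maintaining one method per collection type becomes tedious, the two versions could probably be folded into one by passing in a factory. The following is only a sketch: the name addToMulti and the Supplier parameter are my own additions, not part of the answers above, and it needs Java 8 for the lambdas.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

class MultiValueDemo {
    // Hypothetical generic variant: the caller decides which (thread-safe)
    // collection is created the first time a key is seen.
    static <K, V, C extends Collection<V>> void addToMulti(
            ConcurrentMap<K, C> entries, K key, V value, Supplier<? extends C> factory) {
        C values = entries.get(key);
        if (values == null) {
            C fresh = factory.get();
            C raced = entries.putIfAbsent(key, fresh);
            values = (raced != null) ? raced : fresh; // keep whichever instance is in the map
        }
        values.add(value);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, List<String>> lists = new ConcurrentHashMap<>();
        ConcurrentMap<String, Set<String>> sets = new ConcurrentHashMap<>();
        for (int i = 0; i < 2; i++) {
            addToMulti(lists, "k", "v", () -> Collections.synchronizedList(new ArrayList<String>()));
            addToMulti(sets, "k", "v", () -> Collections.synchronizedSet(new HashSet<String>()));
        }
        System.out.println(lists); // {k=[v, v]}
        System.out.println(sets);  // {k=[v]}
    }
}
```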
I also looked for an answer to this problem. The method putIfAbsent does not actually solve the extra object creation problem; it just makes sure that one of those objects doesn't replace another. Race conditions among threads can still cause multiple objects to be instantiated. I could find 3 solutions for this problem (and I would follow this order of preference):
1- If you are on Java 8, the best way to achieve this is probably the new computeIfAbsent method of ConcurrentMap. You just need to give it a computation function which will be executed synchronously (at least for the ConcurrentHashMap implementation). Example:
private final ConcurrentMap<String, List<String>> entries =
new ConcurrentHashMap<String, List<String>>();
public void method1(String key, String value) {
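// computeIfAbsent only guards creation of the list; the list itself must be thread-safe too.
entries.computeIfAbsent(key, s -> Collections.synchronizedList(new ArrayList<String>()))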
.add(value);
}
This is from the javadoc of ConcurrentHashMap.computeIfAbsent:
If the specified key is not already associated with a value, attempts
to compute its value using the given mapping function and enters it
into this map unless null. The entire method invocation is performed
atomically, so the function is applied at most once per key. Some
attempted update operations on this map by other threads may be
blocked while computation is in progress, so the computation should be
short and simple, and must not attempt to update any other mappings of
this map.
2- If you cannot use Java 8, you can use Guava's LoadingCache, which is thread-safe. You define a load function to it (just like the compute function above), and you can be sure that it'll be called synchronously. Example:
private final LoadingCache<String, List<String>> entries = CacheBuilder.newBuilder()
.build(new CacheLoader<String, List<String>>() {
@Override
public List<String> load(String s) throws Exception {
return Collections.synchronizedList(new ArrayList<String>());
}
});
public void method2(String key, String value) {
entries.getUnchecked(key).add(value);
}
3- If you cannot use Guava either, you can always synchronise manually and use double-checked locking. Example:
private final ConcurrentMap<String, List<String>> entries =
new ConcurrentHashMap<String, List<String>>();
public void method3(String key, String value) {
List<String> existing = entries.get(key);
if (existing != null) {
existing.add(value);
} else {
synchronized (entries) {
List<String> existingSynchronized = entries.get(key);
if (existingSynchronized != null) {
existingSynchronized.add(value);
} else {
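// Synchronized list, because later add calls on the fast path happen outside the lock.
List<String> newList = Collections.synchronizedList(new ArrayList<String>());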
newList.add(value);
entries.put(key, newList);
}
}
}
}
I made an example implementation of all those 3 methods and additionally, the non-synchronized method, which causes extra object creation: http://pastebin.com/qZ4DUjTr
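Whichever of the three approaches creates the value collection, the collection itself also has to tolerate concurrent add calls. One alternative to a synchronized list is a concurrent collection from java.util.concurrent. A minimal sketch, assuming a ConcurrentLinkedQueue is acceptable as the value type (the class name ConcurrentValuesDemo and the run harness are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

class ConcurrentValuesDemo {
    // Creation is guarded by computeIfAbsent; add is safe because the queue is concurrent.
    static void record(ConcurrentHashMap<String, Queue<String>> entries, String key, String value) {
        entries.computeIfAbsent(key, k -> new ConcurrentLinkedQueue<String>()).add(value);
    }

    // Hypothetical harness: several threads record into the same key,
    // then the total number of stored values is returned.
    static int run(int threads, int perThread) {
        ConcurrentHashMap<String, Queue<String>> entries = new ConcurrentHashMap<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    record(entries, "key", "value");
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        Queue<String> values = entries.get("key");
        return values == null ? 0 : values.size();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // 4000: no add is lost
    }
}
```

Swapping the queue for a plain ArrayList in this harness may lose updates or throw, which is the reason the value type matters as much as how it is created.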
The memory (and GC) cost of creating empty ArrayLists was addressed in Java 7 update 40 (1.7.0_40): an empty ArrayList no longer allocates its backing array until the first element is added. Don't worry about creating an empty ArrayList.
Reference : http://javarevisited.blogspot.com.tr/2014/07/java-optimization-empty-arraylist-and-Hashmap-cost-less-memory-jdk-17040-update.html
The approach with putIfAbsent has the fastest execution time: it is from 2 to 50 times faster than the "lambda" approach in environments with high contention. The lambda itself isn't the reason behind this slowdown; the issue is the compulsory synchronisation inside computeIfAbsent prior to the Java 9 optimisations.
The benchmark:
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
public class ConcurrentHashMapTest {
private final static int numberOfRuns = 1000000;
private final static int numberOfThreads = Runtime.getRuntime().availableProcessors();
private final static int keysSize = 10;
private final static String[] strings = new String[keysSize];
static {
for (int n = 0; n < keysSize; n++) {
strings[n] = "" + (char) ('A' + n);
}
}
public static void main(String[] args) throws InterruptedException {
for (int n = 0; n < 20; n++) {
testPutIfAbsent();
testComputeIfAbsentLamda();
}
}
private static void testPutIfAbsent() throws InterruptedException {
final AtomicLong totalTime = new AtomicLong();
final ConcurrentHashMap<String, AtomicInteger> map = new ConcurrentHashMap<String, AtomicInteger>();
final Random random = new Random();
ExecutorService executorService = Executors.newFixedThreadPool(numberOfThreads);
for (int i = 0; i < numberOfThreads; i++) {
executorService.execute(new Runnable() {
@Override
public void run() {
long start, end;
for (int n = 0; n < numberOfRuns; n++) {
String s = strings[random.nextInt(strings.length)];
start = System.nanoTime();
AtomicInteger count = map.get(s);
if (count == null) {
count = new AtomicInteger(0);
AtomicInteger prevCount = map.putIfAbsent(s, count);
if (prevCount != null) {
count = prevCount;
}
}
count.incrementAndGet();
end = System.nanoTime();
totalTime.addAndGet(end - start);
}
}
});
}
executorService.shutdown();
executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
System.out.println("Test " + Thread.currentThread().getStackTrace()[1].getMethodName()
+ " average time per run: " + (double) totalTime.get() / numberOfThreads / numberOfRuns + " ns");
}
private static void testComputeIfAbsentLamda() throws InterruptedException {
final AtomicLong totalTime = new AtomicLong();
final ConcurrentHashMap<String, AtomicInteger> map = new ConcurrentHashMap<String, AtomicInteger>();
final Random random = new Random();
ExecutorService executorService = Executors.newFixedThreadPool(numberOfThreads);
for (int i = 0; i < numberOfThreads; i++) {
executorService.execute(new Runnable() {
@Override
public void run() {
long start, end;
for (int n = 0; n < numberOfRuns; n++) {
String s = strings[random.nextInt(strings.length)];
start = System.nanoTime();
AtomicInteger count = map.computeIfAbsent(s, (k) -> new AtomicInteger(0));
count.incrementAndGet();
end = System.nanoTime();
totalTime.addAndGet(end - start);
}
}
});
}
executorService.shutdown();
executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
System.out.println("Test " + Thread.currentThread().getStackTrace()[1].getMethodName()
+ " average time per run: " + (double) totalTime.get() / numberOfThreads / numberOfRuns + " ns");
}
}
The results:
Test testPutIfAbsent average time per run: 115.756501 ns
Test testComputeIfAbsentLamda average time per run: 276.9667055 ns
Test testPutIfAbsent average time per run: 134.2332435 ns
Test testComputeIfAbsentLamda average time per run: 223.222063625 ns
Test testPutIfAbsent average time per run: 119.968893625 ns
Test testComputeIfAbsentLamda average time per run: 216.707419875 ns
Test testPutIfAbsent average time per run: 116.173902375 ns
Test testComputeIfAbsentLamda average time per run: 215.632467375 ns
Test testPutIfAbsent average time per run: 112.21422775 ns
Test testComputeIfAbsentLamda average time per run: 210.29563725 ns
Test testPutIfAbsent average time per run: 120.50643475 ns
Test testComputeIfAbsentLamda average time per run: 200.79536475 ns
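If you want to keep the computeIfAbsent style on Java 8 but avoid its locking when the key already exists, one option is to try a plain get first and only fall back to computeIfAbsent on a miss. The helper below is a sketch with a made-up name (getOrCompute); I have not benchmarked it here, so treat any speed-up as something to measure in your own setup.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class GetFirstDemo {
    // Hypothetical helper: non-blocking read for existing keys,
    // atomic creation only when the key is missing.
    static <K, V> V getOrCompute(ConcurrentMap<K, V> map, K key,
                                 Function<? super K, ? extends V> factory) {
        V value = map.get(key);
        if (value == null) {
            value = map.computeIfAbsent(key, factory);
        }
        return value;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, AtomicInteger> counts = new ConcurrentHashMap<>();
        getOrCompute(counts, "A", k -> new AtomicInteger()).incrementAndGet();
        getOrCompute(counts, "A", k -> new AtomicInteger()).incrementAndGet();
        System.out.println(counts.get("A")); // prints 2
    }
}
```

The result is the same as calling computeIfAbsent directly; the only difference is that the common case (key already present) goes through get.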
