Get an Object from a collection without looping in java - java

I need to repeatedly (hundreds of thousands of times) retrieve an element (a different one each time) from a Collection which contains tens of thousands of Objects.
What is the quickest way to do this retrieval operation? At the moment my Collection is a List and I iterate on it until I have found the element, but is there a quicker way? Using a Map maybe? I was thinking to do:
Putting the Objects in a Map, with the key being the id field of the Object, and the Object itself being the value.
Then doing get(id) on the Map should be much faster than looping through a List.
If that is a correct way to do it, should I use a HashMap or TreeMap? - my objects have no particular ordering.
Any advice on the matter would be appreciated!
Last note: if an external library provides a tool to answer this I'd take it gladly!

As per the documentation of TreeMap (emphasis my own):
The map is sorted according to the natural ordering of its keys,
or by a Comparator provided at map creation time, depending on which
constructor is used.
In your case, you state that the items have no particular order and it does not seem that you are after any particular order, but rather just be able to retrieve data as fast as possible.
HashMaps provide constant read time but do not guarantee order, so I think that you should go with HashMaps:
This class makes no guarantees as to the order of the map; in
particular, it does not guarantee that the order will remain constant
over time. This implementation provides constant-time performance for
the basic operations (get and put), assuming the hash function
disperses the elements properly among the buckets.
As a side note, the memory footprint of this can get quite high quite fast, so it might also be a good idea to look into a database approach and maybe use a cache like mechanism to handle more frequently used information.
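A minimal sketch of the id-to-object index described above. The `Item` class, its fields and the `indexById` helper are hypothetical names chosen for illustration; only `HashMap` itself comes from the JDK.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IdIndexDemo {
    // Hypothetical element type; only the id field matters for the lookup.
    static final class Item {
        final long id;
        final String name;

        Item(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Builds the index once in O(n); every later get(id) is a hash lookup.
    static Map<Long, Item> indexById(List<Item> items) {
        // Pre-size so the map does not rehash while filling (default load factor 0.75).
        Map<Long, Item> byId = new HashMap<>((int) (items.size() / 0.75f) + 1);
        for (Item item : items) {
            byId.put(item.id, item); // a later duplicate id overwrites the earlier one
        }
        return byId;
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        for (long i = 0; i < 50_000; i++) {
            items.add(new Item(i, "item-" + i));
        }
        Map<Long, Item> byId = indexById(items);
        System.out.println(byId.get(31_337L).name); // prints "item-31337"
        System.out.println(byId.get(-1L));          // prints "null": no such id
    }
}
```

The map stores references only, so the objects themselves are not duplicated; the extra memory is one entry per element.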

I've created code which tests the performance of binary search, TreeMap and HashMap for the given problem.
In case you are rebuilding the collection each time, HashMap is the fastest (even with standard Object's hashCode() implementation!), sort+array binary search goes second and a TreeMap is last (due to complex rebuilding procedure).
proc array: 2395
proc tree : 4413
proc hash : 1325
If you are not rebuilding the collection, HashMap is still the fastest, an array's binary search is second and a TreeMap is the slowest, but with only slightly lower speed than array.
proc array: 506
proc tree : 561
proc hash : 122
Test code:
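import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
public class SearchSpeedTest {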
private List<DataObject> data;
private List<Long> ids;
private Map<Long, DataObject> hashMap;
private Map<Long, DataObject> treeMap;
private int numRep;
private int dataAmount;
private boolean rebuildEachTime;
static class DataObject implements Comparable<DataObject>{
Long id;
public DataObject(Long id) {
super();
this.id = id;
}
public DataObject() {
// TODO Auto-generated constructor stub
}
@Override
public final int compareTo(DataObject o) {
return Long.compare(id, o.id);
}
public Long getId() {
return id;
}
public void setId(Long id) {
this.id = id;
}
public void dummyCode() {
}
}
@FunctionalInterface
public interface Procedure {
void execute();
}
public void testSpeeds() {
rebuildEachTime = true;
numRep = 100;
dataAmount = 60_000;
data = new ArrayList<>(dataAmount);
ids = new ArrayList<>(dataAmount);
Random gen = new Random();
for (int i=0; i< dataAmount; i++) {
long id = i*7+gen.nextInt(7);
ids.add(id);
data.add(new DataObject(id));
}
Collections.sort(data);
treeMap = new TreeMap<Long, DataObject>();
populateMap(treeMap);
hashMap = new HashMap<Long, SearchSpeedTest.DataObject>();
populateMap(hashMap);
Procedure[] procedures = new Procedure[] {this::testArray, this::testTreeMap, this::testHashMap};
String[] names = new String[] {"array", "tree ", "hash "};
for (int n=0; n<procedures.length; n++) {
Procedure proc = procedures[n];
long startTime = System.nanoTime();
for (int i=0; i<numRep; i++) {
if (rebuildEachTime) {
Collections.shuffle(data);
}
proc.execute();
}
long endTime = System.nanoTime();
long diff = endTime - startTime;
System.out.println("proc "+names[n]+":\t"+(diff/1_000_000));
}
}
void testHashMap() {
if (rebuildEachTime) {
hashMap = new HashMap<Long, SearchSpeedTest.DataObject>();
populateMap(hashMap);
}
testMap(hashMap);
}
void testTreeMap() {
if (rebuildEachTime) {
treeMap = new TreeMap<Long, SearchSpeedTest.DataObject>();
populateMap(treeMap);
}
testMap(treeMap);
}
void testMap(Map<Long, DataObject> map) {
for (Long id: ids) {
DataObject ret = map.get(id);
ret.dummyCode();
}
}
void populateMap(Map<Long, DataObject> map) {
for (DataObject dataObj : data) {
map.put(dataObj.getId(), dataObj);
}
}
void testArray() {
if (rebuildEachTime) {
Collections.sort(data);
}
DataObject key = new DataObject();
for (Long id: ids) {
key.setId(id);
DataObject ret = data.get(Collections.binarySearch(data, key));
ret.dummyCode();
}
}
public static void main(String[] args) {
new SearchSpeedTest().testSpeeds();
}
}

HashMap will be more efficient in general, so use it whenever you don't care about the order of the keys.
When you want to keep your entries in the Map sorted by key, use TreeMap; but sorting would be overhead in your case, as you don't want any particular order.

You can use a map if you have a good way to define the key of the map. In the worst case you can use your object as key and value.
As ordering is not important use a HashMap. To maintain the order in a TreeMap there is additional cost when adding an element, as it must be added at the correct position.
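Using the object itself as key and value only works if the class defines `equals()` and `hashCode()` over the fields that identify it. A minimal sketch, assuming a hypothetical `Person` class identified by two name fields:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class ObjectAsKeyDemo {
    // Hypothetical type; equality is defined by both name fields.
    static final class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person other = (Person) o;
            return firstName.equals(other.firstName) && lastName.equals(other.lastName);
        }

        @Override
        public int hashCode() {
            return Objects.hash(firstName, lastName);
        }
    }

    // Looks up the stored instance using a throwaway probe object with equal fields.
    static Person find(Map<Person, Person> people, String firstName, String lastName) {
        return people.get(new Person(firstName, lastName));
    }

    public static void main(String[] args) {
        Map<Person, Person> people = new HashMap<>();
        Person ada = new Person("Ada", "Lovelace");
        people.put(ada, ada);
        System.out.println(find(people, "Ada", "Lovelace") == ada); // prints "true"
        System.out.println(find(people, "Alan", "Turing"));         // prints "null"
    }
}
```

Without the two overrides, the probe object would hash by identity and the lookup would return null.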

Related

Search multiple HashMaps at the same time

tldr: How can I search for an entry in multiple (read-only) Java HashMaps at the same time?
The long version:
I have several dictionaries of various sizes stored as HashMap< String, String >. Once they are read in, they are never to be changed (strictly read-only).
I want to check whether and which dictionary had stored an entry with my key.
My code was originally looking for a key like this:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
if (map.containsKey(key))
return new DictionaryEntry(map.get(key), i);
}
return null;
}
Then it got a little more complicated: my search string could contain typos, or was a variant of the stored entry. Like, if the stored key was "banana", it is possible that I'd look up "bannana" or "a banana", but still would like the entry for "banana" returned. Using the Levenshtein-Distance, I now loop through all dictionaries and each entry in them:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
for (Map.Entry<String, String> entry : map.entrySet()) {
// Calculate Levenshtein distance, store closest match etc.
}
}
// return closest match or null.
}
So far everything works as it should and I'm getting the entry I want. Unfortunately I have to look up around 7000 strings, in five dictionaries of various sizes (~ 30 - 70k entries) and it takes a while. From my processing output I have the strong impression my lookup dominates overall runtime.
My first idea to improve runtime was to search all dictionaries in parallel. Since none of the dictionaries is to be changed and no more than one thread is accessing a dictionary at the same time, I don't see any safety concerns.
The question is just: how do I do this? I have never used multithreading before. My search only came up with Concurrent HashMaps (but to my understanding, I don't need this) and the Runnable-class, where I'd have to put my processing into the method run(). I think I could rewrite my current class to fit into Runnable, but I was wondering if there is maybe a simpler method to do this (or how can I do it simply with Runnable, right now my limited understanding thinks I have to restructure a lot).
Since I was asked to share the Levenshtein-Logic: It's really nothing fancy, but here you go:
private int _maxLSDistance = 10;
public Map.Entry<String, String> getClosestMatch(String key) {
Map.Entry<String, String> _closestMatch = null;
// Distance of the best entry found so far
int _lsDistance = Integer.MAX_VALUE;
if (key == null) {
return null;
}
for (Map.Entry<String, String> entry : _dictionary.entrySet()) {
// Perfect match
if (entry.getKey().equals(key)) {
return entry;
}
// Similar match
else {
int dist = StringUtils.getLevenshteinDistance(entry.getKey(), key);
// If "dist" is smaller than threshold and smaller than distance of already stored entry
if (dist < _maxLSDistance) {
if (_closestMatch == null || dist < _lsDistance) {
_closestMatch = entry;
_lsDistance = dist;
}
}
}
}
return _closestMatch;
}
To use multi-threading in your case, you could do something like this:
The "monitor" class, which basically stores the results and coordinates the threads:
public class Results {
private int nrOfDictionaries = 4; //
private ArrayList<String> results = new ArrayList<String>();
public void prepare() {
nrOfDictionaries = 4;
results = new ArrayList<String>();
}
public synchronized void oneDictionaryFinished() {
nrOfDictionaries--;
System.out.println("one dictionary finished");
notifyAll();
}
public synchronized boolean isReady() throws InterruptedException {
while (nrOfDictionaries != 0) {
wait();
}
return true;
}
public synchronized void addResult(String result) {
results.add(result);
}
public ArrayList<String> getAllResults() {
return results;
}
}
The thread itself, which can be set to search a specific dictionary:
public class ThreadDictionarySearch extends Thread {
// the actual dictionary
private String dictionary;
private Results results;
public ThreadDictionarySearch(Results results, String dictionary) {
this.dictionary = dictionary;
this.results = results;
}
@Override
public void run() {
for (int i = 0; i < 4; i++) {
// search dictionary;
results.addResult("result of " + dictionary);
System.out.println("adding result from " + dictionary);
}
results.oneDictionaryFinished();
}
}
And the main method for demonstration:
public static void main(String[] args) throws Exception {
Results results = new Results();
ThreadDictionarySearch threadA = new ThreadDictionarySearch(results, "dictionary A");
ThreadDictionarySearch threadB = new ThreadDictionarySearch(results, "dictionary B");
ThreadDictionarySearch threadC = new ThreadDictionarySearch(results, "dictionary C");
ThreadDictionarySearch threadD = new ThreadDictionarySearch(results, "dictionary D");
threadA.start();
threadB.start();
threadC.start();
threadD.start();
if (results.isReady()) {
// it stays here until all dictionaries are searched
// because in "Results" it's told to wait() while not finished;
for (String string : results.getAllResults()) {
System.out.println("RESULT: " + string);
}
}
}
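The hand-written wait()/notifyAll() coordination above can also be expressed with `java.util.concurrent.CountDownLatch`. This is only a sketch under the same assumptions as the demo code: the "search" is a placeholder string, and `LatchSearchDemo` and `searchAll` are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

class LatchSearchDemo {
    // Starts one thread per dictionary name and blocks until all of them have finished.
    static List<String> searchAll(String[] dictionaryNames) {
        List<String> results = new CopyOnWriteArrayList<>(); // safe for concurrent add()
        CountDownLatch done = new CountDownLatch(dictionaryNames.length);
        for (String name : dictionaryNames) {
            Thread worker = new Thread(() -> {
                try {
                    // Placeholder for the real dictionary search.
                    results.add("result of " + name);
                } finally {
                    done.countDown(); // always signal, even if the search throws
                }
            });
            worker.start();
        }
        try {
            done.await(); // returns once the count has reached zero
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for searches", e);
        }
        return results;
    }

    public static void main(String[] args) {
        String[] names = {"dictionary A", "dictionary B", "dictionary C", "dictionary D"};
        for (String result : searchAll(names)) {
            System.out.println("RESULT: " + result);
        }
    }
}
```

The order of the results depends on thread scheduling, so it can differ between runs.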
I think the easiest would be to use a stream over the entry set:
public DictionaryEntry getEntry(String key) {
for (int i = 0; i < _numDictionaries; i++) {
HashMap<String, String> map = getDictionary(i);
map.entrySet().parallelStream().forEach( (entry) ->
{
// Calculate Levenshtein distance, store closest match etc.
}
);
}
// return closest match or null.
}
Provided you are using Java 8, of course. You could also wrap the outer loop in an IntStream. You could also use Stream.reduce directly to get the entry with the smallest distance.
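A sketch of the reduction idea: each dictionary's entry set is scanned with a parallel stream and `min` picks the entry with the smallest distance. The `Match` class and the hand-written `levenshtein` method are assumptions made to keep the example self-contained; the question uses `StringUtils.getLevenshteinDistance` instead.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class StreamClosestMatchDemo {
    // Hypothetical result holder: which key matched, in which dictionary, at what distance.
    static final class Match {
        final String key;
        final String value;
        final int dictionary;
        final int distance;

        Match(String key, String value, int dictionary, int distance) {
            this.key = key;
            this.value = value;
            this.dictionary = dictionary;
            this.distance = distance;
        }
    }

    // Plain dynamic-programming Levenshtein distance using two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the closest entry over all dictionaries, or empty if none is below maxDistance.
    static Optional<Match> closest(List<Map<String, String>> dictionaries, String key, int maxDistance) {
        Match best = null;
        for (int i = 0; i < dictionaries.size(); i++) {
            final int index = i;
            Optional<Match> candidate = dictionaries.get(i).entrySet().parallelStream()
                    .map(e -> new Match(e.getKey(), e.getValue(), index, levenshtein(e.getKey(), key)))
                    .filter(m -> m.distance < maxDistance)
                    .min(Comparator.comparingInt((Match m) -> m.distance));
            if (candidate.isPresent() && (best == null || candidate.get().distance < best.distance)) {
                best = candidate.get();
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        Map<String, String> english = new HashMap<>();
        english.put("banana", "fruit");
        english.put("tomato", "vegetable");
        Map<String, String> german = new HashMap<>();
        german.put("Banane", "Frucht");
        Optional<Match> match = closest(Arrays.asList(english, german), "bannana", 10);
        match.ifPresent(m -> System.out.println(m.key + " in dictionary " + m.dictionary));
        // prints "banana in dictionary 0"
    }
}
```

Because the stream only reads the maps and the lambda keeps no shared mutable state, no extra synchronization is needed here.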
Maybe try thread pools:
ExecutorService es = Executors.newFixedThreadPool(_numDictionaries);
for (int i = 0; i < _numDictionaries; i++) {
//prepare a Runnable implementation that contains a logic of your search
es.submit(prepared_runnable);
}
I believe you may also try to find a quick estimate of strings that completely do not match (i.e. significant difference in length), and use it to finish your logic ASAP, moving to next candidate.
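The two suggestions above (a fixed thread pool and a cheap length check) can be combined roughly as follows. The length check relies on the fact that the Levenshtein distance between two strings is never smaller than the difference of their lengths. `Hit`, `scan`, `search` and the inline `levenshtein` are hypothetical helpers written for this sketch, not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PooledSearchDemo {
    // Hypothetical result holder.
    static final class Hit {
        final String key;
        final int dictionary;
        final int distance;

        Hit(String key, int dictionary, int distance) {
            this.key = key;
            this.dictionary = dictionary;
            this.distance = distance;
        }
    }

    // Plain dynamic-programming Levenshtein distance using two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Scans one dictionary; returns null if no key is closer than maxDistance.
    static Hit scan(Map<String, String> dictionary, int index, String key, int maxDistance) {
        Hit best = null;
        int limit = maxDistance; // shrinks whenever a better candidate is found
        for (String candidate : dictionary.keySet()) {
            // Length difference is a lower bound on the distance: skip hopeless candidates.
            if (Math.abs(candidate.length() - key.length()) >= limit) continue;
            int distance = levenshtein(candidate, key);
            if (distance < limit) {
                best = new Hit(candidate, index, distance);
                limit = distance;
            }
        }
        return best;
    }

    // Runs one scan task per dictionary on a fixed pool and keeps the overall best hit.
    static Hit search(List<Map<String, String>> dictionaries, String key, int maxDistance) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, dictionaries.size()));
        try {
            List<Callable<Hit>> tasks = new ArrayList<>();
            for (int i = 0; i < dictionaries.size(); i++) {
                final int index = i;
                tasks.add(() -> scan(dictionaries.get(index), index, key, maxDistance));
            }
            Hit best = null;
            for (Future<Hit> future : pool.invokeAll(tasks)) { // blocks until all tasks are done
                Hit hit = future.get();
                if (hit != null && (best == null || hit.distance < best.distance)) best = hit;
            }
            return best;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("search task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For 7000 lookups it would probably be better to create the pool once and reuse it across calls; it is created inside `search` here only to keep the sketch short.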
I strongly doubt that HashMaps are a suitable solution here, especially if you want some fuzzy matching and stop words. You should use a proper full-text search solution like Elasticsearch or Apache Solr, or at least an available engine like Apache Lucene.
That being said, you can use a poor man's version: Create an array of your maps and a SortedMap, iterate over the array, take the keys of the current HashMap and store them in the SortedMap with the index of their HashMap. To retrieve a key, you first search in the SortedMap for said key, get the respective HashMap from the array using the index position and lookup the key in only one HashMap. Should be fast enough without the need for multiple threads to dig through the HashMaps. However, you could make the code below into a runnable and you can have multiple lookups in parallel.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
public class Search {
public static void main(String[] arg) {
if (arg.length == 0) {
System.out.println("Must give a search word!");
System.exit(1);
}
String searchString = arg[0].toLowerCase();
/*
* Populating our HashMaps.
*/
HashMap<String, String> english = new HashMap<String, String>();
english.put("banana", "fruit");
english.put("tomato", "vegetable");
HashMap<String, String> german = new HashMap<String, String>();
german.put("Banane", "Frucht");
german.put("Tomate", "Gemüse");
/*
* Now we create our ArrayList of HashMaps for fast retrieval
*/
List<HashMap<String, String>> maps = new ArrayList<HashMap<String, String>>();
maps.add(english);
maps.add(german);
/*
* This is our index
*/
SortedMap<String, Integer> index = new TreeMap<String, Integer>(String.CASE_INSENSITIVE_ORDER);
/*
* Populating the index:
*/
for (int i = 0; i < maps.size(); i++) {
// We iterate through our HashMaps...
HashMap<String, String> currentMap = maps.get(i);
for (String key : currentMap.keySet()) {
/* ...and populate our index with lowercase versions of the keys,
* referencing the array from which the key originates.
*/
index.put(key.toLowerCase(), i);
}
}
// In case our index contains our search string...
if (index.containsKey(searchString)) {
/*
* ... we find out in which map of the ones stored in maps
* the word in the index originated from.
*/
Integer mapIndex = index.get(searchString);
/*
* Next, we look up said map.
*/
HashMap<String, String> origin = maps.get(mapIndex);
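/*
* Last, we retrieve the value from the origin map.
* Note: searchString was lowercased above, so this only finds keys that are
* stored in lowercase (e.g. "banana"); for a key such as "Banane" it returns null.
*/
String result = origin.get(searchString);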
/*
* The above steps can be shortened to
* String result = maps.get(index.get(searchString).intValue()).get(searchString);
*/
System.out.println(result);
} else {
System.out.println("\"" + searchString + "\" is not in the index!");
}
}
}
Please note that this is a rather naive implementation only provided for illustration purposes. It doesn't address several problems (you can't have duplicate index entries, for example).
With this solution, you are basically trading startup speed for query speed.
Since your concern is a faster response, I would suggest dividing the work between threads.
Say you have 5 dictionaries: you could assign three dictionaries to one thread and let another thread take care of the remaining two.
Whichever thread finds the match then halts or terminates the other thread.
You may need some extra logic to divide the work, but that won't affect your performance.
You may also need a few more changes in your code to get your closest match:
for (Map.Entry entry : _dictionary.entrySet()) {
You are using entrySet() but you are not using the values, and getting the entry set seems to be a bit expensive. I would suggest just using keySet(), since you are not really interested in the values in that map:
for (String key : _dictionary.keySet()) {
For more details on the performance of maps, please read this link: Map performances
Iteration over the collection-views of a LinkedHashMap requires time proportional to the size of the map, regardless of its capacity. Iteration over a HashMap is likely to be more expensive, requiring time proportional to its capacity.

String[] or ArrayList better as Key in HashMap?

So I need to choose between
HashMap<String[], Object>
HashMap<ArrayList<String>,Object>
My input Parameter is: ArrayList<String> in.
The whole ArrayList<String> in cannot be the key, since it does contain elements, which are not supposed to be like a Primary Key in a database. I do know, that the first n elements of the incoming ArrayList<String> in supposed to be the primary Keys.
Which one would be faster?
Scenario:
HashMap<ArrayList<String>, Object> hmAL = new HashMap<>();
HashMap<String[], Object> hmSA = new HashMap<>();
ArrayList<String> in = new ArrayList<>();
fillWithStuff(in);
//Which one would be faster?
getObject(in,hmAL,5);
getObject(in,hmSA,5);
With Option 1:
private Object getObject(ArrayList<String> in, HashMap<ArrayList<String>, Object> hm, int n){
return hm.get(in.subList(0,n));
}
With Option 2:
private Object getObject(ArrayList<String> in, HashMap<String[], Object> hm, int n){
String[] temp = new String[n];
for(int i=0; i<n; i++)
temp[i]=in.get(i);
return hm.get(temp);
}
Considering:
Which is faster? Short the list, or copy to an array?
I'm wondering, which hash (since it is a HashMap) would be faster. Hashing of a ArrayList, or an equal-sized array. Or doesn't it make any difference?
Using String[] is not a good idea because arrays do not override hashCode() and equals(); both are identity-based. This means that if you have 2 string arrays which are different objects but hold the exact same values, the map will not find the entry.
The ArrayList implementation of hashCode() uses the hash code of each of its String elements, so the lookup in a map would succeed. So I'd go with that one.
That said, I would rather build a key myself based on the objects in the list.
Dealing with copying only
The subList method is implemented very efficiently in Java 7+, not requiring any copying at all. It simply returns a view directly onto the original array. Thus, in Java 7+, it will be faster than the copy element by element method. However, in Java 6, both ways are essentially equivalent.
Dealing with the method as a whole
If you look at the whole method, your choice is no longer a choice. If you want the method to function, you will have to use the first implementation. Array hashCode() does not look at the elements inside it, only at the identity of the array. Because you are creating the array in your method, the Map.get() will necessarily return null.
On the other hand, the List.hashCode() method runs a hash on all of the contained elements, meaning that it will successfully match if all of the contained elements are the same.
Your choice is clear.
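A small sketch that shows the difference directly. The class and method names are made up for the demonstration; the behaviour follows from the identity-based `hashCode()`/`equals()` of arrays and the element-based contract of `List`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyTypeDemo {
    // Array keys: a second array with equal contents is a different key.
    static boolean arrayKeyFound() {
        Map<String[], Object> byArray = new HashMap<>();
        byArray.put(new String[] {"a", "b"}, "value");
        return byArray.get(new String[] {"a", "b"}) != null;
    }

    // List keys: any List with equal elements in the same order is the same key.
    static boolean listKeyFound() {
        Map<List<String>, Object> byList = new HashMap<>();
        List<String> in = new ArrayList<>(Arrays.asList("a", "b", "payload"));
        // Copy the prefix before storing it, so the key does not change if `in` is modified later.
        byList.put(new ArrayList<>(in.subList(0, 2)), "value");
        // For the lookup itself, the subList view is enough.
        return byList.get(in.subList(0, 2)) != null;
    }

    public static void main(String[] args) {
        System.out.println(arrayKeyFound()); // prints "false"
        System.out.println(listKeyFound());  // prints "true"
    }
}
```

Copying the prefix on `put` matters because a key that is mutated after insertion can no longer be found under its new hash code.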
Just to add to the two answers above: I tested this in Java 7 and found that, on average, the list version is about 50 times faster with 2000000 total elements, of which 1000000 participate in calculating the hash code, i.e. the primary keys (hypothetical numbers). Below is the program.
import java.util.ArrayList;
import java.util.HashMap;
public class TestHashing {
public static void main(String[] args) {
HashMap<ArrayList<String>, Object> hmAL = new HashMap<>();
HashMap<String[], Object> hmSA = new HashMap<>();
ArrayList<String> in = new ArrayList<>();
fillWithStuff(in);
// Which one would be faster?
long start = System.nanoTime();
getObject(in, hmAL, 1000000);
long end = System.nanoTime();
long firstTime = (end-start);
System.out.println("firstTime :: "+ firstTime);
start = System.nanoTime();
getObject1(in, hmSA, 1000000);
end = System.nanoTime();
long secondTime = (end-start);
System.out.println("secondTime :: "+ secondTime);
System.out.println("First is faster by "+ secondTime/firstTime);
}
private static void fillWithStuff(ArrayList<String> in) {
for(int i =0; i< 2000000; i++) {
in.add(i+"");
}
}
private static Object getObject(ArrayList<String> in,
HashMap<ArrayList<String>, Object> hm, int n) {
return hm.get(in.subList(0, n));
}
private static Object getObject1(ArrayList<String> in, HashMap<String[], Object> hm, int n){
String[] temp = new String[n];
for(int i=0; i<n; i++)
temp[i]=in.get(i);
return hm.get(temp);
}
}
Output
firstTime :: 218000
secondTime :: 11627000
First is faster by 53

PriorityQueue with indices for keeping counts sorted

A problem I often encounter in Java (usually while writing computational linguistics code) is the need to count the number of occurrences of some items in a dataset, then sort the items by their counts. The simplest concrete example is word counting: I need to count the number of occurrences of each word in a text file, then sort the words by their counts to find the most frequently used words.
Unfortunately, Java doesn't seem to have a good data structure for this task. I need to use the words as indices of a collection while I'm counting, so that I can efficiently look up the right counter to increment every time I read a word, but the values I want to sort on are the counts, not the words.
Map<String, Integer> provides the interface I need for looking up the count associated with a word, but Maps can only be sorted by their keys (i.e. TreeMap). PriorityQueue is a nice heap implementation that will sort on whatever comparator you give it, but it provides no way to access the elements by some kind of index and no way to update and re-heapify an element (other than by removing and adding it). Its single type parameter also means I need to stick the words and their counts together into one object in order to use it.
My current "solution" is to store the counts in a Map while counting them, then copy them all into a PriorityQueue to sort them:
Map<String, Integer> wordCounts = countStuff();
PriorityQueue<NamedCount> sortedCounts = new PriorityQueue<>(wordCounts.size(),
Collections.reverseOrder());
for(Entry<String, Integer> count : wordCounts.entrySet()) {
sortedCounts.add(new NamedCount(count.getKey(), count.getValue()));
}
(Note that NamedCount is just a simple pair<string, int> that implements Comparable to compare the integers). But this is inefficient, especially since the data set can be very large, and keeping two copies of the count set in memory is wasteful.
Is there any way I can get random access to the objects inside the PriorityQueue, so that I can just store one copy of the counts in the PriorityQueue and re-heapify as I update them? Would it make sense to use a Map<String, NamedCount> that keeps "pointers" to the objects in the PriorityQueue<NamedCount>?
First, for the base data structure, typically Guava's Multiset<String> is preferable to Map<String, Integer> in the same way that Set<String> is preferable to Map<String, Boolean>. It's a cleaner API and encapsulates the incrementing.
Now, if this were me, I would implement a custom Multiset which adds some additional logic to index the counts, and return them. Something like this:
class IndexedMultiset<T extends Comparable<T>> extends ForwardingMultiset<T> {
private final Multiset<T> delegate = HashMultiset.create();
private final TreeMultimap<Integer, T> countIndex = TreeMultimap.create();
@Override
protected Multiset<T> delegate() {
return delegate;
}
@Override
public int add(T element, int occurrences) {
int prev = super.add(element, occurrences);
countIndex.remove(prev, element);
countIndex.put(count(element), element);
return prev;
}
@Override
public boolean add(T element) {
return super.standardAdd(element);
}
//similar for remove, setCount, etc
}
Then I'd add whatever query capabilities you need based on counts. For example, retrieving an iterable of word/count pairs in descending order could look something like this:
public Iterable<CountEntry<T>> descendingCounts() {
return countIndex.keySet().descendingSet().stream()
.flatMap((count) -> countIndex.get(count).stream())
.map((element) -> new CountEntry<>(element, count(element)))
.collect(Collectors.toList());
}
public static class CountEntry<T> {
private final T element;
private final int count;
public CountEntry(T element, int count) {
this.element = element;
this.count = count;
}
public T element() {
return element;
}
public int count() {
return count;
}
@Override
public String toString() {
return element + ": " + count;
}
}
And it would all be used like this:
public static void main(String... args) {
IndexedMultiset<String> wordCounts = new IndexedMultiset<>();
wordCounts.add("foo");
wordCounts.add("bar");
wordCounts.add("baz");
wordCounts.add("baz");
System.out.println(wordCounts.descendingCounts()); //[baz: 2, bar: 1, foo: 1]
wordCounts.add("foo");
wordCounts.add("foo");
wordCounts.add("foo");
System.out.println(wordCounts.descendingCounts()); //[foo: 4, baz: 2, bar: 1]
}
If you can use third-party libraries like Guava, Multiset is designed pretty specifically as a solution to this problem:
Multiset<String> multiset = HashMultiset.create();
for (String word : words) {
multiset.add(word);
}
System.out.println(Multisets.copyHighestCountFirst(multiset));
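If a third-party library is not an option, a JDK-only sketch of the same idea is to count with `Map.merge` and then sort a list of the map's entries by value. `WordCountDemo` and `countsDescending` are hypothetical names; ties are broken alphabetically here, which is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordCountDemo {
    // Counts the words, then returns the entries ordered by count (highest first).
    static List<Map.Entry<String, Integer>> countsDescending(Iterable<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // inserts 1 or adds 1 to the existing count
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("foo", "bar", "baz", "baz", "foo", "foo", "foo");
        System.out.println(countsDescending(words)); // prints "[foo=4, baz=2, bar=1]"
    }
}
```

This sorts once after counting, in O(n log n) over the distinct words, instead of keeping a heap up to date during counting.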

Summing values in a List. Could I be doing this more efficiently?

I have a list of Fact objects. Each object has a Date field (reportingDate) and a long field (numberSaved). There are several results for each reportingDate. I'm trying to get a sum of all numberSaved values for each reporting date. Currently, I'm doing it like this:
private static List<Fact> sumFacts(List<Fact> facts) {
List<Fact> summedFacts = new ArrayList<Fact>();
for (Fact fact : facts) {
boolean found = false;
for (Fact sumFact : summedFacts) {
if(sumFact.getReportingDate().equals(fact.getReportingDate())) {
found = true;
sumFact.setNumberSaved(sumFact.getNumberSaved() + fact.getNumberSaved());
}
}
if (!found) summedFacts.add(fact);
}
return summedFacts;
}
public class Fact {
String reportingDate;
long numberSaved;
public String getReportingDate() {
return reportingDate;
}
public void setReportingDate(String reportingDate) {
this.reportingDate = reportingDate;
}
public long getNumberSaved() {
return numberSaved;
}
public void setNumberSaved(long numberSaved) {
this.numberSaved = numberSaved;
}
}
For each item in the original list, it iterates through the new list looking for a matching Date. If it finds an object with a matching date, it adds its numberSaved value to it. If it makes it through the whole list without finding a matching date, it adds itself to the new list.
Is there a more efficient way that I could be summing the values into a list of Fact objects with unique dates?
EDIT:
I forgot to mention that I need to maintain the order of the items
Instead of keeping your facts in a List and iterating over it (producing an O(n^2) complexity), you could store them in a map from the reporting date to the fact object, giving you an O(n) complexity:
private static List<Fact> sumFacts(List<Fact> facts) {
Map<String, Fact> summedFacts = new HashMap<String, Fact>();
for (Fact fact : facts) {
Fact summedFact = summedFacts.get(fact.getReportingDate());
if (summedFact == null) {
summedFacts.put (fact.getReportingDate(), fact);
} else {
summedFact.setNumberSaved(summedFact.getNumberSaved() + fact.getNumberSaved());
}
}
return new ArrayList<Fact>(summedFacts.values());
}
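Since the question's edit asks for the original order to be kept, one possible variation is to use a `LinkedHashMap`, which iterates in insertion order, so each date appears at the position of its first occurrence. The sketch below uses a simplified, immutable stand-in for `Fact` and returns new objects instead of mutating the input; both are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SumFactsDemo {
    // Simplified stand-in for the question's Fact class.
    static final class Fact {
        final String reportingDate;
        final long numberSaved;

        Fact(String reportingDate, long numberSaved) {
            this.reportingDate = reportingDate;
            this.numberSaved = numberSaved;
        }
    }

    // One pass over the input; dates keep the order in which they were first seen.
    static List<Fact> sumFacts(List<Fact> facts) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Fact fact : facts) {
            totals.merge(fact.reportingDate, fact.numberSaved, Long::sum);
        }
        List<Fact> summed = new ArrayList<>(totals.size());
        for (Map.Entry<String, Long> entry : totals.entrySet()) {
            summed.add(new Fact(entry.getKey(), entry.getValue()));
        }
        return summed;
    }

    public static void main(String[] args) {
        List<Fact> facts = Arrays.asList(
                new Fact("2014-03-02", 5), new Fact("2014-03-01", 1), new Fact("2014-03-02", 7));
        for (Fact fact : sumFacts(facts)) {
            System.out.println(fact.reportingDate + " -> " + fact.numberSaved);
        }
        // prints "2014-03-02 -> 12" and then "2014-03-01 -> 1"
    }
}
```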
The only way this could be faster is if both lists were sorted by some key (most likely your date that you are using). Checking for the existence of an object in an unsorted list is O(n), and you are doing this for every element of another list, making the problem O(m * n).
This shows that your solution is as efficient as it can be without presorting lists.
The most you can improve on is to use List.add(int, Object) so that it inserts the item to the front of the list so that it is not looped over again.
You could greatly increase performance by using a Hashtable for summedFacts (read more at http://docs.oracle.com/javase/7/docs/api/java/util/Hashtable.html).
You would convert your date to a String and use it as the key of the Hashtable. The value of the Hashtable will hold the sum for the Fact objects with the same date.
Hashtable access takes constant time (O(1)) on average, therefore this solution leads to an O(n) implementation instead of your O(n*m) one.
For example:
private static Hashtable<String, Fact> sumFacts(List<Fact> facts) {
Hashtable<String, Fact> summedFacts = new Hashtable<String, Fact>();
for (Fact fact : facts) {
// Check whether an item with this date is already in the Hashtable. If not, add it.
Fact currentFact = summedFacts.get(fact.getReportingDate());
if (currentFact == null) {
summedFacts.put(fact.getReportingDate(), fact); // add the value to the Hashtable
} else {
// If the date is already there, then perform the addition.
currentFact.setNumberSaved(fact.getNumberSaved() + currentFact.getNumberSaved());
}
}
return summedFacts;
}

Ordered insertion in linkedHashSet, any performant way ?

So I have a LinkedHashSet with values, say, a1, a2, b, c1, c2.
I want to replace b with x, such that x takes the same position in the order as b had.
One obvious way would be
private LinkedHashSet<String> orderedSubstitution(final Set<String> originalOrderedSet, final String oldItem,
final String newItem) {
final LinkedHashSet<String> newOrderedSet = new LinkedHashSet<String>();
// Things we do to maintain order in a linkedHashSet
for (final String stringItem : originalOrderedSet) {
if (stringItem.equals(oldItem)) {
newOrderedSet.add(newItem);
} else {
newOrderedSet.add(stringItem);
}
}
return newOrderedSet;
}
Not only is this O(n), I also feel this is not the fastest way. Is there a better solution?
NOTE : I HAVE TO use linkedHashMap.
One way to do it would be to use a subclass of LinkedHashSet that has the replacement built in, e.g.:
public class ReplacingLinkedHashSet extends LinkedHashSet<String> {
private final String what;
private final String with;
public ReplacingLinkedHashSet(String what, String with) {
this.what = what;
this.with = with;
}
@Override
public Iterator<String> iterator() {
final Iterator<String> iterator = super.iterator();
return new Iterator<String>() {
@Override
public boolean hasNext() {
return iterator.hasNext();
}
@Override
public String next() {
String next = iterator.next();
return what.equals(next) ? with : next;
}
@Override
public void remove() {
iterator.remove();
}
};
}
}
But that means the replacement would have to be known before you fill the Set.
(Of course you could easily turn this <String> version into a generic one.)
Responding to comments:
OK, then there is no way to solve it without a full iteration. You could however just leave the LinkedHashSet untouched and decorate the iterator when retrieving the values.
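If the set has to be modified rather than decorated, one possible sketch is to do the substitution in place: remove the old item and everything after it, add the new item, then re-add the tail. This is still a full O(n) pass, it only avoids building a second set. `replaceInPlace` is a hypothetical helper, not part of the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

class InPlaceReplaceDemo {
    // Replaces oldItem with newItem at the same position; returns false if oldItem is absent.
    static <T> boolean replaceInPlace(LinkedHashSet<T> set, T oldItem, T newItem) {
        if (!set.contains(oldItem)) {
            return false;
        }
        List<T> tail = new ArrayList<>();
        boolean found = false;
        for (Iterator<T> it = set.iterator(); it.hasNext();) {
            T item = it.next();
            if (found) {
                tail.add(item); // remember everything after oldItem...
                it.remove();    // ...and take it out of the set for now
            } else if (Objects.equals(item, oldItem)) {
                found = true;
                it.remove();
            }
        }
        set.add(newItem);  // lands where oldItem used to be
        set.addAll(tail);  // the tail follows in its original order
        return true;
    }

    public static void main(String[] args) {
        LinkedHashSet<String> set = new LinkedHashSet<>(Arrays.asList("a1", "a2", "b", "c1", "c2"));
        replaceInPlace(set, "b", "x");
        System.out.println(set); // prints "[a1, a2, x, c1, c2]"
    }
}
```

One caveat: if the new item is already present before the old item's position, it keeps its earlier position, because re-adding an existing element does not move it in a LinkedHashSet.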
Create a Map structure.
Insert all the strings as <String, OrderOfTheString> pairs.
Insert each new String by giving it an order value that is a small delta after the OrderOfTheString of the string it should follow.
Convert the Map, sorted by order value, to a LinkedHashSet.
I know it is complicated, but it is definitely better when we have a LinkedHashMap of ~1000000 elements and there are about 1000 elements to be inserted.
