Say I have a Hashtable<String, Object> with keys and values like these:
apple => 1
orange => 2
mossberg => 3
I can use the standard get method to get 1 by "apple", but what I want is to get the same value (or a list of values) by a part of the key, for example "ppl". Of course this may yield several results; in that case I want to be able to process each key-value pair. So it's basically similar to a LIKE '%ppl%' SQL statement, but I don't want to use an (in-memory) database just because I don't want to add unnecessary complexity. What would you recommend?
Update:
Storing the data in a Hashtable isn't a requirement. I'm looking for a general approach to solving this.
The obvious brute-force approach would be to iterate through the keys in the map and match them against the char sequence. That could be fine for a small map, but of course it does not scale.
This could be improved by using a second map to cache search results. Whenever you collect a list of keys matching a given char sequence, you can store these in the second map so that next time the lookup is fast. Of course, if the original map is changed often, it may get complicated to update the cache. As always with caches, it works best if the map is read much more often than changed.
Alternatively, if you know the possible char sequences in advance, you could pre-generate the lists of matching strings and pre-fill your cache map.
Update: Hashtable is not recommended anyway - it is synchronized, and thus much slower than it needs to be. You are better off using HashMap if no concurrency is involved, or ConcurrentHashMap otherwise; the latter far outperforms a Hashtable.
Apart from that, off the top of my head I can't think of a better kind of collection for this task than maps. Of course, you may experiment with different map implementations to find the one that best suits your specific circumstances and usage patterns. In general, it would thus be:
Map<String, Object> fruits;
Map<String, List<String>> matchingKeys;
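For illustration, a minimal sketch of how those two maps could work together (the cache invalidation here is deliberately crude - the whole cache is dropped on every write):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Brute-force substring lookup over the key set, with a simple result cache.
// The cache is cleared on every modification, which only pays off when the
// map is read far more often than it is changed.
class SubstringLookup {
    private final Map<String, Object> fruits = new HashMap<>();
    private final Map<String, List<String>> matchingKeys = new HashMap<>();

    void put(String key, Object value) {
        fruits.put(key, value);
        matchingKeys.clear();                   // invalidate cached results
    }

    List<String> keysContaining(String fragment) {
        List<String> cached = matchingKeys.get(fragment);
        if (cached != null) {
            return cached;                      // cache hit: no iteration needed
        }
        List<String> matches = new ArrayList<>();
        for (String key : fruits.keySet()) {
            if (key.contains(fragment)) {       // the LIKE '%fragment%' part
                matches.add(key);
            }
        }
        matchingKeys.put(fragment, matches);    // remember for the next lookup
        return matches;
    }
}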
Not without iterating explicitly. Hashtable is designed to go (exact) key -> value in O(1), nothing more, nothing less. If you will be doing query operations over large amounts of data, I recommend you consider a database. You can use an embedded one like SQLite (see SQLiteJDBC), so no separate process or installation is required. You then have the option of database indexes.
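For what it's worth, a hedged sketch of that embedded-database route, assuming the sqlite-jdbc driver is on the classpath (the table and column names are made up for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// An in-memory SQLite table queried with LIKE, mirroring LIKE '%ppl%'.
public class LikeLookupDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:");
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("CREATE TABLE entries (k TEXT PRIMARY KEY, v INTEGER)");
            stmt.executeUpdate("INSERT INTO entries VALUES ('apple', 1), ('orange', 2), ('mossberg', 3)");

            try (PreparedStatement query =
                         conn.prepareStatement("SELECT k, v FROM entries WHERE k LIKE ?")) {
                query.setString(1, "%ppl%");
                try (ResultSet rs = query.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString("k") + " => " + rs.getInt("v"));
                    }
                }
            }
        }
    }
}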
I know of no standard Java collection that can do this type of operation efficiently.
Sounds like you need a trie with references to your data. A trie stores strings and lets you search for strings by prefix. I don't know the Java standard library too well and I have no idea whether it provides an implementation, but one is available here:
http://www.cs.duke.edu/~ola/courses/cps108/fall96/joggle/trie/Trie.java
Unfortunately, a trie only lets you search by prefixes. You can work around this by storing every possible suffix of each of your keys:
For 'apple', you'd store the strings
'apple'
'pple'
'ple'
'le'
'e'
That would allow you to search for every prefix of every suffix of your keys.
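A short sketch of generating those suffixes (the trie itself is left out; any implementation with an insert method would do):

import java.util.ArrayList;
import java.util.List;

// Every suffix of a key, so each can be inserted into a prefix-only trie.
static List<String> allSuffixes(String key) {
    List<String> suffixes = new ArrayList<>();
    for (int i = 0; i < key.length(); i++) {
        suffixes.add(key.substring(i));   // "apple", "pple", "ple", "le", "e"
    }
    return suffixes;
}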
Admittedly, this is the kind of "solution" that would prompt me to continue looking for other options.
First of all, use HashMap, not Hashtable.
Then you can filter the map with a predicate, using the utilities in Google Guava:
public Collection<Object> getValues() {
    Map<String, Object> filtered = Maps.filterKeys(map, new Predicate<String>() {
        @Override
        public boolean apply(String key) {
            return key.contains("ppl");   // keep keys containing the fragment
        }
    });
    return filtered.values();
}
This can't be done in a single operation.
You may want to iterate over the keys and use the ones that contain your desired string.
The only solution I can see (I'm not a Java expert) is to iterate over the keys and match each against a regular expression. Whenever a key matches, you put that key-value pair into the hashtable that will be returned.
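A small sketch of that idea using java.util.regex (the method name is made up; the pattern is whatever partial key you are after):

import java.util.Hashtable;
import java.util.Map;
import java.util.regex.Pattern;

// Collect every entry whose key matches the regular expression, e.g. "ppl".
static Hashtable<String, Object> entriesMatching(Hashtable<String, Object> source, String regex) {
    Pattern pattern = Pattern.compile(regex);
    Hashtable<String, Object> result = new Hashtable<>();
    for (Map.Entry<String, Object> entry : source.entrySet()) {
        if (pattern.matcher(entry.getKey()).find()) {   // partial match, like '%ppl%'
            result.put(entry.getKey(), entry.getValue());
        }
    }
    return result;
}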
If you can somehow reduce the problem to searching by prefix, you might find a NavigableMap helpful.
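For illustration, a hedged sketch of the prefix case with a TreeMap; note that this helps with "appl"-style prefixes, not with "ppl"-style infixes:

import java.util.NavigableMap;
import java.util.TreeMap;

// All keys starting with the prefix form a contiguous range in a sorted map,
// so subMap() returns them without scanning the whole map.
static NavigableMap<String, Object> byPrefix(TreeMap<String, Object> map, String prefix) {
    return map.subMap(prefix, true, prefix + Character.MAX_VALUE, true);
}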
It may also be interesting for you to look through this question: Fuzzy string search library in Java
Also take a look at Lucene (see the second answer there).
I was trying to implement a trie and read in an example implementation that it would be more space-efficient to use a small array of size 26 to store the children, because then you wouldn't have to waste space with a HashMap (the code was in Java, if that makes a difference).
But wouldn't a map be more space-efficient, since you don't necessarily need to store all 26 values? Or does a HashMap with Character keys simply take more space because a plain array doesn't carry the background overhead that these more complex objects do?
I just want to check whether this person was mistaken, or whether there is some overhead involved in using object types like HashMap that I should be aware of.
A hashmap stores keys and values, so if you were to implement a trie using a hashmap, you would be storing not only the values, but also the keys. If you use an array, then the key is actually the index of the value in the array, so you do not have to store it anywhere.
Besides that, hashmaps are less space-efficient than arrays because they always have a load factor smaller than one, which is another way of saying that they keep more slots allocated than there are entries, due to the way they work. I won't expand on this because it isn't central to your issue, but if you are curious, search for "HashMap load factor".
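To make the difference concrete, a hedged sketch of the two node layouts (assuming keys consist only of lowercase a-z for the array variant):

import java.util.HashMap;
import java.util.Map;

// Array-backed node: the child's letter is implied by its index, so no key is
// stored, but 26 slots are allocated whether or not they are used.
class ArrayTrieNode {
    ArrayTrieNode[] children = new ArrayTrieNode[26];

    ArrayTrieNode child(char c) {
        return children[c - 'a'];
    }
}

// Map-backed node: only the children that actually exist are stored, but each
// entry also carries its boxed Character key plus the map's own bookkeeping.
class MapTrieNode {
    Map<Character, MapTrieNode> children = new HashMap<>();

    MapTrieNode child(char c) {
        return children.get(c);
    }
}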
This is a classic programming compromise. In this case, using a HashMap will take more space, but that may be a price you're willing to pay for improved performance. Or it might not. Depends on your problem.
I'm looking for the most efficient way to get all the elements from a List<String> that contain some String value, for example "value1".
First thought: simple iteration, adding the elements that contain "value1" to another List<String>. But this task must be done very often and by many users.
I thought about list.removeAll(), but how do I remove all elements that don't contain "value1"?
So, what is the most efficient way to do this?
UPDATE:
The whole picture: I need to read the logs from a file very often and for multiple users simultaneously. The logs must be filtered by username; each line in the file contains a username.
In terms of time efficiency, you cannot get a better result than linear (O(n)) if you have to iterate through the whole list.
Deciding between LinkedList and ArrayList etc. is most likely irrelevant as the differences are small.
If you want better than linear time in the list size, you need to build on some assumptions and prerequisites:
if you know beforehand what string you'll search for, you can build another list along with your original list containing only relevant records
if you know you're going to query one list multiple times, you could build an index (sketched below)
If you just have a list on input that someone gave you, and you need to read through this one input once and find the relevant strings, then you're stuck with linear time since you cannot avoid reading the list at least once.
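For the index option, a minimal sketch, assuming each log line starts with the username (extractUsername is a placeholder for however the real format is parsed):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Group the lines once by username; every later lookup is then a single map
// access instead of a full scan.
static Map<String, List<String>> indexByUser(List<String> lines) {
    Map<String, List<String>> index = new HashMap<>();
    for (String line : lines) {
        index.computeIfAbsent(extractUsername(line), k -> new ArrayList<>()).add(line);
    }
    return index;
}

// Placeholder: adjust to the actual log format.
static String extractUsername(String line) {
    return line.split(" ")[0];
}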
From your comments it seems like your list is a couple of log statements that should be grouped by user id (which would be your "value1"). If you really need to read the logs very often and for multiple users simultaneously you might consider some caching, possibly with grouping by user id.
As an example, you could maintain an additional log file per user and just display it when needed. Alternatively, you could keep the latest log statements in memory by employing some FIFO buffer which is grouped by user id (could be a buffer per user and maybe another LIFO layer on top of that).
However, depending on your use case it might not be worth the effort, and you might just filter the list whenever the user requests it. In that case I'd recommend reading the file line by line and only adding the matching lines to the list. If you first read everything into a single list and then remove non-matching elements, it'll be less efficient (you'd have to iterate more often, shift elements, etc.) and temporarily use more memory (as opposed to discarding every non-matching line right after checking it).
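A minimal sketch of that line-by-line filtering (the path handling and the contains() check are placeholders for the real log format):

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Stream the log file and keep only matching lines; non-matching lines are
// discarded immediately instead of being loaded and removed later.
static List<String> linesForUser(String path, String username) throws IOException {
    List<String> result = new ArrayList<>();
    try (BufferedReader reader = Files.newBufferedReader(Paths.get(path))) {
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.contains(username)) {
                result.add(line);
            }
        }
    }
    return result;
}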
Instead of a List, use a TreeSet with a suitable Comparator so that all Strings containing "value1" come first. When iterating, as soon as a string does not contain "value1", none of the remaining ones do either, and you can stop iterating.
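A hedged sketch of that comparator (natural order breaks ties so the set stays consistent with equals):

import java.util.Comparator;
import java.util.TreeSet;

// Strings containing "value1" sort before all others, so iteration can stop
// at the first non-matching element.
Comparator<String> matchesFirst = Comparator
        .comparing((String s) -> s.contains("value1") ? 0 : 1)
        .thenComparing(Comparator.<String>naturalOrder());

TreeSet<String> set = new TreeSet<>(matchesFirst);
set.add("value1foo");
set.add("foo");
set.add("foovalue1");

for (String s : set) {
    if (!s.contains("value1")) {
        break;                    // everything after this point cannot match
    }
    System.out.println(s);        // process a matching element
}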
The iteration is likely the only way, but you can allow Java to optimize it as much as possible (and use an elegant, non-imperative syntax) by employing Java 8's streams:
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// test list
List<String> original = new ArrayList<String>() {
    {
        add("value1"); add("foo"); add("foovalue1"); add("value1foo");
    }
};

List<String> trimmed = original.stream()
        .filter(s -> s.contains("value1"))
        .collect(Collectors.toList());

System.out.println(trimmed);
Output
[value1, foovalue1, value1foo]
Notes
One part of your question that may require more information is "performed often, by many users" - this may call for some concurrency-handling mechanism.
The actual functionality is not very clear. You may still have room to optimize your code early by fetching and collecting the "value1"-containing Strings prior to building your List.
OK, here I can suggest the simplest approach I have used.
Using an Iterator makes this easier; if you go with list.remove(val) instead, where val = "value1", some List implementations may give you an UnsupportedOperationException (and removing while looping over the list can also throw a ConcurrentModificationException).
import java.util.Iterator;
import java.util.List;

List<String> list = yourList;   /* contains some elements with "value1" */
for (Iterator<String> itr = list.iterator(); itr.hasNext();) {
    String val = itr.next();
    if (!val.contains("value1")) {   // keep only the elements containing "value1"
        itr.remove();
    }
}
Try this one and let me know. :)
I have the following key-value structure (a HashMap), where String is a key like "2014/12/06":
LinkedHashMap<String, Value>
So I can retrieve an item knowing the key, but what I'm looking for is a way to retrieve a list of the values whose keys match partially; I mean, how could I retrieve all the values for 2014?
I would like to avoid solutions like testing every item in the map, brute force, or similar.
Thanks.
Apart from the brute-force solution of iterating over all the keys, I can think of two options:
Use a TreeMap, in which the keys are sorted, so you can find the first key that is >= "2014/01/01" (using map.ceilingEntry("2014/01/01")) and go over all the keys from there.
Use a hierarchy of Maps - i.e. Map<String,Map<String,Value>>. The key in the outer Map would be the year. The key in the inner map would be the full date.
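For illustration, a hedged sketch of the first option; with "yyyy/MM/dd"-style keys sorted lexicographically, all of 2014 is the half-open range ["2014/01/01", "2015/01/01") (Value stands in for whatever type the map holds):

import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// subMap() expresses the same "start at 2014/01/01 and read forward" scan.
TreeMap<String, Value> byDate = new TreeMap<>();
byDate.put("2014/12/06", someValue);   // someValue is a placeholder

SortedMap<String, Value> year2014 = byDate.subMap("2014/01/01", "2015/01/01");
for (Map.Entry<String, Value> entry : year2014.entrySet()) {
    // process entry.getKey() / entry.getValue()
}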
Not possible with a LinkedHashMap alone. If you can copy the keys into a sorted list, you can perform a binary search on it and then call LinkedHashMap.get(...) with the full key(s).
If you're only ever going to want to retrieve items using the first part of the key, then you want a TreeMap rather than a LinkedHashMap. A LinkedHashMap is sorted according to insertion order, which is no use for this, but a TreeMap is sorted according to natural ordering, or to a Comparator that you supply. This means that you can find the first entry that starts with 2014 efficiently (in log time), and then iterate through until you get to the first one that doesn't match.
If you want to be able to match on any part of the key, then you need a totally different solution, way beyond a simple Map. You'd need to look into full text searching and indexing. You could try something like Lucene.
You could define a hash function for your keys so that keys from the same year hash to similarly prefixed hash values. That wouldn't be efficient (probably poor distribution of hashes), nor is it in the spirit of HashMaps. Use other map implementations, such as TreeMap, that keep an order of your choice.
I am looking for some kind of map with a fixed size, for example 20 entries, but not only that: I want to keep only the lowest values. Let's say I'm evaluating some kind of function and inserting the results into my map (I need a map because I have to keep key-value pairs), but I only want the 20 lowest results. I was thinking about sorting and then removing the last element, but I need to do this for millions of records, so sorting every time I add a value is not efficient. Maybe there is a better way?
Thanks for the help.
There is no built-in data structure for this in Java. You can try looking for one in the Guava library. Otherwise, think about using a LinkedHashMap or a TreeMap for this; you can wrap it in your own class to take care of the limiting.
If you care about efficiency, be advised that TreeMap is a red-black tree internally, so put() has O(log n) time complexity.
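A hedged sketch of such a wrapper (the class and method names are made up), assuming the quantity you want the 20 lowest of is used as the map key:

import java.util.TreeMap;

// Keeps only the 'limit' entries with the smallest keys. Each insert is
// O(log n) on a map that never grows past the limit, so there is no need to
// re-sort millions of records.
class LowestEntries<K extends Comparable<K>, V> {
    private final int limit;
    private final TreeMap<K, V> entries = new TreeMap<>();

    LowestEntries(int limit) {
        this.limit = limit;
    }

    void put(K key, V value) {
        entries.put(key, value);
        if (entries.size() > limit) {
            entries.pollLastEntry();       // drop the current largest key
        }
    }

    TreeMap<K, V> current() {
        return new TreeMap<>(entries);     // defensive copy of the lowest entries
    }
}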
Imagine you have a huge cache of data that has to be searchable in 4 ways:
exact match
prefix%
%suffix
%infix%
I'm using a trie for the first 3 types of searching, but I can't figure out how to approach the fourth other than by sequentially processing a huge array of elements.
If your dataset is huge, consider using a search platform like Apache Solr so that you don't end up in a performance mess.
You can construct a navigable map or set (e.g. TreeMap or TreeSet) for #2 (with keys in normal order) and #3 (with keys reversed).
For #4 you can construct a collection keyed by every starting letter. You can simplify this depending on your requirements. This can use more space but gives O(log n) lookup times.
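For illustration, a hedged sketch of the reversed-key idea for #3: a suffix of the original string is a prefix of its reversal, so a suffix query becomes a prefix query on a second, reversed set.

import java.util.NavigableSet;
import java.util.TreeSet;

// Stores every key reversed; results come back reversed as well and need to
// be reversed again before use.
class SuffixIndex {
    private final NavigableSet<String> reversedKeys = new TreeSet<>();

    void add(String key) {
        reversedKeys.add(new StringBuilder(key).reverse().toString());
    }

    // All stored strings (in reversed form) that end with 'suffix'.
    NavigableSet<String> endingWith(String suffix) {
        String prefix = new StringBuilder(suffix).reverse().toString();
        return reversedKeys.subSet(prefix, true, prefix + Character.MAX_VALUE, true);
    }
}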
For #4, I am thinking that if you pre-compute the number of occurrences of each character, you can then look up in that table the entries that have at least as many occurrences of each character as the search string.
How efficient this algorithm is will probably depend on the nature of the data and the search string. It might be useful to give some examples of both here to get better answers.
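A hedged sketch of that pre-filter, assuming the data is lowercase a-z (a final contains() check is still needed, since having the right character counts does not guarantee the substring is present):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Keep a per-string character-count table; only strings whose counts cover the
// search string's counts are candidates for the expensive contains() check.
class InfixPrefilter {
    private final Map<String, int[]> countTable = new HashMap<>();

    void add(String key) {
        countTable.put(key, counts(key));
    }

    List<String> search(String needle) {
        int[] needed = counts(needle);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, int[]> e : countTable.entrySet()) {
            if (covers(e.getValue(), needed) && e.getKey().contains(needle)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    private static int[] counts(String s) {
        int[] c = new int[26];
        for (int i = 0; i < s.length(); i++) {
            c[s.charAt(i) - 'a']++;
        }
        return c;
    }

    private static boolean covers(int[] have, int[] need) {
        for (int i = 0; i < 26; i++) {
            if (have[i] < need[i]) {
                return false;
            }
        }
        return true;
    }
}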