Searching a Hashmap - java

Hi, I am populating a HashMap from a dictionary.txt file and splitting the words into sets by word length.
I'm having trouble searching the HashMap for a pattern such as "a*d**k".
Can anyone help me?
How do I search a HashMap for a pattern like this?
I would really appreciate your help.
Thank you.

A HashMap is simply the wrong data structure for a pattern search.
You should look into technologies that feature pattern searching out of the box, like Lucene.
And in answer to this comment:
"I'm using it for Android, and it's the fastest way of searching."
HashMaps are awfully fast, that's true, but only if you use them as intended. In your scenario, hash codes are not important, as you know that all keys are numeric and you probably won't have any word that's longer than, say, 30 letters.
So why not just use an array or an ArrayList of Sets instead of a HashMap, and replace map.get(string.length()) with list.get(string.length()-1) or array[string.length()-1]? I bet the performance will be better than with a HashMap (but you won't be able to tell the difference unless you have a really old machine or gazillions of entries).
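To make the list-of-sets idea concrete, here is a minimal sketch. The class name LengthBuckets and its methods are made up for illustration, and it assumes that '*' in the pattern stands for exactly one arbitrary character:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical helper: words bucketed by length in a plain List of Sets.
class LengthBuckets {
    // index i holds all words of length i + 1
    private final List<Set<String>> buckets = new ArrayList<>();

    void add(String word) {
        String w = word.toLowerCase();
        if (w.isEmpty()) {
            return; // nothing to index
        }
        // grow the list until a bucket for this length exists
        while (buckets.size() < w.length()) {
            buckets.add(new TreeSet<>());
        }
        buckets.get(w.length() - 1).add(w);
    }

    // Assumption: '*' means "exactly one arbitrary character", as in "a*d**k".
    List<String> find(String wildcard) {
        List<String> hits = new ArrayList<>();
        if (wildcard.isEmpty() || wildcard.length() > buckets.size()) {
            return hits; // no bucket for this length
        }
        Pattern pattern = Pattern.compile(toRegex(wildcard.toLowerCase()));
        // only the bucket with the right word length is scanned
        for (String word : buckets.get(wildcard.length() - 1)) {
            if (pattern.matcher(word).matches()) {
                hits.add(word);
            }
        }
        return hits;
    }

    private static String toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            // quote literal characters so they cannot act as regex syntax
            sb.append(c == '*' ? "." : Pattern.quote(String.valueOf(c)));
        }
        return sb.toString();
    }
}
```

The lookup by length is a plain index access, and the regex only ever runs over words that have the right length.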
I'm not saying my design with a List or Array is nicer, but you are using a data structure for a purpose it wasn't intended for.
Seriously: How about writing all your words to a flat file (one word per line, sorted by word length and then alphabetically) and just running the regex query on that file? Stream the file and search the individual lines if it's too large, or read it as a String and keep that in memory if IO is too slow.
Or how about just using a TreeSet with a custom Comparator?
Sample code:
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

public class PatternSearch {

    enum StringComparator implements Comparator<String> {
        LENGTH_THEN_ALPHA {
            @Override
            public int compare(final String first, final String second) {
                // compare lengths
                int result = Integer.compare(first.length(), second.length());
                // and if they are the same, compare contents
                if (result == 0) {
                    result = first.compareTo(second);
                }
                return result;
            }
        }
    }

    private final NavigableSet<String> data =
        new TreeSet<String>(StringComparator.LENGTH_THEN_ALPHA);

    public boolean addWord(final String word) {
        return data.add(word.toLowerCase());
    }

    public Set<String> findByPattern(final String patternString) {
        final Pattern pattern =
            Pattern.compile(patternString.toLowerCase().replace('*', '.'));
        final Set<String> results = new TreeSet<String>();
        // only words of the pattern's length can match, so scan just that slice;
        // both bounds are inclusive (this should probably be optimized :-)
        for (final String word : data.subSet(
                patternString.replaceAll(".", "a"), true,
                patternString.replaceAll(".", "z"), true)) {
            if (pattern.matcher(word).matches()) {
                results.add(word);
            }
        }
        return results;
    }
}

Related

Sorting LinkedHashMap<String, String[]>.entrySet() by the keys, but in the middle of the keys using a regex

I have a LinkedHashMap which maps strings to string arrays.
The keys have the format of something like this: "xxx (yyy(0.123))"
Basically, I want to be able to sort the entry set in such a way that it sorts it by the decimal part, and not the beginning of the string. What I have done so far is converting the entry set to an ArrayList so that I can try calling Arrays.sort on it, but obviously that's going to just sort by the beginning of the string.
What I'm currently thinking is that I would have to go through this array, convert each key in the pair to a custom class with a comparator that compares the way I want it to (with the regular expression .*\((.*)\)\) to find the decimal). However, that sounds like a bunch of unnecessary overhead, so I was wondering if there was a simpler way. Thanks in advance.
First, you cannot "sort" a LinkedHashMap. A LinkedHashMap maintains its iteration order based on the order of insertion.
If you mean creating another LinkedHashMap by inserting the entries of the original map in sorted order: be aware that any entries added after the initial construction will be unsorted. So you may want to create an unmodifiable Map.
For the Comparator implementation, you do not need to convert the keys to a custom class. A comparator that does the comparison is good enough.
Like this:
(haven't compiled, just to show you the idea)
// assume the key is in the format "ABCDE,12345" and you want to sort by the numeric part:
Map<String, Foo> oldMap = ....; // assume you populated something in it
Map<String, Foo> sortedMap
    = new TreeMap<String, Foo>((a, b) -> {
        // here I use split(), you can use a regex instead
        int aNum = Integer.parseInt(a.split(",")[1]);
        int bNum = Integer.parseInt(b.split(",")[1]);
        if (aNum != bNum) {
            // Integer.compare avoids the overflow that aNum - bNum can cause
            return Integer.compare(aNum, bNum);
        } else {
            return a.compareTo(b);
        }
    });
sortedMap.putAll(oldMap); // Map offers putAll, not addAll
// now sortedMap contains your entries in sorted order.
// you may construct a new LinkedHashMap with it or do whatever you want
Your solution sounds fine.
If you run into performance issues, you could look into caching the decimal value by replacing your strings with an object that contains both the string and the decimal value. Then the value does not need to be recalculated multiple times during the sort.
There are trade-offs with the cached solution described above, and figuring out which technique is optimal will really depend on your entire solution.
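A rough sketch of that caching idea, in case it helps. Everything here is illustrative: DecimalKeySort and Keyed are made-up names, and the regex assumes the decimal is the first number that sits directly inside a pair of parentheses, as in "xxx (yyy(0.123))":

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DecimalKeySort {
    // Assumption: the sort value is the first number wrapped directly in parentheses.
    private static final Pattern DECIMAL = Pattern.compile("\\((\\d+(?:\\.\\d+)?)\\)");

    // Hypothetical holder: pairs a map entry with its decimal, parsed exactly once.
    private static final class Keyed<V> {
        final double decimal;
        final Map.Entry<String, V> entry;

        Keyed(double decimal, Map.Entry<String, V> entry) {
            this.decimal = decimal;
            this.entry = entry;
        }
    }

    static <V> LinkedHashMap<String, V> sortedByDecimal(Map<String, V> source) {
        List<Keyed<V>> keyed = new ArrayList<>();
        for (Map.Entry<String, V> e : source.entrySet()) {
            Matcher m = DECIMAL.matcher(e.getKey());
            if (!m.find()) {
                throw new IllegalArgumentException("No decimal in key: " + e.getKey());
            }
            keyed.add(new Keyed<>(Double.parseDouble(m.group(1)), e));
        }
        // the comparator only looks at the cached double, no regex work during the sort
        keyed.sort((a, b) -> Double.compare(a.decimal, b.decimal));
        // a LinkedHashMap preserves the sorted insertion order
        LinkedHashMap<String, V> result = new LinkedHashMap<>();
        for (Keyed<V> k : keyed) {
            result.put(k.entry.getKey(), k.entry.getValue());
        }
        return result;
    }
}
```

Unlike a TreeMap with a comparator on the decimal alone, this keeps entries whose keys happen to share the same decimal, because the comparator is only used for sorting and not for key identity.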
Is there a reason you need to use LinkedHashMap? The javadoc specifically states
This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order)
TreeMap seems a better fit for what you're trying to achieve, which allows you to provide a Comparator at construction. Using Java 8, this could be achieved with something like:
private static final String DOUBLE_REGEX = "(?<value>\\d+(?:\\.\\d+)?)";
private static final String FIND_REGEX = "[^\\d]*\\(" + DOUBLE_REGEX + "\\)[^\\d]*";
private static final Pattern FIND_PATTERN = Pattern.compile(FIND_REGEX);
private static final Comparator<String> COMPARATOR = Comparator.comparingDouble(
s -> {
final Matcher matcher = FIND_PATTERN.matcher(s);
if (!matcher.find()) {
throw new IllegalArgumentException("Cannot compare key: " + s);
}
return Double.parseDouble(matcher.group("value"));
});
private final Map<String, List<String>> map = new TreeMap<>(COMPARATOR);
Edit: If it has to be a LinkedHashMap (yours), you can always:
map.putAll(yours);
yours.clear();
yours.putAll(map);

Efficient way to test if a string is substring of any in a list of strings

I want to know the best way to compare a string to a list of strings. Here is the code I have in my mind, but it's clear that it's not good in terms of time complexity.
for (String large : list1) {
for (String small : list2) {
if (large.contains(small)) {
// DO SOMETHING
} else {
// NOT FOR ME
}
}
// FURTHER MANIPULATION OF STRING
}
Both lists of strings can contain more than thousand values, so the worst case complexity can rise to 1000×1000×length which is a mess. I want to know the best way to perform the task of comparing a string with a list of strings, in the given scenario above.
You could just do this:
for (String small : list2) {
if (set1.contains(small)) {
// DO SOMETHING
} else {
// NOT FOR ME
}
}
set1 should hold the strings of the larger list; instead of keeping it as a List<String>, use a Set<String> such as a HashSet<String>.
Thanks to the first answer by sandeep. Here is the solution:
List<String> firstCollection = new ArrayList<>();
Set<String> secondCollection = new HashSet<>();
//POPULATE BOTH LISTS HERE.
for(String string: firstCollection){
if(secondCollection.contains(string)){
//YES, THE STRING IS THERE IN THE SECOND LIST
}else{
//NOPE, THE STRING IS NOT THERE IN THE SECOND LIST
}
}
This is, unfortunately, a difficult and messy problem. It's because you're checking whether a small string is a substring of a bunch of large strings, instead of checking that the small string is equal to a bunch of large strings.
The best solution depends on exactly what problem you need to solve, but here is a reasonable first attempt:
In a temporary place, concatenate all the large strings together, then construct a suffix tree on this long concatenated string. With this structure, we should be able to find all the substring matches of any given small among all the large quickly.
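A real suffix tree is too long to show here, so the following sketch swaps in a naive suffix array, which answers the same "is this a substring of any large string" query with a binary search. The class name SubstringIndex is made up, the construction is deliberately simple rather than fast, and it assumes that '\n' never occurs in any of the strings:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: a naive suffix array over the concatenated large strings.
class SubstringIndex {
    private final String text;
    private final Integer[] suffixStarts;

    SubstringIndex(List<String> largeStrings) {
        // Assumption: '\n' never occurs in the inputs, so a query cannot match across two strings.
        final String joined = String.join("\n", largeStrings);
        this.text = joined;
        this.suffixStarts = new Integer[joined.length()];
        for (int i = 0; i < suffixStarts.length; i++) {
            suffixStarts[i] = i;
        }
        // Naive construction: sort the start positions by the suffix each one denotes.
        Arrays.sort(suffixStarts, (a, b) -> joined.substring(a).compareTo(joined.substring(b)));
    }

    boolean containsSubstring(String small) {
        // binary search for the smallest suffix that is >= small
        int lo = 0;
        int hi = suffixStarts.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareSuffix(suffixStarts[mid], small) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // if any suffix starts with small, the one found above does too
        return lo < suffixStarts.length && text.startsWith(small, suffixStarts[lo]);
    }

    // same ordering as String.compareTo, without copying the suffix
    private int compareSuffix(int start, String s) {
        int suffixLength = text.length() - start;
        int n = Math.min(suffixLength, s.length());
        for (int k = 0; k < n; k++) {
            int diff = text.charAt(start + k) - s.charAt(k);
            if (diff != 0) {
                return diff;
            }
        }
        return suffixLength - s.length();
    }
}
```

Building the index this way is slow for long inputs because every comparison copies a suffix; a proper suffix tree or a dedicated suffix array construction avoids that. The point of the sketch is only the query side: each small string costs a binary search instead of a scan over every large string.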

Finding a loose match for a string in arraylist

I have a large ArrayList which contains 1000 entries, one of which is "world". I also have the phrase "big world". I want "big world" to be matched with the "world" entry in the ArrayList.
What is the most cost-effective way of doing this? I cannot use the .contains method of ArrayList, and if I traverse all 1000 entries and match them by pattern, it is going to be very costly in terms of performance. I am using Java for this.
Could you please let me know the best way to do this?
Cheers,
J
You can split up every single element of the ArrayList into words and stop as soon as you find one of them.
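A small sketch of the word-splitting idea, turned around slightly: instead of splitting the list entries, it splits the phrase and looks each word up in a HashSet built from the list. It assumes the list entries are single words, and LooseMatch and firstMatchingWord are made-up names:

```java
import java.util.Optional;
import java.util.Set;

class LooseMatch {
    // Returns the first word of the phrase that is also an entry, if any.
    static Optional<String> firstMatchingWord(String phrase, Set<String> entries) {
        for (String word : phrase.trim().split("\\s+")) {
            if (entries.contains(word)) {
                return Optional.of(word); // stop as soon as one word matches
            }
        }
        return Optional.empty();
    }
}
```

With the entries copied once into a HashSet (new HashSet<>(list)), each lookup is a hash probe, so the cost depends on the number of words in the phrase rather than on the 1000 entries.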
Judging by your profile, you develop in Java; with Lucene you could easily do something like this:
public class NodesAnalyzer extends Analyzer {
public TokenStream tokenStream(String fieldName, Reader reader) {
Tokenizer tokenizer = new StandardTokenizer(reader)
TokenFilter lowerCaseFilter = new LowerCaseFilter(tokenizer)
TokenFilter stopFilter = new StopFilter(lowerCaseFilter, Data.stopWords.collect{ it.text } as String[])
SnowballFilter snowballFilter = new SnowballFilter(stopFilter, new org.tartarus.snowball.ext.ItalianStemmer())
return snowballFilter
}
}
Analyzer analyzer = new NodesAnalyzer()
TokenStream ts = analyzer.tokenStream(null, new StringReader(str));
Token token = ts.next()
while (token != null) {
String cur = token.term()
token = ts.next();
}
Note: this is Groovy code that I copied from a personal project so you will have to translate things like Data.stopWords.collect{ it.text } as String[] to use with plain Java
Assuming you don't know the content of the ArrayList elements, you will have to traverse the whole ArrayList.
Traversing the ArrayList costs O(n).
Sorting the ArrayList wouldn't help, because you are searching for a string within a set of strings, and sorting would be more expensive anyway: O(n log n).
If you have to search the list repeatedly, it may make sense to use the sort() and binarySearch() methods of Collections.
Addendum: As noted by @user177883, the cost of an O(n log n) sort must be weighed against the benefit of subsequent O(log n) searches.
The word "heart" matches the [word] "ear".
As an exact match is insufficient, this approach would be inadequate.
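For reference, this is roughly what the sort-once, search-many approach looks like, keeping in mind that it only finds exact matches. SortedLookup is a made-up name:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedLookup {
    private final List<String> sorted;

    SortedLookup(List<String> entries) {
        // pay the O(n log n) sort once, on a private copy
        sorted = new ArrayList<>(entries);
        Collections.sort(sorted);
    }

    // O(log n) per query; binarySearch returns a negative value when the word is absent
    boolean containsExactly(String word) {
        return Collections.binarySearch(sorted, word) >= 0;
    }
}
```

A query for "ear" against a list that contains "heart" comes back false, which is exactly the limitation described above.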
I had a very similar issue.
Solved it by using this if/else if statement.
if (myArrayList.contains(wordThatIsEntered)
&& wordThatCantBeMatched.equals(wordThatIsEntered)) {
Toast.makeText(getApplicationContext(),
"WORD CAN'T BE THE SAME OR THAT WORD ISN'T HERE",
Toast.LENGTH_SHORT).show();
}
else if (myArrayList.contains(wordThatIsEntered)) {
Toast.makeText(getApplicationContext(),
"FOUND THE EXACT WORD YOU ARE LOOKING FOR!",
Toast.LENGTH_SHORT).show();
}

how to do sorting using java

I have a text file with a list of letters and numbers. I want to sort it by the numbers using Java.
My text file looks like this:
a--->12347
g--->65784
r--->675
I have read the text file and split each line, but I don't know how to perform the sorting. I am new to Java. Please give me an idea.
I want my output to be:
g--->65784
a--->12347
r--->675
Please help me. Thanks in advance.
My code is:
String str = "";
BufferedReader br = new BufferedReader(new FileReader("counts.txt"));
while ((str = br.readLine()) != null) {
String[] get = str.split("---->>");
When I search the internet, everyone suggests using arrays. I tried, but without success. How do I put get[1] into an array?
int arr[]=new int[50]
arr[i]=get[1];
for(int i=0;i<50000;i++){
for(int j=i+1;j<60000;j++){
if(arr[i]>arr[j]){
System.out.println(arr[i]);
}
}
You should use the Arrays.sort() or Collections.sort() methods that allow you to specify a custom Comparator, and implement such a Comparator to determine how the strings should be compared for the purpose of sorting (since you don't want the default lexicographic order). It looks like that should involve parsing them as integers.
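As a minimal sketch of the comparator approach: the class and method names below are made up, and the code assumes every line looks like "a--->12347", with the number after the "--->" delimiter shown in the sample file:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CountSorter {
    // Assumption: this is the delimiter used in the file, as in "a--->12347".
    static final String DELIMITER = "--->";

    // Extracts the number that follows the delimiter.
    static int countOf(String line) {
        int at = line.indexOf(DELIMITER);
        if (at < 0) {
            throw new IllegalArgumentException("Missing delimiter in line: " + line);
        }
        return Integer.parseInt(line.substring(at + DELIMITER.length()).trim());
    }

    // Returns a new list ordered by the number, largest first.
    static List<String> sortByCountDescending(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparingInt(CountSorter::countOf).reversed());
        return sorted;
    }
}
```

Because the whole line is kept and only the comparator looks at the number, lines with equal numbers are not lost, and the sorted lines can be printed as they are.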
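Your str.split approach looks good to me (just make sure the delimiter you pass to split matches the "--->" in the file). Use Integer.parseInt to get an int out of the string portion representing the number. Then put the "labels" and numbers in a TreeMap as described below. The TreeMap will keep the entries sorted according to the keys (the numbers in your case); iterate over descendingKeySet() instead of keySet() if you want the largest number first, as in your expected output.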
import java.util.TreeMap;
public class Test {
public static void main(String[] args) {
TreeMap<Integer, String> tm = new TreeMap<Integer, String>();
tm.put(12347, "a");
tm.put(65784, "g");
tm.put(675, "r");
for (Integer num : tm.keySet())
System.out.println(tm.get(num) + "--->" + num);
}
}
Output:
r--->675
a--->12347
g--->65784
From the API for TreeMap:
The map is sorted according to the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used.
You can use a TreeMap and print its contents by iterating over its keys. You may have to implement your own Comparator.
Rather than give you the code, I will point you down the following path: TreeMap. Read, learn, implement.
What you want to do is:
1) convert the numbers into integers
2) Store them in a collection
3) use Collections.sort() to sort the list.
I assume that you are an absolute beginner.
You are correct up to the split part.
You need to place the split values immediately into a string or a custom object.
You would create something like:
class MyClass //please, a better name,
{
//and better field names, based on your functionality
int number;
String string;
}
Note: To keep these objects in a sorted collection, the class has to implement Comparable (or you have to supply a Comparator); equals and hashCode should be consistent with that ordering.
After the split (your first snippet), create an object of this class, place get[0] into string and get[1] into number (after converting the string to an integer).
You place this object into a TreeMap.
Now you have a sorted collection.
I have deliberately not specified the details. Feel free to google any term or phrase you don't understand. That way you will understand the solution rather than copy-pasting some code.

How can I form an ordered list of values extracted from HashMap?

My problem is actually more nuanced than the question suggests, but wanted to keep the header brief.
I have a HashMap<String, File> of File objects as values. The keys are String name fields which are part of the File instances. I need to iterate over the values in the HashMap and return them as a single String.
This is what I have currently:
private String getFiles()
{
Collection<File> fileCollection = files.values();
StringBuilder allFilesString = new StringBuilder();
for(File file : fileCollection) {
allFilesString.append(file.toString());
}
return allFilesString.toString();
}
This does the job, but ideally I want the separate File values to be appended to the StringBuilder in order of int fileID, which is a field of the File class.
Hope I've made that clear enough.
Something like this should work:
List<File> fileCollection = new ArrayList<File>(files.values());
Collections.sort(fileCollection,
new Comparator<File>()
{
public int compare(File fileA, File fileB)
{
final int retVal;
if(fileA.fileID > fileB.fileID)
{
retVal = 1;
}
else if(fileA.fileID < fileB.fileID)
{
retVal = -1;
}
else
{
retVal = 0;
}
return (retVal);
}
});
Unfortunately there is no way of getting data out of a HashMap in any recognizable order. You have to either put all the values into a TreeSet with a Comparator that uses the fileID, or put them into an ArrayList and sort them with Collections.sort, again with a Comparator that compares the way you want.
The TreeSet method doesn't work if there are any duplicates, and it may be overkill since you're not going to be adding things to or removing things from the Set. The Collections.sort method is a good solution for instances like this where you're going to take all the values of the HashMap, sort them, and then toss away the sorted collection as soon as you've generated the result.
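On Java 8 or later, the sort-then-discard variant can be written quite compactly. This is only a sketch: FileEntry is a made-up stand-in for the asker's own File class, and the ", " separator is an assumption (the original code appends the values without one):

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.stream.Collectors;

class FileListing {
    // Hypothetical stand-in for the asker's File class with an int id.
    static final class FileEntry {
        private final int fileId;
        private final String name;

        FileEntry(int fileId, String name) {
            this.fileId = fileId;
            this.name = name;
        }

        int getFileId() {
            return fileId;
        }

        @Override
        public String toString() {
            return name + "#" + fileId;
        }
    }

    // Sorts a throwaway stream of the values by id and joins their string forms.
    static String describe(Collection<FileEntry> files) {
        return files.stream()
            .sorted(Comparator.comparingInt(FileEntry::getFileId))
            .map(FileEntry::toString)
            .collect(Collectors.joining(", "));
    }
}
```

Calling describe(files.values()) leaves the HashMap untouched, and comparingInt sidesteps the hand-written if/else chain in the comparator.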
OK, this is what I've come up with. Seems to solve the problem, returns a String with the File objects nicely ordered by their fileId.
public String getFiles()
{
List<File> fileList = new ArrayList<File>(files.values());
Collections.sort(fileList, new Comparator<File>()
{
public int compare(File fileA, File fileB)
{
if(fileA.getFileId() > fileB.getFileId())
{
return 1;
}
else if(fileA.getFileId() < fileB.getFileId())
{
return -1;
}
return 0;
}
});
StringBuilder allFilesString = new StringBuilder();
for(File file : fileList) {
allFilesString.append(file.toString());
}
return allFilesString.toString();
}
I've never used Comparator before (relatively new to Java), so would appreciate any feedback if I've implemented anything incorrectly.
Why not collect it in an array, sort it, then concatenate it?
-- MarkusQ
You'll have to add your values() Collection to an ArrayList and sort it using Collections.sort() with a custom Comparator instance before iterating over it.
BTW, note that it's pointless to initialize the StringBuffer with the size of the collection, since you'll be adding far more than 1 character per collection element.
Create a temporary List, then add each pair of data into it. Sort it with Collections.sort() according to your custom comparator then you will have the List in your desired order.
Here is the method you're looking for: http://java.sun.com/javase/6/docs/api/java/util/Collections.html#sort(java.util.List,%20java.util.Comparator)
I created LinkedHashMap a dozen times before it was added to the set of collections.
What you probably want to do is create your own TreeHashMap collection (the JDK has no class of that name).
Creating a second collection and appending anything added to both isn't really a size hit, and you get the performance of both (with the cost of a little bit of time when you add).
Doing it as a new collection helps your code stay clean and neat. The collection class should just be a few lines long, and should just replace your existing hashmap...
If you get in the habit of always wrapping your collections, this stuff just works, you never even think of it.
StringBuffer allFilesString = new StringBuffer(fileCollection.size());
Unless file.toString() returns one character on average, you are probably making the StringBuffer too small. (If the size is not right, you may as well not set it and keep the code simpler.) You may get better results if you make it some multiple of the size. Additionally, StringBuffer is synchronized but StringBuilder is not, so StringBuilder is more efficient here.
Remove the unnecessary if statements:
List<File> fileCollection = new ArrayList<File>(files.values());
Collections.sort(fileCollection,
    new Comparator<File>() {
        public int compare(File a, File b) {
            // Integer.compare avoids the overflow that a.fileID - b.fileID can cause
            return Integer.compare(a.fileID, b.fileID);
        }
    });
