Efficient alternative to nested For Loop - java

I am writing a profanity filter. I have two nested for loops, as shown below. Is there a better way that avoids the nested loops and improves the time complexity?
boolean isProfane = false;
final String phraseInLowerCase = phrase.toLowerCase();
for (int start = 0; start < phraseInLowerCase.length(); start++) {
if (isProfane) {
break;
}
for (int offset = 1; offset < (phraseInLowerCase.length() - start + 1 ); offset++) {
String subGeneratedCode = phraseInLowerCase.substring(start, start + offset);
//BlacklistPhraseSet is a HashSet which contains all profane words
if (blacklistPhraseSet.contains(subGeneratedCode)) {
isProfane=true;
break;
}
}
}

Consider this Java 8 version of @Mad Physicist's implementation:
boolean isProfane = Stream.of(phrase.split("\\s+"))
.map(String::toLowerCase)
.anyMatch(w -> blacklistPhraseSet.contains(w));
or
boolean isProfane = Stream.of(phrase
.toLowerCase()
.split("\\s+"))
.anyMatch(w -> blacklistPhraseSet.contains(w));

If you want to check every possible combination of consecutive characters, then your algorithm is O(n^2), assuming that you use a Set with O(1) lookup characteristics, like a HashSet. You would probably be able to reduce this by breaking the data and the blacklist into Trie structures and walking along each possibility that way.
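To make the Trie idea a bit more concrete, here is a minimal sketch. The class name TrieProfanityFilter and its methods are made up for illustration, and it only puts the blacklist into a trie (not the phrase). For each start position it walks the trie until no child matches, so the cost is roughly O(n * L), where L is the length of the longest blacklisted word, instead of testing every substring.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; the names are illustrative and not from the question.
class TrieProfanityFilter {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean endOfWord;
    }

    private final Node root = new Node();

    // Build the trie once from the blacklist (stored in lower case).
    TrieProfanityFilter(Iterable<String> blacklist) {
        for (String word : blacklist) {
            Node node = root;
            for (char c : word.toLowerCase().toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.endOfWord = true;
        }
    }

    // True if any substring of the phrase is a blacklisted word.
    boolean isProfane(String phrase) {
        String text = phrase.toLowerCase();
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int i = start; i < text.length(); i++) {
                node = node.children.get(text.charAt(i));
                if (node == null) {
                    break;              // no blacklisted word continues this way
                }
                if (node.endOfWord) {
                    return true;        // text[start..i] is blacklisted
                }
            }
        }
        return false;
    }
}
```

Like the original loops, this matches blacklisted words anywhere inside the phrase, including inside longer words.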
A simpler approach might be to use a heuristic like "profanity always starts and ends at a word boundary". Then you can do
isProfane = false;
for(String word: phrase.toLowerCase().split("\\s+")) {
if(blacklistPhraseSet.contains(word)) {
isProfane = true;
break;
}
}

You won't improve the time complexity much, because these approaches still iterate under the hood, but you could split the phrase on spaces and iterate over the resulting array of words.
Something like:
String[] arrayWords = phrase.toLowerCase().split(" ");
for(String word:arrayWords){
if(blacklistPhraseSet.contains(word)){
isProfane = true;
break;
}
}
The problem with this code is that it only matches whole words: unless your blacklist contains compound words, it won't match them, whereas your code, as I understand it, will. The word "f**k" in the blacklist won't match "f**kwit" in my code, but it will in yours.

Related

Find a complex element in a set of elements

I have a function that allows me to find a match between an incomplete element and at least one element in a set. An example of an incomplete element is 22.2.X.13, in which there is an item (defined with X) that could assume any value.
The goal of this function is to find at least one element in a set of elements that has 22 in the first position, 2 on the second, and 13 on the fourth.
For example, if we consider the set:
{
20.8.31.13,
32.3.29.13,
24.2.12.13,
19.2.37.13,
22.2.22.13,
27.17.22.13,
26.22.32.13,
22.3.22.13,
20.19.12.13,
17.4.37.13,
31.8.34.13
}
The function returns true, since the set contains 22.2.22.13, which corresponds to 22.2.X.13.
My function compares each pair of elements like strings and each item of the elements as an integer:
public boolean containsElement(String element) {
StringTokenizer strow = null, st = null;
boolean check = true;
String nextrow = "", next = "";
for(String row : setOfElements) {
strow = new StringTokenizer(row, ".");
st = new StringTokenizer(element, ".");
check = true;
while(st.hasMoreTokens()) {
next = st.nextToken();
if(!strow.hasMoreTokens()) {
break;
}
nextrow = strow.nextToken();
if(next.compareTo("X") != 0) {
int x = Integer.parseInt(next);
int y = Integer.parseInt(nextrow);
if(x != y) {
check = false;
break;
}
}
}
if(check) return true;
}
return false;
}
However, it is an expensive operation, particularly if the size of the string increases. Can you suggest to me another strategy or data structure to quickly perform this operation?
My solution is closely related to strings. However, we can consider other types for elements (e.g. array, list, tree node, etc)
Thanks to all for your answers. I have tried almost all of the functions, and here are the benchmark results:
myFunction: 0ms
hasMatch: 2ms
Stream API: 5ms
isIPMatch: 2ms
I think that the main problem of the regular expression is the time to create the pattern and match the strings.
You want to use Regex which is made exactly for tasks like this. Check out the demo.
22\.2\.\d+\.13
Java 8 and higher
You can use Stream API as of Java 8 to find at least one matching the Regex using Pattern and Matcher classes:
Set<String> set = ... // the set of Strings (can be any collection)
Pattern pattern = Pattern.compile("22\\.2\\.\\d+\\.13"); // compiled Pattern
boolean matches = set.stream() // Stream<String>
.map(pattern::matcher) // Stream<Matcher>
.anyMatch(Matcher::matches); // true if at least one matches
Java 7 and lower
The approach is equivalent to the Stream API version: a short-circuiting for-each loop that breaks as soon as a match is found.
boolean matches = false;
Pattern pattern = Pattern.compile("22\\.2\\.\\d+\\.13");
for (String str: set) {
Matcher matcher = pattern.matcher(str);
if (matcher.matches()) {
matches = true;
break;
}
}
You can solve this by approaching the problem in a regex-based manner, as suggested by Nikolas Charalambidis (+1), or you can do it differently. To avoid being redundant with another answer, I will focus on an alternative approach here, using the split method.
public boolean isIPMatch(String pattern[], String input[]) {
if ((pattern == null) || (input == null) || (pattern.length != input.length)) return false; //edge cases
for (int index = 0; index < pattern.length; index++) {
if ((!pattern[index].equals("X")) && (!pattern[index].equals(input[index]))) return false; //difference
}
return true; //everything matched
}
And you can call the method above in your loop, after converting the items to compare to String arrays via split.
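A minimal sketch of that call, assuming the set is passed in as a parameter (the class name IpMatchDemo and the containsElement signature are made up here). Note that it compares the fields as strings, so "02" and "2" are treated as different, unlike the integer comparison in the question.

```java
import java.util.Set;

// Hypothetical wrapper class; only isIPMatch comes from the answer above.
class IpMatchDemo {
    static boolean isIPMatch(String[] pattern, String[] input) {
        if (pattern == null || input == null || pattern.length != input.length) {
            return false; // edge cases
        }
        for (int index = 0; index < pattern.length; index++) {
            if (!pattern[index].equals("X") && !pattern[index].equals(input[index])) {
                return false; // difference in a non-wildcard field
            }
        }
        return true; // everything matched
    }

    static boolean containsElement(Set<String> setOfElements, String element) {
        // split() takes a regex, so the dot has to be escaped
        String[] pattern = element.split("\\."); // split the pattern only once
        for (String row : setOfElements) {
            if (isIPMatch(pattern, row.split("\\."))) {
                return true;
            }
        }
        return false;
    }
}
```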
For strings, regular expressions solve the task a lot better:
private boolean hasMatch(String[] haystack, String partial) {
String patternString = partial.replace("X", "[0-9]+").replace(".", "\\.");
// "22.2.X.13" becomes "22\\.2\\.[0-9]+\\.13"
Pattern p = Pattern.compile(patternString);
for (String s : haystack) {
if (p.matcher(s).matches()) return true;
}
return false;
}
For other types of objects, it depends on their structure.
If there is some kind of order, you could consider making your elements implement Comparable - and then you can place them into a TreeSet (or as keys in a TreeMap), which will always be kept sorted. This way, you can compare only against the elements that can match: mySortedSet.subSet(fromElement, toElement) returns only the elements between those two.
If there is no order, you will simply have to compare all elements against your "pattern".
Note that strings are comparable, but their default sorting order ignores the special semantics of your .-separators. So, with some care you can implement a treeset-based approach to make the search better-than-linear.
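As a rough sketch of what "with some care" could look like for plain strings: if the pattern starts with fixed fields (e.g. 22.2. in 22.2.X.13), every candidate has to start with that prefix, and in a TreeSet<String> all strings sharing a prefix sit in one contiguous range. The class and method names below are made up, and the upper bound prefix + '\uffff' assumes the elements contain only digits and dots. If the pattern starts with X, the range is the whole set and nothing is gained.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical helper; narrows the search to the range sharing the fixed prefix.
class PrefixRangeSearch {
    static boolean containsElement(TreeSet<String> sorted, String element) {
        int firstWildcard = element.indexOf('X');
        String prefix = firstWildcard < 0 ? element : element.substring(0, firstWildcard);
        // Every string that starts with prefix lies in [prefix, prefix + '\uffff').
        NavigableSet<String> candidates =
                sorted.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
        String[] pattern = element.split("\\.");
        for (String candidate : candidates) {
            if (fieldsMatch(pattern, candidate.split("\\."))) {
                return true;
            }
        }
        return false;
    }

    // Field-by-field comparison; "X" matches any value.
    private static boolean fieldsMatch(String[] pattern, String[] input) {
        if (pattern.length != input.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (!pattern[i].equals("X") && !pattern[i].equals(input[i])) {
                return false;
            }
        }
        return true;
    }
}
```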
Other answers have already discussed using a regular expression by converting e.g. 22.2.X.13 to 22\.2\.\d+\.13 (don't forget to also escape the . or they mean "anything"). But while this will definitely be simpler and probably also a good bit faster, it does not lower the overall complexity. You still have to check each element in the set.
Instead, you might try to convert your set of IPs to a nested Map in this form:
{20: {8: {31: {13: null}}, 19: {12: {13: null}}}, 22: {2: {...}, 3: {...}}, ...}
(Of course, you should create this structure just once, and not for each search query.)
You can then write a recursive function match that works roughly as follows (pseudocode):
boolean match(ip: String, map: Map<String, Map<...>>) {
if (ip.empty) return true // done
first, rest = ip.splitfirst
if (first == "X") {
return map.values().any(submap -> match(rest, submap))
} else {
return first in map && match(rest, map[first])
}
}
This should reduce the complexity from O(n) to roughly O(log n). The cost grows the more often you have to branch out for an X, up to O(n) for a pattern like X.X.X.123 (X.X.X.X is trivial again). For small sets, a regular expression might still be faster because it has less overhead, but for larger sets this approach should be faster.
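The pseudocode above could be turned into Java roughly as follows. This is only a sketch: the class name NestedMapIndex and its add/matches methods are made up, and it uses a small Node class instead of raw nested maps to keep the generics readable. Like the pseudocode, it returns true as soon as the pattern is exhausted, so it assumes all elements have the same number of fields as the pattern.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical index class; build it once, then run many queries against it.
class NestedMapIndex {
    // Each node maps one field value to the sub-index for the remaining fields.
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
    }

    private final Node root = new Node();

    void add(String element) {
        Node node = root;
        for (String part : element.split("\\.")) {
            node = node.children.computeIfAbsent(part, k -> new Node());
        }
    }

    boolean matches(String pattern) {
        return match(pattern.split("\\."), 0, root);
    }

    private static boolean match(String[] parts, int index, Node node) {
        if (index == parts.length) {
            return true; // all fields of the pattern are consumed
        }
        String first = parts[index];
        if (first.equals("X")) {
            // wildcard: try every branch at this level
            for (Node child : node.children.values()) {
                if (match(parts, index + 1, child)) {
                    return true;
                }
            }
            return false;
        }
        Node child = node.children.get(first);
        return child != null && match(parts, index + 1, child);
    }
}
```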

Hash Tables: Ransom Note hackerrank

Harold is a kidnapper who wrote a ransom note, but now he is worried it will be traced back to him through his handwriting. He found a magazine and wants to know if he can cut out whole words from it and use them to create an untraceable replica of his ransom note. The words in his note are case-sensitive and he must use only whole words available in the magazine. He cannot use substrings or concatenation to create the words he needs.
Given the words in the magazine and the words in the ransom note, print Yes if he can replicate his ransom note exactly using whole words from the magazine; otherwise, print No.
For example, the note is "Attack at dawn". The magazine contains only "attack at dawn". The magazine has all the right words, but there's a case mismatch. The answer is No.
Sample Input 0
6 4
give me one grand today night
give one grand today
Sample Output 0
Yes
Sample Input 1
6 5
two times three is not four
two times two is four
Sample Output 1
No
My code 5/22 test cases failed :(
I can't figure out why 5 failed.
static void checkMagazine(String[] magazine, String[] note) {
int flag = 1;
Map<String, Integer> wordMap = new HashMap<>();
for(String word: magazine) {
if(!wordMap.containsKey(word)) {
wordMap.put(word, 1);
} else
wordMap.put(word,wordMap.get(word)+1);
}
for(String word: note){
if(!wordMap.containsKey(word)){
flag = 0;
break;
}
else wordMap.remove(word, wordMap.get(word));
}
if(flag == 0)
System.out.println("No");
else
System.out.println("Yes");
}
It's probably because instead of decrementing the count of the words in the magazine when you retrieve one, you're removing all counts of that word completely. Try this:
for(String word: note){
if(!(wordMap.containsKey(word) && wordMap.get(word) > 0)){
flag = 0;
break;
}
else wordMap.put(word, wordMap.get(word)-1);
}
wordMap is a frequency table and gives word counts.
However, for every word in the note you must decrease the word count instead of removing the entry entirely. Only when the word count reaches 0 may the entry be removed.
Another issue is case sensitivity. Depending on the requirements, you may need to convert all words to lowercase.
else {
wordMap.computeIfPresent(word, (k, v) -> v <= 1? null : v - 1);
}
This checks that the old value v is above 1 and then decreases it, or else returns a null value signaling to delete the entry.
The frequency counts can be built like this:
Map<String, Integer> wordMap = new HashMap<>();
for(String word: magazine) {
wordMap.merge(word, 1, Integer::sum);
}
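Putting the two snippets together, a complete version might look like the sketch below. The class name RansomNote and the boolean-returning canWrite method are made up for illustration (the HackerRank stub prints Yes/No instead). It keeps the comparison case-sensitive, as the problem statement requires.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class; combines merge() for counting with computeIfPresent() for consuming.
class RansomNote {
    static boolean canWrite(String[] magazine, String[] note) {
        // frequency table: word -> number of unused copies in the magazine
        Map<String, Integer> wordMap = new HashMap<>();
        for (String word : magazine) {
            wordMap.merge(word, 1, Integer::sum);
        }
        for (String word : note) {
            if (!wordMap.containsKey(word)) {
                return false; // word missing or all copies already used
            }
            // use up one copy; returning null removes the entry when the last copy is used
            wordMap.computeIfPresent(word, (k, v) -> v <= 1 ? null : v - 1);
        }
        return true;
    }
}
```

Calling System.out.println(RansomNote.canWrite(magazine, note) ? "Yes" : "No") would then produce the output format the problem asks for.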
I think this implementation is simpler:
static boolean checkMagazine(String[] magazine, String[] note) {
List<String> magazineCopy = new ArrayList<>(Arrays.asList(magazine));
for (String word : note)
{
if (magazineCopy.contains(word)) {
magazineCopy.remove(word);
continue;
}
return false;
}
return true;
}
I suppose your error is here:
else wordMap.remove(word, wordMap.get(word));
you are removing the word from the map, instead of decreasing the number of such words and only if the number reaches 0, you should remove the word from the map.
Python Solution
def checkMagazine(magazine, ransom):
    magazine.sort()
    ransom.sort()
    flag = True  # an empty note can always be written
    for word in ransom:
        if word not in magazine:
            flag = False
            break
        else:
            magazine.remove(word)
    if flag:
        print("Yes")
    else:
        print("No")

Is the time complexity of this code O(N^2)?

This is a solution of mine for one of the problems in leetcode. Through my deductions, I concluded it to have an overall O(N^2) time complexity. However, I would like to get a confirmation on this just so that I don't continue making the same mistakes when it comes to judging the time/space complexity of an algorithm.
Oh, and the problem goes as follows:
Given an input string, reverse the string word by word.
e.g. "I am you" == "you am I"
The code is as follows:-
public String reverseWords(String s) {
//This solution is in assumption that I am restricted to a one-pass algorithm.
//This can also be done through a two-pass algorithm -- i.e. split the string and etc.
if(null == s)
return "";
//remove leading and trailing spaces
s = s.trim();
int lengthOfString = s.length();
StringBuilder sb = new StringBuilder();
//Keeps track of the number of characters that have passed.
int passedChars = 0;
int i = lengthOfString-1;
for(; i >= 0; i--){
if(s.charAt(i) == ' '){
//Appends the startOfWord and endOfWord according to passedChars.
sb.append(s.substring(i+1, (i+1+passedChars))).append(" ");
//Ignore additional space chars.
while(s.charAt(i-1) == ' '){
i--;
}
passedChars = 0;
}else{
passedChars++;
}
}
//Handle last reversed word that have been left out.
sb.append(s.substring(i+1, (i+1+passedChars)));
//return reversedString;
return sb.toString();
}
My reasoning for this being an O(N^2) algorithm:-
The loop = O(n)
StringBuilder.append = O(1)
Substring method = O(n) [as of Java 7]
On that note, if anyone else has a better solution than this, please feel free to share it! :)
I was aiming for a one-pass solution and therefore, opted out of splitting the string before the loop.
Appreciate the help!
EDIT: I meant to ask about the time complexity of the portion of the code that contains the loop. I apologize in advance if the question was misleading/confusing. The whole chunk of code is meant for clarification purposes. :)
Time complexity is O(n).
Each insertion (append(x)) into a StringBuilder takes O(|x|), where |x| is the length of the string you are appending (independent of the state of the builder, on average).
Your algorithm iterates over the entire string and uses String#substring() for each word in it. Since the words do not overlap, you call substring() once per word and append the result to the builder (also once), giving you 2|x| for each word x.
Summing it up, gives you
T(S) = |S| + sum{2|x| for each word x}
But since sum{|x| for each word x} <= |S|, this gives you a total of:
T(S) = |S| + 2sum{|x| for each word x} <= |S| + 2|S| = 3|S|
Since |S| is the size of the input (n), this is O(n)
Note that the important point is that in JDK 7 the substring() method is linear in the size of the output string, not the original one (you copy only the relevant part, not the whole string).
Here is an alternative solution which I believe may perform a little better.
public String reverseWords(String s) {
String[] array = s.split(" ");
int len = array.length;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < len; i++) {
sb.append(" ").append(array[len - i - 1]);
}
return sb.toString().trim();
}
Amit has already given you a detailed explanation of the complexity computation; I would like to give you a simpler version.
In general, when we see nested loops we tend to assume the complexity is O(N^2). That is not always the case: the algorithm is O(n^2) only if it does work proportional to n for each of the n parts of the input. For example, with an input of size 3, you would have to perform some action 3 times on each of the 3 elements.
Since you traverse and process each part of your input string only once (even though you are using nested loops), the complexity is on the order of O(n). For the proof, see Amit's answer.
That said, I would have used the code below to reverse the order of the words:
String delim = " ";
String [] words = s.split(delim);
int wordCount = words.length;
for(int i = 0; i < wordCount / 2; i++) {
String temp = words[i];
words[i] = words[wordCount - i - 1];
words[wordCount - i - 1] = temp;
}
// Java 8+: String.join avoids post-processing Arrays.toString(), which would corrupt words containing "[", "]" or ", "
String result = String.join(delim, words);

Faster String Matching/Iteration Method?

In the program I'm currently working on, there's one part that's taking a bit long. Basically, I have a list of Strings and one target phrase. As an example, let's say the target phrase is "inventory of finished goods". Now, after filtering out the stop word (of), I want to extract all Strings from the list that contains one of the three words: "inventory", "finished", and "goods". Right now, I implemented the idea as follows:
String[] targetWords; // contains "inventory", "finished", and "goods"
ArrayList<String> extractedStrings = new ArrayList<String>();
for (int i = 0; i < listOfWords.size(); i++) {
String[] words = listOfWords.get(i).split(" ");
outerloop:
for (int j = 0; j < words.length; j++) {
for (int k = 0; k < targetWords.length; k++) {
if (words[j].equalsIgnoreCase(targetWords[k])) {
extractedStrings.add(listOfWords.get(i));
break outerloop;
}
}
}
}
The list contains over 100k words, and with this it takes roughly 0.4 to 0.8 seconds to complete the task for each target phrase. The thing is, I have a lot of these target phrases to process, and the seconds really add up. Thus, I was wondering if anyone knew of a more efficient way to complete this task. Thanks for the help in advance!
Your list of 100k words could be added (once) to a HashSet. Rather than iterating through your list, use wordSet.contains() - a HashSet gives constant-time performance for this, so not affected by the size of the list.
You can take your giant list of words and add them to a hash map and then when your phrase comes in, just loop over the words in your phrase and check against the hash map. Currently you are doing a linear search and what I'm proposing would cut it down to a constant time search.
The key is minimizing lookups. Using this technique you would be effectively indexing your giant list of words for fast lookups.
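A small variation on the same hashing idea, for the case where each list entry is a multi-word string as in the question: instead of indexing the big list, put the (few) target words into a HashSet and test each word of each entry against it, which removes the innermost loop. This is only a sketch; TargetWordFilter and extract are made-up names, and lower-casing with Locale.ROOT is assumed to be an acceptable stand-in for equalsIgnoreCase.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; replaces the loop over targetWords with one hash lookup per word.
class TargetWordFilter {
    static List<String> extract(List<String> listOfWords, String[] targetWords) {
        Set<String> targets = new HashSet<>();
        for (String target : targetWords) {
            targets.add(target.toLowerCase(Locale.ROOT));
        }
        List<String> extracted = new ArrayList<>();
        for (String entry : listOfWords) {
            for (String word : entry.split(" ")) {
                if (targets.contains(word.toLowerCase(Locale.ROOT))) {
                    extracted.add(entry);
                    break; // one hit is enough for this entry
                }
            }
        }
        return extracted;
    }
}
```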
You are passing through each of the elements of targetWords one at a time, instead of checking for all words from targetWords simultaneously. In addition, you split each entry of your list on every pass without really needing to, which creates overhead.
I would suggest that you combine your targetWords into one (compiled) regular expression:
(?xi) # turn on comments, use case insensitive matching
\b # word boundary, i.e. start/end of string, whitespace
( # begin of group containing 'inventory' or 'finished' or 'goods'
inventory|finished|goods # bar separates alternatives
) # end of group
\b # word boundary
Don't forget to double the backslashes in your regular expression string.
import java.util.regex.*;
...
Pattern targetPattern = Pattern.compile("(?xi)\\b(inventory|finished|goods)\\b");
for (String singleString : listOfWords) {
if (targetPattern.matcher(singleString).find()) {
extractedStrings.add(singleString);
}
}
If you are not satisfied with the speed of regular expressions - although regular expression engines are usually optimized for performance - you need to roll your own high-speed multi-string search. The Aho–Corasick string matching algorithm is optimized for searching several fixed strings in text, but of course implementing this algorithm is quite some effort compared with simply creating a Pattern.
I'm a little confused as to whether you want the whole phrase or just single words from listOfWords. If you are trying to get each string from listOfWords that contains one of your target words, this should work for you.
String[] targetWords= new String[]{"inventory", "finished", "goods"};
List<String> listOfWords = new ArrayList<String>();
// build lookup map
Map<String, ArrayList<String>> lookupMap = new HashMap<String, ArrayList<String>>();
for(String words : listOfWords) {
for(String word : words.split(" ")) {
if(lookupMap.get(word) == null) lookupMap.put(word, new ArrayList<String>());
lookupMap.get(word).add(words);
}
}
// find phrases
Set<String> extractedStrings = new HashSet<String>();
for(String target : targetWords) {
if(lookupMap.containsKey(target)) extractedStrings.addAll(lookupMap.get(target));
}
I would try to implement it with ExecutorService to parallelize search for each word.
http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/ExecutorService.html
For example with fixed thread pool size:
Executors.newFixedThreadPool(20);
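A minimal sketch of how that could look, assuming one task per target phrase and that each phrase has already been compiled into a Pattern (as in the regex answer). ParallelPhraseSearch and searchAll are made-up names, and the pool size is left as a parameter. Pattern is safe to share between threads; each task creates its own Matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Pattern;

// Hypothetical helper; runs the search for each target phrase on a fixed thread pool.
class ParallelPhraseSearch {
    static List<List<String>> searchAll(List<String> listOfWords,
                                        List<Pattern> targetPatterns,
                                        int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (Pattern pattern : targetPatterns) {
                tasks.add(() -> {
                    List<String> hits = new ArrayList<>();
                    for (String s : listOfWords) {
                        if (pattern.matcher(s).find()) {
                            hits.add(s);
                        }
                    }
                    return hits;
                });
            }
            // invokeAll blocks until all tasks finish and keeps the task order
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether this actually helps depends on how many cores are available; a pool much larger than the core count is unlikely to speed up a CPU-bound search.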

Optimizing a simple search algorithm

I have been playing around a bit with a fairly simple, home-made search engine, and I'm now twiddling with some relevancy sorting code.
It's not very pretty, but I'm not very good when it comes to clever algorithms, so I was hoping I could get some advice :)
Basically, I want each search result to get a score based on how many words match the search criteria: 3 points per exact word match and 1 point per partial match.
For example, if I search for "winter snow", these would be the results:
winter snow => 6 points
winter snowing => 4 points
winterland snow => 4 points
winter sun => 3 points
winterland snowing => 2 points
Here's the code:
String[] resultWords = result.split(" ");
String[] searchWords = searchStr.split(" ");
int score = 0;
for (String resultWord : resultWords) {
for (String searchWord : searchWords) {
if (resultWord.equalsIgnoreCase(searchWord))
score += 3;
else if (resultWord.toLowerCase().contains(searchWord.toLowerCase()))
score++;
}
}
Your code seems ok to me. I suggest little changes:
Since you are going through all possible combinations, you might as well move the toLowerCase() calls to the start.
Also, if an exact match has already occurred, you don't need to perform another equals.
result = result.toLowerCase();
searchStr = searchStr.toLowerCase();
String[] resultWords = result.split(" ");
String[] searchWords = searchStr.split(" ");
int score = 0;
for (String resultWord : resultWords) {
boolean exactMatch = false;
for (String searchWord : searchWords) {
if (!exactMatch && resultWord.equals(searchWord)) {
exactMatch = true;
score += 3;
} else if (resultWord.contains(searchWord))
score++;
}
}
Of course, this is a very basic level. If you are really interested in this area of computer science and want to learn more about implementing search engines start with these terms:
Natural Language Processing
Information retrieval
Text mining
stemming
For acronyms (e.g. SUN), case sensitivity is important; should a word that matches both content and case be weighted more than 3 points (say 5 or 7)?
Use the strategy design pattern.
For example, consider this naive score model:
interface ScoreModel {
int startingScore();
int partialMatch();
int exactMatch();
}
...
int search(String result, String searchStr, ScoreModel model) {
String[] resultWords = result.split(" ");
String[] searchWords = searchStr.split(" ");
int score = model.startingScore();
for (String resultWord : resultWords) {
for (String searchWord : searchWords) {
if (resultWord.equalsIgnoreCase(searchWord)) {
score += model.exactMatch();
} else if (resultWord.toLowerCase().contains(searchWord.toLowerCase())) {
score += model.partialMatch();
}
}
}
return score;
}
Basic optimization can be done by preprocessing your database: don't split entries into words every time.
When an entry is added to the DB, build its word list (prefer a hash set or binary tree to speed up lookups), remove all words that are too short, convert to lower case, and store this data for later use.
Do the same with the search string when a search starts (split, lower case, cleanup) and compare the resulting word list against every entry's word list.
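A rough sketch of that preprocessing, with made-up names (IndexedEntry, normalize, MIN_LENGTH) and an assumed minimum word length of 2. Note that storing the words in a set drops duplicates, so an entry that repeats a word scores it only once, which differs slightly from the original loop.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical wrapper around a DB entry; the word set is computed once, when the entry is added.
class IndexedEntry {
    static final int MIN_LENGTH = 2; // assumed cutoff for "too short" words

    final String text;
    final Set<String> words;

    IndexedEntry(String text) {
        this.text = text;
        this.words = normalize(text);
    }

    // Split, lower-case and drop short words; used for entries and for the search string.
    static Set<String> normalize(String s) {
        Set<String> out = new HashSet<>();
        for (String word : s.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.length() >= MIN_LENGTH) {
                out.add(word);
            }
        }
        return out;
    }

    // Same scoring rule as in the question: 3 for an exact match, 1 for a partial match.
    int score(Set<String> searchWords) {
        int score = 0;
        for (String resultWord : words) {
            for (String searchWord : searchWords) {
                if (resultWord.equals(searchWord)) {
                    score += 3;
                } else if (resultWord.contains(searchWord)) {
                    score++;
                }
            }
        }
        return score;
    }
}
```

The search string is normalized once per query, e.g. Set<String> query = IndexedEntry.normalize("winter snow"), and then passed to score() for every entry.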
1) You can sort searchWords first. You could break out of the loop once your result word was alphabetically after your current search word.
2) Even better, sort both, then walk along both lists simultaneously to find where any matches occur.
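Suggestion 2 could be sketched like this for the exact matches. SortedMerge and countExactMatches are made-up names, and the words are assumed to be lower-cased already. Partial (contains) matches are not covered by this walk and would still need a separate check.

```java
import java.util.Arrays;

// Hypothetical helper; counts exact word matches by walking two sorted arrays in step.
class SortedMerge {
    static int countExactMatches(String[] resultWords, String[] searchWords) {
        String[] a = resultWords.clone(); // sort copies so the callers' arrays stay untouched
        String[] b = searchWords.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        int i = 0;
        int j = 0;
        int matches = 0;
        while (i < a.length && j < b.length) {
            int cmp = a[i].compareTo(b[j]);
            if (cmp == 0) {
                matches++; // same word in both lists
                i++;
                j++;
            } else if (cmp < 0) {
                i++;       // a[i] sorts before b[j], so it cannot match anything later in b
            } else {
                j++;
            }
        }
        return matches;
    }
}
```

Multiplying the returned count by 3 gives the exact-match part of the score.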
You can use regular expressions to find patterns and the lengths of the matched patterns (for later classification/scoring).
