Related
Given is a String word and a String array book that contains some strings. The program should give out the number of possibilities to create word only using elements in book. An element can be used as many times as we want and the program must terminate in under 6 seconds.
For example, input:
String word = "stackoverflow";
String[] book = new String[9];
book[0] = "st";
book[1] = "ck";
book[2] = "CAG";
book[3] = "low";
book[4] = "TC";
book[5] = "rf";
book[6] = "ove";
book[7] = "a";
book[8] = "sta";
The output should be 2, since we can create "stackoverflow" in two ways:
1: "st" + "a" + "ck" + "ove" + "rf" + "low"
2: "sta" + "ck" + "ove" + "rf" + "low"
My implementation of the program only terminates in the required time if word is relatively small (<15 characters). However, as I mentioned before, the running time limit for the program is 6 seconds and it should be able to handle very large word strings (>1000 characters). Here is an example of a large input.
Here is my code:
1) the actual method:
input: a String word and a String[] book
output: the number of ways word can be written only using strings in book
public static int optimal(String word, String[] book){
int count = 0;
List<List<String>> allCombinations = allSubstrings(word);
List<String> empty = new ArrayList<>();
List<String> wordList = Arrays.asList(book);
for (int i = 0; i < allCombinations.size(); i++) {
allCombinations.get(i).retainAll(wordList);
if (!sumUp(allCombinations.get(i), word)) {
allCombinations.remove(i);
allCombinations.add(i, empty);
}
else count++;
}
return count;
}
2) allSubstrings():
input: a String input
output: A list of lists, each containing a combination of substrings that add up to input
static List<List<String>> allSubstrings(String input) {
if (input.length() == 1) return Collections.singletonList(Collections.singletonList(input));
List<List<String>> result = new ArrayList<>();
for (List<String> temp : allSubstrings(input.substring(1))) {
List<String> firstList = new ArrayList<>(temp);
firstList.set(0, input.charAt(0) + firstList.get(0));
if (input.startsWith(firstList.get(0), 0)) result.add(firstList);
List<String> l = new ArrayList<>(temp);
l.add(0, input.substring(0, 1));
if (input.startsWith(l.get(0), 0)) result.add(l);
}
return result;
}
3.) sumup():
input: A String list input and a String expected
output: true if the elements in input add up to expected
public static boolean sumUp (List<String> input, String expected) {
String x = "";
for (int i = 0; i < input.size(); i++) {
x = x + input.get(i);
}
if (expected.equals(x)) return true;
return false;
}
I've figured out what I was doing wrong in my previous answer: I wasn't using memoization, so I was redoing an awful lot of unnecessary work.
Consider a book array {"a", "aa", "aaa"}, and a target word "aaa". There are four ways to construct this target:
"a" + "a" + "a"
"aa" + "a"
"a" + "aa"
"aaa"
My previous attempt would have walk through all four, separately. But instead, one can observe that:
There is 1 way to construct "a"
You can construct "aa" in 2 ways, either "a" + "a" or using "aa" directly.
You can construct "aaa" either by using "aaa" directly (1 way); or "aa" + "a" (2 ways, since there are 2 ways to construct "aa"); or "a" + "aa" (1 way).
Note that the third step here only adds a single additional string to a previously-constructed string, for which we know the number of ways it can be constructed.
This suggests that if we count the number of ways in which a prefix of word can be constructed, we can use that to trivially calculate the number of ways a longer prefix by adding just one more string from book.
I defined a simple trie class, so you can quickly look up prefixes of the book words that match at any given position in word:
class TrieNode {
boolean word;
Map<Character, TrieNode> children = new HashMap<>();
void add(String s, int i) {
if (i == s.length()) {
word = true;
} else {
children.computeIfAbsent(s.charAt(i), k -> new TrieNode()).add(s, i + 1);
}
}
}
For each letter in s, this creates an instance of TrieNode, and stores the TrieNode for the subsequent characters etc.
static long method(String word, String[] book) {
// Construct a trie from all the words in book.
TrieNode t = new TrieNode();
for (String b : book) {
t.add(b, 0);
}
// Construct an array to memoize the number of ways to construct
// prefixes of a given length: result[i] is the number of ways to
// construct a prefix of length i.
long[] result = new long[word.length() + 1];
// There is only 1 way to construct a prefix of length zero.
result[0] = 1;
for (int m = 0; m < word.length(); ++m) {
if (result[m] == 0) {
// If there are no ways to construct a prefix of this length,
// then just skip it.
continue;
}
// Walk the trie, taking the branch which matches the character
// of word at position (n + m).
TrieNode tt = t;
for (int n = 0; tt != null && n + m <= word.length(); ++n) {
if (tt.word) {
// We have reached the end of a word: we can reach a prefix
// of length (n + m) from a prefix of length (m).
// Increment the number of ways to reach (n+m) by the number
// of ways to reach (m).
// (Increment, because there may be other ways).
result[n + m] += result[m];
if (n + m == word.length()) {
break;
}
}
tt = tt.children.get(word.charAt(n + m));
}
}
// The number of ways to reach a prefix of length (word.length())
// is now stored in the last element of the array.
return result[word.length()];
}
For the very long input given by OP, this gives output:
$ time java Ideone
2217093120
real 0m0.126s
user 0m0.146s
sys 0m0.036s
Quite a bit faster than the required 6 seconds - and this includes JVM startup time too.
Edit: in fact, the trie isn't necessary. You can simply replace the "Walk the trie" loop with:
for (String b : book) {
if (word.regionMatches(m, b, 0, b.length())) {
result[m + b.length()] += result[m];
}
}
and it performs slower, but still way faster than 6s:
2217093120
real 0m0.173s
user 0m0.226s
sys 0m0.033s
A few observations:
x = x + input.get(i);
As you are looping, using String+ isn't a good idea. Use a StringBuilder and append to that within the loop, and in the end return builder.toString(). Or you follow the idea from Andy. There is no need to merge strings, you already know the target word. See below.
Then: List implies that adding/removing elements might be costly. So see if you can get rid of that part, and if it would be possible to use maps, sets instead.
Finally: the real point would be to look into your algorithm. I would try to work "backwards". Meaning: first identify those array elements that actually occur in your target word. You can ignore all others right from start.
Then: look at all array entries that **start*+ your search word. In your example you can notice that there are just two array elements that fit. And then work your way from there.
My first observation would be that you don't actually need to build anything: you know what string you are trying to construct (e.g. stackoverflow), so all you really need to keep track of is how much of that string you have matched so far. Call this m.
Next, having matched m characters, provided m < word.length(), you need to choose a next string from book which matches the portion of word from m to m + nextString.length().
You could do this by checking each string in turn:
if (word.matches(m, nextString, 0, nextString.length()) { ...}
But you can do better, by determining strings that can't match in advance: the next string you append will have the following properties:
word.charAt(m) == nextString.charAt(0) (the next characters match)
m + nextString.length() <= word.length() (adding the next string shouldn't make the constructed string longer than word)
So, you can cut down the potential words from book that you might check by constructing a map of letters to words that start with that (point 1); and if you store the words with the same starting letter in increasing length order, you can stop checking that letter as soon as the length gets too big (point 2).
You can construct a map once and reuse:
Map<Character, List<String>> prefixMap =
Arrays.asList(book).stream()
.collect(groupingBy(
s -> s.charAt(0),
collectingAndThen(
toList(),
ss -> {
ss.sort(comparingInt(String::length));
return ss;
})));
You can count the number of ways recursively, without constructing any additional objects (*):
int method(String word, String[] book) {
return method(word, 0, /* construct map as above */);
}
int method(String word, int m, Map<Character, List<String>> prefixMap) {
if (m == word.length()) {
return 1;
}
int result = 0;
for (String nextString : prefixMap.getOrDefault(word.charAt(m), emptyList())) {
if (m + nextString.length() > word.length()) {
break;
}
// Start at m+1, because you already know they match at m.
if (word.regionMatches(m + 1, nextString, 1, nextString.length()-1)) {
// This is a potential match!
// Make a recursive call.
result += method(word, m + nextString.length(), prefixMap);
}
}
return result;
}
(*) This may construct new instances of Character, because of the boxing of the word.charAt(m): cached instances are guaranteed to be used for chars in the range 0-127 only. There are ways to work around this, but they would only clutter the code.
I think you are already doing a pretty good job at optimizing your application. In addition to the answer by GhostCat here are a few suggestions of my own:
public static int optimal(String word, String[] book){
int count = 0;
List<List<String>> allCombinations = allSubstrings(word);
List<String> wordList = Arrays.asList(book);
for (int i = 0; i < allCombinations.size(); i++)
{
/*
* allCombinations.get(i).retainAll(wordList);
*
* There is no need to retrieve the list element
* twice, just set it in a local variable
*/
java.util.List<String> combination = allCombinations.get(i);
combination.retainAll(wordList);
/*
* Since we are only interested in the count here
* there is no need to remove and add list elements
*/
if (sumUp(combination, word))
{
/*allCombinations.remove(i);
allCombinations.add(i, empty);*/
count++;
}
/*else count++;*/
}
return count;
}
public static boolean sumUp (List<String> input, String expected) {
String x = "";
for (int i = 0; i < input.size(); i++) {
x = x + input.get(i);
}
// No need for if block here, just return comparison result
/*if (expected.equals(x)) return true;
return false;*/
return expected.equals(x);
}
And since you are interested in seeing the execution time of your method I would recommend implementing a benchmarking system of some sort. Here is a quick mock-up:
private static long benchmarkOptima(int cycles, String word, String[] book) {
long totalTime = 0;
for (int i = 0; i < cycles; i++)
{
long startTime = System.currentTimeMillis();
int a = optimal(word, book);
long executionTime = System.currentTimeMillis() - startTime;
totalTime += executionTime;
}
return totalTime / cycles;
}
public static void main(String[] args)
{
String word = "stackoverflow";
String[] book = new String[] {
"st", "ck", "CAG", "low", "TC",
"rf", "ove", "a", "sta"
};
int result = optimal(word, book);
final int cycles = 50;
long averageTime = benchmarkOptima(cycles, word, book);
System.out.println("Optimal result: " + result);
System.out.println("Average execution time - " + averageTime + " ms");
}
Output
2
Average execution time - 6 ms
Note: The implementation is getting stuck in the test case mentioned by #user1221, working on it.
What I could think of is a Trie based approach that is O(sum of length of words in dict) space. Time is not optimal.
Procedure:
Build a Trie of all the words in the dictionary. This is a pre-processing task that will take O(sum of lengths of all strings in dict).
We try finding the string that you want to make in the trie, with a twist. We start with searching a prefix of the string. If we get a prefix in the trie, we start the search from the top recursively and continue to look for more prefixes.
When we reach the end of out string i.e. stackoverflow, we check if we arrived at the end of any string, if yes, then we reached a valid combination of this string. we count this while going back up the recursion.
eg:
In the above case, we use the dict as {"st", "sta", "a", "ck"}
We construct our trie ($ is the sentinel char, i.e. a char which is not in the dict):
$___s___t.___a.
|___a.
|___c___k.
the . represents that a word in the dict ends at that position.
We try to find the no of constructions of stack.
We start searching stack in the trie.
depth=0
$___s(*)___t.___a.
|___a.
|___c___k.
We see that we are at the end of one word, we start a new search with the remaining string ack from the top.
depth=0
$___s___t(*).___a.
|___a.
|___c___k.
Again we are at the end of one word in the dict. We start a new search for ck.
depth=1
$___s___t.___a.
|___a(*).
|___c___k.
depth=2
$___s___t.___a.
|___a.
|___c(*)___k.
We reach the end of stack and end of a word in the dict, hence we have 1 valid representation of stack.
depth=2
$___s___t.___a.
|___a.
|___c___k(*).
We go back to the caller of depth=2
No next char is available, we return to the caller of depth=1.
depth=1
$___s___t.___a.
|___a(*, 1).
|___c___k.
depth=0
$___s___t(*, 1).___a.
|___a.
|___c___k.
We move to next char. We see that we reached the end of one word in the dict, we launch a new search for ck in the dict.
depth=0
$___s___t.___a(*, 1).
|___a.
|___c___k.
depth=1
$___s___t.___a.
|___a.
|___c(*)___k.
We reach the end of the stack and a work in the dict, so another valid representation. We go back to the caller of depth=1
depth=1
$___s___t.___a.
|___a.
|___c___k(*, 1).
There are no more chars to proceed, we return with the result 2.
depth=0
$___s___t.___a(*, 2).
|___a.
|___c___k.
Note: The implementation is in C++, shouldn't be too hard to convert to Java and this implementation assumes that all chars are lowercase, it's trivial to extend it to both cases.
Sample code (full version):
/**
Node *base: head of the trie
Node *h : current node in the trie
string s : string to search
int idx : the current position in the string
*/
int count(Node *base, Node *h, string s, int idx) {
// step 3: found a valid combination.
if (idx == s.size()) return h->end;
int res = 0;
// step 2: we recursively start a new search.
if (h->end) {
res += count(base, base, s, idx);
}
// move ahead in the trie.
if (h->next[s[idx] - 'a'] != NULL) {
res += count(base, h->next[s[idx] - 'a'], s, idx + 1);
}
return res;
}
def cancons(target,wordbank, memo={}):
if target in memo:
return memo[target]
if target =='':
return 1
total_count =0
for word in wordbank:
if target.startswith(word):
l= len(word)
number_of_way=cancons(target[l:],wordbank,memo)
total_count += number_of_way
memo[target]= total_count
return total_count
if __name__ == '__main__':
word = "stackoverflow";
String= ["st", "ck","CAG","low","TC","rf","ove","a","sta"]
b=cancons(word,String,memo={})
print(b)
Trying to search for patterns of letters in a file, the pattern is entered by a user and comes out as a String, so far I've got it to find the first letter by unsure how to make it test to see if the next letter also matches the pattern.
This is the loop I currently have. any help would be appreciated
public void exactSearch(){
if (pattern==null){UI.println("No pattern");return;}
UI.println("===================\nExact searching for "+patternString);
int j = 0 ;
for(int i=0; i<data.size(); i++){
if(patternString.charAt(i) == data.get(i) )
j++;
UI.println( "found at " + j) ;
}
}
You need to iterate over the first string until you find the first character of the other string. From there, you can create an inner loop and iterate on both simultaneously, like you did.
Hint: be sure to look watch for boundaries as the strings might not be of the same size.
You can try this :-
String a1 = "foo-bar-baz-bar-";
String pattern = "bar";
int foundIndex = 0;
while(foundIndex != -1) {
foundIndex = a1.indexOf(pattern,foundIndex);
if(foundIndex != -1)
{
System.out.println(foundIndex);
foundIndex += 1;
}
}
indexOf - first parameter is the pattern string,
second parameter is starting index from where we have to search.
If pattern is found, it will return the starting index from where the pattern matched.
If pattern is not found, indexOf will return -1.
String data = "foo-bar-baz-bar-";
String pattern = "bar";
int foundIndex = data.indexOf(pattern);
while (foundIndex > -1) {
System.out.println("Match found at: " + foundIndex);
foundIndex = data.indexOf(pattern, foundIndex + pattern.length());
}
Based on your request, you can use this algorithm to search for your positions:
1) We check if we reach at the end of the string, to avoid the invalidIndex error, we verify if the remaining substring's size is smaller than the pattern's length.
2) We calculate the substring at each iteration and we verify the string with the pattern.
List<Integer> positionList = new LinkedList<>();
String inputString = "AAACABCCCABC";
String pattern = "ABC";
for (int i = 0 ; i < inputString.length(); i++) {
if (inputString.length() - i < pattern.length()){
break;
}
String currentSubString = inputString.substring(i, i + pattern.length());
if (currentSubString.equals(pattern)){
positionList.add(i);
}
}
for (Integer pos : positionList) {
System.out.println(pos); // Positions : 4 and 9
}
EDIT :
Maybe it can be optimized, not to use a Collection for this simple task, but I used a LinkedList to write a quicker approach.
I was recently in an interview and they asked me the following question:
Write a function to return true if a string matches a pattern, false
otherwise
Pattern: 1 character per item, (a-z), input: space delimited string
This was my solution for the first problem:
static boolean isMatch(String pattern, String input) {
char[] letters = pattern.toCharArray();
String[] split = input.split("\\s+");
if (letters.length != split.length) {
// early return - not possible to match if lengths aren't equal
return false;
}
Map<String, Character> map = new HashMap<>();
// aaaa test test test1 test1
boolean used[] = new boolean[26];
for (int i = 0; i < letters.length; i++) {
Character existing = map.get(split[i]);
if (existing == null) {
// put into map if not found yet
if (used[(int)(letters[i] - 'a')]) {
return false;
}
used[(int)(letters[i] - 'a')] = true;
map.put(split[i], letters[i]);
} else {
// doesn't match - return false
if (existing != letters[i]) {
return false;
}
}
}
return true;
}
public static void main(String[] argv) {
System.out.println(isMatch("aba", "blue green blue"));
System.out.println(isMatch("aba", "blue green green"));
}
The next part of the problem stumped me:
With no delimiters in the input, write the same function.
eg:
isMatch("aba", "bluegreenblue") -> true
isMatch("abc","bluegreenyellow") -> true
isMatch("aba", "t1t2t1") -> true
isMatch("aba", "t1t1t1") -> false
isMatch("aba", "t1t11t1") -> true
isMatch("abab", "t1t2t1t2") -> true
isMatch("abcdefg", "ieqfkvu") -> true
isMatch("abcdefg", "bluegreenredyellowpurplesilvergold") -> true
isMatch("ababac", "bluegreenbluegreenbluewhite") -> true
isMatch("abdefghijklmnopqrstuvwxyz", "zyxwvutsrqponmlkjihgfedcba") -> true
I wrote a bruteforce solution (generating all possible splits of the input string of size letters.length and checking in turn against isMatch) but the interviewer said it wasn't optimal.
I have no idea how to solve this part of the problem, is this even possible or am I missing something?
They were looking for something with a time complexity of O(M x N ^ C), where M is the length of the pattern and N is the length of the input, C is some constant.
Clarifications
I'm not looking for a regex solution, even if it works.
I'm not looking for the naive solution that generates all possible splits and checks them, even with optimization since that'll always be exponential time.
It is possible to optimize a backtracking solution. Instead of generating all splits first and then checking that it is a valid one, we can check it "on fly". Let's assume that we have already split a prefix(with length p) of the initial string and have matched i characters from the pattern. Let's take look at the i + 1 character.
If there is a string in the prefix that corresponds to the i + 1 letter, we should just check that a substring that starts at the position p + 1 is equal to it. If it is, we just proceed to i + 1 and p + the length of this string. Otherwise, we can kill this branch.
If there is no such string, we should try all substrings that start in the position p + 1 and end somewhere after it.
We can also use the following idea to reduce the number of branches in your solution: we can estimate the length of the suffix of the pattern which has not been processed yet(we know the length for the letters that already stand for some strings, and we know a trivial lower bound of the length of a string for any letter in the pattern(it is 1)). It allows us to kill a branch if the remaining part of the initial string is too short to match a the rest of the pattern.
This solution still has an exponential time complexity, but it can work much faster than generating all splits because invalid solutions can be thrown away much earlier, so the number of reachable states can reduce significantly.
I feel like this is cheating, and I'm not convinced the capture group and reluctant quantifier will do the right thing. Or maybe they're looking to see if you can recognize that, because of how quantifiers work, matching is ambiguous.
boolean matches(String s, String pattern) {
StringBuilder patternBuilder = new StringBuilder();
Map<Character, Integer> backreferences = new HashMap<>();
int nextBackreference = 1;
for (int i = 0; i < pattern.length(); i++) {
char c = pattern.charAt(i);
if (!backreferences.containsKey(c)) {
backreferences.put(c, nextBackreference++);
patternBuilder.append("(.*?)");
} else {
patternBuilder.append('\\').append(backreferences.get(c));
}
}
return s.matches(patternBuilder.toString());
}
You could improve on brute force by first assuming token lengths, and checking that the sum of token lengths equals the length of the test string. That would be quicker than pattern matching each time. Still very slow as number of unique tokens increases however.
UPDATE:
Here is my solution. Based it off of the explanation I made before.
import com.google.common.collect.*;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.tuple.Pair;
import org.apache.commons.math3.util.Combinations;
import java.util.*;
/**
* Created by carlos on 2/14/15.
*/
public class PatternMatcher {
public static boolean isMatch(char[] pattern, String searchString){
return isMatch(pattern, searchString, new TreeMap<Integer, Pair<Integer, Integer>>(), Sets.newHashSet());
}
private static boolean isMatch(char[] pattern, String searchString, Map<Integer, Pair<Integer, Integer>> candidateSolution, Set<String> mappedStrings) {
List<Integer> occurrencesOfCharacterInPattern = getNextUnmappedPatternOccurrences(candidateSolution, pattern);
if(occurrencesOfCharacterInPattern.size() == 0)
return isValidSolution(candidateSolution, searchString, pattern, mappedStrings);
List<Pair<Integer, Integer>> sectionsOfUnmappedStrings = sectionsOfUnmappedStrings(searchString, candidateSolution);
if(sectionsOfUnmappedStrings.size() == 0)
return false;
String firstUnmappedString = substring(searchString, sectionsOfUnmappedStrings.get(0));
for (int substringSize = 1; substringSize <= firstUnmappedString.length(); substringSize++) {
String candidateSubstring = firstUnmappedString.substring(0, substringSize);
if(mappedStrings.contains(candidateSubstring))
continue;
List<Pair<Integer, Integer>> listOfAllOccurrencesOfSubstringInString = Lists.newArrayList();
for (int currentIndex = 0; currentIndex < sectionsOfUnmappedStrings.size(); currentIndex++) {
Pair<Integer,Integer> currentUnmappedSection = sectionsOfUnmappedStrings.get(currentIndex);
List<Pair<Integer, Integer>> occurrencesOfSubstringInString =
findAllInstancesOfSubstringInString(searchString, candidateSubstring,
currentUnmappedSection);
for(Pair<Integer,Integer> possibleAddition:occurrencesOfSubstringInString) {
listOfAllOccurrencesOfSubstringInString.add(possibleAddition);
}
}
if(listOfAllOccurrencesOfSubstringInString.size() < occurrencesOfCharacterInPattern.size())
return false;
Iterator<int []> possibleSolutionIterator =
new Combinations(listOfAllOccurrencesOfSubstringInString.size(),
occurrencesOfCharacterInPattern.size()).iterator();
iteratorLoop:
while(possibleSolutionIterator.hasNext()) {
Set<String> newMappedSets = Sets.newHashSet(mappedStrings);
newMappedSets.add(candidateSubstring);
TreeMap<Integer,Pair<Integer,Integer>> newCandidateSolution = Maps.newTreeMap();
// why doesn't Maps.newTreeMap(candidateSolution) work?
newCandidateSolution.putAll(candidateSolution);
int [] possibleSolutionIndexSet = possibleSolutionIterator.next();
for(int i = 0; i < possibleSolutionIndexSet.length; i++) {
Pair<Integer, Integer> candidatePair = listOfAllOccurrencesOfSubstringInString.get(possibleSolutionIndexSet[i]);
//if(candidateSolution.containsValue(Pair.of(0,1)) && candidateSolution.containsValue(Pair.of(9,10)) && candidateSolution.containsValue(Pair.of(18,19)) && listOfAllOccurrencesOfSubstringInString.size() == 3 && candidateSolution.size() == 3 && possibleSolutionIndexSet[0]==0 && possibleSolutionIndexSet[1] == 2){
if (makesSenseToInsert(newCandidateSolution, occurrencesOfCharacterInPattern.get(i), candidatePair))
newCandidateSolution.put(occurrencesOfCharacterInPattern.get(i), candidatePair);
else
break iteratorLoop;
}
if (isMatch(pattern, searchString, newCandidateSolution,newMappedSets))
return true;
}
}
return false;
}
private static boolean makesSenseToInsert(TreeMap<Integer, Pair<Integer, Integer>> newCandidateSolution, Integer startIndex, Pair<Integer, Integer> candidatePair) {
if(newCandidateSolution.size() == 0)
return true;
if(newCandidateSolution.floorEntry(startIndex).getValue().getRight() > candidatePair.getLeft())
return false;
Map.Entry<Integer, Pair<Integer, Integer>> ceilingEntry = newCandidateSolution.ceilingEntry(startIndex);
if(ceilingEntry !=null)
if(ceilingEntry.getValue().getLeft() < candidatePair.getRight())
return false;
return true;
}
private static boolean isValidSolution( Map<Integer, Pair<Integer, Integer>> candidateSolution,String searchString, char [] pattern, Set<String> mappedStrings){
List<Pair<Integer,Integer>> values = Lists.newArrayList(candidateSolution.values());
return areIntegersConsecutive(Lists.newArrayList(candidateSolution.keySet())) &&
arePairsConsecutive(values) &&
values.get(values.size() - 1).getRight() == searchString.length() &&
patternsAreUnique(pattern,mappedStrings);
}
private static boolean patternsAreUnique(char[] pattern, Set<String> mappedStrings) {
Set<Character> uniquePatterns = Sets.newHashSet();
for(Character character:pattern)
uniquePatterns.add(character);
return uniquePatterns.size() == mappedStrings.size();
}
private static List<Integer> getNextUnmappedPatternOccurrences(Map<Integer, Pair<Integer, Integer>> candidateSolution, char[] searchArray){
List<Integer> allMappedIndexes = Lists.newLinkedList(candidateSolution.keySet());
if(allMappedIndexes.size() == 0){
return occurrencesOfCharacterInArray(searchArray,searchArray[0]);
}
if(allMappedIndexes.size() == searchArray.length){
return Lists.newArrayList();
}
for(int i = 0; i < allMappedIndexes.size()-1; i++){
if(!areIntegersConsecutive(allMappedIndexes.get(i),allMappedIndexes.get(i+1))){
return occurrencesOfCharacterInArray(searchArray,searchArray[i+1]);
}
}
List<Integer> listOfNextUnmappedPattern = Lists.newArrayList();
listOfNextUnmappedPattern.add(allMappedIndexes.size());
return listOfNextUnmappedPattern;
}
private static String substring(String string, Pair<Integer,Integer> bounds){
try{
string.substring(bounds.getLeft(),bounds.getRight());
}catch (StringIndexOutOfBoundsException e){
System.out.println();
}
return string.substring(bounds.getLeft(),bounds.getRight());
}
private static List<Pair<Integer, Integer>> sectionsOfUnmappedStrings(String searchString, Map<Integer, Pair<Integer, Integer>> candidateSolution) {
if(candidateSolution.size() == 0) {
return Lists.newArrayList(Pair.of(0, searchString.length()));
}
List<Pair<Integer, Integer>> sectionsOfUnmappedStrings = Lists.newArrayList();
List<Pair<Integer,Integer>> allMappedPairs = Lists.newLinkedList(candidateSolution.values());
// Dont have to worry about the first index being mapped because of the way the first candidate solution is made
for(int i = 0; i < allMappedPairs.size() - 1; i++){
if(!arePairsConsecutive(allMappedPairs.get(i), allMappedPairs.get(i + 1))){
Pair<Integer,Integer> candidatePair = Pair.of(allMappedPairs.get(i).getRight(), allMappedPairs.get(i + 1).getLeft());
sectionsOfUnmappedStrings.add(candidatePair);
}
}
Pair<Integer,Integer> lastMappedPair = allMappedPairs.get(allMappedPairs.size() - 1);
if(lastMappedPair.getRight() != searchString.length()){
sectionsOfUnmappedStrings.add(Pair.of(lastMappedPair.getRight(),searchString.length()));
}
return sectionsOfUnmappedStrings;
}
public static boolean areIntegersConsecutive(List<Integer> integers){
for(int i = 0; i < integers.size() - 1; i++)
if(!areIntegersConsecutive(integers.get(i),integers.get(i+1)))
return false;
return true;
}
public static boolean areIntegersConsecutive(int left, int right){
return left == (right - 1);
}
public static boolean arePairsConsecutive(List<Pair<Integer,Integer>> pairs){
for(int i = 0; i < pairs.size() - 1; i++)
if(!arePairsConsecutive(pairs.get(i), pairs.get(i + 1)))
return false;
return true;
}
public static boolean arePairsConsecutive(Pair<Integer, Integer> left, Pair<Integer, Integer> right){
return left.getRight() == right.getLeft();
}
public static List<Integer> occurrencesOfCharacterInArray(char[] searchArray, char searchCharacter){
assert(searchArray.length>0);
List<Integer> occurrences = Lists.newLinkedList();
for(int i = 0;i<searchArray.length;i++){
if(searchArray[i] == searchCharacter)
occurrences.add(i);
}
return occurrences;
}
public static List<Pair<Integer,Integer>> findAllInstancesOfSubstringInString(String searchString, String substring, Pair<Integer,Integer> bounds){
String string = substring(searchString,bounds);
assert(StringUtils.isNoneBlank(substring,string));
int lastIndex = 0;
List<Pair<Integer,Integer>> listOfOccurrences = Lists.newLinkedList();
while(lastIndex != -1){
lastIndex = string.indexOf(substring,lastIndex);
if(lastIndex != -1){
int newIndex = lastIndex + substring.length();
listOfOccurrences.add(Pair.of(lastIndex + bounds.getLeft(), newIndex + bounds.getLeft()));
lastIndex = newIndex;
}
}
return listOfOccurrences;
}
}
It works with the cases provided, but is not thoroughly tested. Let me know if there are any mistakes.
ORIGINAL RESPONSE:
Assuming your string you are searching can have arbitrary length tokens (which some of your examples do) then:
You want to start trying to break your string into parts that match the pattern. Looking for contradictions along the way to cut down on your search tree.
When you start processing you're going to select N characters of the beginning of the string. Now, go and see if you can find that substring in the rest of the string. If you can't then it can't possibly be a solution. If you can then your string looks something like this
(N characters)<...>[(N characters)<...>] where either one of the <...> contains 0+ characters and aren't necessarily the same substring. And whats inside of [] could repeat a number of times equal to the number of times (N characters) appears in the string.
Now, you have the first letter of your pattern matched, your not sure if the rest of the pattern matches, but you can basically re-use this algorithm (with modifications) to interrogate the <...> parts of the string.
You would do this for N = 1,2,3,4...
Make sense?
I'll work an example (which doesn't cover all cases, but hopefully illustrates) Note, when i'm referring to substrings in the pattern i'll use single quotes and when i'm referring to substrings of the string i'll use double quotes.
isMatch("ababac", "bluegreenbluegreenbluewhite")
Ok, 'a' is my first pattern.
for N = 1 i get the string "b"
where is "b" in the search string?
bluegreenbluegreenbluewhite.
Ok, so at this point this string MIGHT match with "b" being the pattern 'a'. Lets see if we can do the same with the pattern 'b'. Logically, 'b' MUST be the entire string "luegreen" (because its squeezed between two consecutive 'a' patterns) then I check in between the 2nd and 3rd 'a'. YUP, its "luegreen".
Ok, so far i've matched all but the 'c' of my pattern. Easy case, its the rest of the string. It matches.
This is basically writing a Perl regex parser. ababc = (.+)(.+)(\1)(\2)(.+). So you just have to convert it to a Perl regex
Here's a sample snippet of my code:
public static final boolean isMatch(String patternStr, String input) {
// Initial Check (If all the characters in the pattern string are unique, degenerate case -> immediately return true)
char[] patt = patternStr.toCharArray();
Arrays.sort(patt);
boolean uniqueCase = true;
for (int i = 1; i < patt.length; i++) {
if (patt[i] == patt[i - 1]) {
uniqueCase = false;
break;
}
}
if (uniqueCase) {
return true;
}
String t1 = patternStr;
String t2 = input;
if (patternStr.length() == 0 && input.length() == 0) {
return true;
} else if (patternStr.length() != 0 && input.length() == 0) {
return false;
} else if (patternStr.length() == 0 && input.length() != 0) {
return false;
}
int count = 0;
StringBuffer sb = new StringBuffer();
char[] chars = input.toCharArray();
String match = "";
// first read for the first character pattern
for (int i = 0; i < chars.length; i++) {
sb.append(chars[i]);
count++;
if (!input.substring(count, input.length()).contains(sb.toString())) {
match = sb.delete(sb.length() - 1, sb.length()).toString();
break;
}
}
if (match.length() == 0) {
match = t2;
}
// based on that character, update patternStr and input string
t1 = t1.replace(String.valueOf(t1.charAt(0)), "");
t2 = t2.replace(match, "");
return isMatch(t1, t2);
}
I basically decided to first parse the pattern string and determine if there are any matching characters in the pattern string. For example in "aab" "a" is used twice in the pattern string and so "a" cannot map to something else. Otherwise, if there are no matching characters in a string such as "abc", it won't matter what our input string is since the pattern is unique and so it doesn't matter what each pattern character matches to (degenerative case).
If there are matching characters in the pattern string, then I would begin to check what each string matches to. Unfortunately, without knowing the delimiter I wouldn't know how long each string would be. Instead, I just decided to parse 1 character at a time and check if the other parts of the string contains the same string and continue adding characters to the buffer letter by letter until the buffer string cannot be found in the input string. Once I have the string determined, it's now in the buffer I would simply delete all the matched strings in the input string and the character pattern from the pattern string then recurse.
Apologies if my explanation wasn't very clear, I hope my code can be clear though.
Assume I have the following list of string objects:
ABC1, ABC2, ABC_Whatever
What's the most efficient way to extract the left most common characters from this list ? So I'd get ABC in my case.
StringUtils.getCommonPrefix(String... strs) from Apache Commons Lang.
This will work for you
public static void main(String args[]) {
String commonInFirstTwo=greatestCommon("ABC1","ABC2");
String commonInLastTwo=greatestCommon("ABC2","ABC_Whatever");
System.out.println(greatestCommon(commonInFirstTwo,commonInLastTwo));
}
public static String greatestCommon(String a, String b) {
int minLength = Math.min(a.length(), b.length());
for (int i = 0; i < minLength; i++) {
if (a.charAt(i) != b.charAt(i)) {
return a.substring(0, i);
}
}
return a.substring(0, minLength);
}
You hash all the substrings of the words in the given list and keep track of those substrings. The one with the maximum occurrences is the one you want. Here is a sample implementation. It returns the most common substring
static String mostCommon(List<String> list) {
Map<String, Integer> word2Freq = new HashMap<String, Integer>();
String maxFreqWord = null;
int maxFreq = 0;
for (String word : list) {
for (int i = 0; i < word.length(); ++i) {
String sub = word.substring(0, i + 1);
Integer f = word2Freq.get(sub);
if (f == null) {
f = 0;
}
word2Freq.put(sub, f + 1);
if (f + 1 > maxFreq) {
if (maxFreqWord == null || maxFreqWord.length() < sub.length()) {
maxFreq = f + 1;
maxFreqWord = sub;
}
}
}
}
return maxFreqWord;
}
The above implementation may not suffice if you more than one common substring. Use the map within it.
System.out.println(mostCommon(Arrays.asList("ABC1", "ABC2", "ABC_Whatever")));
System.out.println(mostCommon(Arrays.asList("ABCDEFG1", "ABGG2", "ABC11_Whatever")));
Returns
ABC
AB
Your problem is just a rephrase of the standard problem of finding the longest common prefix
If you know what the common characters are, then you could check if the other strings contain those characters by using the .contains() method.
If you're willing to use a third party library, then the following using jOOλ generates that prefix for you:
String prefix = Seq.of("ABC1", "ABC2", "ABC_Whatever").commonPrefix();
Disclaimer: I work for the company behind jOOλ
if there are N strings and the minimum length among them is M charterers, then the most efficient (correct) answer will take N * M at worst case (when all strings are same).
outer loop - each character of first string at a time
inner loop - each of the strings
test - each charterer of the string in inner
loop against the charterer in outer loop.
the performance can be tuned upto (N-1) * M if we do not test against the first string in ther inner loop
I was asked this question in a phone interview for summer internship, and tried to come up with a n*m complexity solution (although it wasn't accurate too) in Java.
I have a function that takes 2 strings, suppose "common" and "cmn". It should return True based on the fact that 'c', 'm', 'n' are occurring in the same order in "common". But if the arguments were "common" and "omn", it would return False because even though they are occurring in the same order, but 'm' is also appearing after 'o' (which fails the pattern match condition)
I have worked over it using Hashmaps, and Ascii arrays, but didn't get a convincing solution yet! From what I have read till now, can it be related to Boyer-Moore, or Levenshtein Distance algorithms?
Hoping for respite at stackoverflow! :)
Edit: Some of the answers talk about reducing the word length, or creating a hashset. But per my understanding, this question cannot be done with hashsets because occurrence/repetition of each character in first string has its own significance. PASS conditions- "con", "cmn", "cm", "cn", "mn", "on", "co". FAIL conditions that may seem otherwise- "com", "omn", "mon", "om". These are FALSE/FAIL because "o" is occurring before as well as after "m". Another example- "google", "ole" would PASS, but "google", "gol" would fail because "o" is also appearing before "g"!
I think it's quite simple. Run through the pattern and fore every character get the index of it's last occurence in the string. The index must always increase, otherwise return false.
So in pseudocode:
index = -1
foreach c in pattern
checkindex = string.lastIndexOf(c)
if checkindex == -1 //not found
return false
if checkindex < index
return false
if string.firstIndexOf(c) < index //characters in the wrong order
return false
index = checkindex
return true
Edit: you could further improve the code by passing index as the starting index to the lastIndexOf method. Then you would't have to compare checkindex with index and the algorithm would be faster.
Updated: Fixed a bug in the algorithm. Additional condition added to consider the order of the letters in the pattern.
An excellent question and couple of hours of research and I think I have found the solution. First of all let me try explaining the question in a different approach.
Requirement:
Lets consider the same example 'common' (mainString) and 'cmn'(subString). First we need to be clear that any characters can repeat within the mainString and also the subString and since its pattern that we are concentrating on, the index of the character play a great role to. So we need to know:
Index of the character (least and highest)
Lets keep this on hold and go ahead and check the patterns a bit more. For the word common, we need to find whether the particular pattern cmn is present or not. The different patters possible with common are :- (Precedence apply )
c -> o
c -> m
c -> n
o -> m
o -> o
o -> n
m -> m
m -> o
m -> n
o -> n
At any moment of time this precedence and comparison must be valid. Since the precedence plays a huge role, we need to have the index of each unique character Instead of storing the different patterns.
Solution
First part of the solution is to create a Hash Table with the following criteria :-
Create a Hash Table with the key as each character of the mainString
Each entry for a unique key in the Hash Table will store two indices i.e lowerIndex and higherIndex
Loop through the mainString and for every new character, update a new entry of lowerIndex into the Hash with the current index of the character in mainString.
If Collision occurs, update the current index with higherIndex entry, do this until the end of String
Second and main part of pattern matching :-
Set Flag as False
Loop through the subString and for
every character as the key, retreive
the details from the Hash.
Do the same for the very next character.
Just before loop increment, verify two conditions
If highestIndex(current character) > highestIndex(next character) Then
Pattern Fails, Flag <- False, Terminate Loop
// This condition is applicable for almost all the cases for pattern matching
Else If lowestIndex(current character) > lowestIndex(next character) Then
Pattern Fails, Flag <- False, Terminate Loop
// This case is explicitly for cases in which patterns like 'mon' appear
Display the Flag
N.B : Since I am not so versatile in Java, I did not submit the code. But some one can try implementing my idea
I had myself done this question in an inefficient manner, but it does give accurate result! I would appreciate if anyone can make out an an efficient code/algorithm from this!
Create a function "Check" which takes 2 strings as arguments. Check each character of string 2 in string 1. The order of appearance of each character of s2 should be verified as true in S1.
Take character 0 from string p and traverse through the string s to find its index of first occurrence.
Traverse through the filled ascii array to find any value more than the index of first occurrence.
Traverse further to find the last occurrence, and update the ascii array
Take character 1 from string p and traverse through the string s to find the index of first occurence in string s
Traverse through the filled ascii array to find any value more than the index of first occurrence. if found, return False.
Traverse further to find the last occurrence, and update the ascii array
As can be observed, this is a bruteforce method...I guess O(N^3)
public class Interview
{
public static void main(String[] args)
{
if (check("google", "oge"))
System.out.println("yes");
else System.out.println("sorry!");
}
public static boolean check (String s, String p)
{
int[] asciiArr = new int[256];
for(int pIndex=0; pIndex<p.length(); pIndex++) //Loop1 inside p
{
for(int sIndex=0; sIndex<s.length(); sIndex++) //Loop2 inside s
{
if(p.charAt(pIndex) == s.charAt(sIndex))
{
asciiArr[s.charAt(sIndex)] = sIndex; //adding char from s to its Ascii value
for(int ascIndex=0; ascIndex<256; ) //Loop 3 for Ascii Array
{
if(asciiArr[ascIndex]>sIndex) //condition to check repetition
return false;
else ascIndex++;
}
}
}
}
return true;
}
}
Isn't it doable in O(n log n)?
Step 1, reduce the string by eliminating all characters that appear to the right. Strictly speaking you only need to eliminate characters if they appear in the string you're checking.
/** Reduces the maximal subsequence of characters in container that contains no
* character from container that appears to the left of the same character in
* container. E.g. "common" -> "cmon", and "whirlygig" -> "whrlyig".
*/
static String reduceContainer(String container) {
SparseVector charsToRight = new SparseVector(); // Like a Bitfield but sparse.
StringBuilder reduced = new StringBuilder();
for (int i = container.length(); --i >= 0;) {
char ch = container.charAt(i);
if (charsToRight.add(ch)) {
reduced.append(ch);
}
}
return reduced.reverse().toString();
}
Step 2, check containment.
static boolean containsInOrder(String container, String containee) {
int containerIdx = 0, containeeIdx = 0;
int containerLen = container.length(), containeeLen == containee.length();
while (containerIdx < containerLen && containeeIdx < containeeLen) {
// Could loop over codepoints instead of code-units, but you get the point...
if (container.charAt(containerIdx) == containee.charAt(containeeIdx)) {
++containeeIdx;
}
++containerIdx;
}
return containeeIdx == containeeLen;
}
And to answer your second question, no, Levenshtein distance won't help you since it has the property that if you swap the arguments the output is the same, but the algo you want does not.
public class StringPattern {
public static void main(String[] args) {
String inputContainer = "common";
String inputContainees[] = { "cmn", "omn" };
for (String containee : inputContainees)
System.out.println(inputContainer + " " + containee + " "
+ containsCommonCharsInOrder(inputContainer, containee));
}
static boolean containsCommonCharsInOrder(String container, String containee) {
Set<Character> containerSet = new LinkedHashSet<Character>() {
// To rearrange the order
#Override
public boolean add(Character arg0) {
if (this.contains(arg0))
this.remove(arg0);
return super.add(arg0);
}
};
addAllPrimitiveCharsToSet(containerSet, container.toCharArray());
Set<Character> containeeSet = new LinkedHashSet<Character>();
addAllPrimitiveCharsToSet(containeeSet, containee.toCharArray());
// retains the common chars in order
containerSet.retainAll(containeeSet);
return containerSet.toString().equals(containeeSet.toString());
}
static void addAllPrimitiveCharsToSet(Set<Character> set, char[] arr) {
for (char ch : arr)
set.add(ch);
}
}
Output:
common cmn true
common omn false
I would consider this as one of the worst pieces of code I have ever written or one of the worst code examples in stackoverflow...but guess what...all your conditions are met!
No algorithm could really fit the need, so I just used bruteforce...test it out...
And I could just care less for space and time complexity...my aim was first to try and solve it...and maybe improve it later!
public class SubString {
public static void main(String[] args) {
SubString ss = new SubString();
String[] trueconditions = {"con", "cmn", "cm", "cn", "mn", "on", "co" };
String[] falseconditions = {"com", "omn", "mon", "om"};
System.out.println("True Conditions : ");
for (String str : trueconditions) {
System.out.println("SubString? : " + str + " : " + ss.test("common", str));
}
System.out.println("False Conditions : ");
for (String str : falseconditions) {
System.out.println("SubString? : " + str + " : " + ss.test("common", str));
}
System.out.println("SubString? : ole : " + ss.test("google", "ole"));
System.out.println("SubString? : gol : " + ss.test("google", "gol"));
}
public boolean test(String original, String match) {
char[] original_array = original.toCharArray();
char[] match_array = match.toCharArray();
int[] value = new int[match_array.length];
int index = 0;
for (int i = 0; i < match_array.length; i++) {
for (int j = index; j < original_array.length; j++) {
if (original_array[j] != original_array[j == 0 ? j : j-1] && contains(match.substring(0, i), original_array[j])) {
value[i] = 2;
} else {
if (match_array[i] == original_array[j]) {
if (value[i] == 0) {
if (contains(original.substring(0, j == 0 ? j : j-1), match_array[i])) {
value[i] = 2;
} else {
value[i] = 1;
}
}
index = j + 1;
}
}
}
}
for (int b : value) {
if (b != 1) {
return false;
}
}
return true;
}
public boolean contains(String subStr, char ch) {
for (char c : subStr.toCharArray()) {
if (ch == c) {
return true;
}
}
return false;
}
}
-IvarD
I think this one is not a test of your computer science fundamentals, more what you would practically do within the Java programming environment.
You could construct a regular expression out of the second argument, i.e ...
omn -> o.*m[^o]*n
... and then test candidate string against this by either using String.matches(...) or using the Pattern class.
In generic form, the construction of the RegExp should be along the following lines.
exp -> in[0].* + for each x : 2 -> in.lenght { (in[x-1] +
[^in[x-2]]* + in[x]) }
for example:
demmn -> d.*e[^d]*m[^e]*m[^m]*n
I tried it myself in a different way. Just sharing my solution.
public class PatternMatch {
public static boolean matchPattern(String str, String pat) {
int slen = str.length();
int plen = pat.length();
int prevInd = -1, curInd;
int count = 0;
for (int i = 0; i < slen; i++) {
curInd = pat.indexOf(str.charAt(i));
if (curInd != -1) {
if(prevInd == curInd)
continue;
else if(curInd == (prevInd+1))
count++;
else if(curInd == 0)
count = 1;
else count = 0;
prevInd = curInd;
}
if(count == plen)
return true;
}
return false;
}
public static void main(String[] args) {
boolean r = matchPattern("common", "on");
System.out.println(r);
}
}