Matching the occurrence and pattern of characters of String2 in String1 - java

I was asked this question in a phone interview for a summer internship, and tried to come up with an n*m complexity solution in Java (although it wasn't accurate either).
I have a function that takes 2 strings, say "common" and "cmn". It should return true because 'c', 'm' and 'n' occur in the same order in "common". But if the arguments were "common" and "omn", it should return false: even though 'o', 'm' and 'n' occur in that order, 'o' also appears after 'm' (which fails the pattern-match condition).
I have tried HashMaps and ASCII arrays, but haven't found a convincing solution yet. From what I have read so far, could it be related to the Boyer-Moore or Levenshtein distance algorithms?
Hoping for respite at stackoverflow! :)
Edit: Some of the answers talk about reducing the word length, or creating a HashSet. But as I understand it, this question cannot be solved with HashSets, because the occurrence/repetition of each character in the first string has its own significance. PASS conditions: "con", "cmn", "cm", "cn", "mn", "on", "co". FAIL conditions that may seem otherwise: "com", "omn", "mon", "om". These FAIL because "o" occurs both before and after "m". Another example: "google", "ole" would PASS, but "google", "gol" would FAIL because "o" also appears before "g"!

I think it's quite simple. Run through the pattern and, for every character, get the index of its last occurrence in the string. The index must always increase; otherwise return false.
So in pseudocode:
index = -1
foreach c in pattern
    checkindex = string.lastIndexOf(c)
    if checkindex == -1                 // not found
        return false
    if checkindex < index
        return false
    if string.firstIndexOf(c) < index   // characters in the wrong order
        return false
    index = checkindex
return true
Edit: you could further improve the code by passing index as the starting index to the lastIndexOf method. Then you wouldn't have to compare checkindex with index, and the algorithm would be faster.
Updated: Fixed a bug in the algorithm. Additional condition added to consider the order of the letters in the pattern.
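A direct Java translation of the pseudocode, as a sketch: the class name `OrderCheck` is hypothetical, and Java's `indexOf` stands in for `firstIndexOf`. The `main` method runs the PASS/FAIL cases listed in the question. Note that in Java, `lastIndexOf(int ch, int fromIndex)` searches backward from `fromIndex`, so the optimization suggested in the edit does not carry over directly; the sketch keeps the explicit comparisons.

```java
public class OrderCheck {

    /**
     * For each pattern character, its first occurrence in string must not
     * come before the last occurrence of the previous pattern character.
     */
    public static boolean matches(String string, String pattern) {
        int index = -1; // last occurrence of the previous pattern character
        for (char c : pattern.toCharArray()) {
            int checkindex = string.lastIndexOf(c);
            if (checkindex == -1) {           // not found
                return false;
            }
            if (checkindex < index) {
                return false;
            }
            if (string.indexOf(c) < index) {  // characters in the wrong order
                return false;
            }
            index = checkindex;
        }
        return true;
    }

    public static void main(String[] args) {
        for (String p : new String[] {"con", "cmn", "cm", "cn", "mn", "on", "co"}) {
            System.out.println("common " + p + " " + matches("common", p)); // PASS cases
        }
        for (String p : new String[] {"com", "omn", "mon", "om"}) {
            System.out.println("common " + p + " " + matches("common", p)); // FAIL cases
        }
        System.out.println("google ole " + matches("google", "ole"));
        System.out.println("google gol " + matches("google", "gol"));
    }
}
```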

An excellent question; after a couple of hours of research I think I have found the solution. First, let me try explaining the question from a different angle.
Requirement:
Let's consider the same example: 'common' (mainString) and 'cmn' (subString). First we need to be clear that any character can repeat within both mainString and subString, and since it's the pattern we are concentrating on, the index of each character plays a big role too. So we need to know:
The index of each character (lowest and highest)
Let's keep this on hold and look at the patterns a bit more. For the word common, we need to find whether the particular pattern cmn is present or not. The different ordered pairs possible in common are (precedence applies):
c -> o
c -> m
c -> n
o -> m
o -> o
o -> n
m -> m
m -> o
m -> n
o -> n
At any moment this precedence and comparison must hold. Since precedence plays such a big role, we need to store the indices of each unique character instead of storing the different patterns.
Solution
The first part of the solution is to create a hash table with the following criteria:
Create a hash table whose keys are the characters of mainString
Each entry for a unique key stores two indices, i.e. lowerIndex and higherIndex
Loop through mainString and, for every new character, create an entry whose lowerIndex (and, initially, higherIndex) is the current index of that character in mainString.
If a collision occurs (the character is already in the table), store the current index as its higherIndex; do this until the end of the string.
Second and main part, the pattern matching:
Set Flag as True
Loop through subString and, for every character as the key, retrieve its details from the hash table.
Do the same for the very next character.
Just before the loop increments, verify two conditions:
If higherIndex(current character) > higherIndex(next character) Then
    Pattern fails, Flag <- False, terminate loop
    // This condition applies to almost all pattern-matching cases
Else If lowerIndex(current character) > lowerIndex(next character) Then
    Pattern fails, Flag <- False, terminate loop
    // This case is specifically for patterns like 'mon'
Display the Flag
N.B.: Since I am not very versatile in Java, I did not post code, but someone can try implementing my idea.
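A minimal Java sketch of the hash-table idea above (not the answerer's code; `IndexRangeCheck`, `buildTable` and `matches` are hypothetical names). It assumes that a subString character missing from mainString makes the match fail, which the answer does not specify. Note that comparing only the lower and higher indices of adjacent characters seems weaker than the rule in the question's edit: for mainString "abab" and subString "ab" it returns true, although 'a' also occurs after 'b'.

```java
import java.util.HashMap;
import java.util.Map;

public class IndexRangeCheck {

    /** Maps each character of mainString to {lowerIndex, higherIndex}. */
    static Map<Character, int[]> buildTable(String mainString) {
        Map<Character, int[]> table = new HashMap<>();
        for (int i = 0; i < mainString.length(); i++) {
            int[] range = table.get(mainString.charAt(i));
            if (range == null) {
                // New character: lowerIndex = higherIndex = current index
                table.put(mainString.charAt(i), new int[] {i, i});
            } else {
                // "Collision": a later occurrence becomes the higherIndex
                range[1] = i;
            }
        }
        return table;
    }

    public static boolean matches(String mainString, String subString) {
        Map<Character, int[]> table = buildTable(mainString);
        boolean flag = true;
        for (int k = 0; flag && k < subString.length(); k++) {
            int[] current = table.get(subString.charAt(k));
            if (current == null) {
                flag = false; // assumption: a character absent from mainString fails
            } else if (k + 1 < subString.length()) {
                int[] next = table.get(subString.charAt(k + 1));
                if (next == null
                        || current[1] > next[1]    // higherIndex(current) > higherIndex(next)
                        || current[0] > next[0]) { // lowerIndex(current) > lowerIndex(next)
                    flag = false;
                }
            }
        }
        return flag;
    }

    public static void main(String[] args) {
        System.out.println(matches("common", "cmn"));
        System.out.println(matches("common", "omn"));
        System.out.println(matches("common", "mon"));
        System.out.println(matches("abab", "ab")); // the weaker-than-expected case
    }
}
```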

I had done this question myself in an inefficient manner, but it does give accurate results! I would appreciate it if anyone could turn this into an efficient algorithm/code!
Create a function "check" which takes 2 strings, s and p, as arguments. Check each character of p in s: the order of appearance of each character of p must hold in s.
Take character 0 from p and traverse s to find the index of its first occurrence.
Traverse the filled ASCII array to find any value greater than that index of first occurrence.
Traverse further to find the last occurrence, and update the ASCII array.
Take character 1 from p and traverse s to find the index of its first occurrence.
Traverse the filled ASCII array to find any value greater than that index. If found, return false.
Traverse further to find the last occurrence, and update the ASCII array.
As can be observed, this is a brute-force method... I guess O(N^3).
public class Interview
{
    public static void main(String[] args)
    {
        if (check("google", "oge"))
            System.out.println("yes");
        else System.out.println("sorry!");
    }
    public static boolean check (String s, String p)
    {
        int[] asciiArr = new int[256];
        for(int pIndex=0; pIndex<p.length(); pIndex++) //Loop1 inside p
        {
            for(int sIndex=0; sIndex<s.length(); sIndex++) //Loop2 inside s
            {
                if(p.charAt(pIndex) == s.charAt(sIndex))
                {
                    asciiArr[s.charAt(sIndex)] = sIndex; //adding char from s to its Ascii value
                    for(int ascIndex=0; ascIndex<256; ) //Loop 3 for Ascii Array
                    {
                        if(asciiArr[ascIndex]>sIndex) //condition to check repetition
                            return false;
                        else ascIndex++;
                    }
                }
            }
        }
        return true;
    }
}

Isn't it doable in O(n log n)?
Step 1, reduce the string by eliminating every character that also appears further to the right. Strictly speaking, you only need to eliminate characters that appear in the string you're checking.
/** Returns container with only the last (rightmost) occurrence of each
 * character kept, i.e. every character that appears again further to the
 * right is dropped. E.g. "common" -> "cmon", and "whirlygig" -> "whrlyig".
 * (Uses java.util.Set and java.util.HashSet.)
 */
static String reduceContainer(String container) {
    // A HashSet stands in for the original sparse bitfield (SparseVector).
    Set<Character> charsToRight = new HashSet<>();
    StringBuilder reduced = new StringBuilder();
    for (int i = container.length(); --i >= 0;) {
        char ch = container.charAt(i);
        if (charsToRight.add(ch)) { // false if ch already occurs to the right
            reduced.append(ch);
        }
    }
    return reduced.reverse().toString();
}
Step 2, check containment.
static boolean containsInOrder(String container, String containee) {
    int containerIdx = 0, containeeIdx = 0;
    int containerLen = container.length(), containeeLen = containee.length();
    while (containerIdx < containerLen && containeeIdx < containeeLen) {
        // Could loop over codepoints instead of code-units, but you get the point...
        if (container.charAt(containerIdx) == containee.charAt(containeeIdx)) {
            ++containeeIdx;
        }
        ++containerIdx;
    }
    return containeeIdx == containeeLen;
}
And to answer your second question: no, Levenshtein distance won't help you, since it is symmetric (swapping the arguments gives the same output), whereas the algorithm you want is not.

import java.util.LinkedHashSet;
import java.util.Set;

public class StringPattern {
    public static void main(String[] args) {
        String inputContainer = "common";
        String[] inputContainees = { "cmn", "omn" };
        for (String containee : inputContainees)
            System.out.println(inputContainer + " " + containee + " "
                    + containsCommonCharsInOrder(inputContainer, containee));
    }
    static boolean containsCommonCharsInOrder(String container, String containee) {
        Set<Character> containerSet = new LinkedHashSet<Character>() {
            // To rearrange the order: re-adding a character moves it to the end,
            // so each character ends up at the position of its last occurrence
            @Override
            public boolean add(Character arg0) {
                if (this.contains(arg0))
                    this.remove(arg0);
                return super.add(arg0);
            }
        };
        addAllPrimitiveCharsToSet(containerSet, container.toCharArray());
        Set<Character> containeeSet = new LinkedHashSet<Character>();
        addAllPrimitiveCharsToSet(containeeSet, containee.toCharArray());
        // retains the common chars in order
        containerSet.retainAll(containeeSet);
        return containerSet.toString().equals(containeeSet.toString());
    }
    static void addAllPrimitiveCharsToSet(Set<Character> set, char[] arr) {
        for (char ch : arr)
            set.add(ch);
    }
}
Output:
common cmn true
common omn false

I would consider this one of the worst pieces of code I have ever written, or one of the worst code examples on Stack Overflow... but guess what... all your conditions are met!
No algorithm really fit the need, so I just used brute force... test it out...
And I couldn't care less about space and time complexity... my aim was first to try and solve it, and maybe improve it later!
public class SubString {
    public static void main(String[] args) {
        SubString ss = new SubString();
        String[] trueconditions = {"con", "cmn", "cm", "cn", "mn", "on", "co" };
        String[] falseconditions = {"com", "omn", "mon", "om"};
        System.out.println("True Conditions : ");
        for (String str : trueconditions) {
            System.out.println("SubString? : " + str + " : " + ss.test("common", str));
        }
        System.out.println("False Conditions : ");
        for (String str : falseconditions) {
            System.out.println("SubString? : " + str + " : " + ss.test("common", str));
        }
        System.out.println("SubString? : ole : " + ss.test("google", "ole"));
        System.out.println("SubString? : gol : " + ss.test("google", "gol"));
    }
    public boolean test(String original, String match) {
        char[] original_array = original.toCharArray();
        char[] match_array = match.toCharArray();
        int[] value = new int[match_array.length];
        int index = 0;
        for (int i = 0; i < match_array.length; i++) {
            for (int j = index; j < original_array.length; j++) {
                if (original_array[j] != original_array[j == 0 ? j : j-1] && contains(match.substring(0, i), original_array[j])) {
                    value[i] = 2;
                } else {
                    if (match_array[i] == original_array[j]) {
                        if (value[i] == 0) {
                            if (contains(original.substring(0, j == 0 ? j : j-1), match_array[i])) {
                                value[i] = 2;
                            } else {
                                value[i] = 1;
                            }
                        }
                        index = j + 1;
                    }
                }
            }
        }
        for (int b : value) {
            if (b != 1) {
                return false;
            }
        }
        return true;
    }
    public boolean contains(String subStr, char ch) {
        for (char c : subStr.toCharArray()) {
            if (ch == c) {
                return true;
            }
        }
        return false;
    }
}
-IvarD

I think this one is not a test of your computer-science fundamentals, but more of what you would practically do in the Java programming environment.
You could construct a regular expression out of the second argument, i.e. ...
omn -> o.*m[^o]*n
... and then test the candidate string against it, using either String.matches(...) (which must match the whole string, so the expression would need a leading and trailing .*) or the Pattern class (Matcher.find()).
In generic form, the regular expression should be constructed along the following lines:
exp = in[0] + ".*" + in[1] + (for each x from 2 to in.length - 1: "[^" + in[x-2] + "]*" + in[x])
for example:
demmn -> d.*e[^d]*m[^e]*m[^m]*n
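A sketch of the construction above in Java, assuming the pattern consists only of letters or digits (other characters would need escaping); `RegexOrderCheck`, `buildRegex` and `matchesInOrder` are hypothetical names. It uses `Matcher.find()` because `String.matches` only succeeds when the whole candidate matches. It reproduces both examples, but note that each `[^...]*` only excludes the pattern character two positions earlier and the first gap is unrestricted, so, for example, `matchesInOrder("common", "com")` returns true even though the question lists "com" as a FAIL case.

```java
import java.util.regex.Pattern;

public class RegexOrderCheck {

    /**
     * Builds in[0] .* in[1], then appends [^in[x-2]]* in[x] for every x >= 2.
     * Assumption: in contains only letters or digits, so nothing needs escaping.
     */
    static String buildRegex(String in) {
        if (in.length() < 2) {
            return in; // nothing to order
        }
        StringBuilder exp = new StringBuilder();
        exp.append(in.charAt(0)).append(".*").append(in.charAt(1));
        for (int x = 2; x < in.length(); x++) {
            exp.append("[^").append(in.charAt(x - 2)).append("]*").append(in.charAt(x));
        }
        return exp.toString();
    }

    /** find() looks for the expression anywhere in the candidate. */
    static boolean matchesInOrder(String candidate, String in) {
        return Pattern.compile(buildRegex(in)).matcher(candidate).find();
    }

    public static void main(String[] args) {
        System.out.println(buildRegex("omn"));   // o.*m[^o]*n
        System.out.println(buildRegex("demmn")); // d.*e[^d]*m[^e]*m[^m]*n
        System.out.println(matchesInOrder("common", "cmn"));
        System.out.println(matchesInOrder("common", "omn"));
    }
}
```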

I tried it myself in a different way. Just sharing my solution.
public class PatternMatch {
    public static boolean matchPattern(String str, String pat) {
        int slen = str.length();
        int plen = pat.length();
        int prevInd = -1, curInd;
        int count = 0;
        for (int i = 0; i < slen; i++) {
            curInd = pat.indexOf(str.charAt(i));
            if (curInd != -1) {
                if(prevInd == curInd)
                    continue;
                else if(curInd == (prevInd+1))
                    count++;
                else if(curInd == 0)
                    count = 1;
                else count = 0;
                prevInd = curInd;
            }
            if(count == plen)
                return true;
        }
        return false;
    }
    public static void main(String[] args) {
        boolean r = matchPattern("common", "on");
        System.out.println(r);
    }
}

Related

Count all possible decoding Combination of the given binary String in Java

Suppose we have a string of binary values in which some portions may correspond to specific letters, for example:
A = 0
B = 00
C = 001
D = 010
E = 0010
F = 0100
G = 0110
H = 0001
For example, if we take the string "00100", there are 5 different possibilities:
ADA
AF
CAA
CB
EA
I have to compute the exact number of combinations using dynamic programming.
But I have difficulty formulating the subproblems and composing the corresponding vector of solutions.
I would appreciate any pointers on how to formulate the algorithm correctly.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Locale;
import java.util.Scanner;

class countString {
    static int count(String a, String b, int m, int n) {
        if ((m == 0 && n == 0) || n == 0)
            return 1;
        if (m == 0)
            return 0;
        if (a.charAt(m - 1) == b.charAt(n - 1))
            return count(a, b, m - 1, n - 1) +
                    count(a, b, m - 1, n);
        else
            return count(a, b, m - 1, n);
    }
    public static void main(String[] args) {
        Locale.setDefault(Locale.US);
        ArrayList<String> substrings = new ArrayList<>();
        substrings.add("0");
        substrings.add("00");
        substrings.add("001");
        substrings.add("010");
        substrings.add("0010");
        substrings.add("0100");
        substrings.add("0110");
        substrings.add("0001");
        if (args.length != 1) {
            System.err.println("ERROR - execute with: java countString -filename- ");
            System.exit(1);
        }
        try {
            Scanner scan = new Scanner(new File(args[0])); // not important
            String S = "00100";
            int count = 0;
            for(int i=0; i<substrings.size(); i++){
                count = count + count(S,substrings.get(i),S.length(),substrings.get(i).length());
            }
            System.out.println(count);
        } catch (FileNotFoundException e) {
            System.out.println("File not found " + e);
        }
    }
}
In essence, dynamic programming is an enhanced brute-force approach.
Like brute force, we need to generate all possible results. But unlike plain brute force, the problem is divided into smaller subproblems, and the previously computed result of each subproblem is stored and reused.
Since you are using recursion, you need to apply the so-called memoization technique in order to store and reuse the intermediate results. In this case, a HashMap is a perfect means of storing the results.
But before applying memoization, in order to understand it better, it makes sense to start with a clean and simple recursive solution that works correctly, and only then enhance it with DP.
Plain Recursion
Every recursive implementation should contain two parts:
Base case - represents a simple edge case (or a set of edge cases) for which the outcome is known in advance. For this problem there are two edge cases: the given string is empty, and the result is 1 (an empty binary string "" decodes into an empty string of letters ""); and the given binary string cannot be decoded, and the result is 0 (in the solution below this resolves naturally when the recursive case is executed).
Recursive case - the part of the solution where the recursive calls are made and where the main logic resides. In the recursive case, we need to find each "binary letter" at the beginning of the string and then call the method recursively, passing the substring without that "letter". The results of these recursive calls are accumulated into the total count returned from the method.
In order to implement this logic we need only two arguments: the binary string to analyze and a list of binary letters:
public static int count(String str, List<String> letters) {
    if (str.isEmpty()) { // base case - a combination was found
        return 1;
    }
    // recursive case
    int count = 0;
    for (String letter: letters) {
        if (str.startsWith(letter)) {
            count += count(str.substring(letter.length()), letters);
        }
    }
    return count;
}
This concise solution is already capable of producing the correct result. Now, let's turn this brute-force version into a DP-based solution by applying memoization.
Dynamic Programming
As I said earlier, a HashMap is a perfect means of storing the intermediate results, because it allows us to associate a count (number of combinations) with a particular string and then retrieve this number almost instantly (in O(1) time).
This is how it might look:
public static int count(String str, List<String> letters, Map<String, Integer> vocab) {
    if (str.isEmpty()) { // base case - a combination was found
        return 1;
    }
    if (vocab.containsKey(str)) { // result was already computed and present in the map
        return vocab.get(str);
    }
    int count = 0;
    for (String letter: letters) {
        if (str.startsWith(letter)) {
            count += count(str.substring(letter.length()), letters, vocab);
        }
    }
    vocab.put(str, count); // storing the total `count` into the map
    return count;
}
main()
public static void main(String[] args) {
    List<String> letters = List.of("0", "00", "001", "010", "0010", "0100", "0110", "0001"); // binary letters
    System.out.println(count("00100", letters, new HashMap<>())); // DP
    System.out.println(count("00100", letters)); // brute-force recursion
}
Output:
5 // DP
5 // plain recursion
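Since the question also asked about the "vector of solutions", the same recurrence can be filled in bottom-up (tabulation) instead of via memoized recursion. A sketch, with `DecodeCount` and `countBottomUp` as hypothetical names: `ways[i]` holds the number of decodings of the suffix starting at index `i`, so each entry only depends on entries to its right. It needs Java 9+ for `List.of`.

```java
import java.util.List;

public class DecodeCount {

    /** ways[i] = number of ways to decode str.substring(i) into letters. */
    public static long countBottomUp(String str, List<String> letters) {
        int n = str.length();
        long[] ways = new long[n + 1];
        ways[n] = 1; // the empty suffix decodes in exactly one way
        for (int i = n - 1; i >= 0; i--) {
            for (String letter : letters) {
                // startsWith(prefix, offset) is false if the letter runs past the end
                if (str.startsWith(letter, i)) {
                    ways[i] += ways[i + letter.length()];
                }
            }
        }
        return ways[0];
    }

    public static void main(String[] args) {
        List<String> letters = List.of("0", "00", "001", "010", "0010", "0100", "0110", "0001");
        System.out.println(countBottomUp("00100", letters)); // 5
    }
}
```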
Hope this helps.
The idea is to build every possible string from these values and check whether the input starts with it; if not, move on to the next index.
If you have test cases ready, you can verify more.
I have tested it with only 2-3 values.
public static int getCombo(String[] array, int startingIndex, String val, String input) {
    int count = 0;
    for (int i = startingIndex; i < array.length; i++) {
        String matchValue = val + array[i];
        if (matchValue.length() <= input.length()) {
            // if value matches then count + 1
            if (matchValue.equals(input)) {
                count++;
                System.out.println("match Found---->" + count); // omit this sysout, it's only for testing.
                return count;
            } else if (input.startsWith(matchValue)) { // checking whether the input starts with the new value
                // search further combos
                count += getCombo(array, 0, matchValue, input);
            }
        }
    }
    return count;
}
In the main method:
// Assumes `substrings` (from the question) and the String `input` are in scope.
String[] arr = substrings.toArray(new String[0]);
int count = 0;
for (int i = 0; i < arr.length; i++) {
    System.out.println("index----?> " + i);
    // adding this condition for single inputs, i.e. "0", "010";
    if (arr[i].equals(input))
        count++;
    else
        count = count + getCombo(arr, 0, arr[i], input);
}
System.out.println("Final count : " + count);
My test results :
input : 00100
Final count 5
input : 000
Final count 3

Number of ways to recreate a given string using a given list of words

Given a String word and a String array book that contains some strings, the program should output the number of ways to create word using only elements of book. An element can be used as many times as we want, and the program must terminate in under 6 seconds.
For example, input:
String word = "stackoverflow";
String[] book = new String[9];
book[0] = "st";
book[1] = "ck";
book[2] = "CAG";
book[3] = "low";
book[4] = "TC";
book[5] = "rf";
book[6] = "ove";
book[7] = "a";
book[8] = "sta";
The output should be 2, since we can create "stackoverflow" in two ways:
1: "st" + "a" + "ck" + "ove" + "rf" + "low"
2: "sta" + "ck" + "ove" + "rf" + "low"
My implementation of the program only terminates in the required time if word is relatively small (<15 characters). However, as I mentioned before, the running time limit for the program is 6 seconds and it should be able to handle very large word strings (>1000 characters). Here is an example of a large input.
Here is my code:
1) the actual method:
input: a String word and a String[] book
output: the number of ways word can be written only using strings in book
public static int optimal(String word, String[] book){
    int count = 0;
    List<List<String>> allCombinations = allSubstrings(word);
    List<String> empty = new ArrayList<>();
    List<String> wordList = Arrays.asList(book);
    for (int i = 0; i < allCombinations.size(); i++) {
        allCombinations.get(i).retainAll(wordList);
        if (!sumUp(allCombinations.get(i), word)) {
            allCombinations.remove(i);
            allCombinations.add(i, empty);
        }
        else count++;
    }
    return count;
}
2) allSubstrings():
input: a String input
output: A list of lists, each containing a combination of substrings that add up to input
static List<List<String>> allSubstrings(String input) {
    if (input.length() == 1) return Collections.singletonList(Collections.singletonList(input));
    List<List<String>> result = new ArrayList<>();
    for (List<String> temp : allSubstrings(input.substring(1))) {
        List<String> firstList = new ArrayList<>(temp);
        firstList.set(0, input.charAt(0) + firstList.get(0));
        if (input.startsWith(firstList.get(0), 0)) result.add(firstList);
        List<String> l = new ArrayList<>(temp);
        l.add(0, input.substring(0, 1));
        if (input.startsWith(l.get(0), 0)) result.add(l);
    }
    return result;
}
3) sumUp():
input: A String list input and a String expected
output: true if the elements in input add up to expected
public static boolean sumUp (List<String> input, String expected) {
    String x = "";
    for (int i = 0; i < input.size(); i++) {
        x = x + input.get(i);
    }
    if (expected.equals(x)) return true;
    return false;
}
I've figured out what I was doing wrong in my previous answer: I wasn't using memoization, so I was redoing an awful lot of unnecessary work.
Consider a book array {"a", "aa", "aaa"}, and a target word "aaa". There are four ways to construct this target:
"a" + "a" + "a"
"aa" + "a"
"a" + "aa"
"aaa"
My previous attempt would have walked through all four separately. But instead, one can observe that:
There is 1 way to construct "a"
You can construct "aa" in 2 ways, either "a" + "a" or using "aa" directly.
You can construct "aaa" either by using "aaa" directly (1 way); or "aa" + "a" (2 ways, since there are 2 ways to construct "aa"); or "a" + "aa" (1 way).
Note that the third step here only adds a single additional string to a previously-constructed string, for which we know the number of ways it can be constructed.
This suggests that if we count the number of ways in which each prefix of word can be constructed, we can use that to easily calculate the number of ways to construct a longer prefix by adding just one more string from book.
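The observation can be written as a recurrence over prefix lengths, where result[i] is the number of ways to build the first i characters of word. This is only a restatement of the reasoning above in symbols (the code that follows fills the same array by pushing result[m] forward to result[m + n]); the worked values reproduce the four constructions of "aaa".

```latex
\text{result}[0] = 1, \qquad
\text{result}[i] = \sum_{\substack{b \,\in\, \text{book},\ |b| \le i \\ \text{word}[i-|b|,\, i) \,=\, b}} \text{result}[i - |b|]

% word = "aaa", book = {"a", "aa", "aaa"}:
\begin{aligned}
\text{result}[1] &= \text{result}[0] = 1 \\
\text{result}[2] &= \text{result}[1] + \text{result}[0] = 2 \\
\text{result}[3] &= \text{result}[2] + \text{result}[1] + \text{result}[0] = 2 + 1 + 1 = 4
\end{aligned}
```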
I defined a simple trie class, so you can quickly look up prefixes of the book words that match at any given position in word:
class TrieNode {
    boolean word;
    Map<Character, TrieNode> children = new HashMap<>();
    void add(String s, int i) {
        if (i == s.length()) {
            word = true;
        } else {
            children.computeIfAbsent(s.charAt(i), k -> new TrieNode()).add(s, i + 1);
        }
    }
}
For each letter in s, this creates an instance of TrieNode, and stores the TrieNode for the subsequent characters etc.
static long method(String word, String[] book) {
    // Construct a trie from all the words in book.
    TrieNode t = new TrieNode();
    for (String b : book) {
        t.add(b, 0);
    }
    // Construct an array to memoize the number of ways to construct
    // prefixes of a given length: result[i] is the number of ways to
    // construct a prefix of length i.
    long[] result = new long[word.length() + 1];
    // There is only 1 way to construct a prefix of length zero.
    result[0] = 1;
    for (int m = 0; m < word.length(); ++m) {
        if (result[m] == 0) {
            // If there are no ways to construct a prefix of this length,
            // then just skip it.
            continue;
        }
        // Walk the trie, taking the branch which matches the character
        // of word at position (n + m).
        TrieNode tt = t;
        for (int n = 0; tt != null && n + m <= word.length(); ++n) {
            if (tt.word) {
                // We have reached the end of a word: we can reach a prefix
                // of length (n + m) from a prefix of length (m).
                // Increment the number of ways to reach (n+m) by the number
                // of ways to reach (m).
                // (Increment, because there may be other ways).
                result[n + m] += result[m];
            }
            if (n + m == word.length()) {
                // End of word reached: stop even if tt is not the end of a
                // book word, so that charAt(n + m) is never out of range.
                break;
            }
            tt = tt.children.get(word.charAt(n + m));
        }
    }
    // The number of ways to reach a prefix of length (word.length())
    // is now stored in the last element of the array.
    return result[word.length()];
}
For the very long input given by OP, this gives output:
$ time java Ideone
2217093120
real 0m0.126s
user 0m0.146s
sys 0m0.036s
Quite a bit faster than the required 6 seconds - and this includes JVM startup time too.
Edit: in fact, the trie isn't necessary. You can simply replace the "Walk the trie" loop with:
for (String b : book) {
    if (word.regionMatches(m, b, 0, b.length())) {
        result[m + b.length()] += result[m];
    }
}
and it performs slower, but still way faster than 6s:
2217093120
real 0m0.173s
user 0m0.226s
sys 0m0.033s
A few observations:
x = x + input.get(i);
Since you are concatenating in a loop, using String + isn't a good idea. Use a StringBuilder and append to it within the loop, and at the end return builder.toString(). Or follow Andy's idea: there is no need to merge strings at all, because you already know the target word. See below.
Then: using a List implies that adding/removing elements might be costly. So see whether you can get rid of that part, and whether it would be possible to use maps or sets instead.
Finally: the real point would be to look into your algorithm. I would try to work "backwards". Meaning: first identify those array elements that actually occur in your target word. You can ignore all others right from the start.
Then: look at all array entries that start your search word (that is, entries that are a prefix of it). In your example, you can notice that there are just two array elements that fit. And then work your way from there.
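A small sketch of the two filtering steps suggested above, using the question's example; `BookFilter`, `occurring` and `starting` are hypothetical names. It only narrows down the candidates; counting the constructions still has to be done afterwards.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class BookFilter {

    /** Step 1: keep only the book entries that occur somewhere in the target word. */
    static List<String> occurring(String word, String[] book) {
        return Arrays.stream(book).filter(word::contains).collect(Collectors.toList());
    }

    /** Step 2: among those, keep the entries the target word starts with. */
    static List<String> starting(String word, List<String> candidates) {
        return candidates.stream().filter(word::startsWith).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String word = "stackoverflow";
        String[] book = {"st", "ck", "CAG", "low", "TC", "rf", "ove", "a", "sta"};
        List<String> candidates = occurring(word, book);
        System.out.println(candidates);                 // [st, ck, low, rf, ove, a, sta]
        System.out.println(starting(word, candidates)); // [st, sta]
    }
}
```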
My first observation would be that you don't actually need to build anything: you know what string you are trying to construct (e.g. stackoverflow), so all you really need to keep track of is how much of that string you have matched so far. Call this m.
Next, having matched m characters, provided m < word.length(), you need to choose a next string from book which matches the portion of word from m to m + nextString.length().
You could do this by checking each string in turn:
if (word.regionMatches(m, nextString, 0, nextString.length())) { ... }
But you can do better, by determining strings that can't match in advance: the next string you append will have the following properties:
word.charAt(m) == nextString.charAt(0) (the next characters match)
m + nextString.length() <= word.length() (adding the next string shouldn't make the constructed string longer than word)
So, you can cut down the potential words from book that you might check by constructing a map of letters to words that start with that (point 1); and if you store the words with the same starting letter in increasing length order, you can stop checking that letter as soon as the length gets too big (point 2).
You can construct a map once and reuse:
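// Assumes static imports from java.util.stream.Collectors
// (groupingBy, collectingAndThen, toList) and java.util.Comparator.comparingInt.
Map<Character, List<String>> prefixMap =
    Arrays.asList(book).stream()
        .collect(groupingBy(
            s -> s.charAt(0),
            collectingAndThen(
                toList(),
                ss -> {
                    ss.sort(comparingInt(String::length));
                    return ss;
                })));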
You can count the number of ways recursively, without constructing any additional objects (*):
int method(String word, String[] book) {
    return method(word, 0, /* construct map as above */);
}
int method(String word, int m, Map<Character, List<String>> prefixMap) {
    if (m == word.length()) {
        return 1;
    }
    int result = 0;
    for (String nextString : prefixMap.getOrDefault(word.charAt(m), emptyList())) {
        if (m + nextString.length() > word.length()) {
            break;
        }
        // Start at m+1, because you already know they match at m.
        if (word.regionMatches(m + 1, nextString, 1, nextString.length()-1)) {
            // This is a potential match!
            // Make a recursive call.
            result += method(word, m + nextString.length(), prefixMap);
        }
    }
    return result;
}
(*) This may construct new instances of Character, because of the boxing of the word.charAt(m): cached instances are guaranteed to be used for chars in the range 0-127 only. There are ways to work around this, but they would only clutter the code.
I think you are already doing a pretty good job at optimizing your application. In addition to the answer by GhostCat here are a few suggestions of my own:
public static int optimal(String word, String[] book){
    int count = 0;
    List<List<String>> allCombinations = allSubstrings(word);
    List<String> wordList = Arrays.asList(book);
    for (int i = 0; i < allCombinations.size(); i++)
    {
        /*
         * allCombinations.get(i).retainAll(wordList);
         *
         * There is no need to retrieve the list element
         * twice, just set it in a local variable
         */
        java.util.List<String> combination = allCombinations.get(i);
        combination.retainAll(wordList);
        /*
         * Since we are only interested in the count here
         * there is no need to remove and add list elements
         */
        if (sumUp(combination, word))
        {
            /*allCombinations.remove(i);
            allCombinations.add(i, empty);*/
            count++;
        }
        /*else count++;*/
    }
    return count;
}
public static boolean sumUp (List<String> input, String expected) {
    String x = "";
    for (int i = 0; i < input.size(); i++) {
        x = x + input.get(i);
    }
    // No need for if block here, just return comparison result
    /*if (expected.equals(x)) return true;
    return false;*/
    return expected.equals(x);
}
And since you are interested in seeing the execution time of your method I would recommend implementing a benchmarking system of some sort. Here is a quick mock-up:
private static long benchmarkOptima(int cycles, String word, String[] book) {
    long totalTime = 0;
    for (int i = 0; i < cycles; i++)
    {
        long startTime = System.currentTimeMillis();
        int a = optimal(word, book);
        long executionTime = System.currentTimeMillis() - startTime;
        totalTime += executionTime;
    }
    return totalTime / cycles;
}
public static void main(String[] args)
{
    String word = "stackoverflow";
    String[] book = new String[] {
        "st", "ck", "CAG", "low", "TC",
        "rf", "ove", "a", "sta"
    };
    int result = optimal(word, book);
    final int cycles = 50;
    long averageTime = benchmarkOptima(cycles, word, book);
    System.out.println("Optimal result: " + result);
    System.out.println("Average execution time - " + averageTime + " ms");
}
Output
2
Average execution time - 6 ms
Note: The implementation is getting stuck on the test case mentioned by @user1221; I'm working on it.
What I could think of is a trie-based approach that needs O(sum of lengths of the words in dict) space. Its running time is not optimal.
Procedure:
Build a trie of all the words in the dictionary. This is a pre-processing task that takes O(sum of lengths of all strings in dict).
We then try to find the string you want to make in the trie, with a twist: we start by searching for a prefix of the string. Whenever the prefix found in the trie is a complete word, we recursively start a new search from the top and continue to look for more prefixes.
When we reach the end of our string (i.e. stackoverflow), we check whether we have also arrived at the end of a word; if yes, we have reached a valid combination for this string. We count these while going back up the recursion.
E.g.:
In the above case, we use the dict as {"st", "sta", "a", "ck"}
We construct our trie ($ is the sentinel char, i.e. a char which is not in the dict):
$___s___t.___a.
|___a.
|___c___k.
the . represents that a word in the dict ends at that position.
We try to find the number of constructions of stack.
We start searching stack in the trie.
depth=0
$___s(*)___t.___a.
|___a.
|___c___k.
We see that we are at the end of one word, we start a new search with the remaining string ack from the top.
depth=0
$___s___t(*).___a.
|___a.
|___c___k.
Again we are at the end of one word in the dict. We start a new search for ck.
depth=1
$___s___t.___a.
|___a(*).
|___c___k.
depth=2
$___s___t.___a.
|___a.
|___c(*)___k.
We reach the end of stack and end of a word in the dict, hence we have 1 valid representation of stack.
depth=2
$___s___t.___a.
|___a.
|___c___k(*).
We go back to the caller of depth=2
No next char is available, we return to the caller of depth=1.
depth=1
$___s___t.___a.
|___a(*, 1).
|___c___k.
depth=0
$___s___t(*, 1).___a.
|___a.
|___c___k.
We move to next char. We see that we reached the end of one word in the dict, we launch a new search for ck in the dict.
depth=0
$___s___t.___a(*, 1).
|___a.
|___c___k.
depth=1
$___s___t.___a.
|___a.
|___c(*)___k.
We reach the end of stack and the end of a word in the dict, so that's another valid representation. We go back to the caller of depth=1.
depth=1
$___s___t.___a.
|___a.
|___c___k(*, 1).
There are no more chars to proceed, we return with the result 2.
depth=0
$___s___t.___a(*, 2).
|___a.
|___c___k.
Note: The implementation is in C++; it shouldn't be too hard to convert to Java. It also assumes that all chars are lowercase; it's trivial to extend it to both cases.
Sample code (full version):
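#include <cstddef>
#include <string>
using std::string;
// Trie node (assumed layout; the snippet does not show it):
// 26 children for 'a'..'z' and a flag set where a dict word ends.
struct Node {
Node *next[26] = {};
int end = 0;
};
/**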
Node *base: head of the trie
Node *h : current node in the trie
string s : string to search
int idx : the current position in the string
*/
int count(Node *base, Node *h, string s, int idx) {
// step 3: found a valid combination.
if (idx == s.size()) return h->end;
int res = 0;
// step 2: we recursively start a new search.
if (h->end) {
res += count(base, base, s, idx);
}
// move ahead in the trie.
if (h->next[s[idx] - 'a'] != NULL) {
res += count(base, h->next[s[idx] - 'a'], s, idx + 1);
}
return res;
}
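Since the note above says the code should be easy to convert to Java, here is one possible port. It is a sketch under stated assumptions: the class name TrieCount and the insert() helper are hypothetical, words are lowercase a-z, and the dictionary contains no empty string (an empty word would mark the root as a word end and make the recursion loop forever). Like the C++ version it has no memoization, so it can take exponential time on adversarial inputs.

```java
// Hypothetical Java port of the C++ count() above; assumes lowercase a-z only
// and no empty words in the dictionary.
public class TrieCount {

    // One trie node: 26 children and a flag marking the end of a dict word.
    static final class Node {
        final Node[] next = new Node[26];
        boolean end;
    }

    // Insert a lowercase word into the trie rooted at base.
    static void insert(Node base, String word) {
        Node h = base;
        for (int i = 0; i < word.length(); i++) {
            int c = word.charAt(i) - 'a';
            if (h.next[c] == null) {
                h.next[c] = new Node();
            }
            h = h.next[c];
        }
        h.end = true;
    }

    // Same recursion as the C++ version: h is the current node, idx the position in s.
    static int count(Node base, Node h, String s, int idx) {
        // End of s: valid only if we also ended a dict word here.
        if (idx == s.length()) {
            return h.end ? 1 : 0;
        }
        int res = 0;
        // A word ends here: restart the search for the rest of s from the root.
        if (h.end) {
            res += count(base, base, s, idx);
        }
        // Keep walking down the trie with the current character.
        Node child = h.next[s.charAt(idx) - 'a'];
        if (child != null) {
            res += count(base, child, s, idx + 1);
        }
        return res;
    }

    // Convenience wrapper: build the trie from dict and count constructions of s.
    static int count(String s, String... dict) {
        Node base = new Node();
        for (String w : dict) {
            insert(base, w);
        }
        return count(base, base, s, 0);
    }

    public static void main(String[] args) {
        System.out.println(count("stack", "st", "sta", "a", "ck")); // prints 2
    }
}
```

This follows the walkthrough above and finds the two constructions st + a + ck and sta + ck.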
A memoized recursive version in Python, counting the ways the target can be built from the word bank:
def cancons(target, wordbank, memo=None):
    # a fresh dict per top-level call avoids the shared mutable-default pitfall
    if memo is None:
        memo = {}
    if target in memo:
        return memo[target]
    if target == '':
        return 1
    total_count = 0
    for word in wordbank:
        if target.startswith(word):
            number_of_ways = cancons(target[len(word):], wordbank, memo)
            total_count += number_of_ways
    memo[target] = total_count
    return total_count

if __name__ == '__main__':
    word = "stackoverflow"
    words = ["st", "ck", "CAG", "low", "TC", "rf", "ove", "a", "sta"]
    print(cancons(word, words))  # 2
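A rough Java equivalent of the Python function, memoizing on the start index of the remaining suffix instead of on the suffix string itself (which avoids creating substrings). The class and method names are hypothetical, and empty words are skipped, since an empty word would never shrink the target and the recursion would not terminate:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the memoized cancons() in Java (names are hypothetical).
public class CountConstructions {

    // Number of ways to build target.substring(start) from words in wordBank.
    static long count(String target, List<String> wordBank, int start, long[] memo) {
        if (start == target.length()) {
            return 1; // the empty remainder can be built in exactly one way
        }
        if (memo[start] >= 0) {
            return memo[start]; // already computed for this suffix
        }
        long total = 0;
        for (String word : wordBank) {
            // startsWith(prefix, offset) checks the suffix without copying it
            if (!word.isEmpty() && target.startsWith(word, start)) {
                total += count(target, wordBank, start + word.length(), memo);
            }
        }
        memo[start] = total;
        return total;
    }

    static long count(String target, List<String> wordBank) {
        long[] memo = new long[target.length()];
        Arrays.fill(memo, -1); // -1 marks "not computed yet"
        return count(target, wordBank, 0, memo);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("st", "ck", "CAG", "low", "TC", "rf", "ove", "a", "sta");
        System.out.println(count("stackoverflow", words)); // prints 2
    }
}
```

Each suffix start index is solved once, so the work is roughly (target length) × (number of words) startsWith checks.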

searching a Char letter by letter

I'm trying to search for patterns of letters in a file. The pattern is entered by the user and comes in as a String. So far I've got it to find the first letter, but I'm unsure how to make it test whether the next letter also matches the pattern.
This is the loop I currently have. Any help would be appreciated.
public void exactSearch(){
if (pattern==null){UI.println("No pattern");return;}
UI.println("===================\nExact searching for "+patternString);
int j = 0 ;
for(int i=0; i<data.size(); i++){
if(patternString.charAt(i) == data.get(i) )
j++;
UI.println( "found at " + j) ;
}
}
You need to iterate over the first string until you find the first character of the other string. From there, you can create an inner loop and iterate on both simultaneously, like you did.
Hint: be sure to watch for boundaries, as the strings might not be the same size.
You can try this:
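A minimal sketch of the nested-loop idea in Java. It assumes the text is a plain String (the question's data.get(i) suggests a List<Character>, which would need data.get(i + j) instead of charAt); the class and method names are hypothetical. The outer loop's condition is the boundary check from the hint: a match can only start where the rest of the text is at least as long as the pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; uses a String for the text being searched.
public class ExactSearch {

    // Returns every index in text where pattern starts (overlapping matches included).
    static List<Integer> exactSearch(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.isEmpty()) {
            return hits; // nothing sensible to report for an empty pattern
        }
        // Stop while the pattern can still fit: this is the boundary check.
        for (int i = 0; i + pattern.length() <= text.length(); i++) {
            int j = 0;
            // Inner loop: walk text and pattern together while characters match.
            while (j < pattern.length() && text.charAt(i + j) == pattern.charAt(j)) {
                j++;
            }
            if (j == pattern.length()) {
                hits.add(i); // every character of pattern matched
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(exactSearch("foo-bar-baz-bar-", "bar")); // prints [4, 12]
    }
}
```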
String a1 = "foo-bar-baz-bar-";
String pattern = "bar";
int foundIndex = 0;
while(foundIndex != -1) {
foundIndex = a1.indexOf(pattern,foundIndex);
if(foundIndex != -1)
{
System.out.println(foundIndex);
foundIndex += 1;
}
}
indexOf takes the pattern string as its first parameter and, as its second, the index from which to start searching.
If the pattern is found, it returns the index at which the match starts.
If the pattern is not found, indexOf returns -1.
String data = "foo-bar-baz-bar-";
String pattern = "bar";
int foundIndex = data.indexOf(pattern);
while (foundIndex > -1) {
System.out.println("Match found at: " + foundIndex);
foundIndex = data.indexOf(pattern, foundIndex + pattern.length());
}
Based on your request, you can use this algorithm to find the positions:
1) We check whether we have reached the end of the string: to avoid a StringIndexOutOfBoundsException, we stop as soon as the remaining part of the string is shorter than the pattern.
2) At each position we take the substring of the pattern's length and compare it with the pattern.
List<Integer> positionList = new LinkedList<>();
String inputString = "AAACABCCCABC";
String pattern = "ABC";
for (int i = 0 ; i < inputString.length(); i++) {
if (inputString.length() - i < pattern.length()){
break;
}
String currentSubString = inputString.substring(i, i + pattern.length());
if (currentSubString.equals(pattern)){
positionList.add(i);
}
}
for (Integer pos : positionList) {
System.out.println(pos); // Positions : 4 and 9
}
EDIT:
This could probably be optimized by not using a Collection for such a simple task, but I used a LinkedList because it was quicker to write.

How to Check if a String Contains a Second String with its Characters in Order?

I'm just starting out and I'm completely lost on how to do this.
I want to be able to check a string for a smaller string and return true if the larger string contains the letters of the smaller string in order.
I'm not sure how to go about making sure the letters of the second string are in order, even if there are other letters between them.
An example would be that "chemistry" would return true for the string "hit".
It would return false for the string "him" though.
Any help would be greatly appreciated.
EDIT: Thank you, I changed the word "substring" to string. As I said, I'm just beginning and wasn't aware that meant something else. I really appreciate all the help. It should get me moving in the right direction.
The general approach is to iterate over the characters of the longer string ("chemistry"), always keeping track of the index of the next required character from the shorter string ("hit" — first 0, then 1 once you find h, then 2 once you find i, and then when you find t you're done). For example:
public static boolean containsSubsequence(
final String sequence, final String subsequence) {
if (subsequence.isEmpty()) {
return true;
}
int subsequenceIndex = 0;
for (int i = 0; i < sequence.length(); ++i) {
if (sequence.charAt(i) == subsequence.charAt(subsequenceIndex)) {
++subsequenceIndex;
if (subsequenceIndex == subsequence.length()) {
return true;
}
}
}
return false;
}
Since you haven't posted any code, I will just explain what I would do.
Iterate over the substring letter by letter; in your example, "hit".
Check whether the current letter (on iteration 0, h) is in the string; when it is found, remove everything up to and including it and keep the rest of the string (emistry).
Repeat this for all the remaining letters of the substring.
Use a boolean flag to track whether each letter has been found.
If a letter is not found in any pass of the iteration, return false.
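A minimal Java sketch of these steps, assuming plain Strings (the class and method names are hypothetical). Instead of a separate boolean flag, it returns false as soon as a letter is missing, which has the same effect:

```java
// Sketch of the "cut the string after each found letter" approach.
public class CutSearch {

    static boolean containsInOrder(String text, String letters) {
        String rest = text;
        for (int k = 0; k < letters.length(); k++) {
            int pos = rest.indexOf(letters.charAt(k));
            if (pos == -1) {
                return false; // this letter is missing from what is left
            }
            // Drop everything up to and including the match
            // ("chemistry" becomes "emistry" after finding 'h').
            rest = rest.substring(pos + 1);
        }
        return true; // every letter was found in order
    }

    public static void main(String[] args) {
        System.out.println(containsInOrder("chemistry", "hit")); // prints true
        System.out.println(containsInOrder("chemistry", "him")); // prints false
    }
}
```

Each substring() call copies the remaining text; passing a start index to indexOf(ch, fromIndex) would avoid those copies.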
Well, you could go through both strings simultaneously, advancing your index into the "substring" string (the correct term is subsequence: "mist" is a substring of "chemistry", but "hit" is only a subsequence) only if its current character matches the current character of the outer string. For "chemistry" and "hit", you start with indices i = 0, j = 0. You increase the index i into the first string until s1.charAt(i) == s2.charAt(j), which is the case for i = 1 (the second character of "chemistry" is h). Then you increase j, and keep increasing i until you hit an "i" (i = 4). The second string is contained as a subsequence in the first if j == s2.length() holds at the end. Note that here, unlike for more complex problems such as testing whether the second string really is a substring, a greedy strategy works: you don't have to worry about which of several occurrences of the same character you match against the current one in the second string; you can always greedily choose the first one you see.
Alternatively, you can use regexes: convert the second (search) string into the regex pattern String pat = ".*h.*i.*t.*", and test s1.matches(pat).
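One caveat with the regex approach: if the search string contains regex metacharacters such as ? or ., the pattern no longer means what you intended. A sketch that quotes each character with Pattern.quote and enables DOTALL so ".*" can also skip line terminators (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

public class RegexSubsequence {

    // Builds ".*h.*i.*t.*" from "hit", quoting each character so that
    // metacharacters in the search string are matched literally.
    static boolean matchesInOrder(String text, String search) {
        StringBuilder regex = new StringBuilder(".*");
        for (int k = 0; k < search.length(); k++) {
            regex.append(Pattern.quote(String.valueOf(search.charAt(k)))).append(".*");
        }
        // DOTALL lets ".*" match across newlines in text as well.
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesInOrder("chemistry", "hit")); // prints true
        System.out.println(matchesInOrder("chemistry", "him")); // prints false
    }
}
```

Because of the many ".*" groups, the regex engine may backtrack heavily on long non-matching inputs, so the loop-based answers are likely the safer choice for large strings.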
You could do the following (not sure how efficient it is):
Consider your search string "hit" as an array of char: ['h', 'i', 't']
Use indexOf(c) to determine whether the first character in the array can be found.
Repeat the search on the remaining substring.
Here's the code:
public class SearchString {
public static void main(String[] args) {
String searchSpace = "this is where to search?";
String needle = "tweus?";
char[] chars = needle.toCharArray();
int index = 0;
boolean found = true;
int startIndex = 0;
while (found && index < chars.length){
// cut off everything up to and including the previous match
searchSpace = searchSpace.substring(startIndex);
int pos = searchSpace.indexOf(chars[index]);
found = (pos != -1);
// continue after the matched character so it is not matched twice
// (otherwise needle "tt" would be "found" in "t")
startIndex = pos + 1;
index++;
}
if (index==chars.length && found){
System.out.println("Found it");
} else {
System.out.println("Nothing here");
}
}
}
A slightly modified answer based on ruakh's solution:
public static boolean containsSubsequence(final String sequence, final String subsequence) {
if (Objects.requireNonNull(sequence).isEmpty() || Objects.requireNonNull(subsequence).isEmpty() || subsequence.length() > sequence.length()) {
return false;
}
int index = 0;
for (int i = 0; i < sequence.length(); i++) {
if (sequence.charAt(i) == subsequence.charAt(index) && ++index == subsequence.length()) {
return true;
}
}
return false;
}
Objects.requireNonNull() was added in Java 7; if you're on an older version, substitute something similar (from Apache Commons's StringUtils?). The validation assumes that returning false is suitable for an empty sequence or subsequence; alternatively, you may want to consider throwing something like IllegalArgumentException.
The two if statements have been combined into a single clause for compactness.
Edit:
If one is mathematically inclined, or following ruakh's original solution, any sequence should contain the empty subsequence. The only reason my code above does it differently is that I prefer to treat an empty argument as a form of invalid argument, and thus return false. It really depends on how this method is used, and how 'severe' an empty argument is.
I know this was asked as a Java question. But just for reference, here is a version of it in C.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
int find_str_in_str(const char* const base, const char* const sub)
{
int base_len = strlen(base);
int sub_len = strlen(sub);
char *tmp_sub = NULL;
/* allocate enough zeroed mem for the max string length; positions of sub
   that are never matched stay '\0', so the strcmp below fails for them */
if(base_len > sub_len) {
tmp_sub = calloc(base_len + 1, 1);
} else {
tmp_sub = calloc(sub_len + 1, 1);
}
if(NULL == tmp_sub) {
fprintf(stderr, "Runtime error (calloc)\n");
exit(1);
}
int i = 0;
int j = 0;
for(; i < sub_len; i++) {
for(; j < base_len; j++) {
if(base[j] == sub[i]) {
/* the first occurrence was found; advance j past it so the same
   character of base is not matched again for the next char of sub */
tmp_sub[i] = base[j++];
break;
}
}
}
tmp_sub[i++] = '\0';
if(0 == strcmp(sub, tmp_sub)) {
free(tmp_sub);
return 1;
} else {
free(tmp_sub);
return 0;
}
}
int main(int argc, char **argv)
{
if(argc < 3) {
fprintf(stderr, "Usage: %s %s %s\n", argv[0], "base", "derived");
return EXIT_FAILURE;
}
if(1 == find_str_in_str(argv[1], argv[2])) {
printf("true\n");
} else {
printf("false\n");
}
return EXIT_SUCCESS;
}
to compile: gcc -Wall -Wextra main.c -o main
main self elf
main chemistry try
main chemistry hit
main chemistry tim

Determining if two strings are a substring of a permutation of another String

So I am trying to figure out if two strings when combined together are a substring of a permutation of another string.
I have what I believe to be a working solution, but it is failing some of the JUnit test cases and I don't have access to the ones it is failing on.
Here is my code with one test case:
String a="tommarvoloriddle";
String b="lord";
String c="voldemort";
String b= b+c;
char[] w= a.toCharArray();
char[] k= b.toCharArray();
Arrays.sort(k);
Arrays.sort(w);
pw.println(isPermuation(w,k)?"YES":"NO");
static boolean isPermuation(char[] w, char[] k)
{
boolean found=false;
for(int i=0; i<k.length; i++)
{
for(int j=i; j<w.length; j++)
{
if(k[i]==w[j])
{
j=w.length;
found=true;
}
else
found=false;
}
}
return found;
}
Any help getting this to always produce the correct answer would be awesome, and help making it more efficient would be great too.
What you have is not a working solution. However, you don't explain why you thought it might be, so it's hard to figure out what you intended. I will point out that your code updates found unconditionally for each inner loop, so isPermutation() will always return the result of the last comparison (which is certainly not what you want).
You did the right thing in sorting the two arrays in the first place -- this is a classic step which should allow you to efficiently evaluate them in one pass. But then, instead of a single pass, you use a nested loop -- what did you intend here?
A single pass implementation might be something like:
static boolean isPermutation(char[] w, char[] k) {
int k_idx=0;
for(int w_idx=0; w_idx < w.length; ++w_idx) {
if(k_idx == k.length)
return true; // all characters in k are present in w
if( w[w_idx] > k[k_idx] )
return false; // found character in k not present in w
if( w[w_idx] == k[k_idx] )
++k_idx; // character from k corresponds to character from w
}
// any remaining characters in k are not present in w
return k_idx == k.length;
}
So we are only interested in whether the two combined strings are a subset of a permutation of another string, meaning that the lengths can in fact differ. So let's say we have:
String a = "tommarvoloriddle";
String b = "lord";
String c = "voldemort";
char[] master = a.toCharArray();
char[] combined = (b + c).toCharArray();
Arrays.sort(master);
Arrays.sort(combined);
System.out.println(IsPermutation(master, combined) ? "YES" : "NO");
Then our method is:
static boolean IsPermutation(char[] masterString, char[] combinedString)
{
int combinedStringIndex = 0;
int charsFound = 0;
int result = 0;
for (int i = 0; i < masterString.length; ++i) {
// all characters of combinedString matched: stop before indexing past its end
if (combinedStringIndex == combinedString.length) {
break;
}
result = Character.compare(combinedString[combinedStringIndex], masterString[i]);
if (result == 0) {
charsFound++;
combinedStringIndex++;
}
else if (result < 0) {
return false;
}
}
return (charsFound == combinedString.length);
}
What the above method does: it compares the characters of the two sorted arrays. If the character at the current masterString index does not match the character at the current combinedString index, we simply move on to the next character of masterString and see if that matches. At the end, we compare the number of characters matched with the length of combinedString; if they are equal, every character of combinedString has been found in masterString, so combinedString is a substring of some permutation of masterString. If at any point the current character in masterString is numerically greater than the current character in combinedString, we will never be able to match that character, so we give up. Hope that helps.
To check whether two Strings are permutations of each other, you can do this (with java.util.Arrays imported):
public static boolean isPermuted(String s1, String s2) {
if (s1.length() != s2.length()) return false;
char[] chars1 = s1.toCharArray();
char[] chars2 = s2.toCharArray();
Arrays.sort(chars1);
Arrays.sort(chars2);
return Arrays.equals(chars1, chars2);
}
This means that when sorted the characters are the same, in the same number.
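For the original question (can b + c be found inside some permutation of a?), sorting is not strictly necessary. One alternative, sketched below with hypothetical names, counts how often each character occurs in a and then consumes those counts for every character of b + c. This runs in expected linear time instead of the O(n log n) of sorting. It relies on Map.merge and Map.getOrDefault, which need Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;

public class LetterCounts {

    // True if every character of part occurs in whole at least as often,
    // i.e. part is a substring of some permutation of whole.
    static boolean fitsInPermutation(String whole, String part) {
        Map<Character, Integer> available = new HashMap<>();
        for (char ch : whole.toCharArray()) {
            available.merge(ch, 1, Integer::sum); // count each character of whole
        }
        for (char ch : part.toCharArray()) {
            int left = available.getOrDefault(ch, 0);
            if (left == 0) {
                return false; // part needs more of ch than whole has
            }
            available.put(ch, left - 1);
        }
        return true;
    }

    public static void main(String[] args) {
        String a = "tommarvoloriddle";
        String b = "lord";
        String c = "voldemort";
        System.out.println(fitsInPermutation(a, b + c) ? "YES" : "NO"); // prints YES
    }
}
```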
