Subsequence of a string - java

I have to write a program that takes string argument s and integer argument k and prints out all subsequences of s of length k. For example if I have
subSequence("abcd", 3);
the output should be
abc abd acd bcd
I would like guidance. No code, please!
Thanks in advance.
Update:
I was thinking to use this pseudocode:
Start with an empty string
Append the first letter to the string
Append the second letter
Append the third letter
Print the substring built so far - base case
Return the second letter
Append the fourth letter
Print the substring - base case
Return the first letter
Append the third letter
Append the fourth letter
Print the substring - base case
Return third letter
Append the second letter
Append the third letter
Append the fourth letter
Print the substring - base case
Return the third letter
Return the second letter
Append the third letter
Append the fourth letter
Return third letter
Return fourth letter
Return third letter
Return second letter
Return first letter
The different indent means going deeper in the recursive calls.
(In response to Diego Sevilla):
Following your suggestion:
private String SSet = "";
private String subSequence(String s, int k){
    if(k == 0){
        return SSet;
    }
    else{
        for(int i = 0; i < s.length(); i++){
            SSet += s.charAt(i);
            subSequence(s.substring(i+1), k-1);
        }
    }
    return SSet;
}

As you include "recursion" as a tag, I'll try to explain the strategy for the solution. The recursive function should be a function like the one you show:
subSequence(string, substr_length)
that actually returns a Set of (sub)-strings. Note how the problem can be divided into sub-problems that lend themselves to recursion. Each subSequence(string, substr_length) should:
Start with an empty substring set, which we call SSet.
Do a loop from 0 to the length of the string minus substr_length.
In each loop position i, you take string[i] as the beginning character, and call subSequence(string[i+1..length], substr_length - 1) recursively (here the .. implies an index range into the string, so you have to create the substring using these indices). That recursive call to subSequence will return all the strings of size substr_length - 1. You have to prepend the character you selected (in this case string[i]) to all those substrings, and add all of them to the SSet set.
Just return the constructed SSet. This one will contain all the substrings.
Of course, this process is highly optimizable (for example using dynamic programming storing all the substrings of length i), but you get the idea.
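The steps above can be sketched roughly as follows. This is only one possible reading of the strategy: the class name Subsequences is made up for illustration, and returning a set that contains just the empty string when substr_length reaches 0 is an assumption about the base case, which the description leaves open.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class Subsequences {
    // Returns all subsequences of s that have exactly k characters.
    static Set<String> subSequence(String s, int k) {
        Set<String> sSet = new LinkedHashSet<>();
        if (k == 0) {
            // Assumed base case: exactly one subsequence of length 0.
            sSet.add("");
            return sSet;
        }
        // i may go up to length - k, so that k characters are still available.
        for (int i = 0; i + k <= s.length(); i++) {
            // All subsequences of length k-1 taken from the rest of the string.
            for (String tail : subSequence(s.substring(i + 1), k - 1)) {
                sSet.add(s.charAt(i) + tail); // prepend the chosen character
            }
        }
        return sSet;
    }
}
```

Calling Subsequences.subSequence("abcd", 3) walks through the same choices as the indented pseudocode in the question.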

So, I see you want to implement a method: subSequence(s, n): Which returns a collection of all character combinations from s of length n, such that ordering is preserved.
In the spirit of your desire to not provide you with code, I assume you would prefer no pseudo-code either. So, I will explain my suggested approach in a narrative fashion, leaving the translation to procedural code as an exercise-to-the-reader(TM).
Think of this problem where you are obtaining all combinations of character positions, which could be represented as an array of bits (a.k.a. flags). So where s="abcd" and n=3 (as in your example), all combinations could be represented as follows:
1110 = abc
1101 = abd
1011 = acd
0111 = bcd
Note, that we start with a bit-field where all characters are turned "on" and then shift the "off" bit over by 1. Things get interesting in an example where n < length(s) - 1. For example, say s="abcd" and n=2. Then we have:
1100 = ab
1001 = ad
1010 = ac
0110 = bc
0101 = bd
0011 = cd
The recursion comes into play when you analyze a sub set of the bit-fields. Hence, a recursive call would reduce the size of the bit-field and "bottom-out" where you have three flags:
100
010
001
The bulk of the work is a recursive approach to find all of the bit-fields. Once you have them, the position of each bit can be used as an index into the array of characters (that is, s).
This should be sufficient to get you started on some pseudo-code!
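For readers who do want to see the bit-field idea in code, here is a minimal sketch. Note that it swaps the recursive enumeration described above for a plain loop over every possible bit-field, keeping only those with exactly n flags set, so it illustrates the representation rather than the recursion. The class name is hypothetical, and it assumes s has at most 30 characters so that the flags fit in an int.

```java
import java.util.ArrayList;
import java.util.List;

class BitFieldSubsequences {
    static List<String> subSequence(String s, int n) {
        int len = s.length(); // assumption: len <= 30
        List<String> result = new ArrayList<>();
        // Count down so that 1110 comes before 1101, 1011, 0111.
        for (int mask = (1 << len) - 1; mask >= 0; mask--) {
            if (Integer.bitCount(mask) != n) {
                continue; // wrong number of "on" flags
            }
            StringBuilder sb = new StringBuilder();
            for (int pos = 0; pos < len; pos++) {
                // The leftmost flag corresponds to s.charAt(0).
                if ((mask & (1 << (len - 1 - pos))) != 0) {
                    sb.append(s.charAt(pos));
                }
            }
            result.add(sb.toString());
        }
        return result;
    }
}
```

The loop looks at all 2^len bit-fields, which is wasteful when n is small; the recursive approach avoids generating the fields that get discarded here.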

The problem is precisely this:
Given an ordered set S : {C0, C1, C2, ..., Cn}, derive all ordered subsets S', where each member of S' is a member of S, and relative order of {S':Cj, S':Cj+1} is equivalent to relative order {S:Ci, S:Ci+d} where S':Cj = S:Ci and S':Cj+1 = S:Ci+d. |S|>=|S'|.
Assume/assert size of set S, |S| is >= the size of the subset, |S'|
If |S| - |S'| = d, then you know each of the subsets S' begins with the element at Si, where 0 <= i <= d.
e.g. given S:{a, b, c, d} and |S'| = 3
d = 1
S' sets begin with 'a' (S:0), and 'b' (S:1).
So we see the problem is actually to solve d lexically ordered permutations of length 3 of subsets of S.
#d=0: get l.o.permutations of length 3 for {a, b, c, d}
#d=1: get l.o.permutations of length 3 for {b, c, d}
#d=2: d > |S|-|S'|. STOP.

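#include <string>
using std::string;

// Checks whether s1 occurs in s as a subsequence (C++).
string subSeqString() {
    string s1="hackerrank";
    string s="hhaacckkekraraannk";
    int k=0,c=0;   // k: next position to scan in s, c: characters of s1 matched so far
    int size=s1.size();
    for(int i=0;i<size;i++)
    {
        // look for s1[i] in the part of s that has not been consumed yet
        for(int j=k;j<s.size();j++)
        {
            if(s1[i]==s[j])
            {
                c++;
                k++;
                break;
            }
            k++;
        }
    }
    // every character of s1 was found, in order
    if(c==size)
        return "YES";
    else
        return "NO";
}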

Related

String metrics algorithms in Java

I am solving the problem of processing strings in Java. Please help me solve the problem.
Condition of the problem: John has launched his new startup for recognizing the clouds he has seen, and named it with a string A of length N. But then he found out that Sam had also launched a cloud recognition startup and named it with a string B of length N.
More formally, let string A be the name of John's startup and string B the name of Sam's startup. Both strings have the same length N. For each position 1 ≤ i ≤ N of string B, you need to calculate the type of match at this position with string A.
If Bi=Ai, then in position i the match type must be equal to P (from the word plagiarism).
If Bi≠Ai, but there is another position 1≤j≤N such that Bi=Aj, then in position i the match type must be equal to S (from the word suspicious).
Note:
Letters within one line can be repeated.
Each letter of string A can be used in at most one plagiarism or suspicious match.
Preference is always given to the plagiarism type.
In the case of a suspicious match, the leftmost position in string A is always preferred.
In other positions, the match type must be equal to I (from the word innocent).
Input Format
The first line contains the string A (1≤|A|≤10^6), the startup name chosen by John.
The second line contains the string B (|B|=|A|), the name of Sam's startup.
It is guaranteed that strings A and B contain only uppercase Latin letters.
Output Format
Output a single line C (|C|=|B|), where Ci is the match type of the character Bi (1≤i≤|B|):
for type plagiarism, Ci=P.
for type suspicious, Ci=S.
for type innocent, Ci=I.
Example 1:
Input:
CLOUD
CUPID
Output:
PSIIP
Example 2:
Input:
ALICE
ELIBO
Output:
SPPII
Example 3:
Input:
ABCBCYA
ZBBACAA
Output:
IPSSPIP
Notes:
Explanation for the first test
B1=A1 and B5=A5 , so for positions 1 and 5 the answer is P.
B2≠A2 , but B2=A4, so for position 2 the answer is S.
Letters P and I do not occur in string A, so for positions 3 and 4 the answer is I.
Explanation for the second test:
B2=A2 and B3=A3, so for positions 2 and 3 the answer is P.
B1≠A1 , but B1=A5, so for position 1 the answer is S.
Letters B and O do not occur in string A, so for positions 4 and 5 the answer is I.
Explanation for the third test:
B2=A2 , B5=A5 and B7=A7 so for positions 2, 5 and 7 the answer is P.
B3≠A3 but B3=A2=A4. A2 is already used by the match B2=A2,
therefore the correspondence B3=A4 is chosen, and for position 3 the answer is S.
B4≠A4 and B6≠A6, but B4=B6=A1=A7.
A7 is already used by the match B7=A7;
4<6, therefore, for position 4, the correspondence B4=A1 (answer S) is selected;
there are no matches left for position 6 (answer I).
The letter Z does not occur in string A, so for position 1 the answer is I.
My solution:
import java.util.Stack;

public class Solution {
public boolean backspaceCompare(String s, String t) {
return formBackSpaceString(s).equals(formBackSpaceString(t));
}
private String formBackSpaceString(String s) {
Stack<Character> stack = new Stack<>();
for (char c : s.toCharArray()) {
if (c == '#') {
if (!stack.isEmpty()) {
stack.pop();
}
} else {
stack.push(c);
}
}
StringBuilder sb = new StringBuilder();
while (!stack.isEmpty()) {
sb.append(stack.pop());
}
return sb.toString();
}
public static void main (String[] args){
}
}
I am bogged down in the logic of this task. I would be grateful for help at least at the pseudocode level.
The code you provided does not seem related to the problem you described.
The problem asks to apply a simple rule for chars (Ai, Bi) from the original string to construct a new string C. There are only three rules:
If Ai == Bi, Ci = "P"
Otherwise, if A contains char Bi, Ci = "S"
Otherwise, Ci = "I"
You can do that in a simple way using stream API:
String problem(String A, String B) {
// construct a set of all chars in A
Set<Integer> aChars = A.chars().boxed().collect(Collectors.toSet());
// apply rules for chars Ai, Bi
return IntStream.range(0, A.length())
.mapToObj(i -> A.charAt(i) == B.charAt(i) ? "P" :
aChars.contains((int) B.charAt(i)) ? "S" : "I")
.collect(Collectors.joining());
}
Or, more verbosely, without it:
String problem(String A, String B) {
Set<Character> aChars = new HashSet<>();
for (int i = 0; i < A.length(); i++) {
aChars.add(A.charAt(i));
}
StringBuilder builder = new StringBuilder();
for (int i = 0; i < A.length(); i++) {
if (A.charAt(i) == B.charAt(i)) {
builder.append("P");
} else if (aChars.contains(B.charAt(i))) {
builder.append("S");
} else {
builder.append("I");
}
}
return builder.toString();
}
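Both versions above treat A as a plain set of letters, so they do not take into account the note that each letter of A can be used in at most one match. In Example 3 they would answer S for position 6, where the expected answer is I. A minimal sketch that tracks how many copies of each letter are still unused might look like this. The class and method names are hypothetical, and the code relies on the guarantee that both strings contain only uppercase Latin letters.

```java
class MatchTypes {
    static String classify(String a, String b) {
        int n = a.length();
        char[] c = new char[n];
        int[] free = new int[26]; // unused copies of each letter of A ('A'..'Z' assumed)

        // First pass: plagiarism has priority and consumes A[i] itself.
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) == b.charAt(i)) {
                c[i] = 'P';
            } else {
                free[a.charAt(i) - 'A']++;
            }
        }
        // Second pass, left to right over B: take an unused copy if one exists.
        for (int i = 0; i < n; i++) {
            if (c[i] == 'P') {
                continue;
            }
            int letter = b.charAt(i) - 'A';
            if (free[letter] > 0) {
                free[letter]--;
                c[i] = 'S';
            } else {
                c[i] = 'I';
            }
        }
        return new String(c);
    }
}
```

The rule about preferring the leftmost position in A only decides which copy of a letter is consumed, not whether one is available, so counting copies is enough to produce C. Both passes are linear, which matters for strings of up to 10^6 characters.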

How and why does this code work? Finding the minimum number of steps to change one word to another

I'm researching on how to find the minimum number of steps required to convert word1 to word2, and came across the following implementation with the rules:
Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)
You have the following 3 operations permitted on a word:
a) Insert a character
b) Delete a character
c) Replace a character
And the idea of the implementation is:
Use distance[i][j] to represent the shortest edit distance between word1[0,i) and word2[0, j). Then compare the last character of word1[0,i) and word2[0,j), which are c and d respectively (c == word1[i-1], d == word2[j-1]):
if c == d, then : distance[i][j] = distance[i-1][j-1]
Otherwise we can use three operations to convert word1 to word2:
(a) if we replaced c with d: distance[i][j] = distance[i-1][j-1] + 1;
(b) if we added d after c: distance[i][j] = distance[i][j-1] + 1;
(c) if we deleted c: distance[i][j] = distance[i-1][j] + 1;
Code:
public class Solution {
public int minDistance(String word1, String word2) {
int len1 = word1.length();
int len2 = word2.length();
//distance[i][j] is the distance to convert word1[0, i) to word2[0, j)
int[][] distance = new int[len1 + 1][len2 + 1];
for (int j = 0; j <= len2; j++)
{distance[0][j] = j;} //insert all j characters of word2
for (int i = 0; i <= len1; i++)
{distance[i][0] = i;} //delete all i characters of word1
for (int i = 1; i <= len1; i++) {
for (int j = 1; j <= len2; j++) {
if (word1.charAt(i - 1) == word2.charAt(j - 1)) { //ith & jth
distance[i][j] = distance[i - 1][j - 1];
} else {
distance[i][j] = Math.min(Math.min(distance[i][j - 1], distance[i - 1][j]), distance[i - 1][j - 1]) + 1;
}
}
}
return distance[len1][len2];
}
}
And my question is what does distance[][] represent? What's the point of storing a value for every 2D index? And why do you add 1 to len1 and len2 in int[][] distance = new int[len1 + 1][len2 + 1];?
For example, from my understanding, it is comparing each character of word1 to word2, but once both characters match, shouldn't both words' indexes move up? Meaning, if String word1="ab"; and String word2="ac", since a characters match, there is no need to compare a in word1 to c in word2, but rather move up indexes, and compare b in word1 to c in word2.
Lastly, how do the three operations represent the way that do, e.g. how come distance[i-1][j-1] represent replacement?
Thank you in advance and will accept answer/up vote.
what does distance[][] represent?
It represents minDistance(word1.substring(0, i), word2.substring(0, j)). Here i and j are the lengths of the substrings.
What's the point of storing a value for every 2D index?
This is the dynamic programming idea. The answer to each partial problem is calculated once and then reused many times. If you don't store it in a "global" array, you have to recalculate it every time you need it. Each call branches into up to 3 recursive calls, so a plain recursive calculation can take exponential time, on the order of O(3^(N+M)), where N is the length of word1 and M is the length of word2. If instead you simply reuse the previously calculated results, it only takes O(N*M) time.
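The "calculate once, reuse many times" idea can also be shown top-down. The following is a sketch, not the code from the question: it uses recursion plus a memo array instead of the nested loops, and the class name and the -1 marker for "not computed yet" are choices made for this illustration.

```java
import java.util.Arrays;

class EditDistanceMemo {
    static int minDistance(String word1, String word2) {
        int[][] memo = new int[word1.length() + 1][word2.length() + 1];
        for (int[] row : memo) {
            Arrays.fill(row, -1); // -1 means "not computed yet"
        }
        return solve(word1, word2, word1.length(), word2.length(), memo);
    }

    // Edit distance between the first i chars of a and the first j chars of b.
    private static int solve(String a, String b, int i, int j, int[][] memo) {
        if (i == 0) return j; // insert the j remaining characters
        if (j == 0) return i; // delete the i remaining characters
        if (memo[i][j] != -1) return memo[i][j]; // reuse the stored answer
        int result;
        if (a.charAt(i - 1) == b.charAt(j - 1)) {
            result = solve(a, b, i - 1, j - 1, memo);
        } else {
            int replace = solve(a, b, i - 1, j - 1, memo);
            int insert = solve(a, b, i, j - 1, memo);
            int delete = solve(a, b, i - 1, j, memo);
            result = 1 + Math.min(replace, Math.min(insert, delete));
        }
        memo[i][j] = result;
        return result;
    }
}
```

Removing the memo lookup gives the plain recursive version, which recomputes the same (i, j) pairs over and over. Note that the recursion depth can reach N+M, so for very long words the loop-based table is the safer choice.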
And why do you add 1 to len1 and len2 in int[][] distance = new int[len1 + 1][len2 + 1];?
For technical reasons you need to look back at the trivial cases of empty substrings. The distance[0][j] and distance[i][0] slots of the array store these cases.
You could replace them with special-case calculations (the solution for the trivial cases is known), but it would make the code more complex. If you went for recursive calls instead of direct lookups in the array, that would be a viable option.
once both characters match, shouldn't both words' indexes move up?
No, it's not about moving an index; it's about calculating a partial solution here and now. The nested loops over i and j take care of "index moving" at the appropriate time. Remember, we are not following one particularly good case, but calculating all partial solutions for i=0..len1 and j=0..len2. Each partial solution only looks 1 step back, in 1 or 3 different directions.
Lastly, how do the three operations represent the way that do, e.g. how come distance[i-1][j-1] represent replacement?
For example, minDistance("abc", "abd") = 1 + minDistance("ab", "ab") = 1 + minDistance("a", "a") = 1 + minDistance("", "") = 1 + 0 = 1.
In this example for the case of calculation of final answer:
i=3="abc".length()
j=3="abd".length()
c = 'c' = "abc".charAt(i-1)
d = 'd' = "abd".charAt(j-1)
If we decide to replace c with d, that is, the last character of word1 with the last character of word2, we use the already calculated answer for the left parts of the words, which are 1 character shorter, since the replacement takes care of the last character. We add 1 to the total number of operations, since we decided to make this replacement here.
We add 1 to the word lengths for convenience: string indexing starts at zero, so on the first comparison, if word1[0] == word2[0], we would otherwise need to refer to distance[0-1][0-1]. Without the extra row and column, the assignment distance[i][j] = distance[i-1][j-1] would have to be handled as a special case rather than just being part of the loop.
This kind of solution formulation, where each iteration relies on a previous iteration's result is called dynamic programming. Let's try to put words to this particular rule formulation.
First of all, we define what each cell represents: it's the smallest number of changes we need to apply to the prefix of word1 of length i to change it into the prefix of word2 of length j. Now you can see how the preparation, distance[i][0] = i, makes sense: it would take exactly i deletions to turn any prefix of length i into a string of length zero!
if c == d, then : distance[i][j] = distance[i-1][j-1]
Translation: since we had to change nothing, the number of changes it took to make prefix length i the same as j would be the same number of changes to get the previous two prefixes equal, those with lengths [i-1][j-1].
If c does not equal d, we are going to choose the smaller of three options:
(a) if we replaced c with d: distance[i][j] = distance[i-1][j-1] + 1
Translation: imagine our prefixes are of similar length at this point and we just replaced the wrong character at i to be the same as the one at j. Again we look at the solution for the previous prefix lengths [i-1][j-1] but we need to add 1 since we made a change. (Now remember this is one option of three that we will choose from. Also remember that any previous cell stores the optimal solution up to that point.)
(b) if we added d after c: distance[i][j] = distance[i][j-1] + 1
Translation: we've reached index i but it doesn't match j, therefore we can look at the optimal solution for adjusting this prefix length so it matches the one ending at (j-1) (a solution we already computed) and add the d so both prefixes reach [i][j] in a correct state. Again we need to add 1 to the previous state's solution.
(c) if we deleted c: distance[i][j] = distance[i-1][j] + 1
Translation: we've reached index i but it doesn't match j, therefore we can look at the optimal solution for adjusting the previous prefix length (i-1 which we already computed) so it matches the one ending at j, but we need to add 1 since we need to remove c to reach the previous prefix length.
Example:
word1 = 'ab'
word2 = 'ac'
m = [[0,1,2]
,[1,0,1]
,[2,1,_]]
(i,j)
1,1 => m[i][j] = m[i-1][j-1] = 0 // no change needed
1,2 => min(m[0][1], m[1][1], m[0][2]) + 1
= min(1, 0, 2) + 1
= 1
choice represented: easiest to change 'a' to 'ac' by adding
1 ('c') to the solution for [i][j-1] = [1][1]
2,1 => min(m[1][0], m[2][0], m[1][1]) + 1
= min(1, 2, 0) + 1
= 1
choice represented: easiest to change 'ab' to 'a' by adding
1 (deletion) to the solution for [i-1][j] = [1][1]
2,2 => min(m[1][1], m[2][1], m[1][2]) + 1
= min(0, 1, 1) + 1
= _
choice represented: you figure it out...

Generating a list of all permutations of a given character set between a minimum and maximum amount of characters

I am trying to generate a list of all permutations of a given character set between a specified minimum and maximum amount of characters in Java, for use in a password cracking program. For example, with a character set of ab, a minimum number of characters of 2 and a maximum number of characters of 4, this would be the output:
aa
ab
ba
bb
aaa
aab
aba
baa
abb
bab
bba
bbb
aaaa
aaab
aaba
abaa
baaa
aabb
abab
baab
abbb
babb
bbbb
bbba
bbab
baba
bbaa
abba
I am stumped and can't think of a way to do this efficiently and without duplicates. What is the best recursive algorithm for doing this given a String for the character set, an int for the minimum number of characters, and an int for the maximum number of characters?
Here is the pseudo-code for what I am trying to do:
//1.start at the minimum number of characters with all characters at index 0
//2.increment rightmost by 1 until last char is reached
//3.shift left by 1
//4.increment this char by 1 unless last char is reached
//5.repeat step 2
//6.repeat step 3
//7.repeat step 4; if last char is reached repeat step 3
//8.when you can't shift left anymore go to the next number of characters, unless the maximum has been reached
All I need to do is figure out how to translate this into a recursive method.
The most straight-forward version is to use a single recursive call for each letter, keeping track of the depth. Then make a recursive call for each length from min to max:
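// Fill this with the characters of the character set, e.g. "ab".toCharArray()
char[] letterBank;
List<String> myList = new ArrayList<>();

// Appends one letter per level until depth reaches 0, then stores the string.
void populateMyList(int depth, String stringSoFar) {
    if (depth == 0) {
        myList.add(stringSoFar);
        return;
    }
    for (int i = 0; i < letterBank.length; i++) {
        populateMyList(depth - 1, stringSoFar + letterBank[i]);
    }
}

// Call this from main: one top-level call for each length from min to max.
void populateAllLengths(int min, int max) {
    for (int i = min; i <= max; i++)
        populateMyList(i, "");
}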
Note that if you want efficiency then use a StringBuilder and not a string as a parameter.
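A sketch of what the StringBuilder variant could look like, under the assumption that the character set arrives as a String. The class and method names are made up for this example. A single builder is shared by all calls, and each level undoes its own append before trying the next letter.

```java
import java.util.ArrayList;
import java.util.List;

class Candidates {
    static List<String> generate(String charset, int min, int max) {
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (int len = min; len <= max; len++) {
            fill(charset, len, sb, out);
        }
        return out;
    }

    private static void fill(String charset, int remaining, StringBuilder sb, List<String> out) {
        if (remaining == 0) {
            out.add(sb.toString()); // a complete candidate
            return;
        }
        for (int i = 0; i < charset.length(); i++) {
            sb.append(charset.charAt(i));          // choose a letter
            fill(charset, remaining - 1, sb, out); // fill the rest
            sb.setLength(sb.length() - 1);         // undo the choice
        }
    }
}
```

For the character set ab with lengths 2 to 4 this produces 4 + 8 + 16 = 28 strings. For larger inputs the list grows exponentially, so handing each candidate to a callback instead of storing it may be more practical.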

Find every possible subset given a string [duplicate]

I'm trying to find every possible anagram of a string in Java - By this I mean that if I have a 4 character long word I want all the possible 3 character long words derived from it, all the 2 character long and all the 1 character long. The most straightforward way I thought of is to use two nested for loops and iterate over the string. This is my code as of now:
private ArrayList<String> subsets(String word){
ArrayList<String> s = new ArrayList<String>();
int length = word.length();
for (int c=0; c<length; c++){
for (int i=0; i<length-c; i++){
String sub = word.substring(c, c+i+1);
System.out.println(sub);
//if (!s.contains(sub) && sub!=null)
s.add(sub);
}
}
//java.util.Collections.sort(s, new MyComparator());
//System.out.println(s.toString());
return s;
}
My problem is that it works for 3 letter words: fun yields this result (don't mind the ordering; the word is processed so that I have a string with the letters in alphabetical order):
f
fn
fnu
n
nu
u
But when I try 4 letter words, it leaves something out, as in catq gives me:
a
ac
acq
acqt
c
cq
cqt
q
qt
t
i.e., I don't see the 3 character long word act - which is the one I'm looking for when testing this method. I can't understand what the problem is, and it's most likely a logical error I'm making when creating the substrings. If anyone can help me out, please don't give me the code for it but rather the reasoning behind your solution. This is a piece of coursework and I need to come up with the code on my own.
EDIT: to clarify, for me acq, qca, caq, aqc, cqa, qac, etc. are the same thing. To make it even clearer, the string gets sorted in alphabetical order, so all those permutations should come up as one unique result, acq. So I don't need all the permutations of a string, but rather, given a 4 character long string, all the 3 character long ones that I can derive from it. That means taking out one character at a time and returning the resulting string, doing that for every character in the original string.
I hope I have made my problem a bit clearer
It's working fine, you just misspelled "caqt" as "acqt" in your tests/input.
(The issue is probably that you're sorting your input. If you want substrings, you have to leave the input unsorted.)
After your edits: see "Generating all permutations of a given string". Then just sort the individual letters, and put them in a set.
Ok, as you've already devised your own solution, I'll give you my take on it. Firstly, consider how big your result list is going to be. You're essentially taking each letter in turn, and either including it or not. 2 possibilities for each letter, gives you 2^n total results, where n is the number of letters. This of course includes the case where you don't use any letter, and end up with an empty string.
Next, if you enumerate every possibility with a 1 for 'include this letter' and a 0 for 'don't include it', taking your 'fnu' example you end up with:
000 - ''
001 - 'u'
010 - 'n'
011 - 'nu'
100 - 'f'
101 - 'fu' (no offense intended)
110 - 'fn'
111 - 'fnu'.
Clearly, these are just binary numbers, and you can derive a function that given any number from 0-7 and the three letter input, will calculate the corresponding subset.
It's fairly easy to do in java.. don't have a java compiler to hand, but this should be approximately correct:
public String getSubSet(String input, int index) {
    // Should check that index >= 0 and < 2^input.length() here.
    // Should also check that input.length() <= 31.
    String returnValue = "";
    for (int i = 0; i < input.length(); i++) {
        // Test bit i of index; 1 << i is the equivalent of 2^i
        if ((index & (1 << i)) != 0)
            returnValue += input.charAt(i);
    }
    return returnValue;
}
Then, if you need to you can just do a loop that calls this function, like this:
for (int i = 1; i < (1 << input.length()); i++)
    getSubSet(input, i); // this doesn't do anything, but you can add it to a list, or output it as desired.
Note I started from 1 instead of 0- this is because the result at index 0 will be the empty string. Incidentally, this actually does the least significant bit first, so your output list would be 'f', 'n', 'fn', 'u', 'fu', 'nu', 'fnu', but the order didn't seem important.
This is the method I came up with, seems like it's working
private void subsets(String word, ArrayList<String> subset){
if(word.length() == 1){
subset.add(word);
return;
}
else {
String firstChar = word.substring(0,1);
word = word.substring(1);
subsets(word, subset);
int size = subset.size();
for (int i = 0; i < size; i++){
String temp = firstChar + subset.get(i);
subset.add(temp);
}
subset.add(firstChar);
return;
}
}
What I do is check whether the word is longer than one character; if it isn't, I add the character alone to the ArrayList and return. If it is longer, I save the first character and make a recursive call with the rest of the String. The whole string gets sliced into characters saved on the recursion stack, until I hit the point where my word has length 1, with only one character remaining.
When that happens, as I said at the start, the character gets added to the List. Then the recursion starts to unwind: it looks at the size of the list (1 in the first iteration), and then a for loop adds the character saved on the stack for the previous call, concatenated with every element in the ArrayList. Then it adds the character on its own and unwinds the recursion again.
I.e., with the word fun this happens:
f saved
List empty
recursive call(un)
-
u saved
List empty
recursive call(n)
-
n.length == 1
List = [n]
return
-
list.size=1
temp = u + list[0]
List = [n, un]
add the character saved in the stack on its own
List = [n, un, u]
return
-
list.size=3
temp = f + list[0]
List = [n, un, u, fn]
temp = f + list[1]
List = [n, un, u, fn, fun]
temp = f + list[2]
List = [n, un, u, fn, fun, fu]
add the character saved in the stack on its own
List = [n, un, u, fn, fun, fu, f]
return
I have been as clear as possible, I hope this clarifies what was my initial problem and how to solve it.
This is working code:
public static void main(String[] args) {
String input = "abcde";
Set<String> returnList = permutations(input);
System.out.println(returnList);
}
private static Set<String> permutations(String input) {
if (input.length() == 1) {
Set<String> a = new TreeSet<>();
a.add(input);
return a;
}
Set<String> returnSet = new TreeSet<>();
for (int i = 0; i < input.length(); i++) {
String prefix = input.substring(i, i + 1);
Set<String> permutations = permutations(input.substring(i + 1));
returnSet.add(prefix);
returnSet.addAll(permutations);
Iterator<String> it = permutations.iterator();
while (it.hasNext()) {
returnSet.add(prefix + it.next());
}
}
return returnSet;
}

Finding the index of a permutation within a string

I just attempted a programming challenge, which I was not able to successfully complete. The specification is to read 2 lines of input from System.in.
A list of 1-100 space separated words, all of the same length and between 1-10 characters.
A string up to a million characters in length, which contains a permutation of the above list just once. Return the index of where this permutation begins in the string.
For example, we may have:
dog cat rat
abcratdogcattgh
3
Where 3 is the result (as printed by System.out).
It's legal to have a duplicated word in the list:
dog cat rat cat
abccatratdogzzzzdogcatratcat
16
The code that I produced worked providing that the word that the answer begins with has not occurred previously. In the 2nd example here, my code will fail because dog has already appeared before where the answer begins at index 16.
My theory was to:
Find the index where each word occurs in the string
Extract this substring (as we have a number of known words with a known length, this is possible)
Check that all of the words occur in the substring
If they do, return the index that this substring occurs in the original string
Here is my code (it should be compilable):
import java.io.BufferedReader;
import java.io.InputStreamReader;
public class Solution {
public static void main(String[] args) throws Exception {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String line = br.readLine();
String[] l = line.split(" ");
String s = br.readLine();
int wl = l[0].length();
int len = wl * l.length;
int sl = s.length();
for (String word : l) {
int i = s.indexOf(word);
int z = i;
//while (i != -1) {
int y = i + len;
if (y <= sl) {
String sub = s.substring(i, y);
if (containsAllWords(l, sub)) {
System.out.println(s.indexOf(sub));
System.exit(0);
}
}
//z+= wl;
//i = s.indexOf(word, z);
//}
}
System.out.println("-1");
}
private static boolean containsAllWords(String[] l, String s) {
String s2 = s;
for (String word : l) {
s2 = s2.replaceFirst(word, "");
}
if (s2.equals(""))
return true;
return false;
}
}
I am able to solve my issue and make it pass the 2nd example by un-commenting the while loop. However this has serious performance implications. When we have an input of 100 words at 10 characters each and a string of 1000000 characters, the time taken to complete is just awful.
Given that each case in the test bench has a maximum execution time, the addition of the while loop would cause the test to fail on the basis of not completing the execution in time.
What would be a better way to approach and solve this problem? I feel defeated.
You could concatenate the strings together and use the new string to search with.
String a = "dog";
String b = "cat";
String c = a+b; //output of c would be "dogcat"
Like this you would overcome the problem of dog appearing somewhere.
But this wouldn't work if catdog is a valid value too.
Here is an approach (pseudo code)
stringArray keys(n) = {"cat", "dog", "rat", "roo", ...};
string bigString(1000000);
L = strlen(keys[0]); // since all are same length
int indices(n, 1000000/L); // much too big - but safe if only one word repeated over and over
for each s in keys
f = -1
do:
f = find s in bigString starting at f+1 // use bigString.indexOf(s, f+1)
write index of f to indices
until no more found
When you are all done, you will have a series of indices (location of first letter of match). Now comes the tricky part. Since the words are all the same length, we're looking for a sequence of indices that are all spaced the same way, in the 10 different "collections". This is a little bit tedious but it should complete in a finite time. Note that it's faster to do it this way than to keep comparing strings (comparing numbers is faster than making sure a complete string is matched, obviously). I would again break it into two parts - first find "any sequence of 10 matches", then "see if this is a unique permutation".
sIndx = sort(indices(:))
dsIndx = diff(sIndx);
sequence = find {n} * 10 in dsIndx
for each s in sequence
check if unique permutation
I hope this gets you going.
Perhaps not the best optimized version, but how about the following theory to give you some ideas:
1. Count the total length of all the words in the list.
2. Take a random word from the list and find the starting index of its first occurrence.
3. Take a substring of the length counted above before and after that index (e.g. if the index is 15 and there are 3 words of 4 letters each, take the substring from 15-8 to 15+11).
4. Make a copy of the word list with the earlier random word removed.
5. Check the appending/prepending [word_length] letters to see if they match a word on the copy of the list.
6. If a word matches, remove it from the copy of the list and move further.
7. If all words are found, break the loop.
8. If not all words are found, find the starting index of the next occurrence of the earlier random word and go back to 3.
Why it would help:
Which word you pick to begin with doesn't matter, since every word needs to be in the successful match anyway.
You don't have to manually loop through a lot of the characters, unless there are lots of nearly complete false matches.
As a supposed match keeps growing, you have fewer words left on the list copy to compare to.
You can also keep track of the furthest index you've gone to, so you can sometimes limit the backwards length of the picked substring (as it cannot overlap with where you've already been, if the occurrences are close to each other).
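Another way to approach this, different from the suggestions above, is a sliding window over word-sized chunks. Because all words have the same length, any match consists of chunks aligned at one of wl possible offsets, so each offset can be scanned once while keeping a count of the words currently inside the window. The sketch below is an illustration under those assumptions; the class name, method name and variable names are not from the original posts.

```java
import java.util.HashMap;
import java.util.Map;

class PermutationIndex {
    // Returns the smallest index where all words appear back to back in some order, or -1.
    static int find(String[] words, String s) {
        int wl = words[0].length(); // all words share this length
        int n = words.length;
        if (s.length() < wl * n) return -1;

        Map<String, Integer> need = new HashMap<>(); // required count of each word
        for (String w : words) need.merge(w, 1, Integer::sum);

        int best = -1;
        for (int offset = 0; offset < wl; offset++) {
            Map<String, Integer> seen = new HashMap<>(); // counts inside the window
            int left = offset; // start of the current window
            int count = 0;     // number of words in the window
            for (int right = offset; right + wl <= s.length(); right += wl) {
                String w = s.substring(right, right + wl);
                if (!need.containsKey(w)) {
                    // Not a listed word: no window can span this chunk.
                    seen.clear();
                    count = 0;
                    left = right + wl;
                    continue;
                }
                seen.merge(w, 1, Integer::sum);
                count++;
                // Too many copies of w: shrink the window from the left.
                while (seen.get(w) > need.get(w)) {
                    String drop = s.substring(left, left + wl);
                    seen.merge(drop, -1, Integer::sum);
                    left += wl;
                    count--;
                }
                if (count == n && (best == -1 || left < best)) best = left;
            }
        }
        return best;
    }
}
```

Each character of the string is read a bounded number of times per offset, so the work is roughly proportional to the string length times the word length, rather than depending on how often any single word repeats. In the program from the question, words would come from line.split(" ") and the result would be printed with System.out.println.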
