Java: How to implement Word Break? - java

I'm trying to understand the following Java + Dynamic Programming implementation (https://pingzhblog.wordpress.com/2015/09/17/word-break/):
public class Solution {
public boolean wordBreak(String s, Set<String> wordDict) {
if(s == null) {
return false;
}
boolean[] wordBreakDp = new boolean[s.length() + 1];
wordBreakDp[0] = true;
for(int i = 1; i <= s.length(); i++) {
for(int j = 0; j < i; j++) {
String word = s.substring(j, i);
if(wordBreakDp[j] && wordDict.contains(word)) {
wordBreakDp[i] = true;
break;
}
}
}//end for i
return wordBreakDp[s.length()];
}
}
But I need some clarification, using String s = "abcxyz" and Set<String> wordDict = ["z", "xy", "ab", "c"] as an example.
I'm still unclear about what wordBreakDp[] represents and what setting an entry to true means.
I traced it by hand and got wordBreakDp[2], [3], [5] and [6] set to true, but what do those indexes tell me? Couldn't I have just checked i = 6, since all we return is the last entry, wordBreakDp[s.length()]?
And say, for example, I got ab for s.substring(0, 2). How can we just assume that the next iteration, s.substring(1, 2), is not useful and break out of the loop?
Thank you

This isn't really an answer, but it might help in understanding the loops. At each iteration it prints s.substring(j, i) together with wordBreakDp[j]. At the end of the method it prints the pieces of s between consecutive true entries of wordBreakDp, which is a valid segmentation for your example (ab, c, xy, z) but not for every input.
public boolean wordBreak(String s, Set<String> wordDict) {
if(s == null) {
return false;
}
boolean[] wordBreakDp = new boolean[s.length() + 1];
wordBreakDp[0] = true;
for(int i = 1; i <= s.length(); i++) {
for(int j = 0; j < i; j++) {
String word = s.substring(j, i);
System.out.println("["+j+","+i+"]="+s.substring(j,i)+", wordBreakDP["+j+"]="+wordBreakDp[j]);
if(wordBreakDp[j] && wordDict.contains(word)) {
wordBreakDp[i] = true;
break;
}
}
}//end for i
for (int i = 1, start=0; i <= s.length(); i++) {
if (wordBreakDp[i]) {
System.out.println(s.substring(start,i));
start = i;
}
}
return wordBreakDp[s.length()];
}

I think I have the answer to your last question, which was:
And say for example I got ab for s.substring(0, 2);, but then how can
we just assume that next loop, s.substring(1, 2);, is not useful and
just break; out of the loop?
You are right that if you got "ab" for s.substring(0,2) and you break out of the loop, this doesn't necessarily mean that s.substring(1,2) is not useful. Let's say that s.substring(1,2) is useful. This means that "a" is also a valid word, and so is "b". The solution found by this algorithm would start with "ab" whereas the solution found by not breaking would be "a" followed by "b". Both these solutions are correct (assuming that the rest of the string can be broken into valid words as well). The algorithm doesn't find all valid solutions. It just returns true if the string can be broken into valid words, i.e., if there is one solution that satisfies the condition. There may be more than one solution, of course, but that is not the purpose of this algorithm.
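To see concretely what the break gives up, here is a minimal sketch of a variant without it. It is not part of the original solution: the class name WordBreakCount and the method countSegmentations are made up for illustration. Instead of a boolean per prefix it keeps a count, so every valid split point j contributes rather than only the first one found.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not from the original post.
class WordBreakCount {
    // ways[i] = number of ways to split the first i characters of s into dictionary words.
    // Assumption: the counts fit in a long (they can grow quickly on long inputs).
    static long countSegmentations(String s, Set<String> wordDict) {
        long[] ways = new long[s.length() + 1];
        ways[0] = 1; // the empty prefix has exactly one (empty) segmentation
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 0; j < i; j++) {
                if (ways[j] > 0 && wordDict.contains(s.substring(j, i))) {
                    ways[i] += ways[j]; // no break: every valid split point j is counted
                }
            }
        }
        return ways[s.length()];
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("a", "b", "ab", "c"));
        // "abc" can be split as a|b|c or ab|c
        System.out.println(countSegmentations("abc", dict)); // prints 2
    }
}
```

The boolean version returns true exactly when this count is greater than zero, which is why stopping at the first valid j is safe when only a yes/no answer is needed.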

Note, dp[i] means whether the first i characters of s can be broken into dictionary words ("is valid"). An example,
String = catsand
Dict = [cat, cats, sand, and]
s = c a t s a n d
dp = T F F T T F F ?
i = 0 1 2 3 4 5 6 7
*
Note dp[0] is True, because empty string is valid. Say we have dp[0] through dp[6].
Let's calculate dp[7], which denotes whether catsand is valid.
For catsand to be valid, there must be a non-empty suffix that is in dict, AND the remaining prefix must be valid.
"catsand" is valid if:
If "" is valid, and "catsand" is inDict, or
If "c" is valid, and "atsand" is inDict, or
If "ca" is valid, and "tsand" is inDict, or
If "cat" is valid, and "sand" is inDict, or
If "cats" is valid, and "and" is inDict, or
If "catsa" is valid, and "nd" is inDict, or
If "catsan" is valid, and "d" is inDict
Translated to pseudocode, with i = 7 and j running from 0 to 6:
dp[7] is True, if:
If dp[0] && inDict( s[0..7) ) or
If dp[1] && inDict( s[1..7) ) or
If dp[2] && inDict( s[2..7) ) or
If dp[3] && inDict( s[3..7) ) or
If dp[4] && inDict( s[4..7) ) or
If dp[5] && inDict( s[5..7) ) or
If dp[6] && inDict( s[6..7) )
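The same check, applied to every i rather than only i = 7, fills the whole table. Below is a small sketch under that reading; the class name WordBreakTable and the method buildTable are illustrative names, not from the original answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper that returns the full dp table instead of only the last entry.
class WordBreakTable {
    static boolean[] buildTable(String s, Set<String> dict) {
        boolean[] dp = new boolean[s.length() + 1];
        dp[0] = true; // the empty prefix is valid
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 0; j < i && !dp[i]; j++) {
                // prefix s[0..j) is valid AND suffix s[j..i) is a dictionary word
                dp[i] = dp[j] && dict.contains(s.substring(j, i));
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("cat", "cats", "sand", "and"));
        System.out.println(Arrays.toString(buildTable("catsand", dict)));
        // prints [true, false, false, true, true, false, false, true]
    }
}
```

The last entry answers the "?" in the table above: dp[7] is true because dp[3] is true and "sand" is in the dictionary (dp[4] together with "and" works as well).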

public class Solution {
public int wordBreak(String a, ArrayList<String> b) {
int n = a.length();
boolean dp[] = new boolean[n+1];
dp[0] = true;
for(int i = 0;i<n;i++){
if(!dp[i])continue;
for(String s : b){
int len = s.length();
int end = len + i;
if(end > n || dp[end])continue;
if(a.substring(i,end).equals(s)){
dp[end] = true;
}
}
}
if(dp[n])return 1;
else return 0;
}
}

Related

Iterate over two strings checking to see if it matches with its pair

I am still new to Java, and I am currently working on a program that will take two strings as arguments and return the number of mismatched pairs. For my program I am working with ATGC because in science, A's always match up with T's and G's always match up with C's. I cant quite figure out how to iterate over the strings and see that the first character in string one (T for example) matches up with its intended pair (A), and if it doesn't it is a mismatched pair and it should be added to a counter to be totaled at the end. I believe I can use something called charAt(), but I am unsure of how that works.
I also need to figure out how to be able to take the absolute value of counter before it is added to the finalCounter. The main reason for this is because I just want to worry about getting the length difference between the two rather than making sure that the longer string is subracted from the smaller string.
Any help would be greatly appreciated!
public class CountMismatches {
public static void main(String[] args) {
{
String seq1 = "TTCGATGGAGCTGTA";
String seq2 = "TAGCTAGCTCGGCATGA";
System.out.println(count_mismatches(seq1, seq2))
//*expected to print out 5 because there are 3 mismatched pairs and 2 that do not have a pair*
}
}
public static int count_mismatches(String seq1, String seq2) {
int mismatchCount = 0;
int counter = seq1.length() - seq2.length();
int finalCounter = mismatchCount + counter;
for(int i = 0; i < seq1.length(); i++) if (seq1.charAt(i) == seq2.charAt(i)) {
break; //checks to see if the length of seq1 and seq2 are the same
}
for(int i = 0; i < seq1.length(); i++) if (seq1.charAt(i) != seq2.charAt(i)) {
return counter; //figure out how to do absolute value for negative numbers
}
return finalCounter;
}
}
Since you only want to count the positions where the pairing fails, iterate up to the length of the shorter string and count the positions where the two characters do not form a valid pair.
At the end, add the absolute difference between the lengths of seq1 and seq2 and return that value to the main function.
For the logic, all you need is four if conditions: check whether the character in seq1 is A, T, G or C and whether its partner is present at the same position in seq2.
public class CountMismatches {
public static void main(String[] args) {
{
String seq1 = "TTCGATGGAGCTGTA";
String seq2 = "TAGCTAGCTCGGCATGA";
System.out.println(count_mismatches(seq1, seq2));
}
}
public static int count_mismatches(String seq1, String seq2) {
int finalCounter = 0;
for (int i = 0; i < Math.min(seq1.length(), seq2.length()); i++) {
char c1 = seq1.charAt(i);
char c2 = seq2.charAt(i);
if (c1 == 'A') {
if (c2 == 'T')
continue;
else
finalCounter++;
} else if (c1 == 'T') {
if (c2 == 'A')
continue;
else
finalCounter++;
} else if (c1 == 'G') {
if (c2 == 'C')
continue;
else
finalCounter++;
} else if (c1 == 'C') {
if (c2 == 'G')
continue;
else
finalCounter++;
}
}
return finalCounter + (Math.abs(seq1.length() - seq2.length()));
}
}
and the output is as follows :
5
Make these refactorings:
To make the comparisons easy to code and understand, create a Map whose entries are the pairs (in both directions)
Iterate over the Strings up to the length of the shortest one, adding up the number of matching pairs as you go
The result is the length of the longest String minus the number of pairs
Like this:
public static int count_mismatches(String seq1, String seq2) {
Map<Character, Character> pairs = Map.of('A', 'T', 'T', 'A', 'G', 'C', 'C', 'G');
int count = 0;
for (int i = 0; i < Math.min(seq1.length(), seq2.length()); i++) {
if (pairs.get(seq1.charAt(i)) == seq2.charAt(i)) {
count++;
}
}
return Math.max(seq1.length(), seq2.length()) - count;
}
See live demo, which returns 5 for your sample input.
Good Evening,
Something seems off here, this snippet of code:
for(int i = 0; i < seq1.length(); i++)
if (seq1.charAt(i) == seq2.charAt(i)) {
break; //checks to see if the length of seq1 and seq2 are the same
}
Does not do what you think it does. This cycle will loop through all characters in sequence1 using i < seq1.length() and for each character that exists in seq1, it will check if said character is equal to the character with the same index in seq2.
This means that a correction is in order:
int countMismatches = 0;
for(int i = 0; i < seq1.length();i++){
switch(seq1.charAt(i)){
case 'A':
if(seq2.charAt(i) != 'T') countMismatches++;
break;
}
}
Repeat this process for the other letters and, voilà, you should be able to count your mismatches this way.
Do be careful with sequences of different lengths: as soon as i steps past the end of the shorter string, charAt throws an IndexOutOfBoundsException, indicating you've tried to read a character that does not exist.
First you must find out which string is the shortest in length. Also you need to get the length difference when calculating the shortest string. After that, use that length as a terminating condition in your for loop. You can use booleans to check whether the values are present before incrementing the counter with an if statement.
The absolute value of any number can be obtained by calling the static method abs() from the Math class. Last, just add the mismatchCounts to the absolute value of the length difference in order to obtain the result.
Here is my solution.
public class App {
public static void main(String[] args) throws Exception {
String seq1 = "TTCGATGGAGCTGTA";
String seq2 = "TAGCTAGCTCGGCATGA";
System.out.println(compareStrings(seq1, seq2));
}
public static int compareStrings(String stringOne, String stringTwo) {
Character A = 'A', T = 'T', G = 'G', C = 'C';
int mismatchCount = 0;
int lowestStringLenght = 0;
int length_one = stringOne.length();
int length_two = stringTwo.length();
int lenght_difference = 0;
if (length_one < length_two) {// string one is shorter
lowestStringLenght = length_one;
lenght_difference = length_one - length_two;
} else if (length_one > length_two) {// string two is shorter
lowestStringLenght = length_two;
lenght_difference = length_two - length_one;
} else { // lengths are equal, use either
lowestStringLenght = length_one;
lenght_difference = 0; // there is no difference because they are equal
}
for (int i = 0; i < lowestStringLenght; i++) {
// A matches with T
// G matches with C
// evaluate if the values A, T, G, C are present
boolean A_T_PRESENT = stringOne.charAt(i) == A && stringTwo.charAt(i) == T;
boolean G_C_PRESENT = stringOne.charAt(i) == G && stringTwo.charAt(i) == C;
boolean T_A_PRESENT = stringOne.charAt(i) == T && stringTwo.charAt(i) == A;
boolean C_G_PRESENT = stringOne.charAt(i) == C && stringTwo.charAt(i) == G;
boolean TWO_EQUAL = stringOne.charAt(i) == stringTwo.charAt(i);
// characters are equal, increase mismatch counter
if (TWO_EQUAL) {
mismatchCount++;
continue;
}
// all booleans evaluated to false, it means that the characters are not proper
// matches. Increment mismatchCount
else if (!A_T_PRESENT && !G_C_PRESENT && !T_A_PRESENT && !C_G_PRESENT) {
mismatchCount++;
continue;
} else {
continue;
}
}
// calculate the sum of the mismatches plus the abs of the lenght difference
lenght_difference = Math.abs(lenght_difference);
return mismatchCount + lenght_difference;
}
}
Avoid char
The char type is legacy, essentially broken. As a 16-bit value, char is physically incapable of representing most characters. The char type in your particular case would work. But using char is a bad habit generally, as such code may break when encountering any of about 75,000 characters defined in Unicode.
Code point
Use code point integer numbers instead. A code point is the number assigned to each of the over 140,000 characters defined by the Unicode Consortium.
Here we get an IntStream, a series of int values, one for each character in the input string. Then we collect these integer numbers into an array of int values.
int[] codePoints1 = seq1.codePoints().toArray() ;
int[] codePoints2 = seq2.codePoints().toArray() ;
You said the input strings may be of unequal length. So our two arrays may be jagged, of different lengths. Figure out the size of the shorter array.
int smallerSize = Math.min( codePoints1.length , codePoints2.length ) ;
Keep track of the index number of mismatched rows.
List<Integer> mismatchIndices = new ArrayList <>();
Loop the arrays based on that smaller size.
for( int i = 0 ; i < smallerSize ; i ++ )
{
if ( isBasePairValid( codePoints1[ i ] , codePoints2[ i ] ) )
{
…
} else
{
mismatchIndices.add( i ) ;
}
}
Write an isBasePairValid method
Write the isBasePairValid method, taking two arguments, the code points of the two nucleobase letters.
static int A = "A".codePointAt( 0 ) ; // Annoying zero-based index counting. So first character is number zero.
static int C = "C".codePointAt( 0 ) ;
static int G = "G".codePointAt( 0 ) ;
static int T = "T".codePointAt( 0 ) ;
static boolean isBasePairValid( int first , int second )
{
if( first == A ) return ( second == T ) ;
else if( first == T ) return ( second == A ) ;
else if( first == C ) return ( second == G ) ;
else if( first == G ) return ( second == C ) ;
else { throw new IllegalStateException( … ) ; }
}
Count the mismatches.
int countMismatches = mismatchIndices.size() ;
The numerical sum of chars T & A and G & C is fixed and unique for legal nucleobase pairs. So you just need to ensure that the corresponding bases have one of those sums.
String seq1 = "TTCGATGGAGCTGTA";
String seq2 = "TAGCTAGCTCGGCATGA";
System.out.println(count_mismatches(seq1, seq2));
prints
5
find the shorter length to iterate over
establish fixed sums for comparison
iterate and compare to expected pairing and update count appropriately
public static int count_mismatches(String seq1, String seq2) {
int len1 = seq1.length();
int len2 = seq2.length();
int len = len1;
if (len1 > len2) {
len = len2;
}
int sumTA = 'T'+'A';
int sumGC = 'G'+'C';
int misMatchCount = Math.abs(len1-len2);
for (int i = 0; i < len; i++) {
int pair = seq1.charAt(i) + seq2.charAt(i);
if (pair != sumTA && pair != sumGC) {
misMatchCount++;
}
}
return misMatchCount;
}

Finding the Number of Times an Expression Occurs in a String Continuously and Non Continuously

I had a coding interview over the phone and was asked this question:
Given a String (for example):
"aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc"
and an expression (for example):
"a+b+c-"
where:
+: means the char before it is repeated 2 times
-: means the char before it is repeated 4 times
Find the number of times the given expression appears in the string with the operands occurring non continuously and continuously.
The above expression occurs 4 times:
1) aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc
^^ ^^ ^^^^
aa bb cccc
2) aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc
^^ ^^ ^^^^
aa bb cccc
3) aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc
^^ ^^ ^^^^
aa bb cccc
4) aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc
^^ ^^ ^^^^
aa bb cccc
I had no idea how to do it. I started on an iterative brute-force method with lots of index marking, but realized halfway through how messy and hard that would be to code:
import java.util.*;
public class Main {
public static int count(String expression, String input) {
int count = 0;
ArrayList<char[]> list = new ArrayList<char[]>();
// Create an ArrayList of chars to iterate through the expression and match to string
for(int i = 1; i<expression.length(); i=i+2) {
StringBuilder exp = new StringBuilder();
char curr = expression.charAt(i-1);
if(expression.charAt(i) == '+') {
exp.append(curr).append(curr);
list.add(exp.toString().toCharArray());
}
else { // character is '-'
exp.append(curr).append(curr).append(curr).append(curr);
list.add(exp.toString().toCharArray());
}
}
char[] inputArray = input.toCharArray();
int i = 0; // outside pointer
int j = 0; // inside pointer
while(i <= inputArray.length) {
while(j <= inputArray.length) {
for(int k = 0; k< list.size(); k++) {
/* loop through
* all possible combinations in array list
* with multiple loops
*/
}
j++;
}
i++;
j=i;
}
return count;
}
public static void main(String[] args) {
String expression = "a+b+c-";
String input = "aaksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc";
System.out.println("The expression occurs: "+count(expression, input)+" times");
}
}
After spending a lot of time doing it iteratively he mentioned recursion and I still couldn't see a clear way doing it recursively and I wasn't able to solve the question. I am trying to solve it now post-interview and am still not sure how to go about this question. How should I go about solving this problem? Is the solution obvious? I thought this was a really hard question for a coding phone interview.
Non-recursive algorithm that requires O(m) space and runs in O(n*m) time, where n is the length of the input and m is the number of tokens in the query:
@Test
public void subequences() {
String input = "aabbccaacccccbbd";
String query = "a+b+";
// here to store tokens of a query: e.g. {a, +}, {b, +}
char[][] q = new char[query.length() / 2][];
// here to store counts of subsequences ending by j-th token found so far
int[] c = new int[query.length() / 2]; // main
int[] cc = new int[query.length() / 2]; // aux
// tokenize
for (int i = 0; i < query.length(); i += 2)
q[i / 2] = new char[] {query.charAt(i), query.charAt(i + 1)};
// init
char[] sub2 = {0, 0}; // accumulator capturing last 2 chars
char[] sub4 = {0, 0, 0, 0}; // accumulator capturing last 4 chars
// main loop
for (int i = 0; i < input.length(); i++) {
shift(sub2, input.charAt(i));
shift(sub4, input.charAt(i));
boolean all2 = sub2[1] != 0 && sub2[0] == sub2[1]; // true if all sub2 chars are same
boolean all4 = sub4[3] != 0 && sub4[0] == sub4[1] // true if all sub4 chars are same
&& sub4[0] == sub4[2] && sub4[0] == sub4[3];
// iterate tokens
for (int j = 0; j < c.length; j++) {
if (all2 && q[j][1] == '+' && q[j][0] == sub2[0]) // found match for "+" token
cc[j] = j == 0 // filling up aux array
? c[j] + 1 // first token, increment counter by 1
: c[j] + c[j - 1]; // add value of preceding token counter
if (all4 && q[j][1] == '-' && q[j][0] == sub4[0]) // found match for "-" token
cc[j] = j == 0
? c[j] + 1
: c[j] + c[j - 1];
}
if (all2) sub2[1] = 0; // clear, to make "aa" occur in "aaaa" 2, not 3 times
if (all4) sub4[3] = 0;
copy(cc, c); // copy aux array to main
}
System.out.println(c[c.length - 1]);
}
// shifts array 1 char left and puts c at the end
void shift(char[] cc, char c) {
for (int i = 1; i < cc.length; i++)
cc[i - 1] = cc[i];
cc[cc.length - 1] = c;
}
// copies array contents
void copy(int[] from, int[] to) {
for (int i = 0; i < from.length; i++)
to[i] = from[i];
}
The main idea is to read chars from the input one by one, holding them in 2- and 4-char accumulators, and to check whether either accumulator matches a token of the query, remembering how many matches we have found so far for the sub-queries ending with each token.
The query (a+b+c-) is split into tokens (a+, b+, c-). Then we collect chars in the accumulators and check whether they match some token. If we find a match for the first token, we increment its counter by 1. If we find a match for the j-th token (j > 0), it can be appended to every existing subsequence matching the subquery of tokens [0...j-1], so the counter for tokens [0...j] grows by the counter for tokens [0...j-1].
For example, we have:
a+ : 3 (3 matches for a+)
b+ : 2 (2 matches for a+b+)
c- : 1 (1 match for a+b+c-)
when cccc arrives. Then c- counter should be increased by b+ counter value, because so far we have 2 a+b+ subsequences and cccc can be appended to both of them.
Let's call the length of the string n, and the length of the query expression (in terms of the number of "units", like a+ or b-) m.
It's not clear exactly what you mean by "continuously" and "non-continuously", but if "continuously" means that there can't be any gaps between query string units, then you can just use the KMP algorithm to find all instances in O(m+n) time.
We can solve the "non-continuous" version in O(nm) time and space with dynamic programming. Basically, what we want to compute is a function:
f(i, j) = the number of occurrences of the subquery consisting of the first i units
of the query expression, in the first j characters of the string.
So with your example, f(2, 41) = 3, since there are 3 occurrences of the subpattern a+b+ in the first 41 characters of your example string (the first aa with either bb, and the second aa with the second bb).
The final answer will then be f(m, n).
We can compute this recursively as follows:
f(0, j) = 1 (the empty subquery occurs exactly once, otherwise every count would be 0)
f(i > 0, 0) = 0
f(i > 0, j > 0) = f(i, j-1) + isMatch(i, j) * f(i-1, j-len(i))
where len(i) is the length of the ith unit in the expression (always 2 or 4) and isMatch(i, j) is a function that returns 1 if the ith unit in the expression matches the text ending at position j, and 0 otherwise (including when j < len(i)). For example, isMatch(2, 16) = 1 in your example, because the second unit is b+ and characters 15 and 16 of the string (s[14..15] with 0-based indexing) are bb. This function takes just constant time to run, because it never needs to check more than 4 characters.
The above recursion will already work as-is, but we can save time by making sure that we only solve each subproblem once. Because the function f() depends only on its 2 parameters i and j, which range between 0 and m, and between 0 and n, respectively, we can just compute all n*m possible answers and store them in a table.
[EDIT: As Sasha Salauyou points out, the space requirement can in fact be reduced to O(m). Because len(i) is at most 4, we never need to access values of f(i, k) with k < j-4, so instead of storing all n+1 columns of the table we can keep just the last 5 and index them by j % 5.]
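As a rough illustration, the recurrence could be written in Java as below. This is a sketch, not the answerer's code: the names ExpressionCounter, parseUnits and count are invented here, the expression is assumed to be a well-formed sequence of character/operator pairs, and the base case f(0, j) = 1 is used so that the counts are non-zero. It stores the full table for readability rather than applying the space reduction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical implementation of the f(i, j) table described above.
class ExpressionCounter {
    // Expands "a+b+c-" into ["aa", "bb", "cccc"]. Assumes strict char/operator pairs.
    static List<String> parseUnits(String expression) {
        List<String> units = new ArrayList<>();
        for (int i = 0; i + 1 < expression.length(); i += 2) {
            int repeat = expression.charAt(i + 1) == '+' ? 2 : 4;
            StringBuilder unit = new StringBuilder();
            for (int r = 0; r < repeat; r++) {
                unit.append(expression.charAt(i));
            }
            units.add(unit.toString());
        }
        return units;
    }

    // f[i][j] = occurrences of the first i units within the first j characters of text.
    static long count(String expression, String text) {
        List<String> units = parseUnits(expression);
        int m = units.size();
        int n = text.length();
        long[][] f = new long[m + 1][n + 1];
        for (int j = 0; j <= n; j++) {
            f[0][j] = 1; // the empty subquery occurs exactly once
        }
        for (int i = 1; i <= m; i++) {
            String unit = units.get(i - 1);
            int len = unit.length();
            for (int j = 1; j <= n; j++) {
                f[i][j] = f[i][j - 1]; // occurrences that do not end at position j
                if (j >= len && text.startsWith(unit, j - len)) {
                    f[i][j] += f[i - 1][j - len]; // unit i ends exactly at position j
                }
            }
        }
        return f[m][n];
    }

    public static void main(String[] args) {
        String text = "aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc";
        System.out.println(count("a+b+c-", text)); // prints 4
    }
}
```

Note that this sketch counts overlapping matches of a single unit separately (aa occurs three times in aaaa), which is a design choice the question does not settle.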
I wanted to try it for myself and figured I could share my solution as well. The parse method obviously has issues when there is a char 0 in the expression (although that would probably be the bigger issue itself), and the find method will fail for an empty needles array. I wasn't sure whether ab+c- should be considered a valid pattern (I treat it as such). Note that this covers only the non-continuous part so far.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class Matcher {
public static void main(String[] args) {
String haystack = "aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc";
String[] needles = parse("a+b+c-");
System.out.println("Needles: " + Arrays.toString(needles));
System.out.println("Found: " + find(haystack, needles, 0));
needles = parse("ab+c-");
System.out.println("Needles: " + Arrays.toString(needles));
System.out.println("Found: " + find(haystack, needles, 0));
}
private static int find(String haystack, String[] needles, int i) {
String currentNeedle = needles[i];
int pos = haystack.indexOf(currentNeedle);
if (pos < 0) {
// Abort: Current needle not found
return 0;
}
// Current needle found (also means that pos + currentNeedle.length() will always
// be <= haystack.length()
String remainingHaystack = haystack.substring(pos + currentNeedle.length());
// Last needle?
if (i == needles.length - 1) {
// +1: We found one match for all needles
// Try to find more matches of current needle in remaining haystack
return 1 + find(remainingHaystack, needles, i);
}
// Try to find more matches of current needle in remaining haystack
// Try to find next needle in remaining haystack
return find(remainingHaystack, needles, i) + find(remainingHaystack, needles, i + 1);
}
private static String[] parse(String expression) {
List<String> searchTokens = new ArrayList<String>();
char lastChar = 0;
for (int i = 0; i < expression.length(); i++) {
char c = expression.charAt(i);
char[] chars;
switch (c) {
case '+':
// last char is repeated 2 times
chars = new char[2];
Arrays.fill(chars, lastChar);
searchTokens.add(String.valueOf(chars));
lastChar = 0;
break;
case '-':
// last char is repeated 4 times
chars = new char[4];
Arrays.fill(chars, lastChar);
searchTokens.add(String.valueOf(chars));
lastChar = 0;
break;
default:
if (lastChar != 0) {
searchTokens.add(String.valueOf(lastChar));
}
lastChar = c;
}
}
return searchTokens.toArray(new String[searchTokens.size()]);
}
}
Output:
Needles: [aa, bb, cccc]
Found: 4
Needles: [a, bb, cccc]
Found: 18
How about preprocessing aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc?
This becomes a1k1s1d1b1a2l1a1s1k1d1h1f1b2l1a1j1d1f1h1a1c4a1o1u1d1g1a1l1s1a2b2l1i1s1d1f1h1c4
Now find occurrences of a2, b2, c4.
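The preprocessing step is a run-length encoding. A minimal sketch of it follows; the class name RunLength and the method encode are illustrative, and the sketch assumes the input contains no digits (otherwise the encoded form would be ambiguous). Unlike the hand-written string above, it also emits an entry for each space.

```java
// Hypothetical run-length encoder for the preprocessing idea above.
class RunLength {
    // Encodes each maximal run of equal characters as the character followed by its length.
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int j = i;
            while (j < s.length() && s.charAt(j) == s.charAt(i)) {
                j++; // extend the run while the character repeats
            }
            out.append(s.charAt(i)).append(j - i);
            i = j; // continue with the next run
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("aksdbaal")); // prints a1k1s1d1b1a2l1
    }
}
```

Whether a longer run such as a3 should count as a match for a+ is left open by the question, so any search over the encoded form has to decide that explicitly.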
I tried it with the code below, but right now it reports only the first possible match, found depth-first.
It needs to be changed to try all possible combinations instead of just the first.
import java.util.ArrayList;
import java.util.List;
public class Parsing {
public static void main(String[] args) {
String input = "aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc";
System.out.println(input);
for (int i = 0; i < input.length(); i++) {
System.out.print(i/10);
}
System.out.println();
for (int i = 0; i < input.length(); i++) {
System.out.print(i%10);
}
System.out.println();
List<String> tokenisedSearch = parseExp("a+b+c-");
System.out.println(tokenisedSearch);
parse(input, 0, tokenisedSearch, 0);
}
public static boolean parse(String input, int searchFromIndex, List<String> tokensToSeach, int currentTokenIndex) {
if(currentTokenIndex >= tokensToSeach.size())
return true;
String token = tokensToSeach.get(currentTokenIndex);
int found = input.indexOf(token, searchFromIndex);
if(found >= 0) {
System.out.println("Found at Index "+found+ " Token " +token);
return parse(input, found + token.length(), tokensToSeach, currentTokenIndex+1); // continue after this match
}
return false;
}
public static List<String> parseExp(String exp) {
List<String> list = new ArrayList<String>();
String runningToken = "";
for (int i = 0; i < exp.length(); i++) {
char at = exp.charAt(i);
switch (at) {
case '+' :
runningToken += runningToken;
list.add(runningToken);
runningToken = "";
break;
case '-' :
runningToken += runningToken;
runningToken += runningToken;
list.add(runningToken);
runningToken = "";
break;
default :
runningToken += at;
}
}
return list;
}
}
Recursion may be the following (pseudocode):
int search(String s, String expression) {
if expression consists of only one token t /* e. g. "a+" */ {
search for t in s
return number of occurrences
} else {
int result = 0
divide expression into first token t and rest expression
// e. g. "a+a+b-" -> t = "a+", rest = "a+b-"
search for t in s
for each occurrence {
s1 = substring of s starting right after the occurrence, up to the end
result += search(s1, rest) // search for rest of expression in rest of string
}
return result
}
}
Applying this to entire string, you'll get number of non-continuous occurrences. To get continuous occurrences, you don't need recursion at all--just transform expression into string and search by iteration.
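A possible Java rendering of the pseudocode is sketched below. The names RecursiveExpressionSearch, expandFirstToken and search are made up for this example, and the expression is assumed to consist only of character/operator pairs. The base case is an empty expression (one way to match nothing), which is equivalent to counting occurrences when a single token is left.

```java
// Hypothetical recursive search following the pseudocode above.
class RecursiveExpressionSearch {
    // Turns the first token of the expression, e.g. "a+" or "c-", into "aa" or "cccc".
    static String expandFirstToken(String expression) {
        int repeat = expression.charAt(1) == '+' ? 2 : 4;
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < repeat; r++) {
            sb.append(expression.charAt(0));
        }
        return sb.toString();
    }

    static int search(String s, String expression) {
        if (expression.isEmpty()) {
            return 1; // nothing left to match: exactly one way
        }
        String token = expandFirstToken(expression);
        String rest = expression.substring(2);
        int result = 0;
        // Visit every occurrence of the token, including overlapping ones.
        for (int pos = s.indexOf(token); pos >= 0; pos = s.indexOf(token, pos + 1)) {
            // Search for the rest of the expression after this occurrence.
            result += search(s.substring(pos + token.length()), rest);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc";
        System.out.println(search(input, "a+b+c-")); // prints 4
    }
}
```

Without memoization the running time can grow exponentially with the number of tokens, so this is mainly useful for short expressions.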
If you convert the search string first with a simple parser/compiler so a+ becomes aa etc. then you can simply take this string and run a regular expression match against your hay stack. (Sorry, I'm no Java coder so can't deliver any real code but it is not really difficult)
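For the contiguous case, that idea might look like the following in Java. It is a sketch with invented names (ContiguousCounter, expand, countContiguous) that assumes a well-formed expression, and it counts non-overlapping matches only, because that is how Matcher.find advances.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical counter for contiguous occurrences of the expanded expression.
class ContiguousCounter {
    // Expands "a+b+" into the literal string "aabb".
    static String expand(String expression) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < expression.length(); i += 2) {
            int repeat = expression.charAt(i + 1) == '+' ? 2 : 4;
            for (int r = 0; r < repeat; r++) {
                sb.append(expression.charAt(i));
            }
        }
        return sb.toString();
    }

    static int countContiguous(String expression, String text) {
        // Pattern.quote treats the expanded string as a literal, not as regex syntax.
        Matcher matcher = Pattern.compile(Pattern.quote(expand(expression))).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countContiguous("a+b+", "xaabbyaabb")); // prints 2
    }
}
```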

Determine if a given string is a k-palindrome

I'm trying to solve the following interview practice question:
A k-palindrome is a string which transforms into a palindrome on removing at most
k characters.
Given a string S, and an integer K, print "YES" if S is a k-palindrome;
otherwise print "NO".
Constraints:
S has at most 20,000 characters.
0 <= k <= 30
Sample Test Cases:
Input - abxa 1
Output - YES
Input - abdxa 1
Output - NO
My approach is to take all possible subsequences of length s.length - k or greater (e.g. "abc" with k = 1 gives "ab", "bc", "ac", "abc") and check whether any of them is a palindrome. I have the following code so far, but can't figure out a proper way to generate all these subsequences in the general case:
public static void isKPalindrome(String s, int k) {
// Generate all string combinations and call isPalindrome on them,
// printing "YES" at first true
}
private static boolean isPalindrome(String s) {
char[] c = s.toCharArray();
int slow = 0;
int fast = 0;
Stack<Character> stack = new Stack<>();
while (fast < c.length) {
stack.push(c[slow]);
slow += 1;
fast += 2;
}
if (c.length % 2 == 1) {
stack.pop();
}
while (!stack.isEmpty()) {
if (stack.pop() != c[slow++]) {
return false;
}
}
return true;
}
Can anyone figure out a way to implement this, or perhaps demonstrate a better way?
I think there is a better way
package se.wederbrand.stackoverflow;
public class KPalindrome {
public static void main(String[] args) {
KPalindrome kPalindrome = new KPalindrome();
String s = args[0];
int k = Integer.parseInt(args[1]);
if (kPalindrome.testIt(s, k)) {
System.out.println("YES");
}
else {
System.out.println("NO");
}
}
boolean testIt(String s, int k) {
if (s.length() <= 1) {
return true;
}
while (s.charAt(0) == s.charAt(s.length()-1)) {
s = s.substring(1, s.length()-1);
if (s.length() <= 1) {
return true;
}
}
if (k == 0) {
return false;
}
// Try to remove the first or last character
return testIt(s.substring(0, s.length() - 1), k - 1) || testIt(s.substring(1, s.length()), k - 1);
}
}
Since K is at most 30, it's likely that a string can be rejected quickly, without even examining its middle.
I've tested this with the two provided test cases as well as with a 20k-character string consisting of "ab" repeated 10k times and k = 30.
All tests are fast and return the correct results.
This can be solved using Edit distance dynamic programming algorithm. Edit distance DP algorithm is used to find the minimum operations required to convert a source string to destination string. The operations can be either addition or deletion of characters.
The K-palindrome problem can be solved using Edit distance algorithm by checking the minimum operation required to convert the input string to its reverse.
Let editDistance(source,destination) be the function which takes source string and destination string and returns the minimum operations required to convert the source string to destination string.
A string S is K-palindrome if editDistance(S,reverse(S))<=2*K
This is because we can transform the given string S into its reverse by deleting at most K letters and then inserting the same K letters at different positions.
This will be more clear with an example.
Let S=madtam and K=1.
To convert S into reverse of S (i.e matdam) first we have to remove the character 't' at index 3 ( 0 based index) in S.
Now the intermediate string is madam. Then we have to insert the character 't' at index 2 in the intermediate string to get "matdam" which is the reverse of string s.
If you look carefully you will know that the intermediate string "madam" is the palindrome that is obtained by removing k=1 characters.
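Here is one way the check could be coded in Java, offered as a sketch rather than the answerer's implementation. The names KPalindromeEditDistance, deletionInsertionDistance and isKPalindrome are assumptions. Only insertions and deletions are allowed (no substitutions), and two rolling rows are kept instead of the full table, since a full table for 20,000 characters would be very large.

```java
// Hypothetical k-palindrome check via insert/delete edit distance to the reversed string.
class KPalindromeEditDistance {
    // Minimum number of single-character insertions and deletions turning a into b.
    static int deletionInsertionDistance(String a, String b) {
        int n = a.length();
        int m = b.length();
        int[] prev = new int[m + 1];
        int[] cur = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // empty a: insert all j characters of b
        }
        for (int i = 1; i <= n; i++) {
            cur[0] = i; // empty b: delete all i characters of a
            for (int j = 1; j <= m; j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    cur[j] = prev[j - 1]; // characters match, no operation needed
                } else {
                    cur[j] = 1 + Math.min(prev[j], cur[j - 1]); // delete or insert
                }
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[m];
    }

    static boolean isKPalindrome(String s, int k) {
        String reversed = new StringBuilder(s).reverse().toString();
        return deletionInsertionDistance(s, reversed) <= 2 * k;
    }

    public static void main(String[] args) {
        System.out.println(isKPalindrome("madtam", 1) ? "YES" : "NO"); // prints YES
    }
}
```

This runs in O(n^2) time, so the bound k <= 30 is not exploited here.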
I computed the length of the longest palindromic subsequence, i.e. the longest palindrome that remains after removing characters, using dynamic programming. The palindrome need not be contiguous; for example, abscba has a longest palindromic subsequence of length 5 (absba).
This can then be used as follows: the string is a k-palindrome whenever k >= (len - length of the longest palindromic subsequence), and not otherwise.
public static int longestPalindrome(String s){
int len = s.length();
int[][] cal = new int[len][len];
for(int i=0;i<len;i++){
cal[i][i] = 1; //considering strings of length = 1
}
for(int i=0;i<len-1;i++){
//considering strings of length = 2
if (s.charAt(i) == s.charAt(i+1)){
cal[i][i+1] = 2;
}else{
cal[i][i+1] = 1; // two different chars: either one alone is a palindrome
}
}
for(int p = len-1; p>=0; p--){
for(int q=p+2; q<len; q++){
if (s.charAt(p)==s.charAt(q)){
cal[p][q] = 2 + cal[p+1][q-1];
}else{
cal[p][q] = Math.max(cal[p+1][q], cal[p][q-1]);
}
}
}
return cal[0][len-1];
}
This is a common interview question, and I'm a little surprised that no one has mentioned dynamic programming yet. This problem exhibits optimal substructure (if a string is a k-palindrome, some substrings are also k-palindromes), and overlapping subproblems (the solution requires comparing the same substrings more than once).
This is a special case of the edit distance problem, where we check if a string s can be converted to string p by only deleting characters from either or both strings.
Let the string be s and its reverse rev. Let dp[i][j] be the number of deletions required to convert the first i characters of s to the first j characters of rev. Since deletions have to be done in both strings, if dp[n][n] <= 2 * k, then the string is a k-palindrome.
Base case: When one of the strings is empty, all characters from the other string need to be deleted in order to make them equal.
Time complexity: O(n^2).
Scala code:
def kPalindrome(s: String, k: Int): Boolean = {
val rev = s.reverse
val n = s.length
val dp = Array.ofDim[Int](n + 1, n + 1)
for (i <- 0 to n; j <- 0 to n) {
dp(i)(j) = if (i == 0 || j == 0) i + j
else if (s(i - 1) == rev(j - 1)) dp(i - 1)(j - 1)
else 1 + math.min(dp(i - 1)(j), dp(i)(j - 1))
}
dp(n)(n) <= 2 * k
}
Since we are doing bottom-up DP, an optimization is to return false as soon as i == j && dp[i][j] > 2 * k: the diagonal values dp[i][i] never decrease as i grows, so dp[n][n] would exceed 2 * k as well.
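For anyone who wants to see the early exit in Java, here is a rough sketch of the same table with the diagonal check added. It is not a translation the answerer provided; the class name is made up, and the table is filled row by row so that dp[i][i] is available as soon as cell (i, i) is computed.

```java
// Sketch only: deletion-distance DP between s and its reverse,
// bailing out as soon as a diagonal cell exceeds 2 * k.
class KPalindromeEarlyExit {

    static boolean isKPalindrome(String s, int k) {
        String rev = new StringBuilder(s).reverse().toString();
        int n = s.length();
        int[][] dp = new int[n + 1][n + 1];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= n; j++) {
                if (i == 0 || j == 0) {
                    dp[i][j] = i + j;
                } else if (s.charAt(i - 1) == rev.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];
                } else {
                    dp[i][j] = 1 + Math.min(dp[i - 1][j], dp[i][j - 1]);
                }
                // Diagonal check: once this fails, dp[n][n] cannot recover.
                if (i == j && dp[i][j] > 2 * k) {
                    return false;
                }
            }
        }
        // The check above also ran for i == j == n, so the answer is true here.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isKPalindrome("madtam", 1)); // prints true
        System.out.println(isKPalindrome("abcdef", 4)); // prints false
    }
}
```

The exit only saves work when the answer is false; the worst case is still O(n^2) time and space.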
Thanks to Andreas, that algorithm worked like a charm. Here is my implementation for anyone who's curious. It is slightly different, but fundamentally the same logic:
public static boolean kPalindrome(String s, int k) {
if (s.length() <= 1) {
return true;
}
char[] c = s.toCharArray();
if (c[0] != c[c.length - 1]) {
if (k <= 0) {
return false;
} else {
char[] minusFirst = new char[c.length - 1];
System.arraycopy(c, 1, minusFirst, 0, c.length - 1);
char[] minusLast = new char[c.length - 1];
System.arraycopy(c, 0, minusLast, 0, c.length - 1);
return kPalindrome(String.valueOf(minusFirst), k - 1)
|| kPalindrome(String.valueOf(minusLast), k - 1);
}
} else {
char[] minusFirstLast = new char[c.length - 2];
System.arraycopy(c, 1, minusFirstLast, 0, c.length - 2);
return kPalindrome(String.valueOf(minusFirstLast), k);
}
}
This problem can be solved using the well-known Longest Common Subsequence (LCS) method. When LCS is applied to the string and its reverse, it gives the length of the longest palindromic subsequence present in the string.
Let the longest palindromic subsequence length of a given string of length string_length be palin_length. Then (string_length - palin_length) gives the number of characters required to be deleted to convert the string to a palindrome. Thus, the given string is k-palindrome if (string_length - palin_length) <= k.
Here is an example:
Initial String: madtam (string_length = 6)
Longest Palindromic Subsequence: madam (palin_length = 5)
Number of non-contributing characters: 1 ( string_length - palin_length)
Thus this string is k-palindromic for any k >= 1, because you are allowed to delete at most k characters (k or fewer).
Here is the code snippet:
#include<iostream>
#include<cstdio>
#include<cstring> // strlen, strcpy, memset
#include<algorithm>
using namespace std;
#define MAX 10000
// Note: with MAX = 10000 and 4-byte ints this table takes roughly 400 MB.
int table[MAX+1][MAX+1];
int longest_common_subsequence(char *first_string, char *second_string){
int first_string_length = strlen(first_string), second_string_length = strlen(second_string);
int i, j;
memset( table, 0, sizeof(table));
for( i=1; i<=first_string_length; i++ ){
for( j=1; j<=second_string_length; j++){
if( first_string[i-1] == second_string[j-1] )
table[i][j] = table[i-1][j-1] + 1;
else
table[i][j] = max(table[i-1][j], table[i][j-1]);
}
}
return table[first_string_length][second_string_length];
}
char first_string[MAX], second_string[MAX];
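int main(){
int k;
// Read the string and k (k was used below without ever being declared or read).
// %9999s caps the input so it cannot overflow first_string.
if( scanf("%9999s %d", first_string, &k) != 2 )
return 1;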
strcpy(second_string, first_string);
reverse(second_string, second_string+strlen(second_string));
int max_palindromic_length = longest_common_subsequence(first_string, second_string);
int non_contributing_chars = strlen(first_string) - max_palindromic_length;
if( k >= non_contributing_chars)
printf("K palindromic!\n");
else
printf("Not K palindromic!\n");
return 0;
}
I designed a solution purely based on recursion -
public static boolean isKPalindrome(String str, int k) {
if(str.length() < 2) {
return true;
}
if(str.charAt(0) == str.charAt(str.length()-1)) {
return isKPalindrome(str.substring(1, str.length()-1), k);
} else{
if(k == 0) {
return false;
} else {
if(isKPalindrome(str.substring(0, str.length() - 1), k-1)) {
return true;
} else{
return isKPalindrome(str.substring(1, str.length()), k-1);
}
}
}
}
Unlike the accepted answer, the above implementation does not use a while loop.
Hope it helps somebody looking for it.
public static boolean failK(String s, int l, int r, int k) {
if (k < 0)
return false;
if (l > r)
return true;
if (s.charAt(l) != s.charAt(r)) {
return failK(s, l + 1, r, k - 1) || failK(s, l, r - 1, k - 1);
} else {
return failK(s, l + 1, r - 1, k);
}
}
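The plain recursive versions above can end up solving the same (l, r) range many times, since each mismatch branches twice. A possible way around that is to memoize the minimum number of deletions per range and compare it with k at the end. This is only a sketch; the class name and the memo layout are my own choices, not something from the answers above.

```java
import java.util.Arrays;

// Sketch only: memoized two-pointer recursion.
// memo[l][r] caches the minimum deletions needed to make s[l..r] a palindrome.
class KPalindromeMemo {

    static int minDeletions(String s, int l, int r, int[][] memo) {
        if (l >= r) {
            return 0; // empty range or single character
        }
        if (memo[l][r] != -1) {
            return memo[l][r];
        }
        int result;
        if (s.charAt(l) == s.charAt(r)) {
            result = minDeletions(s, l + 1, r - 1, memo);
        } else {
            // Drop either the left or the right character, whichever is cheaper.
            result = 1 + Math.min(minDeletions(s, l + 1, r, memo),
                                  minDeletions(s, l, r - 1, memo));
        }
        memo[l][r] = result;
        return result;
    }

    static boolean isKPalindrome(String s, int k) {
        int n = s.length();
        int[][] memo = new int[n][n];
        for (int[] row : memo) {
            Arrays.fill(row, -1); // -1 marks "not computed yet"
        }
        return minDeletions(s, 0, n - 1, memo) <= k;
    }

    public static void main(String[] args) {
        System.out.println(isKPalindrome("madtam", 1)); // prints true
        System.out.println(isKPalindrome("madtam", 0)); // prints false
    }
}
```

This trades O(n^2) memory for O(n^2) time, at the cost of recursion depth proportional to the string length.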

How to simplify this array method?

Here is my code to an array method:
private int _a;
public static void main(String[] args) {}
public int[] countAll(String s) {
int[] xArray = new int[27];
int[] yArray = new int[27];
_a = (int)'a';
for (int i = 0; i < xArray.length; i++) {
xArray[i] = _a;
_a = _a++;
}
for (int j = 0; j < s.length(); j++) {
s = s.toLowerCase();
char c = s.charAt(j);
int g = (int) c;
int letterindex = g - yArray[0];
if (letterindex >= 0 && letterindex <= 25) {
xArray[letterindex]++;
} else if (letterindex < 0 || letterindex > 25) {
xArray[26]++;
}
}
return xArray;
}
This code works in java but I was told that there is a simpler way. I am having a lot of trouble figuring out a simplified version of my code. Please help me.
If all you want to do is count the upper and lower case letters, that's a very roundabout way of doing it. What's wrong with something like:
public static int countUpper(String str)
{
int upper = 0;
for(char c : str.toCharArray())
{
if(Character.isUpperCase(c))
{
upper++;
}
}
return upper;
}
Then just the same thing with Character.isLowerCase(c) for the opposite.
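Spelled out, the lower-case counterpart could look like the sketch below. It also shows a combined variant that returns both counts in one pass; the class name, the countCases method, and the {upper, lower} array layout are illustrative choices, not part of the answer above.

```java
// Sketch only: counting lower-case characters, plus a one-pass variant
// that returns {upperCount, lowerCount}.
class CaseCounter {

    static int countLower(String str) {
        int lower = 0;
        for (char c : str.toCharArray()) {
            if (Character.isLowerCase(c)) {
                lower++;
            }
        }
        return lower;
    }

    // Returns a two-element array: index 0 = upper case, index 1 = lower case.
    static int[] countCases(String str) {
        int[] counts = new int[2];
        for (char c : str.toCharArray()) {
            if (Character.isUpperCase(c)) {
                counts[0]++;
            } else if (Character.isLowerCase(c)) {
                counts[1]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countCases("Hello World");
        System.out.println(counts[0] + " upper, " + counts[1] + " lower"); // prints "2 upper, 8 lower"
    }
}
```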
public static int[] countAll(String s) {
int[] xArray = new int[27];
for (char c : s.toLowerCase().toCharArray()){
if (c >= 'a' && c <= 'z') // Character.isLetter(c) would also accept non-ASCII letters and index past the array
xArray[c -'a']++;
else
xArray[26]++;
}
return xArray;
}
It looks like your program is trying to find the frequencies of the different letters in a string, counting the non-letters at the special index 26. In that case your code to initialize the counts is wrong. The array is pre-initialized with non-zero values in the following for loop:
for (int i = 0; i < xArray.length; i++) {
xArray[i] = _a;
_a = _a++;
}
I think the method can be simply something like:
s = s.toLowerCase();
int histogram[] = new int[27];
for (char c: s.toCharArray()) {
int index = c - 'a';
if (index < 0 || index > 25) {
index = 26;
}
histogram[index]++;
}
Here are two important improvements that should be made to your code:
Add a method javadoc for countAll, so that readers don't have to trawl through 20+ lines of turgid code to reverse engineer what the method is supposed to do.
Get rid of the _a abomination. According to the most widely accepted Java coding standard, the underscore character has no place in a variable name. Besides, a is about the most useless field name I've ever come across. If it is intended to convey some meaning to the reader ... you have totally lost me.
(Oh I get it. It shouldn't be a field at all. Bzzzt!!!)
Then there is the yArray array. As far as I can tell the only place it is used is here:
int letterindex = g - yArray[0];
which is actually the same as:
int letterindex = g;
since yArray[0] is never assigned to. In short yArray is completely redundant.
And this:
if (letterindex >= 0 && letterindex <= 25) {
xArray[letterindex]++;
} else if (letterindex < 0 || letterindex > 25) {
xArray[26]++;
}
The condition in the else part is redundant. Your code will be easier to read if you just write this:
if (letterindex >= 0 && letterindex <= 25) {
xArray[letterindex]++;
} else {
xArray[26]++;
}
The two are equivalent. Do you see why?
Finally the initialization of the xArray elements looks plain wrong to me. If xArray contains counts, the elements need to start at zero. (Didn't you wonder why your code was telling you that every string contained lots of "zees"?)
"This code works in java ..."
I don't think so. Maybe it compiles. Maybe it runs without crashing. But it doesn't give correct answers!
public static int[] countAll(String s) {
int[] count = new int[26];
for (char c : s.toLowerCase().toCharArray()) {
if ('a' <= c && c <= 'z') {
count[c - 'a']++;
}
}
return count;
}
First: your arrays were too big.
Second: why do you need two arrays at all?
Third: your code didn't seem to work. The word "hello" returned an array with the number 97 (26 times) and the number 102.
Edit: Made it shorter.

java String - String index out of range , charAt

I am trying to write a program that finds a palindromic number. It has to be the product of two 3-digit numbers, and I expect it to have 6 digits, but that is not important. Here is my code:
public class palindromicNumber {
public static void getPalindromicNumber() {
boolean podminka = false;
int test;
String s;
for (int a = 999; podminka == false && a > 100; a--) {
for (int b = 999; podminka == false && b > 100; b--) {
test = a * b;
s = Integer.toString(test);
int c = 0;
int d = s.length();
while (c != d && podminka == false) {
if (s.charAt(c) == s.charAt(d)) { // I think that problem is here but I can't see what
System.out.println(s);
podminka = true;
}
c++;
d--;
}
}
}
}
}
and when I try to run it, I get:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 6
at java.lang.String.charAt(String.java:695)
at faktorizace.palindromicNumber.getPalindromicNumber(palindromicNumber.java:24)
at faktorizace.Faktorizace.main(Faktorizace.java:19)
Java Result: 1
There are two problems here:
You're starting off with the wrong upper bound, as other answers have mentioned.
If the gap between c and d starts off odd (for example c = 0 and d = 5), then c will never equal d: they step past each other. You need to use
while (c < d && !podminka) // Prefer !x to x == false
Additionally, judicious use of break and return would avoid you having to have podminka at all.
As another aside, you've got a separation of concerns issue. Your method currently does three things:
Iterates over numbers in a particular way
Checks whether or not they're palindromic
Prints the first it finds
You should separate those out. For example:
public void printFirstPalindrome() {
long palindrome = findFirstPalindrome();
System.out.println(palindrome);
}
public long findFirstPalindrome() {
// Looping here, calling isPalindrome
}
public boolean isPalindrome(long value) {
// Just checking here
}
I suspect findFirstPalindrome would normally take some parameters, too. At this point, you'd have methods which would be somewhat easier to both write and test.
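As a rough illustration of that split, the three methods could be filled in along these lines. The low and high parameters, the class name, and the -1 "not found" value are assumptions made for the sketch. Note that it returns the first palindrome met in the question's iteration order, which is not necessarily the largest one.

```java
// Sketch only: one possible way to fill in the three methods.
class PalindromeProducts {

    // Just checking here: compare characters from both ends inwards.
    static boolean isPalindrome(long value) {
        String s = Long.toString(value);
        int left = 0;
        int right = s.length() - 1; // last valid index, not s.length()
        while (left < right) {
            if (s.charAt(left) != s.charAt(right)) {
                return false;
            }
            left++;
            right--;
        }
        return true;
    }

    // Looping here: first palindromic product a * b with a and b counting
    // down from high to low (inclusive), or -1 if there is none.
    static long findFirstPalindrome(int low, int high) {
        for (int a = high; a >= low; a--) {
            for (int b = high; b >= low; b--) {
                long product = (long) a * b;
                if (isPalindrome(product)) {
                    return product;
                }
            }
        }
        return -1;
    }

    static void printFirstPalindrome(int low, int high) {
        System.out.println(findFirstPalindrome(low, high));
    }

    public static void main(String[] args) {
        printFirstPalindrome(10, 99); // prints 9009 (99 * 91)
    }
}
```

With the early return there is no need for a flag like podminka, and isPalindrome can be tested on its own.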
String indices go from [0..length - 1]
Change int d = s.length(); to int d = s.length() - 1;
Update: As a quick aside, you are setting podminka to true when
s.charAt(c) == s.charAt(d)
If s = 100101 for example, you will terminate all of the loops on the first iteration of the while loop because the first and last characters are the same.
int d = s.length();
The characters of a string are indexed from 0 to length - 1 only.
s.charAt(d) will always be out of bounds on the first iteration.
Take a look at the JDK source code:
public char charAt(int index) {
if ((index < 0) || (index >= count)) {
throw new StringIndexOutOfBoundsException(index);
}
return value[index + offset];
}
You can see that this exception is thrown when the index is less than zero, or greater than or equal to the string length. Now use a debugger, step through your code and see why you pass this wrong parameter value to charAt().
public class palindromicNumber {
public static void getPalindromicNumber(){
boolean podminka = false;
int test;
String s;
for(int a = 999;podminka == false && a>100; a-- ){
for(int b = 999;podminka == false && b>100; b-- ){
test = a*b;
s = Integer.toString(test);
int c = 0;
int d = s.length();
while(c<d && podminka == false){ // c<d instead of c!=d, so the loop also stops when the indices cross
if(s.charAt(c)==s.charAt(d - 1)){
System.out.println(s);
podminka = true;
}
c++;
d--;
}
}
}
}
}
Try this! String indices start from 0.
