Given "abcabcbb", the answer is "abc", which has a length of 3.
Given "bbbbb", the answer is "b", with the length of 1.
Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring; "pwke" is a subsequence and not a substring.
I came up with a solution that worked for some inputs but failed several test cases. I then found a better solution and rewrote it to try to understand it. The solution below works flawlessly, but after about 2 hours of battling with this thing, I still cannot understand why this particular line of code works.
import java.util.*;
import java.math.*;
public class Solution {
public int lengthOfLongestSubstring(String str) {
if(str.length() == 0)
return 0;
HashMap<Character,Integer> map = new HashMap<>();
int startingIndexOfLongestSubstring = 0;
int max = 0;
for(int i = 0; i < str.length(); i++){
char currentChar = str.charAt(i);
if(map.containsKey(currentChar))
startingIndexOfLongestSubstring = Math.max(startingIndexOfLongestSubstring, map.get(currentChar) + 1);
map.put(currentChar, i);
max = Math.max(max, i - startingIndexOfLongestSubstring + 1);
}//End of loop
return max;
}
}
The line in question is max = Math.max(max, i - startingIndexOfLongestSubstring + 1);
I don't understand why this works. We're taking the max between our previous max, and the difference between our current index and the starting index of what is currently the longest substring and then adding 1. I know that the code is getting the difference between our current index, and the startingIndexOfSubstring, but I can't conceptualize WHY it works to give us the intended result; Can someone please explain this step to me, particularly WHY it works?
I'm usually bad at explaining, let me give it a shot by considering an example.
String is "wcabcdeghi".
Forget the code for a minute and assume we're trying to come up with a logic.
We start from w and keep going until we reach c -> a -> b -> c.
We need to stop at this point because "c" is repeated. So we need a map that records which characters we have already seen, and where. (In code: map.put(currentChar, i);)
Now that we can tell when a character is repeated, we need to know the maximum length so far. (In code: max)
Now we know there is no point in keeping track of the first 2 characters, w -> c. Any substring that starts at or before the first "c" and reaches the second "c" contains a repeat, and the length of "wcab" is already recorded in max. So from the next iteration onwards we only need to measure lengths starting from a -> b -> and so on.
Let's have a variable (in code: startingIndexOfLongestSubstring) to keep track of this. (This should've been named startingIndexOfNonRepetitiveCharacter, but then again I'm bad with naming as well.)
Now we keep going, but wait: we still haven't settled how to keep track of the substring that we're currently parsing (i.e., from abcd...).
Come to think of it, all I need is the position where "a" was (which is startingIndexOfNonRepetitiveCharacter). So to get the length of the current substring, all I need is (in code) i - startingIndexOfLongestSubstring + 1: the current character position minus the starting position, plus 1 because subtraction does not count both endpoints. Let's call this currentLength.
But wait, what are we going to do with this count? Every time we process a new character we need to check whether currentLength beats our max.
So (in code) max = Math.max(max, i - startingIndexOfLongestSubstring + 1);
Now we've covered most of the statements that we need, and according to our logic, every time we encounter a character that was already present, all we need is startingIndexOfLongestSubstring = map.get(currentChar) + 1. So why are we taking a max?
Consider a scenario where the string is "wcabcdewghi". After the repeated "c", our window starts at "a" (index 2) and we process a -> b -> c -> d -> e -> w. At this point our logic checks whether this character was seen previously. Since "w" was seen at index 0, it would restart the count from index 1, which is before the current start and pulls the repeated "c" back into the window. That totally messes up the whole count. So we need to make sure the next start index we take from the map is never smaller than the current starting point of our count (i.e., use the position from the map only if that earlier occurrence is at or after startingIndexOfLongestSubstring).
Hope I've covered all the lines in the code, and mainly that the explanation was understandable.
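To see the effect of that Math.max guard, here is a minimal sketch that runs the same loop with and without it on the example string. The class name WindowStartDemo and the guardStart flag are made up for this illustration; they are not part of the original solution.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: same sliding-window loop as the solution,
// with a switch that turns the Math.max guard on the start index on or off.
class WindowStartDemo {
    static int longest(String s, boolean guardStart) {
        Map<Character, Integer> lastSeen = new HashMap<>();
        int start = 0; // index where the current repeat-free window begins
        int max = 0;   // best window length seen so far
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            Integer prev = lastSeen.get(c);
            if (prev != null) {
                // With the guard, the start can only move forward.
                // Without it, an old occurrence can drag the start backwards.
                start = guardStart ? Math.max(start, prev + 1) : prev + 1;
            }
            lastSeen.put(c, i);
            // Inclusive count of characters from start to i.
            max = Math.max(max, i - start + 1);
        }
        return max;
    }

    public static void main(String[] args) {
        String s = "wcabcdewghi";
        System.out.println("with guard:    " + longest(s, true));
        System.out.println("without guard: " + longest(s, false));
    }
}
```

With the guard the result is 9 ("abcdewghi"). Without it, the second "w" moves the start back to index 1, and the loop reports 10 even though that window contains "c" twice.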
Because
i - startingIndexOfLongestSubstring + 1
is the number of characters from index startingIndexOfLongestSubstring to index i, inclusive. For example, how many characters are there from position 2 to position 3? 3 - 2 = 1, but we have 2 characters: one at position 2 and one at position 3.
I've described every action in the code:
public class Solution {
public int lengthOfLongestSubstring(String str) {
if(str.length() == 0)
return 0;
HashMap<Character,Integer> map = new HashMap<>();
int startingIndexOfLongestSubstring = 0;
int max = 0;
// loop over all characters in the string
for(int i = 0; i < str.length(); i++){
// get character at position i
char currentChar = str.charAt(i);
// if we already met this character
if(map.containsKey(currentChar))
// then take the maximum of the previous 'startingIndexOfLongestSubstring' and
// map.get(currentChar) + 1 (the last earlier occurrence of the current character, plus 1)
// "plus 1" because the count must start from the character after that earlier occurrence,
// since the current character is the same
startingIndexOfLongestSubstring = Math.max(startingIndexOfLongestSubstring, map.get(currentChar) + 1);
// save position of the current character in the map. If map already has some value for current character
// then it will override (we don't want to know previous positions of the character)
map.put(currentChar, i);
// take the maximum of 'max' (candidate for the return value) and the length of the window ending at the current character
max = Math.max(max, i - startingIndexOfLongestSubstring + 1);
}//End of loop
return max;
}
}
There is a problem on codingbat.com in which you're supposed to remove the "yak" substring from the original string. They provide a solution for it, but I can't understand what happens when the if statement is true!
public String stringYak(String str) {
String result = "";
for (int i=0; i<str.length(); i++) {
// Look for i starting a "yak" -- advance i in that case
if (i+2<str.length() && str.charAt(i)=='y' && str.charAt(i+2)=='k') {
i = i + 2;
} else { // Otherwise do the normal append
result = result + str.charAt(i);
}
}
return result;
}
It just increases i by 2, and then what? When does it append to the result string?
Link of the problem:
https://codingbat.com/prob/p126212
The provided solution checks every single character in the input string; i is the index of the character currently being checked. When the current char is not a y, or the char at i+2 is not a k, the current char is appended to the result and the index advances by 1 position.
Example:
yakpak
012345
i
So here in the first iteration the char at i is y and the char at i+2 is k, so we have to skip 3 chars. Keep in mind that i is advanced by 1 every time by the loop itself, so i only has to be increased by 2 more. After this iteration i is here
yakpak
012345
   i
So now the current char is not a y, and this char will get added to the result string.
But it's even simpler in Java, as this functionality is built in with regex:
public String stringYak(String str) {
return str.replaceAll("y.k","");
}
The . matches any single character.
If i is pointing at a y and there is a k two positions down, then it wants to skip the full y*k substring, so it adds 2 to i so that i now refers to the k. When the loop then continues, i++ will skip past the k, so in effect the entire 3-letter y*k substring has been skipped.
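If the hidden i++ is what makes the original loop hard to follow, the same logic can be sketched with a while loop where the skip is explicit. This is only an illustrative rewrite (the class name YakSkipDemo is made up), not the codingbat solution itself.

```java
// Hypothetical rewrite of the loop with the 3-character skip spelled out.
class YakSkipDemo {
    static String stringYak(String str) {
        StringBuilder result = new StringBuilder();
        int i = 0;
        while (i < str.length()) {
            boolean startsYak = i + 2 < str.length()
                    && str.charAt(i) == 'y'
                    && str.charAt(i + 2) == 'k';
            if (startsYak) {
                i += 3; // skip y, the middle char, and k in one go; nothing is appended
            } else {
                result.append(str.charAt(i)); // normal case: keep the char
                i += 1;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(stringYak("yakpak"));
        // The regex version from the answer above, for comparison.
        System.out.println("yakpak".replaceAll("y.k", ""));
    }
}
```

Here i += 3 does in one step what i = i + 2 followed by the loop's own i++ does in the original.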
I'm researching on how to find the minimum number of steps required to convert word1 to word2, and came across the following implementation with the rules:
Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)
You have the following 3 operations permitted on a word:
a) Insert a character
b) Delete a character
c) Replace a character
And the idea of the implementation is:
Use distance[i][j] to represent the shortest edit distance between word1[0,i) and word2[0, j). Then compare the last character of word1[0,i) and word2[0,j), which are c and d respectively (c == word1[i-1], d == word2[j-1]):
if c == d, then : distance[i][j] = distance[i-1][j-1]
Otherwise we can use three operations to convert word1 to word2:
(a) if we replaced c with d: distance[i][j] = distance[i-1][j-1] + 1;
(b) if we added d after c: distance[i][j] = distance[i][j-1] + 1;
(c) if we deleted c: distance[i][j] = distance[i-1][j] + 1;
Code:
public class Solution {
public int minDistance(String word1, String word2) {
int len1 = word1.length();
int len2 = word2.length();
//distance[i][j] is the distance to convert the first i characters of word1 to the first j characters of word2
int[][] distance = new int[len1 + 1][len2 + 1];
for (int j = 0; j <= len2; j++)
{distance[0][j] = j;} //empty word1: insert all j characters of word2
for (int i = 0; i <= len1; i++)
{distance[i][0] = i;} //empty word2: delete all i characters of word1
for (int i = 1; i <= len1; i++) {
for (int j = 1; j <= len2; j++) {
if (word1.charAt(i - 1) == word2.charAt(j - 1)) { //ith & jth
distance[i][j] = distance[i - 1][j - 1];
} else {
distance[i][j] = Math.min(Math.min(distance[i][j - 1], distance[i - 1][j]), distance[i - 1][j - 1]) + 1;
}
}
}
return distance[len1][len2];
}
}
And my question is what does distance[][] represent? What's the point of storing a value for every 2D index? And why do you add 1 to len1 and len2 in int[][] distance = new int[len1 + 1][len2 + 1];?
For example, from my understanding, it is comparing each character of word1 to word2, but once both characters match, shouldn't both words' indexes move up? Meaning, if String word1="ab"; and String word2="ac", since a characters match, there is no need to compare a in word1 to c in word2, but rather move up indexes, and compare b in word1 to c in word2.
Lastly, how do the three operations come to be represented the way they are, e.g. how come distance[i-1][j-1] represents replacement?
Thank you in advance and will accept answer/up vote.
what does distance[][] represent?
It represents minDistance(word1.substring(0, i), word2.substring(0, j)). Here i and j are the lengths of the substrings.
What's the point of storing a value for every 2D index?
It's the dynamic programming idea. The answer to each partial problem is calculated once and then reused many times. If you don't store it in a shared array, you have to recalculate it every time you need it. Each cell depends on 1 to 3 smaller cases, so naive recursive calculation can take exponential time, on the order of O(3^(N+M)), where N is the length of word1 and M is the length of word2. If instead you simply reuse previously calculated results, it only takes O(N*M) time.
And why do you add 1 to len1 and len2 in int[][] distance = new int[len1 + 1][len2 + 1];?
For technical reasons you need to look back to the trivial cases of empty substrings. The distance[0][j] and distance[i][0] slots in the array are used to store these cases.
You can replace them with special-case calculations (the solution for the trivial cases is known), but it will make the code more complex. If you went for recursive calls instead of direct look-ups in the array, that would be a viable approach.
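As a rough sketch of that recursive alternative (the class and helper names are my own, not from the question), a top-down version can treat the empty-prefix cases as plain base cases and cache every other result so that it is computed only once:

```java
// Hypothetical top-down variant: same recurrence as the table version,
// but the trivial cases are handled by base cases instead of row/column 0.
class EditDistanceMemo {
    static int minDistance(String word1, String word2) {
        // memo[i][j] caches the distance between the first i chars of word1
        // and the first j chars of word2; null means "not computed yet".
        Integer[][] memo = new Integer[word1.length() + 1][word2.length() + 1];
        return solve(word1, word2, word1.length(), word2.length(), memo);
    }

    private static int solve(String a, String b, int i, int j, Integer[][] memo) {
        if (i == 0) return j; // empty prefix of a: insert j characters
        if (j == 0) return i; // empty prefix of b: delete i characters
        if (memo[i][j] != null) return memo[i][j];

        int result;
        if (a.charAt(i - 1) == b.charAt(j - 1)) {
            result = solve(a, b, i - 1, j - 1, memo); // last chars match: no cost
        } else {
            int replace = solve(a, b, i - 1, j - 1, memo);
            int insert = solve(a, b, i, j - 1, memo);
            int delete = solve(a, b, i - 1, j, memo);
            result = 1 + Math.min(replace, Math.min(insert, delete));
        }
        memo[i][j] = result;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(minDistance("abc", "abd"));
    }
}
```

Without the memo array the same recursion would recompute the same (i, j) pairs over and over, which is where the exponential blow-up comes from.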
once both characters match, shouldn't both words' indexes move up?
No, it's not about moving an index; it's about calculating a partial solution here and now. The nested loops over i and j take care of "index moving" at the appropriate time. Remember, we are not following one particularly good case, but calculating all partial solutions for i=0..len1 and j=0..len2. Each partial solution only looks 1 step back, in either 1 or 3 different directions.
Lastly, how do the three operations represent the way that do, e.g. how come distance[i-1][j-1] represent replacement?
For example, minDistance("abc", "abd") = 1 + minDistance("ab", "ab") = 1 + minDistance("a", "a") = 1 + minDistance("", "") = 1 + 0 = 1.
In this example for the case of calculation of final answer:
i=3="abc".length()
j=3="abd".length()
c = 'c' = "abc".charAt(i-1)
d = 'd' = "abd".charAt(j-1)
If we decide to replace c with d, that is, the last character of word1 with the last character of word2, we use the already calculated answer for the left parts of the words, each 1 character shorter, since the replacement takes care of the last character. We add 1 to the total number of operations, since we decided to make this replacement here.
We add 1 to each word length because row 0 and column 0 represent the empty prefix. On the very first comparison (i = 1, j = 1, which compares word1[0] with word2[0]), the rule distance[i][j] = distance[i-1][j-1] needs to read distance[0][0]. Without the extra row and column, that assignment would have to be handled as a special case rather than just being part of the loop.
This kind of solution formulation, where each iteration relies on a previous iteration's result, is called dynamic programming. Let's try to put words to this particular rule formulation.
First of all, we define what each cell represents: it's the smallest number of changes we need to apply to the prefix of word1 of length i to turn it into the prefix of word2 of length j. Now you can see how the preparation distance[i][0] = i makes sense: it takes exactly i deletions to turn any prefix of length i into a string of length zero!
if c == d, then : distance[i][j] = distance[i-1][j-1]
Translation: since we had to change nothing, the number of changes it took to make prefix length i the same as j would be the same number of changes to get the previous two prefixes equal, those with lengths [i-1][j-1].
If c does not equal d, we are going to choose the smaller of three options:
(a) if we replaced c with d: distance[i][j] = distance[i-1][j-1] + 1
Translation: imagine our prefixes are of similar length at this point and we just replaced the wrong character at i to be the same as the one at j. Again we look at the solution for the previous prefix lengths [i-1][j-1], but we need to add 1 since we made a change. (Now remember, this is one option of three that we will choose from. Also remember that any previous cell stores the optimal solution up to that point.)
(b) if we added d after c: distance[i][j] = distance[i][j-1] + 1
Translation: we've reached index i but it doesn't match j, therefore we can look at the optimal solution for adjusting this prefix length so it matches the one ending at (j-1) (a solution we already computed) and add the d so both prefixes reach [i][j] in a correct state. Again we need to add 1 to the previous state's solution.
(c) if we deleted c: distance[i][j] = distance[i-1][j] + 1
Translation: we've reached index i but it doesn't match j, therefore we can look at the optimal solution for adjusting the previous prefix length (i-1 which we already computed) so it matches the one ending at j, but we need to add 1 since we need to remove c to reach the previous prefix length.
Example:
word1 = 'ab'
word2 = 'ac'
m = [[0,1,2]
    ,[1,0,1]
    ,[2,1,_]]
(i,j)
1,1 => m[i][j] = m[i-1][j-1] = 0 // no change needed
1,2 => min(m[0][1],m[1][1],m[0][2]) + 1
     = min(1      ,0      ,2      ) + 1
     = 1
     choice represented: easiest to change 'a' to 'ac' by adding
     1 ('c') to the solution for [i][j-1] = [1][1]
2,1 => min(m[1][0],m[2][0],m[1][1]) + 1
     = min(1      ,2      ,0      ) + 1
     = 1
     choice represented: easiest to change 'ab' to 'a' by adding
     1 (deletion) to the solution for [i-1][j] = [1][1]
2,2 => min(m[1][1],m[2][1],m[1][2]) + 1
     = min(0      ,1      ,1      ) + 1
     = _
     choice represented: you figure it out...
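If you want to check your answer for the last cell, here is a small sketch that fills the table with the same rules as the code in the question and prints it row by row. The class name EditDistanceTable and the table method are made up for this example.

```java
import java.util.Arrays;

// Hypothetical helper: builds the full distance table so each cell can be inspected.
class EditDistanceTable {
    static int[][] table(String word1, String word2) {
        int len1 = word1.length();
        int len2 = word2.length();
        int[][] m = new int[len1 + 1][len2 + 1];
        for (int j = 0; j <= len2; j++) m[0][j] = j; // empty word1 prefix: j insertions
        for (int i = 0; i <= len1; i++) m[i][0] = i; // empty word2 prefix: i deletions
        for (int i = 1; i <= len1; i++) {
            for (int j = 1; j <= len2; j++) {
                if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
                    m[i][j] = m[i - 1][j - 1]; // no change needed
                } else {
                    // add, delete, replace: take the cheapest, then pay 1
                    m[i][j] = Math.min(Math.min(m[i][j - 1], m[i - 1][j]), m[i - 1][j - 1]) + 1;
                }
            }
        }
        return m;
    }

    public static void main(String[] args) {
        for (int[] row : table("ab", "ac")) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The bottom-right value of the printed table is the cell marked _ above, and it is also what minDistance would return for these two words.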
I just took a codility test and was wondering why my solution only scored 37/100. The problem was that you were given a String and had to search through it for valid passwords.
Here are the rules:
1) A valid password starts with a capital letter and cannot contain any numbers. The input is restricted to any combination of a-z, A-Z and 0-9.
2) The method they wanted you to create is supposed to return the size of the largest valid password. So for example, if you input "Aa8aaaArtd900d", the number 4 is supposed to be returned by the solution method. If no valid String is found, the method should return -1.
I cannot seem to figure out where I went wrong in my solution. Any help would be greatly appreciated! Also any suggestions on how to better test code for something like this would be greatly appreciated.
class Solution2 {
public int solution(String S) {
int first = 0;
int last = S.length()-1;
int longest = -1;
for(int i = 0; i < S.length(); i++){
if(Character.isUpperCase(S.charAt(i))){
first = i;
last = first;
while(last < S.length()){
if(Character.isDigit(S.charAt(last))){
i = last;
break;
}
last++;
}
longest = Math.max(last - first, longest);
}
}
return longest;
}
}
I've added an updated solution; any thoughts on how to optimize this further?
Your solution is too complicated. Since you are not asked to find the longest password, only the length of the longest password, there is no reason to create or store strings with that longest password. Therefore, you do not need to use substring or an array of Strings, only int variables.
The algorithm for finding the solution is straightforward:
Make an int pos = 0 variable representing the current position in s
Make a loop that searches for the next candidate password
Starting at position pos, find the next uppercase letter
If you hit the end of line, exit
Starting at the position of the uppercase letter, find the next digit
If you hit the end of line, stop
Find the difference between the position of the digit (or the end of line) and the position of the uppercase letter.
If the difference is above max that you have previously found, replace max with the difference
Advance pos to the position of the last letter (or the end of line)
If pos is under s.length, continue the loop at step 2
Return max.
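The steps above can be sketched in Java roughly as follows. This is my reading of the outline, not the answerer's own demo; the class name LongestPassword and the method name longest are made up, and max starts at -1 so that "no valid password" falls out naturally.

```java
// Hypothetical implementation of the outline: a single pass with int variables only.
class LongestPassword {
    static int longest(String s) {
        int max = -1; // -1 means no valid password found
        int pos = 0;  // current position in s
        while (pos < s.length()) {
            // Find the next uppercase letter starting at pos.
            while (pos < s.length() && !Character.isUpperCase(s.charAt(pos))) {
                pos++;
            }
            if (pos == s.length()) {
                break; // hit the end without finding a candidate start
            }
            int start = pos;
            // Find the next digit (or the end of the string).
            while (pos < s.length() && !Character.isDigit(s.charAt(pos))) {
                pos++;
            }
            // pos is now the digit or the end, so pos - start is the candidate length.
            max = Math.max(max, pos - start);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(longest("Aa8aaaArtd900d"));
    }
}
```

Because pos never moves backwards, every character is looked at a constant number of times, so the scan is linear in the length of the input.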
I'm trying to find every possible anagram of a string in Java. By this I mean that if I have a 4-character word, I want all the possible 3-character words derived from it, all the 2-character ones and all the 1-character ones. The most straightforward way I thought of is to use two nested for loops and iterate over the string. This is my code as of now:
private ArrayList<String> subsets(String word){
ArrayList<String> s = new ArrayList<String>();
int length = word.length();
for (int c=0; c<length; c++){
for (int i=0; i<length-c; i++){
String sub = word.substring(c, c+i+1);
System.out.println(sub);
//if (!s.contains(sub) && sub!=null)
s.add(sub);
}
}
//java.util.Collections.sort(s, new MyComparator());
//System.out.println(s.toString());
return s;
}
My problem is that it works for 3-letter words: fun yields this result (don't mind the ordering; the word is processed so that I have a string with the letters in alphabetical order):
f
fn
fnu
n
nu
u
But when I try 4 letter words, it leaves something out, as in catq gives me:
a
ac
acq
acqt
c
cq
cqt
q
qt
t
i.e., I don't see the 3 character long word act - which is the one I'm looking for when testing this method. I can't understand what the problem is, and it's most likely a logical error I'm making when creating the substrings. If anyone can help me out, please don't give me the code for it but rather the reasoning behind your solution. This is a piece of coursework and I need to come up with the code on my own.
EDIT: to clear something out, for me acq, qca, caq, aqc, cqa, qac, etc. are the same thing - To make it even clearer, what happens is that the string gets sorted in alphabetical order, so all those permutations should come up as one unique result, acq. So, I don't need all the permutations of a string, but rather, given a 4 character long string, all the 3 character long ones that I can derive from it - that means taking out one character at a time and returning that string as a result, doing that for every character in the original string.
I hope I have made my problem a bit clearer
It's working fine, you just misspelled "caqt" as "acqt" in your tests/input.
(The issue is probably that you're sorting your input. If you want substrings, you have to leave the input unsorted.)
After your edits: see "Generating all permutations of a given string". Then just sort the individual letters, and put them in a set.
Ok, as you've already devised your own solution, I'll give you my take on it. Firstly, consider how big your result list is going to be. You're essentially taking each letter in turn, and either including it or not. 2 possibilities for each letter, gives you 2^n total results, where n is the number of letters. This of course includes the case where you don't use any letter, and end up with an empty string.
Next, if you enumerate every possibility with a 1 for 'include this letter' and a 0 for 'don't include it', taking your 'fnu' example you end up with:
000 - ''
001 - 'u'
010 - 'n'
011 - 'nu'
100 - 'f'
101 - 'fu' (no offense intended)
110 - 'fn'
111 - 'fnu'.
Clearly, these are just binary numbers, and you can derive a function that given any number from 0-7 and the three letter input, will calculate the corresponding subset.
It's fairly easy to do in java.. don't have a java compiler to hand, but this should be approximately correct:
public String getSubSet(String input, int index) {
// Should check that index >= 0 and < 2^input.length() here.
// Should also check that input.length() <= 30 so that 1 << i stays a positive int.
String returnValue = "";
for (int i = 0; i < input.length(); i++) {
// Test bit i of index (not of i itself); 1 << i is the equivalent of 2^i
if ((index & (1 << i)) != 0)
returnValue += input.charAt(i);
}
return returnValue;
}
Then, if you need to you can just do a loop that calls this function, like this:
for (int i = 1; i < (1 << input.length()); i++)
getSubSet(input, i); // this doesn't do anything, but you can add it to a list, or output it as desired.
Note I started from 1 instead of 0- this is because the result at index 0 will be the empty string. Incidentally, this actually does the least significant bit first, so your output list would be 'f', 'n', 'fn', 'u', 'fu', 'nu', 'fnu', but the order didn't seem important.
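For completeness, here is a self-contained sketch of the same bitmask idea that collects the results in a list. The names BitmaskSubsets, subsetAt and allSubsets are made up for this example, and the length limit of 30 is my own assumption to keep 1 << n a positive int.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of enumerating subsets by counting through bitmasks.
class BitmaskSubsets {
    // Builds the subset selected by the bits of index: bit i set means "keep input.charAt(i)".
    static String subsetAt(String input, int index) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            if ((index & (1 << i)) != 0) {
                sb.append(input.charAt(i));
            }
        }
        return sb.toString();
    }

    // Returns every non-empty subset, in order of increasing index.
    static List<String> allSubsets(String input) {
        if (input.length() > 30) {
            throw new IllegalArgumentException("input too long for an int bitmask");
        }
        List<String> result = new ArrayList<>();
        for (int index = 1; index < (1 << input.length()); index++) {
            result.add(subsetAt(input, index));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(allSubsets("fnu"));
        System.out.println(allSubsets("acqt").contains("act"));
    }
}
```

For "fnu" this produces the least-significant-bit-first order described above, and for the sorted input "acqt" the list includes "act", which the nested-loop substring approach cannot produce.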
This is the method I came up with, seems like it's working
private void subsets(String word, ArrayList<String> subset){
if(word.length() == 1){
subset.add(word);
return;
}
else {
String firstChar = word.substring(0,1);
word = word.substring(1);
subsets(word, subset);
int size = subset.size();
for (int i = 0; i < size; i++){
String temp = firstChar + subset.get(i);
subset.add(temp);
}
subset.add(firstChar);
return;
}
}
What I do is check whether the word is longer than one character; if it is not, I add that single character to the ArrayList and return, which is where the recursion bottoms out. If it is longer, I save the first character and make a recursive call with the rest of the String. What happens is that the whole string gets sliced into characters saved on the recursive stack, until I hit the point where my word has length 1, with only one character remaining.
When that happens, as I said at the start, the character gets added to the List. Then the recursion starts to unwind: it looks at the size of the list (1 the first time) and, with a for loop, adds the character saved on the stack by the previous call concatenated with every element already in the ArrayList. Then it adds that character on its own and unwinds the recursion again.
I.e., with the word fun this happens:
f saved
List empty
recursive call(un)
-
u saved
List empty
recursive call(n)
-
n.length == 1
List = [n]
return
-
list.size=1
temp = u + list[0]
List = [n, un]
add the character saved in the stack on its own
List = [n, un, u]
return
-
list.size=3
temp = f + list[0]
List = [n, un, u, fn]
temp = f + list[1]
List = [n, un, u, fn, fun]
temp = f + list[2]
List = [n, un, u, fn, fun, fu]
add the character saved in the stack on its own
List = [n, un, u, fn, fun, fu, f]
return
I have been as clear as possible, I hope this clarifies what was my initial problem and how to solve it.
This is working code:
public static void main(String[] args) {
String input = "abcde";
Set<String> returnList = permutations(input);
System.out.println(returnList);
}
private static Set<String> permutations(String input) {
if (input.length() == 1) {
Set<String> a = new TreeSet<>();
a.add(input);
return a;
}
Set<String> returnSet = new TreeSet<>();
for (int i = 0; i < input.length(); i++) {
String prefix = input.substring(i, i + 1);
Set<String> permutations = permutations(input.substring(i + 1));
returnSet.add(prefix);
returnSet.addAll(permutations);
Iterator<String> it = permutations.iterator();
while (it.hasNext()) {
returnSet.add(prefix + it.next());
}
}
return returnSet;
}
Problem solved; I ended up needing a separate counter for the array position. Thanks for the help!
I'm writing a small app that takes a string, converts each character into 7 bits of binary code and then fills in a musical scale based on the string. For instance, if I had the binary 1000100, in the key of C Major that would give me the notes C and G (C 0 0 0 G 0 0).
I'm having an issue with a specific piece of code that takes an input of String[] (in which each element is a single character worth of binary, 7-bits) and processes each individual character in the strings themselves and stores the index number of where 1's occur in the string. For example, the string 1000100 would output 1 and 5.
Here's the method that does that:
public static String[][] convertToScale(String[] e){
String[][] notes = new String[e.length][]; //create array to hold arrays of Strings that represent notes
for(int i = 0; i < e.length; i++){
notes[i] = new String[findOccurancesOf(e[i])]; //create arrays to hold array of strings
for(int x = 0; x < e[i].length(); x++){
if((e[i].charAt(x)) != 48){ //checks to see if the char being evaluated is 0(Ascii code 48)
notes[i][x] = Integer.toString(x + 1); // if the value isn't 0, it fills in the array for that position.the value at x+1 represents the position of the scale the note is at
}
}
}
return notes;
}
Here is the code that it uses to get the occurrences of 1 in e[i]:
public static int findOccurancesOf(String s){
int counter = 0;
for(int i = 0; i < s.length(); i++ ) {
if( s.charAt(i) == 1 ) {
counter++;
}
}
return counter;
}
The issue I'm having is with the convertToScale method. When using "Hello world" as my input(the input gets converted into 7-bit binary before it gets processed by either of these methods) it passes through the 2nd for-each loop just fine the first time around, but after it tries to fill another spot in the array, it throws
java.lang.ArrayIndexOutOfBoundsException: 3
EDIT: It occurs in the line notes[i][x] = Integer.toString(x + 1); of the convertToScale method. I've run the debugger through it multiple times after trying the proposed changes below and I still get the same error at the same line. The findOccurancesOf method returns the right value (when evaluating H (1001000) it returns 2). So the thing that confuses me is that the out of bounds exception comes up right when it fills the 2nd spot in the array.
Also, feel free to tell me if anything else is crazy or my syntax is bad. Thanks!
In findOccurancesOf():
if( s.charAt(i) == 1 ) { should be if( s.charAt(i) == '1' ) { to check for the character '1'.
Otherwise it's looking for the character with ASCII value 1.
There is an out of bounds exception because if findOccurancesOf() returns the wrong value, then notes[i] is not constructed with the correct length in the following line of convertToScale():
notes[i] = new String[findOccurancesOf(e[i])];
In addition, you probably want to use something like:
notes[i][c++] = Integer.toString(x + 1);
with some counter c initialized to 0, if I understand your intentions correctly.
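Putting both suggestions together, a minimal sketch could look like the following. The class name ScaleNotes and the helper countOnes are made up here, and the sketch assumes every input string contains only the characters '0' and '1'.

```java
// Hypothetical combined fix: count the character '1', and write results
// using a separate counter c instead of the string position x.
class ScaleNotes {
    static int countOnes(String bits) {
        int counter = 0;
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') { // character '1', not the number 1
                counter++;
            }
        }
        return counter;
    }

    static String[][] convertToScale(String[] e) {
        String[][] notes = new String[e.length][];
        for (int i = 0; i < e.length; i++) {
            notes[i] = new String[countOnes(e[i])];
            int c = 0; // next free slot in notes[i]
            for (int x = 0; x < e[i].length(); x++) {
                if (e[i].charAt(x) == '1') {
                    // x + 1 is the 1-based scale position of this note
                    notes[i][c++] = Integer.toString(x + 1);
                }
            }
        }
        return notes;
    }

    public static void main(String[] args) {
        String[][] notes = convertToScale(new String[] {"1000100"});
        System.out.println(String.join(",", notes[0]));
    }
}
```

The key point is that x ranges over all 7 bit positions while notes[i] only has as many slots as there are 1s, so indexing notes[i] with x runs past the end as soon as a 1 appears at a position beyond that count.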
The reason for AIOOBE lies in this line:
notes[i] = new String[findOccurancesOf(e[i])]; //create arrays to hold array of strings
Here you call the findOccurancesOf method to count the occurrences of 1 in your String, say for Hello. It does not find any and returns 0, and then you call notes[i][x] = Integer.toString(x + 1); with x as 0. Since no space was ever allocated, you get the array index out of bounds exception.
I would suggest the following:
Validate the array size before assigning to an index, e.g. check that it is greater than 0.
Initialize your notes[i] as notes[i] = new String[e[i].length()];
Check characters with single quotes, like a == '1' rather than a == 1.
The exception is caused by what almas mentioned. Note, however, that your logical error is most likely inside the findOccurancesOf method: if the idea was to find all the '1' chars inside a string, you must change it to what I outlined below (note the apostrophes). Otherwise the char is compared by its numeric character code, and unless it is the character with ASCII code 1, nothing matches and the method returns 0, causing your exception.
if( s.charAt(i) == '1' ) {