Misleading javadoc comment on StringBuilder indexOf?


I'm trying to understand the following comment from the javadoc of the StringBuilder class's indexOf(String str,
int fromIndex) method.
It says:
Returns the index within this string of the first occurrence of the specified substring, starting at the specified index. The integer returned is the smallest value k for which:
k >= Math.min(fromIndex, str.length()) &&
this.toString().startsWith(str, k)
If no such value of k exists, then -1 is returned.
Now, I can't see the reason for str.length() in Math.min(fromIndex, str.length()), since it would allow a String to be found at an index < fromIndex. Am I missing something, or is this simply a misleading/wrong comment?
Edit: as pointed out below, this is the comment from the Java 7 javadoc; Java 6 has the right comment.

It's a mistake. It's supposed to be this.length() instead of str.length().
That allows for fromIndex to be greater than this.length() in the case where str is empty.
Example:
StringBuilder sb = new StringBuilder("Example");
System.out.println(sb.indexOf("", 1234)); //Outputs sb.length(), which is 7.
Note: String#indexOf(String, int) behaves the same way.
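A small sketch of the clamping behaviour described above. The class name IndexOfClampDemo and the helper find are made up for illustration; the only library call is StringBuilder.indexOf(String, int).

```java
class IndexOfClampDemo {
    // Hypothetical helper: wraps StringBuilder.indexOf so each case reads the same.
    static int find(String text, String str, int fromIndex) {
        return new StringBuilder(text).indexOf(str, fromIndex);
    }

    public static void main(String[] args) {
        // An empty search string "matches" at the end when fromIndex is too large.
        System.out.println(find("Example", "", 1234));  // 7
        // A non-empty search string cannot match past the end.
        System.out.println(find("Example", "x", 1234)); // -1
        // A match before fromIndex is never reported.
        System.out.println(find("abcabc", "abc", 1));   // 3
        // A negative fromIndex is treated as 0.
        System.out.println(find("abcabc", "abc", -5));  // 0
    }
}
```

The third case is the one the question worries about: the occurrence at index 0 is skipped because it lies before fromIndex.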

No, the condition basically boils down to the first k such that k is at least fromIndex and less than or equal to str.length(), at which the string contains the desired substring.
If str.length() < fromIndex, then the startsWith condition will always be false, because str.length() is not a valid index to the generated string.

Related

What happens when if statement goes true (in this code)?

There is a problem on codingbat.com in which you're supposed to remove the "yak" substring from the original string. They provide a solution for it, but I can't understand what happens when the if statement is true.
public String stringYak(String str) {
    String result = "";
    for (int i=0; i<str.length(); i++) {
        // Look for i starting a "yak" -- advance i in that case
        if (i+2<str.length() && str.charAt(i)=='y' && str.charAt(i+2)=='k') {
            i = i + 2;
        } else { // Otherwise do the normal append
            result = result + str.charAt(i);
        }
    }
    return result;
}
It just increases i by 2, and then what? When does it append to the result string?
Link of the problem:
https://codingbat.com/prob/p126212

The provided solution checks every single character in the input string. Here i is the index of the character currently being checked. When the current char is not a y, or the char at i+2 is not a k, the current char is appended to the result and the index advances by 1 position.
Example:
yakpak
012345
i
So here in the first iteration the char at i is y and the char at i+2 is a k, so we have to skip 3 chars. Keep in mind that i is advanced by 1 every time by the loop's i++, so i only has to be increased by 2 more. After this iteration i is here
yakpak
012345
   i
So now the current char is not a y, and this char will get added to the result string.
But it's even simpler in Java, as this functionality is built in with regex:
public String stringYak(String str) {
    return str.replaceAll("y.k","");
}
The . matches any single character.

If i is pointing at a y and there is a k two positions down, then it wants to skip the full y*k substring, so it adds 2 to i so that i now refers to the k. When the loop then continues, i++ will skip past the k, so in effect the entire 3-letter y*k substring has been skipped.
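To make the skip explicit, here is a sketch of the same logic written with a while loop, so that all three characters are skipped in one visible step instead of being split between i = i + 2 and the loop's i++. The class name YakSkipDemo is made up; this is a rewrite for illustration, not CodingBat's own solution.

```java
class YakSkipDemo {
    static String stringYak(String str) {
        StringBuilder result = new StringBuilder();
        int i = 0;
        while (i < str.length()) {
            // True when a 'y', any char, then a 'k' start at position i.
            boolean startsYak = i + 2 < str.length()
                    && str.charAt(i) == 'y'
                    && str.charAt(i + 2) == 'k';
            if (startsYak) {
                i += 3; // skip 'y', the middle char, and 'k'; nothing is appended
            } else {
                result.append(str.charAt(i)); // normal case: keep the char
                i += 1;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(stringYak("yakpak")); // pak
    }
}
```

Nothing is ever appended in the "true" branch, which is exactly why the yak disappears from the result.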

Length of the Longest Common Substring without repeating characters

Given "abcabcbb", the answer is "abc", whose length is 3.
Given "bbbbb", the answer is "b", with the length of 1.
Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring, "pwke" is a subsequence and not a substring.
I came up with a solution that worked in some cases but failed several test cases. I then found a better solution and rewrote it to try to understand it. The solution below works flawlessly, but after about 2 hours of battling with this thing, I still cannot understand why one particular line of code works.
import java.util.*;
import java.math.*;
public class Solution {
    public int lengthOfLongestSubstring(String str) {
        if(str.length() == 0)
            return 0;
        HashMap<Character,Integer> map = new HashMap<>();
        int startingIndexOfLongestSubstring = 0;
        int max = 0;
        for(int i = 0; i < str.length(); i++){
            char currentChar = str.charAt(i);
            if(map.containsKey(currentChar))
                startingIndexOfLongestSubstring = Math.max(startingIndexOfLongestSubstring, map.get(currentChar) + 1);
            map.put(currentChar, i);
            max = Math.max(max, i - startingIndexOfLongestSubstring + 1);
        }//End of loop
        return max;
    }
}
The line in question is max = Math.max(max, i - startingIndexOfLongestSubstring + 1);
I don't understand why this works. We're taking the max between our previous max, and the difference between our current index and the starting index of what is currently the longest substring and then adding 1. I know that the code is getting the difference between our current index, and the startingIndexOfSubstring, but I can't conceptualize WHY it works to give us the intended result; Can someone please explain this step to me, particularly WHY it works?

I'm usually bad at explaining, but let me give it a shot with an example.
The string is "wcabcdeghi".
Forget the code for a minute and assume we're trying to come up with the logic.
We start from w and keep going until we reach the second c: w -> c -> a -> b -> c.
We need to stop at this point because "c" is repeating. So we need a map that records where each character was last seen, which tells us whether a character has been repeated. (In code: map.put(currentChar, i);)
Now that we know whether a character is repeated, we need to know the maximum length so far. (In code: max)
Now there is no point in keeping track of the first 2 characters, w -> c, because the length that includes them is already recorded in max. So from the next iteration onwards we only need to measure the length from a -> b -> and so on.
Let's have a variable (in code: startingIndexOfLongestSubstring) to keep track of this. (It should've been named startingIndexOfNonRepetitiveCharacter, but then again I'm bad with naming as well.)
Now we keep going, but wait: we still haven't decided how to keep track of the substring that we're currently scanning (i.e., from abcd...).
Come to think of it, all I need is the position where "a" was found (which is startingIndexOfNonRepetitiveCharacter), so to know the length of the current substring all I need to do is (in code) i - startingIndexOfLongestSubstring + 1: the current character position minus the starting position, plus 1 because a plain subtraction doesn't count both ends. Let's call this currentLength.
But wait, what are we going to do with this count? Every time we find a new character we need to check whether this currentLength beats our max.
So (in code) max = Math.max(max, i - startingIndexOfLongestSubstring + 1);
Now we've covered most of the statements we need, and according to our logic, every time we encounter a character that was already present, all we need is startingIndexOfLongestSubstring = map.get(currentChar) + 1. So why are we taking a max?
Consider a scenario where the string is "wcabcdewghi". Our current substring runs a -> b -> c -> d -> e -> w. At this point our logic checks whether w was seen previously. Since it was, at index 0, the start would be reset to index 1, which is before the current start (index 2) and totally messes up the count. So we need to make sure the start never moves backwards: use the index from the map only if it is greater than the current starting point, i.e., only if the repeated character last occurred inside the current substring.
I hope I've covered all the lines in the code, and mainly that the explanation was understandable.
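The effect of the Math.max on the start index can be checked with a small experiment. In this sketch the class name WindowStartDemo and the clampStart flag are made up for illustration; with clampStart set to false the start is allowed to jump backwards, which is the bug described above.

```java
import java.util.HashMap;
import java.util.Map;

class WindowStartDemo {
    // clampStart == true  -> same logic as the solution in the question
    // clampStart == false -> start may move backwards (deliberately broken)
    static int longest(String str, boolean clampStart) {
        Map<Character, Integer> lastSeen = new HashMap<>();
        int start = 0;
        int max = 0;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            Integer previous = lastSeen.get(c);
            if (previous != null) {
                int candidate = previous + 1;
                start = clampStart ? Math.max(start, candidate) : candidate;
            }
            lastSeen.put(c, i);
            max = Math.max(max, i - start + 1); // inclusive window length
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(longest("wcabcdewghi", true));  // 9  ("abcdewghi")
        System.out.println(longest("wcabcdewghi", false)); // 10 (wrong: window contains 'c' twice)
    }
}
```

Without the clamp, the second w pulls the start back to index 1, so the window silently re-includes the first c and the reported length is too large.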

Because
i - startingIndexOfLongestSubstring + 1
is the number of characters from index startingIndexOfLongestSubstring to index i, inclusive. For example, how many characters are there between positions 2 and 3? 3 - 2 = 1, but we have 2 characters: one at position 2 and one at position 3.
I've described every action in the code:
public class Solution {
    public int lengthOfLongestSubstring(String str) {
        if(str.length() == 0)
            return 0;
        HashMap<Character,Integer> map = new HashMap<>();
        int startingIndexOfLongestSubstring = 0;
        int max = 0;
        // loop over all characters in the string
        for(int i = 0; i < str.length(); i++){
            // get the character at position i
            char currentChar = str.charAt(i);
            // if we have already met this character
            if(map.containsKey(currentChar))
                // then take the maximum of the previous 'startingIndexOfLongestSubstring' and
                // map.get(currentChar) + 1 (the last earlier occurrence of the current character, plus 1)
                // "plus 1" because counting must start from the character after that occurrence,
                // since the current character is the same one
                startingIndexOfLongestSubstring = Math.max(startingIndexOfLongestSubstring, map.get(currentChar) + 1);
            // save the position of the current character in the map. If the map already has a value for it,
            // the value is overwritten (we don't need the earlier positions of the character)
            map.put(currentChar, i);
            // take the maximum of 'max' (candidate for the return value) and the current window length
            max = Math.max(max, i - startingIndexOfLongestSubstring + 1);
        }//End of loop
        return max;
    }
}
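One way to convince yourself that i - start + 1 really is the window length is to return the substring itself instead of its length. This is a sketch with made-up names (LongestWindowText, longestSubstring, bestStart, bestLength); it uses the same sliding-window logic and only adds bookkeeping for where the best window began.

```java
import java.util.HashMap;
import java.util.Map;

class LongestWindowText {
    static String longestSubstring(String str) {
        Map<Character, Integer> lastSeen = new HashMap<>();
        int start = 0;      // start of the current repeat-free window
        int bestStart = 0;  // start of the longest window found so far
        int bestLength = 0; // its length
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (lastSeen.containsKey(c)) {
                start = Math.max(start, lastSeen.get(c) + 1);
            }
            lastSeen.put(c, i);
            int length = i - start + 1; // characters at start..i, both ends included
            if (length > bestLength) {
                bestLength = length;
                bestStart = start;
            }
        }
        // substring's end index is exclusive, so bestStart + bestLength is one past the last char
        return str.substring(bestStart, bestStart + bestLength);
    }

    public static void main(String[] args) {
        System.out.println(longestSubstring("pwwkew")); // wke
    }
}
```

The returned text always has exactly bestLength characters, which matches the number the original method returns.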

In Java, I have a String named string "A"( size 1). Why does string.substring(1) give no exceptions

In Java, I have a String named string with the value "A".
i.e. String string = "A";
Its length is 1, and we know that the characters in a string are 0-indexed. They are stored as a char array.
Then why does string.substring(1); NOT give me an exception?

If you look at the code of substring(int beginIndex):
public String substring(int beginIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    int subLen = value.length - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
}
There is no condition saying that an exception should be thrown when the length is 1 and the index is 1. In fact, a new empty string is returned.
System.out.println("A".substring(1).equals("")); prints true because of the last line in the method.

Based on the Javadoc for String#substring(int):
"Throws: IndexOutOfBoundsException - if beginIndex is negative or larger than the length of this String object."
A String with a single character has a length of 1, so here beginIndex == length, which is not larger than the length, and no exception is thrown.

As you can see in the code of substring
int subLen = value.length - beginIndex;
if (subLen < 0) {
    throw new StringIndexOutOfBoundsException(subLen);
}
it will only throw an exception if the length of the string is less than the begin index.
In your case the begin index is equal to the length, and that's why you do not get an exception.

Others have quoted the rule, but I wanted to point out the implications. If you say s.substring(n), then n can be anywhere from 0 to s.length(), inclusive. That means that if s has length len, there are len+1 (rather than len) possible ways to call s.substring with one parameter. This makes sense, because if you have the string "abcde", for example, and you're using substring(n) to get a suffix of the string, there are six possible suffixes: "abcde", "bcde", "cde", "de", "e", "".
This is helpful in practice, because you can code things like
while (<something>) {
    if (s.startsWith(prefix)) {
        ... do some other stuff ...
        s = s.substring(prefix.length());
        // This sets "s" to the remainder of the string, after the prefix.
        // It works even if there is no more text in the string, i.e.
        // prefix.length() == s.length(), so that the remaining text is "".
    }
}
In my experience, the ability to have things work this way is very beneficial. It avoids forcing you to write special logic to handle this boundary case, since the boundary case's handling is consistent with the non-boundary cases.
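The boundary rule can be checked directly. In this sketch the class SubstringBoundaryDemo and its two helpers are made up for illustration; the only library call under test is String.substring(int).

```java
import java.util.ArrayList;
import java.util.List;

class SubstringBoundaryDemo {
    // Every valid call s.substring(n), for n = 0 .. s.length() inclusive.
    static List<String> suffixes(String s) {
        List<String> out = new ArrayList<>();
        for (int n = 0; n <= s.length(); n++) {
            out.add(s.substring(n));
        }
        return out;
    }

    // Reports whether substring(beginIndex) throws for the given string.
    static boolean substringThrows(String s, int beginIndex) {
        try {
            s.substring(beginIndex);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(suffixes("abcde"));        // [abcde, bcde, cde, de, e, ]
        System.out.println(substringThrows("A", 1));  // false: beginIndex == length is allowed
        System.out.println(substringThrows("A", 2));  // true: beginIndex > length
    }
}
```

A string of length len therefore has len + 1 valid begin indexes, the last of which yields the empty string.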

Strange behavior of Java String split() method

I have a method which takes a string parameter and splits the string by #. After splitting, it prints the length of the array along with the array elements. Below is my code
public void StringSplitTesting(String inputString) {
    String tokenArray[] = inputString.split("#");
    System.out.println("tokenArray length is " + tokenArray.length
            + " and array elements are " + Arrays.toString(tokenArray));
}
Case I : Now when my input is abc# the output is tokenArray length is 1 and array elements are [abc]
Case II : But when my input is #abc the output is tokenArray length is 2 and array elements are [, abc]
But I was expecting the same output for both cases. What is the reason behind this implementation? Why does the split() method behave like this? Could someone give me a proper explanation?

One aspect of the behavior of the one-argument split method can be surprising: trailing empty strings are discarded from the returned array. As the Javadoc puts it:
Trailing empty strings are therefore not included in the resulting array.
To get a length of 2 for each case, you can pass in a negative second argument to the two-argument split method, which means that the length is unrestricted and no trailing empty strings are discarded.

Just take a look in the documentation:
Trailing empty strings are therefore not included in the resulting
array.
So in case 1, the output would be {"abc", ""} but Java cuts the trailing empty String.
If you don't want the trailing empty String to be discarded, you have to use split("#", -1).
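A quick sketch comparing the two limits on the inputs from the question. The class SplitLimitDemo and the helper show are made up for illustration; a limit of 0 behaves like the one-argument split("#").

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Splits on "#" with the given limit and renders the array as text.
    static String show(String input, int limit) {
        return Arrays.toString(input.split("#", limit));
    }

    public static void main(String[] args) {
        System.out.println(show("abc#", 0));  // [abc]     trailing "" dropped
        System.out.println(show("#abc", 0));  // [, abc]   leading "" kept
        System.out.println(show("abc#", -1)); // [abc, ]   trailing "" kept
        System.out.println(show("#abc", -1)); // [, abc]
    }
}
```

With a negative limit both inputs produce two elements, which is the symmetric result the question expected.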

The observed behavior is due to the inherently asymmetric nature of the substring() method in Java:
This is the core of the implementation of split():
while ((next = indexOf(ch, off)) != -1) {
    if (!limited || list.size() < limit - 1) {
        list.add(substring(off, next));
        off = next + 1;
    } else { // last one
        //assert (list.size() == limit - 1);
        list.add(substring(off, value.length));
        off = value.length;
        break;
    }
}
The key to understanding the behavior of the above code is to understand the behavior of the substring() method:
From the Javadocs:
String java.lang.String.substring(int beginIndex, int endIndex)
Returns a new string that is a substring of this string. The substring
begins at the specified beginIndex and extends to the character at index
endIndex - 1. Thus the length of the substring is endIndex-beginIndex.
Examples:
"hamburger".substring(4, 8) returns "urge" (not "urger")
"smiles".substring(1, 5) returns "mile" (not "miles")
Hope this helps.

Find char index where contains() found sequence

if(input.contains("Angle ")) {
    input.charAt(?);
}
So, basically, how would you find the char directly after "Angle "? In absolute simplest terms, how do you find the indexes in which "Angle " was found?

You can use the indexOf method both to find out that the input contains the string, and where its index is:
int pos = input.indexOf("Angle ");
if (pos >= 0) {
    ... // Substring is found at index pos
}

Have you tried the indexOf() method?
From java doc...
Returns the index within this string of the first occurrence of the
specified substring. The integer returned is the smallest value k such
that: this.startsWith(str, k) is true.
Then, since you know the length of the string you searched for, you can add that length to the returned index to find the char directly after "Angle ".
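Putting the two steps together, a minimal sketch might look like this. The class CharAfterDemo and the method charAfter are made-up names; the sketch also guards against the marker sitting at the very end of the input, where there is no following char.

```java
import java.util.Optional;

class CharAfterDemo {
    // Returns the char directly after the first occurrence of marker, if there is one.
    static Optional<Character> charAfter(String input, String marker) {
        int pos = input.indexOf(marker);
        if (pos < 0) {
            return Optional.empty(); // marker not present
        }
        int next = pos + marker.length(); // index just past the marker
        if (next >= input.length()) {
            return Optional.empty(); // marker is at the very end
        }
        return Optional.of(input.charAt(next));
    }

    public static void main(String[] args) {
        System.out.println(charAfter("Angle 45", "Angle "));    // Optional[4]
        System.out.println(charAfter("No marker", "Angle "));   // Optional.empty
    }
}
```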

To find the word following "Angle ", you could use regex:
String next = str.replaceAll(".*Angle (\\w+).*", "$1");
Then you don't have to sully yourself with indexes, iteration and lots of code.
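One caveat with the replaceAll approach: when the input does not contain "Angle " followed by a word, nothing is replaced and the whole input comes back unchanged. A sketch using Pattern and Matcher makes the "not found" case explicit; the class WordAfterDemo and the method wordAfterAngle are made-up names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordAfterDemo {
    private static final Pattern AFTER_ANGLE = Pattern.compile("Angle (\\w+)");

    // Returns the word following the first "Angle ", or empty if there is none.
    static Optional<String> wordAfterAngle(String input) {
        Matcher m = AFTER_ANGLE.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(wordAfterAngle("turn Angle 45 now")); // Optional[45]
        System.out.println(wordAfterAngle("nothing here"));      // Optional.empty
        // The one-liner returns its input untouched when there is no match:
        System.out.println("nothing here".replaceAll(".*Angle (\\w+).*", "$1")); // nothing here
    }
}
```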

As others have stated, you may use indexOf to find the location of the substring. If you have more than one occurrence of the substring and you want to find all of them, you can use the version of indexOf that takes a starting position to continue the search after the current occurrence, e.g. to find all occurrences of needle in haystack:
int index = 0;
while ((index = haystack.indexOf(needle, index)) != -1) {
    System.out.println("Found substring at " + index);
    index += needle.length();
}
Note, by the way, that .contains(needle) is essentially the same as .indexOf(needle) > -1 (in fact, that is precisely how contains() is implemented).
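Two details of that loop are worth a sketch: advancing by needle.length() skips overlapping matches, and an empty needle would never advance at all. The class FindAllDemo, the method findAll, and the allowOverlap flag below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FindAllDemo {
    // Collects the start index of every occurrence of needle in haystack.
    // allowOverlap == false: continue after the whole match (as in the loop above).
    // allowOverlap == true:  continue one char later, so overlapping matches are found.
    static List<Integer> findAll(String haystack, String needle, boolean allowOverlap) {
        List<Integer> positions = new ArrayList<>();
        if (needle.isEmpty()) {
            return positions; // a step of 0 would loop forever
        }
        int step = allowOverlap ? 1 : needle.length();
        int index = 0;
        while ((index = haystack.indexOf(needle, index)) != -1) {
            positions.add(index);
            index += step;
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(findAll("aaa", "aa", false)); // [0]
        System.out.println(findAll("aaa", "aa", true));  // [0, 1]
    }
}
```

Which step is right depends on whether overlapping occurrences should count; the original loop corresponds to allowOverlap being false.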
