Why do Strings start with a "" in Java? [duplicate] - java

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Why does “abcd”.StartsWith(“”) return true?
Whilst debugging some code, I found that part of my validation was using the .startsWith() method of the String class to check whether a String started with a blank character.
Consider the following:
public static void main(String args[])
{
String s = "Hello";
if (s.startsWith(""))
{
System.out.println("It does");
}
}
It prints out It does
My question is, why do Strings start off with a blank character? I'm presuming that under the hood Strings are essentially character arrays, but in this case I would have thought the first character would be H
Can anyone explain please?

"" is an empty string containing no characters. There is no "empty character", unless you mean a space or the null character, neither of which are empty strings.
You can think of a string as starting with an infinite number of empty strings, just like you can think of a number as starting with an infinite number of leading zeros without any change to the meaning.
1 = ...00001
"foo" = ... + "" + "" + "" + "foo"
Strings also end with an infinite number of empty strings (as do decimal numbers with zeros):
1 = 001.000000...
"foo" = "foo" + "" + "" + "" + ...

There seems to be a misunderstanding in your code. Your statement s.startsWith("") checks whether the string starts with an empty string, not with a blank character. It may look like a weird implementation choice, but that is how it works: every string reports that it starts with the empty string.
Also note that a blank character would be the " " string, as opposed to your empty string "".

"Hello" starts with "" and it also starts with "H" and it also starts with "He" and it also starts with "Hel" ... do you see?

That "" is not a blank; it's an empty string. I guess the API is asking the question "is this a substring of that?", and the zero-length empty string is a substring of everything.
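To see this concretely, the following sketch lists every prefix of a string, starting with the zero-length one, and checks each against startsWith. The class and method names (PrefixDemo, prefixes) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PrefixDemo {
    /** Returns every prefix of s, from the empty string up to s itself. */
    static List<String> prefixes(String s) {
        List<String> result = new ArrayList<>();
        for (int len = 0; len <= s.length(); len++) {
            result.add(s.substring(0, len));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "Hello";
        for (String p : prefixes(s)) {
            // Every line reports true, including the first one for "".
            System.out.println("\"" + p + "\" -> " + s.startsWith(p));
        }
    }
}
```

The empty string is simply the prefix of length zero, so it is treated no differently from "H" or "He".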

The empty String ("") basically "satisfies" every string. In your example, Java delegates the call
s.startsWith("");
to
s.startsWith("", 0);
which essentially follows the principle that an empty element (the empty string) trivially satisfies the constraint (being a prefix of your string).
From String.java
/**
* Tests if the substring of this string beginning at the
* specified index starts with the specified prefix.
*
* @param prefix the prefix.
* @param toffset where to begin looking in this string.
* @return <code>true</code> if the character sequence represented by the
* argument is a prefix of the substring of this object starting
* at index <code>toffset</code>; <code>false</code> otherwise.
* The result is <code>false</code> if <code>toffset</code> is
* negative or greater than the length of this
* <code>String</code> object; otherwise the result is the same
* as the result of the expression
* <pre>
* this.substring(toffset).startsWith(prefix)
* </pre>
*/
public boolean startsWith(String prefix, int toffset) {
char ta[] = value;
int to = offset + toffset;
char pa[] = prefix.value;
int po = prefix.offset;
int pc = prefix.count;
// Note: toffset might be near -1>>>1.
if ((toffset < 0) || (toffset > count - pc)) {
return false;
}
while (--pc >= 0) {
if (ta[to++] != pa[po++]) {
return false;
}
}
return true;
}
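The reason the source above returns true for an empty prefix is that the comparison loop never executes when the prefix has no characters. A simplified sketch on plain char arrays makes that easier to see; StartsWithSketch is a hypothetical helper written for illustration, not the JDK code.

```java
class StartsWithSketch {
    /** Simplified prefix test: does text, read from offset, begin with prefix? */
    static boolean startsWith(char[] text, char[] prefix, int offset) {
        if (offset < 0 || offset > text.length - prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (text[offset + i] != prefix[i]) {
                return false;
            }
        }
        // Reached without any comparison when prefix.length == 0.
        return true;
    }

    public static void main(String[] args) {
        char[] hello = "Hello".toCharArray();
        System.out.println(startsWith(hello, "".toCharArray(), 0));   // true
        System.out.println(startsWith(hello, "He".toCharArray(), 0)); // true
        System.out.println(startsWith(hello, "h".toCharArray(), 0));  // false
    }
}
```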

For folks who have taken automata theory, this makes sense because the empty string ε is a substring of any string and also is the concatenation identity element, ie:
for all strings x, ε + x = x, and x + ε = x
So yes, every string "startWith" the empty string. Also note (as many others said it), the empty string is different from a blank or null character.

A blank is (" "), that's different from an empty string (""). A blank space is a character, the empty string is the absence of any character.
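If the original validation really meant to test for a leading blank, a check along these lines might be what was intended. This is only a sketch: startsWithBlank is a hypothetical helper, and it assumes "blank" means any character for which Character.isWhitespace returns true.

```java
class BlankCheck {
    /** True if s has at least one character and the first one is whitespace. */
    static boolean startsWithBlank(String s) {
        return !s.isEmpty() && Character.isWhitespace(s.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(" ".length());               // 1: a blank is one character
        System.out.println("".length());                // 0: the empty string has none
        System.out.println(startsWithBlank(" Hello"));  // true
        System.out.println(startsWithBlank("Hello"));   // false
        System.out.println(startsWithBlank(""));        // false
    }
}
```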

An empty string is not a blank character. Assuming your question is really about the empty string, I guess they decided to leave it that way, though it does seem odd. They could have special-cased a zero-length prefix, but they didn't.

Related

Finding the strings in a TreeSet that start with a given prefix

I'm trying to find the strings in a TreeSet<String> that start with a given prefix. I found a previous question asking for the same thing — Searching for a record in a TreeSet on the fly — but the answer given there doesn't work for me, because it assumes that the strings don't include Character.MAX_VALUE, and mine can.
(The answer there is to use treeSet.subSet(prefix, prefix + Character.MAX_VALUE), which gives all strings between prefix (inclusive) and prefix + Character.MAX_VALUE (exclusive), which comes out to all strings that start with prefix except those that start with prefix + Character.MAX_VALUE. But in my case I need to find all strings that start with prefix, including those that start with prefix + Character.MAX_VALUE.)
How can I do this?
To start with, I suggest re-examining your requirements. Character.MAX_VALUE is U+FFFF, which is not a valid Unicode character and never will be; so I can't think of a good reason why you would need to support it.
But if there's a good reason for that requirement, then — you need to "increment" your prefix to compute the least string that's greater than all strings starting with your prefix. For example, given "city", you need "citz". You can do that as follows:
/**
* @param prefix
* @return The least string that's greater than all strings starting with
* prefix, if one exists. Otherwise, returns Optional.empty().
* (Specifically, returns Optional.empty() if the prefix is the
* empty string, or is just a sequence of Character.MAX_VALUE-s.)
*/
private static Optional<String> incrementPrefix(final String prefix) {
final StringBuilder sb = new StringBuilder(prefix);
// remove any trailing occurrences of Character.MAX_VALUE:
while (sb.length() > 0 && sb.charAt(sb.length() - 1) == Character.MAX_VALUE) {
sb.setLength(sb.length() - 1);
}
// if the prefix is empty, then there's no upper bound:
if (sb.length() == 0) {
return Optional.empty();
}
// otherwise, increment the last character and return the result:
sb.setCharAt(sb.length() - 1, (char) (sb.charAt(sb.length() - 1) + 1));
return Optional.of(sb.toString());
}
To use it, you need to use subSet when the above method returns a string, and tailSet when it returns nothing:
/**
* @param allElements - a SortedSet of strings. This set must use the
* natural string ordering; otherwise this method
* may not behave as intended.
* @param prefix
* @return The subset of allElements containing the strings that start
* with prefix.
*/
private static SortedSet<String> getElementsWithPrefix(
final SortedSet<String> allElements, final String prefix) {
final Optional<String> endpoint = incrementPrefix(prefix);
if (endpoint.isPresent()) {
return allElements.subSet(prefix, endpoint.get());
} else {
return allElements.tailSet(prefix);
}
}
See it in action at: http://ideone.com/YvO4b3.
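For readers who cannot follow the link, here is a condensed, self-contained restatement of the same idea with a small data set. The class name PrefixSubsetDemo, the helper name withPrefix, and the sample strings are made up for illustration; the logic is meant to mirror the two methods above rather than replace them.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.SortedSet;
import java.util.TreeSet;

class PrefixSubsetDemo {
    /** Least string greater than every string starting with prefix, if any. */
    static Optional<String> incrementPrefix(String prefix) {
        int end = prefix.length();
        // Skip trailing Character.MAX_VALUE characters; they cannot be incremented.
        while (end > 0 && prefix.charAt(end - 1) == Character.MAX_VALUE) {
            end--;
        }
        if (end == 0) {
            return Optional.empty(); // no upper bound exists
        }
        char bumped = (char) (prefix.charAt(end - 1) + 1);
        return Optional.of(prefix.substring(0, end - 1) + bumped);
    }

    /** View of all elements that start with prefix (natural ordering assumed). */
    static SortedSet<String> withPrefix(SortedSet<String> all, String prefix) {
        return incrementPrefix(prefix)
                .map(upper -> all.subSet(prefix, upper))
                .orElseGet(() -> all.tailSet(prefix));
    }

    public static void main(String[] args) {
        SortedSet<String> all = new TreeSet<>(Arrays.asList(
                "cit", "city", "city\uffff", "cityhall", "citz", "dog"));
        System.out.println(withPrefix(all, "city").size());       // 3
        System.out.println(withPrefix(all, "city\uffff").size()); // 1
        System.out.println(withPrefix(all, "").size());           // 6
    }
}
```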
If anybody is looking for a shorter version of ruakh's answer:
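The first element is actually set.ceiling(prefix). For the last one, increment the prefix and use set.lower(next_prefix); lower() rather than floor(), because floor() would also return the incremented prefix itself if the set happens to contain it (e.g. "citz" for the prefix "city"). Note that, unlike the longer version, this does not handle a prefix ending in Character.MAX_VALUE.
public NavigableSet<String> subSetWithPrefix(NavigableSet<String> set, String prefix) {
    if (prefix.isEmpty())
        return set; // every string starts with the empty prefix
    String first = set.ceiling(prefix);
    char[] chars = prefix.toCharArray();
    // increment the last character (overflows if it is Character.MAX_VALUE)
    chars[chars.length-1] = (char) (chars[chars.length-1]+1);
    // greatest element strictly below the incremented prefix
    String last = set.lower(new String(chars));
    if(first==null || last==null || last.compareTo(first)<0)
        return new TreeSet<>();
    return set.subSet(first, true, last, true);
}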

Strange behavior of Java String split() method

I have a method which takes a string parameter, splits the string by #, and then prints the length of the array along with the array elements. Below is my code:
public void StringSplitTesting(String inputString) {
String tokenArray[] = inputString.split("#");
System.out.println("tokenArray length is " + tokenArray.length
+ " and array elements are " + Arrays.toString(tokenArray));
}
Case I : Now when my input is abc# the output is tokenArray length is 1 and array elements are [abc]
Case II : But when my input is #abc the output is tokenArray length is 2 and array elements are [, abc]
But I was expecting the same output in both cases. What is the reason behind this implementation? Why does the split() method behave like this? Could someone give me a proper explanation?
One aspect of the behavior of the one-argument split method can be surprising -- trailing empty strings are discarded from the returned array. The Javadoc says:
Trailing empty strings are therefore not included in the resulting array.
To get a length of 2 for each case, you can pass in a negative second argument to the two-argument split method, which means that the length is unrestricted and no trailing empty strings are discarded.
Just take a look in the documentation:
Trailing empty strings are therefore not included in the resulting
array.
So in case 1, the output would be {"abc", ""} but Java cuts the trailing empty String.
If you don't want the trailing empty String to be discarded, you have to use split("#", -1).
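A small sketch comparing the two inputs from the question under both limits may make the difference easier to see. SplitDemo and describe are hypothetical names; a limit of 0 behaves the same as the one-argument split.

```java
import java.util.Arrays;

class SplitDemo {
    /** Splits input on '#' with the given limit and reports length plus elements. */
    static String describe(String input, int limit) {
        String[] parts = input.split("#", limit);
        return parts.length + " " + Arrays.toString(parts);
    }

    public static void main(String[] args) {
        System.out.println(describe("abc#", 0));  // 1 [abc]     trailing "" dropped
        System.out.println(describe("#abc", 0));  // 2 [, abc]   leading "" kept
        System.out.println(describe("abc#", -1)); // 2 [abc, ]   trailing "" kept
        System.out.println(describe("#abc", -1)); // 2 [, abc]
    }
}
```

Only trailing empty strings are removed, which is why the leading empty string in the second case survives even with the default limit.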
The observed behavior is due to the inherently asymmetric nature of the substring() method in Java:
This is the core of the implementation of split():
while ((next = indexOf(ch, off)) != -1) {
if (!limited || list.size() < limit - 1) {
list.add(substring(off, next));
off = next + 1;
} else { // last one
//assert (list.size() == limit - 1);
list.add(substring(off, value.length));
off = value.length;
break;
}
}
The key to understanding the behavior of the above code is to understand the behavior of the substring() method:
From the Javadocs:
String java.lang.String.substring(int beginIndex, int endIndex)
Returns a new string that is a substring of this string. The substring
begins at the specified beginIndex and extends to the character at index
endIndex - 1. Thus the length of the substring is endIndex-beginIndex.
Examples:
"hamburger".substring(4, 8) returns "urge" (not "urger")
"smiles".substring(1, 5) returns "mile" (not "miles")
Hope this helps.

Comparing String Integers Issue

I have a scanner that reads a 7-character alphanumeric code entered by the user. The String variable is called "code".
The last character of the code (7th character, index 6) MUST BE NUMERIC, while the rest may be either numeric or alphabetical.
So I set out to write a check that would stop the rest of the method from executing if the last character in the code was anything but a digit (0 - 9).
However, my code does not work as expected: even if my code ends in a digit between 0 and 9, the if condition is met and it prints "last character in code is non-numerical".
Example code: 45m4av7
characterAtEnd prints out as the string character 7, as it should.
However, my program still tells me my code ends non-numerically.
I'm aware that my number values are string characters, but it shouldn't matter, should it?
Also, I apparently cannot compare actual integer values with an "|", which is mainly why I'm using String.valueOf and taking the string characters of 0-9.
String characterAtEnd = String.valueOf(code.charAt(code.length()-1));
System.out.println(characterAtEnd);
if(!characterAtEnd.equals(String.valueOf(0|1|2|3|4|5|6|7|8|9))){
    System.out.println("INVALID CRC CODE: last character in code in non-numerical.");
    System.exit(0);
}
I cannot for the life of me, figure out why my program is telling me my code (that has a 7 at the end) ends non-numerically. It should skip the if statement and continue on. right?
The String contains method will work here:
String digits = "0123456789";
digits.contains(characterAtEnd); // true if ends with digit, false otherwise
String.valueOf(0|1|2|3|4|5|6|7|8|9) is actually "15", which of course can never be equal to the last character. This should make sense, because 0|1|2|3|4|5|6|7|8|9 evaluates to 15 using integer math, which then gets converted to a String.
Alternatively, try this:
String code = "45m4av7";
char characterAtEnd = code.charAt(code.length() - 1);
System.out.println(characterAtEnd);
if(characterAtEnd < '0' || characterAtEnd > '9'){
System.out.println("INVALID CRC CODE: last character in code in non-numerical.");
System.exit(0);
}
You are doing bitwise operations here: if(!characterAtEnd.equals(String.valueOf(0|1|2|3|4|5|6|7|8|9)))
Check out the difference between | and ||
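To make the difference visible, here is a short sketch that evaluates the expression from the question and contrasts it with a logical OR over separate comparisons. BitwiseOrDemo and its method names are invented for this illustration.

```java
class BitwiseOrDemo {
    /** Bitwise OR merges the bits of all ten numbers into a single int. */
    static int orOfDigits() {
        return 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9; // binary 1111
    }

    /** Logical OR combines separate boolean tests into one boolean. */
    static boolean isZeroOrSeven(char c) {
        return c == '0' || c == '7';
    }

    public static void main(String[] args) {
        System.out.println(orOfDigits());                              // 15
        System.out.println(String.valueOf(orOfDigits()));              // 15
        System.out.println("7".equals(String.valueOf(orOfDigits())));  // false
        System.out.println(isZeroOrSeven('7'));                        // true
    }
}
```

So the original condition compares the last character with the two-character string "15", which no single character can ever equal.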
This bit of code should accomplish your task using regular expressions:
String code = "45m4av7";
if (!code.matches("^.+?\\d$")){
System.out.println("INVALID CRC CODE");
}
Also, for reference, this method sometimes comes in handy in similar situations:
/* returns true if someString actually ends with the specified suffix */
someString.endsWith(suffix);
As endsWith(suffix) does not take regular expressions, if you wanted to check against all possible lower-case letters, you'd need to do something like this:
/* ASCII approach */
String s = "hello";
boolean endsInLetter = false;
for (int i = 97; i <= 122; i++) {
if (s.endsWith(String.valueOf(Character.toChars(i)))) {
endsInLetter = true;
}
}
System.out.println(endsInLetter);
/* String approach */
String alphabet = "abcdefghijklmnopqrstuvwxyz";
boolean endsInLetter2 = false;
for (int i = 0; i < alphabet.length(); i++) {
if (s.endsWith(String.valueOf(alphabet.charAt(i)))) {
endsInLetter2 = true;
}
}
System.out.println(endsInLetter2);
Note that neither of the aforementioned approaches is a good idea: they are clunky and rather inefficient.
Going off of the ASCII approach, you could even do something like this:
ASCII reference : http://www.asciitable.com/
int i = (int)code.charAt(code.length() - 1);
/* Corresponding ASCII values to digits */
if(i <= 57 && i >= 48){
System.out.println("Last char is a digit!");
}
If you want a one-liner, stick to regular expressions, for example:
System.out.println((!code.matches("^.+?\\d$")? "Invalid CRC Code" : "Valid CRC Code"));
I hope this helps!
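If the whole code should be validated in one place, a stricter pattern could cover the length as well as the final digit. This is a sketch under assumptions not stated in the question: CodeValidator is a hypothetical class, and "alphanumeric" is taken to mean ASCII letters and digits only.

```java
import java.util.regex.Pattern;

class CodeValidator {
    // Assumption: six ASCII letters or digits followed by one ASCII digit.
    private static final Pattern CODE = Pattern.compile("[A-Za-z0-9]{6}[0-9]");

    /** True if code is exactly 7 characters long and ends in a digit 0-9. */
    static boolean isValid(String code) {
        return code != null && CODE.matcher(code).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("45m4av7")); // true
        System.out.println(isValid("45m4avx")); // false: last character is a letter
        System.out.println(isValid("45m4a7"));  // false: only 6 characters
    }
}
```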

indexOf (String str) - equals string to other string

I need to write a method that checks whether "String str" occurs in another string, and returns the index at which str starts.
That sounds like homework, and it partly is, but I'm using it to study for a test...
I've tried:
public int IndexOf (String str) {
for (i= 0;i<_st.length();i++)
{
if (_st.charAt(i) == str.charAt(i)) {
i++;
if (_st.charAt(i) == str.charAt(i)) {
return i;
}
}
}
return -1;
}
But I don't get the right return value. Why? Am I on the right track, or not even close?
I am afraid you are not close.
Here's what you have to do:
Loop over the characters of the string (the one on which you are supposed to do an indexOf; I will call this the master). You are doing this right.
For every character, check whether the first character of your other string and this character are the same.
If they are (a potential start of the same sequence), check whether the next characters in the master match the rest of the string you are looking for (you might want to loop through the characters of that string and check them one by one).
If they don't match, continue with the next character in the master string.
Something like:
Loop master string
for every character (using index i, lets say)
check whether this is same as first character of the other string
if it is
//potential match
loop through the characters in the child string (lets say using index j)
match them with the consecutive characters in the master string
(something like master[j+i] == sub[j])
If everything match, 'i' is what you want
otherwise, continue with the master, hoping you find a match
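The outline above could be written in Java roughly as follows. This is a minimal sketch of the naive search, not the JDK implementation; NaiveSearch and the parameter names master and sub are taken from the pseudocode or made up for illustration.

```java
class NaiveSearch {
    /** Returns the first index at which sub occurs in master, or -1 if it does not. */
    static int indexOf(String master, String sub) {
        // A match cannot start later than this index.
        int lastStart = master.length() - sub.length();
        for (int i = 0; i <= lastStart; i++) {
            int j = 0;
            // Compare sub against master starting at position i.
            while (j < sub.length() && master.charAt(i + j) == sub.charAt(j)) {
                j++;
            }
            if (j == sub.length()) {
                return i; // every character of sub matched
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("abcdefg", "cd"));     // 2
        System.out.println(indexOf("abcdefg", "12cd45")); // -1
        System.out.println(indexOf("aaab", "aab"));       // 1
    }
}
```

Note that the two strings use separate indexes (i for the start position, j for the offset within the match), which is exactly what the code in the question is missing.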
Some other points:
In Java, method names start with a lower-case letter by convention (meaning the compiler won't complain, but your fellow programmers may). So IndexOf should actually be indexOf.
Having instance variables (class-level variables) start with a _ (as in _st) is not really good practice. (If your professor insists, you may not have many options, but keep this in mind.)
Not really very close, I'm afraid. What that code basically does is check whether the two strings have two consecutive matching characters in the same positions at any point and, if so, return the index of the second of those characters. E.g., if _st is "abcdefg" and str is "12cd45", you'll return 3 because they have "cd" in the same place, and that's the index of the "d". At least, that's as near as I can tell what it's actually doing. That's because you're indexing into both strings with the same indexing variable.
To re-write indexOf, looking for str within _st, you have to scan _st for the first character in str and then check whether the remaining characters match; if not, bump forward one place from where you started checking and continue your scan. (There are optimisations you can do, but that's the essence of it.) So for instance, if you find the first character of str at index 4 in _st and str is six characters long, having found the first character you need to see if the remaining five (str's indexes 1-5 inclusive) match _st's indexes 5-9 inclusive (easiest just to check all six of str's characters against a substring of _st starting at 4 and going for six characters). If everything matches, return the index at which you found the first character (so, 4 in that example). You can stop scanning at _st.length() - str.length() since if you haven't found a match starting by that point, you're not going to find it at all.
Side point: Don't call the length function on every loop iteration. The JIT may be able to optimize out the call, but if you know that _st won't change during the course of this function (and if you don't know that, you should require it), grab length() to a local and then refer to that. And of course, since you know you can stop earlier than length(), you'll use a local to remember where you can stop.
You are using i to index both strings, but what you want is for the index into the search string to stay at 0 until its first character is found in the other string. Then check whether the next characters are equal, and so on.
Hope this helps
Your code loops through the string being searched and if the characters at position i match, it checks the next position. If the strings match at the next position, you assume that the string str is contained in _st.
What you probably want to do is:
keep track of whether the whole of str is contained in _st. You could probably check whether the string that you are searching for has length equal to the number of matching characters so far.
if you do the above then you could get the starting index by subtracting the number of matches so far from the current value of i.
One question:
Why are you not using the built-in String.indexOf() method? Is this assignment meant for you to implement this functionality on your own?
Maybe the Oracle Java API source code helps:
/**
* Returns the index within this string of the first occurrence of the
* specified substring. The integer returned is the smallest value
* <i>k</i> such that:
* <blockquote><pre>
* this.startsWith(str, <i>k</i>)
* </pre></blockquote>
* is <code>true</code>.
*
* @param str any string.
* @return if the string argument occurs as a substring within this
* object, then the index of the first character of the first
* such substring is returned; if it does not occur as a
* substring, <code>-1</code> is returned.
*/
public int indexOf(String str) {
return indexOf(str, 0);
}
/**
* Returns the index within this string of the first occurrence of the
* specified substring, starting at the specified index. The integer
* returned is the smallest value <tt>k</tt> for which:
* <blockquote><pre>
* k >= Math.min(fromIndex, this.length()) && this.startsWith(str, k)
* </pre></blockquote>
* If no such value of <i>k</i> exists, then -1 is returned.
*
* @param str the substring for which to search.
* @param fromIndex the index from which to start the search.
* @return the index within this string of the first occurrence of the
* specified substring, starting at the specified index.
*/
public int indexOf(String str, int fromIndex) {
return indexOf(value, offset, count,
str.value, str.offset, str.count, fromIndex);
}
/**
* Code shared by String and StringBuffer to do searches. The
* source is the character array being searched, and the target
* is the string being searched for.
*
* @param source the characters being searched.
* @param sourceOffset offset of the source string.
* @param sourceCount count of the source string.
* @param target the characters being searched for.
* @param targetOffset offset of the target string.
* @param targetCount count of the target string.
* @param fromIndex the index to begin searching from.
*/
static int indexOf(char[] source, int sourceOffset, int sourceCount,
char[] target, int targetOffset, int targetCount,
int fromIndex) {
if (fromIndex >= sourceCount) {
return (targetCount == 0 ? sourceCount : -1);
}
if (fromIndex < 0) {
fromIndex = 0;
}
if (targetCount == 0) {
return fromIndex;
}
char first = target[targetOffset];
int max = sourceOffset + (sourceCount - targetCount);
for (int i = sourceOffset + fromIndex; i <= max; i++) {
/* Look for first character. */
if (source[i] != first) {
while (++i <= max && source[i] != first);
}
/* Found first character, now look at the rest of v2 */
if (i <= max) {
int j = i + 1;
int end = j + targetCount - 1;
for (int k = targetOffset + 1; j < end && source[j] ==
target[k]; j++, k++);
if (j == end) {
/* Found whole string. */
return i - sourceOffset;
}
}
}
return -1;
}

Java String.indexOf and empty Strings

I'm curious why the String.indexOf is returning a 0 (instead of -1) when asking for the index of an empty string within a string.
The Javadocs only say this method returns the index in this string of the specified string, -1 if the string isn't found.
To me this behavior seems highly unexpected, I would have expected a -1. Any ideas why this unexpected behavior is going on? I would at the least think this is worth a note in the method's Javadocs...
System.out.println("FOO".indexOf("")); // outputs 0 wtf!!!
System.out.println("FOO".indexOf("bar")); // outputs -1 as expected
System.out.println("FOO".indexOf("F")); // outputs 0 as expected
System.out.println("".indexOf("")); // outputs 0 as expected, I think
The empty string is everywhere, and nowhere. It is within all strings at all times, permeating the essence of their being, yet as you seek it you shall never catch a glimpse.
How many empty strings can you fit at the beginning of a string? Mu
The student said to the teacher,
Teacher, I believe that I have found the nature of the empty string. The empty string is like a particle of dust, and it floats freely through a string as dust floats freely through the room, glistening in a beam of sunlight.
The teacher responded to the student,
Hmm. A fine notion. Now tell me, where is the dust, and where is the sunlight?
The teacher struck the student with a strap and instructed him to continue his meditation.
Well, if it helps, you can think of "FOO" as "" + "FOO".
int number_of_empty_strings_in_string_named_text = text.length() + 1
All characters are separated by an empty String. Additionally empty String is present at the beginning and at the end.
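The length() + 1 count can be checked with the two-argument indexOf, which reports an empty-string match at every position from 0 up to and including the length. The sketch below is only an illustration; EmptyStringPositions and countEmptyMatches are hypothetical names.

```java
class EmptyStringPositions {
    /** Counts the positions k at which the empty string is found in text. */
    static int countEmptyMatches(String text) {
        int count = 0;
        for (int k = 0; k <= text.length(); k++) {
            // Searching from k finds the empty string at k itself.
            if (text.indexOf("", k) == k) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countEmptyMatches("FOO")); // 4
        System.out.println("FOO".indexOf("", 2));     // 2
        System.out.println("FOO".lastIndexOf(""));    // 3
    }
}
```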
By using the expression "", you are actually referring to a null string (in the sense of a zero-length string, which is not the same thing as Java's null reference). A null string is an ethereal tag placed on something that exists only to show that there is a lack of anything at this location.
So, by saying "".indexOf( "" ), you are really asking the interpreter:
Where does a string value of null exist in my null string?
It returns a zero, since the null is at the beginning of the non-existent null string.
To add anything to the string would now make it a non-null string... null can be thought of as the absence of everything, even nothing.
Using an algebraic approach, "" is the neutral element of string concatenation: x + "" == x and "" + x == x (although + is non commutative here).
Then it must also be:
x.indexOf ( y ) == i and i != -1
<==> x.substring ( 0, i ) + y + x.substring ( i + y.length () ) == x
when y = "", this holds if i == 0 and x.substring ( 0, 0 ) == "".
I didn't design Java, but I guess mathematicians participated in it...
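The identity above can be checked mechanically. The following sketch tests it for a few pairs, including the empty-string cases; IndexOfIdentity and holds are hypothetical names, and the check only covers the direction where indexOf reports a match.

```java
class IndexOfIdentity {
    /** Checks x.substring(0, i) + y + x.substring(i + y.length()) equals x for i = x.indexOf(y). */
    static boolean holds(String x, String y) {
        int i = x.indexOf(y);
        if (i == -1) {
            return true; // the identity says nothing when y is not found
        }
        String rebuilt = x.substring(0, i) + y + x.substring(i + y.length());
        return rebuilt.equals(x);
    }

    public static void main(String[] args) {
        System.out.println(holds("FOO", ""));  // true, with i == 0
        System.out.println(holds("FOO", "O")); // true, with i == 1
        System.out.println(holds("", ""));     // true, with i == 0
    }
}
```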
If we look inside the String implementation of "foo".indexOf(""), we arrive at this method:
public int indexOf(String str) {
byte coder = coder();
if (coder == str.coder()) {
return isLatin1() ? StringLatin1.indexOf(value, str.value)
: StringUTF16.indexOf(value, str.value);
}
if (coder == LATIN1) { // str.coder == UTF16
return -1;
}
return StringUTF16.indexOfLatin1(value, str.value);
}
If we look inside of any of the called indexOf(value, str.value) methods we find a condition that says:
if the second parameter (string we are searching for) length is 0 return 0:
public static int indexOf(byte[] value, byte[] str) {
if (str.length == 0) {
return 0;
}
...
This is just defensive coding for an edge case, and it is necessary because the next method that is called to do the actual searching by comparing bytes of the string (a string is backed by a byte array) would otherwise throw an ArrayIndexOutOfBoundsException:
public static int indexOf(byte[] value, int valueCount, byte[] str, int strCount, int fromIndex) {
byte first = str[0];
...
This question is actually two questions:
Why should a string contain the empty string?
Why should the empty string be found specifically at index zero?
Answering #1:
A string contains the empty string in order to be in accordance with Set Theory, according to which:
The empty set is a subset of every set including itself.
This also means that even the empty string contains the empty string, and the following statement proves it:
assert "".indexOf( "" ) == 0;
I am not sure why mathematicians have decided that it should be so, but I am pretty sure they have their reasons, and it appears that these reasons can be explained in layman's terms, as various youtube videos seem to do, (for example, https://www.youtube.com/watch?v=1nBKadtFViM) although I have not actually viewed any of those videos, because #AintNoBodyGotNoTimeFoDat.
Answering #2:
The empty string can be found specifically at index zero of any string, because why not? In other words, if not at index zero, then at which index? Index zero is as good as any other index, and index zero is guaranteed to be a valid index for all strings except for the trifling exception of the empty string.
