indexOf (String str) - equals string to other string - java

I need to write method that will chek "String str" on other string, and return the index that the str starts.
That's sound like homework, and it is some of homework but for my use to learn for a test...
i've tried:
public int IndexOf (String str) {
for (i= 0;i<_st.length();i++)
{
if (_st.charAt(i) == str.charAt(i)) {
i++;
if (_st.charAt(i) == str.charAt(i)) {
return i;
}
}
}
return -1;
}
but i dont get the right return. why? i'm on the right way or don't even close?

I am afraid, you are not close.
Here's what you have to do:
Loop on the characters of the string (the one on which you are supposed to do an indexOf, I will call this the master) (you are going this right)
For every character check whether your other string's character and this character are the same.
If they are (a potential start of the same sequence) check whether the next characters in the master match with your String to check (You might want to loop through the elements of the string and check one by one).
If they don't match, continue with the characters in the master string
Something like:
Loop master string
for every character (using index i, lets say)
check whether this is same as first character of the other string
if it is
//potential match
loop through the characters in the child string (lets say using index j)
match them with the consecutive characters in the master string
(something like master[j+i] == sub[j])
If everything match, 'i' is what you want
otherwise, continue with the master, hoping you find a match
Some other points:
In java, method names start with a
lower case letter by convention
(meaning, the compiler won't
complain, but your fellow programmers
may). So IndexOf should actually be
indexOf
Having instance variables
(class level variables) start with a
_ (as in _st) is not a really good
practice. If your professor insists,
you may not have many options, but
keep this in mind)

Not really very close, I'm afraid. What that code basically does is check there if the two strings have two characters in the same positions at any point and, if so, returns the index of the second of those characters. E.g., if _str is "abcdefg" and str is "12cd45", you'll return 3 because they have "cd" in the same place, and that's the index of the "d". At least, that's as near as I can tell what it's actually doing. That's because you're indexing into both strings with the same indexing variable.
To re-write indexOf, looking for str within _st, you have to scan _st for the first character in str and then check whether the remaining characters match; if not, bump forward one place from where you started checking and continue your scan. (There are optimisations you can do, but that's the essence of it.) So for instance, if you find the first character of str at index 4 in _st and str is six characters long, having found the first character you need to see if the remaining five (str's indexes 1-5 inclusive) match _st's indexes 5-10 inclusive (easiest just to check all six of str's characters against a substring of _st starting at 4 and going for six charactesr). If everything matches, return the index at which you found the first character (so, 4 in that example). You can stop scanning at _st.length() - str.length() since if you haven't found it starting prior to that point, you're not going to find it at all.
Side point: Don't call the length function on every loop. The JIT may be able to optimize out the call, but if you know that _st won't change during the course of this function (and if you don't know that, you should require it), grab length() to a local and then refer to that. And of course, since you know you can stop earlier than length(), you'l use a local to remember where you can stop.

You are using i for both strings equal, but what you wan't is the first string to always start at 0 unless the character is found is the other string. Then check if the next characters are equal and so on.
Hope this helps

Your code loops through the string being searched and if the characters at position i match, it checks the next position. If the strings match at the next position, you assume that the string str is contained in _st.
What you probably want to do is:
keep track of whether the whole of str is contained in _st. You could probably check whether the string that you are searching for has length equal to the number of matching characters so far.
if you do the above then you could get the starting index by subtracting the number of matches so far from the current value of i.
One question:
Why are you not using the built in String.IndexOf() function? Is this assignment meant for you to implement this functionality on your own?

Maybe the Oracle Java API Source code does help:
/**
* Returns the index within this string of the first occurrence of the
* specified substring. The integer returned is the smallest value
* <i>k</i> such that:
* <blockquote><pre>
* this.startsWith(str, <i>k</i>)
* </pre></blockquote>
* is <code>true</code>.
*
* #param str any string.
* #return if the string argument occurs as a substring within this
* object, then the index of the first character of the first
* such substring is returned; if it does not occur as a
* substring, <code>-1</code> is returned.
*/
public int indexOf(String str) {
return indexOf(str, 0);
}
/**
* Returns the index within this string of the first occurrence of the
* specified substring, starting at the specified index. The integer
* returned is the smallest value <tt>k</tt> for which:
* <blockquote><pre>
* k >= Math.min(fromIndex, this.length()) && this.startsWith(str, k)
* </pre></blockquote>
* If no such value of <i>k</i> exists, then -1 is returned.
*
* #param str the substring for which to search.
* #param fromIndex the index from which to start the search.
* #return the index within this string of the first occurrence of the
* specified substring, starting at the specified index.
*/
public int indexOf(String str, int fromIndex) {
return indexOf(value, offset, count,
str.value, str.offset, str.count, fromIndex);
}
/**
* Code shared by String and StringBuffer to do searches. The
* source is the character array being searched, and the target
* is the string being searched for.
*
* #param source the characters being searched.
* #param sourceOffset offset of the source string.
* #param sourceCount count of the source string.
* #param target the characters being searched for.
* #param targetOffset offset of the target string.
* #param targetCount count of the target string.
* #param fromIndex the index to begin searching from.
*/
static int indexOf(char[] source, int sourceOffset, int sourceCount,
char[] target, int targetOffset, int targetCount,
int fromIndex) {
if (fromIndex >= sourceCount) {
return (targetCount == 0 ? sourceCount : -1);
}
if (fromIndex < 0) {
fromIndex = 0;
}
if (targetCount == 0) {
return fromIndex;
}
char first = target[targetOffset];
int max = sourceOffset + (sourceCount - targetCount);
for (int i = sourceOffset + fromIndex; i <= max; i++) {
/* Look for first character. */
if (source[i] != first) {
while (++i <= max && source[i] != first);
}
/* Found first character, now look at the rest of v2 */
if (i <= max) {
int j = i + 1;
int end = j + targetCount - 1;
for (int k = targetOffset + 1; j < end && source[j] ==
target[k]; j++, k++);
if (j == end) {
/* Found whole string. */
return i - sourceOffset;
}
}
}
return -1;
}

Related

What happens when if statement goes true (in this code)?

There is a problem in codingbat.com which you're supposed to remove "yak" substring from the original string. and they provided a solution for that which I can't understand what happens when the if statement goes true!
public String stringYak(String str) {
String result = "";
for (int i=0; i<str.length(); i++) {
// Look for i starting a "yak" -- advance i in that case
if (i+2<str.length() && str.charAt(i)=='y' && str.charAt(i+2)=='k') {
i = i + 2;
} else { // Otherwise do the normal append
result = result + str.charAt(i);
}
}
return result;
}
It just adds up i by 2 and what? When it appends to the result string?
Link of the problem:
https://codingbat.com/prob/p126212
The provided solution checks for all single characters in the input string. For this i is the current index of the checked character. When the current char is not a y and also the (i+2) character is not a k the current char index is advanced by 1 position.
Example:
yakpak
012345
i
So here in the first iteration the char at i is y and i+2 is a k, so we have to skip 3 chars. Keep in mind i is advanced by 1 everytime. So i has to be increased by 2 more. After this iteration i is here
yakpak
012345
i
So now the current char is no y and this char will get added to the result string.
But it's even simpler in Java as this functionality is build in with regex:
public String stringYak(String str) {
return str.replaceAll("y.k","");
}
The . means every char.
If i is pointing at a y and there is as k two positions down, then it wants to skip the full y*k substring, so it add 2 to i so i now refers to the k. WHen then loop continues, i++ will skip past the k, so in effect, the entire 3-letter y*k substring has been skipped.

How to read and return the second index of an array if first index matches a string?

I'm trying to write a translate method using the following parameters. However, every time I run the method it skips the first if statement and goes right to the second for loop.
/**
Translates a word according to the data in wordList then matches the case.
The parameter wordList contains the mappings for the translation. The data is
organized in an ArrayList containing String arrays of length 2. The first
cell (index 0) contains the word in the original language, called the key,
and the second cell (index 1) contains the translation.
It is assumed that the items in the wordList are sorted in ascending order
according to the keys in the first cell.
#param word
The word to translate.
#param wordList
An ArrayList containing the translation mappings.
#return The mapping in the wordList with the same case as the original. If no
match is found in wordList, it returns a string of Config.LINE_CHAR of the same length as word.
*/
public static String translate(String word, ArrayList<String[]> wordList) {
String newWord = "";
int i = 0;
for (i = 0; i < wordList.size(); i++) {
word = matchCase(wordList.get(i)[0], word); //make cases match
if (word.equals(wordList.get(i)[0])) { //check each index at 0
newWord = wordList.get(i)[1]; //update newWord to skip second for loop
return wordList.get(i)[1];
}
}
if (newWord == "") {
for (i = 0; i < word.length(); i++) {
newWord += Config.LINE_CHAR;
}
}
return newWord;
}
For the files I'm running, each word should have a translated word so no Config.LINE_CHAR should be printed. But this is the only thing that prints. How do I fix this.
You are initializing newWord to the value "". The only time newWord can possibly change is in the first loop, where it is promptly followed by a return statement, exiting your method. The only way your if statement can be reached is if you didn't return during the first loop, so if it reaches that if statement, then newWord must be unchanged since its initial assignment of "".
Some unrelated advice: You should use the equals operator when comparing strings. For example, if ("".equals(newWord)). Otherwise, you're comparing the memory address of the two String objects rather than their values.
You may need to share your matchCase method to ensure all bugs are addressed, though.

Finding the strings in a TreeSet that start with a given prefix

I'm trying to find the strings in a TreeSet<String> that start with a given prefix. I found a previous question asking for the same thing — Searching for a record in a TreeSet on the fly — but the answer given there doesn't work for me, because it assumes that the strings don't include Character.MAX_VALUE, and mine can.
(The answer there is to use treeSet.subSet(prefix, prefix + Character.MAX_VALUE), which gives all strings between prefix (inclusive) and prefix + Character.MAX_VALUE (exclusive), which comes out to all strings that start with prefix except those that start with prefix + Character.MAX_VALUE. But in my case I need to find all strings that start with prefix, including those that start with prefix + Character.MAX_VALUE.)
How can I do this?
To start with, I suggest re-examining your requirements. Character.MAX_VALUE is U+FFFF, which is not a valid Unicode character and never will be; so I can't think of a good reason why you would need to support it.
But if there's a good reason for that requirement, then — you need to "increment" your prefix to compute the least string that's greater than all strings starting with your prefix. For example, given "city", you need "citz". You can do that as follows:
/**
* #param prefix
* #return The least string that's greater than all strings starting with
* prefix, if one exists. Otherwise, returns Optional.empty().
* (Specifically, returns Optional.empty() if the prefix is the
* empty string, or is just a sequence of Character.MAX_VALUE-s.)
*/
private static Optional<String> incrementPrefix(final String prefix) {
final StringBuilder sb = new StringBuilder(prefix);
// remove any trailing occurrences of Character.MAX_VALUE:
while (sb.length() > 0 && sb.charAt(sb.length() - 1) == Character.MAX_VALUE) {
sb.setLength(sb.length() - 1);
}
// if the prefix is empty, then there's no upper bound:
if (sb.length() == 0) {
return Optional.empty();
}
// otherwise, increment the last character and return the result:
sb.setCharAt(sb.length() - 1, (char) (sb.charAt(sb.length() - 1) + 1));
return Optional.of(sb.toString());
}
To use it, you need to use subSet when the above method returns a string, and tailSet when it returns nothing:
/**
* #param allElements - a SortedSet of strings. This set must use the
* natural string ordering; otherwise this method
* may not behave as intended.
* #param prefix
* #return The subset of allElements containing the strings that start
* with prefix.
*/
private static SortedSet<String> getElementsWithPrefix(
final SortedSet<String> allElements, final String prefix) {
final Optional<String> endpoint = incrementPrefix(prefix);
if (endpoint.isPresent()) {
return allElements.subSet(prefix, endpoint.get());
} else {
return allElements.tailSet(prefix);
}
}
See it in action at: http://ideone.com/YvO4b3.
If anybody is looking for a shorter version of ruakh's answer:
First element is actually set.ceiling(prefix),and last - you have to increment the prefix and use set.floor(next_prefix)
public NavigableSet<String> subSetWithPrefix(NavigableSet<String> set, String prefix) {
String first = set.ceiling(prefix);
char[] chars = prefix.toCharArray();
if(chars.length>0)
chars[chars.length-1] = (char) (chars[chars.length-1]+1);
String last = set.floor(new String(chars));
if(first==null || last==null || last.compareTo(first)<0)
return new TreeSet<>();
return set.subSet(first, true, last, true);
}

Is there an algorithm for fuzzy search like Levenshtein Distance specialised for a set of ordered character?

I found an algorithm (on https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance) and after reading a bit more about levenshtein, I understood there should be a better way of telling the edit distance of twwo strings if these strings are strictly composed of ascii-aphabetically ordered and unique chars.
Meaning, for every a and b like a < b, a will be prior to b, and the reciprocal (or contraposed or I don't remember) for every a, b, and c like a < b < c, if one strings reads ac and the other ab, one knows for sure the first one does not contain the b.
And that precisely means there is a better way of determining the edit distance between two strings of this kind.
If it is any useful, the class I'm using to organize my characters is a TreeSet of Character.
This is the solution I came up with :
I used it with values :
String tested, String target, 1, 0.
/** returns the cost of the difference between a tested CharSequence and a target CharSequence. CS = CharSequence
* #param tested input, the CS which will be compared to the target. all letters sorted by ASCII order and unique
* #param target is the CS to which the tested will be compared. all letters sorted by ASCII order and unique
* #param positiveDifferenceCost is the cost to add when a letter is in the tested CS but not in the target.
* #param negativeDifferenceCost is the cost to add when a letter is in the target CS but not in the tested.
* #return int the number of differences.
*/
public static int oneSidedOAUDistance(final CharSequence tested, final CharSequence target,
final int positiveDifferenceCost, final int negativeDifferenceCost) {
int diffCount = 0;
int index_tested = 0;
int index_target = 0;
if (positiveDifferenceCost == 0 && negativeDifferenceCost == 0)
return 0;
for (; index_tested < tested.length() && index_target < target.length(); ) {
if (tested.charAt(index_tested) == target.charAt(index_target)) {
++index_tested;
++index_target;
continue;
}
if (tested.charAt(index_tested) < target.charAt(index_target)) {
//some letters should not be there in tested string.
diffCount+= positiveDifferenceCost;
index_tested++;
} else {
//some letters miss in tested string.
diffCount+=negativeDifferenceCost;
index_target++;
}
}
return diffCount;
}

Why do Strings start with a "" in Java? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Why does “abcd”.StartsWith(“”) return true?
Whilst debugging through some code I found a particular piece of my validation was using the .startsWith() method on the String class to check if a String started with a blank character
Considering the following :
public static void main(String args[])
{
String s = "Hello";
if (s.startsWith(""))
{
System.out.println("It does");
}
}
It prints out It does
My question is, why do Strings start off with a blank character? I'm presuming that under the hood Strings are essentially character arrays, but in this case I would have thought the first character would be H
Can anyone explain please?
"" is an empty string containing no characters. There is no "empty character", unless you mean a space or the null character, neither of which are empty strings.
You can think of a string as starting with an infinite number of empty strings, just like you can think of a number as starting with an infinite number of leading zeros without any change to the meaning.
1 = ...00001
"foo" = ... + "" + "" + "" + "foo"
Strings also end with an infinite number of empty strings (as do decimal numbers with zeros):
1 = 001.000000...
"foo" = "foo" + "" + "" + "" + ...
Seems like there is a misunderstanding in your code. Your statement s.startsWith("") checks if string starts with an empty string (and not a blank character). It may be a weird implementation choice, anyway, it's as is : all strings will say you they start with an empty string.
Also notice a blank character will be the " " string, as opposed to your empty string "".
"Hello" starts with "" and it also starts with "H" and it also starts with "He" and it also sharts with "Hel" ... do you see?
That "" is not a blank it's an empty string. I guess that the API is asking the question is this a substring of that. And the zero-length empty string is a substring of everything.
The empty String ("") basically "satisfies" every string. In your example, java calls
s.startsWith("");
to
s.startsWith("", 0);
which essentially follows the principle that "an empty element(string) satisfies its constraint (your string sentence).".
From String.java
/**
* Tests if the substring of this string beginning at the
* specified index starts with the specified prefix.
*
* #param prefix the prefix.
* #param toffset where to begin looking in this string.
* #return <code>true</code> if the character sequence represented by the
* argument is a prefix of the substring of this object starting
* at index <code>toffset</code>; <code>false</code> otherwise.
* The result is <code>false</code> if <code>toffset</code> is
* negative or greater than the length of this
* <code>String</code> object; otherwise the result is the same
* as the result of the expression
* <pre>
* this.substring(toffset).startsWith(prefix)
* </pre>
*/
public boolean startsWith(String prefix, int toffset) {
char ta[] = value;
int to = offset + toffset;
char pa[] = prefix.value;
int po = prefix.offset;
int pc = prefix.count;
// Note: toffset might be near -1>>>1.
if ((toffset < 0) || (toffset > count - pc)) {
return false;
}
while (--pc >= 0) {
if (ta[to++] != pa[po++]) {
return false;
}
}
return true;
}
For folks who have taken automata theory, this makes sense because the empty string ε is a substring of any string and also is the concatenation identity element, ie:
for all strings x, ε + x = x, and x + ε = x
So yes, every string "startWith" the empty string. Also note (as many others said it), the empty string is different from a blank or null character.
A blank is (" "), that's different from an empty string (""). A blank space is a character, the empty string is the absence of any character.
An empty string is not a blank character. Assuming your question with empty string, I guess they decided to leave it that way but it does seem odd. They could have checked the length but they didn't.

Categories