Find char index where contains() found sequence - java

if(input.contains("Angle ")) {
input.charAt(?);
}
So, basically, how would you find the char directly after "Angle "? In the simplest possible terms, how do you find the index at which "Angle " was found?

You can use the indexOf method both to find out whether the input contains the string and to find its index:
int pos = input.indexOf("Angle ");
if (pos >= 0) {
// Substring is found at index pos; the char right after "Angle " is at pos + "Angle ".length()
}
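Putting the two answers together, a minimal sketch of "find the char directly after the marker" might look like this. The class and helper names (AngleDemo, charAfter) are hypothetical; returning a nullable Character is just one way to signal "no such char", chosen here for brevity.

```java
public class AngleDemo {
    // Returns the character right after the first occurrence of marker,
    // or null if the marker is missing or nothing follows it.
    static Character charAfter(String input, String marker) {
        int pos = input.indexOf(marker);
        if (pos < 0) {
            return null; // marker not present
        }
        int next = pos + marker.length(); // index of the first char after the marker
        if (next >= input.length()) {
            return null; // marker is at the very end, nothing follows
        }
        return input.charAt(next);
    }

    public static void main(String[] args) {
        System.out.println(charAfter("Turn Angle 90 degrees", "Angle ")); // 9
        System.out.println(charAfter("No marker here", "Angle "));        // null
        System.out.println(charAfter("Ends with Angle ", "Angle "));      // null
    }
}
```

The bounds check matters: without it, input.charAt(next) throws StringIndexOutOfBoundsException when "Angle " is the last thing in the string.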

Have you tried the indexOf() method?
From the Javadoc:
Returns the index within this string of the first occurrence of the
specified substring. The integer returned is the smallest value k such
that: this.startsWith(str, k) is true.
Then, since you know the length of the string, you could add that length to the returned index to find the char directly after "Angle ".

To find the word following "Angle ", you could use regex:
String next = str.replaceAll(".*Angle (\\w+).*", "$1");
Then you don't have to sully yourself with indexes, iteration and lots of code.
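One caveat with the replaceAll approach: if nothing matches, replaceAll returns the input unchanged, so you cannot easily tell "no match" apart from a real result. Also, the greedy .* means it picks the word after the last "Angle ", and . does not match line terminators by default. A sketch using Pattern and Matcher.find() instead (NextWordDemo and nextWord are hypothetical names) returns the word after the first "Angle " or null:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NextWordDemo {
    // "Angle " followed by one or more word characters; group 1 is the word.
    private static final Pattern ANGLE_WORD = Pattern.compile("Angle (\\w+)");

    // Returns the word after the first "Angle ", or null if there is none.
    static String nextWord(String str) {
        Matcher m = ANGLE_WORD.matcher(str);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(nextWord("Set Angle 45 now")); // 45
        System.out.println(nextWord("No match here"));    // null

        // replaceAll returns the input unchanged when nothing matches...
        System.out.println("No match here".replaceAll(".*Angle (\\w+).*", "$1")); // No match here
        // ...and its greedy .* selects the last "Angle ", while find() stops at the first
        System.out.println("Angle a Angle b".replaceAll(".*Angle (\\w+).*", "$1")); // b
        System.out.println(nextWord("Angle a Angle b")); // a
    }
}
```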

As others have stated, you may use indexOf to find the location of the substring. If you have more than one occurrence of the substring and you want to find all of them, you can use the version of indexOf that takes a starting position to continue the search after the current occurrence, e.g. to find all occurrences of needle in haystack:
int index = 0;
while ((index = haystack.indexOf(needle, index)) != -1) {
System.out.println("Found substring at " + index);
index += needle.length();
}
Note, by the way, that .contains(needle) is essentially the same as .indexOf(needle) > -1 (in fact, that is precisely how contains() is implemented).
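If you need the positions rather than just printing them, the same loop can collect them into a list. This is a sketch with hypothetical names (AllIndexesDemo, allIndexesOf); the guard for an empty needle is an addition of mine, because indexOf("", i) returns i, so index += 0 would loop forever.

```java
import java.util.ArrayList;
import java.util.List;

public class AllIndexesDemo {
    // Start index of every non-overlapping occurrence of needle in haystack.
    static List<Integer> allIndexesOf(String haystack, String needle) {
        List<Integer> result = new ArrayList<>();
        if (needle.isEmpty()) {
            return result; // "" matches at every position; avoid an infinite loop
        }
        int index = 0;
        while ((index = haystack.indexOf(needle, index)) != -1) {
            result.add(index);
            index += needle.length(); // continue searching after this occurrence
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(allIndexesOf("Angle 1, Angle 2, Angle 3", "Angle ")); // [0, 9, 18]
        // contains() agrees with indexOf(...) >= 0
        System.out.println("haystack".contains("st") == ("haystack".indexOf("st") >= 0)); // true
    }
}
```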

Related

How many times the word is used on the html page

I have a method that should return an integer which is the number of uses of the searchWord in the text of an HTML document:
public int searchForWord(String searchWord) {
int count = 0;
if(this.htmlDocument == null){
System.out.println("ERROR! Call crawl() before performing analysis on the document");
}
System.out.println("Searching for the word " + searchWord + "...");
String bodyText = this.htmlDocument.body().text();
if (bodyText.toLowerCase().contains(searchWord.toLowerCase())){
count++;
}
return count;
}
But my method always returns count=1, even if the word is used several times. I understand that the error should be obvious, but I’m stuck and I don’t see it.
You are currently checking only once whether the text contains the search word, so the count will always be either 0 or 1. To find the total count, keep looping with String#indexOf(str, fromIndex), whose second argument is the index to start searching from, until the String can no longer be found.
public int searchForWord(String searchWord) {
int count = 0;
if (this.htmlDocument == null) {
System.out.println("ERROR! Call crawl() before performing analysis on the document");
return count; // no document to search; avoids a NullPointerException below
}
System.out.println("Searching for the word " + searchWord + "...");
// lower-case both sides to keep the case-insensitive matching of the original method
String bodyText = this.htmlDocument.body().text().toLowerCase();
String word = searchWord.toLowerCase();
// each search starts one past the previous match; the loop body is empty because count++ runs in the update clause
for (int idx = -1; (idx = bodyText.indexOf(word, idx + 1)) != -1; count++);
return count;
}
According to the Java docs for String#contains:
Returns true if and only if this string contains the specified sequence of char values.
You're asking if the word you're looking for is contained in the document, which it is.
You could:
Split the text on words (splitting it by spaces) and then count how many times it appears
Iterate over the String using String#indexOf, starting at index 0 and then searching from just after the last index you found, until the end of the String.
Iterate over the String using contains, but only on the part of the String after a certain index (managing that index yourself).
I'd go for the 2nd approach as it seems like the easiest one.
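For comparison, the 1st approach (split into words, then count) could be sketched as below. Note that it counts whole words, whereas the indexOf loop counts substrings, so "cat" inside "catalog" is counted by indexOf but not here. WordCountDemo and countWord are hypothetical names, and treating a "word" as a run of Unicode letters and digits is an assumption.

```java
import java.util.Locale;

public class WordCountDemo {
    // Counts whole-word, case-insensitive matches of searchWord by splitting
    // the text on runs of characters that are neither letters nor digits.
    static int countWord(String text, String searchWord) {
        int count = 0;
        String target = searchWord.toLowerCase(Locale.ROOT);
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.equals(target)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String body = "The cat sat. A cat, another Cat, and a catalog.";
        System.out.println(countWord(body, "cat")); // 3 ("catalog" is not counted)
    }
}
```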
These are only conditional statements; you aren't looping through the HTML text. Therefore, if it finds an instance of searchWord in bodyText, it increments count once and then exits the method with a value of 1. I suggest looping through every word in the HTML, adding it to an array, and counting it that way, using something like this:
char[] bodyTextA = bodyText.toCharArray();
Or keep it in a String array by splitting on a space, a newline, or whatever criteria you have. Example with whitespace:
// puts "Hello", "I'm", "your" and "String" into their own slots in the array
String str = "Hello I'm your String";
String[] split = str.split("\\s+"); // "\\s+" splits on any run of whitespace
Your issue here is that the if statement checks whether the text contains the word and then increments your count variable. So even if the text contains the word multiple times, your logic is basically: if it contains it at all, increase count by one. You will have to rewrite your code to check for multiple occurrences of the word. There are many ways to go about this: you could loop through the entire body text, you could split the body text into an array of words and check that, or you could remove the search word from the text each time you find it and keep checking until it no longer contains the search word.
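A caution on the "remove the search word and keep checking" idea: deleting a match in place can glue the surrounding text into a new match (removing "ab" from "aabb" leaves "ab"). A safer variant of the same idea, sketched here with hypothetical names, cuts the text just after each match instead of deleting the match from the middle:

```java
public class CountByCuttingDemo {
    // Counts non-overlapping occurrences by repeatedly cutting the text
    // just after each match, so removed text can never form a new match.
    static int countByCutting(String text, String word) {
        if (word.isEmpty()) {
            return 0; // guard: an empty word would be "found" forever
        }
        int count = 0;
        int idx;
        while ((idx = text.indexOf(word)) != -1) {
            count++;
            text = text.substring(idx + word.length()); // drop everything up to and including the match
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countByCutting("aabb", "ab"));   // 1
        System.out.println(countByCutting("banana", "an")); // 2
        // Deleting the match in place creates a new one:
        System.out.println("aabb".replaceFirst("ab", ""));  // ab
    }
}
```

If you do use replaceFirst, keep in mind that its first argument is a regex, so a search word containing characters like . or ( should be wrapped with Pattern.quote.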
You can use the two-argument indexOf(String, int), passing the index of the last found occurrence:
public int searchForWord(String searchWord) {
int count = 0;
if (this.htmlDocument == null) {
System.out.println("ERROR! Call crawl() before performing analysis on the document");
return count; // nothing to search; avoids a NullPointerException below
}
System.out.println("Searching for the word " + searchWord + "...");
String bodyText = this.htmlDocument.body().text();
int index = -1; // start at -1 so the first search begins at index 0 (starting at 0 would skip a match at the very start)
while ((index = bodyText.indexOf(searchWord, index + 1)) != -1) {
count++;
}
return count;
}

indexOf method asking for a char that appears multiple times?

String str = "Aardvark";
str.indexOf('a');
I was wondering what index indexOf would return when asked for a character that the string contains more than once. For example, for "aardvark", would the method return index 0, the first instance it sees? There are 3 'a' chars in the word, so which one would it return?
One additional question (I couldn't fit it into the original question):
What is the difference between
str.indexOf('a');
and
str.indexOf("a");
I know the first is a char and the second is a String, but if str = "Aardvark", wouldn't the second statement return -1 or some sort of error, because "a" refers to a single-character String, not one char of a string?
I'm very sorry if this was unclear, I couldn't really think of a better way to pose my question. Thanks in advance!
indexOf() will return the index of the first occurrence of the string/char.
Like you say, one looks for a char and the other for a substring. "a" will be found, because "a" is a substring of "Aardvark".
It would return the first occurrence.
To get the second occurrence, you would have to call
indexOf(int ch, int fromIndex);
passing an index just after the first occurrence as fromIndex. indexOf can take those two parameters instead of just the char.
Link to API Doc:
https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#indexOf(java.lang.String,%20int)
Here is a simple example:
String text = "abcd_a";
System.out.println("Index of a: "+ text.indexOf('a')); // Index of a: 0
System.out.println("Index of a: "+ text.indexOf("a")); // Index of a: 0
System.out.println("Index of b: "+ text.indexOf('b')); // Index of b: 1
System.out.println("Index of c: "+ text.indexOf('c')); // Index of c: 2
System.out.println("Index of z: "+ text.indexOf('z')); // Index of z: -1
Simple indexOf:
indexOf(char/String) always returns the index of the first occurrence.
From index:
There is also indexOf(char/String, int fromIndex), which searches starting from a given position in your string.
Last index:
There is lastIndexOf(char/String), which finds the last occurrence.
Regarding char vs String, I would use the char version if I only need to look up a single char. It can be faster than the String lookup methods, since it does not have to match a substring.
Java String Spec
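Applied to the exact string from the question, the methods above behave as follows. Note that str is "Aardvark" with a capital A, and indexOf is case-sensitive, so the first 'a' is at index 1, not 0. IndexOfDemo is just a wrapper class name for the example.

```java
public class IndexOfDemo {
    public static void main(String[] args) {
        String str = "Aardvark"; // A=0, a=1, r=2, d=3, v=4, a=5, r=6, k=7

        // case-sensitive: the capital 'A' at index 0 is not a match for 'a'
        System.out.println(str.indexOf('a'));   // 1
        // a one-character String is found the same way, no error
        System.out.println(str.indexOf("a"));   // 1

        // search again from just after the first match to get the second 'a'
        int first = str.indexOf('a');
        System.out.println(str.indexOf('a', first + 1)); // 5

        // lastIndexOf finds the last occurrence
        System.out.println(str.lastIndexOf('a')); // 5

        // there is no 'a' at or after index 6
        System.out.println(str.indexOf('a', 6));  // -1
    }
}
```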

Strange behavior of Java String split() method

I have a method which takes a string parameter, splits the string by #, and after splitting prints the length of the array along with the array elements. Below is my code:
public void StringSplitTesting(String inputString) {
String tokenArray[] = inputString.split("#");
System.out.println("tokenArray length is " + tokenArray.length
+ " and array elements are " + Arrays.toString(tokenArray));
}
Case I : Now when my input is abc# the output is tokenArray length is 1 and array elements are [abc]
Case II : But when my input is #abc the output is tokenArray length is 2 and array elements are [, abc]
But I was expecting the same output for both cases. What is the reason behind this implementation? Why is the split() method behaving like this? Could someone give me a proper explanation?
One aspect of the behavior of the one-argument split method can be surprising: trailing empty strings are discarded from the returned array. The Javadoc says:
Trailing empty strings are therefore not included in the resulting array.
To get a length of 2 for each case, you can pass in a negative second argument to the two-argument split method, which means that the length is unrestricted and no trailing empty strings are discarded.
Just take a look at the documentation:
Trailing empty strings are therefore not included in the resulting
array.
So in case 1, the result would be {"abc", ""}, but Java drops the trailing empty String.
If you don't want the trailing empty String to be discarded, you have to use split("#", -1).
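A quick way to see both cases side by side, with the one-argument form (which behaves like limit 0) and with a negative limit. SplitDemo is only a wrapper class name.

```java
import java.util.Arrays;

public class SplitDemo {
    public static void main(String[] args) {
        // one-argument split (limit 0): trailing empty strings are removed, leading ones are kept
        System.out.println(Arrays.toString("abc#".split("#")));     // [abc]
        System.out.println(Arrays.toString("#abc".split("#")));     // [, abc]
        // negative limit: nothing is removed, so both inputs give length 2
        System.out.println(Arrays.toString("abc#".split("#", -1))); // [abc, ]
        System.out.println(Arrays.toString("#abc".split("#", -1))); // [, abc]
    }
}
```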
The observed behavior can be traced through the way split() uses the substring() method in Java:
This is the core of the implementation of split():
while ((next = indexOf(ch, off)) != -1) {
if (!limited || list.size() < limit - 1) {
list.add(substring(off, next));
off = next + 1;
} else { // last one
//assert (list.size() == limit - 1);
list.add(substring(off, value.length));
off = value.length;
break;
}
}
The key to understanding the behavior of the above code is to understand the behavior of the substring() method:
From the Javadocs:
String java.lang.String.substring(int beginIndex, int endIndex)
Returns a new string that is a substring of this string. The substring
begins at the specified beginIndex and extends to the character at index
endIndex - 1. Thus the length of the substring is endIndex-beginIndex.
Examples:
"hamburger".substring(4, 8) returns "urge" (not "urger")
"smiles".substring(1, 5) returns "mile" (not "miles")
Hope this helps.
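As far as I can tell from the OpenJDK 7/8 source of String.split, the loop quoted above explains the leading empty string ("#abc" yields substring(0, 0), which is ""), but the trailing one is handled afterwards: the remaining segment is appended (for "abc#" that is ""), and then, only when limit is 0, trailing empty strings are stripped explicitly. The following simplified re-implementation for a single non-regex delimiter character is a sketch of that logic; SplitSketch and splitOnChar are hypothetical names, and this is not the JDK code itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitSketch {
    // Simplified equivalent of s.split(String.valueOf(ch), limit) for a delimiter
    // character that has no special meaning in regular expressions (e.g. '#').
    static String[] splitOnChar(String s, char ch, int limit) {
        boolean limited = limit > 0;
        List<String> list = new ArrayList<>();
        int off = 0;
        int next;
        while ((next = s.indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(s.substring(off, next)); // for "#abc" the first piece is substring(0, 0) = ""
                off = next + 1;
            } else { // last allowed piece: keep the rest of the string as-is
                list.add(s.substring(off));
                off = s.length();
                break;
            }
        }
        if (off == 0) {
            return new String[] {s}; // no delimiter found
        }
        if (!limited || list.size() < limit) {
            list.add(s.substring(off)); // remaining segment; for "abc#" this is ""
        }
        int resultSize = list.size();
        if (limit == 0) {
            // the one-argument split() uses limit 0: strip trailing empty strings
            while (resultSize > 0 && list.get(resultSize - 1).isEmpty()) {
                resultSize--;
            }
        }
        return list.subList(0, resultSize).toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnChar("abc#", '#', 0)));  // [abc]
        System.out.println(Arrays.toString(splitOnChar("#abc", '#', 0)));  // [, abc]
        System.out.println(Arrays.toString(splitOnChar("abc#", '#', -1))); // [abc, ]
    }
}
```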

In following program, what is the purpose of the while loop?

There are no problems with the compilation, but whether or not I have the while loop in place, the result is the same. I can't understand why the while loop is included. BTW, this is just an example program from the Java SE tutorial:
public class ContinueWithLabelDemo {
public static void main(String[] args) {
String searchMe = "Look for a substring in me";
String substring = "sub";
boolean foundIt = false;
int max = searchMe.length() - substring.length();
test:
for (int i = 0; i <= max; i++) {
int n = substring.length();
int j = i;
int k = 0;
while (n-- != 0) { // WTF???
if (searchMe.charAt(j++) != substring.charAt(k++)) {
continue test;
}
}
foundIt = true;
break test;
}
System.out.println(foundIt ? "Found it" : "Didn't find it");
}
}
You can replace your
while (n-- != 0) { // WTF???
with
System.out.println("outside loop");
while (n-- != 0) { // WTF???
System.out.println("inside loop: comparing "
+ searchMe.charAt(j) + ":" + substring.charAt(k));
to see how this example works. Below is a short explanation.
This code searches for substring in the searchMe string. Take a look at this example:
Look for a substring in me
^
sub
If you compare the characters at position 0 in searchMe and substring, you will notice that they are not the same (L != s), so we can skip matching the rest of the letters and go to the next position (that is the purpose of continue test;).
Look for a substring in me
 ^
 sub
Now we compare the next letter of searchMe with the first letter of substring. This time we get o != s, so there is no way that substring starts at this position; let's carry on.
After a few more comparisons we finally find a promising place:
Look for a substring in me
           ^
           sub
where the first letter of substring is the same as the current letter in searchMe (s == s), so we don't leave the while loop yet and check the next letter. And we have another success:
Look for a substring in me
            ^
           sub
because u == u, so we continue the loop until we have iterated over the entire substring, which happens in the next step:
Look for a substring in me
             ^
           sub
And this time we compare b with b. Since they are equal and there are no more letters in substring to check, we can set foundIt to true and break out of the test loop.
And that is the end.
If you remove the while loop (keeping only a single comparison), you will get a positive result as soon as you find a character that matches the first letter of substring. In your case, after checking "Look for a ", the program will match the s at the start of "substring" with the first letter of substring, which is also s, so the output looks the same.
The while loop is used here to iterate over the entire substring; only when corresponding characters fail to match do we move the search one position forward. If we ignored this inner loop and just iterated over the entire data once, we could miss some positive results, for example when looking for aab in the String aaab. Take a look:
aaab
aab
^^
The two characters marked with ^ match, but after them we have to match a with b, which fails. Without the inner while loop we would probably start the next match from the last checked position, the one that failed, which would be
aaab
  aab
  ^
This time we also fail to find a match for substring, so we have skipped the aab that starts at index 1.
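The difference can be checked in code. Below, bruteForce follows the tutorial's strategy (after any mismatch, retry from the next start position i + 1), while resumeAtFailure is a deliberately flawed, hypothetical variant that resumes from the position where the mismatch happened. Both method names and the class name are illustrative only.

```java
public class RestartDemo {
    // Tutorial-style search: compare the whole pattern at each start position i,
    // and after a mismatch retry from i + 1.
    static int bruteForce(String text, String pattern) {
        for (int i = 0; i <= text.length() - pattern.length(); i++) {
            int k = 0;
            while (k < pattern.length() && text.charAt(i + k) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                return i;
            }
        }
        return -1;
    }

    // Flawed variant (for illustration): after a mismatch, jump to the position
    // that failed, skipping start positions that could still begin a match.
    static int resumeAtFailure(String text, String pattern) {
        int i = 0;
        while (i <= text.length() - pattern.length()) {
            int k = 0;
            while (k < pattern.length() && text.charAt(i + k) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                return i;
            }
            i += Math.max(k, 1); // skips index 1 when searching "aab" in "aaab"
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(bruteForce("aaab", "aab"));      // 1
        System.out.println(resumeAtFailure("aaab", "aab")); // -1
    }
}
```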

Finding index of an array while only knowing half the value

sender.sendMessage("Your referal code is: " + codestring[ArrayUtils.indexOf(namestring, value )]);
The value is equal to "name" plus a random number. How can I make this work without knowing the second part of the string?
Iterate through the array and check each element with startsWith():
static int indexOfPrefix(String[] array, String key) {
for (int index = 0; index < array.length; index++) {
if (array[index].startsWith(key)) { return index; }
}
return -1; // not found
}
I'm not sure I understood the question, but if you're trying to find a String knowing only its first characters, you could check it with a regular expression, like:
for (String string : arrayOfStrings) {
// "beginningOfString" followed by one or more digits; matches() must cover the whole string
if (string.matches("beginningOfString[0-9]+")) {
// your code
}
}
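Combining both answers with the question's lookup, a sketch might look like the following. The arrays are hypothetical stand-ins for the question's namestring and codestring, and the stricter regex check is an assumption that the suffix is always digits. It shows why startsWith alone can pick the wrong entry ("namesake" also starts with "name").

```java
import java.util.regex.Pattern;

public class PrefixLookupDemo {
    // Index of the first element that starts with prefix, or -1 if none does.
    static int indexOfPrefix(String[] array, String prefix) {
        for (int i = 0; i < array.length; i++) {
            if (array[i].startsWith(prefix)) {
                return i;
            }
        }
        return -1;
    }

    // Stricter variant: prefix followed by one or more digits and nothing else.
    static int indexOfPrefixAndNumber(String[] array, String prefix) {
        Pattern p = Pattern.compile(Pattern.quote(prefix) + "\\d+");
        for (int i = 0; i < array.length; i++) {
            if (p.matcher(array[i]).matches()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical data standing in for the question's namestring/codestring arrays
        String[] namestring = {"alice", "namesake", "name4821"};
        String[] codestring = {"A1", "N1", "N2"};

        System.out.println(indexOfPrefix(namestring, "name"));          // 1 ("namesake")
        int idx = indexOfPrefixAndNumber(namestring, "name");
        if (idx >= 0) {
            System.out.println("Your referral code is: " + codestring[idx]); // Your referral code is: N2
        } else {
            System.out.println("No matching name found");
        }
    }
}
```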
