How do I manually find a substring in a string? (Java)


public int lookFor(String s) {
final int EXIST = 1;
final int NOT_EXIST = -1;
int thisIndex = 0;
int otherIndex = 0;
char thisNext;
char otherNext;
if (s == null || s.length() == 0)
return NOT_EXIST;
for(; thisIndex < this.mainString.length() ; ) {
thisNext = this.mainString.charAt(thisIndex);
otherNext = s.charAt(otherIndex);
if (thisNext == otherNext) {
thisIndex++;
otherIndex++;
}
else if (thisNext != otherNext)
thisIndex++;
if (otherIndex == s.length()-1)
return EXIST;
}
return NOT_EXIST;
}
This is my attempt so far.
mainString = the main string I want to find the substring in.
s = the substring.
So my idea was to take the first char of both strings and see if they are equal. If they aren't, I take the second char of mainString and compare it with the first char of s. If those aren't equal, I take the third char of mainString, and so forth. Once they are equal, I take the next char of both strings and see if those are equal too.
Basically, the loop knows that mainString contains s when the index into s equals the length of s minus one (that means the loop has gone all the way to the last char of s, so s index == s length - 1).
Is the logic I'm trying to use incorrect, or did I just implement it badly? I'd be happy to get answers!

Here's my naïve approach:
private static final int EXIST = 1;
private static final int NOT_EXIST = -1;
private int lookFor(String a, String b, int index) {
for (int i = 0; i < b.length(); i++) {
if ((index + i) >= a.length()) return NOT_EXIST;
if (a.charAt(index + i) != b.charAt(i)) return NOT_EXIST;
}
return EXIST;
}
public int lookFor(String a, String b) {
char start = b.charAt(0);
for (int i=0; i < a.length(); i++) {
if (a.charAt(i) == start) {
if (lookFor(a, b, i) == EXIST) return EXIST;
}
}
return NOT_EXIST;
}
Though, I'm not sure why you would do this when you could just do:
int ret = a.contains(b) ? EXIST : NOT_EXIST;
However I wanted to actually answer your question.
Here's a slightly improved version that satisfies your "all in one method" requirement.
public static int lookFor(String a, String b) {
// Fancy way of preventing errors when one of the strings is empty
boolean az = a.length() == 0;
boolean bz = b.length() == 0;
if (az ^ bz) return NOT_EXIST;
// Need this next line if you want to interpret two empty strings as containing each other
if (az && bz) return EXIST;
char start = b.charAt(0);
// This is known as a "label". Some say it's bad practice.
outer:
for (int i=0; i < a.length(); i++) {
if (a.charAt(i) == start) {
// Instead of using two methods, we can condense it like so
for (int q = 0; q < b.length(); q++) {
if ((i + q) >= a.length()) continue outer;
if (a.charAt(i + q) != b.charAt(q)) continue outer;
}
return EXIST;
}
}
return NOT_EXIST;
}

To find a substring "by hand", you need a nested loop; i.e. a loop inside a loop.
The outer loop tries all of the possible start positions for the substring in the string.
For a given start position, the inner loop tests all of the characters of the string that you are looking for against the string you are searching in.
In the naive substring search algorithm, the outer loop starts at index zero and increments the index by one until it gets to the end of the string being searched. This can be improved on:
Every non-null string "contains" the empty string. It may be worth treating this as a special case.
It is easy to see that the outer loop can usually stop before it reaches the end. If you are searching for a string of length (say) 3, then the outer loop can stop 3 characters from the end, because no match can start any later than that. (Think about it ...)
There are some clever algorithms which allow the outer loop to skip over some indexes. If you are interested, start by Googling for "Boyer-Moore string search".
(Note: the looping could be replaced with / written using recursion, but it is still there.)
Your code doesn't have a nested loop. By my reading, it is only going to find a match if the string you are searching for is at the start of the string you are searching. That is not correct.
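A minimal sketch of the nested-loop search described above, including two of the suggested improvements: the empty string is treated as a special case, and the outer loop stops at the last start position where the whole pattern still fits (n - m). The class name SubstringSearch and method name contains are hypothetical, and it returns a boolean rather than the EXIST/NOT_EXIST codes used in the question.

```java
public class SubstringSearch {

    // Returns true if needle occurs somewhere in haystack (naive O(n*m) search).
    static boolean contains(String haystack, String needle) {
        int n = haystack.length();
        int m = needle.length();
        // Every string "contains" the empty string.
        if (m == 0) {
            return true;
        }
        // Outer loop: every possible start position. The last start position
        // where the whole needle can still fit is n - m.
        for (int start = 0; start <= n - m; start++) {
            // Inner loop: compare the needle character by character.
            int k = 0;
            while (k < m && haystack.charAt(start + k) == needle.charAt(k)) {
                k++;
            }
            if (k == m) {
                return true; // every character of needle matched
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(contains("HelloWorld", "oWo")); // true
        System.out.println(contains("HelloWorld", "xyz")); // false
        System.out.println(contains("ab", "abc"));         // false: needle longer than haystack
    }
}
```

Because the inner loop restarts from the beginning of the needle at every new start position, a partial match that fails does not leave any stale state behind, which is exactly what the single-loop attempt in the question is missing.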

Related

Iterate over two strings checking to see if it matches with its pair

I am still new to Java, and I am currently working on a program that will take two strings as arguments and return the number of mismatched pairs. For my program I am working with ATGC because in science, A always pairs with T and G always pairs with C. I can't quite figure out how to iterate over the strings and check that the first character in string one (T, for example) matches up with its intended pair (A); if it doesn't, it is a mismatched pair and should be added to a counter to be totaled at the end. I believe I can use something called charAt(), but I am unsure of how that works.
I also need to figure out how to take the absolute value of counter before it is added to finalCounter. The main reason is that I just want the length difference between the two strings, without having to worry about which one is longer.
Any help would be greatly appreciated!
public class CountMismatches {
public static void main(String[] args) {
{
String seq1 = "TTCGATGGAGCTGTA";
String seq2 = "TAGCTAGCTCGGCATGA";
System.out.println(count_mismatches(seq1, seq2));
// expected to print out 5 because there are 3 mismatched pairs and 2 that do not have a pair
}
}
public static int count_mismatches(String seq1, String seq2) {
int mismatchCount = 0;
int counter = seq1.length() - seq2.length();
int finalCounter = mismatchCount + counter;
for(int i = 0; i < seq1.length(); i++) if (seq1.charAt(i) == seq2.charAt(i)) {
break; //checks to see if the length of seq1 and seq2 are the same
}
for(int i = 0; i < seq1.length(); i++) if (seq1.charAt(i) != seq2.charAt(i)) {
return counter; //figure out how to do absolute value for negative numbers
}
return finalCounter;
}
}

Since you want to count only the places where there are differences, you can iterate through the minimum length present in both the strings and find out the places where they are different.
In the end, you can add the absolute difference between the lengths of seq1 and seq2 and return that value to the main function.
For the logic, all you have to do is apply 4 if conditions to check whether the character is A, G, C or T and whether its matching pair is present in the other string.
public class CountMismatches {
public static void main(String[] args) {
{
String seq1 = "TTCGATGGAGCTGTA";
String seq2 = "TAGCTAGCTCGGCATGA";
System.out.println(count_mismatches(seq1, seq2));
}
}
public static int count_mismatches(String seq1, String seq2) {
int finalCounter = 0;
for (int i = 0; i < Math.min(seq1.length(), seq2.length()); i++) {
char c1 = seq1.charAt(i);
char c2 = seq2.charAt(i);
if (c1 == 'A') {
if (c2 == 'T')
continue;
else
finalCounter++;
} else if (c1 == 'T') {
if (c2 == 'A')
continue;
else
finalCounter++;
} else if (c1 == 'G') {
if (c2 == 'C')
continue;
else
finalCounter++;
} else if (c1 == 'C') {
if (c2 == 'G')
continue;
else
finalCounter++;
}
}
return finalCounter + (Math.abs(seq1.length() - seq2.length()));
}
}
and the output is as follows:
5

Make these refactorings:
To make the comparisons easy to code and understand, create a Map whose entries are each pair (in both directions)
Iterate over the Strings up to the length of the shortest one, adding up the number of matching pairs as you go
The result is the length of the longest String minus the number of pairs
Like this:
// requires: import java.util.Map; (Map.of needs Java 9 or later)
public static int count_mismatches(String seq1, String seq2) {
Map<Character, Character> pairs = Map.of('A', 'T', 'T', 'A', 'G', 'C', 'C', 'G');
int count = 0;
for (int i = 0; i < Math.min(seq1.length(), seq2.length()); i++) {
if (pairs.get(seq1.charAt(i)) == seq2.charAt(i)) {
count++;
}
}
return Math.max(seq1.length(), seq2.length()) - count;
}
This returns 5 for your sample input.

Good Evening,
Something seems off here, this snippet of code:
for(int i = 0; i < seq1.length(); i++)
if (seq1.charAt(i) == seq2.charAt(i)) {
break; //checks to see if the length of seq1 and seq2 are the same
}
Does not do what you think it does. This loop goes through the characters of seq1 (i < seq1.length()) and, for each one, checks whether it is equal to the character at the same index in seq2; the break then exits the loop at the first position where they are equal. It never compares the lengths of seq1 and seq2.
This means that a correction is in order:
int countMismatches = 0;
for(int i = 0; i < seq1.length();i++){
switch(seq1.charAt(i)){
case 'A':
if(seq2.charAt(i) != 'T') countMismatches++;
break;
}
}
Repeat this process for the other letters, and voilà, you should be able to count your mismatches this way.
Be careful with sequences of different lengths: if seq2 is shorter than seq1, seq2.charAt(i) will throw a StringIndexOutOfBoundsException (a subclass of IndexOutOfBoundsException) as soon as i passes the end of seq2, because you are trying to read a character that does not exist.
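A sketch of the switch approach with all four letters filled in and the loop bounded by the shorter length, so charAt never goes out of bounds. Two things here are assumptions not in the answer above: any letter other than A, C, G or T is counted as a mismatch, and the unpaired positions in the longer string are added at the end, as the question asks. The class name SwitchMismatches is hypothetical.

```java
public class SwitchMismatches {

    static int countMismatches(String seq1, String seq2) {
        int countMismatches = 0;
        // Only compare positions that exist in both strings.
        int shared = Math.min(seq1.length(), seq2.length());
        for (int i = 0; i < shared; i++) {
            char other = seq2.charAt(i);
            switch (seq1.charAt(i)) {
                case 'A':
                    if (other != 'T') countMismatches++;
                    break;
                case 'T':
                    if (other != 'A') countMismatches++;
                    break;
                case 'G':
                    if (other != 'C') countMismatches++;
                    break;
                case 'C':
                    if (other != 'G') countMismatches++;
                    break;
                default:
                    countMismatches++; // assumption: any other letter is a mismatch
            }
        }
        // Positions in the longer string that have no partner also count as mismatches.
        return countMismatches + Math.abs(seq1.length() - seq2.length());
    }

    public static void main(String[] args) {
        System.out.println(countMismatches("TTCGATGGAGCTGTA", "TAGCTAGCTCGGCATGA")); // 5
    }
}
```

For the sample input, positions 0 (T/T), 6 (G/G) and 11 (T/G) are mismatched, and seq2 has 2 extra characters, giving 3 + 2 = 5.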

First you must find out which string is shorter. You also need the length difference, which you can compute at the same time. Then use the shorter length as the terminating condition of your for loop. You can use booleans to check whether a valid pair is present before incrementing the counter with an if statement.
The absolute value of any number can be obtained by calling the static method abs() from the Math class. Finally, add the mismatch count to the absolute value of the length difference to obtain the result.
Here is my solution.
public class App {
public static void main(String[] args) throws Exception {
String seq1 = "TTCGATGGAGCTGTA";
String seq2 = "TAGCTAGCTCGGCATGA";
System.out.println(compareStrings(seq1, seq2));
}
public static int compareStrings(String stringOne, String stringTwo) {
Character A = 'A', T = 'T', G = 'G', C = 'C';
int mismatchCount = 0;
int lowestStringLenght = 0;
int length_one = stringOne.length();
int length_two = stringTwo.length();
int lenght_difference = 0;
if (length_one < length_two) {// string two is longer
lowestStringLenght = length_one;
lenght_difference = length_one - length_two;
} else if (length_one > length_two) {// string one is longer
lowestStringLenght = length_two;
lenght_difference = length_two - length_one;
} else { // lengths are equal, use either
lowestStringLenght = length_one;
lenght_difference = 0; // there is no difference because they are equal
}
for (int i = 0; i < lowestStringLenght; i++) {
// A matches with T
// G matches with C
// evaluate if the values A, T, G, C are present
boolean A_T_PRESENT = stringOne.charAt(i) == A && stringTwo.charAt(i) == T;
boolean G_C_PRESENT = stringOne.charAt(i) == G && stringTwo.charAt(i) == C;
boolean T_A_PRESENT = stringOne.charAt(i) == T && stringTwo.charAt(i) == A;
boolean C_G_PRESENT = stringOne.charAt(i) == C && stringTwo.charAt(i) == G;
boolean TWO_EQUAL = stringOne.charAt(i) == stringTwo.charAt(i);
// identical characters can never form a valid pair, so increase the mismatch counter
if (TWO_EQUAL) {
mismatchCount++;
continue;
}
// all booleans evaluated to false, it means that the characters are not proper
// matches. Increment mismatchCount
else if (!A_T_PRESENT && !G_C_PRESENT && !T_A_PRESENT && !C_G_PRESENT) {
mismatchCount++;
continue;
} else {
continue;
}
}
// calculate the sum of the mismatches plus the abs of the length difference
lenght_difference = Math.abs(lenght_difference);
return mismatchCount + lenght_difference;
}
}

Avoid char
The char type is legacy, essentially broken. As a 16-bit value, char is physically incapable of representing most characters. The char type in your particular case would work. But using char is a bad habit generally, as such code may break when encountering any of about 75,000 characters defined in Unicode.
Code point
Use code point integer numbers instead. A code point is the number assigned to each of the over 140,000 characters defined by the Unicode Consortium.
Here we get an IntStream, a series of int values, one for each character in the input string. Then we collect these integer numbers into an array of int values.
int[] codePoints1 = seq1.codePoints().toArray() ;
int[] codePoints2 = seq2.codePoints().toArray() ;
You said the input strings may be of unequal length. So our two arrays may be jagged, of different lengths. Figure out the size of the shorter array.
int smallerSize = Math.min( codePoints1.length , codePoints2.length ) ;
Keep track of the index of each mismatched position.
List<Integer> mismatchIndices = new ArrayList <>();
Loop the arrays based on that smaller size.
for( int i = 0 ; i < smallerSize ; i ++ )
{
if ( isBasePairValid( codePoints1[ i ] , codePoints2[ i ] ) )
{
…
} else
{
mismatchIndices.add( i ) ;
}
}
Write an isBasePairValid method
Write the isBasePairValid method, taking two arguments, the code points of the two nucleobase letters.
static final int A = "A".codePointAt( 0 ) ; // Annoying zero-based index counting. So first character is number zero.
static final int C = "C".codePointAt( 0 ) ;
static final int G = "G".codePointAt( 0 ) ;
static final int T = "T".codePointAt( 0 ) ;
static boolean isBasePairValid( int first , int second )
{
if( first == A ) return ( second == T ) ;
else if( first == T ) return ( second == A ) ;
else if( first == C ) return ( second == G ) ;
else if( first == G ) return ( second == C ) ;
else { throw new IllegalStateException( … ) ; }
}
Count the mismatches.
int countMismatches = mismatchIndices.size() ;
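The fragments above can be assembled into one runnable class. This is a sketch under a few assumptions: the class name CodePointMismatches and the method name mismatchIndices are hypothetical, the exception message is illustrative only, and the final step adds the unpaired positions (the difference in code-point counts) because the question expects those to be counted too.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointMismatches {

    static final int A = "A".codePointAt(0);
    static final int C = "C".codePointAt(0);
    static final int G = "G".codePointAt(0);
    static final int T = "T".codePointAt(0);

    static boolean isBasePairValid(int first, int second) {
        if (first == A) return second == T;
        else if (first == T) return second == A;
        else if (first == C) return second == G;
        else if (first == G) return second == C;
        // Assumption: this message is illustrative only.
        else throw new IllegalStateException("Unexpected nucleobase code point: " + first);
    }

    // Returns the indices (within the shorter sequence) where the bases do not pair.
    static List<Integer> mismatchIndices(String seq1, String seq2) {
        int[] codePoints1 = seq1.codePoints().toArray();
        int[] codePoints2 = seq2.codePoints().toArray();
        int smallerSize = Math.min(codePoints1.length, codePoints2.length);
        List<Integer> mismatchIndices = new ArrayList<>();
        for (int i = 0; i < smallerSize; i++) {
            if (!isBasePairValid(codePoints1[i], codePoints2[i])) {
                mismatchIndices.add(i);
            }
        }
        return mismatchIndices;
    }

    public static void main(String[] args) {
        String seq1 = "TTCGATGGAGCTGTA";
        String seq2 = "TAGCTAGCTCGGCATGA";
        List<Integer> indices = mismatchIndices(seq1, seq2);
        System.out.println(indices); // [0, 6, 11]
        // Positions with no partner at all, counted in code points.
        int unpaired = Math.abs(seq1.codePointCount(0, seq1.length())
                - seq2.codePointCount(0, seq2.length()));
        System.out.println(indices.size() + unpaired); // 5
    }
}
```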

The numerical sums of the chars T & A and of G & C are fixed, and among the letters A, C, G and T they are produced only by the legal nucleobase pairs. So you just need to ensure that each pair of corresponding bases has one of those sums.
String seq1 = "TTCGATGGAGCTGTA";
String seq2 = "TAGCTAGCTCGGCATGA";
System.out.println(count_mismatches(seq1, seq2));
prints
5
find the shorter length to iterate over
establish fixed sums for comparison
iterate and compare to expected pairing and update count appropriately
public static int count_mismatches(String seq1, String seq2) {
int len1 = seq1.length();
int len2 = seq2.length();
int len = len1;
if (len1 > len2) {
len = len2;
}
int sumTA = 'T'+'A';
int sumGC = 'G'+'C';
int misMatchCount = Math.abs(len1-len2);
for (int i = 0; i < len; i++) {
int pair = seq1.charAt(i) + seq2.charAt(i);
if (pair != sumTA && pair != sumGC) {
misMatchCount++;
}
}
return misMatchCount;
}
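The uniqueness claim can be checked by brute force over all 16 ordered pairs of A, C, G and T. This small sketch (the class name PairSums and method legalSumPairs are hypothetical) lists every pair whose char sum equals 'T' + 'A' (149) or 'G' + 'C' (138). Note that the trick relies on the input containing only those four letters: other characters can collide, for example 'B' + 'S' is also 149.

```java
import java.util.ArrayList;
import java.util.List;

public class PairSums {

    // Returns every ordered pair of bases (as a two-letter string) whose char sum
    // equals one of the two "legal" sums used in count_mismatches.
    static List<String> legalSumPairs() {
        char[] bases = {'A', 'C', 'G', 'T'};
        int sumTA = 'T' + 'A'; // 84 + 65 = 149
        int sumGC = 'G' + 'C'; // 71 + 67 = 138
        List<String> result = new ArrayList<>();
        for (char x : bases) {
            for (char y : bases) {
                int sum = x + y;
                if (sum == sumTA || sum == sumGC) {
                    result.add("" + x + y);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(legalSumPairs()); // [AT, CG, GC, TA]
        System.out.println('B' + 'S');        // 149: a collision outside the A/C/G/T alphabet
    }
}
```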

recursion subtraction problem from CodingBat.com

I'm trying to solve this question from CodingBat.com. - https://codingbat.com/prob/p143900
I run the recursion twice, then subtract the two values to get the final answer. Individually, i and j get the correct values, but when I compute i - j, the result doesn't make sense.
public int countHi2(String str) {
if(str.length() == 0)
return 0;
int i = count(str, "hi");
int j = count(str, "xhi");
return i-j;
}
public int count(String str, String match)
{
int i = str.indexOf(match);
if(i != -1)
return 1 + countHi2(str.substring(i+match.length()-1));
else
return 0 + countHi2(str.substring(1));
}

I think you are using recursion incorrectly. You don't need a separate helper method, or to split the work into two separate counts:
public int countHi2(String str) {
if (str == null || str.length() < 2)
return 0;
if (str.startsWith("hi"))
return 1 + countHi2(str.substring(2));
return countHi2(str.substring(str.startsWith("xhi") ? 3 : 1));
}

you might wanna use:
str.substring(i+match.length())
suppose your match was on index 3 and the length of the word to match is 2, then you want the substring from index 5 to the end.
thus,
str.substring(5);
which includes the index 5
Though, I don't understand why you used
return 0 + countHi2(str.substring(1));
because this branch runs when no match is found, so what is the point of checking for a match again in a substring?
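A sketch of the question's two-count idea with the substring fix above applied. It also makes one further change that the answer does not mention: count recurses into count itself instead of into countHi2, and simply returns 0 when no match is left. The subtraction works because "hi" cannot overlap itself, and every "xhi" contains exactly one "hi". The class name CountHi2 is hypothetical.

```java
public class CountHi2 {

    public static int countHi2(String str) {
        // Every "xhi" contains exactly one "hi", so subtracting the two counts
        // leaves the "hi"s that are not immediately preceded by 'x'.
        return count(str, "hi") - count(str, "xhi");
    }

    // Counts non-overlapping occurrences of match in str, recursively.
    public static int count(String str, String match) {
        int i = str.indexOf(match);
        if (i == -1) {
            return 0; // base case: no more occurrences
        }
        // Recurse into count itself (not countHi2), skipping past the whole match.
        return 1 + count(str.substring(i + match.length()), match);
    }

    public static void main(String[] args) {
        System.out.println(countHi2("ahixhi")); // 1
        System.out.println(countHi2("ahibhi")); // 2
        System.out.println(countHi2("xhixhi")); // 0
    }
}
```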

How to modify my code to remove (not generate) duplicated permutations

I cannot figure out how to recognize that I generate a repeated permutation in a recursive call. Let's say we have 2 repeated letters in a string of length n. Then I need to create n!/2! sequences instead of n! sequences.
How to modify my code to achieve this?
public class GeneralPermutationGenerator{
public static void main(String[] args) {
String s = "AABC";
perm(s);
}
public static void perm(String s){
char cs[] = s.toCharArray();
char result[] = new char[cs.length];
rperm(cs, result, 0);
}
static int j = 1;
private static void rperm(char[] cs, char[] result, int level){
if(level == result.length){
System.out.println(j++ + " " + new String(result));
return;
}
for(int i = 0; i < cs.length; i++){
if(cs[i] != 0){
result[level] = cs[i];
char temp = cs[i];
cs[i] = 0;
rperm(cs, result, ++level);
cs[i] = temp;
level--;
}
}
}
}

The uniqueness can be enforced by always taking a letter that appears multiple times from the first position available.
That is, at each level, when choosing a letter, you can look backward and see if it already occurred in the cs array. If it did occur before (which means it was not selected yet, because that position in cs is not zero), then it should not be allowed to select it from this position.
Implementation
One possible implementation involves changing the rperm code as follows (looping through the previous characters, to see if the current char was already encountered):
private static void rperm(char[] cs, char[] result, int level) {
if (level == result.length) {
System.out.println(j++ + " " + new String(result));
return;
}
for (int i = 0; i < cs.length; i++) {
if (cs[i] != 0) {
// first, determine if the current char was already
// encountered among the available options
boolean encountered = false;
for (int k = 0; k < i; k++) { // k, to avoid shadowing the static counter j
if (cs[k] == cs[i]) {
encountered = true;
break;
}
}
if (!encountered) {
result[level] = cs[i];
char temp = cs[i];
cs[i] = 0;
rperm(cs, result, ++level);
cs[i] = temp;
level--;
}
}
}
}
Explanation
To see how this works, consider again the example AABC.
To differentiate the two As in this discussion, let us denote them as A1 and A2.
For level = 0, we should choose a character to be put into result[0]:
we can choose A1;
we can NOT choose A2, because there was already an A encountered before in the list of available chars for this level;
we can choose B;
we can choose C.
First, the algorithm will choose A1, and proceed with recursion at next level.
At level = 1.
Now, the position associated with A1 has been marked with a 0 in the cs array.
Thus we have the following alternatives for the character to be put in result[1]:
choose A2 (because there is no longer an earlier A available: the first one was already taken at the previous recursion level and marked with 0);
choose B;
choose C.
It will first select A2, and the partial permutation so far will be A1 A2, with two more levels to go in the recursion. However, the key to avoiding duplicates is that, for the same character, its indices always appear in increasing order. The algorithm cannot also generate a permutation starting with A2 A1, simply because A2 is not allowed to be chosen while A1 is still available.

There is a simple non-recursive algorithm for finding the lexicographically next permutation of a sequence:
Scan backwards from the end of the sequence until you find an element which is (strictly) less than the following one. If there isn't one, the sequence is the lexicographically greatest possible permutation.
Reverse the subsequence of elements following the one which you found.
Exchange the element you found in step 1 with the first following element which is (strictly) greater than it.
I'm not really a Java programmer, so here's an implementation in simplified C++, using fewer standard library functions than I would usually use in the hopes that it is easier to understand:
#include <algorithm> // std::reverse
#include <utility>   // std::swap
template<typename V>
bool nextPerm(V& v) {
for (auto i = v.size(); i > 1; --i)
if (v[i-2] < v[i-1]) {
std::reverse(v.begin() + i - 1, v.end());
for (auto j = i - 1; j < v.size(); ++j)
if (v[i-2] < v[j]) { std::swap(v[i-2], v[j]); break; }
return true;
}
return false;
}
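Since the question is in Java, here is a sketch that translates the C++ function above to a char[] and uses it for the question's example. Sorting first gives the smallest permutation, and repeatedly calling nextPerm then visits each distinct permutation exactly once, because the strict comparisons never swap equal letters. The class name NextPermutation and the reverse helper are hypothetical; unlike std::next_permutation, this version leaves the array unchanged when it returns false.

```java
import java.util.Arrays;

public class NextPermutation {

    // Rearranges a into the lexicographically next permutation.
    // Returns false (leaving a unchanged) if a is already the greatest one.
    static boolean nextPerm(char[] a) {
        for (int i = a.length; i > 1; i--) {
            if (a[i - 2] < a[i - 1]) {
                // a[i-2] is the pivot; reverse the (non-increasing) suffix after it.
                reverse(a, i - 1, a.length - 1);
                // Swap the pivot with the first following element strictly greater than it.
                for (int j = i - 1; j < a.length; j++) {
                    if (a[i - 2] < a[j]) {
                        char tmp = a[i - 2];
                        a[i - 2] = a[j];
                        a[j] = tmp;
                        break;
                    }
                }
                return true;
            }
        }
        return false;
    }

    // Reverses a[from..to] in place (both ends inclusive).
    static void reverse(char[] a, int from, int to) {
        while (from < to) {
            char tmp = a[from];
            a[from++] = a[to];
            a[to--] = tmp;
        }
    }

    public static void main(String[] args) {
        char[] cs = "AABC".toCharArray();
        Arrays.sort(cs); // start from the smallest permutation
        int n = 1;
        do {
            System.out.println(n++ + " " + new String(cs));
        } while (nextPerm(cs));
        // Prints 12 lines, i.e. 4!/2! distinct permutations, ending with "12 CBAA".
    }
}
```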

Recursively counting character occurrences in a string

I'm making a program to count the number of times a character is found in a string. This is what my method looks like:
public static int count (String line, char c)
{
int charOccurences = 0; //= 0;
for (int x = 0 ; x < line.length () ; x++)
{
if (line.charAt (x) == c)
{
charOccurences++;
line = line.substring (0, x) + line.substring (x + 1);
return count (line, c);
}
else
return charOccurences;
}
return charOccurences;
}
It always returns 0, because once the method calls itself, it sets charOccurences back to 0. But I need to declare that variable for the method to work, and I can't figure out a way around this. Any help would be appreciated.

You ignored charOccurences right after you incremented it.
charOccurences++;
line = line.substring (0, x) + line.substring (x + 1);
return charOccurences + count (line, c); // Fixed for you.
Others have mentioned that you don't need a for loop at all. If you wanted to do this purely recursively, you would simply lose the loop, and follow these steps:
base case:
first character doesn't exist (length is zero)
return 0;
recursion case:
The first character does exist
if it matches, increment occurrences
else do nothing
return (occurrences) + (result of recursing with substring);

Yea, it is very easy to do it recursively :)
public static void main(String[] args) {
String text = "This is my text. Life is great";
System.out.println(count(text,'i',0));
}
public static int count(String line, char c, int pos) {
if (pos >= line.length()){
return 0;
}
return compare(line.charAt(pos), c) + count(line, c, pos+1);
}
public static int compare(char a, char b){
if (a == b){
return 1;
} else {
return 0;
}
}
Note that because this avoids creating a new substring on every call, the time complexity is O(n) instead of the O(n^2) of your version.

Here's a general approach for writing recursive methods for tasks that really shouldn't be recursive but have to be because you're learning about recursion in class:
Find a way to break the problem down into a smaller problem(s).
Here, your problem is to count the occurrences of character c in a string. Well, suppose you break your string down into "the first character" and a substring of "all the other characters". You can tell whether the first character equals c. Then you look at "all the other characters", and if that's not empty (the base case), then that's just a smaller version of the same problem. So you can use recursion on that. So pretend the recursion already happened, so then you know: (1) is the first character equal to c, and (2) how many occurrences of c are there in the smaller string. Once you know those two pieces of data, you should be able to figure out how many occurrences of c there are in the whole string.
For this problem, your solution should not have a loop in it.

You never actually accumulate charOccurences; you just keep returning the result of the recursive call. At the very bottom of your recursion, count returns 0, since that is what charOccurences is initialized to at the beginning of every call, and that 0 is passed unchanged all the way back up the stack. You need to do this:
charOccurences += count (line, c);
return charOccurences;
so that charOccurences starts at 1 at the first occurrence and is added up as the results propagate back up.

I think you're making it much harder than it needs to be?
public static int count(String line, char c) {
int orig = line.length();
int repl = line.replace(String.valueOf(c), "").length(); // replace() is literal; replaceAll() would treat c as a regex
return orig - repl;
}

Although doing it recursively is not required, let's do it for fun. You were almost done. Just be sure to have a condition that stops the recursion: here, that is the if (len == 0) check.
public static int count (String line, char c)
{
int len = line.length();
if ((len == 0) || (c == '\0')) // obvious case for empty string or nil char.
return 0; // Your recursion always ends here
String rest = line.substring(1);
if (line.charAt(0) == c)
{
return count(rest, c) + 1; // recurse on substring
}
else
{
return count(rest, c); // recurse on substring
}
}

I had the same issue. You can always do this; I did it on a word, but the same applies to a sentence:
private static int count(String word, String letter) {
int count = 0;
return occurrence(word, letter, count);
}
private static int occurrence(String word, String letter, int count) {
if (word.isEmpty())
return count; // base case: nothing left to compare
// compare the last character and increment if it matches
if (word.endsWith(letter))
count++;
return occurrence(word.substring(0, word.length() - 1), letter, count);
}
The second method, occurrence, is the recursive one. Because count is now passed along as a parameter instead of being re-declared in every call, you can increment it without any problem! :)

Please remove the else branch inside the for loop. With it, the loop returns as soon as the first character doesn't match, so the method never looks past the first character.
public static int count (String line, char c)
{
int charOccurences = 0; //= 0;
for (int x = 0 ; x < line.length () ; x++)
{
if (line.charAt (x) == c)
{
charOccurences++;
line = line.substring (0, x) + line.substring (x + 1);
return charOccurences + count (line, c); // add the occurrence found here to the rest
}
}
return charOccurences;
}

I need to use substring, but I need the second parameter to be inclusive

I've got a String that I need to cycle through and create every possible substring. For example, if I had "HelloWorld", "rld" should be one of the possibilities. The String method, substring(int i, int k) is exclusive of k, so if
|H|e|l|l|o|W|o|r|l|d|
0 1 2 3 4 5 6 7 8 9
then
substring(7,9) returns "rl"
How would I work around this and make it inclusive? I understand why a substring shouldn't be able to equal the String it was created from, but in this case it would be very helpful to me.
Example from Codingbat: http://codingbat.com/prob/p137918
What I was able to come up with:
public String parenBit(String str) {
String sub;
if (str.charAt(0) == '(' && str.charAt(str.length() - 1) == ')')
return str;
for (int i = 0; i < str.length() - 1; i++) {
for (int k = i + 1; k < str.length(); k++) {
sub = str.substring(i,k);
}
}
return null;
}

The transformation between exclusive to inclusive is simple when you're working in integers. You just add 1.
String substringInclusive(String s, int a, int b)
{
return s.substring(a, b+1);
}
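A small runnable check of the helper above on the question's "HelloWorld" example (the class name InclusiveSubstring is hypothetical). It also shows that a substring can in fact equal the whole string: with an exclusive end index that is substring(0, s.length()), which is substringInclusive(s, 0, s.length() - 1).

```java
public class InclusiveSubstring {

    // Like String.substring(a, b), but includes the character at index b.
    static String substringInclusive(String s, int a, int b) {
        return s.substring(a, b + 1);
    }

    public static void main(String[] args) {
        String s = "HelloWorld";
        System.out.println(s.substring(7, 9));           // rl  (end index excluded)
        System.out.println(substringInclusive(s, 7, 9)); // rld (end index included)
        System.out.println(substringInclusive(s, 0, 9)); // HelloWorld (the whole string)
    }
}
```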

As Jon Skeet rightly pointed out, adding 1 is the right thing to do, because the second parameter of String.substring is exclusive.
However, your attempt is not recursive. Below is a recursive solution:
public String parenBit(String str) {
if(str.charAt(0)!='(')
return parenBit(str.substring(1));
if(str.charAt(0)=='('&&(str.charAt(str.length()-1)!=')'))
return parenBit(str.substring(0, str.length()-1));
return str;
}
