Same char in one string - java

I need to know how many chars of the same type are in one string.
I have tried this
String x ="(3+3)*(4-2)";
int a = x.indexOf( "(" );
But that only gives me the first index.

You can use a loop and use the other method indexOf(int, int):
String x ="(3+3)*(4-2)";
int a = x.indexOf( "(" );
while (a >= 0) {
System.out.println("Char '(' found at: "+a);
a = x.indexOf('(', a+1);
}

It seems like it would be better to put it in a separate function:
// accepts a string and a char to find the number of occurrences of
public static int get_count(String s, char c) {
int count = 0; // count initially 0
for (int i = 0; i < s.length(); i++) // loop through the whole string
if (s.charAt(i) == c)
count ++; // increment every time an occurrence happens
return count; // return the count in the end
}
You can call it like this:
System.out.println(get_count("(3+3)*(4-2)", '('));
// Output: 2

There are a few ways I could think of doing this, but one of the simplest would be to loop through the characters in the String:
String x ="(3+3)*(4-2)";
int count = 0;
for (char c : x.toCharArray()) {
if (c == '(') {
count++;
}
}
System.out.println(count);
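And just because it can be done, you could use a little regexp (I know, overkill). This needs java.util.regex.Pattern and java.util.regex.Matcher:
count = 0; // reset the counter left over from the loop above
Pattern p = Pattern.compile("\\(");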
Matcher matcher = p.matcher(x);
while (matcher.find()) {
count++;
}
System.out.println(count);

The code below does what you want. If performance is critical, this plain loop is a good starting point for optimization. If you want a more elegant solution, take a look at Java's regex library.
int occurences = 0;
String x ="(3+3)*(4-2)";
char tolookfor = '(';
for(int i = 0; i < x.length() ; i++)
{
if(x.charAt(i) == tolookfor)
occurences++;
}

You can try this
String x ="(3+3)*(4-2)";
char[] arr=x.toCharArray();
Map<String,Integer> map=new HashMap<>();
for(int i=0;i<arr.length;i++){
Integer upTo=map.get(String.valueOf(arr[i]));
if (upTo==null) {
upTo=0;
}
map.put(String.valueOf(arr[i]),upTo+1) ;
}
for (Map.Entry<String,Integer> entry:map.entrySet()){
System.out.println("Number of "+entry.getKey()+" in this string is: "+entry.getValue());
}
Output:
Number of 3 in this string is: 2
Number of 2 in this string is: 1
Number of 4 in this string is: 1
Number of * in this string is: 1
Number of + in this string is: 1
Number of ( in this string is: 2
Number of ) in this string is: 2
Number of - in this string is: 1

It’s unbelievable how complicated the answers to such a simple question can be.
x.indexOf( "(" );
But that only give me the first index
Use x.indexOf( "(", fromIndex ); to find more occurrences. Period.
By the way, if you are looking for a single char you can use x.indexOf( '('); and x.indexOf( '(', fromIndex ); to be more efficient.
So the most efficient way without reinventing the wheel would be:
int count=0;
for(int pos=s.indexOf('('); pos!=-1; pos=s.indexOf('(', pos+1)) count++;

Use StringUtils.countMatches
StringUtils.countMatches(value,"(");
or
public static int countMatches(String value, String valueToCount) {
if (value.isEmpty() || valueToCount.isEmpty()) {
return 0;
}
int count = 0;
int index = 0;
while ((index = value.indexOf(valueToCount, index)) != -1) {
count++;
index += valueToCount.length();
}
return count;
}

This will help you!
public static int counter(String x, char y) {
char[] array=x.toCharArray();
int count=0;
for(int i=0;i<x.length();i++)
{
if(y==array[i]) count++;
}
return (count>0)? count:0;
}

Related

Iterate over two strings checking to see if it matches with its pair

I am still new to Java, and I am currently working on a program that takes two strings as arguments and returns the number of mismatched pairs. I am working with the letters ATGC because, in DNA, A always pairs with T and G always pairs with C. I can't quite figure out how to iterate over the strings and check that the first character in string one (T, for example) matches up with its intended pair (A); if it doesn't, it is a mismatched pair and should be added to a counter that is totaled at the end. I believe I can use something called charAt(), but I am unsure of how that works.
I also need to figure out how to take the absolute value of counter before it is added to finalCounter. The main reason is that I only care about the length difference between the two strings, and don't want to have to make sure that the shorter length is subtracted from the longer one.
Any help would be greatly appreciated!
public class CountMismatches {
public static void main(String[] args) {
{
String seq1 = "TTCGATGGAGCTGTA";
String seq2 = "TAGCTAGCTCGGCATGA";
System.out.println(count_mismatches(seq1, seq2))
//*expected to print out 5 because there are 3 mismatched pairs and 2 that do not have a pair*
}
}
public static int count_mismatches(String seq1, String seq2) {
int mismatchCount = 0;
int counter = seq1.length() - seq2.length();
int finalCounter = mismatchCount + counter;
for(int i = 0; i < seq1.length(); i++) if (seq1.charAt(i) == seq2.charAt(i)) {
break; //checks to see if the length of seq1 and seq2 are the same
}
for(int i = 0; i < seq1.length(); i++) if (seq1.charAt(i) != seq2.charAt(i)) {
return counter; //figure out how to do absolute value for negative numbers
}
return finalCounter;
}
}
Since you only want to count the positions where the pairing fails, you can iterate up to the length of the shorter string and count the positions where the two characters are not a valid pair.
At the end, add the absolute difference between the lengths of seq1 and seq2 and return that value to the main function.
For the logic, all you need is four if conditions: check whether the character is A, T, G or C, and whether its partner is present at the same index in the other string.
public class CountMismatches {
public static void main(String[] args) {
{
String seq1 = "TTCGATGGAGCTGTA";
String seq2 = "TAGCTAGCTCGGCATGA";
System.out.println(count_mismatches(seq1, seq2));
}
}
public static int count_mismatches(String seq1, String seq2) {
int finalCounter = 0;
for (int i = 0; i < Math.min(seq1.length(), seq2.length()); i++) {
char c1 = seq1.charAt(i);
char c2 = seq2.charAt(i);
if (c1 == 'A') {
if (c2 == 'T')
continue;
else
finalCounter++;
} else if (c1 == 'T') {
if (c2 == 'A')
continue;
else
finalCounter++;
} else if (c1 == 'G') {
if (c2 == 'C')
continue;
else
finalCounter++;
} else if (c1 == 'C') {
if (c2 == 'G')
continue;
else
finalCounter++;
}
}
return finalCounter + (Math.abs(seq1.length() - seq2.length()));
}
}
and the output is as follows :
5
Make these refactorings:
To make the comparisons easy to code and understand, create a Map whose entries are each pair (both directions)
Iterate over the Strings up to the length of the shortest one, adding up the number of matching pairs as you go
The result is the length of the longest String minus the number of pairs
Like this:
public static int count_mismatches(String seq1, String seq2) {
Map<Character, Character> pairs = Map.of('A', 'T', 'T', 'A', 'G', 'C', 'C', 'G');
int count = 0;
for (int i = 0; i < Math.min(seq1.length(), seq2.length()); i++) {
if (pairs.get(seq1.charAt(i)) == seq2.charAt(i)) {
count++;
}
}
return Math.max(seq1.length(), seq2.length()) - count;
}
See live demo, which returns 5 for your sample input.
Good Evening,
Something seems off here, this snippet of code:
for(int i = 0; i < seq1.length(); i++)
if (seq1.charAt(i) == seq2.charAt(i)) {
break; //checks to see if the length of seq1 and seq2 are the same
}
Does not do what you think it does. This cycle will loop through all characters in sequence1 using i < seq1.length() and for each character that exists in seq1, it will check if said character is equal to the character with the same index in seq2.
This means that a correction is in order:
int countMismatches = 0;
for(int i = 0; i < seq1.length();i++){
switch(seq1.charAt(i)){
case 'A':
if(seq2.charAt(i) != 'T') countMismatches++;
break;
}
}
Repeat this process for the other letters, and voilà, you should be able to count your mismatches this way.
Do be careful with sequences of different lengths: as soon as the index steps past the end of the shorter string, charAt throws an IndexOutOfBoundsException, indicating that you tried to read a character that does not exist.
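A fuller version of the switch above might look like the following. It is only a sketch: the class and method names are made up for illustration, the loop stops at the shorter length to avoid the exception, and it assumes (as the question describes) that every trailing character without a partner counts as one mismatch.

```java
final class MismatchSwitch {
    // Hypothetical helper: counts positions where seq2 does not hold
    // the partner of the base found in seq1, plus the length difference.
    static int countMismatches(String seq1, String seq2) {
        int shorter = Math.min(seq1.length(), seq2.length());
        int mismatches = 0;
        for (int i = 0; i < shorter; i++) {
            char expected;
            switch (seq1.charAt(i)) {
                case 'A': expected = 'T'; break;
                case 'T': expected = 'A'; break;
                case 'G': expected = 'C'; break;
                case 'C': expected = 'G'; break;
                // Assumption: any other letter has no partner, so it is
                // compared against NUL and counted as a mismatch.
                default:  expected = '\0'; break;
            }
            if (seq2.charAt(i) != expected) {
                mismatches++;
            }
        }
        // Characters beyond the shorter string have no partner at all.
        return mismatches + Math.abs(seq1.length() - seq2.length());
    }
}
```

With the two sequences from the question, countMismatches(seq1, seq2) gives 5: three mismatched positions plus a length difference of two.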
First you must find out which string is the shortest in length. Also you need to get the length difference when calculating the shortest string. After that, use that length as a terminating condition in your for loop. You can use booleans to check whether the values are present before incrementing the counter with an if statement.
The absolute value of any number can be obtained by calling the static method abs() from the Math class. Last, just add the mismatchCounts to the absolute value of the length difference in order to obtain the result.
Here is my solution.
public class App {
public static void main(String[] args) throws Exception {
String seq1 = "TTCGATGGAGCTGTA";
String seq2 = "TAGCTAGCTCGGCATGA";
System.out.println(compareStrings(seq1, seq2));
}
public static int compareStrings(String stringOne, String stringTwo) {
Character A = 'A', T = 'T', G = 'G', C = 'C';
int mismatchCount = 0;
int lowestStringLenght = 0;
int length_one = stringOne.length();
int length_two = stringTwo.length();
int lenght_difference = 0;
if (length_one < length_two) {// string one is shorter
lowestStringLenght = length_one;
lenght_difference = length_one - length_two;
} else if (length_one > length_two) {// string two is shorter
lowestStringLenght = length_two;
lenght_difference = length_two - length_one;
} else { // lengths must be equal, use either
lowestStringLenght = length_one;
lenght_difference = 0; // there is no difference because they are equal
}
for (int i = 0; i < lowestStringLenght; i++) {
// A matches with T
// G matches with C
// evaluate if the values A, T, G, C are present
boolean A_T_PRESENT = stringOne.charAt(i) == A && stringTwo.charAt(i) == T;
boolean G_C_PRESENT = stringOne.charAt(i) == G && stringTwo.charAt(i) == C;
boolean T_A_PRESENT = stringOne.charAt(i) == T && stringTwo.charAt(i) == A;
boolean C_G_PRESENT = stringOne.charAt(i) == C && stringTwo.charAt(i) == G;
boolean TWO_EQUAL = stringOne.charAt(i) == stringTwo.charAt(i);
// characters are equal, increase mismatch counter
if (TWO_EQUAL) {
mismatchCount++;
continue;
}
// all booleans evaluated to false, it means that the characters are not proper
// matches. Increment mismatchCount
else if (!A_T_PRESENT && !G_C_PRESENT && !T_A_PRESENT && !C_G_PRESENT) {
mismatchCount++;
continue;
} else {
continue;
}
}
// calculate the sum of the mismatches plus the abs of the lenght difference
lenght_difference = Math.abs(lenght_difference);
return mismatchCount + lenght_difference;
}
}
Avoid char
The char type is legacy, essentially broken. As a 16-bit value, char is physically incapable of representing most characters. The char type in your particular case would work. But using char is a bad habit generally, as such code may break when encountering any of about 75,000 characters defined in Unicode.
Code point
Use code point integer numbers instead. A code point is the number assigned to each of the over 140,000 characters defined by the Unicode Consortium.
Here we get an IntStream, a series of int values, one for each character in the input string. Then we collect these integer numbers into an array of int values.
int[] codePoints1 = seq1.codePoints().toArray() ;
int[] codePoints2 = seq2.codePoints().toArray() ;
You said the input strings may be of unequal length. So our two arrays may be jagged, of different lengths. Figure out the size of the shorter array.
int smallerSize = Math.min( codePoints1.length , codePoints2.length ) ;
Keep track of the index number of mismatched rows.
List<Integer> mismatchIndices = new ArrayList <>();
Loop the arrays based on that smaller size.
for( int i = 0 ; i < smallerSize ; i ++ )
{
if ( isBasePairValid( codePoints1[ i ] , codePoints2[ i ] ) )
{
…
} else
{
mismatchIndices.add( i ) ;
}
}
Write an isBasePairValid method
Write the isBasePairValid method, taking two arguments, the code points of the two nucleobase letters.
static int A = "A".codePointAt( 0 ) ; // Annoying zero-based index counting. So first character is number zero.
static int C = "C".codePointAt( 0 ) ;
static int G = "G".codePointAt( 0 ) ;
static int T = "T".codePointAt( 0 ) ;
static boolean isBasePairValid( int first , int second )
{
if( first == A ) return ( second == T ) ;
else if( first == T ) return ( second == A ) ;
else if( first == C ) return ( second == G ) ;
else if( first == G ) return ( second == C ) ;
else { throw new IllegalStateException( … ) ; }
}
Count the mismatches.
int countMismatches = mismatchIndices.size() ;
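Putting the fragments of this answer together could look roughly like the sketch below. The class name, the mismatchIndices method and the exception message are assumptions made for illustration; only isBasePairValid and the four constants come from the answer.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointPairs {
    static final int A = "A".codePointAt(0);
    static final int C = "C".codePointAt(0);
    static final int G = "G".codePointAt(0);
    static final int T = "T".codePointAt(0);

    // True when the two code points form a valid nucleobase pair.
    static boolean isBasePairValid(int first, int second) {
        if (first == A) return second == T;
        if (first == T) return second == A;
        if (first == C) return second == G;
        if (first == G) return second == C;
        // The message is a placeholder chosen for this sketch.
        throw new IllegalStateException("Unexpected code point: " + first);
    }

    // Hypothetical wrapper: indices (within the shorter sequence)
    // where the pairing is invalid.
    static List<Integer> mismatchIndices(String seq1, String seq2) {
        int[] codePoints1 = seq1.codePoints().toArray();
        int[] codePoints2 = seq2.codePoints().toArray();
        int smallerSize = Math.min(codePoints1.length, codePoints2.length);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < smallerSize; i++) {
            if (!isBasePairValid(codePoints1[i], codePoints2[i])) {
                result.add(i);
            }
        }
        return result;
    }
}
```

For the question's sample sequences this yields the indices 0, 6 and 11. The two unpaired trailing characters are not included, so the length difference still has to be added separately if the total of 5 is wanted.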
The numerical sum of chars T & A and G & C is fixed and unique for legal nucleobase pairs. So you just need to ensure that the corresponding bases have one of those sums.
String seq1 = "TTCGATGGAGCTGTA";
String seq2 = "TAGCTAGCTCGGCATGA";
System.out.println(count_mismatches(seq1, seq2));
prints
5
find the shorter length to iterate over
establish fixed sums for comparison
iterate, compare each pair's sum to the expected sums, and update the count appropriately
public static int count_mismatches(String seq1, String seq2) {
int len1 = seq1.length();
int len2 = seq2.length();
int len = len1;
if (len1 > len2) {
len = len2;
}
int sumTA = 'T'+'A';
int sumGC = 'G'+'C';
int misMatchCount = Math.abs(len1-len2);
for (int i = 0; i < len; i++) {
int pair = seq1.charAt(i) + seq2.charAt(i);
if (pair != sumTA && pair != sumGC) {
misMatchCount++;
}
}
return misMatchCount;
}
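The claim that the two sums single out exactly the legal pairs can be checked by brute force over all 16 combinations. The sketch below uses invented names and only covers the four letters A, C, G and T; characters outside that alphabet could still collide with one of the sums.

```java
final class PairSumCheck {
    // Returns true if "sum equals T+A or G+C" agrees with
    // "is a legal base pair" for every ordered pair of bases.
    static boolean sumsIdentifyValidPairs() {
        char[] bases = {'A', 'C', 'G', 'T'};
        int sumTA = 'T' + 'A';
        int sumGC = 'G' + 'C';
        for (char x : bases) {
            for (char y : bases) {
                boolean legal = (x == 'A' && y == 'T') || (x == 'T' && y == 'A')
                        || (x == 'G' && y == 'C') || (x == 'C' && y == 'G');
                int sum = x + y;
                boolean sumSaysLegal = (sum == sumTA || sum == sumGC);
                if (legal != sumSaysLegal) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

The check passes because no other combination of these four letters adds up to either of the two sums.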

How to swap digits in number?

I need to write a function that takes 3 params (int num, int k, int nDigit).
The function takes a number and replaces the digit at index k with nDigit.
for example:
int num = 5498
int k = 2
int nDigit= 3
the result is num = 5398
My question is: how can I implement it? I understand that the easiest way is to convert num to a string and then replace the char at the specific index with the nDigit char.
But is there any way to implement it without converting to a string?
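public int changeDigit(int num, int k, int nDigit){
int place = (int) Math.pow(10, k); // place value of the digit to replace; k counts from the right, starting at 0
int saved = num % place; // Save digits after
num = num - (num % (place * 10)); // Get what's before k
return num + (nDigit * place) + saved;
}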
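The arithmetic is easier to follow on the question's numbers. The sketch below is a variant with integer-only math; the class name is invented, and it assumes that k counts digits from the right starting at 0, that num is non-negative, and that k is small enough for the place value not to overflow.

```java
final class DigitReplace {
    static int changeDigit(int num, int k, int nDigit) {
        // Place value of the digit to replace: 1, 10, 100, ...
        int place = 1;
        for (int i = 0; i < k; i++) {
            place *= 10;
        }
        // For num = 5498, k = 2: place = 100.
        int lower = num % place;                 // digits right of position k: 98
        int upper = num - num % (place * 10);    // digits left of position k:  5000
        return upper + nDigit * place + lower;   // 5000 + 300 + 98
    }
}
```

changeDigit(5498, 2, 3) returns 5398, the result the question asks for.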
I won't do your homework for you, but here's some hints:
Convert integer to string:
String s = Integer.toString(1234);
Enumerating a string:
for (int i = 0; i < s.length(); i++)
{
char c = s.charAt(i);
}
String building (a little less efficient without the StringBuilder class)
char c = '1';
String s = "3";
String j = "";
j = j + c;
j = j + s; // j is now equal to "13"
String back to integer:
int val = Integer.parseInt("42");
You can use a StringBuilder. It's easier to see what you're doing and you don't need to perform mathematics, only adjust the characters in their positions. Then convert it back to int.
public class Main {
static int swapParams(int num, int k, int nDigit) {
StringBuilder myName = new StringBuilder(Integer.toString(num));
myName.setCharAt(k-1, Integer.toString(nDigit).charAt(0));
return Integer.parseInt(myName.toString());
}
public static void main(String[] args) {
System.out.println(swapParams(5498, 2, 3));
}
}
http://ideone.com/e4MF6m
You can do it like this:
public int func(int num, int k, int nDigit) {
String number = String.valueOf(num);
return Integer.parseInt(number.substring(0, k - 1) + nDigit + number.substring(k, number.length()));
}
This function takes the first characters of the number without the k'th number and adds the nDigit to it. Then it adds the last part of the number and returns it as an integer number.
This is my javascript solution.
const solution = numbers => {
//declare a variable that will hold the array el that is not strictly ascending
let flawedIndex;
//declare a boolean variable to actually check if there is a flawed array el in the given array
let flawed = false;
//iterate through the given array
for(let i=0; i<numbers.length; i++) {
//check if current array el is greater than the next
if(numbers[i] > numbers[i+1])
{
//check if we already set flawed to true once.
//if flawed==true, then return that this array cannot be sorted
//strictly ascending even if we swap one elements digits
if(flawed) {
return false;
}
//if flawed is false, then set it to true and store the index of the flawed array el
else {
flawed = true;
flawedIndex = i;
}
}
}
//if flawed is still false after the end of the for loop, return true
//where true = the array is strictly ascending
if(flawed == false) return true;
//if flawed==true, that is there is an array el that is flawed
if(flawed){
//store the result of calling the swap function on the digits of the flawed array el
let swapResult = swap(flawedIndex,numbers);
//if the swapresult is true, then return that it is ascending
if (swapResult == true) return true;
}
//else return that its false
return false;
}
const swap = (flawIndex, numbers) => {
let num = numbers[flawIndex];
//convert the given array el to a string, and split the string based on ''
let numToString = num.toString().split('');
//iterate through every digit from index 0
for(let i=0; i<numToString.length; i++) {
//iterate from every digit from index 1
for(let j=i+1; j<numToString.length; j++) {
//swap the first index digit with every other index digit
let temp = numToString[i];
numToString[i] = numToString[j]
numToString[j] = temp;
console.log(numToString)
//check if the swapped number is lesser than the next number in the main array
//AND if it is greater than the previous el in the array. if yes, return true
let swappedNum = Number(numToString.join(''));
if(swappedNum < numbers[flawIndex + 1] && swappedNum > numbers[flawIndex-1])
{
return true;
}
}
}
//else return false
return false;
}
console.log("the solution is ",solution([1, 3, 900, 10]))

Pair Palindrome

I have this code to find all pairs of strings that form a palindrome when concatenated. For example, with D = { AB, DEEDBA }, AB + DEEDBA is a palindrome, so that pair is returned. Another example: { NONE, XENON }, where NONE + XENON is a palindrome.
What would be the running time of this?
public static List<List<String>> pairPalindrome(List<String> D) {
List<List<String>> pairs = new LinkedList<>();
Set<String> set = new HashSet<>();
for (String s : D) {
set.add(s);
}
for (String s : D) {
String r = reverse(s);
for (int i = 0; i <= r.length(); i++) {
String prefix = r.substring(0, i);
if (set.contains(prefix)) {
String suffix = r.substring(i);
if (isPalindrom(suffix)) {
pairs.add(Arrays.asList(s, prefix));
}
}
}
}
return pairs;
}
private static boolean isPalindrom(String s) {
int i = 0;
int j = s.length() - 1;
char[] c = s.toCharArray();
while (i < j) {
if (c[i] != c[j]) {
return false;
}
i++;
j--;
}
return true;
}
private static String reverse(String s) {
char[] c = s.toCharArray();
int i = 0;
int j = c.length - 1;
while (i < j) {
char temp = c[i];
c[i] = c[j];
c[j] = temp;
i++;
j--;
}
return new String(c);
}
I'm going to take a few guesses here as I don't have much experience with Java.
First, isPalindrome is O(N) with the size of suffix string. Add operation to 'pairs' would probably be O(1).
Then, we have the for loop, it's O(N) with the length of r. Getting a substring I'd think is O(M) with the size of the substring. Checking if a hashmap contains a certain key, with a perfect hash function would be (IIRC) O(1), in your case we can assume O(lgN) (possibly). So, first for loop has O(NMlgK), where K is your hash table size, N is r's length and M is substring's length.
Finally we have the outmost for loop, it runs for each string in the string list, so that's O(N). Then, we reverse each of them. So for each of these strings we have another O(N) operation inside, with the other loop being O(NMlgK). So, overall complexity is O(L(N + NMlgK)), where L is the amount of strings you have. But, it'd reduce to O(LNMlgK). I'd like if someone verified or corrected my mistakes.
EDIT: Actually, substring length will at most be N, as the length of the entire string, so M is actually N. Now I'd probably say it's O(LNlgK).

Finding the Number of Times an Expression Occurs in a String Continuously and Non Continuously

I had a coding interview over the phone and was asked this question:
Given a String (for example):
"aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc"
and an expression (for example):
"a+b+c-"
where:
+: means the char before it is repeated 2 times
-: means the char before it is repeated 4 times
Find the number of times the given expression appears in the string with the operands occurring non continuously and continuously.
The above expression occurs 4 times:
1) aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc
^^ ^^ ^^^^
aa bb cccc
2) aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc
^^ ^^ ^^^^
aa bb cccc
3) aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc
^^ ^^ ^^^^
aa bb cccc
4) aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc
^^ ^^ ^^^^
aa bb cccc
I had no idea how to do it. I started doing an iterative brute force method with lots of marking of indices but realized halfway through how messy and hard that would be to code:
import java.util.*;
public class Main {
public static int count(String expression, String input) {
int count = 0;
ArrayList<char[]> list = new ArrayList<char[]>();
// Create an ArrayList of chars to iterate through the expression and match to string
for(int i = 1; i<expression.length(); i=i+2) {
StringBuilder exp = new StringBuilder();
char curr = expression.charAt(i-1);
if(expression.charAt(i) == '+') {
exp.append(curr).append(curr);
list.add(exp.toString().toCharArray());
}
else { // character is '-'
exp.append(curr).append(curr).append(curr).append(curr);
list.add(exp.toString().toCharArray());
}
}
char[] inputArray = input.toCharArray();
int i = 0; // outside pointer
int j = 0; // inside pointer
while(i <= inputArray.length) {
while(j <= inputArray.length) {
for(int k = 0; k< list.size(); k++) {
/* loop through
* all possible combinations in array list
* with multiple loops
*/
}
j++;
}
i++;
j=i;
}
return count;
}
public static void main(String[] args) {
String expression = "a+b+c-";
String input = "aaksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc";
System.out.println("The expression occurs: "+count(expression, input)+" times");
}
}
After spending a lot of time doing it iteratively he mentioned recursion and I still couldn't see a clear way doing it recursively and I wasn't able to solve the question. I am trying to solve it now post-interview and am still not sure how to go about this question. How should I go about solving this problem? Is the solution obvious? I thought this was a really hard question for a coding phone interview.
A non-recursive algorithm that requires O(m) space and runs in O(n*m) time, where n is the length of the input and m is the number of tokens in the query:
@Test
public void subequences() {
String input = "aabbccaacccccbbd";
String query = "a+b+";
// here to store tokens of a query: e.g. {a, +}, {b, +}
char[][] q = new char[query.length() / 2][];
// here to store counts of subsequences ending by j-th token found so far
int[] c = new int[query.length() / 2]; // main
int[] cc = new int[query.length() / 2]; // aux
// tokenize
for (int i = 0; i < query.length(); i += 2)
q[i / 2] = new char[] {query.charAt(i), query.charAt(i + 1)};
// init
char[] sub2 = {0, 0}; // accumulator capturing last 2 chars
char[] sub4 = {0, 0, 0, 0}; // accumulator capturing last 4 chars
// main loop
for (int i = 0; i < input.length(); i++) {
shift(sub2, input.charAt(i));
shift(sub4, input.charAt(i));
boolean all2 = sub2[1] != 0 && sub2[0] == sub2[1]; // true if all sub2 chars are same
boolean all4 = sub4[3] != 0 && sub4[0] == sub4[1] // true if all sub4 chars are same
&& sub4[0] == sub4[2] && sub4[0] == sub4[3];
// iterate tokens
for (int j = 0; j < c.length; j++) {
if (all2 && q[j][1] == '+' && q[j][0] == sub2[0]) // found match for "+" token
cc[j] = j == 0 // filling up aux array
? c[j] + 1 // first token, increment counter by 1
: c[j] + c[j - 1]; // add value of preceding token counter
if (all4 && q[j][1] == '-' && q[j][0] == sub4[0]) // found match for "-" token
cc[j] = j == 0
? c[j] + 1
: c[j] + c[j - 1];
}
if (all2) sub2[1] = 0; // clear, to make "aa" occur in "aaaa" 2, not 3 times
if (all4) sub4[3] = 0;
copy(cc, c); // copy aux array to main
}
System.out.println(c[c.length - 1]);
}
// shifts array 1 char left and puts c at the end
void shift(char[] cc, char c) {
for (int i = 1; i < cc.length; i++)
cc[i - 1] = cc[i];
cc[cc.length - 1] = c;
}
// copies array contents
void copy(int[] from, int[] to) {
for (int i = 0; i < from.length; i++)
to[i] = from[i];
}
The main idea is to read chars from the input one by one, hold them in 2- and 4-char accumulators, and check whether either accumulator matches some token of the query, remembering how many matches we have found so far for the sub-queries ending with each token.
The query (a+b+c-) is split into tokens (a+, b+, c-). Then we collect chars in the accumulators and check whether they match some token. If we find a match for the first token, we increment its counter by 1. If we find a match for another, j-th token, we can create as many additional subsequences matching the subquery made of tokens [0...j] as there currently are for the subquery made of tokens [0...j-1], because this match can be appended to each of them.
For example, we have:
a+ : 3 (3 matches for a+)
b+ : 2 (2 matches for a+b+)
c- : 1 (1 match for a+b+c-)
when cccc arrives. Then c- counter should be increased by b+ counter value, because so far we have 2 a+b+ subsequences and cccc can be appended to both of them.
Let's call the length of the string n, and the length of the query expression (in terms of the number of "units", like a+ or b-) m.
It's not clear exactly what you mean by "continuously" and "non-continuously", but if "continuously" means that there can't be any gaps between query string units, then you can just use the KMP algorithm to find all instances in O(m+n) time.
We can solve the "non-continuous" version in O(nm) time and space with dynamic programming. Basically, what we want to compute is a function:
f(i, j) = the number of occurrences of the subquery consisting of the first i units
of the query expression, in the first j characters of the string.
So with your example, f(2, 41) = 2, since there are 2 separate occurrences of the subpattern a+b+ in the first 41 characters of your example string.
The final answer will then be f(m, n).
We can compute this recursively as follows:
f(0, j) = 1 (the empty subquery occurs exactly once)
f(i > 0, 0) = 0
f(i > 0, j > 0) = f(i, j-1) + isMatch(i, j) * f(i-1, j-len(i))
where len(i) is the length of the ith unit in the expression (always 2 or 4) and isMatch(i, j) is a function that returns 1 if the ith unit in the expression matches the text ending at position j, and 0 otherwise. For example, isMatch(2, 15) = 1 in your example, because the second unit is b+ and s[14..15] = bb. This function takes just constant time to run, because it never needs to check more than 4 characters.
The above recursion will already work as-is, but we can save time by making sure that we only solve each subproblem once. Because the function f() depends only on its 2 parameters i and j, which range between 0 and m, and between 0 and n, respectively, we can just compute all n*m possible answers and store them in a table.
[EDIT: As Sasha Salauyou points out, the space requirement can in fact be reduced to O(m). We never need to access values of f(i, k) with k < j-4 (the longest unit has length 4), so instead of storing all n columns of the table we can keep only the 5 most recent ones and cycle through them, indexing by j % 5.]
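The recurrence can be turned into a table-filling loop. The sketch below is one possible reading, not the answerer's code: the class and method names are invented, it assumes the expression is a well-formed sequence of letter/operator pairs, it takes the empty subquery to occur exactly once (f(0, j) = 1) so that counts can propagate, and it keeps the full table for clarity rather than applying the space reduction.

```java
import java.util.Arrays;

final class SubsequenceCount {
    // f[i][j] = number of occurrences of the first i units of the
    // expression within the first j characters of s.
    static long count(String expression, String s) {
        int m = expression.length() / 2;
        String[] units = new String[m];
        for (int u = 0; u < m; u++) {
            // '+' means the letter is repeated 2 times, '-' means 4 times.
            int reps = expression.charAt(2 * u + 1) == '+' ? 2 : 4;
            char[] run = new char[reps];
            Arrays.fill(run, expression.charAt(2 * u));
            units[u] = new String(run);
        }
        int n = s.length();
        long[][] f = new long[m + 1][n + 1];
        Arrays.fill(f[0], 1L);                 // f(0, j) = 1
        for (int i = 1; i <= m; i++) {         // f(i > 0, 0) stays 0
            String unit = units[i - 1];
            int len = unit.length();
            for (int j = 1; j <= n; j++) {
                f[i][j] = f[i][j - 1];
                // isMatch(i, j): does unit i end exactly at character j?
                if (j >= len && s.startsWith(unit, j - len)) {
                    f[i][j] += f[i - 1][j - len];
                }
            }
        }
        return f[m][n];
    }
}
```

For the question's string and the expression a+b+c- this returns 4. Note that this formulation counts overlapping runs separately, so aa is found twice in aaa.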
I wanted to try it for myself and figured I could share my solution as well. The parse method obviously has issues when there is a char 0 in the expression (although that would probably be the bigger issue in itself), the find method will fail for an empty needles array, and I wasn't sure whether ab+c- should be considered a valid pattern (I treat it as such). Note that this covers only the non-continuous part so far.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class Matcher {
public static void main(String[] args) {
String haystack = "aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc";
String[] needles = parse("a+b+c-");
System.out.println("Needles: " + Arrays.toString(needles));
System.out.println("Found: " + find(haystack, needles, 0));
needles = parse("ab+c-");
System.out.println("Needles: " + Arrays.toString(needles));
System.out.println("Found: " + find(haystack, needles, 0));
}
private static int find(String haystack, String[] needles, int i) {
String currentNeedle = needles[i];
int pos = haystack.indexOf(currentNeedle);
if (pos < 0) {
// Abort: Current needle not found
return 0;
}
// Current needle found (also means that pos + currentNeedle.length() will always
// be <= haystack.length()
String remainingHaystack = haystack.substring(pos + currentNeedle.length());
// Last needle?
if (i == needles.length - 1) {
// +1: We found one match for all needles
// Try to find more matches of current needle in remaining haystack
return 1 + find(remainingHaystack, needles, i);
}
// Try to find more matches of current needle in remaining haystack
// Try to find next needle in remaining haystack
return find(remainingHaystack, needles, i) + find(remainingHaystack, needles, i + 1);
}
private static String[] parse(String expression) {
List<String> searchTokens = new ArrayList<String>();
char lastChar = 0;
for (int i = 0; i < expression.length(); i++) {
char c = expression.charAt(i);
char[] chars;
switch (c) {
case '+':
// last char is repeated 2 times
chars = new char[2];
Arrays.fill(chars, lastChar);
searchTokens.add(String.valueOf(chars));
lastChar = 0;
break;
case '-':
// last char is repeated 4 times
chars = new char[4];
Arrays.fill(chars, lastChar);
searchTokens.add(String.valueOf(chars));
lastChar = 0;
break;
default:
if (lastChar != 0) {
searchTokens.add(String.valueOf(lastChar));
}
lastChar = c;
}
}
return searchTokens.toArray(new String[searchTokens.size()]);
}
}
Output:
Needles: [aa, bb, cccc]
Found: 4
Needles: [a, bb, cccc]
Found: 18
How about preprocessing aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc?
This becomes a1k1s1d1b1a2l1a1s1k1d1h1f1b2l1a1j1d1f1h1a1c4a1o1u1d1g1a1l1s1a2b2l1i1s1d1f1h1c4
Now find occurrences of a2, b2, c4.
I tried it with the code below, but right now it gives only the first possible match, found depth first.
It needs to be changed to try all possible combinations instead of just the first.
import java.util.ArrayList;
import java.util.List;
public class Parsing {
public static void main(String[] args) {
String input = "aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc";
System.out.println(input);
for (int i = 0; i < input.length(); i++) {
System.out.print(i/10);
}
System.out.println();
for (int i = 0; i < input.length(); i++) {
System.out.print(i%10);
}
System.out.println();
List<String> tokenisedSearch = parseExp("a+b+c-");
System.out.println(tokenisedSearch);
parse(input, 0, tokenisedSearch, 0);
}
public static boolean parse(String input, int searchFromIndex, List<String> tokensToSeach, int currentTokenIndex) {
if(currentTokenIndex >= tokensToSeach.size())
return true;
String token = tokensToSeach.get(currentTokenIndex);
int found = input.indexOf(token, searchFromIndex);
if(found >= 0) {
System.out.println("Found at Index "+found+ " Token " +token);
return parse(input, searchFromIndex+1, tokensToSeach, currentTokenIndex+1);
}
return false;
}
public static List<String> parseExp(String exp) {
List<String> list = new ArrayList<String>();
String runningToken = "";
for (int i = 0; i < exp.length(); i++) {
char at = exp.charAt(i);
switch (at) {
case '+' :
runningToken += runningToken;
list.add(runningToken);
runningToken = "";
break;
case '-' :
runningToken += runningToken;
runningToken += runningToken;
list.add(runningToken);
runningToken = "";
break;
default :
runningToken += at;
}
}
return list;
}
}
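The preprocessing step described at the start of this answer is a run-length encoding, which the code above does not actually perform. A minimal sketch of it, with invented names, could be:

```java
final class RunLength {
    // Encodes each maximal run of equal characters as <char><count>,
    // e.g. "aabcccc" becomes "a2b1c4".
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char ch = s.charAt(i);
            int run = 1;
            while (i + run < s.length() && s.charAt(i + run) == ch) {
                run++;
            }
            out.append(ch).append(run);
            i += run;
        }
        return out.toString();
    }
}
```

Two caveats under this sketch: spaces are encoded like any other character, and a run longer than the token (say aaa when looking for a2) needs its own rule, which the answer does not specify.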
Recursion may be the following (pseudocode):
int search(String s, String expression) {
if expression consists of only one token t /* e. g. "a+" */ {
search for t in s
return number of occurrences
} else {
int result = 0
divide expression into first token t and rest expression
// e. g. "a+a+b-" -> t = "a+", rest = "a+b-"
search for t in s
for each occurrence {
s1 = substring of s from the position of occurrence to the end
result += search(s1, rest) // search for rest of expression in rest of string
}
return result
}
}
Applying this to the entire string gives you the number of non-continuous occurrences. To get continuous occurrences you don't need recursion at all: just transform the expression into a string and search by iteration.
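The recursive pseudocode above could be sketched in Java roughly as follows. This is only a sketch under assumptions the answer does not state: the expression has already been expanded into literal, non-empty tokens (e.g. aa, bb, cccc), the search for the rest of the expression resumes right after the end of the matched token, and the names OrderedTokenCounter, countMatches and countFrom are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class OrderedTokenCounter {

    // Counts the ways to pick one occurrence of every token, in order,
    // possibly with gaps between the picked occurrences.
    // Assumption: every token is a non-empty literal string.
    static int countMatches(String s, List<String> tokens) {
        return countFrom(s, tokens, 0, 0);
    }

    private static int countFrom(String s, List<String> tokens, int from, int tokenIndex) {
        if (tokenIndex == tokens.size()) {
            return 1; // every token has been placed: one complete combination
        }
        String token = tokens.get(tokenIndex);
        int result = 0;
        // Try every occurrence of the current token at or after 'from'.
        for (int pos = s.indexOf(token, from); pos >= 0; pos = s.indexOf(token, pos + 1)) {
            // Assumption: the rest of the expression must start after this token ends.
            result += countFrom(s, tokens, pos + token.length(), tokenIndex + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "aksdbaalaskdhfbblajdfhacccc aoudgalsaa bblisdfhcccc";
        System.out.println(countMatches(input, Arrays.asList("aa", "bb", "cccc")));
    }
}
```

Unlike the question's parse method, which returns after the first hit, this version loops over every occurrence of the current token and sums the results of the recursive calls.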
If you first convert the search string with a simple parser/compiler so that a+ becomes aa etc., you can take the resulting string and run a regular expression match against your haystack. (Sorry, I'm no Java coder, so I can't deliver any real code, but it is not really difficult.)
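A rough Java sketch of that idea might look like this. It rests on assumptions: the meaning of '+' (double the token) and '-' (quadruple it) is taken from the question's parseExp, joining the expanded tokens with a lazy ".*?" to allow gaps is my own reading of the suggestion, and the names ExpressionRegex, compile and countMatches are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExpressionRegex {

    // Turns an expression such as "a+b+c-" into a pattern that matches
    // "aa", then "bb", then "cccc", with arbitrary text in between.
    // Assumption (from the question's parseExp): '+' doubles the
    // preceding token and '-' quadruples it.
    static Pattern compile(String exp) {
        List<String> parts = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        for (char c : exp.toCharArray()) {
            if (c == '+' || c == '-') {
                int times = (c == '+') ? 2 : 4;
                StringBuilder expanded = new StringBuilder();
                for (int i = 0; i < times; i++) {
                    expanded.append(token);
                }
                // Quote the literal so regex metacharacters in it are harmless.
                parts.add(Pattern.quote(expanded.toString()));
                token.setLength(0);
            } else {
                token.append(c);
            }
        }
        return Pattern.compile(String.join(".*?", parts));
    }

    // Counts the non-overlapping matches that Matcher.find() reports.
    static int countMatches(String input, String exp) {
        Matcher m = compile(exp).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

Note that Matcher.find() only reports non-overlapping, leftmost matches, so this does not enumerate every possible combination of token positions. For continuous matches the tokens would be joined with an empty string instead of ".*?".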

Find all substrings that are palindromes

If the input is 'abba' then the possible palindromes are a, b, b, a, bb, abba.
I understand that determining whether a string is a palindrome is easy. It would be something like:
public static boolean isPalindrome(String str) {
    int len = str.length();
    // compare characters from both ends, moving towards the middle
    for (int i = 0; i < len / 2; i++) {
        if (str.charAt(i) != str.charAt(len - i - 1)) {
            return false;
        }
    }
    return true;
}
But what is an efficient way of finding palindrome substrings?
This can be done in O(n), using Manacher's algorithm.
The main idea is a combination of dynamic programming and (as others have said already) computing maximum length of palindrome with center in a given letter.
What we really want to calculate is radius of the longest palindrome, not the length.
The radius is simply length/2 or (length - 1)/2 (for odd-length palindromes).
After computing palindrome radius pr at given position i we use already computed radiuses to find palindromes in range [i - pr ; i]. This lets us (because palindromes are, well, palindromes) skip further computation of radiuses for range [i ; i + pr].
While we search in range [i - pr ; i], there are four basic cases for each position i - k (where k is in 1,2,... pr):
1) no palindrome (radius = 0) at i - k
   (this means radius = 0 at i + k, too)
2) inner palindrome, which means it fits in range
   (this means radius at i + k is the same as at i - k)
3) outer palindrome, which means it doesn't fit in range
   (this means radius at i + k is cut down to fit in range, i.e., because i + k + radius > i + pr we reduce radius to pr - k)
4) sticky palindrome, which means i + k + radius = i + pr
   (in that case we need to search for potentially bigger radius at i + k)
A full, detailed explanation would be rather long. What about some code samples? :)
I've found a C++ implementation of this algorithm by a Polish teacher, mgr Jerzy Wałaszek.
I've translated the comments to English, added some other comments, and simplified it a bit to make the main part easier to follow.
Take a look here.
Note: in case of problems understanding why this is O(n), try to look this way:
after finding radius (let's call it r) at some position, we need to iterate over r elements back, but as a result we can skip computation for r elements forward. Therefore, total number of iterated elements stays the same.
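For readers who want something in Java, here is a minimal sketch that counts palindromic substrings in O(n). It is not the C++ implementation mentioned above: it uses the common two-array formulation of Manacher's algorithm (one pass for odd lengths, one for even lengths) rather than the single radius array described in this answer, and the names ManacherCount and countPalindromicSubstrings are hypothetical.

```java
final class ManacherCount {

    // Returns how many substrings of s are palindromes, counted by position
    // (so the two separate "b" characters in "abba" count twice).
    static long countPalindromicSubstrings(String s) {
        int n = s.length();
        long total = 0;

        // odd[i] = number of odd-length palindromes centred on s[i].
        // [l, r] is the rightmost palindrome found so far.
        int[] odd = new int[n];
        for (int i = 0, l = 0, r = -1; i < n; i++) {
            // Reuse the mirrored position's value, capped at the known border.
            int k = (i > r) ? 1 : Math.min(odd[l + r - i], r - i + 1);
            while (i - k >= 0 && i + k < n && s.charAt(i - k) == s.charAt(i + k)) {
                k++;
            }
            odd[i] = k;
            total += k;
            if (i + k - 1 > r) {
                l = i - k + 1;
                r = i + k - 1;
            }
        }

        // even[i] = number of even-length palindromes centred between s[i-1] and s[i].
        int[] even = new int[n];
        for (int i = 0, l = 0, r = -1; i < n; i++) {
            int k = (i > r) ? 0 : Math.min(even[l + r - i + 1], r - i + 1);
            while (i - k - 1 >= 0 && i + k < n && s.charAt(i - k - 1) == s.charAt(i + k)) {
                k++;
            }
            even[i] = k;
            total += k;
            if (i + k - 1 > r) {
                l = i - k;
                r = i + k - 1;
            }
        }
        return total;
    }
}
```

For the question's example, countPalindromicSubstrings("abba") gives 6, matching a, b, b, a, bb, abba. Listing the substrings themselves can still take O(n^2), because there may be that many of them.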
Perhaps you could iterate over each potential middle character (odd-length palindromes) and each middle point between characters (even-length palindromes) and extend each until you cannot get any further (the next left and right characters don't match).
That would save a lot of computation when there are not many palindromes in the string. In such a case the cost would be O(n) for palindrome-sparse strings.
For palindrome-dense inputs it would be O(n^2), as each position cannot be extended by more than the length of the array / 2. Obviously this is even less towards the ends of the array.
public Set<String> palindromes(final String input) {
final Set<String> result = new HashSet<>();
for (int i = 0; i < input.length(); i++) {
// expanding even length palindromes:
expandPalindromes(result,input,i,i+1);
// expanding odd length palindromes:
expandPalindromes(result,input,i,i);
}
return result;
}
public void expandPalindromes(final Set<String> result, final String s, int i, int j) {
while (i >= 0 && j < s.length() && s.charAt(i) == s.charAt(j)) {
result.add(s.substring(i,j+1));
i--; j++;
}
}
So, each distinct letter is already a palindrome - so you already have N + 1 palindromes, where N is the number of distinct letters (plus empty string). You can do that in single run - O(N).
Now, for non-trivial palindromes, you can test each point of your string to be a center of potential palindrome - grow in both directions - something that Valentin Ruano suggested.
This solution will take O(N^2) since each test is O(N) and number of possible "centers" is also O(N) - the center is either a letter or space between two letters, again as in Valentin's solution.
Note that there is also an O(N) solution to your problem, based on Manacher's algorithm (the article describes the "longest palindrome" problem, but the algorithm can be used to count all of them).
I just came up with my own logic which helps to solve this problem.
Happy coding.. :-)
System.out.println("Finding all palindromes in a given string : ");
subPal("abcacbbbca");
private static void subPal(String str) {
String s1 = "";
int N = str.length(), count = 0;
Set<String> palindromeArray = new HashSet<String>();
System.out.println("Given string : " + str);
System.out.println("******** Ignoring single character as substring palindrome");
for (int i = 2; i <= N; i++) {
for (int j = 0; j <= N; j++) {
int k = i + j - 1;
if (k >= N)
continue;
s1 = str.substring(j, i + j);
if (s1.equals(new StringBuilder(s1).reverse().toString())) {
palindromeArray.add(s1);
}
}
}
System.out.println(palindromeArray);
for (String s : palindromeArray)
System.out.println(s + " - is a palindrome string.");
System.out.println("The no.of substring that are palindrome : "
+ palindromeArray.size());
}
Output:-
Finding all palindromes in a given string :
Given string : abcacbbbca
******** Ignoring single character as substring palindrome ********
[cac, acbbbca, cbbbc, bb, bcacb, bbb]
cac - is a palindrome string.
acbbbca - is a palindrome string.
cbbbc - is a palindrome string.
bb - is a palindrome string.
bcacb - is a palindrome string.
bbb - is a palindrome string.
The no.of substring that are palindrome : 6
I suggest building up from a base case and expanding until you have all of the palindromes.
There are two types of palindromes: even-length and odd-length. I haven't figured out how to handle both in the same way, so I'll break it up.
1) Add all single letters
2) With this list you have all of the starting points for your palindromes. Run both of these for each index in the string (or 1 -> length-1, because you need a length of at least 2):
// str and outputList are assumed to be fields of the enclosing class
void findAllEvenFrom(int index) {
    int i = 0;
    // keep going only while index-i and index+i+1 are within string bounds
    while (index - i >= 0 && index + i + 1 < str.length()) {
        if (str.charAt(index - i) != str.charAt(index + i + 1))
            return; // Here we found out that this index isn't a center for palindromes of >=i size, so we can give up
        outputList.add(str.substring(index - i, index + i + 2)); // end index is exclusive
        i++;
    }
}
//Odd looks about the same, but with a change in the bounds.
void findAllOddFrom(int index) {
    int i = 0;
    // keep going only while index-i-1 and index+i+1 are within string bounds
    while (index - i - 1 >= 0 && index + i + 1 < str.length()) {
        if (str.charAt(index - i - 1) != str.charAt(index + i + 1))
            return;
        outputList.add(str.substring(index - i - 1, index + i + 2)); // end index is exclusive
        i++;
    }
}
I'm not sure if this helps the Big-O for your runtime, but it should be much more efficient than trying each substring. Worst case would be a string of all the same letter which may be worse than the "find every substring" plan, but with most inputs it will cut out most substrings because you can stop looking at one once you realize it's not the center of a palindrome.
I tried the following code and it works well for these cases.
It also handles individual characters.
A few of the cases that passed:
abaaa --> [aba, aaa, b, a, aa]
geek --> [g, e, ee, k]
abbaca --> [b, c, a, abba, bb, aca]
abaaba -->[aba, b, abaaba, a, baab, aa]
abababa -->[aba, babab, b, a, ababa, abababa, bab]
forgeeksskeegfor --> [f, g, e, ee, s, r, eksske, geeksskeeg,
o, eeksskee, ss, k, kssk]
Code
static Set<String> set = new HashSet<String>();
static String DIV = "|";
public static void main(String[] args) {
String str = "abababa";
String ext = getExtendedString(str);
// will check for even length palindromes
for(int i=2; i<ext.length()-1; i+=2) {
addPalindromes(i, 1, ext);
}
// will check for odd length palindromes including individual characters
for(int i=1; i<=ext.length()-2; i+=2) {
addPalindromes(i, 0, ext);
}
System.out.println(set);
}
/*
* Generates extended string, with dividers applied
* eg: input = abca
* output = |a|b|c|a|
*/
static String getExtendedString(String str) {
StringBuilder builder = new StringBuilder();
builder.append(DIV);
for(int i=0; i< str.length(); i++) {
builder.append(str.charAt(i));
builder.append(DIV);
}
String ext = builder.toString();
return ext;
}
/*
* Recursive matcher
* If match is found for palindrome ie char[mid-offset] = char[mid+ offset]
* Calculate further with offset+=2
*
*
*/
static void addPalindromes(int mid, int offset, String ext) {
// boundary checks
if(mid - offset <0 || mid + offset > ext.length()-1) {
return;
}
if (ext.charAt(mid-offset) == ext.charAt(mid+offset)) {
set.add(ext.substring(mid-offset, mid+offset+1).replace(DIV, ""));
addPalindromes(mid, offset+2, ext);
}
}
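Hope it's fine
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
public class PolindromeMyLogic {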
static int polindromeCount = 0;
private static HashMap<Character, List<Integer>> findCharAndOccurance(
char[] charArray) {
HashMap<Character, List<Integer>> map = new HashMap<Character, List<Integer>>();
for (int i = 0; i < charArray.length; i++) {
char c = charArray[i];
if (map.containsKey(c)) {
List list = map.get(c);
list.add(i);
} else {
List list = new ArrayList<Integer>();
list.add(i);
map.put(c, list);
}
}
return map;
}
private static void countPolindromeByPositions(char[] charArray,
HashMap<Character, List<Integer>> map) {
map.forEach((character, list) -> {
int n = list.size();
if (n > 1) {
for (int i = 0; i < n - 1; i++) {
for (int j = i + 1; j < n; j++) {
if (list.get(i) + 1 == list.get(j)
|| list.get(i) + 2 == list.get(j)) {
polindromeCount++;
} else {
char[] temp = new char[(list.get(j) - list.get(i))
+ 1];
int jj = 0;
for (int ii = list.get(i); ii <= list
.get(j); ii++) {
temp[jj] = charArray[ii];
jj++;
}
if (isPolindrome(temp))
polindromeCount++;
}
}
}
}
});
}
private static boolean isPolindrome(char[] charArray) {
int n = charArray.length;
char[] temp = new char[n];
int j = 0;
for (int i = (n - 1); i >= 0; i--) {
temp[j] = charArray[i];
j++;
}
if (Arrays.equals(charArray, temp))
return true;
else
return false;
}
public static void main(String[] args) {
String str = "MADAM";
char[] charArray = str.toCharArray();
countPolindromeByPositions(charArray, findCharAndOccurance(charArray));
System.out.println(polindromeCount);
}
}
Try this out. It's my own solution.
// Maintain a Set of palindromes so that we get distinct elements at the end.
// Treat each char (odd length) and each gap between two chars (even length) as a middle point,
// then expand outwards for as long as the left and right chars are equal.
static int palindrome(String str) {
    Set<String> distinctPln = new HashSet<String>();
    for (int i = 0; i < str.length(); i++) {
        // Odd length: start at the char itself, so single chars are added too
        for (int j = i, k = i; j >= 0 && k < str.length() && str.charAt(j) == str.charAt(k); j--, k++) {
            distinctPln.add(str.substring(j, k + 1));
        }
        // Even length: start at the pair (i, i+1)
        for (int j = i, k = i + 1; j >= 0 && k < str.length() && str.charAt(j) == str.charAt(k); j--, k++) {
            distinctPln.add(str.substring(j, k + 1));
        }
    }
    Iterator<String> distinctPlnItr = distinctPln.iterator();
    while (distinctPlnItr.hasNext()) {
        System.out.print(distinctPlnItr.next() + ",");
    }
    return distinctPln.size();
}
This code finds all distinct substrings that are palindromes.
Here is the code I tried. It is working fine.
import java.util.HashSet;
import java.util.Set;
public class SubstringPalindrome {
public static void main(String[] args) {
String s = "abba";
checkPalindrome(s);
}
public static int checkPalindrome(String s) {
int L = s.length();
int counter =0;
long startTime = System.currentTimeMillis();
Set<String> hs = new HashSet<String>();
// add elements to the hash set
System.out.println("Possible substrings: ");
for (int i = 0; i < L; ++i) {
for (int j = 0; j < (L - i); ++j) {
String subs = s.substring(j, i + j + 1);
counter++;
System.out.println(subs);
if(isPalindrome(subs))
hs.add(subs);
}
}
System.out.println("Total possible substrings are "+counter);
System.out.println("Total palindromic substrings are "+hs.size());
System.out.println("Possible palindromic substrings: "+hs.toString());
long endTime = System.currentTimeMillis();
System.out.println("It took " + (endTime - startTime) + " milliseconds");
return hs.size();
}
public static boolean isPalindrome(String s) {
if(s.length() == 0 || s.length() ==1)
return true;
if(s.charAt(0) == s.charAt(s.length()-1))
return isPalindrome(s.substring(1, s.length()-1));
return false;
}
}
OUTPUT:
Possible substrings:
a
b
b
a
ab
bb
ba
abb
bba
abba
Total possible substrings are 10
Total palindromic substrings are 4
Possible palindromic substrings: [bb, a, b, abba]
It took 1 milliseconds
