Remove the duplicate characters in a string - java

This is from the book Cracking the Coding Interview.
Design an algorithm and write code to remove the duplicate characters in a string
without using any additional buffer. NOTE: One or two additional variables are fine.
An extra copy of the array is not.
In the book it says time complexity is $O(N^2)$. How can we tell that the time complexity is $O(N^2)$ from the solution?
I have questions as to how the solution is removing duplicate characters. I have included them in the inline comments below.
public static void removeDuplicates(char[] str) {
if (str == null) return; // if the array is empty return nothing?
int len = str.length; // get the length of the array
if (len < 2) return; // if the array length is only 1 return nothing?
int tail = 1; // initialise tail variable as 1 !
// Why are we starting at second character here (at i = 1),why not start at i = 0 ?
for (int i = 1; i < len; ++i) {
int j;
for (j = 0; j < tail; ++j) { // so are we checking if j is less than 1, as tail has been initialized to 1?
if (str[i] == str[j]) break; // stop, if we find duplicate.
}
if (j == tail) { // why are we comparing j to tail(1) here?
str[tail] = str[i]; // assigning the value
++tail; // incrementing tail
}
}
str[tail] = 0; //setting last element as 0
}

I'm relying entirely on #pbabcdefp's comment, as I'm too lazy to test it myself, but it seems your algorithm does not work.
I did not like it anyway. Here is how I would do it, with the explanation in the comments:
public static void main(String[] args) {
removeDuplicates(new char[]{'a','a','b','b','c','d','c'});
}
public static final void removeDuplicates(char[] str)
{
/*
* If the str is not instantiated, or there is maximum 1 char there is no need to remove duplicates as
* it is just impossible to have duplicate with 1 char.
*/
if (str == null || str.length < 2)
return;
//loop over each char
for(int i = 0; i < str.length; i++)
{
//no need to check with earlier character as they were already checked, so we start at i + 1
for(int j = i + 1; j < str.length; j++)
{
//if they match we clear
if(str[j] == str[i])
str[j] = ' ';
}
}
System.out.println(str);
}
That would output "a b cd " (each duplicate is replaced by a space rather than removed).

First of all, this is a great book; I recommend it to everyone!
Generally, if you are allowed to use a lot of memory, you can save time. If you are only allowed a few variables, you can still solve this problem, but with a slower algorithm. And there is always the complete brute-force algorithm, where you check every possible solution.
public static void removeDuplicates(char[] str) {
if (str == null) return; // if the array is empty return nothing?
The input is a pointer (a reference) to the character array, so the string exists somewhere in memory. The code may modify it, but it stays in the same place. That's why the return type of the function is void: it doesn't return anything. When the function returns, the string in its original place no longer contains duplicates.
int len = str.length; // get the length of the array
if (len < 2) return; // if the array length is only 1 return nothing?
Same as above: no return value. If the string has fewer than 2 characters, it cannot contain a duplicate.
From here the logic is the following:
Take the i-th character. Check whether it already exists earlier in the string. If it does, the algorithm deletes the i-th character. If it doesn't, the character stays in the string.
The proof that it's the right algorithm:
No character that already appeared earlier in the string will stay. If a character appears again later in the string, that later occurrence is deleted by the same rule.
If this were the whole algorithm, it would work fine, but there would be "empty" characters left in the string. The string wouldn't get shorter, even though it should contain fewer characters.
That's why the algorithm keeps track of the "tail of the output string". The tail starts at 1, since the first character will definitely be part of the result string.
When the current character should be deleted, the output string's tail doesn't move and no new character is added to the result. When the current character should be part of the result (in the code, j == tail after the inner loop, meaning no earlier match was found), it gets copied to the tail of the result string and the tail advances.
When the algorithm reaches the end of the input string, it closes the result string.
Complexity:
Complexity describes how many steps the algorithm has to take relative to the size of the input, which is called 'n'. Typically only loops and recursion count.
This code has 2 for loops nested inside each other.
The outer one always goes from 1 to n.
The inner one goes from 0 to tail, where tail grows from 1 to at most n. So in the worst case, the inner loop runs about n/2 times on average.
This means your complexity is n*(n/2). Since 2 is a constant, your complexity is O(n*n).
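The tail idea described above can be sketched as follows. This is only a sketch under one assumption that is not in the book's code: instead of writing a 0 terminator at str[tail] (which goes out of bounds when the input has no duplicates and tail equals the array length), the hypothetical method dedupeInPlace returns the new logical length. The class and method names are made up for illustration.

```java
class TailDedup {
    // Hypothetical variant of removeDuplicates: compacts the unique chars to
    // the front of the array and returns how many of them there are.
    static int dedupeInPlace(char[] str) {
        if (str == null) return 0;
        if (str.length < 2) return str.length;
        int tail = 1; // str[0..tail) holds the unique characters found so far
        for (int i = 1; i < str.length; i++) {
            int j;
            for (j = 0; j < tail; j++) {
                if (str[i] == str[j]) break; // str[i] is already in the result
            }
            if (j == tail) { // inner loop ran to the end: no earlier match
                str[tail] = str[i];
                tail++;
            }
        }
        return tail;
    }

    public static void main(String[] args) {
        char[] s = "aabbcdc".toCharArray();
        int n = dedupeInPlace(s);
        System.out.println(new String(s, 0, n)); // prints "abcd"
    }
}
```

Only the first n characters of the array are meaningful afterwards; the rest still hold leftover input.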

Big-O time complexity is about the worst case. Regardless of the array you get and what you do with it, when you have 2 nested for loops, each bounded by the length of the string, the number of steps can't be higher than n^2, and thus it is O(n^2) (this is only an upper bound; showing that it is also a lower bound takes more work).

O(N^2) basically means that as the number of inputs N increases, the complexity (number of operations performed) grows at most proportionally to N^2, ignoring constant factors and lower-order terms.
So looking at the code, str.length is N. For each element, you compare it to each other element: N elements compared up to N times each = N^2.
Now O(N^2) is not meant to be exact. By definition it is concerned only with the non-constant factors that contribute to complexity growth. It will never tell you how quickly a particular algorithm will run; it purely tells you how the running time scales as the number of elements being operated on changes.
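To make the count concrete for the solution in the question: in the worst case (all N characters distinct), tail equals i when the outer loop is at index i, so the inner loop performs i comparisons. As a rough worked sum:

```latex
\sum_{i=1}^{N-1} i \;=\; \frac{N(N-1)}{2} \;=\; \frac{N^2}{2} - \frac{N}{2} \;=\; O(N^2)
```

The constant 1/2 and the lower-order term N/2 are exactly the parts that Big-O notation drops.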

Use this to remove every duplicate character when the input is all lowercase:
static boolean contains(char c, char[] array) {
for (char x : array) {
if (x == c) {
return true;
}
}
return false;
}
public static void main(String[] args) {
String s = "stackoverflow11221113" ;
String result = "";
for(char ch:s.toCharArray()){
if(!contains(ch,result.toCharArray())){
result +=ch;
}
}
System.out.println(result);
}
Use this to remove every duplicated character regardless of case (lowercase or uppercase):
static boolean contains(char c, char[] array) {
for (char x : array) {
if (x == c) {
return true;
}
}
return false;
}
public static void main(String[] args) {
String s = "StackOverFlow11221113" ;
String result = "";
for(char ch:s.toCharArray()){
if(!contains(Character.toLowerCase(ch),result.toLowerCase().toCharArray())){ // lowercase both sides so that 'S' matches an earlier 's'
result +=ch;
}
}
System.out.println(result);
}

Related

I have to check if an element of the array is the same as the previous one (java)

I followed the advice of the site but I did not find answers that satisfied me.
I have to solve a school exercise. I have an array and I need to check if there is at least a sequence of 3 consecutive "a" characters.
public static String alternative(char[] a) {
String ret = "";
int consecutiveCounter = 0;
int i = 1;
while(consecutiveCounter<3){
while(i<= a.length){
if(a[i] =='a' && a[i] == a[i-1] ) {
consecutiveCounter++;
} else {
consecutiveCounter = 0;
}
i++;
}
}
if (consecutiveCounter == 3) {
ret += "there are three consecutive a char";
} else {
ret += "there are not three consecutive a char";
}
return ret;
}
public static void main(String[] args) {
char[] a = new char[]{'a', 'a', 'b', 'c', 'a', 'b', 'a'};
System.out.println(alternative(a));
}
The terminal gives me this exception:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 7 out of bounds for length 7
at Es1.alternative(Es1.java:9)
at Es1.main(Es1.java:31)
I can't increase the value of the index (i) without going out of the array bounds
It may be better to use for loops here: the outer one checks the array bounds, and the inner one counts occurrences of 'a' and returns early as soon as the required limit is reached:
public static String alternative(char[] a) {
for (int i = 0, n = a.length; i < n; ) {
for (int count = 1; i < n && a[i++] == 'a'; count++) {
if (count == 3) {
return "there are three consecutive 'a' chars";
}
}
}
return "there are not three consecutive 'a' chars";
}
It's worth mentioning that the String class (which is basically built on a char array) has several methods that implement this functionality:
String::contains: "aabbccaaa".contains("aaa") // true
String::indexOf: "aabbccaa".indexOf("aaa") // -1, aaa not found
String::matches (using regular expression): "aabbaaaccaa".matches(".*a{3}.*") // true
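Since the question starts from a char[] rather than a String, here is a minimal sketch of applying the first of these methods to the array. The class and helper names are made up for illustration.

```java
class ConsecutiveViaString {
    // Hypothetical helper: converts the array to a String and searches for "aaa".
    static boolean hasThreeConsecutiveA(char[] a) {
        return new String(a).contains("aaa");
    }

    public static void main(String[] args) {
        char[] a = {'a', 'a', 'b', 'c', 'a', 'b', 'a'};
        System.out.println(hasThreeConsecutiveA(a)); // prints false
    }
}
```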
Also, I don't think your outer loop will work well.
1. Suppose there are no consecutive chars; then consecutiveCounter will remain 0 and while(consecutiveCounter<3) won't end.
2. Even if there are one or two, the counter will be set to 0 again, and while(consecutiveCounter<3) won't end.
Here are some suggestions.
Use a for loop from i = 1 to i < a.length. Then i won't exceed the last index of the array.
You are only trying to find 3 consecutive 'a's. So initialize consecutiveCounter to 1.
As soon as the first consecutive pair are found, you increment consecutiveCounter and it will now be 2 which is correct.
Then, in the same if clause, check whether that value equals 3. If so, return the String immediately (you may even have 4 or 5 consecutive a's, but you also have 3, so return when the count of 3 is first encountered).
Else, if the if statement fails, reset consecutiveCounter to 1 and continue the loop.
At the end, outside the loop, return the string that indicates the requirement wasn't met.
Note: If you were trying to find the maximum number of consecutive a's setting the counter to 1 wouldn't work because you may have no a's at all. But since you are looking for a specific number == 3, it works fine.
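The suggestions above can be sketched as follows. This is one possible reading of them, not the only way to write it; the wrapper class name is made up, while the method name, variable name and messages are taken from the question.

```java
class ConsecutiveCounter {
    static String alternative(char[] a) {
        int consecutiveCounter = 1; // a lone 'a' already counts as a run of 1
        for (int i = 1; i < a.length; i++) { // i never exceeds the last index
            if (a[i] == 'a' && a[i] == a[i - 1]) {
                consecutiveCounter++;
                if (consecutiveCounter == 3) {
                    return "there are three consecutive a char"; // return early
                }
            } else {
                consecutiveCounter = 1; // run broken, start over
            }
        }
        return "there are not three consecutive a char";
    }

    public static void main(String[] args) {
        char[] a = {'a', 'a', 'b', 'c', 'a', 'b', 'a'};
        System.out.println(alternative(a)); // prints "there are not three consecutive a char"
    }
}
```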

Comparing all the characters between two strings (even if they contain numbers) in Java

OK, so I currently have a String array which contains keycodes, and I want to check if the first element shares common elements with the second, e.g. [012] has elements in common with [123]. I currently loop through the length of the first element, then loop through the length of the second element, and compare the two like this:
if (A[1].charAt(j) == A[2].charAt(i)) c++;
c is a counter showing how many common elements the keycodes have. Here is the method I created:
static boolean hasSimilarity(String[] A, int K, int i){
int c = 0;
for(int j = 0;j<K;j++){
for(int m = j;m<K;m++){
if(A[i].charAt(j) == A[i+1].charAt(m)) c++;
}
}
return c != 0;
}
And here is the execution of it in the Main class:
int max = -1;
findSimilar FS = new findSimilar();
for (int i = 0; i < sum.length -1; i++) {
boolean hasSimilar = FS.hasSimilarity(key,K,i);
if (!hasSimilar) {
int summ = sum[i] + sum[i + 1];
System.out.println(summ);
if (summ > max) {
max = summ;
}
}
}
When I run this, I get a java.lang.StringIndexOutOfBoundsException out of range: 0. What am I doing wrong? Is there a better way to compare two keycodes in order to find similarities between them?
This error:
java.lang.StringIndexOutOfBoundsException out of range: 0
can only occur if one of your strings is the empty string "".
You are attempting to get charAt(0) when there is no char 0 (i.e. no first char).
You would avoid this problem, and have a far more efficient algorithm, if you first collected the counts of each character and then compared those. That has time complexity O(n), whereas your algorithm is O(n^2) (although it seems your n, the length of your inputs, is small).
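A minimal sketch of that idea, assuming the goal is only to know whether two keycodes share at least one character. The class name and the two-String signature are assumptions for illustration, not the asker's original API; an empty string simply yields no matches instead of an exception.

```java
import java.util.HashMap;
import java.util.Map;

class KeycodeSimilarity {
    // Counts how many times each character occurs in s.
    static Map<Character, Integer> counts(String s) {
        Map<Character, Integer> m = new HashMap<>();
        for (char c : s.toCharArray()) {
            m.merge(c, 1, Integer::sum);
        }
        return m;
    }

    // True if x and y have at least one character in common.
    // One pass over each string, so O(n) overall.
    static boolean hasSimilarity(String x, String y) {
        Map<Character, Integer> seen = counts(x);
        for (char c : y.toCharArray()) {
            if (seen.containsKey(c)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasSimilarity("012", "123")); // prints true
        System.out.println(hasSimilarity("012", "345")); // prints false
    }
}
```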

Create substring using loop

I am given a string from which I have to find the substrings that satisfy one of the following conditions:
all characters in the substring are the same, e.g. aa, bbb, cccc.
all the characters except the middle character have to be the same,
e.g. aba, bbabb, etc.
I've made an algorithm something like this:
I break the string using two loops: the first loop holds the first char and the second loop traverses the string.
Then I send the substring to vet() to see if it contains at most two distinct characters.
If it does, I check whether the substring is a palindrome.
public static int reverse(String s)
{
String wrd="";
for(int i = s.length()-1 ;i>=0;i--)
wrd = wrd + s.charAt(i);
if(s.equals(wrd))
return 1;
else
return 0;
}
public static boolean vet(String s)
{
HashSet<Character> hs = new HashSet<>();
for(char c : s.toCharArray())
{
hs.add(c);
}
if(hs.size() <= 2)
return true;
else
return false;
}
static long substrCount(int n, String s) {
List<String> al = new ArrayList<>();
for(int i=0;i<s.length();i++)
{
for(int j=i;j<s.length();j++)
{
if(vet(s.substring(i,j+1)))
{
if(reverse(s.substring(i,j+1)) == 1)
al.add(s.substring(i,j+1));
}
}
}
return al.size();
}
This code works fine for small strings; however, if the string is big, say ten thousand characters, this code exceeds the time limit.
I suspect the nested loops in substrCount() that break the string into substrings are causing the time complexity.
Please review this code and suggest a better way to break up the string or, if the complexity is increasing due to some other section, let me know.
link : https://www.hackerrank.com/challenges/special-palindrome-again/problem?h_l=interview&playlist_slugs%5B%5D=interview-preparation-kit&playlist_slugs%5B%5D=strings
You can collect counts from the left side and the right side of the string in 2 separate arrays.
We collect the counts like this: if the previous char equals the current char, increase the count by 1; otherwise set it to 1.
Example:
a a b a a c a a
1 2 1 1 2 1 1 2 // left to right
2 1 1 2 1 1 2 1 // right to left
For strings that have all characters equal, we just collect all of them while iterating.
For strings with all characters equal except the middle one, you can use the table above and collect the strings as below:
Pseudocode:
if(str.charAt(i-1) == str.charAt(i+1)){ // you will add checks for boundaries
int min_count = Math.min(left[i-1],right[i+1]);
for(int j=min_count;j>=1;--j){
set.add(str.substring(i-j,i+j+1));
}
}
Update:
Below is my accepted solution.
static long substrCount(int n, String s) {
long cnt = 0;
int[] left = new int[n];
int[] right = new int[n];
int len = s.length();
for(int i=0;i<len;++i){
left[i] = 1;
if(i > 0 && s.charAt(i) == s.charAt(i-1)) left[i] += left[i-1];
}
for(int i=len-1;i>=0;--i){
right[i] = 1;
if(i < len-1 && s.charAt(i) == s.charAt(i+1)) right[i] += right[i+1];
}
for(int i=len-1;i>=0;--i){
if(i == 0 || i == len-1) cnt += right[i];
else{
if(s.charAt(i-1) == s.charAt(i+1) && s.charAt(i-1) != s.charAt(i)){
cnt += Math.min(left[i-1],right[i+1]) + 1;
}else if(s.charAt(i) == s.charAt(i+1)) cnt += right[i];
else cnt++;
}
}
return cnt;
}
Algorithm:
The algorithm is the same as explained above, with a few additions.
If the character is at the boundary, say at 0 or at len-1, we just look at right[i] to count the strings, because we don't have a left here.
If a character is inside this boundary, we do checks as follows:
If the previous character equals the next character, we also check that the previous character does not equal the current character. We do this to avoid counting all-equal strings twice at the current iteration (say, for strings like aaaaa where we are at the middle a).
The second condition says s.charAt(i) == s.charAt(i+1), meaning we have strings like aaa and we are at the first a. So we just add right[i] to account for strings like a, aa, aaa.
The third does cnt++, which counts the individual character.
You can make a few optimizations, like completely avoiding the right array, but I leave that to you as an exercise.
Time complexity: O(n), Space complexity: O(n)
Your current solution's runtime is O(n^4). You can reduce it to O(n^2 log n) by replacing the per-substring character counting and optimising the palindrome check.
To do so, you have to precompute an array, say "counter", where every position indicates the number of different characters from the starting index up to that position.
After constructing the array, you can check in O(1) whether a substring has more than two characters by subtracting the counter values at the end position and the starting position. If the value is 1, there is only one character in the substring. If the value is 2, you can binary search the counter array between the substring's start and end positions to find the position of the single different character. Once you have that position, it is straightforward to check whether the substring is a palindrome.
UPDATE!
Let me explain with an example:
Suppose the string is "aaabaaa".
So, the counter array would be = [1, 1, 1, 2, 2, 2, 2];
Now, let's assume that at some point the outer for loop's value is i = 1 and the inner for loop's value is j = 5, so the substring is "aabaa".
Now find the number of different characters in the substring with the following code:
noOfDifferentCharacter = counter[j] - counter[i-1] + 1
If noOfDifferentCharacter is 1, there is no need to check for a palindrome. If noOfDifferentCharacter is 2, as in our case, we need to check whether the substring is a palindrome. To do that, perform a binary search in the counter array from index i to j for the position where the value is greater than at its previous index. In our case the position is 3; then you just need to check whether that position is the middle position of the substring. Note that the counter array is sorted.
Hope this helps. Let me know if you don't understand any step. Happy coding!

How do I improve the below algorithm to recursively remove adjacent duplicate characters from right to left in a string?

The problem statement is to remove duplicates in a string recursively.
For example: abcddaacg -> abccg -> abg
Now the algorithm I implemented is thus:
Keep 2 pointers (i and j). i is always < j. str[j] is always the element str[i] compares to remove duplicates. Hence when j = 6 and i = 5, I replace both of them with '\0' and then I update j to 7 (c). Then when j = 4 and i = 3 (both are d, j got updated cause str[4] != str[6] and hence j = i, became j = 4) we update both of them to '\0'.
My problem is with the next step, when I update j to 7. For this I have to search for the next character which is not '\0'. This is what makes it an O(n^2) implementation. How can I make it better, i.e. O(n)?
Below is the code:
static void remDups (String input) {
char [] str = input.toCharArray();
int j = input.length()-1;
int i = input.length()-2;
while (i >= 0){
if (str[i] == str[j]) {
str[i] = '\0';
str[j] = '\0';
j++;
while (str[j] == '\0' && j < str.length) {
j++;
}
} else {
j = i;
}
i--;
}
i = 0;
while (i < input.length()) {
if (str[i] != '\0')
System.out.print(str[i]);
i++;
}
}
Here is an iterative solution that uses a stack and removes adjacent duplicates in a char[], which is easier to work with because String objects are immutable.
On each iteration:
use a boolean flag to keep track of whether any adjacent dups need to be removed.
push elements that don't have equal adjacents onto the stack.
if we find equal adjacents, advance the index while they are equal (as there may be more than two equal adjacents) and update the flag to say that adjacent dups need to be removed.
otherwise we are done.
if the flag was activated, we overwrite our char[] with the non-dups that were stored in the stack and repeat the loop until no dups are removed.
Here is the code:
static char[] removeAdjDups(char[] data) {
if (data == null) {
return null;
}
Stack<Character> stack = new Stack<Character>();
boolean removal = true; // flag keeping track if any dups were removed
char[] temp;
while (removal) {
/* set removal to false */
removal = false;
/* push elements that don't have equal adjacents onto the stack */
for (int i = 1; i < data.length; i++) {
if (data[i - 1] != data[i]) {
stack.push(data[i - 1]);
} else {
while (i < data.length && data[i - 1] == data[i]) {
i++;
}
/* if we found equal adjacents, activate the removal flag */
removal = true;
}
if (i == data.length - 1) {
stack.push(data[i]);
}
}
/* if dups were removed
store the array with removed adjacent dups into original data */
if (removal) {
temp = new char[stack.size()];
for (int i = temp.length - 1; i >= 0; i--) {
temp[i] = stack.pop();
}
data = temp;
}
}
return data;
}
public static void main(String[] args) {
String str = "abcddaacg";
System.out.println(removeAdjDups(str.toCharArray()));
}
Output:
abg
The inefficiency in your algorithm comes from the fact that you're using a data structure (and an approach) that leaves empty space where removed pairs used to be, which makes it expensive to work around them. I can think of a more efficient algorithm that uses your approach but a better data structure, and I can also think of a more efficient algorithm that uses a different approach.
This feels like homework, so I'd rather nudge you towards finding the answers on your own than give them to you directly, but I'd encourage you to figure out both algorithms I mentioned (different data structure but same algorithm, and different algorithm). If you need a hint on either, reply in the comments and I'll work with you to get there...
EDIT: I just saw your comment saying that matching is right-to-left, so my alternate algorithm won't work. But I'd encourage you to try to figure out what it would be, if greedy left-to-right matches were allowed.
EDIT 2: This algorithm actually will work with right-to-left matching, you just need to iterate right-to-left across the string.
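For readers who want to check their own attempt, here is one possible O(n) sketch. It is not necessarily the algorithm the answer above had in mind. It assumes duplicates are removed in adjacent pairs, matched greedily from right to left, and it uses a StringBuilder as a stack so that removed characters never leave gaps to skip over. The class and method names are hypothetical.

```java
class AdjacentPairRemoval {
    static String removePairs(String input) {
        StringBuilder stack = new StringBuilder(); // survivors, in reverse order
        for (int i = input.length() - 1; i >= 0; i--) {
            char c = input.charAt(i);
            int top = stack.length() - 1;
            if (top >= 0 && stack.charAt(top) == c) {
                stack.setLength(top); // c cancels the nearest surviving char to its right
            } else {
                stack.append(c);
            }
        }
        return stack.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(removePairs("abcddaacg")); // prints "abg"
    }
}
```

Each character is pushed at most once and popped at most once, which is where the linear bound comes from.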

recursive method for flipping card

In Java: if I have an index card with the letter C written on one side and S on the other, how do I write a recursive method that prints each session of dropping the card as C's and S's? For example, if I drop it 4 times, all possible ways to drop it are as follows, in this specific order:
SCSC
SCSS
SSCC
SSCS
SSSC
SSSS
It's actually rather simple:
public void increments(List<String> items, int length, String prefix)
{
if (prefix.length() == length) {
items.add(prefix);
return;
}
increments(items, length, prefix + 'C');
increments(items, length, prefix + 'S');
}
As you can see, there are two recursive calls, one for the 'C' character and one for the 'S' character, and the recursion base case is reached when the prefix length equals the length specified (4, in your case).
Call like so:
List<String> inc = new LinkedList<>();
increments(inc, 4, "");
for (String s : inc)
System.out.println(s);
Which outputs:
CCCC
CCCS
CCSC
CCSS
CSCC
CSCS
CSSC
CSSS
SCCC
SCCS
SCSC
SCSS
SSCC
SSCS
SSSC
SSSS
This method can easily be generalised for any array of characters:
public void increments(List<String> items, int length,
String prefix, char[] chars)
{
if (prefix.length() == length) {
items.add(prefix);
return;
}
for (char c : chars)
increments(items, length, prefix + c, chars);
}
List<String> inc = new LinkedList<>();
increments(inc, 4, "", new char[] {'C', 'S'});
for (String s : inc)
System.out.println(s);
This yields the same output.
Note: this method has a high complexity, O(pow(chars.length, length)), so attempting to run it with a large input size will take a (very) long time to complete.
Integer.toBinaryString(int) Approach
As requested:
public void increments_ibs(List<String> items, int n, int i)
{
if (i >= Math.pow(2, n))
return;
String bs = Integer.toBinaryString(i);
while (bs.length() < n)
bs = "0" + bs;
items.add(bs.replaceAll("0", "C").replaceAll("1", "S"));
increments_ibs(items, n, i+1);
}
This is essentially an iterative algorithm written recursively.
So this problem is actually a lot simpler if you realize that the card flipping is actually just counting upwards by one in binary.
With that in mind, you can just keep track of an X bit long number (where X is the number of cards you want to keep track of) and then print the cards by checking to see where there are 1's (S) or 0's (C).
After you do that, check to see if all of the positions have 1's. If they do, exit recursion. If they don't, add one to the number and run the function again.
Edit
If you know the number of bits in the number beforehand (something easy enough to calculate) you could use bit shifting (>> will be the best here). So for instance, you could have a quick for loop to go through the number and check each position.
int cardTracker = 0b0110; //this is 6 in decimal
char[] toPrint = new char[4];
//The 4 here is the length of the binary number
for(int i = 0; i < 4; i++)
{
//This if statement checks if the bit at this position is 0 or 1
//(the extra parentheses are needed because % binds tighter than >>)
if(((cardTracker >> ((4 - 1) - i)) % 2) == 0)
{
toPrint[i] = 'C';
}
else
{
toPrint[i] = 'S';
}
}
The above will print the following if you were to print the contents of toPrint.
CSSC
Hopefully you can use the above and adapt it to your code for the recursive problem.
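Putting the counting idea and the bit check together, a recursive version might look like the sketch below. It assumes a bit value of 0 means C and 1 means S, as described above, and that the number of drops is small enough for the state to fit in an int. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class CardFlips {
    // Hypothetical helper: renders the lowest `cards` bits of `state`,
    // most significant bit first, with 0 -> 'C' and 1 -> 'S'.
    static String render(int state, int cards) {
        char[] out = new char[cards];
        for (int i = 0; i < cards; i++) {
            int bit = (state >> (cards - 1 - i)) & 1;
            out[i] = (bit == 0) ? 'C' : 'S';
        }
        return new String(out);
    }

    // Records the current state, then recurses on state + 1 until all bits are 1.
    static void flips(int state, int cards, List<String> out) {
        out.add(render(state, cards));
        if (state == (1 << cards) - 1) {
            return; // every position is S: exit the recursion
        }
        flips(state + 1, cards, out);
    }

    public static void main(String[] args) {
        List<String> result = new ArrayList<>();
        flips(0, 4, result);
        for (String s : result) {
            System.out.println(s); // CCCC, CCCS, ..., SSSS
        }
    }
}
```

The recursion depth is 2^cards, so this is only practical for a small number of drops.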
