Remove duplicate characters in a string in Java


I started to read the famous "Cracking the Coding Interview" book. One of its exercises reads:
Design an algorithm and write code to remove the duplicate characters in a string
without using any additional buffer. NOTE: One or two additional variables are fine.
An extra copy of the array is not.
I found a similar topic here: Remove the duplicate characters in a string
The solution given by the author was:
public static void removeDuplicates(char[] str) {
if (str == null) return;
int len = str.length;
if (len < 2) return;
int tail = 1;
for (int i = 1; i < len; ++i) {
int j;
for (j = 0; j < tail; ++j) {
if (str[i] == str[j]) break;
}
if (j == tail) {
str[tail] = str[i];
++tail;
}
}
if (tail < len) str[tail] = 0; // terminate only when something was removed; writing at index len would throw
}
The problem here is that the author used an array as the argument to this function. So my question is: how can you write an algorithm that takes a String as its argument? It feels much easier to use an array here, as if you were "avoiding the difficulty" of the exercise (in my opinion; I'm a new Java developer).
How can you write such an algorithm?

Java strings are immutable, so you can't do it with a string without copying the array into a buffer.
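A minimal sketch of that copy-then-deduplicate idea, assuming the single copy made by toCharArray() is acceptable under the exercise's rules. The class name DedupViaArray is hypothetical and only there to make the example self-contained; the in-place loop follows the same idea as the array version in the question.

```java
class DedupViaArray {
    // Hypothetical helper: copies the string once, then deduplicates the copy in place.
    static String removeDuplicates(String s) {
        if (s == null || s.length() < 2) {
            return s;
        }
        char[] chars = s.toCharArray(); // the one copy that immutability forces on us
        int tail = 1;                   // chars[0..tail) holds the unique characters so far
        for (int i = 1; i < chars.length; i++) {
            int j = 0;
            while (j < tail && chars[j] != chars[i]) {
                j++;
            }
            if (j == tail) {            // chars[i] was not found among the kept characters
                chars[tail++] = chars[i];
            }
        }
        return new String(chars, 0, tail); // only the deduplicated prefix
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates("aabcdabc")); // abcd
    }
}
```

Returning new String(chars, 0, tail) avoids the need for a terminating 0 character, since the result carries its own length.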

For this to work with a String, you'd have to return a String from the method that represents the modified str with no duplicates. I'm not sure whether that goes against the rules, but here's how I'd solve the problem with Strings:
For each character in the string, I would split the string at that character and remove all instances of that character from the latter substring. I would then concatenate the former substring with the modified latter substring, making sure that the character itself is kept in its place. Something like this:
public static String removeDuplicates( String str ) {
if( str == null || str.length() < 2 )
return str;
String temp;
for( int x = 0; x + 1 < str.length(); x++ ) {
temp = str.charAt( x ) + "";
str = str.substring( 0, x ) + temp + str.substring( x + 1 ).replace( temp, "" ); // replace() is literal; replaceAll() would treat characters like "." or "*" as regex
}
return str;
}

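In Java 8 we can do it with streams. Note that this version removes duplicates within each space-separated word, not across the whole string (it needs java.util.Arrays, java.util.stream.Stream and java.util.stream.Collectors):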
private void removeduplicatecharactersfromstring() {
String myString = "aabcd eeffff ghjkjkl";
StringBuilder builder = new StringBuilder();
System.out.println(myString);
Arrays.asList(myString.split(" "))
.forEach(s -> {
builder.append(Stream.of(s.split(""))
.distinct().collect(Collectors.joining()).concat(" "));
});
System.out.println(builder); // abcd ef ghjkl
}
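If duplicates should be removed across the whole string rather than per word, a possible stream-based sketch is shown below. The class and method names are hypothetical; it works on code points so that characters outside the basic multilingual plane are kept intact, and it relies on distinct() preserving encounter order for an ordered, sequential stream.

```java
class DistinctChars {
    // Hypothetical helper: keeps the first occurrence of every code point.
    static String removeDuplicates(String s) {
        return s.codePoints()
                .distinct() // drops later repeats, keeps first occurrences in order
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates("aabcd eeffff ghjkjkl")); // abcd efghjkl
    }
}
```

The second space disappears as well, because a space is just another character that has already been seen.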

Related

How to efficiently remove consecutive same characters in a string

I wrote a method that reduces each run of the same character to a single character, as shown below. Its logic seems correct, but according to my tutor there is room for improvement in terms of performance. Could anyone shed some light on this?
Comments on aspects other than performance are also appreciated.
public class RemoveRepetitions {
public static String remove(String input) {
String ret = "";
String last = "";
String[] stringArray = input.split("");
for(int j=0; j < stringArray.length; j++) {
if (! last.equals(stringArray[j]) ) {
ret += stringArray[j];
}
last = stringArray[j];
}
return ret;
}
public static void main(String[] args) {
System.out.println(RemoveRepetitions.remove("foobaarrbuzz"));
}
}

We can improve the performance by using a StringBuilder instead of repeated String concatenation, because each += on a String creates a new String object. The split call is not required either (it also makes the program slower).
Here is a way to solve this:
public static String remove(String input)
{
StringBuilder answer = new StringBuilder("");
int N = input.length();
int i = 0;
while (i < N)
{
char c = input.charAt(i);
answer.append( c );
while (i<N && input.charAt(i)==c)
++i;
}
return answer.toString();
}
The idea is to iterate over all characters of the input string and keep appending every new character to the answer and skip all the same consecutive characters.

Here are some changes you could consider for your code:
Time complexity: Your loop makes a single O(n) pass over the input, which is the best possible. Be aware, though, that ret += inside the loop copies the accumulated string each time, so the total work can still grow quadratically for long inputs.
Space complexity: Your code uses extra memory for the array produced by split.
Question to ask: Can you produce this output without the extra array you get from splitting the string? (Character-by-character traversal is possible directly on the string.)
I could provide the code here, but it would be better if you tried it on your own first. Once you are done with your attempts,
you can look up a reference solution here (you are almost there):
https://www.geeksforgeeks.org/remove-consecutive-duplicates-string/
Good luck!

As mentioned before, it is much better to access the characters of the string with String::charAt, or at least to iterate over a char array retrieved with String::toCharArray, than to split the input string into a String array.
However, Java strings may contain characters outside the Basic Multilingual Plane of Unicode (e.g. emojis 😂😍😊 and some rarer CJK characters), which are stored as two chars each, so String::codePointAt should be used instead. Accordingly, Character.charCount should be used to compute the offset of the next code point while iterating over the input string.
The input string should also be checked for null or empty, so the resulting code may look like this:
public static String dedup(String str) {
if (null == str || str.isEmpty()) {
return str;
}
int prev = -1;
int n = str.length();
System.out.println("length = " + n + " of [" + str + "], real length: " + str.codePointCount(0, n));
StringBuilder sb = new StringBuilder(n);
for (int i = 0; i < n; ) {
int cp = str.codePointAt(i);
if (i == 0 || cp != prev) {
sb.appendCodePoint(cp);
}
prev = cp;
i += Character.charCount(cp); // for emojis it returns 2
}
return sb.toString();
}
A version with String::charAt may look like this:
public static String dedup2(String str) {
if (null == str || str.isEmpty()) {
return str;
}
int n = str.length();
StringBuilder sb = new StringBuilder(n);
sb.append(str.charAt(0));
for (int i = 1; i < n; i++) {
if (str.charAt(i) != str.charAt(i - 1)) {
sb.append(str.charAt(i));
}
}
return sb.toString();
}
The following test shows that the charAt version fails to deduplicate repeated emojis:
System.out.println("codePoint: " + dedup ("😂😂😍😍😊😊😂 hello"));
System.out.println("charAt: " + dedup2("😂😂😍😍😊😊😂 hello"));
Output:
length = 20 of [😂😂😍😍😊😊😂 hello], real length: 13
codePoint: 😂😍😊😂 helo
charAt: 😂😂😍😍😊😊😂 helo

CodingBat starOut, why using substring won't work correctly

I am solving a coding challenge on CodingBat.com. Here is the question:
Given a string and a non-empty word string, return a version of the
original String where all chars have been replaced by pluses ("+"),
except for appearances of the word string which are preserved
unchanged.
plusOut("12xy34", "xy") → "++xy++"
plusOut("12xy34", "1") → "1+++++"
plusOut("12xy34xyabcxy", "xy") → "++xy++xy+++xy"
Here is my attempted solution:
public String plusOut(String str, String word)
{
String ret = "";
for (int i = 0; i < str.length() - word.length() + 1; ++i) {
if (str.substring(i, i + word.length()).equals(word))
ret += word;
else
ret += "+";
}
return ret;
}
But it gives wrong output: there are too many plus signs. I don't understand why this doesn't work. I suspect that the substring method is not returning enough matches, so a plus sign is appended instead, but I don't see why that would be so.

I would use a StringBuilder to construct the result, to avoid creating multiple String objects, since String in Java is immutable:
public String plusOut(String str, String word) {
StringBuilder result = new StringBuilder(str);
int len = str.length(), wordLen = word.length(), index = 0;
while(index < len){
if ( (index <= len-wordLen) && (str.substring(index, index+wordLen).equals(word))){
index += wordLen;
continue;
}
result.setCharAt(index++, '+');
}
return result.toString();
}

You were doing a few things wrong. I've corrected your code although there is probably a cleaner way to do this. I will explain what's changed below.
public static String plusOut(String str, String word)
{
String ret = "";
for (int i = 0; i < str.length(); ++i) {
int endIndex = i + word.length();
if (endIndex < str.length() + 1
&& str.substring(i, i + word.length()).equals(word)) {
ret += word;
i = i + word.length() - 1;
} else
ret += "+";
}
return ret;
}
The first mistake is that you are not looping over the whole content of str, and therefore never reach the last characters of str.
The other problem is that once you find the word, you don't "jump" to the correct next index in the loop. Instead you continue looping over the characters of the word you just found, which results in additional + characters in your result string.
i = i + word.length() - 1;
Added to your solution, the line above moves i to the last character of the found word, so that the loop's ++i lands on the next character of str that you should be looking at. Example:
Take the string 12xy34xyabcxy and look for xy.
You will find that the word xy starts at index 2 and ends at index 3.
At that point the result string is ++xy, after adding the found word to it.
Now the problem begins. Without the jump, you still go over index 3 and add an additional +, because the characters starting there do not match your word.
The 2 characters after the found xy also add a + each, and you now have ++xy+++, which is incorrect.
endIndex < str.length() + 1
endIndex is named after what it is: the (exclusive) end index of your substring.
This check prevents us from calling substring when there aren't enough characters left between the current index and the end of the string to make up xy, so we add a + for each remaining character instead.
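The two fixes described above (scan the whole string, and jump past a matched word) can also be sketched with String.startsWith(String prefix, int toffset), which returns false rather than throwing when the word no longer fits, so no separate bounds check is needed. The class name PlusOutSketch is hypothetical, and the sketch assumes word is non-empty, as the problem statement guarantees.

```java
class PlusOutSketch {
    // Assumes a non-empty word; an empty word would never advance i.
    static String plusOut(String str, String word) {
        StringBuilder ret = new StringBuilder();
        int i = 0;
        while (i < str.length()) {
            if (str.startsWith(word, i)) { // false when fewer than word.length() chars remain
                ret.append(word);
                i += word.length();        // jump past the whole matched word
            } else {
                ret.append('+');
                i++;
            }
        }
        return ret.toString();
    }

    public static void main(String[] args) {
        System.out.println(plusOut("12xy34", "xy"));        // ++xy++
        System.out.println(plusOut("12xy34", "1"));         // 1+++++
        System.out.println(plusOut("12xy34xyabcxy", "xy")); // ++xy++xy+++xy
    }
}
```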

Do it like this:
public static String plusOut(String str, String word)
{
String ret = "";
int i;
for (i = 0; i < str.length() - word.length() +1 ; i++) {
if (str.substring(i, i + word.length()).equals(word)) {
ret += word;
i += word.length() - 1;
}
else
ret += "+";
}
while (i < str.length()) {
ret += "+";
i++;
}
return ret;
}

Here is your solution
public String plusOut(String str, String word)
{
String ret = "";
for (int i = 0; i < str.length();) {
if (i + word.length()<= str.length() && str.substring(i, i + word.length()).equals(word)) {
ret += word;
i+=word.length();
}
else{
ret += "+";
i++;
}
}
return ret;
}

Trying to change a string to altcase

I'm trying to write code that converts a string to altcase (i.e. "hello" becomes "HeLlO"). I borrowed code from another question on this forum that asked for something similar (Java Case Switcher). However, that code only switched the case of each letter instead of producing a capital letter first, then a lowercase letter, and so on.
What I have so far:
public String altCase(String text)
{
String str = "";
for (int i = 0; i <= text.length(); i++)
{
char cA = text.charAt(i);
if (text.charAt(0).isUppercase)
{
str += Character.toLowerCase(cA);
}
if (text.charAt(0).isLowercase)
{
str += Character.toUpperCase;
}
if(i != 0 && Character.isUpperCase(cA))
{
if (text.charAt(i)-1.isUpperCase || text.charAt(i)+1.isUpperCase)
{
str += Character.toLowerCase(cA);
}
else
{
str += cA;
}
}
if(i != 0 && Character.isLowerCase(cA))
{
if (text.charAt(i)-1.isLowerCase || text.charAt(i)+1.isLowerCase)
{
str += Character.toUpperCase(cA);
}
else
{
str += cA;
}
}
}
return str;
}
I'm still relatively new to coding in general so please excuse my inefficiencies, as well as any headaches I might induce from the lack of experience in my coding. I cannot tell where I am going wrong except maybe when I typed "text.charAt(i)-1.isLowerCase" as the statement seems a bit illogical, but I am lost in terms of trying to come up with something else that would accomplish the same thing. Or is my error completely elsewhere? Thanks for any help in advance.

The modulus operator could take you a long way here...
StringBuilder rslt = new StringBuilder();
for (int i = 0; i < text.length(); i++) {
char c = text.charAt(i);
switch (i % 2) {
case 0:
rslt.append(Character.toUpperCase(c));
break;
case 1:
rslt.append(Character.toLowerCase(c));
break;
}
}
return rslt.toString();

If I understand correctly, what you want is this:
Take a string and change it into the format AbCdEfG... and so on.
There is a simpler solution.
Loop over the string and, for every character, set its case depending on its position in the string: upper case when i % 2 == 0, lower case when i % 2 == 1.
public String altCase(String text)
{
String str = "";
for (int i = 0; i < text.length(); i++)
{
char cA = text.charAt(i);
if (i%2 == 0)
{
str += Character.toUpperCase(cA);
}
else
{
str += Character.toLowerCase(cA);
}
}
return str;
}

I would start with a StringBuilder (a mutable character sequence) of text.toLowerCase(); then set the characters at even indices to their capital equivalents (and your method doesn't appear to depend on instance state, so it might be static). Something like,
public static String altCase(String text) {
StringBuilder sb = new StringBuilder(text.toLowerCase());
for (int i = 0; i < text.length(); i += 2) {
sb.setCharAt(i, Character.toUpperCase(sb.charAt(i)));
}
return sb.toString();
}

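The same idea with streams (needs java.util.stream.IntStream and java.util.stream.Collectors):
public static String altCase(String s) {
return IntStream.range(0, s.length())
.mapToObj(i -> i % 2 == 0
? Character.toUpperCase(s.charAt(i))
: Character.toLowerCase(s.charAt(i)))
.map(String::valueOf)
.collect(Collectors.joining());
}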

Java string index out of bounds in for loop (codingbat function mirrorEnds)

I have a question regarding a problem in the String-3 section of CodingBat. The question is as follows:
Given a string, look for a mirror image (backwards) string at both the
beginning and end of the given string. In other words, zero or more
characters at the very beginning of the given string, and at the very
end of the string in reverse order (possibly overlapping).
For example, the string "abXYZba" has the mirror end "ab"
mirrorEnds("abXYZba") → "ab"
mirrorEnds("abca") → "a"
mirrorEnds("aba") → "aba"
My code is as follows:
public String mirrorEnds(String string) {
if(string.length() <=1) return string;
String x = "";
int y = string.length() - 1;
for(int i = 0; i < string.length()/2; i++)
{
if(string.charAt(i) == string.charAt(y))
{
x+= Character.toString(x.charAt(i));
y--;
}
else
{
return x;
}
}
return string;
}
When I try it for the following:
"xxYxx"
The string length is 5, so the indices run from 0 to 4. If I trace my code by hand, the logic is:
i = 0 and y = 4;
string.charAt(i) == string.charAt(y) //true and i++ and y--
string.charAt(i) == string.charAt(y) //true and i++ and y--
//i is == string.length()/2 at this point
But the program throws an index-out-of-bounds error (StringIndexOutOfBoundsException). Why is this the case?

You are accessing the ith character of the wrong string here:
x += Character.toString(x.charAt(i));
The String x is empty at first, so the character at index 0 doesn't exist.
Access the original string instead.
x += Character.toString(string.charAt(i));
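For reference, here is a sketch of the question's method with only that one line changed. The wrapper class MirrorEndsFixed is hypothetical and only there to make the snippet self-contained.

```java
class MirrorEndsFixed {
    static String mirrorEnds(String string) {
        if (string.length() <= 1) return string;
        String x = "";
        int y = string.length() - 1;
        for (int i = 0; i < string.length() / 2; i++) {
            if (string.charAt(i) == string.charAt(y)) {
                x += string.charAt(i); // read from the input string, not from x
                y--;
            } else {
                return x;              // first mismatch: x is the mirrored prefix
            }
        }
        return string;                 // both halves matched, so the whole string mirrors
    }

    public static void main(String[] args) {
        System.out.println(mirrorEnds("abXYZba")); // ab
        System.out.println(mirrorEnds("xxYxx"));   // xxYxx
    }
}
```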

Here is my code for this problem, a simple one:
public String mirrorEnds(String string) {
int start = 0;
int end = string.length()-1;
for(int i=0;i<string.length();i++){
if(string.charAt(start) == string.charAt(end) ){
start++;
end--;
}
if(start != ((string.length()-1)-end)){
break;
}
}
return string.substring(0,start);
}

public String mirrorEnds(String string) {
String g="";
for(int i=0;i<string.length();i++){
if(string.charAt(i)==string.charAt(string.length()-1-i)){
g=g+string.charAt(i);
} else{
break;
}
}
return g;
}

You have a good start, but I think you should consider an even simpler approach. You only need one index (not both i and y) to keep track of where you are in the string, because the position at the end can be computed from it as string.length() - i - 1. And because the question states that overlapping is possible, you do not need to stop your for loop at string.length() / 2; you can let it run over the entire length of the string.
Additionally, you should consider using a while loop, because the problem has a clear exit condition: once the character at the beginning stops being equal to the character at the end, leave the loop and return the mirrored prefix built so far. A while loop would also use fewer variables and reduce the number of conditional checks in your code.
Here's my answer to this problem.
public String mirrorEnds(String string) {
String mirror = "";
int i = 0;
while (i < string.length() && string.charAt(i) == string.charAt(string.length() - i - 1)) {
mirror += string.charAt(i);
i++;
}
return mirror;
}
Another handy tip: in Java, a char can be appended to a String without an explicit conversion. In the first if statement inside your for loop, you don't need to wrap the character in Character.toString(); once you read from the right string, you can simply write x += string.charAt(i).

public String mirrorEnds(String str) {
StringBuilder newStr = new StringBuilder();
String result = "";
for (int i=0; i <= str.length(); i++){
newStr.append(str.substring(0, i));
if (str.startsWith(newStr.toString()) && str.endsWith(newStr.reverse().toString()))
result = str.substring(0, i);
newStr.setLength(0);
}
return result;
}

public String mirrorEnds(String string) {
// reverse given string
String reversed = "";
for (int i = string.length() - 1; i >= 0; i--) {
reversed += string.charAt(i);
}
// loop through each string simultaneously. if substring of 'string' is equal to that of 'reversed',
// assign the substring to variable 'text'
String text = "";
for (int i = 0; i <= string.length(); i++) {
if (string.startsWith(string.substring(0, i)) ==
string.startsWith(reversed.substring(0, i))) {
text = string.substring(0, i);
}
}
return text;
}

public String mirrorEnds(String string) {
String out = "";
int len = string.length();
for(int i=0,j = len-1;i<len;i++,j--)
{
if(string.charAt(i) == string.charAt(j))
out += string.charAt(i);
else
break;
}
return out;
}

Can anybody help me to correct the following code?

Please help me identify the mistakes in this code. I am new to Java, so excuse me if I have made any obvious mistake. This is one of the CodingBat Java questions. I am getting a "Timed Out" error message for some inputs like "xxxyakyyyakzzz". For other inputs like "yakpak" and "pakyak" this code works fine.
Question:
Suppose the string "yak" is unlucky. Given a string, return a version where all the "yak" are removed, but the "a" can be any char. The "yak" strings will not overlap.
public String stringYak(String str) {
String result = "";
int yakIndex = str.indexOf("yak");
if (yakIndex == -1)
return str; //there is no yak
//there is at least one yak
//if there are yaks store their indexes in the arraylist
ArrayList<Integer> yakArray = new ArrayList<Integer>();
int length = str.length();
yakIndex = 0;
while (yakIndex < length - 3) {
yakIndex = str.indexOf("yak", yakIndex);
yakArray.add(yakIndex);
yakIndex += 3;
}//all the yak indexes are stored in the arraylist
//iterate through the arraylist. skip the yaks and get non-yak substrings
for(int i = 0; i < length; i++) {
if (yakArray.contains(i))
i = i + 2;
else
result = result + str.charAt(i);
}
return result;
}

Shouldn't you be looking for any three character sequence starting with a 'y' and ending with a 'k'? Like so?
public static String stringYak(String str) {
char[] chars = (str != null) ? str.toCharArray()
: new char[] {};
StringBuilder sb = new StringBuilder();
for (int i = 0; i < chars.length; i++) {
if (chars[i] == 'y' && i + 2 < chars.length && chars[i + 2] == 'k') { // 'y' with a 'k' two positions later (bounds-checked)
// then it's unlucky...
i += 2;
continue; //skip the statement sb.append
} //do not append any pattern like y1k or yak etc
sb.append(chars[i]);
}
return sb.toString();
}
public static void main(String[] args) {
System.out.println(stringYak("1yik2yak3yuk4")); // Remove the "unlucky" strings
// The result will be 1234.
}
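As for the loop in the question: String.indexOf returns -1 when there is no further match, and the question's while loop never checks for that. For an input such as "yak123ya", the second search returns -1, yakIndex becomes 2 after adding 3, and the search repeats from there forever. A possible sketch that keeps the indexOf approach but stops on -1 is shown below. The class name YakIndexOf is hypothetical; it searches for 'y' and then checks for a 'k' two positions later, so the middle character can be anything.

```java
class YakIndexOf {
    static String stringYak(String str) {
        StringBuilder result = new StringBuilder();
        int copyFrom = 0; // start of the text not yet copied to result
        int search = 0;   // where the next search for 'y' begins
        int y;
        // Stop as soon as indexOf reports no more 'y' (-1) or no room is left for "y?k".
        while ((y = str.indexOf('y', search)) != -1 && y + 2 < str.length()) {
            if (str.charAt(y + 2) == 'k') {
                result.append(str, copyFrom, y); // keep the text before the match
                copyFrom = y + 3;                // skip the three matched characters
                search = copyFrom;
            } else {
                search = y + 1;                  // not a match, try the next 'y'
            }
        }
        result.append(str, copyFrom, str.length()); // whatever remains after the last match
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(stringYak("xxxyakyyyakzzz")); // xxxyyzzz
        System.out.println(stringYak("yak123ya"));       // 123ya
    }
}
```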

This looks like a programming assignment. You could use regular expressions here.
Look at http://www.vogella.com/articles/JavaRegularExpressions/article.html#regex for more information.
Remember that contains only finds a fixed substring, so it cannot match "y, any character, k". Your code may be something like
result = str.replaceAll("y.k", "");
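A minimal runnable sketch of the regex approach, using the standard String.replaceAll method with the pattern y.k (a 'y', any single character, then a 'k'). The class name YakRegex is hypothetical. One assumption worth stating: by default the dot does not match line terminators, which should not matter for single-line inputs like these.

```java
class YakRegex {
    static String stringYak(String str) {
        // "y.k" matches 'y', any one character, then 'k'; matches are found
        // left to right and do not overlap.
        return str.replaceAll("y.k", "");
    }

    public static void main(String[] args) {
        System.out.println(stringYak("yakpak"));         // pak
        System.out.println(stringYak("xxxyakyyyakzzz")); // xxxyyzzz
    }
}
```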

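You can try this:
public static String stringYak(String str) {
for (int i = 0; i + 2 < str.length(); i++) {
// the middle character may be anything, so only the 'y' and the 'k' are checked
if (str.charAt(i) == 'y' && str.charAt(i + 2) == 'k') {
str = str.substring(0, i) + str.substring(i + 3);
i--; // stay at this position, which now holds the character after the removed match
}
}
return str;
}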
