How to efficiently remove consecutive same characters in a string - java

I wrote a method that reduces each run of identical consecutive characters to a single character, shown below. The logic seems correct, but according to my tutor there is room for improvement in performance. Could anyone shed some light on this?
Comments on aspects other than performance are also appreciated.
public class RemoveRepetitions {
public static String remove(String input) {
String ret = "";
String last = "";
String[] stringArray = input.split("");
for(int j=0; j < stringArray.length; j++) {
if (! last.equals(stringArray[j]) ) {
ret += stringArray[j];
}
last = stringArray[j];
}
return ret;
}
public static void main(String[] args) {
System.out.println(RemoveRepetitions.remove("foobaarrbuzz"));
}
}

We can improve performance by building the result with a StringBuilder instead of concatenating Strings, because every String concatenation creates a new String and copies the old contents. The split call is not needed either; it allocates an array of one-character Strings and makes the program slower.
Here is one way to solve this:
public static String remove(String input)
{
StringBuilder answer = new StringBuilder("");
int N = input.length();
int i = 0;
while (i < N)
{
char c = input.charAt(i);
answer.append( c );
while (i<N && input.charAt(i)==c)
++i;
}
return answer.toString();
}
The idea is to iterate over the input string once, append each new character to the answer, and skip the remaining characters of the same run.

Here are some changes you could consider in your code:
Time complexity: your loop makes a single pass over the input, so it runs O(n) iterations, which cannot be improved asymptotically. Note, however, that ret += ... copies the accumulated string on every append, so the total work can grow towards O(n^2) for long inputs.
Space complexity: your code uses extra memory for the String array that split creates.
Question to ask: can you produce the same output without the extra array that splitting creates? (You can traverse the string character by character directly.)
I could post the code here, but it would be better if you tried it on your own first. Once you are done with your attempts,
you can look up a solution here (you are almost there):
https://www.geeksforgeeks.org/remove-consecutive-duplicates-string/
Good luck!

As mentioned before, it is much better to access the characters with String::charAt, or at least to iterate over a char array obtained from String::toCharArray, than to split the input string into a String array.
However, Java strings may contain characters outside the Basic Multilingual Plane of Unicode (e.g. emojis 😂😍😊 and some rare CJK ideographs), each of which is stored as two chars (a surrogate pair). To handle them, use String::codePointAt to read each character and Character.charCount to compute the offset of the next one while iterating over the input string.
The input string should also be checked for null or empty, so the resulting code may look like this:
public static String dedup(String str) {
if (null == str || str.isEmpty()) {
return str;
}
int prev = -1;
int n = str.length();
System.out.println("length = " + n + " of [" + str + "], real length: " + str.codePointCount(0, n));
StringBuilder sb = new StringBuilder(n);
for (int i = 0; i < n; ) {
int cp = str.codePointAt(i);
if (i == 0 || cp != prev) {
sb.appendCodePoint(cp);
}
prev = cp;
i += Character.charCount(cp); // for emojis it returns 2
}
return sb.toString();
}
A version with String::charAt may look like this:
public static String dedup2(String str) {
if (null == str || str.isEmpty()) {
return str;
}
int n = str.length();
StringBuilder sb = new StringBuilder(n);
sb.append(str.charAt(0));
for (int i = 1; i < n; i++) {
if (str.charAt(i) != str.charAt(i - 1)) {
sb.append(str.charAt(i));
}
}
return sb.toString();
}
The following test shows that the charAt version fails to deduplicate repeated emojis:
System.out.println("codePoint: " + dedup ("😂😂😍😍😊😊😂 hello"));
System.out.println("charAt: " + dedup2("😂😂😍😍😊😊😂 hello"));
Output:
length = 20 of [😂😂😍😍😊😊😂 hello], real length: 13
codePoint: 😂😍😊😂 helo
charAt: 😂😂😍😍😊😊😂 helo
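To see why the charAt version leaves the emojis untouched, it may help to look at how one of them is stored. The sketch below is only an illustration (the class and method names are made up for this example): it compares the number of chars with the number of code points for two copies of U+1F602 and shows that neighbouring chars are never equal, because they alternate between the high and the low half of a surrogate pair.

```java
class SurrogateDemo {
    // Number of UTF-16 code units, which is what String.length() counts.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Two copies of U+1F602, written as explicit surrogate pairs.
        String pair = "\uD83D\uDE02\uD83D\uDE02";
        System.out.println(codeUnits(pair));  // 4
        System.out.println(codePoints(pair)); // 2
        // The chars are high, low, high, low, so adjacent chars always differ
        // and a char-by-char comparison never detects the repetition.
        for (int i = 1; i < pair.length(); i++) {
            System.out.println(pair.charAt(i) != pair.charAt(i - 1)); // true
        }
    }
}
```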

Related

How can I get the whole String printed out and not only the single character?

Here is my problem: I'm building my own StringBuilder class, and my toLowerCase() method gives me back only a single character instead of the whole string.
public MyStringBuilder1 toLowerCase() {
String newStr = "";
for (int i = 0; i < s.length(); i++) {
if (s.charAt(i) >= 'A' && s.charAt(i) <= 'Z') {
newStr = newStr + (char)(s.charAt(i) + 32) + "";
}
}
return new MyStringBuilder1(newStr);
}
public static void main(String[] args) {
// Create a MyStringBuilder1 object
MyStringBuilder1 str1 = new MyStringBuilder1("Radixsort");
// Display string as lowercase
System.out.println("\nString to lower case: " + str1.toLowerCase());
}
You can call toString() on a StringBuilder to get a String back.
However, you aren't taking advantage of what StringBuilder offers. With long strings, repeated concatenation can use a lot of resources:
to concatenate two strings, both are copied into a newly allocated string, and this happens again on every iteration.
StringBuilder behaves more like a List: you can append chars to it dynamically with much less effort.
So instead of
newStr = newStr + (char)(s.charAt(i) + 32) + "";
you should use
stringBuilder.append((char)(s.charAt(i) + 32));
I suggest reading a tutorial on StringBuilder.
Keep in mind that the performance gain grows with the string length; on short strings, compiler optimizations may make the difference negligible.
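As for the missing characters: the loop in the question appends only when the character is an uppercase letter, so for "Radixsort" just the converted 'R' ends up in the result. A minimal sketch of a fix, assuming a plain static helper (toLowerCaseAscii is a hypothetical name, not part of MyStringBuilder1) and ASCII letters only:

```java
class LowerCaseSketch {
    // Hypothetical helper: lowercases ASCII letters and keeps every other
    // character unchanged, building the result with a StringBuilder.
    static String toLowerCaseAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                sb.append((char) (c + 32)); // 'A' (65) becomes 'a' (97)
            } else {
                sb.append(c); // already lowercase, or not a letter at all
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLowerCaseAscii("Radixsort")); // radixsort
    }
}
```

The else branch is the part the original loop lacks; the same branch could be added to the toLowerCase() method in the question.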

Remove duplicate characters in a string in Java

I started reading the famous book "Cracking the Coding Interview". One of the exercises is:
Design an algorithm and write code to remove the duplicate characters in a string
without using any additional buffer. NOTE: One or two additional variables are fine.
An extra copy of the array is not.
I found a similar topic here: Remove the duplicate characters in a string
The solution given by the author was this:
public static void removeDuplicates(char[] str) {
if (str == null) return;
int len = str.length;
if (len < 2) return;
int tail = 1;
for (int i = 1; i < len; ++i) {
int j;
for (j = 0; j < tail; ++j) {
if (str[i] == str[j]) break;
}
if (j == tail) {
str[tail] = str[i];
++tail;
}
}
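if (tail < len) str[tail] = 0; // terminate only if something was removed; avoids an out-of-bounds write when there are no duplicates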
}
The problem here is that the author takes an array as the argument to this function. So my question is: how can you write such an algorithm with a String as the argument? It feels much easier with an array, as if using one "avoids the difficulty" of the exercise (in my opinion; I'm new to Java).
How can you write such an algorithm?
Java strings are immutable, so you can't do it with a string without copying the array into a buffer.
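To make that concrete, here is a minimal sketch of a String-based wrapper, assuming that one copy into a char array is acceptable (which, strictly speaking, the exercise forbids). The class name is hypothetical; the in-place loop follows the same idea as the array solution in the question.

```java
class RemoveDuplicatesSketch {
    // Returns the input with every repeated character removed, keeping the
    // first occurrence of each. The String itself is never modified.
    static String removeDuplicates(String input) {
        if (input == null || input.length() < 2) {
            return input;
        }
        char[] chars = input.toCharArray(); // the copy that immutability forces on us
        int tail = 1; // chars[0..tail) holds the unique characters seen so far
        for (int i = 1; i < chars.length; i++) {
            int j = 0;
            while (j < tail && chars[j] != chars[i]) {
                j++;
            }
            if (j == tail) { // chars[i] was not seen before
                chars[tail++] = chars[i];
            }
        }
        return new String(chars, 0, tail); // second copy, back into a String
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates("aabcdeeffff")); // abcdef
    }
}
```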
For this to work with a String, the method has to return a new String that represents str with the duplicates removed. I'm not sure whether that goes against the rules, but here is how I'd solve the problem with Strings:
For each character in the string, I split the string at that character and remove all other instances of that character from the latter substring. I then concatenate the former substring, the character itself, and the modified latter substring, so the character keeps its place. Something like this:
public static String removeDuplicates( String str ) {
if( str == null || str.length() < 2 )
return str;
String temp;
for( int x = 0; x + 1 < str.length(); x++ ) {
temp = str.charAt( x ) + "";
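str = str.substring( 0, x ) + temp + str.substring( x + 1 ).replace( temp, "" ); // replace() matches literally; replaceAll() would treat temp as a regex and break on characters such as '.' or '*'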
}
return str;
}
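In Java 8 we can do it like this (this removes duplicates within each space-separated word and needs java.util.Arrays, java.util.stream.Stream and java.util.stream.Collectors):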
private void removeduplicatecharactersfromstring() {
String myString = "aabcd eeffff ghjkjkl";
StringBuilder builder = new StringBuilder();
System.out.println(myString);
Arrays.asList(myString.split(" "))
.forEach(s -> {
builder.append(Stream.of(s.split(""))
.distinct().collect(Collectors.joining()).concat(" "));
});
System.out.println(builder); // abcd ef ghjkl
}

Replace " " of a string with "%20" - Complexity issue, which of the two below mentioned should be preferred?

Converting it to a char array and then building the string back up, replacing spaces with "%20".
OR
Dividing the string into substrings with white space as the separator and then joining the substrings with "%20" between them.
For example:
Str = "This is John Shaw "
(There are as many extra spaces at the end as there are spaces in the string)
expected outcome:
"This%20is%20John%20Shaw"
Isn't it just this?
txt = txt.replaceAll(" ", "%20");
Let me know if I've misunderstood.
Use the replaceAll method of the String class as follows:
String str = "This is John Shaw ";
str = str.replaceAll(" ", "%20");
Output
This%20is%20John%20Shaw%20
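You can write both algorithms with O(n) complexity, where n is the number of characters in the String, although there are much better ways to do it.
Here is an example that measures the running time of each. One method is faster than the other, but both make a single O(n) pass over the input. (Building the result with += copies the accumulated string on every append, so the measured times grow faster than linearly on long inputs.)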
public class ComplexityTester
{
//FIRST METHOD
public static String replaceSpacesArray(String str)
{
str = str.trim(); // leading and trailing whitespaces omitted
char[] charArray = str.toCharArray();
String result = "";
for(int i = 0; i<charArray.length; i++) // it replaces spaces with %20
{
if(charArray[i] == ' ') //it's a space, replace it!
result += "%20";
else //it's not a space, add it!
result += charArray[i];
}
return result;
}
//SECOND METHOD
public static String replaceSpacesWithSubstrings(String str)
{
str = str.trim(); // leading and trailing whitespaces omitted
String[] words = new String[5]; //array of strings, to add substrings
int wordsSize = 0; //strings in the array
//From the string to an array of substrings
//(the words separated by spaces of the string)
int indexFrom = 0;
int indexTo = 1;
while(indexTo<=str.length())
{
if(wordsSize == words.length) //if the array is full, resize it!
words = resize(words);
//we reached the end of the string, add the last word to the array!
if(indexTo == str.length())
{
words[wordsSize++] = str.substring(indexFrom, indexTo++);
}
else if(str.substring(indexTo-1,indexTo).equals(" "))//it's a space
{
//we add the last word to the array
words[wordsSize++] = str.substring(indexFrom, indexTo-1);
indexFrom = indexTo; //update the indices
indexTo++;
}
else //it's a character not equal to space
{
indexTo++; //update the index
}
}
String result = "";
// From the array to the result string
for(int i = 0; i<wordsSize; i++)
{
result += words[i];
if(i+1!=wordsSize)
result += "%20";
}
return result;
}
private static String[] resize(String[] array)
{
int newLength = array.length*2;
String[] newArray = new String[newLength];
System.arraycopy(array,0,newArray,0,array.length);
return newArray;
}
public static void main(String[] args)
{
String example = "The Java Tutorials are practical guides "
+"for programmers who want to use the Java programming "
+"language to create applications. They include hundreds "
+"of complete, working examples, and dozens of lessons. "
+"Groups of related lessons are organized into \"trails\"";
String testString = "";
for(int i = 0; i<100; i++) //String 'testString' is string 'example' repeated 100 times
{
testString+=example;
}
long time = System.currentTimeMillis();
replaceSpacesArray(testString);
System.out.println("COMPUTING TIME (ARRAY METHOD) = "
+ (System.currentTimeMillis()-time));
time = System.currentTimeMillis();
replaceSpacesWithSubstrings(testString);
System.out.println("COMPUTING TIME (SUBSTRINGS METHOD) = "
+ (System.currentTimeMillis()-time));
}
}
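Both methods above build the result with +=, which copies the accumulated string on every append. As a sketch of one alternative (the class and method names are hypothetical, and trimming first simply mirrors the two methods above), a StringBuilder version makes one pass and appends in place:

```java
class ReplaceSpacesSketch {
    // Replaces every space with "%20" after trimming leading and trailing
    // whitespace, appending to a StringBuilder instead of using +=.
    static String replaceSpaces(String str) {
        str = str.trim();
        StringBuilder sb = new StringBuilder(str.length() + 16);
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == ' ') {
                sb.append("%20");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceSpaces("This is John Shaw ")); // This%20is%20John%20Shaw
    }
}
```

Timing this against the two methods above with the same testString should show whether the difference matters for your input sizes.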

How to extract the left most common characters in a string list?

Assume I have the following list of string objects:
ABC1, ABC2, ABC_Whatever
What's the most efficient way to extract the leftmost common characters (the common prefix) from this list? In my case I'd get ABC.
Use StringUtils.getCommonPrefix(String... strs) from Apache Commons Lang.
This will work for you:
public static void main(String args[]) {
String commonInFirstTwo=greatestCommon("ABC1","ABC2");
String commonInLastTwo=greatestCommon("ABC2","ABC_Whatever");
System.out.println(greatestCommon(commonInFirstTwo,commonInLastTwo));
}
public static String greatestCommon(String a, String b) {
int minLength = Math.min(a.length(), b.length());
for (int i = 0; i < minLength; i++) {
if (a.charAt(i) != b.charAt(i)) {
return a.substring(0, i);
}
}
return a.substring(0, minLength);
}
You can hash all the prefixes of the words in the given list and keep track of how often each one occurs. The one with the maximum number of occurrences is the one you want. Here is a sample implementation. It returns the most common prefix:
static String mostCommon(List<String> list) {
Map<String, Integer> word2Freq = new HashMap<String, Integer>();
String maxFreqWord = null;
int maxFreq = 0;
for (String word : list) {
for (int i = 0; i < word.length(); ++i) {
String sub = word.substring(0, i + 1);
Integer f = word2Freq.get(sub);
if (f == null) {
f = 0;
}
word2Freq.put(sub, f + 1);
if (f + 1 > maxFreq) {
if (maxFreqWord == null || maxFreqWord.length() < sub.length()) {
maxFreq = f + 1;
maxFreqWord = sub;
}
}
}
}
return maxFreqWord;
}
The above implementation may not suffice if you have more than one common prefix. In that case, use the frequency map it builds.
System.out.println(mostCommon(Arrays.asList("ABC1", "ABC2", "ABC_Whatever")));
System.out.println(mostCommon(Arrays.asList("ABCDEFG1", "ABGG2", "ABC11_Whatever")));
Returns
ABC
AB
Your problem is just a rephrasing of the standard problem of finding the longest common prefix.
If you know what the common characters are, then you could check if the other strings contain those characters by using the .contains() method.
If you're willing to use a third party library, then the following using jOOλ generates that prefix for you:
String prefix = Seq.of("ABC1", "ABC2", "ABC_Whatever").commonPrefix();
Disclaimer: I work for the company behind jOOλ
If there are N strings and the shortest of them has M characters, then the most efficient (correct) answer takes N * M character comparisons in the worst case (when all strings are the same).
Outer loop: each character of the first string, one at a time.
Inner loop: each of the strings.
Test: the character at that position in the inner-loop string against the character from the outer loop.
This can be reduced to (N-1) * M comparisons if we do not test against the first string in the inner loop.
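The loops described above could be sketched as follows. This is one possible reading of that description, not a reference implementation; the class and method names are made up for the example, and the inner loop starts at the second string, as suggested.

```java
import java.util.Arrays;
import java.util.List;

class CommonPrefixSketch {
    // Returns the longest prefix shared by all strings in the list.
    static String commonPrefix(List<String> strings) {
        if (strings == null || strings.isEmpty()) {
            return "";
        }
        String first = strings.get(0);
        // Outer loop: each character position of the first string.
        for (int i = 0; i < first.length(); i++) {
            char c = first.charAt(i);
            // Inner loop: every other string (the first is skipped).
            for (int k = 1; k < strings.size(); k++) {
                String other = strings.get(k);
                // Stop at the first mismatch or when a string runs out.
                if (i >= other.length() || other.charAt(i) != c) {
                    return first.substring(0, i);
                }
            }
        }
        return first; // the whole first string is a prefix of all the others
    }

    public static void main(String[] args) {
        System.out.println(commonPrefix(Arrays.asList("ABC1", "ABC2", "ABC_Whatever"))); // ABC
    }
}
```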

Removing contiguous spaces in a String without trim() and replaceAll()

I have to remove leading and trailing spaces from the given string as well as combine the contiguous spaces. For example,
String str = " this is a string containing numerous whitespaces ";
and I need to return it as:
"this is a string containing numerous whitespaces";
But the problem is I can't use String#trim(). (This is homework and I'm not allowed to use such methods.) I'm currently trying to do it by accessing each character one by one, but without success.
I need an optimized code for this. Could anybody help? I need it to be done by today :(
EDIT: Answer posted before we were told we couldn't use replaceAll. I'm leaving it here on the grounds that it may well be useful to other readers, even if it's not useful to the OP.
I need an optimized code for this.
Do you really need it to be optimized? Have you identified this as a bottleneck?
This should do it:
str = str.replaceAll("\\s+", " ");
That's a regular expression that says "replace any contiguous whitespace with a single space". It may not be the fastest possible, but I'd benchmark it before trying anything else.
Note that this will replace all whitespace with spaces - so if you have tabs or other whitespace characters, they will be replaced with spaces too.
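A small sketch of what that regular expression does, with a made-up input that mixes spaces and a tab (the class and method names are only for illustration). Note that leading and trailing runs are collapsed to one space, not removed, so trimming is still a separate step.

```java
class WhitespaceRegexDemo {
    // Collapses every run of whitespace (spaces, tabs, newlines) into one space.
    static String collapse(String s) {
        return s.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        // Brackets make the remaining leading and trailing space visible.
        System.out.println("[" + collapse("  a \t b  ") + "]"); // [ a b ]
    }
}
```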
I'm not permitted to use these methods. I've to do this with loops and all.
So I wrote a small snippet for you, in case you can't use the faster and more efficient approaches:
String str = " this is a string containing numerous whitespaces ";
StringBuilder buff = new StringBuilder();
boolean space = false; // true while we are inside a run of spaces
for (int i = 0; i < str.length(); i++) {
char c = str.charAt(i);
if (c == ' ') {
// keep only the first space of a run, and none at the very start
if (!space && i > 0) {
buff.append(c);
}
space = true;
}
else {
buff.append(c);
space = false;
}
}
// drop the single trailing space, if there is one
if (buff.length() > 0 && buff.charAt(buff.length() - 1) == ' ') {
buff.setLength(buff.length() - 1);
}
String correctedString = buff.toString();
System.out.println(correctedString);
Note:
This is "hardcoded" and only meant for learning.
A more efficient way is certainly to use the approaches pointed out by @JonSkeet and @BrunoReis.
What about str = str.replaceAll(" +", " ").trim();?
If you don't want to use trim() (and I really don't see a reason not to), replace it with:
str = str.replaceAll(" +", " ").replaceAll("^ ", "").replaceAll(" $", "");
Removing white spaces without using any built-in library function.
This is just a simple example with a fixed array size; note that it removes every space rather than collapsing runs of them.
public class RemWhite{
public static void main(String args[]){
String s1=" world qwer ";
int count=0;
char q[]=new char[9];
char ch[]=s1.toCharArray();
System.out.println(ch);
for(int i=0;i<=ch.length-1;i++)
{
int j=ch[i];
if(j==32)
{
continue;
}
else
q[count]=ch[i];
count++;
}
System.out.println(q);
}}
To remove single or repeated spaces:
public class RemoveSpace {
public static void main(String[] args) {
char space = ' ';
int ascii = (int) space;
String str = " this is a string containing numerous whitespaces ";
char c[] = str.toCharArray();
for (int i = 0; i < c.length; i++) { // visit every character, including the last one
if (c[i] == ascii) {
continue;
} else {
System.out.print(c[i]);
}
}
}
}
If you don't want to use any built-in methods, here is an implementation you can refer to:
private static String trim(String s)
{
String s1="";boolean nonspace=false;
for(int i=0;i<s.length();i++)
{
if(s.charAt(i)!=' ' || nonspace)
{
s1 = s1+s.charAt(i);
nonspace = true;
}
}
nonspace = false;
s="";
for(int i=s1.length()-1;i>=0;i--)
{
if(s1.charAt(i)!=' ' || nonspace)
{
s = s1.charAt(i)+s;
nonspace = true;
}
}
return s;
}
package removespace;
import java.util.Scanner;
public class RemoveSpace {
public static void main(String[] args) {
Scanner scan= new Scanner(System.in);
System.out.println("Enter the string");
String str= scan.nextLine();
String str2=""; // start empty so the result has no leading space
char []arr=str.toCharArray();
int i=0;
while(i<=arr.length-1)
{
if(arr[i]==' ')
{
i++;
}
else
{
str2= str2+arr[i];
i++;
}
}
System.out.println(str2);
}
}
This code removes the white spaces from the given string without using trim(). We accept a string from the user and split it into characters with toCharArray(), then compare each character with a space (' '). If it is a space we skip it; otherwise, in the else part, we append the character to the result. In both cases we increment the index i by 1.
Try this code to solve your problem:
String name = " abc ";
System.out.println(name);
for (int i = 0; i < name.length(); i++) {
char ch = name.charAt(i);
if (ch != ' ') { // skip spaces, print everything else
System.out.print(ch);
}
}
