Java CharAt() and deleteCharAt() performance

I've been wondering about the implementation of the charAt function for String/StringBuilder/StringBuffer in Java.
What is its complexity?
Also, what about deleteCharAt() in StringBuffer/StringBuilder?

For String, StringBuffer, and StringBuilder, charAt() is a constant-time operation.
For StringBuffer and StringBuilder, deleteCharAt() is a linear-time operation.
StringBuffer and StringBuilder have very similar performance characteristics. The primary difference is that the former is synchronized (so is thread-safe) while the latter is not.

Let us look at the actual Java implementation (only the relevant code) of each of these methods in turn. That alone will tell us about their efficiency. (The listings are from an older JDK, before Java 9, where String is backed by a char[].)
String.charAt :
public char charAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return value[index];
}
As we can see, it is just a single array access which is a constant time operation.
StringBuffer.charAt :
public synchronized char charAt(int index) {
    if ((index < 0) || (index >= count))
        throw new StringIndexOutOfBoundsException(index);
    return value[index];
}
Again, single array access, so a constant time operation.
StringBuilder.charAt :
public char charAt(int index) {
    if ((index < 0) || (index >= count))
        throw new StringIndexOutOfBoundsException(index);
    return value[index];
}
Again, a single array access, so a constant-time operation. Although all three methods look the same, there are some minor differences. For example, only StringBuffer.charAt is synchronized; the other two are not. Similarly, the bounds check in String.charAt is slightly different: it compares against value.length rather than count (guess why?). A closer look at these implementations reveals other minor differences among them.
Now, let us look at deleteCharAt implementations.
String does not have a deleteCharAt method. The reason is probably that it is an immutable object, so exposing an API that explicitly suggests the object is modified would not be a good idea.
Both StringBuffer and StringBuilder are subclasses of AbstractStringBuilder. The deleteCharAt method of these two classes delegates to the parent class.
StringBuffer.deleteCharAt :
public synchronized StringBuffer deleteCharAt(int index) {
    super.deleteCharAt(index);
    return this;
}
StringBuilder.deleteCharAt :
public StringBuilder deleteCharAt(int index) {
    super.deleteCharAt(index);
    return this;
}
AbstractStringBuilder.deleteCharAt :
public AbstractStringBuilder deleteCharAt(int index) {
    if ((index < 0) || (index >= count))
        throw new StringIndexOutOfBoundsException(index);
    System.arraycopy(value, index+1, value, index, count-index-1);
    count--;
    return this;
}
A closer look at AbstractStringBuilder.deleteCharAt reveals that it calls System.arraycopy, which is O(N) in the worst case. So deleteCharAt has O(N) time complexity.
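To make the linear cost concrete, here is a minimal sketch contrasting two ways of removing every occurrence of a character. The class and method names are hypothetical and only serve the illustration. The first variant calls deleteCharAt repeatedly, so each call may shift the tail and the loop as a whole can degrade to O(N^2); the second copies each kept character once and stays O(N).

```java
final class RemoveCharDemo {
    // Repeated deleteCharAt: every deletion shifts the characters after it,
    // so removing many characters can cost O(N^2) overall.
    static String removeWithDeleteCharAt(String input, char target) {
        StringBuilder sb = new StringBuilder(input);
        // Walk backwards so a deletion never disturbs indices still to be visited.
        for (int i = sb.length() - 1; i >= 0; i--) {
            if (sb.charAt(i) == target) {
                sb.deleteCharAt(i);
            }
        }
        return sb.toString();
    }

    // Single pass: each kept character is appended exactly once, O(N) overall.
    static String removeInOnePass(String input, char target) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c != target) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeWithDeleteCharAt("banana", 'a')); // prints "bnn"
        System.out.println(removeInOnePass("banana", 'a'));        // prints "bnn"
    }
}
```

For a single deletion the difference does not matter; it only shows up when many deletions hit the same buffer.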

The charAt method is O(1).
The deleteCharAt method on StringBuilder and StringBuffer is O(N) on average, assuming you are deleting a random character from an N character StringBuffer / StringBuilder. (It has to move, on average, half of the remaining characters to fill up the "hole" left by the deleted character. There is no amortization over multiple operations; see below.) However, if you delete the last character, the cost will be O(1).
There is no deleteCharAt method for String.
In theory, StringBuilder and StringBuffer could be optimized for the case where you are inserting or deleting multiple characters in a "pass" through the buffer. They could do this by maintaining an optional "gap" in the buffer, and moving characters across it. (IIRC, emacs implements its text buffers this way.) The problems with this approach are:
It requires more space, for the attributes that say where the gap is, and for the gap itself.
It makes the code a lot more complicated, and slows down other operations. For instance, charAt would have to compare the offset with the start and end points of the gap, and make the corresponding adjustments to the actual index value before fetching the character array element.
It is only going to help if the application does multiple inserts / deletes on the same buffer.
Not surprisingly, this "optimization" has not been implemented in the standard StringBuilder / StringBuffer classes. However, a custom CharSequence class could use this approach.
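The gap idea described above can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not how the JDK or emacs implements it: the class name GapBuffer, its fields, and the initial gap size of 16 are all made up for the example. Note how charAt has to adjust the index when it falls after the gap, which is exactly the extra cost mentioned in the list.

```java
final class GapBuffer implements CharSequence {
    private char[] buf;
    private int gapStart; // index of the first unused slot
    private int gapEnd;   // index just past the last unused slot

    GapBuffer(String initial) {
        buf = new char[initial.length() + 16]; // 16 is an arbitrary initial gap
        initial.getChars(0, initial.length(), buf, 0);
        gapStart = initial.length();
        gapEnd = buf.length;
    }

    @Override
    public int length() {
        return buf.length - (gapEnd - gapStart);
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Characters after the gap live (gapEnd - gapStart) slots further right.
        return index < gapStart ? buf[index] : buf[index + (gapEnd - gapStart)];
    }

    // Moves the gap so that it starts at logical position pos.
    private void moveGap(int pos) {
        if (pos < gapStart) {
            int n = gapStart - pos;
            System.arraycopy(buf, pos, buf, gapEnd - n, n);
            gapStart -= n;
            gapEnd -= n;
        } else if (pos > gapStart) {
            int n = pos - gapStart;
            System.arraycopy(buf, gapEnd, buf, gapStart, n);
            gapStart += n;
            gapEnd += n;
        }
    }

    void deleteCharAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        moveGap(index);
        gapEnd++; // the deleted character is absorbed into the gap
    }

    void insert(int index, char c) {
        if (index < 0 || index > length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        if (gapStart == gapEnd) {
            grow();
        }
        moveGap(index);
        buf[gapStart++] = c;
    }

    // Reallocates with a fresh gap, keeping the text before and after it.
    private void grow() {
        int tail = buf.length - gapEnd;
        char[] bigger = new char[buf.length * 2 + 16];
        System.arraycopy(buf, 0, bigger, 0, gapStart);
        System.arraycopy(buf, gapEnd, bigger, bigger.length - tail, tail);
        gapEnd = bigger.length - tail;
        buf = bigger;
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return toString().substring(start, end);
    }

    @Override
    public String toString() {
        return new String(buf, 0, gapStart) + new String(buf, gapEnd, buf.length - gapEnd);
    }

    public static void main(String[] args) {
        GapBuffer g = new GapBuffer("hello world");
        g.deleteCharAt(5);     // moves the gap to position 5, then removes ' '
        g.deleteCharAt(5);     // gap is already in place: no copying needed
        g.insert(5, '_');
        System.out.println(g); // prints "hello_orld"
    }
}
```

The second deletion and the insert at the same position do no array copying at all, which is the case this layout is meant for. A deletion far from the current gap still pays a linear cost to move the gap.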

charAt is super fast (and can use intrinsics for String), it's a simple index into an array. deleteCharAt would require an arraycopy, thus deleting a char won't be fast.

A String is backed by an array in the JDK, and arrays support random access by index. Therefore the time complexity of charAt is O(1). As with any array, deleting an element is O(n), because the elements after it have to be shifted.

Summary of all responses from above:
charAt is O(1), since it just indexes into an array.
deleteCharAt is O(N) in the worst case, since it has to shift every character after the deleted one.

Related

Count the Characters in a String Recursively & treat "eu" as a Single Character

I am new to Java, and I'm trying to figure out how to count the characters in a given string while treating the two-character combination "eu" as a single character, and still counting every other character as one.
And I want to do that using recursion.
Consider the following example.
Input:
"geugeu"
Desired output:
4 // g + eu + g + eu = 4
Current output:
2
I've been trying a lot and still can't seem to figure out how to implement it correctly.
My code:
public static int recursionCount(String str) {
    if (str.length() == 1) {
        return 0;
    }
    else {
        String ch = str.substring(0, 2);
        if (ch.equals("eu")) {
            return 1 + recursionCount(str.substring(1));
        }
        else {
            return recursionCount(str.substring(1));
        }
    }
}

OP wants to count all characters in a string but adjacent characters "ae", "oe", "ue", and "eu" should be considered a single character and counted only once.
The code below does that:
public static int recursionCount(String str) {
    int n;
    n = str.length();
    if(n <= 1) {
        return n; // return 1 if one character left or 0 if empty string.
    }
    else {
        String ch = str.substring(0, 2);
        if(ch.equals("ae") || ch.equals("oe") || ch.equals("ue") || ch.equals("eu")) {
            // consider as one character and skip next character
            return 1 + recursionCount(str.substring(2));
        }
        else {
            // don't skip next character
            return 1 + recursionCount(str.substring(1));
        }
    }
}

Recursion explained
In order to address a particular task using Recursion, you need a firm understanding of how recursion works.
And the first thing you need to keep in mind is that every recursive solution should (either explicitly or implicitly) contain two parts: Base case and Recursive case.
Let's have a look at them closely:
Base case - the part that represents a simple edge case (or a set of edge cases), i.e. a situation in which recursion should terminate. The outcome for these edge cases is known in advance. For this task, the base case is when the given string is empty, and since there's nothing to count, the return value should be 0. That is sufficient for the algorithm to work; outcomes for all other inputs are derived from the recursive case.
Recursive case - the part of the method where recursive calls are made and where the main logic resides. Every chain of recursive calls eventually hits the base case and starts building its return value.
In the recursive case, we need to check whether the given string starts with a particular string such as "eu". We don't need to generate a substring for that (keep in mind that object creation is costly). Instead we can use String.startsWith(), which checks whether the bytes of the provided prefix match the bytes at the beginning of this string, which is cheaper. (Reminder: starting from Java 9, String is backed by an array of bytes, and each character is represented with either one or two bytes depending on the encoding.) We also don't need to worry about the length of the string, because if the string is shorter than the prefix, startsWith() returns false.
Implementation
That said, here's how an implementation might look:
public static int recursionCount(String str) {
    if(str.isEmpty()) {
        return 0;
    }
    return str.startsWith("eu") ?
        1 + recursionCount(str.substring(2)) : 1 + recursionCount(str.substring(1));
}
Note that besides being able to implement a solution, you also need to evaluate its time and space complexity.
In this case, because we create a new string with every call, time complexity is quadratic, O(n^2) (reminder: creating the new string requires allocating memory and copying the bytes of the original string). Worst-case space complexity would also be O(n^2).
There's a way to solve this problem recursively in linear time, O(n), without generating a new string at every call. For that we need to introduce a second argument, the current index, and each recursive call should advance this index by either 1 or 2 (I'm not going to implement this solution and am leaving it as an exercise for the OP/reader).
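For readers who want to compare their attempt, one possible shape of the index-based version is sketched below. The class name EuCounter and the helper countFrom are hypothetical; the only library call relied on is String.startsWith(String prefix, int toffset), which tests the prefix at a given offset without creating a substring.

```java
final class EuCounter {
    // Public entry point: starts the recursion at index 0.
    static int recursionCount(String str) {
        return countFrom(str, 0);
    }

    // Counts the characters from index onwards, treating "eu" as one character.
    private static int countFrom(String str, int index) {
        if (index >= str.length()) {
            return 0; // base case: nothing left to count
        }
        // Advance by 2 when "eu" starts at index, otherwise by 1.
        int step = str.startsWith("eu", index) ? 2 : 1;
        return 1 + countFrom(str, index + step);
    }
}
```

With this sketch, EuCounter.recursionCount("geugeu") returns 4. No strings are allocated along the way, though the recursion depth still grows linearly with the input length.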
In addition
In addition, here's a concise and simple non-recursive solution using String.replace():
public static int count(String str) {
    return str.replace("eu", "_").length();
}
If you need to handle multiple combinations of characters (which were listed in the first version of the question), you can make use of regular expressions with String.replaceAll():
public static int count(String str) {
    return str.replaceAll("ue|au|oe|eu", "_").length();
}

Equals validation vs indexOf validation?

I need to check whether a String contains the char $ before replacing it.
I wrote two implementations for this purpose.
The first implementation always executes replace(char oldChar, char newChar) and uses equals(Object anObject) as the validation.
String getImportLine(Class<?> clazz) {
    String importLine = toSanitizedClassName(clazz.getName());
    String importStaticLine = importLine.replace('$', '.');
    if (importLine.equals(importStaticLine)) {
        return String.format("import %s;", importLine);
    }
    return String.format("import static %s;", importStaticLine);
}
This implementation parses the string two times with:
importLine.replace('$', '.')
importLine.equals(importStaticLine)
The second implementation uses indexOf(int ch) as validation and replace(char oldChar, char newChar) in the worst case.
String getImportLine(Class<?> clazz) {
    String importLine = toSanitizedClassName(clazz.getName());
    if (importLine.indexOf('$') == -1) {
        return String.format("import %s;", importLine);
    }
    importLine = importLine.replace('$', '.');
    return String.format("import static %s;", importLine);
}
The second implementation, in the worst case, parses the string two times with:
importLine.indexOf('$') == -1
importLine.replace('$', '.')
Is there some difference in terms of performance between the use of equals vs indexOf as validation?

What you are asking about is the difference in execution time between String.indexOf and String.equals. In Big-O terms they are the same, since both will (in the worst case) iterate through the entire String before returning.
In practice, it really depends on the input.
For instance:
equals will return pretty much immediately if the two strings compared have different lengths
equals will return sooner if the difference between the strings occurs early ("abcdef".equals("aXcdef") is faster than "abcdef".equals("abcdeX"))
indexOf('$') will be faster if $ occurs early in the string ("a$cdef".indexOf('$') is faster than "abcde$".indexOf('$'))
indexOf will be slower if the argument is a supplementary code point (above U+FFFF), which OpenJDK handles on a separate path
On modern computers this should not matter, since they are so fast that any difference will be unnoticeable unless the method is called hundreds of thousands of times (or with really large input strings). When optimizing code one should focus on saving seconds, not nanoseconds. With your current problem you should be a lot more worried about making your code readable and understandable to others than about which version uses the most CPU cycles.

Java StringBuilder Delete Last Occurrence of Character Efficiently

What is the most efficient way to delete the last occurrence of a char from a StringBuilder?
My current solution is O(N), but I feel like this problem can be solved in constant time.
public StringBuilder deleteLastOccurance(StringBuilder builder, char c) {
    int lastIndex = builder.lastIndexOf(String.valueOf(c));
    if (lastIndex != -1) {
        builder.deleteCharAt(lastIndex); // O(N)
    }
    return builder;
}

In the end it will be O(n) time no matter what. There is no way to find the last occurrence without scanning, in the worst case, the whole buffer, and deleteCharAt then has to shift the characters after the deleted one.
Even the internal Java API methods have the same underlying implementation.
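The worst case stays linear, but a hand-written backward scan at least avoids the temporary String created by String.valueOf(c), and its cost is proportional to how far the match is from the end. The following is a small sketch under that assumption; the class name LastOccurrence and the method name are hypothetical.

```java
final class LastOccurrence {
    // Scans from the end and deletes the first match found, i.e. the last occurrence.
    static StringBuilder deleteLastOccurrence(StringBuilder builder, char c) {
        for (int i = builder.length() - 1; i >= 0; i--) {
            if (builder.charAt(i) == c) {
                builder.deleteCharAt(i); // shifts only the characters after i
                break;
            }
        }
        return builder;
    }
}
```

For example, calling it on a builder holding "a,b,c," with ',' leaves "a,b,c". When the character sits near the end (a trailing separator, say), both the scan and the shift touch only a few characters.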

Space complexity of a recursive algorithm

I have a recursive algorithm to check for a palindrome in Java. It should return true if the given string is a palindrome, and false otherwise. The method uses substring, which makes the complexity a little trickier to work out.
Here's my algorithm:
static boolean isPalindrome (String str) {
    if (str.length() > 1) {
        if (str.charAt(0) == (str.charAt(str.length() - 1))) {
            if (str.length() == 2) return true;
            return isPalindrome(str.substring(1, str.length() - 1));
        }
        return false;
    }
    else {
        return true;
    }
}
What is the space complexity of this algorithm?
I mean, when I call substring(), does it create a new string every time? What does the substring method actually do in Java?

In older versions of Java (mainly Java 6 and before)*, substring returned a new instance that shared the internal char array of the longer string. substring then had a time and space complexity of O(1).
Newer versions use a different representation of String, which does not rely on a shared array. Instead, substring allocates a new array of just the required size, and copies the contents from the longer string. Then substring has a time and a space complexity of O(n).
*Actually the change was introduced in update 6 of Java 7.
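If the copying done by substring is a concern, the same recursive check can be written with two indices instead of new strings. The sketch below is one way to do that; the class name PalindromeCheck and the three-argument helper are hypothetical. It still uses O(n) stack space for the recursion itself, but allocates no strings.

```java
final class PalindromeCheck {
    static boolean isPalindrome(String str) {
        return isPalindrome(str, 0, str.length() - 1);
    }

    // Compares the characters at lo and hi, then moves both indices inwards.
    private static boolean isPalindrome(String str, int lo, int hi) {
        if (lo >= hi) {
            return true; // zero or one character left: nothing more to compare
        }
        if (str.charAt(lo) != str.charAt(hi)) {
            return false;
        }
        return isPalindrome(str, lo + 1, hi - 1);
    }
}
```

By contrast, on JDKs where substring copies, the original version allocates strings of length n-2, n-4, and so on, which adds up to O(n^2) characters in total.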

Best way to modify an existing string? StringBuilder or convert to char array and back to string?

I'm learning Java and am wondering what's the best way to modify strings here (both for performance and to learn the preferred method in Java). Assume you're looping through a string and checking each character/performing some action on that index in the string.
Do I use the StringBuilder class, or convert the string into a char array, make my modifications, and then convert the char array back to a string?
Example for StringBuilder:
StringBuilder newString = new StringBuilder(oldString);
for (int i = 0; i < oldString.length() ; i++) {
    newString.setCharAt(i, 'X');
}
Example for char array conversion:
char[] newStringArray = oldString.toCharArray();
for (int i = 0; i < oldString.length() ; i++) {
    newStringArray[i] = 'X';
}
String myString = String.valueOf(newStringArray);
What are the pros/cons to each different way?
I take it that StringBuilder is going to be more efficient since the converting to a char array makes copies of the array each time you update an index.

I say do whatever is most readable/maintainable until you know that String "modification" is slowing you down. To me, this is the most readable:
String s = "foo";
s += "bar";
s += "baz";
If that's too slow, I'd use a StringBuilder. You may want to compare this to StringBuffer. If performance matters and synchronization does not, StringBuilder should be faster. If synchronization is needed, then you should use StringBuffer.
Also it's important to know that these strings are not being modified. In java, Strings are immutable.
This is all context specific. If you optimize this code and it doesn't make a noticeable difference (and this is usually the case), then you just thought longer than you had to and you probably made your code more difficult to understand. Optimize when you need to, not because you can. And before you do that, make sure the code you're optimizing is the cause of your performance issue.

What are the pros/cons to each different way. I take it that StringBuilder is going to be more efficient since the converting to a char array makes copies of the array each time you update an index.
As written, the code in your second example will create just two arrays: one when you call toCharArray(), and another when you call String.valueOf() (String stores data in a char[] array). The element manipulations you are performing should not trigger any object allocations. There are no copies being made of the array when you read or write an element.
If you are going to be doing any sort of String manipulation, the recommended practice is to use a StringBuilder. If you are writing very performance-sensitive code, and your transformation does not alter the length of the string, then it might be worthwhile to manipulate the array directly. But since you are learning Java as a new language, I am going to guess that you are not working in high frequency trading or any other environment where latency is critical. Therefore, you are probably better off using a StringBuilder.
If you are performing any transformations that might yield a string of a different length than the original, you should almost certainly use a StringBuilder; it will resize its internal buffer as necessary.
On a related note, if you are doing simple string concatenation (e.g., s = "a" + someObject + "c"), the compiler will actually transform those operations into a chain of StringBuilder.append() calls (on Java 8 and earlier; since Java 9, javac emits an invokedynamic call to StringConcatFactory instead), so you are free to use whichever you find more aesthetically pleasing. I personally prefer the + operator. However, if you are building up a string across multiple statements, you should create a single StringBuilder.
For example:
public String toString() {
    return "{field1 =" + this.field1 +
           ", field2 =" + this.field2 +
           ...
           ", field50 =" + this.field50 + "}";
}
Here, we have a single, long expression involving many concatenations. You don't need to worry about hand-optimizing this, because the compiler will use a single StringBuilder and just call append() on it repeatedly.
String s = ...;
if (someCondition) {
    s += someValue;
}
s += additionalValue;
return s;
Here, you'll end up with two StringBuilders being created under the covers, but unless this is an extremely hot code path in a latency-critical application, it's really not worth fretting about. Given similar code, but with many more separate concatenations, it might be worth optimizing. Same goes if you know the strings might be very large. But don't just guess--measure! Demonstrate that there's a performance problem before you try to fix it. (Note: this is just a general rule for "micro optimizations"; there's rarely a downside to explicitly using a StringBuilder. But don't assume it will make a measurable difference: if you're concerned about it, you should actually measure.)
String s = "";
for (final Object item : items) {
    s += item + "\n";
}
Here, we're performing a separate concatenation operation on each loop iteration, which means a new StringBuilder will be allocated on each pass. In this case, it's probably worth using a single StringBuilder since you may not know how large the collection will be. I would consider this an exception to the "prove there's a performance problem before optimizing" rule: if the operation has the potential to explode in complexity based on input, err on the side of caution.
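The single-StringBuilder version of that loop could look like the sketch below. The class name JoinLines and the method joinLines are hypothetical; only StringBuilder.append and toString are relied on.

```java
import java.util.Arrays;

final class JoinLines {
    // Appends every item followed by a newline, using one builder for the whole loop.
    static String joinLines(Iterable<?> items) {
        StringBuilder sb = new StringBuilder();
        for (Object item : items) {
            sb.append(item).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints "a", "1" and "c", each on its own line.
        System.out.print(joinLines(Arrays.asList("a", 1, "c")));
    }
}
```

Each item is copied into the builder once, plus one final copy in toString(), instead of the whole accumulated string being re-copied on every iteration.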

Which option will perform the best is not an easy question.
I did a benchmark using Caliper:
METHOD            RUNTIME (NS)
array                       88
builder                    126
builderTillEnd              76
concat                    3435
Benchmarked methods:
public static String array(String input)
{
    char[] result = input.toCharArray(); // COPYING
    for (int i = 0; i < input.length(); i++)
    {
        result[i] = 'X';
    }
    return String.valueOf(result); // COPYING
}
public static String builder(String input)
{
    StringBuilder result = new StringBuilder(input); // COPYING
    for (int i = 0; i < input.length(); i++)
    {
        result.setCharAt(i, 'X');
    }
    return result.toString(); // COPYING
}
public static StringBuilder builderTillEnd(String input)
{
    StringBuilder result = new StringBuilder(input); // COPYING
    for (int i = 0; i < input.length(); i++)
    {
        result.setCharAt(i, 'X');
    }
    return result;
}
public static String concat(String input)
{
    String result = "";
    for (int i = 0; i < input.length(); i++)
    {
        result += 'X'; // terrible COPYING, COPYING, COPYING... same as:
        // result = new StringBuilder(result).append('X').toString();
    }
    return result;
}
Remarks
If we want to modify a String, we have to do at least 1 copy of that input String, because Strings in Java are immutable.
java.lang.StringBuilder extends java.lang.AbstractStringBuilder. StringBuilder.setCharAt() is inherited from AbstractStringBuilder and looks like this:
public void setCharAt(int index, char ch) {
    if ((index < 0) || (index >= count))
        throw new StringIndexOutOfBoundsException(index);
    value[index] = ch;
}
AbstractStringBuilder internally uses the simplest char array: char value[]. So, result[i] = 'X' is very similar to result.setCharAt(i, 'X'), however the second will call a polymorphic method (which probably gets inlined by JVM) and check bounds in if, so it will be a bit slower.
Conclusions
If you can operate on StringBuilder until the end (you don't need String back) - do it. It's the preferred way and also the fastest. Simply the best.
If you want String in the end and this is the bottleneck of your program, then you might consider using char array. In benchmark char array was ~25% faster than StringBuilder. Be sure to properly measure execution time of your program before and after optimization, because there is no guarantee about this 25%.
Never concatenate Strings in a loop with + or +=, unless you really know what you are doing. Usually it's better to use an explicit StringBuilder and append().

I'd prefer to use the StringBuilder class when the original string has to be modified.
For String manipulation, I like the StringUtils class. You'll need to add the Apache Commons Lang dependency to use it.
