charAt() or substring? Which is faster?

I want to go through each character in a String and pass each character of the String as a String to another function.
String s = "abcdefg";
for(int i = 0; i < s.length(); i++){
newFunction(s.substring(i, i+1));}
or
String s = "abcdefg";
for(int i = 0; i < s.length(); i++){
newFunction(Character.toString(s.charAt(i)));}
The final result needs to be a String. So any idea which will be faster or more efficient?

As usual: it doesn't matter but if you insist on spending time on micro-optimization or if you really like to optimize for your very special use case, try this:
import org.junit.Assert;
import org.junit.Test;
public class StringCharTest {
// Times:
// 1. Initialization of "s" outside the loop
// 2. Init of "s" inside the loop
// 3. newFunction() actually checks the string length,
// so the function will not be optimized away by the HotSpot compiler
@Test
// Fastest: 237ms / 562ms / 2434ms
public void testCacheStrings() throws Exception {
// Cache all possible Char strings
String[] char2string = new String[Character.MAX_VALUE];
for (char i = Character.MIN_VALUE; i < Character.MAX_VALUE; i++) {
char2string[i] = Character.toString(i);
}
for (int x = 0; x < 10000000; x++) {
char[] s = "abcdefg".toCharArray();
for (int i = 0; i < s.length; i++) {
newFunction(char2string[s[i]]);
}
}
}
@Test
// Fast: 1687ms / 1725ms / 3382ms
public void testCharToString() throws Exception {
for (int x = 0; x < 10000000; x++) {
String s = "abcdefg";
for (int i = 0; i < s.length(); i++) {
// Fast: Creates new String objects, but does not copy an array
newFunction(Character.toString(s.charAt(i)));
}
}
}
@Test
// Very fast: 1331 ms/ 1414ms / 3190ms
public void testSubstring() throws Exception {
for (int x = 0; x < 10000000; x++) {
String s = "abcdefg";
for (int i = 0; i < s.length(); i++) {
// Very fast: JVMs before Java 7u6 reuse the internal char array here; later versions copy it
newFunction(s.substring(i, i + 1));
}
}
}
@Test
// Slowest: 2525ms / 2961ms / 4703ms
public void testNewString() throws Exception {
char[] value = new char[1];
for (int x = 0; x < 10000000; x++) {
char[] s = "abcdefg".toCharArray();
for (int i = 0; i < s.length; i++) {
value[0] = s[i];
// Slow! Copies the array
newFunction(new String(value));
}
}
}
private void newFunction(String string) {
// Do something with the one-character string
Assert.assertEquals(1, string.length());
}
}

The answer is: it doesn't matter.
Profile your code. Is this your bottleneck?

Does newFunction really need to take a String? It would be better if you could make newFunction take a char and call it like this:
newFunction(s.charAt(i));
That way, you avoid creating a temporary String object.
To answer your question: It's hard to say which one is more efficient. In both examples, a String object has to be created which contains only one character. Which is more efficient depends on how exactly String.substring(...) and Character.toString(...) are implemented on your particular Java implementation. The only way to find it out is running your program through a profiler and seeing which version uses more CPU and/or more memory. Normally, you shouldn't worry about micro-optimizations like this - only spend time on this when you've discovered that this is the cause of a performance and/or memory problem.
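If the signature can be changed, the suggestion above might look like the following minimal sketch. The class name and the vowel check are made up for illustration; they stand in for whatever newFunction actually does.

```java
// Hypothetical sketch: CharLoopSketch and isVowel are illustrative names,
// not part of the original question.
class CharLoopSketch {
    // Stand-in for newFunction: takes a char instead of a one-character String.
    static boolean isVowel(char c) {
        return "aeiou".indexOf(c) >= 0;
    }

    static int countVowels(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (isVowel(s.charAt(i))) { // no temporary String is created here
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countVowels("abcdefg")); // 'a' and 'e' are vowels
    }
}
```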

Of the two snippets you've posted, I wouldn't want to say. I'd agree with Will that it almost certainly is irrelevant in the overall performance of your code - and if it's not, you can just make the change and determine for yourself which is fastest for your data with your JVM on your hardware.
That said, it's likely that the second snippet would be better if you converted the String into a char array first, and then performed your iterations over the array. Doing it this way would perform the String overhead once only (converting to the array) instead of every call. Additionally, you could then pass the array directly to the String constructor with some indices, which is more efficient than taking a char out of an array to pass it individually (which then gets turned into a one character array):
String s = "abcdefg";
char[] chars = s.toCharArray();
for(int i = 0; i < chars.length; i++) {
newFunction(String.valueOf(chars, i, 1));
}
But to reinforce my first point, when you look at what you're actually avoiding on each call of String.charAt() - it's two bounds checks, a (lazy) boolean OR, and an addition. This is not going to make any noticeable difference. Neither is the difference in the String constructors.
Essentially, both idioms are fine in terms of performance (neither is immediately obviously inefficient) so you should not spend any more time working on them unless a profiler shows that this takes up a large amount of your application's runtime. And even then you could almost certainly get more performance gains by restructuring your supporting code in this area (e.g. have newFunction take the whole string itself); java.lang.String is pretty well optimised by this point.

I would first obtain the underlying char[] from the source String using String.toCharArray() and then proceed to call newFunction.
But I do agree with Jesper that it would be best if you could just deal with characters and avoid all the String functions...

Leetcode seems to prefer the substring option here.
This is how I solved that problem:
class Solution {
public int strStr(String haystack, String needle) {
if(needle.length() == 0) {
return 0;
}
if(haystack.length() == 0) {
return -1;
}
for(int i=0; i<=haystack.length()-needle.length(); i++) {
int count = 0;
for(int j=0; j<needle.length(); j++) {
if(haystack.charAt(i+j) == needle.charAt(j)) {
count++;
}
}
if(count == needle.length()) {
return i;
}
}
return -1;
}
}
And this is the optimal solution they give:
class Solution {
public int strStr(String haystack, String needle) {
int length;
int n=needle.length();
int h=haystack.length();
if(n==0)
return 0;
// if(n==h)
// length = h;
// else
length = h-n;
if(h==n && haystack.charAt(0)!=needle.charAt(0))
return -1;
for(int i=0; i<=length; i++){
if(haystack.substring(i, i+needle.length()).equals(needle))
return i;
}
return -1;
}
}
Honestly, I can't figure out why it would matter.
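One possible reason, offered as a guess rather than a measurement: the inner loop of the first solution keeps comparing characters after a mismatch, while equals() can stop at the first difference. The sketch below keeps the early exit without allocating a substring per position by using String.regionMatches; the class name is hypothetical.

```java
// Hypothetical sketch: StrStrSketch is an illustrative name.
class StrStrSketch {
    static int strStr(String haystack, String needle) {
        int n = needle.length();
        for (int i = 0; i + n <= haystack.length(); i++) {
            // Compares in place and stops at the first mismatch;
            // no substring is allocated.
            if (haystack.regionMatches(i, needle, 0, n)) {
                return i;
            }
        }
        return -1;
    }
}
```

Outside of an exercise, haystack.indexOf(needle) returns the same index.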

Related

Performance tips questions

public void zero() {
int sum = 0;
for (int i = 0; i < mArray.length; ++i) {
sum += mArray[i].mSplat;
}
}
public void one() {
int sum = 0;
Foo[] localArray = mArray;
int len = localArray.length;
for (int i = 0; i < len; ++i) {
sum += localArray[i].mSplat;
}
}
According to the Android documentation, zero() is the slower of the two methods above. But I don't understand why. I haven't studied this in much depth, but as far as I know, length is a field, not a method. So when the loop retrieves its value, how is that different from reading a local variable? An array's length is also fixed once the array is created. What am I missing?

Well, I guess this is because zero() has to retrieve the information from mArray every time, whereas one() already has it at hand. This means zero() needs two accesses:
Access mArray
Access mArray.length
But one() needs only one:
Access len

In the first example, the JVM needs to first fetch the reference to the array and then access its length field.
In the second example, it only accesses one local variable.
On desktop JVMs this is generally optimised and the two methods are equivalent but it seems that Android's JVM does not do it... yet...

It is a matter of scope. Accessing an instance variable is slower than a method variable because it is not stored in the same memory places. (because method variables are likely to be accessed more often).
Same goes for len, but with an extra optimization. len cannot be changed from outside the method, and the compiler can see that it will never change. Therefore, its value is more predictable and the loop can be further optimized.

public void zero() {
int sum = 0;
for (int i = 0; i < mArray.length; ++i) {
sum += mArray[i].mSplat;
}
}
Here, if you look at the for loop, the array length is looked up through the mArray field on every iteration, which degrades the performance.
public void one() {
int sum = 0;
Foo[] localArray = mArray;
int len = localArray.length;
for (int i = 0; i < len; ++i) {
sum += localArray[i].mSplat;
}
}
In this case the length is read once before the for loop and then reused from a local variable inside the loop.
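A third way to write the same loop is the enhanced for statement, which evaluates the array expression once before iterating, so the field is not re-read on each pass. The sketch below is only an illustration: the Foo class, its constructor, and the enclosing class are assumptions, since the question shows the methods alone.

```java
// Hypothetical sketch: LoopSketch, Foo and the constructors are assumed.
class LoopSketch {
    static class Foo {
        int mSplat;
        Foo(int splat) { mSplat = splat; }
    }

    private final Foo[] mArray;

    LoopSketch(Foo[] array) { mArray = array; }

    // Enhanced for loop: mArray is evaluated once, then the loop
    // walks a hidden local copy of the reference.
    int two() {
        int sum = 0;
        for (Foo a : mArray) {
            sum += a.mSplat;
        }
        return sum;
    }
}
```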

Using for loop to get the Hamming distance between 2 strings

In this task I need to get the Hamming distance (the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different - from Wikipedia) between the two strings sequence1 and sequence2.
First I made two new strings, which are the two original strings converted to lower case to make comparing easier. Then I used a for loop and an if to compare the two strings. For every difference in characters between the two strings, the loop adds 1 to an int a = 0. The method returns the value of a.
public static int getHammingDistance(String sequence1, String sequence2) {
int a = 0;
String sequenceX = sequence1.toLowerCase();
String sequenceY = sequence2.toLowerCase();
for (int x = 0; x < sequenceX.length(); x++) {
for (int y = 0; y < sequenceY.length(); y++) {
if (sequenceX.charAt(x) == sequenceY.charAt(y)) {
a += 0;
} else if (sequenceX.charAt(x) != sequenceY.charAt(y)) {
a += 1;
}
}
}
return a;
}
So does the code look good and functional enough? Is there anything I could fix or optimize? Thanks in advance. I'm a huge noob, so pardon me if I asked anything silly.

From my point of view, the following implementation would be OK:
public static int getHammingDistance(String sequence1, String sequence2) {
char[] s1 = sequence1.toCharArray();
char[] s2 = sequence2.toCharArray();
int shorter = Math.min(s1.length, s2.length);
int longest = Math.max(s1.length, s2.length);
int result = 0;
for (int i=0; i<shorter; i++) {
if (s1[i] != s2[i]) result++;
}
result += longest - shorter;
return result;
}
It uses arrays, which avoids invoking a method (charAt) twice for each pair of characters that needs to be compared;
it avoids an exception when one string is longer than the other.

Your code is completely off.
As you said yourself, the distance is the number of places where the strings differ, so you should have only one loop, going over both strings at once. Instead you have two nested loops that compare every index in string a to every index in string b.
Also, writing an if condition that results in a += 0 is a waste of time.
Try this instead:
for (int x = 0; x < sequenceX.length(); x++) { //both are of the same length
if (sequenceX.charAt(x) != sequenceY.charAt(x)) {
a += 1;
}
}
Also, this is still a naive approach which will probably not work with complex Unicode characters (where two characters can be logically equal yet not have the same character code).
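One way to reduce the Unicode problem mentioned above is to normalize both strings before comparing them. This is a sketch under assumptions, not a complete solution: the class name is hypothetical, NFC is just one reasonable normalization form, and the loop still compares UTF-16 code units, so characters outside the Basic Multilingual Plane count as two positions.

```java
import java.text.Normalizer;

// Hypothetical sketch: NormalizedHamming is an illustrative name.
class NormalizedHamming {
    static int distance(String a, String b) {
        // Bring both inputs to the same composed form, so that "e" followed by
        // a combining accent compares equal to the precomposed character.
        String x = Normalizer.normalize(a, Normalizer.Form.NFC);
        String y = Normalizer.normalize(b, Normalizer.Form.NFC);
        if (x.length() != y.length()) {
            throw new IllegalArgumentException(
                "strings must have equal length after normalization");
        }
        int diff = 0;
        for (int i = 0; i < x.length(); i++) {
            if (x.charAt(i) != y.charAt(i)) {
                diff++;
            }
        }
        return diff;
    }
}
```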

public static int getHammingDistance(String sequenceX, String sequenceY) {
int a = 0;
// String sequenceX = sequence1.toLowerCase();
//String sequenceY = sequence2.toLowerCase();
if (sequenceX.length() != sequenceY.length()) {
return -1; //input strings should be of equal length
}
for (int i = 0; i < sequenceX.length(); i++) {
if (sequenceX.charAt(i) != sequenceY.charAt(i)) {
a++;
}
}
return a;
}

Your code is OK; however, I'd suggest the following improvements.
Do not use charAt() of String. Get a char array from the string using toCharArray() before the loop and then work with this array. This is more readable and more efficient.
The structure
if (sequenceX.charAt(x) == sequenceY.charAt(y)) {
a += 0;
} else if (sequenceX.charAt(x) != sequenceY.charAt(y)) {
a += 1;
}
looks redundant. Fix it to:
if (sequenceX.charAt(x) == sequenceY.charAt(y)) {
a += 0;
} else {
a += 1;
}
Moreover, since I recommended working with arrays, change it to something like:
a += seqX[x] == seqY[x] ? 0 : 1;
Less code, fewer bugs...
EDIT: as mentioned by @radai, you do not need the if/else structure at all: adding 0 to a is redundant.

Efficient methods for Incrementing and Decrementing in the same Loop

Suppose there are situations where you would like to increment and decrement values in the same for loop. In some of these cases you can "cheat" by taking advantage of the nature of the situation, for example when reversing a string.
Because of the way strings are built, we don't really have to manipulate the iterator or add an additional counter:
public static void stringReversal(){
String str = "Banana";
String forwardStr = new String();
String backwardStr = new String();
for(int i = str.length()-1; i >= 0; i--){
forwardStr = str.charAt(i)+forwardStr;
backwardStr = backwardStr+str.charAt(i);
}
System.out.println("Forward String: "+forwardStr);
System.out.println("Backward String: "+backwardStr);
}
However, suppose a different case exists where we just want to print a decremented value, from the initial value to 0, and an incremented value, from 0 to the initial value.
public static void incrementAndDecrement(){
int counter = 0;
for(int i = 10; i >= 0; i--){
System.out.println(i);
System.out.println(counter);
counter++;
}
}
This works well enough, but having to create a second counter to increment seems messy. Are there any mathematical tricks or tricks involving the for loop that could be used that would make counter redundant?

Well it looks like you just want:
for(int i = 10; i >= 0; i--){
System.out.println(i);
System.out.println(10 - i);
}
Is that the case? Personally I'd normally write this as an increasing loop, as I find it easier to think about that:
for (int i = 0; i <= 10; i++) {
System.out.println(10 - i);
System.out.println(i);
}
Note that your string example is really inefficient, by the way - far more so than introducing an extra variable. Given that you know the lengths involved to start with, you can just start with two char[] of the right size, and populate the right index each time. Then create a string from each afterwards. Again, I'd do this with an increasing loop:
char[] forwardChars = new char[str.length()];
char[] reverseChars = new char[str.length()];
for (int i = 0; i < str.length(); i++) {
forwardChars[i] = str.charAt(i);
reverseChars[reverseChars.length - i - 1] = str.charAt(i);
}
String forwardString = new String(forwardChars);
String reverseString = new String(reverseChars);
(Of course forwardString will just be equal to str in this case anyway...)
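If the goal is only the reversed string, the standard library already covers it. The sketch below uses StringBuilder.reverse() as an alternative to the char[] approach above, not as a replacement for understanding it; the class name is hypothetical.

```java
// Hypothetical sketch: ReverseSketch is an illustrative name.
class ReverseSketch {
    static String reverse(String str) {
        // StringBuilder reverses its contents in place; one String is created at the end.
        return new StringBuilder(str).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("Banana")); // prints "ananaB"
    }
}
```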

You can have multiple variables and incrementers in a for loop.
for(int i = 10, j = 0; i >= 0; i--, j++) {
System.out.println(i);
System.out.println(j);
}

Dynamic programming-memoization

I am working on a DP problem in which a string of words has had its spaces removed, and I need to implement both a bottom-up and a memoized version that split the string into individual English words. I have the bottom-up version working, but the memoized version seems a little complicated.
/* Split a string into individual English words
* @param str the string to be split
* @return a sequence of words separated by spaces if successful,
null otherwise
*/
public static String buttom_up_split(String str){
int len = str.length();
int[] S = new int[len+1];
/*Stores all the valid strings*/
String[] result = new String[len+1];
/*Initialize the array*/
for(int i=0; i <= len; i++){
S[i] = -1;
}
S[0] =0;
for(int i=0; i < len; i++){
if(S[i] != -1){
for(int j= i+1; j <= len; j++){
String sub = str.substring(i, j);
int k = j;
if(isValidEnglishWord(sub)){
S[k] = 1; //set true indicates a valid split
/*Add space between words*/
if(result[i] != null){
/*Add the substring to the existing words*/
result[i+ sub.length()] = result[i] + " " + sub;
}
else{
/*The first word*/
result[i+ sub.length()] = sub;
}
}
}
}
}
return result[len]; //return the last element of the array
}
I'm really confused about how to convert this bottom-up version into the memoized version. I hope someone can help.

Well, I'm not an expert on memoization, but the idea is to keep a "memory" of English words that have already been checked.
The objective is to save computation time: in your case, the calls to isValidEnglishWord().
Therefore, you need to adapt your algorithm this way:
walk through the 'str' string
extract a substring from it
check if the substring is a valid word in your memory.
It's in memory: add a space and the word to your result.
It's not in memory: call isValidEnglishWord() and process its return value.
It will give something like this (neither tested nor compiled):
import java.util.*;
// This is our memory: it caches the result of isValidEnglishWord() per substring
private static Map<String, Boolean> memory = new HashMap<String, Boolean>();
public static String buttom_up_split(String str){
int len = str.length();
int[] S = new int[len+1];
String[] result = new String[len+1];
for(int i=0; i <= len; i++){
S[i] = -1;
}
S[0] =0;
for(int i=0; i < len; i++){
if(S[i] != -1){
for(int j= i+1; j <= len; j++){
String sub = str.substring(i, j);
int k = j;
// Order is significant: first look into memory!
Boolean isValid = memory.get(sub);
if (isValid == null) {
// Not in memory yet: compute once and memoize the result (valid or not)
isValid = isValidEnglishWord(sub);
memory.put(sub, isValid);
}
if (isValid){
S[k] = 1;
if(result[i] != null){
result[i+ sub.length()] = result[i] + " " + sub;
} else {
result[i+ sub.length()] = sub;
}
}
}
}
}
return result[len];
}

Personally I always prefer to use memoization as transparently as possible without modifying the algorithm. This is because I want to be able to test the algorithm separately from memoization. Also I am working on a memoization library in which you only have to add @Memoize to methods to which memoization is applicable. But unfortunately this will come too late for you.
The last time I used memoization (without my library) I implemented it using a proxy class. An important remark is that this implementation does not support recursion. But this shouldn't be a problem since your algorithm is not recursive.
Some other references are:
wikipedia
Java implementation
memoize using proxy class
Remark about your algorithm:
How do you handle words that have other words in them? like "verbose" contains "verb", "theory" contains "the" etc...
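For comparison, here is a minimal sketch of a top-down (recursive, memoized) version of the split. It is not the asker's or either answerer's code: the class name is hypothetical, and a small Set<String> dictionary stands in for isValidEnglishWord(), which the question does not show. Because the recursion tries every prefix and backtracks, a word such as "verb" inside "verbose" is tried first and abandoned when the remainder cannot be split.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: MemoizedSplit and the dictionary parameter are assumptions.
class MemoizedSplit {
    private final String str;
    private final Set<String> dictionary; // stand-in for isValidEnglishWord()
    // memo.get(i) is the split of str.substring(i), or null if none exists.
    private final Map<Integer, String> memo = new HashMap<>();

    MemoizedSplit(String str, Set<String> dictionary) {
        this.str = str;
        this.dictionary = dictionary;
    }

    String split() {
        return splitFrom(0);
    }

    private String splitFrom(int start) {
        if (start == str.length()) {
            return ""; // nothing left to split
        }
        if (memo.containsKey(start)) {
            return memo.get(start); // already solved, possibly with "no split"
        }
        String answer = null;
        for (int end = start + 1; end <= str.length() && answer == null; end++) {
            String word = str.substring(start, end);
            if (dictionary.contains(word)) {
                String rest = splitFrom(end);
                if (rest != null) {
                    answer = rest.isEmpty() ? word : word + " " + rest;
                }
            }
        }
        memo.put(start, answer); // HashMap permits null values
        return answer;
    }
}
```

Each start index is solved at most once, which is the same work the bottom-up table does, only driven by recursion.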

Printing distinct integers in an array

I'm trying to write a small program that prints out distinct numbers in an array. For example if a user enters 1,1,3,5,7,4,3 the program will only print out 1,3,5,7,4.
I'm getting an error on the else if line in the function checkDuplicate.
Here's my code so far:
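import javax.swing.JOptionPane;
public class DistinctNumbers { // class name assumed; the enclosing class declaration is missing from the post
public static void main(String[] args) {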
int[] array = new int[10];
for (int i=0; i<array.length;i++) {
array[i] = Integer.parseInt(JOptionPane.showInputDialog("Please enter "
+ "an integer:"));
}
checkDuplicate (array);
}
public static int checkDuplicate(int array []) {
for (int i = 0; i < array.length; i++) {
boolean found = false;
for (int j = 0; j < i; j++)
if (array[i] == array[j]) {
found = true;
break;
}
if (!found)
System.out.println(array[i]);
}
return 1;
}
}

The simplest way would be to add all of the elements to a Set<Integer> and then just print the contents of the Set.
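A minimal sketch of that suggestion follows. It assumes a LinkedHashSet so that the values come out in the order they were entered (a plain HashSet gives no such guarantee); the class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: DistinctWithSet is an illustrative name.
class DistinctWithSet {
    static Set<Integer> distinct(int[] array) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int value : array) {
            seen.add(value); // add() ignores values that are already present
        }
        return seen;
    }

    public static void main(String[] args) {
        // prints [1, 3, 5, 7, 4]
        System.out.println(distinct(new int[] {1, 1, 3, 5, 7, 4, 3}));
    }
}
```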

First of all, the "else if" statement is incorrect, since you don't provide any condition to the if (if you want an if, you need to write "if (condition) ...").
Second, you cannot decide inside the inner loop, if a value should be printed: The way your code works you write a value array[i] for each value array[j] that is different from array[i]!
Third: the inner loop only needs to go from 0 to the outer index i-1: for each element, you only need to decide whether it is the first occurrence (i.e. whether the same value occurred at any previous index or not). If it is, print it out; if not, ignore it.
A proper implementation of checkDuplicate() would be:
public static void checkDuplicate(int array []) {
for (int i = 0; i < array.length; i++) {
boolean found = false;
for (int j = 0; j < i; j++)
if (array[i] == array[j]) {
found = true;
break;
}
if (!found)
System.out.println(array[i]);
}
}
But of course, some kind of Set would be much more efficient for bigger arrays...
EDIT: Of course, mmyers (see comments) is right in saying that since checkDuplicate() doesn't return any value, it should have return type void (instead of int). I corrected this in the above code...

Put them in a set ordered by insertion time, then convert back to an array if necessary (this assumes array is an Integer[], not an int[]).
new LinkedHashSet<Integer>(Arrays.asList(array)).toArray()

Try throwing all of the integers into a Set. Duplicates will never be added to the Set, and you will be left with a set of unique integers.

What you want can be accomplished using the Java collections API, but not exactly as a one-liner, because collection methods work with Objects and not primitives. J2SE lacks methods that convert, say, int[] to Integer[], but the Apache Commons Lang library contains such useful methods, like ArrayUtils.toObject() and ArrayUtils.toPrimitive().
Using them, a method to remove duplicated elements from an integer array looks something like this:
public static int[] removeDuplicates(int... array) {
Integer[] ints = ArrayUtils.toObject(array);
Set<Integer> set = new LinkedHashSet<Integer>(Arrays.asList(ints));
return ArrayUtils.toPrimitive(set.toArray(new Integer[set.size()]));
}
If your application is likely to include more of array/collection manipulation, I suggest you take a look at that library, instead of implementing things from scratch. But, if you're doing it for learning purposes, code away!
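On Java 8 or later (an assumption; the answer above predates it), the primitive stream API can do the same without a third-party library. The sketch below uses Arrays.stream(int[]) and IntStream.distinct(), which keeps the first occurrence of each value in encounter order; the class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: DistinctWithStreams is an illustrative name. Requires Java 8+.
class DistinctWithStreams {
    static int[] removeDuplicates(int... array) {
        // No boxing to Integer[] is needed: IntStream works on primitives directly.
        return Arrays.stream(array).distinct().toArray();
    }

    public static void main(String[] args) {
        // prints [1, 3, 5, 7, 4]
        System.out.println(Arrays.toString(removeDuplicates(1, 1, 3, 5, 7, 4, 3)));
    }
}
```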

It would probably be better to add each number to a Set implementation rather than an array. Sets are specifically for storing collections of elements where you want to filter out duplicates.

Either use a Set as other people have suggested or use a List-compatible class. With a List-compatible class, just use the contains method to check whether the value already exists in the list.

import java.util.Scanner;
public class PrintDistinctNumbers {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
int [] numberArray = createArray();
System.out.println("The number u entered are: ");
displayArray(numberArray);
getDistinctNumbers(numberArray);
}
public static int[] createArray() {
Scanner input = new Scanner(System.in);
int [] numberCollection = new int [10];
System.out.println("Enter 10 numbers");
for(int i = 0; i < numberCollection.length; i++){
numberCollection[i] = input.nextInt();
}
return numberCollection;
}
public static void displayArray(int[] numberArray) {
for(int i = 0; i < numberArray.length; i++){
System.out.print(numberArray[i]+" ");
}
}
public static void getDistinctNumbers(int[] numberArray) {
int[] distinctArrayNumbers = new int[numberArray.length];
int count = 0; // number of distinct values stored so far
for (int i = 0; i < numberArray.length; i++) {
boolean isDistinct = true;
// Compare only against the values already stored, so an entered 0 is handled correctly
for (int j = 0; j < count; j++) {
if (numberArray[i] == distinctArrayNumbers[j]) {
isDistinct = false;
break;
}
}
if (isDistinct) {
// Store at the next free slot, not at an index derived from the value itself
distinctArrayNumbers[count] = numberArray[i];
count++;
}
}
displayDistinctArray(distinctArrayNumbers, count);
}
public static void displayDistinctArray(int[] distinctArrayNumbers, int count) {
for (int i = 0; i < count; i++) {
System.out.println(distinctArrayNumbers[i]);
}
}
}
