String concatenation for repeated string values in Java

I have two Java code snippets below and want to know which is better in terms of memory and performance.
First snippet:
String s1 = "USER.DELETE";
String s2 = "RESOURCE.DELETE";
String s3 = "ENTITY.DELETE";
Second snippet:
one static final variable
private static final String DELETE = ".DELETE";
and then using this variable
String s1 = "USER" + DELETE;
String s2 = "RESOURCE" + DELETE;
String s3 = "ENTITY" + DELETE;

The first approach creates 3 String instances in memory.
The second approach creates 4 String instances in memory.
Performance impact:
There is no performance impact, because in this scenario the values are already known and the string concatenation is done at compile time.
Java spec:
Strings computed by constant expressions (§15.28) are computed at compile time and then treated as if they were literals.
http://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.10.5
Memory impact:
The second approach creates one extra string (".DELETE") in the Java heap.
From a code maintainability point of view I would go with the second approach.
Suppose we later want to change .DELETE to .ASYNCDELETE.
With the second approach we only have to change one place.
With the first approach we have to make 3 modifications.
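The compile-time folding described above can be checked with a small sketch. The class and method names are hypothetical and only serve the illustration. It relies on the JLS rule quoted above that strings computed by constant expressions are treated as literals.

```java
// Hypothetical demo class; the names are illustrative and not from the question.
final class ConstantFoldingDemo {
    private static final String DELETE = ".DELETE";

    // "USER" + DELETE is a constant expression, so javac stores the single
    // constant "USER.DELETE" in the class file; nothing is concatenated at run time.
    static final String USER_DELETE = "USER" + DELETE;

    static boolean foldedToSameInstance() {
        // Both sides refer to the same interned String object.
        return USER_DELETE == "USER.DELETE";
    }

    public static void main(String[] args) {
        System.out.println(foldedToSameInstance());
    }
}
```

The == comparison is used here only to observe object identity; ordinary code should still compare strings with equals.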

Actually there is no difference. The compiler performs the concatenation and stores the resulting string.
So choose according to your style.

The second snippet will store 4 strings in memory while the first will store three.
You'll "waste" the space required to store the ".DELETE".
You have a good article about String concatenation here

Little difference in this scenario, as others described above.
However, if the subject is of interest to you in a wider context, for example if you are creating lots of strings dynamically from combinations of static data, check out the String intern() method. It lets you use the String class as a factory, so you get the same String object for the same contents. This hurts performance a bit, but it can save a lot of memory and garbage collection overhead if you're working with lots of data. It can also make hash lookups faster if you always intern the keys: in specific situations you can override equals and hashCode, or supply comparators, that only use the built-in Object '==' comparison, so the string contents never need to be compared.
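As a rough sketch of what intern() does, the following builds two equal strings at run time and then asks for their canonical instance. The class name, the buildKey helper and the key format are assumptions made up for this example.

```java
// Hypothetical demo; buildKey and the "user-42" key format are illustrative only.
final class InternDemo {
    // Builds the key at run time, so the result is not a compile-time constant.
    static String buildKey(String prefix, int id) {
        return new StringBuilder(prefix).append('-').append(id).toString();
    }

    public static void main(String[] args) {
        String a = buildKey("user", 42);
        String b = buildKey("user", 42);
        // Two separate objects with equal contents.
        System.out.println(a == b);
        // intern() returns one shared instance for equal contents.
        System.out.println(a.intern() == b.intern());
    }
}
```

Whether the memory saved outweighs the cost of the intern() lookups depends on the workload, so this is worth measuring rather than assuming.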

Related

In Java, when we print a string literal to the terminal, is this string literal also stored in the string pool?

I am aware that when we initialize a string literal to a variable this literal will be stored in the string pool by the JVM. Consider the piece of code below.
System.out.println("This is a string literal");
Is the string literal within the quotes also stored in the string pool even if I don't assign it to a variable?
I will preface this answer by saying that there is little practical use in gaining a deep understanding of the Java string pool. From a practical perspective, you just need to remember two things:
Don't use == to compare strings. Use equals, compareTo, or equivalent methods.
Don't use explicit String.intern calls in your code. If you want to avoid potential problems with duplicate strings, enable the string de-duplication feature that is available in modern Java GCs.
I am aware that when we initialize a string literal either using the 'new' keyword or not, this literal will be stored in the string pool by the JVM.
This is garbled.
Firstly, you don't "initialize" a string literal. You initialize a variable.
String hi = "hello"; // This initializes the variable `hi`.
Secondly you typically don't / shouldn't use a string literal with new.
String hi = new String("hello"); // This is bad. You should write this as above.
The normal use-case for creating a string using new is something like this:
String hi = new String(arrayOfCharacters, offset, count);
In fact, creation and interning of the String object that corresponds to a string literal, happens either at the first time that the literal is used in an expression or at an earlier time. The precise details (i.e. when it happens) are unspecified and (I understand) version dependent.
The first usage might be in a variable initialization, or it might be in something else; e.g. a method call.
So to your question:
Consider the piece of code below:
System.out.println("This is a string literal");
Does the string literal within the quotes also be stored in the string pool even if I do not initialize it?
Yes, it does. If that was the first time the literal was used, the code above may be the trigger for this to happen. But it could have happened previously; e.g. if the above code was run earlier.
As a followup, you asked:
Why does the String Pool collect string literals which are not stored in a variable and just displayed in the console?
Because the JLS 3.10.5 requires that the String objects which correspond to string literals are interned:
"Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern (§12.5)."
And you asked:
The Presence of the String Pool help optimize the program. By storing literals as such (which is actually not required because it is just to be displayed in the console), isn't it the case that it goes against its whole purpose (which is optimization)?
The original idea for interning and the string pool was to save memory. That made sense 25 years ago when the Java language was designed and originally specified. These days even a low-end Android phone has 1GB of RAM, and interning of string literals to save a few thousand bytes is kind of pointless. Except that the JLS says that this must happen.
But the answer is No, it doesn't go against the (original) purpose. This statement:
System.out.println("This is a string literal");
could be executed many times. You don't want / need to create a new String object for the literal each time that you execute it. The thing is that the JVM doesn't know what is going to happen.
Anyway, the interning must happen because that is what the spec says.
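To make the interning requirement concrete, here is a minimal sketch with hypothetical class and method names. It compares a copy of the text built at run time against the literal itself.

```java
// Hypothetical demo class; names are illustrative only.
final class LiteralPoolDemo {
    static String message() {
        // The literal is never assigned to a variable, yet it is still interned.
        return "This is a string literal";
    }

    static boolean literalIsInterned() {
        // Build an equal string at run time; new String always creates a new object.
        char[] chars = message().toCharArray();
        String runtimeCopy = new String(chars);
        // The copy is a different object, but its interned form is the literal's instance.
        return runtimeCopy != message() && runtimeCopy.intern() == message();
    }

    public static void main(String[] args) {
        System.out.println(message());
        System.out.println(literalIsInterned());
    }
}
```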

Java concatenate strings vs static strings

I am trying to get a better understanding of Strings. I am basically making a program that requires a lot of strings. However, a lot of the strings are very, very similar and merely require a different word at the end of the string.
E.g.
String one = "I went to the store and bought milk";
String two = "I went to the store and bought eggs";
String three = "I went to the store and bought cheese";
So my question is, what approach would be best suited to take when dealing with strings? Would concatenating 2 strings together have any benefits over just having static strings in, say for example, performance or memory management?
E.g.
String one = "I went to the store and bought ";
String two = "milk";
String three = "cheese";
String four = one + two;
String five = one + three;
I am just trying to figure out the best way of dealing with all these strings. (If it helps to know how many strings I am using, I currently have 50, but the number could grow considerably.)
As spooky has said the main concern with the code is readability. Unless you are working on a program for a phone you do not need to manage your resources. That being said, it really doesn't matter whether you create a lot of Strings that stand alone or concatenate a base String with the small piece that varies. You won't really notice better performance either way.
You may set the opening sentence in a string like this
String openingSentence = "I went to the store and bought";
and define each item separately in an array of strings like the following:
String[] thingsToBeBought = { "milk", "water", "cheese" .... };
then you can do foreach loop and concatenate each element in the array with the opening sentence.
In Java, if you concatenate two Strings (e.g. using '+') a new String is created, so the old memory needs to be garbage collected. If you want to concatenate strings, the correct way to do this is to use a StringBuilder or StringBuffer.
Given your comment about these strings really being URLs, you probably want to have a StringBuilder/StringBuffer that is the URL base, and then append the suffixes as needed.
Performance-wise, final static strings are always better as they are generated at compile time. Something like this:
final static String s = "static string";
Non-static strings and strings concatenated as shown in the other example are generated at runtime. So even though performance will hardly matter for such a small thing, the second example is not as good as the first one performance-wise, as in your code:
// not as good performance wise since they are generated at runtime
String four = one + two
String five = one + three
Since you are going to use this string as a URL, I would recommend using StringJoiner (in case you are using Java 8). It will be as efficient as StringBuilder (it will not create a new string every time you perform a concatenation) and will automatically add "/" between strings.
StringJoiner myJoiner = new StringJoiner("/");
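A minimal sketch of the StringJoiner suggestion follows. The class name, the join helper and the example.com base URL are assumptions for illustration; only java.util.StringJoiner and its add and toString methods come from the JDK.

```java
import java.util.StringJoiner;

// Hypothetical helper; the names and the example URL are illustrative only.
final class UrlJoinDemo {
    // Joins a base and any number of path segments with "/" between them.
    static String join(String base, String... segments) {
        StringJoiner joiner = new StringJoiner("/");
        joiner.add(base);
        for (String segment : segments) {
            joiner.add(segment);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(join("http://example.com", "api", "users"));
    }
}
```

Note that this does no URL encoding or duplicate-slash handling; it only places the delimiter between the pieces.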
There will be no discernable difference in performance, so the manner in which you go about this is more a matter of preference. I would likely declare the first part of the sentence as a String and store the individual purchase items in an array.
Example:
String action = "I went to the store and bought ";
String[] items = {"milk", "eggs", "cheese"};
for (int x = 0; x < items.length; x++) {
    System.out.println(action + items[x]);
}
Whether you declare every possible String or separate Strings to be concatenated isn't going to have any measurable impact on memory or performance in the example you give. In the extreme case of declaring truly large numbers of String literals, Java's native hash table of interned Strings will use more memory if you declare every possible String, because the table's cached values will be longer.
If you concatenate Strings with the + operator across several statements or loop iterations, you will be creating extra String objects to be GC'd. For example, if you have Strings a = "1" and b = "2", and do String s = "s" + a; s = s + b;, Java will first create the String "s1" and then concatenate it to form a second String "s12". Avoid the intermediate String by using something like StringBuilder. (Within a single expression such as "s" + a + b, javac already compiles the whole chain into one StringBuilder, or one invokedynamic call since Java 9, so no intermediate String is created. None of this applies to compile-time constants, only to runtime concatenations.)
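A minimal sketch of the run-time case discussed above, comparing repeated += in a loop with a single StringBuilder. The class and method names are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
final class LoopConcatDemo {
    // Each += allocates a new String holding everything accumulated so far.
    static String withPlus(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // One StringBuilder grows in place and produces a single String at the end.
    static String withBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"s", "1", "2"};
        System.out.println(withPlus(parts));
        System.out.println(withBuilder(parts));
    }
}
```

Both methods return the same text; the difference is only in how many temporary objects are created along the way, which matters mainly for long loops.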
If you happen to be formatting a String rather than simply concatenating, use a MessageFormat or String.format(). It's prettier and avoids the intermediate Strings created when using the + operator. So something like, String urlBase = "http://host/res?a=%s&b=%s"; String url = String.format(urlBase, a, b); where a and b are the query parameter String values.
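The two formatting options mentioned above could look roughly like this. The class and method names are hypothetical, and the URL template is the one from the answer; neither method URL-encodes its arguments.

```java
import java.text.MessageFormat;

// Hypothetical demo class; names are illustrative only.
final class FormatDemo {
    // String.format uses printf-style placeholders such as %s.
    static String withStringFormat(String a, String b) {
        return String.format("http://host/res?a=%s&b=%s", a, b);
    }

    // MessageFormat uses indexed placeholders such as {0} and {1}.
    static String withMessageFormat(String a, String b) {
        return MessageFormat.format("http://host/res?a={0}&b={1}", a, b);
    }

    public static void main(String[] args) {
        System.out.println(withStringFormat("1", "2"));
        System.out.println(withMessageFormat("1", "2"));
    }
}
```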

Adding a prefix to the String in Java?

I know that adding a character to a string should take O(1) time. For example:
String S = "abc";
S = S + 'z';
What if I want to do the reverse and prepend a char to the String? Is it possible like this?
S = 'z' + S;
If yes, how much time will it take? Does Java copy the whole content of String S (O(n)) or just adjust pointers in memory (O(1))?
Thanks!
String is immutable. Thus there's no way this operation (of adding the prefix) to be O(1). It is at least linear with respect to the size of S. And... as it makes no sense (think about it) to be O(f(N)) where O(f(N)) > O(N), it means it's O(N). Pretty sure about this just from common sense.
The order of the concatenation does not matter. Compilers (usually) turn this into bytecode that uses a StringBuilder (or, from Java 9 on, an invokedynamic call to StringConcatFactory).
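A small sketch of prefixing, with hypothetical class and method names. Both variants copy every character of the original string into a new one, which is why the cost is linear in its length.

```java
// Hypothetical demo class; names are illustrative only.
final class PrefixDemo {
    // char + String is string concatenation because one operand is a String.
    // (char + char would be integer addition instead.)
    static String prefixWithPlus(char c, String s) {
        return c + s;
    }

    // Roughly what the compiler arranges behind the scenes: append the prefix,
    // then copy all of s after it.
    static String prefixWithBuilder(char c, String s) {
        return new StringBuilder(s.length() + 1).append(c).append(s).toString();
    }

    public static void main(String[] args) {
        System.out.println(prefixWithPlus('z', "abc"));
        System.out.println(prefixWithBuilder('z', "abc"));
    }
}
```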

Why does appending "" to a String save memory?

I used a variable with a lot of data in it, say String data.
I wanted to use a small part of this string in the following way:
this.smallpart = data.substring(12,18);
After some hours of debugging (with a memory visualizer) I found out that the object's field smallpart retained all the data from data, although it only contained the substring.
When I changed the code into:
this.smallpart = data.substring(12,18)+"";
..the problem was solved! My application uses very little memory now!
How is that possible? Can anyone explain this? I think this.smallpart kept referencing towards data, but why?
UPDATE:
How can I clear the big String then? Will data = new String(data.substring(0,100)) do the thing?
Doing the following:
data.substring(x, y) + ""
creates a new (smaller) String object, and throws away the reference to the String created by substring(), thus enabling garbage collection of this.
The important thing to realise is that substring() gives a window onto an existing String - or rather, the character array underlying the original String. Hence it will consume the same memory as the original String. This can be advantageous in some circumstances, but problematic if you want to get a substring and dispose of the original String (as you've found out).
Take a look at the substring() method in the JDK String source for more info.
EDIT: To answer your supplementary question, constructing a new String from the substring will reduce your memory consumption, provided you bin any references to the original String.
NOTE (Jan 2013). The above behaviour has changed in Java 7u6. The flyweight pattern is no longer used and substring() will work as you would expect.
If you look at the source of substring(int, int), you'll see that it returns:
new String(offset + beginIndex, endIndex - beginIndex, value);
where value is the original char[]. So you get a new String but with the same underlying char[].
When you do, data.substring() + "", you get a new String with a new underlying char[].
Actually, your use case is the only situation where you should use the String(String) constructor:
String tiny = new String(huge.substring(12,18));
When you use substring, it doesn't actually create a new string. It still refers to your original string, with an offset and size constraint.
So, to allow your original string to be collected, you need to create a new string (using new String, or what you've got).
I think this.smallpart kept referencing towards data, but why?
Because Java strings consist of a char array, a start offset and a length (and a cached hashCode). Some String operations like substring() create a new String object that shares the original's char array and simply has different offset and/or length fields. This works because the char array of a String is never modified once it has been created.
This can save memory when many substrings refer to the same basic string without replicating overlapping parts. As you have noticed, in some situations, it can keep data that's not needed anymore from being garbage collected.
The "correct" way to fix this is the new String(String) constructor, i.e.
this.smallpart = new String(data.substring(12,18));
BTW, the overall best solution would be to avoid having very large Strings in the first place, and to process any input in smaller chunks, a few KB at a time.
In Java, strings are immutable objects, and once a string is created it remains in memory until it is cleaned up by the garbage collector (and this cleaning is not something you can take for granted).
When you call the substring method, Java does not create a truly new string, but just stores a range of characters inside the original string.
So, when you created a new string with this code:
this.smallpart = data.substring(12, 18) + "";
you actually created a new string when you concatenated the result with the empty string.
That's why.
As documented by jwz in 1997:
If you have a huge string, pull out a substring() of it, hold on to the substring and allow the longer string to become garbage (in other words, the substring has a longer lifetime) the underlying bytes of the huge string never go away.
Just to sum up, if you create lots of substrings from a small number of big strings, then use
String substring = string.substring(5,23);
since you only use the space needed to store the big strings. But if you are extracting just a handful of small strings from lots of big strings, then
String substring = new String(string.substring(5,23));
will keep your memory use down, since the big strings can be reclaimed when no longer needed.
That you call new String is a helpful reminder that you really are getting a new string, rather than a reference to the original one.
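The defensive copy described above could be wrapped in a small helper, sketched here with hypothetical names. It matters on JDKs before 7u6, where substring shared the parent's char[]; on later JDKs substring already copies, so the extra constructor call is harmless but redundant.

```java
// Hypothetical helper; names are illustrative only.
final class SubstringCopyDemo {
    // Returns the requested range as a String that does not keep the
    // (possibly huge) source string's backing array reachable on old JDKs.
    static String detachedSubstring(String data, int begin, int end) {
        return new String(data.substring(begin, end));
    }

    public static void main(String[] args) {
        String data = "0123456789abcdefghij";
        System.out.println(detachedSubstring(data, 12, 18));
    }
}
```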
Firstly, calling java.lang.String.substring creates a new window onto the original String, using an offset and a length, instead of copying the relevant part of the underlying array.
If we take a closer look at the substring method, we will notice a call to the String(int, int, char[]) constructor that passes it the whole char[] representing the string. That means the substring keeps as much memory reachable as the original string.
OK, but why does + "" result in less memory being used than without it?
Doing a + on strings is implemented via a StringBuilder.append call. A look at the implementation of this method in the AbstractStringBuilder class shows that it ultimately does an arraycopy of only the part we really need (the substring).
Any other workaround??
this.smallpart = new String(data.substring(12,18));
this.smallpart = data.substring(12,18).intern();
Appending "" to a string will sometimes save memory.
Let's say I have a huge string containing a whole book, one million characters.
Then I create 20 strings containing the chapters of the book as substrings.
Then I create 1000 strings containing all paragraphs.
Then I create 10,000 strings containing all sentences.
Then I create 100,000 strings containing all the words.
I still only use 1,000,000 characters. If you add "" to each chapter, paragraph, sentence and word, you use 5,000,000 characters.
Of course it's entirely different if you only extract one single word from the whole book, and the whole book could be garbage collected but isn't because that one word holds a reference to it.
And it's again different if you have a one million character string and remove tabs and spaces at both ends, making say 10 calls to create a substring. The way Java works or worked avoids copying a million characters each time. There is compromise, and it's good if you know what the compromises are.

Is there a difference between String concat and the + operator in Java? [duplicate]

Duplicate
java String concatenation
I'm curious what is the difference between the two.
The way I understand the string pool is this:
This creates 3 string objects in the string pool, for 2 of those all references are lost.
String mystr = "str";
mystr += "end";
Doesn't this also create 3 objects in the string pool?
String mystr = "str";
mystr = mystr.concat("end");
I know StringBuilder and StringBuffer are much more efficient in terms of memory usage when there's lots of concatenation to be done. I'm just curious if there's any difference between the + operator and concat in terms of memory usage.
There's no difference in this particular case; however, they're not the same in general.
str1 += str2 is equivalent to doing the following:
str1 = new StringBuilder().append(str1).append(str2).toString();
To prove this to yourself, just make a simple method that takes two strings and +='s the first string to the second, then examine the disassembled bytecode.
By contrast, str1.concat(str2) simply makes a new string that's the concatenation of str1 and str2, which is less expensive for a small number of concatenated strings (but will lose to the first approach with a larger number).
Additionally, if str1 is null, notice that str1.concat(str2) throws an NPE, but str1 += str2 will simply treat str1 as if it were the string "null" without throwing an exception. (That is, it yields "null" concatenated with the value of str2. If str2 were, say, "foo", you would wind up with "nullfoo".)
Update: See this StackOverflow question, which is almost identical.
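The null-handling difference between += and concat described in this answer can be checked with a short sketch. The class and method names are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
final class NullConcatDemo {
    // += converts a null operand to the text "null" instead of throwing.
    static String plusEquals(String str1, String str2) {
        str1 += str2;
        return str1;
    }

    // concat is an instance method, so calling it on a null reference throws.
    static boolean concatThrowsOnNullReceiver(String str2) {
        String str1 = null;
        try {
            str1.concat(str2);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(plusEquals(null, "foo"));
        System.out.println(concatThrowsOnNullReceiver("foo"));
    }
}
```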
The important difference between += and concat() is not performance, it's semantics. concat() will only accept a string argument, but + (or +=) will accept anything. If the non-string operand is an object, it will be converted to a string by calling toString() on it; a primitive will be converted as if by calling the appropriate method in the associated wrapper class, e.g., Integer.toString(theInt); and a null reference becomes the string "null".
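As a quick illustration of the conversion rules just described, the sketch below concatenates an int and an arbitrary object reference with +. The class and method names are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
final class MixedOperandDemo {
    // + converts the int and the Object to text; a null reference becomes "null".
    // The equivalent "count=".concat(count) would not compile, because concat
    // only accepts a String argument.
    static String describe(int count, Object item) {
        return "count=" + count + ", item=" + item;
    }

    public static void main(String[] args) {
        System.out.println(describe(2, 1.5));
        System.out.println(describe(3, null));
    }
}
```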
Actually, I don't know why concat() even exists. People see it listed in the API docs and assume it's there for a good reason--performance being the most obvious reason. But that's a red herring; if performance is really a concern, you should be using a StringBuilder, as discussed in the thread John linked to. Otherwise, + or += is much more convenient.
EDIT: As for the issue of "creating objects in the string pool," I think you're misunderstanding what the string pool is. At run-time, the actual character sequences, "str" and "end" will be stored in a dedicated data structure, and wherever you see the literals "str" and "end" in the source code, the bytecode will really contain references to the appropriate entries in that data structure.
In fact, the string pool is populated when the classes are loaded, not when the code containing the string literals is run. That means each of your snippets only creates one object: the result of the concatenation. (There's also some object creation behind the scenes, which is a little different for each of the techniques, but the performance impact is not worth worrying about.)
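The point that only the concatenation result is a new object can be observed with a sketch like the following. The class and method names are hypothetical, and == is used purely to inspect object identity.

```java
// Hypothetical demo class; names are illustrative only.
final class ConcatObjectsDemo {
    static String appendWithPlus() {
        String mystr = "str";   // refers to the pooled literal, no new object
        mystr += "end";         // creates one new String at run time
        return mystr;
    }

    static String appendWithConcat() {
        String mystr = "str";
        mystr = mystr.concat("end");
        return mystr;
    }

    public static void main(String[] args) {
        String plus = appendWithPlus();
        // Equal contents, but not the pooled instance of the literal "strend".
        System.out.println(plus.equals("strend"));
        System.out.println(plus == "strend");
        System.out.println(appendWithConcat().equals("strend"));
    }
}
```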
The way I understand the string pool is this:
You seem to have a misconception concerning that term. There is no such thing as a "string pool" - the way you're using it, it looks like you just mean all String object on the heap. There is a runtime constant pool which contains, among many other things, compile-time String constants and String instances returned from String.intern()
Unless the argument to concat is an empty string, then
String mystr = "str";
mystr = mystr.concat("end");
will also create 3 strings.
More info: https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html.
