Does java keep the String in a form of char array? - java

I couldn't find this in the API docs or by searching the web.
I know the JVM keeps every String object it has in a string pool in order to optimize memory usage. However, I can't figure out how it stores a String 'under the hood'. Since a String is an immutable object, does toCharArray() give me a copy of the internal array that is stored in the String object in the pool? (If so, every operation that gets the string as a char array is O(n).) Also, does charAt(i) use the internal array of the string, or does it copy it to a new array and return the char at position i of the copy?

Until Java 8, Strings were internally represented as an array of characters – char[], encoded in UTF-16, so that every character uses two bytes of memory.
When we create a String via the new operator, the JVM creates a new object on the heap, even if an equal String already exists in the pool. For example:
String str1 = new String("Hello");
When we assign a string literal to a String variable, the JVM searches the pool for a String of equal value. If one is found, a reference to it is returned without allocating additional memory. If not, the String is added to the pool and its reference is returned.
String str2 = "Hello";
toCharArray() internally creates a new char[] by copying the characters of the original array into it.
charAt(int index) returns the value at the specified index of the internal (original) char[].
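A minimal sketch of the difference between the two methods, using only the standard String API (the class and method names here are made up for illustration). Because toCharArray() hands back a copy, changing that copy leaves the String untouched, while charAt(i) simply reads one element:

```java
class CharArrayCopyDemo {
    // Returns the string's characters with the first one upper-cased.
    // The original String is not affected, because toCharArray() copies.
    static char[] upperFirst(String s) {
        char[] copy = s.toCharArray();   // fresh array, O(n) copy
        if (copy.length > 0) {
            copy[0] = Character.toUpperCase(copy[0]);
        }
        return copy;
    }

    public static void main(String[] args) {
        String original = "hello";
        char[] changed = upperFirst(original);
        System.out.println(new String(changed)); // Hello
        System.out.println(original);            // hello
        System.out.println(original.charAt(1));  // e
    }
}
```

Each call to toCharArray() returns a different array, so code that only needs individual characters is usually better off with charAt(i).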
With Java 9 a new representation is provided, called Compact Strings. The characters are now stored in a byte[] together with an encoding flag, and the JVM chooses between Latin-1 (one byte per character) and UTF-16 (two bytes per character) depending on the stored content. Since the new String representation uses UTF-16 only when necessary, the amount of heap memory used is significantly lower, which in turn causes less garbage collector overhead on the JVM.
Source:
http://www.baeldung.com/java-string-pool

Related

Is it a good practice to nullify a String in Java? [duplicate]

I have a problem with storing a plain password in memory as a String. According to the references below, since Strings are immutable, using the String data type to store sensitive data in memory is a vulnerability.
https://www.geeksforgeeks.org/use-char-array-string-storing-passwords-java/
Why is char[] preferred over String for passwords?
Can I overcome this security issue by nullifying the String variable, instead of using a char array or a StringBuffer/StringBuilder?
e.g.: String password = "password";
password = null;
No. Nullifying the variable only removes the reference; the value still exists in memory. A literal such as the one above is also retained in the string pool, which exists to conserve memory.
An attacker who can read the process memory (for example, from a heap dump) can retrieve the value.
With a char[], by contrast, you can overwrite the contents as soon as you are done with them, instead of waiting for garbage collection to reclaim the object.
An even better option will be using a byte array.
Read more about String Constant pool.
If you want absolute security, no. Nulling out the String is not the right solution.
The reason for this is that nulling it out makes no guarantees about the String no longer being available. Although it may make it more likely to be garbage collected (and this is only a 'may'), there are no guarantees about when (or even if) it will be garbage collected.
You should use either a byte array or a char array, and then overwrite each of the elements in the array (for example with zeros) when you are done.
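As a rough sketch of that advice (the class, method, and comparison logic are hypothetical; a real application would normally compare against a stored hash rather than a plain value), the array can be wiped in a finally block so that it is cleared even if the check throws:

```java
import java.util.Arrays;

class PasswordWipeDemo {
    // Hypothetical check: compares the entered password with the expected one,
    // then overwrites the entered password regardless of the outcome.
    static boolean checkAndWipe(char[] password, char[] expected) {
        try {
            return Arrays.equals(password, expected);
        } finally {
            Arrays.fill(password, '\0'); // overwrite every element of the secret
        }
    }

    // True when every element has been overwritten with the zero character.
    static boolean isWiped(char[] a) {
        for (char c : a) {
            if (c != '\0') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] entered = {'s', 'e', 'c', 'r', 'e', 't'};
        char[] expected = {'s', 'e', 'c', 'r', 'e', 't'};
        System.out.println(checkAndWipe(entered, expected)); // true
        System.out.println(isWiped(entered));                // true
    }
}
```

This narrows the window in which the secret sits in memory; it is not an absolute guarantee, since the runtime may have copied the array internally before it was wiped.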

We know that a String is stored in the SCP. Is a String[] also stored in the SCP in the same way? And what about ArrayList<String>?

We know that a String will be stored in SCP (String Constant Pool) area:
1. Is a String[] stored in the SCP in the same way? I mean, each array contains Strings, so will they again be stored in the SCP?
2. And what about ArrayList<String>? Each ArrayList contains Strings, so will they again be stored in the SCP?
In our project we are facing an OutOfMemoryError. We have more than 1000 String[] arrays, and the values in each String[] are different every time, so (we are guessing) a huge number of objects are being created in the SCP. We want to change these to ArrayList<String> to reduce memory. But if each String in an ArrayList<String> is also stored in the SCP, there is no point in changing from String[] to ArrayList<String>.
Please explain in detail. Your response is valuable.
No. Not all Strings are stored in the String Constant Pool. Only String literals and interned Strings are stored there.
String s = "abc";    // stores "abc" in the SCP if it is not already present
String s1 = "abc";   // "abc" is already in the SCP, so s1 refers to the same pooled object
String s2 = "abc";   // likewise, nothing new is stored
String s3 = s1 + s2; // "abcabc" is created at run time and goes on the heap, not in the SCP
String[] is just an array that holds references to String objects.
So, I think there is some other problem somewhere else.
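The distinction can be checked with reference comparisons. The following is a small sketch (class and method names are illustrative only) that relies on == comparing object identity and on String.intern() returning the pooled instance:

```java
class StringPoolDemo {
    // Concatenating two non-constant values happens at run time,
    // so the result is a new heap object rather than a pooled literal.
    static String runtimeConcat(String a, String b) {
        return a + b;
    }

    public static void main(String[] args) {
        String s1 = "abc";
        String s2 = "abc";
        String s3 = new String("abc");
        String s4 = runtimeConcat(s1, s2);

        System.out.println(s1 == s2);                // true: same pooled literal
        System.out.println(s1 == s3);                // false: new always creates an object
        System.out.println(s3.intern() == s1);       // true: intern() returns the pooled one
        System.out.println(s4 == "abcabc");          // false: built at run time
        System.out.println(s4.intern() == "abcabc"); // true

        // An array (or a list) only stores references to these objects.
        String[] array = {s1, s3};
        System.out.println(array[0] == s1);          // true
    }
}
```

Putting Strings into a String[] or an ArrayList<String> does not copy or pool them, which is why switching containers does not change how much memory the Strings themselves use.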
Also, from the Oracle documentation:
In JDK 7, interned strings are no longer allocated in the permanent
generation of the Java heap, but are instead allocated in the main
part of the Java heap (known as the young and old generations), along
with the other objects created by the application. This change will
result in more data residing in the main Java heap, and less data in
the permanent generation, and thus may require heap sizes to be
adjusted. Most applications will see only relatively small differences
in heap usage due to this change, but larger applications that load
many classes or make heavy use of the String.intern() method will see
more significant differences.
It's the String literals that are stored in the pool. For all your other questions, it's just references: a String[] holds references to String objects, which may include pooled objects as well. Similarly, an ArrayList holds references.
Changing [] to ArrayList won't make a difference.
Changing these Strings to StringBuffer/StringBuilder will help instead, because basic operations such as concatenation (+) then append to the existing object rather than creating a new one each time.
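A brief sketch of that last suggestion, assuming the Strings are being built up piece by piece (the joining task and the names are hypothetical, since the question does not show its code). A single StringBuilder is appended to in place, and only the final toString() creates a String:

```java
class JoinDemo {
    // Joins the parts with commas using one StringBuilder,
    // instead of creating an intermediate String for every +.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part).append(',');
        }
        if (sb.length() > 0) {
            sb.setLength(sb.length() - 1); // drop the trailing comma
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithBuilder(new String[] {"a", "b", "c"})); // a,b,c
    }
}
```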

Optimizing Java code for better memory usage

I need to write a function which will do the following functionalities
Note that this:
fqField.substring(quoteEnd+1, fqField.length());
uses the character array of the referenced string rather than creating a new one (this applies to JDKs before Java 7u6, where substring() shared the array). That is, if I have a 100,000-character string and I take a 2-character substring of it, the substring will reference the original 100,000 chars. This is true even if you dispose of the reference to the original string.
If you do this:
new String(fqField.substring(quoteEnd+1, fqField.length()));
then this will create a new String, with a new underlying character array. You can then dispose of the original and you won't be consuming memory for the original.
The ArrayList "prefixes" which you're creating has the default initial capacity. You could give it a sensible initial capacity instead.
What about using char instead of String? Is it an option for you to pass that as params?
How about making "prefixes" an array of String (or char) from the start, instead of making it an ArrayList first and converting it later?

Convert code with pointers in C to Java code

I am having some difficulty in understanding how to write the below piece of code using String or char[] in Java.
void xyz(char *a, int startIndex, int endIndex)
{
    int j;
    for (j = startIndex; j <= endIndex; j++)
    {
        doThis((a+startIndex), (a+j));
        xyz(a, startIndex+1, endIndex);
    }
}
Here char *a points to the starting location of the char name[]
The above are just some random functions, but I just want the logic of how to use char* and character index char[] in Java
Based on the rephrased question from the comment thread:
You cannot change the characters of a Java String. If you need to modify a sequence of characters, use StringBuilder, which supports setCharAt(int, char), insert(int, char), and append(char). You can use new StringBuilder(myString) to convert a String to a StringBuilder, and stringBuilder.toString() to convert back.
This is perfectly legit Java code -- it's not code smelly, it's just the way you work with mutable character sequences.
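For instance, a small sketch of in-place editing with StringBuilder (the swap operation is only a guess at the kind of thing doThis might do; the question does not say):

```java
class StringBuilderEditDemo {
    // Returns a copy of s with the characters at positions i and j exchanged.
    static String swap(String s, int i, int j) {
        StringBuilder sb = new StringBuilder(s); // String -> mutable builder
        char tmp = sb.charAt(i);
        sb.setCharAt(i, sb.charAt(j));
        sb.setCharAt(j, tmp);
        return sb.toString();                    // builder -> String
    }

    public static void main(String[] args) {
        System.out.println(swap("abc", 0, 2)); // cba
    }
}
```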
A char* in C is, as you noted, pointing to the start of your character array (which is how C manages Strings).
In C the size of a char is one byte, and pointers always point to the start of a byte. Your C String is an array of characters, so adding 1 to a pointer moves the start of your string right by one character.
That means that the C code:
char *a;
// Set the String here
a = a + 1;
translates in Java to something like:
String a;
// Set the String here
a = a.substring(1);
or if you are using a char array:
char[] a;
// Set the array contents here
char[] copyTo = new char[a.length - 1];
System.arraycopy(a, 1, copyTo, 0, a.length - 1);
a = copyTo;
Java will be a bit more careful about protecting you than C will be, though. For instance, if you have a zero-length string, the C code has the potential to either segfault (crashing the application) or give you a gibberish string full of memory junk (then, eventually, crash the application), whereas the Java code will throw an exception (normally an IndexOutOfBoundsException) which you can, hopefully, handle cleanly.
Remember, though, that Strings in Java are immutable. You cannot change them; you can only create new Strings. Fortunately, String has several built-in functions which allow you to do a lot of the standard actions, like replacing part of the String with another and returning the result. A character array is mutable, and you can change the characters within it, but you will lose a lot of the nice benefits you get from using the proper String class.
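If copying on every step is not wanted, another way to express a+startIndex and a+j is to keep one char[] and pass indices alongside it. The sketch below translates xyz that way; doThis is a hypothetical stand-in that only records which characters it was given, because the question does not say what the real function does:

```java
class PointerToIndexDemo {
    // Records each call so the traversal order can be inspected.
    static final StringBuilder log = new StringBuilder();

    // Hypothetical stand-in for the C doThis(char*, char*): the two pointers
    // a+startIndex and a+j become the array plus two indices.
    static void doThis(char[] a, int i, int j) {
        log.append(a[i]).append(a[j]).append(' ');
    }

    // Same control flow as the C version, with indices instead of pointers.
    static void xyz(char[] a, int startIndex, int endIndex) {
        for (int j = startIndex; j <= endIndex; j++) {
            doThis(a, startIndex, j);
            xyz(a, startIndex + 1, endIndex);
        }
    }

    public static void main(String[] args) {
        xyz("ab".toCharArray(), 0, 1);
        System.out.println(log.toString().trim()); // aa bb ab bb
    }
}
```

Because the array itself is shared between calls, any element that doThis modifies is visible to the caller, just as with the C pointer version.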
Simple Answer:
You can't do exactly that. Java is pass by value only. You don't have access to memory location information, so you can't do arithmetic with it.
Longer Answer:
It looks like you are passing in a string for manipulation. You have several options to simulate that.
You can convert the string to an array of characters and then pass in a char[]. If your manipulations are completely custom rather than any sort of standard string operation, this is probably what you need to do. Keep in mind that you can't change the size of the array passed in, nor can you make the caller's variable point at a new array after the function completes (again, Java is pass by value only). Only the values of the existing elements of the array can be modified.
You can pass in the String and use the String methods, such as substring() (which your begin and end indexes seem to suggest), but this may not meet your needs. Note that strings are immutable, however, and you can only get a result out via the return statement.
If you really need to modify the contents of the object passed in you can pass a StringBuilder, StringBuffer or CharBuffer object and modify away.
There's a hack that can also be used to circumvent pass by value, but it's poor style except in special situations. Pass in an array of whatever you need to modify; in this case an array of arrays of characters would allow you to set a new sub-array and effectively achieve pass by reference. But try not to do this :)
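A compact sketch of what these options mean in practice (all names are made up for illustration): element changes made through a char[] parameter are visible to the caller, reassigning the parameter is not, and the one-element holder array is the hack mentioned last:

```java
class PassByValueDemo {
    // The callee receives a copy of the reference, so element writes are shared.
    static void modifyElements(char[] a) {
        a[0] = 'X';
    }

    // Reassigning the parameter only changes the callee's local copy.
    static void reassign(char[] a) {
        a = new char[] {'n', 'e', 'w'};
    }

    // The holder trick: the caller sees the replaced inner array via holder[0].
    static void replaceViaHolder(char[][] holder) {
        holder[0] = new char[] {'n', 'e', 'w'};
    }

    public static void main(String[] args) {
        char[] a = {'a', 'b', 'c'};
        modifyElements(a);
        System.out.println(new String(a));         // Xbc
        reassign(a);
        System.out.println(new String(a));         // Xbc (unchanged)
        char[][] holder = {a};
        replaceViaHolder(holder);
        System.out.println(new String(holder[0])); // new
    }
}
```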
If your method modifies the values, you can't use String, as it is immutable; you can use StringBuilder instead.
If your methods already rely on char arrays and you need the offsets you can use a CharBuffer to wrap an array. It does not support String operations but supports views for sub ranges, which seems to be what you use in the doThis() method.
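As a sketch of the CharBuffer approach (the view helper is hypothetical), CharBuffer.wrap plus slice() gives a window that starts at a chosen offset and shares the backing array, which is roughly what a+startIndex does in C:

```java
import java.nio.CharBuffer;

class CharBufferViewDemo {
    // Returns a view of a[start..end) whose index 0 is a[start].
    // No characters are copied; the buffer is backed by the same array.
    static CharBuffer view(char[] a, int start, int end) {
        return CharBuffer.wrap(a, start, end - start).slice();
    }

    public static void main(String[] args) {
        char[] name = "abcdef".toCharArray();
        CharBuffer p = view(name, 2, 6);      // like "a + 2" in C
        System.out.println(p.get(0));         // c
        p.put(0, 'X');                        // writes through to name[2]
        System.out.println(new String(name)); // abXdef
        System.out.println(p);                // Xdef
    }
}
```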

Why does appending "" to a String save memory?

I used a variable with a lot of data in it, say String data.
I wanted to use a small part of this string in the following way:
this.smallpart = data.substring(12,18);
After some hours of debugging (with a memory visualizer) I found out that the object's field smallpart remembered all the data from data, although it only contained the substring.
When I changed the code into:
this.smallpart = data.substring(12,18)+"";
...the problem was solved! Now my application uses very little memory!
How is that possible? Can anyone explain this? I think this.smallpart kept referencing towards data, but why?
UPDATE:
How can I clear the big String then? Will data = new String(data.substring(0,100)) do the thing?
Doing the following:
data.substring(x, y) + ""
creates a new (smaller) String object, and throws away the reference to the String created by substring(), thus enabling garbage collection of this.
The important thing to realise is that substring() gives a window onto an existing String - or rather, the character array underlying the original String. Hence it will consume the same memory as the original String. This can be advantageous in some circumstances, but problematic if you want to get a substring and dispose of the original String (as you've found out).
Take a look at the substring() method in the JDK String source for more info.
EDIT: To answer your supplementary question, constructing a new String from the substring will reduce your memory consumption, provided you bin any references to the original String.
NOTE (Jan 2013). The above behaviour has changed in Java 7u6. The flyweight pattern is no longer used and substring() will work as you would expect.
If you look at the source of substring(int, int), you'll see that it returns:
new String(offset + beginIndex, endIndex - beginIndex, value);
where value is the original char[]. So you get a new String but with the same underlying char[].
When you do data.substring(x, y) + "", you get a new String with a new underlying char[].
Actually, your use case is the only situation where you should use the String(String) constructor:
String tiny = new String(huge.substring(12,18));
When you use substring, it doesn't actually create a new string. It still refers to your original string, with an offset and size constraint.
So, to allow your original string to be collected, you need to create a new string (using new String, or what you've got).
I think this.smallpart kept referencing towards data, but why?
Because Java strings consist of a char array, a start offset and a length (and a cached hashCode). Some String operations like substring() create a new String object that shares the original's char array and simply has different offset and/or length fields. This works because the char array of a String is never modified once it has been created.
This can save memory when many substrings refer to the same basic string without replicating overlapping parts. As you have noticed, in some situations, it can keep data that's not needed anymore from being garbage collected.
The "correct" way to fix this is the new String(String) constructor, i.e.
this.smallpart = new String(data.substring(12,18));
BTW, the overall best solution would be to avoid having very large Strings in the first place, and to process any input in smaller chunks, a few KB at a time.
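The char array, offset and length layout described in this answer can be modelled with a toy class. This is only an illustration of the old (pre-Java 7u6) behaviour, not the JDK's actual String source, and every name in it is invented:

```java
import java.util.Arrays;

// Toy model of the old String layout: a shared char[] plus offset and count.
final class SharedString {
    final char[] value;
    final int offset;
    final int count;

    SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static SharedString of(String s) {
        char[] chars = s.toCharArray();
        return new SharedString(chars, 0, chars.length);
    }

    // Old-style substring: a new object, but the same underlying array.
    SharedString substring(int begin, int end) {
        return new SharedString(value, offset + begin, end - begin);
    }

    // What new String(substring) achieved: copy only the needed characters.
    SharedString copy() {
        return new SharedString(Arrays.copyOfRange(value, offset, offset + count), 0, count);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

In this model, `SharedString.of("0123456789abcdefghij").substring(12, 18)` reads as "cdefgh" but still holds the full 20-element array, whereas calling `copy()` on it leaves a 6-element array and lets the large one be collected once nothing else refers to it.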
In Java, strings are immutable objects, and once a string is created it remains in memory until it is cleaned up by the garbage collector (and this cleanup is not something you can take for granted).
When you call the substring method, Java does not create a truly new string, but just stores a range of characters inside the original string.
So, when you created a new string with this code:
this.smallpart = data.substring(12, 18) + "";
you actually created a new string when you concatenated the result with the empty string.
That's why.
As documented by jwz in 1997:
If you have a huge string, pull out a substring() of it, hold on to the substring and allow the longer string to become garbage (in other words, the substring has a longer lifetime) the underlying bytes of the huge string never go away.
Just to sum up, if you create lots of substrings from a small number of big strings, then use
String substring = string.substring(5,23);
since you only use the space needed to store the big strings. But if you are extracting just a handful of small strings from lots of big strings, then
String substring = new String(string.substring(5,23));
will keep your memory use down, since the big strings can be reclaimed when no longer needed.
That you call new String is a helpful reminder that you really are getting a new string, rather than a reference to the original one.
Firstly, calling java.lang.String.substring creates a new window onto the original String, using an offset and a length instead of copying the relevant part of the underlying array.
If we take a closer look at the substring method, we will notice a call to the String(int, int, char[]) constructor, which is passed the whole char[] that represents the string. That means the substring will occupy as much memory as the original string.
OK, but why does + "" result in less memory being needed than without it?
Doing a + on strings is implemented via a StringBuilder.append method call. Looking at the implementation of this method in the AbstractStringBuilder class shows that it eventually does an arraycopy of just the part we really need (the substring).
Any other workarounds?
this.smallpart = new String(data.substring(12,18));
this.smallpart = data.substring(12,18).intern();
Appending "" to a string will sometimes save memory.
Let's say I have a huge string containing a whole book, one million characters.
Then I create 20 strings containing the chapters of the book as substrings.
Then I create 1000 strings containing all paragraphs.
Then I create 10,000 strings containing all sentences.
Then I create 100,000 strings containing all the words.
I still only use 1,000,000 characters. If you add "" to each chapter, paragraph, sentence and word, you use 5,000,000 characters.
Of course it's entirely different if you only extract one single word from the whole book, and the whole book could be garbage collected but isn't because that one word holds a reference to it.
And it's different again if you have a one-million-character string and remove tabs and spaces at both ends, making, say, 10 calls to create a substring. The way Java works, or worked, avoids copying a million characters each time. There is a compromise, and it's good to know what the trade-offs are.
