I have read a lot about memory allocation for Strings lately, but I can't find any details on whether things are the same in Java 8.
How much memory space would a String like "Alexandru Tanasescu" use in Java 8?
I use the 64-bit version.
Java 7 or lower
Minimum String memory usage:
(bytes) = 8 * (int) ((((no chars) * 2) + 45) / 8)
So
80 = 8 * (int) ((((19) * 2) + 45) / 8)
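A minimal sketch of the formula above, for readers who want to try other lengths. The class and method names are hypothetical, and the result is only the estimate the formula gives for Java 7 or lower, not a measurement of any particular JVM.

```java
// Hypothetical helper: evaluates the minimum-usage formula quoted above.
final class MinStringMemory {
    // Estimated minimum bytes for a String holding numChars characters.
    static int minBytes(int numChars) {
        // Integer division truncates, which matches the (int) cast in the formula.
        return 8 * ((numChars * 2 + 45) / 8);
    }

    public static void main(String[] args) {
        String s = "Alexandru Tanasescu"; // 19 characters
        System.out.println(minBytes(s.length())); // prints 80
    }
}
```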
Understanding String memory usage (SOURCE)
To understand the above calculation, we need to start by looking at the fields on a String object. A String contains the following:
a char array— thus a separate object— containing the actual characters;
an integer offset into the array at which the string starts;
the length of the string;
another int for the cached calculation of the hash code.
This means even if the string contains no characters, it will require 4 bytes for the char array reference, plus 3*4=12 bytes for the three int fields, plus 8 bytes of object header. This gives 24 bytes (which is a multiple of 8 so no "padding" bytes are needed so far).
Then, the (empty) char array will require a further 12 bytes (arrays have an extra 4 bytes to store their length), plus in this case 4 bytes of padding to bring the memory used by the char array object up to 16 bytes, a multiple of 8. So in total, an empty string uses 24 + 16 = 40 bytes.
If the String contains, say, 19 characters, then the String object itself still requires 24 bytes. But now the char array requires 12 bytes of header plus 19*2=38 bytes for the nineteen chars. Since 12+38=50 isn't a multiple of 8, we also need to round up to the next multiple of 8 (56). So overall, our 19-character String will use up 56+24 = 80 bytes.
Java 8
Java 8 does not have the offset and length anymore. Only hash and the char array. (@Thomas Jungblut)
a char array, thus a separate object, containing the actual characters;
(removed in Java 8) an integer offset into the array at which the string starts;
(removed in Java 8) the length of the string;
another int for the cached calculation of the hash code.
So, in Java 8 the way to calculate memory for strings remains the same, but you must subtract 8 bytes for the missing offset and length fields.
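The field-by-field reasoning can be sketched as follows. This is only an estimate under the header sizes the quoted source uses (8-byte object header, 4-byte reference, 12-byte array header); a real 64-bit JVM may use different sizes, and the class and method names are hypothetical.

```java
// Hypothetical estimator built from the field breakdown described above.
final class StringFootprint {
    // Round up to the next multiple of 8, since objects are 8-byte aligned.
    static int alignTo8(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // char[] object: 12 bytes of header (including the length) plus 2 bytes per char.
    static int charArrayBytes(int numChars) {
        return alignTo8(12 + 2 * numChars);
    }

    // With offset, length and hash: 8 header + 4 reference + 3 * 4 for the ints.
    static int withOffsetAndLength(int numChars) {
        return alignTo8(8 + 4 + 3 * 4) + charArrayBytes(numChars);
    }

    // With hash only: 8 header + 4 reference + 4 for the single int.
    static int withHashOnly(int numChars) {
        return alignTo8(8 + 4 + 4) + charArrayBytes(numChars);
    }

    public static void main(String[] args) {
        System.out.println(withOffsetAndLength(19)); // prints 80
        System.out.println(withHashOnly(19));        // prints 72, i.e. 8 bytes less
    }
}
```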
"Alexandru Tanasescu" uses 104 bytes. This is how to get the size
long m0 = Runtime.getRuntime().freeMemory();
String s = new String("Alexandru Tanasescu");
long m1 = Runtime.getRuntime().freeMemory();
System.out.println(m0 - m1);
Note: run it with -XX:-UseTLAB option
If you look at the Oracle Java 8 sources, you have:
A char value[] and an int hash. A char is 2 bytes, and an int is 4 bytes.
So wouldn't the answer be yourstring.length * 2 + 4?
No. Every object has overhead. An array stores its length, for example. And both the array (an object) and the string will incur extra memory from the garbage collector storing information about them.
There is no reliable way to calculate this, because AFAIK no JRE or JDK is obliged to use any particular size for object overhead.
According to the following JEP:
http://openjdk.java.net/jeps/254
The current implementation of the String class stores characters in a
char array, using two bytes (sixteen bits) for each character.
In Java SE 9 this might change.
Note, however, that since this is a JEP and not a JSR (and it mentions implementation), I understand that this is implementation-specific and not defined by the JLS.
I have some slides from IBM titled "From Java Code to Java Heap: Understanding the Memory Usage of Your Application", which say that when we use String instead of char[]:
Maximum overhead would be 24:1 for a single character!
but I am not able to understand what overhead is being referred to here. Can anybody please help?
Source :
This figure relates to 32-bit JDK 6.
JDK 6
In the pre-Java-7 world, strings were implemented as a pointer to a region of a char[] array:
// "8 (4)" reads "8 bytes for x64, 4 bytes for x32"
class String {       // 8 (4) housekeeping + 8 (4) class pointer
    char[] buf;      // 12 (8) bytes + 2 bytes per char -> 24 (16) aligned
    int offset;      // 4 bytes -> three int
    int length;      // 4 bytes -> fields align to
    int hash;        // 4 bytes -> 16 (12) bytes
}
So I counted:
36 bytes per new String("a") for JDK 6 x32 <-- the overhead from the article
56 bytes per new String("a") for JDK 6 x64.
JDK 7
Just to compare, in JDK 7+ String is a class which holds a char[] buffer and a hash field only.
class String {       // 8 (4) + 8 (4) bytes -> 16 (8) aligned
    char[] buf;      // 12 (8) bytes + 2 bytes per char -> 24 (16) aligned
    int hash;        // 4 bytes -> 8 (4) aligned
}
So it's:
28 bytes per String for JDK 7 x32
48 bytes per String for JDK 7 x64.
UPDATE
For the 3.75:1 ratio, see @Andrey's explanation below. This proportion falls toward 1 as the length of the string grows.
Useful links:
Memory usage of Java Strings and string-related objects.
Calculate memory of a Map Entry - a simple technique to get a size of an object.
In the JVM, a character variable is stored in a single 16-bit memory allocation and changes to that Java variable overwrite that same memory location. This makes creating or updating character variables very fast and memory-cheap, but increases the JVM's overhead compared to the static allocation as used in Strings.
The JVM stores the characters of a Java String in a separate array whose size is fixed when the String object is created or first assigned a value. There is no string termination character, because the array records its own length. Thus, an object with initial value "HELP!" needs 80 bits for its character data (5 characters, each 16 bits in size), plus the object overhead. This value is immutable, allowing the JVM to share references to it, which makes static string assignments very fast and very compact.
Reference
I'll try explaining the numbers referenced in the source article.
The article describes object metadata typically consisting of: class, flags and lock.
The class and lock are stored in the object header and take 8 bytes on a 32-bit VM. I haven't found any information, though, about JVM implementations that keep flags info in the object header. It may be that this is stored somewhere externally (e.g. by the garbage collector to count references to the object).
So let's assume that the article talks about some x32 AbstractJVM which uses 12 bytes of memory to store meta information about the object.
Then for char[] we have:
12 bytes of meta information (8 bytes on x32 JDK 6, 16 bytes on x64 JDK)
4 bytes for array size
2 bytes for each character stored
2 bytes of alignment if characters number is odd (on x64 JDK: 2 * (4 - (length + 2) % 4))
For java.lang.String we have:
12 bytes of meta information (8 bytes on x32 JDK6, 16 bytes on x64 JDK6)
16 bytes for String fields (it is so for JDK6, 8 bytes for JDK7)
memory needed to store char[] as described above
So, let's count how much memory is needed to store "MyString" as String object:
12 + 16 + (12 + 4 + 2 * "MyString".length + 2 * ("MyString".length % 2)) = 60 bytes.
From other side we know that to store only the data (without information about the data type, length or anything else) we need:
2 * "MyString".length = 16 bytes
Overhead is 60 / 16 = 3.75
Similarly, for a single-character string we get the 'maximum overhead':
12 + 16 + (12 + 4 + 2 * "a".length + 2 * ("a".length % 2)) = 48 bytes
2 * "a".length = 2 bytes
48 / 2 = 24
Following the article authors' logic ultimately the maximum overhead of value infinity is achieved when we store an empty string :).
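The arithmetic above can be sketched in a few lines. The 12-byte metadata size and the 16 bytes of String fields are the assumptions stated in this answer (a hypothetical 32-bit JVM with a JDK 6 style String), and the class and method names are made up for illustration.

```java
// Hypothetical calculator for the overhead ratio discussed above.
final class StringOverhead {
    static final int META = 12;          // assumed per-object metadata, as in the answer
    static final int STRING_FIELDS = 16; // JDK 6: array reference + offset + count + hash

    // char[]: metadata + 4 bytes of length + 2 bytes per char + 2 bytes padding if odd.
    static int charArrayBytes(int numChars) {
        return META + 4 + 2 * numChars + 2 * (numChars % 2);
    }

    // String object plus its backing char[].
    static int stringBytes(int numChars) {
        return META + STRING_FIELDS + charArrayBytes(numChars);
    }

    // Total bytes divided by the bytes of raw character data.
    static double overheadRatio(int numChars) {
        return (double) stringBytes(numChars) / (2 * numChars);
    }

    public static void main(String[] args) {
        System.out.println(overheadRatio("MyString".length())); // prints 3.75
        System.out.println(overheadRatio("a".length()));        // prints 24.0
    }
}
```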
I read the following in an old Stack Overflow answer, but I was not able to understand it.
In Oracle's JDK a String has four instance-level fields:
A character array
An integral offset
An integral character count
An integral hash value
That means that each String introduces an extra object reference (the String itself), and three integers in addition to the character array itself. (The offset and character count are there to allow sharing of the character array among String instances produced through the String#substring() methods, a design choice that some other Java library implementers have eschewed.) Beyond the extra storage cost, there's also one more level of access indirection, not to mention the bounds checking with which the String guards its character array.
If you can get away with allocating and consuming just the basic character array, there's space to be saved there. It's certainly not idiomatic to do so in Java though; judicious comments would be warranted to justify the choice, preferably with mention of evidence from having profiled the difference.
I was trying to find out efficient data types.
I know int is 4 bytes and char is one byte.
an object which contains five integers (4 * 5 = 20 bytes)
a String object which has ten characters. ( Suppose it has 10 characters 10 * 1 = 10 bytes)
Am I right?
Which one do you think is better?
The objective answer first:
Primitive data types are documented here
Strings are more complicated because the JVM can intern them. See a good explanation here
The not so objective answer: pick the data structure that makes for the best design for your application.
If you have a specific constraint in your application, post more details about the data you need to handle and the constraints you have.
A String is not just an array of characters, it is an independent object, and has fields other than its backing char[]. For example, String has three int fields: offset, count and hash. The empty string, therefore, is generally 16 bytes (since we also need to take the char[] field into account) plus the normal 8 bytes of object overhead. Also note that a char[] is itself an object, and has the int field length and an associated object overhead. Once you have taken all this into account, then you can add the two (not one!) bytes per char.
So, for a 10-character string:
String's 3 int fields: 12 bytes
String's char[] field: 8 bytes
char[]'s int length field: 4 bytes
String object overhead: 8 bytes
10 characters: 20 bytes
char[] object overhead: 8 bytes
This comes out to about 60 bytes. I say "about" because some of this is dependent on the VM.
You are incorrect about chars in Java: since they are designed to hold 16-bit Unicode (UTF-16) code units, they take two bytes each, not one. In the end, both representations will hold the same amount of raw data: 20 bytes.
You should pick the data type that makes the most sense to you, the designer of your classes, and to the readers of your code. Memory concerns should not be at the top of your design priorities unless the number of objects that you need threatens to overflow your available memory. Even then you should do careful memory profiling before you optimize.
Characters are 2 bytes in size. They are equivalent to an unsigned short, so a character's value can range between [0, 65535] inclusive.
The number of bytes a String occupies is actually:
string.length * 2
So for your example, a 10 character string occupies 20 bytes, not 10 bytes.
This would be just the string content. There are other variables within the String class which will occupy more bytes of course. And even an empty object occupies a certain number of bytes that will vary based on the JVM implementation.
However, just the character content will occupy 2 bytes per character.
But don't worry about this, as it's most assuredly premature optimization. Clean code is usually more important than lightning-fast code. Pick appropriate data types, and write code that's easy to follow and read. These things are more important.
If you are worried about holding large strings in memory consider changing your approach. The most common problem I see with large strings is when new programmers read an entire file into memory.
If you are doing this, try processing data line by line. Only hold the smallest unit you need in memory at a time, perform your processing, and move on.
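A minimal sketch of that approach, assuming the task is something simple like finding the longest line. The class and method names are hypothetical; only one line is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical example: scan text line by line instead of loading it all at once.
final class LineByLine {
    // Returns the length of the longest line read from the given source.
    static int longestLine(Reader source) {
        int longest = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // readLine() returns null at end of input; each line can be collected after use.
            while ((line = reader.readLine()) != null) {
                longest = Math.max(longest, line.length());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return longest;
    }

    public static void main(String[] args) {
        // A StringReader stands in for a file here so the example is self-contained.
        System.out.println(longestLine(new StringReader("a\nbbb\ncc"))); // prints 3
    }
}
```

For a real file, the same method could be given a java.io.FileReader or the reader returned by java.nio.file.Files.newBufferedReader.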
I know int is 4 bytes
correct
and char is one byte.
A char is a 16-bit unsigned integer, so 2 bytes
an object which contains five integers (4 * 5 = 20 bytes)
An object has a header which is 12 bytes on a 32-bit JVM and 16 bytes on a 64-bit JVM. Objects are 8-byte aligned by default, or 16- or 32-byte aligned if this setting is changed.
This means a new int[5] uses 16 + 20 + 4 (padding) = 40 bytes
a String object which has ten characters. ( Suppose it has 10 characters 10 * 1 = 10 bytes)
A String uses ~24 bytes with header and length fields etc, but it wraps a char[] which contains the actual chars, which is a further 16+20+4 = 40 bytes.
A simple way to check this is to use the following. Make sure you use -XX:-UseTLAB which improves memory accounting (but is slower for multi-threaded programming)
public class StringMemoryCheck {
    public static void main(String... ignored) {
        char[] chars = new char[10];
        long used = memoryUsed();
        String s = new String(chars);
        long diff = memoryUsed() - used;
        if (diff == 0) throw new AssertionError("You must set -XX:-UseTLAB on the command line");
        System.out.printf("Creating a String of 10 characters used %,d bytes of memory%n", diff);
    }

    // Heap currently in use, as reported by the runtime.
    private static long memoryUsed() {
        return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    }
}
prints
Creating a String of 10 characters used 64 bytes of memory
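The estimate earlier in this answer can be reproduced with a small sketch. It assumes the sizes the answer gives (a 16-byte array header on a 64-bit JVM, 8-byte alignment, and roughly 24 bytes for the String object itself); the class and method names are hypothetical.

```java
// Hypothetical calculator for the array and String sizes estimated in this answer.
final class TenCharString {
    static final int ARRAY_HEADER = 16;        // assumed 64-bit array header
    static final int STRING_OBJECT_BYTES = 24; // "~24 bytes" for the String itself

    // Round up to the next multiple of 8 (default object alignment).
    static int alignTo8(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Size of a primitive array with the given element size and length.
    static int arrayBytes(int elementSize, int length) {
        return alignTo8(ARRAY_HEADER + elementSize * length);
    }

    // String object plus its backing char[] (2 bytes per char).
    static int stringBytes(int numChars) {
        return STRING_OBJECT_BYTES + arrayBytes(2, numChars);
    }

    public static void main(String[] args) {
        System.out.println(arrayBytes(4, 5)); // int[5]: prints 40
        System.out.println(stringBytes(10));  // prints 64, matching the measurement above
    }
}
```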
On this blog post, it's said that the minimum memory usage of a String is:
8 * (int) ((((no chars) * 2) + 45) / 8) bytes.
So for the String "Apple Computers", the minimum memory usage would be 72 bytes.
Even if I have 10,000 String objects of twice that length, the memory usage would be less than 2 MB, which isn't much at all. So does that mean I'm underestimating the number of Strings present in an enterprise application, or is that formula wrong?
Thanks
String storage in Java depends on how the string was obtained. The backing char array can be shared between multiple instances. If that isn't the case, you have the usual object overhead plus storage for one pointer and three ints which usually comes out to 16 bytes overhead. Then the backing array requires 2 bytes per char since chars are UTF-16 code units.
For "Apple Computers" where the backing array is not shared, the minimum cost is going to be
backing array for 16 chars -- 32B which aligns nicely on a word boundary.
pointer to array - 4 or 8B depending on the platform
three ints for the offset, length, and memoized hashcode - 12B
2 x object overhead - depends on the VM, but 8B is a good rule of thumb.
one int for the array length.
So roughly 72B of which the actual payload constitutes 44.4%. The payload constitutes more for longer strings.
In Java7, some JDK implementations are doing away with backing array sharing to avoid pinning large char[]s in memory. That allows them to do away with 2 of the three ints.
That changes the calculation to 64B for a string of length 16 of which the actual payload constitutes 50%.
Is it possible to save character data using less memory than a Java String? Yes.
Does it matter for "enterprise" applications (or even Android or J2ME applications, which have to get by on a lot less memory)? Almost never.
Premature optimization is the root...
Compared to the other data types that you have, it is definitely high. The other primitives use 32 bits, 64 bits, etc.
And given that String is immutable, every time you perform any operation on it, you end up creating a new String object, consuming even more memory.
My searches on SO have failed me, so if this is a duplicate, please redirect me.
With that out of the way, my question: I learned, from experience and browsing SO that a Java boolean is stored as a 32-bit int if you declare it as a standalone value, but as an 8-bit byte if you declare it within an array. My question, here, is as follows: Which is more memory efficient? Does the meta data of the array make it bigger in memory than the alternative?
boolean oneVariable = false, oneArray[] = {false};
The array is an actual object that comes with a memory penalty (I believe 12 bytes), so the primitive boolean is smaller.
The "meta data" of the array includes:
8 bytes (32-bit JVM) or 16 bytes (64-bit JVM) for object header
4 bytes (32 bits) for the length of the array
Add on the 1 necessary byte for the boolean data and you have 13 bytes (32 bit) or 21 bytes (64 bit) at a minimum.
However, objects are allocated memory in 8-byte multiples, so even though you only need 12 or 20 bytes of overhead + 1 byte for the boolean, you'll end up using 16 or 24 bytes of memory, respectively, for your array object.
In addition to the 16/24 bytes the object itself will take up, you'll need 4 bytes (32 bit) or 8 bytes (64 bit) for the memory address of the object, totaling 20 or 32 bytes of memory, respectively, to store your boolean in an array.
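The calculation so far can be sketched as below. The header and reference sizes are the ones this answer assumes for 32-bit and 64-bit JVMs, passed in as parameters; the class and method names are hypothetical.

```java
// Hypothetical calculator for the cost of storing booleans in an array.
final class BooleanArrayCost {
    // Objects are allocated in 8-byte multiples.
    static int alignTo8(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Array object: header + 4 bytes of length + 1 byte per boolean, padded.
    static int arrayObjectBytes(int headerBytes, int length) {
        return alignTo8(headerBytes + 4 + length);
    }

    // Array object plus the reference that points to it.
    static int totalBytes(int headerBytes, int referenceBytes, int length) {
        return arrayObjectBytes(headerBytes, length) + referenceBytes;
    }

    public static void main(String[] args) {
        System.out.println(totalBytes(8, 4, 1));  // 32-bit assumptions: prints 20
        System.out.println(totalBytes(16, 8, 1)); // 64-bit assumptions: prints 32
    }
}
```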
The size of a standalone variable is JVM dependent. Java does not specify the size of storage, and in fact Oracle says
This data type represents one bit of information, but its "size" isn't something that's precisely defined.
Older JVMs use a 32-bit stack cell, used to hold local variables, method arguments, and expression values so a single boolean used as a variable would consume 4 bytes; making the array at least 5 times as expensive as for a single boolean. This answer may be different if, for example, the boolean is a class variable in which case it would just be a single byte added to the existing overhead. In newer JVMs a single boolean would only use 1 byte, but depending on its context and the 8-byte padding necessary to align memory addresses, could still consume up to 8 bytes of heap space. It would still be smaller than the boolean array.
As user949300 mentioned, all objects carry a penalty that make them larger than primitives. For only a single boolean though, memory doesn't really matter. If you are storing a large number of booleans, consider using a BitSet. I believe under the hood it uses approximately 1 bit per boolean (plus some overhead).
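A short sketch of the BitSet suggestion, using only the standard java.util.BitSet API. The class name, method name and indices are made up for illustration.

```java
import java.util.BitSet;

// Hypothetical example: storing many boolean flags compactly in a BitSet.
final class FlagStore {
    // Builds a BitSet with the given bit indices set to true.
    static BitSet fromIndices(int... indices) {
        BitSet flags = new BitSet();
        for (int index : indices) {
            flags.set(index);
        }
        return flags;
    }

    public static void main(String[] args) {
        BitSet flags = fromIndices(3, 999_999);
        System.out.println(flags.get(3));        // prints true
        System.out.println(flags.get(4));        // prints false
        System.out.println(flags.cardinality()); // prints 2 (number of bits set)
        // size() reports the bits of storage currently allocated.
        System.out.println(flags.size() / 8 + " bytes of bit storage");
    }
}
```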
This Java specialist article is a good source for understanding the memory usage.