String vs char[]


I have some slides from IBM titled "From Java Code to Java Heap: Understanding the Memory Usage of Your Application", which say that when we use String instead of char[]:
Maximum overhead would be 24:1 for a single character!
I don't understand what overhead is being referred to here. Can anybody please help?

This figure relates to JDK 6- 32-bit.
JDK 6
In the pre-Java-7 world, strings were implemented as a pointer to a region of a char[] array:
// "8 (4)" reads "8 bytes for x64, 4 bytes for x32"
class String{ //8 (4) house keeping + 8 (4) class pointer
char[] buf; //12 (8) bytes + 2 bytes per char -> 24 (16) aligned
int offset; //4 bytes -> three int
int length; //4 bytes -> fields align to
int hash; //4 bytes -> 16 (12) bytes
}
So I counted:
36 bytes per new String("a") for JDK 6 x32 <-- the overhead from the article
56 bytes per new String("a") for JDK 6 x64.
JDK 7
Just to compare, in JDK 7+ String is a class which holds a char[] buffer and a hash field only.
class String{ //8 (4) + 8 (4) bytes -> 16 (8) aligned
char[] buf; //12 (8) bytes + 2 bytes per char -> 24 (16) aligned
int hash; //4 bytes -> 8 (4) aligned
}
So it's:
28 bytes per String for JDK 7 x32
48 bytes per String for JDK 7 x64.
UPDATE
For the 3.75:1 ratio, see @Andrey's explanation below. The ratio falls toward 1 as the length of the string grows.
Useful links:
Memory usage of Java Strings and string-related objects.
Calculate memory of a Map Entry - a simple technique to get a size of an object.

In the JVM, a char variable is stored in a single 16-bit memory location, and changes to that variable overwrite the same location. This makes creating or updating char variables very fast and memory-cheap, but increases the JVM's overhead compared to the static allocation as used in Strings.
The JVM stores a Java String in a variable-size memory space (essentially, an array) sized to the string's length when the String object is created or first assigned a value. Java strings are not null-terminated, so no extra slot is reserved for a terminator. Thus, an object with initial value "HELP!" needs 80 bits of character storage (5 characters, each 16 bits in size), plus the per-object overhead described in the other answers. This value is immutable, allowing the JVM to inline references to that variable, making static string assignments very fast and very compact, and very efficient from the JVM's point of view.
Reference

I'll try explaining the numbers referenced in the source article.
The article describes object metadata typically consisting of: class, flags and lock.
The class and lock are stored in the object header and take 8 bytes on a 32-bit VM. However, I haven't found any information about JVM implementations that keep flags in the object header. It may be that this information is stored somewhere externally (e.g. by the garbage collector to count references to the object).
So let's assume that the article talks about some x32 AbstractJVM which uses 12 bytes of memory to store meta information about the object.
Then for char[] we have:
12 bytes of meta information (8 bytes on x32 JDK 6, 16 bytes on x64 JDK)
4 bytes for array size
2 bytes for each character stored
2 bytes of alignment if characters number is odd (on x64 JDK: 2 * (4 - (length + 2) % 4))
For java.lang.String we have:
12 bytes of meta information (8 bytes on x32 JDK6, 16 bytes on x64 JDK6)
16 bytes for String fields (it is so for JDK6, 8 bytes for JDK7)
memory needed to store char[] as described above
So, let's count how much memory is needed to store "MyString" as String object:
12 + 16 + (12 + 4 + 2 * "MyString".length + 2 * ("MyString".length % 2)) = 60 bytes.
From other side we know that to store only the data (without information about the data type, length or anything else) we need:
2 * "MyString".length = 16 bytes
Overhead is 60 / 16 = 3.75
Similarly, for a single-character string we get the 'maximum overhead':
12 + 16 + (12 + 4 + 2 * "a".length + 2 * ("a".length % 2)) = 48 bytes
2 * "a".length = 2 bytes
48 / 2 = 24
Following the article authors' logic ultimately the maximum overhead of value infinity is achieved when we store an empty string :).
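
The arithmetic above can be reproduced with a small sketch. It only encodes the accounting model assumed in this answer (12 bytes of metadata, 16 bytes of String fields, a 4-byte array length, 2 bytes of padding for an odd character count); it does not measure a real JVM, and the class and method names are made up for illustration.

```java
// Sketch of the article's accounting model. All sizes are the assumed
// figures from the answer above, not values measured on a real JVM.
final class StringOverheadModel {
    static final int META_BYTES = 12;         // assumed per-object metadata
    static final int STRING_FIELD_BYTES = 16; // char[] reference + offset + count + hash (JDK 6)
    static final int ARRAY_LENGTH_BYTES = 4;  // slot holding the array size

    /** Bytes for a char[] holding the given number of characters. */
    static int charArrayBytes(int chars) {
        int padding = 2 * (chars % 2);        // 2 bytes when the count is odd
        return META_BYTES + ARRAY_LENGTH_BYTES + 2 * chars + padding;
    }

    /** Bytes for a String object plus its backing char[]. */
    static int stringBytes(int chars) {
        return META_BYTES + STRING_FIELD_BYTES + charArrayBytes(chars);
    }

    /** Total bytes divided by the bytes of raw character data. */
    static double overheadRatio(int chars) {
        return (double) stringBytes(chars) / (2 * chars);
    }

    public static void main(String[] args) {
        System.out.println(stringBytes("MyString".length()));   // 60
        System.out.println(overheadRatio("MyString".length())); // 3.75
        System.out.println(stringBytes("a".length()));          // 48
        System.out.println(overheadRatio("a".length()));        // 24.0
    }
}
```

Calling overheadRatio with larger lengths shows the ratio shrinking as the fixed 44 bytes of bookkeeping are spread over more characters.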

I read the following in an old Stack Overflow answer, although I could not fully follow it.
In Oracle's JDK a String has four instance-level fields:
A character array
An integral offset
An integral character count
An integral hash value
That means that each String introduces an extra object reference (the String itself), and three integers in addition to the character array itself. (The offset and character count are there to allow sharing of the character array among String instances produced through the String#substring() methods, a design choice that some other Java library implementers have eschewed.) Beyond the extra storage cost, there's also one more level of access indirection, not to mention the bounds checking with which the String guards its character array.
If you can get away with allocating and consuming just the basic character array, there's space to be saved there. It's certainly not idiomatic to do so in Java though; judicious comments would be warranted to justify the choice, preferably with mention of evidence from having profiled the difference.

Related

How many bytes are needed to store a string of fixed length in java? [duplicate]

I read a lot about memory allocation for Strings lately and can't find any details if things are the same with Java 8.
How much memory space would a String like "Alexandru Tanasescu" use in Java 8?
I use the 64bit version.

Java7 or lower
Minimum String memory usage :
(bytes) = 8 * (int) ((((no chars) * 2) + 45) / 8)
So
80 = 8 * (int) ((((19) * 2) + 45) / 8)
Understanding String memory usage (SOURCE)
To understand the above calculation, we need to start by looking at the fields on a String object. A String contains the following:
a char array— thus a separate object— containing the actual characters;
an integer offset into the array at which the string starts;
the length of the string;
another int for the cached calculation of the hash code.
This means even if the string contains no characters, it will require 4 bytes for the char array reference, plus 3*4=12 bytes for the three int fields, plus 8 bytes of object header. This gives 24 bytes (which is a multiple of 8 so no "padding" bytes are needed so far).
Then, the (empty) char array will require a further 12 bytes (arrays have an extra 4 bytes to store their length), plus in this case 4 bytes of padding to bring the memory used by the char array object up to a multiple of 8 (16 bytes). So in total, an empty string uses 40 bytes.
If the String contains, say, 19 characters, then the String object itself still requires 24 bytes. But now the char array requires 12 bytes of header plus 19*2=38 bytes for the nineteen chars. Since 12+38=50 isn't a multiple of 8, we also need to round up to the next multiple of 8 (56). So overall, our 19-character String will use up 56+24 = 80 bytes.
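
The step-by-step breakdown above can be written as a short sketch. It assumes the same layout the explanation uses (8-byte object header, 4-byte references, 8-byte alignment, three int fields); the class and method names are hypothetical, and real JVMs may lay objects out differently.

```java
// Sketch of the Java 7-and-earlier accounting described above.
// Assumes an 8-byte header, 4-byte references and 8-byte alignment.
final class Java7StringSize {
    /** Rounds a size up to the next multiple of 8. */
    static int alignTo8(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** The String object itself: header + char[] reference + three ints. */
    static int stringObjectBytes() {
        return alignTo8(8 + 4 + 3 * 4);
    }

    /** The backing char[]: 12 bytes of header and length, then 2 bytes per char. */
    static int charArrayBytes(int chars) {
        return alignTo8(12 + 2 * chars);
    }

    static int totalBytes(int chars) {
        return stringObjectBytes() + charArrayBytes(chars);
    }

    public static void main(String[] args) {
        System.out.println(totalBytes(0));  // 40
        System.out.println(totalBytes(19)); // 80
    }
}
```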
Java8.
Java 8 does not have the offset and length fields anymore, only the hash and the char array (as noted by @Thomas Jungblut). A String now contains:
a char array (thus a separate object) containing the actual characters;
another int for the cached calculation of the hash code.
So, in Java 8 the way to calculate memory for strings remains the same, but you must subtract 8 bytes for the missing offset and length fields.

"Alexandru Tanasescu" uses 104 bytes. This is how to get the size
long m0 = Runtime.getRuntime().freeMemory();
String s = new String("Alexandru Tanasescu");
long m1 = Runtime.getRuntime().freeMemory();
System.out.println(m0 - m1);
Note: run it with -XX:-UseTLAB option

If you look at the Oracle Java 8 sources, you have:
A char value[] and an int hash. A char is 2 bytes, and an int is 4 bytes.
So wouldn't the answer be yourstring.length * 2 + 4?
No. Every object has overhead. An array stores its length, for example. And both the array (an object) and the string will incur extra memory from the garbage collector storing information about them.
There is no reliable way to calculate this, because AFAIK no JRE or JDK is obliged to use a particular object overhead size.

According to the following JEP:
http://openjdk.java.net/jeps/254
The current implementation of the String class stores characters in a
char array, using two bytes (sixteen bits) for each character.
In Java SE 9 this might change.
Note, however, that since this is a JEP and not a JSR (and it mentions the implementation), I understand this to be implementation-specific and not defined by the JLS.
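
To get a feel for the difference between one and two bytes per character that the JEP is concerned with, the sketch below compares the encoded size of the same text in Latin-1 and in UTF-16. It uses only the public String.getBytes API and does not inspect how any particular JDK stores a String internally; the class name is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Compares encoded sizes through public API only. This says nothing
// definite about the internal layout of java.lang.String on a given JDK.
final class EncodedSizes {
    /** Size of the text at one byte per character (Latin-1). */
    static int latin1Bytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    /** Size of the text at two bytes per char (UTF-16, no byte-order mark). */
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        String name = "Alexandru Tanasescu";
        System.out.println(latin1Bytes(name)); // 19
        System.out.println(utf16Bytes(name));  // 38
    }
}
```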


Size of an object and a string in Java [closed]

I was trying to find out efficient data types.
I know int is 4 bytes and char is one byte.
an object which contains five integers (4 * 5 = 20 bytes)
a String object which has ten characters. ( Suppose it has 10 characters 10 * 1 = 10 bytes)
Am I right?
Which one do you think it is better?

The objective answer first:
Primitive data types are documented here
Strings are more complicated because the JVM can intern them. See a good explanation here
The not so objective answer: pick the data structure that makes for the best design for your application.
If you have a specific constraint in your application, post more details about the data you need to handle and the constraints you have.

A String is not just an array of characters, it is an independent object, and has fields other than its backing char[]. For example, String has three int fields: offset, count and hash. The empty string, therefore, is generally 16 bytes (since we also need to take the char[] field into account) plus the normal 8 bytes of object overhead. Also note that a char[] is itself an object, and has the int field length and an associated object overhead. Once you have taken all this into account, then you can add the two (not one!) bytes per char.
So, for a 10-character string:
3 int fields: 12 bytes
char[] field: 8 bytes
int field: 4 bytes
object overhead: 8 bytes
10 characters: 20 bytes
object overhead: 8 bytes
This comes out to about 60 bytes. I say "about" because some of this is dependent on the VM.

You are incorrect about chars in Java: since they are designed to hold 16-bit Unicode (UTF-16) code units, they take two bytes each, not one. In the end, both representations will take the same amount of memory.
You should pick the data type that makes the most sense to you, the designer of your classes, and to the readers of your code. Memory concerns should not be at the top of your design priorities unless the number of objects that you need threatens to overflow your available memory. Even then you should do careful memory profiling before you optimize.

Characters are 2 bytes in size. They are equivalent to an unsigned short, so a character's value can range between [0, 65535] inclusive.
The number of bytes a String occupies is actually:
string.length * 2
So for your example, a 10 character string occupies 20 bytes, not 10 bytes.
This would be just the string content. There are other variables within the String class which will occupy more bytes of course. And even an empty object occupies a certain number of bytes that will vary based on the JVM implementation.
However, just the character content will occupy 2 bytes per character.
But don't worry about this, as it's most assuredly premature optimization. Clean code is usually more important than lightning-fast code. Pick appropriate data types, and write code that's easy to follow and read. These things are more important.
If you are worried about holding large strings in memory consider changing your approach. The most common problem I see with large strings is when new programmers read an entire file into memory.
If you are doing this, try processing data line by line. Only hold the smallest unit you need in memory at a time, perform your processing, and move on.

I know int is 4 bytes
correct
and char is one byte.
A char is a 16-bit unsigned integer, so 2 bytes
an object which contains five integers (4 * 5 = 20 bytes)
An Object has a header which is 12 bytes on a 32-bit JVM and 16 bytes on a 64-bit JVM. Objects are 8-byte aligned, or possibly 16- or 32-byte aligned if this setting is changed.
This means a new int[5] uses 16 + 20 + 4 (padding) = 40 bytes
a String object which has ten characters. ( Suppose it has 10 characters 10 * 1 = 10 bytes)
A String uses ~24 bytes with header and length fields etc, but it wraps a char[] which contains the actual chars, which is a further 16+20+4 = 40 bytes.
A simple way to check this is to use the following. Make sure you use -XX:-UseTLAB which improves memory accounting (but is slower for multi-threaded programming)
public class StringMemoryCheck {
    public static void main(String... ignored) {
        char[] chars = new char[10];
        long used = memoryUsed();
        String s = new String(chars);
        long diff = memoryUsed() - used;
        if (diff == 0) throw new AssertionError("You must set -XX:-UseTLAB on the command line");
        System.out.printf("Creating a String of 10 characters used %,d bytes of memory%n", diff);
    }

    // Heap currently in use, as reported by the runtime.
    private static long memoryUsed() {
        return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    }
}
prints
Creating a String of 10 characters used 64 bytes of memory

Is an array of one boolean in Java smaller than a standalone variable?

My searches on SO have failed me, so if this is a duplicate, please redirect me.
With that out of the way, my question: I learned, from experience and browsing SO that a Java boolean is stored as a 32-bit int if you declare it as a standalone value, but as an 8-bit byte if you declare it within an array. My question, here, is as follows: Which is more memory efficient? Does the meta data of the array make it bigger in memory than the alternative?
boolean oneVariable = false, oneArray[] = {false};

An array is an actual Object that comes with a memory penalty (I believe 12 bytes), so the primitive boolean is smaller.

The "meta data" of the array includes:
8 bytes (32-bit JVM) or 16 bytes (64-bit JVM) for object header
4 bytes (32 bits) for the length of the array
Add on the 1 necessary byte for the boolean data and you have 13 bytes (32 bit) or 21 bytes (64 bit) at a minimum.
However, objects are allocated memory in 8-byte multiples, so even though you only need 12 or 20 bytes of overhead + 1 byte for the boolean, you'll end up using 16 or 24 bytes of memory, respectively, for your array object.
In addition to the 16/24 bytes the object itself will take up, you'll need 4 bytes (32 bit) or 8 bytes (64 bit) for the memory address of the object, totaling 20 or 32 bytes of memory, respectively, to store your boolean in an array.
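
The tally above can be expressed as a small sketch. The header and reference sizes are passed in as parameters because they are the assumed figures from this answer (8 or 16 bytes of header, 4 or 8 bytes per reference), not values queried from a running JVM; the names are hypothetical.

```java
// Sketch of the boolean[] cost model described above.
// Header and reference sizes are assumptions supplied by the caller.
final class BooleanArrayCost {
    /** Rounds a size up to the next multiple of 8. */
    static int alignTo8(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** Header + 4-byte length + 1 byte per boolean, padded to 8 bytes. */
    static int arrayObjectBytes(int headerBytes, int length) {
        return alignTo8(headerBytes + 4 + length);
    }

    /** The array object plus the reference that points at it. */
    static int totalBytes(int headerBytes, int referenceBytes, int length) {
        return arrayObjectBytes(headerBytes, length) + referenceBytes;
    }

    public static void main(String[] args) {
        System.out.println(totalBytes(8, 4, 1));  // 20 (32-bit figures)
        System.out.println(totalBytes(16, 8, 1)); // 32 (64-bit figures)
    }
}
```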
The size of a standalone variable is JVM dependent. Java does not specify the size of storage, and in fact Oracle says
This data type represents one bit of information, but its "size" isn't something that's precisely defined.
Older JVMs use a 32-bit stack cell, used to hold local variables, method arguments, and expression values so a single boolean used as a variable would consume 4 bytes; making the array at least 5 times as expensive as for a single boolean. This answer may be different if, for example, the boolean is a class variable in which case it would just be a single byte added to the existing overhead. In newer JVMs a single boolean would only use 1 byte, but depending on its context and the 8-byte padding necessary to align memory addresses, could still consume up to 8 bytes of heap space. It would still be smaller than the boolean array.

As user949300 mentioned, all objects carry a penalty that make them larger than primitives. For only a single boolean though, memory doesn't really matter. If you are storing a large number of booleans, consider using a BitSet. I believe under the hood it uses approximately 1 bit per boolean (plus some overhead).
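
A minimal sketch of the BitSet suggestion follows. The helper name and the choice of 1000 flags are only for illustration; BitSet.size() reports the number of bits of space actually in use, which is how the approximate storage can be observed.

```java
import java.util.BitSet;

// Stores many boolean flags in a BitSet instead of a boolean[].
final class BitSetFlags {
    /** Builds a BitSet of the given capacity with every third flag set. */
    static BitSet flagsWithEveryThirdSet(int count) {
        BitSet flags = new BitSet(count);
        for (int i = 0; i < count; i += 3) {
            flags.set(i);
        }
        return flags;
    }

    public static void main(String[] args) {
        BitSet flags = flagsWithEveryThirdSet(1000);
        System.out.println(flags.get(3));        // true
        System.out.println(flags.get(4));        // false
        System.out.println(flags.cardinality()); // 334
        // Bits of backing storage in use, divided by 8 to give bytes.
        System.out.println(flags.size() / 8 + " bytes of bit storage");
    }
}
```

A boolean[1000] would need roughly 1000 bytes for the same data under the one-byte-per-element figure quoted above.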

This Java specialist article is a good source for understanding the memory usage.

What is the memory consumption of an object in Java?

Is the memory space consumed by one object with 100 attributes the same as that of 100 objects, with one attribute each?
How much memory is allocated for an object?
How much additional space is used when adding an attribute?

Mindprod points out that this is not a straightforward question to answer:
A JVM is free to store data any way it pleases internally, big or little endian, with any amount of padding or overhead, though primitives must behave as if they had the official sizes.
For example, the JVM or native compiler might decide to store a boolean[] in 64-bit long chunks like a BitSet. It does not have to tell you, so long as the program gives the same answers.
It might allocate some temporary Objects on the stack.
It may optimize some variables or method calls totally out of existence replacing them with constants.
It might version methods or loops, i.e. compile two versions of a method, each optimized for a certain situation, then decide up front which one to call.
Then of course the hardware and OS have multilayer caches, on chip-cache, SRAM cache, DRAM cache, ordinary RAM working set and backing store on disk. Your data may be duplicated at every cache level. All this complexity means you can only very roughly predict RAM consumption.
Measurement methods
You can use Instrumentation.getObjectSize() to obtain an estimate of the storage consumed by an object.
To visualize the actual object layout, footprint, and references, you can use the JOL (Java Object Layout) tool.
Object headers and Object references
In a modern 64-bit JDK, an object has a 12-byte header, padded to a multiple of 8 bytes, so the minimum object size is 16 bytes. For 32-bit JVMs, the overhead is 8 bytes, padded to a multiple of 4 bytes. (From Dmitry Spikhalskiy's answer, Jayen's answer, and JavaWorld.)
Typically, references are 4 bytes on 32bit platforms or on 64bit platforms up to -Xmx32G; and 8 bytes above 32Gb (-Xmx32G). (See compressed object references.)
As a result, a 64-bit JVM would typically require 30-50% more heap space. (Should I use a 32- or a 64-bit JVM?, 2012, JDK 1.7)
Boxed types, arrays, and strings
Boxed wrappers have overhead compared to primitive types (from JavaWorld):
Integer: The 16-byte result is a little worse than I expected because an int value can fit into just 4 extra bytes. Using an Integer costs me a 300 percent memory overhead compared to when I can store the value as a primitive type
Long: 16 bytes also: Clearly, actual object size on the heap is subject to low-level memory alignment done by a particular JVM implementation for a particular CPU type. It looks like a Long is 8 bytes of Object overhead, plus 8 bytes more for the actual long value. In contrast, Integer had an unused 4-byte hole, most likely because the JVM I use forces object alignment on an 8-byte word boundary.
Other containers are costly too:
Multidimensional arrays: it offers another surprise.
Developers commonly employ constructs like int[dim1][dim2] in numerical and scientific computing.
In an int[dim1][dim2] array instance, every nested int[dim2] array is an Object in its own right. Each adds the usual 16-byte array overhead. When I don't need a triangular or ragged array, that represents pure overhead. The impact grows when array dimensions greatly differ.
For example, a int[128][2] instance takes 3,600 bytes. Compared to the 1,040 bytes an int[256] instance uses (which has the same capacity), 3,600 bytes represent a 246 percent overhead. In the extreme case of byte[256][1], the overhead factor is almost 19! Compare that to the C/C++ situation in which the same syntax does not add any storage overhead.
String: a String's memory growth tracks its internal char array's growth. However, the String class adds another 24 bytes of overhead.
For a nonempty String of size 10 characters or less, the added overhead cost relative to useful payload (2 bytes for each char plus 4 bytes for the length), ranges from 100 to 400 percent.
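
The multidimensional array figures quoted above can be reproduced with a short model. It assumes a 12-byte array header (including the length), 4-byte references and 8-byte alignment; these values are an assumption chosen because they match the quoted numbers, and the class and method names are hypothetical.

```java
// Model of nested versus flat array footprints.
// Assumes a 12-byte array header, 4-byte references, 8-byte alignment.
final class NestedArrayModel {
    static final int ARRAY_HEADER_BYTES = 12; // assumed: header plus length slot
    static final int REFERENCE_BYTES = 4;     // assumed reference size

    /** Rounds a size up to the next multiple of 8. */
    static long alignTo8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** One flat array of primitives or references. */
    static long flatArrayBytes(int elementBytes, int length) {
        return alignTo8(ARRAY_HEADER_BYTES + (long) elementBytes * length);
    }

    /** An outer array of references plus one inner array per row. */
    static long nestedArrayBytes(int elementBytes, int rows, int cols) {
        long outer = flatArrayBytes(REFERENCE_BYTES, rows);
        return outer + rows * flatArrayBytes(elementBytes, cols);
    }

    public static void main(String[] args) {
        System.out.println(nestedArrayBytes(4, 128, 2)); // 3600 for int[128][2]
        System.out.println(flatArrayBytes(4, 256));      // 1040 for int[256]
        System.out.println(nestedArrayBytes(1, 256, 1)); // 5136 for byte[256][1]
        System.out.println(flatArrayBytes(1, 256));      // 272 for byte[256]
    }
}
```

Dividing 5136 by 272 gives about 18.9, which matches the "almost 19" factor in the quote.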
Alignment
Consider this example object:
class X { // 8 bytes for reference to the class definition
    int a; // 4 bytes
    byte b; // 1 byte
    Integer c = Integer.valueOf(0); // 4 bytes for a reference
}
A naïve sum would suggest that an instance of X would use 17 bytes. However, due to alignment (also called padding), the JVM allocates the memory in multiples of 8 bytes, so instead of 17 bytes it would allocate 24 bytes.
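
The rounding step can be sketched as follows. The field sizes are the ones given in the comments of the example class; the helper name alignUp is made up for illustration.

```java
// Rounds the naive field sum of class X up to the JVM's allocation granule.
final class AlignmentExample {
    /** Rounds bytes up to the next multiple of alignment. */
    static int alignUp(int bytes, int alignment) {
        return (bytes + alignment - 1) / alignment * alignment;
    }

    public static void main(String[] args) {
        int naive = 8 + 4 + 1 + 4;             // class reference + int + byte + reference
        System.out.println(naive);             // 17
        System.out.println(alignUp(naive, 8)); // 24
    }
}
```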

It depends on the CPU architecture and the JDK. For a modern JDK on a 64-bit architecture, an object has a 12-byte header and is padded to a multiple of 8 bytes, so the minimum object size is 16 bytes. You can use a tool called Java Object Layout to determine the size and get details about any entity's object layout and internal structure, or guess this information from the class definition. Example output for an Integer instance in my environment:
Running 64-bit HotSpot VM.
Using compressed oop with 3-bit shift.
Using compressed klass with 3-bit shift.
Objects are 8 bytes aligned.
Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
java.lang.Integer object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 12 (object header) N/A
12 4 int Integer.value N/A
Instance size: 16 bytes (estimated, the sample instance is not available)
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
For Integer, the instance size is 16 bytes: the 4-byte int is placed right after the 12-byte header. It doesn't need any additional padding, because 16 is a multiple of 8 (the word size on a 64-bit architecture).
Code sample:
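import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.util.VMSupport;

public class IntegerLayout {
    public static void main(String[] args) {
        // VMSupport is used as in the original example; other JOL releases may expose VM details differently.
        System.out.println(VMSupport.vmDetails());
        System.out.println(ClassLayout.parseClass(Integer.class).toPrintable());
    }
}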
If you use maven, to get JOL:
<dependency>
<groupId>org.openjdk.jol</groupId>
<artifactId>jol-core</artifactId>
<version>0.3.2</version>
</dependency>

Each object has a certain overhead for its associated monitor and type information, as well as the fields themselves. Beyond that, fields can be laid out pretty much however the JVM sees fit (I believe) - but as shown in another answer, at least some JVMs will pack fairly tightly. Consider a class like this:
public class SingleByte
{
private byte b;
}
vs
public class OneHundredBytes
{
private byte b00, b01, ..., b99;
}
On a 32-bit JVM, I'd expect 100 instances of SingleByte to take 1200 bytes (8 bytes of overhead + 4 bytes for the field due to padding/alignment). I'd expect one instance of OneHundredBytes to take 108 bytes - the overhead, and then 100 bytes, packed. It can certainly vary by JVM though - one implementation may decide not to pack the fields in OneHundredBytes, leading to it taking 408 bytes (= 8 bytes overhead + 4 * 100 aligned/padded bytes). On a 64 bit JVM the overhead may well be bigger too (not sure).
EDIT: See the comment below; apparently HotSpot pads to 8 byte boundaries instead of 32, so each instance of SingleByte would take 16 bytes.
Either way, the "single large object" will be at least as efficient as multiple small objects - for simple cases like this.
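
The comparison can be worked through with a small sketch. It assumes an 8-byte header and tightly packed byte fields, with the alignment passed as a parameter so that both the 4-byte and the 8-byte boundary cases discussed above can be tried; the names are hypothetical, and an actual JVM may choose a different layout.

```java
// Compares 100 one-field objects with a single 100-field object.
// Assumes an 8-byte header and tightly packed byte fields.
final class ManySmallVersusOneLarge {
    static final int HEADER_BYTES = 8; // assumed 32-bit object header

    /** Header plus packed fields, rounded up to the given alignment. */
    static int instanceBytes(int fieldBytes, int alignment) {
        int raw = HEADER_BYTES + fieldBytes;
        return (raw + alignment - 1) / alignment * alignment;
    }

    public static void main(String[] args) {
        // 4-byte alignment, as in the first estimate
        System.out.println(100 * instanceBytes(1, 4)); // 1200
        System.out.println(instanceBytes(100, 4));     // 108
        // 8-byte alignment, as in the EDIT
        System.out.println(100 * instanceBytes(1, 8)); // 1600
        System.out.println(instanceBytes(100, 8));     // 112
    }
}
```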

It appears that every object has an overhead of 16 bytes on 32-bit systems (and 24 bytes on 64-bit systems).
http://algs4.cs.princeton.edu/14analysis/ is a good source of information.
http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf is also very informative.

The total used / free memory of a program can be obtained in the program via
java.lang.Runtime.getRuntime();
The runtime has several methods which relate to the memory. The following coding example demonstrates its usage.
import java.util.ArrayList;
import java.util.List;

public class PerformanceTest {
    private static final long MEGABYTE = 1024L * 1024L;

    // Minimal stand-in for the Person class, which the original leaves to the reader.
    static class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    public static long bytesToMegabytes(long bytes) {
        return bytes / MEGABYTE;
    }

    public static void main(String[] args) {
        List<Person> list = new ArrayList<>();
        for (int i = 0; i <= 100_000; i++) {
            list.add(new Person("Jim", "Knopf"));
        }
        // Get the Java runtime
        Runtime runtime = Runtime.getRuntime();
        // Request a garbage collection (this is only a hint to the JVM)
        runtime.gc();
        // Calculate the used memory
        long memory = runtime.totalMemory() - runtime.freeMemory();
        System.out.println("Used memory in bytes: " + memory);
        System.out.println("Used memory in megabytes: " + bytesToMegabytes(memory));
    }
}

Is the memory space consumed by one object with 100 attributes the same as that of 100 objects, with one attribute each?
No.
How much memory is allocated for an object?
The overhead is 8 bytes on 32-bit, 12 bytes on 64-bit; and then rounded up to a multiple of 4 bytes (32-bit) or 8 bytes (64-bit).
How much additional space is used when adding an attribute?
Attributes range from 1 byte (byte) to 8 bytes (long/double), but references are either 4 bytes or 8 bytes depending not on whether the JVM is 32-bit or 64-bit, but rather on whether -Xmx is < 32Gb or >= 32Gb: typical 64-bit JVMs have an optimisation called compressed oops (-XX:+UseCompressedOops) which compresses references to 4 bytes if the heap is below 32Gb.

No, registering an object takes a bit of memory too. 100 objects with 1 attribute will take up more memory.

The question is a very broad one.
The answer depends on the class's instance variables (its state).
There is also some additional memory required for headers and references.
The heap memory used by a Java object includes
memory for primitive fields, according to their size (see below for Sizes of primitive types);
memory for reference fields (4 bytes each);
an object header, consisting of a few bytes of "housekeeping" information;
Objects in Java also require some "housekeeping" information, such as a record of the object's class, its ID, and status flags such as whether the object is currently reachable, currently synchronization-locked, etc.
The Java object header size differs between 32-bit and 64-bit JVMs.
Although these are the main memory consumers, the JVM sometimes also requires additional space, e.g. for alignment.
Sizes of primitive types
boolean & byte -- 1 byte
char & short -- 2 bytes
int & float -- 4 bytes
long & double -- 8 bytes
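
Most of these sizes can be read from the BYTES constants on the wrapper classes (available since Java 8), as the sketch below shows. Boolean has no such constant, which fits the earlier point that its storage size is not precisely defined. The class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Collects the sizes of the primitive types from the wrapper-class constants.
final class PrimitiveSizes {
    /** Primitive type name mapped to its size in bytes, in declaration order. */
    static Map<String, Integer> sizes() {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("byte", Byte.BYTES);
        sizes.put("short", Short.BYTES);
        sizes.put("char", Character.BYTES);
        sizes.put("int", Integer.BYTES);
        sizes.put("float", Float.BYTES);
        sizes.put("long", Long.BYTES);
        sizes.put("double", Double.BYTES);
        return sizes;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> entry : sizes().entrySet()) {
            System.out.println(entry.getKey() + " -- " + entry.getValue());
        }
    }
}
```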

I've gotten very good results from the java.lang.instrument.Instrumentation approach mentioned in another answer. For good examples of its use, see the entry, Instrumentation Memory Counter from the JavaSpecialists' Newsletter and the java.sizeOf library on SourceForge.

In case it's useful to anyone, you can download from my web site a small Java agent for querying the memory usage of an object. It'll let you query "deep" memory usage as well.

No, 100 small objects need more memory (for bookkeeping information) than one big object.

The rules about how much memory is consumed depend on the JVM implementation and the CPU architecture (32 bit versus 64 bit for example).
For the detailed rules for the SUN JVM check my old blog
Regards,
Markus
