Java object which takes the least memory - java

This is a silly question, but here it goes.
I have a multithreaded program and a "global" Collection of unique elements. I rejected the synchronized Set implementations due to performance and went with ConcurrentHashMap instead. I don't really need the value part of the Map, so I wanted to use the smallest object in Java in terms of memory usage. I solved this issue in a different way (a single Boolean object referenced multiple times in the Map), but I am still curious what the smallest object in Java is. I always thought it was Boolean, but that does not seem to be true (Java - boolean primitive type - size, Primitive Data Types).

It doesn't really matter, actually, since the value part of each association is fixed to be a reference. You might even use null as the value here, but any other (fixed) object reference is fine (and sometimes more convenient). I'd prefer Boolean.TRUE (or a similar "well known" singleton). You can then test for membership via
if (myMap.get(someKey) != null) { ... }
in addition to
if (myMap.containsKey(someKey)) { ... }

If you want a Set<K> that is backed by a ConcurrentHashMap, you should use Collections.newSetFromMap, e.g.
final Set<K> set = Collections.newSetFromMap(new ConcurrentHashMap<K, Boolean>());
Now, if you really want to reinvent the wheel and care that much about memory usage, I suggest you simply use a plain Object as your value. Since every object in Java inherits from Object (the universal base class), the size of any object in memory must be greater than or equal to the size of a plain Object. You cannot use a primitive, since generic type arguments must be objects.
EDIT: Actually, allocating a dedicated object to use as your value here will take more memory than reusing an object that will almost certainly be allocated anyway. You can just use a reference to an object that is created during VM initialization regardless, e.g. Object.class. I really suggest you just use the first solution, though.
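For completeness, here is a minimal sketch of both options (the newSetFromMap wrapper and a hand-rolled map whose keys all share one sentinel value); the String element type and the class name are just placeholders for illustration:

import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetExample {
    public static void main(String[] args) {
        // Option 1: the standard wrapper behaves exactly like a concurrent Set.
        Set<String> set = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        set.add("a");
        System.out.println(set.contains("a"));    // true

        // Option 2: hand-rolled, every key maps to the same shared sentinel.
        ConcurrentHashMap<String, Boolean> map = new ConcurrentHashMap<String, Boolean>();
        map.put("a", Boolean.TRUE);
        System.out.println(map.containsKey("a")); // true
        System.out.println(map.get("b") != null); // false, "b" is not a member
    }
}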

An object's size consists of:
the size of the instance variables it holds
an 8- or 16-byte header (depending on whether the HotSpot VM is 32- or 64-bit)
padding: the total size is always rounded up to a multiple of 8 bytes.
E.g. (assuming a 32-bit JVM):
public class MyBoolObject {
    boolean flag;
}
will take up 16 bytes: 8 bytes (header) + 1 byte (instance variable) + 7 bytes (padding).
Since you are not interested in the map values you can set them all to null. Each value slot is still a reference, which takes 4 or 8 bytes (32/64-bit).
You might also check this good list on cost/elements of well-known Java data structures:
http://code.google.com/p/memory-measurer/wiki/ElementCostInDataStructures
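If you want to verify these numbers on your own VM, a small sketch using the OpenJDK JOL tool (org.openjdk.jol) is shown below; this assumes the jol-core library is on the classpath and is not part of the original answer:

import org.openjdk.jol.info.ClassLayout;

public class MyBoolObject {
    boolean flag;

    public static void main(String[] args) {
        // Prints the header size, the offset of the boolean field and any
        // trailing padding for the current JVM (32-bit, 64-bit, compressed oops, ...).
        System.out.println(ClassLayout.parseClass(MyBoolObject.class).toPrintable());
    }
}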

The primitive data types are not objects.
Since all objects in Java must inherit from the superclass Object, the smallest conceivable object in Java would be an instance of a class that you define with no members. Such a class would be pretty useless.

The Object class is instantiable and its instances are definitely the smallest objects in Java. However, many other objects have exactly the same footprint, Integer and Boolean being examples on 64-bit VMs. This is due to heap memory alignment.

Related

Why does Java's HashSet use HashMap internally? Doesn't it waste memory? [duplicate]

Looking at the source of Java 6, HashSet<E> is actually implemented using HashMap<E,Object>, with a dummy object instance stored as the value of every entry of the Set.
I think that wastes 4 bytes (on 32-bit machines) per entry.
But, why is it still used? Is there any reason to use it besides making it easier to maintain the code?
Actually, it's not just HashSet. All implementations of the Set interface in Java 6 are based on an underlying Map. This is not a requirement; it's just the way the implementation is. You can see for yourself by checking out the documentation for the various implementations of Set.
Your main questions are:
But, why is it still used? Is there any reason to use it besides making it easier to maintain the code?
I assume that code maintenance is a big motivating factor. So is preventing duplication and bloat.
Set and Map are similar interfaces, in that duplicate elements (or keys) are not allowed. (I think the only Set not backed by a Map is CopyOnWriteArraySet, which is an unusual Collection because it is backed by a copy-on-write array rather than a Map.)
Specifically:
From the documentation of Set:
A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models the mathematical set abstraction.
The Set interface places additional stipulations, beyond those inherited from the Collection interface, on the contracts of all constructors and on the contracts of the add, equals and hashCode methods. Declarations for other inherited methods are also included here for convenience. (The specifications accompanying these declarations have been tailored to the Set interface, but they do not contain any additional stipulations.)
The additional stipulation on constructors is, not surprisingly, that all constructors must create a set that contains no duplicate elements (as defined above).
And from Map:
An object that maps keys to values.
A map cannot contain duplicate keys; each key can map to at most one value.
If you can implement your Sets using existing code, any benefit (speed, for example) you can realize from existing code accrues to your Set as well.
If you choose to implement a Set without a Map backing, you have to duplicate code designed to prevent duplicate elements. Ah, the delicious irony.
That said, there's nothing preventing you from implementing your Sets differently.
My guess is that HashSet was originally implemented in terms of HashMap in order to get it done quickly and easily. In terms of lines of code, HashSet is a fraction of HashMap.
I would guess that the reason it still hasn't been optimized is fear of change.
However, the waste is much worse than you think. On both 32-bit and 64-bit, HashSet is 4x larger than necessary, and HashMap is 2x larger than necessary. HashMap could be implemented with an array with keys and values in it (plus chains for collisions). That means two pointers per entry, or 16 bytes on a 64-bit VM. In fact, HashMap contains an Entry object per entry, which adds 8 bytes for the pointer to the Entry and 8 bytes for the Entry object header. HashSet also uses 32 bytes per element, but the waste is 4x instead of 2x since it only requires 8 bytes per element.
Yes, you are right, a small amount of wastage is definitely there. Small because, for every entry, it uses the same object PRESENT (which is declared final). Hence the only wastage is the value slot of every entry in the HashMap.
Mostly I think, they took this approach for maintainability and reusability. (The JCF developers would have thought, we have tested HashMap anyway, why not reuse it.)
But if you have huge collections and you are a memory freak, then you may opt for better alternatives like Trove or Google Collections.
I looked at your question and it took me a while to think about what you said. So here's my opinion regarding the HashSet implementation.
It is necessary to have the dummy instance to know if the value is or is not present in the set.
Take a look at the add method
public boolean add(E e) {
    return map.put(e, PRESENT) == null;
}
And now let's take a look at put's return value:
Returns the previous value associated with key, or null if there was no mapping for key. (A null return can also indicate that the map previously associated null with key.)
So the PRESENT object is just used to represent that the set contains the value e. I think you asked why not use null instead of PRESENT. But then you would not be able to tell whether the entry was previously in the map, because map.put(key, value) would always return null and you would have no way to know whether the key already existed.
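For the same reason, remove can report whether the element was actually present by comparing what the map hands back against PRESENT; the JDK 6 source looks roughly like this:

public boolean remove(Object o) {
    // map.remove returns the previous value, which is PRESENT only if the
    // element was in the set.
    return map.remove(o) == PRESENT;
}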
That being said you could argue that they could have used an implementation like this
public boolean add(E e) {
    if (map.containsKey(e)) {
        return false;
    }
    map.put(e, null);
    return true;
}
I guess they waste the 4 bytes to avoid computing the hashCode of the key twice (in the case where the key is actually going to be added), as that could be expensive.
If your question referred to why they used a HashMap that wastes 8 bytes (because of the Map.Entry) instead of some other data structure using a similar Entry of only 4, then yes, I would say they did it for the reasons you mentioned.
I am guessing that it has never turned up as a significant problem for real applications or important benchmarks. Why complicate the code for no real benefit?
Also note that object sizes are rounded up in many JVM implementations, so there may not actually be an increase in size (I don't know for this example). Also, the code for HashMap is likely to be compiled and in the cache. Other things being equal, more code => more cache misses => lower performance.
After searching through pages like this, wondering why the standard implementation is mildly inefficient, I found com.carrotsearch.hppc.IntOpenHashSet
Your question:
I think that wastes 4 bytes (on 32-bit machines) per entry.
Just one Object instance is created for the entire HashSet data structure, and reusing it spares you from re-writing the whole HashMap-style code again:
private static final Object PRESENT = new Object();
All keys map to the same value, i.e. the PRESENT object.

Purpose of new keyword in creating array in Java

I want to know why an array created in Java is static (fixed in size) even when we use the new keyword to define it.
From what I've read, the new keyword allocates memory space on the heap whenever it is encountered at run time, so why do we have to give the size of the array at all when defining it?
e.g. Why can't
int[] array1=new int[20];
simply be:
int[] array1=new int[];
I know that it does not grow automatically and that we have ArrayList for that, but then what is the use of the keyword new here? It could have been defined as int array1[20]; like we used to do in C/C++, if it has to be static.
P.S. I know this is an amateurish question, but I am an amateur. I tried to Google it but couldn't find anything comprehensive.
This may be an amateurish question, but it is one of the best amateurish questions you could ask.
In order for Java to allow you to declare arrays without new, it would have to support an additional kind of data type, which would behave like a primitive in the sense that it would not require allocation, but would be very much unlike a primitive in the sense that it would be of variable size. That would have immensely complicated the compiler and the JVM.
The approach taken by java is to provide the bare minimum and sufficient primitives in order to be able to get most things done efficiently, and let everything else be done using objects. That's why arrays are objects.
Also, you might be a bit confused about the meaning of "static" here. In C, "static" means "of file scope", that is, not visible by other object files. In C++ and in Java, "static" means "belongs to the class" rather than "belongs to instances of the class". So, the term "static" is not suitable for describing array allocation. "Fixed size" or "fixed, predefined size" would be more suitable terms.
Well, in Java everything is an object, including arrays (they have a length and other data). That's why you cannot use
int var[20];
In Java that would start declaring an int variable and the compiler would be confused by the [20]. Instead, by using this:
int[] var;
You are declaring that var is of type int[] (int array) so Java understands it.
Also, in Java the length of the array and other metadata are stored in the array object itself. For this reason you don't have to give a size in the declaration; instead, the size is recorded when the array is created (using new).
Maybe there is a better reason that Oracle may have answered already, but the fact that in Java everything is an object must have something to do with it. Java is quite strict about objects and types, unlike C, where you have more freedom but everything is looser (especially when using pointers).
The main idea of the array data structure is that all its elements are located in a sequential run of memory cells. That is why you cannot create an array of variable size: it would need an unbounded region of contiguous memory reserved for it, which is impossible.
If you want to change the size of an array, you have to recreate it.
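For example (a sketch: this allocates a new, larger array and copies the old contents, it does not resize in place):

int[] array1 = new int[20];
// "grow" to 40 elements by allocating a fresh array and copying the old one
array1 = java.util.Arrays.copyOf(array1, 40);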
Since arrays are fixed-size they need to know how much memory to allocate at the time they are instantiated.
ArrayLists and other resizable data structures that internally use arrays to store data actually re-allocate larger arrays when their inner array fills up, as sketched below.
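A minimal sketch of that re-allocation idea (not the real ArrayList source, just an illustration with made-up names):

import java.util.Arrays;

class GrowableIntArray {
    private int[] data = new int[4]; // fixed-size backing array
    private int size;

    void add(int value) {
        if (size == data.length) {
            // The backing array is full: allocate a bigger fixed-size array
            // and copy the old contents into it.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }
}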
My understanding of OP's reasoning is:
new is used for allocating dynamic objects (which can grow, like ArrayList), but arrays are static (can't grow). So one of them is unnecessary: either the new or the size of the array.
If that is the question, then the answer is simple:
Well, in Java new is necessary for every Object allocation, because in Java all objects are dynamically allocated.
Turns out that in Java, arrays are objects, different from C/C++ where they are not.
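A small demonstration of that last point: arrays carry their own length, have a runtime class, and can be assigned to an Object reference.

public class ArraysAreObjects {
    public static void main(String[] args) {
        int[] array1 = new int[20];
        Object o = array1;                       // an array is an Object
        System.out.println(array1.length);       // 20, the size travels with the object
        System.out.println(array1.getClass());   // class [I
        System.out.println(o instanceof Object); // true
    }
}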
All of Java's variables are at most a single 64-bit field. They are either primitives like
int (32 bit)
long (64 bit)
...
or references to objects, which depending on JVM / configuration / OS are 64- or 32-bit fields (but, unlike 64-bit primitives, with atomicity guaranteed).
There is no such thing as C's int[20] "type". Neither is there C's static.
What int[] array = new int[20] boils down to is roughly
int* array = malloc(20 * sizeof(java_int))
Each time you see new in Java you can imagine a malloc and a call to the constructor method in case it's a real Object (not just an array). Each Object is more or less just a struct of a few primitives and more pointers.
The result is a giant network of relatively small structs pointing to other things. And the garbage collector's task is to free all the leaves that have fallen off the network.
And this is also the reason why you can say Java is copy by value: both primitives and pointers are always copied.
Regarding static in Java: there is conceptually one struct per class that represents the static context of the class. That's the place where static variables are anchored. Non-static instance variables are anchored in their own per-instance struct:
class Car {
    static int[] forAllCars = new int[20];
    Object perCar;
}
...
new Car();
translates very loosely (my C is terrible) to
struct Car_Static {
    int* forAllCars;
};
struct Car_Instance {
    Object* perCar;
};

// .. at class load time. Happens once, and this is referenced from some root
// object so it can't get garbage collected
struct Car_Static *car_class = (struct Car_Static*) malloc(sizeof(struct Car_Static));
car_class->forAllCars = malloc(20 * 4);

// .. for every new Car();
struct Car_Instance *new_reference = (struct Car_Instance*) malloc(sizeof(struct Car_Instance));
new_reference->perCar = NULL; // all fields get 0'd
new_reference->constructor();
// "new" essentially returns the "new_reference" then

(Java reference situation) Should I do what the FindBugs tool tells me to?

I ran the FindBugs tool on my project and it found 18 problems of the type:
Storing reference to mutable object -> May expose internal representation by incorporating reference to mutable object
So I have a class whose constructor accepts an array of type Object and assigns it to a private class member variable. Here is an example:
public class HtmlCellsProcessing extends HtmlTableProcessing
{
    private Object[] htmlCells;

    public HtmlCellsProcessing(Object[] htmlCells)
    {
        this.htmlCells = htmlCells;
    }
}
Here is a further explanation about the warning:
This code stores a reference to an externally mutable object into the internal representation of the object.  If instances are accessed by untrusted code, and unchecked changes to the mutable object would compromise security or other important properties, you will need to do something different. Storing a copy of the object is better approach in many situations.
The advice they give is pretty obvious, but what happens if the array is very big? If I copy its values into the member variable array, the application is going to take twice as much memory.
What should I do in such a scenario, where I have a large amount of data? Should I pass it by reference or always copy it?
It depends. You have multiple concerns, including space, time and correctness.
A defensive copy helps you guarantee that the list items will not change without the knowledge of the class holding the array. But it will take O(n) time and space.
For a very large array, you may find that the costs of a defensive copy in space and time are harmful to your application. If you control all the code with access to the array, it may be reasonable to guarantee correctness without a defensive copy, and suppress the FindBugs warning on that class.
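If you decide to go that route, one way to suppress the warning locally (rather than globally) is an annotation on the constructor. This is just a sketch, assuming the FindBugs/SpotBugs annotations jar is available; the bug pattern name EI_EXPOSE_REP2 is the one normally reported for this case:

import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;

public class HtmlCellsProcessing extends HtmlTableProcessing {
    private final Object[] htmlCells;

    @SuppressFBWarnings(value = "EI_EXPOSE_REP2",
            justification = "Caller retains ownership; copying would double memory use")
    public HtmlCellsProcessing(Object[] htmlCells) {
        this.htmlCells = htmlCells;
    }
}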
I'd suggest trying an immutable list from the Guava library. See http://code.google.com/p/guava-libraries/wiki/ImmutableCollectionsExplained
If both encapsulation and performance are required, the typical solution is to pass a reference to an immutable object instead.
Therefore, rather than pass a huge array directly, encapsulate it in an object that does not permit the array's modification:
final class ArraySnapshot {
    final Object[] array;

    ArraySnapshot(Object[] array) {
        this.array = Arrays.copyOf(array, array.length);
    }
    // methods to read from the array
}
This object can now be passed around cheaply, but since it is immutable, encapsulation is ensured.
This idea, of course, is nothing new: it's what String does for char[].
The advice they give is pretty obvious, but what happens if the array is very big? If I copy its values into the member variable array, the application is going to take twice as much memory.
In Java you copy references, not the objects themselves, unless you do a deep copy.
So if your only concern is getting rid of the warning (which is a valid warning, especially if you don't fully control what is stored and you have multiple threads modifying the objects), you can make the copy without much concern about memory.
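A small demonstration of that point: copying an Object[] duplicates only the references (one slot per element), not the objects they point to, so both arrays end up sharing the same instances.

import java.util.Arrays;

public class ShallowCopyDemo {
    public static void main(String[] args) {
        StringBuilder big = new StringBuilder("lots of data");
        Object[] original = { big };
        Object[] copy = Arrays.copyOf(original, original.length);

        System.out.println(copy[0] == original[0]); // true, same object, only the reference was copied
        big.append("!");
        System.out.println(copy[0]);                // "lots of data!", both slots see the mutation
    }
}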

Java object arrays memory requirements

Suppose there is an Integer array in my class:
public class Foo {
private Integer[] arr = new Integer[20];
.....
}
On a 64 bit architecture the space requirement for this is ~ (20*8+24) + 24*20 {space required for references + some array overhead + space required for objects}.
Why does Java store references to all 20 of the Integer objects? Wouldn't knowing the first memory location and the number of items in the array suffice? (I assume, and I have also read somewhere, that objects in an array are placed contiguously anyway.) I want to know the reason for this sort of implementation. Sorry if this is a noobish question.
Like every other class, Integer is a reference type. This means it can only be accessed indirectly, via a reference. You cannot store an instance of a reference type in a field, a local variable, a slot in a collection, etc. -- you always have to store a reference and allocate the object itself separately. There are a variety of reasons for this:
You need to be able to represent null.
You need to be able to replace it with another instance of a subtype (assuming subtypes are possible, i.e. the class is not final). For example, an Object[] may actually store instances of any number of different classes with wildly varying sizes.
You need to preserve sharing, e.g. after a[0] = a[1] = someObject; all three must refer to the same object. This is much more important (vital even) if the object is mutable, but even with immutable objects the difference can be observed via reference equality checks (==).
You need reference assignment to be atomic (cf. Java memory model), so copying the whole instance is even more expensive than it seems.
With these and many other constraints, always storing references is the only feasible implementation strategy (in general). In very specific circumstances, a JIT compiler may avoid allocating an object entirely and store it directly (e.g. on the stack), but this is an obscure implementation detail, not widely applicable. I only mention it for completeness and because it's a wonderful illustration of the as-if rule.
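A short illustration of the sharing point above; it only works because the array slots hold references rather than inlined objects:

public class SharingDemo {
    public static void main(String[] args) {
        Object someObject = new Object();
        Object[] a = new Object[2];
        a[0] = a[1] = someObject;

        System.out.println(a[0] == a[1]);       // true, both slots refer to one instance
        System.out.println(a[0] == someObject); // true
        a[0] = null;                            // null is representable because the slot is a reference
    }
}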

is there a performance hit when using enum.values() vs. String arrays?

I'm using enumerations to replace String constants in my java app (JRE 1.5).
Is there a performance hit when I treat the enum as a static array of names in a method that is called constantly (e.g. when rendering the UI)?
My code looks a bit like this:
public String getValue(int col) {
return ColumnValues.values()[col].toString();
}
Clarifications:
I'm concerned with a hidden cost related to enumerating values() repeatedly (e.g. inside paint() methods).
I can now see that all my scenarios include some int => enum conversion - which is not Java's way.
What is the actual price of extracting the values() array? Is it even an issue?
Android developers
Read Simon Langhoff's answer below, which was pointed out earlier by Geeks On Hugs in the accepted answer's comments: Enum.values() must do a defensive copy.
For enums, in order to maintain immutability, the backing array is cloned every time you call the values() method. This means it will have a performance impact; how much depends on your specific scenario.
I have been monitoring my own Android app and found out that this simple call used 13.4% of CPU time in my specific case!
In order to avoid cloning the values array, I decided to simply cache the values in a private field and then loop through those values whenever needed:
private final static Protocol[] values = Protocol.values();
After this small optimisation my method call only hogged a negligible 0.0% of CPU time.
In my use case, this was a welcome optimisation, however, it is important to note that using this approach is a tradeoff of mutability of your enum. Who knows what people might put into your values array once you give them a reference to it!?
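To make the trade-off concrete, here is a minimal sketch of the caching idea with hypothetical enum constants; the cached array must be treated as read-only, because anyone who mutates it corrupts every later lookup:

enum Protocol { HTTP, HTTPS, FTP }

class ProtocolLookup {
    // values() is called exactly once, so the clone cost is paid at class-load time only.
    private static final Protocol[] VALUES = Protocol.values();

    static Protocol byOrdinal(int ordinal) {
        return VALUES[ordinal];
    }
}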
Enum.values() gives you a reference to an array, and iterating over an array of enums costs the same as iterating over an array of strings. Meanwhile, comparing enum values to other enum values can actually be faster than comparing strings to strings.
Meanwhile, if you're worried about the cost of invoking the values() method versus already having a reference to the array, don't worry. Method invocation in Java is (now) blazingly fast, and any time it actually matters to performance, the method invocation will be inlined by the compiler anyway.
So, seriously, don't worry about it. Concentrate on code readability instead, and use Enum so that the compiler will catch it if you ever try to use a constant value that your code wasn't expecting to handle.
If you're curious about why enum comparisons might be faster than string comparisons, here are the details:
It depends on whether the strings have been interned or not. For Enum objects, there is always only one instance of each enum value in the system, and so each call to Enum.equals() can be done very quickly, just as if you were using the == operator instead of the equals() method. In fact, with Enum objects, it's safe to use == instead of equals(), whereas that's not safe to do with strings.
For strings, if the strings have been interned, then the comparison is just as fast as with an Enum. However, if the strings have not been interned, then the String.equals() method actually needs to walk the list of characters in both strings until either one of the strings ends or it discovers a character that is different between the two strings.
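A quick way to see the difference described above (enum constants are singletons, so == is safe and cheap; un-interned Strings need a character-by-character equals()):

public class EnumVsStringCompare {
    enum Color { RED, GREEN }

    public static void main(String[] args) {
        System.out.println(Color.RED == Color.RED); // true, one shared instance per constant
        String a = new String("RED");
        String b = new String("RED");
        System.out.println(a == b);      // false, two distinct un-interned objects
        System.out.println(a.equals(b)); // true, compares the characters
    }
}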
But again, this likely doesn't matter, even in Swing rendering code that must execute quickly. :-)
@Ben Lings points out that Enum.values() must do a defensive copy, since arrays are mutable and it's possible you could replace a value in the array that is returned by Enum.values(). This means that you do have to consider the cost of that defensive copy. However, copying a single contiguous array is generally a fast operation, assuming that it is implemented "under the hood" using some kind of memory-copy call rather than naively iterating over the elements in the array. So, I don't think that changes the final answer here.
As a rule of thumb: before thinking about optimizing, do you have any indication that this code could slow down your application?
Now, the facts.
Enums are, to a large extent, syntactic sugar handled during compilation. As a consequence, the values method defined for an enum class returns a collection of constants built at class initialization, with performance that can be considered roughly equivalent to that of an array.
If you're concerned about performance, then measure.
From the code, I wouldn't expect any surprises, but 90% of all performance guesswork is wrong. If you want to be safe, consider moving the enums up into the calling code (i.e. public String getValue(ColumnValues value) { return value.toString(); }).
use this:
private enum ModelObject { NODE, SCENE, INSTANCE, URL_TO_FILE, URL_TO_MODEL,
ANIMATION_INTERPOLATION, ANIMATION_EVENT, ANIMATION_CLIP, SAMPLER, IMAGE_EMPTY,
BATCH, COMMAND, SHADER, PARAM, SKIN }
private static final ModelObject int2ModelObject[] = ModelObject.values();
If you're iterating through your enum values just to look for a specific value, you can statically map the enum values to integers. This pushes the performance impact onto class load, and makes it easy and low-impact to get specific enum values based on a mapped parameter.
import java.util.HashMap;
import java.util.Map;

public enum ExampleEnum {
    value1(1),
    value2(2),
    valueUndefined(Integer.MAX_VALUE);

    private final int enumValue;
    private static Map<Integer, ExampleEnum> enumMap;

    ExampleEnum(int value) {
        enumValue = value;
    }

    static {
        enumMap = new HashMap<Integer, ExampleEnum>();
        for (ExampleEnum exampleEnum : ExampleEnum.values()) {
            enumMap.put(exampleEnum.enumValue, exampleEnum);
        }
    }

    public static ExampleEnum getExampleEnum(int value) {
        return enumMap.containsKey(value) ? enumMap.get(value) : valueUndefined;
    }
}
I think yes. And it is more convenient to use Constants.
