What I want to do is take a decimal integer, convert it into hexadecimal, and then separate the bytes.
It's my understanding that ByteBuffer is the best way to do this. The integer will not exceed 65535, so the value is guaranteed to fit in 2 bytes. As an example, I have an integer of 40000 (hex value 9C40).
int n1 = 40000;
ByteBuffer b = ByteBuffer.allocate(2);
b.putInt(n1);
However, I get the following error when I run the program:
Exception in thread "main" java.nio.BufferOverflowException
What am I doing wrong? Shouldn't 9C40 be written into b (with b[0] = 9C and b[1] = 40)?
Also, once I get past this, if I want to convert the value stored in b[0] (which is 9C) to decimal (which is 156), would I just use the following code?
int n2 = b.get(0);
As you are working with a ByteBuffer, it stores exactly the number of bytes you allocate. You allocated 2 bytes, but then you try to store a datatype that is 4 bytes in size, so the buffer runs out of bounds, as the exception message says. If you want to store this data in a two-byte buffer, either use a short (16 bits = 2 bytes) or allocate 4 bytes for your ByteBuffer.
With short:
ByteBuffer bb = ByteBuffer.allocate(2);
short myShort = (short) 40000;
bb.putShort(myShort);
System.out.println(String.format("%02X, %02X", bb.get(0), bb.get(1)));
With int:
ByteBuffer bb = ByteBuffer.allocate(4);
int myInt = 40000;
bb.putInt(myInt);
System.out.println(String.format("%02X, %02X", bb.get(2), bb.get(3)));
Output: 9C, 40
The data type you used to store the number 40000 is int, which requires 4 bytes of space. Yes, I know the number won't exceed 65535, but the computer doesn't. You have to change it to an appropriate data type that can be stored in 2 bytes.
That data type is short.
But there's another problem if you use short: you can't really store 40000 in it, because short in Java is signed, so its max value is 32767.
So to store your 40000, you have to store -25536 instead in a short, because of overflow.
short n1 = (short)40000; // this will cause n1 to store -25536
ByteBuffer b = ByteBuffer.allocate(2);
b.putShort(n1);
Now it's time to print out the bytes. Bytes in Java are signed as well. So if you print this:
System.out.println(b.get(0));
System.out.println(b.get(1));
You'd get
-100
64
64 is expected since 64 in hex is 40, but why -100? Since bytes are signed, 156 can't be represented as 156. 156 in a signed byte is -100.
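If you want the unsigned value 156 back, mask the byte with 0xFF to zero-extend it into an int before using it; a minimal addition to the code above:
int n2 = b.get(0) & 0xFF; // 0x9C read as -100, zero-extended to 156
System.out.println(n2);   // prints 156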
Instead of ByteBuffer, I prefer the Integer class, which can convert the integer value to a hex string; you can then pick out each byte by its position in the string.
Use the following code to do that:
int n = 40000;
String hex = Integer.toHexString(n);
This way you can get the hex value of any integer; to extract a single byte, take the corresponding two hex digits with the substring() method of the String class.
You can get the hex value back as an integer using the valueOf() method of the Integer class, which takes two arguments: a String and a radix.
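Here is a minimal sketch of that round trip (variable names are illustrative):
int n = 40000;
String hex = Integer.toHexString(n);             // "9c40"
String highByteHex = hex.substring(0, 2);        // "9c"
String lowByteHex = hex.substring(2, 4);         // "40"
int highByte = Integer.valueOf(highByteHex, 16); // 156
int lowByte = Integer.valueOf(lowByteHex, 16);   // 64
System.out.println(highByte + ", " + lowByte);   // 156, 64
Note that Integer.toHexString() does not zero-pad, so for values below 0x1000 you would need to left-pad the string to four digits before taking substrings.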
I use a HashMap to store a Q-table for an implementation of a reinforcement learning algorithm. My HashMap needs to store 15000000 entries. When I ran my algorithm, I saw that the memory used by the process was over 1000000K. By my calculation, it should use no more than 530000K. I tried to write an example and got the same high memory usage:
public static void main(String[] args) {
    HashMap<Integer, Integer> map = new HashMap<>(16_000_000, 1);
    for (int i = 0; i < 15_000_000; i++) {
        map.put(i, i);
    }
}
My memory calculation:
Each entry is 32 bytes
Capacity is 15000000
HashMap Instance uses: 32 * SIZE + 4 * CAPACITY
memory = (15000000 * 32 + 15000000 * 4) / 1024 = 527343.75K
Where I'm wrong in my memory calculations?
Well, in the best case, we assume a word size of 32 bits/4 bytes (with CompressedOops and CompressedClassesPointers). Then, a map entry consists of two words JVM overhead (klass pointer and mark word), key, value, hashcode and next pointer, making 6 words total, in other words, 24 bytes. So having 15,000,000 entry instances will consume 360 MB.
Additionally, there’s the array holding the entries. The HashMap uses capacities that are a power of two, so for 15,000,000 entries, the array size is at least 16,777,216, consuming 64 MiB.
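For reference, here is a sketch equivalent to HashMap's private tableSizeFor method (the form used in recent JDKs), which rounds a requested capacity up to the next power of two:
// Rounds cap up to the next power of two, capped at 2^30.
static int tableSizeFor(int cap) {
    int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
    return (n < 0) ? 1 : (n >= (1 << 30)) ? (1 << 30) : n + 1;
}
// tableSizeFor(15_000_000) returns 16_777_216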
Then, you have 30,000,000 Integer instances. The problem is that map.put(i, i) performs two boxing operations and while the JVM is encouraged to reuse objects when boxing, it is not required to do so and reusing won’t happen in your simple program that is likely to complete before the optimizer ever interferes.
To be precise, the first 128 Integer instances are reused, because for values in the -128 … +127 range, sharing is mandatory, but the implementation does this by initializing the entire cache on the first use, so for the first 128 iterations, it doesn’t create two instances, but the cache consists of 256 instances, which is twice that number, so we end up again with 30,000,000 Integer instances total. An Integer instance consist of at least the two JVM specific words and the actual int value, which would make 12 bytes, but due to the default alignment, the actually consumed memory will be 16 bytes, dividable by eight.
So the 30,000,000 created Integer instances consume 480 MB.
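You can observe the mandatory small-value cache directly; autoboxing compiles to Integer.valueOf():
Integer a = 127, b = 127;   // inside the cached range: same instance
Integer c = 128, d = 128;   // outside the range: two fresh instances
System.out.println(a == b); // true
System.out.println(c == d); // false (with the default cache size)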
This makes a total of 360 MB + 64 MiB + 480 MB, which is more than 900 MB, making a heap size of 1 GB entirely plausible.
But that’s what profiling tools are for. After running your program, I got comparable figures from a memory profiler.
Note that this tool only reports the used size of the objects, i.e. the 12 bytes for an Integer object without considering the padding that you will notice when looking at the total memory allocated by the JVM.
I sort of had the same requirement as you, so I decided to throw my thoughts in here.
1) There is a great tool for that: jol.
2) Arrays are objects too, and every object in Java has two additional headers: the mark word and the klass pointer, usually 8 and 4 bytes in size (this can be tweaked via compressed pointers, but I'm not going to go into details).
3) It is important to note the map's load factor here (because it influences the resizing of the internal array). Here is an example:
HashMap<Integer, Integer> map = new HashMap<>(16, 1);
for (int i = 0; i < 13; ++i) {
    map.put(i, i);
}
System.out.println(GraphLayout.parseInstance(map).toFootprint());

HashMap<Integer, Integer> map2 = new HashMap<>(16);
for (int i = 0; i < 13; ++i) {
    map2.put(i, i);
}
System.out.println(GraphLayout.parseInstance(map2).toFootprint());
The output of the two is different (only the relevant lines shown):
1 80 80 [Ljava.util.HashMap$Node; // first case
1 144 144 [Ljava.util.HashMap$Node; // second case
See how the size is bigger in the second case: the backing array is twice as big (32 slots). You can only put 12 entries into a 16-slot array, because the default load factor is 0.75: 16 * 0.75 = 12.
Why 144? The math here is easy: an array is an object, thus 12 bytes for the headers plus 4 bytes for the array length field. Add 32 * 4 bytes for the references, and you get a total of 144 bytes, already aligned to 8 bytes.
4) Entries are stored inside either a Node or a TreeNode inside the map (a Node is 32 bytes, a TreeNode 56 bytes). As you use ONLY Integers, you will have only Nodes, as there should be no hash collisions. Even if there were collisions, a bucket is only converted to a TreeNode past a threshold (8 entries in one bucket). We can easily prove that there will be Nodes only:
public static void main(String[] args) {
    Map<Integer, List<Integer>> map = IntStream.range(0, 15_000_000).boxed()
            .collect(Collectors.groupingBy(WillThereBeTreeNodes::hash)); // WillThereBeTreeNodes - current class name
    System.out.println(map.size());
}

private static int hash(Integer key) {
    int h;
    return (h = key.hashCode()) ^ h >>> 16;
}
The result of this will be 15_000_000; there was no merging, thus no hash collisions.
5) When you create Integer objects, there is a pool for them (ranging from -128 to 127; this can be tweaked as well, but let's not, for simplicity).
6) An Integer is an object, thus it has a 12-byte header and 4 bytes for the actual int value.
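You can confirm that layout with jol as well; with compressed oops, the following reports a 12-byte header, the 4-byte value, and a 16-byte instance size:
System.out.println(ClassLayout.parseClass(Integer.class).toPrintable());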
With this in mind, let's try to see the output for 15_000_000 entries (since you are using a load factor of one, there is no need for an internal capacity of 16_000_000). It will take a lot of time, so be patient. I also ran it with:
-Xmx12G -Xms12G
HashMap<Integer, Integer> map = new HashMap<>(15_000_000, 1);
for (int i = 0; i < 15_000_000; ++i) {
    map.put(i, i);
}
System.out.println(GraphLayout.parseInstance(map).toFootprint());
Here is what jol said:
java.util.HashMap#9629756d footprint:
COUNT AVG SUM DESCRIPTION
1 67108880 67108880 [Ljava.util.HashMap$Node;
29999872 16 479997952 java.lang.Integer
1 48 48 java.util.HashMap
15000000 32 480000000 java.util.HashMap$Node
44999874 1027106880 (total)
Let's start from the bottom.
The total size of the HashMap footprint is 1,027,106,880 bytes, or roughly 1,027 MB.
A Node instance is the wrapper class in which each entry resides. It has a size of 32 bytes; there are 15 million entries, hence the line:
15000000 32 480000000 java.util.HashMap$Node
Why 32 bytes? It stores the hashcode (4 bytes), key reference (4 bytes), value reference (4 bytes), next Node reference (4 bytes), plus a 12-byte header and 4 bytes of padding, resulting in 32 bytes total.
1 48 48 java.util.HashMap
A single HashMap instance: 48 bytes for its internals.
If you really want to know why 48 bytes:
System.out.println(ClassLayout.parseClass(HashMap.class).toPrintable());
java.util.HashMap object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 12 (object header) N/A
12 4 Set AbstractMap.keySet N/A
16 4 Collection AbstractMap.values N/A
20 4 int HashMap.size N/A
24 4 int HashMap.modCount N/A
28 4 int HashMap.threshold N/A
32 4 float HashMap.loadFactor N/A
36 4 Node[] HashMap.table N/A
40 4 Set HashMap.entrySet N/A
44 4 (loss due to the next object alignment)
Instance size: 48 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
Next the Integer instances:
29999872 16 479997952 java.lang.Integer
30 million Integer objects, minus the 128 that are shared from the cache pool
1 67108880 67108880 [Ljava.util.HashMap$Node;
We have 15_000_000 entries, but the internal array of a HashMap is always a power-of-two size, so it holds 16,777,216 references of 4 bytes each.
16_777_216 * 4 = 67_108_864 bytes, plus the 12-byte header and the 4-byte array length field = 67_108_880
I have a file of ~50 million strings that I need to add to a symbol table of some sort on startup, then search several times with reasonable speed.
I tried using a DLB trie, since lookup would be relatively fast given that all strings are < 10 characters, but while populating the DLB I would get either a "GC overhead limit exceeded" or an "OutOfMemoryError: Java heap space". The same errors occurred with HashMap. This is for an assignment that will be compiled and run by a grader, so I would rather not just allocate more heap space. Is there a different data structure that would use less memory while still having reasonable lookup time?
If you expect low prefix sharing, then a trie may not be your best option.
Since you only load the lookup table once, at startup, and your goal is low memory footprint with "reasonable speed" for lookup, your best option is likely a sorted array and binary search for lookup.
First, you load the data into an array. Since you likely don't know the size up front, you load into an ArrayList. You then extract the final array from the list.
Assuming you load 50 million 10 character strings, memory will be:
10 character string:
String: 12 byte header + 4 byte 'hash' + 4 byte 'value' ref = 24 bytes (aligned)
char[]: 12 byte header + 4 byte 'length' + 10 * 2 byte 'char' = 40 bytes (aligned)
Total: 24 + 40 = 64 bytes
Array of 50 million 10 character strings:
String[]: 12 byte header + 4 byte 'length' + 50,000,000 * 4 byte 'String' ref = 200,000,016 bytes
Values: 50,000,000 * 64 bytes = 3,200,000,000 bytes
Total: 200,000,016 + 3,200,000,000 = 3,400,000,016 bytes = 3.2 GB
You will need another copy of the String[] when you convert the ArrayList<String> to String[]. The Arrays.sort() operation may need 50% array size (~100,000,000 bytes) for temporary storage, but if ArrayList is released for GC before you sort, that space can be reused.
So, total requirement is ~3.5 GB, just for the symbol table.
Now, if space is truly at a premium, you can squeeze that. As you can see, the String itself adds an overhead of 24 bytes, out of the 64 bytes. You can make the symbol table use char[] directly.
Also, if your strings are all US-ASCII or ISO-8859-1, you can convert the char[] to a byte[], saving half the bytes.
Combined, that reduces the value size from 64 bytes to 32 bytes, and the total symbol table size from 3.2 GB to 1.8 GB, or roughly 2 GB during loading.
UPDATE
Assuming the input list of strings is already sorted, below is an example of how to do this. As an MCVE, it just uses a small static array as input, but you can easily read the strings from a file instead.
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Test {
    public static void main(String[] args) {
        String[] wordsFromFile = { "appear", "attack", "cellar", "copper",
                                   "erratic", "grotesque", "guitar", "guttural",
                                   "kittens", "mean", "suit", "trick" };
        List<byte[]> wordList = new ArrayList<>();
        for (String word : wordsFromFile) // Simulating read from file
            wordList.add(word.getBytes(StandardCharsets.US_ASCII));
        byte[][] symbolTable = wordList.toArray(new byte[wordList.size()][]);
        test(symbolTable, "abc");
        test(symbolTable, "attack");
        test(symbolTable, "car");
        test(symbolTable, "kittens");
        test(symbolTable, "xyz");
    }
    private static void test(byte[][] symbolTable, String word) {
        int idx = Arrays.binarySearch(symbolTable,
                                      word.getBytes(StandardCharsets.US_ASCII),
                                      Test::compare);
        if (idx < 0)
            System.out.println("Not found: " + word);
        else
            System.out.println("Found : " + word);
    }
    private static int compare(byte[] w1, byte[] w2) {
        for (int i = 0, cmp; i < w1.length && i < w2.length; i++)
            if ((cmp = Byte.compare(w1[i], w2[i])) != 0)
                return cmp;
        return Integer.compare(w1.length, w2.length);
    }
}
Output
Not found: abc
Found : attack
Not found: car
Found : kittens
Not found: xyz
Use a single char array to store all strings (sorted), and an array of integers for the offsets. String n spans the chars from offset[n - 1] (inclusive) to offset[n] (exclusive), treating offset[-1] as zero.
Memory usage will be 1 GB (50M * 10 * 2) for the char array, and 200 MB (50M * 4) for the offset array. Very compact, even with two-byte chars.
You will have to build this array by merging smaller sorted string arrays in order not to exceed your heap space. But once you have it, it should be reasonably fast.
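Here is a minimal sketch of a binary search over such a packed layout (the helper names compareAt and contains are illustrative, not part of any library):
// Compares string n of the packed table against the search key.
static int compareAt(char[] chars, int[] offsets, int n, String key) {
    int start = (n == 0) ? 0 : offsets[n - 1]; // offset[-1] is implicitly zero
    int end = offsets[n];
    for (int i = 0; start + i < end && i < key.length(); i++) {
        int cmp = Character.compare(chars[start + i], key.charAt(i));
        if (cmp != 0)
            return cmp;
    }
    return Integer.compare(end - start, key.length());
}

// Standard binary search; the table must be sorted.
static boolean contains(char[] chars, int[] offsets, String key) {
    int lo = 0, hi = offsets.length - 1;
    while (lo <= hi) {
        int mid = (lo + hi) >>> 1;
        int cmp = compareAt(chars, offsets, mid, key);
        if (cmp == 0)
            return true;
        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return false;
}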
Alternatively, you could try a memory-optimized trie implementation such as https://github.com/rklaehn/radixtree . This uses not just prefix sharing but also structural sharing for common suffixes, so unless your strings are completely random, it should be quite compact. See its space-usage benchmark. But it is Scala, not Java.
Here's a question about memory allocation in Java.
Suppose I have an array of ints A[100] and another array of ints B[10][10]. Do they need the same amount of memory in Java or is it different? If the latter, what's the difference and how does it grow with N?
I'm talking here only about Ns that are the square of a positive integer, so we're talking about square 2D arrays and their possible 1D representations.
Definitely not.
In C/C++ a 2D-array allocates all memory for the 2D-array in "one chunk".
In Java, a 2D array is an "array of arrays". One-dimensional arrays are allocated in one chunk, but the inner one-dimensional arrays may be scattered across the heap. Furthermore, the "outer" array needs heap memory as well, to store the references to the 1D arrays.
So if one allocates a 2D array of dimensions m (outer) and n (inner), Java will create one array of m elements and m arrays of n elements each. The outer array just stores the references to the m inner arrays.
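A small illustration that the rows really are independent objects, and can even be shared:
int[][] a = new int[2][];    // outer array of 2 references, no rows yet
int[] row = {1, 2, 3};
a[0] = row;                  // rows are just references to 1D arrays...
a[1] = row;                  // ...two slots can even point at the same array
a[0][0] = 42;
System.out.println(a[1][0]); // prints 42: the same underlying 1D array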
This page gives a nice explanation and visualization of multidimensional arrays in Java.
This is empirical confirmation of @Turing85's answer, and a measurement of the overhead. This program alternately allocates and frees a single array of a million elements and an int[1000][1000], reporting the amount of memory in use at each step. It quickly settles down to:
Neither: 291696
1D: 4291696
Neither: 291696
2D: 4311696
showing an extra 20,000 bytes of memory in use for the 2D case.
Here is the program:
public class Test {
    public static void main(String[] args) {
        int M = 1000;
        for (int i = 0; i < 10; i++) {
            System.out.println("Neither: " + inUseMem());
            int[] oneArray = new int[M * M];
            System.out.println("1D: " + inUseMem());
            oneArray = null;
            System.out.println("Neither: " + inUseMem());
            int[][] twoArray = new int[M][M];
            System.out.println("2D: " + inUseMem());
            twoArray = null;
        }
    }
    private static long inUseMem() {
        System.gc();
        return Runtime.getRuntime().totalMemory()
                - Runtime.getRuntime().freeMemory();
    }
}
Running this program on the system of interest using the actual array sizes should show the cost of using the 2D array. If the arrays really are around 10,000 elements total, it is probably best to go with whatever makes the code more readable.
When an object is created by using “new”, memory is allocated on the heap and a reference is returned. This is also true for arrays, since arrays are objects.
Single-dimension Array
int[] arr = new int[3];
The int[] arr is just the reference to the array of 3 integers. If you create an array of 10 integers, it is the same: an array is allocated and a reference is returned.
Two-dimensional Array
Actually, we can only have one-dimensional arrays in Java. 2D arrays are basically just one-dimensional arrays of one-dimensional arrays.
int[][] arr = new int[3][];
arr[0] = new int[3];
arr[1] = new int[5];
arr[2] = new int[4];
Multi-dimensional arrays follow the same rules.
As you can see, managing a 2D array requires an extra 1D array of references, so the memory sizes differ, even though from the user's point of view both hold the same number of elements.
I would like to know exactly how much space is really allocated in memory for an object.
I'll try to explain with some examples: using a 64 bit JVM, pointer size should be 8 bytes, so:
Object singletest = new Object(); will take 8 bytes to reference the Object plus the size of the Object
Object arraytest = new Object[10]; will take 8 byte to reference the position where the array is stored plus 8*10 bytes to store the array plus the size of each Object
int singleint = new int; will take just 2 bytes, because int is a primitive type
int[] arrayint = new int[10]; will take 8 bytes to reference the position and 10*2 bytes for the elements
Moreover, this is the reason why Java allows you to write code like this:
int[][] doublearray = new int[2][];
doublearray[0] = new int[5];
doublearray[1] = new int[10];
What really happens is that an array produces a reference (aka a pointer), just like an object, so the size of the second dimension doesn't really matter at declaration time (and the sizes can differ; there is no link between them). The space taken will then be: a reference to doublearray (8 bytes), the first dimension, which is simply an array of references to the inner ones, so another 8 bytes * 2 (the first dimension's size), and finally 2 bytes * 5 plus 2 bytes * 10.
So, finally, if I have a real class like this:
class Test {
    int a, b;
    int getA() { return a; }
    void setA(int a) { this.a = a; }
    int getB() { return b; }
    void setB(int b) { this.b = b; }
}
when I call new to instantiate it, a pointer (or call it a reference, because it's Java) of 8 bytes will be used, plus 2+2 bytes to store the integers in the class.
The questions are: am I right, or did I write total nonsense? Moreover, when I don't instantiate an object but I just declare it, will 8 bytes be allocated for further use or not? And what if I assign a null value?
Meanwhile, for primitive types I'm quite sure that just declaring one allocates the requested space (if I declare an "int i" then I can immediately call i++ because no references are used, just a portion of memory is set to "0").
I searched the internet without a clear answer... I know that I wrote a lot of questions, but any help will be appreciated! (And maybe I'm not the only one interested.)
using a 64 bit JVM, pointer size should be 8 bytes,
Actually, it's usually 32-bit unless you have a maximum heap size of 32 GB or more. This is because Java uses references, not pointers (and each object is aligned on an 8-byte boundary, not a 1-byte one).
The JVM can change the size of a reference depending on which JVM you use and what the maximum heap size is.
Object singletest = new Object(); will take 8 bytes to reference the Object plus the size of the Object
The object will use about 16 bytes of heap. It may or may not use 4 bytes of stack, but it could just use a register.
Object arraytest = new Object[10];
This will use about 16 bytes for the header, plus 10 times the reference sizes (about 56 bytes in total)
int singleint = new int; will take just 2 bytes, because int is a primitive type
int is always 32-bit; you can't use new on a primitive. As it's notionally on the stack, it might use 4 bytes of stack or it might only use a register.
int[] arrayint = new int[10]; will take 8 bytes to reference the position and 10*2 bytes for the elements
Again, the object is likely to be the same size as the new Object[10] (56 bytes).
int[][] doublearray = new int[2][];
doublearray[0] = new int[5];
doublearray[1] = new int[10];
I wouldn't call it doublearray, as it could be confused with double[].
However, the size is likely to be about 16 + 2 * 4 for doublearray, plus 16 + 5*4 + 4 (for padding) and 16 + 10*4 for the two inner arrays.
Memory allocated on the heap is aligned to an 8 byte boundary.
when I call new to instantiate it, a pointer (or call it a reference, because it's Java) of 8 bytes will be used, plus 2+2 bytes to store the integers in the class.
The reference is on the stack and usually this is not included.
The object has a header of about 16 bytes and the int values take up 2 * 4 bytes.
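If you want to verify such numbers rather than estimate them, a layout tool such as OpenJDK's jol can print them. A minimal sketch, assuming jol-core is on the classpath (the class names are illustrative):
import org.openjdk.jol.info.ClassLayout;

class TwoInts {
    int a, b;
}

public class LayoutCheck {
    public static void main(String[] args) {
        // With compressed oops this typically shows a 12-byte header,
        // the two ints at offsets 12 and 16, and a 24-byte instance
        // size (padded to the 8-byte alignment).
        System.out.println(ClassLayout.parseClass(TwoInts.class).toPrintable());
    }
}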
when I don't instantiate an object but I just declare it
Java doesn't let you declare Objects, only primitives and references.
what if I assign a null value?
That could change the value of the reference, but otherwise nothing happens.
if I declare an "int i" then I can immediately call i++ because no references are used, just a portion of memory is set to "0"
No heap will be used, and possibly no stack either (possibly 4 bytes). Possibly the JIT compiler will remove the code if it doesn't do anything.
maybe I'm not the only one interested
... but was not too afraid to ask. ;)