Java object memory usage - IBM JVM 1.4.2

Is it possible to find the memory usage of an object in Java from within the application?
I want the object's memory usage to be part of the debug output while the application runs.
I don't want to connect to the VM with an external application.
My problem is that a few classes eat up a huge amount of memory and cause memory
problems that crash my app. I need to find that memory usage (I am working with limited memory resources).
EDIT: I am using Java 1.4 :/

See my pet project, MemoryMeasurer. A tiny example:
long memory = MemoryMeasurer.measureBytes(new HashMap());
You may also derive a more qualitative memory breakdown:
Footprint footprint = ObjectGraphMeasurer.measure(new HashMap());
For example, I used the latter to derive the per-entry cost of various data structures, where the overhead is measured in the number of objects created, references, and primitives, instead of just bytes (which is also doable). So, next time you use a (default) HashSet, you can be informed that each element in it costs 1 new object (not your element), 5 references, and an int, which is exactly the cost of an entry in HashMap (not unexpectedly, since every HashSet element ends up in a HashMap), and so on.
You can use it on any object graph. If your object graph contains links to other structures you wish to ignore, you should use a predicate to avoid exploring them.
Edit: Instrumentation is not available in Java 1.4 (wow, people still use that?!), so the measureBytes call above won't work for you. But the second one will. Then you can write something like this (if you are on a 32-bit machine):
long memory = footprint.getObjects() * 8 + footprint.getReferences() * 4 +
        footprint.getPrimitives().count(int.class) * 4 +
        footprint.getPrimitives().count(long.class) * 8 + ...;
That gives you an approximation. A better answer would be to round this up to the closest multiple of 16:
long alignedMemory = (memory + 15) & ~0xF; // the last part zeroes the lowest 4 bits
But the answer might still be off: if you find, say, 16 booleans, it's one thing if they live in the same object and quite another if they are spread over multiple objects (where they cause extra space usage due to alignment). This logic could be implemented as another visitor (similar to how MemoryMeasurer and ObjectGraphMeasurer are implemented, quite simply, as you may see), but I didn't bother, since that's what Instrumentation does, so it would only make sense for Java versions below 1.5.
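Putting the two steps together, a minimal sketch (assuming a 32-bit JVM and the Footprint accessors used above; extend the primitive terms for the remaining types the same way):
static long estimateBytes(Footprint footprint) {
    long bytes = footprint.getObjects() * 8L               // assumed 8-byte header per object
            + footprint.getReferences() * 4L               // 4-byte references on 32-bit
            + footprint.getPrimitives().count(int.class) * 4L
            + footprint.getPrimitives().count(long.class) * 8L;
    return (bytes + 15) & ~0xFL;                           // round up to a multiple of 16
}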

Eclipse MAT is a really good tool to analyze memory.

There are tools that come with the JDK, such as jmap and jhat, which provide object-level details.
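For example, on a HotSpot JDK (jhat shipped with JDK 6 through 8):
jmap -histo <pid>                          # per-class instance counts and byte totals
jmap -dump:format=b,file=heap.bin <pid>    # write a full binary heap dump
jhat heap.bin                              # browse the dump at http://localhost:7000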

The following link provides Java code for computing the size of objects:
http://www.javaworld.com/javaworld/javatips/jw-javatip130.html

Related

Best way to optimize string data in an application that allocates quite a bit of it

I have an application that uses a ton of String objects. One of my objects (let's call it Person) contains 9 of them. The data written to each String object is never written more than once, but will be read several times afterwards. There will be several hundred thousand or so Person objects at a given time, and many of these Person objects will share first name, last name, etc.
I am trying to think of immediate ways to reduce the amount of memory consumed by the Person object, but I am no expert on how Java manages its memory underneath.
Before I go down this rabbit hole, I would like to know what drawbacks there would be if I went down these paths, and whether they even make sense in the first place:
Using StringBuilder or StringBuffer, solely because of the trimToSize() method, which would allow me to reduce the number of bytes allocated for the string.
Storing the strings as byte[] arrays and providing a getter that converts the byte[] to a String and a setter that accepts a String and converts it to a byte[]. The data is read quite a bit, so would this be too expensive?
Creating a hash table of (let's just say) "names" that would prevent duplicate allocations of the same name over and over, storing a reference instead (there could be thousands of names with 10+ characters).
Before I pointlessly head down any of these roads: does it make sense to do so? Maybe Java already reduces String allocations and checks for duplicates?
I don't mind a good read either. I have found some documentation, but nothing that explores this in depth.
Obviously, StringBuilder and StringBuffer won't help in this case. String is an immutable object, so these two classes were introduced for building Strings, not for storing them. You may (and in most cases should) use StringBuilder if you concatenate Strings or insert/delete characters in the middle of them.
In my opinion, the second option could actually increase memory consumption, because a new String is created every time the byte[] is converted back to a String.
A handwritten StringDeduplicator is a very reasonable solution, especially if you are stuck with Java 5, 6, or 7.
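For illustration, a minimal sketch of such a deduplicator (the class body here is an assumption, not a known library; unlike String.intern(), the pool lives on the ordinary heap and can be discarded once loading is done):
import java.util.HashMap;
import java.util.Map;

final class StringDeduplicator {
    private final Map<String, String> pool = new HashMap<String, String>();

    // Returns a canonical instance: equal strings end up sharing one object.
    String dedup(String s) {
        if (s == null) return null;
        String canonical = pool.get(s);
        if (canonical == null) {
            pool.put(s, s); // first occurrence becomes the canonical copy
            canonical = s;
        }
        return canonical;
    }
}
With this, person.firstName = deduplicator.dedup(firstName); lets the hundreds of thousands of Person objects that share a name also share a single String instance.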
Java 8/9 has a String Deduplication option, which is disabled by default. To use it in Java 8 you must enable the G1 garbage collector; in Java 9, G1 is the default.
-XX:+UseStringDeduplication
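For example (both flags are standard HotSpot options; the main class name is a stand-in):
java -XX:+UseG1GC -XX:+UseStringDeduplication MyApp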
Regarding String Deduplication, see:
JEP 192: String Deduplication in G1
Java 8 Update 20 Release Notes
Other Stack Overflow posts

Spark Streaming GC setup questions

My logic is as follows:
Use createDirectStream to get a topic by log type from Kafka.
After repartitioning, the log goes through various processing steps.
Create a single string per log type using combineByKey (with a StringBuilder).
Finally, save to HDFS by log type.
There are a lot of operations that append strings, so GC happens frequently.
What is the best way to set up GC in this situation?
There is various logic involved, but I think the problem is in the combineByKey step.
rdd.combineByKey[StringBuilder](
  (s: String) => new StringBuilder(s),
  (sb: StringBuilder, s: String) => sb.append(s),
  (sb1: StringBuilder, sb2: StringBuilder) => sb1.append(sb2)
).mapValues(_.toString)
The simplest change with the greatest impact you can make to that combineByKey expression is to size the StringBuilder you create so that it does not have to expand its backing character array as you merge string values into it; the resizing amplifies the allocation rate and wastes memory bandwidth by copying from the old to the new backing array. As a guesstimate, I would pick the 90th percentile of the string lengths of the resulting data set's records.
A second thing to look at (after collecting some statistics on your intermediate values) would be for your combiner function to pick the StringBuilder instance that has room to fit the other one when you call sb1.append(sb2).
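A sketch of both ideas (in Java, although the question's snippet is Scala; ESTIMATED_P90 is a hypothetical constant derived from your data, and the merge assumes the concatenation order of values does not matter):
static final int ESTIMATED_P90 = 4096; // hypothetical: 90th percentile of result lengths

static StringBuilder createCombiner(String s) {
    // pre-size so append() rarely has to grow the backing array
    return new StringBuilder(Math.max(s.length(), ESTIMATED_P90)).append(s);
}

static StringBuilder mergeCombiners(StringBuilder a, StringBuilder b) {
    // append into whichever builder already has room for the other one
    if (a.capacity() - a.length() >= b.length()) return a.append(b);
    if (b.capacity() - b.length() >= a.length()) return b.append(a);
    return a.append(b); // neither fits: one resize is unavoidable
}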
It is also worth moving to Java 8; it has optimizations that make a significant difference when there is heavy work on strings and string buffers.
Last but not least, profile to see where you are actually spending your cycles. This workload (excluding any additional custom processing you are doing) shouldn't need to promote a lot of objects (if any) to old generation, so you should make sure that young generation has ample size and is collected in parallel.

JVM Tuning of a Java Class

My Java class reads in a 60MB file and produces a HashMap of HashMaps with over 300 million records.
HashMap<Integer, HashMap<Integer, Double>> pairWise =
        new HashMap<Integer, HashMap<Integer, Double>>();
I have already tuned the VM arguments to:
-Xms512M -Xmx2048M
But system still goes for:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.createEntry(HashMap.java:869)
    at java.util.HashMap.addEntry(HashMap.java:856)
    at java.util.HashMap.put(HashMap.java:484)
    at com.Kaggle.baseline.BaselineNew.createSimMap(BaselineNew.java:70)
    at com.Kaggle.baseline.BaselineNew.<init>(BaselineNew.java:25)
    at com.Kaggle.baseline.BaselineNew.main(BaselineNew.java:315)
How big a heap would it take to run without failing with an OOME?
Your dataset is far too large to process entirely in memory; this is not a final solution, just an optimization.
You're using boxed primitives, which is a very painful thing to look at.
According to this question, a boxed integer can be 20 bytes larger than an unboxed integer. This is not what I call memory efficient.
You can optimize this with specialized collections, which don't box the primitive values. One project providing these is Trove. You could use a TIntDoubleMap instead of your HashMap<Integer, Double> and a TIntObjectHashMap instead of your HashMap<Integer, …>.
Therefore your type would look like this:
TIntObjectHashMap<TIntDoubleHashMap> pairWise =
        new TIntObjectHashMap<TIntDoubleHashMap>();
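A minimal usage sketch, assuming Trove 3 (package gnu.trove.map.hash) is on the classpath:
import gnu.trove.map.hash.TIntDoubleHashMap;
import gnu.trove.map.hash.TIntObjectHashMap;

TIntObjectHashMap<TIntDoubleHashMap> pairWise =
        new TIntObjectHashMap<TIntDoubleHashMap>();
TIntDoubleHashMap inner = new TIntDoubleHashMap();
inner.put(42, 0.5);     // primitive int key, primitive double value: no boxing
pairWise.put(7, inner);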
Now, do the math.
300,000,000 boxed Doubles, at 24 bytes each, use 7,200,000,000 bytes of memory, that is 7.2 GB.
If you store 300,000,000 doubles as primitives, taking 8 bytes each, you only need 2,400,000,000 bytes, which is 2.4 GB.
Congrats, you saved around 67% of the memory you previously used for storing your numbers!
Note that this calculation is rough; it depends on the platform and implementation, and it does not account for the memory used by the HashMap/T*Maps themselves.
Your data set is large enough that holding all of it in memory at one time is not going to happen.
Consider storing the data in a database and loading partial data sets to perform manipulation.
Edit: My assumption was that you were going to make more than one pass over the data. If all you are doing is loading it and performing one action on each item, then Lex Webb's suggestion (comment below) is a better solution than a database. If you are performing more than one action per item, then a database appears to be the better solution. The database does not need to be an SQL database; if your data is record-oriented, a NoSQL database might be a better fit.
You are using the wrong data structures for data of this volume. Java adds significant overhead in memory and time for every object it creates, and at the 300-million-object level you are looking at a lot of overhead. You should consider leaving this data in the file and using random-access techniques to address it in place; take a look at memory-mapped files using NIO.
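A minimal sketch of the memory-mapped approach, assuming a hypothetical file "pairs.bin" of fixed-width records (a 4-byte int key followed by an 8-byte double value); for files over 2 GB you would map the file in windows:
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

RandomAccessFile raf = new RandomAccessFile("pairs.bin", "r");
FileChannel ch = raf.getChannel();
MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());

int recordSize = 4 + 8;                           // int key + double value
int i = 1000;                                     // read the i-th record in place
int key = buf.getInt(i * recordSize);
double value = buf.getDouble(i * recordSize + 4);
raf.close();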

Memory problems with Java in the context of Hadoop

I want to compute a multiway join in the Hadoop framework. When the records of each relation grow beyond a threshold, I face two memory problems:
1) Error: GC overhead limit exceeded,
2) Error: Java heap space.
The threshold is about 1,000,000 records per relation for a chain join and a star join.
In the join computation I use some hash tables, i.e.
Hashtable<V, LinkedList<K>> ht = new Hashtable<V, LinkedList<K>>(someSize, 0.75F);
These errors occur when I hash the input records, and for the moment only then. During the hashing I have quite a few for loops which produce a lot of temporary objects; this is the cause of problem 1). I solved problem 1) by setting K = StringBuilder, which is a final class. In other words, I reduced the number of temporary objects by keeping only a few objects whose content changes, rather than creating new objects.
Now I am dealing with problem 2). I increased the heap space on each of the nodes of my cluster by setting the appropriate variable in the file $HADOOP_HOME/hadoop/conf/hadoop-env.sh. The problem still remained. I did some very basic monitoring of the heap using VisualVM. I monitored only the master node, specifically the JobTracker and the local TaskTracker daemons. I didn't notice any heap overflow during this monitoring, and the PermGen space didn't overflow either.
So, for the moment, in the declaration
Hashtable<V, LinkedList<K>> ht = new Hashtable<V, LinkedList<K>>(someSize, 0.75F);
I am thinking of setting V = SomeFinalClass. This SomeFinalClass would help me keep the number of objects, and consequently the memory usage, low. Of course, by default a SomeFinalClass object has the same hash code regardless of its content, so I would not be able to use it as a key in the hash table above. To solve this, I am thinking of overriding the default hashCode() method with one similar to String.hashCode(), producing a hash code based on the content of the SomeFinalClass object.
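A sketch of what such a class might look like (SomeFinalClass is the question's hypothetical name; note that the content must not change while the object is being used as a key):
final class SomeFinalClass {
    private final StringBuilder content = new StringBuilder();

    // Reuse the same object by overwriting its content.
    void set(String s) {
        content.setLength(0);
        content.append(s);
    }

    @Override public int hashCode() {
        int h = 0;
        for (int i = 0; i < content.length(); i++)
            h = 31 * h + content.charAt(i); // same polynomial as String.hashCode()
        return h;
    }

    @Override public boolean equals(Object o) {
        return o instanceof SomeFinalClass
                && content.toString().contentEquals(((SomeFinalClass) o).content);
    }
}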
What is your opinion on the problems and solutions above? What would you do?
Should I also monitor the DataNode daemon? Are both of the errors above TaskTracker errors, DataNode errors, or both?
Finally, will the solutions above solve the problem for an arbitrary number of records per relation? Or will I sooner or later hit the same problem again?
Use an ArrayList instead of a LinkedList and it will use a lot less memory.
I also suggest using a HashMap instead of a Hashtable, as the latter is a legacy class.

Does Immutability of Strings in Java cause Out Of Memory

I have written a simple Java program that reads a million rows from the Database and writes them to a File.
The max memory that this program can use is 512M.
I frequently notice that this program runs Out Of Memory for more than 500K rows.
Since the program is very simple, it is easy to see that it doesn't have a memory leak. The way the program works is that it fetches a thousand rows from the database, writes them to a file using streams, and then fetches the next thousand rows. The size of each row varies, but none of the rows is huge. Taking a dump while the program is running, the older strings are easily seen on the heap. These Strings in the heap are unreachable, which means they are waiting to get garbage collected. I also believe that the GC doesn't necessarily run during the execution of this program, which leaves Strings in the heap longer than necessary.
I think the solution would be to use long char arrays (or StringBuffer) instead of String objects to store the lines returned by the DB. The assumption is that I can overwrite the contents of a char array, which means the same char array can be reused across iterations without allocating new space each time.
Pseudocode:
Create an array of arrays using new char[1000][1000].
Fill the thousand rows from the DB into the array.
Write the array to the file.
Reuse the same array for the next thousand rows.
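A rough sketch of that reuse scheme (readRows and the output file are hypothetical stand-ins for the DB fetch and the file output):
char[][] buffer = new char[1000][1000]; // reused across batches
int[] lengths = new int[1000];          // actual length of each row
java.io.Writer writer = new java.io.BufferedWriter(new java.io.FileWriter("out.txt"));

int n;
while ((n = readRows(buffer, lengths)) > 0) {   // readRows(...) is a hypothetical DB fetch
    for (int i = 0; i < n; i++) {
        writer.write(buffer[i], 0, lengths[i]); // Writer can write char[] slices directly
        writer.write('\n');
    }
}
writer.close();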
If the above pseudocode fixes my problem, then in reality the immutable nature of the String class hurts the Java programmer, as there is no direct way to reclaim the space used by a String even when the String is no longer in use.
Are there any better alternatives to this problem ?
P.S.: I didn't rely on static analysis alone. I used the YourKit profiler to examine a heap dump. The dump clearly shows that 96% of the Strings have no GC roots, which means they are waiting to be garbage collected. Also, I don't use substring() in my code.
Immutability of the class String has absolutely nothing to do with OutOfMemoryError. Immutability means that it cannot ever change, only that.
If you run out of memory, it is simply because the garbage collector was unable to find any garbage to collect.
In practice, it is likely that you are holding references to way too many Strings in memory (for instance, do you have any kind of collection holding strings, such as List, Set, Map?). You must destroy these references to allow the garbage collector to do its job and free up some memory.
The simple answer to this question is 'no'. I suspect you're hanging onto references longer than you think.
Are you closing those streams properly? Are you intern()ing those strings? That would result in a permanent copy of the string being made if one doesn't already exist, taking up PermGen space (which isn't collected). Are you taking substring() of a larger string? Strings make use of the flyweight pattern and will share a character array if created using substring(). See here for more details.
You suggest that garbage collection isn't running. The option -verbose:gc logs the garbage collections, so you can see immediately what's going on.
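For example (the main class name is a stand-in for your program):
java -verbose:gc -Xmx512m RowExporter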
The only thing about Strings that can cause an OutOfMemoryError is retaining small sections of much larger strings. If you are doing this, it should be obvious from a heap dump.
When you take a heap dump, I suggest you look only at live objects; any retained objects you don't need are then most likely a bug in your code.
