Following an OutOfMemoryError, I processed the resulting heap dumps through IBM Support Assistant's 64-bit Memory Analyzer (J9 VM running on WebSphere 7.0.23).
Several leak candidates were listed (all system-classloader related), but one of them appears to indicate that a char[] initialised with a capacity of 256 in a StringBuffer actually contains 77 million null characters.
The resulting heap-dump analysis from the Support Assistant shows a char[77418987] # 0xc32*** \u0000\u0000\u0000.......
This is referenced by StringBuffer -> PatternLayout -> TimeAndSizeRollingAppender.
The retained heap checks out: 2 bytes for each char plus around 18 bytes for the array object itself, for a total of 150+ MB.
Log4j version is 1.2.16, and we use the simonsite TimeAndSizeRollingAppender (though I would like to remove this dependency).
Could this be a false positive from Support Assistant or is there some way in which a char[256] can become a char[77000000+] on the heap?
By default, WebSphere generates a PHD file in response to an OOM event. One thing to be aware of is that these dumps contain information about the objects in the heap and their references, but not the actual data stored in attributes and arrays of primitive types. That's why the memory analyzer only shows zeros. To get more information about the root cause, you should configure your WebSphere to create a system dump. That will let you see the data in the array and should give you a hint about what is happening.
The following link explains how to do this:
http://pic.dhe.ibm.com/infocenter/isa/v4r1m0/topic/com.ibm.java.diagnostics.memory.analyzer.doc/producing.html
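As a rough pointer (check the linked documentation for the exact syntax on your service level), the J9 VM can be told to write a system dump when an OutOfMemoryError is thrown with a dump agent option along these lines:

```
-Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError
```

The resulting system dump can then be opened in the Memory Analyzer and, unlike a PHD file, will contain the actual char[] contents.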
For the 256 vs. 77000000+ question: 256 is only the initial capacity of the StringBuffer. It grows automatically as needed when data is appended.
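The growth is easy to demonstrate: a StringBuffer's capacity (the length of its backing char[]) starts at whatever the constructor requested and expands as data is appended. A minimal sketch:

```java
public class BufferGrowth {
    public static void main(String[] args) {
        // 256 is only the initial capacity of the backing char[]
        StringBuffer sb = new StringBuffer(256);
        System.out.println(sb.capacity()); // 256

        // Appending past the capacity forces the backing array to grow
        for (int i = 0; i < 1000; i++) {
            sb.append("0123456789");
        }
        System.out.println(sb.length());            // 10000 characters...
        System.out.println(sb.capacity() >= 10000); // ...in a char[] that grew well past 256
    }
}
```

If a layout keeps appending to the same buffer without it ever being reset, the backing array only ever grows, which is consistent with a char[256] ending up as a char[77418987].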
Related
I am currently trying to process a large text file (a bit less than 2 GB) containing lines of strings.
I load its entire content from an InputStream into a List<String> via the following snippet:
try (BufferedReader reader = new BufferedReader(new InputStreamReader(zipInputStream))) {
    List<String> data = reader.lines()
                              .collect(Collectors.toList());
}
The problem is, the file itself is less than 2 GB, but when I look at the memory, the JVM is allocating twice the size of the file:
Also, here are the heaviest objects in memory:
So what I understand is that Java allocates twice the memory needed for the operation: once to put the content of the file in a byte array, and once more to instantiate the string list.
My question is: can we optimize that and avoid needing twice the file's size in memory?
tl;dr String objects can take 2 bytes per character.
The long answer: conceptually, a String is a sequence of chars. Each char represents one code point (or half of one, but we can ignore that detail for now).
Each code point tends to represent a character (sometimes multiple code points make up one "character", but that's another detail we can ignore for this answer).
That means that if you read a 2 GB text file that was stored with a single-byte encoding (usually a member of the ISO-8859-* family) or a variable-byte encoding (mostly UTF-8), then the size in memory in Java can easily be twice the size on disk.
Now there are a good number of caveats to this, primarily that Java can (as an internal, invisible operation) use a single byte per character in a String, if and only if the characters used allow it (effectively, if they fit into the fixed internal encoding that the JVM picked). But that didn't seem to happen for you.
What can you do to avoid that? That depends on what your use-case is:
Don't use String to store the data in the first place. Odds are that this data is actually representing some structure, and if you parse it into a dedicated format, you might get away with way less memory usage.
Don't keep the whole thing in memory: more often than not, you don't actually need everything in memory at once. Instead, process and write away the data as you read it, so you never have more than a handful of records in memory at once.
Build your own string-like data type for your specific use-case. While building a full string replacement is a massive undertaking, if you know what subset of features you need it might actually be a quite surmountable challenge.
Try to make sure that the data is stored as compact strings, if possible, by figuring out why that's not already happening (this requires digging deep into the details of your JVM).
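The second option (process the data as you read it) can be sketched like this, with a hypothetical file path as the argument and line-counting standing in for the real processing:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class StreamProcess {
    public static void main(String[] args) throws IOException {
        // Each line is handled as it is read, so only a handful of
        // lines are ever held in memory, instead of the whole 2 GB.
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            long nonEmpty = reader.lines()
                                  .filter(line -> !line.isEmpty())
                                  .count();
            System.out.println(nonEmpty + " non-empty lines");
        }
    }
}
```

The key difference from the original snippet is the absence of Collectors.toList(): the stream is consumed without materializing every line at once.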
While decoding a Base64-encoded string to a byte array (I have to do this because I have a key that decrypts byte arrays), I am getting an OutOfMemoryError. What are effective ways to handle this problem? Should I split my encoded String into chunks of some size and decode each one, or is there another effective approach you can suggest?
The code causing the issue:
byte[] encrypted = Base64.decodeBase64(strEncryptedEncodedData);
Stack Trace
DefaultQuartzScheduler_Worker-3
at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
at java.lang.StringCoding$StringEncoder.encode([CII)[B (StringCoding.java:300)
at java.lang.StringCoding.encode(Ljava/lang/String;[CII)[B (StringCoding.java:344)
at java.lang.String.getBytes(Ljava/lang/String;)[B (String.java:918)
at org.apache.commons.codec.binary.StringUtils.getBytesUnchecked(Ljava/lang/String;Ljava/lang/String;)[B (StringUtils.java:156)
at org.apache.commons.codec.binary.StringUtils.getBytesUtf8(Ljava/lang/String;)[B (StringUtils.java:129)
at org.apache.commons.codec.binary.BaseNCodec.decode(Ljava/lang/String;)[B (BaseNCodec.java:306)
at org.apache.commons.codec.binary.Base64.decodeBase64(Ljava/lang/String;)[B (Base64.java:669)
Eclipse Memory Analyzer memory usage:
Edit1: Max allowed XMX is 1 GB.
Edit2: JDK version"1.8.0_91"
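For what it's worth, the chunked decoding the question asks about is essentially what java.util.Base64's streaming decoder does (available on the asker's JDK 8). A sketch with a made-up payload:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class StreamingDecode {
    public static void main(String[] args) throws IOException {
        String encoded = Base64.getEncoder()
                .encodeToString("secret payload".getBytes(StandardCharsets.UTF_8));

        // wrap() decodes in small chunks as the stream is read, so no
        // second full-size copy of the input is ever allocated at once.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = Base64.getDecoder().wrap(
                new ByteArrayInputStream(encoded.getBytes(StandardCharsets.US_ASCII)))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In the real code the decoded chunks would go straight to the decryption step rather than a ByteArrayOutputStream, which would otherwise reintroduce the full-size buffer.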
Try increasing the JVM's maximum heap size with an option like this:
-Xmx4096m
Please specify the java version you use for this code.
There are more than 10 different types of OutOfMemoryError, as listed below, and yours might be type 10, "java.lang.OutOfMemoryError: Direct buffer memory". Please check your exception stack trace for that string to confirm; if you see a different type, please share it.
I verified that the java.lang.StringCoding$StringEncoder class you shared in your exception trace uses java.nio.ByteBuffer and other related classes. You can check the import section at the URL below.
http://cr.openjdk.java.net/~sherman/7040220/webrev/src/share/classes/java/lang/StringCoding.java.html
Java applications can use native memory (not heap memory) for direct byte-buffer operations, which are fast. Some portion of native memory is allocated to the JVM for these direct byte-buffer operations. If its size is not enough, you can increase it with the VM flag -XX:MaxDirectMemorySize (e.g. -XX:MaxDirectMemorySize=10M). Increasing heap memory with the -Xmx flag would not solve this type of OutOfMemoryError. Please try the MaxDirectMemorySize flag and see whether it solves your problem.
If you want to know more details about these OutOfMemoryErrors, you can read the book Java Performance Optimization: How to avoid the 10 OutOfMemoryErrors.
1.java.lang.OutOfMemoryError: Java heap space
2.java.lang.OutOfMemoryError: Unable to create new native thread
3.java.lang.OutOfMemoryError: Permgen space
4.java.lang.OutOfMemoryError: Metaspace
5.java.lang.OutOfMemoryError: GC overhead limit exceeded
6.java.lang.OutOfMemoryError: Requested array size exceeds VM limit
7.java.lang.OutOfMemoryError: request "size" bytes for "reason". Out of swap space?
8.java.lang.OutOfMemoryError: Compressed class space
9.java.lang.OutOfMemoryError: "reason" "stack trace" (Native method)
10.java.lang.OutOfMemoryError: Direct buffer memory
I have an application that uses a ton of String objects. One of my objects (let's call it Person) contains 9 of them. The data written to each String object is written only once but will be read several times afterwards. There will be several hundred thousand Person objects at a given time, and many of these Person objects will share first name, last name, etc.
I am trying to think of immediate ways to reduce the amount of memory consumed by the Person object, but I am no expert on how Java manages its memory underneath.
Before I go down this rabbit hole, I would like to know what drawbacks each of these paths would have, and whether they even make sense in the first place:
Using StringBuilder or StringBuffer solely for the trimToSize() method, which would let me reduce the number of bytes allocated for the string.
Storing the strings as byte[] arrays and providing a getter that converts the byte[] to a String and a setter that accepts a String and converts it to a byte[]. The data is read quite a bit, so would this be too expensive?
Creating a hash table for (let's say) "names" that would prevent duplicate allocations (by sharing a reference) of the same name over and over (there could be thousands of names with 10+ characters).
Before I pointlessly head down any of these roads, does any of this make sense? Maybe Java already reduces String allocations and checks for duplicates?
I don't mind a good read either. I have found some documentation, but nothing that explores this in depth.
Obviously, StringBuilder and StringBuffer can't help in this case. String is an immutable object, and these two classes were introduced for building Strings, not for storing them. (You may, and in most cases should, still use StringBuilder when you concatenate Strings or insert/delete characters in the middle of them.)
In my opinion, the second option could lead to increased memory consumption, because a new String will be created every time the byte[] is converted back to a String.
A handwritten StringDeduplicator is a very reasonable solution, especially if you are stuck with Java 5, 6, or 7.
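A minimal sketch of such a handwritten deduplicator (the class name is made up; it is just a concurrent map from each distinct value to its first, canonical instance, and works on Java 5+):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hand-rolled deduplicator: returns one canonical instance per distinct
// string value, so thousands of equal names share a single object.
public class StringDeduplicator {
    private final Map<String, String> pool = new ConcurrentHashMap<String, String>();

    public String dedup(String s) {
        if (s == null) return null;
        String canonical = pool.putIfAbsent(s, s);
        return canonical != null ? canonical : s;
    }
}
```

Usage would be a single call in each setter, e.g. this.firstName = deduplicator.dedup(firstName), so only the canonical copies are retained.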
Java 8/9 has a String Deduplication option. By default, this option is disabled. To use it in Java 8, you must enable the G1 garbage collector; in Java 9, G1 is the default.
-XX:+UseStringDeduplication
Regarding String Deduplication, see:
JEP 192: String Deduplication in G1
Java 8 Update 20 Release Notes
Other Stack Overflow posts
I have written a simple Java program that reads a million rows from the Database and writes them to a File.
The max memory that this program can use is 512M.
I frequently notice that this program runs out of memory beyond 500K rows.
Since the program is very simple, it is easy to rule out a memory leak. The way the program works is that it fetches a thousand rows from the database, writes them to a file using streams, and then fetches the next thousand rows. The size of each row varies, but none of the rows is huge. Taking a dump while the program is running, the older strings are easily seen on the heap. These Strings are unreachable, which means they are waiting to be garbage collected. I also believe that the GC doesn't necessarily run during the execution of this program, which leaves Strings in the heap longer than they should be.
I think the solution would be to use long char arrays (or a StringBuffer) instead of String objects to store the lines returned by the DB. The assumption is that I can overwrite the contents of a char array, which means the same char array can be reused across multiple iterations without allocating new space each time.
Pseudocode :
Create an array of arrays with new char[1000][1000].
Fill a thousand rows from the DB into the array.
Write the array to the file.
Reuse the same array for the next thousand rows.
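A sketch of that buffer reuse, with a made-up fetchBatch standing in for the real DB fetch and rows.txt as a placeholder output file:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class ReuseBuffer {
    public static void main(String[] args) throws IOException {
        char[][] rows = new char[1000][1000]; // allocated once, reused every batch
        int[] lengths = new int[1000];        // actual row lengths vary per batch
        try (BufferedWriter out = new BufferedWriter(new FileWriter("rows.txt"))) {
            for (int batch = 0; batch < 3; batch++) { // stands in for "until no more rows"
                int fetched = fetchBatch(rows, lengths, batch);
                for (int i = 0; i < fetched; i++) {
                    out.write(rows[i], 0, lengths[i]); // overwrite in place, never reallocate
                    out.newLine();
                }
            }
        }
    }

    // Hypothetical stand-in for the DB fetch: fills the existing arrays in place.
    static int fetchBatch(char[][] rows, int[] lengths, int batch) {
        for (int i = 0; i < rows.length; i++) {
            String row = "batch-" + batch + "-row-" + i;
            row.getChars(0, row.length(), rows[i], 0);
            lengths[i] = row.length();
        }
        return rows.length;
    }
}
```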
If the above pseudocode fixes my problem, then in reality the immutable nature of the String class hurts the Java programmer, as there is no direct way to reclaim the space used by a String even when it is no longer in use.
Are there any better alternatives to this problem ?
P.S.: I didn't rely on static analysis alone. I used the YourKit profiler to examine a heap dump. The dump clearly shows that 96% of the Strings have no GC roots, which means they are waiting to be garbage collected. Also, I don't use substring() in my code.
Immutability of the String class has absolutely nothing to do with the OutOfMemoryError. Immutability only means that it cannot ever change.
If you run out of memory, it is simply because the garbage collector was unable to find any garbage to collect.
In practice, it is likely that you are holding references to way too many Strings in memory (for instance, do you have any kind of collection holding strings, such as List, Set, Map?). You must destroy these references to allow the garbage collector to do its job and free up some memory.
The simple answer to this question is 'no'. I suspect you're hanging onto references longer than you think.
Are you closing those streams properly? Are you intern()ing those strings? That would result in a permanent copy of the string being made if one doesn't already exist, taking up permgen space (which isn't collected). Are you taking substring()s of larger strings? Strings use the flyweight pattern and will share a character array if created using substring(). See here for more details.
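For the substring() case on those older JVMs, the usual defence was to copy the small section explicitly, for example:

```java
public class SubstringCopy {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) sb.append('x');
        String big = sb.toString();

        // On the old String implementation, substring() shares big's
        // 100,000-char array; wrapping it in new String(...) forces a
        // compact private copy, so big's array can be collected.
        String keep = new String(big.substring(0, 10));
        System.out.println(keep.length()); // 10
    }
}
```

(Later JDKs changed substring() to copy, which is why this idiom matters only for the era being discussed here.)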
You suggest that garbage collection isn't running. The option -verbose:gc will log the garbage collections and you can see immediately what's going on.
The only thing about strings which can cause an OutOfMemoryError is if you retain small sections of a much larger string. If you are doing this it should be obvious from a heap dump.
When you take a heap dump I suggest you only look at live objects, in which case any retained objects you don't need is most likely to be a bug in your code.
Is it possible to find the memory usage of an object in Java from within the application?
I want object memory usage to be part of the debug output while the application runs.
I don't want to connect an external application to the VM.
I have a problem where a few classes eat up a huge amount of memory and cause memory problems that crash my app. I need to find that memory usage (I am working with limited memory resources).
EDIT: I am using Java 1.4 :/
See my pet project, MemoryMeasurer. A tiny example:
long memory = MemoryMeasurer.measureBytes(new HashMap());
You may also derive more qualitative memory breakdown:
Footprint footprint = ObjectGraphMeasurer.measure(new HashMap());
For example, I used the latter to derive the per-entry cost of various data structures, where the overhead is measured in the number of objects, references, and primitives created, instead of just bytes (which is also doable). So, next time you use a (default) HashSet, you can be informed that each element in it costs 1 new object (not your element), 5 references, and an int, which is exactly the cost of an entry in a HashMap (not unexpectedly, since every HashSet element ends up in a HashMap), and so on.
You can use it on any object graph. If your object graph contains links to other structures you wish to ignore, you should use a predicate to avoid exploring them.
Edit: Instrumentation is not available in Java 1.4 (wow, people still use that?!), so the measureBytes call above won't work for you. But the second approach will. Then you can write something like this (if you are on a 32-bit machine):
long memory = footprint.getObjects() * 8 + footprint.getReferences() * 4 +
              footprint.getPrimitives().count(int.class) * 4 +
              footprint.getPrimitives().count(long.class) * 8 + ...;
That gives you an approximation. A better answer rounds this up to the closest multiple of 16:
long alignedMemory = (memory + 15) & ~0xF; // the last part zeros the lowest 4 bits
But the answer might still be off: if you find, say, 16 booleans, it's one thing if they are all in the same object, and quite another if they are spread across multiple objects (causing excess space usage due to alignment). This logic could be implemented as another visitor (similar to how MemoryMeasurer and ObjectGraphMeasurer are implemented, quite simply, as you may see), but I didn't bother, since that's what Instrumentation does, so it would only make sense for Java versions below 1.5.
Eclipse MAT is a really good tool to analyze memory.
There are tools that come with the JDK, such as jmap and jhat, which provide object-level details.
The following link provides a piece of Java code that computes the size of objects:
http://www.javaworld.com/javaworld/javatips/jw-javatip130.html