For 13K users I have the following memory dump; I will paste the top 7 consumers. Netty seems to consume too much memory. Is this normal?
(Netty version: 3.2.7, implementing IdleStateAwareChannelUpstreamHandler, total Netty memory usage: 2.5 GB minimum)
num #instances #bytes class name
----------------------------------------------
1: 23086640 923465600 org.jboss.netty.util.internal.ConcurrentHashMap$Segment
2: 28649817 916794144 java.util.concurrent.locks.ReentrantLock$NonfairSync
3: 23086640 554864352 [Lorg.jboss.netty.util.internal.ConcurrentHashMap$HashEntry;
4: 118907 275209504 [I
5: 5184704 207388160 java.util.concurrent.ConcurrentHashMap$Segment
6: 5184704 130874832 [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
7: 1442915 115433200 [Lorg.jboss.netty.util.internal.ConcurrentHashMap$Segment;
It looks like the memory usage is not normal.
Here are some facts about Netty's internal memory usage:
One channel has two ReentrantLocks (one read lock, one write lock).
Channel stores all channel references in an org.jboss.netty.util.internal.ConcurrentHashMap internally and automatically removes them on close (this is how unique channel ids are assigned).
ChannelGroup stores channel references in an org.jboss.netty.util.internal.ConcurrentHashMap on add() and automatically removes them on close.
There will be one ConcurrentHashMap$HashEntry per item stored in an org.jboss.netty.util.internal.ConcurrentHashMap.
So, provided your handlers are not leaking any references, you can calculate the expected memory usage.
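A rough back-of-the-envelope sketch (assuming roughly one channel per user and membership in a single ChannelGroup; both assumptions are mine, not from the question):

int channels = 13000;                        // ~13K users, assumed one channel each
long expectedLocks       = channels * 2L;    // one read lock + one write lock per channel
long expectedHashEntries = channels * 2L;    // one entry in the internal channel-id map
                                             // plus one per ChannelGroup membership
System.out.println("expected ReentrantLocks: ~" + expectedLocks);
System.out.println("expected HashEntry objects: ~" + expectedHashEntries);

Each ConcurrentHashMap segment is itself a ReentrantLock in the standard implementation, so the 23,086,640 Segment instances alone explain most of the 28,649,817 NonfairSync objects; either way, the observed counts are about three orders of magnitude above the ~26,000 that 13K channels should produce, which points at channels, or internally created ConcurrentHashMaps, being retained somewhere.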
I have a newbie question. I started looking at the Java Affinity library and I have the following code:
import net.openhft.affinity.AffinityLock;

public class AffinityTest {
    public static void main(String[] args) {
        AffinityLock l = AffinityLock.acquireLock(5);  // reserve CPU 5 and bind this thread to it
        Thread.currentThread().setName("Testing");
        System.out.println("\nThe assignment of CPUs is\n" + AffinityLock.dumpLocks());
        // busy-wait so the pinned thread stays alive while it is inspected with ps
        while (!Thread.currentThread().isInterrupted()) {
        }
    }
}
and I get the following output:
The assignment of CPUs is
0: General use CPU
1: Reserved for this application
2: Reserved for this application
3: Reserved for this application
4: Reserved for this application
5: Thread[Testing,5,main] alive=true
...
But if I run ps -alF, I can see the process is not running on PSR 5. Am I missing something obvious?
Thanks a lot!!
- Mag
A thread can be executed on different cores over time. Please read this fascinating discussion.
In this particular case you can set a breakpoint in the method net.openhft.affinity.LockInventory.set(CpuLayout cpuLayout) and step through it to find out what happens during the initialization of the logicalCoreLocks array.
It looks like the dumped data was accurate only at the moment of initialization, and no longer when it is printed to the console.
My logic is as follows.
Use createDirectStream to get a topic by log type from Kafka.
After repartitioning, the logs go through various processing steps.
Create a single string per log type using combineByKey (with a StringBuilder).
Finally, save to HDFS by log type.
There are a lot of operations that append strings, so GC happens frequently.
What is the best way to set up GC in this situation?
//////////////////////
There is various logic involved, but I think the problem is in the combineByKey step.
rdd.combineByKey[StringBuilder](
  (s: String) => new StringBuilder(s),
  (sb: StringBuilder, s: String) => sb.append(s),
  (sb1: StringBuilder, sb2: StringBuilder) => sb1.append(sb2)
).mapValues(_.toString)
The simplest thing with the greatest impact you can do with that combineByKey expression is to size the StringBuilder you create so that it does not have to expand its backing character array as you merge string values into it; the resizing amplifies the allocation rate and wastes memory bandwidth by copying from the old to the new backing array. As a guesstimate, I would say pick the 90th percentile of string length of the resulting data set's records.
A second thing to look at (after collecting some statistics on your intermediate values) would be for your combiner function to pick the StringBuilder instance that has room to fit the other one when you call sb1.append(sb2).
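Sketching both ideas together (shown with Spark's Java API purely for illustration, since the snippet above is Scala; EXPECTED_LENGTH and logsByType are assumed names, and note that merging into whichever builder has spare room can reorder fragments within a key, which is only fine if that ordering does not matter):

// logsByType is assumed to be a JavaPairRDD<String, String> produced by the earlier stages
final int EXPECTED_LENGTH = 4096;  // assumed ~90th percentile of the final string length

JavaPairRDD<String, StringBuilder> combined = logsByType.combineByKey(
    // pre-size so append() rarely has to grow and copy the backing char[]
    (String s) -> new StringBuilder(EXPECTED_LENGTH).append(s),
    (StringBuilder sb, String s) -> sb.append(s),
    // merge into whichever builder already has spare capacity for the other
    (StringBuilder a, StringBuilder b) ->
        a.capacity() - a.length() >= b.length() ? a.append(b) : b.append(a));

JavaPairRDD<String, String> result = combined.mapValues(StringBuilder::toString);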
A good thing to take care of would be to use Java 8; it has optimizations that make a significant difference when there's heavy work on strings and string buffers.
Last but not least, profile to see where you are actually spending your cycles. This workload (excluding any additional custom processing you are doing) shouldn't need to promote a lot of objects (if any) to old generation, so you should make sure that young generation has ample size and is collected in parallel.
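For instance, a sketch of that last point (the values below are placeholders to measure and tune on your cluster, not recommendations):

// org.apache.spark.SparkConf: give the executors a parallel collector and a
// generously sized young generation, plus GC logging so you can verify how
// much actually gets promoted.
SparkConf conf = new SparkConf()
    .set("spark.executor.extraJavaOptions",
         "-XX:+UseParallelGC -Xmn2g -verbose:gc -XX:+PrintGCDetails");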
I have a large Java app that processes a large collection of data files, using a try/catch within an actionPerformed (sample code below). It runs out of memory when I get to about 1000 files inside the loop.
Each file load legitimately takes about 1MB of storage, but I've looked carefully and don't see any place where that storage is being hung on to. Each file load is doing just the same thing (ie assigning the same vars), so it ought to be re-using, not accumulating.
I tried inserting an explicit gc call into the loop, which (according to visualvm) succeeds only in smoothing out the spikes in memory use (see image below).
The odd thing is the behavior of memory use: as the attached image makes clear, the usage climbs while the loading loop is working, persists at the plateau while inside the try, but the gc outside the try causes all the memory to be reclaimed (the cliff at the end of the plateau).
Is there something about try/catch that influences gc behavior? Any hints about things to check in my code to find a possible leak that I might have introduced?
I've spent many hours on this with a variety of memory/heap management tools, and tracing code, and it's really got me bewildered. If it were a true memory leak in my code, why would the final gc clean up everything?
Many thanks for any suggestions/ideas.
if (message == MenuCommands.TRYLOADINGFILES){
    try {
        File dir = new File(<directory with 1015 files in it>);
        File[] cskFiles = dir.listFiles(ioUtilities.cskFileFilter);
        for (int i = 0; i < cskFiles.length; i++){
            loadDrawingFromFile(cskFiles[i], true);
            if (i % 10 == 0) System.gc();
        }
        DebugUtilities.pauseForOK("pausing inside try");
    }
    catch (Exception e1){
        e1.printStackTrace();
    }
    DebugUtilities.pauseForOK("pausing outside try");
    System.gc();
    DebugUtilities.pauseForOK("pausing after gc, outside try");
}
where
public static void pauseForOK(String msg){
    JOptionPane.showMessageDialog(null, msg, "OK", JOptionPane.INFORMATION_MESSAGE);
}
Followup based on the suggestion from Peter, below. histo:live shows almost NO change no matter when it is run (at program startup before any actions are taken, after all files are read (when visualvm reports GB of storage being used), and after the final gc when visualvm says usage is back down to initial levels). From startup to running, the first four categories roughly double, and the amount of char storage goes up by about the amount expected for one file's processing, but not much else changes.
According to it, it looks like nothing is sticking around. Here are the first 30 or so lines of the histogram from just after the file load loop finishes (before the final gc outside the try).
num #instances #bytes class name
----------------------------------------------
1: 67824 9242064 <methodKlass>
2: 67824 9199704 <constMethodKlass>
3: 6307 7517424 <constantPoolKlass>
4: 6307 6106760 <instanceKlassKlass>
5: 46924 5861896 [C
6: 5618 4751200 <constantPoolCacheKlass>
7: 10590 3944304 [S
8: 19427 3672480 [I
9: 15280 1617096 [B
10: 33996 1584808 [Ljava.lang.Object;
11: 2975 1487144 <methodDataKlass>
12: 40028 1280896 java.util.Hashtable$Entry
13: 45791 1098984 java.lang.String
14: 31079 994528 java.util.HashMap$Entry
15: 10580 973472 [Ljava.util.HashMap$Entry;
16: 6750 817344 java.lang.Class
17: 10427 583912 java.util.HashMap
18: 1521 523224 javax.swing.JPanel
19: 10008 516344 [[I
20: 8291 457176 [Ljava.security.ProtectionDomain;
21: 4022 431800 [Ljava.util.Hashtable$Entry;
22: 774 377712 com.sun.java.swing.plaf.windows.WindowsScrollBarUI$WindowsArrowButton
23: 689 369704 [J
24: 13931 334344 java.util.ArrayList
25: 7625 305000 java.util.WeakHashMap$Entry
26: 8611 275552 java.lang.ref.WeakReference
27: 8501 272032 java.security.AccessControlContext
28: 16144 258304 javax.swing.event.EventListenerList
29: 6141 245640 com.sun.tools.visualvm.attach.HeapHistogramImpl$ClassInfoImpl
30: 426 245376 <objArrayKlassKlass>
31: 3937 220472 java.util.Hashtable
32: 13395 214320 java.lang.Object
33: 2267 199496 javax.swing.text.html.InlineView
The histogram shows basically this same thing no matter at what point in the process it's run. I got basically the same result even without the :live argument. Yet the program definitely runs out of memory if it runs on enough files.
One other item: I took two snapshots using visualvm's Memory Sampling, one at program startup and one on the plateau of memory use; the delta shows the expected increase in storage use, including an increase in the count of some structures that's exactly the same as the number of files processed. As each file's processing creates one of those structures, it's as if all that intermediate storage is being kept around while inside the try, but can be cleared out afterward.
What's going on?
++++++++++++
Update 22:00 EDT Sunday
Thanks to @Peter Lowrey, @Vampire, and others for the suggestions. I tried all those ideas and nothing worked. I tried setting -XX:NewSize=1GB and -XX:NewRatio=3, but it didn't help.
The try/catch was a holdover from the original code and is (I belatedly realized) irrelevant in the example. Getting rid of it entirely changes nothing. Just the simple for-loop loading the files causes the same memory growth pattern, followed by the drop to initial values when the final gc is done.
Following up on @Vampire's suggestion, I tried this variation (with the loads inline, rather than in a loop):
loadDrawingFromFile(thecskFile, true);
loadDrawingFromFile(thecskFile, true);
... 20 times
DebugUtilities.pauseForOK("after 20 loads, before gc");
System.gc();
DebugUtilities.pauseForOK("after gc outside try");
The 20 file loads produced proportionally the same amount of growth in Used Heap space (about 400MB) as in the full example, then after the System.gc() above, the heap space used drops instantly back to program initialization levels, just as before.
When that happened I tried an even more basic approach
loadDrawingFromFile(thecskFile, true);
DebugUtilities.pauseForOK("after load ");
System.gc();
.. repeated 20 times
It turns out this works, in the sense that the memory usage never goes above 50 MB even after 20 file loads.
So this seems to have to do with threads and thread interruption. Which leads me to mention one more fact: this is an application that runs off a GUI that's started with:
SwingUtilities.invokeLater(new Runnable() {
    public void run() { ... }
});
I'm not that familiar with threads and the Swing utilities, so perhaps this is some form of naive mistake, but it seems to come down to the fact that a lot of non-live objects are not being touched by the gc until the showMessageDialog interrupts something.
Additional suggestions welcome.
I suspect you don't have a memory leak. Instead you are seeing premature promotion of large objects.
If you are creating large objects, e.g. byte[], these go straight into the tenured space. They are only cleaned up by major or full collections. Most likely you are only triggering minor collections, so the large objects are not freed until a full collection occurs.
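One way to check this hypothesis from inside the application (a sketch; the collector names reported depend on the JVM and the collector in use):

// Print per-collector statistics; if the old-gen/full collector's count stays
// at zero while the young-gen count climbs during the file loop, objects in
// tenured space are only being reclaimed by the explicit System.gc() at the end.
for (java.lang.management.GarbageCollectorMXBean gc :
        java.lang.management.ManagementFactory.getGarbageCollectorMXBeans()) {
    System.out.println(gc.getName() + ": " + gc.getCollectionCount()
            + " collections, " + gc.getCollectionTime() + " ms");
}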
I guess Peter is right, but in case he isn't: you can run out of file descriptors by not closing streams in loadDrawingFromFile. IIRC it can also manifest itself as an OOM, even while you have tons of free memory. I guess that's not what's happening in your case, as the exception message should state it clearly.
Following an OutOfMemoryError I processed the resulting heapdumps through IBM Support Assistant's 64-bit memory analyzer (J9 VM running on WebSphere 7.0.23).
Several leak candidates were listed (all system classloader related); however, one of these appears to indicate that a char[] initialised with a capacity of 256 in a StringBuffer actually contains 77 million null characters.
The resultant heapdump analysis from the Support Assistant shows a char[77418987] # 0xc32*** \u0000\u0000\u0000.......
This is referenced by StringBuffer -> PatternLayout -> TimeAndSizeRollingAppender.
The retained heap checks out: 2 bytes for each char and 18 for the array itself, for a total of 150+ MB.
The Log4j version is 1.2.16 and we use the simonsite TimeAndSizeRollingAppender (though I would like to remove this dependency).
Could this be a false positive from Support Assistant or is there some way in which a char[256] can become a char[77000000+] on the heap?
By default, WebSphere generates a PHD file in response to an OOM event. One thing you need to be aware of is that these dumps contain information about the objects in the heap and their references, but not the actual data stored in attributes and arrays (of primitive types). That's why the memory analyzer only shows zeros. To get more information about the root cause, you should configure your WebSphere to create a system dump. That will allow you to see the data in the array and should give you a hint about what is happening.
The following link explains how to do this:
http://pic.dhe.ibm.com/infocenter/isa/v4r1m0/topic/com.ibm.java.diagnostics.memory.analyzer.doc/producing.html
For the 256 vs. 77000000+ question: 256 is only the initial capacity of the StringBuffer. It grows automatically as needed when data is appended.
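A minimal sketch of that growth (the printed numbers will vary, but the point is that capacity() tracks what has been appended, not the initial 256):

StringBuffer sb = new StringBuffer(256);   // backing char[256] to start with
System.out.println(sb.capacity());         // 256
for (int i = 0; i < 1000000; i++) {
    sb.append("another log event that is formatted but never discarded ");
}
System.out.println(sb.capacity());         // tens of millions of chars now
System.out.println(sb.length());

So a char[77418987] hanging off a StringBuffer is consistent with data having been appended to it at some point, rather than with an analyzer error.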
Is it possible to find the memory usage of an object in Java from within the application?
I want the object's memory usage to be part of the debug output while the application runs.
I don't want to connect to the VM with an external application.
I have a problem where a few classes eat up a huge amount of memory and cause memory problems, and my app crashes. I need to find that memory usage (I am working with limited memory resources).
EDIT: I am using Java 1.4 :/
See my pet project, MemoryMeasurer. A tiny example:
long memory = MemoryMeasurer.measureBytes(new HashMap());
You may also derive a more qualitative memory breakdown:
Footprint footprint = ObjectGraphMeasurer.measure(new HashMap());
For example, I used the latter to derive the per entry cost of various data structures, where the overhead is measured in number of objects created, references, and primitives, instead of just bytes (which is also doable). So, next time you use a (default) HashSet, you can be informed that each element in it costs 1 new object (not your element), 5 references, and an int, which is the exact same cost for an entry in HashMap (not unexpectedly, since any HashSet element ends up in a HashMap), and so on.
You can use it on any object graph. If your object graph contains links to other structures you wish to ignore, you should use a predicate to avoid exploring them.
Edit: Instrumentation is not available in Java 1.4 (wow, people still use that?!), so the measureBytes call above wouldn't work for you. But the second approach would. Then you can write something like this (if you are on a 32-bit machine):
long memory = footprint.getObjects() * 8 + footprint.getReferences() * 4 +
              footprint.getPrimitives().count(int.class) * 4 +
              footprint.getPrimitives().count(long.class) * 8 + ...;
That gives you an approximation. A better answer would be to round this up to the nearest multiple of 16:
long alignedMemory = (memory + 15) & ~0xF; // the masking zeros the lowest 4 bits
But the answer might still be off, since if you find, say, 16 booleans, it's one thing if they are found in the same object, and quite another if they are spread over multiple objects (and cause excessive space usage due to alignment). This logic could be implemented as another visitor (similar to how MemoryMeasurer and ObjectGraphMeasurer are implemented - quite simply, as you may see), but I didn't bother, since that's what Instrumentation does, so it would only make sense for Java versions below 1.5.
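For completeness, on Java 5+ the Instrumentation route looks roughly like this (the agent and class names are made up for the example; getObjectSize reports the shallow size of one object, so you still have to walk the object graph yourself, which is what MemoryMeasurer does for you):

import java.lang.instrument.Instrumentation;

// Packaged in a jar whose manifest declares "Premain-Class: ObjectSizer"
// and loaded with -javaagent:objectsizer.jar
public class ObjectSizer {
    private static volatile Instrumentation instrumentation;

    public static void premain(String args, Instrumentation inst) {
        instrumentation = inst;
    }

    // Shallow size of a single object; references it holds are not followed.
    public static long shallowSizeOf(Object o) {
        return instrumentation.getObjectSize(o);
    }
}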
Eclipse MAT is a really good tool to analyze memory.
There are tools that come with the JDK, such as jmap and jhat, which provide object-level details.
The following link provides a piece of Java code for computing the size of objects:
http://www.javaworld.com/javaworld/javatips/jw-javatip130.html