My web application is running in Apache Tomcat.
The classloader/component org.apache.catalina.loader.WebappClassLoader # 0x7a199fae8 occupies 1,70,86,32,104 (88.08%) bytes.
The memory is accumulated in one instance of java.util.concurrent.ConcurrentHashMap$Segment[] loaded by <system class loader>.
I got this while analyzing a heap dump. How do I analyze it further?
You provide very little information, so I can only offer very little advice… ;-)
First you need to find out who is using the largest objects (the HashMap in your case). Try to look at the contents of the HashMap so you can find out what it is used for. You should also look at where this object is referenced.
Then you can try to limit its size. Depending on whether it is used by a framework or by your own code, this can be easy (e.g. a configuration change for a framework's cache), medium (e.g. you need to refactor your own code) or difficult (e.g. it is deeply buried in a library you have no control over).
Often the culprit is not the one you expect: just because an object instance (in your case the HashMap) accumulates a lot of memory does not mean the "owner" of this object is the root cause of the problem. You may well have to look some levels above or below in the object tree, or even in a completely different location. In most cases it is crucial that you know your application very well.
Update: You can try to inspect the contents of a HashMap by right-clicking it and selecting Java Collections, Hash Entries. For general objects you can use List objects, with incoming references (to list all objects that reference the selected object) or with outgoing references (to list all objects that are referenced by the selected object).
Memory analysis is not an easy task and can require a lot of time, at least if you are not used to it…
If you need further assistance you need to provide more details about your application, the frameworks you use and what the heap looks like in MAT.
I am looking at changing the way some large objects which maintain the data for a large website are reloaded; they contain data relating to catalogue structure, products etc. and get reloaded daily.
After changing how they are reloaded I need to be able to see whether there is any difference in the resulting data so the intention is to reload both and compare the content.
There may be some issues (i.e. lists used where ordering is not important) that make the comparison harder, so I would need to be able to alter the structure before comparison. I have tried to serialise to JSON using Gson, but I run out of memory. I'm thinking of trying other serialisation methods or writing my own simple one.
I imagine this is something that other people will have wanted to do when changing critical things like this, but I haven't managed to find anything about it.
In this special case (separate VMs) I suggest adding something like a dump method to each class which writes the relevant content into a file (human readable text). This method calls dump on each aggregated object as well.
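To make this concrete, here is a minimal sketch of such a dump method; the Book/Chapter classes and their fields are hypothetical stand-ins for the catalogue and product structures from the question:

```java
import java.io.PrintWriter;
import java.util.List;

// Hypothetical domain classes standing in for the real catalogue/product structures.
class Chapter {
    String heading;
    List<String> sectionTexts;

    void dump(PrintWriter out) {
        out.println("  Chapter: " + heading);
        // Normalise collections whose ordering is irrelevant (e.g. sort them) before dumping,
        // so both VMs produce identical files for logically equal content.
        for (String s : sectionTexts) {
            out.println("    Section: " + s);
        }
    }
}

class Book {
    String title;
    List<Chapter> chapters;

    // Writes a human-readable representation and delegates to aggregated objects.
    void dump(PrintWriter out) {
        out.println("Book: " + title);
        for (Chapter c : chapters) {
            c.dump(out);
        }
    }
}
```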
In the end you have two files, one from each VM, and then you can compare them using an MD5 checksum, for example.
This is probably a lot of work, but if you encounter any differences, you can use diff on both files, and this will be a great help.
You can start with a simple version, and refine it step-by-step by adding more output.
Adding (complete) serialization to a class later is cumbersome. There might be tools which simplify this (using reflection etc.), but in my experience you have to tweak your classes: exclude fields which are not relevant, define a sort order for lists, handle cyclic relations, etc.
Actually I use a similar approach for the same reason (to check whether a new version still returns the same result): the application contains multiple services (one for each version), the results are always data transfer objects, serialization is added directly to the DTOs, and the DTOs must provide a comparison method dedicated to this purpose.
Looking at the complications and memory issues, and since you have mentioned you don't want to maintain versions, I would look at using a database for the comparison.
It will need some effort in terms of mapping your data in the JVM to DB tables, but once you have done that it is straightforward. You can dump the data from one large object into DB tables and then simply run a check against it from the second object.
Creating a stored procedure can simplify things. This solution can support data checks from any number of JVMs.
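As a hedged sketch of that approach (plain JDBC, with a made-up PRODUCT_SNAPSHOT table and a minimal Product stand-in; the EXCEPT set operator may be spelled MINUS or need a join depending on your database):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.List;

// Hypothetical, minimal stand-in for the real catalogue objects.
class Product {
    long id;
    String name;
}

class SnapshotComparer {

    // Dump the relevant fields of each product into a table, tagged with a snapshot id.
    static void dump(Connection con, String snapshotId, List<Product> products) throws Exception {
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO PRODUCT_SNAPSHOT (SNAPSHOT_ID, PRODUCT_ID, NAME) VALUES (?, ?, ?)");
        for (Product p : products) {
            ps.setString(1, snapshotId);
            ps.setLong(2, p.id);
            ps.setString(3, p.name);
            ps.addBatch();
        }
        ps.executeBatch();
        ps.close();
    }

    // Rows present in the 'old' snapshot but not in the 'new' one indicate a difference.
    static void compare(Connection con) throws Exception {
        PreparedStatement ps = con.prepareStatement(
                "SELECT PRODUCT_ID, NAME FROM PRODUCT_SNAPSHOT WHERE SNAPSHOT_ID = 'old' "
              + "EXCEPT "
              + "SELECT PRODUCT_ID, NAME FROM PRODUCT_SNAPSHOT WHERE SNAPSHOT_ID = 'new'");
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            System.out.println("Differs: " + rs.getLong(1) + " / " + rs.getString(2));
        }
        rs.close();
        ps.close();
    }
}
```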
Is there a method where I can iterate a Collection and only retrieve a subset of attributes, without loading/unloading each full object to the cache? It seems like a waste to load/unload the WHOLE (possibly big) object when I need only some attribute(s), especially if the objects are big. It might cause unnecessary cache conflicts when loading such unnecessary data, right?
When I said 'load to cache' I meant 'process' that object via the processor. So there would be objects with, say, 10 attributes, and in the iterating loop I only use 1 of those. In such a scenario, I think it's a waste to load the other 9 attributes from memory into the processor. Isn't there a solution to extract only the attributes I need without loading the full object?
Also, does something like Google's Guava solve the problem internally?
THANK YOU!
It's not usually the first place to look, but it's certainly not impossible that you're running into cache sharing problems. If you're really convinced (from realistic profiling or analysis of hardware counters) that this is a bottleneck worth addressing, you might consider altering your data structures to use parallel arrays of primitives (akin to column-based storage in some database architectures), e.g. one 'column' as a float[], another as a short[], a third as a String[], all indexed by the same identifier. This structure allows you to 'query' individual columns without loading into cache any columns that aren't currently needed.
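A hedged sketch of what such a parallel-array ('column') layout might look like; the field names are invented for illustration:

```java
// Hypothetical 'columns' for N records, all indexed by the same position.
final class ProductColumns {
    final float[] prices;
    final short[] stockCounts;
    final String[] names;

    ProductColumns(int n) {
        prices = new float[n];
        stockCounts = new short[n];
        names = new String[n];
    }

    // Scanning a single attribute touches only that column's contiguous array,
    // so unrelated attributes never enter the CPU cache.
    float totalPrice() {
        float sum = 0f;
        for (float p : prices) {
            sum += p;
        }
        return sum;
    }
}
```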
I have some low-level algorithmic code that would really benefit from C's struct. I ran some microbenchmarks on various alternatives and found that parallel arrays was the most effective option for my algorithms (that may or may not apply to your own).
Note that a parallel-array structure will be considerably more complex to maintain and mutate than using Objects in java.util collections. So I'll reiterate - I'd only take this approach after you've convinced yourself that the benefit will be worth the pain.
There is no way in Java to manage loading to processor caches, and there is no way to change how the JVM works with objects, so the answer is no.
Java is not a low-level language and hides such details from the programmer.
The JVM will decide how much of the object it loads. It might load the whole object as some kind of read-ahead optimization, or load only the fields you actually access, or analyze the code during JIT compilation and do a combination of both.
Also, how large are the objects you are worried about? I have rarely seen classes with more than a few fields, so I would not consider that big.
I am working on a project in which I need to load thousands of objects' data into a HashMap/Hashtable/ArrayList. There is no issue in a small application, but it runs out of memory in a large application.
Please suggest how to handle this situation.
Why does it need to be in memory? I would say use a random access file.
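A minimal sketch of the random-access-file idea, assuming fixed-size records (the record layout here is invented): nothing but the file handle stays in memory, and you seek to a record only when you need it.

```java
import java.io.RandomAccessFile;

// Each record: one long id (8 bytes) + one double value (8 bytes) = 16 bytes.
class RecordFile {
    private static final int RECORD_SIZE = 16;
    private final RandomAccessFile file;

    RecordFile(String path) throws Exception {
        file = new RandomAccessFile(path, "rw");
    }

    void write(int index, long id, double value) throws Exception {
        file.seek((long) index * RECORD_SIZE);
        file.writeLong(id);
        file.writeDouble(value);
    }

    // Only the requested record is read into memory, not the whole data set.
    double readValue(int index) throws Exception {
        file.seek((long) index * RECORD_SIZE + 8);
        return file.readDouble();
    }
}
```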
I am wondering about your requirement here, why is it important to load thousands of objects at the same time? Can you provide more details about it? Perhaps your implementation can be reworked so that you don't need that many objects loaded in memory.
Don't read all the data into memory at once, or expand the memory available prior to execution.
You cannot increase the heap size programmatically.
Either you have to increase the memory, or you have to check through the code whether there is some point where the application is creating many objects (probably in loops). If there is, nullify those objects once they are no longer needed (make sure this does not affect your application flow). Another option is to use a lighter object (say, a bean can be reduced to a lighter String representation if you properly override the toString() method; this will also improve the performance of your application).
You could use a cache system, like Ehcache for example. That would give you some control over the "memory" used. There are other cache implementations; Ehcache might not suit your needs.
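The answer names Ehcache; as a hedged illustration of the same idea (a size-bounded cache instead of an ever-growing HashMap), here is a sketch using Guava's CacheBuilder, which was mentioned in an earlier question. Key/value types and sizes are placeholders:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class BoundedCacheExample {
    public static void main(String[] args) {
        // A bounded cache: beyond maximumSize, least-recently-used entries are evicted,
        // so memory use stays roughly constant no matter how many entries are added.
        Cache<Long, String> cache = CacheBuilder.newBuilder()
                .maximumSize(50_000)   // tune to what comfortably fits in your heap
                .softValues()          // additionally lets the GC reclaim values under pressure
                .build();

        cache.put(42L, "some payload");
        String value = cache.getIfPresent(42L);   // null if evicted or never stored
        System.out.println(value);
    }
}
```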
I'm considering porting an application to db4o. The data model consists of lots of small objects with many references between each other. For example, I have a book which points to an author and chapters; chapters have sections, sections have large blobs of text and images, and they reference the characters mentioned.
I think it should be possible to keep the meta structure in memory (everything except the text blobs) but I was wondering whether I could use some clever trick involving WeakReference so db4o would just keep the part of the model in memory that I really need (i.e. which I've been using recently).
The same is true for the text blobs (which should be around 1-10KB). Is it possible to get a String without having to worry about the DB layer and without having to query for the text blob using an artificial ID inside the getter and without using a hard reference which keeps the whole text in memory all the time?
Turning off WeakReferences is mostly used for performance tuning. The downsides to this approach are not negligible - so be careful. I would not recommend it.
Controlling memory usage should be done using the activation features. Activation can help you keep only part of your model in memory, and weak references will help you GC objects that are no longer used. I think that's the way to go.
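A hedged sketch of configuring and using activation with db4o's embedded API (class name, file name and depth values are placeholders; check the db4o documentation for your version):

```java
import com.db4o.Db4oEmbedded;
import com.db4o.ObjectContainer;
import com.db4o.config.EmbeddedConfiguration;

public class ActivationExample {
    public static void main(String[] args) {
        EmbeddedConfiguration config = Db4oEmbedded.newConfiguration();
        // Only objects up to this depth are instantiated when a result is loaded;
        // deeper parts of the graph stay on disk until explicitly activated.
        config.common().activationDepth(2);

        ObjectContainer db = Db4oEmbedded.openFile(config, "books.db4o");
        try {
            Book book = db.query(Book.class).get(0);   // assumes at least one Book is stored
            db.activate(book, 5);                      // pull in more of the graph on demand
            db.deactivate(book, 1);                    // release it again so the GC can reclaim it
        } finally {
            db.close();
        }
    }
}

// Placeholder for the question's data model.
class Book { }
```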
Also - you can post your questions to db4o forums to get help from the db4o community.
Goran
I've not used db4o, or any ORM/OODB product recently; however, it strikes me that this kind of memory management and graph management feature should be part of the framework itself rather than something you build on top of it. If Versant's db4o doesn't offer this, it might be worth looking into another product that does. So, I realise this is not the answer you're looking for, but leveraging the framework would be my first port of call.
I would like to get a reference to all objects in the Java heap, even if I don't immediately have a reference to those objects in my active thread. I don't need non-referenced objects (those "queued" for garbage collection), but would like to get anything that's still in use.
The goal is to serialize and store all the objects to implement a poor-man's persistence of execution state. I realize that the rabbit hole goes deep when it comes to different types of transient state, but simply persisting objects & loaded class definitions would be useful to me.
Is there a way to access the heap in order to make this happen? Am I overlooking a more straightforward approach?
I'd look into the java.lang.instrument package. Instrument the classes you are interested in so the constructor registers the created instance. You might be able to do this via AspectJ if you don't want to use java.lang.instrument, or, if the objects are created via something you can control (an IoC container or factories), you can do something a good deal less magical.
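A sketch of the "constructor registers the created instance" idea without bytecode instrumentation, using a weak set so that registration itself does not keep otherwise-unreachable objects alive; the class names are hypothetical:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.WeakHashMap;

final class InstanceRegistry {
    // Weak keys: once an object is no longer referenced elsewhere, it drops out of the set.
    private static final Set<Object> LIVE = Collections.synchronizedSet(
            Collections.newSetFromMap(new WeakHashMap<Object, Boolean>()));

    static void register(Object o) {
        LIVE.add(o);
    }

    // Snapshot of everything registered and still strongly reachable.
    static Set<Object> liveInstances() {
        synchronized (LIVE) {
            return new HashSet<Object>(LIVE);
        }
    }
}

// Every class you want to track registers itself in its constructor.
class TrackedSession {
    TrackedSession() {
        InstanceRegistry.register(this);
    }
}
```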
If you want to take a heap dump programmatically, you won't find suitable APIs in the java.* or javax.* namespaces. However, the Sun runtime comes with the HotSpotDiagnosticMXBean, which will enable you to take a heap dump by writing the contents of the heap to a specified file on disk.
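For reference, a minimal sketch of triggering a heap dump through that MXBean (the output path is just an example):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                server, "com.sun.management:type=HotSpotDiagnostic", HotSpotDiagnosticMXBean.class);
        // Second argument 'true' dumps only live (reachable) objects.
        bean.dumpHeap("/tmp/heap.hprof", true);
    }
}
```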
I suggest you take a heap dump and then inspect it using the Eclipse Memory Analyser.
The views available allow you to drill down to instance level and view object properties. You can even query objects using OQL - an SQL-like query language for objects.
The left panel in the below screenshot demonstrates inspecting field values.
Screenshot (MAT dominator tree, grouped view): http://img181.imageshack.us/img181/4013/dominatortreegrouped.png