I am currently developing an application that processes several files, each containing around 75,000 records (stored in binary format). When the app is run (manually, about once a month), the files contain about 1 million records in total. The files are put in a folder, I click "Process", and the app stores the records in a MySQL database (table_1).
The records contain information that needs to be compared to another table (table_2) containing over 700k records.
I have gone about this a few ways:
METHOD 1: Import Now, Process Later
In this method, I would import the data into the database without any processing against the other table. However, when I wanted to run a report on the collected data, it would crash, which I assume is due to a memory leak (1 GB used in total before the crash).
METHOD 2: Import Now, Use MySQL to Process
This is what I would like to do, but in practice it didn't turn out so well. Here I would write the logic for finding the correlations between table_1 and table_2 in MySQL. However, the MySQL result is massive and I couldn't get consistent output; sometimes MySQL simply gave up.
METHOD 3: Import Now, Process Now
I am currently trying this method, and although the memory leak is subtle, it still only gets to about 200,000 records before crashing. I have tried numerous forced garbage collections along the way, properly destroying classes, etc. It seems something is fighting me.
I am at my wits' end trying to solve the issue with memory leaking / the app crashing. I am no expert in Java and have yet to really deal with very large amounts of data in MySQL. Any guidance would be extremely helpful. I have put thought into these methods:
Break the processing of each line into its own class, hopefully releasing any memory used once each line is done
Some sort of stored routine where, once a line is stored in the database, MySQL does the table_1 <=> table_2 computation and stores the result
But I would like to pose the question to the many skilled Stack Overflow members to learn properly how this should be handled.
I concur with the answers that say "use a profiler".
But I'd just like to point out a couple of misconceptions in your question:
The storage leak is not due to massive data processing. It is due to a bug. The "massiveness" simply makes the symptoms more apparent.
Running the garbage collector won't cure a storage leak. The JVM always runs a full garbage collection immediately before it decides to give up and throw an OOME.
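To illustrate the second point, here is a minimal sketch (the class and field names are hypothetical, not taken from the question). An object that is still reachable, here through a static list, survives any number of explicit collections, so the fix has to be removing the reference rather than calling the collector.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

class LeakDemo {
    // Hypothetical "leak": a static collection that is appended to and never cleared.
    static final List<byte[]> RETAINED = new ArrayList<>();

    // Stores a record in the static list and returns a weak reference for observing it.
    static WeakReference<byte[]> leakOneRecord() {
        byte[] record = new byte[1024];
        RETAINED.add(record);
        return new WeakReference<>(record);
    }

    public static void main(String[] args) {
        WeakReference<byte[]> observer = leakOneRecord();
        System.gc(); // only a request; in any case it cannot reclaim reachable objects
        System.out.println("still reachable after GC: " + (observer.get() != null));
    }
}
```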
It is difficult to give advice on what might actually be causing the storage leak without more information on what you are trying to do and how you are doing it.
The learning curve for a profiler like VisualVM is pretty small. With luck, you'll have an answer, or at least a very big clue, within an hour or so.
You properly handle this situation by either:
generating a heap dump when the app crashes and analyzing it in a good memory profiler
hooking up the running app to a good memory profiler and looking at the heap
I personally prefer yjp (YourKit Java Profiler), but there are some decent free tools as well (e.g. jvisualvm and NetBeans).
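For the first option, HotSpot-based JVMs can write a dump automatically when started with -XX:+HeapDumpOnOutOfMemoryError. A dump can also be triggered from code. The following is only a sketch, assuming a HotSpot/OpenJDK runtime where com.sun.management.HotSpotDiagnosticMXBean is available; the class name, method names and file name are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class HeapDumper {
    // Writes a dump of live (reachable) objects to the given path.
    // The file must not exist yet; recent JDKs also expect a ".hprof" extension.
    static void dumpLiveHeap(Path target) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(target.toString(), true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience helper: dumps into a fresh temporary directory and returns the file.
    static Path dumpToTempDir() {
        try {
            Path dir = Files.createTempDirectory("heapdump");
            Path target = dir.resolve("app.hprof");
            dumpLiveHeap(target);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("heap dump written to " + dumpToTempDir());
    }
}
```

The resulting .hprof file can then be opened in the profiler of your choice.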
Without knowing too much about what you're doing: if you're running out of memory, there's likely some point where you're storing everything in the JVM, but you should be able to do a data processing task like this without the severe memory problems you're experiencing. In the past, I've seen data processing pipelines that run out of memory because there's one class reading stuff out of the db, wrapping it all up in a nice collection, and then passing it off to another, which of course requires all of the data to be in memory simultaneously. Frameworks are good at hiding this sort of thing.
Heap dumps and digging with VisualVM haven't been terribly helpful for me, as the details I'm looking for are often hidden. For example, if you've got a ton of memory filled with maps of strings, it doesn't really help to be told that Strings are the largest component of your memory usage; you need to know who owns them.
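A minimal sketch of the alternative, assuming the records can be consumed through an Iterator (the class and method names here are hypothetical): hand the data on in bounded batches, so that only one batch needs to be reachable at any moment instead of the whole data set.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;

class BatchProcessor {
    // Pulls records from the source one at a time and hands them to the sink in
    // batches of at most batchSize. Returns the total number of records processed.
    static <T> long processInBatches(Iterator<T> source, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                total += batch.size();
                batch = new ArrayList<>(batchSize); // drop the old batch so it can be collected
            }
        }
        if (!batch.isEmpty()) { // final, partially filled batch
            sink.accept(batch);
            total += batch.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in source; in a real application this would iterate over file records.
        Iterator<Integer> records = IntStream.range(0, 10).boxed().iterator();
        long n = processInBatches(records, 4, b -> System.out.println("batch of " + b.size()));
        System.out.println("processed " + n + " records");
    }
}
```

The sink must not keep a reference to the batches it receives, otherwise everything ends up in memory again.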
Can you post more detail about the actual problem you're trying to solve?
Related
The application is hitting a slowness issue and generated a heap dump file. The heap dump is 1.2 GB, and I need to run my ha456.jar with 8.4 GB of RAM just to open it.
Previously, when analyzing a heap dump, I would look for the biggest leak size and check the Leak Suspect value, which showed me which class or method of my application was holding the most memory. I would then fix the code so that it ran with better performance.
This time, I can't really work out which module or method of my application is causing the out-of-memory issue. The following are some screenshots from HeapAnalyzer:
To me, these are just common classes, for example java/lang/Object, java/lang/Long, or java/util/HashMap. I can't tell which method of my application is causing the out-of-memory condition.
I would appreciate your advice on how to analyze this.
Finding a memory leak is always difficult, even for someone sitting in front of the code, let alone from afar. So I can only give you some suggestions:
You have a heap dump: filter by your own classes and work out which code creates the most numerous objects.
Run your application and monitor it with VisualVM; use the application for a while and then force a GC run. Nine times out of ten, the objects whose count does not decrease significantly, or does not reset completely, are your memory leak.
This may be happening because a lot of records of type Long are being read from somewhere such as a database or a queue. There could be a Cartesian join or something of that sort. I once had a ton of strings causing an OOM, and the culprit was a logger accumulating logs.
A couple of thoughts:
When you get the OOM error, trace it back to the suspect method.
Get a thread dump and see which threads are active and what they are executing.
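A thread dump is usually taken from outside the process with the JDK's jstack tool, but a rough equivalent can also be produced from inside the application. A minimal sketch using Thread.getAllStackTraces() follows; the class name and output format are made up for illustration.

```java
import java.util.Map;

class ThreadDumpPrinter {
    // Builds a plain-text dump of every live thread and its current stack.
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            out.append('"').append(t.getName()).append("\" state=")
               .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```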
I'm writing a cache server in java that will cache image data (jpgs, pngs, tiff etc) in memory for fast access over http. The images are rendered by another service, which is an expensive operation, so I want to cache them on my cache server.
There are several reasons why I'm writing it from scratch, so the answer I'm looking for is not [some clever software product]
Question: How can I keep a certain set of data objects in main memory, and ensure that the data is actually in main memory when I need it and not pushed to disk by a virtual memory manager? That is, how can I do this in Java?
Further information: Objects could be referenced at any interval, e.g. days apart, or even years apart to be a bit extreme :-)
EDIT: I have found this SO post which asks "can you keep objects in contiguous memory?" - This is not the question I'm asking, although it could help, if objects were referenced all the time, I presume. And btw, the answer to that question was "no", except obviously for value-types in arrays.
I strongly doubt you can do this in Java alone. You'll probably have to use something like mlock through JNI, as well as the requisite JNI incantations to pin the cached object graphs in memory so the GC doesn't move them. And [insert miracle here] to compact the pinned memory into contiguous pages, because that's what mlock operates on.
I assume you want to keep access time predictably low, so you want to avoid paging. In Java you have a very limited set of tools for managing memory. In fact, it is the operating system's job to track which pages are inactive and can be pushed to disk. I am not even sure whether the major operating systems expose any API to control this behaviour.
That being said, you must focus on fooling the system into thinking the pages are actually needed, even though they haven't really been used for a long time. I think you already know the answer: just write an asynchronous task that touches every object in your cache every second or so. This should make the operating system believe that your process is still actively using these areas of memory.
Sad but should be effective.
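A rough sketch of such a task, assuming the cached images are held as byte[] values in a map and assuming a 4096-byte page size (both are assumptions for illustration, as are all the names). It is a heuristic only: nothing here guarantees that the OS keeps the pages resident.

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class CacheToucher {
    // Assumed page size in bytes; the real value depends on the OS and hardware.
    static final int PAGE_SIZE = 4096;

    // Written on every pass so the JIT cannot discard the reads as dead code.
    static volatile int sink;

    // Reads one byte per assumed page of every cached image, plus the last byte
    // (arrays are not page-aligned). Returns the number of reads performed.
    static long touchAll(Collection<byte[]> images) {
        long reads = 0;
        int acc = 0;
        for (byte[] image : images) {
            for (int i = 0; i < image.length; i += PAGE_SIZE) {
                acc += image[i];
                reads++;
            }
            if (image.length > 0) {
                acc += image[image.length - 1];
                reads++;
            }
        }
        sink = acc;
        return reads;
    }

    // Starts a daemon thread that touches the whole cache once per second.
    static ScheduledExecutorService start(Map<String, byte[]> cache) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "cache-toucher");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> touchAll(cache.values()), 1, 1, TimeUnit.SECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, byte[]> cache = new ConcurrentHashMap<>();
        cache.put("tile-1.png", new byte[3 * PAGE_SIZE]);
        ScheduledExecutorService scheduler = start(cache);
        Thread.sleep(2500); // let the task run a couple of times
        scheduler.shutdownNow();
    }
}
```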
We use BDB JE in one of our applications, and DbDump for backing up the database. Something interesting happened one day: DbDump started to throw an OutOfMemoryError. Postmortem analysis shows that a lot of memory is used by internal BDB nodes (IN). It seems that Berkeley DB reads the whole dataset into memory while backing it up, which seems quite strange to me.
Another strange fact is that this behavior is only visible when the environment is also open by the application itself. When DbDump is the only client that has the environment open, everything seems to be fine.
Have you considered using DbBackup instead? I know they do two different things, but if all you're looking to do is backup a database, there's no need to pull it all into memory when simply copying the files elsewhere will do. Or is the command line ability the deciding factor here?
I'm creating a Java application for image processing, and after working in the program for a while I get an out-of-memory exception, I think because the Image objects take up a lot of memory. I could save the images as files on the hard disk and read them back when I need them, but that may take milliseconds, versus nanoseconds if I keep the objects in RAM. What can I do to solve this?
First of all, use a memory profiler such as YourKit to figure out what it is exactly that's consuming the memory (for example, it could be due to the accidental retention of some unneeded references). Once you understand how your program is actually using the memory, you can formulate a plan of attack.
Perhaps you have issues with not disposing images you are not using.
We have been facing Out of Memory errors in our app server for some time. We see the used heap size increasing gradually until it finally reaches the available heap size. This happens every 3 weeks, after which a server restart is needed to fix it.
Upon analysis of the heap dumps we find the problem to be objects used in JSPs.
Can JSP objects be the real cause of Appserver memory issues? How do we free up JSP objects (Objects which are being instantiated using usebean or other tags)?
We have a clustered WebSphere appserver with 2 nodes and an IHS.
EDIT: The findings above are based on the heap dump and nativestderr log analysis given below, using the IBM Support Assistant
nativestderr log analysis:
alt text http://saregos.com/wp-content/uploads/2010/03/chart.jpg
Heap dump analysis:
![alt text][2]
Heap dump analysis showing the immediate dominators (2 levels up from the Hashtable entry in the image above)
![alt text][3]
The last image shows that the immediate dominators are in fact objects being used in JSPs.
EDIT2: More info available at http://saregos.com/?p=43
I'd first attach a profiling tool to tell you what these "Objects" are that are taking up all the memory.
Eclipse has TPTP,
or there is JProfiler
or JProbe.
Any of these should show the object heap creeping up and allow you to inspect it to see what is on the heap.
Then search the code base to find what is creating these objects.
Maybe you have a cache or tree/map object with elements in it, and you have only implemented the "equals()" method on these objects when you also need to implement "hashCode()".
This would then result in the map/cache/tree getting bigger and bigger until it falls over.
This is only a guess though.
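To show what that guess would look like in practice, here is a small sketch with made-up key classes. With only equals() overridden, logically equal keys normally hash differently, so every put adds a new entry rather than replacing the old one.

```java
import java.util.HashMap;
import java.util.Map;

class HashCodeDemo {
    // Overrides equals() only; hashCode() falls back to the identity hash.
    static final class BrokenKey {
        final String id;
        BrokenKey(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof BrokenKey && ((BrokenKey) o).id.equals(id);
        }
    }

    // Overrides both, consistently: equal keys produce equal hash codes.
    static final class FixedKey {
        final String id;
        FixedKey(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof FixedKey && ((FixedKey) o).id.equals(id);
        }
        @Override public int hashCode() { return id.hashCode(); }
    }

    // Puts the "same" logical key repeatedly and reports how many entries result.
    static int sizeWithBrokenKeys(int puts) {
        Map<BrokenKey, String> cache = new HashMap<>();
        for (int i = 0; i < puts; i++) {
            cache.put(new BrokenKey("same"), "value");
        }
        return cache.size();
    }

    static int sizeWithFixedKeys(int puts) {
        Map<FixedKey, String> cache = new HashMap<>();
        for (int i = 0; i < puts; i++) {
            cache.put(new FixedKey("same"), "value");
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("broken keys: " + sizeWithBrokenKeys(1000) + " entries");
        System.out.println("fixed keys:  " + sizeWithFixedKeys(1000) + " entries");
    }
}
```

With the fixed key the map stays at a single entry; with the broken key it keeps growing with the number of puts.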
JProfiler would be my first call
JavaWorld has an example screenshot of what is in memory...
(source: javaworld.com)
And a screenshot of the object heap building up and being cleaned up (hence the sawtooth edge)
(source: javaworld.com)
UPDATE *************************************************
Ok, I'd look at...
http://www-01.ibm.com/support/docview.wss?uid=swg1PK38940
Heap usage increases over time which leads to an OutOfMemory condition. Analysis of a heapdump shows that the following objects are taking up an increasing amount of space:
40,543,128 [304] 47 class com/ibm/wsspi/rasdiag/DiagnosticConfigHome
40,539,056 [56] 2 java/util/Hashtable 0xa8089170
40,539,000 [2,064] 511 array of java/util/Hashtable$Entry
6,300,888 [40] 3 java/util/Hashtable$HashtableCacheHashEntry
Triggering the garbage collection manually doesn't solve your problem - it won't free resources that are still in use.
You should use a profiling tool (like JProfiler) to find your leaks. You probably use code that stores references in lists or maps that are not released during runtime, probably through static references.
If you run under the Sun 6 JVM, strongly consider using the jvisualvm program in the JDK to get an initial overview of what actually goes on inside the program. The snapshot comparison is really good at helping you find which objects sneak in.
If the Sun 6 JVM is not an option, then investigate which profiling tools you have. Trials can get you really far.
It can be something as simple as gigantic character arrays underlying substrings you are collecting in a list, e.g. for housekeeping.
I suggest reading Effective Java, chapter 2. Following it, together with a profiler, will help you identify the places where your application produces memory leaks.
Freeing up memory isn't the way to solve extensive memory consumption. The extensive memory consumption may be a result of two things:
not properly written code - the solution is to write it properly, so that it does not consume more than is needed - Effective Java will help here.
the application simply needs this much memory. Then you should increase the VM memory using -Xmx, -Xms, -XX:MaxHeapSize, ...
There is no specific way to free up objects allocated in JSPs, at least as far as I know. Rather than investigating such options, I'd focus on finding the actual problem in your application code and fixing it.
Some hints that might help:
Check the scope of your beans. Aren't you e.g. storing something user or request specific in "application" scope (by mistake)?
Check the web session timeout settings in your web application and in the appserver settings.
You mentioned the heap consumption grows gradually. If that is indeed so, try to see by how much the heap size grows with various user scenarios: grab a heap dump, run a test, let the session data time out, grab another dump, and compare the two. That might give you some idea of where the objects on the heap come from.
Check your beans for any obvious memory leaks, for sure :)
EDIT: Checking for unreleased static resources that Daniel mentions is another worthwhile thing :)
As I understand it, those top-level memory eaters are cache storage and the objects stored in it. You should probably make sure that your cache frees objects when it takes up too much memory. You may want to use weak references if you only need the cache for live objects.
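A minimal sketch of the weak-reference idea, assuming a simple key/value cache (the class name and API are hypothetical, not something from the application in question). Values stay available only while something else still holds a strong reference to them.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WeakValueCache<K, V> {
    private final Map<K, WeakReference<V>> entries = new ConcurrentHashMap<>();

    // Stores the value behind a weak reference, so the cache alone does not keep it alive.
    void put(K key, V value) {
        entries.put(key, new WeakReference<>(value));
    }

    // Returns the cached value, or null if it is absent or has already been collected.
    V get(K key) {
        WeakReference<V> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            entries.remove(key, ref); // drop the stale entry
        }
        return value;
    }

    // Number of entries, including stale ones that have not been looked up yet.
    int size() {
        return entries.size();
    }
}
```

Note that this sketch only removes stale entries when they are looked up; a production cache would also need a size bound or a ReferenceQueue-based cleanup so that unused keys do not pile up.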