If I have a List<Object>, would it be possible to run some method on each Object to see how much memory each is consuming? I know nothing about each Object; it may be an entire video file loaded onto the heap or just a two-byte string. I ultimately would like to know which objects to drop first before running out of memory.
I think Runtime.totalMemory() shows the total memory currently allocated to the JVM, but I want to see the memory used by a single object.
SoftReference looks kinda like what you need. Create a list of soft references to your objects; if those objects are not strongly referenced anywhere and you run low on memory, the JVM will delete some of them. I don't know how smart the algorithm for choosing what to delete is, but it could well remove those that would free the most memory.
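A minimal sketch of that idea (the class name is mine; once the collector clears a reference, get() returns null):

import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

// Entries are held only softly, so the GC may reclaim them under memory pressure.
class SoftCache<T> {
    private final List<SoftReference<T>> entries = new ArrayList<>();

    void add(T value) {
        entries.add(new SoftReference<>(value));
    }

    T get(int i) {
        return entries.get(i).get(); // null if the GC has already cleared it
    }
}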
If you are in a container you can use JConsole: http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html
The JDK since 1.5 comes with heap dump utilities... Are you in a container or in Eclipse? Also, why do you have a List of Objects?
There is no clean way to do it. You can create a dummy OutputStream which does nothing but count the number of bytes written; by serializing your object graph to such a stream, you can make a rough estimate of its size.
I would not advise doing this in a production system. I personally did it once, for experimenting and making estimations.
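A sketch of that counting stream (serialized size is only a rough proxy for heap size, since serialization headers and object layout differ):

import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

// Discards every byte written and only keeps a count of them.
class CountingOutputStream extends OutputStream {
    long count;

    @Override
    public void write(int b) {
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        count += len;
    }

    static long estimateSize(Serializable obj) throws IOException {
        CountingOutputStream counter = new CountingOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(counter)) {
            oos.writeObject(obj); // serializes the whole reachable graph
        }
        return counter.count;
    }
}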
Actually, another possible tactic is just to make a crapload of instances of the class you want to check (like a million, in an array). The sheer number of objects should negate the overhead (as in, the overhead of other stuff will be much smaller than your crapload of objects). You will want to run this in isolation, of course (i.e., in a standalone public static void main()). I will admit you will need lots of memory for this test.
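A rough sketch of that tactic (MyClass is a stand-in for the real class under test; System.gc() is only a hint, so treat the result as an estimate):

public class BulkSizeEstimate {
    static class MyClass { int a; long b; } // replace with the real class

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        int n = 1_000_000;
        Object[] hold = new Object[n]; // strong references so nothing is collected
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();
        for (int i = 0; i < n; i++) {
            hold[i] = new MyClass();
        }
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("~" + (after - before) / n + " bytes per instance");
    }
}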
Something you could do is make a Map<Object, Long> which maps each object to its memory size.
Then, to measure the size of a particular object, you have to do it at instantiation: measure the JVM memory in use before and after building the object (Runtime.totalMemory() - Runtime.freeMemory() gives the used amount) and take the difference between the two; that is approximately the size of the object in memory. Then add the Object and Long to your map. From there you should be able to loop through all of the keys in the map and find the object using the largest amount of space.
I am not sure there is a way to do it per object after you already have your List<Object>... I hope this is helpful!
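A sketch of that bookkeeping (the class name is mine; the totalMemory()/freeMemory() delta is noisy, since other threads allocate too, so treat the numbers as approximations):

import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.Supplier;

class SizeTracker {
    // IdentityHashMap so two equal-but-distinct objects get separate entries
    private final Map<Object, Long> sizes = new IdentityHashMap<>();
    private final Runtime rt = Runtime.getRuntime();

    <T> T track(Supplier<T> factory) {
        long before = rt.totalMemory() - rt.freeMemory();
        T obj = factory.get();
        long after = rt.totalMemory() - rt.freeMemory();
        sizes.put(obj, after - before);
        return obj;
    }

    // The first candidate to drop when memory runs low
    Object largest() {
        return sizes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}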
Related
I have a graph algorithm that generates intermediate results associated with different nodes. Currently, I have solved this by using a ConcurrentHashMap<Node, List<Result>> (I am running multithreaded). So at first I add new results with map.get(node).add(result), and then I consume all results for a node at once with map.get(node).
However, I need to run on a pretty large graph where the number of intermediate results won't fit into memory (the good old OutOfMemoryError). So I require some solution to write the results out to disk, because that's where there is still space.
Having looked at a lot of different "off-heap" maps and caches as well as MapDB, I figured they are all not a fit for me. None of them seems to support Multimaps (which I guess you could call my map) or mutable values (which the list would be). Additionally, MapDB has been very slow for me when trying to create a new collection for every node (even with a custom serializer based on FST).
I can hardly imagine, though, that I am the first and only one to have such a problem. All I need is a mapping from a key to a list which I only need to extend or read as a whole. What would an elegant and simple solution look like? Or is there an existing library that I can use for this?
Thanks in advance for saving my week :).
EDIT
I have seen many good answers, however, I have two important constraints: I don't want to depend on an external database (e.g. Redis) and I can't influence the heap size.
You can increase the size of the heap. The heap can be configured larger than the physical memory of your server, as long as this condition holds:

the size of the heap + the size of other applications < the size of physical memory + the size of swap space

For instance, if the physical memory is 4 GB and the swap space is 4 GB, the heap can be configured to 6 GB. But the program will suffer from page swapping.
You can use a database like Redis. Redis is a key-value database and has a List structure. I think this is the simplest way to solve your problem.
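For example, with the Jedis client (one option among several Redis clients; names and ports are placeholders):

import redis.clients.jedis.Jedis;

public class RedisResultStore {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.rpush("node:42", "result-1", "result-2");     // append results per node
            System.out.println(jedis.lrange("node:42", 0, -1)); // read them all back
        }
    }
}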
You can compress the Result instances. First, serialize the instance, then compress it, and define a class along the lines of:

class CompressResult {
    byte[] result;
    //...
}

Then replace Result with CompressResult. But you will have to deserialize the result when you want to use it again. It will work well if the class Result has many fields and is very complicated.
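A sketch of that idea, assuming Result implements Serializable (GZIP is my choice of compressor; any codec would do):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPOutputStream;

class CompressResult {
    final byte[] result;

    CompressResult(Serializable value) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Serialize through a GZIP stream; closing it flushes and finishes the stream
        try (ObjectOutputStream oos = new ObjectOutputStream(new GZIPOutputStream(bos))) {
            oos.writeObject(value);
        }
        this.result = bos.toByteArray();
    }
}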
My recollection is that the JVM runs with a small maximum heap size by default. With -Xmx10000m you tell the JVM to run with a 10,000 MB heap (or whatever number you select). If your underlying OS resources support a larger heap, that might work.
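For example (the jar name is just a placeholder):

java -Xmx10000m -jar yourapp.jar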
Is it possible to check programmatically how much memory some object takes (with its whole subtree) in the JVM memory? I would like to say (from the Java code):
'tell me how much memory the JPanel takes in the current JVM, with its whole reference subtree, when we assume that the mentioned JPanel is the root of this tree'.
I wonder if I could this way compare how much memory two JPanels take (or JFrames, or whatever), and which takes more, without analyzing a heap dump. And if the answer is 'yes', I wonder how precise this value would be.
As stated in the comments to your question, the sizeOf problem in Java isn't easy to solve, not only because the object you're trying to size isn't really the root of a memory graph, but also because there are issues with counting the size of static fields etc. (they belong to the whole class, not to any specific instance).
However, there are ways to get some meaningful data.
The first approach is to use a Java agent attached to the JVM, which in turn calls a size-estimation function that Sun/Oracle added starting with Java 6. See this page for instructions.
The second approach is to estimate the size of an object tree based on theoretical calculations. There's a library that does this for you here.
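For the first approach, a minimal sketch of such an agent (the class and jar names are mine; the jar's manifest must declare Premain-Class, and the JVM must be started with -javaagent):

import java.lang.instrument.Instrumentation;

public class SizeOfAgent {
    private static volatile Instrumentation instrumentation;

    // Called by the JVM at startup when run with -javaagent:sizeof-agent.jar
    public static void premain(String args, Instrumentation inst) {
        instrumentation = inst;
    }

    // Shallow size of one object; references it holds are not followed
    public static long sizeOf(Object o) {
        if (instrumentation == null) {
            throw new IllegalStateException("Agent not loaded");
        }
        return instrumentation.getObjectSize(o);
    }
}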
You can check JAMM, which is a Java agent for measuring object sizes. You can find a tutorial here on how to use it.
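Roughly like this, assuming an older JAMM version where the no-arg constructor is public (newer versions use a builder), and with the agent on the command line via -javaagent:jamm.jar:

import java.util.Arrays;
import java.util.List;
import org.github.jamm.MemoryMeter;

public class JammExample {
    public static void main(String[] args) {
        MemoryMeter meter = new MemoryMeter();
        List<String> o = Arrays.asList("a", "b", "c");
        System.out.println(meter.measure(o));     // shallow size of the list object itself
        System.out.println(meter.measureDeep(o)); // list plus everything it references
    }
}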
I'm filling up the JVM Heap Space.
Changing parameters to give more heap space to the JVM, or changing something in my algorithm so the code does not use so much space, are two of the most recommended options.
But, if those two have already been tried and applied, and I still get out of memory exceptions, I'd like to see what the other options are.
I found out about this example of "Using a memory mapped file for a huge matrix" and a library called HugeCollections which are an interesting way to solve my problem. Unluckily, the library hasn't seen an update for over a year, and it's not in any Maven repo - so for me it's not a really reliable one.
My question is, is there any other library doing this, or a good way of achieving it (having collection objects (lists and sets) memory mapped)?
You don't say what sort of collections you're using, or the way that you're using them, so it's hard to give recommendations. However, here are a few things to keep in mind:
Keeping the objects on the Java heap will always be the simplest option, and RAM is relatively cheap.
Blindly moving to memory-mapped data is very likely to give horrendous performance, especially if you're moving around in the file and/or making lots of changes. Hash-based collection types are the worst, as they work by distributing data. Tree-based collection types are generally a better choice, and linear collections can go both ways.
Once you move off-heap, you need a way to translate your objects to/from Java. Object serialization is the easiest, but adds lots of overhead. Binary objects accessed via byte buffers are usually a better choice (see the sketch after this list), but you need to be thread-conscious.
You also have to manage your own garbage collection for off-heap objects. Not a problem if all you're doing is creating/updating, but quickly becomes a pain if you're deleting.
If you have a lot of data, and need to access that data in varied ways, a database is probably your best bet.
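A minimal sketch of the byte-buffer route using a memory-mapped file (file name and region size are arbitrary):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class OffHeapDemo {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get("data.bin"),
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map 4 MB of the file outside the Java heap; the OS pages it in on demand
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4 * 1024 * 1024);
            buf.putInt(0, 42);                 // write at byte offset 0
            System.out.println(buf.getInt(0)); // read it back
        }
    }
}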
"Unluckily, the library hasn't seen an update for over a year, and it's not in any Maven repo - so for me it's not a really reliable one." I agree, and I wrote it. ;)
I suggest you look at https://github.com/peter-lawrey/Java-Chronicle which is higher performance and has been used a fair bit. It is really designed for List & Queue, but you could use it for a Map or Set with additional data structures.
Depending on your requirements, you could write your own library. E.g., for time-series data I wrote a different library, which is not open source unfortunately, but it can load tables of 500+ GB pretty cleanly.
"it's not in any Maven repo" Neither is this one, but I would be happy for someone to add it.
Sounds like you're either having trouble with a memory leak, or trying to put too large an Object into memory.
Have you tried making a rough estimate of the amount of memory needed to load your data?
Assuming you have no memory leaks or other issues, and you really need so much storage that you can't fit it in the heap (which I find unlikely), you have basically only one option:
Don't put your data on the heap. Simple as that. Now, which method you use to move your data out depends very much on your requirements (what kind of data, frequency of updates, and how much is it really?).
Note: You can use very large heaps with a 64-bit VM and, if necessary, enlarge the swap space of the OS. It may be the simplest solution to just brutally increase the maximum heap size (even if it means lots of swapping). I would certainly try that first in the situation you outlined.
I have a huge data file and I only need specific data from this file; later on, I will be using this data frequently.
So which of these two methods would be more efficient :
save this data in global variables (maybe a LinkedList) and use it every time I need it
save it in a file, and read the file every time I need the data
I should mention that the data could be a huge number of integers.
Which of the mentioned two ways would give better performance with respect to speed and memory ?
If the file I/O overhead is not an issue for you: Save them in a file and create an index file mapping keys to file positions so you do not have to read your huge file.
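A sketch of that index idea (class and method names are mine): keep a small in-memory map from key to byte offset and seek to it, instead of scanning the whole file:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

class IndexedFile {
    private final Map<String, Long> index = new HashMap<>(); // key -> byte offset
    private final RandomAccessFile data;

    IndexedFile(String path) throws IOException {
        data = new RandomAccessFile(path, "r");
    }

    void addToIndex(String key, long offset) {
        index.put(key, offset);
    }

    int readIntAt(String key) throws IOException {
        data.seek(index.get(key)); // jump straight to the record
        return data.readInt();
    }
}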
If the data fits in your RAM and you want to be able to access it quickly - go by the first approach (but maybe without an index file) but read the data into memory at startup or when needed the first time.
As long as it fits in memory, working with memory is surely orders of magnitude faster. But do not use LinkedList: it has a huge per-element overhead. And do not use any standard Collection at all, since that means boxing and blows up the memory overhead by a factor of 3 at least.
You could use int[] or a specialized collection for primitive types.
I'd recommend using a file via java.nio.IntBuffer. This way the data reside primarily on the disk but get mapped into memory too.
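A minimal sketch of that (the file name and size are arbitrary); the mapped region is viewed as ints, and the OS pages the file in and out as needed:

import java.io.IOException;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedIntFile {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get("ints.bin"),
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // 1 MB mapped region holds 262,144 ints backed by the file, not the heap
            IntBuffer ints = ch.map(FileChannel.MapMode.READ_WRITE, 0, 1 << 20).asIntBuffer();
            ints.put(0, 12345);
            System.out.println(ints.get(0));
        }
    }
}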
Probably the first one.
But there really isn't enough information there to answer you properly.
Firstly, a linked list is fine if you only ever traverse it in order. However, if you need random access to it (5th element, then 100th, then 12th, then 45th...), it's lousy, and you'd be better off with an ArrayList or something. Secondly, if you're storing lots of ints in one of the standard Java collections, each int will be boxed, which may present a performance overhead.
Then you haven't said what 'huge' means. Thousands? Millions?
So, yeah, you need to say what kind of numbers you're dealing with, and what the access patterns are likely to be. And is the 'filtering' step a one-off, or is it done quite frequently?
It depends on the system spec. If you are designing your app for one machine, the task is simple; otherwise you should take into account the memory and/or disk space limits on the client's computer.
I don't think you can compare the performance of these two approaches directly, as each one has its own benefits and drawbacks. I'm certain there are algorithms you could investigate further, connected with reading part of a file into memory, or creating a cache (when you read a number from the file, store it in memory, so the next time you need it you load it from memory).
Is there any way in Java to delete data (e.g., a variable value or object) and be sure it can't be recovered from memory? Does assigning null to a variable in Java delete the value from memory? Any ideas? Answers applicable to other languages are also acceptable.
Due to the wonders of virtual memory, it is nearly impossible to delete something from memory in a completely irretrievable manner. Your best bet is to zero out the value fields; however:
This does not mean that an old (unzeroed) copy of the object won't be left on an unused swap page, which could persist across reboots.
Neither does it stop someone from attaching a debugger to your application and poking around before the object gets zeroed, or crashing the VM and poking around in the heap dump.
Store sensitive data in an array, then "zero" it out as soon as possible.
Any data in RAM can be copied to the disk by a virtual memory system. Data in RAM (or a core dump) can also be inspected by debugging tools. To minimize the chance of this happening, you should strive for the following:

keep the time window a secret is present in memory as short as possible
be careful about IO pipelines (e.g., BufferedInputStream) that internally buffer data
keep references to the secret on the stack and out of the heap
don't use immutable types, like String, to hold secrets
The cryptographic APIs in Java use this approach, and any APIs you create should support it too. For example, KeyStore.load allows a caller to clear a password char[] when the call completes, as does the KeySpec for password-based encryption.
Ideally, you would use a finally block to zero the array, like this:
KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
InputStream is = …
char[] pw = System.console().readPassword();
try {
ks.load(is, pw);
}
finally {
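    // overwrite the password even if load() threw an exception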
Arrays.fill(pw, '\0');
}
Nothing gets deleted; it's just about being accessible or not to the application.
Once inaccessible, the space becomes a candidate for subsequent use, and when the need arises it will be overwritten.
In the case of direct memory access, something is always there to read, but it might be junk and won't make sense.
Setting your Object to null doesn't mean that it is removed from memory. The Virtual Machine will flag that Object as ready for Garbage Collection if there are no more references to it. Depending on your code it might still be referenced even though you have set it to null, in which case it will not be removed. (Essentially, if you expect it to be garbage collected and it is not, you have a memory leak!)
Once it is flagged as ready for collection you have no control over when the Garbage Collector will remove it. You can mess around with Garbage Collection strategies but I wouldn't advise it.
Profile your application and look at the object and its ID, and you can see what is referencing it. Java provides VisualVM with 1.6.0_07 and above, or you can use NetBeans.
As zacherates said, zero out the sensitive fields of your Object before removing references to it. Note that you can't zero out the contents of a String, so use char arrays and zero each element.
Nope, unless you have direct access to the hardware. There is a chance the variable will be cached somewhere. Sensitive data can even end up in swap :) If you're concerned only about RAM, you can play with the garbage collector. In high-level languages you usually don't have direct access to memory, so it's not possible to control this aspect. For example, in .NET there is a class SecureString which uses interop and direct memory access.
I would think that your best bet (that isn't complex) is to use a char[] and then change each position in the array. The other comments about it being possible for it to be copied in memory still apply.
Primitive data (byte, char, int, double) and arrays of them (byte[], ...) are erasable by writing new random content into them.
Object data have to be sanitized by overwriting their primitive properties; setting a variable to null just makes the object available for GC, not immediately dead. A dump of the VM will contain them for anyone to see.
Immutable data such as String cannot be overwritten in any way; any modification just makes a copy. You should avoid keeping sensitive data in such objects.
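A small sketch of such sanitizing (the helper name is mine):

import java.security.SecureRandom;
import java.util.Arrays;

final class Wipe {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Overwrite with random bytes first, then zero, so no residue remains
    static void wipe(byte[] secret) {
        RANDOM.nextBytes(secret);
        Arrays.fill(secret, (byte) 0);
    }

    static void wipe(char[] secret) {
        Arrays.fill(secret, '\0');
    }
}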
P.S. If we talk about passwords, it's better to use cryptographically strong hash functions (MD5, SHA-1, ...), and never ever work with passwords in clear text.
If you're thinking about securing password/key management, you could write some JNI code that uses platform-specific API to store the keys in a secure way and not leak the data into the memory managed by the JVM. For example, you could store the keys in a page locked in physical memory and could prevent the IO bus from accessing the memory.
EDIT: To comment on some of the previous answers, the JVM could relocate your objects in memory without erasing their previous locations, so, even char[], bytes, ints and other "erasable" data types aren't an answer if you really want to make sure that no sensitive information is stored in the memory managed by the JVM or swapped on to the hard drive.
Totally and completely irretrievable is something almost impossible in this day and age.
When you normally delete something, the only thing that happens is that the first spot in your memory is emptied. This first spot used to contain the information about how far the memory had to be reserved for that program or something else.
But all the other info is still there until it's overwritten by something else.
I suggest either TinyShredder, or using CCleaner set to the Gutmann pass.