Java HashMap memory, to reduce it - java

I'm using a HashMap in Java and I have noticed that it consumes too much memory, but I need lookups to be as fast as possible.
Is there a way to reduce the memory used by the HashMap if I know in advance how many elements I will put in it?
I know how much information I will store, but not what that information is.
My problem is to read a file that contains information divided into two sets, and I have to connect this information in the same structure.
I know that a HashMap, in order to work well, leaves more than 25% of the memory it has allocated unused.
Thank you for your help.

use:
new HashMap<>(capacity);
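A minimal sketch of what presizing might look like, assuming the default load factor of 0.75. The class and method names (`PresizedMap`, `capacityFor`, `create`) are made up for illustration. Note that the constructor argument is a table capacity, not an element count, so the expected number of entries is divided by the load factor to avoid a resize. This only avoids intermediate resizing; it does not shrink the per-entry overhead of HashMap.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    // Smallest initial capacity that can hold `expected` entries without
    // triggering a resize, given HashMap's default load factor of 0.75.
    public static int capacityFor(int expected) {
        return (int) Math.ceil(expected / 0.75);
    }

    // HashMap rounds the requested capacity up to a power of two internally.
    public static Map<String, Integer> create(int expected) {
        return new HashMap<>(capacityFor(expected));
    }

    public static void main(String[] args) {
        int expected = 1_000; // hypothetical element count known in advance
        Map<String, Integer> map = create(expected);
        for (int i = 0; i < expected; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size());
    }
}
```

On Java 19 and later, `HashMap.newHashMap(expected)` does the same calculation for you.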

Related

Memory mapped collections in Java

I'm filling up the JVM Heap Space.
Changing parameters to give more heap space to the JVM, or changing something in my algorithm in the code not to use so much space are two of the most recommended options.
But, if those two have already been tried and applied, and I still get out of memory exceptions, I'd like to see what the other options are.
I found out about this example of "Using a memory mapped file for a huge matrix" and a library called HugeCollections which are an interesting way to solve my problem. Unluckily, the library hasn't seen an update for over a year, and it's not in any Maven repo - so for me it's not a really reliable one.
My question is, is there any other library doing this, or a good way of achieving it (having collection objects (lists and sets) memory mapped)?
You don't say what sort of collections you're using, or the way that you're using them, so it's hard to give recommendations. However, here are a few things to keep in mind:
Keeping the objects on the Java heap will always be the simplest option, and RAM is relatively cheap.
Blindly moving to memory-mapped data is very likely to give horrendous performance, especially if you're moving around in the file and/or making lots of changes. Hash-based collection types are the worst, as they work by distributing data. Tree-based collection types are generally a better choice, and linear collections can go both ways.
Once you move off-heap, you need a way to translate your objects to/from Java. Object serialization is the easiest, but adds lots of overhead. Binary objects accessed via byte buffers are usually a better choice, but you need to be thread-conscious.
You also have to manage your own garbage collection for off-heap objects. Not a problem if all you're doing is creating/updating, but quickly becomes a pain if you're deleting.
If you have a lot of data, and need to access that data in varied ways, a database is probably your best bet.
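To illustrate the point above about binary objects accessed via byte buffers, here is a rough sketch that stores fixed-size records (a long id and a double value) in a direct, off-heap ByteBuffer. The record layout and the class name `OffHeapPoints` are assumptions made for the example; the same absolute get/put calls would work on a MappedByteBuffer.

```java
import java.nio.ByteBuffer;

public class OffHeapPoints {
    // Each record is a long id followed by a double value: 16 bytes.
    public static final int RECORD_SIZE = Long.BYTES + Double.BYTES;

    private final ByteBuffer buffer;

    public OffHeapPoints(int capacity) {
        // Direct buffers live outside the Java heap.
        buffer = ByteBuffer.allocateDirect(capacity * RECORD_SIZE);
    }

    // Absolute put: writes at a computed offset and does not move the
    // buffer's position.
    public void put(int index, long id, double value) {
        int offset = index * RECORD_SIZE;
        buffer.putLong(offset, id);
        buffer.putDouble(offset + Long.BYTES, value);
    }

    public long idAt(int index) {
        return buffer.getLong(index * RECORD_SIZE);
    }

    public double valueAt(int index) {
        return buffer.getDouble(index * RECORD_SIZE + Long.BYTES);
    }

    public static void main(String[] args) {
        OffHeapPoints points = new OffHeapPoints(1_000);
        points.put(42, 7L, 3.5);
        System.out.println(points.idAt(42) + " " + points.valueAt(42));
    }
}
```

Using absolute offsets avoids sharing the buffer's position between callers, but concurrent writers to the same record would still need their own synchronization.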
"Unluckily, the library hasn't seen an update for over a year, and it's not in any Maven repo - so for me it's not a really reliable one"
I agree, and I wrote it. ;)
I suggest you look at https://github.com/peter-lawrey/Java-Chronicle, which has higher performance and has been used a bit. It is really designed for List and Queue, but you could use it for a Map or Set with additional data structures.
Depending on your requirements, you could write your own library. For example, for time series data I wrote a different library, which unfortunately is not open source but can load tables of 500+ GB pretty cleanly.
"it's not in any Maven repo"
Neither is this one, but I would be happy for someone to add it.
Sounds like you're either having trouble with a memory leak, or trying to put too large an Object into memory.
Have you tried making a rough estimate of the amount of memory needed to load your data?
Assuming you have no memory leaks or other issues and really need that much storage that you can't fit it in the heap (which I find unlikely) you have basically only one option:
Don't put your data on the heap. Simple as that. Which method you use to move your data out depends heavily on your requirements (what kind of data, how frequently it is updated, and how much of it there really is).
Note: You can use very large heaps with a 64-bit VM and if necessary enlarge the swap space of the OS. It may be the simplest solution to just brutally increase the maximum heap size (even if it means lots of swapping). I certainly would try that first in the situation you outlined.

What is the fastest way to deal with large arrays of data in Android?

Could you please suggest to me (a novice in Android/Java) the most efficient way to deal with relatively large amounts of data?
I need to compute some values for each of 1000...5000 elements of, say, a big data type (x1, y1, z1 - double; flag1...flagn - boolean; desc1...descn - string) quite often (once a second), which is why I want to do it as fast as possible.
Which way would be best: declaring a multidimensional array, creating an array for each element (x1[i], y1[i]...), a special class, or some sort of JavaBean? Which one is the most efficient in terms of speed etc.? Which is the most common way to deal with that sort of thing in Java?
Many thanks in advance!
Nick, you've asked a very general question. I'll do my best to answer it, but please be aware that if you want anything more specific, you're going to need to narrow down your question a bit.
Some back-of-the-envelope calculations show that for an array of 5000 doubles you'll use 8 bytes * 5000 = 40,000 bytes, or roughly 40 kB of memory. This isn't too bad, as memory on most Android devices is on the order of megabytes or even gigabytes. A good ol' ArrayList should do just fine for storing this data. You could probably make things a little faster by specifying the ArrayList's initial capacity when you construct it. That way the ArrayList doesn't have to dynamically expand as you add more data to it.
Word of caution though. Since we are on a memory restricted device, what could potentially happen is if you generate a lot of these ArrayLists rapidly in succession, you might start triggering the garbage collector a lot. This could cause your app to slow down (the whole device actually). If you're really going to be generating lots of data, then don't store it in memory. Store it off on disk where you'll have plenty of room and won't be triggering the garbage collector all the time.
I think that the efficiency with which you write the computation you need to do on each element is way more important than the data structure you use to store it. The difference between using an array for each element or an array of objects (each of which is the instance of a class containing all elements) should practically be negligible. Use whatever data structures you feel most comfortable with and focus on writing efficient algorithms.
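As a rough illustration of the two layouts being compared, the sketch below stores the same synthetic data once as an array of objects and once as one primitive array per field, and runs the same computation over both. The field set is cut down from the question, and all names (`Layouts`, `Element`, `Columns`, `sumFlaggedLengths`) and the test data are invented for the example.

```java
public class Layouts {
    /** Layout 1: one object per element. */
    public static final class Element {
        final double x, y, z;
        final boolean flag;

        Element(double x, double y, double z, boolean flag) {
            this.x = x; this.y = y; this.z = z; this.flag = flag;
        }
    }

    /** Layout 2: one primitive array per field, indexed by element number. */
    public static final class Columns {
        final double[] x, y, z;
        final boolean[] flag;

        Columns(int n) {
            x = new double[n]; y = new double[n]; z = new double[n];
            flag = new boolean[n];
        }
    }

    // Synthetic data: element i is (i, 2i, 3i), flagged when i is even.
    public static Element[] buildObjects(int n) {
        Element[] items = new Element[n];
        for (int i = 0; i < n; i++) {
            items[i] = new Element(i, 2.0 * i, 3.0 * i, i % 2 == 0);
        }
        return items;
    }

    public static Columns buildColumns(int n) {
        Columns c = new Columns(n);
        for (int i = 0; i < n; i++) {
            c.x[i] = i; c.y[i] = 2.0 * i; c.z[i] = 3.0 * i;
            c.flag[i] = i % 2 == 0;
        }
        return c;
    }

    // Sum of vector lengths over flagged elements, object layout.
    public static double sumFlaggedLengths(Element[] items) {
        double sum = 0;
        for (Element e : items) {
            if (e.flag) sum += Math.sqrt(e.x * e.x + e.y * e.y + e.z * e.z);
        }
        return sum;
    }

    // The same computation, column layout.
    public static double sumFlaggedLengths(Columns c) {
        double sum = 0;
        for (int i = 0; i < c.x.length; i++) {
            if (c.flag[i]) sum += Math.sqrt(c.x[i] * c.x[i] + c.y[i] * c.y[i] + c.z[i] * c.z[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 5_000;
        System.out.println(sumFlaggedLengths(buildObjects(n)));
        System.out.println(sumFlaggedLengths(buildColumns(n)));
    }
}
```

Both versions do the same arithmetic per element, which is why the choice of layout matters less than the computation itself at this data size.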

How to save part of a TreeSet to file efficiently? and reload it? (Java question)

I'm working with a TreeSet to store some information, so that it is sorted according to some order.
When the TreeSet becomes very large (>1GB), I want to save the smallest elements in the TreeSet to a file, to free some RAM. Then later, when there is more free RAM, I want to be able to reload these elements into memory to process them.
My question is: is there some efficient way of storing part of a TreeSet to file and restoring them into memory later?
Note that when I reload the elements into memory, it could be part of a new TreeSet or into the same TreeSet.
Thanks for any idea about how to do this!
What are you using the TreeSet for? Do the contents change often? Are you trying to be efficient in terms of speed, or disk usage?
Reading and writing to a File is very slow compared to memory, and keeping the file and memory versions in sync might be challenging if they change often.
Maybe it makes sense to use a database. There are several lightweight databases, such as Derby and SQLite, which can be embedded in your application. Databases are designed to handle the memory vs. file issue, and if you have >1GB of data, maybe it makes sense to organise it.
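If a database is more than is needed, one possible way to do what the question describes is sketched below, assuming the elements are Long values and that everything below some threshold should be spilled. The names `TreeSetSpill`, `spillBelow` and `reload` are hypothetical. It relies on `headSet` returning a live view, so clearing the view removes those elements from the original set.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.SortedSet;
import java.util.TreeSet;

public class TreeSetSpill {
    // Writes every element strictly below `limit` to `file`, then removes
    // those elements from the set. Returns how many were written.
    public static int spillBelow(TreeSet<Long> set, long limit, Path file) throws IOException {
        SortedSet<Long> head = set.headSet(limit); // live view of the smallest elements
        int count = head.size();
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(count);
            for (long value : head) {
                out.writeLong(value);
            }
        }
        head.clear(); // clearing the view removes the elements from `set`
        return count;
    }

    // Reads the spilled elements back into `set` (the same set or a new one).
    public static void reload(TreeSet<Long> set, Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                set.add(in.readLong());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        TreeSet<Long> set = new TreeSet<>();
        for (long i = 0; i < 10; i++) set.add(i);
        Path file = Files.createTempFile("spill", ".bin");
        System.out.println(spillBelow(set, 5, file) + " spilled, " + set.size() + " kept");
        reload(set, file);
        System.out.println(set.size() + " after reload");
    }
}
```

For element types other than Long, the write and read loops would need a matching binary encoding of your own.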

Java : which of these two methods is more efficient?

I have a huge data file and I only need specific data from it; later on, I will be using this data frequently.
So which of these two methods would be more efficient :
save this data in global variables (maybe LinkedList) and use them every time I need
save them in a file, and read the file every time I need the data
I should mention that these data could be a huge amount of integers.
Which of the mentioned two ways would give better performance with respect to speed and memory ?
If the file I/O overhead is not an issue for you: Save them in a file and create an index file mapping keys to file positions so you do not have to read your huge file.
If the data fits in your RAM and you want to be able to access it quickly - go by the first approach (but maybe without an index file) but read the data into memory at startup or when needed the first time.
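A small sketch of the idea of mapping keys to file positions, under some simplifying assumptions: each record is a length-prefixed array of ints, and the index is kept in an in-memory HashMap rather than in a separate index file as the answer suggests. The class `OffsetIndex` and its methods are invented for this example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class OffsetIndex {
    // Maps each key to the byte offset where its record starts in the data file.
    private final Map<String, Long> offsets = new HashMap<>();
    private final Path dataFile;

    public OffsetIndex(Path dataFile) {
        this.dataFile = dataFile;
    }

    // Appends a record (length followed by the values) and remembers its offset.
    public void append(String key, int[] values) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(dataFile.toFile(), "rw")) {
            long start = raf.length();
            raf.seek(start);
            raf.writeInt(values.length);
            for (int v : values) {
                raf.writeInt(v);
            }
            offsets.put(key, start);
        }
    }

    // Seeks straight to the record for `key`; returns null if the key is unknown.
    public int[] read(String key) throws IOException {
        Long start = offsets.get(key);
        if (start == null) {
            return null;
        }
        try (RandomAccessFile raf = new RandomAccessFile(dataFile.toFile(), "r")) {
            raf.seek(start);
            int[] values = new int[raf.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = raf.readInt();
            }
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        OffsetIndex index = new OffsetIndex(Files.createTempFile("data", ".bin"));
        index.append("a", new int[] {1, 2, 3});
        index.append("b", new int[] {4});
        System.out.println(Arrays.toString(index.read("a")));
    }
}
```

Only the record that is asked for gets read, so the rest of the file never has to be loaded.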
As long as it fits in memory, working with memory is surely some orders of magnitude faster. But do not use LinkedList - it has a huge overhead. And do not use any standard Collection at all since it means boxing and blows the memory overhead by a factor 3 at least.
You could use int[] or a specialized collection for primitive types.
I'd recommend using a file via java.nio.IntBuffer. This way the data reside primarily on the disk but get mapped into memory too.
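One way the mapped-file suggestion could look, as a sketch: the file is mapped with FileChannel and viewed as an IntBuffer, so ints are read and written by index while the operating system decides which pages stay in memory. The class name `MappedInts` and the temp file used in `main` are assumptions for the example.

```java
import java.io.IOException;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedInts {
    // Maps room for `count` ints from the start of `file` and returns an
    // int-indexed view of the mapping.
    public static IntBuffer map(Path file, int count) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            return channel
                    .map(FileChannel.MapMode.READ_WRITE, 0, (long) count * Integer.BYTES)
                    .asIntBuffer();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("ints", ".bin");
        IntBuffer ints = map(file, 1_000_000);
        ints.put(123_456, 42);          // absolute write by int index
        System.out.println(ints.get(123_456));
    }
}
```

A single mapping is limited to about 2 GB, so larger files would need several mappings.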
Probably the first one.
But there really isn't enough information there to answer you properly.
Firstly a linked list is fine if you only ever traverse it in order. However, if you need random access to it (5th element, then 100th, then 12th, then 45th...), it's lousy, and you'd be better with an ArrayList or something. Secondly, if you're storing lots of ints, if you use one of the standard Java collections, each int will be boxed, which may present a performance overhead.
Then you haven't said what 'huge' means. Thousands? Millions?
So, yeah, you need to say what kind of numbers you're dealing with, and what the access patterns are likely to be. And is the 'filtering' step a one-off--or is it done quite frequently?
It depends on the system spec. If you are designing your app for one machine, the task is simple; otherwise you should take into account the memory and/or disk space limits on the client's computer.
I think you cannot compare the performance of these two approaches, as each one has its own benefits and drawbacks. I'm certain that there are some algorithms available that you could further investigate, connected with reading part of a file into memory, or creating a cache (when you read a number from a file, store it in memory, so the next time you need it, it is already there).

how to handle large lists of data

We have a part of an application where, say, 20% of the time it needs to read in a huge amount of data that exceeds memory limits. While we can increase memory limits, we hesitate to do so since it requires having a high allocation when most of the time it's not necessary.
We are considering using a customized java.util.List implementation to spool to disk when we hit peak loads like this, but under lighter circumstances will remain in memory.
The data is loaded once into the collection, subsequently iterated over and processed, and then thrown away. It doesn't need to be sorted once it's in the collection.
Does anyone have pros/cons regarding such an approach?
Is there an open source product that provides some sort of List impl like this?
Thanks!
Updates:
Not to be cheeky, but by 'huge' I mean exceeding the amount of memory we're willing to allocate without interfering with other processes on the same hardware. What other details do you need?
The application is essentially a batch processor that loads data from multiple database tables and conducts extensive business logic on it. All of the data in the list is required, since aggregate operations are part of the logic.
I just came across this post which offers a very good option: STXXL equivalent in Java
Do you really need to use a List? Write an implementation of Iterator (it may help to extend AbstractIterator) that steps through your data instead. Then you can make use of helpful utilities like these with that iterator. None of this will cause huge amounts of data to be loaded eagerly into memory -- instead, records are read from your source only as the iterator is advanced.
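A bare-bones sketch of such an iterator, using only java.util.Iterator rather than AbstractIterator. It reads lines from a BufferedReader as a stand-in for whatever the real source is (the question mentions database tables); the class name `LineIterator` is made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LineIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String next;      // the record fetched ahead, or null at end of input
    private boolean fetched;  // whether `next` currently holds a look-ahead result

    public LineIterator(BufferedReader reader) {
        this.reader = reader;
    }

    @Override
    public boolean hasNext() {
        if (!fetched) {
            try {
                next = reader.readLine(); // reads one record only when asked
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            fetched = true;
        }
        return next != null;
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        fetched = false; // force a fresh read on the following call
        return next;
    }

    public static void main(String[] args) {
        Iterator<String> it = new LineIterator(new BufferedReader(new StringReader("a\nb\nc")));
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

At most one record is held in memory at a time, at the cost of only being able to pass over the data once.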
If you're working with huge amounts of data, you might want to consider using a database instead.
Back it up to a database and do lazy loading on the items.
An ORM framework may be in order. It depends on your usage. It may be pretty straightforward, or the worst of your nightmares; it is hard to tell from what you've described.
I'm an optimist, and I think that using an ORM framework (such as Hibernate) would solve your problem in about 3 - 5 days.
Is there sorting/processing that's going on while the data is being read into the collection? Where is it being read from?
If it's being read from disk already, would it be possible to simply batch-process it directly from disk, instead of reading it into a list completely and then iterating? How inter-dependent is the data?
I would also question why you need to load all of the data in memory to process it. Typically, you should be able to do the processing as it is being loaded and then use the result. That would keep the actual data out of memory.
