Minimizing application data memory overhead in Java processes

I need to store lots of data (Objects) in memory (for computations).
Since computations are done on this data, it is critical that all of it reside in the memory of a single JVM process.
Most of the data will be built from Strings, Integers and other sub-objects (collections such as HashSet, etc.).
Since Java's per-object memory overhead is significant (Strings are stored as UTF-16, and every object carries at least 8 bytes of header overhead), I'm looking for libraries that allow storing such data in memory with a lower overhead.
I've read interesting articles about reducing memory:
* http://www.cs.virginia.edu/kim/publicity/pldi09tutorials/memory-efficient-java-tutorial.pdf
* http://blog.griddynamics.com/2010/01/java-tricks-reducing-memory-consumption.html
I was just wondering whether there is a library out there for such scenarios, or whether I'll need to start from scratch.
To understand my requirement better, imagine a server that processes a high volume of records and needs to analyze them against millions of other records that are kept in memory (for a high processing rate).

For collection overhead have a look at Trove - its memory overhead is lower than that of the built-in collection classes (especially for maps and sets which, in the JDK, are backed by maps).
If you have large objects it might be worthwhile to store them "serialized" as some compact binary representation (not Java serialization) and deserialize them back into full-blown objects only when needed. You could also use a cache library that can page out to disk - take a look at Infinispan or Ehcache. Also, some of those libraries (Ehcache among them, if memory serves) provide "off-heap storage" as part of your JVM process - a chunk of memory managed by the (native) library and not subject to GC. If you have an efficient binary representation you could store it there (it won't lower your footprint, but it might make GC behave better).
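As a rough illustration of the Trove point (assuming Trove 3.x on the classpath; the class comes from its gnu.trove.map.hash package), a primitive-valued map avoids boxing and per-entry wrapper objects:

    import gnu.trove.map.hash.TObjectIntHashMap;

    public class TroveExample {
        public static void main(String[] args) {
            // String -> int map without boxing the int values:
            // no Integer objects and no Map.Entry objects per mapping.
            TObjectIntHashMap<String> counts = new TObjectIntHashMap<>();
            counts.put("foo", 42);
            counts.adjustOrPutValue("foo", 1, 1); // increment, or insert 1 if absent
            System.out.println(counts.get("foo")); // 43
        }
    }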

For the String part you can store the byte[] you get from String.getBytes("UTF-8"). If you need a String object again you can recreate it from the byte array. It will of course cost some extra CPU to create the String objects over and over again, so it is a trade-off between size and speed.
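A minimal sketch of that round trip (the only assumption is that UTF-8 is acceptable for your data):

    import java.nio.charset.StandardCharsets;

    public class Utf8Storage {
        public static void main(String[] args) {
            String original = "some record value";

            // Keep only the UTF-8 bytes instead of the String (UTF-16 chars + object header).
            byte[] stored = original.getBytes(StandardCharsets.UTF_8);

            // Recreate the String only when it is actually needed.
            String restored = new String(stored, StandardCharsets.UTF_8);

            System.out.println(restored.equals(original)); // true, at the cost of extra CPU per access
        }
    }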

Regarding strings, also look into the -XX:+UseCompressedStrings JVM option, but it looks like it has been dropped from the latest JVM updates; see this other question.

Related

How is off heap memory read/written in Java?

In my Spark program, I'm interested in allocating and using data that is not touched by Java's garbage collector. Basically, I want to do the memory management of such data myself like you would do in C++. Is this a good case of using off heap memory? Secondly, how do you read and write to off heap memory in Java or Scala. I tried searching for examples, but couldn't find any.
Manual memory management is a viable optimization strategy for garbage-collected languages. Garbage collection is a known source of overhead, and algorithms can be tailored to minimize it. For example, when picking a hash table implementation one might prefer open addressing because it lays its entries out manually in the main array instead of handing them to the language's memory allocator and its GC. As another example, here's a Trie searcher that packs the Trie into a single byte array in order to minimize the GC overhead. A similar optimization can be used for regular expressions.
That kind of optimization, where Java arrays are used as low-level storage for the data, goes hand in hand with data-oriented design, where data is stored in arrays in order to achieve better cache locality. Data-oriented design is widely used in gamedev, where performance matters.
In JavaScript this kind of array-backed data storage is an important part of asm.js.
The array-backed approach is sufficiently supported by most garbage collectors used in the Java world, as they'll try to avoid moving the large arrays around.
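As a minimal sketch of such array-backed storage (a hypothetical layout: each "record" is a fixed stride of slots inside one large long[], so the GC sees a single object instead of millions):

    public class FlatRecords {
        // Hypothetical record layout: [id, timestamp, value] per record, 3 longs each.
        private static final int STRIDE = 3;
        private final long[] slots;

        FlatRecords(int capacity) {
            this.slots = new long[capacity * STRIDE];
        }

        void set(int index, long id, long timestamp, long value) {
            int base = index * STRIDE;
            slots[base]     = id;
            slots[base + 1] = timestamp;
            slots[base + 2] = value;
        }

        long id(int index)        { return slots[index * STRIDE]; }
        long timestamp(int index) { return slots[index * STRIDE + 1]; }
        long value(int index)     { return slots[index * STRIDE + 2]; }
    }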
If you want to dig deeper, in Linux you can create a file inside the "/dev/shm" filesystem. This filesystem is backed by RAM and won't be flushed to disk unless your operating system is out of memory. Memory-mapping such files (with FileChannel.map) is a good enough way to get the off-heap memory directly from the operating system. (MappedByteBuffer operations are JIT-optimized to direct memory access, minus the boundary checks).
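A minimal sketch of that mapping (assuming Linux; the file name under /dev/shm is hypothetical):

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class ShmExample {
        public static void main(String[] args) throws IOException {
            Path path = Paths.get("/dev/shm/mydata.bin"); // hypothetical file name
            long size = 64L * 1024 * 1024;                // 64 MB of RAM-backed storage

            try (FileChannel channel = FileChannel.open(path,
                    StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // The mapping lives outside the Java heap; the GC never touches its contents.
                MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
                buffer.putLong(0, 42L);
                System.out.println(buffer.getLong(0)); // 42
            }
        }
    }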
If you want to go even deeper, then you'll have to resort to JNI libraries in order to access the C-level memory allocator, malloc.
If you are not able to achieve "Efficiency with Algorithms, Performance with Data Structures", and efficiency and performance are that critical, you could consider using sun.misc.Unsafe. As the name suggests, it is unsafe!
Spark is already using it, as mentioned in Project Tungsten.
Also, you can start here to understand it better!
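For illustration only (the reflection trick below is the usual workaround to obtain the instance, since the class was never meant to be used directly):

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    public class UnsafeExample {
        public static void main(String[] args) throws Exception {
            // Obtain the singleton via reflection; there is no supported way to get it.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Unsafe unsafe = (Unsafe) f.get(null);

            long address = unsafe.allocateMemory(8); // 8 bytes of native, non-GC memory
            try {
                unsafe.putLong(address, 123L);
                System.out.println(unsafe.getLong(address)); // 123
            } finally {
                unsafe.freeMemory(address);          // you manage the lifetime yourself
            }
        }
    }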
Note: Spark provides highly concurrent execution of an application across multiple JVMs, most likely spread over multiple machines, so manual memory management becomes extremely complex. Fundamentally, Spark promotes re-computation over global shared memory. So, perhaps, you could store partially computed data/results in another store like HDFS/Kafka/Cassandra!
Have a look at ByteBuffer.allocateDirect(int bytes). You don't need to memory map files to make use of them.
Off heap can be a good choice if the objects will stick there for a while (i.e. are reused). If you'll be allocating/deallocating them as you go, that's going to be slower.
Unsafe is cool but it's going to be removed. Probably in Java 9.
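A minimal sketch of a direct buffer that is allocated once and reused (absolute puts/gets keep it free of position bookkeeping):

    import java.nio.ByteBuffer;

    public class DirectBufferExample {
        public static void main(String[] args) {
            // Direct buffers live outside the normal heap; allocate once and reuse.
            ByteBuffer buffer = ByteBuffer.allocateDirect(1024);

            buffer.putInt(0, 7);          // absolute put, no position bookkeeping
            buffer.putDouble(4, 3.14);

            System.out.println(buffer.getInt(0));     // 7
            System.out.println(buffer.getDouble(4));  // 3.14
        }
    }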

Processing large arrays that do not fit in RAM in Java

I am developing a text analysis program that represents documents as arrays of "feature counts" (e.g., occurrences of a particular token) within some pre-defined feature space. These arrays are stored in an ArrayList after some processing.
I am testing the program on a 64 MB dataset with 50,000 records. The program worked fine with small data sets, but now it consistently throws an "out of memory" Java heap exception when I start loading the arrays into an ArrayList object (using the .add(double[]) method). Depending on how much memory I allocate to the heap, I will get this exception somewhere between the 1000th and 3000th addition to the ArrayList, far short of my 50,000 entries. It became clear to me that I cannot store all this data in RAM and operate on it as usual.
However, I'm not sure which data structures are best suited to let me access and perform calculations on the entire dataset when only part of it can be loaded into RAM.
I was thinking that serializing the data to disk and storing the locations in a hashmap in RAM would be useful. However, I have also seen discussions on caching and buffered processing.
I'm 100% sure this is a common CS problem, so I'm sure there are several clever ways that this has been addressed. Any pointers would be appreciated :-)
You have plenty of choices:
Increase heap size (-Xmx) to several gigabytes.
Do not use boxing collections; use fastutil - that should decrease your memory use about 4x (a quick sketch follows this list). http://fastutil.di.unimi.it/
Process your data in batches or sequentially - do not keep whole dataset in memory simultaneously.
Use a proper database. There are even in-process databases like HSQL; your mileage may vary.
Process your data via map-reduce, perhaps something local like Pig.
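To illustrate the fastutil point from the list above (assuming fastutil is on the classpath; the class lives in it.unimi.dsi.fastutil.ints), a primitive map keeps feature ids and counts without any boxing:

    import it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap;

    public class FeatureCounts {
        public static void main(String[] args) {
            // int -> double map backed by primitive arrays: no Integer/Double boxing,
            // no per-entry wrapper objects.
            Int2DoubleOpenHashMap counts = new Int2DoubleOpenHashMap();
            counts.put(17, 3.0);
            counts.addTo(17, 1.0);              // increment in place
            System.out.println(counts.get(17)); // 4.0
        }
    }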
How about using Apache Spark (great for in-memory cluster computing)? This would help scale your infrastructure as your data set gets larger.

'Big dictionary' implementation in Java

I am in the middle of a Java project which will be using a 'big dictionary' of words. By 'dictionary' I mean certain numbers (int) assigned to Strings. And by 'big' I mean a file of the order of 100 MB. The first solution that I came up with is probably the simplest possible. At initialization I read in the whole file and create a large HashMap which will be later used to look strings up.
Is there an efficient way to do it without the need of reading the whole file at initialization? Perhaps not, but what if the file is really large, let's say in the order of the RAM available? So basically I'm looking for a way to look things up efficiently in a large dictionary stored in memory.
Thanks for the answers so far, as a result I've realised I could be more specific in my question. As you've probably guessed the application is to do with text mining, in particular representing text in a form of a sparse vector (although some had other inventive ideas :)). So what is critical for usage is to be able to look strings up in the dictionary, obtain their keys as fast as possible. Initial overhead of 'reading' the dictionary file or indexing it into a database is not as important as long as the string look-up time is optimized. Again, let's assume that the dictionary size is big, comparable to the size of RAM available.
Consider ChronicleMap (https://github.com/OpenHFT/Chronicle-Map) in a non-replicated mode. It is an off-heap Java Map implementation, or, from another point of view, a superlightweight NoSQL key-value store.
What it does for your task out of the box:
* Persistence to disk via memory-mapped files (see the comment by Michał Kosmulski)
* Lazy load (disk pages are loaded only on demand) -> fast startup
* If your data volume is larger than the available memory, the operating system will unmap rarely used pages automatically.
* Several JVMs can use the same map, because off-heap memory is shared at the OS level. Useful if you do the processing within a map-reduce-like framework, e.g. Hadoop.
* Strings are stored in UTF-8 form -> ~50% memory savings if strings are mostly ASCII (as maaartinus noted)
* int or long values take just 4 (8) bytes, as if you had a primitive-specialized map implementation.
* Very little per-entry memory overhead, much less than in standard HashMap and ConcurrentHashMap
* Good configurable concurrency via lock striping, if you already need it, or are going to parallelize the text processing in the future.
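A rough sketch of the builder-style construction (the exact method names vary between Chronicle Map versions, so treat this as an assumption about the 3.x API rather than a reference; the map name and file name are hypothetical):

    import java.io.File;
    import java.io.IOException;
    import net.openhft.chronicle.map.ChronicleMap;

    public class DictionaryStore {
        public static void main(String[] args) throws IOException {
            try (ChronicleMap<CharSequence, Integer> dictionary = ChronicleMap
                    .of(CharSequence.class, Integer.class)
                    .name("word-dictionary")                 // hypothetical name
                    .entries(10_000_000)                     // expected number of words
                    .averageKey("representative-word")       // sizing hint for the off-heap allocator
                    .createPersistedTo(new File("dictionary.dat"))) { // memory-mapped file

                dictionary.put("hello", 42);
                System.out.println(dictionary.get("hello")); // 42
            }
        }
    }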
At the point where your data structure is a few hundred MB to on the order of your RAM, you're better off not building the data structure at run time, but rather using a database which supports indexing (which most do these days). Indexing is going to be one of the only ways you can ensure the fastest retrieval of text once your file gets that large and you're running up against the -Xmx setting of your JVM. This is because if your file is as large as, or much larger than, your maximum heap size, you're inevitably going to crash your JVM.
As for having to read the whole file at initialization: you're going to have to do this eventually so that you can efficiently search and analyze the text in your code. If you know that you're only going to search a certain portion of your file at a time, you can implement lazy loading. If not, you might as well bite the bullet and load your entire file into the DB at the beginning. You can parallelize this process if there are other parts of your code execution that don't depend on it.
Please let me know if you have any questions!
As stated in a comment, a Trie will save you a lot of memory.
You should also consider using bytes instead of chars as this saves you a factor of 2 for plain ASCII text or when using your national charset as long as it has no more than 256 different letters.
At first glance, combining this low-level optimization with tries makes no sense, as with them the node size is dominated by the pointers. But there's a way if you want to go low level.
So what is critical for usage is to be able to look strings up in the dictionary, obtain their keys as fast as possible.
Then forget any database, as they're damn slow when compared to HashMaps.
If it doesn't fit into memory, the cheapest solution is usually to get more of it. Otherwise, consider loading only the most common words and doing something slower for the others (e.g., a memory mapped file).
I was asked to point to a good trie implementation, especially an off-heap one. I'm not aware of any.
Assuming the OP needs no mutability, especially no mutability of keys, it all looks very simple.
I guess, the whole dictionary could be easily packed into a single ByteBuffer. Assuming mostly ASCII and with some bit hacking, an arrow would need 1 byte per arrow label character and 1-5 bytes for the child pointer. The child pointer would be relative (i.e., difference between the current node and the child), which would make most of them fit into a single byte when stored in a base 128 encoding.
I can only guess the total memory consumption, but I'd say, something like <4 bytes per word. The above compression would slow the lookup down, but still nowhere near what a single disk access needs.
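To make the pointer encoding concrete, here is a sketch of decoding one base-128 (varint-style) relative child offset out of such a buffer; the surrounding trie layout is left out, and the method name is made up for illustration:

    import java.nio.ByteBuffer;

    public class VarintPointer {
        /**
         * Reads a base-128 encoded unsigned offset starting at 'pos' and returns the
         * absolute position of the child node (current node position + offset).
         * Each byte carries 7 payload bits; the high bit marks "more bytes follow".
         */
        static int childPosition(ByteBuffer trie, int nodePos, int pos) {
            int offset = 0;
            int shift = 0;
            while (true) {
                int b = trie.get(pos++) & 0xFF;
                offset |= (b & 0x7F) << shift;
                if ((b & 0x80) == 0) {
                    break;
                }
                shift += 7;
            }
            return nodePos + offset;
        }
    }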
It sounds too big to store in memory. Either store it in a relational database (easy, and with an index on the hash, fast), or a NoSQL solution, like Solr (small learning curve, very fast).
Although NoSQL is very fast, if you really want to tweak performance, and there are entries that are far more frequently looked up than others, consider using a limited size cache to hold the most recently used (say) 10000 lookups.
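For that cache, the standard LinkedHashMap idiom is usually enough; a minimal sketch (the 10,000 limit mirrors the number above):

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        public LruCache(int maxEntries) {
            // accessOrder = true makes iteration order follow access order (LRU).
            super(16, 0.75f, true);
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            // Evict the least recently used entry once the limit is exceeded.
            return size() > maxEntries;
        }
    }

    // Usage: LruCache<String, Integer> cache = new LruCache<>(10_000);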

Why is my JDBC call consuming 4 times more memory than the actual size of the data?

I wrote a small Java program which loads data from a DB2 database using a simple JDBC call. I am using a select query to get the data and a java.sql.Statement for this purpose. I have properly closed the statement and connection objects. I am using a 64-bit JVM to compile and run the program.
The query returns 52 million records, each row having 24 columns, and it takes around 4 minutes to load the complete data set on Unix (in a multiprocessor environment). I am using a HashMap as the data structure to hold the data: Map<String, Map<String, GridTradeStatus>>. The bean GridTradeStatus is a simple getter/setter bean with 24 properties.
The memory required by the program is alarmingly high. The Java heap size goes up to 5.8-6 GB to load the complete data, while the actually used heap size stays between 4.7 and 4.9 GB. I know that we should not load this much data into memory, but my business requirements are that way.
The thing is, when I dump the whole table into a flat file it comes out to roughly ~1.2 GB. I want to know why my Java program consumes 4 times more memory than the actual size of the data.
There is nothing surprising here (to me at least).
a.) Strings in Java consume double the space compared to most common text formats (because Strings are always represented as UTF-16 in the heap). Also, a String as an object has quite some overhead (the String object itself, the reference to the char[] it contains, the cached hashCode, etc.). For small strings the String object easily costs as much memory as the data it contains.
b.) You put everything into a HashMap. HashMap is not exactly memory efficient. First, it uses a default load factor of 75%, which means a map with many entries also has a big bucket array. Then, each entry in the map is an object in itself, which costs at least two references (key and value) plus the object overhead.
In conclusion, you pretty much have to expect the memory requirements to increase quite a bit. A factor of 4 is reasonable if your average data String is relatively short.
If you think you cannot afford a ratio of 1:4 between the size of the data in a flat file and the memory necessary to load the Strings into a HashMap, you should consider not using Java but a lower-level language such as C++ or even C.
Of course there are possible optimizations:
* use byte[] instead of String (about half the size)
* do not use the default HashMap parameters (initial size / load factor) but tweak them to match your actual requirements (see the sketch below)
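For the second point, a sketch of sizing the map up front so that it never has to rehash (the 52 million figure comes from the question):

    import java.util.HashMap;
    import java.util.Map;

    public class PresizedMap {
        public static void main(String[] args) {
            int expectedEntries = 52_000_000;
            float loadFactor = 0.75f;

            // Capacity chosen so that expectedEntries fit without triggering a resize/rehash.
            int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor);

            Map<String, Object> rows = new HashMap<>(initialCapacity, loadFactor);
            // ... populate as before; the bucket array is allocated only once.
            System.out.println(rows.size());
        }
    }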
What follows is mainly experience-based opinion. I generally use 4 language levels:
* a high-level scripting language (Python, Ruby, or even bash ...) when performance is not a requirement and speed of development is
* a mid-level language (Java, less frequently high-level C++) when performance matters but I also want simplicity of development and robustness (strong typing, ...)
* a low-level language (low-level C++, or C) when performance is a hard requirement and I accept spending much more time writing and testing individual modules
* assembly language for the small parts where performance is critical and has been proved to be so by profiling.
IMHO you can tweak Java code to greatly reduce the memory footprint, but you risk losing a large part of the appeal of Java by giving up its excellent string and collection support. It might be as easy, and perhaps more efficient, to code a small part of the application in C++ and use JNI to tie it all together.

Optimising Java objects for CPU cache line efficiency

I'm writing a library where:
* It will need to run on a wide range of different platforms / Java implementations (the common case is likely to be OpenJDK or Oracle Java on Intel 64-bit machines with Windows or Linux)
* Achieving high performance is a priority, to the extent that I care about CPU cache line efficiency in object access
* In some areas, quite large graphs of small objects will be traversed / processed (let's say around 1 GB scale)
* The main workload is almost exclusively reads
* Reads will be scattered across the object graph, but not totally randomly (i.e. there will be significant hotspots, with occasional reads of less frequently accessed areas)
* The object graph will be accessed concurrently (but not modified) by multiple threads. There is no locking, on the assumption that concurrent modification will not occur.
Are there some rules of thumb / guidelines for designing small objects so that they utilise CPU cache lines effectively in this kind of environment?
I'm particularly interested in sizing and structuring the objects correctly, so that e.g. the most commonly accessed fields fit in the first cache line etc.
Note: I am fully aware that this is implementation dependent, that I will need to benchmark, and of the general risks of premature optimization. No need to waste any further bandwidth pointing this out. :-)
A first step towards cache line efficiency is to provide referential locality (i.e. keeping your data close together). This is hard to do in Java, where almost everything is heap-allocated and accessed by reference.
To avoid references, the following might be obvious:
* have non-reference types (i.e. int, char, etc.) as fields in your objects
* keep your objects in arrays
* keep your objects small
These rules will at least ensure some referential locality when working on a single object and when traversing the object references in your object graph.
Another approach might be to not use objects for your data at all, but to have global non-reference-typed arrays (of the same size) for each item that would normally be a field in your class; each instance is then identified by a common index into these arrays.
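A sketch of that parallel-array ("structure of arrays") layout, using a hypothetical record with three hot fields:

    public class Records {
        // One array per "field"; instance i is the i-th slot of every array.
        // Fields that are read together sit next to each other in memory.
        private final float[] x;
        private final float[] y;
        private final float[] weight;

        Records(int count) {
            this.x = new float[count];
            this.y = new float[count];
            this.weight = new float[count];
        }

        // A sequential pass over primitive arrays is very cache friendly.
        float weightedSum() {
            float sum = 0f;
            for (int i = 0; i < x.length; i++) {
                sum += (x[i] + y[i]) * weight[i];
            }
            return sum;
        }
    }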
Then, for optimizing the size of the arrays or chunks thereof, you have to know the MMU characteristics (page/cache size, number of cache lines, etc.). I don't know whether Java exposes this in the System or Runtime classes, but you could pass this information as system properties on start-up.
Of course this is totally orthogonal to what you should normally be doing in Java :)
Best regards
You may need information about the various caches of your CPU; you can access it from Java using Cachesize (currently supporting Intel CPUs). This can help in developing cache-aware algorithms.
Disclaimer: author of the lib.
