Java: are there situations where disk is as fast as memory? - java

I'm writing some code to access an inverted index.
I have two interchangeable classes which perform the reads on the index. One reads the index from disk, buffering part of it. The other loads the index completely into memory as a byte[][] (the index size is around 7 GB) and reads from this two-dimensional array.
One would expect better performance with the whole data set in memory. But my measurements show that working with the index on disk is as fast as having it in memory.
(The time spent loading the index into memory isn't counted in the measurements.)
Why is this happening? Any ideas?
Further information: I've run the code with HPROF enabled. Whether working "on disk" or "in memory", the most used code is NOT the code directly related to the reads. Also, to my (limited) understanding, the GC profiler doesn't show any GC-related issue.
UPDATE #1: I've instrumented my code to monitor I/O times. It seems that most of the seeks in memory take 0-2000ns, while most of the seeks on disk take 1000-3000ns. The second metric seems a bit too low to me. Is it due to disk caching by Linux? Is there a way to exclude disk caching for benchmarking purposes?
UPDATE #2: I've graphed the response time for every request to the index. The lines for memory and for disk match almost exactly. I've done some other tests using the O_DIRECT flag to open the file (thanks to JNA!) and in that case the disk version of the code is (obviously) slower than memory. So I'm concluding that the "problem" was due to the aggressive Linux disk caching, which is pretty amazing.
UPDATE #3: http://www.nicecode.eu/java-streams-for-direct-io/

Three possibilities off the top of my head:
The operating system is already keeping all of the index file in memory via its file system cache. (I'd still expect an overhead, mind you.)
The index isn't the bottleneck of the code you're testing.
Your benchmarking methodology isn't quite right. (It can be very hard to do benchmarking well.)
The middle option seems the most likely to me.
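To see the first possibility in isolation, here is a minimal sketch (not the asker's code) that times random single-byte reads from a byte[] and from a file holding the same bytes. The class name, the helper names and the 8 MB size are all assumptions made for illustration. Because the file has just been written, it will almost certainly still be in the OS page cache, so the "file" number mostly measures system-call overhead rather than the physical disk.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Hypothetical sketch: class and method names are illustrative only.
final class SeekTimingSketch {

    /** Elapsed nanoseconds plus a checksum, so the reads cannot be optimized away. */
    static final class Result {
        final long nanos;
        final long checksum;

        Result(long nanos, long checksum) {
            this.nanos = nanos;
            this.checksum = checksum;
        }
    }

    /** Generates reproducible pseudo-random offsets in [0, limit). */
    static int[] randomOffsets(int count, int limit, long seed) {
        Random random = new Random(seed);
        int[] offsets = new int[count];
        for (int i = 0; i < count; i++) {
            offsets[i] = random.nextInt(limit);
        }
        return offsets;
    }

    /** Reads one byte per offset straight from the in-memory array. */
    static Result timeMemoryReads(byte[] data, int[] offsets) {
        long sum = 0;
        long start = System.nanoTime();
        for (int offset : offsets) {
            sum += data[offset];
        }
        return new Result(System.nanoTime() - start, sum);
    }

    /** Seeks to each offset in the file and reads one byte. */
    static Result timeFileReads(Path file, int[] offsets) throws IOException {
        long sum = 0;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long start = System.nanoTime();
            for (int offset : offsets) {
                raf.seek(offset);
                sum += (byte) raf.read(); // cast back to a signed byte to match the array
            }
            return new Result(System.nanoTime() - start, sum);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[8 * 1024 * 1024]; // assumed size, far smaller than a real index
        new Random(42).nextBytes(data);
        Path file = Files.createTempFile("index-sketch", ".bin");
        try {
            Files.write(file, data);
            int[] offsets = randomOffsets(100_000, data.length, 7L);
            Result memory = timeMemoryReads(data, offsets);
            Result disk = timeFileReads(file, offsets);
            System.out.printf("memory: %d ns/read%n", memory.nanos / offsets.length);
            System.out.printf("file:   %d ns/read%n", disk.nanos / offsets.length);
            System.out.println("checksums match: " + (memory.checksum == disk.checksum));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

This does no JIT warm-up and runs each variant only once, so treat the numbers as a rough indication at best. That caveat is the third possibility in action.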

No, disk can never be as fast as RAM (RAM is on the order of 100,000 times faster than magnetic disks). Most likely the OS is keeping your file in memory for you.

Related

Check if there is enough memory before allocating byte array

I need to load a file into memory. Before I do that I want to make sure that there is enough memory in my VM left. If not I would like to show an error message. I want to avoid the OutOfMemory exception.
Approach:
Get filesize of my file
Use Runtime.getRuntime().freeMemory()
Check if it fits
Would this work or do you have any other suggestions?
The problem with any "check first then do" strategy is that there may be changes between the "check" and the "do" that render the entire thing useless.
A "try then recover" strategy is almost always a better idea and, unfortunately, that means trying to allocate the memory and handling the exception. Even if you do the "check first" option, you should still code for the possibility that the allocation may fail.
A classic example of that would be checking that a file exists before opening it. If someone were to delete the file between your check and open, you'll get an exception regardless of the fact the file was there very recently.
Now I don't know why you have an aversion to catching the exception but I'd urge you to rethink it. Since Java relies heavily on them, they're generally accepted as a good way to do things, if you don't actually control what it is you're trying (such as opening files or allocating memory).
If, as it seems from your comments, you're worried about the out-of-memory affecting other threads, that shouldn't be the case if you try to allocate one big area for the file. If you only have 400M left and you ask for 600, your request will fail but you should still have that 400M left.
It's only if you nickel-and-dime your way up to the limit (say, trying 600 separate 1M allocations) that other threads would start to feel the pinch after you'd done about 400. And that would only happen if you didn't release those 400 in a hurry.
So perhaps a possibility would be to work out how much space you need and make sure you allocate it in one hit. Either it'll work or it won't. If it doesn't, your other threads will be no worse off.
I suppose you could use your suggested method to try and make sure the allocation left a bit of space for the other threads (say 100M or 10% or something like that), if you're really concerned. But I'd just go ahead and try anyway. If your other threads can't do their work because of low memory, there's ample precedent to tell the user to provide more memory for the VM.
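A minimal sketch of the "try then recover" approach described above, assuming the file fits in a single byte[] (so under 2 GB). The class and method names are made up for illustration.

```java
// Hypothetical sketch: names are illustrative, not from any library.
final class AllocationSketch {

    /**
     * Tries to allocate the whole buffer in one hit.
     * Returns null instead of propagating OutOfMemoryError, so the caller
     * can show an error message. A failed single allocation leaves the
     * rest of the heap as it was.
     */
    static byte[] tryAllocate(int size) {
        try {
            return new byte[size];
        } catch (OutOfMemoryError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        int size = 64 * 1024 * 1024; // assumed file size for the demo
        byte[] buffer = tryAllocate(size);
        if (buffer == null) {
            System.err.println("Not enough memory to load the file; try a larger -Xmx.");
        } else {
            System.out.println("Allocated " + buffer.length + " bytes.");
        }
    }
}
```

Catching OutOfMemoryError is reasonable here only because the try block contains nothing but the one large allocation. Wrapping larger stretches of code this way would hide real problems.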
Personally, I would advise against loading a massive file directly into memory; rather, try to load it in chunks or use some sort of temp file to store intermediate data.
You may want to look at the FileChannel.map(FileChannel.MapMode, long, long) method. This allows mapping a file (think POSIX mmap) without filling the heap. The operating system will (hopefully successfully) take care of the memory for you.
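As a rough sketch of that suggestion, the following maps a file read-only and walks over its bytes without copying them onto the Java heap. The class name and the checksum helper are assumptions for illustration. Note that a single call to map cannot cover more than Integer.MAX_VALUE bytes, so a larger file would need several mappings.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: class and method names are illustrative only.
final class MapFileSketch {

    /** Maps the whole file read-only and sums its bytes (files under 2 GB only). */
    static long sumMappedBytes(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (mapped.hasRemaining()) {
                sum += mapped.get(); // pages are faulted in by the OS on demand
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("map-sketch", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {10, 20, 30});
        System.out.println("sum of bytes: " + sumMappedBytes(file));
    }
}
```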

java performance degrading due to arraylist of size more than 6000

Hello, I am using JDK 7 with 6 GB of RAM and an Intel Core i5 processor. I have Java code with an ArrayList of more than 6000 elements, and each element contains 12 double values. The processing speed has decreased very much, and it now takes around 20 minutes to run the entire code.
What the code does is as follows:
There are some 4500 iterations due to nested for loops, and in each iteration a 400 KB file is read, some processing happens, and some values are stored in the ArrayList.
Once the ArrayList is ready, its values are written to another file through csvwriter. I then use a JTable, which also needs the ArrayList to refer to some of its values, so basically I can't clear this ArrayList.
I have given the values for heap memory as follows
-Xms1024M -Xmx4096M
I am new to programming and I am rather confused as to what I should do. Can I increase the heap size beyond this? Will that help? My senior suggests using hard disk memory for storing the ArrayList elements or for processing, but I doubt that is possible.
Please help. Any help will be appreciated.
12 doubles x 8 bytes x 6000 elements is about 576 KB of raw data, which is not going to take up a significant amount of memory.
If your program's speed is getting slower each time until it eventually crashes with an OutOfMemoryError, then it's possible that you have a coding error that is causing a memory leak.
This question has some examples of memory leak causes in Java.
Using VisualVM or some manual logging will help to identify the issue. Static code analysers like FindBugs or PMD may also help.
Heap size isn't your issue. When you have one of those, you'll see an OutOfMemoryError.
Usually what you do when you encounter performance issues like this is you profile, either with something like VisualVM or by hand, using System.nanoTime() to track which part of your code is the bottleneck. From there, you make sure you're using appropriate data structures, algorithms, etc., and then see where you can parallelize your code.
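Profiling "by hand" can be as simple as the sketch below, which wraps each phase in a System.nanoTime() pair. The helper name and the two placeholder phases are assumptions; substitute the real file-reading and processing steps.

```java
// Hypothetical sketch: the helper and the phases are placeholders.
final class ManualTimingSketch {

    /** Runs the task once and returns the elapsed wall-clock time in nanoseconds. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Stand-in for "read a file": fill an array.
        long readNanos = timeNanos(() -> {
            double[] values = new double[400 * 1024 / 8];
            java.util.Arrays.fill(values, 1.0);
        });
        // Stand-in for "process the values": a busy loop.
        long processNanos = timeNanos(() -> {
            double acc = 0;
            for (int i = 1; i < 1_000_000; i++) {
                acc += Math.sqrt(i);
            }
            if (acc < 0) System.out.println(acc); // keep the loop from being discarded
        });
        System.out.println("read:    " + readNanos / 1_000 + " us");
        System.out.println("process: " + processNanos / 1_000 + " us");
    }
}
```

Summing these timings per phase over all 4500 iterations usually makes it obvious which part deserves attention first.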
I guess you're leaking the JTables somehow. This can easily happen with Listeners, TableSorters, etc. A proper tool would tell you, but the better way is IMHO to decompose the problem.
Either it's the GUI part that causes the trouble or it isn't. Ideally, the rest of the program should be completely independent of the GUI, so you can run it in isolation and see what happens.
My senior suggests using hard disk memory for storing the ArrayList elements or for processing, but I doubt that is possible.
Many things are possible but few make sense. If you're really storing just 12 x 8 x 6000 doubles, then it makes absolutely no sense. Even with the high overhead of Double, it's just a few megabytes.
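The arithmetic behind that estimate, as a small sketch. The 24 bytes per boxed value (a 16-byte Double object plus an 8-byte reference) is an assumption that varies with the JVM and its settings, so treat the second figure as an order of magnitude only.

```java
// Hypothetical sketch: the per-object sizes are assumptions, not measurements.
final class ListFootprintSketch {

    /** Bytes needed for the raw double values alone (8 bytes each). */
    static long rawBytes(int rows, int valuesPerRow) {
        return (long) rows * valuesPerRow * 8L;
    }

    /** Rough estimate when every value is boxed, given an assumed cost per boxed value. */
    static long boxedBytesEstimate(int rows, int valuesPerRow, int assumedBytesPerBoxedValue) {
        return (long) rows * valuesPerRow * assumedBytesPerBoxedValue;
    }

    public static void main(String[] args) {
        System.out.println("raw:   " + rawBytes(6000, 12) + " bytes");
        System.out.println("boxed: " + boxedBytesEstimate(6000, 12, 24) + " bytes (estimate)");
    }
}
```

Either way the list is tiny next to a 4096M heap, which is why the slowdown has to come from somewhere else.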
Another idea: If all the data you need fits into memory, then you can try to read it all upfront. This ensures you're not storing multiple copies.

Why WalkingFileTree is faster the second time?

I am using Files.walkFileTree() from java.nio. I am walking a really big tree, so when I run the app the first time (by "first time" I mean each time after I turn on my computer) the app takes some time, but the second time it is really fast.
Why? Is some cache at work? Can I use this in some permanent way?
When you read data from the filesystem, that information is cached, making accessing it again much faster, in some cases 100x faster or more. The OS caches the data in memory because memory is faster.
The simplest solution is to access/load this directory structure before you need it and you will get cached performance. e.g. you can do this on start up.
Another solution is to get a faster SSD. Accessing file structures performs a lot of disk operations to get all the pieces of information. A HDD can do up to 120 IOPS, a cheap SSD can do 40,000 IOPS and a fast SSD can do 250,000 IOPS. This can dramatically reduce the time to load this information.
However, since you cannot control what is in memory, except by accessing it repeatedly, it may be pushed out of the disk cache later.
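The "load it at start up" idea above could look roughly like this: walk the tree once on a background thread so the directory metadata is already in the OS cache when the real work begins. The class name, the thread name and the choice to count files are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical sketch: class and method names are illustrative only.
final class WarmUpWalkSketch {

    /** Walks the tree once, touching every entry, and returns the number of files seen. */
    static long countFiles(Path root) throws IOException {
        final long[] count = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                count[0]++;
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                return FileVisitResult.CONTINUE; // skip unreadable entries instead of aborting
            }
        });
        return count[0];
    }

    public static void main(String[] args) {
        final Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Thread warmUp = new Thread(() -> {
            try {
                System.out.println("visited " + countFiles(root) + " files");
            } catch (IOException e) {
                System.err.println("warm-up walk failed: " + e.getMessage());
            }
        }, "fs-warm-up");
        warmUp.start(); // the rest of start-up can continue while the walk runs
    }
}
```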

java fastest concurrent random file R/W method for SSDs without memory swap

I have a Linux box with 32GB of RAM and a set of 4 SSDs in a RAID 0 config that maxes out at about 1GB of throughput (random 4k reads), and I am trying to determine the best way of accessing files on them randomly and concurrently using Java. The two main ways I have seen so far are via random access file and mapped direct byte buffers.
Here's where it gets tricky though. I have my own memory cache for objects, so any call to the objects stored in a file should go through to disk and not paged memory (I have disabled the swap space on my Linux box to prevent this). Whilst mapped direct memory buffers are supposedly the fastest they rely on swapping, which is not good because: A) I am using all the free memory for the object cache, and using mappedbytebuffers instead would incur a massive serialization overhead, which is what the object cache is there to prevent (my program is already CPU limited). B) with mappedbytebuffers the OS handles the details of when data is written to disk, I need to control this myself, ie. when I write(byte[]) it goes straight out to disk instantly; this is to prevent data corruption in case of power failure, as I am not using ACID transactions.
On the other hand, I need massive concurrency, i.e. I need to read and write to multiple locations in the same file at the same time (whilst using offset/range locks to prevent data corruption). I'm not sure how I can do this without mappedbytebuffers. I could always just queue the reads/writes, but I'm not sure how this will negatively affect my throughput.
Finally, I cannot have a situation where I am creating new byte[] objects for reads or writes, because I perform almost 100,000 read/write operations per second; allocating and garbage collecting all those objects would kill my program, which is time sensitive and already CPU limited. Reusing byte[] objects is fine though.
Please do not suggest any DB software, as I have tried most of them and they add too much complexity and CPU overhead.
Anybody had this kind of dilemma?
Whilst mapped direct memory buffers are supposedly the fastest they rely on swapping
No, not if you have enough RAM. The mapping associates pages in memory with pages on disk. Unless the OS decides that it needs to recover RAM, the pages won't be swapped out. And if you are running short of RAM, all that disabling swap does is cause a fatal error rather than a performance degradation.
I am using all the free memory for the object cache
Unless your objects are extremely long-lived, this is a bad idea because the garbage collector will have to do a lot of work when it runs. You'll often find that a smaller cache results in higher overall throughput.
with mappedbytebuffers the OS handles the details of when data is written to disk, I need to control this myself, ie. when I write(byte[]) it goes straight out to disk instantly
Actually, it doesn't, unless you've mounted your filesystem with the sync option. And then you still run the risk of data loss from a failed drive (especially in RAID 0).
I'm not sure how I can do this without mappedbytebuffers
A RandomAccessFile will do this. However, you'll be paying for at least a kernel context switch on every write (and if you have the filesystem mounted for synchronous writes, each of those writes will involve a disk round-trip).
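One way to do this without mapping, sketched under assumptions: take the FileChannel from a RandomAccessFile and use its positional read and write methods, which do not move the channel's position and may be called from several threads at once. The class name, helper names, record offset and buffer size are all made up for illustration; each thread would keep and reuse its own ByteBuffer, so nothing is allocated per operation.

```java
import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical sketch: class and method names are illustrative only.
final class PositionalIoSketch {

    /** Writes the buffer's remaining bytes at the given file offset. */
    static void writeAt(FileChannel channel, ByteBuffer buffer, long position) throws IOException {
        long pos = position;
        while (buffer.hasRemaining()) {
            pos += channel.write(buffer, pos); // positional write: channel position untouched
        }
    }

    /** Fills the buffer's remaining space from the given file offset. */
    static void readAt(FileChannel channel, ByteBuffer buffer, long position) throws IOException {
        long pos = position;
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer, pos);
            if (n < 0) {
                throw new EOFException("reached end of file at offset " + pos);
            }
            pos += n;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("records", ".bin");
        file.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel channel = raf.getChannel()) {
            ByteBuffer buffer = ByteBuffer.allocateDirect(4096); // allocated once, then reused
            buffer.putLong(42L).flip();
            writeAt(channel, buffer, 8192L);
            channel.force(false); // ask the OS to push the file content to the device
            buffer.clear().limit(Long.BYTES);
            readAt(channel, buffer, 8192L);
            buffer.flip();
            System.out.println("read back: " + buffer.getLong());
        }
    }
}
```

The force call is what costs a device round-trip. How often to call it is the trade-off between throughput and how much data a power failure can take with it.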
I am not using ACID transactions
Then I guess the data isn't really that valuable. So stop worrying about the possibility that someone will trip over a power cord.
Your objections to mapped byte buffers don't hold up. Your mapped files will be distinct from your object cache, and though they take address space they don't consume RAM. You can also sync your mapped byte buffers whenever you want (at the cost of some performance). Moreover, random access files end up using the same apparatus under the covers, so you can't save any performance there.
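For the "sync whenever you want" point, a minimal sketch: map a region read-write, put the bytes, and call force() on the mapped buffer to ask the OS to write that region out. The class and method names are assumptions, and a real program would keep the mapping around rather than remapping for every write.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: class and method names are illustrative only.
final class MappedWriteSketch {

    /** Writes the payload at the given offset through a mapping and forces it to storage. */
    static void writeAndSync(Path file, long offset, byte[] payload) throws IOException {
        try (FileChannel channel = FileChannel.open(
                file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_WRITE, offset, payload.length);
            mapped.put(payload);
            mapped.force(); // request that the changed pages be written to the device now
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-write", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[16]); // pre-size the file with zeros
        writeAndSync(file, 4, new byte[] {1, 2, 3});
        System.out.println(java.util.Arrays.toString(Files.readAllBytes(file)));
    }
}
```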
If mapped byte buffers aren't getting you the performance you need, you might have to bypass the filesystem and write directly to raw partitions (which is what DBMSs do). To do that, you probably need to write C++ code for your data handling and access it through JNI.

BerkeleyDB write performance problems

I need a disk-based key-value store that can sustain high write and read performance for large data sets. Tall order, I know.
I'm trying the C BerkeleyDB (5.1.25) library from java and I'm seeing serious performance problems.
I get solid 14K docs/s for a short while, but as soon as I reach a few hundred thousand documents the performance drops like a rock, then it recovers for a while, then drops again, etc. This happens more and more frequently, up to the point where most of the time I can't get more than 60 docs/s with a few isolated peaks of 12K docs/s after 10 million docs. My db type of choice is HASH but I also tried BTREE and it is the same.
I tried using a pool of 10 db's and hashing the docs among them to smooth out the performance drops; this increased the write throughput to 50K docs/s but didn't help with the performance drops: all 10 db's slowed to a crawl at the same time.
I presume that the files are being reorganized, and I tried to find a config parameter that affects when this reorganization takes place, so each of the pooled db's would reorganize at a different time, but I couldn't find anything that worked. I tried different cache sizes, reserving space using the setHashNumElements config option so it wouldn't spend time growing the file, but every tweak made it much worse.
I'm about to give berkeleydb up and try much more complex solutions like cassandra, but I want to make sure I'm not doing something wrong in berkeleydb before writing it off.
Anybody here with experience achieving sustained write performance with berkeleydb?
Edit 1:
I tried several things already:
Throttling the writes down to 500/s (less than the average I got after writing 30 million docs in 15 hours, which indicates the hardware is capable of writing 550 docs/s). Didn't work: once a certain number of docs has been written, performance drops regardless.
Write incoming items to a queue. This has two problems: A) It defeats the purpose of freeing up ram. B) The queue eventually blocks because the periods during which BerkeleyDB freezes get longer and more frequent.
In other words, even if I throttle the incoming data to stay below the hardware capability and use ram to hold items while BerkeleyDB takes some time to adapt to the growth, as this time gets increasingly longer, performance approaches 0.
This surprises me because I've seen claims that it can handle terabytes of data, yet my tests show otherwise. I still hope I'm doing something wrong...
Edit 2:
After giving it some more thought and with Peter's input, I now understand that as the file grows larger, a batch of writes will get spread farther apart and the likelihood of them falling into the same disk cylinder drops, until it eventually reaches the seeks/second limitation of the disk.
But BerkeleyDB's periodic file reorganizations are killing performance much earlier than that, and in a much worse way: it simply stops responding for longer and longer periods of time while it shuffles stuff around. Using faster disks or spreading the database files among different disks does not help. I need to find a way around those throughput holes.
What I have seen with high rates of disk writes is that the system cache will fill up (giving lightning performance up to that point), but once it fills, the application, and even the whole system, can slow dramatically or even stop.
Your underlying physical disk should sustain at least 100 writes per second. Any more than that is an illusion supported by clever caching. ;) However, when the caching system is exhausted, you will see very bad behaviour.
I suggest you consider a disk controller cache. Its battery-backed memory would need to be about the size of your data.
Another option is to use SSD drives if the updates are bursty (they can do 10K+ writes per second as they have no moving parts). With caching, this should give you more than you need, but SSDs have a limited number of writes.
BerkeleyDB does not perform file reorganizations, unless you're manually invoking the compaction utility. There are several causes of the slowdown:
Writes to keys in random access fashion, which causes much higher disk I/O load.
Writes are durable by default, which forces a lot of extra disk flushes.
Transactional environment is being used, in which case checkpoints cause a slowdown when flushing changes to disk.
When you say "documents", do you mean to say that you're using BDB for storing records larger than a few kbytes? BDB overflow pages have more overhead, and so you should consider using a larger page size.
This is an old question and the problem is probably gone, but I have recently had similar problems (speed of insert dropping dramatically after few hundred thousand records) and they were solved by giving more cache to the database (DB->set_cachesize). With 2GB of cache the insert speed was very good and more or less constant up to 10 million records (I didn't test further).
We have used BerkeleyDB (BDB) at work and have seen similar performance trends. BerkeleyDB uses a B-tree to store its key/value pairs. As the number of entries keeps increasing, the depth of the tree increases. BerkeleyDB caching works by loading trees into RAM so that a tree traversal does not incur file I/O (reading from disk).
I need a disk-based key-value store that can sustain high write and read performance for large data sets.
Chronicle Map is a modern solution for this task. It's much faster than BerkeleyDB on both reads and writes, and is much more scalable in terms of concurrent access from multiple threads/processes.
