Does the JVM intercept disk transactions/have its own disk buffer? - java

Question:
Do you guys know if calls that write to disk are intercepted by the JVM? Does it have its own buffer between the application and the OS? More specifically, can the JVM make an asynchronous disk write look synchronous to the application?
Background:
I've been running some applications with Berkeley DB in sync mode, that is, the database is supposed to return from calls to db.put(key, value) only after the (key, value) pair has been safely persisted to disk. To set such options, I do:
envConfig.setDurability(Durability.COMMIT_SYNC);
dbConfig.setDeferredWrite(false);
Above, envConfig is an EnvironmentConfig object and dbConfig is a DatabaseConfig object, which I use to adjust the behavior of the database.
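For context, here is roughly how that configuration fits into a complete setup. This is only a sketch assuming Berkeley DB Java Edition; the environment directory and database name are placeholders:

import java.io.File;
import java.nio.charset.StandardCharsets;

import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Durability;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

public class SyncPutExample {
    public static void main(String[] args) {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        envConfig.setDurability(Durability.COMMIT_SYNC);   // fsync on commit

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        dbConfig.setDeferredWrite(false);                  // do not defer writes in memory

        Environment env = new Environment(new File("/tmp/bdb-env"), envConfig);
        Database db = env.openDatabase(null, "testDb", dbConfig);

        DatabaseEntry key = new DatabaseEntry("key".getBytes(StandardCharsets.UTF_8));
        DatabaseEntry value = new DatabaseEntry("value".getBytes(StandardCharsets.UTF_8));
        db.put(null, key, value);   // the expectation: this returns only once the pair is on disk

        db.close();
        env.close();
    }
}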
Anyway, the above configuration is supposed to make every put(...) call cause a disk transaction (which you can measure, e.g., with iostat on Linux), right? This would be because the alternative (COMMIT_NO_SYNC with deferred writes) returns from put without waiting for the disk, so that it can buffer a good amount of data and write it all at once, improving performance at the expense of safety.
Problem:
I'm making several thousand calls to put per second, but the number of disk transactions per second barely changes, whether or not I set the above options on the database.

I am not providing an exact answer to the problem, but here is my experience with disk operations. In the past I have run into situations where my expectations about how long a disk operation would take were not met.
A disk write is always much, much slower than a write to memory. Disk writes depend on the hardware, the native OS APIs, and the CPU time allocated to the write operation, so you should not expect a disk write to be as fast as your method call. In fact, no program logic should be written assuming particular performance characteristics of devices such as printers, disks, etc.
If you need that guarantee, you must have a reconciliation step which ensures that the operation has fully completed before the next operation is allowed to proceed.
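A minimal sketch of one way to enforce that "completed before continuing" guarantee at the plain file level (this uses java.io directly, not Berkeley DB's API; the file name is a placeholder):

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class SyncedWrite {
    public static void main(String[] args) throws IOException {
        byte[] record = "key=value".getBytes(StandardCharsets.UTF_8);
        try (FileOutputStream out = new FileOutputStream(new File("data.log"), true)) {
            out.write(record);       // hands the bytes to the OS page cache
            out.getFD().sync();      // blocks until the OS reports the data has reached the device
        }
        // Only after sync() returns is it reasonable to treat the record as durable
        // (subject to the drive's own write cache, which Java cannot control).
    }
}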

Related

java fastest concurrent random file R/W method for SSDs without memory swap

I have a Linux box with 32 GB of RAM and a set of 4 SSDs in a RAID 0 config that maxes out at about 1 GB/s of throughput (random 4k reads), and I am trying to determine the best way of accessing files on them randomly and concurrently using Java. The two main ways I have seen so far are RandomAccessFile and mapped direct byte buffers.
Here's where it gets tricky, though. I have my own in-memory cache for objects, so any access to objects stored in a file should go through to disk and not to paged memory (I have disabled the swap space on my Linux box to prevent this). While mapped direct memory buffers are supposedly the fastest, they rely on swapping, which is not good because: A) I am using all the free memory for the object cache, and using MappedByteBuffers instead would incur a massive serialization overhead, which is exactly what the object cache is there to prevent (my program is already CPU limited); B) with MappedByteBuffers the OS handles the details of when data is written to disk, and I need to control this myself, i.e. when I write(byte[]) it should go straight out to disk instantly, to prevent data corruption in case of power failure, as I am not using ACID transactions.
On the other hand, I need massive concurrency, i.e. I need to read and write to multiple locations in the same file at the same time (while using offset/range locks to prevent data corruption). I'm not sure how I can do this without MappedByteBuffers; I could always just queue the reads/writes, but I'm not sure how badly that would affect my throughput.
Finally, I cannot have a situation where I am creating new byte[] objects for reads or writes, because I perform almost 100,000 read/write operations per second; allocating and garbage collecting all those objects would kill my program, which is time-sensitive and already CPU limited. Reusing byte[] objects is fine, though.
Please do not suggest any DB software, as I have tried most of them and they add too much complexity and CPU overhead.
Has anybody had this kind of dilemma?
Whilst mapped direct memory buffers are supposedly the fastest they rely on swapping
No, not if you have enough RAM. The mapping associates pages in memory with pages on disk. Unless the OS decides that it needs to recover RAM, the pages won't be swapped out. And if you are running short of RAM, all that disabling swap does is cause a fatal error rather than a performance degradation.
I am using all the free memory for the object cache
Unless your objects are extremely long-lived, this is a bad idea because the garbage collector will have to do a lot of work when it runs. You'll often find that a smaller cache results in higher overall throughput.
with mappedbytebuffers the OS handles the details of when data is written to disk, I need to control this myself, ie. when I write(byte[]) it goes straight out to disk instantly
Actually, it doesn't, unless you've mounted your filesystem with the sync option. And then you still run the risk of data loss from a failed drive (especially in RAID 0).
I'm not sure how I can do this without mappedbytebuffers
A RandomAccessFile will do this. However, you'll be paying for at least a kernel context switch on every write (and if you have the filesystem mounted for synchronous writes, each of those writes will involve a disk round-trip).
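For example, a sketch of the RandomAccessFile route with synchronous writes at the Java level; the "rwd" open mode asks that every update to the file's content be written to the storage device before write() returns (file name and offset are arbitrary):

import java.io.IOException;
import java.io.RandomAccessFile;

public class SyncRandomWrite {
    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[4096];   // reusable buffer, no per-call allocation

        // "rwd": content updates are written synchronously to the underlying device
        // (file metadata updates may still be deferred).
        try (RandomAccessFile file = new RandomAccessFile("data.bin", "rwd")) {
            file.seek(123_456L);          // arbitrary offset within the file
            file.write(buffer);           // returns only after the OS reports the data written
        }
    }
}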
I am not using ACID transactions
Then I guess the data isn't really that valuable. So stop worrying about the possibility that someone will trip over a power cord.
Your objections to mapped byte buffers don't hold up. Your mapped files will be distinct from your object cache, and though they take address space they don't consume RAM. You can also sync your mapped byte buffers whenever you want (at the cost of some performance). Moreover, random access files end up using the same apparatus under the covers, so you can't save any performance there.
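For instance, a sketch of explicitly syncing a mapped region after writing to it (file name, offsets and sizes are arbitrary):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedSync {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get("data.bin"),
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {

            // Map a 16 MB region of the file for read/write access.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 16 * 1024 * 1024);

            buf.position(4096);
            buf.put(new byte[512]);   // modify part of the mapped region

            buf.force();              // flush the modified pages of this mapping to the device
        }
    }
}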
If mapped byte buffers aren't getting you the performance you need, you might have to bypass the filesystem and write directly to raw partitions (which is what DBMSs do). To do that, you probably need to write C++ code for your data handling and access it through JNI.

memcached and performance

I might be asking a very basic question, but I could not find a clear answer by googling, so I'm putting it here.
Memcached caches information in a separate process. Thus, getting the cached information requires inter-process communication (which in Java generally means serialization). That means that, to fetch a cached object, we generally need to get a serialized object and transport it over the network.
Both serialization and network communication are costly operations. If memcached needs to use both of them (generally speaking; there may be cases where network communication is not required), how is memcached fast? Isn't replication a better solution?
Or is this a tradeoff of distribution/platform independence/scalability vs. performance?
You are right that looking something up in a shared cache (like memcached) is slower than looking it up in a local cache (which is what I think you mean by "replication").
However, the advantage of a shared cache is that it is shared, which means each user of the cache has access to more cache than if the memory was used for a local cache.
Consider an application with a 50 GB database, with ten app servers, each dedicating 1 GB of memory to caching. If you used local caches, then each machine would have 1 GB of cache, equal to 2% of the total database size. If you used a shared cache, then you have 10 GB of cache, equal to 20% of the total database size. Cache hits would be somewhat faster with the local caches, but the cache hit rate would be much higher with the shared cache. Since cache misses are astronomically more expensive than either kind of cache hit, slightly slower hits are a price worth paying to reduce the number of misses.
Now, the exact tradeoff does depend on the exact ratio of the costs of a local hit, a shared hit, and a miss, and also on the distribution of accesses over the database. For example, if all the accesses were to a set of 'hot' records that were under 1 GB in size, then the local caches would give a 100% hit rate, and would be just as good as a shared cache. Less extreme distributions could still tilt the balance.
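To make the trade-off concrete, here is a tiny illustrative calculation. The latencies are made-up assumptions (a local hit at 0.1 ms, a shared hit at 1 ms, a database miss at 20 ms), and treating the hit rate as proportional to cache size is itself the simplification discussed above:

public class CacheCost {
    public static void main(String[] args) {
        double localHit = 0.1, sharedHit = 1.0, dbMiss = 20.0;   // assumed latencies in ms

        // Hit rates from the 50 GB database example: 1 GB local vs 10 GB shared.
        double localHitRate = 0.02, sharedHitRate = 0.20;

        double localAvg  = localHitRate  * localHit  + (1 - localHitRate)  * dbMiss;
        double sharedAvg = sharedHitRate * sharedHit + (1 - sharedHitRate) * dbMiss;

        System.out.printf("local cache:  avg %.2f ms per lookup%n", localAvg);   // ~19.60 ms
        System.out.printf("shared cache: avg %.2f ms per lookup%n", sharedAvg);  // ~16.20 ms
    }
}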
In practice, the optimum configuration will usually (IMHO!) be to have a small but very fast local cache for the hottest data, then a larger and slower cache for the long tail. You will probably recognise that as the shape of other cache hierarchies: consider the way that processors have small, fast L1 caches for each core, then slower L2/L3 caches shared between all the cores on a single die, then perhaps yet slower off-chip caches shared by all the dies in a system (do any current processors actually use off-chip caches?).
You are neglecting the cost of disk I/O in your consideration, which is generally going to be the slowest part of any process and is the main driver, IMO, for utilizing in-memory caching like memcached.
Memory caches like memcached use RAM accessed over the network. Replication uses both RAM and persistent disk storage to fetch data. Their purposes are very different.
If you're only thinking of using memcached to store easily obtainable data, such as a 1-1 mapping for table records, you're going to have a bad time.
On the other hand, if your data is the entire result set of a complex SQL query that may even overflow the SQL memory pool (and need to be temporarily written to disk to be fetched), you're going to see a big speed-up.
The previous example mentions needing to write data to disk for a read operation. Yes, that happens if the result set is too big for memory (imagine a CROSS JOIN), and it means you both read from and write to that drive (thrashing comes to mind).
In a highly optimized application written in C, for example, you may have a total processing time of 1 microsecond and yet need to wait for networking and/or serialization/deserialization (marshaling/unmarshaling) for much longer than the app's execution time itself. That's when you'll begin to feel the limitations of memory caching over the network.

What does "costly" mean in terms of software operations?

What is meant by "the operation is costly" or "the resource is costly" in terms of software? When I come across certain documents, they mention things like "opening a file every time is a costly operation". There are more examples like this (a database connection is a costly operation, a thread pool is a cheaper one, etc.). On what basis is it decided whether a task or operation is costly or cheap? What constraints should be considered when judging this? Is it based on time as well?
Note: I already searched the net for this but didn't find a good explanation. If you find one, kindly share it with me and I can close this.
Expensive or costly operations are those which cause a lot of resources to be used, such as CPU, disk drive(s) or memory.
For example, creating an integer variable in code is not a costly or expensive operation.
By contrast, creating a connection to a remote server that hosts a relational database, querying several tables and returning a large result set before iterating over it while remaining connected to the data source would be (relatively) expensive or costly, as opposed to my first example with the integer.
In order to build scalable, fast applications you would generally want to minimize the frequency of performing these costly/expensive actions, applying techniques of optimisation, caching, parallelism (etc) where they are essential to the operation of the software.
To get a degree of accuracy and some actual numbers on what is 'expensive' and what is 'cheap' in your application, you would employ some sort of profiling or analysis tool. For JavaScript, there is ySlow - for .NET applications, dotTrace - I'd be certain that whatever the platform, a similar solution exists. It's then down to someone to comprehend the output, which is probably the most important part!
Running time, memory use or bandwidth consumption are the most typical interpretations of "cost". Also consider that it may apply to cost in development time.
I'll try to explain through some examples:
If you need to edit two fields in each row of a database and you do it one field at a time, that's going to take close to twice as long as doing both at the same time (see the sketch below).
That extra time is not only a waste of your own time, but also a connection held open longer than needed and memory occupied longer than needed; at the end of the day, your efficiency goes down the drain.
When you start scaling, a very small amount of wasted time grows into a very big waste of company resources.
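A minimal JDBC sketch of the difference; the users table, column names and connection details are made up for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TwoFieldUpdate {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/app", "user", "pass")) {

            // Costly: two statements, two round-trips per row.
            try (PreparedStatement first  = conn.prepareStatement("UPDATE users SET email = ? WHERE id = ?");
                 PreparedStatement second = conn.prepareStatement("UPDATE users SET phone = ? WHERE id = ?")) {
                first.setString(1, "a@example.com");
                first.setLong(2, 42L);
                first.executeUpdate();
                second.setString(1, "555-0100");
                second.setLong(2, 42L);
                second.executeUpdate();
            }

            // Cheaper: one statement, one round-trip, both fields updated together.
            try (PreparedStatement both = conn.prepareStatement("UPDATE users SET email = ?, phone = ? WHERE id = ?")) {
                both.setString(1, "a@example.com");
                both.setString(2, "555-0100");
                both.setLong(3, 42L);
                both.executeUpdate();
            }
        }
    }
}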
It is almost certainly talking about a time penalty to perform that kind of input / output. Lots of memory shuffling (copying of objects created from classes with lots of members) is another time waster (pass by reference helps eliminate a lot of this).
Usually costly means, in a very simplified way, that it will take much longer than an operation in memory.
For instance, accessing a file in your file system and reading each line takes much longer than simply iterating over a list of the same size in memory.
The same can be said about database operations: they take much longer than in-memory operations, so some caution should be used not to abuse them.
This is, I repeat, a very simplistic explanation. Exactly what costly means depends on your particular context, the number of operations you're performing, and the overall architecture of the system.
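A rough illustration of that gap; this is a naive timing sketch rather than a proper benchmark (the file name and line count are arbitrary, and the OS file cache will affect the numbers):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FileVsMemory {
    public static void main(String[] args) throws IOException {
        // Build 100,000 lines in memory and also write them to a file.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) lines.add("line " + i);
        Path file = Paths.get("lines.txt");
        Files.write(file, lines, StandardCharsets.UTF_8);

        long t0 = System.nanoTime();
        long memChars = 0;
        for (String s : lines) memChars += s.length();   // pure in-memory iteration
        long t1 = System.nanoTime();

        long fileChars = 0;
        for (String s : Files.readAllLines(file, StandardCharsets.UTF_8)) fileChars += s.length();
        long t2 = System.nanoTime();

        System.out.printf("memory: %d chars in %.2f ms%n", memChars, (t1 - t0) / 1e6);
        System.out.printf("file:   %d chars in %.2f ms%n", fileChars, (t2 - t1) / 1e6);
    }
}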

Using NIO, do I need to care about R/W on block-boundaries?

Background
A lot of work has gone into optimizing database design, especially in the realm of the most optimal ways to read and write data from disks (both spindle and SSD).
The knowledge that has come out of that work suggests that reading and writing on block boundaries, matching the block size of the filesystem you are running on, is the optimal approach.
Question
Say I am operating in a relatively low-memory environment and want to use a small 32MB memory-mapped file to read and write the contents of a huge 500GB file.
If I were using Java's NIO mechanisms, specifically the MappedByteBuffer (Java's memory-mapped file mechanism), would I need to take care to execute READ and WRITE operations on block boundaries (e.g. 4KB) into memory before paring out the data I needed, or can I just issue R/W ops at any location I want and let the operating system, VM paging logic, filesystem and storage firmware handle the optimization of the operations and the culling of any additional block data I didn't need?
Additional Detail
The reason for the question is that in database design, I see an obsessive focus on block optimization, to the point that there doesn't seem to exist a world where you would ever just read and write data without the concept of a block.
What confuses me is that the filesystem is the one enforcing block-sized units of operation, so why would my higher-level app need to worry about this? If I want the 17,631 bytes at offset 71, can't I just grab them and read them in, or is it really faster for me to figure out that the read starts at block 0 and crosses the boundaries of blocks 0, 1 and 2, read all three of those blocks into an internal byte[], and then cull out the 17,631 bytes I wanted in the first place?
If the literature on DB design wasn't so religious about this block idea, the question would have never come up in my mind, but because it is, I am wondering if I am missing a critical detail here WRT filesystems and optimal block device I/O.
Thank you for reading.
I think part of the reason databases have awareness of a block size (which may not be exactly the same as the fs block size, but of course should align) is not just to perform block-aligned I/O, but also to manage how the disk data is cached in memory rather than just relying on the OS caching. Some databases bypass the OS filesystem cache completely, in fact. Having the database manage the cache sometimes allows greater intelligence as to how that cache is utilised, that the OS might not be able to provide.
An RDBMS will typically take account of the number of blocks that could be read or written during a query in order to compare different execution plans, and the possibility of all the data being fetched from the same block can be a useful optimisation to take note of.
Most databases I'm familiar with have the concept of a block cache/buffer where some portion of the working set of the database lives. A cache made up entirely of arbitrary extents could potentially be quite a bit harder to manage. Also, many databases actually arrange their stored data as a sequence of blocks, so the I/O pattern grows out of that. Of course, this might simply be a legacy of databases originally written for platforms that didn't have rich OS caching facilities...
Trying to conclude this ramble with some sort of answer to your question... my feeling would be that reading from arbitrary extents within the mapped file and letting the OS deal with the extra slop should be fine. Performance-wise, it's probably more important to try and let the OS do read-ahead: e.g. using the "advise" calls so the OS can start reading the next extent from disk while you process the current one. And, of course, a way to advise the OS to uncache extents you've finished with.
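For illustration, a minimal sketch of reading an arbitrary extent through a small mapped window, assuming the large file is reachable via a FileChannel (the file name is a placeholder; the offsets mirror the 17,631-bytes-at-offset-71 example from the question):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class WindowedRead {
    public static void main(String[] args) throws IOException {
        long offset = 71;      // arbitrary, non-block-aligned position
        int  length = 17_631;  // arbitrary, non-block-sized extent

        try (FileChannel ch = FileChannel.open(Paths.get("huge.dat"), StandardOpenOption.READ)) {
            // Map a window starting at the raw offset; the OS pages in whole 4 KB pages
            // underneath, but the application never has to deal with that.
            long windowSize = Math.min(32L * 1024 * 1024, ch.size() - offset);
            MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_ONLY, offset, windowSize);

            byte[] data = new byte[length];
            window.get(data);   // the 17,631 bytes starting at file offset 71
        }
    }
}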
4KB blocks are important because it's typically the granularity of the MMU and hence the OS virtual memory manager. When items are frequently used together, it's important to design your database layout so that these items end up in the same page. This way, a page fault will page in all the items in the page.

Using a concurrent hashmap to reduce memory usage with threadpool?

I'm working with a program that runs lengthy SQL queries and stores the processed results in a HashMap. Currently, to get around the slow execution time of each of the 20-200 queries, I am using a fixed thread pool and a custom callable to do the searching. As a result, each callable is creating a local copy of the data which it then returns to the main program to be included in the report.
I've noticed that 100 query reports, which used to run without issue, now cause me to run out of memory. My speculation is that because these callables are creating their own copy of the data, I'm doubling memory usage when I join them into another large HashMap. I realize I could try to coax the garbage collector to run by attempting to reduce the scope of the callable's table, but that level of restructuring is not really what I want to do if it's possible to avoid.
Could I improve memory usage by replacing the callables with runnables that instead of storing the data, write it to a concurrent HashMap? Or does it sound like I have some other problem here?
Don't create copies of the data; just pass references around, ensuring thread safety where needed. If you still get OOM without copying the data, consider increasing the maximum heap available to the application.
The drawback of not copying the data is that thread safety is harder to achieve, though.
Do you really need all 100-200 reports at the same time?
Maybe it's worth limiting the 1st level of caching to just 50 reports and introducing a 2nd level based on a WeakHashMap?
When the 1st level exceeds its size, the least recently used entries are pushed down to the 2nd level, whose capacity depends on the amount of available memory (through the use of the WeakHashMap).
Then, to look up a report, you first query the 1st level; if the value is not there, query the 2nd level; and if it is not there either, the report was reclaimed by the GC when memory ran short and you have to query the DB again for it.
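A rough sketch of that idea, using a LinkedHashMap in access order as the LRU 1st level and a WeakHashMap as the 2nd level. The class is made up for illustration, is not thread-safe, and assumes the report keys are not strongly referenced elsewhere (otherwise the WeakHashMap entries can never be reclaimed):

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.WeakHashMap;

public class TwoLevelCache<K, V> {
    private final Map<K, V> secondLevel = new WeakHashMap<>();   // entries may be reclaimed by the GC
    private final LinkedHashMap<K, V> firstLevel;

    public TwoLevelCache(int firstLevelSize) {
        // accessOrder = true keeps the map ordered from least to most recently used.
        this.firstLevel = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > firstLevelSize) {
                    secondLevel.put(eldest.getKey(), eldest.getValue());  // demote instead of discarding
                    return true;
                }
                return false;
            }
        };
    }

    public void put(K key, V value) {
        firstLevel.put(key, value);
    }

    /** Returns the cached value, or null if it was never cached or has been reclaimed. */
    public V get(K key) {
        V value = firstLevel.get(key);
        if (value == null) {
            value = secondLevel.get(key);
            if (value != null) {
                firstLevel.put(key, value);   // promote back to the 1st level
            }
        }
        return value;   // null means the caller has to query the DB again
    }
}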
Do the results of the queries depend on other query results? If not, just have each worker thread write its results into a ConcurrentHashMap as you are implying (see the sketch below). Do you really need to ask whether creating several unnecessary copies of data is causing your program to run out of memory? That should be almost obvious.
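A minimal sketch of that approach, with worker tasks writing their processed rows directly into a shared ConcurrentHashMap instead of returning per-task copies. The pool size, key/value types and the runQueryInto method are hypothetical placeholders for the existing query code:

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ReportRunner {

    // Shared result map; ConcurrentHashMap makes concurrent puts from worker threads safe.
    private final ConcurrentMap<String, String> results = new ConcurrentHashMap<>();

    public ConcurrentMap<String, String> run(List<String> queries) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (String query : queries) {
            // Runnable instead of Callable: nothing is returned or copied back to the caller.
            pool.execute(() -> runQueryInto(query, results));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return results;
    }

    // Placeholder for the existing slow SQL execution and row processing; instead of
    // building a local HashMap and returning it, each processed row goes straight
    // into the shared map, so only one copy of the data exists.
    private void runQueryInto(String query, ConcurrentMap<String, String> target) {
        target.put(query, "processed result for " + query);   // stand-in for real rows
    }
}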
