Many nested BufferedInputStreams - what's the impact?

There's a common pattern where each layer of an application that deals with data from a stream tends to wrap it in a BufferedInputStream, so that, as a whole, there are a lot of buffers, filled from buffers, filled from buffers, and so on.
I think this is bad practice and want to ask: how does it impact performance? Can this cause bugs?

This is a very general question, but I'd say there are a number of problems with having lots of layers of buffered input streams (in any language).
Each buffer takes up memory, even when it's not filled. So, even if the data gets sucked right up to the top "layer" straight away, memory is still being needlessly used. (Note: I'm assuming that Java doesn't resize its buffers automatically or anything, and I'm no Java expert.)
Whenever you read from the top-level buffer, you'll be setting off a big chain of method calls. Method calls involve indirection (i.e. pointer-following), passing-around of data (which could lead to poor caching performance), and so on.
It probably means that the design isn't very well-thought-out, since buffered streams should generally be for reading from sources that actually need buffering, like the disk or the network.
Just a few thoughts on the matter. I'm sure someone with better Java knowledge could contribute a more detailed analysis.

It will increase the memory footprint due to the extra buffers, but given the sizes likely involved, I suspect it's rare for this to have a significant effect on a given program. There's the standard rule of not trying to optimise before you need to.
There's also bound to be a slight processor overhead, but this will be even less significant.
It all depends on just how much it is used: if there are many long chains it could be a problem, but I think that's unlikely.
As David said, it is likely an indication of poor design. It would probably be more efficient for components to be able to share more complex objects directly, but it's all down to specific uses (and I'm having trouble thinking of a reason that you would use multiple buffered streams in such a way).

It is indeed very bad practice and can indeed cause bugs. If method A does some reading and then passes the stream to method B which attaches a BufferedInputStream and does some more reading, the BufferedInputStream will fill its buffer, which may consume data that method A is expecting to be still there when method B returns. Data can be lost by method B's BufferedInputStream reading ahead.
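A minimal sketch of the read-ahead problem described above, using an in-memory ByteArrayInputStream in place of a real file or socket. The class and method names are hypothetical, and the 8-byte buffer size is chosen only to make the effect easy to see.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ReadAheadDemo {
    // "Method B": wraps the stream it is given in its own BufferedInputStream
    // and reads a single byte. The wrapper pulls a whole buffer-full from the
    // underlying stream, and the extra bytes stay trapped in the wrapper.
    static int readOneByteBuffered(InputStream in) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(in, 8);
        return buffered.read();
    }

    // "Method A": reads one byte, hands the stream to B, then expects to
    // continue exactly where B stopped.
    static int[] run() {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
        try {
            int first = in.read();                // 1
            int second = readOneByteBuffered(in); // 2 (bytes 3..9 now sit in B's buffer)
            int third = in.read();                // A expects 3 but gets 10
            return new int[] {first, second, third};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "1 2 10"
    }
}
```

Wrapping once, in the layer that owns the stream, and passing that same BufferedInputStream down to the other methods avoids this loss.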
As regards overheads, in practice, if the reads/writes are large enough, the intermediate buffers are bypassed anyway, so there isn't nearly as much extra copying as you might think: the performance impact is mostly the extra memory space plus the extra method calls.

Related

Using NIO, do I need to care about R/W on block-boundaries?

Background
A lot of work has gone into optimizing database design, especially in the realm of the best ways to read and write data from disks (both spindle and SSD).
The knowledge that has come out of that work suggests that reading and writing on block boundaries, matching the block sizes of the filesystem you are running on, is the optimal approach.
Question
Say I am operating in a relatively low-memory environment and want to use a small 32MB memory-mapped file to read and write the contents of a huge 500GB file.
If I were using Java's NIO mechanisms, specifically the MappedByteBuffer (Java's memory-mapped file mechanism), would I need to take care to execute READ and WRITE operations on block boundaries (e.g. 4KB) into memory before paring out the data I needed, or can I just issue R/W ops at any location I want and let the operating system, VM paging logic, filesystem and storage firmware handle the optimization of the operations and the culling of additional block data I didn't need?
Additional Detail
The reason for the question is that in database design I see such an obsessive focus on block optimization that there doesn't seem to exist a world where you would ever just read and write data without the concept of a block.
What confuses me is that the filesystem is the one enforcing the block units of operation, so why would my higher-level app need to worry about this? If I want the 17,631 bytes at offset 71, can't I just grab them and read them in, or is it really faster for me to figure out that the read operation starts in block 0 and, with 4KB blocks, spans blocks 0 through 4, read all of those blocks into an internal byte[], and then cull out the 17,631 bytes I wanted in the first place?
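The block arithmetic the question describes can be sketched as below. It assumes a fixed block size (4096 bytes here, which is typical but not guaranteed) and a non-empty read. With 4KB blocks, the 17,631 bytes at offset 71 touch blocks 0 through 4; with 8KB blocks, the same read touches blocks 0 through 2.

```java
final class BlockMath {
    // Index of the first block touched by a read starting at 'offset'.
    static long firstBlock(long offset, int blockSize) {
        return offset / blockSize;
    }

    // Index of the last block touched (inclusive); assumes length > 0.
    static long lastBlock(long offset, long length, int blockSize) {
        return (offset + length - 1) / blockSize;
    }

    // Start of the block-aligned byte range that covers the requested bytes.
    static long alignedStart(long offset, int blockSize) {
        return firstBlock(offset, blockSize) * blockSize;
    }

    // End (exclusive) of the block-aligned byte range that covers the requested bytes.
    static long alignedEnd(long offset, long length, int blockSize) {
        return (lastBlock(offset, length, blockSize) + 1) * blockSize;
    }

    public static void main(String[] args) {
        long offset = 71;
        long length = 17_631;
        int blockSize = 4096;
        // prints "blocks 0..4"
        System.out.println("blocks " + firstBlock(offset, blockSize) + ".." + lastBlock(offset, length, blockSize));
        // prints "aligned bytes [0, 20480)"
        System.out.println("aligned bytes [" + alignedStart(offset, blockSize) + ", "
                + alignedEnd(offset, length, blockSize) + ")");
    }
}
```

So the block-aligned approach reads 20,480 bytes to deliver 17,631.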
If the literature on DB design weren't so religious about this block idea, the question would never have come up in my mind, but because it is, I am wondering if I am missing a critical detail here with respect to filesystems and optimal block-device I/O.
Thank you for reading.

I think part of the reason databases have awareness of a block size (which may not be exactly the same as the fs block size, but of course should align) is not just to perform block-aligned I/O, but also to manage how the disk data is cached in memory rather than just relying on the OS caching. Some databases bypass the OS filesystem cache completely, in fact. Having the database manage the cache sometimes allows greater intelligence as to how that cache is utilised, that the OS might not be able to provide.
An RDBMS will typically take account of the number of blocks that could be read/written during a query in order to compare different execution plans, and the possibility of all the data being fetched from the same block can be a useful optimisation to take note of.
Most databases I'm familiar with have the concept of a block cache/buffer where some portion of the working set of the database lives. Managing a cache entirely made up of arbitrary extents could potentially be quite a bit harder to manage. Also many databases actually arrange their stored data as a sequence of blocks, so the I/O pattern grows out of that. Of course, this might simply be a legacy of databases originally written for platforms that didn't have rich OS caching facilities...
Trying to conclude this ramble with some sort of answer to your question... my feeling would be that reading from arbitrary extents within the mapped file and letting the OS deal with the extra slop should be fine. Performance-wise, it's probably more important to try and let the OS do read-ahead: e.g. using the "advise" calls so the OS can start reading the next extent from disk while you process the current one. And, of course, a way to advise the OS to uncache extents you've finished with.
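For comparison, reading an arbitrary, unaligned extent through a MappedByteBuffer needs no block arithmetic in application code: FileChannel.map accepts any position, and page faults bring in whole pages behind the scenes. This is a minimal sketch that maps just the requested extent. The class name is hypothetical, and a program following the question would more likely keep one 32MB window mapped and position within it.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedWindowRead {
    // Maps exactly [offset, offset + length) read-only and copies it out.
    // The caller does no page alignment; the JDK handles that internally.
    static byte[] readExtent(Path file, long offset, int length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer window = ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] out = new byte[length];
            window.get(out); // the mapping stays valid even after the channel is closed
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mapped-demo", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i; // each byte encodes its own offset (mod 256)
        }
        Files.write(file, data);
        byte[] extent = readExtent(file, 71, 17_631);
        // prints "17631 bytes, first = 71"
        System.out.println(extent.length + " bytes, first = " + (extent[0] & 0xFF));
    }
}
```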

4KB blocks are important because it's typically the granularity of the MMU and hence the OS virtual memory manager. When items are frequently used together, it's important to design your database layout so that these items end up in the same page. This way, a page fault will page in all the items in the page.

Should I inline long code in a loop, or move it in a separate method?

Assume I have a loop (any while or for) like this:
loop{
A long code.
}
From the point of view of time complexity, should I divide this code into parts, write a function outside the loop, and call that function repeatedly?
I read something about functions a long time ago, that calling a function repeatedly takes more time or memory or something like that; I don't remember exactly. Can you also provide some good references about things like this (time complexity, coding style)?
Can you also recommend a reference book or tutorial about heap memory, overheads, etc. that affect the performance of a program?

The performance difference is probably very minimal in this case. I would concentrate on clarity rather than performance until you identify this portion of your code to be a serious bottleneck.
It really does depend on what kind of code you're running in the loop, however. If you're just doing a tiny mathematical operation that isn't going to take any CPU time, but you're doing it a few hundred thousand times, then inlining the calculation might make sense. Anything more expensive than that, though, and performance shouldn't be an issue.

There is an overhead of calling a function.
So if the "long code" is fast compared to this overhead (and your application cares about performance), then you should definitely avoid the overhead.
However, if the performance is not noticeably worse, it's better to make the code more readable by using a function (or, better, several functions).

Rule one of performance optimisation: measure it.
Personally, I go for readable code first and then optimise it IF NECESSARY. Usually, it isn't necessary :-)
See the first line in CHAPTER 3 - Measurement Is Everything
"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil." - Donald Knuth
In this case, the difference in performance will probably be minimal between the two solutions, so writing clearer code is the way to do it.
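A crude way to "measure it" for the inline-versus-method question is sketched below. It is only an illustration: the loop body is made up, the numbers will vary between runs and JVMs, and a dedicated harness such as OpenJDK's JMH handles warm-up and dead-code elimination far more reliably than this hand-rolled version.

```java
import java.util.function.LongSupplier;

final class CrudeTiming {
    // Results are written here so the JIT cannot discard the work as dead code.
    static volatile long blackhole;

    // Runs 'task' a few times to give the JIT a chance to compile it, then times one more run.
    static long timeNanos(LongSupplier task, int warmups) {
        for (int i = 0; i < warmups; i++) {
            blackhole += task.getAsLong();
        }
        long start = System.nanoTime();
        blackhole += task.getAsLong();
        return System.nanoTime() - start;
    }

    // Loop body written inline.
    static long inlined(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    // Same loop body moved into a small method (which the JIT is free to inline).
    static long extracted(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += step(i);
        }
        return sum;
    }

    private static long step(int i) {
        return (long) i * i % 7;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        System.out.println("inlined:   " + timeNanos(() -> inlined(n), 5) + " ns");
        System.out.println("extracted: " + timeNanos(() -> extracted(n), 5) + " ns");
    }
}
```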

There really isn't a simple "tutorial" on performance; it is a very complex subject and one that even seasoned veterans often don't fully understand. Anyway, to give you more of an idea of what the overhead of "calling" a function is: basically, you are "freezing" the state of your function (in Java there are no "functions" per se; they are all called methods), calling the method, then "unfreezing" back to where your method was before.
The "freezing" essentially consists of pushing state information (where you were in the method, what the values of the variables were, etc.) onto the stack; "unfreezing" consists of popping the saved state off the stack and updating the control structures to where they were before you called the function. Naturally, memory operations are far from free, but the VM is pretty good at keeping the performance impact to an absolute minimum.
Now keep in mind that Java is almost entirely heap-based: the only things that really have to get pushed onto the stack are the values of pointers (small), your place in the program (again small), whatever primitives are local to your method, and a tiny bit of control information, nothing else. Furthermore, although you cannot explicitly inline in Java (though I'm sure there are bytecode editors out there that essentially let you do that), most VMs, including the most popular one, HotSpot, will do this automatically for you. http://java.sun.com/developer/technicalArticles/Networking/HotSpot/inlining.html
So the bottom line is pretty much zero performance impact. If you want to verify this for yourself, you can always run benchmarking and profiling tools; they should be able to confirm it for you.

From an execution-speed point of view it shouldn't matter, and if you still believe this is a bottleneck, it is easy to measure.
From a development performance perspective, it is a good idea to keep the code short. I would vote for turning the loop contents into one (or more) properly named methods.

Forget it! You can't gain any performance by doing the job of the JIT. Let JIT inline it for you. Keep the methods short for readability and also for performance, as JIT works better with short methods.
There are microoptimizations which may help you gain some performance, but don't even think about them. I suggest the following rules:
Write clean code using appropriate objects and algorithms for readability and for performance.
In case the program is too slow, profile and identify the critical parts.
Think about improving them using better objects and algorithms.
As a last resort, you may also consider microoptimizations.

FileInput/OutputStream versus FileChannels -- which gives better performance

I am writing a program that has to copy a sizeable, but not huge amount of data from folder to folder (in the range of several dozen photos at once). Originally I was using java.io.FileOutputStream to simply read to buffer and write out, but then I heard about potential performance increases using java.nio.FileChannel.
I don't have the resources to run a serious, controlled test with the data I have, but there seems to be no consensus on what the advantages of each are (other than FileChannel being thread safe). Some users report FileChannel being great for smaller files, others report huge speed increases with larger files.
I am wondering if anyone knows exactly what the intent of creating FileChannel was in the first place: was it designed for better performance? In what cases? And is there a definitive performance increase for general kinds of data, or are the differences I should expect to see trivial because I am not working with data that is specialized enough?
EDIT: Assume my data does not need to be thread safe.

FileChannel.transferFrom/To should be faster than IO stream for file copying.
Or you can simply use Java 7's java.nio.file.Files.copy(source, target). That should be as fast as it can get.
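Both options from this answer can be sketched side by side. The transferTo loop is there because the method may transfer fewer bytes than requested; the sketch assumes the source file does not change while it is being copied. Class and method names are illustrative.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

final class CopyDemo {
    // Simplest option (Java 7+): let the JDK choose the copy strategy.
    static void copyWithFiles(Path src, Path dst) throws IOException {
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
    }

    // Channel-to-channel copy; loops because one transferTo call
    // is not required to move the whole file.
    static void copyWithChannels(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
        }
    }
}
```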
However, in the end, performance won't be noticeably different - hard disk speed is the bottleneck.
FileChannel is not non-blocking, and it is not selectable. Not sure if they are going to add these features in future. Java 7 has AsynchronousFileChannel though.

Input and Output Streams assume a stream styled access to the file or resource. There are a few extra items which help (array reads) but the basic idea is that of a stream where you read in one or more characters at a time (possibly blocking until you have more characters available).
Channels are the means to copy information into Buffers. This provides a lower level of access to input and output routines. With thoughtful buffer sizing, the speed-ups can be impressive. Structuring your code around buffers can reduce the time spent in a read loop (also increasing performance). Finally, while it is possible to pre-check input stream state in an attempt to avoid blocking, selectable channels (for example SocketChannel, though not FileChannel) can be put into non-blocking mode, so that operations on them never block, even in the worst conditions.
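A sketch of the channel-plus-buffer read loop described above: one reusable ByteBuffer is filled by the channel, flipped for reading, drained, and cleared. Summing the bytes is just a stand-in for real processing, and the buffer size is left to the caller.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChannelReadLoop {
    // Sums all bytes of a file (as unsigned values) using one reusable buffer.
    static long sumBytes(Path file, int bufferSize) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(bufferSize);
        long sum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (ch.read(buf) != -1) { // the channel fills the buffer
                buf.flip();              // switch the buffer from filling to draining
                while (buf.hasRemaining()) {
                    sum += buf.get() & 0xFF;
                }
                buf.clear();             // ready to be filled again
            }
        }
        return sum;
    }
}
```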

Have you taken a look at commons-io?
FileUtils.copyFileToDirectory(srcFile, destDir);

Java Profiling, Performance Tuning and Memory Profiling exercises

I am about to conduct a workshop on profiling, performance tuning, memory profiling, memory leak detection, etc. of Java applications using JProfiler and Eclipse TPTP.
I need a set of exercises that I could offer to participants where they can:
Use the tool to profile the application and discover the problem: bottleneck, memory leak, suboptimal code, etc. I am sure there is plenty of experience and real-life examples around.
Resolve the problem and implement optimized code
Demonstrate the solution by performing another session of profiling
Ideally, write the unit test that demonstrates the performance gain
Neither the problems nor the solutions should be overly complicated; it should be possible to resolve them in a matter of minutes at best and a matter of hours at worst.
Some interesting areas to exercise:
Resolve memory leaks
Optimize loops
Optimize object creation and management
Optimize string operations
Resolve problems exacerbated by concurrency and concurrency bottlenecks
Ideally, exercises should include sample unoptimized code and the solution code.

I try to find real life examples that I've seen in the wild (maybe slightly altered, but the basic problems were all very real). I've also tried to cluster them around the same scenario, so you can build up a session easily.
Scenario: you have a time-consuming function that you want to call many times for different values, but the same values may pop up again (ideally not too long after they were first seen). A good and simple example is URL/web-page pairs that you need to download and process (for the exercise, the download should probably be simulated).
Loops:
You want to check whether any of a set of words pops up in a page. You use your function in a loop, but always with the same value; pseudocode:
for (String word : words) {
    checkWord(word, download(url));
}
One solution is quite easy, just download the page before the loop.
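A runnable version of this loop exercise is sketched below. The download is simulated by a hypothetical method that only counts how often it is called, so the difference shows up as a number rather than as network time.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class HoistDownload {
    // Counts calls to the (simulated) expensive download.
    static final AtomicInteger downloads = new AtomicInteger();

    // Hypothetical stand-in for a real HTTP download.
    static String download(String url) {
        downloads.incrementAndGet();
        return "<html>simulated page for " + url + " mentioning java and profiling</html>";
    }

    static boolean checkWord(String word, String page) {
        return page.contains(word);
    }

    // Slow: downloads the same page once per word.
    static int countHitsSlow(String url, List<String> words) {
        int hits = 0;
        for (String word : words) {
            if (checkWord(word, download(url))) hits++;
        }
        return hits;
    }

    // Fixed: the page does not depend on the word, so download it once, before the loop.
    static int countHitsFast(String url, List<String> words) {
        String page = download(url);
        int hits = 0;
        for (String word : words) {
            if (checkWord(word, page)) hits++;
        }
        return hits;
    }
}
```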
Other solution is below.
Memory leak:
simple one: you can also solve your problem with a kind of cache. In the simplest case you can just put the results to a (static) map. But if you don't prevent it, its size will grow infinitely -> memory leak.
Possible solution: use an LRU map. Most likely performance will not degrade too much, but the memory leak should go away.
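One common way to get such an LRU map in plain Java is a LinkedHashMap in access order that overrides removeEldestEntry. This is a sketch, not the only option, and like LinkedHashMap itself it is not thread-safe (wrapping it with Collections.synchronizedMap is one way to share it between threads).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Evicts the least recently used entry once more than maxEntries are stored.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // called after each put
    }
}
```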
trickier one: say you implement the previous cache using a WeakHashMap, where the keys are the URLs (NOT as strings, see later) and the values are instances of a class that contains the URL, the downloaded page and something else. You may assume that this should be fine, but in fact it is not: as the value (which is not weakly referenced) holds a reference to the key (the URL), the key will never become eligible for garbage collection -> nice memory leak.
Solution: remove the URL from the value.
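The leaky and fixed value classes can be sketched as follows. java.net.URI is used for the keys here as an assumption (java.net.URL's equals and hashCode may perform host-name resolution, which makes it a poor map key), and the download is simulated by a hypothetical method. Garbage-collection timing cannot be shown deterministically, so the comments describe what happens rather than the code proving it.

```java
import java.net.URI;
import java.util.Map;
import java.util.WeakHashMap;

final class WeakCacheDemo {
    // LEAKY: the value strongly references its own key. WeakHashMap holds
    // values strongly, so the key stays reachable and the entry is never cleared.
    static final class LeakyEntry {
        final URI url;
        final String page;

        LeakyEntry(URI url, String page) {
            this.url = url;
            this.page = page;
        }
    }

    // FIXED: the value no longer refers to the key, so once callers drop
    // the key, the entry becomes eligible for removal.
    static final class PageEntry {
        final String page;

        PageEntry(String page) {
            this.page = page;
        }
    }

    static final Map<URI, PageEntry> cache = new WeakHashMap<>(); // not thread-safe

    // Hypothetical stand-in for the expensive download.
    static String download(URI url) {
        return "page:" + url;
    }

    static String getPage(URI url) {
        PageEntry entry = cache.get(url);
        if (entry == null) {
            entry = new PageEntry(download(url));
            cache.put(url, entry);
        }
        return entry.page;
    }
}
```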
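Same as before, but the URLs are interned strings ("to save some memory if we happen to have the same strings again"), and the value does not refer to them. I did not try it, but it seems to me that this would also cause a leak, because interned Strings can not be GC-ed. (On current HotSpot JVMs, however, interned strings can be collected once nothing else references them, so this variant may not actually leak there.)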
Solution: do not intern, which will also lead to the advice that you must not skip: don't do premature optimization, as it is the root of all evil.
Object creation & Strings:
say you want to display the text of the pages only (~remove html tags). Write a function that does it line by line, and appends it to a growing result. At first the result should be a string, so appending will take a lot of time and object allocation. You can detect this problem from performance point of view (why appends are so slow) and from object creation point of view (why we created so many Strings, StringBuffers, arrays, etc).
Solution: use a StringBuilder for the result.
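Both versions of the text-extraction exercise are sketched below with a deliberately naive regex tag stripper (an assumption for the exercise, not a real HTML parser). The slow version copies the whole accumulated result on every +=, so its total copying grows quadratically with the text length; the StringBuilder version appends into one growable buffer.

```java
final class StripTags {
    // Naive tag removal: drops everything between '<' and '>'.
    static String stripLine(String line) {
        return line.replaceAll("<[^>]*>", "");
    }

    // Slow: every += builds a brand-new String containing the whole result so far.
    static String textSlow(String[] lines) {
        String result = "";
        for (String line : lines) {
            result += stripLine(line) + "\n";
        }
        return result;
    }

    // Fixed: one StringBuilder, amortised constant-time appends.
    static String textFast(String[] lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(stripLine(line)).append('\n');
        }
        return sb.toString();
    }
}
```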
Concurrency:
You want to speed the whole stuff up by doing downloading/filtering in parallel. Create some threads and run your code using them, but do everything inside a big synchronized block (based on the cache), just "to protect the cache from concurrency problems". Effect should be that you effectively use just one thread, as all the others are waiting to acquire the lock on the cache.
Solution: synchronize only around cache operations (e.g. use `java.util.Collections.synchronizedMap()`).
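A sketch of the narrow-locking fix: the map is wrapped with Collections.synchronizedMap so each individual get or putIfAbsent is locked, but the slow download-and-filter step (represented by a hypothetical loader function) runs outside any lock. The trade-off is that two threads may occasionally load the same URL at once; the first result stored wins.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class NarrowLockCache {
    private final Map<String, String> cache = Collections.synchronizedMap(new HashMap<>());
    private final Function<String, String> loader; // hypothetical download + filter step

    NarrowLockCache(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String url) {
        String page = cache.get(url);                   // short locked operation
        if (page != null) {
            return page;
        }
        String loaded = loader.apply(url);              // slow work, no lock held
        String existing = cache.putIfAbsent(url, loaded); // short locked operation
        return existing != null ? existing : loaded;
    }
}
```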
Synchronize all tiny little pieces of code. This should kill performance and probably prevent normal parallel execution. If you are lucky/smart enough, you can come up with a deadlock too.
Moral of this: synchronization should not be an ad hoc thing, on an "it will not hurt" basis, but a well thought thing.
Bonus exercise:
Fill up your cache at the beginning and don't do too much allocation afterward, but still have a small leak somewhere. Usually this pattern is not too easy to catch. You can use a "bookmark", or "watermark" feature of the profiler, which should be created right after the caching is done.

Don't ignore this method because it works very well for any language and OS, for these reasons. An example is here. Also, try to use examples with I/O and significant call depth. Don't just use little cpu-bound programs like Mandelbrot. If you take that C example, which isn't too large, and recode it in Java, that should illustrate most of your points.
Let's see:
Resolve memory leaks.
The whole point of a garbage collector is to plug memory leaks. However, you can still allocate too much memory, and that shows up as a large percent of time in "new" for some objects.
Optimize loops.
Generally loops don't need to be optimized unless there's very little done inside them (and they take a good percent of time).
Optimize object creation and management.
The basic approach here is: keep data structure as simple as humanly possible. Especially stay away from notification-style attempts to keep data consistent, because those things run away and make the call tree enormously bushy. This is a major reason for performance problems in big software.
Optimize string operations.
Use string builder, but don't sweat code that doesn't use a solid percent of execution time.
Concurrency.
Concurrency has two purposes.
1) Performance, but this only works to the extent that it allows multiple pieces of hardware to get cranking at the same time. If the hardware isn't there, it doesn't help. It hurts.
2) Clarity of expression, so for example UI code doesn't have to worry about heavy calculation or network I/O going on at the same time.
In any case, it can't be emphasized enough, don't do any optimization before you've proved that something takes a significant percent of time.

I have used JProfiler for profiling our application, but it hasn't been of much help. Then I used JHat. Using JHat you cannot see the heap in real time; you have to take a heap dump and then analyse it. Using OQL (Object Query Language) is a good technique for finding heap leaks.

How to avoid OutOfMemoryError when using ByteBuffers and NIO?

I'm using ByteBuffers and FileChannels to write binary data to a file. When doing that for big files or successively for multiple files, I get an OutOfMemoryError exception.
I've read elsewhere that using ByteBuffers with NIO is broken and should be avoided. Have any of you already faced this kind of problem and found a solution to efficiently save large amounts of binary data to a file in Java?
Is the jvm option -XX:MaxDirectMemorySize the way to go?

I would say don't create a huge ByteBuffer that contains ALL of the data at once. Create a much smaller ByteBuffer, fill it with data, then write this data to the FileChannel. Then clear the ByteBuffer and continue until all the data is written.
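A sketch of this approach: a fixed, small buffer is refilled and written repeatedly, so memory use does not grow with the file. The data generator (each byte is its file offset modulo 256) is a made-up stand-in, and the inner write loop keeps draining until the buffer is empty, which is the usual defensive idiom for channel writes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ChunkedWriter {
    // Writes 'totalBytes' generated bytes through one reusable buffer of 'chunkSize' bytes.
    static void writePattern(Path file, long totalBytes, int chunkSize) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(chunkSize);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long written = 0;
            while (written < totalBytes) {
                buf.clear(); // reuse the same buffer for every chunk
                while (buf.hasRemaining() && written + buf.position() < totalBytes) {
                    buf.put((byte) ((written + buf.position()) % 256)); // hypothetical data
                }
                buf.flip(); // switch from filling to draining
                while (buf.hasRemaining()) {
                    written += ch.write(buf);
                }
            }
        }
    }
}
```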

Check out Java's MappedByteBuffer, which is a kind of direct buffer. Basically, this mechanism uses the OS's virtual memory paging system to 'map' your buffer directly to disk. The OS will manage moving the bytes to/from disk and memory auto-magically, very quickly, and you won't have to worry about changing your virtual machine options. This will also allow you to take advantage of NIO's improved performance over traditional Java stream-based I/O, without any weird hacks.
The only two catches that I can think of are:
On 32-bit system, you are limited to just under 4GB total for all mapped byte buffers. (That is actually a limit for my application, and I now run on 64-bit architectures.)
Implementation is JVM specific and not a requirement. I use Sun's JVM and there are no problems, but YMMV.
Kirk Pepperdine (a somewhat famous Java performance guru) is involved with a website, www.JavaPerformanceTuning.com, that has some more MBB details: NIO Performance Tips

If you access files in a random fashion (read here, skip, write there, move back) then you have a problem ;-)
But if you only write big files, you should seriously consider using streams. java.io.FileOutputStream can be used directly to write a file byte after byte, or wrapped in any other stream (e.g. DataOutputStream, ObjectOutputStream) for the convenience of writing floats, ints, Strings or even serializable objects. Similar classes exist for reading files.
Streams offer you the convenience of manipulating arbitrarily large files in (almost) arbitrarily small memory. They are the preferred way of accessing the file system in the vast majority of cases.
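A minimal sketch of the stream-based approach: FileOutputStream wrapped in a BufferedOutputStream and a DataOutputStream, writing one record at a time so memory use stays constant regardless of file size. The record layout (an int followed by a double) is made up for the example.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Path;

final class StreamWriteDemo {
    // Writes 'count' records of (int, double) without holding them in memory.
    static void writeRecords(Path file, int count) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file.toFile())))) {
            for (int i = 0; i < count; i++) {
                out.writeInt(i);
                out.writeDouble(i * 0.5);
            }
        }
    }

    // Reads the records back and sums the int fields.
    static long sumIds(Path file, int count) throws IOException {
        long sum = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file.toFile())))) {
            for (int i = 0; i < count; i++) {
                sum += in.readInt();
                in.readDouble(); // skip the double field
            }
        }
        return sum;
    }
}
```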

Using the transferFrom method should help with this, assuming you write to the channel incrementally and not all at once as previous answers also point out.

This can depend on the particular JDK vendor and version.
There is a bug in GC in some Sun JVMs. Shortages of direct memory will not trigger a GC in the main heap, but the direct memory is pinned down by garbage direct ByteBuffers in the main heap. If the main heap is mostly empty, they may not be collected for a long time.
This can burn you even if you aren't using direct buffers on your own, because the JVM may be creating direct buffers on your behalf. For instance, writing a non-direct ByteBuffer to a SocketChannel creates a direct buffer under the covers to use for the actual I/O operation.
The workaround is to use a small number of direct buffers yourself, and keep them around for reuse.
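One way to follow this workaround is to keep one direct buffer per thread and reuse it, sketched below with a ThreadLocal. The 64KB size is an arbitrary assumption, and the design assumes a caller finishes with the buffer before acquiring it again on the same thread; in long-lived thread pools the buffers live as long as the threads do.

```java
import java.nio.ByteBuffer;

final class DirectBufferPool {
    private static final int SIZE = 64 * 1024; // assumed size; tune for the workload

    // One direct buffer per thread, allocated lazily on first use and then reused.
    private static final ThreadLocal<ByteBuffer> BUFFER =
            ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(SIZE));

    // Returns this thread's buffer, cleared and ready to be filled.
    static ByteBuffer acquire() {
        ByteBuffer buf = BUFFER.get();
        buf.clear();
        return buf;
    }
}
```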

The previous two responses seem pretty reasonable. As for whether the command-line switch will work, it depends on how quickly your memory usage hits the limit. If you don't have enough RAM and virtual memory available to at least triple the memory available, then you will need to use one of the alternative suggestions given.
