I have lots of questions about Java NIO. I have read many articles that dig deep into it, but I still don't understand in which respects NIO is quicker than IO.
Also I have observed that downloading a 100MB file with Java NIO code is at least 10 times faster than downloading with Java IO code.
Now my question is:
Suppose I am downloading a file that is only 1KB. Will the NIO code still be ten times faster for a 1KB file?
Generally speaking, NIO is faster than classic Java IO because it reduces the amount of in-memory copying. However, a ten-fold improvement in speed is implausible, even for large files. And when we are talking about downloading files (rather than reading / writing them to disk), the performance is likely to be dominated by the bandwidth and end-to-end latency to the machine you are loading from.
Finally, you are likely to find that the relative speedup of NIO for small files will be even less ... because of the overheads of establishing network connections, sending requests, processing headers and so on.
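Just for context (the question doesn't include the code), a typical comparison of the two download approaches looks something like this minimal sketch; the URL and file names are placeholders:

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URL;
    import java.nio.channels.Channels;
    import java.nio.channels.FileChannel;
    import java.nio.channels.ReadableByteChannel;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class DownloadComparison {
        // Classic IO: copy the stream through a byte[] on the Java heap.
        static void downloadWithIo(String url, String file) throws Exception {
            try (InputStream in = new URL(url).openStream();
                 OutputStream out = Files.newOutputStream(Paths.get(file))) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }
        }

        // NIO: let FileChannel.transferFrom() move the bytes, which can avoid
        // some copying between intermediate buffers.
        static void downloadWithNio(String url, String file) throws Exception {
            try (ReadableByteChannel in = Channels.newChannel(new URL(url).openStream());
                 FileChannel out = FileChannel.open(Paths.get(file),
                         StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                out.transferFrom(in, 0, Long.MAX_VALUE);
            }
        }
    }

Either way, for a download the network is usually the limiting factor, which is why the gap narrows for small files.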
Related
We all have learned from trial and error that multiple blocking threads do not scale well, and that we should switch to using NIO wherever possible. Yet, there are not as many resources explaining why non-blocking is better by giving an under-the-hood example of how it actually works.
We all have learned from trial and error that multiple blocking threads do not scale well,
This was true ten years ago; however, in general, using blocking IO (or blocking NIO) works well enough today. Unless you have a very large number of connections and a service which does very little work per connection, you can comfortably support up to 1,000 connections on a modern server. Don't forget that servers are faster now, have many more cores, and people expect servers to do more work, i.e. the bottleneck is in the application, not the IO.
we should switch to using NIO wherever possible.
The main benefit is reduced thread overhead. As I mentioned, this is not as significant as it was when NIO was introduced more than ten years ago.
NIO is much harder to work with so I would suggest only using it if you really need to.
there are not as many resources explaining why non-blocking is better
The explanation is: you use fewer threads, and thus have lower overhead. This only matters if the work each thread does is very small.
Note: it is often assumed that NIO means non-blocking, when actually the default behaviour of all the Channels in NIO is blocking. In fact, only the network channels (such as SocketChannel) can be configured to be non-blocking; FileChannel cannot. Non-blocking is the exception rather than the rule.
Note2: the fastest way to handle a small number of connections is to use blocking NIO.
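To make the blocking-by-default point concrete, here is a minimal sketch (host and port are placeholders); non-blocking mode has to be requested explicitly, and only the selectable network channels support it:

    import java.net.InetSocketAddress;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.SocketChannel;

    public class BlockingByDefault {
        public static void main(String[] args) throws Exception {
            // A SocketChannel is blocking by default, just like a classic Socket.
            SocketChannel ch = SocketChannel.open(new InetSocketAddress("example.com", 80));

            // Non-blocking mode is something you opt into, and only for selectable
            // (network) channels; FileChannel has no configureBlocking() at all.
            ch.configureBlocking(false);
            Selector selector = Selector.open();
            ch.register(selector, SelectionKey.OP_READ);
        }
    }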
Finally, another benefit of using NIO is reduced copying of data if you use "direct" (native) buffers. However, again, you need to be doing bulk transfers of data; as soon as you start reading/writing the data byte by byte, e.g. as text, that overhead swamps any gains you might have made.
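As a rough illustration of the "direct buffer" point, a bulk file copy using an off-heap buffer might look like this sketch (file names are placeholders):

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class DirectBufferCopy {
        public static void main(String[] args) throws Exception {
            Path src = Paths.get("input.bin");   // placeholder file names
            Path dst = Paths.get("output.bin");
            try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
                 FileChannel out = FileChannel.open(dst,
                         StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                // A direct buffer lives outside the Java heap, so the OS can read into it
                // and write from it without an extra copy to/from a Java byte[].
                ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MB
                while (in.read(buf) != -1) {
                    buf.flip();
                    while (buf.hasRemaining()) {
                        out.write(buf);
                    }
                    buf.clear();
                }
            }
        }
    }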
by giving an under-the-hood example of how it actually works.
Most of the under-the-hood differences are either smaller than you might imagine or handled entirely by the operating system, and thus hidden from Java and even the JVM.
So people say that NIO is faster and scales better than IO. Will that hold for a server handling 1000 concurrent GET/PUTs?
A simple thread-per-connection model utilises multiple cores to the max. Where does NIO stand in this respect?
Is there any way to combine both of these? If so, any links to details would be appreciated.
Ten years ago NIO scaled much better than IO, largely because the number of threads you could use efficiently was relatively small, especially on Linux systems, e.g. a few hundred threads. Today the tipping point is much higher, e.g. around 10,000. If you need 100,000 connections, using NIO is a good idea. However, if you only have a few thousand, you are likely to find other issues, such as your disk or network performance, far more critical.
I almost always use NIO with one thread per blocking connection. In fact, blocking sockets and files were the default behaviour until NIO2 arrived in Java 7. BTW, NIO2 uses a thread pool behind the scenes to support its "asynchronous" IO. ;)
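A minimal sketch of that "one thread per blocking connection" style, using NIO channels in their default blocking mode (the port and the echo behaviour are just placeholders):

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;

    public class BlockingNioServer {
        public static void main(String[] args) throws Exception {
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(8080));          // port is a placeholder
            while (true) {
                SocketChannel client = server.accept();        // blocks until a client connects
                new Thread(() -> handle(client)).start();      // one thread per blocking connection
            }
        }

        static void handle(SocketChannel client) {
            ByteBuffer buf = ByteBuffer.allocate(8192);
            try (SocketChannel ch = client) {
                while (ch.read(buf) != -1) {                   // blocking read
                    buf.flip();
                    ch.write(buf);                             // echo back, just as an example
                    buf.clear();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }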
In addition to the links posted by #Kumar, I found this one helpful (I was doing this research a few weeks ago): http://paultyma.blogspot.com/2008/03/writing-java-multithreaded-servers.html.
See the associated slides as well for more detailed stats. He makes the argument for the java.io approach. Of course, as with everything, it depends on the use case.
This question already has answers here:
Fastest way to write huge data in text file Java
I get a fast stream of data (objects) and I would like to write it to a file.
This is a stand-alone process, so it doesn't do anything but read the data from a socket, parse it to CSV and write it all to a file.
What is the best way to write a lot of CSV lines to a file?
Is buffered writing the solution?
Is there a buffered file object in Java?
Should I manage the buffering myself and use writeLines()?
Fastest way to write huge data in text file Java
If you're dealing with a huge throughput of data, then I suggest you use a set of in-memory buffers into which you deposit the arriving data, and then have a thread/thread pool which uses Java NIO to "consume" these buffers and write them to disk. You will, however, be limited by the disk writing speed; bear in mind that it's not unusual for the network to be faster than your hard disk! So you might want to consider a thread pool which writes to different physical locations and only "pastes" these files together after all the data has been received and written.
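As a rough sketch of this idea (the class and method names are made up and error handling is minimal): the socket-reading thread deposits CSV lines into a bounded queue, and a single writer thread drains it onto disk with a FileChannel.

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class AsyncCsvWriter {
        private final BlockingQueue<ByteBuffer> queue = new LinkedBlockingQueue<>(1024);

        // Producer side: called by the thread that reads from the socket and builds CSV lines.
        public void submit(String csvLine) throws InterruptedException {
            queue.put(ByteBuffer.wrap((csvLine + "\n").getBytes(StandardCharsets.UTF_8)));
        }

        // Consumer side: a single daemon thread drains the queue and writes to disk via NIO.
        public void startWriter(Path file) {
            Thread writer = new Thread(() -> {
                try (FileChannel out = FileChannel.open(file,
                        StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
                    while (true) {
                        ByteBuffer buf = queue.take();           // blocks until data arrives
                        while (buf.hasRemaining()) {
                            out.write(buf);
                        }
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            writer.setDaemon(true);
            writer.start();
        }
    }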
As mentioned above, chances are that it's disk I/O that limits you, not Java abstractions.
But beyond using a good library to deal with CSV, you might consider using other (even more) efficient formats like JSON, as well as compression. GZIP is good at compressing things but relatively slow; there are faster codecs too. For example, LZF (like this Java implementation) is fast enough to compress at speeds higher than typical disk I/O (and to uncompress even faster). So compressing the output may well increase throughput as well as reduce disk usage.
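For example, buffered and GZIP-compressed CSV output can be done with JDK classes alone; the file name and rows below are placeholders, and an LZF codec from a third-party library would be wired in much the same way:

    import java.io.BufferedWriter;
    import java.io.FileOutputStream;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.GZIPOutputStream;

    public class CompressedCsv {
        public static void main(String[] args) throws Exception {
            // BufferedWriter over a GZIPOutputStream: buffered, compressed CSV output.
            try (Writer out = new BufferedWriter(new OutputStreamWriter(
                    new GZIPOutputStream(new FileOutputStream("data.csv.gz")),
                    StandardCharsets.UTF_8))) {
                out.write("id,x,y\n");
                out.write("1,0.5,0.7\n");   // placeholder rows
            }
        }
    }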
I remember 2 or 3 years ago reading a couple articles where people claimed that modern threading libraries were getting so good that thread-per-request servers would not only be easier to write than non-blocking servers but that they'd be faster, too. I believe this was even demonstrated in Java with a JVM that mapped Java threads to pthreads (i.e. the Java nio overhead was more than the context-switching overhead).
But now I see all the "cutting edge" servers use asynchronous libraries (Java nio, epoll, even node.js). Does this mean that async won?
Not in my opinion. If both models are well implemented (this is a BIG requirement) I think that the concept of NIO should prevail.
At the heart of a computer are cores. No matter what you do, you cannot parallelize your application more than you have cores. i.e. If you have a 4 core machine, you can ONLY do 4 things at a time (I'm glossing over some details here, but that suffices for this argument).
Expanding on that thought, if you ever have more threads than cores, you have waste. That waste takes two forms. First is the overhead of the extra threads themselves. Second is the time spent switching between threads. Both are probably minor, but they are there.
Ideally, you have a single thread per core, and each of those threads is running at 100% processing speed on its core. Task switching wouldn't occur in the ideal case. Of course there is the OS, but if you take a 16-core machine and leave 2-3 threads for the OS, then the remaining 13-14 go towards your app. Those threads can switch what they are doing within your app, for example when they are blocked by IO requirements, but they don't have to pay that cost at the OS level. Write it right into your app.
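As a rough sketch of that "one thread per core, switch tasks inside the app" idea (the dispatch logic is omitted and the thread count is only an illustration):

    import java.io.IOException;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PerCoreEventLoops {
        public static void main(String[] args) {
            int cores = Runtime.getRuntime().availableProcessors();
            int loops = Math.max(1, cores - 2);            // leave a couple of cores for the OS
            ExecutorService pool = Executors.newFixedThreadPool(loops);
            for (int i = 0; i < loops; i++) {
                pool.submit(() -> {
                    try (Selector selector = Selector.open()) {
                        while (!Thread.currentThread().isInterrupted()) {
                            selector.select(1000);          // wait up to 1s for ready channels
                            for (SelectionKey key : selector.selectedKeys()) {
                                // dispatch reads/writes for 'key' here: the task switching
                                // happens inside the application, not via OS context switches
                            }
                            selector.selectedKeys().clear();
                        }
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                });
            }
        }
    }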
An excellent example of this scaling is seen in SEDA http://www.eecs.harvard.edu/~mdw/proj/seda/ . It showed much better scaling under load than a standard thread-per-request model.
My personal experience is with Netty. I had a simple app. I implemented it well in both Tomcat and Netty. Then I load-tested it with hundreds of concurrent requests (upwards of 800, I believe). Eventually Tomcat slowed to a crawl and exhibited extremely bursty/laggy behaviour, whereas the Netty implementation simply increased response time but continued with incredible overall throughput.
Please note, this hinges on a solid implementation. NIO is still getting better with time. We are learning how to tune our server OSes to work better with it, as well as how to implement the JVMs to better leverage the OS functionality. I don't think a winner can be declared yet, but I believe NIO will be the eventual winner, and it's doing quite well already.
It is faster as long as there is enough memory.
When there are too many connections, most of which are idle, NIO can save threads, and therefore memory, and the system can handle a lot more users than with a thread-per-connection model.
CPU is not a direct factor here. With NIO, you effectively need to implement a threading model yourself, which is unlikely to be faster than the JVM's threads.
In either choice, memory is the ultimate bottleneck. When load increases and the memory used approaches the maximum, GC will be very busy, and the system often demonstrates the symptom of 100% CPU.
Some time ago I found a rather interesting presentation providing some insight into "why the old thread-per-client model is better". There are even measurements. However, I'm still thinking it through. In my opinion the best answer to this question is "it depends", because most (if not all) engineering decisions are trade-offs.
Like that presentation said - there's speed and there's scalability.
One scenario where thread-per-request will almost certainly be faster than any async solution is when you have a relatively small number of clients (e.g. <100), but each client is very high volume. e.g. a realtime app where no more than 100 clients are sending/generating 500 messages a second each. Thread-per-request model will certainly be more efficient than any async event loop solution. Async scales better when there are many requests/clients because it doesn't waste cycles waiting on many client connections, but when you have few high volume clients with little waiting, it's less efficient.
From what I've seen, the authors of Node and Netty both recognize that these frameworks are primarily meant to address high-volume/many-connection scalability situations, rather than to be faster for a smaller number of high-volume clients.
So for some research work, I need to analyze a ton of raw movement data (currently almost a gig of data, and growing) and spit out quantitative information and plots.
I wrote most of it using Groovy (with JFreeChart for charting) and when performance became an issue, I rewrote the core parts in Java.
The problem is that analysis and plotting takes about a minute, whereas loading all of the data takes about 5-10 minutes. As you can imagine, this gets really annoying when I want to make small changes to plots and see the output.
I have a couple ideas on fixing this:
Load all of the data into a SQLite database.
Pros: It'll be fast. I'll be able to run SQL to get aggregate data if I need to.
Cons: I have to write all that code. Also, for some of the plots I need access to each point of data, so with a couple hundred thousand files to load, some parts may still be slow.
Java RMI to return the object. All the data gets loaded into one root object, which, when serialized, is about 200 megs. I'm not sure how long it would take to transfer a 200meg object through RMI. (same client).
I'd have to run the server and load all the data but that's not a big deal.
Major pro: this should take the least amount of time to write
Run a server that loads the data and executes a Groovy script on command within the server VM. Overall, this seems like the best idea (for implementation time vs. performance, as well as other long-term benefits).
What I'd like to know is: have other people tackled this problem?
Post-analysis (3/29/2011): A couple months after writing this question, I ended up having to learn R to run some statistics. Using R was far, far easier and faster for data analysis and aggregation than what I was doing.
Eventually, I ended up using Java to run preliminary aggregation, and then ran everything else in R. It was also much easier to make beautiful charts in R than with JFreeChart.
Databases are very scalable, if you are going to have massive amounts of data. In MS SQL we currently group/sum/filter about 30GB of data in 4 minutes (somewhere around 17 million records I think).
If the data is not going to grow very much, then I'd try out approach #2. You can make a simple test application that creates a 200-400mb object with random data and test the performance of transferring it before deciding if you want to go that route.
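A minimal sketch of such a test, which only measures serialization of a roughly 200 MB object graph (the array sizes are arbitrary, it needs a large enough heap, and you would still have to add the actual RMI transfer):

    import java.io.ByteArrayOutputStream;
    import java.io.ObjectOutputStream;
    import java.util.Random;

    public class SerializationTest {
        public static void main(String[] args) throws Exception {
            // Roughly 200 MB of doubles (50,000 x 500 x 8 bytes).
            double[][] data = new double[50_000][500];
            Random rnd = new Random();
            for (double[] row : data) {
                for (int i = 0; i < row.length; i++) {
                    row[i] = rnd.nextDouble();
                }
            }

            long start = System.nanoTime();
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(data);
            }
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Serialized " + bytes.size() / (1024 * 1024) + " MB in " + millis + " ms");
        }
    }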
Before you make a decision, it's probably worth understanding what is going on with your JVM as well as your physical system resources.
There are several factors that could be at play here:
jvm heap size
garbage collection algorithms
how much physical memory you have
how you load the data - is it from a file that is fragmented all over the disk?
do you even need to load all of the data at once - can it be done in batches
if you are doing it in batches you can vary the batch size and see what happens
if your system has multiple cores perhaps you could look at using more than one thread at a time to process/load data (see the sketch at the end of this answer)
if using multiple cores already and disk I/O is the bottleneck, perhaps you could try loading from different disks at the same time
You should also look at http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp if you aren't familiar with the settings for the VM.
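To illustrate the batching and multi-threading points above, here is a rough sketch; the batching scheme and the use of Files.readAllLines() as the "parser" are placeholders for whatever your real loader does:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class BatchedLoader {
        // 'batches' is a placeholder for however you split your input files;
        // each batch is parsed on its own worker thread.
        static List<List<String>> loadInParallel(List<List<Path>> batches) throws Exception {
            ExecutorService pool =
                    Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<Path> batch : batches) {
                futures.add(pool.submit(() -> {
                    List<String> lines = new ArrayList<>();
                    for (Path file : batch) {
                        lines.addAll(Files.readAllLines(file)); // replace with your real parser
                    }
                    return lines;
                }));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get());        // wait for each batch to finish
            }
            pool.shutdown();
            return results;
        }
    }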
If your data has relational properties, there is nothing more natural than storing it in an SQL database. There you can solve your biggest problem, performance, at the cost of "just" writing the appropriate SQL code.
Seems very straightforward to me.
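As a rough sketch of that route, assuming a SQLite JDBC driver on the classpath (the table and column names are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Statement;

    public class SqliteSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:movement.db")) {
                try (Statement st = conn.createStatement()) {
                    st.execute("CREATE TABLE IF NOT EXISTS movement (t REAL, x REAL, y REAL)");
                }
                conn.setAutoCommit(false);               // batch inserts inside one transaction
                try (PreparedStatement insert =
                             conn.prepareStatement("INSERT INTO movement (t, x, y) VALUES (?, ?, ?)")) {
                    insert.setDouble(1, 0.0);
                    insert.setDouble(2, 1.5);
                    insert.setDouble(3, 2.5);
                    insert.addBatch();
                    insert.executeBatch();
                }
                conn.commit();
                // Aggregates then become one SQL query, e.g.:
                // SELECT COUNT(*), AVG(x), AVG(y) FROM movement;
            }
        }
    }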
I'd look into analysis using R. It's a statistical language with graphing capabilities. It could put you ahead, especially if that's the kind of analysis you intend to do. Why write all that code?
I would recommend running a profiler to see what part of the loading process is taking the most time and if there's a possible quick win optimization. You can download an evaluation license of JProfiler or YourKit.
Ah, yes: large data structures in Java. Good luck with that, surviving "death by garbage collection" and all. What java seems to do best is wrapping a UI around some other processing engine, although it does free developers from most memory management tasks -- for a price. If it were me, I would most likely do the heavy crunching in Perl (having had to recode several chunks of a batch system in perl instead of java in a past job for performance reasons), then spit the results back to your existing graphing code.
However, given your suggested choices, you probably want to go with the SQL DB route. Just make sure that it really is faster for a few sample queries, watch the query-plan data and all that (assuming your system will log or interactively show such details)
Edit, (to Jim Ferrans) re: Java big-N faster than Perl (comment below): the benchmarks you referenced are primarily little "arithmetic" loops, rather than something that does a few hundred MB of IO and stores it in a Map / %hash / Dictionary / associative array for later revisiting. Java I/O might have gotten better, but I suspect all the abstractness still makes it comparatively slow, and I know the GC is a killer. I haven't checked this lately; I don't process multi-GB data files on a daily basis at my current job like I used to.
Feeding the trolls (12/21): I measured Perl to be faster than Java for doing a bunch of sequential string processing. In fact, depending on which machine I used, Perl was between 3 and 25 times faster than Java for this kind of work (batch + string). Of course, the particular thrash-test I put together did not involve any numeric work, which I suspect Java would have done a bit better, nor did it involve caching a lot of data in a Map/hash, which I suspect Perl would have done a bit better. Note that Java did much better at using large numbers of threads, though.