Let's say we have a SocketChannel (in non-blocking mode) registered with a Selector for read interest. Suppose that after select() the Selector tells us this channel is ready for read, and we have some ByteBuffer. We want to read some bytes from the channel into this buffer (the ByteBuffer is cleared before reading). For that we use the channel's read() method, which returns the actual number of bytes read. Suppose this number is positive after the read and the ByteBuffer's hasRemaining() method returns true. Is it practical in this situation to immediately try to read some more from the same channel?
The same question for write(). If write() returns a positive value and not all of the buffer's contents were sent, is it practical to immediately try again until write() returns zero?
If you get a short read result, there is no more data to read without blocking, so you must not read again until there is. Otherwise the next read will almost certainly return zero or -1.
If the read fills the buffer, it might make sense from the point of view of that one connection to keep reading until it returns <= 0, but you are stealing cycles from the other channels. You need to consider fairness as well. In general you should probably do one read and keep iterating over the selected keys. If there's more data there the select will tell you next time.
Use big buffers.
This also means that it's wrong to clear the buffer before each read. You should get the data out with a flip/get/compact cycle, then the buffer is ready to read again and you don't risk losing data. This in turn implies that you need a buffer per connection.
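A minimal sketch of that per-connection cycle, assuming the ByteBuffer is stored as the SelectionKey's attachment and drainCompleteMessages() is a placeholder for the application-level parsing (standard java.nio imports assumed):
void handleRead(SelectionKey key) throws IOException {
    SocketChannel channel = (SocketChannel) key.channel();
    ByteBuffer buffer = (ByteBuffer) key.attachment();   // one buffer per connection

    int n = channel.read(buffer);          // fills whatever space is left in the buffer
    if (n == -1) {                         // peer closed the connection
        key.cancel();
        channel.close();
        return;
    }
    buffer.flip();                         // switch from filling to draining
    drainCompleteMessages(buffer);         // placeholder: consume whole messages, may leave a partial one
    buffer.compact();                      // unconsumed bytes are kept; buffer is ready for the next read
}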
It all depends on the data rate at which data is arriving, and the latency requirements of your application. If you don't care about latency at all, you might get slightly higher bandwidth by delaying your read interest until you suspect enough data has arrived to fill your buffer.
You have to be careful, though. Delaying reads could force the kernel to buffer more data, possibly fill its buffer, and have to start dropping packets or otherwise engage some flow control. That will more than kill any benefits from the last paragraph.
So generally, you want to read as much as you can, as early as you can. The benefits of batching reads are minor at best, and the potential pitfalls can be major. And keep in mind that the fact that you're seeing non-full reads means you're processing the data faster than it is coming in. In other words, you're in a state where you have CPU to burn, so the extra overhead of smaller reads is essentially free.
Direct buffers were introduced along with Java NIO in JDK 1.4. One reason for them is that the Java GC may move memory around, so the buffer data must be kept off-heap.
I'm wondering why the traditional Java blocking I/O API (BIO) doesn't need a direct buffer. Does BIO use something like a direct buffer internally, or is there some other mechanism that avoids the "memory movement" problem?
The simple answer is: it doesn't matter. Java has a clear, public spec: the JLS, the JVMS, and the javadoc of the core library. Java implementations do exactly what those three documents state, and you may trust that somehow it 'works'. This isn't as trite as it sounds. For example, the JMM (Java Memory Model, part of the JVMS if memory serves) lays out all sorts of things a JVM 'may' do in regards to re-ordering instructions and caching local writes. That is tricky: because it is a 'may', a JVM may not actually bug out even though your code is buggy. A JVM may do X, and if it does that, your code breaks; it's just that on your machine, at this time, with this song playing on your music player, the JVM chose never to do X, so you can't observe the problem.
Fortunately, the BIO stuff mostly has no may in it.
Here is the basic outline of BIO in Java:
You call .read(), .read(byte[]), or .read(byte[], off, len).
(This is not guaranteed; it is an implementation detail; a JVM is not required to do it this way.) The JVM will read 'as much as is currently available'. Hence, .read(some100SizedByteArr) may read only 50 bytes, even though a second call to read would return more bytes: 50 just happened to be 'ready' in the network buffer. Lots of folks get that wrong and think .read(byte[]) will fill the byte array if it can. Nope. That would make it impossible to write code that processes data as it comes in!
(Again, no guarantee.) Given that byte arrays can be shoved around in memory, you'd think that's a problem, but it really isn't: that byte[] is guaranteed not to magically grow new bytes in it. There is no way with the BIO API to say "just fill this array as the bytes fly in over the wire"; the only way to fill that array is to call .read() on your InputStream. That is a blocking operation, and the JVM can therefore 'deal with it' as it pleases. Perhaps the native layer simply locks out the garbage collector until data is returned (this isn't as pricey as it sounds; once at least 1 byte can be returned, the .read() method returns quickly, it doesn't wait for more data beyond the first byte, at least that's how most JVMs do it). Perhaps it reads the data into a buffer that lives off-heap and blits it over into your array later (sounds inefficient, perhaps, but a JVM is free to do it this way). Possibly the JVM marks that byte array specifically as off-limits for movement of any sort and the GC just collects 'around' it. It doesn't matter: a JVM can do whatever it wants, as long as it guarantees that .read(byte[]):
Blocks until EOF is reached (in which case it returns -1), or at least 1 byte is available.
Fills the byte array with the bytes so returned.
Marks the inputstream as having 'consumed' all that you just got.
Returns a value representing how many bytes have been filled.
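To illustrate what that contract means in practice, here is a sketch (not JDK source) of a helper that keeps calling read() until the array is full or end of stream is reached, precisely because a single read() may legally return fewer bytes than requested:
static int readFully(InputStream in, byte[] buf) throws IOException {
    int filled = 0;
    while (filled < buf.length) {
        int n = in.read(buf, filled, buf.length - filled);
        if (n == -1) {
            break;                       // end of stream before the array was filled
        }
        filled += n;
    }
    return filled;                       // only less than buf.length if EOF was hit
}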
That's sort of the point of Java: the how is irrelevant. If the how were not irrelevant, writing a JVM for a new platform would either be impossible or require full virtualization, making it incredibly slow. The docs give themselves some 'may' clauses exactly so that this can be avoided.
One place where 'may' does show up in BIO: what happens when you .interrupt() a thread that is currently locked in a BIO .write() call (the bytes haven't all been sent yet, let's say the network is slow and you sent a big array), or a BIO .read() call (it blocks until at least 1 byte is available; let's say the other side isn't sending anything)? The docs leave it open. It 'may' result in an IOException being thrown, thus ending the read/write call, with a message indicating you interrupted it. Or .interrupt() does nothing, and it is in fact impossible to interrupt a thread frozen on a BIO call. Most JVMs do the exception thing (fortunately), but the docs leave room: if for whatever reason the underlying OS/arch doesn't make that feasible, a JVM is free to do nothing when you attempt to interrupt(). Conclusion: if you want to write proper 'write once, run anywhere' code, you can't rely on being able to .interrupt() BIO freezes.
I am seeing really strange behavior in Java and I can't tell whether it happens on purpose or by chance.
I have a socket connection to a server that sends me a response to a request. I am reading this response from the socket with the following loop, which is wrapped in a try-with-resources statement.
try (BufferedInputStream remoteInput = new BufferedInputStream(remoteSocket.getInputStream())) {
    final byte[] response = new byte[512];
    int bytes_read;
    while ((bytes_read = remoteInput.read(response, 0, response.length)) != -1) {
        // Message parsing stuff which does not affect the behaviour
    }
}
According to my understanding, the read method fills as many bytes as possible into the byte array. The limiting factor is either the number of received bytes or the size of the array.
Unfortunately, this is not what's happening: the protocol I'm using answers my request with several smaller answers which are sent one after another over the same socket connection.
In my case the read method always returns with exactly one of those smaller answers in the array. The length of the answers varies, but the 512 bytes that fit into the array are always enough. That means my array always contains only one message and the rest/unneeded part of the array remains untouched.
If I intentionally make the byte array smaller than my messages, it returns several completely filled arrays and one last array containing the rest of the bytes until the message is complete.
(A 100 byte answer with an array length of 30 returns three completely filled arrays and one with only 10 bytes used)
The InputStream, or a socket connection in general, shouldn't interpret the transmitted bytes in any way, which is why I am very confused right now. My program is not aware of the protocol in use at all. In fact, my entire program is only this loop and the code needed to establish a socket connection.
If I can rely on this behavior it would make parsing the response extremely easy but since I do not know what causes this behavior in the first place I don't know whether I can count on it.
The protocol I'm transmitting is LDAP but since my program is completely unaware of that, that shouldn't matter.
According to my understanding the "read" Method fills as many bytes as possible into the byte Array.
Your understanding is incorrect. The whole point of that method returning the "number of bytes read" is that it might return any number. To be precise: when a blocking read returns normally, it has read something, so it returns a number >= 1 (or -1 at end of stream).
In other words: you should never ever rely on read() reading a specific number of bytes. You always, always, always check the returned number; and if you are waiting for a certain amount of data, you have to handle that in your code (for example by buffering yourself, until you have "enough" bytes in your own buffer to proceed).
Thing is: there is a whole, huge stack of elements involved in such read operations: the network, the operating system, the JVM. You can't control exactly what happens, and thus you cannot and should not build implicit assumptions like this into your code.
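As a sketch of the "buffer until you have enough" idea: assuming, purely for illustration, a hypothetical framing where each message starts with a 2-byte length prefix (the real protocol here is LDAP, which your code would have to parse properly), and reusing the question's remoteSocket, you could let DataInputStream do the accumulation for you:
DataInputStream in = new DataInputStream(remoteSocket.getInputStream());
while (true) {
    int length = in.readUnsignedShort();   // blocks until both header bytes have arrived
    byte[] message = new byte[length];
    in.readFully(message);                 // loops internally until 'length' bytes are read
    handleMessage(message);                // hypothetical parser for one complete message
}
// readUnsignedShort()/readFully() throw EOFException when the server closes the stream.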
While you might see this behaviour on a given machine, especially over loopback, once you start using real networks and different hardware, this can change.
If messages are sent with enough of a delay and you read them fast enough, you will see one message at a time. However, if messages are written close enough together, or your reader is delayed in any way, you can get multiple messages in a single read.
Also, if your message is large enough, e.g. around the MTU or more, a single message can be broken up even if your buffer is more than large enough.
I am converting the details that have to be sent from my C++ function to Java into strings, and then into a char* which will be sent through the socket.
My buffer size is 10 MB. Can I send the 10MB in one shot or should I split and send as chunks of smaller memory?
What is the difference between those two approaches? If I should send as smaller memory what should be the chunk size?
Can I send the 10MB in one shot
Yes.
or should I split and send as chunks of smaller memory?
No.
What is the difference between those two approaches?
The difference is that in case 1 you are letting TCP make all the decisions it is good at, with all the extra knowledge it has that you don't have, about the path MTU, the RTT, the receive window at the peer, ... whereas in case 2 you're trying to do TCP's job for it. Keeping a dog and barking yourself.
If I should send as smaller memory what should be the chunk size?
As big as possible.
When you call the write() function, you provide a buffer and the number of bytes you want to write. However, it is not guaranteed that the OS will send/write all the bytes you asked for in a single shot. (In the case of blocking sockets, the write() call blocks until the entire chunk has been copied into the TCP send buffer. In the case of non-blocking sockets, write() does not block and writes just the bytes it is able to.)
The TCP/IP stack runs in the OS, and each OS has its own implementation of the stack. This stack determines the buffer sizes, and TCP itself takes care of low-level details such as the MSS and the available receiver window size, which let TCP run its flow control and congestion control algorithms.
Therefore it is best to let TCP decide how it wants to send your data. Instead of breaking the data into chunks yourself, let the TCP stack do it for you.
Just be careful to always check the number of bytes actually sent, which is returned by the write() call.
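On the Java side, a minimal sketch of that check, assuming a blocking SocketChannel named channel and the 10 MB payload in a byte[] named data (with a non-blocking channel you would register OP_WRITE and resume when the selector reports the socket writable instead of spinning):
ByteBuffer buf = ByteBuffer.wrap(data);
while (buf.hasRemaining()) {
    int written = channel.write(buf);   // may send only part of the buffer
    // 'written' can be inspected or logged; the loop continues until everything is sent
}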
This is more like a matter of conscience than a technological issue :p
I'm writing some Java code to download files from a server... For that, I'm using the BufferedOutputStream method write() and the BufferedInputStream method read().
So my question is: if I use a buffer to hold the bytes, how many bytes should I read at a time? Sure, I could read byte by byte using just int b = read() and then write(b), or I could use a buffer. If I take the second approach, are there any aspects I must pay attention to when choosing the number of bytes to read/write each time? What will this number affect in my program?
Thanks
Unless you have a really fast network connection, the size of the buffer will make little difference. I'd say that 4k buffers would be fine, though there's no harm in using buffers a bit bigger.
The same probably applies to using read() versus read(byte[]) ... assuming that you are using a BufferedInputStream.
Unless you have an extraordinarily fast / low-latency network connection, the bottleneck is going to be the data rate that the network and your computers' network interfaces can sustain. For a typical internet connection, the application can move the data two or more orders of magnitude faster than the network can. So unless you do something silly (like doing 1-byte reads on an unbuffered stream), your Java code won't be the bottleneck.
BufferedInputStream and BufferedOutputStream typically rely on System.arraycopy for their implementations. System.arraycopy has a native implementation, which likely relies on memmove or bcopy. The amount of memory that is copied will depend on the available space in your buffer, but regardless, the implementation down to the native code is pretty efficient, unlikely to affect the performance of your application regardless of how many bytes you are reading/writing.
However, with respect to BufferedInputStream, if you set a mark with a high limit, a new internal buffer may need to be created. If you do use a mark, reading more bytes than are available in the old buffer may cause a temporary performance hit, though the amortized performance is still linear.
As Stephen C mentioned, you are more likely to see performance issues due to the network.
What is the MTU (maximum transmission unit) of your network connection? If you are using UDP, for example, you can check this value and use a smaller array of bytes. If that doesn't matter, check how much memory your program consumes; I think 1024-4096 bytes is a good size to hold the data while you continue to receive.
If you pump data you normally do not need to use any Buffered streams. Just make sure you use a decently sized (8-64 KB) temporary byte[] buffer passed to the read method (or use a pump method which does this for you). The default buffer size is too small for most usages (and if you use a larger temp array it will be ignored anyway).
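A sketch of such a pump method, using a hypothetical copy() helper with a 16 KB temporary array (any size in the 8-64 KB range mentioned above should behave similarly):
static void copy(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[16 * 1024];
    int n;
    while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);        // write exactly as many bytes as were read
    }
    out.flush();
}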
I have two scenarios in Netty where I am trying to minimize memory copies and optimize memory usage:
(1) Reading a very large frame (20 megabytes).
(2) Reading lots of very little frames (20 megabytes total at 50 bytes per frame) to rebuild into one message at a higher level in the pipeline.
For the first scenario, since I get a length at the beginning of the frame, I extended FrameDecoder. Unfortunately, as I don't see how to return the length to Netty (I only indicate whether the frame is complete or not), I believe Netty is going through multiple fill-buffer, copy and realloc cycles, thus using far more memory than is required. Is there something I am missing here? Or should I avoid FrameDecoder entirely if I expect this scenario?
In the second scenario, I am currently creating a linked list of all the little frames which I wrap using ChannelBuffers.wrappedBuffer (which I can then wrap in a ChannelBufferInputStream), but I am again using far more memory than I expected to use (perhaps because the allocated ChannelBuffers have spare space?). Is this the right way to use Netty ChannelBuffers?
There is a specialized version of the frame decoder called LengthFieldBasedFrameDecoder. It's handy when you have a header with the message length. It can even extract the message length from the header given an offset.
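A hedged sketch of wiring it into the pipeline (Netty 3 style, to match the FrameDecoder/ChannelBuffers classes used here), assuming the frame starts with a 4-byte length field and MyLargeFrameHandler is your own (hypothetical) handler; the constructor arguments are maxFrameLength, lengthFieldOffset, lengthFieldLength, lengthAdjustment and initialBytesToStrip:
pipeline.addLast("frameDecoder",
        new LengthFieldBasedFrameDecoder(32 * 1024 * 1024, 0, 4, 0, 4)); // allow frames up to 32 MB, strip the length field
pipeline.addLast("handler", new MyLargeFrameHandler());                  // hypothetical business handler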
Actually, ChannelBuffers.wrappedBuffer does not create copies of the received data; it creates a composite buffer from the given buffers, so your received frame data will not be copied. If you hold on to the composite buffers / your custom wrapper in the code and forget to nullify them, memory leaks can happen.
These are the practices I follow:
Allocate direct buffers for long-lived objects; slice them on use.
When I want to join/encode multiple buffers into one big buffer, I use ChannelBuffers.wrappedBuffer.
If I have a buffer and want to do something with it or a portion of it, I make a slice of it by calling slice() or slice(0, ...) on the channel buffer instance.
If I have a channel buffer and know the position of the data, and it is small, I always use the getXXX methods.
If I have a channel buffer that is used in many places to make something out of it, I always keep it modifiable and slice it on use.
Note: ChannelBuffer.slice does not make a copy of the data; it creates a channel buffer with its own reader and writer indexes.
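A small sketch of those points, assuming frame1, frame2 and frame3 are previously received ChannelBuffers:
ChannelBuffer composite = ChannelBuffers.wrappedBuffer(frame1, frame2, frame3); // composite view, no copy
ChannelBuffer header = composite.slice(0, 8);   // view over the first 8 bytes, still no copy
int type = composite.getInt(4);                 // absolute getXXX access, does not move readerIndex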
In the end, it appeared the best way to handle my FrameDecoder issue was to write my own handler on top of SimpleChannelUpstreamHandler. As soon as I determined the length from the header, I created the ChannelBuffer with a size exactly matching the length. This (along with other changes) significantly improved the memory performance of my application.
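For reference, a rough sketch of that kind of handler (Netty 3; the class name and the 4-byte length prefix are illustrative assumptions, and the header handling is simplified):
public class ExactSizeFrameHandler extends SimpleChannelUpstreamHandler {
    private ChannelBuffer frame;   // allocated once, sized exactly to the announced length

    @Override
    public void messageReceived(ChannelHandlerContext ctx, MessageEvent e) {
        ChannelBuffer in = (ChannelBuffer) e.getMessage();
        if (frame == null) {
            int length = in.readInt();                 // simplified: assumes all 4 header bytes arrived together
            frame = ChannelBuffers.buffer(length);     // exactly the required capacity, no realloc cycles
        }
        in.readBytes(frame, Math.min(in.readableBytes(), frame.writableBytes()));
        if (!frame.isWritable()) {                     // frame is complete
            Channels.fireMessageReceived(ctx, frame);  // pass the whole frame further up the pipeline
            frame = null;
        }
    }
}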