Java. Is it always sent with buffer size?

When using Java sockets, is a message always sent with the buffer-size length? (When I send only 300 bytes, is it still sent in a packet of, for example, 1024 bytes?)
And what buffer size is the best option? What is the difference between a 512-byte and an 8 KB buffer?
I don't want to start a new thread, so I will ask here: is the standard Java TCP ServerSocket fast enough to handle 100 connections with 10-20 msg/s smoothly?

Most machines have an MTU of 1500 bytes. This means that if you send more than this, the data will be broken into multiple packets. If you send less than this, the OS may hold the data for a short period of time to see if more data is coming, in order to reduce the overhead of sending small packets; see Nagle's algorithm for more.
There is not much difference between sending 16 lots of 512 bytes and sending 8 KB all at once, as the OS and network adapter will do some coalescing by default.
What really matters is the bandwidth of the connection between you and the other end. If you have up to 2000 messages per second and they are 512 bytes each, you need roughly a 10 Mbit/s line (2000 * 512 * 8 = ~8 Mbit/s).
In terms of numbers, you shouldn't have a problem up to about 10,000 connections, or around 500,000 msg per second. If you have a 1 Gbit/s line you should be able to get 100 MB/s easily. With a 10 Gbit/s line you should be able to get more, but you might have trouble using all the bandwidth unless you are careful.
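For illustration, a minimal sketch of a sending side (the host, port and buffer size are placeholder assumptions, not anything prescribed by the answer): wrapping the socket's output stream in a BufferedOutputStream coalesces small writes in your own process, and setTcpNoDelay(true) optionally disables Nagle's algorithm when latency matters more than throughput.
import java.io.BufferedOutputStream;
import java.io.OutputStream;
import java.net.Socket;

public class SenderSketch {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("example.com", 9000)) { // placeholder host/port
            socket.setTcpNoDelay(true); // optional: disable Nagle's algorithm for latency-sensitive traffic
            OutputStream out = new BufferedOutputStream(socket.getOutputStream(), 8192);
            byte[] msg = new byte[300]; // a 300-byte message is not padded to the buffer size
            out.write(msg);             // lands in the 8 KB buffer first
            out.flush();                // push it onto the wire now
        }
    }
}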


Byte array outputstream [closed]

Probably a stupid question, but what is the ideal length for a byte array to send over an OutputStream? I couldn't find anything about this online.
I have found a lot of examples that set their array size to 2^X or something similar. But what is the purpose of this?
There is no optimal size. OutputStream is an abstract concept; there are a million implementations of it. (Not just 'FileOutputStream is an implementation', but 'FileOutputStream, on OpenJDK 11, on Windows 10, with this service pack, with this CPU and this much system memory, under these circumstances'.)
The reason you see that is buffer efficiency. Sending 1 byte is usually harmless, but sometimes sending 1 (or very few) bytes results in this nasty scenario:
You send one byte.
The underlying OutputStream isn't designed to buffer that byte; it doesn't have the storage for it, so the only thing it can do is send it onwards to the actual underlying resource. Let's say the OutputStream represents a file on the filesystem.
The kernel driver works similarly. (Most OSes do buffer internally, but you can ask the OS not to when you open the file.)
Therefore, that one byte now needs to be written to disk. However, it is an SSD, and you can't do that on an SSD; you can only write an entire cell at once*. That's just how SSDs work: they aren't bits in sequence on a big platter.
So, the kernel reads the entire cell out, updates the one byte you are writing, and writes the entire cell back to the SSD.
Your actual loop writes, say, about 50,000 bytes, so something that should have taken a single SSD read and write now takes 50,000 reads and 50,000 writes, burning through your SSD's cell longevity and taking 50,000 times longer than needed.
Similar issues occur for networking (you end up sending a single byte, wrapped in HTTP headers, wrapped in 2 TCP/IP packets, resulting in ~1000 bytes going over the network for each byte you .write(singleValue)) and for many other such systems.
So why don't these streams buffer?
Because there are cases where you don't actually want them to do this; there are plenty of reasons to write I/O with specific efficiency in mind.
Is there no way to just let something do this for me?
Ah, you're in luck, there is! BufferedWriter and friends (BufferedOutputStream exists as well) wrap around an underlying stream and buffer for you:
var file = new FileOutputStream("/some/path");
var wrapped = new BufferedOutputStream(file);
file.write(1); // this is a bad idea
wrapped.write(1); // this is fine
Here, the wrapped write doesn't result in anything happening except some memory being shoved around; no bytes are written to disk (with the downside that if someone trips over a power cable, the data is just lost). Only after you close wrapped, call flush() on wrapped, or write a sufficient amount of bytes to wrapped will wrapped actually send a whole bunch of bytes to the underlying stream. This is what you should use if making a byte array is unwieldy. Why reinvent the wheel?
But I want to write to the underlying raw stream
Well, you're using too few bytes if the amount of bytes is less than what a single TCP/IP packet can hold, or an unfortunate size otherwise (imagine the TCP/IP packet can hold exactly 1000 bytes and you send 1001 bytes: that's one full packet, and then a second packet with just 1 byte, giving you only 50% efficiency. 50% is still better than the 0.1% efficiency that byte-at-a-time would get you in this hypothetical). But if you send, say, 5001 bytes, that's 5 full packets and one regrettable 1-byte packet, for 83.35% efficiency: unfortunate that it's not near 100, but not nearly as bad. The same applies to disk (if an SSD cell holds 2048 bytes and you send 65537 bytes, it's still ~97% efficient).
You're using too many bytes if the impact on your own java process is such that this becomes problematic: It's causing excessive garbage collection, or, worse, out of memory errors.
So where is the 'sweet spot'? Depends a little bit, but 65536 is common and is unlikely to be 'too low'. Unless you run many thousands of simultaneous threads, it's not too high either.
It's usually a power of 2, mostly because of superstition, but there is some sense in it: those underlying buffers are often a power of two (computers are binary things, after all). So if the cell size happens to be, say, 2048, you are 100% efficient if you send 65536 bytes (that's exactly 32 cells' worth of data).
But the only thing you're really trying to avoid is that 0.1% efficiency rate, which occurs if you write one byte at a time to a packetizing (SSD, network, etc.) underlying stream. Hence it doesn't matter much: as long as it's more than 2048 or so, you have already avoided the doom scenario.
*) I'm oversimplifying; the point is that a single byte read or write can take as long as a whole chunk of them, and to give some hint as to why that is, not to do a complete deep-dive on SSD technology.
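If you do end up writing to the raw stream yourself, the whole discussion boils down to using a reasonably sized chunk. A minimal copy-loop sketch with a 64 KiB array (the file paths are placeholders):
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

public class CopySketch {
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream("/some/input");      // placeholder paths
             OutputStream out = new FileOutputStream("/some/output")) {
            byte[] buffer = new byte[65536]; // 64 KiB: comfortably above any cell/packet size
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);     // write only the bytes actually read
            }
        }
    }
}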

Sending a buffer of 10 MB through socket - Chunks or Whole 10MB?

I am converting the details that have to be sent from my C++ function to Java as strings and as a char*, which will be sent through a socket.
My buffer size is 10 MB. Can I send the 10MB in one shot or should I split and send as chunks of smaller memory?
What is the difference between those two approaches? If I should send as smaller memory what should be the chunk size?
Can I send the 10MB in one shot
Yes.
or should I split and send as chunks of smaller memory?
No.
What is the difference between those two approaches?
The difference is that in case 1 you are letting TCP make all the decisions it is good at, with all the extra knowledge it has that you don't have, about the path MTU, the RTT, the receive window at the peer, ... whereas in case 2 you're trying to do TCP's job for it. Keeping a dog and barking yourself.
If I should send as smaller memory what should be the chunk size?
As big as possible.
When you call the write() function, you provide a buffer and the number of bytes you want to write. However, it is not guaranteed that the OS will send/write all the bytes you asked for in a single shot. (In the case of blocking sockets, the write() call will block until it has copied the entire chunk into the TCP buffer. In the case of non-blocking ones, write() returns without blocking and writes just the bytes it is able to.)
The TCP/IP stack runs in the OS, and each OS has its own implementation of the stack. This stack determines the buffer sizes, and TCP itself takes care of the low-level details such as the MSS and the available receiver window size, which let TCP run its flow-control and congestion-control algorithms.
Therefore it is best to let TCP decide how it wants to send your data. Instead of breaking the data into chunks yourself, let the TCP stack do it for you.
Just be careful to always check the number of bytes actually sent, which is returned by the write() call.
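In Java terms this advice maps onto java.nio, where SocketChannel.write() returns the number of bytes it actually accepted, so you have to loop. A minimal sketch of such a write loop (the channel is assumed to be already connected; how you wait when nothing is accepted is up to you):
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class WriteLoopSketch {
    // Assumes 'channel' is an already-connected SocketChannel (possibly non-blocking).
    static void sendAll(SocketChannel channel, byte[] data) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
            int written = channel.write(buf); // may accept fewer bytes than remain in the buffer
            if (written == 0) {
                Thread.onSpinWait();          // send buffer full; real code would register OP_WRITE with a Selector
            }
        }
    }
}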

Java: Throughput increase with increasing message size?

I have built a distributed client-server-database Java application based on sockets. The clients send serialized objects to the servers (currently there are two servers), and the servers deserialize the objects and store some content of each object in a PostgreSQL database.
Now I'm benchmarking my system and I measured the size of the serialized object as well as the throughput and I made a very strange discovery which I cannot explain.
Until the object size reaches around 1400 bytes (or a bit less) the throughput decreases; from around 1400 bytes to 2000 bytes (or a bit above) the throughput stays constant; and from around 2000 bytes to 2600 bytes (I only measured up to 2600 bytes) the throughput increases.
I cannot explain this behaviour. My expectation was that the throughput would always decrease with increasing object size, and that once the MTU of 1500 bytes is exceeded the decrease would be much bigger. But this does not seem to be true, and the constant throughput and the later increase in particular I cannot explain at all.

Preventing split packets over TCP

I am writing a program that transfers files over the network using TCP sockets.
Now I noticed that when I send a packet of, for example, 1024 bytes, it gets split on the other side.
By "split" I mean I receive some packets as if they were parts of a whole packet.
I tried reducing the packet size, and the algorithm worked when the packet size was extremely small (about 30 bytes per packet), but then the file transferred very slowly.
Is there anything I can do in order to prevent the splitting?
SOLVED: I switched the connection to UDP, and since UDP is packet-bounded it worked.
There is no such thing in TCP. TCP is a stream: what you write is what you get at the other end. That does not mean you get it the way it was written; TCP may break up or group the data in order to do the job as effectively as possible. You can send an 8-megabyte buffer in one write and TCP can break it down into 10, 100 or 1000 packets; what you need to know is that at the other end you will get exactly 8 megabytes, no more, no less. In order to do a file transfer effectively you need to tell the receiver how many bytes you are going to send. The receiver may read it in one chunk or in 100 chunks, but it must keep track of the data it has read and how many bytes are left to read.
Because TCP is stream oriented, it does not transfer 'packet boundary' information the way UDP and SCTP do.
So you must add packet-boundary information to the TCP payload, if it is not there already. There are several ways to do it:
You can use a length field indicating how many bytes the following packet contains.
Or there could be a reserved symbol for separating different packets.
In both cases the receiver must keep reading from the TCP input stream until the complete packet has been received.
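A minimal sketch of the length-field approach in Java, using DataOutputStream/DataInputStream (assuming the streams are obtained from an already-connected Socket; the class and method names are only illustrative):
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class FramingSketch {
    // Sender: write the length first, then the payload.
    static void sendMessage(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Receiver: read the length, then keep reading until the whole message has arrived.
    static byte[] receiveMessage(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload); // loops internally until 'length' bytes have been read
        return payload;
    }
}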
You can control the TCP maximum segment size in some socket implementations. If you set it low enough, you can make the segment fit inside a single packet. The BSD Sockets API, which influenced almost every other implementation, has a setsockopt() function that lets you set various options on the socket. One of them, TCP_MAXSEG, controls the maximum segment size.
Unfortunately for you, the standard Java Socket class doesn't support this particular option.

Predicting Network traffic overhead generated by a Java Application

I want to attempt to calculate how much data (bytes) I send/receive over the network. I send/receive both TCP and UDP packets, so I need to be able to calculate the size of these packets including their respective headers. I looked at this question: Size of empty UDP and TCP packet. It lists the minimum size of the headers, but is that liable to change? Should I just add the number of bytes I send in the packet plus the size of the minimum header? Also, I know that at some point (n bytes) the data would be too big to fit in just one packet.
One other thing: the client is a mobile device, so it may receive over cellular or Wi-Fi. I am not sure whether there is a difference in packet size between the two, but I would probably just assume whichever is larger.
So my questions are, assuming the data is n bytes long:
1) How big would the TCP packet be, assuming it all fits in one packet?
2) How big would the UDP packet be, assuming it all fits in one packet?
3) Is there an easy way to determine the number of bytes it would take to overrun one packet? For both TCP and UDP.
Let's assume we're only talking about Ethernet and IPv4.
Look at your interface MTU, which already has the size of the Ethernet headers subtracted on the OSes I can remember (Linux and FreeBSD)
Subtract 20 bytes for a normal IP header (no IP options)
Subtract 20 bytes for a normal TCP header
Or
Subtract 8 bytes for a UDP header
That is how much data you can pack into one IPv4 packet. So, if your TCP data is n bytes long, your total Ethernet payload is (n + 20 + 20); your Ethernet payload for UDP is (n + 20 + 8).
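As a small sketch of that arithmetic in Java (the header sizes assume no IP or TCP options, as above; the MTU is whatever your interface reports, e.g. 1500 for plain Ethernet):
public class PayloadSizeSketch {
    static final int IP_HEADER = 20;  // normal IPv4 header, no options
    static final int TCP_HEADER = 20; // normal TCP header, no options
    static final int UDP_HEADER = 8;

    // Ethernet payload for n bytes of application data in a single packet.
    static int tcpEthernetPayload(int n) { return n + IP_HEADER + TCP_HEADER; }
    static int udpEthernetPayload(int n) { return n + IP_HEADER + UDP_HEADER; }

    // Largest application payload that still fits in one packet for a given MTU.
    static int maxTcpData(int mtu) { return mtu - IP_HEADER - TCP_HEADER; } // 1460 for an MTU of 1500
    static int maxUdpData(int mtu) { return mtu - IP_HEADER - UDP_HEADER; } // 1472 for an MTU of 1500
}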
EDIT FOR QUESTIONS
RE: MTU
Your interface MTU is the largest Ethernet payload that your drivers will let you encapsulate onto the wire. I subtract because we're assuming we start from the MTU and work up the encapsulation chain (i.e. eth -> ip -> tcp|udp); you can't send TCP or UDP without an IP header, so that must be accounted for as well.
RE: Calculating application overhead
Theoretical calculations about the overhead your application will generate are fine, but I suggest lab testing if you want meaningful numbers. Usage factors like average data transfer per client session, client hit rate per minute and concurrent clients can make a difference in some (unusual) cases.
It is sadly not possible to determine this completely. Packets might be split, reassembled, etc. by network hardware all along the path to the receiver, so there is no guarantee that you can calculate the exact number of bytes.
Ethernet defines the frame payload size as 1500 bytes, which leaves 1460 bytes once the IP and TCP headers are subtracted. Jumbo frames of up to 9 KB are usually only supported locally; when the packet reaches the WAN, it will be fragmented.
