I would like to know the difference in performance between these two blocks that send a big file over a TCP socket.
I couldn't find many resources explaining their efficiency.
A-
byte[] buffer = new byte[1024];
int number;
while ((number = fileInputStream.read(buffer)) != -1) {
socketOutputStream.write(buffer, 0, number);
}
B-
byte[] mybytearray = new byte[filesize];
fileInputStream.read(mybytearray); // read the entire file into memory first
os.write(mybytearray);
Which one is better in terms of transfer delay?
Also, what is the difference if I set the buffer size to 1024 or 65536? How would that affect performance?
The latency until the last byte of the file arrives is basically identical. However, the first one is preferable (though with a much larger buffer) for the following reasons:
The data starts arriving sooner.
There is no assumption that the file size fits into an int.
There is no assumption that the entire file fits into memory, so
It scales to very large files without code changes.
Your MTU (Maximum Transmission Unit) size is likely to be around 1500 bytes. This means your data will be broken up into (or combined into) chunks of this size no matter what you do. Any reasonable buffer size from 512 bytes up is likely to give you the same transfer speed.
How you send and receive data impacts the amount of CPU you use. Unless you have a fast network, e.g. 10 Gbit/s, your CPU will more than keep up with your network.
Writing the code in an efficient manner will ensure you don't waste CPU (which is a good thing), but it shouldn't make much difference to your transfer speed, which is limited by your bandwidth (and the latency of your network).
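To illustrate, here is a minimal sketch of approach A with a bigger buffer (the class and variable names are mine; it assumes an already-open socket):

import java.io.*;
import java.net.Socket;

public class FileSender {
    public static void send(File file, Socket socket) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // 64 KiB keeps syscall overhead low
        try (InputStream in = new FileInputStream(file)) {
            OutputStream out = socket.getOutputStream(); // not closed here; the socket owns it
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n); // write only the bytes actually read
            }
            out.flush();
        }
    }
}

On Java 9+, the loop can be replaced with in.transferTo(out), which performs the same buffered copy internally.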
Related
Currently I am using this code on both Server and Client Side. Client is an android device.
BufferedOutputStream os = new BufferedOutputStream(socket.getOutputStream(),10000000);
BufferedInputStream sin = new BufferedInputStream(socket.getInputStream(),10000000);
os.write("10000000\n".getBytes());
os.flush();
for (int i =0;i<10000000;i++){
os.write((sampleRead[i]+" ").getBytes());
}
os.flush();
The problem is that this code takes about 80 seconds to transfer data from the Android client to the server, while it takes only 8 seconds to transfer the data back from the server to the client. The code is the same on both sides and the buffer is the same too. I also tried different buffer sizes, but the problem is with this segment:
for (int i =0;i<10000000;i++){
os.write((sampleRead[i]+" ").getBytes());
}
The buffering takes most of the time, while the actual transfer takes only about 6-7 seconds on a 150 Mbps hotspot connection. What could be the problem and how do I solve it?
First of all, as a commenter has already noted, using a monstrously large buffer is likely to be counterproductive. Once your stream buffer is bigger than the size of a network packet, app-side buffering loses its effectiveness. (The data in your "big" buffer needs to be split into packet-sized chunks by the TCP/IP stack before it goes onto the network.) Indeed, if the app-side buffer is really large, you may find that your data gets stuck in the buffer for a long time waiting for the buffer to fill ... while the network is effectively idle.
(The Buffered... readers, writers and streams are primarily designed to avoid lots of syscalls that transfer tiny amounts of data. Above 10K or so, the buffering doesn't help performance much.)
The other thing to note is that in a lot of OS environments, the network throughput is actually limited by virtualization and default network stack tuning parameters. To get better throughput, you may need to tune at the OS level.
Finally, if your data is going over a network path that is congested, has high end-to-end latency, or has links with constrained data rates, then you are unlikely to get fast transfers no matter how you tune things.
(Compression might help ... if you can afford the CPU overhead at both ends ... but some data links already do compression transparently.)
You could compress the data transfer; it will save a lot of memory, and transferring a compressed stream of data is cheaper. For that you need to implement compression logic on the client side and decompression logic on the server side; see GZIPOutputStream and GZIPInputStream. And try reducing the buffer size; it is huge for a mobile device.
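A rough sketch of that idea, assuming the same kind of socket setup as in the question (the class and method names are mine):

import java.io.*;
import java.net.Socket;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipTransfer {
    // Client side: compress while sending.
    static void send(Socket socket, byte[] data) throws IOException {
        GZIPOutputStream out = new GZIPOutputStream(
                new BufferedOutputStream(socket.getOutputStream(), 8 * 1024));
        out.write(data);
        out.finish(); // flush the trailing compressed block without closing the socket
    }

    // Server side: decompress while receiving.
    static byte[] receive(Socket socket, int length) throws IOException {
        InputStream in = new GZIPInputStream(
                new BufferedInputStream(socket.getInputStream(), 8 * 1024));
        byte[] data = new byte[length];
        int off = 0, n;
        while (off < length && (n = in.read(data, off, length - off)) != -1) {
            off += n;
        }
        return data;
    }
}

How much this helps depends entirely on how compressible your data is; already-compressed payloads will gain nothing.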
I have arrays like
byte[] b = new byte[10];
byte[] b1 = new byte[1024*1024];
I populate them with some values. Say,
for (int i = 0; i < 10; i++) {
    b[i] = 1;
}
for (int i = 0; i < 1024 * 1024; i++) {
    b1[i] = 1;
}
Then I write it to a RandomAccessFile and read again from that file into the same array using,
randomAccessFile.write(arrayName);
and
randomAccessFile.read(arrayName);
When I try to calculate the throughput for both these arrays (using the time taken for the file read and write) at the two sizes (10 bytes and 1 MB), the throughput appears to be higher for the 1 MB array.
Sample Output:
Throughput of 10kb array: 0.1 Mb/sec.
Throughput of 1Mb array: 1000.0 Mb/sec.
Why does this happen? I have an Intel i7 quad-core processor. Could my hardware configuration be responsible for this? If not, what could be the possible reason?
The reason for the big difference is the overhead involved in I/O, which occurs no matter what the size of the data being transferred is - it's like the flag fall of a taxi ride. Overheads, which are not restricted to Java and include many O/S operations, include:
Finding the file on disk
Checking O/S permissions on the file
Opening the file for I/O
Closing the file
Updating file info in the file system
Many other tasks
Also, disk I/O is performed in pages (the size depends on the O/S, but is typically a few kilobytes), so I/O of 1 byte probably costs the same as I/O of a whole page: a slightly fairer comparison would be a 2048-byte array against a 1 MB array.
If you are using buffered I/O, that can further speed up larger I/O tasks.
Finally, what you report as "10kb" is in fact just 10 bytes, so your throughput calculation is possibly incorrect.
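To see that fixed cost directly, you could time the same write at several sizes; a rough sketch (the file name is made up, and the numbers will vary a lot between machines):

import java.io.IOException;
import java.io.RandomAccessFile;

public class IoOverheadDemo {
    public static void main(String[] args) throws IOException {
        for (int size : new int[] {10, 2048, 1024 * 1024}) {
            byte[] data = new byte[size]; // contents don't matter for timing
            long start = System.nanoTime();
            try (RandomAccessFile raf = new RandomAccessFile("io-test.bin", "rw")) {
                raf.write(data); // open + write + close, so the fixed overheads are included
            }
            long micros = Math.max(1, (System.nanoTime() - start) / 1000);
            // 1 byte per microsecond is 1 MB per second
            System.out.println(size + " bytes in " + micros + " us => "
                    + (size / (double) micros) + " MB/s");
        }
    }
}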
This is more a matter of conscience than a technological issue :p
I'm writing some Java code to download files from a server. For that, I'm using the BufferedOutputStream method write() and the BufferedInputStream method read().
So my question is: if I use a buffer to hold the bytes, what should the number of bytes to read be? Sure, I could read byte by byte using just int b = read() and then write(b), or I could use a buffer. If I take the second approach, are there any aspects I must pay attention to when defining the number of bytes to read/write each time? What will this number affect in my program?
Thanks
Unless you have a really fast network connection, the size of the buffer will make little difference. I'd say that 4k buffers would be fine, though there's no harm in using buffers a bit bigger.
The same probably applies to using read() versus read(byte[]) ... assuming that you are using a BufferedInputStream.
Unless you have an extraordinarily fast / low-latency network connection, the bottleneck is going to be the data rate that the network and your computers' network interfaces can sustain. For a typical internet connection, the application can move the data two or more orders of magnitude faster than the network can. So unless you do something silly (like doing 1 byte reads on an unbuffered stream), your Java code won't be the bottleneck.
BufferedInputStream and BufferedOutputStream typically rely on System.arraycopy for their implementations. System.arraycopy has a native implementation, which likely relies on memmove or bcopy. The amount of memory that is copied will depend on the available space in your buffer, but regardless, the implementation down to the native code is pretty efficient, unlikely to affect the performance of your application regardless of how many bytes you are reading/writing.
However, with respect to BufferedInputStream, if you set a mark with a high limit, a new internal buffer may need to be created. If you do use a mark, reading more bytes than are available in the old buffer may cause a temporary performance hit, though the amortized performance is still linear.
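For instance, a small sketch of the mark case just described (the sizes are arbitrary):

import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class MarkDemo {
    public static void main(String[] args) throws IOException {
        byte[] source = new byte[1024 * 1024];
        // 8 KiB internal buffer, but mark() promises we can rewind up to 512 KiB...
        BufferedInputStream in =
                new BufferedInputStream(new ByteArrayInputStream(source), 8 * 1024);
        in.mark(512 * 1024);
        // ...so reading past 8 KiB forces BufferedInputStream to grow its internal
        // buffer to honor the mark - the temporary hit mentioned above.
        byte[] sink = new byte[64 * 1024];
        for (int i = 0; i < 4; i++) {
            in.read(sink);
        }
        in.reset(); // still valid: we stayed within the 512 KiB read limit
    }
}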
As Stephen C mentioned, you are more likely to see performance issues due to the network.
What is the MTU (maximum transmission unit) of your network connection? If you are using UDP, for example, you can check this value and use a smaller array of bytes. If that is not the issue, you need to check how much memory your program eats. I think 1024-4096 bytes would be a good size to hold the data while you continue to receive.
If you pump data, you normally do not need to use any Buffered streams. Just make sure you use a decently sized (8-64k) temporary byte[] buffer passed to the read method, or use a pump method which does it for you (a sketch follows below). The default buffer size is too small for most usages (and if you use a larger temp array, the stream's buffering will be ignored anyway).
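A minimal sketch of such a pump method (the name and signature are mine, not a standard API):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class Streams {
    // Copies everything from in to out through one reusable buffer; returns bytes copied.
    public static long pump(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[32 * 1024]; // 32 KiB, within the 8-64k range suggested above
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}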
I am creating an application to upload data to a server. The data will be pretty huge, up to 60-70 GB. I am using Java since I need it to run in any browser.
My approach is something like this:
InputStream s = new FileInputStream(file);
byte[] chunk = new byte[20000000];
s.read(chunk);
s.close();
client.postToServer(chunk);
At the moment it uses a large amount of memory, steadily climbing to about 1 GB, and when the garbage collector hits it is VERY obvious: a 5-6 second gap between chunks.
Is there any way to improve the performance of this and keep the memory footprint to a decent level?
EDIT:
This is not my real code. There are a lot of other things I do, like calculating a CRC, validating against the InputStream.read return value, etcetera.
You need to think about buffer reuse, something like this:
int size = 64 * 1024; // 64 KiB
byte[] chunk = new byte[size];
int read;
while ((read = s.read(chunk)) != -1) {
    /*
     * I do hope you have some API call like the thing below, or at least one with a wrapper object
     * that exposes partially filled buffers, because read might not be the size of the entire
     * buffer if there are fewer than that many bytes left in the stream before the end of the file.
     */
    client.postToServer(chunk, 0, read);
}
The first step would be to re-use your buffer, if you don't already do so. Reading a huge file should not generally require a lot of memory unless you keep it all in memory.
Also: Why are you using such a huge buffer? There's nothing really to be gained from it (unless you have an insanely fast network connection & hard disk). Reducing it to about 64k should have no negative effect on performance and might help Java with the GC.
You can try to tune the garbage collector (http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html, http://www.petefreitag.com/articles/gctuning/).
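As a hedged starting point (the flag values and the class name are illustrative, not a recommendation tuned to your workload):

# Cap the heap so collections happen more often but pause for less time,
# and log GC activity so the 5-6 second gaps become visible.
java -Xms256m -Xmx512m -verbose:gc -XX:+PrintGCDetails UploadClient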
Following this thread:
Streaming large files in a java servlet.
Is it possible to find the total internet bandwidth available on the current machine through Java?
What I am trying to do is: while streaming large files through the servlet, based on the number of parallel requests and the total bandwidth, reduce the BUFFER_SIZE of the stream for each request. Does that make sense?
Is there any pure Java way? (without JNI)
Maybe you can time how long the app needs to send one packet (the buffer), and if that takes longer than x milliseconds, make your buffer smaller. You can use other values for the initial bufferSize and for the threshold in if (stop - start > 700).
This is based on the thread you mentioned:
ServletOutputStream out = response.getOutputStream();
InputStream in = [ code to get source input stream ];
String mimeType = [ code to get mimetype of data to be served ];
int bufferSize = 1024 * 4;
byte[] bytes = new byte[bufferSize];
int bytesRead;
response.setContentType(mimeType);
while ((bytesRead = in.read(bytes)) != -1) {
    long start = System.currentTimeMillis();
    out.write(bytes, 0, bytesRead);
    long stop = System.currentTimeMillis();
    if (stop - start > 700 && bufferSize > 512) {
        // halve the buffer when a write stalls, but never shrink below 512 bytes
        // (a zero-length buffer would make read() return 0 forever and loop infinitely)
        bufferSize /= 2;
        bytes = new byte[bufferSize];
    }
}
// do the following in a finally block:
in.close();
out.close();
The only way to find the available bandwidth is to monitor / measure it. On Windows you have access to Net.exe and can get the throughput on each NIC.
If you're serving the content through a servlet, then you could calculate how fast each servlet output stream is going. Collect that data for all streams for a user/session, and you could determine at least what the current bandwidth usage is.
A possible way to calculate the rate: instead of writing the large files directly through the servlet output stream, write through a new FilterOutputStream that keeps track of your download rate (a sketch follows below).
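A minimal sketch of such a tracking stream (the class name is mine):

import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Wraps another stream and records how many bytes flowed through it, and since when.
public class RateTrackingOutputStream extends FilterOutputStream {
    private final long startMillis = System.currentTimeMillis();
    private long bytesWritten;

    public RateTrackingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len); // bypass FilterOutputStream's byte-at-a-time default
        bytesWritten += len;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        bytesWritten++;
    }

    // Average rate in bytes per second since this stream was created.
    public double bytesPerSecond() {
        long elapsed = Math.max(1, System.currentTimeMillis() - startMillis);
        return bytesWritten * 1000.0 / elapsed;
    }
}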
The concept of "total internet bandwidth available in current machine" is really hard to define. However, tweaking the local buffer size will not affect how much data you can push through to an individual client.
The rate at which a given client can take data from your server will vary with the client, and with time. For any given connection, you might be limited by your local upstream connection to the Internet (e.g., server on DSL) or you might be limited somewhere in the core (unlikely) or the remote end (e.g., server in a data center, client on a dialup line). When you have many connections, each individual connection may have a different bottleneck. Measuring this available bandwidth is a hard problem; see for example this list of research and tools on the subject.
In general, TCP will handle using all the available bandwidth fairly for any given connection (though sometimes it may react to changes in available bandwidth slower than you like). If the client can't handle more data, the write call will block.
You should only need to tweak the buffer size in the linked question if you find that you are seeing low bandwidth and the cause is insufficient data buffered for writing to the network. Another reason you might tweak the buffer size is if you have so many active connections that you are running low on memory.
In any case, the real answer may be to not buffer at all but instead put your static files on a separate server and use something like thttpd to serve them (using a system call like sendfile) instead of a servlet. This helps ensure that the bottleneck is not on your server, but somewhere out in the Internet, beyond your control.
EDIT: Re-reading this, it's a little muddled because it's late here. Basically, you shouldn't have to do this from scratch; use one of the existing highly scalable java servers, since they'll do it better and easier.
You're not going to like this, but it actually doesn't make sense, and here's why:
Total bandwidth is independent of the number of connections (though there is some small overhead), so messing with buffer sizes won't help much
Your chunks of data are being broken into variable-sized packets anyway. Your network card and protocol will deal with this better than your servlet can
Resizing buffers regularly is expensive -- far better to re-use constant buffers from a fixed-size pool and have all connections queue up for I/O rights
There are a billion and a half libraries that assist with this sort of server
Were this me, I would start looking at multiplexed I/O using NIO. You can almost certainly find a library to do this for you. The IBM article here may be a useful starting point.
I think the smart money gives you one network I/O thread, and one disk I/O thread, with multiplexing. Each connection requests a buffer from a pool, fills it with data (from a shared network or disk Stream or Channel), processes it, then returns the buffer to the pool for re-use (a rough sketch follows below). No re-sizing of buffers, just a bit of a wait for each chunk of data. If you want latency to stay short, then limit how many transfers can be active at a time, and queue up the others.
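To make the pool idea concrete, a hedged sketch (the names are mine; a BlockingQueue gives you the "queue up for I/O rights" behavior for free):

import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Fixed-size pool of reusable buffers; connections block here instead of allocating.
public class BufferPool {
    private final BlockingQueue<ByteBuffer> pool;

    public BufferPool(int buffers, int bufferSize) {
        pool = new ArrayBlockingQueue<>(buffers);
        for (int i = 0; i < buffers; i++) {
            pool.add(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    // Blocks until a buffer is free.
    public ByteBuffer acquire() throws InterruptedException {
        return pool.take();
    }

    // Callers must return buffers when done so other connections can proceed.
    public void release(ByteBuffer buffer) {
        buffer.clear(); // reset position/limit for the next user
        pool.add(buffer);
    }
}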