How is the buffer byte array continuously filled while streaming? - java

The piece of code below is used to read a file:
int bytesRead;
byte[] bytes = new byte[1000]; // buffer
FileInputStream fis = new FileInputStream(uploadedFile);
while ((bytesRead = fis.read(bytes)) != -1) {
    // process bytes[0 .. bytesRead - 1] here, e.g. write them to an output stream
}
fis.close();
As per the API docs for the read() method:
Reads up to b.length bytes of data from this input stream into an array of bytes. This method blocks until some input is available.
Nowhere does it say that the byte array is refilled, yet the stream keeps filling the array until the whole file has been read.
But how does it internally manage to get this magic done?
I looked at the source code of the read() method:
public int read(byte b[]) throws IOException {
    return readBytes(b, 0, b.length);
}
and the source code of readBytes is:
private native int readBytes(byte b[], int off, int len) throws IOException;
Nothing there explains how the bytes get into the array.
I uploaded a 500 MB file without any problem, with only that 1000-byte array allocated.

If you're asking why you can read a ~500 MB file with a roughly 1 KB buffer, it's because you overwrite the contents of the buffer each time you go through the loop (approximately 500,000 times).
If you're asking how the read function is actually implemented, notice that the underlying call includes the keyword native. That means that native code is being called via JNI. The exact implementation is going to be JVM and OS dependent.
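To make that concrete, here is a minimal sketch of the loop the answer describes, assuming the goal is simply to copy the uploaded file somewhere (the file names and the FileOutputStream destination are placeholders):
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CopyWithSmallBuffer {
    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[1000]; // one buffer, reused on every iteration
        try (FileInputStream in = new FileInputStream("uploaded.bin");
             FileOutputStream out = new FileOutputStream("copy.bin")) {
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                // read() overwrites the buffer; only buffer[0 .. bytesRead - 1] holds fresh data
                out.write(buffer, 0, bytesRead);
            }
        }
    }
}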

A great article on readBytes was published by Michael Schaeffer
In short:
File I/O in Java is implemented by reading into a local buffer, and then copying from the local buffer into the Java byte[] initially passed into int read(byte byf[]). This double-copy is slow, but it also requires heap allocation of a second read buffer, if the read buffer is larger than 8K.
There are many other helpful details, but it's easier to just read the article.

Related

FileInputStream read byte by byte or block?

The reason given on Why is using BufferedInputStream to read a file byte by byte faster than using FileInputStream? for why a BufferedInputStream (BIS) is faster than a FileInputStream (FIS) is that
With a BufferedInputStream, the method delegates to an overloaded
read() method that reads 8192 bytes and buffers them until
they are needed, while FIS reads a single byte.
As I understand it, a disk is a 'block device': it is always going to read/write entire blocks, even if the read request is for a smaller amount of data.
Isn't it? So both FIS and BIS will end up reading a complete block, not a single byte (as stated for FIS), right? Then how is BIS faster than FIS?
The java API of InputStream is what it is. Specifically, it has this method:
int read() throws IOException
which reads a single byte (it returns an int, so that it can return -1 to indicate EOF).
So, if you try to read a SINGLE BYTE from a file, it'll try to do that. In the case of a block device like a harddisk, that'll likely read the entire block, and then chuck everything except that one byte, so, if you call that read() method 8192 times, it reads the same block, over and over, 8192 times, each time chucking away 8191 bytes and giving you just the one you want. Thus, reading 67 million bytes in the entire process. Ouch. Not very efficient.
Given that the kernel, CPU, disk, etc all read in a block size of 8192, there is zero performance difference between a BufferedInputStream(new FileInputStream) and just the new FileInputStream, IF you use something like:
byte[] buffer = new byte[8192];
in.read(buffer);
Now even plain jane unbuffered new FileInputStream just ends up reading that block off of disk just once.
BufferedInputStream does that 'under the hood' even if you use the single-byte form of read(), and will then feed you data from that byte array for the next 8191 calls to read(). That's all BufferedInputStream does.
If you are using the read() (one byte at a time) variant (or the byte-array variant of read, but with really small byte arrays), then BufferedInputStream makes sense. Otherwise, that does nothing and there is no need to put that in there.
NB: As far as I know, java makes no guesses about what the disk buffer size is and just uses some reasonable buffer size. The effect is the same: If using single-byte-at-a-time, wrapping your filestream into a bufferedstream improves performance by a factor 1000+, if you are using the byte array variant, no difference whatsoever.
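To make the two cases concrete, here is a small sketch; the file name and the processing comments are illustrative, and the 8192-byte figures are the default sizes discussed above:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadPatterns {
    public static void main(String[] args) throws IOException {
        // Case 1: single-byte reads. Wrapping in BufferedInputStream matters here,
        // because each read() is served from an 8192-byte in-memory buffer instead
        // of going back to the OS for every byte.
        try (InputStream in = new BufferedInputStream(new FileInputStream("data.bin"))) {
            int b;
            while ((b = in.read()) != -1) {
                // process one byte at a time
            }
        }

        // Case 2: block reads into your own 8192-byte array. BufferedInputStream adds
        // nothing here; the plain FileInputStream already pulls in a block at a time.
        try (InputStream in = new FileInputStream("data.bin")) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                // process buffer[0 .. n - 1]
            }
        }
    }
}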

Best practice to read limited length from an input stream

I have packets that are sent over sockets which are preceded by length headers. So every transmission comes with 4 bytes of length information followed by the packet. Therefore, I must limit my read() to never exceed the length, so that I don't accidentally read the next packet in line.
InputStream in is the input stream, ByteArrayOutputStream byteStream is the stream I write incoming packets into and int len is the length of an incoming transmission.
Here is what I came up with:
for (int i = 0; i < len; i++) {
    byteStream.write(in.read());
}
This is absolutely horrific; reading bytes one by one. I was hoping to see if there is a better way of doing this using buffers (kind of similar to the canonical way of copying streams).
Wrap the input stream in a DataInputStream and then use the readInt() and readFully() methods to get the size and bytes of data out.
int len = dis.readInt();
// assuming relatively small byte arrays, use smarter logic for huge streams....
byte[] buffer = new byte[len];
dis.readFully(buffer);
byteStream.write(buffer);
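Put together, a minimal sketch, assuming in is the socket's InputStream and that packets are small enough to hold in memory (the readPacket helper name is just for illustration):
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class PacketReader {
    // Reads one length-prefixed packet: a 4-byte big-endian length, then that many bytes.
    static byte[] readPacket(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);
        int len = dis.readInt();       // the 4-byte length header
        byte[] packet = new byte[len]; // assumes packets fit comfortably in memory
        dis.readFully(packet);         // blocks until exactly len bytes have been read
        return packet;
    }
}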

Can InputStream.read overflow the buffer

Does the read call check the size of the buffer when filling it with data, or is there a chance that data is lost because the buffer isn't big enough? In other words, if there are ten bytes of data available to be read, will the server continue to store the remaining 2 bytes of data until the next read?
I'm just using 8 as an example here to over dramatise the situation.
InputStream stdout;
...
while (condition) {
    ...
    byte[] buffer = new byte[8];
    int len = stdout.read(buffer);
}
No, read() won't lose any data just because you haven't given it enough space for all the available bytes.
It's not clear what you mean by "the server" here, but the final two bytes of a 10-byte message would be available after the first read. (Or possibly the first read() would only return the first six bytes, leaving four still to read, for example.)
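If you do need an exact number of bytes, the usual pattern is to loop until they have all arrived; here is a sketch, where the readExactly helper name is purely illustrative:
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadExactly {
    // Reads exactly 'expected' bytes, looping because a single read() may return fewer.
    static byte[] readExactly(InputStream in, int expected) throws IOException {
        byte[] data = new byte[expected];
        int offset = 0;
        while (offset < expected) {
            int n = in.read(data, offset, expected - offset);
            if (n == -1) {
                throw new EOFException("stream ended after " + offset + " bytes");
            }
            offset += n;
        }
        return data;
    }
}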

What if we exceed the capacity of a buffer allocated with ByteBuffer.allocate(48) (java NIO)?

RandomAccessFile file = new RandomAccessFile("xanadu.txt", "rw");
FileChannel channel = file.getChannel();
ByteBuffer buffer = ByteBuffer.allocate(48);
int byteReads = channel.read(buffer);
So I am allocating a capacity of 48 in the buffer. Now
consider that the txt file I am reading is about 10 MB, so logically it exceeds the buffer's allocated size.
But when we try to read, we are able to read all the contents of the file despite its size.
How is this possible?
I am new to this streaming field, so my question may seem very basic.
The read call simply won't read more than 48 bytes.
Nothing will overflow, you'll just need to call read repeatedly until you've read all the data you're expecting.
This is stated in the ReadableByteChannel interface docs:
Reads a sequence of bytes from this channel into the given buffer.
An attempt is made to read up to r bytes from the channel, where r is the number of bytes remaining in the buffer, that is, dst.remaining(), at the moment this method is invoked.
You need to clear() the buffer after processing its content before passing it back to read.
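Here is a minimal sketch of that repeated-read pattern, reusing the 48-byte buffer and the xanadu.txt file from the question (printing the bytes stands in for whatever processing you actually do):
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ReadWholeFile {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile("xanadu.txt", "rw");
             FileChannel channel = file.getChannel()) {
            ByteBuffer buffer = ByteBuffer.allocate(48);
            // Each read() fills at most buffer.remaining() bytes (48 here),
            // so a 10 MB file simply takes many iterations to get through.
            while (channel.read(buffer) != -1) {
                buffer.flip();            // switch from filling the buffer to reading it
                while (buffer.hasRemaining()) {
                    System.out.print((char) buffer.get());
                }
                buffer.clear();           // make the buffer ready for the next read
            }
        }
    }
}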

Style of FileInputStream read method

The FileInputStream read method has the signature (is that the right term?) -
public int read(byte[] b) throws IOException
// Reads up to b.length bytes of data from this input stream into an array of bytes. This method blocks until some input is available.
// Returns: the total number of bytes read into the buffer, or -1 if there is no more data because the end of the file has been reached.
What is the advantage of having a signature like this over something like this -
public byte[] read(int numberOfBytes) throws IOException
// Reads up to numberOfBytes bytes of data from this input stream into an array of bytes.
// Returns- an array of bytes read. Array is empty if there is no more data because the end of the file has been reached.
The first form allows you to reuse the same byte[] array for several executions. Basically you can read the whole stream producing minimal garbage (low GC activity).
The latter is clearly more convenient but requires creating a new byte[] instance every time it is executed, internally within the read() method. This means that while reading a 10 GiB file (even in 100-byte chunks) your application would allocate 10 GiB of memory in total - not all at the same time, but the garbage collector would still work like crazy.
Have a look at Collection.toArray(T[]) - it follows the same principle.
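For illustration, a sketch contrasting the two styles; readAlloc below is a hypothetical helper mimicking the proposed byte[] read(int) signature, not part of the real API:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class BufferReuse {
    public static void main(String[] args) throws IOException {
        // Actual API style: one buffer allocated up front and reused for every read.
        byte[] buffer = new byte[100];
        try (InputStream in = new FileInputStream("big.bin")) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                // use buffer[0 .. n - 1]; no new allocation per call
            }
        }
    }

    // Hypothetical helper mimicking the proposed signature: it has to allocate
    // (and usually copy) a fresh array on every single call.
    static byte[] readAlloc(InputStream in, int numberOfBytes) throws IOException {
        byte[] tmp = new byte[numberOfBytes];
        int n = in.read(tmp);
        return (n <= 0) ? new byte[0] : Arrays.copyOf(tmp, n);
    }
}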
