Best practice to read limited length from an input stream - java

I have packets that are sent over sockets which are preceded by length headers. So every transmission comes with 4 bytes of length information followed by the packet. Therefore, I must limit my read() to never exceed the length, so that I don't accidentally read the next packet in line.
InputStream in is the input stream, ByteArrayOutputStream byteStream is the stream I write incoming packets into and int len is the length of an incoming transmission.
Here is what I came up with:
for (int i = 0; i < len; i++) {
byteStream.write(in.read());
}
This is absolutely horrific; reading bytes one by one. I was hoping to see if there is a better way of doing this using buffers (kind of similar to the canonical way of copying streams).

Wrap the input stream in a DataInputStream and then use the readInt() and readFully() methods to get the size and bytes of data out.
int len = dis.readInt();
// assuming relatively small byte arrays, use smarter logic for huge streams....
byte[] buffer = new byte[len];
dis.readFully(buffer);
byteStream.write(buffer);
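As a fuller illustration of the approach above, here is a minimal sketch that reads two length-prefixed packets back to back. The class name LengthPrefixedReader, the method readPacket, and the MAX_PACKET_SIZE limit are made up for this example, and an in-memory stream stands in for the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LengthPrefixedReader {
    // Hypothetical upper bound (not from the question); pick one that fits your protocol.
    public static final int MAX_PACKET_SIZE = 1 << 20;

    /** Reads one packet: a 4-byte big-endian length followed by that many bytes. */
    public static byte[] readPacket(DataInputStream dis) throws IOException {
        int len = dis.readInt(); // throws EOFException if the stream ends inside the header
        if (len < 0 || len > MAX_PACKET_SIZE) {
            throw new IOException("Invalid packet length: " + len);
        }
        byte[] packet = new byte[len];
        dis.readFully(packet); // blocks until exactly len bytes have been read
        return packet;
    }

    public static void main(String[] args) throws IOException {
        // Write two packets into a byte array that stands in for the socket.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);
        for (String s : new String[] {"hello", "world!"}) {
            byte[] body = s.getBytes(StandardCharsets.US_ASCII);
            out.writeInt(body.length);
            out.write(body);
        }

        DataInputStream dis =
                new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        // Each call consumes exactly one packet and never touches the next one.
        System.out.println(new String(readPacket(dis), StandardCharsets.US_ASCII));
        System.out.println(new String(readPacket(dis), StandardCharsets.US_ASCII));
    }
}
```

The length check is there because the header comes from the network; without it, a corrupt or hostile length could trigger a very large allocation.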


Does java socket read the data exactly as it's sent

Consider a socket connection between two devices (A and B). Suppose I write 16 bytes (the size doesn't matter here) to the socket's output stream (not a BufferedOutputStream) on side A three times, or in general more than once, like this:
OutputStream outputStream = socket.getOutputStream();
byte[] buffer = new byte[16];
outputStream.write(buffer);
outputStream.write(buffer);
outputStream.write(buffer);
I read the data on side B using the socket's input stream (not a BufferedInputStream) with a buffer larger than the sending buffer, for example 1024 bytes:
InputStream inputStream = socket.getInputStream();
byte[] buffer = new byte[1024];
int read = inputStream.read(buffer);
Now I wonder how the data is received on side B. Can it accumulate, or is it read exactly as A sends it? In other words, can the read variable be greater than 16?
InputStream makes very few guarantees about how much data will be read by any invocation of the multi-byte read() methods. There is a whole class of common programming errors that revolve around misunderstandings and wrong assumptions about that. For example,
if InputStream.read(byte[]) reads fewer bytes than the provided array can hold, that does not imply that the end of the stream has been reached, or even that another read will necessarily block.
the number of bytes read by any one invocation of InputStream.read(byte[]) does not necessarily correlate to any characteristic of the byte source on which the stream draws, except that it will always read at least one byte when not at the end of the stream, and that it will not read more bytes than are actually available by the time it returns.
the number of available bytes indicated by the available() method does not reliably indicate how many bytes a subsequent read should or will read. A nonzero return value reliably indicates only that the next read will not block; a zero return value tells you nothing at all.
Subclasses may make guarantees about some of those behaviors, but most do not, and anyway you often do not know which subclass you actually have.
In order to use InputStreams properly, you generally must be prepared to perform repeated reads until you get sufficient data to process. That can mean reading until you have accumulated a specific number of bytes, or reading until a delimiter is encountered. In some cases you can handle any number of bytes from any given read; generally these are cases where you are looping anyway, and feeding everything you read to a consumer that can accept variable length chunks (many compression and encryption interfaces are like that, for example).
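The "read until you have accumulated a specific number of bytes" case can be sketched as follows. The names ReadExactly, readExactly, and TrickleStream are invented for this example; TrickleStream is only a test helper that returns at most 3 bytes per call to imitate short reads from a socket.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ReadExactly {
    /** Fills buf[0..n) from in, looping because one read may return fewer bytes. */
    public static void readExactly(InputStream in, byte[] buf, int n) throws IOException {
        int total = 0;
        while (total < n) {
            // Ask only for what is still missing, and append after what we already have.
            int r = in.read(buf, total, n - total);
            if (r == -1) {
                throw new EOFException("Stream ended after " + total + " of " + n + " bytes");
            }
            total += r;
        }
    }

    /** Hypothetical helper: hands out at most 3 bytes per read to imitate short reads. */
    public static class TrickleStream extends ByteArrayInputStream {
        public TrickleStream(byte[] data) {
            super(data);
        }

        @Override
        public synchronized int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 3));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        byte[] buf = new byte[10];
        // Several short reads happen internally, but the caller sees one full buffer.
        readExactly(new TrickleStream(data), buf, buf.length);
        System.out.println(java.util.Arrays.toString(buf));
    }
}
```

This is essentially what DataInputStream.readFully does for you; on Java 9 and later, InputStream.readNBytes covers the same need.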
Per the docs:
public int read(byte[] b) throws IOException
Reads some number of bytes from the input stream and stores them into the buffer array b. The number of bytes
actually read is returned as an integer. This method blocks until
input data is available, end of file is detected, or an exception is
thrown. If the length of b is zero, then no bytes are read and 0 is
returned; otherwise, there is an attempt to read at least one byte. If
no byte is available because the stream is at the end of the file, the
value -1 is returned; otherwise, at least one byte is read and stored
into b.
The first byte read is stored into element b[0], the next one into
b[1], and so on. The number of bytes read is, at most, equal to the
length of b. Let k be the number of bytes actually read; these bytes
will be stored in elements b[0] through b[k-1], leaving elements b[k]
through b[b.length-1] unaffected.
read(...) tells you how many bytes it put into the array, and yes, that can be more than 16: you get whatever has already arrived, and you can keep reading for the rest.

Why does an InputStream read method return an int?

I'm trying to understand what is happening behind the scenes with the read method from an inputstream.
I know I can do the following:
InputStream is = new FileInputStream(new File("myFile.txt"));
byte[] buffer = new byte[8192];
while(is.read(buffer) != -1) {
// do something with the bytes
}
I believe this works by reading up to 8192 bytes into my byte array and then does something with those bytes. But why does the read return an integer of the number of bytes read? Is this purely so that whatever uses the bytes from the byte array knows when to stop looking for bytes?
I guess I'm confused because, for example, one of my reads shows
buffer[0] = 70
buffer[1] = 105
buffer[2] = 108
etc...
But if I'm reading up to 8192 for the entire byte array why are the elements in the array setup like this?
I realize this may be a dumb question, but I'd appreciate any help in understanding this.
The is.read(buffer) method reads up to buffer.length (in this case, 8192) bytes: it returns whatever is currently available, up to that limit, blocking only until at least one byte can be read.
When the array is not filled completely (which is possible because fewer than buffer.length bytes may be available), the return value tells you how many leading elements hold fresh data, so you can iterate over exactly those and not process stale bytes left over from an earlier read.
The reason you are seeing
buffer[0] = 70
buffer[1] = 105
buffer[2] = 108
etc...
Is because in the stream of bytes being read, the first value is 70, the second is 105, the third is 108, and so on. These are actually the contents of the stream, one byte at a time.
Because, for whatever reason, it may not be able to read all of the bytes right now. So read() returns the number of bytes it successfully read, and you know how much of the buffer is valid.
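To make the role of the return value concrete, here is a minimal sketch that collects a whole stream into a byte array. The class name CopyWithCount and the method readAll are made up for this example; the key point is that only the first n bytes of the buffer are used after each read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class CopyWithCount {
    /** Reads the stream to its end, using the count returned by each read. */
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            // Only buffer[0..n) holds fresh data; the rest may be left over
            // from a previous iteration, so it must not be written.
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] source = "File contents".getBytes(StandardCharsets.US_ASCII);
        byte[] copy = readAll(new ByteArrayInputStream(source));
        // The first three bytes are 70, 105, 108: the codes for 'F', 'i', 'l'.
        System.out.println(copy[0] + " " + copy[1] + " " + copy[2]);
    }
}
```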

How is the buffer byte array continuously refilled while streaming?

The piece of code below is used for reading files:
int bytesRead;
byte[] bytes = new byte[1000]; //buffer
FileInputStream fis = new FileInputStream(uploadedFile);
while ((bytesRead = fis.read(bytes)) != -1) {
// process bytes[0] through bytes[bytesRead - 1] here; the next read overwrites them
}
fis.close();
As per the API documentation of the read() method:
Reads up to b.length bytes of data from this input stream into an array of bytes. This method blocks until some input is available.
Nowhere does it say that the bytes array is refilled, yet the stream keeps filling the array until the whole file has been read.
How is this done internally?
I looked at the source code of the read method:
public int read(byte b[]) throws IOException {
    return readBytes(b, 0, b.length);
}
and readBytes's source code is
private native int readBytes(byte b[], int off, int len) throws IOException;
There is nothing mentioned about how the bytes ..
I uploaded a 500 MB file without any problem using that 1000-byte array.
If you're asking why you can read a ~500 MB file with a roughly 1 KB buffer, it's because you overwrite the contents of the buffer each time you go through the loop (approximately 500,000 times).
If you're asking how the read function is actually implemented, notice that the underlying call includes the keyword native. That means that native code is being called via JNI. The exact implementation is going to be JVM and OS dependent.
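The buffer reuse described above can be made visible with a small sketch that counts how often the same array is refilled. The class name BufferReuse and the method drain are invented for this example, and a 5000-byte in-memory stream stands in for the uploaded file; bufferSize is assumed to be positive.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferReuse {
    /**
     * Drains the stream through a single buffer of the given (positive) size.
     * Returns {number of reads, total bytes read}.
     */
    public static long[] drain(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize]; // allocated once, overwritten on every read
        long reads = 0;
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            reads++;
            total += n; // the stream, not the array, remembers the current position
        }
        return new long[] {reads, total};
    }

    public static void main(String[] args) throws IOException {
        long[] result = drain(new ByteArrayInputStream(new byte[5000]), 1000);
        System.out.println(result[0] + " reads, " + result[1] + " bytes in total");
    }
}
```

Memory use stays at one buffer no matter how large the source is, because each pass discards the previous contents.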
A great article on readBytes was published by Michael Schaeffer
In short:
File I/O in Java is implemented by reading into a local buffer, and then copying from the local buffer into the Java byte[] initially passed into int read(byte buf[]). This double-copy is slow, but it also requires heap allocation of a second read buffer, if the read buffer is larger than 8K.
The article has many other helpful details; it is easier to read it than to summarize them here.

Can InputStream.read overflow the buffer?

Does the read command check the size of the buffer when filling it with data, or is there a chance that data is lost because the buffer isn't big enough? In other words, if there are ten bytes of data available to be read, will the server continue to store the remaining 2 bytes of data until the next read?
I'm just using 8 as an example here to over dramatise the situation.
InputStream stdout;
...
while(condition)
{
...
byte[] buffer = new byte[8];
int len = stdout.read(buffer);
}
No, read() won't lose any data just because you haven't given it enough space for all the available bytes.
It's not clear what you mean by "the server" here, but the final two bytes of a 10 byte message would be available after the first read. (Or possibly, the first read() would only read the first six bytes, leaving four still to read, for example.)
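A minimal sketch of that behavior, with 10 bytes of data and an 8-byte buffer. The class name SmallBufferDemo and the method readSizes are made up for this example, and an in-memory stream is used so the result is repeatable.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class SmallBufferDemo {
    /** Reads to end of stream and records how many bytes each read returned. */
    public static List<Integer> readSizes(InputStream in, int bufferSize) throws IOException {
        List<Integer> sizes = new ArrayList<>();
        byte[] buffer = new byte[bufferSize];
        int len;
        while ((len = in.read(buffer)) != -1) {
            sizes.add(len); // never more than bufferSize; the rest waits in the stream
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10]);
        System.out.println(readSizes(in, 8));
    }
}
```

With this in-memory stream the sizes are 8 and then 2. A socket or process stream could split the same 10 bytes differently (6 and 4, for instance), but the total is still 10.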

What if we exceed the capacity of a buffer allocated with ByteBuffer.allocate(48) in Java NIO?

file = new RandomAccessFile("xanadu.txt", "rw");
FileChannel channel = file.getChannel();
ByteBuffer buffer = ByteBuffer.allocate(48);
int byteReads = channel.read(buffer);
So I am allocating a buffer with a capacity of 48 bytes. Now
consider that the text file I am reading is about 10 MB, so it is far larger than the buffer.
But when we read, we are able to read the entire contents of the file despite its size.
How is this possible?
I am new to streams, so my question may seem very basic.
The read call simply won't read more than 48 bytes.
Nothing will overflow, you'll just need to call read repeatedly until you've read all the data you're expecting.
This is stated in the ReadableByteChannel interface docs:
Reads a sequence of bytes from this channel into the given buffer.
An attempt is made to read up to r bytes from the channel, where r is the number of bytes remaining in the buffer, that is, dst.remaining(), at the moment this method is invoked.
You need to clear() the buffer after processing its content before passing it back to read.
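The repeated read, process, clear() cycle might look like the sketch below. The class name ChannelReadLoop and the method readAll are invented for this example, and a temporary file stands in for xanadu.txt. The flip() call, which switches the buffer from being filled to being drained, is an addition needed to process the content before clearing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelReadLoop {
    /** Reads a whole file through a 48-byte buffer, one chunk at a time. */
    public static byte[] readAll(Path path) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate(48);
            while (channel.read(buffer) != -1) {   // reads at most buffer.remaining() bytes
                buffer.flip();                     // position = 0, limit = bytes just read
                out.write(buffer.array(), buffer.position(), buffer.remaining());
                buffer.clear();                    // make the full capacity writable again
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Temporary file far larger than the 48-byte buffer.
        Path tmp = Files.createTempFile("channel-demo", ".bin");
        try {
            Files.write(tmp, new byte[1000]);
            System.out.println(readAll(tmp).length + " bytes read");
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The channel keeps track of the file position between calls, so each read continues where the previous one stopped.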
