How do I read x bytes from a stream? - java

I want to read exactly n bytes from a Socket at a time. How can I achieve that?

DataInputStream.readFully()
Of course it may block for an arbitrarily long time...
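As a minimal sketch of the readFully() approach: the helper name readExactly and the in-memory test stream are illustrative assumptions, and with a real socket you would pass socket.getInputStream() instead.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class ReadExactly {
    // Hypothetical helper: returns exactly n bytes, or throws EOFException
    // if the stream ends before n bytes have arrived.
    static byte[] readExactly(InputStream in, int n) throws IOException {
        byte[] result = new byte[n];
        // readFully loops internally until the array is full; it blocks while waiting.
        new DataInputStream(in).readFully(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        byte[] firstThree = readExactly(in, 3);
        System.out.println(firstThree.length);
        try {
            readExactly(in, 3); // only 2 bytes remain in the stream
        } catch (EOFException e) {
            System.out.println("stream ended early");
        }
    }
}
```

On a socket, setting a read timeout with Socket.setSoTimeout() is one way to bound how long the call may block.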

You can create a helper method to completely fill a buffer. Something like this:
public int fillBufferCompletely(InputStream is, byte[] bytes) throws IOException {
    int size = bytes.length;
    int offset = 0;
    while (offset < size) {
        int read = is.read(bytes, offset, size - offset);
        if (read == -1) {
            if ( offset == 0 ) {
                return -1;
            } else {
                return offset;
            }
        } else {
            offset += read;
        }
    }
    return size;
}
Then you just need to pass in a buffer of size x.
Edit
Michael posted a link to a function which does essentially the same thing. The only difference is that mine can return less than the buffer length, but only when the end of the stream is reached. DataInputStream.readFully would throw an EOFException in this scenario.
So I'll leave my answer up in case an example of that behaviour is useful.

DataInputStream.readFully() throws an exception on EOF, as Mark Peters points out. But there are two other methods that don't: Commons IO's IOUtils.read() and Guava's ByteStreams.read(). Both try to read up to N bytes, stopping only at EOF, and return how many bytes they actually read.
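If pulling in a library is not an option, the JDK itself has offered the same "up to N bytes, stop only at EOF" behaviour since Java 9 through InputStream.readNBytes. This is an alternative not mentioned in the answers above; the wrapper name readUpTo is only illustrative.

```java
import java.io.IOException;
import java.io.InputStream;

final class ReadUpTo {
    // Fills buf as far as the stream allows and returns the number of bytes
    // actually read. Unlike read(), this returns 0 (not -1) at immediate EOF.
    static int readUpTo(InputStream in, byte[] buf) throws IOException {
        return in.readNBytes(buf, 0, buf.length); // requires Java 9 or later
    }
}
```

A return value smaller than buf.length therefore means the stream ended before the buffer was full.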

This is impossible. The underlying platforms cannot guarantee this, so neither can Java. You can attempt to read n bytes, but you always have to be prepared to get fewer than you requested.

Related

Java Inflater will loop infinitely sometimes

In my application, I'm trying to compress/decompress a byte array using Java's Inflater/Deflater classes.
Here's part of the code I used at first:
ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length);
byte[] buffer = new byte[1024];
while (!inflater.finished()) {
    int count = inflater.inflate(buffer);
    outputStream.write(buffer, 0, count);
}
After I deployed the code, it would randomly (very rarely) cause the whole application to hang. When I took a thread dump, I could see one thread hanging at:
at java.util.zip.Inflater.inflateBytes(Native Method)
at java.util.zip.Inflater.inflate(Inflater.java:259)
- locked java.util.zip.ZStreamRef@fc71443
at java.util.zip.Inflater.inflate(Inflater.java:280)
It doesn't happen very often. After searching around, I found that it could be caused by empty byte data being passed to the inflater, in which case finished() never returns true.
So I used a workaround, instead of using
while (!inflater.finished())
to determine if it's finished, I used
while (inflater.getRemaining() > 0)
But it happened again.
Now I wonder what the real cause of the issue is. There shouldn't be any empty array passed to the inflater, and even if there were, why didn't the getRemaining() check break the while loop?
Can anybody help, please? It's really bugging me.
I was confused by the same problem and found this page.
This is my workaround; it may help:
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
while (!inflater.finished()) {
    int i = inflater.inflate(buffer);
    if (i == 0) {
        break;
    }
    byteArrayOutputStream.write(buffer, 0, i);
}
The javadoc of inflate:
Uncompresses bytes into specified buffer. Returns actual number of bytes uncompressed. A return value of 0 indicates that needsInput() or needsDictionary() should be called in order to determine if more input data or a preset dictionary is required. In the latter case, getAdler() can be used to get the Adler-32 value of the dictionary required.
So @Wildo Luo was certainly right to check for 0 being returned.
byte[] buffer = new byte[1024];
while (!inflater.finished()) {
    int count = inflater.inflate(buffer);
    if (count != 0 ) {
        outputStream.write(buffer, 0, count);
    } else {
        if (inflater.needsInput()) { // Not everything read
            inflater.setInput(...);
        } else if (inflater.needsDictionary()) { // Dictionary to be loaded
            inflater.setDictionary(...);
        }
    }
}
inflater.end();
I can only imagine that the code is not entirely right elsewhere, maybe on the compression side. It is better to check the general code first: the Inflater(boolean nowrap) constructor, which requires an extra dummy byte of input; the end() call; exception handling (try-finally); and so on.
For unknown data and unknown occurrences: use a try-catch to capture the offending compressed data, so that you can check whether the error is data-dependent and test any fix against it.
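The zero-return handling can be sketched as a complete helper for the case where all compressed bytes are already in memory. This is an illustrative sketch, not the poster's code: the class and method names are assumptions, and it chooses to fail with an exception instead of silently breaking out of the loop when inflate() makes no progress.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class InflateAll {
    // Inflates a complete zlib stream held in memory. Every path either
    // makes progress or throws, so the loop cannot spin forever.
    static byte[] inflate(byte[] compressed) throws IOException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int count = inflater.inflate(buffer);
                if (count > 0) {
                    out.write(buffer, 0, count);
                } else if (inflater.needsInput()) {
                    // Input is used up but the end of the stream was never seen:
                    // the data is empty or truncated.
                    throw new IOException("Compressed data ended unexpectedly");
                } else if (inflater.needsDictionary()) {
                    throw new IOException("Preset dictionary required");
                } else {
                    throw new IOException("Inflater made no progress");
                }
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IOException("Bad compressed data", e);
        } finally {
            inflater.end(); // release native memory
        }
    }

    // Counterpart used only to produce test input.
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                out.write(buffer, 0, deflater.deflate(buffer));
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }
}
```

With this shape, an empty or truncated input surfaces as an IOException that can be logged together with the offending bytes.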
I'm having the same problem...
What I'm sure about:
I have an infinite loop, confirmed by log output.
inflater.inflate returns 0, and the remaining space in the output buffer is 0.
My loop looks like this (Hive ORC code):
while (!(inflater.finished() || inflater.needsDictionary() ||
         inflater.needsInput())) {
    try {
        int count = inflater.inflate(out.array(),
                                     out.arrayOffset() + out.position(),
                                     out.remaining());
        out.position(count + out.position());
    } catch (DataFormatException dfe) {
        throw new IOException("Bad compression data", dfe);
    }
}
Once the out buffer is full and its remaining size is 0, the loop runs forever.
But I'm not sure whether ORC or zlib causes this. On the ORC side, the original data is filled into a buffer of the same size as the compression buffer and then compressed, so in theory it is not possible to get a compressed chunk larger than the buffer size. The cause may be zlib or the hardware.
That being said, breaking the loop when count == 0 is dangerous, since there may still be uncompressed data in the inflater.
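For the fixed-size output buffer case, the loop condition above never looks at the space left in out. Once out.remaining() is 0, inflate() is asked for zero bytes and returns 0, as the poster observed, and none of the three flags changes. A hedged sketch of a variant that also stops when the buffer is full (the class and method names are illustrative, and out is assumed to be an array-backed ByteBuffer):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class InflateIntoBuffer {
    // Inflates into 'out' until the stream finishes, more input or a
    // dictionary is needed, or 'out' has no space left.
    // Returns the number of bytes written into 'out'.
    static int inflateInto(Inflater inflater, ByteBuffer out) throws IOException {
        int start = out.position();
        while (out.hasRemaining()
                && !(inflater.finished() || inflater.needsDictionary() || inflater.needsInput())) {
            try {
                int count = inflater.inflate(out.array(),
                                             out.arrayOffset() + out.position(),
                                             out.remaining());
                out.position(out.position() + count);
            } catch (DataFormatException dfe) {
                throw new IOException("Bad compression data", dfe);
            }
        }
        return out.position() - start;
    }
}
```

After the call, the caller can tell a full buffer from a finished stream by checking out.hasRemaining() and inflater.finished(), and decide whether leftover data is an error.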

IndexOutOfBoundsException when reading from stream

I wrote a Java method to send an instruction to a remote device via a serial port and get a known number of bytes as the answer. The code runs on a Raspberry Pi, using the librxtx-java library. The remote device was verified to send an answer of the expected length.
The code below is the last part of this method, where the Raspberry Pi waits up to a given time "t_max" for all the bytes of the answer.
As it stands, the code throws an IndexOutOfBoundsException during System.arraycopy. If I wrap the arraycopy call in try...catch and print the pointer variable in the catch block, there is indeed an index overflow.
However, if I uncomment the line that prints the pointer value, the exception no longer occurs. Even replacing this line with System.out.println("X"); makes the exception go away, but System.out.print("X"); does not.
I tried making the variables volatile, but with no luck. How can printing to the terminal change the value of a variable?
long t0 = System.currentTimeMillis();
long t = t0;
byte[] answer = new byte[answerLength];
byte[] readBuffer = new byte[answerLength];
int numBytes = 0;
int answerPointer = 0;
while (t - t0 < t_max) {
    try {
        if (inputStream.available() > 0) {
            numBytes = inputStream.read(readBuffer);
        }
    } catch (Exception e) {
    }
    if (numBytes > 0) {
        // System.out.println("answerPointer="+answerPointer);
        System.arraycopy(readBuffer, 0, answer, answerPointer, numBytes);
        answerPointer = answerPointer + numBytes;
    }
    if (answerPointer == answerLength) {
        return (answer);
    }
    t = System.currentTimeMillis();
}
Have you verified whether the output stream and the input stream are linked in any way? Maybe the input stream is reading from the output stream, and '\n' (newline) is being used as the end-of-stream character. Can you try printing to a PrintStream wrapped around a ByteArrayOutputStream instead of standard out, and see whether ps.println("X") still causes an exception? If it does, then the standard output and the input stream are possibly linked, which is why System.out.println("X") makes the exception go away.
Also, the volatile keyword only matters in the context of threads. It has no effect in a single-threaded environment.
If inputStream.available() throws an exception on the second iteration of while (t - t0 < t_max), the variables numBytes and readBuffer keep their old values. Try wrapping all the code inside the while (t - t0 < t_max) block in try {} catch {}, and don't swallow the exception.
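The stale-value point applies even without an exception: numBytes is assigned only when available() > 0, so on any iteration where nothing has arrived yet the previous chunk is copied again and answerPointer runs past the end of the array. Printing probably just slows the loop enough to hide this. A minimal sketch that avoids the stale counter by reading straight into the answer array (the class name, method name, and the 1 ms sleep are illustrative choices):

```java
import java.io.IOException;
import java.io.InputStream;

final class TimedRead {
    // Returns the complete answer, or null if it did not arrive in time
    // or the stream ended first.
    static byte[] readAnswer(InputStream in, int answerLength, long timeoutMillis)
            throws IOException, InterruptedException {
        byte[] answer = new byte[answerLength];
        int filled = 0;
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (filled < answerLength && System.currentTimeMillis() < deadline) {
            if (in.available() > 0) {
                // Never request more than the space that is left.
                int n = in.read(answer, filled, answerLength - filled);
                if (n < 0) {
                    break; // end of stream
                }
                filled += n;
            } else {
                Thread.sleep(1); // nothing available yet; avoid a busy loop
            }
        }
        return filled == answerLength ? answer : null;
    }
}
```

Because the byte count is a local variable scoped to a single read, an iteration without data cannot reuse an old value.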

A better way to find out how many bytes are in a stream?

Currently, I am relying on the ObjectInputStream.available() method to tell me how many bytes are left in a stream. Reason for this -- I am writing some unit/integration tests on certain functions that deal with streams and I am just trying to ensure that the available() method returns 0 after I am done.
Unfortunately, upon testing for failure (i.e., I have sent about 8 bytes down the stream) my assertion for available() == 0 is coming up true when it should be false. It should show >0 or 8 bytes!
I know that the available() method is classically unreliable, but I figured it would show something at least > 0!
Is there a more reliable way of checking whether a stream is empty (this is my main goal here, after all)? Perhaps in the Apache IO domain or some other library out there?
Does anyone know why the available() method is so profoundly unreliable; what is the point of it? Or, is there a specific, proper way of using it?
Update:
So, as many of you can read from the comments, the main issue I am facing is that on one end of a stream, I am sending a certain number of bytes but on the other end, not all the bytes are arriving!
Specifically, I am sending 205498 bytes on one end and only getting 204988 on the other, consistently. I control both sides of this operation, between threads over a socket, but that should not matter.
Here is the code I have written to collect all the bytes.
public static int copyStream(InputStream readFrom, OutputStream writeTo, int bytesToRead)
        throws IOException {
    int bytesReadTotal = 0, bytesRead = 0, countTries = 0, available = 0, bufferSize = 1024 * 4;
    byte[] buffer = new byte[bufferSize];
    while (bytesReadTotal < bytesToRead) {
        if (bytesToRead - bytesReadTotal < bufferSize)
            buffer = new byte[bytesToRead - bytesReadTotal];
        if (0 < (available = readFrom.available())) {
            bytesReadTotal += (bytesRead = readFrom.read(buffer));
            writeTo.write(buffer, 0, bytesRead);
            countTries = 0;
        } else if (countTries < 1000)
            try {
                countTries++;
                Thread.sleep(1L);
            } catch (InterruptedException ignore) {}
        else
            break;
    }
    return bytesReadTotal;
}
I put the countTries variable in there just to see what happens. Even without countTries, it blocks forever before it reaches bytesToRead.
What would cause the stream to suddenly block indefinitely like that? I know the other end sends all the bytes (it actually uses the same method, and I see it complete with bytesReadTotal matching the full bytesToRead). But the receiver doesn't get them all. In fact, when I look at the arrays, they match up perfectly right up to the end as well.
UPDATE2
I noticed that when I added a writeTo.flush() at the end of my copyStream method, it seems to work again. Why are flushes so vital in this situation? That is, why would omitting the flush cause a stream to block permanently?
The available() method only returns how many bytes can be read without blocking (which may be 0). In order to see if there are any bytes left in the stream, you have to read() or read(byte[]) which will return the number of bytes read. If the return value is -1 then you have reached the end of file.
This little code snippet will loop through an InputStream until it gets to the end (read() returns -1). I don't think it can ever return 0 because it should block until it can either read 1 byte or discover there is nothing left to read (and therefore return -1)
int currentBytesRead = 0;
int totalBytesRead = 0;
byte[] buf = new byte[1024];
while ((currentBytesRead = in.read(buf)) > 0) {
    totalBytesRead += currentBytesRead;
}
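Applied to the copyStream question, the same idea means dropping available() and the retry counter and letting read() block. The sketch below also flushes at the end: if writeTo is buffered (an assumption, since the question does not show how the streams were created), the last partial buffer stays in memory until flush() is called, and the receiver waits for bytes that were never sent. The names here are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class StreamCopy {
    // Copies up to bytesToRead bytes and returns how many were copied.
    // The result is smaller than bytesToRead only if the source ended early.
    static int copyExactly(InputStream from, OutputStream to, int bytesToRead)
            throws IOException {
        byte[] buffer = new byte[4096];
        int total = 0;
        while (total < bytesToRead) {
            // Blocks until at least one byte arrives or the stream ends.
            int n = from.read(buffer, 0, Math.min(buffer.length, bytesToRead - total));
            if (n == -1) {
                break;
            }
            to.write(buffer, 0, n);
            total += n;
        }
        to.flush(); // push out anything a buffered stream is still holding
        return total;
    }
}
```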

file comparison with memory consideration

I want to compare two files, one is in file system and the other is being downloaded from a HTTP URL.
We have tried comparing them as byte[] arrays (we used HTTPRequestBuilder by Apache), but the concern is that the files may be too large and exhaust the memory. Are there any good alternatives?
You can compare the contents of two InputStream objects by reading just a buffer at a time. You'll need to read more data from each stream as and when it "runs out", noting that when you call read you may not actually read a full buffer.
The two streams are equal if each byte-by-byte comparison from the buffers is equal and the streams run out of data at the same time. I suspect the code may be slightly fiddly, but it shouldn't be too bad.
In fact, for simpler code, if you wrap each InputStream in a BufferedInputStream, you could probably just compare byte-by-byte (calling the parameterless read() method on each iteration) without losing too much performance:
public boolean equals(InputStream x, InputStream y) throws IOException
{
    // TODO: Only wrap them if they're not already buffered
    x = new BufferedInputStream(x);
    y = new BufferedInputStream(y);
    while (true)
    {
        int xValue = x.read();
        int yValue = y.read();
        if (xValue != yValue)
        {
            return false;
        }
        if (xValue == -1)
        {
            // Reached the end of both streams at the same time
            return true;
        }
    }
}
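The buffer-at-a-time variant described earlier in this answer becomes less fiddly on Java 9 or later, because InputStream.readNBytes fills each buffer completely unless the stream ends. A sketch under that assumption (the class name, method name, and the 8192-byte buffer size are illustrative):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class StreamCompare {
    // Compares two streams chunk by chunk; memory use stays at two fixed buffers.
    static boolean contentEquals(InputStream x, InputStream y) throws IOException {
        byte[] bx = new byte[8192];
        byte[] by = new byte[8192];
        while (true) {
            // Each call returns a short count only when its stream has ended.
            int nx = x.readNBytes(bx, 0, bx.length);
            int ny = y.readNBytes(by, 0, by.length);
            if (nx != ny || !Arrays.equals(bx, 0, nx, by, 0, ny)) {
                return false;
            }
            if (nx < bx.length) {
                return true; // both streams ended at the same position
            }
        }
    }
}
```

Since both buffers are always filled to the same level, the chunks line up even when one side is a file and the other a slow network stream.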

How can I increase performance on reading the InputStream?

This very well may just be a KISS moment, but I feel like I should ask anyway.
I have a thread that reads from a socket's InputStream. Since I am dealing with particularly small data sizes (the data I can expect to receive is on the order of 100 - 200 bytes), I set the buffer array size to 256. As part of my read function, I have a check that ensures that when I read from the InputStream, I got all of the data. If I didn't, I recursively call the read function again. For each recursive call, I merge the two buffer arrays back together.
My problem is that, while I never anticipate needing more than the 256-byte buffer, I want to be safe. But if sheep begin to fly and the data is significantly larger than the buffer, the read function will (by my estimation) take exponentially more time to complete.
How can I increase the efficiency of the read function and/or the buffer merging?
Here is the read function as it stands.
int BUFFER_AMOUNT = 256;
private int read(byte[] buffer) throws IOException {
    int bytes = mInStream.read(buffer); // Read the input stream
    if (bytes == -1) { // If bytes == -1 then we didn't get all of the data
        byte[] newBuffer = new byte[BUFFER_AMOUNT]; // Try to get the rest
        int newBytes;
        newBytes = read(newBuffer); // Recurse until we have all the data
        byte[] oldBuffer = new byte[bytes + newBytes]; // make the final array size
        // Merge buffer into the beginning of old buffer.
        // We do this so that once the method finishes, we can just add the
        // modified buffer to a queue later in the class for processing.
        for (int i = 0; i < bytes; i++)
            oldBuffer[i] = buffer[i];
        for (int i = bytes; i < bytes + newBytes; i++) // Merge newBuffer into the latter half of old Buffer
            oldBuffer[i] = newBuffer[i];
        // Used for the recursion
        buffer = oldBuffer; // And now we set buffer to the new buffer full of all the data.
        return bytes + newBytes;
    }
    return bytes;
}
EDIT: Am I being paranoid (unjustifiedly) and should just set the buffer to 2048 and call it done?
Use BufferedInputStream, as noted by Roland, together with DataInputStream.readFully(), which replaces all the looping code.
int BUFFER_AMOUNT = 256;
Should be final if you don't want it changing at runtime.
if (bytes == -1) {
Should be !=
Also, I'm not entirely clear on what you're trying to accomplish with this code. Do you mind shedding some light on that?
I have no idea what you mean by "small data sizes". You should measure whether the time is spent in kernel mode (then you are issuing too many reads directly on the socket) or in user mode (then your algorithm is too complicated).
In the former case, just wrap the input with a BufferedInputStream with 4096 bytes of buffer and read from it.
In the latter case, just use this code:
/**
 * Reads as much as possible from the stream.
 * @return The number of bytes read into the buffer, or -1
 * if nothing has been read because the end of file has been reached.
 */
static int readGreedily(InputStream is, byte[] buf, int start, int len) throws IOException {
    int nread;
    int ptr = start; // index at which the data is put into the buffer
    int rest = len; // number of bytes that we still want to read
    while ((nread = is.read(buf, ptr, rest)) > 0) {
        ptr += nread;
        rest -= nread;
    }
    int totalRead = len - rest;
    return (nread == -1 && totalRead == 0) ? -1 : totalRead;
}
This code completely avoids creating new objects, calling unnecessary methods and furthermore --- it is straightforward.
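For the buffer-merging half of the question, one option the answers do not spell out is ByteArrayOutputStream, which grows its internal array as needed, so no manual merging or recursion is required. The sketch below assumes the sender closes the stream (or half-closes the socket) at the end of a message; with a framed protocol you would stop after a known length instead. The class and method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadAll {
    // Collects everything up to end-of-stream into one array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] chunk = new byte[256]; // reused for every read
        int n;
        while ((n = in.read(chunk)) != -1) {
            collected.write(chunk, 0, n);
        }
        return collected.toByteArray();
    }
}
```

The 256-byte chunk stays small for the expected case, while larger messages only cost a few extra loop iterations and array growths.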
