InputStream.read(byte[], 0, length) stops early? - java

I have been writing something to read a request stream (containing gzipped data) from an incoming HttpServletRequest ('request' below), but it appears that the normal InputStream read method doesn't actually read all of the content.
My code was:
InputStream requestStream = request.getInputStream();
if ((length = request.getContentLength()) != -1)
{
    received = new byte[length];
    requestStream.read(received, 0, length);
}
else
{
    // create a variable length list of bytes
    List<Byte> bytes = new ArrayList<Byte>();
    boolean endLoop = false;
    while (!endLoop)
    {
        // try and read the next value from the stream.. if not -1, add it to the list as a byte. if
        // it is, we've reached the end.
        int currentByte = requestStream.read();
        if (currentByte != -1)
            bytes.add((byte) currentByte);
        else
            endLoop = true;
    }
    // initialize the final byte[] to the right length and add each byte into it in the right order.
    received = new byte[bytes.size()];
    for (int i = 0; i < bytes.size(); i++)
    {
        received[i] = bytes.get(i);
    }
}
What I found during testing was that sometimes the top part (for when a content length is present) would just stop reading partway through the incoming request stream and leave the remainder of the 'received' byte array blank. If I just make it run the else part of the if statement at all times, it reads fine and all the expected bytes are placed in 'received'.
So it seems like I can just leave my code alone now with that change, but does anyone have any idea why the normal read(byte[], int, int) method stopped reading? The documentation says that it may stop if an end of file is reached. Could it be that the gzipped data just happened to include bytes matching whatever the signature for that looks like?

You need to add a while loop at the top to get all the bytes. The stream will attempt to read as many bytes as it can, but it is not required to return len bytes at once:
An attempt is made to read as many as len bytes, but a smaller number may be read, possibly zero.
if ((length = request.getContentLength()) != -1)
{
    received = new byte[length];
    int pos = 0;
    do {
        int read = requestStream.read(received, pos, length - pos);
        // check for end of file or error
        if (read == -1) {
            break;
        } else {
            pos += read;
        }
    } while (pos < length);
}
EDIT: fixed while.

You need to see how much of the buffer was filled. It's only guaranteed to give you at least one byte.
Perhaps what you wanted was DataInputStream.readFully().
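For example, readFully() does the retry loop internally; a minimal sketch, assuming 'request' is the HttpServletRequest from the question and the Content-Length header is accurate:
import java.io.DataInputStream;
DataInputStream in = new DataInputStream(request.getInputStream());
byte[] received = new byte[request.getContentLength()];
// Blocks until the array is completely filled; throws EOFException
// if the stream ends before received.length bytes arrive.
in.readFully(received);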

Related

Java.util.zip.DataFormatException: incorrect header check

First post; usually I find what I'm looking for in other threads, but not this time:
I'm using Java's Deflater and Inflater to compress/decompress some data I send between a server and client application that I'm working on.
It works just fine for 99% of my tests. However, there is one particular dataset that, when inflated, throws this exception from the inflater.inflate() method:
DataFormatException: incorrect header check
There is nothing special about the data compared to the other runs. It's just a bunch of numbers separated by commas, "encoded" as a String and then converted with .getBytes(). The only thing I know is that it's a bit bigger this time. There is no encoding happening anywhere between the compression and decompression steps.
This is the code to send something to either the client or the server. The code is shared.
OutputStream outputStream = new DataOutputStream(socket.getOutputStream());
byte[] uncompressed = SOMEJSON.toString().getBytes();
int realLength = uncompressed.length;
// compress data
byte[] compressedData = ByteCompression.compress(uncompressed);
int compressedLength = compressedData.length;
outputStream.write(ByteBuffer.allocate(Integer.BYTES).putInt(compressedLength).array());
outputStream.write(ByteBuffer.allocate(Integer.BYTES).putInt(realLength).array());
outputStream.write(compressedData);
outputStream.flush();
This is the code to receive data (either client or server) also shared:
DataInputStream dataIn = new DataInputStream(socket.getInputStream());
int compressedLength = dataIn.readInt();
int realLength = dataIn.readInt();
errorhandling.info("Packet Reader", "Expecting " + compressedLength + " (" + realLength + ") bytes.");
byte[] compressedData = new byte[compressedLength];
int readBytes = 0;
while (readBytes < compressedLength) {
int newByteAmount = dataIn.read(compressedData);
// catch nothing being read or end of line
if (newByteAmount <= 0) {
break;
}
readBytes += newByteAmount;
}
if (readBytes != compressedLength) {
errorhandling.info("Packet Reader", "Read byte amount differs from expected bytes.");
return new ErrorPacket("Read byte amount differs from expected bytes.").create();
}
byte[] uncompressedData = ByteCompression.decompress(compressedData, realLength);
String packetData = new String(uncompressedData);
Here are the methods to compress and decompress a byteArray (you guessed right its shared):
public static byte[] compress(byte[] uncompressed) {
Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
deflater.setInput(uncompressed);
deflater.finish();
byte[] compressed = new byte[uncompressed.length];
int compressedSize = 0;
while (!deflater.finished()) {
compressedSize += deflater.deflate(compressed);
}
deflater.end();
return Arrays.copyOfRange(compressed, 0, compressedSize);
}
public static byte[] decompress(byte[] compressed, int realLength) throws DataFormatException {
Inflater inflater = new Inflater(true);
inflater.setInput(compressed);
byte[] uncompressed = new byte[realLength];
while (!inflater.finished()) {
inflater.inflate(uncompressed); // throws DataFormatException: incorrect header check (but only super rarely)
}
inflater.end();
return uncompressed;
}
So far I've tried different compression levels and messing with the "nowrap" option for both Deflater and Inflater (all combinations):
// [...]
Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION, true);
// [...]
Inflater inflater = new Inflater(true);
But that just results in these exceptions (but again, only for that one particular dataset):
DataFormatException: invalid stored block lengths
DataFormatException: invalid distance code
I'm sorry for this wall of text, but at this point I really don't know anymore what could be causing this issue.
Alright, here is the solution:
My assumption was that this loop would APPEND new read data to the byte array where it last stopped. THIS IS NOT THE CASE (it seems to stop reading after 2^16 bytes, so that's why I don't get this issue with smaller packets).
This is wrong:
int readBytes = 0;
while (readBytes < compressedLength) {
    int newByteAmount = dataIn.read(compressedData); // focus here!
    readBytes += newByteAmount;
}
So what's happening is that the data is read correctly, but the output array keeps overwriting itself from the start!! That's why I see wrong data at the start and a bunch of 00 00 at the end (because the loop never actually reached that part of the array)!
Using this instead fixed my issue:
dataIn.readFully(compressedData);
What concerns me is that I see the first variant of the code A LOT. That's what I found when googling it.
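For reference, the manual loop can also be fixed by passing the running offset to read() so each call appends instead of overwriting the start of the array; a minimal sketch of that corrected variant:
int readBytes = 0;
while (readBytes < compressedLength) {
    // Read into the array at the current offset, not at index 0.
    int newByteAmount = dataIn.read(compressedData, readBytes, compressedLength - readBytes);
    if (newByteAmount <= 0) {
        break; // stream ended before the expected byte count
    }
    readBytes += newByteAmount;
}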

I can't get all bytes from website

I'm trying to read all bytes from a web site, but I think I don't get all of them. I give a high value for the byte array length. I used this method, but it always throws an exception.
Here is the code:
DataInputStream dis = new DataInputStream(s2.getInputStream());
byte[] bytes = new byte[900000];
// Read in the bytes
int offset = 0;
int numRead = 0;
while (offset < bytes.length
       && (numRead = dis.read(bytes, offset, bytes.length - offset)) >= 0) {
    offset += numRead;
}
// Ensure all the bytes have been read in
if (offset < bytes.length) {
    throw new IOException("Could not completely read website");
}
out.write(bytes);
Edited Version:
ByteArrayOutputStream bais = new ByteArrayOutputStream();
InputStream is = null;
try {
    is = s2.getInputStream();
    byte[] byteChunk = new byte[4096]; // Or whatever size you want to read in at a time.
    int n;
    while ((n = is.read(byteChunk)) > 0) {
        bais.write(byteChunk, 0, n);
    }
}
catch (IOException e) {
    System.err.printf("Failed while reading bytes");
    e.printStackTrace();
    // Perform any other exception handling that's appropriate.
}
finally {
    if (is != null) { is.close(); }
}
byte[] asd = bais.toByteArray();
out.write(asd);
This is the problem:
if (offset < bytes.length)
You'll only trigger that if the original data is more than 900,000 bytes. If the response is entirely complete in less than that, read() will return -1 correctly to indicate the end of the stream.
You should actually be throwing an exception if offset is equal to bytes.length, as that indicates that you might have truncated data :)
It's not clear where you got the 900,000 value from, mind you...
I would suggest that if you want to stick with the raw stream, you use Guava's ByteStreams.toByteArray method to read all the data. Alternatively, you could keep looping round, reading into a smaller buffer, writing into a ByteArrayOutputStream on each iteration.
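For instance, a minimal sketch with Guava (assumes Guava is on the classpath, and reuses s2 and out from the question):
import com.google.common.io.ByteStreams;
// Reads the stream to exhaustion and sizes the result exactly,
// so no guessed 900,000-byte buffer is needed.
byte[] bytes = ByteStreams.toByteArray(s2.getInputStream());
out.write(bytes);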
I realise this doesn't answer your specific question. However I really wouldn't hand-code this sort of thing, when libraries such as HttpClient exist and are debugged/profiled etc.
e.g. here's how to use the fluent interface
Request.Get("http://targethost/homepage").execute().returnContent();
JSoup is an alternative if you're dealing with grabbing and scraping HTML.

Java: Issue with available() method of BufferedInputStream

I'm dealing with the following code that is used to split a large file into a set of smaller files:
FileInputStream input = new FileInputStream(this.fileToSplit);
BufferedInputStream iBuff = new BufferedInputStream(input);
try {
    int i = 0;
    FileOutputStream output = new FileOutputStream(fileArr[i]);
    BufferedOutputStream oBuff = new BufferedOutputStream(output);
    int buffSize = 8192;
    byte[] buffer = new byte[buffSize];
    while (true) {
        if (iBuff.available() < buffSize) {
            byte[] newBuff = new byte[iBuff.available()];
            iBuff.read(newBuff);
            oBuff.write(newBuff);
            oBuff.flush();
            oBuff.close();
            break;
        }
        int r = iBuff.read(buffer);
        if (fileArr[i].length() >= this.partSize) {
            oBuff.flush();
            oBuff.close();
            ++i;
            output = new FileOutputStream(fileArr[i]);
            oBuff = new BufferedOutputStream(output);
        }
        oBuff.write(buffer);
    }
} catch (Exception e) {
    e.printStackTrace();
}
This is the weird behavior I'm seeing... when I run this code using a 3 GB file, the initial iBuff.available() call returns a value of approximately 2,100,000,000 and the code works fine. When I run this code on a 12 GB file, the initial iBuff.available() call only returns a value of 200,000,000 (which is smaller than the split file size of 500,000,000 and causes the processing to go awry).
I'm thinking this discrepancy in behavior has something to do with the fact that this is on 32-bit Windows. I'm going to run a couple more tests on a 4.5 GB file and a 3.5 GB file. If the 3.5 GB file works and the 4.5 GB one doesn't, that will further confirm the theory that it's a 32-bit vs. 64-bit issue, since 4 GB would then be the threshold.
Well, if you read the javadoc, it quite clearly states:
Returns the number of bytes that can be read from this input stream without blocking (emphasis added by me)
So it's quite clear that what you want is not what this method offers. Depending on the underlying InputStream, you may get problems much earlier (e.g. a stream over the network with a server that doesn't return the file size: you'd have to read and buffer the complete file just to return the "correct" available() count, which would take a lot of time. What if you only want to read a header?)
So the correct way to handle this is to change your parsing method to be able to handle the file in pieces. Personally, I don't see much reason to even use available() here: just calling read() and stopping as soon as read() returns -1 should work fine. It can be made more complicated if you want to ensure that every file really contains blockSize bytes; just add an internal loop if that scenario is important.
int blockSize = XXX;
byte[] buffer = new byte[blockSize];
int i = 0;
int read = in.read(buffer);
while (read != -1) {
    out[i++].write(buffer, 0, read);
    read = in.read(buffer);
}
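If you do need every split file except possibly the last to contain exactly blockSize bytes, the internal loop mentioned above could look like this sketch (reusing in, out, buffer, blockSize, and i from the snippet above):
int filled;
do {
    filled = 0;
    // Inner loop: keep reading until the buffer is full or EOF.
    while (filled < blockSize) {
        int r = in.read(buffer, filled, blockSize - filled);
        if (r == -1)
            break;
        filled += r;
    }
    if (filled > 0)
        out[i++].write(buffer, 0, filled);
} while (filled == blockSize);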
There are few correct uses of available(), and this isn't one of them. You don't need all that junk. Memorize this:
int count;
byte[] buffer = new byte[8192]; // or more
while ((count = in.read(buffer)) > 0)
    out.write(buffer, 0, count);
That's the canonical way to copy a stream in Java.
You should not use the InputStream.available() function at all. It is only needed in very special circumstances.
You should also not create byte arrays larger than 1 MB; it's a waste of memory. The commonly accepted way is to read a small block (4 kB up to 1 MB) from the source file and then write only as many bytes as you actually read to the destination file. Do that until you have reached the end of the source file.
available() isn't a measure of how much is still to be read; it's more a measure of how much is guaranteed to be readable before the stream might hit EOF or block waiting for input.
Also, put the close() calls in finally blocks:
BufferedInputStream iBuff = new BufferedInputStream(input);
int i = 0;
try {
    int buffSize = 8192;
    int offset = 0;
    byte[] buffer = new byte[buffSize];
    while (true) {
        int len = iBuff.read(buffer, offset, buffSize - offset);
        if (len == -1) { // EOF: write out the last partial chunk
            if (offset > 0) {
                BufferedOutputStream oBuff = new BufferedOutputStream(new FileOutputStream(fileArr[i]));
                try {
                    oBuff.write(buffer, 0, offset);
                } finally {
                    oBuff.close();
                }
            }
            break;
        }
        offset += len;
        if (offset == buffSize) { // buffer full: write it out to its own file
            BufferedOutputStream oBuff = new BufferedOutputStream(new FileOutputStream(fileArr[i]));
            try {
                oBuff.write(buffer);
            } finally {
                oBuff.close();
            }
            ++i;
            offset = 0;
        }
    }
} finally {
    iBuff.close();
}
Here is some code that splits a file. If performance is critical to you, you can experiment with the buffer size.
package so6164853;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Formatter;

public class FileSplitter {

    private static String printf(String fmt, Object... args) {
        Formatter formatter = new Formatter();
        formatter.format(fmt, args);
        return formatter.out().toString();
    }

    /**
     * @param outputPattern see {@link Formatter}
     */
    public static void splitFile(String inputFilename, long fragmentSize, String outputPattern) throws IOException {
        InputStream input = new FileInputStream(inputFilename);
        try {
            byte[] buffer = new byte[65536];
            int outputFileNo = 0;
            OutputStream output = null;
            long writtenToOutput = 0;
            try {
                while (true) {
                    int bytesToRead = buffer.length;
                    if (bytesToRead > fragmentSize - writtenToOutput) {
                        bytesToRead = (int) (fragmentSize - writtenToOutput);
                    }
                    int bytesRead = input.read(buffer, 0, bytesToRead);
                    if (bytesRead != -1) {
                        if (output == null) {
                            String outputName = printf(outputPattern, outputFileNo);
                            outputFileNo++;
                            output = new FileOutputStream(outputName);
                            writtenToOutput = 0;
                        }
                        output.write(buffer, 0, bytesRead);
                        writtenToOutput += bytesRead;
                    }
                    if (output != null && (bytesRead == -1 || writtenToOutput == fragmentSize)) {
                        output.close();
                        output = null;
                    }
                    if (bytesRead == -1) {
                        break;
                    }
                }
            } finally {
                if (output != null) {
                    output.close();
                }
            }
        } finally {
            input.close();
        }
    }

    public static void main(String[] args) throws IOException {
        splitFile("d:/backup.zip", 1440 << 10, "d:/backup.zip.part%04d");
    }
}
Some remarks:
Only those bytes that have actually been read from the input file are written to one of the output files.
I left out the BufferedInputStream and BufferedOutputStream, since their buffer size is only 8192 bytes, which is less than the buffer I use in the code.
As soon as I open a file, I make sure that it will be closed at the end, no matter what happens. (The finally blocks.)
The code contains only one call to input.read and only one call to output.write. This makes it easier to check for correctness.
The code for splitting a file does not catch the IOException, since it doesn't know what to do in such a case. It is just passed to the caller; maybe the caller knows how to handle it.
Both @ratchet and @Voo are correct.
As for what is happening.
int's max value is 2,147,483,647 (http://download.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html).
12 gigabytes is 12,884,901,888 bytes, which clearly doesn't fit in an int.
See that according to the API Javadoc (http://download.oracle.com/javase/6/docs/api/java/io/BufferedInputStream.html#available%28%29), and as stated by @Voo, this doesn't break the method contract at all (it just isn't what you are looking for).
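As an aside, if all you need is the input file's size, java.io.File.length() returns a long and so is not limited by the int range; a minimal sketch, assuming fileToSplit is the File from the question:
import java.io.File;
// length() returns a long, so a 12 GB file reports its true size
// instead of a value truncated to fit an int.
long size = this.fileToSplit.length();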

read a file byte by byte then perform some operation every n bytes

I would like to know how can I read a file byte by byte then perform some operation every n bytes.
for example:
Say I have a file of size 50 bytes; I want to divide it into blocks of n bytes each. Then each block is sent to a function for some operations to be done on those bytes. The blocks are to be created during the read process and sent to the function as soon as a block reaches n bytes, so that I don't use much memory for storing all the blocks.
I want the output of the function to be written/appended on a new file.
This is what I've got so far for reading, though I don't know if it is right:
fc = new JFileChooser();
File f = fc.getSelectedFile();
FileInputStream in = new FileInputStream(f);
byte[] b = new byte[16];
in.read(b);
I haven't done anything yet for the write process.
You're on the right lines. Consider wrapping your FileInputStream with a BufferedInputStream, which improves I/O efficiency by reading the file in larger chunks.
The next step is to check the number of bytes read (returned by your call to read) and to hand-off the array to the processing function. Obviously you'll need to pass the number of bytes read to this method too in case the array was only partially populated.
So far your code looks OK. For reading binary files (as opposed to text files) you should indeed use FileInputStream (for reading text files, you should use a Reader, such as FileReader).
Note that you should check the return value of in.read(b), because it might read fewer than 16 bytes if there are fewer than 16 bytes left at the end of the file.
Of course, you should add a loop to the program that keeps reading blocks of bytes until you reach the end of the file.
To write data to a binary file, use FileOutputStream. That class has a constructor that you can pass a flag to indicate that you want to append to an existing file:
FileOutputStream out = new FileOutputStream("output.bin", true);
Also, don't forget to call close() on the FileInputStream and FileOutputStream when you are done.
See the Java API documentation, especially the classes in the java.io package.
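Putting that advice together, a minimal sketch of the whole read/process/append cycle (process() is a hypothetical stand-in for whatever operation you run on each block):
import java.io.FileInputStream;
import java.io.FileOutputStream;
FileInputStream in = new FileInputStream(f);
FileOutputStream out = new FileOutputStream("output.bin", true); // append mode
try {
    byte[] b = new byte[16]; // n = 16, as in the question
    int count;
    while ((count = in.read(b)) != -1) {
        // count may be less than 16 on the last block
        byte[] result = process(b, count); // hypothetical per-block operation
        out.write(result);
    }
} finally {
    in.close();
    out.close();
}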
I believe that this will work:
final int blockSize = // some calculation
byte[] block = new byte[blockSize];
InputStream is = new FileInputStream(f);
try {
    int ret = -1;
    do {
        int bytesRead = 0;
        while (bytesRead < blockSize) {
            ret = is.read(block, bytesRead, blockSize - bytesRead);
            if (ret < 0)
                break; // no more data
            bytesRead += ret;
        }
        myFunction(block, bytesRead);
    } while (0 <= ret);
}
finally {
    is.close();
}
This code will call myFunction with blockSize bytes for all but possibly the last invocation.
It's a start.
You should check what read() returns. It can read fewer bytes than the size of the array, and also indicate that the end of the file is reached.
Obviously, you need to read() in a loop...
It might be a good idea to reuse the array, but that requires that the part that reads the array copies what it needs, rather than just keeping a reference to the array.
I think this is what you might need:
void readFile(String path, int n) {
    try {
        File f = new File(path);
        FileInputStream fis = new FileInputStream(f);
        byte[] array = new byte[n];
        int ret;
        // Stop as soon as read() returns -1, so doSomething() never sees EOF.
        while ((ret = fis.read(array)) > -1) {
            doSomething(array, ret);
        }
        fis.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Issue in reading data from Socket

I am facing a problem reading data from a socket: if there is a null (zero) byte in the stream, the DataInputStream does not read the full data, and so at the receiving end there is an exception when parsing the data.
What is the right way to read data from a socket so that no data is ever lost?
Thanks in advance.
You should post the code being used to read from the socket, but to me the most likely cause is that the reading code is incorrectly interpreting a 0 byte as the end of the stream, similar to this code:
InputStream is = ...;
int val;
while (0 != (val = is.read())) {
    // do something
}
But the end of stream indicator is actually -1
InputStream is = ...;
int val;
while (-1 != (val = is.read())) {
    // do something
}
EDIT: in response to your comment on using isavailable(): I assume you mean available(), since there is no isavailable() method on InputStream. If you're using available() to detect the end of the stream, that is also wrong. That function only tells you how many bytes can be read without blocking (i.e. how many are currently in the buffer), not how many bytes are left in the stream.
I personally prefer ObjectInputStream over DataInputStream since it can handle all types including strings, arrays, and even objects.
Yes, you can read an entire object with a single receive.readObject() call, but don't forget to type-cast the returned object.
readObject() might be easier since you read the whole thing in one line, but it gives you less control; with DataInputStream you instead read the data one value at a time, like this:
receive.readBoolean()
receive.readInt()
receive.readChar()
etc..
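With DataInputStream specifically, a common socket pattern is a length prefix followed by readFully(), which loops internally until the whole payload has arrived. A minimal sketch (the int length prefix is an assumed framing, not something from the question):
import java.io.DataInputStream;
DataInputStream in = new DataInputStream(socket.getInputStream());
int length = in.readInt(); // sender writes the payload length first
byte[] payload = new byte[length];
in.readFully(payload); // blocks until all 'length' bytes have arrived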
String finalString = new String("");
int finalSize = remainingData;//sizeOfDataN;
int reclen = 0;
int count_for_breaking_loop_for_reading_data = 0;
boolean for_parsing_data = true;
while(allDataReceived == false) {
ByteBuffer databuff = ByteBuffer.allocate(dis.available());
// System.out.println("bis.availbale is "+dis.available());
databuff.clear();
databuff.flip();
dis.read(databuff.array());
String receivedStringN = trimNull(databuff.array());
finalString = finalString + receivedStringN;
System.out.println("final string length "+finalString.length());
if(finalString.length() == finalSize) {
allDataReceived = true;
}
count_for_breaking_loop_for_reading_data++;
if(count_for_breaking_loop_for_reading_data > 1500) {
For_parsing_data = false;
Break;
}
}
