First post; usually I find what I'm looking for in other threads, but not this time:
I'm using Java's Deflater and Inflater to compress/decompress some data I send between a server and a client application that I'm working on.
It works just fine for 99% of my tests. However, there is one particular dataset that, when inflated, throws this exception from the inflater.inflate() method:
DataFormatException: incorrect header check
There is nothing special about the data compared to the other runs. It's just a bunch of numbers separated by commas, "encoded" as a String and then turned into bytes with .getBytes(). The only thing I know is that it's a bit bigger this time. There is no encoding happening anywhere between the compression -> decompression steps.
This is the code to send something to either the client or the server. The code is shared.
OutputStream outputStream = new DataOutputStream(socket.getOutputStream());
byte[] uncompressed = SOMEJSON.toString().getBytes();
int realLength = uncompressed.length;
// compress data
byte[] compressedData = ByteCompression.compress(uncompressed);
int compressedLength = compressedData.length;
outputStream.write(ByteBuffer.allocate(Integer.BYTES).putInt(compressedLength).array());
outputStream.write(ByteBuffer.allocate(Integer.BYTES).putInt(realLength).array());
outputStream.write(compressedData);
outputStream.flush();
This is the code to receive data (either client or server), also shared:
DataInputStream dataIn = new DataInputStream(socket.getInputStream());
int compressedLength = dataIn.readInt();
int realLength = dataIn.readInt();
errorhandling.info("Packet Reader", "Expecting " + compressedLength + " (" + realLength + ") bytes.");
byte[] compressedData = new byte[compressedLength];
int readBytes = 0;
while (readBytes < compressedLength) {
int newByteAmount = dataIn.read(compressedData);
// catch nothing being read or end of line
if (newByteAmount <= 0) {
break;
}
readBytes += newByteAmount;
}
if (readBytes != compressedLength) {
errorhandling.info("Packet Reader", "Read byte amount differs from expected bytes.");
return new ErrorPacket("Read byte amount differs from expected bytes.").create();
}
byte[] uncompressedData = ByteCompression.decompress(compressedData, realLength);
String packetData = new String(uncompressedData);
Here are the methods to compress and decompress a byte array (you guessed right, they're shared too):
public static byte[] compress(byte[] uncompressed) {
Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
deflater.setInput(uncompressed);
deflater.finish();
byte[] compressed = new byte[uncompressed.length];
int compressedSize = 0;
while (!deflater.finished()) {
compressedSize += deflater.deflate(compressed);
}
deflater.end();
return Arrays.copyOfRange(compressed, 0, compressedSize);
}
public static byte[] decompress(byte[] compressed, int realLength) throws DataFormatException {
Inflater inflater = new Inflater();
inflater.setInput(compressed);
byte[] uncompressed = new byte[realLength];
while (!inflater.finished()) {
inflater.inflate(uncompressed); // throws DataFormatException: incorrect header check (but only super rarely)
}
inflater.end();
return uncompressed;
}
So far I've tried different compression levels and messing with the "nowrap" option for both Deflater and Inflater (all combinations):
// [...]
Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION, true);
// [...]
Inflater inflater = new Inflater(true);
But that just results in these exceptions (and again only for that one particular dataset):
DataFormatException: invalid stored block lengths
DataFormatException: invalid distance code
I'm sorry for this wall of text, but at this point I really don't know what could be causing this issue.
Alright, here is the solution:
My assumption was that this loop would APPEND newly read data to the byte array where it last stopped. THIS IS NOT THE CASE: every call to read(byte[]) writes from index 0 again (and a single call seems to stop after about 2^16 bytes, which is why I don't see this issue with smaller packets).
This is wrong:
int readBytes = 0;
while (readBytes < compressedLength) {
int newByteAmount = dataIn.read(compressedData); // focus here!
readBytes += newByteAmount;
}
So what's happening is that the data is read correctly, but the output array keeps overwriting itself! That's why I see wrong data at the start and a bunch of 00 00 at the end (because the loop never actually reached that part of the array)!
Using this instead fixed my issue:
dataIn.readFully(compressedData);
What concerns me is that I see the first variant of the code A LOT. That's what I found when googling it.
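For anyone who only has a plain InputStream and cannot call readFully(), here is a minimal sketch of the manual loop done correctly, assuming the same dataIn, compressedData and compressedLength as above (the key point is passing the running offset to read() so each chunk lands after the previous one):
int readBytes = 0;
while (readBytes < compressedLength) {
// read into the array at the current offset, never back at index 0
int newByteAmount = dataIn.read(compressedData, readBytes, compressedLength - readBytes);
if (newByteAmount == -1) {
throw new EOFException("Stream ended after " + readBytes + " of " + compressedLength + " bytes");
}
readBytes += newByteAmount;
}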
Related
I am trying to create a GitHub webhook. It sends a payload every time I publish a new package to one of my repositories. My issue is that I cannot seem to read in the whole body; it gets cut off at the same number of bytes each time. However, I can see the whole body if I read it using HttpServletRequest#getReader(). Is there something I am doing wrong when trying to read the input stream?
Here is the code for reading the body:
byte[] bodyBytes = new byte[request.getContentLength()];
System.out.println(request.getContentLength());
request.getInputStream().read(bodyBytes);
//System.out.println(request.getReader().readLine()); //works correctly
try (FileWriter fw = new FileWriter(new File("./payload.txt"))) {
for(byte i : bodyBytes)
fw.write("0x" + String.format("%02x ", i) + " ");
fw.write("\n\n\n");
fw.write(new String(bodyBytes));
}
As per the Javadocs, InputStream.read(byte[]) will read at least one byte, when available, and at most as many as the size of the byte array argument. It may read fewer for any reason, in which case you have to call it repeatedly to get the entire content. Simplest case: write to a ByteArrayOutputStream:
byte[] buf = new byte[1024];
int r;
InputStream is = request.getInputStream();
try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
while ((r = is.read(buf)) >= 0) {
baos.write(buf, 0, r);
}
// the bytes are now accessible:
byte[] entireContent = baos.toByteArray();
}
This is the principle; it has the disadvantage that it stores the entire content in memory. You may want to process each "batch" of the input and write it to the file instead of keeping it in memory, e.g.:
byte[] buf = new byte[1024];
int r, i;
InputStream is = request.getInputStream();
try (FileWriter fw = new FileWriter(new File("./payload.txt"))) {
while ((r = is.read(buf)) >= 0) {
for (i=0; i < r; i++) {
fw.write("0x" + String.format("%02x ", buf[i]) + " ");
}
// *************** NOTE ****************************
// Apparently you need the entire content as well, so
// this kind of streaming does not apply in this case.
// You have to store the entire content in memory.
// Keeping the code here as an example/reference.
}
}
I wanted to use Base64.java to encode and decode files. Encoder.wrap(OutputStream) and Decoder.wrap(InputStream) worked but ran slowly. So I used the following code.
public static void decodeFile(String inputFileName,
String outputFileName)
throws FileNotFoundException, IOException {
Base64.Decoder decoder = Base64.getDecoder();
InputStream in = new FileInputStream(inputFileName);
OutputStream out = new FileOutputStream(outputFileName);
byte[] inBuff = new byte[BUFF_SIZE]; //final int BUFF_SIZE = 1024;
byte[] outBuff = null;
while (in.read(inBuff) > 0) {
outBuff = decoder.decode(inBuff);
out.write(outBuff);
}
out.flush();
out.close();
in.close();
}
However, it always throws
Exception in thread "AWT-EventQueue-0" java.lang.IllegalArgumentException: Input byte array has wrong 4-byte ending unit
at java.util.Base64$Decoder.decode0(Base64.java:704)
at java.util.Base64$Decoder.decode(Base64.java:526)
at Base64Coder.JavaBase64FileCoder.decodeFile(JavaBase64FileCoder.java:69)
...
After I changed final int BUFF_SIZE = 1024; into final int BUFF_SIZE = 3*1024;, the code worked. Since "BUFF_SIZE" is also used to encode the file, I believe there was something wrong with the encoded file (1024 % 3 = 1, which means padding gets added in the middle of the file).
Also, as #Jon Skeet and #Tagir Valeev mentioned, I should not ignore the return value from InputStream.read(). So, I modified the code as below.
(However, I have to mention that the code does run much faster than using wrap(). I noticed the speed difference because I had coded and intensively used Base64.encodeFile()/decodeFile() long before jdk8 was released. Now, my buffed jdk8 code runs as fast as my original code. So, I do not know what is going on with wrap()... )
public static void decodeFile(String inputFileName,
String outputFileName)
throws FileNotFoundException, IOException
{
Base64.Decoder decoder = Base64.getDecoder();
InputStream in = new FileInputStream(inputFileName);
OutputStream out = new FileOutputStream(outputFileName);
byte[] inBuff = new byte[BUFF_SIZE];
byte[] outBuff = null;
int bytesRead = 0;
while (true)
{
bytesRead = in.read(inBuff);
if (bytesRead == BUFF_SIZE)
{
outBuff = decoder.decode(inBuff);
}
else if (bytesRead > 0)
{
byte[] tempBuff = new byte[bytesRead];
System.arraycopy(inBuff, 0, tempBuff, 0, bytesRead);
outBuff = decoder.decode(tempBuff);
}
else
{
out.flush();
out.close();
in.close();
return;
}
out.write(outBuff);
}
}
Special thanks to #Jon Skeet and #Tagir Valeev.
I strongly suspect that the problem is that you're ignoring the return value from InputStream.read, other than to check for the end of the stream. So this:
while (in.read(inBuff) > 0) {
// This always decodes the *complete* buffer
outBuff = decoder.decode(inBuff);
out.write(outBuff);
}
should be
int bytesRead;
while ((bytesRead = in.read(inBuff)) > 0) {
// decode only the bytes actually read; Base64.Decoder has no (byte[], int, int)
// overload, so copy the valid region first (needs java.util.Arrays)
outBuff = decoder.decode(Arrays.copyOf(inBuff, bytesRead));
out.write(outBuff);
}
I wouldn't expect this to be any faster than using wrap though.
Try decoder.wrap(new BufferedInputStream(new FileInputStream(inputFileName))). With buffering it should be at least as fast as your manually crafted version.
As for why your code doesn't work: that's because the last chunk is likely to be shorter than 1024 bytes, but you try to decode the whole byte[] array. See the #JonSkeet answer for details.
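For reference, a minimal sketch of what that buffered wrap() approach could look like (the method name and the 8 KB copy buffer are my own choices, not from the answer above):
public static void decodeFileViaWrap(String inputFileName, String outputFileName) throws IOException {
try (InputStream in = Base64.getDecoder().wrap(new BufferedInputStream(new FileInputStream(inputFileName)));
OutputStream out = new BufferedOutputStream(new FileOutputStream(outputFileName))) {
byte[] buf = new byte[8 * 1024];
int n;
// the wrapped stream decodes on the fly, so this is just a plain byte copy
while ((n = in.read(buf)) > 0) {
out.write(buf, 0, n);
}
}
}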
Well, I changed
"final int BUFF_SIZE = 1024;"
into
"final int BUFF_SIZE = 1024 * 3;"
It worked!
So, I guess there is probably something wrong with padding... I mean, when encoding the file (since 1024 % 3 = 1), there must be padding, and that might raise problems when decoding...
You should record the number of bytes you have read. Besides this, you should make sure that your buffer size is divisible by 3, because in Base64 every 3 input bytes produce 4 output bytes (64 is 2^6, and 3*8 equals 4*6). By doing this you can avoid padding problems (your output will not end up with a stray "=" in the middle).
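A minimal sketch of the encoding side under that rule (the constant and method name are just for illustration; the inner loop fills each chunk completely so that only the final chunk can be shorter than a multiple of 3 and therefore carry "=" padding):
private static final int BUFF_SIZE = 3 * 1024; // multiple of 3, so no mid-stream padding
public static void encodeFile(String inputFileName, String outputFileName) throws IOException {
Base64.Encoder encoder = Base64.getEncoder();
try (InputStream in = new FileInputStream(inputFileName);
OutputStream out = new FileOutputStream(outputFileName)) {
byte[] inBuff = new byte[BUFF_SIZE];
while (true) {
int filled = 0;
// fill the chunk completely; read() may return fewer bytes than requested
while (filled < inBuff.length) {
int n = in.read(inBuff, filled, inBuff.length - filled);
if (n == -1) break;
filled += n;
}
if (filled == 0) break;
out.write(encoder.encode(Arrays.copyOf(inBuff, filled)));
if (filled < inBuff.length) break; // end of file reached
}
}
}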
I have a client/server application that sends/receives data using BufferedOutputStream / BufferedInputStream . The protocol of communication is the following:
Send part :
first byte is the action to perform
next 4 bytes are the length of the message
next x bytes (x=length of message) are the message itself
Receive part :
read first byte to get the action
read the next 4 bytes to get the message length
read the x (obtained on prev step) bytes to get the message
Now the problem is that sometimes when I send the length of the message (e.g. 23045), when I receive it on the server side I get a huge int (e.g. 123106847).
An important clue is that this happens only when the message exceeds a certain number of characters (in my case > 10K); if I send a smaller message (e.g. 4-5k) everything works as expected.
Client send part (outputStream/inputStream are of type BufferedXXXStream):
private String getResponseFromServer( NormalizerActionEnum action, String message) throws IOException{
writeByte( action.id());
writeString( message);
flush();
return read();
}
private String read() throws IOException{
byte[] msgLen = new byte[4];
inputStream.read(msgLen);
int len = ByteBuffer.wrap(msgLen).getInt();
byte[] bytes = new byte[len];
inputStream.read(bytes);
return new String(bytes);
}
private void writeByte( byte msg) throws IOException{
outputStream.write(msg);
}
private void writeString( String msg) throws IOException{
byte[] msgLen = ByteBuffer.allocate(4).putInt(msg.length()).array();
outputStream.write(msgLen);
outputStream.write(msg.getBytes());
}
private void flush() throws IOException{
outputStream.flush();
}
Server part (_input/_output are of type BufferedXXXStream):
private byte readByte() throws IOException, InterruptedException {
int b = _input.read();
while(b==-1){
Thread.sleep(1);
b = _input.read();
}
return (byte) b;
}
private String readString() throws IOException, InterruptedException {
byte[] msgLen = new byte[4];
int s = _input.read(msgLen);
while(s==-1){
Thread.sleep(1);
s = _input.read(msgLen);
}
int len = ByteBuffer.wrap(msgLen).getInt();
byte[] bytes = new byte[len];
s = _input.read(bytes);
while(s==-1){
Thread.sleep(1);
s = _input.read(bytes);
}
return new String(bytes);
}
private void writeString(String message) throws IOException {
byte[] msgLen = ByteBuffer.allocate(4).putInt(message.length()).array();
_output.write(msgLen);
_output.write(message.getBytes());
_output.flush();
}
....
byte cmd = readByte();
String message = readString();
Any help will be greatly appreciated. If you need additional details let me know.
UPDATE: Due to comments from Jon Skeet and EJP I realized that the read part on the server had some pointless operations, but leaving that aside I finally got what the problem was: the key thing is that I keep the streams open for the full lifetime of the app, and the first several times I send the message length I'm able to read it on the server side, BUT, as Jon Skeet pointed out, the data doesn't arrive all at once, so when I try to read the message length again I'm actually reading from the message itself; that is why I get bogus message lengths.
So instead of sending the data length and then reading it all at once, I send the message without the length and read one byte at a time until the end of the string, which works perfectly:
private String readString() throws IOException, InterruptedException {
StringBuilder sb = new StringBuilder();
byte[] bytes = new byte[100];
int s = 0;
int index=0;
while(true){
s = _input.read();
if(s == 10){
break;
}
bytes[index++] = (byte) (s);
if(index == bytes.length){
sb.append(new String(bytes));
bytes = new byte[100];
index=0;
}
}
if(index > 0){
sb.append(new String(Arrays.copyOfRange(bytes, 0, index)));
}
return sb.toString();
}
Look at this:
byte[] bytes = new byte[len];
s = _input.read(bytes);
while(s==-1){
Thread.sleep(1);
s = _input.read(bytes);
}
return new String(bytes);
Firstly, the loop is pointless: the only time read will return -1 is when the stream has reached its end (or has been closed), in which case looping isn't going to help you.
Secondly, you're ignoring the possibility that the data will come in more than one chunk. You're assuming that if you've managed to get any data, you've got all the data. Instead, you should loop something like this:
int bytesRead = 0;
while (bytesRead < bytes.length) {
int chunk = _input.read(bytes, bytesRead, bytes.length - bytesRead);
if (chunk == -1) {
throw new IOException("Didn't get as much data as we should have");
}
bytesRead += chunk;
}
Note that all your other InputStream.read calls also assume that you've managed to read data, and indeed that you've read all the data you need.
Oh, and you're using the platform-default encoding to convert between binary data and text data - not a good idea.
Is there any reason you're not using DataInputStream and DataOutputStream for this? Currently you're reinventing the wheel, and doing so with bugs.
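For reference, a minimal sketch of what that could look like with DataOutputStream/DataInputStream (the method names and the choice of UTF-8 are mine; the point is writeInt/readInt/readFully plus an explicit charset):
// sending side
private void writeString(DataOutputStream out, String msg) throws IOException {
byte[] data = msg.getBytes(StandardCharsets.UTF_8);
out.writeInt(data.length); // length in bytes, not characters
out.write(data);
out.flush();
}
// receiving side
private String readString(DataInputStream in) throws IOException {
int len = in.readInt();
byte[] data = new byte[len];
in.readFully(data); // blocks until exactly len bytes have been read
return new String(data, StandardCharsets.UTF_8);
}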
Your sending code is bugged:
byte[] msgLen = ByteBuffer.allocate(4).putInt(message.length()).array();
_output.write(msgLen);
_output.write(message.getBytes());
You send the number of characters as the message length, but after that you convert the message to bytes. Depending on the platform encoding, String.getBytes() can give you many more bytes than there are characters.
You should never assume that String.length() has any relationship with String.getBytes().length! Those are different concepts and should never be mixed.
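A minimal fix along those lines, assuming the same _output stream as in the snippet above: encode the string once and send the length of the resulting byte array, not message.length():
byte[] data = message.getBytes(StandardCharsets.UTF_8); // pick an explicit charset
byte[] msgLen = ByteBuffer.allocate(4).putInt(data.length).array(); // byte count, not character count
_output.write(msgLen);
_output.write(data);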
I have been writing something to read a request stream (containing gzipped data) from an incoming HttpServletRequest ('request' below); however, it appears that the normal InputStream read method doesn't actually read all of the content?
My code was:
InputStream requestStream = request.getInputStream();
if ((length = request.getContentLength()) != -1)
{
received = new byte[length];
requestStream.read(received, 0, length);
}
else
{
// create a variable length list of bytes
List<Byte> bytes = new ArrayList<Byte>();
boolean endLoop = false;
while (!endLoop)
{
// try and read the next value from the stream.. if not -1, add it to the list as a byte. if
// it is, we've reached the end.
int currentByte = requestStream.read();
if (currentByte != -1)
bytes.add((byte) currentByte);
else
endLoop = true;
}
// initialize the final byte[] to the right length and add each byte into it in the right order.
received = new byte[bytes.size()];
for (int i = 0; i < bytes.size(); i++)
{
received[i] = bytes.get(i);
}
}
What I found during testing was that sometimes the top part (for when a content length is present) would just stop reading part way through the incoming request stream and leave the remainder of the 'received' byte array blank. If I just make it run the else part of the if statement at all times, it reads fine and all the expected bytes are placed in 'received'.
So, it seems like I can just leave my code alone now with that change, but does anyone have any idea why the normal read(byte[], int, int) method stopped reading? The description says that it may stop if an end of file is present. Could it be that the gzipped data just happened to include bytes matching whatever the signature for that looks like?
You need to add a while loop at the top to get all the bytes. The stream will attempt to read as many bytes as it can, but it is not required to return len bytes at once:
An attempt is made to read as many as len bytes, but a smaller number may be read, possibly zero.
if ((length = request.getContentLength()) != -1)
{
received = new byte[length];
int pos = 0;
do {
int read = requestStream.read(received, pos, length-pos);
// check for end of file or error
if (read == -1) {
break;
} else {
pos += read;
}
} while (pos < length);
}
EDIT: fixed while.
You need to see how much of the buffer was filled. It's only guaranteed to give you at least one byte.
Perhaps what you wanted was DataInputStream.readFully();
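A minimal sketch of that approach for the content-length branch above, assuming the same 'request' and 'received' variables from the question:
int length = request.getContentLength();
if (length != -1) {
received = new byte[length];
// readFully loops internally until the whole array is filled,
// or throws EOFException if the stream ends early
new DataInputStream(request.getInputStream()).readFully(received);
}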
I am getting an OutOfMemoryError. Why? I am using this code for logging. Is this approach correct?
Exceptions and closing of streams are handled in parent methods.
private static void writeToFile(File file, FileWriter out, String message) throws IOException {
if (file.exists() && file.isFile()) {
if ((file.length() + message.getBytes().length) <= FILE_MAX_SIZE_B) {
out.write(message);
} else {
int cutLenght = (int) (file.length() + message.getBytes().length - FILE_MAX_SIZE_B);
FileInputStream fileInputStream = new FileInputStream(file);
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(fileInputStream));
char[] buf = new char[1024];
int numRead = 0;
StringBuffer text = new StringBuffer(1000);
while ((numRead=bufferedReader.read(buf)) != -1) {
text.append(buf,0,numRead);
}
String result = new String(text).substring(cutLenght);
result += message;
FileWriter fileWriter = new FileWriter(file, appendToFile);
writeToFile(file, fileWriter, result);
bufferedReader.close();
}
}
}
EDIT:
I am using this method for writing my logs to a file. So, for example, in one second I may log 10 messages. I am getting the error on these lines:
while ((numRead=bufferedReader.read(buf)) != -1) {
text.append(buf,0,numRead);
}
My guess is that you are getting the OutOfMemoryError because you are reading the entire contents of the log file back into memory once it has gotten too close to its maximum size.
You could instead read and write it in smaller chunks, but that could be tricky since you have to avoid overwriting something you haven't already read.
Overall, this technique seems like a very inefficient method of maintaining the log data. Some alternative approaches off the top of my head:
(1) maintain a set of n log files, each with maximum size FILE_MAX_SIZE_B/n. When the first log fills up, open the next one for writing, and so on; when the last one fills up, go back to the first one. In this way you are discarding some of the oldest log data each time you switch files, but not all of it, and still maintaining your overall size limit (a rough sketch of this is shown after the next point).
(2) rotate the data within a single file. After each write, add a marker that indicates this is the end of the log stream. When the file has reached its maximum size, just start again at the beginning, overwriting the data that is there. The marker will tell you where the latest message is.
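A rough sketch of approach (1), assuming n fixed-size files named log.0 ... log.(n-1); the file names, NUM_FILES constant and method name are illustrative, only FILE_MAX_SIZE_B comes from the question:
private static final int NUM_FILES = 4;
private static final long MAX_PER_FILE = FILE_MAX_SIZE_B / NUM_FILES;
private static int current = 0;
private static synchronized void appendLog(String message) throws IOException {
File f = new File("log." + current);
if (f.length() + message.getBytes().length > MAX_PER_FILE) {
current = (current + 1) % NUM_FILES; // move on to the next file in the ring
f = new File("log." + current);
f.delete(); // discard the oldest data
}
try (FileWriter out = new FileWriter(f, true)) { // append, never rewrite the whole file
out.write(message);
}
}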
Try something like this:
void appendToFile(File f, CharSequence message, Charset cs, long maximumSize) throws IOException {
long available = maximumSize - f.length();
if (available > 0) {
FileOutputStream fos = new FileOutputStream(f, true);
try {
CharBuffer chars = CharBuffer.wrap(message);
ByteBuffer bytes = ByteBuffer.allocate(8 * 1024); // Re-used when encoding the string
CharsetEncoder enc = cs.newEncoder();
CoderResult res;
do {
res = enc.encode(chars, bytes, true);
bytes.flip();
long len = Math.min(available, bytes.remaining());
available -= len;
fos.write(bytes.array(), bytes.position(), (int) len);
bytes.clear();
} while (res == CoderResult.OVERFLOW && available > 0);
} finally {
fos.close();
}
}
}
Testable with this:
File f = new File(getCacheDir(), "tmp.txt");
f.delete();
// Or whatever charset you want.
Charset cs = Charset.forName("UTF-8");
int maxlen = 2 * 1024; // For this test, 2kb
try {
for (int i = 0; i < maxlen / 20; i++) {
// Write 30 characters for maxlen/20 times == guaranteed overflow
appendToFile(f, "123456789012345678901234567890", cs, maxlen);
System.out.println("Length=" + f.length());
}
} catch (Throwable t) {
t.printStackTrace();
}
f.delete();
Well, you're getting OOM because you're trying to load a huge file into memory.
Did you try opening it with the append option instead?
You get the OOME because you load the whole file into memory and then take just part of the string. Instead, do a skip() on your input stream and then read.
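A minimal sketch of that idea, reusing the question's file, message and cutLenght variables (the temp file and buffer size are my own additions; this still rewrites most of the file, just without holding it all as one String):
File tempFile = new File(file.getPath() + ".tmp"); // hypothetical temp file
try (InputStream in = new BufferedInputStream(new FileInputStream(file));
OutputStream out = new BufferedOutputStream(new FileOutputStream(tempFile))) {
long toSkip = cutLenght;
while (toSkip > 0) {
long skipped = in.skip(toSkip); // skip() may skip fewer bytes than asked for
if (skipped <= 0) break;
toSkip -= skipped;
}
byte[] buf = new byte[8 * 1024];
int n;
while ((n = in.read(buf)) > 0) { // stream the remainder in small chunks
out.write(buf, 0, n);
}
out.write(message.getBytes()); // append the new message
}
// then swap the files, e.g. file.delete(); tempFile.renameTo(file);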