What I want to do is save 4x8 bytes as a 64-bit long,
and decode that 64-bit long into 4x8 bytes again.
To explain: I have an encoder which packs 8-bit bytes
into 64-bit longs.
I'm saving several of those as text, for example: "-223784 2432834 -34233566"
and I want to read every number, splitting on the " " character, and put them in a long[].
Currently I have this code:
FileInputStream fin = new FileInputStream( IOUtils.path + File.separator + "eclipse.hm" );
String c = "";
long[] longs = new long[1000000];
int b,ggg=0;
while((b=fin.read())!=-1) {
if( (char)b==' ' ) {
longs[ggg++] = Long.parseLong(c);
c = "";
} else {
c+=(char) b;
}
fetched++;
}
fin.close();
The Method of my "Decoder" is as follows:
public static Object decode(long[] input) throws DataFormatException, IOException, ClassNotFoundException {
byte[] toInflate = BitSet.valueOf(input).toByteArray();
Inflater inflater = new Inflater();
inflater.setInput(toInflate);
byte[] deflated = new byte[ toInflate.length*2 ];
inflater.inflate(deflated);
inflater.end();
ObjectInputStream ois = new ObjectInputStream( new ByteArrayInputStream(deflated) );
Object r = ois.readObject();
ois.close();
return r;
}
The decoder works: I tested it by feeding it the output of my encoder directly.
So there must be a read error.
I'm at a loss and have no idea how to fix this problem...
Thanks for your help. Sincerely, Richee.
You can use the built-in String functions to split the string. After that, you need to convert each element you get from the split step from a string to a long.
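A minimal sketch of that split-and-convert step, assuming the whole file content is already in a String (for example via new String(Files.readAllBytes(path), StandardCharsets.US_ASCII)). The class and method names are made up for illustration:

```java
import java.util.Arrays;

final class SplitLongs {
    // Splits on runs of whitespace and parses each token as a long.
    static long[] parseLongs(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new long[0];
        }
        return Arrays.stream(trimmed.split("\\s+"))
                .mapToLong(Long::parseLong)
                .toArray();
    }

    public static void main(String[] args) {
        long[] longs = parseLongs("-223784 2432834 -34233566");
        System.out.println(Arrays.toString(longs)); // prints [-223784, 2432834, -34233566]
    }
}
```

Unlike a character-by-character loop that only parses when it sees a space, this also picks up the last number when the file does not end with a trailing space.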
To fix this issue I tried removing the Deflater from my encoder and the Inflater from my decoder.
After that it worked, so now I am wondering why the Deflater/Inflater "destroys" my stream header...
Anyway, sorry for taking your time...
First post. Usually I find what I'm looking for in other threads, but not this time:
I'm using Java's Deflater and Inflater to compress/decompress some data I send between a server and a client application that I'm working on.
It works just fine for 99% of my tests. However, there is one particular dataset that, when inflated, throws this exception from the inflater.inflate() method:
DataFormatException: incorrect header check
There is nothing special about the data compared to the other runs. It's just a bunch of numbers separated by commas, "encoded" as a String, on which .getBytes() is then called. The only thing I know is that it's a bit bigger this time. There is no encoding happening anywhere between the compression -> decompression steps.
This is the code to send something to either the client or the server. The code is shared.
OutputStream outputStream = new DataOutputStream(socket.getOutputStream());
byte[] uncompressed = SOMEJSON.toString().getBytes();
int realLength = uncompressed.length;
// compress data
byte[] compressedData = ByteCompression.compress(uncompressed);
int compressedLength = compressedData.length;
outputStream.write(ByteBuffer.allocate(Integer.BYTES).putInt(compressedLength).array());
outputStream.write(ByteBuffer.allocate(Integer.BYTES).putInt(realLength).array());
outputStream.write(compressedData);
outputStream.flush();
This is the code to receive data (either client or server) also shared:
DataInputStream dataIn = new DataInputStream(socket.getInputStream());
int compressedLength = dataIn.readInt();
int realLength = dataIn.readInt();
errorhandling.info("Packet Reader", "Expecting " + compressedLength + " (" + realLength + ") bytes.");
byte[] compressedData = new byte[compressedLength];
int readBytes = 0;
while (readBytes < compressedLength) {
int newByteAmount = dataIn.read(compressedData);
// catch nothing being read or end of line
if (newByteAmount <= 0) {
break;
}
readBytes += newByteAmount;
}
if (readBytes != compressedLength) {
errorhandling.info("Packet Reader", "Read byte amount differs from expected bytes.");
return new ErrorPacket("Read byte amount differs from expected bytes.").create();
}
byte[] uncompressedData = ByteCompression.decompress(compressedData, realLength);
String packetData = new String(uncompressedData);
Here are the methods to compress and decompress a byte array (you guessed right, it's shared):
public static byte[] compress(byte[] uncompressed) {
Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
deflater.setInput(uncompressed);
deflater.finish();
byte[] compressed = new byte[uncompressed.length];
int compressedSize = 0;
while (!deflater.finished()) {
compressedSize += deflater.deflate(compressed);
}
deflater.end();
return Arrays.copyOfRange(compressed, 0, compressedSize);
}
public static byte[] decompress(byte[] compressed, int realLength) throws DataFormatException {
Inflater inflater = new Inflater(true);
inflater.setInput(compressed);
byte[] uncompressed = new byte[realLength];
while (!inflater.finished()) {
inflater.inflate(uncompressed); // throws DataFormatException: incorrect header check (but only super rarely)
}
inflater.end();
return uncompressed;
}
So far I've tried different compression levels and messing with the "nowrap" option for both Deflater and Inflater (all combinations):
// [...]
Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION, true);
// [...]
Inflater inflater = new Inflater(true);
But that just results in these exceptions (but again only for that one particular dataset):
DataFormatException: invalid stored block lengths
DataFormatException: invalid distance code
I'm sorry for this wall of text, but at this point I really don't know anymore what could be causing this issue.
Alright, here is the solution:
My assumption was that this loop would APPEND newly read data to the byte array where it last stopped. THIS IS NOT THE CASE (a single read seems to stop after 2^16 bytes, so that's why I don't get this issue with smaller packets).
This is wrong:
int readBytes = 0;
while (readBytes < compressedLength) {
int newByteAmount = dataIn.read(compressedData); // focus here!
readBytes += newByteAmount;
}
So what's happening is that the data is read correctly, but every read writes from the start of the output array again, so the array overwrites itself!! That's why I see wrong data at the start and a bunch of 00 00 at the end (because it never actually reached that part of the array)!
Using this instead fixed my issue:
dataIn.readFully(compressedData);
What concerns me is that I see the first variant of the code A LOT. That's what I found when googling it.
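For reference, this is roughly what a correct manual loop looks like, and what readFully effectively does for you: each read has to continue at the current offset instead of at index 0. This is only a sketch. The class name, the method name and the "trickle" stream used to simulate short reads are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class ReadExactly {
    // Reads exactly buf.length bytes, advancing the offset after each partial read.
    static void readExactly(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                throw new EOFException("stream ended after " + off + " of " + buf.length + " bytes");
            }
            off += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] source = {1, 2, 3, 4, 5, 6, 7};
        // Hypothetical stream that returns at most 3 bytes per read, like a socket delivering data in pieces.
        InputStream trickle = new ByteArrayInputStream(source) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
        byte[] target = new byte[source.length];
        readExactly(trickle, target);
        System.out.println(Arrays.toString(target)); // prints [1, 2, 3, 4, 5, 6, 7]
    }
}
```

Throwing EOFException on a short stream mirrors what DataInputStream.readFully does, so a truncated packet is reported instead of silently leaving zeros at the end of the array.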
I wanted to use Base64.java to encode and decode files. Encoder.wrap(OutputStream) and Decoder.wrap(InputStream) worked but ran slowly. So I used the following code.
public static void decodeFile(String inputFileName,
String outputFileName)
throws FileNotFoundException, IOException {
Base64.Decoder decoder = Base64.getDecoder();
InputStream in = new FileInputStream(inputFileName);
OutputStream out = new FileOutputStream(outputFileName);
byte[] inBuff = new byte[BUFF_SIZE]; //final int BUFF_SIZE = 1024;
byte[] outBuff = null;
while (in.read(inBuff) > 0) {
outBuff = decoder.decode(inBuff);
out.write(outBuff);
}
out.flush();
out.close();
in.close();
}
However, it always throws
Exception in thread "AWT-EventQueue-0" java.lang.IllegalArgumentException: Input byte array has wrong 4-byte ending unit
at java.util.Base64$Decoder.decode0(Base64.java:704)
at java.util.Base64$Decoder.decode(Base64.java:526)
at Base64Coder.JavaBase64FileCoder.decodeFile(JavaBase64FileCoder.java:69)
...
After I changed final int BUFF_SIZE = 1024; to final int BUFF_SIZE = 3*1024;, the code worked. Since BUFF_SIZE is also used to encode the file, I believe there was something wrong with the encoded file (1024 % 3 = 1, which means padding is added in the middle of the file).
Also, as @Jon Skeet and @Tagir Valeev mentioned, I should not ignore the return value from InputStream.read(). So, I modified the code as below.
(However, I have to mention that the code does run much faster than using wrap(). I noticed the speed difference because I had written and intensively used Base64.encodeFile()/decodeFile() long before JDK 8 was released. Now, my buffered JDK 8 code runs as fast as my original code. So, I do not know what is going on with wrap()...)
public static void decodeFile(String inputFileName,
String outputFileName)
throws FileNotFoundException, IOException
{
Base64.Decoder decoder = Base64.getDecoder();
InputStream in = new FileInputStream(inputFileName);
OutputStream out = new FileOutputStream(outputFileName);
byte[] inBuff = new byte[BUFF_SIZE];
byte[] outBuff = null;
int bytesRead = 0;
while (true)
{
bytesRead = in.read(inBuff);
if (bytesRead == BUFF_SIZE)
{
outBuff = decoder.decode(inBuff);
}
else if (bytesRead > 0)
{
byte[] tempBuff = new byte[bytesRead];
System.arraycopy(inBuff, 0, tempBuff, 0, bytesRead);
outBuff = decoder.decode(tempBuff);
}
else
{
out.flush();
out.close();
in.close();
return;
}
out.write(outBuff);
}
}
Special thanks to @Jon Skeet and @Tagir Valeev.
I strongly suspect that the problem is that you're ignoring the return value from InputStream.read, other than to check for the end of the stream. So this:
while (in.read(inBuff) > 0) {
// This always decodes the *complete* buffer
outBuff = decoder.decode(inBuff);
out.write(outBuff);
}
should be
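int bytesRead;
while ((bytesRead = in.read(inBuff)) > 0) {
// Decode only the bytes actually read. Base64.Decoder has no (array, offset, length)
// overload, so copy the filled part first (needs java.util.Arrays).
outBuff = decoder.decode(Arrays.copyOf(inBuff, bytesRead));
out.write(outBuff);
}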
I wouldn't expect this to be any faster than using wrap though.
Try to use decoder.wrap(new BufferedInputStream(new FileInputStream(inputFileName))). With buffering it should be at least as fast as your manually crafted version.
As for why your code doesn't work: that's because the last chunk is likely to be shorter than 1024 bytes, but you try to decode the whole byte[] array. See @JonSkeet's answer for details.
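A sketch of what the wrap-based version could look like, using java.nio.file for the file handling. The class name and the temp files in main are only for illustration, and I have not measured its speed against the hand-written loop:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

final class WrapDecode {
    // Decodes a Base64 file by letting the decoder wrap a buffered input stream.
    static void decodeFile(Path input, Path output) throws IOException {
        try (InputStream in = Base64.getDecoder().wrap(
                     new BufferedInputStream(Files.newInputStream(input)));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(output))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n); // write only what was actually decoded
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path encoded = Files.createTempFile("demo", ".b64");
        Path decoded = Files.createTempFile("demo", ".bin");
        Files.write(encoded, Base64.getEncoder().encode("hello, base64".getBytes(StandardCharsets.UTF_8)));
        decodeFile(encoded, decoded);
        System.out.println(new String(Files.readAllBytes(decoded), StandardCharsets.UTF_8));
        Files.delete(encoded);
        Files.delete(decoded);
    }
}
```

Because the decoder sees one continuous stream, the size of the read buffer no longer has to line up with the 4-character Base64 units.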
Well, I changed
"final int BUFF_SIZE = 1024;"
into
"final int BUFF_SIZE = 1024 * 3;"
It worked!
So, I guess there is probably something wrong with padding... I mean, when encoding the file (since 1024 % 3 = 1) there must be padding, and that might cause problems when decoding...
You should record the number of bytes you have read. Besides this,
you should make sure that your buffer size is divisible by 3 when encoding, because in Base64 every 3 input bytes produce 4 output characters (64 is 2^6, and 3*8 equals 4*6). By doing this you avoid padding problems: no chunk except the last will end in "=" padding. When decoding, the buffer size should correspondingly be divisible by 4.
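To illustrate the multiple-of-3 rule, here is a sketch of a chunked encoder. It assumes Java 9+ for InputStream.readNBytes, and the class and method names are made up. Every chunk except possibly the last is a whole number of 3-byte groups, so the concatenated output is identical to encoding the data in one go:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class ChunkedBase64 {
    // Encodes the stream chunk by chunk; chunkSize must be a positive multiple of 3.
    static void encode(InputStream in, OutputStream out, int chunkSize) throws IOException {
        if (chunkSize <= 0 || chunkSize % 3 != 0) {
            throw new IllegalArgumentException("chunkSize must be a positive multiple of 3");
        }
        Base64.Encoder encoder = Base64.getEncoder();
        byte[] buf = new byte[chunkSize];
        int filled;
        // readNBytes (Java 9+) keeps reading until the buffer is full or the stream ends.
        while ((filled = in.readNBytes(buf, 0, buf.length)) > 0) {
            out.write(encoder.encode(Arrays.copyOf(buf, filled)));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "any carnal pleasure".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream chunked = new ByteArrayOutputStream();
        encode(new ByteArrayInputStream(data), chunked, 6);
        byte[] whole = Base64.getEncoder().encode(data);
        System.out.println(Arrays.equals(chunked.toByteArray(), whole)); // prints true
    }
}
```

With a chunk size such as 1024 the same loop would emit "=" padding after every chunk, which is what breaks a decoder that expects a single continuous Base64 stream.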
I have a file which is split in two parts by "\n\n": the first part is a not-too-long String and the second is a byte array, which can be quite long.
I am trying to read the file as follows:
byte[] result;
try (final FileInputStream fis = new FileInputStream(file)) {
final InputStreamReader isr = new InputStreamReader(fis);
final BufferedReader reader = new BufferedReader(isr);
String line;
// reading until \n\n
while (!(line = reader.readLine()).trim().isEmpty()){
// processing the line
}
// copying the rest of the byte array
result = IOUtils.toByteArray(reader);
reader.close();
}
Even though the resulting array is the size it should be, its contents are broken. If I try to use toByteArray directly on fis or isr, the contents of result are empty.
How can I read the rest of the file correctly and efficiently?
Thanks!
The reason your contents are broken is that the Reader has already decoded your bytes into characters using the default character encoding, and IOUtils.toByteArray(reader) then encodes those characters back into bytes. Byte sequences that are not valid in that encoding do not survive the round trip, so many of the binary values get corrupted. The BufferedReader also reads ahead, so it may already have consumed the rest of the stream by the time you try to read from fis directly.
Depending on how exactly the charset is implemented, there is a slight chance that this might work:
result = IOUtils.toByteArray(reader, "ISO-8859-1");
ISO-8859-1 uses only a single byte per character. Not all character values are defined, but many implementations will pass them through anyway, so you might be lucky with it. Note that the InputStreamReader would have to decode with the same charset.
But a much cleaner solution would be to read the String at the beginning as binary data first and then convert it to text via new String(bytes), rather than reading the binary data at the end as a String and then converting it back.
This might mean, though, that you need to implement your own version of a BufferedReader for performance purposes.
You can find the source code of the standard BufferedReader via the obvious Google search, which will (for example) lead you here:
http://www.docjar.com/html/api/java/io/BufferedReader.java.html
It's a bit long, but conceptually not too difficult to understand, so hopefully it will be useful as a reference.
Alternatively, you could read the file into a byte array, find the position of \n\n, and split the array into the line and the bytes:
byte[] a = Files.readAllBytes(Paths.get("file"));
String line = "";
byte[] result = a;
for (int i = 0; i < a.length - 1; i++) {
if (a[i] == '\n' && a[i + 1] == '\n') {
line = new String(a, 0, i);
int len = a.length - i - 2; // skip both '\n' bytes of the separator
result = new byte[len];
System.arraycopy(a, i + 2, result, 0, len);
break;
}
}
Thanks for all the comments - the final implementation was done in this way:
try (final FileInputStream fis = new FileInputStream(file)) {
ByteBuffer buffer = ByteBuffer.allocate(64);
boolean wasLast = false;
String headerValue = null, headerKey = null;
byte[] result = null;
while (true) {
int next = fis.read();
if (next == -1) break; // end of file reached before "\n\n"
byte current = (byte) next;
if (current == '\n') {
if (wasLast) {
// this is \n\n
break;
} else {
// just a new line in header
wasLast = true;
headerValue = new String(buffer.array(), 0, buffer.position());
buffer.clear();
}
} else if (current == '\t') {
// headerKey\theaderValue\n
headerKey = new String(buffer.array(), 0, buffer.position());
buffer.clear();
} else {
buffer.put(current);
wasLast = false;
}
}
// reading the rest
result = IOUtils.toByteArray(fis);
}
I have a problem in code:
private static String compress(String str)
{
String str1 = null;
ByteArrayOutputStream bos = null;
try
{
bos = new ByteArrayOutputStream();
BufferedOutputStream dest = null;
byte b[] = str.getBytes();
GZIPOutputStream gz = new GZIPOutputStream(bos,b.length);
gz.write(b,0,b.length);
bos.close();
gz.close();
}
catch(Exception e) {
System.out.println(e);
e.printStackTrace();
}
byte b1[] = bos.toByteArray();
return new String(b1);
}
private static String deCompress(String str)
{
String s1 = null;
try
{
byte b[] = str.getBytes();
InputStream bais = new ByteArrayInputStream(b);
GZIPInputStream gs = new GZIPInputStream(bais);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int numBytesRead = 0;
byte [] tempBytes = new byte[6000];
try
{
while ((numBytesRead = gs.read(tempBytes, 0, tempBytes.length)) != -1)
{
baos.write(tempBytes, 0, numBytesRead);
}
s1 = new String(baos.toByteArray());
s1= baos.toString();
}
catch(ZipException e)
{
e.printStackTrace();
}
}
catch(Exception e) {
e.printStackTrace();
}
return s1;
}
public String test() throws Exception
{
String str = "teststring";
String cmpr = compress(str);
String dcmpr = deCompress(cmpr);
return dcmpr;
}
This code throws java.io.IOException: unknown format (magic number ef1f) at this line:
GZIPInputStream gs = new GZIPInputStream(bais);
It turns out that converting the bytes with new String(b1) and back with byte b[] = str.getBytes() "spoils" them: after the round trip through a String we end up with more bytes than we started with. If you avoid the conversion to a String and work with the bytes directly, everything works. Sorry for my English.
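A small sketch that shows the effect on the two GZIP magic bytes. It assumes UTF-8 for both conversions, which is an assumption on my part: the code above uses the platform default, which may differ. The class and method names are made up:

```java
import java.nio.charset.StandardCharsets;

final class BinaryAsStringDemo {
    // Pushes raw bytes through a String and back, as the compress/deCompress pair does.
    static byte[] roundTrip(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] gzipMagic = {0x1f, (byte) 0x8b};
        // 0x8b is not valid UTF-8 on its own, so it is replaced by U+FFFD (ef bf bd).
        System.out.println(hex(gzipMagic) + " -> " + hex(roundTrip(gzipMagic))); // prints 1f 8b -> 1f ef bf bd
    }
}
```

If the default encoding was UTF-8, this would be consistent with the "magic number ef1f" in the exception: the second byte of the header is no longer 8b.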
public String unZip(String zipped) throws DataFormatException, IOException {
byte[] bytes = zipped.getBytes("WINDOWS-1251");
Inflater decompressed = new Inflater();
decompressed.setInput(bytes);
byte[] result = new byte[100];
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int count;
while ((count = decompressed.inflate(result)) != 0)
buffer.write(result, 0, count); // write only the bytes inflated in this pass
decompressed.end();
return new String(buffer.toByteArray(), charset);
}
I use this function to decompress the server response. Thanks for the help.
You have two problems:
You're using the default character encoding to convert the original string into bytes. That will vary by platform. It's better to specify an encoding - UTF-8 is usually a good idea.
You're trying to represent the opaque binary data of the result of the compression as a string by just calling the String(byte[]) constructor. That constructor is only meant for data which is encoded text... which this isn't. You should use base64 for this. There's a public domain base64 library which makes this easy. (Alternatively, don't convert the compressed data to text at all - just return a byte array.)
Fundamentally, you need to understand how different text and binary data are - when you want to convert between the two, you should do so carefully. If you want to represent "non text" binary data (i.e. bytes which aren't the direct result of encoding text) in a string you should use something like base64 or hex. When you want to encode a string as binary data (e.g. to write some text to disk) you should carefully consider which encoding to use. If another program is going to read your data, you need to work out what encoding it expects - if you have full control over it yourself, I'd usually go for UTF-8.
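As a sketch of the base64 route: the version below uses java.util.Base64 (Java 8+) rather than a third-party library, and UTF-8 as the text encoding. Both are my choices for the example, and the class and method names are made up:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipText {
    // Text -> UTF-8 bytes -> GZIP -> Base64 text that is safe to store in a String.
    static String compressToBase64(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(bos.toByteArray());
    }

    // Base64 text -> GZIP bytes -> UTF-8 bytes -> text.
    static String decompressFromBase64(String encoded) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return new String(bos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String cmpr = compressToBase64("teststring");
        System.out.println(decompressFromBase64(cmpr)); // prints teststring
    }
}
```

The GZIP stream is closed before bos.toByteArray() is called, so the trailer is written before the bytes are encoded.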
Additionally, the exception handling in your code is poor:
You should almost never catch Exception; catch more specific exceptions
You shouldn't just catch an exception and continue as if it had never happened. If you can't really handle the exception and still complete your method successfully, you should let the exception bubble up the stack (or possibly catch it and wrap it in a more appropriate exception type for your abstraction)
When you GZIP compress data, you always get binary data. This data cannot be converted into string as it is no valid character data (in any encoding).
So your compress method should return a byte array and your decompress method should take a byte array as its parameter.
Furthermore, I recommend you use an explicit encoding when you convert the string into a byte array before compression and when you turn the decompressed data into a string again.
When you GZIP compress data, you always get binary data. This data
cannot be converted into string as it is no valid character data (in
any encoding).
Codo is right, thanks a lot for enlightening me. I was trying to decompress a string (converted from the binary data). My fix was to use InflaterInputStream directly on the input stream returned by my HTTP connection. (My app was retrieving a large JSON of strings.)
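Roughly what that looks like, as a sketch. It assumes the server sends zlib-wrapped deflate data containing UTF-8 text, and main fakes the connection with an in-memory stream; the class and method names are made up:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class InflateStream {
    // Inflates the raw stream directly; the compressed bytes never pass through a String.
    static String readInflated(InputStream compressed) throws IOException {
        try (InputStream in = new InflaterInputStream(compressed)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for connection.getInputStream(): deflate a small JSON document in memory.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(bos)) {
            dos.write("{\"items\":[\"a\",\"b\"]}".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(readInflated(new ByteArrayInputStream(bos.toByteArray())));
    }
}
```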
I have this piece of code which I'm hoping will be able to tell me how much data I have downloaded (and soon put it in a progress bar), and then parse the results through my SAX parser. If I comment out basically everything above the //xr.parse(new InputSource(request.getInputStream())); line and swap the xr.parse lines over, it works fine. But at the moment, my SAX parser tells me I have nothing. Is it something to do with the is.read(buffer) section?
Also, just as a note, request is a HttpURLConnection with various signatures.
/*Input stream to read from our connection*/
InputStream is = request.getInputStream();
/*we make a 2 Kb buffer to accelerate the download, instead of reading the file a byte at once*/
byte [ ] buffer = new byte [ 2048 ] ;
/*How many bytes do we have already downloaded*/
int totBytes,bytes,sumBytes = 0;
totBytes = request.getContentLength () ;
while ( true ) {
/*How many bytes we got*/
bytes = is.read (buffer);
/*If no more byte, we're done with the download*/
if ( bytes <= 0 ) break;
sumBytes+= bytes;
Log.v("XML", sumBytes + " of " + totBytes + " " + ( ( float ) sumBytes/ ( float ) totBytes ) *100 + "% done" );
}
/* Parse the xml-data from our URL. */
// OLD, and works if comment all the above
//xr.parse(new InputSource(request.getInputStream()));
xr.parse(new InputSource(is))
/* Parsing has finished. */;
Can anyone help me at all??
Kind regards,
Andy
'I could only find a way to do that
with bytes, unless you know another
method?'.
But you haven't found a method. You've just written code that doesn't work. And you don't want to save the input to a String either. You want to count the bytes while you're parsing them. Otherwise you're just adding latency, i.e. wasting time and slowing everything down. For an example of how to do it right, see javax.swing.ProgressMonitorInputStream. You don't have to use that, but you certainly do have to use a FilterInputStream of some sort, probably one you write yourself, that is wrapped around the request input stream and passed to the parser.
Your while loop is consuming the input stream and leaving nothing for the parser to read.
For what you're trying to do, you might want to look into implementing a FilterInputStream subclass wrapping the input stream.
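A sketch of such a FilterInputStream subclass. The class name and the LongConsumer callback are made up for illustration; the callback is where a progress bar update would go:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.LongConsumer;

final class CountingInputStream extends FilterInputStream {
    private final LongConsumer onProgress; // receives the running total after each read
    private long count;

    CountingInputStream(InputStream in, LongConsumer onProgress) {
        super(in);
        this.onProgress = onProgress;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            advance(1);
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates to this method, so it is counted here too.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            advance(n);
        }
        return n;
    }

    private void advance(long n) {
        count += n;
        onProgress.accept(count);
    }

    long getCount() {
        return count;
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream(new byte[5000]);
        try (CountingInputStream in = new CountingInputStream(source,
                total -> System.out.println(total + " bytes read so far"))) {
            byte[] buf = new byte[2048];
            while (in.read(buf) != -1) {
                // a parser would consume the data here
            }
        }
    }
}
```

In the question's code the wrapped stream would be passed straight to the parser, for example xr.parse(new InputSource(new CountingInputStream(request.getInputStream(), callback))), so the bytes are counted as the parser pulls them. Bytes skipped via skip() are not counted in this sketch.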
You are building an InputStream over another InputStream whose data has already been consumed.
If you want to avoid reading single bytes, you could use a BufferedInputStream or something like a BufferedReader.
In any case, it's better to obtain the whole content before parsing it, unless you need to parse it dynamically.
If you really want to keep doing it the way you are, you should create two piped streams:
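A sketch of the "obtain the whole content first" approach: read everything into a byte array (counting progress in that loop if you like), then hand the parser a fresh ByteArrayInputStream. It uses the JDK's SAXParser with a trivial handler that counts elements as a stand-in for the real handler; the class and method names are made up:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class BufferThenParse {
    // Drains the stream into memory; progress reporting would go inside this loop.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[2048];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Parses the buffered bytes from a new stream, so nothing has been consumed beforehand.
    static int countElements(byte[] xml)
            throws IOException, ParserConfigurationException, SAXException {
        int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName, String qName, Attributes attrs) {
                        count[0]++;
                    }
                });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        InputStream download = new ByteArrayInputStream(
                "<a><b/><b/></a>".getBytes(StandardCharsets.UTF_8));
        byte[] xml = readAll(download);
        System.out.println(countElements(xml)); // prints 3
    }
}
```

The trade-off is memory: the whole document is held in a byte array, and parsing only starts once the download has finished.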
PipedOutputStream pipeOut = new PipedOutputStream();
PipedInputStream pipeIn = new PipedInputStream();
pipeIn.connect(pipeOut);
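pipeOut.write(yourBytes); // for data larger than the pipe's buffer, write from a separate thread or this call blocks
pipeOut.close(); // signals end of stream to the reader
xr.parse(new InputSource(pipeIn));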
Streams in Java, as their name suggests, have no fixed size, and you do not know in advance when they will end. Once you have read from an InputStream, you cannot pass the same InputStream to another object and expect it to see that data, because the first reader has already consumed it.
If you want to do both things (downloading and parsing) together, you have to hook into the data received from the HttpURLConnection. You should:
first know the length of the data being downloaded; this can be obtained from the HttpURLConnection headers
use a custom InputStream that decorates the original one (this is how streams work in Java) and updates the progress bar.
Something like:
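class MyInputStream extends FilterInputStream
{
    private final int total;

    MyInputStream(InputStream is, int total)
    {
        super(is); // FilterInputStream keeps the wrapped stream and delegates to it
        this.total = total;
    }

    public int read() throws IOException
    {
        int b = super.read();
        if (b != -1) stepProgress(1);
        return b;
    }

    // read(byte[]) delegates to this method, so it needs no override of its own
    public int read(byte[] b, int off, int len) throws IOException
    {
        int l = super.read(b, off, len);
        if (l > 0) stepProgress(l);
        return l;
    }

    // stepProgress(int) is assumed to update the progress bar; it is not shown here
}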
InputStream mis= new MyInputStream(request.getInputStream(), length);
..
xr.parse(new InputSource(mis));
You can save your data in a file, and then read them out.
InputStream is = request.getInputStream();
if(is!=null){
File file = new File(path, "someFile.txt");
FileOutputStream os = new FileOutputStream(file);
byte[] buffer = new byte[2048];
int bufferLength = 0;
while ((bufferLength = is.read(buffer)) > 0)
os.write(buffer, 0, bufferLength);
os.flush();
os.close();
XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
factory.setNamespaceAware(true);
XmlPullParser xpp = factory.newPullParser();
FileInputStream fis = new FileInputStream(file);
xpp.setInput(new InputStreamReader(fis));
}