Java emulation of MySQL COMPRESS and DECOMPRESS functions

I'd like to insert some byte data into a MySQL VARBINARY column. The data is large, so I want to store it compressed.
I'm using Percona Server 5.6 (MySQL). I would like to emulate MySQL's COMPRESS function in Java and insert the result into the database.
I would then like to use MySQL's DECOMPRESS function to read the data back.
Is there a way to do this?
I have tried the standard java.util.zip package, but it doesn't work.
Edit: Phrased differently, what is the Java equivalent of PHP's gzcompress (zlib)?

The result of COMPRESS is the four-byte little-endian length of the uncompressed data, followed by a zlib stream containing the compressed data.
You can use the Deflater class in Java to compress to a zlib stream. Prepend the four-byte length to that result.

Solution: an implementation of MySQL COMPRESS and DECOMPRESS in Java
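// Imports needed by the methods in this answer
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Compress a byte array into a zlib stream
public static byte[] compressZLib(byte[] data) {
    Deflater deflater = new Deflater();
    try {
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length);
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int count = deflater.deflate(buffer); // number of compressed bytes written to buffer
            outputStream.write(buffer, 0, count);
        }
        return outputStream.toByteArray();
    } finally {
        deflater.end(); // release the native zlib resources
    }
}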
// MySQL COMPRESS: four-byte little-endian uncompressed length, then the zlib stream
public static byte[] compressMySQL(byte[] data) {
    byte[] lengthBeforeCompression = ByteBuffer.allocate(Integer.BYTES).order(ByteOrder.LITTLE_ENDIAN).putInt(data.length).array();
    ByteArrayOutputStream resultStream = new ByteArrayOutputStream();
    try {
        resultStream.write(lengthBeforeCompression);
        resultStream.write(compressZLib(data));
    } catch (IOException e) {
        throw new IllegalStateException(e); // not expected for an in-memory stream
    }
    return resultStream.toByteArray();
}
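// Decompress a zlib stream
public static byte[] decompressZLib(byte[] data) {
    Inflater inflater = new Inflater();
    try {
        inflater.setInput(data);
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length);
        byte[] buffer = new byte[1024];
        while (!inflater.finished()) {
            int count = inflater.inflate(buffer);
            if (count == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                // Truncated input or a preset dictionary: fail instead of looping forever
                throw new IllegalArgumentException("Incomplete or unsupported zlib stream");
            }
            outputStream.write(buffer, 0, count);
        }
        return outputStream.toByteArray();
    } catch (DataFormatException e) {
        throw new IllegalArgumentException("Not a valid zlib stream", e);
    } finally {
        inflater.end(); // release the native zlib resources
    }
}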
// MySQL DECOMPRESS
public static byte[] decompressSQLCompression(byte[] input) {
    // Skip the first four bytes (the uncompressed length) and inflate the rest
    byte[] data = Arrays.copyOfRange(input, 4, input.length);
    return decompressZLib(data);
}
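The DECOMPRESS emulation above skips the four-byte header. As a variation, the header can be read and used to size the output buffer and to sanity-check the result. The following is only a sketch under that assumption; the class name MySqlCompressSketch and its method names are made up for illustration and are not part of MySQL or the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, for illustration only
class MySqlCompressSketch {

    // Builds the layout described above: 4-byte little-endian length + zlib stream
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] header = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN)
                .putInt(data.length).array();
        out.write(header, 0, header.length);

        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data);
            deflater.finish();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
        } finally {
            deflater.end();
        }
        return out.toByteArray();
    }

    // Reads the length header, then inflates exactly that many bytes
    static byte[] decompress(byte[] stored) {
        if (stored.length < 4) {
            throw new IllegalArgumentException("Missing length header");
        }
        int expectedLength = ByteBuffer.wrap(stored, 0, 4)
                .order(ByteOrder.LITTLE_ENDIAN).getInt();
        if (expectedLength < 0) {
            throw new IllegalArgumentException("Length does not fit in a Java array");
        }
        byte[] result = new byte[expectedLength];
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stored, 4, stored.length - 4);
            int written = 0;
            while (written < expectedLength && !inflater.finished()) {
                int n = inflater.inflate(result, written, expectedLength - written);
                if (n == 0) {
                    break; // no progress: input is truncated or needs a dictionary
                }
                written += n;
            }
            if (written != expectedLength) {
                throw new IllegalArgumentException("Header length does not match the data");
            }
            return result;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Not a valid zlib stream", e);
        } finally {
            inflater.end();
        }
    }
}
```

Sizing the buffer from the header avoids growing a ByteArrayOutputStream, at the cost of trusting a value stored in the column. This sketch does not check whether the zlib stream holds more data than the header claims.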

Related

Reading a GZIP file causes "Unexpected end of ZLIB input stream" in Java

I am converting a GZIP byte array to a String in Java. The file is considerably large, and the idea is to convert it into JSON.
But the exception I am getting is quite weird and does not make much sense.
Code snippet:
public static String convert(byte[] bytes) throws IOException {
final byte[] BUFFER = new byte[16234];
GZIPInputStream gzipInputStream = new GZIPInputStream(new ByteArrayInputStream(bytes));
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
int len;
while ((len = gzipInputStream.read(BUFFER)) >= 0) {
byteArrayOutputStream.write(BUFFER, 0, len);
if(byteArrayOutputStream.size ()>60812918){
System.out.println ( "stopping here" );
}
}
byteArrayOutputStream.flush();
byteArrayOutputStream.close();
gzipInputStream.close();
final byte[] dataPart = byteArrayOutputStream.toByteArray();
String data = new String(dataPart, StandardCharsets.UTF_8);
return data;
}
Exception Trace:
java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at com.here.rcs.discoverkernels.testS3FileReader.convert(testS3FileReader.java:84)
at com.here.rcs.discoverkernels.testS3FileReader.viewJson(testS3FileReader.java:45)
at com.here.rcs.discoverkernels.testS3FileReader.main(testS3FileReader.java:21)
From a coding point of view, I don't think there is anything wrong with this piece of code.
Any suggestions on how to move forward with this?
Adding the String-to-bytes conversion part:
public static byte[] compress(final String data) throws IOException {
final byte[] dataPart = data.getBytes( StandardCharsets.UTF_8 );
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
GZIPOutputStream gzipOutputStream = new GZIPOutputStream(byteArrayOutputStream);
gzipOutputStream.write(dataPart);
gzipOutputStream.flush();
gzipOutputStream.close();
byte[] bytes = byteArrayOutputStream.toByteArray();
return bytes;
}
I tried your program with a small .gz file, and it works as expected. I guess that
there are issues handling large files or, more likely, that
you did not load the data from the file into the byte array correctly. How did you do it? I followed this article: https://netjs.blogspot.com/2015/11/how-to-convert-file-to-byte-array-java.html
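One way to test the "incomplete byte array" guess is to truncate a known-good gzip array on purpose and see whether the same kind of exception appears. The sketch below does that in memory; the class TruncatedGzipSketch and its method names are hypothetical, and cutting the array in half merely simulates a file that was not read completely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only
class TruncatedGzipSketch {

    // Compresses data into a complete gzip byte array
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Decompresses a gzip byte array, same loop shape as convert() in the question
    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int len;
            while ((len = gz.read(buffer)) >= 0) {
                bos.write(buffer, 0, len);
            }
        }
        return bos.toByteArray();
    }

    // Returns true if decompressing only the first half of the array ends in EOFException
    static boolean failsWhenTruncated(byte[] compressed) {
        byte[] truncated = Arrays.copyOf(compressed, compressed.length / 2);
        try {
            gunzip(truncated);
            return false;
        } catch (EOFException e) {
            return true; // the stream ended before the compressed data did
        } catch (IOException e) {
            return false; // some other I/O failure
        }
    }
}
```

If the complete array round-trips but the truncated copy fails this way, the code that loads the bytes (for example a single read() call that returns early) is a more likely culprit than the decompression loop.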

Why does getResourceAsStream() and reading file with FileInputStream return arrays of different length?

I want to read files as byte arrays and realised that the number of bytes read varies depending on the method used. Here is the relevant code:
public byte[] readResource() {
try (InputStream is = getClass().getClassLoader().getResourceAsStream(FILE_NAME)) {
int available = is.available();
byte[] result = new byte[available];
is.read(result, 0, available);
return result;
} catch (Exception e) {
log.error("Failed to load resource '{}'", FILE_NAME, e);
}
return new byte[0];
}
public byte[] readFile() {
File file = new File(FILE_PATH + FILE_NAME);
try (InputStream is = new FileInputStream(file)) {
int available = is.available();
byte[] result = new byte[available];
is.read(result, 0, available);
return result;
} catch (Exception e) {
log.error("Failed to load file '{}'", FILE_NAME, e);
}
return new byte[0];
}
Calling File.length() and reading with the FileInputStream both give the correct length of 21566 bytes for the given test file, whereas reading the file as a resource returns 21622 bytes.
Does anyone know why I get different results and how to fix it so that readResource() returns the correct result?
On "Why does getResourceAsStream() and reading file with FileInputStream return arrays of different length?":
Because you're misusing the available() method in a way that is specifically warned against in the Javadoc:
"It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream."
On "Does anyone know why I get different results and how to fix it so that readResource() returns the correct result?":
Read in a loop until end of stream.
According to the API docs of InputStream, InputStream.available() does not return the size of the resource. It returns
"an estimate of the number of bytes that can be read (or skipped over) from this input stream without blocking".
To get the size of a resource from a stream, you need to read the stream fully and count the bytes read.
To read the stream and return the contents as a byte array, you could do something like this:
try ( InputStream is = getClass().getClassLoader().getResourceAsStream(FILE_NAME);
ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
byte[] buffer = new byte[4096];
int bytesRead = 0;
while ((bytesRead = is.read(buffer)) != -1) {
bos.write(buffer, 0, bytesRead);
}
return bos.toByteArray();
}
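On Java 9 or later, the same read-until-end loop is available as InputStream.readAllBytes(). The sketch below assumes that Java version; the class ReadFullySketch and the underReporting() helper are hypothetical and exist only to show that the result does not depend on what available() reports.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class, for illustration only
class ReadFullySketch {

    // Reads the stream to its end and closes it (requires Java 9+ for readAllBytes)
    static byte[] readFully(InputStream source) {
        try (InputStream in = source) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A stream whose available() deliberately under-reports, as real streams may
    static InputStream underReporting(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int available() {
                return Math.min(1, data.length);
            }
        };
    }
}
```

For a classpath resource, the call would look like readFully(getClass().getClassLoader().getResourceAsStream(FILE_NAME)), after checking that the returned stream is not null.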

What's the difference between getBytes and serialize with String?

Just as the title says, I can't tell the difference between getBytes() and the serialization mechanism for a String. Below is a test comparing the two:
public void testUTF() {
byte[] data = SerializeUtil.serUTFString(str);
System.out.println(data.length);
System.out.println(str.getBytes().length);
}
Here is SerializeUtil:
public static byte[] serUTFString(String data) {
byte[] result = null;
ObjectOutputStream oos = null;
ByteArrayOutputStream byteArray = new ByteArrayOutputStream();
try {
oos = new ObjectOutputStream(byteArray);
try {
oos.writeUTF(data);
oos.flush();
result = byteArray.toByteArray();
} finally {
oos.close();
}
} catch (Exception e) {
e.printStackTrace();
}
return result;
}
When I store str in Redis, both approaches work correctly, but getBytes() seems more efficient. Since both return a byte array from a String, what's the difference? Is serialization necessary?
String.getBytes() returns a byte array representing the string's characters in the default encoding. ObjectOutputStream.writeUTF writes the string length and then the bytes in modified UTF-8 format; see the java.io.DataOutput API.
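The difference in length can be made visible with a short comparison. This is a sketch, not the asker's SerializeUtil: the class WriteUtfSketch is hypothetical, and it uses DataOutputStream alongside ObjectOutputStream to isolate the two-byte length prefix that DataOutput.writeUTF specifies. ObjectOutputStream additionally writes a stream header and block-data framing around that, so its output is longer again.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only
class WriteUtfSketch {

    // Raw encoded characters, no framing at all
    static byte[] viaGetBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Two-byte length prefix followed by modified UTF-8
    static byte[] viaDataOutput(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Same writeUTF payload, wrapped in the object stream's own header and framing
    static byte[] viaObjectOutput(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```

For "abc", viaGetBytes yields 3 bytes and viaDataOutput yields 5 (0x00 0x03 followed by the three characters). If the bytes only need to travel to Redis and back as text, getBytes with an explicit charset plus new String(bytes, charset) is enough; the framed forms must be read back with the matching readUTF.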

Java: How do I convert InputStream to GZIPInputStream?

I have a method like
public void put(@Nonnull final InputStream inputStream, @Nonnull final String uniqueId) throws PersistenceException {
// a.) create gzip of inputStream
final GZIPInputStream zipInputStream;
try {
zipInputStream = new GZIPInputStream(inputStream);
} catch (IOException e) {
e.printStackTrace();
throw new PersistenceException("Persistence Service could not received input stream to persist for " + uniqueId);
}
I want to convert the inputStream into a zipInputStream. How can I do that?
The above method is incorrect and throws an exception: "Not a Zip Format".
Converting Java streams is really confusing to me, and I can't get it right.
The GZIPInputStream is to be used to decompress an incoming InputStream. To compress an incoming InputStream using GZIP, you basically need to write it to a GZIPOutputStream.
You can get a new InputStream out of it if you use ByteArrayOutputStream to write gzipped content to a byte[] and ByteArrayInputStream to turn a byte[] into an InputStream.
So, basically:
public void put(@Nonnull final InputStream inputStream, @Nonnull final String uniqueId) throws PersistenceException {
final InputStream zipInputStream;
try {
ByteArrayOutputStream bytesOutput = new ByteArrayOutputStream();
GZIPOutputStream gzipOutput = new GZIPOutputStream(bytesOutput);
try {
byte[] buffer = new byte[10240];
for (int length = 0; (length = inputStream.read(buffer)) != -1;) {
gzipOutput.write(buffer, 0, length);
}
} finally {
try { inputStream.close(); } catch (IOException ignore) {}
try { gzipOutput.close(); } catch (IOException ignore) {}
}
zipInputStream = new ByteArrayInputStream(bytesOutput.toByteArray());
} catch (IOException e) {
e.printStackTrace();
throw new PersistenceException("Persistence Service could not received input stream to persist for " + uniqueId);
}
// ...
If necessary, you can replace the ByteArrayOutputStream/ByteArrayInputStream with a FileOutputStream/FileInputStream on a temporary file created by File#createTempFile(), especially if those streams can contain large data which might exhaust the machine's available memory when used concurrently.
GZIPInputStream is for reading gzip-encoded content.
If your goal is to take a regular input stream and compress it in the GZIP format, then you need to write those bytes to a GZIPOutputStream.
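On Java 9 or later, the copy loop can be shortened with try-with-resources and InputStream.transferTo(). The following is a compact sketch of the same in-memory approach under that assumption; the class GzipStreamSketch and its method name are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only
class GzipStreamSketch {

    // Reads source to its end and returns a new stream over the gzip-compressed bytes
    static InputStream gzipped(InputStream source) throws IOException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        // Resources close in reverse order: gzip first (which writes the trailer), then in
        try (InputStream in = source;
             GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
            in.transferTo(gzip); // requires Java 9+
        }
        return new ByteArrayInputStream(compressed.toByteArray());
    }
}
```

The returned stream can be handed to whatever persists the data, and wrapping it in a GZIPInputStream later yields the original bytes again. As with the byte-array version, everything is held in memory, so it suits moderately sized inputs only.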

How can I easily compress and decompress Strings to/from byte arrays?

I have some strings that are roughly 10K characters each. There is plenty of repetition in them. They are serialized JSON objects. I'd like to easily compress them into a byte array, and uncompress them from a byte array.
How can I most easily do this? I'm looking for methods so I can do the following:
String original = "....long string here with 10K characters...";
byte[] compressed = StringCompressor.compress(original);
String decompressed = StringCompressor.decompress(compressed);
assert original.equals(decompressed);
You can try
enum StringCompressor {
;
public static byte[] compress(String text) {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
OutputStream out = new DeflaterOutputStream(baos);
out.write(text.getBytes("UTF-8"));
out.close();
} catch (IOException e) {
throw new AssertionError(e);
}
return baos.toByteArray();
}
public static String decompress(byte[] bytes) {
InputStream in = new InflaterInputStream(new ByteArrayInputStream(bytes));
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
byte[] buffer = new byte[8192];
int len;
while((len = in.read(buffer))>0)
baos.write(buffer, 0, len);
return new String(baos.toByteArray(), "UTF-8");
} catch (IOException e) {
throw new AssertionError(e);
}
}
}
Peter Lawrey's answer can be simplified a bit by using this less complex code for the decompress function:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
OutputStream out = new InflaterOutputStream(baos);
out.write(bytes);
out.close();
return new String(baos.toByteArray(), "UTF-8");
} catch (IOException e) {
throw new AssertionError(e);
}
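To check the round trip and the size saving on repetitive input, the two variants can be combined into one small class. This is a sketch only: the class name StringCompressorSketch is hypothetical, and it uses StandardCharsets.UTF_8 and try-with-resources instead of the "UTF-8" string literal and manual close() calls.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterOutputStream;

// Hypothetical helper class, for illustration only
class StringCompressorSketch {

    // Encodes the text as UTF-8 and deflates it into a zlib stream
    static byte[] compress(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(baos)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Inflates the zlib stream and decodes the result as UTF-8
    static String decompress(byte[] bytes) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new InflaterOutputStream(baos)) {
            out.write(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

For a string of about 10K characters with plenty of repetition, such as serialized JSON, comparing compress(original).length with original.length() gives a quick view of the saving.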
I made a library to solve the problem of compressing generic Strings (especially short ones).
It tries to compress the String using various algorithms (plain UTF-8, 5-bit encoding for Latin letters, Huffman encoding, gzip for long Strings) and chooses the one with the shortest result (in the worst case it chooses the UTF-8 encoding, so you never risk losing space).
I hope it may be useful. Here's the link:
https://github.com/lithedream/lithestring
EDIT: I realized that your Strings are always "long". My library defaults to gzip for those sizes, so I fear I cannot do better for you.
