GZIP decompress string and byte conversion


I have a problem with this code:
private static String compress(String str)
{
String str1 = null;
ByteArrayOutputStream bos = null;
try
{
bos = new ByteArrayOutputStream();
BufferedOutputStream dest = null;
byte b[] = str.getBytes();
GZIPOutputStream gz = new GZIPOutputStream(bos,b.length);
gz.write(b,0,b.length);
bos.close();
gz.close();
}
catch(Exception e) {
System.out.println(e);
e.printStackTrace();
}
byte b1[] = bos.toByteArray();
return new String(b1);
}
private static String deCompress(String str)
{
String s1 = null;
try
{
byte b[] = str.getBytes();
InputStream bais = new ByteArrayInputStream(b);
GZIPInputStream gs = new GZIPInputStream(bais);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int numBytesRead = 0;
byte [] tempBytes = new byte[6000];
try
{
while ((numBytesRead = gs.read(tempBytes, 0, tempBytes.length)) != -1)
{
baos.write(tempBytes, 0, numBytesRead);
}
s1 = new String(baos.toByteArray());
s1= baos.toString();
}
catch(ZipException e)
{
e.printStackTrace();
}
}
catch(Exception e) {
e.printStackTrace();
}
return s1;
}
public String test() throws Exception
{
String str = "teststring";
String cmpr = compress(str);
String dcmpr = deCompress(cmpr);
return dcmpr;
}
This code throws java.io.IOException: unknown format (magic number ef1f) at this line:
GZIPInputStream gs = new GZIPInputStream(bais);
It turns out that the conversions new String(b1) and byte b[] = str.getBytes() "spoil" the bytes: after the round trip through a String we end up with more bytes than we started with. If I avoid the conversion to a string and work with the bytes directly, everything works. Sorry for my English.
public String unZip(String zipped) throws DataFormatException, IOException {
byte[] bytes = zipped.getBytes("WINDOWS-1251");
Inflater decompressed = new Inflater();
decompressed.setInput(bytes);
byte[] result = new byte[100];
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int count;
while ((count = decompressed.inflate(result)) != 0)
buffer.write(result, 0, count);
decompressed.end();
return new String(buffer.toByteArray(), charset);
}
I use this function to decompress the server response. Thanks for your help.

You have two problems:
You're using the default character encoding to convert the original string into bytes. That will vary by platform. It's better to specify an encoding - UTF-8 is usually a good idea.
You're trying to represent the opaque binary data of the result of the compression as a string by just calling the String(byte[]) constructor. That constructor is only meant for data which is encoded text... which this isn't. You should use base64 for this. There's a public domain base64 library which makes this easy. (Alternatively, don't convert the compressed data to text at all - just return a byte array.)
Fundamentally, you need to understand how different text and binary data are - when you want to convert between the two, you should do so carefully. If you want to represent "non text" binary data (i.e. bytes which aren't the direct result of encoding text) in a string you should use something like base64 or hex. When you want to encode a string as binary data (e.g. to write some text to disk) you should carefully consider which encoding to use. If another program is going to read your data, you need to work out what encoding it expects - if you have full control over it yourself, I'd usually go for UTF-8.
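As a minimal sketch of the approach described above (explicit UTF-8, GZIP applied to bytes, Base64 for the text form), assuming Java 9 or later for InputStream.readAllBytes() and using the JDK's built-in java.util.Base64 (available since Java 8) in place of a third-party library. The class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipText {
    // Text -> UTF-8 bytes -> GZIP bytes -> Base64 text that survives any String round trip.
    static String compressToBase64(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the GZIP trailer, so it must happen before toByteArray().
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(bos.toByteArray());
    }

    // Reverse: Base64 text -> GZIP bytes -> UTF-8 bytes -> text.
    static String decompressFromBase64(String base64) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(base64);
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            // Decode once, after all bytes are read, so multi-byte characters are never split.
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

If the data never has to travel as text, drop the Base64 step and pass the byte[] around directly. Note that for a short input such as "teststring" the Base64 output is longer than the input.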
Additionally, the exception handling in your code is poor:
You should almost never catch Exception; catch more specific exceptions
You shouldn't just catch an exception and continue as if it had never happened. If you can't really handle the exception and still complete your method successfully, you should let the exception bubble up the stack (or possibly catch it and wrap it in a more appropriate exception type for your abstraction)
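A sketch of the two options from the last point, using a hypothetical gunzip helper (Java 9+ for readAllBytes()): declare the checked exception, or wrap it while keeping the cause for callers that cannot declare it. Either way, corrupt input surfaces as an error instead of a null return value:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class CompressionErrors {
    // Declares IOException instead of catching Exception and returning null:
    // corrupt input (a ZipException, which is an IOException) reaches the caller.
    static String gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Wraps the checked exception for callers that cannot declare it; the cause is preserved.
    static String gunzipUnchecked(byte[] compressed) {
        try {
            return gunzip(compressed);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not decompress GZIP data", e);
        }
    }
}
```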

When you GZIP compress data, you always get binary data. This data cannot be converted into a string as it is not valid character data (in any encoding).
So your compress method should return a byte array and your decompress method should take a byte array as its parameter.
Furthermore, I recommend you use an explicit encoding when you convert the string into a byte array before compression and when you turn the decompressed data into a string again.
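To see concretely why the String round trip fails, the hypothetical demo below pushes GZIP output through new String(...) and getBytes(). It assumes a UTF-8 default charset, which the ef1f in the question's error message suggests: GZIP data starts with the magic bytes 1f 8b, 0x8b is not valid UTF-8 on its own, so it is replaced by U+FFFD, which re-encodes as ef bf bd. The stream then starts 1f ef, and reading those two bytes as a little-endian value gives the 0xef1f from the question (the exact exception message varies between Java runtimes):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipException;

class StringRoundTripDemo {
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // What the question's compress()/deCompress() pair effectively does to the bytes
    // (assuming a UTF-8 default charset): bytes -> String -> bytes.
    static byte[] throughString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] gz = gzip("teststring".getBytes(StandardCharsets.UTF_8));
        byte[] spoiled = throughString(gz);
        // The second magic byte 0x8b became U+FFFD, i.e. ef bf bd, after the round trip.
        System.out.printf("original: %02x %02x, spoiled: %02x %02x%n", gz[0], gz[1], spoiled[0], spoiled[1]);
        try {
            new GZIPInputStream(new ByteArrayInputStream(spoiled));
        } catch (ZipException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running main prints `original: 1f 8b, spoiled: 1f ef` on the first line, followed by the runtime's rejection message.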

"When you GZIP compress data, you always get binary data. This data cannot be converted into a string as it is not valid character data (in any encoding)."
Codo is right, thanks a lot for enlightening me. I was trying to decompress a string (converted from the binary data). My fix was to use InflaterInputStream directly on the input stream returned by my HTTP connection (my app was retrieving a large JSON document of strings).
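A sketch of that fix, assuming the server sends zlib/deflate-compressed data (use GZIPInputStream instead if it sends GZIP). The compressed stream is inflated and decoded as UTF-8 text without ever being turned into a String first. In the real application the InputStream would come from the HTTP connection; the deflate helper here only stands in for the server, and all names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class InflateStream {
    // Wraps the raw compressed stream and decodes the inflated bytes as UTF-8 text.
    static String readInflated(InputStream compressed) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new InflaterInputStream(compressed), StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Stand-in for the server: zlib-compresses text with DeflaterOutputStream.
    static byte[] deflate(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }
}
```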

Related

How can I create a truststore from a base64 encoded String?

I have a String that is encoded in base64, I need to take this string, decode it and create a truststore file, but when I do that, the final file is not valid. Here is my code:
public static void buildFile() {
String exampleofencoded = "asdfasdfasdfadfa";
File file = new File("folder/file.jks");
try (FileOutputStream fos = new FileOutputStream(file);
BufferedOutputStream bos = new BufferedOutputStream(fos);
DataOutputStream dos = new DataOutputStream(bos))
{
Base64.Decoder decoder = Base64.getDecoder();
String decodedString = new String(decoder.decode(exampleofencoded));
dos.writeBytes(decodedString);
}
catch (IOException e) {
System.out.println("Error creating file");
}
catch(NullPointerException e) {
System.out.println(e.getMessage());
}
}

The problem is two-fold.
You're converting a byte[] array to String, which is a lossy operation for actual binary data for most character sets (except maybe iso-8859-1).
You're using DataOutputStream, which is not a generic output stream, but intended for a specific serialization format of primitive types. And specifically its writeBytes method comes with an important caveat ("Each character in the string is written out, in sequence, by discarding its high eight bits."), which is one more reason why only using iso-8859-1 will likely work.
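A small demonstration of both problems, using the first four bytes of a JKS file (the magic number FEEDFEED) as sample binary data. The helper names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LossyConversions {
    // Round-trips arbitrary bytes through a String using the given charset.
    static boolean survivesRoundTrip(byte[] data, Charset cs) {
        return Arrays.equals(data, new String(data, cs).getBytes(cs));
    }

    // What DataOutputStream.writeBytes does: keeps only the low 8 bits of each char.
    static byte[] viaWriteBytes(String s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeBytes(s);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] binary = {(byte) 0xFE, (byte) 0xED, (byte) 0xFE, (byte) 0xED};
        System.out.println(survivesRoundTrip(binary, StandardCharsets.UTF_8));      // false
        System.out.println(survivesRoundTrip(binary, StandardCharsets.ISO_8859_1)); // true
        // The euro sign U+20AC is written as the single byte 0xAC.
        System.out.println(Arrays.toString(viaWriteBytes("\u20ac")));               // [-84]
    }
}
```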
Instead, write the byte array directly to the file:
public static void buildFile() {
String exampleofencoded = "asdfasdfasdfadfa";
File file = new File("folder/file.jks");
try (OutputStream fos = Files.newOutputStream(file.toPath())) {
Base64.Decoder decoder = Base64.getDecoder();
byte[] decodedbytes = decoder.decode(exampleofencoded);
fos.write(decodedbytes);
} catch (IOException e) {
System.out.println("Error creating file");
}
}
As an aside, you shouldn't catch NullPointerException in your code; it almost always indicates a problem that can be prevented by careful programming and/or validation of inputs. I would usually also advise against catching the IOException here and only printing it. It is probably better to propagate that exception as well, and let the caller handle it.
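Along the lines of that aside, here is a sketch that lets exceptions propagate and also checks the result by loading it as a keystore. It assumes the decoded data is a JKS keystore and that the Base64 text contains no line breaks (Base64.getDecoder() rejects them; Base64.getMimeDecoder() ignores them). The class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.util.Base64;

class TrustStoreFromBase64 {
    // Decodes the Base64 text straight to bytes and writes them unchanged; exceptions propagate.
    static void writeTrustStore(String base64, Path target) throws IOException {
        Files.write(target, Base64.getDecoder().decode(base64));
    }

    // Loads the written file to confirm it is a readable keystore (type "JKS" assumed).
    static KeyStore load(Path file, char[] password) throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance("JKS");
        try (InputStream in = Files.newInputStream(file)) {
            ks.load(in, password);
        }
        return ks;
    }
}
```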

How to convert String variable back in byte[] in JAVA [duplicate]

This question already has answers here:
How to convert Java String into byte[]?
I have the following code to zip and unzip the String:
public static void main(String[] args) {
// TODO code application logic here
String Source = "hello world";
byte[] a = ZIP(Source);
System.out.format("answer:");
System.out.format(a.toString());
System.out.format("\n");
byte[] Source2 = a.toString().getBytes();
System.out.println("\nsource 2:" + Source2.toString() + "\n");
String b = unZIP(Source2);
System.out.println("\nunzip answer:");
System.out.format(b);
System.out.format("\n");
}
public static byte[] ZIP(String source) {
ByteArrayOutputStream bos= new ByteArrayOutputStream(source.length()* 4);
try {
GZIPOutputStream outZip= new GZIPOutputStream(bos);
outZip.write(source.getBytes());
outZip.flush();
outZip.close();
} catch (Exception Ex) {
}
return bos.toByteArray();
}
public static String unZIP(byte[] Source) {
ByteArrayInputStream bins= new ByteArrayInputStream(Source);
byte[] buf= new byte[2048];
StringBuffer rString= new StringBuffer("");
int len;
try {
GZIPInputStream zipit= new GZIPInputStream(bins);
while ((len = zipit.read(buf)) > 0) {
rString.append(new String(buf).substring(0, len));
}
return rString.toString();
} catch (Exception Ex) {
return "";
}
}
When "hello world" is zipped, the result is a byte[] which, converted to a String and printed, shows up as [B@7bdecdec. However, if I try to convert the string back into a byte[] with the following code:
byte[] Source2 = a.toString().getBytes();
the printed value becomes [B@60a1807c instead of [B@7bdecdec. Does anyone know how I can convert the String (a byte[] that was converted into a String) back into a byte[] in Java?

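Why are you doing byte[] Source2 = a.toString().getBytes(); ?
It looks like a double conversion: you convert a byte[] to a String and then back to a byte[]. Worse, a.toString() does not convert the bytes at all; it returns the array's type and identity hash code (the [B@7bdecdec you saw).
The real conversion of a byte[] to a String is new String(byte[]), and even that only works if the bytes are text in the same charset, which compressed data is not.
Source2 should be an exact copy of a, so you should just do byte[] Source2 = a; (or Arrays.copyOf(a, a.length) if you need a separate copy).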
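A short demonstration of the difference between printing an array and printing its contents (the hash code in the first line differs from run to run; the class name is hypothetical):

```java
import java.util.Arrays;

class ByteArrayPrinting {
    public static void main(String[] args) {
        byte[] a = {31, -117, 8, 0};
        // Object.toString(): type "[B" plus identity hash code, e.g. "[B@7bdecdec".
        System.out.println(a.toString());
        // The actual contents: [31, -117, 8, 0]
        System.out.println(Arrays.toString(a));
        // a.toString().getBytes() encodes that label text, not the original data.
        byte[] wrong = a.toString().getBytes();
        // To reuse the data, use the array itself or an explicit copy.
        byte[] copy = Arrays.copyOf(a, a.length);
        System.out.println(Arrays.equals(a, copy));  // true
        System.out.println(Arrays.equals(a, wrong)); // false
    }
}
```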

Your unZIP is wrong because you convert the bytes back into a string with the platform default encoding, while the data may be in some other encoding (let's say UTF-8):
public static String unZIP(byte[] source) throws IOException {
ByteArrayOutputStream bos = new ByteArrayOutputStream(source.length*2);
try (ByteArrayInputStream in = new ByteArrayInputStream(source);
GZIPInputStream zis = new GZIPInputStream(in)) {
byte[] buffer = new byte[4096];
for (int n = 0; (n = zis.read(buffer)) != -1; ) {
bos.write(buffer, 0, n);
}
}
return new String(bos.toByteArray(), StandardCharsets.UTF_8);
}
This one, not tested, will:
Store the bytes from the GZIP stream into a ByteArrayOutputStream
Close the GZIPInputStream and ByteArrayInputStream using try-with-resources
Convert the whole result into a String using UTF-8 (you should always specify an encoding, and except in rare cases, UTF-8 is the way to go).
You must not use StringBuffer for two reasons:
The most important one: this will not behave well with multi-byte encodings such as UTF-8 or UTF-16, because decoding each chunk separately can split a character across two chunks.
And second, StringBuffer is synchronized: use StringBuilder instead whenever you need a string builder at all (here you don't!). StringBuffer should be reserved for cases where you share it between several threads; otherwise the synchronization is useless.
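The first reason can be reproduced with a hypothetical helper that decodes each chunk separately, as the question's loop does with new String(buf). The question uses 2048-byte chunks, so the problem only appears when a multi-byte character straddles a chunk boundary; a chunk size of 2 makes it visible immediately:

```java
import java.nio.charset.StandardCharsets;

class ChunkedDecoding {
    // Mimics the question's loop: decodes each chunk on its own (here with UTF-8).
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8); // the e-acute is 2 bytes: C3 A9
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals("h\u00e9llo")); // true
        // With 2-byte chunks, C3 and A9 land in different chunks and each becomes U+FFFD.
        System.out.println(decodePerChunk(utf8, 2).equals("h\u00e9llo"));                  // false
    }
}
```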
With those changes, you will also need to change ZIP, as per David Conrad's comment, because unZIP now uses UTF-8:
public static byte[] ZIP(String source) throws IOException {
ByteArrayOutputStream bos = new ByteArrayOutputStream(source.length()* 4);
try (GZIPOutputStream zip = new GZIPOutputStream(bos)) {
zip.write(source.getBytes(StandardCharsets.UTF_8));
}
return bos.toByteArray();
}
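As for main: printing a byte[] calls the default Object.toString(), which prints the array type and a hash code (such as [B@7bdecdec), not the bytes. Use Arrays.toString(a) to see the contents.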

UTF-8 byte[] to String

Let's suppose I have just used a BufferedInputStream to read the bytes of a UTF-8 encoded text file into a byte array. I know that I can use the following routine to convert the bytes to a string, but is there a more efficient/smarter way of doing this than just iterating through the bytes and converting each one?
public String openFileToString(byte[] _bytes)
{
String file_string = "";
for(int i = 0; i < _bytes.length; i++)
{
file_string += (char)_bytes[i];
}
return file_string;
}

Look at the constructor for String:
String str = new String(bytes, StandardCharsets.UTF_8);
And if you're feeling lazy, you can use the Apache Commons IO library to convert the InputStream to a String directly:
String str = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
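If you would rather not add a library, here is a JDK-only sketch, assuming Java 9 or later for InputStream.readAllBytes() (on Java 11+, Files.readString(path) reads a whole UTF-8 file directly). The class name is hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class StreamToString {
    // Reads every remaining byte, then decodes once as UTF-8.
    static String readUtf8(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
}
```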

The Java String class has a built-in constructor for converting a byte array to a string:
byte[] byteArray = new byte[] {87, 79, 87, 46, 46, 46};
String value = new String(byteArray, "UTF-8");

To convert utf-8 data, you can't assume a 1-1 correspondence between bytes and characters.
Try this:
String file_string = new String(bytes, "UTF-8");
(Bah. I see I'm way too slow in hitting the Post Your Answer button.)
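For example, the euro sign is one character but three UTF-8 bytes, so casting each byte to char (as in the question) produces three wrong characters. A quick check, with a hypothetical helper reproducing the question's loop:

```java
import java.nio.charset.StandardCharsets;

class ByteCharMismatch {
    // The question's approach: one char per byte (only correct for ASCII data).
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] euro = "\u20ac".getBytes(StandardCharsets.UTF_8);              // E2 82 AC
        System.out.println(euro.length);                                         // 3
        System.out.println(new String(euro, StandardCharsets.UTF_8).length());   // 1
        System.out.println(castEachByte(euro).length());                         // 3
    }
}
```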
To read an entire file as a String, do something like this:
public String openFileToString(String fileName) throws IOException
{
InputStream is = new BufferedInputStream(new FileInputStream(fileName));
try {
InputStreamReader rdr = new InputStreamReader(is, "UTF-8");
StringBuilder contents = new StringBuilder();
char[] buff = new char[4096];
int len;
while ((len = rdr.read(buff)) >= 0) {
contents.append(buff, 0, len);
}
return contents.toString();
} finally {
try {
is.close();
} catch (Exception e) {
// log error in closing the file
}
}
}

You can use the String(byte[] bytes) constructor for that. See this link for details.
EDIT: You also have to consider your platform's default charset, as per the Javadoc:
"Constructs a new String by decoding the specified array of bytes using the platform's default charset. The length of the new String is a function of the charset, and hence may not be equal to the length of the byte array. The behavior of this constructor when the given bytes are not valid in the default charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required."
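A sketch of that extra control, assuming you want invalid UTF-8 to be an error rather than being silently replaced with U+FFFD (which is what new String(bytes, UTF_8) does). The class name is hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8 {
    // Throws MalformedInputException (a CharacterCodingException) on invalid UTF-8
    // instead of substituting replacement characters.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }
}
```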

You could use the methods described in this question (especially since you start off with an InputStream): Read/convert an InputStream to a String
In particular, if you don't want to rely on external libraries, you can try this answer, which reads the InputStream via an InputStreamReader into a char[] buffer and appends it into a StringBuilder.

Knowing that you are dealing with a UTF-8 byte array, you'll definitely want to use the String constructor that accepts a charset name. Otherwise you may leave yourself open to some charset encoding based security vulnerabilities. Note that it throws UnsupportedEncodingException which you'll have to handle. Something like this:
public String openFileToString(byte[] _bytes) {
String file_string;
try {
file_string = new String(_bytes, "UTF-8");
} catch (UnsupportedEncodingException e) {
// this should never happen because "UTF-8" is hard-coded.
throw new IllegalStateException(e);
}
return file_string;
}

Here's a simplified function that reads the bytes and creates a string. It assumes you already know which encoding the file is in (otherwise it defaults to UTF-8).
static final int BUFF_SIZE = 2048;
static final String DEFAULT_ENCODING = "utf-8";
public static String readFileToString(String filePath, String encoding) throws IOException {
if (encoding == null || encoding.length() == 0)
encoding = DEFAULT_ENCODING;
ByteArrayOutputStream content = new ByteArrayOutputStream();
try (FileInputStream fis = new FileInputStream(new File(filePath))) {
byte[] buffer = new byte[BUFF_SIZE];
int bytesRead;
while ((bytesRead = fis.read(buffer)) != -1)
content.write(buffer, 0, bytesRead);
}
// decode once at the end so multi-byte characters split across buffers stay intact
return content.toString(encoding);
}

String has a constructor that takes byte[] and charsetname as parameters :)

This also involves iterating, but it is much better than concatenating strings, which is very costly.
public String openFileToString(byte[] _bytes)
{
StringBuilder s = new StringBuilder(_bytes.length);
for(int i = 0; i < _bytes.length; i++)
{
s.append((char)_bytes[i]);
}
return s.toString();
}

Why not get what you are looking for from the get go and read a string from the file instead of an array of bytes? Something like:
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream("foo.txt"), Charset.forName("UTF-8")));
then call readLine() on in until it returns null.
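The same idea with java.nio.file, as a sketch (the class name is hypothetical). One difference worth knowing: Files.newBufferedReader reports malformed input as an exception, whereas an InputStreamReader silently replaces it:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ReadLines {
    // Reads text as UTF-8 from the start instead of reading bytes and converting later.
    // readLine() strips line terminators and returns null at end of input.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```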

I use this:
String strIn = new String(_bytes, 0, numBytes);

Converting part of a ByteBuffer to a String

I have a ByteBuffer containing bytes that were derived by String.getBytes(charsetName), where "containing" means that the string comprises the entire sequence of bytes between the ByteBuffer's position() and limit().
What's the best way for me to get the string back? (assuming I know the encoding charset) Is there anything better than the following (which seems a little clunky)
byte[] ba = new byte[bbuf.remaining()];
bbuf.get(ba);
try {
String s = new String(ba, charsetName);
}
catch (UnsupportedEncodingException e) {
/* take appropriate action */
}

String s = Charset.forName(charsetName).decode(bbuf).toString();
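Note that Charset.decode consumes the buffer (its position moves up to the limit) and replaces malformed input with U+FFFD. If the buffer must stay untouched, decode a duplicate(), as in this sketch (names hypothetical):

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteBufferToString {
    // Decodes only the bytes between position() and limit().
    // duplicate() shares the content but has its own position, so bbuf itself is not consumed.
    static String decodeRemaining(ByteBuffer bbuf, Charset cs) {
        return cs.decode(bbuf.duplicate()).toString();
    }

    public static void main(String[] args) {
        byte[] raw = "xxhello worldyy".getBytes(StandardCharsets.UTF_8);
        ByteBuffer bbuf = ByteBuffer.wrap(raw);
        bbuf.position(2);
        bbuf.limit(13); // selects "hello world"
        System.out.println(decodeRemaining(bbuf, StandardCharsets.UTF_8)); // hello world
        System.out.println(bbuf.position());                               // 2
    }
}
```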

Compressing strings for client/server transport in Java

I work with a proprietary client/server message format that restricts what I can send over the wire. I can't send a serialized object; I have to store the data in the message as a String. The data I am sending are large comma-separated values, and I want to compress the data before I pack it into the message as a String.
I attempted to use Deflater/Inflater to achieve this, but somewhere along the line I am getting stuck.
I am using the two methods below to deflate/inflate. However, passing the result of the compressString() method to decompressString() returns null.
public String compressString(String data) {
Deflater deflater = new Deflater();
byte[] target = new byte[100];
try {
deflater.setInput(data.getBytes(UTF8_CHARSET));
deflater.finish();
int deflateLength = deflater.deflate(target);
return new String(target);
} catch (UnsupportedEncodingException e) {
//TODO
}
return data;
}
public String decompressString(String data) {
String result = null;
try {
byte[] input = data.getBytes();
Inflater inflater = new Inflater();
int inputLength = input.length;
inflater.setInput(input, 0, inputLength);
byte[] output = new byte[100];
int resultLength = inflater.inflate(output);
inflater.end();
result = new String(output, 0, resultLength, UTF8_CHARSET);
} catch (DataFormatException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return result;
}

From what I can tell, your current approach is:
Convert String to byte array using getBytes("UTF-8").
Compress byte array
Convert compressed byte array to String using new String(bytes, ..., "UTF-8").
Transmit compressed string
Receive compressed string
Convert compressed string to byte array using getBytes("UTF-8").
Decompress byte array
Convert decompressed byte array to String using new String(bytes, ..., "UTF-8").
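The problem with this approach is in step 3. When you compress the byte array, you create a sequence of bytes which is usually not valid UTF-8. new String(bytes, ..., "UTF-8") does not throw on such input; it silently replaces every malformed sequence with U+FFFD, so the compressed data is corrupted in step 3 and decompression fails in step 7.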
The solution is to use a "bytes to characters" encoding scheme like Base64 to turn the compressed bytes into a transmissible string. In other words, replace step 3 with a call to a Base64 encode function, and step 6 with a call to a Base64 decode function.
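A sketch of the corrected round trip, keeping the question's Deflater/Inflater but replacing steps 3 and 6 with java.util.Base64 (Java 8+). It also loops until the compressor or decompressor is finished instead of assuming the output fits in a single 100-byte buffer. The method names follow the question; the class name and buffer size are assumptions:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DeflateBase64 {
    // Steps 1-3, with step 3 replaced by Base64: UTF-8 bytes -> deflate -> Base64 text.
    static String compressString(String data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        // Keep draining until all compressed output has been produced.
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Steps 6-8, with step 6 replaced by Base64 decoding.
    static String decompressString(String encoded) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(Base64.getDecoder().decode(encoded));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and no more input means the data was truncated.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("Truncated or incomplete deflate data");
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```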
Notes:
For small strings, compressing and encoding is likely to actually increase the size of the transmitted string.
If the compacted String is going to be incorporated into a URL, you may want to pick an encoding other than standard Base64, one that avoids characters which need to be URL-escaped.
Depending on the nature of the data you are transmitting, you may find that a domain specific compression works better than a generic one. Consider compressing the data before creating the comma-separated string. Consider alternatives to comma-separated strings.
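For the URL case, the JDK (Java 8+) has a URL- and filename-safe Base64 variant from RFC 4648 that uses - and _ instead of + and /. A minimal sketch with hypothetical names:

```java
import java.util.Base64;

class UrlSafeBase64 {
    // URL-safe alphabet; withoutPadding() also drops the trailing '=' characters.
    static String encode(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    // The URL decoder accepts input with or without padding.
    static byte[] decode(String s) {
        return Base64.getUrlDecoder().decode(s);
    }
}
```

For the bytes fb ff, the standard encoder produces "+/8=" while this one produces "-_8".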

The problem is that you convert compressed bytes to a string, which breaks the data. Your compressString and decompressString should work on byte[].
EDIT: Here is a revised version. It works.
EDIT2: About base64: you're sending bytes, not strings, so you don't need base64.
public static void main(String[] args) {
String input = "Test input";
byte[] data = new byte[100];
int len = compressString(input, data, data.length);
String output = decompressString(data, len);
if (!input.equals(output)) {
System.out.println("Test failed");
}
System.out.println(input + " " + output);
}
public static int compressString(String data, byte[] output, int len) {
Deflater deflater = new Deflater();
deflater.setInput(data.getBytes(Charset.forName("utf-8")));
deflater.finish();
return deflater.deflate(output, 0, len);
}
public static String decompressString(byte[] input, int len) {
String result = null;
try {
Inflater inflater = new Inflater();
inflater.setInput(input, 0, len);
byte[] output = new byte[100]; //todo may overflow, find better solution
int resultLength = inflater.inflate(output);
inflater.end();
result = new String(output, 0, resultLength, Charset.forName("utf-8"));
} catch (DataFormatException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return result;
}

To me, writing a compression algorithm yourself is difficult, but converting binary data to a string is not. So if I were you, I would serialize the object normally, zip it with compression (as provided by java.util.zip), and then convert it to a string using something like Base64 encode/decode.
I actually have Base64 encode/decode functions. If you want, I can post them here.

If you have a piece of code which seems to be silently failing, perhaps you shouldn't catch and swallow Exceptions:
catch (UnsupportedEncodingException e) {
//TODO
}
But the real reason why decompressString returns null is that your exception handling doesn't specify what to do with result when an exception is caught, so result is left as null. Are you checking the output to see whether any exceptions are occurring?
If I run your decompressString() on a badly formatted String, Inflater throws this DataFormatException:
java.util.zip.DataFormatException: incorrect header check
at java.util.zip.Inflater.inflateBytes(Native Method)
at java.util.zip.Inflater.inflate(Inflater.java:223)
at java.util.zip.Inflater.inflate(Inflater.java:240)

Inflater/Deflater is not a solution for compressing strings.
I think GZIPInputStream and GZIPOutputStream are the proper tools to compress the string.

I was facing a similar issue, which was resolved by Base64-decoding the input,
i.e. instead of
data.getBytes(UTF8_CHARSET)
I tried
Base64.decodeBase64(data)
(from Apache Commons Codec) and it worked.
