Compressing strings for client/server transport in Java - java

I work with a proprietary client/server message format that restricts what I can send over the wire. I can't send a serialized object; I have to store the data in the message as a String. The data I am sending is a large set of comma-separated values, and I want to compress it before I pack it into the message as a String.
I attempted to use Deflater/Inflater to achieve this, but somewhere along the line I am getting stuck.
I am using the two methods below to deflate/inflate. However, passing the result of compressString() to decompressString() returns null.
public String compressString(String data) {
Deflater deflater = new Deflater();
byte[] target = new byte[100];
try {
deflater.setInput(data.getBytes(UTF8_CHARSET));
deflater.finish();
int deflateLength = deflater.deflate(target);
return new String(target);
} catch (UnsupportedEncodingException e) {
//TODO
}
return data;
}
public String decompressString(String data) {
String result = null;
try {
byte[] input = data.getBytes();
Inflater inflater = new Inflater();
int inputLength = input.length;
inflater.setInput(input, 0, inputLength);
byte[] output = new byte[100];
int resultLength = inflater.inflate(output);
inflater.end();
result = new String(output, 0, resultLength, UTF8_CHARSET);
} catch (DataFormatException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return result;
}

From what I can tell, your current approach is:
1. Convert String to byte array using getBytes("UTF-8").
2. Compress byte array
3. Convert compressed byte array to String using new String(bytes, ..., "UTF-8").
4. Transmit compressed string
5. Receive compressed string
6. Convert compressed string to byte array using getBytes("UTF-8").
7. Decompress byte array
8. Convert decompressed byte array to String using new String(bytes, ..., "UTF-8").
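The problem with this approach is in step 3. When you compress the byte array, you create a sequence of bytes which may no longer be valid UTF-8. new String(...) does not throw on such input; it silently replaces every malformed sequence with the replacement character U+FFFD, so the original bytes cannot be recovered in step 6 and decompression typically fails with a DataFormatException.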
The solution is to use a "bytes to characters" encoding scheme like Base64 to turn the compressed bytes into a transmissible string. In other words, replace step 3 with a call to a Base64 encode function, and step 6 with a call to a Base64 decode function.
Notes:
For small strings, compressing and encoding is likely to actually increase the size of the transmitted string.
If the compacted String is going to be incorporated into a URL, you may want to pick a different encoding to Base64 that avoids characters that need to be URL escaped.
Depending on the nature of the data you are transmitting, you may find that a domain specific compression works better than a generic one. Consider compressing the data before creating the comma-separated string. Consider alternatives to comma-separated strings.
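The steps above, with Base64 swapped in for steps 3 and 6, could be sketched roughly as follows. This is only a minimal sketch: it assumes Java 8+ (for java.util.Base64), and the class and method names are made up for illustration. It also loops over deflate/inflate instead of using a fixed 100-byte buffer, which is a separate problem in the original code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, not part of the original question.
final class CompressedText {

    // UTF-8 bytes -> deflate -> Base64 text that survives transport as a String.
    static String compress(String text) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return Base64.getEncoder().encodeToString(out.toByteArray());
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Base64 text -> compressed bytes -> inflate -> UTF-8 String.
    static String decompress(String encoded) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(Base64.getDecoder().decode(encoded));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Input ended before the stream did; avoid looping forever.
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            // Surface the failure instead of silently returning null.
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        StringBuilder csv = new StringBuilder("id,name,value\n");
        for (int i = 0; i < 200; i++) {
            csv.append(i).append(",alpha,0.5\n");
        }
        String packed = compress(csv.toString());
        System.out.println("original chars: " + csv.length() + ", packed chars: " + packed.length());
        System.out.println("round trip ok: " + decompress(packed).equals(csv.toString()));
    }
}
```

Base64 adds roughly a third to the size of the compressed bytes, so this only pays off when the data compresses well, as noted above for small strings.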

The problem is that you convert compressed bytes to a string, which breaks the data. Your compressString and decompressString should work on byte[]
EDIT: Here is a revised version that works.
EDIT2: About Base64: you're sending bytes, not strings, so you don't need Base64.
public static void main(String[] args) {
String input = "Test input";
byte[] data = new byte[100];
int len = compressString(input, data, data.length);
String output = decompressString(data, len);
if (!input.equals(output)) {
System.out.println("Test failed");
}
System.out.println(input + " " + output);
}
public static int compressString(String data, byte[] output, int len) {
Deflater deflater = new Deflater();
deflater.setInput(data.getBytes(Charset.forName("utf-8")));
deflater.finish();
return deflater.deflate(output, 0, len);
}
public static String decompressString(byte[] input, int len) {
String result = null;
try {
Inflater inflater = new Inflater();
inflater.setInput(input, 0, len);
byte[] output = new byte[100]; // TODO: may overflow, find a better solution
int resultLength = inflater.inflate(output);
inflater.end();
result = new String(output, 0, resultLength, Charset.forName("utf-8"));
} catch (DataFormatException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return result;
}

For me, writing a compression algorithm myself is difficult, but converting binary data to a string is not. So if I were you, I would serialize the object normally, compress it with zip compression (as provided by ZipFile), and then convert the result to a string using something like Base64 encode/decode.
I actually have Base64 encode/decode functions. If you want, I can post them here.

If you have a piece of code which seems to be silently failing, perhaps you shouldn't catch and swallow Exceptions:
catch (UnsupportedEncodingException e) {
//TODO
}
But the real reason why decompress returns null is that your exception handling doesn't specify what to do with result when you catch an exception, so result is left as null. Are you checking the output to see if any exceptions are occurring?
If I run your decompress() on a badly formatted String, Inflater throws me this DataFormatException:
java.util.zip.DataFormatException: incorrect header check
at java.util.zip.Inflater.inflateBytes(Native Method)
at java.util.zip.Inflater.inflate(Inflater.java:223)
at java.util.zip.Inflater.inflate(Inflater.java:240)

Inflater/Deflater is not the right solution for compressing a string.
I think GZIPInputStream and GZIPOutputStream are the proper tools for compressing a string.

I was facing a similar issue, which was resolved by Base64-decoding the input,
i.e. instead of
data.getBytes(UTF8_CHARSET)
I tried
Base64.decodeBase64(data)
and it worked.

Related

How can I create a truststore from a base64 encoded String?

I have a String that is encoded in base64, I need to take this string, decode it and create a truststore file, but when I do that, the final file is not valid. Here is my code:
public static void buildFile() {
String exampleofencoded = "asdfasdfasdfadfa";
File file = new File("folder/file.jks");
try (FileOutputStream fos = new FileOutputStream(file);
BufferedOutputStream bos = new BufferedOutputStream(fos);
DataOutputStream dos = new DataOutputStream(bos))
{
Base64.Decoder decoder = Base64.getDecoder();
String decodedString = new String(decoder.decode(exampleofencoded));
dos.writeBytes(decodedString);
}
catch (IOException e) {
System.out.println("Error creating file");
}
catch(NullPointerException e) {
System.out.println(e.getMessage());
}
}
The problem is two-fold.
You're converting a byte[] array to String, which is a lossy operation for actual binary data for most character sets (except maybe iso-8859-1).
You're using DataOutputStream, which is not a generic output stream, but intended for a specific serialization format of primitive types. And specifically its writeBytes method comes with an important caveat ("Each character in the string is written out, in sequence, by discarding its high eight bits."), which is one more reason why only using iso-8859-1 will likely work.
Instead, write the byte array directly to the file:
public static void buildFile() {
String exampleofencoded = "asdfasdfasdfadfa";
File file = new File("folder/file.jks");
try (OutputStream fos = Files.newOutputStream(file.toPath())) {
Base64.Decoder decoder = Base64.getDecoder();
byte[] decodedbytes = decoder.decode(exampleofencoded);
fos.write(decodedbytes);
} catch (IOException e) {
System.out.println("Error creating file");
}
}
As an aside, you shouldn't catch NullPointerException in your code; it is almost always a problem that can be prevented by careful programming and/or validation of inputs. I would usually also advise against catching the IOException here and only printing a message. It is probably better to propagate that exception as well, and let the caller handle it.
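A version that propagates the IOException might look like the sketch below. The class and method names are hypothetical, and the sample bytes in main are arbitrary binary values rather than a real truststore; the point is only that the decoded byte[] goes to the file untouched, with no String in between.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, shown only to illustrate the advice above.
final class TruststoreWriter {

    // Decodes the Base64 text and writes the raw bytes as-is.
    // IOException is declared so the caller decides how to handle it.
    static void writeDecoded(String base64, Path target) throws IOException {
        byte[] decoded = Base64.getDecoder().decode(base64);
        Files.write(target, decoded);
    }

    public static void main(String[] args) throws IOException {
        // Arbitrary binary sample, including bytes above 0x7F that a String round trip could mangle.
        byte[] sample = {(byte) 0xFE, (byte) 0xED, (byte) 0x80, (byte) 0xFF, 0, 1, 2, 3};
        Path target = Files.createTempFile("truststore-demo", ".bin");
        writeDecoded(Base64.getEncoder().encodeToString(sample), target);
        System.out.println("bytes preserved: " + Arrays.equals(sample, Files.readAllBytes(target)));
        Files.delete(target);
    }
}
```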

Cannot get my base64 string to decode on my javascript client

I am sending data from my java tomcat server to my browser using a WebSocket. I get the error: "Uncaught InvalidCharacterError: 'atob' failed: The string to be decoded is not correctly encoded."
Here is my code:
(java server code):
public void open(Session session)
{
String base64ImageString = generateImageString();
try
{
session.getBasicRemote().sendText(base64ImageString);
}
catch(IOException e)
{
e.printStackTrace();
}
}
private String generateImageString()
{
int imageData[] = new int[2];
imageData[0] = 255;
imageData[1] = 128;
String base64Image = "";
for(int i = 0; i < imageData.length; i++)
{
try
{
base64Image += Base64.encode(Integer.toString(imageData[i]).getBytes("UTF8"));
}
catch( UnsupportedEncodingException e)
{
e.printStackTrace();
}
}
return base64Image;
}
(JavaScript code):
function onMessage(evt)
{
base64ImageDataString = evt.data;
imageDataString = window.atob(base64ImageDataString);
}
My base64 string looks like this on the java and javascript side: [B@74193bd0[B@24a6103c
I am using org.glassfish.jersey.internal.util.Base64 if it matters. I am really stumped :(
My base64 string looks like this on the java and javascript side: [B@74193bd0[B@24a6103c
That's not base64. That's the concatenation of the result of calling toString() on two byte arrays. You're using a method which returns a byte[], not a string, which means your string concatenation is inappropriate. You could use Base64.encodeAsString - or use a different base64 library entirely (e.g. the iharder one). But really you shouldn't be doing any string concatenation.
Your generateImageString code is completely broken. It's not at all clear why you'd convert an integer to a string, get the UTF-8 representation of that, and then convert the byte array to base64... and then do that in a loop. That's just not the way to get anything meaningful.
I suspect you should actually be starting with a byte[] rather than an int[] - it's not clear what those values are meant to be - but then you want a single call to Base64.encode, passing the byte[] in. If you're calling Integer.toString or concatenating bits of Base64 data, you're doing it wrong.
[B@24a6103c is what you get when a byte array is converted to a string, since Base64.encode returns a byte array.
You need to convert the byte array to a string before concatenating it to the String base64Image.
I think you want to do this:
base64Image += new String(Base64.encode(Integer.toString(imageData[i]).getBytes("UTF8")));
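For comparison, a single encode call over a byte[] could look like the sketch below. It uses java.util.Base64 from Java 8+ rather than the Jersey internal class from the question, and it assumes the two values are meant to be raw byte values; the class and method names are made up for illustration.

```java
import java.util.Base64;

// Hypothetical helper; assumes the image data is available as raw bytes.
final class ImagePayload {

    // One call over the whole array, returning a String directly.
    static String encode(byte[] imageData) {
        return Base64.getEncoder().encodeToString(imageData);
    }

    public static void main(String[] args) {
        // 255 and 128 do not fit in a signed Java byte literal, hence the casts.
        byte[] imageData = {(byte) 255, (byte) 128};
        System.out.println(encode(imageData)); // prints "/4A="
    }
}
```

A string produced this way is valid input for window.atob on the JavaScript side.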

How SHA-1 hashing works on Java on Android

I searched around for how to hash device identifiers and stumbled on the following code.
I don't really understand what it's doing.
Why do I need to urlEncode the device id ?
Why do I need to hash the bytes, couldn't I just do that on a String ?
Why do I need to convert it to a BigInteger ?
Why do I need to shift bits to get a String with the hashed id ?
Can anyone explain what's going on line by line? I hope this will help other people understand this snippet that's getting passed around in blogs and forums, too.
String hashedId = "";
String deviceId = urlEncode(Secure.getString(context.getContentResolver(), Secure.ANDROID_ID));
try {
MessageDigest digest = MessageDigest.getInstance("SHA-1");
byte bytes[] = digest.digest(deviceId.getBytes());
BigInteger b = new BigInteger(1, bytes);
hashedId = String.format("%0" + (bytes.length << 1) + "x", b);
} catch (NoSuchAlgorithmException e) {
//ignored
}
return hashedId;
Why do I need to urlEncode the device id ?
Why do I need to hash the bytes, couldn't I just do that on a String ?
Most hashing algorithms, including SHA-1, work on binary data as input (i.e. bytes). Strings themselves don't have a specific binary representation; it changes depending on the encoding.
The line of code they provide uses the default encoding, which is a bit fragile. I would prefer to see something like
byte bytes[] = digest.digest(deviceId.getBytes(Charset.forName("UTF-8")));
Why do I need to convert it to a BigInteger ?
This is being used for convenience to help with the conversion to a hexadecimal representation.
Why do I need to shift bits to get a String with the hashed id ?
The format String being used is %0Nx, which causes the string to be zero-padded to N characters. Since it takes two characters to represent a byte in hexadecimal, N is bytes.length * 2, which is the same as bytes.length << 1.
I don't really understand why you wouldn't just include Guava for Android and use the Hashing builder:
String hash = Hashing.sha1().hashString(deviceId, Charsets.UTF_8).toString();
It's one line and doesn't throw checked exceptions.
About the bit-shifting: shifting left by one is equivalent to multiplying by 2. Each byte in the string is represented by 2 hex characters, so the resulting string will be twice as long as the number of bytes in the hash.
This will create a format string like %040x for SHA-1 (whose digest is 20 bytes), which prints an integral value as a zero-padded 40-character hex string.
You need to hash the bytes, rather than the String, so that you're hashing the character data rather than the String object, which may have unpredictable internal state for a given sequence of characters.
It's converted to BigInteger so it can be consistently formatted with two hex digits per byte. (This is why the length is multiplied by two with the left shift.)
Basically, the answer to all of your questions is: so that you get reliable, repeatable results, even on different platforms.
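Putting those points together, the snippet can be written as a small self-contained method. This is a sketch rather than the original author's code: the class and method names are invented, the charset is fixed to UTF-8 instead of the platform default, and the urlEncode step is left out.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper illustrating the explanation above.
final class Sha1Hex {

    static String sha1Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            // Hash a well-defined byte sequence, not the platform-default encoding.
            byte[] bytes = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            // BigInteger(1, bytes) treats the digest as a non-negative number;
            // the width bytes.length << 1 (40 for SHA-1) restores leading zeros.
            return String.format("%0" + (bytes.length << 1) + "x", new BigInteger(1, bytes));
        } catch (NoSuchAlgorithmException e) {
            // Fail loudly instead of returning an empty string.
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha1Hex("abc")); // prints a9993e364706816aba3e25717850c26c9cd0d89d
    }
}
```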
You can also use this code:
public class sha1Calculate {
public static void main(String[] args)throws Exception
{
File file = new File("D:\\Android Links.txt");
String outputTxt= "";
String hashcode = null;
try {
FileInputStream input = new FileInputStream(file);
ByteArrayOutputStream output = new ByteArrayOutputStream ();
byte [] buffer = new byte [65536];
int l;
while ((l = input.read (buffer)) > 0)
output.write (buffer, 0, l);
input.close ();
output.close ();
byte [] data = output.toByteArray ();
MessageDigest digest = MessageDigest.getInstance( "SHA-1" );
byte[] bytes = data;
digest.update(bytes, 0, bytes.length);
bytes = digest.digest();
StringBuilder sb = new StringBuilder();
for( byte b : bytes )
{
sb.append( String.format("%02X", b) );
}
System.out.println("Digest(in hex format):: " + sb.toString());
}catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (NoSuchAlgorithmException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}

GZIP decompress string and byte conversion

I have a problem in code:
private static String compress(String str)
{
String str1 = null;
ByteArrayOutputStream bos = null;
try
{
bos = new ByteArrayOutputStream();
BufferedOutputStream dest = null;
byte b[] = str.getBytes();
GZIPOutputStream gz = new GZIPOutputStream(bos,b.length);
gz.write(b,0,b.length);
bos.close();
gz.close();
}
catch(Exception e) {
System.out.println(e);
e.printStackTrace();
}
byte b1[] = bos.toByteArray();
return new String(b1);
}
private static String deCompress(String str)
{
String s1 = null;
try
{
byte b[] = str.getBytes();
InputStream bais = new ByteArrayInputStream(b);
GZIPInputStream gs = new GZIPInputStream(bais);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
int numBytesRead = 0;
byte [] tempBytes = new byte[6000];
try
{
while ((numBytesRead = gs.read(tempBytes, 0, tempBytes.length)) != -1)
{
baos.write(tempBytes, 0, numBytesRead);
}
s1 = new String(baos.toByteArray());
s1= baos.toString();
}
catch(ZipException e)
{
e.printStackTrace();
}
}
catch(Exception e) {
e.printStackTrace();
}
return s1;
}
public String test() throws Exception
{
String str = "teststring";
String cmpr = compress(str);
String dcmpr = deCompress(cmpr);
}
This code throws java.io.IOException: unknown format (magic number ef1f) at this line:
GZIPInputStream gs = new GZIPInputStream(bais);
It turns out that the bytes get corrupted when converting with new String(b1) and then byte b[] = str.getBytes(): after the round trip there are more bytes than before. If I avoid the conversion to a string and work with the bytes directly, everything works. Sorry for my English.
public String unZip(String zipped) throws DataFormatException, IOException {
byte[] bytes = zipped.getBytes("WINDOWS-1251");
Inflater decompressed = new Inflater();
decompressed.setInput(bytes);
byte[] result = new byte[100];
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int count;
while ((count = decompressed.inflate(result)) != 0)
buffer.write(result, 0, count); // write only the bytes inflate() actually produced
decompressed.end();
return new String(buffer.toByteArray(), charset);
}
I use this function to decompress the server response. Thanks for the help.
You have two problems:
You're using the default character encoding to convert the original string into bytes. That will vary by platform. It's better to specify an encoding - UTF-8 is usually a good idea.
You're trying to represent the opaque binary data of the result of the compression as a string by just calling the String(byte[]) constructor. That constructor is only meant for data which is encoded text... which this isn't. You should use base64 for this. There's a public domain base64 library which makes this easy. (Alternatively, don't convert the compressed data to text at all - just return a byte array.)
Fundamentally, you need to understand how different text and binary data are - when you want to convert between the two, you should do so carefully. If you want to represent "non text" binary data (i.e. bytes which aren't the direct result of encoding text) in a string you should use something like base64 or hex. When you want to encode a string as binary data (e.g. to write some text to disk) you should carefully consider which encoding to use. If another program is going to read your data, you need to work out what encoding it expects - if you have full control over it yourself, I'd usually go for UTF-8.
Additionally, the exception handling in your code is poor:
You should almost never catch Exception; catch more specific exceptions
You shouldn't just catch an exception and continue as if it had never happened. If you can't really handle the exception and still complete your method successfully, you should let the exception bubble up the stack (or possibly catch it and wrap it in a more appropriate exception type for your abstraction)
When you GZIP compress data, you always get binary data. This data cannot be converted into string as it is no valid character data (in any encoding).
So your compress method should return a byte array and your decompress method should take a byte array as its parameter.
Furthermore, I recommend you use an explicit encoding when you convert the string into a byte array before compression and when you turn the decompressed data into a string again.
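A byte[]-based version along those lines might look like this. It is a minimal sketch with invented class and method names; it fixes the encoding to UTF-8 and wraps IOException in UncheckedIOException rather than swallowing it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: compressed data stays a byte[] and is never turned into a String.
final class GzipText {

    static byte[] compress(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so read bos only after the try block.
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static String decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] packed = compress("teststring");
        System.out.println("round trip ok: " + decompress(packed).equals("teststring"));
    }
}
```

If the compressed data really must travel as text, Base64-encode the byte[] returned by compress instead of calling new String on it.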
When you GZIP compress data, you always get binary data. This data cannot be converted into string as it is no valid character data (in any encoding).
Codo is right, thanks a lot for enlightening me. I was trying to decompress a string (converted from the binary data). My fix was to use InflaterInputStream directly on the input stream returned by my HTTP connection. (My app was retrieving a large JSON document of strings.)

Converting part of a ByteBuffer to a String

I have a ByteBuffer containing bytes that were derived by String.getBytes(charsetName), where "containing" means that the string comprises the entire sequence of bytes between the ByteBuffer's position() and limit().
What's the best way for me to get the string back (assuming I know the encoding charset)? Is there anything better than the following, which seems a little clunky?
byte[] ba = new byte[bbuf.remaining()];
bbuf.get(ba);
try {
String s = new String(ba, charsetName);
}
catch (UnsupportedEncodingException e) {
/* take appropriate action */
}
String s = Charset.forName(charsetName).decode(bbuf).toString();
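One thing to be aware of with the one-liner above is that Charset.decode consumes the buffer: afterwards its position is at its limit. It also replaces malformed input with the charset's replacement string rather than throwing. If the caller's buffer should stay untouched, decoding a duplicate is one option, sketched below with invented class and method names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper wrapping the one-liner.
final class BufferDecode {

    // Decodes the bytes between position() and limit() without moving the caller's position.
    static String decodeRemaining(ByteBuffer buffer, Charset charset) {
        // duplicate() shares the content but has its own position and limit.
        return charset.decode(buffer.duplicate()).toString();
    }

    public static void main(String[] args) {
        ByteBuffer bbuf = ByteBuffer.wrap("header:payload".getBytes(StandardCharsets.UTF_8));
        bbuf.position(7); // skip "header:"
        System.out.println(decodeRemaining(bbuf, StandardCharsets.UTF_8)); // prints "payload"
        System.out.println(bbuf.position()); // prints 7
    }
}
```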
