I have a binary file whose contents were created by zlib.compress in Python. Is there an easy way to open and decompress it in Clojure?
import zlib
import json
with open('data.json.zlib', 'wb') as f:
    f.write(zlib.compress(json.dumps(data).encode('utf-8')))
Basically it isn't a gzip file; it is just bytes representing deflated data.
I could only find these references, but they aren't quite what I'm looking for (I think the first two are most relevant):
deflateclj_hatemogi_clojure/deflate.clj
funcool/buddy-core/deflate.clj
Compressing / Decompressing strings in clojure
Reading and Writing Compressed Files
clj-http
Must I really implement this multi-line wrapper around java.util.zip, or is there a nice library out there? Actually, I'm not even sure whether these byte streams are compatible across libraries, or whether I'm just trying to mix and match the wrong libraries.
Steps in Python:
>>> '{"hello": "world"}'.encode('utf-8')
b'{"hello": "world"}'
>>> zlib.compress(b'{"hello": "world"}')
b'x\x9c\xabV\xcaH\xcd\xc9\xc9W\xb2RP*\xcf/\xcaIQ\xaa\x05\x009\x99\x06\x17'
>>> [int(i) for i in zlib.compress(b'{"hello": "world"}')]
[120, 156, 171, 86, 202, 72, 205, 201, 201, 87, 178, 82, 80, 42, 207, 47, 202, 73, 81, 170, 5, 0, 57, 153, 6, 23]
>>> import numpy
>>> [numpy.int8(i) for i in zlib.compress(b'{"hello": "world"}')]
[120, -100, -85, 86, -54, 72, -51, -55, -55, 87, -78, 82, 80, 42, -49, 47, -54, 73, 81, -86, 5, 0, 57, -103, 6, 23]
>>> zlib.decompress(bytes([120, 156, 171, 86, 202, 72, 205, 201, 201, 87, 178, 82, 80, 42, 207, 47, 202, 73, 81, 170, 5, 0, 57, 153, 6, 23])).decode('utf-8')
'{"hello": "world"}'
Decode attempt in Clojure:
; https://github.com/funcool/buddy-core/blob/master/src/buddy/util/deflate.clj#L40 without try-catch
(ns so.core
  (:import java.io.ByteArrayInputStream
           java.io.ByteArrayOutputStream
           java.util.zip.Deflater
           java.util.zip.DeflaterOutputStream
           java.util.zip.InflaterInputStream
           java.util.zip.Inflater
           java.util.zip.ZipException)
  (:gen-class))
(defn uncompress
  "Given compressed data as a byte array, uncompress it and return it as another byte array."
  ([^bytes input] (uncompress input nil))
  ([^bytes input {:keys [nowrap buffer-size]
                  :or {nowrap true buffer-size 2048}
                  :as opts}]
   (let [buf (byte-array (int buffer-size))
         os (ByteArrayOutputStream.)
         inf (Inflater. ^Boolean nowrap)]
     (with-open [is (ByteArrayInputStream. input)
                 iis (InflaterInputStream. is inf)]
       (loop []
         (let [readed (.read iis buf)]
           (when (pos? readed)
             (.write os buf 0 readed)
             (recur)))))
     (.toByteArray os))))
(uncompress (byte-array [120, -100, -85, 86, -54, 72, -51, -55, -55, 87, -78, 82, 80, 42, -49, 47, -54, 73, 81, -86, 5, 0, 57, -103, 6, 23]))
ZipException invalid stored block lengths java.util.zip.InflaterInputStream.read (InflaterInputStream.java:164)
Any help would be appreciated. I wouldn't want to use zip or gzip files, as I only care about the raw content, not file names or modification dates, in this context. But it is possible to use another compression algorithm on the Python side if that is the only option.
Here is an easy way to do it with gzip:
Python code:
import gzip
content = "the quick brown fox"
with gzip.open('fox.txt.gz', 'wb') as f:
    f.write(content.encode('utf-8'))
Clojure code:
(with-open [in (java.util.zip.GZIPInputStream.
                 (clojure.java.io/input-stream
                   "fox.txt.gz"))]
  (println "result:" (slurp in)))
;=> result: the quick brown fox
Keep in mind that "gzip" is an algorithm and a format, and does not mean you need to use the "gzip" command-line tool.
Please note that the input to Clojure doesn't have to be a file. You could send the gzip compressed data as raw bytes over a socket and still decompress it on the Clojure side. Full details at: https://clojuredocs.org/clojure.java.io/input-stream
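For instance, here is a rough sketch in plain Java (the same java.util.zip classes the Clojure code reaches through interop; the class name and buffer size are just illustrative choices) that compresses and decompresses gzip data entirely in memory, with no file involved:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    public static void main(String[] args) throws Exception {
        // Compress a string to gzip bytes in memory.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write("the quick brown fox".getBytes("UTF-8"));
        }
        // Decompress straight from the byte array; no file needed.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(compressed.toByteArray()))) {
            byte[] buf = new byte[2048];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
        }
        System.out.println(out.toString("UTF-8")); // the quick brown fox
    }
}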
Update
If you need to use the pure zlib format instead of gzip, the result is very similar:
Python code:
import zlib
with open('balloon.txt.z', 'wb') as fp:
    fp.write(zlib.compress(b'the big red balloon'))
Clojure code:
(with-open [in (java.util.zip.InflaterInputStream.
                 (clojure.java.io/input-stream
                   "balloon.txt.z"))]
  (println "result:" (slurp in)))
;=> result: the big red balloon
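The original failure, for what it's worth, comes from constructing the Inflater with nowrap set to true: that mode expects raw deflate data, while Python's zlib.compress emits the zlib header and trailer (the leading 120, -100 bytes, i.e. 0x78 0x9c), which the default Inflater expects. Here is a minimal sketch in plain Java (the same java.util.zip class Clojure interops with; the class name and buffer size are mine) that inflates the exact byte array from the question:
import java.util.zip.Inflater;

public class ZlibBytes {
    public static void main(String[] args) throws Exception {
        // Output of Python's zlib.compress(b'{"hello": "world"}'),
        // written as signed Java bytes (the same values as the numpy.int8 list above).
        byte[] compressed = {120, -100, -85, 86, -54, 72, -51, -55, -55, 87, -78,
                82, 80, 42, -49, 47, -54, 73, 81, -86, 5, 0, 57, -103, 6, 23};

        // new Inflater() (nowrap = false) understands the zlib wrapper;
        // new Inflater(true) would demand raw deflate data and fail as in the question.
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] buf = new byte[2048];
        int n = inflater.inflate(buf);
        inflater.end();

        System.out.println(new String(buf, 0, n, "UTF-8")); // {"hello": "world"}
    }
}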
Related
I've been trying to figure out why Base64 decoding is different between Dart and Java.
Dart example code
import 'dart:convert';
var str = '640gPKMxZZbeLDIUeXiZmg==';
var dec = base64.decode(str);
print(dec);
prints: [235, 141, 32, 60, 163, 49, 101, 150, 222, 44, 50, 20, 121, 120, 153, 154]
Java example code
import java.util.Arrays;
import java.util.Base64;
String str = "640gPKMxZZbeLDIUeXiZmg==";
byte[] dec = Base64.getDecoder().decode(str);
System.out.println(Arrays.toString(dec));
prints: [-21, -115, 32, 60, -93, 49, 101, -106, -34, 44, 50, 20, 121, 120, -103, -102]
Any ideas? As far as I'm aware they both implement RFC4648.
For the Dart code, I did try using base64url and the normalize function which didn't change anything (to be expected I suppose). Not too sure what else to try.
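One thing worth checking first: the two lists may well be the same bytes printed differently, since Dart prints bytes as unsigned values (0 to 255) while Java's byte type is signed (-128 to 127). A quick Java check (the class name is mine) that masks each signed byte back to its unsigned value reproduces the Dart output exactly:
import java.util.Arrays;
import java.util.Base64;

public class SignedVsUnsigned {
    public static void main(String[] args) {
        byte[] dec = Base64.getDecoder().decode("640gPKMxZZbeLDIUeXiZmg==");

        // Java prints the signed byte values:
        System.out.println(Arrays.toString(dec));
        // [-21, -115, 32, 60, -93, 49, 101, -106, -34, 44, 50, 20, 121, 120, -103, -102]

        // Masking each byte to 0-255 gives exactly what Dart prints:
        int[] unsigned = new int[dec.length];
        for (int i = 0; i < dec.length; i++) {
            unsigned[i] = dec[i] & 0xFF;
        }
        System.out.println(Arrays.toString(unsigned));
        // [235, 141, 32, 60, 163, 49, 101, 150, 222, 44, 50, 20, 121, 120, 153, 154]
    }
}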
I have been working on an application which encrypts data using the 3DES algorithm, but when I tried to build a UI for it I ran into some really strange behaviour. After I encrypt data and represent it as ASCII again I get:
String encryptedASCII = ;nÆ«»Ë?&]º²ÿ
and following array of bytes for it:
[-62, -118, 59, 110, -61, -122, -62, -85, -62, -69, -61, -117, 4, 7, -62, -105, 63, 38, 93, -62, -70, -62, -78, -61, -65]
But when I use:
textField.setText(encryptedASCII)
and once again get it from there to decrypt:
textField.getText()
I got:
;nÆ«»Ë?&]º²ÿ
and bytes for it:
[-62, -118, 59, 110, -61, -122, -62, -85, -62, -69, -61, -117, -62, -105, 63, 38, 93, -62, -70, -62, -78, -61, -65]
That leaves two bytes, [4, 7], missing compared to the array I had before putting its ASCII representation into the text field.
Is there anything I'm missing here? I can't decrypt the data again once it has been changed by the round trip through the text field.
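One widely used way around this is to never treat the ciphertext as displayable text at all: bytes like [4, 7] are control characters, and arbitrary cipher bytes generally don't survive a String/text-field round trip. Here is a rough sketch (variable names and the stand-in byte values are mine) using java.util.Base64 to make the ciphertext safe for a text field:
import java.util.Arrays;
import java.util.Base64;

public class CiphertextAsText {
    public static void main(String[] args) {
        // Stand-in for the bytes returned by the 3DES cipher.
        byte[] ciphertext = {-118, 59, 110, -122, 4, 7, -105, 63, 38, 93, -70, -78, -65};

        // Encode for display: the result is plain ASCII and survives any text field.
        String shown = Base64.getEncoder().encodeToString(ciphertext);
        System.out.println(shown);

        // Later, read the text back from the field and decode it to the exact original bytes.
        byte[] restored = Base64.getDecoder().decode(shown);
        System.out.println(Arrays.equals(ciphertext, restored)); // true
    }
}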
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Java87String {
    public static void main(String[] args) throws UnsupportedEncodingException {
        //byte[] b = {-101, 53, -51, -26, 24, 60, 20, -31, -6, 45, 50, 103, -66, 28, 114, -39, 92, 23, -47, 32, -5, -122, -28, 79, 22, -76, 116, -122, -54, -122};
        //byte[] b = {-76, -55, 85, -50, 80, -23, 27, 62, -94, -74, 47, -123, -119, 94, 90, 61, -63, 73, 56, -48, -54, -4, 11, 79};
        byte[] b = { -5, -122, -28 };
        System.out.println("Input Array :" + Arrays.toString(b));
        System.out.println("Array Length : " + b.length);
        String target = new String(b, StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(target.getBytes("UTF-8")));
        System.out.println("Final Key :" + target);
    }
}
The above code returns the following output in Java 7
Input Array :[-5, -122, -28]
Array Length : 3
[-17, -65, -67]
Final Key :�
The Same code returns the following output in Java 8
Input Array :[-5, -122, -28]
Array Length : 3
[-17, -65, -67, -17, -65, -67, -17, -65, -67]
Final Key :���
It sounds like Java 8 is doing the right thing by replacing each invalid byte with the default replacement sequence [-17, -65, -67] (the UTF-8 bytes of U+FFFD).
Why is there a difference in the output, and are there any known JDK 1.7 bugs that track this issue?
Per the String JavaDoc:
The behavior of this constructor when the given bytes are not valid in the given charset is unspecified. The CharsetDecoder class should be used when more control over the decoding process is required.
I think (-5, -122, -28) is an invalid UTF-8 byte sequence, so the JVM may output anything in this case. If it were a valid one, the different Java versions would probably show the same output.
Does this specific byte sequence have a meaning? just curious
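If you need the decoding to behave the same on every JVM, the CharsetDecoder route that the quoted JavaDoc points to lets you say explicitly what should happen to malformed input. A small sketch (the REPORT choice is just one option; REPLACE with an explicit replacement string is another):
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    public static void main(String[] args) {
        byte[] b = { -5, -122, -28 };
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail loudly instead of
                .onUnmappableCharacter(CodingErrorAction.REPORT); // silently substituting U+FFFD
        try {
            CharBuffer decoded = decoder.decode(ByteBuffer.wrap(b));
            System.out.println("Decoded: " + decoded);
        } catch (CharacterCodingException e) {
            System.out.println("Not valid UTF-8: " + e);
        }
    }
}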
I have a string encoded in Base64:
eJx9xEERACAIBMBKJyKDcTzR_hEsgOxjAcBQFVVNvi3qEsrRnWXwbhHOmzWnctPHPVkPu-4vBQ==
How can I decode it in Scala language?
I tried to use:
val bytes1 = new sun.misc.BASE64Decoder().decodeBuffer(compressed_code_string)
But when I compare the byte array with the correct one that I generated in Python, there is a mismatch. Here is the command I used in Python:
import base64
base64.urlsafe_b64decode(compressed_code_string)
The Byte Array in Scala is:
(120, -100, 125, -60, 65, 17, 0, 32, 8, 4, -64, 74, 39, 34, -125, 113, 60, -47, -2, 17, 44, -128, -20, 99, 1, -64, 80, 21, 85, 77, -66, 45, -22, 18, -54, -47, -99, 101, -16, 110, 17, -50, -101, 53, -89, 114, -45, -57, 61, 89, 15, -69, -2, 47, 5)
And the one generated in Python is:
(120, -100, 125, -60, 65, 17, 0, 32, 8, 4, -64, 74, 39, 34, -125, 113, 60, -47, -2, 17, 44, -128, -20, 99, 1, -64, 80, 21, 85, 77, -66, 45, -22, 18, -54, -47, -99, 101, -16, 110, 17, -50, -101, 53, -89, 114, -45, -57, 61, 89, 15, -69, -18, 47, 5)
Note that there is a single difference near the end of the array.
In Scala, encoding a String to Base64 and decoding it back to the original String using the Java APIs:
import java.util.Base64
import java.nio.charset.StandardCharsets
scala> val bytes = "foo".getBytes(StandardCharsets.UTF_8)
bytes: Array[Byte] = Array(102, 111, 111)
scala> val encoded = Base64.getEncoder().encodeToString(bytes)
encoded: String = Zm9v
scala> val decoded = Base64.getDecoder().decode(encoded)
decoded: Array[Byte] = Array(102, 111, 111)
scala> val str = new String(decoded, StandardCharsets.UTF_8)
str: String = foo
There is unfortunately not just one Base64 encoding. The - character doesn't have the same representation in all encodings. For example, in the MIME encoding, it's not used at all. In the encoding for URLs, it represents the value 62, and this is the one that Python is using. The default sun.misc decoder wants + for 62. If you change the - to +, you get the correct answer (i.e. the Python answer).
In Scala, you can convert the string s to MIME format like so:
s.map{ case '-' => '+'; case '_' => '/'; case c => c }
and then the Java MIME decoder will work.
Both Python and Java are correct in terms of the decoding; they are just following different RFCs. The Python library is using RFC 3548, and the Java library used here follows RFC 4648 and RFC 2045.
Changing the hyphen (-) into a plus (+) in your input string will make both decoded byte arrays identical.
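On Java 8 and later there is also no need to remap characters by hand: java.util.Base64 ships a URL-safe decoder that accepts - and _ directly, matching what Python's base64.urlsafe_b64decode uses. A short sketch (plain Java, callable unchanged from Scala; the class name is mine) using the string from the question:
import java.util.Arrays;
import java.util.Base64;

public class UrlSafeBase64 {
    public static void main(String[] args) {
        String s = "eJx9xEERACAIBMBKJyKDcTzR_hEsgOxjAcBQFVVNvi3qEsrRnWXwbhHOmzWnctPHPVkPu-4vBQ==";
        // getUrlDecoder() understands the URL-safe alphabet ('-' and '_'),
        // so no character substitution is needed before decoding.
        byte[] bytes = Base64.getUrlDecoder().decode(s);
        System.out.println(Arrays.toString(bytes));
    }
}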
I'm stuck while creating a licence manager for an Android app where the licence key is generated on a desktop server and the verification code runs on Android devices. The verification code produces the desired results when executed on the desktop, but the same code produces a different result on Android.
I debugged the problem and reached the point where the results start to diverge.
Here is a code snippet to demonstrate the difference:
byte[] bytes = {-88, 50, -29, 114, 51, 88, 38, -52, 114, 91, -23, -55, 124, 37, -90, -49, 36, -110, -67, -59, -33, -75, 85, -72, -109, 25, -54, 89, 6, 35, -50, -11, -87, -22, 33, -2, 55, -30, 75, -36, -40, -29, -103, 110, 46, -100, -68, 101, -105, 62, 53, -20, -20, -21, -118, -72, -27, 32, 59, 127, 15, -117, 6, 102};
System.out.println(new String(bytes, "UTF-8").hashCode());
On the Oracle JDK the result comes out to be
-24892055
but on an Android phone the result is:
-186036018
Any help will be appreciated.
When you call getBytes() you need to specify an encoding there as well, otherwise you'll get the default encoding from the OS, which could be anything, e.g. showBytes(new String(bytes, "UTF-8").getBytes("UTF-8"));
It's a difference in how Android and Java handle malformed UTF-8. Given the four byte sequence 0xf5 0xa9 0xea 0x21, Android returns two Unicode replacement characters (0xfffd). Oracle's class library returns three Unicode replacement characters.
Here's a simpler example that demonstrates the problem.
byte[] bytes = { (byte) 0xf5, (byte) 0xa9, (byte) 0xea, (byte) 0x21 };
String decoded = new String(bytes, "UTF-8");
for (int i = 0; i < decoded.length(); i++) {
    System.out.print(Integer.toHexString(decoded.charAt(i)) + " ");
}
Oracle's JVM prints
fffd fffd fffd
Android's dalvikvm prints
fffd fffd
Your best bet is to avoid decoding byte sequences using UTF-8 unless you know that they are in fact UTF-8. I've reported this inconsistency to the Dalvik team to investigate: Android bug 23831.
If you use CharsetDecoder, Android uses icu4c to do the conversion. That returns U+fffd U+fffd U+0021, which also seems correct by my reading of the UTF-8 spec. In future releases, Android's String will match Android's CharsetDecoder.
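Along the same lines, if the goal is just a stable fingerprint of the licence bytes, it is safer to compute it over the bytes themselves rather than over a String decoded from them. A rough sketch of that idea (the shortened byte array and the choice of Arrays.hashCode / SHA-256 are only illustrative):
import java.security.MessageDigest;
import java.util.Arrays;

public class StableHash {
    public static void main(String[] args) throws Exception {
        // Shortened stand-in for the licence key bytes from the question.
        byte[] bytes = {-88, 50, -29, 114, 51, 88, 38, -52, 114, 91, -23, -55};

        // Arrays.hashCode is defined purely in terms of the byte values,
        // so it is the same on every JVM, including Android.
        System.out.println(Arrays.hashCode(bytes));

        // A digest of the raw bytes is an even more robust fingerprint.
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(bytes);
        System.out.println(Arrays.toString(digest));
    }
}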