I'm trying to uncompress data that was compressed using the ZLIB library written by Jean-loup Gailly and Mark Adler back in the 1990s. I think it is a popular library (I see a lot of programs that ship the zlib32.dll file it uses), so I hope someone is familiar enough with it to help me. I am calling the compress() function directly, which, from what I have read, uses the RFC 1951 DEFLATE format.
Here is a segment of the code I am using to read some compressed data from a stream and uncompress it:
InputStream is = new ByteArrayInputStream(buf);
//GZIPInputStream gzis = new GZIPInputStream(is);
InflaterInputStream iis = new InflaterInputStream(is);
byte[] buf2 = new byte[uncompressedDataLength];
iis.read(buf2);
The call iis.read(buf2) throws an internal "Data Format Error" exception. I also tried GZIPInputStream, but it throws the same exception.
The "buf" variable is of type byte[], and I have confirmed by debugging that it is the same as what my C program gets back from the ZLIB compress() function (the actual data comes from a server over TCP). "uncompressedDataLength" is the known size of the uncompressed data, which is also provided by the C program (server).
Has anyone tried reading/writing data using this library and then reading/writing the same data on Android using Java?
I did find a "pure Java port of ZLIB" referenced in a few places, and if I need to I can try that, but I would rather use the built-in/OS functions if possible.
The data formats deflate, zlib and gzip in play here are all related.
The base is the deflate compressed data format, defined in RFC 1951.
Since raw deflate data has no header and no check sum, it is not very useful on its own, so we usually wrap it in another format.
The gzip compressed data format (RFC 1952) is intended for compression of files. It consists of a header which has space for a file name and some attributes, a deflate data stream, and a CRC-32 check sum (4 bytes) at the end. (The specification also supports multiple such members in one stream, but I think this isn't used very often.)
The zlib compressed data format, defined in RFC 1950: It consists of a smaller header (2 or 6 bytes), a deflate data stream, and an Adler-32 check sum (4 bytes) at the end. (The Adler-32 check sum is intended to be faster to calculate than the CRC-32 check sum used in gzip.) It is intended for compressed transmission of data inside some other protocols, or compressed storage inside other file formats. For example, it is used inside the PNG file format.
The zlib library supports all these formats. Java's java.util.zip is built on zlib (as part of the VM's implementation/native calls) and exposes these formats through several classes:
The Deflater and Inflater classes implement, depending on the nowrap argument to the constructor, either the zlib or the raw deflate data format.
DeflaterOutputStream/DeflaterInputStream/InflaterInputStream/InflaterOutputStream build on a Deflater/Inflater. The documentation doesn't say clearly whether the default Inflater/Deflater implements zlib or deflate, but the source shows that it uses the default Deflater or Inflater constructor, which implements zlib.
GZIPOutputStream/GZIPInputStream implement, as the name says, the gzip format.
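A small sketch of the nowrap distinction, using only java.util.zip (the class and method names are illustrative). The same text compressed with nowrap=false starts with a zlib header (first byte 0x78 for the default window size, and the 16-bit header is a multiple of 31), while nowrap=true produces bare deflate data that is 6 bytes shorter (no header, no Adler-32). Decompression is only expected to work when the Inflater's nowrap flag matches the one used to compress.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class ZlibVsRaw {
    // nowrap = false: zlib format (RFC 1950); nowrap = true: raw deflate (RFC 1951)
    static byte[] compress(byte[] input, boolean nowrap) {
        Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        try {
            def.setInput(input);
            def.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!def.finished()) {
                int n = def.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            def.end(); // free the native zlib state
        }
    }

    // The nowrap flag must match the one used for compression.
    static byte[] decompress(byte[] data, boolean nowrap) throws DataFormatException {
        Inflater inf = new Inflater(nowrap);
        try {
            inf.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inf.finished()) {
                int n = inf.inflate(buf); // throws DataFormatException if the data is invalid for this mode
                if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                    throw new DataFormatException("truncated input or preset dictionary required");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inf.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] text = "hello hello hello zlib".getBytes(StandardCharsets.UTF_8);
        byte[] zlib = compress(text, false);
        byte[] raw = compress(text, true);
        System.out.printf("zlib: %d bytes, starts with %02X %02X%n", zlib.length, zlib[0] & 0xFF, zlib[1] & 0xFF);
        System.out.printf("raw deflate: %d bytes%n", raw.length);
        System.out.println(new String(decompress(zlib, false), StandardCharsets.UTF_8));
    }
}
```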
I had a look at the source code of zlib's compress function, and it seems to use the zlib format. So your code should do the right thing. Make sure that no data is missing and that there is no additional data before or after the compressed data block.
Disclaimer: This describes Java SE; I suppose it is similar on Android, but I can't guarantee this.
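Since compress() produces the zlib format, the default InflaterInputStream from the question should be the matching decoder. Below is a hedged sketch of a more defensive version, assuming buf holds exactly one complete zlib stream (ZlibPayload is a made-up name). It checks the 2-byte zlib header first, which is a cheap way to spot extra leading bytes, and it uses DataInputStream.readFully, because a single read() call is allowed to return fewer bytes than requested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class ZlibPayload {
    // Decodes the output of zlib's compress() (zlib format, RFC 1950).
    static byte[] uncompress(byte[] buf, int uncompressedDataLength) throws IOException {
        // A zlib stream starts with a CMF byte whose low nibble is 8 (deflate),
        // and the first two bytes read as a big-endian number are a multiple of 31.
        if (buf.length < 2 || (buf[0] & 0x0F) != 8
                || (((buf[0] & 0xFF) << 8) | (buf[1] & 0xFF)) % 31 != 0) {
            throw new IOException("buffer does not start with a zlib header");
        }
        byte[] buf2 = new byte[uncompressedDataLength];
        try (DataInputStream in = new DataInputStream(
                new InflaterInputStream(new ByteArrayInputStream(buf)))) {
            // read() may return fewer bytes than requested; readFully loops until buf2 is full
            // and throws EOFException if the compressed data ends too early.
            in.readFully(buf2);
        }
        return buf2;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the C server: java.util.zip's default Deflater also writes the zlib format.
        byte[] original = "data from the C server".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(bos)) {
            dos.write(original);
        }
        byte[] buf2 = uncompress(bos.toByteArray(), original.length);
        System.out.println(new String(buf2, StandardCharsets.UTF_8));
    }
}
```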
The jzlib library you found (I suppose), which is a Java reimplementation of zlib, also implements all these data formats (gzip was added in the latest update). For interactive use (on the compressing side) it is preferable, since it allows some flushing actions which are not possible with java.util's classes (other than using some workaround like changing the compression level), and it also might be faster since it avoids native calls (which always have some overhead).
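On the flushing point: since Java 7, DeflaterOutputStream also has a constructor with a syncFlush flag, and flush() then performs a zlib SYNC_FLUSH, so everything written so far can be decompressed by the receiver before the stream is finished. A minimal sketch (SyncFlushDemo and its methods are illustrative names); whether this covers all the flushing modes an interactive protocol needs is a separate question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;

final class SyncFlushDemo {
    // Compresses `message`, calls flush(), and returns what has reached the wire at that point.
    static byte[] compressAndFlush(String message) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // syncFlush = true (Java 7+): flush() uses Deflater.SYNC_FLUSH instead of doing nothing useful
        try (DeflaterOutputStream out = new DeflaterOutputStream(sink, true)) {
            out.write(message.getBytes(StandardCharsets.UTF_8));
            out.flush();
            return sink.toByteArray(); // snapshot taken before close() appends the final block
        }
    }

    // Decompresses an unfinished zlib stream as far as the available bytes allow.
    static String inflatePartial(byte[] partial) throws DataFormatException {
        Inflater inf = new Inflater();
        try {
            inf.setInput(partial);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            while ((n = inf.inflate(buf)) > 0) { // returns 0 once all available input is used up
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            inf.end();
        }
    }

    public static void main(String[] args) throws IOException, DataFormatException {
        byte[] partial = compressAndFlush("first message");
        System.out.println(partial.length + " bytes on the wire decode to: " + inflatePartial(partial));
    }
}
```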
PS: The zip (or pkzip) file format is also related: It uses deflate internally for each file inside the archive.
Related
I'm trying to learn how to use DeflaterOutputStream to kill time during my winter break. I'm confused because the documentation at https://docs.oracle.com/javase/7/docs/api/java/util/zip/DeflaterOutputStream.html says that deflate() writes the next block of compressed data to the output stream, while write() writes data to the DeflaterOutputStream (the compressed output stream) to be compressed.
However, none of the sample code I've found on the internet uses deflate() at all. All the code I've seen so far just calls write() on the DeflaterOutputStream without calling deflate().
https://stackoverflow.com/a/13060441/12181863
https://www.programcreek.com/java-api-examples/?api=java.util.zip.DeflaterOutputStream
I noticed that the code wraps a FileOutputStream inside the DeflaterOutputStream, but how do they interact? Does it automatically call deflate() to send compressed data to the FileOutputStream when data is written to the DeflaterOutputStream?
It's protected: it is intended for subclasses of that stream. You're not subclassing it, so as far as you are concerned it is an implementation detail: you cannot include it in your reasoning, and it isn't meant for you to invoke.
Unless, of course, you subclass it.
Which you could: it's sort of a toolkit for building deflate-based compression streams on top of. That's why both GZIPOutputStream and ZipOutputStream extend it: those are different containers that use the same compression technology, and they do invoke deflate(). Unless you're developing your own deflate-based container format, or implementing a writer for an existing one that is neither zip nor gzip, this is not meant for you.
These kinds of output streams are called filter streams (they extend FilterOutputStream): they do not themselves represent any resource; they wrap around one. They can wrap around any OutputStream (or any InputStream; the concept works on both sides, so to speak) and modify bytes in transit.
var out = new DeflaterOutputStream(whatever) creates a new deflater stream that compresses any data you send to it (via out.write(stuff)) and in turn sends the compressed data on to whatever. It does the job of:
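To illustrate the subclassing case, here is a sketch for an invented container format (the magic bytes "MGC1", raw deflate data, then a big-endian CRC-32 of the uncompressed data). It loosely follows the pattern GZIPOutputStream uses: the subclass overrides write() and finish(), and it is the subclass, not the caller, that calls the protected deflate() method. The format and the class name are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical container: 4-byte magic "MGC1", raw deflate data, 4-byte big-endian CRC-32.
final class MagicDeflaterOutputStream extends DeflaterOutputStream {
    private static final byte[] MAGIC = {'M', 'G', 'C', '1'};
    private final CRC32 crc = new CRC32();

    MagicDeflaterOutputStream(OutputStream out) throws IOException {
        // nowrap = true: raw deflate, without zlib's own header and trailer
        super(out, new Deflater(Deflater.DEFAULT_COMPRESSION, true));
        out.write(MAGIC); // header goes straight to the wrapped stream
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        super.write(b, off, len); // feeds the Deflater; super calls deflate() as needed
        crc.update(b, off, len);  // write(int) in the superclass delegates here as well
    }

    @Override
    public void finish() throws IOException {
        if (!def.finished()) {
            def.finish();
            while (!def.finished()) {
                deflate(); // protected helper: compresses into buf and writes it to out
            }
            long v = crc.getValue();
            out.write(new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v});
        }
    }

    @Override
    public void close() throws IOException {
        try {
            super.close(); // calls finish(), then closes the wrapped stream
        } finally {
            def.end(); // we supplied our own Deflater, so super.close() does not end it
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (MagicDeflaterOutputStream out = new MagicDeflaterOutputStream(bos)) {
            out.write("payload".getBytes(StandardCharsets.UTF_8)); // callers only use write()
        }
        System.out.println(bos.size() + " bytes: magic + raw deflate + CRC-32");
    }
}
```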
taking bytes (via out.write) and buffering as much as is needed to compress them;
and then processing the compressed data, as it becomes available, by sending it to the wrapped output stream (whatever, in this example) through that stream's write method.
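To see that write() alone is enough, one can put a byte-counting filter between the DeflaterOutputStream and a ByteArrayOutputStream. CountingOutputStream is a hypothetical helper, not a JDK class. The caller only ever calls write() and close(), yet the compressed bytes still arrive downstream, because DeflaterOutputStream calls its own deflate() internally. How many bytes arrive before close() depends on the Deflater's internal buffering.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

// Hypothetical helper: records how many bytes the wrapped stream receives.
final class CountingOutputStream extends FilterOutputStream {
    long bytesWritten;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        bytesWritten++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len); // bypass FilterOutputStream's byte-at-a-time default
        bytesWritten += len;
    }
}

final class FilterChainDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        CountingOutputStream counter = new CountingOutputStream(sink);
        try (DeflaterOutputStream out = new DeflaterOutputStream(counter)) {
            for (int i = 0; i < 1000; i++) {
                out.write(("line " + i + "\n").getBytes(StandardCharsets.UTF_8)); // only write() is called
            }
        } // close() finishes the compressed data and closes counter and sink
        System.out.println("compressed bytes passed downstream: " + counter.bytesWritten);
    }
}
```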
The basic usage is:
Create a resource, such as Files.newOutputStream, someSocket.getOutputStream(), httpServletResponse.getOutputStream(), System.out, or anything else that produces a stream. It's an abstract concept for a reason: to make things flexible.
Wrap that resource into a DeflaterOutputStream
Write all your data to the DeflaterOutputStream. Forget about the original: you made it so you could pass it to the DeflaterOutputStream, and that's where your interaction with the underlying stream ends.
Close the DeflaterOutputStream (which will close the underlying stream as well).
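The four steps above might look like this with a file. This is a sketch; the class and method names are illustrative, and readAllBytes/String.repeat need Java 9/11. try-with-resources takes care of the closing step.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

final class DeflateFileDemo {
    static void writeCompressed(Path file, String text) throws IOException {
        // Create the resource and wrap it in one expression; only the wrapper is kept.
        try (OutputStream out = new DeflaterOutputStream(Files.newOutputStream(file))) {
            out.write(text.getBytes(StandardCharsets.UTF_8)); // talk only to the wrapper
        } // closing the wrapper finishes the compressed data and closes the file
    }

    static String readCompressed(Path file) throws IOException {
        try (InputStream in = new InflaterInputStream(Files.newInputStream(file))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // readAllBytes: Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".deflate");
        writeCompressed(file, "hello, compressed world\n".repeat(100)); // repeat: Java 11+
        System.out.println(Files.size(file) + " bytes on disk, "
                + readCompressed(file).length() + " characters after decompression");
        Files.delete(file);
    }
}
```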
We usually compress data with DeflaterOutputStream (or GZIPOutputStream) and decompress it with InflaterInputStream (or GZIPInputStream).
But we have had DeflaterInputStream and InflaterOutputStream since Java 1.6. What are these two classes used for?
You can use them if you need to process data in its compressed (deflated) format, or if you have compressed data in memory that you need to decompress and output. This could come in handy if you need to store compressed data in some location that does not handle streams very well (such as a database field), or if you have obtained compressed data from a non-stream source and want to decompress it to a stream destination.
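A sketch of both directions, assuming the "database field" is simply a byte[] here. DeflaterInputStream compresses data as it is read, so you can pull compressed bytes out of an uncompressed source; InflaterOutputStream decompresses data as it is written, so you can push stored compressed bytes into any destination stream. Both use the default zlib format. CompressedBlobs and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterInputStream;
import java.util.zip.InflaterOutputStream;

final class CompressedBlobs {
    // Reads uncompressed data from `source` and returns it compressed, e.g. to store in a BLOB column.
    // DeflaterInputStream compresses while it is being *read*.
    static byte[] toCompressedBytes(InputStream source) throws IOException {
        try (InputStream in = new DeflaterInputStream(source)) {
            return in.readAllBytes(); // Java 9+
        }
    }

    // Writes compressed bytes (e.g. loaded from a database) to `destination` in decompressed form.
    // InflaterOutputStream decompresses while it is being *written to*; closing it also closes destination.
    static void writeDecompressed(byte[] compressed, OutputStream destination) throws IOException {
        try (OutputStream out = new InflaterOutputStream(destination)) {
            out.write(compressed);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "row data row data row data".getBytes(StandardCharsets.UTF_8);
        byte[] blob = toCompressedBytes(new ByteArrayInputStream(original)); // e.g. PreparedStatement.setBytes
        ByteArrayOutputStream dest = new ByteArrayOutputStream();
        writeDecompressed(blob, dest);
        System.out.println(new String(dest.toByteArray(), StandardCharsets.UTF_8));
    }
}
```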
My app creates a large amount of output, but only over a long time. Each time there is new output to add it is just a string (a few hundred bytes worth).
It would simplify my code considerably if I could add incrementally (i.e. append) to a pre-existing GZIP (or Zip) file. Is this even possible (in Java, specifically)?
I am looking for a solution that will create a file that can be opened by 3rd party apps.
I realize I can decompress the file, add the additional text and compress it again as a new blob.
Yes. See this example in C in the examples directory of the zlib distribution: gzlog.h and gzlog.c. It does exactly that, allowing you to append short pieces of data to a gzip file. It does so efficiently, by not compressing the additions until a threshold is reached, and then compressing what hasn't been compressed so far. After each addition, the gzip file contains the addition and is a valid gzip file. The code also protects against system crashes in the middle of an append operation, recovering the file on the next append operation.
Though allowed by the format, this code does not simply concatenate short gzip streams. That would result in very poor compression.
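For comparison, the simple concatenation approach that the answer advises against looks like this in Java (a sketch; GzipAppend and the file name are hypothetical). Each call appends a complete gzip member, so the file is a valid gzip file after every append, and on current JDKs GZIPInputStream (like gunzip) reads all members in sequence. The costs are the poor compression for short pieces mentioned above and none of gzlog's crash protection; some third-party tools may also read only the first member.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipAppend {
    // Appends `text` as a new, complete gzip member at the end of `file` (creating it if needed).
    static void append(Path file, String text) throws IOException {
        // FileOutputStream(..., true) opens the file in append mode
        try (OutputStream out = new GZIPOutputStream(new FileOutputStream(file.toFile(), true))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    // GZIPInputStream continues across concatenated members and returns their combined contents.
    static String readAll(Path file) throws IOException {
        try (InputStream in = new GZIPInputStream(new FileInputStream(file.toFile()))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Paths.get("app-log.gz"); // hypothetical file name
        append(log, "first entry\n");
        append(log, "second entry\n");
        System.out.print(readAll(log));
    }
}
```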
I have implemented a program that transfers text files over a UDP socket in Java. I am using PrintWriter to write and read. But that way I am not able to transfer any file other than txt (say I want to transfer a PDF). What should be done in this case? I am using the code below to write the file.
Output_File_Write = new PrintWriter("dummy.txt");
Output_File_Write.print(new String(p.getData()));
Writers / PrintWriters are for writing text files. They take (Unicode-based) character data and encode it using the default character encoding (or a specified one), and write that to the file.
A PDF document (as you get it from the network) is in a binary format, so you need to use a FileOutputStream to write the file.
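A minimal sketch of the receiving side writing raw bytes, assuming one datagram carries one chunk of the file and the chunks arrive in order (which UDP does not guarantee). getData() returns the whole receive buffer, so only getLength() bytes starting at getOffset() should be written; new String(p.getData()) also picks up the unused tail of the buffer. PacketToFile and dummy.pdf are placeholder names.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.DatagramPacket;

final class PacketToFile {
    // Writes the payload of one received datagram to `out` unchanged (no character decoding).
    static void writePayload(DatagramPacket p, OutputStream out) throws IOException {
        // Only getLength() bytes starting at getOffset() are the received payload.
        out.write(p.getData(), p.getOffset(), p.getLength());
    }

    public static void main(String[] args) throws IOException {
        // Simulated packet; in a real receiver this would come from socket.receive(p).
        byte[] payload = {0x25, 0x50, 0x44, 0x46, (byte) 0xE2, (byte) 0xE3}; // "%PDF" plus non-ASCII bytes
        DatagramPacket p = new DatagramPacket(payload, payload.length);
        try (OutputStream out = new FileOutputStream("dummy.pdf")) {
            writePayload(p, out);
        }
    }
}
```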
It is also a little bit concerning that you are attempting to transfer documents using UDP. UDP provides no guarantees that the datagrams sent will all arrive, or that they will arrive in the same order as they were sent. Unless you can always fit the entire document into a single datagram, you will have to do a significant amount of work to detect that datagrams have been dropped or have arrived in the wrong order ... and take remedial action.
Using TCP would be far simpler.
AFAIK PrintWriter is meant to be used with text. Quoting the docs:
Prints formatted representations of objects to a text-output stream. This class implements all of the print methods found in PrintStream. It does not contain methods for writing raw bytes, for which a program should use unencoded byte streams.
To be able to send binary data you would need to use an appropriate API for it, for example PrintStream, whose write methods take raw bytes.
I have to write code to upload/download a file to/from a remote machine. But when I upload the file, newlines are not preserved, and some binary characters are inserted automatically. Also, I'm not able to save the file in its actual format; I have to save it as "filename.ser". I'm using Java serialization/deserialization.
How exactly are you transmitting the files? If you're using implementations of InputStream and OutputStream, they work on a byte-by-byte level so you should end up with a binary-equal output.
If you're using implementations of Reader and Writer, they convert the bytes to characters according to some character mapping, and then perform the reverse process when saving. Depending on the platform encodings of the various machines (and possibly other effects if you're not specifying the charset explicitly), you could well end up with differences in the binary file.
The fact that you mention newlines makes me think that you're using Readers to send strings (and possibly that you're stitching the strings back together yourself by manually adding newlines). If you want the files to be binary equal, then send them as a stream of bytes and store that stream verbatim. If you want them to be equal as strings in a given character set, then use Readers and Writers but specify the character set explicitly. If you want them to be transmitted as strings in the platform default set (not very useful), then accept that they're not going to be binary equal as files.
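To make the difference concrete, here is a small sketch (BinaryVsText is a hypothetical name). Copying through InputStream/OutputStream preserves every byte, while decoding the same bytes as UTF-8 text and encoding them again, which is effectively what a Reader/Writer pipeline does, replaces the invalid byte 0xFF with U+FFFD, so the result is no longer binary equal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryVsText {
    // Copies bytes verbatim: the destination ends up binary-equal to the source.
    static void copyBytes(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    // Decodes bytes as UTF-8 text and encodes them again, as a Reader/Writer pipeline would.
    // Malformed input (here 0xFF) is replaced with U+FFFD, which encodes to 3 bytes.
    static byte[] roundTripAsText(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] binary = {(byte) 0xFF, 0x0D, 0x0A, 0x00}; // 0xFF never occurs in valid UTF-8
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        copyBytes(new ByteArrayInputStream(binary), copy);
        System.out.println("byte copy equal: " + Arrays.equals(binary, copy.toByteArray()));
        System.out.println("text round trip equal: " + Arrays.equals(binary, roundTripAsText(binary)));
    }
}
```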
(Also, your question really doesn't provide much information to solve it. To me, it basically reads "I wrote some code to do X, and it doesn't work. Where did I go wrong?" You seem to assume that your code is correct by not listing it, but at the same time recognise that it's not...)