When to use deflate() of DeflaterOutputStream?

I'm trying to learn how to use DeflaterOutputStream as something to kill time during my winter break. I'm confused because when I look at the documentation https://docs.oracle.com/javase/7/docs/api/java/util/zip/DeflaterOutputStream.html, it says that deflate() is used to write compressed data to the OutputStream, while write() is used to write data to the DeflaterOutputStream (compressed OutputStream) to be compressed.
However, I'm looking at sample code on the internet, and none of it uses deflate() at all. All the code I've seen so far just calls write() on the DeflaterOutputStream without calling deflate().
https://stackoverflow.com/a/13060441/12181863
https://www.programcreek.com/java-api-examples/?api=java.util.zip.DeflaterOutputStream
I noticed that the code puts a FileOutputStream inside the DeflaterOutputStream, but how does it interact? Does it automatically call deflate() to send compressed data to FileOutputStream when data is written to DeflaterOutputStream?

It's protected: It is intended for anything subclassing that stream, and you're not subclassing it, so as far as you are concerned, it is an implementation detail you cannot include in your reasoning and which isn't meant for you to invoke.
Unless, of course, you subclass it.
Which you could - it's sort of a toolkit for building LZ-based compression streams on top of. That's why both GZIPOutputStream and ZipOutputStream extend it: Those are different containers that more or less use the same compression technology. And they do invoke that deflate. Unless you're developing your own LZ-based compression system or implementing a writer for an existing, non-zip, non-gz, deflate-based compression format, this is not meant for you.
These kinds of outputstreams are called 'filterstreams': They do not themselves represent any resource, they wrap around one. They can wrap around any OutputStream (or any InputStream, the concept works on 'both sides' so to speak), and modify bytes in transit.
var out = new DeflaterOutputStream(whatever) creates a new deflater stream that will compress any data you send to it (via out.write(stuff)), and it will in turn take the compressed data and send it on to whatever. It does the job of:
Taking bytes (as per out.write) and buffering as much as is needed to do the job of compressing this data.
Then processing the compressed data, as it becomes compressed, by sending it to the wrapped output stream (whatever, in this example), by calling its write method.
The basic usage is:
Create a resource, such as Files.newOutputStream or someSocket.getOutputStream or httpServletResponse.getOutputStream() or System.out or anything else that produces a stream - it's an abstract concept for a reason: To make things flexible.
Wrap that resource into a DeflaterOutputStream.
Write all your data to the DeflaterOutputStream. Forget about the original - you made it so you can pass it to DeflaterOutputStream, and that's where your interaction with the underlying stream ends.
Close the DeflaterOutputStream (which will end up closing the underlying stream as well).
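The four steps above can be sketched roughly as follows. This is a minimal illustration rather than the only way to do it: the class name DeflateDemo and its helper methods are made up for the example, and a ByteArrayOutputStream stands in for "whatever" so that the round trip can be checked in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class, named only for this example.
final class DeflateDemo {

    // Compresses text by writing it through a DeflaterOutputStream.
    static byte[] compress(String text) throws IOException {
        // Step 1: the underlying resource (a file or socket stream would work the same way).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Step 2: wrap it. Step 4: try-with-resources closes the deflater stream,
        // which finishes the compressed data and closes the wrapped stream too.
        try (OutputStream out = new DeflaterOutputStream(sink)) {
            // Step 3: only write(); the stream takes care of the compression itself.
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return sink.toByteArray();
    }

    // Reverses compress() so the round trip can be verified.
    static String decompress(byte[] compressed) throws IOException {
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Calling DeflateDemo.decompress(DeflateDemo.compress(text)) should give back the original text, and at no point does the calling code touch deflate().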

Related

How to convert a BufferedWriter to a BufferedReader

My understanding is that this is a common scenario, but Java doesn't have a baked-in solution and I've been searching on and off for more than a day now. I have tried the CircularCharBuffer from the Ostermiller library, but that uses some sort of reader that constantly waits for new input, so I couldn't get readLine() to detect the end of the content (it would just hang).
So could someone tell me how I could do a conversion? For what it's worth, I'm converting multiple (potentially many) PDF files to raw text using the PDFBox lib. The PDFBox API puts the content onto a Writer, after which I need to get at the content for further processing (so BufferedReader/Writer is not actually essential, but some kind of Reader/Writer). I know that this is possible using StringReader/Writer, but I'm not sure that this is efficient, plus I lose the readLine() method.

This is a bit like asking how to convert a pig into an elephant ... :-)
OK, there are two ways to address this problem (using the Java libraries):
You can capture the data written to a buffered writer so that it can then be read using a buffered reader. Basically, you do this by:
using your BufferedWriter to write to a StringWriter or CharArrayWriter,
closing it,
extracting the resulting stuff from the SW / CAW as a String, and
wrapping the String in a StringReader,
wrapping the StringReader in a BufferedReader.
You can create a PipedReader / PipedWriter pair and wrap them with BufferedReader and BufferedWriter respectively.
The two approaches both have disadvantages:
The first one requires you to complete the writing before constructing the read side. That means you need space to hold the entire stream content in memory, and you can't do producer-side and consumer-side processing in parallel.
The second one requires you to produce and consume in separate threads ... or risk having the pipeline block permanently.
Conceptually speaking, the Ostermiller library is really a reimplementation of PipedReader / PipedWriter. (And some of the advantages of his reimplementation were made moot in Java 1.6 ... which allows you to specify the pipe's buffer size. Mark support is interesting, but I can imagine some problems, depending on how you used it.)
You might also be able to find a PipedReader / PipedWriter replacement that uses a flexible buffer that grows and contracts as required. (At least ... this is conceptually possible.)
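The first approach (capture everything, then read it back) might look something like this. It is only a sketch: the class name CaptureThenRead and the two sample lines are invented for the illustration, and in the PDFBox case the Writer would be handed to the library instead of being written to directly.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; the content written here is placeholder data.
final class CaptureThenRead {

    static List<String> roundTrip() throws IOException {
        // Write side: a BufferedWriter on top of a StringWriter.
        StringWriter captured = new StringWriter();
        try (BufferedWriter writer = new BufferedWriter(captured)) {
            writer.write("first line");
            writer.newLine();
            writer.write("second line");
            writer.newLine();
        } // closing flushes the buffered text into the StringWriter

        // Read side: the captured String wrapped in StringReader + BufferedReader,
        // so readLine() is available again.
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new StringReader(captured.toString()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

As noted above, the whole content sits in memory between the two halves, which is the price of not needing a second thread.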

The CircularCharBuffer from the Ostermiller lib has two methods getWriter() and getReader() to get a reader on the content of a writer, and vice versa. The reason the Reader was hanging at the final readLine() was because I wasn't calling close() on the writer after I had finished writing to it. So the final readLine() was waiting for new content on the writer that was never going to arrive.
The Ostermiller library can be found here.
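The same "close the writer or the reader hangs" behaviour can be seen with the JDK's own PipedWriter / PipedReader, which is what this sketch uses instead of the Ostermiller classes. The class name PipedLines and its method are made up for the example, and the producer runs in its own thread because a pipe can block when both ends share one thread.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class using the JDK pipe classes, not the Ostermiller library.
final class PipedLines {

    static List<String> readThroughPipe(List<String> input)
            throws IOException, InterruptedException {
        PipedWriter pipeIn = new PipedWriter();
        PipedReader pipeOut = new PipedReader(pipeIn);

        // Producer thread: writes the lines, then closes the writer.
        Thread producer = new Thread(() -> {
            try (BufferedWriter writer = new BufferedWriter(pipeIn)) {
                for (String s : input) {
                    writer.write(s);
                    writer.newLine();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Leaving the try block closed the writer. Without that close,
            // the consumer's final readLine() would keep waiting for more data.
        });
        producer.start();

        // Consumer: readLine() returns null only once the write end is closed.
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(pipeOut)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        producer.join();
        return lines;
    }
}
```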

Java using streams as sort of "buffers"

I'm working with a library to which I have to provide an InputStream and a PrintStream. It uses the InputStream to gather data for processing and the PrintStream to provide results. I'm stuck using this library and its API cannot be altered.
There are two issues with this that I think have related solutions.
First, the data that needs to be read via the InputStream is not available upfront. Instead, the data is dynamically created by a different part of the application and given to my code as a String via method call. My code's job is to somehow allow the library to read this data through the InputStream provided as I get it.
Second, I need to somehow get the result that is written to the PrintStream and send it to another part of the application as a String. This needs to happen as soon as possible after the data is put into the PrintStream.
What it looks like I need are two stream objects that behave more or less like buffers. I need an InputStream that I can shove data into whenever I have it and a PrintStream whose contents I can grab whenever it has some. This seems a little awkward to me, but I'm not sure how else to do it.
I'm wondering if anything already exists that allows this kind of behavior or if there is a different (better) solution that will work in the situation I've described. The only thing I can come up with is to try to implement streams with this behavior, but that can become complicated fast (especially since the InputStream needs to block until data is available).
Any ideas?
Edit: To be clear, I'm not writing the library. I'm writing code that is supposed to provide the library with an InputStream to read data from and a PrintStream to write data to.

Looks like both streams need to be constantly reading/writing, so you'll need two threads independent of each other. The pattern resembles JMS a little bit, in which case you're feeding information to a "queue" or "topic", and waiting for it to be processed and then put on an "output" queue/topic. This may introduce additional moving parts, but you could write a simple client to place info onto a JMS queue, then have a listener to just grab messages and feed them to the input stream constantly. Then have another piece of code read from the output stream and do what you need with it.
Hope this helps.
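For a single JVM, the same queue-like idea can be approximated without JMS. The sketch below swaps in the JDK's piped streams for the input side and a BlockingQueue for the output side; it is an assumption about what would fit the question, not something the library or the answer above prescribes. All names (StreamBridge, LineQueueOutputStream, send, results) are invented for the example, and results are published one line at a time.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical bridge between String-based application code and a stream-based library.
final class StreamBridge {

    // Collects bytes until a newline, then publishes the finished line to a queue.
    static final class LineQueueOutputStream extends OutputStream {
        private final BlockingQueue<String> lines;
        private final ByteArrayOutputStream current = new ByteArrayOutputStream();

        LineQueueOutputStream(BlockingQueue<String> lines) {
            this.lines = lines;
        }

        @Override
        public synchronized void write(int b) {
            if (b == '\n') {
                lines.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
                current.reset();
            } else if (b != '\r') {
                current.write(b);
            }
        }
    }

    private final PipedOutputStream feed = new PipedOutputStream();

    // Hand this to the library as its InputStream; reads block until send() is called.
    final PipedInputStream libraryInput;
    // Lines the library printed; another thread can take() them as they arrive.
    final BlockingQueue<String> results = new LinkedBlockingQueue<>();
    // Hand this to the library as its PrintStream (auto-flushing).
    final PrintStream libraryOutput;

    StreamBridge() throws IOException {
        libraryInput = new PipedInputStream(feed);
        libraryOutput = new PrintStream(new LineQueueOutputStream(results), true, "UTF-8");
    }

    // Called by the application whenever a new String becomes available.
    void send(String data) throws IOException {
        feed.write(data.getBytes(StandardCharsets.UTF_8));
        feed.flush();
    }

    // Signals end of input to the library.
    void close() throws IOException {
        feed.close();
    }
}
```

The library would run in its own thread with libraryInput and libraryOutput, while the application calls send(...) from one thread and results.take() from another. Piped streams expect the writing and reading sides to live in different threads, so this matches the two-thread layout described above.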

Read the response before sending it

I am sending a plain text file to the user through a servlet.
I am using the flatworm framework to build the flat file. I receive the file in the browser, but it is empty. So I want to start debugging by analysing the output stream before it is sent.
How can I read the response before I send it in the servlet? I think this is the same thing as asking how I can transform an OutputStream into an InputStream.
I have already seen solutions, but they always involve ByteArrayOutputStream, and as you know, when I call response.getOutputStream() in the servlet it returns an OutputStream and not a ByteArrayOutputStream.

There seems to be some confusion somewhere, though I'm not sure exactly where.
What can you do with an OutputStream? Why, you can write to it, and that's about it. That means that if you're given (or look up) an output stream, it's up to you to supply the data - which means you already have it.
Perhaps on the other hand, you're not directly calling write on the OutputStream yourself, but passing this stream into the flatworm library (which will in turn write output to it). In that case, there's your debugging "hook" right there - flatworm will write out the file to any output stream you send it. So in this case, instead of passing in the servlet's stream, you pass in a stream that you've created yourself.
That might be a ByteArrayOutputStream, which (after the flatworm method has returned) you can inspect to get the bytes written. At this point you could manually write them through to the response's output stream. Or maybe you need to do something slightly trickier and create your own stream wrapper which writes straight through to the underlying response stream but logs on the way - and pass this into flatworm.
The bottom line however is that if you're interacting with an output stream, then "your" code already has the data somewhere locally and it's just a matter of capturing/accessing that.
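The "stream wrapper which writes straight through but logs on the way" could be sketched like this. The class name CopyingOutputStream and the copiedBytes() method are hypothetical; the only assumption is that flatworm accepts any OutputStream, as described above.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical wrapper: forwards every byte to the target and keeps a copy for inspection.
final class CopyingOutputStream extends FilterOutputStream {

    private final ByteArrayOutputStream copy = new ByteArrayOutputStream();

    CopyingOutputStream(OutputStream target) {
        super(target);
    }

    @Override
    public void write(int b) throws IOException {
        copy.write(b);
        out.write(b); // 'out' is the wrapped stream held by FilterOutputStream
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        copy.write(b, off, len);
        out.write(b, off, len);
    }

    // Everything written so far, e.g. for logging or a debugger breakpoint.
    byte[] copiedBytes() {
        return copy.toByteArray();
    }
}
```

In the servlet, one would wrap response.getOutputStream() in this class, pass the wrapper to flatworm, and afterwards look at copiedBytes(). If that array is empty, the problem lies before the response stream is ever written to.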

Question about streams in Java

I have been reading about streams in Java for the past few days. After reading quite a bit, I am starting to understand that the name "stream" was chosen because of its similarity to how we use the word in "real life", such as for water, and that it is not necessary to know where the data comes from. Please correct me if I have interpreted it wrong.
But I do not understand this: when I call getOutputStream or getInputStream on a socket, for example, I get an InputStream which I can chain to whatever I like. But aren't InputStream/OutputStream abstract classes? I don't know exactly how to explain it properly, but I do not understand how a socket connection, just by invoking that method, automatically has a stream/channel where bytes/characters can flow. What is in fact an InputStream / OutputStream? Are streams a way of abstracting the real sources?
I think I understand the various ways of chaining them, though; however, I feel I am missing the core of the concept.
For lack of a proper way of explaining it, I will delete the question if it is bad.
Thank you for your time.

InputStream/OutputStream are, well, abstractions. They give you some basic API for reading/writing bytes or groups of bytes without exposing the actual implementation. Let's take OutputStream as an example:
OutputStream receives a bunch of bytes through the public API. You don't actually know (or care) what happens to these bytes afterwards: they are sent. The real implementation may: append them to a file, ignore them (NullOutputStream in Apache Commons), save them in memory or... send them through a socket.
This is what happens when you call Socket.getOutputStream(): you get some implementation of OutputStream. You don't need to care which one; it is implementation-dependent. When you send bytes to this stream, the underlying implementation will push them over the network, normally using TCP. In fact, TCP itself is a stream protocol, even though it is carried in packets/frames.
The situation is similar for InputStream - you get some implementation from the socket. When you ask the stream for a few bytes, the underlying InputStream implementation will ask the OS socket for the same number of bytes, possibly blocking. But this is the real fun of inheritance: you don't care! Just use these streams any way you want: chaining, buffering, etc.
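A small sketch of that idea: the method below knows only the abstract OutputStream API, and the caller decides which implementation receives the bytes. The names Greeter, sendGreeting and CountingOutputStream are invented for the illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class.
final class Greeter {

    // Works with any OutputStream: in memory, a file, socket.getOutputStream(), ...
    static void sendGreeting(OutputStream out) throws IOException {
        out.write("hello\n".getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    // A tiny custom implementation that ignores the data and only counts bytes.
    static final class CountingOutputStream extends OutputStream {
        long count;

        @Override
        public void write(int b) {
            count++;
        }
    }
}
```

Passing a ByteArrayOutputStream, a CountingOutputStream or the stream returned by a socket requires no change to sendGreeting, which is the whole point of the abstraction.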

When you call getInputStream, the socket returns an instance of some concrete subclass of InputStream. Usually you don't worry about the exact class of the returned object; since it is an InputStream, you just work with it that way. (The subclass may even be a private nested class of the socket class.)

InputStream is indeed an abstraction. On each occasion a different implementation of the stream concept can be used. But the user of the stream does not need to know what was the exact implementation.
In the case of Socket, the implementation has been a SocketInputStream, which extends FileInputStream, although that is an implementation detail that can differ between JDK versions.

I don't think this is a bad question. You're quite right that streams abstract away from you the complexity of where the data is coming from and make it uniform. Therefore you can write code that reads from a File or a Socket and that code could look almost identical. It means you generally have to write less code.
When you get the InputStream from a Socket you are able to access any data arriving at that socket. When reading from this stream, typically, you supply a byte array and ask the stream to fill it for you. It will read as much data as is available or it will fill the buffer. It's then up to you what you do with the data in this byte array.
For any kind of Socket IO though, having said all this about Streams, the Java socket API is quite old and there are some really good alternatives available which wrap it and are easier to use. I highly recommend Netty, with which you can forget about streams and focus on POJOs and how to encode and decode them.
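The usual shape of that "supply a byte array and let the stream fill it" loop is sketched below. The class and method names are made up, and the 4096-byte buffer size is an arbitrary choice for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical example class.
final class ReadLoop {

    // Reads until end of stream and returns everything that arrived.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        // read(buffer) blocks until at least some data is available, stores at most
        // buffer.length bytes, and returns how many it stored, or -1 at end of stream.
        while ((n = in.read(buffer)) != -1) {
            collected.write(buffer, 0, n); // only the first n bytes are valid
        }
        return collected.toByteArray();
    }
}
```

The same method works unchanged whether the InputStream comes from a socket, a file or an in-memory array.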

What is the fastest way to output a large amount of data?

I have a JAX-RS web service that calls a DB2 z/OS database and returns about 240 MB of data in a result set. I am then creating an OutputStream to send this data to the client by looping through the result set and adding a few XML tags for my output.
I am confused about which to use: PrintWriter, BufferedWriter or OutputStreamWriter. I am looking for the fastest way to deliver the data. I also don't want the JVM to hold onto this data any longer than it needs to, so I don't use up its memory.
Any help is appreciated.

You should use
BufferedWriter
Call .flush() frequently
Enable gzip for best compression
Start thinking about a different way of doing this. Can your data be paginated? Do you need all the data in one request?
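The first three points could be combined roughly as follows. This is only a sketch under several assumptions: the class name XmlStreamer is invented, an Iterable of already XML-escaped strings stands in for the result set loop, the flush interval of 1000 rows is arbitrary, and in a real container gzip is often switched on in the server configuration rather than in application code.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical example class; 'rows' stands in for iterating over a ResultSet.
final class XmlStreamer {

    static void writeRows(Iterable<String> rows, OutputStream rawOut) throws IOException {
        // syncFlush = true so that flush() also pushes pending compressed bytes downstream.
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(rawOut, true), StandardCharsets.UTF_8))) {
            out.write("<rows>\n");
            int count = 0;
            for (String row : rows) {
                out.write("<row>");
                out.write(row); // assumed to be XML-escaped already
                out.write("</row>\n");
                // Flush periodically so data leaves the JVM instead of piling up.
                if (++count % 1000 == 0) {
                    out.flush();
                }
            }
            out.write("</rows>\n");
        } // closing writes the gzip trailer and closes rawOut
    }
}
```

Each row is written and then forgotten, so memory use stays roughly constant regardless of how many rows the query returns.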

If you are sending large binary data, you probably don't want to use XML. When XML is used, binary data is usually represented using base64, which is larger than the original binary and uses quite a lot of CPU for the conversion into base64.
If I were you, I'd send the binary separate from the XML. If you are using a web service, an MTOM attachment could help. Otherwise you could send a reference to the binary data in the XML, and let the app download the binary data separately.
As for the fastest way to send binary, if you are using WebLogic, just writing to the response's output stream would be OK. That output stream is most probably buffered, and whatever you do probably won't change the performance anyway.
Turning on gzip could also help depending on what you are sending (e.g. if you are sending jpeg (stuff that is already compressed) or something, it won't help a lot but if you are sending raw text then it can help a lot, etc.).

One solution (which might not work for you) is to spawn a job / thread that creates a file and then notifies the user when the file is ready to download. That way you're not tied to the bandwidth of the client connection (and you can even compress the file properly before the client downloads it).
Some business intelligence and data crunching applications do this, especially if the process takes some time to generate the data.

The maximum output speed will be limited by network bandwidth, and I am sure any Java OutputStream will be so much faster than the network that you will not notice the difference.
The choice depends on the data to send: if it is text (lines), PrintWriter is easy; if it is a byte array, use OutputStream.
To avoid holding too much data in the buffers, you should call flush() every x KB or so.

You should never use PrintWriter to output data over a network. First of all, it creates platform-dependent line breaks. Second, it silently catches all I/O exceptions, which makes it hard for you to deal with those exceptions.
And if you're sending 240 MB as XML, then you're definitely doing something wrong. Before you start worrying about which stream class to use, try to reduce the amount of data.
EDIT:
The advice about PrintWriter (and PrintStream) came from a book by Elliotte Rusty Harold. I can't remember which one, but it was a few years ago. I think that ServletResponse.getWriter() was added to the API after that book was written - so it looks like Sun didn't follow Rusty's advice. I still think it was good advice - for the reasons stated above, and because it can tempt implementation authors to violate the API contract in order to get predictable behavior.
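The point about swallowed exceptions can be demonstrated with a stream that always fails. In this sketch FailingOutputStream is a made-up stand-in for a broken network connection, and the class and method names are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical example class.
final class PrintWriterErrors {

    // Stand-in for a connection that has gone away: every write fails.
    static final class FailingOutputStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("connection lost");
        }
    }

    // PrintWriter: no exception reaches the caller; the failure is only
    // visible by asking checkError() afterwards.
    static boolean printWriterReportsError() {
        PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(new FailingOutputStream(), StandardCharsets.UTF_8));
        writer.println("data"); // uses the platform line separator
        writer.flush();         // the IOException is caught inside PrintWriter
        return writer.checkError();
    }

    // BufferedWriter: the same failure surfaces as an IOException.
    static boolean bufferedWriterThrows() {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FailingOutputStream(), StandardCharsets.UTF_8))) {
            writer.write("data");
            writer.newLine();
            writer.flush();
            return false;
        } catch (IOException e) {
            return true;
        }
    }
}
```

Both methods should return true: the PrintWriter version has recorded an error that only checkError() reveals, while the BufferedWriter version fails with an exception the caller can handle.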
