Read from ByteArrayOutputStream while it's being written to - java

I have a class that is constantly producing data and writing it to a ByteArrayOutputStream on its own thread. I have a 2nd thread that gets a reference to this ByteArrayOutputStream. I want the 2nd thread to read (and empty) any data in the ByteArrayOutputStream, then stop when it doesn't get any bytes and sleep. After the sleep, I want it to try to get more data and empty it again.
The examples I see online say to use PipedOutputStream. If my first thread is making the ByteArrayOutputStream available to the outside world from a separate reusable library, I don't see how to hook up the inputStream to it.
How would one set up the PipedInputStream and connect it to the ByteArrayOutputStream to read from it as above? Also, when reading the last block from the ByteArrayOutputStream, will I see bytesRead == -1, indicating that the outputStream has been closed by the first thread?
Many thanks,
Mike

Write to the PipedOutputStream directly (that is, don't use a ByteArrayOutputStream at all). They both extend OutputStream and so have the same interface.
There are connect methods in both PipedOutputStream and PipedInputStream that are used to wire two pipes together, or you can use one of the constructors to create a pair.
Writes to the PipedOutputStream will block when the buffer in the PipedInputStream fills up, and reads from the PipedInputStream will block when the buffer is empty, so the producer thread will sleep (block) if it gets "ahead" of the consumer and vice versa.
After blocking, each thread waits up to 1000 ms before rechecking the buffer, so it's good practice to flush the output after writes complete (this wakes the reader if it is waiting).
Your input stream will see the EOF (bytesRead == -1) when you close the output stream in the producer thread.
import java.io.*;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
public class PipeTest {
public static void main(String[] args) throws IOException {
PipedOutputStream out = new PipedOutputStream();
// Wire an input stream to the output stream, and use a buffer of 2048 bytes
PipedInputStream in = new PipedInputStream(out, 2048);
ExecutorService executor = Executors.newCachedThreadPool();
// Producer thread.
executor.execute(() -> {
try {
for (int i = 0; i < 10240; i++) {
out.write(0);
// flush to wake the reader
out.flush();
}
out.close();
} catch (IOException e) {
throw new UncheckedIOException(e);
}
});
// Consumer thread.
executor.execute(() -> {
try {
int b, read = 0;
while ((b = in.read()) != -1) {
read++;
}
System.out.println("Read " + read + " bytes.");
} catch (IOException e) {
throw new UncheckedIOException(e);
}
});
executor.shutdown();
}
}
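The connect() route mentioned above can be sketched as follows. This is only an illustration: the class name PipeConnectSketch, the transfer helper, and the buffer sizes are assumptions made for the example. It wires the two ends with connect() instead of the constructor, reads in blocks rather than one byte at a time, and closes the writing end with try-with-resources so the reader always sees EOF.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

class PipeConnectSketch {

    // Pushes `total` bytes through a pipe and returns how many the reader received.
    static int transfer(int total) throws IOException, InterruptedException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(4096); // unconnected, 4096-byte buffer
        in.connect(out);                                  // same effect as out.connect(in)

        Thread producer = new Thread(() -> {
            // try-with-resources closes the writing end even if a write fails,
            // so the reader sees EOF instead of "Write end dead"
            try (PipedOutputStream o = out) {
                byte[] chunk = new byte[512];
                int remaining = total;
                while (remaining > 0) {
                    int n = Math.min(chunk.length, remaining);
                    o.write(chunk, 0, n);
                    remaining -= n;
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, "producer");
        producer.start();

        int read = 0;
        byte[] buffer = new byte[1024];
        try (PipedInputStream i = in) {
            int n;
            // read(byte[]) blocks while the pipe is empty and returns -1 after close()
            while ((n = i.read(buffer)) != -1) {
                read += n;
            }
        }
        producer.join();
        return read;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Read " + transfer(10240) + " bytes.");
    }
}
```

Because read() already blocks while the pipe is empty, the consumer needs no sleep-and-poll loop of its own.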

Related

GZIPOutputStream that does its compression in a separate thread

Is there an implementation of GZIPOutputStream that would do the heavy lifting (compressing + writing to disk) in a separate thread?
We are continuously writing huge amounts of GZIP-compressed data. I am looking for a drop-in replacement that could be used instead of GZIPOutputStream.
You can write to a PipedOutputStream and have a thread which reads the PipedInputStream and copies it to any stream you like.
This is a generic implementation. You give it an OutputStream to write to and it returns an OutputStream for you to write to.
public static OutputStream asyncOutputStream(final OutputStream out) throws IOException {
PipedOutputStream pos = new PipedOutputStream();
final PipedInputStream pis = new PipedInputStream(pos);
new Thread(new Runnable() {
@Override
public void run() {
try {
byte[] bytes = new byte[8192];
for(int len; (len = pis.read(bytes)) > 0;)
out.write(bytes, 0, len);
} catch(IOException ioe) {
ioe.printStackTrace();
} finally {
close(pis);
close(out);
}
}
}, "async-output-stream").start();
return pos;
}
static void close(Closeable closeable) {
if (closeable != null) try {
closeable.close();
} catch (IOException ignored) {
}
}
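One thing to keep in mind with this approach is that closing the returned stream does not wait for the background thread to finish draining the pipe. A minimal sketch of the same idea with an explicit join() is shown below; the class name AsyncGzipSketch, the helper names, and the use of a ByteArrayOutputStream in place of a file are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class AsyncGzipSketch {

    // Writes raw bytes on the calling thread; a worker thread compresses them.
    static byte[] compressInBackground(byte[] data) throws IOException, InterruptedException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for a FileOutputStream
        PipedOutputStream pos = new PipedOutputStream();
        PipedInputStream pis = new PipedInputStream(pos, 64 * 1024);

        Thread worker = new Thread(() -> {
            // Closing gz writes the gzip trailer; both resources are closed on exit.
            try (InputStream in = pis; OutputStream gz = new GZIPOutputStream(sink)) {
                byte[] buf = new byte[8192];
                int len;
                while ((len = in.read(buf)) != -1) {
                    gz.write(buf, 0, len);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, "gzip-worker");
        worker.start();

        try (OutputStream out = pos) {
            out.write(data); // blocks only when the 64 KB pipe buffer is full
        }
        worker.join();       // wait until the compressed output is complete
        return sink.toByteArray();
    }

    // Helper used only to check the round trip.
    static byte[] decompress(byte[] gz) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int len;
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
            return out.toByteArray();
        }
    }
}
```

Without the join(), the caller could observe a truncated gzip stream if it inspected the output right after closing its end of the pipe.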
I published some code that does exactly what you are looking for. It has always frustrated me that Java doesn't automatically pipeline calls like this across multiple threads, in order to overlap computation, compression, and disk I/O:
https://github.com/lukehutch/PipelinedOutputStream
This class splits writing to an OutputStream into separate producer and consumer threads (actually, starts a new thread for the consumer), and inserts a blocking bounded buffer between them. There is some data copying between buffers, but this is done as efficiently as possible.
You can even layer this twice to do the disk writing in a separate thread from the gzip compression, as shown in README.md.

Best practice for reading / writing to a java server socket

How do you design a read and write loop which operates on a single socket (which supports parallel read and write operations)? Do I have to use multiple threads? Is my (java) solution any good? What about that sleep command? How do you use that within such a loop?
I'm trying to use 2 Threads:
Read
public void run() {
InputStream clientInput;
ByteArrayOutputStream byteBuffer;
BufferedInputStream bufferedInputStream;
byte[] data;
String dataString;
int lastByte;
try {
clientInput = clientSocket.getInputStream();
byteBuffer = new ByteArrayOutputStream();
bufferedInputStream = new BufferedInputStream(clientInput);
while(isRunning) {
while ((lastByte = bufferedInputStream.read()) > 0) {
byteBuffer.write(lastByte);
}
data = byteBuffer.toByteArray();
dataString = new String(data);
byteBuffer.reset();
}
} catch (IOException e) {
e.printStackTrace();
}
}
Write
public void run() {
OutputStream clientOutput;
byte[] data;
String dataString;
try {
clientOutput = clientSocket.getOutputStream();
while(isOpen) {
if(!commandQueue.isEmpty()) {
dataString = commandQueue.poll();
data = dataString.getBytes();
clientOutput.write(data);
}
Thread.sleep(1000);
}
clientOutput.close();
}
catch (IOException e) {
e.printStackTrace();
}
catch (InterruptedException e) {
e.printStackTrace();
}
}
Read fails to deliver a proper result, since there is no -1 sent.
How do I solve this issue?
Is this sleep / write loop a good solution?
There are basically three ways to do network I/O:
Blocking. In this mode reads and writes will block until they can be fulfilled, so if you want to do both simultaneously you need separate threads for each.
Non-blocking. In this mode reads and writes will return zero (Java) or in some languages (C) a status indication (return == -1, errno=EAGAIN/EWOULDBLOCK) when they cannot be fulfilled, so you don't need separate threads, but you do need a third API that tells you when the operations can be fulfilled. This is the purpose of the select() API.
Asynchronous I/O, in which you schedule the transfer and are given back some kind of a handle via which you can interrogate the status of the transfer, or, in more advanced APIs, a callback.
You should certainly never use the while (in.available() > 0)/sleep() style you are using here. InputStream.available() has few correct uses and this isn't one of them, and the sleep is literally a waste of time. The data can arrive within the sleep time, and a normal read() would wake up immediately.
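For the writer side, one way to avoid the sleep is to block on the command queue itself. The sketch below is hedged: it assumes the command queue can be a java.util.concurrent.BlockingQueue, and the class name QueueWriterSketch and the STOP sentinel are hypothetical names introduced for the example. A ByteArrayOutputStream stands in for the socket's output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueWriterSketch {

    // Hypothetical sentinel value that tells the writer loop to stop.
    static final String STOP = "\u0000STOP";

    // Writes each queued command as soon as it arrives; no sleep, no polling.
    static void writeLoop(BlockingQueue<String> commands, OutputStream out)
            throws IOException, InterruptedException {
        while (true) {
            String command = commands.take(); // blocks until a command is available
            if (STOP.equals(command)) {
                break;
            }
            out.write(command.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }

    static String demo() throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for socket.getOutputStream()

        Thread writer = new Thread(() -> {
            try {
                writeLoop(queue, sink);
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        }, "socket-writer");
        writer.start();

        queue.put("HELLO\n");
        queue.put("BYE\n");
        queue.put(STOP);
        writer.join();
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.print(demo());
    }
}
```

The reader thread would use a plain blocking read() in the same spirit, so neither thread wakes up unless there is work to do.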
Use a boolean flag rather than while(true) so that you can shut the thread down cleanly when you want to. And yes, you should create multiple threads, one per connected client, because each thread blocks until new data is received (with DataInputStream.read(), for example). And no, this is not really a design question: each library, framework, or language has its own way to listen on a socket. In Qt, for example, you would use "signals and slots" rather than an infinite loop.

Flaws with PipedInputStream/PipedOutputStream

I've seen two answers on SO that claim that the PipedInputStream and PipedOutputStream classes provided by Java are flawed. But they did not elaborate on what was wrong with them. Are they really flawed, and if so in what way? I'm currently writing some code that uses them, so I'd like to know whether I'm taking a wrong turn.
One answer said:
PipedInputStream and PipedOutputStream are broken (with regards to threading). They assume each instance is bound to a particular thread. This is bizarre.
To me that seems neither bizarre nor broken. Perhaps the author also had some other flaws in mind?
Another answer said:
In practice they are best avoided. I've used them once in 13 years and I wish I hadn't.
But that author could not recall what the problem was.
As with all classes, and especially classes used in multiple threads, you will have problems if you misuse them. So I do not consider the unpredictable "write end dead" IOException that PipedInputStream can throw to be a flaw (failing to close() the connected PipedOutputStream is a bug; see the article Whats this? IOException: Write end dead, by Daniel Ferbers, for more information). What other claimed flaws are there?
They are not flawed.
As with all classes, and especially classes used in multiple threads, you will have problems if you misuse them. The unpredictable "write end dead" IOException that PipedInputStream can throw is not a flaw (failing to close() the connected PipedOutputStream is a bug; see the article Whats this? IOException: Write end dead, by Daniel Ferbers, for more information).
I have used them successfully in my project, and they are invaluable for modifying streams on the fly and passing them around. The only drawback seemed to be that PipedInputStream had a small buffer (around 1024 bytes) and my output stream was pumping in around 8 KB.
There is no defect with it and it works perfectly well.
-------- Example in groovy
public class Runner{
final PipedOutputStream source = new PipedOutputStream();
PipedInputStream sink = new PipedInputStream();
public static void main(String[] args) {
new Runner().doit()
println "Finished main thread"
}
public void doit() {
sink.connect(source)
(new Producer(source)).start()
BufferedInputStream buffer = new BufferedInputStream(sink)
(new Consumer(buffer)).start()
}
}
class Producer extends Thread {
OutputStream source
Producer(OutputStream source) {
this.source=source
}
@Override
public void run() {
byte[] data = new byte[1024];
println "Running the Producer..."
FileInputStream fout = new FileInputStream("/Users/ganesh/temp/www/README")
int amount=0
while((amount=fout.read(data))>0)
{
String s = new String(data, 0, amount);
source.write(s.getBytes())
synchronized (this) {
wait(5);
}
}
source.close()
}
}
class Consumer extends Thread{
InputStream ins
Consumer(InputStream ins)
{
this.ins = ins
}
public void run()
{
println "Consumer running"
int amount;
byte[] data = new byte[1024];
while ((amount = ins.read(data)) >= 0) {
String s = new String(data, 0, amount);
println "< $s"
synchronized (this) {
wait(5);
}
}
}
}
One flaw might be that there is no clear way for the writer to indicate to the reader that it encountered a problem:
PipedOutputStream out = new PipedOutputStream();
PipedInputStream in = new PipedInputStream(out);
new Thread(() -> {
try {
writeToOut(out);
out.close();
}
catch (SomeDataProviderException e) {
// Have to notify the reading side, but how?
}
}).start();
readFromIn(in);
The writer could close out, but maybe the reader misinterprets that as end of data. To handle this correctly additional logic is needed. It would be easier if functionality to manually break the pipe was provided.
There is now JDK-8222924 which requests a way to manually break the pipe.
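Until something like that exists, the "additional logic" can be as simple as a shared failure slot that the reader checks after it sees EOF. The following is a minimal sketch under that assumption; the class name PipeErrorSketch, the readAll method, and the simulated IllegalStateException are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.atomic.AtomicReference;

class PipeErrorSketch {

    // Returns the number of bytes read, or throws if the writer reported a failure.
    static int readAll(boolean failWriter) throws IOException, InterruptedException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out);
        AtomicReference<Exception> failure = new AtomicReference<>();

        Thread writer = new Thread(() -> {
            try (PipedOutputStream o = out) {
                o.write(new byte[] {1, 2, 3});
                if (failWriter) {
                    throw new IllegalStateException("data provider failed"); // simulated error
                }
                o.write(new byte[] {4, 5});
            } catch (Exception e) {
                // The pipe is already closed at this point; record why.
                failure.set(e);
            }
        }, "pipe-writer");
        writer.start();

        int count = 0;
        try (PipedInputStream i = in) {
            while (i.read() != -1) {
                count++;
            }
        }

        // EOF alone is ambiguous, so wait for the writer and check its verdict.
        writer.join();
        Exception e = failure.get();
        if (e != null) {
            throw new IOException("writer failed after " + count + " bytes", e);
        }
        return count;
    }
}
```

The join() matters: the writer closes the pipe before it records the failure, so the reader must not inspect the slot until the writer thread has finished.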
From my point of view there is a flaw. More precisely, there is a high risk of a deadlock if the thread that should pump data into the PipedOutputStream dies prematurely, before it actually writes a single byte into the stream. The problem in such a situation is that the implementation of the piped streams is not able to detect the broken pipe. Consequently, the thread reading from the PipedInputStream will wait forever (i.e. deadlock) in its first call to read().
Broken pipe detection actually relies on the first call to write(), because the implementation lazily records the write-side thread at that point, and only from then on does broken pipe detection work.
The following code reproduces the situation:
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import org.junit.Test;
public class PipeTest
{
@Test
public void test() throws IOException
{
final PipedOutputStream pout = new PipedOutputStream();
PipedInputStream pin = new PipedInputStream();
pout.connect(pin);
Thread t = new Thread(new Runnable()
{
public void run()
{
try
{
if(true)
{
throw new IOException("asd");
}
pout.write(0); // first byte, which never gets written
pout.close();
}
catch(IOException e)
{
throw new RuntimeException(e);
}
}
});
t.start();
pin.read(); // waits forever, i.e. deadlocks
}
}
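A common way to sidestep this hang is to close the PipedOutputStream in a finally block, so the reader gets EOF even when the writer fails before its first write. The sketch below is an illustration only; the class name PipeCloseSketch and the simulated failure are assumptions, not part of the reproduction above.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

class PipeCloseSketch {

    // Returns what the reader's first read() sees when the writer fails early.
    static int firstRead() throws IOException, InterruptedException {
        PipedOutputStream pout = new PipedOutputStream();
        PipedInputStream pin = new PipedInputStream(pout);

        Thread t = new Thread(() -> {
            try {
                // Simulated failure before any byte is written.
                throw new IOException("simulated failure before first write");
            } catch (IOException e) {
                // A real program would log or report this.
            } finally {
                try {
                    pout.close(); // always signal end of stream to the reader
                } catch (IOException ignored) {
                    // nothing useful to do if close itself fails
                }
            }
        }, "failing-writer");
        t.start();

        int result = pin.read(); // returns -1 instead of blocking forever
        t.join();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("first read returned " + firstRead());
    }
}
```

This does not tell the reader why the stream ended, only that it did, so it is usually combined with some separate error signal.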
The flaws that I see with the JDK implementation are:
1) No timeouts, reader or writer can block infinitely.
2) Suboptimal control over when data is transferred (should be done only with flush, or when circular buffer is full)
So I created my own to address the above, (timeout value passed via a ThreadLocal):
PipedOutputStream
How to use:
PiedOutputStreamTest
Hope it helps...

Write end dead exception using PipedInputStream java

Write end dead exception occurs in the following situation:
Two threads:
A: PipedOutputStream put = new PipedOutputStream();
String msg = "MESSAGE";
put.write(msg.getBytes());
put.flush();
B: PipedInputStream get = new PipedInputStream(A.put);
byte[] get_msg = new byte[1024];
get.read(get_msg);
Here is the situation: A and B run concurrently; A writes to the pipe and B reads from it. B has just read from the pipe, so the pipe's buffer is empty. A then writes nothing to the pipe for an unknown interval. When B reads the pipe again, java.io.IOException: Write end dead is thrown, because the buffer of the pipe is still empty. I don't want thread B to sleep() while waiting for A to write to the pipe, which is also unreliable. How can I avoid this problem? Thanks
"Write end dead" exceptions arise when all of the following hold:
A PipedInputStream is connected to a PipedOutputStream.
The two ends of the pipe are read and written by two different threads.
The threads finish without closing their side of the pipe.
To resolve this exception, close the piped stream in each thread's Runnable once it has finished writing to or reading from the pipe.
Here is some sample code:
final PipedOutputStream output = new PipedOutputStream();
final PipedInputStream input = new PipedInputStream(output);
Thread thread1 = new Thread(new Runnable() {
@Override
public void run() {
try {
output.write("Hello Piped Streams!! Used for Inter Thread Communication".getBytes());
output.close();
} catch(IOException io) {
io.printStackTrace();
}
}
});
Thread thread2 = new Thread(new Runnable() {
@Override
public void run() {
try {
int data;
while((data = input.read()) != -1) {
System.out.println(data + " ===> " + (char)data);
}
input.close();
} catch(IOException io) {
io.printStackTrace();
}
}
});
thread1.start();
thread2.start();
Complete code is here: https://github.com/prabhash1785/Java/blob/master/JavaCodeSnippets/src/com/prabhash/java/io/PipedStreams.java
For more details, please have a look at this nice blog: https://techtavern.wordpress.com/2008/07/16/whats-this-ioexception-write-end-dead/
You need to close the PipedOutputStream before the writing thread finishes (and, of course, after all data is written). PipedInputStream throws this exception on read() when the writing thread is no longer alive and the writer was not properly closed.

Java: Proper Network IO handling?

The problem I am having is that when I use an InputStream to read bytes, it blocks until the connection is finished. EG:
InputStream is = socket.getInputStream();
byte[] buffer = new byte[20000];
while (is.read(buffer) != -1) {
System.out.println("reading");
}
System.out.println("socket read");
"socket read" doesn't print until the FIN packet is actually received, which closes the connection. What is the proper way to receive all the bytes without blocking and waiting for the connection to drop?
Take a look at java.nio which has non-blocking IO support.
Reading till you get -1 means that you want to read until EOS. If you don't want to read until EOS, don't loop till the -1: stop sooner. The question is 'when?'
If you want to read a complete 'message' and no more, you must send the message in such a way that the reader can find its end: for example, a type-length-value protocol, or more simply a size word before each message, or a self-describing protocol such as XML.
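The "size word before each message" option might look like the sketch below. It is only an illustration: the class name FramingSketch, the method names, and the choice of a 4-byte length prefix with UTF-8 payloads are assumptions. Byte array streams stand in for the socket streams so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class FramingSketch {

    // Sender: 4-byte big-endian length, then the payload.
    static void writeMessage(DataOutputStream out, String message) throws IOException {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Receiver: reads exactly one message, without waiting for end of stream.
    static String readMessage(DataInputStream in) throws IOException {
        int length = in.readInt();             // throws EOFException if the peer closed
        byte[] payload = new byte[length];
        in.readFully(payload);                 // loops until all `length` bytes arrive
        return new String(payload, StandardCharsets.UTF_8);
    }

    // Writes all messages into one byte stream, then reads them back one by one.
    static List<String> roundTrip(String... messages) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream(); // stands in for the socket
        DataOutputStream out = new DataOutputStream(wire);
        for (String m : messages) {
            writeMessage(out, m);
        }

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        List<String> received = new ArrayList<>();
        for (int i = 0; i < messages.length; i++) {
            received.add(readMessage(in));
        }
        return received;
    }
}
```

A production protocol would also want an upper bound on the length field before allocating the payload array.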
With traditional sockets the point is that usually you do want them to block: what you do when logically you don't want your program to block is you put your reading/writing code in another thread, so that the separate read/write thread blocks, but not your whole program.
Failing that, you can use the available() method to see if there is actually any input available before reading. But then you need to be careful not to sit in a loop burning CPU by constantly calling available().
Edit: if the problem is that you're happy to block until the bytes have arrived, but not until the connection has dropped (and that is what is happening), then you need to make the client at the other end call flush() on its output stream after it has sent the bytes.
Try this:
InputStream is = socket.getInputStream();
byte[] buffer = new byte[20000];
int bytesRead;
do {
System.out.println("reading");
bytesRead = is.read(buffer);
}
while (is.available() > 0 && bytesRead != -1);
System.out.println("socket read");
More info: https://docs.oracle.com/javase/1.5.0/docs/api/java/io/InputStream.html#available()
Example taken from exampledepot on java.nio
// Create a direct buffer to get bytes from socket.
// Direct buffers should be long-lived and be reused as much as possible.
ByteBuffer buf = ByteBuffer.allocateDirect(1024);
try {
// Clear the buffer and read bytes from socket
buf.clear();
int numBytesRead = socketChannel.read(buf);
if (numBytesRead == -1) {
// No more bytes can be read from the channel
socketChannel.close();
} else {
// To read the bytes, flip the buffer
buf.flip();
// Read the bytes from the buffer ...;
// see Getting Bytes from a ByteBuffer
}
} catch (IOException e) {
// Connection may have been closed
}
Be sure to understand buffer flipping, because it causes a lot of headaches. Basically, you have to flip the buffer before you can read what was just written into it. To reuse the buffer for another socket read, you must put it back into write mode; clear() does this by resetting the position and limit.
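The position and limit changes can be traced with a plain heap buffer and no socket at all. The sketch below is illustrative; the class name FlipSketch and its demo method are hypothetical. It also uses compact(), which keeps unread bytes when switching back to write mode, whereas clear() discards them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class FlipSketch {

    static String demo() {
        ByteBuffer buf = ByteBuffer.allocate(16);

        // Write mode: position advances as bytes are put in.
        buf.put("hello".getBytes(StandardCharsets.US_ASCII)); // position=5, limit=16

        // flip(): limit=5, position=0, so reads see exactly what was written.
        buf.flip();
        byte[] firstTwo = new byte[2];
        buf.get(firstTwo);                                     // position=2, limit=5

        // compact(): moves the 3 unread bytes to the front and returns to
        // write mode (position=3, limit=16). clear() would drop them instead.
        buf.compact();
        buf.put((byte) '!');                                   // position=4

        // Flip again to read everything that is now in the buffer.
        buf.flip();                                            // position=0, limit=4
        byte[] rest = new byte[buf.remaining()];
        buf.get(rest);

        return new String(firstTwo, StandardCharsets.US_ASCII)
                + "|" + new String(rest, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "he|llo!"
    }
}
```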
The code is probably not doing what you think it does.
read(buffer) returns the number of bytes it read; in other words, it is not guaranteed to fill your buffer.
See DataInputStream.readFully() for code that fills the entire array,
or use these functions (which are based on DataInputStream.readFully()):
public final void readFully(InputStream in, byte b[]) throws IOException
{
readFully(in, b, 0, b.length);
}
public final void readFully(InputStream in, byte b[], int off, int len) throws IOException
{
if (len < 0) throw new IndexOutOfBoundsException();
int n = 0;
while (n < len)
{
int count = in.read(b, off + n, len - n);
if (count < 0) throw new EOFException();
n += count;
}
}
Your code would look like:
InputStream is = socket.getInputStream();
byte[] buffer = new byte[20000];
readFully(is, buffer);
System.out.println("socket read");
