Java OutputStream buffer size

The OutputStream in Java has a method named flush(). Based on its documentation:
Flushes this output stream and forces any buffered output bytes to be written out.
How can I find out how many bytes this buffer can hold?
Extra note: I get my OutputStream from an HttpURLConnection's getOutputStream() method.

It depends on the kind of OutputStream you're using.
Let's start with the basics, by looking at the contract that OutputStream.flush() declares:
public void flush()
throws IOException
Flushes this output stream and forces any buffered output bytes to be
written out. The general contract of flush is that calling it is an
indication that, if any bytes previously written have been buffered by
the implementation of the output stream, such bytes should immediately
be written to their intended destination.
If the intended destination of this stream is an abstraction provided
by the underlying operating system, for example a file, then flushing
the stream guarantees only that bytes previously written to the stream
are passed to the operating system for writing; it does not guarantee
that they are actually written to a physical device such as a disk
drive.
The flush method of OutputStream does nothing.
And if you see the flush method of OutputStream, it actually does nothing:
public void flush() throws IOException {
}
The idea is that any implementation decorating an OutputStream has to handle flush itself and then cascade the call to the wrapped OutputStreams, all the way down to the OS if that is where the data ends up.
So it does do something, by way of whoever implements it: the concrete subclasses override flush to do things like moving data to disk or sending it over the network (your case).
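To make that cascading concrete, here is a minimal sketch (not JDK code; the class name and the tiny 16-byte buffer are made up for illustration) of a decorator that keeps a small buffer of its own and, on flush(), first empties that buffer and then flushes the stream it wraps:

import java.io.IOException;
import java.io.OutputStream;

// Hypothetical decorator: buffers a handful of bytes and cascades flush().
public class TinyBufferedOutputStream extends OutputStream {
    private final OutputStream out;            // the wrapped stream
    private final byte[] buf = new byte[16];   // deliberately tiny, for illustration
    private int count;

    public TinyBufferedOutputStream(OutputStream out) {
        this.out = out;
    }

    @Override
    public void write(int b) throws IOException {
        if (count == buf.length) {
            flushBuffer();                     // buffer full: push bytes down
        }
        buf[count++] = (byte) b;
    }

    @Override
    public void flush() throws IOException {
        flushBuffer();                         // empty our own buffer...
        out.flush();                           // ...then cascade to the wrapped stream
    }

    @Override
    public void close() throws IOException {
        flush();
        out.close();
    }

    private void flushBuffer() throws IOException {
        if (count > 0) {
            out.write(buf, 0, count);
            count = 0;
        }
    }
}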
If you check out the flush of a BufferedOutputStream:
/**
 * Flushes this buffered output stream. This forces any buffered
 * output bytes to be written out to the underlying output stream.
 *
 * @exception IOException if an I/O error occurs.
 * @see java.io.FilterOutputStream#out
 */
public synchronized void flush() throws IOException {
    flushBuffer();
    out.flush();
}

/** Flush the internal buffer */
private void flushBuffer() throws IOException {
    if (count > 0) {
        out.write(buf, 0, count);
        count = 0;
    }
}
You can see that it writes the contents of its own buffer to the wrapped OutputStream. You can also see the default size of that buffer (or change it) by looking at its constructors:
/**
 * The internal buffer where data is stored.
 */
protected byte buf[];

/**
 * Creates a new buffered output stream to write data to the
 * specified underlying output stream.
 *
 * @param out the underlying output stream.
 */
public BufferedOutputStream(OutputStream out) {
    this(out, 8192);
}

/**
 * Creates a new buffered output stream to write data to the
 * specified underlying output stream with the specified buffer
 * size.
 *
 * @param out the underlying output stream.
 * @param size the buffer size.
 * @exception IllegalArgumentException if size <= 0.
 */
public BufferedOutputStream(OutputStream out, int size) {
    super(out);
    if (size <= 0) {
        throw new IllegalArgumentException("Buffer size <= 0");
    }
    buf = new byte[size];
}
So, the default size of the BufferedOutputStream buffer is 8192 bytes.
Now that you've got the gist, check out the code of the OutputStream that HttpURLConnection actually hands you, to see what kind of buffer it uses (if any).
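If you want explicit control over buffering on your side, rather than relying on whatever the connection's stream does internally, one common pattern is to wrap it yourself. A sketch (the URL, payload, and 16 KB size are placeholders, not from the question):

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class BufferedUpload {
    // Sketch: wrap the connection's stream in a BufferedOutputStream of a chosen size.
    static void post(byte[] payload) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://example.com/upload").openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        try (OutputStream out = new BufferedOutputStream(conn.getOutputStream(), 16 * 1024)) {
            out.write(payload);  // bytes sit in the 16 KB buffer...
            out.flush();         // ...until flush() pushes them toward the socket
        }
        conn.getResponseCode();  // actually perform the exchange
    }
}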
In your Java journey, you may also end up with native code that delegates the flushing to the OS. In that case you may have to check whether your OS uses its own buffer for I/O, and how big it is. I know this part of the answer may sound too broad, but that is the reality of it: you need to know what you're working with in order to understand what is happening underneath.
Check out this question:
What is the purpose of flush() in Java streams?
And this article:
http://www.oracle.com/technetwork/articles/javase/perftuning-137844.html
cheers!

Related

Java: know how many bytes were read by ImageIO.read() (and maybe other similar methods)

I am making a Java program that reads data from a binary stream (using a DataInputStream).
Sometimes during this process I need to read a data chunk, but the method that reads it (which I cannot modify) stops before reaching the end of the chunk. That is its normal behavior; apparently it just doesn't need the last bytes, but I can't do anything about the fact that they are there. This is not a problem in itself, because I know exactly how long the chunk is, i.e. how many bytes it contains, so I can skip bytes (with the skipBytes(int) method) until the end of the chunk. The problem is: I don't actually know how many bytes the method read (or left over), so I don't know how many bytes I need to skip to reach the end of the chunk.
Is there any way to:
know how many bytes were read from a stream since a certain point in time?
know how many bytes were read from a stream since it was opened?
any other way I could find out how many bytes my data-chunk-reading method just read (since it won't directly tell me)?
Just in case, I made a small diagram.
Thanks in advance
ImageInputStream can do what you want. It implements DataInput and it has most of the methods of InputStream. And it has getStreamPosition, seek and skipBytes methods.
However, as you correctly point out, ImageIO.read(ImageInputStream) would close the stream, preventing you from reading more than one image.
The solution is to avoid using ImageIO.read, and instead obtain an ImageReader explicitly, using ImageIO.getImageReaders. Then you can invoke an ImageReader’s read method, which does not close the stream.
Here’s how I implemented it:
public void readImages(InputStream source,
                       Consumer<? super BufferedImage> imageHandler)
        throws IOException {
    // Every image is at a byte index which is a multiple of this number.
    int boundary = 5000;
    try (ImageInputStream stream = ImageIO.createImageInputStream(source)) {
        while (true) {
            long pos = stream.getStreamPosition();
            Iterator<ImageReader> readers = ImageIO.getImageReaders(stream);
            if (!readers.hasNext()) {
                break;
            }
            ImageReader reader = readers.next();
            reader.setInput(stream);
            BufferedImage image = reader.read(0);
            imageHandler.accept(image);

            pos = stream.getStreamPosition();
            long bytesToSkip = boundary - (pos % boundary);
            if (bytesToSkip < boundary) {
                stream.skipBytes(bytesToSkip);
            }
        }
    }
}
And here’s how I tested it:
try (InputStream source = new BufferedInputStream(
        Files.newInputStream(Path.of(filename)))) {
    reader.readImages(source, img -> EventQueue.invokeLater(() -> {
        JOptionPane.showMessageDialog(null, new ImageIcon(img));
    }));
}
All the buffered read methods return the actual number of bytes read.
Quoting documentation for InputStream#read(byte[] b):
Returns:
the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
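If you'd rather stay with a plain InputStream/DataInputStream than switch to ImageInputStream, another option (a sketch of my own, not part of the quoted answers; Apache Commons IO and Guava ship a similar CountingInputStream) is a small counting wrapper whose count you compare before and after the opaque chunk-reading call:

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: counts every byte that passes through this stream.
// Does not account for mark/reset.
public class CountingInputStream extends FilterInputStream {
    private long count;

    public CountingInputStream(InputStream in) {
        super(in);
    }

    public long getCount() {
        return count;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            count++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }
}

Build the DataInputStream on top of this wrapper; the number of bytes the opaque method consumed is the difference between two getCount() calls, which tells you how far you still need to skipBytes.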

How does buffer size affect NIO Channel performance?

I was reading Hadoop IPC implementation.
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
/**
 * When the read or write buffer size is larger than this limit, i/o will be
 * done in chunks of this size. Most RPC requests and responses would be
 * smaller.
 */
private static int NIO_BUFFER_LIMIT = 8*1024; //should not be more than 64KB.
/**
 * This is a wrapper around {@link WritableByteChannel#write(ByteBuffer)}.
 * If the amount of data is large, it writes to channel in smaller chunks.
 * This is to avoid jdk from creating many direct buffers as the size of
 * buffer increases. This also minimizes extra copies in NIO layer
 * as a result of multiple write operations required to write a large
 * buffer.
 *
 * @see WritableByteChannel#write(ByteBuffer)
 */
private int channelWrite(WritableByteChannel channel,
                         ByteBuffer buffer) throws IOException {
    int count = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
        channel.write(buffer) : channelIO(null, channel, buffer);
    if (count > 0) {
        rpcMetrics.incrSentBytes(count);
    }
    return count;
}

/**
 * This is a wrapper around {@link ReadableByteChannel#read(ByteBuffer)}.
 * If the amount of data is large, it writes to channel in smaller chunks.
 * This is to avoid jdk from creating many direct buffers as the size of
 * ByteBuffer increases. There should not be any performance degredation.
 *
 * @see ReadableByteChannel#read(ByteBuffer)
 */
private int channelRead(ReadableByteChannel channel,
                        ByteBuffer buffer) throws IOException {
    int count = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
        channel.read(buffer) : channelIO(channel, null, buffer);
    if (count > 0) {
        rpcMetrics.incrReceivedBytes(count);
    }
    return count;
}
The logic is: if the buffer is small, it reads/writes the channel in one call; if the buffer is large, it does so in many calls, reading/writing 8 KB each time.
I don't understand the javadocs, or why it is done this way.
Why does it need "to avoid jdk from creating many direct buffers as the size of buffer increases"?
Does a big buffer size affect read performance as well?
I understand how buffer size affects FileInputStream performance (link). But this is a SocketChannel, so that seems unrelated.
Good question. sun.nio.ch.IOUtil is used when writing to a channel, and it has the following lines in its write(..) method:
int var7 = var5 <= var6 ? var6 - var5 : 0;
ByteBuffer var8 = Util.getTemporaryDirectBuffer(var7);
Here is Util.getTemporaryDirectBuffer
static ByteBuffer getTemporaryDirectBuffer(int var0) {
    Util.BufferCache var1 = (Util.BufferCache)bufferCache.get();
    ByteBuffer var2 = var1.get(var0);
    if (var2 != null) {
        return var2;
    } else {
        if (!var1.isEmpty()) {
            var2 = var1.removeFirst();
            free(var2);
        }
        return ByteBuffer.allocateDirect(var0);
    }
}
Under heavy load, when var0 spans a wide range of sizes, this creates lots of new direct buffers and free(..)s the old ones, because bufferCache has a limited length (equal to IOUtil.IOV_MAX, which is defined by the system configuration; on modern Linux systems the limit is 1024) and won't store buffers of every length.
I think this is what is meant by "This is to avoid jdk from creating many direct buffers as the size of buffer increases".
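To see why the chunking helps, here is a sketch of the pattern (the chunking idea only, not Hadoop's actual channelIO implementation): writing a large heap ByteBuffer in bounded slices means each underlying channel.write() only needs a small temporary direct buffer behind the scenes:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

public final class ChunkedChannelWriter {
    private static final int CHUNK_LIMIT = 8 * 1024; // same spirit as NIO_BUFFER_LIMIT

    // Writes the buffer in chunks of at most CHUNK_LIMIT bytes, so each
    // underlying channel.write() only requires a small temporary direct buffer.
    static int writeChunked(WritableByteChannel channel, ByteBuffer buffer)
            throws IOException {
        int originalLimit = buffer.limit();
        int written = 0;
        try {
            while (buffer.position() < originalLimit) {
                int chunkEnd = Math.min(buffer.position() + CHUNK_LIMIT, originalLimit);
                buffer.limit(chunkEnd);        // expose at most CHUNK_LIMIT bytes
                int n = channel.write(buffer);
                written += n;
                if (n == 0) {
                    break;                     // non-blocking channel not ready
                }
            }
        } finally {
            buffer.limit(originalLimit);       // restore the caller's limit
        }
        return written;
    }
}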

Reading first N bytes of a file as an InputStream in Java?

For the life of me, I haven't been able to find a question that matches what I'm trying to do, so I'll explain what my use-case is here. If you know of a topic that already covers the answer to this, please feel free to direct me to that one. :)
I have a piece of code that uploads a file to Amazon S3 periodically (every 20 seconds). The file is a log file being written by another process, so this function is effectively a means of tailing the log so that someone can read its contents in semi-real-time without having to have direct access to the machine that the log resides on.
Up until recently, I've simply been using the S3 PutObject method (using a File as input) to do this upload. But in AWS SDK 1.9, this no longer works because the S3 client rejects the request if the content size actually uploaded is greater than the content-length that was promised at the start of the upload. This method reads the size of the file before it starts streaming the data, so given the nature of this application, the file is very likely to have increased in size between that point and the end of the stream. This means that I need to now ensure I only send N bytes of data regardless of how big the file is.
I don't have any need to interpret the bytes in the file in any way, so I'm not concerned about encoding. I can transfer it byte-for-byte. Basically, what I want is a simple method where I can read the file up to the Nth byte, then have it terminate the read even if there's more data in the file past that point. (In other words, insert EOF into the stream at a specific point.)
For example, if my file is 10000 bytes long when I start the upload, but grows to 12000 bytes during the upload, I want to stop uploading at 10000 bytes regardless of that size change. (On a subsequent upload, I would then upload the 12000 bytes or more.)
I haven't found a pre-made way to do this - the best I've found so far appears to be IOUtils.copyLarge(InputStream, OutputStream, offset, length), which can be told to copy a maximum of "length" bytes to the provided OutputStream. However, copyLarge is a blocking method, as is PutObject (which presumably calls a form of read() on its InputStream), so it seems that I couldn't get that to work at all.
I haven't found any methods or pre-built streams that can do this, so it's making me think I'd need to write my own implementation that directly monitors how many bytes have been read. That would probably then work like a BufferedInputStream where the number of bytes read per batch is the lesser of the buffer size or the remaining bytes to be read. (eg. with a buffer size of 3000 bytes, I'd do three batches at 3000 bytes each, followed by a batch with 1000 bytes + EOF.)
Does anyone know a better way to do this? Thanks.
EDIT Just to clarify, I'm already aware of a couple alternatives, neither of which are ideal:
(1) I could lock the file while uploading it. Doing this would cause loss of data or operational problems in the process that's writing the file.
(2) I could create a local copy of the file before uploading it. This could be very inefficient and take up a lot of unnecessary disk space (this file can grow into the several-gigabyte range, and the machine it's running on may be that short of disk space).
EDIT 2: My final solution, based on a suggestion from a coworker, looks like this:
private void uploadLogFile(final File logFile) {
    if (logFile.exists()) {
        long byteLength = logFile.length();
        try (
            FileInputStream fileStream = new FileInputStream(logFile);
            InputStream limitStream = ByteStreams.limit(fileStream, byteLength);
        ) {
            ObjectMetadata md = new ObjectMetadata();
            md.setContentLength(byteLength);
            // Set other metadata as appropriate.
            PutObjectRequest req = new PutObjectRequest(bucket, key, limitStream, md);
            s3Client.putObject(req);
        } // plus exception handling
    }
}
LimitInputStream was what my coworker suggested, apparently not aware that it had been deprecated. ByteStreams.limit is the current Guava replacement, and it does what I want. Thanks, everyone.
It is relatively straightforward to wrap an InputStream so as to cap the number of bytes it will deliver before signaling end-of-data. FilterInputStream is targeted at this general kind of job, but since you have to override pretty much every method for this particular job, it just gets in the way.
Here's a rough cut at a solution:
import java.io.IOException;
import java.io.InputStream;

/**
 * An {@code InputStream} wrapper that provides up to a maximum number of
 * bytes from the underlying stream. Does not support mark/reset, even
 * when the wrapped stream does, and does not perform any buffering.
 */
public class BoundedInputStream extends InputStream {

    /** This stream's underlying {@code InputStream} */
    private final InputStream data;

    /** The maximum number of bytes still available from this stream */
    private long bytesRemaining;

    /**
     * Initializes a new {@code BoundedInputStream} with the specified
     * underlying stream and byte limit
     * @param data the {@code InputStream} serving as the source of this
     *        one's data
     * @param maxBytes the maximum number of bytes this stream will deliver
     *        before signaling end-of-data
     */
    public BoundedInputStream(InputStream data, long maxBytes) {
        this.data = data;
        bytesRemaining = Math.max(maxBytes, 0);
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(data.available(), bytesRemaining);
    }

    @Override
    public void close() throws IOException {
        data.close();
    }

    @Override
    public synchronized void mark(int limit) {
        // does nothing
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (bytesRemaining > 0) {
            int nRead = data.read(
                    buf, off, (int) Math.min(len, bytesRemaining));
            if (nRead > 0) {   // don't let a -1 (end of data) inflate the remaining count
                bytesRemaining -= nRead;
            }
            return nRead;
        } else {
            return -1;
        }
    }

    @Override
    public int read(byte[] buf) throws IOException {
        return this.read(buf, 0, buf.length);
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("reset() not supported");
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = data.skip(Math.min(n, bytesRemaining));
        bytesRemaining -= skipped;
        return skipped;
    }

    @Override
    public int read() throws IOException {
        if (bytesRemaining > 0) {
            int c = data.read();
            if (c >= 0) {
                bytesRemaining -= 1;
            }
            return c;
        } else {
            return -1;
        }
    }
}
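A quick usage sketch of the class above (the file name is a placeholder): snapshot the length first, then wrap the file stream so the read loop stops at that length even if the file keeps growing:

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedReadDemo {
    public static void main(String[] args) throws IOException {
        File logFile = new File("application.log");   // placeholder path
        long snapshotLength = logFile.length();       // size at the start of the upload

        try (InputStream in = new BoundedInputStream(
                new BufferedInputStream(new FileInputStream(logFile)), snapshotLength)) {
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;                            // consumes at most snapshotLength bytes
            }
            System.out.println("Read " + total + " bytes, even if the file grew meanwhile.");
        }
    }
}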

BufferedInputStream Mark/Reset invalid mark

I have 2 BufferedInputStreams which both contain an XML string: one small one, and one very large.
Here's what the beginning of each of these XML strings looks like:
<RootElement success="true">
I created a method which:
Sets the mark at the beginning of the input stream
Reads the first few bytes of the XML to check if the root element has a specific attribute.
Resets the input stream to the mark position, so another method can enjoy the complete full stream.
I was under the impression that the size of neither the buffer of the buffered input stream (default is 8192 bytes) nor the mark read limit would actually matter, because I'm only reading about the first 50 bytes before resetting, regardless of how large my input stream is.
Unfortunately I get an "IOException: Resetting to invalid mark" exception. Here's the relevant code:
private boolean checkXMLForSuccess(BufferedInputStream responseStream) throws XMLStreamException, FactoryConfigurationError
{
    //Normally, this should be set to the amount of bytes that can be read before invalidating
    //the mark. Because we use a default buffer size (1024 or 2048 bytes) that is much greater
    //than the amount of bytes we will read here (less than 100 bytes) this is not a concern.
    responseStream.mark(100);
    XMLStreamReader xmlReader = XMLInputFactory.newInstance().createXMLStreamReader(responseStream);
    xmlReader.next(); //Go to the root element
    //This is a for loop, but the root element can only have 1 attribute.
    for (int i = 0; i < xmlReader.getAttributeCount(); i++)
    {
        if (xmlReader.getAttributeLocalName(i).equals(SUCCES_ATTRIBUTE))
        {
            Boolean isSuccess = Boolean.parseBoolean(xmlReader.getAttributeValue(i));
            if (isSuccess)
            {
                try
                {
                    responseStream.reset();
                }
                catch (IOException e)
                {
                    //Oh oh... reset mark problem??
                }
                return true;
            }
        }
    }
    return false;
}
Now, of course, I tried setting the mark read limit to a higher number. I had to set it to a value of 10000 before it finally worked. I cannot imagine my code above needs to read 10000 bytes! What other factors could be responsible for this behaviour?
According to the Documentation of InputStream class - reset() method:
public void reset()
throws IOException
The general contract of reset is:
If the method markSupported returns true, then:
If the number of bytes read from the stream since mark was
last called is larger than the argument to mark at that last call,
then an IOException might be thrown.
In your code,
You have passed 100 as the byte read limit.
responseStream.mark(100);
and there is a very high probability the part of the code:
xmlReader.next();
reads more than 100 bytes, invalidating the mark, so the call to the reset() method throws an IOException.
XMLStreamReader.next():
Get next parsing event - a processor may return all contiguous
character data in a single chunk, or it may split it into several
chunks
So, the reader could have kept reading more than the read limit bytes causing the mark to be invalidated. (This happens irrespective of the file size, and if the contiguous characters are large).
The second case:
If the method markSupported returns false, then:
The call to reset may throw an IOException
but the BufferedInputStream supports marking,
public boolean markSupported()
Tests if this input stream supports the mark and reset methods. The
markSupported method of BufferedInputStream returns true.
So the second case can be ruled out.
This is a guess, but the XMLStreamReader is likely reading a large part of the InputStream during the getAttributeCount or getAttributeLocalName methods, although it's possible that it's being done when the XMLStreamReader is created...
I haven't looked through the OpenJDK code to confirm this, though.
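One way to sidestep the read-limit guessing game entirely (a sketch of an alternative, not taken from either answer above) is to avoid mark/reset: read the small prefix you need into memory, inspect it there, and then stitch it back in front of the remaining stream with a SequenceInputStream so the next consumer still sees the full document. readNBytes requires JDK 11+:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;

public final class PrefixPeek {
    // Consumes up to 'maxHead' bytes to look at, then hands back a stream that
    // still starts at byte 0, so downstream code (e.g. an XML parser) sees the
    // whole document and no mark/reset is needed at all.
    static InputStream peekAndRebuild(InputStream in, int maxHead) throws IOException {
        byte[] head = in.readNBytes(maxHead);                     // JDK 11+
        String prefix = new String(head, StandardCharsets.UTF_8);
        boolean success = prefix.contains("success=\"true\"");    // crude check on the root element
        System.out.println("success attribute present: " + success);
        return new SequenceInputStream(new ByteArrayInputStream(head), in);
    }
}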

A better way to find out how many bytes are in a stream?

Currently, I am relying on the ObjectInputStream.available() method to tell me how many bytes are left in a stream. Reason for this -- I am writing some unit/integration tests on certain functions that deal with streams and I am just trying to ensure that the available() method returns 0 after I am done.
Unfortunately, upon testing for failure (i.e., I have sent about 8 bytes down the stream) my assertion for available() == 0 is coming up true when it should be false. It should show >0 or 8 bytes!
I know that the available() method is classically unreliable, but I figured it would show something at least > 0!
Is there a more reliable way of checking if a stream is empty or not (that is my main goal here, after all)? Perhaps in the Apache IO domain or some other library out there?
Does anyone know why the available() method is so profoundly unreliable; what is the point of it? Or, is there a specific, proper way of using it?
Update:
So, as many of you can read from the comments, the main issue I am facing is that on one end of a stream, I am sending a certain number of bytes but on the other end, not all the bytes are arriving!
Specifically, I am sending 205498 bytes on one end and only getting 204988 on the other, consistently. I control both sides of this operation (two threads talking over a socket), but that shouldn't matter.
Here is the code I have written to collect all the bytes.
public static int copyStream(InputStream readFrom, OutputStream writeTo, int bytesToRead)
        throws IOException {
    int bytesReadTotal = 0, bytesRead = 0, countTries = 0, available = 0, bufferSize = 1024 * 4;
    byte[] buffer = new byte[bufferSize];
    while (bytesReadTotal < bytesToRead) {
        if (bytesToRead - bytesReadTotal < bufferSize)
            buffer = new byte[bytesToRead - bytesReadTotal];
        if (0 < (available = readFrom.available())) {
            bytesReadTotal += (bytesRead = readFrom.read(buffer));
            writeTo.write(buffer, 0, bytesRead);
            countTries = 0;
        } else if (countTries < 1000)
            try {
                countTries++;
                Thread.sleep(1L);
            } catch (InterruptedException ignore) {}
        else
            break;
    }
    return bytesReadTotal;
}
I put the countTries variable in there just to see what happens. Even without countTries, it will block forever before it reaches bytesToRead.
What would cause the stream to suddenly block indefinitely like that? I know the other end fully sends the bytes over (it actually uses the same method, and I can see it complete the function with bytesReadTotal matching the full bytesToRead in the end), but the receiver doesn't. In fact, when I look at the arrays, they match up perfectly up until the end as well.
UPDATE 2
I noticed that when I added a writeTo.flush() at the end of my copyStream method, it seems to work again. Hmm... why is flushing so vital in this situation? I.e., why would not flushing cause the stream to block forever?
The available() method only returns how many bytes can be read without blocking (which may be 0). In order to see if there are any bytes left in the stream, you have to read() or read(byte[]) which will return the number of bytes read. If the return value is -1 then you have reached the end of file.
This little code snippet will loop through an InputStream until it gets to the end (read() returns -1). I don't think it can ever return 0 because it should block until it can either read 1 byte or discover there is nothing left to read (and therefore return -1)
int currentBytesRead = 0;
int totalBytesRead = 0;
byte[] buf = new byte[1024];
while ((currentBytesRead = in.read(buf)) > 0) {
    totalBytesRead += currentBytesRead;
}
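Applied to the asker's copyStream, a sketch that relies on the blocking read() return value instead of polling available(), and flushes at the end so buffered bytes actually reach the peer, might look like this (class and method names are mine, not from the question):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class StreamCopy {
    // Copies exactly 'bytesToRead' bytes (or fewer if EOF is hit first) without
    // ever consulting available(); read() simply blocks until data arrives.
    public static int copyStream(InputStream readFrom, OutputStream writeTo, int bytesToRead)
            throws IOException {
        byte[] buffer = new byte[4 * 1024];
        int bytesReadTotal = 0;
        while (bytesReadTotal < bytesToRead) {
            int toRead = Math.min(buffer.length, bytesToRead - bytesReadTotal);
            int n = readFrom.read(buffer, 0, toRead);
            if (n == -1) {
                break;                      // peer closed before sending everything
            }
            writeTo.write(buffer, 0, n);
            bytesReadTotal += n;
        }
        writeTo.flush();                    // make sure buffered bytes reach the peer
        return bytesReadTotal;
    }
}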