I am implementing a facility that uses a generic reading interface. I am using the java.lang.Readable interface, which writes its data out to a CharBuffer.
What it doesn't say is whether the read call will block. It does return the number of characters written into the buffer, but to me that could also just mean the buffer didn't have enough space left for all of the waiting input, so only part of it was written. But what happens if the buffer has plenty of space and no input (or fewer characters than the buffer can hold) is available? Will read return zero (or a small integer), or will it block?
Yes, it blocks. Once the operation completes, this method returns the number of char values added to the buffer, or -1 if this source of characters is at its end.
If it were non-blocking, there would have to be a mechanism to notify you when the source becomes readable, and the Readable interface offers none. So it can't be non-blocking.
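You can see the blocking behaviour with any Reader (every Reader is a Readable). A minimal sketch:

import java.io.InputStreamReader;
import java.nio.CharBuffer;

public class ReadableDemo {
    public static void main(String[] args) throws Exception {
        Readable source = new InputStreamReader(System.in);
        CharBuffer buffer = CharBuffer.allocate(1024);
        int n = source.read(buffer);  // blocks here until input arrives,
                                      // even though the buffer has free space
        System.out.println(n + " chars read (-1 would mean end of stream)");
    }
}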
I'm new to Java ByteBuffers and was wondering about the correct way to write to a ByteBuffer after it has been flipped.
In my use case, I am writing an outputBuffer to a socket:
outBuffer.flip();
//Non-blocking SocketChannel
int bytesWritten = getSocket().write(outBuffer);
After this, the output buffer has to be written to again. Also, not all of the bytes in outBuffer may have been written to the socket.
Since it is currently flipped, how can I make it writable again without overwriting any data that is still in the buffer and wasn't written to the socket?
If I am right, outBuffer.position() == bytesWritten, and the limit should mark how much data there was to write.
So would using the following in order to reuse the output buffer be right? :
int limit = outBuffer.limit();
outBuffer.limit(outBuffer.capacity());
outBuffer.position(limit);
Again, from the API spec:
The following loop copies bytes from one channel to another via the buffer buf:
while (in.read(buf) >= 0 || buf.position() != 0) {
    buf.flip();
    out.write(buf);
    buf.compact();    // In case of partial write
}
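Filled out into a complete, compilable method, that loop might look like this (a sketch; the method signature and buffer size are mine):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

static void copy(ReadableByteChannel in, WritableByteChannel out) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(8192);
    while (in.read(buf) >= 0 || buf.position() != 0) {
        buf.flip();      // switch to drain mode
        out.write(buf);  // may write only part of the buffer
        buf.compact();   // preserve unwritten bytes, back to fill mode
    }
}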
since it is currently flipped
It will stay flipped. The write doesn't change that.
how can I make it writable again, without overriding any data if it is still in the buffer and wasn't written to the socket?
You don't have to do anything, but if you want to read into the buffer before you write again, you should do flip/write/compact. If you just want to repeat the write, call write() again with the buffer still in its current state.
But I prefer to always keep these buffers ready for reading, so there is no possibility of a slip-up, and to flip/write/compact (or flip/get/compact) when those operations are necessary, atomically as it were.
Note that you should not use clear(), unless you are certain that the write was complete and the buffer is now empty. In that case compact and clear are equivalent. But it is simpler to just always compact.
If you're copying in blocking mode, use the loop quoted by #zlakad.
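For the non-blocking case in the question, each write cycle can then be (a sketch reusing the question's outBuffer and getSocket(), with the buffer kept in fill mode between cycles):

outBuffer.flip();                                 // switch to drain mode for writing
int bytesWritten = getSocket().write(outBuffer);  // may be a partial write
outBuffer.compact();                              // keep unwritten bytes, back to fill mode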
When looking at the PrintWriter contract for the following constructor:
public PrintWriter(OutputStream out, boolean autoFlush)
Creates a new PrintWriter from an existing OutputStream. This convenience constructor creates the necessary intermediate OutputStreamWriter, which will convert characters into bytes using the default character encoding.
Parameters:
out - An output stream
autoFlush - A boolean; if true, the println, printf, or format methods will flush the output buffer
See Also:
OutputStreamWriter.OutputStreamWriter(java.io.OutputStream)
Notice the autoFlush flag only works on println, printf, and format. Now, I know that printf and format basically do the exact same thing as print except with more options, but I just don't see why they didn't include print as well in the contract. Why did they make this decision?
I suspect it's because the Java authors are making assumptions about performance:
Consider the following code:
public static void printArray(int[] array, PrintWriter writer) {
    for (int i = 0; i < array.length; i++) {
        writer.print(array[i]);
        if (i != array.length - 1) writer.print(',');
    }
}
You almost certainly would not want such a method to call flush() after every single print call. It could be a big performance hit, especially for large arrays. And if, for some reason, you did want that, you could just call flush() yourself.
The idea is that the printf, format, and println methods are likely to print a good chunk of text all at once, so it makes sense to flush after each one. But it would rarely, if ever, make sense to flush after only one or a few characters.
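A quick way to see the difference (a minimal sketch):

import java.io.PrintWriter;

public class AutoFlushDemo {
    public static void main(String[] args) {
        PrintWriter writer = new PrintWriter(System.out, true); // autoFlush on
        writer.print("buffered, not flushed"); // print() never triggers autoflush
        writer.println(" - flushed now");      // println() flushes the buffer
        writer.printf("%d%n", 42);             // printf() and format() flush too
    }
}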
After some searching, I have found a citation for this reasoning (emphasis mine):
Most of the examples we've seen so far use unbuffered I/O. This means each read or write request is handled directly by the underlying OS. This can make a program much less efficient, since each such request often triggers disk access, network activity, or some other operation that is relatively expensive.
To reduce this kind of overhead, the Java platform implements buffered I/O streams. Buffered input streams read data from a memory area known as a buffer; the native input API is called only when the buffer is empty. Similarly, buffered output streams write data to a buffer, and the native output API is called only when the buffer is full.
<snip>
It often makes sense to write out a buffer at critical points, without waiting for it to fill. This is known as flushing the buffer.
Some buffered output classes support autoflush, specified by an optional constructor argument. When autoflush is enabled, certain key events cause the buffer to be flushed. For example, an autoflush PrintWriter object flushes the buffer on every invocation of println or format.
I'm working on a Java program where I'm reading from a file in dynamic, unknown blocks. That is, each block of data will not always be the same size and the size is determined as data is being read. For I/O I'm using a MappedByteBuffer (the file inputs are on the order of MB).
My goal:
Find an efficient way to store each complete block during the input phase so that I can process it.
My constraints:
I am reading one byte at a time from the buffer
My processing method takes a primitive byte array as input
Each block gets processed before the next block is read
What I've tried:
I played around with dynamic structures like Lists, but they don't expose a primitive backing array, and the conversion time to a primitive array concerns me
I also thought about using a String to store each block and then getBytes() to get the byte[], but it's so slow
I considered reading the file multiple times in order to find the block size first, and then grabbing the relevant bytes
I am trying to find an approach that doesn't defeat the purpose of fast I/O. Any advice would be greatly appreciated.
Additional Info:
I'm using a rolling hash to decide where blocks should end
Here's a bit of pseudo-code:
circular_buffer[] = read first 128 bytes
rolling_hash = hash(circular_buffer[])
block_storage = ???   // this is the data structure I'd like to use
while file has more text
    b = next byte
    add b to block_storage
    add b to next index in circular_buffer (if at the end, wrap around and overwrite the front)
    shift rolling_hash one byte to the right
    if hash has a certain characteristic
        process block_storage as a byte[]   // should contain the entire block of data
As you can see, I'm reading one byte at a time, and storing/overwriting that one byte repeatedly. However, once I get to the processing stage, I want to be able to access all of the info in the block. There is no predetermined max size of a block either, so I can't pre-allocate.
It seems to me that you require a dynamically growing buffer. You can use the built-in ByteArrayOutputStream to achieve that. It will automatically grow to store all data written to it. You can use write(int b) and toByteArray() to realize add b to block_storage and process block_storage as a byte[].
But take care: this stream will grow unbounded. You should implement some sanity checks around it to avoid using up all memory (e.g. count the bytes written to it and throw an exception when the count exceeds a reasonable amount). Also make sure to close the stream and throw away the reference after consuming each block, to allow the GC to free up memory.
edit: As #marcman pointed out, the buffer can be reset().
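Putting it together, a sketch of how this might map onto the pseudo-code (hashHasCharacteristic() and process() are placeholders for the asker's rolling-hash test and processing method):

import java.io.ByteArrayOutputStream;
import java.nio.MappedByteBuffer;

void readBlocks(MappedByteBuffer in) {
    ByteArrayOutputStream blockStorage = new ByteArrayOutputStream();
    while (in.hasRemaining()) {
        byte b = in.get();                       // read one byte at a time
        blockStorage.write(b);                   // "add b to block_storage"
        // ... update the circular buffer and roll the hash here ...
        if (hashHasCharacteristic()) {           // placeholder: your rolling-hash test
            process(blockStorage.toByteArray()); // the entire block as a byte[]
            blockStorage.reset();                // reuse the stream for the next block
        }
    }
}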
I'm trying to build a simple parser, and since InputStream doesn't have a peek-like method, I'm using mark and reset.
But I suspect that successive calls to mark invalidate the previous ones. Is that the case?
Is it possible to do something like
foo.mark(1);
...
foo.mark(2);
...
foo.reset();
...
foo.reset();
If not, is there some other way to simulate this or the peek method?
Thx.
Your suspicion is correct: the InputStream.mark(int readlimit) method will allow you to reposition the stream only to the last marked position, provided you have read fewer than readlimit bytes. If you want a "peekable" InputStream, you may want to consider PushbackInputStream. It doesn't explicitly offer peek functionality, but it will allow you to "push back" bytes you have read.
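A one-byte peek with PushbackInputStream might look like this (a minimal sketch; the default pushback buffer holds one byte):

import java.io.IOException;
import java.io.PushbackInputStream;

static int peek(PushbackInputStream in) throws IOException {
    int b = in.read();    // consume one byte
    if (b != -1)
        in.unread(b);     // push it back so the next read() returns it again
    return b;             // -1 means end of stream
}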
Marks don't nest.
If you want to reread the stream several times, you might need to copy (a portion of) the stream into a byte array, and make a ByteArrayInputStream of it. You still can't have multiple marks, but you can have multiple ByteArrayInputStreams. (Or just forget about ByteArrayInputStream and pick bytes off the array directly.)
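For example (a sketch; requires Java 11+ for readNBytes(int), and original stands for your source stream):

import java.io.ByteArrayInputStream;
import java.io.InputStream;

byte[] head = original.readNBytes(8192);             // copy a portion of the stream
InputStream first = new ByteArrayInputStream(head);  // reread it...
InputStream second = new ByteArrayInputStream(head); // ...as many times as you like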
Is there a way to flush an input stream in Java, perhaps prior to closing it? This is in relation to iteratively invoking the statements below while reading several files on disk:
InputStream fileStream = item.openStream();
fileStream.close();
InputStream cannot be flushed. Why do you want to do this?
OutputStream can be flushed, as it implements the Flushable interface. IMHO, flushing only makes sense in scenarios where data is written (to force a write of buffered data). Please see the documentation of Flushable for all implementing classes.
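For contrast, on the write side flushing looks like this (a minimal sketch; "out.txt" is just an example file):

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;

try (OutputStream out = new BufferedOutputStream(new FileOutputStream("out.txt"))) {
    out.write("hello".getBytes());
    out.flush();   // force the buffered bytes down to the file right now
}                  // close() also flushes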
This is an old question but appears to be the only one of its kind, and I think there is a valid use case for "flushing" an input stream.
In Java, if you are using a BufferedReader or BufferedInputStream (which I believe is a common case), "flushing" the stream can be considered to be equivalent to discarding all data currently in the buffer -- i.e., flushing the buffer.
For an example of when this might be useful, consider implementing a REPL for a programming language such as Python or similar.
In this case you might use a BufferedReader on System.in. The user enters a (possibly large) expression and hits enter. At this point, the expression (or some part of it) is stored in the buffer of your Reader.
Now, if a syntax error occurs somewhere within the expression, it will be caught and displayed to the user. However, the remainder of the expression still resides in the input buffer.
When the REPL loop continues, it will begin reading just beyond the point where the syntax error occurred, in the middle of some erroneous expression. This is likely not desirable. Rather, it would be better to simply discard the remainder of the buffer and continue with a "fresh start."
To that end, we can use the BufferedReader API method ready() to detect whether any buffered characters remain, and read() to discard them. The documentation of ready() reads:
"Tells whether this stream is ready to be read. A buffered character stream is ready if the buffer is not empty, or if the underlying character stream is ready."
Then we can define a method to "flush" a BufferedReader as:
void flush(BufferedReader input) throws IOException
{
    while (input.ready())
        input.read();   // discard one buffered character at a time
}
which simply discards all remaining data until the buffer is empty. We then call flush() after handling a syntax error (by displaying it to the user). When the REPL loop resumes, you have an empty buffer and thus do not get a pile of meaningless errors caused by the "junk" left in the buffer.
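In the REPL loop, the call site might then look like this (a sketch; evaluate() and SyntaxError are placeholders for your parser's API):

import java.io.BufferedReader;
import java.io.InputStreamReader;

BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
while (true) {
    try {
        evaluate(input);          // placeholder: parse and run one expression
    } catch (SyntaxError e) {     // placeholder exception type
        System.err.println(e.getMessage());
        flush(input);             // discard the leftover "junk" in the buffer
    }
}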