Buffered Input Stream mark read limit


I am learning how to use an InputStream. I was trying to use mark for BufferedInputStream, but when I try to reset I have these exceptions:
java.io.IOException: Resetting to invalid mark
I think this means that my mark read limit is set wrong. I actually don't know how to set the read limit in mark(). I tried like this:
is = new BufferedInputStream(is);
is.mark(is.available());
This is also wrong.
is.mark(16);
This also throws the same exception.
How do I know what read limit I am supposed to set? Since I will be reading different file sizes from the input stream.

mark is sometimes useful if you need to inspect a few bytes beyond what you've read to decide what to do next, then you reset back to the mark and call the routine that expects the file pointer to be at the beginning of that logical part of the input. I don't think it is really intended for much else.
If you look at the javadoc for BufferedInputStream it says
The mark operation remembers a point in the input stream and the reset operation causes all the bytes read since the most recent mark operation to be reread before new bytes are taken from the contained input stream.
The key thing to remember here is that once you mark a spot in the stream, if you keep reading beyond the read limit you passed to mark(), the mark is no longer guaranteed to be valid and the call to reset may fail. So mark is good for specific situations and not much use in other cases.
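As a rough illustration of the "inspect a few bytes, then reset" use case, here is a minimal sketch. The class name MarkPeekDemo and the helper startsWith are made up for this example, and readNBytes assumes Java 11 or later. The read limit passed to mark() is simply the number of bytes the helper intends to look at.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MarkPeekDemo {

    /**
     * Hypothetical helper: checks whether the stream starts with the given
     * prefix and leaves the stream position where it was.
     */
    static boolean startsWith(BufferedInputStream in, byte[] prefix) throws IOException {
        // Promise to read at most prefix.length bytes before calling reset().
        in.mark(prefix.length);
        try {
            byte[] head = in.readNBytes(prefix.length); // Java 11+
            return Arrays.equals(head, prefix);
        } finally {
            in.reset(); // rewind to the marked position
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "GIF89a-rest-of-file".getBytes(StandardCharsets.US_ASCII);
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));

        System.out.println(startsWith(in, "GIF8".getBytes(StandardCharsets.US_ASCII)));
        // The next read still starts at the first byte of the stream.
        System.out.println((char) in.read());
    }
}
```

Because the helper never reads more than the limit it passed to mark(), the reset() call stays within what the mark contract allows, whatever the total size of the stream.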

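This will read 5 times from the same BufferedInputStream, provided available() reports everything that remains in the stream (it is only an estimate, so this is not guaranteed for every source).
for (int i = 0; i < 5; i++) {
    // The read limit must cover everything that is read before reset()
    inputStream.mark(inputStream.available() + 1);
    // Read from input stream
    Thumbnails.of(inputStream).forceSize(160, 160).toOutputStream(out);
    inputStream.reset();
}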

The value you pass to mark() is how far back you will need to be able to reset. If you need to reset to the beginning of the stream, you will need a buffer as big as the entire stream. This is probably not a great design, as it will not scale well to large streams. If you need to read the stream twice and you don't know the source of the data (if it's a file, you could just re-open it), then you should probably copy it to a temp file so you can re-read it at will.
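The temp-file approach could look roughly like the sketch below. The class name SpoolToTempFile and the helper spool are invented for the example, the in-memory source stands in for a stream of unknown origin, and readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SpoolToTempFile {

    /** Hypothetical helper: copies the whole stream into a new temp file. */
    static Path spool(InputStream source) throws IOException {
        Path tmp = Files.createTempFile("spool-", ".bin");
        // createTempFile already created the file, so allow overwriting it.
        Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        InputStream source =
                new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        Path tmp = spool(source);
        try {
            // Each pass opens a fresh stream, so no mark()/reset() is needed.
            for (int pass = 1; pass <= 2; pass++) {
                try (InputStream in = Files.newInputStream(tmp)) {
                    String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                    System.out.println("pass " + pass + ": " + text);
                }
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The memory cost stays constant regardless of the stream size, at the price of one extra write to disk.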

Related

Taking input in Java

I have some confusion if someone can help. Tried searching the web for it but didn't get any satisfying answer.
Why don't we simply use System.in.somemethod() to take input in Java just like we do for output? Like System.out.println() is used so why not System.in as it is? Why is there a long process for Input?

The only methods that System.in, an InputStream, provides are the overloads of read. Sure, you could do something like:
byte[] bytes = new byte[5];
System.in.read(bytes);
System.out.println(Arrays.toString(bytes));
to read five bytes from the console. But this has the following disadvantages:
You need to handle the checked IOException. (not shown in the code snippet above)
Hard to work with bytes. (unless you want them specifically)
You usually just want to read the input until the end of a line. With this it's hard to know where the end of a line is.
So that's why people use Scanners to wrap the System.in stream into something more user-friendly.
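For comparison with the raw read call above, here is a small sketch of the Scanner approach. The class LineInput and the method readLines are names chosen for this example; the method takes any InputStream so that it can be tried without a console.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LineInput {

    /** Hypothetical helper: reads every line of text from the stream. */
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        // The scanner is deliberately not closed: closing it would also
        // close the underlying stream (for example System.in).
        Scanner scanner = new Scanner(in, "UTF-8");
        while (scanner.hasNextLine()) {
            lines.add(scanner.nextLine()); // line terminator is stripped
        }
        return lines;
    }

    public static void main(String[] args) {
        // Reads from the console until end of input (Ctrl+D or Ctrl+Z).
        for (String line : readLines(System.in)) {
            System.out.println("got: " + line);
        }
    }
}
```

Note that there is no checked IOException to handle here and no need to search for line endings by hand.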

Taking input from the command line will always be trickier than just outputting data. This is because there is no way to know that the input is semantically correct, structured correctly or even syntactically correct.
If you just want to read bytes from System.in then a lot of the uncertainty of the input disappears. In that case there are only two things to take into account: I/O errors and end-of-input, both of which are also present for System.out. The only other thing that may be tricky is that InputStream may not return all the bytes that are requested in a single call to read.
So reading data from System.in isn't hard; interpreting the data - which often comes down to parsing the data or validating the data - is the hard part. And that's why often the Scanner class is used to make sense of the input.
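The point about read possibly returning fewer bytes than requested is usually handled with a loop. The sketch below is one way to write it; ReadFullyDemo, readFully and oneByteAtATime are names invented for this example, and the one-byte wrapper exists only to simulate a source that delivers short reads.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {

    /**
     * Hypothetical helper: keeps calling read until the buffer is full or
     * the stream ends. Returns the number of bytes actually stored.
     */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of input reached before the buffer was full
            }
            total += n;
        }
        return total;
    }

    /** Test double: a stream that hands out at most one byte per read call. */
    static InputStream oneByteAtATime(InputStream in) {
        return new FilterInputStream(in) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 1));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        InputStream slow = oneByteAtATime(new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5}));
        byte[] buf = new byte[5];
        System.out.println(readFully(slow, buf)); // prints 5
    }
}
```

On Java 9 and later, InputStream.readNBytes does the same job and may be preferable to a hand-written loop.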

Just as you cannot use System.out.somemethod() instead of System.out.println(), you cannot use System.in.somemethod() instead of System.in.read().

How to re-read an InputStream after calling IOUtils.copy?

I simply use
IOUtils.copy(myInputStream, myOutputStream);
And I see that before calling IOUtils.copy the input stream is available to read, and afterwards it is not.
flux.available()
(int) 1368181 (before)
(int) 0 (after)
I saw some explanation on this post, and I see I can copy the bytes from my InputStream to a ByteArrayInputStream and then use mark(0) and reset() in order to read the input stream multiple times.
Here is the resulting code (which works).
I find this code very verbose, and I'd like to know if there is a better way to do that.
ByteArrayInputStream fluxResetable = new ByteArrayInputStream(IOUtils.toByteArray(myInputStream));
fluxResetable.mark(0);
IOUtils.copy(fluxResetable, myOutputStream);
fluxResetable.reset();

An InputStream, unless otherwise stated, is single shot: you consume it once and that's it.
If you want to read it many times, that isn't just a stream any more, it's a stream with a buffer. Your solution reflects that accurately, so it is acceptable. The one thing I would probably change is to store the byte array and always create a new ByteArrayInputStream from it when needed, rather than resetting the same one:
byte [] content = IOUtils.toByteArray(myInputStream);
IOUtils.copy(new ByteArrayInputStream(content), myOutputStream);
doSomethingElse(new ByteArrayInputStream(content));
The effect is more or less the same but it's slightly easier to see what you're trying to do.

Can I perform successive mark operations on an InputStream in Java

I'm trying to build a simple parser, and since InputStream doesn't have some peek-like method, I'm using mark and reset.
But I suspect that successive calls to mark invalidate the previous ones. Is that the case?
Is it possible to do something like
foo.mark(1);
...
foo.mark(2);
...
foo.reset();
...
foo.reset();
If not, is there some other way to simulate this or the peek method?
Thx.

Your suspicion is correct: the InputStream.mark(int readlimit) method will allow you to reposition the stream only to the last marked position, provided you have read no more than readlimit bytes since then. If you want a "peekable" InputStream you may want to consider the PushbackInputStream. It doesn't explicitly offer peek functionality, but it will allow you to "push back" bytes you have read.
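A peek built on PushbackInputStream might look like this minimal sketch. PeekDemo and peek are names made up for the example; the default pushback buffer holds a single byte, which is all this helper needs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class PeekDemo {

    /**
     * Hypothetical helper: returns the next byte without consuming it,
     * or -1 at end of stream.
     */
    static int peek(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b != -1) {
            in.unread(b); // put the byte back so the next read sees it again
        }
        return b;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "ab".getBytes(StandardCharsets.US_ASCII);
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data));

        System.out.println((char) peek(in));   // prints a
        System.out.println((char) in.read());  // prints a
        System.out.println((char) in.read());  // prints b
        System.out.println(peek(in));          // prints -1
    }
}
```

To look further ahead than one byte, the PushbackInputStream(InputStream, int) constructor accepts a larger pushback buffer size.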

Marks don't nest.
If you want to reread the stream several times, you might need to copy (a portion of) the stream into a byte array, and make a ByteArrayInputStream of it. You still can't have multiple marks, but you can have multiple ByteArrayInputStreams. (Or just forget about ByteArrayInputStream and pick bytes off the array directly.)

flushing input stream: java

Is there a way to flush the input stream in Java, perhaps prior to closing it? I am asking in relation to iteratively invoking the statements below while reading several files on disk:
InputStream fileStream = item.openStream();
fileStream.close();

InputStream cannot be flushed. Why do you want to do this?
OutputStream can be flushed as it implements the interface Flushable. Flushing makes IMHO only sense in scenarios where data is written (to force a write of buffered data). Please see the documentation of Flushable for all implementing classes.
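To make the "force a write of buffered data" point concrete, here is a small sketch on the output side. FlushDemo and sizesAroundFlush are names invented for the example, and it assumes the payload is smaller than the BufferedOutputStream's internal buffer (8192 bytes by default), so nothing reaches the sink until flush() is called.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class FlushDemo {

    /**
     * Hypothetical helper: writes the payload through a buffered stream and
     * returns {bytes in the sink before flush, bytes in the sink after flush}.
     */
    static int[] sizesAroundFlush(byte[] payload) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(sink);

        out.write(payload);        // small payloads stay in the buffer
        int before = sink.size();

        out.flush();               // pushes the buffered bytes to the sink
        int after = sink.size();

        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesAroundFlush(new byte[] {1, 2, 3});
        System.out.println(sizes[0] + " -> " + sizes[1]); // prints 0 -> 3
    }
}
```

There is no equivalent on the input side because an InputStream has nothing pending that needs to be pushed anywhere.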

This is an old question but appears to be the only one of its kind, and I think there is a valid use case for "flushing" an input stream.
In Java, if you are using a BufferedReader or BufferedInputStream (which I believe is a common case), "flushing" the stream can be considered to be equivalent to discarding all data currently in the buffer -- i.e., flushing the buffer.
For an example of when this might be useful, consider implementing a REPL for a programming language such as Python or similar.
In this case you might use a BufferedReader on System.in. The user enters a (possibly large) expression and hits enter. At this point, the expression (or some part of it) is stored in the buffer of your Reader.
Now, if a syntax error occurs somewhere within the expression, it will be caught and displayed to the user. However, the remainder of the expression still resides in the input buffer.
When the REPL loop continues, it will begin reading just beyond the point where the syntax error occurred, in the middle of some erroneous expression. This is likely not desirable. Rather, it would be better to simply discard the remainder of the buffer and continue with a "fresh start."
In this sense, we can use the BufferedReader API method ready() to discard any remaining buffered characters. The documentation reads:
"Tells whether this stream is ready to be read. A buffered character stream is ready if the buffer is not empty, or if the underlying character stream is ready."
Then we can define a method to "flush" a BufferedReader as:
void flush(BufferedReader input) throws IOException
{
    while (input.ready())
        input.read();
}
which simply discards all remaining data until the buffer is empty. We then call flush() after handling a syntax error (by displaying to the user). When the REPL loop resumes you have an empty buffer and thus do not get a pile of meaningless errors caused by the "junk" left in the buffer.

Java BufferedReader back to the top of a text file?

I currently have 2 BufferedReaders initialized on the same text file. When I'm done reading the text file with the first BufferedReader, I use the second one to make another pass through the file from the top. Multiple passes through the same file are necessary.
I know about reset(), but it needs to be preceded with calling mark() and mark() needs to know the size of the file, something I don't think I should have to bother with.
Ideas? Packages? Libs? Code?
Thanks
TJ

The Buffered readers are meant to read a file sequentially. What you are looking for is the java.io.RandomAccessFile, and then you can use seek() to take you to where you want in the file.
The random access reader is implemented like so:
String fileName = "c:/myraffile.txt";
File file = new File(fileName);
// try-with-resources closes the file when the block exits
try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
    raf.readChar();  // reads two bytes as one char
    raf.seek(0);     // jump back to the start of the file
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
The second constructor argument is the access mode: "r" opens the file read-only, while "rw" opens it for reading and writing.
The reason the sequential access readers are set up like this is so that they can implement their buffers and so that things cannot be changed beneath their feet. For example, the file reader that is given to the buffered reader should only be operated on by that buffered reader. If another piece of code could also affect it, you could get inconsistent behavior: one reader advances the position in the file reader while the other expects it to stay the same, and when you then use the other reader it is at an undetermined location.

What's the disadvantage of just creating a new BufferedReader to read from the top? I'd expect the operating system to cache the file if it's small enough.
If you're concerned about performance, have you proved it to be a bottleneck? I'd just do the simplest thing and not worry about it until you have a specific reason to. I mean, you could just read the whole thing into memory and then do the two passes on the result, but again that's going to be more complicated than just reading from the start again with a new reader.
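The "just open a new reader for each pass" suggestion could be sketched as follows. TwoPasses, countLines and longestLine are names invented for the example, and the two passes are arbitrary stand-ins for whatever the real passes compute.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class TwoPasses {

    /** First pass: count the lines, using its own reader. */
    static int countLines(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    /** Second pass: find the longest line, again from the top with a fresh reader. */
    static int longestLine(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int longest = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                longest = Math.max(longest, line.length());
            }
            return longest;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("two-passes-", ".txt");
        try {
            Files.write(file, Arrays.asList("a", "bbb", "cc"));
            System.out.println(countLines(file));  // prints 3
            System.out.println(longestLine(file)); // prints 3
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Each pass owns and closes its reader, so neither mark() nor the file size comes into play.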

The best way to proceed is to change your algorithm, in a way in which you will NOT need the second pass. I used this approach a couple of times, when I had to deal with huge (but not terrible, i.e. few GBs) files which didn't fit the available memory.
It might be hard, but the performance gain is usually worth the effort.

About mark/reset:
The mark method in BufferedReader takes a readAheadLimit parameter which limits how far you can read after a mark before reset becomes impossible. Resetting doesn't actually mean a file system seek(0), it just seeks inside the buffer. To quote the Javadoc:
readAheadLimit - Limit on the number of characters that may be read while still preserving the mark. After reading this many characters, attempting to reset the stream may fail. A limit value larger than the size of the input buffer will cause a new buffer to be allocated whose size is no smaller than limit. Therefore large values should be used with care.
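A small sketch of mark and reset with a bounded readAheadLimit is shown below. ReaderMarkDemo and peekLine are names made up for the example, and the limit of 64 characters is an arbitrary assumption that must be larger than the line being peeked at.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReaderMarkDemo {

    /**
     * Hypothetical helper: returns the next line without consuming it.
     * The line (including its terminator) must fit within readAheadLimit
     * characters, otherwise reset() may throw an IOException.
     */
    static String peekLine(BufferedReader reader, int readAheadLimit) throws IOException {
        reader.mark(readAheadLimit);
        String line = reader.readLine();
        reader.reset(); // seeks inside the buffer, not in the underlying source
        return line;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("header\nbody\n"));

        System.out.println(peekLine(reader, 64)); // prints header
        System.out.println(reader.readLine());    // prints header
        System.out.println(reader.readLine());    // prints body
    }
}
```

This works for short look-ahead. Rewinding a whole file this way would need a limit as large as the file, which is the cost the Javadoc warns about.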

"The whole business about mark() and reset() in BufferedReader smacks of poor design."
Why don't you extend this class and have it call mark() in the constructor and then reset() in a topOfFile() method?
BR,
~A
