I'm new to Java streams. I would like to read a specific file's content and then read it again from the beginning. I have created a BufferedInputStream, and I'm confused by the documentation of BufferedInputStream.mark(int readlimit).
Documentation says:
public void mark(int readlimit)
This method marks a position in the input to which the stream can be "reset" by calling the reset() method. The parameter readlimit is the number of bytes that can be read from the stream after setting the mark before the mark becomes invalid. For example, if mark() is called with a read limit of 10 and 11 bytes are then read from the stream before reset() is called, the mark becomes invalid and the stream object is not required to remember it.
Note that the number of bytes that can be remembered by this method can be greater than the size of the internal read buffer. It is also not dependent on the subordinate stream supporting mark/reset functionality.
Overrides:
mark in class FilterInputStream
Parameters:
readlimit - The number of bytes that can be read before the mark becomes invalid
My code is:
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class Test {
    public static void main(String[] args) throws IOException {
        File resource = new File("beforeFix.txt");
        FileInputStream fileInputStream = new FileInputStream(resource);
        BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream);
        int i = bufferedInputStream.read();
        bufferedInputStream.mark(1);
        // read 20 more bytes, far more than the mark limit of 1
        for (int n = 0; n < 20; n++) {
            i = bufferedInputStream.read();
        }
        bufferedInputStream.reset(); // still works, despite the mark limit of 1
        // read 4 more bytes, then reset again
        for (int n = 0; n < 4; n++) {
            i = bufferedInputStream.read();
        }
        bufferedInputStream.reset();
    }
}
In the above code I set the mark limit to 1, but the mark does not become invalid, contrary to what the documentation says.
Can anyone clearly explain the actual purpose of this parameter, with a small example?
Thanks in advance
In order for reset to work and go back to the position you marked, the data read after you marked needs to be buffered in memory. The value you specify when marking is the amount of memory that should be reserved for this.
So if you intend to read 100 bytes before calling reset, your buffer needs to be at least 100 bytes, and that is what you have to call mark with.
bufferedInputStream.mark(200);
... read no more than 200 bytes ...
bufferedInputStream.reset(); // reset back to marked position
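A complete, self-contained sketch of this pattern (using an in-memory stream instead of a file, so it runs as-is):

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class MarkResetDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = "abcdef".getBytes("US-ASCII");
        BufferedInputStream in =
                new BufferedInputStream(new ByteArrayInputStream(data));

        System.out.println((char) in.read()); // a
        in.mark(3);                           // reserve room to re-read up to 3 bytes
        System.out.println((char) in.read()); // b
        System.out.println((char) in.read()); // c
        in.reset();                           // jump back to the mark
        System.out.println((char) in.read()); // b again
    }
}
```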
Update
It looks like the documentation for mark does not match the actual behaviour. The documentation states:
the maximum limit of bytes that can be read before the mark position becomes invalid
However, it looks like it should be the minimum limit, or at the very least the underlying implementations are not required to discard the mark as soon as the read limit is exceeded if they can still support resetting to the marked position.
By calling mark with a specified limit, you are requesting the capability to support resetting after reading up to the specified limit, you are not denying a capability beyond that. The specification clearly says:
However, the stream is not required to remember any data at all if more than readlimit bytes are read from the stream before reset is called.
“is not required to” does not imply “is not allowed to”. The specification simply states what you can not expect to always work, it doesn’t state what you can expect to always fail.
In the case of BufferedInputStream, it’s easy to explain what happens under the hood. Each BufferedInputStream has a capacity, 8192 by default, and it can always reset past as many bytes as its current buffer capacity. By specifying a higher limit, you cause it to allocate a larger buffer when needed, to fulfill the guarantee.
Since you can’t query a stream for its current buffer capacity, you can only rely on the guarantee that reset works as long as you don’t read more bytes than the specified limit.
It’s easy to change your example to make it fail reproducibly:
File resource = new File("beforeFix.txt");
FileInputStream fileInputStream = new FileInputStream(resource);
BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream, 1);
int i = bufferedInputStream.read();
bufferedInputStream.mark(1);
i = bufferedInputStream.read();
i = bufferedInputStream.read();
bufferedInputStream.reset(); // will fail
Please read the documentation below to understand it better. I had the same doubt as you and then decided to read about it in detail.
If the method mark has not been called since the stream was created, or the number of bytes read from the stream since the mark was last called is larger than the argument to mark at that last call, then an IOException might be thrown.
If such an IOException is not thrown, then the stream is reset to a state such that all the bytes read since the most recent call to mark (or since the start of the file, if the mark has not been called) will be resupplied to subsequent callers of the read method, followed by any bytes that otherwise would have been the next input data as of the time of the call to reset.
From the first point, it is now very much clear that an IOException is not guaranteed to be thrown. So, if you are reading more than the allowed number of bytes (the specified argument of mark method) after calling the mark method, then it's a risky operation and not recommended.
From the second point, you can understand what happens if IOException is not thrown.
The link to the documentation: https://docs.oracle.com/javase/8/docs/api/java/io/InputStream.html#reset--
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class Main {
    static final String fileAbsolutePath = "file.txt";

    public static void main(String[] args) {
        // file contains 3 chars: 'a', 'b', and 'c'
        try (FileInputStream fileInputStream = new FileInputStream(fileAbsolutePath);
             BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream)) {
            System.out.println((char) bufferedInputStream.read()); // prints a
            bufferedInputStream.mark(2); // place mark at second byte
            System.out.println((char) bufferedInputStream.read()); // prints b
            System.out.println((char) bufferedInputStream.read()); // prints c
            System.out.println(bufferedInputStream.read()); // prints -1
            bufferedInputStream.reset(); // start again from where the mark was placed
            System.out.println((char) bufferedInputStream.read()); // prints b
            System.out.println((char) bufferedInputStream.read()); // prints c
            System.out.println(bufferedInputStream.read()); // prints -1
        } catch (IOException ie) {
            ie.printStackTrace();
        }
    }
}
As the Oracle documentation says about the reset() method of InputStream (which is overridden by FilterInputStream and further by BufferedInputStream):
The general contract of reset is:
If the method markSupported returns true, then:
If the method mark has not been called since the stream was created, or the number of bytes read from the stream since mark was last called is larger than the argument to mark at that last call, then an IOException might be thrown.
(note that an IOException might be thrown, not that it must be)
If such an IOException is not thrown, then the stream is reset to a state such that all the bytes read since the most recent call to mark (or since the start of the file, if mark has not been called) will be resupplied to subsequent callers of the read method, followed by any bytes that otherwise would have been the next input data as of the time of the call to reset.
I hope this answers your question and helps future programmers.
Related
I am making a Java program that reads data from a binary stream (using a DataInputStream).
Sometimes during this process I need to read a data chunk, but the method that reads it (which I cannot modify) will stop before reaching the end of the chunk (this is its normal behavior; apparently it just doesn't need the last bytes, but I can't do anything about the fact that they are there). This is not a problem in itself, because I know exactly how long the chunk is, i.e. I know how many bytes it contains, so I can skip bytes (with the skipBytes(int) method) until the end of the chunk. The problem is: I don't actually know how many bytes the method read (or left over), so I don't know how many bytes I need to skip to reach the end of the chunk.
Is there any way to:
know how many bytes were read from a stream since a certain point in time?
know how many bytes were read from a stream since it was created?
or any other way to find out how many bytes my chunk-reading method just read (since it won't directly tell me)?
Just in case, I made a small diagram.
Thanks in advance
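One general-purpose approach (a sketch; CountingInputStream here is a hypothetical helper, not a JDK class, though libraries such as Apache Commons IO ship an equivalent): wrap the stream in a FilterInputStream that counts every byte passing through, and put the DataInputStream on top of the wrapper.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: counts every byte read through it.
class CountingInputStream extends FilterInputStream {
    private long count;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) count++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) count += n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped; // treat skipped bytes as consumed too
        return skipped;
    }

    long getCount() {
        return count;
    }
}

public class CountingDemo {
    public static void main(String[] args) throws IOException {
        CountingInputStream counter =
                new CountingInputStream(new ByteArrayInputStream(new byte[100]));
        DataInputStream din = new DataInputStream(counter);
        din.readInt(); // the opaque chunk-reading call would go here
        System.out.println(counter.getCount()); // 4
    }
}
```

You would record getCount() before calling the chunk-reading method, call it, and then skipBytes(chunkLength - (countAfter - countBefore)). Note this sketch does not account for mark/reset; if the wrapped stream is reset, the count will overstate consumption.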
ImageInputStream can do what you want. It implements DataInput and it has most of the methods of InputStream. And it has getStreamPosition, seek and skipBytes methods.
However, as you correctly point out, ImageIO.read(ImageInputStream) would close the stream, preventing you from reading more than one image.
The solution is to avoid using ImageIO.read, and instead obtain an ImageReader explicitly, using ImageIO.getImageReaders. Then you can invoke an ImageReader’s read method, which does not close the stream.
Here’s how I implemented it:
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import java.util.function.Consumer;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public void readImages(InputStream source,
                       Consumer<? super BufferedImage> imageHandler)
        throws IOException {
    // Every image is at a byte index which is a multiple of this number.
    int boundary = 5000;
    try (ImageInputStream stream = ImageIO.createImageInputStream(source)) {
        while (true) {
            long pos = stream.getStreamPosition();
            Iterator<ImageReader> readers = ImageIO.getImageReaders(stream);
            if (!readers.hasNext()) {
                break;
            }
            ImageReader reader = readers.next();
            reader.setInput(stream);
            BufferedImage image = reader.read(0);
            imageHandler.accept(image);
            pos = stream.getStreamPosition();
            long bytesToSkip = boundary - (pos % boundary);
            if (bytesToSkip < boundary) {
                stream.skipBytes(bytesToSkip);
            }
        }
    }
}
And here’s how I tested it:
try (InputStream source = new BufferedInputStream(
Files.newInputStream(Path.of(filename)))) {
reader.readImages(source, img -> EventQueue.invokeLater(() -> {
JOptionPane.showMessageDialog(null, new ImageIcon(img));
}));
}
All the buffered read methods return the actual number of bytes read.
Quoting documentation for InputStream#read(byte[] b):
Returns:
the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
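Because a single read(byte[]) call may return fewer bytes than requested, callers that need an exact count typically loop until they have enough or hit end-of-stream. A minimal sketch (the helper name is mine):

```java
import java.io.IOException;
import java.io.InputStream;

public class StreamUtil {
    // Reads up to len bytes into buf, looping until len bytes have arrived
    // or the stream ends; returns the number of bytes actually read.
    static int readFully(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n == -1) {
                break; // end of stream reached early
            }
            total += n;
        }
        return total;
    }
}
```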
In my application, I'm trying to compress/decompress byte array using java's Inflater/Deflater class.
Here's part of the code I used at first:
ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length);
byte[] buffer = new byte[1024];
while (!inflater.finished()) {
    int count = inflater.inflate(buffer);
    outputStream.write(buffer, 0, count);
}
Then after I deployed the code, it would randomly (very rarely) cause the whole application to hang, and when I took a thread dump, I could identify one thread hanging
at java.util.zip.Inflater.inflateBytes(Native Method)
at java.util.zip.Inflater.inflate(Inflater.java:259)
- locked java.util.zip.ZStreamRef#fc71443
at java.util.zip.Inflater.inflate(Inflater.java:280)
It doesn't happen very often. I googled everywhere and found that the cause could be empty byte data passed to the inflater, in which case finished() will never return true.
So I used a workaround, instead of using
while (!inflater.finished())
to determine if it's finished, I used
while (inflater.getRemaining() > 0)
But it happened again.
Now it makes me wonder what the real reason for the issue is. There shouldn't be any empty array passed to the inflater, and even if there were, why did the getRemaining() method not break the while loop?
Can anybody help, please? It's really bugging me.
Confused by the same problem, I found this page.
This is my workaround; it may help:
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
while (!inflater.finished()) {
    int i = inflater.inflate(buffer);
    if (i == 0) {
        break;
    }
    byteArrayOutputStream.write(buffer, 0, i);
}
The javadoc of inflate:
Uncompresses bytes into specified buffer. Returns actual number of bytes uncompressed. A return value of 0 indicates that needsInput() or needsDictionary() should be called in order to determine if more input data or a preset dictionary is required. In the latter case, getAdler() can be used to get the Adler-32 value of the dictionary required.
So @Wildo Luo was certainly right to check for 0 being returned.
byte[] buffer = new byte[1024];
while (!inflater.finished()) {
    int count = inflater.inflate(buffer);
    if (count != 0) {
        outputStream.write(buffer, 0, count);
    } else {
        if (inflater.needsInput()) { // Not everything read
            inflater.setInput(...);
        } else if (inflater.needsDictionary()) { // Dictionary to be loaded
            inflater.setDictionary(...);
        }
    }
}
inflater.end();
I can only imagine that elsewhere the code is not entirely right, maybe on the compression side. Better to first check the general code: there is the Inflater(boolean nowrap) constructor requiring an extra byte, the end() call, exception handling (try-finally), etcetera.
For unknown data and unknown occurrences: use a try-catch, capture the compressed data to check whether it is a data-based error, and keep it for testing any solution.
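For reference, a minimal self-contained round trip showing the defensive loop in context (class and method names are mine, not from the question):

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class ZlibRoundTrip {
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish(); // no more input will follow
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                // Truncated input or missing dictionary: without this check
                // the loop would spin forever.
                break;
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }
}
```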
Having the same problem...
What I'm sure about:
I'm having an infinite loop, assured with logs printed.
inflater.inflate returns 0, and the output buffer size is 0.
My loop is like this (Hive ORC code):
while (!(inflater.finished() || inflater.needsDictionary() ||
         inflater.needsInput())) {
    try {
        int count = inflater.inflate(out.array(),
                out.arrayOffset() + out.position(),
                out.remaining());
        out.position(count + out.position());
    } catch (DataFormatException dfe) {
        throw new IOException("Bad compression data", dfe);
    }
}
After the out buffer is consumed and its remaining size is 0, the loop runs forever.
But I'm not sure whether ORC or zlib caused this. On the ORC side, it fills the original data with the same compression buffer size and then compresses, so in theory it should not be possible to get a compressed chunk larger than the buffer size. The cause may lie with zlib or the hardware.
That being said, breaking the loop when count == 0 is dangerous, since there may still be uncompressed data in the inflater.
I have 2 BufferedInputStreams which both contain an XML string: one small, and one very large.
Here's what the beginning of each of these XML strings looks like:
<RootElement success="true">
I created a method which:
Sets the mark at the beginning of the input stream
Reads the first few bytes of the XML to check if the root element has a specific attribute
Resets the input stream to the mark position, so another method can enjoy the complete stream
I was under the impression that neither the size of the buffered input stream's buffer (the default is 8192 bytes) nor the mark read limit would actually matter, because I'm only reading about the first 50 bytes before resetting, regardless of how large my input stream is.
Unfortunately I get an "IOException: Resetting to invalid mark" exception. Here's the relevant code:
private boolean checkXMLForSuccess(BufferedInputStream responseStream)
        throws XMLStreamException, FactoryConfigurationError {
    // Normally, this should be set to the amount of bytes that can be read before
    // invalidating the mark. Because we use a default buffer size (1024 or 2048 bytes)
    // that is much greater than the amount of bytes we will read here (less than
    // 100 bytes) this is not a concern.
    responseStream.mark(100);
    XMLStreamReader xmlReader = XMLInputFactory.newInstance().createXMLStreamReader(responseStream);
    xmlReader.next(); // Go to the root element
    // This is a for loop, but the root element can only have 1 attribute.
    for (int i = 0; i < xmlReader.getAttributeCount(); i++) {
        if (xmlReader.getAttributeLocalName(i).equals(SUCCES_ATTRIBUTE)) {
            boolean isSuccess = Boolean.parseBoolean(xmlReader.getAttributeValue(i));
            if (isSuccess) {
                try {
                    responseStream.reset();
                } catch (IOException e) {
                    // Oh oh... reset mark problem??
                }
                return true;
            }
        }
    }
    return false;
}
Now, of course, I tried setting the mark read limit to a higher number. I had to set it to a value of 10000 before it finally worked. I cannot imagine my code above needs to read 10000 bytes! What other factors could be responsible for this behaviour?
According to the Documentation of InputStream class - reset() method:
public void reset()
throws IOException
The general contract of reset is:
If the method markSupported returns true, then:
If the number of bytes read from the stream since mark was last called is larger than the argument to mark at that last call, then an IOException might be thrown.
In your code,
You have passed 100 as the byte read limit.
responseStream.mark(100);
and there is a very high probability that this part of the code:
xmlReader.next();
reads more than 100 bytes, causing the mark to be invalidated and the call to the reset() method to throw an IOException.
XMLStreamReader.next():
Get next parsing event - a processor may return all contiguous character data in a single chunk, or it may split it into several chunks
So, the reader could have kept reading more than the read limit bytes causing the mark to be invalidated. (This happens irrespective of the file size, and if the contiguous characters are large).
As for the second case:
If the method markSupported returns false, then:
The call to reset may throw an IOException
but the BufferedInputStream supports marking,
public boolean markSupported()
Tests if this input stream supports the mark and reset methods. The
markSupported method of BufferedInputStream returns true.
So the second case can be cut down.
This is a guess, but the XMLStreamReader is likely reading a large part of the InputStream during the getAttributeCount or getAttributeLocalName methods, although it's possible that it's being done when the XMLStreamReader is created...
I haven't looked through the OpenJDK code to confirm this, though.
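One way to sidestep the parser's read-ahead entirely (a sketch with invented names, not the asker's code): consume a bounded prefix yourself, inspect the copy, and then splice the prefix back in front of the remainder with SequenceInputStream, so the downstream consumer still sees the full stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Arrays;

public class PrefixPeek {
    // Holds the peeked prefix and a stream that replays prefix + remainder.
    static class Peeked {
        final byte[] prefix;
        final InputStream fullStream;

        Peeked(byte[] prefix, InputStream fullStream) {
            this.prefix = prefix;
            this.fullStream = fullStream;
        }
    }

    static Peeked peek(InputStream in, int maxPrefix) throws IOException {
        byte[] buf = new byte[maxPrefix];
        int total = 0;
        while (total < maxPrefix) {
            int n = in.read(buf, total, maxPrefix - total);
            if (n == -1) break; // stream shorter than the requested prefix
            total += n;
        }
        byte[] prefix = Arrays.copyOf(buf, total);
        InputStream full =
                new SequenceInputStream(new ByteArrayInputStream(prefix), in);
        return new Peeked(prefix, full);
    }
}
```

The attribute check would then run the XMLStreamReader over new ByteArrayInputStream(peeked.prefix); since that copy is independent, the parser may read ahead as much as it likes, and peeked.fullStream goes to the next consumer. One caveat: the parser sees a truncated document, so it should only be used to peek at the root element and be wrapped in a try-catch.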
I'm running a multithreaded minimalistic http(s) server (not a web server though) that accepts connections on three server sockets: local, internet and internet-ssl.
Each socket has a so timeout of 1000ms (might be lowered in the future).
The worker threads read requests like this:
byte[] reqBuffer = new byte[512];
theSocket.getInputStream().read(reqBuffer);
The problem now is that with the newly implemented SSL socket, the problem of the 1/n-1 record splitting technique arises. Also, some clients split in other strange ways when using SSL (4/n-4 etc.), so I thought I might just perform multiple reads like this:
byte[] reqBuffer = new byte[512];
InputStream is = theSocket.getInputStream();
int read = is.read(reqBuffer, 0, 128); // initial read - with x/n-x this is very small
int pos = 0;
if (read > 0) {
    pos = read;
}
int i = 0;
do {
    read = is.read(reqBuffer, pos, 128);
    if (read > 0) {
        pos += read;
    }
    i++;
} while (read == 128 && i < 3); // max. 3 more reads (4 total = 512 bytes) or until fewer than 128 bytes are read (request should be completely read)
Which works with browsers like firefox or chrome and other clients using that technique.
Now my problem is that the new method is much slower. Requests to the local socket are so slow that a script with a 2-second timeout times out while requesting (I have no idea why). Maybe I have some logic problem in my code?
Is there a better way to read from a SSL socket? Because there are up to hundreds or even a thousand requests per second and the new read method slows down even the http requests.
Note: The ssl-socket is not in use at the moment and will not be used until I can fix this problem.
I have also tried reading line by line using a buffered reader, since we are talking about HTTP here, but the server exploded, running out of file descriptors (the limit is 20,000). That might have been because of my implementation, though.
I'm thankful for every suggestion regarding this problem. If you need more information about the code just tell me and I will post them asap.
EDIT:
I actually put a little bit more thought into what I am trying to do and I realized that it comes down to reading HTTP headers. So the best solution would be to actually read the request line for line (or character for character) and stop reading after x lines or until an empty line (marking the end of the header) is reached.
My current approach would be to put a BufferedInputStream around the socket's InputStream and read it with an InputStreamReader which is "read" by a BufferedReader (question: does it make sense to use a BufferedInputStream when I'm using a BufferedReader?).
This BufferedReader reads the request character for character, detects end-of-lines (\r\n) and continues to read until either a line longer than 64 characters is reached, a maximum of 8 lines are read or an empty line is reached (marking the end of the HTTP header). I will test my implementation tomorrow and edit this edit accordingly.
EDIT:
I almost forgot to write my results here: It works. On every socket, even faster than the previously working way. Thanks everyone for pointing me in the right direction. I ended up implementing it like this:
List<String> requestLines = new ArrayList<String>(6);
InputStream is = this.cSocket.getInputStream();
BufferedInputStream bis = new BufferedInputStream(is, 1024);
InputStreamReader isr = new InputStreamReader(bis, Config.REQUEST_ENCODING);
BufferedReader br = new BufferedReader(isr);
/* read input character for character
* maximum line size is 768 characters
* maximum number of lines is 6
* lines are defined as char sequences ending with \r\n
* read lines are added to a list
* reading stops at the first empty line => HTTP header end
*/
int readChar; // the last read character
int characterCount = 0; // the character count in the line that is currently being read
int lineCount = 0; // the overall line count
char[] charBuffer = new char[768]; // create a character buffer with space for 768 characters (max line size)
// read as long as the stream is not closed / EOF, the character count in the current line is below 768 and the number of lines read is below 6
// read as long as the stream is not closed / EOF, the character count in the current line is below 768 and the number of lines read is below 6
while ((readChar = br.read()) != -1 && characterCount < 768 && lineCount < 6) {
    charBuffer[characterCount] = (char) readChar; // store the read character
    if (readChar == '\n' && characterCount > 0 && charBuffer[characterCount - 1] == '\r') { // end of line detected (\r\n)
        if (characterCount == 1) { // empty line
            break; // stop reading after an empty line (HTTP header ended)
        }
        requestLines.add(new String(charBuffer, 0, characterCount - 1)); // add the read line to the list (leaving out the \r)
        // charBuffer = new char[768]; // clear the buffer - not required
        characterCount = 0; // reset character count for the next line
        lineCount++; // increase read line count
    } else {
        characterCount++; // not end of line: advance in the buffer
    }
}
This is most likely slower as you are waiting for the other end to send more data, possibly data it is never going to send.
A better approach is to give it a larger buffer, like 32 KB (128 bytes is small), and only read the data which is available. If this data needs to be re-assembled into messages of some sort, you shouldn't be using timeouts or a fixed number of loops, as read() is only guaranteed to return at least one byte.
You should certainly wrap a BufferedInputStream around the SSLSocket's input stream.
Your technique of reading 128 bytes at a time and advancing the offset is completely pointless. Just read as much as you can at a time and deal with it. Or one byte at a time from the buffered stream.
Similarly you should certainly wrap the SSLSocket's output stream in a BufferedOutputStream.
Currently, I am relying on the ObjectInputStream.available() method to tell me how many bytes are left in a stream. Reason for this -- I am writing some unit/integration tests on certain functions that deal with streams and I am just trying to ensure that the available() method returns 0 after I am done.
Unfortunately, upon testing for failure (i.e., I have sent about 8 bytes down the stream) my assertion for available() == 0 is coming up true when it should be false. It should show >0 or 8 bytes!
I know that the available() method is classically unreliable, but I figured it would show something at least > 0!
Is there a more reliable way of checking whether a stream is empty or not? (That is my main goal here, after all.) Perhaps in the Apache IO domain or some other library out there?
Does anyone know why the available() method is so profoundly unreliable; what is the point of it? Or, is there a specific, proper way of using it?
Update:
So, as many of you can read from the comments, the main issue I am facing is that on one end of a stream, I am sending a certain number of bytes but on the other end, not all the bytes are arriving!
Specifically, I am sending 205498 bytes on one end and consistently getting only 204988 on the other. I am controlling both sides of this operation between threads over a socket, but that shouldn't matter.
Here is the code I have written to collect all the bytes.
public static int copyStream(InputStream readFrom, OutputStream writeTo, int bytesToRead)
        throws IOException {
    int bytesReadTotal = 0, bytesRead = 0, countTries = 0, available = 0, bufferSize = 1024 * 4;
    byte[] buffer = new byte[bufferSize];
    while (bytesReadTotal < bytesToRead) {
        if (bytesToRead - bytesReadTotal < bufferSize)
            buffer = new byte[bytesToRead - bytesReadTotal];
        if (0 < (available = readFrom.available())) {
            bytesReadTotal += (bytesRead = readFrom.read(buffer));
            writeTo.write(buffer, 0, bytesRead);
            countTries = 0;
        } else if (countTries < 1000) {
            try {
                countTries++;
                Thread.sleep(1L);
            } catch (InterruptedException ignore) {}
        } else {
            break;
        }
    }
    return bytesReadTotal;
}
I put the countTries variable in there just to see what happens. Even without countTries, it will block forever before it reaches bytesToRead.
What would cause the stream to suddenly block indefinitely like that? I know the other end fully sends the bytes (it uses the same method, and I can see it completes the call with bytesReadTotal matching bytesToRead in the end), but the receiver doesn't. In fact, when I look at the arrays, they match up perfectly up to that point as well.
UPDATE 2
I noticed that when I added a writeTo.flush() at the end of my copyStream method, it seems to work again. Why are flushes so vital in this situation? That is, why would omitting one cause a stream to block permanently?
The available() method only returns how many bytes can be read without blocking (which may be 0). In order to see if there are any bytes left in the stream, you have to read() or read(byte[]) which will return the number of bytes read. If the return value is -1 then you have reached the end of file.
This little code snippet will loop through an InputStream until it gets to the end (read() returns -1). I don't think it can ever return 0 because it should block until it can either read 1 byte or discover there is nothing left to read (and therefore return -1)
int currentBytesRead = 0;
int totalBytesRead = 0;
byte[] buf = new byte[1024];
while ((currentBytesRead = in.read(buf)) > 0) {
    totalBytesRead += currentBytesRead;
}