How do I decide how many bytes to read from an InputStream? - java

I am trying to read from an InputStream. I wrote the code below:
byte[] bytes = new byte[1024 * 32];
while (bufferedInStream.read(bytes) != -1) {
    bufferedOutStream.write(bytes);
}
What I don't understand is how many bytes I should read in each iteration. The stream contains a file saved on disk.
I read the post here but did not really understand it.

Say you had a flow of water from a pipe into a bath. You then used a bucket to get water from the bath and carry it, say, to your garden to water the lawn. The bath is the buffer. While you are walking across the lawn the buffer is filling up, so when you return there is a bucketful for you to take again.
If the bath is tiny then it could overflow while you are walking with the bucket, and you will lose water. If you have a massive bath then it is unlikely to overflow. So a larger buffer is more convenient, but of course a larger bath costs more money and takes up more space.
A buffer in your program takes up memory space. And you don't want to take up all your available memory for your buffer just because it is convenient.
Generally your read function lets you specify how many bytes to read, so even with a small buffer you could do this (pseudocode):
const int bufsize = 50;
byte buf[bufsize];
int read;
while ((read = is.read(buf, bufsize)) > 0) {
    // do something with the data - up to `read` bytes
}
In the above code, bufsize is the MAXIMUM number of bytes to read into the buffer.
If your read function does not allow you to specify a maximum number of bytes to read then you need to supply a buffer large enough to receive the largest possible read amount.
So the optimal buffer size is application specific; only the application developer knows the characteristics of the data: how fast the water flows into the bath, what bath size you can afford (embedded apps), and how quickly you can carry the bucket across the garden and back.

It depends on available memory, the size of the file, and other factors. You'd better take some measurements.
PS: Your code is wrong. bufferedInStream.read(bytes) may not fill the whole buffer, but only part of it. The method returns the actual number of bytes read as its result.
byte[] bytes = new byte[1024 * 32];
int size;
while ((size = bufferedInStream.read(bytes)) != -1) {
    bufferedOutStream.write(bytes, 0, size);
}
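For completeness, here is a minimal sketch of the same loop with try-with-resources, so that both streams are closed even when an IOException is thrown. The class name and file paths are placeholders of mine, not anything from the question:

import java.io.*;

public class CopyExample {
    // Copies inPath to outPath using a 32 KiB read buffer; both streams
    // are closed by try-with-resources even if the copy fails midway.
    public static void copy(String inPath, String outPath) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(inPath));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(outPath))) {
            byte[] bytes = new byte[1024 * 32];
            int size;
            while ((size = in.read(bytes)) != -1) {
                out.write(bytes, 0, size);
            }
        }
    }
}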

Here is my suggestion (assuming we are dealing with just the input stream, not how we are going to write to the output stream):
If your use case does not have any requirement for high read performance, go ahead with FileInputStream. For example:
FileInputStream fileInputStream = new FileInputStream("filePath");
byte[] bytes = new byte[1024];
int size;
while ((size = fileInputStream.read(bytes)) != -1) {
    outputStream.write(bytes, 0, size);
}
For better read performance, use BufferedInputStream, stick to its default buffer size, and read a single byte at a time. For example:
byte[] bytes = new byte[1];
BufferedInputStream bufferedInputStream =
        new BufferedInputStream(new FileInputStream("filePath"));
int size;
while ((size = bufferedInputStream.read(bytes)) != -1) {
    outputStream.write(bytes, 0, size);
}
For more performance, try tuning the buffer size of BufferedInputStream and read one byte at a time. For example:
byte[] bytes = new byte[1];
BufferedInputStream bufferedInputStream =
        new BufferedInputStream(new FileInputStream("filePath"), 16048);
int size;
while ((size = bufferedInputStream.read(bytes)) != -1) {
    outputStream.write(bytes, 0, size);
}
If you require even more, use a larger read buffer on top of BufferedInputStream. For example:
byte[] bytes = new byte[1024];
BufferedInputStream bufferedInputStream =
        new BufferedInputStream(new FileInputStream("filePath"), 16048);
int size;
while ((size = bufferedInputStream.read(bytes)) != -1) {
    outputStream.write(bytes, 0, size);
}
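If you want to compare these variants on your own files, a crude timing sketch like the one below can help; it is a rough measurement, not a proper benchmark. OutputStream.nullOutputStream() requires Java 11, so substitute your real output stream on older versions; the class name and command-line path are my own assumptions:

import java.io.*;

public class BufferSizeTiming {
    // Times one full read of the file with the given read-buffer size.
    static long timeCopy(String path, int bufSize) throws IOException {
        long start = System.nanoTime();
        try (InputStream in = new BufferedInputStream(new FileInputStream(path));
             OutputStream out = OutputStream.nullOutputStream()) {
            byte[] buf = new byte[bufSize];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        for (int size : new int[] {1, 1024, 8192, 32 * 1024}) {
            System.out.printf("buffer %6d -> %d ms%n",
                    size, timeCopy(args[0], size) / 1_000_000);
        }
    }
}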

You basically have a byte container of the length you specified (1024*32).
Then the input stream fills it as much as possible, probably the full container, iteration after iteration, until it reaches the end of the file; there it fills only the remaining bytes, and returns -1 on the next iteration (the one where it can't read anything).
So you are basically copying from input to output in chunks of 1024*32 bytes.
Hope it helps you understand the code.
By the way, on the last iteration, if the input stream has less than 1024*32 bytes, the output will receive not only the last part of the file but also a repetition of the previous iteration's contents for the bytes not filled in the last iteration.

The idea is not to read the entire file contents at one time using the buffered input stream. You use the buffered input stream to read as many bytes as the byte[] array size, consume the bytes read, and then move on to reading more bytes from the file. Hence you don't need to know the file size in order to read it.
This post will be more helpful, as it explains why you should wrap a FileInputStream with a BufferedInputStream:
Why is using BufferedInputStream to read a file byte by byte faster than using FileInputStream?
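To make the linked point concrete, here is a minimal sketch contrasting the two: unbuffered, every read() is a system call; wrapped in BufferedInputStream, most single-byte reads are served from memory. The class name and the command-line file path are illustrative only:

import java.io.*;

public class ByteByByteDemo {
    // Reads the stream one byte at a time and counts the bytes.
    static long count(InputStream in) throws IOException {
        long n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream slow = new FileInputStream(args[0])) {
            count(slow); // roughly one system call per byte
        }
        try (InputStream fast = new BufferedInputStream(new FileInputStream(args[0]))) {
            count(fast); // one underlying read per ~8 KiB (the default buffer)
        }
    }
}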

Related

Java: know how many bytes were read by ImageIO.read() (and maybe other similar methods)

I am making a Java program that reads data from a binary stream (using a DataInputStream).
Sometimes during this process I need to read a data chunk, but the method that reads it (which I cannot modify) stops before reaching the end of the chunk. That is its normal behavior; apparently it just doesn't need the last bytes, but I can't do anything about the fact that they are there. This is not a problem in itself, because I know exactly how long the chunk is, i.e. how many bytes it contains, so I can skip bytes (with the skipBytes(int) method) until the end of the chunk. The problem is that I don't actually know how many bytes the method read (or left over), so I don't know how many bytes I need to skip to reach the end of the chunk.
Is there any way to:
know how many bytes were read from a stream since a certain point in time?
know how many bytes were read from a stream since it was opened?
or any other way I could find out how many bytes my data-chunk-reading method just read (since it won't directly tell me)?
Just in case, I made a small diagram.
Thanks in advance.
ImageInputStream can do what you want. It implements DataInput and it has most of the methods of InputStream. And it has getStreamPosition, seek and skipBytes methods.
However, as you correctly point out, ImageIO.read(ImageInputStream) would close the stream, preventing you from reading more than one image.
The solution is to avoid using ImageIO.read, and instead obtain an ImageReader explicitly, using ImageIO.getImageReaders. Then you can invoke an ImageReader’s read method, which does not close the stream.
Here’s how I implemented it:
public void readImages(InputStream source,
                       Consumer<? super BufferedImage> imageHandler)
        throws IOException {
    // Every image is at a byte index which is a multiple of this number.
    int boundary = 5000;
    try (ImageInputStream stream = ImageIO.createImageInputStream(source)) {
        while (true) {
            long pos = stream.getStreamPosition();
            Iterator<ImageReader> readers = ImageIO.getImageReaders(stream);
            if (!readers.hasNext()) {
                break;
            }
            ImageReader reader = readers.next();
            reader.setInput(stream);
            BufferedImage image = reader.read(0);
            imageHandler.accept(image);
            pos = stream.getStreamPosition();
            long bytesToSkip = boundary - (pos % boundary);
            if (bytesToSkip < boundary) {
                stream.skipBytes(bytesToSkip);
            }
        }
    }
}
And here’s how I tested it:
try (InputStream source = new BufferedInputStream(
        Files.newInputStream(Path.of(filename)))) {
    reader.readImages(source, img -> EventQueue.invokeLater(() -> {
        JOptionPane.showMessageDialog(null, new ImageIcon(img));
    }));
}
All the buffered read methods return the actual number of bytes read.
Quoting documentation for InputStream#read(byte[] b):
Returns:
the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.
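If ImageInputStream is not an option, another approach is to interpose a counting wrapper between the source and the method you cannot modify, and compare the count before and after the call. Guava and Commons IO both ship a ready-made CountingInputStream; the following is a minimal hand-rolled sketch along the same lines:

import java.io.*;

class CountingInputStream extends FilterInputStream {
    private long count;

    CountingInputStream(InputStream in) { super(in); }

    @Override public int read() throws IOException {
        int b = super.read();
        if (b != -1) count++;
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) count += n;
        return n;
    }

    @Override public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped; // skipped bytes count as consumed too
        return skipped;
    }

    long getCount() { return count; }
}

Typical use: record getCount() before calling the chunk-reading method, then skip chunkLength minus (count after minus count before) bytes to land on the chunk boundary.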

Fastest way to read from InputStream to OutputStream

The code below streams a 2.43 MB file in 1.3 seconds:
byte[] buff = new byte[64 * 1024];

private static void flow(InputStream is, OutputStream os, byte[] buf)
        throws IOException {
    int numRead;
    while ((numRead = is.read(buf)) >= 0) {
        os.write(buf, 0, numRead);
    }
}
What is the fastest way to "stream" an InputStream to OutputStream?
Update:
Data source is a cache, EHCache to be specific:
byte[] cached = cacheService.get(cacheKey); // just 2 ms to get the bytes, very fast
if (cached != null && cached.length > 0) {
    flow(ByteSource.wrap(cached).openStream(), outputStream, buff);
}
I can't assert that it's the fastest, but I would suggest using Apache Commons IO's IOUtils, specifically
public static long copy(InputStream input, OutputStream output, int bufferSize)
and try benchmarking with different values of bufferSize.
https://commons.apache.org/proper/commons-io/javadocs/api-2.5/org/apache/commons/io/IOUtils.html#copy(java.io.InputStream,%20java.io.OutputStream,%20int)
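For example, a minimal usage sketch (the class name and command-line paths are placeholders; 8 KiB is just a common starting point for the sweep):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.commons.io.IOUtils;

public class IOUtilsCopyExample {
    public static void main(String[] args) throws IOException {
        try (FileInputStream in = new FileInputStream(args[0]);
             FileOutputStream out = new FileOutputStream(args[1])) {
            // copy() returns the number of bytes copied.
            long copied = IOUtils.copy(in, out, 8 * 1024);
            System.out.println("copied " + copied + " bytes");
        }
    }
}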
The real problem here is the high level of abstraction you're working with. Provided you know exactly where the data is coming from (e.g. the file system) and where it's going (e.g network socket) and you know which operating system you're working on, it is possible to leverage the kernel's stream support to make this much faster.
Googling for "zero copy kernel io" I found this article which is an okay overview:
https://xunnanxu.github.io/2016/09/10/It-s-all-about-buffers-zero-copy-mmap-and-Java-NIO/
Since Java 9, InputStream provides a transferTo(OutputStream) method; on Java 7+, Files.copy can also be used.
Again, no claims about which is fastest, but you can benchmark these as well; a minimal sketch of both follows the references.
References:
Official Documentation
A similar Question
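Here is that sketch (the wrapper method names are mine; only transferTo and Files.copy come from the JDK):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class TransferExamples {
    // Java 9+: let the JDK pick the copy strategy internally.
    static long viaTransferTo(InputStream in, OutputStream out) throws IOException {
        return in.transferTo(out);
    }

    // Java 7+: copy a file's contents straight to a stream.
    static long viaFiles(Path file, OutputStream out) throws IOException {
        return Files.copy(file, out);
    }
}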
I would also have said commons-io's IOUtils::copy, which probably does this better than a naive approach; but its code seems to do the same as yours (see copyLarge below), so the answer about Java 9 makes transferTo a better choice.
public static long copyLarge(final InputStream input, final OutputStream output, final byte[] buffer)
        throws IOException {
    long count = 0;
    int n;
    while (EOF != (n = input.read(buffer))) { // EOF is the constant -1
        output.write(buffer, 0, n);
        count += n;
    }
    return count;
}
However, your problem may not be how you copy, but rather the lack of buffering: you could try BufferedInputStream and BufferedOutputStream on top of the existing streams:
Files.newInputStream is not buffered.
Files.newOutputStream is not buffered.
You could use FileChannel and ByteBuffer.
System is probably buffering file on its side.
You should roll up a JMH benchmark test:
Not sure how you can disable system buffering. I don't think it is a problem.
I would first check result with buffered input stream of various size (8K, 16K, 32K, 64K, 512K, 1M, 2M, 4M, 8M)
Then with buffered output stream
Then with a mix of two.
While it may take time to execute, the road to finding the fastest option implies measuring; a JMH skeleton is sketched below.
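As a starting point, that skeleton might look like the following; the file path and the @Param values are placeholders to adjust, and the JMH dependency (org.openjdk.jmh) is assumed:

import java.io.*;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class CopyBenchmark {
    @Param({"8192", "16384", "32768", "65536", "524288", "1048576"})
    int bufferSize;

    @Benchmark
    public long copy() throws IOException {
        // Reads the whole file through a BufferedInputStream of the
        // parameterized size; returning the byte count keeps the JIT
        // from eliminating the work.
        try (InputStream in = new BufferedInputStream(
                new FileInputStream("/tmp/testfile"), bufferSize)) {
            byte[] buf = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        }
    }
}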

Java Inflater will loop infinitely sometimes

In my application, I'm trying to compress/decompress byte array using java's Inflater/Deflater class.
Here's part of the code I used at first:
ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length);
byte[] buffer = new byte[1024];
while (!inflater.finished()) {
    int count = inflater.inflate(buffer);
    outputStream.write(buffer, 0, count);
}
Then after I deployed the code it would randomly (very rarely) cause the whole application to hang, and when I took a thread dump, I could identify one thread hanging:
at java.util.zip.Inflater.inflateBytes(Native Method)
at java.util.zip.Inflater.inflate(Inflater.java:259)
- locked <java.util.zip.ZStreamRef@fc71443>
at java.util.zip.Inflater.inflate(Inflater.java:280)
It doesn't happen very often. I googled everywhere and found out it could be caused by empty byte data passed to the inflater, so that finished() never returns true.
So I used a workaround: instead of using
while (!inflater.finished())
to determine if it's finished, I used
while (inflater.getRemaining() > 0)
But it happened again.
Now it makes me wonder what the real cause of the issue is. There shouldn't be any empty array passed to the inflater; even if there were, how come the getRemaining() method did not break the while loop?
Can anybody help, please? It's really bugging me.
Confused by the same problem, I found this page.
This is my workaround; it may help:
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
while (!inflater.finished()) {
    int i = inflater.inflate(buffer);
    if (i == 0) {
        break;
    }
    byteArrayOutputStream.write(buffer, 0, i);
}
The javadoc of inflate:
Uncompresses bytes into specified buffer. Returns actual number of bytes uncompressed. A return value of 0 indicates that needsInput() or needsDictionary() should be called in order to determine if more input data or a preset dictionary is required. In the latter case, getAdler() can be used to get the Adler-32 value of the dictionary required.
So @Wildo Luo was certainly right to check for 0 being returned.
byte[] buffer = new byte[1024];
while (!inflater.finished()) {
    int count = inflater.inflate(buffer);
    if (count != 0) {
        outputStream.write(buffer, 0, count);
    } else {
        if (inflater.needsInput()) {             // Not everything read
            inflater.setInput(...);
        } else if (inflater.needsDictionary()) { // Dictionary to be loaded
            inflater.setDictionary(...);
        }
    }
}
inflater.end();
I can only imagine that elsewhere the code is not entirely right, maybe on the compression side. Better check the general code first: there is the Inflater(boolean nowrap) constructor requiring an extra byte, the end() call, exception handling (try-finally), etcetera.
For unknown data and unknown occurrences: use a try-catch, keep the compressed data to check whether it is a data-based error, and use it for testing any solution.
Having the same problem...
What I'm sure about:
I'm getting an infinite loop, confirmed with printed logs.
inflater.inflate returns 0, and the output buffer size is 0.
My loop is like this (Hive ORC code):
while (!(inflater.finished() || inflater.needsDictionary() ||
         inflater.needsInput())) {
    try {
        int count = inflater.inflate(out.array(),
                out.arrayOffset() + out.position(),
                out.remaining());
        out.position(count + out.position());
    } catch (DataFormatException dfe) {
        throw new IOException("Bad compression data", dfe);
    }
}
After the out buffer is consumed and its remaining size is 0, the loop will run infinitely.
But I'm not sure whether ORC or zlib caused this. On the ORC side, it fills the original data with the same compression buffer size and then does the compression, so theoretically it's not possible that I get a compressed chunk larger than the buffer size. The possibilities may be zlib or hardware.
That being said, breaking the loop when count == 0 is dangerous, since there may still be uncompressed data in the inflater; a safer loop is sketched below.
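Building on that caution, a one-shot helper that has all of the compressed input up front can treat a zero-byte inflate as truncated or corrupt input rather than as end-of-stream, so it neither spins forever nor silently drops data. A minimal sketch under that assumption:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class SafeInflate {
    static byte[] inflate(byte[] compressed) throws IOException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream(compressed.length);
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n > 0) {
                    out.write(buf, 0, n);
                } else if (!inflater.finished()) {
                    // Zero bytes produced and not finished: inflate() wants
                    // more input or a preset dictionary, neither of which
                    // this one-shot helper can supply.
                    throw new IOException("Incomplete or corrupt deflate stream");
                }
            }
        } catch (DataFormatException e) {
            throw new IOException("Bad compression data", e);
        } finally {
            inflater.end(); // release the native zlib resources
        }
        return out.toByteArray();
    }
}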

How to set fileChannel position

I am reading in a file 510 bytes at a time. The bytes are in a byte buffer and I am reading them using a fileChannel.
Once I change the position, it checks the case inside the while loop again but then jumps out of the while loop. The total number of bytes is around 8000. How do I rewind to a specific position in the FileChannel without causing this error?
Heres my code:
File f = new File("./data.txt");
FileChannel fChannel = new FileInputStream(f).getChannel();
ByteBuffer bBuffer = ByteBuffer.allocate(510);
while (fChannel.read(bBuffer) > 0) {
    // omitted code
    if (/* case */) {
        fChannel.position(3060);
    }
}
If your ByteBuffer is full, read() will return zero and your loop will terminate. You need to flip() your ByteBuffer, take data out of it, and then compact() it to make room for more data.
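A minimal sketch of that fill/flip/drain/compact cycle; I am assuming the channel comes from a RandomAccessFile (closing the channel closes the file), and processing is reduced to just consuming bytes:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ChannelReadLoop {
    public static void main(String[] args) throws IOException {
        try (FileChannel channel = new RandomAccessFile("data.txt", "r").getChannel()) {
            ByteBuffer buffer = ByteBuffer.allocate(510);
            while (channel.read(buffer) > 0 || buffer.position() > 0) {
                buffer.flip();                // switch the buffer to draining mode
                while (buffer.hasRemaining()) {
                    buffer.get();             // consume one byte (process here)
                }
                buffer.compact();             // make room for the next read()
            }
        }
    }
}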
I was also working a lot on reading files as bytes. I realized it would be great to have a flexible mechanism where you can set the position in the file along with the byte size, and I finally ended up with the code below.
public static byte[] bytes;
public static ByteBuffer buffer;

public static byte[] getBytes(int position) {
    try {
        bytes = new byte[10];
        buffer.position(position);
        buffer.get(bytes);
    } catch (BufferUnderflowException bue) {
        int capacity = buffer.capacity();
        System.out.println(capacity);
        int size = capacity - position;
        bytes = new byte[size];
        buffer.get(bytes);
    }
    return bytes;
}
Here you can also make the byte array size flexible by passing a size parameter along with the position. The underflow exception is handled here.
Hopefully it will help you.

How can I increase performance on reading the InputStream?

This very well may just be a KISS moment, but I feel like I should ask anyway.
I have a thread and it's reading from a socket's InputStream. Since I am dealing with particularly small data sizes (the data I can expect to receive is on the order of 100 - 200 bytes), I set the buffer array size to 256. As part of my read function I have a check that will ensure that when I read from the InputStream I got all of the data. If I didn't, then I recursively call the read function again. For each recursive call I merge the two buffer arrays back together.
My problem is, while I never anticipate using more than the 256-byte buffer, I want to be safe. But if sheep begin to fly and the buffer is significantly larger, the read function will (by estimation) take exponentially more time to complete.
How can I increase the efficiency of the read function and/or the buffer merging?
Here is the read function as it stands.
int BUFFER_AMOUNT = 256;

private int read(byte[] buffer) throws IOException {
    int bytes = mInStream.read(buffer); // Read the input stream
    if (bytes == -1) { // If bytes == -1 then we didn't get all of the data
        byte[] newBuffer = new byte[BUFFER_AMOUNT]; // Try to get the rest
        int newBytes;
        newBytes = read(newBuffer); // Recurse until we have all the data
        byte[] oldBuffer = new byte[bytes + newBytes]; // make the final array size
        // Merge buffer into the beginning of oldBuffer.
        // We do this so that once the method finishes, we can just add the
        // modified buffer to a queue later in the class for processing.
        for (int i = 0; i < bytes; i++)
            oldBuffer[i] = buffer[i];
        for (int i = bytes; i < bytes + newBytes; i++) // Merge newBuffer into the latter half of oldBuffer
            oldBuffer[i] = newBuffer[i];
        // Used for the recursion
        buffer = oldBuffer; // And now we set buffer to the new buffer full of all the data
        return bytes + newBytes;
    }
    return bytes;
}
EDIT: Am I being (unjustifiably) paranoid, and should I just set the buffer to 2048 and call it done?
BufferedInputStream, as noted by Roland, and DataInputStream.readFully(), which replaces all the looping code; see the sketch at the end of this answer.
int BUFFER_AMOUNT = 256;
Should be final if you don't want it changing at runtime.
if (bytes == -1) {
Should be !=
Also, I'm not entirely clear on what you're trying to accomplish with this code. Do you mind shedding some light on that?
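Regarding readFully(): a minimal sketch of how it replaces the recursion-and-merge logic, assuming the message length is known in advance (the class, method, and parameter names are illustrative):

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyExample {
    // readFully() blocks until exactly `length` bytes have arrived,
    // or throws EOFException if the stream ends first.
    static byte[] readMessage(InputStream socketIn, int length) throws IOException {
        DataInputStream in = new DataInputStream(socketIn);
        byte[] buf = new byte[length];
        in.readFully(buf);
        return buf;
    }
}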
I have no idea what you mean by "small data sizes". You should measure whether the time is spent in kernel mode (then you are issuing too many reads directly on the socket) or in user mode (then your algorithm is too complicated).
In the former case, just wrap the input with a BufferedInputStream with 4096 bytes of buffer and read from it.
In the latter case, just use this code:
/**
 * Reads as much as possible from the stream.
 * @return The number of bytes read into the buffer, or -1
 *         if nothing has been read because the end of file has been reached.
 */
static int readGreedily(InputStream is, byte[] buf, int start, int len)
        throws IOException {
    int nread;
    int ptr = start;  // index at which the data is put into the buffer
    int rest = len;   // number of bytes that we still want to read
    while ((nread = is.read(buf, ptr, rest)) > 0) {
        ptr += nread;
        rest -= nread;
    }
    int totalRead = len - rest;
    return (nread == -1 && totalRead == 0) ? -1 : totalRead;
}
This code completely avoids creating new objects or calling unnecessary methods, and furthermore it is straightforward.
