Fastest way to read from InputStream to OutputStream - java

This code below streams at 1.3 seconds for a 2.43 MB file
byte[] buff = new byte[64*1024];
private static void flow(InputStream is, OutputStream os, byte[] buf )
throws IOException {
int numRead;
while ( (numRead = is.read(buf) ) >= 0) {
os.write(buf, 0, numRead);
}
}
What is the fastest way to "stream" an InputStream to OutputStream?
Update:
Data source is a cache, EHCache to be specific:
byte[] cached = cacheService.get(cacheKey); // Just `2 ms` to get the bytes, very fast
if(cached != null && cached.length > 0) {
flow(ByteSource.wrap(cached).openStream(), outputStream, buff);
}

I can't make any assertion that it's the fastest but I would suggest using apache-commons-io's IOUtils. Specifically
public static long copy(InputStream input, OutputStream output, int bufferSize)
and try to benchmark with different values of bufferSize.
https://commons.apache.org/proper/commons-io/javadocs/api-2.5/org/apache/commons/io/IOUtils.html#copy(java.io.InputStream,%20java.io.OutputStream,%20int)
The real problem here is the high level of abstraction you're working with. Provided you know exactly where the data is coming from (e.g. the file system) and where it's going (e.g network socket) and you know which operating system you're working on, it is possible to leverage the kernel's stream support to make this much faster.
Googling for "zero copy kernel io" I found this article which is an okay overview:
https://xunnanxu.github.io/2016/09/10/It-s-all-about-buffers-zero-copy-mmap-and-Java-NIO/

Since Java 9, InputStream provides a transferTo(OutStream) method or using Java 7 Files can also be used.
Again no claims on which is the fastest but you can benchmark these as well.
References:
Official Documentation
A similar Question

I would also have said commons-io: IOUtils::copy which does this probably better than a naive approach, but the code seems to do the same as yours (see copyLarge) but answer about Java 9 makes it a better choice.
public static long copyLarge(final InputStream input, final OutputStream output, final byte[] buffer)
throws IOException {
long count = 0;
int n;
while (EOF != (n = input.read(buffer))) {
output.write(buffer, 0, n);
count += n;
}
return count;
}
However, your problem may not be how you copy, but rather the lack of buffering: you could try with BufferedInputStream and BufferedOutputStream on top of existing stream:
Files.newInputStream is not buffered.
Files.newOutputStream is not buffered.
You could use FileChannel and ByteBuffer.
System is probably buffering file on its side.
You should roll up a JMH benchmark test:
Not sure how you can disable system buffering. I don't think it is a problem.
I would first check result with buffered input stream of various size (8K, 16K, 32K, 64K, 512K, 1M, 2M, 4M, 8M)
Then with buffered output stream
Then with a mix of two.
While it may take time to execute, the road to what the fastest implies measuring.

Related

Java : know how much bytes were read by ImageIO.read() (and maybe other similar methods)

i am making a java program that reads data from a binary stream (using a DataInputStream).
Sometimes during this process i need to read a data chunk, however the method (which i cannot modify) that reads it will stop before reaching the end of the chunk (it is the normal behavior, apparently it just doesn't need the last bytes, but i can't do anything about the fact that they are there). This is not a problem in itself because i know exactly how long the chunk is, i.e. i know how many bytes there are in the chunk so i can skip bytes (with the skipBytes(int) method) until the end of the chunk ; the problem is : i don't actually know how many bytes the method actually read (or left), so i don't know how many bytes i need to skip to reach the end of the chunk.
Is there any way to :
know how many bytes were read in a stream since a certain point in time ?
know how many bytes were read in a stream since it was ?
any other way i could know how many bytes my data-chunk-reading method just read (since it won't directly tell me) ?
Just in case, i made a small diagram
Thanks in advance
ImageInputStream can do what you want. It implements DataInput and it has most of the methods of InputStream. And it has getStreamPosition, seek and skipBytes methods.
However, as you correctly point out, ImageIO.read(ImageInputStream) would close the stream, preventing you from reading more than one image.
The solution is to avoid using ImageIO.read, and instead obtain an ImageReader explicitly, using ImageIO.getImageReaders. Then you can invoke an ImageReader’s read method, which does not close the stream.
Here’s how I implemented it:
public void readImages(InputStream source,
Consumer<? super BufferedImage> imageHandler)
throws IOException {
// Every image is at a byte index which is a multiple of this number.
int boundary = 5000;
try (ImageInputStream stream = ImageIO.createImageInputStream(source)) {
while (true) {
long pos = stream.getStreamPosition();
Iterator<ImageReader> readers = ImageIO.getImageReaders(stream);
if (!readers.hasNext()) {
break;
}
ImageReader reader = readers.next();
reader.setInput(stream);
BufferedImage image = reader.read(0);
imageHandler.accept(image);
pos = stream.getStreamPosition();
long bytesToSkip = boundary - (pos % boundary);
if (bytesToSkip < boundary) {
stream.skipBytes(bytesToSkip);
}
}
}
}
And here’s how I tested it:
try (InputStream source = new BufferedInputStream(
Files.newInputStream(Path.of(filename)))) {
reader.readImages(source, img -> EventQueue.invokeLater(() -> {
JOptionPane.showMessageDialog(null, new ImageIcon(img));
}));
}
All the buffered read methods return the actual number of bytes read.
Quoting documentation for InputStream#read(byte[] b):
Returns:
the total number of bytes read into the buffer, or -1 if there is no more data because the end of the stream has been reached.

How do I decide how many bytes to read from an inputstream?

I am trying to read from an InputStream. I wrote below code
byte[] bytes = new byte[1024 * 32];
while (bufferedInStream.read(bytes) != -1) {
bufferedOutStream.write(bytes);
}
What I don't understand is how many bytes I should read in an iteration? The stream contains a file saved on the disk.
I read here but I did not understand the post really.
Say you had a flow of water from a pipe into a bath. You then used a bucket to get water from the bath and carry to say to your garden to water the lawn. The bath is the buffer. When you are walking across the lawn the buffer is filling up so when you return there is a bucket ful for you to take again.
If the bath is tiny then it could overflow while you are walking with the bucket and so you will lose water. If you have a massive bath then it is unlikely to overflow. so a larger buffer is more convenient. but of course a larger bath costs more money and takes up more space.
A buffer in your program takes up memory space. And you don't want to take up all your available memory for your buffer just because it is convenient.
Generally in your read function you can specify how many bytes to read. so even if you have a small buffer you could do this (pseudocode):
const int bufsize = 50;
buf[bufsize];
unsigned read;
while ((read = is.read(buf, bufsize)) != NULL) {
// do something with data - up to read bytes
}
In above code bufzise is MAXIMUM data to read into the buffer.
If your read function does not allow you to specify a maximum number of bytes to read then you need to supply a buffer large enough to receive the largest possible read amount.
So the optimal buffer size is application specific. Only the application developer will know the characteristics of the data. Eg how fast is the flow of water into the bath. What bath size can you afford (embedded apps), how quickly can you carry bucket from bath across garden and back again.
It is depend on available memory, size of file and other stuff. You better make some measurement.
PS: You code is wrong. bufferedInStream.read(bytes) may not fill all buffer, but only part of it. This method return actual amount of bytes as result.
byte[] bytes = new byte[1024 * 32];
int size;
while ((size = bufferedInStream.read(bytes)) != -1) {
bufferedOutStream.write(bytes, 0, size);
}
Here is my suggestion (assuming we are dealing with just input stream and not how we gonna write to output stream):
If your use case does not have any requirement for high read performance, go ahead with FileInputStream. For example:
FileInputStream fileInputStream = new FileInputStream("filePath");
byte[] bytes = new byte[1024];
int size;
while ((size = fileInputStream.read(bytes)) != -1) {
outputStream.write(bytes, 0, size);
}
For better read performance, use BufferedInputStream and stick to its default buffer size and read single byte at a time. For example:
byte[] bytes = new byte[1];
BufferedInputStream bufferedInputStream =
new BufferedInputStream(fileInputStream("filePath"))
int size;
while ((size = bufferedInputStream.read(bytes)) != -1) {
outputStream.write(bytes, 0, size);
}
For more performance, try tuning the buffer size of BufferedInputStream and read one byte at a time. For example:
byte[] bytes = new byte[1];
BufferedInputStream bufferedInputStream =
new BufferedInputStream(fileInputStream("filePath"), 16048)
int size;
while ((size = bufferedInputStream.read(bytes)) != -1) {
outputStream.write(bytes, 0, size);
}
If you require even more, use buffer on top of BufferedInputStream. For example:
byte[] bytes = new byte[1024];
BufferedInputStream bufferedInputStream =
new BufferedInputStream(fileInputStream("filePath"), 16048)
int size;
while ((size = bufferedInputStream.read(bytes)) != -1) {
outputStream.write(bytes, 0, size);
}
You basically have a byte container of the length you specified (1024*32)
Then, the inputStream will fill as much as possible, probably the full container, iteration throughout iteration until it reaches the end of the file when it will fill only the remaining bytes, and return -1 the next iteration (the one it cant read anything)
So you are basically copy&pasting from input to output in chunks of 1024*32 bytes size
Hope it helps you understand the code
By the way, the last iteration, if the input stream has less than 1024*32, the output will receive not only the last part of the file but also a repetition of the previous iteration contents for the bytes not filled it the last iteration.
The idea is not to read the entire file contents at one time using the buffered input stream. You use the buffered input stream to read as many bytes as the bytes[] array size. You consume the bytes read and then move on to reading more bytes from the file. Hence you don't need know the file size in order to read it.
This post will be more helpful as it explains why you should wrap a fileinputstream with a buffered input stream
Why is using BufferedInputStream to read a file byte by byte faster than using FileInputStream?

How to use NIO to write InputStream to File?

I am using following way to write InputStream to File:
private void writeToFile(InputStream stream) throws IOException {
String filePath = "C:\\Test.jpg";
FileChannel outChannel = new FileOutputStream(filePath).getChannel();
ReadableByteChannel inChannel = Channels.newChannel(stream);
ByteBuffer buffer = ByteBuffer.allocate(1024);
while(true) {
if(inChannel.read(buffer) == -1) {
break;
}
buffer.flip();
outChannel.write(buffer);
buffer.clear();
}
inChannel.close();
outChannel.close();
}
I was wondering if this is the right way to use NIO. I have read a method FileChannel.transferFrom, which takes three parameter:
ReadableByteChannel src
long position
long count
In my case I only have src, I don't have the position and count, is there any way I can use this method to create the file?
Also for Image is there any better way to create image only from InputStream and NIO?
Any information would be very useful to me. There are similar questions here, in SO, but I cannot find any particular solution which suites my case.
I would use Files.copy
Files.copy(is, Paths.get(filePath));
as for your version
ByteBuffer.allocateDirect is faster - Java will make a best effort to perform native I/O operations directly upon it.
Closing is unreliable, if first fails second will never execute. Use try-with-resources instead, Channels are AutoCloseable too.
No it's not correct. You run the risk of losing data. The canonical NIO copy loop is as follows:
while (in.read(buffer) >= 0 || buffer.position() > 0)
{
buffer.flip();
out.write(buffer);
buffer.compact();
}
Note the changed loop conditions, which take care of flushing the output at EOS, and the use of compact() instead of clear(), which takes care of the possibility of short writes.
Similarly the canonical transferTo()/transferFrom() loop is as follows:
long offset = 0;
long quantum = 1024*1024; // or however much you want to transfer at a time
long count;
while ((count = out.transferFrom(in, offset, quantum)) > 0)
{
offset += count;
}
It must be called in a loop, as it isn't guaranteed to transfer the entire quantum.

Java External Sorting Efficiency

I have to write an external sorting program in java which given a file A containing an arbitrary number of integers, sorts them using only file B (which is the same size) as temporary storage. For the first stage I am reading blocks of the file into ram, using the inbuilt java sort and writing back to file B, however this is proving to be very slow. I would like to know if there are any glaring inefficiencies in my code? Note that input1 and output are RandomAccessFile Objcets and BUFFER_SIZE is the block size decided at runtime by the amount of free memory.
public void SortBlocks() throws IOException{
int startTime = (int) System.currentTimeMillis();
input1.seek(0);output.seek(0);
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(input1.getFD()),2048));
DataOutputStream out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(output.getFD()),2048));
int[] buffer = new int[BUFFER_SIZE];
int j=0;
for(int i=0; i<input1.length();i+=4){
buffer[j] = in.readInt();
j++;
if(j == BUFFER_SIZE){
writeInts(buffer,out,j);
j=0;
}
}
writeInts(buffer,out,j);
out.flush();
SwitchIO();
int endTime = (int) System.currentTimeMillis();
System.out.println("sorted blocks in " + Integer.toString(endTime-startTime));
}
private static void writeInts(int[] Ints, DataOutputStream out, int size) throws IOException{
Arrays.sort(Ints,0,size);
for(int i=0;i<size;i++){
out.writeInt(Ints[i]);
}
}
Thanks in advance for your feedback.
The most glaring inefficiency is the use of input1.length() which is a relatively expensive operation and you are calling it on every int value.
I can't see why you decrease the buffer size when the default (8192) would be more efficient.
If you are reading files, I would use a ByteBuffer as an IntBuffer. A bottleneck is likely to be the way you read and write data. Using int values in native order would improve the translation performance. (Rather than the default which big endian)
If you access the file as a memory mapped file you may be able to gracefully handle files larger than the memory size.

Read file at a certain rate in Java

Is there an article/algorithm on how I can read a long file at a certain rate?
Say I do not want to pass 10 KB/sec while issuing reads.
A simple solution, by creating a ThrottledInputStream.
This should be used like this:
final InputStream slowIS = new ThrottledInputStream(new BufferedInputStream(new FileInputStream("c:\\file.txt"),8000),300);
300 is the number of kilobytes per second. 8000 is the block size for BufferedInputStream.
This should of course be generalized by implementing read(byte b[], int off, int len), which will spare you a ton of System.currentTimeMillis() calls. System.currentTimeMillis() is called once for each byte read, which can cause a bit of an overhead. It should also be possible to store the number of bytes that can savely be read without calling System.currentTimeMillis().
Be sure to put a BufferedInputStream in between, otherwise the FileInputStream will be polled in single bytes rather than blocks. This will reduce the CPU load form 10% to almost 0. You will risk to exceed the data rate by the number of bytes in the block size.
import java.io.InputStream;
import java.io.IOException;
public class ThrottledInputStream extends InputStream {
private final InputStream rawStream;
private long totalBytesRead;
private long startTimeMillis;
private static final int BYTES_PER_KILOBYTE = 1024;
private static final int MILLIS_PER_SECOND = 1000;
private final int ratePerMillis;
public ThrottledInputStream(InputStream rawStream, int kBytesPersecond) {
this.rawStream = rawStream;
ratePerMillis = kBytesPersecond * BYTES_PER_KILOBYTE / MILLIS_PER_SECOND;
}
#Override
public int read() throws IOException {
if (startTimeMillis == 0) {
startTimeMillis = System.currentTimeMillis();
}
long now = System.currentTimeMillis();
long interval = now - startTimeMillis;
//see if we are too fast..
if (interval * ratePerMillis < totalBytesRead + 1) { //+1 because we are reading 1 byte
try {
final long sleepTime = ratePerMillis / (totalBytesRead + 1) - interval; // will most likely only be relevant on the first few passes
Thread.sleep(Math.max(1, sleepTime));
} catch (InterruptedException e) {//never realized what that is good for :)
}
}
totalBytesRead += 1;
return rawStream.read();
}
}
The crude solution is just to read a chunk at a time and then sleep eg 10k then sleep a second. But the first question I have to ask is: why? There are a couple of likely answers:
You don't want to create work faster than it can be done; or
You don't want to create too great a load on the system.
My suggestion is not to control it at the read level. That's kind of messy and inaccurate. Instead control it at the work end. Java has lots of great concurrency tools to deal with this. There are a few alternative ways of doing this.
I tend to like using a producer consumer pattern for soling this kind of problem. It gives you great options on being able to monitor progress by having a reporting thread and so on and it can be a really clean solution.
Something like an ArrayBlockingQueue can be used for the kind of throttling needed for both (1) and (2). With a limited capacity the reader will eventually block when the queue is full so won't fill up too fast. The workers (consumers) can be controlled to only work so fast to also throttle the rate covering (2).
while !EOF
store System.currentTimeMillis() + 1000 (1 sec) in a long variable
read a 10K buffer
check if stored time has passed
if it isn't, Thread.sleep() for stored time - current time
Creating ThrottledInputStream that takes another InputStream as suggested would be a nice solution.
If you have used Java I/O then you should be familiar with decorating streams. I suggest an InputStream subclass that takes another InputStream and throttles the flow rate. (You could subclass FileInputStream but that approach is highly error-prone and inflexible.)
Your exact implementation will depend upon your exact requirements. Generally you will want to note the time your last read returned (System.nanoTime). On the current read, after the underlying read, wait until sufficient time has passed for the amount of data transferred. A more sophisticated implementation may buffer and return (almost) immediately with only as much data as rate dictates (be careful that you should only return a read length of 0 if the buffer is of zero length).
You can use a RateLimiter. And make your own implementation of the read in InputStream. An example of this can be seen bellow
public class InputStreamFlow extends InputStream {
private final InputStream inputStream;
private final RateLimiter maxBytesPerSecond;
public InputStreamFlow(InputStream inputStream, RateLimiter limiter) {
this.inputStream = inputStream;
this.maxBytesPerSecond = limiter;
}
#Override
public int read() throws IOException {
maxBytesPerSecond.acquire(1);
return (inputStream.read());
}
#Override
public int read(byte[] b) throws IOException {
maxBytesPerSecond.acquire(b.length);
return (inputStream.read(b));
}
#Override
public int read(byte[] b, int off, int len) throws IOException {
maxBytesPerSecond.acquire(len);
return (inputStream.read(b,off, len));
}
}
if you want to limit the flow by 1 MB/s you can get the input stream like this:
final RateLimiter limiter = RateLimiter.create(RateLimiter.ONE_MB);
final InputStreamFlow inputStreamFlow = new InputStreamFlow(originalInputStream, limiter);
It depends a little on whether you mean "don't exceed a certain rate" or "stay close to a certain rate."
If you mean "don't exceed", you can guarantee that with a simple loop:
while not EOF do
read a buffer
Thread.wait(time)
write the buffer
od
The amount of time to wait is a simple function of the size of the buffer; if the buffer size is 10K bytes, you want to wait a second between reads.
If you want to get closer than that, you probably need to use a timer.
create a Runnable to do the reading
create a Timer with a TimerTask to do the reading
schedule the TimerTask n times a second.
If you're concerned about the speed at which you're passing the data on to something else, instead of controlling the read, put the data into a data structure like a queue or circular buffer, and control the other end; send data periodically. You need to be careful with that, though, depending on the data set size and such, because you can run into memory limitations if the reader is very much faster than the writer.

Categories