I'm developing a small program to encrypt/decrypt a binary file using AES-256, with an HMAC to check the results.
My code is based on the AESCrypt implementation in Java, but I wanted to modify it to allow multiple threads to do the job simultaneously.
I get the size of the original file in bytes and calculate the number of 16-byte blocks per thread, then I start the threads with information about the offset to apply for reading and writing (the encrypted file has a header, so offset_write = offset_read + header_length).
When the encryption finishes, I pass the output content (without the header) through the HMAC to generate the checksum.
The problem is that some bytes get corrupted at the boundary between the regions handled by two threads.
Code of main:
//..
// Initialization and creation of iv, aesKey
//..
in = new FileInputStream(fromPath);
out = new FileOutputStream(toPath);
//..
// Some code for generate the header and write it to out
//..
double totalBytes = new Long(archivo.length()).doubleValue();
int bloquesHilo = new Double(Math.ceil(totalBytes/(AESCrypt.NUM_THREADS*AESCrypt.BLOCK_SIZE))).intValue();
int offset_write = new Long((out.getChannel()).position()).intValue();
for (int i = 0; i < AESCrypt.NUM_THREADS; i++)
{
int offset = bloquesHilo*AESCrypt.BLOCK_SIZE*i;
HiloCrypt hilo = new HiloCrypt(fromPath, toPath, ivSpec, aesKey, offset, offset_write, bloquesHilo, this);
hilo.start();
}
Code for a thread (class HiloCrypt):
public class HiloCrypt extends Thread {
private RandomAccessFile in;
private RandomAccessFile out;
private Cipher cipher;
private Mac hmac;
private IvParameterSpec ivSpec2;
private SecretKeySpec aesKey2;
private Integer num_blocks;
private Integer offset_read;
private Integer offset_write;
private AESCrypt parent;
public HiloCrypt(String input, String output, IvParameterSpec ivSpec, SecretKeySpec aesKey, Integer offset_thread, Integer offset_write, Integer blocks, AESCrypt parent2)
{
try
{
// If i don't use RandomAccessFile there is a problem copying data
this.in = new RandomAccessFile(input, "r");
this.out = new RandomAccessFile(output, "rw");
int total_offset_write = offset_write + offset_thread;
// Adjust the offset for reading and writing
this.out.seek(total_offset_write);
this.in.seek(offset_thread);
this.ivSpec2 = ivSpec;
this.aesKey2 = aesKey;
this.cipher = Cipher.getInstance(AESCrypt.CRYPT_TRANS);
this.hmac = Mac.getInstance(AESCrypt.HMAC_ALG);
this.num_blocks = blocks;
this.offset_read = offset_thread;
this.offset_write = total_offset_write;
this.parent = parent2;
} catch (Exception e)
{
System.err.println(e);
return;
}
}
public void run()
{
int len = 0, last = 0, block_counter = 0, total = 0; // all counters must be initialized before use
byte[] text = new byte[AESCrypt.BLOCK_SIZE];
try{
// Start encryption objects
this.cipher.init(Cipher.ENCRYPT_MODE, this.aesKey2, this.ivSpec2);
this.hmac.init(new SecretKeySpec(this.aesKey2.getEncoded(), AESCrypt.HMAC_ALG));
while ((len = this.in.read(text)) > 0 && block_counter < this.num_blocks)
{
this.cipher.update(text, 0, AESCrypt.BLOCK_SIZE, text);
this.hmac.update(text);
// Write the block
this.out.write(text);
last = len;
total+=len;
block_counter++;
}
if (len < 0) // If it's the last block, calculate the HMAC
{
last &= 0x0f;
this.out.write(last);
this.out.seek(this.offset_write-this.offset_read);
while ((len = this.out.read(text)) > 0)
{
this.hmac.update(text);
}
// write last block of HMAC
text=this.hmac.doFinal();
this.out.write(text);
}
// Close streams
this.in.close();
this.out.close();
// Code to notify the end of the thread
}
catch(Exception e)
{
System.err.println("Hola!");
System.err.println(e);
}
}
}
With this code, if I run only 1 thread the encryption/decryption works perfectly, but with 2 or more threads the bytes at the boundaries between the threads' regions get corrupted, and the checksum fails as well.
I'm trying to do this with threads because it is nearly 2x faster than with one thread; I think the gain comes from the processing rather than from the file access.
As an aside, it processes 250 MB of data in 43 seconds on a MacBook Air. Is that a good time?
AESCrypt is not thread safe. You cannot use multiple threads with it.
Generally speaking, encryption code is rarely thread safe, as it requires complex mathematics to generate secure output. AES by itself is relatively fast, if you need better speed from it, consider vertical scaling or hardware accelerators as a first step. Later, you can add more servers to encrypt different files concurrently (horizontal scaling).
You basically want to multithread an operation that is intrinsically sequential.
A cipher in a chained mode such as CBC cannot be parallelized for encryption, because each block depends on the result of the previous block. So you can encrypt multiple files in parallel independently, with a slight performance increase (especially if the files are in memory rather than on disk), but you cannot encrypt a single file in such a mode using multiple cores.
As far as I can see, you use an update method. I'm not an expert in Java cryptography, but even the name of the method tells me that the encryption algorithm holds state: "multithreading" and "state" are not friends, so you have to deal with state management across threads.
Race condition explains why you get blocks damaged.
It makes absolutely no sense to use more than 1 thread for the HMAC because 1) it has to be computed sequentially and 2) I/O access (read/write) is much slower than the actual HMAC computation.
For AES it can be a good idea to use multiple threads when using CTR (counter) mode or another mode that doesn't require knowledge of previous data blocks.
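To illustrate the CTR suggestion, here is a minimal sketch (not the AESCrypt file format and not the asker's code) of encrypting one buffer with several threads. It assumes the JDK's AES/CTR/NoPadding transformation, which treats the 16-byte IV as a big-endian counter that is incremented once per block, so a task starting at block k can use IV + k as its starting counter. The names ParallelCtrSketch, counterFor, encryptRange and encryptParallel are hypothetical. Each task creates its own Cipher, and the HMAC would still be computed in a single sequential pass afterwards.

```java
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class ParallelCtrSketch {
    static final int BLOCK = 16; // AES block size in bytes

    // Counter block for the AES block at blockIndex: (iv + blockIndex) mod 2^128, big-endian.
    static byte[] counterFor(byte[] iv, long blockIndex) {
        BigInteger ctr = new BigInteger(1, iv)
                .add(BigInteger.valueOf(blockIndex))
                .mod(BigInteger.ONE.shiftLeft(128));
        byte[] raw = ctr.toByteArray(); // may be shorter than 16 bytes or carry a sign byte
        byte[] out = new byte[BLOCK];
        int n = Math.min(raw.length, BLOCK);
        System.arraycopy(raw, raw.length - n, out, BLOCK - n, n);
        return out;
    }

    // Encrypts data[from, to) in place; from must be a multiple of BLOCK.
    static void encryptRange(byte[] key, byte[] iv, byte[] data, int from, int to) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding"); // one Cipher per task, never shared
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(counterFor(iv, from / BLOCK)));
        c.doFinal(data, from, to - from, data, from);
    }

    // Splits the buffer into block-aligned chunks and encrypts them concurrently.
    static byte[] encryptParallel(byte[] key, byte[] iv, byte[] plain, int threads) throws Exception {
        byte[] data = plain.clone();
        int blocks = (data.length + BLOCK - 1) / BLOCK;
        int chunk = Math.max(1, (blocks + threads - 1) / threads) * BLOCK;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (int from = 0; from < data.length; from += chunk) {
                final int start = from;
                final int end = Math.min(data.length, from + chunk);
                tasks.add(pool.submit(() -> {
                    encryptRange(key, iv, data, start, end);
                    return null;
                }));
            }
            for (Future<?> t : tasks) {
                t.get(); // propagates any failure from a worker
            }
        } finally {
            pool.shutdown();
        }
        return data;
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[32], iv = new byte[16], plain = new byte[1_000_000];
        SecureRandom rnd = new SecureRandom();
        rnd.nextBytes(key);
        rnd.nextBytes(iv);
        rnd.nextBytes(plain);
        Cipher single = Cipher.getInstance("AES/CTR/NoPadding");
        single.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        boolean same = Arrays.equals(single.doFinal(plain), encryptParallel(key, iv, plain, 4));
        System.out.println("parallel output matches single-threaded output: " + same);
    }
}
```

The chunks are aligned to the 16-byte block size so that no two tasks ever need the same counter block. The same IV must never be reused with the same key for a different file.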
what about moving the question to crypto-stackexchange?
I have to transfer large files (up to 10GB) using UDP. Unfortunately TCP cannot be used in this case because no bidirectional communication between sender and receiver is possible.
Sending a file is not the problem. I have written the client using netty. It reads the file, encodes it (unique ID, position in stream and so on) and sends it to the destination at a configurable rate (packets per second). All the packets are received at the destination. I have used iptables and Wireshark to verify that.
The problem occurs at the recipient. Receiving up to 90K packets a second works fine, but receiving and decoding them at this rate is not possible using a single thread.
My first approach was to use thread-safe queues (one producer and multiple consumers). But using multiple consumers did not lead to better results; some packets were still lost. It seems that the overhead of locking/unlocking the queue slows down the process. So I decided to use the LMAX Disruptor with a single producer (receiving the UDP datagrams) and multiple consumers (decoding the packets). Surprisingly, this does not lead to success either. There is hardly any speed advantage in using two consumers, and I wonder why.
This is the main part, which receives the UDP packets and calls the disruptor:
public void receiveUdpStream(DatagramChannel channel) {
boolean exit = false;
// the size of the UDP datagram
int size = shareddata.cr.getDatagramsize();
// the number of decoders (configurable)
int nn_decoders = shareddata.cr.getDecoders();
Udp2flowEventFactory factory = new Udp2flowEventFactory(size);
// the size of the ringbuffer
int bufferSize = 1 << 10;
Disruptor<Udp2flowEvent> disruptor = new Disruptor<>(
factory,
bufferSize,
DaemonThreadFactory.INSTANCE,
ProducerType.SINGLE,
new YieldingWaitStrategy());
// my consumers
Udp2flowDecoder decoder[] = new Udp2flowDecoder[nn_decoders];
for (int i = 0; i < nn_decoders; i++) {
decoder[i] = new Udp2flowDecoder(i, shareddata);
}
disruptor.handleEventsWith(decoder);
RingBuffer<Udp2flowEvent> ringBuffer = disruptor.getRingBuffer();
Udp2flowProducer producer = new Udp2flowProducer(ringBuffer);
disruptor.start();
while (!exit) {
try {
ByteBuffer buf = ByteBuffer.allocate(size);
channel.receive(buf);
receivedDatagrams++; // counting the received packets
buf.flip();
producer.onData(buf);
} catch (Exception e) {
logger.debug("got exception " + e);
exit = true;
}
}
}
My lmax event is simple...
public class Udp2flowEvent {
ByteBuffer buf;
Udp2flowEvent(int size) {
this.buf = ByteBuffer.allocateDirect(size);
}
public void set(ByteBuffer buf) {
this.buf = buf;
}
public ByteBuffer getEvent() {
return this.buf;
}
}
And this is my factory
public class Udp2flowEventFactory implements EventFactory<Udp2flowEvent> {
private int size;
Udp2flowEventFactory(int size) {
super();
this.size = size;
}
public Udp2flowEvent newInstance() {
return new Udp2flowEvent(size);
}
}
The producer ...
public class Udp2flowProducer {
private final RingBuffer<Udp2flowEvent> ringBuffer;
public Udp2flowProducer(RingBuffer<Udp2flowEvent> ringBuffer)
{
this.ringBuffer = ringBuffer;
}
public void onData(ByteBuffer buf)
{
long sequence = ringBuffer.next(); // Grab the next sequence
try
{
Udp2flowEvent event = ringBuffer.get(sequence);
event.set(buf);
}
finally
{
ringBuffer.publish(sequence);
}
}
}
The interesting but very simple part is the decoder. It looks like this.
public void onEvent(Udp2flowEvent event, long sequence, boolean endOfBatch) {
// each consumer decodes its packets
if (sequence % nn_decoders != decoderid) {
return;
}
ByteBuffer buf = event.getEvent();
event = null; // is it faster to null the event?
shareddata.increaseReceiveddatagrams();
// headertype
// some code omitted. But the code looks something like this
final int headertype = buf.getInt();
final int headerlength = buf.getInt();
final long payloadlength = buf.getLong();
// decoding int and longs works fine.
// but decoding the remaining part not!
byte[] payload = new byte[buf.remaining()];
buf.get(payload);
// some code omitted. The payload is used later on...
}
And here are some interesting facts:
all decoders work well. I see the number of decoders running
all packets are received but the decoding takes too long. More precisely: decoding the first two ints and the long value works fine but decoding the payload takes too long. This leads to a 'backpressure' and some packets are lost.
Fun fact: The code works pretty well on my MacBook Air but does not work on my server. (MacBook: Core i7; Server: ESXi with 8 virtual cores on a Xeon @ 2.6 GHz and no load at all).
Now my questions, and I hope that somebody has an idea:
Why does it hardly make a difference to use several consumers? The difference is only 5%.
In general: What is the best way to receive 60K (or more) UDP packets per second and decode them? I tried netty as the receiver, but UDP does not scale very well.
Why is decoding the payload so slow?
Are there any errors that I have overlooked?
Should I use another producer/consumer library? LMAX has very low latency, but what about throughput?
Ring buffers don't seem like the right technique for this problem, because when a ring buffer has filled all its capacity it will block, and it is also an inherently sequential architecture. You need to know in advance the highest number of packets to expect and size the buffer for that. Also, UDP is lossy unless you implement a message assurance protocol.
Not sure why you say TCP is not bidirectional, it is and it takes care of lost packets.
To cope with data flooding, you may need to distribute the incoming packets to separate servers if a single one is insufficient. A queue should work to absorb a flood of data. You may need a massive number of decoders awaiting if you want to process this volume of data in near real time.
Suggest you use TCP.
I am using following way to write InputStream to File:
private void writeToFile(InputStream stream) throws IOException {
String filePath = "C:\\Test.jpg";
FileChannel outChannel = new FileOutputStream(filePath).getChannel();
ReadableByteChannel inChannel = Channels.newChannel(stream);
ByteBuffer buffer = ByteBuffer.allocate(1024);
while(true) {
if(inChannel.read(buffer) == -1) {
break;
}
buffer.flip();
outChannel.write(buffer);
buffer.clear();
}
inChannel.close();
outChannel.close();
}
I was wondering if this is the right way to use NIO. I have read about a method, FileChannel.transferFrom, which takes three parameters:
ReadableByteChannel src
long position
long count
In my case I only have src; I don't have the position and count. Is there any way I can use this method to create the file?
Also, for an image, is there a better way to create the image from just an InputStream and NIO?
Any information would be very useful to me. There are similar questions here on SO, but I cannot find any particular solution that suits my case.
I would use Files.copy
Files.copy(is, Paths.get(filePath));
As for your version:
ByteBuffer.allocateDirect can be faster: Java will make a best effort to perform native I/O operations directly upon it.
Closing is unreliable: if the first close fails, the second will never execute. Use try-with-resources instead; Channels are AutoCloseable too.
No it's not correct. You run the risk of losing data. The canonical NIO copy loop is as follows:
while (in.read(buffer) >= 0 || buffer.position() > 0)
{
buffer.flip();
out.write(buffer);
buffer.compact();
}
Note the changed loop conditions, which take care of flushing the output at EOS, and the use of compact() instead of clear(), which takes care of the possibility of short writes.
Similarly the canonical transferTo()/transferFrom() loop is as follows:
long offset = 0;
long quantum = 1024*1024; // or however much you want to transfer at a time
long count;
while ((count = out.transferFrom(in, offset, quantum)) > 0)
{
offset += count;
}
It must be called in a loop, as it isn't guaranteed to transfer the entire quantum.
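Putting the points from the answers above together, a copy method might look like the following sketch. It assumes the goal is simply to write an InputStream to a file; the class name ChannelCopySketch, the method name copy and the 8192-byte buffer size are illustrative choices, not part of any of the answers. It combines try-with-resources for closing with the read/flip/write/compact loop that drains the buffer at end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ChannelCopySketch {
    // Copies the whole stream into target and returns the number of bytes written.
    static long copy(InputStream stream, Path target) throws IOException {
        long written = 0;
        // Both channels are closed automatically, even if the copy loop throws.
        try (ReadableByteChannel in = Channels.newChannel(stream);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.TRUNCATE_EXISTING,
                     StandardOpenOption.WRITE)) {
            ByteBuffer buffer = ByteBuffer.allocateDirect(8192);
            // Keep going until EOF has been seen AND the buffer has been drained.
            while (in.read(buffer) >= 0 || buffer.position() > 0) {
                buffer.flip();
                written += out.write(buffer);
                buffer.compact(); // keeps any bytes a short write left behind
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("copy-sketch", ".bin");
        long n = copy(new ByteArrayInputStream(new byte[20_000]), target);
        System.out.println("copied " + n + " bytes to " + target);
    }
}
```

If no special handling is needed, Files.copy remains the shorter option; the explicit loop is mainly useful when the bytes have to be inspected or transformed on the way through.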
I need to encrypt and send data over TCP (from a few 100 bytes to a few 100 megabytes per message) in chunks from Java to a C++ program, and need to send the size of the data ahead of time so the recipient knows when to stop reading the current message and process it, then wait for the next message (the connection stays open so there's no other way to indicate end of message; as the data can be binary, I can't use a flag to indicate message end due to the possibility the encrypted bytes might randomly happen to be identical to any flag I choose at some point).
My issue is calculating the encrypted message size before encrypting it, which will in general be different than the input length due to padding etc.
Say I have initialized as follows:
AlgorithmParameterSpec paramSpec = new IvParameterSpec(initv);
encipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
mac = Mac.getInstance("HmacSHA512");
encipher.init(Cipher.ENCRYPT_MODE, key, paramSpec);
mac.init(key);
buf = new byte[encipher.getOutputSize(blockSize)];
Then I send the data as such (and also have an analogous function that uses a stream for input instead of byte[]):
public void writeBytes(DataOutputStream out, byte[] input) {
try {
//mac.reset(); // Needed ?
int left = input.length;
int offset = 0;
while (left > 0)
{
int chunk = Math.min(left, blockSize);
int ctLength = encipher.update(input, offset, chunk, buf, 0);
mac.update(input, offset, chunk);
out.write(buf, 0, ctLength);
left -= chunk;
offset += chunk;
}
out.write(encipher.doFinal(mac.doFinal()));
out.flush();
} catch (Exception e) {
e.printStackTrace();
}
}
But how to precalculate the output size that will be sent to the receiving computer?
Basically, I want to out.writeInt(messageSize) before the loop. But how to calculate messageSize? The documentation for Cipher's getOutputSize() says that "This call takes into account any unprocessed (buffered) data from a previous update call, and padding." So this seems to imply that the value might change for the same function argument over multiple calls to update() or doFinal()... Can I assume that if blockSize is a multiple of the AES CBC block size to avoid padding, I should have a constant value for each block? That is, simply check that _blockSize % encipher.getOutputSize(1) != 0 and then in the write function,
int messageSize = (input.length / blockSize) * encipher.getOutputSize(blockSize) +
encipher.getOutputSize(input.length % blockSize + mac.getMacLength());
??
If not, what alternatives do I have?
When using PKCS5 padding, the size of the message after padding will be:
padded_size = original_size + BLOCKSIZE - (original_size % BLOCKSIZE);
The above gives the complete size of the entire message (up to the doFinal() call), given the complete size of the input message. It occurs to me that you may actually want to know just the length of the final portion. In that case, all you need to do is store the output byte array of the doFinal() call and read the length field of that array.
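As a quick check of the formula, the following sketch compares the predicted size with what AES/CBC/PKCS5Padding actually produces for a few input lengths. The class name PaddedSizeSketch and the all-zero key and IV are placeholders for the demonstration only. Note that in the question's scheme the MAC bytes are passed through the cipher as well, so the length to plug in there would presumably be input.length + mac.getMacLength().

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class PaddedSizeSketch {
    static final int BLOCK = 16; // AES block size in bytes

    // PKCS5/PKCS7 padding always adds between 1 and BLOCK bytes.
    static long paddedSize(long plainLength) {
        return plainLength + BLOCK - (plainLength % BLOCK);
    }

    public static void main(String[] args) throws Exception {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        for (int n : new int[] {0, 1, 15, 16, 17, 1000}) {
            // All-zero key and IV: acceptable only because this is a size check.
            c.init(Cipher.ENCRYPT_MODE,
                    new SecretKeySpec(new byte[16], "AES"),
                    new IvParameterSpec(new byte[16]));
            int actual = c.doFinal(new byte[n]).length;
            System.out.println(n + " bytes in, predicted " + paddedSize(n) + ", actual " + actual);
        }
    }
}
```

An input that is already a multiple of the block size still grows by one full block, which is why choosing a block-aligned chunk size does not avoid the padding on the final doFinal() call.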
I need advice from someone who knows Java and its memory issues very well.
I have a large file (something like 1.5GB) and I need to split it into many smaller files (100, for example).
I know in general how to do it (using a BufferedReader), but I would like to know if you have any advice regarding memory, or tips on how to do it faster.
My file contains text, not binary data, and has about 20 characters per line.
To save memory, do not unnecessarily store/duplicate the data in memory (i.e. do not assign them to variables outside the loop). Just process the output immediately as soon as the input comes in.
It really doesn't matter whether you're using BufferedReader or not. It will not cost significantly more memory, as some implicitly seem to suggest. At most it will cost a few percent in performance. The same applies to using NIO: it will only improve scalability, not memory use, and it only becomes interesting when you have hundreds of threads running on the same file.
Just loop through the file, write every line immediately to other file as you read in, count the lines and if it reaches 100, then switch to next file, etcetera.
Kickoff example:
String encoding = "UTF-8";
int maxlines = 100;
BufferedReader reader = null;
BufferedWriter writer = null;
try {
reader = new BufferedReader(new InputStreamReader(new FileInputStream("/bigfile.txt"), encoding));
int count = 0;
for (String line; (line = reader.readLine()) != null;) {
if (count++ % maxlines == 0) {
close(writer);
writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("/smallfile" + (count / maxlines) + ".txt"), encoding));
}
writer.write(line);
writer.newLine();
}
} finally {
close(writer);
close(reader);
}
First, if your file contains binary data, then using BufferedReader would be a big mistake (because you would be converting the data to String, which is unnecessary and could easily corrupt the data); you should use a BufferedInputStream instead. If it's text data and you need to split it along linebreaks, then using BufferedReader is OK (assuming the file contains lines of a sensible length).
Regarding memory, there shouldn't be any problem if you use a decently sized buffer (I'd use at least 1MB to make sure the HD is doing mostly sequential reading and writing).
If speed turns out to be a problem, you could have a look at the java.nio packages, which are supposedly faster than java.io.
You can consider using memory-mapped files, via FileChannel.
Generally a lot faster for large files. There are performance trade-offs that could make it slower, so YMMV.
Related answer: Java NIO FileChannel versus FileOutputstream performance / usefulness
This is a very good article:
http://java.sun.com/developer/technicalArticles/Programming/PerfTuning/
In summary, for great performance, you should:
Avoid accessing the disk.
Avoid accessing the underlying operating system.
Avoid method calls.
Avoid processing bytes and characters individually.
For example, to reduce the access to disk, you can use a large buffer. The article describes various approaches.
Does it have to be done in Java? I.e. does it need to be platform independent? If not, I'd suggest using the 'split' command in *nix. If you really wanted to, you could execute this command from your Java program. While I haven't tested it, I imagine it would perform faster than whatever Java IO implementation you could come up with.
You can use java.nio which is faster than classical Input/Output stream:
http://java.sun.com/javase/6/docs/technotes/guides/io/index.html
Yes.
I also think that using read() with arguments, like read(char[] cbuf, int off, int len), is a better way to read such a large file
(e.g. read(buffer, 0, buffer.length)).
I also ran into missing values when using a BufferedReader instead of a BufferedInputStream for a binary input stream, so a BufferedInputStream is the better choice in that case.
package all.is.well;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import junit.framework.TestCase;
/**
* @author Naresh Bhabat
*
The following implementation helps to deal with extra large files in Java.
This program was tested with a 2GB input file.
There are some points where extra logic can be added in future.
Please note: to deal with a binary input file, read bytes from the file object instead of reading lines.
It uses a random access file, which is almost like a streaming API.
* ****************************************
Notes regarding executor framework and its readings.
Please note :ExecutorService executor = Executors.newFixedThreadPool(10);
* for 10 threads:Total time required for reading and writing the text in
* :seconds 349.317
*
* For 100:Total time required for reading the text and writing : seconds 464.042
*
* For 1000 : Total time required for reading and writing text :466.538
* For 10000 Total time required for reading and writing in seconds 479.701
*
*
*/
public class DealWithHugeRecordsinFile extends TestCase {
static final String FILEPATH = "C:\\springbatch\\bigfile1.txt.txt";
static final String FILEPATH_WRITE = "C:\\springbatch\\writinghere.txt";
static volatile RandomAccessFile fileToWrite;
static volatile RandomAccessFile file;
static volatile String fileContentsIter;
static volatile int position = 0;
public static void main(String[] args) throws IOException, InterruptedException {
long currentTimeMillis = System.currentTimeMillis();
try {
fileToWrite = new RandomAccessFile(FILEPATH_WRITE, "rw");//for random write,independent of thread obstacles
file = new RandomAccessFile(FILEPATH, "r");//for random read,independent of thread obstacles
seriouslyReadProcessAndWriteAsynch();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Thread currentThread = Thread.currentThread();
System.out.println(currentThread.getName());
long currentTimeMillis2 = System.currentTimeMillis();
double time_seconds = (currentTimeMillis2 - currentTimeMillis) / 1000.0;
System.out.println("Total time required for reading the text in seconds " + time_seconds);
}
/**
* @throws IOException
* Something asynchronously serious
*/
public static void seriouslyReadProcessAndWriteAsynch() throws IOException {
ExecutorService executor = Executors.newFixedThreadPool(10);//pls see for explanation in comments section of the class
while (true) {
String readLine = file.readLine();
if (readLine == null) {
break;
}
Runnable genuineWorker = new Runnable() {
@Override
public void run() {
// do hard processing here in this thread,i have consumed
// some time and ignore some exception in write method.
writeToFile(FILEPATH_WRITE, readLine);
// System.out.println(" :" +
// Thread.currentThread().getName());
}
};
executor.execute(genuineWorker);
}
executor.shutdown();
while (!executor.isTerminated()) {
}
System.out.println("Finished all threads");
file.close();
fileToWrite.close();
}
/**
* @param filePath
* @param data
* @param position
*/
private static void writeToFile(String filePath, String data) {
try {
// fileToWrite.seek(position);
data = "\n" + data;
if (!data.contains("Randomization")) {
return;
}
System.out.println("Let us do something time consuming to make this thread busy"+(position++) + " :" + data);
System.out.println("Lets consume through this loop");
int i=1000;
while(i>0){
i--;
}
fileToWrite.write(data.getBytes());
throw new Exception();
} catch (Exception exception) {
System.out.println("exception was thrown but still we are able to proceeed further"
+ " \n This can be used for marking failure of the records");
//exception.printStackTrace();
}
}
}
Don't use read() without arguments.
It's very slow.
It is better to read into a buffer and write that buffer to the file in one go.
Use BufferedInputStream because it supports binary reading.
That's all.
Unless you accidentally read in the whole input file instead of reading it line by line, then your primary limitation will be disk speed. You may want to try starting with a file containing 100 lines and write it to 100 different files one line in each and make the triggering mechanism work on the number of lines written to the current file. That program will be easily scalable to your situation.
Is there an article/algorithm on how I can read a long file at a certain rate?
Say I do not want to exceed 10 KB/sec while issuing reads.
A simple solution, by creating a ThrottledInputStream.
This should be used like this:
final InputStream slowIS = new ThrottledInputStream(new BufferedInputStream(new FileInputStream("c:\\file.txt"),8000),300);
300 is the number of kilobytes per second. 8000 is the block size for BufferedInputStream.
This should of course be generalized by implementing read(byte b[], int off, int len), which will spare you a ton of System.currentTimeMillis() calls. As written, System.currentTimeMillis() is called once for each byte read, which can cause a bit of overhead. It should also be possible to store the number of bytes that can safely be read without calling System.currentTimeMillis().
Be sure to put a BufferedInputStream in between, otherwise the FileInputStream will be polled in single bytes rather than blocks. This will reduce the CPU load from 10% to almost 0. You risk exceeding the data rate by up to the number of bytes in the block size.
import java.io.InputStream;
import java.io.IOException;
public class ThrottledInputStream extends InputStream {
private final InputStream rawStream;
private long totalBytesRead;
private long startTimeMillis;
private static final int BYTES_PER_KILOBYTE = 1024;
private static final int MILLIS_PER_SECOND = 1000;
private final int ratePerMillis;
public ThrottledInputStream(InputStream rawStream, int kBytesPersecond) {
this.rawStream = rawStream;
ratePerMillis = kBytesPersecond * BYTES_PER_KILOBYTE / MILLIS_PER_SECOND;
}
@Override
public int read() throws IOException {
if (startTimeMillis == 0) {
startTimeMillis = System.currentTimeMillis();
}
long now = System.currentTimeMillis();
long interval = now - startTimeMillis;
//see if we are too fast..
if (interval * ratePerMillis < totalBytesRead + 1) { //+1 because we are reading 1 byte
try {
// time at which (totalBytesRead + 1) bytes are allowed, minus the time already elapsed
// (assumes ratePerMillis > 0, i.e. a rate of at least 1 KB/s)
final long sleepTime = (totalBytesRead + 1) / ratePerMillis - interval;
Thread.sleep(Math.max(1, sleepTime));
} catch (InterruptedException e) {
Thread.currentThread().interrupt(); // restore the interrupt flag for the caller
}
}
totalBytesRead += 1;
return rawStream.read();
}
}
The crude solution is just to read a chunk at a time and then sleep, e.g. read 10 KB and then sleep for a second. But the first question I have to ask is: why? There are a couple of likely answers:
You don't want to create work faster than it can be done; or
You don't want to create too great a load on the system.
My suggestion is not to control it at the read level. That's kind of messy and inaccurate. Instead, control it at the work end. Java has lots of great concurrency tools to deal with this, and there are a few alternative ways of doing it.
I tend to like using a producer-consumer pattern for solving this kind of problem. It gives you good options for monitoring progress (for example with a reporting thread), and it can be a really clean solution.
Something like an ArrayBlockingQueue can be used for the kind of throttling needed for both (1) and (2). With a limited capacity the reader will eventually block when the queue is full so won't fill up too fast. The workers (consumers) can be controlled to only work so fast to also throttle the rate covering (2).
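A minimal sketch of that idea follows, with hypothetical names (BoundedQueueSketch, pump) and a trivial consumer that only counts bytes; a real worker would do its processing, and possibly its own pacing, where the comment indicates. The reader blocks in put() as soon as the queue holds its full capacity of chunks, so it can never run far ahead of the consumer.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedQueueSketch {
    private static final byte[] EOF = new byte[0]; // sentinel telling the consumer to stop

    // Reads the stream in chunks, hands them to one consumer thread through a
    // bounded queue, and returns the number of bytes the consumer processed.
    static long pump(InputStream in, int chunkSize, int capacity) throws Exception {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(capacity);
        long[] consumed = new long[1];

        Thread consumer = new Thread(() -> {
            try {
                for (byte[] chunk = queue.take(); chunk != EOF; chunk = queue.take()) {
                    consumed[0] += chunk.length; // the real (possibly slow) work goes here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = in.read(buf)) > 0) {
            queue.put(Arrays.copyOf(buf, n)); // blocks while the queue is full
        }
        queue.put(EOF);
        consumer.join(); // join() makes the consumer's writes visible to this thread
        return consumed[0];
    }

    public static void main(String[] args) throws Exception {
        long n = pump(new ByteArrayInputStream(new byte[100_000]), 10 * 1024, 4);
        System.out.println("consumer processed " + n + " bytes");
    }
}
```

With a chunk size of 10 KB and a capacity of 4, at most about 40 KB are ever buffered, which also addresses the memory concern when the reader is much faster than the worker.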
while !EOF
store System.currentTimeMillis() + 1000 (1 sec) in a long variable
read a 10K buffer
check if stored time has passed
if it isn't, Thread.sleep() for stored time - current time
Creating ThrottledInputStream that takes another InputStream as suggested would be a nice solution.
If you have used Java I/O then you should be familiar with decorating streams. I suggest an InputStream subclass that takes another InputStream and throttles the flow rate. (You could subclass FileInputStream but that approach is highly error-prone and inflexible.)
Your exact implementation will depend upon your exact requirements. Generally you will want to note the time your last read returned (System.nanoTime). On the current read, after the underlying read, wait until sufficient time has passed for the amount of data transferred. A more sophisticated implementation may buffer and return (almost) immediately with only as much data as rate dictates (be careful that you should only return a read length of 0 if the buffer is of zero length).
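One possible shape for such a decorator is sketched below. It is a variant of the description above rather than a literal implementation: instead of tracking only the last read, it paces against the average rate since the first read, and it sleeps after each underlying read until the bytes delivered so far are "due". The class name PacedInputStream and the bytes-per-second parameter are assumptions made for the example.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;

class PacedInputStream extends FilterInputStream {
    private final double nanosPerByte;
    private boolean started;
    private long startNanos;
    private long totalBytes;

    PacedInputStream(InputStream in, long bytesPerSecond) {
        super(in);
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        this.nanosPerByte = 1e9 / bytesPerSecond;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            pace(1);
        }
        return b;
    }

    // read(byte[]) is inherited from FilterInputStream and delegates to this method.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            pace(n);
        }
        return n;
    }

    // Sleeps until enough time has passed for all bytes delivered so far.
    private void pace(int justRead) throws IOException {
        if (!started) {
            started = true;
            startNanos = System.nanoTime();
        }
        totalBytes += justRead;
        long dueNanos = (long) (totalBytes * nanosPerByte);
        long sleepNanos = dueNanos - (System.nanoTime() - startNanos);
        if (sleepNanos > 0) {
            try {
                Thread.sleep(sleepNanos / 1_000_000L, (int) (sleepNanos % 1_000_000L));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while throttling");
            }
        }
    }
}
```

It would be used like any other decorating stream, for example new PacedInputStream(new FileInputStream(file), 10 * 1024) for roughly 10 KB/sec. A single read can still return up to the caller's buffer size at once, so smaller buffers give a smoother flow.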
You can use a RateLimiter and write your own implementation of read in an InputStream subclass. An example of this can be seen below:
public class InputStreamFlow extends InputStream {
private final InputStream inputStream;
private final RateLimiter maxBytesPerSecond;
public InputStreamFlow(InputStream inputStream, RateLimiter limiter) {
this.inputStream = inputStream;
this.maxBytesPerSecond = limiter;
}
@Override
public int read() throws IOException {
maxBytesPerSecond.acquire(1);
return (inputStream.read());
}
@Override
public int read(byte[] b) throws IOException {
maxBytesPerSecond.acquire(b.length);
return (inputStream.read(b));
}
@Override
public int read(byte[] b, int off, int len) throws IOException {
maxBytesPerSecond.acquire(len);
return (inputStream.read(b,off, len));
}
}
If you want to limit the flow to 1 MB/s, you can get the input stream like this:
final RateLimiter limiter = RateLimiter.create(RateLimiter.ONE_MB);
final InputStreamFlow inputStreamFlow = new InputStreamFlow(originalInputStream, limiter);
It depends a little on whether you mean "don't exceed a certain rate" or "stay close to a certain rate."
If you mean "don't exceed", you can guarantee that with a simple loop:
while not EOF do
read a buffer
Thread.sleep(time)
write the buffer
od
The amount of time to wait is a simple function of the size of the buffer; if the buffer size is 10K bytes, you want to wait a second between reads.
If you want to get closer than that, you probably need to use a timer.
create a Runnable to do the reading
create a Timer with a TimerTask to do the reading
schedule the TimerTask n times a second.
If you're concerned about the speed at which you're passing the data on to something else, instead of controlling the read, put the data into a data structure like a queue or circular buffer, and control the other end; send data periodically. You need to be careful with that, though, depending on the data set size and such, because you can run into memory limitations if the reader is very much faster than the writer.