Given a buffer of MAX_BUFFER_SIZE and a file that far exceeds it, how can one:
Read the file in blocks of MAX_BUFFER_SIZE?
Do it as fast as possible?
I tried using NIO:
RandomAccessFile aFile = new RandomAccessFile(fileName, "r");
FileChannel inChannel = aFile.getChannel();
ByteBuffer buffer = ByteBuffer.allocate(CAPACITY);
while (inChannel.read(buffer) > 0) {
    buffer.flip();
    while (buffer.hasRemaining()) {
        buffer.get();
    }
    buffer.clear();
}
aFile.close();
And regular IO:
InputStream in = new FileInputStream(fileName);
long length = new File(fileName).length(); // the file's length, not the String's length
if (length > Integer.MAX_VALUE) {
    throw new IOException("File is too large!");
}
byte[] bytes = new byte[(int) length];
int offset = 0;
int numRead = 0;
while (offset < bytes.length
        && (numRead = in.read(bytes, offset, bytes.length - offset)) >= 0) {
    offset += numRead;
}
if (offset < bytes.length) {
    throw new IOException("Could not completely read file " + fileName);
}
in.close();
It turns out that regular IO is about 100 times faster at doing the same thing as NIO. Am I missing something? Is this expected? Is there a faster way to read the file in buffer chunks?
Ultimately I am working with a large file that I don't have enough memory to read all at once. Instead, I'd like to read it incrementally in blocks that would then be used for processing.
If you want to make your first example faster:
FileChannel inChannel = new FileInputStream(fileName).getChannel();
ByteBuffer buffer = ByteBuffer.allocateDirect(CAPACITY);
while (inChannel.read(buffer) > 0) {
    buffer.clear(); // do something with the data and clear/compact it.
}
inChannel.close();
If you want it to be even faster:
FileChannel inChannel = new RandomAccessFile(fileName, "r").getChannel();
MappedByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
// access the buffer as you wish.
inChannel.close();
This can take 10-20 microseconds for files up to 2 GB in size.
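To tie this back to the original question, the mapped buffer can still be consumed in fixed-size blocks; a minimal sketch (reusing the question's MAX_BUFFER_SIZE constant) of draining it chunk by chunk:
byte[] chunk = new byte[MAX_BUFFER_SIZE]; // block size from the question
while (buffer.hasRemaining()) {
    int n = Math.min(chunk.length, buffer.remaining());
    buffer.get(chunk, 0, n); // one bulk copy out of the mapping
    // process chunk[0..n) here
}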
Assuming that you need to read the entire file into memory at once (as you're currently doing), neither reading smaller chunks nor NIO is going to help you here.
In fact, you'd probably be best off reading larger chunks - which your regular IO code is automatically doing for you.
Your NIO code is currently slower because you're only reading one byte at a time (using buffer.get()).
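The straightforward fix is a bulk get into a byte array; a minimal sketch against the question's buffer:
buffer.flip(); // after a channel read
byte[] chunk = new byte[buffer.remaining()];
buffer.get(chunk); // one bulk copy instead of a get() per byte
buffer.clear(); // ready for the next read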
If you want to process in chunks - for example, transferring between streams - here is a standard way of doing it without NIO:
InputStream is = ...;
OutputStream os = ...;
byte[] buffer = new byte[1024];
int read;
while ((read = is.read(buffer)) != -1) {
    os.write(buffer, 0, read);
}
This uses a buffer size of only 1 KB, but can transfer an unlimited amount of data.
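For what it's worth, on Java 9 and later the same copy can be expressed as a single call, which loops over an internal buffer in much the same way:
is.transferTo(os); // Java 9+; copies until end of stream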
(If you extend your question with details of what you're actually trying to do at a functional level, I could improve this answer further.)
Related
I have a large text file (csv) on disk that I'm splitting into lines. Something like this:
BufferedReader reader = new BufferedReader(new FileReader(file));
String line;
while ((line = reader.readLine()) != null) {
    ...
}
What I want to do is compute the offset from the start of the file for, say, every 1,000 lines, so that if in the future I want to read the 10,001st line, I can jump straight to offset X and start iterating from there.
The file could be encoded in any way, so there is no strong relationship between bytes and chars.
Does anyone know of any "counting readers", or an alternative approach? I'm very happy to implement a Reader myself, but don't want to write a very complex class if I can avoid it.
When you need random access, a BufferedReader is not suitable. Instead, you need to look into Channel and its implementations, such as FileChannel.
Simple example of reading using a channel:
RandomAccessFile aFile = new RandomAccessFile("data/nio-data.txt", "rw");
FileChannel inChannel = aFile.getChannel();
ByteBuffer buf = ByteBuffer.allocate(48);
int bytesRead = inChannel.read(buf);
while (bytesRead != -1) {
System.out.println("Read " + bytesRead);
buf.flip();
while(buf.hasRemaining()){
System.out.print((char) buf.get());
}
buf.clear();
bytesRead = inChannel.read(buf);
}
aFile.close();
Source: http://tutorials.jenkov.com/java-nio/channels.html
As for your question about resuming from where you left off, FileChannel defines a read(ByteBuffer dst, long position) method, where position is the byte offset you want to read from.
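A minimal sketch of that positional read (file name, buffer size, and the saved offset are placeholders for illustration):
RandomAccessFile raf = new RandomAccessFile("data.csv", "r");
FileChannel channel = raf.getChannel();
ByteBuffer buf = ByteBuffer.allocate(8192);
long savedOffset = ...; // the byte offset you recorded for, say, line 10,001
int n = channel.read(buf, savedOffset); // reads at that offset without moving the channel's position
buf.flip();
// decode the n bytes in buf from your saved offset onwards
raf.close();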
I'm working with Amazon S3 and would like to upload an InputStream (which requires counting the number of bytes I'm sending).
public static boolean uploadDataTo(String bucketName, String key, String fileName, InputStream stream) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[1];
    try {
        while (stream.read(buffer) != -1) { // copy from stream to buffer
            out.write(buffer); // copy from buffer to byte array
        }
    } catch (Exception e) {
        UtilityFunctionsObject.writeLogException(null, e);
    }
    byte[] result = out.toByteArray(); // we needed all that just for length
    int bytes = result.length;
    IO.close(out);
    InputStream uploadStream = new ByteArrayInputStream(result);
    ....
}
I was told that copying one byte at a time is highly inefficient (which is obvious for large files). I can't just make the buffer larger, because the last read would add padding to the ByteArrayOutputStream, which I can't strip out. I could strip it out of result, but how can I do that safely? If I use an 8 KB buffer, can I just strip off the rightmost bytes where buffer[i] == 0? Or is there a better way to do this? Thanks!
Using Java 7 on Windows 7 x64.
You can do something like this:
int read = 0;
while ((read = stream.read(buffer)) != -1) {
    out.write(buffer, 0, read);
}
stream.read() returns the number of bytes that were written into buffer. You can pass this to the len parameter of out.write(), which ensures you write only the bytes you actually read from the stream.
Use Apache Commons IO's IOUtils (formerly Jakarta Commons) to copy from the input stream to the byte array stream in a single step. It uses an efficient buffer and doesn't write any excess bytes.
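A sketch of that one-step copy, assuming Commons IO is on the classpath:
byte[] result = IOUtils.toByteArray(stream); // sized exactly, no trailing padding
int bytes = result.length;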
If you want efficiency, you could process the file as you read it. I would replace uploadStream with stream and remove the rest of the code.
If you need some buffering, you can do this:
InputStream uploadStream = new BufferedInputStream(stream);
The default buffer size is 8 KB.
If you want the length, use File.length():
long length = new File(fileName).length();
I know how to read a file byte by byte, but I can't find an example of how to read it in chunks of bytes. I have a byte array, and I want to read the file in 512-byte chunks and send them over a socket.
I have tried reading the total bytes of the file and then subtracting 512 bytes until I got a chunk that was less than 512 bytes, which signaled EOF and the end of the transfer.
I am trying to implement TFTP, where data is sent in 512-byte chunks.
Anyhow, I would be thankful for an example.
You ... read 512 bytes at a time.
byte[] myBuffer = new byte[512];
int bytesRead = 0;
// Use a byte-based stream, not a Reader: chars are not bytes, and TFTP sends bytes.
InputStream in = new BufferedInputStream(new FileInputStream("foo.txt"));
while ((bytesRead = in.read(myBuffer, 0, 512)) != -1)
{
    ...
}
You can use the appropriate read() method from the input stream; for example, FileInputStream supports read(byte[]) to read a chunk of bytes.
Something like the snippet below. You may also want to wrap the input stream in a BufferedInputStream (its constructor takes a buffer size argument), though note that buffering alone doesn't guarantee every read returns a full 512-byte block.
byte[] buffer = new byte[512];
FileInputStream in = new FileInputStream("some_file");
int rc = in.read(buffer);
while (rc != -1)
{
    // rc contains the number of bytes read in this operation.
    // do stuff...
    // next read
    rc = in.read(buffer);
}
Using InputStream you can read into an array of a given size and limit the reading to that size.
Read here: http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html
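If you really need full 512-byte blocks (except possibly the last one, which is exactly how TFTP signals the end of a transfer), the reliable approach is a loop that refills the buffer; on Java 9+, readNBytes does that refilling for you. A sketch under that assumption:
byte[] block = new byte[512];
int n;
while ((n = in.readNBytes(block, 0, block.length)) > 0) {
    // send block[0..n) over the socket; n < 512 only on the final block
}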
I have a Java class where I'm reading data in via an InputStream:
byte[] b = null;
try {
    b = new byte[in.available()];
    in.read(b);
} catch (IOException e) {
    e.printStackTrace();
}
It works perfectly when I run my app from the IDE (Eclipse).
But when I export my project and it's packed into a JAR, the read command doesn't read all the data. How can I fix it?
This problem mostly occurs when the InputStream is backed by a file (~10 KB).
Thanks!
Usually I prefer using a fixed-size buffer when reading from an input stream. As evilone pointed out, using available() as the buffer size might not be a good idea because, say, if you are reading a remote resource, you might not know the available bytes in advance. You can read the javadoc of InputStream for more insight.
Here is the code snippet I usually use for reading input stream:
byte[] buffer = new byte[BUFFER_SIZE];
int bytesRead = 0;
while ((bytesRead = in.read(buffer)) >= 0) {
    for (int i = 0; i < bytesRead; i++) {
        // Do whatever you need with the bytes here
    }
}
The version of read() I'm using here fills the given buffer as much as it can and returns the number of bytes actually read. This means your buffer may contain trailing garbage from a previous read, so it is very important to use only the bytes up to bytesRead.
Note the condition (bytesRead = in.read(buffer)) >= 0: there is nothing in the InputStream spec saying that read() cannot read 0 bytes. Depending on your situation, you may need to handle a 0-byte read as a special case. For local files I have never experienced this; however, when reading remote resources I have actually seen read() return 0 bytes constantly, putting the above code into an infinite loop. I solved that by counting the number of times I read 0 bytes and throwing an exception when the counter exceeded a threshold. You may not encounter this problem, but just keep it in mind :)
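A minimal sketch of that zero-read guard; the threshold of 1000 is an arbitrary assumption, not something from the InputStream spec:
int zeroReads = 0;
int bytesRead;
while ((bytesRead = in.read(buffer)) >= 0) {
    if (bytesRead == 0 && ++zeroReads > 1000) {
        throw new IOException("read() keeps returning 0 bytes");
    }
    if (bytesRead > 0) {
        zeroReads = 0;
        // process buffer[0..bytesRead)
    }
}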
I would also stay away from creating a new byte array for each read, for performance reasons.
read() will return -1 when the InputStream is depleted. There is also a version of read() which takes an array; this allows you to do chunked reads. It returns the number of bytes actually read, or -1 when at the end of the InputStream. Combine this with a dynamic buffer such as ByteArrayOutputStream to get the following:
InputStream in = ...
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int read;
byte[] input = new byte[4096];
while (-1 != (read = in.read(input))) {
    buffer.write(input, 0, read);
}
input = buffer.toByteArray();
This cuts down a lot on the number of methods you have to invoke and allows the ByteArrayOutputStream to grow its internal buffer faster.
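If you have a rough size estimate up front, presizing the ByteArrayOutputStream avoids most of that internal growing; a one-line sketch, where expectedSize is an assumed estimate of your own:
ByteArrayOutputStream buffer = new ByteArrayOutputStream(expectedSize);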
File file = new File("/path/to/file");
try {
    InputStream is = new FileInputStream(file);
    byte[] bytes = IOUtils.toByteArray(is);
    System.out.println("Byte array size: " + bytes.length);
} catch (IOException e) {
    e.printStackTrace();
}
Below is a snippet of code that downloads a file (*.png, *.jpeg, *.gif, ...) and writes it to a BufferedOutputStream wrapping the HttpServletResponse's output stream.
BufferedInputStream inputStream = bo.getBufferedInputStream(imageFile);
try {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    int bytesRead = 0;
    byte[] input = new byte[DefaultBufferSizeIndicator.getDefaultBufferSize()];
    while (-1 != (bytesRead = inputStream.read(input))) {
        buffer.write(input, 0, bytesRead);
    }
    input = buffer.toByteArray();
    response.reset();
    response.setBufferSize(DefaultBufferSizeIndicator.getDefaultBufferSize());
    response.setContentType(mimeType);
    // Here's the secret. Content-Length should equal the number of bytes read.
    response.setHeader("Content-Length", String.valueOf(buffer.size()));
    response.setHeader("Content-Disposition", "inline; filename=\"" + imageFile.getName() + "\"");
    BufferedOutputStream outputStream = new BufferedOutputStream(response.getOutputStream(), DefaultBufferSizeIndicator.getDefaultBufferSize());
    try {
        outputStream.write(input, 0, buffer.size());
    } finally {
        ImageBO.close(outputStream);
    }
} finally {
    ImageBO.close(inputStream);
}
Hope this helps.
I'm making a program in Java with sockets. I can send commands from the server to the client and vice versa. To read the commands I use a BufferedReader; to write them, a PrintWriter. But now I want to transfer a file through that socket (not simply create a second connection). First I write to the output stream how many bytes the file contains, for example 40000 bytes. So I write the number 40000 through the socket, but the other side of the connection reads 78.
So I was thinking: the BufferedReader reads more than just the line (when readLine() is called), and that way I lose some bytes of the file data, because they end up in the BufferedReader's buffer.
So the number 78 is a byte of the file I want to transmit.
Is this way of thinking right or not? If so, how do I solve this problem?
I hope I've explained it well.
Here is my code. My native language is Dutch, so some variable names may sound strange.
public void flushStreamToStream(InputStream is, OutputStream os, boolean closeIn, boolean closeOut) throws IOException {
    byte[] buffer = new byte[BUFFERSIZE];
    int bytesRead;
    if ((!closeOut) && closeIn) { // To Socket from File
        action = "Upload";
        os.write(is.available()); // Here I write 40000
        max = is.available();
        System.out.println("Bytes to send: " + max);
        while ((bytesRead = is.read(buffer)) != -1) {
            startTiming(); // Two lines to compute the speed
            os.write(buffer, 0, bytesRead);
            stopTiming(); // Speed computation
            process += bytesRead;
        }
        os.flush();
        is.close();
        return;
    }
    if ((!closeIn) && closeOut) { // To File from Socket
        action = "Download";
        int bytesToRead = -1;
        bytesToRead = is.read(); // Here it reads 78.
        System.out.println("Bytes to read: " + bytesToRead);
        max = bytesToRead;
        int nextBufferSize;
        while ((nextBufferSize = Math.min(BUFFERSIZE, bytesToRead)) > 0) {
            startTiming();
            bytesRead = is.read(buffer, 0, nextBufferSize);
            bytesToRead -= bytesRead;
            process += nextBufferSize;
            os.write(buffer, 0, bytesRead);
            stopTiming();
        }
        os.flush();
        os.close();
        return;
    }
    throw new IllegalArgumentException("The only two valid boolean combinations are: closeOut == false && closeIn == true, and closeOut == true && closeIn == false");
}
Here is the solution, thanks to James's suggestion. I think laginimaineb's answer was a piece of the solution as well.
Reading the commands:
DataInputStream in = new DataInputStream(is); // Originally a BufferedReader
// Read the request line
String str;
while ((str = in.readLine()) != null) { // Note: readLine() is deprecated on DataInputStream
    if (str.trim().equals("")) {
        continue;
    }
    handleSocketInput(str);
}
Now the flushStreamToStream:
public void flushStreamToStream(InputStream is, OutputStream os, boolean closeIn, boolean closeOut) throws IOException {
    byte[] buffer = new byte[BUFFERSIZE];
    int bytesRead;
    if ((!closeOut) && closeIn) { // To Socket from File
        action = "Upload";
        DataOutputStream dos = new DataOutputStream(os);
        dos.writeInt(is.available());
        max = is.available();
        System.out.println("Bytes to send: " + max);
        while ((bytesRead = is.read(buffer)) != -1) {
            startTiming();
            dos.write(buffer, 0, bytesRead);
            stopTiming();
            process += bytesRead;
        }
        os.flush();
        is.close();
        return;
    }
    if ((!closeIn) && closeOut) { // To File from Socket
        action = "Download";
        DataInputStream dis = new DataInputStream(is);
        int bytesToRead = dis.readInt();
        System.out.println("Bytes to read: " + bytesToRead);
        max = bytesToRead;
        int nextBufferSize;
        while ((nextBufferSize = Math.min(BUFFERSIZE, bytesToRead)) > 0) {
            startTiming();
            bytesRead = is.read(buffer, 0, nextBufferSize);
            bytesToRead -= bytesRead;
            process += nextBufferSize;
            os.write(buffer, 0, bytesRead);
            stopTiming();
        }
        os.flush();
        os.close();
        return;
    }
    throw new IllegalArgumentException("The only two valid boolean combinations are: closeOut == false && closeIn == true, and closeOut == true && closeIn == false");
}
Martijn.
I'm not sure I've followed your explanation.
However, yes - you have no real control over how much a BufferedReader will actually read. The point of such a reader is that it optimistically reads chunks of the underlying resource as needed to replenish its buffer. So when you first call readLine(), it will see that its internal buffer doesn't have enough to serve the request, and it will go off and read however many bytes it feels like into its buffer from the underlying source - generally much more than you asked for just then. Once the buffer has been populated, it returns your line from the buffered content.
Thus, once you wrap an input stream in a BufferedReader, you should be sure to read that stream only through that same buffered reader. If you don't, you'll end up losing data (some bytes will already have been consumed and will be sitting in the BufferedReader's buffer waiting to be served).
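A tiny self-contained demonstration of that buffering effect (the header/payload layout here is made up purely for illustration):
byte[] header = "HEADER\n".getBytes(StandardCharsets.US_ASCII);
byte[] payload = {1, 2, 3, 4}; // pretend this is file data
byte[] all = new byte[header.length + payload.length];
System.arraycopy(header, 0, all, 0, header.length);
System.arraycopy(payload, 0, all, header.length, payload.length);
InputStream raw = new ByteArrayInputStream(all);
BufferedReader reader = new BufferedReader(new InputStreamReader(raw, StandardCharsets.US_ASCII));
System.out.println(reader.readLine()); // prints HEADER
System.out.println(raw.read());        // likely -1: the payload is already in the reader's buffer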
DataInputStream is most likely what you want to use. Also, don't use the available() method as it is generally useless.
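For the length prefix itself, writing the file's real length with a DataOutputStream avoids both problems; a sketch, assuming (as the question suggests) that the upload source is a local file named by fileName:
File f = new File(fileName); // assumed local file
DataOutputStream dos = new DataOutputStream(os);
dos.writeLong(f.length()); // 8-byte length prefix instead of available()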
A BufferedReader assumes that it is the only one reading from the underlying input stream.
Its purpose is to minimize the number of reads from the underlying stream (which are expensive, as they can delegate quite deeply). To that end, it keeps a buffer which it fills by reading as many bytes as possible from the underlying stream in a single call.
So yes, your diagnosis is accurate.
Just a wild stab here - 40000 is 1001110001000000 in binary. Now, the first seven bits there are 1001110, which is 78. That means you're writing two bytes of information but reading only seven bits.