BufferedOutputStream writes zero bytes when merging files - Java

I am trying to merge n file pieces into a single file, but my function shows strange behavior. The function is called x times over n seconds. Say I have 100 files to merge: in the first second I pass 5 files and merge them; in the next second the count doubles to 10, where files 1-5 are the same as before and the rest are new. It usually works, but at some point the output is zero bytes, while at other times it has the right size.
Could you help me spot the mistake in my function below?
public void mergeFile(List<String> fileList, int x) {
    int count = 0;
    BufferedOutputStream out = null;
    try {
        out = new BufferedOutputStream(new FileOutputStream("Test.doc"));
        for (String file : fileList) {
            InputStream in = new BufferedInputStream(new FileInputStream(file));
            byte[] buff = new byte[1024];
            in.read(buff);
            out.write(buff);
            in.close();
            count++;
            if (count == x) {
                break;
            }
        }
        out.flush();
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
*sorry for my English

in.read(buff);
Check the Javadoc. That method isn't guaranteed to fill the buffer. It returns a value which tells you how many bytes it read. You're supposed to use that, and in this situation you are supposed to use it when deciding how many bytes, if any, to write.
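For illustration, a sketch of an inner copy loop that uses that return value (keep reading until read() returns -1 and write only the bytes actually read):

byte[] buff = new byte[1024];
int n;
while ((n = in.read(buff)) != -1) {
    out.write(buff, 0, n); // write only the n bytes that were actually read
}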

You do not read the full file; you read only up to 1024 bytes from each file. You need to loop the read as long as it returns data (or use something like Files.copy()).
BTW: you don't need a BufferedOutputStream if you copy with large buffers.
public void mergeFile(List<String> fileList, int x) throws IOException {
    try (OutputStream out = new FileOutputStream("Test.doc")) {
        int count = 0;
        for (String file : fileList) {
            Files.copy(new File(file).toPath(), out);
            count++;
            if (count == x) {
                break;
            }
        }
    }
}
You also do not need to flush() if you close(). I am using try-with-resources here, so I don't need to close it explicitly. It is best to propagate the exceptions.
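For completeness, a hypothetical call site (the class name FileMerger is just a placeholder) would then handle or further propagate the exception itself:

try {
    new FileMerger().mergeFile(fileList, 5);
} catch (IOException e) {
    // decide here how to report or recover from the failure
    e.printStackTrace();
}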

Related

IO Image reading and writing: Is writing array of bytes different from writing byte at a time using write(int b) method?

I am new to Java IO and I tried to simply copy a photo file. I used two ways to achieve this: the first works nicely, but the second doesn't.
This Code works fine.
try (BufferedInputStream input = new BufferedInputStream(new FileInputStream("photoOriginal.jpg"));
     BufferedOutputStream output = new BufferedOutputStream(new FileOutputStream("photoCopy.jpg"))) {
    int n;
    byte[] buf = new byte[4092];
    while ((n = input.read(buf)) != -1) {
        output.write(buf, 0, n);
        output.flush();
    }
} catch (IOException e) {
    System.out.println("Error: " + e.getMessage());
    e.printStackTrace();
}
But the second doesn't work: after the program finishes, the copied file has exactly the same size as the original, but when I try to open it, it shows a "format not supported" error.
try (BufferedInputStream input = new BufferedInputStream(new FileInputStream("photoOriginal.jpg"));
     BufferedOutputStream output = new BufferedOutputStream(new FileOutputStream("photoCopy.jpg"))) {
    int byteRead = input.read();
    while (byteRead != -1) {
        byteRead = input.read();
        output.write(byteRead);
        output.flush();
    }
} catch (IOException e) {
    System.out.println("Error: " + e.getMessage());
    e.printStackTrace();
}
I don't understand where the problem is; it seems that the two samples are doing the same thing.
Is reading into and writing from a byte array different from reading and writing a single byte at a time?
Doesn't writing an int to a stream with the write(int b) method write only the lowest 8 bits (and the reverse when reading), as the documentation says?
write
public abstract void write(int b) throws IOException
Writes the specified byte to this output stream. The general contract for write is that one byte is written to the output stream. The byte to be written is the eight low-order bits of the argument b. The 24 high-order bits of b are ignored.
I hope someone can help.
You're not writing out the first byte - you call input.read(), check that it's not -1, but then call input.read() again:
// Broken code
int byteRead = input.read();
while (byteRead != -1) {
    byteRead = input.read();
    output.write(byteRead);
    output.flush();
}
If you just move the next input.read() call to the end of the loop, it will work:
// Working code with duplication
int byteRead = input.read();
while (byteRead != -1) {
    output.write(byteRead);
    output.flush();
    byteRead = input.read();
}
Or you could combine the "read and test" to avoid duplication:
// Working code without duplication
int byteRead;
while ((byteRead = input.read()) != -1) {
    output.write(byteRead);
    output.flush();
}
However, this is still a very inefficient way of copying a stream. Copying a chunk at a time, as per your first code, is much more efficient (or using the built-in transferTo method if you're using Java 9 or higher, as rostamn79 notes).
Baeldung.com provides information on the InputStream.transferTo() method, which does not incur an additional copy into the Java heap:
https://www.baeldung.com/java-inputstream-to-outputstream
Example code
@Test
public void givenUsingJavaNine_whenCopyingInputStreamToOutputStream_thenCorrect() throws IOException {
    String initialString = "Hello World!";
    try (InputStream inputStream = new ByteArrayInputStream(initialString.getBytes());
         ByteArrayOutputStream targetStream = new ByteArrayOutputStream()) {
        inputStream.transferTo(targetStream);
        assertEquals(initialString, new String(targetStream.toByteArray()));
    }
}
Note how transferTo() is called here: on the input stream, with the target stream as its argument.
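Applied to the file copy from the question, a sketch using transferTo (requires Java 9 or newer; file names taken from the question) could look like this:

try (InputStream input = new FileInputStream("photoOriginal.jpg");
     OutputStream output = new FileOutputStream("photoCopy.jpg")) {
    input.transferTo(output); // copies the remaining bytes of input to output in chunks
}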

Why am I getting an OutOfMemory exception?

I am getting an OutOfMemory exception. Why? I am using this code for logging. Is this approach correct?
Exceptions and closing of streams are handled in parent methods.
private static void writeToFile(File file, FileWriter out, String message) throws IOException {
    if (file.exists() && file.isFile()) {
        if ((file.length() + message.getBytes().length) <= FILE_MAX_SIZE_B) {
            out.write(message);
        } else {
            int cutLenght = (int) (file.length() + message.getBytes().length - FILE_MAX_SIZE_B);
            FileInputStream fileInputStream = new FileInputStream(file);
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(fileInputStream));
            char[] buf = new char[1024];
            int numRead = 0;
            StringBuffer text = new StringBuffer(1000);
            while ((numRead = bufferedReader.read(buf)) != -1) {
                text.append(buf, 0, numRead);
            }
            String result = new String(text).substring(cutLenght);
            result += message;
            FileWriter fileWriter = new FileWriter(file, appendToFile);
            writeToFile(file, fileWriter, result);
            bufferedReader.close();
        }
    }
}
EDIT:
I am using this method for writing my logs to a file. For example, in one second it can be called 10 times. I am getting the error on these lines:
while ((numRead = bufferedReader.read(buf)) != -1) {
    text.append(buf, 0, numRead);
}
My guess is that you are getting the OutOfMemoryError because you are reading the entire contents of the log file back into memory once it has gotten too close to its maximum size.
You could instead read and write it in smaller chunks, but that could be tricky since you have to avoid overwriting something you haven't already read.
Overall, this technique seems like a very inefficient method of maintaining the log data. Some alternative approaches off the top of my head:
(1) maintain a set of n log files, each with maximum size FILE_MAX_SIZE_B/n. When the first log fills up, open the next one for writing, and so on; when the last one fills up, go back to the first one. In this way you are discarding some of the oldest log data each time you switch files, but not all of it, and still maintaining your overall size limit (see the sketch after this list).
(2) rotate the data within a single file. After each write, add a marker that indicates this is the end of the log stream. When the file has reached its maximum size, just start again at the beginning, overwriting the data that is there. The marker will tell you where the latest message is.
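A minimal sketch of approach (1), assuming a hypothetical LOG_FILE_COUNT constant and files named log.0 … log.(n-1); FILE_MAX_SIZE_B is the limit from the question:

private int current = 0; // index of the log file currently being written

private void writeLog(String message) throws IOException {
    File f = new File("log." + current);
    // If this file would exceed its share of the budget, rotate to the next file in the ring.
    if (f.length() + message.getBytes().length > FILE_MAX_SIZE_B / LOG_FILE_COUNT) {
        current = (current + 1) % LOG_FILE_COUNT;
        f = new File("log." + current);
        f.delete(); // discard that file's old (oldest) contents before reusing it
    }
    try (FileWriter out = new FileWriter(f, true)) { // append mode
        out.write(message);
    }
}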
Try something like this:
void appendToFile(File f, CharSequence message, Charset cs, long maximumSize) throws IOException {
    long available = maximumSize - f.length();
    if (available > 0) {
        FileOutputStream fos = new FileOutputStream(f, true);
        try {
            CharBuffer chars = CharBuffer.wrap(message);
            ByteBuffer bytes = ByteBuffer.allocate(8 * 1024); // Re-used when encoding the string
            CharsetEncoder enc = cs.newEncoder();
            CoderResult res;
            do {
                res = enc.encode(chars, bytes, true);
                bytes.flip();
                long len = Math.min(available, bytes.remaining());
                available -= len;
                fos.write(bytes.array(), bytes.position(), (int) len);
                bytes.clear();
            } while (res == CoderResult.OVERFLOW && available > 0);
        } finally {
            fos.close();
        }
    }
}
Testable with this:
File f = new File(getCacheDir(), "tmp.txt");
f.delete();
// Or whatever charset you want.
Charset cs = Charset.forName("UTF-8");
int maxlen = 2 * 1024; // For this test, 2 kB
try {
    for (int i = 0; i < maxlen / 20; i++) {
        // Write 30 characters maxlen/20 times == guaranteed overflow
        appendToFile(f, "123456789012345678901234567890", cs, maxlen);
        System.out.println("Length=" + f.length());
    }
} catch (Throwable t) {
    t.printStackTrace();
}
f.delete();
Well, you're getting OOM because you're trying to load a huge file into memory.
Did you try opening it with the append option instead?
You get the OOME because you load the whole file and then take only part of the string. Instead, do a skip() on your input stream and then read.
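A sketch of that skip-and-read idea, writing the kept tail to a temporary file first (you cannot safely overwrite the file you are still reading); cutLength corresponds to the cut length computed in the question, and the names are illustrative:

File tmp = new File(file.getParentFile(), file.getName() + ".tmp");
try (InputStream in = new BufferedInputStream(new FileInputStream(file));
     OutputStream out = new BufferedOutputStream(new FileOutputStream(tmp))) {
    long skipped = 0;
    while (skipped < cutLength) {          // skip() may skip fewer bytes than requested
        long s = in.skip(cutLength - skipped);
        if (s <= 0) {
            break;
        }
        skipped += s;
    }
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {     // copy the remaining tail of the old log
        out.write(buf, 0, n);
    }
    out.write(message.getBytes());         // then append the new message
}
if (!tmp.renameTo(file)) {
    // handle the failed rename, e.g. fall back to copying tmp over the original
}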

Java: Issue with available() method of BufferedInputStream

I'm dealing with the following code that is used to split a large file into a set of smaller files:
try {
    FileInputStream input = new FileInputStream(this.fileToSplit);
    BufferedInputStream iBuff = new BufferedInputStream(input);
    int i = 0;
    FileOutputStream output = new FileOutputStream(fileArr[i]);
    BufferedOutputStream oBuff = new BufferedOutputStream(output);
    int buffSize = 8192;
    byte[] buffer = new byte[buffSize];
    while (true) {
        if (iBuff.available() < buffSize) {
            byte[] newBuff = new byte[iBuff.available()];
            iBuff.read(newBuff);
            oBuff.write(newBuff);
            oBuff.flush();
            oBuff.close();
            break;
        }
        int r = iBuff.read(buffer);
        if (fileArr[i].length() >= this.partSize) {
            oBuff.flush();
            oBuff.close();
            ++i;
            output = new FileOutputStream(fileArr[i]);
            oBuff = new BufferedOutputStream(output);
        }
        oBuff.write(buffer);
    }
} catch (Exception e) {
    e.printStackTrace();
}
This is the weird behavior I'm seeing: when I run this code using a 3 GB file, the initial iBuff.available() call returns a value of approximately 2,100,000,000 and the code works fine. When I run this code on a 12 GB file, the initial iBuff.available() call only returns a value of 200,000,000 (which is smaller than the split file size of 500,000,000 and causes the processing to go awry).
I'm thinking this discrepancy in behavior has something to do with the fact that this is on 32-bit Windows. I'm going to run a couple more tests on a 4.5 GB file and a 3.5 GB file. If the 3.5 GB file works and the 4.5 GB one doesn't, that will further confirm the theory that it's a 32-bit vs. 64-bit issue, since 4 GB would then be the threshold.
Well, if you read the Javadoc, it quite clearly states:
Returns the number of bytes that can be read from this input stream without blocking (emphasis added by me)
So it's quite clear that what you want is not what this method offers. Depending on the underlying InputStream you may get problems much earlier (e.g. a stream over the network with a server that doesn't return the file size - you'd have to read and buffer the complete file just to return the "correct" available() count, which would take a lot of time - what if you only want to read a header?).
So the correct way to handle this is to change your parsing method to be able to handle the file in pieces. Personally, I don't see much reason at all to even use available() here - just calling read() and stopping as soon as read() returns -1 should work fine. It can be made more complicated if you want to ensure that every file really contains blockSize bytes - just add an internal loop if that scenario is important.
int blockSize = XXX;
byte[] buffer = new byte[blockSize];
int i = 0;
int read = in.read(buffer);
while (read != -1) {
    out[i++].write(buffer, 0, read);
    read = in.read(buffer);
}
There are few correct uses of available(), and this isn't one of them. You don't need all that junk. Memorize this:
int count;
byte[] buffer = new byte[8192]; // or more
while ((count = in.read(buffer)) > 0)
    out.write(buffer, 0, count);
That's the canonical way to copy a stream in Java.
You should not use the InputStream.available() function at all. It is only needed in very special circumstances.
You should also not create byte arrays that are larger than 1 MB. It's a waste of memory. The commonly accepted way is to read a small block (4 kB up to 1 MB) from the source file and then store only as many bytes as you have read in the destination file. Do that until you have reached the end of the source file.
available() isn't a measure of how much is still to be read, but rather a measure of how much is guaranteed to be readable before the stream might hit EOF or block waiting for input.
Also, put the close() calls in finally blocks:
BufferedInputStream iBuff = new BufferedInputStream(input);
int i = 0;
FileOutputStream output;
BufferedOutputStream oBuff = null;
try {
    int buffSize = 8192;
    int offset = 0;
    byte[] buffer = new byte[buffSize];
    while (true) {
        int len = iBuff.read(buffer, offset, buffSize - offset);
        if (len == -1) { // EOF: write out the last partial chunk, if any
            if (offset > 0) {
                try {
                    output = new FileOutputStream(fileArr[i]);
                    oBuff = new BufferedOutputStream(output);
                    oBuff.write(buffer, 0, offset);
                } finally {
                    if (oBuff != null) {
                        oBuff.close();
                    }
                }
            }
            break;
        }
        offset += len;
        if (offset == buffSize) { // buffer full: write it out to the current part file
            try {
                output = new FileOutputStream(fileArr[i]);
                oBuff = new BufferedOutputStream(output);
                oBuff.write(buffer);
            } finally {
                if (oBuff != null) {
                    oBuff.close();
                }
            }
            ++i;
            offset = 0;
        }
    }
} finally {
    iBuff.close();
}
Here is some code that splits a file. If performance is critical to you, you can experiment with the buffer size.
package so6164853;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Formatter;

public class FileSplitter {

    private static String printf(String fmt, Object... args) {
        Formatter formatter = new Formatter();
        formatter.format(fmt, args);
        return formatter.out().toString();
    }

    /**
     * @param outputPattern see {@link Formatter}
     */
    public static void splitFile(String inputFilename, long fragmentSize, String outputPattern) throws IOException {
        InputStream input = new FileInputStream(inputFilename);
        try {
            byte[] buffer = new byte[65536];
            int outputFileNo = 0;
            OutputStream output = null;
            long writtenToOutput = 0;
            try {
                while (true) {
                    int bytesToRead = buffer.length;
                    if (bytesToRead > fragmentSize - writtenToOutput) {
                        bytesToRead = (int) (fragmentSize - writtenToOutput);
                    }
                    int bytesRead = input.read(buffer, 0, bytesToRead);
                    if (bytesRead != -1) {
                        if (output == null) {
                            String outputName = printf(outputPattern, outputFileNo);
                            outputFileNo++;
                            output = new FileOutputStream(outputName);
                            writtenToOutput = 0;
                        }
                        output.write(buffer, 0, bytesRead);
                        writtenToOutput += bytesRead;
                    }
                    if (output != null && (bytesRead == -1 || writtenToOutput == fragmentSize)) {
                        output.close();
                        output = null;
                    }
                    if (bytesRead == -1) {
                        break;
                    }
                }
            } finally {
                if (output != null) {
                    output.close();
                }
            }
        } finally {
            input.close();
        }
    }

    public static void main(String[] args) throws IOException {
        splitFile("d:/backup.zip", 1440 << 10, "d:/backup.zip.part%04d");
    }
}
Some remarks:
Only those bytes that have actually been read from the input file are written to one of the output files.
I left out the BufferedInputStream and BufferedOutputStream since their buffer size is only 8192 bytes, which is less than the buffer I use in the code.
As soon as I open a file, I make sure that it will be closed at the end, no matter what happens. (The finally blocks.)
The code contains only one call to input.read and only one call to output.write. This makes it easier to check for correctness.
The code for splitting a file does not catch the IOException, since it doesn't know what to do in such a case. It is just passed to the caller; maybe the caller knows how to handle it.
Both @ratchet and @Voo are correct.
As for what is happening:
int max value is 2,147,483,647 (http://download.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html).
14 gigabytes is 15,032,385,536 bytes, which clearly doesn't fit in an int.
Note that according to the API Javadoc (http://download.oracle.com/javase/6/docs/api/java/io/BufferedInputStream.html#available%28%29) and as stated by @Voo, this doesn't break the method contract at all (it just isn't what you are looking for).
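A quick illustration of that range problem (this is not what available() literally does internally, just a demonstration that a length above 2 GiB cannot survive in an int):

long fourteenGiB = 14L * 1024 * 1024 * 1024; // 15,032,385,536 bytes
System.out.println(fourteenGiB);             // 15032385536
System.out.println((int) fourteenGiB);       // -2147483648: only the low 32 bits survive the cast
System.out.println(Integer.MAX_VALUE);       // 2147483647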

read a file byte by byte then perform some operation every n bytes

I would like to know how I can read a file byte by byte and then perform some operation every n bytes.
for example:
Say I have a file of size = 50 bytes; I want to divide it into blocks of n bytes each. Then each block is sent to a function for some operations to be done on those bytes. The blocks are to be created during the read process and sent to the function as soon as a block reaches n bytes, so that I don't use much memory for storing all the blocks.
I want the output of the function to be written/appended to a new file.
This is what I have so far for reading, though I don't know if it is right:
fc = new JFileChooser();
File f = fc.getSelectedFile();
FileInputStream in = new FileInputStream(f);
byte[] b = new byte[16];
in.read(b);
I haven't done anything yet for the write process.
You're on the right lines. Consider wrapping your FileInputStream with a BufferedInputStream, which improves I/O efficiency by reading the file in chunks.
The next step is to check the number of bytes read (returned by your call to read) and to hand off the array to the processing function. Obviously you'll need to pass the number of bytes read to this method too, in case the array was only partially populated.
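A sketch of that read-check-hand-off loop (process() is a placeholder name for whatever your per-block operation is; the block size is 16 here to match the question):

try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
    byte[] block = new byte[16];
    int bytesRead;
    while ((bytesRead = in.read(block)) != -1) {
        process(block, bytesRead); // the last block may contain fewer than 16 valid bytes
    }
}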
So far your code looks OK. For reading binary files (as opposed to text files) you should indeed use FileInputStream (for reading text files, you should use a Reader, such as FileReader).
Note that you should check the return value from in.read(b);, because it might read fewer than 16 bytes if there are fewer than 16 bytes left at the end of the file.
Of course you should add a loop to the program that keeps reading blocks of bytes until you reach the end of the file.
To write data to a binary file, use FileOutputStream. That class has a constructor that you can pass a flag to indicate that you want to append to an existing file:
FileOutputStream out = new FileOutputStream("output.bin", true);
Also, don't forget to call close() on the FileInputStream and FileOutputStream when you are done.
See the Java API documentation, especially the classes in the java.io package.
I believe that this will work:
final int blockSize = // some calculation
byte[] block = new byte[blockSize];
InputStream is = new FileInputStream(f);
try {
    int ret = -1;
    do {
        int bytesRead = 0;
        while (bytesRead < blockSize) {
            ret = is.read(block, bytesRead, blockSize - bytesRead);
            if (ret < 0)
                break; // no more data
            bytesRead += ret;
        }
        myFunction(block, bytesRead);
    } while (0 <= ret);
} finally {
    is.close();
}
This code will call myFunction with blockSize bytes for all but possibly the last invocation.
It's a start.
You should check what read() returns. It can read fewer bytes than the size of the array, and also indicate that the end of the file is reached.
Obviously, you need to read() in a loop...
It might be a good idea to reuse the array, but that requires that the part that reads the array copies what it needs, rather than just keeping a reference to the array.
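For example (a sketch using the b array from the question; queue is a hypothetical consumer), the consuming code would copy what it needs before the array is reused:

int bytesRead = in.read(b);
if (bytesRead > 0) {
    byte[] copy = java.util.Arrays.copyOf(b, bytesRead); // safe to keep; 'b' will be overwritten by the next read
    queue.add(copy);
}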
I think this is what you might need:
void readFile(String path, int n) {
    try {
        File f = new File(path);
        FileInputStream fis = new FileInputStream(f);
        int ret;
        byte[] array = new byte[n];
        // Only hand the block to doSomething() when read() actually returned data.
        while ((ret = fis.read(array)) > -1) {
            doSomething(array, ret);
        }
        fis.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Traditional IO vs memory-mapped

I'm trying to illustrate to students the difference in performance between traditional IO and memory-mapped files in Java.
I found an example somewhere on the internet, but not everything is clear to me; I don't even think all the steps are necessary. I have read a lot about it here and there, but I'm not convinced that either of them is implemented correctly.
The code I am trying to understand is:
public class FileCopy {

    public static void main(String args[]) {
        if (args.length < 1) {
            System.out.println(" Wrong usage!");
            System.out.println(" Correct usage is : java FileCopy <large file with full path>");
            System.exit(0);
        }

        String inFileName = args[0];
        File inFile = new File(inFileName);

        if (inFile.exists() != true) {
            System.out.println(inFileName + " does not exist!");
            System.exit(0);
        }

        try {
            new FileCopy().memoryMappedCopy(inFileName, inFileName + ".new");
            new FileCopy().customBufferedCopy(inFileName, inFileName + ".new1");
        } catch (FileNotFoundException fne) {
            fne.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void memoryMappedCopy(String fromFile, String toFile) throws Exception {
        long timeIn = new Date().getTime();
        // read input file
        RandomAccessFile rafIn = new RandomAccessFile(fromFile, "rw");
        FileChannel fcIn = rafIn.getChannel();
        ByteBuffer byteBuffIn = fcIn.map(FileChannel.MapMode.READ_WRITE, 0, (int) fcIn.size());
        fcIn.read(byteBuffIn);
        byteBuffIn.flip();

        RandomAccessFile rafOut = new RandomAccessFile(toFile, "rw");
        FileChannel fcOut = rafOut.getChannel();
        ByteBuffer writeMap = fcOut.map(FileChannel.MapMode.READ_WRITE, 0, (int) fcIn.size());
        writeMap.put(byteBuffIn);

        long timeOut = new Date().getTime();
        System.out.println("Memory mapped copy Time for a file of size :" + (int) fcIn.size() + " is " + (timeOut - timeIn));
        fcOut.close();
        fcIn.close();
    }

    static final int CHUNK_SIZE = 100000;
    static final char[] inChars = new char[CHUNK_SIZE];

    public static void customBufferedCopy(String fromFile, String toFile) throws IOException {
        long timeIn = new Date().getTime();

        Reader in = new FileReader(fromFile);
        Writer out = new FileWriter(toFile);
        while (true) {
            synchronized (inChars) {
                int amountRead = in.read(inChars);
                if (amountRead == -1) {
                    break;
                }
                out.write(inChars, 0, amountRead);
            }
        }

        long timeOut = new Date().getTime();
        System.out.println("Custom buffered copy Time for a file of size :" + (int) new File(fromFile).length() + " is " + (timeOut - timeIn));
        in.close();
        out.close();
    }
}
}
When exactly is it necessary to use RandomAccessFile? Here it is used for reading and writing in memoryMappedCopy; is it actually necessary just to copy a file at all, or is it part of memory mapping?
In customBufferedCopy, why is synchronized used here?
I also found a different example that -should- test the performance between the two:
public class MappedIO {
    private static int numOfInts = 4000000;
    private static int numOfUbuffInts = 200000;

    private abstract static class Tester {
        private String name;

        public Tester(String name) { this.name = name; }

        public long runTest() {
            System.out.print(name + ": ");
            try {
                long startTime = System.currentTimeMillis();
                test();
                long endTime = System.currentTimeMillis();
                return (endTime - startTime);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        public abstract void test() throws IOException;
    }

    private static Tester[] tests = {
        new Tester("Stream Write") {
            public void test() throws IOException {
                DataOutputStream dos = new DataOutputStream(
                        new BufferedOutputStream(
                                new FileOutputStream(new File("temp.tmp"))));
                for (int i = 0; i < numOfInts; i++)
                    dos.writeInt(i);
                dos.close();
            }
        },
        new Tester("Mapped Write") {
            public void test() throws IOException {
                FileChannel fc =
                        new RandomAccessFile("temp.tmp", "rw")
                                .getChannel();
                IntBuffer ib = fc.map(
                        FileChannel.MapMode.READ_WRITE, 0, fc.size())
                        .asIntBuffer();
                for (int i = 0; i < numOfInts; i++)
                    ib.put(i);
                fc.close();
            }
        },
        new Tester("Stream Read") {
            public void test() throws IOException {
                DataInputStream dis = new DataInputStream(
                        new BufferedInputStream(
                                new FileInputStream("temp.tmp")));
                for (int i = 0; i < numOfInts; i++)
                    dis.readInt();
                dis.close();
            }
        },
        new Tester("Mapped Read") {
            public void test() throws IOException {
                FileChannel fc = new FileInputStream(
                        new File("temp.tmp")).getChannel();
                IntBuffer ib = fc.map(
                        FileChannel.MapMode.READ_ONLY, 0, fc.size())
                        .asIntBuffer();
                while (ib.hasRemaining())
                    ib.get();
                fc.close();
            }
        },
        new Tester("Stream Read/Write") {
            public void test() throws IOException {
                RandomAccessFile raf = new RandomAccessFile(
                        new File("temp.tmp"), "rw");
                raf.writeInt(1);
                for (int i = 0; i < numOfUbuffInts; i++) {
                    raf.seek(raf.length() - 4);
                    raf.writeInt(raf.readInt());
                }
                raf.close();
            }
        },
        new Tester("Mapped Read/Write") {
            public void test() throws IOException {
                FileChannel fc = new RandomAccessFile(
                        new File("temp.tmp"), "rw").getChannel();
                IntBuffer ib = fc.map(
                        FileChannel.MapMode.READ_WRITE, 0, fc.size())
                        .asIntBuffer();
                ib.put(0);
                for (int i = 1; i < numOfUbuffInts; i++)
                    ib.put(ib.get(i - 1));
                fc.close();
            }
        }
    };

    public static void main(String[] args) {
        for (int i = 0; i < tests.length; i++)
            System.out.println(tests[i].runTest());
    }
}
I more or less see what's going on; my output looks like this:
Stream Write: 653
Mapped Write: 51
Stream Read: 651
Mapped Read: 40
Stream Read/Write: 14481
Mapped Read/Write: 6
What is making the Stream Read/Write so unbelievably slow? And as a read/write test, it looks a bit pointless to me to read the same integer over and over (if I understand correctly what's going on in Stream Read/Write). Wouldn't it be better to read ints from the previously written file and just read and write ints in the same place? Is there a better way to illustrate it?
I've been racking my brain over a lot of these things for a while and I just can't get the whole picture.
What I see with the one benchmark "Stream Read/Write" is:
It does not really do stream I/O but seeks to a specific location in the file. This is non-buffered, so all the I/Os must be completed from disk (the other streams use buffered I/O, so they really read/write in large blocks and the ints are then read from or written to the in-memory buffer).
It seeks to the end - 4 bytes, so it reads the last int and then writes a new int. The file continues to grow in length by one int every iteration. This really doesn't add much to the time cost, though (but it does show that the author of that benchmark either misunderstood something or was not careful).
This explains the very high cost of that particular benchmark.
You asked:
Wouldn't it be better to read ints from the previously written file and just read and write ints in the same place?
This is what I think the author was trying to do with the last two benchmarks, but that's not what they got. With RandomAccessFile, to read and write the same place in the file you would need to put a seek before both the read and the write:
raf.seek(raf.length() - 4);
int val = raf.readInt();
raf.seek(raf.length() - 4);
raf.writeInt(val);
This does demonstrate one advantage of memory mapped I/O since you can just use the same memory address to access the same bits of the file instead of having to do an additional seek before every call.
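For comparison, a sketch of the same "read the last int, write it back" step with a mapped buffer, using absolute get/put so no repositioning is needed (assumes temp.tmp already contains at least one int):

FileChannel fc = new RandomAccessFile("temp.tmp", "rw").getChannel();
IntBuffer ib = fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size()).asIntBuffer();
int last = ib.limit() - 1;  // index of the last int in the mapped region
int val = ib.get(last);     // absolute get: no seek, no position change
ib.put(last, val);          // absolute put at the same index
fc.close();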
By the way, your first benchmark example class may have issues too since CHUNK_SIZE is not an even multiple of the file system block size. Often it's good to use multiples of 1024 and 8192 has been shown as a good sweet spot for most applications (and the reason the Java's BufferedInputStream and BufferedOutputStream use that value for the default buffer sizes). The OS will need to read an extra block(s) to satisfy read requests that are not on block boundaries. Subsequent reads (of a stream) will reread the same block, possibly some full blocks, and then an extra again. Memory mapped I/O always physically reads and writes in blocks as the actual I/Os are handled by the OS memory manager which would use its page size. Page size is always optimized to map well to file blocks.
In that example, the memory-mapped test does read everything into a memory buffer and then write it all back out. These two tests are really not well written to compare those two cases. memoryMappedCopy should read and write in the same chunk size as customBufferedCopy.
EDIT: There may even be more things wrong with these test classes. Because of your comment to the other answer I looked more carefully at the first class again.
Method customBufferedCopy is static and uses a static buffer. For this kind of test that buffer should be defined within the method. Then it would not need to use synchronized (though it doesn't need it in this context and for these tests anyway). This static method is called as a normal method, which is bad programming practice (i.e. use FileCopy.customBufferedCopy(...) instead of new FileCopy().customBufferedCopy(...)).
If you actually did run this test from multiple threads the use of that buffer would be contentious and the benchmark would not just be about file I/O so it would not be fair to compare the results of the two test methods.
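A sketch of customBufferedCopy with the buffer made local to the method, so no synchronized is needed and concurrent callers no longer share state:

public static void customBufferedCopy(String fromFile, String toFile) throws IOException {
    long timeIn = new Date().getTime();
    char[] chunk = new char[CHUNK_SIZE]; // local buffer: nothing is shared between threads
    try (Reader in = new FileReader(fromFile);
         Writer out = new FileWriter(toFile)) {
        int amountRead;
        while ((amountRead = in.read(chunk)) != -1) {
            out.write(chunk, 0, amountRead);
        }
    }
    long timeOut = new Date().getTime();
    System.out.println("Custom buffered copy took " + (timeOut - timeIn) + " ms for " + new File(fromFile).length() + " bytes");
}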
1) These sound like questions your students should be asking - not the other way around?
2) The reason the two methods are used is to demonstrate the different ways you can copy a file. I would hazard a guess that the first method (RandomAccessFile) creates a version of the file in RAM and then copies it to a new version on disk, and that the second method (customBufferedCopy) reads directly from the drive.
3) I'm not sure, but I think synchronized is used to ensure that multiple instances of the same class do not write at the same time.
4) As for the last question, I've got to go - so I hope someone else can help you with that.
Seriously though, these sound like just the questions a tutor should be teaching to their students. If you don't have the ability to research simple things like this yourself, what kind of example are you setting your students? </rant>
Thanks for looking into this. I will look at the first examples later; for now, my professor asked me to rewrite the two tests (stream and mapped read/write).
They generate random ints, first read at the index (the generated int) and check whether the int at that index is equal to the generated int; if it's not equal, the generated int is written at its index. He thought this could result in a better test, making more use of RandomAccessFile. Does this make sense?
However, I have some issues. First of all, I don't know how to use a buffer with the stream read/write when I'm using RandomAccessFile; I found a lot about byte[] buffers, but I'm not sure how to use one correctly.
My code so far for this test:
new Tester("Stream Read/Write") {
public void test() throws IOException {
RandomAccessFile raf = new RandomAccessFile(new File("temp.tmp"), "rw");
raf.seek(numOfUbuffInts*4);
raf.writeInt(numOfUbuffInts);
for (int i = 0; i < numOfUbuffInts; i++) {
int getal = (int) (1 + Math.random() * numOfUbuffInts);
raf.seek(getal*4);
if (raf.readInt() != getal) {
raf.seek(getal*4);
raf.writeInt(getal);
}
}
raf.close();
}
},
So this is still unbuffered.
I did the second test as follows:
new Tester("Mapped Read/Write") {
public void test() throws IOException {
RandomAccessFile raf = new RandomAccessFile(new File("temp.tmp"), "rw");
raf.seek(numOfUbuffInts*4);
raf.writeInt(numOfUbuffInts);
FileChannel fc = raf.getChannel();
IntBuffer ib = fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size()).asIntBuffer();
for(int i = 1; i < numOfUbuffInts; i++) {
int getal = (int) (1 + Math.random() * numOfUbuffInts);
if (ib.get(getal) != getal) {
ib.put(getal, getal);
}
}
fc.close();
}
}
For small values of numOfUbuffInts it seems to go fast; for large values (20,000,000+) it takes ages.
I just tried some things, but I'm not sure if I'm on the right track.
