Could someone tell me why the 1st run is wrong? (The return code is 0, but the file written is only half the size of the original one.)
Thanks in advance!
public class FileCopyFisFos {

    public static void main(String[] args) throws IOException {
        FileInputStream fis = new FileInputStream("d:/Test1/OrigFile.MP4");
        FileOutputStream fos = new FileOutputStream("d:/Test2/DestFile.mp4");

        // 1. run
        // while (fis.read() != -1) {
        //     int len = fis.read();
        //     fos.write(len);
        // }

        // 2. run
        // int len;
        // while ((len = fis.read()) != -1) {
        //     fos.write(len);
        // }

        fis.close();
        fos.close();
    }
}
FileInputStream's read() method follows this logic:
Reads a byte of data from this input stream. This method blocks if no input is yet available.
So assigning its return value to a variable, as in:
while ((len = fis.read()) != -1)
prevents the byte just read from being lost, since every read() call stores its result in your len variable.
Your first run, instead, skips one of every two bytes from the stream: the read() executed in the while condition is never assigned to a variable, so the stream advances without half of the bytes ever being saved (assigned to len):
while (fis.read() != -1) { // reads a byte of data (but not saved)
    int len = fis.read();  // next byte of data saved
    fos.write(len);        // possible -1 written here
}
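Note the last comment: if the file has an odd number of bytes, the read() in the condition consumes the final byte, the read() in the body returns -1, and fos.write(-1) silently writes the low eight bits of -1, a 0xFF byte, into the destination. The corrected loop (your 2nd run) reads each byte exactly once and tests it before writing:
int len;
while ((len = fis.read()) != -1) { // len holds the byte just read, or -1 at end of stream
    fos.write(len);                // write(int) stores only the low 8 bits of len
}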
@aran and others have already pointed out the solution to your problem.
However, there are more sides to this, so I extended your example:
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class FileCopyFisFos {

    public static void main(final String[] args) throws IOException {
        final File src = new File("d:/Test1/OrigFile.MP4");
        final File sink = new File("d:/Test2/DestFile.mp4");
        {
            final long startMS = System.currentTimeMillis();
            final long bytesCopied = copyFileSimple(src, sink);
            System.out.println("Simple copy transferred " + bytesCopied + " bytes in " + (System.currentTimeMillis() - startMS) + "ms");
        }
        {
            final long startMS = System.currentTimeMillis();
            final long bytesCopied = copyFileSimpleFaster(src, sink);
            System.out.println("Simple+Fast copy transferred " + bytesCopied + " bytes in " + (System.currentTimeMillis() - startMS) + "ms");
        }
        {
            final long startMS = System.currentTimeMillis();
            final long bytesCopied = copyFileFast(src, sink);
            System.out.println("Fast copy transferred " + bytesCopied + " bytes in " + (System.currentTimeMillis() - startMS) + "ms");
        }
        System.out.println("Test completed.");
    }

    static public long copyFileSimple(final File pSourceFile, final File pSinkFile) throws IOException {
        try (
                final FileInputStream fis = new FileInputStream(pSourceFile);
                final FileOutputStream fos = new FileOutputStream(pSinkFile);) {
            long totalBytesTransferred = 0;
            while (true) {
                final int readByte = fis.read();
                if (readByte < 0) break;
                fos.write(readByte);
                ++totalBytesTransferred;
            }
            return totalBytesTransferred;
        }
    }

    static public long copyFileSimpleFaster(final File pSourceFile, final File pSinkFile) throws IOException {
        try (
                final FileInputStream fis = new FileInputStream(pSourceFile);
                final FileOutputStream fos = new FileOutputStream(pSinkFile);
                BufferedInputStream bis = new BufferedInputStream(fis);
                BufferedOutputStream bos = new BufferedOutputStream(fos);) {
            long totalBytesTransferred = 0;
            while (true) {
                final int readByte = bis.read();
                if (readByte < 0) break;
                bos.write(readByte);
                ++totalBytesTransferred;
            }
            return totalBytesTransferred;
        }
    }

    static public long copyFileFast(final File pSourceFile, final File pSinkFile) throws IOException {
        try (
                final FileInputStream fis = new FileInputStream(pSourceFile);
                final FileOutputStream fos = new FileOutputStream(pSinkFile);) {
            long totalBytesTransferred = 0;
            final byte[] buffer = new byte[20 * 1024];
            while (true) {
                final int bytesRead = fis.read(buffer);
                if (bytesRead < 0) break;
                fos.write(buffer, 0, bytesRead);
                totalBytesTransferred += bytesRead;
            }
            return totalBytesTransferred;
        }
    }
}
The hints that come along with that code:
There is the java.nio package that usually does those things a lot faster and in less code; see the sketch after this list.
Copying single bytes is 1,000-40,000 times slower than a bulk copy.
Using try-with-resources is the best way to avoid problems with reserved/locked resources like files etc.
If you solve something that is quite commonplace, I suggest you put it in a utility class of your own, or even your own library.
There are helper classes like BufferedInputStream and BufferedOutputStream that take care of efficiency greatly; see the example copyFileSimpleFaster().
But as usual, it is the quality of the concept that has the most impact on the implementation; see the example copyFileFast().
There are even more advanced concepts (similar to java.nio) that take into account things like OS caching behaviour, which will give performance another kick.
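As a minimal sketch of the java.nio.file route mentioned in the first hint (same paths as the question; Files.copy does the buffering and the loop for you):
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class FileCopyNio {
    public static void main(String[] args) throws IOException {
        // copies the whole file in one call and returns the destination path
        Files.copy(Paths.get("d:/Test1/OrigFile.MP4"),
                   Paths.get("d:/Test2/DestFile.mp4"),
                   StandardCopyOption.REPLACE_EXISTING);
    }
}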
Check my outputs, or run it on your own, to see the differences in performance:
Simple copy transferred 1608799 bytes in 12709ms
Simple+Fast copy transferred 1608799 bytes in 51ms
Fast copy transferred 1608799 bytes in 4ms
Test completed.
Related
Im trying to "pack" several files (previously inside a jar archive) in another single non-jar file by using DataInputStream / DataOutputStream.
The idea was:
First int = number of entries
First UTF is the first entry name
Second Int is entry byte array length (entry size)
Then repeat for every entry.
The code:
public static void main(String[] args) throws Throwable {
    test();
    System.out.println("========================================================================================");
    final DataInputStream dataInputStream = new DataInputStream(new FileInputStream(new File("C:\\Users\\Admin\\Desktop\\randomJarOut")));
    for (int int1 = dataInputStream.readInt(), i = 0; i < int1; ++i) {
        final String utf = dataInputStream.readUTF();
        System.out.println("Entry name: " + utf);
        final byte[] array = new byte[dataInputStream.readInt()];
        for (int j = 0; j < array.length; ++j) {
            array[j] = dataInputStream.readByte();
        }
        System.out.println("Entry bytes length: " + array.length);
    }
}
Unpacking the original & packing into the new one:
private static void test() throws Throwable {
    JarInputStream stream = new JarInputStream(new FileInputStream(new File("C:\\Users\\Admin\\Desktop\\randomJar.jar")));
    JarInputStream stream1 = new JarInputStream(new FileInputStream(new File("C:\\Users\\Admin\\Desktop\\randomJar.jar")));
    final byte[] buffer = new byte[2048];
    final DataOutputStream outputStream = new DataOutputStream(new FileOutputStream(new File("C:\\Users\\Admin\\Desktop\\randomJarOut")));
    int entryCount = 0;
    for (ZipEntry entry; (entry = stream.getNextJarEntry()) != null; ) {
        entryCount++;
    }
    outputStream.writeInt(entryCount);
    for (JarEntry entry; (entry = stream1.getNextJarEntry()) != null; ) {
        int entryRealSize = stream1.read(buffer);
        if (!(entryRealSize == -1)) {
            System.out.println("Writing: " + entry.getName() + " Length: " + entryRealSize);
            outputStream.writeUTF(entry.getName());
            outputStream.writeInt(entryRealSize);
            for (int len = stream1.read(buffer); len != -1; len = stream1.read(buffer)) {
                outputStream.write(buffer, 0, len);
            }
        }
    }
    outputStream.flush();
    outputStream.close();
}
Apparently I'm able to unpack the first entry without any problems, but not the second one and the others:
Entry name: META-INF/services/org.jd.gui.spi.ContainerFactory
Entry bytes length: 434
Exception in thread "main" java.io.UTFDataFormatException: malformed input around byte 279
at java.io.DataInputStream.readUTF(DataInputStream.java:656)
at java.io.DataInputStream.readUTF(DataInputStream.java:564)
at it.princekin.esercizio.Bootstrap.main(Bootstrap.java:29)
Disconnected from the target VM, address: '127.0.0.1:54384', transport: 'socket'
Process finished with exit code 1
Does anyone know how to fix this? Why is it working for the first entry but not the others?
My take on this is that the jar file (which in fact is a zip file) has a Central Directory which is only read by the ZipFile (or JarFile) class.
The Central Directory contains some data about the entries, such as the size.
I think ZipInputStream does not read the Central Directory, and thus the ZipEntry will not contain the size (returning -1 as it is unknown), whereas a ZipEntry read from the ZipFile class will.
So if you first read the size of each entry using a ZipFile and store it in a map, you can easily look it up when reading the data with the ZipInputStream.
This page includes some good examples as well.
So my version of your code would be:
import java.io.*;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

public class JarRepacker {

    public static void main(String[] args) throws Throwable {
        JarRepacker repacker = new JarRepacker();
        repacker.repackJarToMyFileFormat("commons-cli-1.3.1.jar", "randomJarOut.bin");
        repacker.readMyFileFormat("randomJarOut.bin");
    }

    private void repackJarToMyFileFormat(String inputJar, String outputFile) throws Throwable {
        int entryCount;
        Map<String, Integer> sizeMap = new HashMap<>();
        try (ZipFile zipFile = new ZipFile(inputJar)) {
            entryCount = zipFile.size();
            zipFile.entries().asIterator().forEachRemaining(e -> sizeMap.put(e.getName(), (int) e.getSize()));
        }
        try (final DataOutputStream outputStream = new DataOutputStream(new FileOutputStream(outputFile))) {
            outputStream.writeInt(entryCount);
            try (ZipInputStream stream = new ZipInputStream(new BufferedInputStream(new FileInputStream(inputJar)))) {
                ZipEntry entry;
                final byte[] buffer = new byte[2048];
                while ((entry = stream.getNextEntry()) != null) {
                    final String name = entry.getName();
                    outputStream.writeUTF(name);
                    final Integer size = sizeMap.get(name);
                    outputStream.writeInt(size);
                    //System.out.println("Writing: " + name + " Size: " + size);
                    int len;
                    while ((len = stream.read(buffer)) > 0) {
                        outputStream.write(buffer, 0, len);
                    }
                }
            }
            outputStream.flush();
        }
    }

    private void readMyFileFormat(String fileToRead) throws IOException {
        try (DataInputStream dataInputStream
                     = new DataInputStream(new BufferedInputStream(new FileInputStream(fileToRead)))) {
            int entries = dataInputStream.readInt();
            System.out.println("Entries in file: " + entries);
            for (int i = 1; i <= entries; i++) {
                final String name = dataInputStream.readUTF();
                final int size = dataInputStream.readInt();
                System.out.printf("[%3d] Reading: %s of size: %d%n", i, name, size);
                final byte[] array = new byte[size];
                for (int j = 0; j < array.length; ++j) {
                    array[j] = dataInputStream.readByte();
                }
                // Still need to do something with this array...
            }
        }
    }
}
The problem probably lies in that you are mixing read/write methods that are not reciprocal:
The writer method writes with outputStream.writeInt(entryCount) and the main method reads with dataInputStream.readInt(). That is OK.
The writer method writes with outputStream.writeUTF(entry.getName()) and the main method reads with dataInputStream.readUTF(). That is OK.
The writer method writes with outputStream.writeInt(entryRealSize) and the main method reads with dataInputStream.readInt(). That is OK.
The writer method writes with outputStream.write(buffer, 0, len) and the main method reads with dataInputStream.readByte() many times. Fragile.
If you write an array of bytes with write(buffer, offset, len), exactly len physical bytes go onto the output stream. Reading them back one readByte() call at a time only stays in sync if the declared length matches the data exactly, and it is slow; the natural bulk counterpart is readFully(buffer, 0, len), which blocks until exactly len bytes have been read.
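A minimal sketch of the bulk counterpart in the reader (same stream and variable names as the question):
final byte[] array = new byte[dataInputStream.readInt()]; // entry size, as written by writeInt
dataInputStream.readFully(array); // reads exactly array.length bytes, or throws EOFException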
Bugs in the writer method
There is also a major bug in the writer method: it invokes stream1.read(buffer) once before the inner loop and again inside it, but the contents of that first read are never written. The result is that the length of the first chunk (at most 2048 bytes) is written as the entry size, while the data that follows is the entry minus its first chunk, so the declared size does not match the data.
If you need to know the input file size before writing it to the output stream, you have two choices:
Either choose a buffer size large enough (like 204800) to read the whole file in just one read and write it in just one write.
Or separate reading from writing: first a method that reads the whole file and stores it in memory (a byte[], for example), and then another method that writes that byte[] onto the output stream.
Full fixed solution
I've fixed your program with specific, decoupled methods for each task. The process consists of parsing the input file into a memory model, writing it to an intermediate file according to your custom format, and then reading it back.
public static void main(String[] args)
    throws Throwable
{
    File inputJarFile=new File(args[0]);
    File intermediateFile=new File(args[1]);
    List<FileData> fileDataEntries=parse(inputJarFile);
    write(fileDataEntries, intermediateFile);
    read(intermediateFile);
}

public static List<FileData> parse(File inputJarFile)
    throws IOException
{
    List<FileData> list=new ArrayList<>();
    try (JarInputStream stream=new JarInputStream(new FileInputStream(inputJarFile)))
    {
        for (ZipEntry entry; (entry=stream.getNextJarEntry()) != null;)
        {
            byte[] data=readAllBytes(stream);
            if (data.length > 0)
            {
                list.add(new FileData(entry.getName(), data));
            }
            stream.closeEntry();
        }
    }
    return list;
}

public static void write(List<FileData> fileDataEntries, File output)
    throws Throwable
{
    try (DataOutputStream outputStream=new DataOutputStream(new FileOutputStream(output)))
    {
        int entryCount=fileDataEntries.size();
        outputStream.writeInt(entryCount);
        for (FileData fileData : fileDataEntries)
        {
            int entryRealSize=fileData.getData().length;
            {
                System.out.println("Writing: " + fileData.getName() + " Length: " + entryRealSize);
                outputStream.writeUTF(fileData.getName());
                outputStream.writeInt(entryRealSize);
                outputStream.write(fileData.getData());
            }
        }
        outputStream.flush();
    }
}

public static void read(File intermediateFile)
    throws IOException
{
    try (DataInputStream dataInputStream=new DataInputStream(new FileInputStream(intermediateFile)))
    {
        for (int entryCount=dataInputStream.readInt(), i=0; i < entryCount; i++)
        {
            String utf=dataInputStream.readUTF();
            int entrySize=dataInputStream.readInt();
            System.out.println("Entry name: " + utf + " size: " + entrySize);
            byte[] data=readFixedLengthBuffer(dataInputStream, entrySize);
            System.out.println("Entry bytes length: " + data.length);
        }
    }
}

// Drains the stream into a growing byte[] (grows by copying; fine for jar-sized entries).
private static byte[] readAllBytes(InputStream input)
    throws IOException
{
    byte[] buffer=new byte[4096];
    byte[] total=new byte[0];
    int len;
    do
    {
        len=input.read(buffer);
        if (len > 0)
        {
            byte[] total0=total;
            total=new byte[total0.length + len];
            System.arraycopy(total0, 0, total, 0, total0.length);
            System.arraycopy(buffer, 0, total, total0.length, len);
        }
    }
    while (len >= 0);
    return total;
}

// Reads exactly size bytes; fails instead of looping forever if the stream ends early.
private static byte[] readFixedLengthBuffer(InputStream input, int size)
    throws IOException
{
    byte[] buffer=new byte[size];
    int pos=0;
    int len;
    do
    {
        len=input.read(buffer, pos, size - pos);
        if (len < 0)
        {
            throw new EOFException("Stream ended after " + pos + " of " + size + " bytes");
        }
        pos+=len;
    }
    while (pos < size);
    return buffer;
}

private static class FileData
{
    private final String name;
    private final byte[] data;

    public FileData(String name, byte[] data)
    {
        super();
        this.name=name;
        this.data=data;
    }

    public String getName()
    {
        return this.name;
    }

    public byte[] getData()
    {
        return this.data;
    }
}
I need to read a huge file (15+GB) and perform some minor modifications (add some newlines so a different parser can actually work with it). You might think that there are already answers for doing this normally:
Reading a very huge file in java
How to read a large text file line by line using Java?
but my entire file is on one line.
My general approach so far is very basic:
char[] buffer = new char[X];
BufferedReader reader = new BufferedReader(new ReaderUTF8(new FileInputStream(new File("myFileName"))), X);
char[] bufferOut = new char[X + a little];
int bytesRead = -1;
int i = 0;
int offset = 0;
long totalBytesRead = 0;
int countToPrint = 0;
while ((bytesRead = reader.read(buffer)) >= 0) {
    for (i = 0; i < bytesRead; i++) {
        if (buffer[i] == '}') {
            bufferOut[i + offset] = '}';
            offset++;
            bufferOut[i + offset] = '\n';
        }
        else {
            bufferOut[i + offset] = buffer[i];
        }
    }
    writer.write(bufferOut, 0, bytesRead + offset);
    offset = 0;
    totalBytesRead += bytesRead;
    countToPrint += 1;
    if (countToPrint == 10) {
        countToPrint = 0;
        System.out.println("Read " + ((double) totalBytesRead / originalFileSize * 100) + " percent.");
    }
}
writer.flush();
After some experimentation, I've found that a value of X larger than a million gives optimal speed - it looks like I'm getting about 2% every 10 minutes, while a value of X of ~60,000 only got 60% in 15 hours. Profiling reveals that I'm spending 96+% of my time in the read() method, so that's definitely my bottleneck. As of writing this, my 8 million X version has finished 32% of the file after 2 hours and 40 minutes, in case you want to know how it performs long-term.
Is there a better approach for dealing with such a large, one-line file? As in, is there a faster way of reading this type of file that gives me a relatively easy way of inserting the newline characters?
I am aware that different languages or programs could probably handle this gracefully, but I'm limiting this to a Java perspective.
You are making this far more complicated than it needs to be. Just by making use of the buffering already provided by the standard classes you should get a throughput of at least several MB per second without any hassle.
This simple test program processes 1GB in less than 2 minutes on my PC (including creating the test file):
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class TestFileProcessing {

    public static void main(String[] argv) {
        try {
            long time = System.currentTimeMillis();
            File from = new File("C:\\Test\\Input.txt");
            createTestFile(from, StandardCharsets.UTF_8, 1_000_000_000);
            System.out.println("Created file in: " + (System.currentTimeMillis() - time) + "ms");
            time = System.currentTimeMillis();
            File to = new File("C:\\Test\\Output.txt");
            doIt(from, to, StandardCharsets.UTF_8);
            System.out.println("Converted file in: " + (System.currentTimeMillis() - time) + "ms");
        } catch (IOException e) {
            throw new RuntimeException(e.getMessage(), e);
        }
    }

    public static void createTestFile(File file, Charset encoding, long size) throws IOException {
        Random r = new Random(12345);
        try (OutputStream fout = new FileOutputStream(file);
                BufferedOutputStream bout = new BufferedOutputStream(fout);
                Writer writer = new OutputStreamWriter(bout, encoding)) {
            for (long i = 0; i < size; ++i) {
                int c = r.nextInt(26);
                if (c == 0)
                    writer.write('}');
                else
                    writer.write('a' + c);
            }
        }
    }

    public static void doIt(File from, File to, Charset encoding) throws IOException {
        try (InputStream fin = new FileInputStream(from);
                BufferedInputStream bin = new BufferedInputStream(fin);
                Reader reader = new InputStreamReader(bin, encoding);
                OutputStream fout = new FileOutputStream(to);
                BufferedOutputStream bout = new BufferedOutputStream(fout);
                Writer writer = new OutputStreamWriter(bout, encoding)) {
            int c;
            while ((c = reader.read()) >= 0) {
                if (c == '}')
                    writer.write('\n');
                writer.write(c);
            }
        }
    }
}
As you can see, no elaborate logic or excessive buffer sizes are used. What is used is simply buffering on the streams closest to the hardware, the FileInputStream/FileOutputStream.
So, I made a program that splits a .mp3 file in Java. Basically, it works fine on some files, but on others the first split file encounters an error after playing part of the way through. The other files work completely fine though.
I think it has something to do with how a file cannot be an exact multiple of the size of my array, so there should be some mod value left over. Can anybody please identify the error in this code and correct it?
(Here, splitval = the number of splits to be made, filename1 = the selected file.)
int splitsize = filesize / splitval;
String filecalled;
try
{
    byte[] b = new byte[splitsize];
    FileInputStream fis = new FileInputStream(filename1);
    name1 = filename2.replaceAll(".mp3", "");
    for (int j = 1; j <= splitval; j++)
    {
        filecalled = name1 + "_split_" + j + ".mp3";
        FileOutputStream fos = new FileOutputStream(filecalled);
        int i = fis.read(b);
        fos.write(b, 0, i);
        //System.out.println("no catch");
    }
    JOptionPane.showMessageDialog(this, "split process successful");
}
catch (IOException e)
{
    System.out.println(e.getMessage());
}
Thanks in advance!
EDIT:
I edited the code as suggested and ran it. Here:
C:\Users\dell5050\Desktop\Julien.mp3 5383930 bytes
C:\Users\dell5050\Desktop\ Julien_split_1.mp3 1345984 bytes
C:\Users\dell5050\Desktop\ Julien_split_2.mp3 1345984 bytes
C:\Users\dell5050\Desktop\ Julien_split_3.mp3 1345984 bytes
C:\Users\dell5050\Desktop\ Julien_split_4.mp3 1345978 bytes
There is a change in the last few bytes, which means the filesize % splitval problem is solved... but the first file, the one containing '_split_1', still has an error while playing some of the last part.
The second file, containing '_split_2', starts exactly where the first ended, so the split process is correct. Then what exactly is the extra empty space at the end of the first file?
Also, I noticed that the artwork and info of the original file carry over into the first file ONLY, no other files. Does it have something to do with that? The same thing doesn't happen with some other mp3 files.
CODE:
FileInputStream fis;
FileOutputStream fos;
int splitsize = (int)(filesize / splitval) + (int)(filesize % splitval);
byte[] b = new byte[splitsize];
System.out.println(filename1 + " " + filesize + " bytes");
try
{
    fis = new FileInputStream(file);
    name1 = filename2.replaceAll(".mp3", "");
    for (int j = 1; j <= splitval; j++)
    {
        String filecalled = name1 + "_split_" + j + ".mp3";
        fos = new FileOutputStream(filecalled);
        int i = fis.read(b);
        fos.write(b, 0, i);
        fos.close();
        System.out.println(filecalled + " " + i + " bytes");
    }
}
catch (IOException ie)
{
    System.out.println(ie.getMessage());
}
I doubt you can split an mp3 file just by copying n bytes into a file and moving on to the next. Mp3 has a specific format, and you'll probably need a library to handle that format.
EDIT regarding the size of the part files being all equal:
You are not writing all the bytes of the file to the split files. If you sum the sizes of all the split files and compare the total to the size of the original file, you'll find you are missing some bytes. This is because your loop runs from 1 to splitval and always writes the same number of bytes to each part file, i.e. splitsize; the number of bytes you are missing is therefore filesize % splitval.
To resolve this, simply add filesize % splitval to splitsize. This way you won't miss any bytes: the files from 1 to splitval - 1 will have the same size, and the last file will be smaller.
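As a worked example with the numbers from the question's edit: filesize = 5383930 and splitval = 4 give 5383930 / 4 = 1345982 and 5383930 % 4 = 2, so splitsize = 1345984. The first three reads each return 1345984 bytes and the final read returns the remaining 1345978, which matches the output printed in the question's edit.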
Here is a corrected version of your code, with some additions that merge the split files back together in order to perform an assertion using a SHA1 checksum.
Disclaimer - the output files are not expected to be proper mp3 files.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

import junit.framework.Assert;
import org.junit.Test;

public class SplitFile {

    @Test
    public void splitFile() throws IOException, NoSuchAlgorithmException {
        String filename1 = "mp3/Innocence_-_Nero.mp3";
        File file = new File(filename1);
        FileInputStream fis = null;
        FileOutputStream fos = null;
        long filesize = file.length();
        long filesizeActual = 0L;
        int splitval = 5;
        int splitsize = (int)(filesize / splitval) + (int)(filesize % splitval);
        byte[] b = new byte[splitsize];
        System.out.println(filename1 + " " + filesize + " bytes");
        try {
            fis = new FileInputStream(file);
            String name1 = filename1.replaceAll(".mp3", "");
            String mergeFile = name1 + "_merge.mp3";
            for (int j = 1; j <= splitval; j++) {
                String filecalled = name1 + "_split_" + j + ".mp3";
                fos = new FileOutputStream(filecalled);
                int i = fis.read(b);
                fos.write(b, 0, i);
                fos.close();
                fos = null;
                System.out.println(filecalled + " " + i + " bytes");
                filesizeActual += i;
            }
            Assert.assertEquals(filesize, filesizeActual);
            mergeFileParts(filename1, splitval);
            check(filename1, mergeFile);
        } finally {
            if (fis != null) {
                fis.close();
            }
            if (fos != null) {
                fos.close();
            }
        }
    }

    private void mergeFileParts(String filename1, int splitval) throws IOException {
        FileInputStream fis = null;
        FileOutputStream fos = null;
        try {
            String name1 = filename1.replaceAll(".mp3", "");
            String mergeFile = name1 + "_merge.mp3";
            fos = new FileOutputStream(mergeFile);
            for (int j = 1; j <= splitval; j++) {
                String filecalled = name1 + "_split_" + j + ".mp3";
                File partFile = new File(filecalled);
                fis = new FileInputStream(partFile);
                int partFilesize = (int) partFile.length();
                byte[] b = new byte[partFilesize];
                int i = fis.read(b, 0, partFilesize);
                fos.write(b, 0, i);
                fis.close();
                fis = null;
            }
        } finally {
            if (fis != null) {
                fis.close();
            }
            if (fos != null) {
                fos.close();
            }
        }
    }

    private void check(String expectedPath, String actualPath) throws IOException, NoSuchAlgorithmException {
        System.out.println("check...");
        FileInputStream fis = null;
        try {
            File expectedFile = new File(expectedPath);
            long expectedSize = expectedFile.length();
            File actualFile = new File(actualPath);
            long actualSize = actualFile.length();
            System.out.println("exp=" + expectedSize);
            System.out.println("act=" + actualSize);
            Assert.assertEquals(expectedSize, actualSize);
            fis = new FileInputStream(expectedFile);
            String expected = makeMessageDigest(fis);
            fis.close();
            fis = null;
            fis = new FileInputStream(actualFile);
            String actual = makeMessageDigest(fis);
            fis.close();
            fis = null;
            System.out.println("exp=" + expected);
            System.out.println("act=" + actual);
            Assert.assertEquals(expected, actual);
        } finally {
            if (fis != null) {
                fis.close();
            }
        }
    }

    public String makeMessageDigest(InputStream is) throws NoSuchAlgorithmException, IOException {
        byte[] data = new byte[1024];
        MessageDigest md = MessageDigest.getInstance("SHA1");
        int bytesRead = 0;
        while (-1 != (bytesRead = is.read(data, 0, 1024))) {
            md.update(data, 0, bytesRead);
        }
        return toHexString(md.digest());
    }

    private String toHexString(byte[] digest) {
        StringBuffer sha1HexString = new StringBuffer();
        for (int i = 0; i < digest.length; i++) {
            sha1HexString.append(String.format("%1$02x", Byte.valueOf(digest[i])));
        }
        return sha1HexString.toString();
    }
}
Output (for my test file)
mp3/Innocence_-_Nero.mp3 5048528 bytes
mp3/Innocence_-_Nero_split_1.mp3 1009708 bytes
mp3/Innocence_-_Nero_split_2.mp3 1009708 bytes
mp3/Innocence_-_Nero_split_3.mp3 1009708 bytes
mp3/Innocence_-_Nero_split_4.mp3 1009708 bytes
mp3/Innocence_-_Nero_split_5.mp3 1009696 bytes
check...
exp=5048528
act=5048528
exp=e81cf2dc65ab84e3df328e52d63a55301232b917
act=e81cf2dc65ab84e3df328e52d63a55301232b917
I have been messing with this for some time and it's getting better and better, but it's still a little slow for me. Can anyone help speed this up or improve the design, please?
Also, the files must contain only numbers, and each file must end with the extension ".dat".
I never added the checks because I didn't feel it was necessary.
public void preloadModels() {
    try {
        File directory = new File(signlink.findcachedir() + "raw", File.separator);
        File[] modelFiles = directory.listFiles();
        for (int modelIndex = modelFiles.length - 1;; modelIndex--) {
            String modelFileName = modelFiles[modelIndex].getName();
            byte[] buffer = getBytesFromInputStream(new FileInputStream(new File(directory, modelFileName)));
            Model.method460(buffer, Integer.parseInt(modelFileName.replace(".dat", "")));
        }
    } catch (Throwable e) {
        return;
    }
}

public static final byte[] getBytesFromInputStream(InputStream inputStream) throws IOException {
    byte[] buffer = new byte[32 * 1024];
    int bufferSize = 0;
    for (;;) {
        int read = inputStream.read(buffer, bufferSize, buffer.length - bufferSize);
        if (read == -1) {
            return Arrays.copyOf(buffer, bufferSize);
        }
        bufferSize += read;
        if (bufferSize == buffer.length) {
            buffer = Arrays.copyOf(buffer, bufferSize * 2);
        }
    }
}
I would do the following.
public void preloadModels() throws IOException {
    File directory = new File(signlink.findcachedir() + "raw");
    for (File file : directory.listFiles()) {
        if (!file.getName().endsWith(".dat")) continue;
        byte[] buffer = getBytesFromFile(file);
        Model.method460(buffer, Integer.parseInt(file.getName().replace(".dat", "")));
    }
}

public static byte[] getBytesFromFile(File file) throws IOException {
    byte[] buffer = new byte[(int) file.length()];
    try (DataInputStream dis = new DataInputStream(new FileInputStream(file))) {
        dis.readFully(buffer);
        return buffer;
    }
}
If this is still too slow, most likely the limitation is the speed of the hard drive.
How about using the Apache Commons IOUtils class:
IOUtils.toByteArray(InputStream input)
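A minimal usage sketch, assuming commons-io is on the classpath (the file path is illustrative):
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.io.IOUtils;

public class IoUtilsExample {
    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream("raw/123.dat")) { // hypothetical path
            byte[] data = IOUtils.toByteArray(in); // reads the whole stream into memory
            System.out.println(data.length + " bytes read");
        }
    }
}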
I think the easiest way is to add all the directory's contents to an archive. Have a look at java.util.zip. It had some bugs with file names before Java 7. There is also an Apache Commons implementation.
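A minimal sketch of that idea (directory and archive names are illustrative, and a flat directory is assumed):
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipDirectory {
    public static void main(String[] args) throws IOException {
        File dir = new File("raw"); // hypothetical source directory
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream("models.zip"))) {
            for (File file : dir.listFiles()) {
                if (file.isDirectory()) continue; // skip subdirectories in this sketch
                zos.putNextEntry(new ZipEntry(file.getName()));
                try (FileInputStream fis = new FileInputStream(file)) {
                    byte[] buffer = new byte[8192];
                    int len;
                    while ((len = fis.read(buffer)) > 0) {
                        zos.write(buffer, 0, len);
                    }
                }
                zos.closeEntry();
            }
        }
    }
}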
Is there any reason to prefer a CharBuffer to a char[] in the following:
CharBuffer buf = CharBuffer.allocate(DEFAULT_BUFFER_SIZE);
while (in.read(buf) >= 0) {
    out.append(buf.flip());
    buf.clear();
}
vs.
char[] buf = new char[DEFAULT_BUFFER_SIZE];
int n;
while ((n = in.read(buf)) >= 0) {
    out.write(buf, 0, n);
}
(where in is a Reader and out is a Writer)?
No, there's really no reason to prefer a CharBuffer in this case.
In general, though, CharBuffer (and ByteBuffer) can really simplify APIs and encourage correct processing. If you were designing a public API, it's definitely worth considering a buffer-oriented API.
I wanted to mini-benchmark this comparison.
Below is the class I have written.
The thing is I can't believe that the CharBuffer performed so badly. What have I got wrong?
EDIT: Since the 11th comment below, I have edited the code and the output times; performance is better all round, but there is still a significant difference. I also tried the out2.append((CharBuffer) buff.flip()) option mentioned in the comments, but it was much slower than the write option used in the code below.
Results: (time in ms)
char[] : 3411
CharBuffer: 5653
public class CharBufferScratchBox
{
    public static void main(String[] args) throws Exception
    {
        // Some Setup Stuff
        String smallString =
                "1111111111222222222233333333334444444444555555555566666666667777777777888888888899999999990000000000";

        StringBuilder stringBuilder = new StringBuilder();
        for (int i = 0; i < 1000; i++)
        {
            stringBuilder.append(smallString);
        }
        String string = stringBuilder.toString();
        int DEFAULT_BUFFER_SIZE = 1000;
        int ITERATIONS = 10000;

        // char[]
        StringReader in1 = null;
        StringWriter out1 = null;
        Date start = new Date();
        for (int i = 0; i < ITERATIONS; i++)
        {
            in1 = new StringReader(string);
            out1 = new StringWriter(string.length());

            char[] buf = new char[DEFAULT_BUFFER_SIZE];
            int n;
            while ((n = in1.read(buf)) >= 0)
            {
                out1.write(buf, 0, n);
            }
        }
        Date done = new Date();
        System.out.println("char[]    : " + (done.getTime() - start.getTime()));

        // CharBuffer
        StringReader in2 = null;
        StringWriter out2 = null;
        start = new Date();
        CharBuffer buff = CharBuffer.allocate(DEFAULT_BUFFER_SIZE);
        for (int i = 0; i < ITERATIONS; i++)
        {
            in2 = new StringReader(string);
            out2 = new StringWriter(string.length());

            int n;
            while ((n = in2.read(buff)) >= 0)
            {
                out2.write(buff.array(), 0, n);
                buff.clear();
            }
        }
        done = new Date();
        System.out.println("CharBuffer: " + (done.getTime() - start.getTime()));
    }
}
If this is the only thing you're doing with the buffer, then the array is probably the better choice in this instance.
CharBuffer has lots of extra chrome on it, but none of it is relevant in this case - and will only slow things down a fraction.
You can always refactor later if you need to make things more complicated.
The difference, in practice, is actually <10%, not 30% as others are reporting.
To read and write a 5MB file 24 times, here are my numbers, taken using a profiler. On average they were:
char[] = 4139 ms
CharBuffer = 4466 ms
ByteBuffer = 938 (direct) ms
Individual tests a couple of times favored CharBuffer.
I also tried replacing the file-based IO with in-memory IO, and the performance was similar. If you are trying to transfer from one native stream to another, then you are better off using a "direct" ByteBuffer.
With less than a 10% performance difference in practice, I would favor the CharBuffer. Its syntax is clearer, there are fewer extraneous variables, and you can do more direct manipulation on it (i.e. anything that asks for a CharSequence).
The benchmark is below... it is slightly flawed, as the BufferedReader is allocated inside the test method rather than outside... however, the example below allows you to isolate the IO time and eliminate factors like a string or byte stream resizing its internal memory buffer, etc.
public static void main(String[] args) throws Exception {
    File f = getBytes(5000000);
    System.out.println(f.getAbsolutePath());
    try {
        System.gc();
        List<Main> impls = new java.util.ArrayList<Main>();
        impls.add(new CharArrayImpl());
        //impls.add(new CharArrayNoBuffImpl());
        impls.add(new CharBufferImpl());
        //impls.add(new CharBufferNoBuffImpl());
        impls.add(new ByteBufferDirectImpl());
        //impls.add(new CharBufferDirectImpl());
        for (int i = 0; i < 25; i++) {
            for (Main impl : impls) {
                test(f, impl);
            }
            System.out.println("-----");
            if (i == 0)
                continue; //reset profiler
        }
        System.gc();
        System.out.println("Finished");
        return;
    } finally {
        f.delete();
    }
}

static int BUFFER_SIZE = 1000;

static File getBytes(int size) throws IOException {
    File f = File.createTempFile("input", ".txt");
    FileWriter writer = new FileWriter(f);
    Random r = new Random();
    for (int i = 0; i < size; i++) {
        writer.write(Integer.toString(5));
    }
    writer.close();
    return f;
}

static void test(File f, Main impl) throws IOException {
    InputStream in = new FileInputStream(f);
    File fout = File.createTempFile("output", ".txt");
    try {
        OutputStream out = new FileOutputStream(fout, false);
        try {
            long start = System.currentTimeMillis();
            impl.runTest(in, out);
            long end = System.currentTimeMillis();
            System.out.println(impl.getClass().getName() + " = " + (end - start) + "ms");
        } finally {
            out.close();
        }
    } finally {
        fout.delete();
        in.close();
    }
}

public abstract void runTest(InputStream ins, OutputStream outs) throws IOException;

public static class CharArrayImpl extends Main {

    char[] buff = new char[BUFFER_SIZE];

    public void runTest(InputStream ins, OutputStream outs) throws IOException {
        Reader in = new BufferedReader(new InputStreamReader(ins));
        Writer out = new BufferedWriter(new OutputStreamWriter(outs));
        int n;
        while ((n = in.read(buff)) >= 0) {
            out.write(buff, 0, n);
        }
    }
}

public static class CharBufferImpl extends Main {

    CharBuffer buff = CharBuffer.allocate(BUFFER_SIZE);

    public void runTest(InputStream ins, OutputStream outs) throws IOException {
        Reader in = new BufferedReader(new InputStreamReader(ins));
        Writer out = new BufferedWriter(new OutputStreamWriter(outs));
        int n;
        while ((n = in.read(buff)) >= 0) {
            buff.flip();
            out.append(buff);
            buff.clear();
        }
    }
}

public static class ByteBufferDirectImpl extends Main {

    ByteBuffer buff = ByteBuffer.allocateDirect(BUFFER_SIZE * 2);

    public void runTest(InputStream ins, OutputStream outs) throws IOException {
        ReadableByteChannel in = Channels.newChannel(ins);
        WritableByteChannel out = Channels.newChannel(outs);
        int n;
        while ((n = in.read(buff)) >= 0) {
            buff.flip();
            out.write(buff);
            buff.clear();
        }
    }
}
I think that CharBuffer and ByteBuffer (as well as any other xBuffer) were meant for reusability: you can buf.clear() them instead of going through a reallocation every time.
If you don't reuse them, you're not using their full potential and they will add extra overhead. However, if you're planning to scale this function, it might be a good idea to keep them, as shown in the sketch below.
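A minimal sketch of that reuse pattern (the method and buffer size are illustrative):
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.CharBuffer;

final class CopyWithReusedBuffer {
    // Copies everything from reader to writer, reusing one CharBuffer throughout.
    static void copy(Reader reader, Writer writer) throws IOException {
        CharBuffer buf = CharBuffer.allocate(8192); // allocated once, outside the loop
        while (reader.read(buf) >= 0) {
            buf.flip();         // switch from filling to draining
            writer.append(buf); // drain the buffered chars
            buf.clear();        // ready to be refilled; no reallocation
        }
        writer.flush();
    }
}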
You should avoid CharBuffer in some Java versions; there is a bug in CharBuffer#subSequence(). You cannot get a subsequence from the second half of the buffer, since the implementation confuses capacity and remaining. I observed the bug in Java 6 update 11 and update 12.
The CharBuffer version is slightly less complicated (one less variable), encapsulates buffer size handling, and makes use of a standard API. Generally I would prefer this.
However, there is still one good reason to prefer the array version, in some cases at least: CharBuffer was only introduced in Java 1.4, so if you are deploying to an earlier version you can't use CharBuffer (unless you roll your own or use a backport).
P.S. If you use a backport, remember to remove it once you catch up to the version containing the "real" version of the backported code.