How can I read from a binary file? - Java

I want to read a binary file whose size is 5.5 megabytes (an MP3 file). I tried it with FileInputStream, but it took many attempts. If possible, I want to read the file with a minimal waste of time.

You should wrap your FileInputStream in a BufferedInputStream. It will improve performance significantly.
new BufferedInputStream(fileInputStream, 8192 /* default buffer size */);
Furthermore, I'd recommend using the read method that takes a byte array and fills it, rather than the plain single-byte read().
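For illustration, a minimal sketch combining both suggestions (the file name is just an example):
try (InputStream in = new BufferedInputStream(new FileInputStream("a.mp3"), 8192)) {
    byte[] buffer = new byte[8192];
    int bytesRead;
    while ((bytesRead = in.read(buffer)) != -1) {
        // process buffer[0 .. bytesRead) here
    }
}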

There are useful utilities in Apache Commons IO's FileUtils for reading a file in one call. This is simpler, and efficient enough for modest files of up to 100 MB or so.
byte[] bytes = FileUtils.readFileToByteArray(file); // handles IOException/close() etc.

Try this:
public static void main(String[] args) throws IOException {
    InputStream i = new FileInputStream("a.mp3");
    byte[] contents = new byte[i.available()];
    i.read(contents);
    i.close();
}
A more reliable version, based on helpful comments from @Paul Cager and @Liv about the unreliability of available() and of a single read():
public static void main(String[] args) throws IOException {
    File f = new File("c:\\msdia80.dll");
    InputStream i = new FileInputStream(f);
    byte[] contents = new byte[(int) f.length()];
    int read;
    int pos = 0;
    // keep reading until the array is full; read() may return fewer bytes than requested
    while ((read = i.read(contents, pos, contents.length - pos)) >= 1) {
        pos += read;
    }
    i.close();
}
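For what it's worth, on Java 7 and later the same result can be had in one line with java.nio.file.Files, which handles the read loop and the close for you:
byte[] contents = java.nio.file.Files.readAllBytes(new File("c:\\msdia80.dll").toPath());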

Related

How do I decompress large files using Zstd-jni and Byte Buffers

I am trying to decompress a lot of 40 MB+ files as I download them in parallel, using ByteBuffers and Channels. I get better throughput with Channels than with Streams, and we need this to be a very high-throughput system, as we have to process 40 TB of files every day and this part of the process is currently the bottleneck. The files are compressed with zstd-jni. Zstd-jni has APIs for decompressing byte buffers, but I get an error when I use them. How do I decompress one byte buffer at a time using zstd-jni?
I found these examples in their tests, but unless I am missing something the examples using ByteBuffers seem to assume the entire input file fits in one ByteBuffer:
https://github.com/luben/zstd-jni/blob/master/src/test/scala/Zstd.scala
Below is my code for compressing and decompressing files. The compression code works great, but the decompression code then fails with an error of -70.
public static long compressFile(String inFile, String outFolder, ByteBuffer inBuffer, ByteBuffer compressedBuffer, int compressionLevel) throws IOException {
    File file = new File(inFile);
    File outFile = new File(outFolder, file.getName() + ".zs");
    long numBytes = 0L;
    try (RandomAccessFile inRaFile = new RandomAccessFile(file, "r");
         RandomAccessFile outRaFile = new RandomAccessFile(outFile, "rw");
         FileChannel inChannel = inRaFile.getChannel();
         FileChannel outChannel = outRaFile.getChannel()) {
        inBuffer.clear();
        while (inChannel.read(inBuffer) > 0) {
            inBuffer.flip();
            compressedBuffer.clear();
            long compressedSize = Zstd.compressDirectByteBuffer(compressedBuffer, 0, compressedBuffer.capacity(), inBuffer, 0, inBuffer.limit(), compressionLevel);
            numBytes += compressedSize;
            compressedBuffer.position((int) compressedSize);
            compressedBuffer.flip();
            outChannel.write(compressedBuffer);
            inBuffer.clear();
        }
    }
    return numBytes;
}
public static long decompressFile(String originalFilePath, String inFolder, ByteBuffer inBuffer, ByteBuffer decompressedBuffer) throws IOException {
    File outFile = new File(originalFilePath);
    File inFile = new File(inFolder, outFile.getName() + ".zs");
    outFile = new File(inFolder, outFile.getName());
    long numBytes = 0L;
    try (RandomAccessFile inRaFile = new RandomAccessFile(inFile, "r");
         RandomAccessFile outRaFile = new RandomAccessFile(outFile, "rw");
         FileChannel inChannel = inRaFile.getChannel();
         FileChannel outChannel = outRaFile.getChannel()) {
        inBuffer.clear();
        while (inChannel.read(inBuffer) > 0) {
            inBuffer.flip();
            decompressedBuffer.clear();
            long compressedSize = Zstd.decompressDirectByteBuffer(decompressedBuffer, 0, decompressedBuffer.capacity(), inBuffer, 0, inBuffer.limit());
            System.out.println(Zstd.isError(compressedSize) + " " + compressedSize);
            numBytes += compressedSize;
            decompressedBuffer.position((int) compressedSize);
            decompressedBuffer.flip();
            outChannel.write(decompressedBuffer);
            inBuffer.clear();
        }
    }
    return numBytes;
}
Yes, the static methods you use in your example assume the whole compressed file fits in one ByteBuffer. As far as I understand your requirements, you need streaming decompression using ByteBuffers. ZstdDirectBufferDecompressingStream already provides this:
https://static.javadoc.io/com.github.luben/zstd-jni/1.3.7-1/com/github/luben/zstd/ZstdDirectBufferDecompressingStream.html
and here is an example how to use it (from the tests):
https://github.com/luben/zstd-jni/blob/master/src/test/scala/Zstd.scala#L261-L302
but you also have to subclass it and override the "refill" method.
EDIT: here is a new test I just added that has exactly the same structure as your question - moving data between channels:
https://github.com/luben/zstd-jni/blob/master/src/test/scala/Zstd.scala#L540-L586
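To make the shape concrete, here is a rough, untested sketch along the lines of those tests (method names per the javadoc linked above; inChannel and outChannel are the question's channels, and the buffer sizes are arbitrary). The source buffer starts empty so the first read() triggers refill(), and refill() re-reads from the input channel each time the decompressor drains it:
ByteBuffer source = ByteBuffer.allocateDirect(64 * 1024);
source.flip(); // empty: position == limit == 0
ZstdDirectBufferDecompressingStream zstd = new ZstdDirectBufferDecompressingStream(source) {
    @Override
    protected ByteBuffer refill(ByteBuffer toRefill) {
        toRefill.clear();
        try {
            inChannel.read(toRefill); // may deliver fewer bytes than capacity
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        toRefill.flip();
        return toRefill;
    }
};
ByteBuffer target = ByteBuffer.allocateDirect(64 * 1024);
while (zstd.hasRemaining()) {
    target.clear();
    zstd.read(target); // decompress the next chunk into target
    target.flip();
    outChannel.write(target);
}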

How to write a byte array to a part of an existing file?

Please help me with this. I have a file (not a text file). I read a part of the file, then convert it into a byte array and do something with the array. So, can I erase that part of the file and write my own byte array into it?
Yes, you can do that, though with some file formats the result may become corrupted. In this sample I copy only half of the bytes from the real file and write them to a new file, which yields a partially written image.
public static void main(String[] args) throws IOException {
    File file = new File("D:\\Penguins.jpg");
    byte[] bFile = new byte[(int) file.length()];
    FileInputStream fileInputStream = new FileInputStream(file);
    fileInputStream.read(bFile); // note: a single read() is not guaranteed to fill the array
    fileInputStream.close();
    byte[] newArray = Arrays.copyOf(bFile, (int) file.length() / 2);
    FileOutputStream out = new FileOutputStream("D:\\partialPenguins.jpg");
    out.write(newArray);
    out.close();
}
You can use RandomAccessFile. Move the file pointer to the required position and rewrite in place.
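A minimal sketch of that approach (the offset and myBytes are hypothetical):
try (RandomAccessFile raf = new RandomAccessFile("D:\\Penguins.jpg", "rw")) {
    raf.seek(1024);     // move the file pointer to the part to overwrite
    raf.write(myBytes); // rewrites bytes in place; the rest of the file is untouched
}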
Read it into memory, convert, do whatever you wish, assemble the whole file, and write it in its entirety to disk, replacing the old one.
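For instance, a sketch of that read-modify-replace approach using java.nio.file.Files (path reused from the sample above):
Path p = Paths.get("D:\\Penguins.jpg");
byte[] all = Files.readAllBytes(p); // whole file in memory
// ... modify the relevant region of 'all' here ...
Files.write(p, all); // truncates and replaces the old content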

How can I read a specific number of bytes from a FileInputStream object using buffers

I have a series of objects stored within a file concatenated as below:
sizeOfFile1 || file1 || sizeOfFile2 || file2 ...
The sizes of the files are serialized Long objects, and the files themselves are just the raw bytes of the files.
I am trying to extract the files from the input file. Below is my code:
FileInputStream fileInputStream = new FileInputStream("C:\\Test.tst");
ObjectInputStream objectInputStream = new ObjectInputStream(fileInputStream);
while (fileInputStream.available() > 0)
{
    long size = (long) objectInputStream.readObject();
    FileOutputStream fileOutputStream = new FileOutputStream("C:\\" + size + ".tst");
    BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
    int chunkSize = 256;
    final byte[] temp = new byte[chunkSize];
    int finalChunkSize = (int) (size % chunkSize);
    final byte[] finalTemp = new byte[finalChunkSize];
    while (fileInputStream.available() > 0 && size > 0)
    {
        if (fileInputStream.available() > finalChunkSize)
        {
            int i = fileInputStream.read(temp);
            bufferedOutputStream.write(temp, 0, i);
            size = size - i;
        }
        else
        {
            int i = fileInputStream.read(finalTemp);
            bufferedOutputStream.write(finalTemp, 0, i);
            size = 0;
        }
    }
    bufferedOutputStream.close(); // also closes the underlying fileOutputStream
}
My code fails after it reads the first sizeOfFile; it just reads the rest of the input file into one file when there are multiple files stored.
Can anyone see the issue here?
Regards.
Wrap it in a DataInputStream and use readFully(byte[]).
But I question the design. Serialization and random access do not mix. It sounds like you should be using a database.
NB you are misusing available(). See the method's Javadoc page. It is never correct to use it as a count of the total number of bytes in the stream. There are few if any correct uses of available(), and this isn't one of them.
You could try NIO instead...
FileChannel roChannel = new RandomAccessFile(file, "r").getChannel();
ByteBuffer roBuf = roChannel.map(FileChannel.MapMode.READ_ONLY, 0, SIZE);
This reads only SIZE bytes from the file.
This uses DataInput to read the longs. In this particular case I am not using readFully(), as a segment might be too long to keep in memory:
DataInputStream in = new DataInputStream(new FileInputStream("C:\\Test.tst"));
byte[] buf = new byte[64 * 1024];
while (true) {
    OutputStream out = ...; // open the output for the next segment
    long size;
    try { size = in.readLong(); } catch (EOFException e) { break; }
    while (size > 0) {
        int len = (int) Math.min(size, buf.length);
        len = in.read(buf, 0, len);
        if (len == -1) break; // truncated input
        out.write(buf, 0, len);
        size -= len;
    }
    out.close();
}
Save yourself a lot of trouble by doing one of these things:
Switch to using Avro; trust me, you would be crazy not to. It's easy to learn and will accommodate schema changes. Using ObjectXXXStream is one of the worst ideas ever: as soon as you change your schema, your old files are garbage.
or use Thrift
or use Hibernate (but this is probably not a great option; Hibernate takes a lot of time to learn and a lot of configuration)
If you really refuse to switch to Avro, I recommend reading up on Apache's IOUtils class. It has a method to copy from one input stream to another, saving you a lot of headaches. Unfortunately, what you want to do is a little more complicated: you want the size prefixing each file. You might be able to use a combination of SequenceInputStream objects to do that (see the sketch after this answer).
There are also GZIPOutputStream and ZipOutputStream; both live in java.util.zip in the standard library.
I'm not going to write an example because I honestly think you should just learn Avro or Thrift and use that.
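For reference, a rough sketch of the IOUtils route mentioned above, assuming commons-io 2.2+ for IOUtils.copyLarge with a length argument, and assuming the size prefixes are plain longs written with DataOutputStream (output file names are hypothetical):
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream("C:\\Test.tst")));
int n = 0;
while (true) {
    long size;
    try { size = in.readLong(); } catch (EOFException e) { break; }
    try (OutputStream out = new BufferedOutputStream(new FileOutputStream("C:\\part" + (n++) + ".tst"))) {
        IOUtils.copyLarge(in, out, 0, size); // copies exactly 'size' bytes from the current position
    }
}
in.close();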

Java - File To Byte Array - Fast One

I want to read a file into a byte array. So, I am reading it using:
int len1 = (int)(new File(filename).length());
FileInputStream fis1 = new FileInputStream(filename);
byte buf1[] = new byte[len1];
fis1.read(buf1);
However, it is really very slow. Can anyone suggest a very fast approach (possibly the best one) to read a file into a byte array? I can use a Java library too if needed.
Edit: Is there any benchmark showing which one is faster (including the library approach)?
It is not very slow; at least, there is no way to make it much faster. BUT it is wrong. If the file is big enough, read() will not return all the bytes in the first call. The method returns the number of bytes it managed to read as its return value.
The right way is to call this method in a loop:
public static void copy(InputStream input, OutputStream output, int bufferSize) throws IOException {
    byte[] buf = new byte[bufferSize];
    int bytesRead = input.read(buf);
    while (bytesRead != -1) {
        output.write(buf, 0, bytesRead);
        bytesRead = input.read(buf);
    }
    output.flush();
}
Call it as follows:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
copy(new FileInputStream(myfile), baos, 8192);
byte[] bytes = baos.toByteArray();
Something like this is implemented in a lot of packages, e.g. FileUtils.readFileToByteArray(), mentioned by @Andrey Borisov (+1)
EDIT
I think the reason for the slowness in your case is that you create such a huge array. Are you sure you really need it? Try to rethink your design. I believe you do not have to read this file into an array and can process the data incrementally.
Apache commons-io's FileUtils.readFileToByteArray

Best way to write String to file using java nio

I need to write (append) a huge string to a flat file using Java NIO. The encoding is ISO-8859-1.
Currently we are writing it as shown below. Is there any better way to do the same?
public void writeToFile(Long limit) throws IOException {
    String fileName = "/xyz/test.txt";
    File file = new File(fileName);
    FileOutputStream fileOutputStream = new FileOutputStream(file, true);
    FileChannel fileChannel = fileOutputStream.getChannel();
    ByteBuffer byteBuffer = null;
    String messageToWrite = null;
    for (int i = 1; i < limit; i++) {
        // messageToWrite = get String data from database
        byteBuffer = ByteBuffer.wrap(messageToWrite.getBytes(Charset.forName("ISO-8859-1")));
        fileChannel.write(byteBuffer);
    }
    fileChannel.close();
}
EDIT: Tried both options. Following are the results.
@Test
public void testWritingStringToFile() {
    DiagnosticLogControlManagerImpl diagnosticLogControlManagerImpl = new DiagnosticLogControlManagerImpl();
    try {
        File file = diagnosticLogControlManagerImpl.createFile();
        long startTime = System.currentTimeMillis();
        writeToFileNIOWay(file);
        // writeToFileIOWay(file);
        long endTime = System.currentTimeMillis();
        System.out.println("Total Time is " + (endTime - startTime));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
/**
 * @param file the file to append to
 * @throws IOException on I/O failure
 */
public void writeToFileNIOWay(File file) throws IOException {
    FileOutputStream fileOutputStream = new FileOutputStream(file, true);
    FileChannel fileChannel = fileOutputStream.getChannel();
    ByteBuffer byteBuffer = null;
    String messageToWrite = null;
    for (int i = 1; i < 1000000; i++) {
        messageToWrite = "This is a test üüüüüüööööö";
        byteBuffer = ByteBuffer.wrap(messageToWrite.getBytes(Charset.forName("ISO-8859-1")));
        fileChannel.write(byteBuffer);
    }
}
/**
 * @param file the file to append to
 * @throws IOException on I/O failure
 */
public void writeToFileIOWay(File file) throws IOException {
    FileOutputStream fileOutputStream = new FileOutputStream(file, true);
    BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream, 128 * 100);
    String messageToWrite = null;
    for (int i = 1; i < 1000000; i++) {
        messageToWrite = "This is a test üüüüüüööööö";
        bufferedOutputStream.write(messageToWrite.getBytes(Charset.forName("ISO-8859-1")));
    }
    bufferedOutputStream.flush();
    fileOutputStream.close();
}
private File createFile() throws IOException {
    File file = new File(FILE_PATH + "test_sixth_one.txt");
    file.createNewFile();
    return file;
}
Using ByteBuffer and Channel: took 4402 ms
Using BufferedOutputStream: took 563 ms
UPDATED:
Since Java 11 there is a specific method to write strings using java.nio.file.Files:
Files.writeString(Paths.get(file.toURI()), "My string to save");
We can also customize the writing with:
Files.writeString(Paths.get(file.toURI()),
"My string to save",
StandardCharsets.UTF_8,
StandardOpenOption.CREATE,
StandardOpenOption.TRUNCATE_EXISTING);
ORIGINAL ANSWER:
There is a one-line solution, using Java nio:
java.nio.file.Files.write(Paths.get(file.toURI()),
"My string to save".getBytes(StandardCharsets.UTF_8),
StandardOpenOption.CREATE,
StandardOpenOption.TRUNCATE_EXISTING);
I have not benchmarked this solution with the others, but using the built-in implementation for open-write-close file should be fast and the code is quite small.
I don't think you will be able to get a strict answer without benchmarking your software. NIO may speed up the application significantly under the right conditions, but it may also make things slower.
Here are some points:
Do you really need strings? If you store and receive bytes from your database, you can avoid string allocation and encoding costs altogether.
Do you really need rewind and flip? It seems you are creating a new buffer for every string and just writing it to the channel. (If you go the NIO way, benchmark strategies that reuse the buffers instead of wrapping/discarding; there is a sketch after this list. I think they will do better.)
Keep in mind that wrap and allocateDirect may produce quite different buffers. Benchmark both to grasp the trade-offs. With direct allocation, be sure to reuse the same buffer to achieve the best performance.
And the most important thing: be sure to compare NIO with the BufferedOutputStream and/or BufferedWriter approaches (using an intermediate byte[] or char[] buffer of a reasonable size as well). I've seen many, many, many people discover that NIO is no silver bullet.
If you fancy some bleeding edge... Back to IO Trails for some NIO2 :D.
And here is an interesting benchmark about file copying using different strategies. I know it is a different problem, but I think most of the facts and the author's conclusions also apply to your problem.
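As a concrete illustration of the buffer-reuse point above, a minimal, unbenchmarked sketch that reuses the question's fileChannel, messageToWrite, and loop bound (the 64 KB size is arbitrary):
Charset charset = Charset.forName("ISO-8859-1");
ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024); // allocated once, reused for every write
for (int i = 1; i < limit; i++) {
    byte[] bytes = messageToWrite.getBytes(charset);
    buffer.clear();
    buffer.put(bytes); // assumes each message fits into the buffer
    buffer.flip();
    while (buffer.hasRemaining()) {
        fileChannel.write(buffer);
    }
}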
Cheers,
UPDATE 1:
Since @EJP tipped me off that direct buffers wouldn't be efficient for this problem, I benchmarked it myself and ended up with a nice NIO solution using memory-mapped files. On my MacBook running OS X Lion this beats BufferedOutputStream by a solid margin, but keep in mind that this might be OS/hardware/VM specific:
public void writeToFileNIOWay2(File file) throws IOException {
    final int numberOfIterations = 1000000;
    final String messageToWrite = "This is a test üüüüüüööööö";
    final byte[] messageBytes = messageToWrite.getBytes(Charset.forName("ISO-8859-1"));
    final long appendSize = numberOfIterations * messageBytes.length;
    final RandomAccessFile raf = new RandomAccessFile(file, "rw");
    raf.seek(raf.length());
    final FileChannel fc = raf.getChannel();
    final MappedByteBuffer mbf = fc.map(FileChannel.MapMode.READ_WRITE, fc.position(), appendSize);
    fc.close(); // the mapping remains valid after the channel is closed
    for (int i = 0; i < numberOfIterations; i++) {
        mbf.put(messageBytes);
    }
}
I admit that I cheated a little by calculating the total size to append (around 26 MB) beforehand. This may not be possible in several real-world scenarios. Still, you can always use a "big enough" append size for the operations and truncate the file afterwards.
UPDATE 2 (2019):
To anyone looking for a modern (as in, Java 11+) solution to the problem, I would follow @DodgyCodeException's advice and use java.nio.file.Files.writeString:
String fileName = "/xyz/test.txt";
String messageToWrite = "My long string";
Files.writeString(Paths.get(fileName), messageToWrite, StandardCharsets.ISO_8859_1);
A BufferedWriter around a FileWriter will almost certainly be faster than any NIO scheme you can come up with. Your code certainly isn't optimal, with a new ByteBuffer per write, and then doing pointless operations on it when it is about to go out of scope, but in any case your question is founded on a misconception. NIO doesn't 'offload the memory footprint to the OS' at all, unless you're using FileChannel.transferTo/From(), which you can't in this instance.
NB don't use a PrintWriter as suggested in comments, as this swallows exceptions. PW is really only for consoles and log files where you don't care.
Here is a short and easy way. It creates a file and writes the data relative to your code project:
private void writeToFile(String filename, String data) {
    Path p = Paths.get(".", filename);
    try (OutputStream os = new BufferedOutputStream(
            Files.newOutputStream(p, StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
        byte[] bytes = data.getBytes();
        os.write(bytes); // write every encoded byte; data.length() can differ from bytes.length
    } catch (IOException e) {
        e.printStackTrace();
    }
}
This works for me:
// Create a BufferedWriter for writing to the file
BufferedWriter writer = Files.newBufferedWriter(Paths.get(filePath));
writer.write(what);
// Don't forget to flush (and close), so everything you wrote actually reaches the file:
writer.flush();
writer.close();
