In Java, can I remove specific bytes from a file?

So far I have managed to do something with a byte stream: read the original file and write to a new file while omitting the desired bytes (then finish by deleting/renaming so that only one file is left).
I'd like to know if there's a way to modify the bytes directly, without having to manipulate more than one file. The reason is that this has to be performed when memory is low and the file is big, so cloning the file before trimming it may not be the best option.

I'd like to know if there's a way to modify the bytes directly, without having to manipulate more than one file.
There isn't a SAFE way to do it.
The unsafe way to do it involves (for example) mapping the file using a MappedByteBuffer, and shuffling the bytes around.
But the problem is that if something goes wrong while you are doing this, you are liable to end up with a corrupted file.
Therefore, if the user asks to perform this operation when the device's storage is too full to hold a second copy of the file, the best thing is to tell the user to "delete some files first".
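For illustration, the unsafe mapped-buffer approach mentioned above might look something like the sketch below. This is not a recommended implementation: the method and parameter names are mine, a single mapping is limited to 2 GB, and truncating a file that still has a live mapping can fail on some platforms (notably Windows).

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    class MappedTrim {
        // Shifts the tail of the file left over the removed region inside a
        // memory-mapped view, then truncates the file. UNSAFE: a crash
        // part-way through leaves the file corrupted.
        static void removeBytes(Path file, long offset, long count) throws IOException {
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                long size = ch.size();
                if (size > Integer.MAX_VALUE) {
                    throw new IOException("a single mapping is limited to 2 GB");
                }
                MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
                for (long i = offset; i + count < size; i++) {
                    map.put((int) i, map.get((int) (i + count)));
                }
                map.force();                  // flush the shifted bytes
                ch.truncate(size - count);    // may fail while the mapping is live (e.g. Windows)
            }
        }
    }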
The reason is that this has to be performed when memory is low and the file is big, so cloning the file before trimming it may not be the best option.
If you are primarily worried about "memory" on the storage device, see above.
If you are worried about RAM, then #RealSkeptic's observation is correct. You shouldn't need to hold the entire file in RAM at the same time; you can read, modify, and write it one buffer at a time.
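A bounded-RAM variant of the in-place shift, using RandomAccessFile and a fixed-size buffer, could look like this sketch (names are mine; the same corruption caveat applies, but it uses a constant amount of heap and handles files over 2 GB):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    class InPlaceTrim {
        // Repeatedly reads a block from after the removed region and writes it
        // `count` bytes earlier, then truncates the file with setLength.
        static void removeBytes(String path, long offset, long count) throws IOException {
            byte[] buf = new byte[64 * 1024];              // fixed 64 KB working buffer
            try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
                long readPos = offset + count;
                long writePos = offset;
                int n;
                while (true) {
                    raf.seek(readPos);
                    n = raf.read(buf);
                    if (n < 0) break;                      // reached end of file
                    raf.seek(writePos);
                    raf.write(buf, 0, n);
                    readPos += n;
                    writePos += n;
                }
                raf.setLength(raf.length() - count);       // drop the now-duplicated tail
            }
        }
    }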

You can't remove bytes from the middle of a file without rewriting everything that follows them. But you can replace bytes in place, if that helps you.

Related

Best way to merge binary files in Java

I'm developing a basic download manager that can download a file over HTTP using multiple connections. At the end of the download, I have several temp files, each containing a part of the downloaded file.
I now want to merge them into a single file.
It's not hard to do: simply create an output stream and the input streams, and pipe the inputs into the output in the right order.
But I was wondering: is there a way to do it more efficiently? From my understanding, what will happen here is that the JVM will read the inputs byte by byte and write the output byte by byte.
So basically I have :
- read byte from disk
- store byte in memory
- some CPU instructions will probably run and the byte will probably be copied into the CPU's cache
- write byte to the disk
I was wondering if there is a way to keep the whole process on the disk? I don't know if I'm being clear, but basically I want to tell the disk: "hey disk, take these files of yours and make one out of them".
In short, I want to reduce CPU and memory usage to the lowest possible.
In theory it may be possible to do this at the file system level: you could append the block list of one inode to another without moving the data. This is not very practical, though; most likely you would have to bypass your operating system and access the disk directly.
The next best thing may be to use the FileChannel.transferTo or transferFrom methods:
This method is potentially much more efficient than a simple loop that reads from this channel and writes to the target channel. Many operating systems can transfer bytes directly from the filesystem cache to the target channel without actually copying them.
You should also test reading and writing large blocks of bytes using streams or RandomAccessFile - it may still be faster than using channels. Here's a good article about testing sequential IO performance in Java.
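A minimal merge along those lines might look like the following sketch (the class and method names are hypothetical; each part is streamed into the target with transferTo, so on supporting platforms the bytes need not pass through Java buffers at all):

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.List;

    class MergeParts {
        static void merge(List<Path> parts, Path target) throws IOException {
            try (FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
                for (Path part : parts) {                  // parts must already be in order
                    try (FileChannel in = FileChannel.open(part, StandardOpenOption.READ)) {
                        long size = in.size();
                        long done = 0;
                        while (done < size) {              // transferTo may move fewer bytes than asked
                            done += in.transferTo(done, size - done, out);
                        }
                    }
                }
            }
        }
    }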

How to evaluate size of file in Java before creating it?

In my Java program I need to create files and write into them something that I get from an InputStream's read() method. How can I evaluate the size of the file before creating it?
Normally, you don't need to know how big the file will be, but if you really do:
The only way you could do that would be to fully read the content from the InputStream into memory first, and then see how much you have.
You have several options for how to read it all into memory, one of which might be to write it to a ByteArrayOutputStream. (And then, of course, write that out to the file when you're ready.)
But again, the great thing about streams is that you don't have to read things all into memory; if you can avoid needing to know the size in advance, that would be best.
Also note that the space the file will occupy on disk won't be exactly the same as the file size; most file systems work in chunks (4k, 8k, 16k, 32k) and so a file that's (say) 12k on a file system using 8k chunks will actually occupy 16k of space.
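If you do need the size up front, a sketch of the buffer-first approach might look like this (the method name is mine; only viable when the content fits in the heap):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    class SizeFirst {
        // Buffers the whole stream in memory so its length is known
        // before the file is created.
        static void writeKnowingSize(InputStream in, Path target) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            System.out.println("About to write " + buf.size() + " bytes");
            Files.write(target, buf.toByteArray());        // now create the file
        }
    }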
It depends on the encoding used, but you can write it to an in-memory stream and get the length.

Optimising Java's NIO for small files

We have a file I/O bottleneck. We have a directory which contains lots of JPEG files, and we want to read them in, in real time, as a movie. Obviously this is not an ideal format, but this is a prototype object tracking system and there is no possibility of changing the format, as the files are used elsewhere in the code.
From each file we build a frame object, which basically means having a BufferedImage and an explicit ByteBuffer containing all of the information from the image.
What is the best strategy for this? The data is on an SSD which in theory has read/write rates around 400 MB/s, but in practice is reading no more than 20 files per second (3-4 MB/s) using the naive implementation:
bufferedImg = ImageIO.read(imageFile);                                                // [1]
byte[] data = ((DataBufferByte) bufferedImg.getRaster().getDataBuffer()).getData();   // [2]
imgBuf = ByteBuffer.wrap(data);
However, Java provides lots of possibilities for improving this:
(1) Channels, especially FileChannel.
(2) Gathering/scattering.
(3) Direct buffering.
(4) Memory-mapped buffers.
(5) Multithreading - use a bunch of Callables to access many files simultaneously.
(6) Wrapping the files in a single large file.
(7) Other things I haven't thought of yet.
I would just like to know if anyone has extensively tested the different options, and knows what is optimal? I assume that (3) is a must, but I would still like to optimise the reading of a single file as far as possible, and am unsure of the best strategy.
Bonus question: in the code snippet above, when does the JVM actually 'hit the disk' and read in the contents of the file? Is it at [1], or is that just a file handle which 'points' to the object? It kind of makes sense to evaluate lazily, but I don't know how the implementation of the ImageIO class works.
Since ImageIO.read(imageFile) returns a fully decoded BufferedImage, I assume it hits the disk at [1] rather than returning just a file handle.
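As a rough illustration of options (1) and (3), reading a whole file in one channel call into a direct buffer might look like the sketch below; whether it beats ImageIO.read for your JPEGs is exactly what you would need to benchmark.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    class BulkRead {
        // Reads an entire (sub-2 GB) file into a direct ByteBuffer.
        static ByteBuffer readFully(Path file) throws IOException {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocateDirect((int) ch.size());
                while (buf.hasRemaining() && ch.read(buf) != -1) {
                    // keep reading until the buffer is full or EOF
                }
                buf.flip();
                return buf;
            }
        }
    }

Note that ImageIO cannot decode straight from a direct buffer, so for option (5) you would typically copy the bytes into a heap array, wrap that in a ByteArrayInputStream, and hand it to worker threads for decoding.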

Adding files to zip java using memory while avoiding reserved file name problems

I want to add, remove or modify files in a zip in the most efficient way possible.
Yes, you may say that what I should do is unzip the files to the file system and re-zip them, but if there is a file with a special name like 'aux' or 'con', that doesn't work on Windows because these are DOS device names, and there might also be filename encoding issues that prevent the process from working properly. Another reason I don't just unzip to the file system and re-zip is that it is much slower and takes more disk space than just using RAM.
In an image: http://i.stack.imgur.com/yPuYG.png
You could use a memory-based stream, like ByteArrayOutputStream, to read/write the contents of the file.
The issue is the amount of available memory: because RAM is limited, you're eventually going to need to store the output on something larger, like a disk.
To try to optimize the process, you could set a preferred time threshold for each read/write/process operation.
Basically, you would run the process, measure how long it took and, based on the preferred threshold, adjust the buffer size for the next loop.
I would allow for a number of loops and average the times, so you're not trying to exert fine-grained control over the buffer, which might actually slow you down.
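A minimal in-memory rewrite that drops one entry might look like this sketch (the names are illustrative, and entry metadata such as timestamps and compression levels is not preserved):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;
    import java.util.zip.ZipOutputStream;

    class ZipInMemory {
        // Rewrites a zip held in a byte array, skipping one entry, without
        // ever touching the file system (so names like 'aux' are harmless).
        static byte[] removeEntry(byte[] zipBytes, String nameToRemove) throws IOException {
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes));
                 ZipOutputStream out = new ZipOutputStream(result)) {
                byte[] buf = new byte[8192];
                ZipEntry entry;
                while ((entry = in.getNextEntry()) != null) {
                    if (entry.getName().equals(nameToRemove)) continue;   // drop this one
                    out.putNextEntry(new ZipEntry(entry.getName()));
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    out.closeEntry();
                }
            }
            return result.toByteArray();   // the finished zip, still entirely in RAM
        }
    }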

Shift the file while writing?

Is it possible to shift the contents of a file while writing to it using FileWriter?
I need to write data constants to the head of the file, but if I do that it overwrites what is already there.
What technique should I use to do this, or should I make copies of the file (with the new data on top) on every write?
If you want to overwrite certain bytes of the file and not others, you can use seek and write to do so. If you want to change the content of every byte in the file (by, for example, adding a single byte to the beginning of the file) then you need to write a new file and potentially rename it after you've done writing it.
Think of the answer to the question "what will be the contents of the byte at offset x after I'm done?". If, for a large percent of the possible values of x the answer is "not what it used to be" then you need a new file.
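A sketch of the seek-and-overwrite case (names are mine): this patches bytes at the head without touching the rest of the file, which only works because nothing needs to shift.

    import java.io.IOException;
    import java.io.RandomAccessFile;

    class PatchHead {
        // Overwrites exactly header.length bytes at the start of the file;
        // every other byte keeps its offset.
        static void patchHeader(String path, byte[] header) throws IOException {
            try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
                raf.seek(0);
                raf.write(header);
            }
        }
    }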
Rather than contenting ourselves with the question "what will be the contents of the byte at offset x after I'm done?", let's change the mindset and ask why the file system, or perhaps the hard disk firmware, can't: a) provide another mode of accessing the file (let's say, inline); b) increase the length of the file by the number of bytes added at the front, in the middle, or even at the end; c) move each byte from the splice point onwards forward by newcontent.length positions.
It would be easier and faster to handle these operations at the disk firmware or file system implementation level rather than leaving that job to the application developer. I hope file system writers or hard disk vendors will offer such a feature soon.
