I'm implementing file manipulation functionality, and I noticed that Java provides multiple techniques to copy and move files. Below are code snippets briefly illustrating these approaches:
Approach #1:
File from = new File(src.getPath());
File to = new File(dst.getPath());
from.renameTo(to);
Approach #2:
FileChannel inChannel = new FileInputStream(src).getChannel();
FileChannel outChannel = new FileOutputStream(dst).getChannel();
inChannel.transferTo(0, inChannel.size(), outChannel);
Approach #3:
InputStream in = getContentResolver().openInputStream(selectedImageUri);
OutputStream out = new FileOutputStream("/sdcard/wallpapers/" + wall);
byte[] buffer = new byte[1024];
int read;
while ((read = in.read(buffer)) != -1) {
    out.write(buffer, 0, read);
}
Approach #4:
import static java.nio.file.StandardCopyOption.*;
Files.copy(source, target, REPLACE_EXISTING);
All these approaches work, but I can't grasp when I should use each of them. What are the pros and cons of each of these methods, especially from the performance and reliability points of view? Is there any specific scenario where I should prefer one technique over another?
This has already been discussed at length here; the following summary is taken from there.
Your first approach is a file rename, which has nothing to do with file copy.
The java.io.File class doesn't have any shortcut method to copy a file from source to destination.
1. Using streams: This is the conventional way to copy a file in Java. We create two File objects, source and destination, then read from the source through an InputStream and write to the destination file using an OutputStream.
2. Using java.nio.channels.FileChannel: The Java NIO classes were introduced in Java 1.4, and FileChannel can be used to copy a file. According to the transferFrom() method's javadoc, this way of copying is supposed to be faster than using streams.
3. Using Apache Commons IO: FileUtils.copyFile(File srcFile, File destFile) can be used to copy a file. If you are already using Apache Commons IO in your project, it makes sense to use it for code simplicity. Internally it uses Java NIO FileChannel, so you can skip this wrapper if you aren't already depending on the library for other things.
4. Using the Java 7 Files class: If you are on Java 7 or later, you can use the Files.copy() method. It uses file system providers to copy the files.
Now, to see which of these methods is more efficient, we copy a large file (1 GB) with each of them in a simple program. To avoid any performance speedup from caching, we use four different source files and four different destination files (refer to the code in the linked article).
Time taken by FileStreams Copy = 127572360
Time taken by FileChannels Copy = 10449963
Time taken by Java7 Files Copy = 10808333
Time taken by Apache Commons IO Copy = 17971677
From the output it's clear that FileChannel copy is the fastest way to copy large files in Java, with Java 7's Files.copy() a close second; plain stream copy is by far the slowest. If you work with even larger files you will notice a much bigger speed difference.
We can divide your four approaches into two types:
Use a built-in standard library method (such as File.renameTo() and Files.move()).
Do the work ourselves - by copying bytes from source to target.
First, note that File doesn't have a copy method, so you only have one option for built-in, standard library method when you're talking about copy.
Also note that "doing the work ourselves" when renaming is going to be very bad: you would copy the entire file and then delete the old one. That is neither a good nor an efficient approach. In most cases, renaming/moving within the same filesystem only requires changing file metadata without actually touching the content, so it's really a lot better to use the standard library.
So you have two cases:
Renaming
The options are really using either File.renameTo() or Files.move(). No point in using streams and copying data.
File is an outdated class. It shouldn't really be used anymore. There is an excellent explanation why, which boils down to the fact that File doesn't give you any information when any of its standard methods fail, whereas Files throws very precise exceptions when that happens.
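To make the difference concrete, here is a minimal sketch (the file names are placeholders): renameTo() only reports failure as a boolean, while Files.move() tells you exactly what went wrong.

import java.io.File;
import java.io.IOException;
import java.nio.file.*;

static void renameDemo() {
    File legacy = new File("a.txt");
    boolean ok = legacy.renameTo(new File("b.txt"));   // false on failure, no reason given
    System.out.println("renameTo succeeded: " + ok);

    try {
        Files.move(Paths.get("a.txt"), Paths.get("b.txt"),
                   StandardCopyOption.REPLACE_EXISTING);
    } catch (NoSuchFileException e) {
        System.err.println("source is missing: " + e.getFile());  // precise cause
    } catch (IOException e) {
        System.err.println("move failed: " + e);                  // still descriptive
    }
}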
Copying
You have two choices - either use Files.copy() or one of the "do it yourself" approaches.
By far, if what you are copying are actual files, your choice should be Files.copy(). There is no need to re-invent the wheel. It does exactly what you want, is well documented, you're not likely to introduce bugs accidentally. And yes, it's very efficient.
Files.copy() relies on underlying "providers" for its operation. What it means is that there are specialized vendor (or operating system) specific classes that do the operation that is the most efficient for that filesystem. Whether it's a Linux filesystem or a Windows one, the copy will be optimized for it. There are even providers for specialized cases, such as zip files, so you can copy files inside a zip, jar or war file using Files.copy() - which is a lot more complicated if you try the "do it yourself" approach.
Besides, Files.copy() checks lots of things that you might forget when you write "your own" copy. For example, did you remember to check that the file that you are reading from and the file you are writing to are not the same file? It could cause serious trouble. Files.copy() does it. It checks permissions, it checks if the target of the copy is a directory, and so on. So it's very reliable.
So why do you have the option to do "your own"? Because well, Java is a general-purpose language. You have the option to read from a file, the option to write to a file, so you can write your own "copy" method. That doesn't mean you should.
Note that in your "approach #3", the "source" file is not actually a file! It's produced from an Image URI, which means it could be a network source. When your source is not a file, but a stream or channel based on a socket, database BLOB, web server request etc., you can't really use Files.copy(). This is where you'd need to write your own.
Actually, Files also has options for copying from a file to an OutputStream or from an InputStream to a file, so if one side of the copy is a stream and the other a file, you can use that. It will be readable, safe, and throw meaningful exceptions.
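For reference, a small sketch of those overloads (Java 7+; the method names and parameters here are placeholders):

import java.io.*;
import java.nio.file.*;

// Copy from a stream (socket, BLOB, upload, ...) into a file:
static void streamToFile(InputStream in, Path target) throws IOException {
    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
}

// Copy from a file into a stream (response body, socket, ...):
static void fileToStream(Path source, OutputStream out) throws IOException {
    Files.copy(source, out);
}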
So write your own copy (a minimal sketch follows this list):
when you need to move data from sources to targets which are not files,
when you need to filter or process the data somehow rather than copy as-is from source to target,
when you are using old versions of Java, prior to 1.7. In this case, channels would probably be better than streams.
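Here is the promised sketch for those cases, assuming the caller supplies the streams (for example a socket on one side and a file on the other) and closes them afterwards:

import java.io.*;

static void copy(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[8192];              // 8 KB is a common buffer size
    int read;
    while ((read = in.read(buffer)) != -1) {     // -1 signals end of stream
        out.write(buffer, 0, read);              // write only the bytes actually read
    }
    out.flush();
}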
Related
I want to convert all files inside the ROOT directory into a Java 8 stream of InputStream.
Suppose there are a lot of files inside ROOT, for example:
ROOT
|--> file1
|--> file2
I want to do something like this, but using Java 8 and with the best possible performance:
List<InputStream> ins = new ArrayList<>();
File[] arr = new File(ROOT).listFiles();
for (File file : arr) {
    ins.add(FileUtils.openInputStream(file));
}
Short answer: best solution is the one that you have dismissed:
Use File.listFiles() (or equivalent) to iterate over the files in each directory.
Use recursion for nested directories.
Let's start with the performance issue. When you are uploading a large number of individual files to cloud storage, the performance bottleneck is likely to be the network and the remote server:
Unless you have an extraordinarily good network link, a single TCP stream won't transfer data anywhere near as fast as it can be read from disk (or written at the other end).
Each time you transfer a file, there is likely to be an overhead for starting the new file. The remote server has to create the new file, which entails adding a directory entry, an inode to hold the metadata, and so on.
Even on the sending side, the OS and disk overheads of reading directories and metadata are likely to dominate the Java overheads.
(But don't just trust what I say ... measure it!)
The chances are that the above overheads will be orders of magnitude greater than you can get by tweaking the Java-side file traversal.
But ignoring the above, I don't think that using the Java 8 Stream paradigm would help anyway. AFAIK, there are no special high performance "adapters" for applying streams to directory entries, so you would most likely end up with a Stream wrapper for the result of listFiles() calls. And that would not improve performance.
(You might get some benefit from parallel streams, but I don't think you will get enough control over the parallelism.)
Furthermore, you would need to deal with the fact that if your Java 8 Stream produces InputStream or similar handles, then you need to make sure that those handles are properly closed. You can't just close them all at the end, or rely on the GC to finalize them. If you do either of those, you risk running out of file descriptors.
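Putting the short answer together, here is a sketch of the recommended shape (upload() is a placeholder for whatever you do with each file): open, process, and close each stream before moving on, so you never accumulate open handles.

import java.io.*;

static void process(File dir) throws IOException {
    File[] entries = dir.listFiles();
    if (entries == null) return;                 // not a directory, or an I/O error
    for (File entry : entries) {
        if (entry.isDirectory()) {
            process(entry);                      // recursion for nested directories
        } else {
            try (InputStream in = new BufferedInputStream(new FileInputStream(entry))) {
                upload(in);                      // handle is closed before the next file
            }
        }
    }
}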
// Note: this reads one file completely into memory; the path must point to a
// regular file, not the ROOT directory itself.
InputStream is = new ByteArrayInputStream(Files.readAllBytes(Paths.get(ROOT)));
I want to be able to create a file that'll act like a zip but at the same time isn't an actual zip.
Let's say I have a program that'll take a bunch of files and directories and store them into a single file with a name and extension of data.rds, and you would need the same program to extract them out of it. I've seen that lots of different games use file formats such as .arc, .nsa, .mxdl etc. which all store many files inside of them; .rar is probably the most commonly known format. These four extensions can't be opened as a normal zip and require a specific program in order to extract the files from them. I want to learn how you would encrypt and decrypt many files into a single one without making it readable the way a normal zip file is.
Pretty much, how would one go about doing this? I know it would be a long process and won't be answered with a few simple lines of code, but if someone could point me in a direction towards learning how to do such a thing, that would be helpful.
No matter what format you invent, someone will figure it out. Anyone can decompile your code and see your algorithm.
I would just use the Zip format and give the file a different extension (which it sounds like you're already doing). An easy way to keep casual observers from opening your file is to put a couple junk bytes at the front of it:
private static final byte[] secretSignature = { 10, 20 };

void writeData(Path file) throws IOException {
    try (OutputStream out = new BufferedOutputStream(
            Files.newOutputStream(file))) {
        out.write(secretSignature);              // junk bytes hide the zip header
        ZipOutputStream zip = new ZipOutputStream(out);
        // Write zip entries
        zip.finish();                            // writes the zip central directory
    }
}

void readData(Path file) throws IOException {
    try (InputStream in = new BufferedInputStream(
            Files.newInputStream(file))) {
        in.skip(secretSignature.length);         // skip the junk bytes
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            // Read entry
        }
    }
}
You could approach it like this:
1) Start with an application that simply stores the contents of directories, lists of files, etc. in a single file. Meaning: learn how to collect all these files and how to push them into a single uncompressed archive (and, of course, ensure that you can extract things again afterwards).
2) When that step is working (and has been properly and extensively tested), add a compression resp. decompression step.
Your favorite search engine will give you many results when searching for "compression algorithms".
It depends on your goal.
I'm going to assume you wish to write your own algorithm for fun.
If you just want to pack things together and encrypt them, well, just take the files you need and write their binary content sequentially, prepending at the start of the file something like an index table that tells you where in the big file each file starts. Then encrypt everything using your algorithm of choice.
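As an illustration of the index-table idea, here is a rough sketch. The format is made up for this example: an entry count, then name/length pairs, then the raw contents in the same order; encryption and error handling are left out.

import java.io.*;
import java.nio.file.*;

static void pack(File[] files, File archive) throws IOException {
    try (DataOutputStream out = new DataOutputStream(
            new BufferedOutputStream(new FileOutputStream(archive)))) {
        out.writeInt(files.length);              // index: number of entries
        for (File f : files) {
            out.writeUTF(f.getName());           // entry name
            out.writeLong(f.length());           // entry size, lets the reader compute offsets
        }
        for (File f : files) {
            Files.copy(f.toPath(), out);         // raw content, in index order
        }
    }
}

The reader parses the index first and then knows where each file's bytes start, since the offsets are just running sums of the lengths.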
If you also want to compress them, the simplest algorithm I'd suggest implementing is Huffman coding of your binary content. Note that, while simple enough in theory, it can still be quite an ordeal to implement, so think carefully about whether it's worth it or whether you can rely on something off-the-shelf.
Bottom line: if you are doing it to teach yourself something, go for it. If you need it in a bigger project where the end goal isn't learning these things, just take something that already exists.
I sense that you are more concerned about authenticity, that is, that the archive is not modified. I will further assume that you don't really want to implement your own compression algorithms.
That being said, what you could do is the following:
Create a zip with a different extension.
Compute the SHA-1 hash of the file.
Use the SHA-1 hash to check that the archive hasn't been changed (see the sketch below).
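The sketch mentioned above, using the standard MessageDigest API (the archive path is a placeholder; for very large archives you would feed the digest in chunks instead of reading everything at once):

import java.nio.file.*;
import java.security.MessageDigest;

static byte[] sha1(Path archive) throws Exception {
    MessageDigest md = MessageDigest.getInstance("SHA-1");
    md.update(Files.readAllBytes(archive));      // fine for moderately sized archives
    return md.digest();                          // compare against the stored hash
}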
I have a Java project with a huge set of XML files (>500). Reading these files at runtime leads to performance issues.
Is there an option to load all the XML files to RAM and read from there instead of the disk?
I know there are products like RamDisk but this one is a commercial tool.
Can I copy XML files to main memory and read from main memory using any existing Java API / libraries?
I would first try memory-mapped files, as provided by RandomAccessFile and FileChannel in the standard Java library. This way the OS will be able to keep the frequently used file content in memory, effectively achieving what you want.
You can use an in-memory database to store the intermediate files (the XML files). This gives you the speed of RAM and a database together.
For reference use the following links:
http://www.mcobject.com/in_memory_database
Usage of H2 as in memory database:
http://www.javatips.net/blog/2014/07/h2-in-memory-database-example
Use the java.io.RandomAccessFile class. It behaves like a large array of bytes stored in the file system. Instances of this class support both reading and writing to a random access file.
I would also suggest using a MappedByteBuffer, which lets the OS page the file contents into memory on demand rather than you loading it all yourself:
RandomAccessFile file = new RandomAccessFile("wiki.txt", "r");
FileChannel channel = file.getChannel();
// The map mode must match the open mode: "r" requires READ_ONLY, not READ_WRITE.
MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, 1024 * 50);
And then you can read the buffer as usual.
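For example (assuming you know how the bytes are encoded):

byte[] header = new byte[16];
buf.get(header);     // copies the first 16 bytes out of the mapped region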
Have you considered creating an object structure for these files and serializing them? Java object serialization and deserialization is much faster than parsing XML; this again assumes that these 500 or so XML files don't get modified between reads.
Here is an article which talks about serialization and deserialization.
If the concern is loading file content into memory, then consider the ByteArrayInputStream and ByteArrayOutputStream classes, or even ByteBuffer; these can hold the bytes in memory.
Java object serialization/deserialization is not faster than XML writing and parsing in general. When large numbers of objects are involved Java serialization/deserialization can actually be very inefficient, because it tracks each individual object (so that repeated references aren't serialized more than once). This is great for networks of objects, but for simple tree structures it adds a lot of overhead with no gains.
Your best approach is probably to just use a fast technique for processing the XML (such as javax.xml.stream.XMLStreamReader). Unless the files are huge, the 30-40 seconds it takes to load the XML files is way out of line; you're probably using an inefficient approach to processing the XML, such as loading the files into a DOM. You can also try reading multiple files in parallel (for example with Java 8 parallel streams).
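A minimal sketch of the StAX approach, which reads the document as a stream of events instead of building a DOM (the element-counting body is just a placeholder for your real processing):

import javax.xml.stream.*;
import java.io.*;

static int countElements(File xmlFile) throws Exception {
    XMLInputFactory factory = XMLInputFactory.newInstance();
    try (InputStream in = new BufferedInputStream(new FileInputStream(xmlFile))) {
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        int elements = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                elements++;                      // replace with real per-element handling
            }
        }
        reader.close();
        return elements;
    }
}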
It looks like your main issue is the large number of files, and that RAM is not an issue. Can you confirm?
Is it possible that you do a preprocessing step where you append all these files using some kind of separator and create a big file? This way you can increase the block size of your reads and avoid the performance penalty of disk seeks.
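A rough sketch of that preprocessing step, assuming a separator that cannot occur inside the XML payload (the marker below is hypothetical):

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

static final byte[] SEPARATOR = "\n<<<FILE>>>\n".getBytes(StandardCharsets.UTF_8);

static void concatenate(File[] xmlFiles, File bigFile) throws IOException {
    try (OutputStream out = new BufferedOutputStream(new FileOutputStream(bigFile))) {
        for (File f : xmlFiles) {
            Files.copy(f.toPath(), out);         // append the file content
            out.write(SEPARATOR);                // then the separator
        }
    }
}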
Have you thought about compressing the XML files and reading those compressed XML files in? Compressed XML could be as little as 3-5% of the size of the original, or better. You can uncompress a file when it needs to be visible to users and then store it compressed again for further reading.
Here is a library I found that might help:
zip4j
It all depends on whether you read the data more than once or not.
Assume we use some sort of Java-based RamDisk (it would actually be some sort of buffer or byte array).
Further assume that processing the data takes less time than reading it. You have to read the data at least once either way, so it makes no difference whether you read it first from disk into memory and then process it from memory, or process it while reading.
If you read a file more than once, you could read all the files into memory (various options: a Buffer, byte arrays, a custom FileSystem, ...).
In case processing takes longer than reading (which seems not to be the case here), you could pre-fetch the files from disk using a separate thread and process the data from memory in another.
We have a file I/O bottleneck. We have a directory which contains lots of JPEG files, and we want to read them in in real time as a movie. Obviously this is not an ideal format, but this is a prototype object tracking system and there is no possibility to change the format as they are used elsewhere in the code.
From each file we build a frame object which basically means having a buffered image and an explicit bytebuffer containing all of the information from the image.
What is the best strategy for this? The data is on an SSD which in theory has read/write rates around 400 MB/s, but in practice is reading no more than 20 files per second (3-4 MB/s) using the naive implementation:
bufferedImg = ImageIO.read(imageFile);[1]
byte[] data = ((DataBufferByte)bufferedImg.getRaster().getDataBuffer()).getData();[2]
imgBuf = ByteBuffer.wrap(data);
However, Java offers lots of possibilities for improving this:
(1) Channels, especially FileChannels.
(2) Gathering/scattering.
(3) Direct buffers.
(4) Memory-mapped buffers.
(5) Multithreading: use a bunch of Callables to access many files simultaneously.
(6) Wrapping the files in a single large file.
(7) Other things I haven't thought of yet.
I would just like to know if anyone has extensively tested the different options and knows what is optimal. I assume that (3) is a must, but I would still like to optimise the reading of a single file as far as possible, and I'm unsure of the best strategy.
Bonus question: in the code snippet above, when does the JVM actually 'hit the disk' and read in the contents of the file? Is it at [1], or is that just a file handle which 'points' to the object? It would make sense to evaluate lazily, but I don't know how the ImageIO class is implemented.
Since ImageIO.read(imageFile) returns a fully decoded BufferedImage, I assume it hits the disk at [1] rather than returning a lazy file handle.
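If you want to experiment with option (5), a sketch along these lines might be a starting point (the pool size is an arbitrary choice, and this is not benchmarked):

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.*;
import java.util.*;
import java.util.concurrent.*;

static List<BufferedImage> readAll(File dir) throws Exception {
    File[] files = dir.listFiles();
    if (files == null) throw new IOException(dir + " is not a readable directory");
    ExecutorService pool = Executors.newFixedThreadPool(4);   // tune to your hardware
    List<Future<BufferedImage>> futures = new ArrayList<>();
    for (File f : files) {
        futures.add(pool.submit(() -> ImageIO.read(f)));      // one decode task per file
    }
    List<BufferedImage> frames = new ArrayList<>();
    for (Future<BufferedImage> fut : futures) {
        frames.add(fut.get());                                // results in file order
    }
    pool.shutdown();
    return frames;
}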
I'm writing arbitrary byte arrays (mock virus signatures of 32 bytes) into arbitrary files, and I need code to overwrite a specific file given an offset into the file. My specific question is: is there source code or a library that I can use to perform this particular task?
I've had this problem with Python file manipulation as well. I'm looking for a set of functions that can kill a line, cut/copy/paste, etc. My assumption is that these are extremely common tasks, yet I couldn't find them in the Java API or in my Google searches.
Sorry for not RTFMing well; I haven't come across any information, and I've been looking for a while now.
Maybe you are looking for something like the RandomAccessFile class in the standard Java JDK. It supports reads and writes at some offset, as well as byte arrays.
Java's RandomAccessFile is exactly what you want.
It includes methods like seek(long) that allow you to move wherever you need in the file. It also allows for reading and writing at the same time.
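A minimal sketch of that, assuming the file already exists and the offset lies within it:

import java.io.*;

static void writeAt(File file, byte[] bytes, long offset) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
        raf.seek(offset);    // move the file pointer to the given offset
        raf.write(bytes);    // overwrite in place; the rest of the file is untouched
    }
}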
As far as I know, Java primarily has lower-level functions for manipulating files directly. Here is the best I've come up with.
The actions you describe are standard in the Swing world, and for text they come down to manipulating a Document object. These act on data in memory. The class java.nio.channels.FileChannel has similar methods that act directly on a file. Neither finds the ends of lines automatically, but other classes in java.io and java.nio do.
Apache Commons has a sandbox library called Flatfile which looks like it does what you want. The problem is that no code has been released yet. You may, however, want to talk to people working on it to get some more ideas. I didn't do a general check on libraries.
Have you looked into File/FileReader/FileWriter/BufferedReader? You can get the contents of the files and manipulate them as you like: you can search the data in the files, overwrite files, create new ones, append to existing ones...
I'm not sure this is exactly what you're asking for, but I use these APIs all the time for logging, RTF editors, text file creation for email, and many other things.
As far as cut/copy/paste goes, I have not come across the ability to do that directly; however, you can output the contents of the file, "copy" whatever part of it you want, and "paste" it into a new file or append it to an existing one.
While writing a byte array to a file is a common task, writing a 32-byte array to a given file at a given offset, just once, is not something you'll find ready-made in java.io :)
To get started, would the method skeleton and comments below look reasonable to you? I bet someone here, maybe even myself, could whip it up quickly.
public static void writeFauxVirusSignature(File file, byte[] bytes, long offset) {
    // open file
    // move to offset
    // write bytes
    // close file
}
Questions:
How big could the potential target files be?
Do you need performance?
I ask because clean, easy-to-read code would use the Apache Commons libraries, but large file writes in a performance-sensitive environment will necessitate using the java.nio libraries.
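For the java.nio route, a hedged sketch using a positional FileChannel write (no separate seek needed; the method name is mine):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

static void writeAt(Path file, byte[] bytes, long offset) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
        ch.write(ByteBuffer.wrap(bytes), offset);  // write at an absolute position
    }
}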