How to copy files bigger than 4.3 GB in Java

I'm writing a program part that simply copies a file from a source to a destination file. The code works as it should, but with a large file the copy aborts with an exception once the destination file reaches a size of 4.3 GB. The exception is a "file is too large" error ("Die Datei ist zu groß" in German) and looks like this:
java.io.IOException: Die Datei ist zu groß
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
at java.nio.channels.Channels.writeFully(Channels.java:101)
at java.nio.channels.Channels.access$000(Channels.java:61)
at java.nio.channels.Channels$1.write(Channels.java:174)
at java.nio.file.Files.copy(Files.java:2909)
at java.nio.file.Files.copy(Files.java:3069)
at sample.Controller.copyStream(Controller.java:318)
The method that produces this is the following:
private void copyStream(File src, File dest){
    try {
        FileInputStream fis = new FileInputStream(src);
        OutputStream newFos = java.nio.file.Files.newOutputStream(dest.toPath(), StandardOpenOption.WRITE);
        Files.copy(src.toPath(), newFos);
        newFos.flush();
        newFos.close();
        fis.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
I also tried using a java.io FileOutputStream and writing in kilobyte-sized chunks, but the same thing happens. How can I copy or create files larger than 4.3 GB? Is it maybe possible in a language other than Java? I run this program on Linux (Ubuntu 16.04 LTS).
Thanks in advance.
Edit:
Thank you all very much for your help. As you said, the file system was the problem (the ~4.3 GB limit matches FAT32's maximum file size of 4 GiB minus 1 byte). After I reformatted the drive to exFAT, it works fine.
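For anyone hitting the same wall: you can check which file system backs the destination from Java before copying. A minimal sketch, assuming nothing beyond the standard `java.nio.file` API (the helper name `fileSystemType` is my own); on a FAT-formatted drive mounted under Linux this would typically report `vfat` or `msdos`:

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FsCheck {
    // Returns the type of the file system backing the given path,
    // e.g. "ext4", "vfat", "exfat". FAT32 ("vfat") caps files at
    // 4 GiB - 1, which explains the ~4.3 GB failure above.
    static String fileSystemType(Path p) throws IOException {
        FileStore store = Files.getFileStore(p);
        return store.type();
    }

    public static void main(String[] args) throws IOException {
        Path dest = Paths.get(".");
        System.out.println("Destination file system: " + fileSystemType(dest));
    }
}
```

Checking this up front lets you fail with a clear message instead of a cryptic IOException deep inside the copy.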

POSIX (and thus Unix) systems are allowed to impose a maximum length on a path (what you get from File.getPath()) or on its components (the last of which you can get with File.getName()). You might be seeing this problem because of a long file name.
In that case, the file-open system call will fail with an ENAMETOOLONG error code.
However, the message "File too large" is typically associated with the `EFBIG` error code. That is more likely to result from a write system call:
An attempt was made to write a file that exceeds the implementation-dependent maximum file size or the process' file size limit.
Perhaps the file is being opened for appending, and the implied lseek to the end of the file is giving the EFBIG error.
Failing that, you could try other copy methods in case the problem is related to memory; another possibility is simply that the disk is full.
To copy files there are basically four ways (and it turns out plain streams are the fastest at this basic level):
Copy with streams:
private static void copyFileUsingStream(File source, File dest) throws IOException {
    InputStream is = null;
    OutputStream os = null;
    try {
        is = new FileInputStream(source);
        os = new FileOutputStream(dest);
        byte[] buffer = new byte[1024];
        int length;
        while ((length = is.read(buffer)) > 0) {
            os.write(buffer, 0, length);
        }
    } finally {
        if (is != null) is.close();
        if (os != null) os.close();
    }
}
Copy with Java NIO classes:
private static void copyFileUsingChannel(File source, File dest) throws IOException {
    FileChannel sourceChannel = null;
    FileChannel destChannel = null;
    try {
        sourceChannel = new FileInputStream(source).getChannel();
        destChannel = new FileOutputStream(dest).getChannel();
        destChannel.transferFrom(sourceChannel, 0, sourceChannel.size());
    } finally {
        if (sourceChannel != null) sourceChannel.close();
        if (destChannel != null) destChannel.close();
    }
}
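One caveat with the channel version: `transferFrom`/`transferTo` are allowed to transfer fewer bytes than requested in a single call, so for very large files it is safer to loop until everything has been moved. A sketch under that assumption (class and method names are my own):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class ChannelCopy {
    // transferTo may copy fewer bytes than asked for in one call,
    // so keep calling it until the whole file has been transferred.
    static void copy(File source, File dest) throws IOException {
        try (FileChannel in = new FileInputStream(source).getChannel();
             FileChannel out = new FileOutputStream(dest).getChannel()) {
            long position = 0;
            long size = in.size();
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
        }
    }
}
```

The try-with-resources also closes both channels even if the transfer throws, which the plain `finally` version only handles with explicit null checks.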
Copy with Apache Commons IO FileUtils:
private static void copyFileUsingApacheCommonsIO(File source, File dest) throws IOException {
    FileUtils.copyFile(source, dest);
}
and your method, using the Java 7 Files class:
private static void copyFileUsingJava7Files(File source, File dest) throws IOException {
    Files.copy(source.toPath(), dest.toPath());
}
Edit 1:
As suggested in the comments, here are three SO questions which cover the problem and explain the four different methods of copying in more detail:
Standard concise way to copy a file in Java?
File copy/move methods and approaches explanation, comparison
Reading and writing a large file using Java NIO
Thanks to @jww for pointing it out.

Related

Word Documents Will Not Open After Downloading Using Apache Commons and Java FTP [duplicate]

This question already has answers here:
Can't download files with Arabic name with Java FTP client (1 answer)
Images downloaded from "some" servers with FTPClient are corrupted (2 answers)
Closed 3 years ago.
I have followed the Apache tutorial for downloading documents from FTP using Java. I have tried two methods and both of them download the files and the file sizes are reported to be the same as the size that I see in the FTP client if I inspect the files with Filezilla. However, when I get the files on my local disk and then go to open them, Word throws an error and asks if I want to recover the document. Even if recovery did work, I need the files to be downloaded properly in the first place. Can someone shed some light on why this may be happening?
Here is the source:
private void downloadAllFiles() throws IOException {
    client.enterLocalPassiveMode();
    client.changeWorkingDirectory(ftpDirectory);
    client.setFileStructure(FTP.BINARY_FILE_TYPE);
    FTPFile[] files = client.listFiles();
    for (FTPFile f : files) {
        if (f.isFile())
            downloadFile(f);
    }
}
private void downloadFile(FTPFile ftpFile) throws IOException {
    File saveLocation = new File(fileStorageDir);
    if (!saveLocation.exists())
        saveLocation.mkdirs();
    File downloadFile = new File(fileStorageDir + "\\" + ftpFile.getName());
    OutputStream outputStream = new BufferedOutputStream(new FileOutputStream(downloadFile));
    InputStream inputStream = client.retrieveFileStream(ftpFile.getName());
    byte[] bytesArray = new byte[4096];
    int bytesRead = -1;
    while ((bytesRead = inputStream.read(bytesArray)) != -1) {
        outputStream.write(bytesArray, 0, bytesRead);
    }
    boolean success = client.completePendingCommand();
    if (success) {
        System.out.println("File has been downloaded successfully.");
    }
    outputStream.close();
    inputStream.close();
}
There are also odd characters in the file names. On the FTP server the names look correct (screenshot: how the file names should look), but when I save them via Java (with either method) the local files have strange characters in their names that I want removed (screenshot: local file names with odd characters).
Any advice on either of these problems would be much appreciated. Thank you for your help.
It should be
client.setFileType(FTP.BINARY_FILE_TYPE);
not
client.setFileStructure(FTP.BINARY_FILE_TYPE);
To fix the weird file names, set the control-connection encoding:
client.setControlEncoding("UTF-8");

How do you create non-existing folders/subdirectories when copying a file with Java InputStream?

I have used InputStream to successfully copy a file from one location to another:
public static void copy(File src, File dest) throws IOException {
    InputStream is = null;
    OutputStream os = null;
    try {
        is = new FileInputStream("C:\\test.txt");
        os = new FileOutputStream("C:\\javatest\\test.txt");
        byte[] buf = new byte[1024];
        int bytesRead;
        while ((bytesRead = is.read(buf)) > 0) {
            os.write(buf, 0, bytesRead);
        }
    } finally {
        is.close();
        os.close();
    }
}
The problem appears when I add a non-existing folder into the path, for example:
os = new FileOutputStream("C:\\javatest\\javanewfolder\\test.txt");
This returns a NullPointerException error. How can I create all of the missing directories when executing the copy process through Output Stream?
First, if possible I'd recommend using the java.nio.file classes (e.g. Path) instead of the File-based approach. You create Path objects via a file system; the default file system will do if you don't need any flexibility here:
final String folder = ...
final String filename = ...
final FileSystem fs = FileSystems.getDefault();
final Path myFile = fs.getPath(folder, filename);
Then your problem is easily solved by a very convenient API:
final Path destinationFolder = myFile.getParent();
Files.createDirectories(destinationFolder);
try (final OutputStream os = Files.newOutputStream(myFile)) {
    ...
}
The Files.createDirectories() method will not fail if the directory already exists, but it may fail due to other reasons. For example if a file "foo/bar" exists, Files.createDirectories("foo/bar/folder") will most likely not succeed. ;)
Please read the javadoc carefully!
To check if a path points to an existing directory, just use:
Files.isDirectory(somePath);
If needed, you can convert between File and Path. You will lose file system information, though:
final Path path1 = file1.toPath();
final File file2 = path2.toFile();
You could use Files.createDirectories:
Files.createDirectories(Paths.get("C:\\javatest\\javanewfolder"));
Also, you could use Files.copy to copy the file.
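Putting the two suggestions together, a minimal sketch of a copy helper that first ensures the missing parent directories exist (the class and method names are my own invention):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CopyWithDirs {
    // Creates any missing parent directories, then copies the file.
    static void copy(Path src, Path dest) throws IOException {
        Path parent = dest.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // no-op if it already exists
        }
        Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

With this, a destination like `C:\javatest\javanewfolder\test.txt` works even when `javanewfolder` does not exist yet, because the directories are created before the stream is opened.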

How to determine the compression method of a zip file

From a third party I am retrieving .zip files, which I want to unzip to another folder. To this end I found a method that does exactly that (see the code below): it iterates over all entries and unzips them to another folder. However, when I print each entry's compression method, I see that it varies between files, and for some files it prints "invalid compression method", after which further unzipping of the archive aborts.
Since the compression method seems to change, I suspect I need to set it to the correct one (although that may be a wrong assumption). Hence my question: how do I determine which compression method is needed?
The code I am using:
public void unZipIt(String zipFile, String outputFolder) {
    // create output directory if it does not exist
    File folder = new File(OUTPUT_FOLDER);
    if (!folder.exists()) {
        folder.mkdir();
    }
    FileInputStream fis = null;
    ZipInputStream zipIs = null;
    ZipEntry zEntry = null;
    try {
        fis = new FileInputStream(zipFile);
        zipIs = new ZipInputStream(new BufferedInputStream(fis));
        while ((zEntry = zipIs.getNextEntry()) != null) {
            System.out.println(zEntry.getMethod());
            try {
                byte[] tmp = new byte[4 * 1024];
                FileOutputStream fos = null;
                String opFilePath = OUTPUT_FOLDER + "\\" + zEntry.getName();
                System.out.println("Extracting file to " + opFilePath);
                fos = new FileOutputStream(opFilePath);
                int size = 0;
                while ((size = zipIs.read(tmp)) != -1) {
                    fos.write(tmp, 0, size);
                }
                fos.flush();
                fos.close();
            } catch (IOException e) {
                System.out.println(e.getMessage());
            }
        }
        zipIs.close();
    } catch (FileNotFoundException e) {
        System.out.println(e.getMessage());
    } catch (IOException ex) {
        System.out.println(ex.getMessage());
    }
}
Currently I am retrieving the following output:
8
Extracting file to C:\Users\nlmeibe2\Documents\Projects\Output_test\SOPHIS_cptyrisk_tradedata_1192_20140616.csv
8
Extracting file to C:\Users\nlmeibe2\Documents\Projects\Output_test\SOPHIS_cptyrisk_underlying_1192_20140616.csv
0
Extracting file to C:\Users\nlmeibe2\Documents\Projects\Output_test\10052013/
12
Extracting file to C:\Users\nlmeibe2\Documents\Projects\Output_test\MRM_Daily_Position_Report_Package_Level_Underlying_View_EQB_v2_COBDATE_2014-06-16_RUNDATETIME_2014-06-17-04h15.csv
invalid compression method
invalid compression method
Since you only print the exception message and not the stack trace (with line numbers), it is impossible to know exactly where the exception is thrown, but I suppose it is not thrown until you actually try to read from the ZipEntry.
If the numbers in your output are the ZIP method codes, the last entry you encounter is compressed with method 12 (bzip2), which is not supported by the Java ZIP implementation. PKWARE (the maintainer of the ZIP format) regularly adds new compression methods to the ZIP specification, and there are currently some 12-15 (not sure about the exact number) compression methods specified. Java only supports methods 0 (stored) and 8 (deflated) and will throw an exception with the message "invalid compression method" if you try to decompress a ZIP entry that uses an unsupported method.
Both WinZip and the ZIP functions in Windows may use compression methods not supported by the Java API.
Use zEntry.getMethod() to get the compression method:
"Returns the compression method of the entry, or -1 if not specified."
It will return an int, which will be
public static final int STORED
public static final int DEFLATED
or -1 if it doesn't know the method.
Docs.
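To act on this before decompression fails, you can inspect each entry's method with `ZipFile` (which reads the central directory without decompressing) instead of discovering the problem mid-stream. A small sketch, with helper names of my own:

```java
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipMethodCheck {
    // Maps each entry name to its compression method code,
    // e.g. 0 = STORED, 8 = DEFLATED, 12 = bzip2 (unsupported by Java).
    static Map<String, Integer> entryMethods(File zip) throws IOException {
        Map<String, Integer> methods = new LinkedHashMap<>();
        try (ZipFile zf = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                methods.put(e.getName(), e.getMethod());
            }
        }
        return methods;
    }

    // Java's built-in inflater handles only these two methods.
    static boolean isSupported(int method) {
        return method == ZipEntry.STORED || method == ZipEntry.DEFLATED;
    }
}
```

With this you can skip or report unsupported entries up front and hand the rest to the normal extraction loop; archives using other methods need a third-party library such as Apache Commons Compress.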

when file moved/renamed - what's the difference between java RAF and File

I only want to discuss this in a Java/Linux context.
RandomAccessFile rand = new RandomAccessFile("test.log", "r");
VS
File file = new File("test.log");
After creation, we start reading the file through to the end.
In the java.io.File case, it will throw an IOException when reading the file if you mv or delete the physical file before the read.
public void readIOFile() throws IOException, InterruptedException {
    File file = new File("/tmp/test.log");
    System.out.print("file created");
    Thread.sleep(5000);
    while (true) {
        char[] buffer = new char[1024];
        FileReader fr = new FileReader(file);
        fr.read(buffer);
        System.out.println(buffer);
    }
}
But in the RandomAccessFile case, if you mv or delete the physical file before the read, it will finish reading the file without any errors or exceptions.
public void readRAF() throws IOException, InterruptedException {
    File file = new File("/tmp/test.log");
    RandomAccessFile rand = new RandomAccessFile(file, "rw");
    System.out.println("file created");
    while (true) {
        System.out.println(file.lastModified());
        System.out.println(file.length());
        Thread.sleep(5000);
        System.out.println("finish sleeping");
        int i = (int) rand.length();
        rand.seek(0); // seek to the start of the file
        for (int ct = 0; ct < i; ct++) {
            byte b = rand.readByte(); // read byte from the file
            System.out.print((char) b); // convert byte into char
        }
    }
}
Can anyone explain why? Does it have anything to do with the file's inode?
Unlike RandomAccessFile (or, say, InputStream and many other java.io facilities), File is just an immutable handle that you pick up whenever you need to perform a filesystem action. You can think of it as a reference: a File instance points to some specified path. A RandomAccessFile, on the other hand, only has a notion of the path at construction time: it opens the file at that path and acquires a file descriptor (you can think of it as a unique id of the open file, which does not change on moves and some other operations) and uses that descriptor throughout its lifetime to address the file.
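The descriptor-vs-path behaviour can be demonstrated directly: an open `RandomAccessFile` keeps reading happily after the underlying file is renamed, because the descriptor refers to the inode, not the name. A sketch assuming Linux/POSIX semantics (class and method names are mine):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class FdAfterMove {
    // Opens the file, renames it while the descriptor is still open,
    // then reads through that descriptor. On POSIX systems the rename
    // does not invalidate the descriptor, so the read succeeds.
    static String readAfterMove(Path src, Path moved) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(src.toFile(), "r")) {
            Files.move(src, moved); // rename while the fd is open
            byte[] buf = new byte[(int) raf.length()];
            raf.readFully(buf);
            return new String(buf);
        }
    }
}
```

A plain `new FileReader(file)` after the move would fail instead, because it resolves the (now stale) path again at open time.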
OS-based file system services, such as creating folders and files, verifying permissions, and changing file names, are provided by the java.io.File class.
The java.io.RandomAccessFile class provides random access to the records stored in a data file. Using this class, data can be read, written, and manipulated. Another flexibility is that it can read and write primitive data types, which helps with a structured approach to handling data files.
Unlike the input and output stream classes in java.io, RandomAccessFile is used for both reading and writing files. RandomAccessFile does not inherit from InputStream or OutputStream. It implements the DataInput and DataOutput interfaces.
There is no evidence here that you have moved or renamed the file at all.
If you did that from outside the program, clearly it is just a timing issue.
If you rename a file before you try to open it with the old name, it will fail. Surely this is obvious?
One of the main differences is that File has no direct control over reading or writing; it requires IO streams for that. With a RandomAccessFile, we can read and write the file directly.

Poor Performance of Java's unzip utilities

I have noticed that the unzip facility in Java is extremely slow compared to using a native tool such as WinZip.
Is there a third party library available for Java that is more efficient?
Open Source is preferred.
Edit
Here is a speed comparison using the Java built-in solution vs 7zip.
I added buffered input/output streams in my original solution (thanks Jim, this did make a big difference).
Zip File size: 800K
Java Solution: 2.7 seconds
7Zip solution: 204 ms
Here is the modified code using the built-in Java decompression:
/** Unpacks the given zip file using the built-in Java facilities for unzip. */
@SuppressWarnings("unchecked")
public final static void unpack(File zipFile, File rootDir) throws IOException
{
    ZipFile zip = new ZipFile(zipFile);
    Enumeration<ZipEntry> entries = (Enumeration<ZipEntry>) zip.entries();
    while (entries.hasMoreElements()) {
        ZipEntry entry = entries.nextElement();
        java.io.File f = new java.io.File(rootDir, entry.getName());
        if (entry.isDirectory()) { // if it's a directory, skip it
            continue;
        }
        if (!f.exists()) {
            f.getParentFile().mkdirs();
            f.createNewFile();
        }
        BufferedInputStream bis = new BufferedInputStream(zip.getInputStream(entry)); // get the input stream
        BufferedOutputStream bos = new BufferedOutputStream(new java.io.FileOutputStream(f));
        while (bis.available() > 0) { // write contents of 'bis' to 'bos'
            bos.write(bis.read());
        }
        bos.close();
        bis.close();
    }
}
The problem is not the unzipping, it's the inefficient way you write the unzipped data back to disk. My benchmarks show that using
InputStream is = zip.getInputStream(entry); // get the input stream
OutputStream os = new java.io.FileOutputStream(f);
byte[] buf = new byte[4096];
int r;
while ((r = is.read(buf)) != -1) {
    os.write(buf, 0, r);
}
os.close();
is.close();
instead reduces the method's execution time by a factor of 5 (from 5 to 1 second for a 6 MB zip file).
The likely culprit is your use of bis.available(). Aside from being incorrect (available returns the number of bytes until a call to read would block, not until the end of the stream), this bypasses the buffering provided by BufferedInputStream, requiring a native system call for every byte copied into the output file.
Note that wrapping in a BufferedStream is not necessary if you use the bulk read and write methods as I do above, and that the code to close the resources is not exception safe (if reading or writing fails for any reason, neither is nor os would be closed). Finally, if you have IOUtils in the class path, I recommend using their well tested IOUtils.copy instead of rolling your own.
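To address the exception-safety point too, the close logic can be made robust with try-with-resources, letting `Files.copy` do the buffered bulk transfer. A sketch of the whole unpack loop under those choices (no zip-slip validation of entry names; the class name is mine):

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class SafeUnzip {
    // Exception-safe extraction: try-with-resources guarantees the
    // archive and each entry stream are closed even if a read or
    // write fails mid-copy, and Files.copy does bulk buffered IO.
    static void unpack(File zipFile, File rootDir) throws IOException {
        try (ZipFile zip = new ZipFile(zipFile)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                File f = new File(rootDir, entry.getName());
                if (entry.isDirectory()) continue;
                f.getParentFile().mkdirs();
                try (InputStream is = zip.getInputStream(entry)) {
                    Files.copy(is, f.toPath(), StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

This keeps the single-pass structure of the original method while eliminating both the `available()` misuse and the unclosed-stream leak on failure.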
Make sure you are feeding the unzip method a BufferedInputStream in your Java application. If you have made the mistake of using an unbuffered input stream your IO performance is guaranteed to suck.
I have found an 'inelegant' solution. There is an open source utility 7zip (www.7-zip.org) that is free to use. You can download the command line version (http://www.7-zip.org/download.html). 7-zip is only supported on Windows, but it looks like this has been ported to other platforms (p7zip).
Obviously this solution is not ideal since it is platform specific and relies on an executable. However, the speed compared to doing the unzip in Java is incredible.
Here is the code for the utility function that I created to interface with this utility. There is room for improvement as the code below is Windows specific.
/** Unpacks the zipfile to the output directory. Note: this code relies on 7-Zip
    (specifically the cmd-line version, 7za.exe). exeDir specifies the location of the 7za.exe utility. */
public static void unpack(File zipFile, File outputDir, File exeDir) throws IOException, InterruptedException
{
    if (!zipFile.exists()) throw new FileNotFoundException(zipFile.getAbsolutePath());
    if (!exeDir.exists()) throw new FileNotFoundException(exeDir.getAbsolutePath());
    if (!outputDir.exists()) outputDir.mkdirs();

    String cmd = exeDir.getAbsolutePath() + "/7za.exe -y e " + zipFile.getAbsolutePath();
    ProcessBuilder builder = new ProcessBuilder(new String[] { "cmd.exe", "/C", cmd });
    builder.directory(outputDir);
    Process p = builder.start();
    int rc = p.waitFor();
    if (rc != 0) {
        log.severe("Util::unpack() 7za process did not complete normally. rc: " + rc);
    }
}