I am processing a large number of files (say 1000) with a Java program. Processing each file takes a significant amount of time. The problem is: while processing a file, the input file sometimes cannot be accessed by the Java program due to some unknown cause (perhaps an antivirus scanner or something else), so I get "Access is denied" and ultimately a "java.io.FileNotFoundException".
One possible solution is to process the file again whenever I get the exception, but calling the function again with the file name is awkward because the function is recursive: it processes directories and files recursively. Please suggest alternative approaches.
Move the catch() inside the body of the recursive method.
void readFilesRecursively(File dirOrFile) {
    boolean successfulRead = false;
    while (!successfulRead) {
        try {
            // ......... read .........
            successfulRead = true;
        } catch (java.io.FileNotFoundException ex) {
            // file is still inaccessible; loop and retry
        }
    }
}
Alternatively, keep a list of files whose processing fails, and add a file to that list whenever you get the exception.
Once the recursive call ends, check whether the list contains anything; if it does, process those files again (a sketch follows below).
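A minimal sketch of that approach (processFile(...) stands in for the real per-file work; names here are illustrative, not from the original code):

// Collect files that fail with FileNotFoundException during the recursive walk,
// then give them a second pass once the walk has finished.
List<File> failedFiles = new ArrayList<File>();

void readFilesRecursively(File dirOrFile) {
    File[] children = dirOrFile.listFiles();
    if (children != null) {                       // it is a readable directory
        for (File child : children) {
            readFilesRecursively(child);
        }
        return;
    }
    try {
        processFile(dirOrFile);
    } catch (FileNotFoundException ex) {
        failedFiles.add(dirOrFile);               // remember it instead of retrying in place
    }
}

void retryFailedFiles() {
    for (File f : new ArrayList<File>(failedFiles)) {
        try {
            processFile(f);
            failedFiles.remove(f);
        } catch (FileNotFoundException ex) {
            // still inaccessible; log it or schedule another pass
        }
    }
}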
Linux machine, Java standalone application
I have the following situation:
a consecutive file write (which creates the destination file and writes some content to it) and a file move.
I also have a power-outage problem: the power to the computer can be cut instantly during these operations.
As a result, I find that the file was created and moved, but its content is empty.
The question is: what, under the hood, can cause this exact outcome? Given the timing, maybe the hard drive loses power before the processor and RAM do, but in that case how can the file be created and then moved, while the write that happens before the move does not survive?
I tried catching and logging the exception and debug information, but the power outage disables the logging (I/O) as well.
try {
    FileUtils.writeStringToFile(file, JsonUtils.toJson(object));
} finally {
    if (file.exists()) {
        FileUtils.moveFileToDirectory(file, new File(path), true);
    }
}
Linux file systems don't necessarily write things to disk immediately, or in exactly the order that you wrote them. That includes both file content and file / directory metadata.
So if you get a power failure at the wrong time, you may find that the file data and metadata is inconsistent.
Normally this doesn't matter. (If the power fails and you don't have a UPS, the applications go away without getting a chance to finish what they were doing.)
However, if it does matter, you can force the file to "sync" before you move it:
FileOutputStream fos = ...
// write to the file via fos
fos.getFD().sync();   // force the written content out to the device
fos.close();
// now move it
You need to read the javadoc for sync() carefully to understand what the method actually does.
You also need to read the javadoc for the method you are using to move the file regarding atomicity.
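For illustration, here is a minimal sketch that combines the sync with an atomic move via java.nio (the temp file name, the byte array content, and the target path are placeholders, and ATOMIC_MOVE throws if the file system cannot perform the move atomically):

File tmp = new File("data.tmp");                       // placeholder temp file
try (FileOutputStream fos = new FileOutputStream(tmp)) {
    fos.write(content);                                // placeholder byte[]
    fos.flush();
    fos.getFD().sync();                                // ask the OS to push the bytes to the device
}
Files.move(tmp.toPath(), Paths.get("/target/dir", "data.json"),
        StandardCopyOption.ATOMIC_MOVE);               // java.nio.file.Files / StandardCopyOption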
I have code that works with a file:
Path path = ...;
if (!path.toFile().exists() || Files.size(path) == 0L) {
    Files.write(path, data, StandardOpenOption.CREATE);
}
It works fine almost always, but in some cases it overwrites an existing file, so I end up with a corrupted file in which the old data is partially overwritten by the new data. For example, if the file content was 00000000000000 and data in the code above is AAA, I get a file containing AAA00000000000.
File access is synchronized properly, so only one thread can access the file, and only one instance of the application can run at a time. The application runs on Heroku (with its Heroku-managed filesystem); I can't reproduce the behavior on my laptop.
Is it possible that Files.size(path) returns zero for a file that contains data? How can I rewrite this code so that it works correctly? Is it possible to use other StandardOpenOption flags so that the write fails (throws an exception) if the file is not empty or doesn't exist?
What is the desired behavior for an existing file with data?
Discard existing data
You can use CREATE and TRUNCATE_EXISTING together. Actually, maybe you should pass nothing, since the default for write() is CREATE, TRUNCATE_EXISTING, WRITE, per the documentation (a sketch of these calls follows after the options).
Keep existing data
You can open it in APPEND mode rather than WRITE mode.
Do nothing if file already exists and is not empty.
This is tricky. The non-zero size report is troubling. I'd suggest using CREATE_NEW (fail if exists) and if you get the failure exception, read the file to see if it's non-empty.
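For the first two cases, a minimal sketch of the corresponding Files.write calls, reusing the path and data from the question:

// Discard existing data - this is also the default behaviour of Files.write:
Files.write(path, data, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);

// Keep existing data and append the new bytes at the end:
Files.write(path, data, StandardOpenOption.CREATE, StandardOpenOption.APPEND);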
Your code contains a race hazard because it performs a "look before you leap" check that cannot be relied upon. In between your predicate
!path.toFile().exists() || Files.size(path) == 0L
evaluating to true, which you take to mean the file has no previous content, and the Files.write call that writes to the file, a different process (or thread) could have written to the file.
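One way to close that window, sketched here, is to drop the check entirely and let the file system make the decision atomically:

// Attempt the write with CREATE_NEW, which fails atomically if the file already exists,
// instead of checking exists()/size() first.
try {
    Files.write(path, data, StandardOpenOption.CREATE_NEW);
} catch (FileAlreadyExistsException e) {
    // Another writer (or an earlier run) created the file first; decide here
    // whether to read it, skip it, or report the conflict.
}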
I want to list a large number of files (10 or 20 thousand or so) contained in a single directory, quickly and efficiently.
I have read quite a few posts, especially over here, explaining the shortcomings of Java in this area, basically due to the underlying filesystem (and that Java 7 probably has an answer to it).
Some of the posts here have proposed alternatives like native calls or piping, and I understand that the best option under normal circumstances is the plain Java call String[] sList = file.list();, which is only slightly better than file.listFiles();.
There was also a suggestion to use multithreading (and an ExecutorService).
Well, the issue is that I have very little practical know-how of writing multithreaded code, so my logic is bound to be incorrect. Still, I tried it this way:
created a list of a few thread objects
ran a loop over this list, calling .start() and immediately .sleep(500)
in the thread class, overrode the run method to include the .list()
Something like this, in the caller class:
String[] strList = null;
List<ThreadLister> threadList = new ArrayList<ThreadLister>();
ThreadLister thread = null;
for (int i = 0; i < 5; i++) {
    ThreadLister tL = new ThreadLister(fit);   // fit is the directory to list
    threadList.add(tL);
}
for (int j = 0; j < threadList.size(); j++) {
    thread = threadList.get(j);
    thread.start();
    thread.sleep(500);
}
strList = thread.fileList;
and the Thread class as:
public class ThreadLister extends Thread {
    private final File f;
    public String[] fileList;

    public ThreadLister(File f) {
        this.f = f;
    }

    @Override
    public void run() {
        fileList = f.list();
    }
}
I might be way off here with the multithreading; I suspect that myself.
I would very much appreciate a multithreaded solution to my requirement. An added benefit is that I would learn a bit more about practical multithreading.
Query Update
Well, obviously multithreading isn't going to help me (I now realise it's not actually a solution). Thank you for helping me rule out threading.
So I tried:
1. FileUtils.listFiles() from Apache Commons - not much difference.
2. A native call, viz. exec("cmd /c dir /B .\\Test") - this executes fast, but reading the resulting stream with a while loop takes ages.
What I actually need are the file names that match a certain filter, out of about 100k files in a single directory, so I am using something like File.list(new FilenameFilter()).
I believe the FilenameFilter brings no benefit, since it still has to match against all the files first and only then produces the output.
Yes, I understand that I need a different approach to storing these files. One option I can try is spreading the files across multiple directories; I have yet to try this (I don't know if it will help enough) - as suggested by Boris earlier.
What else could be a better option? Would a native call to Unix ls with a filename match work effectively? I know it doesn't work on Windows, I mean unless we are searching in the same directory.
Kind Regards
Multi-threading is useful for listing multiple directories. However, you cannot split a single call for a single directory, and I doubt it would be much faster even if you could, as the OS returns the files in whatever order it pleases.
The first thing about learning multi-threading is that not all solutions will be faster or simpler just by using multiple threads.
As a completely different suggestion: did you try Apache Commons FileUtils?
http://commons.apache.org/io/api-release/index.html Check out the method FileUtils.listFiles().
It will list all the files in a directory. Maybe it is fast enough and optimized enough for your needs. Maybe you really don't need to reinvent the wheel and the solution is already out there?
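For example, a minimal sketch (assuming commons-io is on the classpath; the directory path and the name prefix are placeholders):

// Lists matching files in a single directory without recursing into subdirectories.
// Uses org.apache.commons.io.FileUtils and org.apache.commons.io.filefilter.FileFilterUtils.
Collection<File> matches = FileUtils.listFiles(
        new File("/data/files"),                   // placeholder directory
        FileFilterUtils.prefixFileFilter("T"),     // e.g. only names starting with "T"
        null);                                     // null dirFilter = do not recurse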
What I eventually did:
1. As a quick fix, to get past the problem for the moment, I used a native call to write all the filenames to a temporary text file and then read it line by line with a BufferedReader.
2. Wrote a utility to archive the inactive files (most of them) to another location, reducing the total number of files in the active directory, so that the normal list() call returns much more quickly.
3. As a long-term solution, I will change the way these files are stored and create a directory hierarchy in which each directory holds comparatively few files, so that list() can work very fast.
One thing I noticed while testing is that the first list() call takes a long time, but subsequent requests are very, very fast. That makes me believe the JVM intelligently returns the list that is still on the heap. I tried a few things, like adding files to the directory or changing the File variable name, but the response was still instant. So I believe this array sits on the heap until it is gc'ed and Java intelligently serves the same request from it. <*Am I right? Or is that not how it behaves? Some explanation please.*>
Because of this, I thought that if I wrote a small program to fetch this list once a day and kept a static reference to it, the array would not be gc'ed and every request to retrieve the list would be fast. <*Again, some comments/suggestions appreciated.*>
Is there a way to configure Tomcat so that the GC collects all other unreferenced objects but leaves certain specified ones alone? Somebody told me something like this exists in Linux, obviously at the OS level, but I don't know whether that is true.
Which file system are you using? Each file system has its own limit on the number of files/folders a directory can hold (including the directory depth), so I am not sure how you could create that many, and, if they were created by some program, whether you could read them all back.
As suggested above, the FilenameFilter is applied after the names are read, so I am not sure it helps much (although you probably end up with smaller result lists), because each listFiles() call still retrieves the complete list.
For example:
1) Say thread 1 is capturing the list of file names starting with "T*": the listFiles() call retrieves all the thousands of file names and then filters them according to the FilenameFilter criteria.
2) Thread 2, capturing the list of file names starting with "S*", repeats all of the steps from 1.
So you end up reading the directory listing multiple times, putting more and more load on the heap, the JVM's native calls, the file system, and so on; a single scan partitioned in memory avoids that (see the sketch below).
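A rough sketch of that single-scan idea (the "T"/"S" prefixes mirror the example above, and dir stands for the directory File):

// One directory scan, then split the names in memory instead of calling
// list(new FilenameFilter()) once per prefix.
String[] all = dir.list();
List<String> startsWithT = new ArrayList<String>();
List<String> startsWithS = new ArrayList<String>();
if (all != null) {
    for (String name : all) {
        if (name.startsWith("T")) {
            startsWithT.add(name);
        } else if (name.startsWith("S")) {
            startsWithS.add(name);
        }
    }
}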
If possible, the best suggestion would be to reorganize the directory structure.
I want to save a video file in C:\ by incrementing the file name, e.g. video001.avi, video002.avi, video003.avi, etc. I want to do this in Java. The program is the one from:
Problem in java programming on windows7 (working well in windows xp)
How do I increment the file name so that it saves without replacing the older file?
Using File.createNewFile() you can atomically create a file and determine whether the current thread actually created it: the method returns a boolean telling you whether a new file was created or not. Simply checking whether a file exists before you create it helps, but it does not guarantee that, when you create and write to the file, it was the current thread that created it.
You have two options:
Just increment a counter and rely on the fact that you are the only running process writing these files (and that none exist already), so you don't need to check for clashes. This is (obviously) error-prone.
Use the File object (or Apache Commons FileUtils) to get the list of files, then increment a counter and check whether the corresponding file exists. If it doesn't, write to it and exit. This is a brute-force approach, but unless you're writing thousands of files its performance is quite acceptable (a sketch combining the counter with createNewFile() follows below).
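A minimal sketch of the counter idea combined with createNewFile(), so that a clash is detected rather than silently overwritten (the directory and naming pattern come from the question; createNewFile() throws IOException, which is left to the caller here):

// Finds the next free name such as C:\video001.avi, C:\video002.avi, ...
// createNewFile() returns true only for the caller that actually created the file.
File target;
int counter = 1;
do {
    target = new File("C:\\", String.format("video%03d.avi", counter++));
} while (!target.createNewFile());
// target now refers to a freshly created, empty file that no other caller owns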
I got the following exception while trying to seek in a file:
Error while seeking to 38128 in myFile, File length: 85742
java.io.EOFException
        at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
        at java.io.RandomAccessFile.readLong(RandomAccessFile.java:758)
But as you can see, I am seeking to 38128 whereas the file length is 85742, yet it reported an EOFException. I wonder how that is possible. Another process appends content to that file periodically and then closes its file handle; it appends the content using a DataOutputStream. My process seeks to certain locations and reads from them. One more thing: I got this exception only once; I tried to reproduce it but it never happened again. The file is on a local disk only, not on a filer.
Thanks
D. L. Kumar
I would be very careful when trying to do random access on a file that is concurrently being written to from another process. It might lead to all kinds of strange synchronisation problems, as you are experiencing right now.
Do you determine the length of the file from the same process as the one doing the seek()? Has the other, modifying process done a flush()?
The process writing the data may have been told to write it, but the data could still be sitting in a buffer. Be sure to call flush() on the output stream before attempting to read the data.
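A sketch of the writer side under that assumption (the file name and the record fields are placeholders, and exception handling is omitted):

// Append records and flush after each one so a concurrent reader never sees
// bytes that are still sitting in the BufferedOutputStream.
FileOutputStream fos = new FileOutputStream("records.dat", true);   // append mode
DataOutputStream out = new DataOutputStream(new BufferedOutputStream(fos));

out.writeLong(recordId);         // placeholder record fields
out.writeInt(recordLength);
out.flush();                     // push buffered bytes to the OS
fos.getFD().sync();              // optionally force them out to the device as well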