I'm writing a backup program using Java and the NIO package.
As far as I can tell, I could do it as in the code example below, i.e. read the folder's content list and then issue a separate file-attributes request for each file... this is not an efficient approach, especially if someone has big folders with thousands of files in them, so maybe there is another way to read a folder's contents together with all of its attributes?
Or should I use something other than NIO for this case? Thank you very much.
public void scan(Path folder) throws IOException {
try (DirectoryStream<Path> ds = Files.newDirectoryStream(folder)) {
for (Path path : ds) {
//Map<String, Object> attributes = Files.readAttributes(path, "size,lastModifiedTime");
}
}
}
Thanks to DuncG, the answer is very simple:
HashMap<Path, BasicFileAttributes> attrs = new HashMap<>();
BiPredicate<Path, BasicFileAttributes> predicate = (p, a) -> {
return attrs.put(p, a) == null;
};
Stream<Path> stream = Files.find(folder, Integer.MAX_VALUE, predicate);
I made a benchmark to compare; this version runs about 3x faster than the example from the question, so it seems to issue fewer filesystem I/O operations... in theory...
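For completeness, here is a minimal sketch of how that snippet might be wired up end to end. Note that Files.find returns a lazy stream: nothing is put into the map until the stream is consumed, and the stream should be closed afterwards (the folder argument handling and the count() terminal operation below are just illustrative):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.stream.Stream;

public class AttributeScan {
    public static void main(String[] args) throws IOException {
        Path folder = Paths.get(args.length > 0 ? args[0] : ".");
        Map<Path, BasicFileAttributes> attrs = new HashMap<>();
        // The predicate records every (path, attributes) pair handed to it by Files.find,
        // so attributes are captured during the walk itself instead of one extra request per file.
        BiPredicate<Path, BasicFileAttributes> predicate = (p, a) -> attrs.put(p, a) == null;
        try (Stream<Path> stream = Files.find(folder, Integer.MAX_VALUE, predicate)) {
            stream.count(); // consume the lazy stream so the map actually gets populated
        }
        System.out.println("Entries scanned: " + attrs.size());
    }
}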
I'm trying to move a large number of files (around 300 KB max each) from one S3 folder to another.
I'm using the AWS SDK for Java and tried to move around 1500 files.
It took too much time, and the number of files may increase to 10,000.
For each copy, the file also has to be deleted from the source folder, since there is no method to move a file.
This is what I tried:
public void moveFiles(String fromKey, String toKey) {
Stream<S3ObjectSummary> objectSummeriesStream = this.getObjectSummeries(fromKey);
objectSummeriesStream.forEach(file ->
{
this.s3Bean.copyObject(bucketName, file.getKey(), bucketName, toKey);
this.s3Bean.deleteObject(bucketName, file.getKey());
});
}
private Stream<S3ObjectSummary> getObjectSummeries(String key) {
// get the files whose prefix is "key" (these prefixes can be thought of as folders).
ListObjectsRequest listObjectsRequest = new ListObjectsRequest().withBucketName(this.bucketName)
.withPrefix(key);
ObjectListing outFilesList = this.s3Bean.listObjects(listObjectsRequest);
return outFilesList.getObjectSummaries()
.stream()
.filter(x -> !x.getKey()
.equals(key));
}
If you are writing a Java application, you can try using several threads to copy the files:
private ExecutorService executorService = Executors.newFixedThreadPool(20);

public void moveFiles(String fromKey, String toKey) {
    Stream<S3ObjectSummary> objectSummeriesStream = this.getObjectSummeries(fromKey);
    objectSummeriesStream.forEach(file ->
        executorService.submit(() -> {
            this.s3Bean.copyObject(bucketName, file.getKey(), bucketName, toKey);
            this.s3Bean.deleteObject(bucketName, file.getKey());
        }));
}
This should speed up the process.
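One caveat (a sketch, assuming the same s3Bean and bucketName fields and a hypothetical helper name): the tasks are submitted asynchronously, so moveFiles() returns before the copies actually finish. If the caller needs to know when the move is complete, wait for the executor, for example:

// Hypothetical helper: call it after all copy/delete tasks have been submitted
// if the caller must block until every file has been moved.
// (requires java.util.concurrent.TimeUnit)
private void awaitCompletion() throws InterruptedException {
    executorService.shutdown();                                   // stop accepting new tasks
    if (!executorService.awaitTermination(30, TimeUnit.MINUTES)) {
        executorService.shutdownNow();                            // cancel whatever is still running
    }
}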
An alternative might be using AWS Lambda. Once a file appears in the source bucket you can, for example, put an event in an SQS FIFO queue. A Lambda triggered by that event will then copy the single file. If I am not mistaken, you can run up to 500 Lambda instances in parallel. Should be fast.
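If it helps, here is a rough sketch of what the Lambda side could look like. The handler class name, bucket, and prefixes are placeholders, and it assumes each SQS message body simply carries the source object key, which is an assumption about how the producer fills the queue (AWS SDK for Java v1 plus aws-lambda-java-events):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// Hypothetical handler: copies each object named in an SQS message to a new key and deletes the original.
public class MoveObjectHandler implements RequestHandler<SQSEvent, Void> {

    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    private static final String BUCKET = "my-bucket";        // placeholder
    private static final String TARGET_PREFIX = "target/";   // placeholder

    @Override
    public Void handleRequest(SQSEvent event, Context context) {
        for (SQSEvent.SQSMessage message : event.getRecords()) {
            String sourceKey = message.getBody(); // assumption: the body is just the object key
            String fileName = sourceKey.substring(sourceKey.lastIndexOf('/') + 1);
            s3.copyObject(BUCKET, sourceKey, BUCKET, TARGET_PREFIX + fileName);
            s3.deleteObject(BUCKET, sourceKey);
        }
        return null;
    }
}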
Is there a way to implement cd */ using Java?
On the terminal, this command takes me to the first subdirectory.
While looking for a solution, I came across this answer:
Moving to a directory one level down
But this takes me up a level in the hierarchy. It uses a method called getParentFile(). Is there a similar method for a child, maybe?
You can't change the current working directory of a Java process (that I am aware of), but if you just need to work with File objects, you can do something like this:
File dir = new File("/some/start/path");
File[] children = dir.listFiles(file -> file.isDirectory());
if (children.length > 0) {
/* You may need a different sort order to duplicate the behavior
of the * glob character, but for example purposes... */
Arrays.sort(children, Comparator.comparing(File::getName));
/* Take the first one */
dir = children[0];
}
System.out.println("New directory is: " + dir.getAbsoluteFile());
Or if you want to use Streams, something like this would accomplish the same thing:
Path base = Paths.get("/some/start/path");
try (Stream<Path> items = Files.list(base)) {
Path found = items
.filter(Files::isDirectory)
.sorted()
.findFirst()
.orElse(base);
System.out.println("New directory is: " + found);
}
In the Streams case you should make sure to use try-with-resources, otherwise you will leak open file handles.
Here's an answer using the newer Files.newDirectoryStream method. It does basically the same thing as what Sean Bright posted, although it doesn't sort the files before selecting the first one (a sorting tweak is sketched after the code).
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class DirWalk {
    public static void main( String[] args ) throws IOException {
        List<Path> subDir = StreamSupport.stream( Files.newDirectoryStream(
                Paths.get( "." ), f -> Files.isDirectory( f ) ).spliterator(), false )
            .limit( 1 )
            .collect( Collectors.toList() );
        System.out.println( "First sub-directory found: " + subDir );
    }
}
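If you want the same ordering behavior as in the previous answer, here is a small sketch of the tweak: sort by file name before taking the first entry (java.util.Comparator is also needed; as in the original, the DirectoryStream is not explicitly closed here):

List<Path> subDir = StreamSupport.stream( Files.newDirectoryStream(
        Paths.get( "." ), f -> Files.isDirectory( f ) ).spliterator(), false )
    .sorted( Comparator.comparing( (Path p) -> p.getFileName().toString() ) )
    .limit( 1 )
    .collect( Collectors.toList() );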
Given the following code, how can I simplify it to a single, functional line?
// DELETE CSV TEMP FILES
final Map<Boolean, List<File>> deleteResults = Stream.of(tmpDir.listFiles())
.filter(tempFile -> tempFile.getName().endsWith(".csv"))
.collect(Collectors.partitioningBy(File::delete));
// LOG SUCCESSES AND FAILURES
deleteResults.entrySet().forEach(entry -> {
if (entry.getKey() && !entry.getValue().isEmpty()) {
LOGGER.debug("deleted temporary files, {}",
entry.getValue().stream().map(File::getAbsolutePath).collect(Collectors.joining(",")));
} else if (!entry.getValue().isEmpty()) {
LOGGER.debug("failed to delete temporary files, {}",
entry.getValue().stream().map(File::getAbsolutePath).collect(Collectors.joining(",")));
}
});
This is a common pattern I run into: I have a stream of things, I want to split it into two streams based on a filter, and then do one thing with stream A and another with stream B. Is this an anti-pattern, or is it supported somehow?
If you particularly don't want the explicit variable referencing the interim map, then you can just chain the operations:
.collect(Collectors.partitioningBy(File::delete))
.forEach((del, files) -> {
    if (del) {
        LOGGER.debug(... files.stream()...);
    } else {
        LOGGER.debug(... files.stream()...);
    }
});
If you want to log all files of either category together, there is no way around collecting them into a data structure until all elements are known. Still, you can simplify your code:
Stream.of(tmpDir.listFiles())
.filter(tempFile -> tempFile.getName().endsWith(".csv"))
.collect(Collectors.partitioningBy(File::delete,
Collectors.mapping(File::getAbsolutePath, Collectors.joining(","))))
.forEach((success, files) -> {
if (!files.isEmpty()) {
LOGGER.debug(success? "deleted temporary files, {}":
"failed to delete temporary files, {}",
files);
}
});
This doesn't collect the files into a List but directly into the intended String for the subsequent logging action. The logging action is also identical for both cases and differs only in the message.
Still, the most interesting thing is why deleting a file failed, which a boolean doesn’t tell. Since Java 7, the nio package provides a better alternative:
Create a helper method:
public static String deleteWithReason(Path p) {
String problem;
IOException ioEx;
try {
Files.delete(p);
return "";
}
catch(FileSystemException ex) {
problem = ex.getReason();
ioEx = ex;
}
catch(IOException ex) {
ioEx = ex;
problem = null;
}
return problem!=null? problem.replaceAll("\\.?\\R", ""): ioEx.getClass().getName();
}
and use it like this:
Files.list(tmpDir.toPath())
.filter(tempFile -> tempFile.getFileName().toString().endsWith(".csv"))
.collect(Collectors.groupingBy(YourClass::deleteWithReason,
Collectors.mapping(p -> p.toAbsolutePath().toString(), Collectors.joining(","))))
.forEach((failure, files) ->
LOGGER.debug(failure.isEmpty()? "deleted temporary files, {}":
"failed to delete temporary files, "+failure+ ", {}",
files)
);
The disadvantage, if you want to call it that, is that it does not produce a single entry for all failed files if they have different failure reasons. But that's obviously unavoidable if you want to log them together with the reason why they couldn't be deleted.
Note that if you want to exclude "being deleted by someone else concurrently" from the failures, you can simply use Files.deleteIfExists(p) instead of Files.delete(p); a file that is already gone will then be treated as a success.
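For illustration, a minimal sketch of that variant of the helper (same structure as the method above; only the delete call changes):

public static String deleteWithReason(Path p) {
    try {
        Files.deleteIfExists(p); // a file that is already gone counts as success
        return "";
    }
    catch(FileSystemException ex) {
        String reason = ex.getReason();
        return reason != null ? reason.replaceAll("\\.?\\R", "") : ex.getClass().getName();
    }
    catch(IOException ex) {
        return ex.getClass().getName();
    }
}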
I have been attempting to program a solution for ImageJ to process my images.
I understand how to get a directory, run commands on it, and so on. However, I've run into a situation where I now need some kind of search function in order to pair two images together in a directory full of image pairs.
I'm hoping you can confirm I am headed in the right direction and that my idea is sound. So far it is proving difficult to work out, as I have less than a month's worth of experience with Java. Since this project is directly for my research, I have plenty of drive to get it done; I just need some direction on which functions are useful to me.
I initially thought of using regex, but I saw that once you start processing a lot of images (especially with ImageJ, which it seems does not release memory well, if that's the correct way to say it), regex is very slow.
The general format of these images is:
someString_DAPI_0001.tif
someString_GFP_0001.tif
someString_DAPI_0002.tif
someString_GFP_0002.tif
someString_DAPI_0003.tif
someString_GFP_0003.tif
They are in alphabetical order, so it should be possible to advance to the next image in the list. I'm just a bit lost on which functions I should use to accomplish this, but I think my overall while structure is correct, thanks to some help from the Java forums. However, I'm still stuck on where to go next.
Here is my code so far (thanks to this SO answer for the partial code):
int count = 0;
getFile("C:\\");
String DAPI;
String GFP;

private void getFile(String dirPath) {
    File f = new File(dirPath);
    File[] files = f.listFiles();
    while (files.length > 0) {
        if (/* file name contains "DAPI" */) {
            // DAPI = this file's name
            // substitute "DAPI" in the name to get the 'GFP' filename
            // store the GFP file name into a variable
            // doSomething(DAPI, GFP);
            // advance to the next filename in the list
        }
    }
}
As of right now I don't really know how to search for a string within a string. I've seen regex capture groups and other solutions, but I don't know the "best" one for processing hundreds of images.
I also have no clue what function would be used to substitute substrings.
I'd much appreciate it if you could point me towards the functions best suited for this case. I like to figure out how to do things on my own; I just need help getting to the right information. I also want to make sure I am not making major logic mistakes here.
It doesn't seem like you need regex if your file names follow the simple pattern you mentioned. You can simply iterate over the files and filter based on whether the filename contains DAPI, e.g. see below. This code may be an oversimplification of your requirements, but I couldn't tell from the details you've provided.
import java.io.*;
public class Temp {
int count = 0;
private void getFile(String dirPath) {
File f = new File(dirPath);
File[] files = f.listFiles();
if (files != null) {
for (File file : files) {
if (file.getName().contains("DAPI")) {
String dapiFile = file.getName();
String gfpFile = dapiFile.replace("DAPI", "GFP");
doSomething(dapiFile, gfpFile);
}
}
}
}
//Do Something does nothing right now, expand on it.
private void doSomething(String dapiFile, String gfpFile) {
System.out.println(new File(dapiFile).getAbsolutePath());
System.out.println(new File(gfpFile).getAbsolutePath());
}
public static void main(String[] args) {
Temp app = new Temp();
app.getFile("C:\\tmp\\");
}
}
NOTE: As per Vogel612's answer, if you have Java 8 and like a functional solution you can have:
private void getFile(String dirPath) {
try {
Files.find(Paths.get(dirPath), 1, (path, basicFileAttributes) -> (path.toFile().getName().contains("DAPI"))).forEach(
dapiPath -> {
Path gfpPath = dapiPath.resolveSibling(dapiPath.getFileName().toString().replace("DAPI", "GFP"));
doSomething(dapiPath, gfpPath);
});
} catch (IOException e) {
e.printStackTrace();
}
}
//Dummy method does nothing yet.
private void doSomething(Path dapiPath, Path gfpPath) {
System.out.println(dapiPath.toAbsolutePath().toString());
System.out.println(gfpPath.toAbsolutePath().toString());
}
Using java.io.File is the wrong way to approach this problem. What you're looking for is a Stream-based solution using Files.find that would look something like this:
Files.find(dirPath, 1, (path, attributes) -> {
return path.getFileName().toString().contains("DAPI");
}).forEach(path -> {
Path gfpFile = path.resolveSibling(/*build GFP name*/);
doSomething(path, gfpFile);
});
What this does is:
Iterate over all Paths below dirPath 1 level deep (may be adjusted)
Check that the File's name contains "DAPI"
Use these files to find the relevant "GFP"-File
Give them to doSomething
This is preferable to the File-based solution for multiple reasons:
It's significantly more informative when failing
It's cleaner and terser than your File-based solution and doesn't have to check for null
It's forward compatible, and thus preferable to a File-based solution
Files.find is available from Java 8 onwards
I want to list all the FILES within the specified directory and subdirectories within that directory. No directories should be listed.
My current code is below. It does not work properly as it only lists the files and directories within the specified directory.
How can I fix this?
final List<Path> files = new ArrayList<>();
Path path = Paths.get("C:\\Users\\Danny\\Documents\\workspace\\Test\\bin\\SomeFiles");
try
{
DirectoryStream<Path> stream;
stream = Files.newDirectoryStream(path);
for (Path entry : stream)
{
files.add(entry);
}
stream.close();
}
catch (IOException e)
{
e.printStackTrace();
}
for (Path entry: files)
{
System.out.println(entry.toString());
}
Java 8 provides a nice way for that:
Files.walk(path)
This method returns Stream<Path>.
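Since the question asks for files only, here is a minimal usage sketch (the path is the one from the question). Files.walk also emits directories and the starting path itself, so filter them out, and close the stream when done:

Path path = Paths.get("C:\\Users\\Danny\\Documents\\workspace\\Test\\bin\\SomeFiles");
try (Stream<Path> walk = Files.walk(path)) {
    List<Path> files = walk.filter(Files::isRegularFile) // keep regular files, drop directories
                           .collect(Collectors.toList());
    files.forEach(System.out::println);
}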
Make a method that calls itself if the next element is a directory:
void listFiles(Path path) throws IOException {
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(path)) {
        for (Path entry : stream) {
            if (Files.isDirectory(entry)) {
                listFiles(entry);
            } else {
                files.add(entry); // add only files, not directories
            }
        }
    }
}
Check FileVisitor, very neat.
Path path= Paths.get("C:\\Users\\Danny\\Documents\\workspace\\Test\\bin\\SomeFiles");
final List<Path> files=new ArrayList<>();
try {
Files.walkFileTree(path, new SimpleFileVisitor<Path>(){
@Override
public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
if(!attrs.isDirectory()){
files.add(file);
}
return FileVisitResult.CONTINUE;
}
});
} catch (IOException e) {
e.printStackTrace();
}
If you want to avoid having the function call itself recursively and keeping the file list as a member variable, you can use a stack:
private List<Path> listFiles(Path path) throws IOException {
Deque<Path> stack = new ArrayDeque<Path>();
final List<Path> files = new LinkedList<>();
stack.push(path);
while (!stack.isEmpty()) {
try (DirectoryStream<Path> stream = Files.newDirectoryStream(stack.pop())) {
    for (Path entry : stream) {
        if (Files.isDirectory(entry)) {
            stack.push(entry);
        } else {
            files.add(entry);
        }
    }
}
}
return files;
}
This is the shortest implementation I came up with:
final List<Path> files = new ArrayList<>();
Path path = Paths.get("C:\\Users\\Danny\\Documents\\workspace\\Test\\bin\\SomeFiles");
try (Stream<Path> walk = Files.walk(path)) {
    walk.filter(Files::isRegularFile).forEach(files::add); // skip directories, as the question asks for files only
} catch (IOException e) {
    e.printStackTrace();
}
Using RxJava, the requirement can be solved in a number of ways while still sticking to DirectoryStream from the JDK.
The following combinations will give you the desired effect; I'll explain them in sequence:
Approach 1. A recursive approach using the flatMap() and defer() operators
Approach 2. A recursive approach using the flatMap() and fromCallable() operators
Note: If you replace flatMap() with concatMap(), the directory tree navigation will necessarily happen in a depth-first-search (DFS) manner. With flatMap(), the DFS effect is not guaranteed.
Approach 1: Using flatMap() and defer()
private Observable<Path> recursiveFileSystemNavigation_Using_Defer(Path dir) {
return Observable.<Path>defer(() -> {
//
// try-resource block
//
try(DirectoryStream<Path> children = Files.newDirectoryStream(dir))
{
//This intermediate storage is required because DirectoryStream can't be navigated more than once.
List<Path> subfolders = Observable.<Path>fromIterable(children)
.toList()
.blockingGet();
return Observable.<Path>fromIterable(subfolders)
/* Line X */ .flatMap(p -> !isFolder(p) ? Observable.<Path> just(p) : recursiveFileSystemNavigation_Using_Defer(p), Runtime.getRuntime().availableProcessors());
// /* Line Y */ .concatMap(p -> !isFolder(p) ? Observable.<Path> just(p) : recursiveFileSystemNavigation_Using_Defer(p));
} catch (IOException e) {
/*
This catch block is required even though DirectoryStream is Closeable
resource. Reason is that .close() call on a DirectoryStream throws a
checked exception.
*/
return Observable.<Path>empty();
}
});
}
This approach finds the children of the given directory and then emits them as Observables. If a child is a file, it is immediately available to the subscriber; otherwise flatMap() on Line X invokes the method recursively, passing each sub-directory as the argument. For each such subdirectory, flatMap() internally subscribes to its children all at the same time. This is like a chain reaction that needs to be controlled.
Therefore Runtime.getRuntime().availableProcessors() sets the maximum concurrency level for flatMap() and prevents it from subscribing to all subfolders at the same time. Without setting a concurrency level, imagine what would happen when a folder has 1000 children.
Use of defer() prevents the creation of a DirectoryStream prematurely and ensures it happens only when a real subscription to find its subfolders is made.
Finally the method returns an Observable<Path> so that a client can subscribe and do something useful with the results, as shown below:
//
// Using the defer() based approach
//
recursiveDirNavigation.recursiveFileSystemNavigation_Using_Defer(startingDir)
.subscribeOn(Schedulers.io())
.observeOn(Schedulers.from(Executors.newFixedThreadPool(1)))
.subscribe(p -> System.out.println(p.toUri()));
The disadvantage of using defer() is that it does not deal nicely with checked exceptions thrown by its argument function. Therefore, even though the DirectoryStream (which implements Closeable) was created in a try-with-resources block, we still had to catch the IOException, because the automatic closing of a DirectoryStream can throw that checked exception.
In an Rx-based style, using catch blocks for error handling feels a bit odd, because in reactive programming even errors are sent as events. So why not use an operator that exposes such errors as events?
A better alternative named fromCallable() was added in RxJava 2.x. The second approach shows how to use it.
Approach 2: Using flatMap() and fromCallable()
This approach uses the fromCallable() operator, which takes a Callable as its argument. Since we want a recursive approach, the expected result from that callable is an Observable of the children of the given folder. Since we want a subscriber to receive results as they become available, we need to return an Observable from this method. Since the result of the inner callable is an Observable of children, the net effect is an Observable of Observables.
private Observable<Observable<Path>> recursiveFileSystemNavigation_WithoutExplicitCatchBlock_UsingFromCallable(Path dir) {
/*
* fromCallable() takes a Callable argument. In this case the callable's return value itself is
* a list of sub-paths therefore the overall return value of this method is Observable<Observable<Path>>
*
* While subscribing the final results, we'd flatten this return value.
*
* Benefit of using fromCallable() is that it elegantly catches the checked exceptions thrown
* during the callable's call and exposes that via onError() operator chain if you need.
*
* Defer() operator does not give that flexibility and you have to explicitly catch and handle appropriately.
*/
return Observable.<Observable<Path>> fromCallable(() -> traverse(dir))
.onErrorReturnItem(Observable.<Path>empty());
}
private Observable<Path> traverse(Path dir) throws IOException {
//
// try-resource block
//
try(DirectoryStream<Path> children = Files.newDirectoryStream(dir))
{
//This intermediate storage is required because DirectoryStream can't be navigated more than once.
List<Path> subfolders = Observable.<Path>fromIterable(children)
.toList()
.blockingGet();
return Observable.<Path>fromIterable(subfolders)
/* Line X */ .flatMap(p -> ( !isFolder(p) ? Observable.<Path> just(p) : recursiveFileSystemNavigation_WithoutExplicitCatchBlock_UsingFromCallable(p).blockingSingle())
,Runtime.getRuntime().availableProcessors());
// /* Line Y */ .concatMap(p -> ( !isFolder(p) ? Observable.<Path> just(p) : recursiveFileSystemNavigation_WithoutExplicitCatchBlock_UsingFromCallable(p).blockingSingle() ));
}
}
A subscriber will then need to flatten the results stream as shown below:
//
// Using the fromCallable() based approach
//
recursiveDirNavigation.recursiveFileSystemNavigation_WithoutExplicitCatchBlock_UsingFromCallable(startingDir)
.subscribeOn(Schedulers.io())
.flatMap(p -> p)
.observeOn(Schedulers.from(Executors.newFixedThreadPool(1)))
.subscribe(filePath -> System.out.println(filePath.toUri()));
Why does Line X in the traverse() method use a blocking get (blockingSingle())?
Because the recursive function returns an Observable<Observable<Path>>, but flatMap() at that line needs an Observable<Path> to subscribe to.
Why does Line Y in both approaches use concatMap()?
Because concatMap() can be used when we don't want parallelism during the inner subscriptions made by flatMap().
In both approaches, the implementation of the isFolder method looks like this:
private boolean isFolder(Path p){
if(p.toFile().isFile()){
return false;
}
return true;
}
Maven coordinates for RxJava 2.x
<dependency>
<groupId>io.reactivex.rxjava2</groupId>
<artifactId>rxjava</artifactId>
<version>2.0.3</version>
</dependency>
Imports in Java file
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.Executors;
import io.reactivex.Observable;
import io.reactivex.schedulers.Schedulers;
A complete implementation: it will read every file from the subfolders. Just a quick check:
Path configFilePath = FileSystems.getDefault().getPath("C:\\Users\\sharmaat\\Desktop\\issue\\stores");
List<Path> fileWithName = Files.walk(configFilePath)
.filter(s -> s.toString().endsWith(".java"))
.map(Path::getFileName)
.sorted()
.collect(Collectors.toList());
for (Path name : fileWithName) {
// printing the name of file in every sub folder
System.out.println(name);
}
Try this: it traverses through every folder and prints both folders and files:
public static void traverseDir(Path path) {
try (DirectoryStream<Path> stream = Files.newDirectoryStream(path)) {
for (Path entry : stream) {
if (Files.isDirectory(entry)) {
System.out.println("Sub-Folder Name : " + entry.toString());
traverseDir(entry);
} else {
System.out.println("\tFile Name : " + entry.toString());
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
Try this: you will get a list of every directory and sub-directory path.
There may be unlimited levels of sub-directories, so use a recursive process.
public class DriectoryFileFilter {
private List<String> filePathList = new ArrayList<String>();
public List<String> read(File file) {
if (file.isFile()) {
filePathList.add(file.getAbsolutePath());
} else if (file.isDirectory()) {
File[] listOfFiles = file.listFiles();
if (listOfFiles != null) {
for (int i = 0; i < listOfFiles.length; i++){
read(listOfFiles[i]);
}
} else {
System.out.println("[ACCESS DENIED]");
}
}
return filePathList;
}
}