Given the following code, how can I simplify it to a single, functional line?
// DELETE CSV TEMP FILES
final Map<Boolean, List<File>> deleteResults = Stream.of(tmpDir.listFiles())
.filter(tempFile -> tempFile.getName().endsWith(".csv"))
.collect(Collectors.partitioningBy(File::delete));
// LOG SUCCESSES AND FAILURES
deleteResults.entrySet().forEach(entry -> {
if (entry.getKey() && !entry.getValue().isEmpty()) {
LOGGER.debug("deleted temporary files, {}",
entry.getValue().stream().map(File::getAbsolutePath).collect(Collectors.joining(",")));
} else if (!entry.getValue().isEmpty()) {
LOGGER.debug("failed to delete temporary files, {}",
entry.getValue().stream().map(File::getAbsolutePath).collect(Collectors.joining(",")));
}
});
This is a common pattern I run into, where I have a stream of things, and I want to filter this stream, creating two streams based off that filter, where I can then do one thing to Stream A and another thing to Stream B. Is this an anti-pattern, or is it supported somehow?
If you particularly don't want the explicit variable referencing the interim map then you can just chain the operations:
.collect(Collectors.partitioningBy(File::delete))
.forEach((del, files) -> {
if (del) {
LOGGER.debug(... files.stream()...);
} else {
LOGGER.debug(... files.stream()...);
    }
});
If you want to log all files of either category together, there is no way around collecting them into a data structure until all elements are known. Still, you can simplify your code:
Stream.of(tmpDir.listFiles())
.filter(tempFile -> tempFile.getName().endsWith(".csv"))
.collect(Collectors.partitioningBy(File::delete,
Collectors.mapping(File::getAbsolutePath, Collectors.joining(","))))
.forEach((success, files) -> {
if (!files.isEmpty()) {
LOGGER.debug(success? "deleted temporary files, {}":
"failed to delete temporary files, {}",
files);
}
});
This doesn’t collect the files into a List but directly into the String intended for the subsequent logging call. The logging action is also identical for both cases and differs only in the message.
Still, the most interesting thing is why deleting a file failed, which a boolean doesn’t tell. Since Java 7, the nio package provides a better alternative:
Create a helper method:
public static String deleteWithReason(Path p) {
String problem;
IOException ioEx;
try {
Files.delete(p);
return "";
}
catch(FileSystemException ex) {
problem = ex.getReason();
ioEx = ex;
}
catch(IOException ex) {
ioEx = ex;
problem = null;
}
return problem!=null? problem.replaceAll("\\.?\\R", ""): ioEx.getClass().getName();
}
and use it like
Files.list(tmpDir.toPath())
.filter(tempFile -> tempFile.getFileName().toString().endsWith(".csv"))
.collect(Collectors.groupingBy(YourClass::deleteWithReason,
Collectors.mapping(p -> p.toAbsolutePath().toString(), Collectors.joining(","))))
.forEach((failure, files) ->
LOGGER.debug(failure.isEmpty()? "deleted temporary files, {}":
"failed to delete temporary files, "+failure+ ", {}",
files)
);
The disadvantage, if you want to call it that, is that it does not produce a single entry for all failed files if they have different failure reasons. But that’s obviously unavoidable if you want to log them with the reason why they couldn’t be deleted.
Note that if you want to exclude “being deleted by someone else concurrently” from the failures, you can simply use Files.deleteIfExists(p) instead of Files.delete(p); a file that is already gone will then be treated as success. A sketch of the helper with that substitution follows below.
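Otherwise identical to the version above:
public static String deleteWithReason(Path p) {
    try {
        Files.deleteIfExists(p); // a concurrently deleted file counts as success
        return "";
    }
    catch(FileSystemException ex) {
        String problem = ex.getReason();
        return problem != null ? problem.replaceAll("\\.?\\R", "") : ex.getClass().getName();
    }
    catch(IOException ex) {
        return ex.getClass().getName();
    }
}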
I need to create a list of classes declared as enum in a project in IntelliJ and, inside each one, list the respective values. Is there any automated way to accomplish this task, as it is a rather large project with hundreds of occurrences?
Yes, you can do this using the type hierarchy view and exporting the entries to a file. I'm using Eclipse shortcuts, so I can open the type hierarchy using F4. Just make sure to filter to production files.
Example using the Guava project:
As for getting all the values for each enum, I don't think there's such a feature in IntelliJ, but you could probably write a script to process the exported file.
Edit
This method will print enum values using the file exported from IntelliJ. You might need to modify it to handle errors, etc.
private static void printEnumClassesAndValues(String file) throws IOException {
Files.lines(Path.of(file))
.filter(line -> !line.contains("java.lang"))
.map(line -> {
String[] tokens = line.replaceAll("[\\(\\),]", "").trim().split(" ");
return String.format("%s.%s", tokens[1], tokens[0]); // package + class name
})
.forEach(enumClass -> System.out.printf("%s: %s%n", enumClass, Arrays.toString(getEnumValues(enumClass))));
}
private static Enum<?>[] getEnumValues(String enumClass) {
try {
Method m = Class.forName(enumClass).getDeclaredMethod("values");
return (Enum<?>[]) m.invoke(null);
} catch (Exception ex) {
throw new RuntimeException(ex); // log or handle otherwise
}
}
I downloaded my extended listening history from Spotify and I am trying to make a program to turn the data into a list of artists without duplicates that I can easily make sense of. The file is rather huge because it has data on every stream I have done since 2016 (307790 lines of text in total). This is what 2 lines of the file look like:
{"ts":"2016-10-30T18:12:51Z","username":"edgymemes69endmylifepls","platform":"Android OS 6.0.1 API 23 (HTC, 2PQ93)","ms_played":0,"conn_country":"US","ip_addr_decrypted":"68.199.250.233","user_agent_decrypted":"unknown","master_metadata_track_name":"Devil's Daughter (Holy War)","master_metadata_album_artist_name":"Ozzy Osbourne","master_metadata_album_album_name":"No Rest for the Wicked (Expanded Edition)","spotify_track_uri":"spotify:track:0pieqCWDpThDCd7gSkzx9w","episode_name":null,"episode_show_name":null,"spotify_episode_uri":null,"reason_start":"fwdbtn","reason_end":"fwdbtn","shuffle":true,"skipped":null,"offline":false,"offline_timestamp":0,"incognito_mode":false},
{"ts":"2021-03-26T18:15:15Z","username":"edgymemes69endmylifepls","platform":"Android OS 11 API 30 (samsung, SM-F700U1)","ms_played":254120,"conn_country":"US","ip_addr_decrypted":"67.82.66.3","user_agent_decrypted":"unknown","master_metadata_track_name":"Opportunist","master_metadata_album_artist_name":"Sworn In","master_metadata_album_album_name":"Start/End","spotify_track_uri":"spotify:track:3tA4jL0JFwFZRK9Q1WcfSZ","episode_name":null,"episode_show_name":null,"spotify_episode_uri":null,"reason_start":"fwdbtn","reason_end":"trackdone","shuffle":true,"skipped":null,"offline":false,"offline_timestamp":1616782259928,"incognito_mode":false},
It is formatted in the actual text file so that each stream is on its own line. NetBeans is telling me the exception is happening at line 19, and it only fails when I am looking for a substring bounded by the indexOf function. My code is below. I have no idea why this isn't working; any ideas?
import java.io.*;
import java.util.*;
public class MainClass {
public static void main(String args[]){
File dat = new File("SpotifyListeningData.txt");
List<String> list = new ArrayList<String>();
Scanner swag = null;
try {
swag = new Scanner(dat);
}
catch(Exception e) {
System.out.println("pranked");
}
while (swag.hasNextLine())
if (swag.nextLine().length() > 1)
if (list.contains(swag.nextLine().substring(swag.nextLine().indexOf("artist_name"), swag.nextLine().indexOf("master_metadata_album_album"))))
System.out.print("");
else
try {list.add(swag.nextLine().substring(swag.nextLine().indexOf("artist_name"), swag.nextLine().indexOf("master_metadata_album_album")));}
catch(Exception e) {}
System.out.println(list);
}
}
Find a JSON parser you like.
Create a class with the fields you care about, marked up to the parser's specs.
Read the file into a collection of objects. Most parsers will stream the contents so you're not storing one massive string.
You can then load the data into objects and store them as you see fit. For your purposes, a TreeSet is probably what you want.
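For illustration, a minimal sketch using Jackson (an assumption; any binding parser works similarly), assuming the download is one JSON array of records; the class names are illustrative:
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.TreeSet;

@JsonIgnoreProperties(ignoreUnknown = true)
class StreamRecord {
    // the only field we care about; every other JSON field is ignored
    @JsonProperty("master_metadata_album_artist_name")
    public String artistName;
}

public class ArtistList {
    public static void main(String[] args) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        List<StreamRecord> records = mapper.readValue(
                new File("SpotifyListeningData.txt"),
                new TypeReference<List<StreamRecord>>() {});

        TreeSet<String> artists = new TreeSet<>(); // unique and sorted
        for (StreamRecord r : records) {
            if (r.artistName != null) {            // podcast episodes have no artist
                artists.add(r.artistName);
            }
        }
        artists.forEach(System.out::println);
    }
}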
Your code will throw a lot of exceptions partly because you don't use braces. Please do use braces in every block, whether it is an if, an else, a loop, whatever. It's good practice and prevents unnecessary bugs.
More importantly, every time scanner.nextLine() is called, it reads the next line from the file, so you need to avoid calling it repeatedly for what you think is the same line.
The best way to deal with this is to write a class containing the same fields as the JSON in each line of the file, then map the JSON to that class and get the desired field value from it.
Your way is very risky and depends on the structure of the data, even on whitespace. However, I fixed some lines in your code and this will work for your purpose, although I actually don't prefer operating on strings in this way.
while (swag.hasNextLine()) {
String swagNextLine = swag.nextLine();
if (swagNextLine.length() > 1) {
String toBeAdded = swagNextLine.substring(swagNextLine.indexOf("artist_name") + "artist_name".length() + 2
, swagNextLine.indexOf("master_metadata_album_album") - 2);
if (list.contains(toBeAdded)) {
System.out.print("Match");
} else {
try {
list.add(toBeAdded);
} catch (Exception e) {
System.out.println("Add to list failed");
}
}
System.out.println(list);
}
}
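As an aside, list.contains(...) is a linear scan; if you only want unique artists, a Set does the deduplication for you. A minimal sketch along the same string-slicing lines, with the same caveats about depending on the data's structure:
Set<String> artists = new TreeSet<>();
while (swag.hasNextLine()) {
    String line = swag.nextLine();
    int start = line.indexOf("artist_name");
    int end = line.indexOf("master_metadata_album_album");
    if (start >= 0 && end >= 0) { // skip lines without an artist name
        // same offsets as in the fix above
        artists.add(line.substring(start + "artist_name".length() + 2, end - 2));
    }
}
System.out.println(artists);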
So I have a method which returns a Vavr Try:
public Try<Result> request() {...}
request comes from a source which I cannot modify. Currently, I flatMap over the result of request and, depending on whether the Result has an error, return a failed Try with an exception or a success with the data from the Result:
public Try<Data> fetchData() {
return request().flatMap(result -> {
if (result.hasError()) {
return Try.failure(new FailedRequestException());
} else {
return Try.success(result.data());
}
});
}
What I want, in some places where fetchData is used, is to first do something with the data if the Try is a success; if it is a failure, log an error when the error is a FailedRequestException, and otherwise do something else with the exception. Something like the following:
fetchData().andThen(data -> ...).onFailure(ex -> {
if (ex instanceof FailedRequestException) {
log.error("Could not fetch data: " + ex.getMessage());
} else {
// Do something with the exception
...
}
});
My problem with this approach is that fetchData returns a Try so the caller cannot know that a FailedRequestException is part of the possible failures. I can let fetchData return a Try<Either<FailedRequestException, Data>> but this doesn't feel right either. Is there any way to do the above in a more elegant way? I also tried using the Match and Case but the Case expects a Function as handler and not a Consumer.
To sum up: you actually have 3 scenarios (success, failure with FailedRequestException, any other failure). This sounds like a job for pattern matching! Let's make the code as explicit and expressive as the business requirement :)
Match(fetchData()).of(
Case($Success($()), data -> doStuff(data)),
Case($Failure($(instanceOf(FailedRequestException.class))), fre -> logFreAndReturnValue(fre)),
Case($Failure($()), e -> doSomethingWithOtherException(e))
);
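For reference, the Match/Case code above relies on a few static imports (assuming Vavr 0.9+):
import static io.vavr.API.*;                 // Match, Case, $
import static io.vavr.Patterns.$Failure;
import static io.vavr.Patterns.$Success;
import static io.vavr.Predicates.instanceOf;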
FWIW, you can rewrite your fetchData implementation as such:
request()
.mapFailure(Case($(), ignored -> new FailedRequestException()))
.map(Result::data);
As a rule of thumb, try to stick to flatMap when the context (Success or Failure) may change. In your current fetchData implementation a success remains a success and a failure remains a failure, so it is a mapping between the input and the output; hence use the map family of functions.
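A tiny illustration of that rule of thumb (hypothetical values, just to show the shapes):
// map: the function returns a plain value, so the context is preserved
Try<Integer> mapped = Try.of(() -> "42").map(String::length);

// flatMap: the function returns a Try of its own, so a Success may
// turn into a Failure depending on the inner result
Try<Integer> flatMapped = Try.of(() -> "oops")
        .flatMap(s -> Try.of(() -> Integer.parseInt(s)));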
Cheers!
I want to list all the FILES within the specified directory and subdirectories within that directory. No directories should be listed.
My current code is below. It does not work properly as it only lists the files and directories within the specified directory.
How can I fix this?
final List<Path> files = new ArrayList<>();
Path path = Paths.get("C:\\Users\\Danny\\Documents\\workspace\\Test\\bin\\SomeFiles");
try
{
DirectoryStream<Path> stream;
stream = Files.newDirectoryStream(path);
for (Path entry : stream)
{
files.add(entry);
}
stream.close();
}
catch (IOException e)
{
e.printStackTrace();
}
for (Path entry: files)
{
System.out.println(entry.toString());
}
Java 8 provides a nice way for that:
Files.walk(path)
This method returns Stream<Path>.
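For this question's requirement (files only), a minimal sketch; note that the returned stream should be closed, for example with try-with-resources:
try (Stream<Path> walk = Files.walk(path)) {
    List<Path> files = walk.filter(Files::isRegularFile)
                           .collect(Collectors.toList());
    files.forEach(System.out::println);
}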
Make a method which calls itself if the next element is a directory:
void listFiles(Path path) throws IOException {
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(path)) {
        for (Path entry : stream) {
            if (Files.isDirectory(entry)) {
                listFiles(entry);
            } else {
                files.add(entry);
            }
        }
    }
}
Check FileVisitor, very neat.
Path path= Paths.get("C:\\Users\\Danny\\Documents\\workspace\\Test\\bin\\SomeFiles");
final List<Path> files=new ArrayList<>();
try {
Files.walkFileTree(path, new SimpleFileVisitor<Path>(){
@Override
public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
if(!attrs.isDirectory()){
files.add(file);
}
return FileVisitResult.CONTINUE;
}
});
} catch (IOException e) {
e.printStackTrace();
}
If you want to avoid having the function call itself recursively and having a file list as a member variable, you can use a stack:
private List<Path> listFiles(Path path) throws IOException {
Deque<Path> stack = new ArrayDeque<Path>();
final List<Path> files = new LinkedList<>();
stack.push(path);
while (!stack.isEmpty()) {
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(stack.pop())) {
            for (Path entry : stream) {
                if (Files.isDirectory(entry)) {
                    stack.push(entry);
                } else {
                    files.add(entry);
                }
            }
        }
}
return files;
}
This is the shortest implementation I came up with:
final List<Path> files = new ArrayList<>();
Path path = Paths.get("C:\\Users\\Danny\\Documents\\workspace\\Test\\bin\\SomeFiles");
try {
Files.walk(path).filter(Files::isRegularFile).forEach(files::add);
} catch (IOException e) {
e.printStackTrace();
}
Using RxJava, the requirement can be solved in a number of ways while sticking to DirectoryStream from the JDK.
The following combinations will give you the desired effect; I'll explain them in sequence:
Approach 1. A recursive approach using the flatMap() and defer() operators
Approach 2. A recursive approach using the flatMap() and fromCallable() operators
Note: If you replace the usage of flatMap() with concatMap(), the directory tree navigation will necessarily happen in a depth-first-search (DFS) manner. With flatMap(), the DFS effect is not guaranteed.
Approach 1: Using flatMap() and defer()
private Observable<Path> recursiveFileSystemNavigation_Using_Defer(Path dir) {
return Observable.<Path>defer(() -> {
//
// try-resource block
//
try(DirectoryStream<Path> children = Files.newDirectoryStream(dir))
{
//This intermediate storage is required because DirectoryStream can't be navigated more than once.
List<Path> subfolders = Observable.<Path>fromIterable(children)
.toList()
.blockingGet();
return Observable.<Path>fromIterable(subfolders)
/* Line X */ .flatMap(p -> !isFolder(p) ? Observable.<Path> just(p) : recursiveFileSystemNavigation_Using_Defer(p), Runtime.getRuntime().availableProcessors());
// /* Line Y */ .concatMap(p -> !isFolder(p) ? Observable.<Path> just(p) : recursiveFileSystemNavigation_Using_Defer(p));
} catch (IOException e) {
/*
This catch block is required even though DirectoryStream is Closeable
resource. Reason is that .close() call on a DirectoryStream throws a
checked exception.
*/
return Observable.<Path>empty();
}
});
}
This approach finds the children of the given directory and then emits them as Observables. If a child is a file, it is immediately available to a subscriber; otherwise flatMap() on Line X invokes the method recursively, passing each sub-directory as an argument. For each such sub-directory, flatMap() internally subscribes to its children, all at the same time. This is like a chain reaction which needs to be controlled.
Therefore the use of Runtime.getRuntime().availableProcessors() sets the maximum concurrency level for flatMap() and prevents it from subscribing to all subfolders at the same time. Without setting the concurrency level, imagine what would happen for a folder with 1000 children.
Use of defer() prevents the creation of a DirectoryStream prematurely and ensures it will happen only when a real subscription to find its subfolders is made.
Finally the method returns an Observable<Path> so that a client can subscribe and do something useful with the results, as shown below:
//
// Using the defer() based approach
//
recursiveDirNavigation.recursiveFileSystemNavigation_Using_Defer(startingDir)
.subscribeOn(Schedulers.io())
.observeOn(Schedulers.from(Executors.newFixedThreadPool(1)))
.subscribe(p -> System.out.println(p.toUri()));
The disadvantage of defer() is that it does not deal nicely with checked exceptions thrown by its argument function. Therefore, even though the DirectoryStream (which implements Closeable) was created in a try-with-resources block, we still had to catch the IOException, because the automatic closing of a DirectoryStream can throw that checked exception.
In Rx style, using catch blocks for error handling feels a bit odd, because in reactive programming even errors are sent as events. So why not use an operator which exposes such errors as events?
A better alternative, fromCallable(), was added in RxJava 2.x. The second approach shows its use.
Approach 2: Using flatMap() and fromCallable()
This approach uses the fromCallable() operator, which takes a Callable as its argument. Since we want a recursive approach, the expected result of that callable is an Observable of the children of the given folder. Since we want a subscriber to receive results as they become available, we need to return an Observable from this method. Since the result of the inner callable is itself an Observable of children, the net effect is an Observable of Observables.
private Observable<Observable<Path>> recursiveFileSystemNavigation_WithoutExplicitCatchBlock_UsingFromCallable(Path dir) {
/*
* fromCallable() takes a Callable argument. In this case the callbale's return value itself is
* a list of sub-paths therefore the overall return value of this method is Observable<Observable<Path>>
*
* While subscribing the final results, we'd flatten this return value.
*
* Benefit of using fromCallable() is that it elegantly catches the checked exceptions thrown
* during the callable's call and exposes that via onError() operator chain if you need.
*
* Defer() operator does not give that flexibility and you have to explicitly catch and handle appropriately.
*/
return Observable.<Observable<Path>> fromCallable(() -> traverse(dir))
.onErrorReturnItem(Observable.<Path>empty());
}
private Observable<Path> traverse(Path dir) throws IOException {
//
// try-resource block
//
try(DirectoryStream<Path> children = Files.newDirectoryStream(dir))
{
//This intermediate storage is required because DirectoryStream can't be navigated more than once.
List<Path> subfolders = Observable.<Path>fromIterable(children)
.toList()
.blockingGet();
return Observable.<Path>fromIterable(subfolders)
/* Line X */ .flatMap(p -> ( !isFolder(p) ? Observable.<Path> just(p) : recursiveFileSystemNavigation_WithoutExplicitCatchBlock_UsingFromCallable(p).blockingSingle())
,Runtime.getRuntime().availableProcessors());
// /* Line Y */ .concatMap(p -> ( !isFolder(p) ? Observable.<Path> just(p) : recursiveFileSystemNavigation_WithoutExplicitCatchBlock_UsingFromCallable(p).blockingSingle() ));
}
}
A subscriber will then need to flatten the results stream as shown below:
//
// Using the fromCallable() based approach
//
recursiveDirNavigation.recursiveFileSystemNavigation_WithoutExplicitCatchBlock_UsingFromCallable(startingDir)
.subscribeOn(Schedulers.io())
.flatMap(p -> p)
.observeOn(Schedulers.from(Executors.newFixedThreadPool(1)))
.subscribe(filePath -> System.out.println(filePath.toUri()));
Why does Line X in the traverse() method use blockingSingle()?
Because the recursive function returns an Observable<Observable<Path>>, but flatMap() at that line needs an Observable<Path> to subscribe to.
Why does Line Y in both approaches use concatMap()?
Because concatMap() can be comfortably used when we don't want parallelism during the inner subscriptions made by flatMap().
In both approaches, the implementation of the isFolder method looks like this:
private boolean isFolder(Path p){
    return !p.toFile().isFile();
}
Maven coordinates for RxJava 2.x
<dependency>
<groupId>io.reactivex.rxjava2</groupId>
<artifactId>rxjava</artifactId>
<version>2.0.3</version>
</dependency>
Imports in Java file
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.Executors;
import io.reactivex.Observable;
import io.reactivex.schedulers.Schedulers;
To complete the implementation, here is a quick check that reads every file name from the subfolders:
Path configFilePath = FileSystems.getDefault().getPath("C:\\Users\\sharmaat\\Desktop\\issue\\stores");
List<Path> fileWithName = Files.walk(configFilePath)
.filter(s -> s.toString().endsWith(".java"))
.map(Path::getFileName)
.sorted()
.collect(Collectors.toList());
for (Path name : fileWithName) {
// printing the name of file in every sub folder
System.out.println(name);
}
Try this: it traverses every folder and prints both folders and files:
public static void traverseDir(Path path) {
try (DirectoryStream<Path> stream = Files.newDirectoryStream(path)) {
for (Path entry : stream) {
if (Files.isDirectory(entry)) {
System.out.println("Sub-Folder Name : " + entry.toString());
traverseDir(entry);
} else {
System.out.println("\tFile Name : " + entry.toString());
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
Try this: you will get a list of every file path under the directory and its sub-directories.
Since sub-directories can be nested without limit, use a recursive process.
public class DirectoryFileFilter {
private List<String> filePathList = new ArrayList<String>();
public List<String> read(File file) {
if (file.isFile()) {
filePathList.add(file.getAbsolutePath());
} else if (file.isDirectory()) {
File[] listOfFiles = file.listFiles();
if (listOfFiles != null) {
for (int i = 0; i < listOfFiles.length; i++){
read(listOfFiles[i]);
}
} else {
System.out.println("[ACCESS DENIED]");
}
}
return filePathList;
}
}
You want to populate an associative array in order to perform a map-side join. You’ve decided to
put this information in a text file, place that file into the DistributedCache and read it in your
Mapper before any records are processed.
Identify which method in the Mapper you should use to implement code for reading the file and
populating the associative array?
map or configure?
I believe you're looking for the setup() method.
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Mapper.html#setup%28org.apache.hadoop.mapreduce.Mapper.Context%29
It gets called once at the initialization of each mapper task. So if there's anything you want to do before the map task starts to read the key/value pairs through the map method (such as, in your question, to read a file off the distributed cache and populate some member vars with the info), then that is the place to do it.
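A minimal sketch of that pattern with the newer mapreduce API (the class name, file layout, and tab-separated format are illustrative, not from the question):
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> joinTable = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Runs once per mapper task, before any call to map():
        // read the cached file and populate the associative array.
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles != null && cacheFiles.length > 0) {
            // Cached files are localized into the task's working directory.
            try (BufferedReader reader = new BufferedReader(
                    new FileReader(new File(cacheFiles[0].getPath()).getName()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split("\t", 2);
                    joinTable.put(parts[0], parts[1]);
                }
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... look up joinTable here to perform the map-side join ...
    }
}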
class IndexMapperExample implements Mapper {
void configure(JobConf conf) {
try {
String stopwordCacheName = new Path(HDFS_STOPWORD_LIST).getName();
Path [] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
if (null != cacheFiles && cacheFiles.length > 0) {
for (Path cachePath : cacheFiles) {
if (cachePath.getName().equals(stopwordCacheName)) {
loadStopWords(cachePath);
break;
}
}
}
} catch (IOException ioe) {
System.err.println("IOException reading from distributed cache");
System.err.println(ioe.toString());
}
    }
}
In the above code snippet, the file from the distributed cache gets read in the configure method, which is the old-API counterpart of setup().