Suppose this is the use case:
I want to update a HashMap cache inside my class.
I have a set of keys and some conditions I want to apply to the keys and to the values retrieved with each key.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Set;

public class App {

    Set<String> keysToUpdate;
    HashMap<String, List<String>> cache;

    void buildCache() {
        keysToUpdate.stream()
                .filter(k -> true) // some filter
                .forEach(key -> {
                    // the values list comes from outside the inner stream pipeline
                    List<String> values = cache.computeIfAbsent(key, k -> new ArrayList<>());
                    getValuesforKey(key).stream()
                            .filter(v -> true) // another filter
                            // side effects are introduced
                            .forEach(value -> {
                                // some other operation, for example logging the values added:
                                // log.info("{} value added", value);
                                values.add(value);
                            });
                });
    }

    private List<String> getValuesforKey(String key) {
        // some method to get the values for the key
        return new ArrayList<>();
    }
}
We are told that shared mutability like this is bad because the execution is not deterministic, but in this specific case I am adding values to a HashMap, and I don't care about the order of execution as long as keysToUpdate doesn't contain repeated values.
Are there other aspects I haven't considered? Is this code safe if streams are parallelised?
If not, would using the collection's iterator fix the problem? (Code below.)
Or would it be best to use imperative programming instead? In what cases is shared mutability OK within a stream?
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class App {

    Set<String> keysToUpdate;
    HashMap<String, List<String>> cache;

    void buildCache() {
        keysToUpdate.stream()
                .filter(k -> true) // some filter
                .collect(Collectors.toList()) // collect before iterating
                .forEach(key -> {
                    // the values list comes from outside the inner stream pipeline
                    List<String> values = cache.computeIfAbsent(key, k -> new ArrayList<>());
                    getValuesforKey(key).stream()
                            .filter(v -> true) // another filter
                            .collect(Collectors.toList()) // collect before iterating
                            // side effects are introduced
                            .forEach(value -> {
                                // some other operation, for example logging the values added:
                                // log.info("{} value added", value);
                                values.add(value);
                            });
                });
    }

    private List<String> getValuesforKey(String key) {
        // some method to get the values for the key
        return new ArrayList<>();
    }
}
When dealing with multithreading, the question to be answered is whether race conditions may happen (that is, if two or more different threads may access the same resource at the same time with at least one of them trying to modify it).
In your example, the computeIfAbsent method will modify your map if the requested key is not already there. So, two threads can potentially modify the same resource (the cache object). To avoid this, you can obtain (at the beginning of the buildCache method) a thread-safe version of your map by using Collections.synchronizedMap() and then operate on the returned map.
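A minimal sketch of that change, assuming the pipeline from the question (parallelStream() is used here only to make the multithreaded case explicit):

void buildCache() {
    // Thread-safe view of the cache; computeIfAbsent on the returned map is synchronized
    Map<String, List<String>> syncCache = Collections.synchronizedMap(cache);
    keysToUpdate.parallelStream()
            .filter(k -> true) // some filter
            .forEach(key -> {
                List<String> values = syncCache.computeIfAbsent(key, k -> new ArrayList<>());
                getValuesforKey(key).stream()
                        .filter(v -> true) // another filter
                        .forEach(values::add); // safe only while each key is handled by a single thread
            });
}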
As for the values list, safety depends on whether two threads may operate on the same key and thus modify the same list. In your example the keys are unique, since they are obtained from a Set, so the code is safe.
Side note: the expected performance gain depends on the amount of processing the getValuesforKey method has to perform. If that is negligible, most threads will just be waiting for a lock on the map, making the performance gain minimal as well.
I have a stream of objects similar to this previous question; however, instead of ignoring duplicate values, I would like to remove any such values from that stream beforehand and print them out.
For example, from this snippet:
Map<String, String> phoneBook = people.stream()
        .collect(toMap(Person::getName,
                       Person::getAddress));
If there were duplicate entries, it would cause a java.lang.IllegalStateException: Duplicate key error to be thrown.
The solution proposed in that question used a mergeFunction to keep the first entry if a collision was found.
Map<String, String> phoneBook =
        people.stream()
              .collect(Collectors.toMap(
                      Person::getName,
                      Person::getAddress,
                      (address1, address2) -> {
                          System.out.println("duplicate key found!");
                          return address1;
                      }
              ));
Instead of keeping the first entry, if there is a collision from a duplicate key in the stream, I want to know which value caused the collision and make sure that there are no occurrences of that value within the resulting map.
I.e. if "Bob" appeared three times in the stream, it should not be in the map even once.
In the process of creating that map, I would like to filter out any duplicate names and record them some way.
I want to make sure that when creating the map there can be no duplicate entries, and that there is some way to know which entries had duplicate keys in the incoming stream. I was thinking about using groupingBy and a filter beforehand to find the duplicate keys, but I am not sure what the best way to do it is.
I would like to remove any values from that stream beforehand.
As @JimGarrison has pointed out, preprocessing the data doesn't make sense.
You can't know in advance whether a name is unique or not until the whole data set has been processed.
Another thing to consider is that inside the stream pipeline (before the collector) you have no knowledge of what data has been encountered previously, because the results of intermediate operations must not depend on any state.
In case you are thinking that streams act like a sequence of loops, and are therefore assuming that it's possible to preprocess stream elements before collecting them, that's not correct. Elements of the stream pipeline are processed lazily, one at a time; i.e. all the operations in the pipeline get applied to a single element, and each operation is applied only if it's needed (that's what laziness means).
For more information, have a look at this tutorial and the API documentation.
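To make the laziness concrete, here is a small self-contained demo (class and element names invented for illustration); each element travels through the whole pipeline before the next one is pulled:

import java.util.stream.Stream;

public class LazinessDemo {
    public static void main(String[] args) {
        Stream.of("a", "b", "c")
                .peek(e -> System.out.println("filter sees: " + e))
                .filter(e -> !e.equals("b"))
                .peek(e -> System.out.println("map sees:    " + e))
                .map(String::toUpperCase)
                .forEach(e -> System.out.println("consumed:    " + e));
    }
}

The output interleaves per element (filter sees: a, map sees: a, consumed: A, filter sees: b, ...) rather than finishing each operation over the whole data set first.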
Implementations
You can segregate unique values and duplicates in a single stream statement by utilizing Collectors.teeing() and a custom object that contains separate collections of the duplicated and unique entries of the phone book.
Since the primary function of this object is only to carry data, I've implemented it as a Java 16 record.
public record FilteredPhoneBook(Map<String, String> uniquePersonsAddressByName,
                                List<String> duplicatedNames) {}
Collector teeing() expects three arguments: two collectors and a function that merges the results produced by both collectors.
The map generated by groupingBy() in conjunction with counting() is meant to determine the duplicated names.
Since the data can't be pre-filtered, toMap(), used as the second collector, will create a map containing all the names.
When both collectors hand their results to the merger function, it takes care of removing the duplicates.
public static FilteredPhoneBook getFilteredPhoneBook(Collection<Person> people) {
    return people.stream()
            .collect(Collectors.teeing(
                    Collectors.groupingBy(Person::getName, Collectors.counting()), // intermediate Map<String, Long>
                    Collectors.toMap( // intermediate Map<String, String>
                            Person::getName,
                            Person::getAddress,
                            (left, right) -> left),
                    (Map<String, Long> countByName, Map<String, String> addressByName) -> {
                        countByName.values().removeIf(count -> count == 1); // removing unique names
                        addressByName.keySet().removeAll(countByName.keySet()); // removing all duplicates
                        return new FilteredPhoneBook(addressByName, new ArrayList<>(countByName.keySet()));
                    }
            ));
}
Another way to address this problem is to utilize a Map<String, Boolean> as the means of discovering duplicates, as @Holger has suggested.
The first collector will be written using toMap(). It will associate true with a key that has been encountered only once, and its mergeFunction will assign the value false if at least one duplicate is found.
The rest of the logic remains the same.
public static FilteredPhoneBook getFilteredPhoneBook(Collection<Person> people) {
    return people.stream()
            .collect(Collectors.teeing(
                    Collectors.toMap( // intermediate Map<String, Boolean>
                            Person::getName,
                            person -> true, // not proved to be a duplicate and initially considered unique
                            (left, right) -> false), // is a duplicate
                    Collectors.toMap( // intermediate Map<String, String>
                            Person::getName,
                            Person::getAddress,
                            (left, right) -> left),
                    (Map<String, Boolean> isUniqueByName, Map<String, String> addressByName) -> {
                        isUniqueByName.values().removeIf(Boolean::booleanValue); // removing unique names
                        addressByName.keySet().removeAll(isUniqueByName.keySet()); // removing all duplicates
                        return new FilteredPhoneBook(addressByName, new ArrayList<>(isUniqueByName.keySet()));
                    }
            ));
}
main() - demo
public static void main(String[] args) {
    List<Person> people = List.of(
            new Person("Alise", "address1"),
            new Person("Bob", "address2"),
            new Person("Bob", "address3"),
            new Person("Carol", "address4"),
            new Person("Bob", "address5")
    );

    FilteredPhoneBook filteredPhoneBook = getFilteredPhoneBook(people);

    System.out.println("Unique entries:");
    filteredPhoneBook.uniquePersonsAddressByName().forEach((k, v) -> System.out.println(k + " : " + v));

    System.out.println("\nDuplicates:");
    filteredPhoneBook.duplicatedNames().forEach(System.out::println);
}
Output
Unique entries:
Alise : address1
Carol : address4
Duplicates:
Bob
You can't know which keys are duplicates until you have processed the entire input stream. Therefore, any pre-processing step has to make a complete pass of the input before your main logic, which is wasteful.
An alternate approach could be (a sketch follows these steps):
Use the merge function to insert a dummy value for the offending key
At the same time, insert the offending key into a Set<K>
After the input stream is processed, iterate over the Set<K> to remove offending keys from the primary map.
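A minimal sketch of those steps, assuming the Person class from the other snippets. Since toMap's merge function sees only the two colliding values (not the key), this sketch marks collisions with a hypothetical sentinel value and collects the offending keys during the final pass:

private static final String DUPLICATE = "\u0000duplicate\u0000"; // hypothetical sentinel, assumed never to be a real address

static Map<String, String> makePhoneBook(Collection<Person> people) {
    Map<String, String> phoneBook = people.stream()
            .collect(Collectors.toMap(
                    Person::getName,
                    Person::getAddress,
                    (a, b) -> DUPLICATE)); // any collision marks the key with the dummy value
    Set<String> duplicateNames = new HashSet<>();
    phoneBook.entrySet().removeIf(e -> {
        if (DUPLICATE.equals(e.getValue())) { // offending key found
            duplicateNames.add(e.getKey());
            return true;                      // remove it from the primary map
        }
        return false;
    });
    duplicateNames.forEach(n -> System.out.println("duplicate key found: " + n));
    return phoneBook;
}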
In mathematical terms you want to partition your grouped aggregate and handle both parts separately.
Map<String, String> makePhoneBook(Collection<Person> people) {
    Map<Boolean, List<Person>> phoneBook = people.stream()
            .collect(Collectors.groupingBy(Person::getName))
            .values()
            .stream()
            .collect(Collectors.partitioningBy(list -> list.size() > 1,
                    Collectors.mapping(r -> r.get(0),
                            Collectors.toList())));

    // handle duplicates
    phoneBook.get(true)
            .forEach(x -> System.out.println("duplicate found " + x));

    return phoneBook.get(false).stream()
            .collect(Collectors.toMap(
                    Person::getName,
                    Person::getAddress));
}
I have this piece of code and I want to return a list of postCodes:
List<String> postcodes = new ArrayList<>();
List<Entry> entries = x.getEntry(); // getEntry() returns a list of Entry objects

for (Entry entry : entries) {
    if (entry != null) {
        Properties properties = entry.getContent().getProperties();
        postcodes.addAll(Arrays.asList(properties.getPostcodes().split(",")));
    }
}
return postcodes;
Here's my attempt to use stream() method and the following chained methods:
...some other block of code
List<Entry> entries = x.getEntry.stream()
        .filter(entry -> recordEntry != null)
        .flatMap(entry -> {
            Properties properties = recordEntry.getContent().getProperties();
            postCodes.addAll(Arrays.asList(properties.getPostcodes().split(",")));
        });
You've got several issues with your code, i.e.:
postCodes.addAll is a side effect, and you should avoid doing that; otherwise, when the code is executed in parallel you'll receive non-deterministic results.
flatMap expects a stream, not a boolean, which is what your code currently attempts to pass to it.
flatMap in this case consumes a function that takes a value and returns a value; since you've decided to use a lambda statement block, you must include a return statement within the block specifying the value to return. This is not the case in your code.
Stream pipelines are driven by terminal operations, which are operations that turn a stream into a non-stream value; your code currently will not execute at all, as you've just set up the ingredients but never actually asked for a result from the stream.
The receiver type of your query should be List<String>, not List<Entry>, as the call to Arrays.asList(properties.getPostcodes().split(",")) within your current code returns a List<String>, which you then add to an accumulator with the call to addAll.
Thanks to Holger for pointing it out: you're constantly failing to decide whether the variable is named entry or recordEntry.
That said, here's how I'd rewrite your code:
List<String> entries = x.getEntry().stream()
        .filter(Objects::nonNull)
        .map(Entry::getContent)
        .map(Content::getProperties)
        .map(Properties::getPostcodes)
        .flatMap(Pattern.compile(",")::splitAsStream)
        .collect(Collectors.toList());
and you may want to use Collectors.toCollection to specify a specific implementation of the list returned if deemed appropriate.
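For instance, a fragment swapping the final collector (LinkedList chosen arbitrarily for illustration):

.collect(Collectors.toCollection(LinkedList::new)) // returns a LinkedList<String> instead of an unspecified List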
edit:
With a couple of good suggestions from shmosel, we can use method references throughout the stream pipeline, which conveys the intent of the code better and makes it a lot easier to follow.
Or you could proceed with this approach:
List<String> entries = x.getEntry().stream()
        .filter(e -> e != null)
        .flatMap(e -> Arrays.asList(
                e.getContent().getProperties().getPostcodes().split(",")).stream())
        .collect(Collectors.toList());

if you find it more comfortable.
I have a piece of code that maintains a map of revisions done to samples with a given ID:
private Map<Long, SampleId> sampleRevisionMap = new HashMap<>();
While maintaining this, other threads can call in to get all changes made since the given revision number. To find the relevant IDs I do
public Set<SampleId> getRevisionIDs(long clientRevision) {
    return sampleRevisionMap.entrySet().stream()
            .filter(entry -> entry.getKey() > clientRevision)
            .map(entry -> entry.getValue())
            .collect(Collectors.toSet());
}
In short, give me all values with key above a threshold.
Is there a better way to do this employing an ordered map, i.e. java.util.TreeMap?
Yes, you can do it by calling tailMap:
public Collection<SampleId> getRevisionIDs(long clientRevision) {
    return sampleRevisionMap.tailMap(clientRevision).values();
}
The above includes the value mapped to clientRevision as well. If you want everything above it, use clientRevision+1 instead.
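For completeness, a sketch of the surrounding declaration: TreeMap implements NavigableMap, whose two-argument tailMap overload makes the strictly-greater query explicit.

private final NavigableMap<Long, SampleId> sampleRevisionMap = new TreeMap<>();

public Collection<SampleId> getRevisionIDs(long clientRevision) {
    // tailMap(key, false) excludes clientRevision itself
    return sampleRevisionMap.tailMap(clientRevision, false).values();
}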
Using Java 8 lambdas, what's the "best" way to effectively create a new List<T> given a List<K> of possible keys and a Map<K,V>? This is the scenario where you are given a List of possible Map keys and are expected to generate a List<T> where T is some type that is constructed based on some aspect of V, the map value types.
I've explored a few and don't feel comfortable claiming one way is better than another (with maybe one exception -- see code). I'll clarify "best" as a combination of code clarity and runtime efficiency. These are what I came up with. I'm sure someone can do better, which is one aspect of this question. I don't like the filter aspect of most as it means needing to create intermediate structures and multiple passes over the names List. Right now, I'm opting for Example 6 -- a plain 'ol loop. (NOTE: Some cryptic thoughts are in the code comments, especially "need to reference externally..." This means external from the lambda.)
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;
import java.util.stream.Collectors;

public class Java8Mapping {
    private final Map<String, Wongo> nameToWongoMap = new HashMap<>();

    public Java8Mapping() {
        List<String> names = Arrays.asList("abbey", "normal", "hans", "delbrook");
        List<String> types = Arrays.asList("crazy", "boring", "shocking", "dead");
        for (int i = 0; i < names.size(); i++) {
            nameToWongoMap.put(names.get(i), new Wongo(names.get(i), types.get(i)));
        }
    }

    public static void main(String[] args) {
        System.out.println("in main");
        Java8Mapping j = new Java8Mapping();
        List<String> testNames = Arrays.asList("abbey", "froderick", "igor");
        System.out.println(j.getBongosExample1(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
        System.out.println(j.getBongosExample2(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
        System.out.println(j.getBongosExample3(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
        System.out.println(j.getBongosExample4(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
        System.out.println(j.getBongosExample5(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
        System.out.println(j.getBongosExample6(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
    }

    private static class Wongo {
        String name;
        String type;
        public Wongo(String s, String t) { name = s; type = t; }
        @Override public String toString() { return "Wongo{name=" + name + ", type=" + type + "}"; }
    }

    private static class Bongo {
        Wongo wongo;
        public Bongo(Wongo w) { wongo = w; }
        @Override public String toString() { return "Bongo{wongo=" + wongo + "}"; }
    }

    // 1: Create a list externally and add items inside 'forEach'.
    //    Needs to externally reference Map and List.
    public List<Bongo> getBongosExample1(List<String> names) {
        final List<Bongo> listOne = new ArrayList<>();
        names.forEach(s -> {
            Wongo w = nameToWongoMap.get(s);
            if (w != null) {
                listOne.add(new Bongo(nameToWongoMap.get(s)));
            }
        });
        return listOne;
    }

    // 2: Use stream().map().collect()
    //    Needs to externally reference Map.
    public List<Bongo> getBongosExample2(List<String> names) {
        return names.stream()
                .filter(s -> nameToWongoMap.get(s) != null)
                .map(s -> new Bongo(nameToWongoMap.get(s)))
                .collect(Collectors.toList());
    }

    // 3: Create custom Collector
    //    Needs to externally reference Map.
    public List<Bongo> getBongosExample3(List<String> names) {
        Function<List<Wongo>, List<Bongo>> finisher = list -> list.stream().map(Bongo::new).collect(Collectors.toList());
        Collector<String, List<Wongo>, List<Bongo>> bongoCollector =
                Collector.of(ArrayList::new, getAccumulator(), getCombiner(), finisher, Characteristics.UNORDERED);
        return names.stream().collect(bongoCollector);
    }

    // example 3 helper code
    private BiConsumer<List<Wongo>, String> getAccumulator() {
        return (list, string) -> {
            Wongo w = nameToWongoMap.get(string);
            if (w != null) {
                list.add(w);
            }
        };
    }

    // example 3 helper code
    private BinaryOperator<List<Wongo>> getCombiner() {
        return (l1, l2) -> {
            l1.addAll(l2);
            return l1;
        };
    }

    // 4: Use internal Bongo creation facility
    public List<Bongo> getBongosExample4(List<String> names) {
        return names.stream().filter(s -> nameToWongoMap.get(s) != null).map(s -> new Bongo(nameToWongoMap.get(s))).collect(Collectors.toList());
    }

    // 5: Stream the Map EntrySet. This avoids referring to anything outside of the stream,
    //    but bypasses the lookup benefit of the Map.
    public List<Bongo> getBongosExample5(List<String> names) {
        return nameToWongoMap.entrySet().stream().filter(e -> names.contains(e.getKey())).map(e -> new Bongo(e.getValue())).collect(Collectors.toList());
    }

    // 6: Plain-ol-java loop
    public List<Bongo> getBongosExample6(List<String> names) {
        List<Bongo> bongos = new ArrayList<>();
        for (String s : names) {
            Wongo w = nameToWongoMap.get(s);
            if (w != null) {
                bongos.add(new Bongo(w));
            }
        }
        return bongos;
    }
}
If namesToWongoMap is an instance variable, you can't really avoid a capturing lambda.
You can clean up the stream by splitting up the operations a little more:
return names.stream()
        .map(n -> namesToWongoMap.get(n))
        .filter(w -> w != null)
        .map(w -> new Bongo(w))
        .collect(toList());

return names.stream()
        .map(namesToWongoMap::get)
        .filter(Objects::nonNull)
        .map(Bongo::new)
        .collect(toList());
That way you don't call get twice.
This is very much like the for loop, except, for example, it could theoretically be parallelized if namesToWongoMap can't be mutated concurrently.
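For example, a sketch of the parallel variant (only safe if the map is effectively read-only for the duration of the call):

return names.parallelStream()
        .map(namesToWongoMap::get)
        .filter(Objects::nonNull)
        .map(Bongo::new)
        .collect(toList());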
I don't like the filter aspect of most as it means needing to create intermediate structures and multiple passes over the names List.
There are no intermediate structures and there is only one pass over the List. A stream pipeline says "for each element...do this sequence of operations". Each element is visited once and the pipeline is applied.
Here are some relevant quotes from the java.util.stream package description:
A stream is not a data structure that stores elements; instead, it conveys elements from a source such as a data structure, an array, a generator function, or an I/O channel, through a pipeline of computational operations.
Processing streams lazily allows for significant efficiencies; in a pipeline such as the filter-map-sum example above, filtering, mapping, and summing can be fused into a single pass on the data, with minimal intermediate state.
Radiodef's answer pretty much nailed it, I think. The solution given there:
return names.stream()
        .map(namesToWongoMap::get)
        .filter(Objects::nonNull)
        .map(Bongo::new)
        .collect(toList());
is probably about the best that can be done in Java 8.
I did want to mention a small wrinkle in this, though. The Map.get call returns null if the name isn't present in the map, and this is subsequently filtered out. There's nothing wrong with this per se, though it does bake null-means-not-present semantics into the pipeline structure.
In some sense we'd want a mapper pipeline operation that has a choice of returning zero or one elements. A way to do this with streams is with flatMap. The flatmapper function can return an arbitrary number of elements into the stream, but in this case we want just zero or one. Here's how to do that:
return names.stream()
        .flatMap(name -> {
            Wongo w = nameToWongoMap.get(name);
            return w == null ? Stream.empty() : Stream.of(w);
        })
        .map(Bongo::new)
        .collect(toList());
I admit this is pretty clunky and so I wouldn't recommend doing this. A slightly better but somewhat obscure approach is this:
return names.stream()
        .flatMap(name -> Optional.ofNullable(nameToWongoMap.get(name))
                .map(Stream::of).orElseGet(Stream::empty))
        .map(Bongo::new)
        .collect(toList());
but I'm still not sure I'd recommend this as it stands.
The use of flatMap does point to another approach, though. If you have a more complicated policy of how to deal with the not-present case, you could refactor this into a helper function that returns a Stream containing the result or an empty Stream if there's no result.
Finally, JDK 9 -- still under development as of this writing -- has added Stream.ofNullable which is useful in exactly these situations:
return names.stream()
        .flatMap(name -> Stream.ofNullable(nameToWongoMap.get(name)))
        .map(Bongo::new)
        .collect(toList());
As an aside, JDK 9 has also added Optional.stream which creates a zero-or-one stream from an Optional. This is useful in cases where you want to call an Optional-returning function from within flatMap. See this answer and this answer for more discussion.
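For instance, a sketch of the Optional.stream variant, equivalent to the Stream.ofNullable version above (JDK 9+):

return names.stream()
        .flatMap(name -> Optional.ofNullable(nameToWongoMap.get(name)).stream())
        .map(Bongo::new)
        .collect(toList());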
One approach I didn't see is retainAll:
public List<Bongo> getBongos(List<String> names) {
    Map<String, Wongo> copy = new HashMap<>(nameToWongoMap);
    copy.keySet().retainAll(names);
    return copy.values().stream().map(Bongo::new).collect(Collectors.toList());
}
The extra Map is a minimal performance hit, since it's just copying pointers to objects, not the objects themselves.
I have been trying to learn Java 8's new functional interface features, and I am having some difficulty refactoring code that I have previously written.
As part of a test case, I want to store a list of read names in a Map structure in order to check whether those reads have been "fixed" in a subsequent section of code. I am converting from an existing Map<String, Map<String, Short>> data structure. The reason why I am flattening this data structure is that the outer String key of the original Map is not needed in the subsequent analysis (I used it to segregate data from different sources before merging them in the intermediate data). Here is my original program logic:
public class MyClass {
    private Map<String, Map<String, Short>> anchorLookup;
    ...

    public void CheckMissingAnchors(...) {
        Map<String, Boolean> anchorfound = new HashMap<>();

        // My old logic used the for-each syntax to populate the "anchorfound" map
        for (String rg : anchorLookup.keySet()) {
            for (String clone : anchorLookup.get(rg).keySet()) {
                anchorfound.put(clone, false);
            }
        }
        ...
        // Does work to identify the read name in the file. If found, the boolean in the map
        // is set to "true". Afterwards, the program prints the "true" and "false" counts in
        // the map.
    }
}
I attempted to refactor the code to use functional interfaces; however, I am getting errors from my IDE (NetBeans 8.0 Patch 2 running Java 1.8.0_05):
public class MyClass {
    private Map<String, Map<String, Short>> anchorLookup;
    ...

    public void CheckMissingAnchors(...) {
        Map<String, Boolean> anchorfound = anchorLookup.keySet()
                .stream()
                .map((s) -> anchorLookup.get(s).keySet()) // at this point I am expecting a
                // Stream<Set<String>>, which I thought could be "streamed" for the collector
                // method; however, my IDE does not allow me to select the "stream()" method
                .sequential() // this still gives me a Stream<Set<String>>
                .collect(Collectors.toMap((s) -> s, (s) -> false));
                // I receive an error for the preceding method call, as Stream<Set<String>>
                // cannot be converted to type String
        ...
    }
}
Is there a better way to create the "anchorfound" map using the Collection methods or is the vanilla Java "foreach" structure the best way to generate this data structure?
I apologize for any obvious errors in my code. My formal training was not in computer science but I would like to learn more about Java's implementation of functional programming concepts.
I believe what you need is a flatMap.
This way you convert each key of the outer map to a stream of the keys of the corresponding inner map, and then flatten them to a single stream of String.
public class MyClass {
    private Map<String, Map<String, Short>> anchorLookup;
    ...

    public void CheckMissingAnchors(...) {
        Map<String, Boolean> anchorfound = anchorLookup.keySet()
                .stream()
                .flatMap(s -> anchorLookup.get(s).keySet().stream())
                .collect(Collectors.toMap(s -> s, s -> false));
        ...
    }
}
Eran's suggestion of flatMap is a good one, +1.
This can be simplified somewhat by using Map.values() instead of Map.keySet(), since the map's keys aren't used for any other purpose than to retrieve the values. Streaming the result of Map.values() gives a Stream<Map<String,Short>>. Here we don't care about the inner map's values, so we can use keySet() to extract the keys, giving a Stream<Set<String>>. Now we just flatMap these sets into Stream<String>. Finally we send the results into the collector as before.
The resulting code looks like this:
public class MyClass {
    private Map<String, Map<String, Short>> anchorLookup;

    public void checkMissingAnchors() {
        Map<String, Boolean> anchorfound = anchorLookup.values().stream()
                .map(Map::keySet)
                .flatMap(Set::stream)
                .collect(Collectors.toMap(s -> s, s -> false));
    }
}