How to avoid multiple Streams with Java 8 - java

I am having the below code
trainResponse.getIds().stream()
.filter(id -> id.getType().equalsIgnoreCase("Company"))
.findFirst()
.ifPresent(id -> {
domainResp.setId(id.getId());
});
trainResponse.getIds().stream()
.filter(id -> id.getType().equalsIgnoreCase("Private"))
.findFirst()
.ifPresent(id ->
domainResp.setPrivateId(id.getId())
);
Here I'm iterating/streaming the list of Id objects 2 times.
The only difference between the two streams is in the filter() operation.
How to achieve it in single iteration, and what is the best approach (in terms of time and space complexity) to do this?

You can achieve that with Stream IPA in one pass though the given set of data and without increasing memory consumption (i.e. the result will contain only ids having required attributes).
For that, you can create a custom Collector that will expect as its parameters a Collection attributes to look for and a Function responsible for extracting the attribute from the stream element.
That's how this generic collector could be implemented.
/** *
* #param <T> - the type of stream elements
* #param <F> - the type of the key (a field of the stream element)
*/
class CollectByKey<T, F> implements Collector<T, Map<F, T>, Map<F, T>> {
private final Set<F> keys;
private final Function<T, F> keyExtractor;
public CollectByKey(Collection<F> keys, Function<T, F> keyExtractor) {
this.keys = new HashSet<>(keys);
this.keyExtractor = keyExtractor;
}
#Override
public Supplier<Map<F, T>> supplier() {
return HashMap::new;
}
#Override
public BiConsumer<Map<F, T>, T> accumulator() {
return this::tryAdd;
}
private void tryAdd(Map<F, T> map, T item) {
F key = keyExtractor.apply(item);
if (keys.remove(key)) {
map.put(key, item);
}
}
#Override
public BinaryOperator<Map<F, T>> combiner() {
return this::tryCombine;
}
private Map<F, T> tryCombine(Map<F, T> left, Map<F, T> right) {
right.forEach(left::putIfAbsent);
return left;
}
#Override
public Function<Map<F, T>, Map<F, T>> finisher() {
return Function.identity();
}
#Override
public Set<Characteristics> characteristics() {
return Collections.emptySet();
}
}
main() - demo (dummy Id class is not shown)
public class CustomCollectorByGivenAttributes {
public static void main(String[] args) {
List<Id> ids = List.of(new Id(1, "Company"), new Id(2, "Fizz"),
new Id(3, "Private"), new Id(4, "Buzz"));
Map<String, Id> idByType = ids.stream()
.collect(new CollectByKey<>(List.of("Company", "Private"), Id::getType));
idByType.forEach((k, v) -> {
if (k.equalsIgnoreCase("Company")) domainResp.setId(v);
if (k.equalsIgnoreCase("Private")) domainResp.setPrivateId(v);
});
System.out.println(idByType.keySet()); // printing keys - added for demo purposes
}
}
Output
[Company, Private]
Note, after the set of keys becomes empty (i.e. all resulting data has been fetched) the further elements of the stream will get ignored, but still all remained data is required to be processed.

IMO, the two streams solution is the most readable. And it may even be the most efficient solution using streams.
IMO, the best way to avoid multiple streams is to use a classical loop. For example:
// There may be bugs ...
boolean seenCompany = false;
boolean seenPrivate = false;
for (Id id: getIds()) {
if (!seenCompany && id.getType().equalsIgnoreCase("Company")) {
domainResp.setId(id.getId());
seenCompany = true;
} else if (!seenPrivate && id.getType().equalsIgnoreCase("Private")) {
domainResp.setPrivateId(id.getId());
seenPrivate = true;
}
if (seenCompany && seenPrivate) {
break;
}
}
It is unclear whether that is more efficient to performing one iteration or two iterations. It will depend on the class returned by getIds() and the code of iteration.
The complicated stuff with two flags is how you replicate the short circuiting behavior of findFirst() in your 2 stream solution. I don't know if it is possible to do that at all using one stream. If you can, it will involve something pretty cunning code.
But as you can see your original solution with 2 stream is clearly easier to understand than the above.
The main point of using streams is to make your code simpler. It is not about efficiency. When you try to do complicated things to make the streams more efficient, you are probably defeating the (true) purpose of using streams in the first place.

For your list of ids, you could just use a map, then assign them after retrieving, if present.
Map<String, Integer> seen = new HashMap<>();
for (Id id : ids) {
if (seen.size() == 2) {
break;
}
seen.computeIfAbsent(id.getType().toLowerCase(), v->id.getId());
}
If you want to test it, you can use the following:
record Id(String getType, int getId) {
#Override
public String toString() {
return String.format("[%s,%s]", getType, getId);
}
}
Random r = new Random();
List<Id> ids = r.ints(20, 1, 100)
.mapToObj(id -> new Id(
r.nextBoolean() ? "Company" : "Private", id))
.toList();
Edited to allow only certain types to be checked
If you have more than two types but only want to check on certain ones, you can do it as follows.
the process is the same except you have a Set of allowed types.
You simply check to see that your are processing one of those types by using contains.
Map<String, Integer> seen = new HashMap<>();
Set<String> allowedTypes = Set.of("company", "private");
for (Id id : ids) {
String type = id.getType();
if (allowedTypes.contains(type.toLowerCase())) {
if (seen.size() == allowedTypes.size()) {
break;
}
seen.computeIfAbsent(type,
v -> id.getId());
}
}
Testing is similar except that additional types need to be included.
create a list of some types that could be present.
and build a list of them as before.
notice that the size of allowed types replaces the value 2 to permit more than two types to be checked before exiting the loop.
List<String> possibleTypes =
List.of("Company", "Type1", "Private", "Type2");
Random r = new Random();
List<Id> ids =
r.ints(30, 1, 100)
.mapToObj(id -> new Id(possibleTypes.get(
r.nextInt((possibleTypes.size()))),
id))
.toList();

You can group by type and check the resulting map.
I suppose the type of ids is IdType.
Map<String, List<IdType>> map = trainResponse.getIds()
.stream()
.collect(Collectors.groupingBy(
id -> id.getType().toLowerCase()));
Optional.ofNullable(map.get("company")).ifPresent(ids -> domainResp.setId(ids.get(0).getId()));
Optional.ofNullable(map.get("private")).ifPresent(ids -> domainResp.setPrivateId(ids.get(0).getId()));

I'd recommend a traditionnal for loop. In addition of being easily scalable, this prevents you from traversing the collection multiple times.
Your code looks like something that'll be generalised in the future, thus my generic approch.
Here's some pseudo code (with errors, just for the sake of illustration)
Set<String> matches = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
for(id : trainResponse.getIds()) {
if (! matches.add(id.getType())) {
continue;
}
switch (id.getType().toLowerCase()) {
case "company":
domainResp.setId(id.getId());
break;
case "private":
...
}
}

Something along these lines can might work, it would go through the whole stream though, and won't stop at the first occurrence.
But assuming a small stream and only one Id for each type, why not?
Map<String, Consumer<String>> setters = new HashMap<>();
setters.put("Company", domainResp::setId);
setters.put("Private", domainResp::setPrivateId);
trainResponse.getIds().forEach(id -> {
if (setters.containsKey(id.getType())) {
setters.get(id.getType()).accept(id.getId());
}
});

We can use the Collectors.filtering from Java 9 onwards to collect the values based on condition.
For this scenario, I have changed code like below
final Map<String, String> results = trainResponse.getIds()
.stream()
.collect(Collectors.filtering(
id -> id.getType().equals("Company") || id.getIdContext().equals("Private"),
Collectors.toMap(Id::getType, Id::getId, (first, second) -> first)));
And getting the id from results Map.

Related

Converting array iteration to lambda function using Java8

I am trying to convert to Lambda function
So far I am able to convert the above code to lambda function like as shown below
Stream.of(acceptedDetails, rejectedDetails)
.filter(list -> !isNull(list) && list.length > 0)
.forEach(new Consumer<Object>() {
public void accept(Object acceptedOrRejected) {
String id;
if(acceptedOrRejected instanceof EmployeeValidationAccepted) {
id = ((EmployeeValidationAccepted) acceptedOrRejected).getId();
} else {
id = ((EmployeeValidationRejected) acceptedOrRejected).getAd().getId();
}
if(acceptedOrRejected instanceof EmployeeValidationAccepted) {
dates1.add(new Integer(id.split("something")[1]));
Integer empId = Integer.valueOf(id.split("something")[2]);
empIds1.add(empId);
} else {
dates2.add(new Integer(id.split("something")[1]));
Integer empId = Integer.valueOf(id.split("something")[2]);
empIds2.add(empId);
}
}
});
But still my goal was to avoid repeating the same logic and also to convert to Lambda function, still in my converted lambda function I feel its not clean and efficient.
This is just for my learning aspect I am doing this stuff by taking one existing code snippet.
Can anyone please tell me how can I improvise the converted Lambda function
Generally, when you try to refactor code, you should only focus on the necessary changes.
Just because you’re going to use the Stream API, there is no reason to clutter the code with checks for null or empty arrays which weren’t in the loop based code. Neither should you change BigInteger to Integer.
Then, you have two different inputs and want to get distinct results from each of them, in other words, you have two entirely different operations. While it is reasonable to consider sharing common code between them, once you identified identical code, there is no sense in trying to express two entirely different operations as a single operation.
First, let’s see how we would do this for a traditional loop:
static void addToLists(String id, List<Integer> empIdList, List<BigInteger> dateList) {
String[] array = id.split("-");
dateList.add(new BigInteger(array[1]));
empIdList.add(Integer.valueOf(array[2]));
}
List<Integer> empIdAccepted = new ArrayList<>();
List<BigInteger> dateAccepted = new ArrayList<>();
for(EmployeeValidationAccepted acceptedDetail : acceptedDetails) {
addToLists(acceptedDetail.getId(), empIdAccepted, dateAccepted);
}
List<Integer> empIdRejected = new ArrayList<>();
List<BigInteger> dateRejected = new ArrayList<>();
for(EmployeeValidationRejected rejectedDetail : rejectedDetails) {
addToLists(rejectedDetail.getAd().getId(), empIdRejected, dateRejected);
}
If we want to express the same as Stream operations, there’s the obstacle of having two results per operation. It truly took until JDK 12 to get a built-in solution:
static Collector<String,?,Map.Entry<List<Integer>,List<BigInteger>>> idAndDate() {
return Collectors.mapping(s -> s.split("-"),
Collectors.teeing(
Collectors.mapping(a -> Integer.valueOf(a[2]), Collectors.toList()),
Collectors.mapping(a -> new BigInteger(a[1]), Collectors.toList()),
Map::entry));
}
Map.Entry<List<Integer>, List<BigInteger>> e;
e = Arrays.stream(acceptedDetails)
.map(EmployeeValidationAccepted::getId)
.collect(idAndDate());
List<Integer> empIdAccepted = e.getKey();
List<BigInteger> dateAccepted = e.getValue();
e = Arrays.stream(rejectedDetails)
.map(r -> r.getAd().getId())
.collect(idAndDate());
List<Integer> empIdRejected = e.getKey();
List<BigInteger> dateRejected = e.getValue();
Since a method can’t return two values, this uses a Map.Entry to hold them.
To use this solution with Java versions before JDK 12, you can use the implementation posted at the end of this answer. You’d also have to replace Map::entry with AbstractMap.SimpleImmutableEntry::new then.
Or you use a custom collector written for this specific operation:
static Collector<String,?,Map.Entry<List<Integer>,List<BigInteger>>> idAndDate() {
return Collector.of(
() -> new AbstractMap.SimpleImmutableEntry<>(new ArrayList<>(), new ArrayList<>()),
(e,id) -> {
String[] array = id.split("-");
e.getValue().add(new BigInteger(array[1]));
e.getKey().add(Integer.valueOf(array[2]));
},
(e1, e2) -> {
e1.getKey().addAll(e2.getKey());
e1.getValue().addAll(e2.getValue());
return e1;
});
}
In other words, using the Stream API does not always make the code simpler.
As a final note, we don’t need to use the Stream API to utilize lambda expressions. We can also use them to move the loop into the common code.
static <T> void addToLists(T[] elements, Function<T,String> tToId,
List<Integer> empIdList, List<BigInteger> dateList) {
for(T t: elements) {
String[] array = tToId.apply(t).split("-");
dateList.add(new BigInteger(array[1]));
empIdList.add(Integer.valueOf(array[2]));
}
}
List<Integer> empIdAccepted = new ArrayList<>();
List<BigInteger> dateAccepted = new ArrayList<>();
addToLists(acceptedDetails, EmployeeValidationAccepted::getId, empIdAccepted, dateAccepted);
List<Integer> empIdRejected = new ArrayList<>();
List<BigInteger> dateRejected = new ArrayList<>();
addToLists(rejectedDetails, r -> r.getAd().getId(), empIdRejected, dateRejected);
A similar approach as #roookeee already posted with but maybe slightly more concise would be to store the mappings using mapping functions declared as :
Function<String, Integer> extractEmployeeId = empId -> Integer.valueOf(empId.split("-")[2]);
Function<String, BigInteger> extractDate = empId -> new BigInteger(empId.split("-")[1]);
then proceed with mapping as:
Map<Integer, BigInteger> acceptedDetailMapping = Arrays.stream(acceptedDetails)
.collect(Collectors.toMap(a -> extractEmployeeId.apply(a.getId()),
a -> extractDate.apply(a.getId())));
Map<Integer, BigInteger> rejectedDetailMapping = Arrays.stream(rejectedDetails)
.collect(Collectors.toMap(a -> extractEmployeeId.apply(a.getAd().getId()),
a -> extractDate.apply(a.getAd().getId())));
Hereafter you can also access the date of acceptance or rejection corresponding to the employeeId of the employee as well.
How about this:
class EmployeeValidationResult {
//constructor + getters omitted for brevity
private final BigInteger date;
private final Integer employeeId;
}
List<EmployeeValidationResult> accepted = Stream.of(acceptedDetails)
.filter(Objects:nonNull)
.map(this::extractValidationResult)
.collect(Collectors.toList());
List<EmployeeValidationResult> rejected = Stream.of(rejectedDetails)
.filter(Objects:nonNull)
.map(this::extractValidationResult)
.collect(Collectors.toList());
EmployeeValidationResult extractValidationResult(EmployeeValidationAccepted accepted) {
return extractValidationResult(accepted.getId());
}
EmployeeValidationResult extractValidationResult(EmployeeValidationRejected rejected) {
return extractValidationResult(rejected.getAd().getId());
}
EmployeeValidationResult extractValidationResult(String id) {
String[] empIdList = id.split("-");
BigInteger date = extractDate(empIdList[1])
Integer empId = extractId(empIdList[2]);
return new EmployeeValidationResult(date, employeeId);
}
Repeating the filter or map operations is good style and explicit about what is happening. Merging the two lists of objects into one and using instanceof clutters the implementation and makes it less readable / maintainable.

Add prefix and suffix to Collectors.joining() only if there are multiple items present

I have a stream of strings:
Stream<String> stream = ...;
I want to construct a string which concatenates these items with , as a separator. I do this as following:
stream.collect(Collectors.joining(","));
Now I want add a prefix [ and a suffix ] to this output only if there were multiple items. For example:
a
[a,b]
[a,b,c]
Can this be done without first materializing the Stream<String> to a List<String> and then checking on List.size() == 1? In code:
public String format(Stream<String> stream) {
List<String> list = stream.collect(Collectors.toList());
if (list.size() == 1) {
return list.get(0);
}
return "[" + list.stream().collect(Collectors.joining(",")) + "]";
}
It feels odd to first convert the stream to a list and then again to a stream to be able to apply the Collectors.joining(","). I think it's suboptimal to loop through the whole stream (which is done during a Collectors.toList()) only to discover if there is one or more item(s) present.
I could implement my own Collector<String, String> which counts the number of given items and use that count afterwards. But I am wondering if there is a directer way.
This question intentionally ignores there case when the stream is empty.
Yes, this is possible using a custom Collector instance that will use an anonymous object with a count of items in the stream and an overloaded toString() method:
public String format(Stream<String> stream) {
return stream.collect(
() -> new Object() {
StringJoiner stringJoiner = new StringJoiner(",");
int count;
#Override
public String toString() {
return count == 1 ? stringJoiner.toString() : "[" + stringJoiner + "]";
}
},
(container, currentString) -> {
container.stringJoiner.add(currentString);
container.count++;
},
(accumulatingContainer, currentContainer) -> {
accumulatingContainer.stringJoiner.merge(currentContainer.stringJoiner);
accumulatingContainer.count += currentContainer.count;
}
).toString();
}
Explanation
Collector interface has the following methods:
public interface Collector<T,A,R> {
Supplier<A> supplier();
BiConsumer<A,T> accumulator();
BinaryOperator<A> combiner();
Function<A,R> finisher();
Set<Characteristics> characteristics();
}
I will omit the last method as it is not relevant for this example.
There is a collect() method with the following signature:
<R> R collect(Supplier<R> supplier,
BiConsumer<R, ? super T> accumulator,
BiConsumer<R, R> combiner);
and in our case it would resolve to:
<Object> Object collect(Supplier<Object> supplier,
BiConsumer<Object, ? super String> accumulator,
BiConsumer<Object, Object> combiner);
In the supplier, we are using an instance of StringJoiner (basically the same thing that Collectors.joining() is using).
In the accumulator, we are using StringJoiner::add() but we increment the count as well
In the combiner, we are using StringJoiner::merge() and add the count to the accumulator
Before returning from format() function, we need to call toString() method to wrap our accumulated StringJoiner instance in [] (or leave it as is is, in case of a single-element stream
The case for an empty case could also be added, I left it out in order not to make this collector more complicated.
There is already an accepted answer and I upvoted it too.
Still I would like to offer potentially another solution. Potentially because it has one requirement:
The stream.spliterator() of Stream<String> stream needs to be Spliterator.SIZED.
If that applies to your case, you could use also this solution:
public String format(Stream<String> stream) {
Spliterator<String> spliterator = stream.spliterator();
StringJoiner sj = spliterator.getExactSizeIfKnown() == 1 ?
new StringJoiner("") :
new StringJoiner(",", "[", "]");
spliterator.forEachRemaining(sj::add);
return sj.toString();
}
According to the JavaDoc Spliterator.getExactSizeIfKnown() "returns estimateSize() if this Spliterator is SIZED, else -1." If a Spliterator is SIZED then "estimateSize() prior to traversal or splitting represents a finite size that, in the absence of structural source modification, represents an exact count of the number of elements that would be encountered by a complete traversal."
Since "most Spliterators for Collections, that cover all elements of a Collection report this characteristic" (API Note in JavaDoc of SIZED) this could be the desired directer way.
EDIT:
If the Stream is empty we can return an empty String at once. If the Stream has only one String there is no need to create a StringJoiner and to copy the String to it. We return the single String directly.
public String format(Stream<String> stream) {
Spliterator<String> spliterator = stream.spliterator();
if (spliterator.getExactSizeIfKnown() == 0) {
return "";
}
if (spliterator.getExactSizeIfKnown() == 1) {
AtomicReference<String> result = new AtomicReference<String>();
spliterator.tryAdvance(result::set);
return result.get();
}
StringJoiner result = new StringJoiner(",", "[", "]");
spliterator.forEachRemaining(result::add);
return result.toString();
}
I know that I'm late to the party, but I too came across this very same problem recently, and I thought it might be worthwhile documenting how I solved it.
While the accepted solution does work, my opinion is that it is more complicated than it needs to be, making it both hard to read and understand. As opposed to defining an Object implementation and then separate accumulator and combiner functions that access and modify the state of said object, why not just define your own collector? In the problem presented above, this can be achieved as follows:
Collector<CharSequence, StringJoiner, String> collector = new Collector<>() {
int count = 0;
#Override
public Supplier<StringJoiner> supplier() {
return () -> new StringJoiner(",");
}
#Override
public BiConsumer<StringJoiner, CharSequence> accumulator() {
return (joiner, sequence) -> {
count++;
joiner.add(sequence);
};
}
#Override
public BinaryOperator<StringJoiner> combiner() {
return StringJoiner::merge;
}
#Override
public Function<StringJoiner, String> finisher() {
return (joiner) -> {
String joined = joiner.toString();
return (count > 1) ? "[" + joined + "]" : joined;
};
}
#Override
public Set<Characteristics> characteristics() {
return Collections.emptySet();
}
};
The principle is the same, you have a counter that is incremented whenever elements are added to the StringJoiner. Based on the total number of elements added, the finisher decides wheater the StringJoiner result should be wrapped in square brackets. It goes without saying, but you can even define a separate class specific to this implementation, allowing it to be used throughout your codebase. You can even go a step further by making it more generic to support a wider range of input types, adding custom prefix/suffix parameters, and so on. Speaking of use, just pass instances of the collector to the collect method of your Stream:
return list.stream().collect(collector);

What is the best way to implement the python count function in java?

I am learning how to use streams in java and I would like to know the most efficient way to copy the python count functionality into java.
For those unfamiliar with python count, see here.
I've already done a naive implementation but I doubt this would ever get added to a production level environment:
private List<String> countMessages(List<String> messages) {
Map<String, Integer> messageOccurrences = new HashMap<>();
List<String> stackedMessages = new LinkedList<String>();
this.messages.stream().filter((message) -> (messageOccurrences.containsKey(message))).forEachOrdered((message) -> {
int new_occ = messageOccurrences.get(message) + 1;
messageOccurrences.put(message, new_occ);
});
messageOccurrences.keySet().forEach((key) -> {
stackedMessages.add(key + "(" + messageOccurrences.get(key) + "times)" );
});
return stackedMessages;
}
Any improvements or pointers would be appreciated.
To answer the question "what is the best way to implement the python count function in java?".
Java already has Collections.frequency which will do exactly that.
However, if you want to do it with the streams API then I believe a generic solution would be:
public static <T> long count(Collection<T> source, T element) {
return source.stream().filter(e -> Objects.equals(e, element)).count();
}
then the use case would be:
long countHellp = count(myStringList, "hello");
long countJohn = count(peopleList, new Person("John"));
long count101 = count(integerList, 101);
...
...
or you can even pass a predicate if you wanted:
public static <T> long count(Collection<T> source, Predicate<? super T> predicate) {
return source.stream().filter(predicate).count();
}
Then the use case would be for example:
long stringsGreaterThanTen = count(myStringList, s -> s.length() > 10);
long malesCount = count(peopleList, Person::isMale);
long evens = count(integerList, i -> i % 2 == 0);
...
...
Given your comment on the post, it seems like you want to "group" then and get the count of each group.
public Map<String, Long> countMessages(List<String> messages) {
return messages.stream()
.collect(groupingBy(Function.identity(), counting()));
}
This creates a stream from the messages list and then groups them, passing a counting() as the downstream collector meaning we will retrieve a Map<String, Long> where the keys are the elements and the values are the occurrences of that specific string.
Ensure you have the import:
import static java.util.stream.Collectors.*;
for the latter solution.

How to eliminate duplicate entries within a stream based on a own Equal class

I do have a simialar problem like descripted here. But with two differences first I do use the stream api and second I do have an equals() and hashCode() method already. But within the stream the equalitity of the of Blogs are in this context not the same as defined in the Blog class.
Collection<Blog> elements = x.stream()
... // a lot of filter and map stuff
.peek(p -> sysout(p)) // a stream of Blog
.? // how to remove duplicates - .distinct() doesn't work
I do have a class with an equal Method lets call it ContextBlogEqual with the method
public boolean equal(Blog a, Blog b);
Is there any way removing all duplicate entries with my current stream approach based on the ContextBlogEqual#equal method?
I thought already on grouping, but this doesn't work either, because the reason why blogA and blogB is equal isn't just one parameter. Also I have no idea how I could use .reduce(..), because there is useally more than one element left.
In essence, you either have to define hashCode to make your data work with a hashtable, or a total order to make it work with a binary search tree.
For hashtables you'll need to declare a wrapper class which will override equals and hashCode.
For binary trees you can define a Comparator<Blog> which respects your equality definition and adds an arbitrary, but consistent, ordering criterion. Then you can collect into a new TreeSet<Blog>(yourComparator).
First, please note that equal(Blog, Blog) method is not enough for the most scenarios as you will need to pairwise compare all the entries which is not efficient. It's better to define the function which extracts new key from the blog entry. For example, let's consider the following Blog class:
static class Blog {
final String name;
final int id;
final long time;
public Blog(String name, int id, long time) {
this.name = name;
this.id = id;
this.time = time;
}
#Override
public int hashCode() {
return Objects.hash(name, id, time);
}
#Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null || getClass() != obj.getClass())
return false;
Blog other = (Blog) obj;
return id == other.id && time == other.time && Objects.equals(name, other.name);
}
public String toString() {
return name+":"+id+":"+time;
}
}
Let's have some test data:
List<Blog> blogs = Arrays.asList(new Blog("foo", 1, 1234),
new Blog("bar", 2, 1345), new Blog("foo", 1, 1345),
new Blog("bar", 2, 1345));
List<Blog> distinctBlogs = blogs.stream().distinct().collect(Collectors.toList());
System.out.println(distinctBlogs);
Here distinctBlogs contains three entries: [foo:1:1234, bar:2:1345, foo:1:1345]. Suppose that it's undesired, because we don't want to compare the time field. The simplest way to create new key is to use Arrays.asList:
Function<Blog, Object> keyExtractor = b -> Arrays.asList(b.name, b.id);
The resulting keys already have proper equals and hashCode implementations.
Now if you fine with terminal operation, you may create a custom collector like this:
List<Blog> distinctByNameId = blogs.stream().collect(
Collectors.collectingAndThen(Collectors.toMap(
keyExtractor, Function.identity(),
(a, b) -> a, LinkedHashMap::new),
map -> new ArrayList<>(map.values())));
System.out.println(distinctByNameId);
Here we use keyExtractor to generate the keys and merge function is (a, b) -> a which means select the previously added entry when repeating key appears. We use LinkedHashMap to preserve the order (omit this parameter if you don't care about order). Finally we dump the map values into the new ArrayList. You can move such collector creation to the separate method and generalize it:
public static <T> Collector<T, ?, List<T>> distinctBy(
Function<? super T, ?> keyExtractor) {
return Collectors.collectingAndThen(
Collectors.toMap(keyExtractor, Function.identity(), (a, b) -> a, LinkedHashMap::new),
map -> new ArrayList<>(map.values()));
}
This way the usage will be simpler:
List<Blog> distinctByNameId = blogs.stream()
.collect(distinctBy(b -> Arrays.asList(b.name, b.id)));
Essentially, you'll need a helper method like this one:
static <T, U> Stream<T> distinct(
Stream<T> stream,
Function<? super T, ? extends U> keyExtractor
) {
final Map<U, String> seen = new ConcurrentHashMap<>();
return stream.filter(t -> seen.put(keyExtractor.apply(t), "") == null);
}
It takes a Stream, and returns a new Stream that contains only distinct values given the keyExtractor. An example:
class O {
final int i;
O(int i) {
this.i = i;
}
#Override
public String toString() {
return "O(" + i + ")";
}
}
distinct(Stream.of(new O(1), new O(1), new O(2)), o -> o.i)
.forEach(System.out::println);
This yields
O(1)
O(2)
Disclaimer
As commented by Tagir Valeev here and in this similar answer by Stuart Marks, this approach has flaws. The operation as implemented here...
is unstable for ordered parallel streams
is not optimal for sequential streams
violates the stateless predicate constraint on Stream.filter()
Wrapping the above in your own library
You can of course extend Stream with your own functionality and implement this new distinct() function in there, e.g. like jOOλ or Javaslang do:
Seq.of(new O(1), new O(1), new O(2))
.distinct(o -> o.i)
.forEach(System.out::println);

How to create a List<T> from Map<K,V> and List<K> of keys?

Using Java 8 lambdas, what's the "best" way to effectively create a new List<T> given a List<K> of possible keys and a Map<K,V>? This is the scenario where you are given a List of possible Map keys and are expected to generate a List<T> where T is some type that is constructed based on some aspect of V, the map value types.
I've explored a few and don't feel comfortable claiming one way is better than another (with maybe one exception -- see code). I'll clarify "best" as a combination of code clarity and runtime efficiency. These are what I came up with. I'm sure someone can do better, which is one aspect of this question. I don't like the filter aspect of most as it means needing to create intermediate structures and multiple passes over the names List. Right now, I'm opting for Example 6 -- a plain 'ol loop. (NOTE: Some cryptic thoughts are in the code comments, especially "need to reference externally..." This means external from the lambda.)
public class Java8Mapping {
private final Map<String,Wongo> nameToWongoMap = new HashMap<>();
public Java8Mapping(){
List<String> names = Arrays.asList("abbey","normal","hans","delbrook");
List<String> types = Arrays.asList("crazy","boring","shocking","dead");
for(int i=0; i<names.size(); i++){
nameToWongoMap.put(names.get(i),new Wongo(names.get(i),types.get(i)));
}
}
public static void main(String[] args) {
System.out.println("in main");
Java8Mapping j = new Java8Mapping();
List<String> testNames = Arrays.asList("abbey", "froderick","igor");
System.out.println(j.getBongosExample1(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
System.out.println(j.getBongosExample2(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
System.out.println(j.getBongosExample3(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
System.out.println(j.getBongosExample4(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
System.out.println(j.getBongosExample5(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
System.out.println(j.getBongosExample6(testNames).stream().map(Bongo::toString).collect(Collectors.joining(", ")));
}
private static class Wongo{
String name;
String type;
public Wongo(String s, String t){name=s;type=t;}
#Override public String toString(){return "Wongo{name="+name+", type="+type+"}";}
}
private static class Bongo{
Wongo wongo;
public Bongo(Wongo w){wongo = w;}
#Override public String toString(){ return "Bongo{wongo="+wongo+"}";}
}
// 1: Create a list externally and add items inside 'forEach'.
// Needs to externally reference Map and List
public List<Bongo> getBongosExample1(List<String> names){
final List<Bongo> listOne = new ArrayList<>();
names.forEach(s -> {
Wongo w = nameToWongoMap.get(s);
if(w != null) {
listOne.add(new Bongo(nameToWongoMap.get(s)));
}
});
return listOne;
}
// 2: Use stream().map().collect()
// Needs to externally reference Map
public List<Bongo> getBongosExample2(List<String> names){
return names.stream()
.filter(s -> nameToWongoMap.get(s) != null)
.map(s -> new Bongo(nameToWongoMap.get(s)))
.collect(Collectors.toList());
}
// 3: Create custom Collector
// Needs to externally reference Map
public List<Bongo> getBongosExample3(List<String> names){
Function<List<Wongo>,List<Bongo>> finisher = list -> list.stream().map(Bongo::new).collect(Collectors.toList());
Collector<String,List<Wongo>,List<Bongo>> bongoCollector =
Collector.of(ArrayList::new,getAccumulator(),getCombiner(),finisher, Characteristics.UNORDERED);
return names.stream().collect(bongoCollector);
}
// example 3 helper code
private BiConsumer<List<Wongo>,String> getAccumulator(){
return (list,string) -> {
Wongo w = nameToWongoMap.get(string);
if(w != null){
list.add(w);
}
};
}
// example 3 helper code
private BinaryOperator<List<Wongo>> getCombiner(){
return (l1,l2) -> {
l1.addAll(l2);
return l1;
};
}
// 4: Use internal Bongo creation facility
public List<Bongo> getBongosExample4(List<String> names){
return names.stream().filter(s->nameToWongoMap.get(s) != null).map(s-> new Bongo(nameToWongoMap.get(s))).collect(Collectors.toList());
}
// 5: Stream the Map EntrySet. This avoids referring to anything outside of the stream,
// but bypasses the lookup benefit from Map.
public List<Bongo> getBongosExample5(List<String> names){
return nameToWongoMap.entrySet().stream().filter(e->names.contains(e.getKey())).map(e -> new Bongo(e.getValue())).collect(Collectors.toList());
}
// 6: Plain-ol-java loop
public List<Bongo> getBongosExample6(List<String> names){
List<Bongo> bongos = new ArrayList<>();
for(String s : names){
Wongo w = nameToWongoMap.get(s);
if(w != null){
bongos.add(new Bongo(w));
}
}
return bongos;
}
}
If namesToWongoMap is an instance variable, you can't really avoid a capturing lambda.
You can clean up the stream by splitting up the operations a little more:
return names.stream()
.map(n -> namesToWongoMap.get(n))
.filter(w -> w != null)
.map(w -> new Bongo(w))
.collect(toList());
return names.stream()
.map(namesToWongoMap::get)
.filter(Objects::nonNull)
.map(Bongo::new)
.collect(toList());
That way you don't call get twice.
This is very much like the for loop, except, for example, it could theoretically be parallelized if namesToWongoMap can't be mutated concurrently.
I don't like the filter aspect of most as it means needing to create intermediate structures and multiple passes over the names List.
There are no intermediate structures and there is only one pass over the List. A stream pipeline says "for each element...do this sequence of operations". Each element is visited once and the pipeline is applied.
Here are some relevant quotes from the java.util.stream package description:
A stream is not a data structure that stores elements; instead, it conveys elements from a source such as a data structure, an array, a generator function, or an I/O channel, through a pipeline of computational operations.
Processing streams lazily allows for significant efficiencies; in a pipeline such as the filter-map-sum example above, filtering, mapping, and summing can be fused into a single pass on the data, with minimal intermediate state.
Radiodef's answer pretty much nailed it, I think. The solution given there:
return names.stream()
.map(namesToWongoMap::get)
.filter(Objects::nonNull)
.map(Bongo::new)
.collect(toList());
is probably about the best that can be done in Java 8.
I did want to mention a small wrinkle in this, though. The Map.get call returns null if the name isn't present in the map, and this is subsequently filtered out. There's nothing wrong with this per se, though it does bake null-means-not-present semantics into the pipeline structure.
In some sense we'd want a mapper pipeline operation that has a choice of returning zero or one elements. A way to do this with streams is with flatMap. The flatmapper function can return an arbitrary number of elements into the stream, but in this case we want just zero or one. Here's how to do that:
return names.stream()
.flatMap(name -> {
Wongo w = nameToWongoMap.get(name);
return w == null ? Stream.empty() : Stream.of(w);
})
.map(Bongo::new)
.collect(toList());
I admit this is pretty clunky and so I wouldn't recommend doing this. A slightly better but somewhat obscure approach is this:
return names.stream()
.flatMap(name -> Optional.ofNullable(nameToWongoMap.get(name))
.map(Stream::of).orElseGet(Stream::empty))
.map(Bongo::new)
.collect(toList());
but I'm still not sure I'd recommend this as it stands.
The use of flatMap does point to another approach, though. If you have a more complicated policy of how to deal with the not-present case, you could refactor this into a helper function that returns a Stream containing the result or an empty Stream if there's no result.
Finally, JDK 9 -- still under development as of this writing -- has added Stream.ofNullable which is useful in exactly these situations:
return names.stream()
.flatMap(name -> Stream.ofNullable(nameToWongoMap.get(name)))
.map(Bongo::new)
.collect(toList());
As an aside, JDK 9 has also added Optional.stream which creates a zero-or-one stream from an Optional. This is useful in cases where you want to call an Optional-returning function from within flatMap. See this answer and this answer for more discussion.
One approach I didn't see is retainAll:
public List<Bongo> getBongos(List<String> names) {
Map<String, Wongo> copy = new HashMap<>(nameToWongoMap);
copy.keySet().retainAll(names);
return copy.values().stream().map(Bongo::new).collect(
Collectors.toList());
}
The extra Map is a minimal performance hit, since it's just copying pointers to objects, not the objects themselves.

Categories