Preserve order in Java stream with collect

I am running into an issue where I need order preserved when processing a list of strings with the collect method from the Java Streams API.
public List<String> countOccurrences(ArrayList<String> messages) {
List<String> stackedMessages = new LinkedList<>();
Map<String, Integer> messageOccurrences =
messages.stream()
.collect(groupingBy(Function.identity(), summingInt(e -> 1)));
messageOccurrences.forEach((key, value) -> {
String appendString = value == 1 ? "" : " (" + value + " times)";
stackedMessages.add(key + appendString);
});
return stackedMessages;
}
The problem with the above code is if I process a list such as ["blah", "blah", "yep"], it returns ["yep", "blah (2 times)"] where I need it to return ["blah (2 times)", "yep"].
I already looked at this post and was led to believe that if I am using a stream on an already ordered data structure then order would be ensured: How to ensure order of processing in java8 streams?
I'm thinking I need to change groupingBy to toMap and as of right now I am reading through that documentation. Anybody who is well versed in the subject matter already please offer some pointers.
UPDATE:
Thanks to the user @Aominè, this is the correct way to do it using groupingBy:
.collect(groupingBy(Function.identity(),LinkedHashMap::new, summingInt(e->1)))

You’ll need to collect to a LinkedHashMap to get the expected result. Also, you don’t need a separate forEach after the groupingBy; just create a stream from the entrySet, map each entry, and collect to a list.
return messages.stream()
.collect(groupingBy(Function.identity(), LinkedHashMap::new, summingInt(e -> 1)))
.entrySet()
.stream()
.map(e -> e.getKey()+(e.getValue() == 1 ? "" : " (" + e.getValue() +" times)"))
.collect(toList());
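For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same pipeline. The class name OrderedCounts is only for illustration, and the parameter is widened to List<String>, which is my assumption rather than part of the question.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class OrderedCounts {

    // Counts duplicates while keeping the order in which each message first appears.
    static List<String> countOccurrences(List<String> messages) {
        return messages.stream()
                // LinkedHashMap keeps insertion order; the default map used by groupingBy does not promise any order.
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        LinkedHashMap::new,
                        Collectors.summingInt(e -> 1)))
                .entrySet()
                .stream()
                .map(e -> e.getKey()
                        + (e.getValue() == 1 ? "" : " (" + e.getValue() + " times)"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences(Arrays.asList("blah", "blah", "yep")));
        // prints [blah (2 times), yep]
    }
}
```

The only change that matters for ordering is the LinkedHashMap::new map factory; everything else is the same as the snippet above.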

Related

Stream to compare two list by index

I have 2 distinct lists: List1 and List2, and I want to perform an action for each index where the elements have the same getName() value:
for (int i = 0; i < 5; i++) {
if (list1.get(i).getName().equals(list2.get(i).getName())) {
// TODO
}
}
Is there a way to do this using Java streams? I have tried the logic:
if (list1.stream().anyMatch(x -> list2.stream().anyMatch(y -> x.getName().equals(y.getName())))) {
// TODO
}
This works, but the first element of List1 is compared with every element of List2.
What I need is for the first element of List1 to be compared with the first element of List2, the second with the second, and so on.
How can I write this if logic using a stream instead of a for loop?
One solution is to use a third-party library that has the ability to zip pairs of streams into a single stream of pairs. Using this approach, the two streams would be zipped into a single stream containing the pairs of elements, the pairs with equal names would be retained via a call to Stream.filter(), and then finally the remaining filtered pairs of elements will have the action applied to them.
The Stack Overflow answer Zipping streams using JDK8 with lambda (java.util.stream.Streams.zip) contains various solutions for zipping two streams in this manner.
Guava
Google's Guava library provides the Streams.zip method:
Streams.zip(list1.stream(), list2.stream(), Map::entry)
.filter(e -> e.getKey().getName().equals(e.getValue().getName()))
.forEachOrdered(e ->
System.out.println("Match: " + e.getKey() + ", " + e.getValue()));
StreamEx
The StreamEx library provides multiple zip methods, including StreamEx.zip and EntryStream.zip:
EntryStream.zip(list1, list2)
.filterKeyValue((k, v) -> k.getName().equals(v.getName()))
.forKeyValue((k, v) -> System.out.println("Match: " + k + ", " + v));
One approach would be to stream over the indexes rather than the lists, and use List.get(i) to retrieve elements from each List.
IntStream.range(0, list1.size())
.mapToObj(i -> Map.entry(list1.get(i), list2.get(i)))
.filter(e -> e.getKey().getName().equals(e.getValue().getName()))
.forEachOrdered(e ->
System.out.println("Match: " + e.getKey() + ", " + e.getValue()));
or
IntStream.range(0, list1.size())
.filter(i -> list1.get(i).getName().equals(list2.get(i).getName()))
.forEachOrdered(i ->
System.out.printf("Match at index %s: %s, %s\n", i, list1.get(i), list2.get(i)));
Note that for this approach to be efficient, the lists must support fast (e.g. constant-time) random access. This generally means any list which implements the RandomAccess interface, such as ArrayList and the list returned by Arrays.asList().
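Here is a small runnable sketch of the index-based approach. The Item class is a hypothetical stand-in for whatever type has getName(), and taking the minimum of the two sizes is my own addition to avoid an IndexOutOfBoundsException when the lists differ in length; the snippets above assume equal lengths.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class IndexMatch {

    // Hypothetical element type; only getName() matters here.
    static final class Item {
        private final String name;
        Item(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Returns the indexes at which both lists hold elements with equal names.
    static List<Integer> matchingIndexes(List<Item> list1, List<Item> list2) {
        // Assumption: stop at the shorter list instead of failing on a size mismatch.
        int limit = Math.min(list1.size(), list2.size());
        return IntStream.range(0, limit)
                .filter(i -> list1.get(i).getName().equals(list2.get(i).getName()))
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> a = Arrays.asList(new Item("x"), new Item("y"), new Item("z"));
        List<Item> b = Arrays.asList(new Item("x"), new Item("q"), new Item("z"), new Item("w"));
        System.out.println(matchingIndexes(a, b)); // prints [0, 2]
    }
}
```

Collecting the indexes rather than acting inside forEachOrdered keeps the comparison separate from whatever action you want to perform at each match.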

J8 Stream Sorting by Vector in List<Tuple2<Vector, String>>

List<Tuple2<Vector, String>> vectors = parsedData1.collect();
vectors.stream().map(e -> e._2 + " : " + clusters.predict(e._1)).sorted().forEach(System.out::println);
The above code creates a list of results that attaches an int to a string. When I go to print them I get something like this:
How do I sort the stream by the number and not the string?
You want to use sorted(Comparator), which sorts using a custom comparison. You also want to sort before you concatenate the String; otherwise you would need to split it again to extract the number, sort by it, and concatenate again.
vectors.stream()
.sorted(Comparator.comparing(e -> clusters.predict(e._1)))
.map(e -> e._2 + " : " + clusters.predict(e._1))
.forEach(System.out::println);
In case clusters.predict(e._1) is computationally expensive, you might want to compute its value once per element and carry it through the pipeline:
vectors.stream()
.map(e -> new Tuple2<>(clusters.predict(e._1), e._2))
.sorted(Comparator.comparing(e -> e._1))
.map(e -> e._2 + " : " + e._1)
.forEach(System.out::println);
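If you want to try the "compute once, sort, then format" idea without Spark on the classpath, here is a minimal sketch using only the JDK. The predict parameter is a hypothetical stand-in for clusters.predict, and AbstractMap.SimpleEntry plays the role of Tuple2; neither is from the original code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntFunction;
import java.util.stream.Collectors;

class SortByPrediction {

    // Computes the numeric key once per label, sorts by that number, and only then builds the string.
    static List<String> labelSorted(List<String> labels, ToIntFunction<String> predict) {
        return labels.stream()
                .map(s -> new SimpleEntry<>(predict.applyAsInt(s), s))
                .sorted(Comparator.comparingInt(e -> e.getKey()))
                .map(e -> e.getValue() + " : " + e.getKey())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // String::length stands in for an expensive prediction function.
        labelSorted(Arrays.asList("ccc", "a", "bb"), String::length)
                .forEach(System.out::println);
        // a : 1
        // bb : 2
        // ccc : 3
    }
}
```

Because the comparison is done on the int key, a value such as 10 sorts after 9, which is not the case when the already concatenated strings are compared.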

Temporary store a value using `java-stream`

I was wondering if there is a way of temporarily storing/carrying a value within a java-stream?
List<Department> departments = (ArrayList<Department>) departmentRepository.findAll();
departments.parallelStream()
.filter(e -> e.getUnderstuffCount(date) > 0)
.forEach(e -> {
holidaysPlaningProblemManager.addProblem(e, date, e.getUnderstuffCount(date));
});
The multiple calls to e.getUnderstuffCount(date) seem "dirty", and I was wondering if there is a way to avoid them by carrying the result of a single call.
EDIT:
Please consider this as a java-stream question. There are a lot of ways to solve this; for example, a @Transient field in the Department @Entity can store the value and a simple getter can return it.
But I'm interested in whether this can be done using the java-stream.
As I mentioned in my comment, there is a popular suggestion for questions like this to map a stream element to a pair containing the element itself and whichever intermediate value you need. First, let's find a simpler example (that also compiles for me):
Stream.of("Hello", "World", "!")
.filter(s -> s.length() > 1)
.forEach(s -> System.out.println(s + ": " + s.length()));
// Hello: 5
// World: 5
If we now take either a simple array or an AbstractMap.SimpleEntry as a pair object, we can add one mapping step:
Stream.of("Hello", "World", "!")
.map(s -> new Object[] {s, s.length()})
.filter(a -> ((int) a[1]) > 1)
.forEach(a -> System.out.println(a[0] + ": " + a[1]));
Stream.of("Hello", "World", "!")
.map(s -> new AbstractMap.SimpleEntry<>(s, s.length()))
.filter(e -> e.getValue() > 1)
.forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
The array variant is a bit shorter to write, but not type-safe. I'd use the second possibility. Or maybe you could add an actual Pair<L, R> class to your project with short method names and a nice factory method (Pair.of(s, s.length())). Whichever way you choose, it is better than collecting into a map with a collector (stream.collect(toMap()).entrySet().stream().filter(e).forEach(e)), as you do not materialize the whole map in memory.
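To make the Pair suggestion concrete, here is a rough sketch. The Pair class below is hypothetical (it is not part of the JDK), and the field names left and right are just my choice.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PairCarry {

    // Hypothetical minimal pair type with a factory method; not a JDK class.
    static final class Pair<L, R> {
        final L left;
        final R right;
        private Pair(L left, R right) { this.left = left; this.right = right; }
        static <L, R> Pair<L, R> of(L left, R right) { return new Pair<>(left, right); }
    }

    // Calls length() once per string and carries the result alongside the element.
    static List<String> describeLongerThanOne(Stream<String> words) {
        return words
                .map(s -> Pair.of(s, s.length()))
                .filter(p -> p.right > 1)
                .map(p -> p.left + ": " + p.right)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        describeLongerThanOne(Stream.of("Hello", "World", "!"))
                .forEach(System.out::println);
        // Hello: 5
        // World: 5
    }
}
```

Compared with the Object[] variant, the filter and the final mapping need no casts, which is the type-safety benefit mentioned above.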
The other possibility I mentioned is to pull the check into the forEach. Given your (possibly simplified) example, I'd say this is the easiest way to read, but it is of course not as "stream-y" or "lambda-y".
Stream.of("Hello", "World", "!").forEach(s -> {
int length = s.length();
if (length > 1)
{
System.out.println(s + ": " + length);
}
});
In this simple example, you don't even need to have a stream. Just call Collection.forEach().
In the end you have to see whether using streams here is even worth it. You use a parallel stream, but does it actually speed things up (e.g. is your holidaysPlaningProblemManager thread-safe, or does it just synchronize all threads again, making the parallel stream pointless)? If not, you could just refactor to a normal for loop or Collection.forEach.
If you want to filter the departments, I think you should do the filtering in the database query rather than after retrieving them, and have the problem manager accept a list.
List<Department> departments = departmentRepository.someMethod(date);
holidaysPlaningProblemManager.addProblems(date, departments);
Unless you still have something to do with the elements in the collection.

Java 8 streams - map element to pair of elements

I have a stream of elements. I want to map each element to two different elements of the same type so that my stream will be two times longer in consequence.
I did this by concatenating two streams, but I'm wondering whether there is a simpler way to do it.
What I've done so far:
private List<String> getTranslationFilesNames() {
return Stream.concat(SUPPORTED_LANGUAGES.stream()
.map(lang -> PATH_TO_FILES_WITH_TRANSLATIONS_1 + lang + FILE_EXTENSION),
SUPPORTED_LANGUAGES.stream()
.map(lang -> PATH_TO_FILES_WITH_TRANSLATIONS_2 + lang + FILE_EXTENSION))
.collect(Collectors.toList());
}
I don't find this solution very elegant. Is there a better approach to achieve the same effect?
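If you don't care about the order, you can map each element to a pair of elements with flatMap. The result is still deterministic, but the two names for each language end up next to each other instead of all names from the first path being followed by all names from the second: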
private List<String> getTranslationFilesNames() {
return SUPPORTED_LANGUAGES.stream()
.flatMap(lang -> Stream.of(PATH_TO_FILES_WITH_TRANSLATIONS_1 + lang + FILE_EXTENSION,
PATH_TO_FILES_WITH_TRANSLATIONS_2 + lang + FILE_EXTENSION))
.collect(Collectors.toList());
}
Instead of creating a combined Stream, you can simply use the #flatMap operator to duplicate the input elements (note that this may be a suitable solution only if the order of the elements does not matter):
private List<String> getTranslationFilesNames() {
return SUPPORTED_LANGUAGES.stream()
.flatMap(s -> Stream.of(
PATH_TO_FILES_WITH_TRANSLATIONS_1 + s + FILE_EXTENSION,
PATH_TO_FILES_WITH_TRANSLATIONS_2 + s + FILE_EXTENSION
)
)
.collect(Collectors.toList());
}
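To see what "order" means here, the following sketch runs both variants side by side. The language list and the path constants are made-up values standing in for the constants from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TranslationNames {

    // Hypothetical values standing in for the constants in the question.
    static final List<String> LANGS = Arrays.asList("en", "de");
    static final String PATH_1 = "a/";
    static final String PATH_2 = "b/";
    static final String EXT = ".json";

    // Original approach: all PATH_1 names first, then all PATH_2 names.
    static List<String> viaConcat() {
        return Stream.concat(
                        LANGS.stream().map(lang -> PATH_1 + lang + EXT),
                        LANGS.stream().map(lang -> PATH_2 + lang + EXT))
                .collect(Collectors.toList());
    }

    // flatMap approach: both names for one language, then both for the next.
    static List<String> viaFlatMap() {
        return LANGS.stream()
                .flatMap(lang -> Stream.of(PATH_1 + lang + EXT, PATH_2 + lang + EXT))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(viaConcat());  // prints [a/en.json, a/de.json, b/en.json, b/de.json]
        System.out.println(viaFlatMap()); // prints [a/en.json, b/en.json, a/de.json, b/de.json]
    }
}
```

Both lists contain the same four names; only the interleaving differs, so the flatMap version is fine whenever the caller does not depend on the grouped order.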

How to obtain new list after groupby and sum over one attribute

I have a list of Settlement class which has the following attributes:
public class Settlement {
private String contractNo;
private String smartNo;
private String dealTrackNo;
private String buySellFlag;
private String cashFlowType;
private String location;
private String leaseNo;
private String leaseName;
private double volume;
private double price;
private double settleAmount;
// getters and setters
}
Now I would like to group the list of Settlement by smartNo (String) and get the sum over settleAmount, which becomes the new settleAmount for each smartNo.
Since I am using Java 8, streams should be the way to go.
Grouping should be quite straightforward using the following code:
Map<String, List<Settlement>> map = list.stream()
.collect(Collectors.groupingBy(Settlement::getSmartNo));
System.out.println(map.values());
What if I want to get a new list after grouping by smartNo and summing over settleAmount? Most of the examples out there only show how to print out the sums. What I am interested in is how to get the aggregated list.
If I understand the question correctly, you need a toMap collector with custom merger like this:
list.stream().collect(Collectors.toMap(
Settlement::getSmartNo,
Function.identity(),
(s1, s2) -> s1.addAmount(s2.getSettleAmount())));
With a helper method inside Settlement class:
Settlement addAmount(double addend) {
this.settleAmount += addend;
return this;
}
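Note that the merger above modifies the Settlement objects that are already in your list. If that is a concern, one possible alternative (a sketch, not the only way) is to sum per smartNo with groupingBy and summingDouble and then build fresh objects from the entries. The Settlement class below is a cut-down stand-in with only two fields and a two-argument constructor, both of which are my assumptions, and the LinkedHashMap is there only to keep first-occurrence order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.stream.Collectors;

class SettlementTotals {

    // Simplified stand-in for the question's Settlement class (only the two fields used here).
    static final class Settlement {
        private final String smartNo;
        private final double settleAmount;
        Settlement(String smartNo, double settleAmount) {
            this.smartNo = smartNo;
            this.settleAmount = settleAmount;
        }
        String getSmartNo() { return smartNo; }
        double getSettleAmount() { return settleAmount; }
    }

    // Returns one new Settlement per smartNo holding the summed settleAmount; the inputs are left untouched.
    static List<Settlement> sumBySmartNo(List<Settlement> list) {
        return list.stream()
                .collect(Collectors.groupingBy(
                        Settlement::getSmartNo,
                        LinkedHashMap::new,
                        Collectors.summingDouble(Settlement::getSettleAmount)))
                .entrySet()
                .stream()
                .map(e -> new Settlement(e.getKey(), e.getValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Settlement> input = Arrays.asList(
                new Settlement("A", 10.0), new Settlement("B", 5.0), new Settlement("A", 2.5));
        for (Settlement s : sumBySmartNo(input)) {
            System.out.println(s.getSmartNo() + " " + s.getSettleAmount());
        }
        // A 12.5
        // B 5.0
    }
}
```

With the full Settlement class you would have to decide what to do with the other fields (contractNo, location, and so on), since they may differ within one group.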
I think the not-too-complex way through is a new stream on each member of the values() of your map and then a map() and reduce(). I am mapping to a new class AggregatedSettlement with just the three fields smartNo, volume and settleAmount (the last will be the sum). And then reducing by summing the settleAmounts.
List<AggregatedSettlement> aggregatedList = list.stream()
.collect(Collectors.groupingBy(Settlement::getSmartNo))
.values()
.stream()
.map(innerList -> innerList.stream()
.map(settlm -> new AggregatedSettlement(settlm.getSmartNo(),
settlm.getVolume(), settlm.getSettleAmount()))
.reduce((as1, as2) -> {
if (as1.getVolume() != as2.getVolume()) {
throw new IllegalStateException("Different volumes " + as1.getVolume()
+ " and " + as2.getVolume() + " for smartNo " + as1.getSmartNo());
}
return new AggregatedSettlement(as1.getSmartNo(), as1.getVolume(),
as1.getSettleAmount() + as2.getSettleAmount());
})
.get()
)
.collect(Collectors.toList());
I am not too happy about the call to get() on the Optional<AggregatedSettlement> that I get from reduce(); usually you should avoid get(). In this case I know that the original grouping only produced lists of at least one element, so the reduce() cannot give an empty optional, hence the call to get() will work. A possible refinement would be orElseThrow() and a more explanatory exception.
I am sure there’s room for optimization. I am producing quite a few more AggregatedSettlement objects than we need in the end. As always, don’t optimize until you know you need to.
Edit: If only for the exercise here’s the version that doesn’t construct superfluous AggregatedSettlement objects. Instead it creates two streams on each list from your map, and it’s 5 lines longer:
List<AggregatedSettlement> aggregatedList = list.stream()
.collect(Collectors.groupingBy(Settlement::getSmartNo))
.entrySet()
.stream()
.map(entry -> {
double volume = entry.getValue()
.stream()
.mapToDouble(Settlement::getVolume)
.reduce((vol1, vol2) -> {
if (vol1 != vol2) {
throw new IllegalStateException("Different volumes " + vol1
+ " and " + vol2 + " for smartNo " + entry.getKey());
}
return vol1;
})
.getAsDouble();
double settleAmountSum = entry.getValue()
.stream()
.mapToDouble(Settlement::getSettleAmount)
.sum();
return new AggregatedSettlement(entry.getKey(), volume, settleAmountSum);
})
.collect(Collectors.toList());
Pick the one you find easier to read.
Edit 2: It seems from this answer that in Java 9 I will be able to avoid the call to Optional.get() if instead of map() I use flatMap() and instead of get() I use stream(). It will be 6 chars longer, I may still prefer it. I haven’t tried Java 9 yet, though (now I know what I’m going to do today :-) The advantage of get() is of course that it would catch a programming error where the inner list comes out empty.
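For illustration, here is a small sketch of the flatMap() plus Optional.stream() idea on plain integers instead of AggregatedSettlement objects. It needs Java 9 or later, and the sumEach helper is a made-up example, not code from the answer above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class OptionalStreamSketch {

    // Reduces each inner list to its sum; an empty inner list yields an empty Optional,
    // which Optional.stream() turns into an empty stream, so it contributes nothing.
    static List<Integer> sumEach(List<List<Integer>> groups) {
        return groups.stream()
                .flatMap(inner -> inner.stream().reduce(Integer::sum).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> groups = Arrays.asList(
                Arrays.asList(1, 2),
                Collections.<Integer>emptyList(),
                Arrays.asList(3));
        System.out.println(sumEach(groups)); // prints [3, 3]
    }
}
```

The empty group disappears silently here, which shows the trade-off: get() would have thrown for it, while this version just skips it.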
