java 8 parallel stream Issue - java

_logger.info("data size : " + saleData.size());
saleData.parallelStream().forEach(data -> {
SaleAggrData saleAggrData = new SaleAggrData() {
{
setCatId(data.getCatId());
setRevenue(RoundUpUtil.roundUpDouble(data.getRevenue()));
setMargin(RoundUpUtil.roundUpDouble(data.getMargin()));
setUnits(data.getUnits());
setMarginRate(ComputeUtil.marginRate(data.getRevenue(), data.getMargin()));
setOtd(ComputeUtil.OTD(data.getRevenue(), data.getUnits()));
setSaleDate(data.getSaleDate());
setDiscountDepth(ComputeUtil.discountDepth(data.getRegularPrice(), data.getRevenue()));
setTransactions(data.getTransactions());
setUpt(ComputeUtil.UPT(data.getUnits(), data.getTransactions()));
}
};
salesAggrData.addSaleAggrData(saleAggrData);
});
The issue is that when I get a response from the DB and iterate over it with a parallel stream, the resulting data size is different every time, whereas a sequential stream works fine.
I can't use a sequential stream because the data set is huge and processing it takes too long.
Any lead would be helpful.

You are adding elements in parallel to salesAggrData which I'm assuming is some Collection. If it's not a thread-safe Collection, no wonder you get inconsistent results.
Instead of forEach, why don't you use map() and then collect the result into some Collection?
List<SaleAggrData> salesAggrData =
saleData.parallelStream()
.map(data -> {
SaleAggrData saleAggrData = new SaleAggrData() {
{
setCatId(data.getCatId());
setRevenue(RoundUpUtil.roundUpDouble(data.getRevenue()));
setMargin(RoundUpUtil.roundUpDouble(data.getMargin()));
setUnits(data.getUnits());
setMarginRate(ComputeUtil.marginRate(data.getRevenue(), data.getMargin()));
setOtd(ComputeUtil.OTD(data.getRevenue(), data.getUnits()));
setSaleDate(data.getSaleDate());
setDiscountDepth(ComputeUtil.discountDepth(data.getRegularPrice(), data.getRevenue()));
setTransactions(data.getTransactions());
setUpt(ComputeUtil.UPT(data.getUnits(), data.getTransactions()));
}
};
return saleAggrData;
})
.collect(Collectors.toList());
BTW, I'd probably change that anonymous class instance creation, and use a constructor of a named class to create the SaleAggrData instances.
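As a minimal, self-contained sketch of the map-and-collect approach from this answer: the class below uses hypothetical stand-ins (plain Integer inputs and a made-up convert method, not the asker's SaleData types). Each element is mapped in a parallel stream and collect assembles the list, so no shared collection is mutated from several threads.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelCollectDemo {

    // Hypothetical conversion step standing in for the SaleData -> SaleAggrData mapping.
    static String convert(int value) {
        return "item-" + value;
    }

    // Maps in parallel; the collector merges the per-thread partial results,
    // so no element is lost and encounter order is kept for a List source.
    static List<String> convertAll(List<Integer> input) {
        return input.parallelStream()
                .map(ParallelCollectDemo::convert)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = IntStream.range(0, 100_000)
                .boxed()
                .collect(Collectors.toList());
        List<String> result = convertAll(input);
        System.out.println("input size: " + input.size()
                + ", result size: " + result.size());
    }
}
```

If the target really has to be filled from forEach, a thread-safe collection or explicit synchronization would be needed instead; collect avoids that shared mutable state altogether.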

Related

Aggregate values and convert into single type within the same Java stream

I have a class with a collection of Seed elements. One of Seed's methods has the return type Optional<Pair<Boolean, String>>.
I'm trying to loop over all seeds, find if any boolean value is true and at the same time, create a set with all the String values. For instance, my input is in the form Optional<Pair<Boolean, String>>, the output should be Optional<Signal> where Signal is like:
class Signal {
public boolean exposure;
public Set<String> alarms;
// constructor and getters (can add anything to this class, it's just a bag)
}
This is what I currently have that works:
// Seed::hadExposure yields Optional<Pair<Boolean, String>> where Pair have key/value or left/right
public Optional<Signal> withExposure() {
if (seeds.stream().map(Seed::hadExposure).flatMap(Optional::stream).findAny().isEmpty()) {
return Optional.empty();
}
final var exposure = seeds.stream()
.map(Seed::hadExposure)
.flatMap(Optional::stream)
.anyMatch(Pair::getLeft);
final var alarms = seeds.stream()
.map(Seed::hadExposure)
.flatMap(Optional::stream)
.map(Pair::getRight)
.filter(Objects::nonNull)
.collect(Collectors.toSet());
return Optional.of(new Signal(exposure, alarms));
}
Now I have time to make it better because Seed::hadExposure could become an expensive call, so I was trying to see if I could do all of this in a single pass. I've tried reduce and various collectors (Collectors.collectingAndThen, Collectors.partitioningBy, etc., following suggestions from previous questions), but nothing has worked so far.
It's possible to do this in a single stream() expression using map to convert the non-empty exposure to a Signal and then a reduce to combine the signals:
Signal signal = exposures.stream()
.map(exposure ->
new Signal(
exposure.getLeft(),
exposure.getRight() == null
? Collections.emptySet()
: Collections.singleton(exposure.getRight())))
.reduce(
new Signal(false, new HashSet<>()),
(leftSig, rightSig) -> {
HashSet<String> alarms = new HashSet<>();
alarms.addAll(leftSig.alarms);
alarms.addAll(rightSig.alarms);
return new Signal(
leftSig.exposure || rightSig.exposure, alarms);
});
However, if you have a lot of alarms it would be expensive because it creates a new Set and adds the new alarms to the accumulated alarms for each exposure in the input.
In a language that was designed from the ground-up to support functional programming, like Scala or Haskell, you'd have a Set data type that would let you efficiently create a new set that's identical to an existing set but with an added element, so there'd be no efficiency worries:
filteredSeeds.foldLeft((false, Set[String]())) { (result, exposure) =>
(result._1 || exposure.getLeft, result._2 + exposure.getRight)
}
But Java doesn't come with anything like that out of the box.
You could create just a single Set for the result and mutate it in your stream's reduce expression, but some would regard that as poor style because you'd be mixing a functional paradigm (map/reduce over a stream) with a procedural one (mutating a set).
Personally, in Java, I'd just ditch the functional approach and use a for loop in this case. It'll be less code, more efficient, and IMO clearer.
If you have enough space to store an intermediate result, you could do something like:
List<Pair<Boolean, String>> exposures =
seeds.stream()
.map(Seed::hadExposure)
.flatMap(Optional::stream)
.collect(Collectors.toList());
Then you'd only be calling the expensive Seed::hadExposure method once per item in the input list.
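The for-loop alternative mentioned in this answer could look roughly like the sketch below. Pair and Signal here are hypothetical minimal stand-ins for the question's types, and the method takes already-computed hadExposure() results; in the real class the loop would iterate over the seeds and call hadExposure() once per seed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class SignalLoopDemo {

    // Hypothetical stand-in for the Pair type used in the question.
    static final class Pair<L, R> {
        private final L left;
        private final R right;
        Pair(L left, R right) { this.left = left; this.right = right; }
        L getLeft() { return left; }
        R getRight() { return right; }
    }

    // The "bag" class from the question, reduced to two fields.
    static final class Signal {
        final boolean exposure;
        final Set<String> alarms;
        Signal(boolean exposure, Set<String> alarms) {
            this.exposure = exposure;
            this.alarms = alarms;
        }
    }

    // Single pass: tracks whether any result was present, ORs the flags,
    // and gathers the non-null alarm strings into one mutable set.
    static Optional<Signal> withExposure(List<Optional<Pair<Boolean, String>>> results) {
        boolean seen = false;
        boolean exposure = false;
        Set<String> alarms = new HashSet<>();
        for (Optional<Pair<Boolean, String>> result : results) {
            if (!result.isPresent()) {
                continue;
            }
            seen = true;
            Pair<Boolean, String> pair = result.get();
            exposure |= pair.getLeft();
            if (pair.getRight() != null) {
                alarms.add(pair.getRight());
            }
        }
        return seen ? Optional.of(new Signal(exposure, alarms)) : Optional.empty();
    }

    public static void main(String[] args) {
        List<Optional<Pair<Boolean, String>>> results = new ArrayList<>();
        results.add(Optional.empty());
        results.add(Optional.of(new Pair<Boolean, String>(false, "a")));
        results.add(Optional.of(new Pair<Boolean, String>(true, null)));
        Optional<Signal> signal = withExposure(results);
        signal.ifPresent(s -> System.out.println(s.exposure + " " + s.alarms));
    }
}
```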

Converting nested for loop to a stream maintaining data

I am looking at code that has a deeply nested for loop that I want to rewrite in a pure functional form using Java 8 streams, but multiple values are needed at each level and I am not sure how to solve this in a clean way.
List<Report> reports = new ArrayList<>();
for (DigitalLogic dl : digitalLogics){
for (Wizard wiz : dl.getWizards()){
for(Vice vice : wiz.getVices()){
reports.add(createReport(dl, wiz, vice));
}
}
}
//
Report createReport(DigitalLogic dl, Wizard wiz, Vice vice){
//Gets certain elements from all parameters and creates a report object
}
My real case scenario is much more complicated than this but I am wondering if there is a cleaner pure functional way of writing this using streams. Below is my initial attempt
List<Report> reports = new ArrayList<>();
digitalLogics.stream()
.map(dl -> dl.getWizards())
.flatMap(List::stream)
.map(wiz -> wiz.getVices())
.flatMap(List::stream)
.forEach(vice -> reports.add(createReport(?, ?, vice)));
Obviously, I have lost the DigitalLogic and Wizard references.
I would go with the forEach method, because the stream solution makes this more complicated:
List<Report> reports = new ArrayList<>();
digitalLogics.forEach(dl->dl.getWizards()
.forEach(wiz->wiz.getVices()
.forEach(v->reports.add(createReport(dl, wiz, v)))));
Although what you currently have (for loops) is much cleaner than the stream equivalent, here is how it would look if you want to try it:
public void createReports(List<DigitalLogic> digitalLogics) {
List<Report> reports = digitalLogics.stream()
.flatMap(dl -> dl.getWizards().stream()
.map(wizard -> new AbstractMap.SimpleEntry<>(dl, wizard)))
.flatMap(entry -> entry.getValue().getVices().stream()
.map(vice -> createReport(entry.getKey(), entry.getValue(), vice)))
.collect(Collectors.toList());
}
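A variation on the answer above, as a sketch: nesting the inner streams inside the flatMap lambdas keeps dl and wiz in scope, so no intermediate Map.Entry holder is needed. The domain classes and the String-returning createReport below are hypothetical minimal stand-ins for the question's types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class NestedFlatMapDemo {

    // Hypothetical minimal stand-ins for the question's domain classes.
    static class Vice {
        final String name;
        Vice(String name) { this.name = name; }
    }

    static class Wizard {
        final String name;
        final List<Vice> vices;
        Wizard(String name, List<Vice> vices) { this.name = name; this.vices = vices; }
        List<Vice> getVices() { return vices; }
    }

    static class DigitalLogic {
        final String name;
        final List<Wizard> wizards;
        DigitalLogic(String name, List<Wizard> wizards) { this.name = name; this.wizards = wizards; }
        List<Wizard> getWizards() { return wizards; }
    }

    // Stand-in for createReport: a report is just a String here.
    static String createReport(DigitalLogic dl, Wizard wiz, Vice vice) {
        return dl.name + "/" + wiz.name + "/" + vice.name;
    }

    // The lambdas are nested, so the innermost one can still see dl and wiz.
    static List<String> createReports(List<DigitalLogic> digitalLogics) {
        return digitalLogics.stream()
                .flatMap(dl -> dl.getWizards().stream()
                        .flatMap(wiz -> wiz.getVices().stream()
                                .map(vice -> createReport(dl, wiz, vice))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Wizard w1 = new Wizard("w1", Arrays.asList(new Vice("v1"), new Vice("v2")));
        Wizard w2 = new Wizard("w2", Arrays.asList(new Vice("v3")));
        DigitalLogic d1 = new DigitalLogic("d1", Arrays.asList(w1, w2));
        createReports(Arrays.asList(d1)).forEach(System.out::println);
    }
}
```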

Java 8 Streams for List iteration

I have a HashMap that contains List<Dto> and List<List<String>>:
Map<List<Dto>, List<List<String>>> mapData = new HashMap<>();
and an ArrayList<Dto>.
I want to iterate over this map, get the keys (key1, key2, etc.), take the corresponding values, set them on a Dto object, and then add that object to a List. I can do this with a for-each loop, but I can't get it working with Java 8 streams, so I need some help with that. Here is the sample code:
List<DTO> dtoList = new ArrayList<>();
DTO dto = new DTO();
mapData.entrySet().stream().filter(e->{
if(e.getKey().equals("key1")){
dto.setKey1(e.getValue());
}
if(e.getKey().equals("key2")){
dto.setKey2(e.getValue());
}
});
Here e.getValue() is a List<List<String>>,
so first I need to iterate over it to set the value.
Second, I need to add dto to the ArrayList dtoList. How can I achieve this?
Here is the basic snippet I tried without a HashMap, where a List holds the keys, multiList holds the values, and the Dto list is where I finally add the result:
for(List<Dto> dtoList: column) {
if ("Key1".equalsIgnoreCase(column.getName())) {
index = dtoList.indexOf(column);
}
}
for(List<String> listoflists: multiList) {
if(listoflists.contains(index)) {
for(String s: listoflists) {
dto.setKey1(s);
}
dtoList.add(dto);
}
}
See https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
Stream operations are divided into intermediate and terminal operations, and are combined to form stream pipelines. A stream pipeline consists of a source (such as a Collection, an array, a generator function, or an I/O channel); followed by zero or more intermediate operations such as Stream.filter or Stream.map; and a terminal operation such as Stream.forEach or Stream.reduce.
So in your snippet above, filter isn't really doing anything: intermediate operations are lazy and only run once a terminal operation is invoked. To trigger it, you'd add a terminal operation such as collect at the end. Note that the filter lambda must return a boolean for your code to compile in the first place.
mapData.entrySet().stream().filter(entry -> {
// do something here
return true;
}).collect(Collectors.toList());
Of course you don't need to abuse intermediate operations - or generate a bunch of new objects - for straightforward tasks, something like this should suffice:
mapData.entrySet().stream().forEach(entry -> {
// do something
});
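To make the intermediate/terminal distinction concrete, here is a small sketch (class and method names are made up for illustration). It counts how often the filter lambda runs before and after a terminal operation is attached to the pipeline.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyFilterDemo {

    // Returns {calls before the terminal operation, calls after it}.
    static int[] countFilterCalls() {
        AtomicInteger calls = new AtomicInteger();
        List<String> keys = Arrays.asList("key1", "key2", "key3");

        // Building the pipeline does not execute the lambda yet.
        Stream<String> pipeline = keys.stream().filter(k -> {
            calls.incrementAndGet();
            return true;
        });
        int before = calls.get();

        // collect is a terminal operation; only now does filter run.
        List<String> collected = pipeline.collect(Collectors.toList());
        int after = calls.get();

        return new int[] { before, after, collected.size() };
    }

    public static void main(String[] args) {
        int[] counts = countFilterCalls();
        System.out.println("before terminal op: " + counts[0]
                + ", after terminal op: " + counts[1]);
    }
}
```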

Collect stream only if allMatch filter and process stream once in Java

I have the following stream code:
List<Data> results = items.stream()
.map(item -> requestDataForItem(item))
.filter(data -> data.isValid())
.collect(Collectors.toList());
Data requestDataForItem(Item item) {
// call another service here
}
The problem is that I want to call requestDataForItem only when all elements in the stream are valid. For example, if the first item is invalid, I don't want to make the call for any element in the stream.
There is .allMatch in the stream API, but it returns a boolean. I want to do the same as .allMatch and then .collect the result when everything matched. Also, I want to process the stream only once; with two loops it is easy.
Is this possible with the Java Streams API?
This would be a job for Java 9:
List<Data> results = items.stream()
.map(item -> requestDataForItem(item))
.takeWhile(data -> data.isValid())
.collect(Collectors.toList());
This operation will stop at the first invalid element. In a sequential execution, this implies that no subsequent requestDataForItem calls are made. In a parallel execution, some additional elements might get processed concurrently, before the operation stops, but that’s the price for efficient parallel processing.
In either case, the result list will only contain the elements before the first encountered invalid element and you can easily check using results.size() == items.size() whether all elements were valid.
In Java 8, there is no such simple method, and using an additional library or rolling your own implementation of takeWhile wouldn't pay off considering how simple the non-stream solution would be:
List<Data> results = new ArrayList<>();
for(Item item: items) {
Data data = requestDataForItem(item);
if(!data.isValid()) break;
results.add(data);
}
You could theoretically use .allMatch then collect if .allMatch returns true, but then you'd be processing the collection twice. There's no way to do what you're trying to do with the streams API directly.
You could create a method to do this for you and simply pass your collection to it as opposed to using the stream API. This is slightly less elegant than using the stream API but more efficient as it processes the collection only once.
List<Data> results = getAllIfValid(
items.stream()
.map(item -> requestDataForItem(item))
.collect(Collectors.toList())
);
public List<Data> getAllIfValid(List<Data> items) {
List<Data> results = new ArrayList<>();
for (Data d : items) {
if (!d.isValid()) {
return new ArrayList<>();
}
results.add(d);
}
return results;
}
This will return all the results if every element passes and only processes the items collection once. If any fail the isValid() check, it'll return an empty list as you want all or nothing. Simply check to see if the returned collection is empty to see whether or not all items passed the isValid() check.
Implement a two-step process:
1. Test whether allMatch returns true.
2. If it does, do the collect with a second stream.
Try this.
List<Data> result = new ArrayList<>();
boolean allValid = items.stream()
.map(item -> requestDataForItem(item))
.allMatch(data -> data.isValid() && result.add(data));
if (!allValid)
result.clear();
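The all-or-nothing requirement from the question can also be wrapped in a small loop-based helper, sketched below. The name mapAllIfValid and its signature are made up for illustration; it stops calling the mapper at the first invalid result and signals failure with an empty Optional rather than an empty list, so a legitimately empty input is not confused with a failed one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

class AllOrNothingDemo {

    // Maps items one by one; returns Optional.empty() as soon as one mapped
    // value fails the check, without mapping the remaining items.
    static <T, R> Optional<List<R>> mapAllIfValid(List<T> items,
                                                  Function<? super T, ? extends R> mapper,
                                                  Predicate<? super R> isValid) {
        List<R> results = new ArrayList<>();
        for (T item : items) {
            R mapped = mapper.apply(item);
            if (!isValid.test(mapped)) {
                return Optional.empty();
            }
            results.add(mapped);
        }
        return Optional.of(results);
    }

    public static void main(String[] args) {
        // Multiplying by 10 stands in for the remote requestDataForItem call.
        Optional<List<Integer>> ok =
                mapAllIfValid(Arrays.asList(1, 2, 3), i -> i * 10, v -> v > 0);
        Optional<List<Integer>> failed =
                mapAllIfValid(Arrays.asList(1, -2, 3), i -> i * 10, v -> v > 0);
        System.out.println(ok.isPresent() + " " + failed.isPresent());
    }
}
```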

Is it safe to share a stream instance among multiple threads?

I have a set of keys.
class X {
private static String[] keys = {"k1", "k2", ... };
I have to extract the values for these keys from a request. I think I can use map to extract the values and create a list of the necessary objects in a request-processing method like this:
public void processReq(Request req) {
...
Stream.of(keys).map(k-> new Pack(k, req.getHeader(k)));
But creating a Stream for every request looks like unnecessary work. If sharing a Stream instance among multiple threads is safe, I think I can modify the code like this:
class X {
private static Stream<String> keys = Stream.of("k1", "k2", ...);
...
public void processReq(Request req) {
...
keys.map(k-> new Pack(k, req.getHeader(k)));
So, is sharing a Stream instance among multiple threads like this safe?
Streams are not intended to be used more than once, even in the same thread. If you want to have a collection, use a List (or an array):
private static final List<String> keys = Arrays.asList("k1", "k2", ...);
This can be used multiple times.
List<Pack> packs = keys.stream()
.map(k-> new Pack(k, req.getHeader(k)))
.collect(Collectors.toList());
In your code, the new Pack or req.getHeader is where most of the time is spent.
No; it's not generally safe. "Unless the source was explicitly designed for concurrent modification, unpredictable or erroneous behavior may result from modifying the stream source while it is being queried... A stream should be operated on only once."
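A short sketch of why a stored Stream cannot be shared, and of the alternative described above (class and method names are illustrative only): a second terminal operation on the same Stream instance throws IllegalStateException, while creating a fresh stream from a shared immutable List on every call works.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamReuseDemo {

    // Shared, read-only source; safe to read from many threads.
    static final List<String> KEYS =
            Collections.unmodifiableList(Arrays.asList("k1", "k2", "k3"));

    // Returns true if running a second terminal operation on the same
    // Stream instance is rejected with IllegalStateException.
    static boolean secondUseFails() {
        Stream<String> shared = KEYS.stream();
        shared.collect(Collectors.toList()); // first use consumes the stream
        try {
            shared.collect(Collectors.toList());
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // Creates a new stream per call; the suffix stands in for per-request data.
    static List<String> freshStreamPerCall(String suffix) {
        return KEYS.stream()
                .map(k -> k + suffix)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("second use fails: " + secondUseFails());
        System.out.println(freshStreamPerCall("-a"));
        System.out.println(freshStreamPerCall("-b"));
    }
}
```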
