I have a HashMap with List<Dto> keys and List<List<String>> values:
Map<List<Dto>, List<List<String>>> mapData = new HashMap<>();
and an ArrayList<Dto>.
I want to iterate over this map, get each key (key1, key2, etc.), take the corresponding value, set it on the Dto object, and then add the Dto to a List. I am able to do this successfully with a plain foreach loop, but I cannot get it done correctly using Java 8 streams, so I need some help with that. Here is the sample code:
List<DTO> dtoList = new ArrayList<>();
DTO dto = new DTO();
mapData.entrySet().stream().filter(e -> {
    if (e.getKey().equals("key1")) {
        dto.setKey1(e.getValue());
    }
    if (e.getKey().equals("key2")) {
        dto.setKey2(e.getValue());
    }
});
Here e.getValue() comes from the List<List<String>>, so the first thing is that I need to iterate over it to set the value. The second is that I need to add the dto to the ArrayList dtoList. How can I achieve this?
Basic snippet that I tried without adding to a HashMap, where the list has the keys, multiList has the values, and the Dto list is where I finally add the results:
for (List<Dto> dtoList : column) {
    if ("Key1".equalsIgnoreCase(column.getName())) {
        index = dtoList.indexOf(column);
    }
}
for (List<String> listoflists : multiList) {
    if (listoflists.contains(index)) {
        for (String s : listoflists) {
            dto.setKey1(s);
        }
        dtoList.add(dto);
    }
}
See https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
Stream operations are divided into intermediate and terminal operations, and are combined to form stream pipelines. A stream pipeline consists of a source (such as a Collection, an array, a generator function, or an I/O channel); followed by zero or more intermediate operations such as Stream.filter or Stream.map; and a terminal operation such as Stream.forEach or Stream.reduce.
So in your snippet above, filter isn't really doing anything. To trigger it, you'd add a collect operation at the end. Notice that the filter lambda function needs to return a boolean for your code to compile in the first place.
mapData.entrySet().stream().filter(entry -> {
    // do something here
    return true;
}).collect(Collectors.toList());
Of course you don't need to abuse intermediate operations - or generate a bunch of new objects - for straightforward tasks; something like this should suffice:
mapData.entrySet().stream().forEach(entry -> {
    // do something
});
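Applied to the question, a minimal sketch could look like the following. Two assumptions, since the declarations posted above don't compile as shown: the keys are really Strings such as "key1"/"key2" (a List<Dto> key would never equal a String), and setKey1/setKey2 accept a flattened List<String>.
Map<String, List<List<String>>> mapData = new HashMap<>(); // assumed key type
List<DTO> dtoList = new ArrayList<>();
DTO dto = new DTO();
mapData.forEach((key, lists) -> {
    // flatten List<List<String>> into a single List<String>
    List<String> flat = lists.stream()
            .flatMap(List::stream)
            .collect(Collectors.toList());
    if ("key1".equals(key)) {
        dto.setKey1(flat); // assumed to take a List<String>
    } else if ("key2".equals(key)) {
        dto.setKey2(flat);
    }
});
dtoList.add(dto);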
Related
I have a list of strings (partList) which is a subset of another list of strings (completeList), and I want to process an object (through processObj()) within a for loop in such a way that the current element of partList is removed from completeList: when the current iteration is on an element of partList, the processing of the object involves that element plus the rest of the elements from completeList. I do it like so for now:
for (String el : partList) {
    completeList.remove(el);
    // process the target object using as parameters el and the rest of the complete list except el...
    processObj(el, completeList);
    completeList.add(el);
}
Is it the right way of doing it?
Thanks for the enlightenment.
I'm not sure of the purpose of removing from and then adding back to the same list, but you can use a Predicate to accept and process certain values.
Predicate<String> accept = s -> {
    return true; // accepts all strings; you could use partList.contains(s) here, or !s.equals(el)
};
completeList.stream()
        .filter(accept.negate()) // invert the predicate to implement "except"
        .forEach(s -> processObj(s, completeList)); // processObj takes two arguments, so wrap it in a lambda
Replace forEach with map if you want to modify the stream values, then you can collect() to get data back to a list.
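For example, a sketch where a hypothetical transform method produces the modified value for each accepted string:
List<String> processed = completeList.stream()
        .filter(accept.negate())
        .map(s -> transform(s)) // transform is hypothetical, standing in for your own logic
        .collect(Collectors.toList());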
Java 11 here. I have a List<Foobar> as well as a Map<Foobar,List<String>>.
I would like to iterate over the list and:
if the current Foobar is a key in the map, add a specific string ("Can't please everyone") to that entry's value list
if the current Foobar is not a key in the map, add it as a new key, with a value that is an ArrayList consisting of a single string with the same value
I can accomplish this like so:
List<Foobar> foobarList = getSomehow();
Map<Foobar,List<String>> foobarMap = getItSomehow();
String msg = "Can't please everyone";
for (Foobar fb : foobarList) {
    if (foobarMap.containsKey(fb)) {
        foobarMap.get(fb).add(msg);
    } else {
        // wrap in a mutable ArrayList; Collections.singletonList alone is immutable,
        // so a later add(msg) for the same key would throw
        foobarMap.put(fb, new ArrayList<>(Collections.singletonList(msg)));
    }
}
This works great, but I'm trying to get this to work using the Java Stream API. My best attempt thus far:
List<Foobar> foobarList = getSomehow();
Map<Foobar,List<String>> foobarMap = getItSomehow();
String msg = "Can't please everyone";
foobarList.stream()
.filter(fb -> foobarMap.containsKey(fb))
.map(fb -> foobarMap.get(fb).add(msg))
.filter(fb -> !foobarMap.containsKey(fb))
.map(fb -> foobarMap.put(fb. Collections.singleton(msg));
Yields several compiler errors. Can anyone spot where I'm going awry?
Streams are used either
To modify the contents of the stream elements, or
To produce another stream from it, or
To iterate over the elements and do something that doesn't affect the elements of this stream.
Since your use case is the last type, the logical operation is simply forEach(..). (I know it is a dampener :-), but that is just what the use case calls for.)
foobarList.forEach(fb -> {
    if (foobarMap.containsKey(fb)) {
        foobarMap.get(fb).add(msg);
    } else {
        // mutable list, as in the question, so later add(msg) calls don't throw
        foobarMap.put(fb, new ArrayList<>(Collections.singletonList(msg)));
    }
});
As noticed by #Sree Kumar, you should use forEach().
However, I would suggest leveraging the Map.merge() method:
foobarList.forEach(fb -> foobarMap.merge(fb, Collections.singletonList(msg),
        // collect(Collectors.toList()) rather than toList(), which is Java 16+
        (l1, l2) -> Stream.concat(l1.stream(), l2.stream()).collect(Collectors.toList())));
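Another option, not from the original answers: Map.computeIfAbsent() avoids rebuilding the list on every duplicate key.
foobarList.forEach(fb ->
        foobarMap.computeIfAbsent(fb, k -> new ArrayList<>()).add(msg));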
I have a class with a collection of Seed elements. One of Seed's methods returns Optional<Pair<Boolean, String>>.
I'm trying to loop over all seeds, find whether any boolean value is true, and at the same time create a set of all the String values. For instance, my input is in the form Optional<Pair<Boolean, String>>; the output should be Optional<Signal>, where Signal looks like:
class Signal {
    public boolean exposure;
    public Set<String> alarms;
    // constructor and getters (can add anything to this class, it's just a bag)
}
This is what I currently have that works:
// Seed::hadExposure yields Optional<Pair<Boolean, String>> where Pair has key/value or left/right
public Optional<Signal> withExposure() {
    if (seeds.stream().map(Seed::hadExposure).flatMap(Optional::stream).findAny().isEmpty()) {
        return Optional.empty();
    }
    final var exposure = seeds.stream()
            .map(Seed::hadExposure)
            .flatMap(Optional::stream)
            .anyMatch(Pair::getLeft);
    final var alarms = seeds.stream()
            .map(Seed::hadExposure)
            .flatMap(Optional::stream)
            .map(Pair::getRight)
            .filter(Objects::nonNull)
            .collect(Collectors.toSet());
    return Optional.of(new Signal(exposure, alarms));
}
Now I have time to make it better, because Seed::hadExposure could become an expensive call, so I was trying to see if I could do all of this in only one pass. I've tried (some suggestions from previous questions) with reduce, using collectors (Collectors.collectingAndThen, Collectors.partitioningBy, etc.), but nothing so far.
It's possible to do this in a single stream() expression using map to convert the non-empty exposure to a Signal and then a reduce to combine the signals:
Signal signal = exposures.stream()
        .map(exposure ->
                new Signal(
                        exposure.getLeft(),
                        exposure.getRight() == null
                                ? Collections.emptySet()
                                : Collections.singleton(exposure.getRight())))
        .reduce(
                new Signal(false, new HashSet<>()),
                (leftSig, rightSig) -> {
                    HashSet<String> alarms = new HashSet<>();
                    alarms.addAll(leftSig.alarms);
                    alarms.addAll(rightSig.alarms);
                    return new Signal(
                            leftSig.exposure || rightSig.exposure, alarms);
                });
However, if you have a lot of alarms it would be expensive because it creates a new Set and adds the new alarms to the accumulated alarms for each exposure in the input.
In a language that was designed from the ground-up to support functional programming, like Scala or Haskell, you'd have a Set data type that would let you efficiently create a new set that's identical to an existing set but with an added element, so there'd be no efficiency worries:
filteredSeeds.foldLeft((false, Set[String]())) { (result, exposure) =>
    (result._1 || exposure.getLeft, result._2 + exposure.getRight)
}
But Java doesn't come with anything like that out of the box.
You could create just a single Set for the result and mutate it in your stream's reduce expression, but some would regard that as poor style because you'd be mixing a functional paradigm (map/reduce over a stream) with a procedural one (mutating a set).
Personally, in Java, I'd just ditch the functional approach and use a for loop in this case. It'll be less code, more efficient, and IMO clearer.
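For reference, a sketch of that plain loop, under the same assumptions about Seed, Pair, and Signal as the question:
boolean any = false;
boolean exposure = false;
Set<String> alarms = new HashSet<>();
for (Seed seed : seeds) {
    Optional<Pair<Boolean, String>> e = seed.hadExposure(); // called exactly once per seed
    if (e.isPresent()) {
        any = true;
        exposure |= e.get().getLeft();
        if (e.get().getRight() != null) {
            alarms.add(e.get().getRight());
        }
    }
}
return any ? Optional.of(new Signal(exposure, alarms)) : Optional.empty();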
If you have enough space to store an intermediate result, you could do something like:
List<Pair<Boolean, String>> exposures =
        seeds.stream()
                .map(Seed::hadExposure)
                .flatMap(Optional::stream)
                .collect(Collectors.toList());
Then you'd only be calling the expensive Seed::hadExposure method once per item in the input list.
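From that cached list you can then derive the Signal without touching Seed::hadExposure again, e.g.:
boolean exposure = exposures.stream().anyMatch(Pair::getLeft);
Set<String> alarms = exposures.stream()
        .map(Pair::getRight)
        .filter(Objects::nonNull)
        .collect(Collectors.toSet());
return exposures.isEmpty() ? Optional.empty() : Optional.of(new Signal(exposure, alarms));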
I have the following stream code:
List<Data> results = items.stream()
        .map(item -> requestDataForItem(item))
        .filter(data -> data.isValid())
        .collect(Collectors.toList());

Data requestDataForItem(Item item) {
    // call another service here
}
The problem is that I want to call requestDataForItem only when all elements in the stream are valid. For example, if the first item is invalid I don't want to make the call for any element in the stream. There is .allMatch in the stream API, but it returns a boolean. I want to do the same as .allMatch, then .collect the result when everything matched. Also, I want to process the stream only once; with two loops it is easy. Is this possible with the Java Streams API?
This would be a job for Java 9:
List<Data> results = items.stream()
        .map(item -> requestDataForItem(item))
        .takeWhile(data -> data.isValid())
        .collect(Collectors.toList());
This operation will stop at the first invalid element. In a sequential execution, this implies that no subsequent requestDataForItem calls are made. In a parallel execution, some additional elements might get processed concurrently, before the operation stops, but that’s the price for efficient parallel processing.
In either case, the result list will only contain the elements before the first encountered invalid element and you can easily check using results.size() == items.size() whether all elements were valid.
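If you need strict all-or-nothing semantics, that check can clear the partial result (a small sketch using the results list from above):
if (results.size() != items.size()) {
    results = Collections.emptyList(); // at least one element was invalid, so discard the prefix
}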
In Java 8, there is no such simple method, and using an additional library or rolling your own implementation of takeWhile wouldn't pay off considering how simple the non-stream solution would be:
List<Data> results = new ArrayList<>();
for (Item item : items) {
    Data data = requestDataForItem(item);
    if (!data.isValid()) break;
    results.add(data);
}
You could theoretically use .allMatch then collect if .allMatch returns true, but then you'd be processing the collection twice. There's no way to do what you're trying to do with the streams API directly.
You could create a method to do this for you and simply pass your collection to it as opposed to using the stream API. This is slightly less elegant than using the stream API but more efficient as it processes the collection only once.
List<Data> results = getAllIfValid(
        items.stream()
                .map(item -> requestDataForItem(item))
                .collect(Collectors.toList())
);
public List<Data> getAllIfValid(List<Data> items) {
    List<Data> results = new ArrayList<>();
    for (Data d : items) {
        if (!d.isValid()) {
            return new ArrayList<>();
        }
        results.add(d);
    }
    return results;
}
This will return all the results if every element passes and only processes the items collection once. If any fail the isValid() check, it'll return an empty list as you want all or nothing. Simply check to see if the returned collection is empty to see whether or not all items passed the isValid() check.
Implement a two-step process:
test if allMatch returns true.
If it does return true, do the collect with a second stream.
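A sketch of that two-step process; note that it calls requestDataForItem twice per item, which is exactly the double processing the question wants to avoid:
List<Data> results = items.stream()
        .map(item -> requestDataForItem(item))
        .allMatch(data -> data.isValid())
    ? items.stream()
        .map(item -> requestDataForItem(item))
        .collect(Collectors.toList())
    : Collections.emptyList();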
Try this.
List<Data> result = new ArrayList<>();
boolean allValid = items.stream()
        .map(item -> requestDataForItem(item))
        // add() always returns true; allMatch short-circuits at the first invalid item
        .allMatch(data -> data.isValid() && result.add(data));
if (!allValid)
    result.clear();
I am migrating some map-reduce code to Spark and having problems constructing an Iterable to return from the function.
In the MR code, I had a reduce function that grouped by key and then (using multipleOutputs) iterated over the values and wrote them out (to multiple outputs, but that's unimportant), with code like this (simplified):
reduce(Key key, Iterable<Text> values) {
    // ... some code
    for (Text xml : values) {
        multipleOutputs.write(key, val, directory);
    }
}
However, in Spark I have translated a map and this reduce into a sequence of:
mapToPair -> groupByKey -> flatMap
as recommended... in some book.
mapToPair basically adds a Key via functionMap, which creates a Key for each record based on some of its values. Sometimes a key may have very high cardinality.
JavaPairRDD<Key, String> rddPaired = inputRDD.mapToPair(new PairFunction<String, Key, String>() {
    public Tuple2<Key, String> call(String value) {
        //...
        return functionMap.call(value);
    }
});
RDD.groupByKey() is then applied to rddPaired to produce the RDD that feeds the flatMap function:
JavaPairRDD<Key, Iterable<String>> rddGrouped = rddPaired.groupByKey();
Once grouped, the reduce is done in a flatMap call. Here, operation is a transformation:
public Iterable<String> call(Tuple2<Key, Iterable<String>> keyValue) {
    // some code...
    List<String> out = new ArrayList<String>();
    if (someConditionOnKey) {
        // do a logic
        Grouper grouper = new Grouper();
        for (String xml : keyValue._2()) {
            // group in a separate class
            grouper.add(xml);
        }
        // operation is now performed on the whole group
        out.add(operation(grouper));
    } else {
        for (String xml : keyValue._2()) {
            out.add(operation(xml));
        }
    }
    return out; // moved outside the else so both branches return
}
It works fine... with keys that don't have too many records. It actually breaks with an OutOfMemoryError when a key with a lot of values enters the "else" branch of the reduce.
Note: I have included the "if" part to explain the logic I want to produce, but the failure happens when entering the "else"... because when data enters the "else", it normally means there will be many more values for that key, due to the nature of the data.
It is clear that, having to keep all of the grouped values in the "out" list, this won't scale if a key has millions of records, because it keeps them all in memory. I have reached the point where the OOM happens (yes, it's when performing the "operation" above, which asks for memory and gets none; it's not a very memory-expensive operation though).
Is there any way to avoid this in order to scale? Either by replicating the behaviour with some other directives that reach the same output in a more scalable way, or by being able to hand Spark the values for merging (just as I used to do with MR)...
It's inefficient to do the condition inside the flatMap operation. You should check the condition outside, create 2 distinct RDDs, and deal with them separately.
rddPaired.cache();
// groupFilterFunc will filter which items need grouping
JavaPairRDD<Key, Iterable<String>> rddGrouped = rddPaired.filter(groupFilterFunc).groupByKey();
// processGroupedValuesFunction should call `operation` on the group of all values with the same key and return the result
JavaPairRDD<Key, String> resultGrouped = rddGrouped.mapValues(processGroupedValuesFunction);
// nogroupFilterFunc will filter which items don't need grouping (filter doesn't group, so the value type stays String)
JavaPairRDD<Key, String> rddNoGrouped = rddPaired.filter(nogroupFilterFunc);
// processNoGroupedValuesFunction2 should call `operation` on a single value and return the result
JavaPairRDD<Key, String> resultNoGrouped = rddNoGrouped.mapValues(processNoGroupedValuesFunction2);
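If a single output RDD is needed afterwards, the two branches can be recombined; a sketch, assuming both mapValues calls produce the same value type:
// merge the two processed branches back into one RDD
JavaPairRDD<Key, String> result = resultGrouped.union(resultNoGrouped);
result.saveAsTextFile("output"); // hypothetical sink, standing in for whatever the job actually writes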