Use case:
Process a list of strings via a method which returns an ImmutableTable of type {R, C, V}, for instance ImmutableTable<Integer, String, Boolean> process(String item) {...}.
Collect the results, i.e., merge all results and return a single ImmutableTable. Is there a way to achieve this?
Current implementation (as suggested by Bohemian):
How about using a parallel stream? Are there any concurrency issues in the code below? With a parallel stream I am getting "NullPointerException at index 1800" on tableBuilder.build(), but it works fine with a sequential stream.
ImmutableTable<Integer, String, Boolean> buildData() {
    // list of 4 AwsS3KeyName
    listToProcess.parallelStream()
        // create new instance via Guice dependency injection
        .map(s3KeyName -> ProcessorInstanceProvider.get()
            .fetchAndBuild(s3KeyName))
        .forEach(tableBuilder::putAll);
    return tableBuilder.build();
}
The code below works great with both a sequential and a parallel stream, but ImmutableTable.Builder.build() fails on a duplicate entry for the same row and column. What would be the best way to prevent duplicates while merging tables?
public static <R, C, V> Collector<ImmutableTable<R, C, V>,
        ImmutableTable.Builder<R, C, V>, ImmutableTable<R, C, V>> toImmutableTable()
{
    return Collector.of(ImmutableTable.Builder::new,
            ImmutableTable.Builder::putAll,
            (builder1, builder2) -> builder1.putAll(builder2.build()),
            ImmutableTable.Builder::build);
}
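For reference, a minimal sketch (my own usage example, assuming toImmutableTable() is in scope via a static import or the same class) of how this collector is applied to the pipeline from the first snippet:

ImmutableTable<Integer, String, Boolean> merged = listToProcess.stream()
        .map(s3KeyName -> ProcessorInstanceProvider.get().fetchAndBuild(s3KeyName))
        .collect(toImmutableTable()); // still fails on duplicate row/column pairs at build time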
Edit:
If there is any duplicate entry in the ImmutableTable.Builder while merging different tables, it fails. I am trying to avoid that failure by collecting the ImmutableTables into a HashBasedTable first:
ImmutableTable.copyOf(itemListToProcess.parallelStream()
    .map(itemString -> ProcessorInstanceProvider.get()
        .buildImmutableTable(itemString))
    .collect(Collector.of(
        HashBasedTable::create,
        HashBasedTable::putAll,
        (a, b) -> {
            a.putAll(b);
            return a;
        })));
But I am getting a runtime exception: "Caused by: java.lang.IllegalAccessError: tried to access class com.google.common.collect.AbstractTable".
How can we use HashBasedTable as the accumulator to collect ImmutableTables and return an aggregated ImmutableTable, given that HashBasedTable overwrites an existing entry with the latest one instead of failing on a duplicate?
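One workaround worth trying (an assumption on my part: the stack trace suggests the method references are being resolved against the package-private AbstractTable supertype) is to spell the accumulator and combiner out as lambdas, which compile against the public HashBasedTable type:

ImmutableTable<Integer, String, Boolean> merged = ImmutableTable.copyOf(
        itemListToProcess.parallelStream()
                .map(itemString -> ProcessorInstanceProvider.get()
                        .buildImmutableTable(itemString))
                .collect(Collector.of(
                        HashBasedTable::create,
                        (acc, table) -> acc.putAll(table), // lambda instead of HashBasedTable::putAll
                        (a, b) -> { a.putAll(b); return a; })));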
Since Guava 21 you can use the ImmutableTable.toImmutableTable collector.
public ImmutableTable<Integer, String, Boolean> processList(List<String> strings) {
return strings.stream()
.map(this::processText)
.flatMap(table -> table.cellSet().stream())
.collect(ImmutableTable.toImmutableTable(
Table.Cell::getRowKey,
Table.Cell::getColumnKey,
Table.Cell::getValue,
(b1, b2) -> b1 && b2 // You can omit the merge function, but then a duplicate cell throws IllegalArgumentException
));
}
private ImmutableTable<Integer, String, Boolean> processText(String text) {
return ImmutableTable.of(); // Whatever
}
This should work:
List<String> list; // given a list of String
ImmutableTable result = list.parallelStream()
    .map(processor::process) // converts String to ImmutableTable
    .collect(ImmutableTable.Builder::new, ImmutableTable.Builder::putAll,
        (a, b) -> a.putAll(b.build()))
    .build();
This reduction is thread-safe: with the three-argument collect, each thread accumulates into its own ImmutableTable.Builder and the builders are only merged by the combiner afterwards, so no builder is shared between threads.
Or using HashBasedTable as the intermediate data structure:
ImmutableTable result = ImmutableTable.copyOf(list.parallelStream()
.map(processor::process) // converts String to ImmutableTable
.collect(HashBasedTable::create, HashBasedTable::putAll, HashBasedTable::putAll));
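Worth noting (this is the Table contract, not something specific to this snippet): put replaces any existing mapping for the same row/column pair, so this variant silently keeps the last value for a duplicate cell instead of failing:

Table<Integer, String, Boolean> t = HashBasedTable.create();
t.put(1, "x", true);
t.put(1, "x", false); // replaces the previous mapping rather than throwing
// t now holds a single cell: (1, "x") -> false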
You should be able to do this by creating an appropriate Collector, using the Collector.of static factory method:
// R, C, V stand for your concrete row, column, and value types
ImmutableTable<R, C, V> table =
list.stream()
.map(processor::process)
.collect(
Collector.of(
() -> new ImmutableTable.Builder<R, C, V>(),
(builder, table1) -> builder.putAll(table1),
(builder1, builder2) ->
new ImmutableTable.Builder<R, C, V>()
.putAll(builder1.build())
.putAll(builder2.build()),
ImmutableTable.Builder::build));
Related
I have a Map<Nominal, Integer> with objects and their counts:
a -> 3
b -> 1
c -> 2
And I need to get a List<Nominal> like this from it:
a
a
a
b
c
c
How can I do this using the Stream API?
We can use Collections::nCopies to achieve the desired result:
private static <T> List<T> transform(Map<? extends T, Integer> map) {
return map.entrySet().stream()
.map(entry -> Collections.nCopies(entry.getValue(), entry.getKey()))
.flatMap(Collection::stream)
.collect(Collectors.toList());
}
Ideone demo
Remark
In the demo, I changed the key-type of the Map from Nominal to Object since the definition of Nominal was not provided. Changing the key-type, however, does not influence the solution.
Stream the entries and use flatMap to generate multiple copies of each key based on the value.
// assuming: import static java.util.stream.Stream.generate;
//           import static java.util.stream.Collectors.toList;
List<Nominal> expanded = map.entrySet().stream()
        .flatMap(e -> generate(e::getKey).limit(e.getValue()))
        .collect(toList());
I have a list of records, where each record has two primary keys: Primary Id and Alternate Id. I want to build a map through which I can access the processed records by either the Primary Id or the Alternate Id, using RxJava operations.
Current implementation:
ImmutableMap.Builder<String, Record> mapBuilder = new ImmutableMap.Builder<>();
fetchRecords()
.forEach(
record -> {
Record parsedRecord = doSomething(record);
mapBuilder.put(parsedRecord.getPrimaryId(), parsedRecord);
mapBuilder.put(parsedRecord.getAlternativeId(), parsedRecord);
});
return mapBuilder.build();
How I want it to look:
fetchRecords().stream()
.map(doSomething)
.collect(Collectors.toMap(RecordData::getPrimaryId, Function.identity()));
// Need to add this as well
.collect(Collectors.toMap(RecordData::getAlternativeId, Function.identity()));
I just wanted to know if there is a way to add the alternate id → record mapping as well, in a single pass over fetchRecords().
I'm not familiar with rx-java but this might provide a starting point. Create a custom collector to add both keys. This is a very basic collector and does not handle duplicate keys other than using the last one provided. I simply used the new record feature introduced in Java 15 to create an immutable "class". This would work the same way for a regular class.
record ParsedRecord(String getPrimaryId,
String getAlternativeId, String someValue) {
@Override
public String toString() {
return someValue;
}
}
Map<String, ParsedRecord> map = records.stream()
.collect(twoKeys(ParsedRecord::getPrimaryId,
ParsedRecord::getAlternativeId, p -> p));
map.entrySet().forEach(System.out::println);
Prints the following:
A=value1
B=value1
C=value2
D=value2
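For reference, a records list along these lines would produce that output (hypothetical values; note that HashMap iteration order is not guaranteed, so the line order may differ):

List<ParsedRecord> records = List.of(
        new ParsedRecord("A", "B", "value1"),
        new ParsedRecord("C", "D", "value2"));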
Here is the collector. It is essentially a simplified version of toMap that takes two keys instead of one.
private static <T, K, V> Collector<T, ?, Map<K, V>> twoKeys(
Function<T, K> keyMapper1, Function<T, K> keyMapper2,
Function<T, V> valueMapper) {
return Collector.of(
() -> new HashMap<K, V>(),
(m, r) -> {
V v = valueMapper.apply(r);
m.put(keyMapper1.apply(r),v);
m.put(keyMapper2.apply(r),v);
}, (m1, m2) -> {
m1.putAll(m2);
return m1;
}, Characteristics.UNORDERED);
}
Or just keep it simple and efficient.
Map<String, ParsedRecord> map = new HashMap<>();
for(ParsedRecord pr : records) {
map.put(pr.getPrimaryId(), pr);
map.put(pr.getAlternativeId(), pr);
}
Consider this code:
Function<BigDecimal,BigDecimal> func1 = x -> x;//This could be anything
Function<BigDecimal,BigDecimal> func2 = y -> y;//This could be anything
Map<Integer,BigDecimal> data = new HashMap<>();
Map<Integer,BigDecimal> newData =
data.entrySet().stream().
collect(Collectors.toMap(Entry::getKey,i ->
func1.apply(i.getValue())));
List<BigDecimal> list =
newData.entrySet().stream().map(i ->
func2.apply(i.getValue())).collect(Collectors.toList());
Basically what I'm doing is updating a HashMap with func1, then applying a second transformation with func2 and saving the twice-transformed values in a list.
I did it all in an immutable way, generating the new objects newData and list.
MY QUESTION:
Is it possible to do that by streaming the original HashMap (data) only once?
I tried this:
Function<BigDecimal,BigDecimal> func1 = x -> x;
Function<BigDecimal,BigDecimal> func2 = y -> y;
Map<Integer,BigDecimal> data = new HashMap<>();
List<BigDecimal> list = new ArrayList<>();
Map<Integer,BigDecimal> newData =
data.entrySet().stream().collect(Collectors.toMap(
Entry::getKey,i ->
{
BigDecimal newValue = func1.apply(i.getValue());
//SIDE EFFECT!!!!!!!
list.add(func2.apply(newValue));
return newValue;
}));
but doing so I have a side effect when updating list, so I lose the 'immutable way' requirement.
This seems like an ideal use case for the upcoming Collectors.teeing method in JDK 12. Here's the webrev and here's the CSR. You can use it as follows:
Map.Entry<Map<Integer, BigDecimal>, List<BigDecimal>> result = data.entrySet().stream()
.collect(Collectors.teeing(
Collectors.toMap(
Map.Entry::getKey,
i -> func1.apply(i.getValue())),
Collectors.mapping(
i -> func1.andThen(func2).apply(i.getValue()),
Collectors.toList()),
Map::entry));
Collectors.teeing collects to two different collectors and then merges both partial results into the final result. For this final step I'm using JDK 9's Map.entry(K k, V v) static method, but I could have used any other container, e.g. Pair or Tuple2, etc.
For the first collector I'm using your exact code to collect to a Map, while for the second collector I'm using Collectors.mapping along with Collectors.toList, using Function.andThen to compose your func1 and func2 functions for the mapping step.
EDIT: If you cannot wait until JDK 12 is released, you could use this code meanwhile:
public static <T, A1, A2, R1, R2, R> Collector<T, ?, R> teeing(
Collector<? super T, A1, R1> downstream1,
Collector<? super T, A2, R2> downstream2,
BiFunction<? super R1, ? super R2, R> merger) {
class Acc {
A1 acc1 = downstream1.supplier().get();
A2 acc2 = downstream2.supplier().get();
void accumulate(T t) {
downstream1.accumulator().accept(acc1, t);
downstream2.accumulator().accept(acc2, t);
}
Acc combine(Acc other) {
acc1 = downstream1.combiner().apply(acc1, other.acc1);
acc2 = downstream2.combiner().apply(acc2, other.acc2);
return this;
}
R applyMerger() {
R1 r1 = downstream1.finisher().apply(acc1);
R2 r2 = downstream2.finisher().apply(acc2);
return merger.apply(r1, r2);
}
}
return Collector.of(Acc::new, Acc::accumulate, Acc::combine, Acc::applyMerger);
}
Note: The characteristics of the downstream collectors are not considered when creating the returned collector (left as an exercise).
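If you want to tackle that exercise, one approach (a sketch; I believe the JDK version does something similar) is to intersect the two characteristic sets, drop IDENTITY_FINISH since this collector has a non-trivial finisher, and pass the result to Collector.of in place of the final return above:

Set<Collector.Characteristics> characteristics =
        EnumSet.noneOf(Collector.Characteristics.class);
characteristics.addAll(downstream1.characteristics());
characteristics.retainAll(downstream2.characteristics());
characteristics.remove(Collector.Characteristics.IDENTITY_FINISH);
return Collector.of(Acc::new, Acc::accumulate, Acc::combine, Acc::applyMerger,
        characteristics.toArray(new Collector.Characteristics[0]));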
EDIT 2: Your solution is absolutely OK, even though it uses two streams. My solution above streams the original map only once, but it applies func1 to all the values twice. If func1 is expensive, you might consider memoizing it (i.e. caching its results, so that whenever it's called again with the same input, you return the result from the cache instead of computing it again). Or you might also first apply func1 to the values of the original map, and then collect with Collectors.teeing.
Memoizing is easy. Just declare this utility method:
public <T, R> Function<T, R> memoize(Function<T, R> f) {
Map<T, R> cache = new HashMap<>(); // or ConcurrentHashMap
return k -> cache.computeIfAbsent(k, f);
}
And then use it as follows:
Function<BigDecimal, BigDecimal> func1 = memoize(x -> x); //This could be anything
Now you can use this memoized func1 and it will work exactly as before, except that it will return results from the cache when its apply method is invoked with an argument that has been previously used.
The other solution would be to apply func1 first and then collect:
Map.Entry<Map<Integer, BigDecimal>, List<BigDecimal>> result = data.entrySet().stream()
.map(i -> Map.entry(i.getKey(), func1.apply(i.getValue())))
.collect(Collectors.teeing(
Collectors.toMap(
Map.Entry::getKey,
Map.Entry::getValue),
Collectors.mapping(
i -> func2.apply(i.getValue()),
Collectors.toList()),
Map::entry));
Again, I'm using jdk9's Map.entry(K k, V v) static method.
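If you are still on JDK 8, where Map.entry is not available, AbstractMap.SimpleEntry should work as a stand-in (an untested sketch, reusing the teeing backport above):

Map.Entry<Map<Integer, BigDecimal>, List<BigDecimal>> result = data.entrySet().stream()
        .map(i -> new AbstractMap.SimpleEntry<>(i.getKey(), func1.apply(i.getValue())))
        .collect(teeing(
                Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue),
                Collectors.mapping(i -> func2.apply(i.getValue()), Collectors.toList()),
                AbstractMap.SimpleEntry::new));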
Your code can be simplified. Your goal is to apply these functions to all the BigDecimal values in the Map. You can get all these values using Map::values, which returns a Collection, and then stream over that collection only. Assuming data already contains some entries:
List<BigDecimal> list = data.values().stream()
.map(func1)
.map(func2)
.collect(Collectors.toList());
I discourage you from iterating all the entries (Set<Entry<Integer, BigDecimal>>) since you only need to work with the values.
Try this; it returns an Object[2] array where the first element is the map and the second one is the list.
Map<Integer, BigDecimal> data = new HashMap<>();
data.put(1, BigDecimal.valueOf(30));
data.put(2, BigDecimal.valueOf(40));
data.put(3, BigDecimal.valueOf(50));
Function<BigDecimal, BigDecimal> func1 = x -> x.add(BigDecimal.valueOf(10));//This could be anything
Function<BigDecimal, BigDecimal> func2 = y -> y.add(BigDecimal.valueOf(-20));//This could be anything
Object[] o = data.entrySet().stream()
        .map(AbstractMap.SimpleEntry::new) // copy each entry so the original map is not mutated
        .map(entry -> {
            entry.setValue(func1.apply(entry.getValue()));
            return entry;
        })
        .collect(Collectors.collectingAndThen(
                Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue),
                a -> {
                    List<BigDecimal> bigDecimals = a.values().stream().map(func2).collect(Collectors.toList());
                    return new Object[]{a, bigDecimals};
                }));

System.out.println("Original Map: " + data);
System.out.println("func1 map: " + o[0]);
System.out.println("func1+func2 list: " + o[1]);
Output:
Original Map: {1=30, 2=40, 3=50}
func1 map: {1=40, 2=50, 3=60}
func1+func2 list: [20, 30, 40]
This question already has answers here:
Java Lambda Stream Distinct() on arbitrary key? [duplicate]
(9 answers)
Closed 7 years ago.
Let me preface this by saying that my objects' equals implementation is not how I need to filter, so distinct() itself does not work.
class MyObject {
String foo;
MyObject( String foo ) {
this.foo = foo;
}
public String getFoo() { return foo; }
}
Collection<MyObject> listA = Arrays.asList("a", "b", "c").stream().map(MyObject::new)
.collect(Collectors.toList());
Collection<MyObject> listB = Arrays.asList("b", "d").stream().map(MyObject::new)
.collect(Collectors.toList());
// magic
How can I merge and deduplicate the lists so that the resulting list contains MyObjects with "a", "b", "c", "d"?
Note: This is a simplification of what methods we actually need to deduplicate, which are actually complex DTOs of entities loaded by hibernate, but this example should adequately demonstrate the objective.
Such a feature has been discussed by JDK developers (see JDK-8072723) and might be included in Java 9 (though that's not guaranteed). The StreamEx library developed by me already has such a feature, so you can use it:
List<MyObject> distinct = StreamEx.of(listA).append(listB)
.distinct(MyObject::getFoo).toList();
The StreamEx class is an enhanced Stream which is completely compatible with the JDK Stream, but has many additional operations, including distinct(Function), which allows you to specify a key extractor for the distinct operation. Internally it's pretty similar to the solution proposed by @fge.
You can also consider writing custom collector which will combine getting distinct objects and storing them to list:
public static <T> Collector<T, ?, List<T>> distinctBy(Function<? super T, ?> mapper) {
return Collector.<T, Map<Object, T>, List<T>> of(LinkedHashMap::new,
(map, t) -> map.putIfAbsent(mapper.apply(t), t), (m1, m2) -> {
for(Entry<Object, T> e : m2.entrySet()) {
m1.putIfAbsent(e.getKey(), e.getValue());
}
return m1;
}, map -> new ArrayList<>(map.values()));
}
This collector intermediately collects the results into a Map<Key, Element>, where Key is the extracted key and Element is the corresponding stream element. To make sure that exactly the first occurrence is preserved among all repeating elements, a LinkedHashMap is used. Finally you just need to take the values() of this map and dump them into the list. So now you can write:
List<MyObject> distinct = Stream.concat(listA.stream(), listB.stream())
.collect(distinctBy(MyObject::getFoo));
If you don't care whether the resulting collection is list or not, you can even remove the new ArrayList<>() step (just using Map::values as a finisher). Also more simplifications are possible if you don't care about order:
public static <T> Collector<T, ?, Collection<T>> distinctBy(Function<? super T, ?> mapper) {
return Collector.<T, Map<Object, T>, Collection<T>> of(HashMap::new,
(map, t) -> map.put(mapper.apply(t), t),
(m1, m2) -> { m1.putAll(m2); return m1; },
Map::values);
}
Such a collector (preserving the order and returning a List) is also available in the StreamEx library.
If .equals() does not work for you then you may want to have a go at using Guava's Equivalence.
Provided that your type is T, you need to implement an Equivalence<T>; once you have this, you need to create a:
Set<Equivalence.Wrapper<T>>
into which you'll gather your values. Then, provided your implementation of Equivalence<T> is some static variable named EQ, adding to this set is as simple as:
coll1.stream().map(EQ::wrap).forEach(set::add);
coll2.stream().map(EQ::wrap).forEach(set::add);
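For concreteness, here is a minimal sketch of such an Equivalence for the question's MyObject, assuming foo is the identity that matters:

static final Equivalence<MyObject> EQ = new Equivalence<MyObject>() {
    @Override
    protected boolean doEquivalent(MyObject a, MyObject b) {
        return a.getFoo().equals(b.getFoo());
    }

    @Override
    protected int doHash(MyObject o) {
        return o.getFoo().hashCode();
    }
};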
And then to obtain a List<T> from this set, you could:
final List<T> unwrapped = set.stream().map(Equivalence.Wrapper::get)
    .collect(Collectors.toList());
But of course, since in your comments you say you can do it with a loop, well... Why not keep using that loop?
If it works, don't fix it...
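For completeness, a sketch of such a loop, keyed on foo and keeping the first occurrence (LinkedHashMap preserves encounter order):

Map<String, MyObject> byFoo = new LinkedHashMap<>();
for (MyObject o : listA) byFoo.putIfAbsent(o.getFoo(), o);
for (MyObject o : listB) byFoo.putIfAbsent(o.getFoo(), o);
List<MyObject> merged = new ArrayList<>(byFoo.values());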
Collection<MyObject> result = Stream.concat(listA.stream(), listB.stream())
.filter(distinct(MyObject::getFoo))
.collect(Collectors.toList());
public static <T> Predicate<T> distinct(Function<? super T, Object> keyExtractor) {
Map<Object, String> seen = new ConcurrentHashMap<>();
return t -> seen.put(keyExtractor.apply(t), "") == null;
}
I found this distinct function once in a blog (can't remember the link atm).
I have a class like this:
class MultiDataPoint {
private DateTime timestamp;
private Map<String, Number> keyToData;
}
and I want to produce, for each MultiDataPoint:
class DataSet {
public String key;
List<DataPoint> dataPoints;
}
class DataPoint{
DateTime timeStamp;
Number data;
}
Of course, a 'key' can be the same across multiple MultiDataPoints.
So given a List<MultiDataPoint>, how do I use Java 8 streams to convert to List<DataSet>?
This is how I am currently doing the conversion without streams:
Collection<DataSet> convertMultiDataPointToDataSet(List<MultiDataPoint> multiDataPoints)
{
Map<String, DataSet> setMap = new HashMap<>();
multiDataPoints.forEach(pt -> {
Map<String, Number> data = pt.getData();
data.entrySet().forEach(e -> {
String seriesKey = e.getKey();
DataSet dataSet = setMap.get(seriesKey);
if (dataSet == null)
{
dataSet = new DataSet(seriesKey);
setMap.put(seriesKey, dataSet);
}
dataSet.dataPoints.add(new DataPoint(pt.getTimestamp(), e.getValue()));
});
});
return setMap.values();
}
It's an interesting question, because it shows that there are a lot of different approaches to achieve the same result. Below I show three different implementations.
Default methods in the Collections Framework: Java 8 added some methods to the collection classes that are not directly related to the Stream API. Using these methods, you can significantly simplify the non-stream implementation:
Collection<DataSet> convert(List<MultiDataPoint> multiDataPoints) {
Map<String, DataSet> result = new HashMap<>();
multiDataPoints.forEach(pt ->
pt.keyToData.forEach((key, value) ->
result.computeIfAbsent(
key, k -> new DataSet(k, new ArrayList<>()))
.dataPoints.add(new DataPoint(pt.timestamp, value))));
return result.values();
}
Stream API with flatten and intermediate data structure: The following implementation is almost identical to the solution provided by Stuart Marks. In contrast to his solution, the following implementation uses an anonymous inner class as intermediate data structure.
Collection<DataSet> convert(List<MultiDataPoint> multiDataPoints) {
return multiDataPoints.stream()
.flatMap(mdp -> mdp.keyToData.entrySet().stream().map(e ->
new Object() {
String key = e.getKey();
DataPoint dataPoint = new DataPoint(mdp.timestamp, e.getValue());
}))
.collect(
collectingAndThen(
groupingBy(t -> t.key, mapping(t -> t.dataPoint, toList())),
m -> m.entrySet().stream().map(e -> new DataSet(e.getKey(), e.getValue())).collect(toList())));
}
Stream API with map merging: Instead of flattening the original data structures, you can also create a Map for each MultiDataPoint, and then merge all maps into a single map with a reduce operation. The code is a bit simpler than the above solution:
Collection<DataSet> convert(List<MultiDataPoint> multiDataPoints) {
return multiDataPoints.stream()
.map(mdp -> mdp.keyToData.entrySet().stream()
.collect(toMap(e -> e.getKey(), e -> asList(new DataPoint(mdp.timestamp, e.getValue())))))
.reduce(new HashMap<>(), mapMerger())
.entrySet().stream()
.map(e -> new DataSet(e.getKey(), e.getValue()))
.collect(toList());
}
You can find an implementation of the map merger within the Collectors class. Unfortunately, it is a bit tricky to access it from the outside. Following is an alternative implementation of the map merger:
<K, V> BinaryOperator<Map<K, List<V>>> mapMerger() {
return (lhs, rhs) -> {
Map<K, List<V>> result = new HashMap<>();
lhs.forEach((key, value) -> result.computeIfAbsent(key, k -> new ArrayList<>()).addAll(value));
rhs.forEach((key, value) -> result.computeIfAbsent(key, k -> new ArrayList<>()).addAll(value));
return result;
};
}
To do this, I had to come up with an intermediate data structure:
class KeyDataPoint {
String key;
DateTime timestamp;
Number data;
// obvious constructor and getters
}
With this in place, the approach is to "flatten" each MultiDataPoint into a list of (timestamp, key, data) triples and stream together all such triples from the list of MultiDataPoint.
Then, we apply a groupingBy operation on the string key in order to gather the data for each key together. Note that a simple groupingBy would result in a map from each string key to a list of the corresponding KeyDataPoint triples. We don't want the triples; we want DataPoint instances, which are (timestamp, data) pairs. To do this we apply a "downstream" collector of the groupingBy which is a mapping operation that constructs a new DataPoint by getting the right values from the KeyDataPoint triple. The downstream collector of the mapping operation is simply toList which collects the DataPoint objects of the same group into a list.
Now we have a Map<String, List<DataPoint>> and we want to convert it to a collection of DataSet objects. We simply stream out the map entries and construct DataSet objects, collect them into a list, and return it.
The code ends up looking like this:
Collection<DataSet> convertMultiDataPointToDataSet(List<MultiDataPoint> multiDataPoints) {
return multiDataPoints.stream()
.flatMap(mdp -> mdp.getData().entrySet().stream()
.map(e -> new KeyDataPoint(e.getKey(), mdp.getTimestamp(), e.getValue())))
.collect(groupingBy(KeyDataPoint::getKey,
mapping(kdp -> new DataPoint(kdp.getTimestamp(), kdp.getData()), toList())))
.entrySet().stream()
.map(e -> new DataSet(e.getKey(), e.getValue()))
.collect(toList());
}
I took some liberties with constructors and getters, but I think they should be obvious.
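For completeness, the assumed shape of KeyDataPoint, matching how it is used above:

class KeyDataPoint {
    private final String key;
    private final DateTime timestamp;
    private final Number data;

    KeyDataPoint(String key, DateTime timestamp, Number data) {
        this.key = key;
        this.timestamp = timestamp;
        this.data = data;
    }

    String getKey() { return key; }
    DateTime getTimestamp() { return timestamp; }
    Number getData() { return data; }
}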