Java 8 lambda: convert Collection to Map of element, iteration position - java

How do you convert a collection like ["a", "b", "c"] to a map like {"a": 0, "b": 1, "c": 2}, with the values being the order of iteration?
Is there a one liner with streams and collectors in JDK8 for it?
The old-fashioned way is like this:
Collection<String> collection = apiCall();
Map<String, Integer> map = new HashMap<>();
int pos = 0;
for (String s : collection) {
    map.put(s, pos++);
}

If you don't need a parallel stream you can use the size of the map as a running index:
collection.stream().forEach(i -> map.put(i, map.size()));

Here's an approach:
List<String> list = Arrays.asList("a", "b", "c");
Map<String, Integer> map =
    IntStream.range(0, list.size())
             .boxed()
             .collect(toMap(i -> list.get(i), i -> i));
Not necessarily a one-liner or shorter than the straightforward loop, but it does work using a parallel stream if you change toMap to toConcurrentMap.
Also note, this assumes that you have a random-access list, not a general Collection. If you have a Collection that you otherwise can make no assumptions about, there's not much you can do other than to iterate over it sequentially and increment a counter.
UPDATE
The OP has clarified that the input is a Collection and not a List so the above doesn't apply. It seems that we can assume very little about the input Collection. The OP has specified iteration order. With a sequential iterator, the elements will come out in some order although no guarantees can be made about it. It might change from run to run, or even from one iteration to the next (though this would be unusual in practice -- unless the underlying collection is modified).
If the exact iteration order needs to be preserved, I don't believe there's a way to preserve it into the result Map without iterating the input Collection sequentially.
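To illustrate the sequential case, here is a small sketch that preserves the input's encounter order by collecting into a LinkedHashMap (the LinkedHashMap choice is my assumption; any insertion-ordered map would do):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

public class OrderPreserving {
    public static void main(String[] args) {
        Collection<String> col = Arrays.asList("a", "b", "c");
        // LinkedHashMap keeps insertion order, so the input's iteration
        // order is preserved in the result.
        Map<String, Integer> map = new LinkedHashMap<>();
        col.forEach(s -> map.put(s, map.size()));
        System.out.println(map); // {a=0, b=1, c=2}
    }
}
```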
If, however, the exact iteration order isn't important, and the requirement is that the output Map have unique values for each input element, then it would be possible to do something like this in parallel:
Collection<String> col = apiCall();
Iterator<String> iter = col.iterator();
Map<String, Integer> map =
    IntStream.range(0, col.size())
             .parallel()
             .boxed()
             .collect(toConcurrentMap(i -> { synchronized (iter) { return iter.next(); } },
                                      i -> i));
This is now far from a one-liner. It's also not clear to me how useful it is. :-) But it does demonstrate that it's possible to do something like this in parallel. Note that we've had to synchronize access to the input collection's iterator since it will be called from multiple threads. Also note that this is an unusual use of the iterator, since we never call hasNext and we assume that it is safe to call next exactly the number of times returned by the input collection's size().

Based on maba’s answer the general solution is:
collection.stream().forEachOrdered(i -> map.put(i, map.size()));
From the documentation of void forEachOrdered(Consumer<? super T> action):
This operation processes the elements one at a time, in encounter order if one exists.
The important aspect here is that it retains the order if there is one, e.g. if the Collection is a SortedSet or a List. Such a stream is called an ordered stream (not to be confused with a sorted stream). It may invoke the consumer from different threads, but it always ensures the "one at a time" processing and its thread-safety guarantee.
Of course, it won’t benefit from parallel execution if the stream is parallel.
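A small sketch of that point: the map.size() counter stays correct even on a parallel stream, because forEachOrdered serializes the consumer calls (the sample data is made up for illustration):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ForEachOrderedDemo {
    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "c", "d", "e");
        Map<String, Integer> map = new HashMap<>();
        // forEachOrdered processes one element at a time in encounter order,
        // so map.size() is a valid running index even on a parallel stream.
        source.parallelStream().forEachOrdered(s -> map.put(s, map.size()));
        System.out.println(map.get("a") + " " + map.get("e")); // 0 4
    }
}
```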
For completeness, here is the solution which will work even on parallel streams utilizing the parallel processing, if they are still ordered:
stream.collect(HashMap::new,
    (m, i) -> m.put(i, m.size()),
    (a, b) -> { int offset = a.size(); b.forEach((k, v) -> a.put(k, v + offset)); });
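A runnable sketch of this mutable-reduction approach on a parallel stream (the sample list is made up; the combiner shifts the right-hand map's indices by the left-hand map's size):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParallelIndexMap {
    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c", "d");
        Map<String, Integer> map = list.parallelStream().collect(
            HashMap::new,
            // each chunk indexes its own elements from 0 upward
            (m, s) -> m.put(s, m.size()),
            // merge partial maps: offset the right-hand indices by the
            // size of the left-hand map before combining
            (a, b) -> { int offset = a.size(); b.forEach((k, v) -> a.put(k, v + offset)); });
        System.out.println(map.get("a") + " " + map.get("d")); // 0 3
    }
}
```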

If you don't mind using 3rd party libraries, my cyclops-react lib has extensions for all JDK Collection types, with a large number of powerful operations attached, so you could implement this like so:
CollectionX<String> col = CollectionX.fromCollection(orgCol);
col.zipWithIndex()
.toMap(k->k.v1, v->v.v2);
cyclops-react Collection extensions are eager, so you would get better performance with our Stream extension, ReactiveSeq (which extends jOOλ's Seq, which in turn extends JDK's java.util.stream.Stream; it also implements the reactive-streams API).
ReactiveSeq.fromCollection(col)
.zipWithIndex()
.toMap(k->k.v1, v->v.v2);

You can use AtomicInteger as index in stream:
Collection<String> col = Arrays.asList("a", "b", "c");
AtomicInteger index = new AtomicInteger(0);
Map<String, Integer> collectedMap = col.stream()
    .collect(Collectors.toMap(Function.identity(), el -> index.getAndIncrement()));
System.out.println("collectedMap = " + collectedMap);

Try
int[] pos = { 0 };
list.forEach(a -> map.put(a, pos[0]++));

Related

How to propagate variables in a stream in java 8

I would be curious to know how to propagate variables into a stream in Java 8.
An example is better than a long explanation, so how would you convert the following (abstract) code into streams:
Map<Integer, A> myMap = new HashMap<>();
for (Entry<Integer, A> entry : myMap.entrySet())
{
    int param1 = entry.getValue().getParam1();
    List<B> param2 = entry.getValue().getParam2();
    for (B b : param2)
    {
        System.out.println("" + entry.getKey() + "-" + param1 + "-" + b.toString());
    }
}
Knowing that this example is a simplification of the problem (for example, I need "param1" more than once in the nested loop).
So far, the only idea I have is to store all the information I need into a tuple and finally use the forEach stream method over that tuple.
(Not sure I'm being very clear....)
Edit: I simplified my example too much. My case is more like this:
Map<Integer, A> myMap = new HashMap<>();
for (Entry<Integer, A> entry : myMap.entrySet())
{
    int param1 = entry.getValue().getParam1();
    CustomList param2 = entry.getValue().getParam2();
    for (int i = 0; i < param2.size(); i++)
    {
        System.out.println("" + entry.getKey() + "-" + param1 + "-" + param2.get(i).toString());
    }
}
I could write something like that with stream:
myMap.entrySet().stream()
    .forEach(
        e -> IntStream.range(0, e.getValue().getParam2().size())
            .forEach(
                i -> System.out.println(e.getKey() + "-" + e.getValue().getParam1() + "-" + e.getValue().getParam2().get(i))
            )
    );
However, what I have instead of e.getValue().getParam2() in my real case is much more complex (a chain of 5-6 method calls) and heavier than just retrieving a variable (it executes some logic), so I would like to avoid repeating e.getValue().getParam2() (once just before the forEach, and once inside the forEach).
I know that it's maybe not the best use case for streams, but I am learning about them and would like to know their limits.
Thanks!
Something like this:
myMap.forEach(
(key, value) -> value.getParam2().forEach(
b -> System.out.println(key+"-"+value.getParam1()+"-"+b)
)
);
That is, for each key/value pair, iterate through value.getParam2(). For each one of those, print out string formatted as you specified. I'm not sure what that gets you, other than being basically what you had before, but using streams.
Update
Responding to updates to your question, this:
myMap.forEach((key, value) -> {
    final CustomList param2 = value.getParam2();
    IntStream.range(0, param2.size()).forEach(
        i -> System.out.println(key + "-" + value.getParam1() + "-" + param2.get(i))
    );
});
Here we assign the result of getParam2() to a final variable, so it is only calculated once. Final (and effectively final) variables are visible inside lambda functions.
(Thank you to Holger for the suggestions.)
Note that there are more features in the Java 8 API than just streams. Especially, if you just want to process all elements of a collection, you don’t need streams.
You can simplify every form of coll.stream().forEach(consumer) to coll.forEach(consumer). This applies to map.entrySet() as well, however, if you want to process all mappings of a Map, you can use forEach on the Map directly, providing a BiConsumer<KeyType,ValueType> rather than a Consumer<Map.Entry<KeyType,ValueType>>, which can greatly improve the readability:
myMap.forEach((key, value) -> {
int param1 = value.getParam1();
CustomList param2 = value.getParam2();
IntStream.range(0, param2.size()).mapToObj(param2::get)
.forEach(obj -> System.out.println(key+"-"+param1+"-"+obj));
});
It’s worth thinking about adding a forEach(Consumer<ElementType>) method to your CustomList, even if the CustomList doesn’t support the other standard collection operations…
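As a sketch of that suggestion, assuming CustomList is the OP's hypothetical array-backed list (the fields and constructor below are invented for illustration), such a forEach method could look like this:

```java
import java.util.function.Consumer;

// A hypothetical CustomList with a forEach method added, so callers can
// iterate without an index loop. The array-backed storage is an assumption.
class CustomList {
    private final Object[] items;

    CustomList(Object... items) { this.items = items; }

    public int size() { return items.length; }

    public Object get(int i) { return items[i]; }

    // The suggested forEach(Consumer<ElementType>) method.
    public void forEach(Consumer<Object> action) {
        for (Object item : items) action.accept(item);
    }
}

public class CustomListDemo {
    public static void main(String[] args) {
        new CustomList("x", "y").forEach(System.out::println);
    }
}
```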

Filtering a stream based on its values in a toMap collection

I have a situation where I have Player objects in a development project, and the task is simply measuring the distance and returning results which fall under a certain threshold. Of course, I'm wanting to use streams in the most concise manner possible.
Currently, I have a solution which maps the stream, and then filters via an iterator:
Stream<Player> str = /* source of my player stream I'm filtering */;
Map<Player, Double> dists = str.collect(Collectors.toMap(...)); // mapping function
Iterator<Map.Entry<Player, Double>> itr = dists.entrySet().iterator();
while (itr.hasNext()) {
    if (itr.next().getValue() <= radiusSquared) {
        itr.remove();
    }
}
However, what I'd like to achieve is something which performs this filtering while the stream is being operated upon, something which says "if this predicate fails, do not collect", to attempt and save the second iteration. Additionally, I don't want to calculate the distances twice, so doing a filter via the mapping function, and then re-mapping isn't a plausible solution.
The only real viable solution I've thought of is mapping to a Pair<A, B>, but if there's native support for some form of binary stream, that'd be better.
Is there native support for this in java's stream API?
Filtering a Map afterwards is not as bad as it seems, keep in mind that iterating over a Map does not imply the same cost as performing a lookup (e.g. hashing).
But instead of
Iterator<Map.Entry<Player, Double>> itr = dists.entrySet().iterator();
while (itr.hasNext()) {
    if (itr.next().getValue() <= radiusSquared) {
        itr.remove();
    }
}
you may simply use
dists.values().removeIf(value -> value <= radiusSquared);
Even if you insist on having it as part of the collect operation, you can do it as postfix operation:
Map<Player, Double> dists = str.collect(
Collectors.collectingAndThen(Collectors.toMap(p->p, p->calculate(p)),
map -> { map.values().removeIf(value -> value <= radiusSquared); return map; }));
Avoiding putting unwanted entries in the first place is possible, but it means manually retracing what the existing toMap collector does:
Map<Player, Double> dists = str.collect(
HashMap::new,
(m, p) -> { double value=calculate(p); if(value > radiusSquared) m.put(p, value); },
Map::putAll);
Note that your old-style iterator-loop could be rewritten in Java-8 using Collection.removeIf:
map.values().removeIf(dist -> dist <= radiusSquared);
So it's actually not that bad. Don't forget that the keySet() and values() views are modifiable.
If you want to solve this using a single pipeline (for example, because most of the entries are to be removed), then there's bad news for you. It seems the current Stream API does not allow you to do this without explicit use of a class with pair semantics. It's quite natural to create a Map.Entry instance, though the already existing option, AbstractMap.SimpleEntry, has a rather long and unpleasant name:
str.map(player -> new AbstractMap.SimpleEntry<>(player, getDistance(player)))
   .filter(entry -> entry.getValue() > radiusSquared)
   .collect(Collectors.toMap(Entry::getKey, Entry::getValue));
Note that it's likely that in Java-9 there will be Map.entry() static method, so you could use Map.entry(player, getDistance(player)). See JEP-269 for details.
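On Java 9 or later, the pipeline could then be sketched like this (the string-length "distance" is a made-up stand-in for the real calculation, and the player names are invented):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class EntryPipeline {
    public static void main(String[] args) {
        double radiusSquared = 10.0;
        List<String> players = Arrays.asList("near", "farther");
        // Map.entry (Java 9+) avoids the verbose AbstractMap.SimpleEntry name.
        Map<String, Double> dists = players.stream()
            .map(p -> Map.entry(p, p.length() * 2.0)) // stand-in distance
            .filter(e -> e.getValue() > radiusSquared)
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
        System.out.println(dists); // {farther=14.0}
    }
}
```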
As usual my StreamEx library has some syntactic sugar to solve this problem in cleaner way:
StreamEx.of(str).mapToEntry(player -> getDistance(player))
.filterValues(dist -> dist > radiusSquared)
.toMap();
And regarding the comments: yes, the toMap() collector inserts entries one by one, but don't worry: bulk inserts into a map rarely improve speed. You cannot even pre-size the hash table (if your map is hash-based), as you don't know much about the elements being inserted. Perhaps you want to insert a million objects that all share the same key: allocating a hash table for a million entries just to discover that you have only one entry after insertion would be too wasteful.
If your goal is just to have one iteration and calculate distances only once, then you could do this:
Stream<Player> str = /* source of my player stream I'm filtering */;
Map<Player, Double> dists = new HashMap<>();
str.forEach(p -> {
double distance = /* calculate distance */;
if (distance <= radiusSquared) {
dists.put(p, distance);
}
});
No collector any more, but is it that important?

Java 8 list to map with stream

I have a List<Item> collection.
I need to convert it into Map<Integer, Item>
The key of the map must be the index of the item in the collection.
I cannot figure out how to do this with streams.
Something like:
items.stream().collect(Collectors.toMap(...));
Any help?
As this question was identified as a possible duplicate, I need to add that my concrete problem was how to get the position of an item in the list and put it in as the key value.
You can create a Stream of the indices using an IntStream and then convert them to a Map:
Map<Integer, Item> map =
    IntStream.range(0, items.size())
             .boxed()
             .collect(Collectors.toMap(i -> i, i -> items.get(i)));
One more solution just for completeness is to use custom collector:
public static <T> Collector<T, ?, Map<Integer, T>> toMap() {
    return Collector.of(HashMap::new, (map, t) -> map.put(map.size(), t),
        (m1, m2) -> {
            int s = m1.size();
            m2.forEach((k, v) -> m1.put(k + s, v));
            return m1;
        });
}
Usage:
Map<Integer, Item> map = items.stream().collect(toMap());
This solution is parallel-friendly and does not depend on the source (you can use list without random access or Files.lines() or whatever).
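A quick sketch showing this collector applied to a parallel stream (sample data invented for the demonstration; the combiner shifts the second map's keys by the first map's size, so encounter order is preserved):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class IndexCollectorDemo {
    // The custom collector from the answer above, renamed for clarity.
    public static <T> Collector<T, ?, Map<Integer, T>> toIndexMap() {
        return Collector.of(HashMap::new,
            (map, t) -> map.put(map.size(), t),
            (m1, m2) -> {
                int s = m1.size();
                m2.forEach((k, v) -> m1.put(k + s, v));
                return m1;
            });
    }

    public static void main(String[] args) {
        Map<Integer, String> map = Stream.of("a", "b", "c", "d")
            .parallel()
            .collect(toIndexMap());
        System.out.println(map); // {0=a, 1=b, 2=c, 3=d}
    }
}
```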
Don't feel like you have to do everything in/with the stream. I would just do:
AtomicInteger index = new AtomicInteger();
items.stream().collect(Collectors.toMap(i -> index.getAndIncrement(), i -> i));
As long as you don't parallelise the stream this will work and it avoids potentially expensive and/or problematic (in the case of duplicates) get() and indexOf() operations.
(You cannot use a regular int variable in place of the AtomicInteger because variables used from outside a lambda expression must be effectively final. Note that when uncontested (as in this case), AtomicInteger is very fast and won't pose a performance problem. But if it worries you you can use a non-thread-safe counter.)
This is an updated answer and has none of the problems mentioned in the comments.
Map<Integer, Item> outputMap =
    IntStream.range(0, inputList.size())
             .boxed()
             .collect(Collectors.toMap(Function.identity(), i -> inputList.get(i)));
Using a third party library (protonpack for example, but there are others) you can zip the value with its index and voila:
StreamUtils.zipWithIndex(items.stream())
.collect(Collectors.toMap(Indexed::getIndex, Indexed::getValue));
although getIndex returns a long, so you may need to cast it using something similar to:
i -> Integer.valueOf((int) i.getIndex())
Eran's answer is usually the best approach for random-access lists.
If your List isn't random access, or if you have a Stream instead of a List, you can use forEachOrdered:
Stream<Item> stream = ... ;
Map<Integer, Item> map = new HashMap<>();
AtomicInteger index = new AtomicInteger();
stream.forEachOrdered(item -> map.put(index.getAndIncrement(), item));
This is safe, if the stream is parallel, even though the destination map is thread-unsafe and is operated upon as a side effect. The forEachOrdered guarantees that items are processed one-at-a-time, in order. For this reason it's unlikely that any speedup will result from running in parallel. (There might be some speedup if there are expensive operations in the pipeline before the forEachOrdered.)

Java 8 Streams: Map the same object multiple times based on different properties

I was presented with an interesting problem by a colleague of mine and I was unable to find a neat and pretty Java 8 solution. The problem is to stream through a list of POJOs and then collect them in a map based on multiple properties, with the mapping causing each POJO to occur multiple times in the resulting map.
Imagine the following POJO:
private static class Customer {
    public String first;
    public String last;

    public Customer(String first, String last) {
        this.first = first;
        this.last = last;
    }

    public String toString() {
        return "Customer(" + first + " " + last + ")";
    }
}
Set it up as a List<Customer>:
// The list of customers
List<Customer> customers = Arrays.asList(
new Customer("Johnny", "Puma"),
new Customer("Super", "Mac"));
Alternative 1: Use a Map outside of the "stream" (or rather outside forEach).
// Alt 1: not pretty since the resulting map is "outside" of
// the stream. If parallel streams are used it must be
// ConcurrentHashMap
Map<String, Customer> res1 = new HashMap<>();
customers.stream().forEach(c -> {
res1.put(c.first, c);
res1.put(c.last, c);
});
Alternative 2: Create map entries and stream them, then flatMap them. IMO it is a bit too verbose and not so easy to read.
// Alt 2: A bit verbose and "new AbstractMap.SimpleEntry" feels as
// a "hard" dependency to AbstractMap
Map<String, Customer> res2 =
    customers.stream()
             .map(p -> {
                 Map.Entry<String, Customer> firstEntry = new AbstractMap.SimpleEntry<>(p.first, p);
                 Map.Entry<String, Customer> lastEntry = new AbstractMap.SimpleEntry<>(p.last, p);
                 return Stream.of(firstEntry, lastEntry);
             })
             .flatMap(Function.identity())
             .collect(Collectors.toMap(
                 Map.Entry::getKey, Map.Entry::getValue));
Alternative 3: This is another one that I came up with the "prettiest" code so far but it uses the three-arg version of reduce and the third parameter is a bit dodgy as found in this question: Purpose of third argument to 'reduce' function in Java 8 functional programming. Furthermore, reduce does not seem like a good fit for this problem since it is mutating and parallel streams may not work with the approach below.
// Alt 3: using reduce. Not so pretty
Map<String, Customer> res3 = customers.stream().reduce(
new HashMap<>(),
(m, p) -> {
m.put(p.first, p);
m.put(p.last, p);
return m;
}, (m1, m2) -> m2 /* <- NOT USED UNLESS PARALLEL */);
If the above code is printed like this:
System.out.println(res1);
System.out.println(res2);
System.out.println(res3);
The result would be:
{Super=Customer(Super Mac), Johnny=Customer(Johnny Puma), Mac=Customer(Super Mac), Puma=Customer(Johnny Puma)}
{Super=Customer(Super Mac), Johnny=Customer(Johnny Puma), Mac=Customer(Super Mac), Puma=Customer(Johnny Puma)}
{Super=Customer(Super Mac), Johnny=Customer(Johnny Puma), Mac=Customer(Super Mac), Puma=Customer(Johnny Puma)}
So, now to my question: How should I, in a Java 8 orderly fashion, stream through the List<Customer> and then somehow collect it as a Map<String, Customer> where you split the whole thing as two keys (first AND last) i.e. the Customer is mapped twice. I do not want to use any 3rd party libraries, I do not want to use a map outside of the stream as in alt 1. Are there any other nice alternatives?
The full code can be found on hastebin for simple copy-paste to get the whole thing running.
I think your alternatives 2 and 3 can be re-written to be more clear:
Alternative 2:
Map<String, Customer> res2 = customers.stream()
.flatMap(
c -> Stream.of(c.first, c.last)
.map(k -> new AbstractMap.SimpleImmutableEntry<>(k, c))
).collect(toMap(Map.Entry::getKey, Map.Entry::getValue));
Alternative 3: Your code abuses reduce by mutating the HashMap. To do mutable reduction, use collect:
Map<String, Customer> res3 = customers.stream()
.collect(
HashMap::new,
(m,c) -> {m.put(c.first, c); m.put(c.last, c);},
HashMap::putAll
);
Note that these are not identical. Alternative 2 will throw an exception if there are duplicate keys while Alternative 3 will silently overwrite the entries.
If overwriting entries in case of duplicate keys is what you want, I would personally prefer Alternative 3. It is immediately clear to me what it does. It most closely resembles the iterative solution. I would expect it to be more performant as Alternative 2 has to do a bunch of allocations per customer with all that flatmapping.
However, Alternative 2 has a huge advantage over Alternative 3 by separating the production of entries from their aggregation. This gives you a great deal of flexibility. For example, if you want to change Alternative 2 to overwrite entries on duplicate keys instead of throwing an exception, you would simply add (a,b) -> b to toMap(...). If you decide you want to collect matching entries into a list, all you would have to do is replace toMap(...) with groupingBy(...), etc.
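For instance, the groupingBy variant might be sketched like this (using a simplified Customer class for illustration; each customer is listed under both name parts, and duplicates would accumulate in the list rather than throw):

```java
import java.util.Arrays;
import java.util.AbstractMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;
import static java.util.stream.Collectors.*;

public class NameIndex {
    static class Customer {
        final String first, last;
        Customer(String first, String last) { this.first = first; this.last = last; }
        public String toString() { return "Customer(" + first + " " + last + ")"; }
    }

    public static void main(String[] args) {
        List<Customer> customers = Arrays.asList(
            new Customer("Johnny", "Puma"),
            new Customer("Super", "Mac"));
        // Swapping toMap(...) for groupingBy(...) collects every customer
        // matching a name part into a list instead of overwriting.
        Map<String, List<Customer>> byName = customers.stream()
            .flatMap(c -> Stream.of(
                new AbstractMap.SimpleImmutableEntry<>(c.first, c),
                new AbstractMap.SimpleImmutableEntry<>(c.last, c)))
            .collect(groupingBy(Map.Entry::getKey,
                     mapping(Map.Entry::getValue, toList())));
        System.out.println(byName.get("Mac")); // [Customer(Super Mac)]
    }
}
```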

Is it possible to use an ordered Collector with a parallel stream?

When computing Cartesian Products using streams, I can produce them in parallel, and consume them in order, which the following code demonstrates:
int min = 0;
int max = 9;
Supplier<IntStream> supplier = () -> IntStream.rangeClosed(min, max).parallel();
supplier.get()
.flatMap(a -> supplier.get().map(b -> a * b))
.forEachOrdered(System.out::println);
This will print everything perfectly in order, now consider the following code where I want to add it to a list, while preserving the order.
int min = 0;
int max = 9;
Supplier<IntStream> supplier = () -> IntStream.rangeClosed(min, max).parallel();
List<Integer> list = supplier.get()
.flatMap(a -> supplier.get().map(b -> a * b))
.boxed()
.collect(Collectors.toList());
list.forEach(System.out::println);
Now it does not print in order!
This is understandable given that I nowhere demand that the order should be preserved.
Now the question: Is there a way to collect() or is there a Collector that preserves order?
When I executed your code, I got the outputs in order. In fact, I got the same output for both snippets. It seems the Collector returned by Collectors.toList() is already ordered, as depicted by the following code:
Collector<Object, ?, List<Object>> collector = Collectors.toList();
System.out.print(collector.characteristics());
it prints:
[IDENTITY_FINISH]
Since the UNORDERED characteristic is not set on the collector, it will process the elements in encounter order, which is the behaviour I'm seeing.
In fact this is clearly mentioned in the docs of Collectors.toList():
Returns:
a Collector which collects all the input elements into a List, in encounter order
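A small sketch comparing parallel and sequential collection of the Cartesian product above; both lists should come out equal because toList() respects encounter order:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderedCollect {
    public static void main(String[] args) {
        // Parallel pipeline: the source is ordered and toList() is not
        // UNORDERED, so the result is assembled in encounter order.
        List<Integer> parallel = IntStream.rangeClosed(0, 9).parallel()
            .flatMap(a -> IntStream.rangeClosed(0, 9).map(b -> a * b))
            .boxed()
            .collect(Collectors.toList());
        // Same pipeline run sequentially, for comparison.
        List<Integer> sequential = IntStream.rangeClosed(0, 9)
            .flatMap(a -> IntStream.rangeClosed(0, 9).map(b -> a * b))
            .boxed()
            .collect(Collectors.toList());
        System.out.println(parallel.equals(sequential)); // true
    }
}
```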
