Idiomatically enumerating a Stream of objects in Java 8

How can one idiomatically enumerate a Stream<T>, mapping each T instance to a unique integer, using the Java 8 stream methods (e.g. for an array T[] values, creating a Map<T, Integer> where map.get(values[i]) == i evaluates to true)?
Currently, I'm defining an anonymous class which increments an int field for use with the Collectors.toMap(..) method:
private static <T> Map<T, Integer> createIdMap(final Stream<T> values) {
    return values.collect(Collectors.toMap(Function.identity(), new Function<T, Integer>() {
        private int nextId = 0;

        @Override
        public Integer apply(final T t) {
            return nextId++;
        }
    }));
}
Is there a more concise/elegant way of doing this using the Java 8 stream API? Bonus points if it can be safely parallelized.

Your approach will fail if there is a duplicate element.
Besides that, your task requires mutable state, hence it can be solved with mutable reduction. When we populate a map, we can simply use the map's size to get an unused id.
The trickier part is the merge operation. The following merge function simply repeats the assignments for the right map, which also handles potential duplicates.
private static <T> Map<T, Integer> createIdMap(Stream<T> values) {
    return values.collect(HashMap::new,
        (m, t) -> m.putIfAbsent(t, m.size()),
        (m1, m2) -> {
            if (m1.isEmpty()) m1.putAll(m2);
            else m2.keySet().forEach(t -> m1.putIfAbsent(t, m1.size()));
        });
}
If we rely on unique elements, or insert an explicit distinct(), we can use
private static <T> Map<T, Integer> createIdMap(Stream<T> values) {
    return values.distinct().collect(HashMap::new,
        (m, t) -> m.put(t, m.size()),
        (m1, m2) -> {
            int leftSize = m1.size();
            if (leftSize == 0) m1.putAll(m2);
            else m2.forEach((t, id) -> m1.put(t, leftSize + id));
        });
}
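A quick usage sketch of the distinct-based variant (the values are made up, just for illustration):
// With an ordered source, the sequential and parallel runs should both assign
// consecutive ids in encounter order to the distinct elements.
List<String> values = Arrays.asList("a", "b", "c", "a");
Map<String, Integer> ids = createIdMap(values.stream());          // {a=0, b=1, c=2}
Map<String, Integer> parallelIds = createIdMap(values.parallelStream());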

I would do it in this way:
private static <T> Map<T, Integer> createIdMap2(final Stream<T> values) {
    List<T> list = values.collect(Collectors.toList());
    return IntStream.range(0, list.size()).boxed()
            .collect(Collectors.toMap(list::get, Function.identity()));
}
For the sake of parallelism, it can be changed to
return IntStream.range(0, list.size()).parallel().boxed().
(...)
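Spelled out in full, this presumably becomes the same collect as the sequential version above, only with parallel() inserted:
// Sketch of the full parallel variant; identical to the sequential method
// above except for the parallel() call in the IntStream pipeline.
private static <T> Map<T, Integer> createIdMap2(final Stream<T> values) {
    List<T> list = values.collect(Collectors.toList());
    return IntStream.range(0, list.size()).parallel().boxed()
            .collect(Collectors.toMap(list::get, Function.identity()));
}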

Compared to converting the input stream to a List first, as in the solution provided by Andremoniy, I would prefer a different way, because we don't know the cost of toList() and list.get(i), and it's unnecessary to create an extra List, which could be small or big:
private static <T> Map<T, Integer> createIdMap2(final Stream<T> values) {
    final MutableInt idx = MutableInt.of(0); // Or: final AtomicInteger idx = new AtomicInteger(0);
    return values.collect(Collectors.toMap(Function.identity(), e -> idx.getAndIncrement()));
}
Regardless of the question, I think it's bad design to pass streams as method parameters.

Related

How to avoid multiple Streams with Java 8

I have the code below:
trainResponse.getIds().stream()
        .filter(id -> id.getType().equalsIgnoreCase("Company"))
        .findFirst()
        .ifPresent(id -> domainResp.setId(id.getId()));

trainResponse.getIds().stream()
        .filter(id -> id.getType().equalsIgnoreCase("Private"))
        .findFirst()
        .ifPresent(id -> domainResp.setPrivateId(id.getId()));
Here I'm streaming over the list of Id objects twice. The only difference between the two streams is the filter() operation.
How can this be achieved in a single iteration, and what is the best approach (in terms of time and space complexity)?
You can achieve that with the Stream API in one pass through the given data and without increasing memory consumption (i.e. the result will contain only the ids having the required attributes).
For that, you can create a custom Collector that expects as its parameters a Collection of attributes to look for and a Function responsible for extracting the attribute from the stream element.
Here is how this generic collector could be implemented.
/**
 * @param <T> - the type of stream elements
 * @param <F> - the type of the key (a field of the stream element)
 */
class CollectByKey<T, F> implements Collector<T, Map<F, T>, Map<F, T>> {
    private final Set<F> keys;
    private final Function<T, F> keyExtractor;

    public CollectByKey(Collection<F> keys, Function<T, F> keyExtractor) {
        this.keys = new HashSet<>(keys);
        this.keyExtractor = keyExtractor;
    }

    @Override
    public Supplier<Map<F, T>> supplier() {
        return HashMap::new;
    }

    @Override
    public BiConsumer<Map<F, T>, T> accumulator() {
        return this::tryAdd;
    }

    private void tryAdd(Map<F, T> map, T item) {
        F key = keyExtractor.apply(item);
        if (keys.remove(key)) {
            map.put(key, item);
        }
    }

    @Override
    public BinaryOperator<Map<F, T>> combiner() {
        return this::tryCombine;
    }

    private Map<F, T> tryCombine(Map<F, T> left, Map<F, T> right) {
        right.forEach(left::putIfAbsent);
        return left;
    }

    @Override
    public Function<Map<F, T>, Map<F, T>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.emptySet();
    }
}
main() - demo (dummy Id class is not shown)
public class CustomCollectorByGivenAttributes {
    public static void main(String[] args) {
        List<Id> ids = List.of(new Id(1, "Company"), new Id(2, "Fizz"),
                               new Id(3, "Private"), new Id(4, "Buzz"));

        Map<String, Id> idByType = ids.stream()
                .collect(new CollectByKey<>(List.of("Company", "Private"), Id::getType));

        idByType.forEach((k, v) -> {
            if (k.equalsIgnoreCase("Company")) domainResp.setId(v);
            if (k.equalsIgnoreCase("Private")) domainResp.setPrivateId(v);
        });

        System.out.println(idByType.keySet()); // printing keys - added for demo purposes
    }
}
Output
[Company, Private]
Note that after the set of keys becomes empty (i.e. all the required data has been found), further elements of the stream are ignored, but the remaining elements still need to be traversed.
IMO, the two streams solution is the most readable. And it may even be the most efficient solution using streams.
IMO, the best way to avoid multiple streams is to use a classical loop. For example:
// There may be bugs ...
boolean seenCompany = false;
boolean seenPrivate = false;
for (Id id : getIds()) {
    if (!seenCompany && id.getType().equalsIgnoreCase("Company")) {
        domainResp.setId(id.getId());
        seenCompany = true;
    } else if (!seenPrivate && id.getType().equalsIgnoreCase("Private")) {
        domainResp.setPrivateId(id.getId());
        seenPrivate = true;
    }
    if (seenCompany && seenPrivate) {
        break;
    }
}
It is unclear whether performing one iteration is more efficient than two. It will depend on the class returned by getIds() and the cost of iteration.
The complicated part with the two flags is how you replicate the short-circuiting behavior of findFirst() in your two-stream solution. I don't know if it is possible to do that at all using one stream. If you can, it will involve some pretty cunning code.
But as you can see, your original solution with two streams is clearly easier to understand than the above.
The main point of using streams is to make your code simpler. It is not about efficiency. When you try to do complicated things to make the streams more efficient, you are probably defeating the (true) purpose of using streams in the first place.
For your list of ids, you could just build a map in one pass and then assign the values after retrieving them, if present (the assignment step is sketched after the loop below).
Map<String, Integer> seen = new HashMap<>();
for (Id id : ids) {
    if (seen.size() == 2) {
        break;
    }
    seen.computeIfAbsent(id.getType().toLowerCase(), v -> id.getId());
}
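The "assign after retrieving" step could then look roughly like this (a sketch assuming the setters accept the stored id values):
// Sketch of the assignment step; keys were stored lower-cased in the loop above.
Integer companyId = seen.get("company");
if (companyId != null) {
    domainResp.setId(companyId);
}

Integer privateId = seen.get("private");
if (privateId != null) {
    domainResp.setPrivateId(privateId);
}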
If you want to test it, you can use the following:
record Id(String getType, int getId) {
    @Override
    public String toString() {
        return String.format("[%s,%s]", getType, getId);
    }
}
Random r = new Random();
List<Id> ids = r.ints(20, 1, 100)
        .mapToObj(id -> new Id(r.nextBoolean() ? "Company" : "Private", id))
        .toList();
Edited to allow only certain types to be checked
If you have more than two types but only want to check certain ones, you can do it as follows.
The process is the same except that you have a Set of allowed types, and you simply check that you are processing one of those types by using contains.
Map<String, Integer> seen = new HashMap<>();
Set<String> allowedTypes = Set.of("company", "private");
for (Id id : ids) {
    String type = id.getType();
    if (allowedTypes.contains(type.toLowerCase())) {
        if (seen.size() == allowedTypes.size()) {
            break;
        }
        seen.computeIfAbsent(type, v -> id.getId());
    }
}
Testing is similar except that additional types need to be included: create a list of some types that could be present and build the list of ids as before. Note that the size of allowedTypes replaces the value 2, to permit more than two types to be checked before exiting the loop.
List<String> possibleTypes =
        List.of("Company", "Type1", "Private", "Type2");
Random r = new Random();
List<Id> ids = r.ints(30, 1, 100)
        .mapToObj(id -> new Id(
                possibleTypes.get(r.nextInt(possibleTypes.size())), id))
        .toList();
You can group by type and check the resulting map. I assume the elements of getIds() are of type IdType.
Map<String, List<IdType>> map = trainResponse.getIds()
        .stream()
        .collect(Collectors.groupingBy(id -> id.getType().toLowerCase()));

Optional.ofNullable(map.get("company"))
        .ifPresent(ids -> domainResp.setId(ids.get(0).getId()));
Optional.ofNullable(map.get("private"))
        .ifPresent(ids -> domainResp.setPrivateId(ids.get(0).getId()));
I'd recommend a traditional for loop. In addition to being easily scalable, it prevents you from traversing the collection multiple times.
Your code looks like something that will be generalised in the future, hence my generic approach.
Here's some pseudocode (with errors, just for the sake of illustration):
Set<String> matches = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
for (Id id : trainResponse.getIds()) {
    if (!matches.add(id.getType())) {
        continue;   // this type was already handled
    }
    switch (id.getType().toLowerCase()) {
        case "company":
            domainResp.setId(id.getId());
            break;
        case "private":
            ...
    }
}
Something along these lines might work; it would go through the whole stream, though, and won't stop at the first occurrence.
But assuming a small stream and only one Id per type, why not?
Map<String, Consumer<String>> setters = new HashMap<>();
setters.put("Company", domainResp::setId);
setters.put("Private", domainResp::setPrivateId);

trainResponse.getIds().forEach(id -> {
    if (setters.containsKey(id.getType())) {
        setters.get(id.getType()).accept(id.getId());
    }
});
We can use Collectors.filtering, available from Java 9 onwards, to collect the values based on a condition.
For this scenario, I have changed the code as below:
final Map<String, String> results = trainResponse.getIds()
        .stream()
        .collect(Collectors.filtering(
                id -> id.getType().equals("Company") || id.getType().equals("Private"),
                Collectors.toMap(Id::getType, Id::getId, (first, second) -> first)));
And getting the id from results Map.
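For instance, reading the collected values back out might look like this (a sketch assuming the setters accept the mapped id values directly):
// The map is keyed by the type strings used in the filter above.
Optional.ofNullable(results.get("Company")).ifPresent(domainResp::setId);
Optional.ofNullable(results.get("Private")).ifPresent(domainResp::setPrivateId);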

Stream API how to modify key and value in a map?

I have a String-to-String[] map that looks like this:
dishIdQuantityMap[43]=[Ljava.lang.String;@301d55ce
dishIdQuantityMap[42]=[Ljava.lang.String;@72cb31c2
dishIdQuantityMap[41]=[Ljava.lang.String;@1670799
dishIdQuantityMap[40]=[Ljava.lang.String;@a5b3d21
What I need to do is:
Create a new map where the key is only the numbers extracted from the String, like this: key -> key.replaceAll("\\D+", "")
The value is the first value from the array, like this: value -> value[0]
Filter the map so that only those pairs are left where value > 0
I've spent an hour trying to solve it myself, but I fail with the .collect(Collectors.toMap()) method.
UPD:
Here is the code I've written so far; I fail to filter the map.
HashMap<Long, Integer> myHashMap = request.getParameterMap().entrySet().stream()
        .filter(e -> Integer.parseInt(e.getValue()[0]) > 0)
        .collect(Collectors.toMap(MapEntry::getKey, MapEntry::getValue));
You can do it by using a stream and an auxiliary KeyValuePair class.
The KeyValuePair would be as simple as:
public class KeyValuePair {
    private String key;
    private int value;

    public KeyValuePair(String key, int value) {
        this.key = key;
        this.value = value;
    }

    // getters and setters
}
Having this class you can use streams as below:
Map<String, Integer> resultMap = map.entrySet().stream()
        .map(entry -> new KeyValuePair(entry.getKey().replaceAll("Key", "k"),
                                       Integer.parseInt(entry.getValue()[0])))
        .filter(kvp -> kvp.value > 0)
        .collect(Collectors.toMap(KeyValuePair::getKey, KeyValuePair::getValue));
In the example I'm not replacing and filtering exactly by the conditions you need but, as you said you are having problems with the collector, you probably just have to adapt the code you currently have.
I addressed this problem in the following order:
Filter out the entries with value[0] > 0. This step is the last on your list, but with regard to performance it's better to put this operation at the beginning of the pipeline: it might decrease the number of objects that have to be created during the map() operation.
Update the entries, i.e. replace every entry with a new one. Note that this step doesn't require creating a custom class to represent a key-value pair; AbstractMap.SimpleEntry has been with us for a while, and since Java 9 we can use the static method entry() of the Map interface instead of instantiating AbstractMap.SimpleEntry.
Collect the entries into the map.
public static Map<Long, Integer> processMap(Map<String, String[]> source) {
    return source.entrySet().stream()
            .filter(entry -> Integer.parseInt(entry.getValue()[0]) > 0)
            .map(entry -> updateEntry(entry))
            .collect(Collectors.toMap(Map.Entry::getKey,
                                      Map.Entry::getValue));
}

private static Map.Entry<Long, Integer> updateEntry(Map.Entry<String, String[]> entry) {
    return Map.entry(parseKey(entry.getKey()), parseValue(entry.getValue()));
}
The logic for parsing keys and values was extracted into separate methods to make the code cleaner.
private static Long parseKey(String key) {
    return Long.parseLong(key.replaceAll("\\D+", ""));
}

private static Integer parseValue(String[] value) {
    return Integer.parseInt(value[0]);
}

public static void main(String[] args) {
    System.out.println(processMap(Map.of("48i;", new String[]{"1", "2", "3"},
                                         "129!;", new String[]{"9", "5", "9"})));
}
Output
{48=1, 129=9}

How to efficiently compute the maximum value of a collection after applying some function

Suppose you have a method like this that computes the maximum of a Collection for some ToIntFunction:
static <T> void foo1(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    if (collection.isEmpty())
        throw new NoSuchElementException();

    int max = Integer.MIN_VALUE;
    T maxT = null;
    for (T t : collection) {
        int result = function.applyAsInt(t);
        if (result >= max) {
            max = result;
            maxT = t;
        }
    }
    // do something with maxT
}
With Java 8, this could be translated into
static <T> void foo2(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    T maxT = collection.stream()
            .max(Comparator.comparingInt(function))
            .get();
    // do something with maxT
}
A disadvantage of the new version is that function.applyAsInt is invoked repeatedly for the same value of T. (Specifically, if the collection has size n, foo1 invokes applyAsInt n times, whereas foo2 invokes it 2n - 2 times, because each of the n - 1 comparisons made by max applies the key extractor to both arguments.)
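To see the difference concretely, one can wrap the function with a counter (a throwaway sketch, not part of the original code; it assumes the foo1/foo2 definitions above are in scope):
// Count how many times the function is applied by each variant.
AtomicInteger calls = new AtomicInteger();
ToIntFunction<String> counted = s -> {
    calls.incrementAndGet();
    return s.length();
};
List<String> data = Arrays.asList("a", "bb", "ccc", "dddd");

calls.set(0);
foo1(data, counted);
System.out.println(calls.get()); // n = 4 invocations

calls.set(0);
foo2(data, counted);
System.out.println(calls.get()); // 2n - 2 = 6 invocations (each comparison applies it twice)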
Disadvantages of the first approach are that the code is less clear and you can't modify it to use parallelism.
Suppose you wanted to do this using parallel streams and only invoke applyAsInt once per element. Can this be written in a simple way?
You can use a custom collector that keeps a running pair of the maximum value and the maximum element:
static <T> void foo3(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    class Pair {
        int max = Integer.MIN_VALUE;
        T maxT = null;
    }

    T maxT = collection.stream().collect(Collector.of(
        Pair::new,
        (p, t) -> {
            int result = function.applyAsInt(t);
            if (result >= p.max) {
                p.max = result;
                p.maxT = t;
            }
        },
        (p1, p2) -> p2.max > p1.max ? p2 : p1,
        p -> p.maxT
    ));
    // do something with maxT
}
One advantage is that this creates a single Pair intermediate object that is used throughout the (sequential) collecting process. Each time an element is accepted, this holder is updated with the new maximum. The finisher operation just returns the maximum element and discards the maximum value.
As I stated in the comments, I would suggest introducing an intermediate data structure like:
static <T> void foo2(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    if (collection.isEmpty()) {
        throw new IllegalArgumentException();
    }

    class Pair {
        final T value;
        final int result;

        public Pair(T value, int result) {
            this.value = value;
            this.result = result;
        }

        public T getValue() {
            return value;
        }

        public int getResult() {
            return result;
        }
    }

    T maxT = collection.stream().map(t -> new Pair(t, function.applyAsInt(t)))
            .max(Comparator.comparingInt(Pair::getResult)).get().getValue();
    // do something with maxT
}
Another way would be to use a memoized version of function:
static <T> void foo2(Collection<? extends T> collection,
                     ToIntFunction<? super T> function, T defaultValue) {
    T maxT = collection.parallelStream()
            .max(Comparator.comparingInt(ToIntMemoizer.memoize(function)))
            .orElse(defaultValue);
    // do something with maxT
}
Where ToIntMemoizer.memoize(function) code would be as follows:
public class ToIntMemoizer<T> {
    private final Map<T, Integer> cache = new ConcurrentHashMap<>();

    private ToIntMemoizer() {
    }

    private ToIntFunction<T> doMemoize(ToIntFunction<T> function) {
        return input -> cache.computeIfAbsent(input, function::applyAsInt);
    }

    public static <T> ToIntFunction<T> memoize(ToIntFunction<T> function) {
        return new ToIntMemoizer<T>().doMemoize(function);
    }
}
This uses a ConcurrentHashMap to cache already computed results. If you don't need to support parallelism, you can simply use a HashMap.
One disadvantage is that the result of the function needs to be boxed/unboxed. On the other hand, as the function is memoized, the result is computed only once for each distinct element of the collection; if the function is invoked with a repeated input value, the result is returned from the cache.
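A rough usage sketch (with made-up data; String::length stands in for an assumed expensive function):
// Each distinct element's value is computed at most once, even though max()
// consults the comparator about 2n - 2 times; repeats ("api") hit the cache.
List<String> words = Arrays.asList("stream", "api", "java", "api");
ToIntFunction<String> memoized = ToIntMemoizer.memoize(String::length);
String longest = words.parallelStream()
        .max(Comparator.comparingInt(memoized))
        .orElse("");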
If you don't mind using a third-party library, my StreamEx library optimizes all these cases in special methods like maxByInt and so on, so you can simply use:
static <T> void foo3(Collection<? extends T> collection, ToIntFunction<? super T> function) {
    T maxT = StreamEx.of(collection).parallel()
            .maxByInt(function)
            .get();
    // do something with maxT
}
The implementation uses reduce with a mutable container. This probably abuses the API a little, but it works fine for sequential and parallel streams and, unlike the collect solution, defers the container allocation to the first accumulated element (thus no container is allocated if a parallel subtask covers no elements, which occurs quite often if you have a filtering operation upstream).

Assign unique IDs to objects using Java 8 streams

static <T> Map<T, Integer> assignIds(Collection<T> objects);
I want to write a function that takes a collection of unique objects and assigns a different ID number to each. The ID numbers should be assigned sequentially.
I could easily do this with an explicit loop like:
Map<T, Integer> ids = new HashMap<>();
int id = 0;
for (T object : objects) {
    ids.put(object, id++);
}
Is there an elegant way to do this with the new Java 8 Stream API?
Here's one way:
static <T> Map<T, Integer> assignIds(Collection<T> objects) {
    AtomicInteger ai = new AtomicInteger();
    return objects.stream()
            .collect(Collectors.toMap(o -> o, o -> ai.getAndIncrement()));
}
The above solution could also use parallelStream() instead of stream(); the ids would still be unique, but they would no longer be assigned in encounter order.
Here's another that works sequentially:
static <T> Map<T, Integer> assignIds(Collection<T> objects) {
    Map<T, Integer> result = new HashMap<>();
    objects.stream().forEachOrdered(o -> result.put(o, result.size()));
    return result;
}
Building upon ZouZou's answer...
static <T> Map<T, Integer> assignIds(Collection<T> objects) {
    OfInt ids = IntStream.range(0, objects.size()).iterator();
    return objects.stream().collect(Collectors.toMap(o -> o, o -> ids.next()));
}
The idiomatic way to do this in for instance Scala would be to use zipWithIndex. There's no such method in the Java 8 Streams API, not even a zip method which you could combine with an IntStream.
You could use a primitive iterator to generate the ids:
static <T> Map<T, Integer> assignIds(Collection<T> objects) {
    PrimitiveIterator.OfInt iterator = IntStream.iterate(0, x -> x + 1)
            .limit(objects.size())
            .iterator();
    return objects.stream().collect(Collectors.toMap(obj -> obj, id -> iterator.next()));
}
You might be interested in the protonpack library, which defines some utility methods for Streams (such as zipWithIndex). So it could look like this:
static <T> Map<T, Long> assignIds(Collection<T> objects) {
    return StreamUtils.zipWithIndex(objects.stream())
            .collect(Collectors.toMap(Indexed::getValue, Indexed::getIndex));
}
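A quick usage note (protonpack's Indexed exposes a long index, hence the Map<T, Long> result):
// Ids follow the stream's encounter order.
Map<String, Long> ids = assignIds(Arrays.asList("a", "b", "c"));
System.out.println(ids); // e.g. {a=0, b=1, c=2} (map iteration order may vary)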

Two way collections in Java

I have a list of objects. The objects are given an ID and stored in a Hashtable. If I need an object with a particular ID, I simply say:
ht.get(ID);
However, sometimes I need to get the ID for a given object, something like:
ht.get(Object);
My first idea is to use two different HashTables; one for ID -> Object mapping and the other for Object -> ID mapping.
Does this sound like a good enough solution?
If you cannot use external collections (as you seem not to want to, given one of your comments), you could write a simple class to do what you want (which, yes, is essentially your first thought), along the lines of the following (I didn't compile this, and it is just a first thought, so it could be a bad idea, etc.):
EDIT: there are now two versions, one that allows duplicate values and one that does not. The one that does not will remove the old key if a value is overwritten.
This version does not allow duplicate values:
class Foo<K, V>
{
    private final Map<K, V> keyValue;
    private final Map<V, K> valueKey;

    {
        keyValue = new HashMap<K, V>();
        valueKey = new HashMap<V, K>();
    }

    // this makes sure that you do not have duplicate values.
    public void put(final K key, final V value)
    {
        if(keyValue.containsValue(value))
        {
            keyValue.remove(valueKey.get(value));
        }

        keyValue.put(key, value);
        valueKey.put(value, key);
    }

    public V getValueForKey(final K key)
    {
        return (keyValue.get(key));
    }

    public K getKeyForValue(final V value)
    {
        return (valueKey.get(value));
    }

    public static void main(final String[] argv)
    {
        Foo<String, String> foo;

        foo = new Foo<String, String>();
        foo.put("a", "Hello");
        foo.put("b", "World");
        foo.put("c", "Hello");

        System.out.println(foo.getValueForKey("a"));
        System.out.println(foo.getValueForKey("b"));
        System.out.println(foo.getValueForKey("c"));
        System.out.println(foo.getKeyForValue("Hello"));
        System.out.println(foo.getKeyForValue("World"));
    }
}
This version allows duplicate values and gives you back a list of all of the keys that have a given value:
class Foo<K, V>
{
    private final Map<K, V> keyValue;
    private final Map<V, List<K>> valueKeys;

    {
        keyValue = new HashMap<K, V>();
        valueKeys = new HashMap<V, List<K>>();
    }

    public void put(final K key, final V value)
    {
        List<K> values;

        keyValue.put(key, value);
        values = valueKeys.get(value);

        if(values == null)
        {
            values = new ArrayList<K>();
            valueKeys.put(value, values);
        }

        values.add(key);
    }

    public V getValueForKey(final K key)
    {
        return (keyValue.get(key));
    }

    public List<K> getKeyForValue(final V value)
    {
        return (valueKeys.get(value));
    }

    public static void main(final String[] argv)
    {
        Foo<String, String> foo;

        foo = new Foo<String, String>();
        foo.put("a", "Hello");
        foo.put("b", "World");
        foo.put("c", "Hello");

        System.out.println(foo.getValueForKey("a"));
        System.out.println(foo.getValueForKey("b"));
        System.out.println(foo.getValueForKey("c"));
        System.out.println(foo.getKeyForValue("Hello"));
        System.out.println(foo.getKeyForValue("World"));
    }
}
Hiding the two maps in a class is a good idea, because if you find a better way later, all you need to do is replace the innards of the class and the rest of your code is left untouched.
If using an external library is OK, you should check BiMap on google collections:
http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/BiMap.html
What you are looking for is a bidirectional map. You can find one in Commons Collections (the classes implementing the BidiMap interface) or in Google Guava.
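With Guava, the usage is roughly as follows (a sketch using com.google.common.collect.HashBiMap; the key/value types are just examples):
// A BiMap enforces unique values and keeps both directions in sync.
BiMap<Integer, String> byId = HashBiMap.create();
byId.put(1, "alpha");
byId.put(2, "beta");

String value = byId.get(1);              // ID -> object: "alpha"
Integer id = byId.inverse().get("beta"); // object -> ID, via the inverse view: 2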
What you are looking for is a Bi-directional Map.
Try Apache Collections BidiMap.
http://commons.apache.org/collections/api-3.1/org/apache/commons/collections/BidiMap.html
Not that I know of immediately, but you can build one. How about having a single collection of your objects and several lookup structures (hash maps or trees) that don't store the objects themselves (to save memory) but an index into your single collection? That way you use whichever lookup structure you need (ID -> object or vice versa), get back an integer index, and use it to index into your original collection. This also lets you do more than a bidirectional lookup if you need to in the future.
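A rough sketch of that idea (all names here are made up for illustration):
// One list holds the objects; the lookup maps store indexes into that list.
class IndexedStore<T> {
    private final List<T> objects = new ArrayList<>();
    private final List<String> ids = new ArrayList<>();            // parallel to objects
    private final Map<String, Integer> indexById = new HashMap<>();
    private final Map<T, Integer> indexByObject = new HashMap<>();

    void add(String id, T obj) {
        int index = objects.size();
        objects.add(obj);
        ids.add(id);
        indexById.put(id, index);
        indexByObject.put(obj, index);
    }

    T getById(String id) {
        return objects.get(indexById.get(id));
    }

    String getIdFor(T obj) {
        return ids.get(indexByObject.get(obj));
    }
}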
