Using Java Streams to operate on an integer power set

I am generating a power set (Set<Set<Integer>>) from an original set (Set<Integer>).
i.e. {1, 2, 3} -> { {}, {1}, {2}, {3}, {1,2}, {2,3}, {1,3}, {1,2,3} }
Then I am using an isClique(Set<Integer>) method that returns whether the given set is a clique in the adjacency matrix I am using.
I want to use a Java stream to parallelize this operation and return the largest subset that is also a clique.
I am thinking something like this, but every variation I come up with causes a variety of compilation errors.
Optional result = powerSet.stream().parallel().
    filter(e -> { return (isClique(e)); }).
    collect(Collectors.maxBy(Comparator Set<Integer> comparator));
I either get:
MaxClique.java:86: error: incompatible types: Stream<Set<Integer>> cannot be converted to Set<Integer>
currentMax = powerSet.stream().parallel().filter(e -> { return (isClique(e));});//.collect(Collectors.maxBy(Comparator <Set<Integer>> comparator));
or something related to the comparator (which I'm not sure I'm doing correctly).
Please advise, thanks.

You have some syntax problems. But besides that, you can compute the desired Optional using:
Optional<Set<Integer>> result = powerSet.stream().parallel()
        .filter(e -> isClique(e))
        .collect(Collectors.maxBy(
                (set1, set2) -> Integer.compare(set1.size(), set2.size())));
This is filtering based on your condition, then pulling the max value based on a comparator that compares set sizes.

Your major issue is using the wrong syntax for the comparator. Rather, you'd want something along the lines of:
Optional<Set<Integer>> resultSet =
        powerSet.stream()
                .parallel()
                .filter(e -> isClique(e))
                .max(Comparator.comparingInt(Set::size));
Note the use of the max method as opposed to maxBy; maxBy is typically used as a downstream collector. In fact, the real motivation for its existence is to be used as a downstream collector.
Also, note the use of Optional<Set<Integer>> as the receiver type, as opposed to the raw Optional in your example code snippet. Raw types should be avoided unless there is no choice.
Last but not least, if you haven't already done so, I'd suggest you first execute the code sequentially; if you then find you can benefit from parallel streams, proceed with the current approach.
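For reference, here is a minimal, self-contained sketch of this approach. The adjacency matrix, the isClique implementation, and the hard-coded power set are illustrative assumptions (vertices are 0-based so they can index the matrix), not code from the question:
import java.util.*;

public class MaxCliqueSketch {
    // Assumed 3-vertex adjacency matrix in which every pair of vertices is connected.
    static boolean[][] adjacent = {
            {false, true,  true},
            {true,  false, true},
            {true,  true,  false}
    };

    // A set is a clique if every pair of distinct vertices in it is adjacent.
    static boolean isClique(Set<Integer> vertices) {
        for (int v : vertices)
            for (int w : vertices)
                if (v != w && !adjacent[v][w])
                    return false;
        return true;
    }

    public static void main(String[] args) {
        Set<Set<Integer>> powerSet = Set.of(
                Set.of(), Set.of(0), Set.of(1), Set.of(2),
                Set.of(0, 1), Set.of(0, 2), Set.of(1, 2), Set.of(0, 1, 2));

        Optional<Set<Integer>> largestClique = powerSet.stream().parallel()
                .filter(MaxCliqueSketch::isClique)
                .max(Comparator.comparingInt(Set::size));

        largestClique.ifPresent(System.out::println); // prints [0, 1, 2] in some order
    }
}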

Related

Java stream map-if-or-else with predicate and alternative value

Do Java streams have a convenient way to map based upon a predicate, but map to some other value if the predicate is not met?
Let's say I have Stream.of("2021", "", "2023"). I want to map that to Stream.of(Optional.of(Year.of(2021)), Optional.empty(), Optional.of(Year.of(2023))). Here's one way I could do that:
Stream<String> yearStrings = Stream.of("2021", "", "2023");
Stream<Optional<Year>> yearsFound = yearStrings
        .map(yearString -> !yearString.isEmpty() ? Year.parse(yearString) : null)
        .map(Optional::ofNullable);
But here is what I would like to do, using a hypothetical filter-map:
Stream<String> yearStrings = Stream.of("2021", "", "2023");
Stream<Optional<Year>> yearsFound = yearStrings
        .mapIfOrElse(not(String::isEmpty), Year::parse, null)
        .map(Optional::ofNullable);
Of course I can write my own mapIfOrElse(Predicate<T>, Function<T, R>, R) helper to use with Stream.map() (a sketch follows below), but I wanted to check whether there is something similar in Java's existing arsenal that I've missed.
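For illustration, such a helper (hypothetical name and signature, not a JDK method) could be written as a plain static method and passed to Stream.map(), assuming a static import of java.util.function.Predicate.not as in the snippet above:
// Hypothetical helper, not part of the JDK: returns a mapping function that
// applies mapper when the predicate matches and falls back to the given value.
static <T, R> Function<T, R> mapIfOrElse(Predicate<? super T> condition,
                                         Function<? super T, ? extends R> mapper,
                                         R fallback) {
    return t -> condition.test(t) ? mapper.apply(t) : fallback;
}

Stream<Optional<Year>> yearsFound = yearStrings
        .map(mapIfOrElse(not(String::isEmpty), Year::parse, (Year) null))
        .map(Optional::ofNullable);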
There is not a much better way of doing it than what you already have; it might be nicer if you extracted the mapping to a method, but that's really it.
Another way might be to construct Optionals from all values, and then use Optional.filter to map empty values to empty optionals (a parsing step is still needed to end up with Stream<Optional<Year>>):
yearStrings.map(Optional::of)
        .map(opt -> opt.filter(Predicate.not(String::isEmpty)))
        .map(opt -> opt.map(Year::parse));
Is this better? Probably not.
Yet another way would be to make use of something like Guava's Strings.emptyToNull (other libraries are available), which turns your empty strings into null first; then use Optional.ofNullable to turn non-nulls and nulls into non-empty and empty Optionals, respectively:
yearStrings.map(Strings::emptyToNull)
        .map(Optional::ofNullable)
        .map(opt -> opt.map(Year::parse)); // parse step added to reach Stream<Optional<Year>>
You can simply filter first and then map:
Stream<Year> yearsFound = yearStrings
        .filter(yearString -> !yearString.isEmpty())
        .map(Year::parse);
Note that this drops the empty entries entirely instead of mapping them to Optional.empty().
It's hardly possible to combine all these actions smoothly in a well-readable way within a single stream operation.
Here's a weird method-chaining with Java 16 mapMulti():
Stream<Optional<Year>> yearsFound = yearStrings
        .mapMulti((yearString, consumer) ->
                Optional.of(yearString)
                        .filter(s -> !s.isEmpty())
                        .map(Year::parse)
                        .ifPresentOrElse(
                                year -> consumer.accept(Optional.of(year)),
                                () -> consumer.accept(Optional.empty())));

How to get index of findFirst() in java 8?

I have the following code:
ArrayList<String> entries = new ArrayList<>();
entries.add("0");
entries.add("1");
entries.add("2");
entries.add("3");
String firstNotHiddenItem = entries.stream()
.filter(e -> e.equals("2"))
.findFirst()
.get();
I need to know the index of that first matching element, since I need to edit it inside the entries ArrayList. As far as I know get() returns the value of the element, not a reference. Should I just use
int indexOf(Object o)
instead?
You can get the index of an element using an IntStream like:
int index = IntStream.range(0, entries.size())
        .filter(i -> "2".equals(entries.get(i)))
        .findFirst()
        .orElse(-1);
But you should prefer the List::indexOf method, because it's more concise, more expressive, and computes the same result. For example:
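Since the goal is to edit the matching element in place, the index pairs naturally with List.set. A small sketch (the replacement value is made up):
int index = entries.indexOf("2");
if (index >= 0) {
    entries.set(index, "2-edited"); // replace the element at that index
}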
You can't in a straightforward way - streams process elements without context of where they are in the stream.
However, if you're prepared to take the gloves off...
int[] position = {-1};
String firstNotHiddenItem = entries.stream()
.peek(x -> position[0]++) // incremented for every element encountered
.filter("2"::equals)
.findFirst()
.get();
System.out.println(position[0]); // 2
The use of an int[], instead of a simple int, is to circumvent the "effectively final" requirement; the reference to the array is constant, only its contents change.
Note also the use of the method reference "2"::equals instead of the lambda e -> e.equals("2"), which not only avoids a possible NPE (if a stream element is null) but, more importantly, looks way cooler.
A more palatable (less hackalicious) version:
AtomicInteger position = new AtomicInteger(-1);
String firstNotHiddenItem = entries.stream()
.peek(x -> position.incrementAndGet()) // incremented for every element encountered
.filter("2"::equals)
.findFirst()
.get();
System.out.println(position.get()); // 2
This will work using Eclipse Collections with Java 8
int firstIndex = ListIterate.detectIndex(entries, "2"::equals);
If you use a MutableList, you can simplify the code as follows:
MutableList<String> entries = Lists.mutable.with("0", "1", "2", "3");
int firstIndex = entries.detectIndex("2"::equals);
There is also a method to find the last index.
int lastIndex = entries.detectLastIndex("2"::equals);
Note: I am a committer for Eclipse Collections
Yes, you should use indexOf("2") instead. As you might have noticed, any stream-based solution has a higher complexity, without providing any benefit.
In this specific situation, there is no significant difference in performance, but overusing streams can cause dramatic performance degradation, e.g. when using map.entrySet().stream().filter(e -> e.getKey().equals(object)).map(e -> e.getValue()) instead of a simple map.get(object).
The collection operations may utilize their known structure, while most stream operations imply a linear search. So genuine collection operations are preferable.
Of course, if there is no suitable collection operation, e.g. when your predicate is not a simple equality test, the Stream API may be the right tool. As shown in "Is there a concise way to iterate over a stream with indices in Java 8?", the solution for any task involving indices is to use the indices as the starting point, e.g. via IntStream.range, and access the list via List.get(int), as sketched below. If the source is not an array or a random-access List, there is no equally clean and efficient solution. Sometimes, a loop might turn out to be the simplest and most efficient solution.
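For completeness, a sketch of that index-based pattern with a non-equality predicate (the predicate here is an arbitrary example):
int index = IntStream.range(0, entries.size())
        .filter(i -> entries.get(i).compareTo("1") > 0) // any predicate over the element
        .findFirst()
        .orElse(-1); // yields 2 for the list above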

Streams with TreeMap return incoherent results

I am trying to solve the following exercise from "Core Java for the Impatient" by Cay Horstmann:
When an encoder of a Charset with partial Unicode coverage can’t encode a
character, it replaces it with a default—usually, but not always, the encoding of "?".
Find all replacements of all available character sets that support encoding. Use the
newEncoder method to get an encoder, and call its replacement method to get
the replacement. For each unique result, report the canonical names of the charsets
that use it.
For the sake of education, I have decided to tackle the exercise with a gargantuan one-liner using the streaming API, even though, in my opinion, a cleaner solution would divide the calculations into a number of steps with intermediate variables in between (it would certainly ease debugging). Without further ado, here is the monster of code I have created:
Charset.availableCharsets().values().stream()
        .filter(charset -> charset.canEncode())
        .collect(Collectors.groupingBy(
                charset -> charset.newEncoder().replacement(),
                () -> new TreeMap<>((arr1, arr2) -> Arrays.equals(arr1, arr2) == true
                        ? 0 : Integer.compare(arr1.hashCode(), arr2.hashCode())),
                Collectors.mapping(charset -> charset.name(), Collectors.toList())))
        .values().stream()
        .map(list -> list.stream().collect(Collectors.joining(", ")))
        .forEach(System.out::println);
Basically, we take into account only the charsets that canEncode; we create a Map with the replacement as the key and a list of canonical names as the value; because grouping by arrays didn't work with the default implementation of groupingBy, which uses a HashMap, I decided to use a TreeMap. We then take the Lists of canonical names, join them with commas, and print them.
Unfortunately, I have found it to give incoherent results. If I run the function twice in the same program, the first run returns 23 Strings, the second one just 21. I suspect it has something to do with a poor implementation of the Comparator for the TreeMap, which was defined as follows:
((arr1, arr2) -> Arrays.equals(arr1, arr2) == true ? 0 : Integer.compare(arr1.hashCode(), arr2.hashCode()))
If that is the cause, what should be a proper Comparator in this case? Apart from that, can the one-liner be improved in any way?
I am also curious whether such convoluted constructs as the code I have written are encountered in professional programs. Maybe it's only me who finds it unreadable?
There is no guarantee that the hash codes of two distinct instances will be different. That would be an ideal situation, but it is never guaranteed. Only the opposite is true: if two objects are equal, they have the same hash code.
So if you create a comparator that considers objects to be the same when they have the same hash code, arbitrary objects might be considered the same. Since the byte[] arrays returned by replacement() are defensive copies, i.e. temporary objects, the result may vary on every run of this code.
Further, since the hash code of an array has nothing to do with its content, your comparator violates the transitivity rule: two arrays with equal content are supposed to be the same, but since they very likely have different hash codes, they can have a different relation when compared with a third array that does not have the same content: a == b, but a < c and b > c. This is why even equal arrays, which you compare via Arrays.equals, can end up in different groups: the TreeMap fails to find the existing key when comparing it with other keys.
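The underlying problem is easy to demonstrate: arrays inherit the identity-based equals and hashCode from Object, so two arrays with identical content almost always report different hash codes:
byte[] a = {1, 2};
byte[] b = {1, 2};
System.out.println(Arrays.equals(a, b));          // true  (content comparison)
System.out.println(a.equals(b));                  // false (identity comparison)
System.out.println(a.hashCode() == b.hashCode()); // almost certainly false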
If you want the arrays to be compared by value, you can use:
Charset.availableCharsets().values().stream()
        .filter(Charset::canEncode)
        .collect(Collectors.groupingBy(
                charset -> charset.newEncoder().replacement(),
                () -> new TreeMap<>(Comparator.comparing(ByteBuffer::wrap)),
                Collectors.mapping(Charset::name, Collectors.joining(", "))))
        .values().forEach(System.out::println);
ByteBuffers are Comparable and consistently evaluate the contents of the wrapped array.
I moved the Collectors.joining collector into the grouping collector to avoid the creation of the temporary List whose contents you are going to join afterwards anyway.
By the way, never write code like expression == true. There is no reason to append == true, as expression alone is already sufficient.
Since you are only interested in the values (in other words, you don't need the keys to be of a certain type), you may wrap all arrays beforehand, simplifying the operation and even making it slightly more efficient:
Charset.availableCharsets().values().stream()
        .filter(Charset::canEncode)
        .collect(Collectors.groupingBy(
                charset -> ByteBuffer.wrap(charset.newEncoder().replacement()),
                TreeMap::new,
                Collectors.mapping(Charset::name, Collectors.joining(", "))))
        .values().forEach(System.out::println);
This change even allows resorting to hashing, if no consistent iteration order is required:
Charset.availableCharsets().values().stream()
        .filter(Charset::canEncode)
        .collect(Collectors.groupingBy(
                charset -> ByteBuffer.wrap(charset.newEncoder().replacement()),
                Collectors.mapping(Charset::name, Collectors.joining(", "))))
        .values().forEach(System.out::println);
This works, because ByteBuffer also implements equals and hashCode.
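A quick sketch verifying that property (ByteBuffer's equals and hashCode are defined over the buffer's remaining content):
ByteBuffer x = ByteBuffer.wrap(new byte[] {63}); // 63 is '?'
ByteBuffer y = ByteBuffer.wrap(new byte[] {63});
System.out.println(x.equals(y));                  // true (content-based)
System.out.println(x.hashCode() == y.hashCode()); // true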

Conditionally add an operation to a Java 8 stream

I'm wondering if I can add an operation to a stream based on some condition set outside of the stream. For example, I want to add a limit operation to the stream if my limit variable is not equal to -1.
My code currently looks like this, but I have yet to see other examples of streams being used this way, where a Stream variable is reassigned to the result of an intermediate operation applied to itself:
// Do some stream stuff
stream = stream.filter(e -> e.getTimestamp() < max);
// Limit the stream
if (limit != -1) {
stream = stream.limit(limit);
}
// Collect stream to list
stream.collect(Collectors.toList());
As stated in this stackoverflow post, the filter isn't actually applied until a terminal operation is called. Since I'm reassigning the value of stream before a terminal operation is called, is the above code still a proper way to use Java 8 streams?
There is no semantic difference between a chained series of invocations and a series of invocations storing the intermediate return values. Thus, the following code fragments are equivalent:
a = object.foo();
b = a.bar();
c = b.baz();
and
c = object.foo().bar().baz();
In either case, each method is invoked on the result of the previous invocation. But in the latter case, the intermediate results are not stored but lost on the next invocation. In the case of the stream API, the intermediate results must not be used after you have called the next method on it, thus chaining is the natural way of using stream as it intrinsically ensures that you don’t invoke more than one method on a returned reference.
Still, it is not wrong to store the reference to a stream as long as you obey the contract of not using a returned reference more than once. By using it the way you do in your question, i.e. overwriting the variable with the result of the next invocation, you also ensure that you don't invoke more than one method on a returned reference; thus, it's a correct usage. Of course, this only works with intermediate results of the same type, so when you are using map or flatMap, getting a stream of a different reference type, you can't overwrite the local variable. Then you have to be careful not to use the old local variable again; but, as said, as long as you are not using it after the next invocation, there is nothing wrong with the intermediate storage.
Sometimes, you have to store it, e.g.
try (Stream<String> stream = Files.lines(Paths.get("myFile.txt"))) {
    stream.filter(s -> !s.isEmpty()).forEach(System.out::println);
}
Note that the code is equivalent to the following alternatives:
try (Stream<String> stream = Files.lines(Paths.get("myFile.txt")).filter(s -> !s.isEmpty())) {
    stream.forEach(System.out::println);
}
and
try (Stream<String> srcStream = Files.lines(Paths.get("myFile.txt"))) {
    Stream<String> tmp = srcStream.filter(s -> !s.isEmpty());
    // must not use the variable srcStream here:
    tmp.forEach(System.out::println);
}
They are equivalent because forEach is always invoked on the result of filter which is always invoked on the result of Files.lines and it doesn’t matter on which result the final close() operation is invoked as closing affects the entire stream pipeline.
To put it in one sentence, the way you use it, is correct.
I even prefer to do it that way, as not chaining a limit operation when you don't want to apply a limit is the cleanest way of expressing your intent. It's also worth noting that the suggested alternatives may work in a lot of cases, but they are not semantically equivalent:
.limit(condition ? aLimit : Long.MAX_VALUE)
assumes that the maximum number of elements, you can ever encounter, is Long.MAX_VALUE but streams can have more elements than that, they even might be infinite.
.limit(condition ? aLimit : list.size())
when the stream source is the list, breaks the lazy evaluation of the stream. In principle, a mutable stream source may legally be modified arbitrarily up to the point when the terminal action commences. The result will reflect all modifications made up to that point. When you add an intermediate operation incorporating list.size(), i.e. the actual size of the list at that point, subsequent modifications applied to the collection between that point and the terminal operation may cause this value to have a different meaning than the intended "actually no limit" semantics.
Compare with “Non Interference” section of the API documentation:
For well-behaved stream sources, the source can be modified before the terminal operation commences and those modifications will be reflected in the covered elements. For example, consider the following code:
List<String> l = new ArrayList(Arrays.asList("one", "two"));
Stream<String> sl = l.stream();
l.add("three");
String s = sl.collect(joining(" "));
First a list is created consisting of two strings: "one"; and "two". Then a stream is created from that list. Next the list is modified by adding a third string: "three". Finally the elements of the stream are collected and joined together. Since the list was modified before the terminal collect operation commenced the result will be a string of "one two three".
Of course, this is a rare corner case as normally, a programmer will formulate an entire stream pipeline without modifying the source collection in between. Still, the different semantic remains and it might turn into a very hard to find bug when you once enter such a corner case.
Further, since they are not equivalent, the stream API will never recognize these values as "actually no limit". Even specifying Long.MAX_VALUE implies that the stream implementation has to track the number of processed elements to ensure that the limit is obeyed. Thus, not adding a limit operation can have a significant performance advantage over adding a limit with a number the programmer expects never to be exceeded.
There are two ways you can do this:
// Do some stream stuff
List<E> results = list.stream()
        .filter(e -> e.getTimestamp() < max)
        .limit(limit > 0 ? limit : list.size())
        .collect(Collectors.toList());
OR
// Do some stream stuff
stream = stream.filter(e -> e.getTimestamp() < max);
// Limit the stream
if (limit != -1) {
stream = stream.limit(limit);
}
// Collect stream to list
List<E> results = stream.collect(Collectors.toList());
As this is functional programming, you should always work on the result of each function. You should specifically avoid modifying anything in this style of programming and treat everything as if it were immutable where possible.
Since I'm reassigning the value of stream before a terminal operation is called, is the above code still a proper way to use Java 8 streams?
It should work; however, it reads as a mix of imperative and functional styles. I suggest writing it as a fixed stream, as per my first variant.
I think your first line needs to be:
stream = stream.filter(e -> e.getTimestamp() < max);
so that you're using the stream returned by filter in subsequent operations, rather than the original stream.
I know it is a bit too late, but I had the same question myself and didn't find a satisfying answer. However, inspired by this question and its answers, I came up with the following solution:
return Stream.of(                                       // wrap the target stream in another stream
                stream.filter(e -> e.getTimestamp() < max)  // do regular stream stuff
        )
        .flatMap(s -> limit != -1 ? s.limit(limit) : s) // apply the limit only if necessary, unwrapping the stream of streams
        .collect(Collectors.toList());                  // do final stuff
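Building on the same idea, the conditional step can also be factored into a small generic helper; the name applyIf is made up here, it is not a JDK method:
// Hypothetical helper: applies the given transformation only when the condition holds.
static <T> Stream<T> applyIf(Stream<T> stream, boolean condition, UnaryOperator<Stream<T>> op) {
    return condition ? op.apply(stream) : stream;
}

List<E> results = applyIf(stream.filter(e -> e.getTimestamp() < max),
                          limit != -1, s -> s.limit(limit))
        .collect(Collectors.toList());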

Get last element of Stream/List in a one-liner

How can I get the last element of a stream or list in the following code?
Where data.careas is a List<CArea>:
CArea first = data.careas.stream()
        .filter(c -> c.bbox.orientationHorizontal)
        .findFirst()
        .get();
CArea last = data.careas.stream()
        .filter(c -> c.bbox.orientationHorizontal)
        .collect(Collectors.toList()); // ...and now how to get the last element?
As you can see getting the first element, with a certain filter, is not hard.
However, getting the last element in a one-liner is a real pain:
It seems I cannot obtain it directly from a Stream. (It would only make sense for finite streams.)
It also seems that you cannot get things like first() and last() from the List interface, which is really a pain.
I do not see any argument against providing first() and last() methods in the List interface, as the elements there are ordered, and moreover the size is known.
But as per the original answer: How to get the last element of a finite Stream?
Personally, this is the closest I could get:
int lastIndex = data.careas.stream()
        .filter(c -> c.bbox.orientationHorizontal)
        .mapToInt(c -> data.careas.indexOf(c))
        .max()
        .getAsInt();
CArea last = data.careas.get(lastIndex);
However, it does involve using indexOf on every element, which is most likely not what you generally want, as it can impair performance.
It is possible to get the last element with the method Stream::reduce. The following listing contains a minimal example for the general case:
Stream<T> stream = ...; // sequential or parallel stream
Optional<T> last = stream.reduce((first, second) -> second);
This implementation works for all ordered streams (including streams created from Lists). For unordered streams it is, for obvious reasons, unspecified which element will be returned.
The implementation works for both sequential and parallel streams. That might be surprising at first glance, and unfortunately the documentation doesn't state it explicitly. However, it is an important feature of streams, which I'll try to clarify:
The Javadoc for the method Stream::reduce states that it "is not constrained to execute sequentially".
The Javadoc also requires that the "accumulator function must be an associative, non-interfering, stateless function for combining two values", which is obviously the case for the lambda expression (first, second) -> second.
The Javadoc for reduction operations states: "The streams classes have multiple forms of general reduction operations, called reduce() and collect() [..]" and "a properly constructed reduce operation is inherently parallelizable, so long as the function(s) used to process the elements are associative and stateless."
The documentation for the closely related Collectors is even more explicit: "To ensure that sequential and parallel executions produce equivalent results, the collector functions must satisfy an identity and an associativity constraints."
Back to the original question: The following code stores a reference to the last element in the variable last and throws an exception if the stream is empty. The complexity is linear in the length of the stream.
CArea last = data.careas
.stream()
.filter(c -> c.bbox.orientationHorizontal)
.reduce((first, second) -> second).get();
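If the stream can be empty, the returned Optional can also be unwrapped defensively instead of with get(); orElse(null) is shown here as one option:
CArea last = data.careas.stream()
        .filter(c -> c.bbox.orientationHorizontal)
        .reduce((first, second) -> second)
        .orElse(null); // or orElseThrow(...) with a meaningful exception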
If you have a Collection (or, more generally, an Iterable) you can use Google Guava's
Iterables.getLast(myIterable)
as a handy one-liner.
One-liner (no need for a stream):
Object lastElement = list.isEmpty() ? null : list.get(list.size()-1);
Guava has dedicated method for this case:
Stream<T> stream = ...;
Optional<T> lastItem = Streams.findLast(stream);
It's equivalent to stream.reduce((a, b) -> b), but its creators claim it has much better performance.
From the documentation:
This method's runtime will be between O(log n) and O(n), performing better on efficiently splittable streams.
It's worth mentioning that if the stream is unordered, this method behaves like findAny().
list.stream()
        .sorted(Comparator.comparing(MyObject::getSequence).reversed())
        .findFirst()
        .get();
Reverse the order and take the first element. This assumes the element type (here called MyObject) exposes a sequence number via getSequence(); the Comparator can be adapted as the logic requires.
Another way to get the last element is by (ab)using sort:
Optional<CArea> num = data.careas.stream().sorted((a, b) -> -1).findFirst();
Note, however, that the comparator (a, b) -> -1 violates the Comparator contract, so the outcome depends on the sort implementation and is not guaranteed.
You can also use the skip() function, as below:
long count = data.careas.stream().count(); // for a List source, data.careas.size() works as well
CArea last = data.careas.stream().skip(count - 1).findFirst().get();
It's super simple to use.
One more approach. The pair list will hold the first and last elements:
List<Object> pair = new ArrayList<>();
dataStream.forEach(o -> {
    if (pair.isEmpty()) {
        pair.add(o); // first element
        pair.add(o); // placeholder for the last element
    }
    pair.set(1, o); // always overwrite the last element
});
If you need to get the last N elements, a closure can be used.
The code below maintains an external queue of fixed size until the stream reaches the end.
final Queue<Integer> queue = new LinkedList<>();
final int N = 5;
list.stream().peek(z -> {
    queue.offer(z);
    if (queue.size() > N)
        queue.poll();
}).count();
Note that since Java 9, count() may skip the pipeline entirely when the size is known in advance, in which case peek() would never run; using forEach instead of peek().count() is more reliable.
Another option could be a reduce operation using a Queue as the identity:
final int lastN = 3;
Queue<Integer> reduce1 = list.stream()
        .reduce(
                (Queue<Integer>) new LinkedList<Integer>(),
                (m, n) -> {
                    m.offer(n);
                    if (m.size() > lastN)
                        m.poll();
                    return m;
                },
                (m, n) -> m); // note: mutating the identity this way is only safe for sequential streams
System.out.println("reduce1 = " + reduce1);
