Why Java 8 Stream interface does not have min() no-parameter version? - java

The java.util.stream.Stream interface has two versions of the sorted method: sorted(), which sorts elements in natural order, and sorted(Comparator). Why wasn't a no-argument min() method introduced to the Stream interface, one that would return the minimal element according to natural ordering?

It should be clear that for min, max, and sorted, adding a method to Stream that does not require a comparator introduces a way to lose the generic type safety. The reason is that the current version of the Java language does not support restricting methods to instances of a specific parameterization, i.e. limit them to streams of comparable elements.
So the question could be the other way round, why has this potential break of the type safety been allowed with sorted()?
I can’t look into the developers’ minds, but one interesting point is that sorting has been treated specially for a long time now. With the introduction of Generics, it became possible to enforce that sorting without a Comparator can only be attempted for collections or arrays with comparable elements. However, especially when implementing generic collections, developers might face the fact that arrays can’t be created with a generic element type. There might be other scenarios where a developer encounters an array or collection of a formally non-comparable type while the contained elements are comparable for sure. As said, I can’t look into the developers’ minds to say which scenarios were considered.
But:
Arrays.sort(Object[]) does not enforce the array type to be a subtype of Comparable. Even if it did,
sort(T[] a, Comparator<? super T> c) specifies that a null comparator implies “natural order”, which allows requesting natural order for any type.
Collections.sort(List<T> list) requires a comparable element type, but
Collections.sort(List<T> list, Comparator<? super T> c) again specifies that a null comparator implies “natural order”, so there’s still an easy way to undermine the type system.
Since the “null means natural” rule was already specified before Generics existed, it had to be kept for compatibility.
But it’s not just all about backwards compatibility. List.sort(Comparator), introduced in Java 8, is also specified as accepting null as argument for “natural order”, so now we have another scenario, where an implementer might have to sort data without a compile-time type that guarantees comparable elements.
So when it comes to sorting, there are already lots of opportunities to dodge the type system. But Stream.sorted(Comparator) is the only sort method not accepting a null comparator. So sorting by natural order without specifying Comparator.naturalOrder() is only possible using sorted() without arguments. By the way, having an already sorted input with a null comparator and requesting sorted() without comparator is the only situation where the Stream implementation will detect that sorting isn’t necessary, i.e. it doesn’t compare comparators and doesn’t check for Comparator.naturalOrder().
Generally, the type safety of comparators is astonishingly weak. E.g. Collections.reverseOrder() returns a comparator of arbitrary type, not demanding the type to be comparable. So instead of min(), you could use max(Collections.reverseOrder()) to request the minimum, regardless of the stream’s formal type. Or use Collections.reverseOrder(Collections.reverseOrder()) to get the equivalent of Comparator.naturalOrder() for an arbitrary type. Likewise, Collator implements Comparator<Object>, for whatever reason, even though it can only compare Strings.
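To make the loopholes described above concrete, here is a minimal sketch. The class name NaturalOrderLoopholes and the element type Box are made up for illustration; only the JDK calls are real.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

class NaturalOrderLoopholes {
    // Hypothetical element type that does NOT implement Comparable.
    static final class Box { }

    // sorted() compiles for any element type; the problem only surfaces at runtime.
    static boolean sortedThrowsForNonComparable() {
        try {
            Stream.of(new Box(), new Box()).sorted().toArray();
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    // The formal element type is Object, yet reverseOrder() is accepted,
    // and max() with a reversed comparator yields the natural-order minimum.
    static Object minOfObjects(Stream<Object> stream) {
        return stream.max(Collections.reverseOrder()).orElse(null);
    }

    // Comparator.naturalOrder() is bounded by Comparable, so this variant
    // only compiles for comparable element types.
    static Optional<String> typeSafeMin(Stream<String> stream) {
        return stream.min(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        System.out.println(sortedThrowsForNonComparable());
        System.out.println(minOfObjects(Stream.of("pear", "apple", "fig")));
        System.out.println(typeSafeMin(Stream.of("pear", "apple", "fig")).get());
    }
}
```

Replacing String with Box in typeSafeMin would be rejected by the compiler, which is the safety that the no-argument sorted() gives up.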

I'd assume that would just pollute the API. One could equally ask "why is there no parameterless max()?" or "why is there no CharStream?", and it could go on and on with things that could have been made available but were deliberately left out.
Having a parameterless min method instead of this:
someList.stream().min(Comparator.naturalOrder());
would make no real difference.
Therefore it's best to just create a reusable method rather than polluting the API with every possible variant.
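Such a reusable method could look roughly like the sketch below. The utility class StreamMinMax and its method names are assumptions, not part of the JDK; the bound on T is what restores the compile-time check.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper class; not part of the JDK.
final class StreamMinMax {
    private StreamMinMax() { }

    // The bound on T restricts callers to streams of comparable elements.
    static <T extends Comparable<? super T>> Optional<T> min(Stream<T> stream) {
        return stream.min(Comparator.naturalOrder());
    }

    static <T extends Comparable<? super T>> Optional<T> max(Stream<T> stream) {
        return stream.max(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        System.out.println(min(Stream.of(3, 1, 2)).get()); // prints 1
        System.out.println(max(Stream.of("pear", "apple", "fig")).get()); // prints pear
    }
}
```

Because the restriction lives on a static method's type parameter, it can express what an instance method on Stream<T> cannot: "only for streams of comparable elements".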

I think min() only allows the signature that accepts a Comparator because the Stream could be of any type, even a type created by you. In such a case it would be impossible to rely on a natural order, because a class you've created has no natural order until you specify one.
If, instead of Stream, you use IntStream, you'll see that a min() method with no arguments is defined. This does what you want:
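import java.util.stream.IntStream;
public class Main {
    public static void main(String... args) {
        IntStream s = IntStream.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(s.min().getAsInt()); // min() returns an OptionalInt
    }
}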

Comparator.naturalOrder serves this purpose. Its existence (or custom implementations thereof) allows other classes to remain simpler because they don't have to implement special codepaths for null-values in Comparator fields.
Its type will also force the stream's T to implement Comparable.

Related

Concrete List type as parameter in java method

If there is a reason to make a parameter not generic, is that a good approach?
Let's say I know that the method only reads elements of the list and never inserts any. Should I force developers to pass an ArrayList?
public void method(ArrayList<Integer> list)
{
// ......
}
As you can see, developers have to pass a list of type ArrayList; otherwise they get a compile error.
"Should I force the developers to pass an ArrayList?"
IMO, no. The developer might use the list somewhere else. Only they know the actual usage of the list, and it should be up to them to choose the best list implementation. If there is no actual reason to enforce a specific implementation, you should always use the List interface.
Generally speaking, there are two good (albeit possibly rare) reasons I can think of to do this.
First, if you want to use a method that's only present in a specific implementation. ArrayList doesn't seem to have too many useful methods that aren't already specified by the List or even Collection interfaces, but it's still a possibility.
Second, which is a slight variation on the previous reason, is if you want to convey some performance expectations of your method. For example, both the ArrayList and LinkedList classes have a get(int) method. ArrayList's implementation works in constant time (O(1)), while LinkedList's is linearly dependent on the size of the list (O(n)). If your method relies heavily on this method, you may not want to allow calling it with a LinkedList.
The main issue with this is that the caller probably hasn't typed their List as an ArrayList, so even if they do have an ArrayList, they would probably have to write an explicit cast at every call-site. That would make the caller's code messier.
Another issue is that ArrayList is not the only class which implements List and supports random access in O(1) time. For example, Arrays.asList returns a java.util.Arrays$ArrayList object which is not an instance of java.util.ArrayList. Alternatively, for example if I want to implement a sparse List using a hashtable, such that get takes O(1) time but it's not an ArrayList, then I wouldn't be able to supply my list to this method even though it meets the performance requirements.
My solution would be to check the argument at runtime:
if(list instanceof LinkedList) {
// ...
}
Then you can either log a warning to inform about the performance issue, or (if you must be prescriptive), throw an IllegalArgumentException to say the method should not be called with a LinkedList. This means the caller's mistake would be detected at runtime rather than compile-time, but it should be detected the very first time it's tested, so that is no great loss.
Another option, of course, is to take an array instead of a list as your parameter. That won't be suitable for all purposes, but it's e.g. how the standard library's binarySearch method ensures it takes a sequence which supports random access in O(1) time.
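As a variation on the runtime check above, one could test for the JDK's RandomAccess marker interface instead of excluding LinkedList by name. ArrayList and the list returned by Arrays.asList implement it, LinkedList does not. The class and method names in this sketch are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class RandomAccessCheck {
    // Accepts any List, but rejects implementations that do not
    // advertise fast indexed access via the RandomAccess marker.
    static int sumEveryOther(List<Integer> list) {
        if (!(list instanceof RandomAccess)) {
            throw new IllegalArgumentException(
                    "expected a list with fast indexed access, got " + list.getClass().getName());
        }
        int sum = 0;
        for (int i = 0; i < list.size(); i += 2) {
            sum += list.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumEveryOther(Arrays.asList(1, 2, 3, 4, 5))); // prints 9
        try {
            sumEveryOther(new LinkedList<>(Arrays.asList(1, 2, 3)));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A custom list such as the hash-based sparse list mentioned earlier would pass this check simply by declaring that it implements RandomAccess.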

Java variable type Collection for HashSet or other implementations?

I have often seen declarations like List<String> list = new ArrayList<>(); or Set<String> set = new HashSet<>(); for fields in classes. For me it makes perfect sense to use the interfaces for the variable types to provide flexibility in the implementation. The examples above do still define which kind of Collections have to be used, respectively which operations are allowed and how it should behave in some cases (due to docs).
Now consider the case where actually only the functionality of the Collection (or even the Iterable) interface is required to use the field in the class and the kind of Collection doesn't actually matter or I don't want to overspecify it. So I choose for example HashSet as implementation and declare the field as Collection<String> collection = new HashSet<>();.
Should the field then actually be of type Set in this case? Is this kind of declaration bad practice, and if so, why? Or is it good practice to specify the actual type as loosely as possible (while still providing all required methods)? The reason I ask is that I have hardly ever seen such a declaration, and lately I find myself more and more often in situations where I only need the functionality of the Collection interface.
Example:
// Only need Collection features, but decided to use a LinkedList
private final Collection<Listener> registeredListeners = new LinkedList<>();
public void init() {
ExampleListener listener = new ExampleListener();
registerListenerSomewhere(listener);
registeredListeners.add(listener);
listener = new ExampleListener();
registerListenerSomewhere(listener);
registeredListeners.add(listener);
}
public void reset() {
for (Listener listener : registeredListeners) {
unregisterListenerSomewhere(listener);
}
registeredListeners.clear();
}
Since your example uses a private field it doesn't matter all that much about hiding the implementation type. You (or whoever is maintaining this class) can always just go look at the field's initializer to see what it is.
Depending on how it's used, though, it might be worth declaring a more specific interface for the field. Declaring it to be a List indicates that duplicates are allowed and that ordering is significant. Declaring it to be a Set indicates that duplicates aren't allowed and that ordering is not significant. You might even declare the field to have a particular implementation class if there's something about it that's significant. For example, declaring it to be LinkedHashSet indicates that duplicates aren't allowed but that ordering is significant.
The choice of whether to use an interface, and what interface to use, becomes much more significant if the type appears in the public API of the class, and on what the compatibility constraints on this class are. For example, suppose there were a method
public ??? getRegisteredListeners() {
return ...
}
Now the choice of return type affects other classes. If you can change all the callers, maybe it's no big deal; you just have to edit other files. But suppose the caller is an application that you have no control over. Now the choice of interface is critical, as you can't change it without potentially breaking the applications. The rule here is usually to choose the most abstract interface that supports the operations you expect callers to want to perform.
Most of the Java SE APIs return Collection. This provides a fair degree of abstraction from the underlying implementation, but it also provides the caller a reasonable set of operations. The caller can iterate, get the size, do a contains check, or copy all the elements to another collection.
Some code bases use Iterable as the most-abstract interface to return. All it does is allow the caller to iterate. Sometimes this is all that's necessary, but it might be somewhat limiting compared to Collection.
Another alternative is to return a Stream. This is helpful if you think the caller might want to use stream's operations (such as filter, map, find, etc.) instead of iterating or using collection operations.
Note that if you choose to return Collection or Iterable, you need to make sure that you return an unmodifiable view or make a defensive copy. Otherwise, callers could modify your class's internal data, which would probably lead to bugs. (Yes, even an Iterable can permit modification! Consider getting an Iterator and then calling the remove() method.) If you return a Stream, you don't need to worry about that, since you can't use a Stream to modify the underlying source.
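A minimal sketch of the unmodifiable-view advice, using String names as stand-ins for listener objects. ListenerRegistry and its methods are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

class ListenerRegistry {
    // Internal state; the concrete type stays private to this class.
    private final List<String> registeredListeners = new ArrayList<>();

    void register(String listenerName) {
        registeredListeners.add(listenerName);
    }

    // Read-only view: callers can iterate and query but not modify.
    Collection<String> getRegisteredListeners() {
        return Collections.unmodifiableCollection(registeredListeners);
    }

    // Alternative: expose a Stream, which offers no way to modify the source.
    Stream<String> listeners() {
        return registeredListeners.stream();
    }

    public static void main(String[] args) {
        ListenerRegistry registry = new ListenerRegistry();
        registry.register("first");
        Collection<String> view = registry.getRegisteredListeners();
        try {
            view.add("intruder");
        } catch (UnsupportedOperationException e) {
            System.out.println("view is read-only");
        }
        System.out.println(view.size()); // prints 1
    }
}
```

Note that the view is live: later calls to register() are visible through a previously returned Collection. A defensive copy (for example new ArrayList<>(registeredListeners)) would give the caller a snapshot instead.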
Note that I turned your question about the declaration of a field into a question about the declaration of method return types. There is this idea of "program to the interface" that's quite prevalent in Java. In my opinion it doesn't matter very much for local variables (which is why it's usually fine to use var), and it matters little for private fields, since those (almost) by definition affect only the class in which they're declared. However, the "program to the interface" principle is very important for API signatures, so those cases are where you really need to think about interface types. Private fields, not so much.
(One final note: there is a case where you need to be concerned about the types of private fields, and that's when you're using a reflective framework that manipulates private fields directly. In that case, you need to think of those fields as being public -- just like method return types -- even though they're not declared public.)
As with all things, it's a question of tradeoffs. There are two opposing forces.
The more generic the type, the more freedom the implementation has. If you use Collection you're free to use an ArrayList, HashSet, or LinkedList without affecting the user/caller.
The more generic the return type, the fewer features there are available to the user/caller. A List provides index-based lookup. A SortedSet makes it easy to get contiguous subsets via headSet, tailSet, and subSet. A NavigableSet provides efficient O(log n) binary search lookup methods. If you return Collection, none of these are available. Only the most generic access functions can be used.
Furthermore, the sub-types guarantee special properties that Collection does not: Sets hold unique items. SortedSets are sorted. Lists have an order; they're not unordered bags of items. If you use Collection then the user/caller can't necessarily assume that these properties hold. They may be forced to code defensively and, for instance, handle duplicate items even if you know there won't be duplicates.
A reasonable decision process might be:
If O(1) indexed access is guaranteed, use List.
If elements are sorted and unique, use SortedSet or NavigableSet.
If element uniqueness is guaranteed and order is not, use Set.
Otherwise, use Collection.
It really depends on what you want to do with the collection object.
Collection<String> cSet = new HashSet<>();
Collection<String> cList = new ArrayList<>();
In this case you can, if you want, do:
cSet = cList;
But if you declare it like this:
Set<String> cSet = new HashSet<>();
the above assignment is not permitted, though you can still construct a new list from the set using the copy constructor:
Set<String> set = new HashSet<>();
List<String> list = new ArrayList<>();
list = new ArrayList<>(set);
So basically depending on the usage you can use Collection or Set interface.

Is it appropriate to extend List to add fields

The question is framed for List but easily applies to others in the java collections framework.
For example, I would say it is certainly appropriate to make a new List sub-type to store something like a counter of additions since it is an integral part of the list's operation and doesn't alter that it "is a list". Something like this:
public class ChangeTrackingList<E> extends ArrayList<E> {
private int changeCount;
...
@Override public boolean add(E e) {
changeCount++;
return super.add(e);
}
// other methods likewise overridden as appropriate to track change counter
}
However, what about adding additional functionality out of the knowledge of a list ADT, such as storing arbitrary data associated with a list element? Assuming the associated data was properly managed when elements are added and removed, of course. Something like this:
public class PayloadList<E> extends ArrayList<E> {
private Object[] payload;
...
public void setData(int index, Object data) {
... // manage 'payload' array
payload[index] = data;
}
public Object getData(int index) {
... // manage 'payload' array, error handling, etc.
return payload[index];
}
}
In this case I have altered that it is "just a list" by adding not only additional functionality (behavior) but also additional state. Certainly part of the purpose of type specification and inheritance, but is there an implicit restriction (taboo, deprecation, poor practice, etc.) on Java collections types to treat them specially?
For example, when referencing this PayloadList as a java.util.List, one will mutate and iterate as normal and ignore the payload. This is problematic when it comes to something like persistence or copying which does not expect a List to carry additional data to be maintained. I've seen many places that accept an object, check to see that it "is a list" and then simply treat it as java.util.List. Should they instead allow arbitrary application contributions to manage specific concrete sub-types?
Since this implementation would constantly produce issues in instance slicing (ignoring sub-type fields) is it a bad idea to extend a collection in this way and always use composition when there is additional data to be managed? Or is it instead the job of persistence or copying to account for all concrete sub-types including additional fields?
This is purely a matter of opinion, but personally I would advise against extending classes like ArrayList in almost all circumstances, and favour composition instead.
Even your ChangeTrackingList is rather dodgy. What does
list.addAll(Arrays.asList("foo", "bar"));
do? Does it increment changeCount twice, or not at all? It depends on whether ArrayList.addAll() uses add(), which is an implementation detail you should not have to worry about. You would also have to keep your methods in sync with the ArrayList methods. For example, if addAll(Collection<? extends E> collection) is implemented on top of add(), your counter is updated for free. But if the implementation copies the incoming elements straight into the backing array with System.arraycopy (which is what OpenJDK's ArrayList does), add() is never called, and you have to override addAll() yourself to increment changeCount by collection.size(). If the implementation ever switched from one strategy to the other, your overrides would have to change with it.
Also if a method is ever added to List (this happened with forEach() and stream() for example) this would cause problems if you were using that method name to mean something else. This can happen when extending abstract classes too, but at least an abstract class has no state, so you are less likely to be able to cause too much damage by overriding methods.
I would still use the List interface, and ideally extend AbstractList. Something like this
public final class PayloadList<E> extends AbstractList<E> implements RandomAccess {
private final ArrayList<E> list;
private final Object[] payload;
// details missing
}
That way you have a class that implements List and makes use of ArrayList without you having to worry about implementation details.
(By the way, in my opinion, the class AbstractList is amazing. You only have to override get() and size() to have a functioning List implementation and methods like containsAll(), toString() and stream() all just work.)
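To illustrate how little AbstractList needs, here is a small read-only list that overrides only get() and size(). SquaresList is a made-up example class, not something from the question.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.RandomAccess;

// Hypothetical read-only list whose element at index i is i * i.
final class SquaresList extends AbstractList<Integer> implements RandomAccess {
    private final int size;

    SquaresList(int size) {
        this.size = size;
    }

    @Override
    public Integer get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return index * index;
    }

    @Override
    public int size() {
        return size;
    }

    public static void main(String[] args) {
        SquaresList squares = new SquaresList(4);
        // All of the following are inherited, not written by hand.
        System.out.println(squares); // prints [0, 1, 4, 9]
        System.out.println(squares.containsAll(Arrays.asList(4, 9))); // prints true
        System.out.println(squares.stream().mapToInt(Integer::intValue).sum()); // prints 14
    }
}
```

Mutating methods such as add() and set() throw UnsupportedOperationException unless they are overridden as well.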
One aspect you should consider is that all classes that inherit from AbstractList are value classes. That means that they have meaningful equals(Object) and hashCode() methods, therefore two lists are judged to be equal even if they are not the same instance of any class.
Furthermore, the equals() contract from AbstractList allows any list to be compared with the current list - not just a list with the same implementation.
Now, if you add a value item to a value class when you extend it, you need to include that value item in the equals() and hashCode() methods. Otherwise you will allow two PayloadList lists with different payloads to be considered "the same" when somebody uses them in a map, a set, or just a plain equals() comparison in any algorithm.
But in fact, it's impossible to extend a value class by adding a value item! You'll end up breaking the equals() contract, either by breaking symmetry (A plain ArrayList containing [1,2,3] will return true when compared with a PayloadList containing [1,2,3] with a payload of [a,b,c], but the reverse comparison won't return true). Or you'll break transitivity.
This means that basically, the only proper way to extend a value class is by adding non-value behavior (e.g. a list that logs every add() and remove()). The only way to avoid breaking the contract is to use composition. And it has to be composition that does not implement List at all (because again, other lists will accept anything that implements List and gives the same values when iterating it).
This answer is based on item 8 of Effective Java, 2nd Edition, by Joshua Bloch
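The symmetry problem can be reproduced in a few lines. TaggedList below is a hypothetical stand-in for the PayloadList idea, with a single String as the extra value item.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical subclass that adds a value item to a value class.
class TaggedList<E> extends ArrayList<E> {
    private final String tag;

    TaggedList(Collection<? extends E> elements, String tag) {
        super(elements);
        this.tag = tag;
    }

    // Includes the extra field, as a value class should...
    @Override
    public boolean equals(Object o) {
        return o instanceof TaggedList
                && super.equals(o)
                && tag.equals(((TaggedList<?>) o).tag);
    }

    @Override
    public int hashCode() {
        return 31 * super.hashCode() + tag.hashCode();
    }

    public static void main(String[] args) {
        List<Integer> plain = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> tagged = new TaggedList<>(Arrays.asList(1, 2, 3), "abc");
        // ...but the plain list knows nothing about the tag, so symmetry breaks.
        System.out.println(plain.equals(tagged)); // prints true
        System.out.println(tagged.equals(plain)); // prints false
    }
}
```

Dropping the tag from equals() would restore symmetry, but then two lists with different tags would compare equal, which is the other half of the dilemma described above.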
If the class is not final, you can always extend it. Everything else is subjective and a matter of opinion.
My opinion is to favor composition over inheritance, since in the long run, inheritance produces low cohesion and high coupling, which is the opposite of a good OO design. But this is just my opinion.
The following is all just opinion; the question invites opinionated answers (I think it's borderline inappropriate for SO).
While your approach is workable in some situations, I'd argue it's bad design and very brittle. It's also pretty complicated to cover all loopholes (maybe even impossible, for example when the list is sorted by means other than List.sort).
If you require extra data to be stored and managed it would be better integrated into the list items themselves, or the data could be associated using existing collection types like Map.
If you really need an association list type, consider making it not an instance of java.util.List, but a specialized type with specialized API. That way no accidents are possible by passing it where a plain list is expected.

Why is the combiner of the Collector interface not consistent with the overloaded collect method?

There is an overloaded method, collect(), in interface Stream<T> with the following signature:
<R> R collect(Supplier<R> supplier,
BiConsumer<R,? super T> accumulator,
BiConsumer<R,R> combiner)
There is another version of collect(Collector<? super T,A,R> collector), which receives an object with the previous three functions. The property of the interface Collector corresponding to the combiner has the signature BinaryOperator<A> combiner().
In the latter case, the Java 8 API documentation states that:
The combiner function may fold state from one argument into the other and return that, or may return a new result container.
Why does the former collect method not receive a BinaryOperator<R> too?
The "inline" (3-arg) version of collect is designed for when you already have these functions "lying around". For example:
ArrayList<Foo> list = stream.collect(ArrayList::new,
ArrayList::add,
ArrayList::addAll);
Or
BitSet bitset = stream.collect(BitSet::new,
BitSet::set,
BitSet::or);
While these are just motivating examples, our explorations with similar existing builder classes was that the signatures of the existing combiner candidates were more suited to conversion to BiConsumer than to BinaryOperator. Offering the "flexibility" you ask for would make this overload far less useful in the very cases it was designed to support -- which is when you've already got the functions lying around, and you don't want to have to make (or learn about making) a Collector just to collect them.
Collector, on the other hand, has a far wider range of uses, and so merits the additional flexibility.
Keep in mind that the primary purpose of Stream.collect() is to support Mutable Reduction. For this operation, both functions, the accumulator and the combiner are meant to manipulate the mutable container and don’t need to return a value.
Therefore it is much more convenient not to insist on returning a value. As Brian Goetz has pointed out, this decision allows reusing a lot of existing container types and their methods. Without the ability to use these types directly, the entire three-arg collect method would be pointless.
In contrast, the Collector interface is an abstraction of this operation supporting much more use cases. Most notably, you can even model the ordinary, i.e. non-mutable, Reduction operation with value types (or types having value type semantics) via a Collector. In this case, there must be a return value as the value objects themselves must not be modified.
Of course, it is not meant to be used as stream.collect(Collectors.reducing(…)) instead of stream.reduce(…). Instead, this abstraction comes in handy when combining collectors, e.g. like groupingBy(…,reducing(…)).
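A short sketch of the groupingBy(…,reducing(…)) combination mentioned above, where the downstream collector performs an ordinary immutable reduction on Integer values. The class and method names are invented for the example.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReducingDownstream {
    // Groups numbers by parity and sums each group. The reduction works on
    // immutable Integer values, so each step must return its result.
    static Map<Boolean, Integer> sumByParity(Stream<Integer> numbers) {
        return numbers.collect(
                Collectors.groupingBy(n -> n % 2 == 0,
                        Collectors.reducing(0, Integer::sum)));
    }

    public static void main(String[] args) {
        Map<Boolean, Integer> sums = sumByParity(Stream.of(1, 2, 3, 4, 5));
        System.out.println(sums.get(false)); // odd numbers: prints 9
        System.out.println(sums.get(true));  // even numbers: prints 6
    }
}
```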
If the former collect method received a BinaryOperator<R>, then the following example would not compile:
ArrayList<Foo> list = stream.collect(ArrayList::new,
ArrayList::add,
ArrayList::addAll);
In this case ArrayList::addAll could not be used as the combiner, because it returns a boolean rather than the merged list, and the compiler would report an error.
So if this version of collect method was consistent with the Collector interface then it would promote a more complex use of this version of collect method, that was not intended.
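For comparison, the sketch below collects into an ArrayList both ways: once with the three-arg collect and plain method references, and once through Collector.of, whose combiner is a BinaryOperator and therefore has to return the merged container. CombinerShapes is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.stream.Collector;
import java.util.stream.Stream;

class CombinerShapes {
    // BiConsumer combiner: ArrayList::addAll fits as-is, its boolean result is ignored.
    static ArrayList<String> viaThreeArgCollect(Stream<String> stream) {
        return stream.collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    // BinaryOperator combiner: a lambda is needed to return the merged list.
    static ArrayList<String> viaCollector(Stream<String> stream) {
        Collector<String, ArrayList<String>, ArrayList<String>> toArrayList =
                Collector.of(ArrayList::new, ArrayList::add,
                        (left, right) -> {
                            left.addAll(right);
                            return left;
                        });
        return stream.collect(toArrayList);
    }

    public static void main(String[] args) {
        System.out.println(viaThreeArgCollect(Stream.of("a", "b"))); // prints [a, b]
        System.out.println(viaCollector(Stream.of("a", "b")));       // prints [a, b]
    }
}
```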

What does Natural Order mean In this context in java?

I have a question which appeared in a past paper (I'm revising for my exams) and I came across the term natural order, which appears to be a keyword since it was written in bold on the paper. I've looked online for natural order but I couldn't find anything that related it to ArrayLists the way my question does.
Please note, I do not need help solving the actual question, I just wish to understand what natural order means.
Question:
Write a Java static method called atLeast which takes an ArrayList of objects which
have natural order, an object of the element type of the ArrayList, and an integer n. A
call to the method should return true if at least n elements of the ArrayList are greater
than the element type object according to natural order, otherwise it should return false.
This likely means the objects in the List implement Comparable:
This interface imposes a total ordering on the objects of each class that implements it. This ordering is referred to as the class's natural ordering, and the class's compareTo method is referred to as its natural comparison method.
The declaration would look something like this:
static <T extends Comparable<? super T>>
boolean atLeast(List<T> list, T key, int n) {
...
}
Natural order means the default ordering defined by the element type itself, not by the collection that holds the elements. E.g. for Strings it is lexicographic (roughly alphabetical) order, and for numbers it is numerical order.
For objects to have a natural order they must implement the interface java.lang.Comparable. In other words, the objects must be comparable to determine their order. Here is how the Comparable interface looks:
public interface Comparable<T> {
int compareTo(T o);
}
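As an illustration of giving your own class a natural order, here is a minimal sketch. The Version class is hypothetical; Collections.sort(List) without a comparator then sorts by its compareTo method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical value type ordered by major, then minor version number.
final class Version implements Comparable<Version> {
    final int major;
    final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Defines the natural order of Version objects.
    @Override
    public int compareTo(Version other) {
        int byMajor = Integer.compare(major, other.major);
        return byMajor != 0 ? byMajor : Integer.compare(minor, other.minor);
    }

    // Kept consistent with compareTo, as the Comparable documentation recommends.
    @Override
    public boolean equals(Object o) {
        return o instanceof Version && compareTo((Version) o) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * major + minor;
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }

    public static void main(String[] args) {
        List<Version> versions = new ArrayList<>(Arrays.asList(
                new Version(1, 10), new Version(2, 0), new Version(1, 2)));
        Collections.sort(versions); // no Comparator: uses the natural order
        System.out.println(versions); // prints [1.2, 1.10, 2.0]
    }
}
```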
