Why does Stream#toList's default implementation seem overcomplicated / suboptimal?

Looking at the implementation of Stream#toList, I just noticed how overcomplicated and suboptimal it seems.
As mentioned in the Javadoc just above it, this default implementation is not used by most Stream implementations; however, in my opinion it could have been written differently.
The sources
/**
 * Accumulates the elements of this stream into a {@code List}. The elements in
 * the list will be in this stream's encounter order, if one exists. The returned List
 * is unmodifiable; calls to any mutator method will always cause
 * {@code UnsupportedOperationException} to be thrown. There are no
 * guarantees on the implementation type or serializability of the returned List.
 *
 * <p>The returned instance may be value-based.
 * Callers should make no assumptions about the identity of the returned instances.
 * Identity-sensitive operations on these instances (reference equality ({@code ==}),
 * identity hash code, and synchronization) are unreliable and should be avoided.
 *
 * <p>This is a terminal operation.
 *
 * @apiNote If more control over the returned object is required, use
 * {@link Collectors#toCollection(Supplier)}.
 *
 * @implSpec The implementation in this interface returns a List produced as if by the following:
 * <pre>{@code
 * Collections.unmodifiableList(new ArrayList<>(Arrays.asList(this.toArray())))
 * }</pre>
 *
 * @implNote Most instances of Stream will override this method and provide an implementation
 * that is highly optimized compared to the implementation in this interface.
 *
 * @return a List containing the stream elements
 *
 * @since 16
 */
@SuppressWarnings("unchecked")
default List<T> toList() {
    return (List<T>) Collections.unmodifiableList(new ArrayList<>(Arrays.asList(this.toArray())));
}
My idea of what would be better
return (List<T>) Collections.unmodifiableList(Arrays.asList(this.toArray()));
Or even
return (List<T>) Arrays.asList(this.toArray());
IntelliJ's proposal
return (List<T>) List.of(this.toArray());
Is there any good reason for the implementation in the JDK sources?

The toArray method might be implemented to return an array that is then mutated afterwards, which would effectively make the returned list not immutable. That's why an explicit copy by creating a new ArrayList is done.
It's essentially a defensive copy.
This was also discussed during the review of this API, where Stuart Marks writes:
As written it's true that the default implementation does perform apparently redundant copies, but we can't be assured that toArray() actually returns a freshly created array. Thus, we wrap it using Arrays.asList and then copy it using the ArrayList constructor. This is unfortunate but necessary to avoid situations where someone could hold a reference to the internal array of a List, allowing modification of a List that's supposed to be unmodifiable.
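A minimal sketch of the failure mode Stuart Marks describes. Instead of a real Stream, it uses a plain Object[] that stands in for what a hypothetical misbehaving toArray() might hand out while keeping its own reference to it; the helper names wrapOnly and wrapAndCopy are made up for illustration. Wrapping alone (as in the first two proposals) yields a view that still changes when the array changes, whereas the extra ArrayList copy in the default implementation does not:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DefensiveCopyDemo {
    // Only wraps the array: the result is an unmodifiable *view* backed by it.
    static List<Object> wrapOnly(Object[] array) {
        return Collections.unmodifiableList(Arrays.asList(array));
    }

    // Wraps, then copies into a fresh ArrayList, like the default Stream#toList.
    static List<Object> wrapAndCopy(Object[] array) {
        return Collections.unmodifiableList(new ArrayList<>(Arrays.asList(array)));
    }

    public static void main(String[] args) {
        // Stands in for an array that a hypothetical toArray() returned
        // while its owner kept a reference to it.
        Object[] shared = {"a", "b", "c"};

        List<Object> wrapped = wrapOnly(shared);
        List<Object> copied = wrapAndCopy(shared);

        shared[0] = "changed"; // the owner mutates the array afterwards

        System.out.println(wrapped.get(0)); // changed
        System.out.println(copied.get(0));  // a
    }
}
```

The "unmodifiable" wrapper only blocks mutator calls made through the list itself; it cannot stop changes made through another reference to the backing array, which is why the copy is needed when the array's provenance is unknown.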

Related

Is a list's iterator implementing both kotlin.collections.Iterator and java.util.Iterator?

I have this code
val list = listOf(1, 2, 3, 4, 5, 6, 7, 8, 9)
list.iterator().forEachRemaining{}
When I check the return type of iterator(), it returns an Iterator from the kotlin.collections package:
public interface Iterator<out T> {
    /**
     * Returns the next element in the iteration.
     */
    public operator fun next(): T
    /**
     * Returns `true` if the iteration has more elements.
     */
    public operator fun hasNext(): Boolean
}
As shown above, there is no forEachRemaining{} function. However, I can still call forEachRemaining{}, which comes from public interface Iterator<E> in the java.util package, i.e.:
public interface Iterator<E> {
    /**
     * Returns {@code true} if the iteration has more elements.
     * (In other words, returns {@code true} if {@link #next} would
     * return an element rather than throwing an exception.)
     *
     * @return {@code true} if the iteration has more elements
     */
    boolean hasNext();
    /**
     * Returns the next element in the iteration.
     *
     * @return the next element in the iteration
     * @throws NoSuchElementException if the iteration has no more elements
     */
    E next();
    /**
     * Removes from the underlying collection the last element returned
     * by this iterator (optional operation). This method can be called
     * only once per call to {@link #next}. The behavior of an iterator
     * is unspecified if the underlying collection is modified while the
     * iteration is in progress in any way other than by calling this
     * method.
     *
     * @implSpec
     * The default implementation throws an instance of
     * {@link UnsupportedOperationException} and performs no other action.
     *
     * @throws UnsupportedOperationException if the {@code remove}
     *         operation is not supported by this iterator
     *
     * @throws IllegalStateException if the {@code next} method has not
     *         yet been called, or the {@code remove} method has already
     *         been called after the last call to the {@code next}
     *         method
     */
    default void remove() {
        throw new UnsupportedOperationException("remove");
    }
    /**
     * Performs the given action for each remaining element until all elements
     * have been processed or the action throws an exception. Actions are
     * performed in the order of iteration, if that order is specified.
     * Exceptions thrown by the action are relayed to the caller.
     *
     * @implSpec
     * <p>The default implementation behaves as if:
     * <pre>{@code
     *     while (hasNext())
     *         action.accept(next());
     * }</pre>
     *
     * @param action The action to be performed for each element
     * @throws NullPointerException if the specified action is null
     * @since 1.8
     */
    default void forEachRemaining(Consumer<? super E> action) {
        Objects.requireNonNull(action);
        while (hasNext())
            action.accept(next());
    }
}
How can iterator() have access to both the Iterator from the kotlin.collections package and the one from the java.util package? Did I miss something?

Some classes from the Kotlin standard library are automatically mapped to platform-specific classes (e.g. to Java classes on Kotlin/JVM). This is the case for the Iterator you mentioned.
Note that collection-related classes don't have a one-to-one mapping. Kotlin's kotlin.collections.Iterator only contains read-only operations, as you mentioned in the question. It has a sibling interface, kotlin.collections.MutableIterator, which extends Iterator and adds the remove() method. Both of these are mapped to Java's java.util.Iterator. So all Kotlin code is declared using Kotlin types, but the Java types are used under the hood, which is why members of java.util.Iterator such as forEachRemaining are available as well.
When you pass either a Kotlin k.c.Iterator<T> or a k.c.MutableIterator<T> to Java code, it sees a plain Java j.u.Iterator<T>. When you pass a j.u.Iterator<T> to Kotlin code, it sees the so-called platform type (Mutable)Iterator<T>!. This means that:
you are free to treat it as either nullable or non-null, depending on the Java code's Javadoc or usage (hence the ! in the type name);
you can use it as either MutableIterator or Iterator, depending on your use case.
The motivation behind this mapping, as opposed to entirely separate collections in the standard library as in Scala, for example, is simple: you don't have to do any copying when moving between the Java and Kotlin worlds. The downside is additional implementation complexity, which we mostly don't see as users.
See the Java interoperability section of the Kotlin documentation for additional details.
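On the JVM side of this mapping, forEachRemaining is simply the default method of java.util.Iterator quoted in the question. A minimal Java sketch of what it does (the class and method names are hypothetical): it visits only the elements that next() has not returned yet.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ForEachRemainingDemo {
    // Collects whatever the iterator has not returned yet, using the
    // java.util.Iterator default method forEachRemaining.
    static List<Integer> drainRemaining(Iterator<Integer> it) {
        List<Integer> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        Iterator<Integer> it = List.of(1, 2, 3, 4).iterator();
        it.next(); // consume 1 by hand
        System.out.println(drainRemaining(it)); // [2, 3, 4]
    }
}
```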

Why do I need the Iterator interface, and why should I use it?

I am new to Java so maybe to some of you my question will seem silly.
As I understand from a tutorial, if I want to use foreach on my custom object, the object must implement the Iterable interface.
My question is: why do I need the Iterator interface, and why should I use it?

As you mentioned, Iterable is used in foreach loops.
Not everything can be used in a foreach loop, right? What do you think this will do?
for (int a : 10)
The designers of Java wanted to make the compiler able to spot this nonsense and report it to you as a compiler error. So they thought, "what kind of stuff can be used in a foreach loop?" "Well", they thought, "objects must be able to return an iterator". And this interface is born:
public interface Iterable<T> {
    /**
     * Returns an iterator over elements of type {@code T}.
     *
     * @return an Iterator.
     */
    Iterator<T> iterator();
}
The compiler can just check whether the object in the foreach loop implements Iterable or not. If it does not, spit out an error. You can think of this as a kind of "marker" to the compiler that says "Yes I can be iterated over!"
"What is an iterator then?", they thought again, "Well, an iterator should be able to return the next element and to return whether it has a next element. Some iterators should also be able to remove elements". So this interface is born:
public interface Iterator<E> {
    /**
     * Returns {@code true} if the iteration has more elements.
     * (In other words, returns {@code true} if {@link #next} would
     * return an element rather than throwing an exception.)
     *
     * @return {@code true} if the iteration has more elements
     */
    boolean hasNext();
    /**
     * Returns the next element in the iteration.
     *
     * @return the next element in the iteration
     * @throws NoSuchElementException if the iteration has no more elements
     */
    E next();
    /**
     * Removes from the underlying collection the last element returned
     * by this iterator (optional operation). This method can be called
     * only once per call to {@link #next}. The behavior of an iterator
     * is unspecified if the underlying collection is modified while the
     * iteration is in progress in any way other than by calling this
     * method.
     *
     * @implSpec
     * The default implementation throws an instance of
     * {@link UnsupportedOperationException} and performs no other action.
     *
     * @throws UnsupportedOperationException if the {@code remove}
     *         operation is not supported by this iterator
     *
     * @throws IllegalStateException if the {@code next} method has not
     *         yet been called, or the {@code remove} method has already
     *         been called after the last call to the {@code next}
     *         method
     */
    default void remove() {
        throw new UnsupportedOperationException("remove");
    }
}
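To see both interfaces working together, here is a minimal sketch of a custom class that implements Iterable by handing out an Iterator; the class name Countdown is hypothetical. Implementing Iterable is all that is needed for the enhanced for loop to compile:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical example class: counts down from start to 1.
public class Countdown implements Iterable<Integer> {
    private final int start;

    public Countdown(int start) {
        this.start = start;
    }

    @Override
    public Iterator<Integer> iterator() {
        // Each call returns a fresh iterator with its own cursor.
        return new Iterator<Integer>() {
            private int current = start;

            @Override
            public boolean hasNext() {
                return current > 0;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current--;
            }
            // remove() is not overridden, so the default throws UnsupportedOperationException.
        };
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        for (int n : new Countdown(3)) { // compiles because Countdown implements Iterable
            seen.add(n);
        }
        System.out.println(seen); // [3, 2, 1]
    }
}
```

Because each call to iterator() creates a new cursor, the same Countdown object can be looped over several times, or by two loops at once, without them interfering.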

Iterator is a design pattern: it lets you traverse a collection of objects in a uniform way while hiding from the user how the elements are stored and how the iteration works. As you can see in the Javadoc, many classes implement the Iterable interface, not only collections. For example, it lets you iterate through two List implementations with the same performance. ArrayList returns an element at a given index in constant time, but LinkedList has to walk through all the preceding elements to reach a given index, which is much slower. When you get an Iterator from either implementation, however, you get the same performance in both cases, because each list optimizes its iteration algorithm in its own way. ResultSet is also an iterator, although it does not implement the java.util interface: it lets you iterate over all the results of a database query in the same way and hides the underlying structures responsible for storing the elements and talking to the database. For example, if you need some optimization, you could write a ResultSet implementation that queries the database on each call to next(), or whatever else you want, because the pattern also decouples client code from the element storage and the iteration algorithm.
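A small sketch of the ArrayList versus LinkedList point above (class and method names are hypothetical). The iterator-based loop relies on each List implementation's own Iterator, while the index-based loop calls get(i), which is constant-time for ArrayList but has to walk nodes for LinkedList:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class IterateAnyList {
    // Same code for any List: each implementation supplies its own Iterator,
    // so a LinkedList is walked node by node, in linear time overall.
    static long sumWithIterator(List<Integer> list) {
        long sum = 0;
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    // Index-based loop: fine for ArrayList (constant-time get), but each get(i)
    // on a LinkedList walks nodes from the nearer end, so the loop is quadratic.
    static long sumWithIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<>(List.of(1, 2, 3, 4));
        List<Integer> linked = new LinkedList<>(array);
        System.out.println(sumWithIterator(array) + " " + sumWithIterator(linked)); // 10 10
    }
}
```

Both methods give the same result; the difference only shows up in running time on large LinkedLists, which is why client code written against Iterator (or the for-each loop, which uses it) does not need to know which List it received.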

HashSet or HashMap without defining a hashCode() method in a new class

What happens if you design a new class and try to insert objects of that class into a HashSet or HashMap without defining a hashCode() method?
Please keep the explanation simple. I'm studying for an exam and I'm still rusty with hashes in Java. Thank you.

A HashMap stores data in multiple singly linked lists of entries (also called buckets or bins). All the lists are registered in an array of Entry (Entry[] array).
The following picture shows the inner storage of a HashMap instance with an array of nullable entries. Each Entry can link to another Entry to form a linked list.
When a user calls put(K key, V value) or get(Object key), the function computes the index of the bucket in which the Entry should be.
This bucket index (i.e. which linked list) is computed from the hash code of the key.
So if you have overridden the hashCode method, your overridden method is used to compute the bucket index;
otherwise the default hashCode inherited from Object is used, which is identity-based (often described as derived from the object's memory address, although that is not required). In that case, even if your objects are logically equal, each of them gets its own entry in your map: the HashMap treats them as different objects.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
For example:
MyObject a = new MyObject("a", 123,"something");
MyObject b = new MyObject("a", 123,"something");
a and b will, in practice, have different hash codes.
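A minimal sketch of what the answers describe, using two hypothetical classes in place of MyObject: one inherits equals and hashCode from Object, the other overrides both consistently. Two logically equal instances end up as two separate set elements in the first case and as one in the second:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class HashCodeDemo {
    // Hypothetical class that does NOT override equals/hashCode:
    // it inherits the identity-based versions from Object.
    static final class PlainPoint {
        final int x, y;
        PlainPoint(int x, int y) { this.x = x; this.y = y; }
    }

    // Hypothetical class that overrides both, based on the same fields.
    static final class ValuePoint {
        final int x, y;
        ValuePoint(int x, int y) { this.x = x; this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ValuePoint)) return false;
            ValuePoint other = (ValuePoint) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // equal objects -> equal hash codes
        }
    }

    static int plainSetSize() {
        Set<PlainPoint> set = new HashSet<>();
        set.add(new PlainPoint(1, 2));
        set.add(new PlainPoint(1, 2)); // logically equal, but a different object
        return set.size();
    }

    static int valueSetSize() {
        Set<ValuePoint> set = new HashSet<>();
        set.add(new ValuePoint(1, 2));
        set.add(new ValuePoint(1, 2)); // equal per equals/hashCode, so not added again
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(plainSetSize()); // 2
        System.out.println(valueSetSize()); // 1
    }
}
```

Even in the rare case where two PlainPoint instances happen to share an identity hash code, they still count as different, because the inherited equals is also identity-based.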

Nothing will happen :-)
Every object has its own hashCode() method, inherited from the Object class, so every new object of your class will be unique: HashSet and HashMap will identify each instance as distinct.
Here is the official Javadoc:
/**
 * Returns a hash code value for the object. This method is
 * supported for the benefit of hash tables such as those provided by
 * {@link java.util.HashMap}.
 * <p>
 * The general contract of {@code hashCode} is:
 * <ul>
 * <li>Whenever it is invoked on the same object more than once during
 *     an execution of a Java application, the {@code hashCode} method
 *     must consistently return the same integer, provided no information
 *     used in {@code equals} comparisons on the object is modified.
 *     This integer need not remain consistent from one execution of an
 *     application to another execution of the same application.
 * <li>If two objects are equal according to the {@code equals(Object)}
 *     method, then calling the {@code hashCode} method on each of
 *     the two objects must produce the same integer result.
 * <li>It is <em>not</em> required that if two objects are unequal
 *     according to the {@link java.lang.Object#equals(java.lang.Object)}
 *     method, then calling the {@code hashCode} method on each of the
 *     two objects must produce distinct integer results. However, the
 *     programmer should be aware that producing distinct integer results
 *     for unequal objects may improve the performance of hash tables.
 * </ul>
 * <p>
 * As much as is reasonably practical, the hashCode method defined by
 * class {@code Object} does return distinct integers for distinct
 * objects. (This is typically implemented by converting the internal
 * address of the object into an integer, but this implementation
 * technique is not required by the
 * Java™ programming language.)
 *
 * @return a hash code value for this object.
 * @see java.lang.Object#equals(java.lang.Object)
 * @see java.lang.System#identityHashCode
 */
public native int hashCode();

Interface and implemented methods

I have a question. According to the Java 7 API, Collection is an interface, yet it comes with some concrete methods, such as size(). I don't get it: how can an interface contain implemented methods? It would make sense if it were an abstract class.

Collection is an interface, but yet it comes with some concrete methods, such as size().
This is not true (at least in Java 7; since Java 8, interfaces may also declare default methods). An interface, as you already know, just defines the contract and leaves the implementation to the classes implementing it. If you're referring to something like
Collection<String> collection = new ArrayList<String>();
System.out.println("Size of the collection is: " + collection.size());
Note that the size() implementation is provided by ArrayList, not by Collection.
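A minimal sketch of that point (the class ThreeZeros and the method sizeOf are hypothetical): the code only sees the Collection interface type, and each call to size() runs the implementation of whichever class the object actually is. AbstractCollection is used here just to avoid writing the other Collection methods by hand:

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;

public class WhoImplementsSize {
    // Hypothetical minimal collection: this class, not the Collection
    // interface, decides what size() returns.
    static final class ThreeZeros extends AbstractCollection<Integer> {
        @Override
        public Iterator<Integer> iterator() {
            return List.of(0, 0, 0).iterator();
        }

        @Override
        public int size() {
            return 3;
        }
    }

    // Only the interface type is visible here; the call is dispatched
    // at run time to the class of the object actually passed in.
    static int sizeOf(Collection<?> c) {
        return c.size();
    }

    public static void main(String[] args) {
        Collection<String> list = new ArrayList<>(List.of("a", "b"));
        Collection<String> set = new HashSet<>(List.of("a", "b", "c"));
        System.out.println(sizeOf(list));             // 2, from ArrayList.size()
        System.out.println(sizeOf(set));              // 3, from HashSet.size()
        System.out.println(sizeOf(new ThreeZeros())); // 3, from ThreeZeros.size()
    }
}
```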

java.util.Collection does not have implemented methods; it is an interface. Here's the declaration of the size method:
/**
 * Returns the number of elements in this collection. If this collection
 * contains more than <tt>Integer.MAX_VALUE</tt> elements, returns
 * <tt>Integer.MAX_VALUE</tt>.
 *
 * @return the number of elements in this collection
 */
int size();

None of its methods has a concrete implementation. The method you are referring to, size, doesn't have one either:
/**
 * Returns the number of elements in this collection. If this collection
 * contains more than <tt>Integer.MAX_VALUE</tt> elements, returns
 * <tt>Integer.MAX_VALUE</tt>.
 *
 * @return the number of elements in this collection
 */
int size();

Java Modcount (ArrayList)

In Eclipse, I see that ArrayList objects have a modCount field. What is its purpose? (number of modifications?)

It allows the internals of the list to know if there has been a structural modification made that might cause the current operation to give incorrect results.
If you've ever gotten ConcurrentModificationException due to modifying a list (say, removing an item) while iterating it, its internal modCount was what tipped off the iterator.
The AbstractList docs give a good detailed description.
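A minimal sketch of the situation described above, assuming an ArrayList (method names are hypothetical). Removing through the list inside a for-each loop trips the modCount check, while removing through the iterator itself keeps the iterator's expectation in sync:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class RemoveWhileIterating {
    // list.remove(...) inside a for-each loop bumps modCount behind the
    // hidden iterator's back, so its next call to next() fails fast.
    // (Detection is best-effort: removing the second-to-last element,
    // for instance, just ends the loop without an exception.)
    static boolean forEachRemoveThrows() {
        List<String> list = new ArrayList<>(List.of("a", "b", "c"));
        try {
            for (String s : list) {
                if (s.equals("a")) {
                    list.remove(s);
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterator.remove() also updates the iterator's expectedModCount, so no exception.
    static List<String> removeWithIterator() {
        List<String> list = new ArrayList<>(List.of("a", "b", "c"));
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            if (it.next().equals("a")) {
                it.remove();
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(forEachRemoveThrows()); // true
        System.out.println(removeWithIterator());  // [b, c]
    }
}
```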

Yes. If you ever intend to extend AbstractList, you have to write your code so that it adheres to the modCount Javadoc cited below:
/**
 * The number of times this list has been <i>structurally modified</i>.
 * Structural modifications are those that change the size of the
 * list, or otherwise perturb it in such a fashion that iterations in
 * progress may yield incorrect results.
 *
 * <p>This field is used by the iterator and list iterator implementation
 * returned by the {@code iterator} and {@code listIterator} methods.
 * If the value of this field changes unexpectedly, the iterator (or list
 * iterator) will throw a {@code ConcurrentModificationException} in
 * response to the {@code next}, {@code remove}, {@code previous},
 * {@code set} or {@code add} operations. This provides
 * <i>fail-fast</i> behavior, rather than non-deterministic behavior in
 * the face of concurrent modification during iteration.
 *
 * <p><b>Use of this field by subclasses is optional.</b> If a subclass
 * wishes to provide fail-fast iterators (and list iterators), then it
 * merely has to increment this field in its {@code add(int, E)} and
 * {@code remove(int)} methods (and any other methods that it overrides
 * that result in structural modifications to the list). A single call to
 * {@code add(int, E)} or {@code remove(int)} must add no more than
 * one to this field, or the iterators (and list iterators) will throw
 * bogus {@code ConcurrentModificationExceptions}. If an implementation
 * does not wish to provide fail-fast iterators, this field may be
 * ignored.
 */
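Following the subclass guidance in the Javadoc above, here is a minimal sketch of a hypothetical AbstractList subclass (TinyList is made up for illustration and is not how ArrayList is written) that gets fail-fast iterators simply by incrementing modCount once in add(int, E) and once in remove(int):

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;

// Hypothetical minimal growable list, only to illustrate the modCount contract.
public class TinyList<E> extends AbstractList<E> {
    private Object[] data = new Object[4];
    private int size;

    @Override
    @SuppressWarnings("unchecked")
    public E get(int index) {
        checkIndex(index, size);
        return (E) data[index];
    }

    @Override
    public int size() {
        return size;
    }

    @Override
    public void add(int index, E element) {
        checkIndex(index, size + 1);
        modCount++; // exactly one increment per structural modification
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        System.arraycopy(data, index, data, index + 1, size - index);
        data[index] = element;
        size++;
    }

    @Override
    @SuppressWarnings("unchecked")
    public E remove(int index) {
        checkIndex(index, size);
        modCount++; // again, a single increment
        E old = (E) data[index];
        System.arraycopy(data, index + 1, data, index, size - index - 1);
        data[--size] = null; // clear the vacated slot so it can be garbage-collected
        return old;
    }

    private static void checkIndex(int index, int bound) {
        if (index < 0 || index >= bound) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
    }

    public static void main(String[] args) {
        TinyList<String> list = new TinyList<>();
        list.add("a"); // AbstractList.add(E) delegates to add(size(), e)
        list.add("b");

        Iterator<String> it = list.iterator(); // AbstractList's own fail-fast iterator
        it.next();
        list.add("c"); // structural change behind the iterator's back

        try {
            it.next();
            System.out.println("no exception");
        } catch (ConcurrentModificationException e) {
            System.out.println("ConcurrentModificationException");
        }
    }
}
```

No iterator code is written here: the iterator inherited from AbstractList already compares modCount with its expected value, so the subclass only has to keep the counter honest.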
Looking at the actual JDK source code and reading the Javadocs (either online or in the code) helps a lot in understanding what's going on. Good luck.
I would add that you can attach the JDK source code to Eclipse so that every F3 or Ctrl+click on any Java SE class or method opens the actual source code. If you download the JDK, you should find src.zip in the JDK installation folder. Then, in Eclipse's top menu, go to Window » Preferences » Java » Installed JREs. Select the current JRE and click Edit. Select the rt.jar file, click Source Attachment, click External File, navigate to the JDK folder, select the src.zip file and add it. The source code of the Java SE API is now available in Eclipse, and it gives a lot of insight. Happy coding :)

protected transient int modCount = 0; is the field declared in public abstract class AbstractList to track the total number of structural modifications made to the collection. Both an add and a remove increment this counter, so it grows with every structural modification and is not useful for computing the size.
It is what makes it possible to throw ConcurrentModificationException.
ConcurrentModificationException is thrown when the collection is structurally modified while it is being iterated, whether the modification comes from another thread or from the iterating thread itself (for example, calling remove on the list inside a for-each loop).
It works like this: whenever an iterator object is created, the current modCount is copied into its expectedModCount field, and on each iterator step expectedModCount is compared with modCount; if they differ, a ConcurrentModificationException is thrown.
private class Itr implements Iterator<E> {
    ...
    ...
    /**
     * The modCount value that the iterator believes that the backing
     * List should have. If this expectation is violated, the iterator
     * has detected concurrent modification.
     */
    int expectedModCount = modCount;

    public E next() {
        checkForComodification();
        ...
        ...
    }

    final void checkForComodification() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
    }
    ...
    ...
}
The size() API would not work here: if two operations (an add and a remove) happen before next() is called, the size stays the same, so size() cannot detect that the collection was modified during iteration. Hence we need a counter that is incremented on every modification, which is modCount.
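A minimal sketch of the scenario described above, using a plain ArrayList (the class and method names are hypothetical): one add and one remove leave size() unchanged, yet the iterator still detects the change through modCount:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class ModCountDemo {
    // Returns true if the iterator noticed the add-then-remove.
    static boolean iteratorDetectsAddThenRemove() {
        List<String> list = new ArrayList<>(List.of("a", "b", "c"));
        Iterator<String> it = list.iterator(); // captures expectedModCount
        it.next();

        list.add("d");    // modCount increases
        list.remove("d"); // modCount increases again; size() is back to 3

        try {
            it.next(); // compares modCount with expectedModCount first
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(iteratorDetectsAddThenRemove()); // true
    }
}
```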

It's the number of times the structure (size) of the collection changes.

From the Java API documentation for the modCount field:
The number of times this list has been structurally modified. Structural modifications are those that change the size of the list, or otherwise perturb it in such a fashion that iterations in progress may yield incorrect results.

From the 1.4 javadoc on AbstractList:
protected transient int modCount
The number of times this list has been structurally modified. Structural modifications are those that change the size of the list, or otherwise perturb it in such a fashion that iterations in progress may yield incorrect results.
This field is used by the iterator and list iterator implementation returned by the iterator and listIterator methods. If the value of this field changes unexpectedly, the iterator (or list iterator) will throw a ConcurrentModificationException in response to the next, remove, previous, set or add operations. This provides fail-fast behavior, rather than non-deterministic behavior in the face of concurrent modification during iteration.
Use of this field by subclasses is optional.
