Java 8 stream's toArray and size parameter - java

I was wondering how stream().toArray(x -> new Integer[x]) knows what size of array to create. I wrote a snippet in which I created a list of four integers and filtered the values, and it created an array whose length matches the filtered stream. I could not see any method on Stream that returns the size of the stream.
List<Integer> intList = new ArrayList<Integer>();
intList.add(1);
intList.add(2);
intList.add(3);
intList.add(4);
Integer[] array = intList.stream()
        .filter(x -> x > 2)
        .toArray(x -> {
            System.out.println("x --> " + x);
            return new Integer[x];
        });
System.out.println("array length: " + array.length);
Output of above code:
x --> 2
array length: 2
Initially, the snippet looked like this:
Integer[] array = intList.stream()
        .filter(x -> x > 2)
        .toArray(x -> new Integer[x]);
To understand what value of x is passed, I changed the lambda to print x.

Of course, this is implementation dependent. For some streams the size is predictable, namely if the source has a known size and no size-changing intermediate operation is involved. Since you are using a filter operation, this doesn't apply; however, there is an estimated size, based on the unfiltered count.
Now, the Stream implementation simply allocates a temporary buffer, either using the estimated size or a default size with support for increasing the capacity, if necessary, and copies the data into the destination array, created by your function, in a final step.
The intermediate buffers could be created via the supplied function, which is the reason why the documentation states “…using the provided generator function to allocate the returned array, as well as any additional arrays that might be required for a partitioned execution or for resizing” and I vaguely remember seeing such a behavior in early versions. However, the current implementation just uses Object[] arrays (or Object[][] in a “spined buffer”) for intermediate storage and uses the supplied function only for creating the final array. Therefore, you can’t observe intermediate array creation with the function, given this specific JRE implementation.
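To observe this without reading console output, here is a minimal sketch that records every size the generator is asked for. The class and member names (ToArraySizeDemo, requestedSizes, filteredToArray) are made up for illustration, and what gets recorded is implementation dependent: the specification allows additional calls for intermediate arrays, so only the length of the returned array should be relied on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ToArraySizeDemo {
    // Sizes the stream asked the generator for, in call order (illustrative only).
    static final List<Integer> requestedSizes = new ArrayList<>();

    static Integer[] filteredToArray(List<Integer> source) {
        return source.stream()
                .filter(x -> x > 2)
                .toArray(size -> {
                    requestedSizes.add(size); // record the request, then allocate as asked
                    return new Integer[size];
                });
    }

    public static void main(String[] args) {
        Integer[] array = filteredToArray(Arrays.asList(1, 2, 3, 4));
        // The list of requested sizes may differ between JRE implementations.
        System.out.println("requested sizes: " + requestedSizes);
        System.out.println("result: " + Arrays.toString(array));
    }
}
```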

The thing is: this is a terminal operation. It happens at the end, once the stream has been processed, so the "final" count is known by then; there are no more operations that could remove values from or add values to the stream.

Simply look at Java's Stream documentation for toArray.
<A> A[] toArray(IntFunction<A[]> generator)
Returns an array containing the elements of this stream, using the provided generator function to allocate the returned array, as well as any additional arrays that might be required for a partitioned execution or for resizing.
This is a terminal operation.
API Note:
The generator function takes an integer, which is the size of the desired array, and produces an array of the desired size. This can be concisely expressed with an array constructor reference.
Therefore toArray gives you the desired array size as a parameter, and you are responsible for allocating a correctly sized array, at least when using this method. This method is a terminal operation, so the size calculation is done within the internals of the Stream API.
IMHO it is easier to grasp if you name your lambda parameters differently for filter and toArray.
Integer[] array = intList.stream()
        .filter(myint -> myint > 2)
        .toArray(desiredArraySize -> new Integer[desiredArraySize]);
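The array constructor reference mentioned in the API note could look like the following sketch. The class and method names are only illustrative; Integer[]::new is equivalent to a lambda that takes the size and returns new Integer[size].

```java
import java.util.Arrays;
import java.util.List;

class ArrayConstructorRefDemo {
    static Integer[] greaterThanTwo(List<Integer> source) {
        return source.stream()
                .filter(value -> value > 2)
                .toArray(Integer[]::new); // same as size -> new Integer[size]
    }

    public static void main(String[] args) {
        // prints [3, 4]
        System.out.println(Arrays.toString(greaterThanTwo(Arrays.asList(1, 2, 3, 4))));
    }
}
```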

Related

How can an array size be extended to range of long in java and if not possible what other data structure can be used for the same

Java arrays are built-in constructs. You cannot influence their indexing scheme in any way, so you would have to use int index.
Similarly, all Java collections are limited to 2^31 - 1 entries (Integer.MAX_VALUE), because their size() method returns an int, and direct access methods, where available, also take an int.
If you need a data structure that stores more than 2^31 - 1 items, make a 2D array of "chunks", each representing a portion of a "big" array. In essence, you would build your own class that translates long addressing to a pair of ints for emulating a linear indexing space.
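A minimal sketch of such a chunked structure, assuming long elements and a caller-chosen chunk size. The class name ChunkedLongArray and its methods are hypothetical, not part of the JDK. The demo uses tiny sizes so it runs anywhere; a real use would pick a large chunk size.

```java
class ChunkedLongArray {
    private final long length;
    private final int chunkSize;
    private final long[][] chunks;

    ChunkedLongArray(long length, int chunkSize) {
        if (length < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and chunkSize > 0");
        }
        this.length = length;
        this.chunkSize = chunkSize;
        long fullChunks = length / chunkSize;
        int lastChunkLength = (int) (length % chunkSize);
        // Fails with ArithmeticException if more than Integer.MAX_VALUE chunks are needed.
        int chunkCount = Math.toIntExact(fullChunks + (lastChunkLength == 0 ? 0 : 1));
        chunks = new long[chunkCount][];
        for (int i = 0; i < chunkCount; i++) {
            boolean partialLast = lastChunkLength != 0 && i == chunkCount - 1;
            chunks[i] = new long[partialLast ? lastChunkLength : chunkSize];
        }
    }

    long get(long index) {
        checkIndex(index);
        // Translate the long index into (chunk number, offset inside the chunk).
        return chunks[(int) (index / chunkSize)][(int) (index % chunkSize)];
    }

    void set(long index, long value) {
        checkIndex(index);
        chunks[(int) (index / chunkSize)][(int) (index % chunkSize)] = value;
    }

    long length() {
        return length;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }

    public static void main(String[] args) {
        ChunkedLongArray array = new ChunkedLongArray(10, 4); // chunks of 4, 4 and 2 elements
        array.set(9, 42L);
        System.out.println(array.get(9)); // prints 42
    }
}
```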
Java arrays have a fixed size. Once you create an array with a given size, it cannot be extended or reduced, but the contents within the array can be changed freely as long as they are of the same type. All elements of a Java array must have the same data type.
For example:
int[] a = new int[5];
creates a new array of size 5 that can store integers.
If you want an array that can grow, I would recommend using a data type called a dynamic array. In Java, the dynamic array is called ArrayList. An ArrayList is dynamic, so you do not have to set a size when you create it, but you must define the type of its elements.
For example:
ArrayList<Integer> a = new ArrayList<Integer>();

Difference between implicit and explicit ArrayList size declarations?

What is the difference between the following declarations:
List list1 = new ArrayList();
List list2 = new ArrayList(10);
By default it allocates capacity for 10 elements. But is there any difference?
Can I add an 11th element to list2 by list2.add("something")?
Here is the source code of the constructor used in the first example:
public ArrayList() {
    this(10);
}
So there is no difference. Since the initial capacity is 10, it gets initialised with capacity 10 whether you pass 10 or not.
Can I add 11th element in the list2 by list2.add("something")?
Of course; the initial capacity is not the final capacity. As you keep adding more than 10 elements, the capacity of the list keeps increasing.
If you want to have a fixed size container, use Arrays.asList (or, for primitive arrays, the asList methods in Guava) and also consider java.util.Collections.unmodifiableList()
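As a small sketch of the fixed-size behaviour of Arrays.asList (the class and helper names are illustrative): replacing an element works, while adding one throws UnsupportedOperationException.

```java
import java.util.Arrays;
import java.util.List;

class FixedSizeListDemo {
    // Returns true if the list accepted one more element, false if it refused.
    static boolean canGrow(List<String> list) {
        try {
            list.add("extra");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> fixed = Arrays.asList("a", "b", "c");
        fixed.set(0, "z");                  // replacing an element is allowed
        System.out.println(fixed);          // prints [z, b, c]
        System.out.println(canGrow(fixed)); // prints false
    }
}
```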
Worth reading about this change in Java 8: In Java 8, why is the default capacity of ArrayList now zero?
In short, providing an initial capacity won't really change anything in terms of size.
You can always add elements to a list. However, the underlying array used by the ArrayList is initialized with either the default capacity of 10 or the capacity you specify when creating the ArrayList. This means that if you add, say, the 11th element, the array has to grow, which is done by copying its contents to a new, bigger array instance. This of course takes time depending on the size of the list/array. So if you already know that your list will hold thousands of elements, it is faster to initialize the list with that approximate capacity.
ArrayLists in Java are auto-growable, and will resize themselves if they need to in order to add additional elements. The size parameter in the constructor is just used for the initial size of the internal array, and is a sort of optimization for when you know exactly what you're going to use the array for.
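A short sketch to make this concrete, with illustrative names: both lists accept an 11th element and end up equal, so the constructor argument only affects internal allocation, not observable behaviour.

```java
import java.util.ArrayList;
import java.util.List;

class CapacityVsSizeDemo {
    // Adds count elements to the given list and returns it.
    static List<String> fill(List<String> list, int count) {
        for (int i = 0; i < count; i++) {
            list.add("item" + i);
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> implicit = fill(new ArrayList<>(), 11);   // default capacity
        List<String> explicit = fill(new ArrayList<>(10), 11); // capacity hint of 10
        System.out.println(implicit.size());           // prints 11
        System.out.println(explicit.size());           // prints 11
        System.out.println(implicit.equals(explicit)); // prints true
    }
}
```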
Specifying this initial capacity is often a premature optimization, but if you really need an ArrayList of 10 elements, you should specify it explicitly, not assume that the default size is 10. Although this really used to be the default behavior (up to JDK 7, IIRC), you should not rely on it - JDK 8 (checked with java-1.8.0-openjdk-1.8.0.101-1.b14.fc24.x86_64 I have installed) creates empty ArrayLists by default.
The other answers have explained really well, but just to keep things relevant, in JDK 1.7.0_95:
/**
 * Constructs a new {@code ArrayList} instance with zero initial capacity.
 */
public ArrayList() {
    array = EmptyArray.OBJECT;
}
/**
 * Constructs a new instance of {@code ArrayList} with the specified
 * initial capacity.
 *
 * @param capacity
 *            the initial capacity of this {@code ArrayList}.
 */
public ArrayList(int capacity) {
    if (capacity < 0) {
        throw new IllegalArgumentException("capacity < 0: " + capacity);
    }
    array = (capacity == 0 ? EmptyArray.OBJECT : new Object[capacity]);
}
As the comment mentions, the constructor accepting no arguments initializes an ArrayList with zero initial capacity.
And even more interesting here is a variable (with a comment) that lends a lot of information on its own:
/**
 * The minimum amount by which the capacity of an ArrayList will increase.
 * This tuning parameter controls a time-space tradeoff. This value (12)
 * gives empirically good results and is arguably consistent with the
 * RI's specified default initial capacity of 10: instead of 10, we start
 * with 0 (sans allocation) and jump to 12.
 */
private static final int MIN_CAPACITY_INCREMENT = 12;
You just picked the perfect example. Both actually do the same, as new ArrayList() calls this(10) ;) Internally, that creates the backing array with length 10. The ArrayList#size method, on the other hand, just returns a size variable, which only changes when elements are added or removed. This variable is also the main reason for IndexOutOfBoundsExceptions, so the capacity won't let you access an index beyond the current size.
If you check the code of ArrayList, for example, you'll notice that the index-based method ArrayList#add(int, E) performs a range check (rangeCheckForAdd in OpenJDK). The range check only cares about the size variable and not the actual length of the array holding the data for the List.
Because of this, you still won't be able to insert data at index 5, for example. The internal length of the data array at this point will be 10, but as you didn't add anything to your List, the size variable will still be 0 and you'll get an IndexOutOfBoundsException when you try to do so.
Just try calling list.size() after initializing the List with any capacity, and you'll notice the returned size is 0.
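The point about size versus capacity can be checked with a sketch like this one (class and method names are illustrative): a list created with capacity 10 still has size 0, and inserting at index 5 fails.

```java
import java.util.ArrayList;
import java.util.List;

class SizeIsNotCapacityDemo {
    // Returns true if inserting at index 5 succeeded, false if it was out of bounds.
    static boolean insertAtIndexFive(List<String> list) {
        try {
            list.add(5, "x");
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(10); // capacity hint, not a size
        System.out.println(list.size());             // prints 0
        System.out.println(insertAtIndexFive(list)); // prints false
    }
}
```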
The initialization of ArrayList has been optimized since JDK 1.7 update 40, and there's a good explanation of the two different behaviours at this link:
java-optimization-empty-arraylist-and-Hashmap-cost-less-memory-jdk-17040-update.
So before Java 1.7u40 there was no difference, but from that version on there is a quite substantial difference.
This difference is about performance optimization and doesn't change the contract of List.add(E e) and ArrayList(int initialCapacity).

Java streams: one liner to perform multiple numeric operations on Int stream

I have been reading about Java 8, but have not tried it out yet. So, I'm attempting to do simple math operations using it.
I'm trying to find the average, sum, max, and minimum of a list using new techniques from Java 8. I want to print out all the numbers in the collection first, and then print out average, sum, min, and max.
Here is what I have:
List<Integer> list = Arrays.asList(5,6,2,12,7,9,15,18,-1,-8);
OptionalDouble average = list.stream().mapToInt(num -> num).average();
int sum = list.stream().mapToInt(num -> num).sum();
//sort the numbers, so that min and max are found in array
int[] arr = list.stream().mapToInt(a -> {System.out.println(a); return a;}).sorted().toArray();
System.out.println("Average: " + average.getAsDouble());
System.out.println("Sum: " + sum);
System.out.println("The minimum is: " + arr[0] + ", and maximum: " + arr[arr.length-1]);
Is there a way to do this with one stream initiation instead of making 3 different streams? Is there a way to perform multiple, parallel, operations on one single stream source?
Also, what if I was to get the numbers I'm performing operations on from the terminal/console, instead of it already being in a collection. If I recall correctly, Java 8 In Action describes one of the differences between Collections and Streams as Collections already having all their elements stored, while Streams continuously getting their data from the source, element by element. So, this is similar to a user providing numbers via the console, one by one. So, my second question is, is it possible to make the data source for a stream the System.in instead of having to make a List first from the user input, and then converting that List to a stream?
Once you have the IntStream, call summaryStatistics() to get back an IntSummaryStatistics object that holds the count, sum, min, max, and average.
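Applied to the list from the question, that could look like the following sketch (the class and method names are illustrative). A single pass over the stream yields all five values.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;

class SummaryStatisticsDemo {
    static IntSummaryStatistics summarize(List<Integer> numbers) {
        return numbers.stream()
                .mapToInt(Integer::intValue)
                .summaryStatistics(); // count, sum, min, max and average in one pass
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(5, 6, 2, 12, 7, 9, 15, 18, -1, -8);
        list.forEach(System.out::println);
        IntSummaryStatistics stats = summarize(list);
        System.out.println("Average: " + stats.getAverage()); // Average: 6.5
        System.out.println("Sum: " + stats.getSum());         // Sum: 65
        System.out.println("The minimum is: " + stats.getMin()
                + ", and maximum: " + stats.getMax());        // -8 and 18
    }
}
```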
Generally, you can call collect on the IntStream to perform your own customized calculations on the stream values. Pass in a Supplier that supplies the initial state of the calculations (e.g. sum is 0). Pass in an ObjIntConsumer that processes the current value into the state of the calculations (e.g. a value is added to the sum). Pass in a BiConsumer that merges the results of two separate calculations (used in parallel calculations) (e.g. two sums are added together and stored in the first sum).
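A minimal sketch of the three arguments, assuming a one-element int array as the mutable state and a sum of squares as the custom calculation (both are arbitrary choices made for illustration):

```java
import java.util.stream.IntStream;

class IntCollectDemo {
    static int sumOfSquares(IntStream values) {
        int[] state = values.collect(
                () -> new int[1],                        // Supplier: initial state, sum is 0
                (acc, value) -> acc[0] += value * value, // ObjIntConsumer: fold one value into the state
                (left, right) -> left[0] += right[0]);   // BiConsumer: merge two partial states
        return state[0];
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(IntStream.of(1, 2, 3))); // prints 14
    }
}
```

The merge step is only used when the stream runs in parallel, but it has to be supplied either way.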
I know of no built-in way of converting an input stream to a java.util.stream.Stream. The most straightforward way is to do as you already suggest: read from the input, store the values in a List, then process it with a Stream. This certainly works, but it is like a "full barrier", because the whole contents must be in memory at once before further processing can take place.
If I were to create something that would convert input from an InputStream to a java.util.stream.Stream, I would have some kind of Reader or Scanner inside of a custom implementation of Spliterator.OfInt, which would read and parse the int values on demand. Then you could pass an instance of this custom Spliterator.OfInt to StreamSupport.intStream to create an IntStream.
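One way this idea might be sketched is shown below. It assumes whitespace-separated integers and stops at the first token that is not an int; the class name ScannerIntStream and the method ints are made up. It builds on Spliterators.AbstractIntSpliterator so that only tryAdvance has to be written.

```java
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.IntConsumer;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

class ScannerIntStream {
    static IntStream ints(Scanner scanner) {
        Spliterator.OfInt spliterator = new Spliterators.AbstractIntSpliterator(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) { // size unknown up front
            @Override
            public boolean tryAdvance(IntConsumer action) {
                if (!scanner.hasNextInt()) {
                    return false; // end of input or a non-integer token
                }
                action.accept(scanner.nextInt()); // parse one value on demand
                return true;
            }
        };
        return StreamSupport.intStream(spliterator, false); // sequential stream
    }

    public static void main(String[] args) {
        // A String source keeps the demo self-contained; new Scanner(System.in)
        // would read from the console instead.
        System.out.println(ints(new Scanner("5 6 2 12")).sum()); // prints 25
    }
}
```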

How to sum a list of integers with java streams?

I want to sum a list of Integers. It works as follows, but the syntax does not feel right. Could the code be optimized?
Map<String, Integer> integers;
integers.values().stream().mapToInt(i -> i).sum();
This will work, but the i -> i is doing some automatic unboxing which is why it "feels" strange. mapToInt converts the stream to an IntStream "of primitive int-valued elements". Either of the following will work and better explain what the compiler is doing under the hood with your original syntax:
integers.values().stream().mapToInt(i -> i.intValue()).sum();
integers.values().stream().mapToInt(Integer::intValue).sum();
I suggest 2 more options:
integers.values().stream().mapToInt(Integer::intValue).sum();
integers.values().stream().collect(Collectors.summingInt(Integer::intValue));
The second one uses Collectors.summingInt() collector, there is also a summingLong() collector which you would use with mapToLong.
And a third option: Java 8 introduces a very effective LongAdder accumulator designed to speed up summation in parallel streams and multi-threaded environments. Here's an example:
LongAdder a = new LongAdder();
map.values().parallelStream().forEach(a::add);
int sum = a.intValue();
From the docs
Reduction operations
A reduction operation (also called a fold) takes a sequence of input elements and combines them into a single summary result by repeated application of a combining operation, such as finding the sum or maximum of a set of numbers, or accumulating elements into a list. The streams classes have multiple forms of general reduction operations, called reduce() and collect(), as well as multiple specialized reduction forms such as sum(), max(), or count().
Of course, such operations can be readily implemented as simple sequential loops, as in:
int sum = 0;
for (int x : numbers) {
    sum += x;
}
However, there are good reasons to prefer a reduce operation over a mutative accumulation such as the above. Not only is a reduction "more abstract" -- it operates on the stream as a whole rather than individual elements -- but a properly constructed reduce operation is inherently parallelizable, so long as the function(s) used to process the elements are associative and stateless. For example, given a stream of numbers for which we want to find the sum, we can write:
int sum = numbers.stream().reduce(0, (x,y) -> x+y);
or:
int sum = numbers.stream().reduce(0, Integer::sum);
These reduction operations can run safely in parallel with almost no modification:
int sum = numbers.parallelStream().reduce(0, Integer::sum);
So, for a map you would use:
integers.values().stream().mapToInt(i -> i).reduce(0, (x,y) -> x+y);
Or:
integers.values().stream().reduce(0, Integer::sum);
You can use reduce method:
long sum = result.stream().map(e -> e.getCreditAmount()).reduce(0L, (x, y) -> x + y);
or
long sum = result.stream().map(e -> e.getCreditAmount()).reduce(0L, Long::sum);
You can use reduce() to sum a list of integers.
int sum = integers.values().stream().reduce(0, Integer::sum);
You can use the collect method to sum a list of integers.
List<Integer> list = Arrays.asList(2, 4, 5, 6);
int sum = list.stream().collect(Collectors.summingInt(Integer::intValue));
I have declared a list of Integers.
ArrayList<Integer> numberList = new ArrayList<Integer>(Arrays.asList(1, 2, 3, 4, 5));
You can try using these different ways below.
Using mapToInt
int sum = numberList.stream().mapToInt(Integer::intValue).sum();
Using summarizingInt
long sum = numberList.stream().collect(Collectors.summarizingInt(Integer::intValue)).getSum(); // getSum() returns a long
Using reduce
int sum = numberList.stream().reduce(Integer::sum).get().intValue();
This may help those who have objects in the list.
If you have a list of objects and want to sum a specific field of these objects, use the following.
List<ResultSom> somList = MyUtil.getResultSom();
BigDecimal result = somList.stream()
        .map(ResultSom::getNetto)
        .reduce(BigDecimal.ZERO, BigDecimal::add);
This would be the shortest way to sum up int type array (for long array LongStream, for double array DoubleStream and so forth). Not all the primitive integer or floating point types have the Stream implementation though.
IntStream.of(integers).sum();
Unfortunately looks like the Stream API only returns normal streams from, say, List<Integer>#stream(). Guess they're pretty much forced to because of how generics work.
These normal Streams are of generic objects so don't have specialized methods like sum() etc. so you have to use the weird re-stream "looks like a no-op" conversion by default to get to those methods... .mapToInt(i -> i).
Another option is using "Eclipse Collections" which are like an expanded java Stream API
IntLists.immutable.ofAll(integers.values()).sum();
There is one more option no one has considered here, and it takes advantage of multi-core environments. If you want to benefit from them, the following code should be used instead of the other solutions mentioned:
int sum = integers.values().parallelStream()
        .reduce(0, Integer::sum, Integer::sum);
This solution is similar to the others, but note the third argument to reduce. It tells the stream how to combine the partial sums calculated in different chunks of the stream by different threads. Also, parallelStream() is used instead of stream(). In this case the combiner simply adds the partial sums up. Another option for the third argument is (i, j) -> i + j, which adds the value of a stream chunk (j) to the current value (i) and uses it as the current value for the next stream chunk until all partial results are processed.
Even when using plain stream(), it is useful to tell reduce what to do with the stream chunks' partial sums, just in case someone, or you, would like to parallelize it in the future. Initial development is the best time for that, since later on you would need to remember what this is supposed to do and spend some time understanding the purpose of that code again.
And of course, instead of a method reference you can use a lambda. I prefer it this way, as it is more compact and still easy to read.
Also remember that this can be used for more complex calculations too, but always be aware that there are no guarantees about the order in which stream elements are processed or how they are distributed across threads.
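The role of the third argument becomes clearer when the result type differs from the element type. The sketch below (illustrative names, with word lengths picked as an arbitrary example) sums the lengths of strings: the accumulator folds a String into an Integer partial result, and the combiner merges two Integer partial results.

```java
import java.util.Arrays;
import java.util.List;

class CombinerDemo {
    static int totalLength(List<String> words) {
        return words.parallelStream()
                .reduce(0,
                        (partial, word) -> partial + word.length(), // accumulator: Integer + String -> Integer
                        Integer::sum);                              // combiner: merges two partial sums
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Arrays.asList("a", "bb", "ccc"))); // prints 6
    }
}
```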
IntStream.of(1, 2, 23).sum();
IntStream.of(1, 2, 23,1, 2, 23,1, 2, 23).max().getAsInt();

What is the Java equivalent to Python's reduce function?

Similar questions have been asked, here and here, but given the advent of Java 8, and the generally outdated nature of these questions I'm wondering if now there'd be something at least kindred to it?
This is what I'm referring to.
You can use a lambda and Stream.reduce; there is a page in the docs dedicated to reductions:
Integer totalAgeReduce = roster
        .stream()
        .map(Person::getAge)
        .reduce(
                0,
                (a, b) -> a + b);
This is the example used in the Python docs implemented with Java 8 streams:
List<Integer> numbers = Arrays.asList(new Integer[] { 1, 2, 3, 4, 5 });
Optional<Integer> sum = numbers.stream().reduce((a, b) -> a + b);
System.out.println(sum.get());
The Stream.reduce Method
The Stream.reduce method is a general-purpose reduction operation. Consider the following pipeline, which calculates the sum of the male members' ages in the collection roster. It uses the Stream.sum reduction operation:
Integer totalAge = roster
        .stream()
        .mapToInt(Person::getAge)
        .sum();
Compare this with the following pipeline, which uses the Stream.reduce operation to calculate the same value:
Integer totalAgeReduce = roster
        .stream()
        .map(Person::getAge)
        .reduce(
                0,
                (a, b) -> a + b);
The reduce operation in this example takes two arguments:
identity: The identity element is both the initial value of the reduction and the default result if there are no elements in the stream. In this example, the identity element is 0; this is the initial value of the sum of ages and the default value if no members exist in the collection roster.
accumulator: The accumulator function takes two parameters: a partial result of the reduction (in this example, the sum of all processed integers so far) and the next element of the stream (in this example, an integer). It returns a new partial result. In this example, the accumulator function is a lambda expression that adds two Integer values and returns an Integer value:
(a, b) -> a + b
The reduce operation always returns a new value. However, the accumulator function also returns a new value every time it processes an element of a stream. Suppose that you want to reduce the elements of a stream to a more complex object, such as a collection. This might hinder the performance of your application. If your reduce operation involves adding elements to a collection, then every time your accumulator function processes an element, it creates a new collection that includes the element, which is inefficient. It would be more efficient for you to update an existing collection instead. You can do this with the Stream.collect method, which the next section describes.
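To illustrate the difference described above, here is a sketch with made-up names that builds the same list twice: once with reduce, copying the list for every element, and once with Stream.collect, which mutates a single container per chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CollectVsReduceDemo {
    // Inefficient: every accumulator call creates a new list.
    static List<Integer> viaReduce(List<Integer> ages) {
        List<Integer> empty = Collections.emptyList();
        return ages.stream().reduce(
                empty,
                (list, age) -> {
                    List<Integer> copy = new ArrayList<>(list);
                    copy.add(age);
                    return copy;
                },
                (left, right) -> {
                    List<Integer> merged = new ArrayList<>(left);
                    merged.addAll(right);
                    return merged;
                });
    }

    // Preferred: collect updates an existing container in place.
    static List<Integer> viaCollect(List<Integer> ages) {
        return ages.stream().collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(31, 25, 40);
        System.out.println(viaReduce(ages));  // prints [31, 25, 40]
        System.out.println(viaCollect(ages)); // prints [31, 25, 40]
    }
}
```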
The official Oracle tutorial describes how Stream.reduce works. Please have a look; I believe it will answer your query.
