I want to convert this while loop to equivalent code using Java 8 Streams, but I don't know how to both stream the List and remove elements from it.
private List<String> nameList = new ArrayList<>();
while (nameList.size() > 0) {
    String nameListFirstEntry = nameList.get(0);
    nameList.remove(0);
    setNameCombinations(nameListFirstEntry);
}
I guess this will do
nameList.forEach(this::setNameCombinations);
nameList.clear();
If you don't need the original list anymore, you might as well create a new empty list instead of calling clear().
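A minimal runnable sketch of the forEach/clear() approach and the new-list variant. The class name, the sample names, and the body of setNameCombinations (it just records each name) are assumptions for illustration; the question does not show what setNameCombinations really does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ForEachClearDemo {
    // Hypothetical stand-in: records every name passed to setNameCombinations.
    private final List<String> processed = new ArrayList<>();
    private List<String> nameList = new ArrayList<>(Arrays.asList("Ann", "Bob", "Cy"));

    private void setNameCombinations(String name) {
        processed.add(name);
    }

    // Visit every element in list order, then empty the same list object.
    public List<String> runWithClear() {
        nameList.forEach(this::setNameCombinations);
        nameList.clear();
        return processed;
    }

    // Same traversal, but replace the list with a fresh empty one instead of clearing it.
    public List<String> runWithNewList() {
        nameList.forEach(this::setNameCombinations);
        nameList = new ArrayList<>();
        return processed;
    }

    public boolean isNameListEmpty() {
        return nameList.isEmpty();
    }

    public static void main(String[] args) {
        ForEachClearDemo demo = new ForEachClearDemo();
        System.out.println(demo.runWithClear());    // [Ann, Bob, Cy]
        System.out.println(demo.isNameListEmpty()); // true
    }
}
```

The difference between the two variants: clear() empties the same list object, so any other code holding a reference to it also sees an empty list, while assigning a new list leaves the old object untouched for anyone who still references it.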
Because List#remove(int) also returns the element, you can both stream the list's elements and remove them via a stream:
Stream.generate(() -> nameList.remove(0))
.limit(nameList.size())
.forEach(this::setNameCombinations);
This code doesn't break any "rules". From the javadoc of Stream#generate():
Returns an infinite sequential unordered stream where each element is generated by the provided Supplier. This is suitable for generating constant streams, streams of random elements, etc.
There is no mention of any restriction on how the supplier is implemented, or that it must be free of side effects. The Supplier's only contract is to supply.
For those who doubt this works, here's some test code using 100K elements showing that order is indeed preserved:
int size = 100000;
List<Integer> list0 = new ArrayList<>(size); // the reference list
IntStream.range(0, size).boxed().forEach(list0::add);
List<Integer> list1 = new ArrayList<>(list0); // will feed stream
List<Integer> list2 = new ArrayList<>(size); // will consume stream
Stream.generate(() -> list1.remove(0))
.limit(list1.size())
.forEach(list2::add);
System.out.println(list0.equals(list2)); // always true
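The same idiom wrapped in a small generic helper, as a sketch; drain is a hypothetical name, not a JDK method. The argument to limit is evaluated once, while the pipeline is being built and before any element is removed, which is why the count stays correct even though the list shrinks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class DrainDemo {
    // Hypothetical helper around the Stream.generate idiom.
    // list.size() is read once, when limit() is called during pipeline construction.
    public static <T> void drain(List<T> list, Consumer<? super T> action) {
        Stream.generate(() -> list.remove(0))
              .limit(list.size())
              .forEach(action);
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(Arrays.asList("a", "b", "c"));
        List<String> sink = new ArrayList<>();
        drain(source, sink::add);
        System.out.println(sink);   // [a, b, c]
        System.out.println(source); // []
    }
}
```

ArrayList.remove(0) shifts every remaining element one slot to the left, so draining n elements this way performs O(n²) element moves in total; for large lists, forEach followed by clear() avoids that cost.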
There is a problem on my server: it became a bottleneck because of one specific task, flattening a List<List<SomeObject>> into a List<SomeObject>. The server's CPU usage spiked well above normal.
The data structure is:
Object:
List<SomeObject> childList;
I'm trying to flatten a List<Object> into a List<SomeObject> in the most computationally efficient way.
With parentList being a List<Object>, I tried:
parentList.stream().flatMap(child -> child.getChildList().stream()).collect(Collectors.toList())
Also tried:
List<SomeObject> all = new ArrayList<>();
parentList.forEach(child -> all.addAll(child.getChildList()));
Any other suggestions? Both seem to cost about the same, and both are expensive because of the copying under the hood.
This may be more efficient, since it avoids creating a new stream for every sublist the way flatMap does. mapMulti was introduced in Java 16. It takes each streamed element together with a Consumer, and everything passed to that Consumer is placed on the resulting stream; in this case, that is each element of each list.
List<List<Object>> lists = new ArrayList<>(
List.of(List.of("1", "2", "3"),
List.of("4", "5", "6", "7"),
List.of("8", "9")));
List<Object> list = lists.stream().mapMulti(
(lst, consumer) -> lst.forEach(consumer))
.toList();
System.out.print(list);
prints
[1, 2, 3, 4, 5, 6, 7, 8, 9]
Do we know more about which List implementation is used?
I would try to initialize the resulting list with the expected total size. This avoids unnecessary copying each time the ArrayList's backing array has to grow. It assumes that the size of each child list can be retrieved quickly.
int expectedSize = parentList.stream()
.mapToInt(entry -> entry.getChildList().size())
.sum();
List<SomeObject> result = new ArrayList<>(expectedSize);
for (var entry : parentList) {
result.addAll(entry.getChildList());
}
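A self-contained version of the pre-sized approach, as a sketch. Parent stands in for the question's Object type and String for SomeObject; both substitutions are assumptions, since the real classes are not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PresizedFlatten {
    // Hypothetical stand-in for the question's "Object" type that owns a child list.
    public static class Parent {
        private final List<String> childList;

        public Parent(List<String> childList) {
            this.childList = childList;
        }

        public List<String> getChildList() {
            return childList;
        }
    }

    public static List<String> flatten(List<Parent> parentList) {
        // First pass: total number of children (size() is O(1) for ArrayList).
        int expectedSize = parentList.stream()
                .mapToInt(p -> p.getChildList().size())
                .sum();
        // Allocate the backing array once, so the addAll calls never need to grow it.
        List<String> result = new ArrayList<>(expectedSize);
        for (Parent p : parentList) {
            result.addAll(p.getChildList());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Parent> parents = Arrays.asList(
                new Parent(Arrays.asList("a", "b")),
                new Parent(Collections.singletonList("c")),
                new Parent(Collections.emptyList()));
        System.out.println(flatten(parents)); // [a, b, c]
    }
}
```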
In Java 8:
List<Object> listOne = new ArrayList<>();
List<Object> listTwo = new ArrayList<>();
List<Object> listThree = new ArrayList<>();
...
Stream.of(...) can concatenate many lists:
List<Object> newList = Stream.of(listOne,listTwo,listThree).flatMap(Collection::stream).collect(Collectors.toList());
In Java 16+ (note that Stream.concat takes streams, not lists):
List<Object> newList = Stream.concat(Stream.concat(listOne.stream(), listTwo.stream()), listThree.stream()).toList();
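Both variants as a runnable sketch; the class and method names are illustrative. Both methods collect with Collectors.toList() so the sketch also compiles on Java 8; on Java 16+ you can end the concat version with .toList() instead, which returns an unmodifiable list.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConcatLists {
    // Java 8: Stream.of(...) yields a Stream<List<Object>>; flatMap turns it into a Stream<Object>.
    public static List<Object> withFlatMap(List<Object> one, List<Object> two, List<Object> three) {
        return Stream.of(one, two, three)
                .flatMap(Collection::stream)
                .collect(Collectors.toList());
    }

    // Stream.concat joins exactly two streams, so three lists need two nested calls.
    public static List<Object> withConcat(List<Object> one, List<Object> two, List<Object> three) {
        return Stream.concat(Stream.concat(one.stream(), two.stream()), three.stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Object> listOne = Arrays.asList(1, 2);
        List<Object> listTwo = Arrays.asList(3);
        List<Object> listThree = Arrays.asList(4, 5);
        System.out.println(withFlatMap(listOne, listTwo, listThree)); // [1, 2, 3, 4, 5]
        System.out.println(withConcat(listOne, listTwo, listThree));  // [1, 2, 3, 4, 5]
    }
}
```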
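Streams can be thought of as an ETL (“Extract, Transform and Load”) pipeline over a collection of data. Note, however, that a sequential stream (the default) runs every stage on the calling thread; only a parallel stream splits the work across multiple threads.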
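To see the threading behavior for yourself, this sketch records which threads execute a stream stage. The class and method names are illustrative, and the side effect inside map is there only for the demonstration. In the sequential case all work happens on the calling thread; in the parallel case the number of threads depends on the machine and the common ForkJoinPool, so it is not guaranteed.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

public class StreamThreads {
    // Returns the names of all threads that executed the map stage.
    public static Set<String> threadsUsed(boolean parallel) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        IntStream stream = IntStream.range(0, 10_000);
        if (parallel) {
            stream = stream.parallel();
        }
        // The side effect in map() is for demonstration only.
        stream.map(i -> {
            names.add(Thread.currentThread().getName());
            return i * 2;
        }).sum();
        return names;
    }

    public static void main(String[] args) {
        System.out.println(threadsUsed(false)); // only the calling thread, e.g. [main]
        System.out.println(threadsUsed(true));  // typically also ForkJoinPool.commonPool workers
    }
}
```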
One way to make the flat mapping more computationally efficient is to use a plain for loop instead of the Stream API or the forEach method. The loop iterates over the parent list and, for each element, adds all elements of its child list to the flat list. This avoids the overhead of creating streams and using the collect method. Additionally, storing the flat list in an ArrayList rather than a LinkedList can improve performance, since ArrayList.addAll copies the incoming elements into its backing array in bulk instead of allocating one node per element.
List<SomeObject> flatList = new ArrayList<>();
for (Object o : parentList) {
    flatList.addAll(o.getChildList());
}
Another way is to use an explicit Iterator. Iterator is the interface for traversing a collection; note that the enhanced for loop above already compiles down to iterator(), hasNext() and next() calls, so this version is equivalent rather than faster.
List<SomeObject> flatList = new ArrayList<>();
Iterator<Object> iterator = parentList.iterator();
while (iterator.hasNext()) {
    Object o = iterator.next();
    flatList.addAll(o.getChildList());
}
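A runnable sketch comparing the two loops. To keep it self-contained it flattens a List<List<T>> instead of calling getChildList() on the question's Object type; that simplification is an assumption. Both methods produce the same list, since both walk the parent list through its Iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LoopFlatten {
    // Enhanced for loop: the compiler generates iterator()/hasNext()/next() calls.
    public static <T> List<T> withForEach(List<List<T>> parentList) {
        List<T> flatList = new ArrayList<>();
        for (List<T> child : parentList) {
            flatList.addAll(child);
        }
        return flatList;
    }

    // The same traversal written with an explicit Iterator.
    public static <T> List<T> withIterator(List<List<T>> parentList) {
        List<T> flatList = new ArrayList<>();
        Iterator<List<T>> iterator = parentList.iterator();
        while (iterator.hasNext()) {
            flatList.addAll(iterator.next());
        }
        return flatList;
    }

    public static void main(String[] args) {
        List<List<Integer>> parents = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5));
        System.out.println(withForEach(parents));  // [1, 2, 3, 4, 5]
        System.out.println(withIterator(parents)); // [1, 2, 3, 4, 5]
    }
}
```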
You might also look for a concat method, but the List interface has none, so a loop calling flatList.concat(o.getChildList()) does not compile. addAll, as in the loops above, is the List method that appends another collection; Stream.concat exists, but it joins two streams rather than two lists.
There are several resources that you can use for additional reading on this topic. Here are a few that I would recommend:
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/List.html
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/ArrayList.html
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/Iterator.html
https://www.oreilly.com/library/view/java-performance-the/9781449358652/
https://www.tutorialspoint.com/java_data_structure_algorithms/index.htm
There are two test cases which use parallelStream():
List<Integer> src = new ArrayList<>();
for (int i = 0; i < 20000; i++) {
src.add(i);
}
List<String> strings = new ArrayList<>();
src.parallelStream().filter(integer -> (integer % 2) == 0).forEach(integer -> strings.add(integer + ""));
System.out.println("=size=>" + strings.size());
=size=>9332
List<Integer> src = new ArrayList<>();
for (int i = 0; i < 20000; i++) {
src.add(i);
}
List<String> strings = new ArrayList<>();
src.parallelStream().forEach(integer -> strings.add(integer + ""));
System.out.println("=size=>" + strings.size());
=size=>17908
Why do I always lose data when using parallelStream()?
What did I do wrong?
ArrayList isn't thread-safe. You need to use
List<String> strings = Collections.synchronizedList(new ArrayList<>());
or
List<String> strings = new Vector<>();
to ensure all updates are synchronized, or switch to
List<String> strings = src.parallelStream()
.filter(integer -> (integer % 2) == 0)
.map(integer -> integer + "")
.collect(Collectors.toList());
and leave the list building to the Streams framework. Note that Collectors.toList() makes no guarantees about the type or mutability of the list it returns, so if you need a modifiable list, collect with Collectors.toCollection(ArrayList::new) instead.
In terms of performance, Stream.collect is likely to be much faster than using Stream.forEach to add to a synchronized collection, since the Streams framework can handle collection of values in each thread separately without synchronization and combine the results at the end in a thread safe fashion.
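A sketch of both fixes applied to the first test case from the question, with IntStream.range substituted for the src list (an assumption for brevity; the class and method names are illustrative). Collectors.toCollection(ArrayList::new) is used so the collected list is guaranteed to be a mutable ArrayList.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelCollect {
    // Fix 1: every add() goes through the wrapper's lock, so no update is lost.
    public static List<String> viaSynchronizedList(int n) {
        List<String> strings = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).parallel()
                .filter(i -> i % 2 == 0)
                .forEach(i -> strings.add(i + ""));
        return strings;
    }

    // Fix 2: collect builds partial lists per thread and merges them in encounter order.
    // toCollection(ArrayList::new) guarantees a mutable ArrayList.
    public static List<String> viaCollect(int n) {
        return IntStream.range(0, n).parallel()
                .filter(i -> i % 2 == 0)
                .mapToObj(i -> i + "")
                .collect(Collectors.toCollection(ArrayList::new));
    }

    public static void main(String[] args) {
        System.out.println("=size=>" + viaSynchronizedList(20000).size()); // =size=>10000
        System.out.println("=size=>" + viaCollect(20000).size());          // =size=>10000
    }
}
```

Only the collect version keeps the encounter order (0, 2, 4, ...); the synchronized list receives elements in whatever order the worker threads add them.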
ArrayList isn't thread-safe. While one thread sees a list with 30 elements, another might still see 29 and overwrite the 30th position, losing one element.
Another issue arises when the array backing the list needs to be resized. A new, larger array is allocated (about 1.5 times the old capacity in OpenJDK) and the elements of the original array are copied into it. Other threads may add elements that the resizing thread never sees, or several threads may resize at the same time and only one of the new arrays survives.
When using multiple threads, you need to either synchronize access to the list or use a thread-safe list (for example, by wrapping it with Collections.synchronizedList or by using a CopyOnWriteArrayList, to mention two possible solutions). Even better is to use the collect method on the stream to put everything into a list.
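The two thread-safe alternatives mentioned above, as a sketch; the class name, method names and sizes are illustrative. The synchronized block makes every add take the same lock, while CopyOnWriteArrayList copies its whole backing array on each write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.IntStream;

public class ThreadSafeAdds {
    // Explicit synchronization: every thread takes the same lock before touching the list.
    public static List<Integer> withSynchronizedBlock(int n) {
        List<Integer> list = new ArrayList<>();
        Object lock = new Object();
        IntStream.range(0, n).parallel().forEach(i -> {
            synchronized (lock) {
                list.add(i);
            }
        });
        return list;
    }

    // CopyOnWriteArrayList: thread-safe, but each add copies the whole backing array,
    // so it suits small or read-mostly lists rather than bulk loading.
    public static List<Integer> withCopyOnWrite(int n) {
        List<Integer> list = new CopyOnWriteArrayList<>();
        IntStream.range(0, n).parallel().forEach(i -> list.add(i));
        return list;
    }

    public static void main(String[] args) {
        System.out.println(withSynchronizedBlock(20000).size()); // 20000
        System.out.println(withCopyOnWrite(2000).size());        // 2000
    }
}
```

Both fixes serialize the writes, so they mainly buy correctness rather than speed; collect remains the better fit for building a list from a stream.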
parallelStream() combined with forEach is a dangerous combination if not used carefully.
Keep the following points in mind to avoid bugs:
If you have a pre-existing list to which you want to add objects from a parallelStream() loop, wrap that list with Collections.synchronizedList before running the parallel stream, and add only through the wrapper.
If you have to create a new list, you can initialize it as a Vector outside the loop.
or
If you have to create a new list, simply use parallelStream() and collect the output at the end.
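A sketch of the first two points; the class name, method names and sample contents are made up for illustration. Note that Collections.synchronizedList only synchronizes individual method calls: its documentation requires you to synchronize on the wrapper manually while iterating over it, which the sum method below does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Vector;
import java.util.stream.IntStream;

public class ExistingListDemo {
    // Pre-existing list: wrap it before the parallel loop and add only through the wrapper.
    public static List<Integer> appendToExisting(List<Integer> existing, int n) {
        List<Integer> safe = Collections.synchronizedList(existing);
        IntStream.range(0, n).parallel().forEach(i -> safe.add(i));
        return safe; // the wrapper writes through to the original list
    }

    // Iterating over a synchronizedList wrapper must be done while holding its lock.
    public static long sum(List<Integer> safe) {
        long total = 0;
        synchronized (safe) {
            for (int value : safe) {
                total += value;
            }
        }
        return total;
    }

    // New list: Vector synchronizes each of its methods, so concurrent add() calls are safe.
    public static List<Integer> intoVector(int n) {
        List<Integer> list = new Vector<>();
        IntStream.range(0, n).parallel().forEach(i -> list.add(i));
        return list;
    }

    public static void main(String[] args) {
        List<Integer> existing = new ArrayList<>(Arrays.asList(-3, -2, -1));
        List<Integer> safe = appendToExisting(existing, 10000);
        System.out.println(safe.size());              // 10003
        System.out.println(existing.size());          // 10003
        System.out.println(intoVector(10000).size()); // 10000
    }
}
```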
You lose the benefits of using stream (and parallel stream) when you try to do mutation. As a general rule, avoid mutation when using streams. Venkat Subramaniam explains why. Instead, use collectors. Also try to get a lot accomplished within the stream chain. For example:
System.out.println(
IntStream.range(0, 200000)
.filter(i -> i % 2 == 0)
.mapToObj(String::valueOf)
.collect(Collectors.toList()).size()
);
You can run that as a parallel stream by adding .parallel() to the pipeline, for example right after IntStream.range(0, 200000).
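The stream above with .parallel() added, wrapped in a method so both modes can be compared (the class and method names are illustrative). A stream pipeline is sequential or parallel as a whole: the most recent parallel() or sequential() call before the terminal operation decides the mode for the entire pipeline.

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class EvenCount {
    // Same pipeline as above; parallel() switches the whole pipeline to parallel mode.
    public static int evenCount(boolean parallel) {
        IntStream numbers = IntStream.range(0, 200000);
        if (parallel) {
            numbers = numbers.parallel();
        }
        return numbers
                .filter(i -> i % 2 == 0)
                .mapToObj(String::valueOf)
                .collect(Collectors.toList())
                .size();
    }

    public static void main(String[] args) {
        System.out.println(evenCount(false)); // 100000
        System.out.println(evenCount(true));  // 100000
    }
}
```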
I created the Set below:
Set<String> set = new HashSet<>();
set.add("Test1,Test2");
set.add("Test3,Test4");
I need to convert this Set to a List by splitting each element on the comma.
Final List should contain four elements, i.e.
Test1, Test2, Test3, Test4
Please clarify how to convert the Set to a List using Java 8.
I tried like this, but it returns a List of List of String, instead of a List of String.
set.stream().map(x-> Arrays.asList(x.split(","))).collect(Collectors.toList());
You need to use flatMap(...) to convert the list of lists of elements into a list of elements. Into flatMap(...) you need to pass a lambda or method reference that converts each element of the stream (the lists) into a stream of elements (the actual elements of the lists).
Since here your elements in the stream are lists, you can do Collection::stream but if you were to keep the array (not using Arrays.asList(...)) you could also do Arrays::stream.
A final possible solution could be:
set.stream().map(x -> x.split(",")).flatMap(Arrays::stream).collect(Collectors.toList())
Or this less efficient solution:
set.stream().map(x -> Arrays.asList(x.split(","))).flatMap(Collection::stream).collect(Collectors.toList())
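Both solutions as a runnable sketch, with the intermediate stream types spelled out in comments; the class and method names are illustrative. Because HashSet does not guarantee iteration order, only the order within each original string is fixed, so main sorts the result before printing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SplitFlatten {
    // Keep the arrays from split(): Stream<String[]> -> flatMap(Arrays::stream) -> Stream<String>.
    public static List<String> viaArrays(Set<String> set) {
        return set.stream()
                .map(x -> x.split(","))
                .flatMap(Arrays::stream)
                .collect(Collectors.toList());
    }

    // Wrap each array in a List: Stream<List<String>> -> flatMap(Collection::stream) -> Stream<String>.
    public static List<String> viaLists(Set<String> set) {
        return set.stream()
                .map(x -> Arrays.asList(x.split(",")))
                .flatMap(Collection::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList("Test1,Test2", "Test3,Test4"));
        List<String> result = new ArrayList<>(viaArrays(set));
        Collections.sort(result); // HashSet iteration order is unspecified
        System.out.println(result); // [Test1, Test2, Test3, Test4]
    }
}
```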
set.stream()
.map(i -> Arrays.asList(i.split(",")))
.flatMap(list -> list.stream())
.sorted()
.collect(Collectors.toList());
I am guessing set is a Set of Strings, since you can split the items in the lambda. String.split returns an array of Strings, and you convert it to a List with Arrays.asList. So now you have a Stream of List<String>s, which means that collecting them with toList gives you a List<List<String>>. Before collecting the items, you need to call flatMap(list -> list.stream()) so that it becomes a Stream of Strings.
Why not consider a simpler approach?
HashSet<String> mySet = new HashSet<>();
List<String> myList = new ArrayList<>();
for (String string : mySet) {
String[] strings = string.split(",");
myList.addAll(Arrays.asList(strings));
}
This is still using Java 8.
Is there a way to convert a List<Set<String>> mainList to a plain List, without iterating over elements?
For example this one has value:
mainList = {ArrayList#705} size = 2
0 = {HashSet#708} size = 3
0 = "A2"
1 = "A1"
2 = "A3"
1 = {HashSet#709} size = 3
0 = "A6"
1 = "A5"
2 = "A7"
I would like to have a new list like so:
list = A2, A1, A3, A6, A5, A7
If you only want to avoid using an Iterator explicitly, you can use a simple for-each loop (it still uses an Iterator internally; it just hides it):
List<Set<String>> hs = null; // replace null with the actual List<Set<String>>
List<String> arrayList = new ArrayList<>(); // plain list that will contain all the strings
for (Set<String> set : hs) {
    arrayList.addAll(set); // add all elements of the set; no intermediate ArrayList copy is needed
}
or with streams (Java 8 and above):
List<String> al = hs.stream().flatMap(Set::stream).collect(Collectors.toList());
But can you please explain why you don't want to use iterators?
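A runnable sketch of both approaches from this answer, using the sample values from the question; the class and method names are illustrative. The order within each HashSet is unspecified (which is why the question's debugger output lists A2 before A1), but both methods walk the same set instances in the same iteration order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class FlattenSets {
    // for-each loop: addAll takes the Set directly, no intermediate ArrayList copy.
    public static List<String> withLoop(List<Set<String>> mainList) {
        List<String> result = new ArrayList<>();
        for (Set<String> set : mainList) {
            result.addAll(set);
        }
        return result;
    }

    // Stream version: flatMap(Set::stream) turns Stream<Set<String>> into Stream<String>.
    public static List<String> withStream(List<Set<String>> mainList) {
        return mainList.stream()
                .flatMap(Set::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Set<String>> mainList = Arrays.asList(
                new HashSet<>(Arrays.asList("A1", "A2", "A3")),
                new HashSet<>(Arrays.asList("A5", "A6", "A7")));
        System.out.println(withLoop(mainList));   // all six values
        System.out.println(withStream(mainList)); // same values, same order as withLoop
    }
}
```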
You can't. Ordinarily, the only way to copy n things is to iterate over each of them.
The only way to avoid iterating over elements would be a lower-level operation like an array copy.
An ArrayList would do this (others like LinkedList would not), but the common Set implementations in the JDK, such as HashSet and TreeSet, do not copy from a backing array in toArray. They rely on AbstractCollection.toArray, or equivalent code, which internally iterates over all of the elements.
If you implemented or found an array-based Set implementation (which would almost certainly not be an optimal Set, however; the JDK's CopyOnWriteArraySet is one, and its contains check is linear), then you could flatten an ArrayList<ArraySet<String>> by using a series of array copies without iterating over the elements.
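To illustrate what "a series of array copies" means, this sketch flattens a list of plain String arrays with System.arraycopy. The arrays stand in for the backing arrays of a hypothetical array-based Set; JDK collections do not expose their backing arrays. The loops run once per inner array rather than once per element, but System.arraycopy itself still copies every element, natively.

```java
import java.util.Arrays;
import java.util.List;

public class ArrayCopyFlatten {
    // Illustration only: each String[] plays the role of the backing array
    // of a hypothetical array-based Set.
    public static String[] flatten(List<String[]> parts) {
        int total = 0;
        for (String[] part : parts) { // one step per inner array, not per element
            total += part.length;
        }
        String[] result = new String[total];
        int offset = 0;
        for (String[] part : parts) {
            // Bulk copy of a whole inner array; the per-element work happens inside the JVM.
            System.arraycopy(part, 0, result, offset, part.length);
            offset += part.length;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> parts = Arrays.asList(
                new String[] {"A2", "A1", "A3"},
                new String[] {"A6", "A5", "A7"});
        System.out.println(Arrays.toString(flatten(parts))); // [A2, A1, A3, A6, A5, A7]
    }
}
```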
If you are using Java 8 or above, you can use streams, but they will still use an internal iterator. Here is an example:
List<String> list = mainList.stream() // create stream
.flatMap(Set::stream) // convert Set<String> to Stream
.collect(Collectors.toList()); // collect to new ArrayList
I want to run this code in parallel using a Java parallel stream and update the results in two ArrayLists. The code below works fine, except that ArrayList is not thread-safe and may cause incorrect results, and I don't want to synchronize the ArrayLists. Can someone please suggest a proper way of using a parallel stream for my case?
List<Integer> passedList= new ArrayList<>();
List<Integer> failedList= new ArrayList<>();
Integer[] input = {0,1,2,3,4,5,6,7,8,9};
List<Integer> myList = Arrays.asList(input);
myList.parallelStream().forEach(element -> {
if (isSuccess(element)) {//Some SOAP API call.
passedList.add(element);
} else {
failedList.add(element);
}
});
System.out.println(passedList);
System.out.println(failedList);
An appropriate solution would be to use Collectors.partitioningBy:
Integer[] input = {0,1,2,3,4,5,6,7,8,9};
List<Integer> myList = Arrays.asList(input);
Map<Boolean, List<Integer>> map = myList.parallelStream()
.collect(Collectors.partitioningBy(element -> isSuccess(element)));
List<Integer> passedList = map.get(true);
List<Integer> failedList = map.get(false);
This way you will have no thread-safety problems, because the task is decomposed in a map-reduce manner: parts of the input are processed independently and joined afterwards. If your isSuccess method is slow, you will likely see a performance boost here.
By the way, you can create a parallel stream directly from the original array using Arrays.stream(input).parallel(), without needing to create the intermediate myList.
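The partitioningBy answer combined with the Arrays.stream(input).parallel() tip, as a runnable sketch. The real isSuccess makes a SOAP call that is not shown, so here it is a hypothetical stand-in that treats even numbers as successes; the class and method names are illustrative too.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionDemo {
    // Hypothetical stand-in for the question's SOAP call: even numbers "pass".
    static boolean isSuccess(Integer element) {
        return element % 2 == 0;
    }

    public static Map<Boolean, List<Integer>> partition(Integer[] input) {
        return Arrays.stream(input) // no intermediate List needed
                .parallel()
                .collect(Collectors.partitioningBy(PartitionDemo::isSuccess));
    }

    public static void main(String[] args) {
        Integer[] input = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        Map<Boolean, List<Integer>> map = partition(input);
        List<Integer> passedList = map.get(true);
        List<Integer> failedList = map.get(false);
        System.out.println(passedList); // [0, 2, 4, 6, 8]
        System.out.println(failedList); // [1, 3, 5, 7, 9]
    }
}
```

Because the array stream is ordered and partitioningBy with its default toList() downstream is not an unordered collector, each list keeps the input order even when the stream runs in parallel.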