I want to run this code in parallel using a Java parallel stream and collect the results in two ArrayLists. The code given below is working fine, except that the non-thread-safety of ArrayList may cause incorrect results, and I don't want to synchronize the ArrayLists. Can someone please suggest a proper way of using a parallel stream for my case?
List<Integer> passedList= new ArrayList<>();
List<Integer> failedList= new ArrayList<>();
Integer[] input = {0,1,2,3,4,5,6,7,8,9};
List<Integer> myList = Arrays.asList(input);
myList.parallelStream().forEach(element -> {
    if (isSuccess(element)) { // Some SOAP API call.
        passedList.add(element);
    } else {
        failedList.add(element);
    }
});
System.out.println(passedList);
System.out.println(failedList);
An appropriate solution would be to use Collectors.partitioningBy:
Integer[] input = {0,1,2,3,4,5,6,7,8,9};
List<Integer> myList = Arrays.asList(input);
Map<Boolean, List<Integer>> map = myList.parallelStream()
.collect(Collectors.partitioningBy(element -> isSuccess(element)));
List<Integer> passedList = map.get(true);
List<Integer> failedList = map.get(false);
This way you will have no thread-safety problems, as the task will be decomposed in a map-reduce manner: the parts of the input will be processed independently and joined afterwards. If your isSuccess method is slow, you are likely to see a performance boost here.
By the way, you can create a parallel stream from the original array using Arrays.stream(input).parallel(), without needing to create the intermediate myList.
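For completeness, a sketch of the same partitioningBy solution applied directly to the array (still using the isSuccess call from the question):
Integer[] input = {0,1,2,3,4,5,6,7,8,9};
Map<Boolean, List<Integer>> map = Arrays.stream(input)
        .parallel()
        .collect(Collectors.partitioningBy(element -> isSuccess(element)));
List<Integer> passedList = map.get(true);
List<Integer> failedList = map.get(false);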
Related
There is a bottleneck on my server caused by a specific task: flattening a List<List<SomeObject>> into a List<SomeObject>. The server's CPU usage spiked well above its normal levels.
DataStructure is:
Object:
List<SomeObject> childList;
I am trying to flat-map a List<Object> into a List<SomeObject> in the most computationally efficient way.
If parentList is a List<Object>:
I tried:
parentList.stream().flatMap(child -> child.getChildList().stream()).collect(Collectors.toList())
Also tried:
List<SomeObject> all = new ArrayList<>();
parentList.forEach(child -> all.addAll(child.getChildList()));
Any other suggestions? These two seem similar in computational cost, but both are fairly expensive due to the copying under the hood.
This may be more efficient since it avoids creating a stream per inner list, as flatMap does. mapMulti was introduced in Java 16. It takes the streamed element and a consumer which pushes items onto the stream, in this case each inner list's objects.
List<List<Object>> lists = new ArrayList<>(
        List.of(List.of("1", "2", "3"),
                List.of("4", "5", "6", "7"),
                List.of("8", "9")));
List<Object> list = lists.stream()
        .mapMulti((lst, consumer) -> lst.forEach(consumer))
        .toList();
System.out.print(list);
prints
[1, 2, 3, 4, 5, 6, 7, 8, 9]
Do we know more about which List implementation is used?
I would try to initialize the resulting list with the correct expected size.
This avoids unnecessary copying when the backing array has to grow.
This assumes that the sizes of the child lists can be retrieved quickly.
int expectedSize = parentList.stream()
        .mapToInt(entry -> entry.getChildList().size())
        .sum();
List<SomeObject> result = new ArrayList<>(expectedSize);
for (var entry : parentList) {
    result.addAll(entry.getChildList());
}
In Java 8:
List<Object> listOne = new ArrayList<>();
List<Object> listTwo = new ArrayList<>();
List<Object> listThree = new ArrayList<>();
...
Stream.of(...) concatenates many lists:
List<Object> newList = Stream.of(listOne,listTwo,listThree).flatMap(Collection::stream).collect(Collectors.toList());
In Java 16+:
List<Object> newList = Stream.concat(Stream.concat(listOne.stream(), listTwo.stream()), listThree.stream()).toList();
A stream pipeline processes a collection of data in stages, much like an ETL ("Extract, Transform and Load") process; if the stream is parallel, it can use multiple threads of execution across those stages.
One way to make the flat mapping more computationally efficient is to use a for loop instead of the Stream API or the forEach method. The for loop iterates over the parent list and, for each element, adds its child list to the flat list. This avoids the overhead of creating streams and using the collect method. Additionally, using an ArrayList to store the flat list instead of a LinkedList can also improve performance, since ArrayList has a more efficient implementation of addAll.
List<SomeObject> flatList = new ArrayList<>();
for (Object o : parentList) {
    flatList.addAll(o.getChildList());
}
Another way would be to use an Iterator explicitly. Iterator is an interface for traversing a collection; note, however, that the enhanced for loop above already uses an iterator under the hood, so don't expect a significant difference.
List<SomeObject> flatList = new ArrayList<>();
Iterator<Object> iterator = parentList.iterator();
while (iterator.hasNext()) {
    Object o = iterator.next();
    flatList.addAll(o.getChildList());
}
You could also use Stream.concat, which lazily concatenates two streams and can then be collected into a new list (note that concatenating very many streams this way builds a deep chain and can become inefficient):
Stream<SomeObject> combined = Stream.empty();
for (Object o : parentList) {
    combined = Stream.concat(combined, o.getChildList().stream());
}
List<SomeObject> flatList = combined.collect(Collectors.toList());
There are several resources that you can use for additional reading on this topic. Here are a few that I would recommend:
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/List.html
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/ArrayList.html
https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/Iterator.html
https://www.oreilly.com/library/view/java-performance-the/9781449358652/
https://www.tutorialspoint.com/java_data_structure_algorithms/index.htm
There are two test cases which use parallelStream():
List<Integer> src = new ArrayList<>();
for (int i = 0; i < 20000; i++) {
    src.add(i);
}
List<String> strings = new ArrayList<>();
src.parallelStream().filter(integer -> (integer % 2) == 0).forEach(integer -> strings.add(integer + ""));
System.out.println("=size=>" + strings.size());
=size=>9332
List<Integer> src = new ArrayList<>();
for (int i = 0; i < 20000; i++) {
    src.add(i);
}
List<String> strings = new ArrayList<>();
src.parallelStream().forEach(integer -> strings.add(integer + ""));
System.out.println("=size=>" + strings.size());
=size=>17908
Why do I always lose data when using parallelStream?
What did I do wrong?
ArrayList isn't thread-safe. You need to do
List<String> strings = Collections.synchronizedList(new ArrayList<>());
or
List<String> strings = new Vector<>();
to ensure all updates are synchronized, or switch to
List<String> strings = src.parallelStream()
.filter(integer -> (integer % 2) == 0)
.map(integer -> integer + "")
.collect(Collectors.toList());
and leave the list building to the Streams framework. Note that it's undefined whether the list returned by collect is modifiable, so if that is a requirement, you may need to modify your approach.
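For example, if you do need a modifiable list, one option is to collect into an explicit ArrayList (a sketch):
List<String> strings = src.parallelStream()
        .filter(integer -> (integer % 2) == 0)
        .map(integer -> integer + "")
        .collect(Collectors.toCollection(ArrayList::new)); // guaranteed to be a plain, modifiable ArrayList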
In terms of performance, Stream.collect is likely to be much faster than using Stream.forEach to add to a synchronized collection, since the Streams framework can handle collection of values in each thread separately without synchronization and combine the results at the end in a thread safe fashion.
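To illustrate how that works, a collector is built from a per-thread container supplier, an accumulator, and a combiner that merges the partial containers when the parallel parts are joined. A hand-rolled sketch equivalent in spirit to Collectors.toList() (not the actual JDK implementation) might look like this:
List<String> strings = src.parallelStream()
        .filter(integer -> (integer % 2) == 0)
        .map(integer -> integer + "")
        .collect(Collector.of(
                ArrayList::new,   // supplier: each worker thread gets its own list
                List::add,        // accumulator: no synchronization needed within a thread
                (left, right) -> { left.addAll(right); return left; })); // combiner: merge partial results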
ArrayList isn't thread-safe. While one thread sees a list with 30 elements, another might still see 29 and overwrite the 30th position (losing one element).
Another issue can arise when the array backing the list needs to be resized. A new array (roughly 1.5 times the size) is created and the elements from the original array are copied into it. The thread doing the resizing might not see elements that other threads added in the meantime, or multiple threads might resize at the same time and eventually only one of them will win.
When using multiple threads, you need to either synchronize access to the list OR use a thread-safe list (for example by wrapping it with Collections.synchronizedList or by using a CopyOnWriteArrayList, to mention two possible solutions). Even better would be to use the collect method on the stream to put everything into a list.
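For example, a CopyOnWriteArrayList can be dropped into the original test without any other change (a sketch; note that every add copies the backing array, so it is best suited to small result sets):
List<String> strings = new CopyOnWriteArrayList<>();
src.parallelStream()
   .filter(integer -> (integer % 2) == 0)
   .forEach(integer -> strings.add(integer + ""));
System.out.println("=size=>" + strings.size()); // always 10000 now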
ParallelStream with forEach is a deadly combo if not used carefully.
Please take a look at the points below to avoid bugs:
If you have a pre-existing list object to which you want to add more objects from a parallelStream loop, use Collections.synchronizedList and wrap the pre-existing list with it before looping through the parallel stream (see the sketch after these points).
If you have to create a new list, then you can use Vector to initialize the list outside the loop.
or
If you have to create a new list, then simply use parallelStream and collect the output at the end.
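A minimal sketch of the first point, assuming you already have a results list named existing and the src list from the question:
List<String> existing = new ArrayList<>();                       // pre-existing list (hypothetical)
List<String> syncView = Collections.synchronizedList(existing);  // wrap it before the parallel loop
src.parallelStream().forEach(integer -> syncView.add(integer + ""));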
You lose the benefits of using streams (and parallel streams) when you try to do mutation. As a general rule, avoid mutation when using streams; Venkat Subramaniam explains why. Instead, use collectors. Also try to get as much as possible done within the stream chain. For example:
System.out.println(
        IntStream.range(0, 200000)
                 .filter(i -> i % 2 == 0)
                 .mapToObj(String::valueOf)
                 .collect(Collectors.toList()).size()
);
You can run that as a parallel stream by adding .parallel().
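A sketch of the same pipeline with .parallel() added:
System.out.println(
        IntStream.range(0, 200000)
                 .parallel()
                 .filter(i -> i % 2 == 0)
                 .mapToObj(String::valueOf)
                 .collect(Collectors.toList()).size()
);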
I want to convert this while loop to equivalent code using a Java 8 Streams, but I don't know how to both stream the List and remove elements from it.
private List<String> nameList = new ArrayList<>();
while (nameList.size() > 0) {
    String nameListFirstEntry = nameList.get(0);
    nameList.remove(0);
    setNameCombinations(nameListFirstEntry);
}
I guess this will do
nameList.forEach(this::setNameCombinations);
nameList.clear();
In case you don't need the original list anymore, you might as well create a new empty list instead.
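A sketch of that variant:
nameList.forEach(this::setNameCombinations);
nameList = new ArrayList<>(); // instead of nameList.clear(), if the old contents are no longer needed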
Because List#remove(int) also returns the element, you can both stream the list's elements and remove them via a stream:
Stream.generate(() -> nameList.remove(0))
.limit(nameList.size())
.forEach(this::setNameCombinations);
This code doesn't break any "rules". From the javadoc of Stream#generate():
Returns an infinite sequential unordered stream where each element is generated by the provided Supplier. This is suitable for generating constant streams, streams of random elements, etc.
There is no mention of any restriction on how the supplier is implemented, or that it must have no side effects, etc. The Supplier's only contract is to supply.
For those who doubt that this works, here's some test code using 100K elements showing that order is indeed preserved:
int size = 100000;
List<Integer> list0 = new ArrayList<>(size); // the reference list
IntStream.range(0, size).boxed().forEach(list0::add);
List<Integer> list1 = new ArrayList<>(list0); // will feed stream
List<Integer> list2 = new ArrayList<>(size); // will consume stream
Stream.generate(() -> list1.remove(0))
.limit(list1.size())
.forEach(list2::add);
System.out.println(list0.equals(list2)); // always true
I have an ImmutableMap<String, Integer> and a List<String> that defines the order. I want to get an ImmutableList<Integer> in that order. For example:
map (<"a", 66>, <"kk", 2>, <"m", 8>)
list ["kk", "m", "a"]
As a result, I want a list of the map's values in the order defined by the list: [2, 8, 66].
What is the best way to do it in Java?
Without streams you can use a simple for loop:
List<Integer> valuesInOrder = new ArrayList<>(map.size());
for (String s : list) {
    valuesInOrder.add(map.get(s));
}
If you want to use streams you could do:
List<Integer> valuesInOrder =
list.stream().map(map::get).collect(Collectors.toList());
Try this:
List<Integer> numbers = new ArrayList<>(map.size());
for (String l : list) {
    numbers.add(map.get(l));
}
An example using Java 8 streams:
List<Integer> numbers= list.stream().map(map::get).collect(Collectors.toList());
Java SE 8 to the rescue! The Java API comes with a new abstraction called Stream that lets you process data in a declarative way. Streams can leverage multi-core architectures without you having to write a single line of multithreaded code.
Here I am using a Java parallel stream to iterate through a List and make a REST call with each list element as input. I need to add all the results of the REST calls to a collection, for which I am using an ArrayList. The code given below is working fine, except that the non-thread-safety of ArrayList would cause incorrect results, and adding the needed synchronization would cause contention, undermining the benefit of parallelism.
Can someone please suggest a proper way of using a parallel stream for my case?
public void myMethod() {
    List<List<String>> partitions = getInputData();
    final List<String> allResult = new ArrayList<String>();
    partitions.parallelStream().forEach(serverList -> callRestAPI(serverList, allResult));
}
private void callRestAPI(List<String> serverList, List<String> allResult) {
    List<String> result = //Do a REST call.
    allResult.addAll(result);
}
You can do the operation with map instead of forEach - that will guarantee thread safety (and is cleaner from a functional programming perspective):
List<String> allResult = partitions.parallelStream()
.map(this::callRestAPI)
.flatMap(List::stream) //flattens the lists
.collect(toList());
And your callRestAPI method:
private List<String> callRestAPI(List<String> serverList) {
    List<String> result = //Do a REST call.
    return result;
}
I wouldn't shy away from synchronising access to your ArrayList. Given that you're accessing a remote service via Rest, I suspect the cost of synchronisation would be negligible. I would measure the effect before you spend time optimising.
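A minimal sketch of that approach, keeping the original forEach but synchronizing the result list (based on the myMethod/callRestAPI code above):
final List<String> allResult = Collections.synchronizedList(new ArrayList<>());
partitions.parallelStream().forEach(serverList -> callRestAPI(serverList, allResult));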