java intstream parallel loop omitting data


I have this piece of code:
ArrayList<ArrayList<Double>> results = new ArrayList<ArrayList<Double>>();
IntStream.range(0, 100).parallel().forEach(x -> {
    for (int y = 0; y < 100; y++) {
        for (int z = 0; z < 100; z++) {
            for (int q = 0; q < 100; q++) {
                results.add(someMethodThatReturnsArrayListDouble());
            }
        }
    }
});
System.out.println(results.size());
After running this code, I always get a different results.size(), always a few short. Any idea why that is and how to fix it?

ArrayList is not thread-safe. If you try to add items to it from different threads (which is what a parallelized stream does), it is likely to break.
From the docs:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method.
The easiest fix, in this case, would be to remove the call to parallel().
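If the parallelism is worth keeping, the wrapper mentioned in the docs quoted above is one option. The following is only a minimal sketch: compute is a hypothetical stand-in for someMethodThatReturnsArrayListDouble, and the loop nesting is reduced to two levels to keep it short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

class SynchronizedListSketch {
    // Hypothetical stand-in for the question's someMethodThatReturnsArrayListDouble.
    static ArrayList<Double> compute(int x, int y) {
        ArrayList<Double> row = new ArrayList<>();
        row.add((double) (x * y));
        return row;
    }

    static int fill(int outer, int inner) {
        // Every add() goes through the wrapper's lock, so no element is lost.
        List<ArrayList<Double>> results = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, outer).parallel().forEach(x -> {
            for (int y = 0; y < inner; y++) {
                results.add(compute(x, y));
            }
        });
        return results.size();
    }

    public static void main(String[] args) {
        // One element is added per (x, y) pair.
        System.out.println(fill(100, 100));
    }
}
```

The elements end up in whatever order the threads happened to add them, so this only fits cases where the order of results does not matter.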

Your result list is not synchronized. There are multiple ways to solve your problem; the best is to let the Java Stream API handle combining the lists.
List<List<Double>> results = IntStream.range(0, 100).parallel().boxed().flatMap(x -> {
    // local to this invocation, so nothing is shared between threads
    List<List<Double>> partial = new ArrayList<>();
    for (int y = 0; y < 100; y++) {
        for (int z = 0; z < 100; z++) {
            for (int q = 0; q < 100; q++) {
                partial.add(someMethodThatReturnsArrayListDouble());
            }
        }
    }
    return partial.stream();
}).collect(Collectors.toList());
This collects the lists for each x in a local list inside the lambda and returns them as a stream. The streams are combined at the end by Collectors.toList(), which is thread-safe. The call to boxed() is needed because IntStream.flatMap can only produce another IntStream.

Use Vector; it's a thread-safe implementation of List.

Related

java 8 parallelStream().forEach Result data loss

There are two test cases which use parallelStream():
List<Integer> src = new ArrayList<>();
for (int i = 0; i < 20000; i++) {
src.add(i);
}
List<String> strings = new ArrayList<>();
src.parallelStream().filter(integer -> (integer % 2) == 0).forEach(integer -> strings.add(integer + ""));
System.out.println("=size=>" + strings.size());
=size=>9332
List<Integer> src = new ArrayList<>();
for (int i = 0; i < 20000; i++) {
src.add(i);
}
List<String> strings = new ArrayList<>();
src.parallelStream().forEach(integer -> strings.add(integer + ""));
System.out.println("=size=>" + strings.size());
=size=>17908
Why do I always lose data when using parallelStream?
What did I do wrong?

ArrayList isn't thread safe. You need to do
List<String> strings = Collections.synchronizedList(new ArrayList<>());
or
List<String> strings = new Vector<>();
to ensure all updates are synchronized, or switch to
List<String> strings = src.parallelStream()
.filter(integer -> (integer % 2) == 0)
.map(integer -> integer + "")
.collect(Collectors.toList());
and leave the list building to the Streams framework. Note that it's undefined whether the list returned by collect is modifiable, so if that is a requirement, you may need to modify your approach.
In terms of performance, Stream.collect is likely to be much faster than using Stream.forEach to add to a synchronized collection, since the Streams framework can handle collection of values in each thread separately without synchronization and combine the results at the end in a thread safe fashion.
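If a modifiable result is required, one option is to name the list type explicitly with Collectors.toCollection. This is a minimal sketch of that idea, not the only way to do it; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CollectSketch {
    // Collects the even numbers below n as strings into a list we know is an ArrayList.
    static ArrayList<String> evens(int n) {
        return IntStream.range(0, n)
                .parallel()
                .filter(i -> i % 2 == 0)
                .mapToObj(String::valueOf)
                // toCollection lets the caller pick the concrete (modifiable) list type
                .collect(Collectors.toCollection(ArrayList::new));
    }

    public static void main(String[] args) {
        ArrayList<String> strings = evens(20000);
        strings.add("extra"); // safe: the result is a plain ArrayList
        System.out.println("=size=>" + strings.size());
    }
}
```

Because the stream is ordered and collect respects encounter order, the elements come back in ascending order even though the work is done in parallel.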

ArrayList isn't thread-safe. While one thread sees a list with 30 elements, another might still see 29 and overwrite the 30th position (losing one element).
Another issue might arise when the array backing the list needs to be resized. A new, larger array is created and the elements from the original array are copied into it. Other threads might have added elements in the meantime that the resizing thread has not seen, or multiple threads might resize at once, and eventually only one of them wins.
When using multiple threads you need to either synchronize access to the list or use a thread-safe list (for example by wrapping it with Collections.synchronizedList or by using a CopyOnWriteArrayList, to mention two possible solutions). Even better would be to use the collect method on the stream to put everything into a list.

ParallelStream with forEach is a deadly combo if not used carefully.
Please take a look at below points to avoid any bugs:
If you have a preexisting list to which you want to add objects from a parallelStream loop, wrap it with Collections.synchronizedList before looping over the parallel stream.
If you have to create a new list, then you can use Vector to initialize the list outside the loop.
or
If you have to create a new list, then simply use parallelStream and collect the output at the end.

You lose the benefits of using stream (and parallel stream) when you try to do mutation. As a general rule, avoid mutation when using streams. Venkat Subramaniam explains why. Instead, use collectors. Also try to get a lot accomplished within the stream chain. For example:
System.out.println(
IntStream.range(0, 200000)
.filter(i -> i % 2 == 0)
.mapToObj(String::valueOf)
.collect(Collectors.toList()).size()
);
You can run that in parallel by adding .parallel() to the chain.

Multiple threads in for loop

I have a method I need to call for each element in a list, then return this list to the caller in another class. I want to create a Thread for each element but am struggling to get my head around how to do this.
public List<MyList> threaded(List<Another> another) {
List<MyList> myList= new ArrayList<>();
Visibility visi = new Visibility();
Thread[] threads = new Thread[another.size()];
for (int i = 0; i < another.size(); i++) {
visi = test(another.get(i));
myList.add(visi);
}
return myList;
}
So I've defined an array of threads that matches the number of elements in the other list. Where I'm lost is how to use each of those threads in the loop and then return myList after all threads have finished.

This looks like a perfect use case for Collection.parallelStream():
public List<MyList> threaded(List<Another> another) {
    return another.parallelStream()
            .map(a -> test(a))
            .collect(Collectors.toList());
}
This will call test on each Another and collect the results as a List, using as many CPUs as you have available (up to the number of objects you have).
Yes, you could create a Thread for each one, except this is likely to be less efficient and much more complicated.
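For comparison, here is roughly what the explicit-threads route could look like with an ExecutorService instead of raw Thread objects. It is a sketch under assumptions: test here is a hypothetical stand-in that maps an int to a String, and the pool size of 4 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorSketch {
    // Hypothetical stand-in for the question's test(...) method.
    static String test(int input) {
        return "result-" + input;
    }

    static List<String> threaded(List<Integer> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // arbitrary pool size
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (Integer input : inputs) {
                tasks.add(() -> test(input));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until all tasks are done and returns futures in task order
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> inputs = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            inputs.add(i);
        }
        System.out.println(threaded(inputs));
    }
}
```

Only the calling thread touches the results list, so it needs no synchronization. The amount of bookkeeping compared with the three-line stream version illustrates the "much more complicated" point.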

Iterate through an list of objects and run a function for each - Java

I'm wondering for the simplest method for how to run a specific function for each object in an array (or other list type)
My goal is to be able create a list of objects, and have each object run a specific function as it passes through the iterator.
I've tried a for loop on an arraylist
for (int i = 0; i < testList.size(); i++)
{
this = textList.get(i);
this.exampleFunction();
}
But this gives me a 'Variable expected' error

Assuming you're using Java 8+, and you have a Collection<TypeInList> you could call Collection.stream() and do a forEach on that. Like,
testList.stream().forEach(TypeInList::function);
Your current approach is trying to do things with this that cannot be done. It could be fixed like,
for (int i = 0; i < testList.size(); i++)
{
TypeInList that = testList.get(i); // this is a reserved word.
that.function();
}
or
for (TypeInList x : testList) {
x.function();
}

There are multiple ways to iterate through a list, but the easiest I personally find is like this:
Assuming that your list contains String objects e.g.:
List<String> list = new ArrayList<>();
list.add("Hello");
list.add("World");
for(String current : list){
System.out.println(current);
}
The loop will iterate twice, and console will output the following:
Hello
World
This approach doesn't rely on indexes (as how you're using it in your question), as such I find it easy to use for iterating through a single list.
However the disadvantage is that if you have 2 separate lists that you would like to iterate through, the lack of indexes makes it a bit more complicated. The easier approach for iterating through multiple lists would be using the traditional approach, something like this:
// assumes list1 and list2 are List<Integer> of the same size
for (int i = 0; i < list1.size(); i++) {
    int x = list1.get(i);
    int y = list2.get(i);
}
As such your use-case really determines the ideal method you can adopt.

Remove an object from an ArrayList without (implicitly) looping through it

I am looping through a list A to find X. Then, if X has been found, it is stored into list B. After this, I want to delete X from list A. As speed is an important issue for my application, I want to delete X from A without looping through A. This should be possible as I already know the location of X in A (I found its position in the first line). How can I do this?
for(int i = 0; i<n; i++) {
Object X = methodToGetObjectXFromA();
B.add(X);
A.remove(X); // But this part is time consuming, as I unnecessarily loop through A
}
Thanks!

Instead of returning the object from the method, you can return its index and then remove by index:
int idx = methodToGetObjectIndexFromA();
Object X = A.remove(idx); // removes by position, so no search through A is needed
B.add(X);
However, note that the remove method may be still slow due to potential move of the array elements.
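If the order of the remaining elements in A does not matter (an assumption the question does not confirm), the shifting cost can be avoided by moving the last element into the vacated slot. swapRemove is a hypothetical helper written for this sketch, not part of the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SwapRemoveSketch {
    // Removes the element at index in constant time for an ArrayList,
    // at the cost of changing the order of the remaining elements.
    static <T> T swapRemove(List<T> list, int index) {
        int last = list.size() - 1;
        T removed = list.get(index);
        list.set(index, list.get(last)); // move the last element into the hole
        list.remove(last);               // removing the tail shifts nothing
        return removed;
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        List<String> b = new ArrayList<>();
        b.add(swapRemove(a, 1));
        System.out.println(a + " " + b);
    }
}
```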

You can use an iterator, and if performance is an issue, it is better to use a LinkedList for the list you want to remove from:
public static void main(String[] args) {
List<Integer> aList = new LinkedList<>();
List<Integer> bList = new ArrayList<>();
aList.add(1);
aList.add(2);
aList.add(3);
int value;
Iterator<Integer> iter = aList.iterator();
while (iter.hasNext()) {
value = iter.next().intValue();
if (value == 3) {
bList.add(value);
iter.remove();
}
}
System.out.println(aList.toString()); //[1, 2]
System.out.println(bList.toString()); //[3]
}

If you stored all the objects to remove in a second collection, you may use ArrayList#removeAll(Collection)
Removes from this list all of its elements that are contained in the
specified collection.
Parameters:
c collection containing elements to be removed from this list
In this case, just do
A.removeAll(B);
When exiting your loop.
Addition
It calls ArrayList#batchRemove which will use a loop to remove the objects but you do not have to do it yourself.

Multithreaded: Identifying duplicate objects

I'm trying to implement a duplicate objects finding method over a List object. Traversing through the List and finding the duplicate objects using multiple threads is the target. So far I used ExecutorService as follows.
ExecutorService executor = Executors.newFixedThreadPool(5);
for (int i = 0; i < jobs; i++) {
Runnable worker = new TaskToDo(jobs);
executor.execute(worker);
}
executor.shutdown();
while (!executor.isTerminated()) {
}
System.out.println("Finished all threads");
At TaskToDo class I iterate through the loop. When a duplicate is detected the one out of them will be removed from the List. Following are the problems I faced,
When using multiple threads in the executor, it does not behave as intended: some duplicate values still exist in the list. With a single thread in the executor it works perfectly. I also tried
List<String> list = Collections.synchronizedList(new LinkedList<String>())
but the same problem exists.
What is the best data structure that i can use for this purpose of removing duplicates for better performance ?
Google gave some results to use Concurrent structures. But difficult to figure out a correct approach to achieve this.
Appreciate your help. Thanks in advance... :)
Following is the code for iterating through the specified list object. Here actual content of the files will be compared.
for (int i = currentTemp; i < list.size() - 1; i++) {
    if (isEqual(list.get(currentTemp), list.get(i+1))) {
        synchronized (list) {
            list.remove(i + 1);
            i--;
        }
    }
}

With your current logic, you would have to synchronize at coarser granularity, otherwise you risk removing the wrong element.
for (int i = currentTemp; i < list.size() - 1; i++) {
synchronized (list) {
if (i + 1 < list.size() && isEqual(list.get(currentTemp), list.get(i+1))) {
list.remove(i + 1);
i--;
}
}
}
You see, the isEqual() check must be inside the synchronized block to ensure atomicity of the equivalence check with the element removal. Assuming most of your concurrent processing benefit would come from asynchronous comparison of list elements using isEqual(), this change nullifies any benefit you sought.
Also, checking list.size() outside the synchronized block isn't good enough, because list elements can be removed by other threads. And unless you have a way of adjusting your list index down when elements are removed by other threads, your code will unknowingly skip checking some elements in the list. The other threads are shifting elements out from under the current thread's for loop.
This task would be much better implemented using an additional set to keep track of the indexes that should be removed:
private volatile Set<Integer> indexesToRemove =
Collections.synchronizedSet(new TreeSet<Integer>(
new Comparator<Integer>() {
@Override public int compare(Integer i1, Integer i2) {
return i2.compareTo(i1); // sort descending for later element removal
}
}
));
The above should be declared at the same shared level as your list. Then the code for iterating through the list should look like this, with no synchronization required:
int size = list.size();
for (int i = currentTemp; i < size - 1; i++) {
if (!indexesToRemove.contains(i + 1)) {
if (isEqual(list.get(currentTemp), list.get(i+1))) {
indexesToRemove.add(i + 1);
}
}
}
And finally, after you have join()ed the worker threads back to a single thread, do this to de-duplicate your list:
for (Integer i: indexesToRemove) {
list.remove(i.intValue());
}
Because we used a descending-sorted TreeSet for indexesToRemove, we can simply iterate its indexes and remove each from the list.

If your algorithm acts on sufficient data that you might really benefit from multiple threads, you encounter another issue that will tend to mitigate any performance benefits. Each thread has to scan the entire list to see if the element it is working on is a duplicate, which will cause the CPU cache to keep missing as various threads compete to access different parts of the list.
This is known as False Sharing.
Even if False Sharing does not get you, you are de-duping the list in O(N^2) because for each element of the list, you re-iterate the entire list.
Instead, consider using a Set to initially collect the data. If you cannot do that, test the performance of adding the list elements to a Set. That should be a very efficient way to approach this problem.
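As a rough illustration of the Set-based approach, the sketch below removes duplicates in a single pass while keeping the first occurrence of each element. It assumes the elements have equals and hashCode implementations that agree with the question's isEqual check, which may not hold for the file comparison described there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class SetDedupSketch {
    // LinkedHashSet drops duplicates and remembers insertion order,
    // so each element is hashed once instead of compared with every other.
    static <T> List<T> dedupe(List<T> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(dedupe(list));
    }
}
```

For lists of ordinary size this single-threaded version is likely to be fast enough that no executor is needed at all.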

If you're trying to dedup a large number of files, you really ought to be using a hash-based structure. Concurrently modifying lists is dangerous, not least because indexes into the list will constantly be changing out from under you, and that's bad.
If you can use Java 8, my approach would look something like this. Let's assume you have a List<String> fileList.
Collection<String> deduplicatedFiles = fileList.parallelStream()
.map(FileSystems.getDefault()::getPath) // convert strings to Paths
.collect(Collectors.toConcurrentMap(
path -> {
try {
return ByteBuffer.wrap(Files.readAllBytes(path));
// read out the file contents and wrap in a ByteBuffer
// which is a suitable key for a hash map
} catch (IOException e) {
throw new RuntimeException(e);
}
},
path -> path.toString(), // in the values, convert back to string
(first, second) -> first) // resolve duplicates by choosing arbitrarily
.values();
That's the entire thing: it concurrently reads all the files, hashes them (though with an unspecified hash algorithm that may not be great), deduplicates them, and spits out a collection of files with distinct contents.
If you're using Java 7, then what I'd do would be something like this.
CompletionService<Void> service = new ExecutorCompletionService<>(
        Executors.newFixedThreadPool(4));
final ConcurrentMap<ByteBuffer, String> unique = new ConcurrentHashMap<>();
for (final String file : fileList) {
    service.submit(new Runnable() {
        @Override public void run() {
            try {
                ByteBuffer buffer = ByteBuffer.wrap(Files.readAllBytes(
                        FileSystems.getDefault().getPath(file)));
                // keeps the first file seen with these contents
                unique.putIfAbsent(buffer, file);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }, null);
}
for (int i = 0; i < fileList.size(); i++) {
    service.take(); // waits for one task to finish; throws InterruptedException
}
Collection<String> result = unique.values();
