Java .parallelStream() with random parameter inside [duplicate] - java

We have multiple threads calling add(obj) on an ArrayList.
My theory is that when add is called concurrently by two threads, only one of the two objects being added is really added to the ArrayList. Is this plausible?
If so, how do you get around this? Use a synchronized collection like Vector?

There is no guaranteed behavior for what happens when add is called concurrently by two threads on ArrayList. However, it has been my experience that both objects have been added fine. Most of the thread safety issues related to lists deal with iteration while adding/removing. Despite this, I strongly recommend against using vanilla ArrayList with multiple threads and concurrent access.
Vector used to be the standard for concurrent lists, but now the standard is to use Collections.synchronizedList.
Also I highly recommend Java Concurrency in Practice by Goetz et al if you're going to be spending any time working with threads in Java. The book covers this issue in much better detail.

Any number of things could happen. You could get both objects added correctly. You could get only one of the objects added. You could get an ArrayIndexOutOfBoundsException because the size of the underlying array was not adjusted properly. Or other things may happen. Suffice it to say that you cannot rely on any behavior occurring.
As alternatives, you could use Vector, you could use Collections.synchronizedList, you could use CopyOnWriteArrayList, or you could use a separate lock. It all depends on what else you are doing and what kind of control you have over access to the collection.
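As a rough sketch of the "separate lock" option: a plain ArrayList is kept private and every access goes through one lock object. The class name GuardedList and the helper addFromThreads are made up for this illustration, not taken from any library.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a plain ArrayList guarded by one private lock object.
// GuardedList and addFromThreads are hypothetical names for illustration.
class GuardedList {
    private final Object lock = new Object();
    private final List<Integer> items = new ArrayList<>();

    void add(int value) {
        synchronized (lock) {   // only one thread mutates the list at a time
            items.add(value);
        }
    }

    int size() {
        synchronized (lock) {   // reads take the same lock so they see all writes
            return items.size();
        }
    }

    // Starts `threads` threads that each add `perThread` values,
    // waits for all of them, and returns the final size.
    static int addFromThreads(int threads, int perThread) {
        GuardedList list = new GuardedList();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        // With the lock in place no add is lost, so this equals 8 * 10_000.
        System.out.println(addFromThreads(8, 10_000));
    }
}
```

The advantage over Vector or synchronizedList is that the same lock can also cover compound operations (check-then-add, iterate-then-clear) if you add such methods to the class.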

You could also get a null, an ArrayIndexOutOfBoundsException, or something left up to the implementation. HashMaps have been observed to go into an infinite loop in production systems. You don't really need to know what might go wrong, just don't do it.
You could use Vector, but its interface tends to turn out not to be rich enough. You will probably find that you want a different data structure in most cases.

I came up with the following code to mimic somewhat a real world scenario.
100 tasks are run in parallel and they update their completed status to the main program. I use a CountDownLatch to wait for task completion.
import java.util.concurrent.*;
import java.util.*;
public class Runner {
// Should be replaced with Collections.synchronizedList(new ArrayList<Integer>())
public List<Integer> completed = new ArrayList<Integer>();
/**
* @param args
*/
public static void main(String[] args) {
Runner r = new Runner();
ExecutorService exe = Executors.newFixedThreadPool(30);
int tasks = 100;
CountDownLatch latch = new CountDownLatch(tasks);
for (int i = 0; i < tasks; i++) {
exe.submit(r.new Task(i, latch));
}
try {
latch.await();
System.out.println("Summary:");
System.out.println("Number of tasks completed: "
+ r.completed.size());
} catch (InterruptedException e) {
e.printStackTrace();
}
exe.shutdown();
}
class Task implements Runnable {
private int id;
private CountDownLatch latch;
public Task(int id, CountDownLatch latch) {
this.id = id;
this.latch = latch;
}
public void run() {
Random r = new Random();
try {
Thread.sleep(r.nextInt(5000)); //Actual work of the task
} catch (InterruptedException e) {
e.printStackTrace();
}
completed.add(id);
latch.countDown();
}
}
}
When I ran the application 10 times, at least 3 or 4 times the program did not print the correct number of completed tasks. Ideally it should print 100 (if no exceptions happen), but in some cases it printed 98, 99, etc.
This shows that concurrent updates of an ArrayList will not give correct results.
If I replace the ArrayList with a synchronized version, the program outputs the correct results.

You can use List l = Collections.synchronizedList(new ArrayList()); if you want a thread-safe version of ArrayList.

The behavior is probably undefined since ArrayList isn't thread-safe. If you modify the list while an Iterator is iterating over it then you will get a ConcurrentModificationException. You can wrap the ArrayList with Collections.synchronizedList or use a thread-safe collection (there are many), or just put the add calls in a synchronized block.
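To make the ConcurrentModificationException part concrete, here is a small single-threaded sketch. It needs no second thread: any structural change made behind an iterator's back trips the same fail-fast check. The class and method names (CmeDemo, modifyWhileIterating) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Sketch only: shows ArrayList's fail-fast iterator in a single thread.
class CmeDemo {
    // Returns true if changing the list during iteration triggered the fail-fast check.
    static boolean modifyWhileIterating() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        try {
            for (Integer value : list) {
                if (value == 1) {
                    list.add(4);   // structural change while the for-each iterator is live
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;           // thrown by the iterator's next() after the add
        }
    }

    public static void main(String[] args) {
        System.out.println("fail-fast triggered: " + modifyWhileIterating());
    }
}
```

Keep in mind that the check is best-effort. With real concurrent access it may or may not fire, which is why it cannot be used as a substitute for synchronization.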

Instead of new ArrayList() you could use:
Collections.synchronizedList(new ArrayList());
or
new Vector();
In my view synchronizedList is preferable because it:
is 50-100% faster
can work with already existing ArrayLists

In my recent experience, adding elements to an ArrayList from different threads will miss a few of them; using Collections.synchronizedList(new ArrayList()) avoids that issue.
List<String> anotherCollection = new ArrayList<>();
List<String> list = new ArrayList<>();
// if 'anotherCollection' is big enough, some elements will be missed.
anotherCollection.parallelStream().forEach(el -> list.add("element" + el));
List<String> listSync = Collections.synchronizedList(new ArrayList<>());
// regardless of how big 'anotherCollection' is, all the elements will be added.
anotherCollection.parallelStream().forEach(el -> listSync.add("element" + el));
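A different approach, not the one used above, is to avoid the shared list altogether and let the stream build the result with collect. The sketch below assumes the source elements are Integers; ParallelCollect and label are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Sketch only: let the stream framework gather results instead of
// calling add() on a shared list from forEach.
class ParallelCollect {
    static List<String> label(List<Integer> source) {
        return source.parallelStream()
                .map(el -> "element" + el)        // no shared mutable state in the lambda
                .collect(Collectors.toList());    // the collector merges per-thread partial lists
    }

    public static void main(String[] args) {
        List<Integer> source = IntStream.range(0, 100_000)
                .boxed()
                .collect(Collectors.toList());
        List<String> result = label(source);
        System.out.println(result.size());
    }
}
```

For an ordered source such as a List, the collected result also keeps the source order, which the forEach plus synchronizedList version does not promise.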

java.util.concurrent has a thread-safe array list (CopyOnWriteArrayList). The standard ArrayList is not thread-safe and the behavior when multiple threads update at the same time is undefined. There can also be odd behaviors with multiple readers when one or more threads is writing at the same time.

http://java.sun.com/j2se/1.4.2/docs/api/java/util/ArrayList.html
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
Since there is no synchronization internally, what you theorize is plausible.
Without it, things get out of sync, with unpleasant and unpredictable results.

Related

Java Thread: Real Time Application Example

I was asked a question in an interview: I have a list available in the main method, and I was told there is some operation to be performed on each item in the list. How would I achieve this using threads?
Consider the following scenario:
I have a list of integers. I need to print all the values from the list. Can it be done using threads, where I have multiple threads running, one per item in the list, and each thread prints out one value, rather than one thread printing all the values? I am not trying to modify any value in the list.
I hope you are looking for something like that:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class MultiThreadExample {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        for (int i : list) {
            // One thread per element; the order of the output is not deterministic.
            Thread th = new Thread() {
                @Override
                public void run() {
                    System.out.println(i);
                }
            };
            th.start();
        }
    }
}
The output is for one execution:
run:
3
1
2
BUILD SUCCESSFUL (total time: 0 seconds)
Yes, it is a typical producer-consumer paradigm:
Imagine a Runnable class that receives an Iterator as a parameter, waits on a certain monitor, consumes one item from the iterator, and finally notifies the same monitor. It loops while the iterator has more items.
With this in place, it is enough to create the list of numbers, create the consumer threads, passing them the list's iterator, and start them.
The code below is not tested at all. It's just something that comes to mind. The last implementation using parallelStream() might be what you are looking for.
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
public class DemoApplication {
public static void main(String[] args) {
final List<Integer> myIntegerList = Arrays.asList(1, 2, 3);
// Good old for-each-loop
for (Integer item : myIntegerList) {
System.out.print(item);
}
// Java 8 forEach with Consumer
final Consumer<Integer> consumer = new Consumer<Integer>() {
@Override
public void accept(Integer item) {
System.out.print(item);
}
};
myIntegerList.forEach(consumer);
// Java 8 forEach with Lambda
myIntegerList.forEach((item) -> System.out.print(item));
// Java 8 forEach on parallelStream with Lambda
myIntegerList.parallelStream().forEach((item) -> System.out.print(item));
}
}
I am trying to understand the advantage of threads.
There are basically two reasons for using multiple threads in a program:
(1) Asynchronous event handling: Imagine a program that must wait for and respond to several different kinds of input, and each kind of input can happen at completely arbitrary times.
Before threads, we used to write a big event loop, that would poll for each different kind of event, and then dispatch to different handler functions. Things could start to get ugly when one or more of the event handlers was stateful (i.e., what it did next would depend on the history of previous events.)
A program that has one thread for each different kind of event often is much cleaner. That is to say, it's easier to understand, easier to modify, etc. Each thread loops waiting for just one kind of event, and its state (if any) can be kept in local variables, or its state can be implicit (i.e., depends on what function the thread is in at any given time).
(2) Multiprocessing (a.k.a., "parallel processing", "concurrent programming",...): Using worker threads to perform background computations probably is the most widespread model of multiprocessing in use at this moment in time.
Multithreading is the lowest-level of all multiprocessing models which means (a) it is the hardest to understand, but (b) it is the most versatile.
It can be done. We can make use of ConcurrentHashMap. We can add the list to this map and pass it to the threads. Each thread will try to get the lock on the resource before operating on it.

ArrayList vs Vector performance in single-threaded application

I was looking for the answer to the question of why ArrayList is faster than Vector, and I found that ArrayList is faster because it is not synchronized.
So my doubts are:
If ArrayList is not synchronized, why would we use it in a multithreaded environment and compare it with Vector?
If we are in a single-threaded environment, how does the performance of Vector decrease, given that there is no synchronization going on when we are dealing with a single thread?
Why should we compare the performance, considering the above points?
Please guide me :)
a) Methods using ArrayList in a multithreaded program may be synchronized.
class X {
List l = new ArrayList();
synchronized void add(Object e) {
l.add(e);
}
...
b) We can use ArrayList without exposing it to other threads. This is the case when the ArrayList is referenced only from local variables:
void x() {
List l = new ArrayList(); // no other thread except current can access l
...
Even in a single-threaded environment, entering a synchronized method takes a lock. This is where we lose performance:
public synchronized boolean add(E e) { // current thread will take a lock here
modCount++;
...
You can use ArrayList in a multithread environment if the list is not shared between threads.
If the list is shared between threads you can synchronize the access to that list.
Otherwise you can use Collections.synchronizedList() to get a List that can be used thread safely.
Vector is an old implementation of a synchronized List that is rarely used any more, because its internal implementation simply synchronizes every method. Generally you want to synchronize a sequence of operations instead. Otherwise you can get a ConcurrentModificationException when another thread modifies the list while you are iterating over it. In addition, synchronizing every method is not good from a performance point of view.
Also, even in a single-threaded environment, calling a synchronized method requires some extra work, so Vector is not a good choice in a single-threaded application either.
Just because a component is single-threaded doesn't mean that it cannot be used in a thread-safe context. Your application may have its own locking, in which case additional locking is redundant work.
Conversely, just because a component is thread safe, it doesn't mean that you cannot use it in an unsafe manner. Typically thread safety extends to a single operation. E.g. if you take an Iterator and call next() on a collection this is two operations and they are no longer thread safe when used in combination. You still have to use locking for Vector. Another simple example is
private Vector<Integer> vec =
vec.add(1);
int n = vec.remove(vec.size() - 1);
assert n == 1;
This is at least three operations; however, the number of things which can go wrong is much greater than you might suppose. This is why you end up doing your own locking and why the locking inside Vector might be redundant, even unwanted.
For your own interest:
vec can change at any point to another Vector or null
vec.add(2) can happen between any operation, changing the size and the last element.
vec.remove() can happen between any operation.
vec.add(null) can happen between any operation resulting in a possible NullPointerException
The vec can /* change */ in these places.
private Vector<Integer> vec =
vec.add(1); /* change*/
int n = vec.remove(vec.size() /* change*/ - 1);
assert n == 1;
In short, assuming that just because you used a thread safe collection your code is now thread safe is a big assumption.
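One way to make such a compound operation safe is client-side locking on the Vector itself. This works because Vector's own synchronized methods lock on the instance. The sketch below is only an illustration; VectorCompound, removeLast and drain are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: size() followed by remove() made atomic by holding the Vector's lock.
class VectorCompound {
    // Removes and returns the last element, or null if the vector is empty.
    static Integer removeLast(Vector<Integer> vec) {
        synchronized (vec) {   // same monitor that Vector's synchronized methods use
            if (vec.isEmpty()) {
                return null;
            }
            return vec.remove(vec.size() - 1);
        }
    }

    // Fills a vector with `elements` values, lets `threads` threads drain it
    // concurrently, and returns how many elements were removed in total.
    static int drain(int elements, int threads) {
        Vector<Integer> vec = new Vector<>();
        for (int i = 0; i < elements; i++) {
            vec.add(i);
        }
        AtomicInteger removed = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                while (removeLast(vec) != null) {
                    removed.incrementAndGet();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return removed.get();
    }

    public static void main(String[] args) {
        System.out.println(drain(10_000, 4));
    }
}
```

Without the synchronized block, two threads could both read the same size() and one of the remove calls would then fail with an ArrayIndexOutOfBoundsException.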
A common pattern which breaks is
for(int n : vec) {
// do something.
}
Looks harmless enough, except that it expands to
for (Iterator<Integer> iter = vec.iterator(); /* change */ iter.hasNext(); ) {
/* change */ int n = iter.next();
// do something.
}
I have marked with /* change */ where another thread could change the collection, meaning this loop can get a ConcurrentModificationException (but might not).
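Two common ways around this, sketched under the assumption that all writers go through the Vector's own methods: hold the Vector's lock for the whole loop, or copy it under the lock and iterate over the private copy. VectorIteration and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

// Sketch only: two ways to iterate a Vector while another thread is adding to it.
class VectorIteration {
    // Option 1: hold the vector's lock for the whole loop (writers wait meanwhile).
    static long sumLocked(Vector<Integer> vec) {
        synchronized (vec) {
            long sum = 0;
            for (int n : vec) {
                sum += n;
            }
            return sum;
        }
    }

    // Option 2: copy under the lock, then iterate the snapshot without blocking writers.
    static long sumSnapshot(Vector<Integer> vec) {
        List<Integer> snapshot;
        synchronized (vec) {
            snapshot = new ArrayList<>(vec);
        }
        long sum = 0;
        for (int n : snapshot) {
            sum += n;
        }
        return sum;
    }

    // A writer adds 1..count while this thread keeps iterating; returns the final sum.
    static long demo(int count) {
        Vector<Integer> vec = new Vector<>();
        Thread writer = new Thread(() -> {
            for (int i = 1; i <= count; i++) {
                vec.add(i);
            }
        });
        writer.start();
        while (writer.isAlive()) {
            sumLocked(vec);      // neither call can hit a ConcurrentModificationException
            sumSnapshot(vec);
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return sumLocked(vec);
    }

    public static void main(String[] args) {
        System.out.println(demo(10_000));
    }
}
```

Option 1 is simpler but blocks writers for as long as the loop body runs. Option 2 trades memory for shorter lock hold times.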
there is no Synchronization
The JVM doesn't know there is no need for synchronization and so it still has to do something. It has an optimisation to reduce the cost of uncontended locks, but it still has to do work.
You need to understand the basic concept to know the answer to your questions above.
When we say ArrayList is not synchronized and Vector is, we mean that the methods in those classes (like add(), get(), remove() etc.) are synchronized in the Vector class and not in the ArrayList class. These methods act upon the data being stored.
So the data saved in a Vector cannot be edited or read in parallel, as the add, get and remove methods are synchronized, while with an ArrayList the same can be done in parallel, as these methods are not synchronized.
Skipping the locking makes ArrayList fast and Vector slower. This holds whether you use them in a multithreaded or a single-threaded environment, because Vector acquires the lock either way.
Hope this answers your question.

How can I stop two threads colliding when accessing java ArrayList?

I have two threads which both need to access an ArrayList<short[]> instance variable.
One thread is going to asynchronously add short[] items to the list via a callback when new data has arrived : void dataChanged(short[] theData)
The other thread is going to periodically check if the list has items and if it does it is going to iterate over all the items, process them, and remove them from the array.
How can I set this up to guard for collisions between the two threads?
This contrived code example currently throws a java.util.ConcurrentModificationException
//instance variables
private ArrayList<short[]> list = new ArrayList<short[]>();
//asynchronous callback happening on the thread that adds the data to the list
void dataChanged(short[] theData) {
list.add(theData);
}
//thread that iterates over the list and processes the current data it contains
Thread thread = new Thread(new Runnable() {
@Override
public void run() {
while (true) {
for(short[] item : list) {
//process the data
}
//clear the list to discard data which has been processed.
list.clear();
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
});
You might want to use a producer consumer queue like an ArrayBlockingQueue instead or a similar concurrent collection.
The producer–consumer problem (also known as the bounded-buffer problem) is a classic example of a multi-process synchronization problem. The problem describes two processes, the producer and the consumer, who share a common, fixed-size buffer used as a queue. The producer's job is to generate a piece of data, put it into the buffer and start again. At the same time, the consumer is consuming the data (i.e., removing it from the buffer) one piece at a time. The problem is to make sure that the producer won't try to add data into the buffer if it's full and that the consumer won't try to remove data from an empty buffer.
One thread offers short[]s and the other take()s them.
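A minimal sketch of that idea, assuming a bounded ArrayBlockingQueue and a sentinel array to tell the consumer to stop. ShortArrayPipeline, POISON and run are hypothetical names, and the counter is only a stand-in for the real processing.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch only: one producer puts short[] batches, one consumer takes them.
class ShortArrayPipeline {
    // Sentinel telling the consumer to stop; compared by identity, never by content.
    private static final short[] POISON = new short[0];

    // Sends `batches` arrays through the queue and returns how many were processed.
    static int run(int batches) {
        BlockingQueue<short[]> queue = new ArrayBlockingQueue<>(16);
        int[] processed = {0};   // written only by the consumer, read after join()
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    short[] data = queue.take();   // blocks while the queue is empty
                    if (data == POISON) {
                        return;
                    }
                    processed[0]++;                // stand-in for "process the data"
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (int i = 0; i < batches; i++) {
                queue.put(new short[] {(short) i});   // blocks while the queue is full
            }
            queue.put(POISON);
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while producing", e);
        }
        return processed[0];
    }

    public static void main(String[] args) {
        System.out.println(run(1_000));
    }
}
```

In the original setup, dataChanged would do the put and the processing thread would do the take, so there is no list to iterate over or clear and the sleep loop goes away.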
The easiest way is to change the type of list to a thread safe list implementation:
private List<short[]> list = new CopyOnWriteArrayList<short[]>();
Note that this type of list is not extremely efficient if you mutate it a lot (add/remove) - but if it works for you that's a simple solution.
If you need more efficiency, you can use a synchronized list instead:
private List<short[]> list = Collections.synchronizedList(new ArrayList<short[]>());
But you will need to synchronize for iterating:
synchronized(list) {
for(short[] item : list) {
//process the data
}
}
EDIT: proposals to use a BlockingQueue are probably better but would need more changes in your code.
You might look into a BlockingQueue for this instead of an ArrayList.
Take a look at Java's synchronization support.
This page covers making a group of statements synchronized on a specified object. That is: only one thread may execute any sections synchronized on that object at once, all others have to wait.
You can use synchronized blocks, but I think the best solution is to not share mutable data between threads at all.
Have each thread write to its own space, then collect and aggregate the results when the workers are finished.
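Roughly, that could look like the sketch below: every worker fills its own local list and the caller merges the lists once all workers are done. PerWorkerResults and collect are hypothetical names, and the integers are stand-ins for real results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: no list is shared while the workers run, so nothing needs locking.
class PerWorkerResults {
    static List<Integer> collect(int workers, int perWorker) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<List<Integer>>> tasks = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int base = w * perWorker;
                tasks.add(() -> {
                    List<Integer> local = new ArrayList<>();   // visible to this task only
                    for (int i = 0; i < perWorker; i++) {
                        local.add(base + i);
                    }
                    return local;
                });
            }
            List<Integer> all = new ArrayList<>();
            // invokeAll waits for every task and returns the futures in task order.
            for (Future<List<Integer>> result : pool.invokeAll(tasks)) {
                all.addAll(result.get());
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(collect(4, 1_000).size());
    }
}
```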
http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#synchronizedList%28java.util.List%29
You can ask the Collections class to wrap up your current ArrayList in a synchronized list.

Java Iterator Concurrency

I'm trying to loop over a Java iterator concurrently, but am having troubles with the best way to do this.
Here is what I have where I don't try to do anything concurrently.
Long l;
Iterator<Long> i = getUserIDs();
while (i.hasNext()) {
l = i.next();
someObject.doSomething(l);
anotheObject.doSomething(l);
}
There should be no race conditions between the things I'm doing on the non iterator objects, so I'm not too worried about that. I'd just like to speed up how long it takes to loop through the iterator by not doing it sequentially.
Thanks in advance.
One solution is to use an executor to parallelise your work.
Simple example:
ExecutorService executor = Executors.newCachedThreadPool();
Iterator<Long> i = getUserIDs();
while (i.hasNext()) {
final Long l = i.next();
Runnable task = new Runnable() {
public void run() {
someObject.doSomething(l);
anotheObject.doSomething(l);
}
};
executor.submit(task);
}
executor.shutdown();
This submits one task per item in the iterator. A cached thread pool creates new threads as needed and reuses idle ones, so a large iterator can spawn many threads. You can tune how many threads are used by using a different factory method on the Executors class, or subdivide the work as you see fit (e.g. a different Runnable for each of the method calls).
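If you want a hard limit on the number of threads and need to know when all the work is done, a variation along these lines may help. It is only a sketch: BoundedIteratorProcessing and processAll are hypothetical names, and adding the ids to a counter stands in for the doSomething calls.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: fixed-size pool plus awaitTermination to wait for completion.
class BoundedIteratorProcessing {
    static long processAll(Iterator<Long> ids, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        AtomicLong total = new AtomicLong();
        while (ids.hasNext()) {                  // the iterator is only touched by this thread
            final Long id = ids.next();
            executor.submit(() -> {
                total.addAndGet(id);             // stand-in for someObject.doSomething(l)
            });
        }
        executor.shutdown();                     // no new tasks; queued ones still run
        try {
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList(1L, 2L, 3L, 4L).iterator(), 2));
    }
}
```

Note that shutdown() alone returns immediately. It is awaitTermination that makes the caller wait until the submitted tasks have finished.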
I can offer two possible approaches:
Use a thread pool and dispatch the items received from the iterator to a set of processing threads. This will not accelerate the iterator operations themselves, since those would still happen in a single thread, but it will parallelize the actual processing.
Depending on how the iteration is created, you might be able to split the iteration process into multiple segments, each to be processed by a separate thread via a different Iterator object. For an example, have a look at the List.subList(int fromIndex, int toIndex) and List.listIterator(int index) methods.
This would allow the iterator operations to happen in parallel, but it is not always possible to segment the iteration like this, usually due to the simple fact that the items to be iterated over are not immediately available.
As a bonus trick, if the iteration operations are expensive or slow, such as those required to access a database, you might see a throughput improvement if you separate them out to a separate thread that will use the iterator to fill in a BlockingQueue. The dispatcher thread will then only have to access the queue, without waiting on the iterator object to retrieve the next item.
The most important advice in this case is this: "Use your profiler", usually to be followed by "Do not optimise prematurely". By using a profiler, such as VisualVM, you should be able to ascertain the exact cause of any performance issues, without taking shots in the dark.
If you are using Java 7, you can use the new fork/join; see the tutorial.
Not only does it split automatically the tasks among the threads, but if some thread finishes its tasks earlier than the other threads, it "steals" some tasks from the other threads.
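For a feel of what fork/join code looks like, here is a small sketch that sums an array of ids by splitting it in halves. It assumes the ids are already in an array rather than behind an Iterator; SumTask, THRESHOLD and sum are hypothetical names.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch only: divide-and-conquer sum with the Java 7 fork/join framework.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;   // below this size, just loop
    private final long[] ids;
    private final int from;
    private final int to;

    SumTask(long[] ids, int from, int to) {
        this.ids = ids;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += ids[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(ids, from, mid);
        SumTask right = new SumTask(ids, mid, to);
        left.fork();                            // queue the left half; idle workers may steal it
        return right.compute() + left.join();   // do the right half here, then wait for the left
    }

    static long sum(long[] ids) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(ids, 0, ids.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] ids = new long[10_000];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i + 1;
        }
        System.out.println(sum(ids));
    }
}
```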

what to use in multithreaded environment; Vector or ArrayList

I have this situation:
A web application with about 200 concurrent requests (threads) needs to log something to the local filesystem. I have one class to which all threads place their calls, and that class internally stores messages in one array (Vector or ArrayList), which in turn will be written to the filesystem.
The idea is to return from the thread's call ASAP so the thread can do its job as fast as possible; what the thread wanted to log can be written to the filesystem later, it is not so crucial.
So that class in turn removes the first element from that list and writes it to the filesystem, while in real time there are 10 or 20 threads appending new logs at the end of that list.
I would like to use ArrayList since it is not synchronized and therefore the threads' calls will take less time. The question is:
Am I risking deadlocks / data loss? Is it better to use Vector since it is thread-safe? Is it slower to use Vector?
Actually both ArrayList and Vector are very bad choices here, not because of synchronization (which you would definitely need), but because removing the first element is O(n).
The perfect data structure for your purpose is the ConcurrentLinkedQueue: it offers both thread safety (without using synchronization), and O(1) adding and removing.
Are you limited to a particular (old) Java version? If not, please consider using java.util.concurrent.LinkedBlockingQueue for this kind of stuff. It's really worth looking at the java.util.concurrent.* package when dealing with concurrency.
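As a rough illustration of the queue-based logger: request threads only enqueue a message and return, and a single writer thread takes messages off the queue. In this sketch an in-memory list stands in for the file, and AsyncLog, STOP and demo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: many callers enqueue, one writer thread drains.
class AsyncLog {
    private static final String STOP = new String("STOP");   // sentinel, compared by identity
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> sink = new ArrayList<>();     // stand-in for the file; writer thread only
    private final Thread writer = new Thread(this::drain);

    void start() {
        writer.start();
    }

    void log(String message) {
        queue.offer(message);   // the queue is unbounded, so callers never block here
    }

    // Asks the writer to finish, waits for it, and returns everything it wrote.
    List<String> stop() {
        queue.offer(STOP);
        join(writer);
        return sink;
    }

    private void drain() {
        try {
            for (String m = queue.take(); m != STOP; m = queue.take()) {
                sink.add(m);    // a real logger would write to the filesystem here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // Lets `threads` callers log `perThread` messages each; returns how many were written.
    static int demo(int threads, int perThread) {
        AsyncLog log = new AsyncLog();
        log.start();
        List<Thread> callers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread caller = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    log.log("message " + i);
                }
            });
            callers.add(caller);
            caller.start();
        }
        for (Thread caller : callers) {
            join(caller);
        }
        return log.stop().size();
    }

    public static void main(String[] args) {
        System.out.println(demo(20, 500));
    }
}
```

An unbounded queue keeps callers fast but can grow without limit if the writer falls behind. A bounded queue with put would apply back-pressure instead.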
Vector is worse than useless. Don't use it even when multithreading. A trivial example of why it's bad is to consider two threads iterating over the list and removing elements at the same time. The methods size(), get(), remove() might all be synchronized but the iteration loop is not atomic so - kaboom. One thread is bound to try removing something which is not there, or skip elements because the size() changes.
Instead use synchronized() blocks where you expect two threads to access the same data.
private ArrayList myList;
void removeElement(Object e)
{
synchronized (myList) {
myList.remove(e);
}
}
Java 5 provides explicit Lock objects which allow more fine-grained control, such as being able to time out if a resource does not become available within some time period.
private final Lock lock = new ReentrantLock();
private ArrayList myList;
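// tryLock with a timeout can be interrupted while waiting, hence the throws clause
void removeElement(Object e) throws InterruptedException {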
if (!lock.tryLock(1, TimeUnit.SECONDS)) {
// Timeout
throw new SomeException();
}
try {
myList.remove(e);
}
finally {
lock.unlock();
}
}
There actually is a marginal difference in performance between a synchronized list and a Vector. (http://www.javacodegeeks.com/2010/08/java-best-practices-vector-arraylist.html)
