I'm trying to understand the difference in behaviour between an ArrayList and a Vector. Does the following snippet in any way illustrate the difference in synchronization? The output for the ArrayList (f1) is unpredictable, while the output for the Vector (f2) is predictable. I think it may just be luck that f2 has predictable output, because modifying f2 slightly to make the thread sleep for even a millisecond (f3) causes an empty vector! What's causing that?
import java.util.ArrayList;
import java.util.Vector;

public class D implements Runnable {
    ArrayList<Integer> al;
    Vector<Integer> vl;

    public D(ArrayList<Integer> al_, Vector<Integer> vl_) {
        al = al_;
        vl = vl_;
    }

    public void run() {
        if (al.size() < 20)
            f1();
        else
            f2();
    } // 1

    public void f1() {
        if (al.size() == 0)
            al.add(0);
        else
            al.add(al.get(al.size() - 1) + 1);
    }

    public void f2() {
        if (vl.size() == 0)
            vl.add(0);
        else
            vl.add(vl.get(vl.size() - 1) + 1);
    }

    public void f3() {
        if (vl.size() == 0) {
            try {
                Thread.sleep(1);
                vl.add(0);
            } catch (InterruptedException e) {
                System.out.println(e.getMessage());
            }
        } else {
            vl.add(vl.get(vl.size() - 1) + 1);
        }
    }

    public static void main(String... args) {
        Vector<Integer> vl = new Vector<Integer>(20);
        ArrayList<Integer> al = new ArrayList<Integer>(20);
        for (int i = 1; i < 40; i++) {
            new Thread(new D(al, vl), Integer.toString(i)).start();
        }
    }
}
To answer the question: yes, Vector is synchronized. This means that concurrent operations on the data structure itself won't lead to unexpected behavior (e.g. NullPointerExceptions or the like). Hence calls like size() are perfectly safe with a Vector in concurrent situations, but not with an ArrayList. (Note that if there are only read accesses, ArrayLists are safe too; problems start as soon as at least one thread writes to the data structure, e.g. via add/remove.)
The problem is that this low-level synchronization is basically completely useless, and your code already demonstrates this.
if (al.size() == 0)
    al.add(0);
else
    al.add(al.get(al.size() - 1) + 1);
What you want here is to add a number to your data structure depending on its current contents (i.e. if N threads execute this, in the end we'd want the list to contain the numbers [0..N)). Sadly, that does not work:
Assume that 2 threads execute this code sample concurrently on an empty list/vector. The following timeline is quite possible:
T1: size() # go to true branch of if
T2: size() # alas we again take the true branch.
T1: add(0)
T2: add(0) # ouch
Both execute size() and get back the value 0. They then both go into the true branch of the if and both add 0 to the data structure. That's not what you want.
Hence you'll have to synchronize in your business logic anyhow to make sure that size() and add() are executed atomically. The built-in synchronization of Vector is therefore quite useless in almost any scenario. (Contrary to some claims, on modern JVMs the performance hit of an uncontended lock is completely negligible, but the Collections API is much nicer, so why not use it?)
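As a minimal sketch of what "synchronize in your business logic" could look like: the class below is hypothetical (SequentialAppender, appendNext, snapshot and runWithThreads are names invented for illustration, not part of the question). It assumes every access to the list goes through the same lock, so the size check, the get and the add form one atomic unit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the question: guards the whole
// check-then-act sequence with a single lock.
class SequentialAppender {
    // A plain ArrayList is enough, because every access is synchronized on this.
    private final List<Integer> list = new ArrayList<>();

    // isEmpty(), get() and add() run as one atomic unit.
    public synchronized void appendNext() {
        if (list.isEmpty()) {
            list.add(0);
        } else {
            list.add(list.get(list.size() - 1) + 1);
        }
    }

    // Returns a copy so callers never touch the guarded list directly.
    public synchronized List<Integer> snapshot() {
        return new ArrayList<>(list);
    }

    // Starts threadCount threads that each call appendNext() once,
    // waits for all of them, and returns the resulting contents.
    static List<Integer> runWithThreads(int threadCount) {
        SequentialAppender appender = new SequentialAppender();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(appender::appendNext);
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return appender.snapshot();
    }

    public static void main(String[] args) {
        // With the compound action guarded, 40 threads yield 0..39 in order.
        System.out.println(runWithThreads(40));
    }
}
```

With the lock held across the whole sequence, the T1/T2 interleaving from the timeline above can no longer happen; whether the list underneath is a Vector or an ArrayList then makes no difference to correctness.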
In The Beginning (Java 1.0) there was the "synchronized vector".
Which entailed a potentially HUGE performance hit.
Hence the addition of "ArrayList" and friends in Java 1.2 onwards.
Your code illustrates the rationale for making vectors synchronized in the first place. But it's simply unnecessary most of the time, and better done in other ways most of the rest of the time.
IMHO...
PS:
An interesting link:
http://www.coderanch.com/t/523384/java/java/ArrayList-Vector-size-incrementation
Vectors are thread-safe; ArrayLists are not. That is why ArrayList is faster than Vector.
The below link has nice info about this.
http://www.javaworld.com/javaworld/javaqa/2001-06/03-qa-0622-vector.html
I'm trying to understand the difference in behaviour of an ArrayList
and a Vector
Vector is synchronized while ArrayList is not. ArrayList is not thread-safe.
Does the following snippet in any way illustrate the difference in
synchronization ?
No difference since only Vector is synchronized
I have been asked to implement fine-grained locking on a hashlist. I have done this using synchronized, but the question tells me to use Lock instead.
I have created a hashlist of objects in the constructor:
private LinkedList<E> data[];
private Lock dataLock[];
private Lock lock = new ReentrantLock();

// The constructors ensure that both the data and the dataLock are the same size
@SuppressWarnings("unchecked")
public ConcurrentHashList(int n){
    if(n > 1000) {
        data = (LinkedList<E>[])(new LinkedList[n/10]);
        dataLock = new Lock[n/10];
    }
    else {
        data = (LinkedList<E>[])(new LinkedList[100]);
        dataLock = new Lock[100];
    }
    for(int j = 0; j < data.length;j++) {
        data[j] = new LinkedList<E>();
        dataLock[j] = new ReentrantLock(); // Adding a lock to each bucket index
    }
}
The original method
public void add(E x){
    if(x != null){
        lock.lock();
        try{
            int index = hashC(x);
            if(!data[index].contains(x))
                data[index].add(x);
        }finally{lock.unlock();}
    }
}
Using synchronization to grab a handle on the per-index lock object, allowing multiple threads to work on multiple indexes concurrently.
public void add(E x){
    if(x != null){
        int index = hashC(x);
        synchronized (dataLock[index]) { // Getting handle before adding
            if(!data[index].contains(x))
                data[index].add(x);
        }
    }
}
I do not know how to implement it using Lock, though: I cannot lock a single element in an array, only the whole method, which means it is not fine grained.
Using an array of ReentrantLock
public void add(E x){
    if(x != null){
        int index = hashC(x);
        dataLock[index].lock();
        try {
            // Getting handle before adding
            if(!data[index].contains(x))
                data[index].add(x);
        }finally {dataLock[index].unlock();}
    }
}
The hash function
private int hashC(E x){
    int k = x.hashCode();
    int h = Math.abs(k % data.length);
    return(h);
}
Presumably, hashC() is a function that is highly likely to produce unique numbers. As in, you have no guarantee that the hashes are unique, but the incidence of non-unique hashes is extremely low. For a data structure with a few million entries, you have a literal handful of collisions, and any given collision always consists of only a pair or maybe 3 conflicts (2 to 3 objects in your data structure have the same hash, but not 'thousands').
Also, assumption: the hash for a given object is constant. hashC(x) will produce the same value no matter how many times you call it, assuming you provide the same x.
Then, you get some fun conclusions:
The 'bucket' (The LinkedList instance found at array slot hashC(x) in data) that your object should go into, is always the same - you know which one it should be based solely on the result of hashC.
Calculating hashC does not require a lock of any sort. It has no side effects whatsoever.
Thus, knowing which bucket you need for a given operation on a single value (Be it add, remove, or check-if-in-collection) can be done without locking anything.
Now, once you know which bucket you need to look at / mutate, okay, now locking is involved.
So, just have 1 lock for each bucket. Not a List<Object> locks[];, that's a whole list worth of locks per bucket. Just Object[] locks is all you need, or ReentrantLock[] locks if you prefer to use lock/unlock instead of synchronized (lock[bucketIdx]) { ... }.
This is effectively fine-grained: After all, the odds that one operation needs to twiddle its thumbs because another thread is doing something, even though that other thread is operating on a different object, is very low; it would require the two different objects to have a colliding hash, which is possible, but extremely rare - as per assumption #1.
NB: Note that therefore lock can go away entirely, you don't need it, unless you want to build into your code that the code may completely re-design its bucket structure. For example, 1000 buckets feels a bit meh if you end up with a billion objects. I don't think 'rebucket everything' is part of the task here, though.
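The one-lock-per-bucket idea can be sketched as a small self-contained class. This is only an illustration under stated assumptions: StripedHashList, bucketIndex, contains and size are hypothetical names not taken from the question, and Math.floorMod is swapped in for the question's Math.abs(k % data.length) as one way to get a non-negative index.

```java
import java.util.LinkedList;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: one ReentrantLock per bucket, no global lock.
class StripedHashList<E> {
    private final LinkedList<E>[] buckets;
    private final Lock[] locks;

    @SuppressWarnings("unchecked")
    StripedHashList(int bucketCount) {
        buckets = (LinkedList<E>[]) new LinkedList[bucketCount];
        locks = new Lock[bucketCount];
        for (int i = 0; i < bucketCount; i++) {
            buckets[i] = new LinkedList<>();
            locks[i] = new ReentrantLock();
        }
    }

    // Pure computation with no side effects, so no lock is needed here.
    private int bucketIndex(E x) {
        return Math.floorMod(x.hashCode(), buckets.length);
    }

    // Adds x if absent; only the bucket that x hashes to is locked.
    public boolean add(E x) {
        if (x == null) {
            return false;
        }
        int index = bucketIndex(x);
        locks[index].lock();
        try {
            if (buckets[index].contains(x)) {
                return false;
            }
            buckets[index].add(x);
            return true;
        } finally {
            locks[index].unlock();
        }
    }

    public boolean contains(E x) {
        if (x == null) {
            return false;
        }
        int index = bucketIndex(x);
        locks[index].lock();
        try {
            return buckets[index].contains(x);
        } finally {
            locks[index].unlock();
        }
    }

    // Locks one bucket at a time; the total is only a snapshot if
    // other threads are adding concurrently.
    public int size() {
        int total = 0;
        for (int i = 0; i < buckets.length; i++) {
            locks[i].lock();
            try {
                total += buckets[i].size();
            } finally {
                locks[i].unlock();
            }
        }
        return total;
    }
}
```

Two threads only wait on each other when their elements land in the same bucket, which is the fine-grained behaviour described above.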
Faced a project with this code:
public class IndexUpdater implements Runnable {
    @Override
    public void run() {
        final AtomicInteger count = new AtomicInteger(0);
        FindIterable<Document> iterable = mongoService.getDocuments(entryMeta, null, "guid");
        iterable.forEach(new Block<Document>() {
            @Override
            public void apply(final Document document) {
                count.incrementAndGet();
                // A lot of code....
                if (count.get() / 100 * 100 == count.get()) {
                    LOG.info(String.format("Processing: %s", count.get()));
                }
            }
        });
    }
}
Here I am interested in three lines of code:
if (count.get() / 100 * 100 == count.get()) {
    LOG.info(String.format("Processing: %s", count.get()));
}
Does this condition make sense considering multithreading and the type of the AtomicInteger variable? Or is this a pointless check?
Interestingly, IntelliJ IDEA does not flag this construct as meaningless.
I wouldn't call this code "meaningless", but rather wrong (or, it has probably-unintended semantics).
If this were being invoked in a multithreaded way, you wouldn't always get the same value for count.get() on the three invocations in the method (there are four if you include count.incrementAndGet()).
The consequences of this don't look catastrophic in this case - you'd perhaps miss a few logging statements, and you might see some unexpected messages like Processing 101, and then wonder why the number isn't a multiple of 100. But perhaps if you employed the same construct elsewhere, there would be more significant implications.
Put the result of count.incrementAndGet() into a variable (*), so you can use that afterwards.
But then, it would be easier to use value % 100 == 0 as well:
int value = count.incrementAndGet();
// A lot of code....
if (value % 100 == 0) {
    LOG.info(String.format("Processing: %s", value));
}
which is both correct (or, it is probably what is intended) and easier to read.
(*) Depending on what you actually want to show with this logging message, you might want to put the count.incrementAndGet() after "A lot of code".
Given that the add method definition in ArrayList is as follows:
public boolean add(E e) {
    ensureCapacity(size + 1); // Increments modCount!!
    elementData[size++] = e;
    return true;
}
Please find the following program to check the thread safety of ArrayList.
package pack4;
import java.util.ArrayList;

public class Demo {
    public static void main(String[] args) {
        ArrayList<String> al = new ArrayList<String>() ;
        new AddFirstElementThread(al).start() ;
        new RemoveFirstElementThread(al).start() ;
    }
}

class AddFirstElementThread extends Thread{
    ArrayList<String> list ;
    public AddFirstElementThread(ArrayList<String> l) {
        list = l ;
    }
    @Override
    public void run() {
        while(true){
            if(list.size() == 0){
                list.add("First element") ;
            }
        }
    }
}

class RemoveFirstElementThread extends Thread{
    ArrayList<String> list ;
    public RemoveFirstElementThread(ArrayList<String> l) {
        list = l ;
    }
    @Override
    public void run() {
        while(true){
            if(list.isEmpty()){
                try{
                    list.get(0) ;
                    System.out.println("Hence Proved, that ArrayList is not Thread-safe.");
                    System.exit(1) ;
                }catch (Exception e) {
                    //continue, if no value is there at index 0
                }
            }
        }
    }
}
But the program never terminates, and thus fails to prove that ArrayList is not thread-safe.
Please suggest a correct implementation to test the thread-safe behaviour of ArrayList and Vector.
Thanks & Best Regards,
Rits
ArrayList is not thread-safe; Vector is. You can wrap an ArrayList with Collections.synchronizedList() if you require it.
The point about unsafe code is that there is no guarantee how it will behave when multiple threads are used. You cannot guarantee that unsafe code will fail. This is because code is not written to be unsafe; it merely lacks any guarantee that it is safe. Thread safety can only be determined by reading and understanding the code.
The problem with thread safety is that it is very hard to prove experimentally. It's not easy to prove something is not thread-safe unless you know the exact edge case which will trigger an issue. Additionally, thread safety issues are more or less likely to show depending on the architecture of your system and the load on it, i.e. code can work fine for days and then fail unpredictably.
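For what it's worth, a stress test along the following lines tends to expose the problem more readily than the add/remove loop in the question, though (as noted above) it still proves nothing when the unsafe list happens to survive a run. ConcurrentAddCheck and sizeAfterConcurrentAdds are hypothetical names, and the thread and iteration counts are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Vector;

// Hypothetical stress test: many threads append to one shared list.
class ConcurrentAddCheck {
    // Runs threadCount threads that each add addsPerThread elements,
    // waits for them, and returns the final size of the list.
    static int sizeAfterConcurrentAdds(List<Integer> list, int threadCount, int addsPerThread) {
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                try {
                    for (int i = 0; i < addsPerThread; i++) {
                        list.add(i);
                    }
                } catch (RuntimeException e) {
                    // An unsynchronized list may fail outright, e.g. with an
                    // ArrayIndexOutOfBoundsException during a racing resize.
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        int threads = 8;
        int adds = 10_000;
        System.out.println("expected:         " + threads * adds);
        // Usually smaller than expected, but that is not guaranteed.
        System.out.println("ArrayList:        "
                + sizeAfterConcurrentAdds(new ArrayList<>(), threads, adds));
        System.out.println("Vector:           "
                + sizeAfterConcurrentAdds(new Vector<>(), threads, adds));
        System.out.println("synchronizedList: "
                + sizeAfterConcurrentAdds(Collections.synchronizedList(new ArrayList<>()), threads, adds));
    }
}
```

The Vector and synchronizedList runs always reach the expected size because each individual add() is synchronized. A shortfall in the ArrayList run demonstrates lost updates, but a matching count would not demonstrate safety.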
"Thread-safety" on Collections was a pretty bad idea to start with for the vast majority of cases, because it's way too narrow and you need some higher level synchronization anyhow.
In case you really want to remove the first element or add an element to a list, you are better off with for example this here, but your contrived example may need some higher synchronization anyhow (depends on what exactly the semantics should be - if you don't see why, it's probably a really good idea to read something about concurrency)
And finally just have a look at the concurrent framework.
This code should produce even and uneven output because there is no synchronized on any methods. Yet the output on my JVM is always even. I am really confused as this example comes straight out of Doug Lea.
public class TestMethod implements Runnable {
    private int index = 0;

    public void testThisMethod() {
        index++;
        index++;
        System.out.println(Thread.currentThread().toString() + " "
                + index );
    }

    public void run() {
        while(true) {
            this.testThisMethod();
        }
    }

    public static void main(String args[]) {
        int i = 0;
        TestMethod method = new TestMethod();
        while(i < 20) {
            new Thread(method).start();
            i++;
        }
    }
}
Output
Thread[Thread-8,5,main] 135134
Thread[Thread-8,5,main] 135136
Thread[Thread-8,5,main] 135138
Thread[Thread-8,5,main] 135140
Thread[Thread-8,5,main] 135142
Thread[Thread-8,5,main] 135144
I tried with volatile and got the following (with an if to print only if odd):
Thread[Thread-12,5,main] 122229779
Thread[Thread-12,5,main] 122229781
Thread[Thread-12,5,main] 122229783
Thread[Thread-12,5,main] 122229785
Thread[Thread-12,5,main] 122229787
Answer to comments:
The index is in fact shared, because we have one TestMethod instance but many threads that call testThisMethod() on that one instance.
Code (no changes besides the mentioned above):
public class TestMethod implements Runnable {
    volatile private int index = 0;

    public void testThisMethod() {
        index++;
        index++;
        if(index % 2 != 0){
            System.out.println(Thread.currentThread().toString() + " "
                    + index );
        }
    }

    public void run() {
        while(true) {
            this.testThisMethod();
        }
    }

    public static void main(String args[]) {
        int i = 0;
        TestMethod method = new TestMethod();
        while(i < 20) {
            new Thread(method).start();
            i++;
        }
    }
}
First of all: as others have noted, there's no guarantee at all that your threads get interrupted between the two increment operations.
Note that printing to System.out very likely forces some kind of synchronization on your threads. A thread has therefore most likely just started a time slice when it returns from println, so it will probably complete both increment operations and then wait for the shared System.out resource again.
Try replacing the System.out.println() with something like this:
int snapshot = index;
if (snapshot % 2 != 0) {
    System.out.println("Oh noes! " + snapshot);
}
You don't know that. The point of automatic scheduling is that it makes no guarantees. It might treat two threads that run the same code completely different. Or completely the same. Or completely the same for an hour and then suddenly different...
The point is, even if you fix the problems mentioned in the other answers, you still cannot rely on things coming out a particular way; you must always be prepared for any possible interleaving that the Java memory and threading model allows, and that includes the possibility that the println always happens after an even number of increments, even if that seems unlikely to you on the face of it.
The result is exactly as I would expect. index is being incremented twice between outputs, and there is no interaction between threads.
To turn the question around - why would you expect odd outputs?
EDIT: Whoops. I wrongly assumed a new runnable was being created per Thread, and therefore there was a distinct index per thread, rather than shared. Disturbing how such a flawed answer got 3 upvotes though...
You have not marked index as volatile. This means that the compiler is allowed to optimize accesses to it, and it probably merges your 2 increments to one addition.
You get the output of the very first thread you start, because this thread loops and gives no chance to other threads to run.
So you should Thread.sleep() or (not recommended) Thread.yield() in the loop.
I am wondering if it is possible to avoid the lost update problem, where multiple threads are updating the same data, while avoiding using synchronized(x) { }.
I will be doing numerous adds and increments:
val++;
ary[x] += y;
ary[z]++;
I do not know how Java will compile these into bytecode, or whether a thread could be interrupted in the middle of the bytecode for one of these statements. In other words, are those statements thread-safe?
Also, I know that the Vector class is synchronized, but I am not sure what that means. Will the following code be thread-safe, in that the value at position i will not change between the vec.get(i) and vec.set(...)?
class myClass {
    Vector<Integer> vec = new Vector<>();
    int value; // assumed field; not declared in the original snippet

    public void someMethod() {
        for (int i=0; i < vec.size(); i++)
            vec.set(i, vec.get(i) + value);
    }
}
Thanks in advance.
For the purposes of threading, ++ and += are treated as two operations (four for double and long). So updates can clobber one another. Not just by one: a scheduler acting at the wrong moment could wipe out milliseconds of updates.
java.util.concurrent.atomic is your friend.
Your code can be made safe, assuming you don't mind each element updating individually and you don't change the size(!), as:
for (int i=0; i < vec.size(); i++) {
    synchronized (vec) {
        vec.set(i, vec.get(i) + value);
    }
}
If you want to add resizing to the Vector you'll need to move the synchronized statement outside of the for loop, and you might as well just use plain new ArrayList. There isn't actually a great deal of use for a synchronised list.
But you could use AtomicIntegerArray:
private final AtomicIntegerArray ints = new AtomicIntegerArray(KNOWN_SIZE);
[...]
    int len = ints.length();
    for (int i=0; i<len; ++i) {
        ints.addAndGet(i, value);
    }
}
That has the advantage of no locks(!) and no boxing. The implementation is quite fun too, and you would need to understand it to do more complex updates (random number generators, for instance).
vec.set() and vec.get() are thread safe in that they will not set and retrieve values in such a way as to lose sets and gets in other threads. It does not mean that your set and your get will happen without an interruption.
If you're really going to be writing code like in the examples above, you should probably lock on something. And synchronized(vec) { } is as good as any. You're asking here for two operations to happen in sync, not just one thread safe operation.
Even with java.util.concurrent.atomic, a separate get followed by a set is still two operations, each of which is safe only on its own. You need to get-and-increment in one operation, such as incrementAndGet() or addAndGet().
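To make the "one operation" point concrete, here is a minimal sketch of the three updates from the question (val++, ary[x] += y, ary[z]++) written as single atomic read-modify-write calls. AtomicCounters, update and runConcurrently are hypothetical names, and the array size and thread counts are arbitrary assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch: lock-free equivalents of val++, ary[x] += y, ary[z]++.
class AtomicCounters {
    final AtomicInteger val = new AtomicInteger();
    final AtomicIntegerArray ary;

    AtomicCounters(int size) {
        ary = new AtomicIntegerArray(size);
    }

    // Each call below is one atomic read-modify-write, so no update is lost.
    // The three calls together are NOT atomic as a group.
    void update(int x, int y, int z) {
        val.incrementAndGet();   // val++
        ary.addAndGet(x, y);     // ary[x] += y
        ary.incrementAndGet(z);  // ary[z]++
    }

    // Runs threadCount threads that each perform updatesPerThread updates
    // on the same counters, then returns the counters once all have finished.
    static AtomicCounters runConcurrently(int threadCount, int updatesPerThread) {
        AtomicCounters counters = new AtomicCounters(2);
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < updatesPerThread; i++) {
                    counters.update(0, 2, 1);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        AtomicCounters c = runConcurrently(4, 10_000);
        // prints "40000 80000 40000"
        System.out.println(c.val.get() + " " + c.ary.get(0) + " " + c.ary.get(1));
    }
}
```

If several fields must change together as one consistent step, these per-variable atomics are not enough and a lock (or synchronized block) around the group is still needed.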