I have an array to which many threads are writing. However each thread has a pre-assigned range of indices which it may write to. Further, nothing will be reading from the array until all threads are done.
So far, so thread-safe. The problem arises when I need to expand the array, by which of course I mean swapping it out for a larger array into which the old contents are copied. This is only done occasionally (similar to an ArrayList).
Currently I'm acquiring a lock for every single write to the array. Even though there is no need to lock in order to keep the array consistent, I'm having to lock in case the array is currently being copied/swapped.
As there are very many writes I don't want to require a lock for them. I'm okay with a solution which requires locking for writer threads only while the array is being copied and swapped, as this is infrequent.
But I can't just impose write locks only when the copy/swap is in progress, as threads may already be committing writes to the old array.
I think I need some variety of barrier which waits for all writes to complete, then pauses the threads while I copy/swap the array. But CyclicBarrier would require me to know exactly how many threads are currently active, which is non-trivial and possibly susceptible to edge-cases in which the barrier ends up waiting forever, or lowers itself too early. In particular I'm not sure how I'd deal with a new thread coming in while the barrier is already up, or how to deal with threads which are currently polling a job queue, so will never decrement the barrier count while there are no new jobs.
I may have to implement something which (atomically) counts active threads and tries to pre-empt all the edge cases.
But this may well be a "solved" problem that I don't know about, so I'm hoping there may be a simpler (therefore better) solution than the Cyclic barrier/thread counting. Ideally one which uses an existing utility class.
By the way, I've considered CopyOnWriteArrayList. This is no use to me, as it copies on every write (of which there are a lot), not just on array expansions.
Also note the structure written to pretty much has to be an array, or array-based.
Thanks
Although it's technically not correct, you can probably use a ReadWriteLock. The threads that are writing to a single portion all use a read lock (this is the technically incorrect part, they're not reading...), and the resize uses a write lock. That way, all writing threads can work together. A resize has to wait until all portioned writes are done, which then blocks the entire array. Once that is done, all portioned writes can continue.
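A minimal sketch of this approach, assuming a simple wrapper class around the array (all names here are made up for illustration):

    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Element writes take the read lock (shared); the resize takes the write lock (exclusive).
    class ResizableWriteArray {
        private final ReadWriteLock lock = new ReentrantReadWriteLock();
        private volatile Object[] array;

        ResizableWriteArray(int initialSize) {
            array = new Object[initialSize];
        }

        // Called very often by the worker threads; they never block each other.
        void set(int index, Object value) {
            lock.readLock().lock();
            try {
                array[index] = value;
            } finally {
                lock.readLock().unlock();
            }
        }

        // Called rarely; waits for all in-flight set() calls, then swaps the array.
        void grow(int newSize) {
            lock.writeLock().lock();
            try {
                Object[] bigger = new Object[newSize];
                System.arraycopy(array, 0, bigger, 0, array.length);
                array = bigger;
            } finally {
                lock.writeLock().unlock();
            }
        }
    }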
There is a solution that involves some overhead, but no locking.
But first, I would recommend using a 2-D array (an array of arrays) unless you absolutely need a 1-D array. You can then expand the top-level array without affecting the contents of the lower-level arrays. You can also write a wrapper class for this to access the whole thing using 1-D indices if you wish.
But if you really want to have a 1-D array, I would recommend the following:
I am assuming each thread knows a number that uniquely identifies it and can be converted to a small index (otherwise, I don't see how you index into the main array).
I also assume you have a reference to the main array called mainArray, which is statically accessible, though it could also be injected into the threads. It should be declared volatile.
You need another array, currentArrays, of length numberOfThreads, also available to all of the threads. Each element will hold a reference to the main array that the corresponding thread is currently using.
When you need to grow the array, allocate a new array and write its reference to mainArray. You don't need to copy anything at this point.
Before accessing the main array in your threads you need to grab a local reference to it (i.e., a local variable) by assigning from mainArray.
Then compare the grabbed reference with the reference in currentArrays. If it is the same, carry on, being careful to use the local reference.
If it is different, call a method (that you will write) to copy the part of the previous array for your thread to the new array and then carry on as before. Write the new array reference to currentArrays for that thread. Again, use the local reference until you are done.
The old array should become eligible for garbage collection once all of the threads have finished copying their parts of it, which will not happen until every thread has had at least one request that requires the array.
There will be some initialisation code for first time use which should be obvious (all currentArrays elements are set to mainArray).
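A rough, untested sketch of this scheme (all names and sizes here are illustrative, and a single 1-D long[] is assumed):

    // Illustrative sketch of the lazy-copy scheme described above.
    class LazyGrowArray {
        static final int NUMBER_OF_THREADS = 8;          // assumed, known in advance

        // The "current" main array; replaced (not copied) when growing.
        static volatile long[] mainArray = new long[1024];

        // Per-thread record of which array instance that thread last used.
        static final long[][] currentArrays = new long[NUMBER_OF_THREADS][];

        static {
            for (int i = 0; i < NUMBER_OF_THREADS; i++) {
                currentArrays[i] = mainArray;             // first-time initialisation
            }
        }

        // Called by the single "grower": allocate and publish, copy nothing yet.
        static void grow(int newSize) {
            mainArray = new long[newSize];
        }

        // Called by worker thread threadId before a batch of writes.
        // rangeStart/rangeEnd is the slice of indices owned by this thread.
        static long[] acquire(int threadId, int rangeStart, int rangeEnd) {
            long[] local = mainArray;                     // grab once into a local
            long[] known = currentArrays[threadId];
            if (local != known) {
                // The array was swapped: copy only this thread's slice forward.
                System.arraycopy(known, rangeStart, local, rangeStart,
                                 rangeEnd - rangeStart);
                currentArrays[threadId] = local;
            }
            return local;                                 // use this local reference
        }
    }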
I believe this should work. There is obviously the overhead of comparing array references before you can access the array; however, if you do a lot of array accesses in a single transaction/request you can save the array reference that you grabbed, pass it around and only recheck when you need to grab it again. That should reduce the overhead.
Disclaimer: I haven't tested it. Comments welcome.
This is what I'm trying to implement:
A (singleton) array of fixed size (say 1000 elements)
A pool of threads writing smaller (<=100) element blocks to that array in parallel
We are guaranteed that the threads in the pool will write fewer than 1000 elements in total, so we never have to grow the array.
The order of writes doesn't matter, but they have to be contiguous, e.g. Thread 1 populates array indexes 0-49, Thread 3 indexes 50-149, Thread 2 indexes 150-200.
Is there a thread-safe data structure to achieve this?
Clearly, I would need to synchronize the "index manager" which allocates where in the array indexes a given thread needs to write. But is there a Java data structure for the array itself that can be used for this, without worrying about thread safety?
You should be able to use an AtomicReferenceArray. You can safely update indexes, or update atomically with compareAndSet (though it appears you won't need that).
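For example, a minimal sketch where two workers fill disjoint slices that an index manager would normally hand out (the ranges are hard-coded here just for illustration):

    import java.util.concurrent.atomic.AtomicReferenceArray;

    public class BlockWriteDemo {
        public static void main(String[] args) throws InterruptedException {
            AtomicReferenceArray<String> shared = new AtomicReferenceArray<>(1000);

            // Two workers, each owning a contiguous slice of indexes.
            Thread t1 = new Thread(() -> fill(shared, 0, 50));
            Thread t2 = new Thread(() -> fill(shared, 50, 150));
            t1.start(); t2.start();
            t1.join();  t2.join();          // join also gives a happens-before edge

            System.out.println(shared.get(0) + " ... " + shared.get(149));
        }

        static void fill(AtomicReferenceArray<String> a, int from, int to) {
            for (int i = from; i < to; i++) {
                a.set(i, "value-" + i);     // volatile-semantics write per index
            }
        }
    }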
Editing to address akhil_mittal's question.
Let's switch the train of thought from updating an array to updating individual fields. If you update a field in a class, the write occurs without word tearing: it will never be the case that the stored value ends up as some bits from one thread and some bits from another. The same is true for array elements.
However, if you update a field in a class from multiple threads, the write from one thread may not be immediately visible to another thread. That is because the write may be buffered in a processor cache and only eventually flushed to the other processors. The same is true for an array write to a particular index: it will eventually become visible, but there is no happens-before ordering to guarantee when.
"Do we still need to be concerned about thread safety?"
You would need to worry about thread-safety the same way you would need to worry about thread-safety for a non-volatile field. It turns out that DVK may not need to worry about the writes being immediately visible.
The point of this answer is to explain that array writes are not necessarily thread-safe and using an AtomicReferenceArray can protect you from delayed writes.
Your question has been answered already by others so I'll just add examples:
Adding to an array by different threads is the way parallel sort works.
Creating arrays with the Fork/Join framework does so by the work-threads writing to different parts of the array.
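For instance, here is a toy RecursiveAction sketch (not the JDK's own sort code) in which Fork/Join worker threads fill disjoint ranges of one shared array:

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveAction;

    // Toy example: worker threads fill disjoint ranges of one shared array.
    class FillTask extends RecursiveAction {
        private final int[] array;
        private final int from, to;

        FillTask(int[] array, int from, int to) {
            this.array = array;
            this.from = from;
            this.to = to;
        }

        @Override
        protected void compute() {
            if (to - from <= 1_000) {
                for (int i = from; i < to; i++) {
                    array[i] = i * i;            // each index written by one task only
                }
            } else {
                int mid = (from + to) >>> 1;
                invokeAll(new FillTask(array, from, mid),
                          new FillTask(array, mid, to));
            }
        }

        public static void main(String[] args) {
            int[] data = new int[1_000_000];
            ForkJoinPool.commonPool().invoke(new FillTask(data, 0, data.length));
            System.out.println(data[123]);       // 15129
        }
    }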
Go ahead and do it, you're fine.
I know that if two threads are writing to the same place I need to make sure they do it in a safe way, but what if just one thread does all the writing while another only reads?
In my case I'm using a thread in a small game for the first time, to keep the updating apart from the rendering. The class that does all the rendering never writes to anything it reads, so I am no longer sure whether I need to guard every read and write of everything they both share.
I will take the right steps to make sure the renderer does not try to read anything that no longer exists, but when calling getters on things like the player and other entities, should I be treating them in the same way? Or would making values like the x, y coordinates and booleans like "alive" volatile do the trick?
My understanding of this has become very murky and could do with some enlightening.
Edit: The shared data will be anything that needs to be drawn and moved and stored in lists of objects.
For example, the player and other entities.
With the given information it is not possible to specify an exact solution, but it is clear that you need some way to synchronize between the threads. The issue is that as long as the write operations are not atomic, you could be reading data at the very moment it is being updated. This means you could, for instance, get an old y-coordinate with a new x-coordinate.
Basically, you only do not need to worry about synchronization if both threads are merely reading the information or, even better, if all the data structures are immutable (so neither thread can modify the objects). The best way to proceed is to first think about which operations need to be atomic, and then create a solution that makes those operations atomic.
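For example, one common way to make such a compound update atomic without locking every getter is to publish an immutable snapshot through a volatile field; a minimal sketch, assuming a player with x/y coordinates:

    // The update thread builds a new immutable Position and publishes it; the render
    // thread always sees a consistent x/y pair, never a half-finished update.
    final class Position {
        final float x, y;
        Position(float x, float y) { this.x = x; this.y = y; }
    }

    class Player {
        private volatile Position position = new Position(0, 0);
        private volatile boolean alive = true;

        // Called only by the update thread.
        void moveTo(float x, float y) {
            position = new Position(x, y);   // single atomic reference write
        }

        // Called by the render thread.
        Position position() { return position; }
        boolean isAlive()   { return alive; }
    }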
Don't forget: get it working, get it right, get it optimized (in that order).
You could have problems in this case if the lists' sizes are variable and you don't synchronize access to them. Consider this:
The read-only thread reads mySharedList's size and sees that it is 15; at that moment its CPU time slice ends and the read-write thread is given the CPU.
The read-write thread deletes an element from the list; its size is now 14.
The read-only thread is granted CPU time again and tries to read the last element using the (now obsolete) size it read before being interrupted: you get an IndexOutOfBoundsException.
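In code, the unsafe pattern and one possible fix might look like this (a sketch; Sprite is just a stand-in for whatever is being drawn):

    import java.util.List;

    class Renderer {
        interface Sprite { void draw(); }          // stand-in for the drawn objects

        // Unsafe: size() and get() are two separate steps; the list can shrink
        // in between, and get() then throws IndexOutOfBoundsException.
        void drawLastUnsafe(List<Sprite> mySharedList) {
            int last = mySharedList.size() - 1;
            mySharedList.get(last).draw();
        }

        // One fix: hold one lock across the compound read; the writer thread
        // must synchronize on the same list when it adds or removes elements.
        void drawLastSafe(List<Sprite> mySharedList) {
            synchronized (mySharedList) {
                if (!mySharedList.isEmpty()) {
                    mySharedList.get(mySharedList.size() - 1).draw();
                }
            }
        }
    }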
I have to implement a queue to which objects will be added and removed by two different threads at different times, based on some factor. My problem is that the requirement says the queue (the queue itself plus the data it holds) should not take more than 200 KB. If the size reaches 200 KB, a thread should wait for space to become available before pushing more data. The objects pushed vary in size. I can create a Java queue, but its size will report the number of objects pushed rather than the total memory used. How do I determine the total size of the data my queue is referring to?
Consider the object pushed as
    class A {
        int x;
        byte[] buf; // array size varies per object
    }
There is no out of the box functionality for this in Java. (In part, because there is no easy way to know if the objects added to the collection are referenced elsewhere and therefore if adding them takes up additional memory.)
For your use case, you would probably be best off just subclassing a queue class. Override the add method so it adds the size of the object to a counter (obviously you will have to make this calculation thread-safe) and throws an IllegalStateException if there isn't room. Similarly, decrement the counter in an overridden remove method.
The method of determining how much space to add to the counter could vary. Farlan suggested using this and that looks like it would work. But since you say you are dealing with a byte array, the size of the data you are adding might already be known to you. You will also have to decide whether you want to account for any of the overhead: the object takes some space, as does the reference inside the queue itself, plus the queue object. You could figure out exact values for those, but since it seems like your requirement is just to prevent running out of memory, you could probably use rough estimates as long as you are consistent.
The details of what queue class you want to subclass may depend on how much contention you think there will be between the threads. But it sounds like you have a handle on the sync issues.
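A rough sketch of that idea, assuming the payload size can be estimated from buf.length; it blocks the producer (as the question requires) rather than throwing, and all names are illustrative:

    import java.util.ArrayDeque;

    // Illustrative sketch: a queue bounded by total payload bytes rather than
    // element count. Blocks the producer while the byte budget is exhausted.
    class ByteBoundedQueue {
        static class A {
            int x;
            byte[] buf;                       // size varies per object
        }

        private final ArrayDeque<A> queue = new ArrayDeque<>();
        private final long maxBytes;
        private long usedBytes = 0;

        ByteBoundedQueue(long maxBytes) { this.maxBytes = maxBytes; }

        // Rough per-element cost: the byte payload plus a small overhead estimate.
        private static long sizeOf(A a) { return a.buf.length + 32; }

        public synchronized void put(A a) throws InterruptedException {
            long size = sizeOf(a);
            while (usedBytes + size > maxBytes) {
                wait();                       // wait for the consumer to free space
            }
            queue.addLast(a);
            usedBytes += size;
            notifyAll();
        }

        public synchronized A take() throws InterruptedException {
            while (queue.isEmpty()) {
                wait();
            }
            A a = queue.removeFirst();
            usedBytes -= sizeOf(a);
            notifyAll();                      // wake a producer waiting for space
            return a;
        }
    }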
If I synchronize on an array, does this mean that I'm synchronizing on all the elements in it or am I synchronizing on the array object? If the latter is true, then how can I synchronize on all the elements in an array at once so that I can make sure non will be accessed while executing a certain block?
E.g.
Let's say we have an array of bank accounts, and we want to make sure no thread can access any account while a certain block of code is being executed.
It synchronizes on the monitor for the array itself.
Even if you could synchronize on all the elements, that wouldn't ensure that they weren't accessed - because synchronization is only advisory.
The solution here is probably encapsulation: don't allow other code to see the array itself at all. That way you can control how other code is able to access the members of the array, via your own methods (like ArrayList does, for example).
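For the bank-account example, that encapsulation could look roughly like this (a sketch; class and method names are made up, and balances are kept as plain longs for simplicity):

    // The array is private, so every access goes through methods that synchronize
    // on one lock, and no caller can reach an element directly.
    class AccountStore {
        private final long[] balances;        // balances in cents, never exposed

        AccountStore(int numberOfAccounts) {
            balances = new long[numberOfAccounts];
        }

        synchronized void transfer(int from, int to, long amount) {
            balances[from] -= amount;
            balances[to]   += amount;
        }

        synchronized long balanceOf(int account) {
            return balances[account];
        }

        // The "certain block" from the question: nothing else can touch any
        // account while this runs, because all access funnels through this lock.
        synchronized long totalAssets() {
            long sum = 0;
            for (long b : balances) sum += b;
            return sum;
        }
    }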
Note that even if you do all of this, it won't stop other code from fetching the array element before your exclusive code starts running, and then using that reference while your exclusive code is running (e.g. mutating the object it refers to). You haven't really given us much information about what you're trying to do, but you may need to take a different approach.
Java 9 handles memory differently and, through VarHandles, gives you more options for synchronized access to individual array elements. For a detailed technical explanation, see "Using JDK 9 Memory Order Modes": http://gee.cs.oswego.edu/dl/html/j9mm.html
If this link goes away in the future, google "java varhandle" to learn more.
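For illustration, a tiny sketch of the VarHandle approach to volatile-style access and compare-and-set on individual array elements (Java 9+):

    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.VarHandle;

    public class ArrayElementAccess {
        // One VarHandle serves all int[] arrays; it addresses (array, index) pairs.
        private static final VarHandle INT_ARRAY =
                MethodHandles.arrayElementVarHandle(int[].class);

        public static void main(String[] args) {
            int[] data = new int[16];

            INT_ARRAY.setVolatile(data, 3, 42);            // volatile write to data[3]
            int v = (int) INT_ARRAY.getVolatile(data, 3);  // volatile read of data[3]

            // Atomic compare-and-set on a single element.
            boolean swapped = INT_ARRAY.compareAndSet(data, 3, 42, 7);

            System.out.println(v + " " + swapped + " " + data[3]);  // 42 true 7
        }
    }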
I'm wondering if there could be any problems while accessing one array with multiple threads but either only reading or only writing.
When the threads write to the array, it wouldn't matter in which order they write; even if they write to the same entry, all threads would write the same value.
For example, if I want to find prime numbers via the Sieve of Eratosthenes:
I create an array of consecutive numbers and set all multiples of prime numbers to 0 using multiple threads.
It wouldn't matter if the thread which strikes off the multiples of two and the thread which strikes off the multiples of 5 set the entry of the number 20 to 0 at the same time or one before or after the other.
So it's not a question of the quality or consistency of the data, but of whether it is technically possible to do this without running into any Java errors.
I'm assuming you mean 'without synchronization controls'. The short answer is no.
Synchronization is used for 2 reasons:
Mutual exclusion of data
Communication between threads
Your setup indicates that the first reason isn't really a problem in your case. The algorithm effectively separates the data out so that multiple worker threads won't be using the same data.
However, in order for changes done in one thread to become visible to another thread, you must use synchronization. Without synchronization, the JVM makes no guarantee as to the ordering of writes. Updates that one thread makes may be visible in another thread at any time later, or even never. See Effective Java Item #66, and maybe look at the Java Concurrency in Practice book.
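In practice, that often just means establishing one happens-before edge at the end, for example by joining the worker threads before anything reads the results; a minimal sketch of the sieve setup from the question:

    // Workers strike off multiples in a shared array; Thread.join() in the main
    // thread creates the happens-before edge that makes their writes visible
    // before the results are read.
    public class ParallelSieve {
        public static void main(String[] args) throws InterruptedException {
            int limit = 100;
            boolean[] composite = new boolean[limit + 1];
            int[] smallPrimes = {2, 3, 5, 7};

            Thread[] workers = new Thread[smallPrimes.length];
            for (int t = 0; t < smallPrimes.length; t++) {
                final int p = smallPrimes[t];
                workers[t] = new Thread(() -> {
                    for (int m = p * p; m <= limit; m += p) {
                        composite[m] = true;   // overlapping writes all write "true"
                    }
                });
                workers[t].start();
            }
            for (Thread w : workers) {
                w.join();                      // happens-before: writes now visible
            }

            for (int i = 2; i <= limit; i++) {
                if (!composite[i]) System.out.print(i + " ");
            }
        }
    }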
I don't think it would work, since eventually you need to read the variables (to output them, save them to disk, etc.), and that read has to be synchronized in order to guarantee correct inter-thread operation ordering. Remember that without synchronization Java only guarantees intra-thread operation ordering.
Now, you could say that you don't want to read them at all, but in that case the JVM is free to optimize the whole computation away.