multiple threads accessing an ArrayList - java

I have an ArrayList that's used to buffer data so that other threads can read it.
This list constantly has data added to it, since it's reading from a UDP source, and the other threads constantly read from it. The data is then removed from the list.
This is not the actual code, but a simplified example:
public class PacketReader implements Runnable {
    public static ArrayList<Packet> buffer = new ArrayList<>();

    @Override
    public void run() {
        while (bActive) {
            // read from the UDP source and add data to the list
        }
    }
}
public class Player implements Runnable {
    @Override
    public void run() {
        // read a packet from the buffer
        // decode the packet
        // now for the problem:
        PacketReader.buffer.remove(packet); // the packet that has been read
    }
}
The remove() method removes a packet from the list and then shifts all the packets to its right one position left to fill the gap.
My concern is: since the buffer is constantly being added to and read from by multiple threads, could the remove() method cause issues, given that it has to shift packets to the left? I mean, if the add() or get() method gets called on that ArrayList at the same time that shift is being done, would it be a problem?
I do get an IndexOutOfBoundsException sometimes, with a message like: index: 100, size: 300. That's strange, because the index is within the size, so I want to know whether this may be causing the problem, or whether I should look for other problems.
Thank you.

It sounds like what you really want is a BlockingQueue. ArrayBlockingQueue is probably a good choice. If you need an unbounded queue and don't care about extra memory utilization (relative to ArrayBlockingQueue), LinkedBlockingQueue also works.
It lets you push items in and pop them out, in a thread-safe and efficient way. The behavior of those pushes and pops can differ (what happens when you try to push to a full queue, or pop from an empty one?), and the JavaDocs for the BlockingQueue interface have a table that shows all of these behaviors nicely.
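For the question's setup, the two threads might be wired together like this (a sketch only; Packet and the class names follow the question's simplified example, and the capacity of 1024 is an arbitrary choice):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal stand-in for the question's packet type.
class Packet {
    final int seq;
    Packet(int seq) { this.seq = seq; }
}

public class PacketPipeline {
    // Bounded: a full queue makes put() block, applying back-pressure to the reader.
    static final BlockingQueue<Packet> buffer = new ArrayBlockingQueue<>(1024);

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 5; i++) {          // stands in for the UDP read loop
                try {
                    buffer.put(new Packet(i));     // blocks if the buffer is full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        reader.start();
        for (int i = 0; i < 5; i++) {
            Packet p = buffer.take();              // blocks until a packet is available
            System.out.println("decoded packet " + p.seq);
        }
        reader.join();
    }
}
```

No explicit synchronization and no remove()-with-shift: take() hands you the head of the queue atomically.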
A thread-safe List (regardless of whether it comes from synchronizedList or CopyOnWriteArrayList) isn't actually enough, because your use case uses a classic check-then-act pattern, and that's inherently racy. Consider this snippet:
if (!list.isEmpty()) {
    Packet p = list.remove(0); // remove the first item
    process(p);
}
Even if list is thread-safe, this usage is not! What if list has one element during the "if" check, but then another thread removes it before you get to remove(0)?
You can get around this by synchronizing around both actions:
Packet p;
synchronized (list) {
    if (list.isEmpty()) {
        p = null;
    } else {
        p = list.remove(0);
    }
}
if (p != null) {
    process(p); // we don't want to call process(..) while still synchronized!
}
This is less efficient and takes more code than a BlockingQueue, though, so there's no reason to do it.

Yes, there would be problems, because ArrayList is not thread-safe: the internal state of the ArrayList object would be corrupted, and eventually you would see incorrect output or runtime exceptions. You can try using Collections.synchronizedList(List list), or, if it's a good fit, a CopyOnWriteArrayList.
This issue is the classic producer–consumer problem. You can see how people usually fix it by using a lock of some kind, with the threads taking turns extracting an object from a buffer (a List in your case). There are also thread-safe buffer implementations you could look at if you don't strictly need a List.
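A hand-rolled version of that lock-based approach might look like the following sketch (wait/notifyAll on a single monitor; in practice a BlockingQueue already does exactly this for you):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the classic lock-based producer-consumer buffer.
// The class name is made up for illustration.
class SharedBuffer<T> {
    private final Deque<T> items = new ArrayDeque<>();

    public synchronized void add(T item) {
        items.addLast(item);
        notifyAll(); // wake any consumer waiting in remove()
    }

    public synchronized T remove() throws InterruptedException {
        while (items.isEmpty()) {
            wait(); // release the lock and sleep until add() signals
        }
        return items.removeFirst();
    }
}
```

Because both methods synchronize on the same monitor, the "check then act" sequence inside remove() is atomic, which is exactly what the raw ArrayList lacked.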


What is the proper way to wait (block) until a LinkedBlockingQueue is nonempty, without mutating it? [duplicate]

I have a blocking queue of objects.
I want to write a thread that blocks until there is an object on the queue, similar to the functionality provided by BlockingQueue.take().
However, since I do not know whether I will be able to process the object successfully, I want to just peek() and not remove it. I want to remove the object only if I am able to process it successfully.
So, I would like a blocking peek() function. Currently, peek() just returns immediately if the queue is empty, as per the javadocs.
Am I missing something? Is there another way to achieve this functionality?
EDIT:
Any thoughts on whether I could just use a thread-safe queue and peek and sleep instead?
public void run() {
    try {
        while (!exit) {
            while (queue.size() != 0) {
                Object o = queue.peek();
                if (o != null) {
                    if (consume(o)) {
                        queue.remove();
                    } else {
                        Thread.sleep(10000); // need to back off (60s) and try again
                    }
                }
            }
            Thread.sleep(1000); // wait 1s for an object on the queue
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}
Note that I only have one consumer thread and one (separate) producer thread. I guess this isn't as efficient as using a BlockingQueue... Any comments appreciated.
You could use a LinkedBlockingDeque and physically remove the item from the queue (using takeLast()) but replace it again at the end of the queue if processing fails using putLast(E e). Meanwhile your "producers" would add elements to the front of the queue using putFirst(E e).
You could always encapsulate this behaviour within your own Queue implementation and provide a blockingPeek() method that performs takeLast() followed by putLast() behind the scenes on the underlying LinkedBlockingDeque. Hence from the calling client's perspective the element is never removed from your queue.
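A sketch of that wrapper might look like the following (BlockingPeekQueue and blockingPeek are made-up names for illustration; note this is only safe with a single consumer, since between takeLast() and putLast() the element is briefly out of the queue):

```java
import java.util.concurrent.LinkedBlockingDeque;

// Sketch of the encapsulation described above: peek blocks by taking the
// tail element and immediately putting it back.
class BlockingPeekQueue<T> {
    private final LinkedBlockingDeque<T> deque = new LinkedBlockingDeque<>();

    public void put(T item) throws InterruptedException {
        deque.putFirst(item);          // producers add at the front
    }

    public T blockingPeek() throws InterruptedException {
        T item = deque.takeLast();     // blocks until an element exists
        deque.putLast(item);           // put it straight back at the tail
        return item;
    }

    public T remove() throws InterruptedException {
        return deque.takeLast();       // consumers take from the tail (oldest)
    }
}
```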
However, since I do not know if I will be able to process the object successfully, I want to just peek() and not remove the object. I want to remove the object only if I am able to process it successfully.
In general, it is not thread-safe. What if, after you peek() and determine that the object can be processed successfully, but before you take() it to remove and process, another thread takes that object?
Could you also just add an event listener to your blocking queue, so that when something is added to the (blocking) queue, an event is sent off to your listeners? You could have your thread block until its actionPerformed method is called.
The only thing I'm aware of that does this is BlockingBuffer in Apache Commons Collections:
If either get or remove is called on
an empty Buffer, the calling thread
waits for notification that an add or
addAll operation has completed.
get() is equivalent to peek(), and a Buffer can be made to act like a BlockingQueue by decorating an UnboundedFifoBuffer with a BlockingBuffer.
The quick answer is: no, there's not really a way to have a blocking peek, short of implementing a blocking queue with a blocking peek() yourself.
Am I missing something?
peek() can be troublesome with concurrency:
If you can't process your peek()'d message, it'll be left in the queue unless you have multiple consumers.
Who is going to take that object off the queue if you can't process it?
If you have multiple consumers, you get a race condition between your peek() and another thread also processing items, resulting in duplicate processing or worse.
Sounds like you might be better off actually removing the item and processing it using a Chain-of-Responsibility pattern.
Edit, re: your last example: if you have only one consumer, you will never get rid of the object on the queue, unless it's updated in the meantime, in which case you'd better be very, very careful about thread safety and probably shouldn't have put the item in the queue anyway.
Not an answer per se, but: JDK-6653412 claims this is not a valid use case.
Looks like BlockingQueue itself doesn't have the functionality you're specifying.
I might try to re-frame the problem a little, though: what would you do with objects you can't "process correctly"? If you're just leaving them in the queue, you'll have to pull them out at some point and deal with them. I'd recommend either figuring out how to process them (commonly, if a queue.get() gives any sort of invalid or bad value, you're probably OK to just drop it on the floor) or choosing a different data structure than a FIFO.
The 'simplest' solution
Do not process the next element until the previous element has been processed successfully.
public void run() {
    Object pending = null; // element whose processing failed and must be retried
    try {
        while (!exit) {
            Object obj = (pending == null) ? queue.take() : pending; // take() blocks
            pending = process(obj) ? null : obj;
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
}
Calling peek() and checking if the value is null is not CPU efficient.
I have seen CPU usage going to 10% on my system when the queue is empty for the following program.
while (true) {
    Object o = queue.peek();
    if (o == null) continue;
    // omitted for the sake of brevity
}
Adding sleep() adds slowness.
Adding it back to the queue using putLast will disturb the order. Moreover, it is a blocking operation which requires locks.

Using a PriorityBlockingQueue to feed in logged objects for processing

I have an application that reads in objects from multiple serialized object logs and hands them off to another class for processing. My question focuses on how to efficiently and cleanly read in the objects and send them off.
The code was pulled from an older version of the application, but we ended up keeping it as is. It hasn't really been used much until the past week, but I recently started looking at the code more closely to try and improve it.
It opens N ObjectInputStreams, and reads one object from each stream to store them in an array (assume inputStreams below is just an array of ObjectInputStream objects that corresponds to each log file):
for (int i = 0; i < logObjects.length; i++) {
    if (inputStreams[i] == null) {
        continue;
    }
    try {
        if (logObjects[i] == null) {
            logObjects[i] = (LogObject) inputStreams[i].readObject();
        }
    } catch (final InvalidClassException e) {
        LOGGER.warn("Invalid object read from " + logFileList.get(i).getAbsolutePath(), e);
    } catch (final EOFException e) {
        inputStreams[i] = null;
    } catch (final IOException | ClassNotFoundException e) {
        // readObject() declares these checked exceptions as well
        LOGGER.warn("Failed to read from " + logFileList.get(i).getAbsolutePath(), e);
    }
}
The objects that were serialized to file are LogObject objects. Here is the LogObject class:
public class LogObject implements Serializable {
    private static final long serialVersionUID = -5686286252863178498L;

    private Object logObject;
    private long logTime;

    public LogObject(Object logObject) {
        this.logObject = logObject;
        this.logTime = System.currentTimeMillis();
    }

    public Object getLogObject() {
        return logObject;
    }

    public long getLogTime() {
        return logTime;
    }
}
Once the objects are in the array, it then compares the log time and sends off the object with the earliest time:
// handle the LogObject with the earliest log time
minTime = Long.MAX_VALUE;
for (int i = 0; i < logObjects.length; i++) {
    logObject = logObjects[i];
    if (logObject == null) {
        continue;
    }
    if (logObject.getLogTime() < minTime) {
        index = i;
        minTime = logObject.getLogTime();
    }
}
handler.handleOutput(logObjects[index].getLogObject());
My first thought was to create a thread for each file that reads in and puts the objects in a PriorityBlockingQueue (using a custom comparator that uses the LogObject log time to compare). Another thread could then be taking the values out and sending them off.
The only issue here is that one thread could put an object on the queue and have it taken off before another thread could put one on that may have an earlier time. This is why the objects were read in and stored in an array initially before checking for the log time.
Does this constraint prohibit me from implementing a multi-threaded design? Or is there a way I can tweak my solution to make it more efficient?
As far as I understand your problem, you need to process LogObjects strictly in order. In that case the initial part of your code is entirely correct. What this code does is a merge sort of several input streams: you read one object from each stream (this is why the temporary array is needed), then take the appropriate (minimum/maximum) LogObject and hand it to the processor.
Depending on your context you might be able to do the processing in several threads. The only thing you need to change is to put the LogObjects in an ArrayBlockingQueue, and the processors can then run on several independent threads. Another option is to submit the LogObjects for processing to a ThreadPoolExecutor. The last option is simpler and more straightforward.
But be aware of several pitfalls along the way:
for this algorithm to work correctly, the individual streams must already be sorted. Otherwise your program is broken;
when you process in parallel, the message processing order is, strictly speaking, undefined. The proposed algorithm only guarantees the order in which message processing starts (dispatch order). That might not be what you want.
So now you face several questions:
Is processing order really required?
If so, is global order required (over all messages), or local order (over independent groups of messages)?
The answers to those questions will have a great impact on your ability to do parallel processing.
If the answer to the first question is yes, then, sadly, parallel processing is not an option.
I agree with you. Throw this away and use a PriorityBlockingQueue.
The only issue here is that if Thread 1 has read an object from File 1 and put it in the queue, while the object File 2 was about to read has an earlier log time, the reading thread could take it and send it off, resulting in a log object with a later time being sent first.
This is exactly like the merge phase of a balanced merge (Knuth, TAOCP vol. 3). You must read the next input from the same file as the one that supplied the previous lowest element.
Does this constraint prohibit me from implementing a multi-threaded design?
It isn't a constraint. It's imaginary.
Or is there a way I can tweak my solution to make it more efficient?
Priority queues are already pretty efficient. In any case you should certainly worry about correctness first. Then add buffering ;-) Wrap the ObjectInputStreams around BufferedInputStreams, and ensure there is a BufferedOutputStream in your output stack.
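The merge phase described above can be sketched with a priority queue of stream heads, refilling from whichever stream supplied the last minimum. This is a simplified sketch: long timestamps stand in for LogObject.getLogTime() values, and iterators stand in for the ObjectInputStreams; it assumes each input stream is already sorted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class LogMerge {
    // One entry per stream: the head element plus the stream it came from.
    static class Head {
        final long logTime;        // stands in for LogObject.getLogTime()
        final Iterator<Long> src;  // stands in for one ObjectInputStream
        Head(long t, Iterator<Long> s) { logTime = t; src = s; }
    }

    public static List<Long> merge(List<Iterator<Long>> streams) {
        PriorityQueue<Head> heap =
            new PriorityQueue<>(Comparator.comparingLong((Head h) -> h.logTime));
        for (Iterator<Long> s : streams) {
            if (s.hasNext()) heap.add(new Head(s.next(), s)); // one head per stream
        }
        List<Long> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head min = heap.poll();
            out.add(min.logTime);                 // "handler.handleOutput(...)"
            if (min.src.hasNext()) {              // refill from the SAME stream
                heap.add(new Head(min.src.next(), min.src));
            }
        }
        return out;
    }
}
```

The heap never holds more than one element per stream, which is exactly why the original array-based version was correct: it, too, kept one pending object per input.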

How can I stop two threads colliding when accessing java ArrayList?

I have two threads which both need to access an ArrayList<short[]> instance variable.
One thread is going to asynchronously add short[] items to the list via a callback when new data has arrived : void dataChanged(short[] theData)
The other thread is going to periodically check if the list has items and if it does it is going to iterate over all the items, process them, and remove them from the array.
How can I set this up to guard for collisions between the two threads?
This contrived code example currently throws a java.util.ConcurrentModificationException
// instance variables
private ArrayList<short[]> list = new ArrayList<short[]>();

// asynchronous callback happening on the thread that adds the data to the list
void dataChanged(short[] theData) {
    list.add(theData);
}
// thread that iterates over the list and processes the current data it contains
Thread thread = new Thread(new Runnable() {
    @Override
    public void run() {
        while (true) {
            for (short[] item : list) {
                // process the data
            }
            // clear the list to discard data which has been processed
            list.clear();
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
});
You might want to use a producer-consumer queue like an ArrayBlockingQueue instead, or a similar concurrent collection.
The producer–consumer problem (also known as the bounded-buffer problem) is a classic example of a multi-process synchronization problem. The problem describes two processes, the producer and the consumer, who share a common, fixed-size buffer used as a queue. The producer's job is to generate a piece of data, put it into the buffer and start again. At the same time, the consumer is consuming the data (i.e., removing it from the buffer) one piece at a time. The problem is to make sure that the producer won't try to add data into the buffer if it's full and that the consumer won't try to remove data from an empty buffer.
One thread offers short[]s and the other take()s them.
The easiest way is to change the type of list to a thread safe list implementation:
private List<short[]> list = new CopyOnWriteArrayList<short[]>();
Note that this type of list is not extremely efficient if you mutate it a lot (add/remove) - but if it works for you that's a simple solution.
If you need more efficiency, you can use a synchronized list instead:
private List<short[]> list = Collections.synchronizedList(new ArrayList<short[]>());
But you will need to synchronize for iterating:
synchronized(list) {
for(short[] item : list) {
//process the data
}
}
EDIT: proposals to use a BlockingQueue are probably better but would need more changes in your code.
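For reference, here is a sketch of what the question's example might look like with a BlockingQueue (AudioBuffer is a made-up wrapper name; drainTo() atomically moves everything currently queued into a private list, so the processing loop never iterates over a collection another thread is mutating):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AudioBuffer {
    private final BlockingQueue<short[]> queue = new LinkedBlockingQueue<>();

    // called on the callback thread
    public void dataChanged(short[] theData) {
        queue.add(theData);
    }

    // called periodically on the processing thread; returns how many
    // items were processed in this batch
    public int processPending() {
        List<short[]> batch = new ArrayList<>();
        queue.drainTo(batch);        // atomic bulk removal, no clear() needed
        for (short[] item : batch) {
            // process the data
        }
        return batch.size();
    }
}
```

This removes both the ConcurrentModificationException (no shared iteration) and the race where data added between the loop and clear() would be silently discarded.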
You might look into a BlockingQueue for this instead of an ArrayList.
Take a look at Java's synchronization support.
This page covers making a group of statements synchronized on a specified object. That is: only one thread may execute any sections synchronized on that object at once, all others have to wait.
You can use synchronized blocks, but I think the best solution is to not share mutable data between threads at all.
Make each thread to write in its own space and collect and aggregate the results when the workers are finished.
http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#synchronizedList%28java.util.List%29
You can ask the Collections class to wrap up your current ArrayList in a synchronized list.

what to use in multithreaded environment; Vector or ArrayList

I have this situation:
a web application with circa 200 concurrent requests (threads) that need to log something to the local filesystem. I have one class to which all threads direct their calls, and that class internally stores the messages in one list (Vector or ArrayList), which in turn is written to the filesystem.
The idea is to return from the thread's call ASAP so the thread can do its job as fast as possible; whatever the thread wanted to log can be written to the filesystem later, it is not so crucial.
So, that class in turn removes the first element from the list and writes it to the filesystem, while in real time 10 or 20 threads are appending new logs at the end of the list.
I would like to use ArrayList, since it is not synchronized and therefore the threads' calls will take less time. The question is:
am I risking deadlocks / data loss? Is it better to use Vector since it is thread-safe? Is Vector slower to use?
Actually, both ArrayList and Vector are very bad choices here, not because of synchronization (which you would definitely need), but because removing the first element is O(n).
The perfect data structure for your purpose is the ConcurrentLinkedQueue: it offers both thread safety (without using synchronization) and O(1) adding and removing.
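A sketch of that design (the class and method names are made up for illustration):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Request threads enqueue lock-free in O(1); a single writer thread
// polls from the head, also in O(1).
public class AsyncLogger {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();

    // called by the ~200 request threads; returns immediately
    public void log(String message) {
        pending.offer(message);
    }

    // called by the writer thread; null means nothing is queued right now
    public String nextMessage() {
        return pending.poll();
    }
}
```

If you would rather have the writer thread block instead of polling for null, LinkedBlockingQueue with take() gives you that for free.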
Are you limited to a particular (old) Java version? If not, please consider using java.util.concurrent.LinkedBlockingQueue for this kind of stuff. It's really worth looking at the java.util.concurrent.* package when dealing with concurrency.
Vector is worse than useless; don't use it even when using multithreading. A trivial example of why it's bad: consider two threads simultaneously iterating over and removing elements from the list. The methods size(), get(), and remove() might all be synchronized, but the iteration loop is not atomic, so: kaboom. One thread is bound to try removing something which is not there, or to skip elements, because size() changes.
Instead, use synchronized blocks where you expect two threads to access the same data.
private ArrayList myList;

void removeElement(Object e) {
    synchronized (myList) {
        myList.remove(e);
    }
}
Java 5 provides explicit Lock objects which allow more fine-grained control, such as the ability to time out if a resource is not available within some period.
private final Lock lock = new ReentrantLock();
private ArrayList myList;

void removeElement(Object e) throws InterruptedException {
    if (!lock.tryLock(1, TimeUnit.SECONDS)) {
        // Timeout
        throw new SomeException();
    }
    try {
        myList.remove(e);
    } finally {
        lock.unlock();
    }
}
There actually is a marginal performance difference between a synchronized list and a Vector. (http://www.javacodegeeks.com/2010/08/java-best-practices-vector-arraylist.html)

Best approach to use in Java 6 for a List being accessed concurrently

I have a List object being accessed by multiple threads. There is mostly one thread, and under some conditions two threads, that update the list. There are one to five threads that can read from this list, depending on the number of user requests being processed.
The list is not a queue of tasks to perform, it is a list of domain objects that are being retrieved and updated concurrently.
Now there are several ways to make the access to this list thread-safe:
-use synchronized block
-use normal Lock (i.e. read and write ops share same lock)
-use ReadWriteLock
-use one of the new ConcurrentBLABLBA collection classes
My question:
What is the optimal approach to use, given that the critical sections typically do not contain many operations (mostly just adding/removing/inserting or getting elements from the list)?
Can you recommend another approach, not listed above?
Some constraints
-optimal performance is critical, memory usage not so much
-it must be an ordered list (currently synchronizing on an ArrayList), although not a sorted list (i.e. not sorted using a Comparable or Comparator, but by insertion order)
-the list is big, containing up to 100,000 domain objects, so using something like CopyOnWriteArrayList is not feasible
-the write/update critical sections are typically very quick, doing a simple add/remove/insert or replace (set)
-the read operations will primarily do an elementAt(index) call most of the time, although some read operations might do a binary search or indexOf(element)
-no direct iteration over the list is done, though operations like indexOf(..) will traverse the list
Do you have to use a sequential list? If a map-type structure is more appropriate, you can use a ConcurrentHashMap. With a list, a ReadWriteLock is probably the most effective way.
Edit to reflect OP's edit: Binary search on insertion order? Do you store a timestamp and use that for comparison, in your binary search? If so, you may be able to use the timestamp as the key, and ConcurrentSkipListMap as the container (which maintains key order).
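A sketch of that timestamp-keyed idea (TimestampedStore and its method names are made up for illustration; it assumes each element's insertion timestamp is unique enough to serve as a key, since a duplicate key would overwrite the earlier entry):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Keying domain objects by insertion timestamp keeps them in insertion
// order while allowing lock-free concurrent reads and writes.
public class TimestampedStore<V> { // V: the domain object type

    private final ConcurrentNavigableMap<Long, V> byTime =
        new ConcurrentSkipListMap<>();

    public void put(long timestamp, V obj) {
        byTime.put(timestamp, obj);
    }

    // "binary search on insertion order": O(log n) lookup of the newest
    // element at or before the given timestamp
    public V atOrBefore(long timestamp) {
        Map.Entry<Long, V> e = byTime.floorEntry(timestamp);
        return e == null ? null : e.getValue();
    }
}
```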
What are the reading threads doing? If they're iterating over the list, then you really need to make sure no-one touches the list during the whole of the iteration process, otherwise you could get very odd results.
If you can define precisely what semantics you need, it should be possible to solve the issue - but you may well find that you need to write your own collection type to do it properly and efficiently. Alternatively, CopyOnWriteArrayList may well be good enough - if potentially expensive. Basically, the more you can tie down your requirements, the more efficient it can be.
I don't know if this is a possible solution for the problem, but... it makes sense to me to use a database manager to hold that huge amount of data and let it manage the transactions.
I second Telcontar's suggestion of a database, since they are actually designed for managing this scale of data and negotiating between threads, while in-memory collections are not.
You say that the data is on a database on the server, and the local list on the clients is for the sake of user interface. You shouldn't need to keep all 100000 items on the client at once, or perform such complicated edits on it. It seems to me that what you want on the client is a lightweight cache onto the database.
Write a cache that stores only the current subset of data on the client at once. This client cache does not perform complex multithreaded edits on its own data; instead it feeds all edits through to the server and listens for updates. When data changes on the server, the client simply forgets the old data and loads it again. Only one designated thread is allowed to read or write the collection itself. This way the client simply mirrors the edits happening on the server, rather than needing complicated edits itself.
Yes, this is quite a complicated solution. The components of it are:
A protocol for loading a range of the data, say items 478712 to 478901, rather than the whole thing
A protocol for receiving updates about changed data
A cache class that stores items by their known index on the server
A thread belonging to that cache which communicates with the server. This is the only thread that writes to the collection itself
A thread belonging to that cache which processes callbacks when data is retrieved
An interface that UI components implement to allow them to receive data when it has been loaded
At first stab, the bones of this cache might look something like this:
class ServerCacheViewThingy {
    private static final int ACCEPTABLE_SIZE = 500;

    private int viewStart, viewLength;

    final Map<Integer, Record> items = new HashMap<Integer, Record>(1000);
    final ConcurrentLinkedQueue<Callback> callbackQueue = new ConcurrentLinkedQueue<Callback>();

    public void getRecords(int start, int length, ViewReceiver receiver) {
        // remember the current view, to prevent records within
        // this view from being accidentally pruned
        viewStart = start;
        viewLength = length;

        // if the selected area is not already loaded, send a request
        // to load that area
        if (!rangeLoaded(start, length))
            addLoadRequest(start, length);

        // add the receiver to the queue, so it will be processed
        // when the data has arrived
        if (receiver != null)
            callbackQueue.add(new Callback(start, length, receiver));
    }

    class Callback {
        int start;
        int length;
        ViewReceiver receiver;
        ...
    }

    class EditorThread extends Thread {
        private void prune() {
            if (items.size() <= ACCEPTABLE_SIZE)
                return;
            for (Map.Entry<Integer, Record> entry : items.entrySet()) {
                int position = entry.getKey();
                // if the position is outside the current view,
                // remove that item from the cache
                ...
            }
        }

        private void markDirty(int from) { ... }

        ....
    }

    class CallbackThread extends Thread {
        public void notifyCallback(Callback callback);

        private void processCallback(Callback callback) {
            readRecords
        }
    }
}

interface ViewReceiver {
    void receiveData(int viewStart, Record[] records);
    void receiveTimeout();
}
There's a lot of detail you'll have to fill in for yourself, obviously.
You can use a wrapper that implements synchronization:
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

ArrayList list = new ArrayList();
List syncList = Collections.synchronizedList(list);
// make sure you only use syncList for your future calls...
This is an easy solution. I'd try this before resorting to more complicated solutions.
