Suppose I have an ArrayList<Foo>. This list is modified very frequently, and even the values of its elements change all the time. All modifications to the list are performed by the main thread.
How would I go about cloning the list (a deep copy, so the elements should be cloned as well) in a separate thread in such a way that it does not delay the main thread (or at least not by much), and the copied list contains a snapshot in time (I think the term is "atomic") of all of the Foo objects, with their values identical to those in the original list (again, at one point in time)?
Thanks in advance. I know the solution has to do with synchronization but I am at a loss in meeting all of the above criteria.
There isn't a simple answer to this unfortunately. If the modifications being made to the list are already threadsafe, you can grab a lock on the list (or whatever you're using for synchronization), make your copy, and release the lock. You'll need to make sure that any modifications to the items themselves are using the same lock.
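As a minimal sketch of the lock-and-copy idea, assuming a hypothetical Foo with a single int field and a copy constructor (neither is from the question), and a hypothetical wrapper class LockedFooList that owns the lock:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element type; the question only calls it Foo.
final class Foo {
    private int value;

    Foo(int value) { this.value = value; }

    // Copy constructor used for the deep copy.
    Foo(Foo other) { this.value = other.value; }

    int getValue() { return value; }
    void setValue(int value) { this.value = value; }
}

// Hypothetical wrapper: every access to the list or its elements goes through one lock.
final class LockedFooList {
    private final Object lock = new Object();
    private final List<Foo> items = new ArrayList<>();

    // Structural changes take the lock.
    void add(Foo foo) {
        synchronized (lock) { items.add(foo); }
    }

    // Changes to an element take the same lock, so a snapshot never sees a half-applied update.
    void update(int index, int newValue) {
        synchronized (lock) { items.get(index).setValue(newValue); }
    }

    // Called from the copying thread: the lock is held only for the duration of the copy.
    List<Foo> snapshot() {
        synchronized (lock) {
            List<Foo> copy = new ArrayList<>(items.size());
            for (Foo foo : items) {
                copy.add(new Foo(foo));
            }
            return copy;
        }
    }
}
```

The main thread is delayed only while a snapshot is being taken, so the cost grows with the size of the list and how often you copy it.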
Alternatively, you could use immutable constructs (but you need to use this not just for the list, but for the list contents as well), then you'll never need to lock and can just grab a copy of the list at your leisure.
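A rough sketch of the immutable variant, assuming a single writer thread as in the question. ImmutableFoo and ImmutableFooHolder are hypothetical names, and the writer here rebuilds the list on every change, which is one possible way to do it rather than the only one:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable element: "changing" it means creating a new instance.
final class ImmutableFoo {
    private final int value;

    ImmutableFoo(int value) { this.value = value; }

    int getValue() { return value; }

    ImmutableFoo withValue(int newValue) { return new ImmutableFoo(newValue); }
}

final class ImmutableFooHolder {
    // volatile so that a reader thread always sees the most recently published list.
    private volatile List<ImmutableFoo> current = Collections.emptyList();

    // Called only by the single writer thread: build a new list, then publish it.
    void add(ImmutableFoo foo) {
        List<ImmutableFoo> next = new ArrayList<>(current);
        next.add(foo);
        current = Collections.unmodifiableList(next);
    }

    // Replacing an element also publishes a new list; the old one is left untouched.
    void set(int index, ImmutableFoo foo) {
        List<ImmutableFoo> next = new ArrayList<>(current);
        next.set(index, foo);
        current = Collections.unmodifiableList(next);
    }

    // Readers get a consistent snapshot without locking; it never changes afterwards.
    List<ImmutableFoo> snapshot() { return current; }
}
```

The reader never blocks the writer, but each write copies the whole list, so this trades write cost for lock-free reads.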
There are downsides and risks to both approaches. If you want a good resource, I strongly recommend Java Concurrency in Practice.
Related
I have multiple threads running in a Java application, and all of them need to access the same list. Only one thread, however, actually needs to insert/delete/change the list, and the others just need to access it.
In the other threads, I want to make a copy of the list for me to use whenever I need to read it, but is there any way to do this in a thread safe way? If I had a method to copy the list element by element and the list changed, that would mess it up wouldn't it?
EDIT:
Elements will not be deleted from the list very often, so would it work if I just copied it normally and caught exceptions? If the list grew in the middle of the copy and I missed the new element, it wouldn't really make a difference to functionality.
You can use CopyOnWriteArrayList for your purpose.
CopyOnWriteArrayList is a concurrent Collection class introduced in the Java 5 concurrency API along with its popular cousin ConcurrentHashMap.
As the name suggests, CopyOnWriteArrayList creates a copy of the underlying array on every mutating operation, e.g. add or set. Normally CopyOnWriteArrayList is very expensive because every write involves a costly array copy, but it is very efficient if you have a List where iteration outnumbers mutation, e.g. you mostly need to iterate the list and don't modify it too often.
With this collection, you shouldn't create a new instance every time. You should have only one object of this and it will work.
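To illustrate what that buys you, here is a small sketch (CopyOnWriteDemo and readWhileWriting are made-up names): an iterator over a CopyOnWriteArrayList works on the array as it was when iteration started, so writes that happen during iteration neither throw ConcurrentModificationException nor show up in that iteration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class CopyOnWriteDemo {
    // Iterates the shared list while also writing to it.
    // With a plain ArrayList this would throw ConcurrentModificationException.
    static List<String> readWhileWriting(CopyOnWriteArrayList<String> shared) {
        List<String> seen = new ArrayList<>();
        for (String s : shared) {
            shared.add(s + "!"); // lands in a fresh copy of the array
            seen.add(s);         // the iterator still walks the old array
        }
        return seen;
    }
}
```

Called on a list containing "a" and "b", the method returns [a, b] and leaves the shared list as [a, b, a!, b!]. The same snapshot behaviour applies when the write comes from another thread.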
Hmm, so I think that what you are looking for is called CopyOnWriteArrayList.
CopyOnWriteArrayList - A thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.
Ref: CopyOnWriteArrayList
You can use CopyOnWriteArrayList, which is thread-safe, but it creates a new copy of the underlying array on every update.
Or you can use a ReadWriteLock, so that no one can read while an update is in progress, while multiple threads can still read simultaneously.
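A minimal sketch of the ReadWriteLock option, assuming a hypothetical wrapper class ReadWriteLockedList around a plain ArrayList:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class ReadWriteLockedList<T> {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<T> items = new ArrayList<>();

    // Writer: exclusive access, blocks readers and other writers.
    void add(T item) {
        lock.writeLock().lock();
        try {
            items.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers: may run concurrently with each other, but never with a writer.
    List<T> copy() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(items);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Unlike CopyOnWriteArrayList, writes here are cheap, but readers and the writer do wait for each other.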
I decided to solve this by having a separate thread that handles the list, with a BlockingQueue that the other threads submit to if they want to write to the list or read from it. If they want to write to the list, they submit an object with the content that they want to write, and if they want to read, they submit a Future that the thread owning the list populates.
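The code for this was not posted, so the following is only a sketch of the same idea. Instead of a hand-written thread and BlockingQueue it uses a single-thread ExecutorService, which packages the same pattern (one worker thread taking requests from a queue and completing Futures). ListOwner and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ListOwner {
    // Only the executor's single worker thread ever touches this list.
    private final List<String> items = new ArrayList<>();
    private final ExecutorService owner = Executors.newSingleThreadExecutor();

    // Write request: queued, then applied by the owner thread.
    void add(String item) {
        owner.execute(() -> items.add(item));
    }

    // Read request: the owner thread fills in the Future with a copy of the list.
    Future<List<String>> snapshot() {
        Callable<List<String>> task = () -> new ArrayList<>(items);
        return owner.submit(task);
    }

    // Lets the worker thread finish the queued requests and exit.
    void shutdown() {
        owner.shutdown();
    }
}
```

Because requests are processed one at a time in submission order, no explicit locking is needed; the price is that every read waits for the requests queued before it.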
Depending on your particular usage, you might benefit from one of these:
If you don't really need random access, use ConcurrentLinkedQueue. No explicit synchronization required.
If you don't need random access for writes but only need it for reads, use ConcurrentLinkedQueue for writes and copy it to a list from time to time if changes were made to the queue (in a separate thread), give this list to "readers". Does not require explicit synchronization; gives a "weakly consistent" read view.
Since your writes come from one thread, the previous could work with 2 lists (e.g. the writing thread will copy it to the "reading view" from time to time). However, be aware that if you use an ArrayList implementation and require random access for writes then you are looking at constant copies of memory regions, not good even in the absence of excessive synchronization. This option requires synchronization for the duration of copying.
Use a map instead, ConcurrentHashMap if you don't care about ordering and want O(1) performance or ConcurrentSkipListMap if you do need ordering and are ok with O(logN) performance. No explicit synchronization required.
Use Collections.synchronizedList().
Example:
Collections.synchronizedList(new ArrayList<YourClassNameHere>())
I have some objects that are created/destroyed very often and that can exist in many lists at the same time. To ensure I have no references left to them the objects have a flag isDestroyed, and if this is set, each list is responsible for removing the object from the list.
However, this is of course a breeding ground for memory leaks. What if I forget to remove objects from one of the lists? To visually monitor that the program behaves correctly, I override finalize and increment a global variable to track destructions (not a formal test, only to get an idea). However, as I have no control over the GC, I could in theory wait forever until something is destroyed.
So the question is two-fold: when objects are in multiple lists, is an "isDestroyed" flag considered a good way to control the object lifetime? It forces everyone who uses the object to take care to remove it from their lists, which seems bad.
And, is there any good way to see when the reference count reaches zero on an object, i.e. when it's scheduled for destruction?
EDIT: To be more specific, in my case my objects represent physical entities in a room. I have one manager class that draws each object, so the object is in one list there. Another list contains all the objects that are clickable, so the object is in a second list as well. Having all objects in one list and using polymorphism or instanceof is not an option in this case. When an object is "destroyed", it should be neither shown nor clickable in any way, therefore I want to remove it from both lists.
You should have a look at the java.lang.ref Package.
And, is there any good way to see when the reference count reaches
zero on an object, i.e. when it's scheduled for destruction?
You can use a ReferenceQueue object.
From the Javadoc of java.lang.ref.ReferenceQueue:
Reference queues, to which registered reference objects are appended
by the garbage collector after the appropriate reachability changes
are detected.
I think this is what WeakReference and ReferenceQueue are for: you create a WeakReference for the object you are tracking and associate it with a ReferenceQueue. Then you have another thread that processes the WeakReferences as they are returned from ReferenceQueue.remove(). A WeakReference is added to the ReferenceQueue when the referenced object has been GC'd. But can you give an example of the lists you are trying to clean up when the referenced objects are dead?
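Roughly, that could look like the sketch below. DeathWatcher is a hypothetical helper, it uses the non-blocking ReferenceQueue.poll() instead of a dedicated thread calling remove(), and it is not thread-safe as written.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashSet;
import java.util.Set;

final class DeathWatcher<T> {
    private final ReferenceQueue<T> queue = new ReferenceQueue<>();
    // The WeakReference objects themselves must stay strongly reachable,
    // otherwise they can be collected before they are ever enqueued.
    private final Set<Reference<? extends T>> tracked = new HashSet<>();

    // Start tracking an object without keeping it alive.
    void track(T object) {
        tracked.add(new WeakReference<>(object, queue));
    }

    // Removes every reference whose referent has been garbage collected
    // since the last call and returns how many there were.
    int drainCollected() {
        int count = 0;
        Reference<? extends T> ref;
        while ((ref = queue.poll()) != null) {
            tracked.remove(ref);
            count++;
        }
        return count;
    }

    int trackedCount() {
        return tracked.size();
    }
}
```

Keep in mind that a reference is enqueued only after the GC has noticed that the object is unreachable, so this tells you that an object is gone at some point afterwards, not at the moment the last reference was dropped.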
The way this is usually handled is through the Observer pattern. Each list attaches a destroy-listener that gets notified upon destruction. How this meshes with your architecture, I have no details to judge from.
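A minimal single-threaded sketch of such a destroy-listener, with made-up names (Entity, DestroyListener, EntityList) standing in for the room objects and the draw/click lists from the question:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback that a list registers on every object it holds.
interface DestroyListener {
    void onDestroyed(Entity entity);
}

final class Entity {
    private final List<DestroyListener> listeners = new ArrayList<>();

    void addDestroyListener(DestroyListener listener) {
        listeners.add(listener);
    }

    // Notifies every list that holds this entity, then forgets the listeners.
    void destroy() {
        // Iterate over a copy so listeners may unregister during notification.
        for (DestroyListener listener : new ArrayList<>(listeners)) {
            listener.onDestroyed(this);
        }
        listeners.clear();
    }
}

final class EntityList implements DestroyListener {
    private final List<Entity> entities = new ArrayList<>();

    // Adding an entity also subscribes this list to its destruction.
    void add(Entity entity) {
        entities.add(entity);
        entity.addDestroyListener(this);
    }

    @Override
    public void onDestroyed(Entity entity) {
        entities.remove(entity);
    }

    int size() {
        return entities.size();
    }
}
```

With this, calling destroy() once removes the object from every list it was added to, so no list has to poll an isDestroyed flag.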
If you want to be notified I'm almost sure you need PhantomReference, read here:
http://weblogs.java.net/blog/2006/05/04/understanding-weak-references
So I have a SomeTask class which extends Thread, and it has Map and List fields. What would be the behavior when you don't use Collections.synchronizedXXX and you have multiple threads of SomeTask running?
Once a Map is fetched from the database (I am using an object database to directly store POJOs), would I need to synchronize the Map object returned from this database as well?
Map SomeTasksOwnMap = Collections.synchronizedMap(MapReturnedFromDatabase);
Collections.synchronizedXXX is required when 2 or more Threads are accessing the same Map/List.
If your task doesn't access other tasks Map/List, then there is no need to synchronize them.
Example.
Task 1 builds a list of numbers divisible exactly by 2.
Task 2 builds a list of numbers divisible exactly by 3.
These two tasks have individual lists that do not require synchronization.
Example that requires synchronization.
Task 1 and 2 both calculate numbers and store them in a shared list.
To answer the question "What would be the behavior when you don't?": you could lose one of the writes if both threads happened to write to index 'x' at the same time.
You may also have a null value in the list as the size of the array was increased before the write to the location was done.
Basically you would have an inconsistent view.
No. There is nothing in your question that suggests synchronization is required, because as far as I can tell each thread reads only data within itself: You only need synchronization when threads access data in other threads.
As an aside, having SomeTask extend Thread is a poor design: it should implement Runnable, and you then use new Thread(new SomeTask()).start().
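For illustration, a sketch of what that could look like. The constructor argument, the range 1 to 20 and the runAndWait helper are invented for the example; the point is that the list belongs to one task and is read only after the thread has finished, so it needs no synchronized wrapper.

```java
import java.util.ArrayList;
import java.util.List;

final class SomeTask implements Runnable {
    // Confined to this task: only the thread running it touches the list.
    private final List<Integer> multiples = new ArrayList<>();
    private final int divisor;

    SomeTask(int divisor) {
        this.divisor = divisor;
    }

    @Override
    public void run() {
        for (int i = 1; i <= 20; i++) {
            if (i % divisor == 0) {
                multiples.add(i);
            }
        }
    }

    // Read this only after the worker thread has been joined.
    List<Integer> result() {
        return multiples;
    }

    // Runs one task on its own thread and waits for it to finish.
    static List<Integer> runAndWait(int divisor) throws InterruptedException {
        SomeTask task = new SomeTask(divisor);
        Thread thread = new Thread(task);
        thread.start();
        thread.join(); // join() also makes the task's writes visible to the caller
        return task.result();
    }
}
```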
... should I synchronize all List and Maps?
No you shouldn't. Synchronizing things that don't need it is a waste of resources. And for things that do need synchronization, you need to do it the right way. (And the synchronizedXxx wrappers are not always the right way.)
First, you need to identify the data structures that are going to be visible to multiple threads. Data structures that are provably thread confined don't need synchronizing at all.
Second, you need to examine the way that the data structures are used to see if a synchronizedXxx wrapper is sufficient. For instance, these wrappers don't synchronize iteration, and you can get into trouble if one thread changes a collection while another one is iterating it.
Finally, you need to think about whether the synchronized data structures are heavily used by different threads. The synchronizedXxx wrappers can result in a performance bottleneck if the data structure is heavily used. If this is the case, you should consider using one of the ConcurrentYyyy classes instead.
I need to write an arraylist to a file.
It gets filled all the time and when it gets too big I need to start writing it.
So I thought I would check when the arraylist size is greater than 100 and then append the current rows to the file.
But the problem is that sometimes it doesn't get filled for a few minutes, and I will still want to dump the data to the file.
So my second thought was to have another thread that checks every few seconds whether there are rows and dumps them to the file.
But then I would need to manage locks between threads.
My questions are :
1. Is the multithreaded design OK?
2. Is there an arraylist that supports multithreading in any form?
You can make synchronized lists using Collections.synchronizedList.
Check out this thread.
You should use
List synchronizedList = Collections.synchronizedList(list);
Instead of ArrayList I would recommend that you use a proper implementation of Queue<E>. If you are only appending data to the list and then dumping it to the file, removing the saved items from the list, a queue is a much better choice.
Some implementations are thread-safe and will even allow the calling thread to block until something actually appears in the queue, which is a much better approach than having a polling thread. BlockingQueue looks very promising for your case.
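Something along these lines, as a sketch only: RowDumper is a hypothetical class, the batch size of 100 comes from the question, and an Appendable stands in for the file (a FileWriter is an Appendable too).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class RowDumper {
    private static final int MAX_BATCH = 100;
    private final BlockingQueue<String> rows = new LinkedBlockingQueue<>();

    // Called by the producing thread; the queue does the synchronization.
    void submit(String row) {
        rows.add(row);
    }

    // Called in a loop by the writer thread. Waits up to timeoutMillis for a
    // first row, then also takes whatever else is already queued (up to the
    // batch size) and writes everything out. Returns the number of rows written.
    int dumpBatch(Appendable out, long timeoutMillis)
            throws IOException, InterruptedException {
        String first = rows.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (first == null) {
            return 0; // nothing arrived within the timeout
        }
        List<String> batch = new ArrayList<>();
        batch.add(first);
        rows.drainTo(batch, MAX_BATCH - 1);
        for (String row : batch) {
            out.append(row).append('\n');
        }
        return batch.size();
    }
}
```

The writer thread sleeps inside poll() while the queue is empty, so there is no size check and no explicit lock in your own code.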
From your question it appears that most of the time you are going to perform write operations, and those from a single thread, while intermittently checking the size of the list.
Otherwise you cannot use a plain old ArrayList.
Synchronizing the list and then using locks to access it looks like overkill.
Rather, have an if check that tests the size.
If you are going to access the list from multiple threads, then to avoid ConcurrentModificationException use the method suggested by #Ludevik.
There are other approaches as well, but for the sake of simplicity #Ludevik's approach fits the bill.
Instead of array lists you can use vectors, since vectors are thread-safe.
ArrayList is not thread-safe, but you can get a thread-safe list with Collections.synchronizedList()
You need to use Collections.synchronizedList(List) to create a thread-safe list. However, you still need to synchronize compound operations yourself (for example iterating, or a check followed by an add or remove), as well as updates to the objects held in the list.
The simple solutions are to use Collections.synchronizedList(list) or Vector. However, there is a gotcha.
The iterator() method for a synchronized list / Vector created as above is NOT synchronized. So there's nothing to stop another thread from adding a new element to the list while you are copying it, whether you use an iterator explicitly, use for (type var : list) {...}, or use a copy constructor that relies on the iterator.
This is liable to result in concurrent modification exceptions. To avoid that problem, you will need to do your own locking.
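A short sketch of that locking, assuming the list was created with Collections.synchronizedList (whose Javadoc says to synchronize on the returned list while iterating). SynchronizedCopy is a made-up helper name.

```java
import java.util.ArrayList;
import java.util.List;

final class SynchronizedCopy {
    // Copies a list that was created with Collections.synchronizedList(...).
    static <T> List<T> copyOf(List<T> synchronizedList) {
        // Hold the wrapper's own monitor for the whole iteration, so that
        // add/remove calls from other threads wait until the copy is done.
        synchronized (synchronizedList) {
            List<T> copy = new ArrayList<>();
            for (T item : synchronizedList) {
                copy.add(item);
            }
            return copy;
        }
    }
}
```

This only helps if every thread goes through the synchronized wrapper; a thread that still holds the raw ArrayList bypasses the lock.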
It may be a better idea to use a concurrent Queue class so that the thread that writes stuff to the file doesn't need to iterate a list.
In my application, I have a key-value map that serves as a central repository for storing data that is used to return to a defined state after a crash or restart (checkpointing).
The application is multithreaded and several threads may put key-value pairs into that map. One thread is responsible for regularly creating a checkpoint, i.e. serializing the map to persistent storage.
While the checkpoint is being written, the map should remain unchanged. It's rather easy to avoid new items being added, but what about other threads changing members of "their" objects inside the map?
I could have a single object whose monitor is seized when the checkpointing starts and wrap all write access to any member of the map, and members thereof, in blocks synchronizing on that object. This seems very error-prone and tedious to me.
I could also make the map private to the checkpointer and only put copies of the submitted objects in it. But then I would have to ensure that the copies are deep copies, and the data in the map would no longer be updated automatically: on every change to the submitted objects, the submitters would have to re-submit them. This seems like a lot of overhead and is also error-prone, as I have to remember to put resubmit code in all the right places.
What's an elegant and reliable way to solve this?
what about other threads changing members of "their" objects inside the map
Here you have a problem :) and it cannot be solved by any kind of Map...
One solution would be to allow only immutable objects in your Map, but this may be impossible for you.
Otherwise you have to share a lock with all threads that may change data referenced by your map and block them all during your snapshot; but this is a stop-the-world approach...
pgras is right that immutability would fix things, but that would also be tough. You could just lock the whole thing but that could be a performance problem. I can think of two good ideas.
First is to use a ReadWriteLock (which requires Java 1.5 or newer). Since your checkpoint can acquire the read lock it can be assured things are safe, but when no one is reading performance should be pretty good. This is still a pretty coarse lock, so you may also want to do #2...
Second is to break things up. Each area of the program could keep its own map (the map for GUI stuff, the map for user settings, the map for hardware settings, whatever). Each one would have a lock on it and things would go about as usual. When it came time to checkpoint, the checkpointer would grab ALL the locks (so things are consistent) and then do its job. The catch here is that you have to define an order for the locks to be grabbed in (say alphabetical), otherwise you'll end up with deadlocks.
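Here is one way the ordered locking could be sketched. AreaLocks and its methods are hypothetical, and a TreeMap is used simply because it iterates its keys in sorted order, which gives the alphabetical order for free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

final class AreaLocks {
    // Populated once in the constructor and never modified afterwards.
    private final Map<String, ReentrantLock> locks = new TreeMap<>();

    AreaLocks(String... areaNames) {
        for (String name : areaNames) {
            locks.put(name, new ReentrantLock());
        }
    }

    // Ordinary updates lock only the map of their own area.
    ReentrantLock lockFor(String area) {
        return locks.get(area);
    }

    // Checkpointer: acquires every lock in alphabetical order, runs the
    // checkpoint, then releases in reverse order. Returns the order used.
    List<String> withAllLocks(Runnable checkpoint) {
        List<String> order = new ArrayList<>();
        List<ReentrantLock> held = new ArrayList<>();
        try {
            for (Map.Entry<String, ReentrantLock> entry : locks.entrySet()) {
                entry.getValue().lock();
                held.add(entry.getValue());
                order.add(entry.getKey());
            }
            checkpoint.run();
        } finally {
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
        return order;
    }
}
```

Any other code that needs more than one of these locks at a time has to follow the same order, or the deadlock risk comes back.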
If the maps are orthogonal to each other (updates to one don't require updates to another to be consistent) then the easiest thing may be to push the updates to a central "backup" map in the checkpointer, not unlike something you described.
My biggest question to you would be, how much of a problem is this (performance wise)? Are updates very frequent, or are they rare? That would help to advise on something since my last idea (previous paragraph) could be slow, but it's easy and may not matter.
There is a fantastic book called Java Concurrency in Practice which is basically the Java threading bible. It discusses how to figure out this kind of stuff and strategies to avoid problems or make solving them easier. If you are going to be doing more threading, it's a very useful read.
Actually, if your key values are orthogonal to each other, then things are really easy. The ConcurrentMap interface (there are implementations such as ConcurrentHashMap) would solve your problems since it can make changes atomically, so readers wouldn't see inconsistent data. But if you have any two (or more) keys that must be updated at the same time this won't cover you.
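As a small illustration of a per-key atomic change, assuming Java 8 or newer and immutable values (Integer here); Counters is just an example class, not something from the question:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class Counters {
    private final ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();

    // merge() does the read-modify-write for this one key atomically,
    // so concurrent increments of the same key are never lost.
    void increment(String key) {
        counts.merge(key, 1, Integer::sum);
    }

    int get(String key) {
        return counts.getOrDefault(key, 0);
    }
}
```

Each key is updated atomically on its own; there is still no way to change two keys in one atomic step, which is the limitation mentioned above.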
I hope this helps. Threading access to shared data structures is complex stuff.