ArrayList that supports multiple threads - Java

I need to write an ArrayList to a file.
It gets filled all the time, and when it gets too big I need to start writing it.
So I thought I would check when the ArrayList size is greater than 100 and then append the current rows to the file.
But the problem is that sometimes it doesn't get filled for a few minutes, and I will still want to dump the data to the file.
So my second thought was to have another thread that checks every few seconds whether there are rows and dumps them to the file.
But then I would need to manage locks between threads.
My questions are:
1. Is the multithreaded design OK?
2. Is there an ArrayList that supports multiple threads in any form?

You can make synchronized lists using Collections.synchronizedList.

Check out this thread.
You should use
List<YourClassNameHere> synchronizedList = Collections.synchronizedList(list);

Instead of ArrayList, I would recommend using a proper implementation of Queue<E>. If you are only appending data and then dumping it to the file, removing the saved items, a queue is a much better choice.
Some implementations are thread-safe and will even allow the caller thread to block until something actually appears in the queue, which is a much better approach than having a polling thread. BlockingQueue looks very promising for your case.
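A minimal sketch of that idea applied to the original question, assuming the rows are Strings: producers call add() from any thread, and one background thread blocks on a LinkedBlockingQueue, flushing a batch when 100 rows have accumulated or when no new row has arrived for maxWaitMillis. The class name BatchingWriter, the batch size and the timeout are illustrative choices, and the file write is abstracted as a Consumer sink so the sketch stays self-contained; in real code the sink would append the rows to the file (for example with Files.write and StandardOpenOption.APPEND, handling the IOException).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Illustrative batching writer: any thread adds rows, one background thread flushes them. */
final class BatchingWriter implements AutoCloseable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final int batchSize;
    private final long maxWaitMillis;
    private final Consumer<List<String>> sink; // stands in for "append rows to the file"
    private final Thread worker;
    private volatile boolean running = true;

    BatchingWriter(int batchSize, long maxWaitMillis, Consumer<List<String>> sink) {
        this.batchSize = batchSize;
        this.maxWaitMillis = maxWaitMillis;
        this.sink = sink;
        this.worker = new Thread(this::drainLoop, "batch-writer");
        this.worker.start();
    }

    /** Thread-safe: LinkedBlockingQueue does its own locking. */
    void add(String row) {
        queue.add(row);
    }

    private void drainLoop() {
        List<String> batch = new ArrayList<>();
        try {
            while (running || !queue.isEmpty()) {
                // Blocks until a row arrives or the timeout expires: no busy polling.
                String first = queue.poll(maxWaitMillis, TimeUnit.MILLISECONDS);
                if (first != null) {
                    batch.add(first);
                    // Take whatever else is already waiting, up to the batch size.
                    queue.drainTo(batch, batchSize - batch.size());
                }
                boolean full = batch.size() >= batchSize;
                boolean idle = first == null && !batch.isEmpty();
                if (full || idle) {
                    sink.accept(new ArrayList<>(batch));
                    batch.clear();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // final flush on shutdown
        }
    }

    /** Stops the writer thread once the queue is empty and waits for the last flush. */
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        List<List<String>> flushed = new ArrayList<>();
        try (BatchingWriter writer = new BatchingWriter(100, 2_000, flushed::add)) {
            for (int i = 0; i < 250; i++) {
                writer.add("row " + i);
            }
        }
        // join() in close() makes the worker's additions to `flushed` visible here.
        for (List<String> batch : flushed) {
            System.out.println("flushed " + batch.size() + " rows");
        }
    }
}
```

Because the queue handles the hand-off, neither the producers nor the writer thread need explicit locks, and the writer sleeps inside poll() instead of waking up every few seconds just to check.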

From your question, it appears that most of the time you are going to perform write operations, and those only from a single thread, while intermittently checking the size of the list. If that is the case, a plain old ArrayList is fine; otherwise you cannot use it.
Synchronizing the list and then using locks to access it looks like overkill;
rather, have an if check that checks the size.
If you are going to access the list from multiple threads, then to avoid ConcurrentModificationException use the method suggested by @Ludevik.
There are other approaches as well, but for the sake of simplicity @Ludevik's approach fits the bill.

Instead of ArrayList you can use Vector, since Vector's methods are synchronized.

ArrayList is not thread-safe, but you can get a thread-safe list with Collections.synchronizedList()

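You need to use Collections.synchronizedList(List) to create a thread-safe list. Single calls such as add or remove are then synchronized for you, but you still need to synchronize compound operations (such as iterating, or copying and then clearing) yourself, and to synchronize updates to the objects held in the list.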
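A minimal sketch of a case where the wrapper alone is not enough (the class name and demo are illustrative): each add() is synchronized, but draining the buffer means copying the rows and then clearing them, two separate calls, so the drain has to hold the list's own lock (the object returned by synchronizedList) across both. Without that, a row added between the copy and the clear would silently disappear.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative buffer built on Collections.synchronizedList. */
final class SyncListBuffer {
    // The wrapper synchronizes each single call (add, size, get, ...) on itself.
    private final List<String> rows = Collections.synchronizedList(new ArrayList<>());

    void add(String row) {
        rows.add(row); // one call: already thread-safe
    }

    /** Copy-then-clear is two calls, so hold the wrapper's lock across both. */
    List<String> drain() {
        synchronized (rows) {
            List<String> copy = new ArrayList<>(rows);
            rows.clear();
            return copy;
        }
    }

    /** Producers add concurrently while this thread keeps draining; returns rows drained. */
    static int runDemo(int producers, int rowsEach) {
        SyncListBuffer buffer = new SyncListBuffer();
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            Thread t = new Thread(() -> {
                for (int i = 0; i < rowsEach; i++) {
                    buffer.add("row");
                }
            });
            threads.add(t);
            t.start();
        }
        int drained = 0;
        try {
            for (Thread t : threads) {
                while (t.isAlive()) {
                    drained += buffer.drain().size();
                }
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return drained + buffer.drain().size(); // whatever is left after the producers finish
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 10_000)); // 40000
    }
}
```

Every add lands either before or after an atomic drain, so the total always comes out to exactly producers * rowsEach.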

The simple solutions are to use Collections.synchronizedList(list) or Vector. However, there is a gotcha.
Iterating over a synchronized list or Vector created as above is NOT synchronized as a whole. So nothing stops another thread from adding a new element while you copy the list, whether you use an iterator explicitly, use for (type var : list) {...}, or use a copy constructor that relies on the iterator.
This is liable to result in a ConcurrentModificationException. To avoid that problem, you will need to do your own locking.
It may be a better idea to use a concurrent Queue class, so that the thread that writes stuff to the file doesn't need to iterate a list at all.

Related

Java making a copy of a list thread safe?

I have multiple threads running in a Java application, and all of them need to access the same list. Only one thread, however, actually needs to insert/delete/change the list, and the others just need to access it.
In the other threads, I want to make a copy of the list for me to use whenever I need to read it, but is there any way to do this in a thread safe way? If I had a method to copy the list element by element and the list changed, that would mess it up wouldn't it?
EDIT:
The list will not be deleted from very often, so would it work if I just copied it normally and caught the exceptions? If the list grew in the middle of the copy and I missed the new elements, it wouldn't really make a difference to functionality.
You can use CopyOnWriteArrayList for your purpose.
CopyOnWriteArrayList is a concurrent collection class introduced in the Java 5 concurrency API, along with its popular cousin ConcurrentHashMap.
As the name suggests, CopyOnWriteArrayList creates a copy of the underlying array with every mutation operation, e.g. add or set. Normally this is very expensive, because every write involves a costly array copy, but it is very efficient if you have a List where iteration outnumbers mutation, e.g. you mostly need to iterate the list and don't modify it very often.
With this collection you shouldn't create a new instance every time; keep only one shared instance and it will work.
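A small single-threaded sketch of the snapshot behaviour described above (the class name is illustrative): the for-each loop keeps iterating the array as it was when the loop started, so appending to the same list inside the loop does not throw ConcurrentModificationException, as it would with an ArrayList. The same property lets reader threads iterate, or copy with new ArrayList<>(list), while another thread modifies the list.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Illustrative demo of CopyOnWriteArrayList's snapshot iterators. */
final class CowSnapshotDemo {
    /** Appends to the list while iterating it; returns how many elements the loop saw. */
    static int iterateWhileAdding(List<Integer> list) {
        int seen = 0;
        for (Integer x : list) { // iterates the array as it was when the loop started
            list.add(x * 10);    // each add copies the whole backing array
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> shared = new CopyOnWriteArrayList<>(List.of(1, 2, 3));
        int seen = iterateWhileAdding(shared);
        System.out.println(seen + " " + shared); // 3 [1, 2, 3, 10, 20, 30]
    }
}
```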
Hmm, so I think what you are looking for is called CopyOnWriteArrayList.
CopyOnWriteArrayList - A thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.
Ref: CopyOnWriteArrayList
You can use CopyOnWriteArrayList, which is thread-safe, but it creates a new copy of the array on every update.
Or you can use a ReadWriteLock, so that no one can read while an update is in progress, but multiple threads can read simultaneously.
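A minimal sketch of the ReadWriteLock option wrapping a plain ArrayList (the class name RwLockedList is illustrative): add() takes the exclusive write lock, while snapshot() takes the shared read lock and returns a private copy that the caller can iterate without further locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative ArrayList guarded by a read-write lock. */
final class RwLockedList<E> {
    private final List<E> items = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Write lock is exclusive: readers and other writers wait. */
    void add(E item) {
        lock.writeLock().lock();
        try {
            items.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Read lock is shared: several threads can copy at the same time. */
    List<E> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(items);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        RwLockedList<String> list = new RwLockedList<>();
        list.add("a");
        list.add("b");
        List<String> copy = list.snapshot(); // private copy, safe to iterate freely
        list.add("c");
        System.out.println(copy + " " + list.snapshot()); // [a, b] [a, b, c]
    }
}
```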
I decided to solve this by having a separate thread that owns the list, with a BlockingQueue that the other threads submit to when they want to write to the list or read from it. To write, a thread submits an object with the content it wants to write; to read, it submits a Future that the thread owning the list populates.
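A minimal sketch of that design, with a few choices that are mine rather than the answer's: each request is a Consumer that the owner thread applies to the list, and reads are answered through a CompletableFuture (which implements Future). Because only the owner thread ever touches the ArrayList, it needs no locking; the BlockingQueue handles the hand-off between threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Illustrative: one owner thread holds the list; others send it requests. */
final class ListOwner {
    private final BlockingQueue<Consumer<List<String>>> requests = new LinkedBlockingQueue<>();
    private final Thread owner;

    ListOwner() {
        owner = new Thread(() -> {
            List<String> list = new ArrayList<>(); // confined to this thread, so no locks
            try {
                while (true) {
                    requests.take().accept(list); // apply requests one at a time, in order
                }
            } catch (InterruptedException e) {
                // shutdown() was called
            }
        }, "list-owner");
        owner.setDaemon(true);
        owner.start();
    }

    /** Write request: the owner thread performs the add. */
    void add(String item) {
        requests.add(list -> list.add(item));
    }

    /** Read request: the owner thread completes the future with a copy of the list. */
    CompletableFuture<List<String>> snapshot() {
        CompletableFuture<List<String>> result = new CompletableFuture<>();
        requests.add(list -> result.complete(new ArrayList<>(list)));
        return result;
    }

    void shutdown() {
        owner.interrupt();
    }

    public static void main(String[] args) {
        ListOwner lists = new ListOwner();
        lists.add("a");
        lists.add("b");
        System.out.println(lists.snapshot().join()); // [a, b]
        lists.shutdown();
    }
}
```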
Depending on your particular usage, you might benefit from one of these:
If you don't really need random access, use ConcurrentLinkedQueue. No explicit synchronization required.
If you don't need random access for writes but only for reads, use ConcurrentLinkedQueue for writes and, in a separate thread, copy it to a list from time to time if changes were made to the queue; give this list to "readers". Does not require explicit synchronization; gives a "weakly consistent" read view.
Since your writes come from one thread, the previous option could also work with two lists (e.g. the writing thread copies its list to the "reading view" from time to time). However, be aware that if you use an ArrayList implementation and require random access for writes, then you are looking at constant copies of memory regions, which is not good even in the absence of excessive synchronization. This option requires synchronization for the duration of the copy.
Use a map instead: ConcurrentHashMap if you don't care about ordering and want O(1) performance, or ConcurrentSkipListMap if you do need ordering and are OK with O(log N) performance. No explicit synchronization required.
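A sketch of the second option, assuming String elements and an illustrative class name: writers append to a ConcurrentLinkedQueue, and a periodic task copies the queue into an unmodifiable list that readers share. The changed flag implements "only if changes were made", and the volatile field publishes each new view safely to reader threads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative: lock-free writes, periodically refreshed read-only view. */
final class QueueWithReadView {
    private final Queue<String> writes = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean changed = new AtomicBoolean(false);
    // Readers only ever see an unmodifiable list; volatile publishes each new one safely.
    private volatile List<String> readView = Collections.emptyList();

    /** Writer side: no explicit locking needed. */
    void append(String item) {
        writes.add(item);
        changed.set(true); // set after the add, so a refresh that sees the flag also sees the item
    }

    /** Run periodically; copies the queue only if something was appended since the last copy. */
    void refreshIfChanged() {
        if (changed.getAndSet(false)) {
            // Iterating a ConcurrentLinkedQueue is weakly consistent: it never throws
            // ConcurrentModificationException, but may or may not include concurrent appends.
            readView = Collections.unmodifiableList(new ArrayList<>(writes));
        }
    }

    /** Reader side: returns the most recently published view. */
    List<String> readView() {
        return readView;
    }

    public static void main(String[] args) throws InterruptedException {
        QueueWithReadView data = new QueueWithReadView();
        ScheduledExecutorService refresher = Executors.newSingleThreadScheduledExecutor();
        refresher.scheduleAtFixedRate(data::refreshIfChanged, 0, 100, TimeUnit.MILLISECONDS);
        for (int i = 0; i < 5; i++) {
            data.append("item " + i);
        }
        Thread.sleep(300);
        System.out.println(data.readView());
        refresher.shutdownNow();
    }
}
```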
Use Collections.synchronizedList().
Example :
Collections.synchronizedList(new ArrayList<YourClassNameHere>())

Java Deep Clone List of Objects in Separate Thread Atomically

Suppose I have an ArrayList<Foo>. This list is being modified very frequently. Even the values of the elements in the list change all the time. All of the modifications to this list are performed by the main thread.
How would I go about cloning the list (a deep clone, so the elements should be cloned too) in a separate thread, in such a way that it does not delay the main thread (or at least not by much), and the copied list contains a snapshot in time (I think the term is atomically) of all of the Foo objects, with their values identical to the original list (again, at one point in time)?
Thanks in advance. I know the solution has to do with synchronization but I am at a loss in meeting all of the above criteria.
There isn't a simple answer to this unfortunately. If the modifications being made to the list are already threadsafe, you can grab a lock on the list (or whatever you're using for synchronization), make your copy, and release the lock. You'll need to make sure that any modifications to the items themselves are using the same lock.
Alternatively, you could use immutable constructs (but you need to use this not just for the list, but for the list contents as well), then you'll never need to lock and can just grab a copy of the list at your leisure.
There are downsides and risks to both approaches. If you want a good resource, I strongly recommend Java Concurrency in Practice.
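A minimal sketch of the first (lock-based) approach. Foo here is a hypothetical stand-in with a single int field and a copy() method; the real class would need its own deep-copy logic. Every change to the list or to an element goes through one lock, so a copy taken while holding that lock is a consistent snapshot, and the main thread is delayed only while a copy is in progress.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the question's Foo: mutable, with a deep-copy method. */
final class Foo {
    int value;

    Foo(int value) {
        this.value = value;
    }

    Foo copy() {
        return new Foo(value);
    }
}

/** Illustrative store: all changes to the list and its elements use the same lock. */
final class FooStore {
    private final Object lock = new Object();
    private final List<Foo> foos = new ArrayList<>();

    void add(int value) {
        synchronized (lock) {
            foos.add(new Foo(value));
        }
    }

    void increment(int index) {
        synchronized (lock) {
            foos.get(index).value++;
        }
    }

    /** Deep copy taken under the lock, so it reflects a single point in time. */
    List<Foo> deepSnapshot() {
        synchronized (lock) {
            List<Foo> copy = new ArrayList<>(foos.size());
            for (Foo f : foos) {
                copy.add(f.copy());
            }
            return copy;
        }
    }

    public static void main(String[] args) {
        FooStore store = new FooStore();
        store.add(1);
        store.add(2);
        List<Foo> before = store.deepSnapshot();
        store.increment(0);
        System.out.println(before.get(0).value + " " + store.deepSnapshot().get(0).value); // 1 2
    }
}
```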

Thread safe implementation for Hash Map

First, I'll describe what I want and then I'll elaborate on the possibilities I am considering. I don't know which is the best so I want some help.
I have a hash map on which I do read and write operations from a Servlet. Now, since this Servlet is on Tomcat, I need the hash map to be thread safe. Basically, when it is being written to, nothing else should write to it and nothing should be able to read it as well.
I have seen ConcurrentHashMap but noticed its get method is not thread-safe. Then, I have seen locks and something called synchronized.
I want to know which is the most reliable way to do it.
ConcurrentHashMap.get() is thread safe.
You can make HashMap thread safe by wrapping it with Collections.synchronizedMap().
EDIT: removed false information
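In any case, the synchronized keyword is a safe bet, as long as every thread that touches the map synchronizes on the same object: while one thread is inside a synchronized (map) block, any other thread trying to enter a block synchronized on map has to wait.
// Without a lock, another thread may be modifying map right now, so this is not thread-safe
map.get(0);
as opposed to
// No thread that also synchronizes on map can modify it until this block completes
synchronized (map) {
    map.get(0);
}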
I would suggest going with ConcurrentHashMap for the requirement you mentioned above. Earlier I had the same type of requirement for our application, but we were a little more focused on performance.
I ran both ConcurrentHashMap and the map returned by Collections.synchronizedMap() under various types of load, launching multiple threads at a time using JMeter, and monitored them using JProfiler. After all these tests we concluded that the map returned by Collections.synchronizedMap() did not perform as well as ConcurrentHashMap.
I have also written a post about my experience with both.
Thanks
Collections.synchronizedMap(new HashMap<K, V>());
Returns a synchronized (thread-safe) map backed by the specified map. In order to guarantee serial access, it is critical that all access to the backing map is accomplished through the returned map.
It is imperative that the user manually synchronize on the returned map when iterating over any of its collection views:
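A minimal sketch of what that manual synchronization looks like (the class and method names are illustrative): obtaining a view such as keySet() needs no lock, but the whole iteration must run inside a block synchronized on the map itself, not on the view.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative iteration over a Collections.synchronizedMap. */
final class SyncMapIteration {
    static int sumValues(Map<String, Integer> syncMap) {
        Set<String> keys = syncMap.keySet(); // getting the view needn't be synchronized
        int sum = 0;
        synchronized (syncMap) {             // lock the map itself, not the key set
            for (String k : keys) {          // iteration must happen inside the block
                sum += syncMap.get(k);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = Collections.synchronizedMap(new HashMap<>());
        m.put("a", 1);
        m.put("b", 2);
        System.out.println(sumValues(m)); // 3
    }
}
```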
This is the point of the ConcurrentHashMap class: it protects your collection when you have more than one thread.
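A minimal sketch of a typical servlet-style use, with a hypothetical hit counter shared by many request threads: get() needs no locking, and merge() performs the read-modify-write as one atomic step, which a separate get() followed by put() would not. The class name, key and thread counts are illustrative.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical per-page hit counter shared by many threads. */
final class HitCounter {
    private final ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>();

    /** Atomic read-modify-write: no lost updates under concurrency. */
    void record(String page) {
        hits.merge(page, 1, Integer::sum);
    }

    /** Retrievals are thread-safe and do not block. */
    int count(String page) {
        return hits.getOrDefault(page, 0);
    }

    /** Simulates `threads` request threads each recording `hitsPerThread` hits. */
    static int simulate(int threads, int hitsPerThread) {
        HitCounter counter = new HitCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < hitsPerThread; i++) {
                    counter.record("/home");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.count("/home");
    }

    public static void main(String[] args) {
        System.out.println(simulate(8, 1_000)); // 8000
    }
}
```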

Java Concurrency: should I synchronize all List and Maps?

So I have a SomeTask class which extends Thread, and it has Map and List fields. What would the behavior be if I don't use Collections.synchronizedXXX and have multiple SomeTask threads running?
Once a Map is loaded from the database (I am using an object database to store POJOs directly), would I need to synchronize the Map object returned from the database as well?
Map SomeTasksOwnMap = Collections.synchronizedMap(MapReturnedFromDatabase);
Collections.synchronizedXXX is required when 2 or more Threads are accessing the same Map/List.
If your task doesn't access other tasks' Maps/Lists, then there is no need to synchronize them.
Example:
Task 1 builds a list of numbers divisible exactly by 2.
Task 2 builds a list of numbers divisible exactly by 3.
These two tasks have individual lists that do not require synchronization.
Example that requires synchronization:
Task 1 and Task 2 both calculate numbers and store them in a shared list.
To answer the question "What would be the behavior when you don't?": you could lose one of the writes if both threads happened to write to index 'x' at the same time.
You may also get a null value in the list, because the size was increased before the write to that location was done.
Basically you would have an inconsistent view.
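A sketch of both cases from the example (the class and method names are illustrative): with one list per task, plain ArrayLists are enough, while the shared list is wrapped with Collections.synchronizedList so that no writes are lost.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative demo: thread-confined lists vs. one shared list. */
final class SharedListTasks {
    /** A task that adds every multiple of divisor in 1..limit to out. */
    static Runnable task(List<Integer> out, int divisor, int limit) {
        return () -> {
            for (int n = 1; n <= limit; n++) {
                if (n % divisor == 0) {
                    out.add(n);
                }
            }
        };
    }

    /** Runs both tasks on their own threads and waits for both to finish. */
    static void runBoth(Runnable first, Runnable second) {
        Thread t1 = new Thread(first);
        Thread t2 = new Thread(second);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Case 1: each task has its own list, so plain ArrayLists are fine
        // (join() makes the results visible to this thread).
        List<Integer> evens = new ArrayList<>();
        List<Integer> threes = new ArrayList<>();
        runBoth(task(evens, 2, 100_000), task(threes, 3, 100_000));

        // Case 2: both tasks write to one list, so it must be synchronized.
        // With a plain ArrayList here, writes could be lost or the list left inconsistent.
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        runBoth(task(shared, 2, 100_000), task(shared, 3, 100_000));

        System.out.println(evens.size() + " " + threes.size() + " " + shared.size()); // 50000 33333 83333
    }
}
```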
No. There is nothing in your question that suggests synchronization is required, because as far as I can tell each thread reads only data within itself: You only need synchronization when threads access data in other threads.
As an aside, having SomeTask extend Thread is a poor design: it should implement Runnable, and then you use new Thread(new SomeTask()).start().
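A minimal sketch of that change; the map and list contents are made up for illustration. SomeTask now implements Runnable, and its fields are only used by the thread running it, so they are plain HashMap and ArrayList instances with no synchronizedXxx wrappers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** The question's SomeTask rewritten as a Runnable; field contents are illustrative. */
final class SomeTask implements Runnable {
    private final Map<String, Integer> ownMap = new HashMap<>();
    private final List<String> ownList = new ArrayList<>();

    @Override
    public void run() {
        // While the task runs, only its own thread touches these fields: no wrappers needed.
        for (int i = 0; i < 3; i++) {
            ownList.add("item " + i);
            ownMap.put("item " + i, i);
        }
    }

    int listSize() {
        return ownList.size();
    }

    public static void main(String[] args) throws InterruptedException {
        SomeTask task = new SomeTask();
        Thread t = new Thread(task);
        t.start();
        t.join(); // join() makes the task's writes visible to this thread
        System.out.println(task.listSize()); // 3
    }
}
```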
... should I synchronize all List and Maps?
No you shouldn't. Synchronizing things that don't need it is a waste of resources. And for things that do need synchronization, you need to do it the right way. (And the synchronizedXxx wrappers are not always the right way.)
First, you need to identify the data structures that are going to be visible to multiple threads. Data structures that are provably thread confined don't need synchronizing at all.
Second, you need to examine the way that the data structures are used to see if a synchronizedXxx wrapper is sufficient. For instance, these wrappers don't synchronize iteration, and you can get into trouble if one thread changes a collection while another one is iterating it.
Finally, you need to think about whether the synchronized data structures are heavily used by different threads. The synchronizedXxx wrappers can result in a performance bottleneck if the data structure is heavily used. If this is the case, you should consider using one of the ConcurrentYyyy classes instead.

Java: Large collection and concurrent threads

I am facing this issue:
I have lots of threads (1024) that access one large collection, a Vector.
Question:
Is it possible to do something that would allow me to perform concurrent actions on it without having to synchronize everything (since that takes time)? What I mean is something like the way a MySQL database works: you don't have to worry about synchronization and thread-safety issues. Is there a collection like that in Java? Thanks
Vector is a very old Java class; it predates the Collections API. It synchronizes on every operation, so you're not going to have any luck trying to speed it up.
You should consider reworking your code to use something like ConcurrentHashMap or a LinkedBlockingQueue, which are highly optimized for concurrent access.
Failing that, you mention that you'd like performance and access semantics similar to a database - why not use a dedicated database or a message queue? They are likely to implement it better than you ever will, and it's less code for you to write!
[edit] Given your comment:
all each thread does is add elements to the vector (only if the number of elements in the vector is 0) and remove elements from the vector (if the vector size is > 0)
It sounds very much like you should be using something much more like a queue than a list! A bounded queue with a capacity of 1 will give you these semantics, although I'd question why you can't add elements if there is already something there. When you've got thousands of threads, this seems like a very inefficient design.
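A minimal sketch of that single-slot behaviour with ArrayBlockingQueue (the class name is illustrative): offer() adds only if the queue has room and poll() removes only if something is there, each as one atomic step, so the threads need no external synchronization. The blocking put() and take() could be used instead if threads should wait rather than give up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Illustrative single-slot hand-off. */
final class SingleSlot {
    public static void main(String[] args) {
        // Capacity 1: the queue holds at most one element at a time.
        BlockingQueue<String> slot = new ArrayBlockingQueue<>(1);

        System.out.println(slot.offer("first"));  // true: the slot was empty
        System.out.println(slot.offer("second")); // false: the slot is full, nothing added
        System.out.println(slot.poll());          // first
        System.out.println(slot.poll());          // null: the slot is empty again
    }
}
```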
Well, first off, this design doesn't sound right. It sounds like you need to think about using a proper database rather than a simple data structure, even if this means just using something like an in-memory instance of HypersonicDB.
However, if you insist on doing things this way, then the java.util.concurrent package has a number of highly concurrent data structures that avoid a single global lock. One of them might suit your purpose (e.g. ConcurrentHashMap, if you can use a Map rather than a List).
It looks like you are implementing the producer-consumer pattern; you should google "producer consumer java" or have a look at the BlockingQueue interface.
I agree with skaffman about looking at java.util.concurrent.
ConcurrentHashMap is very scalable. However, its size() call returns only an approximation while other threads are updating the map, so your app will occasionally add elements to it even when the number of elements is not 0.
If you want to strictly enforce the condition you gave, there is no other way than to synchronize.
Instead of having tons of context switches, I guess you could let your user threads post a Callable on a queue and have only one thread dealing with the mutation. This eliminates the need for synchronization on the collection. The user threads can wait on Future.get().
Just an idea.
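A minimal sketch of that idea, assuming String elements and the add-only-if-empty rule from the comment quoted earlier; the class and method names are illustrative. Executors.newSingleThreadExecutor() supplies both the queue and the single mutating thread, so the ArrayList is only ever touched by that one thread, and callers wait on Future.get().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative: all mutations run as Callables on one dedicated thread. */
final class SingleMutator {
    private final ExecutorService mutator = Executors.newSingleThreadExecutor();
    private final List<String> items = new ArrayList<>(); // touched only by the mutator thread

    /** Adds only if the collection is empty; the Future reports whether it was added. */
    Future<Boolean> addIfEmpty(String item) {
        return mutator.submit(() -> {
            if (!items.isEmpty()) {
                return false;
            }
            items.add(item);
            return true;
        });
    }

    /** Removes and returns the element, or null if there is none. */
    Future<String> removeIfPresent() {
        return mutator.submit(() -> items.isEmpty() ? null : items.remove(0));
    }

    void shutdown() {
        mutator.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SingleMutator m = new SingleMutator();
        System.out.println(m.addIfEmpty("a").get());    // true
        System.out.println(m.addIfEmpty("b").get());    // false
        System.out.println(m.removeIfPresent().get());  // a
        m.shutdown();
    }
}
```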
If you do not want to change your data structure and have only infrequent writes, you might also use one or many ReentrantReadWriteLock to synchronize access. Then many threads can read at the same time, but when a thread wants to write all reads are blocked until the write is done.
But you should check whether the used data structure is appropriate for the task, or whether another of the many java.util or java.util.concurrent classes is more appropriate. java.util.Vector is synchronized, by the way.
