Java's List interface provides a subList method that returns a view of the list between specified indices. The view is backed by the parent list, meaning any changes made to the sublist are reflected in the actual list. What I wish to know is whether these sublists get locked by the parent list when threads try to access them.
As an example, if I have an ArrayList of 100 elements and I create 4 sublists, each with 25 elements, and 4 threads try to work in parallel on these sublists, will they work on their independent sublists in a truly parallel manner, or will the first thread to execute lock the backing ArrayList?
If an ArrayList is not locked by default, I am assuming the threads will run in parallel on the sublists without waiting for each other. And if I programmatically ensure (or rather, the logic itself ensures) that these threads never work on anything other than their own sublists, then it will truly be parallel processing of the sublists, right?
executor.addTask(new Thread(doneSignal, parentList.subList(subListStart, subListEnd)));
The reason I ask is that I tried to loop over the sublists in parallel and noticed that it was substantially slower than simply looping over the parent list without creating 4 threads.
As it says in the Javadoc for java.util.ArrayList:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
Of course, this applies to the subList method. So, no locking is done by the ArrayList itself; you need to do it yourself if you require it.
What I wish to know is whether these sublists get locked by the parent list if threads try to access them.
No, they won't.
... [w]ill they work on their independent sublists in a truly parallel manner
Yes. (Unless there are other factors that are working against parallelism.)
I tried to loop over the sublists in parallel and noticed that it was substantially slower than not creating 4 threads and looping over the actual parent list.
That could be down to other things. For example, thread creation overheads, a thread pool that is too small, or trying to run multi-threaded code when there are too few cores.
Another possibility is that you are creating sublists of a synchronized list. If you do that, when the sublist methods delegate operations to the parent list, those operations will all be locking the same list. Note, however, that it is the parent list that is responsible for this locking, not the sublists.
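For illustration, here is a minimal sketch (class and variable names are made up) of four tasks, each reading only its own sublist view of a plain, unsynchronized ArrayList via a fixed thread pool. Since nothing modifies the parent structurally and nothing locks it, the reads can proceed in parallel:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SubListParallel {
    public static void main(String[] args) throws Exception {
        // A plain (unsynchronized) ArrayList of 100 elements.
        List<Integer> parent = new ArrayList<>();
        for (int i = 0; i < 100; i++) parent.add(i);

        int chunks = 4;
        int chunkSize = parent.size() / chunks;
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        List<Future<Long>> futures = new ArrayList<>();

        for (int c = 0; c < chunks; c++) {
            // Each task reads only its own sublist view; no structural
            // changes are made to the parent, so no locking is needed.
            List<Integer> slice = parent.subList(c * chunkSize, (c + 1) * chunkSize);
            futures.add(pool.submit(() -> {
                long sum = 0;
                for (int v : slice) sum += v;
                return sum;
            }));
        }

        long total = 0;
        for (Future<Long> f : futures) total += f.get();
        pool.shutdown();
        System.out.println(total); // sum of 0..99 = 4950
    }
}
```

Note that for work this small, the thread-creation and pool overhead can easily dominate, which is consistent with the slowdown described in the question.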
Related
My main thread has a private LinkedList which contains task objects for the players in my game. I then have a separate thread that runs every hour, accesses and clears that LinkedList, and runs my algorithm, which randomly adds new uncompleted tasks to every player's LinkedList. Right now I made a getter method that is synchronized so that I don't run into any concurrency issues. This works fine, but the synchronized keyword has a lot of overhead, especially since it's accessed a ton from the main thread while only accessed hourly from my second thread.
I am wondering if there is a way to prioritize the main thread. For example, on that 2nd thread I could loop through the players, build a new LinkedList, run my algorithm and add all the tasks to that new LinkedList, then quickly assign the old LinkedList reference to the new one. This would slightly increase memory usage while improving main-thread speed.
Basically, I am trying to avoid making my main thread use synchronization when the lock will only be contended once an hour at most, and I am willing to greatly degrade the performance of the 2nd thread to preserve the main thread's speed. Is there a way I can use the 2nd thread to notify the 1st that it will be locking a method, instead of having the 1st thread go through all of the synchronization overhead every time? I feel like this should be possible: if the 2nd thread shares a cache with the main thread, it could change a boolean denoting that the main thread has to wait until that variable is changed back. The main thread would check that boolean every time it tries to run that method, and if the 2nd thread is telling it to wait, the main thread would then block until the boolean is changed.
Of course, the 2nd thread would have to specify which object and method hold the lock, along with a boolean denoting whether it is locked or not. Then the main thread would just need to check its shared cache for the object and the boolean value once it reaches that method, which seems much faster than normal synchronization. This would result in the main thread running at normal speed while the 2nd thread handles a bunch of work behind the scenes without degrading main-thread performance. Does this exist, and if so, how can I do it? If it does not exist, how hard would it actually be to implement?
Premature optimization
It sounds like you are overly worried about the cost of synchronization. Doing a dozen, or a hundred, or even a thousand synchronizations once an hour is not going to impact the performance of your app by any significant amount.
If your concern has not yet been validated by careful study with a profiling tool, you’ve fallen into the common trap of premature optimization.
AtomicReference
Nevertheless, I can suggest an alternative approach.
You want to replace a list once an hour. If you do not mind letting threads that have already obtained a reference to the current list continue using it while you swap in a new list, then use AtomicReference. An object of this class holds a reference to another object of a specified type.
I generally like the Atomic… classes for thread-safety work because they scream out to the reader that a concurrency problem is at hand.
AtomicReference<List<Task>> listRef = new AtomicReference<>(originalList);
A different thread is able to replace that reference to the old list with a reference to the new list.
listRef.set(newList);
Access by the other thread:
List<Task> list = listRef.get();
Note that this approach does not make the payload, the list itself, thread-safe. But you claim that only a single thread will ever be manipulating the content of the list, and that a different thread will only replace the entire list. So this AtomicReference serves the purpose of replacing the list in a thread-safe manner while making the issue of concurrency quite obvious.
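Putting those snippets together, a minimal self-contained sketch might look like this (the Task record is a placeholder standing in for your task objects):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class TaskListSwap {
    // Placeholder type for this sketch; substitute your own task class.
    record Task(String name) {}

    public static void main(String[] args) {
        List<Task> originalList = new ArrayList<>();
        originalList.add(new Task("old"));

        // The AtomicReference protects the *reference*, not the
        // contents of the list it points at.
        AtomicReference<List<Task>> listRef = new AtomicReference<>(originalList);

        // Main thread: grabs the current list and keeps using it.
        List<Task> seen = listRef.get();

        // Hourly thread: builds a fresh list off to the side, then
        // publishes it with a single atomic swap.
        List<Task> newList = new ArrayList<>();
        newList.add(new Task("fresh"));
        listRef.set(newList);

        System.out.println(seen.get(0).name());          // old
        System.out.println(listRef.get().get(0).name()); // fresh
    }
}
```

A reader that fetched the old list before the swap keeps a consistent snapshot; subsequent calls to listRef.get() observe the new list.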
volatile
Using AtomicReference accomplishes the same goal as volatile. I’m wary of volatile because (a) its use may go unnoticed by the reader, and (b) I suspect many Java programmers do not understand volatile, especially since its meaning was redefined (its guarantees were strengthened in the revised Java memory model of Java 5).
For more info about why plain reference assignment is not thread-safe, see this Question.
https://ivoanjo.me/blog/2018/07/21/writing-to-a-java-treemap-concurrently-can-lead-to-an-infinite-loop-during-reads/ demonstrates how multiple concurrent writers may corrupt a TreeMap in such a way that cycles are created, and iterating the structure becomes an infinite loop.
Is it also possible to get in an infinite loop when concurrently iterating and writing with at most one concurrent writer? If not can anything other than skipping elements, processing elements twice, or throwing a ConcurrentModificationException happen?
Is it also possible to get in an infinite loop when concurrently iterating and writing with at most one concurrent writer?
I would say a cautious no: these infinite loops occur because multiple threads are re-wiring the relationships between the nodes, and so may make conflicting updates. A single thread won't conflict with itself, so such a re-wiring mixup would not occur.
However, I am not confident in this - but I don't need to be: such a usage of a TreeMap is in violation of the documentation:
If multiple threads access a map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.
If you don't externally synchronize, the behavior is undefined. Implementors of the class are free to implement the class in any way that meets the specification; and it is possible that one way might result in an infinite loop.
If you are encountering an infinite loop in a TreeMap, that's a symptom, not the root cause - namely, unsynchronized access to mutable data. This means that there is no guarantee that the values being read by the only-reading threads are correct either.
If you need to concurrently access a map, you'll have to use Collections.synchronizedMap. Otherwise, don't expect it to work correctly.
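Assuming the Collections.synchronizedMap route, a minimal sketch of safe use looks like this. One subtlety worth showing: the wrapper synchronizes each individual call, but iteration still requires manually holding the wrapper's monitor, per its Javadoc:

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

public class SafeTreeMap {
    public static void main(String[] args) {
        // Wrap the TreeMap so every individual call locks the wrapper.
        Map<String, Integer> map =
                Collections.synchronizedMap(new TreeMap<>());
        map.put("a", 1);
        map.put("b", 2);

        // Iteration is NOT covered by the wrapper's per-call locking;
        // the Javadoc requires holding the wrapper's monitor manually,
        // otherwise concurrent writers can still corrupt the traversal.
        int sum = 0;
        synchronized (map) {
            for (int v : map.values()) sum += v;
        }
        System.out.println(sum); // 3
    }
}
```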
I have a single-threaded model job which iterates over a collection of data and customizes the data. I want to divide the collection into small sublists and have each individual sublist processed in parallel. Should I use an array of threads (where the size of the array is the number of sublists created), or a thread pool?
On what basis are you going to divide the collection further? If your jobs/data are all of the same type, leave them in one collection and let threads from a thread pool pick up tasks from the list and run in parallel.
It is better to use a thread pool in any case, because it frees you from low-level management of arrays of thread objects and increases flexibility.
You should use an ExecutorService instance in your code and choose the right type for it.
For example:
Executors.newCachedThreadPool - if your processing logic is simple and doesn't require much processing time (otherwise an unbounded pool may spawn too many concurrent threads and cause failures).
Executors.newFixedThreadPool - if your processing logic is complex enough that you should limit the number of threads.
So, I think that you should:
Create the required ExecutorService in your consumer.
Go through your collection and submit a processing job to the executor for each element (an instance of Callable). Save the returned futures in a List<Future<?>> instance.
Iterate through the futures (waiting for completion of all tasks), save the results in a new collection, send the results somewhere, and commit the Kafka offset.
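The steps above can be sketched as follows (the squaring task and the pool size of 4 are illustrative placeholders; the Kafka-specific parts are omitted):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedProcessing {
    public static void main(String[] args) throws Exception {
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 10; i++) data.add(i);

        // Step 1: create the ExecutorService (fixed pool bounds concurrency).
        ExecutorService executor = Executors.newFixedThreadPool(4);
        List<Future<Integer>> futures = new ArrayList<>();

        // Step 2: submit one Callable per element, saving the futures.
        for (int value : data) {
            futures.add(executor.submit(() -> value * value));
        }

        // Step 3: wait for completion and gather results into a new collection.
        List<Integer> results = new ArrayList<>();
        for (Future<Integer> f : futures) results.add(f.get());
        executor.shutdown();
        System.out.println(results);
    }
}
```

Because the futures are collected in submission order, results come back in the same order as the input even though the tasks run in parallel.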
We can synchronize a collection by using Collections.synchronizedCollection(Collection c) or Collections.synchronizedMap(Map c), and we can also use the java.util.concurrent API, like ConcurrentHashMap or BlockingQueue.
Is there any difference in the level of synchronization between these two ways of getting thread-safe collections, or are they almost the same?
Could anyone explain?
Yes: speed during massive parallel processing.
This can be illustrated in a very simple way: Imagine that 100 Threads are waiting to take something out of a collection.
The synchronized way: 99 threads are put to sleep and 1 thread gets its value.
The concurrent way: 100 threads get their value immediately; none is put on hold.
Now the concurrent method takes a little more time per operation than a simple get, but as soon as even 2 threads are using the collection on a constant basis, it is well worth it for the time you save through concurrent execution.
So, as per my understanding now: the synchronized way is a wrapper that locks the whole collection object, while in the concurrent way only parts of the collection are locked, so two or more elements of the collection can be accessed at the same time.
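That finer-grained locking is visible in practice. A small sketch (thread and iteration counts are arbitrary): several threads update a ConcurrentHashMap with no external locking at all, relying on the per-key atomicity of merge, where a synchronizedMap wrapper would serialize every call on one lock:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentCounter {
    public static void main(String[] args) throws Exception {
        // ConcurrentHashMap locks at a much finer granularity than a
        // Collections.synchronizedMap wrapper, so threads touching
        // different parts of the map rarely block each other.
        ConcurrentHashMap<String, Integer> hits = new ConcurrentHashMap<>();

        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            pool.submit(() -> {
                for (int i = 0; i < 1000; i++) {
                    // merge is atomic per key; no external locking needed.
                    hits.merge("count", 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(hits.get("count")); // 4000
    }
}
```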
To begin with, I have used search and found n topics related to this question. Unfortunately, they didn't help me, so it'll be n++ topics :)
Situation: I will have a few worker threads (the same class, just many duplicates) (let's call them WT) and one result-writing thread (RT).
WT will add objects to the queue, and RT will take them. Since there will be many WT, won't there be any memory problems (independent of the max queue size)? Will those operations wait for each other to complete?
Moreover, as I understand it, BlockingQueue is quite slow, so maybe I should leave it and use a normal Queue inside synchronized blocks? Or should I consider using SynchronizedQueue?
LinkedBlockingQueue is designed to handle multiple threads writing to the same queue. From the documentation:
BlockingQueue implementations are thread-safe. All queuing methods achieve their effects atomically using internal locks or other forms of concurrency control. However, the bulk Collection operations addAll, containsAll, retainAll and removeAll are not necessarily performed atomically unless specified otherwise in an implementation.
Therefore, you are quite safe (unless you expect the bulk operations to be atomic).
Of course, if thread A and thread B are writing to the same queue, the order of A's items relative to B's items will be indeterminate unless you synchronize A and B.
As to the choice of queue implementation, go with the simplest that does the job, and then profile. This will give you accurate data on where the bottlenecks are so you won't have to guess.
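For concreteness, here is a minimal producer-consumer sketch of the WT/RT setup described above (thread and item counts are arbitrary). A bounded LinkedBlockingQueue addresses the memory worry: producers block in put() when the queue is full, so the backlog can never grow without limit:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ProducerConsumer {
    public static void main(String[] args) throws Exception {
        // Bounded queue: producers block in put() when it is full,
        // which caps memory use regardless of the number of producers.
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(100);

        int producers = 3;   // the "WT" threads
        int itemsEach = 50;
        for (int p = 0; p < producers; p++) {
            new Thread(() -> {
                try {
                    for (int i = 0; i < itemsEach; i++) queue.put(i);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }

        // Single result-writing thread ("RT"): take() blocks until
        // an item is available, so no busy-waiting is needed.
        int received = 0;
        for (int i = 0; i < producers * itemsEach; i++) {
            queue.take();
            received++;
        }
        System.out.println(received); // 150
    }
}
```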