I'm trying to split a list of objects into smaller sublists and process them separately on different threads. So I have the following code:
List<Instance> instances = xmlInstance.readInstancesFromXml();
List<Future<List<Instance>>> futureList = new ArrayList<>();
int nThreads = 4;
ExecutorService executor = Executors.newFixedThreadPool(nThreads);
final List<List<Instance>> instancesPerThread = split(instances, nThreads);
for (List<Instance> instancesThread : instancesPerThread) {
if (instancesThread.isEmpty()) {
break;
}
Callable<List<Instance>> callable = new MyCallable(instancesThread);
Future<List<Instance>> submit = executor.submit(callable);
futureList.add(submit);
}
instances.clear();
for (Future<List<Instance>> future : futureList) {
try {
final List<Instance> instancesFromFuture = future.get();
instances.addAll(instancesFromFuture);
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
executor.shutdown();
try {
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException ie) {
ie.printStackTrace();
}
And the MyCallable class :
public class MyCallable implements Callable<List<Instance>> {
private List<Instance> instances;
public MyCallable (List<Instance> instances) {
this.instances = Collections.synchronizedList(instances);
}
@Override
public List<Instance> call() throws Exception {
for (Instance instance : instances) {
//process each object and changing some fields;
}
return instances;
}
}
Split method (it splits a given list into the given number of sublists, trying to keep the sublists almost the same size):
public static List<List<Instance>> split(List<Instance> list, int nrOfThreads) {
List<List<Instance>> parts = new ArrayList<>();
final int nrOfItems = list.size();
int minItemsPerThread = nrOfItems / nrOfThreads;
int maxItemsPerThread = minItemsPerThread + 1;
int threadsWithMaxItems = nrOfItems - nrOfThreads * minItemsPerThread;
int start = 0;
for (int i = 0; i < nrOfThreads; i++) {
int itemsCount = (i < threadsWithMaxItems ? maxItemsPerThread : minItemsPerThread);
int end = start + itemsCount;
parts.add(list.subList(start, end));
start = end;
}
return parts;
}
So, when I try to execute it, I get a java.util.ConcurrentModificationException on the line for (Instance instance : instances) {. Can somebody give me any ideas why this is happening?
public MyCallable (List<Instance> instances) {
this.instances = Collections.synchronizedList(instances);
}
Using synchronizedList like this doesn't help you in the way you think it might.
It's only useful to wrap a list in a synchronizedList at the time you create it (e.g. Collections.synchronizedList(new ArrayList<>())). Otherwise, the underlying list is still directly accessible, and thus accessible in an unsynchronized way.
Additionally, synchronizedList only synchronizes for the duration of individual method calls, not for the whole time while you are iterating over it.
The easiest fix here is to take a copy of the list in the constructor:
this.instances = new ArrayList<>(instances);
Then, nobody else has access to that list, so they can't change it while you are iterating it.
This is different to taking a copy of the list in the call method, because the copy is done in a single-threaded part of the code: no other thread can be modifying it while you are taking that copy, so you won't get the ConcurrentModificationException (you can get a CME in single-threaded code, but not using this copy constructor). Doing the copy in the call method means the list is iterated, in exactly the same way as with the for loop you already have.
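To illustrate the difference between a subList view and a copy, here is a minimal single-threaded sketch. It assumes the exception in the question is triggered by a structural change to the backing list (such as the instances.clear() call in the main thread) while a task still iterates a subList view of it. The class and method names are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class SubListCopyDemo {

    // Sums a list by iterating it, as the for-each loop in call() does.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Returns true if iterating the subList view fails once the backing list is cleared.
    static boolean viewFailsAfterClear() {
        List<Integer> source = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        List<Integer> view = source.subList(0, 2);   // a view onto source, not a copy
        source.clear();                              // structural change to the backing list
        try {
            sum(view);
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    // Returns the sum of a defensive copy taken before the backing list is cleared.
    static int copySurvivesClear() {
        List<Integer> source = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        List<Integer> copy = new ArrayList<>(source.subList(0, 2));
        source.clear();                              // no longer affects the copy
        return sum(copy);                            // 1 + 2
    }

    public static void main(String[] args) {
        System.out.println(viewFailsAfterClear());   // prints true
        System.out.println(copySurvivesClear());     // prints 3
    }
}
```

The copy is independent of the original list, so later changes to the original cannot invalidate an iteration over it.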
Consider the following code:
public static void main(String[] args) throws InterruptedException {
int nThreads = 10;
MyThread[] threads = new MyThread[nThreads];
AtomicReferenceArray<Object> array = new AtomicReferenceArray<>(nThreads);
for (int i = 0; i < nThreads; i++) {
MyThread thread = new MyThread(array, i);
threads[i] = thread;
thread.start();
}
for (MyThread thread : threads)
thread.join();
for (int i = 0; i < nThreads; i++) {
Object obj_i = array.get(i);
// do something with obj_i...
}
}
private static class MyThread extends Thread {
private final AtomicReferenceArray<Object> pArray;
private final int pIndex;
public MyThread(final AtomicReferenceArray<Object> array, final int index) {
pArray = array;
pIndex = index;
}
@Override
public void run() {
// some entirely local time-consuming computation...
pArray.set(pIndex, /* result of the computation */);
}
}
Each MyThread computes something entirely locally (without need to synchronize with other threads) and writes the result to its specific array cell. The main thread waits until all MyThreads have finished, and then retrieves the results and does something with them.
Using the get and set methods of AtomicReferenceArray provides a memory ordering which guarantees that the main thread will see the results written by the MyThreads.
However, since every array cell is written only once, and no MyThread has to see the result written by any other MyThread, I wonder if these strong ordering guarantees are actually necessary or if the following code, with plain array cell accesses, would be guaranteed to always yield the same results as the code above:
public static void main(String[] args) throws InterruptedException {
int nThreads = 10;
MyThread[] threads = new MyThread[nThreads];
Object[] array = new Object[nThreads];
for (int i = 0; i < nThreads; i++) {
MyThread thread = new MyThread(array, i);
threads[i] = thread;
thread.start();
}
for (MyThread thread : threads)
thread.join();
for (int i = 0; i < nThreads; i++) {
Object obj_i = array[i];
// do something with obj_i...
}
}
private static class MyThread extends Thread {
private final Object[] pArray;
private final int pIndex;
public MyThread(final Object[] array, final int index) {
pArray = array;
pIndex = index;
}
@Override
public void run() {
// some entirely local time-consuming computation...
pArray[pIndex] = /* result of the computation */;
}
}
On the one hand, under plain mode access the compiler or runtime might happen to optimize away the read accesses to array in the final loop of the main thread and replace Object obj_i = array[i]; with Object obj_i = null; (the implicit initialization of the array) as the array is not modified from within that thread. On the other hand, I have read somewhere that Thread.join makes all changes of the joined thread visible to the calling thread (which would be sensible), so Object obj_i = array[i]; should see the object reference assigned by the i-th MyThread.
So, would the latter code produce the same results as the above?
So, would the latter code produce the same results as the above?
Yes.
The "somewhere" that you've read about Thread.join could be JLS 17.4.5 (The "Happens-before order" bit of the Java Memory Model):
All actions in a thread happen-before any other thread successfully returns from a join() on that thread.
So, all of your writes to individual array elements happen-before the main thread returns from the corresponding join(), and are therefore visible to the reads that follow.
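The join() guarantee quoted above can be sketched as follows. This is only an illustration under simple assumptions: each worker writes exactly one cell of a plain int array, and the caller reads the array only after join() has returned for every worker. The class and method names are made up for the example.

```java
class JoinVisibilityDemo {

    // Each worker writes one cell of a plain array; the caller reads after join().
    static int[] computeSquares(int nThreads) {
        int[] results = new int[nThreads];           // plain array, no atomics
        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            final int index = i;
            workers[i] = new Thread(() -> results[index] = index * index);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                // Everything the worker did happens-before join() returns.
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        for (int value : computeSquares(4)) {
            System.out.println(value);               // prints 0, 1, 4, 9 on separate lines
        }
    }
}
```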
With this said, I would strongly recommend that you look for alternative ways to structure your problem that don't require you to be worrying about the correctness of your code at this level of detail (see my other answer).
An easier solution here would appear to be to use the Executor framework, which hides typically unnecessary details about the threads and how the result is stored.
For example:
ExecutorService executor = ...
List<Future<Object>> futures = new ArrayList<>();
for (int i = 0; i < nThreads; i++) {
futures.add(executor.submit(new MyCallable(i)));
}
executor.shutdown();
for (int i = 0; i < nThreads; ++i) {
array[i] = futures.get(i).get();
}
for (int i = 0; i < nThreads; i++) {
Object obj_i = array[i];
// do something with obj_i...
}
where MyCallable is analogous to your MyThread:
private static class MyCallable implements Callable<Object> {
private final int pIndex;
public MyCallable(final int index) {
pIndex = index;
}
@Override
public Object call() {
// some entirely local time-consuming computation...
return /* result of the computation */;
}
}
This results in simpler and more-obviously correct code, because you're not worrying about memory consistency: this is handled by the framework. It also gives you more flexibility, e.g. running it on fewer threads than work items, reusing a thread pool etc.
Atomic operations are required to ensure memory barriers are present when multiple threads access the same memory location. Without memory barriers, there is no happens-before relationship between the threads and no guarantee that the main thread will see the modifications made by the other threads, hence a data race. So what you really need is memory barriers for the write and read operations. You can achieve that using AtomicReferenceArray or a synchronized block on a common object.
You have Thread.join in the second program before the read operations. That should remove the data race. Without the join, you need explicit synchronization.
I'm new to multithreading in general, so I still don't fully understand it. I don't get why my code is having issues. I'm trying to populate an ArrayList with the first 1000 numbers, and then sum all of them using three threads.
public class Tst extends Thread {
private static int sum = 0;
private final int MOD = 3;
private final int compare;
private static final int LIMIT = 1000;
private static ArrayList<Integer> list = new ArrayList<Integer>();
public Tst(int compare){
this.compare=compare;
}
public synchronized void populate() throws InterruptedException{
for(int i=0; i<=Tst.LIMIT; i++){
if (i%this.MOD == this.compare){
list.add(i);
}
}
}
public synchronized void sum() throws InterruptedException{
for (Integer ger : list){
if (ger%MOD == this.compare){
sum+=ger;
}
}
}
@Override
public void run(){
try {
populate();
sum();
System.out.println(sum);
} catch (InterruptedException ex) {
Logger.getLogger(Tst.class.getName()).log(Level.SEVERE, null, ex);
}
}
public static void main(String[] args) {
Tst tst1 = new Tst(0);
tst1.start();
Tst tst2 = new Tst(1);
tst2.start();
Tst tst3 = new Tst(2);
tst3.start();
}
}
I expected it to print "500.500", but instead it prints this:
162241
328741
Exception in thread "Thread-0" java.util.ConcurrentModificationException
at java.base/java.util.ArrayList$Itr.checkForComodification(ArrayList.java:1042)
at java.base/java.util.ArrayList$Itr.next(ArrayList.java:996)
at tst.Tst.sum(Tst.java:38)
at tst.Tst.run(Tst.java:50)
BUILD SUCCESSFUL (total time: 2 seconds)
The problem happens because your methods are synchronized at the object level: the monitor lock each one uses belongs to a particular object (tst1, tst2, tst3). In other words, each thread's synchronized methods use a different lock.
Change your synchronized methods to static as a first step to fix it.
While run of tst1 is computing the sum in the for-each loop, run of tst2 might be adding to the list, so a ConcurrentModificationException is thrown. Using join can help.
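public static void main(String[] args) throws InterruptedException {
    Tst tst1 = new Tst(0);
    tst1.start();
    tst1.join(); // wait for tst1 before starting tst2
    Tst tst2 = new Tst(1);
    tst2.start();
    tst2.join(); // wait for tst2 before starting tst3
    Tst tst3 = new Tst(2);
    tst3.start();
}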
You misunderstood the semantics of synchronized methods: in your case each instance uses a different lock object. Do it this way instead:
class SynchList {
private int sum = 0;
private final int MOD = 3;
private int compare;
private final int LIMIT = 1000;
private ArrayList<Integer> list = new ArrayList<Integer>();
public synchronized void populate( int compare) throws InterruptedException{
for(int i=0; i<=LIMIT; i++){
if (i%this.MOD == compare){
list.add(i);
}
}
}
public synchronized void sum( int compare ) throws InterruptedException{
for (Integer ger : list){
if (ger%MOD == compare){
sum+=ger;
}
}
System.out.println( sum ); // print once, after the loop
}
}
class Tst extends Thread {
int compare;
SynchList synchList;
public Tst(int compare, SynchList synchList)
{
this.compare= compare;
this.synchList = synchList;
}
@Override
public void run(){
try {
synchList.populate( compare );
synchList.sum( compare );
} catch (InterruptedException ex) {
Logger.getLogger(Tst.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
public class Main
{
public static void main(String[] args) {
SynchList synchList = new SynchList();
Tst tst1 = new Tst( 0 , synchList );
tst1.start();
Tst tst2 = new Tst( 1, synchList );
tst2.start();
Tst tst3 = new Tst( 2, synchList );
tst3.start();
}
}
Your use of synchronized methods isn't doing what you think it's doing. The way your code is written, the methods "sum" and "populate" are protected from running at the same time, but only on the same Tst instance. That means calls to "sum" and "populate" for a single Tst object will happen one at a time, but simultaneous calls to "sum" on different object instances will be allowed to happen concurrently.
Using synchronized on a method is equivalent to wrapping the entire method body in synchronized(this) { ... }. With three different instances created (tst1, tst2, and tst3), this form of synchronization doesn't guard across object instances. Instead, it guarantees that only one of populate or sum will be running at a time on a single object; any other calls to one of those methods (on the same object instance) will wait until the prior one finishes.
Take a look at 8.4.3.6. synchronized Methods in the Java Language Specification for more detail.
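The point about per-object locks can be sketched like this: when all threads go through the same object, they also share its lock, and the updates no longer interleave. This is a minimal illustration, not a rewrite of the code in the question; SharedLockSum and its methods are hypothetical names, and each worker is assumed to handle one residue class, as in the original.

```java
class SharedLockSum {

    private final Object lock = new Object();    // one lock, shared by every worker
    private int sum;

    void add(int value) {
        synchronized (lock) {
            sum += value;
        }
    }

    int get() {
        synchronized (lock) {
            return sum;
        }
    }

    // Sums 0..limit using nThreads workers; worker t handles the values with i % nThreads == t.
    static int sumUpTo(int limit, int nThreads) {
        SharedLockSum shared = new SharedLockSum();
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            final int residue = t;
            workers[t] = new Thread(() -> {
                for (int i = residue; i <= limit; i += nThreads) {
                    shared.add(i);               // every worker locks the same object
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();                   // wait for all partial sums
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(sumUpTo(1000, 3));    // prints 500500
    }
}
```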
Your use of static might also not be doing what you think it's doing. Your code also shares things across all instances of the Tst thread class, namely sum and list. Because these are defined as static, there will be exactly one sum and one list. There is no thread safety in your code to guard against concurrent changes to either of those.
For example, as threads are updating "sum" (with the line: sum+=ger), the results will be non-deterministic. That is, you will likely see different results every time you run it.
Another example of unexpected behavior with multiple threads and a single static variable is list – that will grow over time which can result in concurrency issues. The Javadoc says:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
Modifications include adding values as well as growing the backing array store. Without specifying a starting size – new ArrayList() – it will default to 10 or possibly some other relatively small number depending on which JVM version you're using. Once one thread tries to add an item that exceeds the ArrayList's capacity, it will trigger an automatic resize.
Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. It is always at least as large as the list size. As elements are added to an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.
I have a class that should test my Fibonacci function using multithreading
public class PerformanceTesterImpl implements PerformanceTester{
public static List<Long> executionTimesList = new ArrayList<>();
public static List<Runnable> tasksList = new ArrayList<>();
public int fib;
public PerformanceTestResult performanceTestResult;
@Override
public PerformanceTestResult runPerformanceTest(Runnable task, int calculationCount, int threadPoolSize) {
for(int i=0; i<calculationCount; i++){
tasksList.add(createTask(fib));
}
ExecutorService executor = Executors.newFixedThreadPool(threadPoolSize);
for(Runnable r : tasksList){
executor.execute(r);
}
executor.shutdown();
try {
executor.awaitTermination(1, TimeUnit.MINUTES);
} catch (InterruptedException e) {
e.printStackTrace();
}
// Here all threads should complete all work. Is it OK?
mapValues();
return performanceTestResult;
}
private PerformanceTestResult mapValues(){
Collections.sort(executionTimesList);
performanceTestResult = new PerformanceTestResult(getSum(executionTimesList), (Long)executionTimesList.get(0), (Long)executionTimesList.get(executionTimesList.size()-1));
return performanceTestResult;
}
public Runnable createTask (final int n) {
fib = n;
Runnable runnable = new Runnable() {
@Override
public void run() {
long startTime = System.currentTimeMillis();
FibCalc fibCalc = new FibCalcImpl();
fibCalc.fib(n);
long executionTime = System.currentTimeMillis() - startTime;
executionTimesList.add(executionTime);
}
};
return runnable;
}
private static long getSum(List<Long> executionTimes){
long sum = 0;
for(long l : executionTimes){
sum += l;
}
return sum;
}
}
But from time to time a null appears in my collection, and when I try to sort executionTimesList I get a NullPointerException. I think there is a problem with how the threads are executed. What should I do to fix this exception?
ArrayList is not thread safe.
From the Javadoc:
If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method. This is best done at creation time, to prevent accidental unsynchronized access to the list:
List list = Collections.synchronizedList(new ArrayList(...));
When you call add from multiple threads, the ArrayList may end up in an inconsistent state. You should synchronize access to it.
Try:
public void run() {
long startTime = System.currentTimeMillis();
FibCalc fibCalc = new FibCalcImpl();
fibCalc.fib(n);
long executionTime = System.currentTimeMillis() - startTime;
synchronized (executionTimesList) { // lock on the shared list; 'this' would be a different Runnable for each task
executionTimesList.add(executionTime);
}
}
executionTimesList is shared among all threads, and in your code they access it concurrently with no synchronization. So it is to be expected that the list can end up in an inconsistent state: if one thread is in the middle of modifying the list when the CPU switches to another thread that also modifies it, the first thread's update is left half-done and resumes on stale state.
You must synchronize access to the static field executionTimesList.
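One way to synchronize access, following the Javadoc's advice to wrap the list at creation time, might look like the sketch below. It is a simplified stand-in for the code in the question: TimingCollector and collectTimings are hypothetical names, and the timed work is left as a placeholder comment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class TimingCollector {

    // Runs taskCount tasks on a fixed pool and records one timing per task.
    static List<Long> collectTimings(int taskCount, int poolSize) {
        // Wrapped at creation, so no unsynchronized reference to the list escapes.
        List<Long> timings = Collections.synchronizedList(new ArrayList<>());
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < taskCount; i++) {
            executor.execute(() -> {
                long start = System.nanoTime();
                // placeholder for the real work (for example the Fibonacci call)
                timings.add(System.nanoTime() - start);  // add() is synchronized by the wrapper
            });
        }
        executor.shutdown();
        try {
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // All tasks are done; hand back a plain copy that is safe to sort.
        return new ArrayList<>(timings);
    }

    public static void main(String[] args) {
        System.out.println(collectTimings(200, 4).size());  // prints 200
    }
}
```

Checking the return value of awaitTermination matters here: if it times out, some tasks may still be adding to the list.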
I am trying to sort a file using threading. Here is Sort.java:
This function sorts with the help of threading:
public static String[] threadedSort(File[] files) throws IOException {
String sortedData[] = new String[0];
int counter = 0;
boolean allThreadsTerminated = false;
SortingThread[] threadList = new SortingThread[files.length];
for (File file : files) {
String[] data = getData(file);
threadList[counter] = new SortingThread(data);
threadList[counter].start();
counter++;
}
while(!allThreadsTerminated) {
allThreadsTerminated = true;
for(counter=0; counter<files.length; counter++) {
if(threadList[counter].getState() != Thread.State.TERMINATED) {
allThreadsTerminated = false;
}
}
}
for(counter=0; counter<files.length; counter++) {
sortedData = MergeSort.merge(sortedData, threadList[counter].data);
}
return sortedData;
}
This function sorts just normally
public static String[] sort(File[] files) throws IOException {
String[] sortedData = new String[0];
for (File file : files) {
String[] data = getData(file);
data = MergeSort.mergeSort(data);
sortedData = MergeSort.merge(sortedData, data);
}
return sortedData;
}
Now when I sort both ways, the normal sort is faster than the threaded version. What could be the reason for that? Have I missed something?
My SortingThread is something like this :
public class SortingThread extends Thread {
String[] data;
SortingThread(String[] data) {
this.data = data;
}
public void run() {
data = MergeSort.mergeSort(data);
}
}
When I compare the performance of my threaded implementation with the original non-threaded implementation, I find the second one is faster. What could be the reason for such behavior? If I am not wrong, we would expect the threaded implementation to be faster.
EDIT: Assume I have a properly working MergeSort; there is no use posting its code here. Also, the getData() function just reads the input from a file.
I think the problem lies in the fact that I am reading the whole file into an array. I think I should give different lines to different threads:
private static String[] getData(File file) throws IOException {
ArrayList<String> data = new ArrayList<String>();
BufferedReader in = new BufferedReader(new FileReader(file));
while (true) {
String line = in.readLine();
if (line == null) {
break;
}
else {
data.add(line);
}
}
in.close();
return data.toArray(new String[0]);
}
First of all, how do you measure elapsed time? Do you execute both tests in the same program? If so, keep in mind that mergesort will probably undergo HotSpot compilation while the first test is executing. I suggest you run each method twice and measure the time on the second run.
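The "run it first, then measure" suggestion could be sketched roughly as follows. This is a crude illustration rather than a proper benchmark harness; WarmupTiming, timeOnce and timeAfterWarmup are hypothetical names, and the sort of a random int array only stands in for the real workload.

```java
import java.util.Arrays;
import java.util.Random;

class WarmupTiming {

    // Times a single run of the task, in nanoseconds.
    static long timeOnce(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    // Runs the task warmupRuns times untimed, then returns the time of one measured run.
    static long timeAfterWarmup(Runnable task, int warmupRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();                          // gives the JIT a chance to compile hot code
        }
        return timeOnce(task);
    }

    public static void main(String[] args) {
        Runnable sortTask = () -> {
            int[] data = new Random(42).ints(100_000).toArray();
            Arrays.sort(data);
        };
        System.out.println("cold run ns:   " + timeOnce(sortTask));
        System.out.println("warmed run ns: " + timeAfterWarmup(sortTask, 5));
    }
}
```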
How many CPU/cores do you have? One problem with this code is that the main thread spends CPU time in "while(!allThreadsTerminated)" loop, actively checking thread state. If you have one CPU - you are wasting it, instead of doing actual sorting.
Replace the while-loop with:
for(counter=0; counter<files.length; counter++) {
threadList[counter].join();
}
You should use Stream and standard sort:
static String[] sort(File[] files, boolean parallel) {
return (parallel ? Stream.of(files).parallel() : Stream.of(files))
.flatMap(f -> {
try {
return Files.lines(f.toPath());
} catch (Exception e) {
e.printStackTrace();
return null;
}
})
.sorted()
.toArray(String[]::new);
}
static String[] sort(File[] files) {
return sort(files, false);
}
static String[] threadSort(File[] files) {
return sort(files, true);
}
In my environment threadSort is faster.
sort:
files=511 sorted lines=104419 elapse=4784ms
threadSort:
files=511 sorted lines=104419 elapse=3060ms
You can use java.util.concurrent.ExecutorService, which runs all your tasks on a specified number of threads. Once all tasks have finished, invokeAll() gives you a list of Future objects holding the result of each task. The list of Future objects is in the same order as the Callable objects in the list you passed in.
The first thing you need is for your SortingThread to implement the Callable interface, so that you can get the result of each task.
Each Callable has to implement the call() method; its return type is the type of the result that the corresponding Future will hold.
public class SortingThread implements Callable<String[]> {
String[] data;
SortingThread(String[] data) {
this.data = data;
}
@Override
public String[] call() throws Exception {
data = MergeSort.mergeSort(data);
return data;
}
}
Next, you need to use ExecutorService for thread management.
public static String[] sortingExampleWithMultiThreads(File[] files)
        throws IOException, InterruptedException, ExecutionException {
    String[] sortedData = new String[0];
    // Prepare the list of Callables to pass to invokeAll().
    List<Callable<String[]>> callableList = new ArrayList<>();
    for (File file : files) {
        callableList.add(new SortingThread(getData(file)));
    }
    // Fixed-size pool, one thread per file (at least one, since a pool size of 0 is rejected).
    ExecutorService service = Executors.newFixedThreadPool(Math.max(1, files.length));
    try {
        // invokeAll() blocks until every task has completed.
        List<Future<String[]>> futureObjects = service.invokeAll(callableList);
        for (Future<String[]> future : futureObjects) {
            // get() returns what SortingThread.call() returned.
            sortedData = MergeSort.merge(sortedData, future.get());
        }
    } finally {
        service.shutdown();
    }
    return sortedData;
}
This way you avoid the busy-waiting while loop, which keeps a core spinning (and so slows the sort down): on a single-core CPU it can reach 100% utilization, on a dual-core CPU about 50%.
Also, using ExecutorService for thread management is a better approach to multithreading than starting the threads yourself and monitoring them for results, so you can expect better performance.
I have not run it, so you may need to make some changes here and there, but it outlines the approach.
P.S.: When measuring performance, to get clean and precise results, always start a new JVM instance for each run.
I am implementing an application using concurrent hash maps. It is required that one thread adds data into the CHM, while there is another thread that copies the values currently in the CHM and erases it using the clear() method. When I run it, after the clear() method is executed, the CHM always remains empty, though the other thread continues adding data to CHM.
Could someone tell me why it is so and help me find the solution.
This is the method that adds data to the CHM. This method is called from within a thread.
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentHashMap;
public static ConcurrentMap<String, String> updateJobList = new ConcurrentHashMap<String, String>(8, 0.9f, 6);
public void setUpdateQuery(String ticker, String query)
throws RemoteException {
dataBaseText = "streamming";
int n = 0;
try {
updateJobList.putIfAbsent(ticker, query);
}
catch(Exception e)
{e.printStackTrace();}
........................
}
Another thread calls the track_allocation method every minute.
public void track_allocation()
{
class Track_Thread implements Runnable {
String[] track;
Track_Thread(String[] s)
{
track = s;
}
public void run()
{
}
public void run(String[] s)
{
MonitoringForm.txtInforamtion.append(Thread.currentThread()+"has started runnning");
String query = "";
track = getMaxBenefit(track);
track = quickSort(track, 0, track.length-1);
for(int x=0;x<track.length;x++)
{
query = track[x].split(",")[0];
try
{
DatabaseConnection.insertQuery(query);
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
}
joblist = updateJobList.values();
MonitoringForm.txtInforamtion.append("\nSize of the joblist is:"+joblist.size());
int n = joblist.size()/6;
String[][] jobs = new String[6][n+6];
MonitoringForm.txtInforamtion.append("number of threads:"+n);
int i = 0;
if(n>0)
{
MonitoringForm.txtInforamtion.append("\nSize of the joblist is:"+joblist.size());
synchronized(this)
{
updateJobList.clear();
}
Thread[] threads = new Thread[6];
Iterator it = joblist.iterator();
int k = 0;
for(int j=0;j<6; j++)
{
for(k = 0; k<n; k++)
{
jobs[j][k] = it.next().toString();
MonitoringForm.txtInforamtion.append("\n\ninserted into queue:\n"+jobs[j][k]+"\n");
}
if(it.hasNext() && j == 5)
{
while(it.hasNext())
{
jobs[j][++k] = it.next().toString();
}
}
threads[j] = new Thread(new Track_Thread(jobs[j]));
threads[j].start();
}
}
}
I can see a glaring mistake. This is the implementation of your Track_Thread class's run method.
public void run()
{
}
So, when you do this:
threads[j] = new Thread(new Track_Thread(jobs[j]));
threads[j].start();
..... the thread starts, and then immediately ends, having done absolutely nothing. Your run(String[]) method is never called!
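A minimal sketch of the fix: do the work in the no-argument run(), using the array passed to the constructor. TrackTask is a hypothetical stand-in for Track_Thread, and counting the non-null entries is only a placeholder for the real processing.

```java
class RunOverrideDemo {

    // Hypothetical stand-in for Track_Thread: the work lives in run().
    static class TrackTask implements Runnable {
        private final String[] track;
        private volatile int processed;          // written by the worker thread

        TrackTask(String[] track) {
            this.track = track;
        }

        @Override
        public void run() {
            // Thread.start() only ever invokes this no-argument method.
            process(track);
        }

        // Placeholder for the real work: counts the non-null entries.
        private void process(String[] items) {
            int count = 0;
            for (String item : items) {
                if (item != null) {
                    count++;
                }
            }
            processed = count;
        }

        int processedCount() {
            return processed;
        }
    }

    // Runs a TrackTask on its own thread and returns how many entries it processed.
    static int runAndCount(String[] items) {
        TrackTask task = new TrackTask(items);
        Thread worker = new Thread(task);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return task.processedCount();
    }

    public static void main(String[] args) {
        System.out.println(runAndCount(new String[] {"a", "b", null}));  // prints 2
    }
}
```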
In addition, your approach of iterating the map and then clearing it while other threads are simultaneously adding is likely to lead to entries occasionally being removed from the map without the iteration actually seeing them.
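One possible way to avoid losing entries is to remove them one at a time with the two-argument remove, instead of copying and then calling clear(). The sketch below is an assumption about how that could look, not the only option; DrainDemo and drain are hypothetical names. Note also that values() on a ConcurrentHashMap returns a live view of the map, not a snapshot, so it is emptied by clear() as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class DrainDemo {

    // Removes and returns the values seen during the traversal, one entry at a time.
    static List<String> drain(ConcurrentMap<String, String> map) {
        List<String> drained = new ArrayList<>();
        for (Map.Entry<String, String> entry : map.entrySet()) {
            String key = entry.getKey();
            String value = entry.getValue();
            // remove(key, value) succeeds only if the key is still mapped to that value,
            // so an entry leaves the map only when it is also captured here.
            if (map.remove(key, value)) {
                drained.add(value);
            }
        }
        return drained;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> jobs = new ConcurrentHashMap<>();
        jobs.put("A", "query-a");
        jobs.put("B", "query-b");
        System.out.println(drain(jobs).size());  // prints 2
        System.out.println(jobs.isEmpty());      // prints true
    }
}
```

Entries added by another thread while the drain is running are either picked up by this traversal or stay in the map for the next one; none are silently discarded.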
While I have your attention, you have a lot of style errors in your code:
The indentation is a mess.
You have named your class incorrectly: it is NOT a thread, and the identifier violates the Java naming conventions.
Your use of white-space in statements is inconsistent.
These things make your code hard to read ... and to be frank, they put me off trying to really understand it.