Threaded sort running slower than non threaded sorting - java

I am trying to sort files using threads. Here is Sort.java:
This function sorts with the help of threads:
public static String[] threadedSort(File[] files) throws IOException {
String sortedData[] = new String[0];
int counter = 0;
boolean allThreadsTerminated = false;
SortingThread[] threadList = new SortingThread[files.length];
for (File file : files) {
String[] data = getData(file);
threadList[counter] = new SortingThread(data);
threadList[counter].start();
counter++;
}
while(!allThreadsTerminated) {
allThreadsTerminated = true;
for(counter=0; counter<files.length; counter++) {
if(threadList[counter].getState() != Thread.State.TERMINATED) {
allThreadsTerminated = false;
}
}
}
for(counter=0; counter<files.length; counter++) {
sortedData = MergeSort.merge(sortedData, threadList[counter].data);
}
return sortedData;
}
This function sorts sequentially, without threads:
public static String[] sort(File[] files) throws IOException {
String[] sortedData = new String[0];
for (File file : files) {
String[] data = getData(file);
data = MergeSort.mergeSort(data);
sortedData = MergeSort.merge(sortedData, data);
}
return sortedData;
}
Now when I sort using both approaches, the normal sort is faster than the threaded version. What could be the reason? Have I missed something?
My SortingThread looks something like this:
public class SortingThread extends Thread {
String[] data;
SortingThread(String[] data) {
this.data = data;
}
public void run() {
data = MergeSort.mergeSort(data);
}
}
When I compare the performance of my threaded implementation with the original non-threaded one, I find the non-threaded one faster. What could be the reason for such behavior? If I am not wrong, we would expect the threaded implementation to be faster.
EDIT: Assume I have a properly working MergeSort; there is no use posting its code here. Also, the getData() function just reads the input from a file.
I think the problem lies in the fact that I am reading the whole file into an array. I think I should give different lines to different threads:
private static String[] getData(File file) throws IOException {
ArrayList<String> data = new ArrayList<String>();
BufferedReader in = new BufferedReader(new FileReader(file));
while (true) {
String line = in.readLine();
if (line == null) {
break;
}
else {
data.add(line);
}
}
in.close();
return data.toArray(new String[0]);
}

First of all, how do you measure elapsed time? Do you execute both tests in the same program? If so, keep in mind that mergesort will probably undergo HotSpot compilation while the first test is executed. I suggest you run each method twice and measure the time on the second run.
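The warm-up advice above can be sketched as follows. This is only an illustration under stated assumptions: WarmupTiming and timeAfterWarmup are hypothetical names, Arrays.sort stands in for the poster's MergeSort, and a single measured run is still a rough number rather than a rigorous benchmark.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper, not part of the original code.
class WarmupTiming {
    // Runs the task `warmups` times untimed (giving the JIT a chance to
    // compile the hot code), then returns the elapsed nanoseconds of one
    // further, measured run.
    static long timeAfterWarmup(Runnable task, int warmups) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] source = new Random(42).ints(200_000).toArray();
        // Sort a fresh copy on every run so each run does the same work.
        Runnable sortTask = () -> Arrays.sort(source.clone());
        long nanos = timeAfterWarmup(sortTask, 5);
        System.out.println("measured run: " + nanos / 1_000_000.0 + " ms");
    }
}
```

Timing both the threaded and the sequential method this way, each with its own warm-up, keeps the comparison from being skewed by whichever one happens to run first.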

How many CPUs/cores do you have? One problem with this code is that the main thread spends CPU time in the "while(!allThreadsTerminated)" loop, actively checking thread state. If you have only one CPU, you are wasting it on polling instead of on the actual sorting.
Replace the while loop with the following (join() throws InterruptedException, so threadedSort must declare or handle it):
for(counter=0; counter<files.length; counter++) {
threadList[counter].join();
}
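A self-contained sketch of the join-based wait, assuming Arrays.sort as a stand-in for MergeSort.mergeSort; JoinSortDemo and sortAll are hypothetical names used only for illustration:

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original code.
class JoinSortDemo {
    // Sorts each array in place on its own thread, then blocks in join()
    // until every worker has finished. No CPU is spent polling thread state.
    static void sortAll(String[][] chunks) {
        Thread[] workers = new Thread[chunks.length];
        for (int i = 0; i < chunks.length; i++) {
            final String[] chunk = chunks[i];
            // Arrays.sort is an assumption standing in for MergeSort.mergeSort.
            workers[i] = new Thread(() -> Arrays.sort(chunk));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // sleeps until this worker terminates
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }
}
```

Besides saving CPU, join() also guarantees that the results written by each worker are visible to the joining thread once it returns.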

You should use Stream and standard sort:
static String[] sort(File[] files, boolean parallel) {
return (parallel ? Stream.of(files).parallel() : Stream.of(files))
.flatMap(f -> {
try {
return Files.lines(f.toPath());
} catch (Exception e) {
e.printStackTrace();
return null;
}
})
.sorted()
.toArray(String[]::new);
}
static String[] sort(File[] files) {
return sort(files, false);
}
static String[] threadSort(File[] files) {
return sort(files, true);
}
In my environment threadSort is faster.
sort:
files=511 sorted lines=104419 elapse=4784ms
threadSort:
files=511 sorted lines=104419 elapse=3060ms

You can use java.util.concurrent.ExecutorService, which runs all your tasks on a specified number of threads. Once all tasks have finished, invokeAll() returns a list of Future objects that hold the result of each task, in the same order as the Callable objects in the list you passed in.
First, have your SortingThread implement the Callable interface so that you can get the result of each task.
Each Callable has to implement the call() method; whatever call() returns is what the corresponding Future's get() hands back.
public class SortingThread implements Callable<String[]> {
String[] data;
SortingThread(String[] data) {
this.data = data;
}
@Override
public String[] call() throws Exception {
data = MergeSort.mergeSort(data);
return data;
}
}
Next, use ExecutorService for thread management.
public static String[] sortingExampleWithMultiThreads(File[] files)
        throws IOException, InterruptedException, ExecutionException {
    String[] sortedData = new String[0];
    List<Callable<String[]>> callableList = new ArrayList<Callable<String[]>>();
    for (File file : files) {
        String[] data = getData(file);
        callableList.add(new SortingThread(data)); // Prepare the Callable list that is passed to invokeAll().
    }
    // Create a fixed-size thread pool, one thread per file (the pool size must be at least 1).
    ExecutorService service = Executors.newFixedThreadPool(Math.max(1, files.length));
    // invokeAll() blocks until every task has completed and returns one Future per Callable.
    List<Future<String[]>> futureObjects = service.invokeAll(callableList);
    service.shutdown(); // let the pool's threads exit now that the tasks are done
    for (Future<String[]> future : futureObjects) {
        sortedData = MergeSort.merge(sortedData, future.get()); // get() unwraps the sorted array
    }
    return sortedData;
}
This way you avoid the busy-wait while loop, which burns CPU time doing no useful work (and so slows the sorting down): on a single-core CPU it can drive utilization to 100%, on a dual-core CPU to about 50%.
Also, letting ExecutorService manage the threads is generally better than starting threads yourself and polling them for results, so you can expect better performance.
I have not run this code, so you may need to make a change here and there, but it outlines the approach.
P.S.: When measuring performance, start a new JVM instance for each run to get clean, comparable results.
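As a variation on the approach above, the pool can be sized to the number of available cores rather than to the number of files, since extra threads beyond the core count rarely help CPU-bound sorting. The sketch below is a minimal illustration under assumptions: PooledSortDemo and sortAll are hypothetical names, and Arrays.sort stands in for MergeSort.mergeSort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class, not part of the original code.
class PooledSortDemo {
    // Returns sorted copies of the given arrays, in the same order.
    static List<String[]> sortAll(List<String[]> chunks) {
        int threads = Math.max(1, Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String[]>> tasks = new ArrayList<>();
            for (String[] chunk : chunks) {
                tasks.add(() -> {
                    String[] copy = chunk.clone();
                    Arrays.sort(copy); // assumption: stand-in for MergeSort.mergeSort
                    return copy;
                });
            }
            List<String[]> sorted = new ArrayList<>();
            // invokeAll blocks until all tasks are done; futures keep task order.
            for (Future<String[]> future : pool.invokeAll(tasks)) {
                sorted.add(future.get());
            }
            return sorted;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // otherwise the pool threads keep the JVM alive
        }
    }
}
```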


ConcurrentModificationException when trying to sum numbers of an Arraylist using multiple threads in Java

I'm new to multithreading in general, so I still don't fully understand it. I don't get why my code is having issues. I'm trying to populate an ArrayList with the first 1000 numbers, and then sum all of them using three threads.
public class Tst extends Thread {
private static int sum = 0;
private final int MOD = 3;
private final int compare;
private static final int LIMIT = 1000;
private static ArrayList<Integer> list = new ArrayList<Integer>();
public Tst(int compare){
this.compare=compare;
}
public synchronized void populate() throws InterruptedException{
for(int i=0; i<=Tst.LIMIT; i++){
if (i%this.MOD == this.compare){
list.add(i);
}
}
}
public synchronized void sum() throws InterruptedException{
for (Integer ger : list){
if (ger%MOD == this.compare){
sum+=ger;
}
}
}
@Override
public void run(){
try {
populate();
sum();
System.out.println(sum);
} catch (InterruptedException ex) {
Logger.getLogger(Tst.class.getName()).log(Level.SEVERE, null, ex);
}
}
public static void main(String[] args) {
Tst tst1 = new Tst(0);
tst1.start();
Tst tst2 = new Tst(1);
tst2.start();
Tst tst3 = new Tst(2);
tst3.start();
}
}
I expected it to print "500500", but instead it prints this:
162241
328741
Exception in thread "Thread-0" java.util.ConcurrentModificationException
at java.base/java.util.ArrayList$Itr.checkForComodification(ArrayList.java:1042)
at java.base/java.util.ArrayList$Itr.next(ArrayList.java:996)
at tst.Tst.sum(Tst.java:38)
at tst.Tst.run(Tst.java:50)
BUILD SUCCESSFUL (total time: 2 seconds)
The problem happens because your methods are synchronized at the object level: the monitor lock each one uses belongs to a particular object (tst1, tst2 or tst3). In other words, each thread's synchronized methods use a different lock, so they do not exclude each other.
As a first step to fix it, make your synchronized methods static so that they all lock on the Tst class (they will then need compare passed in as a parameter).
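A minimal sketch of that first step, assuming the class is renamed StaticLockSum for illustration and given a hypothetical runAll() helper that starts the three threads and waits for them:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rewrite of Tst for illustration only.
class StaticLockSum extends Thread {
    private static final int MOD = 3;
    private static final int LIMIT = 1000;
    private static final List<Integer> list = new ArrayList<>();
    private static int total = 0;
    private final int compare;

    StaticLockSum(int compare) {
        this.compare = compare;
    }

    // static synchronized methods lock StaticLockSum.class, which is the
    // same monitor for every instance, so the threads exclude each other.
    private static synchronized void populate(int compare) {
        for (int i = 0; i <= LIMIT; i++) {
            if (i % MOD == compare) list.add(i);
        }
    }

    private static synchronized void sum(int compare) {
        for (int value : list) {
            if (value % MOD == compare) total += value;
        }
    }

    @Override
    public void run() {
        populate(compare);
        sum(compare);
    }

    // Starts one thread per remainder, waits for all, returns the total.
    static int runAll() {
        synchronized (StaticLockSum.class) {
            list.clear();
            total = 0;
        }
        Thread[] workers = { new StaticLockSum(0), new StaticLockSum(1), new StaticLockSum(2) };
        for (Thread worker : workers) worker.start();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        synchronized (StaticLockSum.class) {
            return total;
        }
    }
}
```

Reading the total only after joining all three threads is what makes the final value meaningful; printing from inside run(), as in the question, shows intermediate sums.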
While tst1's run() is iterating over the list in the for-each loop of sum(), tst2's run() may be adding to the list, which is what throws the ConcurrentModificationException. Joining each thread before starting the next avoids this (at the cost of running the threads one after another):
public static void main(String[] args) throws InterruptedException {
    Tst tst1 = new Tst(0);
    tst1.start();
    tst1.join();
    Tst tst2 = new Tst(1);
    tst2.start();
    tst2.join();
    Tst tst3 = new Tst(2);
    tst3.start();
}
You misunderstood the semantics of synchronized methods: in your case each one uses a different lock object. Do it this way instead, with a single shared object that owns both the list and the lock:
class SynchList {
private int sum = 0;
private final int MOD = 3;
private int compare;
private final int LIMIT = 1000;
private ArrayList<Integer> list = new ArrayList<Integer>();
public synchronized void populate( int compare) throws InterruptedException{
for(int i=0; i<=LIMIT; i++){
if (i%this.MOD == compare){
list.add(i);
}
}
}
public synchronized void sum( int compare ) throws InterruptedException{
for (Integer ger : list){
if (ger%MOD == compare){
sum+=ger;
}
}
System.out.println( sum ); // print the running total once per call, not once per element
}
}
class Tst extends Thread {
int compare;
SynchList synchList;
public Tst(int compare, SynchList synchList)
{
this.compare= compare;
this.synchList = synchList;
}
@Override
public void run(){
try {
synchList.populate( compare );
synchList.sum( compare );
} catch (InterruptedException ex) {
Logger.getLogger(Tst.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
public class Main
{
public static void main(String[] args) {
SynchList synchList = new SynchList();
Tst tst1 = new Tst( 0 , synchList );
tst1.start();
Tst tst2 = new Tst( 1, synchList );
tst2.start();
Tst tst3 = new Tst( 2, synchList );
tst3.start();
}
}
Your use of synchronized methods isn't doing what you think it's doing. The way your code is written, the methods "sum" and "populate" are protected from running at the same time, but only on the same thread instance. That means calls to "sum" and "populate" for a single Tst object will happen one at a time, but simultaneous calls to "sum" on different object instances will be allowed to happen concurrently.
Using synchronized on a method is equivalent to wrapping the entire method body in synchronized(this) { ... }. With three different instances created (tst1, tst2, and tst3), this form of synchronization doesn't guard across object instances. Instead, it guarantees that only one of populate or sum will be running at a time on a single object; any other calls to one of those methods (on the same object instance) will wait until the prior one finishes.
Take a look at 8.4.3.6, "synchronized Methods", in the Java Language Specification for more detail.
Your use of static might also not be doing what you think it's doing. Your code shares two things across all instances of the Tst thread class, namely sum and list. Because these are defined as static, there is exactly one sum and one list. Nothing in your code guards either of them against concurrent changes.
For example, as threads update "sum" (with the line sum+=ger), the results are non-deterministic: += is a read-modify-write, so concurrent updates can be lost and you will likely see different results from run to run.
Another example of unexpected behavior with multiple threads and a single static variable is list, which grows over time and can run into concurrency issues. The Javadoc says:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally.
Modifications include adding values as well as growing the backing array store. Without specifying a starting size – new ArrayList() – it will default to 10 or possibly some other relatively small number depending on which JVM version you're using. Once one thread tries to add an item that exceeds the ArrayList's capacity, it will trigger an automatic resize.
Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. It is always at least as large as the list size. As elements are added to an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.
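One way to sidestep both hazards described above is to avoid the shared list entirely and to accumulate into an atomic counter. The sketch below is an illustration under assumptions, not the poster's code: AtomicSumDemo and sumUpTo are hypothetical names, and each thread sums its own remainder class directly instead of populating an ArrayList first.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of the original code.
class AtomicSumDemo {
    // Sums 0..limit using `threads` workers; worker t handles the numbers
    // whose remainder modulo `threads` is t.
    static int sumUpTo(int limit, int threads) {
        AtomicInteger total = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int remainder = t;
            workers[t] = new Thread(() -> {
                int local = 0; // thread-confined, needs no locking
                for (int i = remainder; i <= limit; i += threads) {
                    local += i;
                }
                total.addAndGet(local); // one atomic update per thread
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total.get();
    }
}
```

Keeping the per-thread work in a local variable means the only shared write is the final addAndGet, so there is nothing left for the threads to race on.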

Parallelizing a fitness function in genetic algorithm

I'm a beginner in Java, and as homework I'm supposed to add concurrency to the genetic algorithm solution for the Travelling Salesman Problem posted here. Our goal is to have the chromosome evaluation performed by threads. So my guess is I have to rewrite this part of the code to be multithreaded:
// Gets the best tour in the population
public Tour getFittest() {
Tour fittest = tours[0];
// Loop through individuals to find fittest
for (int i = 1; i < populationSize(); i++) {
if (fittest.getFitness() <= getTour(i).getFitness()) {
fittest = getTour(i);
}
}
return fittest;
}
// Gets population size
public int populationSize() {
return tours.length;
}
Originally I intended to split the array between threads manually, but I believe that's not the best solution to the problem. So I did some research, and everyone suggests using either parallel streams or ExecutorService. However, I had trouble applying either of these solutions, even though I tried to emulate examples posted in other threads. So my questions are: how exactly do I implement them in this case, and which one is faster?
Edit: Sorry, I forgot to post the solution I've tried. Here it is:
public Tour getFittest() {
Tour fittest = tours[0];
synchronized (fittest) {
final ExecutorService executor = Executors.newFixedThreadPool(4);
final List<Future<?>> futures = new ArrayList<>();
for (int i = 1; i < populationSize(); i++) {
Future<?> future = executor.submit((Runnable) () -> {
if (fittest.getFitness() <= getTour(i).getFitness()) {
fittest = getTour(i);
}
});
futures.add(future);
}
try {
for (Future<?> future : futures) {
future.get();
}
}catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
return fittest;
}
public int populationSize() {
return tours.length;
}
However when trying to run it I receive "Local variable fittest defined in an enclosing scope must be final or effectively final" error at line:
fittest = getTour(i);
And I have no clue why it's happening or how I can fix it, as adding the final keyword when initializing it does not fix it. Other than that, I have some doubts about using the synchronized keyword in this solution. I believe that to achieve true multithreading I need to make use of it, because the resource is shared by several threads. Am I right? Sadly I didn't save my attempt at using streams, but I have trouble understanding how they work at all.
Edit2: I managed to "fix" my solution by adding two workarounds. Currently my code looks like this:
public Tour getFittest() {
Tour fittest = tours[0];
synchronized (fittest) {
final ExecutorService executor = Executors.newFixedThreadPool(4);
final List<Future<?>> futures = new ArrayList<>();
for (int i = 1; i < populationSize(); i++) {
final Integer innerI = new Integer(i);
Future<?> future = executor.submit((Runnable) () -> {
if (fittest.getFitness() <= getTour(innerI).getFitness()) {
setFitness(innerI, fittest);
}
}
);
futures.add(future);
}
try {
for (Future<?> future : futures) {
future.get();
}
}catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
return fittest;
}
public int populationSize() {
return tours.length;
}
public Tour setFitness (int i, Tour fittest) {
fittest = getTour(i);
return fittest;
}
That said, while it's compiling, there are two problems. Memory usage keeps rising every second the program runs, maxing out my 16GB of RAM in like ten seconds while variable 'fittest' does not change at all. So I guess I'm still doing something wrong.
Here is my streams implementation:
private static Tour getFittest(Tour[] tours) {
    // Let the stream build the list: calling lengths.add(...) from a parallel
    // forEach would mutate a non-thread-safe ArrayList from several threads.
    List<Map.Entry<Tour, Double>> lengths = Arrays.stream(tours).parallel()
            .map(t -> new AbstractMap.SimpleEntry<Tour, Double>(t, t.getLength()))
            .collect(Collectors.toList());
    return Collections.min(lengths, Comparator.comparingDouble(Map.Entry::getValue)).getKey();
}
On closer inspection, this can be a one-liner, depending on your definition:
private static Tour getFittest(Tour[] tours) {
return Arrays.stream(tours).parallel().map(t -> new AbstractMap.SimpleEntry<Tour, Double>(t, t.getLength()))
.min(Comparator.comparingDouble(Map.Entry::getValue)).get().getKey();
}
Also, on closer inspection, the original code uses getFitness(), which is the reciprocal of the length. If you use that, use max() instead of min().
Even better, after review:
return Arrays.stream(tours).parallel()
.min(Comparator.comparingDouble(Tour::getLength)).get();
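A self-contained version of the fitness-based variant mentioned above might look like this. The nested Tour class is a hypothetical stand-in with just enough behaviour to run the example (fitness as the reciprocal of the length); the real Tour class from the tutorial is not reproduced here.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical demo class, not part of the original code.
class FittestDemo {
    // Minimal stand-in for the tutorial's Tour class (an assumption).
    static final class Tour {
        private final double length;

        Tour(double length) {
            this.length = length;
        }

        double getLength() {
            return length;
        }

        double getFitness() {
            return 1.0 / length; // shorter tours are fitter
        }
    }

    // Evaluates fitness in parallel and returns the fittest tour.
    static Tour getFittest(Tour[] tours) {
        return Arrays.stream(tours)
                .parallel()
                .max(Comparator.comparingDouble(Tour::getFitness))
                .orElseThrow(() -> new IllegalArgumentException("empty population"));
    }
}
```

No synchronized block and no mutable local variable is needed here, because the stream's reduction combines the per-thread results itself; that also avoids the "effectively final" compile error from the question.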

Concurrent Modification Exception in Callable class

I'm trying to split a list of objects within smaller sublist and to process them separately on different threads. So I have following code:
List<Instance> instances = xmlInstance.readInstancesFromXml();
List<Future<List<Instance>>> futureList = new ArrayList<>();
int nThreads = 4;
ExecutorService executor = Executors.newFixedThreadPool(nThreads);
final List<List<Instance>> instancesPerThread = split(instances, nThreads);
for (List<Instance> instancesThread : instancesPerThread) {
if (instancesThread.isEmpty()) {
break;
}
Callable<List<Instance>> callable = new MyCallable(instancesThread);
Future<List<Instance>> submit = executor.submit(callable);
futureList.add(submit);
}
instances.clear();
for (Future<List<Instance>> future : futureList) {
try {
final List<Instance> instancesFromFuture = future.get();
instances.addAll(instancesFromFuture);
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
executor.shutdown();
try {
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
} catch (InterruptedException ie) {
ie.printStackTrace();
}
And the MyCallable class :
public class MyCallable implements Callable<List<Instance>> {
private List<Instance> instances;
public MyCallable (List<Instance> instances) {
this.instances = Collections.synchronizedList(instances);
}
@Override
public List<Instance> call() throws Exception {
for (Instance instance : instances) {
//process each object and changing some fields;
}
return instances;
}
}
The split method (it splits a given list into the given number of sublists, trying to keep the sublists almost the same size):
public static List<List<Instance>> split(List<Instance> list, int nrOfThreads) {
List<List<Instance>> parts = new ArrayList<>();
final int nrOfItems = list.size();
int minItemsPerThread = nrOfItems / nrOfThreads;
int maxItemsPerThread = minItemsPerThread + 1;
int threadsWithMaxItems = nrOfItems - nrOfThreads * minItemsPerThread;
int start = 0;
for (int i = 0; i < nrOfThreads; i++) {
int itemsCount = (i < threadsWithMaxItems ? maxItemsPerThread : minItemsPerThread);
int end = start + itemsCount;
parts.add(list.subList(start, end));
start = end;
}
return parts;
}
So, when I try to execute it, I get a java.util.ConcurrentModificationException on the line for (Instance instance : instances) {. Can somebody give me any idea why it's happening?
public MyCallable (List<Instance> instances) {
this.instances = Collections.synchronizedList(instances);
}
Using synchronizedList like this doesn't help you in the way you think it might.
It's only useful to wrap a list in a synchronizedList at the time you create it (e.g. Collections.synchronizedList(new ArrayList<>())). Otherwise, the underlying list is directly accessible, and thus accessible in an unsynchronized way.
Additionally, synchronizedList only synchronizes for the duration of individual method calls, not for the whole time while you are iterating over it.
The easiest fix here is to take a copy of the list in the constructor:
this.instances = new ArrayList<>(instances);
Then, nobody else has access to that list, so they can't change it while you are iterating it.
This is different to taking a copy of the list in the call method, because the copy is done in a single-threaded part of the code: no other thread can be modifying it while you are taking that copy, so you won't get the ConcurrentModificationException (you can get a CME in single-threaded code, but not using this copy constructor). Doing the copy in the call method means the list is iterated, in exactly the same way as with the for loop you already have.
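The effect of the copy can be reproduced deterministically in a single thread, because the parts returned by split are List.subList views backed by the original list, and instances.clear() is a structural change to that list. The sketch below uses hypothetical names (SubListCopyDemo, failsAfterClear) and Integer elements in place of Instance:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class, not part of the original code.
class SubListCopyDemo {
    // Returns true if iterating the part after clearing the source list
    // throws ConcurrentModificationException.
    static boolean failsAfterClear(boolean copyFirst) {
        List<Integer> source = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        List<Integer> part = source.subList(0, 2); // a view backed by source
        if (copyFirst) {
            part = new ArrayList<>(part); // independent copy, as in the fix
        }
        source.clear(); // structural change, like instances.clear()
        try {
            int sum = 0;
            for (int value : part) {
                sum += value;
            }
            return false; // iteration completed normally
        } catch (ConcurrentModificationException e) {
            return true; // the view noticed the change to its backing list
        }
    }
}
```

With the view, the iteration fails; with the copy, it completes, which matches the behaviour described in the answer.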

Why does a concurrent hash map work properly when accessed by two thread, one using the clear() and other using the putifAbsent() methods?

I am implementing an application using concurrent hash maps. It is required that one thread adds data into the CHM, while there is another thread that copies the values currently in the CHM and erases it using the clear() method. When I run it, after the clear() method is executed, the CHM always remains empty, though the other thread continues adding data to CHM.
Could someone tell me why it is so and help me find the solution.
This is the method that adds data to the CHM. This method is called from within a thread.
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentHashMap;
public static ConcurrentMap<String, String> updateJobList = new ConcurrentHashMap<String, String>(8, 0.9f, 6);
public void setUpdateQuery(String ticker, String query)
throws RemoteException {
dataBaseText = "streamming";
int n = 0;
try {
updateJobList.putIfAbsent(ticker, query);
}
catch(Exception e)
{e.printStackTrace();}
........................
}
Another thread calls the track_allocation method every minute.
public void track_allocation()
{
class Track_Thread implements Runnable {
String[] track;
Track_Thread(String[] s)
{
track = s;
}
public void run()
{
}
public void run(String[] s)
{
MonitoringForm.txtInforamtion.append(Thread.currentThread()+"has started runnning");
String query = "";
track = getMaxBenefit(track);
track = quickSort(track, 0, track.length-1);
for(int x=0;x<track.length;x++)
{
query = track[x].split(",")[0];
try
{
DatabaseConnection.insertQuery(query);
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
}
joblist = updateJobList.values();
MonitoringForm.txtInforamtion.append("\nSize of the joblist is:"+joblist.size());
int n = joblist.size()/6;
String[][] jobs = new String[6][n+6];
MonitoringForm.txtInforamtion.append("number of threads:"+n);
int i = 0;
if(n>0)
{
MonitoringForm.txtInforamtion.append("\nSize of the joblist is:"+joblist.size());
synchronized(this)
{
updateJobList.clear();
}
Thread[] threads = new Thread[6];
Iterator it = joblist.iterator();
int k = 0;
for(int j=0;j<6; j++)
{
for(k = 0; k<n; k++)
{
jobs[j][k] = it.next().toString();
MonitoringForm.txtInforamtion.append("\n\ninserted into queue:\n"+jobs[j][k]+"\n");
}
if(it.hasNext() && j == 5)
{
while(it.hasNext())
{
jobs[j][++k] = it.next().toString();
}
}
threads[j] = new Thread(new Track_Thread(jobs[j]));
threads[j].start();
}
}
}
I can see a glaring mistake. This is the implementation of your Track_Thread class's run method.
public void run()
{
}
So, when you do this:
threads[j] = new Thread(new Track_Thread(jobs[j]));
threads[j].start();
..... the thread starts, and then immediately ends, having done absolutely nothing. Your run(String[]) method is never called!
In addition, your approach of iterating the map and then clearing it while other threads are simultaneously adding is likely to lead to entries occasionally being removed from the map without the iteration actually seeing them.
While I have your attention, you have a lot of style errors in your code:
The indentation is a mess.
You have named your class incorrectly: it is NOT a thread, and the identifier Track_Thread ignores the Java naming conventions.
Your use of white-space in statements is inconsistent.
These things make your code hard to read ... and to be frank, they put me off trying to really understand it.
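Regarding the risk of losing entries when the map is read and then cleared while another thread keeps adding: one possible alternative is to remove entries one key at a time and keep only what was actually removed. This is a sketch under assumptions (DrainDemo and drain are hypothetical names), not a drop-in replacement for track_allocation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo class, not part of the original code.
class DrainDemo {
    // Removes the entries currently visible in the map, one key at a time,
    // and returns the values this call actually removed. Entries added
    // after the iterator has passed simply stay in the map for the next drain.
    static List<String> drain(ConcurrentMap<String, String> map) {
        List<String> drained = new ArrayList<>();
        for (String key : map.keySet()) {
            String value = map.remove(key); // atomic for this key
            if (value != null) {
                drained.add(value);
            }
        }
        return drained;
    }
}
```

Because nothing is deleted without first being captured, a concurrent putIfAbsent can no longer be wiped out unseen by a later clear().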

Java: What type of list to use in my multi-threaded app

I'm new to using threads and just trying to figure things out. My end game is to have a list of URLs, my program will take one URL from the list at a time and perform an action using that URL. There'll be a lot of URLs and this list may possibly be added to while some threads are using the same list.
To start experimenting and learning I'm using a simple ArrayList filled with numbers and am using a threaded pool to get the URLs. Here's my code:
public static void main(String[] args) {
for (int i = 0; i < 200; i++){
test.add(i);
}
SlothTest runner = new SlothTest();
Thread alpha = new Thread(runner);
Thread beta = new Thread(runner);
ExecutorService tasker = Executors.newFixedThreadPool(10);
while (!listEmpty()){
tasker.submit(new SlothTest());
}
tasker.shutdown();
System.out.println("Complete...");
}
@Override
public void run() {
getLink();
try {
Thread.sleep(20);
} catch (InterruptedException e) {
}
}
private synchronized String getLink(){
link = Thread.currentThread().getName() + " printed " + test.indexOf(test.size()-1);
test.remove(test.size()-1);
System.out.println(link);
return link;
}
private synchronized static boolean listEmpty(){
if (test.size() > 0){
return false;
} else {
return true;
}
}
I'm running into some concurrency issues while running the program and getting some -1's in my output. I'm not sure why this is happening. I know my code above is rough, but I'm really at the learning stage of multi-threaded apps. Can anyone help me, first with fixing my concurrency issue, and then with any pointers about the code above? That would be great.
One problem is that
while (!listEmpty()){
tasker.submit(new SlothTest());
}
is not atomic. So listEmpty might return false, but become true by the time you reach the next statement.
Another one is that you synchronize on two different monitors:
private synchronized String getLink(){ //synchronized on this
private synchronized static boolean listEmpty(){//synchronized on SlothTest.class
Have you considered using a BlockingQueue instead of a list? It has useful methods for what you are trying to achieve.
Try using a ConcurrentLinkedQueue for your list of URLs. This is a good implementation often used in producer-consumer examples, similar to yours (although you don't have an active 'producer', per-se).
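A minimal sketch of the queue-based approach, assuming a fixed number of workers that each keep polling until the queue is empty. QueueWorkersDemo and processAll are hypothetical names, and counting stands in for whatever real work is done per URL:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of the original code.
class QueueWorkersDemo {
    // Drains the queue with `workers` threads and returns how many items
    // were processed in total.
    static int processAll(ConcurrentLinkedQueue<String> urls, int workers) {
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                String url;
                // poll() atomically removes the head, or returns null when
                // the queue is empty, so no separate emptiness check is needed.
                while ((url = urls.poll()) != null) {
                    processed.incrementAndGet(); // placeholder for real work on url
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }
}
```

Submitting one long-lived task per worker, instead of one task per list element from a while loop in main, also removes the check-then-submit race from the question.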
You're not globally synchronizing. By using synchronized methods you are locking the current instance, which is different for each task. You should use a global lock instead:
final static Object globalLock = new Object();
private String getLink() {
synchronized (globalLock) {
link = Thread.currentThread().getName() + " printed " + test.indexOf(test.size()-1);
test.remove(test.size()-1);
}
System.out.println(link);
return link;
}
private boolean listEmpty(){
synchronized (globalLock) {
if (test.size() > 0){
return false;
} else {
return true;
}
}
}
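Even with a global lock, checking listEmpty() and then calling getLink() are still two separate steps, so the list can become empty in between. A possible refinement is to do the check and the removal inside one critical section. The sketch below uses hypothetical names (SharedStack, takeLast) and returns null to signal an empty list:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original code.
class SharedStack {
    private final Object lock = new Object();
    private final List<Integer> items = new ArrayList<>();

    void add(int value) {
        synchronized (lock) {
            items.add(value);
        }
    }

    // Emptiness check and removal happen under the same lock, so no other
    // thread can empty the list between the two. Returns null when empty.
    Integer takeLast() {
        synchronized (lock) {
            if (items.isEmpty()) {
                return null;
            }
            return items.remove(items.size() - 1);
        }
    }
}
```

A worker would then loop on takeLast() until it returns null, rather than consulting a separate emptiness method first.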
