In my program I essentially have a method similar to:
for (int x=0; x<numberofimagesinmyfolder; x++){
for(int y=0; y<numberofimagesinmyfolder; y++){
compare(imagex,imagey);
if(match==true){
System.out.println("image x matches image y");
}
}
}
So basically I have a folder of images and I compare all combinations of images...so compare image 1 against all images then image 2...and so on. My problem is when searching to see what images match, it takes a long time. I am trying to multithread this process. Does anyone have any idea of how to do this?
Instead of comparing the images every time, hash the images, save the hashes, and then compare the hashes of each pair of images. Since a hash is far smaller than an image, you can fit more into memory and cache, which should significantly speed up comparisons.
There is probably a better way to do the search for equality as well, but one option would be to stick all the hashes into an array then sort them by hash value. Then iterate over the list looking for adjacent entries that are equal. This should be O(n*log(n)) instead of O(n^2) like your current version.
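A minimal sketch of the hash-then-sort idea, assuming each image is available as a byte array and that "match" means byte-for-byte identical content (a cryptographic hash will not detect images that merely look alike). The names DuplicateFinder, Hashed, and findDuplicates are made up for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DuplicateFinder {

    /** Pairs a (hypothetical) image index with the hex digest of its bytes. */
    static final class Hashed {
        final int index;
        final String hash;

        Hashed(int index, String hash) {
            this.index = index;
            this.hash = hash;
        }
    }

    /** Hex-encoded SHA-256 digest of the raw bytes. */
    static String sha256(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns index pairs {a, b} of neighbours in hash order whose hashes are equal. */
    static List<int[]> findDuplicates(List<byte[]> images) {
        Hashed[] hashed = new Hashed[images.size()];
        for (int i = 0; i < hashed.length; i++) {
            hashed[i] = new Hashed(i, sha256(images.get(i))); // hash each image once
        }
        // Sorting costs O(n log n); equal hashes end up next to each other.
        Arrays.sort(hashed, Comparator.comparing((Hashed h) -> h.hash));
        List<int[]> pairs = new ArrayList<>();
        for (int i = 1; i < hashed.length; i++) {
            if (hashed[i].hash.equals(hashed[i - 1].hash)) {
                pairs.add(new int[] { hashed[i - 1].index, hashed[i].index });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Stand-in data; real code would read the image files instead.
        List<byte[]> images = Arrays.asList(
                "cat".getBytes(StandardCharsets.UTF_8),
                "dog".getBytes(StandardCharsets.UTF_8),
                "cat".getBytes(StandardCharsets.UTF_8));
        for (int[] p : findDuplicates(images)) {
            System.out.println("image " + p[0] + " matches image " + p[1]);
        }
        // prints "image 0 matches image 2"
    }
}
```

Note that for three or more identical images this only reports neighbouring pairs in the sorted order, so you would have to expand each run of equal hashes yourself if you need every combination.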
The inner loop should start at y = x + 1 to take advantage of symmetry.
Load all images into memory first. Don't run every comparison from disk.
Use a Java ExecutorService (basically a thread pool). Queue tasks for all index combinations. Let threads pull index combinations out of a task queue and execute comparisons.
Here is some general code to do the multithreading:
public static class CompareTask implements Runnable {
CountDownLatch completion;
Object imgA;
Object imgB;
public CompareTask(CountDownLatch completion, Object imgA, Object imgB) {
this.completion = completion;
this.imgA = imgA;
this.imgB = imgB;
}
@Override
public void run() {
// TODO: Do computation...
try {
System.out.println("Thread simulating task start.");
Thread.sleep(500);
System.out.println("Thread simulating task done.");
} catch (InterruptedException e) {
e.printStackTrace();
}
completion.countDown();
}
}
public static void main(String[] args) throws Exception {
Object[] images = new Object[10];
ExecutorService es = Executors.newFixedThreadPool(5);
CountDownLatch completion = new CountDownLatch(images.length * (images.length - 1) / 2);
for (int i = 0; i < images.length; i++) {
for (int j = i + 1; j < images.length; j++) {
es.submit(new CompareTask(completion, images[i], images[j]));
}
}
System.out.println("Submitted tasks. Waiting...");
completion.await();
System.out.println("Done");
es.shutdown();
}
I do not have a background in CS. I am really new to parallel programming and I do not know how exactly the hardware works when running a program. However, I have noticed the following. Say I have:
public class Counter {
private static int parallelCount = 0;
private static int sequentialCount = 0;
public static void main(String[] args) {
int n = 1000;
// I count in parallel:
IntStream.range(0, n).parallel().forEach(i -> {
parallelCount++;
});
// I count sequentially:
for (int i = 0; i < n; i++) {
sequentialCount++;
}
System.out.println("parallelCount = " + parallelCount);
System.out.println("sequentialCount = " + sequentialCount);
}
}
Why might I get:
parallelCount = 984
sequentialCount = 1000
I guess this has to do with the hardware and the way the compiler accesses memory. I am really interested to know why this happens. And what is one possible solution?
Whenever more than one thread can access a mutable value without synchronization, updates can be lost, which is the kind of problem you are facing. No one can be sure what the result will be, and it will often be wrong, because you cannot guarantee which thread writes the value last.
Therefore, you need to synchronize access to the shared resource (the integer you are incrementing) so that all threads see the latest value and the answer is always correct.
Coming to your program, you can make the parallelCount variable an AtomicInteger, like AtomicInteger parallelCount = new AtomicInteger(); An AtomicInteger is thread-safe, meaning it can be updated concurrently without losing updates.
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
public class Counter {
private static AtomicInteger parallelCount = new AtomicInteger();
private static int sequentialCount = 0;
public static void main(String[] args) {
int n = 1000;
// I count in parallel:
IntStream.range(0, n).parallel().forEach(i -> {
parallelCount.getAndIncrement();
});
// I count sequentially:
for (int i = 0; i < n; i++) {
sequentialCount++;
}
System.out.println("parallelCount = " + parallelCount);
System.out.println("sequentialCount = " + sequentialCount);
}
}
As you would expect, the standard for loop increments sequentialCount 1000 times.
With the parallel stream, the runtime uses multiple threads to execute your function in parallel. In this situation, several threads can increment the value at the same time and store their result in the int.
For example, suppose two threads run in parallel and both want to increment parallelCount while it holds the value 50. Both threads read 50, compute the new value 51, and store 51 to memory, so one of the two increments is lost.
This pattern can produce other concurrency problems as well. To solve it, you can use synchronization, locking, atomic classes, or another approach.
Multiple threads perform an operation that is not atomic (incrementing a value).
The code you wrote translates to byte code and might cause something like this:
To avoid this, you need to synchronize the access to that critical code.
But note that if all of your code is critical code, then using multiple threads is redundant.
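As a rough illustration of synchronizing the critical code, here is a sketch that guards the increment with a lock object. The class name SynchronizedCounter and the LOCK field are made up for this example. The comment spells out why count++ is not atomic: it is a read, an addition, and a write back, and two threads can interleave between those steps.

```java
import java.util.stream.IntStream;

class SynchronizedCounter {
    // Hypothetical lock object guarding every access to count.
    private static final Object LOCK = new Object();
    private static int count = 0;

    static int countInParallel(int n) {
        synchronized (LOCK) {
            count = 0;
        }
        IntStream.range(0, n).parallel().forEach(i -> {
            // count++ is really three steps: read count, add 1, write it back.
            // Holding the lock means no other thread can slip in between them.
            synchronized (LOCK) {
                count++;
            }
        });
        synchronized (LOCK) {
            return count;
        }
    }

    public static void main(String[] args) {
        System.out.println("count = " + countInParallel(1000)); // prints "count = 1000"
    }
}
```

Since the whole body of the loop is the critical section here, the threads simply take turns, which is exactly the "redundant" case mentioned above. The lock only pays off when there is real work to do outside of it.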
AtomicInteger
We can use the AtomicInteger class from the java.util.concurrent.atomic package when working with parallel streams, because the behavior can be unpredictable when several threads update a primitive value.
import java.util.stream.IntStream;
import java.util.concurrent.atomic.AtomicInteger;
public class Main
{
private static AtomicInteger parallelCount = new AtomicInteger();
private static int sequentialCount = 0;
public static void main(String[] args) {
System.out.println("Hello World");
int n = 100000;
// I count in parallel:
IntStream.range(0, n).parallel().forEach(i -> {
parallelCount.incrementAndGet();
});
// I count sequentially:
for (int i = 0; i < n; i++) {
sequentialCount++;
}
System.out.println("parallelCount = " + parallelCount);
System.out.println("sequentialCount = " + sequentialCount);
}
}
Consider the following code:
public static void main(String[] args) throws InterruptedException {
int nThreads = 10;
MyThread[] threads = new MyThread[nThreads];
AtomicReferenceArray<Object> array = new AtomicReferenceArray<>(nThreads);
for (int i = 0; i < nThreads; i++) {
MyThread thread = new MyThread(array, i);
threads[i] = thread;
thread.start();
}
for (MyThread thread : threads)
thread.join();
for (int i = 0; i < nThreads; i++) {
Object obj_i = array.get(i);
// do something with obj_i...
}
}
private static class MyThread extends Thread {
private final AtomicReferenceArray<Object> pArray;
private final int pIndex;
public MyThread(final AtomicReferenceArray<Object> array, final int index) {
pArray = array;
pIndex = index;
}
@Override
public void run() {
// some entirely local time-consuming computation...
pArray.set(pIndex, /* result of the computation */);
}
}
Each MyThread computes something entirely locally (without need to synchronize with other threads) and writes the result to its specific array cell. The main thread waits until all MyThreads have finished, and then retrieves the results and does something with them.
Using the get and set methods of AtomicReferenceArray provides a memory ordering which guarantees that the main thread will see the results written by the MyThreads.
However, since every array cell is written only once, and no MyThread has to see the result written by any other MyThread, I wonder if these strong ordering guarantees are actually necessary or if the following code, with plain array cell accesses, would be guaranteed to always yield the same results as the code above:
public static void main(String[] args) throws InterruptedException {
int nThreads = 10;
MyThread[] threads = new MyThread[nThreads];
Object[] array = new Object[nThreads];
for (int i = 0; i < nThreads; i++) {
MyThread thread = new MyThread(array, i);
threads[i] = thread;
thread.start();
}
for (MyThread thread : threads)
thread.join();
for (int i = 0; i < nThreads; i++) {
Object obj_i = array[i];
// do something with obj_i...
}
}
private static class MyThread extends Thread {
private final Object[] pArray;
private final int pIndex;
public MyThread(final Object[] array, final int index) {
pArray = array;
pIndex = index;
}
@Override
public void run() {
// some entirely local time-consuming computation...
pArray[pIndex] = /* result of the computation */;
}
}
On the one hand, under plain mode access the compiler or runtime might happen to optimize away the read accesses to array in the final loop of the main thread and replace Object obj_i = array[i]; with Object obj_i = null; (the implicit initialization of the array) as the array is not modified from within that thread. On the other hand, I have read somewhere that Thread.join makes all changes of the joined thread visible to the calling thread (which would be sensible), so Object obj_i = array[i]; should see the object reference assigned by the i-th MyThread.
So, would the latter code produce the same results as the above?
So, would the latter code produce the same results as the above?
Yes.
The "somewhere" that you've read about Thread.join could be JLS 17.4.5 (The "Happens-before order" bit of the Java Memory Model):
All actions in a thread happen-before any other thread successfully returns from a join() on that thread.
So, all of your writes to individual elements will happen before the final join().
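To make this concrete, here is a small runnable sketch of the plain-array variant. The "computation" (squaring the index) and the names JoinVisibility and computeSquares are placeholders I made up; the only point is that the reads happen after join() has returned for every worker.

```java
import java.util.Arrays;

class JoinVisibility {

    static int[] computeSquares(int nThreads) {
        int[] results = new int[nThreads]; // plain array, no atomics or volatile
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            final int index = i;
            // Each worker writes only its own cell.
            threads[i] = new Thread(() -> results[index] = index * index);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // everything the worker did happens-before this returns
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return results; // safe to read: every worker has been joined
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(computeSquares(5))); // prints "[0, 1, 4, 9, 16]"
    }
}
```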
With this said, I would strongly recommend that you look for alternative ways to structure your problem that don't require you to be worrying about the correctness of your code at this level of detail (see my other answer).
An easier solution here would appear to be to use the Executor framework, which hides typically unnecessary details about the threads and how the result is stored.
For example:
ExecutorService executor = ...
List<Future<Object>> futures = new ArrayList<>();
for (int i = 0; i < nThreads; i++) {
futures.add(executor.submit(new MyCallable(i)));
}
executor.shutdown();
for (int i = 0; i < nThreads; ++i) {
array[i] = futures.get(i).get();
}
for (int i = 0; i < nThreads; i++) {
Object obj_i = array[i];
// do something with obj_i...
}
where MyCallable is analogous to your MyThread:
private static class MyCallable implements Callable<Object> {
private final int pIndex;
public MyCallable(final int index) {
pIndex = index;
}
@Override
public Object call() {
// some entirely local time-consuming computation...
return /* result of the computation */;
}
}
This results in simpler and more-obviously correct code, because you're not worrying about memory consistency: this is handled by the framework. It also gives you more flexibility, e.g. running it on fewer threads than work items, reusing a thread pool etc.
Atomic operations are required to ensure memory barriers are present when multiple threads access the same memory location. Without memory barriers, there is no happens-before relationship between the threads and no guarantee that the main thread will see the modifications made by the other threads, hence a data race. So what you really need is memory barriers for the write and read operations. You can achieve that using AtomicReferenceArray or a synchronized block on a common object.
You have Thread.join in the second program before the read operations. That should remove the data race. Without the join, you need explicit synchronization.
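If the workers run on a pool, so that there is no Thread to join, one way to get the explicit synchronization is a CountDownLatch: its documentation states that actions before countDown() happen-before a successful return from the corresponding await(). A sketch under that assumption, with made-up names (LatchVisibility, fill) and a placeholder computation:

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class LatchVisibility {

    static String[] fill(int nTasks) {
        String[] results = new String[nTasks]; // plain array again
        CountDownLatch done = new CountDownLatch(nTasks);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            for (int i = 0; i < nTasks; i++) {
                final int index = i;
                pool.execute(() -> {
                    results[index] = "result-" + index; // write first ...
                    done.countDown();                   // ... then signal completion
                });
            }
            done.await(); // returns once every task has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fill(3))); // prints "[result-0, result-1, result-2]"
    }
}
```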
I'm a beginner in Java, and as homework I'm supposed to add concurrency to a genetic algorithm solution for the Travelling Salesman Problem posted here. Our goal is to have chromosome evaluation performed by threads. So my guess is that I have to rewrite this part of the code to be multithreaded:
// Gets the best tour in the population
public Tour getFittest() {
Tour fittest = tours[0];
// Loop through individuals to find fittest
for (int i = 1; i < populationSize(); i++) {
if (fittest.getFitness() <= getTour(i).getFitness()) {
fittest = getTour(i);
}
}
return fittest;
}
// Gets population size
public int populationSize() {
return tours.length;
}
Originally I intended to split the array manually between threads, but I believe that's not the best solution to the problem. So I did some research, and everyone suggests using either parallel streams or ExecutorService. However, I had trouble applying both of these solutions even though I tried to emulate examples posted in other threads. So my questions are: how exactly do I implement them in this case, and which one is faster?
Edit: Sorry, I forgot to post the solution I've tried. Here it is:
public Tour getFittest() {
Tour fittest = tours[0];
synchronized (fittest) {
final ExecutorService executor = Executors.newFixedThreadPool(4);
final List<Future<?>> futures = new ArrayList<>();
for (int i = 1; i < populationSize(); i++) {
Future<?> future = executor.submit((Runnable) () -> {
if (fittest.getFitness() <= getTour(i).getFitness()) {
fittest = getTour(i);
}
});
futures.add(future);
}
try {
for (Future<?> future : futures) {
future.get();
}
}catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
return fittest;
}
public int populationSize() {
return tours.length;
}
However when trying to run it I receive "Local variable fittest defined in an enclosing scope must be final or effectively final" error at line:
fittest = getTour(i);
And I have no clue why this is happening or how I can fix it, as adding the final keyword when initializing it does not help. Other than that, I have some doubts about using the synchronized keyword in this solution. I believe that to achieve true multithreading I need to use it because the resource is shared by various threads. Am I right? Sadly I didn't save my attempt at using streams, but I have trouble understanding how they work at all.
Edit2: I managed to "fix" my solution by adding two workarounds. Currently my code looks like this:
public Tour getFittest() {
Tour fittest = tours[0];
synchronized (fittest) {
final ExecutorService executor = Executors.newFixedThreadPool(4);
final List<Future<?>> futures = new ArrayList<>();
for (int i = 1; i < populationSize(); i++) {
final Integer innerI = new Integer(i);
Future<?> future = executor.submit((Runnable) () -> {
if (fittest.getFitness() <= getTour(innerI).getFitness()) {
setFitness(innerI, fittest);
}
}
);
futures.add(future);
}
try {
for (Future<?> future : futures) {
future.get();
}
}catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
return fittest;
}
public int populationSize() {
return tours.length;
}
public Tour setFitness (int i, Tour fittest) {
fittest = getTour(i);
return fittest;
}
That said, while it's compiling, there are two problems. Memory usage keeps rising every second the program runs, maxing out my 16GB of RAM in like ten seconds while variable 'fittest' does not change at all. So I guess I'm still doing something wrong.
Here is my streams implementation:
private static Tour getFittest(Tour[] tours){
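// A plain ArrayList is not safe to add to from a parallel forEach, so wrap it.
List<Map.Entry<Tour,Double>> lengths = Collections.synchronizedList(new ArrayList<>());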
Arrays.stream(tours).parallel().forEach(t->lengths.add(new AbstractMap.SimpleEntry<Tour,Double>(t,t.getLength())));
return Collections.min(lengths,Comparator.comparingDouble(Map.Entry::getValue)).getKey();
}
On closer inspection, this can be a one-liner, depending on your definition:
private static Tour getFittest(Tour[] tours) {
return Arrays.stream(tours).parallel().map(t -> new AbstractMap.SimpleEntry<Tour, Double>(t, t.getLength()))
.min(Comparator.comparingDouble(Map.Entry::getValue)).get().getKey();
}
Also, on closer inspection, the original code uses .getFitness(), which is the reciprocal of the length. If you use that, then use .max() instead of .min().
Actually, this is even better after review:
return Arrays.stream(tours).parallel()
.min(Comparator.comparingDouble(Tour::getLength)).get();
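For comparison, here is a rough sketch of how the ExecutorService route could look. It is not the original poster's code: the Tour class below is a minimal stand-in that only carries a fitness value, and the names FittestSearch and nThreads are made up. Each task scans one chunk of the array and returns its local best, so no lambda has to assign to a captured local variable (which is what triggers the "must be final or effectively final" error). The pool is also shut down before returning; creating a new pool on every call and never shutting it down leaks threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FittestSearch {

    /** Hypothetical stand-in for the question's Tour class. */
    static final class Tour {
        private final double fitness;

        Tour(double fitness) {
            this.fitness = fitness;
        }

        double getFitness() {
            return fitness;
        }
    }

    static Tour getFittest(Tour[] tours, int nThreads) {
        ExecutorService executor = Executors.newFixedThreadPool(nThreads);
        try {
            int chunk = (tours.length + nThreads - 1) / nThreads; // ceiling division
            List<Future<Tour>> futures = new ArrayList<>();
            for (int start = 0; start < tours.length; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, tours.length);
                Callable<Tour> task = () -> {
                    Tour best = tours[from]; // local to this task, so no sharing
                    for (int i = from + 1; i < to; i++) {
                        if (best.getFitness() <= tours[i].getFitness()) {
                            best = tours[i];
                        }
                    }
                    return best;
                };
                futures.add(executor.submit(task));
            }
            Tour fittest = null;
            for (Future<Tour> future : futures) { // combine the per-chunk winners
                Tour candidate = future.get();
                if (fittest == null || fittest.getFitness() <= candidate.getFitness()) {
                    fittest = candidate;
                }
            }
            return fittest;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown(); // otherwise every call leaves idle threads behind
        }
    }

    public static void main(String[] args) {
        Tour[] tours = { new Tour(1.0), new Tour(5.0), new Tour(3.0) };
        System.out.println(getFittest(tours, 2).getFitness()); // prints "5.0"
    }
}
```

Which version is faster depends on how expensive getFitness() is and how large the population is, so it is worth measuring both rather than assuming.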
I know this has been asked a few times before:
Java: how to synchronize array accesses and what are the limitations on what goes in a synchronized condition
Synchronizing elements in an array
but I couldn't quite find the answer to my question: Suppose I have an array with multiple elements and my threads should modify these elements at the same time (otherwise there would be no advantage to using threads, right?). If I use the array itself as the lock (which kind of makes sense, because it is the critical data that could end up in an inconsistent state due to time-slicing), then my operations on the array are safe, BUT all the parallelization would be lost, wouldn't it?!
Here is some code I want to try this on:
import java.util.Arrays;
public class ParallelNine extends Thread{
public static final int[] input = new int[]{
119_119_119,
119_119_119,
111_111_111,
999_999_999,
};
private static int completed = 0;
static void process(int currentIndex){
System.out.println("Processing " + currentIndex);
String number = Integer.toString(input[currentIndex]);
int counter = 0;
for(int index = 0; index < number.length(); index++){
if(number.charAt(index) == '9')
counter++;
}
input[currentIndex] = counter;
}
@Override public void run(){
while(completed < input.length){
synchronized(input){
process(completed);
completed++;
}
}
}
public static void main(final String... args) throws InterruptedException{
Thread[] threads = new Thread[]{new ParallelNine(), new ParallelNine(), new ParallelNine(), new ParallelNine()};
for(final Thread next : threads){
next.run();
}
for(final Thread next : threads)
next.join();
System.out.println(Arrays.toString(input));
}
}
We have an array of primitive int values. A dedicated method (process) counts how often the digit 9 appears in the integer at a given index of the array and then overwrites that element with the count. So the correct output would be [3, 3, 0, 9]. This array (input) is pretty small, but with a few thousand entries, for example, it would make sense to have multiple threads counting nines. So I synchronized on the array, but as mentioned above, all the parallelization is lost because only ONE thread at a time has access to the array!
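One possible way around locking the whole array, sketched here as an assumption rather than a definitive answer, is to make sure no two threads ever touch the same element. Below, an AtomicInteger hands out indices, so the only shared mutable state is the counter, and the counting itself runs without any lock. The names ParallelNineSketch, countNines, and process are made up; the threads are started with start() and joined before the array is read.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelNineSketch {

    /** Counts how often the digit 9 appears in the decimal form of value. */
    static int countNines(int value) {
        int counter = 0;
        for (char c : Integer.toString(value).toCharArray()) {
            if (c == '9') {
                counter++;
            }
        }
        return counter;
    }

    static int[] process(int[] input, int nThreads) {
        int[] data = input.clone();
        AtomicInteger next = new AtomicInteger(); // next index nobody has claimed yet
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                int index;
                // getAndIncrement gives each index to exactly one thread.
                while ((index = next.getAndIncrement()) < data.length) {
                    data[index] = countNines(data[index]); // no lock needed here
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join(); // makes the workers' writes visible to the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return data;
    }

    public static void main(String[] args) {
        int[] input = { 119_119_119, 119_119_119, 111_111_111, 999_999_999 };
        System.out.println(Arrays.toString(process(input, 4))); // prints "[3, 3, 0, 9]"
    }
}
```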
I am implementing an application using concurrent hash maps. It is required that one thread adds data into the CHM, while there is another thread that copies the values currently in the CHM and erases it using the clear() method. When I run it, after the clear() method is executed, the CHM always remains empty, though the other thread continues adding data to CHM.
Could someone tell me why this is so and help me find a solution?
This is the method that adds data to the CHM. This method is called from within a thread.
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentHashMap;
public static ConcurrentMap<String, String> updateJobList = new ConcurrentHashMap<String, String>(8, 0.9f, 6);
public void setUpdateQuery(String ticker, String query)
throws RemoteException {
dataBaseText = "streamming";
int n = 0;
try {
updateJobList.putIfAbsent(ticker, query);
}
catch(Exception e)
{e.printStackTrace();}
........................
}
Another thread calls the track_allocation method every minute.
public void track_allocation()
{
class Track_Thread implements Runnable {
String[] track;
Track_Thread(String[] s)
{
track = s;
}
public void run()
{
}
public void run(String[] s)
{
MonitoringForm.txtInforamtion.append(Thread.currentThread()+"has started runnning");
String query = "";
track = getMaxBenefit(track);
track = quickSort(track, 0, track.length-1);
for(int x=0;x<track.length;x++)
{
query = track[x].split(",")[0];
try
{
DatabaseConnection.insertQuery(query);
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
}
joblist = updateJobList.values();
MonitoringForm.txtInforamtion.append("\nSize of the joblist is:"+joblist.size());
int n = joblist.size()/6;
String[][] jobs = new String[6][n+6];
MonitoringForm.txtInforamtion.append("number of threads:"+n);
int i = 0;
if(n>0)
{
MonitoringForm.txtInforamtion.append("\nSize of the joblist is:"+joblist.size());
synchronized(this)
{
updateJobList.clear();
}
Thread[] threads = new Thread[6];
Iterator it = joblist.iterator();
int k = 0;
for(int j=0;j<6; j++)
{
for(k = 0; k<n; k++)
{
jobs[j][k] = it.next().toString();
MonitoringForm.txtInforamtion.append("\n\ninserted into queue:\n"+jobs[j][k]+"\n");
}
if(it.hasNext() && j == 5)
{
while(it.hasNext())
{
jobs[j][++k] = it.next().toString();
}
}
threads[j] = new Thread(new Track_Thread(jobs[j]));
threads[j].start();
}
}
}
I can see a glaring mistake. This is the implementation of your Track_Thread class's run method.
public void run()
{
}
So, when you do this:
threads[j] = new Thread(new Track_Thread(jobs[j]));
threads[j].start();
..... the thread starts, and then immediately ends, having done absolutely nothing. Your run(String[]) method is never called!
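A minimal sketch of the fix, assuming the work should simply move into the no-argument run() that Thread actually calls. TrackTask and the sink list are made-up names; the list stands in for DatabaseConnection.insertQuery so the example is self-contained.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class TrackTask implements Runnable {
    private final String[] track;
    // Hypothetical stand-in for the database insert in the original code.
    private final List<String> sink;

    TrackTask(String[] track, List<String> sink) {
        this.track = track;
        this.sink = sink;
    }

    @Override
    public void run() { // this is the only method Thread.start() will invoke
        for (String entry : track) {
            if (entry == null) {
                continue; // the original job arrays are over-allocated
            }
            sink.add(entry.split(",")[0]);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> sink = new CopyOnWriteArrayList<>();
        Thread thread = new Thread(new TrackTask(new String[] { "q1,10", "q2,20", null }, sink));
        thread.start();
        thread.join();
        System.out.println(sink); // prints "[q1, q2]"
    }
}
```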
In addition, your approach of iterating the map and then clearing it while other threads are simultaneously adding is likely to lead to entries occasionally being removed from the map without the iteration actually seeing them.
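Note also that values() on a ConcurrentHashMap returns a live view of the map, not a copy, so once clear() has run the view is empty too. One way to avoid both problems, sketched here with made-up names (JobDrain, drain), is to take entries out one key at a time with remove(key), which atomically returns the value it removed. Anything added while the drain is running either gets picked up or stays in the map for the next pass; it is never silently dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class JobDrain {

    /** Removes the entries currently in the map and returns their values. */
    static List<String> drain(ConcurrentMap<String, String> jobs) {
        List<String> drained = new ArrayList<>();
        // Iterating a ConcurrentHashMap key set is weakly consistent, so
        // removing entries during the loop does not throw.
        for (String key : jobs.keySet()) {
            String value = jobs.remove(key); // atomic: returns what was removed
            if (value != null) {
                drained.add(value);
            }
        }
        return drained;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> jobs = new ConcurrentHashMap<>();
        jobs.put("AAA", "query-1");
        jobs.put("BBB", "query-2");
        List<String> batch = drain(jobs);
        System.out.println(batch.size() + " drained, " + jobs.size() + " left");
        // prints "2 drained, 0 left"
    }
}
```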
While I have your attention, you have a lot of style errors in your code:
The indentation is a mess.
You have named your class incorrectly: it is NOT a thread, and the identifier violates the Java naming conventions.
Your use of white-space in statements is inconsistent.
These things make your code hard to read ... and to be frank, they put me off trying to really understand it.