Parallel implementation of Levenshtein distance slows down with more threads - java

This is a parallel implementation of Levenshtein distance that I wrote for fun, and I'm disappointed in the results. I am running this on a Core i7 processor, so I have plenty of hardware threads available. However, as I increase the thread count, performance degrades significantly: for input of the same size, it actually runs slower with more threads.
I was hoping that someone could look at the way I am using threads, and the java.util.concurrent package, and tell me if I am doing anything wrong. I'm really only interested in reasons why the parallelism is not working as I would expect. I don't expect the reader to look at the complicated indexing going on here. I believe the calculations I'm doing are correct. But even if they are not, I think I should still be seeing a close to linear speed-up as I increase the number of threads in the threadpool.
I've included the benchmarking code I used as the second code block. It relies on Apache Commons Lang (RandomStringUtils) and the bb.util.Benchmark class.
Thanks for any help :).
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;
public class EditDistance {
private static final int MIN_CHUNK_SIZE = 5;
private final ExecutorService threadPool;
private final int threadCount;
private final String maxStr;
private final String minStr;
private final int maxLen;
private final int minLen;
public EditDistance(String s1, String s2, ExecutorService threadPool,
int threadCount) {
this.threadCount = threadCount;
this.threadPool = threadPool;
if (s1.length() < s2.length()) {
minStr = s1;
maxStr = s2;
} else {
minStr = s2;
maxStr = s1;
}
maxLen = maxStr.length();
minLen = minStr.length();
}
public int editDist() {
int iterations = maxLen + minLen - 1;
int[] prev = new int[0];
int[] current = null;
for (int i = 0; i < iterations; i++) {
int currentLen;
if (i < minLen) {
currentLen = i + 1;
} else if (i < maxLen) {
currentLen = minLen;
} else {
currentLen = iterations - i;
}
current = new int[currentLen * 2 - 1];
parallelize(prev, current, currentLen, i);
prev = current;
}
return current[0];
}
private void parallelize(int[] prev, int[] current, int currentLen,
int iteration) {
int chunkSize = Math.max(current.length / threadCount, MIN_CHUNK_SIZE);
List<Future<?>> futures = new ArrayList<Future<?>>(currentLen);
for (int i = 0; i < currentLen; i += chunkSize) {
int stopIdx = Math.min(currentLen, i + chunkSize);
Runnable worker = new Worker(prev, current, currentLen, iteration,
i, stopIdx);
futures.add(threadPool.submit(worker));
}
for (Future<?> future : futures) {
try {
Object result = future.get();
if (result != null) {
throw new RuntimeException(result.toString());
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
} catch (ExecutionException e) {
// We can only finish the computation if we complete
// all subproblems
throw new RuntimeException(e);
}
}
}
private void doChunk(int[] prev, int[] current, int currentLen,
int iteration, int startIdx, int stopIdx) {
int mergeStartIdx = (iteration < minLen) ? 0 : 2;
for (int i = startIdx; i < stopIdx; i++) {
// Edit distance
int x;
int y;
int leftIdx;
int downIdx;
int diagonalIdx;
if (iteration < minLen) {
x = i;
y = currentLen - i - 1;
leftIdx = i * 2 - 2;
downIdx = i * 2;
diagonalIdx = i * 2 - 1;
} else {
x = i + iteration - minLen + 1;
y = minLen - i - 1;
leftIdx = i * 2;
downIdx = i * 2 + 2;
diagonalIdx = i * 2 + 1;
}
int left = 1 + ((leftIdx < 0) ? iteration + 1 : prev[leftIdx]);
int down = 1 + ((downIdx < prev.length) ? prev[downIdx]
: iteration + 1);
int diagonal = penalty(x, y)
+ ((diagonalIdx < 0 || diagonalIdx >= prev.length) ? iteration
: prev[diagonalIdx]);
int dist = Math.min(left, Math.min(down, diagonal));
current[i * 2] = dist;
// Merge prev
int mergeIdx = i * 2 + 1;
if (mergeIdx < current.length) {
current[mergeIdx] = prev[mergeStartIdx + i * 2];
}
}
}
private int penalty(int maxIdx, int minIdx) {
return (maxStr.charAt(maxIdx) == minStr.charAt(minIdx)) ? 0 : 1;
}
private class Worker implements Runnable {
private final int[] prev;
private final int[] current;
private final int currentLen;
private final int iteration;
private final int startIdx;
private final int stopIdx;
Worker(int[] prev, int[] current, int currentLen, int iteration,
int startIdx, int stopIdx) {
this.prev = prev;
this.current = current;
this.currentLen = currentLen;
this.iteration = iteration;
this.startIdx = startIdx;
this.stopIdx = stopIdx;
}
@Override
public void run() {
doChunk(prev, current, currentLen, iteration, startIdx, stopIdx);
}
}
public static void main(String args[]) {
int threadCount = 4;
ExecutorService threadPool = Executors.newFixedThreadPool(threadCount);
EditDistance ed = new EditDistance("Saturday", "Sunday", threadPool,
threadCount);
System.out.println(ed.editDist());
threadPool.shutdown();
}
}
There is a private inner class Worker inside EditDistance. Each worker is responsible for filling in a range of the current array using EditDistance.doChunk. EditDistance.parallelize is responsible for creating those workers, and waiting for them to finish their tasks.
And the code I am using for benchmarks:
import java.io.PrintStream;
import java.util.concurrent.*;
import org.apache.commons.lang3.RandomStringUtils;
import bb.util.Benchmark;
public class EditDistanceBenchmark {
public static void main(String[] args) {
if (args.length != 2) {
System.out.println("Usage: <string length> <thread count>");
System.exit(1);
}
PrintStream oldOut = System.out;
System.setOut(System.err);
int strLen = Integer.parseInt(args[0]);
int threadCount = Integer.parseInt(args[1]);
String s1 = RandomStringUtils.randomAlphabetic(strLen);
String s2 = RandomStringUtils.randomAlphabetic(strLen);
ExecutorService threadPool = Executors.newFixedThreadPool(threadCount);
Benchmark b = new Benchmark(new Benchmarker(s1, s2, threadPool,threadCount));
System.setOut(oldOut);
System.out.println("threadCount: " + threadCount +
" string length: "+ strLen + "\n\n" + b);
System.out.println("s1: " + s1 + "\ns2: " + s2);
threadPool.shutdown();
}
private static class Benchmarker implements Runnable {
private final String s1, s2;
private final int threadCount;
private final ExecutorService threadPool;
private Benchmarker(String s1, String s2, ExecutorService threadPool, int threadCount) {
this.s1 = s1;
this.s2 = s2;
this.threadPool = threadPool;
this.threadCount = threadCount;
}
@Override
public void run() {
EditDistance d = new EditDistance(s1, s2, threadPool, threadCount);
d.editDist();
}
}
}

It's very easy to accidentally write code that does not parallelize well. A common culprit is threads competing for underlying system resources (e.g. a cache line). Since this algorithm inherently acts on values that sit close to each other in memory, I strongly suspect that this is what is happening here.
I suggest you review this excellent article on False Sharing
http://www.drdobbs.com/go-parallel/article/217500206?pgno=3
and then carefully review your code for cases where threads would block one another.
Additionally, running more threads than you have CPU cores will slow down performance if your threads are CPU bound (if you're already using all cores to near 100%, adding more threads will only add overhead for the context switches).
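To make the false-sharing point concrete, here is a minimal sketch that is not taken from the linked article. Each thread increments only its own counter, first with the counters in adjacent slots and then spaced apart. The class name FalseSharingSketch, the method runCounters, and the stride of 16 longs (128 bytes, assumed to be larger than a cache line on typical desktop hardware) are illustrative choices of mine. Timings depend heavily on the machine and JVM, so none are quoted here.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical demo class: each thread increments only its own slot.
final class FalseSharingSketch {

    // Starts `threads` threads; thread t bumps slot (t * stride) `iterations` times.
    // Returns the final value of each thread's slot.
    static long[] runCounters(int threads, int stride, int iterations) {
        AtomicLongArray counters = new AtomicLongArray(threads * stride);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int slot = t * stride;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    counters.incrementAndGet(slot);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        long[] result = new long[threads];
        for (int t = 0; t < threads; t++) {
            result[t] = counters.get(t * stride);
        }
        return result;
    }

    public static void main(String[] args) {
        int threads = 4;
        int iterations = 5_000_000;

        long start = System.nanoTime();
        runCounters(threads, 1, iterations);   // slots next to each other
        long adjacent = System.nanoTime() - start;

        start = System.nanoTime();
        runCounters(threads, 16, iterations);  // slots 128 bytes apart (assumed > one cache line)
        long padded = System.nanoTime() - start;

        System.out.println("adjacent slots: " + adjacent / 1_000_000 + " ms");
        System.out.println("padded slots:   " + padded / 1_000_000 + " ms");
    }
}
```

If the adjacent run is clearly slower on your machine, the threads are probably contending for the same cache line even though they never touch the same element.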

Related

Running single java thread is faster than main?

I'm doing a few concurrency experiments in Java.
I have this prime calculation method, which is just for mimicking a semi-expensive operation:
static boolean isprime(int n){
if (n == 1)
return false;
boolean flag = false;
for (int i = 2; i <= n / 2; ++i) {
if (n % i == 0) {
flag = true;
break;
}
}
return ! flag;
}
And then I have this main method, which simply checks every number from 0 to N for primality and stores the results in an array of booleans:
public class Main {
public static void main(String[] args) {
final int N = 100_000;
int T = 1;
boolean[] bool = new boolean[N];
ExecutorService es = Executors.newFixedThreadPool(T);
final int partition = N / T;
long start = System.nanoTime();
for (int j = 0; j < N; j++ ){
boolean res = isprime(j);
bool[j] = res;
}
System.out.println(System.nanoTime()-start);
}
}
This gives me results like 893888901 ns and 848995600 ns.
I also have this driver code, where I use an ExecutorService with one thread to do the same work:
public class Main {
public static void main(String[] args) {
final int N = 100_000;
int T = 1;
boolean[] bool = new boolean[N];
ExecutorService es = Executors.newFixedThreadPool(T);
final int partition = N / T;
long start = System.nanoTime();
for (int i = 0; i < T; i++ ){
final int current = i;
es.execute(new Runnable() {
@Override
public void run() {
for (int j = current*partition; j < current*partition+partition; j++ ){
boolean res = isprime(j);
bool[j] = res;
}
}
});
}
es.shutdown();
try {
es.awaitTermination(1, TimeUnit.MILLISECONDS);
} catch (Exception e){
System.out.println("what?");
}
System.out.println(System.nanoTime()-start);
}
}
This gives results like 9523201 ns and 15485300 ns.
Now the second example is, as you can see, much faster than the first, and I can't understand why. Shouldn't the ExecutorService version (with 1 thread) be slower than the main thread, since it is basically doing the work sequentially plus the overhead of waking up the thread?
I was expecting the ExecutorService to be faster once I started adding multiple threads, but this result is counterintuitive.
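It's the timeout at the bottom of your code. awaitTermination(1, TimeUnit.MILLISECONDS) gives up after one millisecond, so the elapsed time is printed while the task is still running. If you set the timeout higher, you arrive at pretty similar execution times: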
es.awaitTermination(1000, TimeUnit.MILLISECONDS);
The execution times you mention for the first main are much higher than the millisecond you allow the second main to wait for the threads to finish.
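One way to notice this kind of mistake is to check the boolean that awaitTermination returns: it is false when the timeout elapsed before the pool terminated. The sketch below assumes a generous one-minute timeout, and the names AwaitTerminationSketch and countPrimesBelow are made up for illustration. Its isPrime also treats 0 and 1 as non-prime, which differs slightly from the method in the question.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: wait for the pool and fail loudly on timeout.
final class AwaitTerminationSketch {

    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int i = 2; i <= n / 2; i++) {
            if (n % i == 0) {
                return false;
            }
        }
        return true;
    }

    // Fills one flag per number on a single pool thread and returns how many are prime.
    static int countPrimesBelow(int n) {
        boolean[] flags = new boolean[n];
        ExecutorService es = Executors.newFixedThreadPool(1);
        es.execute(() -> {
            for (int j = 0; j < n; j++) {
                flags[j] = isPrime(j);
            }
        });
        es.shutdown();
        try {
            // awaitTermination returns false if the timeout elapsed first
            if (!es.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks still running after timeout");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        int count = 0;
        for (boolean flag : flags) {
            if (flag) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int primes = countPrimesBelow(100_000);
        System.out.println(primes + " primes found in " + (System.nanoTime() - start) + " ns");
    }
}
```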

Why is my ForkJoinPool program running so slow?

I'm learning about parallel programming in Java using the fork/join pool. I know how it works internally and decided to write a very simple program that uses it to calculate the max value of a large array. However, when I ran it, it took so long that the program seemed to hang and never complete. When I calculate the max without parallelism, it is much faster and finishes in milliseconds! Also, Runtime.getRuntime().availableProcessors() returns 8 on my computer, so it's not as if I'm using a single-core machine that can't handle it. Can somebody tell me if they get similar results on their own computer? If so, can you explain what is wrong with my code? I'm very confused. Thanks.
import java.util.concurrent.*;
public class ParralleMax {
private static int numberOfCores = Runtime.getRuntime().availableProcessors();
public static void main(String[] args) {
int[] list = new int[9_000_000];
for (int i = 0; i < list.length; i++)
list[i] = i;
System.out.println("Array created. Program now starting...");
long startTime = System.currentTimeMillis();
int max = calculateMax(list);
long endTime = System.currentTimeMillis();
System.out.println("This computer has " + numberOfCores + " cores.");
System.out.println("The max value " + max + " was calculated in " + (endTime - startTime) + " milliseconds.");
}
public static int calculateMax(int[] list) {
RecursiveTask<Integer> maxTask = new MaxTask(list, 0, list.length);
ForkJoinPool pool = new ForkJoinPool();
return pool.invoke(maxTask);
}
private static class MaxTask extends RecursiveTask<Integer> {
private final static int THRESHOLD = 1000;
private int[] list;
private int low;
private int high;
public MaxTask(int[] list, int low, int high) {
this.list = list;
this.low = low;
this.high = high;
}
@Override
public Integer compute() {
if (high - low < THRESHOLD) {
int max = list[0];
for (int i = low; i < high; i++)
if (list[i] > max)
max = list[i];
return new Integer(max);
}
else {
int mid = (high + low) / 2;
RecursiveTask left = new MaxTask(list, 0, mid);
RecursiveTask right = new MaxTask(list, mid, high);
right.fork();
left.fork();
Integer leftMax = (Integer)left.join();
Integer rightMax = (Integer)right.join();
return new Integer(Math.max(leftMax.intValue(), rightMax.intValue()));
}
}
}
}
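One likely cause, offered as a reading of the code above rather than a confirmed diagnosis: the left subtask is created as new MaxTask(list, 0, mid), so it always restarts at index 0 instead of at low. The ranges therefore do not shrink the way a divide-and-conquer split should, and the number of tasks explodes. The sketch below splits at low instead; it also starts the sequential scan at list[low] and computes one half in the current thread rather than forking both. The class name ParallelMaxSketch and the use of ForkJoinPool.commonPool() are my choices, and the code assumes a non-empty array.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical corrected variant; assumes `list` is non-empty.
final class ParallelMaxSketch {

    static int calculateMax(int[] list) {
        return ForkJoinPool.commonPool().invoke(new MaxTask(list, 0, list.length));
    }

    private static final class MaxTask extends RecursiveTask<Integer> {
        private static final int THRESHOLD = 1000;
        private final int[] list;
        private final int low;   // inclusive
        private final int high;  // exclusive

        MaxTask(int[] list, int low, int high) {
            this.list = list;
            this.low = low;
            this.high = high;
        }

        @Override
        protected Integer compute() {
            if (high - low < THRESHOLD) {
                // small range: scan sequentially, starting at low
                int max = list[low];
                for (int i = low + 1; i < high; i++) {
                    max = Math.max(max, list[i]);
                }
                return max;
            }
            int mid = low + (high - low) / 2;
            MaxTask left = new MaxTask(list, low, mid);   // starts at low, not 0
            MaxTask right = new MaxTask(list, mid, high);
            left.fork();                      // run the left half asynchronously
            int rightMax = right.compute();   // do the right half in this thread
            int leftMax = left.join();
            return Math.max(leftMax, rightMax);
        }
    }

    public static void main(String[] args) {
        int[] list = new int[9_000_000];
        for (int i = 0; i < list.length; i++) {
            list[i] = i;
        }
        long startTime = System.currentTimeMillis();
        int max = calculateMax(list);
        long endTime = System.currentTimeMillis();
        System.out.println("The max value " + max + " was calculated in "
                + (endTime - startTime) + " milliseconds.");
    }
}
```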

Using 10 Threads to Process an Array

I'm working to improve my Java skills but am a little unsure how to handle this multi-threaded application. Basically, the program reads a text file and finds the largest number. I added a for loop within my search algorithm to create 10 threads, but I'm not sure if it's actually creating 10 threads. The idea is to improve the execution time, or at least that's what I assume should happen. Is there any way to check whether I did it correctly and whether the execution time is indeed improved?
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class ProcessDataFile {
public static void main(String[] args) throws IOException {
int max = Integer.MIN_VALUE;
int i = 0;
int[] numbers = new int[100000];
String datafile = "dataset529.txt"; //string which contains datafile
String line; //current line of text file
try (BufferedReader br = new BufferedReader(new FileReader(datafile))) { //reads in the datafile
while ((line = br.readLine()) != null) { //reads through each line
numbers[i++] = Integer.parseInt(line); //pulls out the number of each line and puts it in numbers[]
}
}
for (i = 0; i < 10000; i++){ //loop to go through each number in the file and compare it to find the largest int.
for(int j = 0; j < 10; j++) { //creates 10 threads
new Thread();
}
if (max < numbers[i]) //As max gets bigger it checks the array and keeps increasing it as it finds a larger int.
max = numbers[i]; //Sets max equal to the final highest value found.
}
System.out.println("The largest number in DataSet529 is: " + max);
}
}
This is a VERY basic example that demonstrates the concepts of creating and running threads, each of which processes a given range of values from a shared array. The example makes a few assumptions (for example, that the number of elements divides evenly among the threads). It is also deliberately long-winded, in an attempt to show each of the basic steps that would be needed.
Start by taking a look at the Concurrency Trail for more details.
import java.util.Random;
public class ThreadExample {
public static void main(String[] args) {
int[] numbers = new int[100000];
Random rnd = new Random();
for (int index = 0; index < numbers.length; index++) {
numbers[index] = rnd.nextInt();
}
Thread[] threads = new Thread[10];
Worker[] workers = new Worker[10];
int range = numbers.length / 10;
for (int index = 0; index < 10; index++) {
int startAt = index * range;
int endAt = startAt + range;
workers[index] = new Worker(startAt, endAt, numbers);
}
for (int index = 0; index < 10; index++) {
threads[index] = new Thread(workers[index]);
threads[index].start();
}
boolean isProcessing = false;
do {
isProcessing = false;
for (Thread t : threads) {
if (t.isAlive()) {
isProcessing = true;
break;
}
}
} while (isProcessing);
for (Worker worker : workers) {
System.out.println("Max = " + worker.getMax());
}
}
public static class Worker implements Runnable {
private int startAt;
private int endAt;
private int numbers[];
private int max = Integer.MIN_VALUE;
public Worker(int startAt, int endAt, int[] numbers) {
this.startAt = startAt;
this.endAt = endAt;
this.numbers = numbers;
}
@Override
public void run() {
for (int index = startAt; index < endAt; index++) {
max = Math.max(numbers[index], max);
}
}
public int getMax() {
return max;
}
}
}
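The polling loop over isAlive() in the example above keeps the main thread spinning while the workers run. A variation, sketched here under my own naming (JoinMaxSketch, parallelMax), waits with Thread.join() instead and then folds the per-thread results into a single maximum. Giving the remainder to the last thread is an assumption about how leftover elements should be handled.

```java
// Hypothetical variant: wait with join() and combine the partial maxima.
final class JoinMaxSketch {

    static int parallelMax(int[] numbers, int threadCount) {
        int[] partial = new int[threadCount];
        Thread[] threads = new Thread[threadCount];
        int range = numbers.length / threadCount;
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            final int startAt = t * range;
            // the last thread also takes any leftover elements
            final int endAt = (t == threadCount - 1) ? numbers.length : startAt + range;
            threads[t] = new Thread(() -> {
                int max = Integer.MIN_VALUE;
                for (int i = startAt; i < endAt; i++) {
                    max = Math.max(max, numbers[i]);
                }
                partial[id] = max;  // each thread writes only its own slot
            });
            threads[t].start();
        }
        int result = Integer.MIN_VALUE;
        for (int t = 0; t < threadCount; t++) {
            try {
                threads[t].join();  // blocks until thread t has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            result = Math.max(result, partial[t]);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] numbers = {4, -1, 17, 8, 3, 12, 9};
        System.out.println("Max = " + parallelMax(numbers, 3));
    }
}
```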
A slightly simpler solution would involve the ExecutorService API, which allows you to offer a series of Callables to the service, which then returns a List of Futures. The benefit here is that invokeAll won't return until all the Callables have completed (or failed), so you don't need to constantly check the states of the threads.
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class ThreadExample {
public static void main(String[] args) {
int[] numbers = new int[100000];
Random rnd = new Random();
for (int index = 0; index < numbers.length; index++) {
numbers[index] = rnd.nextInt();
}
ExecutorService executor = Executors.newFixedThreadPool(10);
Worker[] workers = new Worker[10];
int range = numbers.length / 10;
for (int index = 0; index < 10; index++) {
int startAt = index * range;
int endAt = startAt + range;
workers[index] = new Worker(startAt, endAt, numbers);
}
try {
List<Future<Integer>> results = executor.invokeAll(Arrays.asList(workers));
for (Future<Integer> future : results) {
System.out.println(future.get());
}
} catch (InterruptedException | ExecutionException ex) {
ex.printStackTrace();
} finally {
// release the pool's non-daemon threads so the JVM can exit
executor.shutdown();
}
}
public static class Worker implements Callable<Integer> {
private int startAt;
private int endAt;
private int numbers[];
public Worker(int startAt, int endAt, int[] numbers) {
this.startAt = startAt;
this.endAt = endAt;
this.numbers = numbers;
}
@Override
public Integer call() throws Exception {
int max = Integer.MIN_VALUE;
for (int index = startAt; index < endAt; index++) {
max = Math.max(numbers[index], max);
}
return max;
}
}
}
I know this answer is a bit late, but you can also use lambda expressions with ExecutorService instead of creating a new class that implements Runnable.
Here is a complete example; you can play around with the THREAD_SIZE and RANDOM_ARRAY_SIZE variables.
import org.apache.log4j.Logger;
import java.security.SecureRandom;
import java.util.*;
import java.util.concurrent.*;
public class ConcurrentMaximumTest {
static final int THREAD_SIZE = 10;
static final int RANDOM_ARRAY_SIZE = 8999;
static final SecureRandom RAND = new SecureRandom();
private static Logger logger = Logger.getLogger(ConcurrentMaximumTest.class);
public static void main(String[] args) throws InterruptedException, ExecutionException {
int[] array = generateRandomIntArray(RANDOM_ARRAY_SIZE);
Map<Integer, Integer> positionMap = calculatePositions(array.length, THREAD_SIZE);
ExecutorService threads = Executors.newFixedThreadPool(THREAD_SIZE);
List<Callable<Integer>> toRun = new ArrayList<>(THREAD_SIZE);
for (Map.Entry<Integer, Integer> entry : positionMap.entrySet())
toRun.add(() -> findMax(array, entry.getKey(), entry.getValue()));
int result = Integer.MIN_VALUE;
List<Future<Integer>> futures = threads.invokeAll(toRun);
for (Future<Integer> future : futures) {
Integer localMax = future.get();
if(localMax > result)
result = localMax;
}
threads.shutdownNow();
logger.info("Max value calculated with " + THREAD_SIZE + " threads:" + result);
Arrays.sort(array);
int resultCrosscheck = array[array.length - 1];
logger.info("Max value calculated with sorting: " + resultCrosscheck);
assert result == resultCrosscheck : "Crosscheck failed";
}
/* Calculates start and end positions of each chunk(for simplicity). It can also be calculated on the fly.*/
private static Map<Integer, Integer> calculatePositions(int size, int numThreads){
int lengthOfChunk = size / numThreads;
int remainder = size % numThreads;
int start = 0;
Map<Integer,Integer> result = new LinkedHashMap<>();
for(int i = 0; i < numThreads -1; i++){
result.put(start, lengthOfChunk);
start += lengthOfChunk;
}
result.put(start, lengthOfChunk+remainder);
return result;
}
/*Find maximum value of given part of an array, from start position and chunk size.*/
private static int findMax(int[] wholeArray, int position, int size){
int end = (position + size);
int max = Integer.MIN_VALUE;
logger.info("Starting read for interval [" + position + "," + end + ")");
for(int i = position; i < (position + size); i++)
if(wholeArray[i] > max)
max = wholeArray[i];
logger.info("Finishing finding maximum for interval [" + position + "," + end + ")" + ". Calculated local maximum is " + max);
return max;
}
/* Helper function for generating random int array */
private static int[] generateRandomIntArray(int size){
int[] result = new int[size];
for (int i = 0; i < size; i++)
result[i] = RAND.nextInt(Integer.MAX_VALUE);
return result;
}
}

Search for an element in array with threads [closed]

I have the following homework to do:
Implement parallel searching for a specified element in an array. Take the number of threads as a function parameter. Each thread checks its own piece of the array, of size (ArraySize/NumberOfThreads).
class MyThread extends Thread {
final int[] SEARCH_TAB;
final int RANGE_TAB[][];
final int SEARCH_VALUE;
static int searchIndex = -1;
static boolean isWorking = true;
int whichThread;
MyThread(int[] searchTab, int[][] rangeTab, int searchValue, int whichThread) {
SEARCH_TAB = searchTab;
RANGE_TAB = rangeTab;
SEARCH_VALUE = searchValue;
this.whichThread = whichThread;
}
@Override
public void run() {
for (int i = RANGE_TAB[whichThread][0]; i < RANGE_TAB[whichThread][1] && isWorking; ++i) {
synchronized(this) {
if (SEARCH_TAB[i] == SEARCH_VALUE) {
isWorking = false;
searchIndex = i;
}
}
}
}
}
class Main {
private static int[][] range(int n, int p) {
int[] quantities = new int[p];
int remainder = n % p;
int quotient = n/p;
int i;
for (i = 0; i < p; ++i) quantities[i] = quotient;
i = 0;
while (remainder != 0) {
--remainder;
++quantities[i];
++i;
}
int[][] tab = new int[p][2];
tab[0][0] = 0;
tab[0][1] = quantities[0];
for (i = 1; i < p; ++i) {
tab[i][0] = tab[i-1][1];
tab[i][1] = tab[i][0] + quantities[i];
}
return tab;
}
private static int search(int[] searchTab, int numberOfThreads, int searchValue) {
int[][] rangeTab = range(searchTab.length, numberOfThreads);
Thread[] threads = new Thread[numberOfThreads];
for ( int i = 0; i < numberOfThreads; ++i) threads[i] = new MyThread(searchTab, rangeTab, searchValue, i);
for ( int i = 0; i < numberOfThreads; ++i) threads[i].start();
return MyThread.searchIndex;
}
public static void main(String[] args) {
int[] tab = {0, 1, 2, 3, 4, 5, 6, 7 , 8, 9, 10};
int value = 5;
int valueIndex = search(tab, 1, value);
if (valueIndex == -1) System.out.println("Not found.");
else System.out.println(valueIndex);
}
}
This code generally works, but it can't find the index when only one thread is used. By the way, my teacher said that my code is too long and complicated. Any suggestions about that?
I will be grateful for any kind of help.
How about the following code:
public class Searcher implements Runnable {
private int intToFind;
private int startIndex;
private int endIndex;
private int[] arrayToSearchIn;
public Searcher(int x, int s, int e, int[] a) {
intToFind = x;
startIndex = s;
endIndex = e;
arrayToSearchIn = a;
}
public void run() {
for (int i = startIndex; i <= endIndex; i++) {
if (arrayToSearchIn[i] == intToFind) System.out.println("Found x at index: " + i);
}
}
}
public class Starter {
public static void main(String[] args) {
int[] a = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
int numberOfThreads = 5;
int x = 20;
findElement(numberOfThreads, x, a);
}
private static void findElement(int numberOfThreads, int x, int[] a) {
int sizeOfa = a.length;
int range = sizeOfa/numberOfThreads;
for (int i = 0; i <= numberOfThreads-1; i++) {
Thread searcher;
if (i == numberOfThreads-1) {
searcher = new Thread(new Searcher(x, i*range, sizeOfa-1, a));
} else {
searcher = new Thread(new Searcher(x, i*range, i*range+range-1, a));
}
searcher.start();
}
}
}
You can still optimize the code, e.g. by spreading the remainder of the array across all threads instead of just pushing it onto the last one (as in my code), but the idea is still the same.
EDIT: I think there is a problem with your code: it reports only one occurrence of x in the array. If you are looking for x = 5 in [5,5,5,5,5] using five threads, you can never know which index will be returned, because it depends on how your threads are scheduled. The outcome will be between 0 and 4.
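On the single-thread symptom in the question, a likely explanation is that search() returns MyThread.searchIndex immediately after calling start(), without waiting for the threads, so the result depends on whether a worker happened to finish first. The sketch below waits with join() before reading the result and shares it through an AtomicInteger. The names ParallelSearchSketch and search, and the ceiling-division chunking, are illustrative assumptions rather than the asker's design.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: wait for every worker before reading the shared result.
final class ParallelSearchSketch {

    // Returns an index that holds `value`, or -1 if the value is absent.
    static int search(int[] data, int threadCount, int value) {
        AtomicInteger found = new AtomicInteger(-1);
        Thread[] threads = new Thread[threadCount];
        int chunk = (data.length + threadCount - 1) / threadCount;  // ceiling division
        for (int t = 0; t < threadCount; t++) {
            final int from = Math.min(data.length, t * chunk);
            final int to = Math.min(data.length, from + chunk);
            threads[t] = new Thread(() -> {
                // stop early once any thread has recorded a hit
                for (int i = from; i < to && found.get() == -1; i++) {
                    if (data[i] == value) {
                        found.compareAndSet(-1, i);
                    }
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();  // without this, the result may be read too early
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return found.get();
    }

    public static void main(String[] args) {
        int[] tab = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        int index = search(tab, 1, 5);
        System.out.println(index == -1 ? "Not found." : String.valueOf(index));
    }
}
```

When the value occurs more than once, which index is returned still depends on scheduling, as noted in the EDIT above.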

getting error StringIndexOutOfBoundsException: String index out of range [closed]

I am trying to build a matrix whose rows and columns correspond to strings such as "aaa", for alignment purposes, but when I run it I get an error. Below is my code:
public class compute_matrix {
static String seq1="aaa";
static String seq2="aaa";
static int[][] matrix;
static int max_row;
static int max_col;
private static int match_reward=1;
private static int mismatch_penalty= -1;
private static int gap_cost= -1;
private static boolean case_sensitive;
private static boolean isCaseSensitive() {
return case_sensitive;
}
private static int max(int ins, int sub, int del, int i) {
if (ins > sub) {
if (ins > del) {
return ins > i? ins : i;
} else {
return del > i ?del : i;
}
} else if (sub > del) {
return sub> i ? sub : i;
} else {
return del > i ? del : i;
}
}
protected char sequence[];
public static void main(String args[]){
int r, c, rows, cols, ins, sub, del, max_score;
rows = seq1.length()+1;
cols = seq2.length()+1;
matrix = new int [rows][cols];
// initiate first row
for (c = 0; c < cols; c++)
matrix[0][c] = 0;
// keep track of the maximum score
max_row = max_col = max_score = 0;
// calculates the similarity matrix (row-wise)
for (r = 1; r < rows; r++)
{
// initiate first column
matrix[r][0] = 0;
for (c = 1; c < cols; c++)
{
sub = matrix[r-1][c-1] + scoreSubstitution(seq1.charAt(r),seq2.charAt(c));
ins = matrix[r][c-1] + scoreInsertion(seq2.charAt(c));
del = matrix[r-1][c] + scoreDeletion(seq1.charAt(r));
// choose the greatest
matrix[r][c] = max (ins, sub, del, 0);
if (matrix[r][c] > max_score)
{
// keep track of the maximum score
max_score = matrix[r][c];
max_row = r; max_col = c;
}
}
}
}
private static int scoreSubstitution(char a, char b) {
if (isCaseSensitive())
if (a == b)
return match_reward;
else
return mismatch_penalty;
else
if (Character.toLowerCase(a) == Character.toLowerCase(b))
return match_reward;
else
return mismatch_penalty;
}
private static int scoreInsertion(char a) {
return gap_cost;
}
private static int scoreDeletion(char a) {
return gap_cost;
}
public char charAt (int pos)
{
// convert from one-based to zero-based index
return sequence[pos-1];
}
}
And this is the error I get:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 3
at java.lang.String.charAt(String.java:695)
at compute_matrix.main(compute_matrix.java:67)
Java Result: 1
rows = seq1.length()+1;
cols = seq2.length()+1;
matrix = new int [rows][cols];
and then later:
for (c = 1; c < cols; c++)
{
//when c == cols-1, it is also `seq2.length()`
//the access to seq2.charAt(c) will cause this exception then.
sub = matrix[r-1][c-1] + scoreSubstitution(seq1.charAt(r),seq2.charAt(c));
ins = matrix[r][c-1] + scoreInsertion(seq2.charAt(c));
del = matrix[r-1][c] + scoreDeletion(seq1.charAt(r));
In the above loop, when c == cols-1, it is also seq2.length(), the access to seq2.charAt(c) will cause this exception then.
You initialize rows and cols to length() + 1 and later iterate r and c from 1 up to length() inclusive, while the string contains only length() chars, at indices 0 to length() - 1.
If you were a C programmer in the past, I assume you are expecting a \0 terminator at the end of the string. Java has no such terminator: since String is an object, it holds a field that records its exact length. That means the last char in the string is actually the last character there.
In line 60 of your code
sub = matrix[r-1][c-1] + scoreSubstitution(seq1.charAt(r),seq2.charAt(c));
the maximum value of r is 3, so when you look up seq1.charAt(3) there is nothing there (the valid indices for "aaa" are 0 to 2), and you get the index-out-of-bounds exception.
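The usual way out is to keep the matrix one larger than the strings and subtract one when indexing the strings, so that matrix row r corresponds to character r - 1. Here is a minimal sketch of that loop under stated assumptions: the class name LocalAlignmentSketch and the method bestLocalScore are hypothetical, the scores mirror the question's values (1, -1, -1), and the comparison is case-sensitive for brevity.

```java
// Hypothetical sketch of the scoring loop with zero-based string indices.
final class LocalAlignmentSketch {
    static final int MATCH = 1;
    static final int MISMATCH = -1;
    static final int GAP = -1;

    // Returns the highest cell value in the similarity matrix.
    static int bestLocalScore(String s1, String s2) {
        int rows = s1.length() + 1;
        int cols = s2.length() + 1;
        int[][] matrix = new int[rows][cols];  // row 0 and column 0 stay 0
        int best = 0;
        for (int r = 1; r < rows; r++) {
            for (int c = 1; c < cols; c++) {
                // matrix index r corresponds to string index r - 1
                char a = s1.charAt(r - 1);
                char b = s2.charAt(c - 1);
                int sub = matrix[r - 1][c - 1] + (a == b ? MATCH : MISMATCH);
                int ins = matrix[r][c - 1] + GAP;
                int del = matrix[r - 1][c] + GAP;
                matrix[r][c] = Math.max(Math.max(sub, ins), Math.max(del, 0));
                best = Math.max(best, matrix[r][c]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(bestLocalScore("aaa", "aaa"));  // prints 3
    }
}
```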
I refactored your code into more canonical java.
The things I've changed:
The class is now called SimilarityMatrix, a more appropriate, self documenting name
variable declarations now happen where they get used as opposed to at the top of main
The work is now done in an instance of the class rather than the main method
I used the built in Math.max(int, int) instead of rolling my own
I removed a lot of unnecessary nested if statements. Java's short circuit evaluation helps here
Since both r and c as well as r+1 and c+1 are used frequently in your calculation loop, I track both
I removed many of the dependencies on static state (made many things instance variables)
Static state that remains is all final now (I made them constants)
Used more java-y variable names (java people really like their camel case)
public class SimilarityMatrix
{
public static final int matchReward = 1;
public static final int mismatchPenalty = -1;
public static final int gapCost = -1;
private int[][] matrix;
private int maxRow = 0;
private int maxCol = 0;
private boolean caseSensitive = false;
SimilarityMatrix(String s1, String s2)
{
this(s1, s2, false);
}
SimilarityMatrix(String s1, String s2, boolean dontIgnoreCase)
{
// set this before scoring, because scoreSubstitution reads it
caseSensitive = dontIgnoreCase;
int rows = s1.length() + 1;
int cols = s2.length() + 1;
// Java zero-initializes int arrays, so the first row and column are already 0
matrix = new int[rows][cols];
int max_score = 0;
for (int r = 0, rp1 = 1; rp1 < rows; ++r, ++rp1)
{
for (int c = 0, cp1 = 1; cp1 < cols; ++c, ++cp1)
{
int sub = matrix[r][c] + scoreSubstitution(s1.charAt(r), s2.charAt(c));
int ins = matrix[rp1][c] + scoreInsertion(s2.charAt(c));
int del = matrix[r][cp1] + scoreDeletion(s1.charAt(r));
// choose the greatest
matrix[rp1][cp1] = Math.max(Math.max(ins, sub), Math.max(del, 0));
if (matrix[rp1][cp1] > max_score)
{
// keep track of the maximum score
max_score = matrix[rp1][cp1];
maxRow = rp1;
maxCol = cp1;
}
}
}
}
public static void main(String args[])
{
SimilarityMatrix me = new SimilarityMatrix("aaa", "aaa");
System.out.println(me.getMaxRow() + " " + me.getMaxCol());
}
private int scoreSubstitution(char a, char b)
{
// a match is exact equality when case-sensitive, otherwise equality ignoring case
if (caseSensitive ? a == b : Character.toLowerCase(a) == Character.toLowerCase(b))
return matchReward;
else
return mismatchPenalty;
}
public int getMaxRow()
{
return maxRow;
}
public int getMaxCol()
{
return maxCol;
}
private int scoreInsertion(char a)
{
return gapCost;
}
private int scoreDeletion(char a)
{
return gapCost;
}
}
