Hey guys, I need some help with my homework. I understand how the Fork/Join framework works, but my code does not join the results. The exercise is to write a program that counts the true values in an array. Sorry for any mistakes (bad grammar or anything else) in this post; it is my first one.
Edit:
Thanks for all the replies; here is my solution to the problem:
TrueFinder class:
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
class TrueFinder extends RecursiveTask<TrueResult>
{
private static final int SEQUENTIAL_THRESHOLD = 5;
private boolean[] trueData;
private final int start;
private final int end;
public TrueFinder(boolean[] data, int start, int end)
{
this.trueData = data;
this.start = start;
this.end = end;
}
public TrueFinder(boolean[] data)
{
this(data, 0, data.length);
}
protected TrueResult compute()
{
final int length = end - start;
int counter = 0;
if (length < SEQUENTIAL_THRESHOLD)
{
for (int i = start; i < end; i++)
{
if (trueData[i])
{
counter++;
}
}
return new TrueResult(counter);
}
else
{
final int split = length / 2;
TrueFinder left = new TrueFinder(trueData, start, start + split);
left.fork();
TrueFinder right = new TrueFinder(trueData, start + split, end);
TrueResult subResultRight = right.compute();
TrueResult subResultLeft = left.join();
return new TrueResult(subResultRight.getTrueCounter() +
subResultLeft.getTrueCounter());
}
}
public static void main(String[] args)
{
int trues = 0;
boolean[] trueArray = new boolean[500];
for (int i = 0; i < 500; i++)
{
if (Math.random() < 0.3)
{
trueArray[i] = true;
trues++;
}
else
{
trueArray[i] = false;
}
}
TrueFinder finder = new TrueFinder(trueArray);
ForkJoinPool pool = new ForkJoinPool(4);
long startTime = System.currentTimeMillis();
TrueResult result = pool.invoke(finder);
long endTime = System.currentTimeMillis();
long actualTime = endTime - startTime;
System.out.println("Array of length " + trueArray.length + " searched in " +
actualTime + " msec; found " + result.getTrueCounter() +
" of " + trues + " true values.");
}
}
And the result class:
public class TrueResult
{
private int trueCounter;
public TrueResult(int counter)
{
this.trueCounter = counter;
}
public int getTrueCounter()
{
return trueCounter;
}
}
The splitting in your source code is wrong, for these reasons:
(1) Your splitting does not start from 0:
your start is 1.
(2) The remainder of the division is ignored when splitting
(with SEQUENTIAL_THRESHOLD = 5 and trueArray.length = 13, your splitting ignores the elements from 11 to 12).
(3) Once (1) and (2) are fixed, the number of subtasks must be split, not SEQUENTIAL_THRESHOLD.
So the modified source code is below:
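else
{
// NOTE: the sequential branch above must test length <= SEQUENTIAL_THRESHOLD,
// otherwise a chunk of exactly SEQUENTIAL_THRESHOLD elements would be split into itself and recurse forever
// number of chunks, rounded up so that the trailing remainder is not dropped
int split = (length - 1) / SEQUENTIAL_THRESHOLD + 1;
TrueFinder[] subtasks = new TrueFinder[split];
int from = start; // begin at this task's own start, not at 0
for (int i = 0; i < split - 1; i++)
{
subtasks[i] = new TrueFinder(trueData, from, from + SEQUENTIAL_THRESHOLD);
subtasks[i].fork();
from += SEQUENTIAL_THRESHOLD;
}
// the last chunk takes whatever is left, up to this task's end
subtasks[split - 1] = new TrueFinder(trueData, from, end);
counter = subtasks[split - 1].compute().getTrueCounter(); // better to call compute directly than to fork and join
for (int i = 0; i < split - 1; i++) // join only the forked subtasks
{
counter += subtasks[i].join().getTrueCounter();
}
return new TrueResult(counter);
}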
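The chunked split described in points (1) to (3) can be sketched as a standalone task. This is only an illustration under stated assumptions: the class name ChunkedTrueCounter and the helper count are made up for this example, the task returns an Integer instead of a TrueResult to keep it short, and the sequential branch uses <= so that a chunk of exactly SEQUENTIAL_THRESHOLD elements is counted directly.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical standalone variant of the chunked split; all names are illustrative.
class ChunkedTrueCounter extends RecursiveTask<Integer> {
    private static final int SEQUENTIAL_THRESHOLD = 5;
    private final boolean[] data;
    private final int start;
    private final int end;

    ChunkedTrueCounter(boolean[] data, int start, int end) {
        this.data = data;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Integer compute() {
        int length = end - start;
        // "<=" matters: a full-size chunk must be counted here, not split again
        if (length <= SEQUENTIAL_THRESHOLD) {
            int counter = 0;
            for (int i = start; i < end; i++) {
                if (data[i]) {
                    counter++;
                }
            }
            return counter;
        }
        // Ceiling division: 13 elements with a threshold of 5 give 3 chunks
        int chunks = (length - 1) / SEQUENTIAL_THRESHOLD + 1;
        ChunkedTrueCounter[] forked = new ChunkedTrueCounter[chunks - 1];
        int from = start;
        for (int i = 0; i < forked.length; i++) {
            forked[i] = new ChunkedTrueCounter(data, from, from + SEQUENTIAL_THRESHOLD);
            forked[i].fork();
            from += SEQUENTIAL_THRESHOLD;
        }
        // The last chunk takes the remainder and runs in the current thread
        int counter = new ChunkedTrueCounter(data, from, end).compute();
        for (ChunkedTrueCounter task : forked) {
            counter += task.join();
        }
        return counter;
    }

    // Convenience entry point that runs the task on the common pool
    static int count(boolean[] data) {
        return ForkJoinPool.commonPool().invoke(new ChunkedTrueCounter(data, 0, data.length));
    }
}
```

Calling ChunkedTrueCounter.count(array) on a 13-element array splits it into the ranges [0, 5), [5, 10) and [10, 13), so the last three elements are no longer skipped.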
I decided to optimize the piece of code below but ran into a problem. I tried to change the ArrayList to a thread-safe collection by following this discussion, but unfortunately something went wrong. The code compiles but throws this exception:
Exception in thread "main" java.lang.ClassCastException:
java.util.Collections$SynchronizedRandomAccessList cannot be cast to
java.util.ArrayList at
bfpasswrd_multi.PasswordCracker.doItMulti(PasswordCracker.java:73) at
bfpasswrd_multi.PasswordCracker.runMulti(PasswordCracker.java:60) at
bfpasswrd_multi.Test.main(Test.java:16)
Please tell me what is wrong.
package bfpasswrd_multi;
import java.util.Scanner;
public class Test
{
public static void main(String[] args)
{
System.out.print("Type password to be cracked: ");
@SuppressWarnings("resource")
String input = new Scanner(System.in).nextLine();
PasswordCracker cracker = new PasswordCracker();
System.out.println("Multithreaded");
cracker.runMulti(input);
cracker = new PasswordCracker();
System.out.println("Finished...");
}
}
package bfpasswrd_multi;
import java.util.ArrayList;
import java.util.Collections;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class PasswordCracker
{
String passwordToCrack;
public boolean passwordFound;
int min;
int max;
StringBuffer crackedPassword;
public void prepare(String text)
{
passwordToCrack = text;
passwordFound = false;
min = 48;
max = 57; // http://ascii.cl/
crackedPassword = new StringBuffer();
crackedPassword.append((char) (min - 1));
}
public void result()
{
System.out.println("Cracked Password is: " + crackedPassword.toString());
}
public void incrementString(StringBuffer toCrack, int min, int max)
{
toCrack.setCharAt(0, (char) ((int) toCrack.charAt(0) + 1));
for (int i = 0; i < toCrack.length(); i++)
{
if (toCrack.charAt(i) > (char) max)
{
toCrack.setCharAt(i, (char) min);
if (toCrack.length() == i + 1)
{
toCrack.append((char) min);
}
else
{
toCrack.setCharAt(i + 1, (char) ((int) toCrack.charAt(i + 1) + 1));
}
}
}
}
public void runMulti(String text)
{
prepare(text);
double time = System.nanoTime();
doItMulti();
time = System.nanoTime() - time;
System.out.println(time / (1000000000));
result();
}
public void doItMulti()
{
int cores = Runtime.getRuntime().availableProcessors();
ArrayList<Future<?>> tasks ; // How do I make my ArrayList Thread-Safe? Another approach to problem in Java?
// https://stackoverflow.com/questions/2444005/how-do-i-make-my-arraylist-thread-safe-another-approach-to-problem-in-java
tasks = (ArrayList<Future<?>>) Collections.synchronizedList(new ArrayList<Future<?>>(cores));
// ArrayList<Future<?>> tasks = new ArrayList<>(cores);
ExecutorService executor = Executors.newFixedThreadPool(cores);
final long step = 2000;
for (long i = 0; i < Long.MAX_VALUE; i += step)
{
while(tasks.size() > cores)
{
for(int w = 0; w < tasks.size();w++)
{
if(tasks.get(w).isDone())
{
tasks.remove(w);
break;
}
}
try
{
Thread.sleep(0);
}
catch (InterruptedException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
}
{
final long j = i;
if (passwordFound == false)
{
tasks.add(executor.submit(new Runnable()
{
public void run()
{
long border = j + step;
StringBuffer toCrack = new StringBuffer(10);
toCrack.append(constructString3(j, min, max));
for (long k = j; k < border; k++)
{
incrementString(toCrack, min, max);
boolean found = toCrack.toString().equals(passwordToCrack);
if (found)
{
crackedPassword = toCrack;
passwordFound = found;
break;
}
}
}
}));
}
else
{
break;
}
}
}
executor.shutdownNow();
}
public String constructString3(long number, long min, long max)
{
StringBuffer text = new StringBuffer();
if (number > Long.MAX_VALUE - min)
{
number = Long.MAX_VALUE - min;
}
ArrayList<Long> vector = new ArrayList<Long>(10);
vector.add(min - 1 + number);
long range = max - min + 1;
boolean nextLetter = false;
for (int i = 0; i < vector.size(); i++)
{
long nextLetterCounter = 0;
while (vector.get(i) > max)
{
nextLetter = true;
long multiplicator = Math.abs(vector.get(i) / range);
if ((vector.get(i) - (multiplicator * range)) < min)
{
multiplicator -= 1;
}
vector.set(i, vector.get(i) - (multiplicator * range));
nextLetterCounter += multiplicator;
}
if (nextLetter)
{
vector.add((long) (min + nextLetterCounter - 1));
nextLetter = false;
}
text.append((char) vector.get(i).intValue());
}
return text.toString();
}
}
Many thanks in advance !
The issue that you're seeing is with this line:
tasks = (ArrayList<Future<?>>) Collections.synchronizedList(new ArrayList<Future<?>>(cores));
Collections.synchronizedList doesn't return an ArrayList; it returns some implementation of the List interface (java.util.Collections$SynchronizedRandomAccessList, to be exact). All you can rely on is that it is a List; it is not an ArrayList, so the cast fails at runtime.
The easy solution to this is to declare tasks to be a List<Future<?>>:
List<Future<?>> tasks =
Collections.synchronizedList(new ArrayList<Future<?>>(cores));
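To see the difference in isolation, here is a minimal sketch. The class name SynchronizedListDemo and its helper methods are made up for this example; only Collections.synchronizedList and the standard collection types come from the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative demo; class and method names are hypothetical.
class SynchronizedListDemo {
    // Declare the variable as List, which is all the wrapper promises to be
    static List<String> newSynchronizedList() {
        return Collections.synchronizedList(new ArrayList<String>());
    }

    // Returns true if casting the given list to ArrayList throws ClassCastException
    static boolean castToArrayListFails(List<String> list) {
        try {
            ArrayList<String> ignored = (ArrayList<String>) list;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> list = newSynchronizedList();
        list.add("a");
        System.out.println(list.getClass().getName());
        System.out.println("cast fails: " + castToArrayListFails(list));
    }
}
```

The wrapper delegates to the ArrayList it was given, so add, get, remove and size behave as before; only the static type has to be List.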
Dear community members, thank you for your comments. It seems that my thread-safe list is working now. For anyone interested in the solution, I am posting the resolved code below. Please note that I renamed tasks to futures. Once again, thanks everybody!
package bfpasswrd_multi;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class PasswordCracker
{
String passwordToCrack;
public boolean passwordFound;
int min;
int max;
StringBuffer crackedPassword;
public void prepare(String text)
{
passwordToCrack = text;
passwordFound = false;
min = 48;
max = 57; // http://ascii.cl/
crackedPassword = new StringBuffer();
crackedPassword.append((char) (min - 1));
}
public void result()
{
System.out.println("Cracked Password is: " + crackedPassword.toString());
}
public void incrementString(StringBuffer toCrack, int min, int max)
{
toCrack.setCharAt(0, (char) ((int) toCrack.charAt(0) + 1));
for (int i = 0; i < toCrack.length(); i++)
{
if (toCrack.charAt(i) > (char) max)
{
toCrack.setCharAt(i, (char) min);
if (toCrack.length() == i + 1)
{
toCrack.append((char) min);
}
else
{
toCrack.setCharAt(i + 1, (char) ((int) toCrack.charAt(i + 1) + 1));
}
}
}
}
public void runMulti(String text)
{
prepare(text);
double time = System.nanoTime();
doItMulti();
time = System.nanoTime() - time;
System.out.println(time / (1000000000));
result();
}
public void doItMulti()
{
int cores = Runtime.getRuntime().availableProcessors();
// ArrayList<Future<?>> task; // HOW IT WAS
//
// tasks = (ArrayList<Future<?>>) Collections.synchronizedList(new ArrayList<Future<?>>(cores)); // HOW IT WAS
List<Future<?>> futures ; // THE SOLUTION
futures = Collections.synchronizedList(new ArrayList<Future<?>>(cores)); // THE SOLUTION
// ArrayList<Future<?>> tasks = new ArrayList<>(cores);
ExecutorService executor = Executors.newFixedThreadPool(cores);
final long step = 2000;
for (long i = 0; i < Long.MAX_VALUE; i += step)
{
while(futures.size() > cores)
{
for(int w = 0; w < futures.size();w++)
{
if(futures.get(w).isDone())
{
futures.remove(w);
break;
}
}
try
{
Thread.sleep(0);
}
catch (InterruptedException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
}
{
final long j = i;
if (passwordFound == false)
{
futures.add(executor.submit(new Runnable()
{
public void run()
{
long border = j + step;
StringBuffer toCrack = new StringBuffer(10);
toCrack.append(constructString3(j, min, max));
for (long k = j; k < border; k++)
{
incrementString(toCrack, min, max);
boolean found = toCrack.toString().equals(passwordToCrack);
if (found)
{
crackedPassword = toCrack;
passwordFound = found;
break;
}
}
}
}));
}
else
{
break;
}
}
}
executor.shutdownNow();
}
public String constructString3(long number, long min, long max)
{
StringBuffer text = new StringBuffer();
if (number > Long.MAX_VALUE - min)
{
number = Long.MAX_VALUE - min;
}
ArrayList<Long> vector = new ArrayList<Long>(10);
vector.add(min - 1 + number);
long range = max - min + 1;
boolean nextLetter = false;
for (int i = 0; i < vector.size(); i++)
{
long nextLetterCounter = 0;
while (vector.get(i) > max)
{
nextLetter = true;
long multiplicator = Math.abs(vector.get(i) / range);
if ((vector.get(i) - (multiplicator * range)) < min)
{
multiplicator -= 1;
}
vector.set(i, vector.get(i) - (multiplicator * range));
nextLetterCounter += multiplicator;
}
if (nextLetter)
{
vector.add((long) (min + nextLetterCounter - 1));
nextLetter = false;
}
text.append((char) vector.get(i).intValue());
}
return text.toString();
}
}
I am using comb sort to sort a given array of Strings. The code is:
public static int combSort(String[] input_array) {
int gap = input_array.length;
double shrink = 1.3;
int numbOfComparisons = 0;
boolean swapped=true;
//while(!swapped && gap>1){
System.out.println();
while(!(swapped && gap==1)){
gap = (int)(gap/shrink);
if(gap<1){
gap=1;
}
int i = 0;
swapped = false;
String temp = "";
while((i+gap) < input_array.length){
numbOfComparisons++;
if(Compare(input_array[i], input_array[i+gap]) == 1){
temp = input_array[i];
input_array[i] = input_array[i+gap];
input_array[i+gap] = temp;
swapped = true;
System.out.println("gap: " + gap + " i: " + i);
ArrayUtilities.printArray(input_array);
}
i++;
}
}
ArrayUtilities.printArray(input_array);
return numbOfComparisons;
}
The problem is that while it sorts many arrays, it gets stuck in an infinite loop for some arrays, particularly small ones. Compare(input_array[i], input_array[i+gap]) is a small method that returns 1 if s1>s2 and returns -1 if s1<s2.
Try this version. The String array is changed to an int array (I guess you can change it back to the String version). The constant 1.3 is replaced with 1.247330950103979.
public class CombSort
{
private static final int PROBLEM_SIZE = 5;
static int[] in = new int[PROBLEM_SIZE];
public static void printArr()
{
for(int i=0;i<in.length;i++)
{
System.out.print(in[i] + "\t");
}
System.out.println();
}
public static void combSort()
{
int swap, i, gap=PROBLEM_SIZE;
boolean swapped = false;
printArr();
while ((gap > 1) || swapped)
{
if (gap > 1)
{
gap = (int)( gap / 1.247330950103979);
}
swapped = false;
for (i = 0; gap + i < PROBLEM_SIZE; ++i)
{
if (in[i] - in[i + gap] > 0)
{
swap = in[i];
in[i] = in[i + gap];
in[i + gap] = swap;
swapped = true;
}
}
}
printArr();
}
public static void main(String[] args)
{
for(int i=0;i<in.length;i++)
{
in[i] = (int) (Math.random()*PROBLEM_SIZE);
}
combSort();
}
}
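For reference, here is a sketch of what the String version might look like with the same loop condition as the int version above. It assumes the standard String.compareTo in place of the question's Compare method, and the class name StringCombSort is made up for this example. The loop condition is the part that matters: the question's while(!(swapped && gap==1)) keeps looping whenever a pass with gap 1 makes no swap, so an already-sorted two-element array never terminates, whereas (gap > 1 || swapped) stops exactly in that case.

```java
// Illustrative String variant; the class name and the use of compareTo are assumptions.
class StringCombSort {
    // Sorts the array in place and returns the number of comparisons made
    static int combSort(String[] input) {
        final double shrink = 1.3;
        int gap = input.length;
        int comparisons = 0;
        boolean swapped = true;
        // Continue while the gap can still shrink or the last pass swapped something
        while (gap > 1 || swapped) {
            gap = Math.max(1, (int) (gap / shrink));
            swapped = false;
            for (int i = 0; i + gap < input.length; i++) {
                comparisons++;
                if (input[i].compareTo(input[i + gap]) > 0) {
                    String temp = input[i];
                    input[i] = input[i + gap];
                    input[i + gap] = temp;
                    swapped = true;
                }
            }
        }
        return comparisons;
    }
}
```

With this condition the final passes at gap 1 behave like a bubble sort that stops after the first pass without swaps.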
Please find below an implementation of comb sort in Java.
public static void combSort(int[] elements) {
float shrinkFactor = 1.3f;
int position = elements.length;
boolean swapped = true;
// keep going until the gap is 1 and a full pass makes no swaps
while (position > 1 || swapped) {
position = Math.max(1, (int) (position / shrinkFactor));
swapped = false;
for (int i = 0, cursor = position; cursor < elements.length; i++, cursor++) {
if (elements[i] > elements[cursor]) {
int temp = elements[cursor];
elements[cursor] = elements[i];
elements[i] = temp;
swapped = true;
}
}
}
}
Please review it and let me know your feedback.
So, I made a small program to test multithreading in Java. It compares the time it takes to scale an array in a plain loop with the time it takes when the work is split across multiple threads. I'm unsure about the numbers I get when the program finishes, so I was wondering whether I made a boneheaded error somewhere that would explain such disparate numbers.
Code below:
import java.util.Scanner;
public class arrayScaling {
public static void main(String[] args) throws InterruptedException {
Scanner input = new Scanner(System.in);
System.out.println("Enter the amount of number you want the program to generate:");
int numOfNumbs = input.nextInt();
int [] arrayForNumbers = new int [numOfNumbs];
int [] newArrayForNumbers = new int [numOfNumbs];
for (int i = 0; i < arrayForNumbers.length; i++) {
arrayForNumbers[i] = (int) ((Math.random() * 25) + 1);
}
long startTime = System.nanoTime();
for (int i = 0; i < arrayForNumbers.length; i++) {
newArrayForNumbers[i] = arrayForNumbers[i] * 3; // read from the source array
}
long endTime = System.nanoTime();
System.out.println();
long totalExecutionTime = endTime-startTime;
System.out.println("Time it takes execute scaling is " +
totalExecutionTime + " nanoseconds");
System.out.println();
int numOfNumLeftOver = numOfNumbs % 5;
int numOfNumDivided = numOfNumbs / 5;
int [] temp = null;
int [] temp2 = null;
int [] temp3 = null;
int [] temp4 = null;
int [] temp5 = null;
MyThread thread1 = new MyThread (numOfNumbs/5);
MyThread thread2 = new MyThread (numOfNumbs/5);
MyThread thread3 = new MyThread (numOfNumbs/5);
MyThread thread4 = new MyThread (numOfNumbs/5);
MyThread thread5;
if (numOfNumLeftOver != 0) {
numOfNumDivided = numOfNumDivided + numOfNumLeftOver;
thread5 = new MyThread (numOfNumDivided);
}
else {
thread5 = new MyThread (numOfNumbs/5);
}
int tempNum = 0;
for ( int i = 0; i < thread1.getArray().length; i ++) {
temp = thread1.getArray();
temp[tempNum] = arrayForNumbers[tempNum];
tempNum++;
}
for ( int i = 0; i < thread2.getArray().length; i ++) {
temp2 = thread2.getArray();
temp2[i] = arrayForNumbers[tempNum];
tempNum++;
}
for ( int i = 0; i < thread3.getArray().length; i ++) {
temp3 = thread3.getArray();
temp3[i] = arrayForNumbers[tempNum];
tempNum++;
}
for ( int i = 0; i < thread4.getArray().length; i ++) {
temp4 = thread4.getArray();
temp4[i] = arrayForNumbers[tempNum];
tempNum++;
}
for ( int i = 0; i < thread5.getArray().length; i ++) {
temp5 = thread5.getArray();
temp5[i] = arrayForNumbers[tempNum];
tempNum++;
}
thread1.setArray(temp);
thread2.setArray(temp2);
thread3.setArray(temp3);
thread4.setArray(temp4);
thread5.setArray(temp5);
long startTime2 = System.nanoTime();
thread1.start();
thread2.start();
thread3.start();
thread4.start();
thread5.start();
thread1.join();
thread2.join();
thread3.join();
thread4.join();
thread5.join();
long endTime2 = System.nanoTime();
long newTotalExecutionTime = endTime2 - startTime2;
System.out.println("Time it takes execute scaling w/ multiple threads is " +
newTotalExecutionTime + " nanoseconds");
if (newTotalExecutionTime < totalExecutionTime) {
System.out.println("Multithreading was more effective");
}
else if (totalExecutionTime < newTotalExecutionTime) {
System.out.println("The original algorithm was more effective");
}
else if (totalExecutionTime == newTotalExecutionTime) {
System.out.println("Both method worked at the same speed");
}
input.close();
}
}
public class MyThread extends Thread {
private int [] array;
private int [] scaleArray;
public MyThread(int size) {
array = new int [size];
scaleArray = new int [size];
}
public int[] getArray() {
return array;
}
public void setArray(int[] array) {
this.array = array;
}
public int[] getScaleArray() {
return scaleArray;
}
public void setScaleArray(int[] scaleArray) {
this.scaleArray = scaleArray;
}
public void run () {
for (int z = 0; z < array.length; z++){
scaleArray[z] = 3 * array[z];
}
}
}
And the output of this program is:
Enter the amount of number you want the program to generate:
16
Time it takes execute scaling is 893 nanoseconds
Time it takes execute scaling w/ multiple threads is 590345 nanoseconds
The original algorithm was more effective
Your results don't surprise me in the slightest. There's a lot of overhead to creating threads, starting them, waiting for them to finish and so on. Don't forget, 590345ns is still less than a millisecond; but most of that is to do with shuffling threads, not with multiplying the numbers.
If you want to see the threaded part of the program outperform the other part, try generating a whole lot more than 16 numbers.
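As a rough sketch of how such an experiment could be set up, the version below lets each thread work on its own index range of one shared array instead of copying the data into per-thread arrays first. The class name ParallelScale and its scale method are made up for this example, and no particular speed-up is claimed; whether the threads win depends on the array size and the machine.

```java
// Illustrative sketch; class and method names are hypothetical.
class ParallelScale {
    // Multiplies every element of source by factor using threadCount threads
    // (threadCount must be at least 1)
    static int[] scale(int[] source, int factor, int threadCount) {
        int[] result = new int[source.length];
        Thread[] workers = new Thread[threadCount];
        // Chunk size rounded up so the last elements are not dropped
        int chunk = (source.length + threadCount - 1) / threadCount;
        for (int t = 0; t < threadCount; t++) {
            final int from = Math.min(source.length, t * chunk);
            final int to = Math.min(source.length, from + chunk);
            workers[t] = new Thread(() -> {
                // Each thread writes only to its own range, so no locking is needed
                for (int i = from; i < to; i++) {
                    result[i] = factor * source[i];
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join also makes the worker's writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return result;
    }
}
```

To compare timings, call scale with an array of several million elements and measure it with System.nanoTime() next to the single-threaded loop, ideally after a few warm-up runs.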
I would like to improve my small fork/join example to show that work stealing occurs while the Java Fork/Join framework executes it.
What changes do I need to make to the following code? The purpose of the example is simply to do a linear search for a value, splitting the work between multiple threads.
package com.stackoverflow.questions;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
public class CounterFJ<T extends Comparable<T>> extends RecursiveTask<Integer> {
private static final long serialVersionUID = 5075739389907066763L;
private List<T> _list;
private T _test;
private int _lastCount = -1;
private int _start;
private int _end;
private int _divideFactor = 4;
private static final int THRESHOLD = 20;
public CounterFJ(List<T> list, T test, int start, int end, int factor) {
_list = list;
_test = test;
_start = start;
_end = end;
_divideFactor = factor;
}
public CounterFJ(List<T> list, T test, int factor) {
this(list, test, 0, list.size(), factor);
}
@Override
protected Integer compute() {
if (_end - _start < THRESHOLD) {
int count = 0;
for (int i = _start; i < _end; i++) {
if (_list.get(i).compareTo(_test) == 0) {
count++;
}
}
_lastCount = count;
return new Integer(count);
}
LinkedList<CounterFJ<T>> taskList = new LinkedList<>();
int step = (_end - _start) / _divideFactor;
for (int j = 0; j < _divideFactor; j++) {
CounterFJ<T> task = null;
if (j == 0)
task = new CounterFJ<T>(_list, _test, _start, _start + step, _divideFactor);
else if (j == _divideFactor - 1)
task = new CounterFJ<T>(_list, _test, _start + (step * j), _end, _divideFactor);
else
task = new CounterFJ<T>(_list, _test, _start + (step * j), _start + (step * (j + 1)), _divideFactor);
// task.fork();
taskList.add(task);
}
invokeAll(taskList);
_lastCount = 0;
for (CounterFJ<T> task : taskList) {
_lastCount += task.join();
}
return new Integer(_lastCount);
}
public int getResult() {
return _lastCount;
}
public static void main(String[] args) {
LinkedList<Long> list = new LinkedList<Long>();
long range = 200;
Random r = new Random(42);
for (int i = 0; i < 1000; i++) {
list.add(new Long((long) (r.nextDouble() * range)));
}
CounterFJ<Long> counter = new CounterFJ<>(list, new Long(100), 4);
ForkJoinPool pool = new ForkJoinPool();
long time = System.currentTimeMillis();
pool.invoke(counter);
System.out.println("Fork join counter in " + (System.currentTimeMillis() - time));
System.out.println("Occurrences:" + counter.getResult());
}
}
I finally worked out how to do it, and it's not difficult, so I'm leaving this here for future readers.
In the constructor of the RecursiveTask, save the thread that created the instance. In the compute method, check whether the executing thread is the same one. If it is not, work stealing has occurred.
So I added these member variables:
private long _threadId = -1;
private static int stolen_tasks = 0;
changed constructor like this:
public CounterFJ(List<T> list, T test, int start, int end, int factor) {
_list = list;
_threadId = Thread.currentThread().getId(); //added
_test = test;
_start = start;
_end = end;
_divideFactor = factor;
}
and added comparison into compute method:
@Override
protected Integer compute() {
long thisThreadId = Thread.currentThread().getId();
if (_threadId != thisThreadId){
stolen_tasks++;
}
// rest of the method
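Here is a self-contained sketch of the same idea. It is a variation, not the code above: it works on an int array with a binary split, compares Thread references instead of thread ids, and uses an AtomicInteger because a plain static int incremented from several worker threads can lose updates. The class name StealAwareCounter and the helper count are made up for this example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch; class and method names are hypothetical.
class StealAwareCounter extends RecursiveTask<Integer> {
    // Number of tasks that ran on a thread other than the one that created them
    static final AtomicInteger STOLEN = new AtomicInteger();
    private static final int THRESHOLD = 20;

    private final int[] values;
    private final int target;
    private final int start;
    private final int end;
    private final Thread creator = Thread.currentThread(); // remembered at construction

    StealAwareCounter(int[] values, int target, int start, int end) {
        this.values = values;
        this.target = target;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Integer compute() {
        if (Thread.currentThread() != creator) {
            STOLEN.incrementAndGet(); // executed by a different thread than its creator
        }
        if (end - start < THRESHOLD) {
            int count = 0;
            for (int i = start; i < end; i++) {
                if (values[i] == target) {
                    count++;
                }
            }
            return count;
        }
        int mid = start + (end - start) / 2;
        StealAwareCounter left = new StealAwareCounter(values, target, start, mid);
        StealAwareCounter right = new StealAwareCounter(values, target, mid, end);
        left.fork(); // left may be picked up by another worker
        return right.compute() + left.join();
    }

    static int count(int[] values, int target) {
        return new ForkJoinPool().invoke(new StealAwareCounter(values, target, 0, values.length));
    }
}
```

Note that the root task is created by the calling thread and usually executed by a pool worker, so it can be counted once even though it was submitted rather than stolen. The value of STOLEN varies from run to run.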
I am testing BerkeleyDB Java Edition to understand whether I can use it in my project.
I've created a very simple program that works with an object of class com.sleepycat.je.Database. It:
writes N records of 5-15 KB each, with keys generated like Integer.toString(random.nextInt());
reads these records, fetching them with Database#get in the same order they were created;
reads the same number of records with Database#get in random order.
Now I see something strange: the execution time of the third test grows very non-linearly as the number of records increases.
N=80000, write=55sec, sequential fetch=17sec, random fetch=3sec
N=100000, write=60sec, sequential fetch=20sec, random fetch=7sec
N=120000, write=68sec, sequential fetch=27sec, random fetch=11sec
N=140000, write=82sec, sequential fetch=32sec, random fetch=47sec
(I've run tests several times, of course.)
I suppose I am doing something quite wrong. Here is the source for reference (sorry, it is a bit long); the methods are called in the order shown:
private Environment env;
private Database db;
private Random random = new Random();
private List<String> keys = new ArrayList<String>();
private int seed = 113;
public boolean dbOpen() {
EnvironmentConfig ec = new EnvironmentConfig();
DatabaseConfig dc = new DatabaseConfig();
ec.setAllowCreate(true);
dc.setAllowCreate(true);
env = new Environment(new File("mydbenv"), ec);
db = env.openDatabase(null, "moe", dc);
return true;
}
public int storeRecords(int i) {
int j;
long size = 0;
DatabaseEntry key = new DatabaseEntry();
DatabaseEntry val = new DatabaseEntry();
random.setSeed(seed);
for (j = 0; j < i; j++) {
String k = Long.toString(random.nextLong());
byte[] data = new byte[5000 + random.nextInt(10000)];
keys.add(k);
size += data.length;
random.nextBytes(data);
key.setData(k.getBytes());
val.setData(data);
db.put(null, key, val);
}
System.out.println("GENERATED SIZE: " + size);
return j;
}
public int fetchRecords(int i) {
int j, res;
DatabaseEntry key = new DatabaseEntry();
DatabaseEntry val = new DatabaseEntry();
random.setSeed(seed);
res = 0;
for (j = 0; j < i; j++) {
String k = Long.toString(random.nextLong());
byte[] data = new byte[5000 + random.nextInt(10000)];
random.nextBytes(data);
key.setData(k.getBytes());
db.get(null, key, val, null);
if (Arrays.equals(data, val.getData())) {
res++;
} else {
System.err.println("FETCH differs: " + j);
System.err.println(data.length + " " + val.getData().length);
}
}
return res;
}
public int fetchRandom(int i) {
DatabaseEntry key = new DatabaseEntry();
DatabaseEntry val = new DatabaseEntry();
for (int j = 0; j < i; j++) {
String k = keys.get(random.nextInt(keys.size()));
key.setData(k.getBytes());
db.get(null, key, val, null);
}
return i;
}
Performance degradation is non-linear for two reasons:
BDB-JE data structure is a b-tree, which has O(log(n)) performance for retrieving one record. Retrieving all via the get method is O(n*log(n)).
Large data sets don't fit into RAM, and so disk access slows everything down. Random access has very poor cache locality.
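To get a feel for the second point, the amount of value data written by the test can be estimated from the record size used in the question's code, 5000 + random.nextInt(10000) bytes, which averages 9999.5 bytes per record. The sketch below only does that arithmetic; it ignores keys and any storage overhead, and the class name DataSizeEstimate is made up for this example.

```java
// Back-of-the-envelope estimate; class and method names are hypothetical.
class DataSizeEstimate {
    // Expected total value bytes: each record is 5000 + nextInt(10000) bytes,
    // so the mean record size is 5000 + 4999.5 bytes
    static long expectedBytes(int records) {
        return records * 5000L + records * 9999L / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {80000, 100000, 120000, 140000}) {
            System.out.println("N=" + n + " -> about "
                    + expectedBytes(n) / 1_000_000 + " MB of values");
        }
    }
}
```

The estimate grows from roughly 0.8 GB at N=80000 to roughly 1.4 GB at N=140000, so somewhere in that range the data can stop fitting into whatever memory is available for caching.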
Note that you can improve write performance by giving up some durability: ec.setTxnWriteNoSync(true);
You might also want to try Tupl, an open source BerkeleyDB replacement I've been working on. It's still in the alpha stage, but you can find it on SourceForge.
For a fair comparison between BDB-JE and Tupl, I set the cache size to 500M and an explicit checkpoint is performed at the end of the store method.
With BDB-JE:
N=80000, write=11.0sec, fetch=5.3sec
N=100000, write=13.6sec, fetch=7.0sec
N=120000, write=16.4sec, fetch=29.5sec
N=140000, write=18.8sec, fetch=35.9sec
N=160000, write=21.5sec, fetch=41.3sec
N=180000, write=23.9sec, fetch=46.4sec
With Tupl:
N=80000, write=21.7sec, fetch=4.4sec
N=100000, write=27.6sec, fetch=6.3sec
N=120000, write=30.2sec, fetch=8.4sec
N=140000, write=35.4sec, fetch=12.2sec
N=160000, write=39.9sec, fetch=17.4sec
N=180000, write=45.4sec, fetch=22.8sec
BDB-JE is faster at writing entries, because of its log-based format. Tupl is faster at reading, however. Here's the source to the Tupl test:
import java.io.*;
import java.util.*;
import org.cojen.tupl.*;
public class TuplTest {
public static void main(final String[] args) throws Exception {
final TuplTest rt = new TuplTest();
rt.dbOpen(args[0]);
{
long start = System.currentTimeMillis();
rt.storeRecords(Integer.parseInt(args[1]));
long end = System.currentTimeMillis();
System.out.println("store duration: " + (end - start));
}
{
long start = System.currentTimeMillis();
rt.fetchRecords(Integer.parseInt(args[1]));
long end = System.currentTimeMillis();
System.out.println("fetch duration: " + (end - start));
}
}
private Database db;
private Index ix;
private Random random = new Random();
private List<String> keys = new ArrayList<String>();
private int seed = 113;
public boolean dbOpen(String home) throws Exception {
DatabaseConfig config = new DatabaseConfig();
config.baseFile(new File(home));
config.durabilityMode(DurabilityMode.NO_FLUSH);
config.minCacheSize(500000000);
db = Database.open(config);
ix = db.openIndex("moe");
return true;
}
public int storeRecords(int i) throws Exception {
int j;
long size = 0;
random.setSeed(seed);
for (j = 0; j < i; j++) {
String k = Long.toString(random.nextLong());
byte[] data = new byte[5000 + random.nextInt(10000)];
keys.add(k);
size += data.length;
random.nextBytes(data);
ix.store(null, k.getBytes(), data);
}
System.out.println("GENERATED SIZE: " + size);
db.checkpoint();
return j;
}
public int fetchRecords(int i) throws Exception {
int j, res;
random.setSeed(seed);
res = 0;
for (j = 0; j < i; j++) {
String k = Long.toString(random.nextLong());
byte[] data = new byte[5000 + random.nextInt(10000)];
random.nextBytes(data);
byte[] val = ix.load(null, k.getBytes());
if (Arrays.equals(data, val)) {
res++;
} else {
System.err.println("FETCH differs: " + j);
System.err.println(data.length + " " + val.length);
}
}
return res;
}
public int fetchRandom(int i) throws Exception {
for (int j = 0; j < i; j++) {
String k = keys.get(random.nextInt(keys.size()));
ix.load(null, k.getBytes());
}
return i;
}
}