BerkeleyDB JE random access time increases non-linearly (Java)

I am testing BerkeleyDB Java Edition to understand whether I can use it in my project.
I've created a very simple program which works with an object of class com.sleepycat.je.Database. It:
writes N records of 5-15kb each, with keys generated like Long.toString(random.nextLong());
reads these records, fetching them with Database#get in the same order they were created;
reads the same number of records with Database#get in random order.
And here I see a strange thing: execution time for the third test grows very non-linearly as the number of records increases.
N=80000, write=55sec, sequential fetch=17sec, random fetch=3sec
N=100000, write=60sec, sequential fetch=20sec, random fetch=7sec
N=120000, write=68sec, sequential fetch=27sec, random fetch=11sec
N=140000, write=82sec, sequential fetch=32sec, random fetch=47sec
(I've run tests several times, of course.)
I suppose I am doing something quite wrong. Here is the source for reference (sorry, it is a bit long); the methods are called in the order listed:
private Environment env;
private Database db;
private Random random = new Random();
private List<String> keys = new ArrayList<String>();
private int seed = 113;

public boolean dbOpen() {
    EnvironmentConfig ec = new EnvironmentConfig();
    DatabaseConfig dc = new DatabaseConfig();
    ec.setAllowCreate(true);
    dc.setAllowCreate(true);
    env = new Environment(new File("mydbenv"), ec);
    db = env.openDatabase(null, "moe", dc);
    return true;
}

// Test 1: write i records with pseudo-random keys and 5-15kb payloads.
public int storeRecords(int i) {
    int j;
    long size = 0;
    DatabaseEntry key = new DatabaseEntry();
    DatabaseEntry val = new DatabaseEntry();
    random.setSeed(seed);
    for (j = 0; j < i; j++) {
        String k = Long.toString(random.nextLong());
        byte[] data = new byte[5000 + random.nextInt(10000)];
        keys.add(k);
        size += data.length;
        random.nextBytes(data);
        key.setData(k.getBytes());
        val.setData(data);
        db.put(null, key, val);
    }
    System.out.println("GENERATED SIZE: " + size);
    return j;
}

// Test 2: re-seed the generator and fetch the records in insertion order,
// verifying each payload.
public int fetchRecords(int i) {
    int j, res;
    DatabaseEntry key = new DatabaseEntry();
    DatabaseEntry val = new DatabaseEntry();
    random.setSeed(seed);
    res = 0;
    for (j = 0; j < i; j++) {
        String k = Long.toString(random.nextLong());
        byte[] data = new byte[5000 + random.nextInt(10000)];
        random.nextBytes(data);
        key.setData(k.getBytes());
        db.get(null, key, val, null);
        if (Arrays.equals(data, val.getData())) {
            res++;
        } else {
            System.err.println("FETCH differs: " + j);
            System.err.println(data.length + " " + val.getData().length);
        }
    }
    return res;
}

// Test 3: fetch the same number of records in random key order.
public int fetchRandom(int i) {
    DatabaseEntry key = new DatabaseEntry();
    DatabaseEntry val = new DatabaseEntry();
    for (int j = 0; j < i; j++) {
        String k = keys.get(random.nextInt(keys.size()));
        key.setData(k.getBytes());
        db.get(null, key, val, null);
    }
    return i;
}

Performance degradation is non-linear for two reasons:
1. The BDB-JE data structure is a B-tree, which has O(log n) performance for retrieving one record. Retrieving all of them via the get method is O(n log n).
2. Large data sets don't fit into RAM, and so disk access slows everything down. Random access has very poor cache locality. Note the scale here: at N=140000 with ~10kb average records, the data set is roughly 1.4GB, so the sudden jump in random-fetch time is most likely the point where the working set outgrows the cache, not the log factor.
Note that you can improve write performance by giving up some durability: ec.setTxnWriteNoSync(true);
You might also want to try Tupl, an open source BerkeleyDB replacement I've been working on. It's still in the alpha stage, but you can find it on SourceForge.
For a fair comparison between BDB-JE and Tupl, I set the cache size to 500M and an explicit checkpoint is performed at the end of the store method.
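For reference, the JE side of that tuning looks roughly like this (a minimal sketch assuming a JE version where setTxnWriteNoSync, setCacheSize, and checkpoint(CheckpointConfig) are available; verify against your JE release):

EnvironmentConfig ec = new EnvironmentConfig();
ec.setAllowCreate(true);
ec.setTxnWriteNoSync(true);    // trade some durability for write speed
ec.setCacheSize(500000000L);   // ~500M cache, matching the Tupl run
Environment env = new Environment(new File("mydbenv"), ec);

// ... store the records ...

CheckpointConfig cc = new CheckpointConfig();
cc.setForce(true);             // checkpoint even if JE thinks none is needed
env.checkpoint(cc);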
With BDB-JE:
N=80000, write=11.0sec, fetch=5.3sec
N=100000, write=13.6sec, fetch=7.0sec
N=120000, write=16.4sec, fetch=29.5sec
N=140000, write=18.8sec, fetch=35.9sec
N=160000, write=21.5sec, fetch=41.3sec
N=180000, write=23.9sec, fetch=46.4sec
With Tupl:
N=80000, write=21.7sec, fetch=4.4sec
N=100000, write=27.6sec, fetch=6.3sec
N=120000, write=30.2sec, fetch=8.4sec
N=140000, write=35.4sec, fetch=12.2sec
N=160000, write=39.9sec, fetch=17.4sec
N=180000, write=45.4sec, fetch=22.8sec
BDB-JE is faster at writing entries because of its log-based format. Tupl is faster at reading, however. Here's the source for the Tupl test:
import java.io.*;
import java.util.*;
import org.cojen.tupl.*;

public class TuplTest {
    public static void main(final String[] args) throws Exception {
        final TuplTest rt = new TuplTest();
        rt.dbOpen(args[0]);

        {
            long start = System.currentTimeMillis();
            rt.storeRecords(Integer.parseInt(args[1]));
            long end = System.currentTimeMillis();
            System.out.println("store duration: " + (end - start));
        }

        {
            long start = System.currentTimeMillis();
            rt.fetchRecords(Integer.parseInt(args[1]));
            long end = System.currentTimeMillis();
            System.out.println("fetch duration: " + (end - start));
        }
    }

    private Database db;
    private Index ix;
    private Random random = new Random();
    private List<String> keys = new ArrayList<String>();
    private int seed = 113;

    public boolean dbOpen(String home) throws Exception {
        DatabaseConfig config = new DatabaseConfig();
        config.baseFile(new File(home));
        config.durabilityMode(DurabilityMode.NO_FLUSH);
        config.minCacheSize(500000000);
        db = Database.open(config);
        ix = db.openIndex("moe");
        return true;
    }

    public int storeRecords(int i) throws Exception {
        int j;
        long size = 0;
        random.setSeed(seed);
        for (j = 0; j < i; j++) {
            String k = Long.toString(random.nextLong());
            byte[] data = new byte[5000 + random.nextInt(10000)];
            keys.add(k);
            size += data.length;
            random.nextBytes(data);
            ix.store(null, k.getBytes(), data);
        }
        System.out.println("GENERATED SIZE: " + size);
        db.checkpoint();
        return j;
    }

    public int fetchRecords(int i) throws Exception {
        int j, res;
        random.setSeed(seed);
        res = 0;
        for (j = 0; j < i; j++) {
            String k = Long.toString(random.nextLong());
            byte[] data = new byte[5000 + random.nextInt(10000)];
            random.nextBytes(data);
            byte[] val = ix.load(null, k.getBytes());
            if (Arrays.equals(data, val)) {
                res++;
            } else {
                System.err.println("FETCH differs: " + j);
                System.err.println(data.length + " " + val.length);
            }
        }
        return res;
    }

    public int fetchRandom(int i) throws Exception {
        for (int j = 0; j < i; j++) {
            String k = keys.get(random.nextInt(keys.size()));
            ix.load(null, k.getBytes());
        }
        return i;
    }
}

Related

How to optimize this simulation in Java?

I am trying to do a multi-threading simulation in Java and I have managed to do it with a queue, but the execution time is high. Any ideas on how I could optimize this? Can using recursion save time?
The input has to be like this:
2 5          (two threads/workers for 5 jobs)
1 2 3 4 5    (the jobs: each integer is the time cost of processing that job)
The output will then be:
0 0    (the two threads try to take jobs from the list simultaneously; thread 0 takes the first job and starts working on it at moment 0)
1 0    (thread 1 takes the second job and also starts at moment 0)
0 1    (after 1 second, thread 0 is done with the first job, takes the third job from the list, and starts processing it immediately at time 1)
1 2    (one second later, thread 1 is done with the second job, takes the fourth job, and starts processing it immediately at time 2)
0 4    (finally, after 2 more seconds, thread 0 is done with the third job, takes the fifth job, and starts processing it immediately at time 4)
This is the code:
import java.io.*;
import java.util.HashMap;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.StringTokenizer;

public class JobQueue {
    private int numWorkers;
    private int[] jobs;
    private int[] assignedWorker;
    private long[] startTime;
    private FastScanner in;
    private PrintWriter out;

    public static void main(String[] args) throws IOException {
        new JobQueue().solve();
    }

    private void readData() throws IOException {
        numWorkers = in.nextInt();
        int m = in.nextInt();
        jobs = new int[m];
        for (int i = 0; i < m; ++i) {
            jobs[i] = in.nextInt();
        }
    }

    private void writeResponse() {
        for (int i = 0; i < jobs.length; ++i) {
            out.println(assignedWorker[i] + " " + startTime[i]);
        }
    }

    private void assignJobs() {
        // TODO: replace this code with a faster algorithm.
        assignedWorker = new int[jobs.length];
        startTime = new long[jobs.length];
        PriorityQueue<Integer> nextTimesQueue = new PriorityQueue<Integer>();
        HashMap<Integer, Set<Integer>> workersReadyAtTimeT = new HashMap<Integer, Set<Integer>>();
        long[] nextFreeTime = new long[numWorkers];
        int duration = 0;
        int bestWorker = 0;
        for (int i = 0; i < jobs.length; i++) {
            duration = jobs[i];
            if (i < numWorkers) {
                bestWorker = i;
                nextTimesQueue.add(duration);
                addToSet(workersReadyAtTimeT, duration, i, 0);
            } else {
                int currentTime = nextTimesQueue.poll();
                Set<Integer> workersReady = workersReadyAtTimeT.get(currentTime);
                if (workersReady.size() > 1) {
                    bestWorker = workersReady.iterator().next();
                    workersReady.remove(bestWorker);
                    workersReadyAtTimeT.remove(currentTime);
                    workersReadyAtTimeT.put(currentTime, workersReady);
                    nextTimesQueue.add(currentTime);
                } else {
                    bestWorker = workersReady.iterator().next();
                    workersReadyAtTimeT.remove(currentTime);
                    nextTimesQueue.add(currentTime + duration);
                    addToSet(workersReadyAtTimeT, duration, bestWorker, currentTime);
                }
            }
            assignedWorker[i] = bestWorker;
            startTime[i] = nextFreeTime[bestWorker];
            nextFreeTime[bestWorker] += duration;
        }
    }

    private void addToSet(HashMap<Integer, Set<Integer>> workersReadyAtTimeT, int duration, int worker, int current) {
        if (workersReadyAtTimeT.get(current + duration) == null) {
            HashSet<Integer> s = new HashSet<Integer>();
            s.add(worker);
            workersReadyAtTimeT.put(current + duration, s);
        } else {
            Set<Integer> s = workersReadyAtTimeT.get(current + duration);
            s.add(worker);
            workersReadyAtTimeT.put(current + duration, s);
        }
    }

    public void solve() throws IOException {
        in = new FastScanner();
        out = new PrintWriter(new BufferedOutputStream(System.out));
        readData();
        assignJobs();
        writeResponse();
        out.close();
    }

    static class FastScanner {
        private BufferedReader reader;
        private StringTokenizer tokenizer;

        public FastScanner() {
            reader = new BufferedReader(new InputStreamReader(System.in));
            tokenizer = null;
        }

        public String next() throws IOException {
            while (tokenizer == null || !tokenizer.hasMoreTokens()) {
                tokenizer = new StringTokenizer(reader.readLine());
            }
            return tokenizer.nextToken();
        }

        public int nextInt() throws IOException {
            return Integer.parseInt(next());
        }
    }
}
It seems to me that your jobsList object is completely redundant: everything it contains is also in the jobs array, and when you take the front element you get the item at jobs[i]. To speed things up a little you could take the constructors of the ints out of the loop and just assign new values to them. Another optimization would be to not search during the first numWorkers jobs, because you know you still have idle workers until you have exhausted your pool. Once you have found one good worker you don't have to keep looking, so you can continue out of your for-loop.
public class JobQueue {
    private int numWorkers;
    private int[] jobs;
    private int[] assignedWorker;
    private long[] startTime;

    private void readData() throws IOException {
        numWorkers = in.nextInt();
        int m = in.nextInt();
        jobs = new int[m];
        for (int i = 0; i < m; ++i) {
            jobs[i] = in.nextInt();
        }
    }

    private void assignJobs() {
        assignedWorker = new int[jobs.length];
        startTime = new long[jobs.length];
        long[] nextFreeTime = new long[numWorkers];
        int duration = 0;
        int bestWorker = 0;
        for (int i = 0; i < jobs.length; i++) {
            duration = jobs[i];
            bestWorker = 0;
            if (i < numWorkers) {
                bestWorker = i;
            } else {
                for (int j = 0; j < numWorkers; ++j) {
                    if (nextFreeTime[j] < nextFreeTime[bestWorker])
                        bestWorker = j;
                    continue;
                }
            }
            assignedWorker[i] = bestWorker;
            startTime[i] = nextFreeTime[bestWorker];
            nextFreeTime[bestWorker] += duration;
        }
    }
However, both your solution and this slightly trimmed-down one take 2 milliseconds to run. I also looked at having a HashMap to maintain a NextWorker marker, but at some point you catch up with it and end up looking every time for the next one, so you don't win much.
You could try having an ordered List/Queue, but then you have expensive inserts instead of expensive searches, and you have to keep track of the time slice. A version like that could look like this:
private void assignJobs() {
    assignedWorker = new int[jobs.length];
    startTime = new long[jobs.length];
    PriorityQueue<Integer> nextTimesQueue = new PriorityQueue<Integer>();
    HashMap<Integer, Set<Integer>> workersReadyAtTimeT = new HashMap<Integer, Set<Integer>>();
    long[] nextFreeTime = new long[numWorkers];
    int duration = 0;
    int bestWorker = 0;
    for (int i = 0; i < jobs.length; i++) {
        duration = jobs[i];
        if (i < numWorkers) {
            bestWorker = i;
            nextTimesQueue.add(duration);
            addToSet(workersReadyAtTimeT, duration, i, 0);
        } else {
            int currentTime = nextTimesQueue.poll();
            Set<Integer> workersReady = workersReadyAtTimeT.get(currentTime);
            if (workersReady.size() > 1) {
                bestWorker = workersReady.iterator().next();
                workersReady.remove(bestWorker);
                workersReadyAtTimeT.remove(currentTime);
                workersReadyAtTimeT.put(currentTime, workersReady);
                nextTimesQueue.add(currentTime);
            } else {
                bestWorker = workersReady.iterator().next();
                workersReadyAtTimeT.remove(currentTime);
                nextTimesQueue.add(currentTime + duration);
                addToSet(workersReadyAtTimeT, duration, bestWorker, currentTime);
            }
        }
        assignedWorker[i] = bestWorker;
        startTime[i] = nextFreeTime[bestWorker];
        nextFreeTime[bestWorker] += duration;
    }
}

private void addToSet(HashMap<Integer, Set<Integer>> workersReadyAtTimeT, int duration, int worker, int current) {
    if (workersReadyAtTimeT.get(current + duration) == null) {
        HashSet<Integer> s = new HashSet<Integer>();
        s.add(worker);
        workersReadyAtTimeT.put(current + duration, s);
    } else {
        Set<Integer> s = workersReadyAtTimeT.get(current + duration);
        s.add(worker);
        workersReadyAtTimeT.put(current + duration, s);
    }
}

GC overhead limit exceeded while training OpenNLP's NameFinderME

I want to get probability scores for the names extracted with NameFinderME, but the provided model gives very bad probabilities via the probs function.
For example, "Scott F. Fitzgerald" gets a score around 0.5 (averaging log probabilities and taking an exponent), while "North Japan" and "Executive Vice President, Corporate Relations and Chief Philanthropy Officer" both get a score higher than 0.9...
I have more than 2 million first names and another 2 million last names (with their frequency counts), and I want to synthetically create a huge dataset from the outer multiplication of first names X middle names (using the first-name pool) X last names.
The problem is, I don't even get to go over all the last names once (even when discarding frequency counts and using each name only once) before I get a GC overhead limit exceeded exception...
I'm implementing an ObjectStream<NameSample> and passing it to the train function:
public class OpenNLPNameStream implements ObjectStream<NameSample> {
    private List<Map<String, Object>> firstNames = null;
    private List<Map<String, Object>> lastNames = null;
    private int firstNameIdx = 0;
    private int firstNameCountIdx = 0;
    private int middleNameIdx = 0;
    private int middleNameCountIdx = 0;
    private int lastNameIdx = 0;
    private int lastNameCountIdx = 0;
    private int firstNameMaxCount = 0;
    private int middleNameMaxCount = 0;
    private int lastNameMaxCount = 0;
    private int firstNameKBSize = 0;
    private int lastNameKBSize = 0;

    Span span[] = new Span[1];
    String fullName[] = new String[3];
    String partialName[] = new String[2];

    private void increaseFirstNameCountIdx() {
        firstNameCountIdx++;
        if (firstNameCountIdx == firstNameMaxCount) {
            firstNameIdx++;
            if (firstNameIdx == firstNameKBSize)
                return; // no need to update anything - this is the end of the run...
            firstNameMaxCount = getFirstNameMaxCount(firstNameIdx);
            firstNameCountIdx = 0;
        }
    }

    private void increaseMiddleNameCountIdx() {
        middleNameCountIdx++;
        if (middleNameCountIdx == middleNameMaxCount) {
            middleNameIdx++;
            if (middleNameIdx == firstNameKBSize) {
                resetMiddleNameIdx();
                increaseFirstNameCountIdx();
            } else {
                middleNameMaxCount = getMiddleNameMaxCount(middleNameIdx);
                middleNameCountIdx = 0;
            }
        }
    }

    private void increaseLastNameCountIdx() {
        lastNameCountIdx++;
        if (lastNameCountIdx == lastNameMaxCount) {
            lastNameIdx++;
            if (lastNameIdx == lastNameKBSize) {
                resetLastNameIdx();
                increaseMiddleNameCountIdx();
            } else {
                lastNameMaxCount = getLastNameMaxCount(lastNameIdx);
                lastNameCountIdx = 0;
            }
        }
    }

    private void resetLastNameIdx() {
        lastNameIdx = 0;
        lastNameMaxCount = getLastNameMaxCount(0);
        lastNameCountIdx = 0;
    }

    private void resetMiddleNameIdx() {
        middleNameIdx = 0;
        middleNameMaxCount = getMiddleNameMaxCount(0);
        middleNameCountIdx = 0;
    }

    private int getFirstNameMaxCount(int i) {
        return 1; // compromised on using just 1
        //String occurences = (String) firstNames.get(i).get("occurences");
        //return Integer.parseInt(occurences);
    }

    private int getMiddleNameMaxCount(int i) {
        return 3; // compromised on using just 3
        //String occurences = (String) firstNames.get(i).get("occurences");
        //return Integer.parseInt(occurences);
    }

    private int getLastNameMaxCount(int i) {
        return 1;
        //String occurences = (String) lastNames.get(i).get("occurences");
        //return Integer.parseInt(occurences);
    }

    @Override
    public NameSample read() throws IOException {
        if (firstNames == null) {
            firstNames = CSVFileTools.readFileFromInputStream("namep_first_name_idf.csv", new ClassPathResource("namep_first_name_idf.csv").getInputStream());
            firstNameKBSize = firstNames.size();
            firstNameMaxCount = getFirstNameMaxCount(0);
            middleNameMaxCount = getFirstNameMaxCount(0);
        }
        if (lastNames == null) {
            lastNames = CSVFileTools.readFileFromInputStream("namep_last_name_idf.csv", new ClassPathResource("namep_last_name_idf.csv").getInputStream());
            lastNameKBSize = lastNames.size();
            lastNameMaxCount = getLastNameMaxCount(0);
        }
        increaseLastNameCountIdx();
        if (firstNameIdx == firstNameKBSize)
            return null; // we've finished iterating over all permutations!
        String[] sentence;
        if (firstNameCountIdx < firstNameMaxCount / 3) {
            span[0] = new Span(0, 2, "Name");
            sentence = partialName;
            sentence[0] = (String) firstNames.get(firstNameIdx).get("first_name");
            sentence[1] = (String) lastNames.get(lastNameIdx).get("last_name");
        } else {
            span[0] = new Span(0, 3, "name");
            sentence = fullName;
            sentence[0] = (String) firstNames.get(firstNameIdx).get("first_name");
            sentence[2] = (String) lastNames.get(lastNameIdx).get("last_name");
            if (firstNameCountIdx < 2 * firstNameMaxCount / 3) {
                sentence[1] = (String) firstNames.get(middleNameIdx).get("first_name");
            } else {
                sentence[1] = ((String) firstNames.get(middleNameIdx).get("first_name")).substring(0, 1) + ".";
            }
        }
        return new NameSample(sentence, span, true);
    }

    @Override
    public void reset() throws IOException, UnsupportedOperationException {
        firstNameIdx = 0;
        firstNameCountIdx = 0;
        middleNameIdx = 0;
        middleNameCountIdx = 0;
        lastNameIdx = 0;
        lastNameCountIdx = 0;
        firstNameMaxCount = 0;
        middleNameMaxCount = 0;
        lastNameMaxCount = 0;
    }

    @Override
    public void close() throws IOException {
        reset();
        firstNames = null;
        lastNames = null;
    }
}
And this is the training call:
TokenNameFinderModel model = NameFinderME.train("en","person",new OpenNLPNameStream(),TrainingParameters.defaultParams(),new TokenNameFinderFactory());
model.serialize(new FileOutputStream("trainedNames.bin",false));
I get the following error after a few minutes of running:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at opennlp.tools.util.featuregen.WindowFeatureGenerator.createFeatures(WindowFeatureGenerator.java:112)
at opennlp.tools.util.featuregen.AggregatedFeatureGenerator.createFeatures(AggregatedFeatureGenerator.java:79)
at opennlp.tools.util.featuregen.CachedFeatureGenerator.createFeatures(CachedFeatureGenerator.java:69)
at opennlp.tools.namefind.DefaultNameContextGenerator.getContext(DefaultNameContextGenerator.java:118)
at opennlp.tools.namefind.DefaultNameContextGenerator.getContext(DefaultNameContextGenerator.java:37)
at opennlp.tools.namefind.NameFinderEventStream.generateEvents(NameFinderEventStream.java:113)
at opennlp.tools.namefind.NameFinderEventStream.createEvents(NameFinderEventStream.java:137)
at opennlp.tools.namefind.NameFinderEventStream.createEvents(NameFinderEventStream.java:36)
at opennlp.tools.util.AbstractEventStream.read(AbstractEventStream.java:62)
at opennlp.tools.util.AbstractEventStream.read(AbstractEventStream.java:27)
at opennlp.tools.util.AbstractObjectStream.read(AbstractObjectStream.java:32)
at opennlp.tools.ml.model.HashSumEventStream.read(HashSumEventStream.java:46)
at opennlp.tools.ml.model.HashSumEventStream.read(HashSumEventStream.java:29)
at opennlp.tools.ml.model.TwoPassDataIndexer.computeEventCounts(TwoPassDataIndexer.java:130)
at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:83)
at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:337)
Edit: After increasing the memory of the JVM to 8GB, I still don't get past the first 2 million last names, but now the exception is:
java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:703)
at java.util.HashMap.putVal(HashMap.java:662)
at java.util.HashMap.put(HashMap.java:611)
at opennlp.tools.ml.model.AbstractDataIndexer.update(AbstractDataIndexer.java:141)
at opennlp.tools.ml.model.TwoPassDataIndexer.computeEventCounts(TwoPassDataIndexer.java:134)
at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:83)
at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:337)
It seems the problem stems from the fact that I'm creating a new NameSample along with new Spans and Strings on every read call... But I can't reuse Spans or NameSamples, since they're immutable.
Should I just write my own language model? Is there a better Java library for doing this sort of thing (I'm only interested in getting the probability that the extracted text is actually a name)? Are there parameters I should tweak for the model I'm training?
Any advice would be appreciated.

Round Robin Scheduler

I need to implement a "round-robin" scheduler with a job class that I cannot modify. The round-robin scheduler should process the job that has been waiting the longest first, then reset its timer to zero. If two jobs have the same wait time, the lower id is processed first. The job class only gives three values (job id, remaining duration, and priority, which is not needed for this). Each job has a start time, so only a couple of jobs may be available during the first cycle, a few more the next cycle, etc. Since the "job array" I am calling is different every time I call it, I'm not sure how to store the wait times.
This is the job class:
import java.io.*;

public class Jobs {
    private int[] stas = new int[0];
    private int[] durs = new int[0];
    private int[] lefs = new int[0];
    private int[] pris = new int[0];
    private int[] fins = new int[0];
    private int clock;

    public Jobs() {
        this("joblist.csv");
    }

    public Jobs(String filename) {
        BufferedReader fp = null;
        String line = "";
        String[] b = null;
        try {
            fp = new BufferedReader(new FileReader(filename));
            while ((line = fp.readLine()) != null) {
                b = line.split(",");
                if (b.length == 3) {
                    try {
                        int sta = Integer.parseInt(b[0]);
                        int dur = Integer.parseInt(b[1]);
                        int pri = Integer.parseInt(b[2]);
                        stas = app(stas, sta);
                        durs = app(durs, dur);
                        lefs = app(lefs, dur);
                        pris = app(pris, pri);
                        fins = app(fins, -1);
                    }
                    catch (NumberFormatException e) {}
                }
            }
            fp.close();
        }
        catch (FileNotFoundException e) { e.printStackTrace(); }
        catch (IOException e) { e.printStackTrace(); }
        clock = 0;
    }

    public boolean done() {
        boolean done = true;
        for (int i = 0; done && i < lefs.length; i++)
            if (lefs[i] > 0) done = false;
        return done;
    }

    public int getClock() { return clock; }

    // Returns one row per job that has started and still has work left:
    // { job id, duration left, priority }.
    public int[][] getJobs() {
        int count = 0;
        for (int i = 0; i < stas.length; i++)
            if (stas[i] <= clock && lefs[i] > 0)
                count++;
        int[][] jobs = new int[count][3];
        count = 0;
        for (int i = 0; i < stas.length; i++)
            if (stas[i] <= clock && lefs[i] > 0) {
                jobs[count] = new int[]{ i, lefs[i], pris[i] };
                count++;
            }
        return jobs;
    }

    public int cycle() { return cycle(-1); }

    // Runs job j for one tick (if it is runnable) and advances the clock.
    public int cycle(int j) {
        if (j >= 0 && j < lefs.length && clock >= stas[j] && lefs[j] > 0) {
            lefs[j]--;
            if (lefs[j] == 0) fins[j] = clock + 1;
        }
        clock++;
        return clock;
    }

    private int[] app(int[] a, int b) {
        int[] tmp = new int[a.length + 1];
        for (int i = 0; i < a.length; i++) tmp[i] = a[i];
        tmp[a.length] = b;
        return tmp;
    }

    public String report() {
        String r = "JOB,PRIORITY,START,DURATION,FINISH,DELAY,PRI*DELAY\n";
        float dn = 0;
        float pdn = 0;
        for (int i = 0; i < stas.length; i++) {
            if (fins[i] >= 0) {
                int delay = ((fins[i] - stas[i]) - durs[i]);
                r += "" + i + "," + pris[i] + "," + stas[i] + "," + durs[i] + "," + fins[i] + "," + delay + "," + (pris[i] * delay) + "\n";
                dn += delay;
                pdn += pris[i] * delay;
            }
            else {
                int delay = ((clock * 10 - stas[i]) - durs[i]);
                r += "" + i + "," + pris[i] + "," + stas[i] + "," + durs[i] + "," + fins[i] + "," + delay + "," + (pris[i] * delay) + "\n";
                dn += delay;
                pdn += pris[i] * delay;
            }
        }
        if (stas.length > 0) {
            r += "Avg,,,,," + (dn / stas.length) + "," + pdn / stas.length + "\n";
        }
        return r;
    }

    public String toString() {
        String r = "There are " + stas.length + " jobs:\n";
        for (int i = 0; i < stas.length; i++) {
            r += "  JOB " + i + ": START=" + stas[i] + " DURATION=" + durs[i] + " DURATION_LEFT=" + lefs[i] + " PRIORITY=" + pris[i] + "\n";
        }
        return r;
    }
}
I don't need full code, just an idea of how to store wait times and cycle the correct job.
While an array-based solution may work, I would advocate a more object-oriented approach. Create a 'Job' class with the desired attributes (id, start time, wait time, etc.). Using the csv file, create Job objects and hold them in a list. Write a comparator to sort this jobs-list (in this case job wait/age would be the sort key); a sketch of that idea follows the pseudocode below.
The job executor then has to do the following:
while (jobs exist) {
    iterate on the list {
        if job is executable    // start_time <= current sys_time
            consume cycles/job for executable jobs
        mark completed jobs (optional)
    }
    remove the completed jobs
}
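A minimal sketch of that comparator-style selection, assuming the wait times are kept in a map keyed by job id (the ids are stable even though the array returned by Jobs#getJobs() changes every cycle). The class and field names here are illustrative, not part of the assignment:

import java.util.*;

class RoundRobinScheduler {
    // wait time per job id, persisted across cycles
    private final Map<Integer, Integer> waitById = new HashMap<Integer, Integer>();

    // rows are the {id, durationLeft, priority} rows from Jobs#getJobs();
    // returns the id to pass to Jobs#cycle(int), or -1 if nothing is runnable
    int pickNext(int[][] rows) {
        int best = -1, bestWait = -1;
        for (int[] row : rows) {
            int id = row[0];
            int wait = waitById.containsKey(id) ? waitById.get(id) : 0;
            // longest wait wins; ties go to the lower id
            if (wait > bestWait || (wait == bestWait && id < best)) {
                best = id;
                bestWait = wait;
            }
        }
        // every runnable job waits one more cycle...
        for (int[] row : rows) {
            int id = row[0];
            waitById.put(id, (waitById.containsKey(id) ? waitById.get(id) : 0) + 1);
        }
        // ...except the chosen one, whose timer is reset to zero
        if (best >= 0) {
            waitById.put(best, 0);
        }
        return best;
    }
}

Each tick of the simulation is then simply: jobsource.cycle(scheduler.pickNext(jobsource.getJobs()));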
// This loop adds +1 to the wait time of each currently available job
for (int i = 0; i < jobs.length; i++) {
    waitTime[jobs[i][0]] += 1;
}

int longestWait = 0; // greatest wait time seen so far
int nextJob = 0;     // index of the job with the greatest wait time

// Find the job with the greatest wait time
for (int i = 0; i < waitTime.length; i++) {
    if (waitTime[i] > longestWait) {
        longestWait = waitTime[i];
        nextJob = i;
    }
}

// Cycle the job with the highest wait time
jobsource.cycle(nextJob);
// Reset the wait time for the processed job
waitTime[nextJob] = 0;

CombSort implementation in Java

I am using comb sort to sort a given array of Strings. The code is:
public static int combSort(String[] input_array) {
    int gap = input_array.length;
    double shrink = 1.3;
    int numbOfComparisons = 0;
    boolean swapped = true;
    //while(!swapped && gap>1){
    System.out.println();
    while (!(swapped && gap == 1)) {
        gap = (int) (gap / shrink);
        if (gap < 1) {
            gap = 1;
        }
        int i = 0;
        swapped = false;
        String temp = "";
        while ((i + gap) < input_array.length) {
            numbOfComparisons++;
            if (Compare(input_array[i], input_array[i + gap]) == 1) {
                temp = input_array[i];
                input_array[i] = input_array[i + gap];
                input_array[i + gap] = temp;
                swapped = true;
                System.out.println("gap: " + gap + " i: " + i);
                ArrayUtilities.printArray(input_array);
            }
            i++;
        }
    }
    ArrayUtilities.printArray(input_array);
    return numbOfComparisons;
}
The problem is that while it sorts many arrays, it gets stuck in an infinite loop for some arrays, particularly small ones. Compare(input_array[i], input_array[i+gap]) is a small method that returns 1 if s1 > s2 and -1 if s1 < s2.
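For reference, a Compare with those semantics might look like this (a reconstruction from the description above; the original method was not posted, and I'm assuming it returns 0 for equal strings):

private static int Compare(String s1, String s2) {
    // 1 if s1 sorts after s2, -1 if before, 0 if equal
    return Integer.signum(s1.compareTo(s2));
}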
Try this version. The String array is changed to an integer array (I guess you can change it back to the String version). The constant 1.3 is replaced with 1.247330950103979.
public class CombSort
{
    private static final int PROBLEM_SIZE = 5;
    static int[] in = new int[PROBLEM_SIZE];

    public static void printArr()
    {
        for (int i = 0; i < in.length; i++)
        {
            System.out.print(in[i] + "\t");
        }
        System.out.println();
    }

    public static void combSort()
    {
        int swap, i, gap = PROBLEM_SIZE;
        boolean swapped = false;
        printArr();
        while ((gap > 1) || swapped)
        {
            if (gap > 1)
            {
                gap = (int) (gap / 1.247330950103979);
            }
            swapped = false;
            for (i = 0; gap + i < PROBLEM_SIZE; ++i)
            {
                if (in[i] - in[i + gap] > 0)
                {
                    swap = in[i];
                    in[i] = in[i + gap];
                    in[i + gap] = swap;
                    swapped = true;
                }
            }
        }
        printArr();
    }

    public static void main(String[] args)
    {
        for (int i = 0; i < in.length; i++)
        {
            in[i] = (int) (Math.random() * PROBLEM_SIZE);
        }
        combSort();
    }
}
Please find below an implementation of comb sort in Java.
public static void combSort(int[] elements) {
    float shrinkFactor = 1.3f;
    int gap = elements.length;
    boolean swapped;
    do {
        if (gap > 1) {
            gap = (int) (gap / shrinkFactor);
        }
        swapped = false;
        for (int i = 0, cursor = gap; cursor < elements.length; i++, cursor++) {
            if (elements[i] > elements[cursor]) {
                int temp = elements[cursor];
                elements[cursor] = elements[i];
                elements[i] = temp;
                swapped = true; // keep making gap-1 passes until no swaps remain
            }
        }
    } while (gap > 1 || swapped);
}
Please review and let me know your feedback.
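A quick, illustrative check of the method above:

int[] a = { 5, 1, 4, 2, 8 };
combSort(a);
System.out.println(java.util.Arrays.toString(a)); // prints [1, 2, 4, 5, 8]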

Fork and Join Java

Hey guys, I need some help with my homework. I understand the way the Fork/Join framework works, but my code does not join the results. Our exercise is to write a program that counts the true values in an array. Sorry for any mistakes (bad grammar or something else) in this post, it is my first one.
Edit: Thanks for all the requests; here is my solution to this problem:
TrueFinder Class:
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class TrueFinder extends RecursiveTask<TrueResult>
{
    private static final int SEQUENTIAL_THRESHOLD = 5;
    private boolean[] trueData;
    private final int start;
    private final int end;

    public TrueFinder(boolean[] data, int start, int end)
    {
        this.trueData = data;
        this.start = start;
        this.end = end;
    }

    public TrueFinder(boolean[] data)
    {
        this(data, 0, data.length);
    }

    protected TrueResult compute()
    {
        final int length = end - start;
        int counter = 0;
        if (length < SEQUENTIAL_THRESHOLD)
        {
            // small enough: count sequentially
            for (int i = start; i < end; i++)
            {
                if (trueData[i])
                {
                    counter++;
                }
            }
            return new TrueResult(counter);
        }
        else
        {
            // fork the left half, compute the right half in this thread, then join
            final int split = length / 2;
            TrueFinder left = new TrueFinder(trueData, start, start + split);
            left.fork();
            TrueFinder right = new TrueFinder(trueData, start + split, end);
            TrueResult subResultRight = right.compute();
            TrueResult subResultLeft = left.join();
            return new TrueResult(subResultRight.getTrueCounter() +
                    subResultLeft.getTrueCounter());
        }
    }

    public static void main(String[] args)
    {
        int trues = 0;
        boolean[] trueArray = new boolean[500];
        for (int i = 0; i < 500; i++)
        {
            if (Math.random() < 0.3)
            {
                trueArray[i] = true;
                trues++;
            }
            else
            {
                trueArray[i] = false;
            }
        }
        TrueFinder finder = new TrueFinder(trueArray);
        ForkJoinPool pool = new ForkJoinPool(4);
        long startTime = System.currentTimeMillis();
        TrueResult result = pool.invoke(finder);
        long endTime = System.currentTimeMillis();
        long actualTime = endTime - startTime;
        System.out.println("Searched an array of length " + trueArray.length + " in " +
                actualTime + " msec and found " + result.getTrueCounter() +
                " of " + trues + " true values.");
    }
}
And the result class:
public class TrueResult
{
    private int trueCounter;

    public TrueResult(int counter)
    {
        this.trueCounter = counter;
    }

    public int getTrueCounter()
    {
        return trueCounter;
    }
}
The splitting in your source code is wrong for three reasons:
(1) Your splitting doesn't start from 0: your start is 1.
(2) The fractional remainder is ignored by your splitting (granted that SEQUENTIAL_THRESHOLD = 5 and trueArray.length = 13, your splitting ignores elements 11 to 12).
(3) If you fix (1) and (2), the number of subtasks must be split, not SEQUENTIAL_THRESHOLD.
The modified source code is below:
else
{
    // one subtask per SEQUENTIAL_THRESHOLD-sized chunk, rounding up
    int split = (length - 1) / SEQUENTIAL_THRESHOLD + 1;
    TrueFinder[] subtasks = new TrueFinder[split];
    int start = 0;
    for (int i = 0; i < split - 1; i++)
    {
        subtasks[i] = new TrueFinder(trueData, start, start + SEQUENTIAL_THRESHOLD);
        subtasks[i].fork();
        start += SEQUENTIAL_THRESHOLD;
    }
    subtasks[split - 1] = new TrueFinder(trueData, start, length);
    // better to invoke compute on the last subtask than to fork and join it
    int counter = subtasks[split - 1].compute().getTrueCounter();
    for (int i = 0; i < split - 1; i++)
    {
        counter += subtasks[i].join().getTrueCounter();
    }
    return new TrueResult(counter);
}
