GC overhead limit exceeded while training OpenNLP's NameFinderME - java

I want to get a probability score for the extracted names using NameFinderME, but the provided model gives very bad probabilities via the probs function.
For example, "Scott F. Fitzgerald" gets a score around 0.5 (averaging log probabilities, and taking an exponent), while "North Japan" and "Executive Vice President, Corporate Relations and Chief Philanthropy Officer" both get a score higher than 0.9...
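For reference, here is roughly how I compute that score (a minimal sketch of the scoring described above; it assumes a loaded TokenNameFinderModel and an already tokenized sentence):
NameFinderME finder = new NameFinderME(model);
Span[] spans = finder.find(tokens);      // detected name spans
double[] probs = finder.probs(spans);    // probability per span
double logSum = 0;
for (double p : probs) {
    logSum += Math.log(p);
}
double score = Math.exp(logSum / probs.length); // geometric mean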
I have more than 2 million first names and another 2 million last names (with their frequency counts), and I want to synthetically create a huge dataset from the cross product of first names X middle names (drawn from the first-name pool) X last names.
The problem is, I don't even get to go over all the last names once (even when discarding the frequency counts and using each name just once) before I get a GC overhead limit exceeded exception...
I'm implementing an ObjectStream and passing it to the train function:
public class OpenNLPNameStream implements ObjectStream<NameSample> {
private List<Map<String, Object>> firstNames = null;
private List<Map<String, Object>> lastNames = null;
private int firstNameIdx = 0;
private int firstNameCountIdx = 0;
private int middleNameIdx = 0;
private int middleNameCountIdx = 0;
private int lastNameIdx = 0;
private int lastNameCountIdx = 0;
private int firstNameMaxCount = 0;
private int middleNameMaxCount = 0;
private int lastNameMaxCount = 0;
private int firstNameKBSize = 0;
private int lastNameKBSize = 0;
Span span[] = new Span[1];
String fullName[] = new String[3];
String partialName[] = new String[2];
private void increaseFirstNameCountIdx()
{
firstNameCountIdx++;
if (firstNameCountIdx == firstNameMaxCount) {
firstNameIdx++;
if (firstNameIdx == firstNameKBSize)
return; //no need to update anything - this is the end of the run...
firstNameMaxCount = getFirstNameMaxCount(firstNameIdx);
firstNameCountIdx = 0;
}
}
private void increaseMiddleNameCountIdx()
{
middleNameCountIdx++;
if (middleNameCountIdx == middleNameMaxCount) {
middleNameIdx++;
if (middleNameIdx == firstNameKBSize) { //middle names are drawn from the first-name pool
resetMiddleNameIdx();
increaseFirstNameCountIdx();
} else {
middleNameMaxCount = getMiddleNameMaxCount(middleNameIdx);
middleNameCountIdx = 0;
}
}
}
private void increaseLastNameCountIdx()
{
lastNameCountIdx++;
if (lastNameCountIdx == lastNameMaxCount) {
lastNameIdx++;
if (lastNameIdx == lastNameKBSize) {
resetLastNameIdx();
increaseMiddleNameCountIdx();
}
else {
lastNameMaxCount = getLastNameMaxCount(lastNameIdx);
lastNameCountIdx = 0;
}
}
}
private void resetLastNameIdx()
{
lastNameIdx = 0;
lastNameMaxCount = getLastNameMaxCount(0);
lastNameCountIdx = 0;
}
private void resetMiddleNameIdx()
{
middleNameIdx = 0;
middleNameMaxCount = getMiddleNameMaxCount(0);
middleNameCountIdx = 0;
}
private int getFirstNameMaxCount(int i)
{
return 1; //compromised on using just 1 instead of the occurrence count
//String occurences = (String) firstNames.get(i).get("occurences");
//return Integer.parseInt(occurences);
}
private int getMiddleNameMaxCount(int i)
{
return 3; //compromised on using just 3 instead of the occurrence count
//String occurences = (String) firstNames.get(i).get("occurences");
//return Integer.parseInt(occurences);
}
private int getLastNameMaxCount(int i)
{
return 1;
//String occurences = (String) lastNames.get(i).get("occurences");
//return Integer.parseInt(occurences);
}
@Override
public NameSample read() throws IOException {
if (firstNames == null) {
firstNames = CSVFileTools.readFileFromInputStream("namep_first_name_idf.csv", new ClassPathResource("namep_first_name_idf.csv").getInputStream());
firstNameKBSize = firstNames.size();
firstNameMaxCount = getFirstNameMaxCount(0);
middleNameMaxCount = getMiddleNameMaxCount(0);
}
if (lastNames == null) {
lastNames = CSVFileTools.readFileFromInputStream("namep_last_name_idf.csv",new ClassPathResource("namep_last_name_idf.csv").getInputStream());
lastNameKBSize = lastNames.size();
lastNameMaxCount = getLastNameMaxCount(0);
}
increaseLastNameCountIdx();
if (firstNameIdx == firstNameKBSize)
return null; //we've finished iterating over all permutations!
String [] sentence;
if (firstNameCountIdx < firstNameMaxCount / 3)
{
span[0] = new Span(0,2,"Name");
sentence = partialName;
sentence[0] = (String)firstNames.get(firstNameIdx).get("first_name");
sentence[1] = (String)lastNames.get(lastNameIdx).get("last_name");
}
else
{
span[0] = new Span(0,3,"Name");
sentence = fullName;
sentence[0] = (String)firstNames.get(firstNameIdx).get("first_name");
sentence[2] = (String)lastNames.get(lastNameIdx).get("last_name");
if (firstNameCountIdx < 2*firstNameMaxCount/3) {
sentence[1] = (String)firstNames.get(middleNameIdx).get("first_name");
}
else {
sentence[1] = ((String)firstNames.get(middleNameIdx).get("first_name")).substring(0,1) + ".";
}
}
return new NameSample(sentence,span,true);
}
@Override
public void reset() throws IOException, UnsupportedOperationException {
firstNameIdx = 0;
firstNameCountIdx = 0;
middleNameIdx = 0;
middleNameCountIdx = 0;
lastNameIdx = 0;
lastNameCountIdx = 0;
firstNameMaxCount = 0;
middleNameMaxCount = 0;
lastNameMaxCount = 0;
}
@Override
public void close() throws IOException {
reset();
firstNames = null;
lastNames = null;
}
}
And
TokenNameFinderModel model = NameFinderME.train("en","person",new OpenNLPNameStream(),TrainingParameters.defaultParams(),new TokenNameFinderFactory());
model.serialize(new FileOutputStream("trainedNames.bin",false));
I get the following error after a few minutes of running:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at opennlp.tools.util.featuregen.WindowFeatureGenerator.createFeatures(WindowFeatureGenerator.java:112)
at opennlp.tools.util.featuregen.AggregatedFeatureGenerator.createFeatures(AggregatedFeatureGenerator.java:79)
at opennlp.tools.util.featuregen.CachedFeatureGenerator.createFeatures(CachedFeatureGenerator.java:69)
at opennlp.tools.namefind.DefaultNameContextGenerator.getContext(DefaultNameContextGenerator.java:118)
at opennlp.tools.namefind.DefaultNameContextGenerator.getContext(DefaultNameContextGenerator.java:37)
at opennlp.tools.namefind.NameFinderEventStream.generateEvents(NameFinderEventStream.java:113)
at opennlp.tools.namefind.NameFinderEventStream.createEvents(NameFinderEventStream.java:137)
at opennlp.tools.namefind.NameFinderEventStream.createEvents(NameFinderEventStream.java:36)
at opennlp.tools.util.AbstractEventStream.read(AbstractEventStream.java:62)
at opennlp.tools.util.AbstractEventStream.read(AbstractEventStream.java:27)
at opennlp.tools.util.AbstractObjectStream.read(AbstractObjectStream.java:32)
at opennlp.tools.ml.model.HashSumEventStream.read(HashSumEventStream.java:46)
at opennlp.tools.ml.model.HashSumEventStream.read(HashSumEventStream.java:29)
at opennlp.tools.ml.model.TwoPassDataIndexer.computeEventCounts(TwoPassDataIndexer.java:130)
at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:83)
at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:337)
Edit: After increasing the memory of the JVM to 8GB, I still don't get past the first 2 million last names, but now the Exception is:
java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:703)
at java.util.HashMap.putVal(HashMap.java:662)
at java.util.HashMap.put(HashMap.java:611)
at opennlp.tools.ml.model.AbstractDataIndexer.update(AbstractDataIndexer.java:141)
at opennlp.tools.ml.model.TwoPassDataIndexer.computeEventCounts(TwoPassDataIndexer.java:134)
at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:83)
at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:337)
It seems the problem stems from the fact that I'm creating a new NameSample along with new Spans and Strings at every read call... But I can't reuse Spans or NameSamples, since they're immutable.
Should I just write my own language model? Is there a better Java library for doing this sort of thing (I'm only interested in the probability that the extracted text is actually a name)? Are there parameters I should tweak for the model I'm training?
Any advice would be appreciated.
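For example, the only knobs I currently know how to set are the cutoff and iteration count via the standard TrainingParameters keys (a sketch; the values are guesses, and whether they help with memory is exactly what I'm unsure about):
TrainingParameters params = TrainingParameters.defaultParams();
params.put(TrainingParameters.CUTOFF_PARAM, "10");      // drop rare features
params.put(TrainingParameters.ITERATIONS_PARAM, "100"); // fewer training iterations
TokenNameFinderModel model = NameFinderME.train("en", "person",
        new OpenNLPNameStream(), params, new TokenNameFinderFactory());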

Related

How to optimize this simulation in Java?

I am trying to do a multithreading simulation in Java. I have managed to do it with a queue, but the execution time is high. Any ideas on how I could optimize this? Can using recursion save time?
The input has to be like this:
2 5 (two threads (workers) for 5 jobs)
1 2 3 4 5 (the jobs: each integer is the time cost of processing that job)
The output will then be this:
0 0 (the two threads try to simultaneously take jobs from the list, so thread with index 0 takes the first job and starts working on it at moment 0)
1 0 (thread 1 takes the second job and starts working on it at moment 0)
0 1 (after 1 second, thread 0 is done with the first job, takes the third job from the list, and starts processing it immediately at time 1)
1 2 (one second later, thread 1 is done with the second job, takes the fourth job from the list, and starts processing it immediately at time 2)
0 4 (finally, after 2 more seconds, thread 0 is done with the third job, takes the fifth job from the list, and starts processing it immediately at time 4)
This is the code:
import java.io.*;
import java.util.HashMap;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.StringTokenizer;
public class JobQueue {
private int numWorkers;
private int[] jobs;
private int[] assignedWorker;
private long[] startTime;
private FastScanner in;
private PrintWriter out;
public static void main(String[] args) throws IOException {
new JobQueue().solve();
}
private void readData() throws IOException {
numWorkers = in.nextInt();
int m = in.nextInt();
jobs = new int[m];
for (int i = 0; i < m; ++i) {
jobs[i] = in.nextInt();
}
}
private void writeResponse() {
for (int i = 0; i < jobs.length; ++i) {
out.println(assignedWorker[i] + " " + startTime[i]);
}
}
private void assignJobs() {
// TODO: replace this code with a faster algorithm.
assignedWorker = new int[jobs.length];
startTime = new long[jobs.length];
PriorityQueue<Integer> nextTimesQueue = new PriorityQueue<Integer>();
HashMap<Integer, Set<Integer>> workersReadyAtTimeT = new HashMap<Integer,Set<Integer>>();
long[] nextFreeTime = new long[numWorkers];
int duration = 0;
int bestWorker = 0;
for (int i = 0; i < jobs.length; i++) {
duration = jobs[i];
if(i<numWorkers) {
bestWorker = i;
nextTimesQueue.add(duration);
addToSet(workersReadyAtTimeT, duration, i,0);
}else {
int currentTime = nextTimesQueue.poll();
Set<Integer> workersReady = workersReadyAtTimeT.get(currentTime);
if (workersReady.size()>1) {
bestWorker = workersReady.iterator().next();
workersReady.remove(bestWorker);
workersReadyAtTimeT.remove(currentTime);
workersReadyAtTimeT.put(currentTime,workersReady);
nextTimesQueue.add(currentTime);
} else {
bestWorker = workersReady.iterator().next();
workersReadyAtTimeT.remove(currentTime);
nextTimesQueue.add(currentTime+duration);
addToSet(workersReadyAtTimeT, duration, bestWorker, currentTime);
}
}
assignedWorker[i] = bestWorker;
startTime[i] = nextFreeTime[bestWorker];
nextFreeTime[bestWorker] += duration;
}
}
private void addToSet(HashMap<Integer, Set<Integer>> workersReadyAtTimeT, int duration, int worker, int current) {
if(workersReadyAtTimeT.get(current+duration)==null) {
HashSet<Integer> s = new HashSet<Integer>();
s.add(worker);
workersReadyAtTimeT.put(current+duration, s);
}else {
Set<Integer> s = workersReadyAtTimeT.get(current+duration);
s.add(worker);
workersReadyAtTimeT.put(current+duration,s);
}
}
public void solve() throws IOException {
in = new FastScanner();
out = new PrintWriter(new BufferedOutputStream(System.out));
readData();
assignJobs();
writeResponse();
out.close();
}
static class FastScanner {
private BufferedReader reader;
private StringTokenizer tokenizer;
public FastScanner() {
reader = new BufferedReader(new InputStreamReader(System.in));
tokenizer = null;
}
public String next() throws IOException {
while (tokenizer == null || !tokenizer.hasMoreTokens()) {
tokenizer = new StringTokenizer(reader.readLine());
}
return tokenizer.nextToken();
}
public int nextInt() throws IOException {
return Integer.parseInt(next());
}
}
}
It seems to me that your jobsList object is completely redundant: everything it contains is also in the jobs array, and when you take the front element you get the item at jobs[i]. To speed things up a little you could take the constructors of the ints out of the loop and just assign new values to them. Another optimization would be to not search during the first numWorkers jobs, because you know you still have idle workers until you have exhausted your pool. And once you have found one good worker you don't have to keep looking, so you can continue out of your for-loop.
public class JobQueue {
private int numWorkers;
private int[] jobs;
private int[] assignedWorker;
private long[] startTime;
private void readData() throws IOException {
numWorkers = in.nextInt();
int m = in.nextInt();
jobs = new int[m];
for (int i = 0; i < m; ++i) {
jobs[i] = in.nextInt();
}
}
private void assignJobs() {
assignedWorker = new int[jobs.length];
startTime = new long[jobs.length];
long[] nextFreeTime = new long[numWorkers];
int duration = 0;
int bestWorker = 0;
for (int i = 0; i < jobs.length; i++) {
duration = jobs[i];
bestWorker = 0;
if (i< numWorkers){
bestWorker= i;
} else{
for (int j = 0; j < numWorkers; ++j) {
if (nextFreeTime[j] < nextFreeTime[bestWorker])
bestWorker = j;
continue;
}
}
assignedWorker[i] = bestWorker;
startTime[i] = nextFreeTime[bestWorker];
nextFreeTime[bestWorker] += duration;
}
}
However, both your solution and this slightly trimmed-down one take 2 milliseconds to run. I also looked at having a HashMap maintain a NextWorker marker, but at some point you catch up with it and end up searching for the next one every time, so you don't win much.
You could try an ordered List/Queue, but then you have expensive inserts instead of expensive searches, and you have to keep track of the timeslice. A version like that could look like this:
private void assignJobs() {
assignedWorker = new int[jobs.length];
startTime = new long[jobs.length];
PriorityQueue<Integer> nextTimesQueue = new PriorityQueue<Integer>();
HashMap<Integer, Set<Integer>> workersReadyAtTimeT = new HashMap<Integer,Set<Integer>>();
long[] nextFreeTime = new long[numWorkers];
int duration = 0;
int bestWorker = 0;
for (int i = 0; i < jobs.length; i++) {
duration = jobs[i];
if(i<numWorkers) {
bestWorker = i;
nextTimesQueue.add(duration);
addToSet(workersReadyAtTimeT, duration, i,0);
}else {
int currentTime = nextTimesQueue.poll();
Set<Integer> workersReady = workersReadyAtTimeT.get(currentTime);
if (workersReady.size()>1) {
bestWorker = workersReady.iterator().next();
workersReady.remove(bestWorker);
workersReadyAtTimeT.remove(currentTime);
workersReadyAtTimeT.put(currentTime,workersReady);
nextTimesQueue.add(currentTime);
} else {
bestWorker = workersReady.iterator().next();
workersReadyAtTimeT.remove(currentTime);
nextTimesQueue.add(currentTime+duration);
addToSet(workersReadyAtTimeT, duration, bestWorker, currentTime);
}
}
assignedWorker[i] = bestWorker;
startTime[i] = nextFreeTime[bestWorker];
nextFreeTime[bestWorker] += duration;
}
}
private void addToSet(HashMap<Integer, Set<Integer>> workersReadyAtTimeT, int duration, int worker, int current) {
if(workersReadyAtTimeT.get(current+duration)==null) {
HashSet<Integer> s = new HashSet<Integer>();
s.add(worker);
workersReadyAtTimeT.put(current+duration, s);
}else {
Set<Integer> s = workersReadyAtTimeT.get(current+duration);
s.add(worker);
workersReadyAtTimeT.put(current+duration,s);
}
}
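For comparison, a simpler sketch along the same lines (assuming the same fields as above): keep exactly one queue entry per worker, ordered by next-free time with ties broken by worker index. This avoids the workersReadyAtTimeT map entirely and runs in O(m log n) for m jobs and n workers:
private void assignJobs() {
    assignedWorker = new int[jobs.length];
    startTime = new long[jobs.length];
    // entry[0] = time the worker becomes free, entry[1] = worker index
    PriorityQueue<long[]> workers = new PriorityQueue<>(
            (a, b) -> a[0] != b[0] ? Long.compare(a[0], b[0])
                                   : Long.compare(a[1], b[1]));
    for (int w = 0; w < numWorkers; w++) {
        workers.add(new long[] {0L, w});
    }
    for (int i = 0; i < jobs.length; i++) {
        long[] best = workers.poll();   // earliest-free worker, lowest index first
        assignedWorker[i] = (int) best[1];
        startTime[i] = best[0];
        best[0] += jobs[i];             // busy until start + duration
        workers.add(best);              // re-queue with its new free time
    }
}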

How to extend a class array / access class array attributes after extending a class?

For a large, DB-based, dynamically built multi-area menu, I created a class MenuPoint:
class MenuPoint{
public int areaId;
public int preID;
public String name;
public String desc;
public String stepInImg = "bsp.img";
public String stepOutImg = "bsp.img";
public String detailedForm = "bsp.fxml";
public String detailedImg = "bsp.img";
public String [] additionalOptionForm = new String[0];
public String [] additionalOptionName = new String[0];
public String [] additionalOptionImg = new String[0];
}
and initialised it as a 0-length array in my main class, MenuController:
public MenuPoint [] menuItem = new MenuPoint[0];
I use an API call to get my DB-stored information when MenuController is initialised.
I store the results using the following code:
int dataStructlength = 12;
String[] exampleApiResultKeys = new String[dataStructlength];
exampleApiResultKeys[0] = "SITE_NUMBER";
exampleApiResultKeys[1] = "SITE_DESC";
exampleApiResultKeys[2] = "SITE_NUMBER_EXT";
exampleApiResultKeys[3] = "CELL_NUMBER";
exampleApiResultKeys[4] = "CELL_DESC";
exampleApiResultKeys[5] = "CELL_TYPE";
exampleApiResultKeys[6] = "MACHINE_GROUP_NUMBER";
exampleApiResultKeys[7] = "MACHINE_GROUP_DESC";
exampleApiResultKeys[8] = "MACHINE_GROUP_TYPE";
exampleApiResultKeys[9] = "STATION_NUMBER";
exampleApiResultKeys[10] = "STATION_DESC";
exampleApiResultKeys[11] = "STATION_TYPE";
exampleApiController.test_mdataGetMachineAssetStructure(exampleApiFilter, exampleApiResultKeys);
for(int i = 0; exampleApiController.resultValues.value != null && i < exampleApiController.resultValues.value.length/12; i++)
{
boolean isUseless = false;
for(int a = 0; a < dataStructlength; a ++)
if(true == exampleApiController.resultValues.value[i*dataStructlength+a].trim().isEmpty())
isUseless = true;
if(!isUseless)
{
int preId= -1;
if("M".equals(exampleApiController.resultValues.value[i*12+5]))
{
if(giveItemId(0, preId, exampleApiController.resultValues.value[i*12]) >= 0)
preId = giveItemId(0, preId, exampleApiController.resultValues.value[i*12]);
else
preId = addMenuItem(0, preId, exampleApiController.resultValues.value[i*12], exampleApiController.resultValues.value[i*12+1], "bsp.form");
}
if("M".equals(exampleApiController.resultValues.value[i*12+5]))
{
if(giveItemId(1, preId, exampleApiController.resultValues.value[i*12+3]) >= 0)
preId = giveItemId(0, preId, exampleApiController.resultValues.value[i*12+3]);
else
preId = addMenuItem(1, preId, exampleApiController.resultValues.value[i*12+3], exampleApiController.resultValues.value[i*12+4], "bsp.form");
}
if("M".equals(exampleApiController.resultValues.value[i*12+8]))
{
if(giveItemId(2, preId, exampleApiController.resultValues.value[i*12+6]) >= 0)
preId = giveItemId(0, preId, exampleApiController.resultValues.value[i*12+6]);
else
preId = addMenuItem(2, preId, exampleApiController.resultValues.value[i*12+6], exampleApiController.resultValues.value[i*12+7], "bsp.form");
}
if("M".equals(exampleApiController.resultValues.value[i*12+11]))
{
if(giveItemId(3, preId, exampleApiController.resultValues.value[i*12+9]) >= 0)
preId = giveItemId(0, preId, exampleApiController.resultValues.value[i*12+9]);
else
preId = addMenuItem(3, preId, exampleApiController.resultValues.value[i*12+9], exampleApiController.resultValues.value[i*12+10], "bsp.form");
}
}
}
giveItemId:
public int giveItemId(int areaId_temp, int preId_temp, String name_temp)
{
for(int i = 0; i < menuItem.length; i++)
{
if(menuItem[i].areaId == areaId_temp && menuItem[i].name.equals(name_temp) && menuItem[i].preID == preId_temp)
return i;
}
return -1;
}
addMenuItem:
public int addMenuItem(int areaId_temp, int preId_temp, String name_temp, String desc_temp, String form_temp)
{
Object newArray1 = Array.newInstance(menuItem.getClass().getComponentType(), Array.getLength(menuItem)+1); // +1 Arrayfeld
System.arraycopy(menuItem, 0, newArray1, 0, Array.getLength(menuItem));
menuItem = (MenuPoint[]) newArray1; // expend but missing attributes
menuItem[menuItem.length-1].areaId = areaId_temp;
menuItem[menuItem.length-1].preID = preId_temp;
menuItem[menuItem.length-1].name = name_temp;
menuItem[menuItem.length-1].desc = desc_temp;
menuItem[menuItem.length-1].detailedForm = form_temp;
return menuItem.length-1;
}
I found out that menuItem doesn't carry any attributes after extending it.
Do I have to create "new" instances of MenuPoint to make it work?
Is it even possible to extend a class array without losing attributes or their values?
It should be, because in the end menuItem is just a pointer array pointing to multiple storage addresses, isn't it?
Thank you guys for any tips or better concepts you can give me. (I know this class concept is silly, but I have no idea for a better one.)
And please excuse my bad grammar.
You create a new array in method addMenuItem which is then populated with the members from the existing menuItem, but the element at index (new) length - 1 isn't initialized:
menuItem[menuItem.length-1] = new MenuPoint();
You should have gotten a NullPointerException when trying to set the fields.
All this usage of java.lang.reflect.Array is rather contrived. There are simpler ways to achieve this. Here is a simplified version of addMenuItem:
public int addMenuItem(int areaId, String name){
MenuPoint[] newMenuItem = new MenuPoint[menuItem.length + 1];
System.arraycopy(menuItem, 0, newMenuItem, 0, menuItem.length);
menuItem = newMenuItem;
menuItem[menuItem.length-1] = new MenuPoint();
menuItem[menuItem.length-1].areaId = areaId;
menuItem[menuItem.length-1].name = name;
return menuItem.length-1; // index of the newly added item
}
But, above all, do use a List<MenuPoint>.
To answer your question "Do I have to create "new" instances of MenuPoint to make it work?":
Yes.
From your code snippets, you need to initialize a MenuPoint object to store its values in the menuItem array.
Initialize your array element with the line below:
menuItem[menuItem.length-1] = new MenuPoint();
Place that line after copying your array with menuItem = (MenuPoint[]) newArray1;
Without initialization you get a NullPointerException when setting values on the null element.
There are better ways to maintain a list of objects, such as the ArrayList class.
To use ArrayList you need the following code change:
public List<MenuPoint> menuItem = new ArrayList<MenuPoint>(); //Initialize list
Your Function will be
public int addMenuItem(int areaId_temp, int preId_temp, String name_temp, String desc_temp, String form_temp)
{
MenuPoint menuPoint = new MenuPoint();
menuPoint.areaId = areaId_temp;
menuPoint.preID = preId_temp;
menuPoint.name = name_temp;
menuPoint.desc = desc_temp;
menuPoint.detailedForm = form_temp;
menuItem.add(menuPoint);
return menuItem.size()-1;
}
I think this is easier than using reflection with an array.
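Note that if you switch menuItem to a List, lookups like giveItemId need the matching change; a sketch along the lines of your original method:
public int giveItemId(int areaId_temp, int preId_temp, String name_temp)
{
    for (int i = 0; i < menuItem.size(); i++)
    {
        MenuPoint mp = menuItem.get(i);
        if (mp.areaId == areaId_temp && mp.name.equals(name_temp) && mp.preID == preId_temp)
            return i; // the index doubles as the item id, as before
    }
    return -1;
}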

The reduce fails due to Task attempt failed to report status for 600 seconds. Killing! Solution?

The reduce phase of the job fails with:
# of failed Reduce Tasks exceeded allowed limit.
The reason why each task fails is:
Task attempt_201301251556_1637_r_000005_0 failed to report status for 600 seconds. Killing!
Problem in detail:
The Map phase takes in each record, which is of the format: time, rid, data.
The data is of the format: data element and its count.
eg: a,1 b,4 c,7 corresponds to the data of a record.
For each data element, the mapper outputs the record's data under that element's key. e.g.:
key:(time, a), val: (rid,data)
key:(time, b), val: (rid,data)
key:(time, c), val: (rid,data)
Every reduce receives all the data corresponding to the same key from all the records.
e.g:
key:(time, a), val:(rid1, data) and
key:(time, a), val:(rid2, data)
reach the same reduce instance.
It does some processing here and outputs similar rids.
My program runs without trouble for a small dataset such as 10MB. But fails when the data increases to say 1G, with the above mentioned reason. I don't know why this happens. Please help!
Reduce code:
There are two classes below:
VCLReduce0Split
CoreSplit
a. VCLReduce0Split
public class VCLReduce0Split extends MapReduceBase implements Reducer<Text, Text, Text, Text>{
// @SuppressWarnings("unchecked")
public void reduce (Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
String key_str = key.toString();
StringTokenizer stk = new StringTokenizer(key_str);
String t = stk.nextToken();
HashMap<String, String> hmap = new HashMap<String, String>();
while(values.hasNext())
{
StringBuffer sbuf1 = new StringBuffer();
String val = values.next().toString();
StringTokenizer st = new StringTokenizer(val);
String uid = st.nextToken();
String data = st.nextToken();
int total_size = 0;
StringTokenizer stx = new StringTokenizer(data,"|");
StringBuffer sbuf = new StringBuffer();
while(stx.hasMoreTokens())
{
String data_part = stx.nextToken();
String data_freq = stx.nextToken();
// System.out.println("data_part:----->"+data_part+" data_freq:----->"+data_freq);
sbuf.append(data_part);
sbuf.append("|");
sbuf.append(data_freq);
sbuf.append("|");
}
/*
for(int i = 0; i<parts.length-1; i++)
{
System.out.println("data:--------------->"+data);
int part_size = Integer.parseInt(parts[i+1]);
sbuf.append(parts[i]);
sbuf.append("|");
sbuf.append(part_size);
sbuf.append("|");
total_size = part_size+total_size;
i++;
}*/
sbuf1.append(String.valueOf(total_size));
sbuf1.append(",");
sbuf1.append(sbuf);
if(uid.equals("203664471")){
// System.out.println("data:--------------------------->"+data+" tot_size:---->"+total_size+" sbuf:------->"+sbuf);
}
hmap.put(uid, sbuf1.toString());
}
float threshold = (float)0.8;
CoreSplit obj = new CoreSplit();
ArrayList<CustomMapSimilarity> al = obj.similarityCalculation(t, hmap, threshold);
for(int i = 0; i<al.size(); i++)
{
CustomMapSimilarity cmaps = al.get(i);
String xy_pair = cmaps.getRIDPair();
String similarity = cmaps.getSimilarity();
output.collect(new Text(xy_pair), new Text(similarity));
}
}
}
b. coreSplit
package com.a;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.TreeMap;
import org.apache.commons.collections.map.MultiValueMap;
public class PPJoinPlusCoreOptNewSplit{
public ArrayList<CustomMapSimilarity> similarityCalculation(String time, HashMap<String,String>hmap, float t)
{
ArrayList<CustomMapSimilarity> als = new ArrayList<CustomMapSimilarity>();
ArrayList<CustomMapSimilarity> alsim = new ArrayList<CustomMapSimilarity>();
Iterator<String> iter = hmap.keySet().iterator();
MultiValueMap index = new MultiValueMap();
String RID;
TreeMap<String, Integer> hmap2;
Iterator<String> iter1;
int size;
float prefix_size;
HashMap<String, Float> alpha;
HashMap<String, CustomMapOverlap> hmap_overlap;
String data;
while(iter.hasNext())
{
RID = (String)iter.next();
String data_val = hmap.get(RID);
StringTokenizer st = new StringTokenizer(data_val,",");
// System.out.println("data_val:--**********-->"+data_val+" RID:------------>"+RID+" time::---?"+time);
String RIDsize = st.nextToken();
size = Integer.parseInt(RIDsize);
data = st.nextToken();
StringTokenizer st1 = new StringTokenizer(data,"\\|");
String[] parts = data.split("\\|");
// hmap2 = (TreeMap<String, Integer>)hmap.get(RID);
// iter1 = hmap2.keySet().iterator();
// size = hmap_size.get(RID);
prefix_size = (float)(size-(0.8*size)+1);
if(size==1)
{
prefix_size = 1;
}
alpha = new HashMap<String, Float>();
hmap_overlap = new HashMap<String, CustomMapOverlap>();
// Iterator<String> iter2 = hmap2.keySet().iterator();
int prefix_index = 0;
int pi=0;
for(float j = 0; j<=prefix_size; j++)
{
boolean prefix_chk = false;
prefix_index++;
String ptoken = parts[pi];
// System.out.println("data:---->"+data+" ptoken:---->"+ptoken);
float val = Float.parseFloat(parts[pi+1]);
float temp_j = j;
j = j+val;
boolean j_l = false ;
float prefix_contri = 0;
pi= pi+2;
if(j>prefix_size)
{
// prefix_contri = j-temp_j;
prefix_contri = prefix_size-temp_j;
if(prefix_contri>0)
{
j_l = true;
prefix_chk = false;
}
else
{
prefix_chk = true;
}
}
if(prefix_chk == false){
filters(index, ptoken, RID, hmap,t, size, val, j_l, alpha, hmap_overlap, j, prefix_contri);
CustomMapPrefixTokens cmapt = new CustomMapPrefixTokens(RID,j);
index.put(ptoken, cmapt);
}
}
als = calcSimilarity(time, RID, hmap, alpha, hmap_overlap);
for(int i = 0; i<als.size(); i++)
{
if(als.get(i).getRIDPair()!=null)
{
alsim.add(als.get(i));
}
}
}
return alsim;
}
public void filters(MultiValueMap index, String ptoken, String RID, HashMap<String, String> hmap, float t, int size, float val, boolean j_l, HashMap<String, Float> alpha, HashMap<String, CustomMapOverlap> hmap_overlap, float j, float prefix_contri)
{
@SuppressWarnings("unchecked")
ArrayList<CustomMapPrefixTokens> positions_list = (ArrayList<CustomMapPrefixTokens>) index.get(ptoken);
if((positions_list!=null) &&(positions_list.size()!=0))
{
CustomMapPrefixTokens cmapt ;
String y;
Iterator<String> iter3;
int y_size = 0;
float check_size = 0;
// TreeMap<String, Integer> hmapy;
float RID_val=0;
float y_overlap = 0;
float ubound = 0;
ArrayList<Float> fl = new ArrayList<Float>();
StringTokenizer st;
for(int k = 0; k<positions_list.size(); k++)
{
cmapt = positions_list.get(k);
if(!cmapt.getRID().equals(RID))
{
y = hmap.get(cmapt.getRID());
// iter3 = y.keySet().iterator();
String yRID = cmapt.getRID();
st = new StringTokenizer(y,",");
y_size = Integer.parseInt(st.nextToken());
check_size = (float)0.8*(size);
if(y_size>=check_size)
{
//hmapy = hmap.get(yRID);
String y_data = st.nextToken();
StringTokenizer st1 = new StringTokenizer(y_data,"\\|");
while(st1.hasMoreTokens())
{
String token = st1.nextToken();
if(token.equals(ptoken))
{
String nxt_token = st1.nextToken();
// System.out.println("ydata:--->"+y_data+" nxt_token:--->"+nxt_token);
RID_val = (float)Integer.parseInt(nxt_token);
break;
}
}
// RID_val = (float) hmapy.get(ptoken);
float alpha1 = (float)(0.8/1.8)*(size+y_size);
fl = overlapCalc(alpha1, size, y_size, cmapt, j, alpha, j_l,RID_val,val,prefix_contri);
ubound = fl.get(0);
y_overlap = fl.get(1);
positionFilter(ubound, alpha1, cmapt, y_overlap, hmap_overlap);
}
}
}
}
}
public void positionFilter( float ubound,float alpha1, CustomMapPrefixTokens cmapt, float y_overlap, HashMap<String, CustomMapOverlap> hmap_overlap)
{
float y_overlap_total = 0;
if(null!=hmap_overlap.get(cmapt.getRID()))
{
y_overlap_total = hmap_overlap.get(cmapt.getRID()).getOverlap();
if((y_overlap_total+ubound)>=alpha1)
{
CustomMapOverlap cmap_tmp = hmap_overlap.get(cmapt.getRID());
float y_o_t = y_overlap+y_overlap_total;
cmap_tmp.setOverlap(y_o_t);
hmap_overlap.put(cmapt.getRID(),cmap_tmp);
}
else
{
float n = 0;
hmap_overlap.put(cmapt.getRID(), new CustomMapOverlap(cmapt.getRID(),n));
}
}
else
{
CustomMapOverlap cmap_tmp = new CustomMapOverlap(cmapt.getRID(),y_overlap);
hmap_overlap.put(cmapt.getRID(), cmap_tmp);
}
}
public ArrayList<Float> overlapCalc(float alpha1, int size, int y_size, CustomMapPrefixTokens cmapt, float j, HashMap<String, Float> alpha, boolean j_l, float RID_val, float val, float prefix_contri )
{
alpha.put(cmapt.getRID(), alpha1);
float min1 = y_size-cmapt.getPosition();
float min2 = size-j;
float min = 0;
float y_overlap = 0;
if(min1<min2)
{
min = min1;
}
else
{
min = min2;
}
if(j_l==true)
{
val = prefix_contri;
}
if(RID_val<val)
{
y_overlap = RID_val;
}
else
{
y_overlap = val;
}
float ubound = y_overlap+min;
ArrayList<Float> fl = new ArrayList<Float>();
fl.add(ubound);
fl.add(y_overlap);
return fl;
}
public ArrayList<CustomMapSimilarity> calcSimilarity( String time, String RID, HashMap<String,String> hmap , HashMap<String, Float> alpha, HashMap<String, CustomMapOverlap> hmap_overlap)
{
float jaccard = 0;
CustomMapSimilarity cms = new CustomMapSimilarity(null, null);
ArrayList<CustomMapSimilarity> alsim = new ArrayList<CustomMapSimilarity>();
Iterator<String> iter = hmap_overlap.keySet().iterator();
while(iter.hasNext())
{
String key = (String)iter.next();
CustomMapOverlap val = (CustomMapOverlap)hmap_overlap.get(key);
float overlap = (float)val.getOverlap();
if(overlap>0)
{
String yRID = val.getRID();
String RIDpair = RID+" "+yRID;
jaccard = unionIntersection(hmap, RIDpair);
if(jaccard>0.8)
{
cms = new CustomMapSimilarity(time+" "+RIDpair, String.valueOf(jaccard));
alsim.add(cms);
}
}
}
return alsim;
}
public float unionIntersection( HashMap<String,String> hmap, String RIDpair)
{
StringTokenizer st = new StringTokenizer(RIDpair);
String xRID = st.nextToken();
String yRID = st.nextToken();
String xdata = hmap.get(xRID);
String ydata = hmap.get(yRID);
int total_union = 0;
int xval = 0;
int yval = 0;
int part_union = 0;
int total_intersect = 0;
// System.out.println("xdata:------*************>"+xdata);
StringTokenizer xtokenizer = new StringTokenizer(xdata,",");
StringTokenizer ytokenizer = new StringTokenizer(ydata,",");
// String[] xpart = xdata.split(",");
// String[] ypart = ydata.split(",");
xtokenizer.nextToken();
ytokenizer.nextToken();
String datax = xtokenizer.nextToken();
String datay = ytokenizer.nextToken();
HashMap<String,Integer> x = new HashMap<String, Integer>();
HashMap<String,Integer> y = new HashMap<String, Integer>();
String [] xparts;
xparts = datax.toString().split("\\|");
String [] yparts;
yparts = datay.toString().split("\\|");
for(int i = 0; i<xparts.length-1; i++)
{
int part_size = Integer.parseInt(xparts[i+1]);
x.put(xparts[i], part_size);
i++;
}
for(int i = 0; i<yparts.length-1; i++)
{
int part_size = Integer.parseInt(yparts[i+1]);
y.put(yparts[i], part_size);
i++;
}
Set<String> xset = x.keySet();
Set<String> yset = y.keySet();
for(String elm:xset )
{
yval = 0;
xval = (Integer)x.get(elm);
part_union = 0;
int part_intersect = 0;
if(yset.contains(elm)){
yval = (Integer) y.get(elm);
if(xval>yval)
{
part_union = xval;
part_intersect = yval;
}
else
{
part_union = yval;
part_intersect = xval;
}
total_intersect = total_intersect+part_intersect;
}
else
{
part_union = xval;
}
total_union = total_union+part_union;
}
for(String elm: yset)
{
part_union = 0;
if(!xset.contains(elm))
{
part_union = (Integer) y.get(elm);
total_union = total_union+part_union;
}
}
float jaccard = (float)total_intersect/total_union;
return jaccard;
}
}
The reason for the timeouts might be a long-running computation in your reducer without reporting the progress back to the Hadoop framework. This can be resolved using different approaches:
I. Increasing the timeout in mapred-site.xml:
<property>
<name>mapred.task.timeout</name>
<value>1200000</value>
</property>
The default is 600000 ms = 600 seconds.
II. Reporting progress every x records as in the Reducer example in javadoc:
public void reduce(K key, Iterator<V> values,
OutputCollector<K, V> output,
Reporter reporter) throws IOException {
// report progress
if ((noValues%10) == 0) {
reporter.progress();
}
// ...
}
optionally you can increment a custom counter as in the example:
reporter.incrCounter(NUM_RECORDS, 1);
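Applied to the reducer above, that could look like the following sketch (the 1000-record interval is arbitrary):
// Inside VCLReduce0Split.reduce(): ping the framework while accumulating
// values so the 600-second task timeout never fires.
int noValues = 0;
while (values.hasNext()) {
    String val = values.next().toString();
    // ... existing tokenizing and accumulation ...
    if ((++noValues % 1000) == 0) {
        reporter.progress();
    }
}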
It's possible that you have consumed all of Java's heap space, or that GC is happening so frequently that the reducer gets no chance to report status to the master and is hence killed.
Another possibility is that one of the reducers is receiving heavily skewed data, i.e. a lot of records exist for a particular rid.
Try to increase your Java heap by setting the following config:
mapred.child.java.opts
to
-Xmx2048m
Also, try to reduce the number of parallel reducers by setting the following config to a lower value than it currently has (the default value is 2):
mapred.tasktracker.reduce.tasks.maximum
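In mapred-site.xml those two settings would look like this (the values are examples to tune, not recommendations):
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2048m</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>1</value>
</property>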

BerkeleyDB JE random access time increases non-linearly

I am testing BerkeleyDB Java Edition to understand whether I can use it in my project.
I've created a very simple program which works with an object of class com.sleepycat.je.Database:
writes N records of 5-15kb each, with keys generated like Integer.toString(random.nextInt());
reads these records fetching them with method Database#get in the same order they were created;
reads the same number of records with method Database#get in random order.
And now I see a strange thing. Execution time for the third test grows very non-linearly as the number of records increases.
N=80000, write=55sec, sequential fetch=17sec, random fetch=3sec
N=100000, write=60sec, sequential fetch=20sec, random fetch=7sec
N=120000, write=68sec, sequential fetch=27sec, random fetch=11sec
N=140000, write=82sec, sequential fetch=32sec, random fetch=47sec
(I've run tests several times, of course.)
I suppose I am doing something quite wrong. Here is the source for reference (sorry, it is a bit long); methods are called in the same order:
private Environment env;
private Database db;
private Random random = new Random();
private List<String> keys = new ArrayList<String>();
private int seed = 113;
public boolean dbOpen() {
EnvironmentConfig ec = new EnvironmentConfig();
DatabaseConfig dc = new DatabaseConfig();
ec.setAllowCreate(true);
dc.setAllowCreate(true);
env = new Environment(new File("mydbenv"), ec);
db = env.openDatabase(null, "moe", dc);
return true;
}
public int storeRecords(int i) {
int j;
long size = 0;
DatabaseEntry key = new DatabaseEntry();
DatabaseEntry val = new DatabaseEntry();
random.setSeed(seed);
for (j = 0; j < i; j++) {
String k = Long.toString(random.nextLong());
byte[] data = new byte[5000 + random.nextInt(10000)];
keys.add(k);
size += data.length;
random.nextBytes(data);
key.setData(k.getBytes());
val.setData(data);
db.put(null, key, val);
}
System.out.println("GENERATED SIZE: " + size);
return j;
}
public int fetchRecords(int i) {
int j, res;
DatabaseEntry key = new DatabaseEntry();
DatabaseEntry val = new DatabaseEntry();
random.setSeed(seed);
res = 0;
for (j = 0; j < i; j++) {
String k = Long.toString(random.nextLong());
byte[] data = new byte[5000 + random.nextInt(10000)];
random.nextBytes(data);
key.setData(k.getBytes());
db.get(null, key, val, null);
if (Arrays.equals(data, val.getData())) {
res++;
} else {
System.err.println("FETCH differs: " + j);
System.err.println(data.length + " " + val.getData().length);
}
}
return res;
}
public int fetchRandom(int i) {
DatabaseEntry key = new DatabaseEntry();
DatabaseEntry val = new DatabaseEntry();
for (int j = 0; j < i; j++) {
String k = keys.get(random.nextInt(keys.size()));
key.setData(k.getBytes());
db.get(null, key, val, null);
}
return i;
}
Performance degradation is non-linear for two reasons:
BDB-JE's data structure is a B-tree, which has O(log(n)) performance for retrieving one record. Retrieving all records via the get method is O(n*log(n)).
Large data sets don't fit into RAM, and so disk access slows everything down. Random access has very poor cache locality.
Note that you can improve write performance by giving up some durability: ec.setTxnWriteNoSync(true);
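Both causes are tunable before switching libraries; here is a minimal sketch of the relevant EnvironmentConfig knobs (the 500MB cache figure just matches the comparison below):
EnvironmentConfig ec = new EnvironmentConfig();
ec.setAllowCreate(true);
ec.setTxnWriteNoSync(true);   // trade some durability for write speed
ec.setCacheSize(500000000L);  // keep more of the B-tree in RAM
Environment env = new Environment(new File("mydbenv"), ec);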
You might also want to try Tupl, an open source BerkeleyDB replacement I've been working on. It's still in the alpha stage, but you can find it on SourceForge.
For a fair comparison between BDB-JE and Tupl, I set the cache size to 500M and an explicit checkpoint is performed at the end of the store method.
With BDB-JE:
N=80000, write=11.0sec, fetch=5.3sec
N=100000, write=13.6sec, fetch=7.0sec
N=120000, write=16.4sec, fetch=29.5sec
N=140000, write=18.8sec, fetch=35.9sec
N=160000, write=21.5sec, fetch=41.3sec
N=180000, write=23.9sec, fetch=46.4sec
With Tupl:
N=80000, write=21.7sec, fetch=4.4sec
N=100000, write=27.6sec, fetch=6.3sec
N=120000, write=30.2sec, fetch=8.4sec
N=140000, write=35.4sec, fetch=12.2sec
N=160000, write=39.9sec, fetch=17.4sec
N=180000, write=45.4sec, fetch=22.8sec
BDB-JE is faster at writing entries, because of its log-based format. Tupl is faster at reading, however. Here's the source to the Tupl test:
import java.io.*;
import java.util.*;
import org.cojen.tupl.*;
public class TuplTest {
public static void main(final String[] args) throws Exception {
final RandTupl rt = new RandTupl();
rt.dbOpen(args[0]);
{
long start = System.currentTimeMillis();
rt.storeRecords(Integer.parseInt(args[1]));
long end = System.currentTimeMillis();
System.out.println("store duration: " + (end - start));
}
{
long start = System.currentTimeMillis();
rt.fetchRecords(Integer.parseInt(args[1]));
long end = System.currentTimeMillis();
System.out.println("fetch duration: " + (end - start));
}
}
private Database db;
private Index ix;
private Random random = new Random();
private List<String> keys = new ArrayList<String>();
private int seed = 113;
public boolean dbOpen(String home) throws Exception {
DatabaseConfig config = new DatabaseConfig();
config.baseFile(new File(home));
config.durabilityMode(DurabilityMode.NO_FLUSH);
config.minCacheSize(500000000);
db = Database.open(config);
ix = db.openIndex("moe");
return true;
}
public int storeRecords(int i) throws Exception {
int j;
long size = 0;
random.setSeed(seed);
for (j = 0; j < i; j++) {
String k = Long.toString(random.nextLong());
byte[] data = new byte[5000 + random.nextInt(10000)];
keys.add(k);
size += data.length;
random.nextBytes(data);
ix.store(null, k.getBytes(), data);
}
System.out.println("GENERATED SIZE: " + size);
db.checkpoint();
return j;
}
public int fetchRecords(int i) throws Exception {
int j, res;
random.setSeed(seed);
res = 0;
for (j = 0; j < i; j++) {
String k = Long.toString(random.nextLong());
byte[] data = new byte[5000 + random.nextInt(10000)];
random.nextBytes(data);
byte[] val = ix.load(null, k.getBytes());
if (Arrays.equals(data, val)) {
res++;
} else {
System.err.println("FETCH differs: " + j);
System.err.println(data.length + " " + val.length);
}
}
return res;
}
public int fetchRandom(int i) throws Exception {
for (int j = 0; j < i; j++) {
String k = keys.get(random.nextInt(keys.size()));
ix.load(null, k.getBytes());
}
return i;
}
}

error : java.lang.ArrayIndexOutOfBoundsException on Android

I have this code, with indexOfChromosomes = 48 and indexOfGens = 20:
double Truncation;
double Crossover;
double Mutation;
int Generation;
private Context context;
int indexOfChromosomes;
int indexOfGens;
int gensNumber;
int chromosomesNumber;
public AdapterDB(int Bits, double Truncation, double Crossover, double Mutation, int Chromosomes, int Generation, Context ctx)
{
this.indexOfGens = Bits;
this.Truncation = Truncation;
this.Crossover = Crossover;
this.Mutation = Mutation;
this.indexOfChromosomes = Chromosomes;
this.Generation = Generation;
this.context = ctx;
DBHelper = new DatabaseHelper (context);
}
String [][] population = new String[indexOfChromosomes][indexOfGens];
public void generateChromosome()
{
for(chromosomesNumber = 0; chromosomesNumber < indexOfChromosomes ; chromosomesNumber++)
{
int o = 0;
for(gensNumber = 0; gensNumber < 12 ; gensNumber++)
{
Cursor l = db.rawQuery("SELECT _id, key_foodstuff, key_calorie, key_carbohydrate, key_fat, key_protein FROM (food INNER JOIN categories ON food.key_nocategory = categories.nocategories) WHERE key_type='primary' AND _id!=164 AND (key_carbohydrate!=0 OR key_protein!=0 OR key_fat!=0) ORDER BY RANDOM() LIMIT 1", null);
if ((l != null) && l.moveToFirst())
{
if (o == indexOfGens)
{
gensNumber = 0;
sumOfCarbohydrateMor = 0;
sumOfFatMor = 0;
sumOfProteinMor = 0;
sumOfCalorieMor = 0;
o = 0;
}
population[chromosomesNumber][gensNumber] = l.getString(0);
morning_food[k] = l.getString(3);
sumOfCarbohydrateMor = sumOfCarbohydrateMor + Double.parseDouble(morning_food[k]);
morning_food[f] = l.getString(4);
sumOfFatMor = sumOfFatMor + Double.parseDouble(morning_food[f]);
morning_food[p] = l.getString(5);
sumOfProteinMor = sumOfProteinMor + Double.parseDouble(morning_food[p]);
morning_food[c] = l.getString(2);
sumOfCalorieMor = sumOfCalorieMor + Double.parseDouble(morning_food[c]);
if (((sumOfCarbohydrateMor >= (morning_car-(morning_car*0.2))) && (sumOfCarbohydrateMor <= morning_cal*1.1)) && ((sumOfProteinMor >= (morning_pro-(morning_pro*0.2))) && (sumOfProteinMor <= morning_pro*1.1)) && ((sumOfFatMor >= (morning_fat-(morning_fat*0.2))) && (sumOfFatMor <= morning_fat*1.1)))
//if((sumOfCarbohydrateMor > (morning_car*0.6)) && (sumOfProteinMor > (morning_pro*0.7)) && (sumOfFatMor > (morning_fat*0.8)))
{
Log.e("lala", "lalala");
break;
}
if ((sumOfCarbohydrateMor > (morning_car*1.1)) || (sumOfProteinMor > (morning_pro*1.1)) || (sumOfFatMor > (morning_fat*1.1)) || (sumOfCalorieMor > (morning_cal*1.1)))
{
morning_food[k] = l.getString(3);
sumOfCarbohydrateMor = sumOfCarbohydrateMor - Double.parseDouble(morning_food[k]);
morning_food[f] = l.getString(4);
sumOfFatMor = sumOfFatMor - Double.parseDouble(morning_food[f]);
morning_food[p] = l.getString(5);
sumOfProteinMor = sumOfProteinMor - Double.parseDouble(morning_food[p]);
morning_food[c] = l.getString(2);
sumOfCalorieMor = sumOfCalorieMor - Double.parseDouble(morning_food[c]);
gensNumber--;
o++;
}
}
}
}
}
and it errors at line 48, which is the population[chromosomesNumber][gensNumber] assignment.
The error says: java.lang.ArrayIndexOutOfBoundsException
Any idea? Thx u
Try setting
population = new String[indexOfChromosomes][indexOfGens];
inside your adapterDB.
It looks like when you are initializing population, indexOfChromosomes and indexOfGens are not initialized yet, so you are creating an array of size 0. So when you call
population[chromosomesNumber][gensNumber]
you get java.lang.ArrayIndexOutOfBoundsException
It's because you define the population array outside a function, before indexOfChromosomes has been set to 48 and indexOfGens to 20. Declare population at the top and initialize it in the adapter, AFTER you've set your variables. Something like this:
double Truncation;
double Crossover;
double Mutation;
int Generation;
private Context context;
int indexOfChromosomes;
int indexOfGens;
int gensNumber;
int chromosomesNumber;
String [][] population;
public AdapterDB(int Bits, double Truncation, double Crossover, double Mutation, int Chromosomes, int Generation, Context ctx)
{
this.indexOfGens = Bits;
this.Truncation = Truncation;
this.Crossover = Crossover;
this.Mutation = Mutation;
this.indexOfChromosomes = Chromosomes;
this.Generation = Generation;
this.context = ctx;
DBHelper = new DatabaseHelper (context);
//Create population after initializing variables.
population = new String[indexOfChromosomes][indexOfGens];
}
This is your problem:
String [][] population = new String[indexOfChromosomes][indexOfGens];
This occurs outside your constructor, and is therefore executed before your constructor, when indexOfChromosomes and indexOfGens are both still 0. Put the initialization inside your constructor. Here's a simpler example showing the same problem:
public class Test {
private int size;
public Test(int size) {
this.size = size;
}
private String[] array = new String[size];
public static void main(String[] args) {
Test t = new Test(5);
System.out.println(t.array.length);
}
}
And the fixed version:
public class Test {
private int size;
private String[] array;
public Test(int size) {
this.size = size;
array = new String[size];
}
public static void main(String[] args) {
Test t = new Test(5);
System.out.println(t.array.length);
}
}
Note that the positioning of the variable declaration with respect to the constructor makes no difference to the execution flow.
EDIT: As for why it's now looping forever - in the middle of your code you have:
if (o == indexOfGens)
{
gensNumber = 0;
...
}
which will reset the inner loop back to (nearly) the start (not quite the start, as gensNumber will be incremented at the end of the loop body, before the start of the next iteration).
It's not at all clear what you're trying to do, but I suspect that's not helping.
I'd also encourage you to use local variables wherever possible - it's very unusual to use an instance variable as a loop counter, for example.
Finally, I'd encourage you to break up your large method into smaller ones for readability.
