I'm trying to compile my first major program, but getBestFare() prints "null" every time, and it shouldn't. I'm asking for help figuring out what's wrong.
I rebuilt the entire getBestFare() method, but it still comes up with "null". The earlier code was messier; this version is cleaner, but it still doesn't work.
public class TransitCalculator {
    public int numberOfDays;
    public int transCount;

    public TransitCalculator(int numberOfDays, int transCount) {
        if (numberOfDays <= 30 && numberOfDays > 0 && transCount > 0) {
            this.numberOfDays = numberOfDays;
            this.transCount = transCount;
        } else {
            System.out.println("Invalid data.");
        }
    }

    String[] length = {"Pay-per-ride", "7-day", "30-day"};
    double[] cost = {2.75, 33.00, 127.00};

    public double unlimited7Price() {
        int weekCount = numberOfDays / 7;
        if (numberOfDays % 7 > 0) {
            weekCount += 1;
        }
        double weeksCost = weekCount * cost[1];
        return weeksCost;
    }

    public double[] getRidePrices() {
        double price1 = cost[0];
        double price2 = ((cost[1] * unlimited7Price()) / (unlimited7Price() * 7));
        double price3 = cost[2] / numberOfDays;
        double[] getRide = {price1, price2, price3};
        return getRide;
    }

    public String getBestFare() {
        int num = 0;
        for (int i = 0; i < getRidePrices().length; i++) {
            if (getRidePrices()[i] < getRidePrices()[num]) {
                return "You should get the " + length[num] + " Unlimited option at " + getRidePrices()[num] / transCount + " per ride.";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TransitCalculator one = new TransitCalculator(30, 30);
        System.out.println(one.unlimited7Price());
        System.out.println(one.getRidePrices()[2]);
        System.out.println(one.getBestFare());
    }
}
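The null comes from the loop itself: every price is compared against getRidePrices()[num] with num fixed at 0, and for the values in main the pay-per-ride price (2.75) is already the cheapest, so the if never fires and the method falls through to return null. Below is a minimal sketch of a version that tracks the index of the cheapest option instead, reusing the posted fields (the per-ride division is kept from the original and may still need rethinking):

public String getBestFare() {
    double[] prices = getRidePrices();   // call once and reuse
    int best = 0;                        // index of the cheapest option seen so far
    for (int i = 1; i < prices.length; i++) {
        if (prices[i] < prices[best]) {
            best = i;
        }
    }
    return "You should get the " + length[best] + " option at "
            + prices[best] / transCount + " per ride.";
}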
I am having issues with the values of different keys in a HashMap I created. Somehow, the values appear to be overwritten by the most recently added value.
For example:
map.put("String1", "123")
map.put("String2", "456")
System.out.println(map.get("String1")); //would print 456
My hashmap is:
Map<String, Testing> systems = new HashMap<>();
Testing:
class Testing {
    private double[] ap;
    private double[] dcg;
    private double[] ndcg;

    public Testing(double[] _ap, double[] _dcg, double[] _ndcg) {
        ap = _ap;
        dcg = _dcg;
        ndcg = _ndcg;
    }

    public double[] getAp() {
        return ap;
    }

    public double[] getDcg() {
        return dcg;
    }

    public double[] getNdcg() {
        return ndcg;
    }
}
Here is the problem (f is a File):
systems.put(f.getName(), new Testing(ap, dcg, ndcg));
if (f.getName().equals("input.Flab9atdnN.gz") ||
        f.getName().equals("input.apl9lt.gz")) {
    System.out.println(f.getName() + ": " + systems.get(f.getName()).getAp()[5]);
}
if (f.getName().equals("input.Flab9atdnN.gz")) {
    System.out.println(f.getName().equals("input.Flab9atdnN.gz"));
    double temp = systems.get("input.apl9lt.gz").getAp()[5];
    double temp2 = systems.get("input.Flab9atdnN.gz").getAp()[5];
    System.out.println("input.Flab9atdnN.gz: " + temp2 + ". input.apl9lt.gz: " + temp);
}
The first print gives different values for key "input.Flab9atdnN.gz" and key "input.apl9lt.gz".
System.out.println(f.getName() + ": " + systems.get(f.getName()).getAp()[5]);
The last print statement gives the same value for key "input.Flab9atdnN.gz", but it gives key "input.apl9lt.gz" the value that belongs to key "input.Flab9atdnN.gz". Key "input.Flab9atdnN.gz" is added to the HashMap after key "input.apl9lt.gz", so anything added after key "input.Flab9atdnN.gz" gives that new key's value to "input.Flab9atdnN.gz" as well.
double temp = systems.get("input.apl9lt.gz").getAp()[5];
double temp2 = systems.get("input.Flab9atdnN.gz").getAp()[5];
System.out.println("input.Flab9atdnN.gz: " + temp2 + ". input.apl9lt.gz: " + temp);
Any clue why this happens, and is there a possible workaround? I tried doing something similar with lists and had the same problem, and I also tried not using Testing.
Thank you.
Edit (for inputs):
input.apl9lt.gz
0.160837098024862
0.03075251487336594
0.22437008086531643
0.1971910732696186
0.26775040012743095
9.256258391747239E-4
0.1348288884102969
0.04098977989693765
0.22076261792825694
0.14351330413359978
0.4326923076923077
0.07127127472804279
4.552325182365065E-5
0.010058991520632703
0.013241228159087674
0.010137295467368818
0.16308220490382738
0.013974532767649097
0.1591821903406855
0.03546054590978735
0.017811035142771457
0.09931683119953653
0.0012300123001230013
3.2100667693888034E-5
0.13463869607114665
0.056660951442691745
0.009024064171122994
0.00111158621285874
0.19147531389263409
6.058415656054187E-4
0.15122464967762936
0.017945455244915694
0.24100308685261787
6.295914132164171E-4
0.41666666666666663
0.16054778554778554
0.12606805722666745
0.03122700118062138
0.05840908368719257
0.06151876506910154
8.167932696234583E-5
0.48663786619303134
0.0017420658249683476
0.20520161886380303
7.111269849728675E-5
0.1157176972265951
0.28587824256374156
0.032836748137528377
0.04182754182754183
0.02944176265259386
input.Flab9atdnN.gz
0.4550531747779656
0.11354712610736152
0.4465970245283123
0.39990864084973815
0.23410193071725469
8.08015513897867E-5
0.11287817139653589
0.02255268670973833
0.30038335608865446
0.21267974099603318
0.6041666666666666
0.15726821262566176
0.15690222729126874
1.5053439710973956E-4
0.0843584401155248
0.5027027027027027
0.1873237718924946
0.005660813678763912
0.012321170992537366
0.0529994153272247
0.04489848129896188
0.016508461433080466
0.0
1.0736065674698053E-4
0.07164253590778259
0.14083889573189318
0.024676040805073064
0.16099898114516484
0.16509562037628656
0.06488409960391041
0.22263263699157246
0.0568843663526689
0.4175364417422477
0.1106842493991619
0.15555555555555556
0.5416666666666666
0.4654817731306396
0.0930344930767678
0.344114561968089
0.1882981402539536
0.11698973634619976
0.4533746137676584
0.3389765988389732
0.475199277730597
0.08708693608991427
0.34790332410690694
0.035929746042875826
0.08056424630498706
0.20352743561030237
0.12758565977230674
Read these into a double array (size 50). To use the Testing class, just remove dcg and ndcg.
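For completeness, here is a minimal sketch of reading one of the lists above into a double[50], assuming the values are saved as plain text with one number per line (the class name and path handling are made up for the example; the posted .gz files would first have to be decompressed or wrapped in a GZIPInputStream):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ScoreReader {
    // Minimal sketch: read 50 doubles, one value per line, into an array.
    public static double[] readScores(String path) throws IOException {
        double[] values = new double[50];
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            for (int i = 0; i < values.length; i++) {
                values[i] = Double.parseDouble(reader.readLine().trim());
            }
        }
        return values;
    }
}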
First of all:
import java.util.HashMap;

public class MapPutExample {

    public final static void main(String[] args) {
        HashMap<String, String> map = new HashMap<>();
        map.put("String1", "123");
        map.put("String2", "456");
        System.out.println(map.get("String1")); //would print 456
    }
}
prints
123
So: No it's not.
You haven't provided the actual code showing how you add elements to the map, but given that Testing contains arrays, I suspect you don't initialize new arrays when setting their values from the file. So you do something like this:
import java.util.HashMap;

public class MapPutExample {

    public final static void main(String[] args) throws Exception {
        HashMap<String, char[]> map = new HashMap<>();
        char[] buffer = new char[10];
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = (char) ('a' + i);
        }
        map.put("String1", buffer);
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = (char) ('k' + i);
        }
        map.put("String2", buffer);
        System.out.println(map.get("String1")); //would print klmnopqrst
    }
}
This actually prints
klmnopqrst
You need to create new arrays and set these to the Testing instances that you add to the map:
import java.util.HashMap;

public class MapPutExample {

    public final static void main(String[] args) throws Exception {
        HashMap<String, char[]> map = new HashMap<>();
        char[] buffer = new char[10];
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = (char) ('a' + i);
        }
        map.put("String1", buffer);
        buffer = new char[10];
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = (char) ('k' + i);
        }
        map.put("String2", buffer);
        System.out.println(map.get("String1")); //now prints abcdefghij
    }
}
This is printing
abcdefghij
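Applied to the Testing class from the question, the same idea means allocating fresh arrays for every file before putting them into the map. A rough sketch; 'files' and fillFromFile are placeholders for however the files are actually iterated and parsed:

// Allocate new arrays for each file instead of reusing the same buffers across iterations.
for (File f : files) {
    double[] ap = new double[50];
    double[] dcg = new double[50];
    double[] ndcg = new double[50];
    fillFromFile(f, ap, dcg, ndcg);   // placeholder for the actual parsing code
    systems.put(f.getName(), new Testing(ap, dcg, ndcg));
}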
I am using a loop to compare the input value against the HashMap; if it matches anything in the map, the code should print all the values related to that time.
The result always comes out as null for me.
System.out.println("Please enter time :");
Scanner scan = new Scanner(System.in);
String value = scan.nextLine();//Read input-time
Measurement measurement = measurements.get(value);//there can only be 1 Measurement for 1 time
if(measurement != null){
System.out.println(measurement);
}}
Class Measurement:
public void getTimeInfo(String value)
{
    value = Measurements.get(time);
    if (value == null) {
        throw new MeasurementException();
    }
    System.out.println("The detailed info : " + this.time + "-" + this.temp + " " + this.wind + "-" + this.humid);
}
Following the 3 steps you mentioned (ignoring the JSON part) and reusing some of your code, I can offer you this code:
Main.java:
import java.util.HashMap;
import java.util.Scanner;

public class Main {

    static HashMap<String, Measurement> measurements = new HashMap<>();

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {//Create 3 measurements
            String time = "" + i;
            measurements.put(time, new Measurement(time, (float) i, (float) i, (float) i));
        }
        System.out.println("Please enter time :");
        Scanner scan = new Scanner(System.in);
        String value = scan.nextLine();//Read input-time
        Measurement measurement = measurements.get(value);//there can only be 1 Measurement for 1 time
        if (measurement != null) {
            System.out.println(measurement);
        }
    }
}
Measurement.java:
public class Measurement {

    String time;
    Float temp;
    Float wind;
    Float humid;
    int iD;

    public Measurement(String d, Float t, Float w, Float h) {
        this.time = d;
        this.temp = t;
        this.wind = w;
        this.humid = h;
    }

    @Override
    public String toString() {
        return "The detailed info : " + this.time + "-" + this.temp + " " + this.wind + "-" + this.humid;
    }
}
It might not fit your needs exactly, but it may be of help.
I want to get a probability score for the extracted names using NameFinderME, but the provided model gives very bad probabilities through the probs function.
For example, "Scott F. Fitzgerald" gets a score around 0.5 (averaging log probabilities, and taking an exponent), while "North Japan" and "Executive Vice President, Corporate Relations and Chief Philanthropy Officer" both get a score higher than 0.9...
I have more than 2 million first names and another 2 million last names (with their frequency counts), and I want to synthetically create a huge dataset from the outer product of first names X middle names (drawn from the first-name pool) X last names.
The problem is, I don't even get through all the last names once (even when discarding the frequency counts and using each name only once) before I get a GC overhead limit exceeded exception...
I'm implementing an ObjectStream and giving it to the train function:
public class OpenNLPNameStream implements ObjectStream<NameSample> {
private List<Map<String, Object>> firstNames = null;
private List<Map<String, Object>> lastNames = null;
private int firstNameIdx = 0;
private int firstNameCountIdx = 0;
private int middleNameIdx = 0;
private int middleNameCountIdx = 0;
private int lastNameIdx = 0;
private int lastNameCountIdx = 0;
private int firstNameMaxCount = 0;
private int middleNameMaxCount = 0;
private int lastNameMaxCount = 0;
private int firstNameKBSize = 0;
private int lastNameKBSize = 0;
Span span[] = new Span[1];
String fullName[] = new String[3];
String partialName[] = new String[2];
private void increaseFirstNameCountIdx()
{
firstNameCountIdx++;
if (firstNameCountIdx == firstNameMaxCount) {
firstNameIdx++;
if (firstNameIdx == firstNameKBSize)
return; //no need to update anything - this is the end of the run...
firstNameMaxCount = getFirstNameMaxCount(firstNameIdx);
firstNameCountIdx = 0;
}
}
private void increaseMiddleNameCountIdx()
{
lastNameCountIdx++;
if (middleNameCountIdx == middleNameMaxCount) {
if (middleNameIdx == firstNameKBSize) {
resetMiddleNameIdx();
increaseFirstNameCountIdx();
} else {
middleNameMaxCount = getMiddleNameMaxCount(middleNameIdx);
middleNameCountIdx = 0;
}
}
}
private void increaseLastNameCountIdx()
{
lastNameCountIdx++;
if (lastNameCountIdx == lastNameMaxCount) {
lastNameIdx++;
if (lastNameIdx == lastNameKBSize) {
resetLastNameIdx();
increaseMiddleNameCountIdx();
}
else {
lastNameMaxCount = getLastNameMaxCount(lastNameIdx);
lastNameCountIdx = 0;
}
}
}
private void resetLastNameIdx()
{
lastNameIdx = 0;
lastNameMaxCount = getLastNameMaxCount(0);
lastNameCountIdx = 0;
}
private void resetMiddleNameIdx()
{
middleNameIdx = 0;
middleNameMaxCount = getMiddleNameMaxCount(0);
middleNameCountIdx = 0;
}
private int getFirstNameMaxCount(int i)
{
return 1; //compromised on using just
//String occurences = (String) firstNames.get(i).get("occurences");
//return Integer.parseInt(occurences);
}
private int getMiddleNameMaxCount(int i)
{
return 3; //compromised on using just
//String occurences = (String) firstNames.get(i).get("occurences");
//return Integer.parseInt(occurences);
}
private int getLastNameMaxCount(int i)
{
return 1;
//String occurences = (String) lastNames.get(i).get("occurences");
//return Integer.parseInt(occurences);
}
@Override
public NameSample read() throws IOException {
if (firstNames == null) {
firstNames = CSVFileTools.readFileFromInputStream("namep_first_name_idf.csv", new ClassPathResource("namep_first_name_idf.csv").getInputStream());
firstNameKBSize = firstNames.size();
firstNameMaxCount = getFirstNameMaxCount(0);
middleNameMaxCount = getFirstNameMaxCount(0);
}
if (lastNames == null) {
lastNames = CSVFileTools.readFileFromInputStream("namep_last_name_idf.csv",new ClassPathResource("namep_last_name_idf.csv").getInputStream());
lastNameKBSize = lastNames.size();
lastNameMaxCount = getLastNameMaxCount(0);
}
increaseLastNameCountIdx();
if (firstNameIdx == firstNameKBSize)
return null; //we've finished iterating over all permutations!
String [] sentence;
if (firstNameCountIdx < firstNameMaxCount / 3)
{
span[0] = new Span(0,2,"Name");
sentence = partialName;
sentence[0] = (String)firstNames.get(firstNameIdx).get("first_name");
sentence[1] = (String)lastNames.get(lastNameIdx).get("last_name");
}
else
{
span[0] = new Span(0,3,"name");
sentence = fullName;
sentence[0] = (String)firstNames.get(firstNameIdx).get("first_name");
sentence[2] = (String)lastNames.get(lastNameIdx).get("last_name");
if (firstNameCountIdx < 2*firstNameCountIdx/3) {
sentence[1] = (String)firstNames.get(middleNameIdx).get("first_name");
}
else {
sentence[1] = ((String)firstNames.get(middleNameIdx).get("first_name")).substring(0,1) + ".";
}
}
return new NameSample(sentence,span,true);
}
@Override
public void reset() throws IOException, UnsupportedOperationException {
firstNameIdx = 0;
firstNameCountIdx = 0;
middleNameIdx = 0;
middleNameCountIdx = 0;
lastNameIdx = 0;
lastNameCountIdx = 0;
firstNameMaxCount = 0;
middleNameMaxCount = 0;
lastNameMaxCount = 0;
}
@Override
public void close() throws IOException {
reset();
firstNames = null;
lastNames = null;
}
}
And
TokenNameFinderModel model = NameFinderME.train("en","person",new OpenNLPNameStream(),TrainingParameters.defaultParams(),new TokenNameFinderFactory());
model.serialize(new FileOutputStream("trainedNames.bin",false));
I get the following error after a few minutes of running:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at opennlp.tools.util.featuregen.WindowFeatureGenerator.createFeatures(WindowFeatureGenerator.java:112)
at opennlp.tools.util.featuregen.AggregatedFeatureGenerator.createFeatures(AggregatedFeatureGenerator.java:79)
at opennlp.tools.util.featuregen.CachedFeatureGenerator.createFeatures(CachedFeatureGenerator.java:69)
at opennlp.tools.namefind.DefaultNameContextGenerator.getContext(DefaultNameContextGenerator.java:118)
at opennlp.tools.namefind.DefaultNameContextGenerator.getContext(DefaultNameContextGenerator.java:37)
at opennlp.tools.namefind.NameFinderEventStream.generateEvents(NameFinderEventStream.java:113)
at opennlp.tools.namefind.NameFinderEventStream.createEvents(NameFinderEventStream.java:137)
at opennlp.tools.namefind.NameFinderEventStream.createEvents(NameFinderEventStream.java:36)
at opennlp.tools.util.AbstractEventStream.read(AbstractEventStream.java:62)
at opennlp.tools.util.AbstractEventStream.read(AbstractEventStream.java:27)
at opennlp.tools.util.AbstractObjectStream.read(AbstractObjectStream.java:32)
at opennlp.tools.ml.model.HashSumEventStream.read(HashSumEventStream.java:46)
at opennlp.tools.ml.model.HashSumEventStream.read(HashSumEventStream.java:29)
at opennlp.tools.ml.model.TwoPassDataIndexer.computeEventCounts(TwoPassDataIndexer.java:130)
at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:83)
at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:337)
Edit: After increasing the memory of the JVM to 8GB, I still don't get past the first 2 million last names, but now the Exception is:
java.lang.OutOfMemoryError: Java heap space
at java.util.HashMap.resize(HashMap.java:703)
at java.util.HashMap.putVal(HashMap.java:662)
at java.util.HashMap.put(HashMap.java:611)
at opennlp.tools.ml.model.AbstractDataIndexer.update(AbstractDataIndexer.java:141)
at opennlp.tools.ml.model.TwoPassDataIndexer.computeEventCounts(TwoPassDataIndexer.java:134)
at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:83)
at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:337)
It seems the problem stems from the fact that I'm creating a new NameSample along with new Spans and Strings at every read call... But I can't reuse Spans or NameSamples, since they're immutable.
Should I just write my own language model? Is there a better Java library for doing this sort of thing (I'm only interested in the probability that the extracted text is actually a name)? Are there parameters I should tweak for the model I'm training?
Any advice would be appreciated.
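One cheap thing to try while experimenting is to bound the number of generated samples so the trainer never sees the full cross product. A rough sketch against the posted stream; MAX_SAMPLES is an arbitrary cap and buildNextSample() is a placeholder for the permutation logic already in read() (reset() would also need to zero the counter):

private int samplesEmitted = 0;
private static final int MAX_SAMPLES = 500_000;   // arbitrary experiment-sized cap

@Override
public NameSample read() throws IOException {
    if (samplesEmitted >= MAX_SAMPLES) {
        return null;   // returning null ends the stream, same as the existing end-of-run check
    }
    samplesEmitted++;
    return buildNextSample();   // placeholder for the existing permutation code in read()
}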
The reduce phase of the job fails with:
# of failed Reduce Tasks exceeded allowed limit.
The reason why each task fails is:
Task attempt_201301251556_1637_r_000005_0 failed to report status for 600 seconds. Killing!
Problem in detail:
The Map phase takes in each record, which is of the format: time, rid, data.
The data is of the format: data element, and its count.
e.g.: a,1 b,4 c,7 corresponds to the data of one record.
For each data element, the mapper outputs the record's data, e.g.:
key: (time, a), val: (rid, data)
key: (time, b), val: (rid, data)
key: (time, c), val: (rid, data)
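For context, here is a rough sketch of what such a map step could look like with the same old mapred API used in the reducer below; the class name, field order, and delimiters are assumptions, not the actual mapper:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class VCLMap0Split extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable offset, Text record, OutputCollector<Text, Text> output,
                    Reporter reporter) throws IOException {
        // Assumed record layout: "time<TAB>rid<TAB>data", where data looks like "a|1|b|4|c|7|"
        String[] fields = record.toString().split("\t");
        String time = fields[0];
        String rid = fields[1];
        String data = fields[2];
        String[] parts = data.split("\\|");
        // Emit one (time, element) key per data element, carrying (rid, data) as the value.
        for (int i = 0; i < parts.length - 1; i += 2) {
            output.collect(new Text(time + " " + parts[i]), new Text(rid + " " + data));
        }
    }
}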
Every reduce receives all the data corresponding to the same key from all the records.
e.g:
key:(time, a), val:(rid1, data) and
key:(time, a), val:(rid2, data)
reach the same reduce instance.
It does some processing here and outputs similar rids.
My program runs without trouble on a small dataset such as 10 MB, but it fails for the reason mentioned above when the data grows to, say, 1 GB. I don't know why this happens. Please help!
Reduce code:
There are two classes below:
VCLReduce0Split
CoreSplit
a. VCLReduce0Split
public class VCLReduce0Split extends MapReduceBase implements Reducer<Text, Text, Text, Text>{
// #SuppressWarnings("unchecked")
public void reduce (Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
String key_str = key.toString();
StringTokenizer stk = new StringTokenizer(key_str);
String t = stk.nextToken();
HashMap<String, String> hmap = new HashMap<String, String>();
while(values.hasNext())
{
StringBuffer sbuf1 = new StringBuffer();
String val = values.next().toString();
StringTokenizer st = new StringTokenizer(val);
String uid = st.nextToken();
String data = st.nextToken();
int total_size = 0;
StringTokenizer stx = new StringTokenizer(data,"|");
StringBuffer sbuf = new StringBuffer();
while(stx.hasMoreTokens())
{
String data_part = stx.nextToken();
String data_freq = stx.nextToken();
// System.out.println("data_part:----->"+data_part+" data_freq:----->"+data_freq);
sbuf.append(data_part);
sbuf.append("|");
sbuf.append(data_freq);
sbuf.append("|");
}
/*
for(int i = 0; i<parts.length-1; i++)
{
System.out.println("data:--------------->"+data);
int part_size = Integer.parseInt(parts[i+1]);
sbuf.append(parts[i]);
sbuf.append("|");
sbuf.append(part_size);
sbuf.append("|");
total_size = part_size+total_size;
i++;
}*/
sbuf1.append(String.valueOf(total_size));
sbuf1.append(",");
sbuf1.append(sbuf);
if(uid.equals("203664471")){
// System.out.println("data:--------------------------->"+data+" tot_size:---->"+total_size+" sbuf:------->"+sbuf);
}
hmap.put(uid, sbuf1.toString());
}
float threshold = (float)0.8;
CoreSplit obj = new CoreSplit();
ArrayList<CustomMapSimilarity> al = obj.similarityCalculation(t, hmap, threshold);
for(int i = 0; i<al.size(); i++)
{
CustomMapSimilarity cmaps = al.get(i);
String xy_pair = cmaps.getRIDPair();
String similarity = cmaps.getSimilarity();
output.collect(new Text(xy_pair), new Text(similarity));
}
}
}
b. CoreSplit
package com.a;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.TreeMap;
import org.apache.commons.collections.map.MultiValueMap;
public class PPJoinPlusCoreOptNewSplit{
public ArrayList<CustomMapSimilarity> similarityCalculation(String time, HashMap<String,String>hmap, float t)
{
ArrayList<CustomMapSimilarity> als = new ArrayList<CustomMapSimilarity>();
ArrayList<CustomMapSimilarity> alsim = new ArrayList<CustomMapSimilarity>();
Iterator<String> iter = hmap.keySet().iterator();
MultiValueMap index = new MultiValueMap();
String RID;
TreeMap<String, Integer> hmap2;
Iterator<String> iter1;
int size;
float prefix_size;
HashMap<String, Float> alpha;
HashMap<String, CustomMapOverlap> hmap_overlap;
String data;
while(iter.hasNext())
{
RID = (String)iter.next();
String data_val = hmap.get(RID);
StringTokenizer st = new StringTokenizer(data_val,",");
// System.out.println("data_val:--**********-->"+data_val+" RID:------------>"+RID+" time::---?"+time);
String RIDsize = st.nextToken();
size = Integer.parseInt(RIDsize);
data = st.nextToken();
StringTokenizer st1 = new StringTokenizer(data,"\\|");
String[] parts = data.split("\\|");
// hmap2 = (TreeMap<String, Integer>)hmap.get(RID);
// iter1 = hmap2.keySet().iterator();
// size = hmap_size.get(RID);
prefix_size = (float)(size-(0.8*size)+1);
if(size==1)
{
prefix_size = 1;
}
alpha = new HashMap<String, Float>();
hmap_overlap = new HashMap<String, CustomMapOverlap>();
// Iterator<String> iter2 = hmap2.keySet().iterator();
int prefix_index = 0;
int pi=0;
for(float j = 0; j<=prefix_size; j++)
{
boolean prefix_chk = false;
prefix_index++;
String ptoken = parts[pi];
// System.out.println("data:---->"+data+" ptoken:---->"+ptoken);
float val = Float.parseFloat(parts[pi+1]);
float temp_j = j;
j = j+val;
boolean j_l = false ;
float prefix_contri = 0;
pi= pi+2;
if(j>prefix_size)
{
// prefix_contri = j-temp_j;
prefix_contri = prefix_size-temp_j;
if(prefix_contri>0)
{
j_l = true;
prefix_chk = false;
}
else
{
prefix_chk = true;
}
}
if(prefix_chk == false){
filters(index, ptoken, RID, hmap,t, size, val, j_l, alpha, hmap_overlap, j, prefix_contri);
CustomMapPrefixTokens cmapt = new CustomMapPrefixTokens(RID,j);
index.put(ptoken, cmapt);
}
}
als = calcSimilarity(time, RID, hmap, alpha, hmap_overlap);
for(int i = 0; i<als.size(); i++)
{
if(als.get(i).getRIDPair()!=null)
{
alsim.add(als.get(i));
}
}
}
return alsim;
}
public void filters(MultiValueMap index, String ptoken, String RID, HashMap<String, String> hmap, float t, int size, float val, boolean j_l, HashMap<String, Float> alpha, HashMap<String, CustomMapOverlap> hmap_overlap, float j, float prefix_contri)
{
@SuppressWarnings("unchecked")
ArrayList<CustomMapPrefixTokens> positions_list = (ArrayList<CustomMapPrefixTokens>) index.get(ptoken);
if((positions_list!=null) &&(positions_list.size()!=0))
{
CustomMapPrefixTokens cmapt ;
String y;
Iterator<String> iter3;
int y_size = 0;
float check_size = 0;
// TreeMap<String, Integer> hmapy;
float RID_val=0;
float y_overlap = 0;
float ubound = 0;
ArrayList<Float> fl = new ArrayList<Float>();
StringTokenizer st;
for(int k = 0; k<positions_list.size(); k++)
{
cmapt = positions_list.get(k);
if(!cmapt.getRID().equals(RID))
{
y = hmap.get(cmapt.getRID());
// iter3 = y.keySet().iterator();
String yRID = cmapt.getRID();
st = new StringTokenizer(y,",");
y_size = Integer.parseInt(st.nextToken());
check_size = (float)0.8*(size);
if(y_size>=check_size)
{
//hmapy = hmap.get(yRID);
String y_data = st.nextToken();
StringTokenizer st1 = new StringTokenizer(y_data,"\\|");
while(st1.hasMoreTokens())
{
String token = st1.nextToken();
if(token.equals(ptoken))
{
String nxt_token = st1.nextToken();
// System.out.println("ydata:--->"+y_data+" nxt_token:--->"+nxt_token);
RID_val = (float)Integer.parseInt(nxt_token);
break;
}
}
// RID_val = (float) hmapy.get(ptoken);
float alpha1 = (float)(0.8/1.8)*(size+y_size);
fl = overlapCalc(alpha1, size, y_size, cmapt, j, alpha, j_l,RID_val,val,prefix_contri);
ubound = fl.get(0);
y_overlap = fl.get(1);
positionFilter(ubound, alpha1, cmapt, y_overlap, hmap_overlap);
}
}
}
}
}
public void positionFilter( float ubound,float alpha1, CustomMapPrefixTokens cmapt, float y_overlap, HashMap<String, CustomMapOverlap> hmap_overlap)
{
float y_overlap_total = 0;
if(null!=hmap_overlap.get(cmapt.getRID()))
{
y_overlap_total = hmap_overlap.get(cmapt.getRID()).getOverlap();
if((y_overlap_total+ubound)>=alpha1)
{
CustomMapOverlap cmap_tmp = hmap_overlap.get(cmapt.getRID());
float y_o_t = y_overlap+y_overlap_total;
cmap_tmp.setOverlap(y_o_t);
hmap_overlap.put(cmapt.getRID(),cmap_tmp);
}
else
{
float n = 0;
hmap_overlap.put(cmapt.getRID(), new CustomMapOverlap(cmapt.getRID(),n));
}
}
else
{
CustomMapOverlap cmap_tmp = new CustomMapOverlap(cmapt.getRID(),y_overlap);
hmap_overlap.put(cmapt.getRID(), cmap_tmp);
}
}
public ArrayList<Float> overlapCalc(float alpha1, int size, int y_size, CustomMapPrefixTokens cmapt, float j, HashMap<String, Float> alpha, boolean j_l, float RID_val, float val, float prefix_contri )
{
alpha.put(cmapt.getRID(), alpha1);
float min1 = y_size-cmapt.getPosition();
float min2 = size-j;
float min = 0;
float y_overlap = 0;
if(min1<min2)
{
min = min1;
}
else
{
min = min2;
}
if(j_l==true)
{
val = prefix_contri;
}
if(RID_val<val)
{
y_overlap = RID_val;
}
else
{
y_overlap = val;
}
float ubound = y_overlap+min;
ArrayList<Float> fl = new ArrayList<Float>();
fl.add(ubound);
fl.add(y_overlap);
return fl;
}
public ArrayList<CustomMapSimilarity> calcSimilarity( String time, String RID, HashMap<String,String> hmap , HashMap<String, Float> alpha, HashMap<String, CustomMapOverlap> hmap_overlap)
{
float jaccard = 0;
CustomMapSimilarity cms = new CustomMapSimilarity(null, null);
ArrayList<CustomMapSimilarity> alsim = new ArrayList<CustomMapSimilarity>();
Iterator<String> iter = hmap_overlap.keySet().iterator();
while(iter.hasNext())
{
String key = (String)iter.next();
CustomMapOverlap val = (CustomMapOverlap)hmap_overlap.get(key);
float overlap = (float)val.getOverlap();
if(overlap>0)
{
String yRID = val.getRID();
String RIDpair = RID+" "+yRID;
jaccard = unionIntersection(hmap, RIDpair);
if(jaccard>0.8)
{
cms = new CustomMapSimilarity(time+" "+RIDpair, String.valueOf(jaccard));
alsim.add(cms);
}
}
}
return alsim;
}
public float unionIntersection( HashMap<String,String> hmap, String RIDpair)
{
StringTokenizer st = new StringTokenizer(RIDpair);
String xRID = st.nextToken();
String yRID = st.nextToken();
String xdata = hmap.get(xRID);
String ydata = hmap.get(yRID);
int total_union = 0;
int xval = 0;
int yval = 0;
int part_union = 0;
int total_intersect = 0;
// System.out.println("xdata:------*************>"+xdata);
StringTokenizer xtokenizer = new StringTokenizer(xdata,",");
StringTokenizer ytokenizer = new StringTokenizer(ydata,",");
// String[] xpart = xdata.split(",");
// String[] ypart = ydata.split(",");
xtokenizer.nextToken();
ytokenizer.nextToken();
String datax = xtokenizer.nextToken();
String datay = ytokenizer.nextToken();
HashMap<String,Integer> x = new HashMap<String, Integer>();
HashMap<String,Integer> y = new HashMap<String, Integer>();
String [] xparts;
xparts = datax.toString().split("\\|");
String [] yparts;
yparts = datay.toString().split("\\|");
for(int i = 0; i<xparts.length-1; i++)
{
int part_size = Integer.parseInt(xparts[i+1]);
x.put(xparts[i], part_size);
i++;
}
for(int i = 0; i<yparts.length-1; i++)
{
int part_size = Integer.parseInt(yparts[i+1]);
y.put(xparts[i], part_size);
i++;
}
Set<String> xset = x.keySet();
Set<String> yset = y.keySet();
for(String elm:xset )
{
yval = 0;
xval = (Integer)x.get(elm);
part_union = 0;
int part_intersect = 0;
if(yset.contains(elm)){
yval = (Integer) y.get(elm);
if(xval>yval)
{
part_union = xval;
part_intersect = yval;
}
else
{
part_union = yval;
part_intersect = xval;
}
total_intersect = total_intersect+part_intersect;
}
else
{
part_union = xval;
}
total_union = total_union+part_union;
}
for(String elm: yset)
{
part_union = 0;
if(!xset.contains(elm))
{
part_union = (Integer) y.get(elm);
total_union = total_union+part_union;
}
}
float jaccard = (float)total_intersect/total_union;
return jaccard;
}
}
The reason for the timeouts might be a long-running computation in your reducer that never reports progress back to the Hadoop framework. This can be addressed in different ways:
I. Increasing the timeout in mapred-site.xml:
<property>
  <name>mapred.task.timeout</name>
  <value>1200000</value>
</property>
The default is 600000 ms = 600 seconds.
II. Reporting progress every x records as in the Reducer example in javadoc:
public void reduce(K key, Iterator<V> values,
                   OutputCollector<K, V> output,
                   Reporter reporter) throws IOException {
    // report progress
    if ((noValues % 10) == 0) {
        reporter.progress();
    }
    // ...
}
Optionally, you can increment a custom counter as in the example:
reporter.incrCounter(NUM_RECORDS, 1);
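Applied to the posted VCLReduce0Split, that could look roughly like this inside the value loop (processed is a new local counter; everything else stays as it is):

int processed = 0;
while (values.hasNext()) {
    // ... existing per-value work from the posted reduce() ...
    processed++;
    if (processed % 1000 == 0) {
        reporter.progress();   // tell the framework the task is still alive
    }
}

If the slow part is actually the similarityCalculation call that runs after this loop, the Reporter would have to be passed down into that code so it can be pinged from inside the long computation as well.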
It's possible that you have consumed all of Java's heap space, or that GC is happening so frequently that the reducer gets no chance to report status to the master and is therefore killed.
Another possibility is that one of the reducers is getting heavily skewed data, i.e. a very large number of records for a particular rid.
Try to increase your Java heap by setting the following config:
mapred.child.java.opts
to
-Xmx2048m
Also, try and reduce the number of parallel reducers by setting the following config to a lower value than what it currently has (default value is 2):
mapred.tasktracker.reduce.tasks.maximum
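In mapred-site.xml form, mirroring the timeout example above, the two settings could look like this; 2048m and 1 are just example values:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>

Note that mapred.tasktracker.reduce.tasks.maximum is read by the TaskTracker, so it generally needs to be set in the cluster/tasktracker configuration rather than per job.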