I wrote a mapreduce program, but when I tried to run on hadoop it can't succeed since it generates that much amount of intermediate data that I get an error message: the node has no more space on it. After it tries with the second node, but the result is the same. I would like process two text files: approximately ~60k lines.
I have tried:
- enable snappy compression, but it didn't help.
- add more space, so the two node have 50-50gb storage
Since none of them are helped maybe the problem is with the code, not with the setup.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class FirstMapper extends Mapper<LongWritable, Text, Text, Text> {
enum POS_TAG {
CC, CD, DT, EX,
FW, IN, JJ, JJR,
JJS, LS, MD, NN,
NNS, NNP, NNPS, PDT,
WDT, WP, POS, PRP,
PRP$, RB, RBR, RBS,
RP, SYM, TO, UH,
VB, VBD, VBG, VBN,
VBP, VBZ, WP$, WRB
}
private static final List<String> tags = Stream.of(POS_TAG.values())
.map(Enum::name)
.collect(Collectors.toList());
private static final int MAX_NGRAM = 5;
private static String[][] cands = {
new String[3],
new String[10],
new String[32],
new String[10]
};
#Override
protected void setup(Context context) throws IOException, InterruptedException {
Configuration conf = context.getConfiguration();
String location = conf.get("job.cands.path");
if (location != null) {
BufferedReader br = null;
try {
FileSystem fs = FileSystem.get(conf);
Path path = new Path(location);
if (fs.exists(path)) {
FSDataInputStream fis = fs.open(path);
br = new BufferedReader(new InputStreamReader(fis));
String line;
int i = 0;
while ((line = br.readLine()) != null) {
String[] splitted = line.split(" ");
cands[i] = splitted;
i++;
}
}
} catch (IOException e) {
//
} finally {
br.close();
}
}
}
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String[] tokens = value.toString().split(" ");
int m = tokens.length;
for (int n = 2; n <= MAX_NGRAM; n++) {
for (int s = 0; s <= m - n; s++) {
for (int i = 0; i < cands[n - 2].length; i++) {
List<String> pattern = new ArrayList<>();
List<String> metWords = new ArrayList<>();
for (int j = 0; j <= n - 1; j++) {
String[] pair = tokens[s + j].split("/");
String word = pair[0];
String pos = pair[1];
char c = cands[n - 2][i].charAt(j);
addToPattern(word, pos, c, pattern);
if (c > 0 && tags.contains(pos)) {
metWords.add(word);
}
}
if (metWords.isEmpty()) {
metWords.add("_NONE");
}
Text resultKey = new Text(pattern.toString() + ";" + metWords.toString());
context.write(resultKey, new Text(key.toString()));
}
}
}
}
public void addToPattern(String word, String pos, char c, List<String> pattern) {
switch (c) {
case 'w':
pattern.add(word);
break;
case 'p':
pattern.add(pos);
break;
default:
pattern.add("_WC_");
break;
}
}
}
public class Main {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
conf.set("job.cands.path", "/user/thelfter/pwp");
Job job1 = Job.getInstance(conf, "word pattern1");
job1.setJarByClass(Main.class);
job1.setMapperClass(FirstMapper.class);
job1.setCombinerClass(FirstReducer.class);
job1.setReducerClass(FirstReducer.class);
job1.setMapOutputKeyClass(Text.class);
job1.setMapOutputValueClass(Text.class);
FileInputFormat.addInputPath(job1, new Path(args[0]));
FileOutputFormat.setOutputPath(job1, new Path("/user/thelfter/output"));
System.exit(job1.waitForCompletion(true) ? 0 : 1);
}
}
If your using YARN, then the Node Manager's disk space is controlled by yarn.nodemanager.local-dirs in your yarn-site.xml file so whatever that is pointing to needs to have enough disk space.
package demo_thesis;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.evaluation.NominalPrediction;
import weka.classifiers.rules.DecisionTable;
import weka.classifiers.rules.PART;
import weka.classifiers.trees.DecisionStump;
import weka.classifiers.trees.J48;
import weka.core.FastVector;
import weka.core.Instances;
public class WekaTest {
public static BufferedReader readDataFile(String filename) {
BufferedReader inputReader = null;
try {
inputReader = new BufferedReader(new FileReader(filename));
} catch (FileNotFoundException ex) {
System.err.println("File not found: " + filename);
}
return inputReader;
}
public static Evaluation classify(Classifier model,
Instances trainingSet, Instances testingSet) throws Exception {
Evaluation evaluation = new Evaluation(trainingSet);
model.buildClassifier(trainingSet);
evaluation.evaluateModel(model, testingSet);
return evaluation;
}
public static double calculateAccuracy(FastVector predictions) {
double correct = 0;
for (int i = 0; i < predictions.size(); i++) {
NominalPrediction np = (NominalPrediction) predictions.elementAt(i);
if (np.predicted() == np.actual()) {
correct++;
}
}
return 100 * correct / predictions.size();
}
public static Instances[][] crossValidationSplit(Instances data, int numberOfFolds) {
Instances[][] split = new Instances[2][numberOfFolds];
for (int i = 0; i < n`enter code here`umberOfFolds; i++) {
split[0][i] = data.trainCV(numberOfFolds, i);
split[1][i] = data.testCV(numberOfFolds, i);
}
return split;
}
public static void main(String[] args) throws Exception {
BufferedReader datafile = readDataFile("C:\\Users\\user\\Desktop\\demo_thesis\\src\\input_file\\weather.txt");
Instances data = new Instances(datafile);
data.setClassIndex(data.numAttributes() - 1);
// Do 10-split cross validation
Instances[][] split = crossValidationSplit(data, 10);
// Separate split into training and testing arrays
Instances[] trainingSplits = split[0];
Instances[] testingSplits = split[1];
// Use a set of classifiers
Classifier[] models = {
new J48(), // a decision tree
new PART(),
new DecisionTable(),//decision table majority classifier
new DecisionStump() //one-level decision tree
};
// Run for each model
for (int j = 0; j < models.length; j++) {
// Collect every group of predictions for current model in a FastVector
FastVector predictions = new FastVector();
// For each training-testing split pair, train and test the classifier
for (int i = 0; i < trainingSplits.length; i++) {
Evaluation validation = classify(models[j], trainingSplits[i], testingSplits[i]);
predictions.appendElements(validation.predictions());
// Uncomment to see the summary for each training-testing pair.
System.out.println(models[j].toString());
}
// Calculate overall accuracy of current classifier on all splits
double accuracy = calculateAccuracy(predictions);
// Print current classifier's name and accuracy in a complicated,
// but nice-looking way.
System.out.println("Accuracy of " + models[j].getClass().getSimpleName() + ": "
+ String.format("%.2f%%", accuracy)
+ "\n---------------------------------");
}
}
}
I have integrated weka jar file to a java package. i have used this code.but there is a error show in the line predictions.appendElements(validation.predictions()); Giving a error like "cannot find symbol symbol: method predictions()".I have used Netbeans IDE 8.2 and Jdk 1.8. I have tried in many ways but can not solve it. how should I solve the error??
need help with writing to and receiving from the text files
it seems to go almost all the way but then it says that no file exists, at that point it should create one and then start writing to it. it says that it failed to find one and then it just ends itself. I don't know why
package sorting;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.Random;
public class Sorting {
private static int[] oneToFiftyThou = new int[50000];
private static int[] fiftyThouToOne = new int[50000];
private static int[] randomFiftyThou = new int[50000];
public static void main(String[] args) {
if(args.length>0) {
if(args[0].equalsIgnoreCase("init")) {
// initialize the 3 files
// 1-50000 file1
// 50000-1 file2
// random 50000 file3
initializeFiles();
writeFiles();
}
} else {
readFilestoArray();
System.out.println(""+oneToFiftyThou[0] + " - " +
oneToFiftyThou[oneToFiftyThou.length-1]);
System.out.println(""+fiftyThouToOne[0] + " - " +
fiftyThouToOne[fiftyThouToOne.length-1]);
System.out.println(""+randomFiftyThou[0] + " - " +
randomFiftyThou[randomFiftyThou.length-1]);
intInsertionSort(oneToFiftyThou);
intInsertionSort(fiftyThouToOne);
intInsertionSort(randomFiftyThou);
}
}
private static void initializeFiles() {
//Array one
for(int i=1; i<oneToFiftyThou.length+1; i++) {
oneToFiftyThou[i-1] = i;
}
//Array two
for(int i=50000; i>0; i--) {
fiftyThouToOne[fiftyThouToOne.length-(i)] = i;
}
//Array Three Random. Copy Array one into a new Array and shuffle.
System.arraycopy(oneToFiftyThou, 0, randomFiftyThou, 0,
randomFiftyThou.length);
Random random = new Random();
for(int i=randomFiftyThou.length-1; i>0; i--) {
int index = random.nextInt(i+1);
//Swap the values
int value = randomFiftyThou[index];
randomFiftyThou[index] = randomFiftyThou[i];
randomFiftyThou[i] = value;
}
}
public static void writeFiles() {
ArrayList<int[]> arrayList = new ArrayList<int[]>();
arrayList.add(oneToFiftyThou);
arrayList.add(fiftyThouToOne);
arrayList.add(randomFiftyThou);
int fileIter = 1;
for(Iterator<int[]> iter = arrayList.iterator();
iter.hasNext(); ) {
int[] array = iter.next();
try {
File file = new File("file"+fileIter+".txt");
//check for file, create it if it doesn't exist
if(!file.exists()) {
file.createNewFile();
}
FileWriter fileWriter = new FileWriter(file);
BufferedWriter bufferWriter = new BufferedWriter
(fileWriter);
for(int i = 0; i<array.length; i++) {
bufferWriter.write(""+array[i]);
if(i!=array.length-1) {
bufferWriter.newLine();
}
}
bufferWriter.close();
fileIter++;
}catch(IOException ioe) {
ioe.printStackTrace();
System.exit(-1);
}
}
}
public static void readFilestoArray() {
ArrayList<int[]> arrayList = new ArrayList<int[]>();
arrayList.add(oneToFiftyThou);
arrayList.add(fiftyThouToOne);
arrayList.add(randomFiftyThou);
int fileIter = 1;
for(Iterator<int[]> iter = arrayList.iterator();
iter.hasNext(); ) {
int[] array = iter.next();
try {
File file = new File("file"+fileIter+".txt");
//check for file, exit with error if file doesn't exist
if(!file.exists()) {
System.out.println("file doesn't exist "
+ file.getName());
System.exit(-1);
}
FileReader fileReader = new FileReader(file);
BufferedReader bufferReader = new BufferedReader
(fileReader);
for(int i = 0; i<array.length; i++) {
array[i] = Integer.parseInt
(bufferReader.readLine());
}
bufferReader.close();
fileIter++;
}catch(IOException ioe) {
ioe.printStackTrace();
System.exit(-1);
}
}
}
private static void intInsertionSort(int[] intArray) {
int comparisonCount = 0;
long startTime = System.currentTimeMillis();
for(int i=1; i<intArray.length;i++) {
int tempValue = intArray[i];
int j = 0;
for(j=i-1; j>=0 && tempValue<intArray[j];j--){
comparisonCount++;
intArray[j+1] = intArray[j];
}
intArray[j+1] = tempValue;
}
long endTime=System.currentTimeMillis();
System.out.println("Comparison Count = " + comparisonCount
+ " running time (in millis) = " +
(endTime-startTime) );
}
}
Well, works for me. Execute it in console like that:
java Sorting init
Then execute it another time:
java Sorting
Works perfectly. If you are in Eclipse go to run configuration > arguments and put init there.
Point is in your main method you are checking if someone invoked the program with init parameter, if yes then you create those files and write to them, if not - you are reading from them. You are probably invoking without init and the files are not there yet, that's why it doesn't work.
The problem with the code below is that, after running the application, "log.txt" is empty. Why?
I looked over the code, but i can't find something wrong.
package Main;
import java.io.*;
import java.util.*;
public class JavaApp1 {
public static void main(String[] args) throws IOException {
File file = new File("log.txt");
PrintWriter Log = new PrintWriter("log.txt");
int Line = 1;
Scanner ScanCycle = new Scanner(System.in);
System.out.println("Cate numere doriti sa fie afisate?");
int Cycle = ScanCycle.nextInt();
Scanner ScanRange = new Scanner(System.in);
System.out.println("Care este numarul maxim dorit?");
int Range = ScanRange.nextInt();
Random Generator = new Random();
for (int idx = 1; idx <= Cycle; ++idx){
int Value = Generator.nextInt(Range);
Log.println("(" + Line + ")" + "Number Generated: " + Value);
Line = Line + 1;
}
}
}
You need to flush your character stream out. Call close()[which internally calls flush()] or flush() on your PrintWriter instance.
PrintWriter log = new PrintWriter("log.txt");
//your code
log.close();
I have 2 files where 1(OrderCatalogue.java) reads in contents of a external file and 2(below). But I'm having the "FileNotFoundException must be caught or declard to be thrown" error for this line "OrderCatalogue catalogue= new OrderCatalogue();" and I understand that because its not in a method. But if i try puting it in a method, the code under the "getCodeIndex" and "checkOut" methods can't work with the error message of "package catalogue does not exist". Anyone has any idea how i can edit my code to make them work? Thank you!!
public class Shopping {
OrderCatalogue catalogue= new OrderCatalogue();
ArrayList<Integer> orderqty = new ArrayList<>(); //Create array to store user's input of quantity
ArrayList<String> ordercode = new ArrayList<>(); //Create array to store user's input of order number
public int getCodeIndex(String code)
{
int index = -1;
for (int i =0;i<catalogue.productList.size();i++)
{
if(catalogue.productList.get(i).code.equals(code))
{
index = i;
break;
}
}
return index;
}
public void checkout()
{
DecimalFormat df = new DecimalFormat("0.00");
System.out.println("Your order:");
for(int j=0;j<ordercode.size();j++)
{
String orderc = ordercode.get(j);
for (int i =0;i<catalogue.productList.size();i++)
{
if(catalogue.productList.get(i).code.equals(orderc))
{
System.out.print(orderqty.get(j)+" ");
System.out.print(catalogue.productList.get(i).desc);
System.out.print(" # $"+df.format(catalogue.productList.get(i).price));
}
}
}
}
And this is my OrderCatalogue file
public OrderCatalogue() throws FileNotFoundException
{
//Open the file "Catalog.txt"
FileReader fr = new FileReader("Catalog.txt");
Scanner file = new Scanner(fr);
while(file.hasNextLine())
{
//Read in the product details in the file
String data = file.nextLine();
String[] result = data.split("\\, ");
String code = result[0];
String desc = result[1];
String price = result[2];
String unit = result[3];
//Store the product details in a vector
Product a = new Product(desc, code, price, unit);
productList.add(a);
}
It seems the OrderCatalogue constructor throws FileNotFoundException. You can initialize catalogue inside Shopping constructor and catch the exception or declare it to throw FileNotFoundException.
public Shopping() throws FileNotFoundException
{
this.catalogue= new OrderCatalogue();
or
public Shopping()
{
try{
this.catalogue= new OrderCatalogue();
}catch(FileNotFoundException e)
blah blah
}