First, let me explain what I am trying to do. I pass one CSV file as the input to the MapReduce job, and the second CSV file is read inside the mapper class. The problem is that the code in the mapper class does not work properly, as shown in the snippet below. I want to combine the two CSV files so I can use several columns from each.
For example, file 1 has BibNum (user account), CheckoutDateTime (book checkout date/time), and ItemType (book item type), and file 2 has BibNum (user account), Title (book title), ItemType, and so on. I want to find out which books are likely to be borrowed in the coming month. I would really appreciate it if you know a way to combine the two CSV files and could enlighten me with any help. If anything in my code is unclear, just let me know and I will try to clarify it.
Path p = new Path("hdfs://0.0.0.0:8020/user/training/Inventory_Sample");
FileSystem fs = FileSystem.get(conf);
BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(p)));
try {
    String BibNum = "Test";
    String line;
    // read each line exactly once; calling readLine() several times per
    // iteration silently skips lines
    while ((line = br.readLine()) != null) {
        if (!line.startsWith("BibNumber")) {
            String[] subject = line.split(",");
            BibNum = subject[0];
        }
    }
Here is the full mapper code:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class StubMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text outkey = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        Path p = new Path("hdfs://0.0.0.0:8020/user/training/Inventory_Sample");
        FileSystem fs = FileSystem.get(conf);
        BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(p)));
        try {
            String BibNum = "Test";
            String line;
            // read each line exactly once; note that this still only keeps
            // the BibNum of the last row in the inventory file
            while ((line = br.readLine()) != null) {
                if (!line.startsWith("BibNumber")) {
                    String[] subject = line.split(",");
                    BibNum = subject[0];
                }
            }
            if (value.toString().startsWith("BibNumber")) {
                return;
            }
            String[] data = value.toString().split(",");
            String BookType = data[2];
            String DateTime = data[5];
            SimpleDateFormat frmt = new SimpleDateFormat("MM/dd/yyyy hh:mm:ss a");
            Date creationDate = frmt.parse(DateTime);
            frmt.applyPattern("dd-MM-yyyy");
            String dateTime = frmt.format(creationDate);
            outkey.set(BibNum + " " + BookType + " " + dateTime);
            context.write(outkey, new IntWritable(1));
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            br.close();
        }
    }
}
You are reading the second CSV file inside the mapper code.
If you want to open a file by path in the mapper, you should be using the Distributed Cache; that is the only way the file gets shipped along with the jar to each node where the map tasks run.
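For reference, here is a minimal sketch of the Distributed Cache route, assuming Hadoop 2.x. The local alias "inventory" and the column layout (Title in the second column) are illustrative guesses, not your actual schema:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheSideJoinMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // BibNum -> Title, built once per task from the cached inventory file
    private final Map<String, String> titleByBibNum = new HashMap<String, String>();

    // In the driver, register the file (the "#inventory" fragment creates a
    // local alias): job.addCacheFile(new URI("/user/training/Inventory_Sample#inventory"));
    @Override
    protected void setup(Context context) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader("inventory"))) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.startsWith("BibNumber")) continue; // skip header
                String[] cols = line.split(",");
                titleByBibNum.put(cols[0], cols[1]); // assumed column layout
            }
        }
    }

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] data = value.toString().split(",");
        // join each checkout record against the cached inventory side
        String title = titleByBibNum.get(data[0]);
        if (title != null) {
            context.write(new Text(data[0] + " " + title), new IntWritable(1));
        }
    }
}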
There is a way to combine the two files, but not in the mapper.
You can try the approach below:
1) Write two separate mappers, one for each input file.
2) Send only the fields you need from each mapper to the reducer.
3) Combine the results in the reducer (since you want to join on a specific key).
You can check out MultipleInputs examples for more.
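As a rough sketch, the driver wiring with MultipleInputs might look like this; CheckoutMapper, InventoryMapper, and JoinReducer are placeholder class names for code you would write yourself, with both mappers emitting (BibNum, tagged record) pairs:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JoinDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "csv join");
        job.setJarByClass(JoinDriver.class);
        // one mapper per input file; both key their output by BibNum
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, CheckoutMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                TextInputFormat.class, InventoryMapper.class);
        // the reducer sees all values for one BibNum together and joins them
        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}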
I have created a class that allows the user to create and store compounds in a HashMap, and now I want to create another class that takes the values stored in that HashMap and saves them to a text file. I'm not sure if this is needed, but here is the code for the first class, which contains the HashMap:
package abi;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Map.Entry;

public class ChemicalComp {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        Map<String, String> data = new HashMap<String, String>();
        while (true) {
            String readinput = br.readLine();
            if (readinput.equals(""))
                break;
            String input = readinput.replaceAll("\"", "");
            String[] array = input.split(", ");
            String compound = array[0];
            String formula = "";
            for (int i = 1; i < array.length; i++) {
                if (!array[i].equals("1")) {
                    formula += array[i];
                }
            }
            data.put(compound, formula);
        }
        if (!data.isEmpty()) {
            @SuppressWarnings("rawtypes")
            Iterator it = data.entrySet().iterator();
            while (it.hasNext()) {
                @SuppressWarnings("rawtypes")
                Map.Entry obj = (Entry) it.next();
                System.out.println(obj.getKey() + ":" + obj.getValue());
            }
        }
    }
}
I'm not too familiar with text files, but I have done some research and this is what I've gotten so far. I know it's pretty basic and that I will probably need some type of getter method, but I'm not sure where to incorporate it into what I have. Here is what I have for the class that writes the text file:
package abi;

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class CompoundManager {
    private String path;
    private boolean append_to_file = false;

    public CompoundManager(String file_path) {
        path = file_path;
    }

    public CompoundManager(String file_path, boolean append_value) {
        path = file_path;
        append_to_file = append_value;
    }

    public void WriteToFile(String textLine) throws IOException {
        FileWriter Compounds = new FileWriter(path, append_to_file);
        PrintWriter print_line = new PrintWriter(Compounds);
        print_line.printf("%s" + "%n", textLine);
        print_line.close();
    }
}
I can't quite follow what your program does, but you can use a BufferedWriter for this.
Just create a try-with-resources block and wrap a FileWriter in a BufferedWriter like this:
try (BufferedWriter bw = new BufferedWriter(new FileWriter(new File("filename.txt")))) {
    // your map is declared as Map<String, String>, so iterate accordingly
    for (Map.Entry<String, String> entry : map.entrySet()) {
        String key = entry.getKey();
        String value = entry.getValue();
        bw.write(key + ": " + value);
        bw.newLine();
    }
} catch (Exception e) {
    e.printStackTrace();
}
The system I'm creating for my project is a Course Registration System in Java.
The problem I'm facing right now is how to append to a particular row (referred to by the student ID) so that the registered module codes end up on that student's line, after the comma.
Every time I try to append, it always appends to the last line of the file.
An example of the text file:
After the registration of modules, I also need to display all modules of that particular student's row for a specific subject.
I've been researching possible solutions.
Some say it would be easier to read the file into an ArrayList, modify it, and write it back to the file.
Could anyone help me solve this problem?
First read in your file.
List<String> lines = Files.readAllLines(Paths.get("/path/to/your/file.txt"), StandardCharsets.UTF_8);
Then find and modify your line; in this example I modify the line that starts with "0327159".
List<String> toWrite = new ArrayList<>();
for (int i = 0; i < lines.size(); i++) {
    String line = lines.get(i);
    if (line.startsWith("0327159")) {
        // Files.write adds the line separator itself, so no "\n" here
        String updated = line.trim() + ", more text";
        toWrite.add(updated);
    } else {
        toWrite.add(line);
    }
}
So now toWrite has all of the lines that you want to write to your file.
Files.write(
        Paths.get("/path/to/outfile.txt"),
        toWrite,
        StandardCharsets.UTF_8,
        StandardOpenOption.CREATE,
        StandardOpenOption.TRUNCATE_EXISTING);
You should really try a JSON-based approach, to make it less clumsy and eliminate confusion. Here's a working example: my code adds a new student every time and adds a new module to each existing student every time. The code is not really optimized, since this is just for illustration.
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ModuleRegistration {
    public static void main(String[] args) throws IOException {
        File file = new File("C:\\MyStudents.txt");
        if (!file.exists())
            file.createNewFile();

        List<String> lines = Files.readAllLines(Paths.get(file.getAbsolutePath()));
        ObjectMapper mapper = new ObjectMapper();
        List<StudentInfo> newLines = new ArrayList<StudentInfo>(2);
        for (String line : lines) {
            StudentInfo info = mapper.readValue(line, StudentInfo.class);
            String modules = info.getModules() == null ? "" : info.getModules();
            if (!"".equals(modules))
                modules += ",";
            modules += "Module" + System.currentTimeMillis();
            info.setModules(modules);
            newLines.add(info);
        }

        StudentInfo info = new StudentInfo();
        long time = System.currentTimeMillis();
        info.setId(time);
        info.setModules("Module" + time);
        info.setName("Name" + time);
        info.setPassword("Password" + time);
        info.setType("Local");
        newLines.add(info);

        try (FileWriter writer = new FileWriter(file, false)) {
            for (StudentInfo i : newLines) {
                writer.write(i.toString());
                writer.write(System.lineSeparator());
            }
        }
        System.out.println("Done");
    }

    static class StudentInfo {
        @JsonProperty("id")
        private long id;
        @JsonProperty("password")
        private String password;
        @JsonProperty("name")
        private String name;
        @JsonProperty("type")
        private String type;
        @JsonProperty("modules")
        private String modules;

        // getters and setters omitted for brevity

        @Override
        public String toString() {
            try {
                return new ObjectMapper().writeValueAsString(this);
            } catch (JsonProcessingException exc) {
                exc.printStackTrace();
                return exc.getMessage();
            }
        }
    }
}
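With this layout every line of MyStudents.txt is a self-contained JSON object, so a stored line would look something like {"id":1496401200000,"password":"...","name":"...","type":"Local","modules":"Module...,Module..."} (the exact value is just whatever timestamp ran), and appending a module to one student is just rewriting one field of one line.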
I'm a little stumped. Currently I am trying to list all of the attached devices on my system in Linux through a small Java app (similar to GParted) I'm working on. My end goal is to get the path to each device so I can format it in my application and perform other actions such as labeling, partitioning, etc.
I currently have the following, which returns the "system root": on Windows this gets the appropriate drives (e.g. "C:/ D:/ ..."), but on Linux it returns "/" since that is its technical root. I was hoping to get the paths to the devices (e.g. "/dev/sda /dev/sdb ...") in an array.
What I'm using now
import java.io.File;

class ListAttachedDevices {
    public static void main(String[] args) {
        File[] paths = File.listRoots();
        for (File path : paths) {
            System.out.println(path);
        }
    }
}
Any help or guidance would be much appreciated. I'm relatively new to SO and I hope this is enough information to cover everything.
Thank you in advance for any help/criticism!
EDIT:
Using part of Phillip's suggestion, I have updated my code to the following. The only problem I am having now is detecting whether the selected file is related to the Linux install (not safe to perform actions on) or is an attached drive (safe to perform actions on):
import java.io.File;
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.FileSystems;
import java.util.ArrayList;
import javax.swing.filechooser.FileSystemView;

class ListAttachedDevices {
    public static void main(String[] args) throws IOException {
        ArrayList<File> dev = new ArrayList<File>();
        for (FileStore store : FileSystems.getDefault().getFileStores()) {
            String text = store.toString();
            int position = text.indexOf("(");
            if (text.substring(position, position + 5).equals("(/dev")) {
                if (text.substring(position, position + 7).equals("(/dev/s")) {
                    String drivePath = text.substring(position + 1, text.length() - 1);
                    File drive = new File(drivePath);
                    dev.add(drive);
                    FileSystemView fsv = FileSystemView.getFileSystemView();
                    System.out.println("is (" + drive.getAbsolutePath() + ") root: "
                            + fsv.isFileSystemRoot(drive));
                }
            }
        }
    }
}
EDIT 2:
Disregard the previous edit; I did not realize it does not detect drives that are not already formatted.
Following Elliott Frisch's suggestion to use /proc/partitions, I've come up with the following answer. (Be warned: this also lists bootable/system drives.)
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.logging.Level;
import java.util.logging.Logger;

class ListAttachedDevices {
    public static void main(String[] args) throws IOException {
        ArrayList<File> drives = new ArrayList<File>();
        BufferedReader br = new BufferedReader(new FileReader("/proc/partitions"));
        try {
            String line = br.readLine();
            while (line != null) {
                if (line.contains("sd")) {
                    int position = line.indexOf("sd");
                    String drivePath = "/dev/" + line.substring(position);
                    File drive = new File(drivePath);
                    drives.add(drive);
                    System.out.println(drive.getAbsolutePath());
                }
                line = br.readLine();
            }
        } catch (IOException e) {
            Logger.getLogger(ListAttachedDevices.class.getName()).log(Level.SEVERE, null, e);
        } finally {
            br.close();
        }
    }
}
Just wondering if anyone knows how to iterate through a CSV file and, based on a set of rules, delete various lines. Alternatively, the lines that satisfy the rules could be added to a new output.csv file.
So far I have managed to read the CSV file and add each line to an ArrayList. Now I need to apply a set of rules to these lines (preferably using an if statement) and delete the lines that do not fit the criteria.
package codeTest;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class Main {
    public static void main(String[] args) throws IOException {
        String filename = "sample.csv";
        try (Stream<String> stream = Files.lines(Paths.get(filename))) {
            stream.forEach(System.out::println);
            try {
                File inputFile = new File("sample.csv");
                File outputFile = new File("Output.txt");
                BufferedReader reader = new BufferedReader(new FileReader(inputFile));
                BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile));
                String strLine;
                java.util.ArrayList<String> list = new java.util.ArrayList<String>();
                while ((strLine = reader.readLine()) != null) {
                    list.add(strLine);
                }
                System.out.println("\nTEST OUTPUT..........................\n");
                Stream<String> lineToRemove = list.stream().filter(x -> x.contains("yes"));
            } catch (Exception e) {
                System.err.println("Error: " + e.getMessage());
            }
        }
    }
}
Any suggestions?
I am in complete coder's block, if there is such a thing.
You can use the Files.write method:
List<String> filtered = Files.lines(Paths.get(filename))
        .filter(x -> x.contains("yes"))
        .collect(Collectors.toList());
Files.write(Paths.get("Output.txt"), filtered);
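One caveat worth noting: Files.lines holds the file handle open until the stream is closed, so a try-with-resources form is slightly safer. A minimal variant of the same idea:

List<String> filtered;
try (Stream<String> lines = Files.lines(Paths.get(filename))) {
    filtered = lines.filter(x -> x.contains("yes"))
                    .collect(Collectors.toList());
}
Files.write(Paths.get("Output.txt"), filtered);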
I am not able to get properties from an ontology file. I am new to ontologies and Apache Jena, and I am not able to use the getProperty method in the proper way.
With the code below I am able to get the classes, but I don't know how to use the getProperty and listObjectsOfProperty methods to get the properties.
package onto1;

import java.io.InputStream;
import org.semarglproject.vocab.OWL;
import com.hp.hpl.jena.ontology.OntClass;
import com.hp.hpl.jena.ontology.OntModel;
import com.hp.hpl.jena.ontology.OntModelSpec;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.util.FileManager;
import com.hp.hpl.jena.util.iterator.ExtendedIterator;

public class ontolo {
    public static void main(String[] args) {
        OntModel model = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
        String inputFileName = "file:///F:/apache-jena-2.12.1/travel.owl";
        InputStream in = FileManager.get().open(inputFileName);
        if (in == null) {
            throw new IllegalArgumentException("File: " + inputFileName + " not found");
        }
        model.read(in, null);

        com.hp.hpl.jena.rdf.model.Property irrr = model.getProperty(OWL.ON_PROPERTIES);
        com.hp.hpl.jena.rdf.model.NodeIterator iter1 = model.listObjectsOfProperty(irrr);
        com.hp.hpl.jena.rdf.model.ResIterator i = model.listSubjectsWithProperty(irrr);

        ExtendedIterator<OntClass> iter = model.listClasses();
        while (iter.hasNext()) {
            System.out.println(iter.next().toString());
        }

        // write it to standard out
        model.write(System.out);
    }
}
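For what it's worth, here is a minimal sketch of how getProperty and listObjectsOfProperty are typically used once the model has been read. The property URI below is only a guess at the travel ontology's namespace; substitute the URI of a property that actually exists in your file:

// look up a property by its full URI (placeholder URI here), then
// iterate over every object that occurs with that property in the model
com.hp.hpl.jena.rdf.model.Property prop =
        model.getProperty("http://www.owl-ontologies.com/travel.owl#hasActivity");
com.hp.hpl.jena.rdf.model.NodeIterator objects = model.listObjectsOfProperty(prop);
while (objects.hasNext()) {
    System.out.println(objects.next());
}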