Access args[0] value in MapReduce


I am trying to run a chain of jobs.
At some point I want to access the arguments of public static void main(String[] args), say args[0], in the mapper.
Is there a way to access those values in the mapper directly, rather than passing them to a function and reading them there?
Alternative Solution
conf.set("args", args[1]);
job1.setJarByClass(BinningDriver.class);
FileSystem fs1 = FileSystem.get(conf);
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(Text.class);
job1.setMapperClass(BinningInput.class);
job1.setInputFormatClass(TextInputFormat.class);
job1.setOutputFormatClass(TextOutputFormat.class);
Path out = new Path(args[1]+"/Indexing"); //Output goes to user output location/indexing
if(fs1.exists(out)){
fs1.delete(out,true);
}
FileInputFormat.addInputPath(job1, new Path(args[0]));
FileOutputFormat.setOutputPath(job1, out);
}
Mapper
public void setup(Context context){
Configuration conf = context.getConfiguration();
String param = conf.get("args");
System.out.println("args:"+param);
}
This Works

args is the parameter of the main function of the Driver class, so it is only in scope inside main. To use these values in the mapper, you need to pass them along explicitly, for example by setting them on the job configuration (for simple values) or by adding a file to the Distributed Cache (for whole files), and then reading them back in the mapper.
If you simply want to pass some parameters, check this article, and replace "123" with args[2], or whichever argument you are interested in.
If you want to pass a whole file for processing, do the following:
Example:
main method in the Driver class:
public static void main(String[] args) {
...
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
...
try {
DistributedCache.addCacheFile(new URI(args[2]), conf);
} catch (URISyntaxException e) {
System.err.println(e.toString());
}
....
}
In the Mapper, before the map() method, define the configure method (I am using Hadoop 1.2.0):
Set<String> lines;
public void configure(JobConf job){
lines = new HashSet<>();
BufferedReader SW;
try {
Path[] localFiles = DistributedCache.getLocalCacheFiles(job);
SW = new BufferedReader(new FileReader(localFiles[0].toString()));
lines.add(SW.readLine());
SW.close();
} catch (FileNotFoundException e) {
System.err.println(e.toString());
} catch (IOException e) {
System.err.println(e.toString());
}
}
For more information on how to use the Distributed Cache, see the API:
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/filecache/DistributedCache.html

Related

How to create an instance of newly added class in java at runtime

I have to write a different class to read each different kind of file. The project is now deployed on the client side, and we have to support new file types. For each one we create a new class and also modify the service class to create an object of the newly added class. Writing a new class for a new file type is fine, but I do not want to change the service class each time. Is there any solution for this kind of problem? Thanks in advance.
Update 1: here is code of service class
@Service("StockistServiceImpl")
public class StockistServiceImpl implements StockistService {
@Override
@Transactional(propagation = Propagation.REQUIRED,rollbackFor=Exception.class)
public JSONArray saveStockistOrder(Integer stockistId,
MultipartFile[] orderFile, String orderNumber, String orderDate,
String partyCode,String order,Integer userId)
{
List<Pair<String, Integer>> charList = new ArrayList<Pair<String, Integer>>();
Properties code = new Properties();
try {
code.load(StockistServiceImpl.class.getClassLoader().getResourceAsStream("categoryOfFile.properties"));
}
catch (IOException e) {
//System.out.println("error in loading divisionNamePdfCode.properties");
e.printStackTrace();
}
String readDuelListedTxtFile = code.getProperty("readDuelListedTxtFile");
String readStartLineLengthForOrderTxtFile = code.getProperty("readStartLineLengthForOrderTxtFile");
String ReadFileWithNoStartLineTxtFile = code.getProperty("ReadFileWithNoStartLineTxtFile");
String ReadStartLineLengthForQtySingleListTxtFile = code.getProperty("ReadStartLineLengthForQtySingleListTxtFile");
if (readDuelListedTxtFile.contains(partyCode
.trim())) {
charList.addAll(dualListText
.readDuelListedTxtFile(
fileName, codeDetails));
}
else if (readStartLineLengthForOrderTxtFile.contains(partyCode
.trim())) {
charList.addAll(lineLength
.readStartLineLengthForOrderTxtFile(
fileName, codeDetails));
}
else if (ReadFileWithNoStartLineTxtFile.contains(partyCode
.trim())) {
T_FileWithNoStartLine noStartLine = new T_FileWithNoStartLine();
charList.addAll(noStartLine
.readFileWithNoStartLineTxtFile(
fileName, codeDetails));
}
else if (ReadStartLineLengthForQtySingleListTxtFile.contains(partyCode
.trim())) {
T_StartLineLengthForQtySingleList noStartLine = new T_StartLineLengthForQtySingleList();
charList.addAll(noStartLine
.readStartLineLengthForQtySingleListTxtFile(
fileName, codeDetails));
}
}
Update 2: here is the property file that tells us which file type each stockist uses.
#fileType,stockistCode
fileType1=ST001,ST009
fileType2=ST002,ST005,ST006
fileType3=ST003,ST007
fileType4=ST004,ST008
I want to add a new property file like this, mapping each file type to a class name, so that when a new class is added we do not have to edit the service class.
#fileType,fullyqualifiedclassName
fileType1=FullyQualifiedClassName1
fileType2=FullyQualifiedclassName2
fileType3=FullyQualifiedClassName3
fileType4=FullyQualifiedclassName4

Separate the creation of the file reader objects from the service class.
public class BuildFileReader {
FileReader getReader(String xyz) {
FileReader reader;
...
// your logic for choosing the reader goes here
reader = new WhatEverReaderYouWant();
...
return reader;
}
}
The service class simply asks the BuildFileReader which FileReader to use and doesn't need to change anymore.
public class StockistServiceImpl {
...
BuildFileReader bfr = new BuildFileReader();
FileReader fileReader = bfr.getReader(xyz);
fileReader.readFile(fileName, codeDetails);
...
}
If you need only one type of file reader per client, you could configure your BuildFileReader for each client.
If you need more than one type of file reader per client, define an interface for each type and add a getReaderXYZ() function for each needed type in BuildFileReader.
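To make the separation above a bit more concrete, here is a minimal sketch. It assumes a hypothetical common interface OrderFileReader and two hypothetical reader classes; none of these names come from the question, and the file type keys are only borrowed from the property file for illustration. The factory keeps a small registry, so adding a reader means adding one line there and leaving the service untouched.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical common interface; the real reader classes would implement it.
interface OrderFileReader {
    List<String> readFile(String fileName);
}

// Hypothetical readers standing in for the real parsing classes.
class DualListReader implements OrderFileReader {
    @Override
    public List<String> readFile(String fileName) {
        return Collections.singletonList("dual:" + fileName);
    }
}

class NoStartLineReader implements OrderFileReader {
    @Override
    public List<String> readFile(String fileName) {
        return Collections.singletonList("noStartLine:" + fileName);
    }
}

// The factory owns the mapping from file type to reader.
class ReaderFactory {
    private final Map<String, Supplier<OrderFileReader>> registry = new HashMap<>();

    ReaderFactory() {
        // Assumed keys, mirroring the fileTypeN names in the property file.
        registry.put("fileType1", DualListReader::new);
        registry.put("fileType3", NoStartLineReader::new);
    }

    // Returns a fresh reader for the given file type, or fails loudly if none is registered.
    OrderFileReader getReader(String fileType) {
        Supplier<OrderFileReader> supplier = registry.get(fileType);
        if (supplier == null) {
            throw new IllegalArgumentException("No reader registered for " + fileType);
        }
        return supplier.get();
    }
}
```

The service would then only call new ReaderFactory().getReader(fileType).readFile(fileName) and never mention a concrete reader class.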

An instance can be created at runtime using reflection in Java. Please have a look at the post below:
Creating an instance using the class name and calling constructor
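As a rough illustration of the reflection approach, the sketch below loads a class by its fully qualified name and calls its no-argument constructor. The TxtFileReader interface and SampleReader class are hypothetical stand-ins, and the sketch assumes every reader class has an accessible no-argument constructor.

```java
// Hypothetical interface that every reader class is assumed to implement.
interface TxtFileReader {
    String describe();
}

// Hypothetical implementation with a public no-argument constructor.
class SampleReader implements TxtFileReader {
    public SampleReader() { }

    @Override
    public String describe() {
        return "sample reader";
    }
}

class ReflectiveLoader {
    // Loads the named class and instantiates it through its no-argument constructor.
    static TxtFileReader create(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            Object instance = clazz.getDeclaredConstructor().newInstance();
            return (TxtFileReader) instance;
        } catch (ReflectiveOperationException e) {
            // Covers a missing class, a missing constructor, and instantiation failures.
            throw new IllegalStateException("Cannot create " + className, e);
        }
    }

    public static void main(String[] args) {
        // In a real setup the name would come from a properties file.
        TxtFileReader reader = create(SampleReader.class.getName());
        System.out.println(reader.describe());
    }
}
```

Using getDeclaredConstructor().newInstance() instead of Class.newInstance() avoids a method that has been deprecated since Java 9.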

Finally, after some code changes and adding a property file that maps file types to class names, here is the code, which works fine.
@Service("StockistServiceImpl")
public class StockistServiceImpl implements StockistService {
List<Pair<String, Integer>> charList = new ArrayList<Pair<String, Integer>>();
Map<String,String> mapTxtFile = new HashMap<String, String>();
Properties fileTypeProperties = new Properties();
Properties matchClassNameProperties = new Properties();
try {
fileTypeProperties.load(StockistServiceImpl.class.getClassLoader().getResourceAsStream("fileTypeProperties.properties"));
}
catch (IOException e) {
//e.printStackTrace();
}
try {
matchClassNameProperties.load(StockistServiceImpl.class.getClassLoader().getResourceAsStream("matchClassNameProperties.properties"));
}
catch (IOException e) {
//e.printStackTrace();
}
for (String key : fileTypeProperties.stringPropertyNames()) {
String value = fileTypeProperties.getProperty(key);
mapTxtFile.put(key, value);
if(value.contains(partyCode.trim())){
String className = matchClassNameProperties.getProperty(key);
try {
Class clazz = Class.forName(className);
try {
TxtFile objToReadTxtFile = (TxtFile) clazz.newInstance();
charList= objToReadTxtFile.readTxtFile(fileName, codeDetails);
} catch (InstantiationException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IllegalAccessException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
} catch (ClassNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}else{
//read normally else block
}
}
}
Now it is working fine. For that, I created an interface for reading txt files, which declares a readTxtFile method, and all the reader classes now implement this interface.

How to use a Path object as a String

I'm trying to create a Java trivia application that reads the trivia from separate question files in a given folder. My idea was to use the run() method in the FileHandler class to put every text file in the folder into a map with integer keys, so that I could easily randomize the order in which they appear in the game. I found a simple chunk of code that steps through the folder and gets the path of every file, but as Path objects. I need the paths (or just the names) as Strings, because I later need to turn them into File objects (whose constructor accepts a String, not a Path). Here is the chunk of code that walks through the folder:
public class FileHandler implements Runnable{
static Map<Integer, Path> TriviaFiles; //ideally Map<Integer, String>
private int keyChoices = 0;
public FileHandler(){
TriviaFiles = new HashMap<Integer, Path>();
}
public void run(){
try {
Files.walk(Paths.get("/home/chris/JavaWorkspace/GameSpace/bin/TriviaQuestions")).forEach(filePath -> {
if (Files.isRegularFile(filePath)) {
TriviaFiles.put(keyChoices, filePath);
keyChoices++;
System.out.println(filePath);
}
});
} catch (FileNotFoundException e) {
System.out.println("File not found for FileHandler");
} catch (IOException e ){
e.printStackTrace();
}
}
public static synchronized Path getNextValue(){
return TriviaFiles.get(2);
}
}
There is another class named TextHandler() which reads the individual txt files and turns them into questions. Here it is:
public class TextHandler {
private String A1, A2, A3, A4, question, answer;
//line = null;
public void determineQuestion(){
readFile("Question2.txt" /* in file que*/);
WindowComp.setQuestion(question);
WindowComp.setAnswers(A1,A2,A3,A4);
}
public void readFile(String toRead){
try{
File file = new File("/home/chris/JavaWorkspace/GameSpace/bin/TriviaQuestions",toRead);
System.out.println(file.getCanonicalPath());
FileReader fr = new FileReader(file);
BufferedReader br = new BufferedReader(fr);
question = br.readLine();
A1 = br.readLine();
A2 = br.readLine();
A3 = br.readLine();
A4 = br.readLine();
answer = br.readLine();
br.close();
}
catch(FileNotFoundException e){
System.out.println("file not found");
}
catch(IOException e){
System.out.println("error reading file");
}
}
}
I left out some parts of TextHandler that are not important here.
My idea was to have the determineQuestion() method call readFile(FileHandler.getNextValue()).
I am just having trouble working around the Path to String discrepancy.
Thanks a bunch.

You can simply use Path.toString(), which returns the full path as a String. Note that if the path reference is null, calling this method throws a NullPointerException. To avoid that, you can use String.valueOf(path) instead, which returns "null" for a null reference.
public class Test {
public static void main(String[] args) {
Path path = Paths.get("/my/test/folder/", "text.txt");
String str = path.toString();
// String str = String.valueOf(path); //This is Null Safe
System.out.println(str);
}
}
Output (on Windows; on Linux or macOS the separators are forward slashes)
\my\test\folder\text.txt
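For the trivia use case, a few related conversions may be handy. This is only a sketch: the class name PathConversions and the relative path used in main are made up for illustration. It shows the full path as text, just the file name, and the direct Path to File conversion, which avoids going through a String at all.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

class PathConversions {
    // Full path as text; String.valueOf tolerates a null Path and yields "null".
    static String fullPath(Path path) {
        return String.valueOf(path);
    }

    // Only the last path element, i.e. the file name without its directories.
    static String fileName(Path path) {
        return path.getFileName().toString();
    }

    // Path converts directly to java.io.File, so no String round trip is needed.
    static File asFile(Path path) {
        return path.toFile();
    }

    public static void main(String[] args) {
        Path path = Paths.get("TriviaQuestions", "Question2.txt");
        System.out.println(fullPath(path));
        System.out.println(fileName(path));
        System.out.println(asFile(path).getName());
    }
}
```

With this, the map in FileHandler could store fileName(filePath) as a String, or readFile could take the Path and call toFile() on it.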

How to run the same class in java with multiple different input files automatically

How can I run the same Java class with different command line options without manually changing those options each time?
Basically, for inputFile and treeFile, I have more than 100 different combinations of the two files. I cannot use "Edit Configurations" in IntelliJ to get the result manually for each combination of treeFile and inputFile.
Could anybody suggest how to loop over those inputFile and treeFile values so that I do not need to specify them manually for each combination?
Your help is highly appreciated!
@Option(gloss="File of provided alignment")
public File inputFile;
@Option(gloss="File of the tree topology")
public File treeFile;
My java class code is below:
public class UniformizationSample implements Runnable
{
@Option(gloss="File of provided alignment")
public File inputFile;
@Option(gloss="File of the tree topology")
public File treeFile;
@Option(gloss="ESS Experiment Number")
public int rep = 1;
@Option(gloss="Rate Matrix Method")
public RateMtxNames selectedRateMtx = RateMtxNames.POLARITYSIZEGTR;
@Option(gloss = "True rate matrix generating data")
public File rateMtxFile;
@Option(gloss="Use cache or not")
public boolean cached=true;
private final PrintWriter detailWriter = BriefIO.output(Results.getFileInResultFolder("experiment.details.txt"));
public void run() {
ObjectMapper mapper = new ObjectMapper();
double[][] array;
EndPointSampler.cached=cached;
try (FileInputStream in = new FileInputStream(rateMtxFile)) {
array = mapper.readValue(in, double[][].class);
long startTime = System.currentTimeMillis();
UnrootedTreeLikelihood<MultiCategorySubstitutionModel<ExpFamMixture>> likelihood1 =
UnrootedTreeLikelihood
.fromFastaFile(inputFile, selectedRateMtx)
.withSingleRateMatrix(array)
.withExpFamMixture(ExpFamMixture.rateMtxModel(selectedRateMtx))
.withTree(treeFile);
Random rand = new Random(1);
likelihood1.evolutionaryModel.samplePosteriorPaths(rand, likelihood1.observations, likelihood1.tree);
logToFile("Total time in seconds: " + ((System.currentTimeMillis() - startTime) / 1000.0));
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (JsonMappingException e) {
} catch (IOException e) {
e.printStackTrace();
}
}
public static void main(String [] args)
{
Mains.instrumentedRun(args, new UniformizationSample());
}
public void logToFile(String someline) {
this.detailWriter.println(someline);
this.detailWriter.flush();
}
}

There is no way to do this in IntelliJ IDEA. However, you can modify your UniformizationSample class so that it will take the input data as method parameters, and write another Java class that will loop through your inputs and call your class with the necessary parameters.
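A rough sketch of such a driver class is below. It assumes that "combinations" means every input file paired with every tree file; if the files are matched one to one instead, a single loop over an index would do. BatchRunner, Combination and the file names are all hypothetical, and the actual call into UniformizationSample is left as a comment because it depends on how that class is refactored.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BatchRunner {
    // One (inputFile, treeFile) pair to hand to the experiment.
    static final class Combination {
        final File inputFile;
        final File treeFile;

        Combination(File inputFile, File treeFile) {
            this.inputFile = inputFile;
            this.treeFile = treeFile;
        }
    }

    // Pairs every input file with every tree file, inputs in the outer loop.
    static List<Combination> allCombinations(List<File> inputs, List<File> trees) {
        List<Combination> result = new ArrayList<>();
        for (File input : inputs) {
            for (File tree : trees) {
                result.add(new Combination(input, tree));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Placeholder file names; real ones could be listed from a directory.
        List<File> inputs = Arrays.asList(new File("align1.fasta"), new File("align2.fasta"));
        List<File> trees = Arrays.asList(new File("tree1.newick"), new File("tree2.newick"));

        for (Combination c : allCombinations(inputs, trees)) {
            // Here one would set inputFile and treeFile on a new
            // UniformizationSample (or pass them as parameters) and call run().
            System.out.println(c.inputFile + " + " + c.treeFile);
        }
    }
}
```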

Run BASH command in JAVA in background

I made a function that executes a command from Bash, and I want it to run in the background without ever blocking the main program.
I could use screen -AmdS screen_thread123 php script.php, but the main idea is for me to learn and understand how threads work.
I have basic knowledge of this, but right now I want to create a quick dynamic thread like the example below:
public static void exec_command_background(String command) throws IOException, InterruptedException
{
List<String> listCommands = new ArrayList<String>();
String[] arrayExplodedCommands = command.split(" ");
// it should work also with listCommands.addAll(Arrays.asList(arrayExplodedCommands));
for(String element : arrayExplodedCommands)
{
listCommands.add(element);
}
new Thread(new Runnable(){
public void run()
{
try
{
ProcessBuilder ps = new ProcessBuilder(listCommands);
ps.redirectErrorStream(true);
Process p = ps.start();
p.waitFor();
}
catch (IOException e)
{
}
finally
{
}
}
}).start();
}
and it gives me this error
NologinScanner.java:206: error: local variable listCommands is accessed from within inner class; needs to be declared final
ProcessBuilder ps = new ProcessBuilder(listCommands);
1 error
Why is that, and how can I solve it? I mean, how can I access the variable listCommands from this block?
new Thread(new Runnable(){
public void run()
{
try
{
// code here
}
catch (IOException e)
{
}
finally
{
}
}
}).start();
}
Thanks.

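You don't need that inner class (and you don't want to call waitFor), because ProcessBuilder.start() launches the process and returns without waiting for it to finish. Just use: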
for(String element : arrayExplodedCommands)
{
listCommands.add(element);
}
ProcessBuilder ps = new ProcessBuilder(listCommands);
ps.redirectErrorStream(true);
Process p = ps.start();
// That's it.
As for your question about accessing the variable listCommands in your original block: make the reference final, like so (since Java 8 it is enough for the variable to be effectively final, that is, never reassigned):
final List<String> listCommands = new ArrayList<String>();
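If you still want the thread version for learning purposes, here is a minimal sketch with the final variable in place. The class and method names are made up, and the command is split on whitespace only, so quoted arguments are not handled. The child process inherits the parent's standard streams so that it cannot stall on a full output pipe.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BackgroundCommand {
    // Splits on whitespace; quoted arguments are not handled in this sketch.
    static List<String> toCommandList(String command) {
        return new ArrayList<>(Arrays.asList(command.trim().split("\\s+")));
    }

    // Starts the command on a separate thread and returns that thread immediately.
    static Thread runInBackground(String command) {
        // final so the anonymous Runnable below may capture it (required before Java 8).
        final List<String> listCommands = toCommandList(command);
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    ProcessBuilder builder = new ProcessBuilder(listCommands);
                    // Share the parent's stdin/stdout/stderr with the child process.
                    builder.inheritIO();
                    // Only the worker thread waits; the caller is not blocked.
                    builder.start().waitFor();
                } catch (IOException e) {
                    System.err.println("Could not start command: " + e.getMessage());
                } catch (InterruptedException e) {
                    // Restore the interrupt flag so callers can observe it.
                    Thread.currentThread().interrupt();
                }
            }
        });
        worker.start();
        return worker;
    }
}
```

Calling BackgroundCommand.runInBackground("php script.php") would return right away while the worker thread waits for the process.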

Finding the cluster that an instance got assigned to with Weka

I'm using an EM clusterer with an AddCluster filter in order to see which instances are assigned to the different clusters after training. Below is the code that I'm using. I'm fairly sure that I am applying the filter correctly, but once I have the new Instances I still don't know how to get the cluster info from them. I'm sure it's just a simple getBlah() call, but I'm not finding it. Thanks in advance.
public Cluster()
{
clusterer = new EM();
filter = new AddCluster();
try
{
clusterer.setMaxIterations(100);
clusterer.setNumClusters(20);
filter.setClusterer(clusterer);
}
catch (Exception e)
{
e.printStackTrace();
}
}
public void buildCluster(String fileName)
{
try
{
DataSource source = new DataSource(fileName);
inst = source.getDataSet();
filter.setInputFormat(inst);
inst = AddCluster.useFilter(inst, filter);
}
catch (Exception e)
{
e.printStackTrace();
}
}

I think you should use the Dictionary class. Here is my example code:
Enumeration clusteredInst = data_to_use.enumerateInstances();
Dictionary<Integer, ArrayList<Instance>> clusteredSamples = new Hashtable<>();
while (clusteredInst.hasMoreElements()) {
Instance ins = (Instance) clusteredInst.nextElement();
int clusterNumb = em.clusterInstance(ins);
ArrayList<Instance> cls = null;
cls = clusteredSamples.get(clusterNumb);
if (cls != null) {
cls.add(ins);
} else {
cls = new ArrayList<>();
cls.add(ins);
//you add elements to dictionary using put method
//put(key, value)
clusteredSamples.put(clusterNumb, cls);
}
}
You can also retrieve your data from the dictionary by its key.
