I have a file:
import java.io.IOException;
import java.nio.file.Paths;
import java.util.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;
public class ViewCount extends Configured implements Tool {
public static void main(String args[]) throws Exception {
int res = ToolRunner.run(new ViewCount(), args);
System.exit(res);
}
public int run(String[] args) throws Exception {
//Path inputPath = new Path(args[0]);
Path inputPath = Paths.get("C:/WorkSpace/input.txt");
Path outputPath = Paths.get("C:/WorkSpace/output.txt");
Configuration conf = getConf();
Job job = new Job(conf, this.getClass().toString());
I'm trying to run the app on Windows. How can I set inputPath and outputPath? The method I use now doesn't work. Before, I had
Path inputPath = new Path(args[0]);
Path outputPath = new Path(args[1]);
and I had to go to the command line. Now I want to run the app from the IDE.
I'm getting
Required:
org.apache.hadoop.fs.Path
Found:
java.nio.file.Path
For Eclipse, you can set the arguments via Run -> Run Configurations -> Arguments.
It should be the same in IntelliJ.
The error tells you that it is expecting an org.apache.hadoop.fs.Path, but instead it receives a java.nio.file.Path.
This means you should drop the java.nio.file.Paths import and let the paths come from org.apache.hadoop.fs, which you already import. IDE import suggestions can sometimes be wrong ;)
Change the import and then use the method that you already had to set the input and output paths. Those arguments are given in Eclipse by right-clicking the project -> Run as -> Run Configurations -> Arguments. The two paths should be whitespace-separated. Apply and run!
For subsequent executions, just run the project.
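A minimal sketch of the corrected run(), assuming the two run-configuration arguments are the input and output paths:
public int run(String[] args) throws Exception {
    // args[0] and args[1] come from Run Configurations -> Arguments,
    // e.g.: C:/WorkSpace/input.txt C:/WorkSpace/output
    Path inputPath = new Path(args[0]);   // org.apache.hadoop.fs.Path, not java.nio.file.Path
    Path outputPath = new Path(args[1]);  // Hadoop treats the output path as a directory
    Configuration conf = getConf();
    Job job = new Job(conf, this.getClass().toString());
    // ... rest of the job setup stays as it was ...
    return job.waitForCompletion(true) ? 0 : 1;
}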
I'm following this example on creating a YARN app via the Java API:
https://github.com/hortonworks/simple-yarn-app
It works fine, but the logs only exist while the application is running; after it finishes, they are gone.
How can I capture them from code? Or maybe enable some option?
You can fetch the logs with LogCLIHelpers by application id after the application has finished:
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.logaggregation.LogCLIHelpers;
import java.io.IOException;
import java.io.PrintStream;
public static void getLogs(YarnConfiguration conf, YarnClientApplication app) throws IOException, YarnException {
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    ApplicationId appId = appContext.getApplicationId();
    // LogCLIHelpers is the helper behind the `yarn logs` CLI
    LogCLIHelpers logCLIHelpers = new LogCLIHelpers();
    logCLIHelpers.setConf(conf);
    FileSystem fs = FileSystem.get(conf);
    Path logFile = new Path("/path/to/log/file.log");
    // write the dump through the FileSystem stream so the created file actually
    // receives the logs; try-with-resources closes the stream
    try (PrintStream printStream = new PrintStream(fs.create(logFile, false))) {
        logCLIHelpers.dumpAllContainersLogs(appId, UserGroupInformation.getCurrentUser().getShortUserName(), printStream);
    }
}
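Note that this relies on YARN log aggregation (yarn.log-aggregation-enable) being turned on; otherwise the container logs are not kept after the application finishes. For context, a hypothetical usage sketch (the client setup is an assumption, not part of the original snippet):
YarnConfiguration conf = new YarnConfiguration();
YarnClient yarnClient = YarnClient.createYarnClient(); // org.apache.hadoop.yarn.client.api.YarnClient
yarnClient.init(conf);
yarnClient.start();
YarnClientApplication app = yarnClient.createApplication();
// ... build the ApplicationSubmissionContext, submit the app and wait for it to finish ...
getLogs(conf, app);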
I tried to include an exe file in a jar application and to run it. The idea is to first extract it to a temporary location and then run the temp exe file. How can I do that? Below is what I have tried. It throws java.io.FileNotFoundException for the source file "ffmpeg.exe". I verified, and the file is included and the directory is correct.
package extractffmpeg;
import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;
import javafx.application.Application;
import javafx.application.Platform;
import javafx.stage.Stage;
import org.apache.commons.io.FileUtils;
public class ExtractFFmpeg extends Application {
public void start(Stage primaryStage) throws IOException, URISyntaxException {
extractExe();
System.out.println("extract successfull");
Platform.exit();
}
public static void main(String[] args) {
launch(args);
}
public void extractExe() throws URISyntaxException, IOException{
final String resourcesPath = "ffmpeg/ffmpeg.exe";
URL url = ExtractFFmpeg.class.getResource(resourcesPath);
File source = new File(url.toString()); // url.toString() is a URL, not a filesystem path
System.out.println("shows url of ffmpeg: " + url.getPath());
System.out.println("shows file of ffmpeg: " + source);
File destination = new File("C:/Users/FNI2Abt/Desktop/ffmpeg.exe");
FileUtils.copyFile(source, destination); // <- throws java.io.FileNotFoundException here
}
}
The idea is to create a self-extracting archive. The archive shall contain both the JAR and the EXE. The JAR contains a class which calls Runtime.exec(...) (or a ProcessBuilder) on the adjacent EXE. Starting there, you can follow this tutorial:
How do I make a self extract and running installer
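A minimal launcher sketch, assuming the archive unpacks ffmpeg.exe into the working directory next to the JAR (the class name and the -version argument are hypothetical):
import java.io.File;
import java.io.IOException;

public class FFmpegLauncher {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumption: the self-extracting archive placed ffmpeg.exe here.
        File exe = new File("ffmpeg.exe");
        if (!exe.exists()) {
            System.err.println("ffmpeg.exe not found next to the launcher");
            System.exit(1);
        }
        Process p = new ProcessBuilder(exe.getAbsolutePath(), "-version")
                .inheritIO() // forward ffmpeg's console output
                .start();
        System.exit(p.waitFor());
    }
}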
I am trying to implement the copyFromLocal command in Java; below is my code.
package com.hadoop;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class CopyFromLocal {
public static void main(String[] args) throws IOException, URISyntaxException {
Configuration conf =new Configuration();
conf.addResource(new Path("/usr/hdp/2.3.0.0-2557/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/usr/hdp/2.3.0.0-2557/hadoop/conf/mapred-site.xml"));
conf.addResource(new Path("/usr/hdp/2.3.0.0-2557/hadoop/conf/hdfs-site.xml"));
FileSystem fs = FileSystem.get(conf);
Path sourcePath = new Path("/root/sample.txt");
Path destPath = new Path("hdfs://sandbox.hortonworks.com:8020/user/Deepthy");
if(!(fs.exists(destPath)))
{
System.out.println("No Such destination exists :"+destPath);
return;
}
fs.copyFromLocalFile(sourcePath, destPath);
}
}
I get the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://sandbox.hortonworks.com:8020/user/Deepthy, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:305)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643)
at com.amal.hadoop.CopyFromLocal.main(CopyFromLocal.java:27)
I added these jars to classpath:
hadoop-0.20.1-core.jar
commons-logging-1.1.3.jar
Kindly suggest where I'm going wrong.
Change the configuration as below:
conf.set("fs.default.name", "hdfs://sandbox.hortonworks.com:8020");
Then give the destination destPath without the scheme and authority, like
Path destPath = new Path("/user/Deepthy");
This will fix the issue.
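Putting both changes together, a minimal sketch of the corrected main (same paths as in the question):
Configuration conf = new Configuration();
conf.addResource(new Path("/usr/hdp/2.3.0.0-2557/hadoop/conf/core-site.xml"));
conf.addResource(new Path("/usr/hdp/2.3.0.0-2557/hadoop/conf/mapred-site.xml"));
conf.addResource(new Path("/usr/hdp/2.3.0.0-2557/hadoop/conf/hdfs-site.xml"));
conf.set("fs.default.name", "hdfs://sandbox.hortonworks.com:8020"); // make HDFS the default FS
FileSystem fs = FileSystem.get(conf);           // now resolves to HDFS, not file:///
Path sourcePath = new Path("/root/sample.txt"); // local file to upload
Path destPath = new Path("/user/Deepthy");      // scheme/authority come from fs.default.name
if (!fs.exists(destPath)) {
    System.out.println("No such destination exists: " + destPath);
    return;
}
fs.copyFromLocalFile(sourcePath, destPath);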
It is possible to update individual files in a JAR file using the jar command as follows:
jar uf TicTacToe.jar images/new.gif
Is there a way to do this programmatically?
I have to rewrite the entire jar file if I use JarOutputStream, so I was wondering if there was a similar "random access" way to do this. Given that it can be done using the jar tool, I had expected there to be a similar way to do it programmatically.
It is possible to update just parts of a JAR file using the Zip File System Provider available since Java 7:
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;
public class ZipFSPUser {
public static void main(String [] args) throws Throwable {
Map<String, String> env = new HashMap<>();
env.put("create", "true");
// locate file system by using the syntax
// defined in java.net.JarURLConnection
URI uri = URI.create("jar:file:/codeSamples/zipfs/zipfstest.zip");
try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {
Path externalTxtFile = Paths.get("/codeSamples/zipfs/SomeTextFile.txt");
Path pathInZipfile = zipfs.getPath("/SomeTextFile.txt");
// copy a file into the zip file
Files.copy(externalTxtFile, pathInZipfile, StandardCopyOption.REPLACE_EXISTING);
}
}
}
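Adapted to the question's jar uf example, a minimal sketch using the same imports as above (the jar's location on disk is an assumption):
URI uri = URI.create("jar:file:/path/to/TicTacToe.jar");
Map<String, String> env = new HashMap<>(); // the jar already exists, so no "create" entry
try (FileSystem jarfs = FileSystems.newFileSystem(uri, env)) {
    // make sure the target directory exists inside the jar
    Files.createDirectories(jarfs.getPath("/images"));
    // equivalent of `jar uf TicTacToe.jar images/new.gif`
    Files.copy(Paths.get("images/new.gif"), jarfs.getPath("/images/new.gif"),
            StandardCopyOption.REPLACE_EXISTING);
}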
Yes, if you use the open-source TrueVFS library you can modify it in this way as well:
https://truevfs.java.net
import java.io.File;
import java.io.IOException;
import java.io.Writer;
import net.java.truevfs.access.TFile;
import net.java.truevfs.access.TFileWriter;

public static void main(String[] args) throws IOException {
    // a TFile addresses an entry inside the archive like a regular file path
    File entry = new TFile("c:/tru6413/server/lib/nxps.jar/dir/second.txt");
    Writer writer = new TFileWriter(entry);
    try {
        writer.write(" this is writing into a file inside an archive");
    } finally {
        writer.close();
    }
}
I'm working with Hadoop and need to create a .jar file combining all of my classes in the /src folder. Every time I try to create it, it appears as WordCount.jar instead of Twitter.jar, which is what I have stated in my code below:
import java.util.Arrays;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Twitter {
public static void runJob(String[] input, String output) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(Twitter.class);
job.setReducerClass(TwitterReducer.class);
job.setMapperClass(TwitterMapper.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(NullWritable.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(NullWritable.class);
Path outputPath = new Path(output);
FileInputFormat.setInputPaths(job, StringUtils.join(input, ","));
FileOutputFormat.setOutputPath(job, outputPath);
outputPath.getFileSystem(conf).delete(outputPath, true);
job.waitForCompletion(true);
}
public static void main(String[] args) throws Exception {
runJob(Arrays.copyOfRange(args, 0, args.length - 1), args[args.length - 1]);
}
}
Therefore I am unsure what is wrong. The files in the .jar itself are exactly the same as in the /src folder.
The name of the jar file has nothing to do with the name of any class in it. You need to check the Ant buildfile, specifically the target that creates the jar. The Ant task that creates the jar file is usually the jar task, and the name of the output file can be specified via its destfile attribute.
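For example, a minimal sketch of such a target (the target name and the build/classes directory are assumptions about your buildfile):
<target name="jar" depends="compile">
    <!-- destfile controls the name of the generated jar -->
    <jar destfile="Twitter.jar" basedir="build/classes"/>
</target>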