Copy JSON flat file from local to HDFS - Java

package com.Main;

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class Main {
    public static void main(String[] args) throws IOException {
        // Source file in the local file system
        String localSrc = args[0];
        // Destination file in HDFS
        String dst = args[1];
        // Input stream for the local file to be written to HDFS
        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
        // Get the configuration of the Hadoop system
        Configuration conf = new Configuration();
        System.out.println("Connecting to -- " + conf.get("fs.defaultFS"));
        // Output stream for the destination file in HDFS
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        OutputStream out = fs.create(new Path(dst));
        // Copy the file from local to HDFS (4 KB buffer; close streams when done)
        IOUtils.copyBytes(in, out, 4096, true);
        System.out.println(dst + " copied to HDFS");
    }
}
I am getting the following error message:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at com.Main.Main.main(Main.java:22)
I have a JSON file on my local machine that I have to move to HDFS. For example:
{"Del":"Ef77xvP","time":1509073785106},
{"Del":"2YXsF7r","time":1509073795109}

Specify command line arguments to your program. Your code snippet expects the first argument to be the source and the second to be the destination.
For more details, refer to What is "String args[]"? parameter in main method Java.
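As a minimal sketch of that fix (the guard and usage message below are illustrative additions, not part of the original program), main can fail fast when an argument is missing:
if (args.length < 2) {
    // Reading args[0] from an empty array is exactly what raises ArrayIndexOutOfBoundsException: 0
    System.err.println("Usage: Main <localSrc> <hdfsDst>");
    System.exit(1);
}
String localSrc = args[0];
String dst = args[1];
When launching through Hadoop, the two paths go after the class name, for example: hadoop jar app.jar com.Main.Main /tmp/input.json /user/hduser/input.json (the jar name and paths here are placeholders).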

Related

Writing to Google Persistent Disk

I've written some Java code that creates a CSV file at the mount point of a disk I have attached to a Google Compute Engine instance. I run the script in the form of a SQL stored procedure from the instance that the disk is attached to. The issue is that at the mount point, a "lost+found" folder is created where I would expect to find my CSV file. What am I doing wrong? Thank you for your time! The code is similar to the following:
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class file_write {
    public static void main(String[] args) throws IOException {
        String filePath = "/mnt/point/file.csv";
        // Creates the file at the mount point
        File myFile = new File(filePath);
        myFile.getParentFile().mkdirs();
        myFile.createNewFile();
        FileWriter stringWriter = new FileWriter(myFile);
        for (int i = 0; i < 100000; i++) {
            stringWriter.write(i + ", ");
            stringWriter.write("something");
            stringWriter.write(System.lineSeparator());
        }
        stringWriter.close();
    }
}
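A first diagnostic step (a small sketch added here; MountCheck is a hypothetical helper, not from the question) is to confirm which file store actually backs the mount point, since a lost+found directory normally sits at the root of an ext4 filesystem:
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class MountCheck {
    public static void main(String[] args) throws IOException {
        File f = new File("/mnt/point/file.csv");
        // Resolve symlinks so we see the real location being written to
        System.out.println("Canonical path: " + f.getCanonicalPath());
        // Report the filesystem backing the mount point; if it is the root
        // filesystem rather than the attached disk, the disk is not mounted
        // where the writer expects it
        System.out.println("File store: " + Files.getFileStore(Paths.get("/mnt/point")));
    }
}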

Typo in word "hdfs" gives me "java.io.IOException: No FileSystem for scheme: hdfs", using the FileSystem library with Hadoop 2.7.7

While using FileSystem.get(URI.create("hdfs://localhost:9000/"), configuration) I get the warning "Typo in word hdfs", and when I try to run the code it throws this IOException:
java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2658)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2665)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2701)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2683)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
    at com.oracle.hadoop.client.Test.main(Test.java:53)
I have already tried calling HDFS in different ways; I'm using the Hadoop 2.7.7 libraries.
Here is my current code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.IOUtils;

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

public class Test {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Path of the file to read back from HDFS
        String uri = "hdfs://localhost:9000/";
        InputStream in = null;
        try {
            FileSystem fs = FileSystem.get(URI.create(uri), conf);
            in = fs.open(new Path(uri));
            IOUtils.copyBytes(in, System.out, 4096, false);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
Actually, I just added this Maven dependency: http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-hdfs/2.7.7
to my pom.xml and the problem was solved.
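For reference, those coordinates translate to the following pom.xml entry (version taken from the link above):
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.7</version>
</dependency>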

Copy JSON file from Local to HDFS

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HdfsWriter extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        //String localInputPath = args[0];
        Path outputPath = new Path(args[0]); // argument for the output location in HDFS
        Configuration conf = getConf();
        FileSystem fs = FileSystem.get(conf);
        OutputStream os = fs.create(outputPath);
        // The data set is read from the local file through a buffered input stream
        InputStream is = new BufferedInputStream(new FileInputStream("/home/acadgild/acadgild.txt"));
        // Copy the data set from the input stream to the output stream
        IOUtils.copyBytes(is, os, conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int returnCode = ToolRunner.run(new HdfsWriter(), args);
        System.exit(returnCode);
    }
}
I need to move the data from local to HDFS.
I got the above code from another blog and it's not working; can anyone help me with this?
I also need to parse the JSON using MapReduce, group it by DateTime, and move it to HDFS.
MapReduce is a distributed job-processing framework; for each mapper, "local" means the local filesystem on the node where that mapper is running.
What you want is to read from the local filesystem of a given node, put the data onto HDFS, and then process it via MapReduce (see the mapper sketch below for the grouping step).
There are multiple tools available for copying from one node's local filesystem to HDFS:
hdfs dfs -put <localPath> <hdfsPath> (shell command)
Flume
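For the grouping step, a minimal mapper sketch could key each JSON record by the day derived from its epoch-millisecond "time" field (DateGroupMapper and the regex-based field extraction are illustrative assumptions; a real job would use a JSON library such as Jackson):
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DateGroupMapper extends Mapper<LongWritable, Text, Text, Text> {
    // Pulls the epoch-millisecond value out of lines like {"Del":"Ef77xvP","time":1509073785106}
    private static final Pattern TIME = Pattern.compile("\"time\"\\s*:\\s*(\\d+)");
    private final SimpleDateFormat day = new SimpleDateFormat("yyyy-MM-dd");
    private final Text outKey = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        Matcher m = TIME.matcher(line.toString());
        if (m.find()) {
            // Key the record by its calendar day so the shuffle groups records per day
            outKey.set(day.format(new Date(Long.parseLong(m.group(1)))));
            context.write(outKey, line);
        }
    }
}
The reducer would then receive all records for a given day together and can write each group out to HDFS.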

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: expected: file:///

I am trying to implement the copyFromLocal command using Java; below is my code.
package com.hadoop;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyFromLocal {
    public static void main(String[] args) throws IOException, URISyntaxException {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/usr/hdp/2.3.0.0-2557/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/usr/hdp/2.3.0.0-2557/hadoop/conf/mapred-site.xml"));
        conf.addResource(new Path("/usr/hdp/2.3.0.0-2557/hadoop/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        Path sourcePath = new Path("/root/sample.txt");
        Path destPath = new Path("hdfs://sandbox.hortonworks.com:8020/user/Deepthy");
        if (!(fs.exists(destPath))) {
            System.out.println("No Such destination exists :" + destPath);
            return;
        }
        fs.copyFromLocalFile(sourcePath, destPath);
    }
}
I get the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://sandbox.hortonworks.com:8020/user/Deepthy, expected: file:///
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:305)
    at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:643)
    at com.amal.hadoop.CopyFromLocal.main(CopyFromLocal.java:27)
I added these jars to classpath:
hadoop-0.20.1-core.jar
commons-logging-1.1.3.jar
Kindly suggest where I'm going wrong.
Change the configuration as below:
conf.set("fs.default.name", "hdfs://sandbox.hortonworks.com:8020");
and give a relative path for your destination destPath, like:
Path destPath = new Path("/user/Deepthy");
This will fix the issue.
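Put together, the relevant lines would look like this (a sketch applying the answer's two changes to the question's code):
Configuration conf = new Configuration();
// Point the client at the cluster so paths resolve against HDFS, not file:///
conf.set("fs.default.name", "hdfs://sandbox.hortonworks.com:8020");
FileSystem fs = FileSystem.get(conf);
Path sourcePath = new Path("/root/sample.txt");
// Now relative to the default filesystem configured above
Path destPath = new Path("/user/Deepthy");
fs.copyFromLocalFile(sourcePath, destPath);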

Create and write a file into hdfs from my local machine

I have two systems connected over the network. One of them is running HDFS. I want to create a file on it and write data from my other machine.
package myorg;

import java.io.*;
import java.net.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;

public class Write1 {
    public static void main(String[] args) throws Exception {
        try {
            System.out.println("Starting...");
            Path pt = new Path("hdfs://10.236.173.95:8020/user/jfor/out/gwmdfd");
            FileSystem fs = FileSystem.get(new Configuration());
            BufferedWriter br = new BufferedWriter(new OutputStreamWriter(fs.create(pt, true)));
            // To append data to a file, use fs.append(Path f)
            String line = "Disha Dishu Daasha dfasdasdawqeqwe";
            System.out.println(line);
            br.write(line);
            br.close();
        } catch (Exception e) {
            // Catches every exception and prints a fixed message,
            // which hides the real cause of the failure
            System.out.println("File not found");
        }
    }
}
I compiled it using:
javac -classpath hadoop-0.20.1-dev-core.jar -d Write1/ Write1.java
created a jar using:
jar -cvf Write1.jar -C Write1/ .
and ran it with:
hadoop jar Write1.jar myorg.Write1
If I run this, I am getting:
Starting...
File not found
What could be the reason? If I run this program on my Hadoop machine, it works fine [I replaced the IP with localhost].
The error is at the BufferedWriter line: it says "File not found". What does that mean? I used fs.create, so it should create the file if it doesn't exist, shouldn't it? The actual exception turned out to be:
java.lang.IllegalArgumentException: Wrong FS: hdfs://10.72.40.68:8020/user/jfor/..... expected localhost:8020
So I modified the following line:
FileSystem fs = FileSystem.get(new URI("hdfs://<ip>:8020"), new Configuration());
Now it says Connection refused. What could be the reason?
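"Connection refused" at this point generally means nothing is accepting connections at that IP and port: the NameNode may not be running, may be bound only to localhost, or may be listening on a different port than 8020, so comparing the address against fs.default.name in core-site.xml on the HDFS machine is the usual first check.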
