Running HBase ImportTsv job remotely - Java

I am trying to run the HBase ImportTsv Hadoop job to load data into HBase from a TSV file. I am using the following code:
Configuration config = new Configuration();
Iterator iter = config.iterator();
while (iter.hasNext()) {
    Object obj = iter.next();
    System.out.println(obj);
}

Job job = new Job(config);
job.setJarByClass(ImportTsv.class);
job.setJobName("ImportTsv");
job.getConfiguration().set("user", "hadoop");
job.waitForCompletion(true);
I am getting this error:
ERROR security.UserGroupInformation: PriviledgedActionException as:E317376 cause:org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=E317376, access=WRITE, inode="staging":hadoop:supergroup:rwxr-xr-x
I don't know how the user name E317376 is being set. It is my Windows machine user, from which I am trying to run this job on a remote cluster. My Hadoop user account on the Linux machine is "hadoop".
When I run this on a Linux machine that is part of the Hadoop cluster, under the hadoop user account, everything works well. But I want to run this job programmatically from a Java web application. Am I doing anything wrong? Please help.

You should have a property like the one below in your mapred-site.xml file:
<property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <value>/user</value>
</property>
It may also be necessary to chmod the /user folder of your DFS file system to 777.
Do not forget to stop/start your JobTracker and TaskTrackers (sh stop-mapred.sh and sh start-mapred.sh).
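If you prefer to do the chmod from Java instead of the hdfs shell, a minimal sketch could look like this (the NameNode URI and the assumption that "hadoop" is a user allowed to change permissions are mine, not taken from the question):
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class ChmodStagingDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "hdfs://namenode:9000" is a placeholder for your actual NameNode URI
        FileSystem fs = FileSystem.get(new URI("hdfs://namenode:9000"), conf, "hadoop");
        // Equivalent of "hadoop fs -chmod 777 /user"
        fs.setPermission(new Path("/user"), new FsPermission((short) 0777));
        fs.close();
    }
}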

I haven't tested these solutions, but try adding something like this to your job configuration:
conf.set("hadoop.job.ugi", "hadoop");
The above may be obsolete, so you can also try the following with the user set to "hadoop" (code from http://hadoop.apache.org/common/docs/r1.0.3/Secure_Impersonation.html):
UserGroupInformation ugi =
        UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser());
ugi.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        // Submit a job
        JobClient jc = new JobClient(conf);
        jc.submitJob(conf);
        // OR access HDFS
        FileSystem fs = FileSystem.get(conf);
        fs.mkdirs(someFilePath);
        return null;
    }
});
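If the cluster is not secured (simple authentication), another option, which I have not tested against your setup, is to run the whole submission as the remote user directly rather than as a proxy user; a rough sketch, assuming the cluster user is "hadoop":
// Sketch only: submits the job as the remote user "hadoop" via simple (non-Kerberos) authentication
UserGroupInformation remoteUser = UserGroupInformation.createRemoteUser("hadoop");
remoteUser.doAs(new PrivilegedExceptionAction<Void>() {
    public Void run() throws Exception {
        // Create and submit the job inside doAs() so the RPC calls carry the "hadoop" identity
        Configuration config = new Configuration();
        Job job = new Job(config);
        job.setJarByClass(ImportTsv.class);
        job.setJobName("ImportTsv");
        job.waitForCompletion(true);
        return null;
    }
});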

Related

Unable to Connect to HBase through Java

I am trying to connect to HBase from Java. HBase version: 1.0.0.
But I am unable to connect to it. Kindly tell me what I am missing, as I am new to HBase.
Here is my code:
public class HbaseAddRetrieveData {
    public static void main(String[] args) throws IOException {
        TableName tableName = TableName.valueOf("stock-prices");
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.master", "LocalHost:60000");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        conf.set("hbase.zookeeper.quorum", "LocalHost");
        conf.set("zookeeper.znode.parent", "/hbase-unsecure");
        System.out.println("Config set");

        Connection conn = ConnectionFactory.createConnection(conf);
        System.out.println("Connection");
        Admin admin = conn.getAdmin();
        if (!admin.tableExists(tableName)) {
            System.out.println("In admin");
            admin.createTable(new HTableDescriptor(tableName).addFamily(new HColumnDescriptor("cf")));
        }

        Table table = conn.getTable(tableName);
        Put p = new Put(Bytes.toBytes("AAPL10232015"));
        p.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("close"), Bytes.toBytes(119));
        table.put(p);

        Result r = table.get(new Get(Bytes.toBytes("AAPL10232015")));
        System.out.println(r);
    }
}
Below is the error I am facing:
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:305)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:131)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:56)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:287)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:267)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:139)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:134)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:823)
at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:601)
at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:365)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:281)
at com.tcs.healthcare.HbaseRetrieveData.main(HbaseRetrieveData.java:32)
Kindly guide me through this
Change the uppercase 'L' and 'H' in "LocalHost" to lowercase:
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.master","localhost:60000");
conf.set("hbase.zookeeper.property.clientPort", "2181");
conf.set("hbase.zookeeper.quorum", "localhost");
conf.set("zookeeper.znode.parent", "/hbase-unsecure");
If you are using a remote machine, then use conf.set("hbase.master", "remotehost:60000");. If it still doesn't work, check that all the (remote) jars on your classpath match the cluster.
If you are using Maven, you can point to the same jar versions that are on the cluster.
On the cluster, you can run the command below to see which jar versions are on the remote machine:
`hbase classpath`
The same jar versions should be on your client machine. For example, if hbase-x.jar is on the remote machine and you are using hbase-y.jar, it won't connect.
Also, check that you can ping the server/cluster from the client machine; there are often firewall restrictions.
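To compare versions from the client side, you can also print the HBase version compiled into your client jars and check it against what the cluster reports (for example via `hbase version`); a small sketch:
import org.apache.hadoop.hbase.util.VersionInfo;

public class PrintClientHBaseVersion {
    public static void main(String[] args) {
        // Version baked into the client-side HBase jars
        System.out.println("HBase client version: " + VersionInfo.getVersion());
    }
}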

Hadoop Remote Copying

I need to copy some files from hdfs:///user/hdfs/path1 to hdfs:///user/hdfs/path2. I wrote Java code to do the job:
ugi = UserGroupInformation.createRemoteUser("hdfs", AuthMethod.SIMPLE);
System.out.println(ugi.getUserName());
conf = new org.apache.hadoop.conf.Configuration();
// TODO: Change IP
conf.set("fs.defaultFS", URL);
conf.set("hadoop.job.ugi", user);
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
// paths = new ArrayList<>();
fs = FileSystem.get(conf);
I am getting all paths for the wildcard as:
fs.globStatus(new Path(regPath));
and copying as:
FileUtil.copy(fs, p, fs, new Path(to + "/" + p.getName()), false, true, conf);
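Put together, the copy step looks roughly like this (a sketch only; fs, conf, regPath and to are the same variables/placeholders used in the snippets above):
FileStatus[] matches = fs.globStatus(new Path(regPath));
for (FileStatus status : matches) {
    Path src = status.getPath();
    // copy within the same FileSystem: do not delete the source, overwrite the destination
    FileUtil.copy(fs, src, fs, new Path(to + "/" + src.getName()), false, true, conf);
}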
However, copying fails with the following message, whereas globStatus executes successfully:
WARN BlockReaderFactory:682 - I/O error constructing remote block reader.
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.110.80.177:50010]
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3044)
at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:744)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:659)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:327)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:574)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:797)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:844)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
Note that I am running the code remotely over the Internet using port forwarding, i.e.:
192.168.1.10[JAVA API] ---> 154.23.56.116:8082[Name Node Public I/P]======10.1.3.4:8082[Name Node private IP]
I guess the following is the reason:
The globStatus query goes to the NameNode and is executed successfully by the NameNode.
The copy request also goes to the NameNode, which returns DataNode addresses such as 10.110.80.177:50010 on other machines; the Java client then tries to send the copy traffic to these DataNodes directly, and since they are not exposed to the outside world I get this error.
Am I right in this deduction? How do I solve the issue? Do I need to create a Java server on the NameNode that collects copy commands and copies the files locally within the cluster?

Hadoop : java.io.IOException: Call to localhost/127.0.0.1:54310 failed on local exception: java.io.EOFException

I am new to Hadoop; I only started with it today.
I want to write a file to the HDFS Hadoop server. I am using Hadoop 1.2.1 on the server. When I run the jps command in the CLI, I can see all the nodes running:
31895 Jps
29419 SecondaryNameNode
29745 TaskTracker
29257 DataNode
This is my sample client code to write the file to HDFS:
public static void main(String[] args) {
    try {
        // 1. Get the instance of Configuration
        Configuration configuration = new Configuration();
        configuration.addResource(new Path("/data/WorkArea/hadoop/hadoop-1.2.1/hadoop-1.2.1/conf/core-site.xml"));
        configuration.addResource(new Path("/data/WorkArea/hadoop/hadoop-1.2.1/hadoop-1.2.1/conf/hdfs-site.xml"));
        // 2. Create an InputStream to read the data from a local file
        InputStream inputStream = new BufferedInputStream(new FileInputStream("/home/local/PAYODA/hariprasanth.l/Desktop/ProjectionTest"));
        // 3. Get the HDFS instance
        FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:54310"), configuration);
        // 4. Open an OutputStream to write the data; this can be obtained from the FileSystem
        OutputStream outputStream = hdfs.create(new Path("hdfs://localhost:54310/user/hadoop/Hadoop_File.txt"),
                new Progressable() {
                    @Override
                    public void progress() {
                        System.out.println("....");
                    }
                });
        try {
            IOUtils.copyBytes(inputStream, outputStream, 4096, false);
        } finally {
            IOUtils.closeStream(inputStream);
            IOUtils.closeStream(outputStream);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The exception I get while running the code:
java.io.IOException: Call to localhost/127.0.0.1:54310 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1063)
at org.apache.hadoop.ipc.Client.call(Client.java:1031)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
at com.sun.proxy.$Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:235)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:275)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:249)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:163)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:283)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:247)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:109)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1792)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:76)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1826)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1808)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:265)
at com.test.hadoop.writefiles.FileWriter.main(FileWriter.java:27)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:760)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:698)
When I debug it, the error happens on the line where I try to connect to the local HDFS server:
FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:54310"), configuration);
As far as I can tell from googling, it points to a version mismatch.
The server version of Hadoop is 1.2.1.
The client jars I am using are:
hadoop-common-0.22.0.jar
hadoop-hdfs-0.22.0.jar
Please tell me the problem, ASAP.
If possible, recommend where I can find the client jars for Hadoop, and name the jars too, please.
Regards,
Hari
It is because the same class is present in different jars, i.e. hadoop-common and hadoop-core contain the same class.
I had gotten confused about which jars to use.
I finally ended up using Apache hadoop-core. It works like a charm.
There is no NameNode running; the problem is with your NameNode. Did you format the NameNode before starting it up?
hadoop namenode -format

Java Client for Secure HBase

Hi, I am trying to write a Java client for secure HBase.
I want to do the kinit from code as well; for that I'm using the UserGroupInformation class.
Can anyone point out where I am going wrong here?
This is the main method I am trying to connect to HBase from.
I have to add the configuration to the Configuration object rather than using the XML files, because the client can be located anywhere.
Please see the code below:
public static void main(String[] args) {
    try {
        System.setProperty(CommonConstants.KRB_REALM, ConfigUtil.getProperty(CommonConstants.HADOOP_CONF, "krb.realm"));
        System.setProperty(CommonConstants.KRB_KDC, ConfigUtil.getProperty(CommonConstants.HADOOP_CONF, "krb.kdc"));
        System.setProperty(CommonConstants.KRB_DEBUG, "true");

        final Configuration config = HBaseConfiguration.create();

        config.set(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION, AUTH_KRB);
        config.set(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION, AUTHORIZATION);
        config.set(CommonConfigurationKeysPublic.FS_AUTOMATIC_CLOSE_KEY, AUTO_CLOSE);
        config.set(CommonConfigurationKeysPublic.FS_DEFAULT_NAME_KEY, defaultFS);
        config.set("hbase.zookeeper.quorum", ConfigUtil.getProperty(CommonConstants.HBASE_CONF, "hbase.host"));
        config.set("hbase.zookeeper.property.clientPort", ConfigUtil.getProperty(CommonConstants.HBASE_CONF, "hbase.port"));
        config.set("hbase.client.retries.number", Integer.toString(0));
        config.set("zookeeper.session.timeout", Integer.toString(6000));
        config.set("zookeeper.recovery.retry", Integer.toString(0));
        config.set("hbase.master", "gauravt-namenode.pbi.global.pvt:60000");
        config.set("zookeeper.znode.parent", "/hbase-secure");
        config.set("hbase.rpc.engine", "org.apache.hadoop.hbase.ipc.SecureRpcEngine");
        config.set("hbase.security.authentication", AUTH_KRB);
        config.set("hbase.security.authorization", AUTHORIZATION);
        config.set("hbase.master.kerberos.principal", "hbase/gauravt-namenode.pbi.global.pvt@pbi.global.pvt");
        config.set("hbase.master.keytab.file", "D:/var/lib/bda/secure/keytabs/hbase.service.keytab");
        config.set("hbase.regionserver.kerberos.principal", "hbase/gauravt-datanode2.pbi.global.pvt@pbi.global.pvt");
        config.set("hbase.regionserver.keytab.file", "D:/var/lib/bda/secure/keytabs/hbase.service.keytab");

        UserGroupInformation.setConfiguration(config);
        UserGroupInformation userGroupInformation = UserGroupInformation.loginUserFromKeytabAndReturnUGI("hbase/gauravt-datanode2.pbi.global.pvt@pbi.global.pvt", "D:/var/lib/bda/secure/keytabs/hbase.service.keytab");
        UserGroupInformation.setLoginUser(userGroupInformation);

        User user = User.create(userGroupInformation);
        user.runAs(new PrivilegedExceptionAction<Object>() {
            @Override
            public Object run() throws Exception {
                HBaseAdmin admins = new HBaseAdmin(config);
                if (admins.isTableAvailable("ambarismoketest")) {
                    System.out.println("Table is available");
                }
                HConnection connection = HConnectionManager.createConnection(config);
                HTableInterface table = connection.getTable("ambarismoketest");
                admins.close();
                System.out.println(table.get(new Get(null)));
                return table.get(new Get(null));
            }
        });
        System.out.println(UserGroupInformation.getLoginUser().getUserName());
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
I'm getting the following exception:
Caused by: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): GSS initiate failed
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.readStatus(HBaseSaslRpcClient.java:110)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:146)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:762)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$600(RpcClient.java:354)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:883)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:880)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:880)
... 33 more
Any pointers would be helpful.
The above works nicely, but I've seen a lot of folks struggle with setting all of the right properties in the Configuration object. There's no definitive list that I've found of exactly what you need and don't need, and it is painfully dependent on your cluster configuration.
The surefire way is to have a copy of your HBase configurations on your classpath, since your client can be anywhere, as you mentioned. Then you can add the resources to your Configuration object without having to specify every property:
Configuration conf = HBaseConfiguration.create();
conf.addResource("core-site.xml");
conf.addResource("hbase-site.xml");
conf.addResource("hdfs-site.xml");
Here are some sources to back this approach:
IBM,
Scalding (Scala)
Also note that this approach doesn't limit you to using the internal ZooKeeper principal and keytab; i.e., you can create keytabs for applications or Active Directory users and leave the internally generated keytabs for the daemons to authenticate among themselves.
Not sure if you still need help. I think the "hadoop.security.authentication" property is missing from your snippet.
I am using the following code snippet to connect to secure HBase (on CDH5). You can give it a try:
config.set("hbase.zookeeper.quorum", zookeeperHosts);
config.set("hbase.zookeeper.property.clientPort", zookeeperPort);
config.set("hadoop.security.authentication", "kerberos");
config.set("hbase.security.authentication", "kerberos");
config.set("hbase.master.kerberos.principal", HBASE_MASTER_PRINCIPAL);
config.set("hbase.regionserver.kerberos.principal", HBASE_RS_PRINCIPAL);
UserGroupInformation.setConfiguration(config);
UserGroupInformation.loginUserFromKeytab(ZOOKEEPER_PRINCIPAL,ZOOKEEPER_KEYTAB);
HBaseAdmin admins = new HBaseAdmin(config);
TableName[] tables = admins.listTableNames();
for(TableName table: tables){
System.out.println(table.toString());
}
In JDK 1.8, you also need to set
System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");

config.set("hbase.zookeeper.quorum", zookeeperHosts);
config.set("hbase.zookeeper.property.clientPort", zookeeperPort);
config.set("hadoop.security.authentication", "kerberos");
config.set("hbase.security.authentication", "kerberos");
config.set("hbase.master.kerberos.principal", HBASE_MASTER_PRINCIPAL);
config.set("hbase.regionserver.kerberos.principal", HBASE_RS_PRINCIPAL);
System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");

UserGroupInformation.setConfiguration(config);
UserGroupInformation.loginUserFromKeytab(ZOOKEEPER_PRINCIPAL, ZOOKEEPER_KEYTAB);

HBaseAdmin admins = new HBaseAdmin(config);
TableName[] tables = admins.listTableNames();
for (TableName table : tables) {
    System.out.println(table.toString());
}
Quote: http://hbase.apache.org/book.html#trouble.client (section 142.9)
I think the best resource is https://scalding.io/2015/02/making-your-hbase-client-work-in-a-kerberized-environment/
To make the code work, you don't have to change any line from what is written at the top of this post; you just have to make your client able to access the full HBase configuration. This simply means changing your runtime classpath to:
/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hbase/conf:target/scala-2.11/hbase-assembly-1.0.jar
This will make everything run smoothly. It is specific to CDH 5.3, but you can adapt it to your cluster configuration.
P.S. There is no need for this:
conf.addResource("core-site.xml");
conf.addResource("hbase-site.xml");
conf.addResource("hdfs-site.xml");
because HBaseConfiguration already has:
public static Configuration addHbaseResources(Configuration conf) {
    conf.addResource("hbase-default.xml");
    conf.addResource("hbase-site.xml");
    // ... (remainder of the method omitted)
    return conf;
}

Reading remote HDFS file with Java

I'm having a bit of trouble with a simple Hadoop install. I've downloaded Hadoop 2.4.0 and installed it on a single CentOS Linux node (virtual machine). I've configured Hadoop for a single node with pseudo-distribution as described on the Apache site (http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/SingleCluster.html). It starts with no issues in the logs, and I can read and write files using the "hadoop fs" commands from the command line.
I’m attempting to read a file from the HDFS on a remote machine with the Java API. The machine can connect and list directory contents. It can also determine if a file exists with the code:
Path p=new Path("hdfs://test.server:9000/usr/test/test_file.txt");
FileSystem fs = FileSystem.get(new Configuration());
System.out.println(p.getName() + " exists: " + fs.exists(p));
The system prints “true” indicating it exists. However, when I attempt to read the file with:
BufferedReader br = null;
try {
    Path p = new Path("hdfs://test.server:9000/usr/test/test_file.txt");
    FileSystem fs = FileSystem.get(CONFIG);
    System.out.println(p.getName() + " exists: " + fs.exists(p));

    br = new BufferedReader(new InputStreamReader(fs.open(p)));
    String line = br.readLine();
    while (line != null) {
        System.out.println(line);
        line = br.readLine();
    }
} finally {
    if (br != null) br.close();
}
this code throws the exception:
Exception in thread "main" org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-13917963-127.0.0.1-1398476189167:blk_1073741831_1007 file=/usr/test/test_file.txt
Googling gave some possible tips, but they all checked out. The DataNode is connected, active, and has enough space. The admin report from hdfs dfsadmin -report shows:
Configured Capacity: 52844687360 (49.22 GB)
Present Capacity: 48507940864 (45.18 GB)
DFS Remaining: 48507887616 (45.18 GB)
DFS Used: 53248 (52 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 1 (1 total, 0 dead)
Live datanodes:
Name: 127.0.0.1:50010 (test.server)
Hostname: test.server
Decommission Status : Normal
Configured Capacity: 52844687360 (49.22 GB)
DFS Used: 53248 (52 KB)
Non DFS Used: 4336746496 (4.04 GB)
DFS Remaining: 48507887616 (45.18 GB)
DFS Used%: 0.00%
DFS Remaining%: 91.79%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Last contact: Fri Apr 25 22:16:56 PDT 2014
The client jars were copied directly from the Hadoop install, so there is no version mismatch there. I can browse the file system with my Java class and read file attributes. I just can't read the file contents without getting the exception. If I try to write a file with the code:
FileSystem fs = null;
BufferedWriter br = null;
System.setProperty("HADOOP_USER_NAME", "root");
try {
    fs = FileSystem.get(new Configuration());
    //Path p = new Path(dir, file);
    Path p = new Path("hdfs://test.server:9000/usr/test/test.txt");
    br = new BufferedWriter(new OutputStreamWriter(fs.create(p, true)));
    br.write("Hello World");
} finally {
    if (br != null) br.close();
    if (fs != null) fs.close();
}
this creates the file but doesn’t write any bytes and throws the exception:
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /usr/test/test.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
Googling for this indicated a possible space issue but from the dfsadmin report, it seems there is plenty of space. This is a plain vanilla install and I can’t get past this issue.
The environment summary is:
SERVER:
Hadoop 2.4.0 with pseudo-distribution (http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/SingleCluster.html)
CentOS 6.5 Virtual Machine 64 bit server
Java 1.7.0_55
CLIENT:
Windows 8 (Virtual Machine)
Java 1.7.0_51
Any help is greatly appreciated.
Hadoop error messages are frustrating. Often they don't say what they mean and have nothing to do with the real issue. I've seen problems like this occur when the client, namenode, and datanode cannot communicate properly. In your case I would pick one of two issues:
Your cluster runs in a VM and its virtualized network access to the client is blocked.
You are not consistently using fully-qualified domain names (FQDN) that resolve identically between the client and host.
The host name "test.server" is very suspicious. Check all of the following:
Is test.server a FQDN?
Is this the name that has been used EVERYWHERE in your conf files?
Can the client and all hosts forward and reverse resolve "test.server" and its IP address and get the same thing?
Are IP addresses being used instead of FQDN anywhere?
Is "localhost" being used anywhere?
Any inconsistency in the use of FQDN, hostname, numeric IP, and localhost must be removed. Never mix them in your conf files or in your client code. Consistent use of FQDNs is preferred. Consistent use of numeric IPs usually also works. Use of unqualified hostnames, localhost, or 127.0.0.1 causes problems.
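As a quick client-side sanity check for the forward and reverse resolution points above, something like this sketch can help ("test.server" is the hostname from the question; adjust to your setup):
import java.net.InetAddress;

public class ResolveCheck {
    public static void main(String[] args) throws Exception {
        // Forward lookup: hostname -> IP as seen from the client
        InetAddress forward = InetAddress.getByName("test.server");
        System.out.println("test.server resolves to " + forward.getHostAddress());

        // Reverse lookup: IP -> canonical hostname as seen from the client
        InetAddress reverse = InetAddress.getByName(forward.getHostAddress());
        System.out.println(forward.getHostAddress() + " resolves back to " + reverse.getCanonicalHostName());
    }
}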
We need to make sure the configuration has fs.default.name set, such as:
configuration.set("fs.default.name","hdfs://ourHDFSNameNode:50000");
Below I've put a piece of sample code:
Configuration configuration = new Configuration();
configuration.set("fs.default.name", "hdfs://ourHDFSNameNode:50000");

FileSystem fs = pt.getFileSystem(configuration);
BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(pt)));
String line = br.readLine();
while (line != null) {
    System.out.println(line);
    line = br.readLine();
}
The answer above is pointing in the right direction. Allow me to add the following:
1. The NameNode does NOT directly read or write data.
2. The client (your Java program using direct access to HDFS) interacts with the NameNode to update the HDFS namespace and retrieve block locations for reading/writing.
3. The client interacts directly with the DataNodes to read/write data.
You were able to list directory contents because hostname:9000 was accessible to your client code. You were doing number 2 above.
To be able to read and write, your client code needs access to the DataNode (number 3). The default port for DataNode DFS data transfer is 50010. Something was blocking your client's communication to hostname:50010, possibly a firewall or SSH tunneling configuration problem.
I was using Hadoop 2.7.2, so maybe you have a different port number setting.
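To confirm that theory from the client side, a quick connectivity probe against the DataNode transfer port can help (a sketch; "test.server" and port 50010 are assumptions based on the question and the defaults):
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class DataNodePortCheck {
    public static void main(String[] args) {
        // Try to open a TCP connection to the DataNode data-transfer port
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress("test.server", 50010), 5000);
            System.out.println("DataNode port 50010 is reachable");
        } catch (IOException e) {
            System.out.println("Cannot reach DataNode port 50010: " + e.getMessage());
        }
    }
}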
