I am new to Hadoop and Java. I need to read from and write to a *.txt file stored on HDFS in my remote Cloudera distribution, and for that I have written this small Java program:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class ReadHadoopFileData {
    public static void main(String[] args) throws IOException, URISyntaxException {
        Configuration configuration = new Configuration();
        FileSystem hdfs = FileSystem.get(new URI("hdfs://admin:H4d00p#172.16.10.124:8888"), configuration);
        Path file = new Path("hdfs://admin:H4d00p#172.16.10.124:8888/user/admin/Data/Tlog.txt");
        try {
            BufferedReader br = new BufferedReader(new InputStreamReader(hdfs.open(file)));
            String line;
            line = br.readLine();
            while (line != null) {
                System.out.println(line);
                line = br.readLine();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
But when the line BufferedReader br = new BufferedReader(new InputStreamReader(hdfs.open(file))); is executed, I run into this error:
java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.; Host Details : local host is: "KWTLT02221/169.254.208.16"; destination host is: "172.16.104.124":8888;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:254)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1220)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1210)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1200)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:271)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:238)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:231)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1498)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:302)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:298)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:298)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at ReadHadoopFileData.main(ReadHadoopFileData.java:26)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
at com.google.protobuf.UnknownFieldSet$Builder.mergeFrom(UnknownFieldSet.java:461)
at com.google.protobuf.UnknownFieldSet$Builder.mergeFrom(UnknownFieldSet.java:579)
at com.google.protobuf.UnknownFieldSet$Builder.mergeFrom(UnknownFieldSet.java:280)
at com.google.protobuf.CodedInputStream.readGroup(CodedInputStream.java:240)
at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:488)
at com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.<init>(RpcHeaderProtos.java:2207)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.<init>(RpcHeaderProtos.java:2165)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$1.parsePartialFrom(RpcHeaderProtos.java:2295)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto$1.parsePartialFrom(RpcHeaderProtos.java:2290)
at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
at org.apache.hadoop.ipc.protobuf.RpcHeaderProtos$RpcResponseHeaderProto.parseDelimitedFrom(RpcHeaderProtos.java:3167)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:966)
Could someone help me get this resolved, please? I have been on this for a day now.
I figured out the solution to this error: I was using the wrong port. I had taken the port number from the HUE URL (misled by several sources).
If I instead use the port defined by the configuration "NameNode Service RPC Port" ("dfs.namenode.servicerpc-address") on the NameNode in Cloudera Manager, it works fine.
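For illustration, a minimal sketch of the corrected connection, assuming the NameNode service RPC port is 8022 (read the real value from dfs.namenode.servicerpc-address in Cloudera Manager; the host and path are the ones from the question):

Configuration configuration = new Configuration();
// Connect on the NameNode RPC port (dfs.namenode.servicerpc-address),
// not the port shown in the HUE web UI URL.
FileSystem hdfs = FileSystem.get(new URI("hdfs://172.16.10.124:8022"), configuration);
Path file = new Path("/user/admin/Data/Tlog.txt");
try (BufferedReader br = new BufferedReader(new InputStreamReader(hdfs.open(file)))) {
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
}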
Related
package com.Main;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
public class Main {
    public static void main(String[] args) throws IOException {
        // Source file in the local file system
        String localSrc = args[0];
        // Destination file in HDFS
        String dst = args[1];
        // Input stream for the file in the local file system to be written to HDFS
        InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
        // Get configuration of the Hadoop system
        Configuration conf = new Configuration();
        System.out.println("Connecting to -- " + conf.get("fs.defaultFS"));
        // Destination file in HDFS
        FileSystem fs = FileSystem.get(URI.create(dst), conf);
        OutputStream out = fs.create(new Path(dst));
        // Copy file from local to HDFS
        IOUtils.copyBytes(in, out, 4096, true);
        System.out.println(dst + " copied to HDFS");
    }
}
I am getting the following error message:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at com.Main.Main.main(Main.java:22)
I have a JSON file on my local machine that I have to move to HDFS.
Ex:
{"Del":"Ef77xvP","time":1509073785106},
{"Del":"2YXsF7r","time":1509073795109}
Specify command-line arguments to your program. Your code snippet expects the first argument to be the source and the second to be the destination.
For more details refer to What is "String args[]"? parameter in main method Java.
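For example, a minimal guard at the top of main makes this failure mode obvious (a sketch; the usage string and jar name are hypothetical):

// Fail fast with a clear message when the arguments are missing.
if (args.length < 2) {
    System.err.println("Usage: hadoop jar app.jar com.Main.Main <localSrc> <hdfsDst>");
    System.exit(1);
}

The program is then invoked with both paths, e.g. hadoop jar app.jar com.Main.Main /home/user/data.json /user/hadoop/data.json.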
I am using Sphinx4 for custom home automation software and I am stuck on using my custom dict file; for some reason it can't find the file, even though I am sure I am using the correct path. This is my code:
package pccomone;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;
public class Main {
    private static final String BODICPATH = "PCCom/src/resource/bodict.dict";

    public static void main(String args[]) throws Exception {
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath(BODICPATH);
        //configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict"); use only for large commands!
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");
        LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);
        recognizer.startRecognition(true);
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            String command = result.getHypothesis();
            if (command.equalsIgnoreCase("stop")) {
                recognizer.stopRecognition();
                System.exit(0);
            }
            System.out.println(command);
        }
    }
}
I didn't get far, since I can't get it to use my custom dict file, and the included dict file is way too large for the software to work fast and accurately.
This is the error:
Exception in thread "main" java.lang.RuntimeException: Allocation of search manager resources failed
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:247)
at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:103)
at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:164)
at edu.cmu.sphinx.api.LiveSpeechRecognizer.startRecognition(LiveSpeechRecognizer.java:47)
at pccomone.Main.main(Main.java:26)
Caused by: java.io.FileNotFoundException: PCCom\src\resource\bodict.dict (The system cannot find the path specified)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at java.io.FileInputStream.<init>(Unknown Source)
at sun.net.www.protocol.file.FileURLConnection.connect(Unknown Source)
at sun.net.www.protocol.file.FileURLConnection.getInputStream(Unknown Source)
at java.net.URL.openStream(Unknown Source)
at java.net.URL.openStream(Unknown Source)
at edu.cmu.sphinx.linguist.dictionary.TextDictionary.allocate(TextDictionary.java:180)
at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:332)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:243)
... 4 more
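Judging from the stack trace (TextDictionary.allocate reads the path through java.net.URL.openStream), the relative path is resolved against the JVM's working directory, not the project folder. A minimal sketch of two alternatives; both file locations are assumptions to adapt to your layout:

// Option 1: an absolute file URL (hypothetical location on disk).
configuration.setDictionaryPath("file:/home/user/PCCom/src/resource/bodict.dict");

// Option 2: resolve the dictionary from the classpath, assuming the build
// copies src/resource/bodict.dict onto the classpath as /resource/bodict.dict.
configuration.setDictionaryPath(Main.class.getResource("/resource/bodict.dict").toString());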
I'm trying to make a program that gets data from here, but an error appears (a 403 error).
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
public class Test {
    public static void main(String[] args) throws IOException {
        URL urlObject;
        String codigo;
        try {
            urlObject = new URL("http://www.pccomponentes.com/intel_core_i5_6600_3_3ghz_box.html");
            InputStreamReader isr = new InputStreamReader(urlObject.openStream());
            BufferedReader br = new BufferedReader(isr);
            while ((codigo = br.readLine()) != null)
                System.out.println(codigo);
            br.close();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
When I run the program, this error appears:
java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.pccomponentes.com/intel_core_i5_6600_3_3ghz_box.html
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.URL.openStream(Unknown Source)
at Test.Test.main(Test.java:17)
The purpose of the program is to get the price of the product and print it with a System.out.println. How can I do that?
I have just tested with curl and it works, but if I set the User-Agent to the one Java uses by default, I get this 403 HTTP error. It seems that the webmaster of this website doesn't like Java :-)
To work around this, simply set another User-Agent like this:
urlObject = new URL("http://www.pccomponentes.com/intel_core_i5_6600_3_3ghz_box.html");
URLConnection c = urlObject.openConnection();
c.setRequestProperty("User-Agent", "<put the user agent of your choice here>");
InputStreamReader isr = new InputStreamReader(c.getInputStream());
If you don't know which User-Agent to use, use your browser's, which you can get from here.
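Putting it together, a complete sketch of the fetch with a browser-like User-Agent (the User-Agent string below is just an example; substitute your own):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class Test {
    public static void main(String[] args) throws Exception {
        URL urlObject = new URL("http://www.pccomponentes.com/intel_core_i5_6600_3_3ghz_box.html");
        URLConnection c = urlObject.openConnection();
        // Pretend to be a browser; the default Java User-Agent is rejected with 403.
        c.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
        try (BufferedReader br = new BufferedReader(new InputStreamReader(c.getInputStream()))) {
            String codigo;
            while ((codigo = br.readLine()) != null) {
                System.out.println(codigo);
            }
        }
    }
}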
I have two systems connected on the network. One of them is running HDFS. I want to create a file there and write data to it from my other machine.
package myorg;
import java.io.*;
import java.util.*;
import java.net.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class Write1 {
    public static void main(String[] args) throws Exception {
        try {
            System.out.println("Starting...");
            Path pt = new Path("hdfs://10.236.173.95:8020/user/jfor/out/gwmdfd");
            FileSystem fs = FileSystem.get(new Configuration());
            BufferedWriter br = new BufferedWriter(new OutputStreamWriter(fs.create(pt, true)));
            // To append data to a file, use fs.append(Path f)
            String line;
            line = "Disha Dishu Daasha dfasdasdawqeqwe";
            System.out.println(line);
            br.write(line);
            br.close();
        } catch (Exception e) {
            System.out.println("File not found");
        }
    }
}
I compiled it using
javac -classpath hadoop-0.20.1-dev-core.jar -d Write1/ Write1.java
Created a jar using
jar -cvf Write1.jar -C Write1/ .
And ran it with
hadoop jar Write1.jar myorg.Write1
If I run this, I get
starting...
File not found
What could be the reason? If I run this program on my Hadoop machine, it works fine [with the IP replaced by localhost].
The error is at the BufferedWriter line. It says "File not found". What does that mean? I used fs.create, so it should create the file if it doesn't exist, shouldn't it? The underlying exception turns out to be:
java.lang.IllegalArgumentException: Wrong FS: hdfs://10.72.40.68:8020/user/jfor/..... expected localhost:8020
So I modified the following line to
FileSystem fs = FileSystem.get(new URI("hdfs://<ip>:8020"),new Configuration());
Now it says Connection refused. What could be the reason?
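Two things are worth noting, sketched below: the blanket catch masks every failure as "File not found" (printing the stack trace is what surfaces the Wrong FS error), and passing the URI to FileSystem.get avoids the mismatch with the local fs.defaultFS. A Connection refused after that usually means the NameNode is not reachable on that host/port from the remote machine, for example because it is bound to localhost only; that has to be checked on the cluster side. A minimal sketch, keeping the IP and path from the question:

package myorg;

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class Write1 {
    public static void main(String[] args) throws Exception {
        System.out.println("Starting...");
        // Name the filesystem explicitly so the local fs.defaultFS
        // (localhost:8020) is not used -- this avoids the "Wrong FS" error.
        FileSystem fs = FileSystem.get(new URI("hdfs://10.236.173.95:8020"), new Configuration());
        Path pt = new Path("/user/jfor/out/gwmdfd");
        // No blanket catch: let the real exception propagate instead of
        // printing "File not found" for every failure.
        BufferedWriter br = new BufferedWriter(new OutputStreamWriter(fs.create(pt, true)));
        br.write("Disha Dishu Daasha dfasdasdawqeqwe");
        br.close();
    }
}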
The code below produces an error when on a LAN behind a proxy, but works properly outside the LAN/proxy.
Please let me know how I can rectify it.
I used the code from Detecting Windows/IE proxy setting using Java to detect the proxy settings, and I am getting: proxy hostname : DIRECT, No Proxy. Does this mean I am not behind a proxy server?
I'm trying to use the Java rome-fetcher to acquire RSS feeds for processing. Everything works fine when I have direct internet access.
However, I need to be able to run my application behind a proxy server.
The code below produces the error when on the LAN, but works properly outside it:
Exception in thread "main" java.net.ConnectException: Connection timed out: connect
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(Unknown Source)
import java.util.Properties;
import java.net.*;
import java.io.*;
import java.io.FileWriter;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Iterator;
import java.util.List;
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.SyndFeedOutput;
import com.sun.syndication.io.XmlReader;
public class RomeLibraryExample {
    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://rss.cnn.com/rss/cnn_topstories.rss");
        //System.setProperty("http.proxyHost", "DIRECT");
        //System.setProperty("http.proxyPort", "8080");
        HttpURLConnection httpcon = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
        // Reading the feed
        SyndFeedInput input = new SyndFeedInput();
        SyndFeed feed = input.build(new XmlReader(httpcon));
        List<SyndEntry> entries = feed.getEntries();
        Iterator<SyndEntry> itEntries = entries.iterator();
        while (itEntries.hasNext()) {
            SyndEntry entry = itEntries.next();
            System.out.println("Title: " + entry.getTitle());
            System.out.println("Link: " + entry.getLink());
            System.out.println("Author: " + entry.getAuthor());
            System.out.println("Publish Date: " + entry.getPublishedDate());
            System.out.println("Description: " + entry.getDescription().getValue());
            System.out.println();
        }
    }
}
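One likely culprit: openConnection(Proxy.NO_PROXY) explicitly bypasses any proxy, so on a LAN that can only reach the internet through one, the connection times out. A minimal sketch of routing the same request through an HTTP proxy; the proxy host and port are placeholders for your LAN's actual values:

import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical proxy address -- substitute your LAN's proxy host and port.
Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("proxy.example.com", 8080));
HttpURLConnection httpcon = (HttpURLConnection) url.openConnection(proxy);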