I had a specific filtering problem (described here: Pig - How to manipulate and compare dates?), so, as I was advised, I decided to write my own filtering UDF. Here is the code:
import java.io.IOException;
import org.apache.pig.FilterFunc;
import org.apache.pig.data.Tuple;
import org.joda.time.*;
import org.joda.time.format.*;
public class DateCloseEnough extends FilterFunc {
int nbmois;
/*
* @param nbmois_ if the number of months between two dates is at most this value, the two dates are considered close
*/
public DateCloseEnough(String nbmois_) {
nbmois = Integer.valueOf(nbmois_);
}
public Boolean exec(Tuple input) throws IOException {
// We're getting the date
String date1 = (String)input.get(0);
// We convert it into date
final DateTimeFormatter dtf = DateTimeFormat.forPattern("MM yyyy");
LocalDate d1 = new LocalDate();
d1 = LocalDate.parse(date1, dtf);
d1 = d1.withDayOfMonth(1);
// We're getting today's date
DateTime today = new DateTime();
int mois = today.getMonthOfYear();
String real_mois;
if(mois >= 1 && mois <= 9) real_mois = "0" + mois;
else real_mois = "" + mois;
LocalDate d2 = new LocalDate();
d2 = LocalDate.parse(real_mois + " " + today.getYear(), dtf);
d2 = d2.withDayOfMonth(1);
// Number of months between these two dates
String nb_months_between = "" + Months.monthsBetween(d1,d2);
return (Integer.parseInt(nb_months_between) <= nbmois);
}
}
I created a Jar file of this code from Eclipse.
I'm filtering my data with these lines of Pig Latin:
REGISTER Desktop/myUDFs.jar
DEFINE DateCloseEnough DateCloseEnough('12');
experiences1 = LOAD '/home/training/Desktop/BDD/experience.txt' USING PigStorage(',') AS (id_cv:int, id_experience:int, date_deb:chararray, date_fin:chararray, duree:int, contenu_experience:chararray);
experiences = FILTER experiences1 BY DateCloseEnough(date_fin);
I'm launching my program with this Linux command:
pig -x local "myScript.pig"
And I get this error:
2013-06-19 07:27:17,253 [main] INFO org.apache.pig.Main - Logging error messages to: /home/training/pig_1371652037252.log
2013-06-19 07:27:17,933 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/joda/time/ReadablePartial Details at logfile: /home/training/pig_1371652037252.log
I checked the log file and saw this:
Pig Stack Trace
ERROR 2998: Unhandled internal error. org/joda/time/ReadablePartial
java.lang.NoClassDefFoundError: org/joda/time/ReadablePartial
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:441)
at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:471)
at org.apache.pig.impl.PigContext.instantiateFuncFromAlias(PigContext.java:544)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.EvalFuncSpec(QueryParser.java:4834)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.PUnaryCond(QueryParser.java:1949)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.PAndCond(QueryParser.java:1790)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.POrCond(QueryParser.java:1734)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.PCond(QueryParser.java:1700)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.FilterClause(QueryParser.java:1548)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1276)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:893)
at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:682)
at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1031)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:981)
at org.apache.pig.PigServer.registerQuery(PigServer.java:383)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:717)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:273)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:320)
Caused by: java.lang.ClassNotFoundException: org.joda.time.ReadablePartial
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
... 24 more
I tried to modify my PIG_CLASSPATH variable, but I found that this variable isn't set at all (other Pig scripts run fine, though).
Do you have any idea how to solve the problem?
Thanks.
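First, you need to tell Pig about the jars your UDF depends on, not only your own jar. See this answer: how to include external jar file using PIG. Adding Joda-Time to the build path in Eclipse is not enough: a plain JAR export from Eclipse does not bundle the libraries on your build path, so the jar it generates will not contain the Joda-Time classes.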
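To confirm whether the jar you exported actually contains the Joda-Time classes, you can look for the class entry inside it. The sketch below uses only java.util.jar from the JDK; the class name JarContains and the default path Desktop/myUDFs.jar (taken from the REGISTER line in the question) are illustrative assumptions.

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class JarContains {
    /** Returns true if the jar at jarPath has an entry for className, e.g. "org.joda.time.ReadablePartial". */
    static boolean containsClass(String jarPath, String className) throws IOException {
        // Class files are stored under their package path: org/joda/time/ReadablePartial.class
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default: the jar path used in the question's REGISTER statement.
        String jarPath = args.length > 0 ? args[0] : "Desktop/myUDFs.jar";
        System.out.println(containsClass(jarPath, "org.joda.time.ReadablePartial"));
    }
}
```

If this prints false for your UDF jar, Joda-Time is not inside it, and Pig has to be told about the Joda-Time jar separately.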
Second, String nb_months_between = "" + Months.monthsBetween(d1,d2); is wrong. Use int nb_months_between = Months.monthsBetween(d1,d2).getMonths(); instead. If you read the source of Months.toString(), it returns "P" + String.valueOf(getValue()) + "M", an ISO 8601 period string such as "P12M", so you cannot convert that value to an int with Integer.parseInt.
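The same pitfall can be reproduced with the JDK's own java.time API, so the sketch below needs no extra jars: Period.toString() also returns ISO 8601 text such as "P3M", while ChronoUnit.MONTHS.between() returns the month count directly as a number. This swaps java.time in for Joda-Time, and the class and method names are hypothetical; it mirrors the UDF's check that a "MM yyyy" month lies at most a given number of months before a reference month.

```java
import java.time.Period;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

public class MonthsBetweenDemo {
    // Same "MM yyyy" input format that the UDF in the question parses.
    private static final DateTimeFormatter MONTH_YEAR = DateTimeFormatter.ofPattern("MM yyyy");

    /** True if monthYear ("MM yyyy") is at most maxMonths months before reference. */
    static boolean closeEnough(String monthYear, YearMonth reference, int maxMonths) {
        YearMonth start = YearMonth.parse(monthYear, MONTH_YEAR);
        // A plain number: no toString()/parseInt round trip.
        long months = ChronoUnit.MONTHS.between(start, reference);
        return months <= maxMonths;
    }

    public static void main(String[] args) {
        String periodText = Period.ofMonths(3).toString();
        System.out.println(periodText); // P3M
        try {
            Integer.parseInt(periodText); // the same mistake as in the question
        } catch (NumberFormatException e) {
            System.out.println("parseInt fails: " + e.getMessage());
        }
        // YearMonth.now() plays the role of "today" in the UDF.
        System.out.println(closeEnough("06 2013", YearMonth.now(), 12));
    }
}
```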
You need the jar that contains org/joda/time/ReadablePartial (a class, not a package).
You can find it here: jarfinder
Download joda-time-1.5.jar and add it to your project; that should resolve the error.
I am trying to connect Java with R using Rserve.
Java: 1.8.0_151
R: 3.5.0
OS: Mac 10.13.4 High Sierra
To connect R with Java, I typed the following in RStudio:
install.packages("Rserve")
library(Rserve)
Rserve(args="--no-save")
Things went smoothly, and I was very happy about it.
Then I went back to Java (in Eclipse) and continued. Here is what I wrote in Eclipse:
package rserve;
import org.rosuda.REngine.REXPMismatchException;
import org.rosuda.REngine.REngineException;
import org.rosuda.REngine.Rserve.RConnection;
import org.rosuda.REngine.Rserve.RserveException;
public class WordCloud1 {
public static void main(String[] args) throws REngineException,
REXPMismatchException {
RConnection c = new RConnection();
String path = "/Users/JinhoShin/Desktop/study/R/r_temp2";
String file = "seoul_new.txt";
c.parseAndEval("library(KoNLP)");
c.parseAndEval("useSejongDic()");
c.parseAndEval("library(wordcloud)");
c.parseAndEval("library(RColorBrewer)");
c.parseAndEval("setwd('" + path + "')");
c.parseAndEval("data1=readLines('" + file + "')");
c.parseAndEval("data2 = sapply(data1,extractNoun,USE.NAMES=F)");
c.parseAndEval("data3 = unlist(data2)");
c.parseAndEval("data3=gsub('seoul','',data3)");
c.parseAndEval("data3=gsub('request','',data3)");
c.parseAndEval("data3=gsub('place','',data3)");
c.parseAndEval("data3=gsub('transportation','',data3)");
c.parseAndEval("data3=gsub(' ','',data3)");
c.parseAndEval("data3=gsub('-','',data3)");
c.parseAndEval("data3=gsub('OO','',data3)");
c.parseAndEval("write(unlist(data3),'seoul_2.txt')");
c.parseAndEval("data4 = read.table('seoul_2.txt')"); // this is what blows me up
c.parseAndEval("wordcount=table(data4)");
c.parseAndEval("palete = brewer.pal(9,'Set3')");
c.parseAndEval(
"wordcloud(names(wordcount),freq = wordcount,scale=c(5,1),rot.per=0.25, min.freq = 1," +
" random.order=F, random.color = T, colors=palete)");
c.parseAndEval("savePlot('0517seoul.png', type = 'png')");
c.parseAndEval("dev.off()");
c.close();
}
}
As you'll notice from the code, this is the line that fails:
c.parseAndEval("data4 = read.table('seoul_2.txt')"); => at rserve.WordCloud1.main(WordCloud1.java:30)
I have no idea why R can't read my text file, given that it was able to write that same file.
This is what the Eclipse console keeps showing me:
Exception in thread "main" org.rosuda.REngine.REngineException: eval failed
at org.rosuda.REngine.Rserve.RConnection.parseAndEval(RConnection.java:499)
at org.rosuda.REngine.REngine.parseAndEval(REngine.java:108)
at rserve.WordCloud1.main(WordCloud1.java:30)
Caused by: org.rosuda.REngine.Rserve.RserveException: eval failed
at org.rosuda.REngine.Rserve.RConnection.eval(RConnection.java:261)
at org.rosuda.REngine.Rserve.RConnection.parseAndEval(RConnection.java:497)
... 2 more
And this is what RStudio keeps showing me:
Error: long vectors not supported yet: qap_encode.c:36
Fatal error: unable to initialize the JIT
I've tried everything I can think of to resolve this issue, but I'm still stuck.
I am trying to use the Date(int, int, int) constructor (per instructor requirements) and I am running into some difficulty.
First, I get warnings because this constructor is deprecated, and on top of that I get errors caused by the way I'm using it.
My code is below. I tried using fileRead.nextInt() on the file Scanner, and I also tried the approach you see below, Integer.parseInt(fileRead.next()).
This is reading from a file that has text in the format:
firstName lastName, 4, 24, 2016, aStringOfTextPossiblyMultipleWords...
Where 4 is month, 24 is day, 2016 is year.
The error I'm getting is:
Exception in thread "main" java.lang.NumberFormatException: For input string: " 4"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:569)
at java.lang.Integer.parseInt(Integer.java:615)
at BlogEntryTester.main(BlogEntryTester.java:59)
/NetBeans/8.1/executor-snippets/run.xml:53: Java returned: 1
BUILD FAILED (total time: 6 seconds)
And here is the code. The error is during runtime near the end of the code.
import java.util.Date;
import java.util.Scanner;
import java.io.*;
public class BlogEntryTester {
/**
* @param args the command line arguments
*/
public static void main(String[] args){
Date newDate = new Date();
BlogEntry BE1 = new BlogEntry();
BlogEntry BE2 = new BlogEntry("firstName", newDate, "This is the body of "
+ "blog entry number two. This is the last sentence.");
BlogEntry BE3 = new BlogEntry(BE2);
BE1.setUsername("randFirstName");
BE1.setDateOfBlog(newDate);
BE1.setBlog("This is less than 10 words...");
System.out.println(BE1.toString());
System.out.println(BE2.toString());
System.out.println(BE3.toString());
Scanner keyboard = new Scanner(System.in);
Scanner fileRead = null;
String fileName;
System.out.print("Enter the name of the file you wish to read from: ");
fileName = keyboard.next();
try{
fileRead = new Scanner(new FileInputStream(fileName));
System.out.println("> File opened successfully.");
fileRead.useDelimiter(",|\\n");
}
catch(FileNotFoundException e){
System.out.println("> File not found.");
System.exit(0);
}
BlogEntry newBlog = new BlogEntry();
newBlog.setUsername(fileRead.next()); // Reads username from file.
if(newBlog.getUsername().length() > 20){
System.out.println("> Error: Username read from file exceeds 20 "
+ "characters.");
}
newBlog.setDateOfBlog(new Date(Integer.parseInt(fileRead.next()),
Integer.parseInt(fileRead.next()),
Integer.parseInt(fileRead.next())));
newBlog.setBlog(fileRead.next()); // Reads the text of the blog.
System.out.println(newBlog.toString()); // Prints the data gathered from file.
}
}
Trim whitespace
As the comments said, you must remove the SPACE character in front of your digit 4. You could call replace( " " , "" ) on the String, call trim() on it, or use the Google Guava library to strip whitespace.
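A sketch of the trimming fix for the line format shown in the question, assuming the fields are separated by commas exactly as in the example line; the class and method names are hypothetical. It also accounts for a detail of the deprecated Date(int, int, int) constructor: its arguments are the year minus 1900, a zero-based month (0 = January), and the day of the month, in that order, so the month/day/year values read from the file have to be reordered and offset.

```java
import java.util.Date;
import java.util.Scanner;

public class BlogLineParser {
    /** Parses the date from a line like "firstName lastName, 4, 24, 2016, text...". */
    @SuppressWarnings("deprecation") // Date(int, int, int) is deprecated but required by the assignment
    static Date parseDate(String line) {
        try (Scanner fields = new Scanner(line)) {
            fields.useDelimiter(",");
            fields.next();                                      // username, not needed here
            int month = Integer.parseInt(fields.next().trim()); // " 4"    -> 4
            int day = Integer.parseInt(fields.next().trim());   // " 24"   -> 24
            int year = Integer.parseInt(fields.next().trim());  // " 2016" -> 2016
            // The deprecated constructor expects (year - 1900, month 0-11, day of month).
            return new Date(year - 1900, month - 1, day);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseDate("firstName lastName, 4, 24, 2016, some blog text"));
    }
}
```

Alternatively, a delimiter such as "\\s*,\\s*|\\n" makes the Scanner swallow the spaces around each comma, so the tokens arrive already trimmed.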
sql vs util
Be aware of the java.sql.Date class, which is intended to represent a date-only value. In contrast, the java.util.Date class you are using represents a date plus a time-of-day.
For a date-only value, the sql.Date class is more appropriate if you cannot use the java.time framework described next. But also know that this class is a bad hack: it extends util.Date while instructing you to ignore that inheritance and to ignore its embedded time-of-day, which is normalized to 00:00:00 in its time zone. Confusing? Yes. These old date-time classes are a bloody mess.
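A short illustration of that inheritance, using only the JDK (the class name is hypothetical): java.sql.Date prints as a plain date, yet it is still a java.util.Date carrying a millisecond time value.

```java
public class SqlVsUtilDate {
    public static void main(String[] args) {
        java.sql.Date sqlDate = java.sql.Date.valueOf("2016-04-24"); // date-only intent
        System.out.println(sqlDate);                           // 2016-04-24
        System.out.println(sqlDate instanceof java.util.Date); // true: it inherits from util.Date
        // Viewed as a util.Date, the same millisecond value prints with a time-of-day
        // (midnight in the default time zone).
        java.util.Date asUtil = new java.util.Date(sqlDate.getTime());
        System.out.println(asUtil);
    }
}
```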
java.time
You said your instructor is requiring the Date class, but you should know that class is notoriously troublesome and not recommended.
You are using an old outmoded class, java.util.Date, that has been supplanted by the java.time framework built into Java 8 and later.
Instead use LocalDate for a date-only value with no time-of-day and no time zone.
LocalDate localDate = LocalDate.of( 2016 , 4 , 24 );
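If the assignment still demands a java.util.Date, one option (a sketch, not the only way) is to build the value as a LocalDate and convert it only where the old type is required. Date.from(Instant) and LocalDate.atStartOfDay(ZoneId) are standard Java 8 APIs; the helper name toUtilDate and the use of the system default time zone are assumptions.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Date;

public class LocalDateToDate {
    /** Hypothetical helper: the start of the given day in the given zone, as a legacy Date. */
    static Date toUtilDate(LocalDate localDate, ZoneId zone) {
        return Date.from(localDate.atStartOfDay(zone).toInstant());
    }

    public static void main(String[] args) {
        // Month is 1-12 and the year is the real year: no 1900 or zero-based offsets.
        LocalDate localDate = LocalDate.of(2016, 4, 24);
        System.out.println(localDate); // 2016-04-24
        // Assumption: the system default time zone is the one you want.
        Date legacy = toUtilDate(localDate, ZoneId.systemDefault());
        System.out.println(legacy);
    }
}
```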
My UDF:
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
public class HoursDiff extends UDF {
//private = new Text();
public IntWritable evaluate(String date,String time)
{
String dateStart = "2014-12-01 00:00:00";
String currentdate=date+" "+time;
SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
Date d1 = null;
Date d2 = null;
try
{
d1 = format.parse(dateStart);
d2 = format.parse(currentdate);
long diff = d2.getTime() - d1.getTime();
long diffHours = diff / (3600000) % 24;
long diffDays = diff / (86400000);
int hours=(int)(diffDays*24+diffHours);
IntWritable hour=new IntWritable(hours);
return hour;
}
catch (Exception e)
{
e.printStackTrace();
}
return null;
}
}
I exported it to /home/hadoop/mapreduce/HoursDiff.jar.
Then I opened the Hive shell:
add jar /home/hadoop/mapreduce/HoursDiff.jar;
create temporary function hoursdiff as 'HoursDiff';
When I try to execute the following query, I get a FileNotFoundException:
select hoursdiff(date,time) as hours from date_test;
STACK TRACE
create temporary function hoursdiff as 'HoursDiff';
OK
Time taken: 0.009 seconds
hive> select hoursdiff(date,time) as hours from date_test;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
15/10/11 15:17:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Execution log at: /tmp/hadoop/hadoop_20151011151616_2c15561f-7cd2-4012-8bd2-b7dfcf488432.log
java.io.FileNotFoundException: File does not exist: hdfs://172.16.253.17:54310/home/hadoop/mapreduce/HoursDiff.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:269)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:740)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://172.16.253.17:54310/home/hadoop/mapreduce/HoursDiff.jar)'
Execution failed with exit status: 1
Everything you did is correct, but Hive is looking for the jar in HDFS, while you registered it with a local path.
Copy the jar to an HDFS location and register it with the HDFS path.
I suspect you opened the Hive shell as the HDFS user, which is why it resolves the path in HDFS.
Note: Hive also accepts a local path when registering a jar.
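The path in the error message is consistent with ordinary URI resolution: a path with no scheme gets qualified against the default file system, assumed here to be hdfs://172.16.253.17:54310 based on the stack trace. The sketch below uses java.net.URI purely to illustrate that rule; it is not Hive's or Hadoop's actual code, and the class name is hypothetical. A path with an explicit scheme such as file:// is left untouched.

```java
import java.net.URI;

public class PathQualification {
    /** Resolves a path against a default file-system URI, the way a scheme-less path gets qualified. */
    static String qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path).toString();
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://172.16.253.17:54310/"; // assumption: taken from the error message
        // Scheme-less local path: ends up pointing into HDFS, like the FileNotFoundException shows.
        System.out.println(qualify(defaultFs, "/home/hadoop/mapreduce/HoursDiff.jar"));
        // Explicit scheme: stays a local file URI.
        System.out.println(qualify(defaultFs, "file:///home/hadoop/mapreduce/HoursDiff.jar"));
    }
}
```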
I am trying to load my own UDF in Pig. I packaged it into a jar using Eclipse's export function. I get error 1066 when running my Pig script. I suspect the problem is in the B = .. statement, since I can dump A but not B.
Script
REGISTER myudfs.jar;
DEFINE HOUR myudfs.HOUR;
A = load 'access_log_Jul95' using PigStorage(' ') as (ip:chararray, dash1:chararray, dash2:chararray, date:chararray, getRequset:chararray, status:int, port:int);
B = FOREACH A GENERATE HOUR(ip);
DUMP B;
Function
package myudfs;
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.util.WrappedIOException;
public class HOUR extends EvalFunc<String>
{
@SuppressWarnings("deprecation")
public String exec(Tuple input) throws IOException {
if (input == null || input.size() == 0)
return null;
try{
String str = (String)input.get(0);
return str.toUpperCase();
}catch(Exception e){
throw WrappedIOException.wrap("Caught exception processing input row ", e);
}
}
}
Running command
pig -x mapreduce 2.pig
Data Format
199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245
Fields, left to right: ip, date, getRequest, status, port
Pig Stack Trace
ERROR 1066: Unable to open iterator for alias B
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
at org.apache.pig.PigServer.openIterator(PigServer.java:836)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:604)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
at org.apache.pig.PigServer.openIterator(PigServer.java:828)
... 12 more
I am very new to Pig, and any pointers would be greatly appreciated. I know this is a lot of information to look at, but I have had no luck modifying any data in a UDF, and I am just not sure where I went wrong.
Thanks
I wrote this code to read JSON data, and I get an error when I run it.
Here is the code (I changed it to correct the JSON string, but the problem still exists):
import net.sf.json.JSONObject;
import net.sf.json.JSONSerializer;
public class defaults {
public static void main(String[] args) {
String jsonTxt = "{lhs: \"100 Euros\",rhs: \"128.551738 Australian dollars\",error: \"\",icc: true}";
JSONObject json = (JSONObject) JSONSerializer.toJSON( jsonTxt );
String title = json.getString("title");
System.out.println( "title: " + title );
}
}
And this is the error I get when I run the code:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/lang/exception/NestableRuntimeException
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(Unknown Source)
at java.security.SecureClassLoader.defineClass(Unknown Source)
at java.net.URLClassLoader.defineClass(Unknown Source).....
The error goes away if I remove the lines that use JSON.
You're missing the Apache Commons Lang library from your classpath.
If you are ever stumped by a NoClassDefFoundError, try plugging the class name into jarFinder; it will tell you which jar files contain the class.
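Alongside jarFinder, you can check from Java itself whether a class is visible at runtime and which jar it was loaded from. The sketch below uses only Class.forName and the class's ProtectionDomain/CodeSource; the class name ClasspathCheck is hypothetical, and the default class it checks is the one from the error above. The code source can be null for classes that ship with the JDK.

```java
import java.net.URL;
import java.security.CodeSource;

public class ClasspathCheck {
    /** Returns true if the named class can be loaded by this application's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: locate and load the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the jar or directory the class came from, or null if unknown (e.g. JDK classes). */
    static URL locationOf(String className) throws ClassNotFoundException {
        Class<?> c = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
        CodeSource source = c.getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }

    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0] : "org.apache.commons.lang.exception.NestableRuntimeException";
        if (isOnClasspath(name)) {
            System.out.println(name + " loaded from " + locationOf(name));
        } else {
            System.out.println(name + " is NOT on the classpath");
        }
    }
}
```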
String jsonTxt = "{'lhs': '100 Euros','rhs':'128.551738 Australian dollars','error':'','icc': 'true'}";
JSONObject json = (JSONObject) JSONSerializer.toJSON( jsonTxt );
System.out.println( "lhs: " + json.getString("lhs") );
System.out.println( "rhs: " + json.getString("rhs") );
System.out.println( "error: " + json.getString("error") );
System.out.println( "icc: " + json.getString("icc") );
OUTPUT:
lhs: 100 Euros
rhs: 128.551738 Australian dollars
error:
icc: true
You can write the JSON string with double quotes ("), single quotes ('), or unquoted keys; all of these work.
You need the following jars:
1. commons-lang-2.4.jar
2. ezmorph-1.0.jar
3. json-lib-0.9.jar
To add the jars in Eclipse:
1. Right-click the project folder.
2. Click Properties.
3. Select "Java Build Path".
4. Select the Libraries tab.
5. Click "Add External JARs".
6. Browse to your jars, select them, and click OK.
You have a trailing "," after 'true'.
And technically strings and property names in JSON should use double quotes, not single quotes.
You don't have Apache commons-lang on your runtime classpath.