Cannot validate serde: org.openx.data.jsonserde.JsonSerDe - java

I have written this query to create a table in Hive. My data is in JSON format, so I downloaded and built the SerDe and added all the JARs required for it to run. But I am getting the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.openx.data.jsonserde.JsonSerDe
QUERY:
create table tip(type string,
text string,
business_id string,
user_id string,
date date,
likes int)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES("date.mapping"="date")
STORED AS TEXTFILE;

I too encountered this problem. In my case, I managed to fix it by adding json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar at the hive command prompt, as shown below:
hive> ADD JAR /usr/local/Hive-JSON-Serde/json-serde/target/json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar;
Below are the steps I followed on Ubuntu 14.04:
1. Open a terminal and cd /usr/local
2. sudo git clone https://github.com/rcongiu/Hive-JSON-Serde.git
3. cd Hive-JSON-Serde, then sudo mvn -Pcdh5 clean package
4. The SerDe jar will be at
/usr/local/Hive-JSON-Serde/json-serde/target/json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar
5. Go to the hive prompt and add the JAR file as shown in step 6.
6. hive> ADD JAR /usr/local/Hive-JSON-Serde/json-serde/target/json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar;
7. Now create the Hive table from the hive> prompt. At this stage, the table should be created without any errors.
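Optionally, to avoid re-adding the JAR in every session, you can register it permanently. A sketch using the HIVE_AUX_JARS_PATH environment variable, assuming the path from step 4 (the equivalent hive.aux.jars.path property can also be set in hive-site.xml):
export HIVE_AUX_JARS_PATH=/usr/local/Hive-JSON-Serde/json-serde/target/json-serde-1.3.7-SNAPSHOT-jar-with-dependencies.jar
hive    # the SerDe is now on the classpath without an explicit ADD JAR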
Hive Version: 1.2.1
Hadoop Version: 2.7.1
Reference: Hive-JSON-Serde

You have to build the cloned project using Maven:
mvn install
in the directory /path/directory/Hive-JSON-Serde (here we are in /usr/local).
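Put together, a minimal sketch of the build, assuming the clone lives under /usr/local as in the previous answer:
cd /usr/local/Hive-JSON-Serde
sudo mvn clean install    # or: sudo mvn -Pcdh5 clean package on CDH 5, as above
ls json-serde/target/*jar-with-dependencies.jar    # this is the jar to ADD JAR in hive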

Related

How to import Mongo data into Hive?

I am facing an issue: when I try to import Mongo data into Hive using the command below, it gives me an error.
CREATE EXTERNAL TABLE gok
(
id STRING,
name STRING,
state STRING,
email STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","name":"name","state":"state"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/gokul_test.play_test');
Note:
The versions of the tools used are below:
Java JDK 8
Hadoop: 2.8.4
Hive: 2.3.3
MongoDB: 4.2
The following JARs have been moved to HADOOP_HOME/lib and HIVE_HOME/lib:
mongo-hadoop-core-2.0.2.jar
mongo-hadoop-hive-2.0.2.jar
mongo-java-driver-2.13.2.jar
The error is:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/hadoop/hive/serde2/SerDe
I tried manually adding the JARs in Hive; the error I received then is below.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/mongodb/hadoop/hive/BSONSerDe
The two errors are different.
Let me know if you know of a resolution or need more details.
You should add the JARs to your Hive session.
Which Hive client are you using?
If you are using beeline, you can add the full path of the JARs before trying to create the table:
beeline !connect jdbc:hive2://localhost:10000 "" ""
As soon as your session is created, you must add the JARs, using "add jar" and the full path of each jar file:
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongo-hadoop-hive-1.5.0-SNAPSHOT.jar;
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongo-hadoop-core-1.5.0-SNAPSHOT.jar;
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongodb-driver-3.0.4.jar;
The next step is to drop/create the table:
DROP TABLE IF EXISTS bars;
CREATE EXTERNAL TABLE bars
(
objectid STRING,
Symbol STRING,
TS STRING,
Day INT,
Open DOUBLE,
High DOUBLE,
Low DOUBLE,
Close DOUBLE,
Volume INT
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"objectid":"_id",
"Symbol":"Symbol", "TS":"Timestamp", "Day":"Day", "Open":"Open", "High":"High", "Low":"Low", "Close":"Close", "Volume":"Volume"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/marketdata.minibars');
source: https://community.cloudera.com/t5/Support-Questions/Mongodb-with-hive-Error-return-code-1-from-org-apache-hadoop/td-p/138161
It looks like mongo-hadoop-hive-<version>.jar was not correctly added to the Hive classpath.
Try adding the mongodb JAR using the below command:
ADD JAR /path-to/mongo-hadoop-hive-<version>.jar
More info: https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
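For the JAR versions listed in the question, a session might look like the sketch below; the /usr/local/hive/lib paths are assumptions, so use wherever your JARs actually live. (Note that the next question below pairs mongo-hadoop 2.0.2 with a 3.x mongo-java-driver, so the 2.13.2 driver may itself be worth upgrading.)
hive> ADD JAR /usr/local/hive/lib/mongo-hadoop-core-2.0.2.jar;
hive> ADD JAR /usr/local/hive/lib/mongo-hadoop-hive-2.0.2.jar;
hive> ADD JAR /usr/local/hive/lib/mongo-java-driver-2.13.2.jar;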
Alternatively, you could try to ingest the MongoDB BSON data into Hive in Avro format and then build tables in Hive. It's a longer process, but it will get the job done; you will need to build a new connector for reading from Mongo and converting it to the Avro format.

Unable to connect Hive with MongoDB using mongo-hadoop connector

I am trying to install and configure Hive with mongo-hadoop-core 2.0.2 for the first time. I have installed Hadoop 2.8.0, Hive 2.1.1 and MongoDB 3.4.6, and everything works fine when run individually.
My problem is that I am not able to connect MongoDB with Hive. I am using the mongo-hadoop connector for this, as described here: https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
The required JARs are added to the Hadoop and Hive lib directories. I have even added them in hive.sh and at runtime from the Hive console.
I am getting an error while executing the CREATE TABLE query.
My query is:
CREATE EXTERNAL TABLE testHive
(
id STRING,
name STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","name":"name"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/hiveDb.testHive');
And I get the following error
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/mongodb/hadoop/io/BSONWritable
hive> ERROR hive.ql.exec.DDLTask - java.lang.NoClassDefFoundError: com/mongodb/hadoop/io/BSONWritable
at com.mongodb.hadoop.hive.BSONSerDe.initialize(BSONSerDe.java:132)
at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:537)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:424)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:411)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:279)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:261)
It shows that the com/mongodb/hadoop/io/BSONWritable class is not on the classpath, but I have added the required JAR (mongo-hadoop-core.jar) and the class is present in it.
The versions of the JARs I am using:
mongo-hadoop-core 2.0.2,
mongo-hadoop-hive 2.0.2,
mongo-java-driver 3.0.2
Thanks
You need to register the JARs explicitly. In your Hive script, use ADD JAR commands to include all of them (core, hive, and the Java driver), e.g., ADD JAR /path-to/mongo-hadoop-hive-<version>.jar;.
If you are running from the Hive shell, use it like this:
hive> ADD JAR /path-to/mongo-hadoop-hive-<version>.jar;
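Since the NoClassDefFoundError points at a mongo-hadoop-core class, make sure all three JARs named above are registered. A sketch of the full registration (the <version> placeholders stand for whichever versions you actually have):
hive> ADD JAR /path-to/mongo-hadoop-core-<version>.jar;
hive> ADD JAR /path-to/mongo-hadoop-hive-<version>.jar;
hive> ADD JAR /path-to/mongo-java-driver-<version>.jar;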
Then execute your query.

Spark submit: Table or view not found using jar

When I run HiveRead.java from the IntelliJ IDE, it runs successfully and I get the result. Then I created a jar file (it's a Maven project) and tried to run it, which gave me
ClassLoaderResolver for class "" gave error on creation : {1}
Then I looked at SO answers and found I had to add the DataNucleus jars, so I did something like this
java -jar /home/saurab/sparkProjects/spark_hive/target/myJar-jar-with-dependencies.jar --jars jars/datanucleus-api-jdo-3.2.6.jar,jars/datanucleus-core-3.2.10.jar,jars/datanucleus-rdbms-3.2.9.jar,/home/saurab/hadoopec/hive/lib/mysql-connector-java-5.1.38.jar
Then I got this error
org.datanucleus.exceptions.NucleusUserException: Persistence process has been specified to use a ClassLoaderResolver of name "datanucleus" yet this has not been found by the DataNucleus plugin mechanism. Please check your CLASSPATH and plugin specification.
Somewhere I found that I should use spark-submit, so I did this:
./bin/spark-submit --class HiveRead --master yarn --jars jars/datanucleus-api-jdo-3.2.6.jar,jars/datanucleus-core-3.2.10.jar,jars/datanucleus-rdbms-3.2.9.jar,/home/saurab/hadoopec/hive/lib/mysql-connector-java-5.1.38.jar --files /home/saurab/hadoopec/spark/conf/hive-site.xml /home/saurab/sparkProjects/spark_hive/target/myJar-jar-with-dependencies.jar
Now I get a new type of error:
Table or view not found: `bigmart`.`o_sales`;
HELP ME !! :)
I have copied my hive-site.xml to /spark/conf and started the hive metastore service ( hiveserver2 --service metastore ).
Here is the HiveRead.java code, if anyone is interested.
The Spark session is not able to read the Hive configuration.
Provide the hive-site.xml file path with the spark-submit command, as below.
For Hortonworks, the file path is /usr/hdp/current/spark2-client/conf/hive-site.xml;
pass it as --files /usr/hdp/current/spark2-client/conf/hive-site.xml in the spark-submit command.
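Separately, it is worth making sure the Spark session is created with Hive support at all; without it, Spark falls back to its built-in catalog and reports exactly this "Table or view not found" error. A minimal sketch of what HiveRead.java would need (the class and table names come from the question; the rest is an assumed reconstruction, since the code is only linked, not shown):
import org.apache.spark.sql.SparkSession;

public class HiveRead {
    public static void main(String[] args) {
        // enableHiveSupport() makes Spark read hive-site.xml and use the
        // Hive metastore; without it, databases like bigmart are invisible.
        SparkSession spark = SparkSession.builder()
                .appName("HiveRead")
                .enableHiveSupport()
                .getOrCreate();
        spark.sql("SELECT * FROM bigmart.o_sales LIMIT 10").show();
        spark.stop();
    }
}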

Spark with Java - Error: Cannot load main class from JAR

I am trying a simple movie recommendation machine learning program in spark.
Spark version:2.1.1
Java version:java 8
Scala version: Scala code runner version 2.11.7
Env: windows 7
I run these commands to start the master and a worker:
//start master
spark-class org.apache.spark.deploy.master.Master
//start worker
spark-class org.apache.spark.deploy.worker.Worker spark://valid ip:7077
I am trying a very simple movie recommendation example from here: http://blogs.quovantis.com/recommendation-engine-using-apache-spark/
I have updated the code to:
SparkConf conf = new SparkConf().setAppName("Collaborative Filtering Example").setMaster("spark://valid ip:7077");
conf.setJars(new String[] {"C:\\Spark2.1.1\\spark-2.1.1-bin-hadoop2.7\\jars\\spark-mllib_2.11-2.1.1.jar"});
I cannot run this through IntelliJ.
Running mvn clean install and copying the jar to the folder does not work either.
The command I used to run it:
bin\spark-submit --verbose –-jars jars\spark-mllib_2.11-2.1.1.jar –-class “com.abc.enterprise.RecommendationEngine” –-master spark://valid ip:7077 C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-mllib-example\spark-poc-1.0-SNAPSHOT.jar C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-mllib-example\ratings.csv C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-mllib-example\movies.csv 10
The error I see is:
C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7>bin\spark-submit --verbose --class "com.sandc.enterprise.RecommendationEngine" --master spark://10.64.98.101:7077 C:\Spark2.1.1\spark-2.1.1-
bin-hadoop2.7\spark-mllib-example\spark-poc-1.0-SNAPSHOT.jar C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-mllib-example\ratings.csv C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-m
llib-example\movies.csv 10
Using properties file: C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\bin\..\conf\spark-defaults.conf
Adding default property: spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property: spark.executor.extraJavaOptions=-XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.driver.memory=5g
Adding default property: spark.master=spark://valid ip:7077
Error: Cannot load main class from JAR file:/C:/Spark2.1.1/spark-2.1.1-bin-hadoop2.7/û-class
Run with --help for usage help or --verbose for debug output
If I pass the --jars option, it gives the error:
Error: Cannot load main class from JAR file:/C:/Spark2.1.1/spark-2.1.1-bin-hadoop2.7/û-jars
Any ideas how I can submit this job to Spark?
Is your jar built correctly?
Also, you don't need to add double quotes around the --class option value.
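One more thing worth checking, given the stray "û-class" in the error output: the option prefixes in the posted command appear to use an en dash (–-) rather than two ASCII hyphens (--), and the smart quotes around the class name would also confuse the parser. Retyping the command with plain ASCII characters might look like this (paths and the "valid ip" placeholder exactly as in the question):
bin\spark-submit --verbose ^
  --jars jars\spark-mllib_2.11-2.1.1.jar ^
  --class com.abc.enterprise.RecommendationEngine ^
  --master spark://valid ip:7077 ^
  C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-mllib-example\spark-poc-1.0-SNAPSHOT.jar C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-mllib-example\ratings.csv C:\Spark2.1.1\spark-2.1.1-bin-hadoop2.7\spark-mllib-example\movies.csv 10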

Running tests on travis using mysql

I've been trying for the last week or so to make integration tests work on Travis for a school project. I've debugged a fair bit of the project but now I'm blocked and need external help.
To give a bit of context: so far, I've debugged the Java project so that the tests can be launched from Eclipse or from Maven on the command line. I've worked on the Travis file so that a database is created, the database scripts run and the Java tests launch. However, the tests fail on Travis because of a "table missing" error in the database.
This is a link to our repo.
This is the travis.yml file:
language : java
jdk:
- oraclejdk8
service:
- mysql
before_script:
- mysql -e 'DROP DATABASE IF EXISTS koalatest'
- mysql -e 'CREATE DATABASE IF NOT EXISTS koalatest;'
- mysql -u root --default-character-set=utf8 koalatest < backend/koalacal-backend/koalacal.sql
script: cd backend && cd koalacal-backend && mvn test -X
after_success:
- bash <(curl -s https://codecov.io/bash)
The Java project that is built and run by Maven is located under rootfolder -> backend -> koalacal-backend.
Here is a link to the error log Maven produces on Travis.
This line seems to be the source of the error:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'koalatest.Calendar' doesn't exist
I have two hypotheses:
1. The SQL script that creates all the tables is not being run properly by Travis.
To test this hypothesis, I changed the name of the script called by Travis. As expected, I got an error saying that Travis can't find the file. So at least I know that this line of code causes Travis to look up an SQL file.
- mysql -u root --default-character-set=utf8 koalatest < backend/koalacal-backend/koalacal.sql
That being said, I have no idea if the file is run properly against the database.
For the sake of putting all relevant information in this post, here is a link to the database script.
2. The tests can't connect properly to the database.
Here is the config file that contains the info on which database to connect to:
TestInstance=true
user=root
password=
serverName=localhost
databaseName=koalacal
portNumber=3306
testUser=root
testPassword=
testServerName=127.0.0.1
testDatabaseName=koalatest
testPortNumber=3306
If the parameter TestInstance is set to true, the tests use testUser, testPassword, testServerName, testDatabaseName and testPortNumber to connect to the relevant database.
I believe the connection information currently contained in the config file matches how the Travis documentation says we need to connect to a MySQL database. I tried changing testUser to something invalid (like root3) and got error messages as expected.
Maybe somehow the tests can't connect to the database and don't produce a related error message, but I doubt it.
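For what it's worth, with the test settings above the connection the code builds should be equivalent to something like this hypothetical snippet (the project's actual connection code is not shown here):
import java.sql.Connection;
import java.sql.DriverManager;

public class TestDbConnectionCheck {
    public static void main(String[] args) throws Exception {
        // Mirrors testServerName, testPortNumber and testDatabaseName above.
        String url = "jdbc:mysql://127.0.0.1:3306/koalatest";
        try (Connection con = DriverManager.getConnection(url, "root", "")) {
            System.out.println("Connected: " + !con.isClosed());
        }
    }
}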
Can someone look at my problem and see if I've missed something obvious (or not)? I don't know what else to try and I don't want to be blocked one more week on a technical issue.
For anyone who googles "travis mysql" and has a similar error to the one I had: I solved my problem.
The error was caused by a case-sensitivity issue. The Java code tried to connect to tables like 'Calendar' and 'Event' while the SQL script created the tables 'calendar' and 'event'.
It took a long time to troubleshoot because the case sensitivity didn't pose any problem on my machine; Maven ran its tests there without any issue. It's only on the Travis servers that the case sensitivity of table names started to matter.
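For anyone wondering why the behaviour differs between machines: MySQL's handling of identifier case is controlled by the lower_case_table_names variable, which defaults to 0 (case-sensitive, since table names map to file names) on Linux and to a case-insensitive mode on Windows and macOS. You can check your server's setting with:
SHOW VARIABLES LIKE 'lower_case_table_names';
-- 0 = case-sensitive (Linux default, as on the Travis workers)
-- 1 = names stored in lowercase, comparisons case-insensitive (Windows default)
-- 2 = names stored as given, compared in lowercase (macOS default)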
