I took an example from the Cloudera website to write a custom SerDe for parsing a file:
http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/
It seems like a good example, but when I create the table with the custom SerDe:
ADD JAR <path-to-hive-serdes-jar>;
CREATE EXTERNAL TABLE tweets (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
PARTITIONED BY (datehour INT)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/flume/tweets';
It executed perfectly fine, but when I run
select * from tweets;
I get nothing back, so I wanted to know whether I can run Hive in debug mode to see where it is failing.
You'd better start the Hive shell with the logger switched to DEBUG mode as follows; I hope you can find something useful there:
hive --hiveconf hive.root.logger=DEBUG,console
Hive code can be debugged. This link may help you: https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-DebuggingHiveCode
Setting hive --hiveconf hive.root.logger=DEBUG,console may not always work because of a company-specific setup.
I ended up creating a hive-log4j.properties file in my home directory with the following settings:
log4j.rootCategory=DEBUG,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
and started the Hive shell using CLASSPATH=$HOME hive, which puts your home directory (containing hive-log4j.properties) at the front of the classpath so the file gets picked up.
I am using p6spy to log the SQL statements. We are using Spring Boot/Hibernate for the Java ORM mapping. I see only select statements getting logged in spy.log. When insert statements are executed in the code, only the commit shows up in the log; the insert statements themselves do not appear.
|connection|commit||
|connection|statement | select * from emp_id where id=1234
https://github.com/gavlyukovskiy/spring-boot-data-source-decorator
# Register P6LogFactory to log JDBC events
decorator.datasource.p6spy.enable-logging=true
# Use com.p6spy.engine.spy.appender.MultiLineFormat instead of com.p6spy.engine.spy.appender.SingleLineFormat
decorator.datasource.p6spy.multiline=true
# Use logging for default listeners [slf4j, sysout, file, custom]
decorator.datasource.p6spy.logging=file
# Log file to use (only with logging=file)
decorator.datasource.p6spy.log-file=spy.log
# Class file to use (only with logging=custom). The class must implement com.p6spy.engine.spy.appender.FormattedLogger
decorator.datasource.p6spy.custom-appender-class=my.custom.LoggerClass
# Custom log format, if specified com.p6spy.engine.spy.appender.CustomLineFormat will be used with this log format
decorator.datasource.p6spy.log-format=
# Use regex pattern to filter log messages. If specified only matched messages will be logged.
decorator.datasource.p6spy.log-filter.pattern=
# Report the effective sql string (with '?' replaced with real values) to tracing systems.
# NOTE this setting does not affect the logging message.
decorator.datasource.p6spy.tracing.include-parameter-values=true
I need to write my Spark dataset to an Oracle database table. I am using the dataset write method with append mode, but I am getting an AnalysisException when the Spark job is triggered on the cluster using the spark2-submit command.
I have read the JSON file, flattened it, and loaded it into a dataset as abcDataset.
Spark Version - 2
Oracle Database
JDBC Driver - oracle.jdbc.driver.OracleDriver
Programming Language - Java
Dataset<Row> abcDataset = dataframe.select(col("abc"), ... /* and other columns */);
Properties dbProperties = new Properties();
InputStream is = SparkReader.class.getClassLoader().getResourceAsStream("dbProperties.yaml");
dbProperties.load(is);
String jdbcUrl = dbProperties.getProperty("jdbcUrl");
dbProperties.put("driver","oracle.jdbc.driver.OracleDriver");
String where = "USER123.PERSON";
abcDataset.write().format("org.apache.spark.sql.execution.datasources.jdbc.DefaultSource").option("driver", "oracle.jdbc.driver.OracleDriver").mode("append").jdbc(jdbcUrl, where, dbProperties);
Expected: the data is written to the database. Instead I am getting the error below:
org.apache.spark.sql.AnalysisException: Multiple sources found for jdbc (org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider, org.apache.spark.sql.execution.datasources.jdbc.DefaultSource), please specify the fully qualified class name.;
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:670)
Do we need to set any additional property in the spark-submit command, since I am running this on a cluster, or is some step missing?
You need to use either abcDataset.write().jdbc(...) or abcDataset.write().format("jdbc") when you are writing from Spark to an RDBMS via JDBC.
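For example, either of the following forms should work; this is only a sketch that reuses jdbcUrl, dbProperties, and the USER123.PERSON table name from the question and is untested against your cluster:
// Option 1: DataFrameWriter.jdbc(url, table, connectionProperties)
abcDataset.write()
    .mode("append")
    .jdbc(jdbcUrl, "USER123.PERSON", dbProperties);

// Option 2: the short "jdbc" format alias with explicit options
abcDataset.write()
    .format("jdbc")
    .option("url", jdbcUrl)
    .option("dbtable", "USER123.PERSON")
    .option("driver", "oracle.jdbc.driver.OracleDriver")
    .mode("append")
    .save();
Since dbProperties already contains the driver entry, the explicit driver option in the second form is redundant but harmless.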
# database
elastic#elastic:~/ELK/database$ sudo sqlite3 data.db
SQLite version 3.8.2 2013-12-06 14:53:30
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table test(id integer primary key autoincrement, ip integer, res integer);
sqlite>
sqlite> insert into test (ip,res) values(200,500);
sqlite> insert into test (ip,res) values(300,400);
# aaa.conf
input {
  sqlite {
    path => "/home/elastic/ELK/database/data.db"
    type => "test"
  }
}
output {
  stdout {
    codec => rubydebug {}
  }
}
elastic#elastic:~/ELK/logstash-5.1.1$ sudo bin/logstash -f aaa.conf
Sending Logstash's logs to /home/elastic/ELK/logstash-5.1.1/logs which is now configured via log4j2.properties
[2017-04-25T00:11:41,397][INFO ][logstash.inputs.sqlite ] Registering sqlite input {:database=>"/home/elastic/ELK/database/data.db"}
[2017-04-25T00:11:41,588][INFO ][logstash.pipeline ] Starting pipeline {"id"=>"main", "pipeline.workers"=>1, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>125}
[2017-04-25T00:11:41,589][INFO ][logstash.pipeline ] Pipeline main started
[2017-04-25T00:11:41,632][ERROR][logstash.pipeline ] A plugin had an unrecoverable error. Will restart this plugin.
Plugin: <LogStash::Inputs::Sqlite path=>"/home/elastic/ELK/database/data.db", type=>"test", id=>"5545bd3bab8541578394a2127848be342094c195-1", enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_1349faf2-3b33-40d0-b328-f588fd97ae7e", enable_metric=>true, charset=>"UTF-8">, batch=>5>
Error: Missing Valuefier handling for full class name=org.jruby.RubyObject, simple name=RubyObject
I do not know how to handle this error.
I solved this problem by installing the logstash-input-jdbc plugin.
I think the jdbc plugin is a requirement of the sqlite plugin.
So, plugin installation:
bin/logstash-plugin install logstash-input-jdbc
Hope this helps!
If you used sqlite plugin 3.0.4 and hit this problem, I would say it is probably a bug; I raised it on the Logstash forum: https://discuss.elastic.co/t/sqlite-plugin-3-0-4-failed-to-start-and-it-seems-a-bug/150305
So you can just use the jdbc plugin to get your SQLite data, e.g. https://github.com/theangryangel/logstash-output-jdbc/blob/master/examples/sqlite.md
By the way, if you check logstash-input-sqlite-3.0.4/lib/logstash/inputs/sqlite.rb, it is actually quite simple. But I can't figure out why event = LogStash::Event.new("host" => @host, "db" => @db) failed in my case.
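A jdbc input pointed at the same SQLite file might look roughly like the sketch below; the driver jar path and version, the jdbc_user value (ignored by SQLite but required by the plugin), and the schedule are assumptions you would adjust:
input {
  jdbc {
    # Path to the Xerial sqlite-jdbc driver jar -- placeholder, adjust to where you downloaded it
    jdbc_driver_library => "/home/elastic/ELK/drivers/sqlite-jdbc-3.21.0.jar"
    jdbc_driver_class => "org.sqlite.JDBC"
    jdbc_connection_string => "jdbc:sqlite:/home/elastic/ELK/database/data.db"
    # SQLite has no users; the plugin still requires the setting
    jdbc_user => "nobody"
    statement => "SELECT * FROM test"
    # Poll once a minute
    schedule => "* * * * *"
  }
}
output {
  stdout {
    codec => rubydebug {}
  }
}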
I am working on enabling globalization support in my DB.
I have migrated the character set to Unicode (AL16UTF16).
After the migration, I can pass Unicode characters from Java to Oracle and store them in a table's NVARCHAR2 column. I can also retrieve them from the DB and pass them back to Java.
But if I do a raise_application_error with the Unicode data, it sends the error message to Java like below:
; nested exception is java.sql.SQLException: ORA-20001: ¿¿¿ ¿¿¿¿¿¿¿¿¿
Can anyone tell me what's wrong, and how I can get the Unicode error messages in Java?
Thanks in advance.
The problem was that I had done the character set migration using the steps below, but it didn't work for me:
1. Back up the database.
2. Run the CSSCAN command.
3. Restart the DB in RESTRICT mode.
4. Run the CSALTER script.
5. Restart the DB.
After that, I tried the steps below:
1. Take a backup of the DB using the expdp command.
2. Create a new database with the required character set (Unicode AL32UTF8).
3. Import the backup dump file into the newly created DB.
That's all. It works!
Now I don't need to use the NVARCHAR2 data type to store Unicode data (VARCHAR2 itself stores Unicode). raise_application_error also works fine (it sends error messages with Unicode data to Java).
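For reference, the Data Pump export and import in steps 1 and 3 could look roughly like this; the credentials, connect identifiers, directory object, and dump file name are placeholders, not the exact commands I ran:
# Export the full source database
expdp system/password@ORCL full=y directory=DATA_PUMP_DIR dumpfile=full_db.dmp logfile=expdp_full.log
# Import into the new AL32UTF8 database
impdp system/password@NEWDB full=y directory=DATA_PUMP_DIR dumpfile=full_db.dmp logfile=impdp_full.log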
Thanks.
My issue is related to a Hive UDF.
I have created a UDF that converts a string date to a Julian date. It works fine when I execute a select query, but it throws an error when used in a CREATE TABLE ... AS statement.
CREATE FUNCTION convertToJulian AS 'com.convertToJulian'
USING JAR 'hdfs:/user/hive/';
Executing the select query:
SELECT name, date FROM custTable
WHERE name IS NOT NULL AND convertToJulian(date) < convertToJulian(to_date(from_unixtime(unix_timestamp())));
Output:
converting to local hdfs:/user/hive/udf.jar
Added [/usr/local/hivetmp/amit.pathak/9381feb3-6c5f-469b-b6b1-
9af55abbdabd/udf.jar] to class path
Added resources: [hdfs:/user/hive/udf.jar]
It works fine and gives me exactly the data I need.
Now, as a second step, I want to load this data into a new table, so I ran:
CREATE TABLE trop AS
SELECT name, date FROM custTable
WHERE name IS NOT NULL AND convertToJulian(date) < convertToJulian(to_date(from_unixtime(unix_timestamp())));
Output:
java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/usr/local/hivetmp/amit.pathak/9381feb3-6c5f-469b-b6b1-9af55abbdabd/udf.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
I am not able to figure out why it is looking for the jar at the HDFS location
hdfs://localhost:54310/usr/local/hivetmp/amit.pathak/9381feb3-6c5f-469b-b6b1-9af55abbdabd/udf.jar
I also tried several things, like adding the jar manually in HDFS.
But Hive generates a random session ID, which creates a folder named after that session ID.
I was able to fix the above issue. There are two ways to fix it; I have added my findings on my blog, which you can refer to here: Hive udf exception Fix