Understanding the real reason behind a Hive failure - java

I'm using a JDBC driver to run "describe TABLE_NAME" on Hive. It gives me the following error:
NativeException: java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED:
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
return code 1 doesn't tell me very much. How do I figure out what the underlying reason is?

It's most likely because your Hive metastore is not set up properly. Hive uses an RDBMS-backed metastore to store metadata about its tables. This includes things like table names, schemas, partitioning/bucketing/sorting columns, table-level statistics, etc.
By default, Hive uses an embedded Derby metastore, which can only be accessed by one process at a time. If you are using that, it's possible that you have multiple Hive sessions open, and that is causing this problem.
In any case, I would recommend setting up a standalone metastore for Hive. Embedded Derby was chosen because it works out of the box and is convenient for running tests, but in my opinion it is not fit for production workloads. You can find instructions on how to configure MySQL as the Hive metastore here.
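As a rough sketch, a MySQL-backed metastore boils down to a few JDO connection properties in hive-site.xml; the host, database name, and credentials below are placeholders:

<!-- hive-site.xml: point the metastore at MySQL (placeholder values) -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://dbhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepass</value>
</property>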

Possibly you have another session open, since embedded Derby allows only one session at a time.
You can check with:
ps -wwwfu <your id>
and kill the process that is holding the Hive connection.

It is because the table with the name you've specified doesn't exist in the database.
Try creating the table and running the command again. It will work. :)

Related

NoHostAvailableException in Cassandra though host is online

I am using DataStax Cassandra client version 2.1.1, and I connect to 10 different clusters. I use one session per cluster, and our server does inserts into the different clusters.
We have prepared insert statements per host, and when we need to insert into a particular cluster, we get that session object's connection and do the insert.
When we ran a load test, we noted two things:
1) Inserting into one host (X) for a long time (bursts of calls, etc.) causes no issues.
2) Doing burst calls to two clusters (X, Y) causes most of the records inserted into the first cluster (Y) to fail.
Any reason for this?
Thanks,
Gopi
I found the cause of the driver misbehaving. The actual problem was with the data model used. My data model had a map (collection) data type, and during high load there were timeouts. When I changed the data type from map to text and added COMPACT STORAGE when creating the tables, things worked fine.
Yes, it is weird, but it worked. An explanation of why this works would really help.
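For reference, a minimal sketch of that schema change with the DataStax Java driver 2.1; the contact point, keyspace, and table names are hypothetical, and the point is only the map-to-text change plus COMPACT STORAGE:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SchemaFix {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");
        // Before (timed out under load):
        //   CREATE TABLE events (id uuid PRIMARY KEY, payload map<text,text>);
        // After: serialize the map into a single text column instead
        session.execute("CREATE TABLE events (id uuid PRIMARY KEY, payload text) "
                + "WITH COMPACT STORAGE");
        cluster.close();
    }
}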
Thanks,
Gopi

How to allow Apache Hive to handle queries from multiple clients

I have created an external table via Hive from HBase. When someone is logged in to the shell and runs some queries, no one else can run any query. When someone tries to run a query, it gives the following error:
FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
When the first shell exits, queries can be run easily. So it means that my Hive cannot handle multiple clients. How do I enable it to handle multiple clients?
If you have not configured your Hive metastore, it's not possible for more than one user to access the Hive server.
Embedded metastore: An embedded metastore is mainly used for unit tests. Only one process can connect to the metastore at a time, so it is not really a practical solution but works well for unit tests.
Check this link for more information.
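Once a standalone metastore service is running, each client's hive-site.xml just needs to point at it; a sketch, with a placeholder host name (9083 is the usual metastore port):

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host:9083</value>
</property>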

What is the difference between the hive jdbc client and the hive metastore java api?

I was using the Hive JDBC client, but then I came to know that there is a Hive metastore Java API (here) with which you can also connect to Hive and manipulate the Hive database.
But I was wondering what exactly the difference between these two approaches is.
Sorry if I'm asking something obvious, but any information will be highly appreciated.
As far as I understand, there are two ways to connect to Hive:
using the Hive metastore server, which then connects in the background to a relational DB such as MySQL to store the schema. This generally runs on port 9083.
the Hive JDBC server, called HiveServer2, which generally runs on port 10000.
Now, in the earlier editions of Hive, HiveServer2 used to be not so stable, and in fact its multi-threading support was also limited. Things have probably improved in that arena, I'd imagine.
So for the JDBC API: yes, it lets you communicate using JDBC and SQL.
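A minimal sketch of that route, assuming HiveServer2 is on its default port (10000) and the standard Hive JDBC driver is on the classpath; the table name and credentials are placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("describe employee"); // placeholder table
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getString(2));
        }
        conn.close();
    }
}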
For the metastore connectivity, there appear to be two features:
to actually run SQL queries - DML
to perform DDL operations.
DDL -
For DDL, the metastore APIs come in handy; the org.apache.hadoop.hive.metastore.HiveMetaStoreClient class can be utilized for that purpose.
DML -
What I have found useful in this regard is the org.apache.hadoop.hive.ql.Driver class (https://hive.apache.org/javadocs/r0.13.1/api/ql/org/apache/hadoop/hive/ql/Driver.html).
This class has a method called run() which lets you execute a SQL statement and get the result back.
For example, you can do the following:
import org.apache.hadoop.hive.cli.CliSessionState;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.session.SessionState;

Driver driver = new Driver(hiveConf);
HiveMetaStoreClient client = new HiveMetaStoreClient(hiveConf);
SessionState.start(new CliSessionState(hiveConf));
// DML: run a query through the driver
driver.run("select * from employee");
// DDL: drop a table through the metastore client
client.dropTable(db, table);
The metastore in Hive, as the name indicates, is a store for the Hive DB's metadata.
This store is usually an RDBMS.
The metastore API supports interacting with the RDBMS to tinker with or tweak the metadata, not the actual Hive DB/data. For normal usage you may never want or have to use it. I would think it is meant for people building toolsets that work with the metastore, not for normal day-to-day usage.

How do I log SQL statements from JDBC calls without a MySQL server running?

So, I'm trying to figure out how to log the SQL statements that would be run, without actually having an active MySQL server.
The problem I'm trying to solve: right now we're writing data both to a remote MySQL instance and to a local one (archive data). We write to the local one as a backup in case the remote becomes unreachable, goes down, etc.
So, what we'd like to do instead is log the SQL statements locally. We're going through Spring's JdbcTemplate with variable replacement, so it's not quite as easy as taking SQL we piece together ourselves and writing it to a file.
I did find log4jdbc, which looks great, except we'd still need an active local MySQL instance, even if it's just using the BLACKHOLE engine. I did think maybe we could just use log4jdbc against the remote server, but if the connection goes away, will the JdbcTemplate even try to run the queries on the underlying JDBC driver objects before getting the failure? Most pooling mechanisms validate the connection before returning it, so it would still fail before the query had a chance to run and get logged.
So, who has any bright ideas?
Run the remote MySQL instance with binary logging enabled.
The binary logs can be backed up and, if necessary, converted into SQL using the mysqlbinlog command, to later restore a database.
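For example, assuming the binary log files are accessible (file names will differ per server):

mysqlbinlog binlog.000001 > statements.sql
mysql -u username -p archive_db < statements.sql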
Mark O'Connor's solution is probably the best in general, but the way we solved this was to simply write out the data as CSV files formatted to be ready for import via a LOAD DATA INFILE statement.
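A rough sketch of that approach; the file path, columns, and table name are hypothetical, and the quoting here is deliberately minimal:

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class CsvArchiver {
    private final PrintWriter out;

    public CsvArchiver(String path) throws IOException {
        out = new PrintWriter(new FileWriter(path, true)); // append to the archive file
    }

    public void archive(long id, String name, String createdAt) {
        // Double up embedded quotes so LOAD DATA INFILE ... ENCLOSED BY '"' parses cleanly
        out.printf("%d,\"%s\",\"%s\"%n", id, name.replace("\"", "\"\""), createdAt);
        out.flush();
    }
}

// Later, on the archive database:
//   LOAD DATA INFILE '/path/archive.csv' INTO TABLE archive
//   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"';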

Copy a huge MySQL table from a remote to a local database

I have read-only access to a remote MySQL database, which contains a very large table (hundreds of millions of lines).
To get faster access to that table, I want to copy it to my local database.
What is the best way to do this?
"SELECT INTO OUTFILE" doesn't work, because I don't have the required permissions on the remote database.
I tried to use Java to SELECT all rows FROM the remote table, save them to a local text file, then use LOAD DATA INFILE; however, the select broke with
"Exception in thread "main" java.lang.OutOfMemoryError: Java heap space".
Use the mysqldump command on the remote database to extract the SQL statements for the required database. Then copy the extracted file to your local system and execute the SQL file, which will create the database on the local system.
Here is the mysqldump example
http://www.roseindia.net/tutorial/mysql/mysqlbackup/mysqldump.html
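For instance, the dump-and-reload can look like this (host, user, and names are placeholders):

mysqldump -h remote_host -u username -p source_db big_table > big_table.sql
mysql -u username -p local_db < big_table.sql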
Try setting up synchronization; the latest version of phpMyAdmin provides an option for it. You need to set the source DB to the remote database and the destination to your local database.
Setting up PHP (and phpMyAdmin too) on the local machine is not a big task. If the table is much bigger, you may need to increase the maximum execution time for the phpMyAdmin script.
Alternatively, if you can access the remote MySQL port, you can try to connect to the remote DB from your local machine with mysql -h remote_IP -u username -pPassword. If it connects, then you can definitely use the mysqldump command on the local machine. Check this link.
The problem with your Java program is likely to be because the MySQL JDBC driver stores the entire ResultSet in memory by default. With a huge table, this is highly likely to cause an OutOfMemoryError.
You can stop the MySQL driver from doing this by following the instructions in the ResultSet section of this page in the MySQL documentation (which I found via this blog post).
I was able to reproduce an OutOfMemoryError with a simple Java program that simply read each row out of a table with over 120 million rows. After making the changes suggested in the MySQL documentation, my Java program completed without any memory issues.
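A minimal sketch of those changes (the connection details and table name are placeholders); a TYPE_FORWARD_ONLY/CONCUR_READ_ONLY statement plus a fetch size of Integer.MIN_VALUE is Connector/J's documented signal to stream rows one at a time:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingCopy {
    public static void main(String[] args) throws SQLException {
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://remote-host/bigdb", "readonly_user", "secret");
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE); // stream instead of buffering everything
        try (ResultSet rs = stmt.executeQuery("SELECT * FROM huge_table")) {
            while (rs.next()) {
                // write each row to a local file for LOAD DATA INFILE
            }
        }
        conn.close();
    }
}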
