How to allow Apache Hive to handle queries from multiple clients - java

I have created an external table via Hive from HBase. When someone is logged in to the shell and runs some queries, no one else can run any query. When someone else tries to run a query, it gives the following error:
FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
When the first shell exits, queries can be run again. So it seems that my Hive cannot handle multiple clients. How do I enable it to handle multiple clients?

If you have not configured your Hive metastore, it is not possible for more than one user to access the Hive server:
Embedded Metastore: An embedded metastore is mainly used for unit tests. Only one process can connect to the metastore at a time, so it is not really a practical solution but works well for unit tests.
Check this link for more information.
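The fix described in the linked docs is to move to a standalone metastore backed by a real RDBMS. As a rough sketch of what that configuration looks like (the MySQL host, database name, and credentials below are placeholders, not values from the question), hive-site.xml would contain something like:

```xml
<!-- hive-site.xml: standalone metastore backed by MySQL.
     Host, database name, and credentials are placeholders. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://dbhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepass</value>
</property>
```

With the metastore in MySQL (or another server database), multiple Hive shells can run queries concurrently.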

Related

Cannot connect to H2 database

I have been struggling to connect to an H2 database from a Spring Boot app using the following connection string, as mentioned in the Database URL Overview section:
spring.datasource.url=jdbc:h2:tcp://localhost:9092/~/test-db
I also tried many different combinations for the tcp (server mode) connection, but still get errors such as "Connection is broken: java.net.SocketTimeoutException: connect timed out: localhost:9092" when running the Spring Boot app.
@SpringBootApplication
public class Application {
    // code omitted

    @Bean(initMethod = "start", destroyMethod = "stop")
    public Server h2Server() throws SQLException {
        return Server.createTcpServer("-tcp", "-tcpAllowOthers", "-tcpPort", "9092");
    }
}
So, how can I fix this problem and connect to H2 database via server mode?
You seem to be a little confused.
H2 can run in two different 'modes'.
Local mode
Local mode means H2 'just works', and you access this mode with the file: thing in the JDBC connect URL. The JDBC driver itself does all the database work, as in, it opens files, writes data, it does it all. There is no 'database server' at all. Or, if you prefer, the JDBC driver is its own server though it opens no ports.
Server mode
In this case you need a (separate) JVM and separately fire up H2 in server mode and then you can use the same library (still h2.jar) to serve as a JDBC server. In this mode, the two things are completely separate - if you want, you can run h2.jar on one machine to be the server, and run the same h2.jar on a completely different machine just to connect to the other H2 machine. The database server machine does the bulk of the work, with the 'client' H2 just being the JDBC driver. H2 is no different than e.g. mysql or postgres in such a mode: You have one 'app' / JVM process that runs as a database engine, allowing multiple different processes, even coming from completely different machines halfway around the world if you want to, to connect to it.
You access this mode with the tcp: thing in the JDBC string.
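To make the distinction concrete (the paths and port here are illustrative, not taken from the question), the two modes differ only in the JDBC URL:

```
jdbc:h2:file:~/test-db                  -- local mode: the driver opens the file itself
jdbc:h2:tcp://localhost:9092/~/test-db  -- server mode: the driver talks to a running H2 server
```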
If you really want, you can run this mode and still have it all on a single machine, even a single JVM, but why would you want to? Whatever made you think this will 'solve lock errors' wouldn't be fixed by running all this stuff on a single JVM. There are only two options:
You're mis-analysing the problem.
You really do have multiple separate JVM processes (either one machine with 2 java processes in the activity monitor / ps auxww output / task manager, or 2+ machines) all trying to connect to a single database in which case you certainly do need this, yes.
How to do server mode right
You most likely want a separate JVM that starts first and hosts the H2 database; it needs to be running before the 'client' JVMs (the ones that will connect to it) start. Catalina is not the 'server' you are looking for; it is org.h2.tools.Server, and if it says 'not found' you need to fix your Maven imports. This needs to be a separate JVM (you COULD write code that goes: oh, hey, there isn't a separate JVM running with the H2 server, so I'll start it in-process right here and now - but that means that process needs to stay up forever, which is just weird. Hence, you want a separate JVM process for this).
You haven't explained what you're doing. But, let's say what you're doing is this:
I have a CI script that fires up multiple separate JVMs, some in parallel even, which runs a bunch of integration and unit tests in parallel.
Even though they run in parallel (or perhaps intentionally so), you want them all to run off of a single DB. This is usually a really bad idea: you want tests to be isolated, so that running any one of them on its own behaves identically. You don't want a test that only fails when you run the same batch of 18 separate tests with the same run code, one unrelated test fails in a specific fashion, it's Tuesday, there's a full moon, Beethoven is playing in your music player, and it's warmer than 24°C in the room, affecting the CPU's throttling, of course. Which is exactly what tends to happen if you try to re-use resources across multiple tests! Still, let's say you somehow really want this.
... then, edit the CI script to first launch a JVM that hosts an H2 server; once that's up and running, presumably run a process that fills this database with test data; once that's done, run all tests in parallel; and once those are all done, shut down the JVM and delete the DB file.
Exactly how to do the third part is a separate question - if you need help with that, ask a new question and name the relevant tool(s) you are using to run this stuff, paste the config files, etc.
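As a rough sketch of what such a CI script could look like (a sketch only: the h2.jar path, the seeding step, the test command, and the database file name are all assumptions you'd replace with your own):

```shell
#!/bin/sh
# Start a separate JVM hosting the H2 TCP server (path to h2.jar is a placeholder)
java -cp h2.jar org.h2.tools.Server -tcp -tcpPort 9092 -tcpAllowOthers &
H2_PID=$!

# Seed the database with test data (placeholder command)
# java -cp app.jar com.example.SeedTestData

# Run all tests; they connect via jdbc:h2:tcp://localhost:9092/~/test-db
# mvn verify

# Shut down the server JVM and delete the database file
kill "$H2_PID"
rm -f "$HOME/test-db.mv.db"
```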

Document created even after NoHostAvailableException

I am trying to execute multiple BatchStatements in parallel with Java's ExecutorService. I want to know whether my query was successfully executed.
I have gone through:
how do I find out if the update query was successful or not in Cassandra Datastax
It says that if no exception is thrown, we can consider the query successful. But I am getting a NoHostAvailableException.
All host(s) tried for query failed (tried: *********************(com.datastax.driver.core.exceptions.OperationTimedOutException: [******************] Timed out waiting for server response))
But I can see my data in Cassandra. I want to know how can I know if my document is created successfully in Cassandra. Is there any way for it?
Batches in Cassandra are different from batches in relational databases. They should be used only in a limited number of use cases, and they shouldn't be used to batch inserts/updates to multiple partitions unless it's really necessary (see the "misuse of batches" doc).
A batch will eventually be replayed even if the driver got an error back - this happens because a logged batch is replicated to other nodes before execution. See the DataStax documentation on batch operations for details.
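If you need per-write confirmation, one option (a sketch against the DataStax Java driver 3.x API; the `session` and `insert` objects are assumed to already exist, and the callback style uses the Guava `Futures` utilities that driver version ships with) is to execute statements individually and attach callbacks:

```java
// Sketch: observe the outcome of each write individually
// ('session' and 'insert' are assumed to exist; DataStax driver 3.x API)
ResultSetFuture future = session.executeAsync(insert);
Futures.addCallback(future, new FutureCallback<ResultSet>() {
    @Override
    public void onSuccess(ResultSet rs) {
        // the coordinator acknowledged this write
    }

    @Override
    public void onFailure(Throwable t) {
        // e.g. NoHostAvailableException / OperationTimedOutException;
        // note a logged batch may still be replayed later despite this error
    }
}, MoreExecutors.directExecutor());
```

This way each statement's outcome is reported separately, instead of a single error for the whole batch.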

H2 Database Auto Server mode : Accessing through web console remotely

I am fairly new to the H2 database. As part of a PoC, I am using H2 (version 1.4.187) to mock an MS SQL Server DB. I have one application, say app1, which generates data and saves it into H2. Another application, app2, needs to read from the H2 database and process the data. I am trying to use Auto Server mode so that even if one of the applications is down, the other is able to read/write to/from the database.
After reading multiple examples, I found how to build the H2 URL, shown below:
jdbc:h2:~/datafactory;MODE=MSSQLServer;AUTO_SERVER=TRUE;
I enabled TCP and remote access as below:
org.h2.tools.Server.createTcpServer("-tcpAllowOthers","-webAllowOthers").start()
With this, I am able to write to the database. Now I want to read the data using the H2 web console application. I am able to do that from my local machine. However, I cannot figure out how to connect to this database remotely from another machine.
My plan is to run these two apps on an Ubuntu machine and monitor the data using the web console from my own machine. Is that not possible with this approach?
How can I solve this ?
Or do I need to use server mode and explicitly start the h2 server? Any help would be appreciated.
By default, remote connections are disabled in H2 for protection. To enable remote access to the TCP server, you need to start it with the option -tcpAllowOthers; the corresponding flags for the other servers are -webAllowOthers and -pgAllowOthers.
To start both the Web Console server (the H2 Console tool) and the TCP server with remote connections enabled, use something like the following:
java -jar /path/to/h2.jar -web -webAllowOthers -tcp -tcpAllowOthers -browser
More information can be found in the docs here, and console settings can be configured from here.
I'm not entirely sure, but looking at the documentation and previously answered questions on the same topic, the URL should be something like this:
jdbc:h2:tcp://<host>:<port>/~/datafactory;MODE=MSSQLServer;AUTO_SERVER=TRUE;
It seems that the host may not be localhost, and the database may not be in memory.
Is there a need for the H2 web console?
You can use a different SQL tool using the TCP server you have already started. I use SQuirreL SQL Client (http://squirrel-sql.sourceforge.net/) to connect to different databases.
If you need a web interface, you could use Adminer (https://www.adminer.org/), which can connect to different database vendors, including MS SQL, which happens to be the mode you're running H2 in. There is an Adminer Debian package that should work for Ubuntu.

What is the difference between the hive jdbc client and the hive metastore java api?

I was using the Hive JDBC client, but then I came to know that there is a Hive metastore Java API (here) with which you can also connect to Hive and manipulate the Hive database.
But I was wondering what exactly the difference is between these two ways.
Sorry if I'm asking anything obvious, but any information will be highly appreciated.
As far as I understand, there are two ways to connect to Hive:
the Hive metastore server, which then connects in the background to a relational DB such as MySQL for storing the schemas. This generally runs on port 9083.
the Hive JDBC server, called HiveServer2, which generally runs on port 10000.
Now, in earlier editions of Hive, HiveServer2 used to be not so stable, and in fact its multi-threading support was also limited. Things have probably improved in that arena, I'd imagine.
So for the JDBC API - yes, it lets you communicate using JDBC and SQL.
For metastore connectivity, there appear to be two capabilities:
actually running SQL queries - DML
performing DDL operations
DDL -
For DDL, the metastore APIs come in handy; the org.apache.hadoop.hive.metastore.HiveMetaStoreClient class can be utilized for that purpose.
DML -
What I have found useful in this regard is the org.apache.hadoop.hive.ql.Driver class (https://hive.apache.org/javadocs/r0.13.1/api/ql/org/apache/hadoop/hive/ql/Driver.html).
This class has a method called run() which lets you execute a SQL statement and get the result back.
For example, you can do the following:
Driver driver = new Driver(hiveConf);
HiveMetaStoreClient client = new HiveMetaStoreClient(hiveConf);
SessionState.start(new CliSessionState(hiveConf));

// DML example
driver.run("select * from employee");

// DDL example
client.dropTable(db, table);
The metastore in Hive, as the name indicates, is a store for the Hive DB's metadata.
This store is usually an RDBMS.
The metastore API supports interacting with that RDBMS to tinker with the metadata, not with the actual Hive data. For normal usage you may never want or have to use it. I would think it is meant for people building toolsets that work with the metastore, not for normal day-to-day usage.
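For instance, a metadata-only interaction (a sketch; it assumes a HiveConf that picks up a valid hive-site.xml, and uses metastore client methods as they exist in classic Hive versions) looks like:

```java
// Sketch: list table names via the metastore API - metadata only, no data access
HiveConf conf = new HiveConf();
HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
for (String tableName : client.getAllTables("default")) {
    System.out.println(tableName);
}
client.close();
```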

Understanding the real reason behind a Hive failure

I'm using a JDBC driver to run "describe TABLE_NAME" on hive. It gives me the following error:
NativeException: java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED:
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
return code 1 doesn't tell me very much. How do I figure out what the underlying reason is?
It's most likely because your Hive metastore is not set up properly. Hive uses an RDBMS metastore to store metadata about its tables. This includes things like table names, schemas, partitioning/bucketing/sorting columns, table-level statistics, etc.
By default, Hive uses an embedded Derby metastore which can only be accessed by one process at a time. If you are using that, it's possible that you have multiple sessions to Hive open, which is causing this problem.
In any case, I would recommend you set up a standalone metastore for Hive. Embedded Derby was chosen because it works out of the box and is convenient for running tests. However, in my opinion, it's not fit for production workflows. You can find instructions on how to configure MySQL as the Hive metastore here.
Possibly you have another session open, since Derby allows only one session at a time.
You can check -
ps -wwwfu <your id>
Kill the process that is holding the Hive connection.
It may be because the table with the name you've specified doesn't exist in the database.
Try creating the table and run the command again. It will work. :)
