Hive JDBC is very slow - java

Following this guide, I connect to Hive using JdbcTemplate, authenticated with Kerberos. I am new to Hive.
After getting a connection, I execute some SQL, but retrieving all the results is very slow.
while (resultSet.next()) {
// process each row
}
I also see a lot of log lines like the ones below for each row fetched from the resultSet:
org.apache.hive.jdbc.HiveQueryResultSet - Fetched row string:
org.apache.thrift.transport.TSaslTransport - writing data length: 100
org.apache.thrift.transport.TSaslTransport - CLIENT reading data length: 256
It seems that the slowness is due to a limit on the transport size (256 bytes in this case, I guess). If so, how can I increase that size?
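A first thing to try, sketched below under assumptions: the standard JDBC fetch size, which tells the driver how many rows to request from the server per round trip. The sketch uses plain JDBC rather than JdbcTemplate and assumes your Hive driver version honors Statement.setFetchSize; the class name, the RowHandler interface, and the fetch size you pass are hypothetical. With Spring, JdbcTemplate.setFetchSize(int) sets the same hint on the statements it creates. The per-row log lines above look like DEBUG output, and logging every row is itself costly, so raising the log level for org.apache.thrift and org.apache.hive.jdbc is also worth a try.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class HiveFetchSketch {
    /** Hypothetical per-row callback; may throw because it reads from the ResultSet. */
    @FunctionalInterface
    interface RowHandler {
        void handle(ResultSet row) throws SQLException;
    }

    /**
     * Runs sql with an explicit fetch size and passes each row to handler.
     * Returns the number of rows processed.
     */
    static long forEachRow(Connection conn, String sql, int fetchSize, RowHandler handler)
            throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            // Hint to the driver: pull fetchSize rows per round trip instead of its default.
            stmt.setFetchSize(fetchSize);
            try (ResultSet rs = stmt.executeQuery(sql)) {
                long rows = 0;
                while (rs.next()) {
                    handler.handle(rs);
                    rows++;
                }
                return rows;
            }
        }
    }
}
```

For example, HiveFetchSketch.forEachRow(conn, "SELECT * FROM my_table", 10_000, row -> { /* process row */ }), where my_table and 10,000 are placeholders to tune against row width and client memory.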

Related

java.sql.SQLRecoverableException: No more data to read from socket reading from NoSQL

I am trying to read data from NoSQL and insert it into Oracle. If I truncate the Oracle table first and then insert, it works fine. But if data already exists, or if I stop in the middle of a read/write operation and start again from the beginning, I get the error below.
xx-xx-xx xx:xx:xx SEVERE AnalyticsMigrate main Exceptionjava.sql.SQLRecoverableException: No more data to read from socket
java.sql.SQLRecoverableException: No more data to read from socket
at oracle.jdbc.driver.T4CMAREngineNIO.prepareForReading(T4CMAREngineNIO.java:119)
at oracle.jdbc.driver.T4CMAREngineNIO.unmarshalUB1(T4CMAREngineNIO.java:534)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:485)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:252)
Oracle JDBC 12.2 and higher uses Java NIO calls in blocking mode, and those calls are affected when the application interrupts the thread. Previous JDBC releases used stream-based I/O calls, which are not affected by interrupts.
Please try setting
System.setProperty("oracle.jdbc.javaNetNio", "false");
before connecting, and retry.
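A minimal sketch of where that call can live, assuming you control the class that opens the Oracle connections: a static initializer runs before any method of the class, so the property is set before that class asks the driver for a connection. The class and method names are hypothetical. Passing -Doracle.jdbc.javaNetNio=false on the java command line should have the same effect without a code change.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class OracleNioSwitch {
    static {
        // Runs once, before open() can be called: asks the 12.2+ driver
        // to avoid its NIO socket path (the property from the answer above).
        System.setProperty("oracle.jdbc.javaNetNio", "false");
    }

    /** Hypothetical helper: opens a connection after the property above is in place. */
    static Connection open(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    /** True once the static initializer has run. */
    static boolean nioDisabled() {
        return "false".equals(System.getProperty("oracle.jdbc.javaNetNio"));
    }
}
```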

SQL Server data extraction (Java JDBC): process hangs while retrieving rows

I need to extract data from a remote SQL Server database. I am using the Microsoft JDBC driver (mssql-jdbc).
I noticed that when retrieving rows from the database, the process often hangs suddenly, without any error. It simply stays stuck, and no more rows are processed.
The code to read from the database is the following:
String connectionUrl = "jdbc:sqlserver://10.10.10.28:1433;databaseName=MYDB;user=MYUSER;password=MYPWD;selectMethod=direct;sendStringParametersAsUnicode=false;responseBuffering=adaptive;";
String query = "SELECT * FROM MYTABLE";
try (Connection sourceConnection = DriverManager.getConnection(connectionUrl);
     Statement stmt = sourceConnection.createStatement(SQLServerResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY, SQLServerResultSet.CONCUR_READ_ONLY)) {
    stmt.setFetchSize(100);
    try (ResultSet resultSet = stmt.executeQuery(query)) {
        while (resultSet.next()) {
            // Often, after retrieving some rows, the process remains stuck here
        }
    }
}
Usually the connection is established correctly and some rows are fetched, then at some point the process can get stuck retrieving the next batch of rows, without any error and without processing any new row. This happens sometimes; other times it completes successfully.
AFAIK, the only explanation I can see is that at some point a connection problem occurs with the remote machine, but shouldn't the driver notify me of this?
I am not sure how I should handle this type of situation. Is there anything I can do on my side to let the process complete even if there is a temporary connection problem with the remote server (of course, if the connection is not recoverable there is nothing I can do)?
As another test, instead of the Java JDBC driver I tried the bcp utility to extract data from the remote database, and even with this native utility I observe the same problem: sometimes it completes successfully, other times it retrieves some rows (say 20000) and then gets stuck, with no errors and no more rows processed.
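A driver cannot report a connection that silently stops delivering data unless it is given a read timeout. A hedged sketch, assuming your mssql-jdbc version supports the socketTimeout connection property (milliseconds; the default 0 means wait forever): append it to the URL so a stalled read fails with an exception you can catch and retry, instead of hanging. The helper name and the 60-second value are hypothetical. Since bcp stalls the same way, the root cause is probably on the network or server side, so this only turns a silent hang into an error.

```java
class SqlServerUrl {
    /**
     * Appends socketTimeout (milliseconds) to a semicolon-separated mssql-jdbc URL.
     * Does not check whether the URL already sets socketTimeout.
     */
    static String withSocketTimeout(String url, int timeoutMs) {
        if (timeoutMs < 0) {
            throw new IllegalArgumentException("timeoutMs must be >= 0");
        }
        // Properties are separated by ';'; avoid producing ";;" if the URL already ends with one.
        String base = url.endsWith(";") ? url : url + ";";
        return base + "socketTimeout=" + timeoutMs + ";";
    }

    public static void main(String[] args) {
        String url = "jdbc:sqlserver://10.10.10.28:1433;databaseName=MYDB;selectMethod=direct;responseBuffering=adaptive;";
        System.out.println(withSocketTimeout(url, 60_000));
    }
}
```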

Apache Tomcat JDBC Connection Pool bad performance on batch/bulk inserts

I have recently incorporated the Apache Tomcat JDBC Connection Pool into my application (using a MySQL DB). I tried Apache DBCP before but didn't like its results, and the Tomcat implementation seemed to fit my needs even though I run a standalone Java application and don't use Tomcat at all.
Recently, I encountered a huge performance problem when executing batch (aka bulk) insert queries.
I have a flow in which I insert ~2500 records into a table in batches. It takes forever when using the JDBC connection pool, compared to a few seconds when I revert to opening a connection for each query (no pooling).
I wrote a small application that inserts 30 rows into the same table. It takes 12 seconds with pooling and ~800 ms without.
Prior to using the connection pool, I used com.mysql.jdbc.jdbc2.optional.MysqlDataSource as my DataSource. The connection was configured with the following line:
dataSource.setRewriteBatchedStatements(true);
I'm quite sure that this is the core difference between the two approaches, but couldn't find an equivalent parameter in jdbc-pool.
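The MySQL JDBC driver does not support batch operations natively: unless rewriteBatchedStatements is enabled, it executes the batched statements serially. rewriteBatchedStatements is the best you can get. Here is the relevant code from MySQL's PreparedStatement.java: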
try {
    statementBegins();
    clearWarnings();
    if (!this.batchHasPlainStatements && this.connection.getRewriteBatchedStatements()) {
        if (canRewriteAsMultiValueInsertAtSqlLevel()) {
            return executeBatchedInserts(batchTimeout);
        }
        if (this.connection.versionMeetsMinimum(4, 1, 0) && !this.batchHasPlainStatements && this.batchedArgs != null
                && this.batchedArgs.size() > 3 /* cost of option setting rt-wise */) {
            return executePreparedBatchAsMultiStatement(batchTimeout);
        }
    }
    return executeBatchSerially(batchTimeout);
} finally {
    this.statementExecuting.set(false);
    clearBatch();
}
It is one of the reasons why I do not like MySQL and prefer Postgres.
EDIT:
You should combine the connection pool, batch operations, and the rewriteBatchedStatements option. You can set rewriteBatchedStatements through a JDBC URL parameter: jdbc:mysql://localhost:3307/mydb?rewriteBatchedStatements=true
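A sketch of that combination, assuming a pooled DataSource (Tomcat jdbc-pool or any other) configured with the URL above; the table my_table, its name column, and the helper name are hypothetical. Rows are queued with addBatch() and sent with executeBatch() in chunks, so with the option on, the driver can collapse each chunk into multi-row INSERTs. jdbc-pool also documents a connectionProperties attribute (format propertyName=value;) for passing driver properties, which should be another way to set the same option without touching the URL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import javax.sql.DataSource;

class BatchInsertSketch {
    // Hypothetical URL to configure on the pool; the query parameter enables rewriting.
    static final String URL = "jdbc:mysql://localhost:3307/mydb?rewriteBatchedStatements=true";

    /**
     * Inserts names into hypothetical table my_table(name) in chunks of batchSize.
     * Returns the number of batched statements the driver acknowledged.
     */
    static int insertAll(DataSource pool, List<String> names, int batchSize) throws SQLException {
        String sql = "INSERT INTO my_table (name) VALUES (?)";
        try (Connection conn = pool.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            int queued = 0;
            int sent = 0;
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();                          // queue the row client-side
                if (++queued == batchSize) {            // flush a full chunk
                    sent += ps.executeBatch().length;
                    queued = 0;
                }
            }
            if (queued > 0) {
                sent += ps.executeBatch().length;       // flush the remainder
            }
            return sent;
        }
    }
}
```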

Query timeout on MS SQL Server

I have a Java webapp accessing a database on MS SQL Server 2012 running on the same machine.
Some of the queries fail after exactly 3 seconds with:
com.microsoft.sqlserver.jdbc.SQLServerException: The query has timed out.
It happens a couple of times a day, in the mornings, when the app is not heavily loaded.
On average, the queries take less than 50 ms.
I'm using Microsoft JDBC Driver 4.0, and the queries fail after exactly 3 seconds even if I call statement.setQueryTimeout(0).
Remote query timeout on the server is set to its default value (600s).
Any idea why the queries fail after 3s?
Edit:
Here are some of the queries:
UPDATE CampaignCalls SET note = 'Short Text' WHERE (saveTime IS NULL) AND (agent = ?)
This one updates no more than 50 rows.
INSERT INTO CampaignCustomers (campaignId, clientId, completed, callTime)
SELECT ?, clientId, 0, callTime
FROM CampaignCustomers WITH (NOLOCK) WHERE campaignId = ?
This one copies no more than 1500 rows.
The connection to the server doesn't break. I'm reusing it a moment later with no problems.
I am wondering: why 3 seconds? Is there any other timeout setting I am not seeing? Even if the table is locked for some reason, why does the query time out after exactly 3 s?
Thank you all!
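One way to narrow this down, sketched under assumptions: set the timeout on the exact statement you execute, read it back with getQueryTimeout() to see whether anything in between (for example a pool or statement wrapper) changed it, and log the elapsed time when the call fails. The class and method names are hypothetical, the UPDATE is the one from the question, and agent is assumed to be an integer column since the question does not say.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class TimeoutProbe {
    static final String SQL =
        "UPDATE CampaignCalls SET note = 'Short Text' WHERE (saveTime IS NULL) AND (agent = ?)";

    /** Runs the UPDATE with no client-side query timeout and returns the updated row count. */
    static int updateNotes(Connection conn, int agent) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setQueryTimeout(0); // per the JDBC contract, 0 means no limit
            int effective = ps.getQueryTimeout();
            if (effective != 0) {
                // A non-zero value suggests a wrapper is changing the timeout.
                System.err.println("query timeout reads back as " + effective + " s");
            }
            ps.setInt(1, agent);
            long start = System.nanoTime();
            try {
                return ps.executeUpdate();
            } catch (SQLException e) {
                long ms = (System.nanoTime() - start) / 1_000_000;
                System.err.println("failed after " + ms + " ms: " + e.getMessage());
                throw e;
            }
        }
    }
}
```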

Problems logging into MSSQL server from Java

I am trying to connect to MSSQL Server 2008 on localhost, but I am getting errors:
WARNING: ConnectionID:2 Prelogin error: host 127.0.0.1 port 1434 Error reading prelogin response: Connection reset
This error repeats about 20 times very quickly, then I get
com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
I suspect this is some wrong configuration of server. I can connect to external servers with no issues. I can also connect via management studio with no problems.
Please help me, I am getting sick of this issue :|
1) TCP protocol for MSSQL 2008 is disabled by default. You need to enable it from SQL Configuration Manager (in the same [Start], Programs folder as MSSQL):
http://msdn.microsoft.com/en-us/library/ms190425%28v=sql.105%29.aspx
2) To use TCP connections (even on localhost!) you need to allow MSSQL in Windows Firewall:
http://technet.microsoft.com/en-us/library/cc646023.aspx
3) You don't necessarily need to use TCP/IP in order to access MSSQL:
http://msdn.microsoft.com/en-us/library/ms187892%28v=sql.105%29.aspx
'Hope that helps!
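To check points 1 and 2 before touching any JDBC settings, a minimal sketch using only java.net.Socket: if nothing accepts a TCP connection on the port your URL targets, the driver cannot log in either. The class name is hypothetical, and 1433 is SQL Server's default TCP port, but a named instance may listen elsewhere. The warning in the question mentions port 1434, which is usually associated with the SQL Server Browser service rather than the default database engine listener, so it is worth double-checking which port the URL uses.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unreachable: TCP disabled, wrong port, or blocked by a firewall.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("127.0.0.1:1433 reachable: " + canConnect("127.0.0.1", 1433, 2000));
    }
}
```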
To connect to MSSQL Server from a Java application, you need to use the JDBC API. The JDBC API provides classes and methods that connect to the database, load the appropriate driver, send SQL queries, retrieve results etc.
HOW TO CONNECT TO THE DATABASE
A ‘Connection’ object represents a connection with a database. To establish the connection, use the method ‘DriverManager.getConnection’. This method takes a string containing a URL which represents the database we are trying to connect to. Below is sample code for establishing a connection:
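// Note: the JDBC-ODBC bridge behind "jdbc:odbc:" URLs was removed in Java 8; with the Microsoft driver
// the URL looks like "jdbc:sqlserver://localhost:1433;databaseName=<your database>" instead.
String databaseUrl = "jdbc:odbc:embedded_sql_app"; // establish connection to database
Connection connection = DriverManager.getConnection(databaseUrl, "sa", "123");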
Detailed discussion about the Database URL and how to create it can be found in the resource provided at the end of this post.
QUERYING THE DATABASE
The JDBC API provides three interfaces for sending SQL statements to the database, and corresponding methods in the ‘Connection’ interface create instances of them.
1. Statement - created by the ‘Connection.createStatement’ methods. A ‘Statement’ object is used for sending SQL statements with no parameters.
2. PreparedStatement - created by the ‘Connection.prepareStatement’ methods. A ‘PreparedStatement’ object is used for precompiled SQL statements. These can take one or more parameters as input arguments (IN parameters).
3. CallableStatement - created by the ‘Connection.prepareCall’ methods. ‘CallableStatement’ objects are used to execute SQL stored procedures from Java database applications.
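To make the PreparedStatement item concrete, a short sketch of a parameterized query in which the ? placeholder is bound with setInt before execution. Table1 and its columns a and b are the ones used in the ResultSet example that follows; the filter, the class name, and the method name are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class PreparedQuerySketch {
    /** Returns column b of every Table1 row whose column a is greater than minA. */
    static List<String> valuesAbove(Connection connection, int minA) throws SQLException {
        String sql = "SELECT b FROM Table1 WHERE a > ?";
        List<String> result = new ArrayList<>();
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setInt(1, minA); // bind the IN parameter (indexes are 1-based)
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(rs.getString("b"));
                }
            }
        }
        return result;
    }
}
```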
RETRIEVING THE RESULT
A ‘ResultSet’ is a Java object that contains the results of executing a SQL query. The data stored in a ‘ResultSet’ object is retrieved through a set of get methods that allows access to the various columns of the current row. The ‘ResultSet.next’ method is used to move to the next row of the ‘ResultSet’, making it the current row. The following code fragment executes a query that returns a collection of rows, with column ‘a’ as an ‘int’, column ‘b’ as a ‘String’, and column ‘c’ as a ‘float’:
try (java.sql.Statement stmt = connection.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT a, b, c FROM Table1")) {
    while (rs.next()) { // retrieve and print the values for the current row
        int i = rs.getInt("a");
        String s = rs.getString("b");
        float f = rs.getFloat("c");
        System.out.println("ROW = " + i + " " + s + " " + f);
    }
}
This is just a brief introduction on how to interact with a database from Java. For more details on the items discussed above as well as information on passing parameters, executing stored procedures etc. please refer to the following resource: ( http://www.shahriarnk.com/Shahriar-N-K-Research-Embedding-SQL-in-C-Sharp-Java.html#Shahriar_N_Embedding_SQL_in_Java ) Here, you will also find information on how to interact with a database programmatically; i.e. without using SQL. Hope you find this useful.
