I am trying to connect to a Teradata server through PySpark.
My CLI code is as below:
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("Teradata connect")
         .getOrCreate())

df = (spark.read
      .format("jdbc")
      .options(url="jdbc:teradata://xy/",
               driver="com.teradata.jdbc.TeraDriver",
               dbtable="dbname.tablename",
               user="user1", password="***")
      .load())
This gives the error:
py4j.protocol.Py4JJavaError: An error occurred while calling o159.load.
: java.lang.ClassNotFoundException: com.teradata.jdbc.TeraDriver
To resolve this, I think I need to add the jars terajdbc4.jar and tdgssconfig.jar.
In Scala, to add a jar we can use:
sc.addJar("<path>/jar-name.jar")
If I use the same in PySpark, I get the error:
AttributeError: 'SparkContext' object has no attribute 'addJar'.
or
AttributeError: 'SparkSession' object has no attribute 'addJar'
How can I add the jars terajdbc4.jar and tdgssconfig.jar?
Try following this post, which explains how to add JDBC drivers to PySpark:
How to add jdbc drivers to classpath when using PySpark?
That example is for Postgres and Docker, but the answer should work for your scenario.
Note, you are correct about the driver files: most JDBC drivers ship as a single jar, but Teradata splits its driver into two parts. One is the actual driver and the other (tdgss) handles security. Both files must be on the classpath for the driver to load, as in the sketch below.
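For reference, here is a minimal sketch of that approach, assuming both Teradata jars were downloaded to /path/to/jars (a placeholder path). spark.jars takes a comma-separated list and must be set before the SparkSession is created:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("Teradata connect")
         # both Teradata jars, comma-separated, go on the driver and executor classpath
         .config("spark.jars",
                 "/path/to/jars/terajdbc4.jar,/path/to/jars/tdgssconfig.jar")
         .getOrCreate())

df = (spark.read
      .format("jdbc")
      .options(url="jdbc:teradata://xy/",
               driver="com.teradata.jdbc.TeraDriver",
               dbtable="dbname.tablename",
               user="user1", password="***")
      .load())

The same jars can instead be passed on the command line with spark-submit --jars, which avoids hard-coding paths in the script.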
Alternatively, simply google "how to add jdbc drivers to pyspark".
I am facing an issue: when I try to import MongoDB data into Hive using the command below, it gives me an error.
CREATE EXTERNAL TABLE gok
(
  id STRING,
  name STRING,
  state STRING,
  email STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","name":"name","state":"state"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/gokul_test.play_test');
Note:
The versions of the tools used are below:
Java: JDK 8
Hadoop: 2.8.4
Hive: 2.3.3
MongoDB: 4.2
The jars below have been moved to HADOOP_HOME/lib and HIVE_HOME/lib:
mongo-hadoop-core-2.0.2.jar
mongo-hadoop-hive-2.0.2.jar
mongo-java-driver-2.13.2.jar
The error is:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/hadoop/hive/serde2/SerDe
I have tried manually adding the jars in Hive; the error I received then is below.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.com/mongodb/hadoop/hive/BSONSerDe
The two errors are different.
Let me know if you know of any resolution or need more details.
You should add the jars to your Hive session.
Which Hive client are you using?
If you are using beeline, you can add the full paths of the jars before trying to create the table:
beeline !connect jdbc:hive2://localhost:10000 "" ""
As soon as your session is created, add the jars using "add jar" and the full path of each jar file:
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongo-hadoop-hive-1.5.0-SNAPSHOT.jar;
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongo-hadoop-core-1.5.0-SNAPSHOT.jar;
add jar hdfs://sandbox.hortonworks.com:8020/tmp/udfs/mongodb-driver-3.0.4.jar;
The next step is to drop/create the table:
DROP TABLE IF EXISTS bars;
CREATE EXTERNAL TABLE bars
(
objectid STRING,
Symbol STRING,
TS STRING,
Day INT,
Open DOUBLE,
High DOUBLE,
Low DOUBLE,
Close DOUBLE,
Volume INT
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"objectid":"_id",
"Symbol":"Symbol", "TS":"Timestamp", "Day":"Day", "Open":"Open", "High":"High", "Low":"Low", "Close":"Close", "Volume":"Volume"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/marketdata.minibars');
source: https://community.cloudera.com/t5/Support-Questions/Mongodb-with-hive-Error-return-code-1-from-org-apache-hadoop/td-p/138161
It looks like mongo-hadoop-hive-<version>.jar has not been correctly added to Hive.
Try adding the mongodb JAR using the below command:
ADD JAR /path-to/mongo-hadoop-hive-<version>.jar
More info: https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
Alternatively, you could ingest the MongoDB BSON data into Hive in Avro format and then build tables in Hive. It's a long process, but it will get the job done. You will need to build a new connector for reading from Mongo and converting it to Avro format.
We are trying to load JSON files from a file location into a Snowflake named stage using Java. As far as we can tell, the PUT command only works for ODBC and not JDBC. Is there any way to execute a PUT command from Java code?
Thanks
The Snowflake JDBC driver does support PUT statements for local file uploads. The following Java statement is treated as a valid query, and the file is uploaded:
statement.executeQuery("PUT file:///tmp/foo.json @JSONSTAGE/ overwrite=true");
Running it under a logger produces logs such as the following (logs here are from JDBC driver version v3.12.2):
n.s.c.jdbc.SnowflakeFileTransferAgent$1 FINE call:778 - filePath: /tmp/foo.json
n.s.c.jdbc.SnowflakeFileTransferAgent FINE uploadFiles:1751 - Done with uploading
The JDBC driver also supports a more efficient way of uploading a stream directly, documented here.
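If you are not tied to the JDBC driver specifically, other Snowflake clients accept the same PUT syntax. A minimal sketch using the snowflake-connector-python package, where the account, credentials, database, and stage name are all placeholders:

import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345",   # placeholder account identifier
    user="user1",        # placeholder credentials
    password="***",
    database="mydb",
    schema="public",
)
try:
    # upload the local file into the named stage, replacing any existing copy
    conn.cursor().execute(
        "PUT file:///tmp/foo.json @JSONSTAGE/ overwrite=true")
finally:
    conn.close()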
I recently switched over from a PC to a Mac, and now, for whatever reason, one of my Impala drivers that worked fine is no longer found when run in Python. I keep receiving this error every time I run the script: "java.lang.RuntimeException: Class com.cloudera.impala.jdbc41.Driver not found". Please see the code snippet for my connection below.
import jaydebeapi

c = jaydebeapi.connect(
    jclassname='com.cloudera.impala.jdbc41.Driver',
    url='jdbc:impala://cloudera-impala-proxy.live.bi.xxx/;AuthMech=3;ssl=1;',
    driver_args=['xxx', self.dwh_password],
    jars='/Users/xxx/Desktop/ImpalaJDBC41 2.jar')
Any help or suggestions are appreciated; I feel like I'm going crazy trying to get this to work.
Did you check whether you have the ImpalaJDBC***.jar on your new machine?
Please check carefully whether it is available on the classpath/build path.
Edit:
You can use the Hive JDBC jar to connect to Impala as well; just use Impala's port rather than Hive's in the JDBC URL, as in the sketch below.
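A minimal sketch of that fallback, assuming the standard Hive JDBC driver class and Impala's default HiveServer2-compatible port 21050; the jar path, host, and credentials are placeholders:

import jaydebeapi

c = jaydebeapi.connect(
    jclassname='org.apache.hive.jdbc.HiveDriver',
    url='jdbc:hive2://cloudera-impala-proxy.live.bi.xxx:21050/default',
    driver_args=['xxx', 'your_password'],
    jars='/path/to/hive-jdbc-standalone.jar')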
Looking at this error, your jar may be corrupt.
First check your ImpalaJDBC jar:
java -jar ImpalaJDBC<version>.jar
If that gives you an error, your jar is corrupt.
Download the correct jar from Cloudera.
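One caveat: a JDBC driver jar usually has no Main-Class, so java -jar can print an error even on a healthy jar. As a complementary check, a jar is just a zip archive, so you can test its integrity from Python; the path below is a placeholder:

import zipfile

with zipfile.ZipFile('/path/to/ImpalaJDBC41.jar') as jar:
    # testzip() returns the name of the first corrupt member, or None if intact
    bad = jar.testzip()
    if bad:
        print('corrupt member:', bad)
    else:
        print('archive OK')
    # confirm the driver class is actually packaged in this jar
    print('driver present:',
          'com/cloudera/impala/jdbc41/Driver.class' in jar.namelist())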
I am trying to install and configure Hive with mongo-hadoop-core 2.0.2 for the first time. I have installed Hadoop 2.8.0, Hive 2.1.1, and MongoDB 3.4.6, and everything works fine when run individually.
My problem is that I am not able to connect MongoDB with Hive. I am using the mongo-hadoop connector for this, as described here: https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage
The required jars have been added to the Hadoop and Hive lib directories. I have even added them in hive.sh and at runtime from the Hive console.
I am getting an error while executing the CREATE TABLE query.
My query is:
CREATE EXTERNAL TABLE testHive
(
id STRING,
name STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","name":"name"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/hiveDb.testHive');
And I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/mongodb/hadoop/io/BSONWritable
hive> ERROR hive.ql.exec.DDLTask - java.lang.NoClassDefFoundError: com/mongodb/hadoop/io/BSONWritable
at com.mongodb.hadoop.hive.BSONSerDe.initialize(BSONSerDe.java:132)
at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:537)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:424)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:411)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:279)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:261)
It says that the com/mongodb/hadoop/io/BSONWritable class is not on the classpath, but I have added the required jar (mongo-hadoop-core.jar), and the class is present in that jar.
The versions of the jars I am using:
mongo-hadoop-core 2.0.2,
mongo-hadoop-hive 2.0.2,
mongo-java-driver 3.0.2
Thanks
You need to register jars explicitly. In your Hive script, use ADD JAR commands to include these JARs (core, hive, and the Java driver), e.g., ADD JAR /path-to/mongo-hadoop-hive-<version>.jar;.
If you are running from the Hive shell, use it like this:
hive> ADD JAR /path-to/mongo-hadoop-hive-<version>.jar;
Then execute your query.
I am trying to migrate from MySQL to PostgreSQL and I have a Java-related problem that I am not able to fix. Full disclosure: I know little or nothing about Java, but the migration uses a Java-based script, so for me it becomes a configuration problem.
Short version of the problem:
The migration tool throws this exception:
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
mysql-connector-java-5.0.8-bin.jar is already in the "JAVA_HOME\jre\lib\ext" directory, and I don't know how to solve this dependency problem.
Long version of the problem:
I was trying to migrate from MySQL to PostgreSQL. I checked the official PostgreSQL documentation and chose the free tool from EnterpriseDB (which can be downloaded here) to start the migration.
The installation readme tells you that the MySQL connector is not installed by default, but it also gives you the steps to solve this problem:
To enable MySQL connectivity, download MySQL's freely available JDBC driver from:
http://www.enterprisedb.com/downloads/third-party-jdbc-drivers
Place the mysql-connector-java-5.0.8-bin.jar file in the "JAVA_HOME\jre\lib\ext" directory (in my case: "C:\Program Files\Java\jre1.8.0_60\lib\ext\mysql-connector-java-5.0.8-bin.jar").
After configuring the tool properly and executing the .bat, this is the error I get:
Connecting with source MySQL database server...
MTK-11009: Error Connecting Database "MySQL Server"
DB-null: java.sql.SQLException: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
Stack Trace:
com.edb.MTKException: MTK-11009: Error Connecting Database "MySQL Server"
at com.edb.dbhandler.mysql.MySQLConnection.<init>(MySQLConnection.java:48)
at com.edb.common.MTKFactory.createMTKConnection(MTKFactory.java:250)
at com.edb.MigrationToolkit.createNewSourceConnection(MigrationToolkit.java:5982)
at com.edb.MigrationToolkit.initToolkit(MigrationToolkit.java:3346)
at com.edb.MigrationToolkit.main(MigrationToolkit.java:1700)
Caused by: java.sql.SQLException: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at com.edb.Utility.processException(Utility.java:327)
at com.edb.dbhandler.mysql.MySQLConnection.<init>(MySQLConnection.java:47)
... 4 more
...which, to my understanding, probably means that mysql-connector-java-5.0.8-bin.jar is not found.
All the links I've found online regarding the error are specific to Eclipse or other IDEs, so I was not able to solve this dependency problem on my own.
SOLUTION
With the help of a friend who knows Java well, this is the solution we arrived at:
To start looking for the problem, we opened the runMTK.bat. The execution line reads:
cscript //nologo "..\etc\sysconfig\runJavaApplication.vbs" "..\etc\sysconfig\edbmtk-49.config" "-Dprop=..\etc\toolkit.properties -classpath -jar edb-migrationtoolkit.jar %*"
So then we opened runJavaApplication.vbs and, to find out which JAVA_EXECUTABLE_PATH the program was using, we added this line to the script:
Wscript.Echo "JAVA_EXECUTABLE_PATH = " & JAVA_EXECUTABLE_PATH
With that info, we discovered that the script was using the Java folder under C:\Program Files (x86), instead of the one under C:\Program Files (where I had dropped the MySQL jar). So we copied mysql-connector-java-5.0.8-bin.jar into the \ext folder of the x86 Java installation, and now the script works.
Word of advice: the script is throwing errors in half of the exported tables, so all the hassle may not be worth it. BUT if anyone is interested in making this migration script work from A to Z (which has been quite a challenge), here are the details:
HOW TO
Free tool (from EnterpriseDB):
http://www.enterprisedb.com/downloads/postgres-postgresql-downloads
Extract the files from the zip and run the installer (ppasmeta-9.5.0.5-windows-x64.exe) as administrator.
To enable MySQL connectivity, download MySQL's freely available JDBC driver from:
http://www.enterprisedb.com/downloads/third-party-jdbc-drivers
Place the mysql-connector-java-5.0.8-bin.jar file in the "JAVA_HOME\jre\lib\ext" directory (in my case: "C:\Program Files\Java\jre1.8.0_60\lib\ext\mysql-connector-java-5.0.8-bin.jar").
The Migration Toolkit documentation can be found:
here (online doc): https://www.enterprisedb.com/docs/en/9.4/migrate/toc.html
or here (pdf doc): http://get.enterprisedb.com/docs/Postgres_Plus_Migration_Guide_v9.5.pdf
First: modify C:\Program Files\PostgresPlus\edbmtk\etc\toolkit.properties (Info here):
SRC_DB_URL=jdbc:mysql://SOURCE-HOST-NAME/SOURCE-DB-NAME
SRC_DB_USER=********
SRC_DB_PASSWORD=********
TARGET_DB_URL=jdbc:edb://localhost:5444/DESTINATION-DB-NAME
TARGET_DB_USER=enterprisedb
TARGET_DB_PASSWORD=********
Then: execute C:\Program Files\PostgresPlus\edbmtk\bin\runMTK.bat (Info here).
runMTK.bat -sourcedbtype mysql -targetdbtype enterprisedb -allTables YOUR_DB_SCHEMA
// ...or with a limited subset of tables:
runMTK.bat -sourcedbtype mysql -targetdbtype enterprisedb -tables TABLE1,TABLE2,TABLE3 YOUR_DB_SCHEMA
In order to get this subset of tables from MySQL:
SELECT GROUP_CONCAT(TABLE_NAME)
FROM information_schema.tables
WHERE TABLE_SCHEMA = 'your_db_name';