// Requires the static imports from the Spark Cassandra connector:
// import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
// import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;
javaFunctions(recomm)
        .writerBuilder("recommender", "recommendations_wkg", mapToRow(Recommendations_wkg.class))
        .saveToCassandra();
The code inserts into the table but won't update it: if the row already exists, it inserts a new one. I want it to update when I have different info.
I've created the following table in Hive:
CREATE TABLE mytable (..columns...) PARTITIONED BY (load_date string) STORED AS ...
And I'm trying to insert data into my table with Spark as follows:
// withColumn needs a Column, so wrap the literal with org.apache.spark.sql.functions.lit
Dataset<Row> dfSelect = df.withColumn("load_date", lit("15_07_2018"));
dfSelect.write().mode("append").partitionBy("load_date").save(path);
I also set the following configuration:
sqlContext().setConf("hive.exec.dynamic.partition","true");
sqlContext().setConf("hive.exec.dynamic.partition.mode","nonstrict");
After running the write command, I see the directory /myDbPath/load_date=15_07_2018 on HDFS, which contains the file that I've written, but when I run a query like:
show partitions mytable
or
select * from mytable where load_date="15_07_2018"
I get 0 records.
What happened and how can I fix this?
EDIT
If I run the following command in Hue:
msck repair table mytable
the problem is solved. How can I do this in my code?
Hive stores a list of partitions for each table in its metastore. If, however, new partitions are added directly to HDFS (say by using the hadoop fs -put command, or a .save() call, etc.), the metastore (and hence Hive) will not be aware of these partitions unless the user runs one of the commands below:
The metastore check command (msck repair table):
msck repair table <db.name>.<table_name>;
(or)
ALTER TABLE table_name ADD PARTITION commands on each of the newly added partitions.
We can also add partitions using an ALTER TABLE statement; with this approach we need to add each newly created partition to the table individually:
alter table <db.name>.<table_name> add partition(load_date="15_07_2018") location <hdfs-location>;
Run either of the above statements, then check the data again for load_date="15_07_2018".
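Since the question asks how to do this from code, here is a minimal sketch, assuming Spark with Hive support enabled (with the older API in the question, sqlContext().sql(...) works the same way; names are illustrative):
// import org.apache.spark.sql.SparkSession;
SparkSession spark = SparkSession.builder()
        .appName("PartitionRepair") // illustrative app name
        .enableHiveSupport()
        .getOrCreate();

// Register partitions that were written directly to HDFS with the metastore.
spark.sql("MSCK REPAIR TABLE mytable");

// Or add the single new partition explicitly:
spark.sql("ALTER TABLE mytable ADD IF NOT EXISTS PARTITION (load_date='15_07_2018')");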
For more details, see the Hive documentation on adding partitions and msck repair table.
I have an update query which I am trying to execute through the batchUpdate method of Spring's JdbcTemplate. This update query can potentially match thousands of rows in the EVENT_DYNAMIC_ATTRIBUTE table which need to be updated. Will updating thousands of rows in a table cause any issue in a production database apart from a timeout? For example, will it crash the database or slow down the performance of the entire database engine for other connections, etc.?
Is there a better way to achieve this instead of firing a single update query through the Spring JdbcTemplate or JPA? I have the following settings for the JdbcTemplate:
this.jdbc = new JdbcTemplate(ds);
jdbc.setFetchSize(1000);
jdbc.setQueryTimeout(0); // zero means there is no limit
The update query:
UPDATE EVENT_DYNAMIC_ATTRIBUTE eda
SET eda.ATTRIBUTE_VALUE = 'claim',
eda.LAST_UPDATED_DATE = SYSDATE,
eda.LAST_UPDATED_BY = 'superUsers'
WHERE eda.DYNAMIC_ATTRIBUTE_NAME_ID = 4002
AND eda.EVENT_ID IN
(WITH category_data
AS ( SELECT c.CATEGORY_ID
FROM CATEGORY c
START WITH CATEGORY_ID = 495984
CONNECT BY PARENT_ID = PRIOR CATEGORY_ID)
SELECT event_id
FROM event e
WHERE EXISTS
(SELECT 't'
FROM category_data cd
WHERE cd.CATEGORY_ID = e.PRIMARY_CATEGORY_ID))
If it is a one-time thing, I normally first select the records that need to be updated and put them in a temporary table or a CSV, making sure that I save the primary key of those records. Then I read the records in batches from the temporary table or CSV and update the main table using the primary key. This way the table is not locked for a long time, each batch updates a fixed set of records, and because the updates go through the primary key they are very fast. If any update fails, you know which records failed by logging the failed primary keys to a log file or an error table. I have followed this approach many times to update millions of records in a PROD database, as it is a very safe approach.
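A minimal sketch of that approach using the JdbcTemplate from the question; the primary-key column name EVENT_DYNAMIC_ATTRIBUTE_ID and the batch size are assumptions, and the key-selection query reuses the WHERE clause from the original update:
import java.util.ArrayList;
import java.util.List;

// 1. Save the primary keys of the rows that need updating.
//    EVENT_DYNAMIC_ATTRIBUTE_ID is a hypothetical primary-key column name.
List<Long> ids = jdbc.queryForList(
        "SELECT eda.EVENT_DYNAMIC_ATTRIBUTE_ID FROM EVENT_DYNAMIC_ATTRIBUTE eda "
      + "WHERE eda.DYNAMIC_ATTRIBUTE_NAME_ID = 4002 "
      + "AND eda.EVENT_ID IN (/* the same subquery as in the question */)",
        Long.class);

// 2. Update in fixed-size batches so each statement touches a bounded number
//    of rows and locks are held only briefly.
String update = "UPDATE EVENT_DYNAMIC_ATTRIBUTE SET ATTRIBUTE_VALUE = 'claim', "
      + "LAST_UPDATED_DATE = SYSDATE, LAST_UPDATED_BY = 'superUsers' "
      + "WHERE EVENT_DYNAMIC_ATTRIBUTE_ID = ?";

int batchSize = 1000; // assumption; tune for your environment
for (int from = 0; from < ids.size(); from += batchSize) {
    List<Object[]> args = new ArrayList<>();
    for (Long id : ids.subList(from, Math.min(from + batchSize, ids.size()))) {
        args.add(new Object[] { id });
    }
    jdbc.batchUpdate(update, args); // log failed ids here for retry
}
Because each batch is keyed by primary key, a failed batch can be retried from the saved key list without rescanning the large tables.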
I'm trying to create a table in MySQL from a Java desktop program but I get a MySQLSyntaxErrorException.
The query is :
CREATE TABLE FileXFascia(fila0 Integer,fila1 Integer,fila2 Integer,fila3 Integer) VALUES ('3','4','3','3')
Does anyone know where I'm wrong? The error is:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'VALUES ('3','4','3','3')' at line 1
You need to split these as follows:
CREATE TABLE FileXFascia(fila0 Integer,fila1 Integer,fila2 Integer,fila3 Integer);
INSERT INTO FileXFascia (fila0, fila1, fila2, fila3) VALUES ('3','4','3','3');
In your question there are two different operations on a table: you are trying to create the table and insert data in a single query, which is not valid syntax. First you need to create the table, then insert data into the created table, like the syntax below.
create table tableName(col1 dataType,col2 dataType,col3 dataType,.......coln dataType);
After creating the table you can insert data into it, like the syntax below.
insert into tableName(col1, col2,col3,......coln) values ('data1','data2','data3',......'datan');
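Since the question is about a Java desktop program, here is a minimal JDBC sketch; the connection URL and credentials are placeholders:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Placeholder connection details; substitute your own.
try (Connection conn = DriverManager.getConnection(
        "jdbc:mysql://localhost:3306/mydb", "user", "password");
     Statement stmt = conn.createStatement()) {
    // Create the table first...
    stmt.executeUpdate("CREATE TABLE FileXFascia(fila0 INTEGER, fila1 INTEGER, "
            + "fila2 INTEGER, fila3 INTEGER)");
    // ...then insert the row as a separate statement.
    stmt.executeUpdate("INSERT INTO FileXFascia (fila0, fila1, fila2, fila3) "
            + "VALUES (3, 4, 3, 3)");
} catch (SQLException e) {
    e.printStackTrace();
}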
I just want to update my table using custom SQL in Liferay 6.1. I have a table with an auto-increment, not-null primary key. Whenever I try to update, it creates a new row instead of updating the data. Please help me update data using custom SQL; my database is MySQL.
I'm trying to create a function in my Java application where the user can select a previously made backup but only import table rows that aren't in the current database instance. With a MySQL database I could dump my tables, rename them inside the .sql so they create temporary tables when imported again, and then simply cross-query all rows not in the DB. Any idea how I could accomplish something similar in HSQLDB from within my Java application?
You can do this:
open the backup database
create a text table that is a copy of the main table, e.g. CREATE TEXT TABLE yourtable_copy AS (SELECT * FROM yourtable)
set a file for the table: SET TABLE yourtable_copy SOURCE 'filepath'
copy the data to the new table
set the source off with SET TABLE yourtable_copy SOURCE OFF
shutdown the backup database
open the main database
now do the same text table creation and source setting with the main database, but do not copy the data, as the backup data is already there and will be opened
do your updates, then turn the text source off in the main database
Reference: http://www.hsqldb.org/doc/2.0/guide/texttables-chapt.html
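A sketch of those steps from Java, assuming both databases can read the same source file (text-table source paths are resolved relative to each database's directory); the paths, credentials, and the id column used for the existence check are illustrative. WITH NO DATA is added so the copy starts empty, matching the separate copy step above:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Inside a method declared with "throws SQLException".
// In the backup database: export yourtable to a text-table source file.
try (Connection backup = DriverManager.getConnection(
        "jdbc:hsqldb:file:/backups/mydb", "SA", "");
     Statement st = backup.createStatement()) {
    st.execute("CREATE TEXT TABLE yourtable_copy AS (SELECT * FROM yourtable) WITH NO DATA");
    st.execute("SET TABLE yourtable_copy SOURCE 'yourtable_copy.csv'");
    st.execute("INSERT INTO yourtable_copy SELECT * FROM yourtable");
    st.execute("SET TABLE yourtable_copy SOURCE OFF");
    st.execute("SHUTDOWN");
}

// In the main database: attach the same source file and merge the missing rows.
// The CSV written above must be placed where this database can read it.
try (Connection main = DriverManager.getConnection(
        "jdbc:hsqldb:file:/data/mydb", "SA", "");
     Statement st = main.createStatement()) {
    st.execute("CREATE TEXT TABLE yourtable_copy AS (SELECT * FROM yourtable) WITH NO DATA");
    st.execute("SET TABLE yourtable_copy SOURCE 'yourtable_copy.csv'");
    // Import only rows whose (assumed) primary key id is not already present.
    st.execute("INSERT INTO yourtable SELECT * FROM yourtable_copy c "
            + "WHERE NOT EXISTS (SELECT 1 FROM yourtable t WHERE t.id = c.id)");
    st.execute("SET TABLE yourtable_copy SOURCE OFF");
}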