We're trying to run ALTER DDL statements on existing Athena tables (previously created through regular Java SDK StartQueryExecutionRequest calls) without saving table versions, so that we don't run into the Glue TABLE_VERSION limit (see the Glue limits link below). We had been running our application for a while, unaware that all previous table versions were being stored, and we hit a hard limit in our AWS account. Specifically, we're adding partitions and updating Avro schemas programmatically using the AWS Java SDK version 2 (2.10.66, if that matters).
It looks like we need to enable an option called SkipArchive in the Glue UpdateTable request to disable this previous-versions behavior. However, the AWS documentation, while extensive, is all over the place and doesn't have clear examples, and I can't figure out how to simply run a basic UpdateTableRequest.
Does anybody have any examples or links that detail how to run this command through the Java API?
Doc listing the limits of table versions per table and per account in Glue - https://docs.aws.amazon.com/general/latest/gr/glue.html
https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateTable.html#API_UpdateTable_RequestSyntax
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/glue/AWSGlue.html#updateTable-com.amazonaws.services.glue.model.UpdateTableRequest-
The answer might be a bit late, but here is an update_table request example (using boto3) for reference:
import boto3
glue=boto3.client('glue')
#List the versions of a specific table from the Glue Data Catalog
"""
paginator = glue.get_paginator('get_table_versions')
response_iterator = paginator.paginate(DatabaseName='etl_framework',TableName='etl_master_withskiparchive')
for page in response_iterator:
    print(page)
"""
#Sample glue.create_table method
#glue.create_table(DatabaseName='etl_framework',TableInput={'Name':'etl_master_20221013','StorageDescriptor':{'Columns':[{'Name':'id','Type':'bigint','Comment':''},{'Name':'groupid','Type':'bigint','Comment':''},{'Name':'tableName','Type':'string','Comment':''},{'Name':'schemaName','Type':'string','Comment':''},{'Name':'source','Type':'string','Comment':''},{'Name':'loadType','Type':'string','Comment':''},{'Name':'incrementalcolumn','Type':'string','Comment':''},{'Name':'active','Type':'bigint','Comment':''},{'Name':'createdate','Type':'date','Comment':''},{'Name':'startdatetime','Type':'timestamp','Comment':''},{'Name':'enddatetime','Type':'timestamp','Comment':''},{'Name':'sql_query','Type':'string','Comment':''},{'Name':'partition_column','Type':'string','Comment':''},{'Name':'priority','Type':'bigint','Comment':''},{'Name':'pk_columns','Type':'string','Comment':''},{'Name':'merge_query','Type':'string','Comment':''},{'Name':'bucket_Name','Type':'string','Comment':''},{'Name':'offset_value','Type':'bigint','Comment':''},{'Name':'audit','Type':'bigint','Comment':''}],'Location':'s3://pb-4-0-datalake-raw-layer/iceberg/etl_master_20221013','InputFormat':'','OutputFormat':'','Compressed':False,'NumberOfBuckets':0,'SerdeInfo':{},'BucketColumns':[],'SortColumns':[],'SkewedInfo':{},'StoredAsSubDirectories':False},'Parameters':{'metadata_location':'s3://pb-4-0-datalake-raw-layer/iceberg/etl_master_20221013/metadata/','table_type':'ICEBERG'},'TableType': 'EXTERNAL_TABLE'})
#Sample glue.update_table method with SkipArchive=True, so previous table versions are not archived
glue.update_table(DatabaseName='etl_framework',TableInput={'Name':'etl_master_20221011','StorageDescriptor':{'Columns':[{'Name': 'id', 'Type': 'bigint', 'Parameters': {'iceberg.field.current': 'true', 'iceberg.field.id': '1', 'iceberg.field.optional': 'true'}}],'Location':'s3://pb-4-0-datalake-raw-layer/iceberg/etl_master_20221010','InputFormat':'','OutputFormat':'','Compressed':False,'NumberOfBuckets':0,'SerdeInfo':{},'BucketColumns':[],'SortColumns':[],'SkewedInfo':{},'StoredAsSubDirectories':False},'Parameters':{'metadata_location':'s3://pb-4-0-datalake-raw-layer/iceberg/etl_master_20221012/metadata/','table_type':'ICEBERG'},'TableType': 'EXTERNAL_TABLE'},SkipArchive=True)
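Since the original question was about the AWS Java SDK v2, here is a rough sketch of the same call in Java. It fetches the current table definition, copies it into a TableInput, and sets skipArchive(true); the database and table names are placeholders and the copied fields are not exhaustive, so adjust for your tables.

import software.amazon.awssdk.services.glue.GlueClient;
import software.amazon.awssdk.services.glue.model.GetTableRequest;
import software.amazon.awssdk.services.glue.model.Table;
import software.amazon.awssdk.services.glue.model.TableInput;
import software.amazon.awssdk.services.glue.model.UpdateTableRequest;

public class SkipArchiveExample {
    public static void main(String[] args) {
        try (GlueClient glue = GlueClient.create()) {
            // Fetch the current table definition so we can echo it back unchanged.
            Table table = glue.getTable(GetTableRequest.builder()
                    .databaseName("my_database")   // placeholder
                    .name("my_table")              // placeholder
                    .build())
                .table();

            // Copy the relevant fields into a TableInput (add any others your table uses).
            TableInput input = TableInput.builder()
                    .name(table.name())
                    .storageDescriptor(table.storageDescriptor())
                    .partitionKeys(table.partitionKeys())
                    .parameters(table.parameters())
                    .tableType(table.tableType())
                    .build();

            // skipArchive(true) tells Glue not to archive the previous table version.
            glue.updateTable(UpdateTableRequest.builder()
                    .databaseName("my_database")
                    .tableInput(input)
                    .skipArchive(true)
                    .build());
        }
    }
}

Note that UpdateTable replaces the table definition with whatever is in the TableInput, so copy over every field you want to keep.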
I have a folder with *.DDF and *.DAT files that make up a Pervasive/Btrieve database. I am able to open and see the contents of the database with DDF Periscope (ddf-periscope.com).
I can export data from each table individually using DDF Periscope, and I would like to do the same thing using Java: access the data in the DB and export it to a CSV file, POJOs, or any other form in which I can manipulate the data.
Is this possible?
You can use either the JDBC or the JCL interfaces to access the data. You do still need the Pervasive engine, but you can use Java. Here is a simple sample for the JDBC driver.
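(A minimal sketch: the com.pervasive.jdbc.v2.Driver class and the jdbc:pervasive URL format follow the Pervasive PSQL JDBC driver conventions, and the server, port, database, table, credentials, and output file names are placeholders; adjust them for your engine version.)

import java.io.FileWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class PervasiveExport {
    public static void main(String[] args) throws Exception {
        // Driver class and URL format for the Pervasive PSQL JDBC driver.
        Class.forName("com.pervasive.jdbc.v2.Driver");
        String url = "jdbc:pervasive://localhost:1583/MyDatabase"; // placeholder server/port/DB name

        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM MyTable"); // placeholder table
             PrintWriter out = new PrintWriter(new FileWriter("MyTable.csv"))) {

            ResultSetMetaData meta = rs.getMetaData();
            int cols = meta.getColumnCount();
            while (rs.next()) {
                StringBuilder line = new StringBuilder();
                for (int i = 1; i <= cols; i++) {
                    if (i > 1) line.append(',');
                    line.append(rs.getString(i)); // naive CSV: no quoting/escaping
                }
                out.println(line);
            }
        }
    }
}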
I don't have a JCL sample but there should be one in the Pervasive / Actian Java Class Library SDK.
I've been searching for a way to avoid hard-coding my database credentials into my code base (mainly written in Java), but I haven't found many solutions. I read this post where they said a one-way hash could be the answer. Is there another way of securely connecting to a database without running the risk of someone decompiling your code and extracting the credentials?
Just to clarify, I'm not looking for code, rather a nudge in the right direction.
If you can use a Spring Boot application, then you can configure this using Spring Cloud Config. I have added some PostgreSQL connection details below for reference. Please refer to the following link for Spring Boot cloud config: spring_cloud
spring.datasource.driverClassName=org.postgresql.Driver
spring.datasource.url=jdbc:postgresql://{{db_url}}:5432/{{db_name}}
spring.datasource.username=postgres
spring.datasource.password=
spring.datasource.maxActive=3
spring.datasource.maxIdle=3
spring.datasource.minIdle=2
spring.datasource.initialSize=2
spring.datasource.removeAbandoned=true
spring.datasource.tomcat.max-wait=10000
spring.datasource.tomcat.max-active=3
spring.datasource.tomcat.test-on-borrow=true
You could load a config file in your code. Define some kind of file, such as JSON or XML, and define all of your configurations in there. You could point to the file as a command line argument, or just hardcode the file path.
Here's a post talking about parsing JSON config in Java:
How to read json file into java with simple JSON library
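For example, with the json-simple library from that post, reading the credentials from an external file might look roughly like this (the config file path and key names are made up for illustration):

import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;

public class DbConfigLoader {
    public static void main(String[] args) throws Exception {
        // The path could come from a command line argument instead of being hardcoded.
        String configPath = args.length > 0 ? args[0] : "db-config.json";

        // Parse a JSON config file such as {"url": "...", "user": "...", "password": "..."}
        JSONObject config = (JSONObject) new JSONParser().parse(new FileReader(configPath));

        String url = (String) config.get("url");
        String user = (String) config.get("user");
        String password = (String) config.get("password");

        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}

Keep the config file out of version control (or encrypt it) so the credentials never live in the code base itself.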
You can refer to these posts. They basically suggest either hashing the password, storing it in a property file, or using an API. Some of the posts are not specifically about Java, but you can get ideas from them.
How can I avoid hardcoding the database connection password?
https://security.stackexchange.com/questions/36076/how-to-avoid-scripts-with-hardcoded-password
https://www.codeproject.com/Articles/1087423/Simplest-Way-to-Avoid-Hardcoding-of-the-Confidenti
The solution in our team is database-as-a-service: other applications call its API to get database credentials, and the request contains only simple credentials such as the application name.
You have several options to avoid hard-coded values in your source code:
Properties using Advanced Platforms
Properties from Environment variables
Properties from SCM
Properties from File System
More details here:
https://stackoverflow.com/a/51268633/3957754
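For instance, the environment-variables option can look roughly like this in Java (the variable names are just examples):

import java.sql.Connection;
import java.sql.DriverManager;

public class EnvCredentialsExample {
    public static void main(String[] args) throws Exception {
        // Set these outside the code base, e.g. in the shell, container, or CI secret store:
        //   export DB_URL=jdbc:postgresql://localhost:5432/mydb
        //   export DB_USER=postgres
        //   export DB_PASSWORD=...
        String url = System.getenv("DB_URL");
        String user = System.getenv("DB_USER");
        String password = System.getenv("DB_PASSWORD");

        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}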
I have a finalized database in SQL Server containing 50+ tables, and I need to connect it to Dropwizard code.
I am new to Java, so my understanding of migrations.xml is that it is used to create the tables in the database, or to apply any needed database changes.
So, since I don't need any changes to the database (as said earlier, it is finalized),
can I skip the migrations.xml file?
I need some expert advice, please.
If you are handling your database changes elsewhere, then you have no need for any migration xml files within your dropwizard project. It's an optional module, you don't need to use it. You don't even need to include the dropwizard-migrations dependency if you don't want to include database updates in your dropwizard project. You can still connect to your database fine within dropwizard. The docs provide examples using modules dropwizard-jdbi and dropwizard-hibernate.
To connect to your database, add the appropriate code to your Java configuration class and YAML config, as explained in the docs:
jdbi
http://www.dropwizard.io/0.9.2/docs/manual/jdbi.html
hibernate
http://www.dropwizard.io/0.9.2/docs/manual/hibernate.html
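As a rough sketch of the jdbi wiring from those docs (the configuration class, the "database" key, and the DAO mentioned in the comment are illustrative; the matching YAML block holds driverClass, url, user, password, etc.):

import com.fasterxml.jackson.annotation.JsonProperty;
import io.dropwizard.Application;
import io.dropwizard.Configuration;
import io.dropwizard.db.DataSourceFactory;
import io.dropwizard.jdbi.DBIFactory;
import io.dropwizard.setup.Environment;
import org.skife.jdbi.v2.DBI;
import javax.validation.Valid;
import javax.validation.constraints.NotNull;

// Backed by a "database:" block in the YAML config (driverClass, url, user, password, ...).
public class MyAppConfiguration extends Configuration {
    @Valid
    @NotNull
    private DataSourceFactory database = new DataSourceFactory();

    @JsonProperty("database")
    public DataSourceFactory getDataSourceFactory() {
        return database;
    }
}

class MyApplication extends Application<MyAppConfiguration> {
    @Override
    public void run(MyAppConfiguration configuration, Environment environment) {
        // Build a DBI instance from the data source defined in the YAML config.
        final DBIFactory factory = new DBIFactory();
        final DBI jdbi = factory.build(environment, configuration.getDataSourceFactory(), "sqlserver");
        // jdbi.onDemand(YourDao.class) would give you a DAO bound to your existing tables.
    }
}

No migrations.xml is involved anywhere in this wiring; the database is used exactly as it already exists.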
I want to store data on HDFS and need to access HBase, so how can I connect to HBase using the Java APIs?
Please suggest.
Thanks.
HBase has Java API. Have a look at http://hbase.apache.org/apidocs/index.html
Two important classes are
1) HBaseAdmin
2) HTable
HBaseAdmin is the admin API used to create/delete/alter tables.
HTable is the client API used to put/get/scan records.
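A minimal sketch using those two classes (this is the older client API; newer HBase releases replace them with Admin/Table obtained from a Connection). The ZooKeeper quorum, table, column family, and column names are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientExample {
    public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the classpath; the quorum can also be set explicitly.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost"); // placeholder

        // Admin API: create/delete/alter tables.
        HBaseAdmin admin = new HBaseAdmin(conf);
        System.out.println("Table exists: " + admin.tableExists("mytable"));

        // Client API: put/get/scan records.
        HTable table = new HTable(conf, "mytable");
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
        table.put(put);

        Result result = table.get(new Get(Bytes.toBytes("row1")));
        System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))));

        table.close();
        admin.close();
    }
}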
I wrote a simple framework for operating on HBase.
https://github.com/zhang-xzhi/simplehbase/
Simplehbase is a lightweight ORM framework between Java applications and HBase.
Its main features are the following:
Data type mapping: maps Java types to HBase's bytes and back.
HBase operation wrapping: wraps HBase's put/get/scan operations in a simple Java interface.
HBase query language: using HBase filters, Simplehbase supports an SQL-like style for querying HBase.
Dynamic query: like MyBatis, Simplehbase can use XML config files to define dynamic queries against HBase.
Insert/update support: provides insert and update on top of checkAndPut.
Multiple version support: provides an interface to operate on HBase's multiple versions.
HBase batch operation support.
HBase native interface support.
HTablePool management.
HTable count and sum.
Use HBase as the source via the TableMapper class and store the output in HDFS.
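A rough sketch of that approach, assuming the standard HBase MapReduce utilities; the table name, column family/qualifier, and output path are placeholders:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HBaseToHdfs {

    // Maps each HBase row to a line of text that is written to HDFS.
    static class ExportMapper extends TableMapper<Text, Text> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context)
                throws IOException, InterruptedException {
            String rowKey = Bytes.toString(key.get());
            String col = Bytes.toString(value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))); // placeholders
            context.write(new Text(rowKey), new Text(col == null ? "" : col));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hbase-to-hdfs");
        job.setJarByClass(HBaseToHdfs.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // reasonable defaults for MapReduce scans
        scan.setCacheBlocks(false);

        // Use the HBase table as the job's input source.
        TableMapReduceUtil.initTableMapperJob("mytable", scan, ExportMapper.class,
                Text.class, Text.class, job);

        // No reducers: mapper output goes straight to text files in HDFS.
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path("/user/hadoop/hbase-export")); // placeholder path

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}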