Update/replace table data in Google BigQuery via Java

I am trying to update BigQuery table data using Java. According to my research, WriteDisposition is the option to use, but I am fairly new to this and could not get it working. Kindly help.
That said, I have already managed to insert data using WriteChannelConfiguration, which worked fine. I need to change this code so that it updates (replaces) the table instead.
public class BigQryAPI {
    public static void explicit() {
        // Load credentials from JSON key file. If you can't set the GOOGLE_APPLICATION_CREDENTIALS
        // environment variable, you can explicitly load the credentials file to construct the
        // credentials.
        try {
            GoogleCredentials credentials;
            File credentialsPath = new File(BigQryAPI.class.getResource("/firstprojectkey.json").getPath()); // TODO: update to your key path.
            FileInputStream serviceAccountStream = new FileInputStream(credentialsPath);
            credentials = ServiceAccountCredentials.fromStream(serviceAccountStream);

            // Instantiate a client
            BigQuery bigquery =
                BigQueryOptions.newBuilder().setCredentials(credentials).build().getService();

            System.out.println("Datasets:");
            for (Dataset dataset : bigquery.listDatasets().iterateAll()) {
                System.out.printf("%s%n", dataset.getDatasetId().getDataset());
            }

            // Load into table
            TableId tableId = TableId.of("firstdataset", "firsttable");
            WriteChannelConfiguration writeChannelConfiguration =
                WriteChannelConfiguration.newBuilder(tableId).setFormatOptions(FormatOptions.csv()).build();
            TableDataWriteChannel writer = bigquery.writer(writeChannelConfiguration);
            String csvdata = "zzzxyz,zzzxyz";

            // Write data to writer
            try {
                writer.write(ByteBuffer.wrap(csvdata.getBytes(Charsets.UTF_8)));
            } finally {
                writer.close();
            }

            // Get load job
            Job job = writer.getJob();
            job = job.waitFor();
            LoadStatistics stats = job.getStatistics();
            System.out.println("These are my stats: " + stats);

            String query = "SELECT Name,Phone FROM `firstproject-256319.firstdataset.firsttable`;";
            QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query).build();

            // Print the results.
            for (FieldValueList row : bigquery.query(queryConfig).iterateAll()) {
                for (FieldValue val : row) {
                    System.out.printf("%s,", val.toString());
                }
                System.out.printf("%n");
            }
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}

You can set the write disposition while building the WriteChannelConfiguration:
WriteChannelConfiguration writeChannelConfiguration =
    WriteChannelConfiguration.newBuilder(tableId)
        .setFormatOptions(FormatOptions.csv())
        .setWriteDisposition(JobInfo.WriteDisposition.WRITE_TRUNCATE)
        .build();
Details can be found in the BigQuery API Docs.
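As a fuller illustration, here is a minimal sketch of the truncating load, reusing the tableId, bigquery and csvdata variables from the question's code (use WRITE_APPEND instead if you only want to add rows):
// Sketch: overwrite the existing table contents with the new CSV data.
WriteChannelConfiguration truncateConfig =
    WriteChannelConfiguration.newBuilder(tableId)
        .setFormatOptions(FormatOptions.csv())
        .setWriteDisposition(JobInfo.WriteDisposition.WRITE_TRUNCATE)
        .build();

TableDataWriteChannel truncateWriter = bigquery.writer(truncateConfig);
try {
    truncateWriter.write(ByteBuffer.wrap(csvdata.getBytes(Charsets.UTF_8)));
} finally {
    truncateWriter.close();
}

// Wait for the load job and surface any error instead of assuming success.
Job loadJob = truncateWriter.getJob().waitFor();
if (loadJob.getStatus().getError() != null) {
    System.out.println("Load failed: " + loadJob.getStatus().getError());
}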

Related

Can we use the cosmosContainer.queryItems() method to execute a delete query on a Cosmos container?

I have a Java method in which I use the following lines of code to fetch data from Azure Cosmos DB:
Iterable<FeedResponse<Object>> feedResponseIterator =
    cosmosContainer
        .queryItems(sqlQuery, queryOptions, Object.class)
        .iterableByPage(continuationToken, pageSize);
The whole method looks like this:
public List<LinkedHashMap> getDocumentsFromCollection(
    String containerName, String partitionKey, String sqlQuery) {
  List<LinkedHashMap> documents = new ArrayList<>();
  String continuationToken = null;
  do {
    CosmosQueryRequestOptions queryOptions = new CosmosQueryRequestOptions();
    CosmosContainer cosmosContainer = createContainerIfNotExists(containerName, partitionKey);
    Iterable<FeedResponse<Object>> feedResponseIterator =
        cosmosContainer
            .queryItems(sqlQuery, queryOptions, Object.class)
            .iterableByPage(continuationToken, pageSize);
    int pageCount = 0;
    for (FeedResponse<Object> page : feedResponseIterator) {
      long startTime = System.currentTimeMillis();
      // Access all the documents in this result page
      page.getResults().forEach(document -> documents.add((LinkedHashMap) document));
      // Along with page results, get a continuation token
      // which enables the client to "pick up where it left off"
      // in accessing query response pages.
      continuationToken = page.getContinuationToken();
      pageCount++;
      log.info(
          "Cosmos Collection {} deleted {} page with {} number of records in {} ms time",
          containerName,
          pageCount,
          page.getResults().size(),
          (System.currentTimeMillis() - startTime));
    }
  } while (continuationToken != null);
  log.info(containerName + " Collection has been collected successfully");
  return documents;
}
My question is: can we use the same code to execute a delete query such as DELETE * FROM c? If yes, what would be returned in the Iterable<FeedResponse<Object>> feedResponseIterator object?
SQL statements in Cosmos DB can only be used for reads. Delete operations must be done using deleteItem().
Here are Java SDK samples (sync and async) for all document operations in Cosmos DB.
Java v4 SDK Document Samples
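If you do need to clear out documents with the Java SDK, a rough sketch of a query-then-delete loop with the v4 sync client is shown below. It reuses the question's cosmosContainer; the "id"/"pk" field names and the page size are assumptions to adapt to your documents (PartitionKey, CosmosItemRequestOptions and CosmosQueryRequestOptions come from com.azure.cosmos.models):
// Sketch: page through the ids, then delete each item individually.
// Assumes each document exposes "id" and a partition-key field called "pk".
CosmosQueryRequestOptions options = new CosmosQueryRequestOptions();
for (FeedResponse<LinkedHashMap> page :
        cosmosContainer
            .queryItems("SELECT c.id, c.pk FROM c", options, LinkedHashMap.class)
            .iterableByPage(100)) {
    for (LinkedHashMap doc : page.getResults()) {
        String id = (String) doc.get("id");
        String pk = (String) doc.get("pk");
        cosmosContainer.deleteItem(id, new PartitionKey(pk), new CosmosItemRequestOptions());
    }
}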

Kafka Avro To BigQuery using Apache Beam in Java

Here is the scenario:
Kafka to BigQuery using Apache Beam, as an alternative to the BigQuerySinkConnector [WePay] on Kafka Connect.
I have been able to read Avro messages from the Kafka topic and print their contents to the console accurately. I am looking for help with writing these KafkaRecords to a BigQuery table.
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline pipeline = Pipeline.create(options);

// Customer is an auto-generated class from the Avro schema, using the Eclipse Avro Maven plugin

// Read from the Kafka topic and get KafkaRecords
@SuppressWarnings("unchecked")
PTransform<PBegin, PCollection<KafkaRecord<String, Customer>>> input = KafkaIO.<String, Customer>read()
    .withBootstrapServers("http://server1:9092")
    .withTopic("test-avro")
    .withConsumerConfigUpdates(ImmutableMap.of("specific.avro.reader", (Object) "true"))
    .withConsumerConfigUpdates(ImmutableMap.of("auto.offset.reset", (Object) "earliest"))
    .withConsumerConfigUpdates(ImmutableMap.of("schema.registry.url", (Object) "http://server2:8181"))
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializerAndCoder((Class) KafkaAvroDeserializer.class, AvroCoder.of(Customer.class));

// Print Kafka records to the console log
pipeline.apply(input)
    .apply("ExtractRecord", ParDo.of(new DoFn<KafkaRecord<String, Customer>, KafkaRecord<String, Customer>>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            KafkaRecord<String, Customer> record = (KafkaRecord<String, Customer>) c.element();
            KV<String, Customer> log = record.getKV();
            System.out.println("Key Obtained: " + log.getKey());
            System.out.println("Value Obtained: " + log.getValue().toString());
            c.output(record);
        }
    }));

// Write each record to the BigQuery table
// The table already exists in BigQuery, so the create disposition would be CREATE_NEVER
// Records are to be appended to the table, so the write disposition would be WRITE_APPEND
// All fields in the Customer object have corresponding column names and datatypes, so it is a one-to-one mapping
// The connection to BigQuery is through a service account JSON file, set as an environment variable in the Eclipse project's run config

// Set the table specification for BigQuery
String bqTable = "my-project:my-dataset:my-table";
The examples currently available show how to manually set a schema and assign values field by field. I am looking for an automated way to infer the schema from the Customer Avro object and map it to the columns directly, without such manual field-by-field assignment.
Is this possible?
After much trial and error, I was able to make the following work.
I would welcome review comments that share concerns or propose better solutions.
SchemaRegistryClient registryClient = new CachedSchemaRegistryClient("http://server2:8181", 10);
SchemaMetadata latestSchemaMetadata;
Schema avroSchema = null;
try {
    // getLatestSchemaMetadata takes the subject name, which is in topic-value format ("-value" suffixed to the topic),
    // so if the topic is "test-avro" then the subject is "test-avro-value"
    latestSchemaMetadata = registryClient.getLatestSchemaMetadata("test-avro-value");
    avroSchema = new Schema.Parser().parse(latestSchemaMetadata.getSchema());
} catch (IOException e) {
    System.out.println("IO Exception while obtaining registry data");
    e.printStackTrace();
} catch (RestClientException e) {
    System.out.println("Client Exception while obtaining registry data");
    e.printStackTrace();
}

// Print the Avro schema obtained
System.out.println("---------------- Avro schema ----------- " + avroSchema.toString());
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline pipeline = Pipeline.create(options);

// Read from the Kafka topic and get KafkaRecords
// Create KafkaIO.Read with the Avro schema deserializer
@SuppressWarnings("unchecked")
KafkaIO.Read<String, GenericRecord> read = KafkaIO.<String, GenericRecord>read()
    .withBootstrapServers("http://server1:9092")
    .withTopic(KafkaConfig.getInputTopic())
    .withConsumerConfigUpdates(ImmutableMap.of("schema.registry.url", "http://server2:8181"))
    .withConsumerConfigUpdates(ImmutableMap.of("specific.avro.reader", (Object) "true"))
    .withConsumerConfigUpdates(ImmutableMap.of("auto.offset.reset", (Object) "earliest"))
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializerAndCoder((Class) KafkaAvroDeserializer.class, AvroCoder.of(avroSchema));

// Set the Beam schema
org.apache.beam.sdk.schemas.Schema beamSchema = AvroUtils.toBeamSchema(avroSchema);

// Print Kafka records to the console log, then write each record to the BigQuery table.
// The table already exists in BigQuery, so the create disposition is CREATE_NEVER.
// Records are appended to the table, so the write disposition is WRITE_APPEND.
// All fields in the Customer object have corresponding column names and datatypes, so it is a one-to-one mapping.
// The connection to BigQuery is through a service account JSON file, set as an environment variable in the Eclipse project's run config.

// Set the table specification for BigQuery
String bqTable = "my-project:my-dataset:my-table";

pipeline.apply(read)
    .apply("ExtractRecord", ParDo.of(new DoFn<KafkaRecord<String, GenericRecord>, KV<String, GenericRecord>>() {
        private static final long serialVersionUID = 1L;

        @ProcessElement
        public void processElement(ProcessContext c) {
            KafkaRecord<String, GenericRecord> record = (KafkaRecord<String, GenericRecord>) c.element();
            KV<String, GenericRecord> log = record.getKV();
            System.out.println("Key Obtained: " + log.getKey());
            System.out.println("Value Obtained: " + log.getValue().toString());
            c.output(log);
        }
    }))
    .apply(Values.<GenericRecord>create())
    .setSchema(beamSchema,
        TypeDescriptor.of(GenericRecord.class),
        AvroUtils.getToRowFunction(GenericRecord.class, avroSchema),
        AvroUtils.getFromRowFunction(GenericRecord.class))
    .apply(BigQueryIO.<GenericRecord>write()
        .to(bqTable)
        .useBeamSchema()
        .withCreateDisposition(CreateDisposition.CREATE_NEVER)
        .withWriteDisposition(WriteDisposition.WRITE_APPEND));

pipeline.run().waitUntilFinish();
The above works with CREATE_IF_NEEDED also.

Pentaho Java API job

I am currently using the Java API to connect to a Pentaho repository. I want to know whether there is a method to tell if a particular Pentaho file is of Job or Transformation type.
I am using the sample code below; as you can see, I am manually creating the JobMeta or TransMeta.
Is there an API to get the Pentaho object type?
Repository repository = new PentahoContext().initialize(repositoryName, userName, password);
RepositoryDirectoryInterface directoryPublic = repository.loadRepositoryDirectoryTree();
RepositoryDirectoryInterface directoryPublic1 = directoryPublic.findDirectory("/home");
JobMeta jobMeta = repository.loadJob(jobName, directoryPublic1, null, null);
I don't think you can query for a single file, but you can use Repository.getJobAndTransformationObjects() or RepositoryDirectoryInterface.getRepositoryObjects() to return a list of objects in the specified directory. You can then iterate over that list looking for the object with that name, then call getObjectType() to see if it is equal to RepositoryObjectType.TRANSFORMATION or RepositoryObjectType.JOB:
// pseudo-Java code
List<RepositoryElementMetaInterface> repositoryObjects =
    repository.getJobAndTransformationObjects(directoryPublic1.getObjectId(), false);
for (RepositoryElementMetaInterface object : repositoryObjects) {
    if (object.getName().equals("myFile")) {
        if (object.getObjectType().equals(RepositoryObjectType.TRANSFORMATION)) {
            TransMeta transMeta = repository.loadTransformation(...);
        } else if (object.getObjectType().equals(RepositoryObjectType.JOB)) {
            JobMeta jobMeta = repository.loadJob(...);
        }
    }
}
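If it helps, here is a compilable sketch of that lookup as a small helper, assuming the standard Kettle repository API; the class and method names below (other than the Repository calls) are just for illustration:
import java.util.List;
import org.pentaho.di.core.exception.KettleException;
import org.pentaho.di.repository.Repository;
import org.pentaho.di.repository.RepositoryDirectoryInterface;
import org.pentaho.di.repository.RepositoryElementMetaInterface;
import org.pentaho.di.repository.RepositoryObjectType;

public class RepositoryTypeLookup {
    // Returns RepositoryObjectType.JOB or RepositoryObjectType.TRANSFORMATION for the named
    // object in the given directory, or null if nothing with that name is found.
    public static RepositoryObjectType findType(Repository repository,
                                                RepositoryDirectoryInterface directory,
                                                String objectName) throws KettleException {
        List<RepositoryElementMetaInterface> objects =
            repository.getJobAndTransformationObjects(directory.getObjectId(), false);
        for (RepositoryElementMetaInterface object : objects) {
            if (object.getName().equals(objectName)) {
                return object.getObjectType();
            }
        }
        return null;
    }
}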

Cannot retrieve Storage Gateway snapshots using Java API

I'm trying to grab a list of snapshots from our Storage Gateway and put them into a JTable. However, when I use the AWS Java API to retrieve a list of snapshots, I am only able to retrieve what appear to be the public snapshots published by Amazon. When I set DescribeSnapshotsRequest.setOwnerIds() to include "self", the list is empty.
Here is the offending code:
private void createTable() {
    Object[][] data = null;
    String[] columns = new String[]{"Snapshot ID", "Owner ID", "Snapshot Date"};
    DescribeSnapshotsRequest req = new DescribeSnapshotsRequest();
    req.setOwnerIds(Arrays.<String>asList("self"));
    try {
        snapshots = ec2Client.describeSnapshots(req).getSnapshots();
        data = new Object[snapshots.size()][3];
        int i = 0;
        for (Snapshot item : snapshots) {
            data[i][0] = item.getSnapshotId();
            data[i][1] = item.getOwnerId();
            data[i][2] = item.getStartTime();
            i++;
        }
    } catch (Exception e) {
        System.out.println("Invalid Credentials!");
    }
    table = new JTable(data, columns);
    table.setAutoCreateRowSorter(true);
}
The List snapshots is empty unless I either remove the DescribeSnapshotsRequest, or set the owner ID to "amazon".
Long story short, why can't I access my private snapshots from the Storage Gateway?
Figured it out. Turns out you have to explicitly define the EC2 endpoint. Somehow I missed that step.
Here is the list of endpoints:
http://docs.aws.amazon.com/general/latest/gr/rande.html#ec2_region
ec2Client.setEndpoint("<Endpoint URL>"); // setEndpoint() is an instance method on the EC2 client
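Putting it together, a short sketch of the fix applied to the question's code; ec2.us-east-1.amazonaws.com is only an example endpoint, use the region your gateway's volumes live in:
// Point the client at the region that owns the snapshots before querying.
ec2Client.setEndpoint("ec2.us-east-1.amazonaws.com"); // example region

DescribeSnapshotsRequest req = new DescribeSnapshotsRequest();
req.setOwnerIds(Arrays.asList("self"));
List<Snapshot> snapshots = ec2Client.describeSnapshots(req).getSnapshots();
System.out.println("Found " + snapshots.size() + " private snapshots");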

Serializing RowMutation

Inside the Cassandra source trunk on Github (https://github.com/apache/cassandra), there's an example of writing data in examples/client_only/src/ClientOnlyExample.java:
private static void testWriting() throws Exception
{
    // do some writing.
    for (int i = 0; i < 100; i++)
    {
        RowMutation change = new RowMutation(KEYSPACE, ByteBufferUtil.bytes(("key" + i)));
        ColumnPath cp = new ColumnPath(COLUMN_FAMILY).setColumn(("colb").getBytes());
        change.add(new QueryPath(cp), ByteBufferUtil.bytes(("value" + i)), 0);

        // don't call change.apply(). The reason is that it makes a static call into Table, which will perform
        // local storage initialization, which creates local directories.
        // change.apply();

        StorageProxy.mutate(Arrays.asList(change), ConsistencyLevel.ONE);
        System.out.println("wrote key" + i);
    }
    System.out.println("Done writing.");
}
I'm looking to serialize the data into a readable format (JSON) at the point where the writes seem to happen, in org.apache.cassandra.service.StorageProxy, inside the performWrite method:
public static IWriteResponseHandler performWrite(IMutation mutation,
                                                 ConsistencyLevel consistency_level,
                                                 String localDataCenter,
                                                 WritePerformer performer)
throws UnavailableException, IOException
{
    ...
The IMutation parameter appears to be what I want, since RowMutation implements that interface. I can get the table (keyspace) and the column families out, but I can't seem to get the column names/values. If I'm inside the method mentioned above, how do I get that information from IMutation mutation?
// keyspace
String table = mutation.getTable();

// TODO won't work with batch?
UUID cfId = mutation.getColumnFamilyIds().iterator().next();

// column family name: cfMetadata.cfName
CFMetaData cfMetadata = Schema.instance.getCFMetaData(cfId);

// row key
RowMutation data = new RowMutation(table, mutation.key());
String row = ByteBufferUtil.bytesToHex(data.key());

// column name/values ??
// data. ....
Looking at the RowMutation source, I noticed that the same package contains a RowMutationSerializer with the method
serialize(RowMutation, DataOutputStream, int)
Could this be useful here?
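If the goal is just to read the column names and values (rather than the wire format RowMutationSerializer produces), one option is to walk the mutation's column families. This is only a rough sketch against the pre-CQL API used in this code (RowMutation, ColumnFamily, IColumn); adjust it to whatever version of the source tree you are on:
// Rough sketch: print the column names/values of a mutation as hex strings.
if (mutation instanceof RowMutation)
{
    RowMutation rm = (RowMutation) mutation;
    for (ColumnFamily cf : rm.getColumnFamilies())
    {
        for (IColumn column : cf.getSortedColumns())
        {
            String name = ByteBufferUtil.bytesToHex(column.name());
            String value = ByteBufferUtil.bytesToHex(column.value());
            System.out.println(cf.metadata().cfName + ": " + name + " = " + value);
        }
    }
}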
