Google Dataproc API (through Java) does not submit Job to cluster

Google Dataproc API (through Java) does not submit Job to cluster - java

I was trying to get this section of code to submit a Hadoop job request based on this code sample:
import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.dataproc.v1.Job;
import com.google.cloud.dataproc.v1.JobControllerClient;
import com.google.cloud.dataproc.v1.JobControllerSettings;
import com.google.cloud.dataproc.v1.JobMetadata;
import com.google.cloud.dataproc.v1.JobPlacement;
import com.google.cloud.dataproc.v1.SparkJob;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class SubmitJob {
public static void submitJob() throws IOException, InterruptedException {
// TODO(developer): Replace these variables before running the sample.
String projectId = "your-project-id";
String region = "your-project-region";
String clusterName = "your-cluster-name";
submitJob(projectId, region, clusterName);
}
public static void submitJob(
String projectId, String region, String clusterName)
throws IOException, InterruptedException {
String myEndpoint = String.format("%s-dataproc.googleapis.com:443", region);
// Configure the settings for the job controller client.
JobControllerSettings jobControllerSettings =
JobControllerSettings.newBuilder().setEndpoint(myEndpoint).build();
// Create a job controller client with the configured settings. Using a try-with-resources
// closes the client,
// but this can also be done manually with the .close() method.
try (JobControllerClient jobControllerClient =
JobControllerClient.create(jobControllerSettings)) {
// Configure cluster placement for the job.
JobPlacement jobPlacement = JobPlacement.newBuilder().setClusterName(clusterName).build();
// Configure Spark job settings.
HadoopJob hadJob =
HadoopJob.newBuilder()
.setMainClass("my jar file")
.addArgs("input")
.addArgs("output")
.build();
Job job = Job.newBuilder().setPlacement(jobPlacement).setHadoopJob(hadJob).build();
// Submit an asynchronous request to execute the job.
OperationFuture<Job, JobMetadata> submitJobAsOperationAsyncRequest =
jobControllerClient.submitJobAsOperationAsync(projectId, region, job);
// THIS IS WHERE IT SEEMS TO TIMEOUT VVVVVVVV
Job response = submitJobAsOperationAsyncRequest.get();
// Print output from Google Cloud Storage.
Matcher matches =
Pattern.compile("gs://(.*?)/(.*)").matcher(response.getDriverOutputResourceUri());
matches.matches();
Storage storage = StorageOptions.getDefaultInstance().getService();
Blob blob = storage.get(matches.group(1), String.format("%s.000000000", matches.group(2)));
System.out.println(
String.format("Job finished successfully: %s", new String(blob.getContent())));
} catch (ExecutionException e) {
// If the job does not complete successfully, print the error message.
System.err.println(String.format("submitJob: %s ", e.getMessage()));
}
}
}
When running this sample, the code seems to timeout on Job response = submitJobAsOperationAsyncRequest.get(), and the Job is never submitted to my Google Cloud. I've checked all my project, region, and cluster names and I'm sure that is not the issue. I also have the following dependencies installed for the sample:
jar files
I believe I am not missing any .jar files.
Any suggestions? I appreciate any and all help.

Related

Java Google Api application Default credentials

How do I declare the application credentials? I have my .json file which is the key.
package shyam;
// Imports the Google Cloud client library
import com.google.cloud.vision.v1.AnnotateImageRequest;
import com.google.cloud.vision.v1.AnnotateImageResponse;
import com.google.cloud.vision.v1.BatchAnnotateImagesResponse;
import com.google.cloud.vision.v1.EntityAnnotation;
import com.google.cloud.vision.v1.Feature;
import com.google.cloud.vision.v1.Feature.Type;
import com.google.cloud.vision.v1.Image;
import com.google.cloud.vision.v1.ImageAnnotatorClient;
import com.google.protobuf.ByteString;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
public class App {
public static void main(String[] args) throws Exception {
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests. After completing all of your requests, call
// the "close" method on the client to safely clean up any remaining background resources.
try (ImageAnnotatorClient vision = ImageAnnotatorClient.create()) {
// The path to the image file to annotate
String fileName = "./resources/wakeupcat.jpg";
// Reads the image file into memory
Path path = Paths.get(fileName);
byte[] data = Files.readAllBytes(path);
ByteString imgBytes = ByteString.copyFrom(data);
// Builds the image annotation request
List<AnnotateImageRequest> requests = new ArrayList<>();
Image img = Image.newBuilder().setContent(imgBytes).build();
Feature feat = Feature.newBuilder().setType(Type.LABEL_DETECTION).build();
AnnotateImageRequest request =
AnnotateImageRequest.newBuilder().addFeatures(feat).setImage(img).build();
requests.add(request);
// Performs label detection on the image file
BatchAnnotateImagesResponse response = vision.batchAnnotateImages(requests);
List<AnnotateImageResponse> responses = response.getResponsesList();
for (AnnotateImageResponse res : responses) {
if (res.hasError()) {
System.out.format("Error: %s%n", res.getError().getMessage());
return;
}
// for (EntityAnnotation annotation : res.getLabelAnnotationsList()) {
// annotation
// .getAllFields()
// .forEach((k, v) -> System.out.format("%s : %s%n", k, v.toString()));
// }
}
}
}
}
I'm getting the error
Application default credentials are not available
I have already set it in my cmd using set GOOGLE_APPLICATION_CREDENTIALS='key_path'. I have a lot initialized my Google Cloud Account in the cli. Hope someone can help me. Thank you.

how to batch insert data into Google BigQuery from a Java service?

I've read through a few similar questions on SO and GCP docs - but did not get a definitive answer...
Is there a way to batch insert data from my Java service into BigQuery directly, without using intermediary files, PubSub, or other Google services?
The key here is the "batch" mode: I do not want to use streaming API as it costs a lot.
I know there are other ways to do batch inserts using Dataflow, Google Cloud Storage, etc. - I am not interested in those, I need to do batch inserts programmatically for my use case.
I was hoping to use the REST batch API but it looks like it is deprecated now: https://cloud.google.com/bigquery/batch
Alternatives that are pointed to by the docs are:
https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll REST request - but it looks like it will be working in the streaming mode inserting one row at a time (and cost a lot)
a Java client library: https://developers.google.com/api-client-library/java/google-api-java-client/dev-guide
After following through the links and references I ended up finding this specific API method promising: https://googleapis.dev/java/google-api-client/latest/index.html?com/google/api/client/googleapis/batch/BatchRequest.html
with the following usage pattern:
Create an BatchRequest object from this Google API client instance.
Sample usage:
client.batch(httpRequestInitializer)
.queue(...)
.queue(...)
.execute();
Is this API using the batch mode, not streaming one, and is the right way to go ?
thank you!

The "batch" version of writing data is called a "load job" in the Java client library. The bigquery.writer method creates an object which can be used to write data bytes as a batch load job. Set the format options based on the type of file you'd like to serialize to.
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobStatistics.LoadStatistics;
import com.google.cloud.bigquery.TableDataWriteChannel;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.WriteChannelConfiguration;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;
public class LoadLocalFile {
public static void main(String[] args) throws IOException, InterruptedException {
String datasetName = "MY_DATASET_NAME";
String tableName = "MY_TABLE_NAME";
Path csvPath = FileSystems.getDefault().getPath(".", "my-data.csv");
loadLocalFile(datasetName, tableName, csvPath, FormatOptions.csv());
}
public static void loadLocalFile(
String datasetName, String tableName, Path csvPath, FormatOptions formatOptions)
throws IOException, InterruptedException {
try {
// Initialize client that will be used to send requests. This client only needs to be created
// once, and can be reused for multiple requests.
BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
TableId tableId = TableId.of(datasetName, tableName);
WriteChannelConfiguration writeChannelConfiguration =
WriteChannelConfiguration.newBuilder(tableId).setFormatOptions(formatOptions).build();
// The location and JobName must be specified; other fields can be auto-detected.
String jobName = "jobId_" + UUID.randomUUID().toString();
JobId jobId = JobId.newBuilder().setLocation("us").setJob(jobName).build();
// Imports a local file into a table.
try (TableDataWriteChannel writer = bigquery.writer(jobId, writeChannelConfiguration);
OutputStream stream = Channels.newOutputStream(writer)) {
// This example writes CSV data from a local file,
// but bytes can also be written in batch from memory.
// In addition to CSV, other formats such as
// Newline-Delimited JSON (https://jsonlines.org/) are
// supported.
Files.copy(csvPath, stream);
}
// Get the Job created by the TableDataWriteChannel and wait for it to complete.
Job job = bigquery.getJob(jobId);
Job completedJob = job.waitFor();
if (completedJob == null) {
System.out.println("Job not executed since it no longer exists.");
return;
} else if (completedJob.getStatus().getError() != null) {
System.out.println(
"BigQuery was unable to load local file to the table due to an error: \n"
+ job.getStatus().getError());
return;
}
// Get output status
LoadStatistics stats = job.getStatistics();
System.out.printf("Successfully loaded %d rows. \n", stats.getOutputRows());
} catch (BigQueryException e) {
System.out.println("Local file not loaded. \n" + e.toString());
}
}
}
Resources:
https://cloud.google.com/bigquery/docs/batch-loading-data#loading_data_from_local_files
https://cloud.google.com/bigquery/docs/samples/bigquery-load-from-file
system test which writes JSON from memory

Azure functions using Java - How to created #BlobTrigger

i need to created Azure function BlobTrigger using Java to monitor my storage container for new and updated blobs.
tried with below code
import java.util.*;
import com.microsoft.azure.serverless.functions.annotation.*;
import com.microsoft.azure.serverless.functions.*;
import java.nio.file.*;
import java.io.*;
import java.net.URL;
import java.net.URLConnection;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import com.microsoft.azure.storage.*;
import com.microsoft.azure.storage.blob.*;
#FunctionName("testblobtrigger")
public String testblobtrigger(#BlobTrigger(name = "test", path = "testcontainer/{name}") String content) {
try {
return String.format("Blob content : %s!", content);
} catch (Exception e) {
// Output the stack trace.
e.printStackTrace();
return "Access Error!";
}
}
when executed it is showing error
Storage binding (blob/queue/table) must have non-empty connection. Invalid storage binding found on method:
it is working when added connection string
public String kafkablobtrigger(#BlobTrigger(name = "test", path = "testjavablobstorage/{name}",connection=storageConnectionString) String content) {
why i need to add connection string when using blobtrigger?
in C# it is working without connection string:
public static void ProcessBlobContainer1([BlobTrigger("container1/{blobName}")] CloudBlockBlob blob, string blobName)
{
ProcessBlob("container1", blobName, blob);
}
i didn't see any Java sample for Azure functions for #BlobTrigger.

After all, connection is necessary for the trigger to identify where the container locates.
After test I find #Mikhail is right.
For C#, the default value(in local.settings.json or in application settings in portal) will be used if connection is ignored. But unfortunately there's no same settings for java.
You can add #StorageAccount("YourStorageConnection") below your #FuncionName as it's another valid way to choose. And value of YourStorageConnection in local.settings.json or in portal's application settings is up to you.
You can follow this tutorial, use mvn azure-functions:add to find four(Http/Blob/Queue/TimerTrigger) templates for java.

How to integrate karate with testrail

I am new to Java and using karate for API automation. I need help to integrate testrail with karate. I want to use tags for each scenario which will be the test case id (from testrail) and I want to push the result 'after the scenario'.
Can someone guide me on this? Code snippets would be more appreciated. Thank you!

I spent a lot of effort for this.
That's how I implement. Maybe you can follow it.
First of all, you should download the APIClient.java and APIException.java files from the link below.
TestrailApi in github
Then you need to add these files to the following path in your project.
For example: YourProjectFolder/src/main/java/testrails/
In your karate-config.js file, after each test, you can send your case tags, test results and error messages to the BaseTest.java file, which I will talk about shortly.
karate-config.js file
function fn() {
var config = {
baseUrl: 'http://111.111.1.111:11111',
};
karate.configure('afterScenario', () => {
try{
const BaseTestClass = Java.type('features.BaseTest');
BaseTestClass.sendScenarioResults(karate.scenario.failed,
karate.scenario.tags, karate.info.errorMessage);
}catch(error) {
console.log(error)
}
});
return config;
}
Please dont forget give tag to scenario in Feature file.
For example #1111
Feature: ExampleFeature
Background:
* def conf = call read('../karate-config.js')
* url conf.baseUrl
#1111
Scenario: Example
Next, create a runner file named BaseTests.java
BaseTest.java file
package features;
import com.intuit.karate.junit5.Karate;
import net.minidev.json.JSONObject;
import org.junit.jupiter.api.BeforeAll;
import testrails.APIClient;
import testrails.APIException;
import java.io.IOException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
public class BaseTest {
private static APIClient client = null;
private static String runID = null;
#BeforeAll
public static void beforeClass() throws Exception {
String fileName = System.getProperty("karate.options");
//Login to API
client = new APIClient("Write Your host, for example
https://yourcompanyname.testrail.io/");
client.setUser("user.name#companyname.com");
client.setPassword("password");
//Create Test Run
Map data = new HashMap();
data.put("suite_id", "Write Your Project SuitId(Only number)");
data.put("name", "Api Test Run");
data.put("description", "Karate Architect Regression Running");
JSONObject c = (JSONObject) client.sendPost("add_run/" +
TESTRAİL_PROJECT_ID, data);
runID = c.getAsString("id");
}
//Send Scenario Result to Testrail
public static void sendScenarioResults(boolean failed, List<String> tags, String errorMessage) {
try {
Map data = new HashMap();
data.put("status_id", failed ? 5 : 1);
data.put("comment", errorMessage);
client.sendPost("add_result_for_case/" + runID + "/" + tags.get(0),
data);
} catch (IOException e) {
e.printStackTrace();
} catch (APIException e) {
e.printStackTrace();
}
}
#Karate.Test
Karate ExampleFeatureRun() {
return Karate.run("ExampleFeatureRun").relativeTo(getClass());
}
}

Please look at 'hooks' documented here: https://github.com/intuit/karate#hooks
And there is an example with code over here: https://github.com/intuit/karate/blob/master/karate-demo/src/test/java/demo/hooks/hooks.feature
I'm sorry I can't help you with how to push data to testrail, but it may be as simple as an HTTP request. And guess what Karate is famous for :)
Note that values of tags can be accessed within a test, here is the doc for karate.tagValues (with link to example): https://github.com/intuit/karate#the-karate-object
Note that you need to be on the 0.7.0 version, right now 0.7.0.RC8 is available.
Edit - also see: https://stackoverflow.com/a/54527955/143475

With Elastic Beanstalk, can I determine programmatically if I'm on the leader node?

I have some housekeeping tasks within an Elastic Beanstalk Java application running on Tomcat, and I need to run them every so often. I want these tasks run only on the leader node (or, more correctly, on a single node, but the leader seems like an obvious choice).
I was looking at running cron jobs within Elastic Beanstalk, but it feels like this should be more straightforward than what I've come up with. Ideally, I'd like one of these two options within my web app:
Some way of testing within the current JRE whether or not this server is the leader node
Some some way to hit a specific URL (wget?) to trigger the task, but also restrict that URL to requests from localhost.
Suggestions?

It is not possible, by design (leaders are only assigned during deployment, and not needed on other contexts). However, you can tweak and use the EC2 Metadata for this exact purpose.
Here's an working example about how to achieve this result (original source). Once you call getLeader, it will find - or assign - an instance to be set as a leader:
package br.com.ingenieux.resource;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import javax.inject.Inject;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import org.apache.commons.io.IOUtils;
import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.model.CreateTagsRequest;
import com.amazonaws.services.ec2.model.DeleteTagsRequest;
import com.amazonaws.services.ec2.model.DescribeInstancesRequest;
import com.amazonaws.services.ec2.model.Filter;
import com.amazonaws.services.ec2.model.Instance;
import com.amazonaws.services.ec2.model.Reservation;
import com.amazonaws.services.ec2.model.Tag;
import com.amazonaws.services.elasticbeanstalk.AWSElasticBeanstalk;
import com.amazonaws.services.elasticbeanstalk.model.DescribeEnvironmentsRequest;
#Path("/admin/leader")
public class LeaderResource extends BaseResource {
#Inject
AmazonEC2 amazonEC2;
#Inject
AWSElasticBeanstalk elasticBeanstalk;
#GET
public String getLeader() throws Exception {
/*
* Avoid running if we're not in AWS after all
*/
try {
IOUtils.toString(new URL(
"http://169.254.169.254/latest/meta-data/instance-id")
.openStream());
} catch (Exception exc) {
return "i-FFFFFFFF/localhost";
}
String environmentName = getMyEnvironmentName();
List<Instance> environmentInstances = getInstances(
"tag:elasticbeanstalk:environment-name", environmentName,
"tag:leader", "true");
if (environmentInstances.isEmpty()) {
environmentInstances = getInstances(
"tag:elasticbeanstalk:environment-name", environmentName);
Collections.shuffle(environmentInstances);
if (environmentInstances.size() > 1)
environmentInstances.removeAll(environmentInstances.subList(1,
environmentInstances.size()));
amazonEC2.createTags(new CreateTagsRequest().withResources(
environmentInstances.get(0).getInstanceId()).withTags(
new Tag("leader", "true")));
} else if (environmentInstances.size() > 1) {
DeleteTagsRequest deleteTagsRequest = new DeleteTagsRequest().withTags(new Tag().withKey("leader").withValue("true"));
for (Instance i : environmentInstances.subList(1,
environmentInstances.size())) {
deleteTagsRequest.getResources().add(i.getInstanceId());
}
amazonEC2.deleteTags(deleteTagsRequest);
}
return environmentInstances.get(0).getInstanceId() + "/" + environmentInstances.get(0).getPublicIpAddress();
}
#GET
#Produces("text/plain")
#Path("am-i-a-leader")
public boolean isLeader() {
/*
* Avoid running if we're not in AWS after all
*/
String myInstanceId = null;
String environmentName = null;
try {
myInstanceId = IOUtils.toString(new URL(
"http://169.254.169.254/latest/meta-data/instance-id")
.openStream());
environmentName = getMyEnvironmentName();
} catch (Exception exc) {
return false;
}
List<Instance> environmentInstances = getInstances(
"tag:elasticbeanstalk:environment-name", environmentName,
"tag:leader", "true", "instance-id", myInstanceId);
return (1 == environmentInstances.size());
}
protected String getMyEnvironmentHost(String environmentName) {
return elasticBeanstalk
.describeEnvironments(
new DescribeEnvironmentsRequest()
.withEnvironmentNames(environmentName))
.getEnvironments().get(0).getCNAME();
}
private String getMyEnvironmentName() throws IOException,
MalformedURLException {
String instanceId = IOUtils.toString(new URL(
"http://169.254.169.254/latest/meta-data/instance-id"));
/*
* Grab the current environment name
*/
DescribeInstancesRequest request = new DescribeInstancesRequest()
.withInstanceIds(instanceId)
.withFilters(
new Filter("instance-state-name").withValues("running"));
for (Reservation r : amazonEC2.describeInstances(request)
.getReservations()) {
for (Instance i : r.getInstances()) {
for (Tag t : i.getTags()) {
if ("elasticbeanstalk:environment-name".equals(t.getKey())) {
return t.getValue();
}
}
}
}
return null;
}
public List<Instance> getInstances(String... args) {
Collection<Filter> filters = new ArrayList<Filter>();
filters.add(new Filter("instance-state-name").withValues("running"));
for (int i = 0; i < args.length; i += 2) {
String key = args[i];
String value = args[1 + i];
filters.add(new Filter(key).withValues(value));
}
DescribeInstancesRequest req = new DescribeInstancesRequest()
.withFilters(filters);
List<Instance> result = new ArrayList<Instance>();
for (Reservation r : amazonEC2.describeInstances(req).getReservations())
result.addAll(r.getInstances());
return result;
}
}

You can keep a secret URL (a long URL is un-guessable, almost as safe as a password), hit this URL from somewhere. On this you can execute the task.
One problem however is that if the task takes too long, then during that time your server capacity will be limited. Another approach would be for the URL hit to post a message to the AWS SQS. The another EC2 can have a code which waits on SQS and execute the task. You can also look into http://aws.amazon.com/swf/

Another approach if you're running on the Linux-type EC2 instance:
Write a shell script that does (or triggers) your periodic task
Leveraging the .ebextensions feature to customize your Elastic Beanstalk instance, create a container command that specifies the parameter leader_only: true -- this command will only run on an instance that is designated the leader in your Auto Scaling group
Have your container command copy your shell script into /etc/cron.hourly (or daily or whatever).
The result will be that your "leader" EC2 instance will have a cron job running hourly (or daily or whatever) to do your periodic task and the other instances in your Auto Scaling group will not.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Google Dataproc API (through Java) does not submit Job to cluster - java

Related

Java Google Api application Default credentials

how to batch insert data into Google BigQuery from a Java service?

Azure functions using Java - How to created #BlobTrigger

How to integrate karate with testrail

With Elastic Beanstalk, can I determine programmatically if I'm on the leader node?

Categories

Resources