I need to read and write data from S3 object storage. I am using the flink-s3-fs-hadoop-1.15.0.jar plugin to connect to S3.
This is the code that reads the file:
public DataStream<GenericRecord> readParquet(StreamExecutionEnvironment env, String dir) {
    // schema is the Avro schema of the Parquet records, loaded elsewhere.
    FileSource<GenericRecord> source =
            FileSource.forRecordStreamFormat(
                            AvroParquetReaders.forGenericRecord(schema),
                            new Path(dir)) // e.g. "s3://dev/input.parquet"
                    .build();
    return env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source");
}
I have configured everything as instructed by the Flink documentation, but I still can't read data from S3 object storage. This is my flink-conf.yaml file:
s3.endpoint: https://hcm01.test.vn
s3.path.style.access: true
s3.access-key: xxxxxxxxxxxxxxxxxxxxxxxxxxxx
s3.secret-key: xxxxxxxxxxxxxxxxxxxxxxxxxxxx
I also want to configure the region ("HCM01"). Is there any way to configure the region as well?
Please help me! Thank you.
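A hedged sketch of what might work: flink-s3-fs-hadoop forwards configuration keys with the s3. prefix to the Hadoop S3A filesystem as fs.s3a.*, so a region entry could possibly be added like this (the key name is an assumption and depends on the bundled Hadoop version supporting fs.s3a.endpoint.region):
s3.endpoint: https://hcm01.test.vn
s3.path.style.access: true
s3.access-key: xxxxxxxxxxxxxxxxxxxxxxxxxxxx
s3.secret-key: xxxxxxxxxxxxxxxxxxxxxxxxxxxx
# assumed: forwarded to fs.s3a.endpoint.region in the bundled Hadoop S3A client
s3.endpoint.region: HCM01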
I get an exception when trying to upload a file to Amazon S3 from my Java Spring application. The method is pretty simple:
private void productionFileSaver(String keyName, File f) throws InterruptedException {
    String bucketName = "{my-bucket-name}";
    TransferManager tm = new TransferManager(new ProfileCredentialsProvider());
    // TransferManager processes all transfers asynchronously,
    // so this call returns immediately.
    Upload upload = tm.upload(bucketName, keyName, f);
    try {
        // Or you can block and wait for the upload to finish.
        upload.waitForCompletion();
        System.out.println("Upload complete.");
    } catch (AmazonClientException amazonClientException) {
        System.out.println("Unable to upload file, upload was aborted.");
        amazonClientException.printStackTrace();
    }
}
It is basically the same code that Amazon provides in its documentation, and the same exception with exactly the same message ("profile file cannot be null") appears when trying the other version as well.
The problem is not related to the file not existing or being null (I have already checked in a thousand ways that the File argument received by the TransferManager.upload method exists before calling it).
I cannot find any info about my exception message "profile file cannot be null". The first lines of the error log are the following:
com.amazonaws.AmazonClientException: Unable to complete transfer: profile file cannot be null
at com.amazonaws.services.s3.transfer.internal.AbstractTransfer.unwrapExecutionException(AbstractTransfer.java:281)
at com.amazonaws.services.s3.transfer.internal.AbstractTransfer.rethrowExecutionException(AbstractTransfer.java:265)
at com.amazonaws.services.s3.transfer.internal.AbstractTransfer.waitForCompletion(AbstractTransfer.java:103)
at com.fullteaching.backend.file.FileController.productionFileSaver(FileController.java:371)
at com.fullteaching.backend.file.FileController.handlePictureUpload(FileController.java:247)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
My S3 policy allows getting and putting objects for all kinds of users.
What's happening?
ProfileCredentialsProvider() creates a new profile credentials provider that returns the AWS security credentials configured for the default profile.
So, if you don't have any configuration for the default profile at ~/.aws/credentials, trying to put an object yields that error.
If you run your code on the Lambda service, that file will not be provided. In that case you also do not need to provide credentials: just assign the right IAM role to your Lambda function, and using the default constructor should solve the issue.
You may want to change TransferManager constructor according to your needs.
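As a minimal sketch, assuming the newer builder API of the AWS SDK for Java v1 and a placeholder region, you could build the TransferManager on top of the default credential provider chain instead of a profile file:
// Sketch: use the default credential chain (env vars, system properties,
// ~/.aws/credentials, or an IAM role) instead of ProfileCredentialsProvider.
AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withCredentials(DefaultAWSCredentialsProviderChain.getInstance())
        .withRegion(Regions.EU_WEST_1) // placeholder region
        .build();
TransferManager tm = TransferManagerBuilder.standard()
        .withS3Client(s3)
        .build();
The default chain checks environment variables, system properties, the shared credentials file, and instance/role credentials in turn, so it works both locally and on AWS infrastructure.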
The solution was pretty simple: I was trying to implement this communication without an AmazonS3 bean for Spring.
This link will help with the configuration:
http://codeomitted.com/upload-file-to-s3-with-spring/
My code worked fine, as below:
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
        .withCredentials(DefaultAWSCredentialsProviderChain.getInstance())
        .withRegion(clientRegion)
        .build();
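A hedged sketch of wrapping that same client in a Spring configuration class, so an AmazonS3 bean can be injected wherever it is needed (the region value is a placeholder):
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class S3Config {

    @Bean
    public AmazonS3 amazonS3() {
        // Placeholder region; replace with the region of your bucket.
        return AmazonS3ClientBuilder.standard()
                .withCredentials(DefaultAWSCredentialsProviderChain.getInstance())
                .withRegion(Regions.EU_WEST_1)
                .build();
    }
}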
I learned from a past thread that the Firebase Database can be accessed from a plain JVM via the new server SDK. So now I can use the Firebase Database SDK for my JavaFX project, but how do I use the Storage SDK?
Sadly, the Firebase Storage docs don't mention anything about setting up Storage on a server. StorageReference is also not available from com.google.firebase:firebase-server-sdk:[3.0.0,) or com.google.firebase:firebase-server-sdk:3.0.1.
Firebase Storage does not have a server SDK, but because it's backed by Google Cloud Storage, you can use the GCS server SDKs. Here are the GCS docs for accessing it from Java.
In the Gradle file, add compile 'com.google.cloud:google-cloud-storage:1.7.0' (or the latest version of the Java library).
If the Storage URL is, say, gs://some-bucket-name.appspot.com/directory/some_blob.zip, then STORAGE_BUCKET must be some-bucket-name.appspot.com and the path should be directory/some_blob.zip.
private long getSize(String path) {
    Storage storage = StorageOptions.newBuilder()
            .setProjectId(STORAGE_BUCKET)
            // Optionally add credentials
            //.setCredentials(GoogleCredentials.fromStream(new FileInputStream(jsonFile)))
            .build()
            .getService();
    // Optional third parameter limits the fields returned; I only need the size for my use case.
    Blob blob = storage.get(STORAGE_BUCKET, path, Storage.BlobGetOption.fields(Storage.BlobField.SIZE));
    if (blob != null) {
        return blob.getSize();
    }
    return 0;
}
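For completeness, a hedged sketch of downloading an object's contents with the same client; STORAGE_BUCKET is the same constant as above, and application-default credentials are assumed:
// Sketch: read the whole object into memory (suitable for small files only).
private byte[] download(String path) {
    Storage storage = StorageOptions.getDefaultInstance().getService();
    return storage.readAllBytes(BlobId.of(STORAGE_BUCKET, path));
}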
We're looking to begin using S3 for some of our storage needs and I'm looking for a way to perform a batch upload of 'N' files. I've already written code using the Java API to perform single file uploads, but is there a way to provide a list of files to pass to an S3 bucket?
I did look at the following question is-it-possible-to-perform-a-batch-upload-to-amazon-s3, but it is from two years ago and I'm curious if the situation has changed at all. I can't seem to find a way to do this in code.
What we'd like to do is to be able to set up an internal job (probably using scheduled tasking in Spring) to transition groups of files every night. I'd like to have a way to do this rather than just looping over them and doing a put request for each one, or having to zip batches up to place on S3.
The easiest way to go if you're using the AWS SDK for Java is the TransferManager. Its uploadFileList method takes a list of files and uploads them to S3 in parallel, or uploadDirectory will upload all the files in a local directory.
public void uploadDocuments(List<File> filesToUpload)
        throws AmazonServiceException, AmazonClientException, InterruptedException {
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
            .withCredentials(getCredentials())
            .withRegion(Regions.AP_SOUTH_1)
            .build();
    TransferManager transfer = TransferManagerBuilder.standard().withS3Client(s3).build();
    String bucket = Constants.BUCKET_NAME;
    // Arguments: bucket name, virtual directory key prefix, common parent
    // directory of the files, and the list of files to upload.
    MultipleFileUpload upload = transfer.uploadFileList(bucket, "", new File("."), filesToUpload);
    upload.waitForCompletion();
}

private AWSCredentialsProvider getCredentials() {
    String accessKey = Constants.ACCESS_KEY;
    String secretKey = Constants.SECRET_KEY;
    BasicAWSCredentials awsCredentials = new BasicAWSCredentials(accessKey, secretKey);
    return new AWSStaticCredentialsProvider(awsCredentials);
}
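A possible usage sketch (the file names are made up). One design note: TransferManager owns a thread pool, so if you reuse it across uploads it is worth shutting it down explicitly, for example with transfer.shutdownNow(false), once you are done.
// Sketch: upload two local files (example paths) with the method above.
List<File> batch = Arrays.asList(new File("reports/a.csv"), new File("reports/b.csv"));
uploadDocuments(batch);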
I want to combine multiple GCS files into one big file. According to the docs there is a compose function, which looks like it does exactly what I need:
https://developers.google.com/storage/docs/json_api/v1/objects/compose
However, I can't find how to call that function from GAE using the Java client:
https://developers.google.com/appengine/docs/java/googlecloudstorageclient/
Is there a way to do this with that library?
Or should I mess around with reading the files one by one using channels?
Or should I call the low level JSON methods?
What's the best way?
The compose option is available in the new Java client; I have tried it using google-cloud-storage:1.63.0.
/** Example of composing two blobs. */
public Blob composeBlobs(
        String bucketName, String blobName, String sourceBlob1, String sourceBlob2) {
    BlobId blobId = BlobId.of(bucketName, blobName);
    BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("text/plain").build();
    ComposeRequest request =
            ComposeRequest.newBuilder()
                    .setTarget(blobInfo)
                    .addSource(sourceBlob1)
                    .addSource(sourceBlob2)
                    .build();
    Blob blob = storage.compose(request);
    return blob;
}
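A hedged usage sketch, assuming the storage field used in the snippet is a com.google.cloud.storage.Storage service obtained from application-default credentials, and using the placeholder names from the sample:
// Sketch: the storage field used above can be created from application-default credentials.
Storage storage = StorageOptions.getDefaultInstance().getService();

// Compose two existing source objects into a new target object.
Blob combined = composeBlobs("my_unique_bucket", "my_blob_name", "source_blob_1", "source_blob_2");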
The compose operation does indeed do exactly what you want it to do. However, the compose operation isn't currently available for the GAE Google Cloud Storage client. You have a few alternatives.
You can use the non-GAE Google APIs client (link to the Java one). It invokes the lower level JSON API and supports compose(). The downside is that this client doesn't have any special AppEngine magic, so some little things will be different. For example, if you run it in the local development server, it will contact the real Google Cloud Storage. Also you'll need to configure it to authorize its requests, etc.
Another option would be to invoke the JSON or XML APIs directly.
Finally, if you only need to do this one time, you could simply use the command-line utility:
gsutil compose gs://bucket/source1 gs://bucket/source2 gs://bucket/output
I am trying to store data from an App Engine app in Google Cloud Storage.
If I run it locally, the code runs fine, but I don't see any data being stored.
If I upload the app to App Engine, I get a NullPointerException when running the code at:
AppEngineFile writableFile = fileService.createNewGSFile(optionsBuilder.build());
The complete code I am using is:
// Get the file service
FileService fileService = FileServiceFactory.getFileService();

/**
 * Set up properties of your new object.
 * After finalizing objects, they are accessible
 * through Cloud Storage with the URL:
 * http://commondatastorage.googleapis.com/my_bucket/my_object
 */
GSFileOptionsBuilder optionsBuilder = new GSFileOptionsBuilder()
        .setBucket("blood")
        .setKey("1234567")
        .setAcl("public-read")
        .setMimeType("text/html"); //.setUserMetadata("date-created", "092011", "owner", "Jon");

// Create your object
AppEngineFile writableFile = fileService.createNewGSFile(optionsBuilder.build());

// Open a channel for writing
boolean lockForWrite = true; // Do you want to exclusively lock this object?
FileWriteChannel writeChannel = fileService.openWriteChannel(writableFile, lockForWrite);

// For this example, we write to the object using a PrintWriter
PrintWriter out = new PrintWriter(Channels.newWriter(writeChannel, "UTF8"));
out.println("The woods are lovely and deep.");
out.println("But I have promises to keep.");

// Close the object without finalizing it
out.close();

// Finalize the object
writeChannel.closeFinally();

System.out.println("Completed writing to google storage");
When using the dev server, the File service with Google Storage does not actually write to your Google Storage bucket but to the local file system. Are you expecting it to write to Google Storage?
For the null pointer exception, we would need to see a stack trace to narrow down what's going wrong.
So the problem was that App Engine was not authorized to read/write to the bucket created in Cloud Storage.
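A hedged sketch of granting that access with gsutil (YOUR_APP_ID is a placeholder for the App Engine application ID; the bucket name is the one used in the code above):
gsutil acl ch -u YOUR_APP_ID@appspot.gserviceaccount.com:WRITE gs://blood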