Amazon S3 - Batch File Upload Using Java API?

We're looking to begin using S3 for some of our storage needs and I'm looking for a way to perform a batch upload of 'N' files. I've already written code using the Java API to perform single file uploads, but is there a way to provide a list of files to pass to an S3 bucket?
I did look at the following question is-it-possible-to-perform-a-batch-upload-to-amazon-s3, but it is from two years ago and I'm curious if the situation has changed at all. I can't seem to find a way to do this in code.
What we'd like to do is to be able to set up an internal job (probably using scheduled tasks in Spring) to transition groups of files every night. I'd like to have a way to do this rather than just looping over them and doing a put request for each one, or having to zip batches up to place on S3.

The easiest way to go if you're using the AWS SDK for Java is the TransferManager. Its uploadFileList method takes a list of files and uploads them to S3 in parallel, or uploadDirectory will upload all the files in a local directory.

public void uploadDocuments(List<File> filesToUpload)
        throws AmazonServiceException, AmazonClientException, InterruptedException {
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
            .withCredentials(getCredentials())
            .withRegion(Regions.AP_SOUTH_1)
            .build();
    TransferManager transfer = TransferManagerBuilder.standard().withS3Client(s3).build();
    String bucket = Constants.BUCKET_NAME;
    MultipleFileUpload upload = transfer.uploadFileList(bucket, "", new File("."), filesToUpload);
    upload.waitForCompletion();
}

private AWSCredentialsProvider getCredentials() {
    String accessKey = Constants.ACCESS_KEY;
    String secretKey = Constants.SECRET_KEY;
    BasicAWSCredentials awsCredentials = new BasicAWSCredentials(accessKey, secretKey);
    return new AWSStaticCredentialsProvider(awsCredentials);
}
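If the files for a batch always end up under one local folder, uploadDirectory is a slightly simpler alternative. A minimal sketch along the same lines as the method above (the key prefix "nightly-batch" is just a placeholder, not from the question):

public void uploadFolder(File localDirectory) throws InterruptedException {
    AmazonS3 s3 = AmazonS3ClientBuilder.standard()
            .withCredentials(getCredentials())
            .withRegion(Regions.AP_SOUTH_1)
            .build();
    TransferManager transfer = TransferManagerBuilder.standard().withS3Client(s3).build();
    // Upload every file under localDirectory (including subdirectories) in parallel,
    // keyed under the "nightly-batch" prefix in the bucket.
    MultipleFileUpload upload =
            transfer.uploadDirectory(Constants.BUCKET_NAME, "nightly-batch", localDirectory, true);
    upload.waitForCompletion();
    transfer.shutdownNow(false); // release TransferManager threads but keep the S3 client alive
}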

Related

How to config S3 Object Storage in Flink

I need to read and write data from S3 Object Storage. I used the flink-s3-fs-hadoop-1.15.0.jar plugin to connect to S3.
This is the code to read a file:
public DataStream<GenericRecord> readParquet(StreamExecutionEnvironment env, String dir) {
    FileSource<GenericRecord> source =
            FileSource.forRecordStreamFormat(
                            AvroParquetReaders.forGenericRecord(schema),
                            new Path("s3://dev/input.parquet"))
                    .build();
    return env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source");
}
I have configured everything as instructed by Flink, but still can't read the data from S3 Object Storage. This is the flink-conf.yaml file:
s3.endpoint: https://hcm01.test.vn
s3.path.style.access: true
s3.access-key: xxxxxxxxxxxxxxxxxxxxxxxxxxxx
s3.secret-key: xxxxxxxxxxxxxxxxxxxxxxxxxxxx
I also want to configure the region as "HCM01". Is there any way to configure the region as well?
Please help me! Thank you.
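One configuration that may work, though it is only an assumption: flink-s3-fs-hadoop forwards any s3.* key in flink-conf.yaml to the matching fs.s3a.* Hadoop property, and recent Hadoop S3A versions expose an endpoint region setting, so whether this key is honored depends on the Hadoop version bundled in the plugin:

s3.endpoint: https://hcm01.test.vn
s3.path.style.access: true
s3.access-key: xxxxxxxxxxxxxxxxxxxxxxxxxxxx
s3.secret-key: xxxxxxxxxxxxxxxxxxxxxxxxxxxx
# assumption: forwarded to fs.s3a.endpoint.region; verify against the bundled Hadoop S3A docs
s3.endpoint.region: HCM01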

AWS Java S3 Uploading error: "profile file cannot be null"

I get an exception when trying to upload a file to Amazon S3 from my Java Spring application. The method is pretty simple:
private void productionFileSaver(String keyName, File f) throws InterruptedException {
    String bucketName = "{my-bucket-name}";
    TransferManager tm = new TransferManager(new ProfileCredentialsProvider());
    // TransferManager processes all transfers asynchronously,
    // so this call will return immediately.
    Upload upload = tm.upload(bucketName, keyName, new File("/mypath/myfile.png"));
    try {
        // Or you can block and wait for the upload to finish
        upload.waitForCompletion();
        System.out.println("Upload complete.");
    } catch (AmazonClientException amazonClientException) {
        System.out.println("Unable to upload file, upload was aborted.");
        amazonClientException.printStackTrace();
    }
}
It is basically the same code that Amazon provides here, and the same exception with exactly the same message ("profile file cannot be null") appears when trying this other version.
The problem is not related to the file not existing or being null (I have already checked in a thousand ways that the File argument received by the TransferManager.upload method exists before calling it).
I cannot find any info about my exception message "profile file cannot be null". The first lines of the error log are the following:
com.amazonaws.AmazonClientException: Unable to complete transfer: profile file cannot be null
at com.amazonaws.services.s3.transfer.internal.AbstractTransfer.unwrapExecutionException(AbstractTransfer.java:281)
at com.amazonaws.services.s3.transfer.internal.AbstractTransfer.rethrowExecutionException(AbstractTransfer.java:265)
at com.amazonaws.services.s3.transfer.internal.AbstractTransfer.waitForCompletion(AbstractTransfer.java:103)
at com.fullteaching.backend.file.FileController.productionFileSaver(FileController.java:371)
at com.fullteaching.backend.file.FileController.handlePictureUpload(FileController.java:247)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
My S3 policy allows getting and putting objects for all kinds of users.
What's happening?
ProfileCredentialsProvider() creates a new profile credentials provider that returns the AWS security credentials configured for the default profile.
So, if you don't have any configuration for the default profile in ~/.aws/credentials, trying to put an object yields that error.
If you run your code on the Lambda service, Lambda will not provide this file. In that case you also do not need to provide credentials: just assign the right IAM role to your Lambda function, and using the default constructor should solve the issue.
You may want to change the TransferManager constructor according to your needs.
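For example, a minimal sketch that builds the TransferManager on top of the default provider chain (environment variables, system properties, ~/.aws/credentials, or an instance/role profile) instead of the profile file alone; the region is just a placeholder:

AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withCredentials(DefaultAWSCredentialsProviderChain.getInstance())
        .withRegion(Regions.EU_WEST_1) // placeholder region
        .build();
TransferManager tm = TransferManagerBuilder.standard()
        .withS3Client(s3)
        .build();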
The solution was pretty simple: I was trying to implement this communication without an AmazonS3 bean for Spring.
This link will help with the configuration:
http://codeomitted.com/upload-file-to-s3-with-spring/
My code worked fine as below:
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
        .withCredentials(DefaultAWSCredentialsProviderChain.getInstance())
        .withRegion(clientRegion)
        .build();

File uploads from multiple clients slowing Java Spring application

I have an application with Java Spring, a MySQL backend and an AngularJS frontend. It's hosted on an Amazon EC2 m4.xlarge instance.
I use the HTML5 camera capture capability to take a photograph and send the base64-encoded image and some other metadata to the backend via a RESTful web service. At the backend I convert the base64 data to a PNG file, save it to disk, and also make an entry in the MySQL database about the status of the file. This had been working fine until many users started uploading images at the same time. There are 4000+ users in the system, and at peak there could be around 1000 concurrent users trying to upload image data simultaneously.
Having too many users slows down my application and it takes 10-15 seconds to return any page (normally it's under 2 seconds). I checked my server stats: CPU utilization is under 20% and no swap memory is used. I am not sure where the bottleneck is or how to measure it. Any suggestion on how to approach the problem? I know that autoscaling EC2 and having a queue for image data processing at the backend might help, but before doing anything I want to get to the root cause of the problem.
Java code to process base64 image:
/**
 * POST /rest/upload/studentImage -> Upload a photo of the user and update the Student (Student History and System Logs)
 */
@RequestMapping(value = "/rest/upload/studentImage",
        method = RequestMethod.POST,
        produces = "application/json")
@Timed
public Integer create(@RequestBody StudentPhotoDTO studentPhotoDTO) {
    log.debug("REST request to save Student : {}", studentPhotoDTO);
    Boolean success = false;
    Integer returnValue = 0;
    final String baseDirectory = "/Applications/MAMP/htdocs/studentPhotos/";
    Long studentId = studentPhotoDTO.getStudent().getStudentId();
    String base64Image = studentPhotoDTO.getImageData().split(",")[1];
    byte[] imageBytes = DatatypeConverter.parseBase64Binary(base64Image);
    try {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(imageBytes));
        String filePath = baseDirectory + "somestring";
        log.info("Saving uploaded file to: " + filePath);
        File f = new File(filePath);
        Boolean bool = f.mkdirs();
        File outputfile = new File(filePath + studentId + ".png");
        success = ImageIO.write(image, "png", outputfile);
    } catch (IOException e) {
        success = false;
        e.printStackTrace();
    } catch (Exception ex) {
        success = false;
        ex.printStackTrace();
    }
    if (success) {
        returnValue = 1;
        // update student
        studentPhotoDTO.getStudent().setPhotoAvailable(true);
        studentPhotoDTO.getStudent().setModifiedOn(new Date());
        studentRepository.save(studentPhotoDTO.getStudent());
    }
    return returnValue;
}
Here is an image of my CloudWatch network monitoring.
Update (06/17/2015):
I am serving the Spring Boot Tomcat application with Apache fronting it via an AJP ProxyPass. I tried serving the Tomcat app directly, without the Apache fronting, and that seemed to significantly improve performance. My app didn't slow down as before. Still looking for the root cause.
Instead of using a technical approach to solve this problem, I would recommend a design that takes care of it. In my opinion, which is based solely on the information you have provided, this is a design problem. To find the root cause, I would need to have a look at your code.
But as of now, the design should use a queue that processes images at the back end and, at the front end, notifies the user that their image has been queued for processing. Once it's processed, it should say that it has been processed (or not processed because of some error); see the sketch after this answer.
If you can show code, I can try to figure out the problem if there is any.
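To make the queue idea concrete, here is a minimal, framework-free sketch (class and method names such as ImageProcessingQueue and processImage are illustrative, not from the question): the controller enqueues the decoded bytes and returns immediately, and a small pool of worker threads does the ImageIO work and the database update in the background.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ImageProcessingQueue {

    // A job carries only what the background worker needs.
    public static class ImageJob {
        final Long studentId;
        final byte[] imageBytes;
        ImageJob(Long studentId, byte[] imageBytes) {
            this.studentId = studentId;
            this.imageBytes = imageBytes;
        }
    }

    // Bounded queue so a burst of uploads cannot exhaust memory.
    private final BlockingQueue<ImageJob> queue = new LinkedBlockingQueue<>(1000);

    public ImageProcessingQueue(int workers) {
        for (int i = 0; i < workers; i++) {
            Thread worker = new Thread(this::drain, "image-worker-" + i);
            worker.setDaemon(true);
            worker.start();
        }
    }

    /** Called from the REST controller: enqueue and return immediately. */
    public boolean submit(Long studentId, byte[] imageBytes) {
        return queue.offer(new ImageJob(studentId, imageBytes));
    }

    private void drain() {
        while (true) {
            try {
                ImageJob job = queue.take();
                processImage(job);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    private void processImage(ImageJob job) {
        // The ImageIO.read / ImageIO.write calls from the original controller,
        // plus the student status update, would go here.
    }
}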

How to use 'compose' on GCS using the Java client

I want to combine multiple GCS files into one big file. According to the docs there is a compose function, which looks like it does exactly what I need:
https://developers.google.com/storage/docs/json_api/v1/objects/compose
However, I can't find how to call that function from GAE using the Java client:
https://developers.google.com/appengine/docs/java/googlecloudstorageclient/
Is there a way to do this with that library?
Or should I mess around with reading the files one by one using channels?
Or should I call the low level JSON methods?
What's the best way?
The compose option is available in the new Java client; I have tried it using google-cloud-storage:1.63.0.
/** Example of composing two blobs. */
// [TARGET compose(ComposeRequest)]
// [VARIABLE "my_unique_bucket"]
// [VARIABLE "my_blob_name"]
// [VARIABLE "source_blob_1"]
// [VARIABLE "source_blob_2"]
public Blob composeBlobs(
        String bucketName, String blobName, String sourceBlob1, String sourceBlob2) {
    // [START composeBlobs]
    BlobId blobId = BlobId.of(bucketName, blobName);
    BlobInfo blobInfo = BlobInfo.newBuilder(blobId).setContentType("text/plain").build();
    ComposeRequest request =
            ComposeRequest.newBuilder()
                    .setTarget(blobInfo)
                    .addSource(sourceBlob1)
                    .addSource(sourceBlob2)
                    .build();
    Blob blob = storage.compose(request);
    // [END composeBlobs]
    return blob;
}
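For reference, a small usage sketch for the snippet above; it assumes the storage field is a com.google.cloud.storage.Storage client obtained from application-default credentials:

Storage storage = StorageOptions.getDefaultInstance().getService(); // the storage field used above
Blob combined = composeBlobs("my_unique_bucket", "my_blob_name", "source_blob_1", "source_blob_2");
System.out.println("Composed object size: " + combined.getSize());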
The compose operation does indeed do exactly what you want it to do. However, the compose operation isn't currently available for the GAE Google Cloud Storage client. You have a few alternatives.
You can use the non-GAE Google APIs client (link to the Java one). It invokes the lower level JSON API and supports compose(). The downside is that this client doesn't have any special AppEngine magic, so some little things will be different. For example, if you run it in the local development server, it will contact the real Google Cloud Storage. Also you'll need to configure it to authorize its requests, etc.
Another option would be to invoke the JSON or XML APIs directly.
Finally, if you only need to do this one time, you could simply use the command-line utility:
gsutil compose gs://bucket/source1 gs://bucket/source2 gs://bucket/output

Storing data to google cloud storage using gae for java

I am trying to store data from my App Engine app to Google Cloud Storage.
If I run the code locally it runs fine, but I don't see any data being stored.
If I upload the app to App Engine, I get a NullPointerException when the code runs at:
AppEngineFile writableFile = fileService.createNewGSFile(optionsBuilder.build());
The complete code I'm using is:
// Get the file service
FileService fileService = FileServiceFactory.getFileService();

/**
 * Set up properties of your new object
 * After finalizing objects, they are accessible
 * through Cloud Storage with the URL:
 * http://commondatastorage.googleapis.com/my_bucket/my_object
 */
GSFileOptionsBuilder optionsBuilder = new GSFileOptionsBuilder()
        .setBucket("blood")
        .setKey("1234567")
        .setAcl("public-read")
        .setMimeType("text/html"); //.setUserMetadata("date-created", "092011", "owner", "Jon");

// Create your object
AppEngineFile writableFile = fileService.createNewGSFile(optionsBuilder.build());

// Open a channel for writing
boolean lockForWrite = true; // Do you want to exclusively lock this object?
FileWriteChannel writeChannel = fileService.openWriteChannel(writableFile, lockForWrite);

// For this example, we write to the object using the PrintWriter
PrintWriter out = new PrintWriter(Channels.newWriter(writeChannel, "UTF8"));
out.println("The woods are lovely and deep.");
out.println("But I have promises to keep.");

// Close the object without finalizing.
out.close();

// Finalize the object
writeChannel.closeFinally();
System.out.println("Completed writing to google storage");
When using the dev server, the file service with Google Storage does not actually write to your Google Storage bucket but to the local file system - are you expecting it to write to Google Storage?
For the NullPointerException we would need to see a stack trace to try and narrow down what's going wrong.
So the problem was that App Engine was not authorized to read/write to the bucket created in Cloud Storage.
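If anyone hits the same issue, one way this kind of authorization problem is typically fixed is by adding the app's service account to the bucket ACL; a hedged example (the service account address is a placeholder for your app's actual account):

gsutil acl ch -u your-app-id@appspot.gserviceaccount.com:WRITE gs://blood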
