S3 path configuration and SQS Extended Client Library - Java

I want to save all messages that go into a particular SQS queue to an already created S3 bucket.
But I want to save those messages under certain key prefixes (directories) so they are easier to search by date and time.
S3Client accepts a software.amazon.awssdk.services.s3.model.PutObjectRequest, where I can specify the bucket, the key (path) under which the object is saved, and some metadata:
PutObjectRequest objectRequest =
    PutObjectRequest.builder()
        .bucket(bucketName)
        .key(s3Path)
        .metadata(keyAndMetadata.getMetadata())
        .build();
After that, s3Client.putObject(objectRequest, body) does the job.
Now I want to configure S3 in a similar way using ExtendedClientConfiguration, but I can only see very simple input parameters:
ExtendedClientConfiguration extendedClientConfiguration =
    new ExtendedClientConfiguration()
        .withPayloadSupportEnabled(s3Client, bucketName, false)
        .withAlwaysThroughS3(true);
And after that, we create the extended SQS client, with no way to configure S3 more extensively:
AmazonSQSExtendedClient amazonSQSExtendedClient = new AmazonSQSExtendedClient(sqsClient, extendedClientConfiguration);
I know that I could probably save every message that goes to SQS into S3 separately, but I'd rather configure all of that at the client level. Does anyone have any ideas?

I found out there's no way to configure the S3 key path at the client level. But the S3 offload wasn't created for that purpose, and archiving messages to S3 should probably be handled differently. Letting the library delete objects from S3 as the corresponding messages disappear from SQS is the best way to use it.
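If you do archive messages yourself, a minimal sketch of writing a message body under a date/time-based key with the plain S3Client might look like the following (bucketName, messageId and messageBody are placeholders supplied by the calling code, and the yyyy/MM/dd/HH prefix is just one possible layout):

import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

// Sketch only: archive a message body under a date/time prefix in an existing bucket.
void archiveMessage(S3Client s3Client, String bucketName, String messageId, String messageBody) {
    String s3Path = DateTimeFormatter.ofPattern("yyyy/MM/dd/HH")
            .format(LocalDateTime.now(ZoneOffset.UTC)) + "/" + messageId + ".json";

    PutObjectRequest archiveRequest = PutObjectRequest.builder()
            .bucket(bucketName)
            .key(s3Path)
            .build();

    s3Client.putObject(archiveRequest, RequestBody.fromString(messageBody));
}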

Related

Is it safe to cache AmazonS3 client for later use?

I am writing a Java program to upload files to AWS S3, and I have succeeded in getting the S3 client using the following code:
BasicAWSCredentials awsCreds = new BasicAWSCredentials("aaa", "bbb");
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
        .withRegion(Regions.fromName("ccc"))
        .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
        .build();
As it takes quite a few seconds each time to set up the S3 client, I am wondering if it is possible to cache the client for repeated use.
Also, if I cache the client for like a year, will the client still be valid to connect to AWS?
Your client will work as long as the credentials are valid; it will keep working for a year if your credentials are not changed or updated.
Basically, when you create a client, the original credentials are not converted into any other form; they are referenced later, whenever an actual operation is performed.
Your client will no longer work once you update the credentials it was created with.
If you want to initialize it once and use it for a year, yes, that will work. That said, security best practice is not to keep the same credentials fixed for such a long period of time.
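A minimal sketch of caching the client as a singleton, assuming you switch to the default credentials provider chain so rotated credentials are picked up automatically (the holder class, field names and region are hypothetical):

import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// Build the (thread-safe) client once and reuse it across requests.
public final class S3ClientHolder {
    private static final AmazonS3 S3 = AmazonS3ClientBuilder.standard()
            .withRegion(Regions.EU_WEST_1) // assumption: use your own region
            .withCredentials(DefaultAWSCredentialsProviderChain.getInstance())
            .build();

    private S3ClientHolder() {}

    public static AmazonS3 get() {
        return S3;
    }
}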
More about credentials:
https://docs.aws.amazon.com/sdk-for-java/v2/developer-guide/credentials.html
Hope it helps.

Amazon S3 Copy between two buckets with different Authentication

I have two buckets, each with a Private ACL.
I have an authenticated link to the source:
String source = "https://bucket-name.s3.region.amazonaws.com/key?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=...&X-Amz-SignedHeaders=host&X-Amz-Expires=86400&X-Amz-Credential=...Signature=..."
and have been trying to use the Java SDK CopyObjectRequest to copy it into another bucket using:
AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
AWSCredentialsProvider provider = new AWSStaticCredentialsProvider(credentials);
AmazonS3 s3Client = AmazonS3ClientBuilder
        .standard()
        .withCredentials(provider)
        .build();
AmazonS3URI sourceURI = new AmazonS3URI(source);
CopyObjectRequest request = new CopyObjectRequest(sourceURI.getBucket(), sourceURI.getKey(), destinationBucket, destinationKey);
s3Client.copyObject(request);
However, I get AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied) because the AWS credentials I've set the SDK up with do not have access to the source file.
Is there a way I can provide an authenticated source URL instead of just the bucket and key?
This isn't supported. The PUT+Copy service API, which is used by s3Client.copyObject(), uses an internal S3 mechanism to copy the object, and the source object is passed as /bucket/key -- not as a full URL. There is no API functionality that can be used for fetching from a URL, S3 or otherwise.
With PUT+Copy, the user making the request to S3...
must have READ access to the source object and WRITE access to the destination bucket
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html
The only alternative is download followed by upload.
Doing this from EC2... or a Lambda function running in the source region would be the most cost-effective, but if the object is larger than the Lambda temp space, you'll have to write hooks and handlers to read from the stream and juggle the chunks into a multipart upload... not impossible, but it requires some mental gyrations to understand what you're actually trying to persuade your code to do.
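A minimal sketch of that download-then-upload fallback for an object small enough for a single PUT, assuming the pre-signed source URL is still valid (presignedSourceUrl, destinationBucket and destinationKey are placeholders):

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;

// Stream the object from the pre-signed source URL and upload it with the
// destination account's credentials.
void copyViaPresignedUrl(AmazonS3 s3Client, String presignedSourceUrl,
                         String destinationBucket, String destinationKey) throws Exception {
    URLConnection connection = new URL(presignedSourceUrl).openConnection();
    ObjectMetadata metadata = new ObjectMetadata();
    metadata.setContentLength(connection.getContentLengthLong());

    try (InputStream in = connection.getInputStream()) {
        s3Client.putObject(destinationBucket, destinationKey, in, metadata);
    }
}

For anything larger, the same idea has to be spread across the multipart-upload sequence described above.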

Amazon S3 AWS SDK [Java] - MultiPart Upload how to get a custom header in the http response?

How can I access a custom header from a server response when using TransferManager?
We have a custom header added to the response from our server; on the client side we use multipart upload with the default TransferManager.
Any suggestion on how I could hook into it?
So basically I want to pass along the response from the return response.getAwsResponse(); found in the class AmazonS3Client in the method
private <X, Y extends AmazonWebServiceRequest> X invoke(Request<Y> request,
HttpResponseHandler<AmazonWebServiceResponse<X>> responseHandler,
String bucket, String key, boolean isAdditionalHeadRequestToFindRegion) {
That response will have the HTTP response from the server containing the custom header which I'm after; basically it is a unique ID sent back when the file is 100% completed, so that I can manipulate it.
I need to pass this custom header from the response back up to where I use the TransferManager and upload.waitForCompletion().
Also, I don't want to edit Amazon's code,
so does anyone know if there is an interface or some other object which gives me access to it?
After some debugging in the framework, I strongly believe there is no way to access the HTTP response when using the TransferManager.
For what we are trying to do, we need to send a unique ID from the server to the client when the file upload is completed and assembled.
Therefore, if you don't mind giving up the convenience of the TransferManager, you could write "your own TransferManager" and have full control. But again, on the client side we don't really want custom code; we want a standard and simple approach (at least in my scenario). If you decide to do it manually, it can be done -- I have already tried it and it works!
So as an alternative we thought of sending the value from the server via the ETag, which is not great but will do the job and keeps the client side simple and clean.
Any suggestion on how to send this value back in a better way?
Upload up = tm.upload(bucketName, file.getName(), file);
UploadResult result = (UploadResult) ((UploadImpl) up).getMonitor().getFuture().get();
String uniqueIdFromServer = result.getETag();
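If the cast through UploadImpl feels too fragile, the public Upload interface also exposes waitForUploadResult(), which should yield the same ETag without touching implementation classes (a sketch, assuming the same tm, bucketName and file as above):

// Same workaround via the public API: wait for completion and read the ETag
// from the UploadResult instead of reaching into UploadImpl internals.
Upload upload = tm.upload(bucketName, file.getName(), file);
UploadResult uploadResult = upload.waitForUploadResult();
String uniqueIdFromServer = uploadResult.getETag();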

Generating a pre-signed PUT url for Amazon S3 with a maximum content-length

I'm trying to generate a pre-signed URL a client can use to upload an image to a specific S3 bucket. I've successfully generated requests to GET files, like so:
GeneratePresignedUrlRequest urlRequest = new GeneratePresignedUrlRequest(bucket, filename);
urlRequest.setMethod(method);
urlRequest.setExpiration(expiration);
where expiration and method are Date and HttpMethod objects respectively.
Now I'm trying to create a URL to allow users to PUT a file, but I can't figure out how to set the maximum content-length. I did find information on POST policies, but I'd prefer to use PUT here - I'd also like to avoid constructing the JSON, though that doesn't seem possible.
Lastly, an alternative answer could be some way to pass an image upload from the API Gateway to Lambda so I can upload it from Lambda to S3 after validating file type and size (which isn't ideal).
While I haven't managed to limit the file size on upload, I ended up creating a Lambda function that is triggered on upload to a temporary bucket. The function has a signature like the one below:
public static void checkUpload(S3EventNotification event) {
(this is notable because all the guides I found online refer to an S3Event class that doesn't seem to exist anymore)
The function pulls the file's metadata (not the file itself, as that potentially counts as a large download) and checks the file size. If it's acceptable, it downloads the file and then uploads it to the destination bucket. If not, it simply deletes the file.
This is far from ideal, as uploads failing to meet the criteria will seem to work but then simply never show up (as S3 will issue a 200 status code on upload without caring what Lambda's response is).
This is effectively a workaround rather than a solution, so I won't be accepting this answer.
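A rough sketch of that handler, using a server-side copy instead of the download-and-reupload step described above for brevity (the 5 MB limit, the destination bucket name and the statically built client are assumptions):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.event.S3EventNotification;
import com.amazonaws.services.s3.event.S3EventNotification.S3EventNotificationRecord;
import com.amazonaws.services.s3.model.ObjectMetadata;

public class UploadChecker {

    private static final long MAX_BYTES = 5 * 1024 * 1024;                    // assumption: 5 MB limit
    private static final String DESTINATION_BUCKET = "my-destination-bucket"; // assumption
    private static final AmazonS3 S3 = AmazonS3ClientBuilder.defaultClient();

    public static void checkUpload(S3EventNotification event) {
        for (S3EventNotificationRecord record : event.getRecords()) {
            String bucket = record.getS3().getBucket().getName();
            String key = record.getS3().getObject().getKey();

            // Only the metadata is fetched here, not the object body.
            ObjectMetadata metadata = S3.getObjectMetadata(bucket, key);

            if (metadata.getContentLength() <= MAX_BYTES) {
                // Acceptable size: copy into the destination bucket.
                S3.copyObject(bucket, key, DESTINATION_BUCKET, key);
            }
            // Either way, remove the object from the temporary bucket.
            S3.deleteObject(bucket, key);
        }
    }
}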

S3 Bulk putObject

I am profiling my Java distributed crawler (which stores crawled documents in S3), and S3 insertion is definitely a bottleneck. In fact, at a high enough number of threads, the threads consistently get timeout exceptions from S3 because it takes too long for S3 to read the data. Is there a bulk putObject function, provided either by Amazon or by another library, that can do this more efficiently?
Example code:
BUCKET = ...; // S3 bucket definition
AmazonS3 client = ...;
InputStream is = ...; // convert the data into input stream
ObjectMetadata meta = ...; // get metadata
String key = ...;
client.putObject(new PutObjectRequest(BUCKET, key, is, meta));
I haven't used S3 with Java, but AWS does support multipart uploads for large files.
http://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html
The boto library for Python supports this for sure; I've used it to successfully upload very large database backups before.
After looking at the Javadocs for the Java library, I think you may need to use http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/UploadPartRequest.html instead of the regular request to get a multipart upload going.
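For completeness, a rough sketch of the low-level multipart flow that UploadPartRequest belongs to; the part size is a placeholder, and in practice the TransferManager wraps this same sequence for you:

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.*;

// Low-level multipart upload: initiate, upload the parts, then complete.
void multipartUpload(AmazonS3 client, String bucket, String key, File file) {
    long partSize = 5 * 1024 * 1024; // 5 MB minimum part size (assumption for the sketch)

    InitiateMultipartUploadResult init =
            client.initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key));

    List<PartETag> partETags = new ArrayList<>();
    long position = 0;
    for (int partNumber = 1; position < file.length(); partNumber++) {
        long currentPartSize = Math.min(partSize, file.length() - position);
        UploadPartRequest partRequest = new UploadPartRequest()
                .withBucketName(bucket)
                .withKey(key)
                .withUploadId(init.getUploadId())
                .withPartNumber(partNumber)
                .withFile(file)
                .withFileOffset(position)
                .withPartSize(currentPartSize);
        partETags.add(client.uploadPart(partRequest).getPartETag());
        position += currentPartSize;
    }

    client.completeMultipartUpload(
            new CompleteMultipartUploadRequest(bucket, key, init.getUploadId(), partETags));
}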
