I'm trying to upload a file with the Amazon Java SDK via multipart upload. The idea is to pass an upload ID to an applet, which puts the file parts into a read-only bucket. This way I avoid storing AWS credentials in the applet.
In my tests, I generate an upload ID with boto (Python) and store a file into the bucket. That works well.
My applet, however, gets a "403 Access denied" from S3, and I have no idea why.
Here's my code (which is partially taken from http://docs.amazonwebservices.com/AmazonS3/latest/dev/llJavaUploadFile.html):
AmazonS3 s3Client = new AmazonS3Client();
List<PartETag> partETags = new ArrayList<PartETag>();

long contentLength = file.length();
long partSize = Config.getInstance().getInt("part_size");
String bucketName = Config.getInstance().getString("bucket");
String keyName = "mykey";
String uploadId = getParameter("upload_id");

try {
    long filePosition = 0;
    for (int i = 1; filePosition < contentLength; i++) {
        partSize = Math.min(partSize, (contentLength - filePosition));

        // Create request to upload a part.
        UploadPartRequest uploadRequest = new UploadPartRequest()
                .withBucketName(bucketName).withKey(keyName)
                .withUploadId(uploadId).withPartNumber(i)
                .withFileOffset(filePosition)
                .withFile(file)
                .withPartSize(partSize);

        // Upload part and add response to our list.
        partETags.add(s3Client.uploadPart(uploadRequest).getPartETag());

        filePosition += partSize;
    }

    System.out.println("Completing upload");
    CompleteMultipartUploadRequest compRequest = new CompleteMultipartUploadRequest(
            bucketName,
            keyName,
            uploadId,
            partETags);
    s3Client.completeMultipartUpload(compRequest);
} catch (Exception e) {
    s3Client.abortMultipartUpload(new AbortMultipartUploadRequest(
            bucketName, keyName, uploadId));
}
In the applet debug log, I then find this:
INFO: Sending Request: PUT https://mybucket.s3.amazonaws.com /mykey Parameters: (uploadId: V4hwobOLQ1rYof54zRW0pfk2EfhN7B0fpMJTOpHOcmaUl8k_ejSo_znPI540.lpO.ZO.bGjh.3cx8a12ZMODfA--, partNumber: 1, ) Headers: (Content-Length: 4288546, Content-Type: application/x-www-form-urlencoded; charset=utf-8, )
24.01.2012 16:48:42 com.amazonaws.http.AmazonHttpClient handleErrorResponse
INFO: Received error response: Status Code: 403, AWS Service: null, AWS Request ID: DECF32CCFEE9EBF0, AWS Error Code: AccessDenied, AWS Error Message: Access Denied, S3 Extended Request ID: xtL1ixsGM2/vsxJ+cZRHpkPZ23SMfP8hZZjQCQnp8oWGwdS2/aGfYgomihyqaDCQ
Do you see any obvious mistakes in the code?
Thanks,
Stefan
While your use case is sound and this is an obvious thing to attempt, I don't think the Multipart Upload API has been designed to allow it; you are actually running into a security barrier:
The upload ID is merely an identifier that helps the Multipart Upload API assemble the parts (i.e. it is more like a temporary object key than a dedicated security mechanism, see below). Consequently you still need proper access credentials in place; but since you are calling AmazonS3Client(), which constructs a new Amazon S3 client that will make anonymous requests to Amazon S3, your request yields a 403 Access Denied accordingly.
What you are trying to achieve is possible via Uploading Objects Using Pre-Signed URLs, albeit unfortunately only without the multipart functionality:
A pre-signed URL gives you access to the object identified in the URL, provided that the creator of the pre-signed URL has permissions to access that object. That is, if you receive a pre-signed URL to upload an object, you can upload the object only if the creator of the pre-signed URL has the necessary permissions to upload that object. [...] The pre-signed URLs are useful if you want your user/customer to be able to upload a specific object [...], but you don't require them to have AWS security credentials or permissions. When you create a pre-signed URL, you must provide your security credentials, specify a bucket name, an object key, an HTTP method (PUT for uploading objects), and an expiration date and time. [...]
The lengthy quote illustrates why a system like this likely needs a more complex security design than 'just' handing out an upload ID (however similar the two might appear at first sight).
Obviously one would like to be able to use both features together, but this doesn't appear to be available yet.
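For completeness, generating such a pre-signed PUT URL on the server side with the Java SDK might look roughly like the sketch below (bucket name, key, credentials and expiry are placeholders; the applet would then simply issue a plain HTTP PUT of the file against the returned URL, without ever seeing the credentials):

import java.net.URL;
import com.amazonaws.HttpMethod;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;

// Server side: this client holds the real credentials; they never reach the applet.
AmazonS3 s3Client = new AmazonS3Client(new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

// Let the URL expire after 15 minutes.
java.util.Date expiration = new java.util.Date(System.currentTimeMillis() + 15 * 60 * 1000);

GeneratePresignedUrlRequest request =
        new GeneratePresignedUrlRequest("mybucket", "mykey")
                .withMethod(HttpMethod.PUT)
                .withExpiration(expiration);

// Hand this URL to the client, which PUTs the whole file body against it.
URL presignedPutUrl = s3Client.generatePresignedUrl(request);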
Related
I have two buckets, each with a Private ACL.
I have an authenticated link to the source:
String source = "https://bucket-name.s3.region.amazonaws.com/key?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=...&X-Amz-SignedHeaders=host&X-Amz-Expires=86400&X-Amz-Credential=...Signature=..."
and have been trying to use the Java SDK CopyObjectRequest to copy it into another bucket using:
AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
AWSCredentialsProvider provider = new AWSStaticCredentialsProvider(credentials);
AmazonS3 s3Client = AmazonS3ClientBuilder
        .standard()
        .withCredentials(provider)
        .build();

AmazonS3URI sourceURI = new AmazonS3URI(new URI(source));
CopyObjectRequest request = new CopyObjectRequest(sourceURI.getBucket(), sourceURI.getKey(), destinationBucket, destinationKey);
s3Client.copyObject(request);
However I get AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied) because the AWS credentials I've set the SDK up with do not have access to the source file.
Is there a way I can provide an authenticated source URL instead of just the bucket and key?
This isn't supported. The PUT+Copy service API, which is used by s3Client.copyObject(), uses an internal S3 mechanism to copy the object, and the source object is specified as /bucket/key -- not as a full URL. There is no API functionality for fetching from a URL, S3 or otherwise.
With PUT+Copy, the user making the request to S3...
must have READ access to the source object and WRITE access to the destination bucket
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html
The only alternative is download followed by upload.
Doing this from EC2... or a Lambda function running in the source region would be the most cost-effective, but if the object is larger than the Lambda temp space, you'll have to write hooks and handlers to read from the stream and juggle the chunks into a multipart upload... not impossible, but requires some mental gyrations in order to understand what you're actually trying to persuade your code to do.
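For the simple case, a minimal sketch of that download-then-upload fallback (the pre-signed source URL, destination bucket and key are placeholders from the question, and the object is assumed to fit in local temp space):

import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

// 1. Fetch the source object through the pre-signed URL -- no AWS credentials needed for this step.
File tmp = File.createTempFile("s3-copy", ".tmp");
try (InputStream in = new URL(source).openStream()) {
    Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
}

// 2. Upload it to the destination bucket with your own credentials.
AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();
s3Client.putObject(destinationBucket, destinationKey, tmp);

tmp.delete();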
I have an app which is a Digital Asset Management system. It displays thumbnails, which are served with AWS S3 pre-signed URLs: https://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURLJavaSDK.html. This code works until I increase how many items get processed per request. The application has selections for 25, 50, 100, and 200; if I select 100 or 200, the process fails with "Error: com.amazonaws.AmazonServiceException: Too Many Requests (Service: null; Status Code: 429; Error Code: null; Request ID: null)"
Right now the process is as follows:
Perform a search > run each object key through a method that returns a presigned url for that object.
We run this application through Elastic Container Service which allows us to pull in credentials via ContainerCredentialsProvider.
Relevant code for review:
String s3SignedUrl(String objectKeyUrl) {
    // Environment variables for S3 client.
    String clientRegion = System.getenv("REGION");
    String bucketName = System.getenv("S3_BUCKET");

    try {
        // S3 credentials get pulled in from AWS via ContainerCredentialsProvider.
        AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
                .withRegion(clientRegion)
                .withCredentials(new ContainerCredentialsProvider())
                .build();

        // Set the pre-signed URL to expire after one hour.
        java.util.Date expiration = new java.util.Date();
        long expTimeMillis = expiration.getTime();
        expTimeMillis += 1000 * 60 * 60;
        expiration.setTime(expTimeMillis);

        // Generate the presigned URL.
        GeneratePresignedUrlRequest generatePresignedUrlRequest =
                new GeneratePresignedUrlRequest(bucketName, objectKeyUrl)
                        .withMethod(HttpMethod.GET)
                        .withExpiration(expiration);
        return s3Client.generatePresignedUrl(generatePresignedUrlRequest).toString();
    } catch (AmazonServiceException e) {
        throw new AssetException(FAILED_TO_GET_METADATA, "The call was transmitted successfully, but Amazon " +
                "S3 couldn't process it, so it returned an error response. Error: " + e);
    } catch (SdkClientException e) {
        throw new AssetException(FAILED_TO_GET_METADATA, "Amazon S3 couldn't be contacted for a response, or " +
                "the client couldn't parse the response from Amazon S3. Error: " + e);
    }
}
And this is the part where we process the items:
// Overwrite the url, it's nested deeply in maps of maps.
for (Object anAssetList : assetList) {
    String assetId = ((Map) anAssetList).get("asset_id").toString();
    if (renditionAssetRecordMap.containsKey(assetId)) {
        String s3ObjectKey = renditionAssetRecordMap.get(assetId).getThumbObjectLocation();
        ((Map) ((Map) ((Map) anAssetList)
                .getOrDefault("rendition_content", new HashMap<>()))
                .getOrDefault("thumbnail_content", new HashMap<>()))
                .put("url", s3SignedUrl(s3ObjectKey));
    }
}
Any guidance would be appreciated. I would love a solution that is simple and ideally configurable on the AWS side. Otherwise, I am looking at adding a process that generates the URLs in batches.
The problem is unrelated to generating pre-signed URLs. They are generated with no interaction with the service, so there is no way they could be rate-limited. A pre-signed URL uses an HMAC-SHA algorithm to prove to the service that an entity in possession of the credentials has authorized a specific request. The one-way (non-reversible) nature of HMAC-SHA allows these URLs to be generated entirely on the machine where the code is running, with no service interaction.
However, it seems very likely that repeatedly fetching the credentials is the actual cause of the exception -- and you appear to be doing that unnecessarily over and over.
This is an expensive operation:
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
.withRegion(clientRegion)
.withCredentials(new ContainerCredentialsProvider())
.build();
Each time you call this again, the credentials have to be fetched again. That's actually the limit you're hitting.
Build your s3client only once, and refactor s3SignedUrl() to expect that object to be passed in, so you can reuse it.
You should see a notable performance improvement, in addition to resolving the 429 error.
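Roughly, that refactor might look like the sketch below, reusing the pieces from the question (whether the shared client lives in a field, a singleton, or a DI container is up to you; the names here are only illustrative):

// Built once, e.g. at application startup, and reused for every request.
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
        .withRegion(System.getenv("REGION"))
        .withCredentials(new ContainerCredentialsProvider())
        .build();

// The signing method now receives the shared client instead of building its own.
String s3SignedUrl(AmazonS3 s3Client, String objectKeyUrl) {
    String bucketName = System.getenv("S3_BUCKET");

    // Expire the pre-signed URL after one hour; signing happens locally, with no service call.
    java.util.Date expiration = new java.util.Date(System.currentTimeMillis() + 1000 * 60 * 60);

    GeneratePresignedUrlRequest request =
            new GeneratePresignedUrlRequest(bucketName, objectKeyUrl)
                    .withMethod(HttpMethod.GET)
                    .withExpiration(expiration);

    return s3Client.generatePresignedUrl(request).toString();
}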
I am trying to access my S3 bucket from an application deployed on Tomcat running on EC2.
I can see lots of posts related to this, but most of them seem to be about not having proper access. I do have proper access to all buckets, and I am able to upload files from other applications, such as the Jenkins S3 plugin, without any issues. I am clueless why this should happen only for a Java web application deployed on Tomcat. I have confirmed the following:
The EC2 instance was created with an IAM role.
The IAM role has write access to the bucket. The Puppet scripts are able to write to the bucket.
Tried other applications to check the IAM role, and they work fine without any issues.
As per my understanding, if I do not specify any credentials while creating the S3 client (AmazonS3Client), it will fall back to the IAM role by default.
This is a sample function which I wrote to test the permission.
public boolean checkWritePermission(String bucketName) {
    AmazonS3Client amazonS3Client = new AmazonS3Client();
    LOG.info("Checking bucket write permission.....");
    boolean hasWritePermissions = false;

    final ObjectMetadata metadata = new ObjectMetadata();
    metadata.setContentLength(0);

    // Create empty content
    final InputStream emptyContent = new ByteArrayInputStream(new byte[0]);

    // Create a PutObjectRequest with test object
    final PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName,
            "TestDummy.txt", emptyContent, metadata);
    try {
        if (amazonS3Client.putObject(putObjectRequest) != null) {
            LOG.info("Permissions validated!");
            // User has write permissions, TestPassed.
            hasWritePermissions = true;
        }
    } catch (AmazonClientException s3Ex) {
        LOG.warn("Write permissions not available!", s3Ex.getMessage());
        LOG.error("Write permissions not available!", s3Ex);
    }
    return hasWritePermissions;
}
com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: xxxxxxxxxxxxxx).
Not sure if you have solved this issue yet; however, if you are using custom KMS keys on your bucket and the file you are trying to reach is encrypted with the custom key, then this error will also be thrown.
This issue is sometimes hidden by the fact that you can still list objects inside your S3 bucket. Make sure your IAM policy includes the kms permissions needed to decrypt.
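To illustrate where a customer-managed key comes into play (the key ARN below is purely hypothetical, not taken from the question): when objects are encrypted with SSE-KMS, the role needs the matching kms permissions (typically kms:GenerateDataKey for writes and kms:Decrypt for reads) in addition to the s3 permissions, and an upload can pin the key explicitly:

import com.amazonaws.services.s3.model.SSEAwsKeyManagementParams;

// Hypothetical customer-managed key ARN -- replace with your own.
String kmsKeyArn = "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID";

// Same test upload as above, but explicitly encrypted with the custom KMS key.
PutObjectRequest putObjectRequest = new PutObjectRequest(bucketName,
        "TestDummy.txt", emptyContent, metadata)
        .withSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams(kmsKeyArn));

amazonS3Client.putObject(putObjectRequest);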
I found the following note, which describes exactly what I'd like to do:
Note: If your users are only uploading resources (writing) to an access-controlled bucket, you can use the resumable uploads functionality of Google Cloud Storage, and avoid signing URLs or requiring a Google account. In a resumable upload scenario, your (server-side) code authenticates and initiates an upload to Google Cloud Storage without actually uploading any data. The initiation request returns an upload ID, which can then be used in a client request to upload the data. The client request does not need to be signed because the upload ID, in effect, acts as an authentication token. If you choose this path, be sure to transmit the upload ID over HTTPS.
https://cloud.google.com/storage/docs/access-control#Signed-URLs
However, I cannot figure out how to do this with the Google Cloud Storage Library for Java.
https://developers.google.com/resources/api-libraries/documentation/storage/v1/java/latest/
I can't find any reference to resumable uploads, or to getting the URL for a file, anywhere in this API. How can I do this?
That library does not expose the URLs that it creates to its caller, which means you can't use it to accomplish this. If you want to use either signed URLs or the trick you mention above, you'll need to implement it manually.
I would advise going with the signed URL solution over the solution where the server initializes the resumable upload, if possible. It's more flexible and easier to get right, and there are some odd edge cases with the latter method that you could run into.
Someone wrote up a quick example of signing a URL from App Engine a while back in another question: Cloud storage and secure download strategy on app engine. GCS acl or blobstore
You can build the URL yourself. Here is an example:
OkHttpClient client = new OkHttpClient();
AppIdentityService appIdentityService = credential.getAppIdentityService();
Collection<String> scopes = credential.getScopes();
String accessToken = appIdentityService.getAccessToken(scopes).getAccessToken();
Request request = new Request.Builder()
.url("https://www.googleapis.com/upload/storage/v1/b/" + bucket + "/o?name=" + fileName + "&uploadType=resumable")
.post(RequestBody.create(MediaType.parse(mimeType), new byte[0]))
.addHeader("X-Upload-Content-Type", mimeType)
.addHeader("X-Upload-Content-Length", "" + length)
.addHeader("Origin", "http://localhost:8080")
.addHeader("Origin", "*")
.addHeader("authorization", "Bearer "+accessToken)
.build();
Response response = client.newCall(request).execute();
return response.header("location");
It took some digging, but I came up with the following which does the right thing. Some official documentation on how to do this would have been nice, especially because the endpoint for actually triggering the resumable upload is different from what the docs call out. What is here came from using the gsutil tool to sign requests and then working out what was being done. The under-documented additional thing is that the code which POSTs to this URL to get a resumable session URL must include the "x-goog-resumable: start" header to trigger the upload. From there, everything is the same as the docs for performing a resumable upload to GCS.
import base64
import datetime
import time
import urllib

from google.appengine.api import app_identity

SIGNED_URL_EXPIRATION = datetime.timedelta(days=7)


def SignResumableUploadUrl(gcs_resource_path):
  """Generates a signed resumable upload URL.

  Note that documentation on this ability is sketchy. The canonical source
  is derived from running the gsutil program to generate a RESUMABLE URL
  with the "-m RESUMABLE" argument. Run "gsutil help signurl" for info and
  the following for an example:

    gsutil -m RESUMABLE -d 10m keyfile gs://bucket/file/name

  Note that this generates a URL different from the standard mechanism for
  deriving a resumable start URL and the initiator needs to add the header:

    x-goog-resumable:start

  Args:
    gcs_resource_path: The path of the GCS resource, including bucket name.

  Returns:
    A full signed URL.
  """
  method = "POST"
  expiration = datetime.datetime.utcnow() + SIGNED_URL_EXPIRATION
  expiration = int(time.mktime(expiration.timetuple()))
  signature_string = "\n".join([
      method,
      "",  # content md5
      "",  # content type
      str(expiration),
      "x-goog-resumable:start",
      gcs_resource_path
  ])
  _, signature_bytes = app_identity.sign_blob(signature_string)
  signature = base64.b64encode(signature_bytes)
  query_params = {
      "GoogleAccessId": app_identity.get_service_account_name(),
      "Expires": str(expiration),
      "Signature": signature,
  }
  return "{endpoint}{resource}?{querystring}".format(
      endpoint="https://storage.googleapis.com",
      resource=gcs_resource_path,
      querystring=urllib.urlencode(query_params))
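On the client side, a request against the signed URL produced above could then look roughly like this sketch (plain HttpURLConnection, no SDK; the signedUrl parameter is assumed to be the output of the function above). The POST must carry the x-goog-resumable: start header, since it is part of the signature string, and the resumable session URI comes back in the Location header:

import java.net.HttpURLConnection;
import java.net.URL;

static String startResumableSession(String signedUrl) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(signedUrl).openConnection();
    conn.setRequestMethod("POST");
    // Required: this header was included in the signed string.
    conn.setRequestProperty("x-goog-resumable", "start");
    // Empty body; do not set a Content-Type, since the signature was computed with an empty one.
    conn.setDoOutput(true);
    conn.setFixedLengthStreamingMode(0);
    conn.getOutputStream().close();

    int status = conn.getResponseCode();
    if (status != 200 && status != 201) {
        throw new IllegalStateException("Unexpected response starting resumable upload: " + status);
    }
    // The session URI to which the file bytes are subsequently PUT.
    return conn.getHeaderField("Location");
}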
This is from the GCS Access Control documentation on signed URLs and it matches my use case exactly (the resumable upload scenario):
Note: If your users are only uploading resources (writing) to an
access-controlled bucket, you can use the resumable uploads
functionality of Google Cloud Storage, and avoid signing URLs or
requiring a Google account. In a resumable upload scenario, your
(server-side) code authenticates and initiates an upload to Google
Cloud Storage without actually uploading any data. The initiation
request returns an upload ID, which can then be used in a client
request to upload the data. The client request does not need to be
signed because the upload ID, in effect, acts as an authentication
token. If you choose this path, be sure to transmit the upload ID over
HTTPS.
I have a GAE instance which successfully authenticates and initiates a resumable upload to GCS. As expected, GCS returns my GAE server a response:
HTTP/1.1 200 OK
Location: https://www.googleapis.com/upload/storage/v1/b/<my-apps-default-bucket>/o?uploadType=resumable&name=<my-file-name>&upload_id=xa298sd_sdlkj2
Content-Length: 0
The GAE server hands the Android client the URL from the above location and the Android client uses this to try to PUT the file to GCS. Here is the basic code snippet used:
HttpClient client = new DefaultHttpClient();
String url = <URL-returned in Location-Header>; // exactly the URL returned in the GCS response above
Log.v("PUT URL", url);
HttpPut put = new HttpPut(url);
put.addHeader("Content-Type", "<my-file-mime-type>");
// Note that HttpPut adds the `Content-Length` based on the entity added. Setting it by hand will throw an Exception
MultipartEntityBuilder entityBuilder = MultipartEntityBuilder.create();
entityBuilder.setMode(HttpMultipartMode.BROWSER_COMPATIBLE);
entityBuilder.addBinaryBody("file", new File("<path-to-my-file>"));
HttpEntity entity = entityBuilder.build();
put.setEntity(entity);
HttpResponse response = client.execute(put);
HttpEntity httpEntity = response.getEntity();
responseMsg = EntityUtils.toString(httpEntity);
Log.v("resultMsg", responseMsg);
Logs show the response from GCS for the above PUT is:
{
  "error": {
    "errors": [
      {
        "domain": "global",
        "reason": "badRequest",
        "message": "Invalid Upload Request"
      }
    ],
    "code": 400,
    "message": "Invalid Upload Request"
  }
}
My question is to anyone who has gotten the resumable upload scenario to work for this use case: (1) Client asks server to initiate resumable upload, (2) server initiates upload and gets Location with upload_id, (3) server passes these to client, (4) client uses these to upload file to GCS directly with no additional authentication (no signed URL). Is there something I'm missing? According to the documentation it looks like this approach should be working. Does anyone have pointers or experience that could help me out?
Thanks.