Google List API PDF Upload with Java - java

I am looking to upload a PDF using the resumable upload mechanism. However, the web server is throwing a 403 exception which states: "Files must uploaded using the resumable upload mechanism."
This is particularly frustrating, since the resumable upload mechanism is exactly what I'm using. If I change the file to a .txt, the upload works fine.
String contentType = DocumentListEntry.MediaType.fromFileName(file3.getName()).getMimeType();
System.out.println("This is break zero.");
MediaFileSource mediaFile = new MediaFileSource(file3, contentType);
System.out.println("This is break one.");
ResumableGDataFileUploader uploader =
new ResumableGDataFileUploader.Builder(
client, new URL("https://docs.google.com/feeds/default/private/full"), mediaFile, null /*empty metadata*/)
.title(mediaFile.getName())
.chunkSize(DEFAULT_CHUNK_SIZE).executor(executor)
.trackProgress(listener, PROGRESS_UPDATE_INTERVAL)
.build();

You should send the request to the resumable upload URL, https://docs.google.com/feeds/upload/create-session/default/private/full, rather than to the regular feed URL.
Check the resumable upload documentation for more details: https://developers.google.com/gdata/docs/resumable_upload
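As a minimal sketch of that fix: the only thing that changes in the question's code is the URL handed to the builder. The helper below is hypothetical (the class and method names are not part of the GData client library) and assumes the feed URL has the /feeds/default/ shape shown in the question.

```java
/** Hypothetical helper, not part of the GData client library. */
class ResumableUploadUrls {
    private static final String FEED_PREFIX = "/feeds/default/";
    private static final String UPLOAD_PREFIX = "/feeds/upload/create-session/default/";

    /** Rewrites a regular feed URL into the matching create-session URL. */
    static String toCreateSessionUrl(String feedUrl) {
        int start = feedUrl.indexOf(FEED_PREFIX);
        if (start < 0) {
            throw new IllegalArgumentException("Not a /feeds/default/ URL: " + feedUrl);
        }
        // Keep the scheme and host, swap the prefix, keep the rest of the path.
        return feedUrl.substring(0, start)
                + UPLOAD_PREFIX
                + feedUrl.substring(start + FEED_PREFIX.length());
    }
}
```

The result would then be wrapped in new URL(...) and passed to ResumableGDataFileUploader.Builder in place of the feed URL.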

Related

AWS S3 Using transferManager to make multipart upload by sending parts

I recently learned about the TransferManager class in the AWS S3 SDK.
Behind the scenes it creates a multipart upload, but it seems I have to hand it the whole file for that to work.
I receive my file in parts, so I need to drive the multipart upload manually. Is that possible with TransferManager? For example, instead of
Upload upload = tm.upload(bucketName, keyName, new File(filePath));
to use for example something like
Upload upload = tm.upload(bucketName, keyName, partOfFile1);
Upload upload = tm.upload(bucketName, keyName, partOfFile2);
Upload upload = tm.upload(bucketName, keyName, partOfFile3);
Or am I stuck with AmazonS3 class when I need to upload file by parts manually?
Thanks for help!

How to choose upload method in GCS Storage

I read that the GCS Storage REST API supports 3 upload methods:
simple HTTP upload
chunked upload
resumable upload
I see that google-api-services-storage-v1 uses the resumable upload approach,
but I am curious how to change this, because a resumable upload costs
2 HTTP requests: one for the metadata and a second for the data.
Request body of the first request is just {"name": "xxx"}.
InputStreamContent contentStream = new InputStreamContent(
APPLICATION_OCTET_STREAM, stream);
StorageObject objectMetadata = new StorageObject()
.setName(id.getValue());
Storage.Objects.Insert insertRequest = storage.objects().insert(
bucketName, objectMetadata, contentStream);
StorageObject object = insertRequest.execute();
I believe that particular library exclusively uses resumable uploads. Resumable uploads are very useful for large transfers, as they can recover from error and continue the upload. This is indeed sub-optimal in some cases, such as if you wanted to upload a very large number of very small objects one at a time.
If you want to perform simpler uploads, you might want to consider another library, such as gcloud-java, which can perform direct uploads like so:
Storage storage = StorageOptions.defaultInstance().service();
Bucket bucket = storage.get(bucketName);
bucket.create(objectName, content /* byte[] or InputStream */, contentType);
That'll use only one request, although for larger uploads I recommend sticking with resumable uploads.
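The trade-off described above can be sketched as a size-based choice between the two upload types of the JSON API upload endpoint. This is only an illustration under stated assumptions: the class name, the 5 MiB cutoff, and the rule "unknown length means resumable" are all hypothetical, and the object name is assumed to be URL-encoded already.

```java
/** Hypothetical helper for picking an upload type; not part of any Google library. */
class GcsUploadChoice {
    // Assumed cutoff for illustration only; tune it for your own workload.
    static final long DIRECT_UPLOAD_LIMIT_BYTES = 5L * 1024 * 1024;

    /** Returns "media" (single request) for small, known sizes, otherwise "resumable". */
    static String uploadType(long contentLength) {
        boolean smallAndKnown =
                contentLength >= 0 && contentLength <= DIRECT_UPLOAD_LIMIT_BYTES;
        return smallAndKnown ? "media" : "resumable";
    }

    /** Builds the upload URL; bucket and encodedName are assumed to be URL-safe. */
    static String insertUrl(String bucket, String encodedName, long contentLength) {
        return "https://www.googleapis.com/upload/storage/v1/b/" + bucket
                + "/o?uploadType=" + uploadType(contentLength)
                + "&name=" + encodedName;
    }
}
```

Passing a negative length (size unknown, for example when streaming) falls back to a resumable upload, which matches the recommendation to keep resumable uploads for anything large.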

How do I create a Google Cloud Storage resumable upload URL with Google Client Library for Java on App Engine?

I found the following note, which describes exactly what I'd like to do:
Note: If your users are only uploading resources (writing) to an access-controlled bucket, you can use the resumable uploads functionality of Google Cloud Storage, and avoid signing URLs or requiring a Google account. In a resumable upload scenario, your (server-side) code authenticates and initiates an upload to Google Cloud Storage without actually uploading any data. The initiation request returns an upload ID, which can then be used in a client request to upload the data. The client request does not need to be signed because the upload ID, in effect, acts as an authentication token. If you choose this path, be sure to transmit the upload ID over HTTPS.
https://cloud.google.com/storage/docs/access-control#Signed-URLs
However, I cannot figure out how to do this with the Google Cloud Storage Library for Java.
https://developers.google.com/resources/api-libraries/documentation/storage/v1/java/latest/
I can't find any reference to resumable files, or getting the URL for a file anywhere in this API. How can I do this?
That library does not expose the URLs that it creates to its caller, which means you can't use it to accomplish this. If you want to use either signed URLs or the trick you mention above, you'll need to implement it manually.
I would advise going with the signed URL solution over the solution where the server initializes the resumable upload, if possible. It's more flexible and easier to get right, and there are some odd edge cases with the latter method that you could run into.
Someone wrote up a quick example of signing a URL from App Engine a while back in another question: Cloud storage and secure download strategy on app engine. GCS acl or blobstore
You can build the URL yourself. Here is an example:
OkHttpClient client = new OkHttpClient();
AppIdentityService appIdentityService = credential.getAppIdentityService();
Collection<String> scopes = credential.getScopes();
String accessToken = appIdentityService.getAccessToken(scopes).getAccessToken();
Request request = new Request.Builder()
.url("https://www.googleapis.com/upload/storage/v1/b/" + bucket + "/o?name=" + fileName + "&uploadType=resumable")
.post(RequestBody.create(MediaType.parse(mimeType), new byte[0]))
.addHeader("X-Upload-Content-Type", mimeType)
.addHeader("X-Upload-Content-Length", "" + length)
.addHeader("Origin", "http://localhost:8080")
.addHeader("Origin", "*")
.addHeader("authorization", "Bearer "+accessToken)
.build();
Response response = client.newCall(request).execute();
return response.header("location");
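One detail the snippet above glosses over is that fileName is concatenated into the query string as-is, so a name containing spaces, slashes, or ampersands would produce a malformed request. A minimal sketch of encoding it first, using only the JDK (the class and method names here are hypothetical):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper for building the session-initiation URL. */
class ResumableInitUrl {
    /** Percent-encodes an object name for use as the "name" query parameter. */
    static String encodeObjectName(String objectName) {
        try {
            // URLEncoder emits form encoding, where a space becomes "+";
            // a literal "+" is already "%2B", so this replace only touches spaces.
            return URLEncoder.encode(objectName, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Same endpoint as the example above, with the object name encoded. */
    static String initiationUrl(String bucket, String objectName) {
        return "https://www.googleapis.com/upload/storage/v1/b/" + bucket
                + "/o?name=" + encodeObjectName(objectName)
                + "&uploadType=resumable";
    }
}
```

The returned string can be dropped into the .url(...) call of the request builder in place of the hand-built concatenation.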
It took some digging, but I came up with the following which does the right thing. Some official documentation on how to do this would have been nice, especially because the endpoint for actually triggering the resumable upload is different from what the docs call out. What is here came from using the gsutil tool to sign requests and then working out what was being done. The under-documented additional thing is that the code which POSTs to this URL to get a resumable session URL must include the "x-goog-resumable: start" header to trigger the upload. From there, everything is the same as the docs for performing a resumable upload to GCS.
import base64
import calendar
import datetime
import urllib

from google.appengine.api import app_identity

SIGNED_URL_EXPIRATION = datetime.timedelta(days=7)


def SignResumableUploadUrl(gcs_resource_path):
    """Generates a signed resumable upload URL.

    Note that documentation on this ability is sketchy. The canonical source
    is derived from running the gsutil program to generate a RESUMABLE URL
    with the "-m RESUMABLE" argument. Run "gsutil help signurl" for info and
    the following for an example:
        gsutil signurl -m RESUMABLE -d 10m keyfile gs://bucket/file/name

    Note that this generates a URL different from the standard mechanism for
    deriving a resumable start URL and the initiator needs to add the header:
        x-goog-resumable:start

    Args:
        gcs_resource_path: The path of the GCS resource, including bucket name.

    Returns:
        A full signed URL.
    """
    method = "POST"
    expiration = datetime.datetime.utcnow() + SIGNED_URL_EXPIRATION
    # utcnow() is in UTC, so convert with timegm; mktime would assume local time.
    expiration = calendar.timegm(expiration.timetuple())
    signature_string = "\n".join([
        method,
        "",  # content md5
        "",  # content type
        str(expiration),
        "x-goog-resumable:start",
        gcs_resource_path
    ])
    # sign_blob returns (key_name, signature); only the signature is needed.
    _, signature_bytes = app_identity.sign_blob(signature_string)
    signature = base64.b64encode(signature_bytes)
    query_params = {
        "GoogleAccessId": app_identity.get_service_account_name(),
        "Expires": str(expiration),
        "Signature": signature,
    }
    return "{endpoint}{resource}?{querystring}".format(
        endpoint="https://storage.googleapis.com",
        resource=gcs_resource_path,
        querystring=urllib.urlencode(query_params))
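For readers working in Java rather than Python, the string-to-sign layout used above can be sketched with the JDK alone. This covers only the string construction and the expiry timestamp; signing and URL assembly are left out, and the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper mirroring the string-to-sign layout of the Python code above. */
class ResumableSigning {
    /** Expiry as seconds since the Unix epoch, independent of the local time zone. */
    static long expiresAt(Instant now, Duration ttl) {
        return now.plus(ttl).getEpochSecond();
    }

    /** Joins the fields in the same order as the Python version, separated by newlines. */
    static String stringToSign(String method, long expiresEpochSeconds, String resourcePath) {
        return String.join("\n",
                method,
                "",                        // content MD5 (left empty)
                "",                        // content type (left empty)
                Long.toString(expiresEpochSeconds),
                "x-goog-resumable:start",  // header the initiating POST must also send
                resourcePath);             // for example "/bucket/file/name"
    }
}
```

Whatever client later POSTs to the signed URL still has to send the x-goog-resumable: start header, exactly as described above.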

Resumable upload to GCS from Android client using `upload_id` for auth

This is from the GCS Access Control documentation on signed URLs and it matches my use case exactly (the resumable upload scenario):
Note: If your users are only uploading resources (writing) to an
access-controlled bucket, you can use the resumable uploads
functionality of Google Cloud Storage, and avoid signing URLs or
requiring a Google account. In a resumable upload scenario, your
(server-side) code authenticates and initiates an upload to Google
Cloud Storage without actually uploading any data. The initiation
request returns an upload ID, which can then be used in a client
request to upload the data. The client request does not need to be
signed because the upload ID, in effect, acts as an authentication
token. If you choose this path, be sure to transmit the upload ID over
HTTPS.
I have a GAE instance which successfully authenticates and initiates a resumable upload to GCS. As expected, GCS returns my GAE server a response:
HTTP/1.1 200 OK
Location: https://www.googleapis.com/upload/storage/v1/b/<my-apps-default-bucket>/o?uploadType=resumable&name=<my-file-name>&upload_id=xa298sd_sdlkj2
Content-Length: 0
The GAE server hands the Android client the URL from the above location and the Android client uses this to try to PUT the file to GCS. Here is the basic code snippet used:
HttpClient client = new DefaultHttpClient();
String url = <URL-returned in Location-Header>; // exactly the URL returned in the GCS response above
Log.v("PUT URL", url);
HttpPut put = new HttpPut(url);
put.addHeader("Content-Type", "<my-file-mime-type>");
// Note that HttpPut adds the `Content-Length` header based on the entity added. Setting it by hand will throw an Exception
MultipartEntityBuilder entityBuilder = MultipartEntityBuilder.create();
entityBuilder.setMode(HttpMultipartMode.BROWSER_COMPATIBLE);
entityBuilder.addBinaryBody("file", new File("<path-to-my-file>"));
HttpEntity entity = entityBuilder.build();
put.setEntity(entity);
HttpResponse response = client.execute(put);
HttpEntity httpEntity = response.getEntity();
responseMsg = EntityUtils.toString(httpEntity);
Log.v("resultMsg", responseMsg);
Logs show the response from GCS for the above PUT is:
{
"error":{
"errors":[
{
"domain":"global",
"reason":"badRequest",
"message":"Invalid Upload Request"
}
],
"code":400,
"message":"Invalid Upload Request"
}
}
My question is to anyone who has gotten the resumable upload scenario to work for this use case: (1) Client asks server to initiate resumable upload, (2) server initiates upload and gets Location with upload_id, (3) server passes these to client, (4) client uses these to upload file to GCS directly with no additional authentication (no signed URL). Is there something I'm missing? According to the documentation it looks like this approach should be working. Does anyone have pointers or experience that could help me out?
Thanks.
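One thing worth ruling out, offered as a guess rather than a confirmed diagnosis: a resumable session URL expects the raw object bytes as the request body, whereas a MultipartEntityBuilder entity wraps the file in MIME multipart boundaries. A minimal sketch of step (4) with plain HttpURLConnection follows. The class and method names are hypothetical, and the chunk handling assumes the documented resumable protocol, where every chunk except the last is a multiple of 256 KiB.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical client for an already-initiated resumable session. */
class ResumableSessionClient {
    /** Every chunk except the last must be a multiple of this size. */
    static final int CHUNK_GRANULARITY = 256 * 1024;

    /** Formats the Content-Range header for one chunk of a known-size object. */
    static String contentRange(long offset, int length, long total) {
        if (offset < 0 || length <= 0 || offset + length > total) {
            throw new IllegalArgumentException("chunk outside object bounds");
        }
        return "bytes " + offset + "-" + (offset + length - 1) + "/" + total;
    }

    /** PUTs the raw bytes of one chunk to the session URL and returns the HTTP status. */
    static int putChunk(String sessionUrl, String mimeType,
                        byte[] chunk, long offset, long total) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(sessionUrl).toURL().openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setFixedLengthStreamingMode(chunk.length); // sets Content-Length
        conn.setRequestProperty("Content-Type", mimeType);
        conn.setRequestProperty("Content-Range", contentRange(offset, chunk.length, total));
        try (OutputStream out = conn.getOutputStream()) {
            out.write(chunk); // raw bytes only, no multipart wrapper
        }
        // 308 means more chunks are expected; 200 or 201 means the upload finished.
        return conn.getResponseCode();
    }
}
```

For a file sent in one piece, a single call with offset 0 and the whole file as the chunk covers the entire upload.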

Put file to Amazon S3 using multipart upload

I'm trying to upload a file with the Amazon Java SDK, via multipart upload. The idea is to pass an upload ID to an applet, which puts the file parts into a read-only bucket. This way, I avoid storing AWS credentials in the applet.
In my tests, I generate an upload ID with boto (Python) and store a file in the bucket. That works well.
My applet gets a "403 Access Denied" from S3, and I have no idea why.
Here's my code (which is partially taken from http://docs.amazonwebservices.com/AmazonS3/latest/dev/llJavaUploadFile.html):
AmazonS3 s3Client = new AmazonS3Client();
List<PartETag> partETags = new ArrayList<PartETag>();
long contentLength = file.length();
long partSize = Config.getInstance().getInt("part_size");
String bucketName = Config.getInstance().getString("bucket");
String keyName = "mykey";
String uploadId = getParameter("upload_id");
try {
long filePosition = 0;
for (int i = 1; filePosition < contentLength; i++) {
partSize = Math.min(partSize, (contentLength - filePosition));
// Create request to upload a part.
UploadPartRequest uploadRequest = new UploadPartRequest()
.withBucketName(bucketName).withKey(keyName)
.withUploadId(uploadId).withPartNumber(i)
.withFileOffset(filePosition)
.withFile(file)
.withPartSize(partSize);
// Upload part and add response to our list.
partETags.add(s3Client.uploadPart(uploadRequest).getPartETag());
filePosition += partSize;
}
System.out.println("Completing upload");
CompleteMultipartUploadRequest compRequest = new
CompleteMultipartUploadRequest(bucketName,
keyName,
uploadId,
partETags);
s3Client.completeMultipartUpload(compRequest);
} catch (Exception e) {
s3Client.abortMultipartUpload(new AbortMultipartUploadRequest(
bucketName, keyName, uploadId));
}
In the applet debug log, I find this, then:
INFO: Sending Request: PUT https://mybucket.s3.amazonaws.com /mykey Parameters: (uploadId: V4hwobOLQ1rYof54zRW0pfk2EfhN7B0fpMJTOpHOcmaUl8k_ejSo_znPI540.lpO.ZO.bGjh.3cx8a12ZMODfA--, partNumber: 1, ) Headers: (Content-Length: 4288546, Content-Type: application/x-www-form-urlencoded; charset=utf-8, )
24.01.2012 16:48:42 com.amazonaws.http.AmazonHttpClient handleErrorResponse
INFO: Received error response: Status Code: 403, AWS Service: null, AWS Request ID: DECF32CCFEE9EBF0, AWS Error Code: AccessDenied, AWS Error Message: Access Denied, S3 Extended Request ID: xtL1ixsGM2/vsxJ+cZRHpkPZ23SMfP8hZZjQCQnp8oWGwdS2/aGfYgomihyqaDCQ
Do you see any obvious mistakes in the code?
Thanks,
Stefan
While your use case is sound and this is an obvious attempt indeed, I don't think the Multipart Upload API has been designed to allow this and you are actually violating a security barrier:
The upload ID is merely an identifier that helps the Multipart Upload API assemble the parts (i.e. more like a temporary object key), not a dedicated security mechanism (see below). Consequently you still need proper access credentials in place. Since you are calling the no-argument AmazonS3Client() constructor, which "constructs a new Amazon S3 client that will make anonymous requests to Amazon S3", your request yields a 403 Access Denied accordingly.
What you are trying to achieve is possible via Uploading Objects Using Pre-Signed URLs, albeit only without the multipart functionality, unfortunately:
A pre-signed URL gives you access to the object identified in the URL,
provided that the creator of the pre-signed URL has permissions to
access that object. That is, if you receive a pre-signed URL to upload
an object, you can upload the object only if the creator of the
pre-signed URL has the necessary permissions to upload that object.
[...] The pre-signed URLs
are useful if you want your user/customer to be able to upload a specific
object [...], but you don't require them to have AWS security
credentials or permissions. When you create a pre-signed URL, you must
provide your security credentials, specify a bucket name, an object
key, an HTTP method (PUT for uploading objects) and an expiration date
and time. [...]
The lengthy quote illustrates why a system like this likely needs a more complex security design than 'just' handing out an upload ID (as similar as both might appear at first sight).
Obviously one would like to be able to use both features together, but this doesn't appear to be available yet.
