How to keep track of bytes uploaded through Azure in Java

I have a use case where, when a file is uploaded to Azure as a multipart upload in Java, I want real-time updates of how many bytes have been transferred so far. Below is the code snippet I have been using. I couldn't find any implementation that Azure provides for this. Is there any workaround available?
// Multipart upload logic (blockSize, maxConcurrency and maxSingleUploadSize are placeholder values)
ParallelTransferOptions options = new ParallelTransferOptions()
        .setBlockSizeLong(blockSize)               // size of each uploaded block, in bytes
        .setMaxConcurrency(maxConcurrency)         // number of blocks uploaded in parallel
        .setMaxSingleUploadSizeLong(maxSingleUploadSize); // files under this size go up in a single request
BlobHttpHeaders headers = getBlobHeaders(localPath, key, folder, null, metadata);
RWLog.MAIN.info("Multipart upload started for " + path);
blob.uploadFromFile(localPath, options, headers, metadata, tier, null, null);
RWLog.MAIN.info("Multipart upload completed for " + path);

Related

How to choose upload method in GCS Storage

I read that the GCS Storage REST API supports 3 upload methods:
simple HTTP upload
chunked upload
resumable upload
I see that google-api-services-storage-v1 uses the resumable upload approach,
but I am curious how to change this, because a resumable upload wastes
2 HTTP requests: one for the metadata and a second for the data.
The request body of the first request is just {"name": "xxx"}.
InputStreamContent contentStream = new InputStreamContent(
        APPLICATION_OCTET_STREAM, stream);
StorageObject objectMetadata = new StorageObject()
        .setName(id.getValue());
Storage.Objects.Insert insertRequest = storage.objects().insert(
        bucketName, objectMetadata, contentStream);
StorageObject object = insertRequest.execute();
I believe that particular library exclusively uses resumable uploads. Resumable uploads are very useful for large transfers, as they can recover from errors and continue the upload. This is indeed sub-optimal in some cases, for example if you want to upload a very large number of very small objects one at a time.
If you want to perform simpler uploads, you might want to consider another library, such as gcloud-java, which can perform direct uploads like so:
Storage storage = StorageOptions.defaultInstance().service();
Bucket bucket = storage.get(bucketName);
// content may be a byte[] or an InputStream
bucket.create(objectName, content, contentType);
That'll use only one request, although for larger uploads I recommend sticking with resumable uploads.
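For reference, a slightly fuller sketch of that direct-upload path. This assumes the pre-rename gcloud-java package layout (com.google.gcloud.storage); the library has since been renamed google-cloud-storage, where the same calls live under com.google.cloud.storage:
import java.nio.charset.StandardCharsets;

import com.google.gcloud.storage.Blob;
import com.google.gcloud.storage.Bucket;
import com.google.gcloud.storage.Storage;
import com.google.gcloud.storage.StorageOptions;

Storage storage = StorageOptions.defaultInstance().service();
Bucket bucket = storage.get("my-bucket");                        // placeholder bucket name
byte[] content = "hello world".getBytes(StandardCharsets.UTF_8); // small payload, sent in one request
Blob blob = bucket.create("my-object.txt", content, "text/plain");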

Image or Pdf File download from Gcloud Storage is corrupted

I am using a file downloaded from Gcloud Storage as an attachment to the Mandrill API, for sending as an attachment in an email. The problem is that it only works for text files; for an image or PDF, the attachment is corrupted.
The following code downloads the file and converts it to a Base64-encoded String.
Storage.Objects.Get getObject = getService().objects().get(bucket, object);
ByteArrayOutputStream out = new ByteArrayOutputStream();
// If you're not in AppEngine, download the whole thing in one request, if possible.
getObject.getMediaHttpDownloader().setDirectDownloadEnabled(true);
getObject.executeMediaAndDownloadTo(out);
//log.info("Output: {}", out.toString("UTF-8"));
return Base64.encodeBase64URLSafeString(out.toString("UTF-8")
        .getBytes(StandardCharsets.UTF_8));
I am setting this String in the Content of MessageContent of the Mandrill API.
Got it working. I only needed to store the OutputStream in a temp file before using it as an attachment in the email. Posting the code below for reference.
Storage.Objects.Get getObject = storage.objects().get("bucket", "object");
OutputStream out = new FileOutputStream("/tmp/object");
// If you're not in AppEngine, download the whole thing in one request, if possible.
getObject.getMediaHttpDownloader().setDirectDownloadEnabled(true);
getObject.executeMediaAndDownloadTo(out);
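A note on the likely root cause (my reading, not stated in the answer): the corruption comes from round-tripping the binary payload through out.toString("UTF-8") and back to bytes, which is lossy for anything that isn't text. Writing the raw bytes to a file, as above, avoids that; so does staying in memory with toByteArray():
Storage.Objects.Get getObject = getService().objects().get(bucket, object);
ByteArrayOutputStream out = new ByteArrayOutputStream();
getObject.getMediaHttpDownloader().setDirectDownloadEnabled(true);
getObject.executeMediaAndDownloadTo(out);
// Encode the raw bytes directly; no String round trip and no temp file needed.
return Base64.encodeBase64URLSafeString(out.toByteArray());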

How do I create a Google Cloud Storage resumable upload URL with Google Client Library for Java on App Engine?

I found the follow note, which describes exactly what I'd like to do:
Note: If your users are only uploading resources (writing) to an access-controlled bucket, you can use the resumable uploads functionality of Google Cloud Storage, and avoid signing URLs or requiring a Google account. In a resumable upload scenario, your (server-side) code authenticates and initiates an upload to Google Cloud Storage without actually uploading any data. The initiation request returns an upload ID, which can then be used in a client request to upload the data. The client request does not need to be signed because the upload ID, in effect, acts as an authentication token. If you choose this path, be sure to transmit the upload ID over HTTPS.
https://cloud.google.com/storage/docs/access-control#Signed-URLs
However, I cannot figure out how to do this with the Google Cloud Storage Library for Java.
https://developers.google.com/resources/api-libraries/documentation/storage/v1/java/latest/
I can't find any reference to resumable files, or getting the URL for a file anywhere in this API. How can I do this?
That library does not expose the URLs that it creates to its caller, which means you can't use it to accomplish this. If you want to use either signed URLs or the trick you mention above, you'll need to implement it manually.
I would advise going with the signed URL solution over the solution where the server initializes the resumable upload, if possible. It's more flexible and easier to get right, and there are some odd edge cases with the latter method that you could run into.
Someone wrote up a quick example of signing a URL from App Engine a while back in another question: Cloud storage and secure download strategy on app engine. GCS acl or blobstore
You can build the URL yourself. Here is an example:
OkHttpClient client = new OkHttpClient();
AppIdentityService appIdentityService = credential.getAppIdentityService();
Collection<String> scopes = credential.getScopes();
String accessToken = appIdentityService.getAccessToken(scopes).getAccessToken();
Request request = new Request.Builder()
        .url("https://www.googleapis.com/upload/storage/v1/b/" + bucket + "/o?name=" + fileName + "&uploadType=resumable")
        .post(RequestBody.create(MediaType.parse(mimeType), new byte[0]))
        .addHeader("X-Upload-Content-Type", mimeType)
        .addHeader("X-Upload-Content-Length", "" + length)
        .addHeader("Origin", "http://localhost:8080") // or "*"; should match the origin the client uploads from
        .addHeader("authorization", "Bearer " + accessToken)
        .build();
Response response = client.newCall(request).execute();
return response.header("location");
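A hypothetical follow-up (not part of the original answer) showing how the returned session URI is then consumed: the client PUTs the file bytes against it, either in a single request or in chunks with Content-Range headers. localPath is a placeholder here.
// Uses the same OkHttp client; the Location header above is the resumable session URI.
String sessionUri = response.header("location");
Request upload = new Request.Builder()
        .url(sessionUri)
        .put(RequestBody.create(MediaType.parse(mimeType), new java.io.File(localPath)))
        .build();
Response uploadResponse = client.newCall(upload).execute();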
It took some digging, but I came up with the following which does the right thing. Some official documentation on how to do this would have been nice, especially because the endpoint for actually triggering the resumable upload is different from what the docs call out. What is here came from using the gsutil tool to sign requests and then working out what was being done. The under-documented additional thing is that the code which POSTs to this URL to get a resumable session URL must include the "x-goog-resumable: start" header to trigger the upload. From there, everything is the same as the docs for performing a resumable upload to GCS.
import base64
import datetime
import time
import urllib
from google.appengine.api import app_identity
SIGNED_URL_EXPIRATION = datetime.timedelta(days=7)
def SignResumableUploadUrl(gcs_resource_path):
  """Generates a signed resumable upload URL.

  Note that documentation on this ability is sketchy. The canonical source
  is derived from running the gsutil program to generate a RESUMABLE URL
  with the "-m RESUMABLE" argument. Run "gsutil help signurl" for info and
  the following for an example:

    gsutil signurl -m RESUMABLE -d 10m keyfile gs://bucket/file/name

  Note that this generates a URL different from the standard mechanism for
  deriving a resumable start URL and the initiator needs to add the header:

    x-goog-resumable:start

  Args:
    gcs_resource_path: The path of the GCS resource, including bucket name.

  Returns:
    A full signed URL.
  """
  method = "POST"
  expiration = datetime.datetime.utcnow() + SIGNED_URL_EXPIRATION
  expiration = int(time.mktime(expiration.timetuple()))
  signature_string = "\n".join([
      method,
      "",  # content md5
      "",  # content type
      str(expiration),
      "x-goog-resumable:start",
      gcs_resource_path
  ])
  _, signature_bytes = app_identity.sign_blob(signature_string)
  signature = base64.b64encode(signature_bytes)
  query_params = {
      "GoogleAccessId": app_identity.get_service_account_name(),
      "Expires": str(expiration),
      "Signature": signature,
  }
  return "{endpoint}{resource}?{querystring}".format(
      endpoint="https://storage.googleapis.com",
      resource=gcs_resource_path,
      querystring=urllib.urlencode(query_params))
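A hypothetical client-side sketch (my addition, not from the original answer) of how the signed URL gets used: POST to it with the x-goog-resumable: start header to open the session, then PUT the data to the Location URI that comes back.
import java.net.HttpURLConnection;
import java.net.URL;

// signedUrl is the string returned by SignResumableUploadUrl() above (hypothetical wiring)
HttpURLConnection start = (HttpURLConnection) new URL(signedUrl).openConnection();
start.setRequestMethod("POST");
start.setRequestProperty("x-goog-resumable", "start");     // must match the signed header
start.setDoOutput(true);
start.getOutputStream().close();                           // empty body
if (start.getResponseCode() == 201) {
    String sessionUri = start.getHeaderField("Location");  // PUT the file bytes to this URI
}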

Google List API PDF Upload with Java

I am looking to upload a PDF using the resumable upload mechanism. However, the web server is throwing a 403 exception which states: "Files must be uploaded using the resumable upload mechanism."
This is particularly frustrating since the resumable upload mechanism is exactly what I'm using. If I change the file to a .txt, it works fine.
String contentType = DocumentListEntry.MediaType.fromFileName(file3.getName()).getMimeType();
System.out.println("This is break zero.");
MediaFileSource mediaFile = new MediaFileSource(file3, contentType);
System.out.println("This is break one.");
ResumableGDataFileUploader uploader =
        new ResumableGDataFileUploader.Builder(
                client, new URL("https://docs.google.com/feeds/default/private/full"),
                mediaFile, null /* empty metadata */)
            .title(mediaFile.getName())
            .chunkSize(DEFAULT_CHUNK_SIZE).executor(executor)
            .trackProgress(listener, PROGRESS_UPDATE_INTERVAL)
            .build();
You should send the request to the resumable upload url, which is https://docs.google.com/feeds/upload/create-session/default/private/full.
Check the resumable upload documentation for more details: https://developers.google.com/gdata/docs/resumable_upload
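Applied to the snippet above, that just means pointing the builder at the create-session endpoint, leaving everything else unchanged:
// Same builder as in the question, pointed at the create-session endpoint
ResumableGDataFileUploader uploader =
        new ResumableGDataFileUploader.Builder(
                client,
                new URL("https://docs.google.com/feeds/upload/create-session/default/private/full"),
                mediaFile, null /* empty metadata */)
            .title(mediaFile.getName())
            .chunkSize(DEFAULT_CHUNK_SIZE).executor(executor)
            .trackProgress(listener, PROGRESS_UPDATE_INTERVAL)
            .build();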

Put file to Amazon S3 using multipart upload

I'm trying to upload a file with the Amazon Java SDK via multipart upload. The idea is to pass an upload ID to an applet, which puts the file parts into a read-only bucket. This way, I avoid storing AWS credentials in the applet.
In my tests, I generate an upload ID with boto (Python) and store a file into the bucket. That works well.
My applet gets a "403 Access denied" from S3, and I have no idea why.
Here's my code (which is partially taken from http://docs.amazonwebservices.com/AmazonS3/latest/dev/llJavaUploadFile.html):
AmazonS3 s3Client = new AmazonS3Client();
List<PartETag> partETags = new ArrayList<PartETag>();
long contentLength = file.length();
long partSize = Config.getInstance().getInt("part_size");
String bucketName = Config.getInstance().getString("bucket");
String keyName = "mykey";
String uploadId = getParameter("upload_id");
try {
    long filePosition = 0;
    for (int i = 1; filePosition < contentLength; i++) {
        partSize = Math.min(partSize, (contentLength - filePosition));
        // Create request to upload a part.
        UploadPartRequest uploadRequest = new UploadPartRequest()
                .withBucketName(bucketName).withKey(keyName)
                .withUploadId(uploadId).withPartNumber(i)
                .withFileOffset(filePosition)
                .withFile(file)
                .withPartSize(partSize);
        // Upload part and add response to our list.
        partETags.add(s3Client.uploadPart(uploadRequest).getPartETag());
        filePosition += partSize;
    }
    System.out.println("Completing upload");
    CompleteMultipartUploadRequest compRequest = new CompleteMultipartUploadRequest(
            bucketName, keyName, uploadId, partETags);
    s3Client.completeMultipartUpload(compRequest);
} catch (Exception e) {
    s3Client.abortMultipartUpload(new AbortMultipartUploadRequest(
            bucketName, keyName, uploadId));
}
In the applet debug log, I then find this:
INFO: Sending Request: PUT https://mybucket.s3.amazonaws.com /mykey Parameters: (uploadId: V4hwobOLQ1rYof54zRW0pfk2EfhN7B0fpMJTOpHOcmaUl8k_ejSo_znPI540.lpO.ZO.bGjh.3cx8a12ZMODfA--, partNumber: 1, ) Headers: (Content-Length: 4288546, Content-Type: application/x-www-form-urlencoded; charset=utf-8, )
24.01.2012 16:48:42 com.amazonaws.http.AmazonHttpClient handleErrorResponse
INFO: Received error response: Status Code: 403, AWS Service: null, AWS Request ID: DECF32CCFEE9EBF0, AWS Error Code: AccessDenied, AWS Error Message: Access Denied, S3 Extended Request ID: xtL1ixsGM2/vsxJ+cZRHpkPZ23SMfP8hZZjQCQnp8oWGwdS2/aGfYgomihyqaDCQ
Do you find any obvious failures in the code?
Thanks,
Stefan
While your use case is sound and this is an obvious approach to try, I don't think the Multipart Upload API has been designed to allow this, and you are actually running into a security barrier:
The upload ID is merely an identifier to assist the Multipart Upload API in assembling the parts together (i.e. more like a temporary object key), not a dedicated security mechanism (see below). Consequently you still require proper access credentials, but since you are calling AmazonS3Client(), which "constructs a new Amazon S3 client that will make anonymous requests to Amazon S3", your request yields a 403 Access Denied accordingly.
What you are trying to achieve is possible via Uploading Objects Using Pre-Signed URLs, albeit only without the multipart functionality, unfortunately:
A pre-signed URL gives you access to the object identified in the URL, provided that the creator of the pre-signed URL has permissions to access that object. That is, if you receive a pre-signed URL to upload an object, you can upload the object only if the creator of the pre-signed URL has the necessary permissions to upload that object. [...] The pre-signed URLs are useful if you want your user/customer to be able to upload a specific object [...], but you don't require them to have AWS security credentials or permissions. When you create a pre-signed URL, you must provide your security credentials, specify a bucket name, an object key, an HTTP method (PUT for uploading objects) and an expiration date and time. [...]
The lengthy quote illustrates why a system like this likely needs a more complex security design than 'just' handing out an upload ID (as similar as the two might appear at first sight).
Obviously one would like to be able to use both features together, but this doesn't appear to be available yet.
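For illustration, a minimal sketch of the pre-signed URL route with the AWS SDK for Java v1 (my addition, not from the original answer; bucket and key names are placeholders):
import java.net.URL;
import java.util.Date;

import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;

// Server side: credentials come from the default provider chain and never reach the applet.
AmazonS3 s3 = new AmazonS3Client();
Date expiration = new Date(System.currentTimeMillis() + 15 * 60 * 1000); // valid for 15 minutes
GeneratePresignedUrlRequest presign =
        new GeneratePresignedUrlRequest("my-bucket", "mykey")   // placeholder bucket/key
                .withMethod(HttpMethod.PUT)
                .withExpiration(expiration);
// Hand this URL to the client; it can PUT the whole object (no multipart) until it expires.
URL uploadUrl = s3.generatePresignedUrl(presign);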
