Google Drive resumable upload in v3 - java

I am looking for some help/example to perform a resumable upload to Google Drive using the new v3 REST API in Java.
I know there is a low-level description here: Upload files | Google Drive API. But I would rather not deal with these low-level requests if there is another, simpler method (like the former MediaHttpUploader, which is deprecated now...)
What I currently do is:
File fileMetadata = new File();
fileMetadata.setName(name);
fileMetadata.setDescription(...);
fileMetadata.setParents(parents);
fileMetadata.setProperties(...);
FileContent mediaContent = new FileContent(..., file);
drive.files().create(fileMetadata, mediaContent).execute();
But for large files this doesn't recover if the connection is interrupted.

I recently created an implementation for this. It creates a new file in your Drive folder and returns its metadata when the task succeeds. While uploading, it also updates the listener with progress info. I added comments to make it self-explanatory:
public Task<File> createFile(java.io.File yourFile, MediaHttpUploaderProgressListener uploadListener) {
    return Tasks.call(mExecutor, () -> {
        // Generates an input stream with your file content to be uploaded
        FileContent mediaContent = new FileContent(yourFileMimeType, yourFile);
        // Creates an empty Drive file
        File metadata = new File()
                .setParents(parents)
                .setMimeType(yourFileMimeType)
                .setName(yourFileName);
        // Builds up the upload request
        Drive.Files.Create uploadFile = mDriveService.files().create(metadata, mediaContent);
        // This will handle the resumable upload
        MediaHttpUploader uploader = uploadFile.getMediaHttpUploader();
        // Choose your chunk size and it will automatically divide the file into parts
        uploader.setChunkSize(MediaHttpUploader.MINIMUM_CHUNK_SIZE);
        // According to Google, this enables gzip in future versions (optional)
        uploader.setDisableGZipContent(false);
        // Important: this enables resumable upload
        uploader.setDirectUploadEnabled(false);
        // Listener to be updated
        uploader.setProgressListener(uploadListener);
        return uploadFile.execute();
    });
}
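A possible call site might look like this (just a sketch; it assumes the Play services Tasks API, with mExecutor and mDriveService already set up, and the "Upload" log tag is arbitrary):
// Hypothetical usage: upload a local file and log the outcome
createFile(localFile, this)
        .addOnSuccessListener(driveFile ->
                Log.d("Upload", "Created Drive file with id: " + driveFile.getId()))
        .addOnFailureListener(e ->
                Log.e("Upload", "Upload failed", e));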
And make your Activity implement MediaHttpUploaderProgressListener so you get real-time updates on the upload progress:
@Override
public void progressChanged(MediaHttpUploader uploader) {
    // totalFileSize is the known size of the file being uploaded
    String sizeTemp = "Uploading"
            + ": "
            + Formatter.formatShortFileSize(this, uploader.getNumBytesUploaded())
            + "/"
            + Formatter.formatShortFileSize(this, totalFileSize);
    runOnUiThread(() -> textView.setText(sizeTemp));
}
For calculating the progress percentage, you simply do (note the cast, otherwise integer division truncates the result):
double percentage = (double) uploader.getNumBytesUploaded() / totalFileSize;
Or use this one:
uploader.getProgress()
It gives you the fraction of bytes that have been uploaded, represented between 0.0 (0%) and 1.0 (100%). But be sure your content length is specified, otherwise it will throw an IllegalArgumentException.
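For example, a guarded use inside the listener might look like this (a sketch; the "Upload" log tag is arbitrary):
@Override
public void progressChanged(MediaHttpUploader uploader) throws IOException {
    if (uploader.getUploadState() == MediaHttpUploader.UploadState.MEDIA_IN_PROGRESS) {
        // getProgress() is only safe when the content length is known
        double progress = uploader.getProgress();
        Log.d("Upload", "Progress: " + (int) (progress * 100) + "%");
    }
}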


S3 / MinIO with Java / Scala: Saving byte-buffer chunks of files to object storage

So, imagine that I have a Scala Vert.x Web REST API that receives file uploads via HTTP multipart requests. However, it doesn't receive the incoming file data as a single InputStream. Instead, each file is received as a series of byte buffers handed over via a few callback functions.
The callbacks basically look like this:
// the callback that receives byte buffers (chunks) of the file being uploaded
// it is called multiple times until the full file has been received
upload.handler { buffer =>
  // send chunk to backend
}

// the callback that gets called after the full file has been uploaded
// (i.e. after all chunks have been received)
upload.endHandler { _ =>
  // do something after the file has been uploaded
}

// callback called if an exception is raised while receiving the file
upload.exceptionHandler { e =>
  // do something to handle the exception
}
Now, I'd like to use these callbacks to save the file into a MinIO bucket (MinIO, if you're unfamiliar, is basically self-hosted S3, and its API is pretty much the same as the S3 Java API).
Since I don't have a file handle, I need to use putObject() to put an InputStream into MinIO.
The inefficient work-around I'm currently using with the MinIO Java API looks like this:
// this is all inside the context of handling a HTTP request
val out = new PipedOutputStream()
val in = new PipedInputStream()
var size = 0
in.connect(out)
upload.handler { buffer =>
  out.write(buffer.getBytes)
  size += buffer.length()
}
upload.endHandler { _ =>
  minioClient.putObject(
    PutObjectArgs.builder()
      .bucket("my-bucket")
      .object("my-filename")
      .stream(in, size, 50000000)
      .build())
}
Obviously, this isn't optimal. Since I'm using a simple java.io stream here, the entire file ends up getting loaded into memory.
I don't want to save the File to disk on the server before putting it into object storage. I'd like to put it straight into my object storage.
How could I accomplish this using the S3 API and a series of byte buffers given to me via the upload.handler callback?
EDIT
I should add that I am using MinIO because I cannot use a commercially-hosted cloud solution, like S3. However, as mentioned on MinIO's website, I can use Amazon's S3 Java SDK while using MinIO as my storage solution.
I attempted to follow this guide on Amazon's website for uploading objects to S3 in chunks.
That solution I attempted looks like this:
context.request.uploadHandler { upload =>
  println(s"Filename: ${upload.filename()}")
  val partETags = new util.ArrayList[PartETag]
  val initRequest = new InitiateMultipartUploadRequest("docs", "my-filekey")
  val initResponse = s3Client.initiateMultipartUpload(initRequest)
  upload.handler { buffer =>
    println("uploading part", buffer.length())
    try {
      val request = new UploadPartRequest()
        .withBucketName("docs")
        .withKey("my-filekey")
        .withPartSize(buffer.length())
        .withUploadId(initResponse.getUploadId)
        .withInputStream(new ByteArrayInputStream(buffer.getBytes()))
      val uploadResult = s3Client.uploadPart(request)
      partETags.add(uploadResult.getPartETag)
    } catch {
      case e: Exception => println("Exception raised: ", e)
    }
  }
  // this gets called for EACH uploaded file sequentially
  upload.endHandler { _ =>
    // upload successful
    println("done uploading")
    try {
      val compRequest = new CompleteMultipartUploadRequest("docs", "my-filekey", initResponse.getUploadId, partETags)
      s3Client.completeMultipartUpload(compRequest)
    } catch {
      case e: Exception => println("Exception raised: ", e)
    }
    context.response.setStatusCode(200).end("Uploaded")
  }
  upload.exceptionHandler { e =>
    // handle the exception
    println("exception thrown", e)
  }
}
This works for files that are small (my test small file was 11 bytes), but not for large files.
In the case of large files, the processes inside the upload.handler get progressively slower as the file continues to upload. Also, upload.endHandler is never called, and the file somehow continues uploading after 100% of the file has been uploaded.
However, as soon as I comment out the s3Client.uploadPart(request) portion inside upload.handler and the s3Client.completeMultipartUpload parts inside upload.endHandler (basically throwing away the file instead of saving it to object storage), the file upload progresses as normal and terminates correctly.
I figured out what I was doing wrong (when using the S3 client). I was not accumulating bytes inside my upload.handler. I need to accumulate bytes until the buffer size is big enough to upload a part, rather than upload each time I receive a few bytes.
Since neither Amazon's S3 client nor the MinIO client did what I want, I decided to dig into how putObject() was actually implemented and make my own. This is what I came up with.
This implementation is specific to Vert.x; however, it can easily be generalized to work with a built-in java.io InputStream via a while loop and a pair of Piped- streams (a rough sketch of the while-loop version follows the class below).
This implementation is also specific to MinIO, but it can easily be adapted to use the S3 client since, for the most part, the two APIs are the same.
In this example, Buffer is basically a container around a ByteArray and I'm not really doing anything special here. I replaced it with a byte array to ensure that it would still work, and it did.
package server

import com.google.common.collect.HashMultimap
import io.minio.MinioClient
import io.minio.messages.Part
import io.vertx.core.buffer.Buffer
import io.vertx.core.streams.ReadStream

import scala.collection.mutable.ListBuffer

class CustomMinioClient(client: MinioClient) extends MinioClient(client) {
  def putReadStream(bucket: String = "my-bucket",
                    objectName: String,
                    region: String = "us-east-1",
                    data: ReadStream[Buffer],
                    objectSize: Long,
                    contentType: String = "application/octet-stream"
                   ) = {
    val headers: HashMultimap[String, String] = HashMultimap.create()
    headers.put("Content-Type", contentType)
    var uploadId: String = null
    try {
      val parts = new ListBuffer[Part]()
      val createResponse = createMultipartUpload(bucket, region, objectName, headers, null)
      uploadId = createResponse.result.uploadId()

      var partNumber = 1
      // use a Long so files larger than 2 GB don't overflow the counter
      var uploadedSize = 0L

      // a buffer to accumulate bytes from the incoming stream until we have enough to make an `uploadPart` request
      var partBuffer = Buffer.buffer()

      // S3's minimum part size is 5 MB, excepting the last part
      // you should probably implement your own logic for determining how big
      // to make each part, based on the total object size, to avoid unnecessary calls to S3 to upload small parts
      val minPartSize = 5 * 1024 * 1024

      data.handler { buffer =>
        partBuffer.appendBuffer(buffer)
        val isMinPartSize = partBuffer.length >= minPartSize
        val isLastPart = uploadedSize + partBuffer.length == objectSize

        if (isMinPartSize || isLastPart) {
          val partResponse = uploadPart(
            bucket,
            region,
            objectName,
            partBuffer.getBytes,
            partBuffer.length,
            uploadId,
            partNumber,
            null,
            null
          )
          parts.addOne(new Part(partNumber, partResponse.etag))
          uploadedSize += partBuffer.length
          partNumber += 1
          // empty the part buffer since we have already uploaded it
          partBuffer = Buffer.buffer()
        }
      }

      data.endHandler { _ =>
        completeMultipartUpload(bucket, region, objectName, uploadId, parts.toArray, null, null)
      }

      data.exceptionHandler { exception =>
        // should also probably abort the upload here
        println("Handler caught exception in custom putObject: " + exception)
      }
    } catch {
      // ...and abort it here as well
      case e: Exception =>
        println("Exception thrown in custom `putObject`: " + e)
        abortMultipartUpload(
          bucket,
          region,
          objectName,
          uploadId,
          null,
          null
        )
    }
  }
}
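As mentioned above, the same accumulation loop can be generalized to a plain java.io.InputStream. Here is a rough, hypothetical sketch of that while-loop version; uploadPart() and completeUpload() are placeholders for whichever client's multipart calls you wire in (MinIO's, or the S3 SDK's):
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamingPartUploader {
    // S3's minimum part size is 5 MB, except for the last part
    private static final int MIN_PART_SIZE = 5 * 1024 * 1024;

    public void upload(InputStream in, long objectSize) throws IOException {
        ByteArrayOutputStream partBuffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[64 * 1024];
        long uploadedSize = 0;
        int partNumber = 1;
        int read;
        while ((read = in.read(chunk)) > 0) {
            partBuffer.write(chunk, 0, read);
            boolean isMinPartSize = partBuffer.size() >= MIN_PART_SIZE;
            boolean isLastPart = uploadedSize + partBuffer.size() == objectSize;
            if (isMinPartSize || isLastPart) {
                uploadPart(partNumber, partBuffer.toByteArray()); // placeholder
                uploadedSize += partBuffer.size();
                partNumber++;
                // empty the part buffer since it has been uploaded
                partBuffer.reset();
            }
        }
        completeUpload(); // placeholder
    }

    // Placeholders: wire these up to your client's multipart-upload calls.
    private void uploadPart(int partNumber, byte[] bytes) { }
    private void completeUpload() { }
}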
The Scala class above can all be used pretty easily.
First, set up the client:
private val _minioClient = MinioClient.builder()
  .endpoint("http://localhost:9000")
  .credentials("my-username", "my-password")
  .build()

private val myClient = new CustomMinioClient(_minioClient)
Then, where you receive the upload request:
context.request.uploadHandler { upload =>
  myClient.putReadStream(objectName = upload.filename(), data = upload, objectSize = myFileSize)
  context.response().setStatusCode(200).end("done")
}
The only catch with this implementation is that you need to know the file sizes in advance for the request.
However, this can easily be solved the way I did it, especially if you're using a web UI.
Before attempting to upload the files, send a request to the server containing a map of file name to file size.
That pre-request should generate a unique ID for the upload. Then:
- The server saves the group of filename -> filesize pairs, using the upload ID as an index.
- The server sends the upload ID back to the client.
- The client sends the multipart upload request using the upload ID.
- The server pulls the list of files and their sizes and uses it to call .putReadStream().
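A minimal sketch of that pre-request bookkeeping on the server side (all names here are hypothetical):
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory registry mapping an upload ID to the announced file sizes.
public class UploadRegistry {
    private final Map<String, Map<String, Long>> sizesByUploadId = new ConcurrentHashMap<>();

    // Called by the pre-request: store the filename -> size map and return a new upload ID.
    public String register(Map<String, Long> fileSizes) {
        String uploadId = UUID.randomUUID().toString();
        sizesByUploadId.put(uploadId, fileSizes);
        return uploadId;
    }

    // Called when the multipart upload arrives: look up the size announced for a given file.
    public Long sizeOf(String uploadId, String filename) {
        Map<String, Long> sizes = sizesByUploadId.get(uploadId);
        return sizes == null ? null : sizes.get(filename);
    }
}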

GCP Speech to text - Java API not working

I have a sample .webm file recorded using MediaRecorder in the Chrome browser. When I use the Google Speech Java client to get a transcription for the video, it returns an empty transcription. Here is what my code looks like:
SpeechSettings settings = null;
Path path = Paths.get("D:\\scrap\\gcp_test.webm");
byte[] content = null;
try {
    content = Files.readAllBytes(path);
    settings = SpeechSettings.newBuilder().setCredentialsProvider(credentialsProvider).build();
} catch (IOException e1) {
    throw new IllegalStateException(e1);
}
try (SpeechClient speech = SpeechClient.create(settings)) {
    // Builds the recognition request for the local audio content
    RecognitionConfig config = RecognitionConfig.newBuilder()
            .setEncoding(AudioEncoding.LINEAR16)
            .setLanguageCode("en-US")
            .setUseEnhanced(true)
            .setModel("video")
            .setEnableAutomaticPunctuation(true)
            .setSampleRateHertz(48000)
            .build();
    RecognitionAudio audio = RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();
    // RecognitionAudio audio = RecognitionAudio.newBuilder().setUri("gs://xxxx/gcp_test.webm").build();
    // Use blocking call for getting audio transcript
    RecognizeResponse response = speech.recognize(config, audio);
    List<SpeechRecognitionResult> results = response.getResultsList();
    for (SpeechRecognitionResult result : results) {
        SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
        System.out.printf("Transcription: %s%n", alternative.getTranscript());
    }
} catch (Exception e) {
    e.printStackTrace();
    System.err.println(e.getMessage());
}
If I use the same file on https://cloud.google.com/speech-to-text/ and upload it in the demo section, it seems to work fine and shows the transcription. I am clueless about what's going wrong here. I inspected the request sent by the demo, and here is what it looks like.
I am sending the exact same set of parameters, but that didn't work. I also tried uploading the file to Cloud Storage, but that gave the same result (no transcription).
After some trial and error (and looking at the JavaScript samples), I could solve the issue. The serialized version of the audio should be in FLAC format. I was sending the video file (webm) as-is to Google Cloud. The demo on the site extracts the audio stream using the JavaScript Audio API and then sends the data in base64 format to make it work.
Here are the steps that I executed to get the output.
1. Used FFMPEG to extract the audio stream from the webm file into FLAC format:
ffmpeg -i sample.webm -vn -acodec flac sample.flac
2. Made the extracted file available either via Cloud Storage or sent it as a ByteString.
3. Set the appropriate model while calling the Speech API (for an English-language video the video model works, while for French command_and_search does). I don't have any logical reason for this; I realised it after trial and error with the demo on the Google Cloud site.
I got results with the FLAC-encoded file.
The sample code below prints words with timestamps:
public class SpeechToTextSample {
    public static void main(String... args) throws Exception {
        try (SpeechClient speechClient = SpeechClient.create()) {
            String gcsUriFlac = "gs://yourfile.flac";
            RecognitionConfig config =
                    RecognitionConfig.newBuilder()
                            .setEncoding(AudioEncoding.FLAC)
                            .setEnableWordTimeOffsets(true)
                            .setLanguageCode("en-US")
                            .build();
            RecognitionAudio audio = RecognitionAudio.newBuilder().setUri(gcsUriFlac).build(); // for large files
            OperationFuture<LongRunningRecognizeResponse, LongRunningRecognizeMetadata> response =
                    speechClient.longRunningRecognizeAsync(config, audio);
            while (!response.isDone()) {
                System.out.println("Waiting for response...");
                Thread.sleep(1000);
            }
            // Performs speech recognition on the audio file
            List<SpeechRecognitionResult> results = response.get().getResultsList();
            for (SpeechRecognitionResult result : results) {
                SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
                System.out.printf("Transcription: %s%n", alternative.getTranscript());
                for (WordInfo wordInfo : alternative.getWordsList()) {
                    System.out.println(wordInfo.getWord());
                    System.out.printf(
                            "\t%s.%s sec - %s.%s sec\n",
                            wordInfo.getStartTime().getSeconds(),
                            wordInfo.getStartTime().getNanos() / 100000000,
                            wordInfo.getEndTime().getSeconds(),
                            wordInfo.getEndTime().getNanos() / 100000000);
                }
            }
        }
    }
}
GCP supports many different languages; I have used "en-US" for my example.
Please refer to the language support documentation for the full list.

How to upload multiple files to Google Cloud Storage in a single call using Java API?

We want to upload multiple files to Google Cloud Storage. Currently, we are uploading one by one using the Google Java API. The code is below:
public void uploadFile(File srcFile, String bucketName, String destPath) throws IOException {
    BlobId blobId = BlobId.of(bucketName, srcFile.getName());
    BlobInfo blobInfo = BlobInfo.newBuilder(blobId).build();
    long startTime = System.currentTimeMillis();
    // Blob blob = storage.create(blobInfo, new FileInputStream(srcFile));
    try (WriteChannel writer = storage.writer(blobInfo)) {
        try (FileInputStream in = new FileInputStream(srcFile)) {
            byte[] buffer = new byte[1024 * 1024 * 100];
            writer.setChunkSize(buffer.length);
            int readSize = 0;
            while ((readSize = in.read(buffer)) > 0) {
                writer.write(ByteBuffer.wrap(buffer, 0, readSize));
            }
            long endTime = System.currentTimeMillis();
            double writeTime = (double) (endTime - startTime) / 1000;
            System.out.println("File write time : " + writeTime);
        }
    }
}
Our application wants to upload multiple files at a time. I tried to find a method to upload multiple files in the Java API, but could not find any method that uploads multiple files in a single call.
If I loop and upload using multiple calls, it adds huge network overhead and performance is very slow, which the application cannot afford.
My questions are:
Is it possible to upload multiple files via a single call, either using the Java API or REST API?
Could you please provide an example?
Neither the google-cloud-java library nor the underlying JSON API has an API call to upload multiple files, but you could accomplish what you're trying to do by spawning multiple threads and having each thread upload a file. The gsutil -m option does it this way.
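For example, a rough sketch of that approach with a plain ExecutorService (this is ordinary threading, not a batch API):
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;

import java.io.File;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelUploader {
    // Uploads each file on its own thread, similar in spirit to gsutil -m.
    public static void uploadFiles(Storage storage, String bucketName, List<File> files) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8); // tune to your bandwidth
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (File f : files) {
                futures.add(pool.submit(() -> {
                    BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of(bucketName, f.getName())).build();
                    // For very large files, stream through storage.writer(...) as in the code above
                    storage.create(blobInfo, Files.readAllBytes(f.toPath()));
                    return null;
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // propagates any upload failure
            }
        } finally {
            pool.shutdown();
        }
    }
}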

AWS Amazon S3 Java SDK - Refresh credentials / token when expired while uploading large file

I'm trying to upload a large file to a server which uses a token, and the token expires after 10 minutes. If I upload a small file it works; if the file is big, I run into problems and the upload keeps retrying forever while access is denied.
So I need to refresh the token in the BasicAWSCredentials, which is then used for the AWSStaticCredentialsProvider, but I'm not sure how to do it. Please help =)
Worth mentioning that we use a local server (not the Amazon cloud) which provides the token, and for convenience we use Amazon's code.
Here is my code:
public void uploadMultipart(File file) throws Exception {
    // This method will give you an initial token for a given user,
    // then calculates when a new token is needed and refreshes it just when necessary
    String token = getUsetToken();
    String existingBucketName = myTenant.toLowerCase() + ".package.upload";
    String endPoint = urlAPI + "s3/buckets/";
    String strSize = FileUtils.byteCountToDisplaySize(FileUtils.sizeOf(file));
    System.out.println("File size: " + strSize);
    AwsClientBuilder.EndpointConfiguration endpointConfiguration = new AwsClientBuilder.EndpointConfiguration(endPoint, null); // note: Region has to be null
    // AWSCredentialsProvider
    BasicAWSCredentials sessionCredentials = new BasicAWSCredentials(token, "NOT_USED"); // secretKey should be set to NOT_USED
    AmazonS3 s3 = AmazonS3ClientBuilder
            .standard()
            .withCredentials(new AWSStaticCredentialsProvider(sessionCredentials))
            .withEndpointConfiguration(endpointConfiguration)
            .enablePathStyleAccess()
            .build();
    int maxUploadThreads = 5;
    TransferManager tm = TransferManagerBuilder
            .standard()
            .withS3Client(s3)
            .withMultipartUploadThreshold((long) (5 * 1024 * 1024))
            .withExecutorFactory(() -> Executors.newFixedThreadPool(maxUploadThreads))
            .build();
    PutObjectRequest request = new PutObjectRequest(existingBucketName, file.getName(), file);
    // request.putCustomRequestHeader("Access-Token", token);
    ProgressListener progressListener = progressEvent -> System.out.println("Transferred bytes: " + progressEvent.getBytesTransferred());
    request.setGeneralProgressListener(progressListener);
    Upload upload = tm.upload(request);
    LocalDateTime uploadStartedAt = LocalDateTime.now();
    log.info("Starting upload at: " + uploadStartedAt);
    try {
        upload.waitForCompletion();
        // upload.waitForUploadResult();
        log.info("Upload completed. " + strSize);
    } catch (Exception e) { // AmazonClientException
        log.error("Error occurred while uploading file - " + strSize);
        e.printStackTrace();
    }
}
Solution found!
I found a way to get this working and, to be honest, I'm quite happy with the result. I've done many tests with big files (50gb.zip) and it worked very well in every scenario.
My solution is: remove the line BasicAWSCredentials sessionCredentials = new BasicAWSCredentials(token, "NOT_USED"); and replace it with the implementation below.
AWSCredentials is an interface, so we can implement it with something dynamic. The logic for detecting when the token is expired and fetching a fresh one is held inside the getToken() method, meaning you can call it every time with no harm:
AWSCredentials sessionCredentials = new AWSCredentials() {
    @Override
    public String getAWSAccessKeyId() {
        try {
            return getToken(); // getToken() returns a string and refreshes the token when needed
        } catch (Exception e) {
            return null;
        }
    }

    @Override
    public String getAWSSecretKey() {
        return "NOT_USED";
    }
};
When uploading a file (or parts of a multipart file), the credentials that you use must last long enough for the upload to complete. You CANNOT refresh the credentials, as there is no way to tell AWS S3 that you are using new credentials for an already-signed request.
You could break the upload into smaller files that upload quicker. Then only upload X parts. Refresh your credentials and upload Y parts. Repeat until all parts are uploaded. Then you will need to finish by combining the parts (which is a separate command). This is not a perfect solution as transfer speeds cannot be accurately controlled AND this means that you will have to write your own upload code (which is not hard).
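A rough sketch of that batching idea with the low-level multipart API (hedged: getToken() and buildClient() are hypothetical helpers standing in for the token logic and client construction from the question):
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class BatchedMultipartUpload {
    // Upload a few parts, rebuild the client with a fresh token, and keep going.
    public void upload(File file, String bucket, String key, long partSize, int partsPerBatch) {
        AmazonS3 s3 = buildClient(getToken());
        InitiateMultipartUploadResult init =
                s3.initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key));
        List<PartETag> partETags = new ArrayList<>();
        long filePosition = 0;
        int partNumber = 1;
        while (filePosition < file.length()) {
            if (partNumber > 1 && (partNumber - 1) % partsPerBatch == 0) {
                s3 = buildClient(getToken()); // refresh credentials between batches of parts
            }
            // every part except the last must be at least 5 MB
            long size = Math.min(partSize, file.length() - filePosition);
            UploadPartRequest request = new UploadPartRequest()
                    .withBucketName(bucket)
                    .withKey(key)
                    .withUploadId(init.getUploadId())
                    .withPartNumber(partNumber)
                    .withFile(file)
                    .withFileOffset(filePosition)
                    .withPartSize(size);
            partETags.add(s3.uploadPart(request).getPartETag());
            filePosition += size;
            partNumber++;
        }
        // finish by combining the parts (the separate command mentioned above)
        s3.completeMultipartUpload(
                new CompleteMultipartUploadRequest(bucket, key, init.getUploadId(), partETags));
    }

    private String getToken() { return null; /* hypothetical: fetch a fresh token */ }
    private AmazonS3 buildClient(String token) { return null; /* hypothetical: build as in the question */ }
}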

Returning video information with the Video Intelligence API

I am not able to display the information for a local video. When I test with the example videos, results are returned, but when I try with files from my machine, nothing is returned.
public String consultar() throws Throwable {
    requisicaoVideo("C:\\Users\\Web Designer\\Desktop\\Placas de Carros\\cat.mp4");
    return "analiseForenseVideos.xhtml";
}

public void requisicaoVideo(String filePath) throws Exception {
    try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
        // Read file and encode into Base64
        Path path = Paths.get(filePath);
        byte[] data = Files.readAllBytes(path);
        byte[] encodedBytes = Base64.encodeBase64(data);
        System.out.println(encodedBytes + "Linha 74");
        AnnotateVideoRequest request = AnnotateVideoRequest.newBuilder()
                .setInputContent(ByteString.copyFrom(encodedBytes))
                .addFeatures(Feature.LABEL_DETECTION)
                .build();
        // Create an operation that will contain the response when the operation completes.
        OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> response = client.annotateVideoAsync(request);
        System.out.println("Waiting for operation to complete...");
        System.out.println(response.get().getAnnotationResultsList() + "Linha 83");
        for (VideoAnnotationResults results : response.get().getAnnotationResultsList()) {
            // process video / segment level label annotations
            System.out.println("Locations: ");
            for (LabelAnnotation labelAnnotation : results.getSegmentLabelAnnotationsList()) {
                System.out.println("Video label: " + labelAnnotation.getEntity().getDescription());
                // categories
                for (Entity categoryEntity : labelAnnotation.getCategoryEntitiesList()) {
                    System.out.println("Video label category: " + categoryEntity.getDescription());
                }
                // segments
                for (LabelSegment segment : labelAnnotation.getSegmentsList()) {
                    double startTime = segment.getSegment().getStartTimeOffset().getSeconds()
                            + segment.getSegment().getStartTimeOffset().getNanos() / 1e9;
                    double endTime = segment.getSegment().getEndTimeOffset().getSeconds()
                            + segment.getSegment().getEndTimeOffset().getNanos() / 1e9;
                    System.out.printf("Segment location: %.3f:%.2f\n", startTime, endTime);
                    System.out.println("Confidence: " + segment.getConfidence());
                }
            }
        }
    }
}
I am with Google Cloud Support. Thank you for reporting this issue. I have been doing some tests and identified what looks like a bug in the analyzeLabelsFile function in the Detect.java file.
If you let the job run for a long time, it might finish (for me it takes 30 seconds importing the file from Google Cloud Storage and 16 minutes using the local file), but it still does not provide any information, just the "Locations: " message at the end.
I have sent all the relevant information regarding this (how to reproduce the issue, possible cause, etc.) to Google Video Intelligence API team so they can have a look.
I have not found a workaround for local files, but you can process the file in GCS through its URI and the analyzeLabels function.
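For reference, a minimal sketch of that GCS-based flow (the gs:// path is a placeholder):
// Annotate a video stored in GCS via its URI instead of local input content
try (VideoIntelligenceServiceClient client = VideoIntelligenceServiceClient.create()) {
    AnnotateVideoRequest request = AnnotateVideoRequest.newBuilder()
            .setInputUri("gs://your-bucket/cat.mp4") // placeholder path
            .addFeatures(Feature.LABEL_DETECTION)
            .build();
    OperationFuture<AnnotateVideoResponse, AnnotateVideoProgress> response = client.annotateVideoAsync(request);
    for (VideoAnnotationResults results : response.get().getAnnotationResultsList()) {
        for (LabelAnnotation label : results.getSegmentLabelAnnotationsList()) {
            System.out.println("Video label: " + label.getEntity().getDescription());
        }
    }
}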
