Upload 2GB file with Apache HttpClient - java

I'm new to Apache HC and I'm wondering what the best way is to upload files of about 1 or 2 GB in size.
I am using the Minio SDK to retrieve a presigned URL from the server. After that, I send this presigned URL to the client, which will upload the specified file.
On the Minio side, the maximum size per PUT operation is 5 GiB, so there should be no problem there. My main concern is:
What would be the best way to upload the file with Apache HC in order to get the best performance and the least error-prone behaviour?
I'm guessing that directly uploading a 2 GB file is not a good option. Does Apache HttpClient handle that upload if an error occurs? Is it better to upload the file in parts? How do I achieve that?

The Minio server changed the maximum size to 16 GiB per PutObject. Apache HttpClient should handle uploading up to 2 GiB without issues.
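For a single PUT to the presigned URL, the main thing is to stream the file from disk rather than load it into memory. A minimal sketch, assuming HttpClient 4.x and that the presigned URL and file path come from your own code:

```java
import java.io.File;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.FileEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class PresignedUrlUpload {
    public static void main(String[] args) throws Exception {
        String presignedUrl = args[0];   // presigned PUT URL obtained via the Minio SDK
        File file = new File(args[1]);   // the 1-2 GB file to upload

        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpPut put = new HttpPut(presignedUrl);
            // FileEntity streams the file from disk and sets Content-Length
            // from the file size, so the payload is never buffered in full.
            put.setEntity(new FileEntity(file, ContentType.APPLICATION_OCTET_STREAM));

            try (CloseableHttpResponse response = client.execute(put)) {
                System.out.println("Status: " + response.getStatusLine().getStatusCode());
            }
        }
    }
}
```

HttpClient's built-in retry handling only covers certain low-level I/O failures, so if you need resumability you would have to restart the PUT yourself or move to a multipart upload on the Minio/S3 side.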

Related

Uploading large (more than 1 GB) files to Amazon S3 using Java: large files temporarily consume a lot of space on the server

I am trying to upload large files (more than 1 GB) to Amazon S3 using Java.
I am using AWS S3 multipart upload to upload large files in chunks.
https://docs.aws.amazon.com/AmazonS3/latest/dev/HLuploadFileJava.html
I am also uploading the files in chunks from the frontend.
So the file being uploaded is temporarily stored on the server in chunks, and then uploaded to S3 in chunks.
Now the problem is that this method puts a huge load on the server, since it consumes server space temporarily. If multiple users try to upload large files at the same time, it will create an issue.
Is there any way to upload files directly from the user's system to Amazon S3 in chunks, without storing the file on the server temporarily?
If I upload the files directly from the frontend, there is a major risk of the keys getting exposed.
You should upload directly from the client with a signed URL.
There is plenty of documentation for this:
AWS SDK Presigned URL + Multipart upload
The presigned URLs are useful if you want your user/customer to be able to upload a specific object to your bucket, but you don't require them to have AWS security credentials or permissions.
You might also be interested in limiting the size that a user is able to upload:
Limit Size Of Objects While Uploading To Amazon S3 Using Pre-Signed URL
Think of a signed URL as a temporary credential for the client to access a specific S3 location. These credentials expire after a short time, so there is less of a security concern, but do remember to restrict the access of the signed URLs appropriately.
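As a rough illustration of the flow, this is how a backend could hand out a short-lived PUT URL with the AWS SDK for Java v1 (bucket name, key and expiry here are placeholders); the frontend then uploads straight to S3 with that URL, so no credentials are exposed and nothing is staged on your server:

```java
import java.net.URL;
import java.util.Date;
import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;

public class PresignedUrlExample {
    public static void main(String[] args) {
        String bucket = "my-bucket";                 // placeholder bucket
        String key = "uploads/large-file.bin";       // placeholder object key

        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // URL is valid for 15 minutes; after that the client must request a new one.
        Date expiration = new Date(System.currentTimeMillis() + 15 * 60 * 1000);
        GeneratePresignedUrlRequest request =
                new GeneratePresignedUrlRequest(bucket, key)
                        .withMethod(HttpMethod.PUT)
                        .withExpiration(expiration);

        URL presignedUrl = s3.generatePresignedUrl(request);
        System.out.println(presignedUrl);
    }
}
```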

Best strategy to upload files with unknown size to S3

I have a server-side application that runs through a large number of image URLs and uploads the images from these URLs to S3.
The files are served over HTTP. I download them using the InputStream I get from an HttpURLConnection via its getInputStream method. I hand the InputStream to the AWS S3 client's putObject method (AWS Java SDK v1) to upload the stream to S3. So far so good.
I am trying to introduce a new external image data source. The problem with this data source is that the HTTP server serving these images does not return a Content-Length HTTP header. This means I cannot tell how many bytes the image will be, which is a number required by the AWS S3 client to validate the image was correctly uploaded from the stream to S3.
The only ways I can think of to deal with this issue are to either get the server owner to add a Content-Length HTTP header to their response (unlikely), or to download the file into a memory buffer first and then upload it to S3 from there.
These are not big files, but I have many of them.
When considering downloading the file first, I am worried about the memory footprint and concurrency implications (not being able to upload and download chunks of the same file at the same time).
Since I am dealing with many small files, I suspect that concurrency issues might be "resolved" if I focus on the concurrency of the multiple files instead of a single file. So instead of concurrently downloading and uploading chunks of the same file, I will use my IO effectively downloading one file while uploading another.
I would love your ideas on how to do this, best practices, pitfalls or any other thought on how to best tackle this issue.
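For reference, a minimal sketch of the buffer-to-memory fallback described above, assuming the AWS SDK for Java v1 (bucket, key and method names are illustrative): read the whole stream into a byte array so the exact length is known, then pass it to putObject with explicit metadata. Since the files are small, memory stays bounded as long as you cap how many are copied concurrently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.util.IOUtils;

public class BufferedImageUpload {

    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    // Downloads the image fully into memory so the exact size is known,
    // then uploads it with an explicit Content-Length.
    public void copyToS3(String imageUrl, String bucket, String key) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(imageUrl).openConnection();
        byte[] bytes;
        try (InputStream in = conn.getInputStream()) {
            bytes = IOUtils.toByteArray(in);   // fine for small images; avoid for huge files
        }

        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(bytes.length);
        s3.putObject(bucket, key, new ByteArrayInputStream(bytes), metadata);
    }
}
```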

Streaming large volume of data over http

I need to read millions of XMLs (a few GBs in total) and stream them over HTTP via a REST GET call with low latency. What would be the options to achieve this with Java and/or open-source tools?
Thank you
One option is to stream the data as an attachment over SOAP using MTOM.
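If a plain REST GET is acceptable instead of SOAP/MTOM, another common option is JAX-RS StreamingOutput, which writes the response incrementally so the whole payload is never held in memory. A rough sketch, assuming a JAX-RS 2.x runtime and a placeholder directory of XML files:

```java
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.StreamingOutput;

@Path("/xml-dump")
public class XmlStreamResource {

    // Directory holding the XML files; the path is a placeholder for illustration.
    private static final String XML_DIR = "/data/xml";

    @GET
    @Produces(MediaType.APPLICATION_OCTET_STREAM)
    public Response streamAll() {
        // StreamingOutput writes each document to the response as it is read,
        // so millions of files are never held in memory at once.
        StreamingOutput body = output -> {
            try (DirectoryStream<java.nio.file.Path> files =
                         Files.newDirectoryStream(Paths.get(XML_DIR), "*.xml")) {
                for (java.nio.file.Path file : files) {
                    Files.copy(file, output);   // stream one file at a time
                    output.flush();             // push bytes to the client promptly
                }
            }
        };
        return Response.ok(body).build();
    }
}
```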

How to upload a large file from a Java mobile app?

I have built a backup-and-restore application for Java phones, including Nokia. It works fine, but pictures larger than 1 MB cannot be uploaded. Is it possible to upload a file larger than 1 MB? If so, please suggest whether it should be done over HTTP or FTP.
Thank you.
Have a look at this step-by-step tutorial. What you need is to send the file in multiple parts over a persistent HTTP connection:
Uploading files to HTTP server using POST on Android.
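The linked tutorial targets Android, but the idea is the same on any Java runtime with HttpURLConnection: build a multipart/form-data body and copy the file through it in small chunks over one connection. A rough sketch (the form field name and server URL are assumptions; on Java ME you would use javax.microedition.io.HttpConnection instead, but the body is built the same way):

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class MultipartUpload {

    // Posts a single file as multipart/form-data over one HTTP connection.
    public static int upload(File file, String serverUrl) throws Exception {
        String boundary = "----JavaUploadBoundary";
        HttpURLConnection conn = (HttpURLConnection) new URL(serverUrl).openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setChunkedStreamingMode(0);   // stream instead of buffering the whole body
        conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);

        try (DataOutputStream out = new DataOutputStream(conn.getOutputStream());
             InputStream in = new FileInputStream(file)) {
            out.writeBytes("--" + boundary + "\r\n");
            out.writeBytes("Content-Disposition: form-data; name=\"file\"; filename=\""
                    + file.getName() + "\"\r\n");
            out.writeBytes("Content-Type: application/octet-stream\r\n\r\n");

            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);   // copy the file in small chunks
            }
            out.writeBytes("\r\n--" + boundary + "--\r\n");
        }
        return conn.getResponseCode();
    }
}
```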

Rate Limit s3 Upload using Java API

I am using the java aws sdk to transfer large files to s3. Currently I am using the upload method of the TransferManager class to enable multi-part uploads. I am looking for a way to throttle the rate at which these files are transferred to ensure I don't disrupt other services running on this CentOS server. Is there something I am missing in the API, or some other way to achieve this?
Without support in the API for this, one approach is to wrap the s3 command with trickle.
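If you would rather stay inside the JVM than wrap the process with trickle, one pattern (not something the SDK provides out of the box) is to throttle the InputStream you hand to TransferManager, for example with Guava's RateLimiter. A rough sketch, assuming the AWS SDK for Java v1 and placeholder bucket/key names:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import com.google.common.util.concurrent.RateLimiter;

public class ThrottledS3Upload {

    /** Wraps a stream so reads never exceed the configured bytes-per-second rate. */
    static class ThrottledInputStream extends FilterInputStream {
        private final RateLimiter limiter;

        ThrottledInputStream(InputStream in, long bytesPerSecond) {
            super(in);
            this.limiter = RateLimiter.create(bytesPerSecond);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            int n = super.read(b, off, len);
            if (n > 0) {
                limiter.acquire(n);   // block until the bandwidth budget allows these bytes
            }
            return n;
        }
    }

    public static void main(String[] args) throws Exception {
        File file = new File(args[0]);
        TransferManager tm = TransferManagerBuilder.defaultTransferManager();

        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(file.length());

        // Cap the upload at roughly 5 MiB/s; bucket and key are placeholders.
        try (InputStream in =
                     new ThrottledInputStream(new FileInputStream(file), 5L * 1024 * 1024)) {
            tm.upload("my-bucket", "backups/" + file.getName(), in, metadata)
              .waitForCompletion();
        } finally {
            tm.shutdownNow();
        }
    }
}
```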
