How to compress files on Azure Data Lake Store - Java

I'm using Azure Data Lake Store as a storage service for my Java app, and sometimes I need to compress multiple files. What I do for now is copy all the files to my server, compress them locally, and then send the zip to Azure. Even though this works, it takes a lot of time, so I'm wondering: is there a way to compress files directly on Azure? I checked the Data Lake Store SDK, but there's no such functionality.

Unfortunately, at the moment there is no option to do that sort of compression.
There is an open feature request, "HTTP compression support for Azure Storage Services (via Accept-Encoding/Content-Encoding fields)", that discusses uploading compressed files to Azure Storage, but there is no estimate of when this feature might be released.
The only option for you is to implement such a mechanism on your own (using an Azure Function for example).
Hope it helps!
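Since the compression has to happen before upload, the local step can at least be kept simple and stream-based. Below is a minimal sketch of zipping multiple files with the JDK's own java.util.zip (no extra dependencies); the file names are placeholders, and the resulting zip would then be uploaded via the Data Lake Store SDK as before.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipFiles {
    // Compress the given files into a single zip written to `out`.
    static void zipTo(OutputStream out, List<Path> files) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Path file : files) {
                zos.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zos);   // stream the file's bytes into the current entry
                zos.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical local copies of the Data Lake files.
        Path a = Files.writeString(Files.createTempFile("a", ".txt"), "hello");
        Path b = Files.writeString(Files.createTempFile("b", ".txt"), "world");
        Path zip = Files.createTempFile("bundle", ".zip");
        try (OutputStream out = Files.newOutputStream(zip)) {
            zipTo(out, List.of(a, b));
        }
        System.out.println("zip size: " + Files.size(zip) + " bytes");
        // The zip would then be uploaded to Azure Data Lake Store via the SDK.
    }
}
```

Because `zipTo` writes to any OutputStream, you could also point it straight at the SDK's upload stream and avoid a temporary zip file on disk.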

Related

Unable to access uploaded resources in Spring on Heroku [duplicate]

I've built an app where users can upload their avatars. I used the paperclip gem and everything works fine on my local machine. On Heroku everything works fine until the server restarts. Then every uploaded image disappears. Is it possible to keep them on the server?
Note: I probably should use a service such as Amazon S3 or Google Cloud. However, each of those services requires credit card or bank account information, even if you only want to use a free tier. This is a small app just for my portfolio and I would rather avoid handing over that information.
No, this isn't possible. Heroku's filesystem is ephemeral and there is no way to make it persistent. You will lose your uploads every time your dyno restarts.
You must use an off-site file storage service like Amazon S3 if you want to store files long-term.
(Technically you could store your images directly in your database, e.g. as a bytea in Postgres, but I strongly advise against that. It's not very efficient and then you have to worry about how to provide the saved files to the browser. Go with S3 or something similar.)

How to share files between application running on multiple servers

I have a Java application running on Apache Tomcat on two different servers, A and B. The application involves uploading and downloading files, mostly PDFs and images. Currently I have an FTP server, F, where I host all my files. Now I am having the following problems:
Uploading and downloading files causes issues when creating the FTP connection (sometimes it connects and sometimes it throws a timeout error).
I am displaying images by converting them to Base64, which causes the same trouble discussed above.
The solutions I can think of are:
Use the application server to host the files (is that good practice?); also, as I have two different servers running the application, it would be tough to keep them in sync.
I have heard about shared file hosting, but that would raise security concerns.
Any solutions to the above problem would be really appreciated. Thanks
If your application uses a database, you could store these files as LOBs (character or binary large objects) in the database instead of on disk.
If the files are small, you can store them as CLOBs or BLOBs in a database and serve them through HTTP (REST endpoints from your application server).
If your files are large, store them on a NAS or other shared storage. Don't convert them to Base64; instead, serve them as binary attachments over HTTP (REST endpoints from your application server). You may also want to store the file locations somewhere, perhaps in the database, to keep track of them.
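To make the "binary attachment over HTTP" point concrete, here is a minimal sketch using the JDK's built-in com.sun.net.httpserver (your real endpoint would live in your application server; the path, port, and file name here are placeholders). The key detail is sending raw bytes with a Content-Type and Content-Disposition header, rather than a Base64 string.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.file.Files;
import java.nio.file.Path;

public class BinaryFileServer {
    // Starts a server that serves `file` as a binary PDF attachment.
    public static HttpServer serve(int port, Path file) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/files/report.pdf", exchange -> {
            byte[] body = Files.readAllBytes(file);
            // Binary attachment headers instead of a Base64 string in JSON.
            exchange.getResponseHeaders().set("Content-Type", "application/pdf");
            exchange.getResponseHeaders().set("Content-Disposition",
                    "attachment; filename=\"report.pdf\"");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.write(Files.createTempFile("report", ".pdf"),
                "pdf-bytes".getBytes());
        HttpServer server = serve(8080, file);
        System.out.println("Serving " + file + " at http://localhost:8080/files/report.pdf");
        server.stop(0);
    }
}
```

Binary serving avoids the roughly 33% size overhead of Base64 and lets the browser handle the file natively.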

Azure storage blob upload from url

Is there a way to do this?
I have plenty of files spread across a few servers and Amazon S3 storage, and I need to upload them to Azure from an app (Java / Ruby).
I'd prefer not to download these files to my app server and then upload them to Azure Blob Storage.
I've checked the Java and Ruby SDKs, and based on the examples there seems to be no direct way to do this (meaning I'd have to download the files to my app server first and then upload them to Azure).
Update:
Just found out about CloudBlockBlob.startCopy() in the Java SDK.
Tried it, and it's basically what I want, without using third-party tools like AzCopy.
You have a few options, mostly licensed, but I think AzureCopy is your best free alternative. You can find a step-by-step walkthrough on the MSDN Blogs.
All you need is your Access Keys for both services and with a simple command:
azurecopy -i https://mybucket.s3-us-west-2.amazonaws.com/ -o https://mystorage.blob.core.windows.net/mycontainer -azurekey %AzureAccountKey% -s3k %AWSAccessKeyID% -s3sk %AWSSecretAccessKeyID% -blobcopy -destblobtype block
You can pass blobs from one container to the other.
As @EmilyGerner said, AzCopy is the official Microsoft tool, and AzureCopy, which @MatiasQuaranta mentioned, is a third-party tool on GitHub: https://github.com/kpfaulkner/azurecopy.
The simple way is to use the AWS Command Line and AzCopy to copy all files from S3 to a local directory and then to Azure Blob Storage. You can refer to my answer to the other thread, Migrating from Amazon S3 to Azure Storage (Django web app). But it is only suitable for a bucket with a small amount of data.
The other effective way is programming with the Amazon S3 and Azure Blob Storage SDKs for Java. In my experience, the Azure SDK APIs for Java are similar to C#'s, so you can refer to the Azure Blob Storage getting-started doc for Java and the AWS SDK for Java, and follow @GauravMantri's sample code to rewrite it in Java.
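If you do end up proxying bytes through your app server with the SDKs (rather than using the server-side CloudBlockBlob.startCopy() mentioned in the update), you can at least pipe the download stream straight into the upload stream without staging anything on disk. A minimal sketch with stand-in streams; in real use, `source` might be the S3 object's content stream (e.g. S3Object.getObjectContent() in the v1 AWS SDK) and `destination` the Azure blob's upload stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamPipe {
    // Pipe source bytes straight into the destination without staging on disk.
    static long pipe(InputStream source, OutputStream destination) throws IOException {
        try (source; destination) {
            return source.transferTo(destination);  // JDK 9+, copies in 8 KiB chunks
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the real S3 download / Azure upload streams.
        InputStream source = new ByteArrayInputStream("object-bytes".getBytes());
        ByteArrayOutputStream destination = new ByteArrayOutputStream();
        long copied = pipe(source, destination);
        System.out.println("copied " + copied + " bytes");
    }
}
```

This still routes traffic through your server, so the server-side copy remains the better option when both ends support it.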

Rate Limit s3 Upload using Java API

I am using the java aws sdk to transfer large files to s3. Currently I am using the upload method of the TransferManager class to enable multi-part uploads. I am looking for a way to throttle the rate at which these files are transferred to ensure I don't disrupt other services running on this CentOS server. Is there something I am missing in the API, or some other way to achieve this?
Without support in the API for this, one approach is to wrap the s3 command with trickle.
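Another in-process alternative, since the v1 TransferManager exposes no bandwidth cap of its own: wrap the stream you hand to the upload in a simple throttling InputStream. The class below is my own sketch (the name and numbers are illustrative, not part of the AWS SDK); it sleeps as needed to keep the average read rate near the target.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Caps read throughput at roughly `bytesPerSecond` by sleeping between chunks.
public class ThrottledInputStream extends InputStream {
    private final InputStream in;
    private final long bytesPerSecond;
    private final long windowStart = System.nanoTime();
    private long bytesRead = 0;

    public ThrottledInputStream(InputStream in, long bytesPerSecond) {
        this.in = in;
        this.bytesPerSecond = bytesPerSecond;
    }

    @Override
    public int read() throws IOException {
        throttle(1);
        return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        if (n > 0) throttle(n);
        return n;
    }

    private void throttle(int n) throws IOException {
        bytesRead += n;
        long elapsedNanos = System.nanoTime() - windowStart;
        long expectedNanos = bytesRead * 1_000_000_000L / bytesPerSecond;
        if (expectedNanos > elapsedNanos) {
            try {
                Thread.sleep((expectedNanos - elapsedNanos) / 1_000_000L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while throttling", e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[50_000];
        long start = System.nanoTime();
        try (InputStream in = new ThrottledInputStream(
                new ByteArrayInputStream(data), 100_000)) {  // ~100 KB/s cap
            in.transferTo(OutputStream.nullOutputStream());
        }
        System.out.printf("read %d bytes in %d ms%n", data.length,
                (System.nanoTime() - start) / 1_000_000);
    }
}
```

For stream-based uploads you would pass the wrapped stream in the PutObjectRequest; note that the v1 SDK also wants the content length set in the ObjectMetadata when uploading from a stream. Unlike trickle, this throttles only your uploads, not other processes on the box.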

scalable file upload/download permissions

What would be a scalable file upload/download system/database?
I'm building a website where users can log in and upload images that are private, truly private. I can't just put them in a directory on a server's hard disk, since that would not scale (what happens when we add more servers?) and it wouldn't be private, since anyone could go to:
http://127.372.171.33/images/private_picture.png
and download the file.
I am building the project in Play Framework (scala/java)
How do websites like Flickr handle this kind of thing? Do they put the images in a database? And what kind of database would be suitable for this situation?
Thanks for help
I can't tell you how those big sites handle it, but putting the images into a database might be one way.
Another way would be to put the files on a virtual filesystem that spans a cluster of servers, or distribute them across different servers, and simply not make the directories that contain the images visible to the web server. That way nobody can open an image using just the server address and a path on that server.
To actually deliver the images, you could then implement a streaming service that sends a byte stream to the browser for display (much as a web server would). This service would first check the download permissions for the requested image.
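The permission-checked streaming idea can be sketched in plain Java along these lines; all names here are hypothetical (this is not a real Play controller), and a real app would look the user up from the session and the owner from a database rather than a map.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Sketch: images live outside the web root; this service is the only way to reach them.
public class ImageService {
    private final Map<String, String> ownerByImage; // imageId -> owning userId
    private final Path storageRoot;                 // not exposed by the web server

    public ImageService(Map<String, String> ownerByImage, Path storageRoot) {
        this.ownerByImage = ownerByImage;
        this.storageRoot = storageRoot;
    }

    // Streams the image only if `userId` owns it; otherwise refuses.
    public void stream(String userId, String imageId, OutputStream out) throws IOException {
        if (!userId.equals(ownerByImage.get(imageId))) {
            throw new SecurityException("user " + userId + " may not read " + imageId);
        }
        Path file = storageRoot.resolve(imageId).normalize();
        if (!file.startsWith(storageRoot)) {          // block path traversal via imageId
            throw new SecurityException("invalid image id");
        }
        Files.copy(file, out);                        // byte stream to the browser
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("images");
        Files.write(root.resolve("img1.png"), new byte[]{1, 2, 3});
        ImageService svc = new ImageService(Map.of("img1.png", "alice"), root);
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        svc.stream("alice", "img1.png", out);
        System.out.println("streamed " + out.size() + " bytes to alice");
    }
}
```

Because the check sits in front of every read, moving the files to a cluster filesystem or object store later only changes the `Files.copy` line, not the permission logic.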
