Download all files from S3 and upload them to the same folder - Java

I enable versioning on the bucket whenever a new bucket is created. But for backward compatibility (buckets that were already created), I'm downloading all files/keys and uploading them again. I'm doing this:
fullObject = s3Client.getObject(new GetObjectRequest(bucketName, key));
But I am not able to figure out, while uploading, how to put each file back into its specific folder. Or is there another solution to fix backward compatibility?

S3 doesn't really have folders; it "fakes" them with logic that looks for common prefixes separated by "/". The "key" is the full path name of the file within the bucket.
It sounds like you are doing a get and then want to do a put of the same bytes to the same key in the bucket. If that is what you want to do, then just use the same key that you used for getting the object.
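For example, a minimal sketch with the v1 Java SDK (the variable names are the ones from the question; everything else is a placeholder): read the object, then write the same bytes back under exactly the same key, so the "folder" part of the key is preserved automatically.
S3Object fullObject = s3Client.getObject(new GetObjectRequest(bucketName, key));
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(fullObject.getObjectMetadata().getContentLength());
metadata.setContentType(fullObject.getObjectMetadata().getContentType());
// The prefix ("folder") is part of the key, so re-using the same key puts the
// object back in its original "folder".
s3Client.putObject(new PutObjectRequest(bucketName, key, fullObject.getObjectContent(), metadata));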
In a bucket that has versioning turned on, this will result in two copies of the file. It is not clear why you would want to do that (assuming that you are writing back exactly what you are reading). Wouldn't you just end up paying for two copies of the same file?
My understanding is that if you turn on versioning for a bucket that already has files in it, everything works the way you would expect. If a pre-existing file gets deleted, S3 just adds a delete marker. If a pre-existing file gets overwritten, S3 keeps the prior version (as a prior version of the new file). There is no need to proactively rewrite existing files when you turn on versioning. If you are concerned, you can easily test this through the S3 console, the command line interface, or one of the language-specific APIs.

Related

How do you use the Zip4J API to add streamed data with specified file permissions

The Zip4J API provides a convenient method for adding a streamed entry to a zip file:
ZipFile.addStream(InputStream stream, ZipParameters pars)
There does not appear to be a method of specifying the 'file permissions' or 'default file permissions' on instances of either the ZipFile or the ZipParameter classes.
The default behaviour is to have all of these permission flags set to false on the entry, which on a Unix system means there are no read, write, or execute permissions for owner, group, or other. This is inconvenient; I would like to set at least the read permission flag for the owner.
Is there a means of setting the file permissions on a 'streamed' zip file entry (i.e. one added using the ZipFile.addStream method)?
If not, is there a means of adding the file permissions after the entry has been created, in a way that is actually stored in the underpinning zip file on disk (see the additional information below for this caveat)?
Additional Information
Note that once a stream entry has been added to a zip file, it is possible to get and set the file permission information from its header data, which can be obtained using the ZipFile.getHeader(entryName) method. However, setting the file permission values using this API does not affect the underpinning zip file directly, and there appears to be no way of saving the updated header information to disk (though I might be missing something).
For reference, the methods for getting and setting file attributes are:
byte[] FileHeader.getInternalFileAttributes()
void FileHeader.setInternalFileAttributes(byte[] attributes)
byte[] FileHeader.getExternalFileAttributes()
void FileHeader.setExternalFileAttributes(byte[] attributes)
Digging into the zip4j code indicates that these file attributes are stored in a 4-byte array, where the bits in bytes 2 and 3 (counting from byte 0) represent the Unix file permission bits. This is handled in the net.lingala.zip4j.util.FileUtils class, in the code that applies POSIX file attributes.
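As an illustration only, a sketch (assuming zip4j 2.x, with hypothetical archive and entry names) of reading those bits back for an entry added via addStream, using the byte layout described above:
import net.lingala.zip4j.ZipFile;
import net.lingala.zip4j.model.FileHeader;

void printEntryPermissionBits() throws Exception {
    ZipFile zipFile = new ZipFile("archive.zip");
    FileHeader header = zipFile.getFileHeader("data.bin");   // look up the entry's header
    byte[] attrs = header.getExternalFileAttributes();       // 4-byte external attributes
    // Per the layout described above, the Unix mode sits in bytes 2 and 3.
    int mode = ((attrs[3] & 0xFF) << 8) | (attrs[2] & 0xFF);
    System.out.printf("unix permission bits: %o%n", mode & 0777);
}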
Potential Workaround (which I am trying to avoid)
One workaround that I can see is to write the data from the stream to a temporary file, ensure that the file has the wanted permissions, add the file to the zip archive, and then remove the temporary file (as it has served its purpose). This approach assumes that the on-disk file permissions are correctly carried into the archive, which appears to be the case on a POSIX system.
I would prefer not to use this approach.
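For completeness, a minimal sketch of that workaround (assuming zip4j 2.x and a POSIX file system; the archive name, entry name, and permission string are placeholders):
import net.lingala.zip4j.ZipFile;
import net.lingala.zip4j.model.ZipParameters;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.PosixFilePermissions;

void addStreamWithPermissions(InputStream stream, String entryName) throws Exception {
    // Write the stream to a temporary file and give it the wanted permissions.
    Path tmp = Files.createTempFile("zip-entry-", ".tmp");
    Files.copy(stream, tmp, StandardCopyOption.REPLACE_EXISTING);
    Files.setPosixFilePermissions(tmp, PosixFilePermissions.fromString("rw-r--r--"));

    // Add the temporary file under the desired entry name, then clean up.
    ZipParameters params = new ZipParameters();
    params.setFileNameInZip(entryName);
    new ZipFile("archive.zip").addFile(tmp.toFile(), params);
    Files.delete(tmp);
}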
I had this same problem. I was using version 2.6.1 and then I found this issue:
Unix permissions are messed up
It was fixed in version 2.6.2 and later. The library was just applying standard permissions when running on a Linux box, without letting you, as the user, change them.
Check whether that version works for you.

Overwrite a file on S3 bucket

I am developing a feature where we need to back up our files to an S3 bucket with the key pattern "tmp/yyyy-mm-dd.file_type.fileName".
Now if I run my app today to back up the file "abc.txt", it will be stored under a key following that pattern.
Let's say tomorrow "abc.txt" is updated and the updated file needs to be backed up to S3 again. It will then be pushed with a different timestamp but with the same fileName in its key.
So what should be done so that there is no redundancy in the S3 bucket and the file is simply overwritten?
It appears that you wish to implement de-duplication so that the same file is not stored multiple times.
Amazon S3 does not provide de-duplication as a feature. Your software would need to recognize such duplication and implement this capability itself.
Alternatively, you might want to use commercial backup software that already has this capability in-built.
Example: MSP360 Backup (formerly CloudBerry Backup)
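For illustration only, a rough sketch of such a check with the v1 Java SDK: use a fixed key per file (no timestamp) and skip the upload when the stored object already has the same content, comparing the local file's MD5 with the object's ETag (which only equals the MD5 for non-multipart, unencrypted uploads). The bucket name and key prefix are placeholders.
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.AmazonS3Exception;
import java.io.File;
import java.nio.file.Files;
import java.security.MessageDigest;

void backupIfChanged(AmazonS3 s3, File file) throws Exception {
    String key = "tmp/" + file.getName();                  // fixed key, no timestamp
    // Hex-encoded MD5 of the local file.
    byte[] digest = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(file.toPath()));
    StringBuilder md5 = new StringBuilder();
    for (byte b : digest) md5.append(String.format("%02x", b));

    try {
        String etag = s3.getObjectMetadata("my-backup-bucket", key).getETag();
        if (md5.toString().equals(etag)) {
            return;                                        // unchanged, nothing to upload
        }
    } catch (AmazonS3Exception e) {
        if (e.getStatusCode() != 404) throw e;             // 404 simply means the object is new
    }
    s3.putObject("my-backup-bucket", key, file);           // create or overwrite the object
}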

How to sync directory with AWS S3 using Java SDK?

I came across this article https://aws.amazon.com/blogs/developer/syncing-data-with-amazon-s3/ which made me aware of the uploadDirectory() method. The blog states: "This small bit of code compares the contents of the local directory to the contents in the Amazon S3 bucket and only transfer files that have changed." This does not seem to be entirely correct since it appears to always transfer every file in a given directory as opposed to only the files that have changed.
I was able to do what I wanted using AWSCLI's s3 sync command, however the goal is to be able to do this syncing using the Java SDK. Is it possible to do this same type of sync using the Java SDK?
There is no SDK implementation of the s3 sync command; you will have to implement it in Java if you need it. According to the CLI documentation, https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/sync.html:
An s3 object will require downloading if one of the following conditions is true:
The s3 object does not exist in the local directory.
The size of the s3 object differs from the size of the local file.
The last modified time of the s3 object is older than the last modified time of the local file.
So essentially you will need to compare the objects in the target bucket with your local files based on the above rules.
Also note that these checks do not cover --delete, so if you need that behaviour you will also have to implement logic for deleting remote objects when the corresponding local file no longer exists.
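The rules above are written for the download direction; for illustration, here is a rough sketch of the analogous check in the upload direction (local to S3) with the v1 Java SDK. The bucket, key prefix, and error handling are simplified placeholders.
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.AmazonS3Exception;
import com.amazonaws.services.s3.model.ObjectMetadata;
import java.io.File;
import java.util.Date;

// Upload a local file only when it is missing on S3, differs in size,
// or is newer than the stored object (mirroring the CLI rules quoted above).
void syncFile(AmazonS3 s3, String bucket, String keyPrefix, File file) {
    String key = keyPrefix + file.getName();
    try {
        ObjectMetadata remote = s3.getObjectMetadata(bucket, key);
        boolean sameSize = remote.getContentLength() == file.length();
        boolean remoteIsCurrent = !remote.getLastModified().before(new Date(file.lastModified()));
        if (sameSize && remoteIsCurrent) {
            return;                                  // nothing to do
        }
    } catch (AmazonS3Exception e) {
        if (e.getStatusCode() != 404) throw e;       // 404: the object does not exist yet
    }
    s3.putObject(bucket, key, file);
}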
I've found it: it is TransferManager.uploadDirectory().
TransferManager.copy() might do something similar, but I do not know what behaviour is employed in case a file or directory with the same name and modification time exists on the destination server.
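For reference, a minimal usage sketch of uploadDirectory() with the v1 SDK (bucket name, key prefix, and directory path are placeholders); note that, as discussed above, it uploads the whole directory rather than only changed files.
import com.amazonaws.services.s3.transfer.MultipleFileUpload;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
import java.io.File;

void uploadWholeDirectory() throws InterruptedException {
    TransferManager tm = TransferManagerBuilder.standard().build();
    // Upload every file under localDir to s3://my-bucket/backup/, including subdirectories.
    MultipleFileUpload upload = tm.uploadDirectory("my-bucket", "backup", new File("/path/to/localDir"), true);
    upload.waitForCompletion();    // blocks until all files are transferred
    tm.shutdownNow();
}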

Downloading and storing files with checksum information in Java

We are building a service to front the fetching of remote static files for our Android app. The service will give a readout of the current MD5 checksum of a file. The concept is that we retain the static file on the device until the checksum changes. When the file changes, the service will return a different checksum, and that is the trigger for the device to download the file again.
I was thinking of just laying the downloaded files down in the file system with a .md5 file next to each one. When the code starts up, I'd go over all the files and make a map of file_name (known to be unique) to checksum. Then on requests for a file I'd check the remote service (whose response would only be checked every few minutes) and compare the result against that in the map.
The more I thought about this, the more I thought someone must have already done it. So before I put time into this I was wondering if there was a project out there doing this. I did some searching but could not find any.
Yes, it's built into HTTP. You can use conditional requests and cache files based on ETags, Last-Modified, etc. If you are looking for a library that implements your particular caching scheme, it's a bit unlikely that one exists. Write one and share it on GitHub :)
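For illustration, a sketch of such a conditional request with plain HttpURLConnection, assuming the server returns an ETag header (the URL and file paths are placeholders): send the previously stored ETag in If-None-Match and only re-download when the server does not answer 304 Not Modified.
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

void fetchIfChanged(String url, Path localFile, Path etagFile) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    if (Files.exists(etagFile)) {
        // Ask the server to skip the body if our cached copy is still current.
        conn.setRequestProperty("If-None-Match", new String(Files.readAllBytes(etagFile)));
    }
    if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
        return;                                    // cached copy is still valid
    }
    try (InputStream in = conn.getInputStream()) {
        Files.copy(in, localFile, StandardCopyOption.REPLACE_EXISTING);
    }
    String etag = conn.getHeaderField("ETag");
    if (etag != null) {
        Files.write(etagFile, etag.getBytes());    // remember the ETag for next time
    }
}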
