Downloading and storing files with checksum information in Java

We are building a service to front the fetching of remote static files for our Android app. The service will give a readout of the current MD5 checksum of a file. The idea is that we retain the static file on the device until the checksum changes; when the file changes, the service returns a different checksum, and that is the trigger for the device to download the file again.
I was thinking of just laying the downloaded files down in the file system with a .md5 file next to each one. When the code starts up, I'd go over all the files and build a map of file_name (known to be unique) to checksum. Then, on each request for a file, I'd check the remote service (whose response would only be re-checked every few minutes) and compare the result against the map.
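To make this concrete, here is a rough sketch of what I had in mind (the directory layout and the .md5 sidecar convention are just placeholders, and note that java.nio.file needs API 26+ on Android):

import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

public class ChecksumIndex {

    // Build the file_name -> checksum map at startup by reading the .md5 sidecar files.
    static Map<String, String> buildIndex(Path dir) throws Exception {
        Map<String, String> index = new HashMap<>();
        try (DirectoryStream<Path> sidecars = Files.newDirectoryStream(dir, "*.md5")) {
            for (Path sidecar : sidecars) {
                String fileName = sidecar.getFileName().toString().replaceFirst("\\.md5$", "");
                index.put(fileName, new String(Files.readAllBytes(sidecar)).trim());
            }
        }
        return index;
    }

    // Compute the hex MD5 of a freshly downloaded file so it can be written to its sidecar.
    static String md5Of(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}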
The more I thought about this, the more I thought someone must have already done it. So before I put time into this I was wondering if there was a project out there doing this. I did some searching but could not find any.

Yes, it's built into HTTP. You can use conditional requests and cache files based on ETags, Last-Modified, etc. If you are looking for a library that implements your particular caching scheme, it's a bit unlikely that one exists. Write one and share it on GitHub :)
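For instance, a conditional request keyed on the ETag might look roughly like this (a sketch only; the URL, persistence of the cached ETag, and error handling are up to you):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: re-download only when the server's ETag differs from the cached one.
static void fetchIfChanged(String fileUrl, String cachedETag) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(fileUrl).openConnection();
    if (cachedETag != null) {
        conn.setRequestProperty("If-None-Match", cachedETag);
    }
    if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
        return; // 304: the local copy is still current, nothing to download
    }
    String newETag = conn.getHeaderField("ETag"); // persist this for next time
    try (InputStream in = conn.getInputStream()) {
        // stream the body to disk, replacing the stale copy
    }
}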

Related

Azure Functions and temporary File Storage

I'm a beginner and have never dealt with cloud-based solutions before, so apologies for the dumb question.
I have an Azure Blob Storage container holding PDF files from which I want to extract data using PDFBox. Because PDFBox can't load blobs directly, I currently download these files locally first. However, eventually my project will need to become fully cloud-based, preferably as an Azure Function.
The main hurdle therefore is figuring out how my Azure Function should access the files. When using the console inside my Azure Function I noticed it comes with a file storage. Can the Function download blobs and store them here before processing it? Does this file storage work the same as a local environment or are there differences to keep in mind?
I'm only looking to store files temporarily here, for only a few minutes at a time.
The main hurdle therefore is figuring out how my Azure Function should access the files. When using the console inside my Azure Function I noticed it comes with a file storage.
Yes, all of the information of your deployed Azure Function is stored in the file storage you set (it is defined when you create the function app).
Can the Function download blobs and store them here before processing it? Does this file storage work the same as a local environment or are there differences to keep in mind?
Yes, you can. The root directory is D:/home/site/wwwroot, so if you don't specify a path, any file you create will end up in that directory.
Remember to delete the files afterwards, because the storage space is limited; how much you get depends on the plan you selected.
I'm only looking to store files temporarily here, for only a few minutes at a time.
By the way, once you fetch a file from blob storage, you already have its complete data in memory. You can process the obtained data directly in your code without temporarily storing it in the local folder. (Of course, if you have special needs, ignore this suggestion.)
You can use a blob trigger or input binding to load a blob into memory of your function for processing by PDFBox.
With regards to the local file system, you can read more about it here. From the description of your problem, I think a blob trigger or input binding should be sufficient for you.
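A minimal blob-trigger sketch in Java might look like the following (the container name and connection setting are assumptions; PDDocument.load(byte[]) is the PDFBox 2.x API):

import com.microsoft.azure.functions.ExecutionContext;
import com.microsoft.azure.functions.annotation.BlobTrigger;
import com.microsoft.azure.functions.annotation.FunctionName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfExtractor {
    // Fires whenever a blob lands in the "pdfs" container; the blob's bytes
    // arrive in memory, so no temporary file is needed.
    @FunctionName("ExtractPdfText")
    public void run(
            @BlobTrigger(name = "content", path = "pdfs/{name}", dataType = "binary",
                         connection = "AzureWebJobsStorage") byte[] content,
            ExecutionContext context) throws Exception {
        try (PDDocument doc = PDDocument.load(content)) {
            String text = new PDFTextStripper().getText(doc);
            context.getLogger().info("Extracted " + text.length() + " characters");
        }
    }
}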

Putting file to S3 right after it's created

I have two machines with different Java applications that both run on Linux and use a common Windows share folder. One app is triggering another to generate a specific file (e.g. image/pdf). Then the first app tries to upload the generated file to S3. The problem is I sometimes get this:
com.amazonaws.services.s3.model.AmazonS3Exception: The Content-MD5 you specified did not match what we received.
OR this:
com.amazonaws.AmazonClientException: Data read has a different length than the expected: dataLength=247898; expectedLength=262062; includeSkipped=false; in.getClass()=class com.amazonaws.internal.ResettableInputStream; markedSupported=true; marked=0; resetSinceLastMarked=false; markCount=1; resetCount=0
All the processes happen synchronously, one after another (I have also checked the logs, which show no concurrent activity). Also, I am not setting the MD5 hash or the content length myself; the AWS SDK handles that by itself.
So my guess is that the generating application has written the file and returned, but the OS is in fact still flushing it in the background, which is why the first app gets an incomplete file.
I would really appreciate suggestions on how to handle such situations. Maybe there is a way to detect if the file is not currently being modified by the OS?
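Something like this crude size-polling heuristic is the kind of thing I have in mind, though I'm not sure it's the right approach (the interval and retry count are arbitrary):

import java.io.File;

// Heuristic: treat the file as finished once its size stops changing.
// A quiet period does not strictly guarantee the writer is done, but it
// should catch the slow-background-flush case described above.
static boolean waitUntilStable(File file, int maxChecks) throws InterruptedException {
    long lastLength = -1;
    for (int i = 0; i < maxChecks; i++) {
        long length = file.length();
        if (length > 0 && length == lastLength) {
            return true; // unchanged across one full interval
        }
        lastLength = length;
        Thread.sleep(500);
    }
    return false; // never settled within maxChecks intervals
}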
I was experiencing AmazonS3Exception: The Content-MD5 you specified did not match what we received. I finally solved it by addressing the first item on the list below, which was not terribly obvious.
Possible Solutions For Anyone Else:
Make sure not to use the same ObjectMetadata object across multiple putObject calls (see the sketch after this list).
Consider disabling chunked encoding: client.setS3ClientOptions(S3ClientOptions.builder().disableChunkedEncoding().build())
Make sure the file isn't being edited while it's being uploaded.
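Here is what the first item looks like in practice (a sketch against the v1 AWS SDK; the bucket name and key are placeholders):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;
import java.io.File;

// Reusing one ObjectMetadata across putObject calls lets the SDK carry
// stale Content-MD5/Content-Length values into the next upload, which
// produces exactly the mismatch errors above. Create a fresh instance per call.
static void upload(AmazonS3 client, File file) {
    ObjectMetadata metadata = new ObjectMetadata(); // fresh per call, never shared
    client.putObject(new PutObjectRequest("my-bucket", file.getName(), file)
            .withMetadata(metadata));
}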

How to append data to existing Dropbox file?

I'm accessing Dropbox API to upload and download files using Java. Now, I need to create a function which can append data to existing Dropbox file.
I have working code which first downloads the file and then uploads it with the text appended. However, is there a better way to do this? My current code is inefficient.
Thanks in advance. :)
Conventionally there is no support for direct file editing in Dropbox, so what you are looking for is not supported by the existing Dropbox APIs; quite possibly what you are doing currently,
first downloads a file and then uploads it with the text appended
is the best (and the only) way of modifying files in the Dropbox cloud.
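With the current Dropbox Java SDK (v2) that round trip looks roughly like this (a sketch; the client setup and path are placeholders):

import com.dropbox.core.v2.DbxClientV2;
import com.dropbox.core.v2.files.WriteMode;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Appends text to a Dropbox file by downloading the current contents,
// concatenating locally, and re-uploading with overwrite mode.
static void appendToFile(DbxClientV2 client, String path, String text) throws Exception {
    ByteArrayOutputStream contents = new ByteArrayOutputStream();
    client.files().download(path).download(contents);       // fetch existing bytes
    contents.write(text.getBytes(StandardCharsets.UTF_8));  // append locally
    client.files().uploadBuilder(path)
            .withMode(WriteMode.OVERWRITE)
            .uploadAndFinish(new ByteArrayInputStream(contents.toByteArray()));
}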
But apart from this, Dropbox does support a file revision mechanism, which can be used via /delta and /revisions:
A way of letting you keep up with changes to files and folders in a user's Dropbox. You can periodically call /delta to get a list of "delta entries", which are instructions on how to update your local state to match the server's state.
https://www.dropbox.com/developers-v1/core/docs#revisions
Best of luck :)

Grails App on Tomcat to Password Protect Files

I need to write a little Grails (or Java) app that will handle authentication (from our proprietary single sign-on system) and then, once authenticated, allow a user to download files. This is straightforward if I simply include the files in the WAR file of the application; however, I'd like to avoid that, since there will be multiple files and I'd rather not have to upload a new WAR every time we add a file. Is it possible to have the application in a WAR file but the downloadable files outside it? If so, how do I configure this kind of setup? We'll be running this on Tomcat.
Yes, this is possible. Without knowing all your requirements, or what you have tried and why it didn't work for you, the best I can do is give you a general idea of how to accomplish this.
Have a controller that takes the ID of the file that you want the user to download. Based on this ID, look up the associated domain instance; the domain should store the file name. Then use this file name to resolve the file on the local file system (with the path configured in your application configuration). Open the file and stream its contents to the browser, making sure to set the headers correctly to indicate the file name and size.
There are a lot of moving parts involved here but it can be done. Now, if you get stuck on something I suggest you post what you have tried and what's not working about it. Otherwise, the best we/I can do is give you general guidance/advice.
Hope this helps!
Edit
The real key is going to be in the controller for downloading the files. Here is a quick snippet of what that may look like:
String fileName = "something.zip" // should come from your domain instance
String filePathAndName = "/downloads/${fileName}" // should come from your configuration

File file = new File(filePathAndName)
response.setContentType("application/octet-stream")
response.setHeader("Content-disposition", "attachment;filename=${fileName}")
response.setContentLength(file.length() as int) // lets the browser report download progress
// IOUtils buffers/streams the file in 8k chunks instead of reading it all into memory
file.withInputStream { input ->
    org.apache.commons.io.IOUtils.copy(input, response.outputStream)
}
response.outputStream.flush()
response.outputStream.close()

Mule: How to track non-deletable & non-movable files

I have a directory with files that cannot be removed, because they are used by other applications or have read-only properties. This means that I can't move or delete the files the way Mule normally does as its natural file-tracking mechanism. In order to process these files through Mule when they arrive, or when they get updated, without deleting or moving them from the original directory, I need some sort of custom tracking. To do this I think I need to add some rules and be able to track files that are:
New files
Processed files
Updated files
For this, I thought of keeping a log file in the same directory that would track each file by name and date modified, but I'm not sure this is the right way to do it. I would need to read and write this log file and compare its contents with the current files in the directory in order to determine which files are new or updated. This seems overly complicated and requires quite a bit of extra programming (maybe as Groovy scripts or by overriding some methods).
Is there any simpler way to do this in Mule? If not, how should I start tackling this problem? I'm guessing I could write some Java to talk to the File endpoint.
As Victor Romero pointed out, an idempotent filter does the trick. I tried two types of idempotent filter to see which one works best: Idempotent Message Filter and Idempotent Secure Hash Message Filter. Both of them did the job; however, I ended up using the Idempotent Message Filter (no hash) to log the timestamp and filename in the simple-text-file-store.
Just after the File inbound-endpoint:
<idempotent-message-filter idExpression="#[message.inboundProperties.originalFilename+'-'+message.inboundProperties.timestamp]" storePrefix="prefix" doc:name="Idempotent Message">
    <simple-text-file-store name="uniqueProcessedMessages" directory="C:\yourDirectory"/>
</idempotent-message-filter>
Only files that are new or modified (for the purposes of my process) pass through. However, the Idempotent Secure Hash Message Filter should do a better job of identifying files whose contents have changed.
