Flushing content to AWS S3 with Java

I have a Java application that writes content to a file in an AWS S3 bucket.
The writer is created with this code:
SequenceWriter getBufferedWriter(final ObjectWriter newWriter) throws IOException {
    var key = "myFile.csv";
    manager = new StreamTransferManager(bucket, key, client.getClient())
            .numStreams(1)
            .numUploadThreads(1)
            .queueCapacity(1)
            .partSize(PART_SIZE_MB);
    outputStream = manager.getMultiPartOutputStreams().get(0);
    return newWriter.writeValues(outputStream);
}
Then I write values with
writer.writeValue(myData);
The application works fine, and when it is finished, the data is in the S3 file. However, I'd like to have the content written (and flushed) while the application is running, so if for any reason the application crashes, I still get partial content in the file.
I'd actually like to "flush" it programmatically, so that when a certain event occurs in my application, I can force the flush.
I've tried using writer.flush() but it didn't achieve what I wanted.
How can I force the content to be written to S3?

Related

Most practical way to read an Azure Blob (PDF) in the Cloud?

I'm somewhat of a beginner and have never dealt with cloud-based solutions before.
My program uses the PDFBox library to extract data from PDFs and rename the file based on the data. It's all local currently, but eventually will need to be deployed as an Azure Function. The PDFs will be stored in an Azure Blob Container - the Azure Blob Storage trigger for Azure Functions is an important reason for this choice.
Of course I can download the blob locally and read it, but the program should run solely in the cloud. I've tried reading the blobs directly using Java, but this resulted in gibberish data and wasn't compatible with PDFBox. My plan for now is to temporarily store the files elsewhere in the cloud (e.g. OneDrive, Azure File Storage) and try opening them from there. However, this seems like it can quickly turn into an overly messy solution. My questions:
(1) Is there any way a blob can be opened as a File, rather than a CloudBlockBlob so this additional step isn't needed?
(2) If not, what would be a recommended temporary storage in this case?
(3) Are there any alternative ways to approach this issue?
Since you are planning an Azure Function, you can use the blob trigger/binding to get the bytes directly. Then you can use PDFBox's PDDocument.load method to build the document object directly: PDDocument.load(content). You won't need any temporary storage to load the file.
@FunctionName("blobprocessor")
public void run(
        @BlobTrigger(name = "file",
                dataType = "binary",
                path = "myblob/{name}",
                connection = "MyStorageAccountAppSetting") byte[] content,
        @BindingName("name") String filename,
        final ExecutionContext context
) {
    context.getLogger().info("Name: " + filename + " Size: " + content.length + " bytes");
    try (PDDocument doc = PDDocument.load(content)) {
        // do your stuff with the loaded document
    } catch (IOException e) {
        context.getLogger().severe("Could not load PDF: " + e.getMessage());
    }
}

Read and Append data to the File from a Blob URL path before download

This is my first hands-on project using Java Spring Boot, as I have mostly used C#. I have a requirement to read a file from a blob URL path and append some string data (like a key) to the same file in the stream before my API downloads the file.
Here are the ways that I have tried to do it:
FileOutputStream/InputStream: This throws a FileNotFoundException as it is not able to resolve the blob path.
URLConnection: This got me somewhere and I was able to download the file successfully, but when I tried to write/append some value to the file before downloading it, I failed.
Here is the code I have been using:
//EXTERNAL_FILE_PATH is the azure storage path ending with for e.g. *.txt
URL urlPath = new URL(EXTERNAL_FILE_PATH);
URLConnection connection = urlPath.openConnection();
connection.setDoOutput(true); //I am doing this as I need to append some data and the docs mention to set this flag to true.
OutputStreamWriter out = new OutputStreamWriter(connection.getOutputStream());
out.write("I have added this");
out.close();
//this is where the issue exists: an error is thrown saying it cannot read data because doOutput is set to true, so only write operations are allowed. I get a 405, Method Not Allowed...
inputStream = connection.getInputStream();
I am not sure if the framework allows me to modify a file at the URL path, read it simultaneously, and download the same file.
Please help me understand if there is a better way to do this.
From a logical point of view you are not appending data to the file from the URL. You need to create a new file, write some data to it, and after that append the content of the file from the URL. The algorithm could look like the steps below (a rough sketch follows them):
Create new File on the disk, maybe in TMP folder.
Write some data to the file.
Download file from the URL and append it to file on the disk.
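A minimal sketch of those steps, assuming plain java.nio and a readable blob URL (the class, method, and variable names here are illustrative, not from the original code):

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendFromUrl {

    // Builds a temp file containing the extra data followed by the blob's content
    public static Path buildFile(String externalFilePath, String extraData) throws IOException {
        // 1. Create a new file, e.g. in the temp folder
        Path target = Files.createTempFile("merged-", ".txt");

        // 2. Write your own data (e.g. the key) first
        Files.writeString(target, extraData + System.lineSeparator(), StandardOpenOption.APPEND);

        // 3. Download the content from the blob URL and append it
        try (InputStream in = new URL(externalFilePath).openStream()) {
            Files.write(target, in.readAllBytes(), StandardOpenOption.APPEND);
        }
        return target; // stream this file back in the API's download response
    }
}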
Some good articles from which you can start:
Download a File From an URL in Java
How to download and save a file from Internet using Java?
How to append text to an existing file in Java
How to write data with FileOutputStream without losing old data?

Disable caching of files on google cloud storage (flexible app engine java)

In Java on FLEXIBLE google app engine, how do you disable caching of files? I don't care if it's disabled on the entire bucket with gsutil, or individual files when I save them, or when they're read. (I just don't want anything cached, as files are frequently replaced and use the same filename).
My code to store files:
private static Storage storageService;

public static void uploadStream(
        String name, InputStream stream, String bucketName)
        throws IOException, GeneralSecurityException {
    Storage storage = StorageOptions.getDefaultInstance().getService();
    Blob blob = storage.create(BlobInfo.newBuilder(bucketName, name).build(), stream);
}
This code works flawlessly for uploading and replacing pdf files as intended. When the user views the pdf on the web page, if it was recently replaced, they see a cached copy. It takes an hour before the new version can be viewed on the web site.
I'm not sure if this is something where I need to edit the bucket, set no caching when saving the file in java, or set no caching when reading the file. My code for reading the file is:
public ByteArrayOutputStream downloadStream(String bucketName, String filePath)
        throws Exception {
    Storage storage = getService();
    byte[] bytes = storage.readAllBytes(bucketName, filePath);
    ByteArrayOutputStream baos = new ByteArrayOutputStream(bytes.length);
    baos.write(bytes, 0, bytes.length);
    return baos;
}
This is then returned via a web servlet.
Standard google app engine was tagged as well, as this is really a google cloud storage issue, and I'm not sure if the solution lies with gsutil or the cloud console, but note that the java code to access google cloud storage will differ between flexible and standard.
Objects are cacheable if they are publicly readable and the Cache-Control header allows caching, so you can disable caching by changing either or both of these things. See the gsutil documentation about setting the Cache-Control header, for example:
https://cloud.google.com/storage/docs/gsutil/addlhelp/WorkingWithObjectMetadata#cache-control
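For example, the upload code from the question could set the Cache-Control metadata when the object is created. This is a sketch using the same google-cloud-storage client; the exact directive value is an assumption, adjust it to your needs:

import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.io.InputStream;

public class GcsUploader {

    public static void uploadStreamNoCache(String name, InputStream stream, String bucketName) {
        Storage storage = StorageOptions.getDefaultInstance().getService();
        // "no-cache, max-age=0" tells browsers and edge caches to revalidate on
        // every request, so a replaced file becomes visible immediately.
        BlobInfo info = BlobInfo.newBuilder(bucketName, name)
                .setCacheControl("no-cache, max-age=0")
                .build();
        Blob blob = storage.create(info, stream);
    }
}

Existing objects can be patched in place with gsutil, e.g. gsutil setmeta -h "Cache-Control:no-cache, max-age=0" gs://your-bucket/your-file.pdf.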

Download file to stream instead of File

I'm implementing a helper class to handle transfers to and from AWS S3 storage in my web application.
In a first version of my class I was using an AmazonS3Client directly to handle upload and download, but now I have discovered TransferManager and I'd like to refactor my code to use it.
The problem is that my download method returns the stored file as a byte[]. TransferManager, however, only has methods that use a File as the download destination (for example download(GetObjectRequest getObjectRequest, File file)).
My previous code was like this:
GetObjectRequest getObjectRequest = new GetObjectRequest(bucket, key);
S3Object s3Object = amazonS3Client.getObject(getObjectRequest);
S3ObjectInputStream objectInputStream = s3Object.getObjectContent();
byte[] bytes = IOUtils.toByteArray(objectInputStream);
Is there a way to use TransferManager the same way or should I simply continue using an AmazonS3Client instance?
The TransferManager uses File objects to support things like file locking when downloading pieces in parallel. It's not possible to use an OutputStream directly. If your requirements are simple, like downloading small files from S3 one at a time, stick with getObject.
Otherwise, you can create a temporary file with File.createTempFile and read the contents into a byte array when the download is done.
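A rough sketch of that temp-file approach, assuming the AWS SDK v1 TransferManager is already set up elsewhere (the class, method, and variable names are illustrative):

import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.transfer.Download;
import com.amazonaws.services.s3.transfer.TransferManager;
import java.io.File;
import java.nio.file.Files;

public class S3DownloadHelper {

    public byte[] downloadToBytes(TransferManager transferManager, String bucket, String key)
            throws Exception {
        // TransferManager can only write to a File, so use a temporary one
        File tmp = File.createTempFile("s3-download-", ".tmp");
        try {
            Download download = transferManager.download(new GetObjectRequest(bucket, key), tmp);
            download.waitForCompletion(); // blocks until the (possibly parallel) download finishes
            return Files.readAllBytes(tmp.toPath());
        } finally {
            tmp.delete();
        }
    }
}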

File issues with threading in tomcat

I have a Tomcat server and a controller which writes the data coming in the request to a file. My doubt is whether multiple threads within the server can write to the same file at the same time and cause issues.
My requirement is that all requests append data to the same file. I am not using any threading on my end.
My code is as follows:
File file = new File(fileName);
try {
    if (!file.exists()) {
        file.createNewFile();
    }
    InputStream inputStream = request.getInputStream();
    FileWriter fileWriter = new FileWriter(fileName, true);
    BufferedWriter bufferWriter = new BufferedWriter(fileWriter);
    bufferWriter.write(IOUtils.toString(inputStream));
    bufferWriter.flush();
    bufferWriter.close();
} catch (IOException e) {
    // handle/log the exception
}
There is a standard solution for this kind of issue.
You have to create a singleton class which is shared between all threads.
This singleton will have a BlockingQueue (e.g. LinkedBlockingQueue) into which all threads put their messages for writing to the single file.
The singleton will itself also be a Thread, and inside its run() method it will constantly take values from the queue and write them sequentially to the file.
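A rough sketch of that pattern, with a single daemon thread draining a LinkedBlockingQueue (the class name and target file are illustrative):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public final class FileAppender implements Runnable {

    private static final FileAppender INSTANCE = new FileAppender("requests.log");

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final String fileName;

    private FileAppender(String fileName) {
        this.fileName = fileName;
        Thread writerThread = new Thread(this, "file-appender");
        writerThread.setDaemon(true);
        writerThread.start();
    }

    public static FileAppender getInstance() {
        return INSTANCE;
    }

    // Called from any request thread; only enqueues, never touches the file
    public void append(String line) {
        queue.offer(line);
    }

    @Override
    public void run() {
        // The single writer thread owns the file, so appends are serialized
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(fileName, true))) {
            while (true) {
                writer.write(queue.take());
                writer.newLine();
                writer.flush();
            }
        } catch (IOException | InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

The controller would then just call FileAppender.getInstance().append(IOUtils.toString(request.getInputStream())); instead of opening the file itself.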
My requirement is that all requests appends data to the same file
Doing a task for each request (like logging or, in your case, appending text to a file) is best implemented using a filter (javax.servlet.Filter). You then don't have to create a singleton manually, and you can turn the filter on or off depending on whether you need its functionality.
However, you still need to synchronize concurrent access to your file. As Andremoniy pointed out, you can do this using a separate Thread, so that your filter does not block the request/response.
EDIT
One thing about the shared object used to write to the file: it is better to store an instance of this object in the javax.servlet.ServletContext rather than creating a singleton object. This is the standard way to go if you need an object to be accessible by all other components in a Java web application using servlets.
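For example, the shared appender could be put into the ServletContext once at startup and then looked up wherever it is needed. This is a sketch; FileAppender is the hypothetical queue-backed writer from the previous answer:

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

@WebListener
public class AppenderBootstrap implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // One shared instance for the whole web application
        sce.getServletContext().setAttribute("fileAppender", FileAppender.getInstance());
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        // nothing to clean up in this sketch
    }
}

A filter (or the controller) can then fetch it with (FileAppender) request.getServletContext().getAttribute("fileAppender") instead of holding its own reference.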
