Processing huge files using AWS S3 and Lambda - java

I am trying to decrypt files that arrive periodically in our S3 bucket. How can I process a file if it is huge (e.g. 10 GB), given that the computing resources of Lambda are limited? I'm not sure whether it is necessary to download the whole file into Lambda and perform the decryption there, or whether there is some other way to chunk the file and process it.
Edit: Processing the file here includes decrypting it and then parsing each row and writing it to a persistent store such as a queue or SQL database.

You can set the byte-range in the GetObjectRequest to load a specific range of bytes from an S3 object.
The following example comes from the official AWS documentation for the S3 GetObject API:
// Get a range of bytes from an object and print the bytes.
GetObjectRequest rangeObjectRequest = new GetObjectRequest(bucketName, key).withRange(0, 9);
S3Object objectPortion = s3Client.getObject(rangeObjectRequest);
System.out.println("Printing bytes retrieved.");
displayTextInputStream(objectPortion.getObjectContent());
For more information, you can visit the documentation here:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/download-objects.html
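Building on the same idea, here is a minimal sketch of walking a large object range by range with the v1 SDK. The bucket name, key, chunk size, and process() method are placeholders, and whether an individual slice can be decrypted on its own depends on the encryption format, so treat this as the download side only:
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;

public class RangedS3Reader {

    private static final long CHUNK_SIZE = 8 * 1024 * 1024; // 8 MB per request (arbitrary)

    public static void main(String[] args) throws Exception {
        String bucketName = "my-bucket";       // placeholder
        String key = "incoming/huge-file.enc"; // placeholder

        AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();

        // Find out how big the object is without downloading it.
        long objectSize = s3Client.getObjectMetadata(bucketName, key).getContentLength();

        for (long start = 0; start < objectSize; start += CHUNK_SIZE) {
            long end = Math.min(start + CHUNK_SIZE, objectSize) - 1; // byte range is inclusive

            GetObjectRequest rangeRequest = new GetObjectRequest(bucketName, key)
                    .withRange(start, end);

            try (S3Object portion = s3Client.getObject(rangeRequest)) {
                // Read just this slice (Java 9+ readAllBytes used for brevity).
                byte[] chunk = portion.getObjectContent().readAllBytes();
                process(chunk);
            }
        }
    }

    private static void process(byte[] chunk) {
        // placeholder for the decrypt-and-parse logic
    }
}
Each iteration downloads at most CHUNK_SIZE bytes, so the function never needs enough memory or ephemeral storage for the whole 10 GB object.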

Related

Retrieving only metadata information from object stored in s3 without reading the object payload

I am storing an object, which consists of a stream and metadata, in S3 using the AWS Java SDK v2.
The metadata is a map of values extracted from the object received from the UI.
My code looks like this:
response=s3Client.putObject(PutObjectRequest.builder().bucket(bucket).key(key).metadata(metadata(media)).build(),
RequestBody.fromBytes(readAsBytesFromStream(media)));
I want to retrieve only the meta information from the object saved and not read the object's payload.
The use case is that I only need to read the meta info to render it on the UI, without making S3 read the object's content.
Is there any way I can read only the meta info and not the content of the saved object? Reading multiple objects' content (payload + metadata) and then rendering would be slow.
Alternatively, is there some other way to store the metadata and payload separately so that reading the metadata becomes efficient?
You should be able to use the headObject method.
Something like:
response = s3Client.headObject(HeadObjectRequest.builder().bucket(bucket).key(key).build());
metadata = response.metadata();
SDK documentation:
https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3Client.html#headObject-java.util.function.Consumer-
https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/model/HeadObjectResponse.html
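For completeness, here is a self-contained sketch of the same call with the v2 SDK; the bucket and key values are placeholders. A HEAD request returns only headers and user metadata, so none of the payload is transferred, which addresses the rendering-speed concern:
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.HeadObjectRequest;
import software.amazon.awssdk.services.s3.model.HeadObjectResponse;

import java.util.Map;

public class MetadataOnlyReader {

    public static void main(String[] args) {
        String bucket = "my-bucket";  // placeholder
        String key = "media/example"; // placeholder

        try (S3Client s3Client = S3Client.create()) {
            // HEAD request: returns headers and user-defined metadata, never the body.
            HeadObjectResponse response = s3Client.headObject(
                    HeadObjectRequest.builder()
                            .bucket(bucket)
                            .key(key)
                            .build());

            Map<String, String> metadata = response.metadata(); // the map passed to putObject
            long size = response.contentLength();               // standard headers are also available

            System.out.println("metadata = " + metadata + ", size = " + size);
        }
    }
}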

Does AWS Lambda/Firehose support Base64 URL decoding?

My pipeline is as follows:
Firehose -> Lambda (AWS' Java SDK) -> (S3 & Redshift)
An un-encoded (raw) JSON record is submitted to Firehose. It then triggers a Lambda function which transforms it slightly. Firehose then puts the transformed record into an S3 bucket and into Redshift.
For Firehose to add the transformed data to S3, it requires that the data be Base64 encoded (and Firehose decodes it before adding it to S3).
However, I have a URL within the data in which, once decoded, the = characters are replaced with their Unicode-escaped equivalent (\u003d), because = is the character that Amazon's Base64 decoder uses as padding.
https://www.[snipped].com/...?returnurl\u003dnull\u0026referrer\u003dnull
How can I retain those = characters within the decoded data?
Note: I've tried using Base64.getUrlEncoder(), but AWS only seems to support Base64.getEncoder().
It turns out that HTML escaping was enabled on the JSON library (Gson) that I was using when (de)serializing my Lambda record. To fix it, I just had to disable HTML escaping:
new GsonBuilder().disableHtmlEscaping().create();
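To make the effect visible, here is a small sketch comparing a default Gson instance with one that has HTML escaping disabled; the Record class and URL are made-up placeholders:
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

public class GsonEscapingDemo {

    static class Record {
        String url = "https://example.com/page?returnurl=null&referrer=null"; // placeholder
    }

    public static void main(String[] args) {
        Gson defaultGson = new Gson();
        Gson noEscapingGson = new GsonBuilder().disableHtmlEscaping().create();

        // The default instance HTML-escapes '=' and '&' as \u003d and \u0026 ...
        System.out.println(defaultGson.toJson(new Record()));
        // ... while the instance with escaping disabled keeps the URL intact.
        System.out.println(noEscapingGson.toJson(new Record()));
    }
}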

Not able to read a large JSON file from Amazon S3 into a Jackson ArrayNode in memory

I am trying to read a very large JSON file which is stored on Amazon S3; it contains around 30,000 records and is about 100 MB in size.
I am trying to read all the records into a Jackson ArrayNode in Java, but I am not able to fit them all in memory. It reads up to 3,619 records and then my server restarts. I cannot find any trace of this problem in the logs either.
Can anyone help me out with this?
Thanks.
Please download the file from S3 using the S3 console, and then modify this Perl script as appropriate to see whether the data you have downloaded is actually valid JSON:
use warnings FATAL => qw(all);
use strict;
use Carp;
use Data::Dump qw(dump);
use JSON;
my $File = "...."; # File name containing downloaded JSON
open(my $f, "<$File") or die "Cannot open file $File for input";
local $/; # Enable localized slurp mode
my $j = <$f>;
my $d = decode_json $j;
print dump($d);
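If the file does turn out to be valid JSON, the memory problem itself can be side-stepped by not building one big ArrayNode. Here is a sketch of that idea using Jackson's streaming API; the assumption that the top level is a JSON array and the handle() callback are mine, not from the question:
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.InputStream;

public class StreamingJsonReader {

    public static void readRecords(InputStream jsonStream) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        JsonFactory factory = mapper.getFactory();

        try (JsonParser parser = factory.createParser(jsonStream)) {
            // Expect the top-level token to open the array of records.
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalStateException("Expected a JSON array at the top level");
            }
            // Read one record at a time instead of materializing the whole array.
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                JsonNode record = mapper.readTree(parser);
                handle(record); // process and discard; memory use stays per-record
            }
        }
    }

    private static void handle(JsonNode record) {
        // placeholder for per-record processing
    }
}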

The Process to Store & Retrieve an Image in a Database

I have never saved and retrieved an image to and from the database before. I wrote down what I guessed would be the process. I would just like to know if this is correct though:
Save image:
Select & Upload image file from jsp (Struts 2) which will save it as a .tmp file.
Convert the .tmp file to a byte[] array (Java Server-Side)
Store the byte[] array as a blob in the database (Java Server-Side)
Get image:
Get the byte[] array from the database (Java Server-Side)
Convert the byte[] array to an image file (Java Server-Side)
Create the file in a location (Java Server-Side)
Use an img tag to display the file (JSP Client-Side)
Delete the file after it's finished being used? (Java Server-Side)
I'm aware of the fact that it is highly recommended to not save & retrieve images to and from the database. I would like to know how to do it anyway.
Thanks
Almost correct.
It's expensive and not so great to create the file on the fly and then delete it.
Yes, you store it as the raw bytes in the database, but the way to retrieve it and display it to a client machine is to implement a web handler that sets the content-type of the response to the appropriate MIME type and then dumps the bytes out to the response stream.
Yes, you've got it right.
Save image:
How you save the image depends very much on how it will be used. One option is to save the file on the file system and store its location as metadata in a database table.
Get image:
You do not have to write the file data to any temporary location; it can be rendered directly from the database. Just send a request from the client and handle it in a specially designed servlet. The servlet reads the file metadata and the corresponding file and, if successful, writes the file back to the response stream.
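Both answers come down to the same idea: a servlet fetches the bytes from the database and writes them straight to the response with the right content type, so no temporary file is ever created. A minimal sketch, assuming the javax.servlet API; the ImageDao stub, URL pattern, and hard-coded image/jpeg type are placeholders rather than a definitive implementation:
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.io.OutputStream;

@WebServlet("/images/*")
public class ImageServlet extends HttpServlet {

    private final ImageDao imageDao = new ImageDao(); // hypothetical DAO over the BLOB column

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException {
        String imageId = request.getPathInfo().substring(1); // e.g. /images/42 -> "42"
        byte[] bytes = imageDao.findImageBytes(imageId);

        if (bytes == null) {
            response.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }

        // Set the MIME type and dump the raw bytes to the response stream.
        response.setContentType("image/jpeg"); // or look the type up from stored metadata
        response.setContentLength(bytes.length);
        try (OutputStream out = response.getOutputStream()) {
            out.write(bytes);
        }
    }

    // Hypothetical data-access stub; in practice this would run something like
    // "SELECT image_data FROM images WHERE id = ?" and return the BLOB bytes.
    static class ImageDao {
        byte[] findImageBytes(String id) {
            return null; // placeholder
        }
    }
}
The JSP then only needs an img tag pointing at the servlet, e.g. <img src="images/42"/>.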

mapping byte[] in Hibernate and adding file chunk by chunk

I have a web service which receives a 100 MB video file in chunks:
public void addFileChunk(Long fileId, byte[] buffer)
How can I store this file in Postgresql database using hibernate?
Using regular JDBC this is very straightforward. I would use the following code inside my web service method:
LargeObject largeObject = largeObjectManager.open(fileId, LargeObjectManager.READWRITE);
int size = largeObject.size();
largeObject.seek(size);
largeObject.write(buffer);
largeObject.close();
How can I achieve the same functionality using Hibernate, storing the file chunk by chunk?
Storing each file chunk in a separate row as bytea does not seem like a smart idea to me. Please advise.
It's not advisable to store 100 MB files in the database. I would instead store them in the filesystem; since transactions are involved, something along the following lines, with a Servlet serving the result, seems reasonable:
Process the HTTP request so that the received file is stored in some temporary location.
Open a transaction, persist the file metadata including the temporary location, and close the transaction.
Using some external process that monitors the temporary files, transfer the file to its final destination, from which it will be available to the user through a Servlet.
see http://in.relation.to/Bloggers/PostgreSQLAndBLOBs
Yeah, byteas would be bad. Hibernate has a way to continue using large objects, and you get to keep the streaming interface.
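One way to act on that hint, sketched under assumptions: keep PostgreSQL's large-object API but reach the underlying JDBC connection through Hibernate's Session.doWork(). The unwrap step may differ behind a connection pool, and large-object calls must run inside an active transaction:
import org.hibernate.Session;
import org.postgresql.PGConnection;
import org.postgresql.largeobject.LargeObject;
import org.postgresql.largeobject.LargeObjectManager;

public class VideoChunkWriter {

    private final Session session; // an open Hibernate session with an active transaction

    public VideoChunkWriter(Session session) {
        this.session = session;
    }

    // Append one chunk to the PostgreSQL large object identified by fileId (its OID).
    public void addFileChunk(long fileId, byte[] buffer) {
        session.doWork(connection -> {
            // Borrow the JDBC connection Hibernate is using and get the large-object API.
            LargeObjectManager lom = connection.unwrap(PGConnection.class).getLargeObjectAPI();

            LargeObject largeObject = lom.open(fileId, LargeObjectManager.READWRITE);
            try {
                largeObject.seek(largeObject.size()); // jump to the current end
                largeObject.write(buffer);            // append this chunk
            } finally {
                largeObject.close();
            }
        });
    }
}
The file's metadata row would then only need to store the OID, which keeps each chunk write on the streaming interface instead of one bytea row per chunk.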
