I was asked to write code to send a .csv file to S3 using Amazon Kinesis Firehose. But as someone who has never used Kinesis, I have no idea how to do this. Can you help with this? If you have code that does this job, that would also help (Java or Scala).
The CSV data should be sent to Kinesis Firehose by a Firehose client application, to be written to an S3 bucket in gzip format.
Thanks in advance.
Firstly, Firehose is streaming: it sends a record (or records) to a destination. It is not a file transfer such as copying a CSV file to S3. You can use the S3 CLI commands if you just need to copy files from somewhere to S3.
So please first make sure whether what you need is streaming or a file copy. If it is not streaming, then I wonder why Firehose.
There are multiple input sources you can use, so it is better to first decide which way to go.
If you use Java with the AWS SDK, then the PutRecord API call would probably be the way:
Writing to Kinesis Data Firehose Using the AWS SDK
aws-sdk-java/src/samples/AmazonKinesisFirehose/
Put data to Amazon Kinesis Firehose delivery stream using Spring Boot
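To make the SDK approach concrete, here is a minimal sketch using the AWS SDK for Java (v1). The delivery stream name `my-delivery-stream` and the input file `data.csv` are hypothetical; note that the gzip compression and the destination bucket are configured on the delivery stream itself, not in the client code.

```java
import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehose;
import com.amazonaws.services.kinesisfirehose.AmazonKinesisFirehoseClientBuilder;
import com.amazonaws.services.kinesisfirehose.model.PutRecordRequest;
import com.amazonaws.services.kinesisfirehose.model.Record;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CsvToFirehose {
    public static void main(String[] args) throws Exception {
        AmazonKinesisFirehose firehose = AmazonKinesisFirehoseClientBuilder.defaultClient();

        // One record per CSV line; keep the newline so the records
        // concatenate cleanly in the S3 object Firehose produces.
        for (String line : Files.readAllLines(Paths.get("data.csv"), StandardCharsets.UTF_8)) {
            Record record = new Record()
                    .withData(ByteBuffer.wrap((line + "\n").getBytes(StandardCharsets.UTF_8)));
            firehose.putRecord(new PutRecordRequest()
                    .withDeliveryStreamName("my-delivery-stream") // hypothetical stream name
                    .withRecord(record));
        }
    }
}
```

For larger files, PutRecordBatch (up to 500 records per call) is more efficient than one PutRecord per line.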
If you can use Amazon Linux to send the data to Firehose, the Kinesis Agent will be easier. It just monitors a file and sends the deltas to Firehose (and on to S3).
I need to upload a file using a web form to AWS and then trigger a function to import it into a Postgres DB. I have the file import to a DB working locally using Java, but need it to work in the cloud
It needs a file upload with some settings (such as which table to import into) to be passed to a Java function which imports it into the Postgres DB.
I can upload files to an EC2 instance with PHP, but then need to trigger a Lambda function on that file. My research suggests S3 buckets are perhaps a better solution? Looking for some pointers as to which services would be best suited.
There are two main steps in your scenario:
Step 1: Upload a file to Amazon S3
It is simple to create an HTML form that uploads data directly to an Amazon S3 bucket.
However, it is typically unwise to allow anyone on the Internet to use the form, since they might upload any number and type of files. Typically, you will want your back-end to confirm that the user is entitled to upload the file. Your back-end can then generate a presigned URL (see Upload objects using presigned URLs - Amazon Simple Storage Service), which authorizes the user to perform the upload.
For some examples in various coding languages, see:
Direct uploads to AWS S3 from the browser (crazy performance boost)
File Uploads Directly to S3 From the Browser
Amazon S3 direct file upload from client browser - private key disclosure
Uploading to Amazon S3 directly from a web or mobile application | AWS Compute Blog
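As a concrete sketch of step 1 (assuming the AWS SDK for Java v1; the bucket and key names are hypothetical), the back-end can generate a time-limited PUT URL like this:

```java
import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;

import java.net.URL;
import java.util.Date;

public class PresignUpload {

    // Returns a URL the browser can PUT the file to; only the back-end
    // holds AWS credentials, the user just receives this time-limited URL.
    public static URL presignedPutUrl(AmazonS3 s3, String bucket, String key) {
        Date expiry = new Date(System.currentTimeMillis() + 10 * 60 * 1000); // valid 10 minutes
        return s3.generatePresignedUrl(new GeneratePresignedUrlRequest(bucket, key)
                .withMethod(HttpMethod.PUT)
                .withExpiration(expiry));
    }

    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        System.out.println(presignedPutUrl(s3, "my-bucket", "uploads/data.csv"));
    }
}
```

The HTML form (or JavaScript) then PUTs the file directly to the returned URL, without passing it through your server.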
Step 2: Load the data into the database
When the object is created in the Amazon S3 bucket, you can configure S3 to trigger an AWS Lambda function, which can be written in the programming language of your choice.
The Bucket and Filename (Key) of the object will be passed into the Lambda function via the event parameter. The Lambda function can then:
Read the object from S3
Connect to the database
Insert the data into the desired table
It is your job to code this functionality but you will find many examples on the Internet.
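A minimal sketch of such a Lambda handler, assuming the aws-lambda-java-events and S3 SDK libraries (the class name is a placeholder, and the database step is indicated only in comments since the connection details depend on your setup):

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class CsvImportHandler implements RequestHandler<S3Event, String> {

    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    @Override
    public String handleRequest(S3Event event, Context context) {
        // The event lists every object creation that triggered this invocation.
        event.getRecords().forEach(rec -> {
            String bucket = rec.getS3().getBucket().getName();
            String key = rec.getS3().getObject().getKey();
            String body = s3.getObjectAsString(bucket, key);
            // Parse `body` as CSV and insert it into Postgres here,
            // e.g. with JDBC: DriverManager.getConnection(jdbcUrl, user, pass)
        });
        return "done";
    }
}
```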
You can use the AWS SDK in your language of choice to invoke Lambda.
Please refer to this documentation.
Hi, I'm trying to fetch files from AWS S3 to EC2, zip them, and then upload the zip back to S3,
all via AWS internal communication.
To achieve this I have set up a VPC, and both S3 and EC2 are in the same region.
I'm able to fetch files from S3 to EC2 with the AWS CLI, but I don't know how to achieve the same in Java.
I need help with this.
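One way to sketch the Java side: the zipping itself is plain java.util.zip, and the S3 transfers around it are ordinary SDK calls (`s3.getObject(bucket, key).getObjectContent()` to read each object, `s3.putObject(...)` to write the finished archive back). A self-contained sketch of the zipping step, with the SDK calls indicated only in comments:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipS3Objects {

    // Zips a set of named streams. With the AWS SDK, each InputStream would be
    // s3.getObject(bucket, key).getObjectContent(), and the finished archive
    // would go back with s3.putObject(bucket, "archive.zip", zipFile).
    public static void zip(Map<String, InputStream> entries, OutputStream out) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Map.Entry<String, InputStream> e : entries.entrySet()) {
                zos.putNextEntry(new ZipEntry(e.getKey()));   // one zip entry per S3 key
                e.getValue().transferTo(zos);                 // stream body straight into the zip
                zos.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, InputStream> entries = new LinkedHashMap<>();
        entries.put("a.txt", new ByteArrayInputStream("hello".getBytes()));
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        zip(entries, buf);
        System.out.println("zip bytes: " + buf.size());
    }
}
```

Since your EC2 instance and the bucket share a region, adding an S3 gateway VPC endpoint keeps this traffic on the AWS network, which matches your internal-communication requirement.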
I need to implement an AWS backend API that allows the users of my mobile app to upload a file (image) to Amazon S3.
Creating an API interfaced directly with Amazon S3 is not an option because I will not be able to correlate the uploaded file with the user's record in DynamoDB.
I've thought of creating a Lambda function (Java), triggered by an API, that performs the following steps:
1) calls the Amazon S3 functionality to upload the file
2) writes a record into my DynamoDB with a reference to the file
Is there a way to provide a binary file as input to my Lambda function exposed as an API?
Please let me know. Thank you!
davide
The best way to do this is with presigned URLs. You can generate a URL that will let the user upload files directly to S3 with specific name and type. This way you don't have to worry about big files slowing down your server, lambda limits, or double charges for bandwidth. It's also faster for the user in most cases and supports S3 transfer acceleration.
The process can look something like:
User requests link from your server
Your server writes an entry in DynamoDB and returns a presigned URL
User uploads file directly to S3 using presigned URL (with exact name of your server's choice)
Once upload is done you either get a notification using Lambda, or just have the user tell your server the upload is done
Your server performs any required post-processing and marks the file as ready
And to answer your actual question, yes, there is a way to pass binary data to Lambda functions. The link is a step-by-step tutorial, but basically in API Gateway you have to set "Request body passthrough" to "When there are no templates defined (recommended)" and fill in your expected content types. Your mapping should include "base64data": "$input.body", and you need to set up your types under "Binary Support". In your actual Lambda function, you will have access to the data as "base64data".
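To illustrate that mapping, the function body then just base64-decodes what API Gateway hands it. A minimal sketch (the class and method names are placeholders; in a real Lambda this would live inside your handler, followed by the s3.putObject and DynamoDB calls):

```java
import java.util.Base64;

public class BinaryUpload {

    // With the mapping template above, API Gateway delivers the request
    // body to the function as a base64 string under "base64data".
    public static byte[] decodeUpload(String base64data) {
        return Base64.getDecoder().decode(base64data);
    }

    public static void main(String[] args) {
        // Simulate what API Gateway would send for a small binary payload.
        String base64data = Base64.getEncoder().encodeToString("fake image bytes".getBytes());
        byte[] file = decodeUpload(base64data);
        // `file` now holds the raw upload, ready for s3.putObject(...)
        // followed by the DynamoDB write that records its key.
        System.out.println(file.length + " bytes decoded");
    }
}
```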
I'd like to upload images to S3 via CloudFront.
If you look at the CloudFront documentation, you can find that CloudFront supports the PUT method for uploading through it.
Someone might ask why I want to use CloudFront for uploading to S3; if you search for it, you can find the reasons.
What I want to ask is whether there is a method in the SDK for uploading via CloudFront or not.
As you know, there is a putObject method for uploading directly to S3, but I can't find one for uploading via CloudFront.
Please help.
Data can be sent through Amazon CloudFront to the back-end "origin". This is typically used when POSTing web forms, to send information back to web servers, but it can also be used to POST data to Amazon S3.
If you would rather use an SDK to upload data to Amazon S3, there is no benefit in sending it "via CloudFront". Instead, use the Amazon S3 APIs to upload the data directly to S3.
So, bottom line:
If you're uploading from a web page that was initially served via CloudFront, send it through CloudFront to S3
If you're calling an API, call S3 directly
If the bucket's region is far from the uploading computer, you can upload faster by enabling S3 Transfer Acceleration, which uploads through the Amazon edge location closest to you and then forwards the file to the bucket's actual region over an optimized route.
Have a look here.
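For completeness, a sketch of an accelerated client with the AWS SDK for Java v1 (the bucket, key, and region are hypothetical, and acceleration must already be enabled on the bucket):

```java
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

import java.io.File;

public class AcceleratedUpload {
    public static void main(String[] args) {
        // Acceleration is first turned on for the bucket (console, or
        // s3.setBucketAccelerateConfiguration); the client then opts in
        // and requests are routed through the nearest edge location.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.EU_WEST_1)        // the bucket's region (hypothetical)
                .withAccelerateModeEnabled(true)
                .build();
        s3.putObject("my-bucket", "big-file.bin", new File("big-file.bin"));
    }
}
```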
I am using the java aws sdk to transfer large files to s3. Currently I am using the upload method of the TransferManager class to enable multi-part uploads. I am looking for a way to throttle the rate at which these files are transferred to ensure I don't disrupt other services running on this CentOS server. Is there something I am missing in the API, or some other way to achieve this?
Without support in the API for this, one approach is to wrap the s3 command with trickle.