AWS Lambda function used with a DLQ - Java

I have an SQS queue with a DLQ configured for failures. I want to write a custom Lambda function in Java (Spring Boot) to get the messages from this DLQ, write them to a file, upload the file to an S3 bucket, and send the file as an alert to a specified webhook.
I'm new to Lambda and I hope this design can be implemented.
One requirement is that the Lambda should run only once per day, say at 6:00 am every day, and write all the messages currently in the queue to a file.
I'm looking for examples of a RequestHandler implementation where the messages in the queue are received and iterated over, saving each one to the file.
I'm not sure how to configure the Lambda so that it runs only once per day instead of each time a message enters the DLQ.
Any documentation relating to these queries would be really helpful. Please critique my expected implementation and offer any better solutions.

You can have your Lambda code run on any schedule (once per day in your case) using a CloudWatch Events schedule.
To create the schedule, follow this link
In your Lambda code, you can fetch the messages from the DLQ and process them iteratively.
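
For example, here is a minimal sketch of such a handler (not production code). It assumes the AWS SDK for Java v2 (sqs, s3) and aws-lambda-java-core are on the classpath, that the function is invoked by the daily schedule rule (e.g. the cron expression cron(0 6 * * ? *) for 06:00 UTC), and that DLQ_URL and BUCKET are hypothetical environment variables you configure on the function; the class name and the webhook step are placeholders.

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.Message;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.LocalDate;
import java.util.List;

public class DlqDrainHandler implements RequestHandler<Object, String> {

    private final SqsClient sqs = SqsClient.create();
    private final S3Client s3 = S3Client.create();

    @Override
    public String handleRequest(Object scheduledEvent, Context context) {
        String dlqUrl = System.getenv("DLQ_URL");   // hypothetical env vars you configure
        String bucket = System.getenv("BUCKET");
        Path file = Paths.get("/tmp/dlq-" + LocalDate.now() + ".txt"); // /tmp is Lambda's writable dir

        try {
            while (true) {
                // Poll the DLQ in batches of up to 10 messages until it is empty
                List<Message> messages = sqs.receiveMessage(ReceiveMessageRequest.builder()
                        .queueUrl(dlqUrl).maxNumberOfMessages(10).waitTimeSeconds(2).build())
                        .messages();
                if (messages.isEmpty()) {
                    break;
                }
                for (Message m : messages) {
                    // Append each message body to the file, then delete it from the DLQ
                    Files.writeString(file, m.body() + System.lineSeparator(),
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                    sqs.deleteMessage(DeleteMessageRequest.builder()
                            .queueUrl(dlqUrl).receiptHandle(m.receiptHandle()).build());
                }
            }
            // Upload the collected messages to S3; posting the file to the webhook would follow here
            s3.putObject(PutObjectRequest.builder()
                            .bucket(bucket).key(file.getFileName().toString()).build(),
                    RequestBody.fromFile(file));
            return "uploaded " + file;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```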

You don't need to use the Spring framework in AWS Lambda; plain Java is enough.
Use Lambda with a CloudWatch cron expression and schedule a daily run.
Write your own logic.
https://docs.aws.amazon.com/lambda/latest/dg/java-samples.html
https://www.freecodecamp.org/news/using-lambda-functions-as-cronjobs/

Related

How to retrieve event.json file to unit test lambda

I am able to run my AWS Lambda using Postman. I want to use the event to write a unit test for it, but I am not sure how I can generate the event.json file.
I tried logging the event that I receive in my handleRequest method to CloudWatch Logs, but it does not seem to be the right one.
You can find the Lambda event structures in the documentation. Here is the list of all the different event types: https://docs.aws.amazon.com/lambda/latest/dg/lambda-services.html.
In your case I would say the event you are looking for comes from API Gateway; you can find it here: https://docs.aws.amazon.com/lambda/latest/dg/services-apigateway.html#apigateway-example-event
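
As a rough illustration, a unit test can simply deserialize that documented sample event from src/test/resources and feed it to the handler. This sketch assumes aws-lambda-java-events, Jackson, and JUnit 5 on the test classpath; MyHandler is a hypothetical handler implementing RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent>, and the Context argument is passed as null because it isn't used here.

```java
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.junit.jupiter.api.Test;

import java.io.InputStream;

import static org.junit.jupiter.api.Assertions.assertEquals;

class MyHandlerTest {

    @Test
    void handlesApiGatewayEvent() throws Exception {
        // Deserialize the saved sample event instead of capturing one from CloudWatch Logs
        try (InputStream json = getClass().getResourceAsStream("/event.json")) {
            APIGatewayProxyRequestEvent event =
                    new ObjectMapper().readValue(json, APIGatewayProxyRequestEvent.class);

            APIGatewayProxyResponseEvent response = new MyHandler().handleRequest(event, null);

            assertEquals(200, response.getStatusCode().intValue());
        }
    }
}
```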

How to make AWS Lambda run continuously

I have an AWS Lambda function which is deployed via the console. Since it will consume messages from some external queue automatically, I want it to keep running continuously instead of being scheduled.
How can I make AWS Lambda run continuously?
You can add a trigger to your Lambda and set the SQS queue you want to respond to. In the AWS console (on the web), you can do this either from the Lambda function itself or from SQS (I'd advise the latter). The console will guide you through the details of setting up the proper permissions.
More info on the setup:
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-configure-lambda-function-trigger.html
Some general info on consuming SQS messages in Lambda:
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
Your preferred programming language probably has a client library that implements the API for you.
IMPORTANT: If you want to process your queue sequentially, make sure you set the reserved concurrency of your Lambda to 1 (setting it to 0 would stop the function from being invoked at all).
Also, if you use the CLI or other automated deploy tools, make sure to update your config files so you don't overwrite your Lambda's settings on deploy.
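For reference, a minimal SQS-triggered handler looks roughly like this (a sketch assuming aws-lambda-java-core and aws-lambda-java-events on the classpath; QueueConsumer is a made-up name). Once the trigger is configured, Lambda invokes it per batch and deletes the messages when the handler returns without throwing:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;

public class QueueConsumer implements RequestHandler<SQSEvent, Void> {

    @Override
    public Void handleRequest(SQSEvent event, Context context) {
        for (SQSEvent.SQSMessage message : event.getRecords()) {
            // Replace with your own processing logic
            context.getLogger().log("received: " + message.getBody());
        }
        return null; // throwing instead would make the batch visible again for retry
    }
}
```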
EDIT: when you say external queue, do you mean a non-SQS queue?
I guess it could still be done with another system. The best way is to raise an event: trigger the Lambda with an HTTP request when a message is added. If for some reason you can't do this, you could add a schedule for your Lambda and run it, say, every 5 minutes. More info on scheduling: https://docs.aws.amazon.com/eventbridge/latest/userguide/run-lambda-schedule.html
If instead of an external queue you could use SQS, you could set that queue as a trigger for your Lambda function and have it execute whenever a new item is placed on the queue.

Building a File Polling/Ingest Task with Spring Batch and Spring Cloud Data Flow

We are planning to create a new processing mechanism that consists of listening to a few directories, e.g. /opt/dir1, /opt/dirN, and for each document created in these directories, starting a routine to process it, persist its records in a database (via REST calls to an existing CRUD API), and generate a protocol file in another directory.
For testing purposes, I am not using any modern (or even decent) framework/approach, just a regular Spring Boot app with a WatchService implementation that listens to these directories and picks up files to be processed as soon as they are created. It works, but I will clearly run into performance problems at some point once I move to production and start receiving dozens of files to be processed in parallel, which isn't the case in my example.
After some research and some tips from a few colleagues, I found Spring Batch + Spring Cloud Data Flow to be the best combination for my needs. However, I have never dealt with either Batch or Data Flow before, and I'm somewhat confused about what these blocks are and how I should assemble them to get this routine going in the simplest and most performant way. I have a few questions regarding the added value and architecture, and I would really appreciate hearing your thoughts!
I managed to create and run a sample batch file ingest task based on this section of Spring Docs. How can I launch a task every time a file is created in a directory? Do I need a Stream for that?
If I do, how can I create a stream application that launches my task programmatically for each new file, passing its path as an argument? Should I use RabbitMQ for this purpose?
How can I keep some variables externalized for my task, e.g. the directory paths? Can I have these streams and tasks read an application.yml somewhere other than inside their jar?
Why should I use Spring Cloud Data Flow alongside Spring Batch and not just a batch application? Is it only because it spawns parallel tasks for each file, or do I get other benefits?
Talking purely about performance, how would this solution compare to my WatchService + plain processing implementation if you think only about the sequential processing scenario, where I'd receive only 1 file per hour or so?
Also, if any of you have a guide or sample on how to launch a task programmatically, I would really appreciate it! I am still searching for that, but it doesn't seem like I'm doing it right.
Thank you for your attention and any input is highly appreciated!
UPDATE
I managed to launch my task via the SCDF REST API, so I could keep my original Spring Boot app using WatchService and launch a new task via Feign or XXX. I still know this is far from what I should be doing here. After some more research, I think creating a stream using a file source and sink would be the way to go, unless someone has another opinion, but I can't get the inbound channel adapter to poll multiple directories, and I can't have multiple streams, because this platform is supposed to scale to the point where we have thousands of participants (i.e. directories to poll files from).
Here are a few pointers.
I managed to create and run a sample batch file ingest task based on this section of Spring Docs. How can I launch a task every time a file is created in a directory? Do I need a Stream for that?
If you'd have to launch it automatically upon an upstream event (eg: new file), yes, you could do that via a stream (see example). If the events are coming off of a message-broker, you can directly consume them in the batch-job, too (eg: AmqpItemReader).
If I do, how can I create a stream application that launches my task programmatically for each new file, passing its path as an argument? Should I use RabbitMQ for this purpose?
Hopefully, the above example clarifies it. If you want to programmatically launch the Task (not via DSL/REST/UI), you can do so with the new Java DSL support, which was added in 1.3.
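
If the DSL doesn't fit, a rough alternative is the plain SCDF REST client. The sketch below assumes spring-cloud-dataflow-rest-client on the classpath, an SCDF server at http://localhost:9393, and an already-registered task definition with the hypothetical name fileIngestTask; the return type of launch varies between client versions, so it is ignored here.

```java
import org.springframework.cloud.dataflow.rest.client.DataFlowTemplate;

import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ProgrammaticTaskLaunch {

    public static void main(String[] args) {
        // REST client pointed at the SCDF server
        DataFlowTemplate dataFlow = new DataFlowTemplate(URI.create("http://localhost:9393"));

        Map<String, String> deploymentProperties = Collections.emptyMap();
        // Pass the newly detected file as a command-line argument to the task
        List<String> taskArguments =
                Collections.singletonList("--filePath=/opt/dir1/new-file.csv");

        dataFlow.taskOperations().launch("fileIngestTask", deploymentProperties, taskArguments);
    }
}
```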
How can I keep some variables externalized for my task, e.g. the directory paths? Can I have these streams and tasks read an application.yml somewhere other than inside their jar?
The recommended approach is to use Config Server. Depending on the platform where this is being orchestrated, you'd have to provide the config-server credentials to the Task and its sub-tasks including batch-jobs. In Cloud Foundry, we simply bind config-server service instance to each of the tasks and at runtime the externalized properties would be automatically resolved.
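
On the application side, the directory paths themselves can stay out of the jar by binding them to a properties class; the values then come from Config Server (or any other external property source). A minimal sketch, assuming a spring-cloud-starter-config dependency and with ingest.directories as a hypothetical property name:

```java
import java.util.List;

import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

@Component
@ConfigurationProperties(prefix = "ingest")
public class IngestProperties {

    /** e.g. ingest.directories=/opt/dir1,/opt/dirN in the externalized config */
    private List<String> directories;

    public List<String> getDirectories() {
        return directories;
    }

    public void setDirectories(List<String> directories) {
        this.directories = directories;
    }
}
```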
Why should I use Spring Cloud Data Flow alongside Spring Batch and not just a batch application? Is it only because it spawns parallel tasks for each file, or do I get other benefits?
As a replacement for Spring Batch Admin, SCDF provides monitoring and management for Tasks/Batch-Jobs. The executions, steps, step progress, and stack traces upon errors are persisted and available to explore from the Dashboard. You can also use SCDF's REST endpoints directly to examine this information.
Talking purely about performance, how would this solution compare to my WatchService + plain processing implementation if you think only about the sequential processing scenario, where I'd receive only 1 file per hour or so?
This is implementation specific. We do not have any benchmarks to share. However, if performance is a requirement, you could explore remote-partitioning support in Spring Batch. You can partition the ingest or data processing Tasks with "n" number of workers, so that way you can achieve parallelism.
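
As a very rough sketch of what partitioned ingest could look like with plain Spring Batch (assuming Spring Batch 4.x, a hypothetical workerStep bean that processes a single file, and files under /opt/dir1):

```java
import java.io.IOException;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.partition.support.MultiResourcePartitioner;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class PartitionedIngestConfig {

    @Bean
    public Step partitionedIngestStep(StepBuilderFactory steps, Step workerStep) throws IOException {
        // One partition per file; MultiResourcePartitioner exposes each file to the
        // worker step's execution context under the key "fileName"
        MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
        partitioner.setResources(
                new PathMatchingResourcePatternResolver().getResources("file:/opt/dir1/*"));

        return steps.get("partitionedIngestStep")
                .partitioner(workerStep.getName(), partitioner)
                .step(workerStep)
                .taskExecutor(new SimpleAsyncTaskExecutor()) // local threads; swap in a remote
                .build();                                    // PartitionHandler for separate worker JVMs
    }
}
```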

Camel-Twitter difference between direct and event-based

I'm considering the use of camel-twitter (the Twitter component for Apache Camel: http://camel.apache.org/twitter.html). I want to use Twitter's Streaming API.
What is the difference between the types event and direct?
Does somebody have an example code for the usage of the event-driven consumer? (I only found this one so far https://fisheye6.atlassian.com/browse/camel/trunk/components/camel-twitter/src/test/java/org/apache/camel/component/twitter/SearchEventTest.java)
direct means that you make an explicit direct call to trigger Twitter, for example using the direct component in Camel to call a route with twitter.
event means an event-driven consumer, where the route reacts to events from Twitter, such as new tweets found in a search.
And for examples, we have also this websocket twitter example: http://camel.apache.org/twitter-websocket-example.html
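To make the contrast concrete, here is a rough sketch of both endpoint types using the older single camel-twitter component documented at the link above (newer Camel releases split it into twitter-search, twitter-timeline, etc.); the keywords and OAuth values are placeholders you would supply yourself:

```java
import org.apache.camel.builder.RouteBuilder;

public class TwitterRoutes extends RouteBuilder {

    // Placeholder credentials; supply your own OAuth values
    private static final String AUTH =
            "consumerKey=XXX&consumerSecret=XXX&accessToken=XXX&accessTokenSecret=XXX";

    @Override
    public void configure() {
        // event: the Streaming API pushes each matching tweet into the route as soon as it arrives
        from("twitter://streaming/filter?type=event&keywords=apachecamel&" + AUTH)
                .to("log:tweets");

        // direct: the search endpoint is only hit when something explicitly calls it,
        // here a timer route firing once a day
        from("timer:dailySearch?period=86400000")
                .to("twitter://search?type=direct&keywords=apachecamel&" + AUTH)
                .split(body())
                .to("log:searchResults");
    }
}
```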
Keep in mind that the Streaming API cannot be used with the direct endpoint -- only event and polling are supported.
From an API usage standpoint, both event and polling work identically. A single stream listener is opened and maintained. Rate limit considerations do not differ.
The only difference is that the event endpoint sends 1 event per message, immediately when it's received. Polling queues up the received messages and releases them on each poll.
So, the differences are purely how they're delivered inside of Camel. With respect to the API, both streaming endpoints are the same.
The Direct type tells the consumer/producer that it will connect to Twitter each time the endpoint is triggered somehow. Let's say you want to use a schedule saved in your database to run searches on Twitter:
Use a JDBC/JPA endpoint that consumes scheduling data
Dynamically create and register Quartz endpoints based on the scheduling data from your DB
Configure your Quartz endpoint to send a message to your Direct Twitter endpoint to do the search at that moment
You will be rate limited, always, no matter whether you use Streaming, Direct, or Polling.
In case you are using Streaming, please read this FAQ from Twitter Developer Center

Java patterns for long running process in a web service

I'm building a web service that executes a database process (SQL code that runs several queries, then moves data between two very large tables). I'm assuming some processes might take 2 to 10 hours to execute.
What are the best practices for executing a long-running database process from within a Java web service (it's actually REST-based, using JAX-RS and Spring)? The process would be executed upon a single web service call, and it is expected to run about once a week.
Thanks in advance!
It's gotta be asynchronous.
Since your web service call is an RPC, it's best to have the implementation validate the request, put it on a queue for processing, and immediately send back a response containing a token or URL the caller can use to check progress.
Set up a JMS queue and register a listener that takes the message off the queue and persists it.
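As a rough sketch of that accept-and-queue pattern (assuming JAX-RS 2.0 and Spring's JmsTemplate are available; the resource path, queue name, and status URL layout are made up):

```java
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.core.Response;
import java.net.URI;
import java.util.UUID;

import org.springframework.jms.core.JmsTemplate;

@Path("/migrations")
public class MigrationResource {

    private final JmsTemplate jmsTemplate;

    public MigrationResource(JmsTemplate jmsTemplate) {
        this.jmsTemplate = jmsTemplate;
    }

    @POST
    public Response start() {
        String jobId = UUID.randomUUID().toString();
        // Validate the request, then hand the heavy work off to the queue
        jmsTemplate.convertAndSend("migration.requests", jobId);
        // 202 Accepted plus a URL the caller can poll for progress;
        // a @JmsListener elsewhere picks the job up and runs the long database process
        return Response.accepted().location(URI.create("/migrations/" + jobId)).build();
    }
}
```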
If this is really taking 2-10 hours, I'd recommend looking at your schema and queries to see if you can speed it up. There's an index missing somewhere, I'd bet.
Where I work, I am currently evaluating different strategies for this exact situation, only the times are different.
With the times you state, you may be better served by using publish/subscribe message queuing (e.g. ActiveMQ).
