Get Pub/Sub message's messageId field from PubSubIO with Apache Beam - java

Is it possible to somehow get the messageId field of a Pub/Sub message in a DoFn after using the PubSubIO Beam source to read the messages?
I need the default id which was assigned by the Pub/Sub service. I want to log it for debugging purposes.
Using a custom attribute for the unique id and the withIdAttribute() method is not possible for me, because I have no influence on the publisher in this case.
I use the 2.2.0 version of the Dataflow Java SDK.

Support for reading the Pub/Sub message ID was added in Beam v2.16.0. To turn it on, replace .readMessages() with .readMessagesWithMessageId() in your pipeline setup. After that change, message.getMessageId() returns the ID.

For debugging purposes you can use the seek option.
It creates a snapshot of the messages which you can replay when needed.

Related

Unacknowledge some pub/sub messages in apache beam pipeline

We currently have a use case where we want to process some messages at a later point in time, after certain conditions are met.
Is it possible to unacknowledge some Pub/Sub messages in an Apache Beam pipeline, so that they become available again after the visibility timeout and we can process them later?
You can't unack a message with Apache Beam. Once messages are correctly ingested into the pipeline, they are acked automatically.
You can keep them in the pipeline and reprocess them until the conditions are met, but that can cause congestion or waste Dataflow resources for nothing. It may be better to filter the messages beforehand, for instance in a Cloud Function that nacks the messages that aren't valid yet and publishes the valid ones to a target Pub/Sub topic.
As an alternative to @guillaume's suggestion, you can also store the "to-be-processed-later" messages (in raw format) in storage mediums such as BigQuery or Cloud Bigtable. All the messages will be acked by the pipeline and then the segregation can be done inside the pipeline where the "valid" messages are processed as usual while the "invalid" messages are preserved in storage for future processing.
Once the processing conditions are satisfied, the "invalid" messages can be retrieved from the storage medium and processed after which they can be deleted from storage. This could be a viable solution if the "invalid" messages will be processed after the message retention period which is 7 days.
The above workflow is inspired by this section of the Google Cloud blog. I considered the "invalid" messages to be "bad" data.
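The segregation step described above can be sketched in plain Java. This is only an illustration under stated assumptions: the in-memory list stands in for BigQuery or Cloud Bigtable, and the class name, method names and the readiness predicate are hypothetical, not part of Beam or any Google Cloud API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: all messages are assumed to be acked already;
// the ones that cannot be processed yet are parked for later.
class DeferredMessages {
    // Stand-in for BigQuery / Cloud Bigtable. A real pipeline would use durable storage.
    private final List<String> parked = new ArrayList<>();

    // Returns the messages that can be processed now and parks the rest.
    List<String> segregate(List<String> incoming, Predicate<String> ready) {
        List<String> processNow = new ArrayList<>();
        for (String msg : incoming) {
            if (ready.test(msg)) {
                processNow.add(msg);
            } else {
                parked.add(msg);
            }
        }
        return processNow;
    }

    // Removes and returns the parked messages whose condition is now satisfied.
    List<String> drainReady(Predicate<String> ready) {
        List<String> released = new ArrayList<>();
        for (Iterator<String> it = parked.iterator(); it.hasNext(); ) {
            String msg = it.next();
            if (ready.test(msg)) {
                released.add(msg);
                it.remove();
            }
        }
        return released;
    }

    int parkedCount() {
        return parked.size();
    }
}
```

Calling drainReady periodically (or when the external condition changes) plays the role of "retrieve from storage, process, then delete".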

Be notified when a RDS instance was created

I'm using AWS SDK for Java.
Imagine I create a RDS instance as described in the AWS documentation.
AmazonRDS client = AmazonRDSClientBuilder.standard().build();
CreateDBInstanceRequest request = new CreateDBInstanceRequest().withDBInstanceIdentifier("mymysqlinstance").withAllocatedStorage(5)
.withDBInstanceClass("db.t2.micro").withEngine("MySQL").withMasterUsername("MyUser").withMasterUserPassword("MyPassword");
DBInstance response = client.createDBInstance(request);
If I call response.getEndpoint() right after making the request it will return null, because AWS is still creating the database. I need to know this endpoint when it becomes available, but I can't figure out how to do it.
Is there a way, using the AWS SDK, to be notified when the instance was finally created?
You can use the RDS SNS notifications:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Events.html#USER_Events.Messages
Subscribing to Amazon RDS Event Notification
You can create an Amazon RDS event notification subscription so you can be notified when an event occurs for a given DB instance, DB snapshot, DB security group, or DB parameter group. The simplest way to create a subscription is with the RDS console. If you choose to create event notification subscriptions using the CLI or API, you must create an Amazon Simple Notification Service topic and subscribe to that topic with the Amazon SNS console or Amazon SNS API. You will also need to retain the Amazon Resource Name (ARN) of the topic because it is used when submitting CLI commands or API actions. For information on creating an SNS topic and subscribing to it, see Getting Started with Amazon SNS.
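If an SNS subscription is more than you need, a simpler alternative (not part of the answer above) is to poll until the endpoint is populated. Below is a minimal sketch, assuming the caller supplies a lookup that re-describes the instance and returns null while no endpoint exists. Poller and waitFor are hypothetical names, and the AWS call appears only in a comment because it is not compiled here.

```java
import java.util.function.Supplier;

// Hypothetical helper: repeatedly calls a lookup until it yields a value.
class Poller {
    // Returns the first non-null value from lookup, or null if every attempt
    // returned null or the thread was interrupted while waiting.
    static <T> T waitFor(Supplier<T> lookup, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T value = lookup.get();
            if (value != null) {
                return value;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return null;
                }
            }
        }
        return null;
    }
}

// Possible use with the AWS SDK for Java v1 (assumption, untested here):
//   Endpoint endpoint = Poller.waitFor(
//       () -> client.describeDBInstances(new DescribeDBInstancesRequest()
//                 .withDBInstanceIdentifier("mymysqlinstance"))
//             .getDBInstances().get(0).getEndpoint(),
//       60, 30_000L);
```

Polling keeps everything inside the calling process, at the cost of repeated API calls; the SNS route avoids those calls but needs a topic and a subscriber.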
Disclaimer: Opinionated Answer
IMO creating infrastructure at runtime in code like this is devil's work. Stacks are the way to go here, much more modular and you will get some of the following benefits:
If you start creating more than one table per customer you will be able to logically group them into a stack and clean them up more easily as needed
If for some reason the creation of a resource fails you can see this very easily in the stack console
Management is much easier to search through stacks as you have a console already built for you
Updating a stack in AWS is much easier as well than updating tables individually
MOST IMPORTANT: If an error occurs, the stack functionality already has rollback and redundancy built in, and you control its behaviour. If something goes wrong in your code during your onboarding process it will be a mess to clean up: what if one table succeeded and the other did not? You will have to trawl through logs (if they exist) to find out what happened.
You can also combine this approach with something like AWS Pipelines or even AWS Simple Workflow Service to add custom steps to your onboarding process, e.g. run a Lambda function, send a notification when completed, or wait for a payment. This builds on my last point: if this pipeline does fail, you will be able to see which step failed and why. You will also be able to see if things time out.
Lastly, I want to advise caution in creating infrastructure per customer. It's much more work and adds a lot more ways in which things can break. Make sure you also put limits in place in AWS so that your bill doesn't skyrocket because of some bug creating infrastructure.

What is the equivalent of BrokerProperty "setScheduledEnqueueTimeUtc" in AMQP/JMS world

I'm working on a Java application that needs to send a message to Azure Service Bus such that the message becomes available to the next process only after a certain delay.
Using Azure sdk, it can be achieved by setting setScheduledEnqueueTimeUtc BrokerProperty on the Brokered message, but I'm unable to find an equivalent of this in AMQP/JMS world.
Using Message.setProperty with a key,value pair results in property being put under application property and the message appears in queue immediately.
Is there a way to achieve this delay?
The JMS 2.0 specification defines a "delivery delay" feature, which lets a message be delivered only after a specified duration has passed. See http://www.oracle.com/technetwork/articles/java/jms2messaging-1954190.html for more details. You will need a messaging provider that implements the JMS 2.0 specification.
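One practical difference: setScheduledEnqueueTimeUtc takes an absolute time, while the JMS 2.0 delivery delay is a relative duration in milliseconds. The sketch below shows the conversion in plain Java; DeliveryDelay and millisUntil are hypothetical names. The JMS calls are shown only in comments, since they need a JMS 2.0 provider on the classpath, and whether your broker honours the delay depends on that provider.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: turns an absolute "enqueue at" time into a relative delay.
class DeliveryDelay {
    // Milliseconds from now until the target time, clamped to 0 for past targets.
    static long millisUntil(Instant now, Instant scheduledEnqueueTime) {
        long millis = Duration.between(now, scheduledEnqueueTime).toMillis();
        return Math.max(0L, millis);
    }
}

// With a JMS 2.0 provider (assumption, not compiled here):
//   MessageProducer producer = session.createProducer(queue);
//   producer.setDeliveryDelay(DeliveryDelay.millisUntil(Instant.now(), target));
//   producer.send(message);
```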

How to configure SNS delivery status with the AWS SDK?

If you create an SNS Topic and, in the Amazon Console, open the Delivery Status options under Other topic actions, you can see this:
As you can see, it's possible now to get SNS delivery status feedback by configuring success and failure IAM roles. This works fine and I can see all the logs in CloudWatch for all published messages to each subscriber.
What I can't do is to set these values with the Java AWS SDK, is there any way of doing this?
I'm using aws-java-sdk:1.10.23 (latest as of now)
As @david-murray pointed out in the documentation, this is the solution to configure the feedback for HTTP endpoints:
amazonSnsClient.setTopicAttributes(topicArn, "HTTPFailureFeedbackRoleArn", "arn:aws:iam::1234567890:role/SNSFailureFeedback");
The same idea can be used for Application, Lambda and SQS.
My mistake was trying to set all of them with a single call like the form in the screenshot does by using:
https://eu-west-1.console.aws.amazon.com/sns/v2/SetMultiTopicAttributes
Although this doesn't seem to be present in the SDK at the moment, 4 separate calls will have the same effect.
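The four separate calls can be wrapped in a small loop. This is a sketch, not SDK code: TopicAttributeSetter is a hypothetical stand-in so the example compiles without the AWS SDK, and the attribute names other than HTTPFailureFeedbackRoleArn are assumed to follow the same naming pattern, so check them against the SetTopicAttributes documentation.

```java
import java.util.ArrayList;
import java.util.List;

class SnsDeliveryStatus {
    // Hypothetical stand-in for the SDK client. In real code,
    // amazonSnsClient::setTopicAttributes has a matching (String, String, String) shape.
    interface TopicAttributeSetter {
        void set(String topicArn, String attributeName, String attributeValue);
    }

    // Protocol prefixes taken from the answer: HTTP, Application, Lambda and SQS.
    static final String[] PROTOCOLS = {"HTTP", "Application", "Lambda", "SQS"};

    // Builds one "<Protocol>FailureFeedbackRoleArn" attribute name per protocol.
    static List<String> failureRoleAttributes() {
        List<String> names = new ArrayList<>();
        for (String protocol : PROTOCOLS) {
            names.add(protocol + "FailureFeedbackRoleArn");
        }
        return names;
    }

    // Issues one setter call per protocol, all with the same role ARN.
    static void configureFailureFeedback(TopicAttributeSetter setter, String topicArn, String roleArn) {
        for (String name : failureRoleAttributes()) {
            setter.set(topicArn, name, roleArn);
        }
    }
}
```

With the SDK on the classpath, the call would look like SnsDeliveryStatus.configureFailureFeedback(amazonSnsClient::setTopicAttributes, topicArn, roleArn).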
Thanks!

How can we save Java Message Queues for reference?

How can we keep track of every message that gets into our Java Message Queue? We need to save the message for later reference. We already log it into an application log (log4j) but we need to query them later.
You can store them
in memory - in a collection or in an in-memory database
in a standalone database
You could create a database logging table for the messages, storing the message as is in a BLOB column, the timestamp that it was created / posted to the MQ and a simple counter as primary key. You can also add fields like message type etc if you want to create statistical reports on messages sent.
Cleanup of the table can be done simply by deleting all messages older than the retention period, using the timestamp column.
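The shape of such a logging table, including the retention cleanup, can be sketched in memory as follows. This is an illustration only: MessageLog and its members are hypothetical names, and the list stands in for a real database table with a BLOB column.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the database logging table.
class MessageLog {
    // One row: counter primary key, posting timestamp, raw message bytes (the "BLOB").
    static final class Entry {
        final long id;
        final Instant postedAt;
        final byte[] payload;

        Entry(long id, Instant postedAt, byte[] payload) {
            this.id = id;
            this.postedAt = postedAt;
            this.payload = payload;
        }
    }

    private final List<Entry> rows = new ArrayList<>();
    private long nextId = 1;

    // Stores the message as is and returns its generated id.
    long append(byte[] payload, Instant postedAt) {
        long id = nextId++;
        rows.add(new Entry(id, postedAt, payload.clone()));
        return id;
    }

    // Deletes every row posted before (now - retention); returns how many were removed.
    int purgeOlderThan(Duration retention, Instant now) {
        Instant cutoff = now.minus(retention);
        int before = rows.size();
        rows.removeIf(e -> e.postedAt.isBefore(cutoff));
        return before - rows.size();
    }

    int size() {
        return rows.size();
    }
}
```

In a real database, purgeOlderThan corresponds to a single DELETE with a condition on the timestamp column.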
I implemented such a solution in the past, we chose to store messages with all their characteristics in a database and developed a search, replay and cancel application on top of it. This is the Message Store pattern:
(source: eaipatterns.com)
We also used this application for the Dead Letter Channel.
(source: eaipatterns.com)
If you don't want to build a custom solution, have a look at the ReplayService for JMS from CodeStreet.
The best way to do this is to use whatever tracing facility your middleware provider offers. Or possibly, you could set up an intermediate listener whose only job was to log messages and forward on to your existing application.
In most cases, you will find that the middleware provider already has the ability to do this for you with no changes or awareness by your application.
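Where the middleware offers no tracing, the intermediate listener mentioned above could look roughly like this. It is a plain-Java sketch with hypothetical names: Consumer<String> stands in for a JMS MessageListener, and the audit sink could be a database writer.

```java
import java.util.function.Consumer;

// Hypothetical listener that records each message, then forwards it unchanged.
class AuditingListener implements Consumer<String> {
    private final Consumer<String> audit;    // e.g. writes to a database
    private final Consumer<String> delegate; // the existing application listener

    AuditingListener(Consumer<String> audit, Consumer<String> delegate) {
        this.audit = audit;
        this.delegate = delegate;
    }

    @Override
    public void accept(String message) {
        audit.accept(message);    // record first, so a failure downstream is still traceable
        delegate.accept(message); // then hand over to the original consumer
    }
}
```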
I would change the queue to a topic, and then keep the original consumer that processes the messages, and add another consumer for auditing the messages to a database.
Some JMS providers cater for topic-to-queue-bridge definitions, the consumers then receive from their own dedicated queues, and don't have to read past messages that are left on the queue due to other consumers being inactive.
Alternatively, you could write a log4j appender, which writes your logged messages to a database.
