I wanted to create a metric-based log alert in a GCP environment. I need an email notification when a particular error message shows up in the GCP Logs Explorer (3 or more errors within a 5-minute period).
I have created a counter metric from a log query. From that metric I've created an alert policy with the following configuration:
Rolling window: 5 min
Rolling window function: count
Time series aggregation: none
Condition type: Threshold
Alert trigger: Any time series violates
Threshold position: Above threshold
Threshold value: 3
With the above config I am not getting a mail alert on 3 errors within a five-minute period.
For example, at 5 PM only 2 errors were generated in the logs, but the chart showed a value of 4 and a mail was received.
Did I miss anything? Thank you.
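For reference, the same condition can also be expressed through the Cloud Monitoring API. The sketch below uses the Java client and simply mirrors the configuration listed above; the project ID, metric name and notification channel are placeholders chosen for illustration:
import com.google.cloud.monitoring.v3.AlertPolicyServiceClient;
import com.google.monitoring.v3.Aggregation;
import com.google.monitoring.v3.AlertPolicy;
import com.google.monitoring.v3.ComparisonType;
import com.google.monitoring.v3.ProjectName;
import com.google.protobuf.Duration;

public class CreateLogBasedAlert {
    public static void main(String[] args) throws Exception {
        // placeholders: project ID and the name of the log-based counter metric
        ProjectName project = ProjectName.of("my-project-id");
        String filter = "metric.type=\"logging.googleapis.com/user/my_error_metric\"";

        // rolling window: 5 min, rolling window function: count, time series aggregation: none
        Aggregation aggregation = Aggregation.newBuilder()
                .setAlignmentPeriod(Duration.newBuilder().setSeconds(300).build())
                .setPerSeriesAligner(Aggregation.Aligner.ALIGN_COUNT)
                .setCrossSeriesReducer(Aggregation.Reducer.REDUCE_NONE)
                .build();

        // condition type: threshold, above threshold, value 3, any time series violates
        AlertPolicy.Condition condition = AlertPolicy.Condition.newBuilder()
                .setDisplayName("3 or more errors in 5 minutes")
                .setConditionThreshold(AlertPolicy.Condition.MetricThreshold.newBuilder()
                        .setFilter(filter)
                        .addAggregations(aggregation)
                        .setComparison(ComparisonType.COMPARISON_GT)
                        .setThresholdValue(3)
                        .setTrigger(AlertPolicy.Condition.Trigger.newBuilder().setCount(1)))
                .build();

        AlertPolicy policy = AlertPolicy.newBuilder()
                .setDisplayName("Error log alert")
                .addConditions(condition)
                .setCombiner(AlertPolicy.ConditionCombinerType.AND)
                .addNotificationChannels("projects/my-project-id/notificationChannels/CHANNEL_ID")
                .build();

        try (AlertPolicyServiceClient client = AlertPolicyServiceClient.create()) {
            client.createAlertPolicy(project, policy);
        }
    }
}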
I use Apache Log4j2 and its SMTPAppender in an application. It's configured to send email notifications for events of level ERROR or above. Usually this works great.
But recently I had a batch processing situation in which thousands of ERROR lines were logged in a time interval of 5 minutes. My inbox was flooded with thousands of emails and our mail server blacklisted the affected application server...
To avoid such a mishap: Can we apply a maximum limit to the number of emails sent per time interval?
E.g. I'd like SMTPAppender to send no more than 20 emails per hour. If this limit is exceeded, further ERROR/FATAL lines should be aggregated into a single email, which is sent as soon as the 20/hour limit allows another email to go out.
Is there a Log4j2-standard way to achieve that? How did you solve this task in your apps using Log4j2?
You can use BurstFilter. These are the parameters (from the documentation):
level (String): Level of messages to be filtered. Anything at or below this level will be filtered out if maxBurst has been exceeded. The default is WARN, meaning any messages higher than WARN will be logged regardless of the size of a burst.
rate (float): The average number of events per second to allow.
maxBurst (integer): The maximum number of events that can occur before events are filtered for exceeding the average rate. The default is 10 times the rate.
onMatch (String): Action to take when the filter matches. May be ACCEPT, DENY or NEUTRAL. The default value is NEUTRAL.
onMismatch (String): Action to take when the filter does not match. May be ACCEPT, DENY or NEUTRAL. The default value is DENY.
<Appenders>
<SMTP> <!-- parameters omitted for brevity -->
<BurstFilter level="ERROR" rate="16" maxBurst="100"/>
</SMTP>
</Appenders>
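For the 20-emails-per-hour limit from the question, the rate works out to roughly 20 / 3600 ≈ 0.0056 events per second, e.g. rate="0.0056" with maxBurst="20". Note that BurstFilter simply drops the excess events for this appender; it does not aggregate them into a single summary email.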
Background
We have a data transfer solution with Azure Service Bus as the message broker. We are transferring data from x datasets through x queues - with x dedicated QueueClients as senders. Some senders publish messages at the rate of one message every two seconds, while others publish one every 15 minutes.
The application on the data source side (where senders are) is working just fine, giving us the desired throughput.
On the other side, we have an application with one QueueClient receiver per queue with the following configuration (see the sketch after this list):
maxConcurrentCalls = 1
autoComplete = true (if receive mode = RECEIVEANDDELETE) or false (if receive mode = PEEKLOCK) - for some receivers we want to preserve the messages in the Service Bus queue if they shut down unexpectedly.
maxAutoRenewDuration = 3 minutes (lock duration on all queues = 30 seconds)
an Executor service with a single thread
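That configuration corresponds roughly to a registration call like the following (a sketch assuming the azure-servicebus 3.x SDK; the connection string, queue name and handler are placeholders):
import java.time.Duration;
import java.util.concurrent.Executors;
import com.microsoft.azure.servicebus.IMessageHandler;
import com.microsoft.azure.servicebus.MessageHandlerOptions;
import com.microsoft.azure.servicebus.QueueClient;
import com.microsoft.azure.servicebus.ReceiveMode;
import com.microsoft.azure.servicebus.primitives.ConnectionStringBuilder;

void registerReceiver(String connectionString, String queueName, IMessageHandler handler) throws Exception {
    QueueClient queueClient = new QueueClient(
            new ConnectionStringBuilder(connectionString, queueName),
            ReceiveMode.PEEKLOCK);                 // or ReceiveMode.RECEIVEANDDELETE

    queueClient.registerMessageHandler(
            handler,                               // the MessageHandler shown below
            new MessageHandlerOptions(
                    1,                             // maxConcurrentCalls
                    false,                         // autoComplete (false with PEEKLOCK)
                    Duration.ofMinutes(3)),        // maxAutoRenewDuration
            Executors.newSingleThreadExecutor());  // executor service with a single thread
}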
The MessageHandler registered with each of these receivers does the following:
public CompletableFuture<Void> onMessageAsync(final IMessage message) {
    // deserialize the message body
    final CustomObject customObject = (CustomObject) SerializationUtils.deserialize(
            (byte[]) message.getMessageBody().getBinaryData().get(0));
    // run processDB1() and processDB2() asynchronously
    final List<CompletableFuture<Boolean>> processFutures = new ArrayList<CompletableFuture<Boolean>>();
    processFutures.add(processDB1(customObject)); // processDB1() returns CompletableFuture<Boolean>
    processFutures.add(processDB2(customObject)); // processDB2() returns CompletableFuture<Boolean>
    // join both CompletableFutures to get the result Booleans
    final List<Boolean> results = CompletableFuture
            .allOf(processFutures.toArray(new CompletableFuture[processFutures.size()]))
            .thenApply(v -> processFutures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList()))
            .join();
    if (results.contains(false)) {
        // dead-letter the message if either process returned false
        return getQueueClient().deadLetterAsync(message.getLockToken());
    } else {
        // complete the message otherwise
        return getQueueClient().completeAsync(message.getLockToken());
    }
}
We tested with the following scenarios:
Scenario 1 - receive mode = RECEIVEANDDELETE, message publish rate: 30/minute
Expected Behavior
The messages should be received continuously with a constant throughput (which need not necessarily be the throughput at the source, where messages are published).
Actual behavior
We observe random, long periods of inactivity from the QueueClient - ranging from minutes to hours - with no outgoing messages from the Service Bus namespace (observed on the Metrics charts) and no consumption logs for the same time periods!
Scenario 2 - receive mode = PEEKLOCK, message publish rate: 30/minute
Expected Behavior
The messages should be received continuously with a constant throughput (which need not necessarily be the throughput at the source, where messages are published).
Actual behavior
We keep seeing MessageLockLostException constantly after 20-30 minutes into the run of the application.
We tried the following:
we reduced the prefetch count (from 20 * the processing rate, as mentioned in the Best Practices guide) to a bare minimum (even 0 in one test cycle), to reduce the number of messages that are locked for the client
increased the maxAutoRenewDuration to 5 minutes - our processDB1() and processDB2() do not take more than a second or two for almost 90% of the cases - so, I think the lock duration of 30 seconds and maxAutoRenewDuration are not issues here.
removed the blocking CompletableFuture.get() and made the processing synchronous.
None of these tweaks helped us fix the issue. What we observed is that the COMPLETE and RENEWMESSAGELOCK operations are the ones throwing the MessageLockLostException.
We need help with finding answers for the following:
Why is there such a long period of inactivity of the QueueClient in scenario 1?
How do we know whether the MessageLockLostExceptions are thrown because the locks have indeed expired? We suspect the locks cannot be expiring that soon, as our processing completes in a second or two; disabling prefetch did not solve this either.
Versions and Service Bus details
Java - openjdk-11-jre
Azure Service Bus namespace tier: Standard
Java SDK version - 3.4.0
For Scenario 1:
If you have duplicate detection enabled on the queue, there is a possibility of this behavior happening, as per the scenario explained below:
I had enabled it with a 30-second detection history window. I constantly hit the Service Bus with duplicate messages (in my case, messages with the same messageId from the client, at 30 per minute). I would see no outgoing activity for that window. Though the messages were received by the Service Bus from the sending client, I was not able to see them in the outgoing messages. You could check whether you are encountering duplicate messages that are being filtered out, which in turn results in inactivity on the outgoing side.
Also note: you can't enable or disable duplicate detection after the queue is created. You can only do so at the time of creating the queue.
The issue was not with the QueueClient object per se. It was with the processes we were triggering from within the MessageHandler: processDB1(customObject) and processDB2(customObject). Since these processes were not optimized, message consumption dropped and the locks expired (in peek-lock mode), as the handler was spending more time (relative to the rate at which messages were published to the queues) completing these operations.
After optimizing the processes, consumption and completion (in peek-lock mode) were just fine.
I have a scenario where I load a subscription with around 1100 messages. I then start a Spark job which pulls messages from this subscription with these settings:
MaxOutstandingElementCount: 5
MaxAckExtensionPeriod: 60 min
AckDeadlineSeconds: 600
The first message to get processed starts a cache generation which takes about 30 minutes to complete. Any other messages arriving while this is going on are simply "returned" with no ack or nack. After that, a given message takes between 1 min and 30 mins to process. With an ack extension period of 60 min, I would never expect to see resending of messages.
The behaviour I am seeing is that while the initial cache is being generated, every 10 minutes 5 new messages are grabbed by the client and returned with no ack or nack by my code. This is unexpected. I would expect the deadline of the original 5 messages to be extended up to an hour.
Furthermore, after having processed and acked about 500 of the messages, I would expect around 600 left in the subscription, but I see almost the original 1100. These turn out to be resent duplicates, as I log these in my code. This is also very unexpected.
This is a screenshot from google console after around 500 messages have been processed and acked (ignore the first "hump", that was an aborted test run):
Am I missing something?
Here is the setup code:
val name = ProjectSubscriptionName.of(ConfigurationValues.ProjectId,
ConfigurationValues.PubSubSubscription)
val topic = ProjectTopicName.of(ConfigurationValues.ProjectId,
ConfigurationValues.PubSubSubscriptionTopic)
val pushConfig = PushConfig.newBuilder.build
val ackDeadlineSeconds = 600
subscriptionAdminClient.createSubscription(
name,
topic,
pushConfig,
ackDeadlineSeconds)
val flowControlSettings = FlowControlSettings.newBuilder()
.setMaxOutstandingElementCount(5L)
.build();
// create a subscriber bound to the asynchronous message receiver
val subscriber = Subscriber
.newBuilder(name, new EtlMessageReceiver(spark))
.setFlowControlSettings(flowControlSettings)
.setMaxAckExtensionPeriod(Duration.ofMinutes(60))
.build
subscriber.startAsync.awaitRunning()
Here is the code in the receiver which runs when a message arrives while the cache is being generated:
if(!BIQConnector.cacheGenerationDone){
Utilities.logLine(
s"PubSub message for work item $uniqueWorkItemId ignored as cache is still being generated.")
return
}
And finally when a message has been processed:
consumer.ack()
Utilities.logLine(s"PubSub message ${message.getMessageId} for $tableName acknowledged.")
// Write back to ETL Manager
Utilities.logLine(
s"Writing result message back to topic ${etlResultTopic} for table $tableName, $tableDetailsForLog.")
sendPubSubResult(importTableName, validTableName, importTimestamp, 2, etlResultTopic, stageJobData,
tableDetailsForLog, "Success", isDeleted)
Is your Spark job using a Pub/Sub client library to pull messages? These libraries should indeed keep extending your message deadlines up to the MaxAckExtensionPeriod you specified.
If your job is using a Pub/Sub client library, this is unexpected behavior. You should contact Google Cloud support with your project name, subscription name, client library version, and a sample of the message IDs from the messages you are "returning" without acking. They will be able to investigate further into why you're receiving these resent messages.
I am writing an application using ActiveMQ, where I am using the redelivery policy to redeliver messages. I am using ActiveMQ's exponential back-off concept.
My question is: how does this exponential back-off / setBackOffMultiplier work?
For example, in my case I want to redeliver the message until the message expiration time, which is 15 minutes. I want to try to redeliver 10 times within those 15 minutes. But exponential back-off makes the redeliveries continue beyond the 15-minute expiry time of the message, i.e. the message to be redelivered is still in the pending state even after the 15-minute expiration time.
Why is this? I am kind of confused by this behavior. The redelivery policy I am using is below.
RedeliveryPolicy queuePolicy = new RedeliveryPolicy();
queuePolicy.setInitialRedeliveryDelay(0);
queuePolicy.setBackOffMultiplier(3);
queuePolicy.setUseExponentialBackOff(true);
queuePolicy.setMaximumRedeliveries(10);
With this RedeliveryPolicy config, the redelivery attempts are made after waiting the following delays:
after 1s
after 3s
after 9s
after 27s
after 81s
after 243s
after 729s
after 2187s
after 6561s
after 19683s
That adds up to roughly 29524s of cumulative waiting, i.e. about 8.2 hours. As you can see, the later attempts are executed only after hours, and in the meantime you see the message's state as pending.
To prevent these long waits, you may want to set maximumRedeliveryDelay=300000L (5 minutes), for example as in the sketch below.
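A minimal sketch of that adjustment, applied to the policy from the question (setMaximumRedeliveryDelay caps each individual delay):
import org.apache.activemq.RedeliveryPolicy;

RedeliveryPolicy queuePolicy = new RedeliveryPolicy();
queuePolicy.setInitialRedeliveryDelay(0);
queuePolicy.setBackOffMultiplier(3);
queuePolicy.setUseExponentialBackOff(true);
queuePolicy.setMaximumRedeliveries(10);
// cap the exponentially growing delay at 5 minutes per attempt
queuePolicy.setMaximumRedeliveryDelay(300000L);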
Note that
Once a message's redelivery attempts exceeds the maximumRedeliveries configured for the Redelivery Policy, a "Poison ack" is sent back to the broker letting him know that the message was considered a poison pill. The Broker then takes the message and sends it to a Dead Letter Queue so that it can be analyzed later on.
You need to adapt your RedeliveryPolicy, because the message stays pending as long as maximumRedeliveries is not exceeded.
http://activemq.apache.org/message-redelivery-and-dlq-handling.html
I am trying to access Yahoo mail over IMAP using the JavaMail API. I can connect to the Yahoo mail server successfully and am able to fetch the messages using the folder.getMessages() call, where folder is an object of the javax.mail.Folder class.
I need to iterate over all the messages returned by this call, and I fetch the received date of each message in this iteration. The iteration works well for a small number of messages, as it does not take long. However, if the number of returned messages is large (say around 10000) and the iteration takes more than 30 minutes, the following exception occurs after 30 minutes:
javax.mail.FolderClosedException: * BYE IMAP4rev1 Server logging out
at com.sun.mail.imap.IMAPMessage.loadEnvelope(IMAPMessage.java:1234)
at com.sun.mail.imap.IMAPMessage.getReceivedDate(IMAPMessage.java:378)
at mypack.ImapUtils.getReceivedDate(ImapUtils.java:193)
...
Please note that I do not use the Folder object again during this iteration.
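For reference, a simplified sketch of the iteration described above (not the exact code; the FetchProfile bulk fetch shown is an optional extra that prefetches the envelopes and reduces per-message round trips):
import java.util.Date;
import javax.mail.FetchProfile;
import javax.mail.Folder;
import javax.mail.Message;
import javax.mail.MessagingException;

void iterateReceivedDates(Folder folder) throws MessagingException {
    Message[] messages = folder.getMessages();
    // optional: prefetch the envelopes in one bulk FETCH so that
    // getReceivedDate() does not issue a round trip per message
    FetchProfile fp = new FetchProfile();
    fp.add(FetchProfile.Item.ENVELOPE);
    folder.fetch(messages, fp);
    for (Message message : messages) {
        Date received = message.getReceivedDate(); // the call seen in the stack trace above
        // ... use the received date ...
    }
}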
Could anyone please tell me:
if there is a way to keep the folder open on the Yahoo mail server until it is explicitly closed?
if there is some property or setting which can be used to increase this 30-minute interval after which the folder is closed by Yahoo's IMAP server?
Thanks.