I use Apache Log4j2 and its SMTPAppender in an application. It's configured to send email notifications for events of level ERROR or above. Usually this works great.
But recently I had a batch processing situation in which thousands of ERROR lines were logged in a time interval of 5 minutes. My inbox was flooded with thousands of emails and our mail server blacklisted the affected application server...
To avoid such a mishap: Can we apply a maximum limit to the number of emails sent per time interval?
E.g. I'd like SMTPAppender to send no more than 20 emails per hour. If this limit is exceeded, further ERROR/FATAL lines should be aggregated into a single email which is sent as soon as the 20-per-hour limit permits another email.
Is there a Log4j2-standard way to achieve that? How did you solve this task in your apps using Log4j2?
You can use BurstFilter. These are the parameters (from the documentation):
level (String): Level of messages to be filtered. Anything at or below this level will be filtered out if maxBurst has been exceeded. The default is WARN, meaning any messages higher than WARN will be logged regardless of the size of a burst.
rate (float): The average number of events per second to allow.
maxBurst (integer): The maximum number of events that can occur before events are filtered for exceeding the average rate. The default is 10 times the rate.
onMatch (String): Action to take when the filter matches. May be ACCEPT, DENY or NEUTRAL. The default value is NEUTRAL.
onMismatch (String): Action to take when the filter does not match. May be ACCEPT, DENY or NEUTRAL. The default value is DENY.
<Appenders>
<SMTP> <!-- parameters omitted for brevity -->
<BurstFilter level="ERROR" rate="16" maxBurst="100"/>
</SMTP>
</Appenders>
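Note that rate is specified in events per second, so a whole-number rate like the one above is far more permissive than the 20-emails-per-hour limit you asked for. To approximate that limit you would need a fractional rate, roughly 20/3600 ≈ 0.0056 (a sketch; tune the values to your needs):
<BurstFilter level="ERROR" rate="0.0056" maxBurst="20"/>
Be aware, though, that BurstFilter drops the excess events rather than aggregating them into a summary email.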
I want to create a metric-based log alert in a GCP environment. I need an email notification when a particular error message is thrown (3 or more errors within a 5-minute period), based on the GCP Logs Explorer.
I have created a counter metric with a log query. From the metric I've created an alert policy with the following configuration:
Rolling window: 5 mins
Rolling window function: count
Time series aggregation: none
Condition type: Threshold
Alert trigger: Any time series violates
Threshold position: Above threshold
Threshold value: 3
With the above config I am not getting the mail alert on exactly 3 errors within a five-minute period.
For example, at 5pm only 2 errors were generated in the logs, but the threshold chart showed 4 and a mail was received.
Did I miss anything? Thank you
Background
We have a data transfer solution with Azure Service Bus as the message broker. We are transferring data from x datasets through x queues - with x dedicated QueueClients as senders. Some senders publish messages at the rate of one message every two seconds, while others publish one every 15 minutes.
The application on the data source side (where senders are) is working just fine, giving us the desired throughput.
On the other side, we have an application with one QueueClient receiver per queue, with the following configuration (a registration sketch follows this list):
maxConcurrentCalls = 1
autoComplete = true (if receive mode = RECEIVEANDDELETE) or false (if receive mode = PEEKLOCK) - for some receivers we want the messages preserved in the Service Bus queue if they shut down unexpectedly
maxAutoRenewDuration = 3 minutes (lock duration on all queues = 30 seconds)
an Executor service with a single thread
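For reference, wiring a receiver up with these options in the 3.x Java SDK might look roughly like this (a sketch; the connection string, queue name, and handler class are placeholders):
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import com.microsoft.azure.servicebus.MessageHandlerOptions;
import com.microsoft.azure.servicebus.QueueClient;
import com.microsoft.azure.servicebus.ReceiveMode;
import com.microsoft.azure.servicebus.primitives.ConnectionStringBuilder;

QueueClient receiver = new QueueClient(
        new ConnectionStringBuilder(connectionString, queueName), // placeholders
        ReceiveMode.PEEKLOCK);

// single-threaded executor, as described in the list above
ExecutorService executor = Executors.newSingleThreadExecutor();

// maxConcurrentCalls = 1, autoComplete = false (PEEKLOCK), maxAutoRenewDuration = 3 minutes
receiver.registerMessageHandler(
        new MyMessageHandler(receiver), // hypothetical IMessageHandler implementation
        new MessageHandlerOptions(1, false, Duration.ofMinutes(3)),
        executor);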
The MessageHandler registered with each of these receivers does the following:
public CompletableFuture<Void> onMessageAsync(final IMessage message) {
    // deserialize the message body
    final CustomObject customObject = (CustomObject) SerializationUtils
            .deserialize((byte[]) message.getMessageBody().getBinaryData().get(0));

    // run processDB1() and processDB2() asynchronously
    final List<CompletableFuture<Boolean>> processFutures = new ArrayList<>();
    processFutures.add(processDB1(customObject)); // processDB1() returns CompletableFuture<Boolean>
    processFutures.add(processDB2(customObject)); // processDB2() returns CompletableFuture<Boolean>

    // join both CompletableFutures to get the result Booleans (blocks until both complete)
    final List<Boolean> results = CompletableFuture
            .allOf(processFutures.toArray(new CompletableFuture[processFutures.size()]))
            .thenApply(ignored -> processFutures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList()))
            .join();

    if (results.contains(false)) {
        // dead-letter the message if any result is false
        return getQueueClient().deadLetterAsync(message.getLockToken());
    } else {
        // complete the message otherwise
        return getQueueClient().completeAsync(message.getLockToken());
    }
}
We tested with the following scenarios:
Scenario 1 - receive mode = RECEIVEANDDELETE, message publish rate: 30/ minute
Expected Behavior
The messages should be received continuously with a constant throughput (which need not necessarily be the throughput at the source, where messages are published).
Actual behavior
We observe random, long periods of inactivity from the QueueClient - ranging from minutes to hours - there are no Outgoing Messages from the Service Bus namespace (observed on the Metrics charts), and there are no consumption logs for the same time periods!
Scenario 2 - receive mode = PEEKLOCK, message publish rate: 30/ minute
Expected Behavior
The messages should be received continuously with a constant throughput (which need not necessarily be the throughput at the source, where messages are published).
Actual behavior
We keep seeing MessageLockLostException consistently, starting 20-30 minutes into the run of the application.
We tried doing the following -
we reduced the prefetch count (from 20 × processing rate, as mentioned in the Best Practices guide) to a bare minimum (even 0 in one test cycle), to reduce the number of messages that are locked for the client
we increased the maxAutoRenewDuration to 5 minutes - our processDB1() and processDB2() do not take more than a second or two in almost 90% of cases, so I think the lock duration of 30 seconds and the maxAutoRenewDuration are not the issue here
we removed the blocking CompletableFuture.get() and made the processing synchronous
None of these tweaks helped us fix the issue. What we observed is that the COMPLETE and RENEWMESSAGELOCK operations are the ones throwing the MessageLockLostException.
We need help with finding answers for the following:
Why are there long periods of inactivity of the QueueClient in scenario 1?
How do we know whether the MessageLockLostExceptions are thrown because the locks have indeed expired? We suspect the locks cannot expire that soon, as our processing completes within a second or two. Disabling prefetch also did not solve this for us.
Versions and Service Bus details
Java - openjdk-11-jre
Azure Service Bus namespace tier: Standard
Java SDK version - 3.4.0
For Scenario 1:
If you have duplicate detection history enabled, there is a possibility of this behavior happening, as in the following scenario:
I had enabled it for 30 seconds. I constantly hit Service Bus with duplicate messages (in my case, messages with the same messageId from the client, 30 per minute). I would see no outgoing activity for that window. Though the messages were received at Service Bus from the sending client, I was not able to see them in the outgoing messages. You could check whether you are encountering duplicate messages that get filtered out, in turn resulting in the inactivity on outgoing.
Also note: you can't enable or disable duplicate detection after the queue is created. You can only do so at the time of creating the queue.
The issue was not with the QueueClient object per se. It was with the processes we were triggering from within the MessageHandler: processDB1(customObject) and processDB2(customObject). Since these processes were not optimized, message consumption dropped and the locks expired (in peek-lock mode), as the handler was spending more time completing these operations relative to the rate at which messages were published to the queues.
After optimizing the processes, consumption and completion (in peek-lock mode) were just fine.
We are caching our pages and content in Google CDN.
Google has provided us an API to invalidate cache for a particular page/path.
Our website is built using a CMS called AEM (Adobe Experience Manager). This CMS supports frequent page/content updates, e.g. we may update what is shown on https://our-webpage/homepage.html twice in a day. When such an operation is done, we need to flush the cache at the Google CDN for "homepage.html".
This kind of activity is very common, meaning we need to send several thousand cache invalidation requests in a day.
We are sending so many invalidation requests that after some time we get this error:
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
"code" : 403,
"errors" : [ {
"domain" : "usageLimits",
"message" : "Rate Limit Exceeded",
"reason" : "rateLimitExceeded"
} ],
"message" : "Rate Limit Exceeded"
}
How do we solve this?
I've read this page https://developers.google.com/drive/api/v3/handle-errors
It mentions batching requests.
How do I send invalidation requests for multiple pages to Google CDN in one batch?
Or is it possible to raise the API flush call limit to a higher number per day?
Right now, if we have 100 pages to flush from the CDN, we make the below HTTP call 100 times (one for each page).
CacheInvalidationRule requestBody = new CacheInvalidationRule();
// IMPORTANT
requestBody.setPath(pagePath);
Compute computeService = createComputeService();
Compute.UrlMaps.InvalidateCache request =
        computeService.urlMaps().invalidateCache(projectName, urlMap, requestBody);
Operation response = request.execute();
if (LOG.isDebugEnabled()) {
    LOG.debug("Google CDN Flush Response JSON :: {}", response);
}
LOG.info("Google CDN Flush Invalidation for Page Path {} :: Response Status Code :: {}", pagePath, response.getStatus());
We set the page to flush in requestBody.setPath(pagePath);
Can we do this in a more efficient way, like sending all pages as an array of strings in one HTTP call?
Like:
requestBody.setPath(pagePath);
Where
pagePath="['/homepage.html','/videos.html','/sports/basketball.html','/tickets.html','/faqs.html']";
Rate Limit Exceeded is flood protection: you are going too fast, so slow down your requests.
Implement exponential backoff for retrying the requests (a sketch follows below).
You can periodically retry a failed request over an increasing amount of time to handle errors related to rate limits, network volume, or response time. For example, you might retry a failed request after one second, then after two seconds, and then after four seconds. This method is called exponential backoff and it is used to improve bandwidth usage and maximize throughput of requests in concurrent environments. When using exponential backoff, consider the following:
Start retry periods at least one second after the error.
If the attempted request introduces a change, such as a create request, add a check to make sure nothing is duplicated. Some errors, such as invalid authorization credentials or "file not found" errors, aren't resolved by retrying the request.
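As an illustration, a minimal backoff wrapper around the invalidation call from the question might look like this (a sketch; the retry ceiling and jitter values are arbitrary):
import java.util.Random;
import com.google.api.client.googleapis.json.GoogleJsonResponseException;
import com.google.api.services.compute.Compute;
import com.google.api.services.compute.model.Operation;

// Retries request.execute() with exponential backoff on 403 rate-limit errors.
Operation executeWithBackoff(Compute.UrlMaps.InvalidateCache request) throws Exception {
    final Random random = new Random();
    for (int attempt = 0; ; attempt++) {
        try {
            return request.execute();
        } catch (GoogleJsonResponseException e) {
            if (e.getStatusCode() != 403 || attempt >= 5) {
                throw e; // not a rate-limit error, or retries exhausted
            }
            // wait 1s, 2s, 4s, ... plus up to 1s of random jitter before retrying
            Thread.sleep((1000L << attempt) + random.nextInt(1000));
        }
    }
}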
Batching won't help much; you're still going to hit the same rate limit issues. I have even seen rate limit errors when batching.
Also note your link is from the Google Drive API; I'm not even sure Cloud CDN supports batching of requests.
Wouldn't it be better to aggregate several updates on the AEM side and send only one request to the CDN after a maximum period of time and/or a maximum number of changes?
I mean, if you change your homepage on AEM, usually you would invalidate all the subpages as well (navigation might change, ...).
Isn't there a possibility for the Google CDN to invalidate a tree or subtree?
At least that's what I would take from this documentation: https://cloud.google.com/sdk/gcloud/reference/compute/url-maps/invalidate-cdn-cache
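If such path-pattern invalidation fits your URL structure, the request body from the question could target a whole subtree in a single call (a sketch; the path is illustrative):
// one request invalidates everything under /sports/
CacheInvalidationRule requestBody = new CacheInvalidationRule();
requestBody.setPath("/sports/*");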
I have some applications installed at my customers' sites, and I configured the SMTP appender to receive error emails.
Unfortunately, I need a way to understand which customer the email is coming from.
I'm trying to set a parameter in the map in order to show it in the subject of the email. I can set this parameter only after my app has started and the DB is up:
String[] parametri = {username};
MapLookup.setMainArguments(parametri);
and my log4j2.xml is:
<SMTP name="Mailer" subject="${sys:logPath} - ${map:0}" to="${receipients}"
from="${from}" smtpHost="${smtpHost}" smtpPort="${smtpPort}"
smtpProtocol="${smtpProtocol}" smtpUsername="${smtpUser}"
smtpPassword="${smtpPassword}" smtpDebug="false" bufferSize="200"
ignoreExceptions="false">
</SMTP>
The subject is the relevant part. Unfortunately, the subject is not replaced by log4j and remains as it is.
What am I doing wrong?
Thanks
Currently, the SmtpAppender class (actually its helper SmtpManager) creates a MimeMessage object once and reuses it for all messages to be sent. The message subject is initialized only once. The lookup is done only once when your configuration is read.
I suggest you raise a feature request on the Log4j2 Jira issue tracker for your use case.
Note: log4j 2.6+ supports this natively; you need Java7+ for this.
I created a freely usable solution for log4j2, and also for Java 6, with an ExtendedSmtpAppender supporting PatternLayout in the subject.
If you still use log4j 1.x (original question), simply replace your log4j-1.x.jar with log4j-1.2-api-2.x.jar - and log4j-core-2.x.jar + log4j-api-2.x.jar of course.
You get it from Maven Central as de.it-tw:log4j2-extras (This requires Java 7+ and log4j 2.8+).
If you are restricted to Java 6 (and thus log4j 2.3) then use de.it-tw:log4j2-Java6-extras
See also the GitLab project: https://gitlab.com/thiesw/log4j2-extras (or https://gitlab.com/thiesw/log4j2-Java6-extras)
Additionally, it supports burst summarizing, so you will not get 1000 error emails within a few seconds or minutes. Use case: send all ERROR logs via email to support/developers. With a broken network or database, this can cause hundreds of instances of the same error email.
This appender does the following:
the first occurrence is emailed immediately
all following similar ERROR logs are buffered for a certain time (similarity and time are configurable)
after the time has passed, a summary email with summary info (number of events, time span) and the first and last event is sent
Example configuration (inside <Appenders>):
<SMTPx name="ErrorMail" smtpHost="mailer.xxxx.de" smtpPort="25"
from="your name <noReply#xxx.de>" to="${errorEmailAddresses}"
subject="[PROJECT-ID, ${hostName}, ${web:contextPath}] %p: %c{1} - %m%notEmpty{ =>%ex{short})}"
subjectWithLayout="true" bufferSize="5"
burstSummarizingSeconds="300" bsCountInSubject="S" bsMessageMaskDigits="true"
bsExceptionOrigin="true" >
<PatternLayout pattern="-- %d %p %c [%.20t,%x] %m%n" charset="UTF-8" /> <!-- SMTP uses fixed charset for message -->
</SMTPx>
<Async name="AsyncErrorMail" blocking="false" errorRef="Console">
<AppenderRef ref="ErrorMail"/>
</Async>
See also https://issues.apache.org/jira/browse/LOG4J2-1192.
I have set up an errorHandler in a Camel route that will retry a message several times before sending the message to a dead letter channel (an activemq queue in this case). What I would like is to see an ERROR log when the message failed to be retried the max number of times and was then sent to the dead letter queue.
Looking at the docs for error handling and dead letter channels, it seems that there are 2 options available on the RedeliveryPolicy: retriesAttemptedLogLevel and retriesExhaustedLogLevel. Supposedly by default the retriesExhaustedLogLevel is already set at LoggingLevel.ERROR, but it does not appear to actually log anything when it has expended all retries and routes the message to the dead letter channel.
Here is my errorHandler definition via Java DSL.
.errorHandler(this.deadLetterChannel(MY_ACTIVE_MQ_DEAD_LETTER)
.useOriginalMessage()
.maximumRedeliveries(3)
.useExponentialBackOff()
.retriesExhaustedLogLevel(LoggingLevel.ERROR)
.retryAttemptedLogLevel(LoggingLevel.WARN))
I have explicitly set the level to ERROR now and it still does not appear to log out anything (to any logging level). On the other hand, retryAttemptedLogLevel is working just fine and will log to the appropriate LoggingLevel (ie, I could set retryAttemptedLogLevel to LoggingLevel.ERROR and see the retries as ERROR logs). However I only want a single ERROR log in the event of exhaustion, instead of an ERROR log for each retry when a subsequent retry could potentially succeed.
Maybe I am missing something, but it seems that the retriesExhaustedLogLevel does not do anything...or does not log anything if the ErrorHandler is configured as a DeadLetterChannel. Is there a configuration that I am still needing, or does this feature of RedeliveryPolicy not execute for this specific ErrorHandlerFactory?
I could also set up a route to send my exhausted messages that simply logs and routes to my dead letter channel, but I would prefer to try and use what is already built into the ErrorHandler if possible.
Update: I changed the ErrorHandler's DeadLetterChannel to be a direct endpoint and left the 2 logLevel configs the same. I got the 3 retry-attempted WARN logs, but no ERROR log telling me the retries were exhausted. I did, however, set up a small route listening to the direct dead letter endpoint that logs, and that is working.
Not a direct solution to my desire to have the ERROR log on exhaustion, but it is an acceptable workaround for now.
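For anyone interested, that workaround could look roughly like this in the Java DSL (a sketch; the endpoint names are illustrative):
errorHandler(deadLetterChannel("direct:deadLetter")
        .useOriginalMessage()
        .maximumRedeliveries(3)
        .useExponentialBackOff()
        .retryAttemptedLogLevel(LoggingLevel.WARN));

// small route that produces the missing ERROR log, then forwards to the real dead letter queue
from("direct:deadLetter")
        .log(LoggingLevel.ERROR, "Redeliveries exhausted for message ${id}")
        .to("activemq:queue:my.dead.letter");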
Please try with this code:
.errorHandler(deadLetterChannel("kafka:sample-dead-topic")
    .maximumRedeliveries(4).redeliveryDelay(60000)
    .retriesExhaustedLogLevel(LoggingLevel.ERROR)
    .retryAttemptedLogLevel(LoggingLevel.WARN)
    .logHandled(true)
    .allowRedeliveryWhileStopping(true)
    .logRetryStackTrace(true)
    .logExhausted(true)
    .logStackTrace(true)
    .logExhaustedMessageBody(true)
)
The retry is configured with a 1-minute interval.
The Camel application logged the errors for every retry with detailed information.