Prevent Duplicate Daily Report Emails being sent from Google App Engine - java

We have a problem: our customers are complaining that they are getting duplicate emails in their inbox, some days up to 5 or 6 copies of the exact same email at the exact same time. We don't understand why. The code has been rewritten at least once, but the problem persists.
I'll try to explain this... but it's a bit complicated :O(
Every night (early morning) we want to send our users a daily report containing usage stats. So we have a cron job:
<cron>
<url>/redacted/report/url</url>
<description>Send out daily reports to active subscribers</description>
<schedule>every 2 hours</schedule>
</cron>
The cron job hits the servlet get method:
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
    AccountFilter filter = AccountFilter.forWebSafeName(req.getParameter("filter"));
    createTasks(filter, null);
}
This calls the createTasks method with a null cursor:
private void createTasks(AccountFilter accountFilter, String cursor) {
    try {
        PagedResults<Account> pagedAccounts = accountRepository.getAccounts(accountFilter.getFilter(), 50, cursor);
        createTaskBatch(pagedAccounts);
        // If there are still more results in the cursor, send the cursor back to this
        // servlet's doPost method so we don't hit the request time limit
        if (pagedAccounts.getCursor() != null) {
            getQueue(QUEUE_NAME).add(withUrl(WORKER_URL)
                    .param(CURSOR_KEY, pagedAccounts.getCursor())
                    .param(FILTER_KEY, accountFilter.getWebSafeName()));
        }
    } catch (Exception ex) {
        logger.log(Level.WARNING, "Problem creating daily report task batch for filter " + accountFilter.getWebSafeName(), ex);
    }
}
This grabs 50 accounts and iterates over them, creating a new queued job for each email that should be sent at this time. There is code to explicitly check the last-report-sent timestamp, and to update that timestamp BEFORE creating the new queued task. This should err on the side of not sending the report rather than sending duplicates:
private void createTaskBatch(PagedResults<Account> pagedAccounts) {
    // GAE datastore query might return duplicate results?!
    List<Account> list = pagedAccounts.getResults();
    Set<Account> noDuplicates = new HashSet<>(list);
    int dups = list.size() - noDuplicates.size();
    if (dups > 0) {
        logger.warning("Accounts paged results contained " + dups + " duplicates!");
    }
    for (Account account : noDuplicates) {
        try {
            if (lastReportSentOver12HoursAgo(account)) {
                List<Parent> parents = parentRepository.getVerifiedParentsForAccount(account.getId());
                if (eitherParentSubscribed(parents)) {
                    List<AccountUser> users = accountUserRepository.listUsers(account.getId());
                    List<Device> devices = getUserDevices(account, users);
                    if (!devices.isEmpty()) {
                        DateTimeZone tz = getMostCommonTimezone(devices);
                        if (null == tz) {
                            logger.warning("No timezone found for account: " + account.getId());
                        } else {
                            // Send early in the morning as the report contains the previous day's stats
                            if (now(tz).getHourOfDay() < 7) {
                                // Mark sent now because the queue might not be processed for a while
                                // and the next cursor set might contain some of the same accounts
                                accountRepository.markReportSent(account.getId(), now());
                                getQueue(QUEUE_NAME).add(withUrl(DailyReportServlet.WORKER_URL)
                                        .param(DailyReportServlet.ACCOUNT_ID, account.getId())
                                        .param(DailyReportServlet.COMMON_TIMEZONE, tz.getID()));
                            }
                        }
                    }
                }
            }
        } catch (Exception ex) {
            logger.log(Level.WARNING, "Problem creating daily report task for " + account.getId(), ex);
        }
    }
}
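One thing worth noting here (not shown in the code above): markReportSent and the queue add are two separate, non-atomic operations, so a failure or retry between them can leave them out of sync. App Engine supports enqueueing a task as part of a datastore transaction, so the task is only dispatched if the commit succeeds. A minimal sketch under the assumption that the low-level datastore API is usable here and that the sent marker lives on the Account entity; the kind, property, queue, and URL strings below are illustrative, taken loosely from the question:

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.Key;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Transaction;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import static com.google.appengine.api.taskqueue.TaskOptions.Builder.withUrl;

public class ReportEnqueuer {
    // Illustrative names; adjust to the real Account schema and worker URL.
    private static final String ACCOUNT_KIND = "Account";
    private static final String SENT_PROP = "lastReportSent";

    /** Marks the account as sent and enqueues the worker task in one transaction. */
    public void markAndEnqueue(String accountId, String tzId) throws EntityNotFoundException {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        Transaction txn = ds.beginTransaction();
        try {
            Key key = KeyFactory.createKey(ACCOUNT_KIND, accountId);
            Entity account = ds.get(txn, key);
            account.setProperty(SENT_PROP, System.currentTimeMillis());
            ds.put(txn, account);
            Queue queue = QueueFactory.getQueue("dailyReports");
            // Transactional add: the task is only enqueued if the commit succeeds.
            queue.add(txn, withUrl("/ws/notification/daily-report-worker")
                    .param("accountId", accountId)
                    .param("tz", tzId));
            txn.commit();
        } finally {
            if (txn.isActive()) {
                txn.rollback();
            }
        }
    }
}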
The servlet POST method takes care of the follow-up pages of results via the cursor:
public void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    AccountFilter accountFilter = AccountFilter.forWebSafeName(req.getParameter(FILTER_KEY));
    logger.log(Level.INFO, "doPost hit from task queue with filter " + accountFilter.getWebSafeName());
    String cursor = req.getParameter(CURSOR_KEY);
    createTasks(accountFilter, cursor);
}
There is another servlet that handles each report task and it just creates the email contents and calls send on the com.sendgrid.SendGrid class.
Datastore's eventual consistency seems a likely candidate, but that should resolve within a few seconds, and I don't see how it would account for both the number of customers complaining and the number of duplicates some customers see.
Help! Any ideas? Are we being dumb somewhere?
UPDATED
For clarity... the email-send task ends up in this method, which does catch exceptions and report them back to us. We don't see an exception for the duplicate cases:
private void sendReport(Account account, DateTimeZone tz) throws IOException, EntityNotFoundException {
    try {
        boolean sent = false;
        Map<String, Object> root = buildEmailData(account, tz);
        for (Parent parent : parentRepository.getVerifiedParentsForAccount(account.getId())) {
            if (parent.getEmailPreferences().isSubscribedReports()) {
                emailBuilder.send(account, parent, root, "report", EmailSender.NOTIFICATION);
                sent = true;
            }
        }
        if (sent) {
            accountRepository.markReportSent(account.getId(), now());
        }
    } catch (Exception ex) {
        String message = "Problem building report email for account " + account.getId();
        logger.log(Level.WARNING, message, ex);
        new TeamNotificationEvent(message + " : exception: " + ex.getMessage()).fire();
        throw new IOException(message, ex);
    }
}
UPDATE 2 AFTER ADDING EXTRA DEBUG LOGGING
I see two POSTs arriving at the same time at the same task queue with the same cursor:
09:35:08.397 2015-04-30 200 0 B 3.78s /ws/notification/daily-report-task-creator
0.1.0.2 - - [30/Apr/2015:01:35:08 -0700] "POST /ws/notification/daily-report-task-creator HTTP/1.1" 200 0 "http://screentimelabs.appspot.com/ws/notification/daily-report-task-creator" "AppEngine-Google; (+http://code.google.com/appengine)" "screentimelabs.appspot.com" ms=3782 cpu_ms=662 queue_name=dailyReports task_name=8168414365365326983 instance=00c61b117c33a909790f0d1882657e04f40b2c7e app_engine_release=1.9.20
09:35:04.618 com.screentime.service.taskqueue.reports.DailyReportTaskCreatorServlet createTasks: createTasks called for filter: ACTIVE with cursor: E-ABAIICO2oQc35zY3JlZW50aW1lbGFic3InCxIHQWNjb3VudCIaamFybW8ua2Fya2thaW5lbkBnbWFpbC5jb20MiAIAFA
09:35:08.432 2015-04-30 200 0 B 8.84s /ws/notification/daily-report-task-creator
0.1.0.2 - - [30/Apr/2015:01:35:08 -0700] "POST /ws/notification/daily-report-task-creator HTTP/1.1" 200 0 "http://screentimelabs.appspot.com/ws/notification/daily-report-task-creator" "AppEngine-Google; (+http://code.google.com/appengine)" "screentimelabs.appspot.com" ms=8837 cpu_ms=1348 queue_name=dailyReports task_name=50170612326424582061 instance=00c61b117c2bffe8de313e96fea8aeb813f4b20f app_engine_release=1.9.20 trace_id=7e5c0348382e66cf4e2c6ba400529fb7
09:34:59.608 com.screentime.service.taskqueue.reports.DailyReportTaskCreatorServlet createTasks: createTasks called for filter: ACTIVE with cursor: E-ABAIICO2oQc35zY3JlZW50aW1lbGFic3InCxIHQWNjb3VudCIaamFybW8ua2Fya2thaW5lbkBnbWFpbC5jb20MiAIAFA
Searching for one particular account ID, I see these requests:
09:35:08.397 2015-04-30 200 0 B 3.78s /ws/notification/daily-report-task-creator
09:35:08.432 2015-04-30 200 0 B 8.84s /ws/notification/daily-report-task-creator
09:35:08.443 2015-04-30 200 0 B 6.73s /ws/notification/daily-report-task-creator
09:35:10.541 2015-04-30 200 0 B 4.03s /ws/notification/daily-report-task-creator
09:35:10.690 2015-04-30 200 0 B 11.09s /ws/notification/daily-report-task-creator
09:35:13.678 2015-04-30 200 0 B 862ms /ws/notification/daily-report-worker
09:35:13.829 2015-04-30 500 0 B 1.21s /ws/notification/daily-report-worker
09:35:14.677 2015-04-30 200 0 B 1.56s /ws/notification/daily-report-worker
09:35:14.961 2015-04-30 200 0 B 346ms /ws/notification/daily-report-worker
Some have repeated cursor values.

I will make a guess, because I don't see the task queue worker code: it's likely that you are not handling errors correctly in the task queue. If a task finishes with an error, GAE will re-queue it, so if some emails were already sent, the retried task will send them again. You need a way to remember what you have already processed within the task, so that a retry won't reprocess those accounts.
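A minimal sketch of such an idempotency guard in the worker servlet. markReportSentIfNotAlready is a hypothetical compare-and-set repository method, not from the original code; it would atomically record "sent today" for the account (e.g. inside a datastore transaction) and return false if another execution already recorded it:

// Sketch: make the email-send worker idempotent so queue retries are harmless.
protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    String accountId = req.getParameter(ACCOUNT_ID);
    // Hypothetical compare-and-set: returns false if a report was already
    // recorded as sent for this account today, i.e. this run is a retry/duplicate.
    boolean firstClaim = accountRepository.markReportSentIfNotAlready(accountId, now());
    if (!firstClaim) {
        logger.info("Report already sent for account " + accountId + "; skipping duplicate task.");
        resp.setStatus(200); // report success so the queue does not retry the task
        return;
    }
    // ... build and send the email as before ...
}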

Related

Java: Azure Service Bus Queue Receiving messages with sessions

I'm writing code in Java (using the Azure SDK for Java). I have a Service Bus queue that contains sessionful messages, and I want to receive those messages and process them elsewhere.
I make a connection to the queue by using QueueClient, and then use registerSessionHandler to process the messages (code below).
The problem is that whenever a message is received, I can print all its details, including the content, but it is printed 10 times, and after each print there is an exception.
(Printing 10 times: I understand this is because there is a 10-retry policy before the message is moved to the dead-letter queue and processing goes on to the next message.)
The Exception says
> USERCALLBACK-Receiver not created. Registering a MessageHandler creates a receiver.
[Screenshot: the output with the exception]
But I'm sure that the SessionHandler does the same thing as the MessageHandler, just with added support for sessions, so it should create a receiver, since it receives messages. I tried to use a MessageHandler, but it won't even work and stops the whole program, because it doesn't support sessionful messages, and the ones I receive have sessions.
My problem is understanding what the exception wants me to do, and how I can fix the code so it won't throw any exceptions. Does anyone have suggestions on how to improve the code, or other methods that do the same thing?
QueueClient qc = new QueueClient(
        new ConnectionStringBuilder(connectionString),
        ReceiveMode.PEEKLOCK);
qc.registerSessionHandler(
        new ISessionHandler() {
            @Override
            public CompletableFuture<Void> onMessageAsync(IMessageSession messageSession, IMessage message) {
                System.out.printf(
                        "\nMessage received: " +
                        "\n --> MessageId = %s " +
                        "\n --> SessionId = %s" +
                        "\n --> Content Type = %s" +
                        "\n --> Content = \n\t\t %s",
                        message.getMessageId(),
                        messageSession.getSessionId(),
                        message.getContentType(),
                        getMessageContent(message)
                );
                return qc.completeAsync(message.getLockToken());
            }
            @Override
            public CompletableFuture<Void> OnCloseSessionAsync(IMessageSession iMessageSession) {
                return CompletableFuture.completedFuture(null);
            }
            @Override
            public void notifyException(Throwable throwable, ExceptionPhase exceptionPhase) {
                System.out.println("\n Exception " + exceptionPhase + "-" + throwable.getMessage());
            }
        },
        new SessionHandlerOptions(1, true, Duration.ofMinutes(1)),
        Executors.newSingleThreadExecutor()
);
(The getMessageContent(message) method is a separate method, for those interested:)
public String getMessageContent(IMessage message) {
    List<byte[]> content = message.getMessageBody().getBinaryData();
    StringBuilder sb = new StringBuilder();
    for (byte[] b : content) {
        sb.append(new String(b));
    }
    return sb.toString();
}
For those who wonder, I managed to solve the problem!
It was simply done by using an Azure Functions ServiceBusQueueTrigger, which listens to the Service Bus queue and processes the messages. By setting isSessionsEnabled to true, it accepts sessionful messages, as I wanted :)
So instead of writing more than 100 lines of code, the code looks like this now:
public class Function {
    @FunctionName("QueueFunction")
    public void run(
            @ServiceBusQueueTrigger(
                    name = "TriggerName",            // any name you choose
                    queueName = "queueName",         // queue name from the portal
                    connection = "ConnectionString", // connection string from the portal
                    isSessionsEnabled = true
            ) String message,
            ExecutionContext context
    ) {
        // Write the code you want to run for each message here, using the
        // variable message, which contains the messageContent, messageId, sessionId, etc.
    }
}

Creating several Google Calendars and avoiding 403: Rate Limit Exceeded

I need to create up to 130 calendars on a Google account using Java and the Google Calendar API, but I keep getting
"403: Rate Limit Exceeded".
What I've tried:
- Looping with service.insert(calendar).execute();
  -> result: I receive error 403 after 25 inserts (which, weirdly enough, seems to be the old limit; it should be 60 according to https://support.google.com/a/answer/2905486?hl=en).
- Looping with a delay between each request (up to 60 seconds)
  -> result: didn't change the outcome, still 403 after 25 inserts (in its documentation about exponential backoff, Google talks about mere seconds, so I should think one whole minute is enough even if I don't increase that delay exponentially; see the backoff sketch after this list).
- Request batching (following THIS Google example code)
  -> result: after about 10 callbacks, the response falls into the onFailure method with... you guessed it, a 403 status code.
I think I am well within my (maxed-out) API quotas most of the time. I've seen "quotaExceeded" only once over several tests.
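For reference, a minimal sketch of true exponential backoff around each insert, as opposed to a fixed delay. It assumes the 403 is a retryable rate-limit response and reuses the question's service field; the retry limit and jitter values are arbitrary:

import java.io.IOException;
import java.util.Random;
import com.google.api.client.googleapis.json.GoogleJsonResponseException;
import com.google.api.services.calendar.model.Calendar;

// Sketch: retry a single calendar insert with exponential backoff plus jitter.
private Calendar insertWithBackoff(Calendar cal) throws IOException, InterruptedException {
    final int maxRetries = 5;
    final Random random = new Random();
    for (int attempt = 0; ; attempt++) {
        try {
            return this.service.calendars().insert(cal).execute();
        } catch (GoogleJsonResponseException e) {
            // Only retry rate-limit style 403s; rethrow anything else.
            if (e.getStatusCode() != 403 || attempt >= maxRetries) {
                throw e;
            }
            long backoffMillis = (1L << attempt) * 1000 + random.nextInt(1000); // 1s, 2s, 4s, ... plus jitter
            Thread.sleep(backoffMillis);
        }
    }
}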
Batch request sample:
batch = this.service.batch();
JsonBatchCallback<Calendar> callback = new JsonBatchCallback<Calendar>() {
    @Override
    public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) throws HttpResponseException {
        log.debug(e);
        throw new HttpResponseException(Integer.valueOf(e.get("code").toString()), e.getMessage());
    }
    @Override
    public void onSuccess(Calendar cal, HttpHeaders responseHeaders) {
        log.debug("Calendar created for " + cal.getSummary());
    }
};
for (String user : usernameList) {
    cal = new Calendar().setSummary(user);
    this.service.calendars().insert(cal).queue(batch, callback);
}
batch.execute();

Why doesn't this thread pool execute HTTP requests simultaneously?

I wrote a few lines of code which send 50 HTTP GET requests to a service running on my machine. The service always sleeps 1 second and returns an HTTP status code 200 with an empty body. As expected, the code runs for about 50 seconds.
To speed things up a little I tried to create an ExecutorService with 4 threads so I could always send 4 requests at the same time to my service. I expected the code to run for about 13 seconds.
final List<String> urls = new ArrayList<>();
for (int i = 0; i < 50; i++)
    urls.add("http://localhost:5000/test/" + i);

final RestTemplate restTemplate = new RestTemplate();
final List<Callable<String>> tasks = urls
        .stream()
        .map(u -> (Callable<String>) () -> {
            System.out.println(LocalDateTime.now() + " - " + Thread.currentThread().getName() + ": " + u);
            return restTemplate.getForObject(u, String.class);
        }).collect(Collectors.toList());

final ExecutorService executorService = Executors.newFixedThreadPool(4);
final long start = System.currentTimeMillis();
try {
    final List<Future<String>> futures = executorService.invokeAll(tasks);
    final List<String> results = futures.stream().map(f -> {
        try {
            return f.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }).collect(Collectors.toList());
    System.out.println(results);
} finally {
    executorService.shutdown();
    executorService.awaitTermination(10, TimeUnit.SECONDS);
}
final long elapsed = System.currentTimeMillis() - start;
System.out.println("Took " + elapsed + " ms...");
But - if you look at the seconds in the debug output - it seems like the first 4 requests are executed simultaneously, while all the other requests are executed one after another:
2018-10-21T17:42:16.160 - pool-1-thread-3: http://localhost:5000/test/2
2018-10-21T17:42:16.160 - pool-1-thread-1: http://localhost:5000/test/0
2018-10-21T17:42:16.160 - pool-1-thread-2: http://localhost:5000/test/1
2018-10-21T17:42:16.159 - pool-1-thread-4: http://localhost:5000/test/3
2018-10-21T17:42:17.233 - pool-1-thread-3: http://localhost:5000/test/4
2018-10-21T17:42:18.232 - pool-1-thread-2: http://localhost:5000/test/5
2018-10-21T17:42:19.237 - pool-1-thread-4: http://localhost:5000/test/6
2018-10-21T17:42:20.241 - pool-1-thread-1: http://localhost:5000/test/7
...
Took 50310 ms...
So for debugging purposes I changed the HTTP request to a sleep call:
// return restTemplate.getForObject(u, String.class);
TimeUnit.SECONDS.sleep(1);
return "";
And now the code works as expected:
...
Took 13068 ms...
So my question is why does the code with the sleep call work as expected and the code with the HTTP request doesn't? And how can I get it to behave in the way I expected?
From the information given, this is the most probable root cause: your requests are made in parallel, but the HTTP server that fulfils them handles one request at a time.
When you start making requests, the executor service fires them off concurrently, so you see the first 4 go out at the same time. But the HTTP server can only respond to one request at a time, i.e. one per second. When the 1st request is fulfilled, the executor service picks another request and fires it, and this goes on until the last request. In effect, 4 requests are blocked at the HTTP server at any moment, being served serially one after the other.
As a proof of concept of this theory, you can run the same client against something that can genuinely serve requests concurrently, for example a messaging service (queue) receiving from 4 channels, and test; that should reduce the total time.
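Another way to test the theory, not in the original answer: run the same client against a tiny local stub that sleeps one second per request but handles requests on a thread pool. A sketch using the JDK's built-in com.sun.net.httpserver; the port and path are chosen to match the question's URLs:

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentStubServer {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(5000), 0);
        server.createContext("/test", exchange -> {
            try {
                TimeUnit.SECONDS.sleep(1); // simulate 1 second of work per request
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            exchange.sendResponseHeaders(200, -1); // 200 with an empty body
            exchange.close();
        });
        // The executor controls server-side concurrency: with 8 threads, four
        // client requests can sleep simultaneously, so 50 requests from a
        // 4-thread client pool should take roughly 13 seconds, not 50.
        server.setExecutor(Executors.newFixedThreadPool(8));
        server.start();
    }
}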

PUSH Notifications for >1000 devices through GCM Server (Java)

I have a GCM backend Java server, and I'm trying to send a notification message to all users. Is my approach right, splitting them into batches of 1000 each before issuing the send request? Or is there a better approach?
public void sendMessage(@Named("message") String message) throws IOException {
    int count = ofy().load().type(RegistrationRecord.class).count();
    if (count <= 1000) {
        List<RegistrationRecord> records = ofy().load().type(RegistrationRecord.class).limit(count).list();
        sendMsg(records, message);
    } else {
        int msgsDone = 0;
        List<RegistrationRecord> records = ofy().load().type(RegistrationRecord.class).list();
        do {
            List<RegistrationRecord> regIdsParts = regIdTrim(records, msgsDone);
            msgsDone += 1000;
            sendMsg(regIdsParts, message);
        } while (msgsDone < count);
    }
}
The regIdTrim method
private List<RegistrationRecord> regIdTrim(List<RegistrationRecord> wholeList, final int start) {
    return wholeList.subList(start, (start + 1000) > wholeList.size() ? wholeList.size() : start + 1000);
}
The sendMsg method
private void sendMsg(List<RegistrationRecord> records, @Named("message") String message) throws IOException {
    if (message == null || message.trim().length() == 0) {
        log.warning("Not sending message because it is empty");
        return;
    }
    // crop longer messages (this must happen before the Message is built)
    if (message.length() > 1000) {
        message = message.substring(0, 1000) + "[...]";
    }
    Sender sender = new Sender(API_KEY);
    Message msg = new Message.Builder().addData("message", message).build();
    for (RegistrationRecord record : records) {
        Result result = sender.send(msg, record.getRegId(), 5);
        if (result.getMessageId() != null) {
            log.info("Message sent to " + record.getRegId());
            String canonicalRegId = result.getCanonicalRegistrationId();
            if (canonicalRegId != null) {
                // if the regId changed, we have to update the datastore
                log.info("Registration Id changed for " + record.getRegId() + " updating to " + canonicalRegId);
                record.setRegId(canonicalRegId);
                ofy().save().entity(record).now();
            }
        } else {
            String error = result.getErrorCodeName();
            if (error.equals(Constants.ERROR_NOT_REGISTERED)) {
                log.warning("Registration Id " + record.getRegId() + " no longer registered with GCM, removing from datastore");
                // if the device is no longer registered with GCM, remove it from the datastore
                ofy().delete().entity(record).now();
            } else {
                log.warning("Error when sending message : " + error);
            }
        }
    }
}
Quoting from Google Docs:
GCM supports up to 1,000 recipients for a single message. This capability makes it much easier to send out important messages to your entire user base. For instance, let's say you had a message that needed to be sent to 1,000,000 of your users, and your server could handle sending out about 500 messages per second. If you send each message with only a single recipient, it would take 1,000,000/500 = 2,000 seconds, or around half an hour. However, attaching 1,000 recipients to each message, the total time required to send a message out to 1,000,000 recipients becomes (1,000,000/1,000) / 500 = 2 seconds. This is not only useful, but important for timely data, such as natural disaster alerts or sports scores, where a 30 minute interval might render the information useless.
Taking advantage of this functionality is easy. If you're using the GCM helper library for Java, simply provide a List collection of registration IDs to the send or sendNoRetry method, instead of a single registration ID.
We cannot send more than 1,000 push notifications at a time. I searched a lot with no result, then went with the same approach: split the whole list into sublists of 1,000 items and send the push notifications batch by batch.
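Given the quoted docs, a sketch of what the multicast variant could look like with the gcm-server helper library: pass up to 1,000 registration IDs per send call instead of one at a time. API_KEY and log are assumed from the question's code; the batch size and logging are illustrative:

import java.io.IOException;
import java.util.List;
import com.google.android.gcm.server.Message;
import com.google.android.gcm.server.MulticastResult;
import com.google.android.gcm.server.Sender;

// Sketch: send in multicast batches of up to 1,000 registration IDs per call.
private void sendMulticast(List<String> allRegIds, String message) throws IOException {
    Sender sender = new Sender(API_KEY);
    Message msg = new Message.Builder().addData("message", message).build();
    for (int start = 0; start < allRegIds.size(); start += 1000) {
        int end = Math.min(start + 1000, allRegIds.size());
        List<String> batch = allRegIds.subList(start, end);
        // One HTTP call delivers to the whole batch; iterate the per-recipient
        // results to handle canonical IDs and NotRegistered errors as before.
        MulticastResult result = sender.send(msg, batch, 5);
        log.info("Batch sent: " + result.getSuccess() + " ok, " + result.getFailure() + " failed");
    }
}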

How to identify URIs of active (long-running) HTTP requests?

Imagine a webapp which (sometimes) takes a long time to respond to some HTTP (POST/GET/etc.) request. How would you find such a request on the server side?
So far I've used Tomcat's AccessLogValve to see the "completed" requests, but that doesn't let me see the "in-progress" (stuck) ones :(
For example:
- with netstat I'm able to identify long-lived sockets, which could give me a count of currently-stuck requests (not URIs though), but HTTP keep-alives invalidate this approach
- I could stack-dump the app server (kill -3 <server_pid>) multiple times, guess which threads are running long, and reverse-engineer the URIs - not a smart way either
- I could inject a router/proxy in front of the web-app server (substitute hostnames, clone certs) which would show me the currently-running calls - not a simple approach
- I could just run tcpdump continuously and parse the traffic to keep a list of currently-running URIs, but what to do with HTTPS then?
- the closest I found is Tomcat 7's StuckThreadDetectionValve, which periodically reports long-running calls, but it outputs the stacktrace (not the URI) and doesn't provide "live" data (it only polls periodically, floods the logs, and shows the state of 1-60 seconds ago, not "now")
Maybe I'm just missing/overlooking one of the vital/core/basic Tomcat features? Or maybe WebLogic (or any other app server) has something robust to offer for this?
I'm kind of lost without such a simple and essential feature.
Help? Please?
OK - creating my own Valve was a proper and easy approach; sharing it below. Apache has reworked AccessLogValve multiple times, but all revisions follow the same concept:
- the invoke(...) method just uses getNext().invoke(request, response) to invoke the chain of remaining valves and the actual handler/executor
- the log(...) method is invoked after the above is complete
So we only need to:
- also invoke log(...) before the getNext().invoke(request, response)
- modify log(...) to distinguish "before" and "after" invocations
The easiest way would have been:
@Override
public void invoke(Request request, Response response) throws IOException, ServletException {
    log(request, response, -1); // negative time indicates "before"
    super.invoke(request, response);
}
But the Tomcat 6.0.16 code wasn't easily extendable, so I prefixed the log messages (in a hard-coded manner) with the thread name and a "before"/"after" indicator. I also preferred to use reflection to access the private AccessLogValve.getDate():
package org.apache.catalina.valves;

import java.io.IOException;
import java.lang.reflect.Method;
import java.util.Date;
import javax.servlet.ServletException;
import org.apache.catalina.connector.Request;
import org.apache.catalina.connector.Response;

public class PreAccessLogValve extends AccessLogValve {
    @Override
    public void invoke(Request request, Response response) throws IOException, ServletException {
        long timeStart = System.currentTimeMillis();
        log(request, response, -timeStart); // negative time indicates "before" request
        getNext().invoke(request, response);
        log(request, response, System.currentTimeMillis() - timeStart); // actual (positive) - "after"
    }

    public void log(Request request, Response response, long time) {
        if (started && getEnabled() && null != logElements
                && (null == condition || null == request.getRequest().getAttribute(condition))) {
            StringBuffer result = new StringBuffer();
            try {
                Date date = (Date) methodGetDate.invoke(this);
                for (int i = 0; i < logElements.length; i++) {
                    logElements[i].addElement(result, date, request, response, time);
                }
            } catch (Throwable t) {
                t.printStackTrace();
            }
            log(Thread.currentThread().getName() + (time < 0 ? " > " : " < ") + result.toString());
        }
    }

    private static final Method methodGetDate;
    static {
        Method m = null;
        try {
            m = AccessLogValve.class.getDeclaredMethod("getDate");
            m.setAccessible(true);
        } catch (Throwable t) {
            t.printStackTrace();
        }
        methodGetDate = m;
    }
}
I compiled the above code against catalina.jar + servlet-api.jar and produced a new catalina-my.jar, which I placed into the tomcat/lib folder. After that, I modified server.xml to contain:
<Valve className="org.apache.catalina.valves.PreAccessLogValve"
       directory="/tmp" prefix="test." suffix=".txt"
       pattern="%a %t %m %U %s %b %D" resolveHosts="false" buffered="false"/>
Here's the sample output:
http-8007-exec-1 > 10.61.105.105 [18/Jan/2014:05:54:14 +0000] POST /admin/0$en_US/secure/enduser/search.do 200 - -1390024454470
http-8007-exec-5 > 10.61.105.105 [18/Jan/2014:05:54:17 +0000] GET /admin/0$en_US/secure/enduser/search.do 200 - -1390024457300
http-8007-exec-5 < 10.61.105.105 [18/Jan/2014:05:54:17 +0000] GET /admin/0$en_US/secure/enduser/search.do 200 13933 44
http-8007-exec-3 > 10.61.105.105 [18/Jan/2014:05:54:17 +0000] GET /admin/html/main.js 200 - -1390024457490
http-8007-exec-3 < 10.61.105.105 [18/Jan/2014:05:54:17 +0000] GET /admin/html/main.js 200 3750 0
http-8007-exec-5 > 10.61.105.105 [18/Jan/2014:05:54:17 +0000] GET /admin/images/layout/logo.gif 200 - -1390024457497
http-8007-exec-5 < 10.61.105.105 [18/Jan/2014:05:54:17 +0000] GET /admin/images/layout/logo.gif 200 1996 0
http-8007-exec-1 < 10.61.105.105 [18/Jan/2014:05:54:24 +0000] POST /admin/0$en_US/secure/enduser/search.do 200 13308 10209
This way all "in-progress" URIs can be easily retrieved at any moment:
[root@serv1 tomcat]# awk '{if(">"==$2){if($1 in S)print S[$1];S[$1]=$0}else delete S[$1]}END{for(i in S)print S[i]}' test
http-8007-exec-4 > 10.61.105.105 [18/Jan/2014:06:13:20 +0000] GET /admin/images/1x1blank.gif 200 - -13
http-8007-exec-2 > 10.61.105.105 [18/Jan/2014:06:13:16 +0000] POST /admin/servlet/handlersvr 200 - -13
Unfortunately, there is not a simple way to get a list of the in-flight HTTP requests that are taking a long time. As you mention, taking several thread dumps a few seconds apart will tell you which threads are performing the HTTP operations slowly (because the thread stack will be identical in each dump while it waits for the response), but it doesn't tell you much more than that unless you can follow the code back to a static piece of code with the URL. However, you can take the thread dumps, identify the thread IDs, then take a heap dump and find those threads in it. While not straightforward and definitely not simple, this lets you get the URL that is being requested, how long it has been waiting, and so on.
