How to recover client from "No handler waiting for message" warning? - java

At medium to high load (test and production), when using the Vert.x Redis client, I get the following warning after a few hundred requests.
2019-11-22 11:30:02.320 [vert.x-eventloop-thread-1] WARN io.vertx.redis.client.impl.RedisClient - No handler waiting for message: [null, 400992, <data from redis>]
As a result, the handler supplied to the Redis call (see below) does not get called and the incoming request times out.
Handler<AsyncResult<String>> handler = res -> {
// success handler
};
redis.get(key, res -> {
handler.handle(res);
});
The real issue is that once the "No handler ..." warning comes up, the Redis client becomes useless because all further calls to Redis made via the client fails with the same warning resulting in the handler not getting called. I have an exception handler set on the client to attempt reconnection, but I do not see any reconnections being attempted.
How can one recover from this problem? Any workarounds to alleviate the severity would also be great.
I'm on vertx-core and vertx-redis-client 3.8.1 .

The upcoming 4.0 release had addressed this issue and a release should be hapening soon, how soon, I can't really tell.
The problem is that we can't easily port back from the master branch to the 3.8 branch because a major refactoring has happened on the client and the codebases are very different.
The new code, uses a connection pool and has been tested for concurrent access (and this is where the issue you're seeing comes from). Under load the requests are routed across all event loops and the queue that maintains the state between in flight requests (requests sent to redis) and waiting handlers would get out of sync in very special conditions.
So I'd first try to see if you can already start moving your code to 4.0, you can have a try with the 4.0.0-milestone3 version but to be totally fine, just have a run with the latest master which has more issues solved in this area.

Related

Cumulocity MQTT operation status doesn't change

I've registered a trial account to test Cumulocity and its mqtt api.
I want to send operation to some device (currently emulated by java service) and receive operation result.
As manual I use the following links:
https://www.cumulocity.com/guides/users-guide/device-management/#-a-name-operation-monitoring-a-working-with-operations
https://cumulocity.com/guides/device-sdk/mqtt#hello-mqtt-java
The following code is used to make a response to Cumulocity.
if (payload.startsWith("510")) {
System.out.println("Simulating device restart...");
client.publish("s/us", "501,c8y_Restart".getBytes(), 2, false);
System.out.println("...restarting...");
Thread.sleep(TimeUnit.SECONDS.toMillis(1));
client.publish("s/us", "503,c8y_Restart".getBytes(), 2, false);
System.out.println("...done...");
}
501 code means that restart operation started and 503 code means that device restarted successfully.
But actually in Cumulocity UI operation status has changed to Pending.
If I send restart operation again the previous operation will changed to Success but the new one to Pending.
So, what am I doing wrong?
I expect to mark the operation as Failed or Success.
The "PENDING" state is always the initial state of every operation in Cumulocity.
The SmartREST MQTT for operations always follows the following order:
PENDING -> EXECUTING -> SUCCESSFUL/FAILED
SmartREST will always update the oldest operation (as we want to execute operations in the historical order).
So if you send 501 it will look for the oldest matching operation in PENDING state.
If you send 503 it will look for the oldest matching operation in EXECUTING state.
From your explanation it is not fully clear if there were already 2 restart operations when you your code was executed. You code is fully correct but if there were already two restart operation this would explain why one is now SUCCESSFUL and the other still in PENDING.

How to continue on client when heavy server computation is done

This might be a simple problem, but I can't seem to find a good solution right now.
I've got:
OldApp - a Java application started from the command line (no web front here)
NewApp - a Java application with a REST api behind Apache
I want OldApp to call NewApp through its REST api and when NewApp is done, OldApp should continue.
My problem is that NewApp is doing a lot of stuff that might take a lot of time which in some cases causes a timeout in Apache, and then sends a 502 error to OldApp. The computations continue in NewApp, but OldApp does not know when NewApp is done.
One solution I thought of is fork a thread in NewApp and store some kind of ID for the API request, and return it to OldApp. Then OldApp could poll NewApp to see if the thread is done, and if so - continue. Otherwise - keep polling.
Are there any good design patterns for something like this? Am I complicating things? Any tips on how to think?
If NewApp is taking a long time, it should immediately return a 202 Accepted. The response should contain a Location header indicating where the user can go to look up the result when it's done, and an estimate of when the request will be done.
OldApp should wait until the estimate time is reached, then submit a new GET call to the location. The response from that GET will either be the expected data, or an entity with a new estimated time. OldApp can then try again at the later time, repeating until the expected data is available.
So The conversation might look like:
POST /widgets
response:
202 Accepted
Location: "http://server/v1/widgets/12345"
{
"estimatedAvailableAt": "<whenever>"
}
.
GET /widgets/12345
response:
200 OK
Location: "http://server/v1/widgets/12345"
{
"estimatedAvailableAt": "<wheneverElse>"
}
.
GET /widgets/12345
response:
200 OK
Location: "http://server/v1/widgets/12345"
{
"myProperty": "myValue",
...
}
Yes, that's exactly what people are doing with REST now. Because there no way to connect from server to client, client just polls very often. There also some improved method called "long polling", when connection between client and server has big timeout, and server send information back to connected client when it becomes available.
The question is on java and servlets ... So I would suggest looking at Servlet 3.0 asynchronous support.
Talking from a design perspective, you would need to return a 202 accepted with an Id and an URL to the job. The oldApp needs to check for the result of the operation using the URL.
The thread that you fork on the server needs to implement the Callable interface. I would also recommend using a thread pool for this. The GET url for the Job that was forked can check the Future object status and return it to the user.

What causes "java.io.IOException: stream was reset: CANCEL" with okhttp and spdy?

I'm experimenting with OKHttp (version 2.0.0-RC2) and SPDY and seeing IOException: stream was reset: CANCEL quite a lot, maybe 10% or more of all requests in some preliminary testing. When using Apache HttpClient and regular https we were not seeing any equivalent issue as far as I'm aware. I'm pretty sure we also don't see anything equivalent with OkHttp when SPDY is disabled (client.setProtocols(ImmutableList.of(Protocol.HTTP_1_1))) but I haven't done enough testing to be 100% confident.
This previous question sees these exceptions among others and the advice there is to ignore them, but this seems crazy: we get an exception while reading data from the server, so we abort the data processing code (which using Jackson). We need to do something in such cases. We could retry the request, of course, but sometimes it's a POST request which is not retry-able, and if we've already started receiving data from the server then it's a good bet that the server as already taken the requested action.
Ideally there is some configuration of the client and/or the server that we can do in order to reduce the incidence of these exceptions, but I don't understand SPDY well enough to know even where to start looking or to advise our server-admin team to start looking.
Stack trace, in case it's helpful:
java.io.IOException: stream was reset: CANCEL
at com.squareup.okhttp.internal.spdy.SpdyStream$SpdyDataSource.checkNotClosed(SpdyStream.java:442)
at com.squareup.okhttp.internal.spdy.SpdyStream$SpdyDataSource.read(SpdyStream.java:344)
at com.squareup.okhttp.internal.http.SpdyTransport$SpdySource.read(SpdyTransport.java:273)
at okio.RealBufferedSource.exhausted(RealBufferedSource.java:60)
at okio.InflaterSource.refill(InflaterSource.java:96)
at okio.InflaterSource.read(InflaterSource.java:62)
at okio.GzipSource.read(GzipSource.java:80)
at okio.RealBufferedSource$1.read(RealBufferedSource.java:227)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.loadMore(UTF8StreamJsonParser.java:174)
at com.fasterxml.jackson.core.base.ParserBase.loadMoreGuaranteed(ParserBase.java:431)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2111)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString(UTF8StreamJsonParser.java:2092)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:275)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:205)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeArray(JsonNodeDeserializer.java:230)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:202)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:58)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
at com.fasterxml.jackson.databind.ObjectMapper._readValue(ObjectMapper.java:2765)
at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:1546)
at com.fasterxml.jackson.core.JsonParser.readValueAsTree(JsonParser.java:1363)
at (application-level code...)
Your best bet is to set a breakpoint in the two places where the CANCEL error code is assigned: that's SpdyStream#closeInternal (line 246) and SpdyStream#receiveRstStream (line 304). If you can put a breakpoint here, you can capture who is canceling your stream and that'll shed light on the problem.
If for whatever reason you cannot attach a debugger, you can instrument the code to print a stacktrace when those lines are reached:
new Exception("SETTING ERROR CODE TO " + errorCode).printStackTrace();
In either case, I'm the author of that code and I'd love to help you resolve this problem.
Had the same problem and this was a result of network connection timeout, this was a result of downloading a large file from the web service
i had my timeout set to 2-min so i changed it to 5-min and it solved my problem
val okkHttpclient = OkHttpClient.Builder()
.connectTimeout(5, TimeUnit.MINUTES)
.writeTimeout(5, TimeUnit.MINUTES) // write timeout
.readTimeout(5, TimeUnit.MINUTES) // read timeout
.addInterceptor(networkConnectionInterceptor)
.build()
We had this issue because of broken http headers. The android Base64 encoder by default adds newlines which broke our Authorization headers.

Read timeout on a web service, but the operation still completes?

I have a Java web service client running on Linux (using Axis 1.4) that invokes a series of web services operations performed against a Windows server. There are times that some transactional operations fail with this Exception:
java.net.SocketTimeoutException: Read timed out
However, the operation on the server is completed (even having no useful response on the client). Is this a bug of either the web service server/client? Or is expected to happen on a TCP socket?
This is the expected behavior, rather than a bug. The operation behind the web service doesn't know anything about your read timing out so continues processing the operation.
You could increase the timeout of the connection - if you are manually manipulating the socket itself, the socket.connect() method can take a timeout (in milliseconds). A zero should avoid your side timing out - see the API docs.
If the operation is going to take a long time in each case, you may want to look at making this asynchronous - a first request submits the operations, then a second request to get back the results, possibly with some polling to see when the results are ready.
If you think the operation should be completing in this time, have you access to the server to see why it is taking so long?
I had similar issue. We have JAX-WS soap webservice running on Jboss EAP6 (or JBOSS 7). The default http socket timeout is set to 60 seconds unless otherwise overridden in server or by the client. To fix this issue I changed our java client to something like this. I had to use 3 different combinations of propeties here
This combination seems to work as standalone java client or webservice client running as part of other application on other web server.
//Set timeout on the client
String edxWsUrl ="http://www.example.com/service?wsdl";
URL WsURL = new URL(edxWsUrl);
EdxWebServiceImplService edxService = new EdxWebServiceImplService(WsURL);
EdxWebServiceImpl edxServicePort = edxService.getEdxWebServiceImplPort();
//Set timeout on the client
BindingProvider edxWebserviceBindingProvider = (BindingProvider)edxServicePort;
BindingProvider edxWebserviceBindingProvider = (BindingProvider)edxServicePort;
edxWebserviceBindingProvider.getRequestContext().put("com.sun.xml.internal.ws.request.timeout", connectionTimeoutInMilliSeconds);
edxWebserviceBindingProvider.getRequestContext().put("com.sun.xml.internal.ws.connect.timeout", connectionTimeoutInMilliSeconds);
edxWebserviceBindingProvider.getRequestContext().put("com.sun.xml.ws.request.timeout", connectionTimeoutInMilliSeconds);
edxWebserviceBindingProvider.getRequestContext().put("com.sun.xml.ws.connect.timeout", connectionTimeoutInMilliSeconds);
edxWebserviceBindingProvider.getRequestContext().put("javax.xml.ws.client.receiveTimeout", connectionTimeoutInMilliSeconds);
edxWebserviceBindingProvider.getRequestContext().put("javax.xml.ws.client.connectionTimeout", connectionTimeoutInMilliSeconds);

Salesforce/PHP - Bulk Outbound message (SOAP), Time out issue - See update #2

Salesforce can send up to 100 requests inside 1 SOAP message. While sending this type of Bulk Ooutbound message request my PHP script finishes executing but SF fails to accept the ACK used to clear the message queue on the Salesforce side of things. Looking at the Outbound message log (monitoring) I see all the messages in a pending state with the Delivery Failure Reason "java.net.SocketTimeoutException: Read timed out". If my script has finished execution, why do I get this error?
I have tried these methods to increase the execution time on my server as I have no access on the Salesforce side:
set_time_limit(0); // in the script
max_execution_time = 360 ; Maximum execution time of each script, in seconds
max_input_time = 360 ; Maximum amount of time each script may spend parsing request data
memory_limit = 32M ; Maximum amount of memory a script may consume
I used the high settings just for testing.
Any thoughts as to why this is failing the ACK delivery back to Salesforce?
Here is some of the code:
This is how I accept and send the ACK file for the imcoming SOAP request
$data = 'php://input';
$content = file_get_contents($data);
if($content) {
respond('true');
} else {
respond('false');
}
The respond function
function respond($tf) {
$ACK = <<<ACK
<?xml version = "1.0" encoding = "utf-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<notifications xmlns="http://soap.sforce.com/2005/09/outbound">
<Ack>$tf</Ack>
</notifications>
</soapenv:Body>
</soapenv:Envelope>
ACK;
print trim($ACK);
}
These are in a generic script that I include into the script that uses the data for a specific workflow. I can process about 25 requests (That are in 1 SOAP response) but once I go over that I get the timeout error in the Salesforce queue. for 50 requests is usually takes my PHP script 86.77 seconds.
Could it be Apache? PHP?
I have also tested just accepting the 100 request SOAP response and just accepting and sending the ACK the queue clears out, so I know it's on my side of things.
I show no errors in the apache log, the script runs fine.
I did find some info on the Salesforce site but still no luck. Here is the link.
Also I'm using the PHP Toolkit 11 (From Salesforce).
Other forum with good SF help
Thanks for any insight into this,
--Phill
UPDATE:
If I receive the incoming message and print the response, should this happen first regardless if I do anything else after? Or does it wait for my process to finish and then print the response?
UPDATE #2:
okay I think I have the problem:
PHP uses the single thread processing approach and will not send back the ACK file until the thread has completed it's processing. Is there a way to make this a mutli thread process?
Thread #1 - accept the incoming SOAP request and send back the ACK
Thread #2 - Process the SOAP request
I know I could break it up into like a DB table or flat file, but is there a way to accomplish this without doing that?
I'm going to try to close the socket after the ACK submission and continue the processing, cross my fingers it will work.
Sounds like the outbound message is hitting the timeout. Other users have reported timeouts as low as 10 seconds (see forum link below). The sandbox instance that I use (cs1) is timing out after about 1 minute, from my testing. It's possible that the timeout is an organization or instance level setting that Salesforce controls.
Two things you could try:
Open a support ticket with
Salesforce to see if they can
increase the timeout value for
outbound messages. From my
experience, there are lot of
settings that they can modify on the
organization level - this might be
one of them.
Offload processing of your data, so
that the ACK is sent immediately
back to Salesforce. Then the actual
processing of your data will take
place asynchronously. ie. Message
queue, separate thread, etc.
Some other resources that might be helpful:
related Salesforce forum discussion
Outbound messaging documentation
I think they timeout the thing waiting for Your script to end.
There is a way You could try to fix this.
Output the envelope with ack message at the beginning and then flush the thing so that their server gets it before You end processing. No threading, just plain priorities rethinking :)
read this for best info on flushing content
Are you 100% sure that Salesforce will wait the amount of time your scripts need too run? 80 seconds seem like a loong time too me.
If all requests failed I would guess that Salesforce expects you to set the Content-Type header appropriately, but this does not seem to be the case.
I don't know about Salesforce, but if you want to make some multithreading with PHP you should take a look at this code example and more precisely to pcntl_fork().
N.B: pcntl is not enabled by default and won't work on Windows platforms.
So what I've done is:
Accept all incoming OBM's, parse them into a DB
When this is done kick of a process that runs in the background (Actually I send it to the background so the script can end)
Send ACK file back
By just accepting the raw data, parsing into fields and inserting it into a DB is fairly quick. Then I issue a Linux Command Line command that also send the processing script to run in the background. Then I send the ACK file to SF and the script ends within the allotted time. It is cumbersome to split the script process into two separate stages but it works.

Categories