Advice please on connection management and retries with IMqttAsyncClient in Java
I see that the connection options include auto-reconnection, which in the synchronous client only comes into play once an initial connection has been made.
I don't see explicit documentation on the behaviour of IMqttAsyncClient if the initial connection fails. Am I required to have retry logic in my code for the initial connection?
So far it appears that when I attempt to connect, the failure callback fires as expected. But then what am I supposed to do? Am I required to code some retry logic myself? In the later auto-reconnect scenario, retries happen automatically. At first sight it appears that once the initial connection fails, that's it.
I have coded a retry in the failure callback
// my original connection method, re-used below from the failure callback
void connect(/* ... params ... */) {
    // attempt the connection using MqttAsyncClient.connect(), passing "connecting"
    // as the user context and this listener as the callback
}

// callback (IMqttActionListener)
@Override
public void onFailure(IMqttToken asyncActionToken, Throwable exception) {
    // log the failure here
    // the user context lets me know what we were trying to do
    if ("connecting".equals(asyncActionToken.getUserContext())
            && !myShutdownRequested) {
        // log that we're retrying, and sleep a bit here
        connect(); // calling the original connect method again
    }
}
Explicit questions
1. Do I have responsibility for handling retries of the initial connection? Seems odd, given the auto-reconnect capability, but so it goes.
2. If so, is it safe to call MqttAsyncClient.connect() from inside the failure callback?
I've failed to find explicit documentation on either point, and "I tried it and it worked" doesn't settle point 2: if there's a subtle race condition, any problem might not show up immediately. So far it appears to work nicely ...
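For completeness, if the retry really is my responsibility, I'm assuming the safer variant is to schedule the reconnect rather than sleep inside the callback thread. A rough sketch only; scheduler and retryDelaySeconds would be fields of my own class, not part of the Paho API:

// sketch: re-schedule the connect attempt instead of sleeping in the callback thread
// (scheduler is an assumed java.util.concurrent.ScheduledExecutorService field,
//  retryDelaySeconds an assumed setting)
@Override
public void onFailure(IMqttToken asyncActionToken, Throwable exception) {
    if ("connecting".equals(asyncActionToken.getUserContext()) && !myShutdownRequested) {
        scheduler.schedule(() -> connect(), retryDelaySeconds, TimeUnit.SECONDS);
    }
}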
I get lots of events to process in RabbitMQ. They are forwarded to service 1 for processing, and after some processing of the data there is an internal call to microservice 2. However, I frequently get java.net.SocketTimeoutException: timeout when I call service 2, so as a first trial I increased the timeout limit from 2 s to 10 s. That did reduce the number of timeout exceptions, but a lot of them still occur.
The second change I made was to remove Spring's deprecated retry method and replace it with retryWhen, with backoff and a jitter factor, as shown below:
.retryWhen(Retry.backoff(ServiceUtils.NUM_RETRIES, Duration.ofSeconds(2)).jitter(0.50)
        .onRetryExhaustedThrow((retryBackoffSpec, retrySignal) -> {
            throw new ServiceException(
                    ErrorBo.builder()
                            .message("Service failed to process after max retries")
                            .build());
        }))
.onErrorResume(error -> {
    // return and log the error only once all the retries have been exhausted
    log.error(error.getMessage() + ". Error occurred while generating pdf");
    return Mono.error(ServiceUtils
            .returnServiceException(ServiceErrorCodes.SERVICE_FAILURE,
                    String.format("Service failed to process after max retries, failed to generate PDF")));
})
);
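For context, this is roughly where such a timeout is configured, assuming the call to service 2 goes through Spring WebClient on Reactor Netty (the client is not shown in the snippet above); the names and base URL below are placeholders, not my exact code:

import java.time.Duration;

import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;

import reactor.netty.http.client.HttpClient;

// Reactor Netty client with the increased 10 s response timeout
HttpClient httpClient = HttpClient.create()
        .responseTimeout(Duration.ofSeconds(10));

// hypothetical WebClient used to call service 2; base URL is a placeholder
WebClient service2Client = WebClient.builder()
        .baseUrl("http://service2")
        .clientConnector(new ReactorClientHttpConnector(httpClient))
        .build();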
So my questions are:
Some service calls succeed and some fail. Does that mean there is still a bottleneck in processing the requests somewhere, perhaps on the server side, so that it cannot process all of them?
Do I still need to increase the timeout limit further, if possible?
How do I make sure there is no java.net.SocketTimeoutException: timeout at all?
This issue started appearing only recently, and there seem to have been no changes to ports or any other connection-level settings.
Still, what should I check to make sure the connection-level settings are correct? Could someone please guide me on this?
Thanks in advance.
I'm running an HL Fabric private network and submitting transactions to the ledger from a Java Application using Fabric-Java-Sdk.
Occasionally, maybe 1 in 10,000 times, the Java application throws an exception when I submit a transaction to the ledger, with a message like the one below:
ERROR 196664 --- [ Thread-4] org.hyperledger.fabric.sdk.Channel : Future completed exceptionally: sendTransaction

java.lang.IllegalArgumentException: The proposal responses have 2 inconsistent groups with 0 that are invalid. Expected all to be consistent and none to be invalid.
    at org.hyperledger.fabric.sdk.Channel.doSendTransaction(Channel.java:5574) ~[fabric-sdk-java-2.1.1.jar:na]
    at org.hyperledger.fabric.sdk.Channel.sendTransaction(Channel.java:5533) ~[fabric-sdk-java-2.1.1.jar:na]
    at org.hyperledger.fabric.gateway.impl.TransactionImpl.commitTransaction(TransactionImpl.java:138) ~[fabric-gateway-java-2.1.1.jar:na]
    at org.hyperledger.fabric.gateway.impl.TransactionImpl.submit(TransactionImpl.java:96) ~[fabric-gateway-java-2.1.1.jar:na]
    at org.hyperledger.fabric.gateway.impl.ContractImpl.submitTransaction(ContractImpl.java:50) ~[fabric-gateway-java-2.1.1.jar:na]
    at com.apidemoblockchain.RepositoryDao.BaseFunctions.Implementations.PairTrustBaseFunction.sendTrustTransactionMessage(PairTrustBaseFunction.java:165) ~[classes/:na]
    at com.apidemoblockchain.RepositoryDao.Implementations.PairTrustDataAccessRepository.run(PairTrustDataAccessRepository.java:79) ~[classes/:na]
    at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
My submit method goes like this:
public void sendTrustTransactionMessage(Gateway gateway, Contract trustContract, String payload) throws TimeoutException, InterruptedException, InvalidArgumentException, TransactionException, ContractException {
    // Prepare
    checkIfChannelIsReady(gateway);
    // Execute
    trustContract.submitTransaction(getCreateTrustMethod(), payload);
}
I'm using a 4-org network with 2 peers each, and 3 channels, one for each chaincode DataType, in order to keep things clean.
I think the error coming from the Channel doesn't make sense, because I am using the Contract to submit the transaction...
I open the gateway and then keep it open to submit the txs continuously:
try (Gateway gateway = getBuilder(getTrustPeer()).connect()) {
    Contract trustContract = gateway.getNetwork(getTrustChaincodeChannelName())
            .getContract(getTrustChaincodeId(), getTrustChaincodeName());
    while (!terminateLoop) {
        if (message) {
            String payload = preparePayload();
            sendTrustTransactionMessage(gateway, trustContract, payload);
        }
        ...
        wait();
    }
    ...
}
EDIT:
After reading @bestbeforetoday's advice, I've managed to catch the ContractException and analyze the logs. Still, I don't fully understand where the bug might be and, therefore, how to fix it.
I'll add 3 screenshots I've taken of the ProposalResponses received in the exception, and a comment after them.
ProposalResponses-1
ProposalResponses-2
ProposalResponses-3
So, in the first picture, I can see that 3 proposal responses were received in the exception, and the exception cause message says:
"The proposal responses have 2 inconsistent groups with 0 that are invalid. Expected all to be consistent and none to be invalid."
Pictures 2 and 3 show the content of those responses, and I notice that there are 2 fields holding a null value, namely "ProposalResponsePayload" and "timestamp_"; however, I don't know if those are the "two groups" referred to in the exception's cause message.
Thanks in advance...
It seems that, while the endorsing peers all successfully endorsed your transaction proposal, those peer responses were not all byte-for-byte identical.
There are several things that might differ, including read/write sets or the value returned from the transaction function invocation. There are several reasons why differences might occur, including non-deterministic transaction function implementation, different transaction function behaviour between peers, or different ledger state at different peers.
To figure out what caused this specific failure you probably need to look at the peer responses to identify how they differ. You should be getting a ContractException thrown back from your transaction submit call, and this should allow you to access the proposal responses by calling e.getProposalResponses():
https://hyperledger.github.io/fabric-gateway-java/release-2.2/org/hyperledger/fabric/gateway/ContractException.html#getProposalResponses()
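For example, something along these lines; this is just a sketch, and which ProposalResponse accessors you use depends on what you want to compare:

import java.util.Collection;

import org.hyperledger.fabric.gateway.ContractException;
import org.hyperledger.fabric.sdk.ProposalResponse;

// ...
try {
    trustContract.submitTransaction(getCreateTrustMethod(), payload);
} catch (ContractException e) {
    Collection<ProposalResponse> responses = e.getProposalResponses();
    for (ProposalResponse response : responses) {
        // log what each endorsing peer returned so the differing group can be spotted
        System.err.printf("peer=%s status=%s message=%s%n",
                response.getPeer().getName(),
                response.getStatus(),
                response.getMessage());
    }
    throw e;
}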
I have successfully set up a Flux that receives events from a remote system (the protocol is WebSocket, but that's irrelevant to the question) and handles connection glitches gracefully using the retryBackoff method. The code (simplified) is something like this:
Flux flux = myEventFlux
        .retryBackoff(Long.MAX_VALUE, Duration.ofSeconds(5))
        .publish()
        .refCount();

flux.subscribe(System.out::println);
Now I'd like to handle connection-lost and connection-recovery events in order to show some cues in the UI, or at least log something. Detecting errors seems easy: a doOnError before retryBackoff does the trick. But recovery is another story... I need something like "the first successful event after an error", so I've tried this:
flux.next().subscribe( event -> System.out.println("first = " + event) );
It works for the first normal connection (no previous error), but not for subsequent reconnections after errors.
The difficulty with what you want to achieve is that, from the perspective of myEventFlux, there is no way to distinguish two legitimate end-of-line subscribers subscribing to the retrying Flux from one subscriber whose subscription triggers two attempts.
So you could use doOnSubscribe (or doFirst since 3.2.10.RELEASE), but it would be subject to the limitation above. It would also trigger on the original connection, not just the retries...
Maybe for the UI use case this would still help?
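For illustration, a rough sketch of that idea; note that doOnSubscribe has to sit upstream of retryBackoff so that it fires again on every retry-driven resubscription:

Flux flux = myEventFlux
        // fires on every connection error, before the retry kicks in
        .doOnError(e -> System.out.println("connection lost: " + e))
        // fires on the initial subscription and again on each retry resubscription
        .doOnSubscribe(subscription -> System.out.println("(re)connecting..."))
        .retryBackoff(Long.MAX_VALUE, Duration.ofSeconds(5))
        .publish()
        .refCount();

flux.subscribe(System.out::println);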
I'm trying to make a simple SQL Transaction, but unfortunately I can't get it to work right.
What I'm doing right now:
protected Single<SQLConnection> tx() {
    return PostgreSQLClient.createShared(getVertx(), SqlUtil.getConnectionData())
            .rxGetConnection()
            .map((SQLConnection conn) -> {
                conn.rxSetAutoCommit(false);
                return conn;
            });
}
From what I understand of the docs, this should be enough?
When I inspect conn I see:
inTransaction = false
isAutoCommit = true
Why is that and what am I doing wrong here?
--
I use the common sql driver (http://vertx.io/docs/vertx-sql-common) with vertx 3.4.1
What you're seeing is the internal state of the connection. The current implementation controls transactionality using 2 flags:
inTransaction
isAutoCommit
The latter is flipped once you call the method:
conn.rxSetAutoCommit(false);
But this is handled internally as a NOOP: only once a statement is executed afterwards will the transaction actually be started and the first flag change.
Keep in mind that this is internal state of the client and can/will change in the future, once proper transaction isolation levels are implemented in the async driver, for which there is already a pending pull request.
If you want to see it working, basically issue a SQL statement in your code, e.g.:
conn.rxSetAutoCommit(false).rxExecute("SELECT ...")
and if you inspect again you will see that both flags are now true, and that there is a running transaction on your server.
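In the rxified API the two calls also need to be chained explicitly, since the rx* methods only run when subscribed. A rough sketch, assuming the RxJava 1 API of Vert.x 3.4 where the rx* methods return Single, and using a placeholder statement:

PostgreSQLClient.createShared(getVertx(), SqlUtil.getConnectionData())
        .rxGetConnection()
        .flatMap(conn -> conn.rxSetAutoCommit(false)          // flips isAutoCommit; no transaction yet
                .flatMap(v -> conn.rxExecute("SELECT 1"))     // first real statement actually opens it
                .map(v -> conn))
        .subscribe(
                conn -> { /* inTransaction is now true; do work, then rxCommit()/rxRollback() */ },
                err -> { /* handle connection or SQL errors */ });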
I have a spring rabbit consumer:
public class SlackIdle1Consumer extends AbstractMessageConsumer {

    @Override
    public void process(Message amqpMessage, Channel channel) throws Exception {
        /* A very bad exception is thrown here.
           It causes the AMQP message to be rejected, and if no other consumer is
           available and the error still persists, the message begins looping over
           and over. And when the error is fixed, those messages are processed,
           but the result of that processing may be harmful. */
    }
}
And somewhere inside it an exception happens. Let's imagine this is a bad exception, i.e. a development logic error. The AMQP message then begins to spin indefinitely, and when the error is fixed and the consumer restarted, all the old messages are processed, which is bad, because the logic and data may have changed since those messages were sent.
So the question is: how do I handle this situation properly? Should I wrap all my code in a try-catch clause, or will I have to develop 'checks' in each consumer to prevent consistency issues in my app?
There are several options:
Set the container's defaultRequeueRejected property to false so failed messages are always rejected (discarded or sent to a dead letter exchange depending on queue configuration).
If you want some exceptions to be retried and others not, then add a try/catch and throw an AmqpRejectAndDontRequeueException to reject those you don't want retried (see the sketch after this list).
Add a custom ErrorHandler to the container, to do the same thing as #2 - determine which exceptions you want retried - documentation here.
Add a retry advice with a recoverer - the default recoverer simply logs the error, the RejectAndDontRequeueRecoverer causes the message to be rejected after retries are exhausted, the RepublishMessageRecoverer is used to write to a queue with additional diagnostics in headers - documentation here.
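A minimal sketch of option 2, reusing the consumer from the question; RecoverableException and the handle() method are hypothetical placeholders for your own logic:

import org.springframework.amqp.AmqpRejectAndDontRequeueException;
import org.springframework.amqp.core.Message;

import com.rabbitmq.client.Channel;

public class SlackIdle1Consumer extends AbstractMessageConsumer {

    @Override
    public void process(Message amqpMessage, Channel channel) throws Exception {
        try {
            handle(amqpMessage); // hypothetical business logic
        } catch (RecoverableException e) {
            throw e; // transient problem: let the container requeue/retry it
        } catch (Exception e) {
            // logic/programming error: reject without requeue so the message is
            // discarded or dead-lettered instead of looping forever
            throw new AmqpRejectAndDontRequeueException("Unrecoverable consumer failure", e);
        }
    }

    private void handle(Message message) {
        // ... your processing ...
    }
}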