I am invoking a REST endpoint from another service using restTemplate.exchange.
The endpoint that receives the request queries a DB, fetches around 1.5 million records, and stores them in another DB.
Now I am getting the following error, x_cf_routererror:"endpoint_failure (context canceled)", after the endpoint calls the DB. I get this error after about 120+ seconds, and the process continues as is.
After this error, I see another call being made to the same endpoint, which results in duplicates in the target DB.
I am not sure why this is happening. I do not have any retry mechanism in place, and the RestTemplate timeout is set to 300 in the client service that invokes the endpoint.
Has anyone faced this issue? What is causing this endpoint_failure (context canceled) and the duplicate invocation of the endpoint?
I appreciate your help with this.
Log snippet:
2022-05-12T08:57:18.840-04:00 [APP/PROC/WEB/0] [OUT] 2022-05-12 12:57:18.840 INFO 28 --- [nio-8080-exec-4]
Controller1 : Request received to load all timecard information::RequestedTime=12:57:18.840
2022-05-12T08:59:21.530-04:00 [RTR/17] [OUT] - [2022-05-12T12:57:18.829182975Z] "GET HTTP/1.1" 499 0 22 "-" "Java/1.8.0_332" "" "1" x_forwarded_for:"" x_forwarded_proto:"https" vcap_request_id:"" response_time:122.701301 gorouter_time:0.000164 app_id:"" app_index:"0" instance_id:"" x_cf_routererror:"endpoint_failure (context canceled)" x_b3_traceid:"" x_b3_spanid:"" x_b3_parentspanid:"-" b3:"599552bb012c2adc60adef7187a865e7-60adef7187a865e7"
**Below is the duplicate call**
2022-05-12T08:59:21.777-04:00 [APP/PROC/WEB/0] [OUT] 2022-05-12 12:59:21.777 INFO 28 --- [nio-8080-exec-2]
Controller1 : Request received to load all timecard information::RequestedTime=12:59:21.777
Thanks,
S
The 499 (or 502) error is returned by the PCF gorouter, not by your web app [APP/PROC/WEB/0].
The PCF gorouter will retry the same HTTP request for particular error cases.
For more details, please refer to: https://docs.cloudfoundry.org/adminguide/troubleshooting-router-error-responses.html
Error 499 usually means that the call is taking too long and the client closed the connection. This is often seen when the app cannot return a response within a specific time limit (usually 2 minutes). You might need to check whether a DB issue or something else is delaying the expected response.
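One way to keep a router retry from loading the data twice is to make the long-running load refuse to start a second copy while the first is still running, and to return from the request quickly. The sketch below is plain Java, not Spring: `LoadJobGuard`, `tryStart` and the job key are hypothetical names, and it assumes a single app instance (with several instances you would need a shared lock, e.g. in the DB). A controller could call `tryStart` and answer 202 Accepted right away, so the request no longer stays open past the router's roughly 2-minute limit.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

/**
 * Hypothetical helper: runs a long load in the background and refuses to start
 * a second copy while one with the same key is still running.
 * Assumes a single application instance (the guard lives in local memory).
 */
public class LoadJobGuard {
    private final Set<String> running = ConcurrentHashMap.newKeySet();
    private final ExecutorService executor;

    public LoadJobGuard(ExecutorService executor) {
        this.executor = executor;
    }

    /** Returns true if the job was submitted, false if the same key is already running. */
    public boolean tryStart(String jobKey, Runnable work) {
        // add() on a concurrent key set is atomic, so only one caller can win.
        if (!running.add(jobKey)) {
            return false; // duplicate request (e.g. a router retry): do not load again
        }
        try {
            executor.execute(() -> {
                try {
                    work.run();
                } finally {
                    running.remove(jobKey); // allow a later, legitimate run
                }
            });
        } catch (RuntimeException e) {
            running.remove(jobKey); // submission failed (e.g. executor already shut down)
            throw e;
        }
        return true;
    }

    public boolean isRunning(String jobKey) {
        return running.contains(jobKey);
    }
}
```

The guard only deduplicates while a load is in flight; if a retried request can arrive after the first load has finished, an idempotent write (e.g. an upsert keyed on the record ID) in the target DB would still be needed.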
On one of our platforms, the HDFS NameNode shuts down every 1 to 3 days with the following error message:
FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(390)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [<ip1>:<port>,<ip2>:<port>, etc], stream=QuorumOutputStream starting at txid 29873171))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:109)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:525)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:521)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:710)
at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:188)
at java.lang.Thread.run(Thread.java:748)
Before this FATAL log, we see logs like the following, which show a degradation in response time:
WARN client.QuorumJournalManager (QuorumCall.java:waitFor(185)) - Waited 18014 ms (timeout=20000 ms) for a response for sendEdits. Succeeded so far: [<ip1>:<port>,<ip2>:<port>]
Have you encountered this problem before, and do you have any advice on how to fix it?
We have already:
checked that our VMs are time synchronized
detected that a burst of network traffic is in progress when the problem occurs, but have not found the root cause yet
checked our network devices: apart from one port that flaps quickly from UP to DOWN (which we are going to fix), the network seems fine
Thanks in advance
Let's say my Java chaincode (running on Fabric 1.4.4) needs to throw an exception to signal that the asset being created already exists. I am throwing a RuntimeException describing the problem (in this case, "Contract LL00001 already registered"), which is logged on the peer node executing the transaction:
2019-11-29 20:15:37.807 UTC [peer.chaincode.nid1-blockchain-hapeer1-mrrc-0.1.4] func2 -> INFO 16a8ec Contract LL00001 already registered
2019-11-29 20:15:37.807 UTC [peer.chaincode.nid1-blockchain-hapeer1-mrrc-0.1.4] func2 -> INFO 16a8ed java.lang.RuntimeException: Contract LL00001 already registered
But then, after the stack trace, I see that the peer node returns it as a 500 error without including my error description or any reference to the Java exception (which makes sense, as that error is language agnostic):
2019-11-29 20:15:37.807 UTC [peer.chaincode.nid1-blockchain-hapeer1-mrrc-0.1.4] func2 -> INFO 16a8ff 20:15:37:804 SEVERE org.hyperledger.fabric.shim.impl.ChaincodeInnvocationTask call [1f56a053] Invoke failed with error code 500. Sending ERROR
This is what gets logged in my client Java application (which uses fabric-sdk-java):
org.hyperledger.fabric.sdk.exception.InvalidArgumentException: Proposal response is invalid.
at org.hyperledger.fabric.sdk.ProposalResponse.getChaincodeActionResponsePayload(ProposalResponse.java:272)
at ...
So I only know that the chaincode had a problem, not what the problem is. How can I get the error type and description so I can show them to the user? Right now I have to go to the peer node and check its logs to see what the problem is.
Note: I am implementing the new org.hyperledger.fabric.contract.ContractInterface in my chaincode class.
Update: the peer node logs the exception (org.hyperledger.fabric.shim.ChaincodeException) and seems to correctly return the error message ("The document was not found") in the 500 response, as shown in the log, but this message does not reach the Java SDK:
2019-12-23 22:11:09.178 UTC [peer.chaincode.nid1-blockchain-hapeer1-mrrc-0.9.7] func2 -> INFO 5aa7 22:11:09:176 SEVERE org.hyperledger.fabric.shim.impl.ChaincodeInnvocationTask call [12cc4ad0] Invoke failed with error code 500. Sending ERROR
2019-12-23 22:11:09.179 UTC [peer.chaincode.nid1-blockchain-hapeer1-mrrc-0.9.7] func2 -> INFO 5aa8 22:11:09:177 FINE org.hyperledger.fabric.shim.impl.ChaincodeSupportClient$2 accept > sendToPeer 12cc4ad09a1feb7fc1246ac04bf69509204ca74368be2c7e4bbf0a503e90417f
2019-12-23 22:11:09.181 UTC [endorser] callChaincode -> INFO 5aa9 [mrrc][12cc4ad0] Exit chaincode: name:"mrrc" (36ms)
2019-12-23 22:11:09.181 UTC [endorser] SimulateProposal -> ERRO 5aaa [mrrc][12cc4ad0] failed to invoke chaincode name:"mrrc" , error: transaction returned with failure: The document was not found
Edit: It seems to be a bug in the Java SDK. I have created an issue in Fabric's JIRA:
https://jira.hyperledger.org/browse/FABJ-508
One way to return this error to your Fabric Java SDK client is to make your chaincode class extend the ChaincodeBase class (from the org.hyperledger.fabric.shim package) and then, in each of your chaincode methods, return the result of its newErrorResponse method with your custom error string as its first (or only) parameter. You can have a look at this example from the fabric-samples repo:
https://github.com/hyperledger/fabric-samples/blob/master/chaincode/abstore/java/src/main/java/org/hyperledger/fabric-samples/ABstore.java#L28
To see the other overloads of the newErrorResponse method, and which other parameters you can pass to it, please see:
https://jar-download.com/artifacts/org.hyperledger.fabric-chaincode-java/fabric-chaincode-shim/1.4.0/source-code/org/hyperledger/fabric/shim/ChaincodeBase.java
UPDATE: If you are instead using the newer ContractInterface (as mentioned by icordoba) for your chaincode implementations, you should throw an instance of the ChaincodeException class (imported from org.hyperledger.fabric.shim.ChaincodeException) to achieve the same thing. You can have a look at a sample chaincode here
I am using the "hazelcast.operation.call.timeout.millis = 100" configuration to time out Hazelcast operations.
But during Hazelcast startup, some of the map size operations time out because of this configuration. I only want to time out the operations that run after the map load, which are basically map get operations. Is there any way to set a custom operation timeout for those map.get() operations?
Is there any other way to get this done?
com.hazelcast.core.OperationTimeoutException: HDMapSizeOperation got rejected before execution due to not starting within the operation-call-timeout of: 100ms. Current time: 2017-05-15 11:41:47.503. Start time: 2017-05-15 11:41:44.189. Total elapsed time: 3314 ms. Invocation{op=com.hazelcast.map.impl.operation.HDMapSizeOperation{serviceName='hz:impl:mapService', identityHash=1941379381, partitionId=0, replicaIndex=0, callId=-24461, invocationTime=1494828707296 (2017-05-15 11:41:47.296), waitTimeout=-1, callTimeout=100, name=blockMap}, tryCount=250, tryPauseMillis=500, invokeCount=11, callTimeoutMillis=100, firstInvocationTimeMs=1494828704189, firstInvocationTime='2017-05-15 11:41:44.189', lastHeartbeatMillis=0, lastHeartbeatTime='1970-01-01 05:30:00.000', target=[192.168.2.204]:5701, pendingResponse={VOID}, backupsAcksExpected=0, backupsAcksReceived=0, connection=null}
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.newOperationTimeoutException(InvocationFuture.java:151)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:99)
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:75)
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:155)
at com.hazelcast.spi.impl.operationservice.impl.InvokeOnPartitions.retryFailedPartitions(InvokeOnPartitions.java:143)
at com.hazelcast.spi.impl.operationservice.impl.InvokeOnPartitions.invoke(InvokeOnPartitions.java:73)
at com.hazelcast.spi.impl.operationservice.impl.OperationServiceImpl.invokeOnAllPartitions(OperationServiceImpl.java:371)
at com.hazelcast.map.impl.proxy.MapProxySupport.size(MapProxySupport.java:628)
at com.hazelcast.map.impl.proxy.MapProxyImpl.size(MapProxyImpl.java:102)
at it.XXXX.tbx.server.MapLoader.run(MapLoader.java:36)
Regards,
Tharinda
If you are trying to control how long you wait for the result of, e.g., a map.get, you could have a look at the asynchronous version, map.getAsync. It returns a future, so you can control how long you want to wait for a result.
Modifying the call timeout is not advised.
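A minimal sketch of the getAsync idea, bounding the wait per call instead of lowering the global call timeout. `BoundedGet` and `getOrDefault` are hypothetical names, and to keep the block self-contained it accepts any `java.util.concurrent.Future` rather than a Hazelcast type.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: bounds how long the caller waits on an asynchronous get. */
public final class BoundedGet {
    private BoundedGet() {
    }

    /**
     * Waits at most timeoutMillis for the future's value and returns fallback if it
     * is not ready in time. Only this caller stops waiting; the operation itself is
     * not cancelled.
     */
    public static <V> V getOrDefault(Future<V> future, long timeoutMillis, V fallback) {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
            return fallback;
        } catch (ExecutionException e) {
            throw new IllegalStateException("async get failed", e.getCause());
        }
    }
}
```

In Hazelcast 3.x (the version in the stack trace above), `IMap.getAsync(key)` returns an `ICompletableFuture`, which extends `java.util.concurrent.Future`, so a call like `BoundedGet.getOrDefault(map.getAsync(key), 100, null)` would wait at most 100 ms while startup operations such as `size()` keep the default call timeout. In Hazelcast 4.x and later, `getAsync` returns a `CompletionStage`, so `toCompletableFuture()` would be needed first.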
I got some strange behavior when my application tried to send an email:
20:59:08,926 ERROR [release.com.mycompany.mail.GenericMail] (EJB default - 5) [MY_EJB INBOUND] Sending message failed!: javax.mail.MessagingException: Can't send command to SMTP host;
nested exception is:
java.net.SocketException: Broken pipe
at com.sun.mail.smtp.SMTPTransport.sendCommand(SMTPTransport.java:2106) [mail-1.4.4-redhat-2.jar:1.4.4-redhat-2]
at com.sun.mail.smtp.SMTPTransport.sendCommand(SMTPTransport.java:2093) [mail-1.4.4-redhat-2.jar:1.4.4-redhat-2]
at com.sun.mail.smtp.SMTPTransport.close(SMTPTransport.java:1184) [mail-1.4.4-redhat-2.jar:1.4.4-redhat-2]
at javax.mail.Transport.send0(Transport.java:197) [mail-1.4.4-redhat-2.jar:1.4.4-redhat-2]
at javax.mail.Transport.send(Transport.java:124) [mail-1.4.4-redhat-2.jar:1.4.4-redhat-2]
at com.mycompany.MailUtils.sendMail(MailUtils.java:258) [classes:]
Before this exception, a timeout exception had been thrown:
20:57:50,291 ERROR [org.jboss.as.ejb3] (EJB default - 9) [ ] JBAS014122: Error during
retrying timeout for timer: [id=6be904b5-c1ef-4f0e-a277-d4c9f93e21b3 timedObjectId=SOME_EJB
auto-timer?:false persistent?:false timerService=org.jboss.as.ejb3.timerservice.TimerServiceImpl#99fdab1 initialExpiration=/* date */ 00:00:00 UTC 2015 intervalDuration(in milli sec)=0 nextExpiration=/* other date */ 21:00:00 UTC 2015
timerState=RETRY_TIMEOUT: javax.ejb.EJBTransactionRolledbackException: JBAS014373:
EJB 3.1 PFD2 4.8.5.5.1 concurrent access timeout on org.jboss.invocation.InterceptorContext$Invocation#66668127 - could not obtain lock within 5000MILLISECONDS
To be honest, I don't know what happened. I got these kinds of exceptions for 5 hours.
I want to know what happened and how to avoid these exceptions in the future.
UPDATE 1
SOME_EJB is an EJB that works with the TimerService. It runs every 3 minutes and, when certain conditions are met, sends an email.
My only idea is that some network/database issue caused a single execution of the task to take more than 3 minutes.
MailUtils is a @Stateless EJB.
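If the hypothesis above is right (one run taking longer than the 3-minute interval, so the next timeout waits for the lock and fails after 5000 ms), a common pattern is to skip a tick instead of queuing behind the previous run. This is a plain-Java sketch under that assumption, not an EJB API: `SkipIfRunning` is a hypothetical wrapper that the timeout method would delegate to.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical wrapper for a periodic task: if the previous run is still in
 * progress, the new tick is skipped instead of waiting for it.
 */
public class SkipIfRunning {
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final Runnable task;

    public SkipIfRunning(Runnable task) {
        this.task = task;
    }

    /** Runs the task and returns true, or returns false at once if a run is already in progress. */
    public boolean runIfIdle() {
        // compareAndSet is atomic: only one thread can move the flag from false to true.
        if (!running.compareAndSet(false, true)) {
            return false; // previous execution (e.g. a slow mail send) has not finished
        }
        try {
            task.run();
            return true;
        } finally {
            running.set(false); // always release, even if the task throws
        }
    }
}
```

Skipping loses that tick, which seems acceptable for a job whose next run re-checks the same conditions three minutes later.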
Is the mail sent or not?
It looks like JavaMail is closing the connection to the mail server when the exception occurs. If a previous error caused the server to drop the connection immediately, JavaMail may be getting this exception while trying to send the SMTP QUIT command before closing the connection.
Turn on JavaMail Session debugging (e.g. session.setDebug(true)) to see what error the server might have reported before this exception occurs.
I've implemented an HTTP service based on the HTTP server example provided by the netty.io project.
When I execute a GET request to the service URL from the command line (wget) or from a browser, I receive the expected result.
When I perform a load test using ApacheBench (ab -n 100000 -c 8 http://localhost:9000/path/to/service), I experience no errors (neither on the service nor on the ab side) and see reasonable request processing times.
Afterwards, I set up a test plan in JMeter with a thread group of 1 thread and a loop count of 2. I added an HTTP Request sampler where I simply set the server name localhost, the port number 9000 and the path /path/to/service. Then I also added a View Results Tree and a Summary Report listener.
Finally, I executed the test plan and received one valid response and one error showing the following content:
Thread Name: Thread Group 1-1
Sample Start: 2015-06-04 09:23:12 CEST
Load time: 0
Connect Time: 0
Latency: 0
Size in bytes: 2068
Headers size in bytes: 0
Body size in bytes: 2068
Sample Count: 1
Error Count: 1
Response code: Non HTTP response code: org.apache.http.NoHttpResponseException
Response message: Non HTTP response message: The target server failed to respond
Response headers:
HTTPSampleResult fields:
ContentType:
DataEncoding: null
The associated exception in the Response data tab showed the following:
org.apache.http.NoHttpResponseException: The target server failed to respond
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:61)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
at org.apache.jmeter.protocol.http.sampler.MeasuringConnectionManager$MeasuredConnection.receiveResponseHeader(MeasuringConnectionManager.java:201)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:715)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:520)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl.executeRequest(HTTPHC4Impl.java:517)
at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl.sample(HTTPHC4Impl.java:331)
at org.apache.jmeter.protocol.http.sampler.HTTPSamplerProxy.sample(HTTPSamplerProxy.java:74)
at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1146)
at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1135)
at org.apache.jmeter.threads.JMeterThread.process_sampler(JMeterThread.java:434)
at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:261)
at java.lang.Thread.run(Thread.java:745)
Since I already have a similar service running that receives and processes web tracking data without any errors, it might be a problem with my test plan or JMeter... but I am not sure :-(
Has anyone experienced similar behavior? Thanks in advance ;-)
The issue may be related to Keep-Alive management.
Read these:
https://bz.apache.org/bugzilla/show_bug.cgi?id=57921
https://wiki.apache.org/jmeter/JMeterSocketClosed
So your solution is one of these:
If you're sure it's a Keep-Alive issue:
Try the JMeter nightly build http://jmeter.apache.org/nightly.html:
Download the _bin and _lib files
Unpack the archives into the same directory structure
The other archives are not needed to run JMeter.
Then adjust the value of httpclient4.idletimeout
Alternatively, as a workaround, increase the retry count or add a stale connection check, as per:
https://wiki.apache.org/jmeter/JMeterSocketClosed
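The reasoning behind the idle-timeout and stale-check advice is that the client reuses a pooled keep-alive connection which the server has already closed, so the next request on it gets no response (NoHttpResponseException). The plain-Java sketch below only models that idea: `IdleEvictingPool` and its methods are hypothetical, not JMeter's or HttpClient's code, and the injectable clock exists only to make it testable. A connection idle for longer than the timeout is dropped instead of reused, which is why the client-side idle timeout is usually set below the server's own keep-alive timeout.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

/**
 * Hypothetical model of a keep-alive connection pool with an idle timeout:
 * connections idle for idleTimeoutMillis or longer are dropped instead of
 * reused, because the server may already have closed them.
 */
public class IdleEvictingPool<C> {
    private static final class Idle<T> {
        final T conn;
        final long releasedAt;

        Idle(T conn, long releasedAt) {
            this.conn = conn;
            this.releasedAt = releasedAt;
        }
    }

    private final Deque<Idle<C>> idle = new ArrayDeque<>();
    private final long idleTimeoutMillis;
    private final LongSupplier clockMillis; // injectable so the behaviour can be tested

    public IdleEvictingPool(long idleTimeoutMillis, LongSupplier clockMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.clockMillis = clockMillis;
    }

    /** Returns a pooled connection that is still fresh, or null if the caller must open a new one. */
    public synchronized C acquire() {
        long now = clockMillis.getAsLong();
        while (!idle.isEmpty()) {
            Idle<C> candidate = idle.pollFirst();
            if (now - candidate.releasedAt < idleTimeoutMillis) {
                return candidate.conn;
            }
            // Too old: a real pool would close the socket here rather than reuse it.
        }
        return null;
    }

    /** Puts a connection back into the pool after a request completes. */
    public synchronized void release(C conn) {
        idle.addFirst(new Idle<>(conn, clockMillis.getAsLong()));
    }
}
```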