Need help with this socket timeout exception that I keep getting intermittently in my application while trying to read messages from AWS SQS queues.
Process:
I keep calling SQS queue receive message request with default wait message time of 20 seconds. (AmazonSQSClient.receiveMessage)
If no message was received in above request another call is made to SQS to again poll the queue (same receive message request).
Issue:
Intermittently in my application, I get the following exception in SQS receiveMessage requests:
com.amazonaws.http.AmazonHttpClient:executeHelper:492 - main - INFO :
Unable to execute HTTP request: Read timed out
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_121]
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_121]
at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_121]
at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_121]
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) ~[?:1.8.0_121]
at sun.security.ssl.InputRecord.read(InputRecord.java:503) ~[?:1.8.0_121]
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973) ~[?:1.8.0_121]
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930) ~[?:1.8.0_121]
By checking SQS queue from the console I see that during the failures a message is read from the queue (message goes in flight) but is not received at the application end. This leads to message getting locked for the visibility timeout period (1 hour) and hence a delayed processing of the message.
Additional details:
waitTimeSeconds for ReceiveMessageRequest is SQS default of 20 seconds.
Socket timeout for the request is AmazonHttpClient's default 50 seconds. (Exception is also seen after 50 seconds from previous read)
Max number of messages to be received in a request is set to 10 messages.
AWS SDK version used : 1.10.24 (aws-java-sdk-sqs-1.10.24.jar)
Any help around how to resolve or debug this further is greatly appreciated. Let me know if any further details are needed.
Related
Flink 1.5.3, When I submit flink job to flink cluster (on yarn), it always throw AskTimeoutException. In flink configuration file, I have configed the parmater "akka.ask.timeout=1000s" , but the Exception is still like this below.
That means I have increased the timeout parameter, "akka.ask.timeout=1000s" , but it does not work.
org.apache.flink.runtime.rest.handler.RestHandlerException: Job submission failed.
at org.apache.flink.runtime.rest.handler.job.JobSubmitHandler.lambda$handleRequest$2(JobSubmitHandler.java:116)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284)
at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1851759541]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
at java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
at java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
... 21 more
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/dispatcher#-1851759541]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
... 9 more
So is there any solution to avoid this issue?
The timeouts of the communication between the REST handlers and the Flink cluster is controlled by web.timeout. The timeout is specified in milliseconds and, thus, you would need to set it to web.timeout: 1000000 in your flink-conf.yaml if you want to wait 1000s.
Moreover, it would be good to check the cluster entrypoint logs why the job submission takes so long. Usually it should not take longer than 10 seconds.
I am trying to upload file using Drive.Files.Create api. It ran well but suddenly getting SocketTimeOut Exception after 10-15 request.
drive = new Drive.Builder(httpTransport, JSON_FACTORY, credential).setApplicationName(
APPLICATION_NAME).build();
Drive.Files.Create insert = drive.files().create(fileMetadata, mediaContent);
insert.setUseContentAsIndexableText(true);
File f=insert.execute();
Exception StackTrace:
Exception in thread "main" java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:338)
at com.google.api.client.http.javanet.NetHttpResponse.<init>(NetHttpResponse.java:37)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:94)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:981)
at com.google.api.client.googleapis.media.MediaHttpUploader.executeCurrentRequestWithoutGZip(MediaHttpUploader.java:545)
at com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:417)
at com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:427)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at com.ob.pdfparser.MyClass.uploadFile(MyClass.java:42)
at com.ob.pdfparser.MyClass.main(MyClass.java:91)
Usage Limit:
Queries
requests/day 20 of 1,000,000,000
requests/100seconds/user 1,000
Any Idea??
Below thing worked
Drive drive = new Drive.Builder(this.httpTransport, this.jsonFactory, this.credential).setHttpRequestInitializer(new HttpRequestInitializer() {#Override
public void initialize(HttpRequest httpRequest) throws IOException {
credential.initialize(httpRequest);
httpRequest.setConnectTimeout(300 * 60000); // 300 minutes connect timeout
httpRequest.setReadTimeout(300 * 60000); // 300 minutes read timeout
}
}).setApplicationName("My Application").build();
SocketTimeoutException: Read timed out can be solved by defining a connection timeout or adjusting the timeout barrier so that it's more flexible on network delays. Other than that, you can also check what causes delays in your network because, as far as I know, this kind of error is a network connectivity issue and not a programming issue.
Complete explanation about this kind of error can be found in java.net.SocketTimeoutException – How to Solve SocketTimeoutException and this SO post - Getting java.net.SocketTimeoutException: Connection timed out in android might also help.
Below is a simple code snippet that shows how to connect to a VoltDB server.
ClientConfig clientConfig = new ClientConfig();
Client client = ClientFactory.createClient(clientConfig);
String server = "192.168.43.32";
client.createConnection(server);
Based on my experiments, if the server is down or just not connectable from network layer, it will take about 75 seconds to get the response.
SEVERE: Failed to connect to 192.168.43.32, in 75,359 ms
java.net.ConnectException: Operation timed out
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:458)
at sun.nio.ch.Net.connect(Net.java:450)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
at java.nio.channels.SocketChannel.open(SocketChannel.java:189)
at org.voltdb.client.ConnectionUtil.getAuthenticatedConnection(ConnectionUtil.java:154)
at org.voltdb.client.ConnectionUtil.getAuthenticatedConnection(ConnectionUtil.java:142)
at org.voltdb.client.ConnectionUtil.getAuthenticatedConnection(ConnectionUtil.java:134)
at org.voltdb.client.Distributer.createConnectionWithHashedCredentials(Distributer.java:878)
at org.voltdb.client.ClientImpl.createConnectionWithHashedCredentials(ClientImpl.java:189)
at org.voltdb.client.ClientImpl.createConnection(ClientImpl.java:682)
at src.java.tutorial.voltdb.integration.ConnectionTest.main(ConnectionTest.java:27)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Is there any ways to set the time out time, so the application needs not to wait for such a long time. A successful connection normally takes just tens of milliseconds, so I think if the connection cannot be established within 1000 milliseconds, something is definitely wrong already.
I have tried the setting of below
clientConfig.setConnectionResponseTimeout(1000);
In this case, it has no effects at all. So I guess it is not for this purpose.
Normally when the database is down and your client tries to connect it will get an immediate Connection refused exception, for example:
Exception in thread "main" java.net.ConnectException: Connection refused
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:364)
at sun.nio.ch.Net.connect(Net.java:356)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:623)
at java.nio.channels.SocketChannel.open(SocketChannel.java:184)
at org.voltdb.client.ConnectionUtil.getAuthenticatedConnection(ConnectionUtil.java:165)
at org.voltdb.client.ConnectionUtil.getAuthenticatedConnection(ConnectionUtil.java:153)
at org.voltdb.client.ConnectionUtil.getAuthenticatedConnection(ConnectionUtil.java:145)
at org.voltdb.client.Distributer.createConnectionWithHashedCredentials(Distributer.java:890)
at org.voltdb.client.ClientImpl.createConnectionWithHashedCredentials(ClientImpl.java:191)
at org.voltdb.client.ClientImpl.createConnection(ClientImpl.java:684)
at benchmark.Benchmark.<init>(Benchmark.java:17)
at benchmark.Benchmark.main(Benchmark.java:78)
In general, a "java.net.ConnectException: Connection timed out" can occur if there is a firewall that prevents the client from receiving any sort of response, or there could be other causes. The first thing to check might be if you have any firewall or network settings that would prevent access to port 21212 (the default VoltDB database connection port).
The ClientConfig setConnectionResponseTimeout() setting is used to cause a live connection to be closed if it hasn't received a response from a procedure call or a ping for the given number of milliseconds, but it is not used for creating a new connection.
If I set a socket SoTimeout, and read from it. when read time exceed the timeout limit, I'll get an "SocketTimeoutException: Read timed out".
and here is the stack in my case:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:277)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:527)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:462)
but here I encountered "IOExcetion: Connection timed out", i don't know how it happened.
Stacks:
java.io.IOException: Connection timed out
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:277)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:527)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:462)
Can someone tell me what's the differences between the two exceptions, Thanks.
A connection timeout means you attempted to connect to the remote IP/port pair and failed to do so: it did not answer at all. Another possible error at that stage would be connection refused, in which this pair is available but rejected your connection attempt. Both of these errors appear on the initial setup of a socket. Note that these errors only occur with TCP, since a TCP connection requires the establishment of a session.
When you have a socket read timeout, it means you are connected, but failed to read data in time. Timeouts on sockets are configurable. You may also get a connection reset error, which means you did connect successfully, but the other end decided that after all you're not worth it :p
Simple answer:
In one case (Connection timed out) your application cannot connect to the server in a timely manner. In the other case (Read timed out) the connection can be established but during read the connection times out.
'Connection timed out' after the connect phase means that something has gone seriously wrong with the connection and it must be closed. 'Read timeout' just means that no data arrived within the specified receive timeout period: it isn't fatal.
I'm using a Jetty based servlet to do RPC and I'm having an issue where a request that takes a long time throws the following exception on the server:
2012-02-11 21:07:07,673 [btpool0-4] DEBUG org.mortbay.log - EXCEPTION
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at org.mortbay.io.ByteArrayBuffer.readFrom(ByteArrayBuffer.java:168)
at org.mortbay.io.bio.StreamEndPoint.fill(StreamEndPoint.java:99)
at org.mortbay.jetty.bio.SocketConnector$Connection.fill(SocketConnector
.java:190)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:277)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:203)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:357)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.
java:217)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool
.java:475) 2012-02-11 21:07:07,674 [btpool0-4] DEBUG org.mortbay.log
- EOF
I tried setting the Connection,Keep-Alive http request property but that had no effect and from what I can gather, http 1.1 (which I'm pretty sure I'm using) is persistent by default.
So I think there are 2 ways I can try to address this:
figure out how to prevent the timeout exception from being
thrown at all
Have the client issue the initial request without waiting
for a response, and then ping with separate requests to check when
the server is done.
Update (2/12/2012): I set the maxIdleTime as Tim suggested and that did extend the time before the timeout occurred, but then I started getting a new exception:
2012-02-11 23:24:01,187 [btpool0-1] DEBUG org.mortbay.log - EXCEPTION
java.io.IOException: An existing connection was forcibly closed by the
remote host at sun.nio.ch.SocketDispatcher.read0(Native Method) at
sun.nio.ch.SocketDispatcher.read(Unknown Source) at
sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source) at
sun.nio.ch.IOUtil.read(Unknown Source) at
sun.nio.ch.SocketChannelImpl.read(Unknown Source) at
org.mortbay.io.nio.ChannelEndPoint.fill(ChannelEndPoint.java:129) at
org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:277) at
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:203) at
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:357) at
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:329)
at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:475)
So something outside of Jetty was killing the connection, I suspect most likely a firewall. So what I ended up doing was making the server process the request with multiple threads; the original thread would immediately respond to the http request and a second thread would be kicked off to perform the action that was taking a long time. The client would then poll with http requests to check when the action on the server was complete.
This is a socket timeout, so nothing you do at the HTTP level can fix it - hence your keep alive not achieving anything.
Try setting the maxIdleTime on the SocketConnector
See here: http://docs.codehaus.org/display/JETTY/Configuring+Connectors ( archive link )